
Deployment Runbook (MT-RAG)

Last verified: 2026-02-15 (Europe/Zurich)

This is the complete, stand-alone deployment runbook for:

  • DGX Spark running vLLM (OpenAI-compatible)
  • Tailscale networking
  • Hostinger VPS running Dokploy + Docker Compose (React frontend + FastAPI + Neo4j)
  • Neo4j Community (single DB: graph-fixed-size) with GraphRAG + communities pipeline

Note: Streamlit was used as a test/prototype UI and can still be run as an optional fallback. For frontend runtime and local workflow details, see Frontend (React).

Secrets policy:

  • Do NOT commit real secrets (API keys, tokens, passwords).
  • Keep placeholders in docs and store real values in Dokploy secrets/env.


0) Where to run commands (IMPORTANT)

Dokploy has two different terminals:

0.1 VPS host terminal (has docker, can modify /srv/...)

Use this for:

  • any command that starts with docker ...
  • deleting/creating files under /srv/mt_upload and /srv/mt_data
  • deleting Docker volumes

Open it by SSH from your laptop:

PowerShell
ssh -i "$env:USERPROFILE\.ssh\vps_hostinger" root@VPS_PUBLIC_IP

0.2 Dokploy "Docker Terminal" (inside a container, NO docker command)

Use this for:

  • running python -m graphbuild ... and python -m communities ... (inside the api container)
  • running cypher-shell ... (inside the neo4j container)

In the Docker Terminal you are already inside a container shell, so docker ... will not work.


1) Source-of-truth infrastructure values

DGX Spark (vLLM host)

  • Hostname: VLLM_HOST
  • Tailscale IP: VLLM_TAILSCALE_IP
  • Tailnet DNS (TLS): VLLM_TAILNET_DNS
  • vLLM port: 8000
  • vLLM model id: VLLM_MODEL_ID
  • vLLM container name: VLLM_CONTAINER

Hostinger VPS (Dokploy host)

  • Public IPv4: VPS_PUBLIC_IP
  • Hostname: VPS_HOSTNAME
  • Tailscale IP: VPS_TAILSCALE_IP

Windows laptop

  • Device name: ag
  • Tailscale IP: LAPTOP_TAILSCALE_IP
  • Spark SSH key: %USERPROFILE%\.ssh\tailscale_spark
  • VPS SSH key: %USERPROFILE%\.ssh\vps_hostinger

2) Persistent config - DGX Spark

2.1 Expose vLLM over tailnet (no tunnel)

Bash
sudo tailscale serve --bg --tcp 8000 tcp://localhost:8000
sudo tailscale serve status

2.2 Auto-restart vLLM container

Bash
docker update --restart unless-stopped VLLM_CONTAINER

3) Dokploy environment variables + mounts

3.1 vLLM connectivity (required)

Set these in Dokploy env/secrets for frontend and API services (and Streamlit if fallback is enabled):

  • VLLM_URL=http://VLLM_TAILSCALE_IP:8000/v1/chat/completions
  • VLLM_MODEL=VLLM_MODEL_ID
  • VLLM_API_KEY=<SECRET_TOKEN> (example: token-local-dev)
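
The API reads these variables at runtime. As an illustration only (the variable names follow the list above; the helper name and fallback values are hypothetical), a request to the OpenAI-compatible endpoint can be assembled like this:

```python
import json
import os
import urllib.request


def build_vllm_request(messages):
    """Build an OpenAI-compatible chat request from the Dokploy env vars."""
    url = os.environ.get("VLLM_URL", "http://VLLM_TAILSCALE_IP:8000/v1/chat/completions")
    headers = {
        "Authorization": f"Bearer {os.environ.get('VLLM_API_KEY', '<SECRET_TOKEN>')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": os.environ.get("VLLM_MODEL", "VLLM_MODEL_ID"),
        "messages": messages,
    }).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Sending it only works once the tailnet route is up:
# with urllib.request.urlopen(build_vllm_request([{"role": "user", "content": "Hello"}])) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
```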

3.2 Data mount (required)

The app expects a mounted data root:

  • CT_DATA_ROOT=/data

Recommended host paths on the VPS:

  • Upload staging: /srv/mt_upload
  • Data root: /srv/mt_data

Compose mount (via Dokploy env + compose var):

  • Set in Dokploy env: CT_DATA_HOST=/srv/mt_data
  • Compose volume line mounts ${CT_DATA_HOST}:/data

Tip: for extra safety, make the mount read-only: /srv/mt_data:/data:ro
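
For reference, a compose fragment matching the mount above might look like this (a sketch; the service name api is an assumption):

```yaml
services:
  api:
    volumes:
      # CT_DATA_HOST is set in Dokploy env (/srv/mt_data); /data matches CT_DATA_ROOT
      - ${CT_DATA_HOST}:/data
      # read-only variant:
      # - ${CT_DATA_HOST}:/data:ro
```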

3.3 Neo4j GraphRAG (single DB)

Required:

  • NEO4J_URI=bolt://neo4j:7687 (inside Compose network)
  • NEO4J_USER=neo4j
  • NEO4J_PASSWORD=<SECRET_PASSWORD> (used by API and optional Streamlit containers)
  • NEO4J_DATABASE=graph-fixed-size

Optional (only if you need the dataset->DB fallback):

  • NEO4J_DATABASE_FIXED_SIZE=graph-fixed-size

Do NOT use multi-db on the VPS (Community):

  • Do not set NEO4J_DATABASE_SEMANTIC
  • Do not refer to graph-semantic

3.4 Graph expansion defaults

  • COMM_INGEST_TAG=comm_fixed_C1_g1_2 (update this when you change ingest tag)
  • RAG_API_KEY=<SECRET_API_KEY>
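
As a sanity check before starting the API, a small helper (hypothetical, not part of the codebase) can flag required variables that are unset, using the names from the lists above:

```python
import os

# Required env vars per sections 3.1 and 3.3
REQUIRED = [
    "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "NEO4J_DATABASE",
    "VLLM_URL", "VLLM_MODEL", "VLLM_API_KEY",
    "COMM_INGEST_TAG", "RAG_API_KEY",
]


def missing_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


# Example with an incomplete environment:
print(missing_env({"NEO4J_URI": "bolt://neo4j:7687"}))
```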

3.5 Legacy Streamlit auto-pipeline

  • AUTO_PIPELINE_FROM_CHAT=1

3.6 Public domain routing (optional)

If you expose the React frontend under a domain (for example crains.souveraen.cloud):

  • DNS A record points to the VPS IP: VPS_PUBLIC_IP
  • If you use CAA records, include: CAA 0 issue "letsencrypt.org"
  • In Dokploy -> mt-rag -> Domains, map the host to the frontend service on container port 3000 and enable Let's Encrypt.

4) Critical Neo4j note (why dump/load is NOT the default)

Local store format check result for graph-fixed-size:

  • store = block-block-1.1

Neo4j Community cannot load a database that uses the block store format. Therefore:

  • Default deployment approach: rebuild graph on VPS from mounted GOLD using graphbuild ingest
  • Do NOT plan on neo4j-admin database dump/load from local to VPS Community (it will fail)

5) Minimal "it works" tests

5.1 Windows -> Spark vLLM over Tailscale

PowerShell
tailscale status
tailscale ping VLLM_TAILSCALE_IP
curl.exe -H "Authorization: Bearer <SECRET_TOKEN>" http://VLLM_TAILSCALE_IP:8000/v1/models

PowerShell chat completion test:

PowerShell
$uri = "http://VLLM_TAILSCALE_IP:8000/v1/chat/completions"
$headers = @{
  Authorization = "Bearer <SECRET_TOKEN>"
  "Content-Type" = "application/json"
}
$bodyJson = @{
  model    = "VLLM_MODEL_ID"
  messages = @(@{ role="user"; content="Hello" })
} | ConvertTo-Json -Depth 10

$response = Invoke-RestMethod -Method Post -Uri $uri -Headers $headers -Body $bodyJson
$response.choices[0].message.content

5.2 VPS -> Spark vLLM over Tailscale (deployment-critical)

Bash
tailscale status
curl -H "Authorization: Bearer <SECRET_TOKEN>" http://VLLM_TAILSCALE_IP:8000/v1/models

6) Dokploy deploy checklist (high level)

  • Dokploy app created
  • Repo linked
  • Compose file: deploy/docker-compose.yml
  • Host dirs created:
  • /srv/mt_upload
  • /srv/mt_data
  • Dokploy env set:
  • CT_DATA_ROOT=/data
  • CT_DATA_HOST=/srv/mt_data
  • Neo4j + vLLM vars above
  • AUTO_PIPELINE_FROM_CHAT=1
  • Deploy
  • Verify:
  • Frontend responds (React in production; Streamlit only if fallback is enabled)
  • API /healthz ok
  • API /query returns {answer, sources}
  • GraphRAG path works (graph expansion uses Neo4j)
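
The /query shape check can be scripted. A minimal validator (hypothetical helper; field names follow the checklist above):

```python
import json


def check_query_response(raw):
    """Check that a /query response body has the expected {answer, sources} shape."""
    body = json.loads(raw)
    if not isinstance(body.get("answer"), str):
        raise ValueError("missing 'answer' string")
    if not isinstance(body.get("sources"), list):
        raise ValueError("missing 'sources' list")
    return body


# Example payload with the expected shape:
check_query_response('{"answer": "text", "sources": [{"id": "chunk-1"}]}')
```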

7) START HERE after deploy: VPS/Dokploy GraphDB smoke test (Option C)

7.1 Verify /data mount (inside api container)

Dokploy -> Docker Terminal -> select the api container:

Bash
ls -lah /data | head
ls -lh /data/gold_subsample_chunk/fixed_size/2025-09-14/chunks_enriched.parquet
ls -lh /data/vectordb_bm25/fixed_size/2025-09-14/bm25__text.pkl
ls -lah /data/vectordb_dense/gemini_text-embedding-004/fixed_size/2025-09-14 | head

7.2 Verify Neo4j is up and DB exists (inside neo4j container)

Dokploy -> Docker Terminal -> select the neo4j container:

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "SHOW DATABASES;"

Use ${NEO4J_AUTH#*/} (it strips the leading neo4j/ prefix, leaving the password) because the neo4j container should NOT receive NEO4J_PASSWORD as an env var (strict env validation in Neo4j 2025.x).
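
The expansion can be sanity-checked in any POSIX shell (s3cret below is a placeholder value):

```shell
# NEO4J_AUTH has the form "user/password"; ${NEO4J_AUTH#*/} removes
# everything up to and including the first "/", leaving the password.
NEO4J_AUTH="neo4j/s3cret"
NEO4J_PASS="${NEO4J_AUTH#*/}"
echo "$NEO4J_PASS"
```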

7.3 Run ingest (rebuild graph from GOLD) (inside api container)

Dokploy -> Docker Terminal -> select the api container:

Bash
DATE="2025-09-14"
TAG="comm_fixed_C1_g1_2"

python -m graphbuild ingest \
  --dataset fixed_size \
  --date "$DATE" \
  --ingest-tag "$TAG" \
  --delete-tag "$TAG" \
  --batch-size 1000 \
  --data-root /data

7.4 Verify counts (inside neo4j container)

Dokploy -> Docker Terminal -> select the neo4j container:

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (n) RETURN count(n) AS nodes;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r]->() RETURN count(r) AS rels;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (n) RETURN head(labels(n)) AS label, count(*) AS c ORDER BY c DESC LIMIT 10;"

8) One-pass rebuild: ingest + communities + summaries + index

Run these inside the api container (Dokploy Docker Terminal -> api). Step 1 repeats the ingest from 7.3; skip it if you already ran it:

Bash
DATE="2025-09-14"
TAG="comm_fixed_C1_g1_2"

# 1) ingest
python -m graphbuild ingest \
  --dataset fixed_size \
  --date "$DATE" \
  --ingest-tag "$TAG" \
  --delete-tag "$TAG" \
  --batch-size 1000 \
  --data-root /data

# 2) communities + summaries + vector index
python -m communities communities --dataset fixed_size --ingest-tag "$TAG" --levels "C1:1.2"
python -m communities summaries   --dataset fixed_size --level C1 --ingest-tag "$TAG"
python -m communities ensure-index --dataset fixed_size --level C1

Verify (inside neo4j container):

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (c:Community) RETURN count(c) AS communities;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r:IN_COMMUNITY]->() RETURN count(r) AS in_community_rels;"

9) OPTIONAL: Dump/load (only if you change edition/format later)

Dump/load is NOT recommended for the current setup because the local DB uses block store format.

Only consider dump/load if:

  • You switch VPS Neo4j to Enterprise, OR
  • You migrate/rebuild the DB into aligned store format first

Also: do NOT copy Neo4j Desktop data directories from Windows to Linux.


10) Compose file requirements

In deploy/docker-compose.yml:

  • Use neo4j:2025.08.0 (Community)
  • Set default DB name:
  • NEO4J_initial_dbms_default__database=${NEO4J_DATABASE:-graph-fixed-size}
  • Set password only via:
  • NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
  • Do NOT pass NEO4J_PASSWORD into the neo4j container (strict validation in 2025.x)
  • Neo4j healthcheck should use NEO4J_AUTH, e.g.:
  • cypher-shell -u neo4j -p "$${NEO4J_AUTH#*/}" "RETURN 1"
  • Remove semantic DB env vars from API + any legacy Streamlit service
  • Set PYTHONPATH for API + Streamlit (if enabled) to:
  • /app/src:/app/src/rag
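
A healthcheck fragment matching the rules above could look like this (a sketch; interval/retry values are assumptions, and the doubled $$ keeps Compose from interpolating the shell expansion itself):

```yaml
neo4j:
  image: neo4j:2025.08.0
  environment:
    - NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
    - NEO4J_initial_dbms_default__database=${NEO4J_DATABASE:-graph-fixed-size}
  healthcheck:
    # $$ escapes the expansion so the container shell, not Compose, resolves it
    test: ["CMD-SHELL", "cypher-shell -u neo4j -p \"$${NEO4J_AUTH#*/}\" 'RETURN 1'"]
    interval: 15s
    timeout: 10s
    retries: 10
```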

10.1 Domain routing + internal service DNS (important)

If you route Streamlit (legacy fallback) via Dokploy/Traefik external network (dokploy-network), ensure Streamlit is connected to both:

  • the compose default network (for api/neo4j hostname resolution)
  • the external dokploy network (for domain routing)

Example snippet:

YAML
streamlit:
  networks:
    - default
    - dokploy

networks:
  dokploy:
    external: true
    name: dokploy-network

11) Security notes

  • Prefer tailnet-only access to vLLM.
  • Do not expose Neo4j to the public internet by default.
  • Store all secrets in Dokploy secrets/env, not in git.

Appendix - Full Reset + Reupload + Rebuild (wipe EVERYTHING)

This appendix is the combined full reset procedure. Use it when you want the VPS dataset and Neo4j graph to be totally empty before reupload/rebuild.

A) Full Reset + Reupload + Rebuild Runbook (VPS + Dokploy)

This runbook completely wipes:

  • all uploaded dataset artifacts on the VPS (/srv/mt_upload, /srv/mt_data)
  • the entire Neo4j database for this Dokploy app (by deleting the Neo4j Docker volumes)

Then it rebuilds everything (ingest + communities + summaries + vector index).

Destructive: this deletes all Neo4j graph data and all uploaded dataset files for this MT deployment.

A.1 Variables you choose each run

  • DATASET=fixed_size
  • DATE=YYYY-MM-DD (example: 2026-01-24)
  • TAG=comm_fixed_<DATE>_C1_g1_2 (example: comm_fixed_2026_01_24_C1_g1_2)
  • LEVEL=C1
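
The TAG naming convention above can be derived from DATE mechanically (a convenience sketch, not a required step):

```shell
DATE="2026-01-24"
# Replace the hyphens in DATE with underscores to build the ingest tag
TAG="comm_fixed_$(echo "$DATE" | tr '-' '_')_C1_g1_2"
echo "$TAG"
```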

A.2 Step 1 - Update Dokploy env vars (before you deploy)

In Dokploy -> mt-rag -> Environment update:

  • COMM_INGEST_TAG=<TAG>

Recommended:

  • UI_DEFAULT_DATASET=fixed_size
  • UI_DEFAULT_DATE=<DATE>
  • UI_DEFAULT_INGEST_TAG=<TAG>

Stable / keep as-is:

  • CT_DATA_HOST=/srv/mt_data
  • CT_DATA_ROOT=/data
  • NEO4J_DATABASE=graph-fixed-size

A.3 Step 2 - Stop the app in Dokploy

Dokploy -> mt-rag -> General -> Stop

A.4 Step 3 - FULL WIPE of uploaded dataset files on the VPS (HOST via SSH)

SSH into the VPS:

PowerShell
ssh -i "$env:USERPROFILE\.ssh\vps_hostinger" root@VPS_PUBLIC_IP

On the VPS:

Bash
mkdir -p /srv/mt_upload /srv/mt_data
rm -rf /srv/mt_upload/*
rm -rf /srv/mt_data/*

A.5 Step 4 - FULL WIPE of Neo4j graph database (HOST via SSH)

A.5.1 Identify the compose project name

Bash
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}" | egrep "mt-rag|neo4j|api|streamlit"

Example compose project prefix:

Bash
PROJ="mt-rag-iixs1o"

A.5.2 Bring the stack down (no volume deletion yet)

Bash
cd "/etc/dokploy/compose/${PROJ}/code"
docker compose -p "$PROJ" -f deploy/docker-compose.yml down --remove-orphans

A.5.3 Delete Neo4j volumes

Bash
docker volume ls --format '{{.Name}}' | grep "$PROJ" | grep neo4j
docker volume rm "${PROJ}_neo4j_data" "${PROJ}_neo4j_logs"

A.6 Step 5 - Reupload fresh data (Windows -> VPS)

A.6.1 Update scripts/sync_mt.ps1

Edit:

  • $Date = "<DATE>"

A.6.2 Run upload/extract script (Windows)

PowerShell
powershell -ExecutionPolicy Bypass -File .\scripts\sync_mt.ps1

A.6.3 Verify files exist on VPS (optional)

PowerShell
$k = "$env:USERPROFILE\.ssh\vps_hostinger"
ssh -i $k root@VPS_PUBLIC_IP "ls -lh /srv/mt_data/gold_subsample_chunk/fixed_size/<DATE>/chunks_enriched.parquet"
ssh -i $k root@VPS_PUBLIC_IP "ls -lh /srv/mt_data/vectordb_bm25/fixed_size/<DATE>/bm25__text.pkl"
ssh -i $k root@VPS_PUBLIC_IP "ls -lah /srv/mt_data/vectordb_dense/gemini_text-embedding-004/fixed_size/<DATE> | head"

A.7 Step 6 - Deploy in Dokploy

Dokploy -> mt-rag -> General -> Deploy

A.8 Step 7 - Rebuild the graph (inside API container)

Bash
DATE="<DATE>"
TAG="<TAG>"

python -m graphbuild ingest \
  --dataset fixed_size \
  --date "$DATE" \
  --ingest-tag "$TAG" \
  --delete-tag "$TAG" \
  --batch-size 1000 \
  --data-root /data

A.9 Step 8 - Re-run processing (communities + summaries + index)

Bash
TAG="<TAG>"

python -m communities communities   --dataset fixed_size --ingest-tag "$TAG" --levels "C1:1.2"
python -m communities summaries   --dataset fixed_size --level C1 --ingest-tag "$TAG"
python -m communities ensure-index   --dataset fixed_size --level C1

A.10 Step 9 - Verify counts (inside Neo4j container)

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (n) RETURN count(n) AS nodes;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r]->() RETURN count(r) AS rels;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (c:Community) RETURN count(c) AS communities;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r:IN_COMMUNITY]->() RETURN count(r) AS in_community_rels;"

A.11 Step 10 - Quick functional test

  • Open frontend domain (for example https://crains.souveraen.cloud)
  • Run a query

Optional API check (inside api container):

Bash
python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:8000/healthz').read().decode())"

A.12 Notes

  • Step A.4 deletes all uploaded files in /srv/mt_upload and /srv/mt_data.
  • Step A.5 deletes the Neo4j volumes, so the graph DB is empty.

If you only need to replace one date folder or only rebuild by tag, skip A.4/A.5 and just:

  • upload new data (overwriting only the relevant folders)
  • re-run ingest with --delete-tag
  • re-run communities pipeline