
Deployment Runbook (MT-RAG)

Last verified: 2026-02-15 (Europe/Zurich)

This is the complete, stand-alone deployment runbook for:

  • DGX Spark running vLLM (OpenAI-compatible)
  • Tailscale networking
  • Hostinger VPS running Dokploy + Docker Compose (React frontend + FastAPI + Neo4j)
  • Neo4j Community (single DB: graph-fixed-size) with GraphRAG + communities pipeline

Note: Streamlit was used as a test/prototype UI and can still be run as an optional fallback. For frontend runtime and local workflow details, see Frontend (React).

Secrets policy:

  • Do NOT commit real secrets (API keys, tokens, passwords).
  • Keep placeholders in docs and store real values in Dokploy secrets/env.


0) Where to run commands (IMPORTANT)

Dokploy has two different terminals:

0.1 VPS host terminal (has docker, can modify /srv/...)

Use this for:

  • any command that starts with docker ...
  • deleting/creating files under /srv/mt_upload and /srv/mt_data
  • deleting Docker volumes

Open it by SSH from your laptop:

PowerShell
ssh -i "$env:USERPROFILE\.ssh\vps_hostinger" root@VPS_PUBLIC_IP

0.2 Dokploy "Docker Terminal" (inside a container, NO docker command)

Use this for:

  • running python -m graphbuild ... and python -m communities ... (inside the api container)
  • running cypher-shell ... (inside the neo4j container)

In the Docker Terminal you are already inside a container shell, so docker ... will not work.


1) Source-of-truth infrastructure values

DGX Spark (vLLM host)

  • Hostname: VLLM_HOST
  • Tailscale IP: VLLM_TAILSCALE_IP
  • Tailnet DNS (TLS): VLLM_TAILNET_DNS
  • vLLM port: 8000
  • vLLM model id: VLLM_MODEL_ID
  • vLLM container name: VLLM_CONTAINER

Hostinger VPS (Dokploy host)

  • Public IPv4: VPS_PUBLIC_IP
  • Hostname: VPS_HOSTNAME
  • Tailscale IP: VPS_TAILSCALE_IP

Windows laptop

  • Device name: ag
  • Tailscale IP: LAPTOP_TAILSCALE_IP
  • Spark SSH key: %USERPROFILE%\.ssh\tailscale_spark
  • VPS SSH key: %USERPROFILE%\.ssh\vps_hostinger

2) Persistent config - DGX Spark

2.1 Expose vLLM over tailnet (no tunnel)

Bash
sudo tailscale serve --bg --tcp 8000 tcp://localhost:8000
sudo tailscale serve status

2.2 Auto-restart vLLM container

Bash
docker update --restart unless-stopped VLLM_CONTAINER

3) Dokploy environment variables + mounts

3.1 vLLM connectivity (required)

Set these in Dokploy env/secrets for frontend and API services (and Streamlit if fallback is enabled):

  • VLLM_URL=http://VLLM_TAILSCALE_IP:8000/v1/chat/completions
  • VLLM_MODEL=VLLM_MODEL_ID
  • VLLM_API_KEY=<SECRET_TOKEN> (example: token-local-dev)
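
The API reads these variables at runtime. As an illustration only (the variable names follow the list above; the helper name and fallback values are hypothetical), a request to the OpenAI-compatible endpoint can be assembled like this:

```python
import json
import os
import urllib.request


def build_vllm_request(messages):
    """Build an OpenAI-compatible chat request from the Dokploy env vars."""
    url = os.environ.get("VLLM_URL", "http://VLLM_TAILSCALE_IP:8000/v1/chat/completions")
    headers = {
        "Authorization": f"Bearer {os.environ.get('VLLM_API_KEY', '<SECRET_TOKEN>')}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": os.environ.get("VLLM_MODEL", "VLLM_MODEL_ID"),
        "messages": messages,
    }).encode()
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Sending it only works once the tailnet route is up:
# with urllib.request.urlopen(build_vllm_request([{"role": "user", "content": "Hello"}])) as r:
#     print(json.loads(r.read())["choices"][0]["message"]["content"])
```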

3.2 Data mount (required)

The app expects a mounted data root:

  • CT_DATA_ROOT=/data

Recommended host paths on the VPS:

  • Upload staging: /srv/mt_upload
  • Data root: /srv/mt_data

Compose mount (via Dokploy env + compose var):

  • Set in Dokploy env: CT_DATA_HOST=/srv/mt_data
  • Compose volume line mounts ${CT_DATA_HOST}:/data

Tip: for extra safety, make the mount read-only: /srv/mt_data:/data:ro
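
For reference, a compose fragment matching the mount above might look like this (a sketch; the service name api is an assumption):

```yaml
services:
  api:
    volumes:
      # CT_DATA_HOST is set in Dokploy env (/srv/mt_data); /data matches CT_DATA_ROOT
      - ${CT_DATA_HOST}:/data
      # read-only variant:
      # - ${CT_DATA_HOST}:/data:ro
```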

3.3 Neo4j GraphRAG (single DB)

Required:

  • NEO4J_URI=bolt://neo4j:7687 (inside Compose network)
  • NEO4J_USER=neo4j
  • NEO4J_PASSWORD=<SECRET_PASSWORD> (used by API and optional Streamlit containers)
  • NEO4J_DATABASE=graph-fixed-size

Optional (only if you need the dataset->DB fallback):

  • NEO4J_DATABASE_FIXED_SIZE=graph-fixed-size

Do NOT use multi-db on the VPS (Community):

  • Do not set NEO4J_DATABASE_SEMANTIC
  • Do not refer to graph-semantic

3.4 Graph expansion defaults

  • COMM_INGEST_TAG=comm_fixed_C1_g1_2 (update this when you change ingest tag)
  • RAG_API_KEY=<SECRET_API_KEY>
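
As a sanity check before starting the API, a small helper (hypothetical, not part of the codebase) can flag required variables that are unset, using the names from the lists above:

```python
import os

# Required env vars per sections 3.1 and 3.3
REQUIRED = [
    "NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "NEO4J_DATABASE",
    "VLLM_URL", "VLLM_MODEL", "VLLM_API_KEY",
    "COMM_INGEST_TAG", "RAG_API_KEY",
]


def missing_env(environ=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED if not env.get(name)]


# Example with an incomplete environment:
print(missing_env({"NEO4J_URI": "bolt://neo4j:7687"}))
```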

3.5 Legacy Streamlit auto-pipeline

  • AUTO_PIPELINE_FROM_CHAT=1

3.6 Public domain routing (optional)

If you expose the React frontend under a domain (for example crains.souveraen.cloud):

  • DNS A record points to the VPS IP: VPS_PUBLIC_IP
  • If you use CAA records, include: CAA 0 issue "letsencrypt.org"
  • In Dokploy -> mt-rag -> Domains, map the host to the frontend service on container port 3000 and enable Let's Encrypt.

4) Critical Neo4j note (why dump/load is NOT the default)

Local store format check result for graph-fixed-size:

  • store = block-block-1.1

Neo4j Community cannot load a database that uses the block store format. Therefore:

  • Default deployment approach: rebuild graph on VPS from mounted GOLD using graphbuild ingest
  • Do NOT plan on neo4j-admin database dump/load from local to VPS Community (it will fail)

5) Minimal "it works" tests

5.1 Windows -> Spark vLLM over Tailscale

PowerShell
tailscale status
tailscale ping VLLM_TAILSCALE_IP
curl.exe -H "Authorization: Bearer <SECRET_TOKEN>" http://VLLM_TAILSCALE_IP:8000/v1/models

PowerShell chat completion test:

PowerShell
$uri = "http://VLLM_TAILSCALE_IP:8000/v1/chat/completions"
$headers = @{
  Authorization = "Bearer <SECRET_TOKEN>"
  "Content-Type" = "application/json"
}
$bodyJson = @{
  model    = "VLLM_MODEL_ID"
  messages = @(@{ role="user"; content="Hello" })
} | ConvertTo-Json -Depth 10

$response = Invoke-RestMethod -Method Post -Uri $uri -Headers $headers -Body $bodyJson
$response.choices[0].message.content

5.2 VPS -> Spark vLLM over Tailscale (deployment-critical)

Bash
tailscale status
curl -H "Authorization: Bearer <SECRET_TOKEN>" http://VLLM_TAILSCALE_IP:8000/v1/models

6) Dokploy deploy checklist (high level)

  • Dokploy app created
  • Repo linked
  • Compose file: deploy/docker-compose.yml
  • Host dirs created:
  • /srv/mt_upload
  • /srv/mt_data
  • Dokploy env set:
  • CT_DATA_ROOT=/data
  • CT_DATA_HOST=/srv/mt_data
  • Neo4j + vLLM vars above
  • AUTO_PIPELINE_FROM_CHAT=1
  • Deploy
  • Verify:
  • Frontend responds (React in production; Streamlit only if fallback is enabled)
  • API /healthz ok
  • API /query returns {answer, sources}
  • GraphRAG path works (graph expansion uses Neo4j)
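
The /query shape check can be scripted. A minimal validator (hypothetical helper; field names follow the checklist above):

```python
import json


def check_query_response(raw):
    """Check that a /query response body has the expected {answer, sources} shape."""
    body = json.loads(raw)
    if not isinstance(body.get("answer"), str):
        raise ValueError("missing 'answer' string")
    if not isinstance(body.get("sources"), list):
        raise ValueError("missing 'sources' list")
    return body


# Example payload with the expected shape:
check_query_response('{"answer": "text", "sources": [{"id": "chunk-1"}]}')
```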

7) START HERE after deploy: VPS/Dokploy GraphDB smoke test (Option C)

7.1 Verify /data mount (inside api container)

Dokploy -> Docker Terminal -> select the api container:

Bash
ls -lah /data | head
ls -lh /data/gold_subsample_chunk/fixed_size/2025-09-14/chunks_enriched.parquet
ls -lh /data/vectordb_bm25/fixed_size/2025-09-14/bm25__text.pkl
ls -lah /data/vectordb_dense/gemini_text-embedding-004/fixed_size/2025-09-14 | head

7.2 Verify Neo4j is up and DB exists (inside neo4j container)

Dokploy -> Docker Terminal -> select the neo4j container:

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "SHOW DATABASES;"

Use ${NEO4J_AUTH#*/} (it strips the leading neo4j/ prefix, leaving the password) because the neo4j container should NOT receive NEO4J_PASSWORD as an env var (strict env validation in Neo4j 2025.x).
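
The expansion can be sanity-checked in any POSIX shell (s3cret below is a placeholder value):

```shell
# NEO4J_AUTH has the form "user/password"; ${NEO4J_AUTH#*/} removes
# everything up to and including the first "/", leaving the password.
NEO4J_AUTH="neo4j/s3cret"
NEO4J_PASS="${NEO4J_AUTH#*/}"
echo "$NEO4J_PASS"
```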

7.3 Run ingest (rebuild graph from GOLD) (inside api container)

Dokploy -> Docker Terminal -> select the api container:

Bash
DATE="2025-09-14"
TAG="comm_fixed_C1_g1_2"

python -m graphbuild ingest \
  --dataset fixed_size \
  --date "$DATE" \
  --ingest-tag "$TAG" \
  --delete-tag "$TAG" \
  --batch-size 1000 \
  --data-root /data

7.4 Verify counts (inside neo4j container)

Dokploy -> Docker Terminal -> select the neo4j container:

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (n) RETURN count(n) AS nodes;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r]->() RETURN count(r) AS rels;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (n) RETURN head(labels(n)) AS label, count(*) AS c ORDER BY c DESC LIMIT 10;"

8) One-pass rebuild: ingest + communities + summaries + index

Run these inside the api container (Dokploy Docker Terminal -> api). Step 1 repeats the ingest from 7.3; skip it if you already ran it:

Bash
DATE="2025-09-14"
TAG="comm_fixed_C1_g1_2"

# 1) ingest
python -m graphbuild ingest \
  --dataset fixed_size \
  --date "$DATE" \
  --ingest-tag "$TAG" \
  --delete-tag "$TAG" \
  --batch-size 1000 \
  --data-root /data

# 2) communities + summaries + vector index
python -m communities communities --dataset fixed_size --ingest-tag "$TAG" --levels "C1:1.2"
python -m communities summaries   --dataset fixed_size --level C1 --ingest-tag "$TAG"
python -m communities ensure-index --dataset fixed_size --level C1

Verify (inside neo4j container):

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (c:Community) RETURN count(c) AS communities;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r:IN_COMMUNITY]->() RETURN count(r) AS in_community_rels;"

9) OPTIONAL: Dump/load (only if you change edition/format later)

Dump/load is NOT recommended for the current setup because the local DB uses block store format.

Only consider dump/load if:

  • You switch VPS Neo4j to Enterprise, OR
  • You migrate/rebuild the DB into aligned store format first

Also: do NOT copy Neo4j Desktop data directories from Windows to Linux.


10) Compose file requirements

In deploy/docker-compose.yml:

  • Use neo4j:2025.08.0 (Community)
  • Set default DB name:
  • NEO4J_initial_dbms_default__database=${NEO4J_DATABASE:-graph-fixed-size}
  • Set password only via:
  • NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
  • Do NOT pass NEO4J_PASSWORD into the neo4j container (strict validation in 2025.x)
  • Neo4j healthcheck should use NEO4J_AUTH, e.g.:
  • cypher-shell -u neo4j -p "$${NEO4J_AUTH#*/}" "RETURN 1"
  • Remove semantic DB env vars from API + any legacy Streamlit service
  • Set PYTHONPATH for API + Streamlit (if enabled) to:
  • /app/src:/app/src/rag
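
A healthcheck fragment matching the rules above could look like this (a sketch; interval/retry values are assumptions, and the doubled $$ keeps Compose from interpolating the shell expansion itself):

```yaml
neo4j:
  image: neo4j:2025.08.0
  environment:
    - NEO4J_AUTH=neo4j/${NEO4J_PASSWORD}
    - NEO4J_initial_dbms_default__database=${NEO4J_DATABASE:-graph-fixed-size}
  healthcheck:
    # $$ escapes the expansion so the container shell, not Compose, resolves it
    test: ["CMD-SHELL", "cypher-shell -u neo4j -p \"$${NEO4J_AUTH#*/}\" 'RETURN 1'"]
    interval: 15s
    timeout: 10s
    retries: 10
```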

10.1 Domain routing + internal service DNS (important)

If you route Streamlit (legacy fallback) via Dokploy/Traefik external network (dokploy-network), ensure Streamlit is connected to both:

  • the compose default network (for api/neo4j hostname resolution)
  • the external dokploy network (for domain routing)

Example snippet:

YAML
streamlit:
  networks:
    - default
    - dokploy

networks:
  dokploy:
    external: true
    name: dokploy-network

11) Security notes

  • Prefer tailnet-only access to vLLM.
  • Do not expose Neo4j to the public internet by default.
  • Store all secrets in Dokploy secrets/env, not in git.

Appendix - Full Reset + Reupload + Rebuild (wipe EVERYTHING)

This appendix is the combined full reset procedure. Use it when you want the VPS dataset and Neo4j graph to be totally empty before reupload/rebuild.

A) Full Reset + Reupload + Rebuild Runbook (VPS + Dokploy)

This runbook completely wipes:

  • all uploaded dataset artifacts on the VPS (/srv/mt_upload, /srv/mt_data)
  • the entire Neo4j database for this Dokploy app (by deleting the Neo4j Docker volumes)

Then it rebuilds everything (ingest + communities + summaries + vector index).

Destructive: this deletes all Neo4j graph data and all uploaded dataset files for this MT deployment.

A.1 Variables you choose each run

  • DATASET=fixed_size
  • DATE=YYYY-MM-DD (example: 2026-01-24)
  • TAG=comm_fixed_<DATE>_C1_g1_2 (example: comm_fixed_2026_01_24_C1_g1_2)
  • LEVEL=C1
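
The TAG naming convention above can be derived from DATE mechanically (a convenience sketch, not a required step):

```shell
DATE="2026-01-24"
# Replace the hyphens in DATE with underscores to build the ingest tag
TAG="comm_fixed_$(echo "$DATE" | tr '-' '_')_C1_g1_2"
echo "$TAG"
```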

A.2 Step 1 - Update Dokploy env vars (before you deploy)

In Dokploy -> mt-rag -> Environment update:

  • COMM_INGEST_TAG=<TAG>

Recommended:

  • UI_DEFAULT_DATASET=fixed_size
  • UI_DEFAULT_DATE=<DATE>
  • UI_DEFAULT_INGEST_TAG=<TAG>

Stable / keep as-is:

  • CT_DATA_HOST=/srv/mt_data
  • CT_DATA_ROOT=/data
  • NEO4J_DATABASE=graph-fixed-size

A.3 Step 2 - Stop the app in Dokploy

Dokploy -> mt-rag -> General -> Stop

A.4 Step 3 - FULL WIPE of uploaded dataset files on the VPS (HOST via SSH)

SSH into the VPS:

PowerShell
ssh -i "$env:USERPROFILE\.ssh\vps_hostinger" root@VPS_PUBLIC_IP

On the VPS:

Bash
mkdir -p /srv/mt_upload /srv/mt_data
rm -rf /srv/mt_upload/*
rm -rf /srv/mt_data/*

A.5 Step 4 - FULL WIPE of Neo4j graph database (HOST via SSH)

A.5.1 Identify the compose project name

Bash
docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}" | egrep "mt-rag|neo4j|api|streamlit"

Example compose project prefix:

Bash
PROJ="mt-rag-iixs1o"

A.5.2 Bring the stack down (no volume deletion yet)

Bash
cd "/etc/dokploy/compose/${PROJ}/code"
docker compose -p "$PROJ" -f deploy/docker-compose.yml down --remove-orphans

A.5.3 Delete Neo4j volumes

Bash
docker volume ls --format '{{.Name}}' | grep "$PROJ" | grep neo4j
docker volume rm "${PROJ}_neo4j_data" "${PROJ}_neo4j_logs"

A.6 Step 5 - Reupload fresh data (Windows -> VPS)

A.6.1 Update scripts/sync_mt.ps1

Edit:

  • $Date = "<DATE>"

A.6.2 Run upload/extract script (Windows)

PowerShell
powershell -ExecutionPolicy Bypass -File .\scripts\sync_mt.ps1

A.6.3 Verify files exist on VPS (optional)

PowerShell
$k = "$env:USERPROFILE\.ssh\vps_hostinger"
ssh -i $k root@VPS_PUBLIC_IP "ls -lh /srv/mt_data/gold_subsample_chunk/fixed_size/<DATE>/chunks_enriched.parquet"
ssh -i $k root@VPS_PUBLIC_IP "ls -lh /srv/mt_data/vectordb_bm25/fixed_size/<DATE>/bm25__text.pkl"
ssh -i $k root@VPS_PUBLIC_IP "ls -lah /srv/mt_data/vectordb_dense/gemini_text-embedding-004/fixed_size/<DATE> | head"

A.7 Step 6 - Deploy in Dokploy

Dokploy -> mt-rag -> General -> Deploy

A.8 Step 7 - Rebuild the graph (inside API container)

Bash
DATE="<DATE>"
TAG="<TAG>"

python -m graphbuild ingest \
  --dataset fixed_size \
  --date "$DATE" \
  --ingest-tag "$TAG" \
  --delete-tag "$TAG" \
  --batch-size 1000 \
  --data-root /data

A.9 Step 8 - Re-run processing (communities + summaries + index)

Bash
TAG="<TAG>"

python -m communities communities   --dataset fixed_size --ingest-tag "$TAG" --levels "C1:1.2"
python -m communities summaries   --dataset fixed_size --level C1 --ingest-tag "$TAG"
python -m communities ensure-index   --dataset fixed_size --level C1

A.10 Step 9 - Verify counts (inside Neo4j container)

Bash
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (n) RETURN count(n) AS nodes;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r]->() RETURN count(r) AS rels;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH (c:Community) RETURN count(c) AS communities;"
cypher-shell -u neo4j -p "${NEO4J_AUTH#*/}" "MATCH ()-[r:IN_COMMUNITY]->() RETURN count(r) AS in_community_rels;"

A.11 Step 10 - Quick functional test

  • Open frontend domain (for example https://crains.souveraen.cloud)
  • Run a query

Optional API check (inside api container):

Bash
python -c "import urllib.request; print(urllib.request.urlopen('http://localhost:8000/healthz').read().decode())"

A.12 Notes

  • Step A.4 deletes all uploaded files in /srv/mt_upload and /srv/mt_data.
  • Step A.5 deletes the Neo4j volumes, so the graph DB is empty.

If you only need to replace one date folder or only rebuild by tag, skip A.4/A.5 and just:

  • upload new data (overwriting only the relevant folders)
  • re-run ingest with --delete-tag
  • re-run communities pipeline