
06 — GraphRAG γ (gamma) Selection Playbook (Neo4j/GDS)

Goal: choose a resolution (γ) per level (C0/C1/C2) that balances coverage and community size for GraphRAG chunk retrieval, and prepare for a later step where we choose the best level overall using a QA evaluation pipeline.


What this notebook does

  1. Loads candidate runs (each run = one (level, γ) with a distinct --ingest-tag).
  2. Computes metrics from Neo4j for each candidate:
     • Entities per community → p50 / p90 / p95
     • Distinct chunks per community → p50 / p90 / p95
     • Number of communities
     • Global distinct chunk coverage and coverage_ratio (= covered / total chunks)
  3. Selects a recommended γ per level via an objective‑driven rule:
     • coverage — maximize coverage
     • moderate — smaller communities with decent coverage
     • balanced (default) — coverage‑heavy cost with penalties for very large communities and many communities
  4. Builds a cross‑level summary that lists the winners for C0, C1, C2 and writes a selection manifest.
  5. Hook for QA evaluation: you can later load your retrieval metrics (Recall@K, MRR, nDCG, EM/F1, latency) and pick the final champion level.

ℹ️ You do not need community summaries/embeddings to run this notebook; it works on community structure + chunk links only.

Prerequisites

  • .env contains NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD and dataset DB names (NEO4J_DATABASE_FIXED_SIZE / NEO4J_DATABASE_SEMANTIC) if you use per‑dataset DBs.
  • Your chunk/entity graph is already ingested.
  • You executed one or more community builds (one tag per γ value you want to compare).
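For reference, a minimal .env sketch (all values are placeholders — the database names, including the semantic one, are assumptions; match them to your own setup):

```shell
# Hypothetical .env sketch — replace every value with your own credentials/DB names
NEO4J_URI=neo4j://127.0.0.1:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=change-me
NEO4J_DATABASE_FIXED_SIZE=graph-fixed-size
NEO4J_DATABASE_SEMANTIC=graph-semantic
```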

CLI examples

PowerShell
# Make the package importable
$env:PYTHONPATH = "$PWD\src\rag"

# C1 grid (you already ran these)
python -m communities communities --dataset fixed_size --ingest-tag comm_fixed_C1_g0_8 --levels "C1:0.8" --min-weight 1 --min-size 8
python -m communities communities --dataset fixed_size --ingest-tag comm_fixed_C1_g1_0 --levels "C1:1.0" --min-weight 1 --min-size 8
python -m communities communities --dataset fixed_size --ingest-tag comm_fixed_C1_g1_2 --levels "C1:1.2" --min-weight 1 --min-size 8

# Optional: add C0 and C2 grids as needed
python -m communities communities --dataset fixed_size --ingest-tag comm_fixed_C0_g0_6 --levels "C0:0.6" --min-weight 1 --min-size 8
python -m communities communities --dataset fixed_size --ingest-tag comm_fixed_C2_g1_6 --levels "C2:1.6" --min-weight 1 --min-size 8
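On macOS/Linux, the equivalent setup (assuming the same repo layout) would be:

```shell
# bash/zsh equivalent of the PowerShell PYTHONPATH line above
export PYTHONPATH="$PWD/src/rag"
# The `python -m communities ...` commands themselves are identical on all platforms.
```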
# --- Configuration ------------------------------------------------------------
import os, re, json
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from neo4j import GraphDatabase

# Load .env if present
try:
    from dotenv import load_dotenv
    load_dotenv()
except Exception:
    pass

print("Libraries imported.")

# Dataset to analyze
DATASET = "fixed_size"  # "fixed_size" or "semantic"

# Levels to analyze (add/remove as needed)
LEVELS = ["C0", "C1", "C2"]

# Candidate runs per level; each has a distinct ingest_tag that encodes the gamma
# Edit these to match your actual runs (the defaults mirror what you've run so far).
CANDIDATE_RUNS = {
    "C0": [
        {"label": "g0.6", "ingest_tag": "comm_fixed_C0_g0_6"},
        # Add more: {"label": "g0.8", "ingest_tag": "comm_fixed_C0_g0_8"},
    ],
    "C1": [
        {"label": "g0.8", "ingest_tag": "comm_fixed_C1_g0_8"},
        {"label": "g1.0", "ingest_tag": "comm_fixed_C1_g1_0"},
        {"label": "g1.2", "ingest_tag": "comm_fixed_C1_g1_2"},
    ],
    "C2": [
        {"label": "g1.6", "ingest_tag": "comm_fixed_C2_g1_6"},
        # Add more: {"label": "g2.0", "ingest_tag": "comm_fixed_C2_g2_0"},
    ],
}

print("Configured dataset:", DATASET)
for lv in LEVELS:
    print(" ", lv, ":", [r["ingest_tag"] for r in CANDIDATE_RUNS.get(lv, [])])

# Selection objective: "coverage", "moderate", "balanced" (best for chunk retrieval)
SELECTION_OBJECTIVE = "balanced"
COVERAGE_TARGET = 0.95            # aim for ≥95% coverage in balanced/coverage
MIN_COVERAGE_FOR_MODERATE = 0.80  # ensure minimum recall in "moderate"

# Output directory (CSV + conclusions + manifest)
OUT_DIR = "reports/graphrag_gamma_selection"
os.makedirs(OUT_DIR, exist_ok=True)
CSV_PATH = os.path.join(OUT_DIR, "gamma_selection_summary.csv")
CONCLUSIONS_MD = os.path.join(OUT_DIR, "gamma_selection_conclusions.md")
MANIFEST_JSON = os.path.join(OUT_DIR, "gamma_selection_manifest.json")
Libraries imported.
Configured dataset: fixed_size
  C0 : ['comm_fixed_C0_g0_6']
  C1 : ['comm_fixed_C1_g0_8', 'comm_fixed_C1_g1_0', 'comm_fixed_C1_g1_2']
  C2 : ['comm_fixed_C2_g1_6']
# --- Connect to Neo4j ---------------------------------------------------------
def _env(k, default=None):
    v = os.getenv(k)
    return v if (v is not None and str(v).strip() != "") else default

def pick_database(dataset: str):
    db = _env("NEO4J_DATABASE")
    if db:
        return db
    ds = (dataset or "").strip().lower()
    if ds in {"fixed", "fixed_size", "fixedsize"}:
        return _env("NEO4J_DATABASE_FIXED_SIZE", "neo4j")
    if ds == "semantic":
        return _env("NEO4J_DATABASE_SEMANTIC", "neo4j")
    return "neo4j"

URI = _env("NEO4J_URI", "neo4j://127.0.0.1:7687")
USER = _env("NEO4J_USER", "neo4j")
PWD  = _env("NEO4J_PASSWORD")
DB   = pick_database(DATASET)

assert PWD, "NEO4J_PASSWORD is not set (set it in .env or your shell)."
driver = GraphDatabase.driver(URI, auth=(USER, PWD))
print("Connected:", URI, "| DB:", DB, "| USER:", USER)
Connected: bolt://127.0.0.1:7687 | DB: graph-fixed-size | USER: neo4j
# --- Helpers ------------------------------------------------------------------
def _types_for_dataset(dataset: str):
    ds = (dataset or "").strip().lower()
    if ds == "semantic":
        return ["semantic"]
    if ds in {"fixed", "fixed_size", "fixedsize"}:
        return ["fixed", "fixed_size"]
    return ["fixed", "fixed_size", "semantic"]

def _parse_gamma_from_strings(*candidates):
    # Accept 'g1.2', 'g1_2', 'gamma1.0', 'gamma_0_8'
    rx = re.compile(r"g(?:amma)?[_-]?(\d+(?:[._]\d+)?)", re.IGNORECASE)
    for s in candidates:
        if not s:
            continue
        m = rx.search(str(s))
        if m:
            val = m.group(1).replace("_", ".")
            try:
                return float(val)
            except Exception:
                pass
    return None

def _fetch_gamma(session, dataset, level, tag, label=None):
    # Read full properties(c) once to avoid UnknownProperty warnings
    rec = session.run(
        '''
        MATCH (c:Community)
        WHERE c.dataset=$ds AND c.level=$lv AND c.ingest_tag=$tag
        RETURN properties(c) AS props
        LIMIT 1
        ''',
        ds=dataset, lv=level, tag=tag
    ).single()
    if rec and rec.get("props"):
        props = rec["props"] or {}
        params = props.get("params")
        if isinstance(params, dict):
            for k in ("resolution", "gamma"):
                if k in params:
                    try:
                        return float(params[k])
                    except Exception:
                        pass
    return _parse_gamma_from_strings(label, tag)

def _fetch_total_chunks(session, dataset):
    types = _types_for_dataset(dataset)
    rec = session.run(
        '''
        MATCH (ch:Chunk)
        WHERE ch.chunk_type IN $types
        RETURN count(DISTINCT ch) AS n
        ''',
        types=types
    ).single()
    return int(rec["n"]) if rec and rec["n"] is not None else 0

def _collect_comm_metrics(session, dataset, level, tag):
    """
    Returns:
      sizes: list[int]  (#entities/community from comm.size)
      chunks_per_comm: list[int]  (distinct chunks reachable per community)
      n_communities: int
      unique_chunks_global: int (distinct chunks covered by this run across all comms)
    """
    types = _types_for_dataset(dataset)
    rows = session.run(
        '''
        MATCH (comm:Community)
        WHERE comm.dataset=$ds AND comm.level=$lv AND comm.ingest_tag=$tag
        OPTIONAL MATCH (comm)<-[:IN_COMMUNITY {level:$lv, ingest_tag:$tag}]-(e)
        OPTIONAL MATCH (ch:Chunk)-[:CHUNK_MENTIONS_PERSON|CHUNK_MENTIONS_ORG|CHUNK_MENTIONS_LOCATION|CHUNK_MENTIONS_TECH]->(e)
        WITH comm, coalesce(comm.size, 0) AS size, collect(DISTINCT ch) AS chs
        WITH size, [c IN chs WHERE c IS NOT NULL AND c.chunk_type IN $types] AS chs2
        RETURN size AS size, size(chs2) AS chunks
        ''',
        ds=dataset, lv=level, tag=tag, types=types
    ).data()
    sizes = [int(r["size"]) for r in rows] if rows else []
    chunks_per_comm = [int(r["chunks"]) for r in rows] if rows else []
    n_communities = len(sizes)

    rec = session.run(
        '''
        MATCH (comm:Community)
        WHERE comm.dataset=$ds AND comm.level=$lv AND comm.ingest_tag=$tag
        OPTIONAL MATCH (comm)<-[:IN_COMMUNITY {level:$lv, ingest_tag:$tag}]-(e)
        OPTIONAL MATCH (ch:Chunk)-[:CHUNK_MENTIONS_PERSON|CHUNK_MENTIONS_ORG|CHUNK_MENTIONS_LOCATION|CHUNK_MENTIONS_TECH]->(e)
        WITH collect(DISTINCT ch) AS chs
        RETURN size([c IN chs WHERE c IS NOT NULL AND c.chunk_type IN $types]) AS n
        ''',
        ds=dataset, lv=level, tag=tag, types=types
    ).single()
    unique_chunks_global = int(rec["n"]) if rec and rec["n"] is not None else 0
    return sizes, chunks_per_comm, n_communities, unique_chunks_global

def _pct(arr, p):
    if not arr:
        return float("nan")
    return float(np.percentile(np.asarray(arr, dtype=float), p))
# --- Compute metrics for all candidates ---------------------------------------
rows = []
with driver.session(database=DB) as s:
    total_chunks_dataset = _fetch_total_chunks(s, DATASET)
print("Total chunks in dataset:", total_chunks_dataset)

for lv in LEVELS:
    for spec in CANDIDATE_RUNS.get(lv, []):
        tag = spec["ingest_tag"]
        label = spec.get("label", tag)
        with driver.session(database=DB) as s:
            gamma = _fetch_gamma(s, DATASET, lv, tag, label)
            sizes, chs, n_comms, uniq_chunks = _collect_comm_metrics(s, DATASET, lv, tag)
        coverage_ratio = (uniq_chunks / total_chunks_dataset) if total_chunks_dataset > 0 else float("nan")
        rows.append({
            "dataset": DATASET,
            "level": lv,
            "ingest_tag": tag,
            "label": label,
            "gamma": gamma,
            "communities": n_comms,
            "entities_p50": _pct(sizes, 50),
            "entities_p90": _pct(sizes, 90),
            "entities_p95": _pct(sizes, 95),
            "chunks_p50": _pct(chs, 50),
            "chunks_p90": _pct(chs, 90),
            "chunks_p95": _pct(chs, 95),
            "unique_chunks_global": uniq_chunks,
            "coverage_ratio": coverage_ratio,
            "total_chunks_dataset": total_chunks_dataset,
        })

summary_df = pd.DataFrame(rows).sort_values(["level", "gamma", "label"], na_position="last").reset_index(drop=True)
print("Done. Rows:", len(summary_df))
summary_df
Total chunks in dataset: 4022
Done. Rows: 5
|   | dataset | level | ingest_tag | label | gamma | communities | entities_p50 | entities_p90 | entities_p95 | chunks_p50 | chunks_p90 | chunks_p95 | unique_chunks_global | coverage_ratio | total_chunks_dataset |
|---|---------|-------|------------|-------|-------|-------------|--------------|--------------|--------------|------------|------------|------------|----------------------|----------------|----------------------|
| 0 | fixed_size | C0 | comm_fixed_C0_g0_6 | g0.6 | 0.6 | 80 | 14.0 | 359.3 | 1225.0 | 2.0 | 90.6 | 400.65 | 3928 | 0.976629 | 4022 |
| 1 | fixed_size | C1 | comm_fixed_C1_g0_8 | g0.8 | 0.8 | 18 | 15.0 | 54.7 | 112.0 | 3.0 | 11.5 | 29.55 | 174 | 0.043262 | 4022 |
| 2 | fixed_size | C1 | comm_fixed_C1_g1_0 | g1.0 | 1.0 | 43 | 19.0 | 706.2 | 1504.1 | 3.0 | 297.8 | 872.80 | 3354 | 0.833913 | 4022 |
| 3 | fixed_size | C1 | comm_fixed_C1_g1_2 | g1.2 | 1.2 | 100 | 16.0 | 573.2 | 1317.6 | 3.0 | 312.9 | 747.90 | 3928 | 0.976629 | 4022 |
| 4 | fixed_size | C2 | comm_fixed_C2_g1_6 | g1.6 | 1.6 | 108 | 18.0 | 495.2 | 992.8 | 3.0 | 325.6 | 800.45 | 3930 | 0.977126 | 4022 |
# --- Persist to CSV -----------------------------------------------------------
summary_df.to_csv(CSV_PATH, index=False)
print("Wrote:", CSV_PATH)

summary_df.head(20)
Wrote: reports/graphrag_gamma_selection\gamma_selection_summary.csv
|   | dataset | level | ingest_tag | label | gamma | communities | entities_p50 | entities_p90 | entities_p95 | chunks_p50 | chunks_p90 | chunks_p95 | unique_chunks_global | coverage_ratio | total_chunks_dataset |
|---|---------|-------|------------|-------|-------|-------------|--------------|--------------|--------------|------------|------------|------------|----------------------|----------------|----------------------|
| 0 | fixed_size | C0 | comm_fixed_C0_g0_6 | g0.6 | 0.6 | 80 | 14.0 | 359.3 | 1225.0 | 2.0 | 90.6 | 400.65 | 3928 | 0.976629 | 4022 |
| 1 | fixed_size | C1 | comm_fixed_C1_g0_8 | g0.8 | 0.8 | 18 | 15.0 | 54.7 | 112.0 | 3.0 | 11.5 | 29.55 | 174 | 0.043262 | 4022 |
| 2 | fixed_size | C1 | comm_fixed_C1_g1_0 | g1.0 | 1.0 | 43 | 19.0 | 706.2 | 1504.1 | 3.0 | 297.8 | 872.80 | 3354 | 0.833913 | 4022 |
| 3 | fixed_size | C1 | comm_fixed_C1_g1_2 | g1.2 | 1.2 | 100 | 16.0 | 573.2 | 1317.6 | 3.0 | 312.9 | 747.90 | 3928 | 0.976629 | 4022 |
| 4 | fixed_size | C2 | comm_fixed_C2_g1_6 | g1.6 | 1.6 | 108 | 18.0 | 495.2 | 992.8 | 3.0 | 325.6 | 800.45 | 3930 | 0.977126 | 4022 |
# --- Quick charts -------------------------------------------------------------
def _bar(df, x_col, y_col, title, ylabel):
    plt.figure()
    plt.bar(df[x_col].astype(str), df[y_col])
    plt.title(title)
    plt.xlabel(x_col); plt.ylabel(ylabel)
    plt.xticks(rotation=45, ha="right")
    plt.tight_layout(); plt.show()

def _scatter(df, x_col, y_col, title, xlabel, ylabel, annotate_col=None):
    plt.figure()
    plt.scatter(df[x_col], df[y_col])
    plt.title(title); plt.xlabel(xlabel); plt.ylabel(ylabel)
    if annotate_col:
        for _, r in df.iterrows():
            plt.annotate(str(r[annotate_col]), (r[x_col], r[y_col]), textcoords="offset points", xytext=(4,4))
    plt.tight_layout(); plt.show()

for lv in LEVELS:
    dfl = summary_df[summary_df["level"] == lv].copy()
    if not len(dfl):
        continue

    # Entities per community
    _bar(dfl, "label", "entities_p50", f"{lv} – Entities per community (p50)", "entities")
    _bar(dfl, "label", "entities_p90", f"{lv} – Entities per community (p90)", "entities")
    _bar(dfl, "label", "entities_p95", f"{lv} – Entities per community (p95)", "entities")

    # Distinct chunks per community
    _bar(dfl, "label", "chunks_p50", f"{lv} – Distinct chunks per community (p50)", "chunks")
    _bar(dfl, "label", "chunks_p90", f"{lv} – Distinct chunks per community (p90)", "chunks")
    _bar(dfl, "label", "chunks_p95", f"{lv} – Distinct chunks per community (p95)", "chunks")

    # Pareto: coverage vs. #communities
    _scatter(dfl, "communities", "coverage_ratio", f"{lv} – Coverage vs #communities",
             "#communities", "coverage_ratio", annotate_col="label")

(21 chart images omitted: for each level, bar charts of entities per community and distinct chunks per community at p50/p90/p95, plus a coverage_ratio vs #communities scatter annotated by label.)
# --- Data‑driven γ selection per level ---------------------------------------
from math import sqrt

def _normalize(series):
    arr = np.asarray(series, dtype=float)
    if len(arr) == 0:
        return arr
    mn, mx = np.nanmin(arr), np.nanmax(arr)
    if not np.isfinite(mn) or not np.isfinite(mx) or mx - mn == 0:
        return np.zeros_like(arr, dtype=float)
    return (arr - mn) / (mx - mn)

def _elbow_by_line_distance(xs, ys):
    if len(xs) < 3:
        return 0
    x0, y0 = xs[0], ys[0]
    x1, y1 = xs[-1], ys[-1]
    denom = sqrt((x1 - x0)**2 + (y1 - y0)**2) + 1e-9
    best_i, best_d = 0, -1.0
    for i in range(1, len(xs)-1):
        d = abs((y1 - y0)*xs[i] - (x1 - x0)*ys[i] + x1*y0 - y1*x0) / denom
        if d > best_d: best_i, best_d = i, d
    return best_i

def select_gamma_for_level(dfl, objective="balanced", coverage_target=0.95, min_cov_for_moderate=0.80):
    dfl = dfl.copy().sort_values(["gamma", "label"], na_position="last").reset_index(drop=True)
    if not len(dfl):
        return None, "No candidates.", {}

    if objective == "coverage":
        max_cov = dfl["coverage_ratio"].max()
        best = dfl[dfl["coverage_ratio"] == max_cov].sort_values(
            ["entities_p95", "chunks_p95", "communities"], ascending=[True, True, True]
        ).head(1)
        reason = f"Coverage-first: picked highest coverage_ratio={best['coverage_ratio'].iloc[0]:.3f}."
        return best.iloc[0], reason, {"rule": "coverage-first"}

    if objective == "moderate":
        cov_ok = dfl[dfl["coverage_ratio"] >= min_cov_for_moderate]
        if not len(cov_ok): cov_ok = dfl
        best = cov_ok.sort_values(
            ["entities_p95", "chunks_p95", "communities"], ascending=[True, True, True]
        ).head(1)
        reason = (f"Moderate-size: coverage ≥ {min_cov_for_moderate:.0%}, "
                  f"then minimized p95(entities)→p95(chunks)→#communities.")
        return best.iloc[0], reason, {"rule": "moderate"}

    # balanced
    dfl["_cov_cost"]    = 1.0 - _normalize(dfl["coverage_ratio"].values)
    dfl["_p95ent_cost"] = _normalize(dfl["entities_p95"].values)
    dfl["_p95ch_cost"]  = _normalize(dfl["chunks_p95"].values)
    dfl["_ncomm_cost"]  = _normalize(dfl["communities"].values)

    w_cov, w_ent, w_ch, w_nc = 0.55, 0.20, 0.15, 0.10
    dfl["_cost_balanced"] = (w_cov * dfl["_cov_cost"] +
                              w_ent * dfl["_p95ent_cost"] +
                              w_ch  * dfl["_p95ch_cost"] +
                              w_nc  * dfl["_ncomm_cost"])

    eligible = dfl[dfl["coverage_ratio"] >= coverage_target]
    if len(eligible):
        best = eligible.sort_values(["_cost_balanced"]).head(1)
        reason = (f"Balanced: among candidates with coverage ≥ {coverage_target:.0%}, "
                  f"minimized weighted cost (coverage-heavy).")
    else:
        best = dfl.sort_values(["_cost_balanced"]).head(1)
        reason = ("Balanced: no candidate met the coverage target; picked min weighted cost overall.")

    xs = _normalize(dfl["communities"].values)
    ys = _normalize(dfl["coverage_ratio"].values)
    idx = _elbow_by_line_distance(xs, ys) if len(dfl) >= 3 else 0
    elbow = dfl.iloc[idx]

    return best.iloc[0], reason + f" (Elbow diagnostic suggests {elbow.get('label')}).", {
        "rule": "balanced",
        "elbow_label": elbow.get("label"),
        "elbow_gamma": float(elbow.get("gamma")) if pd.notnull(elbow.get("gamma")) else None,
    }

# Produce per-level recommendations
conclusions = []
winners = []
for lv in LEVELS:
    dfl = summary_df[summary_df["level"] == lv]
    if not len(dfl):
        conclusions.append(f"### {lv}\nNo candidates found.\n")
        continue
    chosen, reason, diag = select_gamma_for_level(
        dfl, objective=SELECTION_OBJECTIVE,
        coverage_target=COVERAGE_TARGET,
        min_cov_for_moderate=MIN_COVERAGE_FOR_MODERATE
    )
    if chosen is None:
        conclusions.append(f"### {lv}\nNo selection could be made.\n")
        continue
    winners.append(chosen.to_dict())
    line = (
        f"### {lv} — **Recommended γ: {chosen.get('gamma')}** "
        f"(tag: `{chosen.get('ingest_tag')}`, label: `{chosen.get('label')}`)\n"
        f"- coverage_ratio: **{chosen.get('coverage_ratio'):.3f}** "
        f"({int(chosen.get('unique_chunks_global'))} / {int(chosen.get('total_chunks_dataset'))} chunks)\n"
        f"- communities: {int(chosen.get('communities'))}\n"
        f"- entities p50/p95: {chosen.get('entities_p50'):.1f} / {chosen.get('entities_p95'):.1f}\n"
        f"- chunks per community p50/p95: {chosen.get('chunks_p50'):.1f} / {chosen.get('chunks_p95'):.1f}\n"
        f"- Selection rule: {reason}\n"
    )
    if diag.get("elbow_label"):
        line += f"- Elbow diagnostic suggested **{diag['elbow_label']}**.\n"
    conclusions.append(line)

# Persist conclusions and a manifest of winners per level
conclusion_md = "# Data‑driven selection (per level)\n\n" + "\n\n".join(conclusions)
with open(CONCLUSIONS_MD, "w", encoding="utf-8") as f:
    f.write(conclusion_md)

manifest = {
    "dataset": DATASET,
    "objective": SELECTION_OBJECTIVE,
    "coverage_target": COVERAGE_TARGET,
    "winners_per_level": winners,
}
with open(MANIFEST_JSON, "w", encoding="utf-8") as f:
    json.dump(manifest, f, indent=2)

print(conclusion_md)
print("---")
print("Wrote:", CONCLUSIONS_MD)
print("Wrote:", MANIFEST_JSON)
# Data‑driven selection (per level)

### C0 — **Recommended γ: 0.6** (tag: `comm_fixed_C0_g0_6`, label: `g0.6`)
- coverage_ratio: **0.977** (3928 / 4022 chunks)
- communities: 80
- entities p50/p95: 14.0 / 1225.0
- chunks per community p50/p95: 2.0 / 400.6
- Selection rule: Balanced: among candidates with coverage ≥ 95%, minimized weighted cost (coverage-heavy). (Elbow diagnostic suggests g0.6).
- Elbow diagnostic suggested **g0.6**.


### C1 — **Recommended γ: 1.2** (tag: `comm_fixed_C1_g1_2`, label: `g1.2`)
- coverage_ratio: **0.977** (3928 / 4022 chunks)
- communities: 100
- entities p50/p95: 16.0 / 1317.6
- chunks per community p50/p95: 3.0 / 747.9
- Selection rule: Balanced: among candidates with coverage ≥ 95%, minimized weighted cost (coverage-heavy). (Elbow diagnostic suggests g1.0).
- Elbow diagnostic suggested **g1.0**.


### C2 — **Recommended γ: 1.6** (tag: `comm_fixed_C2_g1_6`, label: `g1.6`)
- coverage_ratio: **0.977** (3930 / 4022 chunks)
- communities: 108
- entities p50/p95: 18.0 / 992.8
- chunks per community p50/p95: 3.0 / 800.4
- Selection rule: Balanced: among candidates with coverage ≥ 95%, minimized weighted cost (coverage-heavy). (Elbow diagnostic suggests g1.6).
- Elbow diagnostic suggested **g1.6**.

---
Wrote: reports/graphrag_gamma_selection\gamma_selection_conclusions.md
Wrote: reports/graphrag_gamma_selection\gamma_selection_manifest.json

Using the selected configuration in retrieval

Once you've picked a winner per level (and later, the final champion level from QA):

  • Use the selected (level, ingest_tag) in your retriever.
  • Example CLI to pull chunks for a query:

PowerShell
# Example: retrieve with the recommended C1 config (adjust tag/level as needed)
python -m communities retrieve "solid-state battery" --dataset fixed_size --level C1 --ingest-tag comm_fixed_C1_g1_2 --k-comms 24 --top-k 100 --rerank --json

Tip: If you want to blend multiple levels, run communities retrieve for each (level, tag) and RRF-merge the chunk IDs.
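A minimal sketch of that RRF merge (the helper name and chunk IDs are hypothetical; it only assumes each retrieval call returns a ranked list of chunk IDs, best first):

```python
def rrf_merge(rankings, k=60):
    """Reciprocal Rank Fusion over several ranked lists of chunk IDs.

    Each list contributes 1 / (k + rank) to a chunk's score; higher is better.
    k dampens the influence of top ranks (60 is the commonly used default).
    """
    scores = {}
    for ranked_ids in rankings:
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Sort chunk IDs by fused score, descending
    return sorted(scores, key=scores.get, reverse=True)

# Example: fuse hypothetical C1 and C2 retrieval results
c1_hits = ["ch_12", "ch_07", "ch_33"]
c2_hits = ["ch_07", "ch_98", "ch_12"]
merged = rrf_merge([c1_hits, c2_hits])  # ch_07 wins: it ranks high in both lists
```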

# --- Close the driver ---------------------------------------------------------
try:
    driver.close()
    print("Closed Neo4j driver.")
except Exception:
    pass
Closed Neo4j driver.