Rerankers

Cross-encoders clean noisy candidates, realign BM25/dense/graph evidence, and emit consistent scores for evaluation and generative post-processing. This page documents the shared reranker layer used by Hybrid/GraphRAG pipelines, the standalone CLI, and the benchmark overlay runner.

Reranker and benchmark flow

What the reranker layer guarantees

  • Schema sanity – All inputs are normalized into a Candidate with id, doc (text), metadata, and the original score if present.
  • Deterministic scoring – A single attached key (rerank_score by default, configurable via --attach-key) is produced for every candidate.
  • Blend-aware ordering – Downstream sort may use final_score = alpha * rerank_score + (1 - alpha) * fused_score or pure rerank/fused order.
  • Result diversity – max_per_doc limits repeat chunks per document after sorting to keep breadth without drowning rare sources.
  • Observability – Structured JSONL logs separate rerank-only latency (RERANK_LATENCY_LOG) from hybrid stage timings (HYB_LATENCY_LOG).
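
The blend-and-cap ordering above can be sketched in a few lines. This is a minimal illustration, not the library's actual code; the helper name `order_candidates` is hypothetical, and the score keys mirror the defaults described on this page (`rerank_score`, `score` as the fused signal):

```python
def order_candidates(cands, alpha=0.7, max_per_doc=2, top_k=None):
    """Blend rerank/fused scores, sort, then cap repeat chunks per document."""
    for c in cands:
        fused = c.get("score", 0.0)          # original fused score, if present
        c["final_score"] = alpha * c["rerank_score"] + (1 - alpha) * fused
    ranked = sorted(cands, key=lambda c: c["final_score"], reverse=True)
    if max_per_doc:
        seen, capped = {}, []
        for c in ranked:
            doc_id = c.get("metadata", {}).get("doc_id", c["id"])
            if seen.get(doc_id, 0) < max_per_doc:
                capped.append(c)
                seen[doc_id] = seen.get(doc_id, 0) + 1
        ranked = capped
    return ranked[:top_k] if top_k else ranked
```

With `alpha=1.0` this degenerates to pure rerank order; with `alpha=0.0` it preserves the fused order, matching the `sort_by` toggles described later.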

Inputs and outputs

Inputs

  • Candidate lists containing id, summary (preferred) or doc/text/raw_text, optional score, and free-form metadata.
  • Provider credentials when required: CO_API_KEY/COHERE_API_KEY/COHERE_KEY for Cohere; none needed for HuggingFace or FlashRank.
  • Optional observability hooks: RERANK_LATENCY_LOG (rerank-stage JSONL), BENCH_QID (join key), and HYB_LATENCY_LOG (seed/graph/rerank/fuse timing when using hybrid retrievers).

Outputs

  • Sorted candidates with attached rerank signal, preserved metadata, and optional blended final_score trimmed to top_k.
  • Reranker latency JSONL (one row per query with stage=rerank) when RERANK_LATENCY_LOG is set; hybrid pipelines also populate HYB_LATENCY_LOG.

Provider options

  • Cohere API – spec="cohere[:model]" (default via COHERE_RERANK_MODEL, rerank-v3.5); hosted inference, stable floats.
  • HuggingFace CrossEncoder – spec="hf:<repo_or_model>" (default via HF_CE_MODEL, BAAI/bge-reranker-base); runs locally via sentence-transformers.
  • FlashRank – spec="flashrank[:model]" (default via FLASHRANK_MODEL, ms-marco-MiniLM-L-12-v2); lightweight, CPU/GPU-friendly reranker.
  • Auto-selection – if --spec is omitted, build_reranker prefers Cohere when an API key exists; otherwise it falls back to HuggingFace.
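
The auto-selection order can be sketched as below. The function name `resolve_spec` is illustrative; the real `build_reranker` may differ in detail, but the precedence (explicit spec, then Cohere if a key is set, then HuggingFace) follows the bullets above:

```python
import os

def resolve_spec(spec=None, env=None):
    """Pick a provider spec: explicit > Cohere (if a key is set) > HuggingFace."""
    env = os.environ if env is None else env
    if spec:
        return spec
    cohere_keys = ("CO_API_KEY", "COHERE_API_KEY", "COHERE_KEY")
    if any(env.get(k) for k in cohere_keys):
        return "cohere:" + env.get("COHERE_RERANK_MODEL", "rerank-v3.5")
    return "hf:" + env.get("HF_CE_MODEL", "BAAI/bge-reranker-base")
```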

Candidate normalization and text choice

  • Rerankers materialize Candidate objects with normalized doc, preserved metadata, and the original score when present.
  • Text preference: summary first, then doc/text/raw_text, unless --prefer text (CLI) or prefer="text" (Python) forces full-body reranking.
  • Returned dicts always include id, score, metadata, normalized doc, and the attached rerank key so evaluation remains deterministic.
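
The text-selection fallback can be sketched as follows. Field names mirror the inputs listed above, but the helper itself (`candidate_text`) is an assumption, not the library's API:

```python
def candidate_text(raw, prefer="summary"):
    """Return the text to rerank: summary first, then doc/text/raw_text;
    prefer='text' flips the order to force full-body reranking."""
    order = ["summary", "doc", "text", "raw_text"]
    if prefer == "text":
        order = ["doc", "text", "raw_text", "summary"]
    for field in order:
        value = raw.get(field)
        if value:
            return value
    return ""
```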

CLI: direct scoring

Use the CLI to rescore an arbitrary JSON payload without running the full retriever stack.

Bash
python -m rag.rerank score \
  --query "solid-state battery thermal runaway" \
  --spec hf:BAAI/bge-reranker-base \
  --in candidates.json \
  --out reranked.json \
  --prefer summary \
  --top-k 20 \
  --attach-key rerank_score
  • --spec accepts cohere[:model], hf:<repo>, or flashrank[:model].
  • --prefer toggles whether text is drawn from summary (default) or text.
  • --top-k trims the sorted output; omit to keep all items.
  • Post-rerank diversity (hybrid CLI): add --diversify mmr --mmr-lambda 0.5 to enforce semantic spread after rerank/blend and doc caps; use --diversify none (default) to disable. Uses a default SentenceTransformer encoder (sentence-transformers/all-MiniLM-L6-v2); optionally override with MMR_EMB_MODEL.
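
A minimal way to produce the candidates.json input for the command above is to dump a list of dicts matching the "Inputs" section (id, summary or doc/text, optional score, free-form metadata). The exact accepted schema beyond those fields is an assumption, and the ids/texts here are placeholders:

```python
import json

# Hypothetical candidate payload; field names follow the "Inputs" section.
candidates = [
    {"id": "pat-001#c3",
     "summary": "Thermal runaway onset in sulfide solid-state cells.",
     "score": 12.4,
     "metadata": {"doc_id": "pat-001", "source": "patent"}},
    {"id": "news-17#c1",
     "text": "Coverage of a battery fire recall.",
     "score": 9.1,
     "metadata": {"doc_id": "news-17", "source": "media"}},
]
with open("candidates.json", "w", encoding="utf-8") as f:
    json.dump(candidates, f, ensure_ascii=False, indent=2)
```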

Retrieval integration (Hybrid / GraphRAG)

Pipeline: TripleHybridRetriever collects dense + BM25 seeds (and optional graph expansion) before RerankingTripleRetriever overlays cross-encoder scores.

Key knobs

  • fetch_top_n (default 120) controls how many seeds enter reranking.
  • rerank_mode (dense|summary) selects which embeddings feed the cross-encoder.
  • alpha blends rerank and fused scores; sort_by toggles final, rerank, or fused ordering; max_per_doc caps per-document results.
  • Graph expansion: graph_expand, ingest_tag, level, hyb_expand_ratio, hyb_expand_limit mirror the hybrid CLI and app settings (production React/API and legacy Streamlit).
  • Post-rerank diversity (MMR): --diversify mmr enables a Maximal Marginal Relevance pass after rerank/blend and doc caps; tune --mmr-lambda (0 = more diversity, 1 = pure relevance). Requires a local SentenceTransformer encoder (default sentence-transformers/all-MiniLM-L6-v2); set MMR_EMB_MODEL to override.
  • Observability: hybrid stage metrics (seed, graph_expand, rerank, fuse) stream to HYB_LATENCY_LOG; rerank-only timing writes to RERANK_LATENCY_LOG when present.
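
The MMR pass can be sketched as a greedy loop over precomputed embeddings. The pipeline encodes candidates with a SentenceTransformer; this self-contained illustration takes vectors directly and is not the library's implementation. Note the lambda convention from above: 1 = pure relevance, 0 = maximum diversity.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(cands, vecs, lam=0.5, k=3):
    """Greedily pick k candidates balancing relevance (final_score)
    against similarity to already-selected items."""
    selected, rest = [], list(range(len(cands)))
    while rest and len(selected) < k:
        def gain(i):
            sim = max((cosine(vecs[i], vecs[j]) for j in selected), default=0.0)
            return lam * cands[i]["final_score"] - (1 - lam) * sim
        best = max(rest, key=gain)
        selected.append(best)
        rest.remove(best)
    return [cands[i] for i in selected]
```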

Retrieval params reference

Defaults are from the CLI (python -m rag.retrieval.triple_retriever) and the app (DEFAULT_RETRIEVAL_SETTINGS). If the app does not expose a param, the CLI default is used.

| Param | What it does | CLI default | App default |
| --- | --- | --- | --- |
| dataset | Which dataset to query | required | fixed_size |
| date | Dataset snapshot date (YYYY-MM-DD); omit for latest GOLD | None | 2025-09-14 |
| ingest_tag | GraphRAG ingest tag | None (falls back to COMM_INGEST_TAG) | comm_fixed_C1_g1_2 |
| level | Community level | C1 | C1 |
| seed_k | Seeds per channel (BM25 + dense) | 120 | 120 |
| fetch_top_n | Candidates fetched before rerank | 120 | 120 |
| expand_ratio | Graph expansion ratio | 2.0 | 2.0 |
| expand_limit | Graph expansion cap | 800 | 800 |
| rerank_mode | Candidate scoring mode (dense or summary) | dense | dense |
| rerank_spec | Cross-encoder provider/model | hf:BAAI/bge-reranker-base | not exposed (CLI default) |
| prefer | Text selection for reranker | summary | summary |
| alpha | Blend weight for rerank vs fused | 0.7 | 0.7 |
| sort_by | Ordering: final, rerank, fused | final | final |
| diversify | Post-rerank diversity mode | none | mmr |
| mmr_lambda | MMR relevance/diversity balance | 0.5 | 0.5 |
| max_per_doc | Cap chunks per doc_id | 0 | 2 |
| min_occurrence | Ensure N patent/media items | 0 (alias: --min-occurance) | 3 |
| top_k | Final results returned | 20 | 20 |
| graph_required | Error if graph not available | False | True |
| coverage | Emit coverage diagnostics | False | True |
| fusion | Emit fused payload (text or summary) | None | summary |
| include_doc | Include reranked text in output | False | not exposed (CLI default) |
| attach_metadata | Include light metadata in output | True | not exposed (CLI default) |
| bench_root | Override bench_out root | None | not exposed (CLI default) |
| dense_date | Override dense index date | None | not exposed (CLI default) |
| dense_persist | Override dense index path | None | not exposed (CLI default) |

Benchmark overlay (end-to-end reference)

python -m rag.bench.run_rerank_overlay mirrors the hybrid stack, writes rerank-stage logs, and emits run JSON compatible with rag.bench.cli evaluate.

PowerShell
# 0) Activate env & set paths
. .\.venv\Scripts\Activate.ps1
$env:PYTHONPATH = "$PWD\src"

$env:DATASET = "fixed_size"
$TOPK = 100
$DATE = $(python -c "import os; from rag.utils.paths import latest_gold_chunk_date; print(latest_gold_chunk_date(os.environ['DATASET']))")

$QA    = "bench_out\$env:DATASET\$DATE\qa.parquet"
$QRELS = "bench_out\$env:DATASET\$DATE\qrels.parquet"
$EVALS = "bench_out\$env:DATASET\$DATE\evals"

# Optional: clean previous logs for a fresh latency read
Remove-Item "$EVALS\latency_e2e.jsonl"    -ErrorAction SilentlyContinue
Remove-Item "$EVALS\latency_stages.jsonl" -ErrorAction SilentlyContinue

# 5) Rerank overlay WITHOUT graph (HF example)
python -m rag.bench.run_rerank_overlay `
  $env:DATASET `
  $QA `
  --date $DATE `
  --top-k $TOPK `
  --rerank-spec "hf:cross-encoder/ms-marco-MiniLM-L-12-v2" `
  --prefer summary `
  --alpha 0.7 `
  --sort-by final `
  --no-graph-expand `
  --out         (Join-Path $EVALS "hybrid_rerank_hf_msmarco_no_graph_run.json") `
  --overlay-log (Join-Path $EVALS "latency_rerank_overlay_hf_msmarco_no_graph.jsonl")

# Stage log path emitted by run_rerank_overlay
$RERANK_STAGE_NO_GRAPH = Join-Path $EVALS "evals_rerank\latency_rerank_no_graph.jsonl"

python -m rag.bench evaluate `
  --qrels $QRELS `
  --run   (Join-Path $EVALS "hybrid_rerank_hf_msmarco_no_graph_run.json") `
  --run-id "06_Hybrid_rerank(hf-msmarco)_no_graph" `
  --e2e-log   (Join-Path $EVALS "latency_rerank_overlay_hf_msmarco_no_graph.jsonl") `
  --stage-log (Join-Path $EVALS "latency_stages_rerank_no_graph.jsonl") `
  --rerank-log $RERANK_STAGE_NO_GRAPH
  • Swap --rerank-spec to cohere:rerank-v3.5 or flashrank:ms-marco-MiniLM-L-12-v2 to compare providers.
  • Add --graph-expand --ingest-tag <tag> --level <level> to score graph-expanded candidates; stage log path becomes latency_rerank_graph.jsonl.
  • Overlay logs produced:
      • latency_rerank_overlay_<spec>_<suffix>.jsonl — end-to-end overlay timing.
      • latency_stages_rerank_<suffix>.jsonl — hybrid seed/graph/rerank/fuse stages.
      • evals_rerank/latency_rerank_<suffix>.jsonl — reranker-only latency emitted by run_rerank_overlay.

Operational guidance

  • Prefer HuggingFace/FlashRank locally (GPU-aware) and Cohere in hosted environments when latency/throughput matter.
  • Ensure dense indexes exist when rerank_mode=dense; fall back to rerank_mode=summary if missing.
  • Monitor rerank medians via summarize_single_stage(..., stage="rerank") and trim fetch_top_n if p95 grows beyond latency targets.
  • Keep fetch_top_n close to hyb_seed_k (e.g., 120) and tune alpha downward if rerankers overpower fusion scores.
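
A rough way to watch those medians outside of summarize_single_stage is to fold the stage JSONL yourself. The row schema here (a "stage" field plus a millisecond "latency_ms" field) is an assumption; adjust the field names to whatever your logs actually emit:

```python
import json
import statistics

def stage_latency_summary(path, stage="rerank"):
    """Median and p95 latency for one stage from a JSONL log.
    Field names ("stage", "latency_ms") are assumed, not guaranteed."""
    lat = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            if row.get("stage") == stage:
                lat.append(row["latency_ms"])
    if not lat:
        return {"n": 0, "median_ms": None, "p95_ms": None}
    lat.sort()
    p95 = lat[min(len(lat) - 1, int(0.95 * len(lat)))]
    return {"n": len(lat), "median_ms": statistics.median(lat), "p95_ms": p95}
```

If the p95 creeps past your latency budget, the guidance above applies: shrink fetch_top_n before swapping providers.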