Rerankers
Cross-encoders clean noisy candidates, realign BM25/dense/graph evidence, and emit consistent scores for evaluation and generative post-processing. This page documents the shared reranker layer used by Hybrid/GraphRAG pipelines, the standalone CLI, and the benchmark overlay runner.
What the reranker layer guarantees
- Schema sanity – All inputs are normalized into a `Candidate` with `id`, `doc` (text), `metadata`, and the original `score` if present.
- Deterministic scoring – A single attached key (`rerank_score` by default, configurable via `--attach-key`) is produced for every candidate.
- Blend-aware ordering – Downstream sort may use `final_score = alpha * rerank_score + (1 - alpha) * fused_score` or pure rerank/fused order.
- Result diversity – `max_per_doc` limits repeat chunks per document after sorting to keep breadth without drowning rare sources.
- Observability – Structured JSONL logs separate rerank-only latency (`RERANK_LATENCY_LOG`) from hybrid stage timings (`HYB_LATENCY_LOG`).
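The blend and per-document cap described above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the function name `blend_and_cap`, the top-level `doc_id` key, and the reading of `max_per_doc=0` as "no cap" are assumptions made for the example.

```python
from collections import defaultdict


def blend_and_cap(candidates, alpha=0.7, max_per_doc=0, top_k=None):
    """Blend rerank and fused scores, sort, then cap chunks per document.

    Hypothetical helper: assumes each candidate dict carries `rerank_score`,
    `fused_score`, and a top-level `doc_id`.
    """
    scored = []
    for cand in candidates:
        item = dict(cand)  # copy so the caller's dicts stay untouched
        item["final_score"] = (
            alpha * item["rerank_score"] + (1 - alpha) * item["fused_score"]
        )
        scored.append(item)
    scored.sort(key=lambda c: c["final_score"], reverse=True)

    kept, seen = [], defaultdict(int)
    for item in scored:
        doc_id = item["doc_id"]
        # Assumption: max_per_doc == 0 disables the cap.
        if max_per_doc and seen[doc_id] >= max_per_doc:
            continue
        seen[doc_id] += 1
        kept.append(item)
    return kept if top_k is None else kept[:top_k]


candidates = [
    {"id": "a1", "doc_id": "A", "rerank_score": 0.9, "fused_score": 0.2},
    {"id": "a2", "doc_id": "A", "rerank_score": 0.8, "fused_score": 0.4},
    {"id": "a3", "doc_id": "A", "rerank_score": 0.7, "fused_score": 0.9},
    {"id": "b1", "doc_id": "B", "rerank_score": 0.3, "fused_score": 0.5},
]
ranked = blend_and_cap(candidates, alpha=0.7, max_per_doc=2)
print([c["id"] for c in ranked])  # ['a3', 'a1', 'b1']
```

Because the cap is applied after sorting, the third chunk of document A is dropped even though it outscores the only chunk of document B.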
Inputs and outputs
Inputs
- Candidate lists containing `id`, `summary` (preferred) or `doc`/`text`/`raw_text`, optional `score`, and free-form metadata.
- Provider credentials when required: `CO_API_KEY`/`COHERE_API_KEY`/`COHERE_KEY` for Cohere; none needed for HuggingFace or FlashRank.
- Optional observability hooks: `RERANK_LATENCY_LOG` (rerank-stage JSONL), `BENCH_QID` (join key), and `HYB_LATENCY_LOG` (seed/graph/rerank/fuse timing when using hybrid retrievers).
Outputs
- Sorted candidates with attached rerank signal, preserved metadata, and optional blended `final_score` trimmed to `top_k`.
- Reranker latency JSONL (one row per query with `stage=rerank`) when `RERANK_LATENCY_LOG` is set; hybrid pipelines also populate `HYB_LATENCY_LOG`.
Provider options
- Cohere API: `spec="cohere[:model]"` (default `COHERE_RERANK_MODEL` → `rerank-v3.5`); hosted inference, stable floats.
- HuggingFace CrossEncoder: `spec="hf:<repo_or_model>"` (default `HF_CE_MODEL` → `BAAI/bge-reranker-base`); local via `sentence-transformers`.
- FlashRank: `spec="flashrank[:model]"` (default `FLASHRANK_MODEL` → `ms-marco-MiniLM-L-12-v2`); lightweight CPU/GPU-friendly reranker.
- Auto-selection: if `--spec` is omitted, `build_reranker` prefers Cohere when a key exists; otherwise it falls back to HuggingFace.
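The spec grammar and auto-selection rule above can be illustrated with a small resolver. This is a sketch, not the body of `build_reranker`: the function name `resolve_spec` is hypothetical, and it reads each "default `VAR` → model" pair as an environment variable that falls back to the listed model, which is an interpretation of the list above.

```python
import os

COHERE_KEY_VARS = ("CO_API_KEY", "COHERE_API_KEY", "COHERE_KEY")


def resolve_spec(spec=None, env=None):
    """Return (provider, model) for a spec string; auto-select when spec is None.

    Hypothetical helper for illustration only.
    """
    env = os.environ if env is None else env
    if spec is None:
        # Prefer Cohere when any recognised key is set, otherwise HuggingFace.
        has_key = any(env.get(name) for name in COHERE_KEY_VARS)
        spec = "cohere" if has_key else "hf"

    provider, _, model = spec.partition(":")
    # Assumption: each env var overrides the documented default model.
    defaults = {
        "cohere": env.get("COHERE_RERANK_MODEL", "rerank-v3.5"),
        "hf": env.get("HF_CE_MODEL", "BAAI/bge-reranker-base"),
        "flashrank": env.get("FLASHRANK_MODEL", "ms-marco-MiniLM-L-12-v2"),
    }
    if provider not in defaults:
        raise ValueError(f"unknown reranker provider: {provider!r}")
    return provider, model or defaults[provider]


print(resolve_spec("flashrank", env={}))
print(resolve_spec(None, env={"COHERE_API_KEY": "example-key"}))
print(resolve_spec(None, env={}))
```

Passing `env` explicitly keeps the sketch testable without touching real credentials.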
Candidate normalization and text choice
- Rerankers materialize `Candidate` objects with normalized `doc`, preserved `metadata`, and the original `score` when present.
- Text preference: `summary` → `doc`/`text`/`raw_text`, unless `--prefer text` (CLI) or `prefer="text"` (Python) forces full-body reranking.
- Returned dicts always include `id`, `score`, `metadata`, normalized `doc`, and the attached rerank key so evaluation remains deterministic.
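The text-preference rule can be sketched as a small normalizer over plain dicts. The helper names `pick_text` and `normalize_candidate` are hypothetical, and the choice to fall back to `summary` when `prefer="text"` finds no body text is an assumption rather than documented behavior.

```python
def pick_text(raw, prefer="summary"):
    """Choose the text a reranker would score, following the preference order."""
    if prefer == "text":
        # Assumption: body fields first, summary only as a last resort.
        order = ("doc", "text", "raw_text", "summary")
    else:
        order = ("summary", "doc", "text", "raw_text")
    for key in order:
        value = raw.get(key)
        if isinstance(value, str) and value.strip():
            return value.strip()
    return ""


def normalize_candidate(raw, prefer="summary"):
    """Return a dict with the fields every candidate is expected to carry."""
    return {
        "id": raw["id"],
        "doc": pick_text(raw, prefer),
        "score": raw.get("score"),  # original score, or None when absent
        "metadata": dict(raw.get("metadata") or {}),
    }


raw = {"id": "c1", "summary": "short summary", "text": "full body text", "score": 0.4}
print(normalize_candidate(raw)["doc"])                 # short summary
print(normalize_candidate(raw, prefer="text")["doc"])  # full body text
```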
CLI: direct scoring
Use the CLI to rescore an arbitrary JSON payload without running the full retriever stack.
- `--spec` accepts `cohere[:model]`, `hf:<repo>`, or `flashrank[:model]`.
- `--prefer` toggles whether text is drawn from `summary` (default) or `text`.
- `--top-k` trims the sorted output; omit to keep all items.
- Post-rerank diversity (hybrid CLI): add `--diversify mmr --mmr-lambda 0.5` to enforce semantic spread after rerank/blend and doc caps; use `--diversify none` (default) to disable. Uses a default SentenceTransformer encoder (`sentence-transformers/all-MiniLM-L6-v2`); optionally override with `MMR_EMB_MODEL`.
Retrieval integration (Hybrid / GraphRAG)
Pipeline: `TripleHybridRetriever` collects dense + BM25 seeds (and optional graph expansion) before `RerankingTripleRetriever` overlays cross-encoder scores.
Key knobs
- `fetch_top_n` (default 120) controls how many seeds enter reranking.
- `rerank_mode` (`dense`|`summary`) selects which embeddings feed the cross-encoder.
- `alpha` blends rerank and fused scores; `sort_by` toggles `final`, `rerank`, or `fused` ordering; `max_per_doc` caps per-document results.
- Graph expansion: `graph_expand`, `ingest_tag`, `level`, `hyb_expand_ratio`, `hyb_expand_limit` mirror the hybrid CLI and app settings (production React/API and legacy Streamlit).
- Post-rerank diversity (MMR): `--diversify mmr` enables a Maximal Marginal Relevance pass after rerank/blend and doc caps; tune `--mmr-lambda` (0 = more diversity, 1 = pure relevance). Requires a local SentenceTransformer encoder (default `sentence-transformers/all-MiniLM-L6-v2`); set `MMR_EMB_MODEL` to override.
- Observability: hybrid stage metrics (seed, graph_expand, rerank, fuse) stream to `HYB_LATENCY_LOG`; rerank-only timing writes to `RERANK_LATENCY_LOG` when present.
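To make the effect of `--mmr-lambda` concrete, here is a generic greedy MMR pass over precomputed embeddings. It is a textbook sketch under stated assumptions, not the pipeline's implementation: the function names and the `(id, relevance, embedding)` tuple shape are hypothetical, and toy 2-D vectors stand in for SentenceTransformer embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(items, top_k, mmr_lambda=0.5):
    """Greedy MMR over (id, relevance, embedding) tuples; returns selected ids."""
    remaining = list(items)
    selected = []
    while remaining and len(selected) < top_k:

        def mmr_score(item):
            _, relevance, emb = item
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max((cosine(emb, s[2]) for s in selected), default=0.0)
            return mmr_lambda * relevance - (1 - mmr_lambda) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return [item[0] for item in selected]


items = [
    ("a", 0.90, [1.0, 0.0]),
    ("b", 0.85, [1.0, 0.0]),  # same direction as "a": a near-duplicate
    ("c", 0.60, [0.0, 1.0]),  # orthogonal to "a": a different topic
]
print(mmr_select(items, top_k=2, mmr_lambda=0.5))  # ['a', 'c']
print(mmr_select(items, top_k=2, mmr_lambda=1.0))  # ['a', 'b']
```

At `mmr_lambda=1.0` the redundancy term vanishes and the order is pure relevance; at `0.5` the near-duplicate is penalized enough for the less relevant but distinct item to win the second slot.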
Retrieval params reference
Defaults are from the CLI (`python -m rag.retrieval.triple_retriever`) and the app (`DEFAULT_RETRIEVAL_SETTINGS`). If the app does not expose a param, the CLI default is used.
| Param | What it does | CLI default | App default |
|---|---|---|---|
| dataset | Which dataset to query | required | fixed_size |
| date | Dataset snapshot date (YYYY-MM-DD); omit for latest GOLD | None | 2025-09-14 |
| ingest_tag | GraphRAG ingest tag | None (falls back to COMM_INGEST_TAG) | comm_fixed_C1_g1_2 |
| level | Community level | C1 | C1 |
| seed_k | Seeds per channel (BM25 + dense) | 120 | 120 |
| fetch_top_n | Candidates fetched before rerank | 120 | 120 |
| expand_ratio | Graph expansion ratio | 2.0 | 2.0 |
| expand_limit | Graph expansion cap | 800 | 800 |
| rerank_mode | Candidate scoring mode (dense or summary) | dense | dense |
| rerank_spec | Cross-encoder provider/model | hf:BAAI/bge-reranker-base | not exposed (CLI default) |
| prefer | Text selection for reranker | summary | summary |
| alpha | Blend weight for rerank vs fused | 0.7 | 0.7 |
| sort_by | Ordering: final, rerank, fused | final | final |
| diversify | Post-rerank diversity mode | none | mmr |
| mmr_lambda | MMR relevance/diversity balance | 0.5 | 0.5 |
| max_per_doc | Cap chunks per doc_id | 0 | 2 |
| min_occurrence | Ensure N patent/media items | 0 (alias: --min-occurance) | 3 |
| top_k | Final results returned | 20 | 20 |
| graph_required | Error if graph not available | False | True |
| coverage | Emit coverage diagnostics | False | True |
| fusion | Emit fused payload (text or summary) | None | summary |
| include_doc | Include reranked text in output | False | not exposed (CLI default) |
| attach_metadata | Include light metadata in output | True | not exposed (CLI default) |
| bench_root | Override bench_out root | None | not exposed (CLI default) |
| dense_date | Override dense index date | None | not exposed (CLI default) |
| dense_persist | Override dense index path | None | not exposed (CLI default) |
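The fallback rule (app default where exposed, CLI default otherwise) amounts to a layered dictionary merge. The sketch below copies a subset of values from the table; the names `CLI_DEFAULTS`, `APP_OVERRIDES`, and `app_settings` are hypothetical and do not reflect how `DEFAULT_RETRIEVAL_SETTINGS` is actually structured.

```python
# Subset of the CLI defaults from the table above.
CLI_DEFAULTS = {
    "rerank_spec": "hf:BAAI/bge-reranker-base",
    "alpha": 0.7,
    "sort_by": "final",
    "diversify": "none",
    "mmr_lambda": 0.5,
    "max_per_doc": 0,
    "min_occurrence": 0,
    "top_k": 20,
    "graph_required": False,
    "coverage": False,
    "fusion": None,
    "include_doc": False,
    "attach_metadata": True,
}

# Only the params where the app default differs from the CLI default.
APP_OVERRIDES = {
    "diversify": "mmr",
    "max_per_doc": 2,
    "min_occurrence": 3,
    "graph_required": True,
    "coverage": True,
    "fusion": "summary",
}


def app_settings(user_overrides=None):
    """Layer CLI defaults, then app defaults, then per-request overrides."""
    merged = {**CLI_DEFAULTS, **APP_OVERRIDES}
    merged.update(user_overrides or {})
    return merged


settings = app_settings({"alpha": 0.5})
print(settings["rerank_spec"], settings["max_per_doc"], settings["alpha"])
```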
Benchmark overlay (end-to-end reference)
`python -m rag.bench.run_rerank_overlay` mirrors the hybrid stack, writes rerank-stage logs, and emits run JSON compatible with `rag.bench.cli evaluate`.
- Swap `--rerank-spec` to `cohere:rerank-v3.5` or `flashrank:ms-marco-MiniLM-L-12-v2` to compare providers.
- Add `--graph-expand --ingest-tag <tag> --level <level>` to score graph-expanded candidates; stage log path becomes `latency_rerank_graph.jsonl`.
- Overlay logs produced:
  - `latency_rerank_overlay_<spec>_<suffix>.jsonl`: end-to-end overlay timing.
  - `latency_stages_rerank_<suffix>.jsonl`: hybrid seeds/graph/rerank/fuse stages.
  - `evals_rerank/latency_rerank_<suffix>.jsonl`: reranker-only latency emitted by `run_rerank_overlay`.
Operational guidance
- Prefer HuggingFace/FlashRank locally (GPU-aware) and Cohere in hosted environments when latency/throughput matter.
- Ensure dense indexes exist when `rerank_mode=dense`; fall back to `rerank_mode=summary` if missing.
- Monitor rerank medians via `summarize_single_stage(..., stage="rerank")` and trim `fetch_top_n` if p95 grows beyond latency targets.
- Keep `fetch_top_n` close to `hyb_seed_k` (e.g., 120) and tune `alpha` downward if rerankers overpower fusion scores.
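For a quick check outside the bench tooling, median and p95 can be computed directly from a latency JSONL file. The sketch below is a stand-in, not `summarize_single_stage`: the function name `summarize_latency` and the `latency_ms` field are assumptions, since only the `stage` field is documented on this page.

```python
import json
import math
import statistics


def summarize_latency(lines, stage="rerank", field="latency_ms"):
    """Return count, median, and nearest-rank p95 for one stage, or None.

    `field` is a hypothetical column name; adjust it to the real log schema.
    """
    values = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        if row.get("stage") == stage and field in row:
            values.append(float(row[field]))
    if not values:
        return None
    values.sort()
    rank = max(1, math.ceil(0.95 * len(values)))  # nearest-rank percentile
    return {
        "count": len(values),
        "median": statistics.median(values),
        "p95": values[rank - 1],
    }


sample = [
    '{"stage": "rerank", "latency_ms": 100}',
    '{"stage": "seed", "latency_ms": 5}',
    '{"stage": "rerank", "latency_ms": 300}',
    '{"stage": "rerank", "latency_ms": 200}',
]
summary = summarize_latency(sample)
print(summary)  # {'count': 3, 'median': 200.0, 'p95': 300.0}
```

In practice `lines` would be an open file handle on the path held in `RERANK_LATENCY_LOG`.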