
I built Augur, a TypeScript RAG SDK with per-query routing and full traces
Hybrid retrieval is well supported in most RAG libraries now, but the strategy is usually fixed per pipeline. LlamaIndex's RouterRetriever is the closest prior art for per-query routing, and it makes an LLM call to pick a retriever. Augur does it with cheap heuristics on query signals: quoted phrases, code-like tokens, named entities, question type, and language. No LLM round-trip, sub-millisecond, and the decision is recorded in the trace.
Augur routes per query: code-like tokens and quoted phrases bias toward BM25, natural-language questions toward vector search, and everything else to weighted hybrid. A cross-encoder reranks the top 30 either way. The routing decision and span timings come back in the response.
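To make that concrete, here's a minimal sketch of that kind of signal-based router covering a couple of the signals. This is my illustration, not Augur's actual code; the regexes, thresholds, and reason strings are all guesses:

```ts
// Hypothetical illustration of heuristic per-query routing; not Augur's
// actual implementation. Signal names, regexes, and thresholds are guesses.
type Strategy = "keyword" | "vector" | "hybrid";

interface Decision {
  strategy: Strategy;
  reasons: string[];
}

function route(query: string): Decision {
  const reasons: string[] = [];

  // Exact-match intent: quoted phrases retrieve best with BM25.
  if (/"[^"]+"/.test(query)) reasons.push("quoted phrase");

  // Code-like tokens: numeric codes, snake_case, camelCase, paths.
  if (/\b\d{2,}\b|\w+_\w+|[a-z]+[A-Z]\w+|\//.test(query)) {
    reasons.push("code-like token");
  }

  // Very short queries are usually lookups, not questions.
  if (query.trim().split(/\s+/).length <= 3) reasons.push("short query");

  if (reasons.length > 0) return { strategy: "keyword", reasons };

  // Natural-language questions retrieve best with dense vectors.
  if (/^(who|what|when|where|why|how)\b/i.test(query) || query.trim().endsWith("?")) {
    return { strategy: "vector", reasons: ["question form"] };
  }

  // No strong signal either way: fall back to weighted hybrid.
  return { strategy: "hybrid", reasons: ["no strong signal"] };
}

route("exit code 137");
// -> { strategy: "keyword", reasons: ["code-like token", "short query"] }
```

The point of doing this with string checks rather than an LLM is that the whole decision is deterministic and cheap enough to log on every query.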
BEIR NDCG@10 (44 MB on-device stack: MiniLM-L6 embedder + ms-marco cross-encoder reranker):
| Dataset | Auto | BM25 | BM25 + rerank | Contriever | ColBERTv2 |
|---|---|---|---|---|---|
| SciFact | .70 | .67 | .69 | .68 | .69 |
| FiQA | .35 | .24 | .35 | .33 | .36 |
| NFCorpus | .32 | .33 | .35 | .33 | .34 |
Baselines are the published numbers from the BEIR, E5, and ColBERTv2 papers. Auto runs the same router across all three corpora with no per-dataset tuning.
```ts
import { Augur, LocalEmbedder, LocalReranker } from "@augur-rag/core";

// Local models only: MiniLM-L6 embeddings + ms-marco cross-encoder reranking.
const augur = new Augur({
  embedder: new LocalEmbedder(),
  reranker: new LocalReranker(),
});

// Short, code-like query, so the router picks BM25 ("keyword").
const { results, trace } = await augur.search({ query: "exit code 137" });
// trace.decision.strategy === "keyword"
// trace.decision.reasons === ["code-like token", "short query"]
```
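For reference, the trace in that snippet might be shaped something like this. The `decision` fields match the comments above, but the span layout is my assumption, not the documented type:

```ts
// Hypothetical trace shape; trace.decision matches the comments above,
// but the span layout is an assumption, not the real @augur-rag/core type.
interface SearchTrace {
  decision: {
    strategy: "keyword" | "vector" | "hybrid";
    reasons: string[];
  };
  // One span per pipeline stage, e.g. route -> retrieve -> rerank.
  spans: { name: string; durationMs: number }[];
}
```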
Adapters: in-memory, pgvector, Pinecone, and Turbopuffer. A custom adapter is five methods (rough shape sketched below). The HTTP server with OpenAPI docs lives in a separate package if you don't want to embed the SDK.
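Since the post doesn't name the five methods, here is one plausible shape for illustration only; the method names and signatures are guesses, not the actual @augur-rag/core interface:

```ts
// Illustrative guess at a five-method adapter contract; names and
// signatures are assumptions, not the real @augur-rag/core interface.
interface Doc {
  id: string;
  text: string;
  vector?: number[];
  metadata?: Record<string, unknown>;
}

interface Adapter {
  upsert(docs: Doc[]): Promise<void>;                        // index/update documents
  vectorSearch(vector: number[], k: number): Promise<Doc[]>; // dense retrieval
  keywordSearch(query: string, k: number): Promise<Doc[]>;   // BM25-style retrieval
  remove(ids: string[]): Promise<void>;                      // delete by id
  count(): Promise<number>;                                  // corpus size
}
```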
Repo: https://github.com/willgitdata/augur · npm: @augur-rag/core
Would love any feedback!