
I built Augur, a TypeScript RAG SDK with per-query routing and full traces
Hybrid retrieval is well supported in most RAG libraries now, but the strategy is usually fixed per pipeline. LlamaIndex's RouterRetriever is the closest prior art for per-query routing, and it makes an LLM call to pick a retriever. Augur does it with cheap heuristics on query signals: quoted phrases, code-like tokens, named entities, question type, and language. No LLM round-trip, sub-millisecond, and the decision is recorded in the trace.
Augur routes per query: code-like tokens and quoted phrases bias toward BM25, natural-language questions toward vector search, and everything else to weighted hybrid. A cross-encoder reranks the top 30 either way. The routing decision and span timings come back in the response.
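To make that concrete, here's a minimal sketch of that kind of signal-based router covering a couple of the signals. This is my illustration, not Augur's actual code; the regexes, thresholds, and reason strings are all guesses:

```ts
// Hypothetical illustration of heuristic per-query routing; not Augur's
// actual implementation. Signal names, regexes, and thresholds are guesses.
type Strategy = "keyword" | "vector" | "hybrid";

interface Decision {
  strategy: Strategy;
  reasons: string[];
}

function route(query: string): Decision {
  const reasons: string[] = [];

  // Exact-match intent: quoted phrases retrieve best with BM25.
  if (/"[^"]+"/.test(query)) reasons.push("quoted phrase");

  // Code-like tokens: numeric codes, snake_case, camelCase, paths.
  if (/\b\d{2,}\b|\w+_\w+|[a-z]+[A-Z]\w+|\//.test(query)) {
    reasons.push("code-like token");
  }

  // Very short queries are usually lookups, not questions.
  if (query.trim().split(/\s+/).length <= 3) reasons.push("short query");

  if (reasons.length > 0) return { strategy: "keyword", reasons };

  // Natural-language questions retrieve best with dense vectors.
  if (/^(who|what|when|where|why|how)\b/i.test(query) || query.trim().endsWith("?")) {
    return { strategy: "vector", reasons: ["question form"] };
  }

  // No strong signal either way: fall back to weighted hybrid.
  return { strategy: "hybrid", reasons: ["no strong signal"] };
}

route("exit code 137");
// -> { strategy: "keyword", reasons: ["code-like token", "short query"] }
```

The point of doing this with string checks rather than an LLM is that the whole decision is deterministic and cheap enough to log on every query.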
BEIR NDCG@10 (44 MB on-device stack: MiniLM-L6 embedder + ms-marco cross-encoder reranker):
| Dataset | Auto | BM25 | BM25 + rerank | Contriever | ColBERTv2 |
|---|---|---|---|---|---|
| SciFact | .70 | .67 | .69 | .68 | .69 |
| FiQA | .35 | .24 | .35 | .33 | .36 |
| NFCorpus | .32 | .33 | .35 | .33 | .34 |
Baselines are the published numbers from the BEIR, E5, and ColBERTv2 papers. Auto runs the same router across all three corpora with no per-dataset tuning.
```ts
import { Augur, LocalEmbedder, LocalReranker } from "@augur-rag/core";

// Local models only: MiniLM-L6 embeddings + ms-marco cross-encoder reranking.
const augur = new Augur({
  embedder: new LocalEmbedder(),
  reranker: new LocalReranker(),
});

// Short, code-like query, so the router picks BM25 ("keyword").
const { results, trace } = await augur.search({ query: "exit code 137" });
// trace.decision.strategy === "keyword"
// trace.decision.reasons === ["code-like token", "short query"]
```
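For reference, the trace in that snippet might be shaped something like this. The `decision` fields match the comments above, but the span layout is my assumption, not the documented type:

```ts
// Hypothetical trace shape; trace.decision matches the comments above,
// but the span layout is an assumption, not the real @augur-rag/core type.
interface SearchTrace {
  decision: {
    strategy: "keyword" | "vector" | "hybrid";
    reasons: string[];
  };
  // One span per pipeline stage, e.g. route -> retrieve -> rerank.
  spans: { name: string; durationMs: number }[];
}
```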
Adapters: in-memory, pgvector, Pinecone, and Turbopuffer. A custom adapter is five methods (rough shape sketched below). The HTTP server with OpenAPI docs lives in a separate package if you don't want to embed the SDK.
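Since the post doesn't name the five methods, here is one plausible shape for illustration only; the method names and signatures are guesses, not the actual @augur-rag/core interface:

```ts
// Illustrative guess at a five-method adapter contract; names and
// signatures are assumptions, not the real @augur-rag/core interface.
interface Doc {
  id: string;
  text: string;
  vector?: number[];
  metadata?: Record<string, unknown>;
}

interface Adapter {
  upsert(docs: Doc[]): Promise<void>;                        // index/update documents
  vectorSearch(vector: number[], k: number): Promise<Doc[]>; // dense retrieval
  keywordSearch(query: string, k: number): Promise<Doc[]>;   // BM25-style retrieval
  remove(ids: string[]): Promise<void>;                      // delete by id
  count(): Promise<number>;                                  // corpus size
}
```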
Repo: https://github.com/willgitdata/augur · npm: @augur-rag/core
Would love any feedback!