u/Ill-Structure4482

I’ve been working on an early exploration project around a common RAG retrieval problem.

This is not meant to argue that RAG is obsolete or that vector search is useless. The hypothesis is narrower:

Can hash-addressed evidence chunks act as a complementary pre-retrieval boundary layer for RAG?

The idea is to take high-value or frequently reused text, split it into chunks, and attach each chunk to metadata such as corpus, jurisdiction, unit, source_hash, and evidence_hash.

Traditional RAG often starts with:

“Which chunk is semantically similar?”

This exploration first asks:

“Which bounded evidence address is this query allowed to search?”

This does not mean removing BM25, vector search, or hybrid retrieval. The idea is to narrow the allowed search space first, then let BM25 / TF-IDF / vector / hybrid retrieval work inside that bounded subset.

I call this experiment HSRAG: Hash-Structured Retrieval-Augmented Generation. The name may sound larger than the current implementation, so the more precise description is: an early exploration project where hash-addressed chunks act as retrieval boundaries and auditable evidence units. I plan to keep testing different hypotheses, baselines, corpus settings, and failure cases.

I used legal text as the first benchmark domain because legal retrieval has many easy-to-define cross-domain failure cases, such as:

EU AI Act Article 5
U.S. FTC Act Section 5
CDA Section 230
EU DMA gatekeeper obligations

The latest benchmark is RQ6, which tests multi-turn retrieval contamination. For example, if a user first asks about EU law and then switches to a similar U.S. law, does the retriever incorrectly carry over the previous EU context?

RQ6 stress run:

20,000 Monte Carlo trials
720,000 result rows
retrieval modes: BM25, TF-IDF, Hybrid RRF, HSRAG CTHC, HSRAG Hybrid Subset
context policies: no_memory, naive_memory, bounded_cthc_memory

In this controlled benchmark, the current observation is that HSRAG modes had:

0 wrong-corpus retrieval
0 wrong-jurisdiction retrieval
0 false allow on NO_EVIDENCE / AMBIGUOUS cases
0 cross-turn contamination

Important caveats:

This depends on clean upfront corpus classification
This is not legal advice
This is not production-ready
This is not a RAG replacement claim
If metadata is wrong, the hash-addressed boundary can also be wrong
Multi-law comparison questions should still be decomposed into atomic retrieval tasks before synthesis

Repo: https://github.com/Void-Ghost000/HSRAG

I’m curious how people here would think about this:

Are hash-addressed evidence chunks useful as a complement to RAG?
Is this better described as metadata filtering, routing, or retrieval governance?
What would be a fair vector baseline for this kind of benchmark?
In production RAG, would the biggest issue be ingestion, metadata quality, scale, or query decomposition?

Exploring hash-addressed evidence chunks as a complement to RAG