Is my approach sound? Citation verification in legal RAG
I'm a lawyer who built a legal research platform using AI coding tools over several months (not a weekend project: deliberate architecture, phase-by-phase implementation, extensive testing against my domain expertise). The system searches a database of ~4,000 legal decisions so far (268K embedded sections) and generates structured legal memos with case citations. Citation accuracy is existential here: a fabricated case reference used in proceedings is a professional liability issue.
Since this is a technical question, I let an AI draft the sections below, as I think it can describe the system more precisely than I can.
Current setup
Retrieval: Deterministic, not agentic. One LLM call generates a structured search plan (topics, legal provisions, seed cases, exact doctrinal phrases). Then 5 retrieval channels run in parallel with zero LLM involvement: hybrid text search (vector + FTS), provision lookup with synonym expansion, citation graph (1-hop from seeds), tag matching, and exact phrase FTS. Results scored by reranker score + channel overlap, then tiered into lead cases (full passages), supporting (key excerpts), and concordant (metadata only).
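To make the scoring concrete, here is a minimal sketch of the "reranker score + channel overlap" merge and tiering described above. The function names, the flat overlap bonus, and the tier cutoffs are my illustrative assumptions, not the platform's actual implementation:

```python
from collections import defaultdict

def merge_channels(channel_results: dict[str, list[tuple[str, float]]],
                   overlap_bonus: float = 0.15) -> list[tuple[str, float]]:
    """channel_results maps channel name -> [(case_id, reranker_score), ...].

    A case found by several channels gets a bonus per extra channel,
    on top of its best reranker score.
    """
    best = defaultdict(float)   # best reranker score seen per case
    hits = defaultdict(int)     # number of channels that returned the case
    for results in channel_results.values():
        seen = set()
        for case_id, score in results:
            best[case_id] = max(best[case_id], score)
            if case_id not in seen:
                hits[case_id] += 1
                seen.add(case_id)
    scored = [(cid, best[cid] + overlap_bonus * (hits[cid] - 1)) for cid in best]
    return sorted(scored, key=lambda x: x[1], reverse=True)

def tier(ranked: list[tuple[str, float]], lead: int = 18, supporting: int = 25):
    """Split the ranked list into lead / supporting / concordant tiers."""
    ids = [cid for cid, _ in ranked]
    return ids[:lead], ids[lead:lead + supporting], ids[lead + supporting:]
```

A case returned by both hybrid search and the citation graph outranks a single-channel hit with a similar reranker score, which is the intuition the overlap term encodes.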
I started with an agentic approach where the LLM decided what to search iteratively. It was expensive, unreliable, and hallucinated an entire case: correct-looking case number, fabricated parties, fabricated holdings, opposite conclusion to the real case. Switching to deterministic retrieval with the LLM only generating the search plan (not executing it) was the single biggest improvement.
Synthesis constraints: The key shift was from behavioral prompting ("verify all citations") to structural constraints:
- Closed-world declaration injected dynamically: "The following 18 lead case passages, 25 supporting cases, and 98 concordant summaries are the COMPLETE AND EXCLUSIVE source materials."
- Each lead case block shows available paragraph ranges so the model can only cite paragraphs it was actually given.
- Verified case outcomes queried from a structured database table and injected per case, preventing the model from confusing what a party argued with what the tribunal decided.
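The three constraints above can be assembled into the synthesis prompt by the backend rather than hand-written. This is a hypothetical sketch of that assembly; the dict keys, the citation format, and the exact wording of the declaration are assumptions modeled on the quoted example:

```python
def build_context(lead: list[dict], supporting: list, concordant: list) -> str:
    """Each lead entry is assumed to look like:
    {"case_no": str, "paras": [(start, end), ...], "text": str, "outcome": str}
    """
    # Closed-world declaration, parameterized by the actual counts.
    header = (
        f"The following {len(lead)} lead case passages, {len(supporting)} "
        f"supporting cases, and {len(concordant)} concordant summaries are the "
        f"COMPLETE AND EXCLUSIVE source materials. Cite nothing outside them."
    )
    blocks = []
    for case in lead:
        # Show only the paragraph ranges the model was actually given,
        # plus the verified outcome from the structured table.
        ranges = ", ".join(f"¶¶ {a}-{b}" for a, b in case["paras"])
        blocks.append(
            f"[{case['case_no']}] available paragraphs: {ranges}\n"
            f"Verified outcome: {case['outcome']}\n"
            f"{case['text']}"
        )
    return header + "\n\n" + "\n\n".join(blocks)
```

Keeping the declaration dynamic (counts and ranges computed from what was retrieved) means the prompt can never claim more material than the model actually received.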
Backend verification: Post-synthesis, the backend extracts all cited case numbers via regex, verifies each exists in the database, and checks cited paragraph numbers against the ranges provided to the model. It currently detects 5-13 paragraph violations per memo. Detection works; automated correction does not. A correction pipeline I built confidently rewrote correct citations into wrong ones, because section numbering ≠ paragraph numbering in the source documents, so I disabled it.
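For readers who want the shape of that check, here is a minimal sketch of extraction plus range verification. The citation regex and case-number format are placeholders; the real corpus's citation conventions will need their own pattern:

```python
import re

# Hypothetical pattern: "[C-123/20] ¶ 42". Adjust to the corpus's real format.
CITE_RE = re.compile(r"\[([A-Z]?-?\d+/\d+)\]\s*¶\s*(\d+)")

def verify(memo: str, allowed: dict[str, list[tuple[int, int]]]) -> list[str]:
    """allowed maps case number -> list of (start, end) paragraph ranges
    that were actually shown to the model. Returns human-readable violations."""
    violations = []
    for case_no, para_str in CITE_RE.findall(memo):
        para = int(para_str)
        if case_no not in allowed:
            violations.append(f"unknown case {case_no}")
        elif not any(a <= para <= b for a, b in allowed[case_no]):
            violations.append(f"{case_no} ¶ {para} outside provided ranges")
    return violations
```

Note that this only verifies existence and range membership, not that the paragraph actually supports the claim made about it; that gap is what the paragraph registry below is meant to close.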
I'm not yet convinced this is hallucination-free. The structural constraints reduced fabrication dramatically, but paragraph-level accuracy is still imperfect.
Planned next step: paragraph registry
My documents are split into sections for embedding, and sections have section numbers. But legal documents use paragraph numbers (¶ 42, ¶ 80) for citation, and these don't map to section boundaries. I'm planning to build a paragraph registry — a mapping from paragraph numbers to their exact text and position in the source document — so that backend verification can actually check whether a cited paragraph says what the memo claims it says.
First question: is this the right approach? Or is there a better pattern for paragraph-level citation grounding that I (and my AI of choice, Claude) are missing?
What I'm looking for
I'd welcome input from anyone who has worked on citation-grounded RAG in high-stakes domains:
- Is the paragraph registry the right next step, or is there a fundamentally better way to verify paragraph-level citations?
- Is the closed-world + backend verification architecture sound, or are there known failure modes I should worry about?
- Any experience with distinguishing adversarial document sections (one party's arguments vs. the tribunal's findings) in retrieval weighting?
I'd also be open to having someone experienced do a paid review of the citation pipeline specifically. If you've built something similar, I'd appreciate hearing your thoughts here in the comments. (Prefer public answers over DMs. I am looking for expertise, not sales pitches.)