u/ekshaks

RAG for the legal domain has been “hot” for a long time, and the market is now crowded with products.

I see a lot of posts from devs/lawyers building legal RAG, but the discussions focus mainly on chunking, embeddings, reranking, and fine-tuning. Those are important, but I think they overlook the harder question: what will actually help legal professionals?

I wrote down my impressions on why useful Legal RAG is still hard even after many years of research/products:

1. Legal queries are complex. They need keyword search, semantic search, jurisdiction awareness, and some legal knowledge baked into the retrieval process. So we probably need robust hybrid/agentic search pipelines, not just vector search. This is harder to build.
2. Retrieving “superficially” relevant cases/citations is not enough. A citation can be semantically relevant but legally unusable: overruled, wrong jurisdiction, lower court, stale, or not citable for the point you need.
This second issue is critical. It needs "authority-aware" retrieval and citation validation, both of which need significant human involvement. It is not something a better embedding model or reranking alone will fix.
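
To make the two points concrete, here is a rough sketch of what I mean, not a real implementation. It fuses a keyword ranking and a vector ranking with reciprocal rank fusion (one common way to do hybrid search, picked here only for illustration), then drops candidates that are relevant but not usable as authority. Everything in it is made up for the example: the `Case` fields, the `court_level` scale, the toy data, and the thresholds. In practice the `overruled` flag and court metadata would have to come from human-maintained citator data, which is exactly the expensive part. Staleness and "citable for this specific point" are not modeled at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    # All fields are hypothetical; real systems need curated citator metadata.
    case_id: str
    jurisdiction: str
    court_level: int  # assumed scale: 3 = highest court, 1 = trial court
    overruled: bool


def rrf_fuse(rankings, k=60):
    """Reciprocal rank fusion: combine several ranked lists of case ids."""
    scores = {}
    for ranking in rankings:
        for rank, case_id in enumerate(ranking, start=1):
            scores[case_id] = scores.get(case_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by case id for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def authority_filter(case_ids, cases, jurisdiction, min_court_level=2):
    """Keep only cases that are still good law, in the right jurisdiction,
    and from a court at or above the assumed minimum level."""
    usable = []
    for cid in case_ids:
        case = cases[cid]
        if case.overruled:
            continue
        if case.jurisdiction != jurisdiction:
            continue
        if case.court_level < min_court_level:
            continue
        usable.append(cid)
    return usable


# Toy corpus: B is the most "relevant" hit but has been overruled.
CASES = {
    "A": Case("A", "NY", 3, False),
    "B": Case("B", "NY", 3, True),   # overruled
    "C": Case("C", "CA", 3, False),  # wrong jurisdiction
    "D": Case("D", "NY", 1, False),  # lower court
    "E": Case("E", "NY", 2, False),
}

keyword_hits = ["B", "A", "D", "E"]  # e.g. from a BM25 index
vector_hits = ["C", "B", "E", "A"]   # e.g. from an embedding index

fused = rrf_fuse([keyword_hits, vector_hits])
usable = authority_filter(fused, CASES, jurisdiction="NY")
print(fused)   # relevance-only ranking
print(usable)  # what a lawyer could actually cite
```

In this toy run the top fused result is the overruled case, and only two of the five candidates survive the authority check. No amount of tuning the two rankers changes that; the filter depends entirely on the quality of the metadata behind it.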

I also think this is a problem with many benchmarks. Without enough human involvement, benchmarks end up being curated with LLM judges and checking narrow retrieval from specific passages, so they do not match the messier patterns lawyers deal with in reality.

Without hard, realistic public legal benchmarks, it is difficult to know whether we are building “real” Legal AI, or just better demos.

If you’ve tried building Legal RAG, or getting lawyers to use your tool, I’d love to know the challenges you faced and the top blockers to adoption.

Longer write-up here: https://agentengg.substack.com/p/why-legal-ai-remains-unsolved-a-technical

Legal RAG remains unsolved because it needs authority, not just relevance