u/DaanEmil

▲ 1 r/Rag

Most tools that examine AI output answer one of two questions:

- "Is this AI grounded in the documents I gave it?" (Anthropic Citations, OpenAI grounding, RAG citation libraries)

- "Is the AI hallucinating?" (Patronus Lynx, Verascient, etc.)

Both useful. Both doing their job ok.

I built something that answers a different question:

"What is the AI invoking about [subject] that my own corpus doesn't have, and where did it come from?"

How it works:

- You give it any AI's output (or point it at an AI to query)

- You give it a corpus of source material you trust

- You give it a classification scheme of what types of signals matter to you

It returns a structured trace: which parts of your corpus support each claim, which claims have NO support in your corpus, and what category each gap belongs to.

Two primitives bundled:

  1. Provenance — AI claim → source mapping with confidence
  2. Gap detection — what the AI knows that your trusted sources don't cover, classified

What I saw: provenance is everywhere now. Gap detection is almost nowhere. Most tools tell you "your AI is hallucinating" or "here's a citation." I didnt see "the AI is invoking X, your docs don't cover X, here's where X probably came from."

Use cases I can imagine — there are probably more:

- Legal: cases the AI cites that your firm doesn't track

- Compliance: regulations the AI invokes that aren't in your compliance corpus

- Competitive intelligence: what the market knows about you (or a competitor) that your CI team doesn't

- Pharma / medical: trials or papers outside your literature review

- Patent / IP: prior art the AI surfaces that's not in your patent search

- Brand monitoring: things AI says about your brand sourced from places you don't watch

- Academic: papers AI cites that your reading list misses

- Internal knowledge ops: employees ask AI about X; AI knows; you have no internal doc on X

Question: is this worth anything? Has anyone seen this productized somewhere I missed? If you work in one of those domains — is "what's missing from my corpus" actually a real question your team asks, or am I solving a problem nobody has?

reddit.com
u/DaanEmil — 15 days ago