u/UnluckyOpposition

A lightweight hallucination detector for RAG (catches contradictions without an LLM-as-a-judge)

Hey everyone,

If you’re building RAG apps, you’ve probably hit this wall: your retrieval is perfect, you feed the right context to the LLM, and the final answer still subtly misrepresents the facts.

Evaluating this usually sucks. You either have to rely on expensive LLM-as-a-judge APIs (like sending it back to GPT-4 to check itself) or deal with bulky evaluation frameworks that are hard to run locally.

To solve this, we just open-sourced LongTracer. It's a lightweight Python package that checks the LLM's response against your retrieved documents and flags any hallucinated claims—all locally, without API keys.

How simple it is to use:

You just pass in the LLM's answer and your source documents:

from longtracer import check

result = check(
    "The Eiffel Tower is 330m tall and located in Berlin.",
    ["The Eiffel Tower is in Paris, France. It is 330 metres tall."]
)

print(result.verdict)             # FAIL
print(result.hallucination_count) # 1
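
In a production pipeline you'd presumably gate on that verdict before the answer reaches a user. Here's a minimal sketch of that pattern (the `gate_answer` helper, the fallback message, and the `CheckResult` stand-in are mine, not part of LongTracer's API; the stand-in just mirrors the two fields shown above):

```python
from dataclasses import dataclass

# Stand-in mirroring the two result fields shown above, so this sketch
# runs without the package installed. With LongTracer you'd instead do:
#   from longtracer import check
#   result = check(answer, source_docs)
@dataclass
class CheckResult:
    verdict: str               # "PASS" or "FAIL"
    hallucination_count: int

def gate_answer(answer: str, result: CheckResult) -> str:
    # Only surface the answer if it survived the contradiction check.
    if result.verdict == "FAIL":
        return ("Sorry, I couldn't verify that answer against the sources "
                f"({result.hallucination_count} flagged claim(s)).")
    return answer
```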

If you use LangChain, you can instrument your whole pipeline in one line:

from longtracer import LongTracer, instrument_langchain

LongTracer.init(verbose=True)
instrument_langchain(your_chain) 

Why we built it this way:

  • No API Costs: It runs small, local NLP models to verify facts, so you don't have to pay just to check if your bot is lying.
  • Zero Infrastructure: It takes plain text strings. No need to hook it up to your vector database.
  • Automatic Logging: It automatically logs all traces and hallucination metrics to SQLite (default), Mongo, or Postgres.
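
For intuition on the claim-vs-context idea, here's a toy illustration (this is not LongTracer's actual implementation; per the above it uses small local NLP models, whereas this sketch uses crude word overlap just to show the shape of the check):

```python
import re

def naive_support(claim: str, sources: list[str], threshold: float = 0.6) -> bool:
    """Toy lexical-overlap check: a claim counts as supported if enough of
    its content words appear in a source document. Real fact verifiers use
    entailment/NLI models instead of word overlap."""
    words = {w for w in re.findall(r"[a-z0-9]+", claim.lower()) if len(w) > 3}
    if not words:
        return True
    overlap = max(
        len(words & set(re.findall(r"[a-z0-9]+", src.lower()))) / len(words)
        for src in sources
    )
    return overlap >= threshold

answer = "The Eiffel Tower is 330m tall and located in Berlin."
sources = ["The Eiffel Tower is in Paris, France. It is 330 metres tall."]

# Split the answer into sentence-level claims and flag unsupported ones.
claims = re.split(r"(?<=[.!?])\s+", answer)
flagged = [c for c in claims if not naive_support(c, sources)]
```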

It also comes with a CLI to generate HTML reports of your pipeline runs.

It’s MIT licensed and available via pip install longtracer.

The code and architecture details are on GitHub if you want to test it on your pipelines: https://github.com/ENDEVSOLS/LongTracer

We are actively looking for feedback on how to make this more useful for production workflows, so let me know what you think!

u/UnluckyOpposition — 1 day ago