Built a basic RAG setup a few months ago. Retrieval looked fine, model was decent, but the answers were consistently half-wrong or weirdly incomplete.
Spent way too long suspecting the LLM. Swapped models twice. Still bad.
Turned out the issue was how I was chunking documents.
I was using fixed 512-token chunks with no overlap. Clean, simple, felt logical. But the retrieved chunks kept cutting sentences mid-thought, sometimes right before the actual answer, sometimes right after. The model was working with literally incomplete information and hallucinating the rest.
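To make that failure mode concrete, here is a minimal sketch. It is not the actual pipeline from this setup: whitespace-separated words stand in for real tokenizer tokens, the chunk size is shrunk to 10 so the effect is visible, and `fixed_chunks` and the sample text are made up for illustration.

```python
def fixed_chunks(tokens, size):
    """Split a token list into consecutive chunks of `size`, with no overlap."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Assumption: words stand in for tokens; a real setup would use the
# embedding model's tokenizer and a size like 512.
text = (
    "The service retries failed requests automatically. "
    "The retry limit is five attempts per request. "
    "After that the request is dropped."
)
tokens = text.split()
chunks = [" ".join(c) for c in fixed_chunks(tokens, size=10)]

# Print each chunk so the cut point is visible.
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

The sentence that holds the answer ("The retry limit is five attempts per request.") ends up split across the two chunks, so neither chunk contains it in full.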
What actually helped:
1. Adding overlap (obvious in hindsight): Went from 0 overlap to ~50 tokens. Retrieval quality jumped immediately. The "answer" wasn't getting split across two chunks anymore.
2. Respecting natural document boundaries: Splitting by paragraph or section instead of raw token count made a huge difference for structured documents like PDFs and docs with headers.
3. Smaller chunks + more of them: Counterintuitive, but retrieving 6 small clean chunks beat retrieving 3 large messy ones. Less noise in the context window.
4. Checking what actually got retrieved: I wasn't logging retrieved chunks at all early on. Once I started printing them, I immediately saw the problem. Obvious step I skipped because I assumed retrieval was working.
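A rough sketch of points 1, 2, and 4, under some loud assumptions: words stand in for tokens again, paragraphs are assumed to be separated by blank lines, and `sliding_chunks`, `paragraph_chunks`, and `log_retrieved` are hypothetical helper names, not part of any library. Treat it as a starting point rather than a finished chunker.

```python
import re


def sliding_chunks(tokens, size, overlap):
    """Fixed-size chunks where each chunk repeats the last `overlap` tokens
    of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this chunk already reaches the end of the document
    return chunks


def paragraph_chunks(text, max_tokens):
    """Pack whole paragraphs into chunks of at most `max_tokens` words.

    A single paragraph longer than `max_tokens` is kept whole as its own
    oversized chunk instead of being cut mid-thought.
    """
    # Assumption: blank lines mark paragraph boundaries.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current, count = [], [], 0
    for para in paragraphs:
        n = len(para.split())
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def log_retrieved(query, retrieved):
    """Print every retrieved chunk before it goes into the prompt."""
    print(f"query: {query!r}")
    for rank, chunk in enumerate(retrieved, start=1):
        print(f"  [{rank}] {len(chunk.split())} words: {chunk[:80]!r}")
```

With real numbers this would be something like `sliding_chunks(tokens, size=512, overlap=50)`. Calling `log_retrieved` on whatever the retriever returns, right before building the prompt, is the cheap check that makes cut-off chunks show up immediately.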
The model was never the bottleneck. The garbage-in-garbage-out problem was upstream the whole time.
Curious if others ran into this, especially with PDFs. Those feel like a special kind of painful.