Multi-turn handling in RAG chatbots: where are you all landing on this?
Hitting a wall on multi-turn and want to check if I'm missing something obvious.
Customer facing RAG bot on our help center, a few hundred product docs as the source. Single turn works fine, retrieval pulls reasonable chunks, answer comes back with citations, nobody complains.
The interesting failures are when a user pivots topics inside the same session. Had a transcript last week where someone asked a pricing question, got their answer, then later in the same session asked about a login issue. The bot answered the login question as if it were still a pricing question. It stayed stuck on the previous topic: retrieval pulled chunks that didn't really make sense, but the model wove them together into a confident-sounding answer anyway. Took a while staring at logs to figure out where it had gone sideways.
Underneath that there's a smaller version of the same problem. The model occasionally pulls a citation forward from an earlier turn and uses it to back something in turn three, even when the doc isn't relevant anymore. Feels like it's holding on to context that retrieval has long moved past. And in the other direction, when a follow-up really is a continuation, retrieval sometimes treats it as a standalone query and pulls back nothing useful. "What about for enterprise?" with no anchor.
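For the carried-forward citation case, one guard that can sit after generation is to drop any citation whose doc was not retrieved in the current turn. A minimal sketch, assuming citations appear inline as `[doc:ID]` markers (that format, the function name, and the doc IDs are all made up for illustration, not taken from any particular product):

```python
import re

# Hypothetical citation format: inline markers like [doc:login-reset].
CITATION = re.compile(r"\s*\[doc:([\w-]+)\]")

def strip_stale_citations(answer: str, retrieved_ids: set[str]) -> tuple[str, list[str]]:
    """Remove citation markers whose doc ID was not retrieved this turn.

    Returns the cleaned answer and the list of dropped IDs so they can be logged.
    """
    stale: list[str] = []

    def keep_or_drop(match: re.Match) -> str:
        doc_id = match.group(1)
        if doc_id in retrieved_ids:
            return match.group(0)  # cited doc is in this turn's retrieval set
        stale.append(doc_id)       # cited doc came from an earlier turn
        return ""

    return CITATION.sub(keep_or_drop, answer), stale

# Example: turn three retrieved only the login doc, but the answer also
# cites a pricing doc left over from turn one.
answer = (
    "Reset your password from the login page [doc:login-reset]. "
    "Plans start at the team tier [doc:pricing-101]."
)
cleaned, dropped = strip_stale_citations(answer, {"login-reset"})
```

This only removes the stale marker, not the sentence it was attached to, so the list of dropped IDs is probably more useful as a signal to flag or regenerate the answer than as a fix on its own.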
We've been comparing how a few setups handle this. Testing Denser on the customer side. Some of the hosted ones do query rewriting between turns automatically, some leave it to you.
What I can't get clean is the tradeoff. Rewriting the user's query each turn helps retrieval but distorts what they actually asked. Throwing the whole conversation into the retrieval query catches more continuity but you end up dragging stale terms from earlier turns into the new search. Fixed window of N turns feels arbitrary and breaks in obvious ways.
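To make the tradeoff concrete, one middle ground is to run retrieval twice, once with the user's raw message and once with a history-aware rewrite, then merge the two ranked lists with reciprocal rank fusion (RRF), so neither query fully overrides the other. A minimal sketch of just the merge step, assuming each retriever call returns an ordered list of doc IDs (the IDs, `top_n`, and the choice of `k = 60` are illustrative assumptions, and this is not something measured on the bot described here):

```python
from collections import defaultdict

def rrf_merge(rankings: list[list[str]], k: int = 60, top_n: int = 5) -> list[str]:
    """Merge ranked lists of doc IDs with reciprocal rank fusion.

    Each doc scores 1 / (k + rank) per list it appears in; scores are summed.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by doc ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]

# Example: the user pivoted from pricing to a login issue.
raw_hits = ["login-reset", "sso-setup", "pricing-101"]          # raw message only
rewritten_hits = ["pricing-101", "login-reset", "billing-faq"]  # rewrite dragged in pricing
merged = rrf_merge([raw_hits, rewritten_hits], top_n=3)
```

In this toy example the login doc still comes out on top because it ranks well in both lists, while the stale pricing doc is demoted rather than removed. That is the same tradeoff in a softer form, not a way around it.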
What I'd really like to know is whether anyone's actually solved this in a way that doesn't feel like a hack. Everything I've tried so far trades one failure mode for another.