u/BadGeeky

▲ 3 r/Rag

Multi-turn handling in RAG chatbots: where are you all landing on this?

Hitting a wall on multi-turn and want to check if I'm missing something obvious.

Customer facing RAG bot on our help center, a few hundred product docs as the source. Single turn works fine, retrieval pulls reasonable chunks, answer comes back with citations, nobody complains.

The interesting failures are when a user pivots topics inside the same session. Had a transcript last week where someone asked a pricing question, got their answer, then later in the same session asked about a login issue. The bot answered the login question as if it were still a pricing question: stuck on the previous topic, retrieval pulled chunks that didn't really fit, and the model wove them together into a confident-sounding answer anyway. Took a while staring at logs to figure out where it had gone sideways.

Underneath that there's a smaller version of the same problem, the model occasionally pulls a citation forward from an earlier turn and uses it to back something in turn three, even when the doc isn't relevant anymore. Feels like it's holding on to context the retrieval has long moved past. And in the other direction, when a follow up is actually a real continuation, retrieval sometimes treats it as a standalone query and pulls back nothing useful. "What about for enterprise" with no anchor.

We've been comparing how a few setups handle this. Testing Denser on the customer side. Some of the hosted ones do query rewriting between turns automatically, some leave it on you.

What I can't get clean is the tradeoff. Rewriting the user's query each turn helps retrieval but distorts what they actually asked. Throwing the whole conversation into the retrieval query catches more continuity but you end up dragging stale terms from earlier turns into the new search. A fixed window of N turns feels arbitrary and breaks in obvious ways.
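One middle ground is to rewrite only when the turn actually looks like a continuation, and otherwise search the raw query so a topic pivot starts from a clean slate. A toy sketch (the follow-up pattern and last-turn concatenation are made up for illustration; a real setup would usually have an LLM do the condensing instead of string glue):

```python
import re

# Crude signal that a turn is anaphoric and can't stand alone as a query.
FOLLOWUP_PAT = re.compile(
    r"^(what about|how about|and for|what if)\b|\b(it|that|those|this one)\b",
    re.IGNORECASE,
)

def build_retrieval_query(history: list[str], query: str) -> str:
    """Rewrite only when the turn looks like a follow-up; otherwise
    search with the user's words verbatim so a pivot (pricing -> login)
    isn't contaminated by the previous topic."""
    if history and FOLLOWUP_PAT.search(query):
        # Continuation: anchor on the previous user turn only, not the
        # whole transcript, to limit how many stale terms get dragged in.
        return f"{history[-1]} {query}"
    return query

history = ["How is Pro plan pricing billed?"]
print(build_retrieval_query(history, "What about for enterprise?"))
# -> "How is Pro plan pricing billed? What about for enterprise?"
print(build_retrieval_query(history, "I cannot log in to my account"))
# -> "I cannot log in to my account"
```

The point isn't the regex, which is obviously too dumb to ship; it's that "rewrite always" and "rewrite never" don't have to be the only two settings.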

What I'd really like to know is whether anyone's actually solved this in a way that doesn't feel like a hack. Everything I've tried so far trades one failure mode for another.

reddit.com
u/BadGeeky — 2 days ago

Spent years on a stack before realizing I'd never looked at what populations who age well actually eat

Got pulled down a rabbit hole on dietary patterns in regions with longer healthspan and one thing kept showing up that I'd never paid attention to. Mushroom intake in parts of East Asia is dramatically higher than what most people in the West eat, even the health conscious ones. Not just shiitake, basically all kinds, often daily. There's a long-running Singapore cohort that found higher mushroom consumption tracked with slower cognitive decline in older adults.

That sent me into a tangent about why mushrooms specifically. Apparently there's a compound in them that humans have a specific transporter dedicated to, which is unusual for something that isn't strictly essential. Plasma levels drop significantly with age and lower levels correlate with worse cognitive trajectories.

What got me was I'd spent maybe four years tweaking the standard rotation, magnesium and fish oil and b vitamins, without ever asking what populations who actually age well are doing differently at the food level. Bumping mushroom intake is what I tried first but the amounts you'd realistically need to move plasma levels are kind of impractical from food alone unless you're eating them at most meals.

Curious to hear from people who went food-first before reaching for supplementation, especially for stuff that's hard to hit from a Western diet.

u/BadGeeky — 6 days ago

I've been through three bluetooth speakers in the last two years and they all have the same problem. They sound fine when I'm doing dishes or have people over, but the second I want to play something at like 20% volume late at night while reading or cooking dinner solo, everything turns into a thin tinny mess. Bass disappears, vocals get this weird hollow quality, and I end up just turning my phone speaker on instead.

I get that there's some Fletcher Munson loudness curve thing happening at low volumes but most consumer speakers don't seem to engineer for it at all. The auto loudness compensation on some of them actually makes it worse, like it's pumping fake bass that sounds boomy and disconnected from the rest of the mix.

Most of what gets recommended in this sub is built for parties or beach use. I want something specifically meant to live indoors and sound decent at conversational volume. Doesn't need to be portable, doesn't need to be waterproof, just needs to not sound like garbage at the volume I actually use it 90 percent of the time.

Mostly trying to figure out if this is a driver count thing, a tuning thing, or just a spend more money thing.

u/BadGeeky — 14 days ago

Working on a production RAG pipeline, went the standard route. BM25 plus vector ensemble with LangChain's EnsembleRetriever doing RRF fusion. Theory being keyword matching for proper nouns, version codes, exact terms, and semantic matching for everything else.

What's killing me is the weight tuning. Some queries clearly want more BM25 weight (codes, exact phrases, anything where keyword precision matters). Others clearly want more vector (paraphrased questions, conceptual stuff). Any single weight combo I lock in, half my eval set gets better and the other half regresses. Feels like there's no global optimum, every choice is a tradeoff.

One side thing that's been bugging me. RRF on its own is rank-based and shouldn't need weights at all. The original paper just sums 1/(k+rank) across retrievers. LangChain's implementation takes weights and applies them to the RRF scores, which is technically RRF plus weighted fusion combined. Works, but it means I'm tuning a parameter that the algorithm conceptually doesn't have.
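For reference, the paper version really is just the rank-based sum, no weights anywhere (minimal sketch of unweighted RRF; doc IDs are made up):

```python
from collections import defaultdict

def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Plain reciprocal rank fusion: score(d) = sum over retrievers
    of 1 / (k + rank_d), with 1-based ranks and no per-retriever
    weights. k=60 is the constant from the original paper."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_c", "doc_a", "doc_d"]
print(rrf([bm25_hits, vector_hits]))
# -> ["doc_a", "doc_c", "doc_b", "doc_d"]
```

Anything beyond this, weights included, is a layer someone bolted on top of RRF rather than RRF itself, which is exactly the parameter-that-shouldn't-exist feeling.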

Tried a couple of escape hatches. Query classification routing helps a bit (short keyword-heavy queries to BM25-weighted, long NL queries to vector-weighted), but the classifier becomes its own weak link. Dropping fusion and just using a strong reranker on a wide vector candidate set actually worked better than fusion for our data. Set vector to top-50, rerank to top-5, skip BM25 entirely. Tradeoff is reranker latency, which is real.
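The routing escape hatch can be sketched with a deliberately dumb classifier just to show the shape (the features and thresholds here are invented; a real router would be learned, or at least tuned against an eval set):

```python
def route(query: str) -> str:
    """Toy query router: short queries or queries containing code-like
    tokens (digits, ALL-CAPS identifiers) lean BM25-weighted; longer
    natural-language queries lean vector-weighted."""
    tokens = query.split()
    code_like = any(
        any(ch.isdigit() for ch in tok) or tok.isupper() for tok in tokens
    )
    if len(tokens) <= 4 or code_like:
        return "bm25_heavy"
    return "vector_heavy"

print(route("ERR_CONN_RESET v2.4.1"))
# -> "bm25_heavy"
print(route("why does my sync keep failing after the update"))
# -> "vector_heavy"
```

The weak-link problem is visible even in the toy: every query the router misclassifies gets the wrong retriever emphasis on top of whatever retrieval error was already there, so the router's error rate compounds with everything downstream.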

For people running mixed-query RAG in production, what's the strategy that survived contact with real users? Genuinely curious if anyone has a cleaner pattern than "just throw a reranker at it".

u/BadGeeky — 16 days ago