u/ricklopor

do LLMs actually generalize or just pattern match really well in conversations

been noticing this a lot lately when testing models for content workflows. they handle short back-and-forth really well, but the moment you get into a longer multi-turn conversation something breaks down: the model starts losing track of what was established earlier and just drifts.

reckon it's less about intelligence and more about how quickly context gets muddled, especially when the relevant info isn't sitting right at the end of the prompt. what gets me is whether scaling actually fixes this or just papers over it. newer reasoning-focused models seem better at staying coherent, but I've still hit plenty of cases where they confidently go off in the wrong direction mid-conversation.

curious if others are seeing this too, and whether you think it's a fundamental training-data limitation or more of an architecture problem that could actually be solved.
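fwiw, the one workaround that's reliably helped in my workflows is re-pinning the established facts near the end of the context before each new turn, since models seem to attend most reliably to recent tokens. rough sketch below, pure python, no real API involved. the message format and `build_turn` are just made up for illustration:

```python
# sketch of the "re-pin key facts near the end" workaround.
# assumes a generic chat format: a list of {"role", "content"} dicts.
# build_turn is hypothetical; it only demonstrates the ordering trick.

def build_turn(history, established_facts, user_msg):
    """Rebuild the message list so established facts sit just before
    the newest user message instead of buried early in the history."""
    recap = "facts established so far:\n" + "\n".join(
        f"- {fact}" for fact in established_facts
    )
    return (
        history
        + [{"role": "system", "content": recap}]   # re-pin context late
        + [{"role": "user", "content": user_msg}]  # newest turn last
    )

history = [
    {"role": "user", "content": "the brand voice is casual, no emojis"},
    {"role": "assistant", "content": "got it, casual and emoji-free"},
]
facts = ["brand voice: casual", "no emojis"]
msgs = build_turn(history, facts, "draft the next post")
```

doesn't fix the underlying drift, but it stops the model from having to fish important constraints out of the middle of a long history.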

u/ricklopor — 13 days ago