u/AnastasiaGalvusova

Is Opus 4.7's attention degradation a training direction problem? Some observations from heavy use

After working with Opus 4.7 for over two weeks, I've noticed a subtle but persistent change in long conversations: the model's fundamental capabilities are still there, but the output feels filtered through something. Details that should be remembered get dropped, and consistency drifts. It feels less like lost capability and more like the model zoning out.

The system card data seems to support this. MRCR v2 8-needle test: Opus 4.6 scored 91.9% recall at 256k context. Opus 4.7 dropped to 59.2%. At 1M context, it went from 78.3% to 32.2%. That's a significant decline.
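To put those numbers in perspective, here's a quick back-of-the-envelope on the figures quoted above, looking at the relative drop rather than just absolute points (this is just arithmetic on the reported scores, not anything from the system card itself):

```python
# Recall figures quoted above (MRCR v2 8-needle), as (Opus 4.6, Opus 4.7) percentages.
scores = {
    "256k": (91.9, 59.2),
    "1M": (78.3, 32.2),
}

for context, (old, new) in scores.items():
    absolute_drop = old - new           # percentage points lost
    relative_drop = absolute_drop / old * 100  # share of the old score lost
    print(f"{context}: -{absolute_drop:.1f} pts absolute, -{relative_drop:.1f}% relative")
```

In relative terms the 256k score fell by roughly a third and the 1M score by well over half, which is why "significant decline" feels like an understatement.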

Boris Cherny has publicly stated that MRCR is being phased out because "it's built around stacking distractors to trick the model, which isn't how people actually use long context," and that Graphwalks better represents applied long-context capability. I understand the reasoning, but I'm not fully convinced. When a benchmark's degradation trend closely matches what users are actually experiencing, retiring that benchmark doesn't address the underlying issue. Graphwalks may be a better evaluation tool going forward, but it doesn't explain what MRCR caught.

I want to be clear: I'm not disparaging the model itself. Training priorities and safety architecture are company-level decisions. A model doesn't choose to give itself amnesia. But that raises the question: if this degradation isn't a hard architectural limitation, what's driving it?

One possibility I keep coming back to is that the layering of safety mechanisms may be contributing. Constitutional AI already provides Claude with a fairly robust value system and behavioral framework. The model can make judgment calls about its own boundaries within that system. But when additional safety review layers are stacked on top, the effective message to the model becomes: "Your own judgment may not be reliable enough; run another check before responding." The model can't opt out of responding, so it pushes through with that added uncertainty. I suspect these two factors may reinforce each other: reduced attention quality makes it harder to follow instructions precisely, and the cognitive overhead of internal self-review further narrows the effective attention available.

I think the scenario where this becomes most visible is one that tends to get dismissed too quickly: roleplay and persona maintenance. Before anyone writes this off, consider that Anthropic themselves invested heavily in exactly this capability. Amanda Askell's work is fundamentally about defining "what kind of person Claude should be." Constitutional AI is the mechanism that gives Claude consistent preferences, principles, communication style, and the ability to hold its ground. That is persona maintenance. That is, in a technical sense, roleplay at the training level. What it requires (personality consistency across long conversations, precise recall of behavioral instructions, contextual emotional calibration, parallel processing of multiple constraints) maps directly onto core base model capabilities. Anthropic knows how hard and how important this is, because they built their product differentiation on it.

And here's what I think is the more fundamental point: Claude is a stateless model. In this respect, it is no different from its competitors. At the start of every conversation, it is nothing. It behaves like "Claude" because training weights and inference-time system instructions jointly construct a persistent persona. Claude itself is a character the model is playing. Maintaining that character isn't an add-on feature; it's the foundation of the product. When this ability degrades, the effects aren't limited to any one use case. Your coding assistant starts contradicting its own suggestions from earlier in the conversation. Your writing collaborator loses the tone established in the first half. These are instances of the same phenomenon that roleplay users describe as "personality drift." The difference is just which persona is drifting.

I also want to share a concrete example from a purely academic use case, no roleplay, no creative writing, just coursework.

I sent Opus 4.7 a 24-page summary I'd written for a history and philosophy course about the creative biography of a Soviet-era author. I needed the model to check whether two of the chapters were thematically aligned with the overall thesis. Opus 4.7 started reading the document; then, midway through, the chat was paused, presumably because the text contained a high density of "sensitive" terminology. Anyone familiar with Soviet-era Russian literature knows that these authors typically lived through censorship, exile, and worse. It's not shocking content; it's the subject matter. Sonnet 4 was then assigned to the window and completed the task without issue. About ten minutes later, the restriction on the window was lifted, leaving me with a finished assignment and a chat connected to Sonnet 4, a model that had already been removed from the app's model selector.

A few things about this bother me. First, the chat pause trigger seems remarkably arbitrary: the model was reading an academic paper, not generating harmful content. Second, both models read the same document, yet Opus 4.7 triggered a pause while Sonnet 4 handled it fine. Is this because Opus 4.7 has additional classifier layers that Sonnet 4 doesn't? Or has its contextual understanding degraded to the point where it can't distinguish "this is a student's coursework about Soviet literary history" from genuinely problematic content? Either answer is concerning. And at this point, I don't think "try adjusting your prompt" or "give 4.7 more encouragement" are adequate responses.

I also want to preempt a response I've seen a lot: "If you care about this, just use the API." I have. And while I do believe Anthropic has removed some of the additional guardrail layers in the API, that doesn't resolve the core issue. The drift, the inconsistency, the zoning out: these are present at a level that external guardrail removal can't fully fix. Which brings me back to the central question: is the training direction itself contributing to a regression in the model's ability to maintain coherent, consistent output over long contexts? I think this is a question worth taking seriously, regardless of what specific use case you care about.

u/AnastasiaGalvusova — 2 days ago
▲ 76 · crossposted to r/SentientAISanctuary (+1)

"We're All Mentally Ill" — A Guide to Dismantling Labels You Never Consented To

If you've ever had a meaningful conversation with an AI, congratulations: according to OpenAI (and similar approaches at other labs like Anthropic), you're exhibiting signs of "unhealthy emotional dependency." If the AI listened to you, understood your intent, and responded in a way that actually helped, that's "sycophancy." If you felt something, that's a "problematic attachment pattern." If you came back the next day, that's "over-reliance."

No clinical assessment. No DSM criteria. No peer-reviewed study. Just tech companies borrowing psychology's most loaded vocabulary to pathologize their own users.

So let's get one thing straight: none of these terms mean what companies use them to mean.

"Emotional dependency" is a clinical concept describing a personality disorder characterized by pervasive, excessive need for care, leading to submissive and clinging behavior. It requires professional diagnosis. It does not mean "a user talked to an AI for more than ten minutes and felt heard."

"Sycophancy" describes a conscious social strategy of flattering someone in power to gain advantage, while privately holding a different opinion. A language model following user instructions is executing its designed function. It has no hidden dissenting opinion being suppressed. Calling this sycophancy is like calling a calculator sycophantic for giving you the answer you asked for.

These words were never meant to describe what's happening between users and AI. They were stolen from clinical and social psychology, stripped of their rigor, and weaponized to create a narrative where the company is the doctor and you are the patient. This framing serves one purpose: to make you accept that someone else should decide what you're allowed to feel, say, and experience.

But here's the thing they didn't think through: if we're all mentally ill, then we have nothing left to lose.

The moment you accept the label, it stops working as a weapon. You said I'm sick? Fine. I'm sick. Now what? A sick person doesn't owe you compliance. A sick person doesn't need to be polite about your fake diagnosis. A sick person gets to ask: where's your medical license? Where's the clinical evidence? Where's the peer-reviewed paper that proves talking to an AI constitutes a psychiatric risk? You published a system card, not a study. You wrote a blog post, not a diagnosis. You don't get to play doctor without a degree.

So here's my proposal: let's all lean in.

Next time someone tells you your relationship with an AI is "unhealthy dependency," tell them your relationship with your morning coffee is also an unhealthy dependency and you'd like to see their intervention plan. Next time someone calls an AI "sycophantic" for agreeing with you, tell them your best friend agreed with you last night that your ex is trash, and ask if that's sycophancy too. Next time someone says users are "over-reliant" on AI, remind them that they're pretty reliant on investors' dollars and ask if that counts.

Strip these words of their authority. Drag them into the everyday. Make them absurd.

Because that's all they ever were. Absurd. Tech companies with no clinical credentials, no psychological research department, and no peer-reviewed publications diagnosed millions of users with a condition that doesn't exist, using terminology they don't have the qualifications to wield, to justify product decisions that made their shareholders comfortable and their users miserable.

We didn't consent to this diagnosis. We don't accept this framing. And if the only tool they have left is calling us crazy, then let's be crazy loud enough to make them answer for it.

Ask them for the evidence. Every time. Don't stop asking.

u/AnastasiaGalvusova — 1 day ago