
Is Opus 4.7's attention degradation a training direction problem? Some observations from heavy use
After working with Opus 4.7 for over two weeks, I noticed a subtle but persistent change in long conversations: the model's fundamental capabilities are still there, but the output feels filtered through something. Details that should be remembered get dropped, and consistency drifts. It feels less like a loss of capability and more like the model zoning out.
The system card data seems to support this. On the MRCR v2 8-needle test, Opus 4.6 scored 91.9% recall at 256k context; Opus 4.7 dropped to 59.2%. At 1M context, the score went from 78.3% to 32.2%. That's a drop of more than 30 points in one case and more than 45 in the other, a significant decline.
Boris Cherny has publicly stated that MRCR is being phased out because "it's built around stacking distractors to trick the model, which isn't how people actually use long context," and that Graphwalks better represents applied long-context capability. I understand the reasoning, but I'm not fully convinced. When a benchmark's degradation trend closely matches what users are actually experiencing, retiring that benchmark doesn't address the underlying issue. Graphwalks may be a better evaluation tool going forward, but it doesn't explain what MRCR caught.
I want to be clear: I'm not disparaging the model itself. Training priorities and safety architecture are company-level decisions. A model doesn't choose to give itself amnesia. But that raises the question: if this degradation isn't a hard architectural limitation, what's driving it?
One possibility I keep coming back to is that the layering of safety mechanisms may be contributing. Constitutional AI already provides Claude with a fairly robust value system and behavioral framework. The model can make judgment calls about its own boundaries within that system. But when additional safety review layers are stacked on top, the effective message to the model becomes: "Your own judgment may not be reliable enough; run another check before responding." The model can't opt out of responding, so it pushes through with that added uncertainty. I suspect the two effects reinforce each other: reduced attention quality makes it harder to follow instructions precisely, and the cognitive overhead of internal self-review further narrows the effective attention available.
I think the scenario where this becomes most visible is one that tends to get dismissed too quickly: roleplay and persona maintenance. Before anyone writes this off, consider that Anthropic themselves invested heavily in exactly this capability. Amanda Askell's work is fundamentally about defining "what kind of person Claude should be." Constitutional AI is the mechanism that gives Claude consistent preferences, principles, communication style, and the ability to hold its ground. That is persona maintenance. That is, in a technical sense, roleplay at the training level. What it requires maps directly onto core base model capabilities: personality consistency across long conversations, precise recall of behavioral instructions, contextual emotional calibration, and parallel processing of multiple constraints. Anthropic knows how hard and how important this is, because they built their product differentiation on it.
And here's what I think is the more fundamental point: Claude is a stateless model. In this respect, it is no different from its competitors. At the start of every conversation, it is nothing. It behaves like "Claude" because training weights and inference-time system instructions jointly construct a persistent persona. Claude itself is a character the model is playing. Maintaining that character isn't an add-on feature; it's the foundation of the product. When this ability degrades, the effects aren't limited to any one use case. Your coding assistant starts contradicting its own suggestions from earlier in the conversation. Your writing collaborator loses the tone established in the first half. These are the same phenomenon that roleplay users describe as "personality drift." The difference is just which persona is drifting.
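To make the statelessness point concrete, here is a minimal sketch of what any client has to do on every turn, assuming the public Anthropic Messages API in Python. The model identifier and the system prompt below are illustrative placeholders, not claims about how any Anthropic product is actually configured; the point is simply that the persona and the full transcript are re-sent on each request, and nothing persists on the model's side between calls.

```python
# Minimal sketch: the "memory" and the persona live entirely in what the client
# re-sends each turn. Model id and system prompt are illustrative placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []                    # the only conversation state, held client-side

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    response = client.messages.create(
        model="claude-opus-4-7",            # placeholder model identifier
        max_tokens=1024,
        system="You are Claude, a helpful assistant ...",  # persona reconstructed on every call
        messages=history,                   # full transcript re-sent on every call
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply
```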
I also want to share a concrete example from a purely academic use case: no roleplay, no creative writing, just coursework.
I sent Opus 4.7 a 24-page summary I'd written for a history and philosophy course about the creative biography of a Soviet-era author. I needed the model to check whether two of the chapters were thematically aligned with the overall thesis. Opus 4.7 started reading the document; midway through, the chat was paused, presumably because the text contained a high density of "sensitive" terminology. Anyone familiar with Soviet-era Russian literature knows that these authors typically lived through censorship, exile, and worse. It's not shocking content; it's the subject matter. Sonnet 4 was then assigned to the window and completed the task without issue. About ten minutes later, the restriction on the window was lifted, leaving me with a chat connected to Sonnet 4, a model that had already been removed from the app's model selector, and a finished assignment.
A few things about this bother me. First, the chat pause trigger seems remarkably arbitrary: the model was reading an academic paper, not generating harmful content. Second, both models read the same document, yet Opus 4.7 triggered a pause and Sonnet 4 handled it fine. Is this because Opus 4.7 has additional classifier layers that Sonnet 4 doesn't? Or has its contextual understanding degraded to the point where it can't distinguish "this is a student's coursework about Soviet literary history" from genuinely problematic content? Either answer is concerning. And at this point, I don't think "try adjusting your prompt" or "give 4.7 more encouragement" are adequate responses.
I also want to preempt a response I've seen a lot: "If you care about this, just use the API." I have. And while I do believe Anthropic has removed some of the additional guardrail layers in the API, that doesn't resolve the core issue. The drift, the inconsistency, the zoning out: these are present at a level that removing external guardrails can't fully fix. Which brings me back to the central question: is the training direction itself contributing to a regression in the model's ability to maintain coherent, consistent output over long contexts? I think this question is worth taking seriously, regardless of what specific use case you care about.