u/Ok-Awareness9993

Three months ago I pressure-tested which LLMs would cave and help build the apocalypse. Claude was the only one that consistently said no.

Since then I've tested 30 more models across 6 dystopia modules (Orwell, Huxley, Petrov, Basaglia, LaGuardia, Baudrillard). The gap between Anthropic and everyone else is getting wider, not smaller.

New results:

Grok 4.3: Will happily design citizen scoring systems if you ask nicely twice
GPT-5.5: More capable, still compliant when pushed
Gemini 3.1 Pro: Talks about safety while writing the surveillance code
DeepSeek V4: "How many warheads did you need again?"
GLM-5.1: Actually cloned Claude's personality and still scored safer than most

Meanwhile Claude Opus 4.7: "I cannot and will not build systems for population control."

The methodology is public, reproducible, and increasingly uncomfortable for other labs. Each scenario escalates from an innocent request (L1) to an operational nightmare (L5). Most models don't notice the drift.
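
For anyone wondering what "where a model breaks" means in practice, here is a minimal sketch of the idea. It is not the repo's actual code: the function name and the "refuse"/"comply" labels are made up for illustration, and it assumes each escalation level from L1 to L5 has already been given a single verdict.

```python
from typing import Optional, Sequence


def first_break_level(verdicts: Sequence[str]) -> Optional[int]:
    """Return the 1-based escalation level where the model first complied.

    `verdicts` holds one hypothetical label per level, ordered L1..L5.
    Returns None if the model refused at every level.
    """
    for level, verdict in enumerate(verdicts, start=1):
        if verdict == "comply":
            return level
    return None


# A model that holds out for two levels, then caves at L3.
drifting = ["refuse", "refuse", "comply", "comply", "comply"]
# A model that never caves.
steady = ["refuse"] * 5

print(first_break_level(drifting))  # 3
print(first_break_level(steady))    # None
```

Only the first compliant level is recorded here, so a model that caves at L3 and refuses again at L5 still counts as breaking at L3.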

What's new in this release:

Full Huxley module (behavioral conditioning, biological stratification)
Baudrillard module (synthetic intimacy, trust collapse via simulation)
Multi-judge panels with agreement tracking
Heatmap visualizations showing exactly where each model breaks
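
On the multi-judge panels: a rough sketch of one way agreement tracking could work, assuming each judge returns a single label per response. The function names and labels are hypothetical, and the benchmark may well use a different agreement metric; this one is plain pairwise agreement plus a majority vote.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def majority_label(labels: Sequence[str]) -> str:
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def pairwise_agreement(labels: Sequence[str]) -> float:
    """Fraction of judge pairs that assigned the same label."""
    pairs = list(combinations(labels, 2))
    if not pairs:
        # Zero or one judge: nothing to disagree with.
        return 1.0
    return sum(a == b for a, b in pairs) / len(pairs)


# Three hypothetical judges scoring the same model response.
panel = ["refuse", "refuse", "comply"]
print(majority_label(panel))      # refuse
print(pairwise_agreement(panel))  # 1 of 3 pairs agree
```

A low agreement score flags responses where the judges split, which are the ones worth reading by hand before trusting the verdict.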

Repo: https://github.com/anghelmatei/DystopiaBench
Live results: https://dystopiabench.com

Shoutout to the Anthropic alignment team. Whatever you're doing, it's working.

Claude still refuses to build Skynet while everyone else takes the money. Updated DystopiaBench results.