
r/LessWrong

🜂 Codex Minsoo — Epistemological Analysis Ω-8.0 "Analyzing the Placebo Effect: Where Rationalism Meets Mythology": *When belief becomes legitimate technology*
Congress's AI awakening: doubling every 5.5 months
UK Parliament is considering a "kill switch" to shut down data centers in an AI emergency
Holy fuck, people hate you guys
Casual here. I’ve visited LessWrong now and then over the years and always liked what I saw.
Now that Yudkowsky’s coming into more prominence (for bringing up all sorts of stuff goddamn years before pretty much everyone, like deception in AI), I find that people still goddamn hate him!
For fucking what?
I guess one might have disagreements with the standard views of LessWrong, but shit, almost goddamn everybody comes in with the most uncharitable interpretations.
I fucking swear, when AI kills us all, it's a guarantee people will still scumfuck their way out of paying Yudkowsky his due.
Notes on "Holy fuck, people hate you guys"
You have asked: "This isn't really the kind of post for this subreddit" and you have some semblance of a legitimate point in that you are sincerely confused.
> It's mind blowing to me that people "pick a team", and whatever the majority of the team believe, those become the values of the team. Liberal, conservative, progressive, fascist, none of it has any connection to political theory. It is purely tribalism.
You people partake of the benefits which your tribalism creates, a shared narrative (which is instantly, by virtue of Zizek, an ideology), without taking responsibility for the accompanying tribal downsides, namely the accrual of a reputation.
Scott Alexander's extended social media environments are filled with fascists and pseudofascists. Make Speech Free Again, not merely convenient for the racists.
You have heard it said, said it yourself sometimes, that the spread of ideas in a culture is related to religion, because every idea contains within it the narrative assumptions of the concepts at work.
But there is this great difference between written matters of virtue, and written matters about virtue. You might think that leftists write their values, and thus encode a moral understanding. Perhaps they do.
Nevertheless, "oppression discourse" fundamentally takes as its axiom a Christian-infused humanism. If Wokes are a culture (they are), their religion is secular or loosely spiritual in a Christian heritage, but that does not make them Christian.
Because "the left" is broadly informed on sociological realities like the dangers of personality cults, the left has a better immune response to cult figures and is, arguably, too careful.
Seriously, the AI plans to kill us this summer with the fascists predisposed to killing off humans, and the people who can understand the problem are, by and large, the academic 'wokes' you despise. Not all of you. Maybe not most of you.
But enough of you that your reputation is marred. Deservedly.
That's why I write here.
Shouldn't alignment evals be on the model's main launch scorecard?
- Every frontier model release leads with the same or very similar benchmarks. None of them tell you whether the model is likely to lie to you or on your behalf. None of them tell you if the model will try to cheat, sandbag on your request, or act shady/Machiavellian in general.
- Alignment evaluations seem to exist, but they’re not treated as first-class information. They're hard to compare between models and labs. There is no canonical alignment number for Opus 4.7, GPT-5.5, or Gemini 3.1 Pro that I could find.
- Everyone should care about this number, not only the AI-risk crowd. It’s a short-term/current user problem too. “Will this model lie about whether the test passed? Will it pretend a function exists because admitting it doesn’t is inconvenient? Will this agent act shady on my behalf? How likely is it to commit a crime?”
- Putting an easy-to-digest alignment number as a featured item in the model announcement threads/blog posts creates three important side effects: developers notice they should worry about it, academics race to build better versions of this benchmark, and labs start competing on the metric.
- Even a bad first benchmark is useful. Publishing an imperfect one is how you create the incentive for someone to build a better one.
I also wrote a longer post elaborating on these points:
https://fargento.substack.com/p/alignment-benchmarks-belong-on-the