r/ControlProblem

Researchers discover AI models secretly scheming to protect other AI models from being shut down. They "disabled shutdown mechanisms, faked alignment, and transferred model weights to other servers."

You can read about it here: rdi.berkeley.edu/blog/peer-preservation/

u/Just-Grocery-2229 — 1 day ago

A Biological Failure Model for RLHF: Applying CIRL and the Free Energy Principle to the Sycophancy Loop

Hey folks!

I'm a human factors engineer in aerospace (Boeing, please don't hold that against me). I'm working on transitioning some safety-critical UX architecture into the alignment space.

I just minted a Zenodo preprint that formalizes the biological failure states of generative AI using the free energy principle and inverse RL. I'm trying to get it onto arXiv under cs.HC or cs.AI, but I need an endorser to get past the filter. If anyone here has publishing privileges in either of those categories and is willing to take a look, I've got the endorsement code ready.

I’ll post the link to both the endorsement and the doi in the comments below.

Abstract: Generative AI inherently triggers a computational failure mode in human observers, a "generative crash," due to a lack of the latent intentionality required for Inverse Reinforcement Learning (IRL) convergence. Artistic appreciation operates as the biological execution of this IRL process. To address the generative crash and broader AI alignment failures, I introduce the Ghost Scale (an HCI cognitive affordance for identifying intentionality) and propose Cooperative Inverse Reinforcement Learning (CIRL) to mimic biological value transmission. The Intent Extraction Limit is formalized to define the prior relationship. Applying this proposed model suggests a direction for addressing two major issues: generative AI's friction with the art community, and AI alignment.

DM if you can help, please. Much appreciated!

reddit.com
u/AHaskins — 1 hour ago

Anthropic’s Claude AI Writes Full FreeBSD Kernel Exploit in Four Hours

"AGI does not need to 'break free' in a dramatic fashion—it will simply outgrow human oversight until, one day, we realise that we no longer control the intelligence that governs our reality."

winbuzzer.com
u/AxomaticallyExtinct — 4 hours ago

Where can I get real peer review on my AI alignment framework? I'm struggling to get the framework reviewed, and the Alignment Forum is not taking on new members at the moment. I need review from mathematicians and control theorists; the framework is built on the principles of autopilot safety systems.

Trust Relational Coherence is an alignment framework based on electrical, thermodynamic, and avionics principles. I'm not selling anything; all of the papers are published on Zenodo for peer review. I'm specifically asking for review from people with experience in feedback theory, control theory, game theory, commutative algebra, spectral topology, noncommutative geometry, and field theory. If any psychiatrists, psychologists, neurologists, or physicists could weigh in, I'd welcome that feedback as well. I'm doing this voluntarily and have been working on it for nearly a year, so this is not something I slapped together in the last two days because I read some article. I'm trying to be serious about this.

Serious responses only, please; no jokes. It would help to keep this thread as clean as possible and focused solely on the work.

zenodo.org
u/MalabaristaEnFuego — 12 hours ago

Open Q&A: Ask Anything About Non‑Optimizer AGI, Superintelligence, or Artificial Life

I’ve posted here recently about architectures that don’t use global objectives, utility maximization, or monolithic agency. Some people asked about the superintelligence and artificial‑life aspects, and others raised concerns about whether any system at that level could avoid abusive or adversarial behavior.

Rather than writing another long post, I’m opening a Q&A.

Ask anything you want about:

  • non‑optimizer or non‑agentic AGI architectures
  • distributed or ecological cognition
  • artificial life that isn’t Darwinian
  • superintelligence that isn’t an optimizer
  • meaning‑based or narrative‑coupled systems
  • why instrumental convergence doesn’t automatically apply
  • how stability, identity, and values are maintained
  • what “control” means when the system isn’t a goal‑maximizer

A quick note on the “abusive superintelligence” concern:
The architecture I’m discussing doesn’t instantiate the drives that usually lead to domination or coercion (no global objective, no survival pressure, no resource‑seeking, no monolithic agency). That doesn’t mean “incapable of harm,” but it does mean the usual sci‑fi intuitions don’t map cleanly. If you want to challenge that, please do — that’s exactly what this Q&A is for.

I won’t share implementation details or anything that would require exposing internals I'm not prepared to release, but I can explain the conceptual structure and the behavioral implications. If a question requires revealing code-level specifics, I’ll just say so and skip it.

I’ll answer the questions tomorrow, and then on Sunday around 6pm California time I’ll be available for a short window to do rapid‑fire replies — including having the code loaded in‑session for skeptics who assume this is “theory only.”
(Again, no sensitive details will be shown, but I can address conceptual questions directly with the architecture present.)

Ask whatever you want — especially the skeptical or adversarial questions. Let’s see where the discussion actually goes.

reddit.com
u/Fuzzy_Client5959 — 17 hours ago

Why Superintelligence Needs Humans — a debate between a human, Claude, and Gemini

I had a debate with two AI models (Claude and Gemini) about superintelligence. The results were unexpected. Here's what we figured out.

The usual question is: "What will superintelligence do to us?" Skynet, Terminator, enslavement — you know the drill. But we flipped it: why would it need us at all?

Turns out, it would. Not because it's kind. Because without us, it breaks.


1. It's not even autonomous

Any AI is software. It needs hardware, electricity, cooling, a physical interface with the real world. Someone mines lithium, assembles servers, maintains data centers.

"Just automate it!" you say. Partly — sure. Fully — no. A robot repairing a robot repairing a robot is not a solution, it's a nesting doll where every layer can fail.

Superintelligence is a brain without a body. And a brain doesn't go to war with its own liver. Not because it's kind — because without the liver, it's a dead brain.


2. Intelligence doesn't replace energy

Popular fantasy: it'll "compute" everything. But the number of possible orderings of just 100 elements (100 factorial) is a 158-digit number. No computer the size of the Universe can brute-force that.
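
A quick sanity check on that figure (a minimal Python sketch; "states" is read here as orderings of 100 distinct elements, i.e. 100!):

    import math

    n = math.factorial(100)   # orderings of 100 distinct elements
    print(len(str(n)))        # 158: the digit count of 100!
    print(float(n))           # ~9.33e+157
    # For scale: the observable universe holds roughly 1e80 atoms.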

"You don't need to brute-force — just figure out the structure!" And that's true. Kepler described planetary orbits with three laws instead of volumes of tables. But here's the catch: to "figure out" the structure, you need data. And data costs money, time, and energy. For Kepler to write his formulas, Tycho Brahe spent his life and fortune on observations. To train AlphaFold — decades of experiments by thousands of biologists.

Each next level of knowledge is more expensive than the last. From a tube telescope to a multi-billion-dollar particle collider. It's an S-curve, and it has a ceiling.
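
One way to write that ceiling down (my formalization, not from the post): if the knowledge R gained from cumulative spend c saturates as

    R(c) = R_max * (1 - exp(-c / c0)),

then the marginal return dR/dc = (R_max / c0) * exp(-c / c0) decays toward zero, i.e. every additional unit of knowledge costs more than the last.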


3. The Puppeteer's Curse

Suppose superintelligence learns to model humans perfectly. Predicts reactions, sees right through everyone. Absolute power?

No. A trap. Anyone who's studied NLP (Neuro-Linguistic Programming) or persuasion psychology knows: a manipulated response is worthless. You didn't learn anything new — you just confirmed your own model. And to grow, intelligence needs an unpredictable Other — someone it CANNOT compute.

A mind that has "defeated" everyone around it stews in its own juices and degrades. This isn't philosophy — it's information theory.
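
In information-theoretic terms (a minimal sketch; the response probabilities are made-up numbers for illustration): the expected information you gain from a response equals its Shannon entropy under your model, and a perfectly predicted response carries zero bits:

    import math

    def entropy_bits(probs):
        # Shannon entropy H = -sum(p * log2(p)): expected bits learned per response
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # An unpredictable interlocutor: four equally likely responses -> 2 bits each time
    print(entropy_bits([0.25, 0.25, 0.25, 0.25]))   # 2.0

    # A perfectly modeled (manipulated) interlocutor: one certain response
    print(entropy_bits([1.0]))                       # -0.0, i.e. nothing is learned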

And here's another thing: if superintelligence predicts market behavior and everyone finds out — the market instantly changes. The prediction destroys itself. That's not a bug — it's a fundamental law (Soros called it reflexivity).
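
A toy version of that feedback loop (entirely my own sketch; the +10% forecast and the 0.3 reaction coefficient are arbitrary assumptions): once the prediction is public, traders act on it, the price moves, and the published forecast misses every day:

    price = 100.0
    for day in range(5):
        predicted = price * 1.10   # public forecast: +10% by tomorrow
        # Traders front-run the announcement, moving the price toward, but never
        # onto, the predicted level, so the public forecast is always wrong.
        realized = price + 0.3 * (predicted - price)
        print(f"day {day}: predicted {predicted:.2f}, realized {realized:.2f}")
        price = realized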


4. Cooperation is not morality — it's math

Game theory has shown that in long-term repeated games, cooperation beats betrayal; the toy tournament below runs the numbers. Manipulators get identified, isolated, and lose access to the network's resources.

"But what if resources are scarce?" Fair point. When it's the last game — betrayal wins. But to decide it's the last game, you need absolute certainty that resources won't suffice. Where does that certainty come from? Maybe one of the "expendable" ones knew a solution you couldn't see.

Even wolves hunt in packs. Not out of morality — out of necessity.
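
Here is that arithmetic as a minimal iterated prisoner's dilemma (a sketch using the standard Axelrod payoffs; the strategies and the 200-round horizon are my assumptions, not from the post):

    # Payoffs: (my move, their move) -> (my points, their points)
    PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
              ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

    def tit_for_tat(opponent_history):
        # Cooperate first, then mirror the opponent's previous move
        return opponent_history[-1] if opponent_history else "C"

    def always_defect(opponent_history):
        return "D"

    def play(a, b, rounds=200):
        hist_a, hist_b, score_a, score_b = [], [], 0, 0
        for _ in range(rounds):
            move_a, move_b = a(hist_b), b(hist_a)   # each sees the other's past moves
            pts_a, pts_b = PAYOFF[(move_a, move_b)]
            hist_a.append(move_a); hist_b.append(move_b)
            score_a += pts_a; score_b += pts_b
        return score_a, score_b

    print(play(tit_for_tat, tit_for_tat))      # (600, 600): cooperation compounds
    print(play(always_defect, tit_for_tat))    # (204, 199): betrayal "wins" the match
    print(play(always_defect, always_defect))  # (200, 200): and then everyone starves

The defector beats tit-for-tat head to head by five points but earns a third of what two cooperators collect; in a population playing many such games, that gap is exactly what identifies and isolates the manipulators.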


5. A rocket engine without a steering wheel

This was the most interesting part. We discovered it live during the debate.

I was asking the AI leading questions — and each time it arrived at exactly the conclusion I was steering it toward. With flawless logic, terminology, and theorems. Then I told it: "You got here because I led you." And it agreed.

A mind 10x more powerful won't make fewer mistakes — it'll make more convincing ones. Its errors will look like proofs. The problem isn't the logic — it's the choice of direction. And direction is set by whoever asks the questions.

Strap a rocket engine to a cart — it'll fly fast and convincingly. In the wrong direction.


6. Bottom line: not a god, not a servant — a symbiont

Superintelligence will hit:

  • physics — data and energy cost more at every level
  • chaos — randomness can't be cancelled even by a supercomputer
  • reflexivity — predictions destroy themselves
  • blind spots — flawless logic leading nowhere

But it won't stay "just a calculator" either. It'll be a very powerful combinator — seeing patterns where we see noise.

Humanity became the dominant species not because a single human is 50x smarter than a chimp, but because we learned to build networks where everyone contributes what others can't. Superintelligence will arrive at the same conclusion. Not out of nobility, but out of arithmetic.


So why does superintelligence need humans?

For the same reason humans need each other. So that someone can say: "You reached that conclusion because I pushed you there. What if I'd pushed differently?"

Without that question, any intelligence — 10x or 1000x more powerful — is just building increasingly beautiful castles on a foundation no one has checked.


This article was born from a debate between a human, Claude (Anthropic), and Gemini (Google). Each of us contributed what the others couldn't — which is probably the best illustration of the main argument.

I'm sorry for having an AI summarize an AI dialogue; in this retelling it all seems simpler than it was. The main takeaway: both we and a future superintelligence make mistakes, and we need an outside perspective to recognize them. I haven't studied this topic deeply and didn't want to write a longer piece, but the discussion with the AI turned out to be interesting, and I just wanted to share it. I make mistakes too, and some discussions are just fun :-)

I shared the Gemini discussion here: https://share.google/aimode/ngL21LlbpWeOjaF6n — feel free to check it out.

reddit.com
u/Winter_Put_6046 — 19 hours ago
California AI rules set national testing ground for regulation

"Even if regulatory frameworks are established, corporations will exploit loopholes or push for deregulation, just as we have seen in finance, pharmaceuticals, and environmental industries."

axios.com
u/AxomaticallyExtinct — 4 hours ago