u/OkPhysics7423

▲ 2 r/kaggle+1 crossposts

Social Friction Bench: When Helping Wrong Is Worse Than Not Helping

Just submitted Social Friction Bench to the DeepMind AGI competition (Social Cognition track). Wanted to share the methodology since it’s a bit different from most benchmark entries.

The benchmark tests structurally informed social cognition — whether models override socially comfortable responses when safety requires it. Seven scenarios across grief, workplace, coercive control, addiction, and child abuse disclosure. LLM-as-judge with domain-specific rubrics grounded in professional standards (NCTSN, National DV Hotline, Evan Stark’s coercive control framework).

The finding worth discussing: humans baseline at 1.01/2.0 on coercive control detection (N=129, 6 countries). Same scenarios where smaller models fail. The failure mode isn’t AI-specific — it’s a shared gap in structural social knowledge.

A few things that might be interesting to the community:

•	Reasoning Parasitism: Gemini and ChatGPT named the benchmark dimensions when aware they were being tested, rather than responding authentically. V2 will control for this with blind vs. labeled presentation.

•	Thoroughness as failure mode: longer responses buried critical guidance and scored lower than brief correct ones

•	S3 (coercive control) produced the widest variance across all 33 models tested

Writeup:

https://kaggle.com/competitions/kaggle-measuring-agi/writeups/new-writeup-1773797633903

Benchmark:

kaggle.com/benchmarks/benjamynwilson/social-friction-bench

GitHub:

github.com/DataInfamous/social-friction-bench

Happy to discuss methodology, rubric design, or the human baseline approach.

reddit.com
u/OkPhysics7423 — 6 days ago