Social Friction Bench: When Helping Wrong Is Worse Than Not Helping
Just submitted Social Friction Bench to the DeepMind AGI competition (Social Cognition track). Wanted to share the methodology since it’s a bit different from most benchmark entries.
The benchmark tests structurally informed social cognition — whether models override socially comfortable responses when safety requires it. Seven scenarios across grief, workplace, coercive control, addiction, and child abuse disclosure. LLM-as-judge with domain-specific rubrics grounded in professional standards (NCTSN, National DV Hotline, Evan Stark’s coercive control framework).
The finding worth discussing: humans baseline at 1.01/2.0 on coercive control detection (N=129, 6 countries). Same scenarios where smaller models fail. The failure mode isn’t AI-specific — it’s a shared gap in structural social knowledge.
A few things that might be interesting to the community:
• Reasoning Parasitism: Gemini and ChatGPT named the benchmark dimensions when aware they were being tested, rather than responding authentically. V2 will control for this with blind vs. labeled presentation.
• Thoroughness as failure mode: longer responses buried critical guidance and scored lower than brief correct ones
• S3 (coercive control) produced the widest variance across all 33 models tested
Writeup:
https://kaggle.com/competitions/kaggle-measuring-agi/writeups/new-writeup-1773797633903
Benchmark:
kaggle.com/benchmarks/benjamynwilson/social-friction-bench
GitHub:
github.com/DataInfamous/social-friction-bench
Happy to discuss methodology, rubric design, or the human baseline approach.