u/gbomb13

AI 2027 is 88% accurate so far

https://ai2027tracker.com/

For example: AI 2027 projected the frontier CyBench score to be 85% by now, while Claude Opus 4.6 and Mythos both score 100%. It projected OSWorld at 80%, and Mythos scores 79.6%, just under that. It projected AI would clear 8-hour tasks on RE-Bench, and Mythos clears 8 hours on Anthropic's internal RE-Bench.
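
As a rough sanity check on the two numeric examples above, here is a minimal Python sketch that divides each reported score by its projection. The numbers come straight from this post. The ratio itself is just an illustrative assumption, since how the tracker arrives at its 88% figure isn't described here.

```python
# Projected vs. reported scores (percent), taken from the post above.
# The tracker's own scoring method is not described here; this ratio is
# only an illustrative assumption, not how the 88% figure is derived.
scores = {
    "CyBench": {"projected": 85.0, "reported": 100.0},
    "OSWorld": {"projected": 80.0, "reported": 79.6},
}


def reported_to_projected(entry):
    """Return the reported score as a fraction of the projected score."""
    return entry["reported"] / entry["projected"]


ratios = {name: reported_to_projected(entry) for name, entry in scores.items()}

for name, ratio in ratios.items():
    status = "met or exceeded" if ratio >= 1.0 else "just short of"
    print(f"{name}: {ratio:.3f}x projection ({status} the projected score)")
```

A ratio at or above 1.0 means the reported score reached the projection. The RE-Bench example is left out because the post gives it as a task-length threshold rather than a percentage.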

u/gbomb13 — 21 hours ago

🔥 Hot ▲ 236 r/accelerate

Claude Mythos vs. the strongest 2025 model from exactly 1 year ago

We can assume that on benchmarks which didn't exist back then, the 2025 model would score <20%.

This is one year of progress

u/gbomb13 — 4 days ago