
Edits to call out some information:
- All local models use `Q4_K_M` quantization with the `llama.cpp` engine (see the sketch after this list)
- The main factor contributing to the difference from Qwen's official post (59% vs 38%) is probably the benchmark task timeout, followed by quantization, harness, inference engine, etc.
- We expect this can be improved a lot with some prompt/harness/llama.cpp tuning
- Updated the diagram
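To make the setup concrete, here is a minimal sketch of querying a locally served Q4_K_M model through llama.cpp's OpenAI-compatible endpoint. The model file, port, and prompt are placeholders, not our actual harness configuration.

```python
import requests

# Assumes a llama.cpp server is already running with a Q4_K_M GGUF, e.g.:
#   llama-server -m some-coder-Q4_K_M.gguf -c 16384 --port 8080
# (model file, context size, and port are illustrative placeholders)
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {"role": "user",
             "content": "Write a bash one-liner that counts files in the current directory."},
        ],
        "temperature": 0.2,
        "max_tokens": 512,
    },
    timeout=600,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```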
We ran open-weight 27B–32B models on Terminal-Bench 2.0 (89 tasks, terminal-bench-2.git @ 69671fb) through our agent harness. The best result was Qwen 3.6-27B at 38.2% (34/89) under the default per-task timeout, the same constraint the public leaderboard uses (Qwen's official post uses a more relaxed config). We deliberately kept the TB official leaderboard's default setup because we wanted an apples-to-apples number against the verified leaderboard.
We also ran a separate token-speed experiment on consumer hardware. MoE models still deliver an order of magnitude (15x) higher throughput than dense models of similar size.
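For the token-speed comparison, the rough recipe is to time a fixed generation against each locally served model and divide the generated token count by wall-clock time. Below is a sketch under the assumption that both models are exposed via llama.cpp's OpenAI-compatible server; the ports and model names are illustrative, not our actual test matrix.

```python
import time
import requests

def tokens_per_second(endpoint: str, prompt: str, max_tokens: int = 256) -> float:
    """Rough decode-throughput estimate: generated tokens / wall-clock time.

    Wall-clock time includes prompt processing, so this slightly understates
    pure decode speed; good enough for an order-of-magnitude comparison.
    """
    start = time.monotonic()
    resp = requests.post(
        f"{endpoint}/v1/chat/completions",
        json={
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": max_tokens,
            "temperature": 0.0,
        },
        timeout=600,
    )
    resp.raise_for_status()
    elapsed = time.monotonic() - start
    generated = resp.json()["usage"]["completion_tokens"]
    return generated / elapsed

# Compare a dense and an MoE model served on different ports (placeholder names).
for name, url in [("dense-32b", "http://localhost:8080"),
                  ("moe-a3b", "http://localhost:8081")]:
    rate = tokens_per_second(url, "Summarize what Q4_K_M quantization does.")
    print(f"{name}: {rate:.1f} tok/s")
```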
The interesting part isn't 38.2% in absolute terms — current verified SOTA is ~80% (GPT-5.5 / Opus 4.6 / Gemini 3.1 Pro). The interesting part is what 38.2% maps to in time.
Anchoring on model release dates of verified leaderboard entries:
- Terminus 2 + Claude Opus 4.1 (released Aug 2025): 38.0%
- Terminus 2 + GPT-5.1-Codex (Nov 2025): 36.9%
- Claude Code + Sonnet 4.5 (Sep 2025): 40.1%
- Codex CLI + GPT-5-Codex (Sep 2025): 44.3%
So today's best runnable-offline coding model lands roughly where the hosted frontier was in late 2025 — about a 6–8 month lag. That's the first time this has been close enough to matter for real deployments (regulated environments, air-gapped, on-prem CI, batch workloads).
More details on our blog: https://antigma.ai/blog/2026/04/24/offline-coding-models