u/AgreeingElk234

Benchmarking Coding Agents: What’s actually working best with open-source models right now?

Artificial Analysis Coding Agent Index (Source: Artificial Analysis)

As a huge fan of Artificial Analysis, I was checking its model indexes and benchmarks when I noticed a new section on coding agents that compares performance across different agent and model combinations.

As a heavy user of coding agents with open-source/open-weight models, I find this information critical. It also leads me to a question: in your experience, which coding agents work best with open-source models?

Currently, I use the Claude Code + Kimi 2.6 combo, but I’d like to know your thoughts :)

u/AgreeingElk234 — 3 days ago
▲ 2 r/openrouter+1 crossposts

Kimi 2.6 "Infinite Thinking" loop on OpenRouter: No tokens consumed but stuck for 20+ mins

Since token costs have made Anthropic’s models effectively unusable, at least for a student like me, I’ve switched to open-weight models, specifically the Kimi family via OpenRouter. My current setup uses Claude Code as the coding agent, with the OpenRouter API providing the model backend. I primarily use LLMs for data science, AI work, and statistical modelling, and I hadn’t encountered any issues during several months of testing and full-time usage.

However, last night while editing a LaTeX presentation in VS Code, I ran into a serious problem. After I prompted the agent (Claude Code) with my edits, the model (Kimi 2.6) started its "thinking" process. At first I could see the live token consumption, but after I asked it to clarify some points and refactor some equations to make the text more "elegant," the model stayed in thinking mode for an unreasonable amount of time given the simplicity of the task.

Normally, I would assume the model was simply still reasoning, but OpenRouter’s monitoring options showed that no tokens were being consumed after the first response. I tried to be patient, but after about 20 minutes of "reasoning" on a basic task with no progress, I realized something was wrong.

Irritated, I ran a few tests to narrow down the problem:

(1) Fresh session: I started a new session with the same prompt. Result: the same "infinite thinking" loop.
(2) Coding task: I asked it to modify a chart in a Jupyter Notebook within the same workspace. It worked perfectly.
(3) Different workspace: I prompted a simple task in a completely different workspace. Again, no problems.

I’m a big fan of the open-source ecosystem, but I also need reliable tools. Has anyone else experienced this kind of "ghost" thinking or infinite loops with Kimi on OpenRouter? If so, how did you handle it? Any and all advice is welcome.
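One way to catch this kind of silent stall, instead of waiting 20 minutes, is an idle watchdog around the response stream: if no chunk arrives within a set window, give up and report it. The following is only a minimal sketch under stated assumptions. It treats the stream as any Python iterable of chunks, and `iter_with_stall_timeout` and `StreamStalled` are hypothetical names made up for illustration; they are not part of Claude Code, OpenRouter, or any Kimi SDK. The demo uses a fake stream rather than a real API call.

```python
import queue
import threading
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class StreamStalled(Exception):
    """Raised when no chunk arrives within the allowed idle window."""


def iter_with_stall_timeout(chunks: Iterable[T], idle_timeout_s: float) -> Iterator[T]:
    """Yield items from `chunks`, raising StreamStalled if the source goes quiet.

    A background thread reads the (possibly blocking) source and hands items
    over through a queue, so the consumer can time out while waiting.
    """
    q: queue.Queue = queue.Queue()

    def pump() -> None:
        try:
            for chunk in chunks:
                q.put(("chunk", chunk))
            q.put(("done", None))
        except Exception as exc:  # forward source errors to the consumer
            q.put(("error", exc))

    # Daemon thread: a stalled source must not keep the process alive.
    threading.Thread(target=pump, daemon=True).start()

    while True:
        try:
            kind, payload = q.get(timeout=idle_timeout_s)
        except queue.Empty:
            raise StreamStalled(f"no chunk received for {idle_timeout_s} s") from None
        if kind == "done":
            return
        if kind == "error":
            raise payload
        yield payload


def fake_stream() -> Iterator[str]:
    """Stand-in for a model stream: two chunks, then a long silence."""
    yield "thinking"
    yield "..."
    time.sleep(1.0)  # simulated stall, longer than the idle window below
    yield "never seen"


received = []
stalled = False
try:
    for chunk in iter_with_stall_timeout(fake_stream(), idle_timeout_s=0.2):
        received.append(chunk)
except StreamStalled:
    stalled = True

print(received, stalled)
```

The timeout here is on idle time between chunks, not on total duration, so a long but steadily progressing reasoning trace would not trigger it. In a real setup the idle window would need to be much larger than 0.2 s, and the right value is a judgment call.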

u/AgreeingElk234 — 7 days ago