Is there a way to benchmark tokens/sec for the same model across providers?
I’m trying to compare throughput (tokens/sec) for the same model (e.g., DeepSeek V4 Flash) across different providers without manually testing each one.
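If no leaderboard covers the providers you care about, you can roll a quick comparison yourself: send the same prompt to each provider and divide the provider-reported completion-token count by wall-clock time. A minimal sketch, assuming each provider exposes an OpenAI-compatible `/v1/chat/completions` endpoint that returns `usage.completion_tokens` (the base URL, key, and model slug below are placeholders, not real values):

```python
import json
import time
import urllib.request

def tokens_per_sec(n_tokens: int, elapsed_s: float) -> float:
    """Throughput in tokens/sec; guards against a zero elapsed time."""
    return n_tokens / elapsed_s if elapsed_s > 0 else 0.0

def benchmark(base_url: str, api_key: str, model: str, prompt: str) -> float:
    """Time one completion request and return tokens/sec.

    Uses the token count the provider itself reports, so numbers are
    comparable across providers regardless of tokenizer differences.
    """
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    start = time.perf_counter()
    with urllib.request.urlopen(req, timeout=120) as resp:
        body = json.load(resp)
    elapsed = time.perf_counter() - start
    return tokens_per_sec(body["usage"]["completion_tokens"], elapsed)
```

Usage would be something like `benchmark("https://api.provider-a.example/v1", "KEY_A", "deepseek-v4-flash", "Count to 50.")` per provider, averaged over a few runs. Note this measures end-to-end latency including time-to-first-token; for pure decode speed you'd want a streaming request that times from the first chunk instead.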
I'd like to give it a set of rules to follow when refactoring code, plus a rule to always commit with Conventional Commits after each meaningful change.
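As far as I know, OpenCode picks up project instructions from an AGENTS.md file in the repo root (worth double-checking against the docs for the exact filename). A sketch of what such a rules file might look like, with the rules themselves purely illustrative:

```markdown
## Refactoring rules
- Prefer small, behavior-preserving refactors; never change a public API without asking first.
- Extract helpers instead of deepening nesting; keep functions short.
- Run the test suite after each refactor and fix any breakage before moving on.

## Commit rules
- After each meaningful change, commit using Conventional Commits,
  e.g. `refactor(parser): extract tokenizer helper` or `fix(api): handle empty request body`.
- One logical change per commit; never batch unrelated edits.
```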
I tried ollama/qwen3.5:0.8b, but it just pastes markdown every time and never actually completes the tasks it's given.
I was using OpenCode with its Zen MiniMax M2.5 Free model and hit a rate limit. I switched to OpenRouter's Gemma 4 31B free (a different provider entirely), but I'm still seeing the same rate limit message.
That makes me think it's not the upstream API but OpenCode itself clamping down. Does OpenCode enforce its own global rate limits per user/IP? Or could something be cached or carrying over between providers?
Anyone else run into this?