r/openrouter

▲ 29 r/openrouter+5 crossposts

Heads up to everyone here.

We're launching a free plan for BetterClaw this week. Figured I should write about it properly, since a lot of you joined this sub wondering when you could actually try what we built without paying.

What you get on free:

  • 1 agent running on our infrastructure
  • unlimited chat (no restrictions on messages)
  • 100 tasks per month (one-time + cron combined)
  • 7-day memory retention (auto-purged after that)
  • 7-day chat and task history
  • daily minimum cron interval (scheduled tasks run at most once per day)
  • curated skills marketplace + Telegram + Slack webhook integrations
  • 1 vCPU / 1 GB infrastructure
  • BYOK so you bring your own API key and control your model costs
  • No credit card required to sign up

The whole point of this tier is to let people actually experience what we built. You've been reading my posts for months about cost management, skill safety, memory bloat, all the OpenClaw headaches. The free plan lets you see how we solved those without committing anything.

What BetterClaw does differently for the people who haven't been following:

  • Smart context management, so you're not burning tokens on housekeeping every time your agent checks its own pulse.
  • Secrets that auto-purge from agent memory after 5 minutes: a direct response to ClawHavoc and the whole .env exfiltration mess.
  • A verified skills marketplace where we test every skill before it hits you.
  • Workspace isolation that keeps one agent's mess out of another's context.
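The secrets piece, for what it's worth, is conceptually just a TTL store. A simplified sketch of the idea (not our production code):

```python
import time

class EphemeralSecretStore:
    """Illustrative only: secrets expire a fixed TTL after being stored."""

    def __init__(self, ttl_seconds: float = 300.0):  # 5 minutes
        self.ttl = ttl_seconds
        self._store: dict[str, tuple[str, float]] = {}

    def put(self, name: str, value: str) -> None:
        self._store[name] = (value, time.monotonic() + self.ttl)

    def get(self, name: str) -> str | None:
        entry = self._store.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[name]  # lazy purge on access
            return None
        return value
```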

None of this is revolutionary. It's just the stuff that should have been in OpenClaw from day one.

What you need to know before signing up:

It's genuinely free. Not a trial. Not a "free for 14 days, then we charge you." We don't need your card. We don't need your details beyond what's required to run an agent on our infrastructure.

BYOK means you bring your own API key, whether that's Anthropic, Z.ai for GLM-5.1, MiniMax, whatever. Your model bill goes to your provider directly; we don't take a cut on tokens.

The 100 tasks a month and the 7-day memory window are real limits. If your use case is "daily briefing + a few ad-hoc requests," free handles that easily. If you're running heavy automation with cron jobs every hour, you'll hit the wall fast. That's fine. Start free, see what your actual usage looks like, and decide from there.

What I'd actually do with the free plan:

Start with your most annoying recurring task, the one you've been hacking together with OpenClaw and fighting with every week. Move that one agent over. See if it runs smoother. If it does, you keep it. If it doesn't, you leave it and keep doing what you were doing. No drama.

This sub exists because we wanted a place where agent conversations aren't constant sales pitches. That doesn't change. The free plan is just making it easier for you to try what we built without any commitment.

Disclosure: I run BetterClaw (betterclaw.io). I've been building this for months and it's finally in a place where free access makes sense. Happy to answer questions in the comments.

u/ShabzSparq — 2 days ago
▲ 8 r/openrouter+1 crossposts

Are DeepSeek models on OpenRouter via NovitaAI and SiliconFlow the same quality as official DeepSeek?

I have been using DeepSeek V4 for coding tasks through both OpenRouter and the official DeepSeek platform. Today, I checked the OpenRouter logs and noticed that the provider for the DeepSeek model was not DeepSeek itself, but NovitaAI and SiliconFlow instead.

Now I am wondering whether these providers deliver the same quality as the original DeepSeek service or if the quality is degraded in some way.

If the quality is identical or even slightly worse, I feel like I might stop using OpenRouter and just use DeepSeek directly instead. After all, DeepSeek is the company that actually created the model, while other providers are essentially hosting it and making money from it. I would rather have that revenue go directly to DeepSeek so their team has more resources to continue improving the model.
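Side note: I know OpenRouter lets you pin a provider per request, so in theory I could stay on OpenRouter and still route to official DeepSeek. Something like this (untested, and the model slug is my guess):

```python
# Untested sketch: force the official DeepSeek provider and disable fallbacks,
# using OpenRouter's provider routing preferences.
import requests

resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "deepseek/deepseek-chat",  # slug is a guess, check the catalogue
        "messages": [{"role": "user", "content": "hello"}],
        "provider": {
            "order": ["DeepSeek"],      # try the official provider first
            "allow_fallbacks": False,   # error out instead of silently rerouting
        },
    },
)
print(resp.json())
```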

What do you guys think?

u/Existing_Arrival_702 — 14 hours ago
▲ 18 r/openrouter+1 crossposts

Built a router that picks the right LLM for each request automatically, under 1ms overhead

Been working on something that's been bugging me for a while. Every time I build something with LLMs, I end up hardcoding a model name and then spending weeks second-guessing that decision. Is gpt-4o overkill for this? Should I have used Haiku? What happens when this model has an outage?

So we built a router that handles all of this at the request level. You tell it your priority (speed, cost, quality, or balanced) and it scores every model in the catalogue using a weighted formula across latency, cost, and quality dimensions, then picks the best one. The whole scoring decision takes under 1ms because it's just math, no network call.

The weights look like this:

  • speed priority: 0.70 latency / 0.20 cost / 0.10 quality
  • cost priority: 0.20 latency / 0.70 cost / 0.10 quality
  • quality priority: 0.10 latency / 0.20 cost / 0.70 quality
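
In code, the selection step is roughly this (illustrative catalogue and numbers, not the actual repo code):

```python
# Each dimension is pre-normalized to [0, 1], where 1 is best
# (lowest latency, lowest cost, highest quality). Catalogue is made up.
PRIORITY_WEIGHTS = {
    "speed":   {"latency": 0.70, "cost": 0.20, "quality": 0.10},
    "cost":    {"latency": 0.20, "cost": 0.70, "quality": 0.10},
    "quality": {"latency": 0.10, "cost": 0.20, "quality": 0.70},
}

CATALOGUE = {
    "fast-small-model": {"latency": 0.95, "cost": 0.90, "quality": 0.55},
    "big-smart-model":  {"latency": 0.40, "cost": 0.30, "quality": 0.95},
}

def rank_models(priority: str) -> list[str]:
    weights = PRIORITY_WEIGHTS[priority]
    def score(name: str) -> float:
        return sum(weights[d] * CATALOGUE[name][d] for d in weights)
    # Best first; if the top pick fails at request time, fall back to the next.
    return sorted(CATALOGUE, key=score, reverse=True)

print(rank_models("speed"))    # ['fast-small-model', 'big-smart-model']
print(rank_models("quality"))  # ['big-smart-model', 'fast-small-model']
```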

It sits in front of OpenRouter, so you get access to the full catalogue. If the selected model fails, it falls back to the next best candidate automatically. Repeated identical requests hit Redis cache (or in-memory if you're not running Redis). FastAPI server with a CLI for dry-runs if you want to see routing decisions without burning tokens.

Curious if anyone has tried something similar or has thoughts on the scoring approach. The quality scores are static right now which is the obvious weak point.

GitHub repo is in the comments below 👇

This project was built using Neo AI Engineer and evaluated by me.

u/gvij — 1 day ago
▲ 37 r/openrouter+4 crossposts

A few things I’m trying to figure out:

  • Which Chinese LLM is currently strongest for coding (debugging, generation, reasoning over codebases)?
  • How does it compare to tools like GPT or Claude for dev work?
  • What's the easiest way to access it: API, OpenRouter, local setup, etc.?
  • Are there any free or low-cost options that are actually usable?
  • Any setup tips (IDE integrations, latency, rate limits, etc.)?

Trying to rebuild a practical dev workflow post-GHCP, so real-world experiences would really help.

u/RegisterTop3586 — 14 days ago

What is the best open-source CLI alternative to Kilo Code or OpenCode for using OpenRouter with all models?

I just tried to use my OpenRouter API key in Kilo Code and OpenCode, but I don't see all the models. I'm looking for a CLI with a good interface like OpenCode, where I can use any LLM the way I can with aider.chat.

Thanks

u/rjn2-8 — 4 days ago
▲ 1 r/openrouter+1 crossposts

Claude Code is pricing me out—tried OpenRouter & Ollama on Windows, but it's a mess. Any fixes? 🛠️

Listen, I’m over the subscription fatigue. I’m trying to get a solid agentic workflow going without selling a kidney for Claude Code, but I’ve hit a brick wall on Windows.

Here’s the "Wall of Shame" of what didn't work so far:

❌ Ollama/Local Models: Even with high-quant versions, the reasoning just isn't there for heavy lifting. It falls apart the second things get complex.

❌ The "Chinese Route": Qwen’s free tier got nuked, so that’s off the table.

❌ OpenRouter Bridge: I tried hooking Claude up through OpenRouter, but it’s been a nightmare.

❌ Environment Variables: I've fiddled with PowerShell, tweaked the API keys, and swapped the tokens. Nothing. It keeps throwing the same model errors every time.
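
For anyone about to suggest a fix: this is the bare-minimum Python call I'd use to verify the key and model slug work at all before blaming the bridge (sketch only; the slug is a placeholder):

```python
# Sanity check: hit OpenRouter directly with the OpenAI SDK. If this fails,
# the problem is the key/slug, not the Claude Code bridge.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="<OPENROUTER_API_KEY>",
)

resp = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # placeholder; swap in your target model
    messages=[{"role": "user", "content": "say ok"}],
)
print(resp.choices[0].message.content)
```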

Has anyone actually successfully bridged Claude Code to a different provider or found a local wrapper that doesn't hallucinate every third line?

Drop your setup in the comments. If you've got a config that actually breathes, I'm all ears. Cheers! 🍻

u/Downtown_Grab_2704 — 5 days ago
▲ 1 r/openrouter+1 crossposts

proxies on janitor ai

hi, so I've spent weeks, maybe even months, trying to get a proxy to work, and it just never worked. What am I doing wrong? (I've added the API key, it's just not shown in the screenshot.)

u/Similar-Produce-6228 — 2 days ago
▲ 10 r/openrouter+1 crossposts

MinusPod LLM benchmark: 32 models tested on podcast ad detection (real transcripts, human-verified)

I maintain MinusPod, a self-hosted podcast server that uses Whisper and an LLM to strip ads. Users kept asking which LLM to use, and I didn't have a real answer. So I built a benchmark.

What was tested

  • 32 models across 12 providers, from frontier (GPT-5.5, Claude Opus 4.7, Gemini 2.5 Pro, Grok 4.1, o3) down to free OpenRouter models
  • 7 podcast episodes, 6 with ads and 1 no-ad negative control, all with human-verified ad timestamps
  • Each episode split into ~85-second sliding windows. Models judge each window independently.
  • 5 trials per (model, episode) at temperature 0 to catch non-determinism
  • Predictions scored at IoU >= 0.5 against ground truth (see the sketch after this list)
  • Costs recomputed from token counts at a fixed pricing snapshot, so all rows compare at the same prices
  • ~14,400 unique calls per sweep
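
For clarity, the IoU match is just interval intersection over union. A simplified sketch of the idea (example numbers invented, not the repo code):

```python
# Spans are (start_sec, end_sec). A prediction counts as a hit when its
# IoU with a ground-truth ad span is at least 0.5.
def iou(pred: tuple[float, float], truth: tuple[float, float]) -> float:
    inter = max(0.0, min(pred[1], truth[1]) - max(pred[0], truth[0]))
    union = (pred[1] - pred[0]) + (truth[1] - truth[0]) - inter
    return inter / union if union > 0 else 0.0

# A prediction 5 s late on a 60 s truth span still counts as a hit:
print(iou((65.0, 125.0), (60.0, 120.0)))  # ≈ 0.846, passes the 0.5 bar
```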

Top results

Quick definitions for the table columns:

  • F1: combined precision and recall against human-verified ad spans. 0 means the model got nothing right, 1 means it found every ad with the correct boundaries. Higher is better.
  • Cost/episode: average USD per episode at a fixed pricing snapshot. Lower is better.
  • JSON compliance: fraction of responses that parsed as clean JSON matching the requested schema. 1.0 means every response came back well-formed. Higher is better.
Rank  Model                      F1     Cost/episode  JSON compliance
1     grok-4.1-fast              0.642  $0.15         0.87
2     qwen3.5-plus (free tier)   0.616  $0.00         1.00
3     gpt-5.5                    0.613  $3.46         0.87
4     claude-opus-4-7            0.593  $4.10         1.00
5     gemini-2.5-pro             0.549  $2.03         0.97

A few things the data surfaced:

  • Most models are heavily recall-biased. They flag non-ads as ads. o3 is the only paid model that leans the other way (precision 0.70, recall 0.48).
  • F1 and boundary accuracy don't track. Some models that score well on F1 are still 15+ seconds off on where the ad starts or ends, which matters if you're actually cutting the audio.
  • JSON schema compliance varies. o4-mini parsed cleanly only 5% of the time; combined with its 0.07 F1, it was the worst paid model in the run.
  • Self-reported confidence is poorly calibrated almost everywhere. Several models claim 0.95+ confidence at a true hit rate of 0.20 to 0.45.

Caveats

  • F1 numbers are upper-bounded by transcript quality—the benchmark scores against transcripts produced by faster-whisper large-v3 with an initial_prompt containing sponsor vocabulary. Smaller Whisper models or no vocabulary prompt will result in lower ceilings. Production results will vary.
  • Latency numbers for OpenRouter-routed models include OpenRouter queueing and upstream provider load. Treat them as indicators of availability, not model speed.
  • Data science is not my background. The metric choices (F1 at IoU 0.5, MAE for boundaries, per-bin calibration tables) are what I could defend after reading around. I'd genuinely like a critique. PRs and issues welcome, especially on scoring methodology, additional episodes, or anything I'm computing wrong.

Repo and full report: https://github.com/ttlequals0/MinusPod/tree/main/benchmarks/llm


About MinusPod

MinusPod is a self-hosted server that removes ads before you ever hit play. It transcribes episodes with Whisper, uses an LLM to detect and cut ad segments, and gets smarter over time by building cross-episode ad patterns and learning from your corrections. Bring your own LLM: Claude, Ollama, OpenRouter, or any OpenAI-compatible provider.

https://github.com/ttlequals0/MinusPod

u/ttlequals0 — 2 days ago

Why no good providers of Gemma 3.6 35B?

Gemma 4 26b A4b (https://openrouter.ai/google/gemma-4-26b-a4b-it) is a really good model, but when I use it via OpenRouter the fastest providers top out at around 30-40 tokens/s. There also seem to be cold starts where some requests take 105 seconds to complete (for short text prompts).

I could save a tremendous amount of money in my service if a proper provider existed, but for now I'm using Gemini 3.1 Flash Lite instead, which costs twice as much.

u/Sure_Proposal_9207 — 6 days ago

BYOK upstream cost and BYOK usage inference cost

I've been using OpenRouter for a week now via BYOK. From my understanding, as long as my monthly requests are under 1M, I shouldn't get charged for using BYOK. However, when I check my logs, I see a BYOK upstream cost and a BYOK usage inference cost. Could someone explain the fee structures for those? I am well under my 1M limit. Thanks!

u/kanchodaisuki — 3 days ago

I've been using Claude for a while, and as we all know the limits have become pretty unusable. So I decided to try OpenRouter, running Claude as the agent with different models underneath.

I started with smaller models (Qwen, Gemma 3.6), but their output was obviously lacking compared to Sonnet. Then I tried heavier models like DeepSeek V4 Pro, MiniMax 2.7, and GLM 5.1 for planning, with lighter models handling the actual coding. That combo works pretty well, but it costs a lot even for simple things (to the point where it's cheaper to just subscribe to a higher tier of Claude).

I'm pretty sure there's a context problem somewhere; I'm just honestly not sure where to start picking it apart.

Any recommendations on where I could learn more about improving my OpenRouter setup?

u/Ok_Skin4565 — 7 days ago

Optimization Tip Needed: Built a feature across a stack via Cline + OpenRouter. Cost hit $3.5. How to optimize multi-step agent workflows?

I’m a dev building an AI platform.

I recently built a full "Tokenomics & AI Usage Monitoring" feature consisting of 10 steps (DB models, tracker services, Admin/PI routers, and frontends), using OpenRouter and the Cline extension in VS Code. I primarily used DeepSeek V4 Pro for its amazing price-to-performance ratio.

The issue I noticed: The total cost to build this single feature reached around $3.50. I know it's cheap for the value, but I want to optimize my workflow for scaling.

My workflow:🧱

To avoid confusing the model with my massive codebase, I tried to be as organized as possible:

  1. I provided concise .md files containing the implementation roadmap and phase summaries (which I had already written before using Cline and the OpenRouter API key).

  2. I used @file to inject specific context rather than scanning the whole @codebase.

The Dilemma: 💣

🚩If I stayed in the same chat task, the context window blew up (sending the whole chat history + complex DB schemas again), costing me ~$0.50 per message.

🚩If I clicked "Start New Task" for each step, I still had to re-inject the roadmap and core .py files to get the model "up to speed" before coding, which still cost around ~$0.40 just to initiate the step.

❔❔My Question to the pros here:

1. How do you guys handle massive, complex codebases without bleeding tokens on context loading?

2. Are you using prompt caching heavily with OpenRouter/Cline for this? If so, how do you set it up effectively? (My rough guess at the request shape is below.)

3. Any specific hacks for multi-step agentic workflows so the AI remembers the "architecture rules" without paying for that context on every single prompt?
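
On question 2: this is the shape I think explicit cache breakpoints take when OpenRouter passes them through to Anthropic models. Untested sketch based on my reading of the prompt-caching docs, and the model slug is just an example; as far as I know DeepSeek's caching is automatic, so this only matters if I switch models.

```python
# Untested sketch: mark the big, stable context (roadmap + schemas) with a
# cache_control breakpoint so it's cached across requests instead of re-billed.
# Field names follow OpenRouter's Anthropic-style prompt-caching passthrough.
payload = {
    "model": "anthropic/claude-3.5-sonnet",  # example slug; caching support varies
    "messages": [
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "<roadmap.md + phase summaries + DB schemas here>",
                    "cache_control": {"type": "ephemeral"},  # cache breakpoint
                },
            ],
        },
        {"role": "user", "content": "Implement step 3 of the roadmap."},
    ],
}
```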

Would love to hear your advanced workflows!

THANKS ❤🙏

u/yozarsif1 — 6 days ago

Hey guys, I'm looking for a free OpenRouter model for content generation (mostly emails).
Can someone tell me the best free one on the market right now?

u/NighTHawK202004 — 9 days ago
▲ 3 r/openrouter+1 crossposts

I got tired of tracking best/new models for OpenRouter, so I automated it

Hi everyone!

I recently started using Hermes and OpenRouter, but one pain point has been limiting the selection of models when using openrouter/auto to get the best balance of quality and price per token.

I automated the process in two steps. On one side, there is a cron job that does sentiment analysis on models, then reranks them based on model popularity and price. On the other side, there is a skill (/automodel) that configures the selection of models in the Hermes config. As a starting point, it builds three lists (free, balanced, best), and it can pull the data from either a local folder (your own cron job output or custom source) or a URL (I have a cron job updating the demo site).

I hope you find it useful, and feel free to use it as a starting point for your own model selection workflows!

You can find the source and demo site at https://github.com/crisberrios/openrouter-hermes-automodel
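
The free list, for example, starts from nothing fancier than the public catalogue endpoint. A simplified sketch of that step (the actual cron job does the reranking on top):

```python
# Simplified sketch: build a "free" list from OpenRouter's public model
# catalogue. Pricing fields come back as strings in the /api/v1/models response.
import requests

models = requests.get("https://openrouter.ai/api/v1/models").json()["data"]
free = [
    m["id"]
    for m in models
    if float(m["pricing"]["prompt"]) == 0.0
    and float(m["pricing"]["completion"]) == 0.0
]
print(f"{len(free)} free models, e.g. {free[:5]}")
```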

----Update 1----

Based on initial feedback, updated the ranking algorithms and demo UI.

u/kambeix — 3 days ago
▲ 22 r/openrouter+6 crossposts

Everyone's facing insane costs and rate limits from Claude Code; it's gotten ridiculous these last few months. I needed a better alternative to save my money, so I found Cline, started clining, and it was amazing. But I kept thinking: imagine bringing Cline as a provider into Claude Code's mature environment to test... and it rocked. I combined Cline's cost/performance models with the Claude Code ecosystem into one product, handling cache control, API calls, ToS schemas, and building requests/responses to fit perfectly. Now I've got 13 providers, and Cline is one of my faves. You guys gotta try it: https://github.com/AbdoKnbGit/tau

u/JhonDoe191ee — 6 days ago