r/LLM_Gateways

Been using this cheap LLM gateway for a bit

https://oneapi.unit23api.com/

OpenAI + Claude API fully compatible. Just swap the base URL and your existing code works. Only running DeepSeek and MiniMax-m2.5 for now. The cheapest I've found. New accounts get $5 free to try it out. Worth checking if you're burning through API budgets.
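for anyone wondering what "just swap the base URL" means in practice, here's a rough stdlib-only sketch. the endpoint path, model name, and key are placeholders i made up, not from the gateway's docs:

```python
import json
import urllib.request

# "just swap the base URL": the OpenAI-style request is built identically,
# only the host changes. path/model/key below are illustrative placeholders.
BASE_URL = "https://oneapi.unit23api.com/v1"  # was: https://api.openai.com/v1

def build_chat_request(api_key, model, messages):
    """Build an OpenAI-style chat completions request; only the host differs."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )

req = build_chat_request("sk-...", "deepseek-chat",
                         [{"role": "user", "content": "hi"}])
print(req.full_url)  # the only thing that changed vs the openai endpoint
```

same idea applies to any OpenAI-compatible SDK: point the client's base URL at the gateway and leave the rest of the code alone.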

u/Dry_Information_6567 — 4 days ago

switched from litellm to a go-based proxy, tradeoffs after a month

we were on litellm for about 6 months and it was mostly fine. the thing that eventually killed it for us was streaming latency: every request was getting maybe 5-8ms added, which doesn't sound bad until you stack tool calls in a multi-turn agent and the user is sitting there watching a spinner for an extra 200ms per turn. we spent two weeks trying to optimize it and i'm still not sure if it was litellm or our setup, but we couldn't get it lower. could totally be a skill issue on our end tbh

switched to bifrost which is a go proxy. latency is better but the migration took a bit of effort. we had a few provider configs that didn’t transfer cleanly and one of our test providers isn’t supported yet so we paused that integration. not a blocker for us but worth calling out

the one thing that actually surprised me was the cost logging. we could see per-request costs tagged by endpoint, and that's how we found out our summarization step was doing 5 retries on failures, with each retry resending the full context. it was costing us roughly 3x what we thought for that step. litellm gives you cost data but it's per-provider, not per-request, so we never would have caught that
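the retry math, with made-up numbers (rates, context size, and failure share are all illustrative, not our actual bill):

```python
# back-of-envelope of the retry amplification above. every number here is
# hypothetical, picked only to show how the multiplier sneaks up on you.
PRICE_PER_M_INPUT = 1.0   # $ per million input tokens (made up)
CONTEXT_TOKENS = 50_000   # full context resent on every attempt

def step_cost(attempts):
    """cost of one summarization step when each attempt resends the full context"""
    return attempts * CONTEXT_TOKENS / 1_000_000 * PRICE_PER_M_INPUT

expected = step_cost(1)        # what you budget assuming no retries
actual = step_cost(1 + 5)      # 1 try + 5 retries on a failing request: 6x
blended = 0.6 * step_cost(1) + 0.4 * step_cost(6)  # if ~40% of requests fail
print(actual / expected, blended / expected)
```

a 6x blowup on failing requests blended over a partial failure rate lands right around that "roughly 3x" surprise, which is exactly the kind of thing per-request cost tagging surfaces and per-provider totals hide.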

that said the docs are still catching up. i had to read go source code once or twice to figure out some config options. filed issues and got responses pretty fast though so that helped

not saying everyone should switch. litellm has way more providers and if you're a python shop extending it is easy. we just had a specific latency problem and this solved it for us

u/llamacoded — 6 days ago

LLM Pricing is 100x Harder Than You Think: We open-sourced our pricing database (3,500+ models, free API)

hey community,

i saw a thread here a couple months ago asking this exact question and it resonated hard.


I've been building LLM cost infrastructure for Portkey's gateway for the last 3 years and the answer is: it's not solved because the problem is way more complex than it looks.


the naive formula (cost = tokens × rate) breaks in at least 6 ways:

  1. thinking tokens — reasoning models consume tokens for internal reasoning that never appear in the response. you still pay. if you only count visible output, you undercount agentic workloads by 30-40%.
  2. cache asymmetry — anthropic charges 25% more for cache writes ($3.75/M vs $3.00/M). openai charges nothing for writes. reads are discounted differently. a single "cache discount" multiplier is wrong for at least one provider.
  3. context thresholds — cross 128K tokens and per-token cost can double. nothing in the API response tells you which tier you hit.
  4. same model, different prices — kimi k2.5: $0.5/$2.8 on together, $0.6/$3.0 on fireworks. bedrock prepends regional prefixes, azure returns deployment names. you need extra logic just to resolve the model ID.
  5. non-token billing — images bill by resolution, video by second, audio has separate i/o rates, embeddings are input-only. each maps to a completely different pricing structure.
  6. new dimensions — started with 2 billing dimensions (input/output tokens). now 20+. web search, grounding, code execution each have their own cost model.
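to make a few of those breakages concrete, here's a toy cost function covering thinking tokens, cache write/read asymmetry, and context-length tiers. all rates and thresholds are illustrative, not any provider's real price sheet:

```python
from dataclasses import dataclass

# toy cost model for three of the failure modes above. rates are made up.
@dataclass
class Rates:
    input_per_m: float          # $ / M input tokens
    output_per_m: float         # $ / M output tokens (thinking tokens bill here too)
    cache_write_per_m: float    # often != input rate (e.g. a 25% premium)
    cache_read_per_m: float     # usually heavily discounted
    long_ctx_multiplier: float = 2.0     # applied past the threshold
    long_ctx_threshold: int = 128_000

def request_cost(r, input_tok, output_tok,
                 thinking_tok=0, cache_write_tok=0, cache_read_tok=0):
    # the naive formula is just input_tok * in_rate + output_tok * out_rate;
    # everything below is a correction the naive formula misses.
    mult = (r.long_ctx_multiplier
            if input_tok + cache_read_tok > r.long_ctx_threshold else 1.0)
    return (
        input_tok * r.input_per_m * mult
        + (output_tok + thinking_tok) * r.output_per_m  # pay for invisible reasoning
        + cache_write_tok * r.cache_write_per_m         # writes can cost MORE than input
        + cache_read_tok * r.cache_read_per_m
    ) / 1_000_000

rates = Rates(input_per_m=3.0, output_per_m=15.0,
              cache_write_per_m=3.75, cache_read_per_m=0.30)
visible = request_cost(rates, input_tok=10_000, output_tok=2_000)
with_thinking = request_cost(rates, input_tok=10_000, output_tok=2_000,
                             thinking_tok=1_000)
print(visible, with_thinking)  # counting only visible output undercounts
```

and that's before model-ID resolution, per-provider price differences, and non-token billing, which don't fit in a per-token function at all.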

and we open-sourced the pricing database we use in production:

  • github + free API: github.com/portkey-ai/models
  • 3,500+ models, 50+ providers
  • updated daily via an automated agent (claude agent SDK + skill files)
  • MIT license

if you're maintaining a pricing JSON somewhere in your repo, this might help

u/Wonderful-Agency-210 — 7 days ago