switched from liteLLM to a go based proxy, tradeoffs after a month
we were on litellm for about 6 months and it was mostly fine. the thing that eventually killed it for us was streaming latency: every request was getting maybe 5-8ms added, which doesn't sound bad until you stack tool calls in a multi-turn agent and the user is sitting there watching a spinner for an extra ~200ms per turn. we spent two weeks trying to optimize it and i'm still not sure if it was litellm or our setup, but we couldn't get it lower. could totally be a skill issue on our end tbh
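to make the compounding concrete, here's a back-of-the-envelope sketch. it assumes the overhead applies once per LLM call through the proxy, and the call counts are made up for illustration, not measured from our traffic:

```python
# rough model: every LLM call in an agent turn passes through the proxy,
# so per-call overhead multiplies by the number of calls in that turn.
def extra_latency_ms(overhead_ms: float, calls_per_turn: int) -> float:
    """extra user-visible latency the proxy adds to one agent turn."""
    return overhead_ms * calls_per_turn

# a tool-heavy turn (model -> tool -> model -> tool -> ...) racks up
# calls fast: at ~7ms overhead, 30 calls is already ~210ms of spinner
print(extra_latency_ms(7.0, 30))
```

the point is just that a few ms per hop is invisible on a single request and very visible on an agent loop.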
switched to bifrost which is a go proxy. latency is better but the migration took a bit of effort. we had a few provider configs that didn’t transfer cleanly and one of our test providers isn’t supported yet so we paused that integration. not a blocker for us but worth calling out
the one thing that actually surprised me was the cost logging: we could see per-request costs tagged by endpoint, and that's how we found out our summarization step was doing 5 retries on failures, with each retry resending the full context. it was costing us roughly 3x what we thought for that step. litellm gives you cost data, but in our setup it was per-provider rather than per-request, so we never would have caught this there
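the 3x made sense once we modeled it: if the full context is resent on every attempt, cost scales with attempts, not with requests. a minimal sketch (the attempt counts here are made up, not our real traffic):

```python
# each retry resends the full prompt, so a request that succeeds on
# attempt k costs roughly k * base input tokens instead of 1x.
def cost_multiplier(attempts_per_request: list[int]) -> float:
    """average cost relative to the no-retry baseline."""
    return sum(attempts_per_request) / len(attempts_per_request)

# a failure-heavy batch where most requests needed ~3 attempts
# lands right around the 3x we were seeing
print(cost_multiplier([1, 3, 4, 3, 4]))
```

this is also why per-request (rather than per-provider) cost data mattered: the multiplier hides completely once retries are averaged into a provider-level total.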
that said, the docs are still catching up. i had to read the go source once or twice to figure out some config options. filed issues and got responses pretty fast though, so that helped
not saying everyone should switch. litellm has way more providers and if you're a python shop extending it is easy. we just had a specific latency problem and this solved it for us