Switched 70% of our agent traffic to DeepSeek R2 without a redeploy. Here's how
DeepSeek R2 came out last week; pricing roughly 70% lower than the Western frontier models we were using. For a pre-seed startup that number matters.
The problem with switching models mid-production: we had LangChain agents with prompts tuned to a specific provider's behavior. Every previous model switch meant updating config, testing, redeploying, and praying nothing broke at 2am. With 3 people on the team that's a half-day minimum.
What we did instead: route through a gateway with weighted routing config. Set R2 to handle 30% of traffic initially, watch error rates and output quality for 48 hours, then bump to 70%. No code changes. No redeploys. If R2 started producing bad outputs we could roll back in 30 seconds by changing a config value.
The 48-hour shadow period caught one prompt that broke badly on R2's tool-call format. Fixed it before it ever hit majority traffic. Would have been a production incident if we'd done a hard cutover.
Bill dropped 41.3% in the first week. Still watching quality metrics but so far no regressions on the tasks that matter.