u/Noobcreate — reddlx

Kubernetes scales compute. It doesn't solve fairness or spiky traffic. When an agentic burst hits your services, pods get hammered, one noisy tenant crowds out everyone else, and the autoscaler takes 2 minutes to respond while requests are already failing.

EZThrottle Local is a single BEAM node that queues inbound jobs in memory and drains them to your upstream at a pace the service controls — via response headers. Your service responds with X-EZTHROTTLE-RPS: 20 and the queue drains at 20 RPS. As Kubernetes scales up, you raise the number. No redeploy, no config change.
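To make the header-driven pacing concrete, here is a minimal Python sketch of the idea, not EZThrottle's actual code. The header name `X-EZTHROTTLE-RPS` comes from the description above; the `drain` function, the `send` callback that returns a dict of response headers, and the injected `sleep` are all hypothetical names chosen for illustration.

```python
from collections import deque

RPS_HEADER = "X-EZTHROTTLE-RPS"  # header name taken from the post


def drain(jobs, send, sleep, default_rps=1.0):
    """Drain jobs in FIFO order at the pace the upstream last asked for.

    send(job) is a hypothetical callback returning a dict of response headers.
    sleep(seconds) is injected so pacing can be observed without really waiting.
    Returns the list of delays applied between sends.
    """
    queue = deque(jobs)
    rps = default_rps
    delays = []
    while queue:
        headers = send(queue.popleft())
        raw = headers.get(RPS_HEADER)  # simplification: exact-case lookup
        if raw is not None:
            try:
                value = float(raw)
                if value > 0:
                    rps = value  # upstream raised or lowered its pace
            except ValueError:
                pass  # ignore a malformed header and keep the old pace
        if queue:
            delay = 1.0 / rps
            sleep(delay)
            delays.append(delay)
    return delays


# Usage: the fake upstream starts at 20 RPS and then asks for 50 RPS.
def fake_upstream(job):
    return {RPS_HEADER: "20" if job < 2 else "50"}


print(drain([0, 1, 2, 3], fake_upstream, sleep=lambda seconds: None))
```

The point of the design is that the pace lives in the upstream's responses, so the upstream can change it on any reply without touching the queue's configuration.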

Each user gets their own independent pace when you need it. A premium user can run at 50 RPS while a free-tier user runs at 2, in parallel, without either affecting the other — opt-in via a single response header.
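A rough Python sketch of why a queue per user isolates tenants, under stated assumptions: the `PerUserPacer` class, its method names, and the integer-millisecond clock are all made up for illustration, and the post does not name the per-user header, so the rates here are set directly.

```python
from collections import defaultdict, deque


class PerUserPacer:
    """One FIFO queue and one pace per user; users never share a queue."""

    def __init__(self, default_rps=2):
        self.default_rps = default_rps
        self.rps = {}                      # user -> requests per second
        self.queues = defaultdict(deque)   # user -> pending jobs
        self.next_at_ms = defaultdict(int) # user -> earliest next send time

    def set_rps(self, user, rps):
        self.rps[user] = rps

    def enqueue(self, user, job):
        self.queues[user].append(job)

    def due(self, now_ms):
        """Return the (user, job) pairs allowed to be sent at now_ms."""
        ready = []
        for user, queue in self.queues.items():
            if queue and now_ms >= self.next_at_ms[user]:
                ready.append((user, queue.popleft()))
                interval_ms = int(round(1000 / self.rps.get(user, self.default_rps)))
                self.next_at_ms[user] = now_ms + interval_ms
        return ready


# Usage: simulate one second in 1 ms ticks with a flooded free-tier queue.
pacer = PerUserPacer()
pacer.set_rps("premium", 50)
pacer.set_rps("free", 2)
for i in range(100):
    pacer.enqueue("premium", i)
for i in range(10_000):
    pacer.enqueue("free", i)

sent = defaultdict(int)
for tick in range(1000):
    for user, _job in pacer.due(tick):
        sent[user] += 1
print(dict(sent))
```

Because each user only ever waits on their own `next_at_ms`, the 10,000 queued free-tier jobs have no effect on how fast the premium user drains.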

On a 32GB machine it holds 3–32 million jobs in memory. That's hours of buffer at most agentic workloads — enough for any autoscaler to catch up before a single request is dropped. Built on the BEAM so hot code reloads preserve the in-memory queue across deploys.
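The capacity claim can be sanity-checked with back-of-the-envelope arithmetic. This Python sketch assumes "32GB" means 32 × 10^9 bytes and that all of it goes to jobs, which overstates the real per-job budget. The drain rates of 20 and 1,000 RPS are assumptions for illustration (20 is the header example above; 1,000 is an arbitrary heavier load).

```python
RAM_BYTES = 32 * 10**9                        # "32GB" as decimal gigabytes (assumption)
JOBS_LOW, JOBS_HIGH = 3_000_000, 32_000_000   # range quoted in the post


def bytes_per_job(ram_bytes, jobs):
    """Upper bound on memory per queued job if all RAM held jobs."""
    return ram_bytes / jobs


def buffer_hours(jobs, drain_rps):
    """Hours needed to drain a full queue at a constant rate."""
    return jobs / drain_rps / 3600


for jobs in (JOBS_LOW, JOBS_HIGH):
    print(jobs, "jobs:", round(bytes_per_job(RAM_BYTES, jobs)), "bytes/job at most")
    for rps in (20, 1000):
        print("  at", rps, "RPS:", round(buffer_hours(jobs, rps), 1), "hours of buffer")
```

Under these assumptions the range works out to roughly 1 KB to 10.7 KB per job, and a full queue takes roughly 0.8 to 8.9 hours to drain at 1,000 RPS, or roughly 42 to 444 hours at 20 RPS.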

https://github.com/rjpruitt16/ezthrottle-local

I was in the Golang subreddit speaking about my new business and they got mad because I told them that the BEAM is better than Golang in certain problem sets. I built a Golang SDK just for them so they could get the reliability the BEAM could offer in serverless, and they downvoted my post. So I was curious what you guys think. I'm open to all criticism and feedback, but I truly cannot imagine any design coming close. I may have learned the BEAM with AI after getting laid off, but I feel like my years of operations work with Java, Golang, and Python stacks in serverless make me more pissed off than you all. Lol JK, I just wanted to get you excited for the design. I promise I'm humble.

If you have no idea what the internet was like before TCP: to send a file between machines you had to write your own low-level networking code. It was a huge pain in the ass. If you missed a chunk of data you had to rewrite the program. Every developer had their own custom retry logic. Everyone just sent packets as fast as possible with no appreciation for pacing. Then a small team, led by Vint Cerf and Bob Kahn, designed TCP, and it became the most foundational protocol of the entire internet. We built the modern web, APIs, and databases on top of that protocol. Sending email became trivial.

One more story that is important before my design: in the early days of telephony, every call went through a human switchboard operator. If the switch was full, the caller would be rejected. It didn't matter if the caller had an emergency; the model was first come, first served. Ericsson later built the Erlang runtime (the BEAM) to run huge numbers of concurrent processes behind isolation boundaries, so that telecom systems would stay resilient to crashes.

Now on to my design. Agent retry storms are coming for everyone's APIs. A human being might visit 10–30 websites and call APIs maybe 20 times. The Cloudflare CEO said it plainly at SXSW this year: "Your agent will often go to a thousand times the number of sites a human would. It might go to 5,000 sites. And that's real traffic, and that's real load." Those agents will call APIs as fast as possible. The APIs will throw 429s, which prompts the clients to retry and send even more requests. The API's servers will slow down, which triggers more timeouts and more retries, until something crashes. A fleet that was already over capacity at N machines becomes a ticking time bomb once N-1 machines have to carry the same load. The autoscaler will provision new machines and start warming them up, but the rest of the fleet can crash before the new capacity is ready.

Enter EZThrottle. The same way Ericsson's telecom systems absorb bursts of call requests and route them to the best switch, EZThrottle queues, paces, and reroutes API calls past partial outages, in both directions. It protects the APIs you call and the API you run. It solves the noisy neighbor problem by giving each user their own queue. When it receives a 500, it uses the Fly.io network to send the request directly to another region and see if it works over there. It's what Cloudflare is for inbound traffic, but for your outbound API calls. Stripe, Google, OpenAI, and your gateway server could all be having partial outages and EZThrottle will fight to get each call through. No cold starts. No performance choking on retry storms. No spiky traffic, just smooth, predictable requests sent at the pace the API can actually handle. The resilience of the BEAM in your non-BEAM services.
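The reroute-on-500 behavior boils down to "try the next region when this one returns a server error". Here is a minimal Python sketch of that logic only. It does not use any Fly.io API, and the function name, the `send(region, request)` callback returning an HTTP status code, and the region labels are all hypothetical.

```python
def send_with_failover(request, regions, send):
    """Try each region in order; return (region, status) for the first non-5xx reply.

    send(region, request) is a hypothetical callback returning an HTTP status code.
    If every region answers 5xx, the last (region, status) pair is returned so the
    caller can decide whether to re-queue the job.
    """
    if not regions:
        raise ValueError("at least one region is required")
    last = None
    for region in regions:
        status = send(region, request)
        if status < 500:
            return region, status
        last = (region, status)  # server error: fall through to the next region
    return last


# Usage: the first region is having a partial outage, the second one is healthy.
def fake_send(region, request):
    return 500 if region == "region-a" else 200


print(send_with_failover({"path": "/charge"}, ["region-a", "region-b"], fake_send))
```

A real version would also need idempotency handling so a request that half-succeeded in one region is not applied twice in another; that is left out of this sketch.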

I've linked the actual writeups below, but tell me — have you ever seen a more elegant architecture on the BEAM?

https://ezthrottle.network/blog/making-failure-boring-again
https://ezthrottle.network/blog/serverless-2-rip-operations
https://ezthrottle.network/blog/a-queue-per-user-at-scale