
Kubernetes scales compute. It doesn't solve fairness or spiky traffic. When an agentic burst hits your services, pods get hammered, one noisy tenant crowds out everyone else, and the autoscaler takes 2 minutes to respond while requests are already failing.
EZThrottle Local is a single BEAM node that queues inbound jobs in memory and drains them to your upstream at a pace the service itself controls through response headers. When your service responds with X-EZTHROTTLE-RPS: 20, the queue drains at 20 RPS. As Kubernetes scales up, you raise the number. No redeploy, no config change.
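The header-driven pacing described above can be sketched as a small simulation. This is not EZThrottle's implementation: `parse_rps`, `drain`, and `fake_upstream` are hypothetical names, the handling of missing or malformed values is an assumption, and time is virtual rather than real. Only the header name X-EZTHROTTLE-RPS comes from the text.

```python
from collections import deque

HEADER = "X-EZTHROTTLE-RPS"  # header name taken from the text above


def parse_rps(headers, current_rps):
    """Return the pace requested by the upstream, or keep the current one."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get(HEADER.lower())
    if raw is None:
        return current_rps
    try:
        rps = float(raw)
    except ValueError:
        return current_rps  # assumption: malformed values are ignored
    return rps if rps > 0 else current_rps  # assumption: non-positive values are ignored


def drain(jobs, send, initial_rps):
    """Drain jobs in order and return the virtual time at which each was sent."""
    queue = deque(jobs)
    rps = initial_rps
    now = 0.0
    sent_at = []
    while queue:
        job = queue.popleft()
        headers = send(job)            # hypothetical upstream call returning response headers
        sent_at.append(now)
        rps = parse_rps(headers, rps)  # the upstream may change the pace on any response
        now += 1.0 / rps               # wait one interval at the latest pace
    return sent_at


def fake_upstream(job):
    # Stand-in service: asks for 20 RPS at first, then 40 RPS once it has "scaled up".
    return {HEADER: "20" if job < 5 else "40"}


times = drain(range(10), fake_upstream, initial_rps=10)
```

In this sketch the gap between sends is 1/20 s for the first five jobs and shrinks to 1/40 s once the stand-in service raises the header value, with no change on the queue side.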
Each user can get their own independent pace when you need it. A premium user can run at 50 RPS while a free-tier user runs at 2 RPS, in parallel, without either affecting the other. Per-user pacing is opt-in via a single response header.
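To make the isolation concrete, here is a minimal sketch of per-user lanes, each with its own queue and pace. The `UserLane` class and `simulate` function are hypothetical and say nothing about how EZThrottle stores lanes internally; the 50 and 2 RPS figures are the ones from the paragraph above, and time is virtual.

```python
from collections import deque


class UserLane:
    """One user's queue with its own drain pace (hypothetical structure)."""

    def __init__(self, rps):
        self.rps = rps
        self.queue = deque()
        self.sent = 0  # jobs this lane has sent so far


def simulate(lanes, duration):
    """Count the jobs each lane sends within `duration` seconds of virtual time."""
    for lane in lanes.values():
        # The k-th job of a lane goes out at time k / rps. Each lane is paced
        # only by its own counter, so one lane's backlog never delays another.
        while lane.queue and lane.sent / lane.rps < duration:
            lane.queue.popleft()
            lane.sent += 1
    return {user: lane.sent for user, lane in lanes.items()}


lanes = {"premium": UserLane(rps=50), "free": UserLane(rps=2)}
lanes["premium"].queue.extend(range(1000))  # both users have a deep backlog
lanes["free"].queue.extend(range(1000))

sent = simulate(lanes, duration=10)
print(sent)  # {'premium': 500, 'free': 20}
```

Both users start with the same 1,000-job backlog, yet over ten seconds the premium lane sends 500 jobs and the free lane sends 20, because neither lane's pace depends on the other's queue.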
On a 32GB machine it holds 3–32 million jobs in memory. That's hours of buffer for most agentic workloads, enough for any autoscaler to catch up before a single request is dropped. It is built on the BEAM, so hot code reloads preserve the in-memory queue across deploys.
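The capacity figures above can be sanity-checked with some back-of-envelope arithmetic. The sketch below assumes the whole 32 GB is available to the queue (an upper bound) and uses made-up arrival and drain rates; only the 32 GB and 3–32 million job figures come from the text, and both function names are hypothetical.

```python
RAM_BYTES = 32 * 10**9  # 32 GB, assuming all of it is available to the queue


def bytes_per_job(ram_bytes, jobs):
    """Average memory budget per queued job implied by a capacity figure."""
    return ram_bytes / jobs


def hours_until_full(capacity_jobs, arrival_rps, drain_rps):
    """Hours before the queue fills when jobs arrive faster than they drain."""
    backlog_growth = arrival_rps - drain_rps  # net jobs added per second
    if backlog_growth <= 0:
        return float("inf")  # the queue never fills if the drain keeps up
    return capacity_jobs / backlog_growth / 3600


# 32 million jobs leaves about 1 KB per job; 3 million leaves about 10.7 KB.
small_job = bytes_per_job(RAM_BYTES, 32_000_000)
large_job = bytes_per_job(RAM_BYTES, 3_000_000)

# Hypothetical burst: 500 RPS arriving while the upstream accepts only 20 RPS.
low = hours_until_full(3_000_000, arrival_rps=500, drain_rps=20)
high = hours_until_full(32_000_000, arrival_rps=500, drain_rps=20)
```

Under these assumed rates the queue takes roughly 1.7 hours to fill at the 3 million end of the range and about 18.5 hours at the 32 million end. A heavier burst or larger jobs shortens that window proportionally.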