The $5,000 OpenAI bill story: why every vibe-coded app needs rate limiting
Quick PSA for anyone who built an AI app with Lovable, Bolt, or v0:
Your AI endpoint probably has no rate limiting. The AI builder generated something like this:
const { prompt } = req.body;
const response = await openai.chat.completions.create({ ... });
res.json({ result: response.choices[0].message.content });
Notice what's missing? No auth check. No per-user limit. No per-IP throttle. No cap on prompt size. No max_tokens on the response.
Now imagine someone finds your endpoint URL and points a script at it:
while true; do
  curl -X POST https://yourapp.com/api/chat \
    -H 'Content-Type: application/json' \
    -d '{"prompt": "Write a 5000-word essay"}'
done
That script runs all night. By morning, your OpenAI bill could be anywhere from $1k to $5k, depending on the model and how many requests got through. The charges have already cleared. Sometimes you can get credits refunded, sometimes not.
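To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. Every input is an assumption for illustration, not a real price or measurement: the throughput (30 completed requests per minute, which in practice means several loops running in parallel), the essay size (about 7,000 output tokens), and the per-token price are all placeholders. It also ignores input tokens. Plug in your own model's pricing.

```javascript
// Rough overnight cost estimate. All inputs are hypothetical placeholders;
// look up your provider's current pricing before trusting any number here.
function estimateOvernightCost({
  hours,
  requestsPerMinute,
  outputTokensPerRequest,
  usdPerMillionOutputTokens,
}) {
  const requests = Math.round(hours * 60 * requestsPerMinute);
  const tokens = requests * outputTokensPerRequest;
  // Multiply before dividing so whole-number inputs stay exact.
  const usd = (tokens * usdPerMillionOutputTokens) / 1_000_000;
  return { requests, tokens, usd };
}

const estimate = estimateOvernightCost({
  hours: 8,
  requestsPerMinute: 30,          // assumed attacker throughput
  outputTokensPerRequest: 7000,   // assumed size of a "5000-word essay"
  usdPerMillionOutputTokens: 10,  // placeholder price, not a real quote
});
console.log(estimate);
// { requests: 14400, tokens: 100800000, usd: 1008 }
```

With a placeholder price five times higher, the same traffic lands around $5k, which is how the range above comes about.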
The fix is layered rate limiting:
- Per-IP (5 requests/minute, blocks anonymous abuse)
- Per-authenticated-user (100 requests/day, stops a logged-in user's runaway loop or buggy client)
- Global cost cap (2000 requests/day across everyone, caps total damage)
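Here is a minimal sketch of how those three layers could fit together, using a plain in-memory fixed-window counter. The function names (`createLimiter`, `createLayeredLimiter`) and the shape of the result object are my own for illustration. Treat it as a way to see the mechanism, not a production setup: the counters live in one process, so they reset on restart and are not shared across serverless instances.

```javascript
// Fixed-window counter: allow at most `limit` hits per key per `windowMs`.
// Note: the Map is never pruned here; a real version needs eviction.
function createLimiter(limit, windowMs) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      // No window yet, or the old one expired: start a fresh one.
      windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (w.count >= limit) return false;
    w.count += 1;
    return true;
  };
}

const MINUTE = 60 * 1000;
const DAY = 24 * 60 * MINUTE;

// Layers use the numbers from the list above.
function createLayeredLimiter() {
  const perIp = createLimiter(5, MINUTE);
  const perUser = createLimiter(100, DAY);
  const everyone = createLimiter(2000, DAY);

  // Returns { allowed: true } or { allowed: false, layer } for the
  // first layer that refuses the request.
  return function check({ ip, userId }, now = Date.now()) {
    if (!perIp(ip, now)) return { allowed: false, layer: "ip" };
    if (userId && !perUser(userId, now)) {
      return { allowed: false, layer: "user" };
    }
    if (!everyone("all", now)) return { allowed: false, layer: "global" };
    return { allowed: true };
  };
}
```

In a handler you would call `check({ ip, userId })` before touching the OpenAI client and return a 429 when `allowed` is false. The cheapest check (per-IP) runs first so anonymous floods never count against the user or global budgets.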
Plus: cap your prompt.length (reject anything over 2000 chars), set max_tokens on the response, and set up usage alerts in your provider dashboard.
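A minimal sketch of the input and output caps, assuming a request body shaped like `{ prompt }` as in the snippet at the top. The helper name `buildChatParams`, the 500-token output cap, and the choice of status codes are my assumptions; the 2000-character limit is the one from this post. The model name is passed in by the caller rather than hard-coded.

```javascript
const MAX_PROMPT_CHARS = 2000;  // limit suggested in the post
const MAX_OUTPUT_TOKENS = 500;  // assumed value; tune for your app

// Validates the body and builds the arguments for
// openai.chat.completions.create(). Returns { ok: false, status, error }
// on bad input so the handler can reject before spending any tokens.
function buildChatParams(body, model) {
  const prompt = body && body.prompt;
  if (typeof prompt !== "string" || prompt.trim().length === 0) {
    return { ok: false, status: 400, error: "prompt must be a non-empty string" };
  }
  if (prompt.length > MAX_PROMPT_CHARS) {
    return {
      ok: false,
      status: 413,
      error: `prompt exceeds ${MAX_PROMPT_CHARS} characters`,
    };
  }
  return {
    ok: true,
    params: {
      model,
      messages: [{ role: "user", content: prompt }],
      max_tokens: MAX_OUTPUT_TOKENS, // hard ceiling on the response size
    },
  };
}
```

In the handler, something like `const built = buildChatParams(req.body, model)` followed by `openai.chat.completions.create(built.params)` when `built.ok` is true. Some newer models may expect `max_completion_tokens` instead of `max_tokens`, so check your provider's docs for the model you use.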
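For serverless deployments, Upstash Ratelimit is a common choice, since it keeps the counters in hosted Redis instead of in instance memory that disappears between invocations. The free tier handles 10k commands/day (at the time of writing), plenty for most starting apps.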
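A rough sketch of the per-IP layer with Upstash, assuming the `@upstash/ratelimit` and `@upstash/redis` packages are installed and the `UPSTASH_REDIS_REST_URL` / `UPSTASH_REDIS_REST_TOKEN` environment variables are set. The helper names (`createIpLimiter`, `rateLimitResponse`), the `rl:ip` prefix, and the response shape are mine for illustration. The packages are loaded inside the factory so the second helper has no hard dependency on them.

```javascript
// Builds a 5 requests/minute sliding-window limiter backed by Upstash Redis.
async function createIpLimiter() {
  const { Ratelimit } = await import("@upstash/ratelimit");
  const { Redis } = await import("@upstash/redis");
  return new Ratelimit({
    redis: Redis.fromEnv(), // reads the UPSTASH_REDIS_REST_* env vars
    limiter: Ratelimit.slidingWindow(5, "1 m"),
    prefix: "rl:ip", // hypothetical key prefix
  });
}

// Returns null when the request may proceed, or a description of the
// 429 response to send. Works with any object exposing limit(id).
async function rateLimitResponse(limiter, identifier) {
  const { success, reset } = await limiter.limit(identifier);
  if (success) return null;
  // `reset` is a Unix timestamp in milliseconds.
  const retryAfter = Math.max(0, Math.ceil((reset - Date.now()) / 1000));
  return {
    status: 429,
    headers: { "Retry-After": String(retryAfter) },
    body: { error: "Too many requests" },
  };
}
```

In a handler you would create the limiter once at module scope, call `rateLimitResponse(limiter, ip)` at the top, and return early if it gives you anything other than `null`. The per-user and global layers are the same pattern with a different limit, window, and identifier.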
I wrote a longer guide with full code examples for Vercel/Node. Happy to drop the link in the comments if anyone wants it.
Anyone here actually been hit by this? Curious what the real bill numbers look like in the wild.