How we keep LLM costs predictable without a database
June 8, 2026 · Quravin
The model is the bill. Everything else (Lambda, S3, the CDN) rounds to zero at our scale. So cost control really means controlling model spend, and the accounting has to be exact.
A hard counter on S3
We didn’t want to run a database just to count requests. Instead we use S3 conditional writes (If-Match on the ETag) as a compare-and-swap: read the counter object and its ETag, increment the value, and write it back only if the ETag still matches. If another writer got there first, S3 rejects the write, and we re-read and retry. That gives an atomic monthly request counter and a daily cost counter per organization: durable, serverless, and correct under concurrency.
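The read, increment, conditional write, retry loop can be sketched as follows. This is a minimal illustration, not our production code: `InMemoryStore` is a hypothetical stand-in for an S3 bucket so the example runs without AWS credentials, and `increment`, `PreconditionFailed`, and the key names are assumptions made for the sketch. Against real S3, the `put` call would be a PutObject request carrying an If-Match header (or If-None-Match: * when the counter object does not exist yet), and a failed precondition would surface as an HTTP 412 response.

```python
import threading


class PreconditionFailed(Exception):
    """Raised when a conditional write loses the race (S3 answers HTTP 412)."""


class InMemoryStore:
    """Hypothetical stand-in for a bucket that supports conditional writes."""

    def __init__(self):
        self._objects = {}  # key -> (body, etag)
        self._lock = threading.Lock()
        self._next_etag = 0

    def get(self, key):
        """Return (body, etag), or (None, None) if the key does not exist."""
        with self._lock:
            return self._objects.get(key, (None, None))

    def put(self, key, body, if_match=None, if_none_match=False):
        """Store body only if the stated precondition still holds."""
        with self._lock:
            _, current_etag = self._objects.get(key, (None, None))
            if if_none_match and current_etag is not None:
                raise PreconditionFailed(key)
            if if_match is not None and if_match != current_etag:
                raise PreconditionFailed(key)
            self._next_etag += 1
            self._objects[key] = (body, f"v{self._next_etag}")


def increment(store, key, delta=1, max_retries=100):
    """Compare-and-swap increment: read, add, conditional write, retry."""
    for _ in range(max_retries):
        body, etag = store.get(key)
        value = int(body) if body is not None else 0
        new_body = str(value + delta).encode()
        try:
            if etag is None:
                # First write: succeed only if nobody created the object yet.
                store.put(key, new_body, if_none_match=True)
            else:
                # Later writes: succeed only if the object is unchanged.
                store.put(key, new_body, if_match=etag)
            return value + delta
        except PreconditionFailed:
            continue  # another writer won the race; re-read and try again
    raise RuntimeError(f"too many conflicts on {key}")
```

A caller would bump a counter with something like `increment(store, "org-123/requests/2026-06")`, where the key layout is again only an illustration. The retry bound keeps a hot counter from spinning forever; a real deployment would likely add backoff with jitter between attempts.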
Three layers of protection
- Monthly request quota per plan (with per-app overrides).
- Daily USD cost cap — a hard ceiling that stops runs before they spend.
- Anonymous limits — per-IP and a global daily ceiling on the public try-it endpoint, plus a Turnstile bot check, so the free tools can’t be turned into someone else’s free GPU.
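To make the layering concrete, here is a rough sketch of how the three checks might be ordered before a run starts. Everything in it is an assumption for illustration: the names `Limits`, `Usage`, `admit`, and `admit_anonymous`, the reason strings, the idea of passing an estimated cost up front, and the choice to track money as integer millionths of a dollar to avoid floating-point rounding. The usage numbers would come from the counters described above, and the bot-check flag from verifying the Turnstile token elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Limits:
    monthly_requests: int          # plan-level monthly request quota
    daily_cost_cap_micros: int     # hard daily ceiling, in millionths of a USD
    app_overrides: dict = field(default_factory=dict)  # app_id -> monthly quota


@dataclass
class Usage:
    monthly_requests: int          # requests counted so far this month
    daily_cost_micros: int         # spend counted so far today


def admit(limits, usage, app_id, estimated_cost_micros):
    """Return None if the run may start, otherwise a short refusal reason."""
    # Layer 1: monthly quota, with a per-app override when one is configured.
    quota = limits.app_overrides.get(app_id, limits.monthly_requests)
    if usage.monthly_requests >= quota:
        return "monthly_quota_exceeded"
    # Layer 2: daily cost cap, checked before the run spends anything.
    if usage.daily_cost_micros + estimated_cost_micros > limits.daily_cost_cap_micros:
        return "daily_cost_cap_exceeded"
    return None


def admit_anonymous(bot_check_passed, ip_requests_today, global_requests_today,
                    per_ip_limit, global_limit):
    """Layer 3: gate for the public try-it endpoint."""
    if not bot_check_passed:
        return "bot_check_failed"
    if ip_requests_today >= per_ip_limit:
        return "ip_limit_exceeded"
    if global_requests_today >= global_limit:
        return "global_limit_exceeded"
    return None
```

Returning a reason string rather than a boolean lets the caller tell the user which limit was hit, for example a quota message versus a cost-cap message.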
The payoff
You get a bill you can predict and a kill-switch you control, without operating any stateful infrastructure. Read more in the FAQ or start building.