Usage-Based Pricing vs a Cost Ceiling: Lessons From Vercel's $10K/Day Bot Spike

June 7, 2026

[Figure: flat per-hour cost ceiling line versus a usage-based pricing bill spiking from AI bot traffic]

On April 12, 2026, traffic to Vercel's own docs AI chat endpoint spiked to roughly 10x normal. At the peak it hit about 1,300 requests per minute, routed through residential proxies that hid the real client IPs. Vercel says that, left unchecked, the inference cost would have run at more than $10,000 per day.

This wasn't a customer who misconfigured something. It was Vercel — a platform that knows this attack surface better than almost anyone — getting hit on its own infrastructure. They caught it and blocked it. But the number is the point: a single AI endpoint, automated traffic through residential proxies, a five-figure daily run rate. (Vercel: Protecting against token theft)

That's the shape of the problem with usage-based pricing in 2026. When your bill is a function of inbound requests, and inbound requests increasingly come from bots you didn't invite, your cost has no ceiling. This post is about that gap — and about what changes when your bill is a function of the instance you chose to run, not the traffic the internet decides to send you.

A note up front, because honesty matters more than a scary headline: a cost ceiling is a cost story, not a security one. Flat pricing won't stop a bot swarm from hammering your app. It changes what that swarm can do to your invoice. We'll be precise about that boundary at the end.

What usage-based pricing actually couples together

Usage-based pricing is a genuinely good model a lot of the time. When you launch with five users, paying per request or per invocation means you pay almost nothing. Scale-to-zero is real — you're not renting a box that idles at 3am. For bursty, early-stage workloads, your cost tracks your usage, and at low usage that's a rounding error.

The catch is structural. The exact property that makes usage-based pricing cheap when traffic is low — cost scales directly with traffic — is what makes it dangerous when traffic spikes for reasons that have nothing to do with your success.

A usage-metered bill has, in practice, no ceiling. It's a function of inbound traffic, and inbound traffic is not something you fully control. One viral post, one retry loop gone wrong, one aggressive crawler — and the line item that was a rounding error becomes a number you have to explain to someone.

Vercel's incident is the clean version of this. The endpoint worked exactly as designed. Each request did its job and billed for inference. The "attack" was just... more requests. That's the whole mechanism.

"Denial of wallet": when the attack is your invoice

Security people have a name for this now: a denial-of-wallet attack. A classic DDoS tries to knock you offline. A denial-of-wallet attack does the opposite: it keeps your service online and hammers it, driving your metered costs up until the bill itself is the damage.

Auto-scaling and per-request billing — the two features that make modern platforms feel magical — are exactly what the attack weaponizes. The more it hits you, the more you scale, the more you pay. There's no natural stopping point except your card declining.

This isn't fringe anymore. The threat is mainstream enough that Vercel published a defensive playbook on mitigating denial-of-wallet risks. When a major platform writes the mitigation guide for an attack on its own billing model, the problem is real.

And the traffic environment makes it worse every quarter. A large and growing share of web traffic is now automated rather than human — AI crawlers like GPTBot, ClaudeBot, Bytespider, and PerplexityBot aggressively re-crawl dynamic routes and assets. On a per-request or per-egress plan, you're paying to serve data to bots, sometimes to train someone else's model. The Vercel case is just the sharp end of that: instead of a crawler reading static pages, it was traffic hammering a metered inference endpoint, where every call costs real money.

What made it nasty operationally: the requests came through residential proxies, so per-IP rate limits had nothing useful to act on. Hundreds of thousands of requests, no single IP to block. That's the part teams underestimate — the obvious defense (rate-limit the abusive IP) quietly stops working when there isn't one.
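To make that failure mode concrete, here is a minimal Python sketch of one minute of traffic at the peak rate cited above, run through two limiters. Everything in it is illustrative: the function names, the 60-per-minute and 300-per-minute limits, and the "proxy-N" identifiers are assumptions for the example, not details of Vercel's setup or of any particular rate-limiting product.

```python
from collections import Counter

def per_ip_limiter(requests, max_per_ip):
    """Count requests allowed when each source gets its own quota."""
    seen = Counter()
    allowed = 0
    for source in requests:
        if seen[source] < max_per_ip:
            seen[source] += 1
            allowed += 1
    return allowed

def global_limiter(requests, max_total):
    """Count requests allowed under one endpoint-wide budget for the window."""
    return min(len(requests), max_total)

REQUESTS_PER_MINUTE = 1300  # peak rate cited in the article

# Case 1: all traffic from one address (a documentation-range IP).
single_ip = ["203.0.113.7"] * REQUESTS_PER_MINUTE

# Case 2: the same volume spread over rotating proxy exits,
# modeled here as one request per distinct (hypothetical) source.
rotating = [f"proxy-{i}" for i in range(REQUESTS_PER_MINUTE)]

print(per_ip_limiter(single_ip, max_per_ip=60))  # 60: the limit bites
print(per_ip_limiter(rotating, max_per_ip=60))   # 1300: nothing is blocked
print(global_limiter(rotating, max_total=300))   # 300: the budget still holds
```

The per-IP quota only helps when requests share a source. An endpoint-wide budget is blunter, since it can also throttle real users, but it keeps working when the sources rotate.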

Where a flat per-hour cost ceiling changes the math

Here's the structural difference, stated plainly.

With usage-based pricing, your cost is a function of traffic — a variable you don't control.

With flat per-hour instance pricing, your cost is a function of the instance you chose to run — a number you picked, and a number you can forecast.

This is the model Deployra uses. You run your app as a container on Kubernetes on an instance with a known size and a known hourly rate. A traffic spike — legitimate or hostile — hits the instance you're already paying for. It doesn't silently multiply your bill per request, because there is no per-request meter to multiply.

The numbers are public and flat (monthly = hourly × 730):

  • Web Service Basic-512MB (0.5 CPU, 512 MB RAM): $3.21/month
  • Web Service Basic-2GB (1.0 CPU, 2 GB RAM): $7.65/month
  • Web Service Basic-4GB (2.0 CPU, 4 GB RAM): $13.79/month

A typical full-stack app (one Web Service Basic-2GB plus a managed database, Basic-1GB at $4.41/month) runs about $12.06/month. Not "roughly twelve dollars plus whatever the internet decides to do to you this month." Twelve dollars. If a bot swarm finds your busiest endpoint, the worst case at the billing layer is that your instance is busy, not that your invoice quietly grows a digit.
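The arithmetic behind that figure is simple enough to check in a few lines of Python. This sketch uses only the monthly prices listed above. The dictionary keys and function names are ours, and the hourly rate is back-derived from the 730-hour convention, so it may differ slightly from a published hourly rate because of rounding.

```python
HOURS_PER_MONTH = 730  # the article's convention: monthly = hourly x 730

# Monthly prices as listed in the article, in USD.
MONTHLY_PRICES = {
    "Web Service Basic-512MB": 3.21,
    "Web Service Basic-2GB": 7.65,
    "Web Service Basic-4GB": 13.79,
    "Database Basic-1GB": 4.41,
}

def implied_hourly_rate(monthly_price):
    """Back out an approximate hourly rate from a monthly price."""
    return monthly_price / HOURS_PER_MONTH

def stack_monthly_cost(instance_names):
    """Sum the flat monthly prices of every instance in a stack."""
    return round(sum(MONTHLY_PRICES[name] for name in instance_names), 2)

full_stack = stack_monthly_cost(["Web Service Basic-2GB", "Database Basic-1GB"])
print(full_stack)  # 12.06
print(f"{implied_hourly_rate(MONTHLY_PRICES['Web Service Basic-2GB']):.4f}")  # 0.0105
```

Nothing in the calculation depends on request volume, which is the point: the inputs are the instances you picked.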

There's also a guardrail on the scaling side. Deployra's autoscaling uses Kubernetes HPA with explicit minReplicas and maxReplicas settings. You set the upper bound on how many replicas can spin up. Even under load, your scale — and therefore your cost ceiling — is capped at a number you defined, not whatever the traffic demands. The meter doesn't get to decide how big you get.
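As a rough sketch of what that cap means in dollars, the Python below computes the floor and ceiling of a monthly bill from the replica bounds. It assumes each replica is billed at the flat monthly price of its instance size, which is our reading of the model rather than a documented formula, and the bounds of 1 and 3 are example values.

```python
def monthly_cost_bounds(min_replicas, max_replicas, monthly_price_per_replica):
    """Return (floor, ceiling) of the monthly bill for an autoscaled service.

    Assumes every replica is billed at the same flat monthly price.
    """
    if min_replicas < 1 or max_replicas < min_replicas:
        raise ValueError("need 1 <= min_replicas <= max_replicas")
    floor = round(min_replicas * monthly_price_per_replica, 2)
    ceiling = round(max_replicas * monthly_price_per_replica, 2)
    return floor, ceiling

# Example: a Web Service Basic-2GB ($7.65/month) scaled between 1 and 3 replicas.
floor, ceiling = monthly_cost_bounds(1, 3, 7.65)
print(floor, ceiling)  # 7.65 22.95
```

Under that assumption, the worst month is the one where the service sits at maxReplicas the whole time, and that number is known before any traffic arrives.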

Run Vercel's incident through this model: 1,300 requests per minute against a flat instance means a busy box and maybe degraded latency until you act. It does not mean a $10K/day run rate, because there is no per-inference meter to run up. The traffic is the same. The financial blast radius is not.
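The contrast can be sketched as two cost functions, one that takes traffic as its input and one that does not. The per-request price of $0.005 is a hypothetical figure chosen for illustration, not Vercel's actual inference cost, and the sketch assumes the peak rate is sustained for a full 24 hours, which the incident report does not claim.

```python
HOURS_PER_MONTH = 730  # the article's monthly = hourly x 730 convention

def metered_daily_cost(requests_per_minute, price_per_request):
    """Daily cost when every request is billed: grows linearly with traffic."""
    return requests_per_minute * 60 * 24 * price_per_request

def flat_daily_cost(monthly_price):
    """Daily cost of one flat-priced instance: traffic is not an input."""
    return monthly_price / HOURS_PER_MONTH * 24

PRICE_PER_REQUEST = 0.005  # hypothetical, for illustration only

baseline = metered_daily_cost(130, PRICE_PER_REQUEST)  # one tenth of the peak
spike = metered_daily_cost(1300, PRICE_PER_REQUEST)    # the cited peak rate
flat = flat_daily_cost(7.65)                           # Web Service Basic-2GB

print(f"metered, baseline: ${baseline:,.0f}/day")  # $936/day
print(f"metered, spike:    ${spike:,.0f}/day")     # $9,360/day
print(f"flat instance:     ${flat:.2f}/day")       # $0.25/day
```

The flat instance would not serve that load comfortably, and the figures say nothing about latency or availability. They only show which variable each bill depends on.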

Be honest about what a cost ceiling does and doesn't do

If this post told you flat pricing is a magic shield against abuse, it would be lying. Here's the honest boundary.

What a flat per-hour cost ceiling genuinely gives you:

  • A forecastable number. Your monthly cost is the instances you run, full stop. You can put it in a spreadsheet before the month starts and be right at the end of it.
  • No per-request billing surface to attack. A denial-of-wallet attack inflates a meter. With flat instance pricing, hammering your endpoint doesn't inflate a per-request charge, because there isn't one. The financial blast radius of a spike is bounded by the instance, not the request count.
  • A scale cap you control. With HPA maxReplicas, you decide the maximum footprint. Your cost can't scale past the bound you set.

What it does NOT do:

  • It doesn't make you immune to attacks. A bot swarm can still degrade performance, exhaust a small instance, or affect availability. Flat pricing changes your financial exposure, not your security posture. You still want rate limiting, sane timeouts, and bot filtering at your app or proxy layer. Vercel didn't just rely on its pricing model — it blocked the bots. You should too.
  • It doesn't autoscale infinitely for free. This is the real tradeoff, and it's the honest one. A flat instance has finite capacity. A genuine, sustained surge of legitimate traffic means you'll need a bigger instance or more replicas — and that costs more. The difference: that's a predictable step you choose and can see coming, not a silent per-request multiplier that lands as a surprise. You trade "infinite invisible scaling" for "scaling you decide on and can budget for."
  • It doesn't replace billing alerts. Wherever you host, set cost alarms, cap what you can, watch your traffic. Predictable pricing makes the worst case smaller; it doesn't make monitoring optional.
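A billing alert can be as simple as projecting the current spend rate forward and comparing it to a budget. The sketch below shows only that check. Where the spend figure comes from (a provider's billing export, a usage dashboard) is outside its scope, and the function names, the $100 budget, and the sample figures are all hypothetical.

```python
def projected_daily_run_rate(spend_so_far, hours_elapsed):
    """Extrapolate spend over the elapsed window to a 24-hour run rate."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    return spend_so_far / hours_elapsed * 24

def should_alert(spend_so_far, hours_elapsed, daily_budget):
    """True when the projected run rate exceeds the daily budget."""
    return projected_daily_run_rate(spend_so_far, hours_elapsed) > daily_budget

# Example: $50 spent in the first 3 hours against a $100/day budget.
rate = projected_daily_run_rate(50.0, 3)
print(f"${rate:,.0f}/day projected")                # $400/day projected
print(should_alert(50.0, 3, daily_budget=100.0))    # True
print(should_alert(10.0, 3, daily_budget=100.0))    # False
```

Alerting on the run rate rather than the running total is a design choice: it fires within hours of a spike instead of after the budget is already spent.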

The pitch, stated plainly: not "you can never get hurt," but "the worst case at the billing layer is a number you already know."

A 10-minute gut-check for your own stack

You don't have to migrate anything to take the lesson. Spend ten minutes on these questions about wherever you deploy today:

  • What's your worst-case bill this month? If you can't answer with a specific number, your cost is a function of traffic, not of a decision you made.
  • What happens if a bot swarm finds your busiest endpoint? Does your invoice grow per request, or is it bounded by an instance you're already paying for?
  • Do residential-proxy bots break your defenses? If your only abuse control is per-IP rate limiting, Vercel's incident shows how fast that fails when there's no single IP.
  • Do you have a scale ceiling? If your platform auto-scales, is there a maxReplicas-style cap, or can it scale until your card declines?
  • Are billing alerts set? Predictable pricing helps, but alerts are your seatbelt everywhere.

If those answers make you uncomfortable, you've found the same gap Vercel's own incident points at — and it's worth closing on your terms.

The takeaway

Usage-based pricing isn't evil. It's a genuinely good fit for low, bursty, early-stage workloads. But it couples your bill to inbound traffic, and in 2026 inbound traffic increasingly means AI agents and bot swarms you didn't invite. Vercel's April 12, 2026 incident — 10x spike, ~1,300 requests per minute, a $10K/day run rate caught just in time — is what that coupling looks like at scale, on infrastructure run by people who know exactly what they're doing.

A flat per-hour cost ceiling breaks that coupling at the billing layer. Your cost becomes the instances you chose to run (a Web Service Basic-2GB is $7.65/month, a full-stack app around $12.06/month), and your scale is capped at a maxReplicas bound you set. It won't make you invincible. It will make your worst case a number you can forecast.

That's the whole promise of no surprise bills: not that nothing can go wrong, but that the bill isn't the thing that goes wrong.


Forecast Your Bill, Not Your Anxiety

Stop wondering what the internet will do to your invoice this month. Try Deployra — full-stack deployment on Kubernetes with flat per-hour instance pricing from $3.21/month, a scale ceiling you control, and no per-request meter for a bot swarm to run up.
