Even Your Spending Cap Has a Spending Cap

You did the responsible thing. You opened the billing console, found the budget setting, typed in a number you could live with, and clicked save. Then you went back to building. That number was supposed to be the wall — the point where the meter stops and someone has to make a decision before the spend continues.
Here's the thing: on most clouds, it isn't a wall. It's a doorbell. And worse, the real ceiling above it isn't fixed either — it quietly ratchets upward as you spend more. Your spending cap has a spending cap, and that one moves on its own.
This post is about why "I set a budget" and "my spend is bounded" are two completely different statements, what the 2026 changes to Google's billing actually do, and what an upper bound on spend looks like when the platform is built to have one.
A budget alert is not a budget cap
Start with the most expensive misconception in cloud computing: that a budget stops spending.
It doesn't. On Google Cloud, the documentation is explicit: setting a budget does not cap your usage or your spend. Budgets exist to trigger alerts so you can see costs trending over time (Google Cloud, Cloud Billing budgets docs). You set $50, you blow past $50, and the only thing that changes is that an email lands in your inbox. The meter keeps running.
If you want an actual hard stop on GCP, you have to build one yourself: wire the budget's alert into a Pub/Sub topic, write a Cloud Function that calls the billing API to detach the billing account from the project, and hope it fires fast enough. There's a well-known community project that exists purely to automate this shutoff (Cyclenerd/poweroff-google-cloud-cap-billing on GitHub). The fact that a popular open-source tool exists just to turn billing off tells you everything about the default.
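For a sense of what that do-it-yourself stop involves, here is a minimal Python sketch of the decision half of such a function. It assumes the budget notification arrives as a Pub/Sub message whose `data` field is base64-encoded JSON carrying `costAmount` and `budgetAmount`. The names `over_budget`, `handle_budget_alert`, and `make_event` are made up for illustration, and `detach_billing` is a stand-in for whatever code actually calls the Cloud Billing API to remove the billing account from the project. It is not the code from the community project mentioned above.

```python
import base64
import json
from typing import Callable


def over_budget(pubsub_event: dict) -> bool:
    """Return True when the notification reports cost above the budget amount."""
    # Assumption: the payload is base64-encoded JSON in the "data" field.
    payload = json.loads(base64.b64decode(pubsub_event["data"]).decode("utf-8"))
    return payload["costAmount"] > payload["budgetAmount"]


def handle_budget_alert(pubsub_event: dict, detach_billing: Callable[[], None]) -> bool:
    """Call detach_billing() only when spend has passed the budget.

    detach_billing is injected so this logic can be tested without cloud
    credentials. In a real deployment it would wrap the billing API call.
    """
    if not over_budget(pubsub_event):
        return False
    detach_billing()
    return True


def make_event(cost: float, budget: float) -> dict:
    """Hypothetical helper: build a fake notification for local testing."""
    body = json.dumps({"costAmount": cost, "budgetAmount": budget}).encode("utf-8")
    return {"data": base64.b64encode(body).decode("ascii")}


if __name__ == "__main__":
    fired = handle_budget_alert(make_event(18.0, 7.0), lambda: print("detaching billing"))
    print("hard stop triggered:", fired)
```

Even with something like this deployed, the stop only happens after a notification arrives and the function runs, so it is damage control rather than a guarantee.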
How sharp is the gap between "alert" and "wall"? Sharp enough that a hobbyist reportedly woke up to an ~$18,000 Google Cloud bill while having a $7 budget configured — a forgotten public API key let an attacker fire 60,000+ requests and tear through a $1,400 spending cap (Tom's Hardware, reported 2026). Treat the dollar figure as reported, not gospel — but the mechanism is real and documented: a budget is informational. It does not pull the plug.
The cap that upgrades itself
Now the part that gives this post its title.
In response to exactly this kind of pain, Google added enforced monthly spend caps to the Gemini API. Starting April 1, 2026, every billing account carries a mandatory monthly cap tied to its usage tier, and when aggregate spend hits that cap, API requests pause until the next billing cycle (Google AI for Developers, Gemini API billing docs; Google, "more control over Gemini API costs," 2026). Good. That's a real cap — the first one in this story that actually stops something.
But read the next sentence in the policy. The system automatically upgrades you to the next tier as your usage grows and your payment history matures. Meet the criteria, and your cap moves up — no action required on your part (Google AI for Developers, Gemini API billing docs).
So the tiers, as reported, look roughly like this:
| Tier | Monthly spend cap (as reported) | How you "qualify" |
|---|---|---|
| Tier 1 | ~$250/mo | entry |
| Tier 2 | ~$2,000/mo | ~$100 cumulative spend + ~3 days |
| Tier 3 | ~$20,000 to $100,000+/mo | ~$1,000 cumulative spend + ~30 days |
(Figures as reported by Google AI for Developers and secondary coverage in 2026; treat exact thresholds as point-in-time, not permanent.)
Look at what that means in practice. Your spend cap is $250 — until you've spent $100 and waited three days, at which point it becomes $2,000 automatically. Keep going, and it becomes $20,000 or more. The cap is real, but it's not fixed. It's a function of how much you've already spent. The more you spend, the more you're allowed to spend, and the elevator only goes up.
That's the trap in one line: the safeguard scales with your spending instead of against it. A runaway loop, a leaked key, or a bad week of bot traffic is precisely the situation that pushes you past the spend-and-time thresholds — which is exactly when your cap quietly upgrades to a bigger number. Your spending cap has a spending cap, and the second one is reached by spending.
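To make "the cap is a function of what you've already spent" concrete, here is a small Python sketch that models the tiers using only the reported, point-in-time figures from this post. Everything in it is an illustration, not Google's logic: the `Tier` class and `effective_tier` function are invented names, Tier 3 uses the low end of its reported range, and reading the "days" criterion as a simple elapsed-days count is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_cap_usd: int           # reported cap, approximate
    min_cumulative_spend_usd: int  # reported spend threshold, approximate
    min_days: int                  # reported waiting period, approximate


# Ordered lowest to highest, using the figures quoted in this post.
TIERS = [
    Tier("Tier 1", 250, 0, 0),
    Tier("Tier 2", 2_000, 100, 3),
    Tier("Tier 3", 20_000, 1_000, 30),  # low end of the reported range
]


def effective_tier(cumulative_spend_usd: float, days_elapsed: int) -> Tier:
    """Return the highest tier whose spend and time thresholds are both met."""
    current = TIERS[0]
    for tier in TIERS:
        if (cumulative_spend_usd >= tier.min_cumulative_spend_usd
                and days_elapsed >= tier.min_days):
            current = tier
    return current


if __name__ == "__main__":
    for spend, days in [(50, 10), (100, 3), (5_000, 5), (1_000, 30)]:
        tier = effective_tier(spend, days)
        print(f"spent ${spend} over {days} days: {tier.name}, cap ${tier.monthly_cap_usd}/mo")
```

Notice that nothing in the inputs is a number you chose. The only way to move the cap in this model is to spend and wait.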
This is not a knock on Google specifically. Google added these caps because the old world — budgets-as-alerts — burned enough people that "more control over costs" became a headline feature. It's progress. But a cap that auto-ratchets is a different animal from a cap you set and forget.
Why this keeps happening: the meter has no ceiling
Step back from any one provider and the pattern is structural. Usage-metered pricing — per request, per invocation, per GB of egress, per token — has no inherent upper bound. The architecture is a meter, and meters count up forever. Every "cost control" you bolt on (alerts, tiers, quotas, kill-switch functions) is a layer of software trying to impose a ceiling that the billing model fundamentally doesn't have.
The controls feel bolted on because they are. That is also why they fail in the worst moments: the alert is async, the kill-switch function has its own cold start, the tier upgraded itself last Tuesday. When the underlying unit of billing is "one more request," your only real defense is constant vigilance, which is not a defense. It's a part-time job.
We've written before about the flip side of this — denial of wallet, where a bot swarm weaponizes per-request billing to turn a side project into a four-figure invoice. Same root cause: the meter doesn't know how to stop.
What an actual upper bound looks like
Here's the alternative, and it's not "set a better budget." It's a billing model where the ceiling is a property of what you provisioned, not a number you hope a script enforces in time.
Deployra prices per hour, per instance, and you cap spend by capping replicas. When you turn on autoscaling, you set minReplicas and maxReplicas. That maxReplicas is the ceiling — your service can scale up to it under load and no further. There is no tier sitting above it that upgrades itself because you got popular. The arithmetic is closed:
max monthly spend = (per-hour price of the instance) × (maxReplicas) × 720 hours (one 30-day month)
You can compute your worst case before you deploy, and it stays true next month. No async alert, no kill-switch function, no "you've been auto-upgraded to Tier 3."
The per-hour prices are flat and published:
| Deployra Web Service tier | CPU / RAM | $/hour | $/month (×720) |
|---|---|---|---|
| Free | 0.1 / 512 MB | $0 | $0.00 |
| Basic-512MB | 0.5 / 512 MB | $0.004403 | $3.17 |
| Basic-2GB | 1.0 / 2 GB | $0.010486 | $7.55 |
| Basic-4GB | 2.0 / 4 GB | $0.018889 | $13.60 |
A typical full-stack app, one Web Basic-2GB ($7.55) plus one managed database on Basic-1GB ($4.35), lands around $11.90/month. Add a managed Redis cache on Basic-256MB for $1.49 more and the total comes to about $13.39/month. Those are the numbers. They don't have a hidden tier above them.
Now do the worst-case math the way you can't on a metered cloud. Suppose you run a Web Basic-2GB service and set maxReplicas to 5. Your absolute ceiling is $0.010486 × 5 × 720 ≈ $37.75/month, full stop — even if a bot storm pins every replica at 100% CPU for the entire month. That's not a budget you're hoping holds. It's the most Kubernetes will ever schedule, because you told it the maximum and it obeys.
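If you'd rather check that arithmetic in code, here is a short Python sketch. The per-hour prices are copied from the table in this post and the 720-hour month matches the formula used here. The function names `clamp_replicas` and `max_monthly_spend` are invented for illustration and are not part of any Deployra API, and the clamp is only a simplified picture of how an autoscaler treats its upper limit.

```python
from decimal import Decimal, ROUND_HALF_UP

HOURS_PER_MONTH = 720  # 30-day month, as in the formula in this post

# Per-hour prices copied from the pricing table in this post.
PRICE_PER_HOUR = {
    "Free": Decimal("0"),
    "Basic-512MB": Decimal("0.004403"),
    "Basic-2GB": Decimal("0.010486"),
    "Basic-4GB": Decimal("0.018889"),
}


def clamp_replicas(desired: int, min_replicas: int, max_replicas: int) -> int:
    """Simplified autoscaler rule: never run fewer than min or more than max."""
    return max(min_replicas, min(desired, max_replicas))


def max_monthly_spend(tier: str, max_replicas: int) -> Decimal:
    """Worst case: every allowed replica runs for every hour of the month."""
    raw = PRICE_PER_HOUR[tier] * max_replicas * HOURS_PER_MONTH
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # A traffic spike may "want" 40 replicas, but only 5 can ever be billed.
    print(clamp_replicas(desired=40, min_replicas=1, max_replicas=5))  # 5
    print(max_monthly_spend("Basic-2GB", 5))  # 37.75
```

The point of the sketch is that both inputs to the ceiling, the tier and the replica limit, are values you picked, so the result cannot change unless you change them.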
The honest comparison
To be clear about what each model is good at:
- Metered clouds (GCP, AWS, and AI APIs) scale to effectively infinite throughput. If your business genuinely needs to serve a 100x traffic spike and you'd rather pay for it than drop requests, unbounded autoscaling is a feature, not a bug. The enforced Gemini caps and auto-upgrades are a real improvement on the old alerts-only world.
- Deployra trades that unbounded ceiling for a knowable one. Your maxReplicas is the wall. If traffic exceeds what your capped replicas can serve, you scale up deliberately by raising the cap; you don't discover it after the fact on an invoice. For indie projects, side projects, and cost-conscious startups, a ceiling you can name in advance beats a ceiling that names itself later.
Different tools for different risk appetites. But if your nightmare is the surprise invoice — the bill with one more digit than you expected — you want the model where the maximum is something you set and the platform physically can't exceed.
What to do on the cloud you're on right now
Not migrating today? Tighten what you can:
- Treat every budget as an alert, never a cap. Assume it will not stop spending. On GCP, wire the budget alert to a function that actually detaches billing if you need a hard stop.
- Know your tier and its upgrade triggers. If you're on the Gemini API, you're inside the April 2026 tier system whether you noticed or not. Check which tier you're on and what spend/time pushes you to the next one, so an "auto-upgrade" is never a surprise.
- Cap throughput at the source. Rate-limit your own endpoints, scope and rotate API keys, and never leave a publicly exposed key active in a project you've stopped watching. The $18K-on-a-$7-budget story started with a forgotten key.
- Forecast your worst case, not your average. If you can't write down the maximum your bill could be this month, you don't have a cap — you have a hope.
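Two of the items above, rate-limiting at the source and forecasting the worst case, fit together: once your own endpoint enforces a rate limit, the number of billable requests it can generate in a month has a ceiling you can write down. Here is a minimal Python sketch using a token bucket, which is one common rate-limiting technique and not something this post's sources prescribe. The class and function names are illustrative, and the injectable `clock` exists only so the behavior can be tested without waiting.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity`, then a steady `rate_per_sec` requests."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity   # start with a full bucket
        self.last = clock()

    def allow(self) -> bool:
        """Refill for the time elapsed, then spend one token if available."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def worst_case_requests(rate_per_sec: float, capacity: float,
                        seconds: int = 720 * 3600) -> int:
    """Upper bound on requests the bucket can admit over a window."""
    # One full initial burst plus the steady refill for the whole window.
    return int(capacity + rate_per_sec * seconds)


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=2.0, capacity=10.0)
    print(sum(bucket.allow() for _ in range(50)), "of 50 back-to-back calls admitted")
    print(worst_case_requests(2.0, 10.0), "requests at most in a 720-hour month")
```

Multiply that request ceiling by your provider's per-request price and you have a worst-case figure for that endpoint, provided every billable call actually passes through the limiter.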
The takeaway
A budget you set is a wish. A cap that auto-upgrades is a wish with extra steps. The only spending cap that holds is one tied to a resource you provisioned — because the platform can count replicas, and it can refuse to schedule the next one.
That's the whole idea behind Deployra's pricing: flat per-hour rates, a replica ceiling you control, and a worst-case number you can compute before you deploy. No surprise bills. No tier that upgrades itself while you sleep.
Ready to know your maximum in advance? See the full pricing on deployra.com and do the worst-case math for yourself.