cost-guard
Category: safety · Cloud + Local · Status: v1 — production
Estimates request cost from input tokens × pricing. If any limit would be breached, returns a 429-shaped error response without calling the provider. Tracks actual spend after each call.
What it does
You wake up to find a runaway script burned $5,000 over the weekend. cost-guard is the module that prevents that.
When to use it
✅ Production apps where you can’t ride blind on the provider’s rate limits
✅ Multi-tenant apps where each user has a daily budget
✅ Free tiers (cap users at $0.50/day so abusers can’t drain you)
✅ During chaotic dev sprints (one bad loop, no $500 surprise)
❌ Apps where users bring their own provider key (cost-guard caps by your tracking, not their bill)
Configuration
cost-guard:
  perRequest: 0.10   # USD; reject single requests above this
  perDay: 5.00       # USD; per-user daily cap
  perMonth: 100.00   # USD; per-user monthly cap
  keyPrefix: 'cost'  # storage key namespace (rarely change)

All three limits are optional. Set only what you need. Setting none makes the module a no-op.
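To make the "optional limits" rule concrete, here is a minimal sketch of how such a config might be modeled. The names `CostGuardConfig`, `activeLimits`, and `isNoOp` are hypothetical and chosen for illustration; they are not taken from the module.

```typescript
// Hypothetical type mirroring the YAML keys above; not the module's real type.
interface CostGuardConfig {
  perRequest?: number; // USD
  perDay?: number; // USD
  perMonth?: number; // USD
  keyPrefix?: string;
}

type LimitName = "perRequest" | "perDay" | "perMonth";

// Names of the limits that are actually set; omitted ones are not enforced.
function activeLimits(config: CostGuardConfig): LimitName[] {
  const names: LimitName[] = ["perRequest", "perDay", "perMonth"];
  return names.filter((name) => config[name] !== undefined);
}

// With no limits configured there is nothing to check.
function isNoOp(config: CostGuardConfig): boolean {
  return activeLimits(config).length === 0;
}
```

Under this sketch, a config that sets only `keyPrefix` counts as a no-op, while one that sets only `perDay` enforces just the daily cap.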
Metrics emitted
- cost.estimated (USD; pre-hook)
- cost.actual (USD; post-hook)
- cost.day_spent (USD; running total)
- cost.month_spent (USD; running total)
Examples
Tight per-request cap (block expensive prompts):
cost-guard:
  perRequest: 0.05

Free tier (give users $0.50/day to play with):
cost-guard:
  perDay: 0.50

Production app (multiple guards in concert):
cost-guard:
  perRequest: 0.50   # nothing absurd
  perDay: 25.00      # daily user budget
  perMonth: 500.00   # monthly user budget

What “exceeded” looks like
When a limit is breached, the client receives a 429 response with this shape:
{
  "type": "error",
  "error": {
    "type": "cost_limit_per_day",
    "message": "Daily cost cap exceeded",
    "limit": 5.00,
    "spent": 4.87,
    "estimated": 0.21,
    "resets_at": "2026-04-28T00:00:00.000Z"
  }
}

The resets_at field tells your client when to retry without burning a request.
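On the client side, the resets_at field can be turned into a wait time. The following is a sketch only: the `CostLimitError` interface is inferred from the example response above, and `msUntilReset` is a hypothetical helper, not part of any shipped client.

```typescript
// Shape inferred from the example 429 body above.
interface CostLimitError {
  type: "error";
  error: {
    type: string;
    message: string;
    limit: number;
    spent: number;
    estimated: number;
    resets_at: string; // ISO 8601 timestamp
  };
}

// Milliseconds to wait before retrying; never negative.
function msUntilReset(body: CostLimitError, now: Date = new Date()): number {
  const resetsAtMs = Date.parse(body.error.resets_at);
  if (isNaN(resetsAtMs)) {
    throw new Error("Unparseable resets_at: " + body.error.resets_at);
  }
  return Math.max(0, resetsAtMs - now.getTime());
}

const exampleBody: CostLimitError = {
  type: "error",
  error: {
    type: "cost_limit_per_day",
    message: "Daily cost cap exceeded",
    limit: 5.0,
    spent: 4.87,
    estimated: 0.21,
    resets_at: "2026-04-28T00:00:00.000Z",
  },
};

// One hour before the reset, the client should wait 3,600,000 ms.
const waitMs = msUntilReset(exampleBody, new Date("2026-04-27T23:00:00.000Z")); // 3600000
```

Scheduling the retry for after that delay avoids sending requests that are certain to be rejected.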
How it works
Pre hook
- Estimate request cost: inputTokens × pricing[model].
- Attach metadata['cost.estimated'].
- If perRequest is set and estimate > limit → short-circuit with 429.
- If perDay is set: read today’s spend from KV, check today + estimated > limit → short-circuit if exceeded.
- If perMonth is set: same with the month bucket.
- Otherwise continue.
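The pre-hook steps above can be sketched as a pure function. This is an illustration under assumptions, not the module's code: `preCheck`, `Limits`, `Spend`, and `PreResult` are hypothetical names, `pricePerToken` stands in for the pricing[model] lookup, and the error types `cost_limit_per_request` and `cost_limit_per_month` are guessed by analogy with the documented `cost_limit_per_day`.

```typescript
interface Limits { perRequest?: number; perDay?: number; perMonth?: number }
interface Spend { day: number; month: number } // running totals read from KV

type PreResult =
  | { ok: true; estimated: number }
  | { ok: false; status: 429; errorType: string; limit: number; spent: number; estimated: number };

function reject(errorType: string, limit: number, spent: number, estimated: number): PreResult {
  return { ok: false, status: 429, errorType, limit, spent, estimated };
}

// pricePerToken is a hypothetical USD-per-input-token rate for the model.
function preCheck(inputTokens: number, pricePerToken: number, limits: Limits, spend: Spend): PreResult {
  const estimated = inputTokens * pricePerToken;

  // A single request above the per-request cap is rejected outright
  // (no running total applies here, so spent is reported as 0).
  if (limits.perRequest !== undefined && estimated > limits.perRequest) {
    return reject("cost_limit_per_request", limits.perRequest, 0, estimated);
  }
  // Reject if this request would push today's total past the daily cap.
  if (limits.perDay !== undefined && spend.day + estimated > limits.perDay) {
    return reject("cost_limit_per_day", limits.perDay, spend.day, estimated);
  }
  // Same check against the month bucket.
  if (limits.perMonth !== undefined && spend.month + estimated > limits.perMonth) {
    return reject("cost_limit_per_month", limits.perMonth, spend.month, estimated);
  }
  return { ok: true, estimated };
}
```

With the numbers from the example response (4.87 spent, roughly 0.21 estimated, 5.00 daily limit), this sketch rejects the request because 4.87 + 0.21 exceeds 5.00.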
Post hook
- Skip on error responses.
- Calculate actual cost from response usage (input_tokens + cached_input_tokens × cache_rate + output_tokens).
- Increment the day + month buckets in KV.
- Day buckets TTL after 48h. Month buckets TTL after 35 days.
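A rough sketch of the post-hook bookkeeping follows. Several details are assumptions made for illustration: per-token input and output prices, `cacheRate` as a multiplier on the input price, cached tokens reported separately from input_tokens, UTC-dated key names, and an in-memory object standing in for the KV store. Only the 48h and 35-day TTLs come from the text above.

```typescript
interface Usage { input_tokens: number; cached_input_tokens: number; output_tokens: number }
interface ModelPricing { inputPerToken: number; outputPerToken: number; cacheRate: number }

const DAY_TTL_SECONDS = 48 * 60 * 60; // day buckets expire after 48h
const MONTH_TTL_SECONDS = 35 * 24 * 60 * 60; // month buckets expire after 35 days

// Assumed pricing model: cached input tokens cost a fraction of the input price.
function actualCost(usage: Usage, pricing: ModelPricing): number {
  return (
    usage.input_tokens * pricing.inputPerToken +
    usage.cached_input_tokens * pricing.inputPerToken * pricing.cacheRate +
    usage.output_tokens * pricing.outputPerToken
  );
}

// Hypothetical key layout, e.g. cost:u1:day:2026-04-27 and cost:u1:month:2026-04 (UTC).
function bucketKeys(prefix: string, userId: string, now: Date): { day: string; month: string } {
  const iso = now.toISOString();
  return {
    day: `${prefix}:${userId}:day:${iso.slice(0, 10)}`,
    month: `${prefix}:${userId}:month:${iso.slice(0, 7)}`,
  };
}

// In-memory stand-in for the KV store; the TTL is set when a bucket is created.
interface Bucket { value: number; expiresAtMs: number }
const store: Record<string, Bucket | undefined> = {};

function incrementBucket(key: string, amount: number, ttlSeconds: number, nowMs: number): number {
  const existing = store[key];
  if (existing !== undefined && existing.expiresAtMs > nowMs) {
    existing.value += amount;
    return existing.value;
  }
  store[key] = { value: amount, expiresAtMs: nowMs + ttlSeconds * 1000 };
  return amount;
}

// Adds the actual cost to both buckets and returns the new running totals.
function recordSpend(prefix: string, userId: string, cost: number, now: Date): { day: number; month: number } {
  const keys = bucketKeys(prefix, userId, now);
  const nowMs = now.getTime();
  return {
    day: incrementBucket(keys.day, cost, DAY_TTL_SECONDS, nowMs),
    month: incrementBucket(keys.month, cost, MONTH_TTL_SECONDS, nowMs),
  };
}
```

Because each key embeds its date, a new day or month starts from a fresh bucket, and the TTLs only serve to clean up old ones.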
cost-guard checks the estimated cost in the pre hook but records actual spend only in the post hook. That leaves a race window: a burst of concurrent requests can each pass the check before any of them increments the counter. With Upstash Redis the gap is ~5–10ms; in practice the overage is bounded by your concurrency level, roughly the number of in-flight requests times their per-request cost. v1.1 uses Lua INCR for an atomic check-and-increment.
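The race can be illustrated with a small in-memory model. This is not the Redis or Lua implementation; `burstNonAtomic` and `burstAtomic` are hypothetical functions that contrast the worst-case interleaving of check-then-increment with a check-and-increment performed as one step.

```typescript
// Worst case for check-then-increment: every request in the burst reads the
// counter before any of them writes to it, so all checks see the same total.
function burstNonAtomic(spent: number, limit: number, costs: number[]): number {
  const admitted = costs.filter((cost) => spent + cost <= limit);
  return admitted.reduce((total, cost) => total + cost, spent);
}

// Atomic check-and-increment: each request is checked against the latest total.
function burstAtomic(spent: number, limit: number, costs: number[]): number {
  let total = spent;
  for (const cost of costs) {
    if (total + cost <= limit) {
      total += cost; // check and increment happen as one step
    }
  }
  return total;
}

const burst = [0.25, 0.25, 0.25, 0.25]; // four concurrent requests at $0.25 each
const racedTotal = burstNonAtomic(4.5, 5, burst); // 5.5, which is $0.50 over the cap
const atomicTotal = burstAtomic(4.5, 5, burst); // 5, the last two requests are rejected
```

In the non-atomic case the overage can never exceed the summed cost of the requests in flight, which is why the bound scales with concurrency.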
Cloud vs Local
| Mode | Pricing source |
|---|---|
| Cloud | Static pricing map keyed by model prefix (updated periodically). |
| Local | Same — included in the binary. |
The pricing map is in packages/modules-core/src/lib/cost.ts. Submit a PR if you spot stale pricing.
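For orientation, a lookup keyed by model prefix might look like the sketch below. The model names and prices are placeholders, not real pricing, and the longest-prefix-wins rule is an assumption; check cost.ts for how the actual map resolves overlapping prefixes.

```typescript
// Placeholder entries for made-up model names; NOT real pricing (USD per input token).
const EXAMPLE_PRICING: Record<string, number> = {
  "example-large": 0.00001,
  "example-large-mini": 0.000001,
  "example-small": 0.000002,
};

// Assumed rule: the longest matching prefix wins, so a more specific entry
// overrides a more general one. Returns undefined for unknown models.
function priceFor(model: string, pricing: Record<string, number> = EXAMPLE_PRICING): number | undefined {
  let best: string | undefined;
  for (const prefix of Object.keys(pricing)) {
    const matches = model.slice(0, prefix.length) === prefix;
    if (matches && (best === undefined || prefix.length > best.length)) {
      best = prefix;
    }
  }
  return best === undefined ? undefined : pricing[best];
}
```

A dated variant such as "example-large-mini-2026" would resolve to the "example-large-mini" entry here rather than the shorter "example-large" one.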