`cost-guard`

Category: safety · Cloud + Local · Status: v1 — production

Estimates request cost from input tokens × pricing. If any limit would be breached, returns a 429-shaped error response without calling the provider. Tracks actual spend after each call.

What it does

You wake up to find a runaway script burned $5,000 over the weekend. cost-guard is the module that prevents that.

When to use it

✅ Production apps where you can’t rely on the provider’s rate limits alone
✅ Multi-tenant apps where each user has a daily budget
✅ Free tiers (cap users at $0.50/day so abusers can’t drain you)
✅ Chaotic dev sprints (one bad loop shouldn’t become a $500 surprise)

❌ Apps where users bring their own provider key (cost-guard caps by your tracking, not their bill)

Configuration


cost-guard:
  perRequest: 0.10          # USD; reject single requests above this
  perDay: 5.00              # USD; per-user daily cap
  perMonth: 100.00          # USD; per-user monthly cap
  keyPrefix: 'cost'         # storage key namespace (rarely change)

All three limits are optional. Set only what you need. Setting none makes the module a no-op.
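For readers wiring this up from code, the options can be pictured as a small typed object. This is only a sketch: the `CostGuardConfig` interface and the `isNoOp` helper are hypothetical names used for illustration, with fields mirroring the YAML keys shown above.

```typescript
// Hypothetical shape of the cost-guard options; field names mirror the YAML keys.
interface CostGuardConfig {
  perRequest?: number; // USD; reject single requests above this
  perDay?: number;     // USD; per-user daily cap
  perMonth?: number;   // USD; per-user monthly cap
  keyPrefix?: string;  // storage key namespace
}

// True when no limit is configured, so there is nothing to enforce.
function isNoOp(config: CostGuardConfig): boolean {
  return (
    config.perRequest === undefined &&
    config.perDay === undefined &&
    config.perMonth === undefined
  );
}
```

Note that `keyPrefix` on its own does not enable any check; only the three limits do.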

Metrics emitted

cost.estimated (USD; pre-hook)
cost.actual (USD; post-hook)
cost.day_spent (USD; running total)
cost.month_spent (USD; running total)

Examples

Tight per-request cap — block expensive prompts:


cost-guard:
  perRequest: 0.05

Free tier — give users $0.50/day to play with:


cost-guard:
  perDay: 0.50

Production app — multiple guards in concert:


cost-guard:
  perRequest: 0.50          # nothing absurd
  perDay: 25.00             # daily user budget
  perMonth: 500.00          # monthly user budget

What “exceeded” looks like

When a limit is breached, the client receives a 429 response with this shape:


{
  "type": "error",
  "error": {
    "type": "cost_limit_per_day",
    "message": "Daily cost cap exceeded",
    "limit": 5.00,
    "spent": 4.87,
    "estimated": 0.21,
    "resets_at": "2026-04-28T00:00:00.000Z"
  }
}

The resets_at field tells your client when to retry without burning a request.
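On the client side, one way to use resets_at is to compute how long to wait before the next attempt. The sketch below assumes the response body has exactly the shape of the per-day example above; the `CostLimitError` interface and the `msUntilReset` helper are illustrative names, not part of the module.

```typescript
// Shape of the error body from the example above (assumed, based on that one sample).
interface CostLimitError {
  type: "error";
  error: {
    type: string;
    message: string;
    limit: number;
    spent: number;
    estimated: number;
    resets_at: string; // ISO 8601 timestamp
  };
}

// Milliseconds until the cap resets; 0 if the reset time has already passed,
// null if the timestamp cannot be parsed.
function msUntilReset(body: CostLimitError, now: Date = new Date()): number | null {
  const resetsAt = Date.parse(body.error.resets_at);
  if (Number.isNaN(resetsAt)) return null;
  return Math.max(0, resetsAt - now.getTime());
}
```

A client could pass the result to its own scheduler or surface it to the user instead of retrying immediately.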

How it works

Pre hook

Estimate request cost: inputTokens × pricing[model].
Attach metadata['cost.estimated'].
If perRequest is set and the estimate exceeds it → short-circuit with 429.
If perDay is set: read today’s spend from KV; if spent + estimated > limit → short-circuit with 429.
If perMonth is set: same check against the month bucket.
Otherwise continue.
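The pre-hook steps above can be sketched as a pure function. This is an illustration, not the module's source: `preCheck`, `Limits`, `Spend`, and `Decision` are hypothetical names, and the error types `cost_limit_per_request` and `cost_limit_per_month` are assumed by analogy with the documented `cost_limit_per_day`.

```typescript
interface Limits { perRequest?: number; perDay?: number; perMonth?: number } // USD
interface Spend { day: number; month: number } // current bucket totals read from KV

type Decision =
  | { allowed: true }
  | { allowed: false; errorType: string; limit: number };

// Applies the three checks in the documented order; the first breach wins.
function preCheck(estimated: number, limits: Limits, spend: Spend): Decision {
  if (limits.perRequest !== undefined && estimated > limits.perRequest) {
    // Error type name assumed by analogy with cost_limit_per_day.
    return { allowed: false, errorType: "cost_limit_per_request", limit: limits.perRequest };
  }
  if (limits.perDay !== undefined && spend.day + estimated > limits.perDay) {
    return { allowed: false, errorType: "cost_limit_per_day", limit: limits.perDay };
  }
  if (limits.perMonth !== undefined && spend.month + estimated > limits.perMonth) {
    // Error type name assumed by analogy with cost_limit_per_day.
    return { allowed: false, errorType: "cost_limit_per_month", limit: limits.perMonth };
  }
  return { allowed: true };
}
```

With the numbers from the error example (limit 5.00, spent 4.87, estimated 0.21), the per-day check trips because 4.87 + 0.21 exceeds 5.00.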

Post hook

Skip on error responses.
Calculate actual cost from response usage: input_tokens, cached_input_tokens × cache_rate, and output_tokens, each priced at the model’s rate.
Increment the day + month buckets in KV.
Day buckets expire after 48h; month buckets expire after 35 days.
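A rough sketch of the post-hook arithmetic follows. Several details are assumptions made for illustration: prices are expressed per million tokens, cached input is billed at the input rate multiplied by a cache rate, bucket keys use UTC dates (suggested by the midnight-UTC resets_at in the error example), and the key layout and all names (`actualCost`, `bucketKeys`, `ModelPricing`) are hypothetical.

```typescript
interface Usage { input_tokens: number; cached_input_tokens: number; output_tokens: number }

// Assumed pricing shape: USD per million tokens, plus a multiplier for cached input.
interface ModelPricing { inputPerMTok: number; outputPerMTok: number; cacheRate: number }

// TTLs from the text: 48 hours for day buckets, 35 days for month buckets.
const DAY_TTL_SECONDS = 48 * 60 * 60;
const MONTH_TTL_SECONDS = 35 * 24 * 60 * 60;

// Actual cost in USD for one response.
function actualCost(usage: Usage, p: ModelPricing): number {
  const input = (usage.input_tokens / 1_000_000) * p.inputPerMTok;
  const cached = (usage.cached_input_tokens / 1_000_000) * p.inputPerMTok * p.cacheRate;
  const output = (usage.output_tokens / 1_000_000) * p.outputPerMTok;
  return input + cached + output;
}

// Hypothetical key layout: <prefix>:<user>:day:YYYY-MM-DD and <prefix>:<user>:month:YYYY-MM (UTC).
function bucketKeys(prefix: string, userId: string, now: Date): { day: string; month: string } {
  const iso = now.toISOString(); // always UTC, e.g. 2026-04-27T12:00:00.000Z
  return {
    day: `${prefix}:${userId}:day:${iso.slice(0, 10)}`,
    month: `${prefix}:${userId}:month:${iso.slice(0, 7)}`,
  };
}
```

The TTLs are longer than the bucket periods, so a bucket is still readable through the end of its own day or month and is cleaned up afterwards without a separate job.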

cost-guard checks estimated spend in the pre hook and records actual spend in the post hook. This leaves a race window: a burst of concurrent requests can each pass the check before any of them increments the counter. With Upstash Redis the window is ~5–10ms; in practice the overage is bounded by your concurrency level. v1.1 uses Lua INCR for an atomic check-and-increment.
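To make the race concrete, here is a small in-memory simulation. It is not how the module talks to Redis: it simply contrasts a burst where every request reads the same counter snapshot with a check-and-increment done as one step. The function names are illustrative, and for simplicity each request's estimate is assumed to equal its actual cost.

```typescript
interface BurstResult { admitted: number; finalSpent: number }

// Non-atomic: all requests in the burst check against the same stale snapshot,
// then every admitted request adds its cost afterwards.
function simulateBurst(spent: number, limit: number, costs: number[]): BurstResult {
  const snapshot = spent;
  const admitted = costs.filter((c) => !(snapshot + c > limit));
  const finalSpent = admitted.reduce((sum, c) => sum + c, spent);
  return { admitted: admitted.length, finalSpent };
}

// Atomic: each request checks and increments in a single step, so later
// requests see the spend of earlier ones.
function simulateAtomic(spent: number, limit: number, costs: number[]): BurstResult {
  let current = spent;
  let admitted = 0;
  for (const c of costs) {
    if (!(current + c > limit)) {
      current += c;
      admitted += 1;
    }
  }
  return { admitted, finalSpent: current };
}
```

With $4.50 already spent against a $5.00 cap, a burst of ten $0.25 requests is fully admitted in the non-atomic case and ends at $7.00, while the atomic variant admits two and stops at $5.00. The overshoot grows with the number of requests in flight, which is the sense in which it is bounded by concurrency.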

Cloud vs Local

Mode	Pricing source
Cloud	Static pricing map keyed by model prefix (updated periodically).
Local	Same — included in the binary.

The pricing map is in packages/modules-core/src/lib/cost.ts. Submit a PR if you spot stale pricing.
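As an illustration of what "keyed by model prefix" could mean in practice, the sketch below resolves a model name to the longest matching prefix and derives the input-only estimate from it. The longest-prefix rule, the per-million-token units, the function names, and the table entries (placeholder model names and prices) are all assumptions; check cost.ts for the real structure.

```typescript
// Assumed pricing shape: USD per million tokens.
interface ModelPricing { inputPerMTok: number; outputPerMTok: number }

// Placeholder entries: model names and prices are invented for illustration.
const EXAMPLE_PRICING: Record<string, ModelPricing> = {
  "example-small": { inputPerMTok: 1, outputPerMTok: 2 },
  "example-small-pro": { inputPerMTok: 4, outputPerMTok: 8 },
};

// Returns the entry whose key is the longest prefix of `model`, if any.
function lookupPricing(
  model: string,
  table: Record<string, ModelPricing>,
): ModelPricing | undefined {
  let best: string | undefined;
  for (const prefix of Object.keys(table)) {
    if (model.startsWith(prefix) && (best === undefined || prefix.length > best.length)) {
      best = prefix;
    }
  }
  return best === undefined ? undefined : table[best];
}

// Pre-hook style estimate: input tokens only, as described at the top of this page.
function estimateCost(inputTokens: number, pricing: ModelPricing): number {
  return (inputTokens / 1_000_000) * pricing.inputPerMTok;
}
```

Preferring the longest prefix keeps a dated or suffixed model name such as `example-small-pro-2026` from falling through to the cheaper `example-small` entry.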

Source

packages/modules-core/src/cost-guard.ts