prxy.monster v1 is in early access.

cost-guard

Category: safety · Cloud + Local · Status: v1 — production

Estimates each request's cost as input tokens × the model's price. If the request would breach any configured limit, cost-guard returns a 429-shaped error response without calling the provider. After each successful call, it records the actual spend.

What it does

You wake up to find a runaway script burned $5,000 over the weekend. cost-guard is the module that prevents that.

When to use it

✅ Production apps where you can't rely on the provider's rate limits alone
✅ Multi-tenant apps where each user has a daily budget
✅ Free tiers (cap users at $0.50/day so abusers can't drain your account)
✅ Chaotic dev sprints (one bad loop shouldn't turn into a $500 surprise)

❌ Apps where users bring their own provider key (cost-guard enforces limits against its own spend tracking, not the user's actual bill)

Configuration

cost-guard:
  perRequest: 0.10    # USD; reject any single request estimated above this
  perDay: 5.00        # USD; per-user daily cap
  perMonth: 100.00    # USD; per-user monthly cap
  keyPrefix: 'cost'   # storage key namespace (rarely needs changing)

All three limits (perRequest, perDay, perMonth) are optional; set only the ones you need. With none set, the module is a no-op.
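For readers wiring cost-guard up from TypeScript, the config block can be modeled roughly as below. The `CostGuardConfig` interface and the `isNoOp` and `resolveKeyPrefix` helpers are hypothetical illustrations, not the module's exported API, and the `'cost'` default for `keyPrefix` is assumed from the configuration example.

```typescript
// Hypothetical typing of the cost-guard config; not the module's exported API.
interface CostGuardConfig {
  perRequest?: number; // USD; max estimated cost of a single request
  perDay?: number;     // USD; per-user daily cap
  perMonth?: number;   // USD; per-user monthly cap
  keyPrefix?: string;  // storage key namespace
}

// keyPrefix is not a limit, so a config that sets only keyPrefix still enforces nothing.
function isNoOp(cfg: CostGuardConfig): boolean {
  return cfg.perRequest === undefined && cfg.perDay === undefined && cfg.perMonth === undefined;
}

// Assumed default of "cost", taken from the configuration example.
function resolveKeyPrefix(cfg: CostGuardConfig): string {
  return cfg.keyPrefix ?? "cost";
}

// Usage: the free-tier example enables only the daily cap.
const freeTier: CostGuardConfig = { perDay: 0.5 };
console.log(isNoOp(freeTier), resolveKeyPrefix(freeTier)); // false cost
```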

Metrics emitted

  • cost.estimated (USD; pre-hook)
  • cost.actual (USD; post-hook)
  • cost.day_spent (USD; running total)
  • cost.month_spent (USD; running total)

Examples

Tight per-request cap — block expensive prompts:

cost-guard:
  perRequest: 0.05

Free tier — give users $0.50/day to play with:

cost-guard:
  perDay: 0.50

Production app — multiple guards in concert:

cost-guard:
  perRequest: 0.50    # nothing absurd
  perDay: 25.00       # daily user budget
  perMonth: 500.00    # monthly user budget

What “exceeded” looks like

When a limit is breached, the client receives a 429 response with this shape:

{
  "type": "error",
  "error": {
    "type": "cost_limit_per_day",
    "message": "Daily cost cap exceeded",
    "limit": 5.00,
    "spent": 4.87,
    "estimated": 0.21,
    "resets_at": "2026-04-28T00:00:00.000Z"
  }
}

The resets_at field tells your client when the exceeded budget resets, so it can wait until then instead of retrying into another 429.
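A client can turn resets_at into a retry delay. The sketch below reads the field names from the 429 body above; the `CostLimitBody` interface and `retryDelayMs` helper are hypothetical, and it assumes resets_at may be absent for limits with no time window (such as a per-request rejection), in which case it returns `null`.

```typescript
// Field names copied from the 429 body above; the interface name is hypothetical.
interface CostLimitBody {
  type: "error";
  error: {
    type: string;       // e.g. "cost_limit_per_day"
    message: string;
    limit: number;      // USD
    spent: number;      // USD already spent in the current window
    estimated: number;  // USD estimated for the rejected request
    resets_at?: string; // ISO 8601; assumed absent when no time window applies
  };
}

// Milliseconds to wait before retrying, or null if the body has no usable resets_at.
function retryDelayMs(body: unknown, now: Date = new Date()): number | null {
  if (typeof body !== "object" || body === null) return null;
  const err = (body as Partial<CostLimitBody>).error;
  if (!err || typeof err.resets_at !== "string") return null;
  const resetAt = Date.parse(err.resets_at);
  if (Number.isNaN(resetAt)) return null;
  // Never return a negative delay if the reset time has already passed.
  return Math.max(0, resetAt - now.getTime());
}

// Usage with the example body: one hour before the reset.
const exceeded = JSON.parse(
  '{"type":"error","error":{"type":"cost_limit_per_day","message":"Daily cost cap exceeded",' +
    '"limit":5.00,"spent":4.87,"estimated":0.21,"resets_at":"2026-04-28T00:00:00.000Z"}}'
);
console.log(retryDelayMs(exceeded, new Date("2026-04-27T23:00:00.000Z"))); // 3600000
```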

How it works

Pre hook

  1. Estimate request cost: inputTokens × pricing[model].
  2. Attach metadata['cost.estimated'].
  3. If perRequest set and estimate > limit → short-circuit with 429.
  4. If perDay is set: read today's spend from KV; if today's spend + the estimate > limit, short-circuit with 429.
  5. If perMonth is set: do the same with the month bucket.
  6. Otherwise continue.
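The pre-hook steps can be sketched in TypeScript as follows. Everything here is a hypothetical stand-in rather than the code in packages/modules-core/src/cost-guard.ts: the `SpendStore` interface (synchronous to keep the sketch short; a Redis-backed store would be awaited), the key layout, the made-up price, and the `cost_limit_per_request` / `cost_limit_per_month` error names (only `cost_limit_per_day` appears on this page). Buckets are keyed by UTC date, an assumption based on the midnight-UTC resets_at in the example error.

```typescript
// Hypothetical, synchronous stand-in for the KV store holding running spend totals.
interface SpendStore {
  get(key: string): number | undefined;
}

interface Limits {
  perRequest?: number; // USD
  perDay?: number;     // USD
  perMonth?: number;   // USD
}

interface PreInput {
  userId: string;
  model: string;
  inputTokens: number;
  now: Date;
}

type PreResult =
  | { ok: true; estimated: number }
  | { ok: false; errorType: string; limit: number; spent: number; estimated: number };

// Made-up USD price per input token, for illustration only.
const INPUT_PRICE: Record<string, number> = { "example-model": 3 / 1_000_000 };

// Hypothetical key layout: <prefix>:day:<user>:<YYYY-MM-DD> and <prefix>:month:<user>:<YYYY-MM>, in UTC.
function bucketKeys(prefix: string, userId: string, now: Date): { day: string; month: string } {
  const date = now.toISOString().slice(0, 10);
  return { day: `${prefix}:day:${userId}:${date}`, month: `${prefix}:month:${userId}:${date.slice(0, 7)}` };
}

function preHook(
  store: SpendStore,
  limits: Limits,
  prefix: string,
  req: PreInput,
  metadata: Record<string, unknown>
): PreResult {
  // 1. Estimate from input tokens only; this sketch prices unknown models at 0.
  const estimated = req.inputTokens * (INPUT_PRICE[req.model] ?? 0);
  // 2. Expose the estimate to later hooks and metrics.
  metadata["cost.estimated"] = estimated;
  // 3. Per-request cap.
  if (limits.perRequest !== undefined && estimated > limits.perRequest) {
    return { ok: false, errorType: "cost_limit_per_request", limit: limits.perRequest, spent: 0, estimated };
  }
  // 4 and 5. Day and month caps: stored spend plus this estimate must stay within the limit.
  const keys = bucketKeys(prefix, req.userId, req.now);
  const windows: Array<[number | undefined, string, string]> = [
    [limits.perDay, keys.day, "cost_limit_per_day"],
    [limits.perMonth, keys.month, "cost_limit_per_month"],
  ];
  for (const [limit, key, errorType] of windows) {
    if (limit === undefined) continue;
    const spent = store.get(key) ?? 0;
    if (spent + estimated > limit) return { ok: false, errorType, limit, spent, estimated };
  }
  // 6. No limit breached: forward the request to the provider.
  return { ok: true, estimated };
}
```

With $4.87 already in a user's day bucket and a $5.00 perDay cap, a 70,000-token request at the made-up price estimates to $0.21 and is rejected, matching the numbers in the example error body.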

Post hook

  1. Skip on error responses.
  2. Calculate the actual cost from the response's usage: input_tokens, cached_input_tokens (weighted by cache_rate), and output_tokens, each converted to USD with the model's pricing.
  3. Increment the day and month buckets in KV.
  4. Day buckets expire after 48h; month buckets expire after 35 days.
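A minimal sketch of the post hook follows. The usage field names come from the formula above; everything else is assumed: the `ModelPrice` shape, reading cache_rate as a fraction of the input price charged for cached tokens, treating cached_input_tokens as reported separately from input_tokens, setting the TTL only when a bucket is created, and the `MemoryKv` class standing in for Redis.

```typescript
interface Usage {
  input_tokens: number;
  cached_input_tokens?: number; // assumed to be reported separately from input_tokens
  output_tokens: number;
}

// Hypothetical price shape, in USD per token.
interface ModelPrice {
  input: number;
  output: number;
  cacheRate: number; // assumed: fraction of the input price charged for cached tokens
}

const DAY_TTL_MS = 48 * 60 * 60 * 1000;        // 48h
const MONTH_TTL_MS = 35 * 24 * 60 * 60 * 1000; // 35 days

function actualCost(u: Usage, p: ModelPrice): number {
  const cached = u.cached_input_tokens ?? 0;
  return u.input_tokens * p.input + cached * p.input * p.cacheRate + u.output_tokens * p.output;
}

// In-memory stand-in for a KV store with per-key expiry.
class MemoryKv {
  private data = new Map<string, { value: number; expiresAt: number }>();

  get(key: string, nowMs: number): number | undefined {
    const entry = this.data.get(key);
    return entry !== undefined && entry.expiresAt > nowMs ? entry.value : undefined;
  }

  // Adds to a live key; an absent or expired key starts from 0 with a fresh TTL.
  incrBy(key: string, amount: number, ttlMs: number, nowMs: number): number {
    const entry = this.data.get(key);
    const current = entry !== undefined && entry.expiresAt > nowMs ? entry : undefined;
    const value = (current?.value ?? 0) + amount;
    this.data.set(key, { value, expiresAt: current?.expiresAt ?? nowMs + ttlMs });
    return value;
  }
}

// Skip error responses, price the usage, then add it to both buckets.
function postHook(
  kv: MemoryKv,
  ok: boolean,
  usage: Usage,
  price: ModelPrice,
  keys: { day: string; month: string },
  nowMs: number
): number | null {
  if (!ok) return null;
  const cost = actualCost(usage, price);
  kv.incrBy(keys.day, cost, DAY_TTL_MS, nowMs);
  kv.incrBy(keys.month, cost, MONTH_TTL_MS, nowMs);
  return cost;
}
```

With made-up prices of $3 and $15 per million input and output tokens and a cache rate of 0.1, a response with 1,000 input, 10,000 cached, and 2,000 output tokens costs $0.003 + $0.003 + $0.030 = $0.036.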

cost-guard checks the estimate in the pre hook but records actual spend only in the post hook. That leaves a race window: in a burst of concurrent requests, each can pass the check before any of them increments the counter. With Upstash Redis the window is roughly 5–10ms, and in practice the overage is bounded by your concurrency level (at most the combined cost of the requests in flight at once). v1.1 uses a Lua INCR script for an atomic check-and-increment.
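The deterministic simulation below illustrates why concurrency bounds the overage. It models the worst case for a burst, where every request runs its check before any spend is recorded, and compares it with an atomic check-and-reserve, the kind of guarantee a Redis Lua script provides because Redis runs a script without interleaving other commands. The functions and numbers are hypothetical, each request's cost is assumed known up front, and v1.1's actual script may reserve and reconcile spend differently.

```typescript
interface BurstResult {
  admitted: number; // requests that passed the check
  spent: number;    // USD total once every admitted request is recorded
}

// Worst-case interleaving: every check reads the same stale total,
// and the increments land only afterwards.
function nonAtomicBurst(limit: number, spent: number, costs: number[]): BurstResult {
  const passed = costs.filter((c) => spent + c <= limit);
  const total = passed.reduce((sum, c) => sum + c, spent);
  return { admitted: passed.length, spent: total };
}

// Atomic check-and-reserve: each request sees the spend reserved by the ones before it.
function atomicBurst(limit: number, spent: number, costs: number[]): BurstResult {
  let total = spent;
  let admitted = 0;
  for (const c of costs) {
    if (total + c <= limit) {
      total += c;
      admitted++;
    }
  }
  return { admitted, spent: total };
}

// Usage: $5.00 daily cap, $4.50 already spent, ten concurrent $0.20 requests.
const burst = Array.from({ length: 10 }, () => 0.2);
console.log(nonAtomicBurst(5, 4.5, burst)); // all 10 admitted, about $6.50 spent
console.log(atomicBurst(5, 4.5, burst));    // 2 admitted, about $4.90 spent
```

The non-atomic run overshoots the cap by about $1.50, which stays under the $2.00 worst case of ten in-flight requests × $0.20 each.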

Cloud vs Local

| Mode | Pricing source |
| --- | --- |
| Cloud | Static pricing map keyed by model prefix (updated periodically) |
| Local | Same; included in the binary |

The pricing map is in packages/modules-core/src/lib/cost.ts. Submit a PR if you spot stale pricing.
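A pricing map keyed by model prefix needs a rule for model names that match more than one key. One reasonable rule is longest-prefix match, sketched below; whether cost.ts uses this rule is an assumption, and the model names and prices are placeholders, not real entries from the map.

```typescript
// Placeholder prices in USD per million tokens; not real provider pricing.
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICING: Record<string, Price> = {
  "model-a": { inputPerMTok: 3, outputPerMTok: 15 },
  "model-a-mini": { inputPerMTok: 0.25, outputPerMTok: 1.25 },
};

// Pick the entry with the longest prefix the model name starts with, so
// "model-a-mini-2025" resolves to "model-a-mini" rather than "model-a".
function lookupPrice(model: string, table: Record<string, Price> = PRICING): Price | undefined {
  let best: string | undefined;
  for (const prefix of Object.keys(table)) {
    if (model.startsWith(prefix) && (best === undefined || prefix.length > best.length)) {
      best = prefix;
    }
  }
  return best === undefined ? undefined : table[best];
}

console.log(lookupPrice("model-a-mini-2025")?.inputPerMTok); // 0.25
console.log(lookupPrice("model-a-2025")?.inputPerMTok);      // 3
```

Returning `undefined` for an unknown model leaves the caller to decide whether to reject the request or let it through unpriced.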

Source

packages/modules-core/src/cost-guard.ts
