Customer support bot

For chat agents that field hundreds of similar questions per day. Built around aggressive caching plus tight cost caps to keep per-conversation cost bounded.

What this pipeline is good at

30–50% cache hit rate on common questions.
Hard dollar caps per request and per user per day, so a single user can't drain your budget.
PII redaction before requests leave the gateway (with the guardrails module added).
Smart routing (with the router module added): Haiku for simple questions, Sonnet for hard ones.

The pipeline

Env var

PRXY_PIPE='exact-cache,semantic-cache,cost-guard,patterns'

YAML config

pipeline:
  - exact-cache:
      ttlSeconds: 86400         # repeat questions within a day
  - semantic-cache:
      similarity: 0.92          # tight — support answers should match closely
      ttlSeconds: 86400
      scope: 'global'           # cross-user — answers are public knowledge
  - cost-guard:
      perRequest: 0.05          # nothing absurd
      perDay: 0.50              # cap per user/day
  - patterns:
      maxInjected: 3
      minSuccessRate: 0.85      # only highly-trusted patterns
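
The two cost-guard limits can be read as a pair of budget checks. The sketch below is a minimal Python illustration, assuming a request is rejected when its estimated cost exceeds perRequest or would push that user's spend for the day past perDay. The CostGuard class and its allow method are hypothetical names for illustration, not prxy.monster's implementation.

```python
from collections import defaultdict
from datetime import date


class CostGuard:
    """Toy per-request and per-user daily spend cap (illustrative only)."""

    def __init__(self, per_request: float, per_day: float):
        self.per_request = per_request
        self.per_day = per_day
        # (user_id, day) -> dollars spent so far on that day
        self.spent = defaultdict(float)

    def allow(self, user_id: str, estimated_cost: float, day: date) -> bool:
        """Return True and record the spend if the request fits both caps."""
        if estimated_cost > self.per_request:
            return False
        key = (user_id, day)
        # Small tolerance so float rounding does not reject the final call
        # that lands exactly on the cap.
        if self.spent[key] + estimated_cost > self.per_day + 1e-9:
            return False
        self.spent[key] += estimated_cost
        return True


guard = CostGuard(per_request=0.05, per_day=0.50)
today = date(2024, 1, 1)

# One user sends 100 calls at an assumed $0.02 each on the same day.
allowed = sum(guard.allow("user-1", 0.02, today) for _ in range(100))
print(allowed)  # number of calls that fit under the daily cap
```

Under these assumptions only the calls that fit inside $0.50 go through; other users keep their own daily budget.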

To add PII redaction and model routing, put guardrails first and router last:

PRXY_PIPE='guardrails,exact-cache,semantic-cache,cost-guard,patterns,router'

Why this order

Caches first: exact, then semantic. Most common questions hit the cache and never reach the provider.
cost-guard after caches: no point burning cap budget on a request the cache would have answered.
patterns after cost-guard: only relevant for cache misses; injects context before the provider call.
guardrails before everything: strip PII before any other module sees the request.
router after everything else: picks the cheapest model that can handle what's left after all the optimizations.
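
The effect of this ordering can be sketched with a toy pipeline runner. This is a minimal illustration, assuming each module either returns a response (short-circuiting the rest) or passes the request on; the Module type, the run function, and the stub modules are hypothetical and do not reflect prxy.monster's internals.

```python
from typing import Callable, List, Optional

# Hypothetical module shape: return a response to short-circuit the pipeline,
# or None to hand the request to the next module.
Module = Callable[[dict], Optional[str]]

CACHE = {"How do I reset my password?": "Use the reset link on the login page."}
trace: List[str] = []  # names of the modules that actually ran


def guardrails(req: dict) -> Optional[str]:
    trace.append("guardrails")  # redaction would happen here
    return None


def exact_cache(req: dict) -> Optional[str]:
    trace.append("exact-cache")
    return CACHE.get(req["prompt"])  # a hit ends the pipeline early


def cost_guard(req: dict) -> Optional[str]:
    trace.append("cost-guard")  # only reached on a cache miss
    return None


def provider(req: dict) -> Optional[str]:
    trace.append("provider")
    return "fresh answer from the model"


def run(pipeline: List[Module], req: dict) -> str:
    """Call modules in order and stop at the first one that answers."""
    for module in pipeline:
        response = module(req)
        if response is not None:
            return response
    raise RuntimeError("no module produced a response")


PIPELINE = [guardrails, exact_cache, cost_guard, provider]

hit = run(PIPELINE, {"prompt": "How do I reset my password?"})
hit_trace = list(trace)

trace.clear()
miss = run(PIPELINE, {"prompt": "Why was I charged twice?"})
miss_trace = list(trace)

print(hit_trace)
print(miss_trace)
```

On the cached question the trace stops at exact-cache, so cost-guard never spends budget on it; on the miss every module runs before the provider call.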

Cost math

For a support bot doing 10,000 conversations/month, ~3 turns each:

Without prxy.monster	With this pipeline (40% cache hit rate)
30,000 calls × $0.02 each = $600/mo	18,000 cache misses × $0.02 + 12,000 hits × $0 = $360/mo

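Plus cap protection: a single user whose client loops on a bug at 10 calls/sec costs you at most $0.50 for the day under the perDay cap, instead of running up $50 in about four minutes at $0.02 per call.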
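
The arithmetic above can be reproduced for other hit rates with a few lines of Python. This is a sketch under the same assumptions as the table: a flat $0.02 per provider call and cache hits that cost nothing. The function names are made up for this example.

```python
def monthly_cost(conversations: int, turns: int,
                 cost_per_call: float, hit_rate: float) -> float:
    """Monthly provider spend when cache hits are free."""
    calls = conversations * turns
    misses = round(calls * (1 - hit_rate))
    return misses * cost_per_call


def seconds_to_spend(budget: float, calls_per_sec: float,
                     cost_per_call: float) -> float:
    """How long an uncapped loop takes to burn through a budget."""
    return budget / (calls_per_sec * cost_per_call)


baseline = monthly_cost(10_000, 3, 0.02, hit_rate=0.0)
with_cache = monthly_cost(10_000, 3, 0.02, hit_rate=0.40)
low_end = monthly_cost(10_000, 3, 0.02, hit_rate=0.30)
high_end = monthly_cost(10_000, 3, 0.02, hit_rate=0.50)
runaway = seconds_to_spend(50.0, calls_per_sec=10, cost_per_call=0.02)

print(f"no cache:  ${baseline:.2f}/mo")
print(f"40% hits:  ${with_cache:.2f}/mo")
print(f"30% hits:  ${low_end:.2f}/mo")
print(f"50% hits:  ${high_end:.2f}/mo")
print(f"uncapped loop reaches $50 in {runaway:.0f} s")
```

Across the 30–50% hit-rate range the monthly bill lands between $300 and $420, with the table's $360 at the 40% midpoint.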

Variants

Knowledge-base Q&A only (no per-user data):

pipeline:
  - semantic-cache:
      similarity: 0.85          # looser — KB answers tolerate more variance
      scope: 'global'
      ttlSeconds: 604800        # week
  - cost-guard: { perRequest: 0.10 }
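
To see what loosening similarity from 0.92 to 0.85 does, here is a toy threshold lookup. It assumes the semantic cache compares embedding vectors by cosine similarity and returns the best match at or above the threshold; that scoring method, the two-dimensional vectors, and the lookup function are all assumptions made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def lookup(query: Sequence[float],
           cache: List[Tuple[Sequence[float], str]],
           threshold: float) -> Optional[str]:
    """Return the best cached answer scoring at or above the threshold."""
    best_answer, best_score = None, threshold
    for vector, answer in cache:
        score = cosine(query, vector)
        if score >= best_score:
            best_answer, best_score = answer, score
    return best_answer


# Toy data: one cached entry and a query that is similar but not identical.
cache = [((1.0, 0.0), "cached KB answer")]
query = (9.0, 5.0)  # cosine similarity with the cached entry is about 0.874

strict = lookup(query, cache, threshold=0.92)
loose = lookup(query, cache, threshold=0.85)
print(strict, loose)
```

The same query misses at 0.92 and hits at 0.85, which is the trade the looser setting makes: more hits, at the cost of returning answers to questions that match less closely.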

High-traffic with strict redaction (with production modules):

pipeline:
  - guardrails:
      pii_redact: true
      custom_patterns: ['/sk-[a-zA-Z0-9]{32,}/']
  - exact-cache: { ttlSeconds: 3600 }
  - semantic-cache: { similarity: 0.90, scope: 'global' }
  - cost-guard: { perRequest: 0.05, perDay: 0.50 }
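
It can help to check a custom pattern before deploying it. The sketch below runs the same expression through Python's re module, assuming the surrounding slashes in the YAML are regex-literal delimiters and that matches are replaced with a placeholder; the redact function and the [REDACTED] placeholder are hypothetical, and the guardrails module's actual replacement behavior may differ.

```python
import re

# Same expression as the YAML entry, without the surrounding slashes.
API_KEY = re.compile(r"sk-[a-zA-Z0-9]{32,}")


def redact(text: str, placeholder: str = "[REDACTED]") -> str:
    """Replace every match of the key pattern with a placeholder."""
    return API_KEY.sub(placeholder, text)


long_key = "sk-" + "a1B2" * 8     # 32 alphanumerics after the prefix: matches
short_key = "sk-" + "a1B2" * 4    # 16 alphanumerics: too short to match
dashed_key = "sk-part-" + "a1B2" * 8  # hyphen breaks the alphanumeric run

redacted_long = redact(f"my key is {long_key} thanks")
redacted_short = redact(f"my key is {short_key} thanks")
redacted_dashed = redact(f"my key is {dashed_key} thanks")

print(redacted_long)
print(redacted_short)
print(redacted_dashed)
```

Only the first message is changed. Strings with a `-` or `_` inside the first 32 characters after `sk-` pass through untouched, so widen the character class if the secrets you need to catch can contain those characters.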