Customer support bot
For chat agents that field hundreds of similar questions per day. Built around aggressive caching plus tight cost caps to keep per-conversation cost bounded.
What this pipeline is good at
- 30–50% cache hit rate on common questions.
- Hard dollar cap per conversation, so a single user can't drain your budget.
- (v1.1) PII redaction before requests leave the gateway.
- (v1.1) Smart routing — Haiku for simple questions, Sonnet for hard ones.
The pipeline
Env var:
```bash
PRXY_PIPE='exact-cache,semantic-cache,cost-guard,patterns'
```
When v1.1 ships, add guardrails first and router last:
```bash
PRXY_PIPE='guardrails,exact-cache,semantic-cache,cost-guard,patterns,router'
```
Why this order
- Caches first: exact, then semantic. Most common questions hit the cache and never hit the provider.
- `cost-guard` after caches: no point burning cap budget on a request the cache would have answered.
- `patterns` last: only relevant for cache misses; injects context before the provider call.
- (v1.1) `guardrails` before everything: strip PII before any module sees the request.
- (v1.1) `router` after everything else: picks the cheapest model that can handle what's left after all the optimizations.
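To make the ordering argument concrete, here is a minimal Python sketch of a gateway that runs modules in the order given by a PRXY_PIPE-style string. Everything in it is hypothetical and is not prxy.monster's implementation: the `Gateway` class, the flat budget counted in cents, the assumed 2-cent price per provider call, and the Jaccard word-overlap score standing in for the semantic cache's real similarity measure (so its 0.6 threshold is not comparable to the `similarity` values in the variants below). The redaction regex is taken from the strict-redaction variant, assuming the slashes there are delimiters rather than part of the pattern.

```python
import re

# Hypothetical sketch: module names follow PRXY_PIPE, but this is NOT
# prxy.monster's implementation, only an illustration of why order matters.
COST_PER_CALL_CENTS = 2  # assumed provider price ($0.02), as in the cost table
SECRET = re.compile(r"sk-[a-zA-Z0-9]{32,}")  # pattern from the redaction variant


class CapExceeded(Exception):
    pass


def words(text: str) -> set[str]:
    # Lowercased word set, ignoring punctuation at either end of the text.
    return set(text.lower().strip("?!. ").split())


def similarity(a: str, b: str) -> float:
    # Toy stand-in for embedding similarity: Jaccard overlap of word sets.
    wa, wb = words(a), words(b)
    return len(wa & wb) / len(wa | wb) if wa | wb else 1.0


class Gateway:
    def __init__(self, pipe: str, cap_cents: int, threshold: float = 0.6):
        self.modules = [m.strip() for m in pipe.split(",") if m.strip()]
        self.cap_cents = cap_cents
        self.threshold = threshold
        self.spent_cents = 0
        self.provider_calls = 0
        self.exact: dict[str, str] = {}
        self.semantic: list[tuple[str, str]] = []

    def provider(self, prompt: str, context: list[str]) -> str:
        self.provider_calls += 1  # stand-in for the real, billed model call
        return f"answer to: {prompt}"

    def handle(self, prompt: str) -> str:
        context: list[str] = []
        # Unknown module names (e.g. router) are ignored in this sketch.
        for module in self.modules:
            if module == "guardrails":
                # Redact before any later module (or cache key) sees the text.
                prompt = SECRET.sub("[REDACTED]", prompt)
            elif module == "exact-cache" and prompt in self.exact:
                return self.exact[prompt]  # hit: no provider call, no budget used
            elif module == "semantic-cache":
                for cached, answer in self.semantic:
                    if similarity(prompt, cached) >= self.threshold:
                        return answer  # near-duplicate hit, also free
            elif module == "cost-guard":
                if self.spent_cents + COST_PER_CALL_CENTS > self.cap_cents:
                    raise CapExceeded(f"budget of {self.cap_cents} cents reached")
            elif module == "patterns":
                context.append("support-bot context")  # placeholder injected context
        # Cache miss that passed every module: pay for a provider call.
        answer = self.provider(prompt, context)
        self.spent_cents += COST_PER_CALL_CENTS
        self.exact[prompt] = answer
        self.semantic.append((prompt, answer))
        return answer
```

With `Gateway("exact-cache,semantic-cache,cost-guard,patterns", cap_cents=2)`, a first "How do I reset my password?" goes to the provider and spends the whole budget, a reworded "How can I reset my password?" is still answered by the semantic cache, and only a genuinely new question raises `CapExceeded`. Moving `cost-guard` to the front makes even the exact repeat fail once the budget is spent, which is the waste the recommended order avoids.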
Cost math
For a support bot doing 10,000 conversations/month at ~3 turns each (30,000 calls), assuming $0.02 per provider call and a 40% cache hit rate:
| Without prxy.monster | With this pipeline |
|---|---|
| 30,000 calls × $0.02 each = $600/mo | 18,000 cache misses × $0.02 + 12,000 hits × $0 = $360/mo |
Plus cap protection: a single user looping a bug at 10 calls/sec costs you $0.50 max instead of $50.
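The figures above are plain arithmetic. This Python sketch reproduces them under the table's assumptions (a flat $0.02 per provider call, $0 per cache hit); the function names are illustrative and not part of prxy.monster, and the $0.50 limit is simply the figure quoted above.

```python
from decimal import Decimal

PRICE_PER_CALL = Decimal("0.02")  # assumed flat provider price, as in the table


def monthly_cost(conversations: int, turns: int, hit_rate: Decimal) -> Decimal:
    """Provider spend per month when cache hits are billed at $0."""
    calls = conversations * turns
    misses = calls * (1 - hit_rate)
    return misses * PRICE_PER_CALL


def calls_until(limit: Decimal) -> int:
    """Whole provider calls that fit under a dollar limit."""
    return int(limit // PRICE_PER_CALL)


if __name__ == "__main__":
    print(f"without caching: ${monthly_cost(10_000, 3, Decimal('0')):.2f}")
    print(f"40% hit rate:    ${monthly_cost(10_000, 3, Decimal('0.40')):.2f}")
    for limit in (Decimal("0.50"), Decimal("50")):
        calls = calls_until(limit)
        print(f"${limit} limit: {calls} calls, {calls / 10} s at 10 calls/sec")
```

12,000 hits out of 30,000 calls is the 40% rate behind the $360 figure; the 30–50% range quoted at the top gives $420 to $300 per month. At 10 calls/sec, a $0.50 limit stops the loop after 25 calls (2.5 seconds), whereas $50 of uncapped spend takes 2,500 calls, a little over four minutes.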
Variants
Knowledge-base Q&A only (no per-user data):
```yaml
pipeline:
  - semantic-cache:
      similarity: 0.85     # looser: KB answers tolerate more variance
      scope: 'global'
      ttlSeconds: 604800   # one week
  - cost-guard: { perRequest: 0.10 }
```
High-traffic with strict redaction (when v1.1 lands):
```yaml
pipeline:
  - guardrails:
      pii_redact: true
      custom_patterns: ['/sk-[a-zA-Z0-9]{32,}/']
  - exact-cache: { ttlSeconds: 3600 }
  - semantic-cache: { similarity: 0.90, scope: 'global' }
  - cost-guard: { perRequest: 0.05, perDay: 0.50 }
```
See also