`guardrails`

Category: safety · Cloud + Local · Status: v1.0 — production (regex backend)

Content filtering at the gateway layer instead of in your app code.

What it does

Pluggable content filtering. v1 ships only the regex backend, which runs locally, adds no cost, and makes no third-party calls.

PII redaction: emails, phone numbers, SSNs, and 16-digit credit card numbers are replaced with placeholders before the request leaves the gateway.
Profanity block: a short built-in list that can be extended via custom_patterns.
Custom regex policies: your own deny/redact rules.

When to use it

Customer-facing apps where users send unmoderated input.
Compliance-sensitive workloads (SOC 2, HIPAA, etc.).
Internal tools where you need to keep secrets out of model logs (for example, by blocking pasted API keys).

Configuration


guardrails:
  pii_redact: true                    # default false
  profanity_block: false              # default false
  custom_patterns:
    - 'sk-[a-zA-Z0-9]{20,}'           # block accidental API key paste
    - '^password:'                    # block "password:" leading lines
  backend: 'regex'                    # only 'regex' in v1
  on_pii: 'redact'                    # 'redact' (default) | 'block' | 'log-only'
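
Because custom_patterns entries are regex source strings, it can help to compile them once at load time and surface any that fail to parse. The sketch below is an illustration only: `compilePatterns` is a hypothetical helper, not part of the module, and this page does not say which regex flags the module applies. The example shows why that matters for an anchored pattern such as '^password:', which matches only at the very start of the input unless the multiline flag is set.

```typescript
// Hypothetical helper: compile pattern strings, collecting any that fail to parse.
function compilePatterns(
  sources: string[],
  flags = "",
): { compiled: RegExp[]; invalid: string[] } {
  const compiled: RegExp[] = [];
  const invalid: string[] = [];
  for (const source of sources) {
    try {
      compiled.push(new RegExp(source, flags));
    } catch {
      // new RegExp throws a SyntaxError on malformed patterns.
      invalid.push(source);
    }
  }
  return { compiled, invalid };
}

const patternSources = ["sk-[a-zA-Z0-9]{20,}", "^password:"];
const userInput = "please debug this\npassword: hunter2";

// Without flags, ^ anchors to the start of the whole input.
const plainSet = compilePatterns(patternSources);
// With "m", ^ also matches at the start of every line.
const multilineSet = compilePatterns(patternSources, "m");

const plainHit = plainSet.compiled.some((re) => re.test(userInput));         // false
const multilineHit = multilineSet.compiled.some((re) => re.test(userInput)); // true
```

If the "password:" rule is meant to catch any line that begins with the prefix, check how the module compiles patterns before relying on the anchor.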

Behavior on violation

Match type	Default action
PII (email/SSN/card/phone)	Redact in place, continue
Profanity	400 short-circuit
Custom pattern	400 short-circuit

Set on_pii: 'block' to short-circuit on PII too. Set on_pii: 'log-only' to count matches in metadata without modifying the request.
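
The table and the on_pii options can be read as a small decision function. The following is a sketch under stated assumptions, not the module's implementation: the `Findings` and `Decision` types and the `decide` function are hypothetical names, the order in which checks run is a guess, and the 'pii' label and 400 status used when on_pii is 'block' are not documented on this page.

```typescript
type OnPii = "redact" | "block" | "log-only";

// Hypothetical summary of what the filter found in one request.
interface Findings {
  piiMatches: number;
  profanity: boolean;
  customPattern?: string; // first custom pattern that matched, if any
}

type Decision =
  | { action: "pass" }
  | { action: "redact" }
  | { action: "log" }
  | { action: "block"; status: 400; blockedBy: string };

function decide(f: Findings, onPii: OnPii = "redact"): Decision {
  // Profanity and custom patterns always short-circuit.
  if (f.profanity) {
    return { action: "block", status: 400, blockedBy: "profanity" };
  }
  if (f.customPattern !== undefined) {
    return { action: "block", status: 400, blockedBy: `custom:${f.customPattern}` };
  }
  if (f.piiMatches === 0) {
    return { action: "pass" };
  }
  // PII handling depends on on_pii.
  if (onPii === "block") {
    // Assumption: the label and status for a PII block are not documented.
    return { action: "block", status: 400, blockedBy: "pii" };
  }
  if (onPii === "log-only") {
    return { action: "log" };
  }
  return { action: "redact" };
}
```

For instance, `decide({ piiMatches: 2, profanity: false })` yields `{ action: "redact" }`, while the same findings with `"log-only"` yield `{ action: "log" }`.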

PII patterns built in

Email → [REDACTED_EMAIL]
US SSN (123-45-6789) → [REDACTED_SSN]
16-digit credit card (4111-1111-1111-1111) → [REDACTED_CARD]
North-American phone ((555) 123-4567) → [REDACTED_PHONE]
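
A minimal sketch of how regex-based redaction along these lines could work. The expressions and the `redactPii` function are illustrative assumptions, not the patterns shipped in guardrails.ts, and real-world formats vary more than these expressions allow. Only the placeholder strings come from the list above.

```typescript
// Illustrative rules only; the module's actual expressions may differ.
interface PiiRule {
  label: string;
  pattern: RegExp;
  placeholder: string;
}

// Order matters: cards are replaced before phones so that a 16-digit
// number is never partially consumed by the shorter phone pattern.
const PII_RULES: PiiRule[] = [
  {
    label: "email",
    pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
    placeholder: "[REDACTED_EMAIL]",
  },
  {
    label: "ssn",
    pattern: /\b\d{3}-\d{2}-\d{4}\b/g,
    placeholder: "[REDACTED_SSN]",
  },
  {
    label: "card",
    pattern: /\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b/g,
    placeholder: "[REDACTED_CARD]",
  },
  {
    label: "phone",
    pattern: /\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b/g,
    placeholder: "[REDACTED_PHONE]",
  },
];

// Replace every match with its placeholder and count the replacements.
function redactPii(text: string): { text: string; redactions: number } {
  let redactions = 0;
  let out = text;
  for (const rule of PII_RULES) {
    out = out.replace(rule.pattern, () => {
      redactions += 1;
      return rule.placeholder;
    });
  }
  return { text: out, redactions };
}

const sample = redactPii("Reach me at jane@example.com or (555) 123-4567.");
// sample.text       → "Reach me at [REDACTED_EMAIL] or [REDACTED_PHONE]."
// sample.redactions → 2
```

Simple patterns like these will produce both false positives (any 16-digit run looks like a card) and false negatives (international phone formats), which is the usual trade-off of a purely local regex approach.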

Metrics emitted

guardrails.backend — which backend ran.
guardrails.stats.pii_redactions — how many PII strings matched.
guardrails.stats.blocked_by — 'profanity' or 'custom:<pattern>' when short-circuited.
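
As a sketch of how a consumer might read these values, the snippet below treats the dotted names as nested keys under a `guardrails` object. That nesting, the `GuardrailsMetadata` type, and the `blockReason` helper are assumptions for illustration; the page does not specify how the metadata is delivered. Note that a custom pattern can itself contain a colon, so only the leading 'custom:' prefix is stripped.

```typescript
// Assumed shape: the dotted metric names read as nested object keys.
interface GuardrailsMetadata {
  backend: string;
  stats: {
    pii_redactions: number;
    blocked_by?: string; // 'profanity' or 'custom:<pattern>' when short-circuited
  };
}

type BlockReason =
  | { kind: "none" }
  | { kind: "profanity" }
  | { kind: "custom"; pattern: string }
  | { kind: "other"; raw: string };

// Hypothetical helper: classify the blocked_by value.
function blockReason(meta: GuardrailsMetadata): BlockReason {
  const raw = meta.stats.blocked_by;
  if (raw === undefined) return { kind: "none" };
  if (raw === "profanity") return { kind: "profanity" };
  const prefix = "custom:";
  if (raw.slice(0, prefix.length) === prefix) {
    // Keep everything after the first prefix, including any later colons.
    return { kind: "custom", pattern: raw.slice(prefix.length) };
  }
  return { kind: "other", raw };
}

const exampleMeta: GuardrailsMetadata = {
  backend: "regex",
  stats: { pii_redactions: 0, blocked_by: "custom:^password:" },
};
const exampleReason = blockReason(exampleMeta);
// exampleReason → { kind: "custom", pattern: "^password:" }
```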

Roadmap

v1.1: backend: 'callout' for NVIDIA NIM, Anthropic Constitutional, and OpenAI Moderation backends. The config surface stays the same; you just point at a different inference engine.

Source

packages/modules-core/src/guardrails.ts