# What prxy.monster is
A drop-in proxy that sits between your app and your LLM provider. You change one env var (ANTHROPIC_BASE_URL or OPENAI_BASE_URL) and suddenly your requests pass through a configurable pipeline of middleware before they hit the provider.
Think Express middleware, but for LLM calls.
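Concretely, the switch can be as small as where the SDK reads its base URL from. A minimal sketch, assuming the usual env-var fallback pattern (only the variable name `ANTHROPIC_BASE_URL` comes from this page; the fallback URL and function are illustrative):

```typescript
// If ANTHROPIC_BASE_URL is set (e.g. to a prxy.monster endpoint), every
// request flows through the proxy; otherwise the app talks to the
// provider directly. The default URL here is illustrative.
function resolveBaseUrl(env: Record<string, string | undefined>): string {
  return env.ANTHROPIC_BASE_URL ?? "https://api.anthropic.com";
}

// Unchanged app code: the SDK (or raw fetch) just uses this URL.
const baseUrl = resolveBaseUrl(process.env);
```

Nothing else in the app changes, which is what "drop-in" means here.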
## The mental model
```
your app ──▶ prxy.monster ──▶ Anthropic / OpenAI / etc.
                  │
                  ├── auth (validate your prxy_xxx key)
                  ├── load pipeline (PRXY_PIPE config)
                  ├── pre-modules (cache check, optimize, inject)
                  ├── provider call
                  └── post-modules (write cache, learn patterns, track usage)
```

Each module is a small, independently testable unit. Modules share a common interface and access storage through an adapter.
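What such a module could look like, sketched in TypeScript. The interface is an assumption inferred from the description above, not prxy.monster's actual API:

```typescript
// Hypothetical module contract: each module receives a context, may read
// or mutate it, and returns it. A pre-module that sets `response` lets
// the pipeline skip the provider call entirely (e.g. on a cache hit).
interface Ctx {
  request: { model: string; prompt: string };
  response?: { text: string };
  storage: Map<string, string>; // stand-in for the storage adapter
}

type Module = (ctx: Ctx) => Promise<Ctx>;

// Example pre-module: answer from cache when this exact request was seen.
const cacheCheck: Module = async (ctx) => {
  const hit = ctx.storage.get(JSON.stringify(ctx.request));
  if (hit !== undefined) ctx.response = { text: hit };
  return ctx;
};
```

Because every module has the same shape, testing one means constructing a context, calling the function, and inspecting the result, with no gateway involved.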
## Why this matters
Most LLM gateways do one thing: route. prxy.monster does that too, but the routing is one module out of many. You compose:
- Caches that skip provider calls entirely.
- Cost guards that refuse expensive requests before they leave the gateway.
- Optimizers that strip unused MCP tools, restructure prompts for provider cache hits, and manage context length.
- Memory that injects learned patterns from past successful calls.
- Routers that pick the cheapest model that’s likely to handle the task.
You opt into the modules you want, in the order you want them.
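Composition could then be as simple as parsing an ordered module list out of PRXY_PIPE and running it left to right. The comma-separated format and module names below are assumptions, not documented syntax:

```typescript
// Hypothetical: turn PRXY_PIPE (e.g. "cache,optimize,route") into an
// ordered list of module functions and apply them in sequence. Each
// stand-in module just records that it ran.
type PipeModule = (trace: string[]) => void;

const registry: Record<string, PipeModule> = {
  cache:    (t) => t.push("cache"),
  optimize: (t) => t.push("optimize"),
  route:    (t) => t.push("route"),
};

function buildPipeline(pipe: string): PipeModule[] {
  return pipe.split(",").map((name) => {
    const mod = registry[name.trim()];
    if (!mod) throw new Error(`unknown module: ${name}`); // fail fast on typos
    return mod;
  });
}

function runPipeline(pipe: string): string[] {
  const trace: string[] = [];
  for (const step of buildPipeline(pipe)) step(trace);
  return trace;
}
```

A flat ordered list keeps reordering trivial: changing the config string changes execution order, nothing else.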
## Two backends, identical API
The same code runs in two environments:
| Mode | Storage | Auth | Use when |
|---|---|---|---|
| Cloud | Postgres + Redis + R2 | Hashed API keys + user sessions | You want zero-ops + cross-device sync. |
| Local | SQLite + filesystem | Bypassed (free, single-user) | You want zero data leaving your machine. |
Modules don’t know which one they’re running on. The storage adapter handles the difference.
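A plausible shape for that adapter, with an in-memory stand-in for the local backend. The method names and class are assumptions for illustration; per the table above, real local mode is SQLite-backed:

```typescript
// Hypothetical adapter interface: modules only see get/set. Whether that
// hits Postgres/Redis (cloud) or SQLite/filesystem (local) is the
// backend's business.
interface StorageAdapter {
  get(key: string): Promise<string | null>;
  set(key: string, value: string): Promise<void>;
}

// In-memory stand-in so the sketch is self-contained and runnable.
class MemoryAdapter implements StorageAdapter {
  private data = new Map<string, string>();
  async get(key: string): Promise<string | null> {
    return this.data.get(key) ?? null;
  }
  async set(key: string, value: string): Promise<void> {
    this.data.set(key, value);
  }
}
```

Keeping both methods async even for the in-process case is what lets the same module code run unmodified against a networked cloud store.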
## What it isn’t
- Not a model marketplace. You bring your own provider key. We don’t mark up tokens.
- Not a workflow builder. No DAGs, no visual editor. The pipeline is a flat ordered list.
- Not a logging service. Logging is a side effect of caching + pattern modules — there’s no separate observability product layered on top.
## Where to go next
- Pipeline execution — how requests flow through modules.
- Modules — the unit of composition.
- Storage adapters — how cloud and local share an interface.
- Quickstart cloud — get running in 90 seconds.