FAQ
What is prxy.monster?
A composable AI gateway. You set one env var (ANTHROPIC_BASE_URL or OPENAI_BASE_URL), and your LLM calls flow through a configurable pipeline of middleware before hitting the provider. Modules handle caching, cost limits, prompt optimization, and persistent learning.
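A minimal sketch of the setup; the exact base URL is an assumption here, so use the one from your dashboard:

```shell
# Point the Anthropic SDK at the gateway instead of the provider directly.
# The base URL below is illustrative; copy the real one from your dashboard.
export ANTHROPIC_BASE_URL="https://gateway.prxy.monster"
# Your application code is unchanged; calls now pass through the pipeline.
```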
How is it different from OpenRouter / Portkey / Helicone / LiteLLM?
See Migration for detailed comparisons. Short version: prxy.monster’s emphasis is composable modules + local mode + persistent memory. Other gateways are stronger on specific axes (OpenRouter on routing, Helicone on observability) — we trade some breadth in those for depth elsewhere.
Do you store my prompts?
Cloud mode: prompts pass through and aren’t logged in plaintext. The patterns module saves successful “fix X with Y” snippets to your private pattern store. The semantic-cache module saves embeddings + responses for cache lookups, scoped to your user by default. Both are deletable from the dashboard.
Local mode: everything stays on your machine. Nothing is sent to us at all.
See Local mode → Privacy for full details.
Do I need to use your provider keys?
No. Bring your own key (BYOK) is the default. You pay your provider for tokens. We charge a flat tier for the gateway. No markup.
In v1, you set provider keys in the dashboard or pass them per-request. v1.1 adds optional hosted-key tiers for users who’d rather not manage keys at all.
Does the gateway add latency?
A few milliseconds for the proxy itself. The pipeline modules each add their own:
- `exact-cache` hit: net negative latency (ms instead of ~1s provider call).
- `semantic-cache` lookup: ~10–30ms when a key is set; instant on hit.
- `mcp-optimizer`: ~20–50ms for embedding (cached after first request).
- `cost-guard`: under 1ms (KV lookup).
- `patterns` injection: ~10–20ms (vector search).
For most workloads the modules save more time than they cost.
Can I run my own modules?
Yes. See SDK → Module interface. Modules are TypeScript objects implementing a six-field interface.
In v1, custom modules load from a local file path. v1.1 adds npm-package loading.
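A minimal sketch of a custom module, assuming a six-field shape of name / version / pre / post / onError / config; the authoritative field names are in SDK → Module interface, and the types below are illustrative:

```typescript
// Simplified request/response shapes for illustration only.
type GatewayRequest = { model: string; messages: { role: string; content: string }[] };
type GatewayResponse = { content: string };

// Hypothetical six-field module interface (field names are assumptions).
interface PipelineModule {
  name: string;                     // unique module id
  version: string;                  // module version
  pre?: (req: GatewayRequest) => Promise<GatewayRequest>;                         // runs before the provider call
  post?: (req: GatewayRequest, res: GatewayResponse) => Promise<GatewayResponse>; // runs after it
  onError?: (err: Error) => Promise<void>;                                        // error hook
  config?: Record<string, unknown>; // module-specific settings
}

// Example: a pre-hook that prepends a system message to every request.
const requestTagger: PipelineModule = {
  name: "request-tagger",
  version: "0.1.0",
  pre: async (req) => ({
    ...req,
    messages: [{ role: "system", content: "via request-tagger" }, ...req.messages],
  }),
  post: async (_req, res) => res,
  onError: async () => {},
  config: {},
};
```

The pre/post split mirrors the pipeline description above: pre-hooks can rewrite or short-circuit a request (as the caches do), post-hooks see the provider's response.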
What providers are supported?
In v1: Anthropic, OpenAI, Google (Gemini), Groq.
The router module (v1.1) makes adding more straightforward — implement a ProviderClient interface and the router can route to it.
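A hedged sketch of what a ProviderClient could look like; the method names and shapes here are assumptions for illustration, not the shipped v1.1 interface:

```typescript
// Simplified shapes for illustration only.
type ChatRequest = { model: string; messages: { role: string; content: string }[] };
type ChatResponse = { content: string };

// Hypothetical ProviderClient shape (method names are assumptions).
interface ProviderClient {
  id: string;                            // e.g. "anthropic", "my-provider"
  supports: (model: string) => boolean;  // can this client serve the model?
  complete: (req: ChatRequest) => Promise<ChatResponse>;
}

// A toy provider that echoes the last message back.
const echoProvider: ProviderClient = {
  id: "echo",
  supports: (model) => model.startsWith("echo-"),
  complete: async (req) => ({
    content: req.messages[req.messages.length - 1].content,
  }),
};
```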
Does it work with streaming?
Yes. SSE streams pass through. Cache hits on streaming requests are replayed as synthetic streams (your client cannot tell the difference).
In v1, post-hooks are skipped on streaming responses (caches still write, via stream accumulation). Full post-hook streaming support lands in v1.1.
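The synthetic-stream replay above can be sketched like this; the event format is illustrative, not the gateway's exact wire format:

```typescript
// Replay a cached completion as a synthetic SSE stream, so a streaming
// client sees ordinary chunked events on a cache hit.
function* toSyntheticSSE(cached: string, chunkSize = 16): Generator<string> {
  for (let i = 0; i < cached.length; i += chunkSize) {
    yield `data: ${JSON.stringify({ delta: cached.slice(i, i + chunkSize) })}\n\n`;
  }
  yield "data: [DONE]\n\n"; // terminator, as in OpenAI-style streams
}

const events = [...toSyntheticSSE("a cached answer", 4)];
```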
Does it work with tool use / function calling?
Yes. Tools are part of the canonical request shape. mcp-optimizer operates on them directly. Tool calls + tool results round-trip cleanly through both Anthropic and OpenAI shapes.
Can I A/B test pipelines?
Yes. Two ways:
- Per-request override: the `x-prxy-pipe` header overrides the pipeline for one call only.
- Multiple keys: give the variant key a different `pipelineConfig`.
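A sketch of the per-request override; the `x-prxy-pipe` header comes from the docs, while the endpoint path, pipeline name, and model are illustrative:

```shell
# Route one call through an alternate pipeline via the x-prxy-pipe header.
# The pipeline name ("variant-b"), path, and model are illustrative.
curl -s "$ANTHROPIC_BASE_URL/v1/messages" \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "content-type: application/json" \
  -H "x-prxy-pipe: variant-b" \
  -d '{"model":"your-model","messages":[{"role":"user","content":"hi"}]}'
```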
Is there a dashboard?
Cloud mode: yes — at app.prxy.monster. v1 ships keys + balance + basic usage. v1.1 ships full per-module observability.
Local mode: no built-in dashboard in v1. Query SQLite directly or wire your own. v1.1 ships an open-source dashboard package you can run alongside the local container.
What’s on the roadmap?
See Changelog for shipped + planned. Highlights:
- v1.1: `router`, `rehydrator`, `compaction-bridge`, `prompt-optimizer`, `tool-cache`, `guardrails` modules.
- v1.2: hybrid local-cloud sync, npm module registry, `evals` module.
- v2.0: Q-learning router, `collective` patterns (opt-in cross-user memory).
Where do I report bugs / request features?
GitHub Issues. Or email support@prxy.monster for paid tiers.