Using prxy.monster with the OpenAI SDK

prxy.monster exposes a fully OpenAI-compatible API at https://api.prxy.monster/v1. Any client that talks to the OpenAI API talks to prxy.monster.

Install

```sh
npm install openai   # Node
pip install openai   # Python
```

Configure

The official OpenAI client respects OPENAI_BASE_URL everywhere — Node, Python, edge.

```sh
export OPENAI_BASE_URL=https://api.prxy.monster/v1
export OPENAI_API_KEY=prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx
```

Code change

None. Both the Node and Python openai clients automatically pick up the environment variables.

```js
// Before AND after — no diff
import OpenAI from 'openai';

const client = new OpenAI();
const r = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'hi' }],
});
```

If you prefer to configure the client explicitly:

```js
const client = new OpenAI({
  baseURL: 'https://api.prxy.monster/v1',
  apiKey: process.env.OPENAI_API_KEY,
});
```

Verify

```sh
curl https://api.prxy.monster/health
```

Or, with the CLI:

```sh
prxy doctor
```

What you get

  • Infinite context — chat.completions.create calls compress old turns instead of dropping them.
  • Semantic cache — similar prompts hit cache, return in 15-30ms.
  • Pattern memory — successful answers get learned and re-injected.
  • Cost guards — hard per-request budget caps before the OpenAI bill arrives.
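As intuition for the caching bullets above — purely an illustration, not prxy.monster's actual implementation — an exact-match cache can key on a canonical serialization of the request body, so that field order doesn't matter. All names here (`cacheKey`, `lookup`, `store`) are assumptions for the sketch:

```javascript
// Illustrative sketch of exact-match request caching. A real server-side
// module would hash the canonical string (e.g. SHA-256) rather than use it
// directly as a Map key, and a semantic cache would compare embeddings.
function stableStringify(value) {
  if (Array.isArray(value)) {
    return '[' + value.map(stableStringify).join(',') + ']';
  }
  if (value && typeof value === 'object') {
    // Sort keys so { model, messages } and { messages, model } serialize identically.
    return '{' + Object.keys(value).sort()
      .map((k) => JSON.stringify(k) + ':' + stableStringify(value[k]))
      .join(',') + '}';
  }
  return JSON.stringify(value);
}

function cacheKey(body) {
  return stableStringify(body);
}

const cache = new Map();

function lookup(body) {
  return cache.get(cacheKey(body)); // undefined on a miss
}

function store(body, response) {
  cache.set(cacheKey(body), response);
}
```

The key property is that two requests with identical content but different JSON field order hit the same cache entry.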

These features are enabled through the PRXY_PIPE module list:

```sh
PRXY_PIPE=mcp-optimizer,semantic-cache,patterns,ipc
```

For batch / cost-sensitive workloads, add exact-cache first:

```sh
PRXY_PIPE=exact-cache,semantic-cache,cost-guard,patterns
```
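Conceptually, PRXY_PIPE names an ordered chain of middleware: each module sees the request and either short-circuits (a cache hit, a budget violation) or passes it along. The following sketch shows only the composition idea — the module names, request fields, and return shapes are assumptions, not prxy's internals:

```javascript
// Illustrative middleware chain. Each module receives the request and a
// next() continuation; returning without calling next() short-circuits.
function buildPipeline(modules) {
  return (request) =>
    modules.reduceRight(
      (next, mod) => () => mod(request, next),
      () => ({ source: 'upstream' }) // final fallthrough: the real API
    )();
}

// Toy stand-ins for exact-cache and cost-guard.
const exactCache = (req, next) =>
  req.cached ? { source: 'exact-cache' } : next();

const costGuard = (req, next) => {
  if (req.estimatedCost > req.budget) throw new Error('budget exceeded');
  return next();
};

const pipe = buildPipeline([exactCache, costGuard]);
```

Ordering matters in such a chain: putting exact-cache first means a hit never even reaches the cost guard, which is why the batch configuration above lists it first.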

Streaming

```js
const stream = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'tell a story' }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content ?? '');
}
```

Works identically. Cache hits replay as synthetic SSE.
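For intuition, a synthetic SSE replay is just the cached completion re-serialized as `data:` frames the SDK already knows how to parse. The chunk shape below follows the public Chat Completions streaming format; the helper names and per-character chunking are assumptions for illustration:

```javascript
// Illustrative only: serialize a cached completion as SSE frames and parse
// them back, the way a cache-hit replay could present itself to the client.
function toSSE(text) {
  const frames = [...text].map((ch) =>
    `data: ${JSON.stringify({ choices: [{ delta: { content: ch } }] })}\n\n`
  );
  return frames.join('') + 'data: [DONE]\n\n';
}

function fromSSE(raw) {
  let out = '';
  for (const line of raw.split('\n')) {
    if (!line.startsWith('data: ')) continue;
    const payload = line.slice(6);
    if (payload === '[DONE]') break; // sentinel that ends the stream
    out += JSON.parse(payload).choices[0]?.delta?.content ?? '';
  }
  return out;
}
```

Because the frames are indistinguishable from upstream ones, the `for await` loop above consumes a replay without any code change.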

Compatibility notes

  • Function calling / tools — pass-through. mcp-optimizer prunes irrelevant tool defs automatically if you ship many.
  • response_format: { type: 'json_object' } — pass-through.
  • Assistants API (v1/assistants, threads, runs) — currently not proxied. Use the Chat Completions API instead.
  • Embeddings (v1/embeddings) — pass-through, used internally by the semantic-cache module too.
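The mcp-optimizer bullet can be pictured as a relevance filter over tool definitions before the request goes upstream. Here is a naive keyword-overlap version — purely illustrative, since prxy's actual pruning strategy is not documented here; a real optimizer would score with embeddings rather than substring checks:

```javascript
// Naive sketch: keep only the tools whose descriptions overlap with the
// prompt, so a request shipping dozens of tool defs sends far fewer tokens.
function pruneTools(tools, prompt, keep = 2) {
  const words = prompt.toLowerCase().split(/\W+/);
  const scored = tools.map((tool) => ({
    tool,
    score: words.filter(
      (w) => w.length > 2 && tool.function.description.toLowerCase().includes(w)
    ).length,
  }));
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, keep)
    .map((s) => s.tool);
}
```

The pruned array drops into the standard `tools` parameter of `chat.completions.create` unchanged.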

Full example

Plain Node script: github.com/Ekkos-Technologies-Inc/prxy-examples/tree/main/examples/openai-quickstart 

prxy.monster speaks the OpenAI Chat Completions wire format. Newer OpenAI features (Responses API, Realtime API) are not yet proxied — track /changelog for support.
