
Using prxy.monster with LlamaIndex (TypeScript)

LlamaIndex TS uses @llamaindex/openai for the OpenAI provider, which accepts a baseURL argument. Set it on the OpenAI LLM instance — every query engine, retriever, and agent that uses it inherits the routing.

Install

npm install llamaindex @llamaindex/openai

Configure

import { OpenAI } from '@llamaindex/openai';
import { Settings } from 'llamaindex';

Settings.llm = new OpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY, // your prxy key
  additionalSessionOptions: {
    baseURL: 'https://api.prxy.monster/v1',
  },
});

Or per-LLM (without setting a global default):

const llm = new OpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY,
  additionalSessionOptions: {
    baseURL: 'https://api.prxy.monster/v1',
  },
});

// `index` is an existing index you have already built (e.g. a VectorStoreIndex)
const queryEngine = index.asQueryEngine({ llm });

Code change

The additionalSessionOptions: { baseURL: ... } line is the only diff. If you set the env var instead:

export OPENAI_BASE_URL=https://api.prxy.monster/v1
export OPENAI_API_KEY=prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx

…then no code change at all — the underlying OpenAI client picks it up.
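The env-var path is a sketch that assumes the fallback behavior of the underlying openai Node SDK: when no explicit baseURL is passed, it reads OPENAI_BASE_URL (and OPENAI_API_KEY) from the environment. Verify this against your installed SDK version:

import { OpenAI } from '@llamaindex/openai';
import { Settings } from 'llamaindex';

// No baseURL or apiKey here: the underlying openai client falls back to
// OPENAI_BASE_URL and OPENAI_API_KEY from the environment (assumes a
// recent openai SDK; check the version you have installed).
Settings.llm = new OpenAI({ model: 'gpt-4o' });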

Verify

curl https://api.prxy.monster/health

Run any query; a successful response confirms routing.
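For an end-to-end check, here is a minimal sketch following the standard LlamaIndex quickstart pattern. It assumes Settings.llm (and Settings.embedModel, if you route embeddings too) is already configured as above:

import { Document, VectorStoreIndex } from 'llamaindex';

// Build a throwaway index and run one query through the proxied LLM.
const document = new Document({ text: 'prxy.monster routes OpenAI traffic through a caching proxy.' });
const index = await VectorStoreIndex.fromDocuments([document]);
const queryEngine = index.asQueryEngine();

const response = await queryEngine.query({ query: 'What does prxy.monster do?' });
console.log(response.toString()); // any successful answer confirms routing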

What you get

  • Semantic cache for repeated retrievals — same question across sessions returns cached answers.
  • Pattern memory — successful query patterns get learned.
  • Infinite context — long-running query engines stop hitting context limits via the ipc module.
  • Cost guards — hard per-request budget caps before the OpenAI bill arrives.

Embeddings

LlamaIndex’s OpenAIEmbedding also accepts additionalSessionOptions.baseURL, so both your LLM calls and your embedding calls route through prxy.monster:

import { OpenAIEmbedding } from '@llamaindex/openai';
import { Settings } from 'llamaindex';

Settings.embedModel = new OpenAIEmbedding({
  model: 'text-embedding-3-small',
  apiKey: process.env.OPENAI_API_KEY,
  additionalSessionOptions: {
    baseURL: 'https://api.prxy.monster/v1',
  },
});

For RAG-heavy LlamaIndex apps:

PRXY_PIPE=exact-cache,semantic-cache,patterns,cost-guard

exact-cache runs first because RAG pipelines often produce identical retrieval prompts.
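A quick way to see the cache working is to time two identical queries; the second should return much faster on an exact-cache hit. A rough sketch, reusing the queryEngine from the verify step:

// First call is a cold miss; the identical second call should hit exact-cache.
const question = { query: 'Summarize the indexed document.' };

console.time('cold');
await queryEngine.query(question);
console.timeEnd('cold');

console.time('cached');
await queryEngine.query(question);
console.timeEnd('cached'); // expect a large latency drop on a cache hit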

Common issues

  • Settings.llm is global: set it once at app startup, before instantiating any indexes or engines.
  • asQueryEngine() issues a fresh LLM call on each query, so every call goes through prxy.monster as expected.
  • Streaming: queryEngine.query({ stream: true }) works; cache hits replay as synthetic SSE (sketch below).
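A streaming sketch; the chunk shape varies across llamaindex versions (older releases expose a response string on each chunk), so verify against the version you have installed:

const stream = await queryEngine.query({
  query: 'Explain the indexed document.',
  stream: true,
});

// Chunks arrive incrementally; on a cache hit, prxy.monster replays them
// as synthetic SSE, so this loop behaves the same either way.
for await (const chunk of stream) {
  process.stdout.write(chunk.response ?? '');
}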

Full example

Adapt examples/openai-quickstart: replace the OpenAI client with the LlamaIndex Settings.llm setup above.

Verify the exact option name (additionalSessionOptions.baseURL) with the LlamaIndex TS docs for your installed version.
