
Using prxy.monster with LlamaIndex (Python)

LlamaIndex Python’s OpenAI LLM class accepts an api_base constructor argument. Set it once, and every query engine, retriever, and agent inherits the routing.

Install

```shell
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
```

Configure

```python
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(
    model="gpt-4o",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
```

Or per-LLM (no global default):

```python
llm = OpenAI(
    model="gpt-4o",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
query_engine = index.as_query_engine(llm=llm)
```

Or via env var (no constructor change):

```shell
export OPENAI_BASE_URL=https://api.prxy.monster/v1
export OPENAI_API_KEY=prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx
```

```python
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(model="gpt-4o")  # picks up env vars
```
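If you rely on the env-var route, it helps to fail fast when a variable is missing, since an unset base URL silently falls back to api.openai.com. A minimal sketch in plain Python (the variable names come from the export lines above; the helper itself is hypothetical, not part of prxy.monster or LlamaIndex):

```python
import os

def missing_routing_env(required=("OPENAI_BASE_URL", "OPENAI_API_KEY")):
    """Return the names of unset routing variables so startup can fail fast."""
    return [name for name in required if not os.environ.get(name)]
```

Call it at app startup and raise if the returned list is non-empty, before any Settings.llm assignment runs.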

Verify

```shell
curl https://api.prxy.monster/health
```

Then run a query; a successful response confirms requests are routing through the proxy.

What you get

  • Semantic cache for retrievals (RAG often produces near-identical follow-up queries).
  • Pattern memory — successful retrieval / answer patterns get learned.
  • Infinite context (ipc) keeps multi-turn QueryEngine sessions from hitting the context window.
  • Cost guards — per-request and per-day budget caps.
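To make the caching bullets concrete: the simplest tier is an exact-match cache keyed on a normalized prompt, which a semantic cache generalizes by matching on embedding similarity instead of string equality. A toy sketch of the exact-match idea (illustrative only, not prxy.monster's implementation):

```python
class ExactCache:
    """Toy exact-match response cache keyed on a normalized prompt."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(prompt: str) -> str:
        # Collapse whitespace and case so trivially different prompts hit.
        return " ".join(prompt.lower().split())

    def get(self, prompt: str):
        return self._store.get(self._key(prompt))

    def put(self, prompt: str, response: str):
        self._store[self._key(prompt)] = response
```

A semantic cache replaces `_key` equality with a nearest-neighbor lookup over prompt embeddings, which is why near-identical RAG follow-up queries can still hit.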

Embeddings

The same api_base arg works for OpenAIEmbedding:

```python
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import Settings

Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
```

Both your LLM calls and your embedding calls route through prxy.monster.
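Since the LLM and the embedding model share the same base URL and key, one option is to build the common kwargs once so the two can't drift apart. A small sketch (the helper is hypothetical, not part of LlamaIndex or prxy.monster):

```python
PRXY_BASE = "https://api.prxy.monster/v1"

def prxy_kwargs(api_key: str) -> dict:
    """Shared constructor kwargs for OpenAI(...) and OpenAIEmbedding(...)."""
    return {"api_base": PRXY_BASE, "api_key": api_key}
```

Usage: `Settings.llm = OpenAI(model="gpt-4o", **prxy_kwargs(key))`, and likewise for OpenAIEmbedding.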

Anthropic provider

If you’re using llama-index-llms-anthropic, the same pattern applies — pass the prxy base URL:

```python
from llama_index.llms.anthropic import Anthropic
from llama_index.core import Settings

Settings.llm = Anthropic(
    model="claude-sonnet-4-6",
    base_url="https://api.prxy.monster",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
```

For RAG:

```shell
PRXY_PIPE=exact-cache,semantic-cache,patterns,cost-guard
```

For agents with tool use:

```shell
PRXY_PIPE=mcp-optimizer,semantic-cache,patterns,ipc
```
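PRXY_PIPE is a comma-separated list of pipeline stages. If you assemble it programmatically, a validator like this can catch typos before the proxy sees them (a sketch; the stage names are taken from the two examples above, and the full set may differ):

```python
# Stage names assumed from the RAG and agent examples in this guide.
KNOWN_STAGES = {
    "exact-cache", "semantic-cache", "patterns",
    "cost-guard", "mcp-optimizer", "ipc",
}

def parse_pipe(value: str) -> list:
    """Split a PRXY_PIPE value and reject stage names we don't recognize."""
    stages = [s.strip() for s in value.split(",") if s.strip()]
    unknown = [s for s in stages if s not in KNOWN_STAGES]
    if unknown:
        raise ValueError(f"Unknown pipeline stages: {unknown}")
    return stages
```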

Common issues

  • Settings.llm is process-global — set it at app startup, before instantiating indexes.
  • Async APIs (aquery, astream_query) — work identically.
  • Function calling (llm.predict_and_call(tools=...)) — pass-through.

Full example

Adapt examples/openai-quickstart — replace the direct OpenAI client with Settings.llm = OpenAI(...) as shown above.

Verify the exact constructor argument name against the LlamaIndex Python docs for your installed version; api_base is stable across recent versions.
