Using prxy.monster with LlamaIndex (Python)
LlamaIndex Python’s OpenAI LLM class accepts an api_base constructor arg. Set it once, and every query engine / retriever / agent inherits the routing.
Install
```
pip install llama-index llama-index-llms-openai llama-index-embeddings-openai
```

Configure

```python
from llama_index.llms.openai import OpenAI
from llama_index.core import Settings

Settings.llm = OpenAI(
    model="gpt-4o",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
```
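Anything built afterwards picks this up without an explicit llm argument; a minimal sketch, assuming an index already exists:

```python
# No llm= argument needed: the engine falls back to Settings.llm,
# so every call routes through prxy.monster.
query_engine = index.as_query_engine()
print(query_engine.query("What does the contract say about renewal?"))
```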
Or per-LLM (no global default):

```python
llm = OpenAI(
    model="gpt-4o",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
query_engine = index.as_query_engine(llm=llm)
```
Or via env vars (no constructor change):

```
export OPENAI_BASE_URL=https://api.prxy.monster/v1
export OPENAI_API_KEY=prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx
```

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI

Settings.llm = OpenAI(model="gpt-4o")  # picks up env vars
```
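If you’d rather not touch the shell, setting the variables in-process works too; a sketch:

```python
import os

# Set these before instantiating OpenAI(...) so the client sees them.
os.environ["OPENAI_BASE_URL"] = "https://api.prxy.monster/v1"
os.environ["OPENAI_API_KEY"] = "prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx"
```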
Verify

```
curl https://api.prxy.monster/health
```

Run a query — a successful response confirms routing.
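The quickest in-code check is a bare completion through the configured LLM; any reply confirms the route:

```python
from llama_index.core import Settings

# One round trip through prxy.monster; no index required.
print(Settings.llm.complete("Reply with OK."))
```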
What you get
- Semantic cache for retrievals (RAG often produces near-identical follow-up queries).
- Pattern memory — successful retrieval / answer patterns get learned.
- Infinite context — ipc keeps multi-turn QueryEngine sessions from hitting the wall (see the sketch after this list).
- Cost guards — per-request and per-day budget caps.
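To see the multi-turn point in practice, a minimal sketch (assumes an index built as above):

```python
# Each turn appends to the conversation; ipc is what keeps
# long sessions from exhausting the context window.
chat_engine = index.as_chat_engine()
print(chat_engine.chat("What does the document say about pricing?"))
print(chat_engine.chat("How does that compare to last quarter?"))
```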
Embeddings
The same api_base arg works for OpenAIEmbedding:

```python
from llama_index.embeddings.openai import OpenAIEmbedding

Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
```

Both your LLM calls AND your embedding calls route through prxy.monster.
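To exercise both paths at once, a sketch (the ./data directory is a placeholder):

```python
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# from_documents calls the embedding endpoint; query calls the LLM.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("Summarize the documents."))
```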
Anthropic provider
If you’re using llama-index-llms-anthropic, the same pattern applies — pass the prxy base URL:

```python
from llama_index.llms.anthropic import Anthropic

Settings.llm = Anthropic(
    model="claude-sonnet-4-6",
    base_url="https://api.prxy.monster",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
```
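Anthropic has no embeddings endpoint, so a RAG setup would typically keep the OpenAI embedder from above alongside it, both behind the proxy; a sketch:

```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.anthropic import Anthropic

# Claude generates, OpenAI embeds, prxy.monster fronts both.
Settings.llm = Anthropic(
    model="claude-sonnet-4-6",
    base_url="https://api.prxy.monster",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
```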
Recommended pipeline

For RAG:

```
PRXY_PIPE=exact-cache,semantic-cache,patterns,cost-guard
```

For agents with tool use:

```
PRXY_PIPE=mcp-optimizer,semantic-cache,patterns,ipc
```
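For a RAG service, the full environment in one place (assuming, as with the exports above, that PRXY_PIPE is read from the app’s environment):

```
export OPENAI_BASE_URL=https://api.prxy.monster/v1
export OPENAI_API_KEY=prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx
export PRXY_PIPE=exact-cache,semantic-cache,patterns,cost-guard
```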
Common issues
- Settings.llm is process-global — set it at app startup, before instantiating indexes.
- Async APIs (aquery, astream_query) — work identically; see the sketch after this list.
- Function calling (llm.predict_and_call(tools=...)) — pass-through.
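A sketch of both, assuming the routed Settings.llm and the index from above (the multiply tool is a made-up example):

```python
import asyncio

from llama_index.core import Settings
from llama_index.core.tools import FunctionTool

async def main():
    # Same engine, awaited instead of blocking.
    print(await index.as_query_engine().aquery("What changed in v2?"))

asyncio.run(main())

def multiply(a: float, b: float) -> float:
    """Multiply two numbers."""
    return a * b

# Tool calls pass through the proxy unchanged.
tool = FunctionTool.from_defaults(fn=multiply)
print(Settings.llm.predict_and_call([tool], "What is 12 * 7?"))
```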
Full example
Adapt examples/openai-quickstart — replace the OpenAI client with Settings.llm = OpenAI(...) as shown above.
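A self-contained version of that adaptation, as a sketch (the path and the question are placeholders):

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI

# Route the LLM and the embedder through prxy.monster.
Settings.llm = OpenAI(
    model="gpt-4o",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_base="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)

documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(documents)
print(index.as_query_engine().query("What is this project about?"))
```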
Verify the exact constructor argument name with the LlamaIndex Python docs for your installed version. api_base is stable across recent versions.