Using prxy.monster with LangChain Python
LangChain Python’s chat model classes (`ChatOpenAI`, `ChatAnthropic`) accept a `base_url` constructor arg (`anthropic_api_url` on `ChatAnthropic`). Set it once and every chain, agent, and LangGraph node downstream inherits it.
Install
```bash
pip install langchain-openai langchain-anthropic langchain-core
```
Configure
ChatOpenAI
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o",
    base_url="https://api.prxy.monster/v1",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",
)
r = llm.invoke("hi")
```
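ChatAnthropic
A sketch of the same pattern on the Anthropic side. Note there is no `/v1` suffix on the Anthropic URL (matching the LangGraph example below); `api_key` may be `anthropic_api_key` on older langchain-anthropic versions:
```python
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    anthropic_api_url="https://api.prxy.monster",
    api_key="prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx",  # anthropic_api_key on older versions
)
r = llm.invoke("hi")
```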
Code change
If you used env vars: zero. If you wired explicit args: the `base_url` / `anthropic_api_url` line above is the only diff.
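A minimal sketch of the zero-code-change path, assuming the standard env var names read by langchain-openai (`OPENAI_API_KEY`, `OPENAI_API_BASE`); confirm them for your installed version, per the note at the end of this page:
```python
import os
from langchain_openai import ChatOpenAI

# routing is configured entirely in the environment; constructor args untouched
os.environ["OPENAI_API_BASE"] = "https://api.prxy.monster/v1"
os.environ["OPENAI_API_KEY"] = "prxy_live_xxxxxxxxxxxxxxxxxxxxxxxx"

llm = ChatOpenAI(model="gpt-4o")  # picks both up from the environment
```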
Verify
```bash
curl https://api.prxy.monster/health
```
Run any chain; a successful response confirms routing.
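For an end-to-end check, a minimal chain sketch (prompt wording is illustrative):
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", base_url="https://api.prxy.monster/v1")
chain = ChatPromptTemplate.from_template("Say hello to {name}.") | llm

print(chain.invoke({"name": "prxy"}).content)  # any reply confirms routing
```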
What you get
- Semantic cache across every chain that uses the patched LLM.
- Infinite context for long agent conversations (the `ipc` module).
- Pattern memory: successful solutions get learned and re-injected.
- Cost guards: hard per-request budget caps.
LangChain memory vs prxy.monster
LangChain memory (ConversationBufferMemory, ConversationSummaryMemory, etc.) is inside your chain. prxy.monster is between your chain and the LLM. They don’t conflict.
Use LangChain memory for chain-local conversation state. Use prxy.monster for cross-chain caching, cross-session pattern memory, and token-budget management.
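A sketch of the two layers together: `RunnableWithMessageHistory` (langchain-core) holds the chain-local transcript, while the proxied model handles everything past the process boundary. The session plumbing below is illustrative:
```python
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", base_url="https://api.prxy.monster/v1")

store = {}  # chain-local conversation state lives in your process

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chat = RunnableWithMessageHistory(llm, get_history)

# LangChain keeps the transcript; prxy.monster only sees the outgoing requests
chat.invoke("hi", config={"configurable": {"session_id": "user-42"}})
```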
LangGraph
```python
from typing import Annotated, TypedDict
from langchain_anthropic import ChatAnthropic
from langchain_core.messages import AnyMessage
from langgraph.graph import END, START, StateGraph
from langgraph.graph.message import add_messages

class MyState(TypedDict):
    messages: Annotated[list[AnyMessage], add_messages]

llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    anthropic_api_url="https://api.prxy.monster",
)
graph = StateGraph(MyState)
graph.add_node("reason", lambda s: {"messages": [llm.invoke(s["messages"])]})
graph.add_node("answer", lambda s: {"messages": [llm.invoke(s["messages"])]})
graph.add_edge(START, "reason")
graph.add_edge("reason", "answer")
graph.add_edge("answer", END)
app = graph.compile()
# every node that uses `llm` routes through prxy.monster
```
LangSmith tracing is unaffected; it happens client-side before the request leaves the process.
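Invoking the compiled graph (the message content is illustrative):
```python
out = app.invoke({"messages": [("user", "hi")]})
print(out["messages"][-1].content)
```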
CrewAI
CrewAI agents accept any LangChain LLM. The same ChatAnthropic / ChatOpenAI instance with the prxy base URL plugs in:
```python
from crewai import Agent
from langchain_anthropic import ChatAnthropic

llm = ChatAnthropic(
    model="claude-sonnet-4-6",
    anthropic_api_url="https://api.prxy.monster",
)
researcher = Agent(role="Researcher", goal="...", backstory="...", llm=llm)
```
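To run it, a minimal crew sketch; the task wording is illustrative, and CrewAI's `Task`/`Crew` argument names vary slightly across versions:
```python
from crewai import Crew, Task

task = Task(
    description="Summarize the latest findings on topic X.",
    expected_output="A three-bullet summary.",
    agent=researcher,
)
crew = Crew(agents=[researcher], tasks=[task])
print(crew.kickoff())  # every LLM call the crew makes routes through prxy.monster
```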
Recommended pipeline
RAG chains:
```bash
PRXY_PIPE=semantic-cache,patterns,ipc,cost-guard
```
Agentic workflows with tool use:
```bash
PRXY_PIPE=mcp-optimizer,semantic-cache,patterns,ipc
```
Common issues
- Async chains (`await llm.ainvoke(...)`) work identically; streaming via `astream` works too (see the sketch after this list).
- Tool calling (`llm.bind_tools([...])`) is pass-through; `mcp-optimizer` prunes irrelevant tool defs per call.
- Structured output (`llm.with_structured_output(MyModel)`) is pass-through. Schema-based extraction is just a tool call under the hood.
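A combined sketch of all three paths; the `Answer` model and the prompts are illustrative:
```python
import asyncio
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", base_url="https://api.prxy.monster/v1")

class Answer(BaseModel):
    text: str
    confidence: float

async def main() -> None:
    # async invocation routes exactly like the sync path
    print((await llm.ainvoke("hi")).content)

    # streaming: chunks arrive incrementally through the proxy
    async for chunk in llm.astream("count to three"):
        print(chunk.content, end="", flush=True)

    # tool calling: tool definitions travel in the request body as usual
    tooled = llm.bind_tools([Answer])  # pydantic models double as tool schemas
    print((await tooled.ainvoke("Is water wet? Call the Answer tool.")).tool_calls)

    # structured output: a tool call under the hood, passed through untouched
    print(await llm.with_structured_output(Answer).ainvoke("Is water wet?"))

asyncio.run(main())
```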
Full example
A minimal Python LangChain script will land in examples/langchain-rag; until then, the JS version maps one-to-one onto the Python equivalent.
Verify the exact constructor argument name against the LangChain Python docs for your installed version. `base_url` (OpenAI) and `anthropic_api_url` (Anthropic) are stable as of langchain 0.3.x.