# Using prxy.monster with LangChain JS
LangChain JS uses provider chat models (`ChatOpenAI`, `ChatAnthropic`) that accept a `configuration.baseURL` (OpenAI) or `clientOptions.baseURL` (Anthropic) override. One arg per model — every chain, agent, and LangGraph node built on top inherits it.
## Install
You probably have these already.
```bash
npm install @langchain/openai @langchain/anthropic @langchain/core
```

## Configure
You can either set env vars OR pass options explicitly. Both work; explicit is clearer when reviewing code.
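If you go the env-var route, a minimal sketch is below. It assumes the underlying `openai` and `@anthropic-ai/sdk` clients read `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` (recent versions do; verify for your installed versions):

```bash
# Assumed env-var names; check your installed SDK versions.
export OPENAI_API_KEY="<your prxy key>"
export OPENAI_BASE_URL="https://api.prxy.monster/v1"

export ANTHROPIC_API_KEY="<your prxy key>"
export ANTHROPIC_BASE_URL="https://api.prxy.monster"
```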
### ChatOpenAI
```ts
import { ChatOpenAI } from '@langchain/openai';

const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY, // your prxy key
  configuration: {
    baseURL: 'https://api.prxy.monster/v1',
  },
});

const r = await llm.invoke('hi');
```

## Code change
If you used env vars: zero. If you wired explicit configuration: the `baseURL` line above is the only diff.
## Verify
```bash
curl https://api.prxy.monster/health
```

Build a one-line chain and run it — a successful response confirms routing.
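As the one-line chain, a sketch (model name and prompt are placeholders; any model routed through the proxy works):

```ts
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate } from '@langchain/core/prompts';
import { StringOutputParser } from '@langchain/core/output_parsers';

const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY, // your prxy key
  configuration: { baseURL: 'https://api.prxy.monster/v1' },
});

// prompt -> model -> string; if this prints text, requests are routing
// through prxy.monster.
const chain = ChatPromptTemplate.fromTemplate('Say hello to {name}.')
  .pipe(llm)
  .pipe(new StringOutputParser());

console.log(await chain.invoke({ name: 'prxy.monster' }));
```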
## What you get
- Caching across chains — every chain that uses the patched LLM gets `semantic-cache` for free.
- Infinite context for long-running agents — `ipc` compresses old messages, so your agents stop hitting the wall.
- Pattern memory — successful agent solutions get re-injected on similar future runs.
- MCP optimization — if you use LangChain's MCP tool integration, irrelevant tool defs get pruned per call.
## How this composes with LangChain memory
LangChain memory (BufferMemory, SummaryMemory, etc.) lives inside your chain. prxy.monster lives between your chain and the LLM. They don’t conflict — they handle different layers:
| Concern | LangChain memory | prxy.monster |
|---|---|---|
| Per-chain conversation state | Yes | No |
| Cross-chain caching | No | Yes (semantic-cache) |
| Cross-session pattern learning | No | Yes (patterns) |
| Token-budget management | Manual | Automatic (ipc, cost-guard) |
| Tool-def overhead | No | Yes (mcp-optimizer) |
Use both. They compose cleanly.
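A sketch of the composition, assuming `RunnableWithMessageHistory` and `InMemoryChatMessageHistory` from `@langchain/core` (import paths may differ by version): per-chain state stays in your process, while cross-chain caching and compression happen at the proxy.

```ts
import { ChatOpenAI } from '@langchain/openai';
import { ChatPromptTemplate, MessagesPlaceholder } from '@langchain/core/prompts';
import { InMemoryChatMessageHistory } from '@langchain/core/chat_history';
import { RunnableWithMessageHistory } from '@langchain/core/runnables';

// Proxy layer: caching / compression via the baseURL override.
const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY, // your prxy key
  configuration: { baseURL: 'https://api.prxy.monster/v1' },
});

// Chain layer: per-session conversation state, kept client-side.
const prompt = ChatPromptTemplate.fromMessages([
  ['system', 'You are a helpful assistant.'],
  new MessagesPlaceholder('history'),
  ['human', '{input}'],
]);

const histories = new Map<string, InMemoryChatMessageHistory>();

const chain = new RunnableWithMessageHistory({
  runnable: prompt.pipe(llm),
  getMessageHistory: (sessionId: string) => {
    if (!histories.has(sessionId)) {
      histories.set(sessionId, new InMemoryChatMessageHistory());
    }
    return histories.get(sessionId)!;
  },
  inputMessagesKey: 'input',
  historyMessagesKey: 'history',
});

// Two turns in the same session: the second call carries the history.
await chain.invoke({ input: 'hi, my name is Sam' }, { configurable: { sessionId: 'demo' } });
await chain.invoke({ input: 'what is my name?' }, { configurable: { sessionId: 'demo' } });
```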
## LangGraph
LangGraph wraps the same chat models. Set `baseURL` on the underlying `ChatOpenAI` / `ChatAnthropic` instance you pass to nodes — every node inherits it.
```ts
import { ChatAnthropic } from '@langchain/anthropic';
import { StateGraph } from '@langchain/langgraph';

const llm = new ChatAnthropic({
  model: 'claude-sonnet-4-6',
  clientOptions: { baseURL: 'https://api.prxy.monster' },
});

// Every node that uses `llm` now goes through prxy.monster
const graph = new StateGraph(...)
  .addNode('reason', async (state) => ({ messages: [await llm.invoke(state.messages)] }))
  .addNode('answer', async (state) => ({ messages: [await llm.invoke(state.messages)] }))
  .compile();
```

LangSmith tracing is unaffected — traces happen client-side before the request leaves your process.
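If you want to double-check that, the standard LangSmith tracing env vars (names as of recent LangChain JS releases; verify for your installed version) need no proxy-specific changes:

```bash
# Standard LangSmith tracing setup; unaffected by the baseURL override.
export LANGCHAIN_TRACING_V2="true"
export LANGCHAIN_API_KEY="<your LangSmith key>"
```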
## Recommended pipeline
For RAG / chain workflows:
```bash
PRXY_PIPE=semantic-cache,patterns,ipc,cost-guard
```

For agentic workflows with tool use:
```bash
PRXY_PIPE=mcp-optimizer,semantic-cache,patterns,ipc
```

## Common issues
- `maxTokens` vs `max_tokens` — LangChain JS uses camelCase; the SDK translates to snake_case before hitting the wire. prxy.monster sees the snake_case version. Nothing to do.
- Streaming — `llm.stream(...)` works; cache hits replay as synthetic SSE (see the sketch after this list).
- Custom callbacks — fire client-side, before the request leaves. They see the original chain state, not what the proxy did.
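A minimal streaming sketch, assuming the same proxied `ChatOpenAI` setup as above (model name and prompt are placeholders):

```ts
import { ChatOpenAI } from '@langchain/openai';

const llm = new ChatOpenAI({
  model: 'gpt-4o',
  apiKey: process.env.OPENAI_API_KEY, // your prxy key
  configuration: { baseURL: 'https://api.prxy.monster/v1' },
});

// Same streaming call as against the upstream API; on a cache hit,
// prxy.monster replays the chunks as synthetic SSE, so this loop still
// receives them incrementally.
const stream = await llm.stream('Summarize this repo in one sentence.');
for await (const chunk of stream) {
  process.stdout.write(String(chunk.content));
}
```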
## Full example
LangChain RAG pipeline with semantic caching: github.com/Ekkos-Technologies-Inc/prxy-examples/tree/main/examples/langchain-rag
Verify the exact constructor option name (`configuration.baseURL` for OpenAI, `clientOptions.baseURL` for Anthropic) against the LangChain JS docs for your installed version.