mcp-optimizer
Category: optimization · Cloud + Local · Status: v1 — production
Embeds each tool’s name + description, scores each one against the current user message, and drops the tools that score below a relevance threshold. The kept set is stable per-session so provider prompt caches don’t shatter.
What it does
A typical MCP-using agent ships 30–80 tools on every request, even though the user is only asking about one. mcp-optimizer keeps the relevant subset.
Before: 67k tokens of tool definitions
After: 8k tokens of tool definitions
88% reduction in MCP overhead

When to use it
✅ Any agent that uses MCP tools
✅ Claude Code, Cline, Continue.dev, custom MCP clients
✅ Multi-server MCP setups (filesystem + GitHub + Slack + …)
❌ Apps that don’t use tools
❌ Apps where every tool is always relevant (rare)
Configuration
```yaml
mcp-optimizer:
  relevanceThreshold: 0.6          # 0.0 - 1.0; lower = keep more tools
  preserveTools: []                # always keep these (by name)
  embeddingModel: 'voyage-3-lite'  # or 'text-embedding-3-small'
  minTools: 1                      # never drop below this many
```

Metrics emitted
- `mcp-optimizer.tools.before` (number)
- `mcp-optimizer.tools.after` (number)
- `mcp-optimizer.tokens.saved` (number)
- `mcp-optimizer.duration_ms` (number)
Examples
Conservative — keep most tools, drop only obvious mismatches:

```yaml
mcp-optimizer:
  relevanceThreshold: 0.4
```

Aggressive — strip hard, save tokens:

```yaml
mcp-optimizer:
  relevanceThreshold: 0.75
  preserveTools: ['read_file', 'write_file', 'bash']
```

Coding-assistant tuned — keep file ops always, drop the rest by relevance:

```yaml
mcp-optimizer:
  relevanceThreshold: 0.6
  preserveTools:
    - read_file
    - write_file
    - bash
    - grep
    - glob
```

How it works
- Pre hook: extract the user’s last message text.
- Embed the user message.
- For each tool in `request.tools`: embed `${name}: ${description}`. (Cached by tool hash — first request pays the cost, subsequent ones hit the cache.)
- Compute cosine similarity. Keep tools above the threshold plus any in `preserveTools`. Always keep at least `minTools`.
- Replace `request.tools` with the kept subset.
- Attach `metadata['mcp-optimizer.tokens.saved']` for downstream visibility.
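The scoring-and-filtering steps can be sketched as below. This is a minimal illustration, not the plugin’s actual API: the `Tool` shape, `filterTools` name, and the precomputed embedding map are all assumptions.

```typescript
// Hypothetical shapes for illustration; not the real mcp-optimizer internals.
interface Tool { name: string; description: string; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero vectors: treat similarity as 0.
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function filterTools(
  tools: Tool[],
  toolVecs: Map<string, number[]>, // cached per tool hash in the real plugin
  queryVec: number[],              // embedding of the user's last message
  threshold: number,
  preserve: Set<string>,
  minTools: number,
): Tool[] {
  const scored = tools.map(t => ({
    tool: t,
    score: cosine(toolVecs.get(t.name)!, queryVec),
  }));
  let kept = scored
    .filter(s => s.score >= threshold || preserve.has(s.tool.name))
    .map(s => s.tool);
  if (kept.length < minTools) {
    // Backfill with the highest-scoring dropped tools so we never go below minTools.
    kept = scored
      .sort((a, b) => b.score - a.score)
      .slice(0, minTools)
      .map(s => s.tool);
  }
  return kept;
}
```

Everything else in the pipeline (the pre hook, the token accounting in `metadata`) wraps around a filter of roughly this shape.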
The kept subset is stable per session. We hash the input set + threshold + user message into a session key — so the same cache prefix lands at Anthropic on the next turn. Critical for prompt cache hit rates.
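A minimal sketch of that session-key idea, assuming Node’s `crypto` module (the `sessionKey` name and field layout are illustrative, not the plugin’s actual code):

```typescript
import { createHash } from 'node:crypto';

// Derive a stable key from the input tool set, threshold, and user message.
// Same inputs on the next turn -> same key -> same kept subset -> same
// prompt prefix, which is what keeps provider caches warm.
function sessionKey(
  toolNames: string[],
  threshold: number,
  userMessage: string,
): string {
  const h = createHash('sha256');
  // Sort so the key doesn't depend on tool ordering in the request.
  h.update([...toolNames].sort().join('\n'));
  h.update(`\x00${threshold}\x00${userMessage}`);
  return h.digest('hex');
}
```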
Cloud vs Local
| Mode | Embedding backend |
|---|---|
| Cloud | Voyage AI (configurable) — falls back to deterministic stub if no key |
| Local | Same — uses your `VOYAGE_API_KEY` if set, else stub |
The stub hashes character trigrams with SHA-256 and projects them into a 256-dimensional vector. Quality is poor but stable, so caches behave deterministically in tests.
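The exact projection the stub uses isn’t spelled out here; a minimal sketch of one such scheme, assuming Node’s `crypto` module (the `stubEmbed` name and the index/sign choice are assumptions, not the shipped implementation):

```typescript
import { createHash } from 'node:crypto';

// Deterministic stand-in embedding: hash each character trigram with
// SHA-256 and scatter +/-1 contributions into a fixed-size vector.
function stubEmbed(text: string, dim = 256): number[] {
  const vec: number[] = new Array(dim).fill(0);
  for (let i = 0; i <= text.length - 3; i++) {
    const digest = createHash('sha256').update(text.slice(i, i + 3)).digest();
    // First two digest bytes pick the index, third byte picks the sign.
    const idx = ((digest[0] << 8) | digest[1]) % dim;
    vec[idx] += digest[2] % 2 === 0 ? 1 : -1;
  }
  // L2-normalize so cosine similarity behaves sensibly.
  const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0)) || 1;
  return vec.map(v => v / norm);
}
```

Being pure hashing, identical inputs always produce identical vectors, which is the property the tests rely on.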