mcp-optimizer
Category: optimization · Cloud + Local · Status: v1 — production
Embeds each tool’s name + description, scores each one against the current user message, and drops the tools that score below a relevance threshold. The kept set is stable per-session so provider prompt caches don’t shatter.
What it does
A typical MCP-using agent ships 30–80 tools on every request, even though the user is only asking about one. mcp-optimizer keeps the relevant subset.
Before: 67k tokens of tool definitions
After: 8k tokens of tool definitions
88% reduction in MCP overhead

When to use it
✅ Any agent that uses MCP tools
✅ Claude Code, Cline, Continue.dev, custom MCP clients
✅ Multi-server MCP setups (filesystem + GitHub + Slack + …)
❌ Apps that don’t use tools
❌ Apps where every tool is always relevant (rare)
Configuration
```yaml
mcp-optimizer:
  relevanceThreshold: 0.6          # 0.0 - 1.0; lower = keep more tools
  preserveTools: []                # always keep these (by name)
  embeddingModel: 'voyage-3-lite'  # or 'text-embedding-3-small'
  minTools: 1                      # never drop below this many
```

Metrics emitted
- `mcp-optimizer.tools.before` (number)
- `mcp-optimizer.tools.after` (number)
- `mcp-optimizer.tokens.saved` (number)
- `mcp-optimizer.duration_ms` (number)
Examples
Conservative — keep most tools, drop only obvious mismatches:

```yaml
mcp-optimizer:
  relevanceThreshold: 0.4
```

Aggressive — strip hard, save tokens:

```yaml
mcp-optimizer:
  relevanceThreshold: 0.75
  preserveTools: ['read_file', 'write_file', 'bash']
```

Coding-assistant tuned — keep file ops always, drop the rest by relevance:

```yaml
mcp-optimizer:
  relevanceThreshold: 0.6
  preserveTools:
    - read_file
    - write_file
    - bash
    - grep
    - glob
```

How it works
- Pre hook: extract the user’s last message text.
- Embed the user message.
- For each tool in `request.tools`: embed `${name}: ${description}`. (Cached by tool hash — first request pays the cost, subsequent ones hit the cache.)
- Compute cosine similarity. Keep tools above the threshold plus any in `preserveTools`. Always keep at least `minTools`.
- Replace `request.tools` with the kept subset.
- Attach `metadata['mcp-optimizer.tokens.saved']` for downstream visibility.
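The scoring-and-filtering steps can be sketched as below. This is a minimal illustration, not the plugin’s actual API: the `Tool` shape, `filterTools` name, and the precomputed embedding map are all assumptions.

```typescript
// Hypothetical shapes for illustration; not the real mcp-optimizer internals.
interface Tool { name: string; description: string; }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  // Guard against zero vectors: treat similarity as 0.
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

function filterTools(
  tools: Tool[],
  toolVecs: Map<string, number[]>, // cached per tool hash in the real plugin
  queryVec: number[],              // embedding of the user's last message
  threshold: number,
  preserve: Set<string>,
  minTools: number,
): Tool[] {
  const scored = tools.map(t => ({
    tool: t,
    score: cosine(toolVecs.get(t.name)!, queryVec),
  }));
  let kept = scored
    .filter(s => s.score >= threshold || preserve.has(s.tool.name))
    .map(s => s.tool);
  if (kept.length < minTools) {
    // Backfill with the highest-scoring dropped tools so we never go below minTools.
    kept = scored
      .sort((a, b) => b.score - a.score)
      .slice(0, minTools)
      .map(s => s.tool);
  }
  return kept;
}
```

Everything else in the pipeline (the pre hook, the token accounting in `metadata`) wraps around a filter of roughly this shape.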
The kept subset is stable per session. We hash the input set + threshold + user message into a session key — so the same cache prefix lands at Anthropic on the next turn. Critical for prompt cache hit rates.
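A minimal sketch of that session-key idea, assuming Node’s `crypto` module (the `sessionKey` name and field layout are illustrative, not the plugin’s actual code):

```typescript
import { createHash } from 'node:crypto';

// Derive a stable key from the input tool set, threshold, and user message.
// Same inputs on the next turn -> same key -> same kept subset -> same
// prompt prefix, which is what keeps provider caches warm.
function sessionKey(
  toolNames: string[],
  threshold: number,
  userMessage: string,
): string {
  const h = createHash('sha256');
  // Sort so the key doesn't depend on tool ordering in the request.
  h.update([...toolNames].sort().join('\n'));
  h.update(`\x00${threshold}\x00${userMessage}`);
  return h.digest('hex');
}
```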
Cloud vs Local
| Mode | Embedding backend |
|---|---|
| Cloud | Voyage AI (configurable) — falls back to deterministic stub if no key |
| Local | Same — uses your `VOYAGE_API_KEY` if set, else stub |
The stub hashes character trigrams with SHA-256 and projects them into a 256-dimensional vector. Quality is poor but stable, so caches behave deterministically in tests.
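The exact projection the stub uses isn’t spelled out here; a minimal sketch of one such scheme, assuming Node’s `crypto` module (the `stubEmbed` name and the index/sign choice are assumptions, not the shipped implementation):

```typescript
import { createHash } from 'node:crypto';

// Deterministic stand-in embedding: hash each character trigram with
// SHA-256 and scatter +/-1 contributions into a fixed-size vector.
function stubEmbed(text: string, dim = 256): number[] {
  const vec: number[] = new Array(dim).fill(0);
  for (let i = 0; i <= text.length - 3; i++) {
    const digest = createHash('sha256').update(text.slice(i, i + 3)).digest();
    // First two digest bytes pick the index, third byte picks the sign.
    const idx = ((digest[0] << 8) | digest[1]) % dim;
    vec[idx] += digest[2] % 2 === 0 ? 1 : -1;
  }
  // L2-normalize so cosine similarity behaves sensibly.
  const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0)) || 1;
  return vec.map(v => v / norm);
}
```

Being pure hashing, identical inputs always produce identical vectors, which is the property the tests rely on.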