`router`

Category: routing · Cloud + Local · Status: v1.0 — production

Picks the right model for each request. Falls back through a configured chain. The cloud edition records per-(query bucket, model) outcomes and learns over time which model produces high-quality results most cheaply.

What it does

Three strategies, swappable at any time:

cheapest-first (default) — sort the candidate list by per-token price, pick the cheapest model that fits the budget.
fallback — try fallback_chain[0]. (Real fallback-on-error is handled at the gateway pipeline level — this strategy just picks the model name.)
q-learning — score each (query bucket, model) by historical success rate; pick the best. Cold start falls back to cheapest-first.

The originally-requested model is preserved in metadata['router.requested_model'] so downstream modules and observability dashboards can see what the client asked for vs. what shipped.

When to use it

You don’t want to hardcode claude-sonnet-4-6 everywhere — let the router pick.
You want automatic provider failover (Anthropic 5xx → fall back to OpenAI).
You want a cheap model to handle the easy questions and an expensive model only when needed.

Configuration


router:
  strategy: 'cheapest-first'  # 'q-learning' | 'fallback' | 'cheapest-first'
  fallback_chain:
    - claude-sonnet-4-6
    - gpt-4o
    - gemini-2.0-pro
  prefer:                     # try these first if confidence high
    - claude-haiku-4-5
  budget_per_request: 0.10    # never pick a model whose estimate exceeds this
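
When this block is loaded from code, it could map onto a shape like the one below. This is only a sketch: the `RouterConfig` type, the `validateRouterConfig` helper, and its messages are hypothetical names made up for illustration, not the module's actual exports, and treating every key as required is an assumption.

```typescript
// Hypothetical types mirroring the YAML keys above; not the module's real exports.
type Strategy = "cheapest-first" | "fallback" | "q-learning";

interface RouterConfig {
  strategy: Strategy;
  fallback_chain: string[];
  prefer: string[];
  budget_per_request: number; // assumed to share a unit with the per-model cost estimate
}

const STRATEGIES: Strategy[] = ["cheapest-first", "fallback", "q-learning"];

// Returns a list of problems; an empty list means the config looks usable.
function validateRouterConfig(cfg: RouterConfig): string[] {
  const problems: string[] = [];
  if (STRATEGIES.indexOf(cfg.strategy) === -1) {
    problems.push(`unknown strategy: ${String(cfg.strategy)}`);
  }
  if (!(cfg.budget_per_request > 0)) {
    problems.push("budget_per_request must be a positive number");
  }
  return problems;
}

// The same values as the YAML example above.
const exampleConfig: RouterConfig = {
  strategy: "cheapest-first",
  fallback_chain: ["claude-sonnet-4-6", "gpt-4o", "gemini-2.0-pro"],
  prefer: ["claude-haiku-4-5"],
  budget_per_request: 0.1,
};

console.log(validateRouterConfig(exampleConfig)); // no problems expected for this config
```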

Metrics emitted

router.requested_model (string) — what the client asked for.
router.selected_model (string) — what was actually shipped.
router.strategy (string) — which strategy ran.

How it works

Pre hook:
- Record the requested model in metadata.
- Build the candidate list: prefer first, then fallback_chain, deduplicated. The requested model is always included.
- Drop any candidate whose estimated cost for this request exceeds budget_per_request (the estimate is computed per model).
- Apply the strategy:
  - cheapest-first → sort by pricing.input + pricing.output, take the cheapest.
  - fallback → take candidates[0], the first candidate left after the budget filter.
  - q-learning → look up historical success rate per (bucket, model) and take the highest-rated. Cold start falls back to cheapest-first.
- Mutate request.model to the selection.
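
The pre hook steps above can be sketched roughly as follows. Everything named here (`GatewayRequest`, `buildCandidates`, `preHook`, the `estimate` callback) is hypothetical and is not taken from router.ts. The sketch also makes three assumptions the text does not state: the requested model is appended at the end of the candidate list, a model with no known price sorts last, and the request is left unchanged when no candidate fits the budget. The q-learning strategy appears only in its cold-start form, which falls back to cheapest-first.

```typescript
// Hypothetical shapes for illustration; not the module's real types.
interface Pricing { input: number; output: number }
interface RouterConfig {
  strategy: "cheapest-first" | "fallback" | "q-learning";
  fallback_chain: string[];
  prefer: string[];
  budget_per_request: number;
}
interface GatewayRequest { model: string; metadata: Record<string, string> }

// prefer first, then fallback_chain, then the requested model (position assumed), deduplicated.
function buildCandidates(cfg: RouterConfig, requested: string): string[] {
  const out: string[] = [];
  for (const m of cfg.prefer.concat(cfg.fallback_chain, [requested])) {
    if (out.indexOf(m) === -1) out.push(m);
  }
  return out;
}

function preHook(
  req: GatewayRequest,
  cfg: RouterConfig,
  pricing: Record<string, Pricing>,
  estimate: (model: string) => number, // estimated cost of this request on `model`
): GatewayRequest {
  // Record what the client asked for before anything is changed.
  req.metadata["router.requested_model"] = req.model;

  // Budget filter.
  const candidates = buildCandidates(cfg, req.model).filter(
    (m) => estimate(m) <= cfg.budget_per_request,
  );
  if (candidates.length === 0) return req; // assumption: keep the requested model

  const price = (m: string): number => {
    const p = pricing[m];
    return p ? p.input + p.output : Infinity; // assumption: unknown price sorts last
  };

  let selected = candidates[0]; // 'fallback': first surviving candidate
  if (cfg.strategy !== "fallback") {
    // cheapest-first, which is also the cold-start path for q-learning
    for (const m of candidates) {
      if (price(m) < price(selected)) selected = m;
    }
  }
  req.model = selected;
  return req;
}
```

Ties on price keep the earlier candidate, so the order of prefer and fallback_chain still matters under cheapest-first in this sketch.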

Post hook (q-learning only):
- Update the (bucket, model) counters: always add 1 to n (attempts), and add 1 to s (successes) only if the response was successful.
- Stat rows expire after a 30-day TTL.
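
To make the counters concrete, here is a minimal in-memory sketch of the q-learning bookkeeping. The `BucketStats` class and its methods are hypothetical stand-ins. The real storage layer is not described here, and two details are assumptions: the TTL is refreshed on every write, and "cold start" means that no candidate has a live stat row for the bucket.

```typescript
// Hypothetical in-memory stand-in for the stats store.
interface Stat { n: number; s: number; expiresAt: number }

const TTL_MS = 30 * 24 * 60 * 60 * 1000; // the 30-day TTL from the text

class BucketStats {
  private rows: Record<string, Stat> = {};

  private key(bucket: string, model: string): string {
    return bucket + "|" + model;
  }

  // Post hook: n always increments, s only on success.
  record(bucket: string, model: string, success: boolean, now: number): void {
    const k = this.key(bucket, model);
    const prev = this.rows[k];
    const live = prev && prev.expiresAt > now ? prev : { n: 0, s: 0, expiresAt: 0 };
    this.rows[k] = {
      n: live.n + 1,
      s: live.s + (success ? 1 : 0),
      expiresAt: now + TTL_MS, // assumption: each write refreshes the TTL
    };
  }

  // Success rate s / n, or undefined when there is no live row.
  rate(bucket: string, model: string, now: number): number | undefined {
    const row = this.rows[this.key(bucket, model)];
    if (!row || row.expiresAt <= now || row.n === 0) return undefined;
    return row.s / row.n;
  }

  // Highest-rated candidate, or undefined on cold start (no candidate has data).
  best(bucket: string, candidates: string[], now: number): string | undefined {
    let top: string | undefined;
    let topRate = -1;
    for (const m of candidates) {
      const r = this.rate(bucket, m, now);
      if (r !== undefined && r > topRate) {
        top = m;
        topRate = r;
      }
    }
    return top;
  }
}
```

When `best` returns undefined, the caller would fall back to cheapest-first, matching the cold-start rule in the pre hook.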

Migration note

If you’re coming from OpenRouter and using their fallback feature, this is the module that replicates it.

Source

packages/modules-core/src/router.ts