router
Category: routing · Cloud + Local · Status: v1.0 — production
Picks the right model for each request. Falls back through a configured chain. The cloud edition records per-(query bucket, model) outcomes and learns over time which model produces high-quality results most cheaply.
What it does
Three strategies, swappable at any time:
- `cheapest-first` (default): sort the candidate list by per-token price and pick the cheapest model that fits the budget.
- `fallback`: try `fallback_chain[0]`. (Real fallback-on-error is handled at the gateway pipeline level; this strategy only picks the model name.)
- `q-learning`: score each (query bucket, model) pair by historical success rate and pick the best. On cold start it falls back to `cheapest-first`.
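A minimal Python sketch of the three strategies follows. It assumes each candidate carries illustrative per-token input and output prices and that outcome stats are kept as `(n, s)` pairs (attempts, successes) per (query bucket, model). `Candidate`, `pick`, and the stats shape are hypothetical names, not the module's API. The `q-learning` branch implements only what this page describes (highest historical success rate `s / n`, with `cheapest-first` as the cold-start fallback); it is not a full Q-learning update. Budget filtering is assumed to have happened before `pick` runs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    name: str
    input_price: float   # per-token input price (illustrative numbers only)
    output_price: float  # per-token output price (illustrative numbers only)


# (query bucket, model) -> (n attempts, s successes); hypothetical shape
Stats = Dict[Tuple[str, str], Tuple[int, int]]


def cheapest_first(candidates: List[Candidate]) -> str:
    # Rank by input + output price; assumes the list is already budget-filtered.
    return min(candidates, key=lambda c: c.input_price + c.output_price).name


def fallback(candidates: List[Candidate]) -> str:
    # Only picks a name; retry-on-error lives in the gateway pipeline.
    return candidates[0].name


def q_learning(candidates: List[Candidate], bucket: str, stats: Stats) -> str:
    # Success rate s / n for every candidate with history in this bucket.
    rated = []
    for c in candidates:
        n, s = stats.get((bucket, c.name), (0, 0))
        if n > 0:
            rated.append((s / n, c.name))
    if not rated:
        # Cold start (assumed meaning): no candidate has history in this bucket.
        return cheapest_first(candidates)
    return max(rated, key=lambda r: r[0])[1]


def pick(strategy: str, candidates: List[Candidate],
         bucket: str = "", stats: Optional[Stats] = None) -> str:
    if strategy == "cheapest-first":
        return cheapest_first(candidates)
    if strategy == "fallback":
        return fallback(candidates)
    if strategy == "q-learning":
        return q_learning(candidates, bucket, stats or {})
    raise ValueError(f"unknown strategy: {strategy!r}")
```

In this sketch, ties in success rate resolve to the earlier candidate in the list, because Python's `max` returns the first maximal element.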
The originally requested model is preserved in `metadata['router.requested_model']`, so downstream modules and observability dashboards can see what the client asked for versus what was actually shipped.
When to use it
- You don’t want to hardcode `claude-sonnet-4-6` everywhere; let the router pick.
- You want automatic provider failover (Anthropic 5xx → fall back to OpenAI).
- You want the cheap model to handle the easy questions, the expensive model only when needed.
Configuration
```yaml
router:
  strategy: 'cheapest-first'  # 'q-learning' | 'fallback' | 'cheapest-first'
  fallback_chain:
    - claude-sonnet-4-6
    - gpt-4o
    - gemini-2.0-pro
  prefer:                     # try these first if confidence high
    - claude-haiku-4-5
  budget_per_request: 0.10    # never pick a model whose estimate exceeds this
```
Metrics emitted
- `router.requested_model` (string): what the client asked for.
- `router.selected_model` (string): what was actually shipped.
- `router.strategy` (string): which strategy ran.
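As a hypothetical consumer of these metrics, the sketch below computes how often the router shipped a different model than the client requested, broken down by strategy. It assumes each request's metrics arrive as one flat dict keyed by metric name; that event shape and `override_summary` are illustrative, not part of the module.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def override_summary(events: Iterable[Dict[str, str]]) -> Tuple[float, Counter]:
    """Return the fraction of requests where the selected model differs from
    the requested one, plus a per-strategy count of those overrides."""
    total = 0
    overrides: Counter = Counter()
    for e in events:
        total += 1
        if e["router.selected_model"] != e["router.requested_model"]:
            overrides[e["router.strategy"]] += 1
    # Guard against an empty event stream.
    rate = sum(overrides.values()) / total if total else 0.0
    return rate, overrides
```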
How it works
- Pre hook:
  - Record the requested model in metadata.
  - Build the candidate list: `prefer` first, then `fallback_chain`, deduplicated. Always include the requested model.
  - Filter out anything above `budget_per_request` (estimated cost per model).
  - Apply the strategy:
    - `cheapest-first` → sort by `pricing.input + pricing.output` and take the cheapest.
    - `fallback` → take `candidates[0]`.
    - `q-learning` → look up the historical success rate per (bucket, model) and take the highest-rated. Cold start falls back to `cheapest-first`.
  - Set `request.model` to the selection.
- Post hook (`q-learning` only):
  - Increment the (bucket, model) counter: always add 1 to `n`, and add 1 to `s` if the response was successful.
  - Stat rows expire after a 30-day TTL.
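The two hooks can be sketched as follows. This is a minimal Python sketch, not the module's source: `RouterConfig`, `Request`, `build_candidates`, `StatStore`, and the `estimate` and `choose` callables are hypothetical names, and the strategy step is passed in as `choose` to keep the sketch short. Three details the page leaves open are assumptions here: the requested model is appended to the end of the candidate list when missing, `request.model` is left unchanged when no candidate fits the budget, and the 30-day TTL is measured from a row's last write.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

TTL_SECONDS = 30 * 24 * 60 * 60  # 30-day TTL on stat rows


@dataclass
class RouterConfig:
    # Mirrors the YAML keys in the Configuration section.
    fallback_chain: List[str]
    prefer: List[str] = field(default_factory=list)
    budget_per_request: float = 0.10


@dataclass
class Request:
    # Hypothetical request shape: only the fields this sketch touches.
    model: str
    metadata: Dict[str, str] = field(default_factory=dict)


def build_candidates(cfg: RouterConfig, requested: str) -> List[str]:
    # prefer first, then fallback_chain, deduplicated; requested model appended if missing.
    out: List[str] = []
    for name in [*cfg.prefer, *cfg.fallback_chain, requested]:
        if name not in out:
            out.append(name)
    return out


def pre_hook(req: Request, cfg: RouterConfig,
             estimate: Callable[[str], float],
             choose: Callable[[List[str]], str]) -> None:
    req.metadata["router.requested_model"] = req.model
    affordable = [m for m in build_candidates(cfg, req.model)
                  if estimate(m) <= cfg.budget_per_request]
    if affordable:  # assumption: keep the request as-is if nothing fits the budget
        req.model = choose(affordable)


class StatStore:
    """(bucket, model) -> (n, s, last_write); rows older than the TTL count as absent."""

    def __init__(self, ttl: float = TTL_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl
        self.clock = clock
        self.rows: Dict[Tuple[str, str], Tuple[int, int, float]] = {}

    def get(self, bucket: str, model: str) -> Optional[Tuple[int, int]]:
        row = self.rows.get((bucket, model))
        if row is None or self.clock() - row[2] > self.ttl:
            return None
        return row[0], row[1]

    def record(self, bucket: str, model: str, success: bool) -> None:
        # An expired row restarts from zero.
        n, s = self.get(bucket, model) or (0, 0)
        self.rows[(bucket, model)] = (n + 1, s + int(success), self.clock())


def post_hook(store: StatStore, bucket: str, req: Request, success: bool) -> None:
    # q-learning only: n += 1 always, s += 1 on success.
    store.record(bucket, req.model, success)
```

Injecting a fake `clock` into `StatStore` makes the TTL behaviour testable without waiting 30 days.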
Migration note
If you’re coming from OpenRouter and using their fallback feature, this is the module that replicates it.