Skip to Content
prxy.monster v1 is in early access. See what shipped →
Modulesrouter

router

Category: routing · Cloud + Local · Status: v1.0 — production

Picks the right model for each request. Falls back through a configured chain. The cloud edition records per-(query bucket, model) outcomes and learns over time which model produces high-quality results most cheaply.

What it does

Three strategies, swappable at any time:

  • cheapest-first (default) — sort the candidate list by per-token price, pick the cheapest model that fits the budget.
  • fallback — try fallback_chain[0]. (Real fallback-on-error is handled at the gateway pipeline level — this strategy just picks the model name.)
  • q-learning — score each (query bucket, model) by historical success rate; pick the best. Cold start falls back to cheapest-first.

The originally-requested model is preserved in metadata['router.requested_model'] so downstream modules and observability dashboards can see what the client asked for vs. what shipped.

When to use it

  • You don’t want to hardcode claude-sonnet-4-6 everywhere — let the router pick.
  • You want automatic provider failover (Anthropic 5xx → fall back to OpenAI).
  • You want the cheap model to handle the easy questions, the expensive model only when needed.

Configuration

router: strategy: 'cheapest-first' # 'q-learning' | 'fallback' | 'cheapest-first' fallback_chain: - claude-sonnet-4-6 - gpt-4o - gemini-2.0-pro prefer: # try these first if confidence high - claude-haiku-4-5 budget_per_request: 0.10 # never pick a model whose estimate exceeds this

Metrics emitted

  • router.requested_model (string) — what the client asked for.
  • router.selected_model (string) — what was actually shipped.
  • router.strategy (string) — which strategy ran.

How it works

  1. Pre hook:

    • Record the requested model in metadata.
    • Build the candidate list: prefer first, then fallback_chain, dedup. Always include the requested model.
    • Filter out anything above budget_per_request (estimate per model).
    • Apply the strategy:
      • cheapest-first → sort by pricing.input + pricing.output, take the cheapest.
      • fallback → take candidates[0].
      • q-learning → look up historical success rate per (bucket, model) and take the highest-rated. Cold start falls back to cheapest-first.
    • Mutate request.model to the selection.
  2. Post hook (q-learning only):

    • Increment the (bucket, model) counter; add 1 to n always, add 1 to s if the response was successful.
    • 30-day TTL on stat rows.

Migration note

If you’re coming from OpenRouter and using their fallback feature, this is the module that replicates it.

Source

packages/modules-core/src/router.ts

Last updated on