Auto-routing

What it solves

A single fiz run request usually under-specifies the route. The caller asks for a policy (cheap / default / smart / air-gapped) or pins one axis (--model, --provider, --harness) and expects the runtime to fill in the rest, skip providers that just timed out, exclude providers that are quota-exhausted, and reuse a previously-good choice when a correlation key is present. That’s auto-routing: the engine that collapses an under-specified request into a concrete (harness, provider, endpoint, model) decision against live signals.

The engine is model-first: it ranks concrete models against the caller’s policy/power bounds and capability requirements, then picks the best provider that can serve the chosen model. Provider preference (local-first, subscription-first) is a tiebreaker, not a primary axis. See ADR-006 for the rationale.
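The model-first ordering can be sketched as follows. This is a minimal illustration, not the actual internal/routing code: the Model and Provider types, the integer Power score, and the pickRoute function are all hypothetical, and "strongest model inside the bounds wins" is an assumed ranking rule.

```go
package main

import (
	"fmt"
	"sort"
)

// Model and Provider are hypothetical types for illustration only.
type Model struct {
	Name  string
	Power int // assumed capability score; higher is stronger
}

type Provider struct {
	Name   string
	Local  bool
	Models map[string]bool // models this provider can serve
}

// pickRoute ranks models first, then chooses a provider for the winning model.
// Provider preference (here: local-first) only breaks ties between providers
// that can serve the same model.
func pickRoute(models []Model, providers []Provider, minPower, maxPower int) (string, string, bool) {
	eligible := make([]Model, 0, len(models))
	for _, m := range models {
		if m.Power >= minPower && m.Power <= maxPower {
			eligible = append(eligible, m)
		}
	}
	// Strongest eligible model first (assumed ranking rule).
	sort.SliceStable(eligible, func(i, j int) bool { return eligible[i].Power > eligible[j].Power })

	for _, m := range eligible {
		var best *Provider
		for i := range providers {
			p := &providers[i]
			if !p.Models[m.Name] {
				continue
			}
			if best == nil || (p.Local && !best.Local) {
				best = p
			}
		}
		if best != nil {
			return m.Name, best.Name, true
		}
	}
	return "", "", false
}

func main() {
	models := []Model{{"small", 3}, {"large", 8}}
	providers := []Provider{
		{Name: "local", Local: true, Models: map[string]bool{"small": true}},
		{Name: "cloud", Local: false, Models: map[string]bool{"small": true, "large": true}},
	}
	model, provider, ok := pickRoute(models, providers, 1, 10)
	fmt.Println(model, provider, ok)
}
```

With wide power bounds the sketch picks the stronger model even though only the non-local provider serves it; the local-first preference matters only once two providers can serve the same model.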

Public surface

Resolve one request

The single public entry point is FizeauService.ResolveRoute, returning a RouteDecision with the chosen Harness, Provider, Endpoint, ServerInstance, and Model, plus the full ranked Candidates trace (including rejected candidates and their typed FilterReason).

Internally it delegates to internal/routing.Resolve, the single ranking engine; everything else (cooldowns, lease reuse, escalation) is plumbing around it.
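As a rough sketch of how a caller might read the ranked trace, the snippet below filters a decision down to its rejected candidates. The struct layouts are assumptions that only mirror the field names mentioned above; the real RouteDecision and FilterReason types may differ, and the sample values (including the "quota_exhausted" reason string) are placeholders.

```go
package main

import "fmt"

// Candidate and RouteDecision mirror the field names mentioned in the text;
// the exact Go types are assumptions made for this sketch.
type Candidate struct {
	Provider     string
	Model        string
	Score        float64
	FilterReason string // empty when the candidate was eligible (assumed convention)
}

type RouteDecision struct {
	Harness, Provider, Endpoint, ServerInstance, Model string
	Candidates                                         []Candidate
}

// rejected returns the candidates that were filtered out, in trace order.
func rejected(d RouteDecision) []Candidate {
	var out []Candidate
	for _, c := range d.Candidates {
		if c.FilterReason != "" {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	// Placeholder decision; in practice this would come from ResolveRoute.
	d := RouteDecision{
		Provider: "local",
		Model:    "qwen3-coder-30b",
		Candidates: []Candidate{
			{Provider: "local", Model: "qwen3-coder-30b", Score: 0.9},
			{Provider: "cloud", Model: "qwen3-coder-30b", FilterReason: "quota_exhausted"},
		},
	}
	for _, c := range rejected(d) {
		fmt.Printf("rejected %s/%s: %s\n", c.Provider, c.Model, c.FilterReason)
	}
}
```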

Failure modes

The engine returns typed errors so callers can branch precisely:

Quota state machine

ProviderQuotaStateStore tracks each provider as available or quota_exhausted with a RetryAfter instant. Transitions:

available       --MarkQuotaExhausted-->     quota_exhausted
quota_exhausted --MarkAvailable-->          available
quota_exhausted --(now >= retry_after)-->   available   // auto-decay

ProviderBurnRateTracker maintains a per-provider rolling daily-token window and predictively transitions a provider to quota_exhausted before the upstream quota error fires, when a daily_token_budget is configured. This turns observed token usage (see Performance tracking) into routing pressure.

Per-attempt feedback

After every dispatch, the public RecordRouteAttempt method records the outcome into internal/routehealth.Store. Failed attempts cool down the (provider, model, endpoint) tuple for routing.health_cooldown (default 60s) so the next ResolveRoute skips it. This makes auto-routing adaptive rather than purely configuration-driven.

Routing-quality ring

internal/routingquality.Store is a 1024-entry in-memory ring of recent Execute calls and their overrides. The root facade projects that store onto RoutingQualityMetrics, which exposes three first-class numbers (ADR-006 §5):

  • AutoAcceptanceRate — fraction of requests with no override. The headline routing-health number.
  • OverrideDisagreementRate — fraction of overrides where the user pin actually differed from auto’s choice on the overridden axis.
  • OverrideClassBreakdown — pivot of (prompt-feature bucket, axis, match) so operators can see which requests humans keep overriding.
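The first two metrics reduce to simple ratios over the ring's contents. The sketch below shows one way to compute them; the record and ring types are hypothetical, and the real store records more per call (for example the prompt-feature bucket and axis needed for OverrideClassBreakdown, which is omitted here).

```go
package main

import "fmt"

// record is a hypothetical summary of one Execute call.
type record struct {
	Overridden bool // the caller pinned at least one axis
	Disagreed  bool // the pin differed from auto's choice on that axis
}

// ring is a fixed-size buffer that overwrites the oldest entry when full.
type ring struct {
	buf  []record
	next int
	full bool
}

func newRing(n int) *ring { return &ring{buf: make([]record, n)} }

func (r *ring) Add(rec record) {
	r.buf[r.next] = rec
	r.next = (r.next + 1) % len(r.buf)
	if r.next == 0 {
		r.full = true
	}
}

// snapshot returns the live entries (order is irrelevant for the ratios).
func (r *ring) snapshot() []record {
	if r.full {
		return r.buf
	}
	return r.buf[:r.next]
}

// Metrics returns (AutoAcceptanceRate, OverrideDisagreementRate).
func (r *ring) Metrics() (acceptance, disagreement float64) {
	recs := r.snapshot()
	var overrides, disagreed int
	for _, rec := range recs {
		if rec.Overridden {
			overrides++
			if rec.Disagreed {
				disagreed++
			}
		}
	}
	if len(recs) > 0 {
		acceptance = float64(len(recs)-overrides) / float64(len(recs))
	}
	if overrides > 0 {
		disagreement = float64(disagreed) / float64(overrides)
	}
	return acceptance, disagreement
}

func main() {
	r := newRing(1024)
	// 312 requests, 19 of them overridden, 4 of those disagreeing with auto.
	for i := 0; i < 312; i++ {
		r.Add(record{Overridden: i < 19, Disagreed: i < 4})
	}
	acc, dis := r.Metrics()
	fmt.Printf("%.2f %.2f\n", acc, dis)
}
```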

Operator surface

Config (.fizeau/config.yaml)

The routing: block exposes:

Key                  Effect
default_model        Default model-route key when caller passes neither --model nor --provider.
health_cooldown      How long a failed candidate is deprioritized (default 60s).
history_window       Lookback for scoring healthy candidates.
probe_timeout        Timeout for provider availability/model probes.
reliability_weight   Score weight for recent success rate.
performance_weight   Score weight for observed latency/throughput.
load_weight          Score weight for recent selection volume (load-balancing).
cost_weight          Score weight for known cost.
capability_weight    Score weight for benchmark capability (swe_bench_verified).

Per-provider: daily_token_budget arms the predictive burn-rate tracker for that provider.
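One plausible reading of the *_weight keys is a weighted sum over per-candidate signals. The sketch below assumes exactly that; the Weights and Signals types, the score function, and the convention that every signal is normalized to [0, 1] with higher meaning better (so cheaper and less-loaded candidates get larger Cost and Load values) are all assumptions, not the engine's documented formula.

```go
package main

import "fmt"

// Weights mirrors the *_weight keys of the routing: block.
type Weights struct {
	Reliability, Performance, Load, Cost, Capability float64
}

// Signals holds per-candidate inputs, each assumed to be normalized
// to [0, 1] with higher meaning better.
type Signals struct {
	Reliability, Performance, Load, Cost, Capability float64
}

// score is a plain weighted sum (assumed combination rule).
func score(w Weights, s Signals) float64 {
	return w.Reliability*s.Reliability +
		w.Performance*s.Performance +
		w.Load*s.Load +
		w.Cost*s.Cost +
		w.Capability*s.Capability
}

func main() {
	w := Weights{Reliability: 1.0, Performance: 0.5, Capability: 0.5}
	steady := Signals{Reliability: 0.95, Performance: 0.4, Capability: 0.6}
	flaky := Signals{Reliability: 0.60, Performance: 0.9, Capability: 0.7}
	fmt.Printf("steady=%.3f flaky=%.3f\n", score(w, steady), score(w, flaky))
}
```

Under these weights a reliable but slower candidate outranks a faster, flakier one, which is the kind of trade-off the weights are meant to let operators tune.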

Env-var overrides

FIZEAU_PROVIDER, FIZEAU_BASE_URL, FIZEAU_API_KEY, and FIZEAU_MODEL, plus the sampling pin overrides FIZEAU_TEMPERATURE / FIZEAU_TOP_P / FIZEAU_TOP_K / FIZEAU_MIN_P, override config-file values for a single process. The bench harness uses these to inject per-trial samplers without editing config.yaml.

CLI

The operator surface for routing lives in four subcommands; see the auto-generated CLI reference:

  • fiz route-status — live cooldowns, per-candidate health, last decisions, plus routing-quality metrics (--overrides for the override-class pivot).
  • fiz providers — every configured provider and its current quota state.
  • fiz check — probe one or all providers (forces a MarkAvailable on success).
  • fiz models — list discovered models with routing metadata.

Examples

Configure two providers with a daily budget on the cloud one:

# .fizeau/config.yaml
providers:
  local:
    type: lmstudio
    base_url: http://127.0.0.1:1234/v1
  cloud:
    type: openrouter
    api_key: ${OPENROUTER_API_KEY}
    daily_token_budget: 2000000
routing:
  default_model: qwen3-coder-30b
  health_cooldown: 90s
  reliability_weight: 1.0
  performance_weight: 0.5
  capability_weight: 0.5

Inspect live state:

$ fiz route-status --json | jq '.routing_quality'
{
  "auto_acceptance_rate": 0.94,
  "override_disagreement_rate": 0.21,
  "total_requests": 312,
  "total_overrides": 19
}

A 0.94 acceptance rate means 293 of the 312 requests in the recent window ran without an override, so humans accepted the auto choice 94% of the time. The 0.21 disagreement rate says only about 4 of the 19 overrides actually changed the outcome; the other ~80% were redundant, because the human pinned what auto would have picked.

Where to look next