Auto-routing
What it solves
A single fiz run request usually under-specifies the route. The caller
asks for a policy (cheap / default / smart / air-gapped) or pins one
axis (--model, --provider, --harness) and expects the runtime to
fill in the rest, skip providers that just timed out, exclude providers
that are quota-exhausted, and reuse a previously-good choice when a
correlation key is present. That’s auto-routing: the engine that
collapses an under-specified request into a concrete
(harness, provider, endpoint, model) decision against live signals.
The engine is model-first: it ranks concrete models against the
caller’s policy/power bounds and capability requirements, then picks the
best provider that can serve the chosen model. Provider preference
(local-first, subscription-first) is a tiebreaker, not a primary
axis. See ADR-006 for the rationale.
Public surface
Resolve one request
The single public entry point is FizeauService.ResolveRoute, returning a
RouteDecision with the chosen Harness, Provider, Endpoint,
ServerInstance, and Model, plus the full ranked Candidates trace
(including rejected candidates and their typed
FilterReason).
Internally it delegates to
internal/routing.Resolve,
the single ranking engine; everything else (cooldowns, lease reuse,
escalation) is plumbing around it.
Failure modes
The engine returns typed errors so callers can branch precisely:
ErrUnknownProvider,ErrUnknownPolicy,ErrHarnessModelIncompatible— configuration mistakes the operator must fix.ErrModelConstraintAmbiguous/ErrModelConstraintNoMatch— the--modelpin matched zero or many concrete models.ErrNoLiveProvider— the entire policy ladder (cheap → default → smart) lacked any live provider that supports the prompt size + tool requirement.*NoViableProviderForNow— every otherwise-eligible candidate is currently quota-exhausted. Carries aRetryAfterso DDx-style supervisors pause work rather than treating the request as a permanent failure.
Quota state machine
ProviderQuotaStateStore
tracks each provider as available or quota_exhausted with a
RetryAfter instant. Transitions:
available --MarkQuotaExhausted--> quota_exhausted
quota_exhausted --MarkAvailable----> available
quota_exhausted --(now >= retry_after)--> available // auto-decayProviderBurnRateTracker
maintains a per-provider rolling daily-token window and predictively
transitions a provider to quota_exhausted before the upstream quota
error fires, when a daily_token_budget is configured. This turns
observed token usage (see Performance tracking)
into routing pressure.
Per-attempt feedback
After every dispatch, the public RecordRouteAttempt method records the
outcome into
internal/routehealth.Store.
Failed attempts cool down the (provider, model, endpoint) tuple for
routing.health_cooldown (default 60s) so the next ResolveRoute skips
it. This makes auto-routing adaptive rather than purely
configuration-driven.
Routing-quality ring
internal/routingquality.Store
is a 1024-entry in-memory ring of recent Execute calls and their
overrides. The root facade projects that store onto RoutingQualityMetrics,
which exposes three first-class numbers (ADR-006 §5):
- AutoAcceptanceRate — fraction of requests with no override. The headline routing-health number.
- OverrideDisagreementRate — fraction of overrides where the user pin actually differed from auto’s choice on the overridden axis.
- OverrideClassBreakdown — pivot of (prompt-feature bucket, axis, match) so operators can see which requests humans keep overriding.
Operator surface
Config (.fizeau/config.yaml)
The
routing: block
exposes:
| Key | Effect |
|---|---|
default_model | Default model-route key when caller passes neither --model nor --provider. |
health_cooldown | How long a failed candidate is deprioritized (default 60s). |
history_window | Lookback for scoring healthy candidates. |
probe_timeout | Timeout for provider availability/model probes. |
reliability_weight | Score weight for recent success rate. |
performance_weight | Score weight for observed latency/throughput. |
load_weight | Score weight for recent selection volume (load-balancing). |
cost_weight | Score weight for known cost. |
capability_weight | Score weight for benchmark capability (swe_bench_verified). |
Per-provider:
daily_token_budget
arms the predictive burn-rate tracker for that provider.
Env-var overrides
FIZEAU_PROVIDER, FIZEAU_BASE_URL, FIZEAU_API_KEY, FIZEAU_MODEL,
plus the sampling pin overrides
FIZEAU_TEMPERATURE / FIZEAU_TOP_P / FIZEAU_TOP_K / FIZEAU_MIN_P
(source)
override config-file values for one process. The bench harness uses
these to inject per-trial samplers without editing config.yaml.
CLI
The operator surface for routing lives in three subcommands — see the auto-generated CLI reference:
fiz route-status— live cooldowns, per-candidate health, last decisions, plus routing-quality metrics (--overridesfor the override-class pivot).fiz providers— every configured provider and its current quota state.fiz check— probe one or all providers (forces aMarkAvailableon success).fiz models— list discovered models with routing metadata.
Examples
Configure two providers with a daily budget on the cloud one:
# .fizeau/config.yaml
providers:
local:
type: lmstudio
base_url: http://127.0.0.1:1234/v1
cloud:
type: openrouter
api_key: ${OPENROUTER_API_KEY}
daily_token_budget: 2000000
routing:
default_model: qwen3-coder-30b
health_cooldown: 90s
reliability_weight: 1.0
performance_weight: 0.5
capability_weight: 0.5Inspect live state:
$ fiz route-status --json | jq '.routing_quality'
{
"auto_acceptance_rate": 0.94,
"override_disagreement_rate": 0.21,
"total_requests": 312,
"total_overrides": 19
}A 0.94 acceptance rate over the recent window means humans accepted the auto choice 94% of the time. The 0.21 disagreement rate inside the overrides says ~80% of those overrides were redundant — the human pinned what auto would have picked.
Where to look next
- Source of truth:
AGENTS.mdpackage layout § Routing, quota, and modeling. - Engine internals:
internal/routing/engine.go,internal/routing/score.go,internal/routing/gating.go. - ADRs: ADR-006 (routing-quality is the primary user surface), ADR-007 (sampling resolution chain).
- Sibling page: Performance tracking — the per-turn signals that auto-routing reads back.