Performance tracking

What it solves

Fizeau treats round-trip timing as a first-class output. Every turn emits a structured record — request, response, tool call, session start and end — with token counts, latency, cost, and (for streaming providers) per-chunk delta timing. Downstream code reads these records to compute throughput, attribute cost, and feed signals back into auto-routing so the next decision incorporates the last one’s outcome.

The point is not “we collect metrics.” The point is that the measurement chain is the public surface — public events, public projections, an OpenTelemetry semantic-convention span — so embedders never need to parse harness-native streams or raw JSONL to learn what happened.

Public surface

Per-turn JSONL session log

The session logger (internal/session/logger.go) writes one JSONL line per event into <SessionLogDir>/<session-id>.jsonl. Event payloads are stable, versioned types defined in internal/session/event.go:

| Event | Fields that matter for performance |
| --- | --- |
| session.start | selected_provider, selected_endpoint, resolved_model, sticky.*, utilization.* |
| llm.request | temperature, top_p, top_k, min_p, seed, cache_policy, sampling_source |
| llm.response | usage.{input,output,total}, cost_usd, latency_ms, finish_reason |
| tool.call | duration_ms, output, error |
| session.end | aggregated tokens, cost_usd, duration_ms, full route/sticky/utilization snapshot |

The lifecycle wrapper serviceSessionLog writes session.end exactly once even when the run fails partway through.
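
For embedders who want to consume the log directly, the following is a minimal sketch of folding llm.response events into per-session totals. It assumes each JSONL line carries a `type` and a `data` object, as the `fiz log --json` example on this page suggests; the on-disk envelope may differ, and the helper name `summarize_responses` is hypothetical.

```python
import json


def summarize_responses(lines):
    """Sum tokens, cost, and latency over llm.response events.

    Assumption: every line is a JSON object shaped like
    {"type": "...", "data": {...}}; this is inferred, not confirmed.
    """
    totals = {
        "responses": 0,
        "input_tokens": 0,
        "output_tokens": 0,
        "cost_usd": 0.0,
        "latency_ms": 0,
    }
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("type") != "llm.response":
            continue  # skip session.start, llm.request, tool.call, ...
        data = event.get("data", {})
        usage = data.get("usage", {})
        totals["responses"] += 1
        totals["input_tokens"] += usage.get("input", 0)
        totals["output_tokens"] += usage.get("output", 0)
        totals["cost_usd"] += data.get("cost_usd") or 0.0  # cost may be absent
        totals["latency_ms"] += data.get("latency_ms", 0)
    return totals


# Illustrative records only; the values mirror the jq example on this page.
sample = [
    '{"type":"session.start","data":{"selected_provider":"example"}}',
    '{"type":"llm.response","data":{"usage":{"input":1240,"output":312},"cost_usd":0.0019,"latency_ms":847}}',
    '{"type":"llm.response","data":{"usage":{"input":1583,"output":89},"cost_usd":0.0012,"latency_ms":621}}',
]
summary = summarize_responses(sample)
print(summary)
```

In practice the same function could be handed an open file object for a session's .jsonl file, since it only iterates over lines.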

Aggregated projection

The service-owned projection UsageReport folds historical session logs into per-(provider, model) rows. Rows expose computed accessors so callers don’t re-derive them:

  • SuccessRate() — per-provider reliability rate
  • CostPerSuccess() — known cost ÷ successful sessions
  • InputTokensPerSecond() / OutputTokensPerSecond() — throughput
  • CacheHitRate() — cached-input / total-input fraction

The report also carries a RoutingQualityMetrics block (auto-acceptance rate, override-class breakdown), so a single UsageReport covers both what happened and how the routing performed over the same window.
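
The row accessors listed above amount to a handful of ratios. The sketch below restates them in Python to make the definitions concrete; it is not the service's implementation. The fields `sessions`, `success_sessions`, and `known_cost_usd` follow the `fiz usage --json` example on this page, while the remaining field names (including `generation_seconds`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageRow:
    """Hypothetical stand-in for one per-(provider, model) row."""
    sessions: int
    success_sessions: int
    known_cost_usd: float
    input_tokens: int
    cached_input_tokens: int   # assumed field name
    output_tokens: int
    generation_seconds: float  # assumed field name

    def success_rate(self) -> Optional[float]:
        # Successful sessions divided by all sessions.
        return self.success_sessions / self.sessions if self.sessions else None

    def cost_per_success(self) -> Optional[float]:
        # Known cost divided by successful sessions.
        if not self.success_sessions:
            return None
        return self.known_cost_usd / self.success_sessions

    def output_tokens_per_second(self) -> Optional[float]:
        # Output throughput over the time spent generating.
        if self.generation_seconds <= 0:
            return None
        return self.output_tokens / self.generation_seconds

    def cache_hit_rate(self) -> Optional[float]:
        # Cached input tokens as a fraction of all input tokens.
        if not self.input_tokens:
            return None
        return self.cached_input_tokens / self.input_tokens


row = UsageRow(
    sessions=40,
    success_sessions=36,
    known_cost_usd=1.8,
    input_tokens=200_000,
    cached_input_tokens=50_000,
    output_tokens=30_000,
    generation_seconds=600.0,
)
print(row.success_rate(), row.cost_per_success(),
      row.output_tokens_per_second(), row.cache_hit_rate())
```

Returning `None` for an empty denominator is a choice made for this sketch; the real accessors may report zero or use another convention.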

OpenTelemetry surface

When telemetry: is configured, every chat call produces an invoke_agent / chat / execute_tool span tagged with stable semantic-convention keys:

  • Standard GenAI keys (gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, gen_ai.request.model, gen_ai.response.model, …).
  • DDX timing keys (telemetry.go): ddx.timing.first_token_ms (TTFT), ddx.timing.queue_ms, ddx.timing.prefill_ms, ddx.timing.generation_ms, ddx.timing.cache_read_ms, ddx.timing.cache_write_ms.
  • DDX cost keys (telemetry.go): ddx.cost.amount, ddx.cost.input_amount, ddx.cost.output_amount, ddx.cost.cache_read_amount, ddx.cost.cache_write_amount, ddx.cost.pricing_ref, ddx.cost.source.

Feedback into routing

Token counts also flow through the service-owned ProviderBurnRateTracker, which wraps the internal quota burn-rate implementation used by routing. When projected end-of-day usage exceeds the configured daily_token_budget, the tracker pre-emptively transitions the provider to quota_exhausted — without waiting for the upstream 429. This loop makes performance tracking actionable rather than purely diagnostic.
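
To illustrate the idea of predictive exhaustion, here is a minimal sketch that assumes a simple linear extrapolation of the day's usage so far. The actual ProviderBurnRateTracker may use a different projection (for example, a sliding window), and the function names and the `available` state label below are hypothetical; only `quota_exhausted` and `daily_token_budget` come from the text above.

```python
SECONDS_PER_DAY = 86_400


def projected_end_of_day_tokens(tokens_used: int, seconds_elapsed: float) -> float:
    """Extrapolate today's usage linearly to the end of the day.

    Assumption: the burn rate observed so far holds for the rest of the day.
    """
    if seconds_elapsed <= 0:
        return float(tokens_used)  # nothing to extrapolate from yet
    return tokens_used * SECONDS_PER_DAY / seconds_elapsed


def provider_state(tokens_used: int, seconds_elapsed: float,
                   daily_token_budget: int) -> str:
    """Return 'quota_exhausted' when the projection exceeds the budget."""
    projected = projected_end_of_day_tokens(tokens_used, seconds_elapsed)
    if projected > daily_token_budget:
        return "quota_exhausted"
    return "available"  # hypothetical label for the non-exhausted state


# 600k tokens consumed by 06:00 projects to 2.4M by midnight.
six_hours = 6 * 3600
projection = projected_end_of_day_tokens(600_000, six_hours)
state_over = provider_state(600_000, six_hours, daily_token_budget=2_000_000)
state_under = provider_state(400_000, six_hours, daily_token_budget=2_000_000)
print(projection, state_over, state_under)
```

The point of the sketch is the ordering of events: the provider is marked exhausted as soon as the projection crosses the budget, before any upstream 429 is observed.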

Operator surface

Config (.fizeau/config.yaml)

| Key | Effect |
| --- | --- |
| session_log_dir | Where per-session JSONL is written. Defaults to .fizeau/sessions/. |
| telemetry.enabled | Toggles OTel span emission. |
| telemetry.pricing.* | Per-(provider, model) pricing for cost attribution when the provider doesn’t return cost. |
| providers.<name>.daily_token_budget | Arms predictive burn-rate exhaustion for that provider. |
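
The pricing fallback can be pictured as follows. This is a sketch under stated assumptions, not the shipped logic: it assumes prices are expressed in USD per million tokens, that a provider-reported cost always wins over configured pricing, and that the `source` labels shown are hypothetical. The `Pricing` type and `attribute_cost` helper are invented for illustration; the output keys loosely mirror the ddx.cost.* attributes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-(provider, model) price entry, USD per million tokens."""
    input_usd_per_mtok: float
    output_usd_per_mtok: float


def attribute_cost(reported_cost_usd: Optional[float], input_tokens: int,
                   output_tokens: int, pricing: Pricing) -> dict:
    """Prefer the provider's own cost; otherwise derive it from pricing."""
    if reported_cost_usd is not None:
        return {"amount": reported_cost_usd, "source": "provider"}
    input_amount = input_tokens * pricing.input_usd_per_mtok / 1_000_000
    output_amount = output_tokens * pricing.output_usd_per_mtok / 1_000_000
    return {
        "amount": input_amount + output_amount,
        "input_amount": input_amount,
        "output_amount": output_amount,
        "source": "configured_pricing",  # hypothetical label
    }


# Illustrative prices only; they are not real provider rates.
pricing = Pricing(input_usd_per_mtok=1.00, output_usd_per_mtok=2.00)
derived = attribute_cost(None, input_tokens=1240, output_tokens=312, pricing=pricing)
reported = attribute_cost(0.0019, input_tokens=1240, output_tokens=312, pricing=pricing)
print(derived)
print(reported)
```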

CLI

  • fiz log [session-id]: pretty-print one session log, or list recent sessions when no id is given.
  • fiz replay <session-id>: re-render the public event stream from a session log.
  • fiz usage: print the UsageReport projection over a window (--since today, --since 7d, …), including the routing_quality block.

Examples

Extract per-response latency, token counts, and cost from a recent session as NDJSON:

$ fiz log 2026-05-09T14-32-08Z --json | \
    jq -c 'select(.type=="llm.response") | {latency_ms: .data.latency_ms, in: .data.usage.input, out: .data.usage.output, cost: .data.cost_usd}'
{"latency_ms":847,"in":1240,"out":312,"cost":0.0019}
{"latency_ms":621,"in":1583,"out":89,"cost":0.0012}

7-day report with provider reliability and cost per success:

$ fiz usage --since 7d --json | \
    jq '.rows[] | {provider, model, sessions, success_rate: (.success_sessions / .sessions), cost_per_success: (if .success_sessions > 0 then .known_cost_usd / .success_sessions else null end)}'

Where to look next