Embeddable Go agent runtime

Fizeau

An agentic-development runtime with its own measured loop. Other tools build on Fizeau instead of writing their own agent loop: we own the harness, sampling, performance instrumentation, cost tracking, and subscription accounting so they don't have to. Local-model-first via vLLM, MLX, LM Studio, and Ollama, with cloud providers when you want them.


LATEST · Qwen3.6-27B / OpenRouter · TB-2.1 · 2026-05-13 15:38:48 UTC

PASS @ k

61.8%

55 of 89 tasks · k = 5 reps

DECODE

46.8

tokens / second · p50 / turn

TTFT

0.89s

first-token latency · p50 / turn

Why fizeau exists

Fizeau exists for three reasons that build on each other:

Facilitate agentic development. A reusable, embeddable agent loop with the right primitives — tool-calling, planning, compaction, retries, session logging — so building tools doesn't mean re-implementing the loop every time.
Make agentic work measurable. Per-turn timing, prefill vs decode breakdown, cost-per-trial, subscription-quota accounting — first-class outputs, not bolted-on observability. You can't improve prompts, agents, or providers you can't measure.
Make local models a real option. Local serving (vLLM, MLX, LM Studio, Ollama) sits on the same provider surface as cloud frontier models, and the benchmarks compare them honestly. Self-hosting at the right quantization is often cheaper and sometimes faster, though rarely the right answer for every workload; because the data is on the table, you can choose per workload.

What it is

Fizeau is an agent runtime with a built-in agent loop (the fiz harness): it manages the prompt, tool-call protocol, file/edit/bash tooling, planning, compaction, retries, sampling, reasoning, quotas, and session log. It is not an LLM serving runtime — it does not host weights. Upstream model traffic goes to whatever provider the profile points at (OpenAI, Anthropic, OpenRouter, vLLM, oMLX, RapidMLX, native local).

Fizeau can also run as a wrapper around a different agent CLI (Claude Code, Codex, Pi, OpenCode). The fiz-harness-* profiles in the profile catalog use this mode: fiz handles configuration, environment, tool-call accounting, and session logging, and delegates the reasoning loop to the wrapped agent. Running the same model under a different harness profile separates "is the agent loop hurting?" from "is the model hurting?".

For benchmark purposes, profiles in the catalog are designed to be compared in pairs that hold one axis constant and vary the other. A delta between two profiles that share a model but differ in harness is harness loss; a delta between two profiles that share a harness but differ in provider is provider/runtime loss.
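The attribution rule above can be sketched in a few lines of Go. This is a minimal illustration, not Fizeau's own code: the Profile struct, the Delta function, the profile names, and the pass rates are all hypothetical. The sketch also assumes that a provider comparison only makes sense when the model is the same on both sides, which the text implies but does not state.

```go
package main

import "fmt"

// Profile is a hypothetical summary of one benchmark profile.
// None of these field names come from Fizeau itself.
type Profile struct {
	Name     string
	Model    string
	Harness  string
	Provider string
	PassRate float64 // fraction of tasks passed, 0..1
}

// Delta returns the pass-rate difference a minus b and says which axis it
// can be attributed to. A difference is only attributable when exactly one
// of harness or provider differs and the model is shared.
func Delta(a, b Profile) (kind string, diff float64) {
	diff = a.PassRate - b.PassRate
	sameHarness := a.Harness == b.Harness
	sameProvider := a.Provider == b.Provider
	switch {
	case a.Model != b.Model:
		return "confounded", diff // assumption: cross-model deltas are not attributed
	case sameHarness && sameProvider:
		return "none", diff
	case !sameHarness && sameProvider:
		return "harness", diff
	case sameHarness && !sameProvider:
		return "provider", diff
	default:
		return "confounded", diff // both axes moved at once
	}
}

func main() {
	// Illustrative numbers only, not measured results.
	builtin := Profile{Name: "builtin", Model: "m", Harness: "fiz", Provider: "cloud", PassRate: 0.50}
	wrapped := Profile{Name: "wrapped", Model: "m", Harness: "other-cli", Provider: "cloud", PassRate: 0.25}
	local := Profile{Name: "local", Model: "m", Harness: "fiz", Provider: "local", PassRate: 0.25}

	kind, diff := Delta(builtin, wrapped)
	fmt.Println(kind, diff)
	kind, diff = Delta(builtin, local)
	fmt.Println(kind, diff)
	kind, diff = Delta(wrapped, local)
	fmt.Println(kind, diff)
}
```

The third comparison changes both harness and provider, so the sketch refuses to attribute it to either axis.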

Built for instrumented agent work

Every surface assumes you want to know what the medium is doing. There is no separate observability layer — the runtime emits structured per-turn timing as a first-class output.

RUNTIME

Built-in agent loop

Tool-calling LLM loop with built-in read, write, edit, bash, find, grep, ls, patch, and task tools. Compaction, retry, sampler, reasoning, and quotas are all wired through one provider-shaped surface.
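For readers new to the pattern, a tool-calling loop in general looks roughly like the sketch below. It is a generic illustration, not the fiz harness: Step, Model, Tool, RunLoop, and the scripted model are hypothetical names, and real loops add compaction, retries, and quota checks that are omitted here.

```go
package main

import (
	"errors"
	"fmt"
)

// Step is what a hypothetical model returns each turn:
// either a tool call (Tool set) or a final answer (Tool empty).
type Step struct {
	Tool   string
	Arg    string
	Answer string
}

// Model maps the transcript so far to the next step.
type Model func(transcript []string) Step

// Tool runs one tool call and returns its output.
type Tool func(arg string) (string, error)

// ErrTurnLimit is returned when the model never produces a final answer.
var ErrTurnLimit = errors.New("turn limit reached")

// RunLoop alternates between the model and its tools until the model
// answers or maxTurns is exhausted. Tool errors are fed back to the model
// as transcript entries instead of aborting the loop.
func RunLoop(m Model, tools map[string]Tool, prompt string, maxTurns int) (string, []string, error) {
	transcript := []string{"user: " + prompt}
	for turn := 0; turn < maxTurns; turn++ {
		step := m(transcript)
		if step.Tool == "" {
			return step.Answer, transcript, nil
		}
		tool, ok := tools[step.Tool]
		if !ok {
			transcript = append(transcript, "error: unknown tool "+step.Tool)
			continue
		}
		out, err := tool(step.Arg)
		if err != nil {
			transcript = append(transcript, "error: "+err.Error())
			continue
		}
		transcript = append(transcript, step.Tool+": "+out)
	}
	return "", transcript, ErrTurnLimit
}

// scripted is a stand-in model: it calls ls once, then answers.
func scripted(transcript []string) Step {
	if len(transcript) == 1 {
		return Step{Tool: "ls", Arg: "."}
	}
	return Step{Answer: "saw " + transcript[len(transcript)-1]}
}

// demoTools returns a fake ls tool that does not touch the filesystem.
func demoTools() map[string]Tool {
	return map[string]Tool{
		"ls": func(arg string) (string, error) { return "main.go", nil },
	}
}

func main() {
	answer, transcript, err := RunLoop(scripted, demoTools(), "list files", 4)
	fmt.Println(answer, len(transcript), err)
}
```

Feeding tool errors back into the transcript rather than returning them is one common design choice; it lets the model recover from a bad call at the cost of spending a turn.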

PROVIDERS

One surface, many backends

OpenAI, Anthropic, OpenRouter, vLLM, oMLX, RapidMLX, native local. Profile definitions are YAML; benchmark deltas reflect provider/runtime, not harness drift.

MEASUREMENT

TTFT, decode, prefill — per turn

Every llm.request → llm.delta → llm.response chain is timed and recorded. Every turn is captured: no telemetry sampling, no aggregation loss. Bucket by context length; attribute wall-time to prefill vs generation.

HARNESS-AS-WRAPPER

Wrap Claude Code, Codex, Pi, OpenCode

fiz-harness-* profiles route through fiz as a measurement wrapper around another agent CLI. Holds the model constant; varies the harness; isolates "is the loop hurting?" from "is the model hurting?"

SESSIONS

JSONL session logs, replayable

Every turn, every tool call, every cost figure on disk in line-delimited JSON. fiz log to list, fiz replay to render. Replays drive the per-turn timing analysis behind every chart on this site.

EMBEDDABLE

Go library, no subprocess overhead

fizeau.New(...).Execute(ctx, request). Lives inside a build orchestrator (DDx) or any Go service that needs a tool-using model on its critical path.