ADR-012: Per-Source On-Disk Cache for Discovery + Runtime Signals
| Date | Status | Deciders | Related | Confidence |
|---|---|---|---|---|
| 2026-05-11 | Accepted | Fizeau maintainers | ADR-009, ADR-010 | High |
Context
The fiz models command (bead EPIC) must produce an available-models
snapshot that combines two tiers of information:
- Discovery signals: whether a source exposes a model at all (provider /v1/models endpoint, harness PTY enumeration, /props introspection). Slow to collect: PTY enumeration takes ~30 s, and OpenRouter's (OR) /api/v1/models takes ~316 ms p50 over the open internet.
- Runtime signals: quota availability, rate-limit headroom, observed latency. These change frequently (on the order of 5 min) but do not affect whether a model exists.
Three requirements conflict without a cache:
- UI latency ≤ 100 ms. fiz models must return stale data immediately rather than block on discovery IO.
- Multi-process safety. ddx work, ddx try, fiz models, and the routing layer all run concurrently and must share cache state without corruption or thundering-herd re-discovery.
- Crash safety. A killed refresh process must not leave the cache in a corrupt or permanently locked state.
Fizeau already has a battle-tested file-locking idiom in
cmd/bench/matrix.go:acquireMatrixLock (lines 1222–1259): atomic
O_CREATE|O_EXCL create, JSON {PID, StartedAt} payload, crash
recovery via syscall.Kill(pid, 0), cleanup-on-exit closure. The
harness quota caches (internal/harnesses/claude/quota_cache.go,
internal/harnesses/codex/quota_cache.go,
internal/harnesses/gemini/quota_cache.go) demonstrate the
atomic-write contract: write to .tmp, os.Rename to final path,
chmod 0o600.
This ADR extends both idioms with a two-tier pattern (a brief lock plus a long-lived refresh marker) to handle refresh IO that can take ~30 s (PTY enumeration), far longer than a brief mutex is designed to be held.
Decision
1. File layout
~/.cache/fizeau/  (on Linux; os.UserCacheDir() + "/fizeau")
├── discovery/                           # slow, stable; one file per source
│   ├── openrouter.json                  # 24 h TTL
│   ├── claude-subscription.json         # 24 h TTL
│   ├── codex-subscription.json          # 24 h TTL
│   ├── vidar-ds4.json                   # 1 h TTL (LAN, local)
│   └── sindri-club-3090-llamacpp.json   # 1 h TTL (LAN, local)
└── runtime/                             # hot, volatile; one file per source
    ├── openrouter.json                  # 5 min TTL
    ├── claude-subscription.json         # 5 min TTL
    └── ...
Source names are kebab-case slugs derived from the provider entry in
fizeau’s config (the same identifier the routing layer uses). Each
data file is JSON. Top-level keys are fizeau’s canonical model
identities: <provider>/<model_id> (e.g.
openrouter/anthropic/claude-opus-4-5). Alongside each data file,
the cache manager may write two sidecar files:
- <source>.lock: brief mutation lock (held for microseconds).
- <source>.refreshing: long-lived refresh marker (held for the duration of the IO, up to RefreshDeadline).
2. TTL defaults
All values are operator-configurable via environment variables or the fizeau config file. The env-var names are the canonical override mechanism; config-file keys mirror them.
| Signal tier | Source type | Default TTL | Env override |
|---|---|---|---|
| Discovery | PTY enumeration | 24 h | FIZ_TTL_DISCOVERY_PTY |
| Discovery | HTTP /v1/models (remote) | 24 h | FIZ_TTL_DISCOVERY_HTTP_REMOTE |
| Discovery | HTTP /v1/models (LAN) | 1 h | FIZ_TTL_DISCOVERY_HTTP_LOCAL |
| Runtime | Any | 5 min | FIZ_TTL_RUNTIME |
Lock and deadline constants:
| Constant | Default | Env override |
|---|---|---|
| Lock acquisition timeout | 100 ms | FIZ_LOCK_TIMEOUT |
| Refresh deadline — PTY | 60 s | FIZ_REFRESH_DEADLINE_PTY |
| Refresh deadline — HTTP discovery | 10 s | FIZ_REFRESH_DEADLINE_HTTP |
| Refresh deadline — runtime | 5 s | FIZ_REFRESH_DEADLINE_RUNTIME |
| Marker staleness threshold | 2 × refresh deadline | derived |
Rationale: PTY is the slowest path (~30 s). 60 s is conservative, though a cold harness boot can still come close to it. HTTP remote (OR) is ~316 ms p50 measured, so 10 s gives roughly 30× headroom. LAN endpoints respond in < 15 ms, and their 1 h TTL trades freshness against SSH-tunnel startup cost. Runtime signals change on a ~5 min order (quota windows, rate-limit headers), which the 5 min TTL matches.
3. Lock + marker pattern (two-tier)
Tier 1 — <source>.lock (brief mutation lock):
Identical to acquireMatrixLock semantics. Held only during state
transitions: “claim refresh slot” and “commit refresh result”. Held
for microseconds. Uses O_CREATE|O_EXCL atomic create. Crash
recovery: if the owning PID is dead (syscall.Kill(pid, 0) returns
ESRCH), the lock is removed and the caller retries once.
JSON payload: {"pid": <int>, "started_at": "<RFC3339>"}.
Tier 2 — <source>.refreshing (long-lived refresh marker):
Written while the tier-1 lock is held, as the final step of claiming the refresh slot, and kept for the duration of the IO. Other processes inspect this marker to decide whether to wait or return stale data immediately. It is removed (under the Tier 1 lock) after the data file is atomically committed.
JSON payload:
{
  "pid": <int>,
  "started_at": "<RFC3339>",
  "deadline": "<RFC3339>"
}
A marker is considered stale (orphan) when either:
- now > deadline + staleness_threshold (2 × refresh deadline), OR
- syscall.Kill(pid, 0) returns ESRCH (process dead).
Stale markers are overridden by the next caller during claim_refresh
(the marker is removed under the Tier 1 lock before a new one is
written).
4. Algorithms (pseudocode)
Algorithm 1: claim_refresh(source) → ClaimedByMe | AlreadyInFlight(marker)
func claim_refresh(source):
    acquire tier-1 lock (timeout=LockAcquisitionTimeout):
        on timeout: return error "lock contention"
    existing_marker = read_marker_if_exists(source)
    if existing_marker != nil:
        if is_stale(existing_marker):
            remove(source.refreshing)   // orphan cleanup
        else:
            release tier-1 lock
            return AlreadyInFlight(existing_marker)
    // Write the marker before releasing the lock so no other
    // process can claim the slot between lock release and marker
    // write.
    write_marker(source, pid=self, started_at=now,
                 deadline=now+RefreshDeadline(source))
    release tier-1 lock
    return ClaimedByMe
Algorithm 2: refresh_and_commit(source)
func refresh_and_commit(source):
    claim = claim_refresh(source)
    if claim == AlreadyInFlight:
        return wait_for_refresh(source, claim.marker,
                                max_wait=RefreshDeadline(source))
    // Perform slow IO outside any lock.
    data = fetch_source(source)          // PTY, HTTP, etc.
    // Stage the write (mirrors harness quota_cache.go pattern).
    tmp = source.data_path + ".tmp"
    write_json(tmp, data)
    chmod(tmp, 0o600)
    // Commit under the tier-1 lock ("commit refresh result").
    // Guard: if our deadline passed and another process claimed the
    // slot, reject our late result and leave the new marker alone.
    acquire tier-1 lock (timeout=LockAcquisitionTimeout):
    current_marker = read_marker_if_exists(source)
    if current_marker != nil && current_marker.pid == self:
        os.Rename(tmp, source.data_path)   // atomic on POSIX
        remove(source.refreshing)
    else:
        remove(tmp)                        // late commit rejected
    release tier-1 lock
Algorithm 3: wait_for_refresh(source, marker, max_wait)
func wait_for_refresh(source, marker, max_wait):
    // Compute the bound once. Re-evaluating now() + max_wait on every
    // iteration would never expire.
    wait_until = min(marker.deadline + staleness_threshold,
                     now() + max_wait)
    poll_interval = 250ms
    for now() < wait_until:
        sleep(poll_interval)
        current = read_marker_if_exists(source)
        if current == nil:
            return read(source)   // refresh completed
        if is_stale(current):
            return read(source)   // orphan; return whatever is on disk
    // Timed out waiting; return stale data without error.
    return read(source)
Algorithm 4: read(source) → (data, fresh_bool)
func read(source):
    if not exists(source.data_path):
        return (nil, false)
    data = read_json(source.data_path)   // no lock; file is immutable
                                         // until atomic rename
    fresh = (now() - data.captured_at) < TTL(source)
    return (data, fresh)
    // NB: stale data is always returned without error. The caller
    // decides whether to trigger a background refresh.
    // Torn reads are impossible: rename is atomic on POSIX; a
    // reader either sees the old complete file or the new complete
    // file, never a partial write.
Algorithm 5: force_refresh(source)
func force_refresh(source):
    // Synchronous; bypasses the TTL check. ensure_fresh (Algorithm 6)
    // runs refresh_and_commit under singleflight. refresh_and_commit
    // claims the slot itself or, if another process already holds it,
    // waits up to RefreshDeadline for that refresh to finish.
    // force_refresh must not call claim_refresh first: its own marker
    // would make refresh_and_commit see AlreadyInFlight and wait on
    // itself until the marker went stale.
    return ensure_fresh(source)
Algorithm 6: In-process single-flight composition
// singleflightGroup is a golang.org/x/sync/singleflight.Group,
// one per cache instance (process-lifetime).
//
// This composes with file-based coordination: within a process,
// singleflight ensures at most one goroutine runs refresh_and_commit
// per source. Across processes, the file-based marker (Algorithm 1)
// provides the same guarantee.
func maybe_background_refresh(source):
    data, fresh = read(source)
    if fresh:
        return data
    // Stale or missing: trigger background refresh via
    // singleflight so concurrent callers share one goroutine.
    go singleflightGroup.Do(source.key(), func():
        refresh_and_commit(source)
    )
    return data   // return stale immediately; never block UI
func ensure_fresh(source):
    // Blocking variant used by force_refresh and tests.
    _, _, _ = singleflightGroup.Do(source.key(), func():
        refresh_and_commit(source)
    )
    data, _ = read(source)
    return data
singleflight.Group deduplicates concurrent Do calls with the
same key: if a refresh is already running, a second caller blocks
until that call returns and receives the same result. This eliminates
the in-process thundering herd without file IO overhead.
5. Crash recovery
No manual cleanup is required. Recovery happens lazily on the next
call to claim_refresh:
- Stale .lock file (dead PID): removed on the next acquisition attempt, same as acquireMatrixLock semantics.
- Stale .refreshing marker (dead PID, or past deadline + staleness threshold): removed under the tier-1 lock before a new marker is written.
- Partial data write (.tmp file left behind): safe to remove or overwrite on the next refresh_and_commit; the final os.Rename was never called, so the committed data file is intact.
6. Force-refresh semantics
force_refresh is synchronous and bypasses the TTL check. It:
- Inspects the marker. If a refresh is already in flight, waits up to RefreshDeadline for it to complete.
- If no refresh is in flight, runs one synchronously.
- Returns fresh data.
Used by fiz models --refresh and by the fiz cache refresh <source> subcommand.
7. Cache prune
fiz cache prune removes discovery and runtime data files (and
their sidecar lock/marker files) for sources not named in the
current fizeau config. The command is explicit only — it is
not called at startup. Rationale: auto-prune at startup could
silently discard a cache that another process is actively using;
the operator should decide when pruning is safe.
Safety: fiz cache prune acquires the tier-1 lock for each source
before removing its files. It skips sources that have an active
.refreshing marker.
8. Prior art: acquireMatrixLock extension
cmd/bench/matrix.go:acquireMatrixLock (lines 1222–1259) is the
direct prior art for the tier-1 .lock file:
- O_CREATE|O_EXCL atomic create.
- JSON {PID, StartedAt} payload for post-mortem inspection.
- processAlive(PID) (syscall.Kill(pid, 0)) for crash recovery.
- Single-use: acquired before a matrix cell run, released on exit.
This ADR extends that pattern with a separate tier-2
.refreshing marker so that the tier-1 lock can be released
promptly after writing the marker. This is necessary because the
IO under the marker can take 30–60 s (PTY enumeration), far
longer than a blocking lock should be held. The two-tier design
keeps the brief-lock semantics of acquireMatrixLock intact while
allowing other processes to observe in-progress refresh state.
9. Mandatory test matrix for bead M1
The implementation bead (M1) must ship all 10 tests passing.
No test may be skipped, marked as flaky, or guarded by a build tag.
Multi-process tests must use the standard TestMain +
os.Args[0] helper-process pattern (spawn the test binary itself
as a child process) to avoid requiring external binaries.
| # | Name | What it proves |
|---|---|---|
| 1 | TestConcurrentClaimTwoProcs | Two child processes race claim_refresh; exactly one returns ClaimedByMe, the other AlreadyInFlight. |
| 2 | TestReaderDuringRefresh | Atomic rename guarantees no torn read: 100 concurrent readers observe only complete, checksummed versions while a writer produces 100 versions sequentially. |
| 3 | TestCrashDuringRefresh | A child process writes a marker and is killed (SIGKILL). Next process detects dead PID in marker and claims successfully. |
| 4 | TestRefreshTimeout | Marker deadline is set to now - 1 s (expired). Next process claims the slot; the original process’s late commit is rejected (tier-1 lock check guards the marker removal). |
| 5 | TestForceRefreshWaitsAndReadsFresh | force_refresh with an in-flight marker waits until the refresh completes and returns the newly written data, not stale data. |
| 6 | TestConcurrentNormalAndForce | Concurrent read (normal) + force_refresh: normal returns stale immediately without blocking; force_refresh waits and singleflight deduplicates the goroutines. |
| 7 | TestAtomicRenameVerified | 100 concurrent readers + 1 writer producing 100 versions; every read returns a complete, consistent version (verified by checksum); no version is ever partially written. |
| 8 | TestPruneDoesNotRaceActiveSources | fiz cache prune with a source whose .refreshing marker is active; prune skips that source and does not remove any of its files. |
| 9 | TestStaleWhileRevalidate | Stale cache entry: read returns stale data immediately (≤ 5 ms); a background refresh is triggered; after the refresh completes, subsequent read returns fresh data. |
| 10 | TestPIDReuseSafety | Marker contains a PID that has since been reused by an unrelated OS process (alive check passes); deadline check is the safety net — marker is treated as stale once now > deadline + staleness_threshold. |
Consequences
Positive:
- fiz models never hangs. The read path is always cache-bounded (≤ 100 ms) because read never waits on IO: it returns stale data immediately and triggers a background refresh.
- Multi-process safe across ddx work, ddx try, fiz models, and the routing layer. Exactly one process refreshes each source at a time; others either wait (force) or return stale data (normal read).
- Crash-safe by construction. PID-alive checks and deadline checks together recover from all observed failure modes (kill -9, OOM, timeout).
- Reuses acquireMatrixLock’s proven O_CREATE|O_EXCL + PID idiom. No new cross-platform lock library is required.
- Each source has an independent lifecycle. OR re-discovery does not block PTY discovery, and a slow PTY source does not delay the fiz models response for cloud sources.
- The atomic-rename write contract (from the harness quota caches) ensures readers never observe a partial write.
Negative:
- Two sidecar files per source (.lock + .refreshing) add more structural complexity than a single in-process mutex.
- File-based coordination has inherent edge cases on network filesystems (NFS, CIFS), where O_EXCL and rename may not be atomic. The cache directory is always under os.UserCacheDir(), which is local on all supported platforms; NFS is not a target.
- PID reuse is a real (if rare) edge case: a new OS process that happens to get the same PID as the crashed refresher would defeat the PID-alive check. The deadline check is the safety net. The worst outcome is serving stale data until the marker's deadline plus the staleness threshold has passed, which is within the acceptable degradation envelope.
- The test matrix (10 tests, several multi-process) is non-trivial to maintain. The TestMain helper-process pattern requires discipline: new multi-process tests must follow the established pattern or they will not work in CI.
Out of scope
- Persistent storage of historical signals. The cache is current-state only; evicted entries are gone.
- Multi-machine cache sharing. Per-host only; distributed caches require coordination primitives beyond file locks.
- Compression. Files are plain JSON; the largest anticipated cache file (OR full model list) is ~200 KB uncompressed.
- Encryption at rest. Cache files contain no secrets; model lists and quota windows are non-sensitive operational data.
- OR sub-provider cache (deferred to bead M5). Sub-provider routing metadata warrants a separate cache tier with its own TTL strategy.
- NFS / network filesystem support. The cache directory is always local.
References
- Bead A (fizeau-b2c5c826): version-aware ranker; parses model IDs that become the canonical key <provider>/<model_id>.
- Bead D (fizeau-d18e11f5): IncludeByDefault filter; composes with AutoRoutable in the snapshot produced from the cache.
- Bead E1 (fizeau-c04be6b0): quota_pool catalog field; consumed in bead M2 enrichment, which reads from the runtime cache tier defined here.
- Bead E2 (fizeau-5b6512ef): ADR-011 cost-based routing; downstream consumer of the available-models snapshot.
- cmd/bench/matrix.go:acquireMatrixLock (lines 1222–1259): prior-art tier-1 lock idiom extended by this ADR.
- internal/harnesses/claude/quota_cache.go: atomic-rename (.tmp → final) write pattern reused for data files.
- internal/harnesses/codex/quota_cache.go: same atomic-rename pattern; TTL freshness check pattern (CapturedAt + TTL).
- internal/harnesses/gemini/quota_cache.go: per-tier routing decision pattern; model of how the snapshot layer consumes cache data.
- golang.org/x/sync/singleflight: in-process deduplication composing with file-based cross-process coordination (§4 Algorithm 6).
- ADR-009 (routing surface redesign): defines the snapshot concept that this cache backs.
- ADR-010 (reasoning wire form from catalog): L1 introspection data from /props is a candidate input to the discovery cache tier defined here.