ADR-002: PTY Cassette Transport for Harness Golden Masters
| Date | Status | Deciders | Related | Confidence |
|---|---|---|---|---|
| 2026-04-20 | Accepted, amended | Fizeau maintainers | CONTRACT-003, harness capability matrix | Medium |
Context
| Aspect | Description |
|---|---|
| Problem | Real subprocess harness support needs golden-master evidence that exercises the same PTY behavior users see, but the project has not chosen whether tmux, direct PTY supervision, or a separate terminal recorder owns that lifecycle. |
| Current State | Runtime subprocess execution uses os/exec plus harness-specific runners. Quota probes have used tmux-shaped experiments, but normal harness execution does not have one attachable PTY transport. Existing concerns require either standardizing on tmux for the whole lifecycle or owning PTY/session supervision directly. |
| Requirements | One direct PTY transport must cover metadata probe mode, record mode, replay mode, cancellation, cleanup, quota/status/model/reasoning probing, service-event capture, and deterministic cassette playback. Normal prompt execution may continue to use harness-native batch modes when they satisfy DDX requirements. Record mode must fail fast on missing binary/auth/subscription/quota instead of writing misleading fixtures. |
Current-State Research
This decision was re-reviewed against local and current external terminal-agent managers on 2026-04-20.
| Source | Transport Shape | Useful Patterns | Limits for Fizeau |
|---|---|---|---|
| gastown local repo | tmux is the runtime/session boundary | Session creation runs the target command directly instead of racing shell readiness; pane capture and process-command checks are used for inspection and zombie cleanup; input helpers account for paste/Enter timing. | Strong operator UX, but the service would inherit tmux server state, send timing quirks, and display scraping as correctness inputs. |
| ntm current GitHub repo | tmux is an explicit required dependency and multi-agent control plane | Uses named sessions/panes, strict session-name validation, command timeouts, a circuit breaker, semantic capture budgets, pipe-pane streaming with polling fallback, and buffer paste for multiline/large prompts. | Its own analysis notes deep tmux lock-in. Good negative evidence for Fizeau’s library boundary. |
| claude-squad current GitHub repo | hybrid: tmux owns sessions, Go PTY owns attach/control | Uses creack/pty to attach to tmux, resize the attached session, forward stdin/stdout, and write direct bytes for key input while still using tmux capture-pane. | Confirms that user impersonation needs a real terminal channel, but still leaves tmux as the persistent lifecycle owner. |
| dmux current GitHub repo | tmux panes plus git worktrees, TypeScript/Ink UI | Treats tmux as an operator pane manager with persistent panes, hooks, and worktree isolation. | Valuable operator pattern; not a deterministic record/replay or service-event evidence layer. |
| dun local repo | non-interactive CLI harnesses | Uses stdin/stdout CLI modes such as claude --print and codex exec -; prior spike docs prefer stdin for large prompts. | Does not solve TUI-only model/quota/status surfaces. |
| creack/pty | direct Go PTY primitive | Starts commands with a controlling terminal, supports explicit terminal sizing and resize handling, and keeps lifecycle inside the process. | Requires Fizeau to own screen parsing, process-group cleanup, timeout behavior, and inspection UI. |
| Netflix/go-expect | expect-style terminal automation | Useful expectation/input layer over a pseudoterminal. | It does not own process lifecycle, so it is a helper over the selected terminal transport, not the transport itself. |
| asciinema | terminal recording/playback | Proven lightweight terminal recording format with timing and replay concepts. | Record/playback only; it does not drive the harness or emit CONTRACT-003 service events. |
The current ecosystem trend for human multi-agent operation is tmux. That is not the right baseline for Fizeau. Fizeau needs a reusable direct PTY library because cassettes need raw bytes, structured service events, deterministic replay, credential-free playback, and process cleanup without depending on global tmux server state. tmux evidence is useful for understanding common TUI failure modes, but it must not be part of the core harness capability story.
Decision
Fizeau will own direct PTY lifecycle in-process using Go os/exec plus a
small reusable PTY library. That library includes cassette recording and
playback as a first-class layer; recording/replay is not a separate harness
helper bolted on beside the PTY code. tmux is not part of the core harness
execution, model-list probing, quota probing, cassette recording, cassette
replay, cancellation, or inspection design.
Existing tmux quota helpers are legacy experiments. They can remain only as
temporary diagnostics while direct PTY replacements are being built, and their
results do not promote a capability to final supported status. Any capability
that can only be proven through tmux is marked gap until direct PTY evidence exists.
If a future operator project wants tmux attach/switch UX, it must live outside the core service/cassette path and consume Fizeau outputs like any other client. The Fizeau baseline is direct PTY only.
SPIKE-002 clarified the only acceptable success criteria for this area: Fizeau must control Claude/Codex well enough to extract TUI-only quota, available-model, reasoning-level, and related status facts; and it must replay cassettes well enough that client-side parsers and terminal assertions run without live Claude/Codex binaries, credentials, or network. Per-run token usage remains a core capability, but it should stay on native stream or batch JSON evidence unless a future harness path makes it TUI-derived. tmux’s human attachability is useful operator UX, but it is not in the current baseline and is not the accepted replay or capability-evidence path.
The baseline live process is a small background probe, not a general terminal control plane. It may periodically start Claude, Codex, or another harness, drive the minimum TUI flow required to read quota/status/model/reasoning facts, write a scrubbed snapshot/cache, and exit. If a harness’s batch prompt mode later stops working for normal execution, the same PTY wrapper may be promoted to drive an interactive session, but that is a future fallback decision rather than the first implementation target.
The cassette recorder and player remain part of internal/pty for the baseline
implementation, subject to the build-vs-buy gate in
ADR-004.
The project will adopt PTY, terminal-emulator, and recording concepts where
existing libraries fit. If reuse appears later, extract the mature PTY library
as a whole rather than splitting cassette playback from session and terminal
modeling prematurely.
Key Points: Direct PTY only | tmux helpers are legacy diagnostics | cassettes are versioned evidence artifacts
Module Boundaries
The direct PTY work must land as a reusable terminal library with narrow package boundaries. Harness-specific code consumes the library; it does not live inside it.
| Layer | Proposed Boundary | Owns | Must Not Own |
|---|---|---|---|
| Raw PTY session | internal/pty/session | Start, command argv/env/workdir, terminal size, PTY file descriptors, process groups, stdin bytes, resize, raw output stream, Wait, timeout, cancellation, Close/Kill cleanup | Claude/Codex parsing, model names, quota semantics, cassette schema, service events |
| Terminal model | internal/pty/terminal | Byte-to-frame derivation, normalized screen snapshots, key encoding, expect/wait predicates, frame diffing, capture metadata | Process spawning, harness-specific slash commands, quota/model interpretation |
| Cassette record/replay | internal/pty/cassette | Versioned manifest, input/output/frame streams, event timestamps, timing normalization, scrub reports, replay scheduler, deterministic and real-time playback drivers, read-only inspection inputs | Live credentials, provider calls, harness-specific capability decisions |
| Cassette assertion tests | internal/ptytest or equivalent test-only package | Scenario specs, fixture discovery, cassette playback assertions, time-coded predicates, replay clocks, fixture isolation, parallel-safe temp homes, and test reporting | Production harness behavior, credential storage, terminal emulation internals |
| Harness probes | internal/harnesses/<name> | Claude/Codex prompt flows, quota/status/model-list extraction, reasoning-level discovery, normalized errors, capability matrix updates | PTY lifecycle primitives or cassette file-format internals |
| Debug snapshots | CLI/debug helpers over internal/pty | Dump current rendered VT state, cursor/screen metadata, recent raw byte offsets, and recent timed input/output events for failed probes | Interactive terminal UI, long-lived session management, tmux-style attachability |
No package below internal/pty may import internal/harnesses. The PTY
library must be testable with synthetic programs and ordinary Unix TUIs before
Claude or Codex are involved. Claude and Codex quota/model probes are acceptance
tests for the harness adapters, not proof that the PTY library is complete by
themselves. The terminal rendering decision is detailed in
ADR-003.
The build-vs-buy boundary and extraction triggers are detailed in
ADR-004.
The terminal rendering decision is supported by the Unix top spike in
SPIKE-001.
The recorder/driver build-vs-buy pressure test is captured in
SPIKE-002.
Data Flow
The cassette layer observes the PTY library; it does not replace or wrap harness-specific parsing.
internal/pty/session raw bytes and input events
  -> internal/pty/terminal frame derivation and screen normalization
  -> internal/harnesses/<name> adapter parsing and service-event emission
  -> internal/pty/cassette CassetteTee writes raw output, timed input, frames,
     opaque service-event JSON, final metadata, and scrub reports

internal/pty/cassette may store and replay opaque service-event JSON, but it
must not import harness adapters or CONTRACT-003 typed-event decoders. Service
assertions stay above the cassette library. Harness adapters hand timed opaque
events to the cassette layer through a narrow CassetteTee-style interface so
the dependency direction stays internal/harnesses/<name> -> internal/pty,
never the reverse.
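As a minimal sketch of that dependency direction, the following shows one way a CassetteTee-style interface could accept opaque JSON without knowing any harness types. Only the name CassetteTee comes from this ADR; the method set, the Event shape, the in-memory implementation, and the choice to start seq at 1 are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Event is one timed record handed to the cassette layer. The payload stays
// opaque so the cassette package never needs harness or CONTRACT-003 types.
type Event struct {
	Seq     uint64          `json:"seq"`
	TMs     int64           `json:"t_ms"`
	Kind    string          `json:"kind"` // hypothetical stream label, e.g. "service"
	Payload json.RawMessage `json:"payload"`
}

// CassetteTee is the narrow interface a harness adapter would write to.
// The method signature is an assumption, not a settled API.
type CassetteTee interface {
	WriteEvent(kind string, tMs int64, payload json.RawMessage) error
}

// memoryTee is a toy in-memory implementation used only for illustration.
type memoryTee struct {
	nextSeq uint64
	events  []Event
}

// WriteEvent validates that the payload is JSON, then appends it untouched.
func (m *memoryTee) WriteEvent(kind string, tMs int64, payload json.RawMessage) error {
	if !json.Valid(payload) {
		return fmt.Errorf("payload for %q event is not valid JSON", kind)
	}
	m.nextSeq++
	m.events = append(m.events, Event{Seq: m.nextSeq, TMs: tMs, Kind: kind, Payload: payload})
	return nil
}

func main() {
	tee := &memoryTee{}
	var sink CassetteTee = tee // the adapter only ever sees the interface
	if err := sink.WriteEvent("service", 200, json.RawMessage(`{"type":"final"}`)); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println(len(tee.events), tee.events[0].Seq)
}
```

Because the interface takes json.RawMessage, the cassette side can validate and store events without importing a typed decoder.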
Cassette Data Contract
Every cassette is a single versioned directory or archive with a manifest and
append-only event streams. Version 1 contains:
| Field | Required | Description |
|---|---|---|
| manifest.version | Yes | Cassette schema version. Starts at 1; incompatible changes increment the major version. |
| manifest.id | Yes | Stable UUID generated when the cassette is recorded. Assertion specs must reference this ID so they cannot silently attach to a different recording. |
| manifest.content_digest | Yes | Digest metadata for recorded evidence, including at least sha256 over output.raw. Assertion specs must reference the digest they were authored against. |
| manifest.harness | Yes | Harness name, binary path fingerprint, binary version string when available, and capability row snapshot. |
| manifest.command | Yes | Scrubbed argv, working directory policy, environment allowlist names, timeout settings, and permission mode. |
| manifest.terminal | Yes | Initial rows/cols, resize events, locale, TERM value, PTY mode flags, and terminal emulator identity {name, version} used to derive frames. |
| manifest.timing | Yes | Clock policy, timestamp resolution, replay default, and any scaling/collapse policy used by tests. Version 1 defaults to 100ms timestamp resolution, but recorders may choose a finer resolution_ms without a schema bump. |
| manifest.provenance | Yes | Agent git SHA, contract version, OS/arch, recorded-at timestamp, and recorder version. |
| input.jsonl | Yes | User/input events: bytes sent to stdin, paste boundaries, control keys, resize events, and signal events. Every record includes monotonic seq and t_ms. |
| output.raw | Yes | Raw output bytes from the PTY, exactly as observed after environment scrubbing. This is the byte-for-byte evidence stream. |
| output.jsonl | Yes | Timed raw output chunks from the PTY. Every record includes monotonic seq and t_ms, byte offset into output.raw, chunk length, and optional chunk digest. Inline chunk bytes are forbidden in version 1; replay reads bytes from output.raw by offset to avoid JSON byte-encoding ambiguity. |
| frames.jsonl | Yes | Screen snapshots or frame diffs at monotonic seq and t_ms timestamps for human review and deterministic replay assertions. Frames are derived artifacts, not the byte-level evidence source. |
| service-events.jsonl | Yes | Opaque service-event JSON emitted during the run, including routing, tool, final, and typed-drain-compatible payloads. Every record includes monotonic seq and t_ms. |
| final.json | Yes | Exit status, signal, duration, final metadata, usage, cost, routing actual, session log path, and normalized final text. |
| quota.json | When applicable | Scrubbed quota/status probe output and parsed quota windows used to accept or reject the record run. |
| scrub-report.json | Yes | Redaction rules applied, environment values removed, secret-pattern hit counts, and fields intentionally preserved. |
| assertions.json or assertions.yaml | Test fixtures only | Time-coded semantic assertions for automated tests. This is a sidecar test spec, not observed evidence, and may be regenerated or tightened without changing the recorded PTY facts. |
Because the child process runs under a PTY, stdout and stderr are normally
merged by the terminal slave and recorded together in output.raw. If a future
harness exposes a separate non-PTY stderr stream, that stream must either be
normalized into service events or added as an explicit optional artifact; it
must not be silently dropped.
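To make the manifest rows concrete, here is a hedged sketch of how a reader might decode and validate a small subset of the version-1 manifest. The table above fixes the top-level field names; the nested JSON keys (output_raw_sha256, replay_default) and the ParseManifest helper are assumptions, and most manifest sections are omitted for brevity.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Manifest mirrors a subset of the version-1 manifest. Nested key names are
// illustrative assumptions, not the settled schema.
type Manifest struct {
	Version       int    `json:"version"`
	ID            string `json:"id"`
	ContentDigest struct {
		OutputRawSHA256 string `json:"output_raw_sha256"`
	} `json:"content_digest"`
	Timing struct {
		ResolutionMs  int    `json:"resolution_ms"`
		ReplayDefault string `json:"replay_default"`
	} `json:"timing"`
}

// ParseManifest decodes the manifest, rejects missing required fields, and
// applies the version-1 default timestamp resolution of 100ms.
func ParseManifest(data []byte) (Manifest, error) {
	var m Manifest
	if err := json.Unmarshal(data, &m); err != nil {
		return m, fmt.Errorf("decode manifest: %w", err)
	}
	switch {
	case m.Version < 1:
		return m, errors.New("manifest.version is required and starts at 1")
	case m.ID == "":
		return m, errors.New("manifest.id is required")
	case m.ContentDigest.OutputRawSHA256 == "":
		return m, errors.New("manifest.content_digest must include a sha256 over output.raw")
	}
	if m.Timing.ResolutionMs == 0 {
		m.Timing.ResolutionMs = 100
	}
	return m, nil
}

func main() {
	doc := []byte(`{"version":1,"id":"example-id","content_digest":{"output_raw_sha256":"abc"}}`)
	m, err := ParseManifest(doc)
	fmt.Println(m.Timing.ResolutionMs, err)
}
```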
Timing is event-driven. The recorder writes a monotonic seq and monotonic
t_ms timestamp on every observed input event, raw output chunk, resize,
signal, derived frame, service event, and final event. Fixed-interval frame
sampling is optional and derived; it is not the authoritative recording model.
This keeps recordings compact while preserving the exact timeline needed to
test terminal emulation, buffering, delays, and interactions.
Timestamps are stored as monotonic milliseconds from cassette start, quantized
to manifest.timing.resolution_ms. Version 1 uses resolution_ms: 100 by
default so replay preserves the shape of a real TUI session without pretending
to be nanosecond-accurate. Recorders may use a finer capture resolution in
version 1 when a TUI or test needs it; replay mode remains orthogonal to
capture resolution. Replay supports three timing modes:
- realtime: sleep according to recorded t_ms values at the recorded resolution; this is the default for human inspection and visual playback.
- scaled: multiply recorded delays by a caller-provided factor while preserving event order and relative pacing.
- collapsed: ignore sleeps and replay in event order for fast deterministic CI assertions.
Replay must preserve event order, raw output chunk boundaries, resize ordering,
process exit, final service metadata, and the recorded timing relation within
one timestamp resolution. Terminal emulator tests replay output.jsonl at full
speed under a virtual clock, so assertions run at the same logical place in the
timeline without waiting on wall-clock sleeps.
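The quantization step described above can be sketched as follows. This is an illustration only: the ADR does not say whether timestamps round down or to nearest, so flooring is an assumption here, and QuantizeMs is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// defaultResolutionMs is the version-1 default timestamp resolution.
const defaultResolutionMs = 100

// QuantizeMs converts a monotonic elapsed duration into a t_ms value that is
// a multiple of resolutionMs. Flooring is an assumption of this sketch.
func QuantizeMs(elapsed time.Duration, resolutionMs int64) int64 {
	if resolutionMs <= 0 {
		resolutionMs = defaultResolutionMs
	}
	ms := elapsed.Milliseconds()
	return ms - ms%resolutionMs
}

func main() {
	samples := []time.Duration{40 * time.Millisecond, 149 * time.Millisecond, 1234 * time.Millisecond}
	for _, d := range samples {
		fmt.Println(d, "->", QuantizeMs(d, defaultResolutionMs))
	}
}
```

With a 100ms resolution, events observed at 40ms and 90ms land on the same t_ms, which is why ordering cannot rest on timestamps alone.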
Event order is authoritative by seq, not by t_ms alone. Multiple events may
share the same quantized timestamp. Replay and assertion evaluation must process
same-t_ms events in ascending seq order. seq is global across all cassette
streams, assigned at observation time, and must be contiguous after merge. If a
reader needs a deterministic merge for older diagnostic artifacts that lack
seq, the fallback ordering is resize/signal, input, output chunk, derived
frame, service event, final; accepted version-1 cassettes must not rely on that
fallback.
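A reader could enforce the seq rules roughly as below. The sketch merges records from several streams, orders them by seq, and rejects gaps, duplicates, or t_ms values that go backwards. The Record type and MergeBySeq name are hypothetical, and the legacy fallback ordering is deliberately not implemented.

```go
package main

import (
	"fmt"
	"sort"
)

// Record is the ordering-relevant part of any cassette stream entry.
type Record struct {
	Seq    uint64
	TMs    int64
	Stream string // hypothetical label such as "input", "output", "frame"
}

// MergeBySeq merges all streams into one timeline ordered by seq and checks
// that seq is contiguous and t_ms never decreases.
func MergeBySeq(streams ...[]Record) ([]Record, error) {
	var all []Record
	for _, s := range streams {
		all = append(all, s...)
	}
	sort.Slice(all, func(i, j int) bool { return all[i].Seq < all[j].Seq })
	for i := 1; i < len(all); i++ {
		if all[i].Seq != all[i-1].Seq+1 {
			return nil, fmt.Errorf("seq gap or duplicate between %d and %d", all[i-1].Seq, all[i].Seq)
		}
		if all[i].TMs < all[i-1].TMs {
			return nil, fmt.Errorf("t_ms goes backwards at seq %d", all[i].Seq)
		}
	}
	return all, nil
}

func main() {
	output := []Record{{Seq: 2, TMs: 100, Stream: "output"}}
	other := []Record{{Seq: 1, TMs: 100, Stream: "input"}, {Seq: 3, TMs: 200, Stream: "frame"}}
	merged, err := MergeBySeq(output, other)
	fmt.Println(merged, err)
}
```

In the example, the input and output records share t_ms 100, and seq alone decides that the input is replayed first.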
Nondeterministic terminal content normalization is separate from secret scrubbing. Scrubbing removes sensitive values. Normalization handles volatile screen facts such as clocks, PIDs, elapsed durations, and animation counters so semantic frame assertions remain stable without weakening raw evidence storage.
Default scrubbing and normalization rules are part of the cassette contract.
Version 1 starts with: explicit environment allowlist, HOME and worktree
path rewriting, bearer/API token patterns, account identifiers where configured,
UUID/request/session identifiers, RFC3339 and local timestamp values, elapsed
durations, PIDs, transient socket/file names, and animation counters. Harness
adapters may register extension rules, but the scrub report must list every
rule applied and every intentionally preserved volatile field.
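One possible shape for rule-driven scrubbing with a hit-count report is sketched below. The three patterns are illustrative stand-ins, not the project's actual default rule set, and the Rule, Report, and Scrub names are assumptions. The sketch works on strings, so it suits text artifacts such as argv or JSON lines.

```go
package main

import (
	"fmt"
	"regexp"
)

// Rule is one named redaction or normalization rule.
type Rule struct {
	Name        string
	Pattern     *regexp.Regexp
	Replacement string
}

// Report records how often each rule matched, including rules with zero
// hits, so the scrub report lists every rule that was applied.
type Report struct {
	Hits map[string]int
}

// Scrub applies rules in order and returns the scrubbed text plus a report.
func Scrub(text string, rules []Rule) (string, Report) {
	rep := Report{Hits: map[string]int{}}
	for _, r := range rules {
		rep.Hits[r.Name] = len(r.Pattern.FindAllStringIndex(text, -1))
		text = r.Pattern.ReplaceAllString(text, r.Replacement)
	}
	return text, rep
}

// exampleRules are hypothetical patterns for illustration only.
var exampleRules = []Rule{
	{"bearer_token", regexp.MustCompile(`Bearer [A-Za-z0-9._~+/=-]+`), "Bearer <REDACTED>"},
	{"uuid", regexp.MustCompile(`[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}`), "<UUID>"},
	{"pid", regexp.MustCompile(`\bpid[= ]\d+`), "pid=<PID>"},
}

func main() {
	out, rep := Scrub("Authorization: Bearer abc.def pid=4242", exampleRules)
	fmt.Println(out)
	fmt.Println(rep.Hits)
}
```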
Schema Evolution
Version 1 readers reject cassettes with a manifest.version higher than the
reader supports, missing required artifacts, missing required fields, or unknown
required feature flags. Readers may ignore unknown optional fields within the
same major version. Additive optional fields do not require a schema bump;
renaming, removing, or changing the meaning of required fields requires a new
major version. Writers stamp the supported manifest.version and refuse to
overwrite a cassette written with a newer major version.
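The reader-side rejection rules can be sketched as a single check. The artifact names come from the data contract table; the Header type, the RequiredFeatures field, and the empty set of known feature flags are assumptions made for this example.

```go
package main

import "fmt"

// supportedVersion is the highest manifest.version this reader understands.
const supportedVersion = 1

// requiredArtifacts lists the always-required version-1 files.
var requiredArtifacts = []string{
	"input.jsonl", "output.raw", "output.jsonl", "frames.jsonl",
	"service-events.jsonl", "final.json", "scrub-report.json",
}

// knownFeatures is empty here: this sketch assumes no required feature
// flags are defined yet.
var knownFeatures = map[string]bool{}

// Header carries the manifest fields that drive compatibility decisions.
type Header struct {
	Version          int
	RequiredFeatures []string // hypothetical field name
}

// CheckReadable rejects cassettes that are too new, that declare unknown
// required features, or that lack a required artifact.
func CheckReadable(h Header, present map[string]bool) error {
	if h.Version > supportedVersion {
		return fmt.Errorf("cassette version %d is newer than supported version %d", h.Version, supportedVersion)
	}
	for _, f := range h.RequiredFeatures {
		if !knownFeatures[f] {
			return fmt.Errorf("unknown required feature flag %q", f)
		}
	}
	for _, name := range requiredArtifacts {
		if !present[name] {
			return fmt.Errorf("missing required artifact %s", name)
		}
	}
	return nil
}

func main() {
	present := map[string]bool{"input.jsonl": true, "output.raw": true}
	fmt.Println(CheckReadable(Header{Version: 1}, present))
	fmt.Println(CheckReadable(Header{Version: 2}, present))
}
```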
Recorders must compute timing from a monotonic elapsed clock such as
time.Since(recordingStart) or an injected monotonic test clock. They must not
derive t_ms from wall-clock timestamps after serialization. Tests must cover a
wall-clock jump during recording without changing monotonic event order.
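A small sketch of the clock seam follows. In Go, time.Now captures a monotonic reading and time.Since uses it, so elapsed time is unaffected by wall-clock adjustments. The Clock interface, FakeClock, and Stamper types are hypothetical names for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// Clock reports monotonic elapsed time since recording start.
type Clock interface {
	Elapsed() time.Duration
}

// monotonicClock relies on the monotonic reading carried by time.Now.
type monotonicClock struct{ start time.Time }

func NewMonotonicClock() Clock { return monotonicClock{start: time.Now()} }

func (c monotonicClock) Elapsed() time.Duration { return time.Since(c.start) }

// FakeClock is an injected test clock that only moves when told to.
type FakeClock struct{ elapsed time.Duration }

func (f *FakeClock) Advance(d time.Duration) { f.elapsed += d }
func (f *FakeClock) Elapsed() time.Duration  { return f.elapsed }

// Stamper assigns seq and t_ms at observation time, before serialization.
type Stamper struct {
	clock Clock
	seq   uint64
}

// Next returns the next seq and the current elapsed milliseconds.
func (s *Stamper) Next() (seq uint64, tMs int64) {
	s.seq++
	return s.seq, s.clock.Elapsed().Milliseconds()
}

func main() {
	fake := &FakeClock{}
	st := &Stamper{clock: fake}
	fmt.Println(st.Next())
	fake.Advance(250 * time.Millisecond)
	fmt.Println(st.Next())

	live := &Stamper{clock: NewMonotonicClock()}
	seq, _ := live.Next()
	fmt.Println("live seq:", seq)
}
```

Injecting the clock lets a test simulate any elapsed time without sleeping, while the production path never reads wall-clock time when computing t_ms.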
Record Mode
Record mode runs the real harness binary through the direct PTY transport. It fails before writing a cassette when:
- the harness binary is missing or not executable;
- authentication is missing, expired, or for the wrong account;
- subscription or quota state cannot be confirmed for subscription harnesses;
- requested model, reasoning, permission, or workdir capability is unsupported by the harness capability matrix;
- the run exits before producing a final service event.
If a failure happens after cassette creation starts, the recorder writes an explicit failed-run artifact only under a diagnostic path, never as accepted golden-master evidence.
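The fail-fast ordering can be sketched as a preflight pass that runs before any cassette file is created. Only the binary check below does real work, using exec.LookPath from the standard library; auth, subscription, and quota checks are harness-specific, so the stub shown is a placeholder and the Check and Preflight names are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// Check is one named precondition for record mode.
type Check struct {
	Name string
	Run  func() error
}

// Preflight runs checks in order and stops at the first failure, so no
// cassette is created for a run that cannot be valid evidence.
func Preflight(checks []Check) error {
	for _, c := range checks {
		if err := c.Run(); err != nil {
			return fmt.Errorf("preflight %s: %w", c.Name, err)
		}
	}
	return nil
}

// BinaryCheck verifies the harness binary resolves to an executable on PATH.
func BinaryCheck(binary string) Check {
	return Check{Name: "binary", Run: func() error {
		_, err := exec.LookPath(binary)
		return err
	}}
}

func main() {
	checks := []Check{
		BinaryCheck("fizeau-example-missing-harness"), // hypothetical name, expected to be absent
		{Name: "auth", Run: func() error { return errors.New("stub: not reached in this example") }},
	}
	if err := Preflight(checks); err != nil {
		fmt.Println("refusing to record:", err)
		return
	}
	fmt.Println("preflight passed; cassette creation may start")
}
```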
Replay mode is parallel-safe. Record mode is not assumed to be parallel-safe for authenticated harnesses. Recorders must serialize per harness account and fail fast on lock contention rather than running two Claude or Codex record jobs against the same credential, quota window, or session store.
Accepted authenticated cassettes must carry freshness metadata sufficient for
the capability matrix: captured_at, harness binary version, auth/account
class, and any freshness window used for a supported claim. Stale cassettes
remain useful parser fixtures, but they cannot promote or retain live
capability support without a documented refresh policy.
Replay Mode
Replay mode never uses credentials and never contacts a provider. It feeds the recorded input/output/frame streams through the same parser, service-event decoder, and typed drain assertions used by live mode. Replay can prove parser, event-shape, timing behavior, cancellation, cleanup, and PTY transport behavior; it cannot prove that a live external harness still works today.
output.raw plus output.jsonl is the authoritative replay input for terminal
emulator assertions. frames.jsonl is stored for human review, debugging, and
fast smoke checks, but correctness assertions that validate terminal rendering
must be able to re-derive frames from output.raw/output.jsonl through the
manifest-pinned emulator. When the emulator backend or version changes, the
reader must either reject stale frame-derived assertions with a clear emulator
mismatch or require cassette re-recording/regeneration.
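Reading replay bytes by offset, rather than from inline JSON, could look like the sketch below. The Chunk JSON keys (offset, length) are assumed names for the "byte offset" and "chunk length" fields of output.jsonl, and rawFromString stands in for an opened output.raw file.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// Chunk is one output.jsonl record; it points into output.raw and carries
// no inline bytes.
type Chunk struct {
	Seq    uint64 `json:"seq"`
	TMs    int64  `json:"t_ms"`
	Offset int64  `json:"offset"`
	Length int64  `json:"length"`
}

// ReadChunk returns exactly the bytes a chunk refers to, preserving the
// recorded chunk boundary, and fails on short or out-of-range reads.
func ReadChunk(raw io.ReaderAt, c Chunk) ([]byte, error) {
	if c.Offset < 0 || c.Length < 0 {
		return nil, fmt.Errorf("chunk seq %d: negative offset or length", c.Seq)
	}
	buf := make([]byte, c.Length)
	n, err := raw.ReadAt(buf, c.Offset)
	if int64(n) != c.Length {
		return nil, fmt.Errorf("chunk seq %d: read %d of %d bytes: %v", c.Seq, n, c.Length, err)
	}
	return buf, nil
}

// rawFromString builds an in-memory stand-in for output.raw.
func rawFromString(s string) io.ReaderAt { return bytes.NewReader([]byte(s)) }

func main() {
	raw := rawFromString("\x1b[2Jhello")
	chunks := []Chunk{
		{Seq: 1, TMs: 0, Offset: 0, Length: 4},
		{Seq: 2, TMs: 100, Offset: 4, Length: 5},
	}
	for _, c := range chunks {
		b, err := ReadChunk(raw, c)
		fmt.Printf("seq=%d t_ms=%d bytes=%q err=%v\n", c.Seq, c.TMs, b, err)
	}
}
```

An *os.File satisfies io.ReaderAt, so the same function would work against a cassette file opened read-only.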
Replay is deterministic by default:
- CI uses collapsed timing unless a test explicitly asks for realtime or scaled playback;
- environment is reconstructed only from the cassette allowlist;
- terminal size and resize events come from manifest.terminal;
- service-event assertions compare typed payloads after documented scrub rules, not raw secrets or machine-specific paths.
Realtime and scaled replay exist to validate the cassette replay scheduler:
given a recorded event sequence and timestamp resolution, the scheduler must
emit events in seq order, sleep according to recorded deltas within one
resolution tick for realtime, multiply deltas by the requested factor for
scaled, and avoid wall-clock sleeps in collapsed mode while preserving the
same logical t_ms positions for assertions.
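A scheduler with these three modes might be sketched as follows. The sleep function is injected so tests can substitute a virtual sleeper. Replay assumes its input is already ordered by seq; the Mode constants and the Replay signature are hypothetical.

```go
package main

import (
	"fmt"
	"time"
)

// Mode selects how recorded deltas turn into sleeps.
type Mode int

const (
	Collapsed Mode = iota // no sleeps; event order only
	Realtime              // sleep the recorded delta
	Scaled                // sleep the recorded delta times a factor
)

// Event carries the ordering and timing fields of a cassette record.
type Event struct {
	Seq uint64
	TMs int64
}

// Replay emits events in the given (seq) order. Each event keeps its
// recorded t_ms, so assertions see the same logical position in every mode.
func Replay(events []Event, mode Mode, factor float64, sleep func(time.Duration), emit func(Event)) {
	var prevMs int64
	for _, ev := range events {
		delta := time.Duration(ev.TMs-prevMs) * time.Millisecond
		prevMs = ev.TMs
		switch mode {
		case Realtime:
			sleep(delta)
		case Scaled:
			sleep(time.Duration(float64(delta) * factor))
		}
		emit(ev)
	}
}

// virtualSleeper accumulates requested sleeps instead of blocking.
type virtualSleeper struct{ total time.Duration }

func (v *virtualSleeper) Sleep(d time.Duration) { v.total += d }

func main() {
	events := []Event{{Seq: 1, TMs: 0}, {Seq: 2, TMs: 100}, {Seq: 3, TMs: 300}}
	for _, mode := range []Mode{Collapsed, Realtime, Scaled} {
		v := &virtualSleeper{}
		Replay(events, mode, 0.5, v.Sleep, func(Event) {})
		fmt.Println("mode", mode, "total sleep", v.total)
	}
}
```

Passing time.Sleep as the sleep argument gives wall-clock playback; passing a virtual sleeper lets scheduler tests check requested delays without waiting.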
Automated Cassette Assertion Framework
All PTY/cassette acceptance tests must be automated. Manual inspection is useful
for debugging, but it is never a promotion gate for supported capability
status. The default go test ./... path must run replay-only tests that are
credential-free, provider-free, parallel-safe, and fast. Live record mode and
Docker conformance mode may be opt-in because they need binaries, credentials,
or containers, but when enabled they must still run without human keystrokes or
manual TUI observation.
The project will build a test-only cassette assertion framework on top of
internal/pty/session, internal/pty/terminal, and internal/pty/cassette.
That framework owns:
- scenario definitions that name the cassette, terminal size, replay mode, fixture driver, environment policy, expected artifacts, and expected manifest.id/manifest.content_digest values;
- time-coded assertions over frames, raw output chunks, input events, resize events, service events, final metadata, exit status, and timing gaps;
- a virtual clock for collapsed replay so time-coded tests run quickly while preserving recorded event order;
- realtime and scaled replay modes for tests that explicitly validate scheduler behavior;
- parallel fixture isolation: read-only cassette inputs, per-test temp dirs, per-test HOME/config roots for record mode, no global tmux/session state, unique artifact output paths, and deterministic cleanup;
- structured failure reports that include the failed assertion, nearest frame timestamps, relevant screen excerpts, and service-event context.
Assertion specs must bind to the cassette they were authored against by
manifest.id and content digest. CI must fail when an assertion file points to
a different cassette ID or digest, even if the current predicates happen to
pass.
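The binding check can be sketched as a comparison that runs before any predicate is evaluated. This example recomputes the sha256 over output.raw instead of trusting the manifest value, which is a design choice of the sketch; the Binding type and function names are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Binding is the part of an assertion spec that pins it to one recording.
type Binding struct {
	CassetteID      string
	OutputRawSHA256 string
}

// DigestHex returns the lowercase hex sha256 of b.
func DigestHex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// VerifyBinding fails when the spec was authored against a different
// cassette ID or different output bytes.
func VerifyBinding(spec Binding, manifestID string, outputRaw []byte) error {
	if spec.CassetteID != manifestID {
		return fmt.Errorf("assertion spec targets cassette %q, found %q", spec.CassetteID, manifestID)
	}
	if got := DigestHex(outputRaw); got != spec.OutputRawSHA256 {
		return fmt.Errorf("output.raw digest mismatch: spec %s, cassette %s", spec.OutputRawSHA256, got)
	}
	return nil
}

func main() {
	raw := []byte("recorded bytes")
	spec := Binding{CassetteID: "example-id", OutputRawSHA256: DigestHex(raw)}
	fmt.Println(VerifyBinding(spec, "example-id", raw))
	fmt.Println(VerifyBinding(spec, "example-id", []byte("different bytes")))
}
```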
Assertion specs must support at least these predicate families:
| Predicate Family | Examples |
|---|---|
| Frame content | at t_ms screen contains, within window eventually contains, never contains, stable_for, normalized volatile text comparison |
| Terminal state | cursor position/visibility, rows/cols, alternate-screen state, style/color policy, scrollback or screen clear facts |
| Timing and buffering | output chunk ordering, maximum gap, minimum delay, delayed prompt arrival, split escape sequence handling, backpressure/large output completion |
| Input and resize | exact bytes sent, paste boundaries, control keys, signal events, resize order relative to output |
| Service events | typed-drain-compatible JSON shape, quota/model/reasoning/usage metadata, final status, warning presence or absence |
| Process lifecycle | exit code, signal, EOF, timeout, cancellation, no leaked child process evidence |
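As a rough illustration of the frame-content family, the sketch below implements two predicates over already-derived frames: an "eventually contains within window" check and a "never contains" check. The Frame type with a plain-text Screen field is a simplification, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// Frame is a simplified derived screen snapshot.
type Frame struct {
	Seq    uint64
	TMs    int64
	Screen string // normalized screen text
}

// EventuallyContains returns the first frame with fromMs <= t_ms <= toMs
// whose screen contains want.
func EventuallyContains(frames []Frame, fromMs, toMs int64, want string) (Frame, bool) {
	for _, f := range frames {
		if f.TMs >= fromMs && f.TMs <= toMs && strings.Contains(f.Screen, want) {
			return f, true
		}
	}
	return Frame{}, false
}

// NeverContains reports whether no frame shows the forbidden text.
func NeverContains(frames []Frame, forbidden string) bool {
	for _, f := range frames {
		if strings.Contains(f.Screen, forbidden) {
			return false
		}
	}
	return true
}

func main() {
	frames := []Frame{
		{Seq: 3, TMs: 0, Screen: "loading"},
		{Seq: 7, TMs: 400, Screen: "quota: 42% used"},
	}
	f, ok := EventuallyContains(frames, 0, 500, "quota:")
	fmt.Println(ok, f.Seq, f.TMs)
	fmt.Println(NeverContains(frames, "Traceback"))
}
```

Because the predicates read t_ms from the frames themselves, they behave the same in collapsed, scaled, and realtime replay.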
The initial scenario set is:
- top through Docker conformance, with initial paint, refresh, input-driven change, and resize-driven layout change.
- claude authenticated record mode plus replay cassettes for quota/status, model list, and reasoning levels.
- codex authenticated record mode plus replay cassettes for quota/status, model list, and reasoning levels.
The framework must be extensible before those scenarios are marked complete. Adding a new weird terminal case should require adding a scenario fixture and assertion spec, not writing one-off sleeps or parser-specific test plumbing. Required synthetic fixture families include partial ANSI/VT escape sequences, one-byte chunking, alternate screen, cursor addressing, screen clears, SGR style changes, OSC title, OSC 8 hyperlinks, OSC 52 clipboard writes, bell, DECRQM/mode-query responses, line-drawing and alternate character sets, bracketed paste, SGR mouse mode, focus-in/out, Unicode wide and combining characters, resize during output, resize during an escape sequence, rapid redraw/spinner frames, delayed output, no-newline prompts, PTY backpressure or large buffered output, final output burst at process exit, EOF during redraw, cancellation, and timeout. Sixel and image protocols are out of scope for the first implementation unless a selected primary harness emits them; if observed, they become explicit gap fixtures rather than silently ignored behavior.
PTY Library Test Strategy
The PTY library is not complete until it proves useful behavior against real terminal programs, not only fake sessions and happy-path harness probes. These tests are layered on the automated cassette assertion framework above; they do not rely on manual inspection, arbitrary sleeps, or terminal text scraping outside the selected emulator.
| Test Class | Required Coverage |
|---|---|
| Unit and fake-session tests | Startup failure, normal exit, EOF, timeout, cancellation, process-group cleanup, large input, multiline paste boundaries, control keys, resize events, raw output capture, frame derivation, deterministic fake clock, replay ordering, and assertion-runner failure reporting. |
| Host PTY smoke tests | Portable Unix commands such as sh, cat, stty size, and sleep verify stdin/stdout, exit status, terminal sizing, cancellation, and no leaked child processes without credentials or network. Linux and macOS host smoke targets are required before primary PTY support is promoted. Windows support is an explicit gap until a Windows PTY adapter and fixtures are designed. |
| Docker TUI conformance tests | A pinned Linux container image supplies known TUI programs. The first required target is Unix top: capture several distinct screens from one run, including initial paint, later refresh frames, and at least one interaction or resize that changes the screen. Assertions are time-coded scenario predicates over rendered frames and service metadata, not brittle byte-for-byte full-screen output. |
| Terminal rendering tests | internal/pty/terminal must wrap a real VT/ANSI emulator. Tests must prove screen clears, cursor movement, SGR style policy, alternate-screen behavior where available, Unicode/wide characters, partial escape sequences, resize races, buffering/delay behavior, and volatile-content normalization. Regex ANSI stripping is not accepted as the screen model. |
| Additional TUI diversity | Add at least two more common terminal shapes before calling the library mature: a pager flow such as less, and an editor or curses-style full-screen flow such as vim, nano, or dialog, using Docker when host availability is inconsistent. Each new TUI shape must be a reusable scenario fixture. |
| Cassette replay tests | Record a deterministic synthetic terminal run, replay it through the cassette reader/player and assertion framework, and assert manifest fields, input ordering, raw output, frame snapshots, scrub report, final status, read-only replay behavior, and parallel replay safety. |
| Authenticated harness tests | Opt-in recorder tests drive Claude and Codex through the same PTY library to extract TUI-only quota/status, model listings, and reasoning levels. Per-run token usage is covered by native stream capability tests unless it becomes TUI-derived; then it must get its own checklist row and cassette scenario. Missing binary/auth/quota/timeout cases must fail before writing accepted cassettes. Replay cassettes for these flows must run in default CI without credentials. |
Inspection
Live inspection attaches to a read-only mirror of the direct PTY stream.
Inspectors may watch frames and output bytes but cannot write to stdin, resize
the authoritative PTY, or mutate cassette files. Recorded-run inspection reads
frames.jsonl and output.raw through a viewer that opens files read-only and
never normalizes or rewrites the evidence.
Alternatives
| Option | Pros | Cons | Evaluation |
|---|---|---|---|
| Direct PTY ownership in agent | One dependency-light lifecycle for execution, record, replay, cancellation, and inspection; portable test seams; cassette format can be shaped around CONTRACT-003 events | Requires careful PTY implementation and platform testing; attach UX must be built | Selected: best fit for a library-first service boundary without a global tmux dependency |
| Terminal-session interface with direct PTY and tmux adapters | Would preserve an easy operator escape hatch | Keeps tmux alive as an attractive partial implementation and makes capability evidence ambiguous | Rejected for the core baseline; direct PTY library only |
| Standardize on tmux for all harness lifecycle | Mature attach/detach UX, pane capture, process supervision already exists; matches tools such as gastown, ntm, claude-squad, and dmux | Makes tmux a hard dependency for library consumers and CI; Windows portability is poor; machine-local tmux state complicates deterministic replay; tmux capture is a derived screen view rather than raw service evidence | Rejected |
| Keep tmux only for quota/status while direct exec handles normal runs | Minimal short-term change | Violates the single-transport concern; quota behavior and live execution would diverge; cassette replay could not prove the path that quota probes use | Rejected: partial helper is explicitly the failure mode this ADR resolves |
| Adopt ntm or another terminal manager as the core | Faster access to mature tmux orchestration patterns and robot APIs | Adds another lifecycle owner without CONTRACT-003 semantics; inherits tmux coupling; does not define Fizeau cassette/service-event evidence | Rejected |
| Use asciinema/script-style recorder as the core | Existing terminal recording/playback concepts and viewer ecosystem | Records terminal output but does not drive input, manage auth/quota preflight, own process cleanup, or emit service events | Rejected: useful format reference, insufficient as harness transport |
| Split a generic PTY cassette project now | Clean abstraction if multiple projects need it | Premature API freeze; no second consumer yet; slows harness support beads | Rejected for now by ADR-004; revisit at the documented extraction triggers |
Consequences
| Type | Impact |
|---|---|
| Positive | Harness execution, quota probes, model-list probes, record/replay, cancellation, and inspection share one direct PTY library. |
| Positive | Library consumers do not need tmux installed to use or test Fizeau harness support. |
| Positive | Golden-master cassettes can carry CONTRACT-003 service events and typed-drain payloads as first-class evidence. |
| Negative | The project must own PTY edge cases: resize races, process groups, signal handling, terminal modes, and OS portability. |
| Negative | Read-only inspection needs a purpose-built viewer instead of relying on tmux attach. |
| Neutral | Developers may still use tmux manually outside Fizeau, but tmux is not a Fizeau dependency or evidence source. |
Risks
| Risk | Prob | Impact | Mitigation |
|---|---|---|---|
| Direct PTY implementation leaks subprocesses on cancellation | M | H | Add process-group cleanup tests, timeout tests, and failed-run diagnostics before marking live capabilities supported |
| Legacy tmux helpers linger and hide direct PTY gaps | M | H | Track replacement beads and mark tmux-only capabilities gap until direct PTY evidence exists |
| Cassette scrub rules remove data needed for replay | M | M | Store scrub reports and compare replay against typed events rather than raw secrets |
| Replay creates false confidence about live harness availability | H | M | Keep live-run policy: fresh record-mode evidence is required to promote or retain supported capability status |
| Cross-platform PTY behavior diverges | M | M | Require Linux and macOS host smoke tests before claiming primary PTY support; track Windows as an explicit unsupported gap until an OS-specific adapter and fixtures exist |
Validation
| Success Metric | Review Trigger |
|---|---|
| A future cassette runner can record and replay one codex or claude run through the same direct PTY transport | Record and replay use different process/session supervisors |
| Time-coded cassette assertions run in collapsed mode quickly and in parallel for top, Claude, Codex, and synthetic edge fixtures | Tests rely on sleeps, manual inspection, or serial global state |
| PTY conformance tests capture useful multi-frame output from Unix top and at least two other terminal program shapes | The library is marked complete using only fake sessions or Claude/Codex probes |
| Linux and macOS host PTY smoke tests pass or report an explicit platform gap | Primary PTY support is claimed from Docker-only Linux evidence |
| Codex and Claude model-list and quota probes run through the direct PTY library | A capability is marked supported from tmux-only evidence |
| Accepted cassettes contain manifest, input, output, frames, service events, final metadata, quota data when applicable, and scrub report | A cassette lacks any required version-1 artifact |
| Record mode refuses missing auth/quota/binary cases before writing accepted evidence | CI or local record mode creates a passing cassette for an unauthenticated harness |
| Inspection cannot alter the live PTY or recorded files | Viewer writes to stdin, resizes the authoritative PTY, or rewrites cassette artifacts |
Concern Impact
- Resolves inspectable harness execution concern: Selects direct PTY ownership as the canonical service evidence path and rejects tmux in the core harness/cassette design.
- Supports harness capability matrix: Future supported harness capabilities can cite versioned cassette evidence produced by this transport.
References
- CONTRACT-003 Fizeau Service Interface
- Concerns
- Architecture
- ADR-003 PTY Terminal Rendering and Screen Model
- ADR-004 Terminal Harness Build-vs-Buy Boundary
- SPIKE-001 Direct PTY Rendering With Unix Top
- gastown local tmux wrapper
- dun local harness spike
- Named Tmux Manager
- Claude Squad
- dmux
- creack/pty
- Netflix/go-expect
- asciinema
Review Checklist
- Context names a specific problem
- Decision statement is actionable
- At least two alternatives were evaluated
- Each alternative has concrete pros and cons
- Selected option’s rationale explains why it wins
- Consequences include positive and negative impacts
- Negative consequences have mitigations
- Risks are specific with probability and impact assessments
- Validation section defines review triggers
- Concern impact is complete
- ADR is consistent with governing feature spec and PRD requirements