Browser Analytical Workbench

Benchmark Explorer

Summary

What Was Collected

The explorer loads the dataset of valid benchmark cells directly in the browser with DuckDB-WASM. A cell is one result-bearing run: an explicit timeout, a completed graded pass, or a completed graded failure. Setup, auth, and provider-invalid rows are excluded from the Parquet table; they are counted in the manifest instead.
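
The cell definition above can be sketched as a small filter. This is only an illustration, not the explorer's actual code: the status labels (`pass`, `fail`, `timeout`, `auth_error`), the field names `status` and `wall_s`, and the choice to compute pass rate over all cells including timeouts are assumptions, since the page names the categories but not how they are encoded.

```python
from collections import Counter
from statistics import median

# Hypothetical status labels for result-bearing runs (assumption, not from the source).
RESULT_STATUSES = {"pass", "fail", "timeout"}


def split_cells(rows):
    """Separate result-bearing cells from excluded rows.

    Returns (cells, excluded_counts), where excluded_counts plays the role
    of the manifest tally for rows kept out of the table.
    """
    cells = [r for r in rows if r["status"] in RESULT_STATUSES]
    excluded = Counter(
        r["status"] for r in rows if r["status"] not in RESULT_STATUSES
    )
    return cells, dict(excluded)


def summarize(cells):
    """Compute a few of the summary figures over the cells.

    Assumes pass rate is passes divided by all cells, timeouts included.
    """
    n = len(cells)
    passes = sum(1 for c in cells if c["status"] == "pass")
    timeouts = sum(1 for c in cells if c["status"] == "timeout")
    return {
        "rows": n,
        "pass_rate": passes / n if n else None,
        "timeouts": timeouts,
        "wall_p50": median(c["wall_s"] for c in cells) if n else None,
    }


# Made-up sample rows, for illustration only.
sample_rows = [
    {"status": "pass", "wall_s": 10.0},
    {"status": "fail", "wall_s": 20.0},
    {"status": "timeout", "wall_s": 60.0},
    {"status": "auth_error", "wall_s": 1.0},
    {"status": "pass", "wall_s": 30.0},
]

cells, excluded = split_cells(sample_rows)
summary = summarize(cells)
```

With the sample rows, four of the five rows count as cells and the single `auth_error` row appears only in the excluded tally.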

Rows -
Pass rate -
Timeouts -
Tasks -
Models -
GPUs -
Tokens -
Wall p50 -

Task Type Distribution

GPU Distribution

Wall Time by GPU

Wall Time by Model