Edison Labs
Benchmarks / LabBench2

SuppQA2

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode    Score  Avg. dur  Tokens  Date
1   gpt-5-2-pro           tools,high  inject  0.368  8.9m      9.1M    2026-01-26
2   gpt-5-2               tools,high  inject  0.344  4.8m      9.6M    2026-01-26
3   gemini-3-pro-preview  tools,high  inject  0.304  2.6m      53.7k   2026-01-26
4   claude-opus-4-6       tools,high  inject  0.248  3.5m      108.4M  2026-03-23
5   claude-opus-4-5       tools,high  inject  0.160  1.5m      46.3M   2026-03-22
6   claude-opus-4-6       -           inject  0.056  7.3s      36.4k   2026-03-20
7   gemini-3-pro-preview  -           inject  0.056  1.4m      34.1k   2026-01-26
8   gpt-5-2               -           inject  0.056  2.4s      17.0k   2026-01-26
9   gpt-5-2-pro           -           inject  0.040  1.2m      148.2k  2026-01-26
10  claude-opus-4-5       -           inject  -      6.2s      34.5k   2026-03-20
