Edison Labs
Benchmarks / LabBench2

TableQA2

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode    Score  Avg. duration  Tokens  Date
1   gpt-5-2-pro           tools,high  inject  0.700  4.9m           4.8M    2026-02-03
2   gemini-3-pro-preview  tools,high  inject  0.630  2.2m           42.2k   2026-02-03
3   gpt-5-2               tools,high  inject  0.630  2.5m           4.9M    2026-02-03
4   claude-opus-4-6       tools,high  inject  0.590  2.2m           42.4M   2026-03-23
5   claude-opus-4-5       tools,high  inject  0.450  47.1s          26.1M   2026-03-22
6   gpt-5-2               -           inject  0.100  2.0s           11.6k   2026-02-03
7   claude-opus-4-6       -           inject  0.090  7.2s           28.9k   2026-03-20
8   gpt-5-2-pro           -           inject  0.090  1.5m           109.5k  2026-02-03
9   gemini-3-pro-preview  -           inject  0.080  1.5m           27.7k   2026-02-03
10  claude-opus-4-5       -           inject  0.040  7.4s           29.8k   2026-03-20
