Edison Labs · LabBench2

FigQA2 (pdf)

figqa2-pdf

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode  Score  Avg. duration  Tokens  Date
1   gpt-5-2               tools,high  file  0.644  11.2m          13.2M   2026-02-03
2   gpt-5-2-pro           tools,high  file  0.436  7.0m           12.3M   2026-02-03
3   gemini-3-pro-preview  -           file  0.416  1.7m           1.1M    2026-02-03
4   claude-opus-4-6       -           file  0.396  36.6s          14.3M   2026-03-20
5   gpt-5-2-pro           -           file  0.396  3.1m           11.8M   2026-02-03
6   claude-opus-4-5       tools,high  file  0.386  55.3s          8.5M    2026-03-22
7   claude-opus-4-6       tools,high  file  0.366  2.2m           33.5M   2026-03-22
8   gpt-5-2               -           file  0.366  1.5m           9.4M    2026-02-03
9   claude-opus-4-5       -           file  0.356  22.4s          8.0M    2026-03-20
10  gemini-3-pro-preview  tools,high  file  0.337  3.7m           1.2M    2026-02-03

A "-" in the Variant column means no variant is listed for that run.
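The table ranks individual runs. It does not show directly how the tools,high variant compares with the run that lists no variant for the same model. The following is a minimal Python sketch that computes that per-model difference from the scores transcribed above. `Run`, `RUNS`, and `tool_deltas` are hypothetical names for illustration only. The sketch also assumes that a run with no listed variant is the model's default configuration, which the page does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One leaderboard row (hypothetical structure for illustration)."""
    model: str
    variant: str  # "tools,high", or "" when no variant is listed (assumed default)
    mode: str
    score: float
    date: str


# Scores transcribed from the FigQA2 (pdf) table.
RUNS = [
    Run("gpt-5-2", "tools,high", "file", 0.644, "2026-02-03"),
    Run("gpt-5-2-pro", "tools,high", "file", 0.436, "2026-02-03"),
    Run("gemini-3-pro-preview", "", "file", 0.416, "2026-02-03"),
    Run("claude-opus-4-6", "", "file", 0.396, "2026-03-20"),
    Run("gpt-5-2-pro", "", "file", 0.396, "2026-02-03"),
    Run("claude-opus-4-5", "tools,high", "file", 0.386, "2026-03-22"),
    Run("claude-opus-4-6", "tools,high", "file", 0.366, "2026-03-22"),
    Run("gpt-5-2", "", "file", 0.366, "2026-02-03"),
    Run("claude-opus-4-5", "", "file", 0.356, "2026-03-20"),
    Run("gemini-3-pro-preview", "tools,high", "file", 0.337, "2026-02-03"),
]


def tool_deltas(runs):
    """Return score(tools,high) minus score(no variant) per model.

    Only models that have both runs are included; results are rounded to
    three decimals to match the table's precision.
    """
    by_key = {(r.model, r.variant): r.score for r in runs}
    models = sorted({r.model for r in runs})
    return {
        m: round(by_key[(m, "tools,high")] - by_key[(m, "")], 3)
        for m in models
        if (m, "tools,high") in by_key and (m, "") in by_key
    }


if __name__ == "__main__":
    for model, delta in tool_deltas(RUNS).items():
        print(f"{model:22s} {delta:+.3f}")
```

On these ten runs, the tools,high variant raises gpt-5-2 by 0.278 (0.644 vs 0.366) and lowers gemini-3-pro-preview by 0.079 (0.337 vs 0.416). The other three models differ by at most 0.040 between their two runs.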
