FigQA2 (image)
figqa2-img
10 runs · 5 models · evaluated by HybridEvaluator.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5-2 | tools,high | file | 0.663 | 8.1m | 3.4M | 2026-02-03 |
| 2 | gpt-5-2-pro | — | file | 0.614 | 1.8m | 1.1M | 2026-02-03 |
| 3 | gemini-3-pro-preview | — | file | 0.604 | 1.8m | 143.9k | 2026-02-03 |
| 4 | gpt-5-2-pro | tools,high | file | 0.594 | 2.8m | 1.8M | 2026-02-03 |
| 5 | gpt-5-2 | — | file | 0.564 | 12.2s | 1.1M | 2026-02-03 |
| 6 | gemini-3-pro-preview | tools,high | file | 0.545 | 3.9m | 729.1k | 2026-02-03 |
| 7 | claude-opus-4-5 | tools,high | file | 0.446 | 21.6s | 1.6M | 2026-03-22 |
| 8 | claude-opus-4-5 | — | file | 0.426 | 8.0s | 526.8k | 2026-03-20 |
| 9 | claude-opus-4-6 | — | file | 0.416 | 8.6s | 531.4k | 2026-03-20 |
| 10 | claude-opus-4-6 | tools,high | file | 0.406 | 41.0s | 6.6M | 2026-03-22 |