FigQA2 (pdf)
figqa2-pdf
10 runs · 5 models · evaluated by HybridEvaluator.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5-2 | tools,high | file | 0.644 | 11.2m | 13.2M | 2026-02-03 |
| 2 | gpt-5-2-pro | tools,high | file | 0.436 | 7.0m | 12.3M | 2026-02-03 |
| 3 | gemini-3-pro-preview | — | file | 0.416 | 1.7m | 1.1M | 2026-02-03 |
| 4 | claude-opus-4-6 | — | file | 0.396 | 36.6s | 14.3M | 2026-03-20 |
| 5 | gpt-5-2-pro | — | file | 0.396 | 3.1m | 11.8M | 2026-02-03 |
| 6 | claude-opus-4-5 | tools,high | file | 0.386 | 55.3s | 8.5M | 2026-03-22 |
| 7 | claude-opus-4-6 | tools,high | file | 0.366 | 2.2m | 33.5M | 2026-03-22 |
| 8 | gpt-5-2 | — | file | 0.366 | 1.5m | 9.4M | 2026-02-03 |
| 9 | claude-opus-4-5 | — | file | 0.356 | 22.4s | 8.0M | 2026-03-20 |
| 10 | gemini-3-pro-preview | tools,high | file | 0.337 | 3.7m | 1.2M | 2026-02-03 |