FigQA2
10 runs · 5 models · evaluated by HybridEvaluator.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5-2 | tools,high | inject | 0.426 | 4.0m | 6.2M | 2026-02-03 |
| 2 | gpt-5-2-pro | tools,high | inject | 0.347 | 8.0m | 5.8M | 2026-02-03 |
| 3 | claude-opus-4-6 | tools,high | inject | 0.287 | 4.0m | 60.9M | 2026-03-22 |
| 4 | claude-opus-4-5 | tools,high | inject | 0.257 | 1.4m | 31.6M | 2026-03-22 |
| 5 | gemini-3-pro-preview | tools,high | inject | 0.238 | 3.4m | 53.0k | 2026-02-03 |
| 6 | gpt-5-2-pro | — | inject | 0.119 | 1.6m | 116.9k | 2026-02-03 |
| 7 | gpt-5-2 | — | inject | 0.119 | 2.7s | 15.2k | 2026-02-03 |
| 8 | claude-opus-4-6 | — | inject | 0.109 | 9.0s | 41.0k | 2026-03-20 |
| 9 | gemini-3-pro-preview | — | inject | 0.109 | 2.0m | 35.0k | 2026-02-03 |
| 10 | claude-opus-4-5 | — | inject | 0.079 | 7.1s | 36.4k | 2026-03-20 |