TableQA2
10 runs · 5 models · evaluated by HybridEvaluator.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5-2-pro | tools,high | inject | 0.700 | 4.9m | 4.8M | 2026-02-03 |
| 2 | gemini-3-pro-preview | tools,high | inject | 0.630 | 2.2m | 42.2k | 2026-02-03 |
| 3 | gpt-5-2 | tools,high | inject | 0.630 | 2.5m | 4.9M | 2026-02-03 |
| 4 | claude-opus-4-6 | tools,high | inject | 0.590 | 2.2m | 42.4M | 2026-03-23 |
| 5 | claude-opus-4-5 | tools,high | inject | 0.450 | 47.1s | 26.1M | 2026-03-22 |
| 6 | gpt-5-2 | — | inject | 0.100 | 2.0s | 11.6k | 2026-02-03 |
| 7 | claude-opus-4-6 | — | inject | 0.090 | 7.2s | 28.9k | 2026-03-20 |
| 8 | gpt-5-2-pro | — | inject | 0.090 | 1.5m | 109.5k | 2026-02-03 |
| 9 | gemini-3-pro-preview | — | inject | 0.080 | 1.5m | 27.7k | 2026-02-03 |
| 10 | claude-opus-4-5 | — | inject | 0.040 | 7.4s | 29.8k | 2026-03-20 |