SuppQA2
10 runs · 5 models · evaluated by HybridEvaluator · ranked by score (descending).
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5-2-pro | tools,high | inject | 0.368 | 8.9m | 9.1M | 2026-01-26 |
| 2 | gpt-5-2 | tools,high | inject | 0.344 | 4.8m | 9.6M | 2026-01-26 |
| 3 | gemini-3-pro-preview | tools,high | inject | 0.304 | 2.6m | 53.7k | 2026-01-26 |
| 4 | claude-opus-4-6 | tools,high | inject | 0.248 | 3.5m | 108.4M | 2026-03-23 |
| 5 | claude-opus-4-5 | tools,high | inject | 0.160 | 1.5m | 46.3M | 2026-03-22 |
| 6 | claude-opus-4-6 | — | inject | 0.056 | 7.3s | 36.4k | 2026-03-20 |
| 7 | gemini-3-pro-preview | — | inject | 0.056 | 1.4m | 34.1k | 2026-01-26 |
| 8 | gpt-5-2 | — | inject | 0.056 | 2.4s | 17.0k | 2026-01-26 |
| 9 | gpt-5-2-pro | — | inject | 0.040 | 1.2m | 148.2k | 2026-01-26 |
| 10 | claude-opus-4-5 | — | inject | — | 6.2s | 34.5k | 2026-03-20 |