ProtocolQA2
16 runs · 8 models · evaluated by HybridEvaluator.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gemini-3-pro-preview | tools,high | file | 0.616 | 1.6m | 451.4k | 2026-01-27 |
| 2 | claude-opus-4-6 | tools,high | file | 0.584 | 34.5s | 5.9M | 2026-03-22 |
| 3 | claude-opus-4-6 | — | file | 0.536 | 16.6s | 5.0M | 2026-03-20 |
| 4 | gemini-3-pro-preview | — | file | 0.536 | 1.0m | 436.9k | 2026-01-27 |
| 5 | claude-opus-4-5 | tools,high | file | 0.512 | 30.2s | 4.2M | 2026-03-22 |
| 6 | gpt-5-2-pro | tools,high | file | 0.504 | 2.0m | 1.4M | 2026-01-26 |
| 7 | gpt-5-2-pro | tools,high_retry | file | 0.500 | 29.5s | 24.9k | 2026-01-27 |
| 8 | claude-opus-4-5_retry | — | file | 0.484 | 13.7s | 541.7k | 2026-01-26 |
| 9 | gpt-5-2-pro | — | file | 0.472 | 1.3m | 4.4M | 2026-01-27 |
| 10 | gpt-5-2 | tools,high | file | 0.416 | 1.3m | 4.9M | 2026-01-27 |
| 11 | gpt-5-2 | — | file | 0.360 | 8.4s | 2.4M | 2026-01-27 |
| 12 | claude-opus-4-5 | — | file | 0.328 | 13.0s | 3.6M | 2026-03-20 |
| 13 | claude-opus-4-5 | tools,high_retry | file | 0.314 | 22.1s | 2.4M | 2026-01-26 |
| 14 | gpt-5-2-pro_retry | — | file | — | 0.0s | 0 | 2026-01-26 |
| 15 | gpt-5-2 | tools,high_retry | file | — | 0.0s | 0 | 2026-01-26 |
| 16 | gpt-5-2_retry | — | file | — | 0.0s | 0 | 2026-01-26 |