PatentQA
10 runs · 5 models · evaluated by HybridEvaluator · sorted by score, highest first.
| # | Model | Variant | Mode | Score | Avg. duration | Tokens | Date |
|---|---|---|---|---|---|---|---|
| 1 | gpt-5-2-pro | tools,high | inject | 0.909 | 4.3m | 5.5M | 2026-02-03 |
| 2 | gpt-5-2 | tools,high | inject | 0.802 | 2.0m | 5.5M | 2026-02-03 |
| 3 | claude-opus-4-6 | tools,high | inject | 0.570 | 3.6m | 74.5M | 2026-03-22 |
| 4 | claude-opus-4-5 | tools,high | inject | 0.438 | 2.4m | 42.4M | 2026-03-22 |
| 5 | gemini-3-pro-preview | tools,high | inject | 0.438 | 3.2m | 59.0k | 2026-02-03 |
| 6 | gemini-3-pro-preview | — | inject | 0.215 | 1.1m | 42.9k | 2026-02-03 |
| 7 | gpt-5-2-pro | — | inject | 0.182 | 1.2m | 123.4k | 2026-02-03 |
| 8 | claude-opus-4-6 | — | inject | 0.174 | 8.5s | 39.3k | 2026-03-20 |
| 9 | gpt-5-2 | — | inject | 0.165 | 3.8s | 20.1k | 2026-02-03 |
| 10 | claude-opus-4-5 | — | inject | 0.099 | 6.7s | 34.8k | 2026-03-20 |