Edison Labs
Benchmarks / LabBench2

PatentQA

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode    Score  Avg. dur  Tokens  Date
1   gpt-5-2-pro           tools,high  inject  0.909  4.3m      5.5M    2026-02-03
2   gpt-5-2               tools,high  inject  0.802  2.0m      5.5M    2026-02-03
3   claude-opus-4-6       tools,high  inject  0.570  3.6m      74.5M   2026-03-22
4   claude-opus-4-5       tools,high  inject  0.438  2.4m      42.4M   2026-03-22
5   gemini-3-pro-preview  tools,high  inject  0.438  3.2m      59.0k   2026-02-03
6   gemini-3-pro-preview  -           inject  0.215  1.1m      42.9k   2026-02-03
7   gpt-5-2-pro           -           inject  0.182  1.2m      123.4k  2026-02-03
8   claude-opus-4-6       -           inject  0.174  8.5s      39.3k   2026-03-20
9   gpt-5-2               -           inject  0.165  3.8s      20.1k   2026-02-03
10  claude-opus-4-5       -           inject  0.099  6.7s      34.8k   2026-03-20
