Edison Labs
Benchmarks / LabBench2

LitQA3

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode    Score  Avg. duration  Tokens  Date
1   gpt-5-2-pro           tools,high  inject  0.851  3.4m           5.4M    2026-02-03
2   gpt-5-2               tools,high  inject  0.815  1.6m           5.4M    2026-02-03
3   claude-opus-4-6       tools,high  inject  0.756  1.1m           31.2M   2026-03-22
4   gemini-3-pro-preview  tools,high  inject  0.744  1.7m           61.0k   2026-02-03
5   claude-opus-4-5       tools,high  inject  0.732  40.0s          23.4M   2026-03-22
6   gpt-5-2               -           inject  0.196  4.6s           25.3k   2026-02-03
7   claude-opus-4-6       -           inject  0.185  7.4s           50.2k   2026-03-20
8   gemini-3-pro-preview  -           inject  0.185  1.3m           56.5k   2026-02-03
9   gpt-5-2-pro           -           inject  0.179  1.4m           180.2k  2026-02-03
10  claude-opus-4-5       -           inject  0.143  6.8s           51.9k   2026-03-20
