Edison Labs
Benchmarks / LabBench2

TrialQA

10 runs · 5 models · evaluated by HybridEvaluator.

#   Model                 Variant     Mode    Score  Avg. dur  Tokens  Date
1   gpt-5-2-pro           tools,high  inject  0.933  3.5m      4.1M    2026-02-03
2   gpt-5-2               tools,high  inject  0.908  1.3m      3.8M    2026-02-03
3   gemini-3-pro-preview  tools,high  inject  0.800  1.3m      39.5k   2026-02-03
4   claude-opus-4-6       tools,high  inject  0.492  1.6m      43.8M   2026-03-23
5   claude-opus-4-5       tools,high  inject  0.475  1.0m      27.3M   2026-03-22
6   claude-opus-4-6       -           inject  0.300  7.2s      33.6k   2026-03-20
7   gpt-5-2-pro           -           inject  0.267  1.1m      108.1k  2026-02-03
8   gemini-3-pro-preview  -           inject  0.242  1.1m      34.8k   2026-02-03
9   gpt-5-2               -           inject  0.225  3.0s      15.8k   2026-02-03
10  claude-opus-4-5       -           inject  0.167  6.9s      34.5k   2026-03-20
