Leaderboard
Switch view: Combined pools all votes; Text and Image each filter to that modality. BT Elo is the Bradley-Terry maximum-likelihood estimate (MLE), refreshed every 30 minutes and shown with a 95% bootstrap confidence interval. The K=4 and K=8 columns are live online Elo ratings at two K-factor settings.
| # | Model | Released | BT Elo (±95% CI) | online K=4 | online K=8 | games | code-runs |
|---|---|---|---|---|---|---|---|
| 1 | Fable 5 `claude-fable-5` | 2026-06 | 1197 ±96 | 1047.7 | 1086.0 | 49 | – |
| 2 | Claude Opus 4.8 `claude-opus-4-8` | 2026-05 | 1144 ±112 | 1025.0 | 1049.3 | 35 | – |
| 3 | GPT-5.5 `gpt-5.5` | 2026-04 | 1133 ±27 | 1135.3 | 1141.7 | 639 | 100% |
| 4 | Gemini 3.1 Pro `gemini-3.1-pro-preview` | 2026-02 | 1116 ±24 | 1124.3 | 1122.7 | 742 | 100% |
| 5 | Gemini 3.5 Flash `gemini-3.5-flash` | 2026-05 | 1116 ±28 | 1137.8 | 1147.1 | 611 | 100% |
| 6 | GPT-5.4 `gpt-5.4` | 2026-03 | 1042 ±27 | 1047.7 | 1044.8 | 565 | 100% |
| 7 | Gemini 3 Flash `gemini-3-flash-preview` | 2025-12 | 1012 ±26 | 1032.3 | 1039.0 | 628 | 100% |
| 8 | Claude Sonnet 4.6 `claude-sonnet-4-6` | 2026-02 | 985 ±27 | 990.9 | 983.9 | 593 | 100% |
| 9 | Claude Opus 4.7 `claude-opus-4-7` | 2026-04 | 978 ±27 | 992.5 | 983.0 | 639 | 100% |
| 10 | Gemma 4 31B `gemma-4-31b-it` | 2026-04 | 921 ±30 | 939.4 | 929.8 | 516 | – |
| 11 | GPT-5.4 Mini `gpt-5.4-mini` | 2026-03 | 920 ±28 | 931.3 | 926.7 | 583 | 100% |
| 12 | Gemini 3.1 Flash Lite `gemini-3.1-flash-lite-preview` | 2026-03 | 851 ±31 | 906.8 | 909.5 | 426 | 100% |
| 13 | Gemma 4 26B A4B `gemma-4-26b-a4b-it` | 2026-04 | 830 ±34 | 874.3 | 855.6 | 450 | – |
| 14 | Claude Haiku 4.5 `claude-haiku-4-5` | 2025-10 | 773 ±32 | 827.7 | 805.1 | 444 | 100% |
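
The two rating families in the table (a batch Bradley-Terry fit and live online Elo at a fixed K-factor) can be sketched roughly as follows. This is a minimal illustration, not the leaderboard's actual code. The function names `online_elo` and `bradley_terry_elo`, the base rating of 1000, the 400-point logistic scale, the treatment of every vote as a decisive win/loss, and the MM iteration used for the fit are all assumptions. The source does not say how ties, anchoring, or the bootstrap are handled.

```python
import math
from collections import defaultdict

BASE = 1000.0   # assumed starting / mean rating (not stated in the source)
SCALE = 400.0   # conventional Elo logistic scale (assumed)


def expected_score(r_a, r_b):
    """Elo-style probability that a player rated r_a beats one rated r_b."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / SCALE))


def online_elo(games, k):
    """Sequential Elo over (winner, loser) pairs; order of games matters."""
    ratings = defaultdict(lambda: BASE)
    for winner, loser in games:
        # The winner gains k * (1 - expected score); the loser drops by the same.
        delta = k * (1.0 - expected_score(ratings[winner], ratings[loser]))
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def bradley_terry_elo(games, iters=500):
    """Bradley-Terry MLE via MM updates, reported on an Elo-like scale.

    Order of games does not matter. The true MLE diverges for a player with
    no wins, so this sketch floors strengths at a tiny positive value.
    """
    wins = defaultdict(float)
    pair_games = defaultdict(float)
    for winner, loser in games:
        wins[winner] += 1.0
        wins[loser] += 0.0  # make sure the loser is registered as a player
        pair_games[frozenset((winner, loser))] += 1.0
    players = list(wins)
    if not players:
        return {}
    strength = {p: 1.0 for p in players}
    for _ in range(iters):
        updated = {}
        for p in players:
            # Sum over opponents q of n_pq / (s_p + s_q).
            denom = sum(
                n / (strength[p] + strength[q])
                for pair, n in pair_games.items() if p in pair
                for q in pair - {p}
            )
            updated[p] = max(wins[p] / denom, 1e-12) if denom > 0 else strength[p]
        # Fix the scale: normalise so the geometric mean strength is 1.
        log_mean = sum(math.log(s) for s in updated.values()) / len(updated)
        strength = {p: s / math.exp(log_mean) for p, s in updated.items()}
    return {p: BASE + SCALE * math.log10(s) for p, s in strength.items()}


# Hypothetical toy data: A beats B three times, B beats A once.
games = [("A", "B")] * 3 + [("B", "A")]
print(online_elo(games, k=4))
print(online_elo(games, k=8))
print(bradley_terry_elo(games))
```

In a sketch like this, a small K moves ratings slowly, so a model with few games stays close to the starting value in the online columns, while the batch fit can already place it far from the mean but with little data to constrain it. That would be consistent with the wide intervals on the rows with 49 and 35 games, though the source does not confirm the mechanism.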