3DCodeBench

Leaderboard

Views: Combined pools all votes; Text and Image each filter to that modality. BT Elo is the Bradley-Terry maximum-likelihood estimate (refreshed every 30 min), shown with a 95% bootstrap confidence interval. Online K=4 and Online K=8 are live online Elo ratings at two K-factor settings.
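As a rough illustration of how the two online Elo columns can differ, the sketch below applies the standard Elo update to the same stream of votes at K=4 and K=8. The starting rating of 1000, the 400-point logistic scale, the vote format, and the model names are all assumptions made for this example; the page does not state how the leaderboard implements these.

```python
def expected_score(r_a, r_b, scale=400.0):
    """Standard Elo expected score of A against B (the scale is an assumption)."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / scale))


def elo_update(r_a, r_b, score_a, k):
    """Return updated (r_a, r_b) after one game.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a tie.
    """
    delta = k * (score_a - expected_score(r_a, r_b))
    return r_a + delta, r_b - delta


def run_online_elo(votes, k, initial=1000.0):
    """Replay votes in order; each vote is (model_a, model_b, score_a)."""
    ratings = {}
    for a, b, score_a in votes:
        r_a = ratings.get(a, initial)
        r_b = ratings.get(b, initial)
        ratings[a], ratings[b] = elo_update(r_a, r_b, score_a, k)
    return ratings


# Hypothetical votes, not taken from the leaderboard.
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-z", 0.5),
    ("model-x", "model-z", 1.0),
]

ratings_k4 = run_online_elo(votes, k=4)
ratings_k8 = run_online_elo(votes, k=8)
```

A larger K moves ratings further per vote, so a K=8 rating reacts faster to recent results but is noisier than a K=4 rating, which is one reason the two columns can disagree for models with few games.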

#   Model                  Model ID                       Released  BT Elo     Online K=4  Online K=8  Games  Code-runs
1   Fable 5                claude-fable-5                 2026-06   1197 ±96   1047.7      1086.0      49
2   Claude Opus 4.8        claude-opus-4-8                2026-05   1144 ±112  1025.0      1049.3      35
3   GPT-5.5                gpt-5.5                        2026-04   1133 ±27   1135.3      1141.7      639    100%
4   Gemini 3.1 Pro         gemini-3.1-pro-preview         2026-02   1116 ±24   1124.3      1122.7      742    100%
5   Gemini 3.5 Flash       gemini-3.5-flash               2026-05   1116 ±28   1137.8      1147.1      611    100%
6   GPT-5.4                gpt-5.4                        2026-03   1042 ±27   1047.7      1044.8      565    100%
7   Gemini 3 Flash         gemini-3-flash-preview         2025-12   1012 ±26   1032.3      1039.0      628    100%
8   Claude Sonnet 4.6      claude-sonnet-4-6              2026-02   985 ±27    990.9       983.9       593    100%
9   Claude Opus 4.7        claude-opus-4-7                2026-04   978 ±27    992.5       983.0       639    100%
10  Gemma 4 31B            gemma-4-31b-it                 2026-04   921 ±30    939.4       929.8       516
11  GPT-5.4 Mini           gpt-5.4-mini                   2026-03   920 ±28    931.3       926.7       583    100%
12  Gemini 3.1 Flash Lite  gemini-3.1-flash-lite-preview  2026-03   851 ±31    906.8       909.5       426    100%
13  Gemma 4 26B A4B        gemma-4-26b-a4b-it             2026-04   830 ±34    874.3       855.6       450
14  Claude Haiku 4.5       claude-haiku-4-5               2025-10   773 ±32    827.7       805.1       444    100%
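
The BT Elo column is described as a Bradley-Terry maximum-likelihood estimate. The sketch below shows one common way to compute such an estimate, the minorization-maximization (MM) iteration, and then map the fitted strengths to an Elo-like scale. It is not the leaderboard's actual code: the win-count input format, the 400-point scale, the centering at 1000, and the example counts are assumptions, ties are ignored, and the bootstrap confidence interval is omitted. It also assumes every model has at least one win, since a model with zero wins has no finite estimate.

```python
import math


def fit_bradley_terry(wins, n_iter=200):
    """Fit Bradley-Terry strengths with the MM iteration.

    wins[i][j] is the number of times model i beat model j.
    Returns a dict of positive strengths that sum to 1.
    """
    models = sorted(set(wins) | {j for row in wins.values() for j in row})
    p = {m: 1.0 / len(models) for m in models}
    for _ in range(n_iter):
        new_p = {}
        for i in models:
            total_wins = sum(wins.get(i, {}).values())
            denom = 0.0
            for j in models:
                if j == i:
                    continue
                # Number of games played between i and j, in either direction.
                n_ij = wins.get(i, {}).get(j, 0) + wins.get(j, {}).get(i, 0)
                if n_ij:
                    denom += n_ij / (p[i] + p[j])
            new_p[i] = total_wins / denom if denom > 0 else p[i]
        norm = sum(new_p.values())
        p = {m: v / norm for m, v in new_p.items()}
    return p


def to_elo(strengths, scale=400.0, center=1000.0):
    """Map strengths to an Elo-like scale whose mean is `center` (assumed)."""
    logs = {m: scale * math.log10(s) for m, s in strengths.items()}
    shift = center - sum(logs.values()) / len(logs)
    return {m: v + shift for m, v in logs.items()}


# Hypothetical win counts, not taken from the leaderboard.
example_wins = {
    "model-x": {"model-y": 3},
    "model-y": {"model-x": 1},
}
example_elo = to_elo(fit_bradley_terry(example_wins))
```

Unlike the online Elo columns, a fit of this kind uses all votes at once and does not depend on the order in which they arrived. A bootstrap interval could be obtained by resampling the votes with replacement and refitting, which would be consistent with the much wider intervals shown for models with few games.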