Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 13.2 |
Showing ranks 61 to 80 of 294 models, ordered by Programming score.
| Rank | Model | Model ID | Provider | Tags | Score (Programming) | Benchmarks | Inference | Agentic | Programming | Value | Price (USD per 1M tokens) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | o4-mini | o4-mini | OpenAI | multimodal, vision, multi-input reasoning | 32.7 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 in / $4.4 out |
| 62 | o3 | o3-2025-04-16 | OpenAI | multimodal, vision, multi-input reasoning | 30.7 | 46.2 | 38.4 | 20.5 | 30.7 | 27.7 | $2 in / $8 out |
| 63 | Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | Google | multimodal, vision, multi-input reasoning | 30.0 | 51.7 | 63.2 | 0.0 | 30.0 | 27.9 | $1.25 in / $10 out |
| 64 | DeepSeek-V3.1 | deepseek-v3.1 | DeepSeek | code, programming, tool use | 28.7 | 38.7 | 40.2 | 15.3 | 28.7 | 58.9 | $0.27 in / $1 out |
| 65 | Nemotron 3 Super (120B A12B) | nemotron-3-super-120b-a12b | NVIDIA | code, programming, tool use | 27.0 | 48.9 | 0.0 | 8.9 | 27.0 | 0.0 | N/A |
| 66 | GPT-5.4 Mini | gpt-5.4-mini | OpenAI | text, text-to-text, language | 26.9 | 57.4 | 77.4 | 27.1 | 26.9 | 32.8 | $0.75 in / $4.5 out |
| 67 | Qwen3.6-35B-A3B | qwen3.6-35b-a3b | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 26.6 | 55.7 | 0.0 | 17.7 | 26.6 | 0.0 | N/A |
| 68 | Gemini 2.5 Pro | gemini-2.5-pro | Google | multimodal, vision, multi-input reasoning | 25.6 | 44.6 | 63.2 | 0.0 | 25.6 | 27.9 | $1.25 in / $10 out |
| 69 | LongCat-Flash-Lite | longcat-flash-lite | Meituan | code, programming, tool use | 25.3 | 24.7 | 83.8 | 29.5 | 25.3 | 83.3 | $0.1 in / $0.4 out |
| 70 | Devstral Medium | devstral-medium-2507 | Mistral AI | code, programming, tool use | 24.7 | 0.0 | 64.5 | 0.0 | 24.7 | 53.2 | $0.4 in / $2 out |
| 71 | GPT-5 mini | gpt-5-mini-2025-08-07 | OpenAI | multimodal, vision, multi-input reasoning | 23.7 | 41.9 | 89.7 | 0.0 | 23.7 | 56.3 | $0.25 in / $2 out |
| 72 | Gemini 2.5 Flash | gemini-2.5-flash | Google | multimodal, vision, multi-input reasoning | 23.4 | 40.1 | 63.2 | 0.0 | 23.4 | 42.6 | $0.3 in / $2.5 out |
| 73 | Mercury 2 | mercury-2 | Inception | code, programming, tool use | 22.3 | 44.6 | 72.5 | 0.0 | 22.3 | 69.2 | $0.25 in / $0.75 out |
| 74 | LongCat-Flash-Thinking | longcat-flash-thinking | Meituan | code, programming, tool use | 22.1 | 50.4 | 0.0 | 0.0 | 22.1 | 0.0 | N/A |
| 75 | GLM-4.7-Flash | glm-4.7-flash | Zhipu AI | code, programming, tool use | 21.2 | 38.5 | 29.1 | 12.0 | 21.2 | 72.2 | $0.07 in / $0.4 out |
| 76 | GLM-4.5-Air | glm-4.5-air | Zhipu AI | code, programming, tool use | 20.2 | 28.1 | 0.0 | 24.9 | 20.2 | 0.0 | N/A |
| 77 | Kimi K2-Instruct-0905 | kimi-k2-instruct-0905 | Moonshot AI | code, programming, tool use | 19.6 | 24.9 | 0.0 | 6.6 | 19.6 | 0.0 | N/A |
| 78 | MiniMax M1 80K | minimax-m1-80k | MiniMax | code, programming, tool use | 19.4 | 24.6 | 84.9 | 20.9 | 19.4 | 41.7 | $0.55 in / $2.2 out |
| 79 | MiniMax M1 40K | minimax-m1-40k | MiniMax | code, programming, tool use | 18.5 | 22.9 | 0.0 | 26.8 | 18.5 | 0.0 | N/A |
| 80 | GPT-4.1 | gpt-4.1-2025-04-14 | OpenAI | multimodal, vision, multi-input reasoning | 17.7 | 28.8 | 75.4 | 32.8 | 17.7 | 34.6 | $2 in / $8 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.
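The Price column quotes input and output rates separately, so the cost of a real workload depends on its token mix. The sketch below shows that arithmetic, assuming the rates are USD per 1M tokens. `parse_price`, `request_cost`, and `blended_price` are hypothetical helpers, and the 3:1 input-to-output blend is an illustrative assumption, not the leaderboard's Value formula.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches price cells in the table's format, e.g. "$1.1 in / $4.4 out".
PRICE_RE = re.compile(r"\$(?P<inp>\d+(?:\.\d+)?) in / \$(?P<out>\d+(?:\.\d+)?) out")


@dataclass(frozen=True)
class Price:
    """Rates assumed to be USD per 1M tokens."""
    input_per_m: float
    output_per_m: float


def parse_price(cell: str) -> Optional[Price]:
    """Parse a Price cell; returns None for 'N/A' or empty cells."""
    m = PRICE_RE.fullmatch(cell.strip())
    if m is None:
        return None
    return Price(float(m["inp"]), float(m["out"]))


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, billing input and output tokens at their own rates."""
    return (input_tokens * price.input_per_m
            + output_tokens * price.output_per_m) / 1_000_000


def blended_price(price: Price, input_share: float = 3.0, output_share: float = 1.0) -> float:
    """Weighted-average USD per 1M tokens for an assumed input:output mix (default 3:1)."""
    return (input_share * price.input_per_m
            + output_share * price.output_per_m) / (input_share + output_share)


if __name__ == "__main__":
    # A few price cells copied from the table above.
    rows = {
        "o4-mini": "$1.1 in / $4.4 out",
        "GPT-4.1": "$2 in / $8 out",
        "GLM-4.7-Flash": "$0.07 in / $0.4 out",
        "GLM-4.5-Air": "N/A",
    }
    for name, cell in rows.items():
        price = parse_price(cell)
        if price is None:
            print(f"{name}: no published price")
            continue
        # Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
        cost = request_cost(price, 10_000, 2_000)
        print(f"{name}: ${cost:.4f} per request, ${blended_price(price):.3f}/1M tokens blended")
```

For example, 10,000 input and 2,000 output tokens on o4-mini come to (10,000 × 1.1 + 2,000 × 4.4) / 1,000,000 = $0.0198. Keeping the two rates separate matters because, for the priced models listed here, output tokens cost roughly 3x to 8x as much as input tokens, so an output-heavy workload can rank models differently than the input rate alone suggests.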