Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **34.7** avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|
| 161 | GPT-4.1 nano `gpt-4.1-nano-2025-04-14` (multimodal, vision, multi-input reasoning) | OpenAI | 34.2 | 12.5 | 93.4 | 0.0 | 0.0 | 82.7 | $0.1 / $0.4 |
| 162 | Nemotron 3 Nano (30B A3B) `nemotron-3-nano-30b-a3b` (code, programming, tool use) | NVIDIA | 34.1 | 45.4 | 66.0 | 3.3 | 4.4 | 90.9 | $0.06 / $0.24 |
| 163 | Qwen3.6-35B-A3B `qwen3.6-35b-a3b` (multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 33.7 | 55.3 | 0.0 | 15.5 | 26.0 | 0.0 | N/A |
| 164 | MiniMax M1 80K `minimax-m1-80k` (code, programming, tool use) | MiniMax | 33.6 | 24.2 | 84.0 | 20.9 | 19.0 | 41.8 | $0.55 / $2.2 |
| 165 | Ministral 8B Instruct `ministral-8b-instruct-2410` (text, inference) | Mistral AI | 33.6 | 0.0 | 7.0 | 0.0 | 0.0 | 76.1 | $0.1 / $0.1 |
| 166 | o3 `o3-2025-04-16` (multimodal, vision, multi-input reasoning) | OpenAI | 33.2 | 46.0 | 38.9 | 19.6 | 30.2 | 27.7 | $2 / $8 |
| 167 | DeepSeek-V3 `deepseek-v3` (code, programming, tool use) | DeepSeek | 33.2 | 27.3 | 58.0 | 0.0 | 10.4 | 60.5 | $0.27 / $1.1 |
| 168 | DeepSeek-V3.1 `deepseek-v3.1` (code, programming, tool use) | DeepSeek | 32.9 | 38.4 | 39.8 | 15.2 | 28.3 | 58.8 | $0.27 / $1 |
| 169 | DeepSeek R1 Distill Qwen 32B `deepseek-r1-distill-qwen-32b` (text, inference) | DeepSeek | 32.7 | 26.6 | 16.6 | 0.0 | 0.0 | 75.9 | $0.12 / $0.18 |
| 170 | DeepSeek-V2.5 `deepseek-v2.5` (code, programming, tool use) | DeepSeek | 32.5 | 0.0 | 46.5 | 0.0 | 0.9 | 79.7 | $0.14 / $0.28 |
| 171 | DeepSeek R1 Distill Llama 70B `deepseek-r1-distill-llama-70b` (text, inference) | DeepSeek | 32.2 | 28.8 | 16.6 | 0.0 | 0.0 | 66.6 | $0.1 / $0.4 |
| 172 | Qwen3.5-4B `qwen3.5-4b` (multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 32.1 | 32.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 173 | Claude 3 Haiku `claude-3-haiku-20240307` (multimodal, vision, multi-input reasoning) | Anthropic | 32.0 | 5.8 | 61.8 | 0.0 | 0.0 | 57.9 | $0.25 / $1.25 |
| 174 | Mistral Large 3 (675B Instruct 2512) `mistral-large-latest` (multimodal, vision, multi-input reasoning) | Mistral AI | 31.6 | 22.2 | 40.1 | 0.0 | 0.0 | 44.5 | $0.5 / $1.5 |
| 175 | Phi 4 Reasoning Plus `phi-4-reasoning-plus` (text, inference) | Microsoft | 31.5 | 31.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 176 | Hermes 3 70B `hermes-3-70b` (text, inference) | Nous Research | 30.1 | 30.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 177 | Grok-2 `grok-2` (multimodal, vision, multi-input reasoning) | xAI | 30.1 | 27.1 | 38.3 | 0.0 | 0.0 | 25.4 | $2 / $10 |
| 178 | GLM-4.7-Flash `glm-4.7-flash` (code, programming, tool use) | Zhipu AI | 29.9 | 38.2 | 29.7 | 11.4 | 20.7 | 72.1 | $0.07 / $0.4 |
| 179 | GPT-4o `gpt-4o-2024-05-13` (multimodal, vision, multi-input reasoning) | OpenAI | 29.9 | 22.3 | 45.4 | 0.0 | 0.0 | 26.5 | $2.5 / $10 |
| 180 | Llama 3.3 70B Instruct `llama-3.3-70b-instruct` (text, inference) | Meta | 29.9 | 19.6 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 / $0.2 |
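The price columns make per-request cost comparisons straightforward. A minimal sketch, assuming (as is conventional for these listings, though not stated here) that prices are quoted per million tokens:

```python
def request_cost(price_in, price_out, tokens_in, tokens_out):
    """Dollar cost of one request, assuming prices are per million tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Compare GPT-4.1 nano ($0.1 in / $0.4 out) against o3 ($2 in / $8 out)
# on a 2,000-token prompt with a 500-token completion.
nano = request_cost(0.1, 0.4, 2000, 500)
o3 = request_cost(2.0, 8.0, 2000, 500)
print(f"GPT-4.1 nano: ${nano:.4f}  o3: ${o3:.4f}")
```

Under that assumption, o3 costs 20x more per request here, which is why the "Value" column can diverge sharply from the "Benchmarks" column.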
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
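One way to read "multi-dimensional evaluation" is as a weighted mean of the five sub-scores. The actual weighting is not published, so the equal weights below are purely a hypothetical illustration:

```python
def overall(benchmarks, inference, agentic, programming, value,
            weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    """Hypothetical composite index: weighted mean of the five sub-scores.

    The real leaderboard weighting is not published; equal weights
    are an assumption for illustration only.
    """
    dims = (benchmarks, inference, agentic, programming, value)
    return round(sum(w * d for w, d in zip(weights, dims)), 1)

# o3's sub-scores from the table: 46.0, 38.9, 19.6, 30.2, 27.7.
# Equal weights give 32.5 -- close to, but not exactly, the listed
# overall of 33.2, which is why these weights are labeled hypothetical.
print(overall(46.0, 38.9, 19.6, 30.2, 27.7))
```

The gap between this naive mean and the published index suggests the real formula weights dimensions unequally or normalizes them differently.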