Every major AI model, ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
- **Tracked models:** 296
- **Providers:** 27
- **Benchmarked:** 253
- **Average index:** 32.1
| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 161 | Gemma 3 27B | gemma-3-27b-it | multimodal, vision, multi-input reasoning | Google | 20.3 | 10.7 | 20.3 | 0.0 | 0.0 | 73.9 | $0.1 / $0.2 |
| 162 | Gemma 3 4B | gemma-3-4b-it | multimodal, vision, multi-input reasoning | Google | 20.3 | 4.5 | 20.3 | 0.0 | 0.0 | 82.0 | $0.02 / $0.04 |
| 163 | Gemma 3n E4B Instructed | gemma-3n-e4b-it | multimodal, vision, multi-input reasoning | Google | 20.3 | 1.3 | 20.3 | 0.0 | 0.0 | 10.3 | $20 / $40 |
| 164 | o1 | o1-2024-12-17 | multimodal, vision, multi-input reasoning | OpenAI | 19.4 | 42.9 | 19.4 | 44.7 | 6.5 | 4.9 | $15 / $60 |
| 165 | ERNIE 4.5 | ernie-4.5 | text, inference | Baidu | 18.8 | 24.5 | 18.8 | 0.0 | 0.0 | 34.6 | $0.4 / $4 |
| 166 | Mistral Large 3 | mistral-large-3-2509 | multimodal, vision, multi-input reasoning | Mistral AI | 18.8 | 9.6 | 18.8 | 0.0 | 0.0 | 29.1 | $2 / $5 |
| 167 | DeepSeek R1 Distill Llama 70B | deepseek-r1-distill-llama-70b | text, inference | DeepSeek | 16.6 | 28.8 | 16.6 | 0.0 | 0.0 | 66.6 | $0.1 / $0.4 |
| 168 | DeepSeek R1 Distill Qwen 32B | deepseek-r1-distill-qwen-32b | text, inference | DeepSeek | 16.6 | 26.6 | 16.6 | 0.0 | 0.0 | 75.9 | $0.12 / $0.18 |
| 169 | Qwen2.5 72B Instruct | qwen-2.5-72b-instruct | text, inference | Alibaba Cloud / Qwen Team | 15.0 | 17.8 | 15.0 | 0.0 | 0.0 | 54.5 | $0.35 / $0.4 |
| 170 | DeepSeek-R1 | deepseek-r1 | text, inference | DeepSeek | 14.3 | 0.0 | 14.3 | 0.0 | 0.0 | 35.1 | $0.55 / $2.19 |
| 171 | DeepSeek-R1-0528 | deepseek-r1-0528 | code, programming, tool use | DeepSeek | 14.3 | 50.1 | 14.3 | 0.0 | 6.6 | 35.1 | $0.55 / $2.19 |
| 172 | Qwen3 32B | qwen3-32b | text, inference | Alibaba Cloud / Qwen Team | 13.3 | 21.4 | 13.3 | 0.0 | 0.0 | 69.8 | $0.1 / $0.3 |
| 173 | Phi-4-multimodal-instruct | phi-4-multimodal-instruct | multimodal, vision, multi-input reasoning | Microsoft | 12.3 | 8.8 | 12.3 | 0.0 | 0.0 | 79.8 | $0.05 / $0.1 |
| 174 | Llama 3.2 90B Instruct | llama-3.2-90b-instruct | multimodal, vision, multi-input reasoning | Meta | 11.3 | 16.3 | 11.3 | 0.0 | 0.0 | 54.9 | $0.35 / $0.4 |
| 175 | Phi-3.5-mini-instruct | phi-3.5-mini-instruct | multimodal, vision, multi-input reasoning | Microsoft | 10.8 | 2.7 | 10.8 | 0.0 | 0.0 | 77.2 | $0.1 / $0.1 |
| 176 | Phi 4 | phi-4 | text, inference | Microsoft | 9.0 | 15.6 | 9.0 | 0.0 | 0.0 | 77.2 | $0.07 / $0.14 |
| 177 | Ministral 8B Instruct | ministral-8b-instruct-2410 | text, inference | Mistral AI | 7.0 | 0.0 | 7.0 | 0.0 | 0.0 | 76.1 | $0.1 / $0.1 |
| 178 | Pixtral-12B | pixtral-12b-2409 | multimodal, vision, multi-input reasoning | Mistral AI | 7.0 | 8.1 | 7.0 | 0.0 | 0.0 | 73.0 | $0.15 / $0.15 |
| 179 | Pixtral Large | pixtral-large | multimodal, vision, multi-input reasoning | Mistral AI | 7.0 | 27.8 | 7.0 | 0.0 | 0.0 | 22.4 | $2 / $6 |
| 180 | Qwen3-Next-80B-A3B-Instruct | qwen3-next-80b-a3b-instruct | text, inference | Alibaba Cloud / Qwen Team | 6.1 | 29.5 | 6.1 | 17.9 | 0.0 | 51.9 | $0.15 / $1.5 |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
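As a minimal sketch of how the table's "$X in / $Y out" prices could be collapsed into a single comparable cost figure: the helper below computes a weighted average of input and output price. The 75/25 input/output token split is an illustrative assumption, not the leaderboard's actual cost-per-output methodology.

```python
# Hedged sketch: blending the table's per-token input/output prices into one
# number. The input_share default of 0.75 is an assumed workload mix, not a
# value taken from the leaderboard's methodology.

def blended_cost(price_in: float, price_out: float,
                 input_share: float = 0.75) -> float:
    """Weighted average of input and output price for an assumed token mix."""
    return price_in * input_share + price_out * (1.0 - input_share)

# (input, output) prices copied from rows in the table above.
models = {
    "DeepSeek-R1": (0.55, 2.19),
    "Qwen3 32B": (0.10, 0.30),
    "o1": (15.0, 60.0),
}

# Cheapest blended cost first.
for name, (p_in, p_out) in sorted(models.items(),
                                  key=lambda kv: blended_cost(*kv[1])):
    print(f"{name}: {blended_cost(p_in, p_out):.3f}")
```

Under this assumed mix, an output-heavy price like DeepSeek-R1's $0.55 / $2.19 blends to 0.96, while a flat price like Ministral 8B's $0.1 / $0.1 stays at 0.10; shifting `input_share` changes the ordering for models with asymmetric pricing.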