Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **34.7** avg. index
| Rank | Model | Provider | Overall | Benchmarks | Inference | Agentic | Programming | Value | Price (input / output) |
|---|---|---|---|---|---|---|---|---|---|
| 221 | DeepSeek-R1 (`deepseek-r1`; text, inference) | DeepSeek | 22.3 | 0.0 | 14.3 | 0.0 | 0.0 | 35.1 | $0.55 in / $2.19 out |
| 222 | Magistral Medium (`magistral-medium`; multimodal, vision, multi-input reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 223 | Mistral Large 3 (675B Base) (`mistral-large-3-675b-base-2512`; multimodal, vision, multi-input reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 224 | Mistral Large 3 (675B Instruct 2512 Eagle) (`mistral-large-3-675B-instruct-2512-eagle`; multimodal, vision, multi-input reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 225 | Mistral Large 3 (675B Instruct 2512 NVFP4) (`mistral-large-3-675b-instruct-2512-nvfp4`; multimodal, vision, multi-input reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 226 | Llama 3.2 90B Instruct (`llama-3.2-90b-instruct`; multimodal, vision, multi-input reasoning) | Meta | 22.0 | 16.3 | 11.3 | 0.0 | 0.0 | 54.9 | $0.35 in / $0.40 out |
| 227 | Sarvam-30B (`sarvam-30b`; code, programming, tool use) | Sarvam AI | 21.7 | 46.4 | 0.0 | 8.2 | 5.2 | 0.0 | N/A |
| 228 | Phi 4 Mini Reasoning (`phi-4-mini-reasoning`; text, inference) | Microsoft | 21.7 | 21.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 229 | DeepSeek-R1-0528 (`deepseek-r1-0528`; code, programming, tool use) | DeepSeek | 21.3 | 50.1 | 14.3 | 0.0 | 6.6 | 35.1 | $0.55 in / $2.19 out |
| 230 | GPT-3.5 Turbo (`gpt-3.5-turbo-0125`; multimodal, vision, multi-input reasoning) | OpenAI | 21.3 | 2.5 | 36.7 | 0.0 | 0.0 | 49.4 | $0.50 in / $1.50 out |
| 231 | Mistral Small (`mistral-small-2409`; text, inference) | Mistral AI | 21.3 | 0.0 | 2.1 | 0.0 | 0.0 | 51.9 | $0.20 in / $0.60 out |
| 232 | Pixtral Large (`pixtral-large`; multimodal, vision, multi-input reasoning) | Mistral AI | 20.6 | 27.8 | 7.0 | 0.0 | 0.0 | 22.4 | $2 in / $6 out |
| 233 | GPT-5 nano (`gpt-5-nano-2025-08-07`; multimodal, vision, multi-input reasoning) | OpenAI | 19.9 | 26.3 | 0.0 | 0.0 | 11.8 | 0.0 | N/A |
| 234 | Pixtral-12B (`pixtral-12b-2409`; multimodal, vision, multi-input reasoning) | Mistral AI | 19.8 | 8.1 | 7.0 | 0.0 | 0.0 | 73.0 | $0.15 in / $0.15 out |
| 235 | Gemma 4 E4B (`gemma-4-e4b-it`; multimodal, vision, multi-input reasoning) | Google | 19.2 | 19.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 236 | Mistral Small 3.2 24B Instruct (`mistral-small-3.2-24b-instruct-2506`; multimodal, vision, multi-input reasoning) | Mistral AI | 19.1 | 19.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 237 | Phi-3.5-mini-instruct (`phi-3.5-mini-instruct`; multimodal, vision, multi-input reasoning) | Microsoft | 18.9 | 2.7 | 10.8 | 0.0 | 0.0 | 77.2 | $0.10 in / $0.10 out |
| 238 | Jamba 1.5 Large (`jamba-1.5-large`; text, inference) | AI21 Labs | 18.8 | 8.1 | 33.6 | 0.0 | 0.0 | 25.2 | $2 in / $8 out |
| 239 | Qwen2.5 32B Instruct (`qwen-2.5-32b-instruct`; text, inference) | Alibaba Cloud / Qwen Team | 18.6 | 18.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 240 | DeepSeek R1 Distill Qwen 7B (`deepseek-r1-distill-qwen-7b`; text, inference) | DeepSeek | 18.3 | 18.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
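To turn the per-token prices in the last column into a per-request cost, a small helper is useful. This is a minimal sketch that assumes the listed "in / out" prices are quoted per 1M tokens, a common industry convention but not stated on this page; the function name and token counts are illustrative.

```python
def request_cost(price_in: float, price_out: float,
                 input_tokens: int, output_tokens: int) -> float:
    """Blended dollar cost for one request, assuming prices are per 1M tokens."""
    return (price_in * input_tokens + price_out * output_tokens) / 1_000_000

# Example: DeepSeek-R1 at $0.55 in / $2.19 out, for a
# 2,000-token prompt and a 500-token completion.
cost = request_cost(0.55, 2.19, input_tokens=2_000, output_tokens=500)
print(f"${cost:.6f}")  # $0.002195
```

Note how output pricing dominates for long completions: at these rates, 500 output tokens cost almost as much as 2,000 input tokens.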
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
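One way such a multi-dimensional score could be formed is a weighted mean over the five sub-scores. The sketch below is illustrative only: the leaderboard's actual weights and aggregation method are not published here, so these hypothetical weights will not reproduce the table's overall numbers.

```python
# Hypothetical weights -- NOT the leaderboard's actual scheme.
WEIGHTS = {
    "benchmarks": 0.35,
    "inference": 0.20,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.15,
}

def composite_index(scores: dict) -> float:
    """Weighted mean of the sub-scores; missing dimensions count as 0.0."""
    return sum(w * scores.get(k, 0.0) for k, w in WEIGHTS.items())

# Llama 3.2 90B Instruct's row as example input:
row = {"benchmarks": 16.3, "inference": 11.3, "value": 54.9}
print(round(composite_index(row), 1))  # 16.2 (vs. the published 22.0)
```

The gap between 16.2 and the published 22.0 is the point: without the real weights (or whatever normalization the site applies), sub-scores alone cannot reconstruct the overall index.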