Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
- **296** tracked models
- **27** providers
- **253** benchmarked
- **27.4** average index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 161 | Magistral Medium (`magistral-medium`, multimodal, vision, multi-input, reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 162 | Mistral Large 3 (675B Base) (`mistral-large-3-675b-base-2512`, multimodal, vision, multi-input, reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 163 | Mistral Large 3 (675B Instruct 2512 Eagle) (`mistral-large-3-675B-instruct-2512-eagle`, multimodal, vision, multi-input, reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 164 | Mistral Large 3 (675B Instruct 2512 NVFP4) (`mistral-large-3-675b-instruct-2512-nvfp4`, multimodal, vision, multi-input, reasoning) | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 165 | Mistral Large 3 (675B Instruct 2512) (`mistral-large-latest`, multimodal, vision, multi-input, reasoning) | Mistral AI | 22.2 | 22.2 | 40.1 | 0.0 | 0.0 | 44.5 | $0.5 in / $1.5 out |
| 166 | Ministral 3 (3B Reasoning 2512) (`ministral-3b-latest`, multimodal, vision, multi-input, reasoning) | Mistral AI | 22.0 | 22.0 | 79.6 | 0.0 | 0.0 | 95.8 | $0.1 in / $0.1 out |
| 167 | Phi 4 Mini Reasoning (`phi-4-mini-reasoning`, text, inference) | Microsoft | 21.7 | 21.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 168 | Gemini 2.5 Flash-Lite (`gemini-2.5-flash-lite`, multimodal, vision, multi-input, reasoning) | Google | 21.4 | 21.4 | 32.8 | 0.0 | 3.5 | 64.1 | $0.1 in / $0.4 out |
| 169 | Qwen3 32B (`qwen3-32b`, text, inference) | Alibaba Cloud / Qwen Team | 21.4 | 21.4 | 13.3 | 0.0 | 0.0 | 69.8 | $0.1 in / $0.3 out |
| 170 | Qwen2.5 VL 32B Instruct (`qwen2.5-vl-32b`, multimodal, vision, multi-input, reasoning) | Alibaba Cloud / Qwen Team | 21.2 | 21.2 | 0.0 | 1.6 | 0.0 | 0.0 | N/A |
| 171 | GPT-4.1 mini (`gpt-4.1-mini-2025-04-14`, multimodal, vision, multi-input, reasoning) | OpenAI | 20.7 | 20.7 | 90.6 | 8.9 | 2.6 | 56.8 | $0.4 in / $1.6 out |
| 172 | Llama 3.1 405B Instruct (`llama-3.1-405b-instruct`, text, inference) | Meta | 20.0 | 20.0 | 21.4 | 0.0 | 0.0 | 44.5 | $0.89 in / $0.89 out |
| 173 | Nova Pro (`nova-pro`, multimodal, vision, multi-input, reasoning) | Amazon | 20.0 | 20.0 | 70.5 | 0.0 | 0.0 | 43.2 | $0.8 in / $3.2 out |
| 174 | Llama 3.3 70B Instruct (`llama-3.3-70b-instruct`, text, inference) | Meta | 19.6 | 19.6 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
| 175 | Qwen3 VL 4B Instruct (`qwen3-vl-4b-instruct`, multimodal, vision, multi-input, reasoning) | Alibaba Cloud / Qwen Team | 19.6 | 19.6 | 66.0 | 19.5 | 0.0 | 70.3 | $0.1 in / $0.6 out |
| 176 | Claude 3 Opus (`claude-3-opus-20240229`, multimodal, vision, multi-input, reasoning) | Anthropic | 19.3 | 19.3 | 71.7 | 0.0 | 0.0 | 19.5 | $15 in / $75 out |
| 177 | Gemma 4 E4B (`gemma-4-e4b-it`, multimodal, vision, multi-input, reasoning) | Google | 19.2 | 19.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 178 | Mistral Small 3.2 24B Instruct (`mistral-small-3.2-24b-instruct-2506`, multimodal, vision, multi-input, reasoning) | Mistral AI | 19.1 | 19.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 179 | Qwen2.5 32B Instruct (`qwen-2.5-32b-instruct`, text, inference) | Alibaba Cloud / Qwen Team | 18.6 | 18.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 180 | DeepSeek R1 Distill Qwen 7B (`deepseek-r1-distill-qwen-7b`, text, inference) | DeepSeek | 18.3 | 18.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
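The Price column lists separate input and output rates, which is why output-heavy workloads can cost far more than the input rate alone suggests. As a rough sketch, assuming the common convention that these prices are quoted per million tokens (the table does not state the unit explicitly), a per-request cost estimate looks like:

```python
def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Estimated request cost in USD.

    Assumes in_price/out_price are quoted per 1M tokens, a common
    industry convention; the table above does not state the unit.
    """
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

# GPT-4.1 mini at $0.4 in / $1.6 out: a 10k-token prompt with a
# 2k-token reply costs well under a cent.
cost = request_cost(10_000, 2_000, 0.4, 1.6)
```

At these rates the asymmetry matters: at Claude 3 Opus prices ($15 in / $75 out), the same request would cost roughly 500 times more.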
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
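The weighting behind the multi-dimensional evaluation is not published, so the exact formula cannot be reproduced here. Purely as an illustration of how such a composite might be assembled from the table's five sub-scores, with entirely hypothetical weights that do not reproduce the published Score column:

```python
# Hypothetical weights over the five 0-100 sub-scores shown in the
# table. The leaderboard's real weighting is not published, and these
# values are illustrative only.
WEIGHTS = {
    "benchmarks": 0.40,
    "inference": 0.20,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.10,
}

def composite(scores: dict) -> float:
    """Weighted average of the sub-scores under the hypothetical weights."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# GPT-4.1 mini's sub-scores from the table above
gpt41_mini = {"benchmarks": 20.7, "inference": 90.6,
              "agentic": 8.9, "programming": 2.6, "value": 56.8}
score = composite(gpt41_mini)
```

Note that any weighting that rewards inference speed or value would rank GPT-4.1 mini well above its benchmark-only score, which suggests the published Score leans heavily on the Benchmarks dimension.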