Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
- **296** tracked models
- **27** providers
- **253** benchmarked
- **27.4** avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 201 | Grok-1.5V (`grok-1.5v`; multimodal, vision, multi-input reasoning) | xAI | 9.8 | 9.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 202 | Qwen3 VL 8B Instruct (`qwen3-vl-8b-instruct`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 9.8 | 9.8 | 66.0 | 26.7 | 0.0 | 75.3 | $0.08 in / $0.50 out |
| 203 | Mistral Large 3 (`mistral-large-3-2509`; multimodal, vision, multi-input reasoning) | Mistral AI | 9.6 | 9.6 | 18.8 | 0.0 | 0.0 | 29.1 | $2 in / $5 out |
| 204 | Qwen2.5 VL 7B Instruct (`qwen2.5-vl-7b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 9.6 | 9.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 205 | Gemma 4 E2B (`gemma-4-e2b-it`; multimodal, vision, multi-input reasoning) | Google | 9.5 | 9.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 206 | Qwen2-VL-72B-Instruct (`qwen2-vl-72b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 9.3 | 9.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 207 | Gemma 3 12B (`gemma-3-12b-it`; multimodal, vision, multi-input reasoning) | Google | 9.1 | 9.1 | 20.3 | 0.0 | 0.0 | 80.7 | $0.05 in / $0.10 out |
| 208 | Nova Micro (`nova-micro`; text, inference) | Amazon | 9.1 | 9.1 | 52.7 | 0.0 | 0.0 | 91.3 | $0.03 in / $0.14 out |
| 209 | Phi-4-multimodal-instruct (`phi-4-multimodal-instruct`; multimodal, vision, multi-input reasoning) | Microsoft | 8.8 | 8.8 | 12.3 | 0.0 | 0.0 | 79.8 | $0.05 in / $0.10 out |
| 210 | Grok-1.5 (`grok-1.5`; multimodal, vision, multi-input reasoning) | xAI | 8.6 | 8.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 211 | Phi-3.5-MoE-instruct (`phi-3.5-moe-instruct`; multimodal, vision, multi-input reasoning) | Microsoft | 8.2 | 8.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 212 | Jamba 1.5 Large (`jamba-1.5-large`; text, inference) | AI21 Labs | 8.1 | 8.1 | 33.6 | 0.0 | 0.0 | 25.2 | $2 in / $8 out |
| 213 | Pixtral-12B (`pixtral-12b-2409`; multimodal, vision, multi-input reasoning) | Mistral AI | 8.1 | 8.1 | 7.0 | 0.0 | 0.0 | 73.0 | $0.15 in / $0.15 out |
| 214 | Qwen2.5-Omni-7B (`qwen2.5-omni-7b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 7.6 | 7.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 215 | Qwen2.5 7B Instruct (`qwen-2.5-7b-instruct`; text, inference) | Alibaba Cloud / Qwen Team | 7.4 | 7.4 | 71.1 | 0.0 | 0.0 | 77.2 | $0.30 in / $0.30 out |
| 216 | Gemini Diffusion (`gemini-diffusion`; code, programming, tool use) | Google | 7.0 | 7.0 | 0.0 | 0.0 | 1.7 | 0.0 | N/A |
| 217 | DeepSeek VL2 (`deepseek-vl2`; multimodal, vision, multi-input reasoning) | DeepSeek | 6.9 | 6.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 218 | GPT-4 (`gpt-4-0613`; multimodal, vision, multi-input reasoning) | OpenAI | 6.8 | 6.8 | 54.9 | 0.0 | 0.0 | 18.7 | $30 in / $60 out |
| 219 | Mistral Small 3 24B Base (`mistral-small-24b-base-2501`; multimodal, vision, multi-input reasoning) | Mistral AI | 6.4 | 6.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 220 | DeepSeek R1 Distill Qwen 1.5B (`deepseek-r1-distill-qwen-1.5b`; text, inference) | DeepSeek | 6.1 | 6.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
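The Price column mixes `$X in / $Y out` strings with `N/A`, so any programmatic use of the table needs to normalize it first. A minimal parsing sketch (the `parse_price` helper is hypothetical, and the table does not state the pricing unit, so the returned floats carry whatever unit the leaderboard uses):

```python
import re

def parse_price(cell: str):
    """Parse a leaderboard price cell like '$0.05 in / $0.10 out'.

    Returns (input_price, output_price) as floats, or None for 'N/A'
    or empty cells. The unit is whatever the source table quotes;
    it is not stated in the page itself.
    """
    cell = cell.strip()
    if not cell or cell.upper() == "N/A":
        return None
    m = re.fullmatch(r"\$([\d.]+) in / \$([\d.]+) out", cell)
    if m is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    return float(m.group(1)), float(m.group(2))

print(parse_price("$30 in / $60 out"))  # GPT-4's row -> (30.0, 60.0)
print(parse_price("N/A"))               # -> None
```

Keeping `N/A` as `None` (rather than zero) matters downstream: a missing price is not a free model, and value metrics should skip it rather than divide by it.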
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
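The page does not publish the weighting behind its multi-dimensional ranking, but the idea can be sketched as a weighted mean over the five per-dimension columns in the table. The weights below are purely illustrative assumptions, not the leaderboard's actual formula:

```python
def composite_index(benchmarks, inference, agentic, programming, value,
                    weights=(0.4, 0.15, 0.15, 0.15, 0.15)):
    """Hypothetical composite index: weighted mean of the five
    per-dimension scores shown in the leaderboard table.

    The real weighting is not published; these weights are made up
    for illustration only.
    """
    dims = (benchmarks, inference, agentic, programming, value)
    return round(sum(w * d for w, d in zip(weights, dims)), 1)

# Nova Micro's row: 9.1 benchmarks, 52.7 inference,
# 0.0 agentic, 0.0 programming, 91.3 value
print(composite_index(9.1, 52.7, 0.0, 0.0, 91.3))
```

Under such a scheme, a cheap fast model with modest benchmark scores (like Nova Micro) can out-rank a stronger but pricier one once the value dimension is weighted in, which is consistent with the "may differ from individual third-party benchmarks" caveat above.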