Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **34.7** avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 241 | DeepSeek R1 Distill Llama 8B (`deepseek-r1-distill-llama-8b`) · text, inference | DeepSeek | 17.8 | 17.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 242 | Kimi K2-Instruct-0905 (`kimi-k2-instruct-0905`) · code, programming, tool use | Moonshot AI | 17.1 | 24.4 | 0.0 | 6.6 | 19.3 | 0.0 | N/A |
| 243 | Claude 3 Sonnet (`claude-3-sonnet-20240229`) · multimodal, vision, multi-input reasoning | Anthropic | 16.7 | 10.0 | 30.5 | 0.0 | 0.0 | 13.3 | $3 in / $15 out |
| 244 | Llama 3.1 Nemotron Nano 8B V1 (`llama-3.1-nemotron-nano-8b-v1`) · text, inference | NVIDIA | 16.3 | 16.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 245 | Qwen2.5 VL 72B Instruct (`qwen2.5-vl-72b`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 16.0 | 24.9 | 0.0 | 5.7 | 0.0 | 0.0 | N/A |
| 246 | Mistral Large 3 (`mistral-large-3-2509`) · multimodal, vision, multi-input reasoning | Mistral AI | 16.0 | 9.6 | 18.8 | 0.0 | 0.0 | 29.1 | $2 in / $5 out |
| 247 | Mistral Small 3.1 24B Instruct (`mistral-small-3.1-24b-instruct-2503`) · multimodal, vision, multi-input reasoning | Mistral AI | 15.7 | 15.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 248 | Qwen2.5 14B Instruct (`qwen-2.5-14b-instruct`) · text, inference | Alibaba Cloud / Qwen Team | 14.6 | 14.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 249 | o3-pro (`o3-pro-2025-06-10`) · multimodal, vision, multi-input reasoning | OpenAI | 14.6 | 0.0 | 21.4 | 0.0 | 0.0 | 3.6 | $20 in / $80 out |
| 250 | Qwen3.5-2B (`qwen3.5-2b`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 14.4 | 14.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 251 | Claude 3.5 Haiku (`claude-3-5-haiku-20241022`) · code, programming, tool use | Anthropic | 13.5 | 10.8 | 30.5 | 3.0 | 7.8 | 31.8 | $0.8 in / $4 out |
| 252 | Qwen2.5 VL 32B Instruct (`qwen2.5-vl-32b`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 12.2 | 21.2 | 0.0 | 1.6 | 0.0 | 0.0 | N/A |
| 253 | Qwen2 72B Instruct (`qwen2-72b-instruct`) · text, inference | Alibaba Cloud / Qwen Team | 12.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 254 | Grok-1.5V (`grok-1.5v`) · multimodal, vision, multi-input reasoning | xAI | 9.8 | 9.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 255 | Gemma 4 E2B (`gemma-4-e2b-it`) · multimodal, vision, multi-input reasoning | Google | 9.5 | 9.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 256 | Qwen2-VL-72B-Instruct (`qwen2-vl-72b`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 9.3 | 9.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 257 | Grok-1.5 (`grok-1.5`) · multimodal, vision, multi-input reasoning | xAI | 8.6 | 8.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 258 | Gemma 3n E4B Instructed (`gemma-3n-e4b-it`) · multimodal, vision, multi-input reasoning | Google | 8.6 | 1.3 | 20.3 | 0.0 | 0.0 | 10.3 | $20 in / $40 out |
| 259 | Phi-3.5-MoE-instruct (`phi-3.5-moe-instruct`) · multimodal, vision, multi-input reasoning | Microsoft | 8.2 | 8.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 260 | Qwen2.5-Omni-7B (`qwen2.5-omni-7b`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 7.6 | 7.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
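The "Price" column lists separate input and output rates, so the cost of a real request depends on the prompt/completion split. A minimal sketch of that calculation, assuming the listed prices are USD per million tokens (the page does not state the unit explicitly):

```python
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Cost in USD for one request, given per-1M-token prices."""
    return (price_in * tokens_in + price_out * tokens_out) / 1_000_000

# Example: a 2,000-token prompt with a 500-token completion on
# Claude 3.5 Haiku ($0.8 in / $4 out):
cost = request_cost(0.8, 4.0, 2_000, 500)
print(f"${cost:.4f}")  # → $0.0036
```

Because output tokens are priced several times higher than input tokens for most models here, completion length usually dominates the bill.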
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
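A multi-dimensional score of this kind is typically a weighted mean over the per-axis scores. The sketch below illustrates the idea; the weights are placeholders, since the page does not publish its actual weighting, so the result will generally differ from the listed overall scores:

```python
# Illustrative weighted overall score over the leaderboard's five axes.
# WEIGHTS are assumed values for demonstration, not the site's formula.
AXES = ("benchmarks", "inference", "agentic", "programming", "value")
WEIGHTS = {"benchmarks": 0.4, "inference": 0.15, "agentic": 0.15,
           "programming": 0.2, "value": 0.1}  # assumed; sums to 1.0

def overall(scores: dict) -> float:
    """Weighted mean of per-axis scores; missing axes count as 0."""
    return sum(WEIGHTS[a] * scores.get(a, 0.0) for a in AXES)

# Claude 3.5 Haiku's row from the table (published overall: 13.5):
row = {"benchmarks": 10.8, "inference": 30.5, "agentic": 3.0,
       "programming": 7.8, "value": 31.8}
print(round(overall(row), 1))  # under these assumed weights, not 13.5
```

That the placeholder weights do not reproduce the published 13.5 shows how sensitive the ranking is to the weighting choice.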