Every major AI model, ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, updated continuously from published evaluation data.
296 tracked models · 27 providers · 253 benchmarked · 34.7 avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 181 | Nemotron 3 Super (120B A12B) `nemotron-3-super-120b-a12b` (code · programming · tool use) | NVIDIA | 29.1 | 48.3 | 0.0 | 8.7 | 26.8 | 0.0 | N/A |
| 182 | QwQ-32B `qwq-32b` (text · inference) | Alibaba Cloud / Qwen Team | 28.8 | 28.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 183 | Gemini 1.0 Pro `gemini-1.0-pro` (multimodal · vision · multi-input reasoning) | Google | 28.8 | 3.2 | 57.2 | 0.0 | 0.0 | 55.4 | $0.50 in / $1.50 out |
| 184 | Qwen3 VL 32B Instruct `qwen3-vl-32b-instruct` (multimodal · vision · multi-input reasoning) | Alibaba Cloud / Qwen Team | 28.7 | 29.4 | 0.0 | 27.9 | 0.0 | 0.0 | N/A |
| 185 | GPT-4.1 mini `gpt-4.1-mini-2025-04-14` (multimodal · vision · multi-input reasoning) | OpenAI | 28.7 | 20.7 | 90.6 | 8.9 | 2.6 | 56.8 | $0.40 in / $1.60 out |
| 186 | Mistral Small 3 24B Instruct `mistral-small-24b-instruct-2501` (text · inference) | Mistral AI | 28.6 | 14.2 | 21.4 | 0.0 | 0.0 | 80.7 | $0.07 in / $0.14 out |
| 187 | o3-mini `o3-mini` (code · programming · tool use) | OpenAI | 28.1 | 25.6 | 70.4 | 11.9 | 12.2 | 41.6 | $1.10 in / $4.40 out |
| 188 | Qwen3 32B `qwen3-32b` (text · inference) | Alibaba Cloud / Qwen Team | 28.0 | 21.4 | 13.3 | 0.0 | 0.0 | 69.8 | $0.10 in / $0.30 out |
| 189 | GPT-4 Turbo `gpt-4-turbo-2024-04-09` (text · inference) | OpenAI | 27.9 | 16.9 | 52.7 | 0.0 | 0.0 | 18.8 | $10.00 in / $30.00 out |
| 190 | o1 `o1-2024-12-17` (multimodal · vision · multi-input reasoning) | OpenAI | 27.8 | 42.9 | 19.4 | 44.7 | 6.5 | 4.9 | $15.00 in / $60.00 out |
| 191 | MiniCPM-SALA `minicpm-sala` (text · inference) | OpenBMB | 27.5 | 27.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 192 | Kimi K2 Instruct `kimi-k2-instruct` (code · programming · tool use) | Moonshot AI | 27.3 | 24.4 | 46.1 | 14.8 | 15.3 | 62.1 | $0.50 in / $0.50 out |
| 193 | GPT-4.5 `gpt-4.5` (multimodal · vision · multi-input reasoning) | OpenAI | 27.1 | 41.9 | 29.7 | 35.8 | 6.0 | 7.0 | $75.00 in / $150.00 out |
| 194 | Kimi K2 Base `kimi-k2-base` (text · inference) | Moonshot AI | 26.9 | 26.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 195 | o1-preview `o1-preview` (code · programming · tool use) | OpenAI | 26.6 | 41.8 | 33.0 | 0.0 | 9.5 | 11.8 | $15.00 in / $60.00 out |
| 196 | Sarvam-105B `sarvam-105b` (code · programming · tool use) | Sarvam AI | 25.7 | 42.9 | 0.0 | 17.9 | 12.1 | 0.0 | N/A |
| 197 | Gemma 3 12B `gemma-3-12b-it` (multimodal · vision · multi-input reasoning) | Google | 25.7 | 9.1 | 20.3 | 0.0 | 0.0 | 80.7 | $0.05 in / $0.10 out |
| 198 | Llama 3.1 70B Instruct `llama-3.1-70b-instruct` (text · inference) | Meta | 25.5 | 11.2 | 21.4 | 0.0 | 0.0 | 72.2 | $0.20 in / $0.20 out |
| 199 | Gemma 3 27B `gemma-3-27b-it` (multimodal · vision · multi-input reasoning) | Google | 25.3 | 10.7 | 20.3 | 0.0 | 0.0 | 73.9 | $0.10 in / $0.20 out |
| 200 | Phi 4 `phi-4` (text · inference) | Microsoft | 25.1 | 15.6 | 9.0 | 0.0 | 0.0 | 77.2 | $0.07 in / $0.14 out |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
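A composite index like the one in the Score column is typically a weighted mean over the per-dimension scores. As a minimal sketch only: the dimension names follow the table's columns, but the weights below are illustrative assumptions, not the leaderboard's actual methodology.

```python
# Hypothetical composite-index calculation: a weighted mean over the five
# scored dimensions (each on a 0-100 scale). The weights are assumed for
# illustration and do not reflect the leaderboard's real formula.

DIMENSIONS = ("benchmarks", "inference", "agentic", "programming", "value")
WEIGHTS = {
    "benchmarks": 0.40,   # assumed: benchmark quality weighted heaviest
    "inference": 0.15,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.15,        # weights sum to 1.0
}

def overall_index(scores: dict[str, float]) -> float:
    """Return the weighted average of per-dimension scores, rounded to 0.1."""
    total = sum(WEIGHTS[d] * scores.get(d, 0.0) for d in DIMENSIONS)
    return round(total, 1)

# Dimension scores for row 181 (Nemotron 3 Super) fed through the sketch;
# the result will differ from the table's 29.1 because the weights are guesses.
example = {"benchmarks": 48.3, "inference": 0.0, "agentic": 8.7,
           "programming": 26.8, "value": 0.0}
print(overall_index(example))
```

With these assumed weights the example model scores 24.6 rather than the published 29.1, which is expected: the real weighting (and any normalization of missing dimensions) is not disclosed.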