Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
296 tracked models · 27 providers · 253 benchmarked · avg. index 11.5
| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 81 | Grok 4 Fast | `grok-4-fast` | multimodal, vision, multi-input reasoning | xAI | 14.7 | 57.6 | 68.8 | 14.7 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 82 | o3-mini | `o3-mini` | code, programming, tool use | OpenAI | 11.9 | 25.6 | 70.4 | 11.9 | 12.2 | 41.6 | $1.1 in / $4.4 out |
| 83 | GLM-4.7-Flash | `glm-4.7-flash` | code, programming, tool use | Zhipu AI | 11.4 | 38.2 | 29.7 | 11.4 | 20.7 | 72.1 | $0.07 in / $0.4 out |
| 84 | GPT-5.4 nano | `gpt-5.4-nano` | multimodal, vision, multi-input reasoning | OpenAI | 9.7 | 45.6 | 76.5 | 9.7 | 10.0 | 57.1 | $0.2 in / $1.25 out |
| 85 | GPT-4.1 mini | `gpt-4.1-mini-2025-04-14` | multimodal, vision, multi-input reasoning | OpenAI | 8.9 | 20.7 | 90.6 | 8.9 | 2.6 | 56.8 | $0.4 in / $1.6 out |
| 86 | Nemotron 3 Super (120B A12B) | `nemotron-3-super-120b-a12b` | code, programming, tool use | NVIDIA | 8.7 | 48.3 | 0.0 | 8.7 | 26.8 | 0.0 | N/A |
| 87 | DeepSeek-V3.2-Speciale | `deepseek-v3.2-speciale` | code, programming, tool use | DeepSeek | 8.5 | 53.8 | 0.0 | 8.5 | 44.9 | 0.0 | N/A |
| 88 | Sarvam-30B | `sarvam-30b` | code, programming, tool use | Sarvam AI | 8.2 | 46.4 | 0.0 | 8.2 | 5.2 | 0.0 | N/A |
| 89 | Kimi K2-Instruct-0905 | `kimi-k2-instruct-0905` | code, programming, tool use | Moonshot AI | 6.6 | 24.4 | 0.0 | 6.6 | 19.3 | 0.0 | N/A |
| 90 | GPT OSS 20B | `gpt-oss-20b` | text, inference | OpenAI | 6.0 | 25.8 | 77.2 | 6.0 | 0.0 | 79.0 | $0.1 in / $0.5 out |
| 91 | Qwen2.5 VL 72B Instruct | `qwen2.5-vl-72b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 5.7 | 24.9 | 0.0 | 5.7 | 0.0 | 0.0 | N/A |
| 92 | Nemotron 3 Nano (30B A3B) | `nemotron-3-nano-30b-a3b` | code, programming, tool use | NVIDIA | 3.3 | 45.4 | 66.0 | 3.3 | 4.4 | 90.9 | $0.06 in / $0.24 out |
| 93 | Claude 3.5 Haiku | `claude-3-5-haiku-20241022` | code, programming, tool use | Anthropic | 3.0 | 10.8 | 30.5 | 3.0 | 7.8 | 31.8 | $0.8 in / $4 out |
| 94 | Qwen2.5 VL 32B Instruct | `qwen2.5-vl-32b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 1.6 | 21.2 | 0.0 | 1.6 | 0.0 | 0.0 | N/A |
| 95 | ChatGPT-4o Latest | `chatgpt-4o-latest` | multimodal, vision, multi-input reasoning | OpenAI | 0.0 | 56.0 | 63.8 | 0.0 | 0.0 | 32.0 | $2.5 in / $10 out |
| 96 | Claude 3.5 Sonnet | `claude-3-5-sonnet-20240620` | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 25.4 | 68.2 | 0.0 | 0.0 | 24.6 | $3 in / $15 out |
| 97 | Claude 3 Haiku | `claude-3-haiku-20240307` | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 5.8 | 61.8 | 0.0 | 0.0 | 57.9 | $0.25 in / $1.25 out |
| 98 | Claude 3 Opus | `claude-3-opus-20240229` | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 19.3 | 71.7 | 0.0 | 0.0 | 19.5 | $15 in / $75 out |
| 99 | Claude 3 Sonnet | `claude-3-sonnet-20240229` | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 10.0 | 30.5 | 0.0 | 0.0 | 13.3 | $3 in / $15 out |
| 100 | Codestral-22B | `codestral-22b` | text, inference | Mistral AI | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.