Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 309 | 27 | 264 | 11.8 |

| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 81 | DeepSeek-V3.2 (Thinking) | deepseek-reasoner | code, programming, tool use | DeepSeek | 15.0 | 51.8 | 0.0 | 15.0 | 43.5 | 0.0 | N/A |
| 82 | DeepSeek-V3.2 | deepseek-v3.2 | code, programming, tool use | DeepSeek | 15.0 | 56.7 | 0.0 | 15.0 | 43.5 | 0.0 | N/A |
| 83 | GPT-4o | gpt-4o-2024-08-06 | multimodal, vision, multi-input reasoning | OpenAI | 14.9 | 30.4 | 39.4 | 14.9 | 4.0 | 26.8 | $2.5 in / $10 out |
| 84 | DeepSeek-V3.1 | deepseek-v3.1 | code, programming, tool use | DeepSeek | 14.0 | 37.5 | 0.0 | 14.0 | 26.4 | 0.0 | N/A |
| 85 | Grok 4 Fast | grok-4-fast | multimodal, vision, multi-input reasoning | xAI | 13.7 | 57.1 | 62.1 | 13.7 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 86 | Kimi K2 Instruct | kimi-k2-instruct | code, programming, tool use | Moonshot AI | 13.5 | 23.8 | 0.0 | 13.5 | 14.0 | 0.0 | N/A |
| 87 | Qwen3.6-35B-A3B | qwen3.6-35b-a3b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 13.5 | 54.2 | 0.0 | 13.5 | 25.1 | 0.0 | N/A |
| 88 | Nova 2 Lite | nova-2-lite | multimodal, vision, multi-input reasoning | Amazon | 13.0 | 42.8 | 72.2 | 13.0 | 27.0 | 50.0 | $0.3 in / $2.5 out |
| 89 | o3-mini | o3-mini | code, programming, tool use | OpenAI | 11.9 | 26.8 | 0.0 | 11.9 | 12.9 | 0.0 | N/A |
| 90 | GLM-4.7-Flash | glm-4.7-flash | code, programming, tool use | Zhipu AI | 10.7 | 37.1 | 0.0 | 10.7 | 19.0 | 0.0 | N/A |
| 91 | GPT-5.4 nano | gpt-5.4-nano | multimodal, vision, multi-input reasoning | OpenAI | 10.4 | 46.0 | 56.3 | 10.4 | 10.7 | 70.9 | $0.2 in / $1.25 out |
| 92 | GPT-4.1 mini | gpt-4.1-mini-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 8.9 | 20.2 | 87.8 | 8.9 | 2.4 | 65.6 | $0.4 in / $1.6 out |
| 93 | Nemotron 3 Super (120B A12B) | nemotron-3-super-120b-a12b | code, programming, tool use | NVIDIA | 8.1 | 47.3 | 0.0 | 8.1 | 24.5 | 0.0 | N/A |
| 94 | DeepSeek-V3.2-Speciale | deepseek-v3.2-speciale | code, programming, tool use | DeepSeek | 7.6 | 53.0 | 0.0 | 7.6 | 43.5 | 0.0 | N/A |
| 95 | Sarvam-30B | sarvam-30b | code, programming, tool use | Sarvam AI | 7.6 | 45.7 | 0.0 | 7.6 | 4.7 | 0.0 | N/A |
| 96 | GPT OSS 20B | gpt-oss-20b | text, inference | OpenAI | 6.0 | 24.8 | 0.0 | 6.0 | 0.0 | 0.0 | N/A |
| 97 | Kimi K2-Instruct-0905 | kimi-k2-instruct-0905 | code, programming, tool use | Moonshot AI | 6.0 | 23.8 | 0.0 | 6.0 | 18.1 | 0.0 | N/A |
| 98 | Qwen2.5 VL 72B Instruct | qwen2.5-vl-72b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 5.6 | 24.1 | 0.0 | 5.6 | 0.0 | 0.0 | N/A |
| 99 | Claude 3.5 Haiku | claude-3-5-haiku-20241022 | code, programming, tool use | Anthropic | 3.0 | 10.5 | 0.0 | 3.0 | 7.1 | 0.0 | N/A |
| 100 | Nemotron 3 Nano (30B A3B) | nemotron-3-nano-30b-a3b | code, programming, tool use | NVIDIA | 3.0 | 44.5 | 41.1 | 3.0 | 4.0 | 100.0 | $0.06 in / $0.24 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
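
Prices in this listing appear as strings such as `$0.3 in / $2.5 out`, with `N/A` where no price is listed. For anyone wanting to sort or filter rows by cost, a minimal Python sketch of turning those strings into numbers might look like the following. The function name `parse_price` and the sample dictionary are illustrative assumptions, not part of the leaderboard, and the page does not state the pricing unit, so the sketch returns the raw numbers without converting them.

```python
import re
from typing import Optional, Tuple

# Matches price strings shaped like "$0.3 in / $2.5 out" as they appear in the listing.
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for "N/A" and unrecognized cells.

    Hypothetical helper: the unit is not stated on the page, so no conversion is done.
    """
    match = _PRICE_RE.match(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# A few rows copied from the listing above.
rows = {
    "Grok 4 Fast": "$0.2 in / $0.5 out",
    "Nova 2 Lite": "$0.3 in / $2.5 out",
    "o3-mini": "N/A",
}
priced = {name: parse_price(cell) for name, cell in rows.items()}
```

Returning `None` for unpriced rows keeps them distinguishable from a genuine price of zero, which matters when comparing models on cost.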