Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
Tracked models: 296 · Providers: 27 · Benchmarked: 253 · Avg. index: 32.1
| Rank | Model | Model ID | Tags | Provider | Score (Inference) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 181 | Qwen3-Next-80B-A3B-Thinking | `qwen3-next-80b-a3b-thinking` | text, inference | Alibaba Cloud / Qwen Team | 6.1 | 44.7 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 182 | Mistral Small | `mistral-small-2409` | text, inference | Mistral AI | 2.1 | 0.0 | 2.1 | 0.0 | 0.0 | 51.9 | $0.2 in / $0.6 out |
| 183 | Claude Mythos Preview | `claude-mythos-preview` | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 80.0 | 0.0 | 70.1 | 84.2 | 1.6 | $25 in / $125 out |
| 184 | Claude Opus 4 | `claude-opus-4-20250514` | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 37.6 | 0.0 | 57.9 | 48.9 | 0.0 | N/A |
| 185 | Claude Sonnet 4 | `claude-sonnet-4-20250514` | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 40.9 | 0.0 | 49.4 | 44.3 | 0.0 | N/A |
| 186 | Codestral-22B | `codestral-22b` | text, inference | Mistral AI | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 187 | DeepSeek R1 Distill Llama 8B | `deepseek-r1-distill-llama-8b` | text, inference | DeepSeek | 0.0 | 17.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 188 | DeepSeek R1 Distill Qwen 14B | `deepseek-r1-distill-qwen-14b` | text, inference | DeepSeek | 0.0 | 24.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 189 | DeepSeek R1 Distill Qwen 1.5B | `deepseek-r1-distill-qwen-1.5b` | text, inference | DeepSeek | 0.0 | 6.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 190 | DeepSeek R1 Distill Qwen 7B | `deepseek-r1-distill-qwen-7b` | text, inference | DeepSeek | 0.0 | 18.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 191 | DeepSeek R1 Zero | `deepseek-r1-zero` | text, inference | DeepSeek | 0.0 | 39.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 192 | DeepSeek-V3.2 (Thinking) | `deepseek-reasoner` | code, programming, tool use | DeepSeek | 0.0 | 52.5 | 0.0 | 15.5 | 44.9 | 0.0 | N/A |
| 193 | DeepSeek-V3.2-Exp | `deepseek-v3.2-exp` | code, programming, tool use | DeepSeek | 0.0 | 52.3 | 0.0 | 28.6 | 40.1 | 0.0 | N/A |
| 194 | DeepSeek-V3.2-Speciale | `deepseek-v3.2-speciale` | code, programming, tool use | DeepSeek | 0.0 | 53.8 | 0.0 | 8.5 | 44.9 | 0.0 | N/A |
| 195 | DeepSeek VL2 | `deepseek-vl2` | multimodal, vision, multi-input reasoning | DeepSeek | 0.0 | 6.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 196 | DeepSeek VL2 Small | `deepseek-vl2-small` | multimodal, vision, multi-input reasoning | DeepSeek | 0.0 | 4.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 197 | DeepSeek VL2 Tiny | `deepseek-vl2-tiny` | multimodal, vision, multi-input reasoning | DeepSeek | 0.0 | 1.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 198 | ERNIE 5.0 | `ernie-5.0` | multimodal, vision, multi-input reasoning | Baidu | 0.0 | 59.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 199 | Gemini 2.0 Flash Thinking | `gemini-2.0-flash-thinking` | multimodal, vision, multi-input reasoning | Google | 0.0 | 46.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 200 | Gemini 3 Pro | `gemini-3-pro-preview` | multimodal, vision, multi-input reasoning | Google | 0.0 | 73.2 | 0.0 | 63.8 | 56.1 | 0.0 | N/A |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.