Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
- Tracked models: 296
- Providers: 27
- Benchmarked: 253
- Avg. index: 27.4
| Rank | Model | Type | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 181 | DeepSeek R1 Distill Llama 8B (`deepseek-r1-distill-llama-8b`) | text | DeepSeek | 17.8 | 17.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 182 | Qwen2.5 72B Instruct (`qwen-2.5-72b-instruct`) | text | Alibaba Cloud / Qwen Team | 17.8 | 17.8 | 15.0 | 0.0 | 0.0 | 54.5 | $0.35 in / $0.4 out |
| 183 | GPT-4 Turbo (`gpt-4-turbo-2024-04-09`) | text | OpenAI | 16.9 | 16.9 | 52.7 | 0.0 | 0.0 | 18.8 | $10 in / $30 out |
| 184 | Llama 3.1 Nemotron Nano 8B V1 (`llama-3.1-nemotron-nano-8b-v1`) | text | NVIDIA | 16.3 | 16.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 185 | Llama 3.2 90B Instruct (`llama-3.2-90b-instruct`) | multimodal / vision | Meta | 16.3 | 16.3 | 11.3 | 0.0 | 0.0 | 54.9 | $0.35 in / $0.4 out |
| 186 | Mistral Small 3.1 24B Instruct (`mistral-small-3.1-24b-instruct-2503`) | multimodal / vision | Mistral AI | 15.7 | 15.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 187 | Phi 4 (`phi-4`) | text | Microsoft | 15.6 | 15.6 | 9.0 | 0.0 | 0.0 | 77.2 | $0.07 in / $0.14 out |
| 188 | GPT-4o mini (`gpt-4o-mini-2024-07-18`) | multimodal / vision | OpenAI | 14.8 | 14.8 | 45.4 | 0.0 | 0.0 | 65.1 | $0.15 in / $0.6 out |
| 189 | Qwen2.5 14B Instruct (`qwen-2.5-14b-instruct`) | text | Alibaba Cloud / Qwen Team | 14.6 | 14.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 190 | Qwen3.5-2B (`qwen3.5-2b`) | multimodal / vision | Alibaba Cloud / Qwen Team | 14.4 | 14.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 191 | Mistral Small 3 24B Instruct (`mistral-small-24b-instruct-2501`) | text | Mistral AI | 14.2 | 14.2 | 21.4 | 0.0 | 0.0 | 80.7 | $0.07 in / $0.14 out |
| 192 | Nova Lite (`nova-lite`) | multimodal / vision | Amazon | 13.5 | 13.5 | 70.5 | 0.0 | 0.0 | 86.7 | $0.06 in / $0.24 out |
| 193 | Mistral Small 3.1 24B Base (`mistral-small-3.1-24b-base-2503`) | multimodal / vision | Mistral AI | 13.4 | 13.4 | 64.8 | 0.0 | 0.0 | 85.3 | $0.1 in / $0.3 out |
| 194 | GPT-4.1 nano (`gpt-4.1-nano-2025-04-14`) | multimodal / vision | OpenAI | 12.5 | 12.5 | 93.4 | 0.0 | 0.0 | 82.7 | $0.1 in / $0.4 out |
| 195 | Qwen2 72B Instruct (`qwen2-72b-instruct`) | text | Alibaba Cloud / Qwen Team | 12.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 196 | Llama 3.1 70B Instruct (`llama-3.1-70b-instruct`) | text | Meta | 11.2 | 11.2 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
| 197 | Claude 3.5 Haiku (`claude-3-5-haiku-20241022`) | code / tool use | Anthropic | 10.8 | 10.8 | 30.5 | 3.0 | 7.8 | 31.8 | $0.8 in / $4 out |
| 198 | Gemma 3 27B (`gemma-3-27b-it`) | multimodal / vision | Google | 10.7 | 10.7 | 20.3 | 0.0 | 0.0 | 73.9 | $0.1 in / $0.2 out |
| 199 | Gemini 1.5 Flash 8B (`gemini-1.5-flash-8b`) | multimodal / vision | Google | 10.4 | 10.4 | 91.9 | 0.0 | 0.0 | 88.4 | $0.07 in / $0.3 out |
| 200 | Claude 3 Sonnet (`claude-3-sonnet-20240229`) | multimodal / vision | Anthropic | 10.0 | 10.0 | 30.5 | 0.0 | 0.0 | 13.3 | $3 in / $15 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
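The Price column lists separate rates for input ("in") and output ("out") tokens, which appear to follow the industry convention of dollars per million tokens. Assuming that convention, a minimal sketch of how a single request's cost works out from the two rates (the model and token counts below are illustrative, with rates taken from the table):

```python
# Estimate the dollar cost of one request from per-million-token
# "in" / "out" rates, as quoted in the Price column above.

def request_cost(in_rate: float, out_rate: float,
                 in_tokens: int, out_tokens: int) -> float:
    """Cost in dollars; rates are assumed to be $ per 1M tokens."""
    return (in_rate * in_tokens + out_rate * out_tokens) / 1_000_000

# GPT-4o mini at $0.15 in / $0.6 out (from the table):
# a 10,000-token prompt producing a 2,000-token completion.
cost = request_cost(0.15, 0.6, 10_000, 2_000)
print(f"${cost:.4f}")  # $0.0027
```

The same arithmetic explains why the Value column can diverge sharply from raw Score: output tokens are often priced several times higher than input tokens (e.g. GPT-4 Turbo at $10 in / $30 out), so output-heavy workloads dominate the bill.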