Every major AI model, ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
- **296** tracked models
- **27** providers
- **253** benchmarked
- **34.7** average index
| Rank | Model | Provider | Modalities | Overall | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|---|
| 201 | Llama 3.1 8B Instruct (`llama-3.1-8b-instruct`) | Meta | text, inference | 25.1 | 3.2 | 26.7 | 0.0 | 0.0 | 83.9 | $0.03 / $0.03 |
| 202 | Nemotron Nano 9B v2 (`nvidia-nemotron-nano-9b-v2`) | NVIDIA | text, inference | 24.9 | 24.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 203 | Llama 3.1 405B Instruct (`llama-3.1-405b-instruct`) | Meta | text, inference | 24.9 | 20.0 | 21.4 | 0.0 | 0.0 | 44.5 | $0.89 / $0.89 |
| 204 | DeepSeek R1 Distill Qwen 14B (`deepseek-r1-distill-qwen-14b`) | DeepSeek | text, inference | 24.7 | 24.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 205 | ERNIE 4.5 (`ernie-4.5`) | Baidu | text, inference | 24.7 | 24.5 | 18.8 | 0.0 | 0.0 | 34.6 | $0.40 / $4.00 |
| 206 | GLM-4.5-Air (`glm-4.5-air`) | Zhipu AI | code, programming, tool use | 24.5 | 27.7 | 0.0 | 24.9 | 20.0 | 0.0 | N/A |
| 207 | Magistral Small 2506 (`magistral-small-2506`) | Mistral AI | text, inference | 24.5 | 24.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 208 | Gemini 2.5 Flash-Lite (`gemini-2.5-flash-lite`) | Google | multimodal, vision, multi-input reasoning | 24.2 | 21.4 | 32.8 | 0.0 | 3.5 | 64.1 | $0.10 / $0.40 |
| 209 | Qwen3-Next-80B-A3B-Instruct (`qwen3-next-80b-a3b-instruct`) | Alibaba Cloud / Qwen Team | text, inference | 24.0 | 29.5 | 6.1 | 17.9 | 0.0 | 51.9 | $0.15 / $1.50 |
| 210 | Grok-2 mini (`grok-2-mini`) | xAI | multimodal, vision, multi-input reasoning | 24.0 | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 211 | Qwen2.5 72B Instruct (`qwen-2.5-72b-instruct`) | Alibaba Cloud / Qwen Team | text, inference | 23.8 | 17.8 | 15.0 | 0.0 | 0.0 | 54.5 | $0.35 / $0.40 |
| 212 | GPT-4o mini (`gpt-4o-mini-2024-07-18`) | OpenAI | multimodal, vision, multi-input reasoning | 23.6 | 14.8 | 45.4 | 0.0 | 0.0 | 65.1 | $0.15 / $0.60 |
| 213 | Gemma 3 4B (`gemma-3-4b-it`) | Google | multimodal, vision, multi-input reasoning | 23.6 | 4.5 | 20.3 | 0.0 | 0.0 | 82.0 | $0.02 / $0.04 |
| 214 | GPT-4o (`gpt-4o-2024-08-06`) | OpenAI | multimodal, vision, multi-input reasoning | 23.5 | 31.5 | 46.7 | 14.9 | 4.3 | 26.8 | $2.50 / $10.00 |
| 215 | Mistral Large 2 (`mistral-large-2-2407`) | Mistral AI | text, inference | 23.5 | 0.0 | 21.4 | 0.0 | 0.0 | 26.7 | $2.00 / $6.00 |
| 216 | GPT-4 (`gpt-4-0613`) | OpenAI | multimodal, vision, multi-input reasoning | 23.2 | 6.8 | 54.9 | 0.0 | 0.0 | 18.7 | $30.00 / $60.00 |
| 217 | Phi 4 Reasoning (`phi-4-reasoning`) | Microsoft | text, inference | 23.1 | 23.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 218 | Llama-3.3 Nemotron Super 49B v1 (`llama-3.3-nemotron-super-49b-v1`) | NVIDIA | text, inference | 23.0 | 23.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 219 | Phi-4-multimodal-instruct (`phi-4-multimodal-instruct`) | Microsoft | multimodal, vision, multi-input reasoning | 23.0 | 8.8 | 12.3 | 0.0 | 0.0 | 79.8 | $0.05 / $0.10 |
| 220 | MiniMax M1 40K (`minimax-m1-40k`) | MiniMax | code, programming, tool use | 22.6 | 22.6 | 0.0 | 26.8 | 18.1 | 0.0 | N/A |
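The price column lists input and output token rates; these are assumed to be dollars per million tokens (the page does not state the unit). Under that assumption, a minimal sketch of estimating a single request's cost from a row's rates:

```python
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Estimated dollar cost of one request, assuming prices are $ per 1M tokens."""
    return (price_in * tokens_in + price_out * tokens_out) / 1_000_000

# Llama 3.1 8B Instruct (rank 201) at $0.03 in / $0.03 out,
# for a 2,000-token prompt and a 500-token completion:
cost = request_cost(0.03, 0.03, tokens_in=2_000, tokens_out=500)
```

At these rates the example request costs a fraction of a cent, which is why the small models score so highly on the Value axis.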
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
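The leaderboard does not publish the formula behind its overall index. As a purely hypothetical sketch, an overall score could be a weighted blend of the per-axis scores; the weights below are made up for illustration and do not reproduce the published values:

```python
# Hypothetical weights — the leaderboard's actual formula is not disclosed.
WEIGHTS = {
    "benchmarks": 0.35,
    "inference": 0.20,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.15,
}

def overall(scores: dict) -> float:
    """Weighted blend of per-axis scores, each on a 0-100 scale."""
    total = sum(WEIGHTS.values())
    return sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS) / total

# GPT-4o's per-axis scores (rank 214) under these made-up weights:
gpt4o = overall({"benchmarks": 31.5, "inference": 46.7,
                 "agentic": 14.9, "programming": 4.3, "value": 26.8})
```

Note that this blend yields roughly 27.3 for GPT-4o, not the published 23.5, underscoring that the real weighting (and any normalization) differs from this sketch.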