Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **32.1** avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|
| 141 | Claude Opus 4.1 (`claude-opus-4-1-20250805`; multimodal, vision, multi-input reasoning) | Anthropic | 30.5 | 47.9 | 30.5 | 66.8 | 62.1 | 7.2 | $15 in / $75 out |
| 142 | Claude Opus 4.5 (`claude-opus-4-5-20251101`; multimodal, vision, multi-input reasoning) | Anthropic | 30.5 | 56.1 | 30.5 | 42.5 | 74.2 | 10.7 | $5 in / $25 out |
| 143 | Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`; multimodal, vision, multi-input reasoning) | Anthropic | 30.5 | 53.0 | 30.5 | 71.8 | 74.6 | 13.3 | $3 in / $15 out |
| 144 | Claude Sonnet 4.6 (`claude-sonnet-4-6`; multimodal, vision, multi-input reasoning) | Anthropic | 30.5 | 66.1 | 30.5 | 48.5 | 68.2 | 13.3 | $3 in / $15 out |
| 145 | GLM-4.7-Flash (`glm-4.7-flash`; code, programming, tool use) | Zhipu AI | 29.7 | 38.2 | 29.7 | 11.4 | 20.7 | 72.1 | $0.07 in / $0.4 out |
| 146 | GPT-4.5 (`gpt-4.5`; multimodal, vision, multi-input reasoning) | OpenAI | 29.7 | 41.9 | 29.7 | 35.8 | 6.0 | 7.0 | $75 in / $150 out |
| 147 | Granite 3.3 8B Instruct (`granite-3.3-8b-instruct`; multimodal, vision, multi-input reasoning) | IBM | 29.7 | 0.0 | 29.7 | 0.0 | 0.0 | 56.7 | $0.5 in / $0.5 out |
| 148 | QwQ-32B-Preview (`qwq-32b-preview`; text, inference) | Alibaba Cloud / Qwen Team | 29.7 | 28.8 | 29.7 | 0.0 | 0.0 | 61.9 | $0.15 in / $0.6 out |
| 149 | Llama 3.1 8B Instruct (`llama-3.1-8b-instruct`; text, inference) | Meta | 26.7 | 3.2 | 26.7 | 0.0 | 0.0 | 83.9 | $0.03 in / $0.03 out |
| 150 | K-EXAONE-236B-A23B (`k-exaone-236b-a23b`; multimodal, vision, multi-input reasoning) | LG AI Research | 24.9 | 43.4 | 24.9 | 0.0 | 0.0 | 49.2 | $0.6 in / $1 out |
| 151 | GLM-5 (`glm-5`; code, programming, tool use) | Zhipu AI | 23.0 | 0.0 | 23.0 | 47.8 | 63.8 | 30.6 | $1 in / $3.2 out |
| 152 | Llama 3.1 405B Instruct (`llama-3.1-405b-instruct`; text, inference) | Meta | 21.4 | 20.0 | 21.4 | 0.0 | 0.0 | 44.5 | $0.89 in / $0.89 out |
| 153 | Llama 3.1 70B Instruct (`llama-3.1-70b-instruct`; text, inference) | Meta | 21.4 | 11.2 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
| 154 | Llama 3.3 70B Instruct (`llama-3.3-70b-instruct`; text, inference) | Meta | 21.4 | 19.6 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
| 155 | Mistral Large 2 (`mistral-large-2-2407`; text, inference) | Mistral AI | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 26.7 | $2 in / $6 out |
| 156 | Mistral NeMo Instruct (`mistral-nemo-instruct-2407`; text, inference) | Mistral AI | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 77.3 | $0.15 in / $0.15 out |
| 157 | Mistral Small 3 24B Instruct (`mistral-small-24b-instruct-2501`; text, inference) | Mistral AI | 21.4 | 14.2 | 21.4 | 0.0 | 0.0 | 80.7 | $0.07 in / $0.14 out |
| 158 | o3-pro (`o3-pro-2025-06-10`; multimodal, vision, multi-input reasoning) | OpenAI | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 3.6 | $20 in / $80 out |
| 159 | Qwen2.5-Coder 32B Instruct (`qwen-2.5-coder-32b-instruct`; text, inference) | Alibaba Cloud / Qwen Team | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 81.5 | $0.09 in / $0.09 out |
| 160 | Gemma 3 12B (`gemma-3-12b-it`; multimodal, vision, multi-input reasoning) | Google | 20.3 | 9.1 | 20.3 | 0.0 | 0.0 | 80.7 | $0.05 in / $0.1 out |
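The table quotes prices as separate input and output rates. As a minimal sketch of how to turn those two rates into a per-request cost, assuming the common convention that "$X in / $Y out" means dollars per million tokens (the leaderboard does not state its units, and the token counts below are illustrative):

```python
# Estimate the dollar cost of one request from per-token rates.
# Assumption: the leaderboard's "$X in / $Y out" prices are per 1M tokens.
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Cost in dollars for one request with the given token counts."""
    return (price_in * tokens_in + price_out * tokens_out) / 1_000_000

# Example: Claude Opus 4.1 at $15 in / $75 out, with a hypothetical
# 2,000-token prompt and 500-token completion:
cost = request_cost(15.0, 75.0, tokens_in=2_000, tokens_out=500)
print(f"${cost:.4f}")  # $0.0675
```

Because output rates are often several times the input rate, models with similar headline prices can differ sharply in practice depending on how completion-heavy the workload is.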
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
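The exact weighting behind the composite index is not published. Purely as an illustration of what "multi-dimensional evaluation" can mean, a weighted average over the table's five per-dimension scores might look like the sketch below; the weights are hypothetical and will not reproduce the leaderboard's published scores:

```python
# Hypothetical composite index: weighted average of per-dimension scores.
# The real leaderboard's weights and normalization are not published,
# so these weights are illustrative only.
WEIGHTS = {
    "benchmarks": 0.30,   # benchmark quality
    "inference": 0.20,    # inference speed / efficiency
    "agentic": 0.15,      # agentic capability
    "programming": 0.20,  # programming aptitude
    "value": 0.15,        # cost efficiency
}

def composite_index(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (weights sum to 1)."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# GLM-4.7-Flash's per-dimension scores from the table:
glm = {"benchmarks": 38.2, "inference": 29.7, "agentic": 11.4,
       "programming": 20.7, "value": 72.1}
print(round(composite_index(glm), 1))
```

Any such index is sensitive to the weight choices: the same model can move dozens of rank positions if, say, cost efficiency is weighted up at the expense of benchmark quality.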