Every major AI model, ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
- **296** tracked models
- **27** providers
- **253** benchmarked
- **27.4** average index
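A minimal sketch of how the summary figures above could be derived from a list of tracked-model records. The record fields (`name`, `provider`, `index`) and the tiny sample data are illustrative assumptions, not the site's actual schema; models without a benchmark index are excluded from the average.

```python
# Hypothetical model records; `index` is None for models not yet benchmarked.
models = [
    {"name": "Llama 4 Scout", "provider": "Meta", "index": 29.0},
    {"name": "QwQ-32B", "provider": "Alibaba Cloud / Qwen Team", "index": 28.8},
    {"name": "Kimi K2 Base", "provider": "Moonshot AI", "index": None},
]

tracked = len(models)                                   # all tracked models
providers = len({m["provider"] for m in models})        # distinct providers
benchmarked = [m["index"] for m in models if m["index"] is not None]
avg_index = round(sum(benchmarked) / len(benchmarked), 1)  # mean over benchmarked only

print(tracked, providers, len(benchmarked), avg_index)
```

With the full 296-model dataset, the same aggregation would yield the header figures (253 benchmarked, 27.4 average index).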
| Rank | Model | Type | Provider | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|
| 121 | Qwen3-Next-80B-A3B-Instruct (`qwen3-next-80b-a3b-instruct`) | text (inference) | Alibaba Cloud / Qwen Team | 29.5 | 6.1 | 17.9 | 0.0 | 51.9 | $0.15 / $1.50 |
| 122 | Qwen3 VL 32B Instruct (`qwen3-vl-32b-instruct`) | multimodal (vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 29.4 | 0.0 | 27.9 | 0.0 | 0.0 | N/A |
| 123 | Llama 4 Scout (`llama-4-scout`) | multimodal (vision, multi-input reasoning) | Meta | 29.0 | 62.1 | 0.0 | 0.0 | 78.1 | $0.08 / $0.30 |
| 124 | DeepSeek R1 Distill Llama 70B (`deepseek-r1-distill-llama-70b`) | text (inference) | DeepSeek | 28.8 | 16.6 | 0.0 | 0.0 | 66.6 | $0.10 / $0.40 |
| 125 | QwQ-32B (`qwq-32b`) | text (inference) | Alibaba Cloud / Qwen Team | 28.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 126 | QwQ-32B-Preview (`qwq-32b-preview`) | text (inference) | Alibaba Cloud / Qwen Team | 28.8 | 29.7 | 0.0 | 0.0 | 61.9 | $0.15 / $0.60 |
| 127 | GPT-4.1 (`gpt-4.1-2025-04-14`) | multimodal (vision, multi-input reasoning) | OpenAI | 28.7 | 75.9 | 32.8 | 17.3 | 34.6 | $2.00 / $8.00 |
| 128 | Qwen3 VL 30B A3B Instruct (`qwen3-vl-30b-a3b-instruct`) | multimodal (vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 28.3 | 66.0 | 23.6 | 0.0 | 63.7 | $0.20 / $0.70 |
| 129 | LongCat-Flash-Chat (`longcat-flash-chat`) | code (programming, tool use) | Meituan | 27.9 | 52.7 | 49.2 | 39.1 | 57.9 | $0.30 / $1.20 |
| 130 | Pixtral Large (`pixtral-large`) | multimodal (vision, multi-input reasoning) | Mistral AI | 27.8 | 7.0 | 0.0 | 0.0 | 22.4 | $2.00 / $6.00 |
| 131 | GLM-4.5-Air (`glm-4.5-air`) | code (programming, tool use) | Zhipu AI | 27.7 | 0.0 | 24.9 | 20.0 | 0.0 | N/A |
| 132 | Gemini 1.5 Pro (`gemini-1.5-pro`) | multimodal (vision, multi-input reasoning) | Google | 27.6 | 65.2 | 0.0 | 0.0 | 24.3 | $2.50 / $10.00 |
| 133 | MiniCPM-SALA (`minicpm-sala`) | text (inference) | OpenBMB | 27.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 134 | DeepSeek-V3 (`deepseek-v3`) | code (programming, tool use) | DeepSeek | 27.3 | 58.0 | 0.0 | 10.4 | 60.5 | $0.27 / $1.10 |
| 135 | Grok-2 (`grok-2`) | multimodal (vision, multi-input reasoning) | xAI | 27.1 | 38.3 | 0.0 | 0.0 | 25.4 | $2.00 / $10.00 |
| 136 | Kimi K2 Base (`kimi-k2-base`) | text (inference) | Moonshot AI | 26.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 137 | DeepSeek R1 Distill Qwen 32B (`deepseek-r1-distill-qwen-32b`) | text (inference) | DeepSeek | 26.6 | 16.6 | 0.0 | 0.0 | 75.9 | $0.12 / $0.18 |
| 138 | GPT-5 nano (`gpt-5-nano-2025-08-07`) | multimodal (vision, multi-input reasoning) | OpenAI | 26.3 | 0.0 | 0.0 | 11.8 | 0.0 | N/A |
| 139 | GPT OSS 20B (`gpt-oss-20b`) | text (inference) | OpenAI | 25.8 | 77.2 | 6.0 | 0.0 | 79.0 | $0.10 / $0.50 |
| 140 | o1-mini (`o1-mini`) | text (inference) | OpenAI | 25.7 | 61.3 | 0.0 | 0.0 | 30.1 | $3.00 / $12.00 |
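The price cells mix dollar figures and `N/A`, so cost analysis needs a small normalization step first. A hedged sketch, assuming the `"$X in / $Y out"` cell format seen in the rows above (the helper name and return convention are my own):

```python
import re

def parse_price(cell: str):
    """Parse a '$X in / $Y out' price cell into (input_rate, output_rate).

    Returns None for 'N/A' or any cell that doesn't match the pattern.
    """
    m = re.match(r"\$([\d.]+) in / \$([\d.]+) out", cell.strip())
    if not m:
        return None
    return float(m.group(1)), float(m.group(2))

print(parse_price("$0.15 in / $1.5 out"))  # a (input, output) tuple
print(parse_price("N/A"))                  # None
```

Once parsed, the rates can be combined with the benchmark scores to reproduce value-style comparisons (score per dollar of output tokens, for example).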
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
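The multi-dimensional evaluation described above can be sketched as a weighted sum over the per-dimension sub-scores. This is not the leaderboard's actual formula; the weights and the assumption that every sub-score is already on a 0&ndash;100 scale are illustrative:

```python
# Illustrative weights -- an assumption, not the site's published methodology.
WEIGHTS = {
    "benchmarks": 0.40,
    "inference": 0.20,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.10,
}

def composite(scores: dict) -> float:
    """Weighted composite of 0-100 sub-scores; missing dimensions count as 0."""
    return round(sum(w * scores.get(dim, 0.0) for dim, w in WEIGHTS.items()), 1)

# GPT-4.1's sub-scores from the table above:
print(composite({"benchmarks": 28.7, "inference": 75.9, "agentic": 32.8,
                 "programming": 17.3, "value": 34.6}))
```

Note that under any such scheme, a `0.0` sub-score may mean either a genuine zero or a missing evaluation, which is one reason composite indices can diverge from individual third-party benchmarks.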