Major AI models ranked across five dimensions: benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 296 | 27 | 253 | 34.7 |
Ranks 141–160 of 296 models:
| Rank | Model | Provider | Tags | Overall | Benchmarks | Inference | Agentic | Programming | Value | Price (input / output) |
|---|---|---|---|---|---|---|---|---|---|---|
| 141 | Llama 3.2 11B Instruct (`llama-3.2-11b-instruct`) | Meta | multimodal, vision, multi-input reasoning | 37.5 | 4.0 | 60.3 | 0.0 | 0.0 | 94.9 | $0.05 in / $0.05 out |
| 142 | Nova Micro (`nova-micro`) | Amazon | text, inference | 37.3 | 9.1 | 52.7 | 0.0 | 0.0 | 91.3 | $0.03 in / $0.14 out |
| 143 | Qwen3 Max (`qwen3-max`) | Alibaba Cloud / Qwen Team | code, programming, tool use | 37.1 | 29.8 | 55.2 | 0.0 | 35.8 | 31.3 | $0.5 in / $5 out |
| 144 | o1-mini (`o1-mini`) | OpenAI | text, inference | 37.1 | 25.7 | 61.3 | 0.0 | 0.0 | 30.1 | $3 in / $12 out |
| 145 | GPT OSS 20B (`gpt-oss-20b`) | OpenAI | text, inference | 37.0 | 25.8 | 77.2 | 6.0 | 0.0 | 79.0 | $0.1 in / $0.5 out |
| 146 | Qwen3-Next-80B-A3B-Thinking (`qwen3-next-80b-a3b-thinking`) | Alibaba Cloud / Qwen Team | text, inference | 36.8 | 44.7 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 147 | Grok-4.1 Thinking (`grok-4.1-thinking-2025-11-17`) | xAI | multimodal, vision, multi-input reasoning | 36.7 | 0.0 | 48.5 | 0.0 | 0.0 | 17.8 | $3 in / $15 out |
| 148 | GLM-4.5 (`glm-4.5`) | Zhipu AI | code, programming, tool use | 36.6 | 33.8 | 0.0 | 36.4 | 40.3 | 0.0 | N/A |
| 149 | DeepSeek-V3.2-Speciale (`deepseek-v3.2-speciale`) | DeepSeek | code, programming, tool use | 36.5 | 53.8 | 0.0 | 8.5 | 44.9 | 0.0 | N/A |
| 150 | Qwen3 VL 4B Instruct (`qwen3-vl-4b-instruct`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 35.6 | 19.6 | 66.0 | 19.5 | 0.0 | 70.3 | $0.1 in / $0.6 out |
| 151 | Llama 3.1 Nemotron Ultra 253B v1 (`llama-3.1-nemotron-ultra-253b-v1`) | NVIDIA | text, inference | 35.4 | 35.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 152 | Qwen3 VL 4B Thinking (`qwen3-vl-4b-thinking`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 35.4 | 23.0 | 66.0 | 18.9 | 0.0 | 60.4 | $0.1 in / $1 out |
| 153 | Jamba 1.5 Mini (`jamba-1.5-mini`) | AI21 Labs | text, inference | 35.4 | 4.7 | 65.8 | 0.0 | 0.0 | 72.4 | $0.2 in / $0.4 out |
| 154 | GPT-5.4 nano (`gpt-5.4-nano`) | OpenAI | multimodal, vision, multi-input reasoning | 35.3 | 45.6 | 76.5 | 9.7 | 10.0 | 57.1 | $0.2 in / $1.25 out |
| 155 | Kimi-k1.5 (`kimi-k1.5`) | Moonshot AI | multimodal, vision, multi-input reasoning | 35.3 | 35.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 156 | GPT-4.1 (`gpt-4.1-2025-04-14`) | OpenAI | multimodal, vision, multi-input reasoning | 35.3 | 28.7 | 75.9 | 32.8 | 17.3 | 34.6 | $2 in / $8 out |
| 157 | QwQ-32B-Preview (`qwq-32b-preview`) | Alibaba Cloud / Qwen Team | text, inference | 35.2 | 28.8 | 29.7 | 0.0 | 0.0 | 61.9 | $0.15 in / $0.6 out |
| 158 | Claude 3.5 Sonnet (`claude-3-5-sonnet-20241022`) | Anthropic | multimodal, vision, multi-input reasoning | 34.9 | 33.7 | 68.2 | 38.7 | 12.9 | 24.6 | $3 in / $15 out |
| 159 | Claude 3 Opus (`claude-3-opus-20240229`) | Anthropic | multimodal, vision, multi-input reasoning | 34.9 | 19.3 | 71.7 | 0.0 | 0.0 | 19.5 | $15 in / $75 out |
| 160 | Qwen3 VL 8B Instruct (`qwen3-vl-8b-instruct`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 34.9 | 9.8 | 66.0 | 26.7 | 0.0 | 75.3 | $0.08 in / $0.5 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.
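The Price column lists separate input and output rates, so the cheaper model depends on how a workload splits between prompt and completion tokens. A minimal sketch in Python for turning a price cell into a per-request cost, assuming the rates are US dollars per million tokens (the page does not state the unit). `parse_price` and `request_cost` are hypothetical helpers, not part of any leaderboard API.

```python
import re
from typing import Optional, Tuple

# Matches Price cells such as "$0.5 in / $5 out"; "N/A" does not match.
PRICE_RE = re.compile(r"\$(?P<inp>\d+(?:\.\d+)?) in / \$(?P<out>\d+(?:\.\d+)?) out")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_rate, output_rate), or None for "N/A" or unparseable cells."""
    m = PRICE_RE.fullmatch(text.strip())
    if m is None:
        return None
    return float(m["inp"]), float(m["out"])


def request_cost(price: str, input_tokens: int, output_tokens: int,
                 per_tokens: int = 1_000_000) -> Optional[float]:
    """Dollar cost of one request.

    ASSUMPTION: rates are quoted per `per_tokens` tokens (default: one million).
    """
    rates = parse_price(price)
    if rates is None:
        return None
    in_rate, out_rate = rates
    return (input_tokens * in_rate + output_tokens * out_rate) / per_tokens


if __name__ == "__main__":
    # Example workload: 2,000 prompt tokens and 500 completion tokens.
    for model, price in [("Nova Micro", "$0.03 in / $0.14 out"),
                         ("Claude 3 Opus", "$15 in / $75 out"),
                         ("GLM-4.5", "N/A")]:
        cost = request_cost(price, 2_000, 500)
        print(model, "n/a" if cost is None else f"${cost:.6f}")
```

Rows priced `N/A` yield `None` rather than zero, so unpriced models are not mistaken for free ones when sorting by cost.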