Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
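In every row of the ranks 61 to 80 table, the Score value equals the Agentic column, so this view appears to be ordered by agentic capability. The sketch below re-ranks rows by any single dimension, assuming the five sub-scores are plain comparable numbers. The page does not say whether a 0.0 is a measured zero or missing data, so the sketch treats it as a number. `Row`, `DIMENSIONS`, and `rank_by` are hypothetical names for illustration, not part of the leaderboard's tooling. The four rows are copied from ranks 61 to 64.

```python
from dataclasses import dataclass

# Hypothetical list of sortable score columns, matching the table headers.
DIMENSIONS = ("benchmarks", "inference", "agentic", "programming", "value")


@dataclass(frozen=True)
class Row:
    """One leaderboard row: model ID plus its per-dimension scores."""
    model_id: str
    benchmarks: float
    inference: float
    agentic: float
    programming: float
    value: float


# Ranks 61-64 from the table, deliberately listed out of order.
ROWS = [
    Row("glm-4.5-air", 28.1, 0.0, 24.9, 20.2, 0.0),
    Row("qwen3-vl-30b-a3b-instruct", 28.7, 66.8, 23.6, 0.0, 63.3),
    Row("qwen3-235b-a22b-thinking-2507", 46.9, 66.8, 26.8, 0.0, 39.4),
    Row("qwen3-vl-8b-instruct", 9.8, 66.8, 26.7, 0.0, 75.6),
]


def rank_by(rows, dimension, start=1):
    """Sort rows by one dimension, highest first, and number them from `start`.

    Python's sort is stable, so tied scores keep their input order; the
    leaderboard's own tie-breaking rule is not documented on the page.
    """
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    ordered = sorted(rows, key=lambda r: getattr(r, dimension), reverse=True)
    return [(start + i, r.model_id, getattr(r, dimension)) for i, r in enumerate(ordered)]


# Sorting by "agentic" and numbering from 61 reproduces ranks 61-64.
for rank, model_id, score in rank_by(ROWS, "agentic", start=61):
    print(rank, model_id, score)
```

Passing `"value"` or `"programming"` instead reorders the same rows by that column. Ties, such as the two DeepSeek-V3.2 entries at 16.6, keep whatever order they were given in.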
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 11.4 |
Showing ranks 61–80 of 294 models.
| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price (input / output) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | Qwen3-235B-A22B-Thinking-2507 | `qwen3-235b-a22b-thinking-2507` | text, inference | Alibaba Cloud / Qwen Team | 26.8 | 46.9 | 66.8 | 26.8 | 0.0 | 39.4 | $0.3 in / $3 out |
| 62 | Qwen3 VL 8B Instruct | `qwen3-vl-8b-instruct` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 26.7 | 9.8 | 66.8 | 26.7 | 0.0 | 75.6 | $0.08 in / $0.5 out |
| 63 | GLM-4.5-Air | `glm-4.5-air` | code, programming, tool use | Zhipu AI | 24.9 | 28.1 | 0.0 | 24.9 | 20.2 | 0.0 | N/A |
| 64 | Qwen3 VL 30B A3B Instruct | `qwen3-vl-30b-a3b-instruct` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 23.6 | 28.7 | 66.8 | 23.6 | 0.0 | 63.3 | $0.2 in / $0.7 out |
| 65 | Qwen3 VL 8B Thinking | `qwen3-vl-8b-thinking` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 23.5 | 35.9 | 66.8 | 23.5 | 0.0 | 45.6 | $0.18 in / $2.09 out |
| 66 | Qwen3 VL 30B A3B Thinking | `qwen3-vl-30b-a3b-thinking` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 21.3 | 35.5 | 66.8 | 21.3 | 0.0 | 60.0 | $0.2 in / $1 out |
| 67 | MiniMax M1 80K | `minimax-m1-80k` | code, programming, tool use | MiniMax | 20.9 | 24.6 | 84.9 | 20.9 | 19.4 | 41.7 | $0.55 in / $2.2 out |
| 68 | o3 | `o3-2025-04-16` | multimodal, vision, multi-input reasoning | OpenAI | 20.5 | 46.2 | 38.4 | 20.5 | 30.7 | 27.7 | $2 in / $8 out |
| 69 | Qwen3 VL 4B Instruct | `qwen3-vl-4b-instruct` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 19.5 | 19.7 | 66.8 | 19.5 | 0.0 | 70.6 | $0.1 in / $0.6 out |
| 70 | Qwen3 VL 4B Thinking | `qwen3-vl-4b-thinking` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 18.9 | 23.1 | 66.8 | 18.9 | 0.0 | 60.6 | $0.1 in / $1 out |
| 71 | Sarvam-105B | `sarvam-105b` | code, programming, tool use | Sarvam AI | 18.8 | 43.2 | 0.0 | 18.8 | 12.4 | 0.0 | N/A |
| 72 | Qwen3-Next-80B-A3B-Instruct | `qwen3-next-80b-a3b-instruct` | text, inference | Alibaba Cloud / Qwen Team | 17.9 | 29.7 | 6.1 | 17.9 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 73 | Qwen3.6-35B-A3B | `qwen3.6-35b-a3b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 17.7 | 55.7 | 0.0 | 17.7 | 26.6 | 0.0 | N/A |
| 74 | DeepSeek-V3.2 (Thinking) | `deepseek-reasoner` | code, programming, tool use | DeepSeek | 16.6 | 53.1 | 0.0 | 16.6 | 45.9 | 0.0 | N/A |
| 75 | DeepSeek-V3.2 | `deepseek-v3.2` | code, programming, tool use | DeepSeek | 16.6 | 58.1 | 52.5 | 16.6 | 45.9 | 70.0 | $0.26 in / $0.38 out |
| 76 | Grok 4 Fast | `grok-4-fast` | multimodal, vision, multi-input reasoning | xAI | 15.4 | 58.0 | 68.2 | 15.4 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 77 | DeepSeek-V3.1 | `deepseek-v3.1` | code, programming, tool use | DeepSeek | 15.3 | 38.7 | 40.2 | 15.3 | 28.7 | 58.9 | $0.27 in / $1 out |
| 78 | GPT-4o | `gpt-4o-2024-08-06` | multimodal, vision, multi-input reasoning | OpenAI | 14.9 | 31.6 | 45.9 | 14.9 | 4.4 | 26.8 | $2.5 in / $10 out |
| 79 | Kimi K2 Instruct | `kimi-k2-instruct` | code, programming, tool use | Moonshot AI | 14.8 | 24.9 | 46.6 | 14.8 | 15.3 | 61.7 | $0.5 in / $0.5 out |
| 80 | GLM-4.7-Flash | `glm-4.7-flash` | code, programming, tool use | Zhipu AI | 12.0 | 38.5 | 29.1 | 12.0 | 21.2 | 72.2 | $0.07 in / $0.4 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.
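Rankings here take cost into account, and the Price column lists separate input and output rates. To compare those rates on a concrete workload, the sketch below uses two hypothetical helpers. `parse_price` reads the column's `$X in / $Y out` strings, and `workload_cost` multiplies the rates out. The page does not state the billing unit. The sketch assumes USD per one million tokens, a common convention for API pricing, and exposes that as the `unit_tokens` parameter so it can be changed. Models listed as N/A return `None` rather than a zero cost.

```python
import re
from typing import Optional, Tuple

# Matches the Price column format, e.g. "$0.3 in / $3 out" or "$2.5 in / $10 out".
PRICE_RE = re.compile(r"^\$(?P<inp>\d+(?:\.\d+)?) in / \$(?P<out>\d+(?:\.\d+)?) out$")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_rate, output_rate) as floats, or None for 'N/A'."""
    text = text.strip()
    if text == "N/A":
        return None
    m = PRICE_RE.match(text)
    if m is None:
        raise ValueError(f"unrecognized price string: {text!r}")
    return float(m.group("inp")), float(m.group("out"))


def workload_cost(price_text: str, input_tokens: int, output_tokens: int,
                  unit_tokens: int = 1_000_000) -> Optional[float]:
    """Estimate workload cost, ASSUMING rates are quoted per `unit_tokens` tokens."""
    rates = parse_price(price_text)
    if rates is None:
        return None  # no listed price; do not treat as free
    inp, out = rates
    return (input_tokens * inp + output_tokens * out) / unit_tokens


# Hypothetical workload: 2M input tokens and 0.5M output tokens.
for name, price in [
    ("DeepSeek-V3.2", "$0.26 in / $0.38 out"),
    ("o3", "$2 in / $8 out"),
    ("Sarvam-105B", "N/A"),
]:
    cost = workload_cost(price, 2_000_000, 500_000)
    print(name, "no listed price" if cost is None else f"${cost:.2f}")
```

Keeping input and output rates separate matters. For models like Qwen3 VL 8B Thinking, the output rate is over ten times the input rate, so output-heavy workloads can rank very differently from input-heavy ones.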