Every major AI model, ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 13.2 |
| Rank | Model | Model ID | Tags | Provider | Score (Programming) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | DeepSeek-V3.2-Speciale | `deepseek-v3.2-speciale` | code, programming, tool use | DeepSeek | 45.9 | 54.5 | 0.0 | 9.7 | 45.9 | 0.0 | N/A |
| 42 | Claude Sonnet 4 | `claude-sonnet-4-20250514` | multimodal, vision, multi-input reasoning | Anthropic | 44.9 | 41.0 | 0.0 | 49.4 | 44.9 | 0.0 | N/A |
| 43 | Qwen3.6-27B | `qwen3.6-27b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.6 | 59.8 | 0.0 | 0.0 | 44.6 | 0.0 | N/A |
| 44 | GLM-4.7 | `glm-4.7` | multimodal, vision, multi-input reasoning | Zhipu AI | 44.5 | 63.2 | 52.8 | 28.2 | 44.5 | 40.6 | $0.6 in / $2.2 out |
| 45 | MiniMax M2 | `minimax-m2` | code, programming, tool use | MiniMax | 42.8 | 32.2 | 55.9 | 41.4 | 42.8 | 52.3 | $0.3 in / $1.2 out |
| 46 | Qwen3.5-27B | `qwen3.5-27b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 42.4 | 61.9 | 66.8 | 47.5 | 42.4 | 43.9 | $0.3 in / $2.4 out |
| 47 | Qwen3.5-122B-A10B | `qwen3.5-122b-a10b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 41.5 | 64.8 | 66.8 | 51.6 | 41.5 | 38.1 | $0.4 in / $3.2 out |
| 48 | Muse Spark | `muse-spark` | multimodal, vision, multi-input reasoning | Meta | 41.3 | 71.0 | 0.0 | 67.3 | 41.3 | 0.0 | N/A |
| 49 | GLM-4.5 | `glm-4.5` | code, programming, tool use | Zhipu AI | 40.6 | 34.3 | 0.0 | 36.4 | 40.6 | 0.0 | N/A |
| 50 | DeepSeek-V3.2-Exp | `deepseek-v3.2-exp` | code, programming, tool use | DeepSeek | 40.5 | 52.7 | 0.0 | 28.8 | 40.5 | 0.0 | N/A |
| 51 | GPT-5.2 Codex | `gpt-5.2-codex` | multimodal, vision, multi-input reasoning | OpenAI | 40.4 | 0.0 | 48.6 | 0.0 | 40.4 | 19.5 | $1.75 in / $14 out |
| 52 | Claude 3.7 Sonnet | `claude-3-7-sonnet-20250219` | multimodal, vision, multi-input reasoning | Anthropic | 40.1 | 43.7 | 30.1 | 49.0 | 40.1 | 13.2 | $3 in / $15 out |
| 53 | Grok Code Fast 1 | `grok-code-fast-1` | code, programming, tool use | xAI | 39.7 | 0.0 | 47.2 | 0.0 | 39.7 | 49.5 | $0.2 in / $1.5 out |
| 54 | LongCat-Flash-Chat | `longcat-flash-chat` | code, programming, tool use | Meituan | 39.4 | 28.1 | 51.9 | 49.2 | 39.4 | 57.7 | $0.3 in / $1.2 out |
| 55 | MiMo-V2-Flash | `mimo-v2-flash` | code, programming, tool use | Xiaomi | 39.3 | 53.7 | 79.8 | 27.2 | 39.3 | 85.9 | $0.1 in / $0.3 out |
| 56 | LongCat-Flash-Thinking-2601 | `longcat-flash-thinking-2601` | code, programming, tool use | Meituan | 38.0 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 in / $1.2 out |
| 57 | Qwen3-Coder 480B A35B Instruct | `qwen3-coder-480b-a35b-instruct` | code, programming, tool use | Alibaba Cloud / Qwen Team | 36.6 | 0.0 | 0.0 | 50.7 | 36.6 | 0.0 | N/A |
| 58 | Qwen3 Max | `qwen3-max` | code, programming, tool use | Alibaba Cloud / Qwen Team | 36.6 | 30.0 | 55.9 | 0.0 | 36.6 | 31.7 | $0.5 in / $5 out |
| 59 | MiniMax M2.7 | `minimax-m2.7` | code, programming, tool use | MiniMax | 35.9 | 0.0 | 52.8 | 50.8 | 35.9 | 55.0 | $0.3 in / $1.2 out |
| 60 | Qwen3.5-35B-A3B | `qwen3.5-35b-a3b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 34.4 | 57.2 | 66.8 | 44.3 | 34.4 | 46.4 | $0.25 in / $2 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
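For readers who want to work with the price column programmatically, the following is a minimal sketch of parsing price cells such as `$0.6 in / $2.2 out` and sorting priced models by output price. The names `Price` and `parse_price` are hypothetical and not part of the leaderboard; the page does not state the pricing unit, so the sketch treats the figures as plain dollar amounts. The sample rows are copied from the listing above.

```python
import re
from typing import NamedTuple, Optional


class Price(NamedTuple):
    """Hypothetical container for one price cell (unit not stated by the source)."""
    input_usd: float
    output_usd: float


# Matches cells of the form "$0.6 in / $2.2 out" or "$0.5 in / $5 out".
_PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)\s+in\s*/\s*\$(\d+(?:\.\d+)?)\s+out")


def parse_price(cell: str) -> Optional[Price]:
    """Return a Price for a well-formed cell, or None for "N/A" and blank cells."""
    match = _PRICE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return Price(float(match.group(1)), float(match.group(2)))


# (model, programming score, price cell) rows copied from the leaderboard.
rows = [
    ("GLM-4.7", 44.5, "$0.6 in / $2.2 out"),
    ("MiniMax M2", 42.8, "$0.3 in / $1.2 out"),
    ("Muse Spark", 41.3, "N/A"),
    ("MiMo-V2-Flash", 39.3, "$0.1 in / $0.3 out"),
]

priced = [(name, score, parse_price(cell)) for name, score, cell in rows]

# Keep only rows with a published price, cheapest output price first.
by_output_price = sorted(
    (row for row in priced if row[2] is not None),
    key=lambda row: row[2].output_usd,
)

for name, score, price in by_output_price:
    print(f"{name}: programming {score}, ${price.output_usd} out")
```

Rows without a published price are dropped rather than treated as free, which avoids ranking `N/A` entries as the cheapest option.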