Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
Tracked models: 309. Providers: 27. Benchmarked: 264. Avg. index: 11.8.
| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | GPT-5.2 Pro | `gpt-5.2-pro-2025-12-11` | multimodal, vision, multi-input reasoning | OpenAI | 53.4 | 65.5 | 0.0 | 53.4 | 0.0 | 0.0 | N/A |
| 22 | Claude Haiku 4.5 | `claude-haiku-4-5-20251001` | multimodal, vision, multi-input reasoning | Anthropic | 53.3 | 31.5 | 55.3 | 53.3 | 54.9 | 38.7 | $1 in / $5 out |
| 23 | Kimi K2-Thinking-0905 | `kimi-k2-thinking-0905` | code, programming, tool use | Moonshot AI | 52.8 | 68.7 | 0.0 | 52.8 | 59.8 | 0.0 | N/A |
| 24 | MiniMax M2.1 | `minimax-m2.1` | code, programming, tool use | MiniMax | 52.1 | 40.8 | 72.2 | 52.1 | 48.7 | 68.6 | $0.3 in / $1.2 out |
| 25 | Seed 2.0 Pro | `seed-2.0-pro` | multimodal, vision, multi-input reasoning | ByteDance | 51.9 | 68.0 | 0.0 | 51.9 | 58.5 | 0.0 | N/A |
| 26 | Qwen3-Coder 480B A35B Instruct | `qwen3-coder-480b-a35b-instruct` | code, programming, tool use | Alibaba Cloud / Qwen Team | 50.7 | 0.0 | 0.0 | 50.7 | 33.6 | 0.0 | N/A |
| 27 | MiniMax M2.5 | `minimax-m2.5` | code, programming, tool use | MiniMax | 50.4 | 0.0 | 72.2 | 50.4 | 56.9 | 68.6 | $0.3 in / $1.2 out |
| 28 | Claude Sonnet 4 | `claude-sonnet-4-20250514` | multimodal, vision, multi-input reasoning | Anthropic | 49.4 | 39.9 | 0.0 | 49.4 | 43.6 | 0.0 | N/A |
| 29 | Claude 3.7 Sonnet | `claude-3-7-sonnet-20250219` | multimodal, vision, multi-input reasoning | Anthropic | 49.1 | 43.0 | 0.0 | 49.1 | 38.8 | 0.0 | N/A |
| 30 | Qwen3.5-122B-A10B | `qwen3.5-122b-a10b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 48.7 | 63.6 | 41.1 | 48.7 | 39.5 | 43.0 | $0.4 in / $3.2 out |
| 31 | LongCat-Flash-Chat | `longcat-flash-chat` | code, programming, tool use | Meituan | 48.1 | 26.9 | 0.0 | 48.1 | 37.4 | 0.0 | N/A |
| 32 | Claude Sonnet 4.6 | `claude-sonnet-4-6` | multimodal, vision, multi-input reasoning | Anthropic | 47.6 | 64.7 | 14.6 | 47.6 | 66.4 | 9.3 | $3 in / $15 out |
| 33 | DeepSeek-V4-Flash-Max | `deepseek-v4-flash-max` | code, programming, tool use | DeepSeek | 47.6 | 58.3 | 89.2 | 47.6 | 44.2 | 98.7 | $0.14 in / $0.28 out |
| 34 | Kimi K2.5 | `kimi-k2.5` | multimodal, vision, multi-input reasoning | Moonshot AI | 47.3 | 67.2 | 0.0 | 47.3 | 44.6 | 0.0 | N/A |
| 35 | GLM-5.1 | `glm-5.1` | code, programming, tool use | Zhipu AI | 46.0 | 66.3 | 21.5 | 46.0 | 54.9 | 31.6 | $1.4 in / $4.4 out |
| 36 | Qwen3.5-27B | `qwen3.5-27b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.9 | 60.8 | 41.1 | 44.9 | 40.3 | 53.2 | $0.3 in / $2.4 out |
| 37 | o1 | `o1-2024-12-17` | multimodal, vision, multi-input reasoning | OpenAI | 44.7 | 42.3 | 0.0 | 44.7 | 6.0 | 0.0 | N/A |
| 38 | GPT-5.2 | `gpt-5.2-2025-12-11` | multimodal, vision, multi-input reasoning | OpenAI | 44.4 | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 39 | GLM-5 | `glm-5` | code, programming, tool use | Zhipu AI | 43.6 | 0.0 | 8.7 | 43.6 | 62.5 | 31.8 | $1 in / $3.2 out |
| 40 | Qwen3.6 Plus | `qwen3.6-plus` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 42.1 | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
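To make the price column easier to compare across models, the sketch below parses a price cell such as `$0.3 in / $1.2 out` and estimates the cost of a workload. It assumes the listed prices are USD per one million tokens; the page does not state the unit, so treat that constant as a guess. The helper names `parse_price` and `estimate_cost` are hypothetical and are not part of the leaderboard.

```python
import re

# ASSUMPTION: prices are quoted in USD per 1,000,000 tokens.
# The leaderboard does not state the unit; adjust if it differs.
TOKENS_PER_PRICE_UNIT = 1_000_000

# Matches cells like "$0.3 in / $1.2 out" or "$1 in / $5 out".
_PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out")


def parse_price(cell):
    """Return (input_price, output_price) as floats, or None for 'N/A' or blank cells."""
    match = _PRICE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def estimate_cost(cell, input_tokens, output_tokens):
    """Estimate the USD cost of a workload, or None when no price is listed."""
    price = parse_price(cell)
    if price is None:
        return None
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / TOKENS_PER_PRICE_UNIT


# Price cells copied from the table above.
prices = {
    "MiniMax M2.1": "$0.3 in / $1.2 out",
    "GLM-5.1": "$1.4 in / $4.4 out",
    "GPT-5.2 Pro": "N/A",
}

for model, cell in prices.items():
    cost = estimate_cost(cell, input_tokens=200_000, output_tokens=50_000)
    label = "no listed price" if cost is None else f"${cost:.4f}"
    print(f"{model}: {label}")
```

Returning `None` for `N/A` keeps models without a listed price from being treated as free, which would otherwise distort any cost comparison.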