Every major AI model, ranked by benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 11.4 |

| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | MiniMax M2.5 | `minimax-m2.5` | code, programming, tool use | MiniMax | 53.0 | 0.0 | 73.9 | 53.0 | 56.3 | 57.7 | $0.3 in / $1.2 out |
| 22 | Qwen3.5-122B-A10B | `qwen3.5-122b-a10b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 51.6 | 64.8 | 66.8 | 51.6 | 41.5 | 38.1 | $0.4 in / $3.2 out |
| 23 | GLM-5 | `glm-5` | code, programming, tool use | Zhipu AI | 51.3 | 0.0 | 22.1 | 51.3 | 65.3 | 30.2 | $1 in / $3.2 out |
| 24 | MiniMax M2.7 | `minimax-m2.7` | code, programming, tool use | MiniMax | 50.8 | 0.0 | 52.8 | 50.8 | 35.9 | 55.0 | $0.3 in / $1.2 out |
| 25 | Qwen3-Coder 480B A35B Instruct | `qwen3-coder-480b-a35b-instruct` | code, programming, tool use | Alibaba Cloud / Qwen Team | 50.7 | 0.0 | 0.0 | 50.7 | 36.6 | 0.0 | N/A |
| 26 | GPT-5.2 | `gpt-5.2-2025-12-11` | multimodal, vision, multi-input reasoning | OpenAI | 50.3 | 76.9 | 71.4 | 50.3 | 72.4 | 26.4 | $1.75 in / $14 out |
| 27 | Claude Sonnet 4.6 | `claude-sonnet-4-6` | multimodal, vision, multi-input reasoning | Anthropic | 49.6 | 66.1 | 30.1 | 49.6 | 68.9 | 13.2 | $3 in / $15 out |
| 28 | Kimi K2.5 | `kimi-k2.5` | multimodal, vision, multi-input reasoning | Moonshot AI | 49.5 | 68.0 | 66.8 | 49.5 | 48.5 | 38.1 | $0.6 in / $3 out |
| 29 | Claude Sonnet 4 | `claude-sonnet-4-20250514` | multimodal, vision, multi-input reasoning | Anthropic | 49.4 | 41.0 | 0.0 | 49.4 | 44.9 | 0.0 | N/A |
| 30 | Qwen3.6 Plus | `qwen3.6-plus` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 49.3 | 71.9 | 0.0 | 49.3 | 62.2 | 0.0 | N/A |
| 31 | LongCat-Flash-Chat | `longcat-flash-chat` | code, programming, tool use | Meituan | 49.2 | 28.1 | 51.9 | 49.2 | 39.4 | 57.7 | $0.3 in / $1.2 out |
| 32 | Claude 3.7 Sonnet | `claude-3-7-sonnet-20250219` | multimodal, vision, multi-input reasoning | Anthropic | 49.0 | 43.7 | 30.1 | 49.0 | 40.1 | 13.2 | $3 in / $15 out |
| 33 | Qwen3.5-27B | `qwen3.5-27b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 47.5 | 61.9 | 66.8 | 47.5 | 42.4 | 43.9 | $0.3 in / $2.4 out |
| 34 | Kimi K2.6 | `kimi-k2.6` | multimodal, vision, multi-input reasoning | Moonshot AI | 45.3 | 68.5 | 66.8 | 45.3 | 81.0 | 33.3 | $0.95 in / $4 out |
| 35 | Step-3.5-Flash | `step-3.5-flash` | code, programming, tool use | StepFun | 45.3 | 62.3 | 63.2 | 45.3 | 53.0 | 82.1 | $0.1 in / $0.4 out |
| 36 | o1 | `o1-2024-12-17` | multimodal, vision, multi-input reasoning | OpenAI | 44.7 | 43.1 | 41.9 | 44.7 | 6.7 | 11.7 | $15 in / $60 out |
| 37 | Qwen3.5-35B-A3B | `qwen3.5-35b-a3b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.3 | 57.2 | 66.8 | 44.3 | 34.4 | 46.4 | $0.25 in / $2 out |
| 38 | Claude Opus 4.5 | `claude-opus-4-5-20251101` | multimodal, vision, multi-input reasoning | Anthropic | 44.2 | 56.3 | 30.1 | 44.2 | 74.2 | 10.6 | $5 in / $25 out |
| 39 | Gemini 3 Flash | `gemini-3-flash-preview` | multimodal, vision, multi-input reasoning | Google | 42.5 | 71.3 | 84.9 | 42.5 | 66.6 | 38.9 | $0.5 in / $3 out |
| 40 | Qwen3-Next-80B-A3B-Thinking | `qwen3-next-80b-a3b-thinking` | text, inference | Alibaba Cloud / Qwen Team | 41.7 | 44.9 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 in / $1.5 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
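
Prices on this page are given as a pair of rates in the form `$X in / $Y out`, or `N/A` when no price is listed. As a rough illustration of comparing models on cost, the sketch below parses such a string and estimates the cost of a workload. It assumes the rates are USD per one million tokens; the page does not state the unit, so treat that as an assumption. The function names `parse_price` and `estimate_cost` are hypothetical helpers, not part of any leaderboard API.

```python
import re
from typing import Optional, Tuple

# Matches price strings as they appear on the page, e.g. "$0.3 in / $1.2 out".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_rate, output_rate), or None for 'N/A' or empty cells."""
    match = PRICE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def estimate_cost(cell: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate workload cost in USD, or None if no price is listed.

    ASSUMPTION: rates are USD per 1,000,000 tokens (not stated on the page).
    """
    rates = parse_price(cell)
    if rates is None:
        return None
    in_rate, out_rate = rates
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


# Example: 2M input tokens and 500k output tokens at the MiniMax M2.5 rates.
print(estimate_cost("$0.3 in / $1.2 out", 2_000_000, 500_000))
```

Returning `None` for unpriced models keeps them distinguishable from models that would cost zero, which matters when sorting or filtering by estimated cost.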