Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 309 | 27 | 264 | 11.8 |
| Rank | Model | Model ID | Provider | Tags | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price (input / output) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | Step-3.5-Flash | `step-3.5-flash` | StepFun | code, programming, tool use | 42.0 | 62.8 | 60.4 | 42.0 | 50.6 | 95.0 | $0.1 in / $0.4 out |
| 42 | Qwen3.5-35B-A3B | `qwen3.5-35b-a3b` | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 41.8 | 55.9 | 41.1 | 41.8 | 31.6 | 56.3 | $0.25 in / $2 out |
| 43 | Qwen3-Next-80B-A3B-Thinking | `qwen3-next-80b-a3b-thinking` | Alibaba Cloud / Qwen Team | text, inference | 41.7 | 43.4 | 0.0 | 41.7 | 0.0 | 0.0 | N/A |
| 44 | Claude Opus 4.5 | `claude-opus-4-5-20251101` | Anthropic | multimodal, vision, multi-input reasoning | 41.4 | 55.3 | 0.0 | 41.4 | 73.5 | 0.0 | N/A |
| 45 | MiniMax M2 | `minimax-m2` | MiniMax | code, programming, tool use | 41.2 | 30.6 | 41.8 | 41.2 | 41.5 | 59.5 | $0.3 in / $1.2 out |
| 46 | MiniMax M2.7 | `minimax-m2.7` | MiniMax | code, programming, tool use | 40.1 | 0.0 | 25.3 | 40.1 | 39.6 | 67.7 | $0.3 in / $1.2 out |
| 47 | Qwen3 VL 235B A22B Thinking | `qwen3-vl-235b-a22b-thinking` | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 39.3 | 36.8 | 0.0 | 39.3 | 0.0 | 0.0 | N/A |
| 48 | Gemini 3 Flash | `gemini-3-flash-preview` | Google | multimodal, vision, multi-input reasoning | 38.8 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 49 | Claude 3.5 Sonnet | `claude-3-5-sonnet-20241022` | Anthropic | multimodal, vision, multi-input reasoning | 38.7 | 33.0 | 0.0 | 38.7 | 11.9 | 0.0 | N/A |
| 50 | MiniMax M3 | `minimax-m3` | MiniMax | multimodal, vision, multi-input reasoning | 38.7 | 54.6 | 72.2 | 38.7 | 74.3 | 48.1 | $0.6 in / $2.4 out |
| 51 | o4-mini | `o4-mini` | OpenAI | multimodal, vision, multi-input reasoning | 37.5 | 47.6 | 0.0 | 37.5 | 30.1 | 0.0 | N/A |
| 52 | GLM-4.5 | `glm-4.5` | Zhipu AI | code, programming, tool use | 36.2 | 32.6 | 0.0 | 36.2 | 38.0 | 0.0 | N/A |
| 53 | GLM-4.6 | `glm-4.6` | Zhipu AI | multimodal, vision, multi-input reasoning | 36.0 | 45.6 | 0.0 | 36.0 | 43.9 | 0.0 | N/A |
| 54 | GPT-4.5 | `gpt-4.5` | OpenAI | multimodal, vision, multi-input reasoning | 35.8 | 41.3 | 0.0 | 35.8 | 5.5 | 0.0 | N/A |
| 55 | Qwen3 VL 32B Thinking | `qwen3-vl-32b-thinking` | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 34.1 | 43.3 | 0.0 | 34.1 | 0.0 | 0.0 | N/A |
| 56 | GPT-4.1 | `gpt-4.1-2025-04-14` | OpenAI | multimodal, vision, multi-input reasoning | 32.8 | 27.9 | 76.0 | 32.8 | 15.8 | 36.7 | $2 in / $8 out |
| 57 | Qwen3.5-397B-A17B | `qwen3.5-397b-a17b` | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 31.1 | 57.0 | 41.1 | 31.1 | 57.7 | 39.2 | $0.6 in / $3.6 out |
| 58 | LongCat-Flash-Lite | `longcat-flash-lite` | Meituan | code, programming, tool use | 30.1 | 23.6 | 74.7 | 30.1 | 24.5 | 96.5 | $0.1 in / $0.4 out |
| 59 | LongCat-Flash-Thinking-2601 | `longcat-flash-thinking-2601` | Meituan | code, programming, tool use | 29.0 | 54.9 | 0.0 | 29.0 | 35.2 | 0.0 | N/A |
| 60 | DeepSeek-V3.2-Exp | `deepseek-v3.2-exp` | DeepSeek | code, programming, tool use | 28.0 | 51.5 | 0.0 | 28.0 | 38.8 | 0.0 | N/A |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.