Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models · 27 providers · 251 benchmarked · avg. index 11.4
| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | MiniMax M2 | minimax-m2 | code, programming, tool use | MiniMax | 41.4 | 32.2 | 55.9 | 41.4 | 42.8 | 52.3 | $0.3 in / $1.2 out |
| 42 | Qwen3 VL 235B A22B Thinking | qwen3-vl-235b-a22b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 40.2 | 37.9 | 66.8 | 40.2 | 0.0 | 37.2 | $0.45 in / $3.49 out |
| 43 | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | multimodal, vision, multi-input reasoning | Anthropic | 38.7 | 33.9 | 67.4 | 38.7 | 13.2 | 24.5 | $3 in / $15 out |
| 44 | o4-mini | o4-mini | multimodal, vision, multi-input reasoning | OpenAI | 38.2 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 in / $4.4 out |
| 45 | GLM-4.6 | glm-4.6 | multimodal, vision, multi-input reasoning | Zhipu AI | 37.7 | 47.0 | 34.9 | 37.7 | 46.1 | 42.8 | $0.55 in / $2.19 out |
| 46 | GLM-4.5 | glm-4.5 | code, programming, tool use | Zhipu AI | 36.4 | 34.3 | 0.0 | 36.4 | 40.6 | 0.0 | N/A |
| 47 | GPT-4.5 | gpt-4.5 | multimodal, vision, multi-input reasoning | OpenAI | 35.8 | 41.9 | 29.1 | 35.8 | 6.2 | 6.8 | $75 in / $150 out |
| 48 | Qwen3.5-397B-A17B | qwen3.5-397b-a17b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 35.6 | 58.6 | 66.8 | 35.6 | 60.9 | 35.3 | $0.6 in / $3.6 out |
| 49 | Qwen3 VL 32B Thinking | qwen3-vl-32b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 34.6 | 44.6 | 0.0 | 34.6 | 0.0 | 0.0 | N/A |
| 50 | GPT-4.1 | gpt-4.1-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 32.8 | 28.8 | 75.4 | 32.8 | 17.7 | 34.6 | $2 in / $8 out |
| 51 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 30.8 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 in / $1.2 out |
| 52 | LongCat-Flash-Lite | longcat-flash-lite | code, programming, tool use | Meituan | 29.5 | 24.7 | 83.8 | 29.5 | 25.3 | 83.3 | $0.1 in / $0.4 out |
| 53 | GPT-5 | gpt-5-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 29.0 | 64.4 | 0.0 | 29.0 | 51.7 | 0.0 | N/A |
| 54 | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | code, programming, tool use | DeepSeek | 28.8 | 52.7 | 0.0 | 28.8 | 40.5 | 0.0 | N/A |
| 55 | GLM-4.7 | glm-4.7 | multimodal, vision, multi-input reasoning | Zhipu AI | 28.2 | 63.2 | 52.8 | 28.2 | 44.5 | 40.6 | $0.6 in / $2.2 out |
| 56 | Qwen3 VL 32B Instruct | qwen3-vl-32b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 27.9 | 29.5 | 0.0 | 27.9 | 0.0 | 0.0 | N/A |
| 57 | MiMo-V2-Flash | mimo-v2-flash | code, programming, tool use | Xiaomi | 27.2 | 53.7 | 79.8 | 27.2 | 39.3 | 85.9 | $0.1 in / $0.3 out |
| 58 | GPT-5.4 Mini | gpt-5.4-mini | text, text-to-text, language | OpenAI | 27.1 | 57.4 | 77.4 | 27.1 | 26.9 | 32.8 | $0.75 in / $4.5 out |
| 59 | GPT OSS 120B | gpt-oss-120b | text, inference | OpenAI | 26.8 | 36.6 | 34.9 | 26.8 | 0.0 | 76.7 | $0.09 in / $0.45 out |
| 60 | MiniMax M1 40K | minimax-m1-40k | code, programming, tool use | MiniMax | 26.8 | 22.9 | 0.0 | 26.8 | 18.5 | 0.0 | N/A |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
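For readers who want to work with the Price column programmatically, the following is a minimal sketch of one way to turn cells such as `$0.3 in / $1.2 out` into numbers and to treat `N/A` as missing. The function name `parse_price` and the sample `rows` list are illustrative assumptions, not part of the leaderboard itself. The page does not state the pricing unit, so the values are kept as bare numbers.

```python
import re
from typing import Optional, Tuple

# Matches cells such as "$0.3 in / $1.2 out" as they appear in the Price column.
_PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)\s*in\s*/\s*\$(\d+(?:\.\d+)?)\s*out")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for "N/A" or empty cells.

    Hypothetical helper for illustration; the unit is not given by the source.
    """
    match = _PRICE_RE.search(cell)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# A few (model, price cell) pairs copied from the table.
rows = [
    ("MiniMax M2", "$0.3 in / $1.2 out"),
    ("GPT-4.5", "$75 in / $150 out"),
    ("GLM-4.5", "N/A"),
    ("GPT OSS 120B", "$0.09 in / $0.45 out"),
]

priced = [(name, parse_price(cell)) for name, cell in rows]

# Keep only rows with a listed price, cheapest output price first.
by_output = sorted(
    ((name, price) for name, price in priced if price is not None),
    key=lambda item: item[1][1],
)

for name, (price_in, price_out) in by_output:
    print(f"{name}: in={price_in}, out={price_out}")
```

Returning `None` for unpriced models keeps them out of any cost comparison instead of treating them as free.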