Every major AI model, ranked by benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, and updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 13.2 |

Ranks 21 to 40 of 294 models, sorted by Programming score:
| Rank | Model | Model ID | Tags | Provider | Score (Programming) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | GLM-5.1 | `glm-5.1` | code, programming, tool use | Zhipu AI | 58.3 | 67.1 | 46.6 | 54.4 | 58.3 | 30.6 | $1.4 in / $4.4 out |
| 22 | Gemini 3 Pro | `gemini-3-pro-preview` | multimodal, vision, multi-input reasoning | Google | 57.4 | 73.3 | 0.0 | 63.8 | 57.4 | 0.0 | N/A |
| 23 | Claude Haiku 4.5 | `claude-haiku-4-5-20251001` | multimodal, vision, multi-input reasoning | Anthropic | 57.2 | 32.9 | 61.2 | 54.2 | 57.2 | 37.7 | $1 in / $5 out |
| 24 | GPT-5.1 | `gpt-5.1-2025-11-13` | multimodal, vision, multi-input reasoning | OpenAI | 57.2 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 25 | GPT-5.1 Instant | `gpt-5.1-instant-2025-11-12` | multimodal, vision, multi-input reasoning | OpenAI | 57.2 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 26 | GPT-5.1 Thinking | `gpt-5.1-thinking-2025-11-12` | multimodal, vision, multi-input reasoning | OpenAI | 57.2 | 65.0 | 55.1 | 0.0 | 57.2 | 27.0 | $1.25 in / $10 out |
| 27 | MiniMax M2.5 | `minimax-m2.5` | code, programming, tool use | MiniMax | 56.3 | 0.0 | 73.9 | 53.0 | 56.3 | 57.7 | $0.3 in / $1.2 out |
| 28 | MiMo-V2-Omni | `mimo-v2-omni` | multimodal, vision, multi-input reasoning | Xiaomi | 55.6 | 0.0 | 59.2 | 0.0 | 55.6 | 44.7 | $0.4 in / $2 out |
| 29 | GPT-5 Codex | `gpt-5-codex-2025-09-15` | code, programming, tool use | OpenAI | 54.3 | 0.0 | 0.0 | 0.0 | 54.3 | 0.0 | N/A |
| 30 | Step-3.5-Flash | `step-3.5-flash` | code, programming, tool use | StepFun | 53.0 | 62.3 | 63.2 | 45.3 | 53.0 | 82.1 | $0.1 in / $0.4 out |
| 31 | GPT-5 | `gpt-5-2025-08-07` | multimodal, vision, multi-input reasoning | OpenAI | 51.7 | 64.4 | 0.0 | 29.0 | 51.7 | 0.0 | N/A |
| 32 | GPT-5.1 Codex | `gpt-5.1-codex` | multimodal, vision, multi-input reasoning | OpenAI | 51.2 | 0.0 | 48.6 | 0.0 | 51.2 | 25.1 | $1.25 in / $10 out |
| 33 | MiniMax M2.1 | `minimax-m2.1` | code, programming, tool use | MiniMax | 50.6 | 42.7 | 73.9 | 56.6 | 50.6 | 57.7 | $0.3 in / $1.2 out |
| 34 | Seed 2.0 Lite | `seed-2.0-lite` | multimodal, vision, multi-input reasoning | ByteDance | 50.3 | 58.1 | 0.0 | 0.0 | 50.3 | 0.0 | N/A |
| 35 | Claude Opus 4 | `claude-opus-4-20250514` | multimodal, vision, multi-input reasoning | Anthropic | 49.5 | 37.8 | 0.0 | 57.9 | 49.5 | 0.0 | N/A |
| 36 | GPT-5.3 Codex | `gpt-5.3-codex` | text, text-to-text, coding | OpenAI | 49.3 | 0.0 | 48.6 | 0.0 | 49.3 | 19.5 | $1.75 in / $14 out |
| 37 | Kimi K2.5 | `kimi-k2.5` | multimodal, vision, multi-input reasoning | Moonshot AI | 48.5 | 68.0 | 66.8 | 49.5 | 48.5 | 38.1 | $0.6 in / $3 out |
| 38 | GLM-4.6 | `glm-4.6` | multimodal, vision, multi-input reasoning | Zhipu AI | 46.1 | 47.0 | 34.9 | 37.7 | 46.1 | 42.8 | $0.55 in / $2.19 out |
| 39 | DeepSeek-V3.2 (Thinking) | `deepseek-reasoner` | code, programming, tool use | DeepSeek | 45.9 | 53.1 | 0.0 | 16.6 | 45.9 | 0.0 | N/A |
| 40 | DeepSeek-V3.2 | `deepseek-v3.2` | code, programming, tool use | DeepSeek | 45.9 | 58.1 | 52.5 | 16.6 | 45.9 | 70.0 | $0.26 in / $0.38 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.
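The Price column quotes separate input and output rates, so which model is cheaper depends on the mix of input and output tokens in a given workload. The sketch below parses a Price cell and estimates the total cost of a workload. It is a minimal illustration under stated assumptions: the rates are assumed to be USD per 1M tokens (the leaderboard does not state the unit), and the helpers `parse_price` and `workload_cost` are hypothetical, not part of any leaderboard API.

```python
import re
from typing import Optional, Tuple

# Matches Price cells such as "$1.4 in / $4.4 out" or "$0.26 in / $0.38 out".
PRICE_RE = re.compile(r"\$([\d.]+) in / \$([\d.]+) out")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_rate, output_rate), or None for "N/A" or unparseable cells."""
    match = PRICE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def workload_cost(cell: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimated USD cost of a workload.

    ASSUMPTION (hypothetical helper): rates are quoted in USD per 1M tokens.
    """
    rates = parse_price(cell)
    if rates is None:
        return None
    rate_in, rate_out = rates
    return (input_tokens * rate_in + output_tokens * rate_out) / 1_000_000


if __name__ == "__main__":
    # Price cells copied from the table; workload is 10M input + 2M output tokens.
    rows = {
        "GLM-5.1": "$1.4 in / $4.4 out",
        "Step-3.5-Flash": "$0.1 in / $0.4 out",
        "DeepSeek-V3.2": "$0.26 in / $0.38 out",
        "GPT-5 Codex": "N/A",
    }
    for model, cell in rows.items():
        cost = workload_cost(cell, 10_000_000, 2_000_000)
        print(model, "no price listed" if cost is None else f"${cost:.2f}")
```

Models whose Price is "N/A" are reported as having no price rather than treated as free, which keeps them from appearing artificially cheap in a comparison.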