Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models · 27 providers · 251 benchmarked · 11.4 avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 `gpt-5.5` (multimodal, vision, multi-input reasoning) | OpenAI | 76.2 (Agentic) | 80.3 | 84.9 | 76.2 | 65.4 | 6.7 | $5 in / $30 out |
| 2 | Gemini 3.1 Pro `gemini-3.1-pro-preview` (multimodal, vision, multi-input reasoning) | Google | 72.3 (Agentic) | 74.3 | 66.8 | 72.3 | 65.5 | 22.1 | $2.5 in / $15 out |
| 3 | Claude Sonnet 4.5 `claude-sonnet-4-5-20250929` (multimodal, vision, multi-input reasoning) | Anthropic | 71.8 (Agentic) | 53.3 | 30.1 | 71.8 | 74.6 | 13.2 | $3 in / $15 out |
| 4 | GPT-5.5 Pro `gpt-5.5-pro` (multimodal, vision, multi-input reasoning) | OpenAI | 71.8 (Agentic) | 67.8 | 84.9 | 71.8 | 59.1 | 0.6 | $30 in / $180 out |
| 5 | Claude Mythos Preview `claude-mythos-preview` (multimodal, vision, multi-input reasoning) | Anthropic | 70.0 (Agentic) | 80.0 | 0.0 | 70.0 | 84.2 | 1.7 | $25 in / $125 out |
| 6 | Claude Opus 4.7 `claude-opus-4-7` (multimodal, vision, multi-input reasoning) | Anthropic | 69.2 (Agentic) | 76.8 | 42.8 | 69.2 | 81.2 | 10.6 | $5 in / $25 out |
| 7 | Muse Spark `muse-spark` (multimodal, vision, multi-input reasoning) | Meta | 67.3 (Agentic) | 71.0 | 0.0 | 67.3 | 41.3 | 0.0 | N/A |
| 8 | Claude Opus 4.1 `claude-opus-4-1-20250805` (multimodal, vision, multi-input reasoning) | Anthropic | 66.8 (Agentic) | 48.1 | 30.1 | 66.8 | 62.9 | 7.0 | $15 in / $75 out |
| 9 | Gemini 3 Pro `gemini-3-pro-preview` (multimodal, vision, multi-input reasoning) | Google | 63.8 (Agentic) | 73.3 | 0.0 | 63.8 | 57.4 | 0.0 | N/A |
| 10 | GPT-5.4 `gpt-5.4` (text, text-to-text, language) | OpenAI | 63.8 (Agentic) | 76.3 | 51.1 | 63.8 | 62.1 | 18.2 | $2.5 in / $15 out |
| 11 | Claude Opus 4.6 `claude-opus-4-6` (multimodal, vision, multi-input reasoning) | Anthropic | 60.7 (Agentic) | 79.5 | 42.8 | 60.7 | 73.3 | 10.6 | $5 in / $25 out |
| 12 | Claude Opus 4 `claude-opus-4-20250514` (multimodal, vision, multi-input reasoning) | Anthropic | 57.9 (Agentic) | 37.8 | 0.0 | 57.9 | 49.5 | 0.0 | N/A |
| 13 | Qwen3 VL 235B A22B Instruct `qwen3-vl-235b-a22b-instruct` (multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 56.7 (Agentic) | 37.1 | 66.8 | 56.7 | 0.0 | 49.4 | $0.3 in / $1.5 out |
| 14 | MiniMax M2.1 `minimax-m2.1` (code, programming, tool use) | MiniMax | 56.6 (Agentic) | 42.7 | 73.9 | 56.6 | 50.6 | 57.7 | $0.3 in / $1.2 out |
| 15 | GPT-5.2 Pro `gpt-5.2-pro-2025-12-11` (multimodal, vision, multi-input reasoning) | OpenAI | 56.4 (Agentic) | 67.3 | 31.3 | 56.4 | 0.0 | 2.5 | $21 in / $168 out |
| 16 | GLM-5V-Turbo `glm-5v-turbo` (multimodal, vision, multi-input reasoning) | Zhipu AI | 54.9 (Agentic) | 0.0 | 0.0 | 54.9 | 0.0 | 0.0 | N/A |
| 17 | Seed 2.0 Pro `seed-2.0-pro` (multimodal, vision, multi-input reasoning) | ByteDance | 54.7 (Agentic) | 68.2 | 0.0 | 54.7 | 61.8 | 0.0 | N/A |
| 18 | GLM-5.1 `glm-5.1` (code, programming, tool use) | Zhipu AI | 54.4 (Agentic) | 67.1 | 46.6 | 54.4 | 58.3 | 30.6 | $1.4 in / $4.4 out |
| 19 | Claude Haiku 4.5 `claude-haiku-4-5-20251001` (multimodal, vision, multi-input reasoning) | Anthropic | 54.2 (Agentic) | 32.9 | 61.2 | 54.2 | 57.2 | 37.7 | $1 in / $5 out |
| 20 | Kimi K2-Thinking-0905 `kimi-k2-thinking-0905` (code, programming, tool use) | Moonshot AI | 53.5 (Agentic) | 69.3 | 0.0 | 53.5 | 62.5 | 0.0 | N/A |

Page 1 of 15 · 294 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
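Prices on this page appear as strings such as `$5 in / $30 out`, with `N/A` where no price is listed. The sketch below shows one way to parse such a string and estimate the cost of a single request. It assumes the figures are USD per one million tokens, which the page does not state; the names `parse_price` and `request_cost` and the example token counts are hypothetical and chosen only for illustration.

```python
import re
from typing import Optional, Tuple

# Assumption: prices are USD per one million tokens; the page does not state the unit.
TOKENS_PER_UNIT = 1_000_000

# Matches strings shaped like "$5 in / $30 out" or "$0.3 in / $1.2 out".
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None when the cell is not a price (e.g. 'N/A')."""
    match = _PRICE_RE.match(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def request_cost(cell: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate the USD cost of one request under the per-million-token assumption."""
    price = parse_price(cell)
    if price is None:
        return None
    in_price, out_price = price
    return (input_tokens * in_price + output_tokens * out_price) / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Hypothetical workload: 10,000 input tokens and 2,000 output tokens per request.
    rows = [
        ("GPT-5.5", "$5 in / $30 out"),
        ("MiniMax M2.1", "$0.3 in / $1.2 out"),
        ("Muse Spark", "N/A"),
    ]
    for model, cell in rows:
        print(model, request_cost(cell, 10_000, 2_000))
```

Returning `None` for unpriced models keeps them distinguishable from models that are free, so a comparison can skip them rather than treat them as costing zero.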