Every major AI model ranked by benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Data is updated continuously from published evaluation results.
294 tracked models · 27 providers · 251 benchmarked · 13.2 average index
| Rank | Model (ID) | Tags | Provider | Score (Programming) | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview (`claude-mythos-preview`) | multimodal, vision, multi-input reasoning | Anthropic | 84.2 | 80.0 | 0.0 | 70.0 | 84.2 | 1.7 | $25 in / $125 out |
| 2 | Claude Opus 4.7 (`claude-opus-4-7`) | multimodal, vision, multi-input reasoning | Anthropic | 81.2 | 76.8 | 42.8 | 69.2 | 81.2 | 10.6 | $5 in / $25 out |
| 3 | Kimi K2.6 (`kimi-k2.6`) | multimodal, vision, multi-input reasoning | Moonshot AI | 81.0 | 68.5 | 66.8 | 45.3 | 81.0 | 33.3 | $0.95 in / $4 out |
| 4 | Claude Sonnet 4.5 (`claude-sonnet-4-5-20250929`) | multimodal, vision, multi-input reasoning | Anthropic | 74.6 | 53.3 | 30.1 | 71.8 | 74.6 | 13.2 | $3 in / $15 out |
| 5 | Claude Opus 4.5 (`claude-opus-4-5-20251101`) | multimodal, vision, multi-input reasoning | Anthropic | 74.2 | 56.3 | 30.1 | 44.2 | 74.2 | 10.6 | $5 in / $25 out |
| 6 | Claude Opus 4.6 (`claude-opus-4-6`) | multimodal, vision, multi-input reasoning | Anthropic | 73.3 | 79.5 | 42.8 | 60.7 | 73.3 | 10.6 | $5 in / $25 out |
| 7 | GPT-5.2 (`gpt-5.2-2025-12-11`) | multimodal, vision, multi-input reasoning | OpenAI | 72.4 | 76.9 | 71.4 | 50.3 | 72.4 | 26.4 | $1.75 in / $14 out |
| 8 | Claude Sonnet 4.6 (`claude-sonnet-4-6`) | multimodal, vision, multi-input reasoning | Anthropic | 68.9 | 66.1 | 30.1 | 49.6 | 68.9 | 13.2 | $3 in / $15 out |
| 9 | Gemini 3 Flash (`gemini-3-flash-preview`) | multimodal, vision, multi-input reasoning | Google | 66.6 | 71.3 | 84.9 | 42.5 | 66.6 | 38.9 | $0.5 in / $3 out |
| 10 | MiMo-V2-Pro (`mimo-v2-pro`) | code, programming, tool use | Xiaomi | 66.6 | 0.0 | 84.9 | 0.0 | 66.6 | 36.4 | $1 in / $3 out |
| 11 | Gemini 3.1 Pro (`gemini-3.1-pro-preview`) | multimodal, vision, multi-input reasoning | Google | 65.5 | 74.3 | 66.8 | 72.3 | 65.5 | 22.1 | $2.5 in / $15 out |
| 12 | GPT-5.5 (`gpt-5.5`) | multimodal, vision, multi-input reasoning | OpenAI | 65.4 | 80.3 | 84.9 | 76.2 | 65.4 | 6.7 | $5 in / $30 out |
| 13 | GLM-5 (`glm-5`) | code, programming, tool use | Zhipu AI | 65.3 | 0.0 | 22.1 | 51.3 | 65.3 | 30.2 | $1 in / $3.2 out |
| 14 | Claude Opus 4.1 (`claude-opus-4-1-20250805`) | multimodal, vision, multi-input reasoning | Anthropic | 62.9 | 48.1 | 30.1 | 66.8 | 62.9 | 7.0 | $15 in / $75 out |
| 15 | Kimi K2-Thinking-0905 (`kimi-k2-thinking-0905`) | code, programming, tool use | Moonshot AI | 62.5 | 69.3 | 0.0 | 53.5 | 62.5 | 0.0 | N/A |
| 16 | Qwen3.6 Plus (`qwen3.6-plus`) | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 62.2 | 71.9 | 0.0 | 49.3 | 62.2 | 0.0 | N/A |
| 17 | GPT-5.4 (`gpt-5.4`) | text, text-to-text, language | OpenAI | 62.1 | 76.3 | 51.1 | 63.8 | 62.1 | 18.2 | $2.5 in / $15 out |
| 18 | Seed 2.0 Pro (`seed-2.0-pro`) | multimodal, vision, multi-input reasoning | ByteDance | 61.8 | 68.2 | 0.0 | 54.7 | 61.8 | 0.0 | N/A |
| 19 | Qwen3.5-397B-A17B (`qwen3.5-397b-a17b`) | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 60.9 | 58.6 | 66.8 | 35.6 | 60.9 | 35.3 | $0.6 in / $3.6 out |
| 20 | GPT-5.5 Pro (`gpt-5.5-pro`) | multimodal, vision, multi-input reasoning | OpenAI | 59.1 | 67.8 | 84.9 | 71.8 | 59.1 | 0.6 | $30 in / $180 out |

Showing ranks 1 to 20 of 294 models (page 1 of 15).

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.
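The Price column gives separate input and output rates, so the cheaper-looking model depends on your token mix. The Python below is a minimal sketch that parses those price strings and totals them for a workload. It assumes, because the page does not say, that both rates are USD per million tokens; the helper names `parse_price` and `workload_cost` and the 10M-input / 2M-output workload are hypothetical.

```python
import re
from typing import Optional, Tuple

# Matches strings such as "$0.95 in / $4 out" as they appear in the Price column.
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_rate, output_rate), or None when the price is "N/A"."""
    match = PRICE_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def workload_cost(price: Tuple[float, float], input_mtok: float, output_mtok: float) -> float:
    """Total cost for a workload measured in millions of tokens.

    Assumption: both rates are USD per 1M tokens (the leaderboard does not state units).
    """
    input_rate, output_rate = price
    return input_rate * input_mtok + output_rate * output_mtok


if __name__ == "__main__":
    # Prices copied from the table; the 10M-in / 2M-out workload is hypothetical.
    listed = {
        "Claude Opus 4.7": "$5 in / $25 out",
        "GPT-5.5": "$5 in / $30 out",
        "Gemini 3 Flash": "$0.5 in / $3 out",
        "Kimi K2-Thinking-0905": "N/A",
    }
    for name, raw in listed.items():
        price = parse_price(raw)
        if price is None:
            print(f"{name}: no published price")
        else:
            print(f"{name}: ${workload_cost(price, 10, 2):.2f}")
```

Under these assumptions the script prints $100.00 for Claude Opus 4.7, $110.00 for GPT-5.5, and $11.00 for Gemini 3 Flash: the two $5-input models differ only through their output rates, which dominate once output volume is non-trivial. Models listed as N/A are reported rather than treated as free.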