Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
309 tracked models · 27 providers · 264 benchmarked · 13.9 avg. index
| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | claude-mythos-preview | multimodal, vision, multi-input, reasoning | Anthropic | 84.2 (Programming) | 80.0 | 0.0 | 70.2 | 84.2 | 0.0 | N/A |
| 2 | Claude Opus 4.8 | claude-opus-4-8 | multimodal, vision, multi-input, reasoning | Anthropic | 82.0 (Programming) | 75.2 | 31.5 | 80.0 | 82.0 | 6.3 | $5 in / $25 out |
| 3 | Qwen3.7 Max | qwen3.7-max | multimodal, vision, multi-input, reasoning | Alibaba Cloud / Qwen Team | 81.5 (Programming) | 66.1 | 72.2 | 61.7 | 81.5 | 35.4 | $1.25 in / $3.75 out |
| 4 | Claude Opus 4.7 | claude-opus-4-7 | multimodal, vision, multi-input, reasoning | Anthropic | 79.9 (Programming) | 76.6 | 31.5 | 63.8 | 79.9 | 6.3 | $5 in / $25 out |
| 5 | Kimi K2.6 | kimi-k2.6 | text, text-to-text, language | Moonshot AI | 75.4 (Programming) | 67.0 | 41.1 | 57.6 | 75.4 | 36.7 | $0.95 in / $4 out |
| 6 | Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | multimodal, vision, multi-input, reasoning | Anthropic | 74.6 (Programming) | 51.9 | 14.6 | 71.8 | 74.6 | 9.3 | $3 in / $15 out |
| 7 | MiniMax M3 | minimax-m3 | multimodal, vision, multi-input, reasoning | MiniMax | 74.3 (Programming) | 54.6 | 72.2 | 38.7 | 74.3 | 48.1 | $0.6 in / $2.4 out |
| 8 | Claude Opus 4.5 | claude-opus-4-5-20251101 | multimodal, vision, multi-input, reasoning | Anthropic | 73.5 (Programming) | 55.3 | 0.0 | 41.4 | 73.5 | 0.0 | N/A |
| 9 | Claude Opus 4.6 | claude-opus-4-6 | multimodal, vision, multi-input, reasoning | Anthropic | 72.8 (Programming) | 78.2 | 31.5 | 57.8 | 72.8 | 6.3 | $5 in / $25 out |
| 10 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input, reasoning | OpenAI | 70.7 (Programming) | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 11 | Claude Sonnet 4.6 | claude-sonnet-4-6 | multimodal, vision, multi-input, reasoning | Anthropic | 66.4 (Programming) | 64.7 | 14.6 | 47.6 | 66.4 | 9.3 | $3 in / $15 out |
| 12 | Gemini 3.1 Pro | gemini-3.1-pro-preview | multimodal, vision, multi-input, reasoning | Google | 66.0 (Programming) | 73.8 | 59.4 | 68.9 | 66.0 | 18.5 | $2.5 in / $15 out |
| 13 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input, reasoning | Google | 63.7 (Programming) | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 14 | MiMo-V2-Pro | mimo-v2-pro | code, programming, tool use | Xiaomi | 63.7 (Programming) | 0.0 | 0.0 | 0.0 | 63.7 | 0.0 | N/A |
| 15 | GLM-5 | glm-5 | code, programming, tool use | Zhipu AI | 62.5 (Programming) | 0.0 | 8.7 | 43.6 | 62.5 | 31.8 | $1 in / $3.2 out |
| 16 | Claude Opus 4.1 | claude-opus-4-1-20250805 | multimodal, vision, multi-input, reasoning | Anthropic | 62.0 (Programming) | 46.4 | 0.0 | 67.4 | 62.0 | 0.0 | N/A |
| 17 | Mistral Medium 3.5 | mistral-medium-3-5 | multimodal, vision, multi-input, reasoning | Mistral AI | 61.7 (Programming) | 34.9 | 28.5 | 16.8 | 61.7 | 29.1 | $1.5 in / $7.5 out |
| 18 | GPT-5.5 | gpt-5.5 | multimodal, vision, multi-input, reasoning | OpenAI | 61.6 (Programming) | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 19 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input, reasoning | Alibaba Cloud / Qwen Team | 61.0 (Programming) | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 20 | GPT-5.4 | gpt-5.4 | text, text-to-text, language | OpenAI | 60.6 (Programming) | 75.3 | 38.9 | 56.2 | 60.6 | 14.1 | $2.5 in / $15 out |

Page 1 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
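
For readers who want to work with the Price column programmatically, the following is a minimal sketch of how a cell such as `$5 in / $30 out` could be split into its input and output figures. The function names `parse_price` and `priced_by_output` are hypothetical, not part of any leaderboard API. The page does not state the unit the prices are quoted in, so the sketch treats them as plain numbers.

```python
import re

# Matches price cells in the form the table uses, e.g. "$5 in / $30 out".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out")


def parse_price(cell):
    """Return (input_price, output_price) as floats, or None for "N/A" or blank cells."""
    match = PRICE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def priced_by_output(rows):
    """Return the names of models that list a price, cheapest output price first."""
    priced = []
    for name, cell in rows:
        price = parse_price(cell)
        if price is not None:
            priced.append((price[1], name))
    return [name for _, name in sorted(priced)]


# A few (model, price) pairs copied from the table.
rows = [
    ("GPT-5.5", "$5 in / $30 out"),
    ("Claude Mythos Preview", "N/A"),
    ("MiniMax M3", "$0.6 in / $2.4 out"),
    ("GLM-5", "$1 in / $3.2 out"),
]

print(priced_by_output(rows))
```

Models without a listed price are skipped rather than treated as free, so an `N/A` row never sorts to the top.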