Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
309 tracked models · 27 providers · 264 benchmarked · 29.3 avg. index
| Rank | Model | Model ID | Tags | Provider | Overall score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | claude-mythos-preview | multimodal, vision, multi-input reasoning | Anthropic | 78.1 | 80.0 | 0.0 | 70.2 | 84.2 | 0.0 | N/A |
| 2 | Grok-4 Heavy | grok-4-heavy | multimodal, vision, multi-input reasoning | xAI | 72.0 | 72.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 3 | GPT-5.1 High | gpt-5.1-high-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 68.3 | 68.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 4 | GPT-5.5 | gpt-5.5 | multimodal, vision, multi-input reasoning | OpenAI | 68.1 | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 5 | GPT-5.5 Pro | gpt-5.5-pro | multimodal, vision, multi-input reasoning | OpenAI | 66.8 | 67.8 | 0.0 | 71.8 | 60.1 | 0.0 | N/A |
| 6 | Grok-4.1 Fast Non-Reasoning | grok-4-1-fast-non-reasoning | multimodal, vision, multi-input reasoning | xAI | 66.6 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 7 | Grok-4.1 Fast Reasoning | grok-4-1-fast-reasoning | multimodal, vision, multi-input reasoning | xAI | 66.6 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 8 | Grok-4 Fast Non-Reasoning | grok-4-fast-non-reasoning | multimodal, vision, multi-input reasoning | xAI | 66.6 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 9 | Grok-4 Fast Reasoning | grok-4-fast-reasoning | multimodal, vision, multi-input reasoning | xAI | 66.6 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 10 | Qwen3.7 Max | qwen3.7-max | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 66.3 | 66.1 | 72.2 | 61.7 | 81.5 | 35.4 | $1.25 in / $3.75 out |
| 11 | DeepSeek-V4-Pro-Max | deepseek-v4-pro-max | code, programming, tool use | DeepSeek | 64.2 | 67.4 | 89.2 | 61.3 | 58.6 | 34.2 | $1.74 in / $3.48 out |
| 12 | Claude Opus 4.8 | claude-opus-4-8 | multimodal, vision, multi-input reasoning | Anthropic | 64.0 | 75.2 | 31.5 | 80.0 | 82.0 | 6.3 | $5 in / $25 out |
| 13 | MiMo-V2-Pro | mimo-v2-pro | code, programming, tool use | Xiaomi | 63.7 | 0.0 | 0.0 | 0.0 | 63.7 | 0.0 | N/A |
| 14 | Gemini 3 Pro | gemini-3-pro-preview | multimodal, vision, multi-input reasoning | Google | 63.2 | 72.0 | 0.0 | 60.7 | 54.6 | 0.0 | N/A |
| 15 | Gemini 3.1 Pro | gemini-3.1-pro-preview | multimodal, vision, multi-input reasoning | Google | 63.1 | 73.8 | 59.4 | 68.9 | 66.0 | 18.5 | $2.5 in / $15 out |
| 16 | DeepSeek-V3.2 (Non-thinking) | deepseek-chat | text, inference | DeepSeek | 63.1 | 0.0 | 53.0 | 0.0 | 0.0 | 79.3 | $0.28 in / $0.42 out |
| 17 | GPT-5 High | gpt-5-high-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 62.7 | 62.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 18 | Nova 2 Sonic | nova-2-sonic | multimodal, vision, multi-input reasoning | Amazon | 62.4 | 0.0 | 72.2 | 0.0 | 0.0 | 46.8 | $0.33 in / $2.75 out |
| 19 | Gemini 3.1 Flash-Lite | gemini-3.1-flash-lite-preview | multimodal, vision, multi-input reasoning | Google | 61.8 | 55.3 | 72.2 | 0.0 | 0.0 | 63.3 | $0.25 in / $1.5 out |
| 20 | DeepSeek-V4-Flash-Max | deepseek-v4-flash-max | code, programming, tool use | DeepSeek | 61.6 | 58.3 | 89.2 | 47.6 | 44.2 | 98.7 | $0.14 in / $0.28 out |

Page 1 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
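
The Price column uses strings of the form `$5 in / $30 out`, with `N/A` where no price is listed. As a rough sketch of how those cells could be turned into a cost estimate for a given workload, the Python below parses a price cell and multiplies it out. The page does not state the pricing unit, so the per-million-token default (`tokens_per_unit`) is an assumption, and the helper names `parse_price` and `estimate_cost` are hypothetical rather than part of any leaderboard API.

```python
import re
from typing import Optional, Tuple

# Matches cells such as "$5 in / $30 out" or "$0.28 in / $0.42 out".
PRICE_RE = re.compile(
    r"\$([0-9]*\.?[0-9]+)\s*in\s*/\s*\$([0-9]*\.?[0-9]+)\s*out"
)


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for "N/A" or unparseable cells."""
    match = PRICE_RE.search(cell)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def estimate_cost(
    cell: str,
    input_tokens: int,
    output_tokens: int,
    tokens_per_unit: int = 1_000_000,  # ASSUMPTION: prices are per 1M tokens
) -> Optional[float]:
    """Estimate the cost of a workload from a Price cell, or None if unpriced."""
    price = parse_price(cell)
    if price is None:
        return None
    price_in, price_out = price
    return (input_tokens * price_in + output_tokens * price_out) / tokens_per_unit


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 1M output tokens.
    for model, cell in [
        ("GPT-5.5", "$5 in / $30 out"),
        ("DeepSeek-V3.2 (Non-thinking)", "$0.28 in / $0.42 out"),
        ("Claude Mythos Preview", "N/A"),
    ]:
        print(model, estimate_cost(cell, 2_000_000, 1_000_000))
```

Unpriced models return `None` rather than zero, so they are not mistaken for free models when results are sorted by cost.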