Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
291 tracked models · 27 providers · 248 benchmarked · average index 34.7
| Rank | Model | Model ID | Tags | Provider | Overall score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Grok-4 Heavy | `grok-4-heavy` | multimodal, vision, multi-input reasoning | xAI | 73.2 | 73.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 2 | Grok-4.20 Beta Non-Reasoning | `grok-4.20-beta-0309-non-reasoning` | multimodal, vision, multi-input reasoning | xAI | 70.3 | 0.0 | 97.2 | 0.0 | 0.0 | 27.2 | $2 in / $6 out |
| 3 | Grok-4.20 Beta Reasoning | `grok-4.20-beta-0309-reasoning` | multimodal, vision, multi-input reasoning | xAI | 70.3 | 0.0 | 97.2 | 0.0 | 0.0 | 27.2 | $2 in / $6 out |
| 4 | Claude Mythos Preview | `claude-mythos-preview` | multimodal, vision, multi-input reasoning | Anthropic | 69.4 | 80.0 | 0.0 | 71.8 | 84.2 | 1.1 | $25 in / $125 out |
| 5 | GPT-5.1 High | `gpt-5.1-high-2025-11-12` | multimodal, vision, multi-input reasoning | OpenAI | 68.8 | 68.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 6 | Qwen3-Coder | `qwen3-coder` | text, inference | Alibaba Cloud / Qwen Team | 68.7 | 0.0 | 56.5 | 0.0 | 0.0 | 88.2 | $0.18 in / $0.18 out |
| 7 | Grok-4.1 Fast Non-Reasoning | `grok-4-1-fast-non-reasoning` | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 8 | Grok-4.1 Fast Reasoning | `grok-4-1-fast-reasoning` | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 9 | Grok-4 Fast Non-Reasoning | `grok-4-fast-non-reasoning` | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 10 | Grok-4 Fast Reasoning | `grok-4-fast-reasoning` | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 11 | MiMo-V2-Pro | `mimo-v2-pro` | code, programming, tool use | Xiaomi | 66.4 | 0.0 | 85.3 | 0.0 | 66.5 | 35.7 | $1 in / $3 out |
| 12 | Gemini 3.1 Pro | `gemini-3.1-pro-preview` | multimodal, vision, multi-input reasoning | Google | 66.2 | 76.5 | 66.5 | 74.1 | 64.5 | 21.2 | $2.5 in / $15 out |
| 13 | Gemini 3 Pro | `gemini-3-pro-preview` | multimodal, vision, multi-input reasoning | Google | 66.1 | 74.4 | 0.0 | 63.8 | 58.1 | 0.0 | N/A |
| 14 | Gemini 3.1 Flash-Lite | `gemini-3.1-flash-lite-preview` | multimodal, vision, multi-input reasoning | Google | 64.6 | 58.1 | 85.3 | 0.0 | 0.0 | 50.0 | $0.25 in / $1.5 out |
| 15 | GPT-5.2 | `gpt-5.2-2025-12-11` | multimodal, vision, multi-input reasoning | OpenAI | 64.2 | 77.6 | 71.8 | 52.5 | 72.0 | 25.6 | $1.75 in / $14 out |
| 16 | Claude Opus 4.7 | `claude-opus-4-7` | multimodal, vision, multi-input reasoning | Anthropic | 64.2 | 77.1 | 43.1 | 70.7 | 80.8 | 9.8 | $5 in / $25 out |
| 17 | Gemma 4 31B | `gemma-4-31b-it` | multimodal, vision, multi-input reasoning | Google | 63.8 | 57.1 | 67.5 | 0.0 | 0.0 | 76.4 | $0.14 in / $0.4 out |
| 18 | GPT-5 High | `gpt-5-high-2025-08-07` | multimodal, vision, multi-input reasoning | OpenAI | 63.6 | 63.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 19 | Seed 2.0 Pro | `seed-2.0-pro` | multimodal, vision, multi-input reasoning | ByteDance | 63.1 | 68.4 | 0.0 | 57.4 | 62.5 | 0.0 | N/A |
| 20 | Gemini 3 Flash | `gemini-3-flash-preview` | multimodal, vision, multi-input reasoning | Google | 63.0 | 72.2 | 85.3 | 44.5 | 66.5 | 38.2 | $0.5 in / $3 out |
Showing ranks 1 to 20 of 291 models (page 1 of 15).
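
Price cells on this page either follow a `$X in / $Y out` pattern or read `N/A`. For anyone loading the table into a script, the following is a minimal sketch of turning such a cell into numbers. The function name `parse_price` is hypothetical, and the pricing unit is left uninterpreted because the page does not state it.

```python
import re
from typing import Optional, Tuple

# Matches cells such as "$0.18 in / $0.18 out" or "$25 in / $125 out".
_PRICE = re.compile(r"\$(\d+(?:\.\d+)?)\s*in\s*/\s*\$(\d+(?:\.\d+)?)\s*out")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price) in dollars, or None for 'N/A' or empty cells.

    Hypothetical helper: the unit behind each price is not given in the source,
    so the numbers are returned exactly as listed.
    """
    match = _PRICE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

Returning `None` rather than `0.0` for `N/A` keeps models without listed pricing from being mistaken for free ones in later sorting or filtering.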
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
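
The page does not say how the dimension scores are combined into the overall score. Purely as an illustration of what a multi-dimensional ranking can look like, here is a sketch of a weighted mean taken over only the dimensions a model has data for. The weights, the dimension names, and the use of `None` for missing data are all assumptions made for this example. It is not the leaderboard's formula and is not expected to reproduce the published scores.

```python
from typing import Mapping, Optional

# Hypothetical weights chosen only for illustration; the page publishes none.
WEIGHTS = {"benchmarks": 0.4, "inference": 0.3, "value": 0.3}


def composite(
    scores: Mapping[str, Optional[float]],
    weights: Mapping[str, float] = WEIGHTS,
) -> Optional[float]:
    """Weighted mean over the dimensions that have a score.

    Dimensions that are absent or None are skipped and the remaining
    weights are renormalized. Returns None when no dimension has data.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in weights.items():
        score = scores.get(name)
        if score is None:
            continue
        total += weight * score
        weight_sum += weight
    return total / weight_sum if weight_sum else None
```

Renormalizing over the available dimensions means a model measured on a single axis is scored on that axis alone, so overall scores from such a scheme are only loosely comparable between models with different coverage.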