Every major AI model, ranked on benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
309 tracked models · 27 providers · 264 benchmarked · 27.7 avg. index

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | `gpt-5.5` | multimodal, vision, multi-input reasoning | OpenAI | 80.4 | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 2 | Claude Mythos Preview | `claude-mythos-preview` | multimodal, vision, multi-input reasoning | Anthropic | 80.0 | 80.0 | 0.0 | 70.2 | 84.2 | 0.0 | N/A |
| 3 | Claude Opus 4.6 | `claude-opus-4-6` | multimodal, vision, multi-input reasoning | Anthropic | 78.2 | 78.2 | 31.5 | 57.8 | 72.8 | 6.3 | $5 in / $25 out |
| 4 | Claude Opus 4.7 | `claude-opus-4-7` | multimodal, vision, multi-input reasoning | Anthropic | 76.6 | 76.6 | 31.5 | 63.8 | 79.9 | 6.3 | $5 in / $25 out |
| 5 | GPT-5.2 | `gpt-5.2-2025-12-11` | multimodal, vision, multi-input reasoning | OpenAI | 75.3 | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 6 | GPT-5.4 | `gpt-5.4` | text, text-to-text, language | OpenAI | 75.3 | 75.3 | 38.9 | 56.2 | 60.6 | 14.1 | $2.5 in / $15 out |
| 7 | Claude Opus 4.8 | `claude-opus-4-8` | multimodal, vision, multi-input reasoning | Anthropic | 75.2 | 75.2 | 31.5 | 80.0 | 82.0 | 6.3 | $5 in / $25 out |
| 8 | Gemini 3.1 Pro | `gemini-3.1-pro-preview` | multimodal, vision, multi-input reasoning | Google | 73.8 | 73.8 | 59.4 | 68.9 | 66.0 | 18.5 | $2.5 in / $15 out |
| 9 | Gemini 3 Pro | `gemini-3-pro-preview` | multimodal, vision, multi-input reasoning | Google | 72.0 | 72.0 | 0.0 | 60.7 | 54.6 | 0.0 | N/A |
| 10 | Grok-4 Heavy | `grok-4-heavy` | multimodal, vision, multi-input reasoning | xAI | 72.0 | 72.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 11 | Qwen3.6 Plus | `qwen3.6-plus` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 70.2 | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 12 | Gemini 3 Flash | `gemini-3-flash-preview` | multimodal, vision, multi-input reasoning | Google | 70.0 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 13 | Muse Spark | `muse-spark` | multimodal, vision, multi-input reasoning | Meta | 69.9 | 69.9 | 0.0 | 64.1 | 39.1 | 0.0 | N/A |
| 14 | Kimi K2-Thinking-0905 | `kimi-k2-thinking-0905` | code, programming, tool use | Moonshot AI | 68.7 | 68.7 | 0.0 | 52.8 | 59.8 | 0.0 | N/A |
| 15 | GPT-5.1 High | `gpt-5.1-high-2025-11-12` | multimodal, vision, multi-input reasoning | OpenAI | 68.3 | 68.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 16 | Seed 2.0 Pro | `seed-2.0-pro` | multimodal, vision, multi-input reasoning | ByteDance | 68.0 | 68.0 | 0.0 | 51.9 | 58.5 | 0.0 | N/A |
| 17 | GPT-5.5 Pro | `gpt-5.5-pro` | multimodal, vision, multi-input reasoning | OpenAI | 67.8 | 67.8 | 0.0 | 71.8 | 60.1 | 0.0 | N/A |
| 18 | DeepSeek-V4-Pro-Max | `deepseek-v4-pro-max` | code, programming, tool use | DeepSeek | 67.4 | 67.4 | 89.2 | 61.3 | 58.6 | 34.2 | $1.74 in / $3.48 out |
| 19 | Kimi K2.5 | `kimi-k2.5` | multimodal, vision, multi-input reasoning | Moonshot AI | 67.2 | 67.2 | 0.0 | 47.3 | 44.6 | 0.0 | N/A |
| 20 | Kimi K2.6 | `kimi-k2.6` | text, text-to-text, language | Moonshot AI | 67.0 | 67.0 | 41.1 | 57.6 | 75.4 | 36.7 | $0.95 in / $4 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.
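Price cells on this page are strings such as `$5 in / $30 out` or `N/A`, so comparing cost against score requires parsing them first. The following is a minimal Python sketch of one way to do that. The helper names `parse_price` and `score_per_dollar`, the 50/50 input/output blend, and the score-per-dollar ratio are all assumptions made for illustration. The page does not publish the formula behind its Value column, and this sketch is not expected to reproduce it.

```python
import re
from typing import Optional, Tuple

# Matches Price cells in the form used on this page, e.g. "$1.74 in / $3.48 out".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None when the cell is N/A or unparseable."""
    match = PRICE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def score_per_dollar(score: float, cell: str, output_share: float = 0.5) -> Optional[float]:
    """Hypothetical metric: score divided by a blended input/output price.

    output_share is an assumed weighting, not something the leaderboard defines.
    """
    price = parse_price(cell)
    if price is None:
        return None
    price_in, price_out = price
    blended = (1 - output_share) * price_in + output_share * price_out
    return score / blended


# Three rows copied from the table: (model, score, price cell).
rows = [
    ("GPT-5.5", 80.4, "$5 in / $30 out"),
    ("Claude Mythos Preview", 80.0, "N/A"),
    ("Qwen3.6 Plus", 70.2, "$0.5 in / $3 out"),
]

for name, score, price_cell in rows:
    print(name, score_per_dollar(score, price_cell))
```

Rows priced as `N/A` yield `None` rather than a number, which keeps unpriced models out of any cost-based comparison instead of treating them as free.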