Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models · 27 providers · 251 benchmarked · 27.4 avg. index
| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | gpt-5.5 | multimodal, vision, multi-input, reasoning | OpenAI | 80.3 | 80.3 | 84.9 | 76.2 | 65.4 | 6.7 | $5 in / $30 out |
| 2 | Claude Mythos Preview | claude-mythos-preview | multimodal, vision, multi-input, reasoning | Anthropic | 80.0 | 80.0 | 0.0 | 70.0 | 84.2 | 1.7 | $25 in / $125 out |
| 3 | Claude Opus 4.6 | claude-opus-4-6 | multimodal, vision, multi-input, reasoning | Anthropic | 79.5 | 79.5 | 42.8 | 60.7 | 73.3 | 10.6 | $5 in / $25 out |
| 4 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input, reasoning | OpenAI | 76.9 | 76.9 | 71.4 | 50.3 | 72.4 | 26.4 | $1.75 in / $14 out |
| 5 | Claude Opus 4.7 | claude-opus-4-7 | multimodal, vision, multi-input, reasoning | Anthropic | 76.8 | 76.8 | 42.8 | 69.2 | 81.2 | 10.6 | $5 in / $25 out |
| 6 | GPT-5.4 | gpt-5.4 | text, text-to-text, language | OpenAI | 76.3 | 76.3 | 51.1 | 63.8 | 62.1 | 18.2 | $2.5 in / $15 out |
| 7 | Gemini 3.1 Pro | gemini-3.1-pro-preview | multimodal, vision, multi-input, reasoning | Google | 74.3 | 74.3 | 66.8 | 72.3 | 65.5 | 22.1 | $2.5 in / $15 out |
| 8 | Gemini 3 Pro | gemini-3-pro-preview | multimodal, vision, multi-input, reasoning | Google | 73.3 | 73.3 | 0.0 | 63.8 | 57.4 | 0.0 | N/A |
| 9 | Grok-4 Heavy | grok-4-heavy | multimodal, vision, multi-input, reasoning | xAI | 72.4 | 72.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 10 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input, reasoning | Alibaba Cloud / Qwen Team | 71.9 | 71.9 | 0.0 | 49.3 | 62.2 | 0.0 | N/A |
| 11 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input, reasoning | Google | 71.3 | 71.3 | 84.9 | 42.5 | 66.6 | 38.9 | $0.5 in / $3 out |
| 12 | Muse Spark | muse-spark | multimodal, vision, multi-input, reasoning | Meta | 71.0 | 71.0 | 0.0 | 67.3 | 41.3 | 0.0 | N/A |
| 13 | Kimi K2-Thinking-0905 | kimi-k2-thinking-0905 | code, programming, tool use | Moonshot AI | 69.3 | 69.3 | 0.0 | 53.5 | 62.5 | 0.0 | N/A |
| 14 | GPT-5.1 High | gpt-5.1-high-2025-11-12 | multimodal, vision, multi-input, reasoning | OpenAI | 68.7 | 68.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 15 | Kimi K2.6 | kimi-k2.6 | multimodal, vision, multi-input, reasoning | Moonshot AI | 68.5 | 68.5 | 66.8 | 45.3 | 81.0 | 33.3 | $0.95 in / $4 out |
| 16 | Seed 2.0 Pro | seed-2.0-pro | multimodal, vision, multi-input, reasoning | ByteDance | 68.2 | 68.2 | 0.0 | 54.7 | 61.8 | 0.0 | N/A |
| 17 | Kimi K2.5 | kimi-k2.5 | multimodal, vision, multi-input, reasoning | Moonshot AI | 68.0 | 68.0 | 66.8 | 49.5 | 48.5 | 38.1 | $0.6 in / $3 out |
| 18 | GPT-5.5 Pro | gpt-5.5-pro | multimodal, vision, multi-input, reasoning | OpenAI | 67.8 | 67.8 | 84.9 | 71.8 | 59.1 | 0.6 | $30 in / $180 out |
| 19 | GPT-5.2 Pro | gpt-5.2-pro-2025-12-11 | multimodal, vision, multi-input, reasoning | OpenAI | 67.3 | 67.3 | 31.3 | 56.4 | 0.0 | 2.5 | $21 in / $168 out |
| 20 | GLM-5.1 | glm-5.1 | code, programming, tool use | Zhipu AI | 67.1 | 67.1 | 46.6 | 54.4 | 58.3 | 30.6 | $1.4 in / $4.4 out |

Page 1 of 15 · 294 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
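For readers who want to work with the price strings programmatically, here is a minimal sketch that parses entries such as `$5 in / $30 out` into numbers and treats `N/A` or empty cells as missing. The helper names `parse_price` and `blended_price` and the `output_share` weighting are hypothetical, introduced only for illustration. The page does not state the price units or how its Value column is computed, so this is not the leaderboard's own formula.

```python
import re
from typing import Optional, Tuple

# Matches strings of the form "$5 in / $30 out" or "$1.75 in / $14 out".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?)\s+in\s*/\s*\$(\d+(?:\.\d+)?)\s+out")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for 'N/A' or empty cells."""
    match = PRICE_RE.fullmatch(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def blended_price(cell: str, output_share: float = 0.25) -> Optional[float]:
    """Weighted mix of input and output price.

    output_share is an assumed workload mix (fraction of spend-weighted
    tokens that are output); it is not taken from the leaderboard.
    """
    prices = parse_price(cell)
    if prices is None:
        return None
    price_in, price_out = prices
    return (1 - output_share) * price_in + output_share * price_out


if __name__ == "__main__":
    for cell in ["$5 in / $30 out", "$1.4 in / $4.4 out", "N/A"]:
        print(cell, "->", parse_price(cell), blended_price(cell))
```

Returning `None` for unpriced models keeps them out of any cost comparison instead of silently treating them as free.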