Every major AI model, ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.
309 tracked models · 27 providers · 264 benchmarked · 11.8 avg. index
| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Opus 4.8 | `claude-opus-4-8` | multimodal, vision, multi-input reasoning | Anthropic | 80.0 | 75.2 | 31.5 | 80.0 | 82.0 | 6.3 | $5 in / $25 out |
| 2 | Gemini 3.5 Flash | `gemini-3.5-flash` | multimodal, vision, multi-input reasoning | Google | 74.4 | 62.8 | 89.2 | 74.4 | 30.5 | 26.6 | $1.5 in / $9 out |
| 3 | Claude Sonnet 4.5 | `claude-sonnet-4-5-20250929` | multimodal, vision, multi-input reasoning | Anthropic | 71.8 | 51.9 | 14.6 | 71.8 | 74.6 | 9.3 | $3 in / $15 out |
| 4 | GPT-5.5 Pro | `gpt-5.5-pro` | multimodal, vision, multi-input reasoning | OpenAI | 71.8 | 67.8 | 0.0 | 71.8 | 60.1 | 0.0 | N/A |
| 5 | Claude Mythos Preview | `claude-mythos-preview` | multimodal, vision, multi-input reasoning | Anthropic | 70.2 | 80.0 | 0.0 | 70.2 | 84.2 | 0.0 | N/A |
| 6 | GPT-5.5 | `gpt-5.5` | multimodal, vision, multi-input reasoning | OpenAI | 70.2 | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 7 | Gemini 3.1 Pro | `gemini-3.1-pro-preview` | multimodal, vision, multi-input reasoning | Google | 68.9 | 73.8 | 59.4 | 68.9 | 66.0 | 18.5 | $2.5 in / $15 out |
| 8 | Claude Opus 4.1 | `claude-opus-4-1-20250805` | multimodal, vision, multi-input reasoning | Anthropic | 67.4 | 46.4 | 0.0 | 67.4 | 62.0 | 0.0 | N/A |
| 9 | Muse Spark | `muse-spark` | multimodal, vision, multi-input reasoning | Meta | 64.1 | 69.9 | 0.0 | 64.1 | 39.1 | 0.0 | N/A |
| 10 | Claude Opus 4.7 | `claude-opus-4-7` | multimodal, vision, multi-input reasoning | Anthropic | 63.8 | 76.6 | 31.5 | 63.8 | 79.9 | 6.3 | $5 in / $25 out |
| 11 | Qwen3.7 Max | `qwen3.7-max` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 61.7 | 66.1 | 72.2 | 61.7 | 81.5 | 35.4 | $1.25 in / $3.75 out |
| 12 | DeepSeek-V4-Pro-Max | `deepseek-v4-pro-max` | code, programming, tool use | DeepSeek | 61.3 | 67.4 | 89.2 | 61.3 | 58.6 | 34.2 | $1.74 in / $3.48 out |
| 13 | Gemini 3 Pro | `gemini-3-pro-preview` | multimodal, vision, multi-input reasoning | Google | 60.7 | 72.0 | 0.0 | 60.7 | 54.6 | 0.0 | N/A |
| 14 | Claude Opus 4.6 | `claude-opus-4-6` | multimodal, vision, multi-input reasoning | Anthropic | 57.8 | 78.2 | 31.5 | 57.8 | 72.8 | 6.3 | $5 in / $25 out |
| 15 | Kimi K2.6 | `kimi-k2.6` | text, text-to-text, language | Moonshot AI | 57.6 | 67.0 | 41.1 | 57.6 | 75.4 | 36.7 | $0.95 in / $4 out |
| 16 | Claude Opus 4 | `claude-opus-4-20250514` | multimodal, vision, multi-input reasoning | Anthropic | 57.4 | 36.4 | 0.0 | 57.4 | 47.8 | 0.0 | N/A |
| 17 | Nova 2 Pro | `nova-2-pro` | multimodal, vision, multi-input reasoning | Amazon | 57.2 | 46.8 | 0.0 | 57.2 | 50.6 | 0.0 | N/A |
| 18 | GPT-5.4 | `gpt-5.4` | text, text-to-text, language | OpenAI | 56.2 | 75.3 | 38.9 | 56.2 | 60.6 | 14.1 | $2.5 in / $15 out |
| 19 | Qwen3 VL 235B A22B Instruct | `qwen3-vl-235b-a22b-instruct` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 55.8 | 36.1 | 41.1 | 55.8 | 0.0 | 62.0 | $0.3 in / $1.49 out |
| 20 | GLM-5V-Turbo | `glm-5v-turbo` | multimodal, vision, multi-input reasoning | Zhipu AI | 54.9 | 0.0 | 0.0 | 54.9 | 0.0 | 0.0 | N/A |

Page 1 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
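
For readers who want to work with these rows outside the page, the following is a minimal sketch of how the `$X in / $Y out` price strings might be parsed and how the rows could be re-ranked by a different column. The `Price` type, the `parse_price` helper, and the tuple layout of `rows` are hypothetical names chosen for illustration, not part of any leaderboard API. The page does not state the pricing unit, so the sketch keeps the raw dollar figures and does no unit conversion.

```python
import re
from typing import NamedTuple, Optional


class Price(NamedTuple):
    """Raw input/output prices as printed in the table (unit not stated on the page)."""
    input_usd: float
    output_usd: float


# Matches cells such as "$5 in / $25 out" or "$1.25 in / $3.75 out".
_PRICE_RE = re.compile(
    r"\$\s*(\d+(?:\.\d+)?)\s*in\s*/\s*\$\s*(\d+(?:\.\d+)?)\s*out"
)


def parse_price(cell: str) -> Optional[Price]:
    """Return the parsed price, or None for 'N/A' and empty cells."""
    match = _PRICE_RE.search(cell)
    if match is None:
        return None
    return Price(float(match.group(1)), float(match.group(2)))


# A few rows copied from the table:
# (model, agentic score, programming score, price cell)
rows = [
    ("Claude Opus 4.8", 80.0, 82.0, "$5 in / $25 out"),
    ("GPT-5.5 Pro", 71.8, 60.1, "N/A"),
    ("GPT-5.5", 70.2, 61.6, "$5 in / $30 out"),
    ("Qwen3.7 Max", 61.7, 81.5, "$1.25 in / $3.75 out"),
]

# Re-rank the same rows by the Programming column instead of Agentic.
by_programming = sorted(rows, key=lambda row: row[2], reverse=True)

# Keep only models with a published price.
priced = {name: parse_price(cell) for name, _, _, cell in rows if parse_price(cell)}

if __name__ == "__main__":
    for name, agentic, programming, cell in by_programming:
        print(f"{name:<18} programming={programming:>5} price={parse_price(cell)}")
```

Treating `N/A` as `None` rather than zero keeps unpriced models from looking free in any cost comparison built on top of this.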