Every major AI model, ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 309 | 27 | 264 | 29.3 |

| Rank | Model | Model ID | Tags | Provider | Overall score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | GPT-5.1 Thinking | gpt-5.1-thinking-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 61.1 | 65.4 | 0.0 | 0.0 | 55.7 | 0.0 | N/A |
| 22 | Kimi K2-Thinking-0905 | kimi-k2-thinking-0905 | code, programming, tool use | Moonshot AI | 60.9 | 68.7 | 0.0 | 52.8 | 59.8 | 0.0 | N/A |
| 23 | GPT-5.1 Codex High | gpt-5.1-codex-high | multimodal, vision, multi-input reasoning | OpenAI | 60.9 | 60.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 24 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 60.7 | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 25 | Grok 4.3 | grok-4.3 | text, inference | xAI | 60.5 | 0.0 | 72.2 | 0.0 | 0.0 | 41.8 | $1.25 in / $2.5 out |
| 26 | Claude Opus 4.7 | claude-opus-4-7 | multimodal, vision, multi-input reasoning | Anthropic | 60.0 | 76.6 | 31.5 | 63.8 | 79.9 | 6.3 | $5 in / $25 out |
| 27 | Seed 2.0 Pro | seed-2.0-pro | multimodal, vision, multi-input reasoning | ByteDance | 60.0 | 68.0 | 0.0 | 51.9 | 58.5 | 0.0 | N/A |
| 28 | GPT-5.2 Pro | gpt-5.2-pro-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 60.0 | 65.5 | 0.0 | 53.4 | 0.0 | 0.0 | N/A |
| 29 | MiniMax M2.5 | minimax-m2.5 | code, programming, tool use | MiniMax | 59.8 | 0.0 | 72.2 | 50.4 | 56.9 | 68.6 | $0.3 in / $1.2 out |
| 30 | Kimi K2.6 | kimi-k2.6 | text, text-to-text, language | Moonshot AI | 59.4 | 67.0 | 41.1 | 57.6 | 75.4 | 36.7 | $0.95 in / $4 out |
| 31 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 59.2 | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 32 | Gemini 3.5 Flash | gemini-3.5-flash | multimodal, vision, multi-input reasoning | Google | 59.1 | 62.8 | 89.2 | 74.4 | 30.5 | 26.6 | $1.5 in / $9 out |
| 33 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input reasoning | Google | 58.9 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 34 | Muse Spark | muse-spark | multimodal, vision, multi-input reasoning | Meta | 58.9 | 69.9 | 0.0 | 64.1 | 39.1 | 0.0 | N/A |
| 35 | GPT-5.1 | gpt-5.1-2025-11-13 | multimodal, vision, multi-input reasoning | OpenAI | 58.7 | 65.4 | 66.9 | 0.0 | 55.7 | 33.2 | $1.25 in / $10 out |
| 36 | GPT-5.1 Instant | gpt-5.1-instant-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 58.7 | 65.4 | 66.9 | 0.0 | 55.7 | 33.2 | $1.25 in / $10 out |
| 37 | ERNIE 5.0 | ernie-5.0 | multimodal, vision, multi-input reasoning | Baidu | 58.2 | 58.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 38 | Step-3.5-Flash | step-3.5-flash | code, programming, tool use | StepFun | 57.9 | 62.8 | 60.4 | 42.0 | 50.6 | 95.0 | $0.1 in / $0.4 out |
| 39 | Claude Opus 4.1 | claude-opus-4-1-20250805 | multimodal, vision, multi-input reasoning | Anthropic | 57.9 | 46.4 | 0.0 | 67.4 | 62.0 | 0.0 | N/A |
| 40 | Claude Opus 4.6 | claude-opus-4-6 | multimodal, vision, multi-input reasoning | Anthropic | 57.4 | 78.2 | 31.5 | 57.8 | 72.8 | 6.3 | $5 in / $25 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
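
The page says rankings combine several dimensions but does not publish the formula. As a rough illustration only, the sketch below shows one common way to build such a composite: a weighted mean over the dimensions that have data. The function name `composite_score`, the equal weights, and the reading of a 0.0 sub-score as "not measured" are all assumptions made for this example; it is not the leaderboard's actual method. The skip-missing assumption is at least consistent with rows where only one dimension is non-zero, such as GPT-5.1 Codex High (60.9 benchmarks, 60.9 overall).

```python
from typing import Mapping

# Hypothetical equal weights; the leaderboard does not publish its weighting.
WEIGHTS = {
    "benchmarks": 1.0,
    "inference": 1.0,
    "agentic": 1.0,
    "programming": 1.0,
    "value": 1.0,
}


def composite_score(scores: Mapping[str, float],
                    weights: Mapping[str, float] = WEIGHTS) -> float:
    """Weighted mean over the dimensions that have data.

    Assumption: a sub-score of 0.0 means "not measured", so that
    dimension is skipped and the remaining weights are renormalized.
    """
    present = {
        name: value
        for name, value in scores.items()
        if value > 0.0 and weights.get(name, 0.0) > 0.0
    }
    if not present:
        return 0.0
    total_weight = sum(weights[name] for name in present)
    return sum(weights[name] * value for name, value in present.items()) / total_weight


# Rank 23 (GPT-5.1 Codex High): only the benchmark dimension has data.
codex_high = {"benchmarks": 60.9, "inference": 0.0, "agentic": 0.0,
              "programming": 0.0, "value": 0.0}
print(composite_score(codex_high))  # 60.9

# Rank 21 (GPT-5.1 Thinking): benchmarks and programming have data.
thinking = {"benchmarks": 65.4, "inference": 0.0, "agentic": 0.0,
            "programming": 55.7, "value": 0.0}
print(round(composite_score(thinking), 2))
```

With equal weights the second row comes out near 60.55 rather than the listed 61.1, which suggests the site weights its dimensions unequally; the real weights would have to come from the provider.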