Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models · 27 providers · 251 benchmarked · 34.7 average index
| Rank | Model | Provider | Overall | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|
| 21 | Gemini 3 Flash (`gemini-3-flash-preview`; multimodal, vision, multi-input reasoning) | Google | 62.3 | 71.3 | 84.9 | 42.5 | 66.6 | 38.9 | $0.5 / $3 |
| 22 | Kimi K2-Thinking-0905 (`kimi-k2-thinking-0905`; code, programming, tool use) | Moonshot AI | 62.2 | 69.3 | 0.0 | 53.5 | 62.5 | 0.0 | N/A |
| 23 | DeepSeek-V3.2 (Non-thinking) (`deepseek-chat`; text, inference) | DeepSeek | 62.2 | 0.0 | 57.3 | 0.0 | 0.0 | 70.1 | $0.28 / $0.42 |
| 24 | Kimi K2.6 (`kimi-k2.6`; multimodal, vision, multi-input reasoning) | Moonshot AI | 61.9 | 68.5 | 66.8 | 45.3 | 81.0 | 33.3 | $0.95 / $4 |
| 25 | Seed 2.0 Pro (`seed-2.0-pro`; multimodal, vision, multi-input reasoning) | ByteDance | 61.9 | 68.2 | 0.0 | 54.7 | 61.8 | 0.0 | N/A |
| 26 | Qwen3.6 Plus (`qwen3.6-plus`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 61.7 | 71.9 | 0.0 | 49.3 | 62.2 | 0.0 | N/A |
| 27 | Muse Spark (`muse-spark`; multimodal, vision, multi-input reasoning) | Meta | 61.0 | 71.0 | 0.0 | 67.3 | 41.3 | 0.0 | N/A |
| 28 | Claude Opus 4.6 (`claude-opus-4-6`; multimodal, vision, multi-input reasoning) | Anthropic | 60.9 | 79.5 | 42.8 | 60.7 | 73.3 | 10.6 | $5 / $25 |
| 29 | Gemini 2.0 Flash (`gemini-2.0-flash`; multimodal, vision, multi-input reasoning) | Google | 60.5 | 33.4 | 94.1 | 0.0 | 0.0 | 82.7 | $0.1 / $0.4 |
| 30 | GPT-5.4 (`gpt-5.4`; text, text-to-text, language) | OpenAI | 60.3 | 76.3 | 51.1 | 63.8 | 62.1 | 18.2 | $2.5 / $15 |
| 31 | GPT-5.1 (`gpt-5.1-2025-11-13`; multimodal, vision, multi-input reasoning) | OpenAI | 59.8 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 / $10 |
| 32 | GPT-5.1 Instant (`gpt-5.1-instant-2025-11-12`; multimodal, vision, multi-input reasoning) | OpenAI | 59.8 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 / $10 |
| 33 | ERNIE 5.0 (`ernie-5.0`; multimodal, vision, multi-input reasoning) | Baidu | 59.7 | 59.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 34 | MiniMax M2.5 (`minimax-m2.5`; code, programming, tool use) | MiniMax | 59.3 | 0.0 | 73.9 | 53.0 | 56.3 | 57.7 | $0.3 / $1.2 |
| 35 | Llama 4 Scout (`llama-4-scout`; multimodal, vision, multi-input reasoning) | Meta | 58.8 | 29.2 | 93.0 | 0.0 | 0.0 | 87.2 | $0.08 / $0.3 |
| 36 | Ministral 3 (8B Reasoning 2512) (`ministral-8b-latest`; multimodal, vision, multi-input reasoning) | Mistral AI | 58.6 | 31.8 | 84.8 | 0.0 | 0.0 | 92.1 | $0.15 / $0.15 |
| 37 | Step-3.5-Flash (`step-3.5-flash`; code, programming, tool use) | StepFun | 58.3 | 62.3 | 63.2 | 45.3 | 53.0 | 82.1 | $0.1 / $0.4 |
| 38 | Ministral 3 (14B Reasoning 2512) (`ministral-14b-latest`; multimodal, vision, multi-input reasoning) | Mistral AI | 58.0 | 37.9 | 76.8 | 0.0 | 0.0 | 84.5 | $0.2 / $0.2 |
| 39 | Gemma 4 26B-A4B (`gemma-4-26b-a4b-it`; multimodal, vision, multi-input reasoning) | Google | 56.8 | 43.7 | 66.8 | 0.0 | 0.0 | 77.8 | $0.13 / $0.4 |
| 40 | GPT-5.1 Medium (`gpt-5.1-medium-2025-11-12`; multimodal, vision, multi-input reasoning) | OpenAI | 56.6 | 63.6 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 / $10 |
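The Price column can be used to estimate what a single request costs. As a sketch, assuming the "$X in / $Y out" figures follow the common per-million-token convention (the page itself does not state the unit):

```python
def request_cost_usd(input_tokens, output_tokens, price_in, price_out,
                     tokens_per_unit=1_000_000):
    """Estimate one request's cost from per-unit token prices.

    Assumes prices are quoted per `tokens_per_unit` tokens (per million
    by default) -- a common convention, not confirmed by this page.
    """
    return (input_tokens * price_in + output_tokens * price_out) / tokens_per_unit

# Example: a 2,000-token prompt with an 800-token reply on
# DeepSeek-V3.2 ($0.28 in / $0.42 out, from the table above).
cost = request_cost_usd(2_000, 800, 0.28, 0.42)
print(f"${cost:.6f}")  # prints $0.000896
```

Under the same assumption, the same request on Claude Opus 4.6 ($5 / $25) would cost roughly 33x more, which is what the Value column is presumably capturing.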
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
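The exact weighting behind the overall index is not published. A minimal sketch of one plausible aggregation, a weighted mean that skips dimensions with no data (shown as 0.0 in the table), with placeholder equal weights that are an assumption, not the site's actual formula:

```python
def overall_index(scores, weights):
    """Weighted mean over dimension scores, skipping missing (0.0) ones.

    `scores` and `weights` map dimension name -> value. The weights used
    below are illustrative assumptions, not the leaderboard's formula.
    """
    present = {dim: s for dim, s in scores.items() if s > 0.0}
    if not present:
        return 0.0
    total_weight = sum(weights[dim] for dim in present)
    return sum(weights[dim] * s for dim, s in present.items()) / total_weight

# Dimension scores for Gemini 3 Flash, taken from the table above.
gemini_3_flash = {
    "benchmarks": 71.3, "inference": 84.9, "agentic": 42.5,
    "programming": 66.6, "value": 38.9,
}
weights = {dim: 1.0 for dim in gemini_3_flash}  # placeholder: equal weights
print(round(overall_index(gemini_3_flash, weights), 1))
# prints 60.8 -- close to, but not equal to, the published 62.3,
# so the real aggregation evidently uses different weights.
```

Skipping missing dimensions rather than averaging in zeros matches what the table suggests: DeepSeek-V3.2 scores 62.2 overall despite three 0.0 entries.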