Major AI models ranked by benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 27.4 |
| Rank | Model | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 21 | Claude Sonnet 4.6 (`claude-sonnet-4-6`) | Anthropic | multimodal, vision, multi-input reasoning | 66.1 | 66.1 | 30.1 | 49.6 | 68.9 | 13.2 | $3 in / $15 out |
| 22 | GPT-5.1 (`gpt-5.1-2025-11-13`) | OpenAI | multimodal, vision, multi-input reasoning | 65.0 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 23 | GPT-5.1 Instant (`gpt-5.1-instant-2025-11-12`) | OpenAI | multimodal, vision, multi-input reasoning | 65.0 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 24 | GPT-5.1 Thinking (`gpt-5.1-thinking-2025-11-12`) | OpenAI | multimodal, vision, multi-input reasoning | 65.0 | 65.0 | 55.1 | 0.0 | 57.2 | 27.0 | $1.25 in / $10 out |
| 25 | Qwen3.5-122B-A10B (`qwen3.5-122b-a10b`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 64.8 | 64.8 | 66.8 | 51.6 | 41.5 | 38.1 | $0.4 in / $3.2 out |
| 26 | GPT-5 (`gpt-5-2025-08-07`) | OpenAI | multimodal, vision, multi-input reasoning | 64.4 | 64.4 | 0.0 | 29.0 | 51.7 | 0.0 | N/A |
| 27 | GPT-5.1 Medium (`gpt-5.1-medium-2025-11-12`) | OpenAI | multimodal, vision, multi-input reasoning | 63.6 | 63.6 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 in / $10 out |
| 28 | GLM-4.7 (`glm-4.7`) | Zhipu AI | multimodal, vision, multi-input reasoning | 63.2 | 63.2 | 52.8 | 28.2 | 44.5 | 40.6 | $0.6 in / $2.2 out |
| 29 | GPT-5 High (`gpt-5-high-2025-08-07`) | OpenAI | multimodal, vision, multi-input reasoning | 63.2 | 63.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 30 | Step-3.5-Flash (`step-3.5-flash`) | StepFun | code, programming, tool use | 62.3 | 62.3 | 63.2 | 45.3 | 53.0 | 82.1 | $0.1 in / $0.4 out |
| 31 | Qwen3.5-27B (`qwen3.5-27b`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 61.9 | 61.9 | 66.8 | 47.5 | 42.4 | 43.9 | $0.3 in / $2.4 out |
| 32 | GPT-5.1 Codex High (`gpt-5.1-codex-high`) | OpenAI | multimodal, vision, multi-input reasoning | 61.0 | 61.0 | 48.6 | 0.0 | 0.0 | 25.1 | $1.25 in / $10 out |
| 33 | Qwen3.6-27B (`qwen3.6-27b`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 59.8 | 59.8 | 0.0 | 0.0 | 44.6 | 0.0 | N/A |
| 34 | ERNIE 5.0 (`ernie-5.0`) | Baidu | multimodal, vision, multi-input reasoning | 59.7 | 59.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 35 | Grok-3 (`grok-3`) | xAI | multimodal, vision, multi-input reasoning | 59.5 | 59.5 | 51.9 | 0.0 | 0.0 | 22.6 | $3 in / $15 out |
| 36 | Qwen3.5-397B-A17B (`qwen3.5-397b-a17b`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 58.6 | 58.6 | 66.8 | 35.6 | 60.9 | 35.3 | $0.6 in / $3.6 out |
| 37 | DeepSeek-V3.2 (`deepseek-v3.2`) | DeepSeek | code, programming, tool use | 58.1 | 58.1 | 52.5 | 16.6 | 45.9 | 70.0 | $0.26 in / $0.38 out |
| 38 | Seed 2.0 Lite (`seed-2.0-lite`) | ByteDance | multimodal, vision, multi-input reasoning | 58.1 | 58.1 | 0.0 | 0.0 | 50.3 | 0.0 | N/A |
| 39 | Grok 4 Fast (`grok-4-fast`) | xAI | multimodal, vision, multi-input reasoning | 58.0 | 58.0 | 68.2 | 15.4 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 40 | GPT-5.4 Mini (`gpt-5.4-mini`) | OpenAI | text, text-to-text, language | 57.4 | 57.4 | 77.4 | 27.1 | 26.9 | 32.8 | $0.75 in / $4.5 out |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmark results.
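Each Price cell pairs an input rate with an output rate. The page does not state the unit, so this minimal sketch assumes US dollars per million tokens (a common provider convention) and treats `N/A` as "no published price". The helper names `parse_price` and `estimate_cost` and the workload size are hypothetical, chosen only for illustration; they are not part of the leaderboard.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches Price cells such as "$3 in / $15 out" or "$0.26 in / $0.38 out".
PRICE_PATTERN = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out")


@dataclass(frozen=True)
class Price:
    input_per_million: float   # ASSUMPTION: USD per 1M input tokens
    output_per_million: float  # ASSUMPTION: USD per 1M output tokens


def parse_price(text: str) -> Optional[Price]:
    """Parse a Price cell; return None for 'N/A' or any unrecognized text."""
    match = PRICE_PATTERN.fullmatch(text.strip())
    if match is None:
        return None
    return Price(float(match.group(1)), float(match.group(2)))


def estimate_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload under the per-million-token assumption."""
    return (input_tokens * price.input_per_million
            + output_tokens * price.output_per_million) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens and 500k output tokens.
    for model, cell in [
        ("Claude Sonnet 4.6", "$3 in / $15 out"),
        ("GPT-5.1", "$1.25 in / $10 out"),
        ("DeepSeek-V3.2", "$0.26 in / $0.38 out"),
        ("GPT-5", "N/A"),
    ]:
        price = parse_price(cell)
        if price is None:
            print(f"{model}: no published price")
        else:
            print(f"{model}: ${estimate_cost(price, 2_000_000, 500_000):.2f}")
```

Separating input and output rates matters because output tokens are priced several times higher than input tokens for most rows, so output-heavy workloads can reorder the models by cost.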