Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 34.7 |
| Rank | Model | Model ID | Tags | Provider | Overall score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | Kimi K2.5 | `kimi-k2.5` | multimodal, vision, multi-input reasoning | Moonshot AI | 56.1 | 68.0 | 66.8 | 49.5 | 48.5 | 38.1 | $0.6 in / $3 out |
| 42 | GPT-5.1 Thinking | `gpt-5.1-thinking-2025-11-12` | multimodal, vision, multi-input reasoning | OpenAI | 55.7 | 65.0 | 55.1 | 0.0 | 57.2 | 27.0 | $1.25 in / $10 out |
| 43 | GLM-5.1 | `glm-5.1` | code, programming, tool use | Zhipu AI | 55.2 | 67.1 | 46.6 | 54.4 | 58.3 | 30.6 | $1.4 in / $4.4 out |
| 44 | Grok-3 Mini | `grok-3-mini` | multimodal, vision, multi-input reasoning | xAI | 55.1 | 53.4 | 51.9 | 0.0 | 0.0 | 65.0 | $0.3 in / $0.5 out |
| 45 | GLM-5V-Turbo | `glm-5v-turbo` | multimodal, vision, multi-input reasoning | Zhipu AI | 54.9 | 0.0 | 0.0 | 54.9 | 0.0 | 0.0 | N/A |
| 46 | Claude Sonnet 4.5 | `claude-sonnet-4-5-20250929` | multimodal, vision, multi-input reasoning | Anthropic | 54.7 | 53.3 | 30.1 | 71.8 | 74.6 | 13.2 | $3 in / $15 out |
| 47 | Seed 2.0 Lite | `seed-2.0-lite` | multimodal, vision, multi-input reasoning | ByteDance | 54.6 | 58.1 | 0.0 | 0.0 | 50.3 | 0.0 | N/A |
| 48 | MiMo-V2-Omni | `mimo-v2-omni` | multimodal, vision, multi-input reasoning | Xiaomi | 54.5 | 0.0 | 59.2 | 0.0 | 55.6 | 44.7 | $0.4 in / $2 out |
| 49 | MiniMax M2.1 | `minimax-m2.1` | code, programming, tool use | MiniMax | 54.3 | 42.7 | 73.9 | 56.6 | 50.6 | 57.7 | $0.3 in / $1.2 out |
| 50 | GPT-5 Codex | `gpt-5-codex-2025-09-15` | code, programming, tool use | OpenAI | 54.3 | 0.0 | 0.0 | 0.0 | 54.3 | 0.0 | N/A |
| 51 | Qwen3.5-122B-A10B | `qwen3.5-122b-a10b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 54.1 | 64.8 | 66.8 | 51.6 | 41.5 | 38.1 | $0.4 in / $3.2 out |
| 52 | ChatGPT-4o Latest | `chatgpt-4o-latest` | multimodal, vision, multi-input reasoning | OpenAI | 54.1 | 56.6 | 63.5 | 0.0 | 0.0 | 32.0 | $2.5 in / $10 out |
| 53 | GPT OSS 20B High | `gpt-oss-20b-high` | text, inference | OpenAI | 53.9 | 53.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 54 | GPT OSS 120B High | `gpt-oss-120b-high` | multimodal, vision, multi-input reasoning | OpenAI | 53.8 | 44.9 | 57.3 | 0.0 | 0.0 | 73.2 | $0.1 in / $0.5 out |
| 55 | Qwen3-235B-A22B-Instruct-2507 | `qwen3-235b-a22b-instruct-2507` | text, inference | Alibaba Cloud / Qwen Team | 53.6 | 42.9 | 66.8 | 0.0 | 0.0 | 62.8 | $0.15 in / $0.8 out |
| 56 | Qwen3.5-27B | `qwen3.5-27b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 53.1 | 61.9 | 66.8 | 47.5 | 42.4 | 43.9 | $0.3 in / $2.4 out |
| 57 | Qwen3.6-27B | `qwen3.6-27b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 53.1 | 59.8 | 0.0 | 0.0 | 44.6 | 0.0 | N/A |
| 58 | GPT-5 Medium | `gpt-5-medium-2025-08-07` | multimodal, vision, multi-input reasoning | OpenAI | 53.1 | 56.9 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 in / $10 out |
| 59 | Ministral 3 (3B Reasoning 2512) | `ministral-3b-latest` | multimodal, vision, multi-input reasoning | Mistral AI | 52.8 | 22.1 | 79.7 | 0.0 | 0.0 | 95.8 | $0.1 in / $0.1 out |
| 60 | Qwen3.5-397B-A17B | `qwen3.5-397b-a17b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 52.6 | 58.6 | 66.8 | 35.6 | 60.9 | 35.3 | $0.6 in / $3.6 out |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
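As an illustration of how a multi-dimensional ranking of this kind can be aggregated, here is a minimal sketch of a weighted mean that skips dimensions with no data. Two things in it are assumptions rather than documented behavior: the equal `WEIGHTS` are hypothetical (the leaderboard's actual weighting is not given here), and treating a score of 0.0 as "not evaluated" is inferred from the table, where rows with a single non-zero dimension (for example GPT OSS 20B High) have an overall score equal to that dimension.

```python
from typing import Mapping, Optional

# Hypothetical equal weights; the leaderboard's real weights are not published here.
WEIGHTS = {
    "benchmarks": 1.0,
    "inference": 1.0,
    "agentic": 1.0,
    "programming": 1.0,
    "value": 1.0,
}


def composite_score(
    scores: Mapping[str, float],
    weights: Mapping[str, float] = WEIGHTS,
) -> Optional[float]:
    """Weighted mean over dimensions with a non-zero score.

    Assumption: 0.0 means "no data" and is excluded, so the remaining
    weights are renormalized. Returns None when every dimension is missing.
    """
    present = {k: v for k, v in scores.items() if k in weights and v > 0.0}
    total_weight = sum(weights[k] for k in present)
    if total_weight == 0:
        return None
    weighted_sum = sum(weights[k] * v for k, v in present.items())
    return round(weighted_sum / total_weight, 1)


# Row 53 from the table: only the benchmark dimension has data.
gpt_oss_20b_high = {
    "benchmarks": 53.9,
    "inference": 0.0,
    "agentic": 0.0,
    "programming": 0.0,
    "value": 0.0,
}
print(composite_score(gpt_oss_20b_high))  # 53.9, matching the listed overall score

# Row 41 from the table: all five dimensions have data.
kimi_k2_5 = {
    "benchmarks": 68.0,
    "inference": 66.8,
    "agentic": 49.5,
    "programming": 48.5,
    "value": 38.1,
}
print(composite_score(kimi_k2_5))
```

With equal weights, Kimi K2.5 comes out at 54.2 rather than its listed 56.1, so the published scores evidently use a different weighting; the sketch shows only the skip-and-renormalize idea.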