Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
- Tracked models: 296
- Providers: 27
- Benchmarked: 253
- Avg. index: 27.4
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 81 | K-EXAONE-236B-A23B (`k-exaone-236b-a23b`) · multimodal, vision, multi-input reasoning | LG AI Research | 43.4 | 43.4 | 24.9 | 0.0 | 0.0 | 49.2 | $0.6 in / $1 out |
| 82 | Gemma 4 26B-A4B (`gemma-4-26b-a4b-it`) · multimodal, vision, multi-input reasoning | Google | 43.3 | 43.3 | 66.0 | 0.0 | 0.0 | 77.5 | $0.13 in / $0.4 out |
| 83 | o1 (`o1-2024-12-17`) · multimodal, vision, multi-input reasoning | OpenAI | 42.9 | 42.9 | 19.4 | 44.7 | 6.5 | 4.9 | $15 in / $60 out |
| 84 | Sarvam-105B (`sarvam-105b`) · code, programming, tool use | Sarvam AI | 42.9 | 42.9 | 0.0 | 17.9 | 12.1 | 0.0 | N/A |
| 85 | Qwen3-235B-A22B-Instruct-2507 (`qwen3-235b-a22b-instruct-2507`) · text, inference | Alibaba Cloud / Qwen Team | 42.4 | 42.4 | 66.0 | 0.0 | 0.0 | 63.2 | $0.15 in / $0.8 out |
| 86 | MiniMax M2.1 (`minimax-m2.1`) · code, programming, tool use | MiniMax | 42.2 | 42.2 | 74.5 | 53.9 | 50.3 | 57.9 | $0.3 in / $1.2 out |
| 87 | GPT-4.5 (`gpt-4.5`) · multimodal, vision, multi-input reasoning | OpenAI | 41.9 | 41.9 | 29.7 | 35.8 | 6.0 | 7.0 | $75 in / $150 out |
| 88 | o1-preview (`o1-preview`) · code, programming, tool use | OpenAI | 41.8 | 41.8 | 33.0 | 0.0 | 9.5 | 11.8 | $15 in / $60 out |
| 89 | GPT-5 mini (`gpt-5-mini-2025-08-07`) · multimodal, vision, multi-input reasoning | OpenAI | 41.5 | 41.5 | 89.4 | 0.0 | 23.7 | 56.3 | $0.25 in / $2 out |
| 90 | Claude Sonnet 4 (`claude-sonnet-4-20250514`) · multimodal, vision, multi-input reasoning | Anthropic | 40.9 | 40.9 | 0.0 | 49.4 | 44.3 | 0.0 | N/A |
| 91 | Gemini 2.5 Flash (`gemini-2.5-flash`) · multimodal, vision, multi-input reasoning | Google | 39.6 | 39.6 | 62.8 | 0.0 | 22.9 | 42.6 | $0.3 in / $2.5 out |
| 92 | DeepSeek R1 Zero (`deepseek-r1-zero`) · text, inference | DeepSeek | 39.4 | 39.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 93 | Qwen3.5-9B (`qwen3.5-9b`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.5 | 38.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 94 | DeepSeek-V3.1 (`deepseek-v3.1`) · code, programming, tool use | DeepSeek | 38.4 | 38.4 | 39.8 | 15.2 | 28.3 | 58.8 | $0.27 in / $1 out |
| 95 | GLM-4.7-Flash (`glm-4.7-flash`) · code, programming, tool use | Zhipu AI | 38.2 | 38.2 | 29.7 | 11.4 | 20.7 | 72.1 | $0.07 in / $0.4 out |
| 96 | QvQ-72B-Preview (`qvq-72b-preview`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.2 | 38.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 97 | Ministral 3 (14B Reasoning 2512) (`ministral-14b-latest`) · multimodal, vision, multi-input reasoning | Mistral AI | 37.7 | 37.7 | 77.0 | 0.0 | 0.0 | 84.8 | $0.2 in / $0.2 out |
| 98 | Qwen3 VL 235B A22B Thinking (`qwen3-vl-235b-a22b-thinking`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 37.7 | 37.7 | 66.0 | 40.2 | 0.0 | 37.4 | $0.45 in / $3.49 out |
| 99 | Claude Opus 4 (`claude-opus-4-20250514`) · multimodal, vision, multi-input reasoning | Anthropic | 37.6 | 37.6 | 0.0 | 57.9 | 48.9 | 0.0 | N/A |
| 100 | Qwen3 VL 235B A22B Instruct (`qwen3-vl-235b-a22b-instruct`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 36.9 | 36.9 | 66.0 | 56.7 | 0.0 | 49.5 | $0.3 in / $1.5 out |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
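Since input and output tokens are priced separately, comparing models by cost requires assuming a traffic mix. A minimal sketch of that arithmetic, using prices from the table above; the 3:1 input-to-output ratio and the `blended_price` helper are illustrative assumptions, not part of this leaderboard's methodology:

```python
def blended_price(price_in: float, price_out: float, in_ratio: float = 0.75) -> float:
    """Weighted average price per 1M tokens, assuming a fixed input:output mix.

    in_ratio=0.75 models a hypothetical 3:1 input-to-output token ratio.
    """
    return price_in * in_ratio + price_out * (1.0 - in_ratio)

# Prices (USD per 1M tokens) taken from the table above.
models = {
    "o1": (15.0, 60.0),
    "MiniMax M2.1": (0.3, 1.2),
    "GLM-4.7-Flash": (0.07, 0.4),
}

# Rank the sample by blended price, cheapest first.
for name, (p_in, p_out) in sorted(models.items(), key=lambda kv: blended_price(*kv[1])):
    print(f"{name}: ${blended_price(p_in, p_out):.4f} per 1M tokens (blended)")
```

Changing `in_ratio` can reorder models whose input and output prices differ sharply, which is one reason blended-cost comparisons should always state the assumed mix.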