Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
296 tracked models · 27 providers · 253 benchmarked · average index 34.7
| Rank | Model | Model ID | Tags | Provider | Overall score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | Qwen3 VL 30B A3B Thinking | qwen3-vl-30b-a3b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 40.4 | 35.1 | 66.0 | 21.3 | 0.0 | 59.9 | $0.2 in / $1 out |
| 122 | Granite 3.3 8B Instruct | granite-3.3-8b-instruct | multimodal, vision, multi-input reasoning | IBM | 40.1 | 0.0 | 29.7 | 0.0 | 0.0 | 56.7 | $0.5 in / $0.5 out |
| 123 | Gemini 2.5 Flash | gemini-2.5-flash | multimodal, vision, multi-input reasoning | Google | 40.0 | 39.6 | 62.8 | 0.0 | 22.9 | 42.6 | $0.3 in / $2.5 out |
| 124 | Qwen3 VL 32B Thinking | qwen3-vl-32b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 39.8 | 44.3 | 0.0 | 34.6 | 0.0 | 0.0 | N/A |
| 125 | DeepSeek-V3 0324 | deepseek-v3-0324 | text, inference | DeepSeek | 39.5 | 32.8 | 39.8 | 0.0 | 0.0 | 57.7 | $0.28 in / $1.14 out |
| 126 | DeepSeek R1 Zero | deepseek-r1-zero | text, inference | DeepSeek | 39.4 | 39.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 127 | Qwen3 VL 8B Thinking | qwen3-vl-8b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 39.4 | 35.6 | 66.0 | 23.5 | 0.0 | 45.6 | $0.18 in / $2.09 out |
| 128 | Qwen3 VL 30B A3B Instruct | qwen3-vl-30b-a3b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 39.2 | 28.3 | 66.0 | 23.6 | 0.0 | 63.7 | $0.2 in / $0.7 out |
| 129 | Nova Pro | nova-pro | multimodal, vision, multi-input reasoning | Amazon | 39.2 | 20.0 | 70.5 | 0.0 | 0.0 | 43.2 | $0.8 in / $3.2 out |
| 130 | Qwen2.5 7B Instruct | qwen-2.5-7b-instruct | text, inference | Alibaba Cloud / Qwen Team | 39.2 | 7.4 | 71.1 | 0.0 | 0.0 | 77.2 | $0.3 in / $0.3 out |
| 131 | K-EXAONE-236B-A23B | k-exaone-236b-a23b | multimodal, vision, multi-input reasoning | LG AI Research | 39.0 | 43.4 | 24.9 | 0.0 | 0.0 | 49.2 | $0.6 in / $1 out |
| 132 | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | multimodal, vision, multi-input reasoning | Anthropic | 38.9 | 43.5 | 30.5 | 49.0 | 39.6 | 13.3 | $3 in / $15 out |
| 133 | Qwen3.5-9B | qwen3.5-9b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.5 | 38.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 134 | Qwen3 30B A3B | qwen3-30b-a3b | text, inference | Alibaba Cloud / Qwen Team | 38.4 | 25.6 | 40.1 | 0.0 | 0.0 | 71.3 | $0.1 in / $0.44 out |
| 135 | DeepSeek-V3.2 (Thinking) | deepseek-reasoner | code, programming, tool use | DeepSeek | 38.2 | 52.5 | 0.0 | 15.5 | 44.9 | 0.0 | N/A |
| 136 | QvQ-72B-Preview | qvq-72b-preview | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.2 | 38.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 137 | Gemini 1.5 Pro | gemini-1.5-pro | multimodal, vision, multi-input reasoning | Google | 38.2 | 27.6 | 65.2 | 0.0 | 0.0 | 24.3 | $2.5 in / $10 out |
| 138 | GPT OSS 120B | gpt-oss-120b | text, inference | OpenAI | 38.1 | 36.1 | 34.5 | 26.8 | 0.0 | 76.4 | $0.09 in / $0.45 out |
| 139 | Claude 3.5 Sonnet | claude-3-5-sonnet-20240620 | multimodal, vision, multi-input reasoning | Anthropic | 37.9 | 25.4 | 68.2 | 0.0 | 0.0 | 24.6 | $3 in / $15 out |
| 140 | LongCat-Flash-Thinking | longcat-flash-thinking | code, programming, tool use | Meituan | 37.6 | 50.2 | 0.0 | 0.0 | 21.6 | 0.0 | N/A |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
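The listing writes prices as strings such as `$0.28 in / $1.14 out`, with `N/A` where no price is published. For readers who want to compare rows by cost, the following is a minimal sketch of parsing that format. The helper names `parse_price` and `blended_price` and the `input_share` workload ratio are hypothetical, introduced only for illustration. The listing does not state the pricing unit, so the sketch compares the raw numbers without assuming one.

```python
import re
from typing import Optional, Tuple

# Matches strings shaped like "$0.28 in / $1.14 out" (the format used in the price column).
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for "N/A" or any unrecognized cell."""
    match = _PRICE_RE.match(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def blended_price(cell: str, input_share: float = 0.75) -> Optional[float]:
    """Weighted mix of input and output price.

    input_share is a hypothetical workload ratio (fraction of tokens that are input);
    it is an assumption of this sketch, not something the leaderboard defines.
    """
    parsed = parse_price(cell)
    if parsed is None:
        return None
    price_in, price_out = parsed
    return input_share * price_in + (1.0 - input_share) * price_out


# A few price strings copied from the listing.
rows = {
    "DeepSeek-V3 0324": "$0.28 in / $1.14 out",
    "Claude 3.7 Sonnet": "$3 in / $15 out",
    "GPT OSS 120B": "$0.09 in / $0.45 out",
    "DeepSeek R1 Zero": "N/A",
}

priced = {name: blended_price(cell) for name, cell in rows.items()}

# Models without a listed price are skipped rather than treated as free.
cheapest_first = sorted(
    (name for name, price in priced.items() if price is not None),
    key=lambda name: priced[name],
)
print(cheapest_first)
```

Returning `None` for unpriced models keeps them out of the ordering, so an `N/A` row is never ranked as if it cost nothing. The leaderboard's own Value score may weigh cost differently, since its formula is not given here.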