Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**294** tracked models · **27** providers · **251** benchmarked · **31.8** average index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 61 | Qwen3 VL 30B A3B Instruct (`qwen3-vl-30b-a3b-instruct`), *multimodal · vision · multi-input reasoning* | Alibaba Cloud / Qwen Team | 66.8 | 28.7 | 66.8 | 23.6 | 0.0 | 63.3 | $0.2 in / $0.7 out |
| 62 | Qwen3 VL 30B A3B Thinking (`qwen3-vl-30b-a3b-thinking`), *multimodal · vision · multi-input reasoning* | Alibaba Cloud / Qwen Team | 66.8 | 35.5 | 66.8 | 21.3 | 0.0 | 60.0 | $0.2 in / $1 out |
| 63 | Qwen3 VL 4B Instruct (`qwen3-vl-4b-instruct`), *multimodal · vision · multi-input reasoning* | Alibaba Cloud / Qwen Team | 66.8 | 19.7 | 66.8 | 19.5 | 0.0 | 70.6 | $0.1 in / $0.6 out |
| 64 | Qwen3 VL 4B Thinking (`qwen3-vl-4b-thinking`), *multimodal · vision · multi-input reasoning* | Alibaba Cloud / Qwen Team | 66.8 | 23.1 | 66.8 | 18.9 | 0.0 | 60.6 | $0.1 in / $1 out |
| 65 | Qwen3 VL 8B Instruct (`qwen3-vl-8b-instruct`), *multimodal · vision · multi-input reasoning* | Alibaba Cloud / Qwen Team | 66.8 | 9.8 | 66.8 | 26.7 | 0.0 | 75.6 | $0.08 in / $0.5 out |
| 66 | Qwen3 VL 8B Thinking (`qwen3-vl-8b-thinking`), *multimodal · vision · multi-input reasoning* | Alibaba Cloud / Qwen Team | 66.8 | 35.9 | 66.8 | 23.5 | 0.0 | 45.6 | $0.18 in / $2.09 out |
| 67 | Gemini 1.5 Pro (`gemini-1.5-pro`), *multimodal · vision · multi-input reasoning* | Google | 65.5 | 27.8 | 65.5 | 0.0 | 0.0 | 24.6 | $2.5 in / $10 out |
| 68 | Jamba 1.5 Mini (`jamba-1.5-mini`), *text · inference* | AI21 Labs | 65.2 | 4.8 | 65.2 | 0.0 | 0.0 | 72.4 | $0.2 in / $0.4 out |
| 69 | Devstral Medium (`devstral-medium-2507`), *code · programming · tool use* | Mistral AI | 64.5 | 0.0 | 64.5 | 0.0 | 24.7 | 53.2 | $0.4 in / $2 out |
| 70 | Devstral Small 1.1 (`devstral-small-2507`), *code · programming · tool use* | Mistral AI | 64.5 | 0.0 | 64.5 | 0.0 | 15.0 | 85.0 | $0.1 in / $0.3 out |
| 71 | Mistral Small 3.1 24B Base (`mistral-small-3.1-24b-base-2503`), *multimodal · vision · multi-input reasoning* | Mistral AI | 64.5 | 13.5 | 64.5 | 0.0 | 0.0 | 85.0 | $0.1 in / $0.3 out |
| 72 | Grok-4.1 (`grok-4.1-2025-11-17`), *multimodal · vision · multi-input reasoning* | xAI | 64.2 | 0.0 | 64.2 | 0.0 | 0.0 | 22.6 | $3 in / $15 out |
| 73 | ChatGPT-4o Latest (`chatgpt-4o-latest`), *multimodal · vision · multi-input reasoning* | OpenAI | 63.5 | 56.6 | 63.5 | 0.0 | 0.0 | 32.0 | $2.5 in / $10 out |
| 74 | Gemini 2.0 Flash-Lite (`gemini-2.0-flash-lite`), *multimodal · vision · multi-input reasoning* | Google | 63.2 | 25.7 | 63.2 | 0.0 | 0.0 | 79.7 | $0.07 in / $0.3 out |
| 75 | Gemini 2.5 Flash (`gemini-2.5-flash`), *multimodal · vision · multi-input reasoning* | Google | 63.2 | 40.1 | 63.2 | 0.0 | 23.4 | 42.6 | $0.3 in / $2.5 out |
| 76 | Gemini 2.5 Pro (`gemini-2.5-pro`), *multimodal · vision · multi-input reasoning* | Google | 63.2 | 44.6 | 63.2 | 0.0 | 25.6 | 27.9 | $1.25 in / $10 out |
| 77 | Gemini 2.5 Pro Preview 06-05 (`gemini-2.5-pro-preview-06-05`), *multimodal · vision · multi-input reasoning* | Google | 63.2 | 51.7 | 63.2 | 0.0 | 30.0 | 27.9 | $1.25 in / $10 out |
| 78 | Step-3.5-Flash (`step-3.5-flash`), *code · programming · tool use* | StepFun | 63.2 | 62.3 | 63.2 | 45.3 | 53.0 | 82.1 | $0.1 in / $0.4 out |
| 79 | GPT-5.1 Medium (`gpt-5.1-medium-2025-11-12`), *multimodal · vision · multi-input reasoning* | OpenAI | 61.6 | 63.6 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 in / $10 out |
| 80 | GPT-5 Medium (`gpt-5-medium-2025-08-07`), *multimodal · vision · multi-input reasoning* | OpenAI | 61.6 | 56.9 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 in / $10 out |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
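To make the methodology note concrete, here is a minimal sketch of how a composite index over the table's dimensions could be computed as a weighted mean of per-dimension scores (all on a 0–100 scale). The weights and the `composite_index` helper are illustrative assumptions for this sketch, not the leaderboard's actual formula.

```python
# Hypothetical composite leaderboard index: a weighted mean of per-dimension
# scores on a 0-100 scale. The weights below are illustrative assumptions,
# not the leaderboard's published methodology.
def composite_index(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average over the dimensions present in `scores`."""
    total_weight = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total_weight

# Assumed weighting scheme (sums to 1.0).
weights = {"benchmarks": 0.30, "inference": 0.25, "agentic": 0.15,
           "programming": 0.15, "value": 0.15}

# Step-3.5-Flash's row from the table above.
step_35_flash = {"benchmarks": 62.3, "inference": 63.2, "agentic": 45.3,
                 "programming": 53.0, "value": 82.1}

print(f"{composite_index(step_35_flash, weights):.1f}")
```

Because `composite_index` normalizes by the total weight of the dimensions actually present, a model with no programming evaluation can still be scored over its remaining dimensions rather than being penalized with a zero.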