Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
296 tracked models · 27 providers · 253 benchmarked · 30.8 average index
| Rank | Model | Tags | Provider | Score (Value / Price) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 81 | Granite 3.3 8B Instruct (`granite-3.3-8b-instruct`) | multimodal · vision · multi-input reasoning | IBM | 56.7 | 0.0 | 29.7 | 0.0 | 0.0 | 56.7 | $0.5 in / $0.5 out |
| 82 | GPT-5 mini (`gpt-5-mini-2025-08-07`) | multimodal · vision · multi-input reasoning | OpenAI | 56.3 | 41.5 | 89.4 | 0.0 | 23.7 | 56.3 | $0.25 in / $2 out |
| 83 | Command R+ (`command-r-plus-04-2024`) | text · inference | Cohere | 55.4 | 0.0 | 32.5 | 0.0 | 0.0 | 55.4 | $0.25 in / $1 out |
| 84 | Gemini 1.0 Pro (`gemini-1.0-pro`) | multimodal · vision · multi-input reasoning | Google | 55.4 | 3.2 | 57.2 | 0.0 | 0.0 | 55.4 | $0.5 in / $1.5 out |
| 85 | Llama 3.2 90B Instruct (`llama-3.2-90b-instruct`) | multimodal · vision · multi-input reasoning | Meta | 54.9 | 16.3 | 11.3 | 0.0 | 0.0 | 54.9 | $0.35 in / $0.4 out |
| 86 | MiniMax M2 (`minimax-m2`) | code · programming · tool use | MiniMax | 54.9 | 31.9 | 84.0 | 41.1 | 42.4 | 54.9 | $0.3 in / $1.2 out |
| 87 | MiniMax M2.7 (`minimax-m2.7`) | code · programming · tool use | MiniMax | 54.9 | 0.0 | 52.2 | 44.9 | 40.1 | 54.9 | $0.3 in / $1.2 out |
| 88 | Qwen2.5 72B Instruct (`qwen-2.5-72b-instruct`) | text · inference | Alibaba Cloud / Qwen Team | 54.5 | 17.8 | 15.0 | 0.0 | 0.0 | 54.5 | $0.35 in / $0.4 out |
| 89 | Devstral Medium (`devstral-medium-2507`) | code · programming · tool use | Mistral AI | 53.4 | 0.0 | 64.8 | 0.0 | 24.2 | 53.4 | $0.4 in / $2 out |
| 90 | Mistral Small (`mistral-small-2409`) | text · inference | Mistral AI | 51.9 | 0.0 | 2.1 | 0.0 | 0.0 | 51.9 | $0.2 in / $0.6 out |
| 91 | Qwen3-Next-80B-A3B-Instruct (`qwen3-next-80b-a3b-instruct`) | text · inference | Alibaba Cloud / Qwen Team | 51.9 | 29.5 | 6.1 | 17.9 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 92 | Qwen3-Next-80B-A3B-Thinking (`qwen3-next-80b-a3b-thinking`) | text · inference | Alibaba Cloud / Qwen Team | 51.9 | 44.7 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 93 | Gemini 3.1 Flash-Lite (`gemini-3.1-flash-lite-preview`) | multimodal · vision · multi-input reasoning | Google | 50.5 | 56.0 | 84.0 | 0.0 | 0.0 | 50.5 | $0.25 in / $1.5 out |
| 94 | Grok Code Fast 1 (`grok-code-fast-1`) | code · programming · tool use | xAI | 49.7 | 0.0 | 47.7 | 0.0 | 38.8 | 49.7 | $0.2 in / $1.5 out |
| 95 | Qwen3 VL 235B A22B Instruct (`qwen3-vl-235b-a22b-instruct`) | multimodal · vision · multi-input reasoning | Alibaba Cloud / Qwen Team | 49.5 | 36.9 | 66.0 | 56.7 | 0.0 | 49.5 | $0.3 in / $1.5 out |
| 96 | GPT-3.5 Turbo (`gpt-3.5-turbo-0125`) | multimodal · vision · multi-input reasoning | OpenAI | 49.4 | 2.5 | 36.7 | 0.0 | 0.0 | 49.4 | $0.5 in / $1.5 out |
| 97 | K-EXAONE-236B-A23B (`k-exaone-236b-a23b`) | multimodal · vision · multi-input reasoning | LG AI Research | 49.2 | 43.4 | 24.9 | 0.0 | 0.0 | 49.2 | $0.6 in / $1 out |
| 98 | Qwen3.5-35B-A3B (`qwen3.5-35b-a3b`) | multimodal · vision · multi-input reasoning | Alibaba Cloud / Qwen Team | 46.4 | 56.9 | 66.0 | 43.3 | 33.6 | 46.4 | $0.25 in / $2 out |
| 99 | Qwen3 VL 8B Thinking (`qwen3-vl-8b-thinking`) | multimodal · vision · multi-input reasoning | Alibaba Cloud / Qwen Team | 45.6 | 35.6 | 66.0 | 23.5 | 0.0 | 45.6 | $0.18 in / $2.09 out |
| 100 | MiMo-V2-Omni (`mimo-v2-omni`) | multimodal · vision · multi-input reasoning | Xiaomi | 44.8 | 0.0 | 58.6 | 0.0 | 54.4 | 44.8 | $0.4 in / $2 out |
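The Price column's "in / out" rates can be turned into a per-request cost estimate. A minimal sketch, assuming the rates are USD per million tokens (the convention most provider price lists use; the table itself does not state the unit):

```python
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Estimate the USD cost of one request.

    price_in / price_out are assumed to be USD per 1M tokens,
    matching the table's "$X in / $Y out" notation.
    """
    return (price_in * tokens_in + price_out * tokens_out) / 1_000_000

# Example: MiniMax M2 at $0.3 in / $1.2 out,
# with a 20k-token prompt and a 2k-token completion.
cost = request_cost(0.3, 1.2, 20_000, 2_000)  # 0.006 + 0.0024 = $0.0084
```

Output-heavy workloads are dominated by the "out" rate, which is why models with identical scores can differ sharply in effective cost.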
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
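The footnote describes a blend of several dimensions but does not publish the formula. A purely illustrative composite over the table's sub-scores, with made-up weights, might look like this; both the weighting and the equal-scale assumption are guesses, not the leaderboard's actual method:

```python
def composite(benchmarks: float, inference: float, agentic: float,
              programming: float, value: float,
              weights: tuple = (0.3, 0.2, 0.15, 0.15, 0.2)) -> float:
    """Weighted average of the five 0-100 sub-scores.

    The weights are hypothetical, chosen only to show the shape of
    such a blend; they are not the site's published methodology.
    """
    dims = (benchmarks, inference, agentic, programming, value)
    return sum(w * d for w, d in zip(weights, dims))

# A model at 100 on every dimension scores 100 under any weights summing to 1.
top = composite(100, 100, 100, 100, 100)  # 100.0
```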