Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **34.7** avg. index
| Rank | Model | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| 101 | Gemini 2.5 Pro Preview 06-05 (`gemini-2.5-pro-preview-06-05`) | Google | multimodal, vision, multi-input reasoning | 44.2 | 51.2 | 62.8 | 0.0 | 29.3 | 27.6 | $1.25 in / $10 out |
| 102 | Qwen3 VL 235B A22B Thinking (`qwen3-vl-235b-a22b-thinking`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 44.2 | 37.7 | 66.0 | 40.2 | 0.0 | 37.4 | $0.45 in / $3.49 out |
| 103 | Nova Lite (`nova-lite`) | Amazon | multimodal, vision, multi-input reasoning | 44.0 | 13.5 | 70.5 | 0.0 | 0.0 | 86.7 | $0.06 in / $0.24 out |
| 104 | Grok Code Fast 1 (`grok-code-fast-1`) | xAI | code, programming, tool use | 44.0 | 0.0 | 47.7 | 0.0 | 38.8 | 49.7 | $0.2 in / $1.5 out |
| 105 | Devstral Medium (`devstral-medium-2507`) | Mistral AI | code, programming, tool use | 43.8 | 0.0 | 64.8 | 0.0 | 24.2 | 53.4 | $0.4 in / $2 out |
| 106 | Qwen3-Coder 480B A35B Instruct (`qwen3-coder-480b-a35b-instruct`) | Alibaba Cloud / Qwen Team | code, programming, tool use | 43.6 | 0.0 | 0.0 | 50.7 | 35.8 | 0.0 | N/A |
| 107 | Qwen3-235B-A22B-Thinking-2507 (`qwen3-235b-a22b-thinking-2507`) | Alibaba Cloud / Qwen Team | text, inference | 43.5 | 46.4 | 66.0 | 26.8 | 0.0 | 39.6 | $0.3 in / $3 out |
| 108 | GPT-5.4 Mini (`gpt-5.4-mini`) | OpenAI | text, text-to-text, language | 43.3 | 56.8 | 76.5 | 23.8 | 28.1 | 32.4 | $0.75 in / $4.5 out |
| 109 | Mistral NeMo Instruct (`mistral-nemo-instruct-2407`) | Mistral AI | text, inference | 42.9 | 0.0 | 21.4 | 0.0 | 0.0 | 77.3 | $0.15 in / $0.15 out |
| 110 | GPT-5.3 Chat (`gpt-5.3-chat-latest`) | OpenAI | multimodal, vision, multi-input reasoning | 42.6 | 0.0 | 52.7 | 0.0 | 0.0 | 26.5 | $1.75 in / $14 out |
| 111 | LongCat-Flash-Chat (`longcat-flash-chat`) | Meituan | code, programming, tool use | 42.4 | 27.9 | 52.7 | 49.2 | 39.1 | 57.9 | $0.3 in / $1.2 out |
| 112 | Mistral Small 3.1 24B Base (`mistral-small-3.1-24b-base-2503`) | Mistral AI | multimodal, vision, multi-input reasoning | 42.0 | 13.4 | 64.8 | 0.0 | 0.0 | 85.3 | $0.1 in / $0.3 out |
| 113 | GLM-4.6 (`glm-4.6`) | Zhipu AI | multimodal, vision, multi-input reasoning | 41.8 | 46.5 | 34.5 | 37.3 | 45.7 | 42.9 | $0.55 in / $2.19 out |
| 114 | Llama 3.2 3B Instruct (`llama-3.2-3b-instruct`) | Meta | text, inference | 41.4 | 5.2 | 68.9 | 0.0 | 0.0 | 98.8 | $0.01 in / $0.02 out |
| 115 | Qwen3 235B A22B (`qwen3-235b-a22b`) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 41.3 | 30.5 | 33.5 | 0.0 | 0.0 | 84.0 | $0.1 in / $0.1 out |
| 116 | Command R+ (`command-r-plus-04-2024`) | Cohere | text, inference | 41.3 | 0.0 | 32.5 | 0.0 | 0.0 | 55.4 | $0.25 in / $1 out |
| 117 | LongCat-Flash-Lite (`longcat-flash-lite`) | Meituan | code, programming, tool use | 41.1 | 24.5 | 83.6 | 29.5 | 25.1 | 83.1 | $0.1 in / $0.4 out |
| 118 | DeepSeek-V3.2-Exp (`deepseek-v3.2-exp`) | DeepSeek | code, programming, tool use | 41.0 | 52.3 | 0.0 | 28.6 | 40.1 | 0.0 | N/A |
| 119 | GPT-5.2 Codex (`gpt-5.2-codex`) | OpenAI | multimodal, vision, multi-input reasoning | 40.6 | 0.0 | 49.0 | 0.0 | 44.1 | 19.6 | $1.75 in / $14 out |
| 120 | Gemini 2.5 Pro (`gemini-2.5-pro`) | Google | multimodal, vision, multi-input reasoning | 40.4 | 44.2 | 62.8 | 0.0 | 25.0 | 27.6 | $1.25 in / $10 out |
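The Price column lists separate input and output rates, which makes models hard to compare directly. One common way to collapse them into a single figure is a blended rate under an assumed traffic mix. A minimal sketch, assuming the listed prices are USD per million tokens (the page does not state the unit) and a hypothetical 75/25 input/output split; the function name and ratio are illustrative, not part of the leaderboard's methodology:

```python
def blended_price(price_in: float, price_out: float, out_ratio: float = 0.25) -> float:
    """Blend input/output prices into one figure per million tokens.

    Assumes a 75/25 input/output token mix (hypothetical default);
    adjust out_ratio for chattier or more generation-heavy workloads.
    """
    return price_in * (1 - out_ratio) + price_out * out_ratio

# Example: Gemini 2.5 Pro Preview 06-05 at $1.25 in / $10 out
print(blended_price(1.25, 10.0))  # 3.4375
```

Under this mix, a model with cheap input but expensive output (e.g. $0.3 in / $3 out) can cost more per blended token than one with flatter pricing.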
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
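The note above describes the overall score as a multi-dimensional composite, but the actual weighting is not published. A minimal sketch of one plausible scheme: an equal-weight mean over the axes a model was actually rated on, treating 0.0 sub-scores as "not evaluated". The `composite` function and the skip-zeros rule are assumptions for illustration only:

```python
from typing import Mapping

# The five published axes; equal weights are a guess, not the site's formula.
AXES = ("benchmarks", "inference", "agentic", "programming", "value")

def composite(scores: Mapping[str, float]) -> float:
    """Equal-weight mean over the non-zero (i.e. evaluated) axes."""
    rated = [scores[a] for a in AXES if scores.get(a, 0.0) > 0.0]
    return round(sum(rated) / len(rated), 1) if rated else 0.0

# GLM-4.6, which has scores on all five axes
print(composite({"benchmarks": 46.5, "inference": 34.5,
                 "agentic": 37.3, "programming": 45.7, "value": 42.9}))  # 41.4
```

This simple mean gives 41.4 for GLM-4.6 against a published overall of 41.8, so the real weighting evidently differs; the sketch only illustrates the shape of the calculation.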