Every major AI model, ranked on benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, and updated continuously from published evaluation data.
- Tracked models: 296
- Providers: 27
- Benchmarked: 253
- Avg. index: 30.8

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 101 | Llama 3.1 405B Instruct | `llama-3.1-405b-instruct` | text, inference | Meta | 44.5 | 20.0 | 21.4 | 0.0 | 0.0 | 44.5 | $0.89 in / $0.89 out |
| 102 | Mistral Large 3 (675B Instruct 2512) | `mistral-large-latest` | multimodal, vision, multi-input reasoning | Mistral AI | 44.5 | 22.2 | 40.1 | 0.0 | 0.0 | 44.5 | $0.5 in / $1.5 out |
| 103 | Qwen3.5-27B | `qwen3.5-27b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.0 | 61.8 | 66.0 | 46.5 | 41.4 | 44.0 | $0.3 in / $2.4 out |
| 104 | Nova Pro | `nova-pro` | multimodal, vision, multi-input reasoning | Amazon | 43.2 | 20.0 | 70.5 | 0.0 | 0.0 | 43.2 | $0.8 in / $3.2 out |
| 105 | GLM-4.6 | `glm-4.6` | multimodal, vision, multi-input reasoning | Zhipu AI | 42.9 | 46.5 | 34.5 | 37.3 | 45.7 | 42.9 | $0.55 in / $2.19 out |
| 106 | Gemini 2.5 Flash | `gemini-2.5-flash` | multimodal, vision, multi-input reasoning | Google | 42.6 | 39.6 | 62.8 | 0.0 | 22.9 | 42.6 | $0.3 in / $2.5 out |
| 107 | MiniMax M1 80K | `minimax-m1-80k` | code, programming, tool use | MiniMax | 41.8 | 24.2 | 84.0 | 20.9 | 19.0 | 41.8 | $0.55 in / $2.2 out |
| 108 | o3-mini | `o3-mini` | code, programming, tool use | OpenAI | 41.6 | 25.6 | 70.4 | 11.9 | 12.2 | 41.6 | $1.1 in / $4.4 out |
| 109 | o4-mini | `o4-mini` | multimodal, vision, multi-input reasoning | OpenAI | 41.6 | 48.5 | 70.4 | 37.6 | 31.9 | 41.6 | $1.1 in / $4.4 out |
| 110 | GLM-4.7 | `glm-4.7` | multimodal, vision, multi-input reasoning | Zhipu AI | 40.7 | 62.4 | 52.2 | 27.6 | 43.8 | 40.7 | $0.6 in / $2.2 out |
| 111 | Kimi K2 0905 | `kimi-k2-0905` | text, inference | Moonshot AI | 40.1 | 44.0 | 66.0 | 0.0 | 0.0 | 40.1 | $0.6 in / $2.5 out |
| 112 | Qwen3-235B-A22B-Thinking-2507 | `qwen3-235b-a22b-thinking-2507` | text, inference | Alibaba Cloud / Qwen Team | 39.6 | 46.4 | 66.0 | 26.8 | 0.0 | 39.6 | $0.3 in / $3 out |
| 113 | Gemini 3 Flash | `gemini-3-flash-preview` | multimodal, vision, multi-input reasoning | Google | 39.0 | 71.1 | 84.0 | 41.2 | 65.1 | 39.0 | $0.5 in / $3 out |
| 114 | Kimi K2.5 | `kimi-k2.5` | multimodal, vision, multi-input reasoning | Moonshot AI | 38.2 | 67.9 | 66.0 | 48.9 | 47.7 | 38.2 | $0.6 in / $3 out |
| 115 | Qwen3.5-122B-A10B | `qwen3.5-122b-a10b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.2 | 64.5 | 66.0 | 50.5 | 40.5 | 38.2 | $0.4 in / $3.2 out |
| 116 | Claude Haiku 4.5 | `claude-haiku-4-5-20251001` | multimodal, vision, multi-input reasoning | Anthropic | 37.7 | 32.7 | 61.8 | 54.2 | 56.6 | 37.7 | $1 in / $5 out |
| 117 | Qwen3 VL 235B A22B Thinking | `qwen3-vl-235b-a22b-thinking` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 37.4 | 37.7 | 66.0 | 40.2 | 0.0 | 37.4 | $0.45 in / $3.49 out |
| 118 | MiMo-V2-Pro | `mimo-v2-pro` | code, programming, tool use | Xiaomi | 36.5 | 0.0 | 84.0 | 0.0 | 65.1 | 36.5 | $1 in / $3 out |
| 119 | Qwen3.5-397B-A17B | `qwen3.5-397b-a17b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 35.4 | 58.0 | 66.0 | 33.0 | 59.5 | 35.4 | $0.6 in / $3.6 out |
| 120 | DeepSeek-R1 | `deepseek-r1` | text, inference | DeepSeek | 35.1 | 0.0 | 14.3 | 0.0 | 0.0 | 35.1 | $0.55 in / $2.19 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
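
To compare entries on cost for a concrete workload, the Price strings can be parsed and applied to token counts. The sketch below is illustrative only: the helper names `parse_price` and `workload_cost` are hypothetical, the workload size is made up, and it assumes prices are quoted in dollars per 1,000,000 tokens, which this page does not state.

```python
import re

# Matches price strings in the leaderboard's format, e.g. "$0.55 in / $2.19 out".
PRICE_RE = re.compile(r"\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out")


def parse_price(cell: str) -> tuple[float, float]:
    """Return (input_price, output_price) parsed from a Price cell."""
    match = PRICE_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))


def workload_cost(cell: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload, ASSUMING prices are per 1,000,000 tokens."""
    price_in, price_out = parse_price(cell)
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


# Price cells copied from the table above.
prices = {
    "Llama 3.1 405B Instruct": "$0.89 in / $0.89 out",
    "GLM-4.6": "$0.55 in / $2.19 out",
    "o4-mini": "$1.1 in / $4.4 out",
}

# Hypothetical workload: 2M input tokens and 0.5M output tokens.
IN_TOKENS, OUT_TOKENS = 2_000_000, 500_000

# List the models from cheapest to most expensive for this workload.
for model, cell in sorted(
    prices.items(), key=lambda kv: workload_cost(kv[1], IN_TOKENS, OUT_TOKENS)
):
    print(f"{model}: ${workload_cost(cell, IN_TOKENS, OUT_TOKENS):.2f}")
```

Because input and output prices differ, the cheapest model depends on the input-to-output token ratio of the workload, so the resulting order need not match the Value column.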