Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **27.4** avg. index
| Rank | Model | Provider | Type | Score | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 101 | GPT OSS 120B (`gpt-oss-120b`) | OpenAI | Text | 36.1 | 34.5 | 26.8 | 0.0 | 76.4 | $0.09 in / $0.45 out |
| 102 | Qwen3 VL 8B Thinking (`qwen3-vl-8b-thinking`) | Alibaba Cloud / Qwen Team | Multimodal | 35.6 | 66.0 | 23.5 | 0.0 | 45.6 | $0.18 in / $2.09 out |
| 103 | Llama 3.1 Nemotron Ultra 253B v1 (`llama-3.1-nemotron-ultra-253b-v1`) | NVIDIA | Text | 35.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 104 | Llama 4 Maverick (`llama-4-maverick`) | Meta | Multimodal | 35.4 | 55.8 | 0.0 | 0.0 | 57.1 | $0.17 in / $0.85 out |
| 105 | Kimi-k1.5 (`kimi-k1.5`) | Moonshot AI | Multimodal | 35.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 106 | Qwen3 VL 30B A3B Thinking (`qwen3-vl-30b-a3b-thinking`) | Alibaba Cloud / Qwen Team | Multimodal | 35.1 | 66.0 | 21.3 | 0.0 | 59.9 | $0.2 in / $1 out |
| 107 | Mistral Small 4 (`mistral-small-latest`) | Mistral AI | Multimodal | 34.7 | 55.2 | 0.0 | 0.0 | 66.8 | $0.15 in / $0.6 out |
| 108 | GLM-4.5 (`glm-4.5`) | Zhipu AI | Code / tool use | 33.8 | 0.0 | 36.4 | 40.3 | 0.0 | N/A |
| 109 | Claude 3.5 Sonnet (`claude-3-5-sonnet-20241022`) | Anthropic | Multimodal | 33.7 | 68.2 | 38.7 | 12.9 | 24.6 | $3 in / $15 out |
| 110 | Gemini 2.0 Flash (`gemini-2.0-flash`) | Google | Multimodal | 33.3 | 93.9 | 0.0 | 0.0 | 82.5 | $0.1 in / $0.4 out |
| 111 | DeepSeek-V3 0324 (`deepseek-v3-0324`) | DeepSeek | Text | 32.8 | 39.8 | 0.0 | 0.0 | 57.7 | $0.28 in / $1.14 out |
| 112 | Claude Haiku 4.5 (`claude-haiku-4-5-20251001`) | Anthropic | Multimodal | 32.7 | 61.8 | 54.2 | 56.6 | 37.7 | $1 in / $5 out |
| 113 | Qwen3.5-4B (`qwen3.5-4b`) | Alibaba Cloud / Qwen Team | Multimodal | 32.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 114 | MiniMax M2 (`minimax-m2`) | MiniMax | Code / tool use | 31.9 | 84.0 | 41.1 | 42.4 | 54.9 | $0.3 in / $1.2 out |
| 115 | Ministral 3 (8B Reasoning 2512) (`ministral-8b-latest`) | Mistral AI | Multimodal | 31.6 | 84.5 | 0.0 | 0.0 | 92.1 | $0.15 in / $0.15 out |
| 116 | GPT-4o (`gpt-4o-2024-08-06`) | OpenAI | Multimodal | 31.5 | 46.7 | 14.9 | 4.3 | 26.8 | $2.5 in / $10 out |
| 117 | Phi 4 Reasoning Plus (`phi-4-reasoning-plus`) | Microsoft | Text | 31.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 118 | Qwen3 235B A22B (`qwen3-235b-a22b`) | Alibaba Cloud / Qwen Team | Multimodal | 30.5 | 33.5 | 0.0 | 0.0 | 84.0 | $0.1 in / $0.1 out |
| 119 | Hermes 3 70B (`hermes-3-70b`) | Nous Research | Text | 30.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 120 | Qwen3 Max (`qwen3-max`) | Alibaba Cloud / Qwen Team | Code / tool use | 29.8 | 55.2 | 0.0 | 35.8 | 31.3 | $0.5 in / $5 out |
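Prices in the table are quoted per million tokens for input and output separately. To compare models on a single number, the two can be blended with an assumed input:output token ratio; the sketch below uses 3:1, which is an assumption for illustration, not a ratio the leaderboard publishes.

```python
# Blended $/1M-token price from separate input/output prices.
# The 3:1 input:output token ratio is an assumption; adjust it
# to match your actual workload.
def blended_price(price_in: float, price_out: float, in_ratio: float = 3.0) -> float:
    """Weighted average of input and output prices per 1M tokens."""
    return (price_in * in_ratio + price_out) / (in_ratio + 1)

# A few entries copied from the Price column above ($ in, $ out).
prices = {
    "gpt-oss-120b": (0.09, 0.45),
    "llama-4-maverick": (0.17, 0.85),
    "deepseek-v3-0324": (0.28, 1.14),
    "minimax-m2": (0.30, 1.20),
}

# Cheapest-first under the assumed ratio.
for model, (p_in, p_out) in sorted(prices.items(), key=lambda kv: blended_price(*kv[1])):
    print(f"{model:20s} ${blended_price(p_in, p_out):.3f} / 1M tokens blended")
```

Under a different ratio the ordering can change, which is why the Price column keeps input and output rates separate.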
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
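The note above describes a multi-dimensional ranking but does not publish the weighting. As a minimal sketch, a composite index could be an equal-weight mean of the per-dimension sub-scores shown in the table; the equal weights here are a hypothetical choice, not the leaderboard's actual formula.

```python
# Hypothetical composite index: equal-weight mean of the published
# sub-scores. Equal weighting is an assumption for illustration;
# the leaderboard's real weights are not published.
DIMENSIONS = ("benchmarks", "inference", "agentic", "programming", "value")

def composite(scores: dict) -> float:
    """Mean of the five sub-scores."""
    return sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS)

# Two rows copied from the table above.
rows = {
    "claude-haiku-4-5": {"benchmarks": 32.7, "inference": 61.8, "agentic": 54.2,
                         "programming": 56.6, "value": 37.7},
    "minimax-m2": {"benchmarks": 31.9, "inference": 84.0, "agentic": 41.1,
                   "programming": 42.4, "value": 54.9},
}

for name, scores in sorted(rows.items(), key=lambda kv: -composite(kv[1])):
    print(f"{name:18s} composite={composite(scores):.2f}")
```

Note that under this equal-weight assumption, models with strong inference and agentic sub-scores can rank above models with a higher benchmark index alone, which is why the published rank order follows the Score column rather than any such composite.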