Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models · 27 providers · 251 benchmarked · 31.8 avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 41 | Grok 4 Fast (`grok-4-fast`; multimodal, vision, multi-input reasoning) | xAI | 68.2 | 58.0 | 68.2 | 15.4 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 42 | Grok-4 Fast Non-Reasoning (`grok-4-fast-non-reasoning`; multimodal, vision, multi-input reasoning) | xAI | 68.2 | 0.0 | 68.2 | 0.0 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 43 | Grok-4 Fast Reasoning (`grok-4-fast-reasoning`; multimodal, vision, multi-input reasoning) | xAI | 68.2 | 0.0 | 68.2 | 0.0 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 44 | Claude 3.5 Sonnet (`claude-3-5-sonnet-20240620`; multimodal, vision, multi-input reasoning) | Anthropic | 67.4 | 25.6 | 67.4 | 0.0 | 0.0 | 24.5 | $3 in / $15 out |
| 45 | Claude 3.5 Sonnet (`claude-3-5-sonnet-20241022`; multimodal, vision, multi-input reasoning) | Anthropic | 67.4 | 33.9 | 67.4 | 38.7 | 13.2 | 24.5 | $3 in / $15 out |
| 46 | Gemini 3.1 Pro (`gemini-3.1-pro-preview`; multimodal, vision, multi-input reasoning) | Google | 66.8 | 74.3 | 66.8 | 72.3 | 65.5 | 22.1 | $2.5 in / $15 out |
| 47 | Gemma 4 26B-A4B (`gemma-4-26b-a4b-it`; multimodal, vision, multi-input reasoning) | Google | 66.8 | 43.7 | 66.8 | 0.0 | 0.0 | 77.8 | $0.13 in / $0.4 out |
| 48 | Gemma 4 31B (`gemma-4-31b-it`; multimodal, vision, multi-input reasoning) | Google | 66.8 | 56.5 | 66.8 | 0.0 | 0.0 | 76.7 | $0.14 in / $0.4 out |
| 49 | Kimi K2 0905 (`kimi-k2-0905`; text, inference) | Moonshot AI | 66.8 | 44.4 | 66.8 | 0.0 | 0.0 | 40.0 | $0.6 in / $2.5 out |
| 50 | Kimi K2.5 (`kimi-k2.5`; multimodal, vision, multi-input reasoning) | Moonshot AI | 66.8 | 68.0 | 66.8 | 49.5 | 48.5 | 38.1 | $0.6 in / $3 out |
| 51 | Kimi K2.6 (`kimi-k2.6`; multimodal, vision, multi-input reasoning) | Moonshot AI | 66.8 | 68.5 | 66.8 | 45.3 | 81.0 | 33.3 | $0.95 in / $4 out |
| 52 | Nemotron 3 Nano (30B A3B) (`nemotron-3-nano-30b-a3b`; code, programming, tool use) | NVIDIA | 66.8 | 45.8 | 66.8 | 3.3 | 4.4 | 90.8 | $0.06 in / $0.24 out |
| 53 | Qwen3-235B-A22B-Instruct-2507 (`qwen3-235b-a22b-instruct-2507`; text, inference) | Alibaba Cloud / Qwen Team | 66.8 | 42.9 | 66.8 | 0.0 | 0.0 | 62.8 | $0.15 in / $0.8 out |
| 54 | Qwen3-235B-A22B-Thinking-2507 (`qwen3-235b-a22b-thinking-2507`; text, inference) | Alibaba Cloud / Qwen Team | 66.8 | 46.9 | 66.8 | 26.8 | 0.0 | 39.4 | $0.3 in / $3 out |
| 55 | Qwen3.5-122B-A10B (`qwen3.5-122b-a10b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 66.8 | 64.8 | 66.8 | 51.6 | 41.5 | 38.1 | $0.4 in / $3.2 out |
| 56 | Qwen3.5-27B (`qwen3.5-27b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 66.8 | 61.9 | 66.8 | 47.5 | 42.4 | 43.9 | $0.3 in / $2.4 out |
| 57 | Qwen3.5-35B-A3B (`qwen3.5-35b-a3b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 66.8 | 57.2 | 66.8 | 44.3 | 34.4 | 46.4 | $0.25 in / $2 out |
| 58 | Qwen3.5-397B-A17B (`qwen3.5-397b-a17b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 66.8 | 58.6 | 66.8 | 35.6 | 60.9 | 35.3 | $0.6 in / $3.6 out |
| 59 | Qwen3 VL 235B A22B Instruct (`qwen3-vl-235b-a22b-instruct`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 66.8 | 37.1 | 66.8 | 56.7 | 0.0 | 49.4 | $0.3 in / $1.5 out |
| 60 | Qwen3 VL 235B A22B Thinking (`qwen3-vl-235b-a22b-thinking`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 66.8 | 37.9 | 66.8 | 40.2 | 0.0 | 37.2 | $0.45 in / $3.49 out |
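As a worked example of reading the Price column: the "$X in / $Y out" figures can be turned into a per-request dollar cost. This sketch assumes the prices are quoted per 1M tokens (the common industry convention; the page itself does not state the unit).

```python
# Minimal sketch: per-request cost from "$in / $out" leaderboard prices.
# Assumption (not stated on the page): prices are USD per 1M tokens.

def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request, given per-1M-token prices."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Grok 4 Fast ($0.2 in / $0.5 out): a 10k-input / 2k-output request
cost = request_cost(0.2, 0.5, 10_000, 2_000)  # 0.003 dollars
```

This makes the spread in the table concrete: the same request against Claude 3.5 Sonnet ($3 in / $15 out) costs 20x as much on input tokens and 30x as much on output tokens.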
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
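The exact weighting behind this multi-dimensional composite is not published on the page, but the idea can be sketched as a weighted average over the per-dimension columns. The weights below are invented for illustration only and do not reproduce the site's Score column.

```python
# Hypothetical composite-index sketch. WEIGHTS are illustrative
# assumptions, NOT the leaderboard's actual (unpublished) weighting.

WEIGHTS = {"benchmarks": 0.35, "inference": 0.25,
           "agentic": 0.15, "programming": 0.15, "value": 0.10}

def composite(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, rounded to one decimal."""
    return round(sum(w * scores.get(dim, 0.0)
                     for dim, w in WEIGHTS.items()), 1)

# Per-dimension scores for rank 41 (Grok 4 Fast) from the table above:
grok_index = composite({"benchmarks": 58.0, "inference": 68.2,
                        "agentic": 15.4, "programming": 0.0,
                        "value": 67.2})
```

Under these illustrative weights the result (46.4) differs from the table's 68.2, which suggests the real index is not a flat weighted mean of the visible columns.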