Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
309 tracked models · 27 providers · 264 benchmarked · 13.1 avg. index
| Rank | Model | Model ID | Tags | Provider | Score (Value / Price) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Nemotron 3 Nano (30B A3B) | nemotron-3-nano-30b-a3b | code, programming, tool use | NVIDIA | 100.0 | 44.5 | 41.1 | 3.0 | 4.0 | 100.0 | $0.06 in / $0.24 out |
| 2 | DeepSeek-V4-Flash-Max | deepseek-v4-flash-max | code, programming, tool use | DeepSeek | 98.7 | 58.3 | 89.2 | 47.6 | 44.2 | 98.7 | $0.14 in / $0.28 out |
| 3 | LongCat-Flash-Lite | longcat-flash-lite | code, programming, tool use | Meituan | 96.5 | 23.6 | 74.7 | 30.1 | 24.5 | 96.5 | $0.1 in / $0.4 out |
| 4 | GPT-4.1 nano | gpt-4.1-nano-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 95.9 | 12.2 | 90.8 | 0.0 | 0.0 | 95.9 | $0.1 in / $0.4 out |
| 5 | Step-3.5-Flash | step-3.5-flash | code, programming, tool use | StepFun | 95.0 | 62.8 | 60.4 | 42.0 | 50.6 | 95.0 | $0.1 in / $0.4 out |
| 6 | Gemma 4 26B-A4B | gemma-4-26b-a4b-it | multimodal, vision, multi-input reasoning | Google | 93.7 | 42.3 | 41.1 | 0.0 | 0.0 | 93.7 | $0.13 in / $0.4 out |
| 7 | Gemma 4 31B | gemma-4-31b-it | multimodal, vision, multi-input reasoning | Google | 90.5 | 54.9 | 41.1 | 0.0 | 0.0 | 90.5 | $0.14 in / $0.4 out |
| 8 | GPT OSS 120B | gpt-oss-120b | text, inference | OpenAI | 90.5 | 34.9 | 14.6 | 26.8 | 0.0 | 90.5 | $0.09 in / $0.45 out |
| 9 | Qwen3 VL 8B Instruct | qwen3-vl-8b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 87.3 | 8.8 | 41.1 | 26.4 | 0.0 | 87.3 | $0.08 in / $0.5 out |
| 10 | GPT OSS 120B High | gpt-oss-120b-high | multimodal, vision, multi-input reasoning | OpenAI | 83.3 | 44.2 | 53.0 | 0.0 | 0.0 | 83.3 | $0.1 in / $0.5 out |
| 11 | Qwen3 VL 4B Instruct | qwen3-vl-4b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 81.0 | 18.9 | 41.1 | 18.8 | 0.0 | 81.0 | $0.1 in / $0.6 out |
| 12 | Mercury 2 | mercury-2 | code, programming, tool use | Inception | 79.7 | 43.4 | 69.0 | 0.0 | 20.3 | 79.7 | $0.25 in / $0.75 out |
| 13 | Qwen3 30B A3B | qwen3-30b-a3b | text, inference | Alibaba Cloud / Qwen Team | 79.5 | 24.6 | 26.0 | 0.0 | 0.0 | 79.5 | $0.1 in / $0.44 out |
| 14 | DeepSeek-V3.2 (Non-thinking) | deepseek-chat | text, inference | DeepSeek | 79.3 | 0.0 | 53.0 | 0.0 | 0.0 | 79.3 | $0.28 in / $0.42 out |
| 15 | Mistral Small 4 | mistral-small-latest | multimodal, vision, multi-input reasoning | Mistral AI | 75.9 | 32.9 | 28.5 | 0.0 | 0.0 | 75.9 | $0.15 in / $0.6 out |
| 16 | Grok-4.1 Fast Non-Reasoning | grok-4-1-fast-non-reasoning | multimodal, vision, multi-input reasoning | xAI | 73.7 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 17 | Grok-4.1 Fast Reasoning | grok-4-1-fast-reasoning | multimodal, vision, multi-input reasoning | xAI | 73.7 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 18 | Grok 4 Fast | grok-4-fast | multimodal, vision, multi-input reasoning | xAI | 73.7 | 57.1 | 62.1 | 13.7 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 19 | Grok-4 Fast Non-Reasoning | grok-4-fast-non-reasoning | multimodal, vision, multi-input reasoning | xAI | 73.7 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |
| 20 | Grok-4 Fast Reasoning | grok-4-fast-reasoning | multimodal, vision, multi-input reasoning | xAI | 73.7 | 0.0 | 62.1 | 0.0 | 0.0 | 73.7 | $0.2 in / $0.5 out |

Page 1 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.