Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **27.4** avg. index
| Rank | Model | Provider | Capabilities | Score | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 141 | o3-mini `o3-mini` | OpenAI | code, programming, tool use | 25.6 | 70.4 | 11.9 | 12.2 | 41.6 | $1.1 in / $4.4 out |
| 142 | Qwen3 30B A3B `qwen3-30b-a3b` | Alibaba Cloud / Qwen Team | text, inference | 25.6 | 40.1 | 0.0 | 0.0 | 71.3 | $0.1 in / $0.44 out |
| 143 | Claude 3.5 Sonnet `claude-3-5-sonnet-20240620` | Anthropic | multimodal, vision, multi-input reasoning | 25.4 | 68.2 | 0.0 | 0.0 | 24.6 | $3 in / $15 out |
| 144 | Gemini 2.0 Flash-Lite `gemini-2.0-flash-lite` | Google | multimodal, vision, multi-input reasoning | 25.3 | 62.8 | 0.0 | 0.0 | 79.7 | $0.07 in / $0.3 out |
| 145 | Nemotron Nano 9B v2 `nvidia-nemotron-nano-9b-v2` | NVIDIA | text, inference | 24.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 146 | Qwen2.5 VL 72B Instruct `qwen2.5-vl-72b` | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 24.9 | 0.0 | 5.7 | 0.0 | 0.0 | N/A |
| 147 | DeepSeek R1 Distill Qwen 14B `deepseek-r1-distill-qwen-14b` | DeepSeek | text, inference | 24.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 148 | ERNIE 4.5 `ernie-4.5` | Baidu | text, inference | 24.5 | 18.8 | 0.0 | 0.0 | 34.6 | $0.4 in / $4 out |
| 149 | LongCat-Flash-Lite `longcat-flash-lite` | Meituan | code, programming, tool use | 24.5 | 83.6 | 29.5 | 25.1 | 83.1 | $0.1 in / $0.4 out |
| 150 | Magistral Small 2506 `magistral-small-2506` | Mistral AI | text, inference | 24.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 151 | Kimi K2 Instruct `kimi-k2-instruct` | Moonshot AI | code, programming, tool use | 24.4 | 46.1 | 14.8 | 15.3 | 62.1 | $0.5 in / $0.5 out |
| 152 | Kimi K2-Instruct-0905 `kimi-k2-instruct-0905` | Moonshot AI | code, programming, tool use | 24.4 | 0.0 | 6.6 | 19.3 | 0.0 | N/A |
| 153 | MiniMax M1 80K `minimax-m1-80k` | MiniMax | code, programming, tool use | 24.2 | 84.0 | 20.9 | 19.0 | 41.8 | $0.55 in / $2.2 out |
| 154 | Grok-2 mini `grok-2-mini` | xAI | multimodal, vision, multi-input reasoning | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 155 | Phi 4 Reasoning `phi-4-reasoning` | Microsoft | text, inference | 23.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 156 | Gemini 1.5 Flash `gemini-1.5-flash` | Google | multimodal, vision, multi-input reasoning | 23.0 | 91.9 | 0.0 | 0.0 | 71.7 | $0.15 in / $0.6 out |
| 157 | Llama-3.3 Nemotron Super 49B v1 `llama-3.3-nemotron-super-49b-v1` | NVIDIA | text, inference | 23.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 158 | Qwen3 VL 4B Thinking `qwen3-vl-4b-thinking` | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 23.0 | 66.0 | 18.9 | 0.0 | 60.4 | $0.1 in / $1 out |
| 159 | MiniMax M1 40K `minimax-m1-40k` | MiniMax | code, programming, tool use | 22.6 | 0.0 | 26.8 | 18.1 | 0.0 | N/A |
| 160 | GPT-4o `gpt-4o-2024-05-13` | OpenAI | multimodal, vision, multi-input reasoning | 22.3 | 45.4 | 0.0 | 0.0 | 26.5 | $2.5 in / $10 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
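The Price column pairs an input rate with an output rate. Assuming these are USD per million tokens (the page does not state the unit, so treat that as an assumption), the cost of a single request can be estimated with a few lines of arithmetic:

```python
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Estimate USD cost of one request.

    price_in / price_out: listed rates, assumed to be USD per 1M tokens.
    tokens_in / tokens_out: prompt and completion token counts.
    """
    return (price_in * tokens_in + price_out * tokens_out) / 1_000_000

# Example: o3-mini at $1.1 in / $4.4 out, for a 2,000-token prompt
# and a 500-token completion.
cost = request_cost(1.1, 4.4, 2_000, 500)
print(f"${cost:.4f}")  # (1.1*2000 + 4.4*500) / 1e6 = $0.0044
```

Because output rates are typically several times the input rate (4x for o3-mini and GPT-4o here), completion length dominates cost for generation-heavy workloads.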