Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
309 tracked models · 27 providers · 264 benchmarked · Avg. index 13.1
| Rank | Model | Model ID | Tags | Provider | Score (Inference) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | `gpt-5.5` | multimodal, vision, multi-input reasoning | OpenAI | 93.7 | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 2 | GPT-4.1 nano | `gpt-4.1-nano-2025-04-14` | multimodal, vision, multi-input reasoning | OpenAI | 90.8 | 12.2 | 90.8 | 0.0 | 0.0 | 95.9 | $0.1 in / $0.4 out |
| 3 | DeepSeek-V4-Flash-Max | `deepseek-v4-flash-max` | code, programming, tool use | DeepSeek | 89.2 | 58.3 | 89.2 | 47.6 | 44.2 | 98.7 | $0.14 in / $0.28 out |
| 4 | DeepSeek-V4-Pro-Max | `deepseek-v4-pro-max` | code, programming, tool use | DeepSeek | 89.2 | 67.4 | 89.2 | 61.3 | 58.6 | 34.2 | $1.74 in / $3.48 out |
| 5 | Gemini 3.5 Flash | `gemini-3.5-flash` | multimodal, vision, multi-input reasoning | Google | 89.2 | 62.8 | 89.2 | 74.4 | 30.5 | 26.6 | $1.5 in / $9 out |
| 6 | GPT-4.1 mini | `gpt-4.1-mini-2025-04-14` | multimodal, vision, multi-input reasoning | OpenAI | 87.8 | 20.2 | 87.8 | 8.9 | 2.4 | 65.6 | $0.4 in / $1.6 out |
| 7 | GPT-5 mini | `gpt-5-mini-2025-08-07` | multimodal, vision, multi-input reasoning | OpenAI | 81.7 | 41.7 | 81.7 | 0.0 | 27.3 | 64.0 | $0.25 in / $2 out |
| 8 | GPT-4.1 | `gpt-4.1-2025-04-14` | multimodal, vision, multi-input reasoning | OpenAI | 76.0 | 27.9 | 76.0 | 32.8 | 15.8 | 36.7 | $2 in / $8 out |
| 9 | LongCat-Flash-Lite | `longcat-flash-lite` | code, programming, tool use | Meituan | 74.7 | 23.6 | 74.7 | 30.1 | 24.5 | 96.5 | $0.1 in / $0.4 out |
| 10 | Gemini 3.1 Flash-Lite | `gemini-3.1-flash-lite-preview` | multimodal, vision, multi-input reasoning | Google | 72.2 | 55.3 | 72.2 | 0.0 | 0.0 | 63.3 | $0.25 in / $1.5 out |
| 11 | Gemini 3 Flash | `gemini-3-flash-preview` | multimodal, vision, multi-input reasoning | Google | 72.2 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 12 | Grok 4.3 | `grok-4.3` | text, inference | xAI | 72.2 | 0.0 | 72.2 | 0.0 | 0.0 | 41.8 | $1.25 in / $2.5 out |
| 13 | MiniMax M2.1 | `minimax-m2.1` | code, programming, tool use | MiniMax | 72.2 | 40.8 | 72.2 | 52.1 | 48.7 | 68.6 | $0.3 in / $1.2 out |
| 14 | MiniMax M2.5 | `minimax-m2.5` | code, programming, tool use | MiniMax | 72.2 | 0.0 | 72.2 | 50.4 | 56.9 | 68.6 | $0.3 in / $1.2 out |
| 15 | MiniMax M3 | `minimax-m3` | multimodal, vision, multi-input reasoning | MiniMax | 72.2 | 54.6 | 72.2 | 38.7 | 74.3 | 48.1 | $0.6 in / $2.4 out |
| 16 | Nova 2 Lite | `nova-2-lite` | multimodal, vision, multi-input reasoning | Amazon | 72.2 | 42.8 | 72.2 | 13.0 | 27.0 | 50.0 | $0.3 in / $2.5 out |
| 17 | Nova 2 Sonic | `nova-2-sonic` | multimodal, vision, multi-input reasoning | Amazon | 72.2 | 0.0 | 72.2 | 0.0 | 0.0 | 46.8 | $0.33 in / $2.75 out |
| 18 | Qwen3.6 Plus | `qwen3.6-plus` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 72.2 | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 19 | Qwen3.7 Max | `qwen3.7-max` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 72.2 | 66.1 | 72.2 | 61.7 | 81.5 | 35.4 | $1.25 in / $3.75 out |
| 20 | Mercury 2 | `mercury-2` | code, programming, tool use | Inception | 69.0 | 43.4 | 69.0 | 0.0 | 20.3 | 79.7 | $0.25 in / $0.75 out |

Page 1 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
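
The price strings in the leaderboard (for example `$5 in / $30 out`) can be turned into a rough per-request cost estimate. The sketch below is illustrative only: the leaderboard does not state the pricing unit, so the code assumes USD per 1,000,000 tokens, and the helper names `parse_price` and `estimate_cost` are hypothetical.

```python
import re

# Assumption: prices are USD per 1,000,000 tokens.
# The leaderboard itself does not state the unit.
TOKENS_PER_UNIT = 1_000_000


def parse_price(price: str) -> tuple[float, float]:
    """Parse a string like '$5 in / $30 out' into (input_price, output_price)."""
    match = re.fullmatch(r"\$([\d.]+) in / \$([\d.]+) out", price.strip())
    if match is None:
        raise ValueError(f"unrecognized price format: {price!r}")
    return float(match.group(1)), float(match.group(2))


def estimate_cost(price: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under the per-million-token assumption."""
    in_price, out_price = parse_price(price)
    return (input_tokens * in_price + output_tokens * out_price) / TOKENS_PER_UNIT
```

Under that assumption, `estimate_cost("$5 in / $30 out", 10_000, 2_000)` comes to about $0.11 for a request with 10,000 input tokens and 2,000 output tokens. If the actual unit differs, only `TOKENS_PER_UNIT` needs to change.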