Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models, 27 providers, 251 benchmarked, 30.7 avg. index.
| Rank | Model | Model ID | Tags | Provider | Score (Value / Price) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | multimodal, vision, multi-input reasoning | Google | 64.4 | 21.6 | 32.9 | 0.0 | 3.5 | 64.4 | $0.1 in / $0.4 out |
| 62 | Qwen3 32B | qwen3-32b | text, inference | Alibaba Cloud / Qwen Team | 63.4 | 21.4 | 13.3 | 0.0 | 0.0 | 63.4 | $0.1 in / $0.44 out |
| 63 | Qwen3 VL 30B A3B Instruct | qwen3-vl-30b-a3b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 63.3 | 28.7 | 66.8 | 23.6 | 0.0 | 63.3 | $0.2 in / $0.7 out |
| 64 | Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | text, inference | Alibaba Cloud / Qwen Team | 62.8 | 42.9 | 66.8 | 0.0 | 0.0 | 62.8 | $0.15 in / $0.8 out |
| 65 | QwQ-32B-Preview | qwq-32b-preview | text, inference | Alibaba Cloud / Qwen Team | 62.0 | 29.0 | 29.5 | 0.0 | 0.0 | 62.0 | $0.15 in / $0.6 out |
| 66 | Kimi K2 Instruct | kimi-k2-instruct | code, programming, tool use | Moonshot AI | 61.7 | 24.9 | 46.6 | 14.8 | 15.3 | 61.7 | $0.5 in / $0.5 out |
| 67 | Llama 4 Maverick | llama-4-maverick | multimodal, vision, multi-input reasoning | Meta | 61.2 | 35.5 | 58.9 | 0.0 | 0.0 | 61.2 | $0.17 in / $0.6 out |
| 68 | Qwen3 VL 4B Thinking | qwen3-vl-4b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 60.6 | 23.1 | 66.8 | 18.9 | 0.0 | 60.6 | $0.1 in / $1 out |
| 69 | DeepSeek-V3 | deepseek-v3 | code, programming, tool use | DeepSeek | 60.4 | 28.0 | 57.3 | 0.0 | 10.6 | 60.4 | $0.27 in / $1.1 out |
| 70 | Qwen3 VL 30B A3B Thinking | qwen3-vl-30b-a3b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 60.0 | 35.5 | 66.8 | 21.3 | 0.0 | 60.0 | $0.2 in / $1 out |
| 71 | Claude 3 Haiku | claude-3-haiku-20240307 | multimodal, vision, multi-input reasoning | Anthropic | 59.9 | 5.8 | 68.7 | 0.0 | 0.0 | 59.9 | $0.25 in / $1.25 out |
| 72 | DeepSeek-V3.1 | deepseek-v3.1 | code, programming, tool use | DeepSeek | 58.9 | 38.7 | 40.2 | 15.3 | 28.7 | 58.9 | $0.27 in / $1 out |
| 73 | DeepSeek-V3 0324 | deepseek-v3-0324 | text, inference | DeepSeek | 57.8 | 33.1 | 40.2 | 0.0 | 0.0 | 57.8 | $0.28 in / $1.14 out |
| 74 | LongCat-Flash-Chat | longcat-flash-chat | code, programming, tool use | Meituan | 57.7 | 28.1 | 51.9 | 49.2 | 39.4 | 57.7 | $0.3 in / $1.2 out |
| 75 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 57.7 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 in / $1.2 out |
| 76 | MiniMax M2.1 | minimax-m2.1 | code, programming, tool use | MiniMax | 57.7 | 42.7 | 73.9 | 56.6 | 50.6 | 57.7 | $0.3 in / $1.2 out |
| 77 | MiniMax M2.5 | minimax-m2.5 | code, programming, tool use | MiniMax | 57.7 | 0.0 | 73.9 | 53.0 | 56.3 | 57.7 | $0.3 in / $1.2 out |
| 78 | GPT-5.4 nano | gpt-5.4-nano | multimodal, vision, multi-input reasoning | OpenAI | 57.2 | 46.1 | 77.4 | 11.0 | 11.2 | 57.2 | $0.2 in / $1.25 out |
| 79 | GPT-4.1 mini | gpt-4.1-mini-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 56.8 | 20.8 | 90.9 | 8.9 | 2.6 | 56.8 | $0.4 in / $1.6 out |
| 80 | GPT-5 mini | gpt-5-mini-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 56.3 | 41.9 | 89.7 | 0.0 | 23.7 | 56.3 | $0.25 in / $2 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
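For readers who want to work with the Price column programmatically, the following is a minimal sketch that parses cells of the form `$0.1 in / $0.4 out` into numbers and computes a single blended figure. The names `Price`, `parse_price`, and `blended_price` are hypothetical, and the 75/25 input-to-output mix is an illustrative assumption, not something the leaderboard defines. The page does not state the pricing unit, so the code treats the values as plain dollar amounts.

```python
import re
from typing import NamedTuple


class Price(NamedTuple):
    """Input and output price in USD, in whatever unit the page uses."""
    input_usd: float
    output_usd: float


# Matches cells such as "$0.1 in / $0.4 out" or "$0.25 in / $2 out".
_PRICE_RE = re.compile(
    r"\$(\d+(?:\.\d+)?)\s*in\s*/\s*\$(\d+(?:\.\d+)?)\s*out"
)


def parse_price(cell: str) -> Price:
    """Parse one Price cell; raise ValueError if the format is unexpected."""
    match = _PRICE_RE.fullmatch(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    return Price(float(match.group(1)), float(match.group(2)))


def blended_price(price: Price, input_share: float = 0.75) -> float:
    """Weighted average of input and output price.

    input_share is an assumed fraction of input tokens in a workload;
    it is a hypothetical parameter, not part of the leaderboard's scoring.
    """
    if not 0.0 <= input_share <= 1.0:
        raise ValueError("input_share must be between 0 and 1")
    return input_share * price.input_usd + (1.0 - input_share) * price.output_usd


if __name__ == "__main__":
    for cell in ["$0.1 in / $0.4 out", "$0.5 in / $0.5 out", "$0.25 in / $2 out"]:
        price = parse_price(cell)
        print(cell, "->", round(blended_price(price), 4))
```

A blended figure like this makes rows with different input and output prices easier to compare, but the result depends entirely on the assumed mix, so it should not be read as the Value score shown in the table.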