Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models · 27 providers · 251 benchmarked · 34.7 average index

Showing ranks 81–100 of 294 models.
| Rank | Model | Provider | Overall | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|
| 81 | Step3-VL-10B (`step3-vl-10b`) · multimodal, vision, multi-input reasoning | StepFun | 47.4 | 47.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 82 | MiniMax M2.7 (`minimax-m2.7`) · code, programming, tool use | MiniMax | 47.3 | 0.0 | 52.8 | 50.8 | 35.9 | 55.0 | $0.3 / $1.2 |
| 83 | Mercury 2 (`mercury-2`) · code, programming, tool use | Inception | 47.2 | 44.6 | 72.5 | 0.0 | 22.3 | 69.2 | $0.25 / $0.75 |
| 84 | Llama 4 Maverick (`llama-4-maverick`) · multimodal, vision, multi-input reasoning | Meta | 47.2 | 35.5 | 58.9 | 0.0 | 0.0 | 61.2 | $0.17 / $0.6 |
| 85 | Mistral Small 4 (`mistral-small-latest`) · multimodal, vision, multi-input reasoning | Mistral AI | 47.0 | 34.8 | 55.9 | 0.0 | 0.0 | 66.9 | $0.15 / $0.6 |
| 86 | GLM-4.7 (`glm-4.7`) · multimodal, vision, multi-input reasoning | Zhipu AI | 46.8 | 63.2 | 52.8 | 28.2 | 44.5 | 40.6 | $0.6 / $2.2 |
| 87 | Gemini 2.0 Flash-Lite (`gemini-2.0-flash-lite`) · multimodal, vision, multi-input reasoning | Google | 46.8 | 25.7 | 63.2 | 0.0 | 0.0 | 79.7 | $0.07 / $0.3 |
| 88 | Gemini 2.0 Flash Thinking (`gemini-2.0-flash-thinking`) · multimodal, vision, multi-input reasoning | Google | 46.7 | 46.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 89 | GLM-5 (`glm-5`) · code, programming, tool use | Zhipu AI | 46.2 | 0.0 | 22.1 | 51.3 | 65.3 | 30.2 | $1 / $3.2 |
| 90 | Devstral Small 1.1 (`devstral-small-2507`) · code, programming, tool use | Mistral AI | 46.1 | 0.0 | 64.5 | 0.0 | 15.0 | 85.0 | $0.1 / $0.3 |
| 91 | DeepSeek-V3.2 (`deepseek-v3.2`) · code, programming, tool use | DeepSeek | 45.8 | 58.1 | 52.5 | 16.6 | 45.9 | 70.0 | $0.26 / $0.38 |
| 92 | LongCat-Flash-Thinking-2601 (`longcat-flash-thinking-2601`) · code, programming, tool use | Meituan | 45.6 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 / $1.2 |
| 93 | o4-mini (`o4-mini`) · multimodal, vision, multi-input reasoning | OpenAI | 45.5 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 / $4.4 |
| 94 | Claude Sonnet 4 (`claude-sonnet-4-20250514`) · multimodal, vision, multi-input reasoning | Anthropic | 44.9 | 41.0 | 0.0 | 49.4 | 44.9 | 0.0 | N/A |
| 95 | GPT-5.1 Codex (`gpt-5.1-codex`) · multimodal, vision, multi-input reasoning | OpenAI | 44.9 | 0.0 | 48.6 | 0.0 | 51.2 | 25.1 | $1.25 / $10 |
| 96 | Gemini 2.5 Pro Preview 06-05 (`gemini-2.5-pro-preview-06-05`) · multimodal, vision, multi-input reasoning | Google | 44.7 | 51.7 | 63.2 | 0.0 | 30.0 | 27.9 | $1.25 / $10 |
| 97 | Qwen3 VL 235B A22B Thinking (`qwen3-vl-235b-a22b-thinking`) · multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.4 | 37.9 | 66.8 | 40.2 | 0.0 | 37.2 | $0.45 / $3.49 |
| 98 | Grok Code Fast 1 (`grok-code-fast-1`) · code, programming, tool use | xAI | 44.2 | 0.0 | 47.2 | 0.0 | 39.7 | 49.5 | $0.2 / $1.5 |
| 99 | GPT-5.4 Mini (`gpt-5.4-mini`) · text, text-to-text, language | OpenAI | 44.1 | 57.4 | 77.4 | 27.1 | 26.9 | 32.8 | $0.75 / $4.5 |
| 100 | Qwen2.5-Coder 32B Instruct (`qwen-2.5-coder-32b-instruct`) · text, inference | Alibaba Cloud / Qwen Team | 44.1 | 0.0 | 20.9 | 0.0 | 0.0 | 81.2 | $0.09 / $0.09 |
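A zero in a dimension column appears to mean "not evaluated" rather than a genuine score of zero: Step3-VL-10B's 47.4 overall equals its lone non-zero Benchmarks score. Below is a minimal sketch of how such a composite could be computed, assuming zeros are excluded and dimensions are weighted uniformly. The leaderboard's actual weights are not published on this page, and rows like MiniMax M2.7 (uniform mean 48.6 vs. a published 47.3) show they differ, so treat this strictly as an illustration.

```python
from dataclasses import dataclass

@dataclass
class ModelRow:
    name: str
    benchmarks: float
    inference: float
    agentic: float
    programming: float
    value: float

def overall_index(row: ModelRow, weights: dict[str, float] | None = None) -> float:
    """Composite index over the five leaderboard dimensions.

    Dimensions scored 0.0 are treated as "not evaluated" and excluded,
    mirroring rows like Step3-VL-10B that carry a single non-zero score.
    The uniform default weights are an assumption; the leaderboard's
    actual weighting is not published on this page.
    """
    dims = {
        "benchmarks": row.benchmarks,
        "inference": row.inference,
        "agentic": row.agentic,
        "programming": row.programming,
        "value": row.value,
    }
    weights = weights or {k: 1.0 for k in dims}
    scored = {k: v for k, v in dims.items() if v > 0.0}  # drop "not evaluated"
    if not scored:
        return 0.0
    total_weight = sum(weights[k] for k in scored)
    return sum(weights[k] * v for k, v in scored.items()) / total_weight

# Example: MiniMax M2.7's row from the table above.
m2 = ModelRow("MiniMax M2.7", 0.0, 52.8, 50.8, 35.9, 55.0)
print(round(overall_index(m2), 1))  # 48.6 with uniform weights; the
# published index is 47.3, so the real weights evidently differ.
```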
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
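To make the cost-per-output dimension concrete, here is an illustrative blended-cost calculation from the listed prices. It assumes the "$X in / $Y out" prices are per million tokens and a 3:1 input-to-output token split; both are my assumptions for the sketch, not the page's stated methodology.

```python
def blended_cost(price_in: float, price_out: float,
                 input_tokens: int = 750_000,
                 output_tokens: int = 250_000) -> float:
    """Blended cost in USD for one million total tokens.

    Assumes the leaderboard's "$X in / $Y out" prices are per million
    tokens and a 3:1 input-to-output split; both are illustrative
    assumptions, not published methodology.
    """
    return (price_in * input_tokens + price_out * output_tokens) / 1_000_000

# DeepSeek-V3.2 vs. GPT-5.1 Codex at the table's listed prices:
print(f"{blended_cost(0.26, 0.38):.3f}")  # ~0.290 USD per 1M blended tokens
print(f"{blended_cost(1.25, 10.0):.3f}")  # ~3.438 USD per 1M blended tokens
```

Under these assumptions, output-heavy workloads shift the comparison sharply: a model with a symmetric price such as Qwen2.5-Coder 32B Instruct ($0.09 / $0.09) costs the same at any split, while GPT-5.1 Codex's cost is dominated by its $10 output rate.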