Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
- 296 tracked models
- 27 providers
- 253 benchmarked
- 32.1 avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 81 | GPT-5.1 Medium (`gpt-5.1-medium-2025-11-12`) | OpenAI | 61.9 | 63.6 | 61.9 | 0.0 | 0.0 | 28.9 | $1.25 in / $10 out |
| 82 | GPT-5 Medium (`gpt-5-medium-2025-08-07`) | OpenAI | 61.9 | 56.7 | 61.9 | 0.0 | 0.0 | 28.9 | $1.25 in / $10 out |
| 83 | Claude 3 Haiku (`claude-3-haiku-20240307`) | Anthropic | 61.8 | 5.8 | 61.8 | 0.0 | 0.0 | 57.9 | $0.25 in / $1.25 out |
| 84 | Claude Haiku 4.5 (`claude-haiku-4-5-20251001`) | Anthropic | 61.8 | 32.7 | 61.8 | 54.2 | 56.6 | 37.7 | $1 in / $5 out |
| 85 | o1-mini (`o1-mini`) | OpenAI | 61.3 | 25.7 | 61.3 | 0.0 | 0.0 | 30.1 | $3 in / $12 out |
| 86 | Llama 3.2 11B Instruct (`llama-3.2-11b-instruct`) | Meta | 60.3 | 4.0 | 60.3 | 0.0 | 0.0 | 94.9 | $0.05 in / $0.05 out |
| 87 | MiMo-V2-Omni (`mimo-v2-omni`) | Xiaomi | 58.6 | 0.0 | 58.6 | 0.0 | 54.4 | 44.8 | $0.4 in / $2 out |
| 88 | DeepSeek-V3.2 (Non-thinking) (`deepseek-chat`) | DeepSeek | 58.0 | 0.0 | 58.0 | 0.0 | 0.0 | 70.2 | $0.28 in / $0.42 out |
| 89 | DeepSeek-V3 (`deepseek-v3`) | DeepSeek | 58.0 | 27.3 | 58.0 | 0.0 | 10.4 | 60.5 | $0.27 in / $1.1 out |
| 90 | GPT OSS 120B High (`gpt-oss-120b-high`) | OpenAI | 58.0 | 44.7 | 58.0 | 0.0 | 0.0 | 73.3 | $0.1 in / $0.5 out |
| 91 | Gemini 1.0 Pro (`gemini-1.0-pro`) | Google | 57.2 | 3.2 | 57.2 | 0.0 | 0.0 | 55.4 | $0.5 in / $1.5 out |
| 92 | Llama 4 Maverick (`llama-4-maverick`) | Meta | 55.8 | 35.4 | 55.8 | 0.0 | 0.0 | 57.1 | $0.17 in / $0.85 out |
| 93 | GPT-5.1 Thinking (`gpt-5.1-thinking-2025-11-12`) | OpenAI | 55.6 | 64.9 | 55.6 | 0.0 | 56.2 | 27.0 | $1.25 in / $10 out |
| 94 | Mistral Small 4 (`mistral-small-latest`) | Mistral AI | 55.2 | 34.7 | 55.2 | 0.0 | 0.0 | 66.8 | $0.15 in / $0.6 out |
| 95 | Qwen3-Coder (`qwen3-coder`) | Alibaba Cloud / Qwen Team | 55.2 | 0.0 | 55.2 | 0.0 | 0.0 | 88.5 | $0.18 in / $0.18 out |
| 96 | Qwen3 Max (`qwen3-max`) | Alibaba Cloud / Qwen Team | 55.2 | 29.8 | 55.2 | 0.0 | 35.8 | 31.3 | $0.5 in / $5 out |
| 97 | GPT-4 (`gpt-4-0613`) | OpenAI | 54.9 | 6.8 | 54.9 | 0.0 | 0.0 | 18.7 | $30 in / $60 out |
| 98 | DeepSeek-V3.2 (`deepseek-v3.2`) | DeepSeek | 53.2 | 57.3 | 53.2 | 15.5 | 44.9 | 70.0 | $0.26 in / $0.38 out |
| 99 | GPT-4 Turbo (`gpt-4-turbo-2024-04-09`) | OpenAI | 52.7 | 16.9 | 52.7 | 0.0 | 0.0 | 18.8 | $10 in / $30 out |
| 100 | GPT-5.3 Chat (`gpt-5.3-chat-latest`) | OpenAI | 52.7 | 0.0 | 52.7 | 0.0 | 0.0 | 26.5 | $1.75 in / $14 out |
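The Price column lists separate input and output rates. To compare models on a single cost axis, a price cell can be parsed and blended under an assumed input-to-output token mix. This is a minimal sketch; the 3:1 mix is a hypothetical weighting, not part of the leaderboard's methodology, and the leaderboard's own "Value" score may be computed differently.

```python
import re

def parse_price(cell: str) -> tuple[float, float]:
    """Parse a price cell like '$1.25 in / $10 out' into (input, output) rates."""
    m = re.match(r"\$([\d.]+) in / \$([\d.]+) out", cell)
    if not m:
        raise ValueError(f"unrecognized price format: {cell!r}")
    return float(m.group(1)), float(m.group(2))

def blended_cost(cell: str, input_share: float = 0.75) -> float:
    """Blended rate assuming a 3:1 input:output token mix (illustrative only)."""
    inp, out = parse_price(cell)
    return input_share * inp + (1 - input_share) * out

print(blended_cost("$1.25 in / $10 out"))  # 0.75*1.25 + 0.25*10 = 3.4375
```

Under this mix, a model with cheap input but expensive output (GPT-5.1 Medium above) can land near one with flatter pricing, which is why the mix assumption matters for any cost comparison.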
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
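The note above describes a multi-dimensional evaluation but not the exact aggregation formula. One plausible way to combine per-dimension scores into a single index, treating 0.0 as "not evaluated" (as the table's sparse rows suggest), is a weighted mean renormalized over the dimensions actually present. The weights here are illustrative assumptions, not the leaderboard's actual formula.

```python
# Illustrative dimension weights -- NOT the leaderboard's published methodology.
WEIGHTS = {"benchmarks": 0.4, "inference": 0.3, "agentic": 0.1,
           "programming": 0.1, "value": 0.1}

def composite(scores: dict[str, float]) -> float:
    """Weighted mean over evaluated dimensions, renormalized so that
    missing (0.0) dimensions don't drag the index toward zero."""
    present = {k: v for k, v in scores.items() if v > 0.0}
    total_w = sum(WEIGHTS[k] for k in present)
    if total_w == 0:
        return 0.0
    return sum(WEIGHTS[k] * v for k, v in present.items()) / total_w

# GPT-5.1 Medium's row from the table: agentic and programming unevaluated.
print(composite({"benchmarks": 63.6, "inference": 61.9, "agentic": 0.0,
                 "programming": 0.0, "value": 28.9}))  # 58.625
```

Renormalizing over present dimensions is one design choice; an alternative is to penalize missing dimensions by keeping the full weight denominator, which would rank sparsely benchmarked models lower.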