Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
294 tracked models · 27 providers · 251 benchmarked · 31.8 avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 21 | GPT-5.4 nano (`gpt-5.4-nano`) · multimodal, vision, multi-input, reasoning | OpenAI | 77.4 | 46.1 | 77.4 | 11.0 | 11.2 | 57.2 | $0.2 in / $1.25 out |
| 22 | GPT OSS 20B (`gpt-oss-20b`) · text, inference | OpenAI | 77.3 | 26.1 | 77.3 | 6.0 | 0.0 | 79.3 | $0.1 in / $0.5 out |
| 23 | Ministral 3 (14B Reasoning 2512) (`ministral-14b-latest`) · multimodal, vision, multi-input, reasoning | Mistral AI | 76.8 | 37.9 | 76.8 | 0.0 | 0.0 | 84.5 | $0.2 in / $0.2 out |
| 24 | GPT-4.1 (`gpt-4.1-2025-04-14`) · multimodal, vision, multi-input, reasoning | OpenAI | 75.4 | 28.8 | 75.4 | 32.8 | 17.7 | 34.6 | $2 in / $8 out |
| 25 | MiniMax M2.1 (`minimax-m2.1`) · code, programming, tool use | MiniMax | 73.9 | 42.7 | 73.9 | 56.6 | 50.6 | 57.7 | $0.3 in / $1.2 out |
| 26 | MiniMax M2.5 (`minimax-m2.5`) · code, programming, tool use | MiniMax | 73.9 | 0.0 | 73.9 | 53.0 | 56.3 | 57.7 | $0.3 in / $1.2 out |
| 27 | Mercury 2 (`mercury-2`) · code, programming, tool use | Inception | 72.5 | 44.6 | 72.5 | 0.0 | 22.3 | 69.2 | $0.25 in / $0.75 out |
| 28 | GPT-5.1 (`gpt-5.1-2025-11-13`) · multimodal, vision, multi-input, reasoning | OpenAI | 71.4 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 29 | GPT-5.1 Instant (`gpt-5.1-instant-2025-11-12`) · multimodal, vision, multi-input, reasoning | OpenAI | 71.4 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 30 | GPT-5.2 (`gpt-5.2-2025-12-11`) · multimodal, vision, multi-input, reasoning | OpenAI | 71.4 | 76.9 | 71.4 | 50.3 | 72.4 | 26.4 | $1.75 in / $14 out |
| 31 | Qwen2.5 7B Instruct (`qwen-2.5-7b-instruct`) · text, inference | Alibaba Cloud / Qwen Team | 70.8 | 7.5 | 70.8 | 0.0 | 0.0 | 77.3 | $0.3 in / $0.3 out |
| 32 | o3-mini (`o3-mini`) · code, programming, tool use | OpenAI | 70.7 | 26.0 | 70.7 | 11.9 | 12.5 | 41.9 | $1.1 in / $4.4 out |
| 33 | o4-mini (`o4-mini`) · multimodal, vision, multi-input, reasoning | OpenAI | 70.7 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 in / $4.4 out |
| 34 | Nova Lite (`nova-lite`) · multimodal, vision, multi-input, reasoning | Amazon | 69.9 | 13.6 | 69.9 | 0.0 | 0.0 | 86.4 | $0.06 in / $0.24 out |
| 35 | Nova Pro (`nova-pro`) · multimodal, vision, multi-input, reasoning | Amazon | 69.9 | 20.0 | 69.9 | 0.0 | 0.0 | 42.8 | $0.8 in / $3.2 out |
| 36 | Llama 3.2 3B Instruct (`llama-3.2-3b-instruct`) · text, inference | Meta | 69.0 | 5.3 | 69.0 | 0.0 | 0.0 | 98.8 | $0.01 in / $0.02 out |
| 37 | Claude 3.5 Haiku (`claude-3-5-haiku-20241022`) · code, programming, tool use | Anthropic | 68.7 | 10.9 | 68.7 | 3.0 | 7.9 | 43.1 | $0.8 in / $4 out |
| 38 | Claude 3 Haiku (`claude-3-haiku-20240307`) · multimodal, vision, multi-input, reasoning | Anthropic | 68.7 | 5.8 | 68.7 | 0.0 | 0.0 | 59.9 | $0.25 in / $1.25 out |
| 39 | Grok-4.1 Fast Non-Reasoning (`grok-4-1-fast-non-reasoning`) · multimodal, vision, multi-input, reasoning | xAI | 68.2 | 0.0 | 68.2 | 0.0 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 40 | Grok-4.1 Fast Reasoning (`grok-4-1-fast-reasoning`) · multimodal, vision, multi-input, reasoning | xAI | 68.2 | 0.0 | 68.2 | 0.0 | 0.0 | 67.2 | $0.2 in / $0.5 out |
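The Price column quotes input and output rates separately, which makes direct cost comparison awkward. A single blended figure is one way to compare models; below is a minimal sketch, assuming an illustrative 3:1 input-to-output token mix (the ratio is an assumption for demonstration, not part of this leaderboard's methodology):

```python
def blended_price(price_in: float, price_out: float,
                  in_ratio: float = 3.0, out_ratio: float = 1.0) -> float:
    """Weighted average of input/output prices.

    The default 3:1 input:output token mix is an illustrative
    assumption; real workloads vary widely.
    """
    total = in_ratio + out_ratio
    return (price_in * in_ratio + price_out * out_ratio) / total

# GPT-5.4 nano at $0.2 in / $1.25 out:
# (0.2 * 3 + 1.25 * 1) / 4 = 0.4625
print(blended_price(0.2, 1.25))
```

Models with cheap input but expensive output (e.g. GPT-5.1 at $1.25 in / $10 out) look very different under output-heavy ratios, so the chosen mix matters more than the raw quotes suggest.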
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.