Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**296** tracked models · **27** providers · **253** benchmarked · **32.1** avg. index
| Rank | Model | Model ID | Capabilities | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | Claude Opus 4.6 | `claude-opus-4-6` | multimodal, vision, multi-input reasoning | Anthropic | 43.1 | 79.5 | 43.1 | 59.3 | 73.3 | 10.7 | $5 in / $25 out |
| 122 | Claude Opus 4.7 | `claude-opus-4-7` | multimodal, vision, multi-input reasoning | Anthropic | 43.1 | 76.8 | 43.1 | 68.6 | 81.4 | 10.7 | $5 in / $25 out |
| 123 | Mistral Large 3 (675B Instruct 2512) | `mistral-large-latest` | multimodal, vision, multi-input reasoning | Mistral AI | 40.1 | 22.2 | 40.1 | 0.0 | 0.0 | 44.5 | $0.5 in / $1.5 out |
| 124 | Qwen3 30B A3B | `qwen3-30b-a3b` | text, inference | Alibaba Cloud / Qwen Team | 40.1 | 25.6 | 40.1 | 0.0 | 0.0 | 71.3 | $0.1 in / $0.44 out |
| 125 | DeepSeek-V3 0324 | `deepseek-v3-0324` | text, inference | DeepSeek | 39.8 | 32.8 | 39.8 | 0.0 | 0.0 | 57.7 | $0.28 in / $1.14 out |
| 126 | DeepSeek-V3.1 | `deepseek-v3.1` | code, programming, tool use | DeepSeek | 39.8 | 38.4 | 39.8 | 15.2 | 28.3 | 58.8 | $0.27 in / $1 out |
| 127 | o3 | `o3-2025-04-16` | multimodal, vision, multi-input reasoning | OpenAI | 38.9 | 46.0 | 38.9 | 19.6 | 30.2 | 27.7 | $2 in / $8 out |
| 128 | Grok-2 | `grok-2` | multimodal, vision, multi-input reasoning | xAI | 38.3 | 27.1 | 38.3 | 0.0 | 0.0 | 25.4 | $2 in / $10 out |
| 129 | GPT-3.5 Turbo | `gpt-3.5-turbo-0125` | multimodal, vision, multi-input reasoning | OpenAI | 36.7 | 2.5 | 36.7 | 0.0 | 0.0 | 49.4 | $0.5 in / $1.5 out |
| 130 | GLM-4.6 | `glm-4.6` | multimodal, vision, multi-input reasoning | Zhipu AI | 34.5 | 46.5 | 34.5 | 37.3 | 45.7 | 42.9 | $0.55 in / $2.19 out |
| 131 | GPT OSS 120B | `gpt-oss-120b` | text, inference | OpenAI | 34.5 | 36.1 | 34.5 | 26.8 | 0.0 | 76.4 | $0.09 in / $0.45 out |
| 132 | Jamba 1.5 Large | `jamba-1.5-large` | text, inference | AI21 Labs | 33.6 | 8.1 | 33.6 | 0.0 | 0.0 | 25.2 | $2 in / $8 out |
| 133 | Qwen3 235B A22B | `qwen3-235b-a22b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 33.5 | 30.5 | 33.5 | 0.0 | 0.0 | 84.0 | $0.1 in / $0.1 out |
| 134 | o1-preview | `o1-preview` | code, programming, tool use | OpenAI | 33.0 | 41.8 | 33.0 | 0.0 | 9.5 | 11.8 | $15 in / $60 out |
| 135 | Gemini 2.5 Flash-Lite | `gemini-2.5-flash-lite` | multimodal, vision, multi-input reasoning | Google | 32.8 | 21.4 | 32.8 | 0.0 | 3.5 | 64.1 | $0.1 in / $0.4 out |
| 136 | Command R+ | `command-r-plus-04-2024` | text, inference | Cohere | 32.5 | 0.0 | 32.5 | 0.0 | 0.0 | 55.4 | $0.25 in / $1 out |
| 137 | GPT-5.2 Pro | `gpt-5.2-pro-2025-12-11` | multimodal, vision, multi-input reasoning | OpenAI | 31.6 | 66.9 | 31.6 | 55.4 | 0.0 | 2.7 | $21 in / $168 out |
| 138 | Claude 3.5 Haiku | `claude-3-5-haiku-20241022` | code, programming, tool use | Anthropic | 30.5 | 10.8 | 30.5 | 3.0 | 7.8 | 31.8 | $0.8 in / $4 out |
| 139 | Claude 3.7 Sonnet | `claude-3-7-sonnet-20250219` | multimodal, vision, multi-input reasoning | Anthropic | 30.5 | 43.5 | 30.5 | 49.0 | 39.6 | 13.3 | $3 in / $15 out |
| 140 | Claude 3 Sonnet | `claude-3-sonnet-20240229` | multimodal, vision, multi-input reasoning | Anthropic | 30.5 | 10.0 | 30.5 | 0.0 | 0.0 | 13.3 | $3 in / $15 out |
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
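As a worked example of how the Price column translates into real spending, the sketch below estimates the dollar cost of a single request. It assumes the listed prices follow the common per-million-tokens convention (the page does not state the unit explicitly), and `request_cost` is a hypothetical helper for illustration, not part of any provider SDK.

```python
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Return the USD cost of one request.

    price_in / price_out are dollars per 1M tokens, matching the
    "$X in / $Y out" format used in the table above (assumption:
    per-million-token pricing).
    """
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Example: Claude Opus 4.6 at $5 in / $25 out,
# with a 2,000-token prompt and a 500-token reply.
cost = request_cost(5.0, 25.0, tokens_in=2_000, tokens_out=500)
print(f"${cost:.4f}")  # 0.01 input + 0.0125 output = $0.0225
```

Output-heavy workloads are dominated by the out-price: at $21 in / $168 out (GPT-5.2 Pro), the same request would cost roughly $0.126, about 5.6x more.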