Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

  • Tracked models: 296
  • Providers: 27
  • Benchmarked: 253
  • Avg. index: 27.4

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 161 | Magistral Medium | magistral-medium | multimodal, vision, multi-input reasoning | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 162 | Mistral Large 3 (675B Base) | mistral-large-3-675b-base-2512 | multimodal, vision, multi-input reasoning | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 163 | Mistral Large 3 (675B Instruct 2512 Eagle) | mistral-large-3-675B-instruct-2512-eagle | multimodal, vision, multi-input reasoning | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 164 | Mistral Large 3 (675B Instruct 2512 NVFP4) | mistral-large-3-675b-instruct-2512-nvfp4 | multimodal, vision, multi-input reasoning | Mistral AI | 22.2 | 22.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 165 | Mistral Large 3 (675B Instruct 2512) | mistral-large-latest | multimodal, vision, multi-input reasoning | Mistral AI | 22.2 | 22.2 | 40.1 | 0.0 | 0.0 | 44.5 | $0.5 in / $1.5 out |
| 166 | Ministral 3 (3B Reasoning 2512) | ministral-3b-latest | multimodal, vision, multi-input reasoning | Mistral AI | 22.0 | 22.0 | 79.6 | 0.0 | 0.0 | 95.8 | $0.1 in / $0.1 out |
| 167 | Phi 4 Mini Reasoning | phi-4-mini-reasoning | text, inference | Microsoft | 21.7 | 21.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 168 | Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | multimodal, vision, multi-input reasoning | Google | 21.4 | 21.4 | 32.8 | 0.0 | 3.5 | 64.1 | $0.1 in / $0.4 out |
| 169 | Qwen3 32B | qwen3-32b | text, inference | Alibaba Cloud / Qwen Team | 21.4 | 21.4 | 13.3 | 0.0 | 0.0 | 69.8 | $0.1 in / $0.3 out |
| 170 | Qwen2.5 VL 32B Instruct | qwen2.5-vl-32b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 21.2 | 21.2 | 0.0 | 1.6 | 0.0 | 0.0 | N/A |
| 171 | GPT-4.1 mini | gpt-4.1-mini-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 20.7 | 20.7 | 90.6 | 8.9 | 2.6 | 56.8 | $0.4 in / $1.6 out |
| 172 | Llama 3.1 405B Instruct | llama-3.1-405b-instruct | text, inference | Meta | 20.0 | 20.0 | 21.4 | 0.0 | 0.0 | 44.5 | $0.89 in / $0.89 out |
| 173 | Nova Pro | nova-pro | multimodal, vision, multi-input reasoning | Amazon | 20.0 | 20.0 | 70.5 | 0.0 | 0.0 | 43.2 | $0.8 in / $3.2 out |
| 174 | Llama 3.3 70B Instruct | llama-3.3-70b-instruct | text, inference | Meta | 19.6 | 19.6 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
| 175 | Qwen3 VL 4B Instruct | qwen3-vl-4b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 19.6 | 19.6 | 66.0 | 19.5 | 0.0 | 70.3 | $0.1 in / $0.6 out |
| 176 | Claude 3 Opus | claude-3-opus-20240229 | multimodal, vision, multi-input reasoning | Anthropic | 19.3 | 19.3 | 71.7 | 0.0 | 0.0 | 19.5 | $15 in / $75 out |
| 177 | Gemma 4 E4B | gemma-4-e4b-it | multimodal, vision, multi-input reasoning | Google | 19.2 | 19.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 178 | Mistral Small 3.2 24B Instruct | mistral-small-3.2-24b-instruct-2506 | multimodal, vision, multi-input reasoning | Mistral AI | 19.1 | 19.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 179 | Qwen2.5 32B Instruct | qwen-2.5-32b-instruct | text, inference | Alibaba Cloud / Qwen Team | 18.6 | 18.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 180 | DeepSeek R1 Distill Qwen 7B | deepseek-r1-distill-qwen-7b | text, inference | DeepSeek | 18.3 | 18.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |

Page 9 of 15 · 296 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.