Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


296 tracked models · 27 providers · 253 benchmarked · 34.7 avg. index

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 161 | GPT-4.1 nano | `gpt-4.1-nano-2025-04-14` | multimodal, vision, multi-input reasoning | OpenAI | 34.2 | 12.5 | 93.4 | 0.0 | 0.0 | 82.7 | $0.1 in / $0.4 out |
| 162 | Nemotron 3 Nano (30B A3B) | `nemotron-3-nano-30b-a3b` | code, programming, tool use | NVIDIA | 34.1 | 45.4 | 66.0 | 3.3 | 4.4 | 90.9 | $0.06 in / $0.24 out |
| 163 | Qwen3.6-35B-A3B | `qwen3.6-35b-a3b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 33.7 | 55.3 | 0.0 | 15.5 | 26.0 | 0.0 | N/A |
| 164 | MiniMax M1 80K | `minimax-m1-80k` | code, programming, tool use | MiniMax | 33.6 | 24.2 | 84.0 | 20.9 | 19.0 | 41.8 | $0.55 in / $2.2 out |
| 165 | Ministral 8B Instruct | `ministral-8b-instruct-2410` | text, inference | Mistral AI | 33.6 | 0.0 | 7.0 | 0.0 | 0.0 | 76.1 | $0.1 in / $0.1 out |
| 166 | o3 | `o3-2025-04-16` | multimodal, vision, multi-input reasoning | OpenAI | 33.2 | 46.0 | 38.9 | 19.6 | 30.2 | 27.7 | $2 in / $8 out |
| 167 | DeepSeek-V3 | `deepseek-v3` | code, programming, tool use | DeepSeek | 33.2 | 27.3 | 58.0 | 0.0 | 10.4 | 60.5 | $0.27 in / $1.1 out |
| 168 | DeepSeek-V3.1 | `deepseek-v3.1` | code, programming, tool use | DeepSeek | 32.9 | 38.4 | 39.8 | 15.2 | 28.3 | 58.8 | $0.27 in / $1 out |
| 169 | DeepSeek R1 Distill Qwen 32B | `deepseek-r1-distill-qwen-32b` | text, inference | DeepSeek | 32.7 | 26.6 | 16.6 | 0.0 | 0.0 | 75.9 | $0.12 in / $0.18 out |
| 170 | DeepSeek-V2.5 | `deepseek-v2.5` | code, programming, tool use | DeepSeek | 32.5 | 0.0 | 46.5 | 0.0 | 0.9 | 79.7 | $0.14 in / $0.28 out |
| 171 | DeepSeek R1 Distill Llama 70B | `deepseek-r1-distill-llama-70b` | text, inference | DeepSeek | 32.2 | 28.8 | 16.6 | 0.0 | 0.0 | 66.6 | $0.1 in / $0.4 out |
| 172 | Qwen3.5-4B | `qwen3.5-4b` | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 32.1 | 32.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 173 | Claude 3 Haiku | `claude-3-haiku-20240307` | multimodal, vision, multi-input reasoning | Anthropic | 32.0 | 5.8 | 61.8 | 0.0 | 0.0 | 57.9 | $0.25 in / $1.25 out |
| 174 | Mistral Large 3 (675B Instruct 2512) | `mistral-large-latest` | multimodal, vision, multi-input reasoning | Mistral AI | 31.6 | 22.2 | 40.1 | 0.0 | 0.0 | 44.5 | $0.5 in / $1.5 out |
| 175 | Phi 4 Reasoning Plus | `phi-4-reasoning-plus` | text, inference | Microsoft | 31.5 | 31.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 176 | Hermes 3 70B | `hermes-3-70b` | text, inference | Nous Research | 30.1 | 30.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 177 | Grok-2 | `grok-2` | multimodal, vision, multi-input reasoning | xAI | 30.1 | 27.1 | 38.3 | 0.0 | 0.0 | 25.4 | $2 in / $10 out |
| 178 | GLM-4.7-Flash | `glm-4.7-flash` | code, programming, tool use | Zhipu AI | 29.9 | 38.2 | 29.7 | 11.4 | 20.7 | 72.1 | $0.07 in / $0.4 out |
| 179 | GPT-4o | `gpt-4o-2024-05-13` | multimodal, vision, multi-input reasoning | OpenAI | 29.9 | 22.3 | 45.4 | 0.0 | 0.0 | 26.5 | $2.5 in / $10 out |
| 180 | Llama 3.3 70B Instruct | `llama-3.3-70b-instruct` | text, inference | Meta | 29.9 | 19.6 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
Page 9 of 15 · 296 models


Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
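One way to picture such a multi-dimensional composite is a weighted average of the per-dimension scores. The sketch below is purely illustrative: the weights and the `overall_score` helper are assumptions for demonstration, not Skytells' published methodology, and the result will not generally match the leaderboard's own numbers.

```python
# Illustrative composite score for one leaderboard row.
# The dimension weights below are assumptions, NOT the actual
# Skytells methodology.
WEIGHTS = {
    "benchmarks": 0.30,
    "inference": 0.20,
    "agentic": 0.15,
    "programming": 0.20,
    "value": 0.15,
}

def overall_score(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores (each on a 0-100 scale)."""
    total = sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)
    return round(total, 1)

# Per-dimension scores for GPT-4.1 nano, taken from the table above.
gpt41_nano = {
    "benchmarks": 12.5,
    "inference": 93.4,
    "agentic": 0.0,
    "programming": 0.0,
    "value": 82.7,
}
print(overall_score(gpt41_nano))
```

Missing dimensions default to 0.0, which mirrors how rows with sparse evaluation data (e.g. N/A-priced models scored on a single axis) still receive an overall number.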
