Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.


© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

  • Tracked models: 296
  • Providers: 27
  • Benchmarked: 253
  • Avg. index: 32.1

Rank | Model | Model ID | Tags | Provider | Score (Inference) | Benchmarks | Inference | Agentic | Programming | Value | Price
181 | Qwen3-Next-80B-A3B-Thinking | qwen3-next-80b-a3b-thinking | text, inference | Alibaba Cloud / Qwen Team | 6.1 | 44.7 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 in / $1.5 out
182 | Mistral Small | mistral-small-2409 | text, inference | Mistral AI | 2.1 | 0.0 | 2.1 | 0.0 | 0.0 | 51.9 | $0.2 in / $0.6 out
183 | Claude Mythos Preview | claude-mythos-preview | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 80.0 | 0.0 | 70.1 | 84.2 | 1.6 | $25 in / $125 out
184 | Claude Opus 4 | claude-opus-4-20250514 | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 37.6 | 0.0 | 57.9 | 48.9 | 0.0 | N/A
185 | Claude Sonnet 4 | claude-sonnet-4-20250514 | multimodal, vision, multi-input reasoning | Anthropic | 0.0 | 40.9 | 0.0 | 49.4 | 44.3 | 0.0 | N/A
186 | Codestral-22B | codestral-22b | text, inference | Mistral AI | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
187 | DeepSeek R1 Distill Llama 8B | deepseek-r1-distill-llama-8b | text, inference | DeepSeek | 0.0 | 17.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
188 | DeepSeek R1 Distill Qwen 14B | deepseek-r1-distill-qwen-14b | text, inference | DeepSeek | 0.0 | 24.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
189 | DeepSeek R1 Distill Qwen 1.5B | deepseek-r1-distill-qwen-1.5b | text, inference | DeepSeek | 0.0 | 6.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
190 | DeepSeek R1 Distill Qwen 7B | deepseek-r1-distill-qwen-7b | text, inference | DeepSeek | 0.0 | 18.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
191 | DeepSeek R1 Zero | deepseek-r1-zero | text, inference | DeepSeek | 0.0 | 39.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
192 | DeepSeek-V3.2 (Thinking) | deepseek-reasoner | code, programming, tool use | DeepSeek | 0.0 | 52.5 | 0.0 | 15.5 | 44.9 | 0.0 | N/A
193 | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | code, programming, tool use | DeepSeek | 0.0 | 52.3 | 0.0 | 28.6 | 40.1 | 0.0 | N/A
194 | DeepSeek-V3.2-Speciale | deepseek-v3.2-speciale | code, programming, tool use | DeepSeek | 0.0 | 53.8 | 0.0 | 8.5 | 44.9 | 0.0 | N/A
195 | DeepSeek VL2 | deepseek-vl2 | multimodal, vision, multi-input reasoning | DeepSeek | 0.0 | 6.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
196 | DeepSeek VL2 Small | deepseek-vl2-small | multimodal, vision, multi-input reasoning | DeepSeek | 0.0 | 4.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
197 | DeepSeek VL2 Tiny | deepseek-vl2-tiny | multimodal, vision, multi-input reasoning | DeepSeek | 0.0 | 1.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
198 | ERNIE 5.0 | ernie-5.0 | multimodal, vision, multi-input reasoning | Baidu | 0.0 | 59.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
199 | Gemini 2.0 Flash Thinking | gemini-2.0-flash-thinking | multimodal, vision, multi-input reasoning | Google | 0.0 | 46.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A
200 | Gemini 3 Pro | gemini-3-pro-preview | multimodal, vision, multi-input reasoning | Google | 0.0 | 73.2 | 0.0 | 63.8 | 56.1 | 0.0 | N/A

Page 10 of 15 · 296 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
