Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.


© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


296 tracked models · 27 providers · 253 benchmarked · 32.1 avg. index


| Rank | Model | ID | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 141 | Claude Opus 4.1 | `claude-opus-4-1-20250805` | Anthropic | multimodal, vision, multi-input reasoning | 30.5 | 47.9 | 30.5 | 66.8 | 62.1 | 7.2 | $15 / $75 |
| 142 | Claude Opus 4.5 | `claude-opus-4-5-20251101` | Anthropic | multimodal, vision, multi-input reasoning | 30.5 | 56.1 | 30.5 | 42.5 | 74.2 | 10.7 | $5 / $25 |
| 143 | Claude Sonnet 4.5 | `claude-sonnet-4-5-20250929` | Anthropic | multimodal, vision, multi-input reasoning | 30.5 | 53.0 | 30.5 | 71.8 | 74.6 | 13.3 | $3 / $15 |
| 144 | Claude Sonnet 4.6 | `claude-sonnet-4-6` | Anthropic | multimodal, vision, multi-input reasoning | 30.5 | 66.1 | 30.5 | 48.5 | 68.2 | 13.3 | $3 / $15 |
| 145 | GLM-4.7-Flash | `glm-4.7-flash` | Zhipu AI | code, programming, tool use | 29.7 | 38.2 | 29.7 | 11.4 | 20.7 | 72.1 | $0.07 / $0.4 |
| 146 | GPT-4.5 | `gpt-4.5` | OpenAI | multimodal, vision, multi-input reasoning | 29.7 | 41.9 | 29.7 | 35.8 | 6.0 | 7.0 | $75 / $150 |
| 147 | Granite 3.3 8B Instruct | `granite-3.3-8b-instruct` | IBM | multimodal, vision, multi-input reasoning | 29.7 | 0.0 | 29.7 | 0.0 | 0.0 | 56.7 | $0.5 / $0.5 |
| 148 | QwQ-32B-Preview | `qwq-32b-preview` | Alibaba Cloud / Qwen Team | text, inference | 29.7 | 28.8 | 29.7 | 0.0 | 0.0 | 61.9 | $0.15 / $0.6 |
| 149 | Llama 3.1 8B Instruct | `llama-3.1-8b-instruct` | Meta | text, inference | 26.7 | 3.2 | 26.7 | 0.0 | 0.0 | 83.9 | $0.03 / $0.03 |
| 150 | K-EXAONE-236B-A23B | `k-exaone-236b-a23b` | LG AI Research | multimodal, vision, multi-input reasoning | 24.9 | 43.4 | 24.9 | 0.0 | 0.0 | 49.2 | $0.6 / $1 |
| 151 | GLM-5 | `glm-5` | Zhipu AI | code, programming, tool use | 23.0 | 0.0 | 23.0 | 47.8 | 63.8 | 30.6 | $1 / $3.2 |
| 152 | Llama 3.1 405B Instruct | `llama-3.1-405b-instruct` | Meta | text, inference | 21.4 | 20.0 | 21.4 | 0.0 | 0.0 | 44.5 | $0.89 / $0.89 |
| 153 | Llama 3.1 70B Instruct | `llama-3.1-70b-instruct` | Meta | text, inference | 21.4 | 11.2 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 / $0.2 |
| 154 | Llama 3.3 70B Instruct | `llama-3.3-70b-instruct` | Meta | text, inference | 21.4 | 19.6 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 / $0.2 |
| 155 | Mistral Large 2 | `mistral-large-2-2407` | Mistral AI | text, inference | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 26.7 | $2 / $6 |
| 156 | Mistral NeMo Instruct | `mistral-nemo-instruct-2407` | Mistral AI | text, inference | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 77.3 | $0.15 / $0.15 |
| 157 | Mistral Small 3 24B Instruct | `mistral-small-24b-instruct-2501` | Mistral AI | text, inference | 21.4 | 14.2 | 21.4 | 0.0 | 0.0 | 80.7 | $0.07 / $0.14 |
| 158 | o3-pro | `o3-pro-2025-06-10` | OpenAI | multimodal, vision, multi-input reasoning | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 3.6 | $20 / $80 |
| 159 | Qwen2.5-Coder 32B Instruct | `qwen-2.5-coder-32b-instruct` | Alibaba Cloud / Qwen Team | text, inference | 21.4 | 0.0 | 21.4 | 0.0 | 0.0 | 81.5 | $0.09 / $0.09 |
| 160 | Gemma 3 12B | `gemma-3-12b-it` | Google | multimodal, vision, multi-input reasoning | 20.3 | 9.1 | 20.3 | 0.0 | 0.0 | 80.7 | $0.05 / $0.1 |

Page 8 of 15 · 296 models
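Each price above lists separate rates for input and output tokens. As a quick sketch of how to turn those rates into a per-request cost, assuming the conventional USD-per-million-tokens interpretation (the unit is not stated on this page), with `request_cost` as a hypothetical helper:

```python
# Estimate the cost of one request from "in / out" leaderboard prices,
# assuming rates are USD per million tokens (an assumption; the page
# does not state the unit).
def request_cost(price_in, price_out, input_tokens, output_tokens):
    """Return USD cost for a single request given per-1M-token rates."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# Example: Claude Opus 4.1 at $15 in / $75 out,
# for a 2,000-token prompt and an 800-token completion:
cost = request_cost(15.0, 75.0, 2_000, 800)
print(f"${cost:.3f}")  # $0.090
```

Under the same assumption, a model like Llama 3.1 8B Instruct at $0.03 / $0.03 would cost roughly 500× less for the identical request.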


Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
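The note above describes a multi-dimensional composite but does not publish the weighting. One plausible shape for such a score is a weighted mean over the five dimensions shown in the table; the weights below are invented purely for illustration and are not Skytells' actual formula:

```python
# Hypothetical sketch of a composite leaderboard score: a weighted mean
# over the per-dimension scores shown in the table. The weights are
# invented for illustration; the real weighting is not published here.
DIMENSIONS = ("benchmarks", "inference", "agentic", "programming", "value")

def composite_score(scores, weights):
    """Weighted mean of dimension scores (all on a 0-100 scale)."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight

weights = {"benchmarks": 0.3, "inference": 0.25, "agentic": 0.15,
           "programming": 0.15, "value": 0.15}
# Per-dimension scores for Claude Opus 4.1, taken from the table above.
opus_41 = {"benchmarks": 47.9, "inference": 30.5, "agentic": 66.8,
           "programming": 62.1, "value": 7.2}
print(round(composite_score(opus_41, weights), 1))  # → 42.4
```

Different weightings reorder the table substantially, which is why the note cautions that these scores may differ from individual third-party benchmarks.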
