Skytells
Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

Explore full leaderboard · Browse model catalog

  • 296 tracked models
  • 27 providers
  • 253 benchmarked
  • 32.1 avg. index

Views: Overall · Benchmarks · Inference · Agentic · Programming · Value / Price

296 models

Rows are listed under the Inference view, so each model's Score matches its Inference score.

| Rank | Model | Model ID | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|------|-------|----------|----------|------|-------|------------|-----------|---------|-------------|-------|-------|
| 121 | Claude Opus 4.6 | claude-opus-4-6 | Anthropic | multimodal, vision, multi-input reasoning | 43.1 | 79.5 | 43.1 | 59.3 | 73.3 | 10.7 | $5 in / $25 out |
| 122 | Claude Opus 4.7 | claude-opus-4-7 | Anthropic | multimodal, vision, multi-input reasoning | 43.1 | 76.8 | 43.1 | 68.6 | 81.4 | 10.7 | $5 in / $25 out |
| 123 | Mistral Large 3 (675B Instruct 2512) | mistral-large-latest | Mistral AI | multimodal, vision, multi-input reasoning | 40.1 | 22.2 | 40.1 | 0.0 | 0.0 | 44.5 | $0.5 in / $1.5 out |
| 124 | Qwen3 30B A3B | qwen3-30b-a3b | Alibaba Cloud / Qwen Team | text, inference | 40.1 | 25.6 | 40.1 | 0.0 | 0.0 | 71.3 | $0.1 in / $0.44 out |
| 125 | DeepSeek-V3 0324 | deepseek-v3-0324 | DeepSeek | text, inference | 39.8 | 32.8 | 39.8 | 0.0 | 0.0 | 57.7 | $0.28 in / $1.14 out |
| 126 | DeepSeek-V3.1 | deepseek-v3.1 | DeepSeek | code, programming, tool use | 39.8 | 38.4 | 39.8 | 15.2 | 28.3 | 58.8 | $0.27 in / $1 out |
| 127 | o3 | o3-2025-04-16 | OpenAI | multimodal, vision, multi-input reasoning | 38.9 | 46.0 | 38.9 | 19.6 | 30.2 | 27.7 | $2 in / $8 out |
| 128 | Grok-2 | grok-2 | xAI | multimodal, vision, multi-input reasoning | 38.3 | 27.1 | 38.3 | 0.0 | 0.0 | 25.4 | $2 in / $10 out |
| 129 | GPT-3.5 Turbo | gpt-3.5-turbo-0125 | OpenAI | multimodal, vision, multi-input reasoning | 36.7 | 2.5 | 36.7 | 0.0 | 0.0 | 49.4 | $0.5 in / $1.5 out |
| 130 | GLM-4.6 | glm-4.6 | Zhipu AI | multimodal, vision, multi-input reasoning | 34.5 | 46.5 | 34.5 | 37.3 | 45.7 | 42.9 | $0.55 in / $2.19 out |
| 131 | GPT OSS 120B | gpt-oss-120b | OpenAI | text, inference | 34.5 | 36.1 | 34.5 | 26.8 | 0.0 | 76.4 | $0.09 in / $0.45 out |
| 132 | Jamba 1.5 Large | jamba-1.5-large | AI21 Labs | text, inference | 33.6 | 8.1 | 33.6 | 0.0 | 0.0 | 25.2 | $2 in / $8 out |
| 133 | Qwen3 235B A22B | qwen3-235b-a22b | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 33.5 | 30.5 | 33.5 | 0.0 | 0.0 | 84.0 | $0.1 in / $0.1 out |
| 134 | o1-preview | o1-preview | OpenAI | code, programming, tool use | 33.0 | 41.8 | 33.0 | 0.0 | 9.5 | 11.8 | $15 in / $60 out |
| 135 | Gemini 2.5 Flash-Lite | gemini-2.5-flash-lite | Google | multimodal, vision, multi-input reasoning | 32.8 | 21.4 | 32.8 | 0.0 | 3.5 | 64.1 | $0.1 in / $0.4 out |
| 136 | Command R+ | command-r-plus-04-2024 | Cohere | text, inference | 32.5 | 0.0 | 32.5 | 0.0 | 0.0 | 55.4 | $0.25 in / $1 out |
| 137 | GPT-5.2 Pro | gpt-5.2-pro-2025-12-11 | OpenAI | multimodal, vision, multi-input reasoning | 31.6 | 66.9 | 31.6 | 55.4 | 0.0 | 2.7 | $21 in / $168 out |
| 138 | Claude 3.5 Haiku | claude-3-5-haiku-20241022 | Anthropic | code, programming, tool use | 30.5 | 10.8 | 30.5 | 3.0 | 7.8 | 31.8 | $0.8 in / $4 out |
| 139 | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | Anthropic | multimodal, vision, multi-input reasoning | 30.5 | 43.5 | 30.5 | 49.0 | 39.6 | 13.3 | $3 in / $15 out |
| 140 | Claude 3 Sonnet | claude-3-sonnet-20240229 | Anthropic | multimodal, vision, multi-input reasoning | 30.5 | 10.0 | 30.5 | 0.0 | 0.0 | 13.3 | $3 in / $15 out |
Page 7 of 15 · 296 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
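The "$X in / $Y out" prices shown in the table can be turned into a per-request cost estimate. One assumption is made here: the listed rates are USD per million tokens, which is the convention most providers use but is not stated explicitly on this page.

```python
# Sketch of estimating a single request's cost from "in / out" pricing.
# Assumption: listed prices (e.g. "$5 in / $25 out") are USD per
# million tokens, the convention most providers use.

def request_cost(input_tokens: int, output_tokens: int,
                 price_in: float, price_out: float) -> float:
    """Cost in USD, with prices quoted per million tokens."""
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 10k input + 2k output tokens at Claude Opus 4.6's listed rates:
cost = request_cost(10_000, 2_000, price_in=5.0, price_out=25.0)
print(f"${cost:.3f}")  # $0.100
```

Because output tokens are typically several times more expensive than input tokens, response length often dominates the bill even when prompts are long.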
