Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

  • Tracked models: 296
  • Providers: 27
  • Benchmarked: 253
  • Avg. index: 34.7

| Rank | Model | Model ID | Tags | Provider | Score (overall) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | Qwen3 VL 30B A3B Thinking | qwen3-vl-30b-a3b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 40.4 | 35.1 | 66.0 | 21.3 | 0.0 | 59.9 | $0.2 in / $1 out |
| 122 | Granite 3.3 8B Instruct | granite-3.3-8b-instruct | multimodal, vision, multi-input reasoning | IBM | 40.1 | 0.0 | 29.7 | 0.0 | 0.0 | 56.7 | $0.5 in / $0.5 out |
| 123 | Gemini 2.5 Flash | gemini-2.5-flash | multimodal, vision, multi-input reasoning | Google | 40.0 | 39.6 | 62.8 | 0.0 | 22.9 | 42.6 | $0.3 in / $2.5 out |
| 124 | Qwen3 VL 32B Thinking | qwen3-vl-32b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 39.8 | 44.3 | 0.0 | 34.6 | 0.0 | 0.0 | N/A |
| 125 | DeepSeek-V3 0324 | deepseek-v3-0324 | text, inference | DeepSeek | 39.5 | 32.8 | 39.8 | 0.0 | 0.0 | 57.7 | $0.28 in / $1.14 out |
| 126 | DeepSeek R1 Zero | deepseek-r1-zero | text, inference | DeepSeek | 39.4 | 39.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 127 | Qwen3 VL 8B Thinking | qwen3-vl-8b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 39.4 | 35.6 | 66.0 | 23.5 | 0.0 | 45.6 | $0.18 in / $2.09 out |
| 128 | Qwen3 VL 30B A3B Instruct | qwen3-vl-30b-a3b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 39.2 | 28.3 | 66.0 | 23.6 | 0.0 | 63.7 | $0.2 in / $0.7 out |
| 129 | Nova Pro | nova-pro | multimodal, vision, multi-input reasoning | Amazon | 39.2 | 20.0 | 70.5 | 0.0 | 0.0 | 43.2 | $0.8 in / $3.2 out |
| 130 | Qwen2.5 7B Instruct | qwen-2.5-7b-instruct | text, inference | Alibaba Cloud / Qwen Team | 39.2 | 7.4 | 71.1 | 0.0 | 0.0 | 77.2 | $0.3 in / $0.3 out |
| 131 | K-EXAONE-236B-A23B | k-exaone-236b-a23b | multimodal, vision, multi-input reasoning | LG AI Research | 39.0 | 43.4 | 24.9 | 0.0 | 0.0 | 49.2 | $0.6 in / $1 out |
| 132 | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | multimodal, vision, multi-input reasoning | Anthropic | 38.9 | 43.5 | 30.5 | 49.0 | 39.6 | 13.3 | $3 in / $15 out |
| 133 | Qwen3.5-9B | qwen3.5-9b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.5 | 38.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 134 | Qwen3 30B A3B | qwen3-30b-a3b | text, inference | Alibaba Cloud / Qwen Team | 38.4 | 25.6 | 40.1 | 0.0 | 0.0 | 71.3 | $0.1 in / $0.44 out |
| 135 | DeepSeek-V3.2 (Thinking) | deepseek-reasoner | code, programming, tool use | DeepSeek | 38.2 | 52.5 | 0.0 | 15.5 | 44.9 | 0.0 | N/A |
| 136 | QvQ-72B-Preview | qvq-72b-preview | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.2 | 38.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 137 | Gemini 1.5 Pro | gemini-1.5-pro | multimodal, vision, multi-input reasoning | Google | 38.2 | 27.6 | 65.2 | 0.0 | 0.0 | 24.3 | $2.5 in / $10 out |
| 138 | GPT OSS 120B | gpt-oss-120b | text, inference | OpenAI | 38.1 | 36.1 | 34.5 | 26.8 | 0.0 | 76.4 | $0.09 in / $0.45 out |
| 139 | Claude 3.5 Sonnet | claude-3-5-sonnet-20240620 | multimodal, vision, multi-input reasoning | Anthropic | 37.9 | 25.4 | 68.2 | 0.0 | 0.0 | 24.6 | $3 in / $15 out |
| 140 | LongCat-Flash-Thinking | longcat-flash-thinking | code, programming, tool use | Meituan | 37.6 | 50.2 | 0.0 | 0.0 | 21.6 | 0.0 | N/A |

Page 7 of 15 · 296 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
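
Price entries on this page follow one of two shapes: a pair such as "$0.28 in / $1.14 out", or "N/A" when no price is listed. For readers who copy the rankings into their own tooling, the following is a minimal sketch of how such a string might be split into numeric input and output prices. The function name `parse_price` is hypothetical and not part of any Skytells API, and because the page does not state the pricing unit, the sketch returns bare numbers without assuming one.

```python
import re
from typing import Optional, Tuple

# Matches price strings in the form shown on this page, e.g. "$0.28 in / $1.14 out".
# Assumption: the format is exactly "$<number> in / $<number> out".
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None when the cell is "N/A".

    Hypothetical helper for illustration only; the unit is not given
    on the page, so none is assumed here.
    """
    cell = cell.strip()
    if cell == "N/A":
        return None
    match = _PRICE_RE.match(cell)
    if match is None:
        raise ValueError(f"unrecognized price cell: {cell!r}")
    return float(match.group(1)), float(match.group(2))
```

Returning `None` for "N/A" keeps unpriced models distinguishable from models priced at zero, which matters if the values are later sorted or averaged.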