Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


  • Tracked models: 296
  • Providers: 27
  • Benchmarked: 253
  • Avg. index: 34.7


| Rank | Model | Model ID | Tags | Provider | Score (overall) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 141 | Llama 3.2 11B Instruct | llama-3.2-11b-instruct | multimodal, vision, multi-input reasoning | Meta | 37.5 | 4.0 | 60.3 | 0.0 | 0.0 | 94.9 | $0.05 in / $0.05 out |
| 142 | Nova Micro | nova-micro | text, inference | Amazon | 37.3 | 9.1 | 52.7 | 0.0 | 0.0 | 91.3 | $0.03 in / $0.14 out |
| 143 | Qwen3 Max | qwen3-max | code, programming, tool use | Alibaba Cloud / Qwen Team | 37.1 | 29.8 | 55.2 | 0.0 | 35.8 | 31.3 | $0.5 in / $5 out |
| 144 | o1-mini | o1-mini | text, inference | OpenAI | 37.1 | 25.7 | 61.3 | 0.0 | 0.0 | 30.1 | $3 in / $12 out |
| 145 | GPT OSS 20B | gpt-oss-20b | text, inference | OpenAI | 37.0 | 25.8 | 77.2 | 6.0 | 0.0 | 79.0 | $0.1 in / $0.5 out |
| 146 | Qwen3-Next-80B-A3B-Thinking | qwen3-next-80b-a3b-thinking | text, inference | Alibaba Cloud / Qwen Team | 36.8 | 44.7 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 147 | Grok-4.1 Thinking | grok-4.1-thinking-2025-11-17 | multimodal, vision, multi-input reasoning | xAI | 36.7 | 0.0 | 48.5 | 0.0 | 0.0 | 17.8 | $3 in / $15 out |
| 148 | GLM-4.5 | glm-4.5 | code, programming, tool use | Zhipu AI | 36.6 | 33.8 | 0.0 | 36.4 | 40.3 | 0.0 | N/A |
| 149 | DeepSeek-V3.2-Speciale | deepseek-v3.2-speciale | code, programming, tool use | DeepSeek | 36.5 | 53.8 | 0.0 | 8.5 | 44.9 | 0.0 | N/A |
| 150 | Qwen3 VL 4B Instruct | qwen3-vl-4b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 35.6 | 19.6 | 66.0 | 19.5 | 0.0 | 70.3 | $0.1 in / $0.6 out |
| 151 | Llama 3.1 Nemotron Ultra 253B v1 | llama-3.1-nemotron-ultra-253b-v1 | text, inference | NVIDIA | 35.4 | 35.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 152 | Qwen3 VL 4B Thinking | qwen3-vl-4b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 35.4 | 23.0 | 66.0 | 18.9 | 0.0 | 60.4 | $0.1 in / $1 out |
| 153 | Jamba 1.5 Mini | jamba-1.5-mini | text, inference | AI21 Labs | 35.4 | 4.7 | 65.8 | 0.0 | 0.0 | 72.4 | $0.2 in / $0.4 out |
| 154 | GPT-5.4 nano | gpt-5.4-nano | multimodal, vision, multi-input reasoning | OpenAI | 35.3 | 45.6 | 76.5 | 9.7 | 10.0 | 57.1 | $0.2 in / $1.25 out |
| 155 | Kimi-k1.5 | kimi-k1.5 | multimodal, vision, multi-input reasoning | Moonshot AI | 35.3 | 35.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 156 | GPT-4.1 | gpt-4.1-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 35.3 | 28.7 | 75.9 | 32.8 | 17.3 | 34.6 | $2 in / $8 out |
| 157 | QwQ-32B-Preview | qwq-32b-preview | text, inference | Alibaba Cloud / Qwen Team | 35.2 | 28.8 | 29.7 | 0.0 | 0.0 | 61.9 | $0.15 in / $0.6 out |
| 158 | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | multimodal, vision, multi-input reasoning | Anthropic | 34.9 | 33.7 | 68.2 | 38.7 | 12.9 | 24.6 | $3 in / $15 out |
| 159 | Claude 3 Opus | claude-3-opus-20240229 | multimodal, vision, multi-input reasoning | Anthropic | 34.9 | 19.3 | 71.7 | 0.0 | 0.0 | 19.5 | $15 in / $75 out |
| 160 | Qwen3 VL 8B Instruct | qwen3-vl-8b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 34.9 | 9.8 | 66.0 | 26.7 | 0.0 | 75.3 | $0.08 in / $0.5 out |

Page 8 of 15 · 296 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
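
Price entries on this page are written either as a pair such as "$0.5 in / $5 out" or as "N/A". For readers who want to work with these values programmatically, the following is a minimal sketch of parsing them. The function names `parse_price` and `output_to_input_ratio` are hypothetical helpers introduced for illustration and are not part of any Skytells API. The page does not state the pricing unit, so the sketch compares only the ratio between output and input price, which does not depend on the unit.

```python
import re
from typing import Optional, Tuple

# Matches price strings as they appear on the page, e.g. "$0.5 in / $5 out".
PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for "N/A" or unrecognized text.

    Hypothetical helper: the pricing unit is not stated on the page,
    so the returned numbers carry no unit here.
    """
    match = PRICE_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def output_to_input_ratio(text: str) -> Optional[float]:
    """Return output price divided by input price, or None if it cannot be computed."""
    price = parse_price(text)
    if price is None or price[0] == 0:
        return None
    return price[1] / price[0]


if __name__ == "__main__":
    for entry in ["$0.05 in / $0.05 out", "$3 in / $12 out", "N/A"]:
        print(entry, "->", parse_price(entry), output_to_input_ratio(entry))
```

Returning `None` for "N/A" keeps models without published pricing distinguishable from models with a price of zero.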