Skytells

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model is ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.


  • Tracked models: 296
  • Providers: 27
  • Benchmarked: 253
  • Avg. index: 27.4


| Rank | Model | ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 121 | Qwen3-Next-80B-A3B-Instruct | qwen3-next-80b-a3b-instruct | text, inference | Alibaba Cloud / Qwen Team | 29.5 | 29.5 | 6.1 | 17.9 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 122 | Qwen3 VL 32B Instruct | qwen3-vl-32b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 29.4 | 29.4 | 0.0 | 27.9 | 0.0 | 0.0 | N/A |
| 123 | Llama 4 Scout | llama-4-scout | multimodal, vision, multi-input reasoning | Meta | 29.0 | 29.0 | 62.1 | 0.0 | 0.0 | 78.1 | $0.08 in / $0.3 out |
| 124 | DeepSeek R1 Distill Llama 70B | deepseek-r1-distill-llama-70b | text, inference | DeepSeek | 28.8 | 28.8 | 16.6 | 0.0 | 0.0 | 66.6 | $0.1 in / $0.4 out |
| 125 | QwQ-32B | qwq-32b | text, inference | Alibaba Cloud / Qwen Team | 28.8 | 28.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 126 | QwQ-32B-Preview | qwq-32b-preview | text, inference | Alibaba Cloud / Qwen Team | 28.8 | 28.8 | 29.7 | 0.0 | 0.0 | 61.9 | $0.15 in / $0.6 out |
| 127 | GPT-4.1 | gpt-4.1-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 28.7 | 28.7 | 75.9 | 32.8 | 17.3 | 34.6 | $2 in / $8 out |
| 128 | Qwen3 VL 30B A3B Instruct | qwen3-vl-30b-a3b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 28.3 | 28.3 | 66.0 | 23.6 | 0.0 | 63.7 | $0.2 in / $0.7 out |
| 129 | LongCat-Flash-Chat | longcat-flash-chat | code, programming, tool use | Meituan | 27.9 | 27.9 | 52.7 | 49.2 | 39.1 | 57.9 | $0.3 in / $1.2 out |
| 130 | Pixtral Large | pixtral-large | multimodal, vision, multi-input reasoning | Mistral AI | 27.8 | 27.8 | 7.0 | 0.0 | 0.0 | 22.4 | $2 in / $6 out |
| 131 | GLM-4.5-Air | glm-4.5-air | code, programming, tool use | Zhipu AI | 27.7 | 27.7 | 0.0 | 24.9 | 20.0 | 0.0 | N/A |
| 132 | Gemini 1.5 Pro | gemini-1.5-pro | multimodal, vision, multi-input reasoning | Google | 27.6 | 27.6 | 65.2 | 0.0 | 0.0 | 24.3 | $2.5 in / $10 out |
| 133 | MiniCPM-SALA | minicpm-sala | text, inference | OpenBMB | 27.5 | 27.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 134 | DeepSeek-V3 | deepseek-v3 | code, programming, tool use | DeepSeek | 27.3 | 27.3 | 58.0 | 0.0 | 10.4 | 60.5 | $0.27 in / $1.1 out |
| 135 | Grok-2 | grok-2 | multimodal, vision, multi-input reasoning | xAI | 27.1 | 27.1 | 38.3 | 0.0 | 0.0 | 25.4 | $2 in / $10 out |
| 136 | Kimi K2 Base | kimi-k2-base | text, inference | Moonshot AI | 26.9 | 26.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 137 | DeepSeek R1 Distill Qwen 32B | deepseek-r1-distill-qwen-32b | text, inference | DeepSeek | 26.6 | 26.6 | 16.6 | 0.0 | 0.0 | 75.9 | $0.12 in / $0.18 out |
| 138 | GPT-5 nano | gpt-5-nano-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 26.3 | 26.3 | 0.0 | 0.0 | 11.8 | 0.0 | N/A |
| 139 | GPT OSS 20B | gpt-oss-20b | text, inference | OpenAI | 25.8 | 25.8 | 77.2 | 6.0 | 0.0 | 79.0 | $0.1 in / $0.5 out |
| 140 | o1-mini | o1-mini | text, inference | OpenAI | 25.7 | 25.7 | 61.3 | 0.0 | 0.0 | 30.1 | $3 in / $12 out |

Page 7 of 15 · 296 models


Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
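
Prices in the listing appear as strings of the form `$0.15 in / $1.5 out`, or `N/A` where no price is shown. For anyone comparing cost across models, a minimal sketch of parsing those strings might look like the following. The function name `parse_price` is hypothetical, and the page does not state the pricing unit, so the sketch treats the figures as bare numbers.

```python
import re
from typing import Optional, Tuple

# Matches strings such as "$0.15 in / $1.5 out" or "$2 in / $10 out".
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None when the price is 'N/A'.

    Hypothetical helper: the unit of the figures is not stated on the page.
    """
    text = text.strip()
    if text == "N/A":
        return None
    match = _PRICE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized price string: {text!r}")
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    for raw in ["$0.15 in / $1.5 out", "$2 in / $10 out", "N/A"]:
        print(raw, "->", parse_price(raw))
```

Returning `None` for `N/A` keeps unpriced models distinguishable from models with a price of zero.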
