Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


  • 296 tracked models
  • 27 providers
  • 253 benchmarked models
  • 27.4 average index score


Showing ranks 101–120 of 296 models.

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|------|-------|----------|------|----------|-------|------------|-----------|---------|-------------|-------|-------|
| 101 | GPT OSS 120B | gpt-oss-120b | text, inference | OpenAI | 36.1 | 36.1 | 34.5 | 26.8 | 0.0 | 76.4 | $0.09 in / $0.45 out |
| 102 | Qwen3 VL 8B Thinking | qwen3-vl-8b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 35.6 | 35.6 | 66.0 | 23.5 | 0.0 | 45.6 | $0.18 in / $2.09 out |
| 103 | Llama 3.1 Nemotron Ultra 253B v1 | llama-3.1-nemotron-ultra-253b-v1 | text, inference | NVIDIA | 35.4 | 35.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 104 | Llama 4 Maverick | llama-4-maverick | multimodal, vision, multi-input reasoning | Meta | 35.4 | 35.4 | 55.8 | 0.0 | 0.0 | 57.1 | $0.17 in / $0.85 out |
| 105 | Kimi-k1.5 | kimi-k1.5 | multimodal, vision, multi-input reasoning | Moonshot AI | 35.3 | 35.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 106 | Qwen3 VL 30B A3B Thinking | qwen3-vl-30b-a3b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 35.1 | 35.1 | 66.0 | 21.3 | 0.0 | 59.9 | $0.2 in / $1 out |
| 107 | Mistral Small 4 | mistral-small-latest | multimodal, vision, multi-input reasoning | Mistral AI | 34.7 | 34.7 | 55.2 | 0.0 | 0.0 | 66.8 | $0.15 in / $0.6 out |
| 108 | GLM-4.5 | glm-4.5 | code, programming, tool use | Zhipu AI | 33.8 | 33.8 | 0.0 | 36.4 | 40.3 | 0.0 | N/A |
| 109 | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | multimodal, vision, multi-input reasoning | Anthropic | 33.7 | 33.7 | 68.2 | 38.7 | 12.9 | 24.6 | $3 in / $15 out |
| 110 | Gemini 2.0 Flash | gemini-2.0-flash | multimodal, vision, multi-input reasoning | Google | 33.3 | 33.3 | 93.9 | 0.0 | 0.0 | 82.5 | $0.1 in / $0.4 out |
| 111 | DeepSeek-V3 0324 | deepseek-v3-0324 | text, inference | DeepSeek | 32.8 | 32.8 | 39.8 | 0.0 | 0.0 | 57.7 | $0.28 in / $1.14 out |
| 112 | Claude Haiku 4.5 | claude-haiku-4-5-20251001 | multimodal, vision, multi-input reasoning | Anthropic | 32.7 | 32.7 | 61.8 | 54.2 | 56.6 | 37.7 | $1 in / $5 out |
| 113 | Qwen3.5-4B | qwen3.5-4b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 32.1 | 32.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 114 | MiniMax M2 | minimax-m2 | code, programming, tool use | MiniMax | 31.9 | 31.9 | 84.0 | 41.1 | 42.4 | 54.9 | $0.3 in / $1.2 out |
| 115 | Ministral 3 (8B Reasoning 2512) | ministral-8b-latest | multimodal, vision, multi-input reasoning | Mistral AI | 31.6 | 31.6 | 84.5 | 0.0 | 0.0 | 92.1 | $0.15 in / $0.15 out |
| 116 | GPT-4o | gpt-4o-2024-08-06 | multimodal, vision, multi-input reasoning | OpenAI | 31.5 | 31.5 | 46.7 | 14.9 | 4.3 | 26.8 | $2.5 in / $10 out |
| 117 | Phi 4 Reasoning Plus | phi-4-reasoning-plus | text, inference | Microsoft | 31.5 | 31.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 118 | Qwen3 235B A22B | qwen3-235b-a22b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 30.5 | 30.5 | 33.5 | 0.0 | 0.0 | 84.0 | $0.1 in / $0.1 out |
| 119 | Hermes 3 70B | hermes-3-70b | text, inference | Nous Research | 30.1 | 30.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 120 | Qwen3 Max | qwen3-max | code, programming, tool use | Alibaba Cloud / Qwen Team | 29.8 | 29.8 | 55.2 | 0.0 | 35.8 | 31.3 | $0.5 in / $5 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
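The Price column lists separate input and output rates in the form "$X in / $Y out". The page does not state the unit, so the sketch below assumes US dollars per 1,000,000 tokens, a common convention that is not confirmed here. It parses those price strings and estimates the cost of a single request by billing input and output tokens at their own rates; the helper names `parse_price` and `estimate_cost` are hypothetical, not part of any Skytells API.

```python
import re
from typing import Optional, Tuple

# Matches the leaderboard's price format, e.g. "$0.09 in / $0.45 out".
PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")

# ASSUMPTION: the page does not state a unit; this sketch treats listed
# prices as US dollars per 1,000,000 tokens.
TOKENS_PER_UNIT = 1_000_000


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None when the price is 'N/A'."""
    text = text.strip()
    if text == "N/A":
        return None
    match = PRICE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized price string: {text!r}")
    return float(match.group(1)), float(match.group(2))


def estimate_cost(price: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate one request's cost, billing input and output tokens separately."""
    parsed = parse_price(price)
    if parsed is None:
        return None  # no listed price, so no estimate
    in_price, out_price = parsed
    return (input_tokens * in_price + output_tokens * out_price) / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Prices copied from the listing above; token counts are arbitrary examples.
    for model, price in [
        ("GPT OSS 120B", "$0.09 in / $0.45 out"),
        ("Claude 3.5 Sonnet", "$3 in / $15 out"),
        ("Kimi-k1.5", "N/A"),
    ]:
        cost = estimate_cost(price, input_tokens=10_000, output_tokens=2_000)
        print(model, "no listed price" if cost is None else f"${cost:.4f}")
```

Keeping the two rates separate matters because output tokens are often priced several times higher than input tokens (for example $0.45 versus $0.09 for GPT OSS 120B), so the ratio of prompt length to response length changes which model is cheaper for a given workload.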