Skytells
Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

Explore full leaderboard · Browse model catalog

296 tracked models · 27 providers · 253 benchmarked · 34.7 avg. index


296 models

| Rank | Model | ID | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 181 | Nemotron 3 Super (120B A12B) | nemotron-3-super-120b-a12b | NVIDIA | code, programming, tool use | 29.1 | 48.3 | 0.0 | 8.7 | 26.8 | 0.0 | N/A |
| 182 | QwQ-32B | qwq-32b | Alibaba Cloud / Qwen Team | text, inference | 28.8 | 28.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 183 | Gemini 1.0 Pro | gemini-1.0-pro | Google | multimodal, vision, multi-input reasoning | 28.8 | 3.2 | 57.2 | 0.0 | 0.0 | 55.4 | $0.5 in / $1.5 out |
| 184 | Qwen3 VL 32B Instruct | qwen3-vl-32b-instruct | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 28.7 | 29.4 | 0.0 | 27.9 | 0.0 | 0.0 | N/A |
| 185 | GPT-4.1 mini | gpt-4.1-mini-2025-04-14 | OpenAI | multimodal, vision, multi-input reasoning | 28.7 | 20.7 | 90.6 | 8.9 | 2.6 | 56.8 | $0.4 in / $1.6 out |
| 186 | Mistral Small 3 24B Instruct | mistral-small-24b-instruct-2501 | Mistral AI | text, inference | 28.6 | 14.2 | 21.4 | 0.0 | 0.0 | 80.7 | $0.07 in / $0.14 out |
| 187 | o3-mini | o3-mini | OpenAI | code, programming, tool use | 28.1 | 25.6 | 70.4 | 11.9 | 12.2 | 41.6 | $1.1 in / $4.4 out |
| 188 | Qwen3 32B | qwen3-32b | Alibaba Cloud / Qwen Team | text, inference | 28.0 | 21.4 | 13.3 | 0.0 | 0.0 | 69.8 | $0.1 in / $0.3 out |
| 189 | GPT-4 Turbo | gpt-4-turbo-2024-04-09 | OpenAI | text, inference | 27.9 | 16.9 | 52.7 | 0.0 | 0.0 | 18.8 | $10 in / $30 out |
| 190 | o1 | o1-2024-12-17 | OpenAI | multimodal, vision, multi-input reasoning | 27.8 | 42.9 | 19.4 | 44.7 | 6.5 | 4.9 | $15 in / $60 out |
| 191 | MiniCPM-SALA | minicpm-sala | OpenBMB | text, inference | 27.5 | 27.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 192 | Kimi K2 Instruct | kimi-k2-instruct | Moonshot AI | code, programming, tool use | 27.3 | 24.4 | 46.1 | 14.8 | 15.3 | 62.1 | $0.5 in / $0.5 out |
| 193 | GPT-4.5 | gpt-4.5 | OpenAI | multimodal, vision, multi-input reasoning | 27.1 | 41.9 | 29.7 | 35.8 | 6.0 | 7.0 | $75 in / $150 out |
| 194 | Kimi K2 Base | kimi-k2-base | Moonshot AI | text, inference | 26.9 | 26.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 195 | o1-preview | o1-preview | OpenAI | code, programming, tool use | 26.6 | 41.8 | 33.0 | 0.0 | 9.5 | 11.8 | $15 in / $60 out |
| 196 | Sarvam-105B | sarvam-105b | Sarvam AI | code, programming, tool use | 25.7 | 42.9 | 0.0 | 17.9 | 12.1 | 0.0 | N/A |
| 197 | Gemma 3 12B | gemma-3-12b-it | Google | multimodal, vision, multi-input reasoning | 25.7 | 9.1 | 20.3 | 0.0 | 0.0 | 80.7 | $0.05 in / $0.1 out |
| 198 | Llama 3.1 70B Instruct | llama-3.1-70b-instruct | Meta | text, inference | 25.5 | 11.2 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
| 199 | Gemma 3 27B | gemma-3-27b-it | Google | multimodal, vision, multi-input reasoning | 25.3 | 10.7 | 20.3 | 0.0 | 0.0 | 73.9 | $0.1 in / $0.2 out |
| 200 | Phi 4 | phi-4 | Microsoft | text, inference | 25.1 | 15.6 | 9.0 | 0.0 | 0.0 | 77.2 | $0.07 in / $0.14 out |
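The Price column pairs an input rate with an output rate. Assuming these are USD per million tokens (a common convention, though the page does not state its units), a per-request cost can be sketched like this; the `request_cost` helper and the token counts are illustrative, not part of the site:

```python
# Hedged sketch: estimate one request's cost from "in / out" rates,
# ASSUMING the listed figures are USD per million tokens (the page
# does not state its units).

def request_cost(rate_in: float, rate_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Cost in USD for one request at the given per-million-token rates."""
    return rate_in * tokens_in / 1_000_000 + rate_out * tokens_out / 1_000_000

# o3-mini at $1.1 in / $4.4 out, for a 2,000-token prompt and a 500-token reply:
cost = request_cost(1.1, 4.4, 2_000, 500)
print(f"${cost:.4f}")  # → $0.0044
```

Under that assumption, the spread in the table is stark: the same request costs under half a cent on o3-mini but roughly 50x more on GPT-4.5 ($75 in / $150 out).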
Page 10 of 15 · 296 models


Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
