Skytells
Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


296 tracked models · 27 providers · 253 benchmarked · 27.4 avg. index


| Rank | Model | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|------|-------|----------|------|-------|------------|-----------|---------|-------------|-------|-------|
| 141 | o3-mini (o3-mini) | OpenAI | code, programming, tool use | 25.6 | 25.6 | 70.4 | 11.9 | 12.2 | 41.6 | $1.1 in / $4.4 out |
| 142 | Qwen3 30B A3B (qwen3-30b-a3b) | Alibaba Cloud / Qwen Team | text, inference | 25.6 | 25.6 | 40.1 | 0.0 | 0.0 | 71.3 | $0.1 in / $0.44 out |
| 143 | Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | Anthropic | multimodal, vision, multi-input reasoning | 25.4 | 25.4 | 68.2 | 0.0 | 0.0 | 24.6 | $3 in / $15 out |
| 144 | Gemini 2.0 Flash-Lite (gemini-2.0-flash-lite) | Google | multimodal, vision, multi-input reasoning | 25.3 | 25.3 | 62.8 | 0.0 | 0.0 | 79.7 | $0.07 in / $0.3 out |
| 145 | Nemotron Nano 9B v2 (nvidia-nemotron-nano-9b-v2) | NVIDIA | text, inference | 24.9 | 24.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 146 | Qwen2.5 VL 72B Instruct (qwen2.5-vl-72b) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 24.9 | 24.9 | 0.0 | 5.7 | 0.0 | 0.0 | N/A |
| 147 | DeepSeek R1 Distill Qwen 14B (deepseek-r1-distill-qwen-14b) | DeepSeek | text, inference | 24.7 | 24.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 148 | ERNIE 4.5 (ernie-4.5) | Baidu | text, inference | 24.5 | 24.5 | 18.8 | 0.0 | 0.0 | 34.6 | $0.4 in / $4 out |
| 149 | LongCat-Flash-Lite (longcat-flash-lite) | Meituan | code, programming, tool use | 24.5 | 24.5 | 83.6 | 29.5 | 25.1 | 83.1 | $0.1 in / $0.4 out |
| 150 | Magistral Small 2506 (magistral-small-2506) | Mistral AI | text, inference | 24.5 | 24.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 151 | Kimi K2 Instruct (kimi-k2-instruct) | Moonshot AI | code, programming, tool use | 24.4 | 24.4 | 46.1 | 14.8 | 15.3 | 62.1 | $0.5 in / $0.5 out |
| 152 | Kimi K2-Instruct-0905 (kimi-k2-instruct-0905) | Moonshot AI | code, programming, tool use | 24.4 | 24.4 | 0.0 | 6.6 | 19.3 | 0.0 | N/A |
| 153 | MiniMax M1 80K (minimax-m1-80k) | MiniMax | code, programming, tool use | 24.2 | 24.2 | 84.0 | 20.9 | 19.0 | 41.8 | $0.55 in / $2.2 out |
| 154 | Grok-2 mini (grok-2-mini) | xAI | multimodal, vision, multi-input reasoning | 24.0 | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 155 | Phi 4 Reasoning (phi-4-reasoning) | Microsoft | text, inference | 23.1 | 23.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 156 | Gemini 1.5 Flash (gemini-1.5-flash) | Google | multimodal, vision, multi-input reasoning | 23.0 | 23.0 | 91.9 | 0.0 | 0.0 | 71.7 | $0.15 in / $0.6 out |
| 157 | Llama-3.3 Nemotron Super 49B v1 (llama-3.3-nemotron-super-49b-v1) | NVIDIA | text, inference | 23.0 | 23.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 158 | Qwen3 VL 4B Thinking (qwen3-vl-4b-thinking) | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 23.0 | 23.0 | 66.0 | 18.9 | 0.0 | 60.4 | $0.1 in / $1 out |
| 159 | MiniMax M1 40K (minimax-m1-40k) | MiniMax | code, programming, tool use | 22.6 | 22.6 | 0.0 | 26.8 | 18.1 | 0.0 | N/A |
| 160 | GPT-4o (gpt-4o-2024-05-13) | OpenAI | multimodal, vision, multi-input reasoning | 22.3 | 22.3 | 45.4 | 0.0 | 0.0 | 26.5 | $2.5 in / $10 out |
Page 8 of 15 · 296 models


Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
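The page does not publish how the five dimensions combine into the overall index. Purely as an illustration of the "multi-dimensional evaluation" idea, a weighted average over the five per-dimension scores could look like the following; the weights here are hypothetical, not Skytells' actual formula:

```python
# Hypothetical weights -- the real Skytells weighting is not published.
WEIGHTS = {
    "benchmarks": 0.35,
    "inference": 0.20,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.15,
}


def composite_index(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, rounded to one decimal.

    Missing dimensions count as 0.0, mirroring the zero-filled table rows.
    """
    total = sum(WEIGHTS.values())
    weighted = sum(WEIGHTS[k] * scores.get(k, 0.0) for k in WEIGHTS)
    return round(weighted / total, 1)
```

Note that any fixed weighting rewards all-rounders: a model scoring 50 on every axis beats one scoring 100 on benchmarks alone under the weights above, which is one reason a single index can disagree with individual third-party benchmarks.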
