Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.


© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


  • 296 tracked models
  • 27 providers
  • 253 benchmarked
  • 34.7 avg. index

296 models (page shows ranks 201–220)

| Rank | Model | ID | Tags | Provider | Score (overall) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 201 | Llama 3.1 8B Instruct | `llama-3.1-8b-instruct` | text, inference | Meta | 25.1 | 3.2 | 26.7 | 0.0 | 0.0 | 83.9 | $0.03 in / $0.03 out |
| 202 | Nemotron Nano 9B v2 | `nvidia-nemotron-nano-9b-v2` | text, inference | NVIDIA | 24.9 | 24.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 203 | Llama 3.1 405B Instruct | `llama-3.1-405b-instruct` | text, inference | Meta | 24.9 | 20.0 | 21.4 | 0.0 | 0.0 | 44.5 | $0.89 in / $0.89 out |
| 204 | DeepSeek R1 Distill Qwen 14B | `deepseek-r1-distill-qwen-14b` | text, inference | DeepSeek | 24.7 | 24.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 205 | ERNIE 4.5 | `ernie-4.5` | text, inference | Baidu | 24.7 | 24.5 | 18.8 | 0.0 | 0.0 | 34.6 | $0.4 in / $4 out |
| 206 | GLM-4.5-Air | `glm-4.5-air` | code, programming, tool use | Zhipu AI | 24.5 | 27.7 | 0.0 | 24.9 | 20.0 | 0.0 | N/A |
| 207 | Magistral Small 2506 | `magistral-small-2506` | text, inference | Mistral AI | 24.5 | 24.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 208 | Gemini 2.5 Flash-Lite | `gemini-2.5-flash-lite` | multimodal, vision, multi-input reasoning | Google | 24.2 | 21.4 | 32.8 | 0.0 | 3.5 | 64.1 | $0.1 in / $0.4 out |
| 209 | Qwen3-Next-80B-A3B-Instruct | `qwen3-next-80b-a3b-instruct` | text, inference | Alibaba Cloud / Qwen Team | 24.0 | 29.5 | 6.1 | 17.9 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 210 | Grok-2 mini | `grok-2-mini` | multimodal, vision, multi-input reasoning | xAI | 24.0 | 24.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 211 | Qwen2.5 72B Instruct | `qwen-2.5-72b-instruct` | text, inference | Alibaba Cloud / Qwen Team | 23.8 | 17.8 | 15.0 | 0.0 | 0.0 | 54.5 | $0.35 in / $0.4 out |
| 212 | GPT-4o mini | `gpt-4o-mini-2024-07-18` | multimodal, vision, multi-input reasoning | OpenAI | 23.6 | 14.8 | 45.4 | 0.0 | 0.0 | 65.1 | $0.15 in / $0.6 out |
| 213 | Gemma 3 4B | `gemma-3-4b-it` | multimodal, vision, multi-input reasoning | Google | 23.6 | 4.5 | 20.3 | 0.0 | 0.0 | 82.0 | $0.02 in / $0.04 out |
| 214 | GPT-4o | `gpt-4o-2024-08-06` | multimodal, vision, multi-input reasoning | OpenAI | 23.5 | 31.5 | 46.7 | 14.9 | 4.3 | 26.8 | $2.5 in / $10 out |
| 215 | Mistral Large 2 | `mistral-large-2-2407` | text, inference | Mistral AI | 23.5 | 0.0 | 21.4 | 0.0 | 0.0 | 26.7 | $2 in / $6 out |
| 216 | GPT-4 | `gpt-4-0613` | multimodal, vision, multi-input reasoning | OpenAI | 23.2 | 6.8 | 54.9 | 0.0 | 0.0 | 18.7 | $30 in / $60 out |
| 217 | Phi 4 Reasoning | `phi-4-reasoning` | text, inference | Microsoft | 23.1 | 23.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 218 | Llama-3.3 Nemotron Super 49B v1 | `llama-3.3-nemotron-super-49b-v1` | text, inference | NVIDIA | 23.0 | 23.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 219 | Phi-4-multimodal-instruct | `phi-4-multimodal-instruct` | multimodal, vision, multi-input reasoning | Microsoft | 23.0 | 8.8 | 12.3 | 0.0 | 0.0 | 79.8 | $0.05 in / $0.1 out |
| 220 | MiniMax M1 40K | `minimax-m1-40k` | code, programming, tool use | MiniMax | 22.6 | 22.6 | 0.0 | 26.8 | 18.1 | 0.0 | N/A |
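The "$X in / $Y out" prices can be turned into a per-request cost estimate. A minimal sketch, assuming the prices are quoted per million tokens (the common convention on model leaderboards; the page itself does not state the unit, so treat that as an assumption):

```python
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Dollar cost of one request, assuming prices are per million tokens."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# Example: Llama 3.1 8B Instruct at $0.03 in / $0.03 out,
# for a 2,000-token prompt and a 500-token completion.
cost = request_cost(0.03, 0.03, tokens_in=2_000, tokens_out=500)
print(f"${cost:.6f}")  # $0.000075
```

Under that assumption, the spread in the table is large: the same request costs roughly a thousand times more on GPT-4 ($30 in / $60 out) than on Gemma 3 4B ($0.02 in / $0.04 out).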
Page 11 of 15 · 296 models


Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
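The page does not publish the formula or weights behind the overall index, only that it combines the per-axis scores. As an illustration of how such a composite could be formed (the weights below are invented, not Skytells'), a weighted mean over the five axes:

```python
# Illustrative only: the leaderboard describes a multi-dimensional score
# but does not publish its formula; these weights are invented.
AXES = ("benchmarks", "inference", "agentic", "programming", "value")
WEIGHTS = {"benchmarks": 0.35, "inference": 0.20, "agentic": 0.15,
           "programming": 0.15, "value": 0.15}  # sums to 1.0

def overall_index(scores: dict[str, float]) -> float:
    """Weighted mean of the per-axis scores; missing axes count as 0.0."""
    return round(sum(WEIGHTS[a] * scores.get(a, 0.0) for a in AXES), 1)

# GPT-4o's per-axis scores from the table (the result differs from the
# published 23.5 precisely because the real weights are unknown):
print(overall_index({"benchmarks": 31.5, "inference": 46.7,
                     "agentic": 14.9, "programming": 4.3, "value": 26.8}))
```

Note how a model scored on only one axis (e.g. Phi 4 Reasoning, with zeros everywhere except Benchmarks) can still land mid-table under any scheme where unreported axes contribute zero.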
