Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.


© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

Explore full leaderboard · Browse model catalog

296 tracked models · 27 providers · 253 benchmarked · 34.7 avg. index


296 models

| Rank | Model | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|------|-------|----------|------|-------|------------|-----------|---------|-------------|-------|-------|
| 241 | DeepSeek R1 Distill Llama 8B (`deepseek-r1-distill-llama-8b`) | DeepSeek | text · inference | 17.8 | 17.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 242 | Kimi K2-Instruct-0905 (`kimi-k2-instruct-0905`) | Moonshot AI | code · programming · tool use | 17.1 | 24.4 | 0.0 | 6.6 | 19.3 | 0.0 | N/A |
| 243 | Claude 3 Sonnet (`claude-3-sonnet-20240229`) | Anthropic | multimodal · vision · multi-input reasoning | 16.7 | 10.0 | 30.5 | 0.0 | 0.0 | 13.3 | $3 in / $15 out |
| 244 | Llama 3.1 Nemotron Nano 8B V1 (`llama-3.1-nemotron-nano-8b-v1`) | NVIDIA | text · inference | 16.3 | 16.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 245 | Qwen2.5 VL 72B Instruct (`qwen2.5-vl-72b`) | Alibaba Cloud / Qwen Team | multimodal · vision · multi-input reasoning | 16.0 | 24.9 | 0.0 | 5.7 | 0.0 | 0.0 | N/A |
| 246 | Mistral Large 3 (`mistral-large-3-2509`) | Mistral AI | multimodal · vision · multi-input reasoning | 16.0 | 9.6 | 18.8 | 0.0 | 0.0 | 29.1 | $2 in / $5 out |
| 247 | Mistral Small 3.1 24B Instruct (`mistral-small-3.1-24b-instruct-2503`) | Mistral AI | multimodal · vision · multi-input reasoning | 15.7 | 15.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 248 | Qwen2.5 14B Instruct (`qwen-2.5-14b-instruct`) | Alibaba Cloud / Qwen Team | text · inference | 14.6 | 14.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 249 | o3-pro (`o3-pro-2025-06-10`) | OpenAI | multimodal · vision · multi-input reasoning | 14.6 | 0.0 | 21.4 | 0.0 | 0.0 | 3.6 | $20 in / $80 out |
| 250 | Qwen3.5-2B (`qwen3.5-2b`) | Alibaba Cloud / Qwen Team | multimodal · vision · multi-input reasoning | 14.4 | 14.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 251 | Claude 3.5 Haiku (`claude-3-5-haiku-20241022`) | Anthropic | code · programming · tool use | 13.5 | 10.8 | 30.5 | 3.0 | 7.8 | 31.8 | $0.8 in / $4 out |
| 252 | Qwen2.5 VL 32B Instruct (`qwen2.5-vl-32b`) | Alibaba Cloud / Qwen Team | multimodal · vision · multi-input reasoning | 12.2 | 21.2 | 0.0 | 1.6 | 0.0 | 0.0 | N/A |
| 253 | Qwen2 72B Instruct (`qwen2-72b-instruct`) | Alibaba Cloud / Qwen Team | text · inference | 12.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 254 | Grok-1.5V (`grok-1.5v`) | xAI | multimodal · vision · multi-input reasoning | 9.8 | 9.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 255 | Gemma 4 E2B (`gemma-4-e2b-it`) | Google | multimodal · vision · multi-input reasoning | 9.5 | 9.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 256 | Qwen2-VL-72B-Instruct (`qwen2-vl-72b`) | Alibaba Cloud / Qwen Team | multimodal · vision · multi-input reasoning | 9.3 | 9.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 257 | Grok-1.5 (`grok-1.5`) | xAI | multimodal · vision · multi-input reasoning | 8.6 | 8.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 258 | Gemma 3n E4B Instructed (`gemma-3n-e4b-it`) | Google | multimodal · vision · multi-input reasoning | 8.6 | 1.3 | 20.3 | 0.0 | 0.0 | 10.3 | $20 in / $40 out |
| 259 | Phi-3.5-MoE-instruct (`phi-3.5-moe-instruct`) | Microsoft | multimodal · vision · multi-input reasoning | 8.2 | 8.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 260 | Qwen2.5-Omni-7B (`qwen2.5-omni-7b`) | Alibaba Cloud / Qwen Team | multimodal · vision · multi-input reasoning | 7.6 | 7.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
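The prices listed here follow the usual "in / out" convention for API models. As a minimal sketch of how to turn those figures into a per-request cost — assuming, as is standard for API pricing but not stated on this page, that they are USD per million tokens:

```python
def request_cost(price_in: float, price_out: float,
                 input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request under per-million-token pricing.

    Assumption: prices are USD per 1M tokens (the common API
    convention); the leaderboard does not state the unit.
    """
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# o3-pro is listed at $20 in / $80 out; a 2,000-token prompt with
# an 800-token completion would then cost:
cost = request_cost(20.0, 80.0, 2000, 800)
print(f"${cost:.3f}")  # → $0.104
```

Note how heavily the output price dominates here: the 800 completion tokens cost $0.064 of the $0.104 total, which is why "value" rankings can diverge sharply from raw benchmark rankings.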
Page 13 of 15 · 296 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
