Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.


© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

Explore full leaderboard · Browse model catalog

296 tracked models · 27 providers · 253 benchmarked · 27.4 avg. index


296 models

| Rank | Model | ID | Capabilities | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
| ---: | --- | --- | --- | --- | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| 181 | DeepSeek R1 Distill Llama 8B | deepseek-r1-distill-llama-8b | text, inference | DeepSeek | 17.8 | 17.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 182 | Qwen2.5 72B Instruct | qwen-2.5-72b-instruct | text, inference | Alibaba Cloud / Qwen Team | 17.8 | 17.8 | 15.0 | 0.0 | 0.0 | 54.5 | $0.35 in / $0.4 out |
| 183 | GPT-4 Turbo | gpt-4-turbo-2024-04-09 | text, inference | OpenAI | 16.9 | 16.9 | 52.7 | 0.0 | 0.0 | 18.8 | $10 in / $30 out |
| 184 | Llama 3.1 Nemotron Nano 8B V1 | llama-3.1-nemotron-nano-8b-v1 | text, inference | NVIDIA | 16.3 | 16.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 185 | Llama 3.2 90B Instruct | llama-3.2-90b-instruct | multimodal, vision, multi-input reasoning | Meta | 16.3 | 16.3 | 11.3 | 0.0 | 0.0 | 54.9 | $0.35 in / $0.4 out |
| 186 | Mistral Small 3.1 24B Instruct | mistral-small-3.1-24b-instruct-2503 | multimodal, vision, multi-input reasoning | Mistral AI | 15.7 | 15.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 187 | Phi 4 | phi-4 | text, inference | Microsoft | 15.6 | 15.6 | 9.0 | 0.0 | 0.0 | 77.2 | $0.07 in / $0.14 out |
| 188 | GPT-4o mini | gpt-4o-mini-2024-07-18 | multimodal, vision, multi-input reasoning | OpenAI | 14.8 | 14.8 | 45.4 | 0.0 | 0.0 | 65.1 | $0.15 in / $0.6 out |
| 189 | Qwen2.5 14B Instruct | qwen-2.5-14b-instruct | text, inference | Alibaba Cloud / Qwen Team | 14.6 | 14.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 190 | Qwen3.5-2B | qwen3.5-2b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 14.4 | 14.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 191 | Mistral Small 3 24B Instruct | mistral-small-24b-instruct-2501 | text, inference | Mistral AI | 14.2 | 14.2 | 21.4 | 0.0 | 0.0 | 80.7 | $0.07 in / $0.14 out |
| 192 | Nova Lite | nova-lite | multimodal, vision, multi-input reasoning | Amazon | 13.5 | 13.5 | 70.5 | 0.0 | 0.0 | 86.7 | $0.06 in / $0.24 out |
| 193 | Mistral Small 3.1 24B Base | mistral-small-3.1-24b-base-2503 | multimodal, vision, multi-input reasoning | Mistral AI | 13.4 | 13.4 | 64.8 | 0.0 | 0.0 | 85.3 | $0.1 in / $0.3 out |
| 194 | GPT-4.1 nano | gpt-4.1-nano-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 12.5 | 12.5 | 93.4 | 0.0 | 0.0 | 82.7 | $0.1 in / $0.4 out |
| 195 | Qwen2 72B Instruct | qwen2-72b-instruct | text, inference | Alibaba Cloud / Qwen Team | 12.0 | 12.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 196 | Llama 3.1 70B Instruct | llama-3.1-70b-instruct | text, inference | Meta | 11.2 | 11.2 | 21.4 | 0.0 | 0.0 | 72.2 | $0.2 in / $0.2 out |
| 197 | Claude 3.5 Haiku | claude-3-5-haiku-20241022 | code, programming, tool use | Anthropic | 10.8 | 10.8 | 30.5 | 3.0 | 7.8 | 31.8 | $0.8 in / $4 out |
| 198 | Gemma 3 27B | gemma-3-27b-it | multimodal, vision, multi-input reasoning | Google | 10.7 | 10.7 | 20.3 | 0.0 | 0.0 | 73.9 | $0.1 in / $0.2 out |
| 199 | Gemini 1.5 Flash 8B | gemini-1.5-flash-8b | multimodal, vision, multi-input reasoning | Google | 10.4 | 10.4 | 91.9 | 0.0 | 0.0 | 88.4 | $0.07 in / $0.3 out |
| 200 | Claude 3 Sonnet | claude-3-sonnet-20240229 | multimodal, vision, multi-input reasoning | Anthropic | 10.0 | 10.0 | 30.5 | 0.0 | 0.0 | 13.3 | $3 in / $15 out |
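The Price column lists separate input and output rates (e.g. "$0.35 in / $0.4 out"). The page does not state the unit; assuming the common convention of USD per one million tokens, a per-request cost estimate can be sketched as:

```python
def request_cost(rate_in: float, rate_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Estimate the USD cost of one request.

    rate_in / rate_out are the leaderboard's "in" / "out" prices.
    Assumption: rates are USD per 1M tokens (the page does not
    state the unit, but this matches common provider pricing).
    """
    return (rate_in * tokens_in + rate_out * tokens_out) / 1_000_000

# Example: Qwen2.5 72B Instruct at $0.35 in / $0.4 out,
# for a request with 10k input tokens and 2k output tokens.
cost = request_cost(0.35, 0.4, 10_000, 2_000)
print(f"${cost:.4f}")  # $0.0043
```

Because output rates are often several times the input rate (e.g. $10 in / $30 out for GPT-4 Turbo), long completions dominate the cost even when prompts are large.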

Page 10 of 15 · 296 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
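A multi-dimensional index of this kind is typically a weighted combination of per-dimension scores. As an illustration only — Skytells does not publish its weighting on this page, so the equal weights below are hypothetical and will not reproduce the listed scores — the general shape looks like:

```python
# Hypothetical equal weighting across the five leaderboard
# dimensions; the actual Skytells weighting is not published here.
WEIGHTS = {
    "benchmarks": 0.2,
    "inference": 0.2,
    "agentic": 0.2,
    "programming": 0.2,
    "value": 0.2,
}

def composite_index(scores: dict) -> float:
    """Weighted average of per-dimension scores; missing
    dimensions count as 0.0, mirroring the table's zeros."""
    return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

# Claude 3.5 Haiku's dimension scores from the table above:
haiku = {"benchmarks": 10.8, "inference": 30.5,
         "agentic": 3.0, "programming": 7.8, "value": 31.8}
print(round(composite_index(haiku), 2))  # 16.78
```

Note that the published overall scores track the Benchmarks dimension exactly, so whatever weighting Skytells actually uses, it is clearly not an equal-weight average.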
