Skytells
  • Home
  • Models
  • CLI
  • Changelog
Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

Get Started

  • Console
  • Learn
  • Documentation
  • API Reference
  • Pricing
  • Models (New)

Platform

  • Cloud Agents (New)
  • AI Solutions
  • Infrastructure
  • Edge Network
  • Trust Center
  • CLI

Resources

  • Blog
  • Changelog
  • AI Leaderboard
  • Research
  • Status

Company

  • About
  • Careers
  • Legal
  • Privacy Policy

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

Explore full leaderboard · Browse model catalog

  • 296 tracked models
  • 27 providers
  • 253 benchmarked
  • 27.4 avg. index


296 models

| Rank | Model | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|------|-------|----------|------|-------|------------|-----------|---------|-------------|-------|------------------|
| 221 | Claude 3 Haiku (`claude-3-haiku-20240307`) | Anthropic | multimodal, vision, multi-input reasoning | 5.8 | 5.8 | 61.8 | 0.0 | 0.0 | 57.9 | $0.25 / $1.25 |
| 222 | Llama 3.2 3B Instruct (`llama-3.2-3b-instruct`) | Meta | text, inference | 5.2 | 5.2 | 68.9 | 0.0 | 0.0 | 98.8 | $0.01 / $0.02 |
| 223 | Jamba 1.5 Mini (`jamba-1.5-mini`) | AI21 Labs | text, inference | 4.7 | 4.7 | 65.8 | 0.0 | 0.0 | 72.4 | $0.20 / $0.40 |
| 224 | DeepSeek VL2 Small (`deepseek-vl2-small`) | DeepSeek | multimodal, vision, multi-input reasoning | 4.6 | 4.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 225 | Gemma 3 4B (`gemma-3-4b-it`) | Google | multimodal, vision, multi-input reasoning | 4.5 | 4.5 | 20.3 | 0.0 | 0.0 | 82.0 | $0.02 / $0.04 |
| 226 | GPT-5.1 Codex Mini (`gpt-5.1-codex-mini`) | OpenAI | multimodal, vision, multi-input reasoning | 4.0 | 4.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 227 | Llama 3.2 11B Instruct (`llama-3.2-11b-instruct`) | Meta | multimodal, vision, multi-input reasoning | 4.0 | 4.0 | 60.3 | 0.0 | 0.0 | 94.9 | $0.05 / $0.05 |
| 228 | Gemini 1.0 Pro (`gemini-1.0-pro`) | Google | multimodal, vision, multi-input reasoning | 3.2 | 3.2 | 57.2 | 0.0 | 0.0 | 55.4 | $0.50 / $1.50 |
| 229 | Llama 3.1 8B Instruct (`llama-3.1-8b-instruct`) | Meta | text, inference | 3.2 | 3.2 | 26.7 | 0.0 | 0.0 | 83.9 | $0.03 / $0.03 |
| 230 | Phi-3.5-mini-instruct (`phi-3.5-mini-instruct`) | Microsoft | multimodal, vision, multi-input reasoning | 2.7 | 2.7 | 10.8 | 0.0 | 0.0 | 77.2 | $0.10 / $0.10 |
| 231 | GPT-3.5 Turbo (`gpt-3.5-turbo-0125`) | OpenAI | multimodal, vision, multi-input reasoning | 2.5 | 2.5 | 36.7 | 0.0 | 0.0 | 49.4 | $0.50 / $1.50 |
| 232 | Qwen2 7B Instruct (`qwen2-7b-instruct`) | Alibaba Cloud / Qwen Team | text, inference | 2.4 | 2.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 233 | Phi-3.5-vision-instruct (`phi-3.5-vision-instruct`) | Microsoft | multimodal, vision, multi-input reasoning | 2.3 | 2.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 234 | Phi 4 Mini (`phi-4-mini`) | Microsoft | text, inference | 2.0 | 2.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 235 | Gemma 3n E4B Instructed (`gemma-3n-e4b-it`) | Google | multimodal, vision, multi-input reasoning | 1.3 | 1.3 | 20.3 | 0.0 | 0.0 | 10.3 | $20 / $40 |
| 236 | Gemma 3n E4B Instructed LiteRT Preview (`gemma-3n-e4b-it-litert-preview`) | Google | multimodal, vision, multi-input reasoning | 1.3 | 1.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 237 | DeepSeek VL2 Tiny (`deepseek-vl2-tiny`) | DeepSeek | multimodal, vision, multi-input reasoning | 1.2 | 1.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 238 | Gemma 3n E2B Instructed (`gemma-3n-e2b-it`) | Google | multimodal, vision, multi-input reasoning | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 239 | Gemma 3n E2B Instructed LiteRT (Preview) (`gemma-3n-e2b-it-litert-preview`) | Google | multimodal, vision, multi-input reasoning | 1.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 240 | Gemma 3 1B (`gemma-3-1b-it`) | Google | text, inference | 0.9 | 0.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
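The Price column lists separate input and output rates. Assuming these follow the common per-million-token convention (the page itself does not state the unit), the cost of a single request can be estimated as a sketch like the following; `request_cost` is a hypothetical helper, not part of any Skytells API.

```python
# Illustrative only: assumes the "in / out" prices are USD per million
# tokens, a common convention that the table does not confirm.
def request_cost(in_price: float, out_price: float,
                 in_tokens: int, out_tokens: int) -> float:
    """Estimate the USD cost of one request from per-million-token rates."""
    return in_price * in_tokens / 1e6 + out_price * out_tokens / 1e6

# Claude 3 Haiku at $0.25 in / $1.25 out:
# a 10,000-token prompt with a 2,000-token completion
cost = request_cost(0.25, 1.25, 10_000, 2_000)
print(f"${cost:.4f}")  # → $0.0050
```

Under this assumption, input and output tokens are priced independently, which is why models with symmetric rates (e.g. $0.05 / $0.05) cost the same regardless of the prompt/completion split.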
Page 12 of 15 · 296 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
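The note above says the overall score combines several evaluation dimensions, but the page does not publish the formula. As a purely illustrative sketch, a multi-dimensional index like this could be a weighted composite; the weights and the 0–100 per-dimension scales below are invented for illustration and are not Skytells' actual method.

```python
# Hypothetical weights -- chosen for illustration only, not Skytells' formula.
WEIGHTS = {"benchmarks": 0.4, "inference": 0.2, "agentic": 0.15,
           "programming": 0.15, "value": 0.1}

def composite_score(dims: dict) -> float:
    """Weighted average of per-dimension scores (assumed 0-100 scales),
    rescaled to the 0-10 range the leaderboard displays.
    Missing dimensions count as 0, mirroring the 0.0 cells in the table."""
    total = sum(WEIGHTS[k] * dims.get(k, 0.0) for k in WEIGHTS)
    return round(total / 10, 1)

# Made-up dimension values in the table's style
print(composite_score({"benchmarks": 61.8, "value": 57.9}))
```

A scheme like this would explain why models with 0.0 agentic and programming cells still earn a nonzero overall score: the benchmark and value dimensions alone carry weight.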
