Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


  • 296 tracked models
  • 27 providers
  • 253 benchmarked
  • 27.4 avg. index

296 models · showing ranks 201–220

| Rank | Model | ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 201 | Grok-1.5V | grok-1.5v | multimodal, vision, multi-input reasoning | xAI | 9.8 | 9.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 202 | Qwen3 VL 8B Instruct | qwen3-vl-8b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 9.8 | 9.8 | 66.0 | 26.7 | 0.0 | 75.3 | $0.08 in / $0.5 out |
| 203 | Mistral Large 3 | mistral-large-3-2509 | multimodal, vision, multi-input reasoning | Mistral AI | 9.6 | 9.6 | 18.8 | 0.0 | 0.0 | 29.1 | $2 in / $5 out |
| 204 | Qwen2.5 VL 7B Instruct | qwen2.5-vl-7b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 9.6 | 9.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 205 | Gemma 4 E2B | gemma-4-e2b-it | multimodal, vision, multi-input reasoning | Google | 9.5 | 9.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 206 | Qwen2-VL-72B-Instruct | qwen2-vl-72b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 9.3 | 9.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 207 | Gemma 3 12B | gemma-3-12b-it | multimodal, vision, multi-input reasoning | Google | 9.1 | 9.1 | 20.3 | 0.0 | 0.0 | 80.7 | $0.05 in / $0.1 out |
| 208 | Nova Micro | nova-micro | text, inference | Amazon | 9.1 | 9.1 | 52.7 | 0.0 | 0.0 | 91.3 | $0.03 in / $0.14 out |
| 209 | Phi-4-multimodal-instruct | phi-4-multimodal-instruct | multimodal, vision, multi-input reasoning | Microsoft | 8.8 | 8.8 | 12.3 | 0.0 | 0.0 | 79.8 | $0.05 in / $0.1 out |
| 210 | Grok-1.5 | grok-1.5 | multimodal, vision, multi-input reasoning | xAI | 8.6 | 8.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 211 | Phi-3.5-MoE-instruct | phi-3.5-moe-instruct | multimodal, vision, multi-input reasoning | Microsoft | 8.2 | 8.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 212 | Jamba 1.5 Large | jamba-1.5-large | text, inference | AI21 Labs | 8.1 | 8.1 | 33.6 | 0.0 | 0.0 | 25.2 | $2 in / $8 out |
| 213 | Pixtral-12B | pixtral-12b-2409 | multimodal, vision, multi-input reasoning | Mistral AI | 8.1 | 8.1 | 7.0 | 0.0 | 0.0 | 73.0 | $0.15 in / $0.15 out |
| 214 | Qwen2.5-Omni-7B | qwen2.5-omni-7b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 7.6 | 7.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 215 | Qwen2.5 7B Instruct | qwen-2.5-7b-instruct | text, inference | Alibaba Cloud / Qwen Team | 7.4 | 7.4 | 71.1 | 0.0 | 0.0 | 77.2 | $0.3 in / $0.3 out |
| 216 | Gemini Diffusion | gemini-diffusion | code, programming, tool use | Google | 7.0 | 7.0 | 0.0 | 0.0 | 1.7 | 0.0 | N/A |
| 217 | DeepSeek VL2 | deepseek-vl2 | multimodal, vision, multi-input reasoning | DeepSeek | 6.9 | 6.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 218 | GPT-4 | gpt-4-0613 | multimodal, vision, multi-input reasoning | OpenAI | 6.8 | 6.8 | 54.9 | 0.0 | 0.0 | 18.7 | $30 in / $60 out |
| 219 | Mistral Small 3 24B Base | mistral-small-24b-base-2501 | multimodal, vision, multi-input reasoning | Mistral AI | 6.4 | 6.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 220 | DeepSeek R1 Distill Qwen 1.5B | deepseek-r1-distill-qwen-1.5b | text, inference | DeepSeek | 6.1 | 6.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
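The prices above follow the "$X in / $Y out" pattern. The page does not state the unit, but this notation conventionally means dollars per million input and output tokens; under that assumption, per-request cost is a simple weighted sum. A minimal sketch (the `request_cost` helper is hypothetical, not part of the Skytells platform):

```python
def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """USD cost of one request, assuming prices are quoted per million tokens."""
    return (price_in * tokens_in + price_out * tokens_out) / 1_000_000

# Example: GPT-4 at $30 in / $60 out, with a 1,000-token prompt
# and a 500-token completion.
cost = request_cost(30.0, 60.0, 1_000, 500)
print(f"${cost:.3f}")  # → $0.060
```

At the other end of the table, the same request on Nova Micro ($0.03 in / $0.14 out) would cost about $0.0001, which is what the high Value scores for the cheap models reflect.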
Page 11 of 15 · 296 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
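The aggregation formula behind the composite Score is not published; note that for the rows on this page the Score matches the Benchmarks dimension, so the true weighting is unknown. Purely as an illustration of how such a multi-dimensional index can be built, here is a weighted-mean sketch with made-up equal weights (both the weights and the function are assumptions, not Skytells' method):

```python
# Hypothetical equal weights over the five leaderboard dimensions.
WEIGHTS = {
    "benchmarks": 0.2, "inference": 0.2, "agentic": 0.2,
    "programming": 0.2, "value": 0.2,
}

def composite_score(scores: dict) -> float:
    """Weighted mean of per-dimension scores; missing dimensions count as 0."""
    return sum(WEIGHTS[dim] * scores.get(dim, 0.0) for dim in WEIGHTS)

# Nova Micro's figures from the leaderboard:
nova = {"benchmarks": 9.1, "inference": 52.7, "agentic": 0.0,
        "programming": 0.0, "value": 91.3}
print(round(composite_score(nova), 1))  # → 30.6
```

Under equal weights Nova Micro would index at 30.6 rather than its listed 9.1, which illustrates how strongly the choice of weights drives the ranking.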
