Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model is ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.

  • Tracked models: 296
  • Providers: 27
  • Benchmarked: 253
  • Avg. index: 34.7

| Rank | Model | Model ID | Tags | Provider | Score (overall) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 101 | Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | multimodal, vision, multi-input reasoning | Google | 44.2 | 51.2 | 62.8 | 0.0 | 29.3 | 27.6 | $1.25 in / $10 out |
| 102 | Qwen3 VL 235B A22B Thinking | qwen3-vl-235b-a22b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.2 | 37.7 | 66.0 | 40.2 | 0.0 | 37.4 | $0.45 in / $3.49 out |
| 103 | Nova Lite | nova-lite | multimodal, vision, multi-input reasoning | Amazon | 44.0 | 13.5 | 70.5 | 0.0 | 0.0 | 86.7 | $0.06 in / $0.24 out |
| 104 | Grok Code Fast 1 | grok-code-fast-1 | code, programming, tool use | xAI | 44.0 | 0.0 | 47.7 | 0.0 | 38.8 | 49.7 | $0.2 in / $1.5 out |
| 105 | Devstral Medium | devstral-medium-2507 | code, programming, tool use | Mistral AI | 43.8 | 0.0 | 64.8 | 0.0 | 24.2 | 53.4 | $0.4 in / $2 out |
| 106 | Qwen3-Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct | code, programming, tool use | Alibaba Cloud / Qwen Team | 43.6 | 0.0 | 0.0 | 50.7 | 35.8 | 0.0 | N/A |
| 107 | Qwen3-235B-A22B-Thinking-2507 | qwen3-235b-a22b-thinking-2507 | text, inference | Alibaba Cloud / Qwen Team | 43.5 | 46.4 | 66.0 | 26.8 | 0.0 | 39.6 | $0.3 in / $3 out |
| 108 | GPT-5.4 Mini | gpt-5.4-mini | text, text-to-text, language | OpenAI | 43.3 | 56.8 | 76.5 | 23.8 | 28.1 | 32.4 | $0.75 in / $4.5 out |
| 109 | Mistral NeMo Instruct | mistral-nemo-instruct-2407 | text, inference | Mistral AI | 42.9 | 0.0 | 21.4 | 0.0 | 0.0 | 77.3 | $0.15 in / $0.15 out |
| 110 | GPT-5.3 Chat | gpt-5.3-chat-latest | multimodal, vision, multi-input reasoning | OpenAI | 42.6 | 0.0 | 52.7 | 0.0 | 0.0 | 26.5 | $1.75 in / $14 out |
| 111 | LongCat-Flash-Chat | longcat-flash-chat | code, programming, tool use | Meituan | 42.4 | 27.9 | 52.7 | 49.2 | 39.1 | 57.9 | $0.3 in / $1.2 out |
| 112 | Mistral Small 3.1 24B Base | mistral-small-3.1-24b-base-2503 | multimodal, vision, multi-input reasoning | Mistral AI | 42.0 | 13.4 | 64.8 | 0.0 | 0.0 | 85.3 | $0.1 in / $0.3 out |
| 113 | GLM-4.6 | glm-4.6 | multimodal, vision, multi-input reasoning | Zhipu AI | 41.8 | 46.5 | 34.5 | 37.3 | 45.7 | 42.9 | $0.55 in / $2.19 out |
| 114 | Llama 3.2 3B Instruct | llama-3.2-3b-instruct | text, inference | Meta | 41.4 | 5.2 | 68.9 | 0.0 | 0.0 | 98.8 | $0.01 in / $0.02 out |
| 115 | Qwen3 235B A22B | qwen3-235b-a22b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 41.3 | 30.5 | 33.5 | 0.0 | 0.0 | 84.0 | $0.1 in / $0.1 out |
| 116 | Command R+ | command-r-plus-04-2024 | text, inference | Cohere | 41.3 | 0.0 | 32.5 | 0.0 | 0.0 | 55.4 | $0.25 in / $1 out |
| 117 | LongCat-Flash-Lite | longcat-flash-lite | code, programming, tool use | Meituan | 41.1 | 24.5 | 83.6 | 29.5 | 25.1 | 83.1 | $0.1 in / $0.4 out |
| 118 | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | code, programming, tool use | DeepSeek | 41.0 | 52.3 | 0.0 | 28.6 | 40.1 | 0.0 | N/A |
| 119 | GPT-5.2 Codex | gpt-5.2-codex | multimodal, vision, multi-input reasoning | OpenAI | 40.6 | 0.0 | 49.0 | 0.0 | 44.1 | 19.6 | $1.75 in / $14 out |
| 120 | Gemini 2.5 Pro | gemini-2.5-pro | multimodal, vision, multi-input reasoning | Google | 40.4 | 44.2 | 62.8 | 0.0 | 25.0 | 27.6 | $1.25 in / $10 out |

Page 6 of 15 · 296 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
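
Prices in this leaderboard appear as strings such as `$1.25 in / $10 out`, with `N/A` where no price is listed. For readers who want to compare prices numerically, a minimal Python sketch for parsing those strings might look like the following. The names `Price`, `parse_price`, and `output_to_input_ratio` are hypothetical helpers written for illustration, not part of any Skytells API. The page does not state the billing unit, so the sketch treats the values as plain dollar amounts.

```python
import re
from typing import NamedTuple, Optional


class Price(NamedTuple):
    """Input and output prices in dollars (billing unit not stated on the page)."""
    input_usd: float
    output_usd: float


# Matches the format seen in the leaderboard, e.g. "$0.55 in / $2.19 out".
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(text: str) -> Optional[Price]:
    """Parse a price string; return None for "N/A" or any unrecognized text."""
    match = _PRICE_RE.match(text.strip())
    if match is None:
        return None
    return Price(float(match.group(1)), float(match.group(2)))


def output_to_input_ratio(price: Price) -> float:
    """How many times more expensive output is than input (hypothetical metric)."""
    return price.output_usd / price.input_usd


if __name__ == "__main__":
    for raw in ["$1.25 in / $10 out", "$0.15 in / $0.15 out", "N/A"]:
        parsed = parse_price(raw)
        if parsed is None:
            print(f"{raw!r}: no price listed")
        else:
            print(f"{raw!r}: {parsed}, ratio {output_to_input_ratio(parsed):.2f}")
```

Returning `None` for `N/A` keeps unpriced models distinguishable from models with a price of zero, which matters if the parsed values feed a sort or an average.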