Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

Explore full leaderboard · Browse model catalog

294 tracked models · 27 providers · 251 benchmarked · 34.7 avg. index


294 models

| Rank | Model | ID | Tags | Provider | Overall | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 81 | Step3-VL-10B | step3-vl-10b | multimodal, vision, multi-input reasoning | StepFun | 47.4 | 47.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 82 | MiniMax M2.7 | minimax-m2.7 | code, programming, tool use | MiniMax | 47.3 | 0.0 | 52.8 | 50.8 | 35.9 | 55.0 | $0.3 in / $1.2 out |
| 83 | Mercury 2 | mercury-2 | code, programming, tool use | Inception | 47.2 | 44.6 | 72.5 | 0.0 | 22.3 | 69.2 | $0.25 in / $0.75 out |
| 84 | Llama 4 Maverick | llama-4-maverick | multimodal, vision, multi-input reasoning | Meta | 47.2 | 35.5 | 58.9 | 0.0 | 0.0 | 61.2 | $0.17 in / $0.6 out |
| 85 | Mistral Small 4 | mistral-small-latest | multimodal, vision, multi-input reasoning | Mistral AI | 47.0 | 34.8 | 55.9 | 0.0 | 0.0 | 66.9 | $0.15 in / $0.6 out |
| 86 | GLM-4.7 | glm-4.7 | multimodal, vision, multi-input reasoning | Zhipu AI | 46.8 | 63.2 | 52.8 | 28.2 | 44.5 | 40.6 | $0.6 in / $2.2 out |
| 87 | Gemini 2.0 Flash-Lite | gemini-2.0-flash-lite | multimodal, vision, multi-input reasoning | Google | 46.8 | 25.7 | 63.2 | 0.0 | 0.0 | 79.7 | $0.07 in / $0.3 out |
| 88 | Gemini 2.0 Flash Thinking | gemini-2.0-flash-thinking | multimodal, vision, multi-input reasoning | Google | 46.7 | 46.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 89 | GLM-5 | glm-5 | code, programming, tool use | Zhipu AI | 46.2 | 0.0 | 22.1 | 51.3 | 65.3 | 30.2 | $1 in / $3.2 out |
| 90 | Devstral Small 1.1 | devstral-small-2507 | code, programming, tool use | Mistral AI | 46.1 | 0.0 | 64.5 | 0.0 | 15.0 | 85.0 | $0.1 in / $0.3 out |
| 91 | DeepSeek-V3.2 | deepseek-v3.2 | code, programming, tool use | DeepSeek | 45.8 | 58.1 | 52.5 | 16.6 | 45.9 | 70.0 | $0.26 in / $0.38 out |
| 92 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 45.6 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 in / $1.2 out |
| 93 | o4-mini | o4-mini | multimodal, vision, multi-input reasoning | OpenAI | 45.5 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 in / $4.4 out |
| 94 | Claude Sonnet 4 | claude-sonnet-4-20250514 | multimodal, vision, multi-input reasoning | Anthropic | 44.9 | 41.0 | 0.0 | 49.4 | 44.9 | 0.0 | N/A |
| 95 | GPT-5.1 Codex | gpt-5.1-codex | multimodal, vision, multi-input reasoning | OpenAI | 44.9 | 0.0 | 48.6 | 0.0 | 51.2 | 25.1 | $1.25 in / $10 out |
| 96 | Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | multimodal, vision, multi-input reasoning | Google | 44.7 | 51.7 | 63.2 | 0.0 | 30.0 | 27.9 | $1.25 in / $10 out |
| 97 | Qwen3 VL 235B A22B Thinking | qwen3-vl-235b-a22b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.4 | 37.9 | 66.8 | 40.2 | 0.0 | 37.2 | $0.45 in / $3.49 out |
| 98 | Grok Code Fast 1 | grok-code-fast-1 | code, programming, tool use | xAI | 44.2 | 0.0 | 47.2 | 0.0 | 39.7 | 49.5 | $0.2 in / $1.5 out |
| 99 | GPT-5.4 Mini | gpt-5.4-mini | text, text-to-text, language | OpenAI | 44.1 | 57.4 | 77.4 | 27.1 | 26.9 | 32.8 | $0.75 in / $4.5 out |
| 100 | Qwen2.5-Coder 32B Instruct | qwen-2.5-coder-32b-instruct | text, inference | Alibaba Cloud / Qwen Team | 44.1 | 0.0 | 20.9 | 0.0 | 0.0 | 81.2 | $0.09 in / $0.09 out |
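The prices in the table can be turned into per-request cost estimates. As a minimal sketch, the helper below assumes the listed "in / out" figures are USD per million tokens (the page does not state the unit explicitly) and uses the deepseek-v3.2 row as an example; the function name is illustrative, not part of any Skytells API.

```python
# Hypothetical helper: estimate the USD cost of one request from the
# leaderboard's listed prices, ASSUMING they are USD per 1M tokens
# (e.g. "$0.26 in / $0.38 out" for deepseek-v3.2).

def request_cost(in_price: float, out_price: float,
                 in_tokens: int, out_tokens: int) -> float:
    """Cost in USD, with prices quoted per 1M tokens."""
    return (in_price * in_tokens + out_price * out_tokens) / 1_000_000

# A 2,000-token prompt with a 500-token completion on deepseek-v3.2:
cost = request_cost(0.26, 0.38, 2_000, 500)
print(f"${cost:.6f}")  # → $0.000710
```

Under this assumption, output tokens usually dominate the bill even at modest completion lengths, since most providers in the table price output 3–8× higher than input.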

Page 5 of 15 · 294 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Scores are updated continuously and may differ from individual third-party benchmarks.
