Skytells
Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


Tracked models: 296
Providers: 27
Benchmarked: 253
Avg. index: 27.4


| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 81 | K-EXAONE-236B-A23B | k-exaone-236b-a23b | multimodal, vision, multi-input reasoning | LG AI Research | 43.4 | 43.4 | 24.9 | 0.0 | 0.0 | 49.2 | $0.6 in / $1 out |
| 82 | Gemma 4 26B-A4B | gemma-4-26b-a4b-it | multimodal, vision, multi-input reasoning | Google | 43.3 | 43.3 | 66.0 | 0.0 | 0.0 | 77.5 | $0.13 in / $0.4 out |
| 83 | o1 | o1-2024-12-17 | multimodal, vision, multi-input reasoning | OpenAI | 42.9 | 42.9 | 19.4 | 44.7 | 6.5 | 4.9 | $15 in / $60 out |
| 84 | Sarvam-105B | sarvam-105b | code, programming, tool use | Sarvam AI | 42.9 | 42.9 | 0.0 | 17.9 | 12.1 | 0.0 | N/A |
| 85 | Qwen3-235B-A22B-Instruct-2507 | qwen3-235b-a22b-instruct-2507 | text, inference | Alibaba Cloud / Qwen Team | 42.4 | 42.4 | 66.0 | 0.0 | 0.0 | 63.2 | $0.15 in / $0.8 out |
| 86 | MiniMax M2.1 | minimax-m2.1 | code, programming, tool use | MiniMax | 42.2 | 42.2 | 74.5 | 53.9 | 50.3 | 57.9 | $0.3 in / $1.2 out |
| 87 | GPT-4.5 | gpt-4.5 | multimodal, vision, multi-input reasoning | OpenAI | 41.9 | 41.9 | 29.7 | 35.8 | 6.0 | 7.0 | $75 in / $150 out |
| 88 | o1-preview | o1-preview | code, programming, tool use | OpenAI | 41.8 | 41.8 | 33.0 | 0.0 | 9.5 | 11.8 | $15 in / $60 out |
| 89 | GPT-5 mini | gpt-5-mini-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 41.5 | 41.5 | 89.4 | 0.0 | 23.7 | 56.3 | $0.25 in / $2 out |
| 90 | Claude Sonnet 4 | claude-sonnet-4-20250514 | multimodal, vision, multi-input reasoning | Anthropic | 40.9 | 40.9 | 0.0 | 49.4 | 44.3 | 0.0 | N/A |
| 91 | Gemini 2.5 Flash | gemini-2.5-flash | multimodal, vision, multi-input reasoning | Google | 39.6 | 39.6 | 62.8 | 0.0 | 22.9 | 42.6 | $0.3 in / $2.5 out |
| 92 | DeepSeek R1 Zero | deepseek-r1-zero | text, inference | DeepSeek | 39.4 | 39.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 93 | Qwen3.5-9B | qwen3.5-9b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.5 | 38.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 94 | DeepSeek-V3.1 | deepseek-v3.1 | code, programming, tool use | DeepSeek | 38.4 | 38.4 | 39.8 | 15.2 | 28.3 | 58.8 | $0.27 in / $1 out |
| 95 | GLM-4.7-Flash | glm-4.7-flash | code, programming, tool use | Zhipu AI | 38.2 | 38.2 | 29.7 | 11.4 | 20.7 | 72.1 | $0.07 in / $0.4 out |
| 96 | QvQ-72B-Preview | qvq-72b-preview | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 38.2 | 38.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 97 | Ministral 3 (14B Reasoning 2512) | ministral-14b-latest | multimodal, vision, multi-input reasoning | Mistral AI | 37.7 | 37.7 | 77.0 | 0.0 | 0.0 | 84.8 | $0.2 in / $0.2 out |
| 98 | Qwen3 VL 235B A22B Thinking | qwen3-vl-235b-a22b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 37.7 | 37.7 | 66.0 | 40.2 | 0.0 | 37.4 | $0.45 in / $3.49 out |
| 99 | Claude Opus 4 | claude-opus-4-20250514 | multimodal, vision, multi-input reasoning | Anthropic | 37.6 | 37.6 | 0.0 | 57.9 | 48.9 | 0.0 | N/A |
| 100 | Qwen3 VL 235B A22B Instruct | qwen3-vl-235b-a22b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 36.9 | 36.9 | 66.0 | 56.7 | 0.0 | 49.5 | $0.3 in / $1.5 out |

Page 5 of 15 · 296 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
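
The price entries above follow a fixed pattern ("$X in / $Y out", or "N/A" when no price is listed). For readers who want to compare costs programmatically, here is a minimal sketch of how such strings could be parsed and turned into a rough workload estimate. The function names `parse_price` and `estimate_cost` are hypothetical, and the page does not state the pricing unit: the sketch assumes USD per one million tokens, which should be confirmed before relying on the result.

```python
import re
from typing import Optional, Tuple

# Matches price strings in the form shown on this page, e.g. "$0.6 in / $1 out".
PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for "N/A" or unrecognized text."""
    match = PRICE_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def estimate_cost(
    price: Tuple[float, float],
    input_tokens: int,
    output_tokens: int,
    tokens_per_unit: int = 1_000_000,
) -> float:
    """Estimate the cost of a workload.

    ASSUMPTION: prices are quoted per `tokens_per_unit` tokens (one million
    by default). The leaderboard page itself does not state the unit.
    """
    input_price, output_price = price
    return (input_tokens * input_price + output_tokens * output_price) / tokens_per_unit


if __name__ == "__main__":
    for label in ["$0.6 in / $1 out", "$15 in / $60 out", "N/A"]:
        price = parse_price(label)
        if price is None:
            print(f"{label!r}: no price listed")
        else:
            cost = estimate_cost(price, input_tokens=1_000_000, output_tokens=500_000)
            print(f"{label!r}: estimated cost {cost:.2f}")
```

Returning `None` for "N/A" keeps unpriced models distinct from free ones, so they are not accidentally ranked as the cheapest option.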