Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, updated continuously from published evaluation data.

  • 309 tracked models
  • 27 providers
  • 264 benchmarked
  • 29.3 average index


| Rank | Model | Model ID | Tags | Provider | Score (overall) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | GPT-5.1 Thinking | gpt-5.1-thinking-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 61.1 | 65.4 | 0.0 | 0.0 | 55.7 | 0.0 | N/A |
| 22 | Kimi K2-Thinking-0905 | kimi-k2-thinking-0905 | code, programming, tool use | Moonshot AI | 60.9 | 68.7 | 0.0 | 52.8 | 59.8 | 0.0 | N/A |
| 23 | GPT-5.1 Codex High | gpt-5.1-codex-high | multimodal, vision, multi-input reasoning | OpenAI | 60.9 | 60.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 24 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 60.7 | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 25 | Grok 4.3 | grok-4.3 | text, inference | xAI | 60.5 | 0.0 | 72.2 | 0.0 | 0.0 | 41.8 | $1.25 in / $2.5 out |
| 26 | Claude Opus 4.7 | claude-opus-4-7 | multimodal, vision, multi-input reasoning | Anthropic | 60.0 | 76.6 | 31.5 | 63.8 | 79.9 | 6.3 | $5 in / $25 out |
| 27 | Seed 2.0 Pro | seed-2.0-pro | multimodal, vision, multi-input reasoning | ByteDance | 60.0 | 68.0 | 0.0 | 51.9 | 58.5 | 0.0 | N/A |
| 28 | GPT-5.2 Pro | gpt-5.2-pro-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 60.0 | 65.5 | 0.0 | 53.4 | 0.0 | 0.0 | N/A |
| 29 | MiniMax M2.5 | minimax-m2.5 | code, programming, tool use | MiniMax | 59.8 | 0.0 | 72.2 | 50.4 | 56.9 | 68.6 | $0.3 in / $1.2 out |
| 30 | Kimi K2.6 | kimi-k2.6 | text, text-to-text, language | Moonshot AI | 59.4 | 67.0 | 41.1 | 57.6 | 75.4 | 36.7 | $0.95 in / $4 out |
| 31 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 59.2 | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 32 | Gemini 3.5 Flash | gemini-3.5-flash | multimodal, vision, multi-input reasoning | Google | 59.1 | 62.8 | 89.2 | 74.4 | 30.5 | 26.6 | $1.5 in / $9 out |
| 33 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input reasoning | Google | 58.9 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 34 | Muse Spark | muse-spark | multimodal, vision, multi-input reasoning | Meta | 58.9 | 69.9 | 0.0 | 64.1 | 39.1 | 0.0 | N/A |
| 35 | GPT-5.1 | gpt-5.1-2025-11-13 | multimodal, vision, multi-input reasoning | OpenAI | 58.7 | 65.4 | 66.9 | 0.0 | 55.7 | 33.2 | $1.25 in / $10 out |
| 36 | GPT-5.1 Instant | gpt-5.1-instant-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 58.7 | 65.4 | 66.9 | 0.0 | 55.7 | 33.2 | $1.25 in / $10 out |
| 37 | ERNIE 5.0 | ernie-5.0 | multimodal, vision, multi-input reasoning | Baidu | 58.2 | 58.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 38 | Step-3.5-Flash | step-3.5-flash | code, programming, tool use | StepFun | 57.9 | 62.8 | 60.4 | 42.0 | 50.6 | 95.0 | $0.1 in / $0.4 out |
| 39 | Claude Opus 4.1 | claude-opus-4-1-20250805 | multimodal, vision, multi-input reasoning | Anthropic | 57.9 | 46.4 | 0.0 | 67.4 | 62.0 | 0.0 | N/A |
| 40 | Claude Opus 4.6 | claude-opus-4-6 | multimodal, vision, multi-input reasoning | Anthropic | 57.4 | 78.2 | 31.5 | 57.8 | 72.8 | 6.3 | $5 in / $25 out |

Page 2 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.