Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

  • Tracked models: 309
  • Providers: 27
  • Benchmarked: 264
  • Avg. index: 27.7

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | gpt-5.5 | multimodal, vision, multi-input reasoning | OpenAI | 80.4 | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 2 | Claude Mythos Preview | claude-mythos-preview | multimodal, vision, multi-input reasoning | Anthropic | 80.0 | 80.0 | 0.0 | 70.2 | 84.2 | 0.0 | N/A |
| 3 | Claude Opus 4.6 | claude-opus-4-6 | multimodal, vision, multi-input reasoning | Anthropic | 78.2 | 78.2 | 31.5 | 57.8 | 72.8 | 6.3 | $5 in / $25 out |
| 4 | Claude Opus 4.7 | claude-opus-4-7 | multimodal, vision, multi-input reasoning | Anthropic | 76.6 | 76.6 | 31.5 | 63.8 | 79.9 | 6.3 | $5 in / $25 out |
| 5 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 75.3 | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 6 | GPT-5.4 | gpt-5.4 | text, text-to-text, language | OpenAI | 75.3 | 75.3 | 38.9 | 56.2 | 60.6 | 14.1 | $2.5 in / $15 out |
| 7 | Claude Opus 4.8 | claude-opus-4-8 | multimodal, vision, multi-input reasoning | Anthropic | 75.2 | 75.2 | 31.5 | 80.0 | 82.0 | 6.3 | $5 in / $25 out |
| 8 | Gemini 3.1 Pro | gemini-3.1-pro-preview | multimodal, vision, multi-input reasoning | Google | 73.8 | 73.8 | 59.4 | 68.9 | 66.0 | 18.5 | $2.5 in / $15 out |
| 9 | Gemini 3 Pro | gemini-3-pro-preview | multimodal, vision, multi-input reasoning | Google | 72.0 | 72.0 | 0.0 | 60.7 | 54.6 | 0.0 | N/A |
| 10 | Grok-4 Heavy | grok-4-heavy | multimodal, vision, multi-input reasoning | xAI | 72.0 | 72.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 11 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 70.2 | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 12 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input reasoning | Google | 70.0 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 13 | Muse Spark | muse-spark | multimodal, vision, multi-input reasoning | Meta | 69.9 | 69.9 | 0.0 | 64.1 | 39.1 | 0.0 | N/A |
| 14 | Kimi K2-Thinking-0905 | kimi-k2-thinking-0905 | code, programming, tool use | Moonshot AI | 68.7 | 68.7 | 0.0 | 52.8 | 59.8 | 0.0 | N/A |
| 15 | GPT-5.1 High | gpt-5.1-high-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 68.3 | 68.3 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 16 | Seed 2.0 Pro | seed-2.0-pro | multimodal, vision, multi-input reasoning | ByteDance | 68.0 | 68.0 | 0.0 | 51.9 | 58.5 | 0.0 | N/A |
| 17 | GPT-5.5 Pro | gpt-5.5-pro | multimodal, vision, multi-input reasoning | OpenAI | 67.8 | 67.8 | 0.0 | 71.8 | 60.1 | 0.0 | N/A |
| 18 | DeepSeek-V4-Pro-Max | deepseek-v4-pro-max | code, programming, tool use | DeepSeek | 67.4 | 67.4 | 89.2 | 61.3 | 58.6 | 34.2 | $1.74 in / $3.48 out |
| 19 | Kimi K2.5 | kimi-k2.5 | multimodal, vision, multi-input reasoning | Moonshot AI | 67.2 | 67.2 | 0.0 | 47.3 | 44.6 | 0.0 | N/A |
| 20 | Kimi K2.6 | kimi-k2.6 | text, text-to-text, language | Moonshot AI | 67.0 | 67.0 | 41.1 | 57.6 | 75.4 | 36.7 | $0.95 in / $4 out |
Page 1 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
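
Price entries on this page take the form `$5 in / $30 out`, or `N/A` when no price is listed. For readers who want to work with those values programmatically, the following is a minimal Python sketch that parses such a string into a pair of numbers. The function names `parse_price` and `output_to_input_ratio` are hypothetical and are not part of any Skytells API. The page does not state the pricing unit, so the sketch returns bare numbers without assuming one.

```python
import re
from typing import Optional, Tuple

# Matches strings such as "$5 in / $30 out" or "$1.74 in / $3.48 out".
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None when the entry is "N/A".

    Hypothetical helper: the unit of the prices is not stated on the page,
    so the values are returned as plain floats.
    """
    text = text.strip()
    if text == "N/A":
        return None
    match = _PRICE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized price string: {text!r}")
    return float(match.group(1)), float(match.group(2))


def output_to_input_ratio(text: str) -> Optional[float]:
    """Return output price divided by input price, or None for "N/A"."""
    price = parse_price(text)
    if price is None:
        return None
    input_price, output_price = price
    return output_price / input_price
```

For example, `parse_price("$0.5 in / $3 out")` yields the pair `(0.5, 3.0)`, while `parse_price("N/A")` yields `None`, so models without a listed price can be filtered out before any cost comparison.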
