Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.


© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


  • Tracked models: 309
  • Providers: 27
  • Benchmarked: 264
  • Avg. index: 13.9

Ranking views: Overall, Benchmarks, Inference, Agentic, Programming, Value / Price

Top 20 of 309 models, Programming view (page 1 of 16). The Score column shows each model's Programming score.

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude Mythos Preview | claude-mythos-preview | multimodal, vision, multi-input reasoning | Anthropic | 84.2 | 80.0 | 0.0 | 70.2 | 84.2 | 0.0 | N/A |
| 2 | Claude Opus 4.8 | claude-opus-4-8 | multimodal, vision, multi-input reasoning | Anthropic | 82.0 | 75.2 | 31.5 | 80.0 | 82.0 | 6.3 | $5 in / $25 out |
| 3 | Qwen3.7 Max | qwen3.7-max | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 81.5 | 66.1 | 72.2 | 61.7 | 81.5 | 35.4 | $1.25 in / $3.75 out |
| 4 | Claude Opus 4.7 | claude-opus-4-7 | multimodal, vision, multi-input reasoning | Anthropic | 79.9 | 76.6 | 31.5 | 63.8 | 79.9 | 6.3 | $5 in / $25 out |
| 5 | Kimi K2.6 | kimi-k2.6 | text, text-to-text, language | Moonshot AI | 75.4 | 67.0 | 41.1 | 57.6 | 75.4 | 36.7 | $0.95 in / $4 out |
| 6 | Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | multimodal, vision, multi-input reasoning | Anthropic | 74.6 | 51.9 | 14.6 | 71.8 | 74.6 | 9.3 | $3 in / $15 out |
| 7 | MiniMax M3 | minimax-m3 | multimodal, vision, multi-input reasoning | MiniMax | 74.3 | 54.6 | 72.2 | 38.7 | 74.3 | 48.1 | $0.6 in / $2.4 out |
| 8 | Claude Opus 4.5 | claude-opus-4-5-20251101 | multimodal, vision, multi-input reasoning | Anthropic | 73.5 | 55.3 | 0.0 | 41.4 | 73.5 | 0.0 | N/A |
| 9 | Claude Opus 4.6 | claude-opus-4-6 | multimodal, vision, multi-input reasoning | Anthropic | 72.8 | 78.2 | 31.5 | 57.8 | 72.8 | 6.3 | $5 in / $25 out |
| 10 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 70.7 | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 11 | Claude Sonnet 4.6 | claude-sonnet-4-6 | multimodal, vision, multi-input reasoning | Anthropic | 66.4 | 64.7 | 14.6 | 47.6 | 66.4 | 9.3 | $3 in / $15 out |
| 12 | Gemini 3.1 Pro | gemini-3.1-pro-preview | multimodal, vision, multi-input reasoning | Google | 66.0 | 73.8 | 59.4 | 68.9 | 66.0 | 18.5 | $2.5 in / $15 out |
| 13 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input reasoning | Google | 63.7 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 14 | MiMo-V2-Pro | mimo-v2-pro | code, programming, tool use | Xiaomi | 63.7 | 0.0 | 0.0 | 0.0 | 63.7 | 0.0 | N/A |
| 15 | GLM-5 | glm-5 | code, programming, tool use | Zhipu AI | 62.5 | 0.0 | 8.7 | 43.6 | 62.5 | 31.8 | $1 in / $3.2 out |
| 16 | Claude Opus 4.1 | claude-opus-4-1-20250805 | multimodal, vision, multi-input reasoning | Anthropic | 62.0 | 46.4 | 0.0 | 67.4 | 62.0 | 0.0 | N/A |
| 17 | Mistral Medium 3.5 | mistral-medium-3-5 | multimodal, vision, multi-input reasoning | Mistral AI | 61.7 | 34.9 | 28.5 | 16.8 | 61.7 | 29.1 | $1.5 in / $7.5 out |
| 18 | GPT-5.5 | gpt-5.5 | multimodal, vision, multi-input reasoning | OpenAI | 61.6 | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 19 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 61.0 | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 20 | GPT-5.4 | gpt-5.4 | text, text-to-text, language | OpenAI | 60.6 | 75.3 | 38.9 | 56.2 | 60.6 | 14.1 | $2.5 in / $15 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
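The Price column lists separate input and output rates in the form "$X in / $Y out". The leaderboard does not state the unit, so the sketch below assumes USD per one million tokens, a common provider convention; check each provider's own pricing page before relying on it. The helper names `parse_price` and `estimate_cost` are hypothetical and not part of any Skytells API.

```python
import re
from typing import Optional, Tuple

# Matches price strings as shown in the table, e.g. "$5 in / $25 out" or "$0.6 in / $2.4 out".
PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")

# ASSUMPTION: listed prices are USD per one million tokens (unit not stated on the page).
TOKENS_PER_UNIT = 1_000_000


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None for "N/A" or unrecognized text."""
    match = PRICE_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def estimate_cost(text: str, input_tokens: int, output_tokens: int) -> Optional[float]:
    """Estimate the USD cost of one request under the per-million-token assumption."""
    prices = parse_price(text)
    if prices is None:
        return None  # models listed as N/A have no published price
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Hypothetical request: 10,000 input tokens and 2,000 output tokens.
    for model, price in [
        ("Claude Opus 4.8", "$5 in / $25 out"),
        ("MiniMax M3", "$0.6 in / $2.4 out"),
        ("Claude Mythos Preview", "N/A"),
    ]:
        cost = estimate_cost(price, 10_000, 2_000)
        print(model, "N/A" if cost is None else f"${cost:.4f}")
```

Because input and output tokens are priced separately, a model with a low input rate but a high output rate can still be expensive for generation-heavy workloads, so the ratio of input to output tokens matters as much as the headline prices.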