Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


Tracked models: 291
Providers: 27
Benchmarked: 248
Avg. index: 34.7


| Rank | Model | Model ID | Tags | Provider | Score (overall) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Grok-4 Heavy | grok-4-heavy | multimodal, vision, multi-input reasoning | xAI | 73.2 | 73.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 2 | Grok-4.20 Beta Non-Reasoning | grok-4.20-beta-0309-non-reasoning | multimodal, vision, multi-input reasoning | xAI | 70.3 | 0.0 | 97.2 | 0.0 | 0.0 | 27.2 | $2 in / $6 out |
| 3 | Grok-4.20 Beta Reasoning | grok-4.20-beta-0309-reasoning | multimodal, vision, multi-input reasoning | xAI | 70.3 | 0.0 | 97.2 | 0.0 | 0.0 | 27.2 | $2 in / $6 out |
| 4 | Claude Mythos Preview | claude-mythos-preview | multimodal, vision, multi-input reasoning | Anthropic | 69.4 | 80.0 | 0.0 | 71.8 | 84.2 | 1.1 | $25 in / $125 out |
| 5 | GPT-5.1 High | gpt-5.1-high-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 68.8 | 68.8 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 6 | Qwen3-Coder | qwen3-coder | text, inference | Alibaba Cloud / Qwen Team | 68.7 | 0.0 | 56.5 | 0.0 | 0.0 | 88.2 | $0.18 in / $0.18 out |
| 7 | Grok-4.1 Fast Non-Reasoning | grok-4-1-fast-non-reasoning | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 8 | Grok-4.1 Fast Reasoning | grok-4-1-fast-reasoning | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 9 | Grok-4 Fast Non-Reasoning | grok-4-fast-non-reasoning | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 10 | Grok-4 Fast Reasoning | grok-4-fast-reasoning | multimodal, vision, multi-input reasoning | xAI | 67.6 | 0.0 | 68.0 | 0.0 | 0.0 | 66.9 | $0.2 in / $0.5 out |
| 11 | MiMo-V2-Pro | mimo-v2-pro | code, programming, tool use | Xiaomi | 66.4 | 0.0 | 85.3 | 0.0 | 66.5 | 35.7 | $1 in / $3 out |
| 12 | Gemini 3.1 Pro | gemini-3.1-pro-preview | multimodal, vision, multi-input reasoning | Google | 66.2 | 76.5 | 66.5 | 74.1 | 64.5 | 21.2 | $2.5 in / $15 out |
| 13 | Gemini 3 Pro | gemini-3-pro-preview | multimodal, vision, multi-input reasoning | Google | 66.1 | 74.4 | 0.0 | 63.8 | 58.1 | 0.0 | N/A |
| 14 | Gemini 3.1 Flash-Lite | gemini-3.1-flash-lite-preview | multimodal, vision, multi-input reasoning | Google | 64.6 | 58.1 | 85.3 | 0.0 | 0.0 | 50.0 | $0.25 in / $1.5 out |
| 15 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 64.2 | 77.6 | 71.8 | 52.5 | 72.0 | 25.6 | $1.75 in / $14 out |
| 16 | Claude Opus 4.7 | claude-opus-4-7 | multimodal, vision, multi-input reasoning | Anthropic | 64.2 | 77.1 | 43.1 | 70.7 | 80.8 | 9.8 | $5 in / $25 out |
| 17 | Gemma 4 31B | gemma-4-31b-it | multimodal, vision, multi-input reasoning | Google | 63.8 | 57.1 | 67.5 | 0.0 | 0.0 | 76.4 | $0.14 in / $0.4 out |
| 18 | GPT-5 High | gpt-5-high-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 63.6 | 63.6 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 19 | Seed 2.0 Pro | seed-2.0-pro | multimodal, vision, multi-input reasoning | ByteDance | 63.1 | 68.4 | 0.0 | 57.4 | 62.5 | 0.0 | N/A |
| 20 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input reasoning | Google | 63.0 | 72.2 | 85.3 | 44.5 | 66.5 | 38.2 | $0.5 in / $3 out |

Page 1 of 15 · 291 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
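
The ranking note says the overall score combines several dimensions but does not publish the formula. As a rough illustration only, the sketch below shows one common way such a composite could be built: a weighted mean over the dimensions that have a reported score. The weights, the `composite_score` name, and the treatment of `0.0` as "not reported" are all assumptions made for this example; they are not taken from the leaderboard and will not reproduce its published scores.

```python
from typing import Dict

# Hypothetical weights, chosen only for illustration.
# The leaderboard does not state how its dimensions are weighted.
WEIGHTS: Dict[str, float] = {
    "benchmarks": 0.3,
    "inference": 0.2,
    "agentic": 0.2,
    "programming": 0.2,
    "value": 0.1,
}


def composite_score(scores: Dict[str, float]) -> float:
    """Weighted mean over dimensions with a reported (non-zero) score.

    Assumption: a score of 0.0 means the dimension was not evaluated,
    so it is skipped and the remaining weights are renormalized.
    """
    reported = {k: v for k, v in scores.items() if k in WEIGHTS and v > 0.0}
    if not reported:
        return 0.0
    total_weight = sum(WEIGHTS[k] for k in reported)
    weighted_sum = sum(WEIGHTS[k] * v for k, v in reported.items())
    return weighted_sum / total_weight


if __name__ == "__main__":
    # A model scored on a single dimension keeps that score as its composite.
    single = {"benchmarks": 73.2, "inference": 0.0, "agentic": 0.0,
              "programming": 0.0, "value": 0.0}
    print(round(composite_score(single), 1))
```

Under these assumptions, a model evaluated on only one dimension ends up with that dimension's score as its composite. Rows such as Grok-4 Heavy (73.2 on benchmarks, 73.2 overall) are consistent with that behavior, but this does not confirm how the leaderboard actually aggregates.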