Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

294 tracked models · 27 providers · 251 benchmarked · 34.7 avg. index

Ranks 21–40 (page 2 of 15; 294 models in total)

| Rank | Model | Model ID | Provider | Tags | Overall | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | Gemini 3 Flash | gemini-3-flash-preview | Google | multimodal, vision, multi-input reasoning | 62.3 | 71.3 | 84.9 | 42.5 | 66.6 | 38.9 | $0.5 in / $3 out |
| 22 | Kimi K2-Thinking-0905 | kimi-k2-thinking-0905 | Moonshot AI | code, programming, tool use | 62.2 | 69.3 | 0.0 | 53.5 | 62.5 | 0.0 | N/A |
| 23 | DeepSeek-V3.2 (Non-thinking) | deepseek-chat | DeepSeek | text, inference | 62.2 | 0.0 | 57.3 | 0.0 | 0.0 | 70.1 | $0.28 in / $0.42 out |
| 24 | Kimi K2.6 | kimi-k2.6 | Moonshot AI | multimodal, vision, multi-input reasoning | 61.9 | 68.5 | 66.8 | 45.3 | 81.0 | 33.3 | $0.95 in / $4 out |
| 25 | Seed 2.0 Pro | seed-2.0-pro | ByteDance | multimodal, vision, multi-input reasoning | 61.9 | 68.2 | 0.0 | 54.7 | 61.8 | 0.0 | N/A |
| 26 | Qwen3.6 Plus | qwen3.6-plus | Alibaba Cloud / Qwen Team | multimodal, vision, multi-input reasoning | 61.7 | 71.9 | 0.0 | 49.3 | 62.2 | 0.0 | N/A |
| 27 | Muse Spark | muse-spark | Meta | multimodal, vision, multi-input reasoning | 61.0 | 71.0 | 0.0 | 67.3 | 41.3 | 0.0 | N/A |
| 28 | Claude Opus 4.6 | claude-opus-4-6 | Anthropic | multimodal, vision, multi-input reasoning | 60.9 | 79.5 | 42.8 | 60.7 | 73.3 | 10.6 | $5 in / $25 out |
| 29 | Gemini 2.0 Flash | gemini-2.0-flash | Google | multimodal, vision, multi-input reasoning | 60.5 | 33.4 | 94.1 | 0.0 | 0.0 | 82.7 | $0.1 in / $0.4 out |
| 30 | GPT-5.4 | gpt-5.4 | OpenAI | text, text-to-text, language | 60.3 | 76.3 | 51.1 | 63.8 | 62.1 | 18.2 | $2.5 in / $15 out |
| 31 | GPT-5.1 | gpt-5.1-2025-11-13 | OpenAI | multimodal, vision, multi-input reasoning | 59.8 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 32 | GPT-5.1 Instant | gpt-5.1-instant-2025-11-12 | OpenAI | multimodal, vision, multi-input reasoning | 59.8 | 65.0 | 71.4 | 0.0 | 57.2 | 31.9 | $1.25 in / $10 out |
| 33 | ERNIE 5.0 | ernie-5.0 | Baidu | multimodal, vision, multi-input reasoning | 59.7 | 59.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 34 | MiniMax M2.5 | minimax-m2.5 | MiniMax | code, programming, tool use | 59.3 | 0.0 | 73.9 | 53.0 | 56.3 | 57.7 | $0.3 in / $1.2 out |
| 35 | Llama 4 Scout | llama-4-scout | Meta | multimodal, vision, multi-input reasoning | 58.8 | 29.2 | 93.0 | 0.0 | 0.0 | 87.2 | $0.08 in / $0.3 out |
| 36 | Ministral 3 (8B Reasoning 2512) | ministral-8b-latest | Mistral AI | multimodal, vision, multi-input reasoning | 58.6 | 31.8 | 84.8 | 0.0 | 0.0 | 92.1 | $0.15 in / $0.15 out |
| 37 | Step-3.5-Flash | step-3.5-flash | StepFun | code, programming, tool use | 58.3 | 62.3 | 63.2 | 45.3 | 53.0 | 82.1 | $0.1 in / $0.4 out |
| 38 | Ministral 3 (14B Reasoning 2512) | ministral-14b-latest | Mistral AI | multimodal, vision, multi-input reasoning | 58.0 | 37.9 | 76.8 | 0.0 | 0.0 | 84.5 | $0.2 in / $0.2 out |
| 39 | Gemma 4 26B-A4B | gemma-4-26b-a4b-it | Google | multimodal, vision, multi-input reasoning | 56.8 | 43.7 | 66.8 | 0.0 | 0.0 | 77.8 | $0.13 in / $0.4 out |
| 40 | GPT-5.1 Medium | gpt-5.1-medium-2025-11-12 | OpenAI | multimodal, vision, multi-input reasoning | 56.6 | 63.6 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 in / $10 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.