Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


  • Tracked models: 309
  • Providers: 27
  • Benchmarked: 264
  • Avg. index: 11.8

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 21 | GPT-5.2 Pro | gpt-5.2-pro-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 53.4 (Agentic) | 65.5 | 0.0 | 53.4 | 0.0 | 0.0 | N/A |
| 22 | Claude Haiku 4.5 | claude-haiku-4-5-20251001 | multimodal, vision, multi-input reasoning | Anthropic | 53.3 (Agentic) | 31.5 | 55.3 | 53.3 | 54.9 | 38.7 | $1 in / $5 out |
| 23 | Kimi K2-Thinking-0905 | kimi-k2-thinking-0905 | code, programming, tool use | Moonshot AI | 52.8 (Agentic) | 68.7 | 0.0 | 52.8 | 59.8 | 0.0 | N/A |
| 24 | MiniMax M2.1 | minimax-m2.1 | code, programming, tool use | MiniMax | 52.1 (Agentic) | 40.8 | 72.2 | 52.1 | 48.7 | 68.6 | $0.3 in / $1.2 out |
| 25 | Seed 2.0 Pro | seed-2.0-pro | multimodal, vision, multi-input reasoning | ByteDance | 51.9 (Agentic) | 68.0 | 0.0 | 51.9 | 58.5 | 0.0 | N/A |
| 26 | Qwen3-Coder 480B A35B Instruct | qwen3-coder-480b-a35b-instruct | code, programming, tool use | Alibaba Cloud / Qwen Team | 50.7 (Agentic) | 0.0 | 0.0 | 50.7 | 33.6 | 0.0 | N/A |
| 27 | MiniMax M2.5 | minimax-m2.5 | code, programming, tool use | MiniMax | 50.4 (Agentic) | 0.0 | 72.2 | 50.4 | 56.9 | 68.6 | $0.3 in / $1.2 out |
| 28 | Claude Sonnet 4 | claude-sonnet-4-20250514 | multimodal, vision, multi-input reasoning | Anthropic | 49.4 (Agentic) | 39.9 | 0.0 | 49.4 | 43.6 | 0.0 | N/A |
| 29 | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | multimodal, vision, multi-input reasoning | Anthropic | 49.1 (Agentic) | 43.0 | 0.0 | 49.1 | 38.8 | 0.0 | N/A |
| 30 | Qwen3.5-122B-A10B | qwen3.5-122b-a10b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 48.7 (Agentic) | 63.6 | 41.1 | 48.7 | 39.5 | 43.0 | $0.4 in / $3.2 out |
| 31 | LongCat-Flash-Chat | longcat-flash-chat | code, programming, tool use | Meituan | 48.1 (Agentic) | 26.9 | 0.0 | 48.1 | 37.4 | 0.0 | N/A |
| 32 | Claude Sonnet 4.6 | claude-sonnet-4-6 | multimodal, vision, multi-input reasoning | Anthropic | 47.6 (Agentic) | 64.7 | 14.6 | 47.6 | 66.4 | 9.3 | $3 in / $15 out |
| 33 | DeepSeek-V4-Flash-Max | deepseek-v4-flash-max | code, programming, tool use | DeepSeek | 47.6 (Agentic) | 58.3 | 89.2 | 47.6 | 44.2 | 98.7 | $0.14 in / $0.28 out |
| 34 | Kimi K2.5 | kimi-k2.5 | multimodal, vision, multi-input reasoning | Moonshot AI | 47.3 (Agentic) | 67.2 | 0.0 | 47.3 | 44.6 | 0.0 | N/A |
| 35 | GLM-5.1 | glm-5.1 | code, programming, tool use | Zhipu AI | 46.0 (Agentic) | 66.3 | 21.5 | 46.0 | 54.9 | 31.6 | $1.4 in / $4.4 out |
| 36 | Qwen3.5-27B | qwen3.5-27b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 44.9 (Agentic) | 60.8 | 41.1 | 44.9 | 40.3 | 53.2 | $0.3 in / $2.4 out |
| 37 | o1 | o1-2024-12-17 | multimodal, vision, multi-input reasoning | OpenAI | 44.7 (Agentic) | 42.3 | 0.0 | 44.7 | 6.0 | 0.0 | N/A |
| 38 | GPT-5.2 | gpt-5.2-2025-12-11 | multimodal, vision, multi-input reasoning | OpenAI | 44.4 (Agentic) | 75.3 | 66.9 | 44.4 | 70.7 | 27.1 | $1.75 in / $14 out |
| 39 | GLM-5 | glm-5 | code, programming, tool use | Zhipu AI | 43.6 (Agentic) | 0.0 | 8.7 | 43.6 | 62.5 | 31.8 | $1 in / $3.2 out |
| 40 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 42.1 (Agentic) | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |

Page 2 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.