Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency. Rankings are updated continuously from published evaluation data.

  • Tracked models: 294
  • Providers: 27
  • Benchmarked: 251
  • Avg. index: 11.4

| Rank | Model | ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | MiniMax M2 | minimax-m2 | code, programming, tool use | MiniMax | 41.4 | 32.2 | 55.9 | 41.4 | 42.8 | 52.3 | $0.3 in / $1.2 out |
| 42 | Qwen3 VL 235B A22B Thinking | qwen3-vl-235b-a22b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 40.2 | 37.9 | 66.8 | 40.2 | 0.0 | 37.2 | $0.45 in / $3.49 out |
| 43 | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | multimodal, vision, multi-input reasoning | Anthropic | 38.7 | 33.9 | 67.4 | 38.7 | 13.2 | 24.5 | $3 in / $15 out |
| 44 | o4-mini | o4-mini | multimodal, vision, multi-input reasoning | OpenAI | 38.2 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 in / $4.4 out |
| 45 | GLM-4.6 | glm-4.6 | multimodal, vision, multi-input reasoning | Zhipu AI | 37.7 | 47.0 | 34.9 | 37.7 | 46.1 | 42.8 | $0.55 in / $2.19 out |
| 46 | GLM-4.5 | glm-4.5 | code, programming, tool use | Zhipu AI | 36.4 | 34.3 | 0.0 | 36.4 | 40.6 | 0.0 | N/A |
| 47 | GPT-4.5 | gpt-4.5 | multimodal, vision, multi-input reasoning | OpenAI | 35.8 | 41.9 | 29.1 | 35.8 | 6.2 | 6.8 | $75 in / $150 out |
| 48 | Qwen3.5-397B-A17B | qwen3.5-397b-a17b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 35.6 | 58.6 | 66.8 | 35.6 | 60.9 | 35.3 | $0.6 in / $3.6 out |
| 49 | Qwen3 VL 32B Thinking | qwen3-vl-32b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 34.6 | 44.6 | 0.0 | 34.6 | 0.0 | 0.0 | N/A |
| 50 | GPT-4.1 | gpt-4.1-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 32.8 | 28.8 | 75.4 | 32.8 | 17.7 | 34.6 | $2 in / $8 out |
| 51 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 30.8 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 in / $1.2 out |
| 52 | LongCat-Flash-Lite | longcat-flash-lite | code, programming, tool use | Meituan | 29.5 | 24.7 | 83.8 | 29.5 | 25.3 | 83.3 | $0.1 in / $0.4 out |
| 53 | GPT-5 | gpt-5-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 29.0 | 64.4 | 0.0 | 29.0 | 51.7 | 0.0 | N/A |
| 54 | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | code, programming, tool use | DeepSeek | 28.8 | 52.7 | 0.0 | 28.8 | 40.5 | 0.0 | N/A |
| 55 | GLM-4.7 | glm-4.7 | multimodal, vision, multi-input reasoning | Zhipu AI | 28.2 | 63.2 | 52.8 | 28.2 | 44.5 | 40.6 | $0.6 in / $2.2 out |
| 56 | Qwen3 VL 32B Instruct | qwen3-vl-32b-instruct | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 27.9 | 29.5 | 0.0 | 27.9 | 0.0 | 0.0 | N/A |
| 57 | MiMo-V2-Flash | mimo-v2-flash | code, programming, tool use | Xiaomi | 27.2 | 53.7 | 79.8 | 27.2 | 39.3 | 85.9 | $0.1 in / $0.3 out |
| 58 | GPT-5.4 Mini | gpt-5.4-mini | text, text-to-text, language | OpenAI | 27.1 | 57.4 | 77.4 | 27.1 | 26.9 | 32.8 | $0.75 in / $4.5 out |
| 59 | GPT OSS 120B | gpt-oss-120b | text, inference | OpenAI | 26.8 | 36.6 | 34.9 | 26.8 | 0.0 | 76.7 | $0.09 in / $0.45 out |
| 60 | MiniMax M1 40K | minimax-m1-40k | code, programming, tool use | MiniMax | 26.8 | 22.9 | 0.0 | 26.8 | 18.5 | 0.0 | N/A |

Page 3 of 15 · 294 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
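
The price entries on this page give separate input and output rates, for example `$0.3 in / $1.2 out`, and `N/A` where no price is listed. As a rough illustration of how such an entry could be turned into a workload cost, here is a minimal Python sketch. It assumes the rates are US dollars per one million tokens, which the page does not state; the function names `parse_price` and `estimate_cost` and the `tokens_per_unit` parameter are hypothetical and are not part of any Skytells API.

```python
import re
from typing import Optional, Tuple

# Matches price strings in the form shown on the page: "$0.3 in / $1.2 out".
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_rate, output_rate), or None for "N/A" or other text."""
    match = _PRICE_RE.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def estimate_cost(
    price_text: str,
    input_tokens: int,
    output_tokens: int,
    tokens_per_unit: int = 1_000_000,  # ASSUMPTION: rates are per 1M tokens
) -> Optional[float]:
    """Estimate the cost of a workload, or None when no price is listed."""
    rates = parse_price(price_text)
    if rates is None:
        return None
    input_rate, output_rate = rates
    return (input_tokens * input_rate + output_tokens * output_rate) / tokens_per_unit


if __name__ == "__main__":
    # Two million input tokens and half a million output tokens at GLM-4.6's listed rates.
    print(estimate_cost("$0.55 in / $2.19 out", 2_000_000, 500_000))
    # Models listed as "N/A" yield None rather than a misleading zero.
    print(estimate_cost("N/A", 2_000_000, 500_000))
```

Returning `None` for unpriced models keeps them distinguishable from models that are actually free, which matters if the result feeds a cost comparison.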
