Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.


  • Tracked models: 309
  • Providers: 27
  • Benchmarked: 264
  • Avg. index: 11.8


| Rank | Model | Model ID | Tags | Provider | Score (Agentic) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | Step-3.5-Flash | step-3.5-flash | code, programming, tool use | StepFun | 42.0 | 62.8 | 60.4 | 42.0 | 50.6 | 95.0 | $0.1 in / $0.4 out |
| 42 | Qwen3.5-35B-A3B | qwen3.5-35b-a3b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 41.8 | 55.9 | 41.1 | 41.8 | 31.6 | 56.3 | $0.25 in / $2 out |
| 43 | Qwen3-Next-80B-A3B-Thinking | qwen3-next-80b-a3b-thinking | text, inference | Alibaba Cloud / Qwen Team | 41.7 | 43.4 | 0.0 | 41.7 | 0.0 | 0.0 | N/A |
| 44 | Claude Opus 4.5 | claude-opus-4-5-20251101 | multimodal, vision, multi-input reasoning | Anthropic | 41.4 | 55.3 | 0.0 | 41.4 | 73.5 | 0.0 | N/A |
| 45 | MiniMax M2 | minimax-m2 | code, programming, tool use | MiniMax | 41.2 | 30.6 | 41.8 | 41.2 | 41.5 | 59.5 | $0.3 in / $1.2 out |
| 46 | MiniMax M2.7 | minimax-m2.7 | code, programming, tool use | MiniMax | 40.1 | 0.0 | 25.3 | 40.1 | 39.6 | 67.7 | $0.3 in / $1.2 out |
| 47 | Qwen3 VL 235B A22B Thinking | qwen3-vl-235b-a22b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 39.3 | 36.8 | 0.0 | 39.3 | 0.0 | 0.0 | N/A |
| 48 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input reasoning | Google | 38.8 | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 49 | Claude 3.5 Sonnet | claude-3-5-sonnet-20241022 | multimodal, vision, multi-input reasoning | Anthropic | 38.7 | 33.0 | 0.0 | 38.7 | 11.9 | 0.0 | N/A |
| 50 | MiniMax M3 | minimax-m3 | multimodal, vision, multi-input reasoning | MiniMax | 38.7 | 54.6 | 72.2 | 38.7 | 74.3 | 48.1 | $0.6 in / $2.4 out |
| 51 | o4-mini | o4-mini | multimodal, vision, multi-input reasoning | OpenAI | 37.5 | 47.6 | 0.0 | 37.5 | 30.1 | 0.0 | N/A |
| 52 | GLM-4.5 | glm-4.5 | code, programming, tool use | Zhipu AI | 36.2 | 32.6 | 0.0 | 36.2 | 38.0 | 0.0 | N/A |
| 53 | GLM-4.6 | glm-4.6 | multimodal, vision, multi-input reasoning | Zhipu AI | 36.0 | 45.6 | 0.0 | 36.0 | 43.9 | 0.0 | N/A |
| 54 | GPT-4.5 | gpt-4.5 | multimodal, vision, multi-input reasoning | OpenAI | 35.8 | 41.3 | 0.0 | 35.8 | 5.5 | 0.0 | N/A |
| 55 | Qwen3 VL 32B Thinking | qwen3-vl-32b-thinking | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 34.1 | 43.3 | 0.0 | 34.1 | 0.0 | 0.0 | N/A |
| 56 | GPT-4.1 | gpt-4.1-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 32.8 | 27.9 | 76.0 | 32.8 | 15.8 | 36.7 | $2 in / $8 out |
| 57 | Qwen3.5-397B-A17B | qwen3.5-397b-a17b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 31.1 | 57.0 | 41.1 | 31.1 | 57.7 | 39.2 | $0.6 in / $3.6 out |
| 58 | LongCat-Flash-Lite | longcat-flash-lite | code, programming, tool use | Meituan | 30.1 | 23.6 | 74.7 | 30.1 | 24.5 | 96.5 | $0.1 in / $0.4 out |
| 59 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 29.0 | 54.9 | 0.0 | 29.0 | 35.2 | 0.0 | N/A |
| 60 | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | code, programming, tool use | DeepSeek | 28.0 | 51.5 | 0.0 | 28.0 | 38.8 | 0.0 | N/A |
Page 3 of 16 · 309 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
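
The page does not publish how the dimensions are combined, so the following is only a minimal sketch of one way a weighted composite could be computed from a leaderboard row. The weights, the `Row` type, and the `composite` and `parse_price` helpers are hypothetical names introduced for illustration; only the price string format and the sample scores are taken from the listing.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical weights: the leaderboard does not state its actual formula.
WEIGHTS = {"benchmarks": 0.4, "inference": 0.3, "value": 0.3}


@dataclass
class Row:
    model: str
    benchmarks: float
    inference: float
    value: float
    price: str


def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Parse a string like '$0.1 in / $0.4 out' into (input, output) prices.

    Returns None for anything else, including the 'N/A' placeholder.
    """
    m = re.fullmatch(r"\$([\d.]+) in / \$([\d.]+) out", text.strip())
    return (float(m.group(1)), float(m.group(2))) if m else None


def composite(row: Row) -> float:
    """Weighted sum of the selected dimension scores, rounded to one decimal."""
    return round(sum(w * getattr(row, k) for k, w in WEIGHTS.items()), 1)


# Sample values copied from the Step-3.5-Flash row of the listing.
step = Row("Step-3.5-Flash", benchmarks=62.8, inference=60.4, value=95.0,
           price="$0.1 in / $0.4 out")
print(step.model, composite(step), parse_price(step.price))
```

Because the weights are made up, the resulting number will not match any score on the leaderboard; the sketch only shows the shape of such a calculation and how unpriced models (`N/A`) can be told apart from priced ones.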
