Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
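Each model on the leaderboard carries a separate score for each of these dimensions. Below is a minimal sketch of a per-dimension view. It assumes that such a view is simply a descending sort on the selected dimension. The `Entry` type, `DIMENSIONS` tuple, and `rank_by` helper are hypothetical. The page does not say how Skytells computes or combines these scores. The sample rows are copied from ranks 61 to 80 of the leaderboard.

```python
from dataclasses import dataclass

# Hypothetical record type; field names mirror the leaderboard's score columns.
@dataclass(frozen=True)
class Entry:
    model: str
    provider: str
    benchmarks: float
    inference: float
    agentic: float
    programming: float
    value: float

# Sample rows copied from ranks 61 to 80 of the leaderboard.
ENTRIES = [
    Entry("o4-mini", "OpenAI", 48.8, 70.7, 38.2, 32.7, 41.9),
    Entry("o3", "OpenAI", 46.2, 38.4, 20.5, 30.7, 27.7),
    Entry("DeepSeek-V3.1", "DeepSeek", 38.7, 40.2, 15.3, 28.7, 58.9),
    Entry("LongCat-Flash-Lite", "Meituan", 24.7, 83.8, 29.5, 25.3, 83.3),
    Entry("GPT-5 mini", "OpenAI", 41.9, 89.7, 0.0, 23.7, 56.3),
]

DIMENSIONS = ("benchmarks", "inference", "agentic", "programming", "value")

def rank_by(entries, dimension):
    """Return entries sorted by one dimension, highest score first.

    ASSUMPTION: a per-dimension view is a plain descending sort; ties are
    broken by model name so the order is deterministic.
    """
    if dimension not in DIMENSIONS:
        raise ValueError(f"unknown dimension: {dimension!r}")
    return sorted(entries, key=lambda e: (-getattr(e, dimension), e.model))

if __name__ == "__main__":
    for dim in ("programming", "inference"):
        print(dim, [e.model for e in rank_by(ENTRIES, dim)])
```

Sorting the sample by `programming` reproduces the relative order of these rows in the leaderboard, because the score shown there matches each model's Programming column. Sorting by `inference` instead puts GPT-5 mini first.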


  • Tracked models: 294
  • Providers: 27
  • Benchmarked: 251
  • Avg. index: 13.2


| Rank | Model | Model ID | Tags | Provider | Score (Programming) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 61 | o4-mini | o4-mini | multimodal, vision, multi-input reasoning | OpenAI | 32.7 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 in / $4.4 out |
| 62 | o3 | o3-2025-04-16 | multimodal, vision, multi-input reasoning | OpenAI | 30.7 | 46.2 | 38.4 | 20.5 | 30.7 | 27.7 | $2 in / $8 out |
| 63 | Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | multimodal, vision, multi-input reasoning | Google | 30.0 | 51.7 | 63.2 | 0.0 | 30.0 | 27.9 | $1.25 in / $10 out |
| 64 | DeepSeek-V3.1 | deepseek-v3.1 | code, programming, tool use | DeepSeek | 28.7 | 38.7 | 40.2 | 15.3 | 28.7 | 58.9 | $0.27 in / $1 out |
| 65 | Nemotron 3 Super (120B A12B) | nemotron-3-super-120b-a12b | code, programming, tool use | NVIDIA | 27.0 | 48.9 | 0.0 | 8.9 | 27.0 | 0.0 | N/A |
| 66 | GPT-5.4 Mini | gpt-5.4-mini | text, text-to-text, language | OpenAI | 26.9 | 57.4 | 77.4 | 27.1 | 26.9 | 32.8 | $0.75 in / $4.5 out |
| 67 | Qwen3.6-35B-A3B | qwen3.6-35b-a3b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 26.6 | 55.7 | 0.0 | 17.7 | 26.6 | 0.0 | N/A |
| 68 | Gemini 2.5 Pro | gemini-2.5-pro | multimodal, vision, multi-input reasoning | Google | 25.6 | 44.6 | 63.2 | 0.0 | 25.6 | 27.9 | $1.25 in / $10 out |
| 69 | LongCat-Flash-Lite | longcat-flash-lite | code, programming, tool use | Meituan | 25.3 | 24.7 | 83.8 | 29.5 | 25.3 | 83.3 | $0.1 in / $0.4 out |
| 70 | Devstral Medium | devstral-medium-2507 | code, programming, tool use | Mistral AI | 24.7 | 0.0 | 64.5 | 0.0 | 24.7 | 53.2 | $0.4 in / $2 out |
| 71 | GPT-5 mini | gpt-5-mini-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 23.7 | 41.9 | 89.7 | 0.0 | 23.7 | 56.3 | $0.25 in / $2 out |
| 72 | Gemini 2.5 Flash | gemini-2.5-flash | multimodal, vision, multi-input reasoning | Google | 23.4 | 40.1 | 63.2 | 0.0 | 23.4 | 42.6 | $0.3 in / $2.5 out |
| 73 | Mercury 2 | mercury-2 | code, programming, tool use | Inception | 22.3 | 44.6 | 72.5 | 0.0 | 22.3 | 69.2 | $0.25 in / $0.75 out |
| 74 | LongCat-Flash-Thinking | longcat-flash-thinking | code, programming, tool use | Meituan | 22.1 | 50.4 | 0.0 | 0.0 | 22.1 | 0.0 | N/A |
| 75 | GLM-4.7-Flash | glm-4.7-flash | code, programming, tool use | Zhipu AI | 21.2 | 38.5 | 29.1 | 12.0 | 21.2 | 72.2 | $0.07 in / $0.4 out |
| 76 | GLM-4.5-Air | glm-4.5-air | code, programming, tool use | Zhipu AI | 20.2 | 28.1 | 0.0 | 24.9 | 20.2 | 0.0 | N/A |
| 77 | Kimi K2-Instruct-0905 | kimi-k2-instruct-0905 | code, programming, tool use | Moonshot AI | 19.6 | 24.9 | 0.0 | 6.6 | 19.6 | 0.0 | N/A |
| 78 | MiniMax M1 80K | minimax-m1-80k | code, programming, tool use | MiniMax | 19.4 | 24.6 | 84.9 | 20.9 | 19.4 | 41.7 | $0.55 in / $2.2 out |
| 79 | MiniMax M1 40K | minimax-m1-40k | code, programming, tool use | MiniMax | 18.5 | 22.9 | 0.0 | 26.8 | 18.5 | 0.0 | N/A |
| 80 | GPT-4.1 | gpt-4.1-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 17.7 | 28.8 | 75.4 | 32.8 | 17.7 | 34.6 | $2 in / $8 out |

Page 4 of 15 · 294 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost per output. Scores are updated continuously and may differ from individual third-party benchmarks.
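The leaderboard lists prices as "$X in / $Y out" and does not state a unit. Below is a minimal sketch for comparing what a workload would cost on different models. It assumes that prices are in USD per one million input and output tokens, which is a common convention for LLM APIs but is not confirmed on this page. The helpers `parse_price` and `estimate_cost` are hypothetical and are not part of any Skytells API.

```python
import re
from typing import Optional, Tuple

# Matches the leaderboard's price format, e.g. "$1.1 in / $4.4 out" or "$2 in / $8 out".
PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")

def parse_price(text: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price), or None when the page shows N/A."""
    text = text.strip()
    if text == "N/A":
        return None
    match = PRICE_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized price string: {text!r}")
    return float(match.group(1)), float(match.group(2))

def estimate_cost(price: str, input_tokens: int, output_tokens: int,
                  tokens_per_unit: int = 1_000_000) -> Optional[float]:
    """Estimate the cost of a workload in USD.

    ASSUMPTION: listed prices are USD per `tokens_per_unit` tokens
    (1M by default); the leaderboard itself does not state the unit.
    Returns None for models whose price is N/A.
    """
    parsed = parse_price(price)
    if parsed is None:
        return None
    in_rate, out_rate = parsed
    return (input_tokens * in_rate + output_tokens * out_rate) / tokens_per_unit

if __name__ == "__main__":
    # Same workload (2M input tokens, 500K output tokens) on three listed models.
    for name, price in [("o4-mini", "$1.1 in / $4.4 out"),
                        ("DeepSeek-V3.1", "$0.27 in / $1 out"),
                        ("LongCat-Flash-Thinking", "N/A")]:
        print(name, estimate_cost(price, 2_000_000, 500_000))
```

Under the per-million assumption, this workload costs 2 × 1.1 + 0.5 × 4.4 = 4.4 dollars on o4-mini. On DeepSeek-V3.1 it costs 2 × 0.27 + 0.5 × 1 = 1.04 dollars. Models listed as N/A return None rather than a misleading zero.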