Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
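Ranking along a single dimension, as described above, can be sketched as a sort over per-dimension scores. This is a minimal sketch, not the site's actual method: the `rank_by` helper and the field names are hypothetical, and the sample values are copied from the first four rows of the table on this page. The page does not state how ties are broken or how the headline score is computed, so ties here simply keep their input order.

```python
from operator import itemgetter

# Sample rows copied from the leaderboard table on this page.
# Field names are illustrative assumptions, not an official schema.
ROWS = [
    {"model": "GPT-5.5", "benchmarks": 80.4, "inference": 93.7,
     "agentic": 70.2, "programming": 61.6, "value": 1.9},
    {"model": "GPT-4.1 nano", "benchmarks": 12.2, "inference": 90.8,
     "agentic": 0.0, "programming": 0.0, "value": 95.9},
    {"model": "DeepSeek-V4-Flash-Max", "benchmarks": 58.3, "inference": 89.2,
     "agentic": 47.6, "programming": 44.2, "value": 98.7},
    {"model": "DeepSeek-V4-Pro-Max", "benchmarks": 67.4, "inference": 89.2,
     "agentic": 61.3, "programming": 58.6, "value": 34.2},
]


def rank_by(rows, dimension):
    """Return (rank, model, score) tuples for one dimension, highest first.

    Python's sort is stable, so rows with equal scores keep their input order.
    """
    ordered = sorted(rows, key=itemgetter(dimension), reverse=True)
    return [(i, r["model"], r[dimension]) for i, r in enumerate(ordered, start=1)]


if __name__ == "__main__":
    for dim in ("inference", "value"):
        print(dim)
        for rank, model, score in rank_by(ROWS, dim):
            print(f"  {rank}. {model}: {score}")
```

Sorting the same rows by `value` instead of `inference` moves DeepSeek-V4-Flash-Max to the top and GPT-5.5 to the bottom, which illustrates why a model's position depends on the dimension selected.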

  • Tracked models: 309
  • Providers: 27
  • Benchmarked: 264
  • Avg. index: 13.1

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | GPT-5.5 | gpt-5.5 | multimodal, vision, multi-input reasoning | OpenAI | 93.7 (Inference) | 80.4 | 93.7 | 70.2 | 61.6 | 1.9 | $5 in / $30 out |
| 2 | GPT-4.1 nano | gpt-4.1-nano-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 90.8 (Inference) | 12.2 | 90.8 | 0.0 | 0.0 | 95.9 | $0.1 in / $0.4 out |
| 3 | DeepSeek-V4-Flash-Max | deepseek-v4-flash-max | code, programming, tool use | DeepSeek | 89.2 (Inference) | 58.3 | 89.2 | 47.6 | 44.2 | 98.7 | $0.14 in / $0.28 out |
| 4 | DeepSeek-V4-Pro-Max | deepseek-v4-pro-max | code, programming, tool use | DeepSeek | 89.2 (Inference) | 67.4 | 89.2 | 61.3 | 58.6 | 34.2 | $1.74 in / $3.48 out |
| 5 | Gemini 3.5 Flash | gemini-3.5-flash | multimodal, vision, multi-input reasoning | Google | 89.2 (Inference) | 62.8 | 89.2 | 74.4 | 30.5 | 26.6 | $1.5 in / $9 out |
| 6 | GPT-4.1 mini | gpt-4.1-mini-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 87.8 (Inference) | 20.2 | 87.8 | 8.9 | 2.4 | 65.6 | $0.4 in / $1.6 out |
| 7 | GPT-5 mini | gpt-5-mini-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 81.7 (Inference) | 41.7 | 81.7 | 0.0 | 27.3 | 64.0 | $0.25 in / $2 out |
| 8 | GPT-4.1 | gpt-4.1-2025-04-14 | multimodal, vision, multi-input reasoning | OpenAI | 76.0 (Inference) | 27.9 | 76.0 | 32.8 | 15.8 | 36.7 | $2 in / $8 out |
| 9 | LongCat-Flash-Lite | longcat-flash-lite | code, programming, tool use | Meituan | 74.7 (Inference) | 23.6 | 74.7 | 30.1 | 24.5 | 96.5 | $0.1 in / $0.4 out |
| 10 | Gemini 3.1 Flash-Lite | gemini-3.1-flash-lite-preview | multimodal, vision, multi-input reasoning | Google | 72.2 (Inference) | 55.3 | 72.2 | 0.0 | 0.0 | 63.3 | $0.25 in / $1.5 out |
| 11 | Gemini 3 Flash | gemini-3-flash-preview | multimodal, vision, multi-input reasoning | Google | 72.2 (Inference) | 70.0 | 72.2 | 38.8 | 63.7 | 44.9 | $0.5 in / $3 out |
| 12 | Grok 4.3 | grok-4.3 | text, inference | xAI | 72.2 (Inference) | 0.0 | 72.2 | 0.0 | 0.0 | 41.8 | $1.25 in / $2.5 out |
| 13 | MiniMax M2.1 | minimax-m2.1 | code, programming, tool use | MiniMax | 72.2 (Inference) | 40.8 | 72.2 | 52.1 | 48.7 | 68.6 | $0.3 in / $1.2 out |
| 14 | MiniMax M2.5 | minimax-m2.5 | code, programming, tool use | MiniMax | 72.2 (Inference) | 0.0 | 72.2 | 50.4 | 56.9 | 68.6 | $0.3 in / $1.2 out |
| 15 | MiniMax M3 | minimax-m3 | multimodal, vision, multi-input reasoning | MiniMax | 72.2 (Inference) | 54.6 | 72.2 | 38.7 | 74.3 | 48.1 | $0.6 in / $2.4 out |
| 16 | Nova 2 Lite | nova-2-lite | multimodal, vision, multi-input reasoning | Amazon | 72.2 (Inference) | 42.8 | 72.2 | 13.0 | 27.0 | 50.0 | $0.3 in / $2.5 out |
| 17 | Nova 2 Sonic | nova-2-sonic | multimodal, vision, multi-input reasoning | Amazon | 72.2 (Inference) | 0.0 | 72.2 | 0.0 | 0.0 | 46.8 | $0.33 in / $2.75 out |
| 18 | Qwen3.6 Plus | qwen3.6-plus | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 72.2 (Inference) | 70.2 | 72.2 | 42.1 | 61.0 | 44.9 | $0.5 in / $3 out |
| 19 | Qwen3.7 Max | qwen3.7-max | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 72.2 (Inference) | 66.1 | 72.2 | 61.7 | 81.5 | 35.4 | $1.25 in / $3.75 out |
| 20 | Mercury 2 | mercury-2 | code, programming, tool use | Inception | 69.0 (Inference) | 43.4 | 69.0 | 0.0 | 20.3 | 79.7 | $0.25 in / $0.75 out |

Page 1 of 16 · 309 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.