Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

Explore full leaderboard · Browse model catalog

294 tracked models · 27 providers · 251 benchmarked · 27.4 avg. index

Views: Overall · Benchmarks · Inference · Agentic · Programming · Value / Price

294 models

| Rank | Model | Model ID | Provider | Tags | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|------|-------|----------|----------|------|-------|------------|-----------|---------|-------------|-------|-------|
| 61 | o4-mini | o4-mini | OpenAI | multimodal · vision · multi-input reasoning | 48.8 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 in / $4.4 out |
| 62 | Claude Opus 4.1 | claude-opus-4-1-20250805 | Anthropic | multimodal · vision · multi-input reasoning | 48.1 | 48.1 | 30.1 | 66.8 | 62.9 | 7.0 | $15 in / $75 out |
| 63 | o1-pro | o1-pro | OpenAI | multimodal · vision · multi-input reasoning | 47.5 | 47.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 64 | Step3-VL-10B | step3-vl-10b | StepFun | multimodal · vision · multi-input reasoning | 47.4 | 47.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 65 | GLM-4.6 | glm-4.6 | Zhipu AI | multimodal · vision · multi-input reasoning | 47.0 | 47.0 | 34.9 | 37.7 | 46.1 | 42.8 | $0.55 in / $2.19 out |
| 66 | Qwen3-235B-A22B-Thinking-2507 | qwen3-235b-a22b-thinking-2507 | Alibaba Cloud / Qwen Team | text · inference | 46.9 | 46.9 | 66.8 | 26.8 | 0.0 | 39.4 | $0.3 in / $3 out |
| 67 | Gemini 2.0 Flash Thinking | gemini-2.0-flash-thinking | Google | multimodal · vision · multi-input reasoning | 46.7 | 46.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 68 | Sarvam-30B | sarvam-30b | Sarvam AI | code · programming · tool use | 46.5 | 46.5 | 0.0 | 8.5 | 5.3 | 0.0 | N/A |
| 69 | o3 | o3-2025-04-16 | OpenAI | multimodal · vision · multi-input reasoning | 46.2 | 46.2 | 38.4 | 20.5 | 30.7 | 27.7 | $2 in / $8 out |
| 70 | GPT-5.4 nano | gpt-5.4-nano | OpenAI | multimodal · vision · multi-input reasoning | 46.1 | 46.1 | 77.4 | 11.0 | 11.2 | 57.2 | $0.2 in / $1.25 out |
| 71 | Nemotron 3 Nano (30B A3B) | nemotron-3-nano-30b-a3b | NVIDIA | code · programming · tool use | 45.8 | 45.8 | 66.8 | 3.3 | 4.4 | 90.8 | $0.06 in / $0.24 out |
| 72 | GPT OSS 120B High | gpt-oss-120b-high | OpenAI | multimodal · vision · multi-input reasoning | 44.9 | 44.9 | 57.3 | 0.0 | 0.0 | 73.2 | $0.1 in / $0.5 out |
| 73 | Qwen3-Next-80B-A3B-Thinking | qwen3-next-80b-a3b-thinking | Alibaba Cloud / Qwen Team | text · inference | 44.9 | 44.9 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 in / $1.5 out |
| 74 | Gemini 2.5 Pro | gemini-2.5-pro | Google | multimodal · vision · multi-input reasoning | 44.6 | 44.6 | 63.2 | 0.0 | 25.6 | 27.9 | $1.25 in / $10 out |
| 75 | Mercury 2 | mercury-2 | Inception | code · programming · tool use | 44.6 | 44.6 | 72.5 | 0.0 | 22.3 | 69.2 | $0.25 in / $0.75 out |
| 76 | Qwen3 VL 32B Thinking | qwen3-vl-32b-thinking | Alibaba Cloud / Qwen Team | multimodal · vision · multi-input reasoning | 44.6 | 44.6 | 0.0 | 34.6 | 0.0 | 0.0 | N/A |
| 77 | Kimi K2 0905 | kimi-k2-0905 | Moonshot AI | text · inference | 44.4 | 44.4 | 66.8 | 0.0 | 0.0 | 40.0 | $0.6 in / $2.5 out |
| 78 | Claude 3.7 Sonnet | claude-3-7-sonnet-20250219 | Anthropic | multimodal · vision · multi-input reasoning | 43.7 | 43.7 | 30.1 | 49.0 | 40.1 | 13.2 | $3 in / $15 out |
| 79 | Gemma 4 26B-A4B | gemma-4-26b-a4b-it | Google | multimodal · vision · multi-input reasoning | 43.7 | 43.7 | 66.8 | 0.0 | 0.0 | 77.8 | $0.13 in / $0.4 out |
| 80 | K-EXAONE-236B-A23B | k-exaone-236b-a23b | LG AI Research | multimodal · vision · multi-input reasoning | 43.4 | 43.4 | 24.2 | 0.0 | 0.0 | 49.1 | $0.6 in / $1 out |
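The Price column quotes separate input and output token rates. As a minimal sketch of how to read it, the helper below estimates a single request's cost, assuming (as is conventional, though the page does not state the unit) that rates are USD per million tokens:

```python
# Illustrative cost helper for the leaderboard's "in / out" prices.
# Assumption (not stated on the page): rates are USD per million tokens.

def request_cost(price_in: float, price_out: float,
                 tokens_in: int, tokens_out: int) -> float:
    """Estimate the USD cost of one request."""
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000

# o4-mini at $1.1 in / $4.4 out: a 2,000-token prompt with a
# 500-token completion costs 0.0022 + 0.0022 = $0.0044.
print(f"${request_cost(1.1, 4.4, 2_000, 500):.4f}")
```

Output-heavy workloads are dominated by the out rate, which is why models with similar in rates can differ sharply in the Value column.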
Page 4 of 15 · 294 models



Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
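One way such a multi-dimensional index could be combined is a weighted average over the five dimensions. The sketch below is purely illustrative: the weights, and the treatment of 0.0 as "not yet benchmarked" (suggested by the zero-filled, N/A-priced rows), are assumptions, not Skytells' published method.

```python
# Hypothetical composite index over the leaderboard's five dimensions.
# Weights are illustrative assumptions; Skytells does not publish its weighting.

WEIGHTS = {
    "benchmarks": 0.40,   # assumed heaviest weight
    "inference": 0.15,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.15,
}

def composite_score(scores: dict[str, float]) -> float:
    """Weighted average over the dimensions that have data (non-zero)."""
    available = {k: v for k, v in scores.items() if v > 0.0}
    total_weight = sum(WEIGHTS[k] for k in available)
    if total_weight == 0.0:
        return 0.0  # no dimension has data
    return sum(WEIGHTS[k] * v for k, v in available.items()) / total_weight
```

With all five dimensions present the weights sum to 1.0, so scores of 40 / 80 / 20 / 60 / 100 combine to 55.0 under these assumed weights; renormalizing over available dimensions keeps partially benchmarked models comparable.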
