Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency, updated continuously from published evaluation data.

Tracked models: 294
Providers: 27
Benchmarked: 251
Avg. index: 27.4

| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | Qwen3.5-35B-A3B | qwen3.5-35b-a3b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 57.2 | 57.2 | 66.8 | 44.3 | 34.4 | 46.4 | $0.25 in / $2 out |
| 42 | GPT-5 Medium | gpt-5-medium-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 56.9 | 56.9 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 in / $10 out |
| 43 | ChatGPT-4o Latest | chatgpt-4o-latest | multimodal, vision, multi-input reasoning | OpenAI | 56.6 | 56.6 | 63.5 | 0.0 | 0.0 | 32.0 | $2.5 in / $10 out |
| 44 | Gemma 4 31B | gemma-4-31b-it | multimodal, vision, multi-input reasoning | Google | 56.5 | 56.5 | 66.8 | 0.0 | 0.0 | 76.7 | $0.14 in / $0.4 out |
| 45 | Claude Opus 4.5 | claude-opus-4-5-20251101 | multimodal, vision, multi-input reasoning | Anthropic | 56.3 | 56.3 | 30.1 | 44.2 | 74.2 | 10.6 | $5 in / $25 out |
| 46 | Gemini 3.1 Flash-Lite | gemini-3.1-flash-lite-preview | multimodal, vision, multi-input reasoning | Google | 56.3 | 56.3 | 84.9 | 0.0 | 0.0 | 50.6 | $0.25 in / $1.5 out |
| 47 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 56.3 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 in / $1.2 out |
| 48 | Qwen3.6-35B-A3B | qwen3.6-35b-a3b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 55.7 | 55.7 | 0.0 | 17.7 | 26.6 | 0.0 | N/A |
| 49 | DeepSeek-V3.2-Speciale | deepseek-v3.2-speciale | code, programming, tool use | DeepSeek | 54.5 | 54.5 | 0.0 | 9.7 | 45.9 | 0.0 | N/A |
| 50 | GPT OSS 20B High | gpt-oss-20b-high | text, inference | OpenAI | 53.9 | 53.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 51 | MiMo-V2-Flash | mimo-v2-flash | code, programming, tool use | Xiaomi | 53.7 | 53.7 | 79.8 | 27.2 | 39.3 | 85.9 | $0.1 in / $0.3 out |
| 52 | Grok-3 Mini | grok-3-mini | multimodal, vision, multi-input reasoning | xAI | 53.4 | 53.4 | 51.9 | 0.0 | 0.0 | 65.0 | $0.3 in / $0.5 out |
| 53 | Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | multimodal, vision, multi-input reasoning | Anthropic | 53.3 | 53.3 | 30.1 | 71.8 | 74.6 | 13.2 | $3 in / $15 out |
| 54 | DeepSeek-V3.2 (Thinking) | deepseek-reasoner | code, programming, tool use | DeepSeek | 53.1 | 53.1 | 0.0 | 16.6 | 45.9 | 0.0 | N/A |
| 55 | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | code, programming, tool use | DeepSeek | 52.7 | 52.7 | 0.0 | 28.8 | 40.5 | 0.0 | N/A |
| 56 | Grok-4 | grok-4 | multimodal, vision, multi-input reasoning | xAI | 52.2 | 52.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 57 | Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | multimodal, vision, multi-input reasoning | Google | 51.7 | 51.7 | 63.2 | 0.0 | 30.0 | 27.9 | $1.25 in / $10 out |
| 58 | DeepSeek-R1-0528 | deepseek-r1-0528 | code, programming, tool use | DeepSeek | 50.4 | 50.4 | 14.4 | 0.0 | 6.8 | 35.0 | $0.55 in / $2.19 out |
| 59 | LongCat-Flash-Thinking | longcat-flash-thinking | code, programming, tool use | Meituan | 50.4 | 50.4 | 0.0 | 0.0 | 22.1 | 0.0 | N/A |
| 60 | Nemotron 3 Super (120B A12B) | nemotron-3-super-120b-a12b | code, programming, tool use | NVIDIA | 48.9 | 48.9 | 0.0 | 8.9 | 27.0 | 0.0 | N/A |

Page 3 of 15 · 294 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
