Skytells

Addressing the world's greatest challenges with AI. Enterprise research, foundation models, and infrastructure trusted by organizations worldwide since 2012.

© 2012–2026 Skytells, Inc. All rights reserved.

Live rankings

AI Model Leaderboard

Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

  • Tracked models: 296
  • Providers: 27
  • Benchmarked: 253
  • Avg. index: 32.1

Overall · Benchmarks · Inference · Agentic · Programming · Value / Price

296 models

| Rank | Model | Model ID | Tags | Provider | Score (Inference) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 81 | GPT-5.1 Medium | gpt-5.1-medium-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 61.9 | 63.6 | 61.9 | 0.0 | 0.0 | 28.9 | $1.25 in / $10 out |
| 82 | GPT-5 Medium | gpt-5-medium-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 61.9 | 56.7 | 61.9 | 0.0 | 0.0 | 28.9 | $1.25 in / $10 out |
| 83 | Claude 3 Haiku | claude-3-haiku-20240307 | multimodal, vision, multi-input reasoning | Anthropic | 61.8 | 5.8 | 61.8 | 0.0 | 0.0 | 57.9 | $0.25 in / $1.25 out |
| 84 | Claude Haiku 4.5 | claude-haiku-4-5-20251001 | multimodal, vision, multi-input reasoning | Anthropic | 61.8 | 32.7 | 61.8 | 54.2 | 56.6 | 37.7 | $1 in / $5 out |
| 85 | o1-mini | o1-mini | text, inference | OpenAI | 61.3 | 25.7 | 61.3 | 0.0 | 0.0 | 30.1 | $3 in / $12 out |
| 86 | Llama 3.2 11B Instruct | llama-3.2-11b-instruct | multimodal, vision, multi-input reasoning | Meta | 60.3 | 4.0 | 60.3 | 0.0 | 0.0 | 94.9 | $0.05 in / $0.05 out |
| 87 | MiMo-V2-Omni | mimo-v2-omni | multimodal, vision, multi-input reasoning | Xiaomi | 58.6 | 0.0 | 58.6 | 0.0 | 54.4 | 44.8 | $0.4 in / $2 out |
| 88 | DeepSeek-V3.2 (Non-thinking) | deepseek-chat | text, inference | DeepSeek | 58.0 | 0.0 | 58.0 | 0.0 | 0.0 | 70.2 | $0.28 in / $0.42 out |
| 89 | DeepSeek-V3 | deepseek-v3 | code, programming, tool use | DeepSeek | 58.0 | 27.3 | 58.0 | 0.0 | 10.4 | 60.5 | $0.27 in / $1.1 out |
| 90 | GPT OSS 120B High | gpt-oss-120b-high | multimodal, vision, multi-input reasoning | OpenAI | 58.0 | 44.7 | 58.0 | 0.0 | 0.0 | 73.3 | $0.1 in / $0.5 out |
| 91 | Gemini 1.0 Pro | gemini-1.0-pro | multimodal, vision, multi-input reasoning | Google | 57.2 | 3.2 | 57.2 | 0.0 | 0.0 | 55.4 | $0.5 in / $1.5 out |
| 92 | Llama 4 Maverick | llama-4-maverick | multimodal, vision, multi-input reasoning | Meta | 55.8 | 35.4 | 55.8 | 0.0 | 0.0 | 57.1 | $0.17 in / $0.85 out |
| 93 | GPT-5.1 Thinking | gpt-5.1-thinking-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 55.6 | 64.9 | 55.6 | 0.0 | 56.2 | 27.0 | $1.25 in / $10 out |
| 94 | Mistral Small 4 | mistral-small-latest | multimodal, vision, multi-input reasoning | Mistral AI | 55.2 | 34.7 | 55.2 | 0.0 | 0.0 | 66.8 | $0.15 in / $0.6 out |
| 95 | Qwen3-Coder | qwen3-coder | text, inference | Alibaba Cloud / Qwen Team | 55.2 | 0.0 | 55.2 | 0.0 | 0.0 | 88.5 | $0.18 in / $0.18 out |
| 96 | Qwen3 Max | qwen3-max | code, programming, tool use | Alibaba Cloud / Qwen Team | 55.2 | 29.8 | 55.2 | 0.0 | 35.8 | 31.3 | $0.5 in / $5 out |
| 97 | GPT-4 | gpt-4-0613 | multimodal, vision, multi-input reasoning | OpenAI | 54.9 | 6.8 | 54.9 | 0.0 | 0.0 | 18.7 | $30 in / $60 out |
| 98 | DeepSeek-V3.2 | deepseek-v3.2 | code, programming, tool use | DeepSeek | 53.2 | 57.3 | 53.2 | 15.5 | 44.9 | 70.0 | $0.26 in / $0.38 out |
| 99 | GPT-4 Turbo | gpt-4-turbo-2024-04-09 | text, inference | OpenAI | 52.7 | 16.9 | 52.7 | 0.0 | 0.0 | 18.8 | $10 in / $30 out |
| 100 | GPT-5.3 Chat | gpt-5.3-chat-latest | multimodal, vision, multi-input reasoning | OpenAI | 52.7 | 0.0 | 52.7 | 0.0 | 0.0 | 26.5 | $1.75 in / $14 out |

Page 5 of 15 · 296 models

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
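
The price strings listed here, such as "$1.25 in / $10 out", can be turned into numbers for rough cost comparisons. The sketch below is illustrative only. The page does not state the pricing unit, so the code assumes USD per one million tokens; the names `Price`, `parse_price`, `workload_cost`, and `TOKENS_PER_UNIT` are hypothetical helpers, not part of any Skytells API.

```python
import re
from typing import NamedTuple


class Price(NamedTuple):
    """Input and output price in USD per pricing unit (unit is assumed below)."""
    input_usd: float
    output_usd: float


# Matches strings shaped like "$1.25 in / $10 out".
_PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")

# ASSUMPTION: prices are quoted per 1,000,000 tokens. The leaderboard page
# does not state the unit, so adjust this constant if that is wrong.
TOKENS_PER_UNIT = 1_000_000


def parse_price(text: str) -> Price:
    """Parse a leaderboard price string into a Price tuple."""
    match = _PRICE_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized price string: {text!r}")
    return Price(float(match.group(1)), float(match.group(2)))


def workload_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload under the assumed pricing unit."""
    total = price.input_usd * input_tokens + price.output_usd * output_tokens
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # Two price strings copied from the leaderboard rows above.
    for name, raw in [
        ("GPT-5.1 Medium", "$1.25 in / $10 out"),
        ("DeepSeek-V3.2", "$0.26 in / $0.38 out"),
    ]:
        cost = workload_cost(parse_price(raw), 2_000_000, 500_000)
        print(f"{name}: ${cost:.2f}")
```

Keeping the unit in a single constant means the estimate can be corrected in one place if the actual pricing unit turns out to differ.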