Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
- **294** tracked models
- **27** providers
- **251** benchmarked
- **27.4** avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|
| 61 | o4-mini (`o4-mini`) · multimodal · vision · multi-input · reasoning | OpenAI | 48.8 | 48.8 | 70.7 | 38.2 | 32.7 | 41.9 | $1.1 / $4.4 |
| 62 | Claude Opus 4.1 (`claude-opus-4-1-20250805`) · multimodal · vision · multi-input · reasoning | Anthropic | 48.1 | 48.1 | 30.1 | 66.8 | 62.9 | 7.0 | $15 / $75 |
| 63 | o1-pro (`o1-pro`) · multimodal · vision · multi-input · reasoning | OpenAI | 47.5 | 47.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 64 | Step3-VL-10B (`step3-vl-10b`) · multimodal · vision · multi-input · reasoning | StepFun | 47.4 | 47.4 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 65 | GLM-4.6 (`glm-4.6`) · multimodal · vision · multi-input · reasoning | Zhipu AI | 47.0 | 47.0 | 34.9 | 37.7 | 46.1 | 42.8 | $0.55 / $2.19 |
| 66 | Qwen3-235B-A22B-Thinking-2507 (`qwen3-235b-a22b-thinking-2507`) · text · inference | Alibaba Cloud / Qwen Team | 46.9 | 46.9 | 66.8 | 26.8 | 0.0 | 39.4 | $0.3 / $3 |
| 67 | Gemini 2.0 Flash Thinking (`gemini-2.0-flash-thinking`) · multimodal · vision · multi-input · reasoning | Google | 46.7 | 46.7 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 68 | Sarvam-30B (`sarvam-30b`) · code · programming · tool use | Sarvam AI | 46.5 | 46.5 | 0.0 | 8.5 | 5.3 | 0.0 | N/A |
| 69 | o3 (`o3-2025-04-16`) · multimodal · vision · multi-input · reasoning | OpenAI | 46.2 | 46.2 | 38.4 | 20.5 | 30.7 | 27.7 | $2 / $8 |
| 70 | GPT-5.4 nano (`gpt-5.4-nano`) · multimodal · vision · multi-input · reasoning | OpenAI | 46.1 | 46.1 | 77.4 | 11.0 | 11.2 | 57.2 | $0.2 / $1.25 |
| 71 | Nemotron 3 Nano (30B A3B) (`nemotron-3-nano-30b-a3b`) · code · programming · tool use | NVIDIA | 45.8 | 45.8 | 66.8 | 3.3 | 4.4 | 90.8 | $0.06 / $0.24 |
| 72 | GPT OSS 120B High (`gpt-oss-120b-high`) · multimodal · vision · multi-input · reasoning | OpenAI | 44.9 | 44.9 | 57.3 | 0.0 | 0.0 | 73.2 | $0.1 / $0.5 |
| 73 | Qwen3-Next-80B-A3B-Thinking (`qwen3-next-80b-a3b-thinking`) · text · inference | Alibaba Cloud / Qwen Team | 44.9 | 44.9 | 6.1 | 41.7 | 0.0 | 51.9 | $0.15 / $1.5 |
| 74 | Gemini 2.5 Pro (`gemini-2.5-pro`) · multimodal · vision · multi-input · reasoning | Google | 44.6 | 44.6 | 63.2 | 0.0 | 25.6 | 27.9 | $1.25 / $10 |
| 75 | Mercury 2 (`mercury-2`) · code · programming · tool use | Inception | 44.6 | 44.6 | 72.5 | 0.0 | 22.3 | 69.2 | $0.25 / $0.75 |
| 76 | Qwen3 VL 32B Thinking (`qwen3-vl-32b-thinking`) · multimodal · vision · multi-input · reasoning | Alibaba Cloud / Qwen Team | 44.6 | 44.6 | 0.0 | 34.6 | 0.0 | 0.0 | N/A |
| 77 | Kimi K2 0905 (`kimi-k2-0905`) · text · inference | Moonshot AI | 44.4 | 44.4 | 66.8 | 0.0 | 0.0 | 40.0 | $0.6 / $2.5 |
| 78 | Claude 3.7 Sonnet (`claude-3-7-sonnet-20250219`) · multimodal · vision · multi-input · reasoning | Anthropic | 43.7 | 43.7 | 30.1 | 49.0 | 40.1 | 13.2 | $3 / $15 |
| 79 | Gemma 4 26B-A4B (`gemma-4-26b-a4b-it`) · multimodal · vision · multi-input · reasoning | Google | 43.7 | 43.7 | 66.8 | 0.0 | 0.0 | 77.8 | $0.13 / $0.4 |
| 80 | K-EXAONE-236B-A23B (`k-exaone-236b-a23b`) · multimodal · vision · multi-input · reasoning | LG AI Research | 43.4 | 43.4 | 24.2 | 0.0 | 0.0 | 49.1 | $0.6 / $1 |
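The price column lists separate input and output rates, presumably per million tokens (an assumption; the page does not state the unit). To compare models on a single number, a common convention is to blend the two rates under an assumed output-to-input token ratio. A minimal sketch:

```python
def blended_price(in_price: float, out_price: float,
                  out_ratio: float = 3.0) -> float:
    """Blend per-token input/output prices into one rate.

    Assumes `out_ratio` output tokens per input token -- a
    hypothetical mix, not something the leaderboard publishes.
    """
    return (in_price + out_ratio * out_price) / (1.0 + out_ratio)

# o4-mini at $1.1 in / $4.4 out, 3:1 output:input mix:
print(blended_price(1.1, 4.4))  # ≈ 3.58
```

The ratio is workload-dependent: chat-style use tends to be output-heavy, while retrieval or classification workloads skew toward input tokens, so adjust `out_ratio` to match your traffic.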
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
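The page does not publish how the per-dimension scores are combined into a ranking. As a rough illustration of how such a multi-dimensional composite could be formed, here is a sketch with made-up weights (the names and weights are assumptions, not the leaderboard's actual scheme):

```python
# Hypothetical weights -- the leaderboard's real scheme is not published.
WEIGHTS = {
    "benchmarks": 0.40,
    "inference": 0.15,
    "agentic": 0.15,
    "programming": 0.15,
    "value": 0.15,
}

def composite_index(scores: dict[str, float]) -> float:
    """Weighted sum over the five dimensions; missing ones count as 0."""
    return sum(w * scores.get(dim, 0.0) for dim, w in WEIGHTS.items())

# GLM-4.6's row from the table above:
glm46 = {"benchmarks": 47.0, "inference": 34.9, "agentic": 37.7,
         "programming": 46.1, "value": 42.8}
print(composite_index(glm46))  # ≈ 43.0 under these assumed weights
```

Note that treating a missing dimension as 0 (as several rows here do) penalizes unevaluated models; a real scheme might instead renormalize over the dimensions that were actually measured.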