Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
309 tracked models · 27 providers · 264 benchmarked · avg. index 29.3
| Rank | Model | Model ID | Tags | Provider | Overall score | Benchmarks | Inference | Agentic | Programming | Value | Price (in / out) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | Gemma 4 31B | gemma-4-31b-it | multimodal, vision, multi-input reasoning | Google | 57.4 | 54.9 | 41.1 | 0.0 | 0.0 | 90.5 | $0.14 in / $0.4 out |
| 42 | MiniMax M3 | minimax-m3 | multimodal, vision, multi-input reasoning | MiniMax | 57.3 | 54.6 | 72.2 | 38.7 | 74.3 | 48.1 | $0.6 in / $2.4 out |
| 43 | Claude Opus 4.5 | claude-opus-4-5-20251101 | multimodal, vision, multi-input reasoning | Anthropic | 56.2 | 55.3 | 0.0 | 41.4 | 73.5 | 0.0 | N/A |
| 44 | GPT-5 Medium | gpt-5-medium-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 56.0 | 56.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 45 | GPT-5.4 | gpt-5.4 | text, text-to-text, language | OpenAI | 55.5 | 75.3 | 38.9 | 56.2 | 60.6 | 14.1 | $2.5 in / $15 out |
| 46 | ChatGPT-4o Latest | chatgpt-4o-latest | multimodal, vision, multi-input reasoning | OpenAI | 54.9 | 54.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 47 | GLM-5V-Turbo | glm-5v-turbo | multimodal, vision, multi-input reasoning | Zhipu AI | 54.9 | 0.0 | 0.0 | 54.9 | 0.0 | 0.0 | N/A |
| 48 | GPT OSS 120B High | gpt-oss-120b-high | multimodal, vision, multi-input reasoning | OpenAI | 54.1 | 44.2 | 53.0 | 0.0 | 0.0 | 83.3 | $0.1 in / $0.5 out |
| 49 | Kimi K2.5 | kimi-k2.5 | multimodal, vision, multi-input reasoning | Moonshot AI | 54.0 | 67.2 | 0.0 | 47.3 | 44.6 | 0.0 | N/A |
| 50 | Seed 2.0 Lite | seed-2.0-lite | multimodal, vision, multi-input reasoning | ByteDance | 53.3 | 57.6 | 0.0 | 0.0 | 47.8 | 0.0 | N/A |
| 51 | GPT OSS 20B High | gpt-oss-20b-high | text, inference | OpenAI | 53.1 | 53.1 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 52 | MiniMax M2.1 | minimax-m2.1 | code, programming, tool use | MiniMax | 53.1 | 40.8 | 72.2 | 52.1 | 48.7 | 68.6 | $0.3 in / $1.2 out |
| 53 | MiMo-V2-Omni | mimo-v2-omni | multimodal, vision, multi-input reasoning | Xiaomi | 53.0 | 0.0 | 0.0 | 0.0 | 53.0 | 0.0 | N/A |
| 54 | GPT-5.1 Medium | gpt-5.1-medium-2025-11-12 | multimodal, vision, multi-input reasoning | OpenAI | 52.7 | 64.0 | 48.4 | 0.0 | 0.0 | 27.6 | $1.25 in / $10 out |
| 55 | Grok-3 Mini | grok-3-mini | multimodal, vision, multi-input reasoning | xAI | 52.0 | 52.0 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 56 | GPT-5 Codex | gpt-5-codex-2025-09-15 | code, programming, tool use | OpenAI | 51.8 | 0.0 | 0.0 | 0.0 | 51.8 | 0.0 | N/A |
| 57 | Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | multimodal, vision, multi-input reasoning | Anthropic | 51.5 | 51.9 | 14.6 | 71.8 | 74.6 | 9.3 | $3 in / $15 out |
| 58 | Gemma 4 26B-A4B | gemma-4-26b-a4b-it | multimodal, vision, multi-input reasoning | Google | 51.5 | 42.3 | 41.1 | 0.0 | 0.0 | 93.7 | $0.13 in / $0.4 out |
| 59 | Nova 2 Pro | nova-2-pro | multimodal, vision, multi-input reasoning | Amazon | 51.3 | 46.8 | 0.0 | 57.2 | 50.6 | 0.0 | N/A |
| 60 | Grok-4 | grok-4 | multimodal, vision, multi-input reasoning | xAI | 50.5 | 50.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
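The page does not publish how the per-dimension columns are combined into the overall score. The sketch below is one plausible reading, not the leaderboard's actual method: a weighted mean over the Benchmarks, Inference, Agentic, Programming, and Value columns that skips any dimension scored 0.0. Both the equal `WEIGHTS` and the "0.0 means no data" rule are assumptions made for illustration, and `composite_score` is a hypothetical helper name.

```python
from typing import Dict, Optional

# Hypothetical equal weights. The leaderboard's real weighting is not stated.
WEIGHTS: Dict[str, float] = {
    "benchmarks": 1.0,
    "inference": 1.0,
    "agentic": 1.0,
    "programming": 1.0,
    "value": 1.0,
}


def composite_score(dimensions: Dict[str, float]) -> Optional[float]:
    """Weighted mean over the dimensions that have a nonzero score.

    Assumption: a 0.0 entry means "no data" rather than a measured zero,
    so it is left out of both the numerator and the denominator.
    """
    total = 0.0
    weight_sum = 0.0
    for name, weight in WEIGHTS.items():
        score = dimensions.get(name, 0.0)
        if score > 0.0:
            total += weight * score
            weight_sum += weight
    if weight_sum == 0.0:
        return None  # no dimension has data, so no score can be formed
    return round(total / weight_sum, 1)


# GPT-5 Medium row from the table: only the Benchmarks dimension has data.
gpt5_medium = {
    "benchmarks": 56.0,
    "inference": 0.0,
    "agentic": 0.0,
    "programming": 0.0,
    "value": 0.0,
}
print(composite_score(gpt5_medium))  # 56.0
```

With a single populated dimension the result equals that dimension regardless of the weights, which agrees with rows such as GPT-5 Medium and GLM-5V-Turbo. Rows with several populated dimensions will generally not reproduce the listed overall score under equal weights, so the weights would need to be fitted or obtained from the leaderboard's own documentation.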