Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.

| Tracked models | Providers | Benchmarked | Avg. index |
|---|---|---|---|
| 294 | 27 | 251 | 27.4 |


| Rank | Model | Model ID | Tags | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 41 | Qwen3.5-35B-A3B | qwen3.5-35b-a3b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 57.2 | 57.2 | 66.8 | 44.3 | 34.4 | 46.4 | $0.25 in / $2 out |
| 42 | GPT-5 Medium | gpt-5-medium-2025-08-07 | multimodal, vision, multi-input reasoning | OpenAI | 56.9 | 56.9 | 61.6 | 0.0 | 0.0 | 29.0 | $1.25 in / $10 out |
| 43 | ChatGPT-4o Latest | chatgpt-4o-latest | multimodal, vision, multi-input reasoning | OpenAI | 56.6 | 56.6 | 63.5 | 0.0 | 0.0 | 32.0 | $2.5 in / $10 out |
| 44 | Gemma 4 31B | gemma-4-31b-it | multimodal, vision, multi-input reasoning | Google | 56.5 | 56.5 | 66.8 | 0.0 | 0.0 | 76.7 | $0.14 in / $0.4 out |
| 45 | Claude Opus 4.5 | claude-opus-4-5-20251101 | multimodal, vision, multi-input reasoning | Anthropic | 56.3 | 56.3 | 30.1 | 44.2 | 74.2 | 10.6 | $5 in / $25 out |
| 46 | Gemini 3.1 Flash-Lite | gemini-3.1-flash-lite-preview | multimodal, vision, multi-input reasoning | Google | 56.3 | 56.3 | 84.9 | 0.0 | 0.0 | 50.6 | $0.25 in / $1.5 out |
| 47 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 56.3 | 56.3 | 51.9 | 30.8 | 38.0 | 57.7 | $0.3 in / $1.2 out |
| 48 | Qwen3.6-35B-A3B | qwen3.6-35b-a3b | multimodal, vision, multi-input reasoning | Alibaba Cloud / Qwen Team | 55.7 | 55.7 | 0.0 | 17.7 | 26.6 | 0.0 | N/A |
| 49 | DeepSeek-V3.2-Speciale | deepseek-v3.2-speciale | code, programming, tool use | DeepSeek | 54.5 | 54.5 | 0.0 | 9.7 | 45.9 | 0.0 | N/A |
| 50 | GPT OSS 20B High | gpt-oss-20b-high | text, inference | OpenAI | 53.9 | 53.9 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 51 | MiMo-V2-Flash | mimo-v2-flash | code, programming, tool use | Xiaomi | 53.7 | 53.7 | 79.8 | 27.2 | 39.3 | 85.9 | $0.1 in / $0.3 out |
| 52 | Grok-3 Mini | grok-3-mini | multimodal, vision, multi-input reasoning | xAI | 53.4 | 53.4 | 51.9 | 0.0 | 0.0 | 65.0 | $0.3 in / $0.5 out |
| 53 | Claude Sonnet 4.5 | claude-sonnet-4-5-20250929 | multimodal, vision, multi-input reasoning | Anthropic | 53.3 | 53.3 | 30.1 | 71.8 | 74.6 | 13.2 | $3 in / $15 out |
| 54 | DeepSeek-V3.2 (Thinking) | deepseek-reasoner | code, programming, tool use | DeepSeek | 53.1 | 53.1 | 0.0 | 16.6 | 45.9 | 0.0 | N/A |
| 55 | DeepSeek-V3.2-Exp | deepseek-v3.2-exp | code, programming, tool use | DeepSeek | 52.7 | 52.7 | 0.0 | 28.8 | 40.5 | 0.0 | N/A |
| 56 | Grok-4 | grok-4 | multimodal, vision, multi-input reasoning | xAI | 52.2 | 52.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 57 | Gemini 2.5 Pro Preview 06-05 | gemini-2.5-pro-preview-06-05 | multimodal, vision, multi-input reasoning | Google | 51.7 | 51.7 | 63.2 | 0.0 | 30.0 | 27.9 | $1.25 in / $10 out |
| 58 | DeepSeek-R1-0528 | deepseek-r1-0528 | code, programming, tool use | DeepSeek | 50.4 | 50.4 | 14.4 | 0.0 | 6.8 | 35.0 | $0.55 in / $2.19 out |
| 59 | LongCat-Flash-Thinking | longcat-flash-thinking | code, programming, tool use | Meituan | 50.4 | 50.4 | 0.0 | 0.0 | 22.1 | 0.0 | N/A |
| 60 | Nemotron 3 Super (120B A12B) | nemotron-3-super-120b-a12b | code, programming, tool use | NVIDIA | 48.9 | 48.9 | 0.0 | 8.9 | 27.0 | 0.0 | N/A |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
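
The Price column quotes separate input and output prices (for example `$0.25 in / $2 out`), so comparing models on cost requires some assumption about the workload. As a rough sketch, the Python below parses a price cell and estimates the cost of one workload. The function names `parse_price` and `estimate_cost` are hypothetical, and the per-million-token unit is an assumption: the leaderboard does not state what quantity the prices are quoted for.

```python
import re
from typing import Optional, Tuple

# Matches price cells in the leaderboard's format, e.g. "$0.25 in / $2 out".
PRICE_RE = re.compile(r"^\$(\d+(?:\.\d+)?) in / \$(\d+(?:\.\d+)?) out$")


def parse_price(cell: str) -> Optional[Tuple[float, float]]:
    """Return (input_price, output_price) in USD, or None for "N/A" or empty cells."""
    match = PRICE_RE.match(cell.strip())
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def estimate_cost(
    cell: str,
    input_tokens: int,
    output_tokens: int,
    tokens_per_unit: int = 1_000_000,
) -> Optional[float]:
    """Estimate the USD cost of a workload from a price cell.

    ASSUMPTION: prices are quoted per `tokens_per_unit` tokens (1M by default).
    The leaderboard itself does not state the unit, so adjust as needed.
    """
    price = parse_price(cell)
    if price is None:
        return None  # no published price, so no estimate
    input_price, output_price = price
    total = input_tokens * input_price + output_tokens * output_price
    return total / tokens_per_unit
```

Under the per-million assumption, `estimate_cost("$0.25 in / $2 out", 1_000_000, 1_000_000)` evaluates to `2.25`, while an `N/A` cell yields `None` rather than a misleading zero.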