Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
**294** tracked models · **27** providers · **251** benchmarked · **34.7** avg. index
| Rank | Model | Provider | Score | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|
| 61 | Gemini 1.5 Flash (`gemini-1.5-flash`; multimodal, vision, multi-input reasoning) | Google | 52.6 | 23.2 | 92.1 | 0.0 | 0.0 | 71.9 | $0.15 in / $0.6 out |
| 62 | Grok-4 (`grok-4`; multimodal, vision, multi-input reasoning) | xAI | 52.2 | 52.2 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
| 63 | Claude Sonnet 4.6 (`claude-sonnet-4-6`; multimodal, vision, multi-input reasoning) | Anthropic | 51.7 | 66.1 | 30.1 | 49.6 | 68.9 | 13.2 | $3 in / $15 out |
| 64 | MiMo-V2-Flash (`mimo-v2-flash`; code, programming, tool use) | Xiaomi | 51.5 | 53.7 | 79.8 | 27.2 | 39.3 | 85.9 | $0.1 in / $0.3 out |
| 65 | Qwen3 VL 235B A22B Instruct (`qwen3-vl-235b-a22b-instruct`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 50.8 | 37.1 | 66.8 | 56.7 | 0.0 | 49.4 | $0.3 in / $1.5 out |
| 66 | GPT-5.1 Codex High (`gpt-5.1-codex-high`; multimodal, vision, multi-input reasoning) | OpenAI | 50.7 | 61.0 | 48.6 | 0.0 | 0.0 | 25.1 | $1.25 in / $10 out |
| 67 | Grok-3 (`grok-3`; multimodal, vision, multi-input reasoning) | xAI | 50.4 | 59.5 | 51.9 | 0.0 | 0.0 | 22.6 | $3 in / $15 out |
| 68 | Kimi K2 0905 (`kimi-k2-0905`; text, inference) | Moonshot AI | 50.2 | 44.4 | 66.8 | 0.0 | 0.0 | 40.0 | $0.6 in / $2.5 out |
| 69 | Qwen3.5-35B-A3B (`qwen3.5-35b-a3b`; multimodal, vision, multi-input reasoning) | Alibaba Cloud / Qwen Team | 49.5 | 57.2 | 66.8 | 44.3 | 34.4 | 46.4 | $0.25 in / $2 out |
| 70 | GPT-5 (`gpt-5-2025-08-07`; multimodal, vision, multi-input reasoning) | OpenAI | 49.1 | 64.4 | 0.0 | 29.0 | 51.7 | 0.0 | N/A |
| 71 | Gemini 1.5 Flash 8B (`gemini-1.5-flash-8b`; multimodal, vision, multi-input reasoning) | Google | 49.1 | 10.4 | 92.1 | 0.0 | 0.0 | 88.3 | $0.07 in / $0.3 out |
| 72 | Claude Opus 4.1 (`claude-opus-4-1-20250805`; multimodal, vision, multi-input reasoning) | Anthropic | 48.8 | 48.1 | 30.1 | 66.8 | 62.9 | 7.0 | $15 in / $75 out |
| 73 | Claude Opus 4.5 (`claude-opus-4-5-20251101`; multimodal, vision, multi-input reasoning) | Anthropic | 48.6 | 56.3 | 30.1 | 44.2 | 74.2 | 10.6 | $5 in / $25 out |
| 74 | GPT-5 mini (`gpt-5-mini-2025-08-07`; multimodal, vision, multi-input reasoning) | OpenAI | 48.6 | 41.9 | 89.7 | 0.0 | 23.7 | 56.3 | $0.25 in / $2 out |
| 75 | Claude Haiku 4.5 (`claude-haiku-4-5-20251001`; multimodal, vision, multi-input reasoning) | Anthropic | 48.4 | 32.9 | 61.2 | 54.2 | 57.2 | 37.7 | $1 in / $5 out |
| 76 | GPT-5.2 Pro (`gpt-5.2-pro-2025-12-11`; multimodal, vision, multi-input reasoning) | OpenAI | 48.2 | 67.3 | 31.3 | 56.4 | 0.0 | 2.5 | $21 in / $168 out |
| 77 | Grok 4 Fast (`grok-4-fast`; multimodal, vision, multi-input reasoning) | xAI | 48.2 | 58.0 | 68.2 | 15.4 | 0.0 | 67.2 | $0.2 in / $0.5 out |
| 78 | Grok-4.1 (`grok-4.1-2025-11-17`; multimodal, vision, multi-input reasoning) | xAI | 48.2 | 0.0 | 64.2 | 0.0 | 0.0 | 22.6 | $3 in / $15 out |
| 79 | Claude Opus 4 (`claude-opus-4-20250514`; multimodal, vision, multi-input reasoning) | Anthropic | 47.8 | 37.8 | 0.0 | 57.9 | 49.5 | 0.0 | N/A |
| 80 | o1-pro (`o1-pro`; multimodal, vision, multi-input reasoning) | OpenAI | 47.5 | 47.5 | 0.0 | 0.0 | 0.0 | 0.0 | N/A |
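The Price column can be turned into a per-request cost estimate. A minimal sketch, assuming the "in / out" figures are USD per million tokens (a common convention, but not stated on this page; the function name and token counts are illustrative):

```python
# Hedged sketch: estimating one request's cost from a row's "in / out" prices.
# Assumption (not stated by the leaderboard): prices are USD per million tokens.

def request_cost(price_in: float, price_out: float,
                 in_tokens: int, out_tokens: int) -> float:
    """Cost in USD for a single request under per-million-token pricing."""
    return price_in * in_tokens / 1e6 + price_out * out_tokens / 1e6

# Gemini 1.5 Flash ($0.15 in / $0.6 out) on a 10k-input / 1k-output request:
cost = request_cost(0.15, 0.60, 10_000, 1_000)
print(f"${cost:.4f}")  # prints "$0.0021"
```

Under the same assumption, the spread in the table is stark: the identical request on GPT-5.2 Pro ($21 in / $168 out) would cost roughly 180× more.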
Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
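How those dimensions combine into one index is not published. A minimal sketch of one plausible scheme, assuming a weighted mean that skips dimensions without data (the equal weights and the zero-means-missing convention are both hypothetical):

```python
# Hedged sketch: combining per-dimension scores into a single overall index.
# The weights and the treatment of 0.0 as "no data" are hypothetical; the
# leaderboard does not publish its actual formula.

DIMENSIONS = ("benchmarks", "inference", "agentic", "programming", "value")

def overall_score(scores, weights=None):
    """Weighted mean over the dimensions that have data (score > 0)."""
    weights = weights or {d: 1.0 for d in DIMENSIONS}
    scored = [(scores[d], weights[d]) for d in DIMENSIONS
              if scores.get(d, 0.0) > 0.0]
    if not scored:
        return 0.0
    total_weight = sum(w for _, w in scored)
    return sum(s * w for s, w in scored) / total_weight

# Example row: MiMo-V2-Flash from the table above.
mimo = {"benchmarks": 53.7, "inference": 79.8, "agentic": 27.2,
        "programming": 39.3, "value": 85.9}
print(round(overall_score(mimo), 1))
```

With equal weights this yields 57.2 for MiMo-V2-Flash, not the table's 51.5, so the real index evidently weights the dimensions unevenly or handles missing data differently.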