Every major AI model ranked across benchmark quality, inference speed, agentic capability, programming aptitude, and cost efficiency — updated continuously from published evaluation data.
Tracked models: 296
Providers: 27
Benchmarked: 253
Avg. index: 32.1
| Rank | Model | ID | Tags | Provider | Score (Inference) | Benchmarks | Inference | Agentic | Programming | Value | Price |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 101 | Grok-3 | grok-3 | multimodal, vision, multi-input reasoning | xAI | 52.7 | 59.3 | 52.7 | 0.0 | 0.0 | 22.7 | $3 in / $15 out |
| 102 | Grok-3 Mini | grok-3-mini | multimodal, vision, multi-input reasoning | xAI | 52.7 | 53.1 | 52.7 | 0.0 | 0.0 | 65.6 | $0.3 in / $0.5 out |
| 103 | LongCat-Flash-Chat | longcat-flash-chat | code, programming, tool use | Meituan | 52.7 | 27.9 | 52.7 | 49.2 | 39.1 | 57.9 | $0.3 in / $1.2 out |
| 104 | LongCat-Flash-Thinking-2601 | longcat-flash-thinking-2601 | code, programming, tool use | Meituan | 52.7 | 55.7 | 52.7 | 29.4 | 37.1 | 57.9 | $0.3 in / $1.2 out |
| 105 | Nova Micro | nova-micro | text, inference | Amazon | 52.7 | 9.1 | 52.7 | 0.0 | 0.0 | 91.3 | $0.03 in / $0.14 out |
| 106 | GLM-4.7 | glm-4.7 | multimodal, vision, multi-input reasoning | Zhipu AI | 52.2 | 62.4 | 52.2 | 27.6 | 43.8 | 40.7 | $0.6 in / $2.2 out |
| 107 | MiniMax M2.7 | minimax-m2.7 | code, programming, tool use | MiniMax | 52.2 | 0.0 | 52.2 | 44.9 | 40.1 | 54.9 | $0.3 in / $1.2 out |
| 108 | GPT-5.4 | gpt-5.4 | text, text-to-text, language | OpenAI | 51.5 | 75.9 | 51.5 | 61.8 | 63.9 | 18.3 | $2.5 in / $15 out |
| 109 | GPT-5.1 Codex | gpt-5.1-codex | multimodal, vision, multi-input reasoning | OpenAI | 49.0 | 0.0 | 49.0 | 0.0 | 50.0 | 25.1 | $1.25 in / $10 out |
| 110 | GPT-5.1 Codex High | gpt-5.1-codex-high | multimodal, vision, multi-input reasoning | OpenAI | 49.0 | 61.0 | 49.0 | 0.0 | 0.0 | 25.1 | $1.25 in / $10 out |
| 111 | GPT-5.2 Codex | gpt-5.2-codex | multimodal, vision, multi-input reasoning | OpenAI | 49.0 | 0.0 | 49.0 | 0.0 | 44.1 | 19.6 | $1.75 in / $14 out |
| 112 | GPT-5.3 Codex | gpt-5.3-codex | text, text-to-text, coding | OpenAI | 49.0 | 0.0 | 49.0 | 0.0 | 52.2 | 19.6 | $1.75 in / $14 out |
| 113 | Grok-4.1 Thinking | grok-4.1-thinking-2025-11-17 | multimodal, vision, multi-input reasoning | xAI | 48.5 | 0.0 | 48.5 | 0.0 | 0.0 | 17.8 | $3 in / $15 out |
| 114 | Grok Code Fast 1 | grok-code-fast-1 | code, programming, tool use | xAI | 47.7 | 0.0 | 47.7 | 0.0 | 38.8 | 49.7 | $0.2 in / $1.5 out |
| 115 | GPT-4o | gpt-4o-2024-08-06 | multimodal, vision, multi-input reasoning | OpenAI | 46.7 | 31.5 | 46.7 | 14.9 | 4.3 | 26.8 | $2.5 in / $10 out |
| 116 | DeepSeek-V2.5 | deepseek-v2.5 | code, programming, tool use | DeepSeek | 46.5 | 0.0 | 46.5 | 0.0 | 0.9 | 79.7 | $0.14 in / $0.28 out |
| 117 | GLM-5.1 | glm-5.1 | code, programming, tool use | Zhipu AI | 46.1 | 66.8 | 46.1 | 51.5 | 60.2 | 30.2 | $1.4 in / $4.4 out |
| 118 | Kimi K2 Instruct | kimi-k2-instruct | code, programming, tool use | Moonshot AI | 46.1 | 24.4 | 46.1 | 14.8 | 15.3 | 62.1 | $0.5 in / $0.5 out |
| 119 | GPT-4o | gpt-4o-2024-05-13 | multimodal, vision, multi-input reasoning | OpenAI | 45.4 | 22.3 | 45.4 | 0.0 | 0.0 | 26.5 | $2.5 in / $10 out |
| 120 | GPT-4o mini | gpt-4o-mini-2024-07-18 | multimodal, vision, multi-input reasoning | OpenAI | 45.4 | 14.8 | 45.4 | 0.0 | 0.0 | 65.1 | $0.15 in / $0.6 out |

Rankings are based on multi-dimensional evaluation across benchmark quality, inference efficiency, and cost-per-output. Scores are updated continuously and may differ from individual third-party benchmarks.
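
The price strings on this page (for example `$3 in / $15 out`) can be turned into numbers for cost comparisons. The following is a minimal sketch, not the leaderboard's own method: it assumes the listed prices are USD per one million tokens (the page does not state the unit), and `Price`, `parse_price`, and `workload_cost` are hypothetical helper names introduced only for illustration.

```python
import re
from typing import NamedTuple, Optional


class Price(NamedTuple):
    """Input and output price as listed on the page."""
    input_usd: float
    output_usd: float


# Matches strings shaped like "$0.3 in / $1.2 out".
_PRICE_RE = re.compile(
    r"\$\s*([0-9]*\.?[0-9]+)\s*in\s*/\s*\$\s*([0-9]*\.?[0-9]+)\s*out"
)


def parse_price(cell: str) -> Optional[Price]:
    """Parse a price cell; return None when the cell is empty or malformed."""
    match = _PRICE_RE.search(cell)
    if match is None:
        return None
    return Price(float(match.group(1)), float(match.group(2)))


def workload_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a workload.

    Assumption: prices are quoted per one million tokens.
    """
    total = price.input_usd * input_tokens + price.output_usd * output_tokens
    return total / 1_000_000


if __name__ == "__main__":
    # Compare two listed models on the same hypothetical workload.
    for name, cell in [
        ("Grok-3", "$3 in / $15 out"),
        ("Nova Micro", "$0.03 in / $0.14 out"),
    ]:
        price = parse_price(cell)
        if price is not None:
            print(name, workload_cost(price, 2_000_000, 500_000))
```

Returning `None` for an unparseable cell keeps rows without a listed price distinct from rows that are genuinely free, so they are not silently treated as costing zero.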