---
title: "RankedAGI - AI Models Ranked by Latest Benchmarks"
description: "Compare benchmarks across different AI models, including coding, reasoning, agentic, math, multimodal, context, cost, and release metadata."
source: "https://rankedagi.com"
---

# RankedAGI Leaderboard

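RankedAGI ranks AI models using public benchmark evidence and RankedAGI's own composite scores. Rows are ordered by the Overall score, and a blank cell means no value is listed for that model in that column. The HTML page is interactive; this Markdown version is a compact, agent-readable copy of the leaderboard.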

| Rank | Model | Organization | Size (params) | Version | Overall | Coding | Reasoning | Agentic | Math | Model page |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Claude Mythos | Anthropic |  | Preview | 78.9% | 95.2% | 89.5% | 93.5% | 71.5% | https://rankedagi.com/models/claude-mythos-preview |
| 2 | GPT 5.5 Pro | OpenAI |  | Latest | 75.4% | 91.9% | 82.8% | 90.2% | 61.1% | https://rankedagi.com/models/gpt-5-5-pro |
| 3 | GPT 5.4 Pro | OpenAI |  | Latest | 74.4% | 89.4% | 81.6% | 89.1% | 61.4% | https://rankedagi.com/models/gpt-5-4-pro |
| 4 | Gemini 3 Deep Think | Google |  | Latest | 74.2% | 87.1% | 86.0% | 83.4% | 71.2% | https://rankedagi.com/models/gemini-3-deep-think |
| 5 | GPT 5.5 | OpenAI |  | Latest | 73.1% | 85.5% | 83.4% | 81.6% | 72.6% | https://rankedagi.com/models/gpt-5-5 |
| 6 | GPT 5.4 | OpenAI |  | Latest | 70.7% | 82.9% | 77.4% | 77.8% | 76.5% | https://rankedagi.com/models/gpt-5-4 |
| 7 | Claude Opus 4.7 | Anthropic |  | Latest | 70.6% | 87.4% | 70.7% | 80.4% | 72.4% | https://rankedagi.com/models/claude-opus-4-7 |
| 8 | MiMo V2.5 Pro | Xiaomi |  | Latest | 69.5% | 82.3% | 72.1% | 79.3% | 71.5% | https://rankedagi.com/models/mimo-v2-5-pro |
| 9 | Claude Opus 4.6 | Anthropic |  | Latest | 68.1% | 80.0% | 76.8% | 68.8% | 76.8% | https://rankedagi.com/models/claude-opus-4-6 |
| 10 | Kimi K2.6 | Moonshot |  | Latest | 67.7% | 78.1% | 76.6% | 69.3% | 76.7% | https://rankedagi.com/models/kimi-k2-6 |
| 11 | Gemini 3.1 Pro | Google |  | Latest | 67.7% | 81.4% | 77.5% | 66.8% | 72.6% | https://rankedagi.com/models/gemini-3-1-pro |
| 12 | DeepSeek V4 Pro | DeepSeek | 1.6T | Latest | 67.5% | 78.8% | 75.6% | 69.7% | 72.8% | https://rankedagi.com/models/deepseek-v4-pro |
| 13 | Grok 4 Heavy | xAI |  | Latest | 67.4% | 73.7% | 76.7% | 72.8% | 76.5% | https://rankedagi.com/models/grok-4-heavy |
| 14 | Muse Spark | Meta |  | Latest | 67.3% | 80.8% | 69.9% | 73.1% | 72.6% | https://rankedagi.com/models/muse-spark |
| 15 | Claude Sonnet 4.6 | Anthropic |  | Latest | 67.1% | 81.0% | 74.3% | 67.1% | 73.0% | https://rankedagi.com/models/claude-sonnet-4-6 |
| 16 | GLM 5.1 | Z.ai | 754B | Latest | 66.7% | 77.4% | 72.4% | 69.5% | 76.5% | https://rankedagi.com/models/glm-5-1-754b |
| 17 | MiMo V2.5 | Xiaomi |  | Latest | 66.4% | 77.9% | 69.3% | 75.2% | 62.3% | https://rankedagi.com/models/mimo-v2-5 |
| 18 | GPT 5.3 Codex | OpenAI |  | Latest | 66.3% | 74.6% | 73.7% | 71.7% | 70.2% | https://rankedagi.com/models/gpt-5-3-codex |
| 19 | GPT 5.2 Pro | OpenAI |  | Latest | 66.2% | 76.2% | 69.9% | 71.3% | 76.5% | https://rankedagi.com/models/gpt-5-2-pro |
| 20 | Qwen 3.6 Plus | Alibaba |  | Latest | 65.7% | 77.7% | 69.7% | 68.1% | 76.4% | https://rankedagi.com/models/qwen-3-6-plus |
| 21 | Qwen 3.6 Max | Alibaba |  | Preview | 65.6% | 77.2% | 68.3% | 73.3% | 61.9% | https://rankedagi.com/models/qwen-3-6-max-preview |
| 22 | Grok 4.3 | xAI |  | Latest | 65.2% | 76.8% | 71.8% | 65.1% | 72.0% | https://rankedagi.com/models/grok-4-3 |
| 23 | Gemini 3 Pro | Google |  | Preview | 65.0% | 76.2% | 72.7% | 63.8% | 73.0% | https://rankedagi.com/models/gemini-3-pro-preview |
| 24 | DeepSeek V4 Flash | DeepSeek | 284B | Latest | 64.5% | 73.8% | 71.3% | 65.3% | 72.7% | https://rankedagi.com/models/deepseek-v4-flash |
| 25 | GLM 5 | Z.ai | 744B | Latest | 64.3% | 73.2% | 70.3% | 64.7% | 76.0% | https://rankedagi.com/models/glm-5-744b |
| 26 | Qwen 3.6 | Alibaba | 27B | Latest | 63.9% | 73.2% | 65.1% | 69.0% | 74.9% | https://rankedagi.com/models/qwen-3-6-27b |
| 27 | GPT 5.2 | OpenAI |  | Latest | 63.5% | 72.9% | 69.7% | 61.4% | 77.2% | https://rankedagi.com/models/gpt-5-2 |
| 28 | Claude Opus 4.5 | Anthropic |  | Latest | 63.0% | 77.6% | 65.0% | 60.1% | 73.5% | https://rankedagi.com/models/claude-opus-4-5 |
| 29 | GPT-5 Pro | OpenAI |  | Latest | 62.9% | 68.5% | 68.6% | 64.9% | 76.1% | https://rankedagi.com/models/gpt-5-pro |
| 30 | Qwen 3.5 A17B | Alibaba | 397B | Latest | 62.5% | 71.4% | 66.5% | 63.0% | 75.3% | https://rankedagi.com/models/qwen-3-5-a17b-397b |
| 31 | Grok 4.2 | xAI |  | Latest | 62.5% | 70.6% | 69.6% | 60.3% | 72.2% | https://rankedagi.com/models/grok-4-20 |
| 32 | MiniMax M2.7 | MiniMax |  | Latest | 61.9% | 68.8% | 64.0% | 66.9% | 68.3% | https://rankedagi.com/models/minimax-m2-7 |
| 33 | MiMo V2 Pro | Xiaomi |  | Latest | 61.9% | 70.3% | 64.9% | 62.5% | 72.2% | https://rankedagi.com/models/mimo-v2-pro |
| 34 | Kimi K2.5 | Moonshot | 1T | Latest | 61.6% | 69.8% | 70.4% | 56.0% | 76.8% | https://rankedagi.com/models/kimi-k2-5 |
| 35 | Gemini 2.5 Deep Think | Google |  | Latest | 61.6% | 71.9% | 70.0% | 54.3% | 75.6% | https://rankedagi.com/models/gemini-2.5-deep-think |
| 36 | GPT-5 Thinking | OpenAI |  | Latest | 61.3% | 69.9% | 63.6% | 60.2% | 77.2% | https://rankedagi.com/models/gpt-5-thinking |
| 37 | GPT 5.4 mini | OpenAI |  | Latest | 61.1% | 68.7% | 65.5% | 59.7% | 72.5% | https://rankedagi.com/models/gpt-5-4-mini |
| 38 | DeepSeek V3.2 Speciale | DeepSeek | 685B | Latest | 60.8% | 66.2% | 67.3% | 59.7% | 75.7% | https://rankedagi.com/models/deepseek-v3-2-speciale |
| 39 | Qwen 3.6 A3B | Alibaba | 35B | Latest | 59.7% | 67.4% | 60.6% | 60.3% | 73.7% | https://rankedagi.com/models/qwen-3-6-a3b-35b |
| 40 | Grok 4 | xAI |  | Latest | 59.7% | 61.6% | 71.4% | 52.8% | 79.6% | https://rankedagi.com/models/grok-4 |
| 41 | GPT 5.1 | OpenAI |  | Latest | 59.3% | 68.6% | 58.6% | 57.4% | 76.2% | https://rankedagi.com/models/gpt-5-1 |
| 42 | MiniMax M2.5 | MiniMax |  | Latest | 59.2% | 69.7% | 59.7% | 57.6% | 70.1% | https://rankedagi.com/models/minimax-m2-5 |
| 43 | Qwen 3.5 | Alibaba | 27B | Latest | 58.9% | 67.7% | 62.3% | 54.5% | 74.5% | https://rankedagi.com/models/qwen-3-5-27b |
| 44 | MiniMax M2.1 | MiniMax |  | Latest | 58.9% | 68.0% | 57.6% | 61.3% | 65.1% | https://rankedagi.com/models/minimax-m2-1 |
| 45 | DeepSeek V3.2 | DeepSeek | 685B | Latest | 58.8% | 67.7% | 60.3% | 54.9% | 75.4% | https://rankedagi.com/models/deepseek-v3-2 |
| 46 | o3 high | OpenAI |  | Latest | 58.5% | 70.1% | 66.1% | 46.9% | 70.8% | https://rankedagi.com/models/o3-high |
| 47 | GPT 5.2 Codex | OpenAI |  | Latest | 58.3% | 65.0% | 60.6% | 59.0% | 62.1% | https://rankedagi.com/models/gpt-5-2-codex |
| 48 | o3 Pro | OpenAI |  | Latest | 57.6% | 64.7% | 64.6% | 51.4% | 64.3% | https://rankedagi.com/models/o3-pro |
| 49 | Grok 3 (Think) | xAI |  | Preview | 57.6% | 63.0% | 58.6% | 56.1% | 73.2% | https://rankedagi.com/models/grok-3-think-beta |
| 50 | Claude Opus 4.1 | Anthropic |  | Latest | 57.5% | 67.3% | 57.5% | 54.7% | 67.8% | https://rankedagi.com/models/claude-opus-4.1 |
| 51 | Muse Spark Contemplating | Meta |  | Latest | 57.1% | 37.7% | 72.4% | 71.1% |  | https://rankedagi.com/models/muse-spark-contemplating |
| 52 | DeepSeek V3.1 Think | DeepSeek | 840B | Latest | 57.1% | 62.8% | 63.3% | 49.0% | 73.5% | https://rankedagi.com/models/deepseek-v3-1-reasoner |
| 53 | Gemma 4 | Google | 31B | Latest | 56.8% | 62.5% | 59.1% | 53.2% | 71.9% | https://rankedagi.com/models/gemma-4-31b |
| 54 | MiMo V2 Flash | Xiaomi |  | Latest | 56.7% | 65.6% | 55.4% | 53.9% | 69.1% | https://rankedagi.com/models/mimo-v2-flash |
| 55 | Claude Sonnet 4.5 | Anthropic |  | Latest | 56.2% | 66.8% | 54.1% | 51.1% | 72.4% | https://rankedagi.com/models/claude-sonnet-4.5 |
| 56 | SWE-1.6 | Cognition |  | Latest | 55.8% | 59.9% |  | 59.9% |  | https://rankedagi.com/models/swe-1-6 |
| 57 | Mistral Medium 3.5 | Mistral | 128B | Latest | 55.4% | 63.5% | 55.4% | 52.0% | 63.7% | https://rankedagi.com/models/mistral-medium-3-5-128b |
| 58 | Grok 4.1 Fast Reasoning | xAI |  | Latest | 55.0% | 62.3% | 55.1% | 48.2% | 72.1% | https://rankedagi.com/models/grok-4-1-fast-reasoning |
| 59 | Gemini 2.5 Pro | Google |  | Old | 55.0% | 60.8% | 55.0% | 49.8% | 71.9% | https://rankedagi.com/models/gemini-2.5-pro-preview-03-25 |
| 60 | GPT 5.4 nano | OpenAI |  | Latest | 54.8% | 57.6% | 58.7% | 49.4% | 68.7% | https://rankedagi.com/models/gpt-5-4-nano |
| 61 | Gemini 2.5 Pro | Google |  | Latest | 54.6% | 58.6% | 60.1% | 45.9% | 66.8% | https://rankedagi.com/models/gemini-2.5-pro |
| 62 | o4 mini high | OpenAI |  | Latest | 53.4% | 58.2% | 59.7% | 41.4% | 70.8% | https://rankedagi.com/models/o4-mini-high |
| 63 | o3 | OpenAI |  | Latest | 53.4% | 59.4% | 55.2% | 45.8% | 63.3% | https://rankedagi.com/models/o3 |
| 64 | Claude 4 Opus Thinking | Anthropic |  | Latest | 53.4% | 61.6% | 56.4% | 41.3% | 70.7% | https://rankedagi.com/models/claude-4-opus-thinking |
| 65 | Gemini 2.5 Pro | Google |  | Old | 53.3% | 56.9% | 54.1% | 45.7% | 74.8% | https://rankedagi.com/models/gemini-2.5-pro-preview-05-06 |
| 66 | Claude 4 Opus | Anthropic |  | Latest | 53.1% | 60.5% | 48.1% | 51.0% | 62.0% | https://rankedagi.com/models/claude-4-opus |
| 67 | o1 Pro | OpenAI |  | Latest | 53.0% | 54.1% | 62.1% | 43.2% | 66.6% | https://rankedagi.com/models/o1-pro |
| 68 | DeepSeek R1 | DeepSeek | 685B | Latest | 52.6% | 56.8% | 53.9% | 46.0% | 62.6% | https://rankedagi.com/models/deepseek-r1-1 |
| 69 | Claude 4 Sonnet Thinking | Anthropic |  | Latest | 52.4% | 53.2% | 58.7% | 45.0% | 62.8% | https://rankedagi.com/models/claude-4-sonnet-thinking |
| 70 | GPT-5 mini | OpenAI |  | Latest | 52.3% | 55.4% | 55.2% | 42.8% | 72.1% | https://rankedagi.com/models/gpt-5-mini |
| 71 | Qwen 3 A22B Thinking | Alibaba | 235B | Latest | 52.2% | 56.1% | 54.8% | 43.1% | 72.1% | https://rankedagi.com/models/qwen-3-a22b-thinking-235b |
| 72 | SWE-1.5 | Cognition |  | Latest | 51.8% | 53.0% |  | 53.0% |  | https://rankedagi.com/models/swe-1-5 |
| 73 | Claude 4 Sonnet | Anthropic |  | Latest | 51.7% | 58.5% | 45.8% | 49.5% | 60.2% | https://rankedagi.com/models/claude-4-sonnet |
| 74 | gpt-oss-120b | OpenAI | 120B | Latest | 51.6% | 53.3% | 55.9% | 41.0% | 72.7% | https://rankedagi.com/models/gpt-oss-120b |
| 75 | GPT-5 | OpenAI |  | Latest | 51.5% | 57.5% | 44.1% | 49.0% | 67.2% | https://rankedagi.com/models/gpt-5 |
| 76 | o4 mini | OpenAI |  | Latest | 51.3% | 55.8% | 53.1% | 42.3% | 61.5% | https://rankedagi.com/models/o4-mini |
| 77 | Kimi K2 | Moonshot |  | Latest | 50.9% | 55.4% | 50.6% | 43.8% | 63.1% | https://rankedagi.com/models/kimi-k2 |
| 78 | Claude Haiku 4.5 | Anthropic |  | Latest | 50.8% | 57.9% | 45.3% | 45.2% | 68.0% | https://rankedagi.com/models/claude-haiku-4-5 |
| 79 | MiniMax M2 | MiniMax |  | Latest | 50.2% | 57.6% | 46.3% | 43.2% | 62.0% | https://rankedagi.com/models/minimax-m2 |
| 80 | o3 mini high | OpenAI |  | Latest | 50.0% | 52.3% | 50.6% | 40.9% | 68.7% | https://rankedagi.com/models/o3-mini-high |
| 81 | Grok 3 mini (Think) | xAI |  | Preview | 49.5% | 48.9% | 55.2% | 36.6% | 71.0% | https://rankedagi.com/models/grok-3-mini-think-beta |
| 82 | DeepSeek V3.1 | DeepSeek | 840B | Latest | 49.1% | 52.6% | 46.9% | 41.5% | 61.0% | https://rankedagi.com/models/deepseek-v3-1 |
| 83 | Qwen3 A3B Thinking | Alibaba | 30B | Latest | 48.6% | 48.9% | 51.2% | 38.7% | 65.4% | https://rankedagi.com/models/qwen3-30b-a3b-thinking |
| 84 | QwQ Max | Alibaba |  | Preview | 48.5% | 46.5% |  |  |  | https://rankedagi.com/models/qwq-max-preview |
| 85 | Qwen 3 A22B | Alibaba | 235B | Latest | 48.2% | 48.5% | 50.2% | 38.4% | 65.0% | https://rankedagi.com/models/qwen-3-a22b-235b |
| 86 | EXAONE 4.0 | LG | 32B | Latest | 48.2% | 48.7% | 49.8% | 38.6% | 66.1% | https://rankedagi.com/models/exaone-4.0 |
| 87 | o3 mini medium | OpenAI |  | Latest | 48.2% | 49.5% | 48.1% | 38.7% | 66.4% | https://rankedagi.com/models/o3-mini-medium |
| 88 | gpt-oss-20b | OpenAI | 20B | Latest | 48.1% | 46.3% | 49.7% | 38.6% | 71.2% | https://rankedagi.com/models/gpt-oss-20b |
| 89 | Qwen 3 Coder | Alibaba | 480B | Latest | 48.0% | 54.7% | 43.1% | 39.9% | 58.3% | https://rankedagi.com/models/qwen-3-coder-480b |
| 90 | Claude 3.7 Sonnet | Anthropic |  | Latest | 48.0% | 56.5% | 38.7% | 44.7% | 48.2% | https://rankedagi.com/models/claude-3.7-sonnet |
| 91 | Grok 3 mini | xAI |  | Latest | 47.9% | 49.0% | 46.1% | 39.1% | 66.4% | https://rankedagi.com/models/grok-3-mini |
| 92 | o1 | OpenAI |  | Latest | 47.8% | 49.0% | 47.0% | 37.9% | 65.8% | https://rankedagi.com/models/o1 |
| 93 | Gemini 2.5 Flash | Google |  | Latest | 47.8% | 50.2% | 48.0% | 36.0% | 62.8% | https://rankedagi.com/models/gemini-2.5-flash-preview |
| 94 | o1 Preview | OpenAI |  | Old | 47.5% | 48.1% | 46.2% | 40.8% | 55.8% | https://rankedagi.com/models/o1-preview |
| 95 | Gemini 2.5 Flash | Google |  | Preview | 47.5% | 48.6% | 47.1% | 36.1% | 67.3% | https://rankedagi.com/models/gemini-2.5-flash-preview-04-17 |
| 96 | DeepSeek R1 | DeepSeek | 671B | Latest | 47.4% | 49.7% | 43.4% | 39.9% | 62.0% | https://rankedagi.com/models/deepseek-r1 |
| 97 | GPT-5 nano | OpenAI |  | Latest | 47.0% | 45.8% | 48.6% | 36.9% | 63.6% | https://rankedagi.com/models/gpt-5-nano |
| 98 | GPT 4.5 | OpenAI |  | Preview | 46.8% | 47.5% | 43.4% | 40.9% | 54.3% | https://rankedagi.com/models/gpt-4.5-preview |
| 99 | Grok 3 | xAI |  | Latest | 46.8% | 47.7% | 45.9% | 37.5% | 57.3% | https://rankedagi.com/models/grok-3 |
| 100 | Claude 3.7 Sonnet Thinking | Anthropic |  | Latest | 46.4% | 48.6% | 43.0% | 38.4% | 55.7% | https://rankedagi.com/models/claude-3.7-sonnet-thinking |
| 101 | Llama 4 Behemoth | Meta | 2T | Preview | 46.2% | 48.1% | 46.6% | 39.3% | 42.4% | https://rankedagi.com/models/llama-4-behemoth-preview |
| 102 | Qwen 3 A22B | Alibaba | 235B | Old | 46.1% | 47.6% | 47.6% | 31.7% | 66.8% | https://rankedagi.com/models/qwen-3-235b-a22b-old |
| 103 | Kimi K2 | Moonshot | 1T | Old | 45.8% | 50.4% | 40.0% | 38.3% | 54.0% | https://rankedagi.com/models/kimi-k2-1t |
| 104 | GPT 4.1 | OpenAI |  | Latest | 45.2% | 46.0% | 42.4% | 36.0% | 53.5% | https://rankedagi.com/models/gpt-4.1 |
| 105 | o3 mini low | OpenAI |  | Latest | 45.1% | 45.5% | 43.2% | 35.4% | 56.7% | https://rankedagi.com/models/o3-mini-low |
| 106 | Gemini 2.0 Flash Exp | Google |  | Old | 44.8% | 42.9% | 41.1% | 40.1% | 49.8% | https://rankedagi.com/models/gemini-2.0-flash-exp |
| 107 | Qwen 3 | Alibaba | 4B | Latest | 44.7% | 45.2% | 39.9% | 37.2% | 61.1% | https://rankedagi.com/models/qwen-3-4b-235b |
| 108 | Gemini Exp 1206 | Google |  | Old | 44.6% | 48.5% | 35.1% | 43.1% | 37.6% | https://rankedagi.com/models/gemini-exp-1206 |
| 109 | Qwen 3 | Alibaba | 32B | Latest | 44.5% | 45.9% | 46.3% | 26.9% | 68.0% | https://rankedagi.com/models/qwen-3-32b |
| 110 | Gemini 2 Flash Thinking | Google |  | Old | 44.0% | 46.9% | 35.0% | 41.1% | 39.6% | https://rankedagi.com/models/gemini-2-flash-thinking-1219 |
| 111 | Gemini 2 Flash Thinking | Google |  | Preview | 43.6% | 44.0% | 38.9% | 33.8% | 54.9% | https://rankedagi.com/models/gemini-2-flash-thinking-0121 |
| 112 | GPT 4o | OpenAI |  | Old | 43.5% | 46.2% | 34.5% | 39.2% | 41.9% | https://rankedagi.com/models/gpt-4o-2501 |
| 113 | Grok Code Fast 1 | xAI |  | Latest | 43.3% | 43.0% | 39.7% | 35.1% | 53.1% | https://rankedagi.com/models/grok-code-fast-1 |
| 114 | EXAONE Deep | LG | 32B | Latest | 43.2% | 42.7% | 46.4% | 24.5% | 67.9% | https://rankedagi.com/models/exaone-deep-32b |
| 115 | R1 Lite Preview | DeepSeek |  | Preview | 43.1% | 45.0% | 41.5% | 32.0% | 48.0% | https://rankedagi.com/models/r1-lite-preview |
| 116 | Qwen 3 A3B | Alibaba | 30B | Latest | 42.6% | 40.6% | 44.1% | 28.7% | 57.1% | https://rankedagi.com/models/qwen-3-30b-a3b |
| 117 | Qwen 3 A3B | Alibaba | 30B | Old | 42.3% | 39.3% | 44.0% | 27.5% | 62.6% | https://rankedagi.com/models/qwen3-30b-a3b-old |
| 118 | Qwen 2.5 VL | Alibaba | 32B | Latest | 42.2% | 42.6% | 37.6% | 38.8% | 32.6% | https://rankedagi.com/models/qwen-2.5-vl-32b |
| 119 | R1 Distill Qwen | DeepSeek | 14B | Latest | 42.1% | 41.9% | 41.9% | 27.3% | 57.1% | https://rankedagi.com/models/r1-distill-qwen-14b |
| 120 | Gemini Exp 1121 | Google |  | Old | 42.1% | 43.9% | 32.8% | 38.1% | 37.1% | https://rankedagi.com/models/gemini-exp-1121 |
| 121 | Mistral Medium 3 | Mistral |  | Latest | 42.0% | 42.9% | 36.3% | 31.3% | 50.5% | https://rankedagi.com/models/mistral-medium-3 |
| 122 | o1 mini | OpenAI |  | Latest | 41.9% | 41.4% | 38.5% | 30.9% | 51.6% | https://rankedagi.com/models/o1-mini |
| 123 | Sky T1 Preview | NovaSky | 32B | Preview | 41.6% | 35.6% | 42.1% | 35.1% | 43.6% | https://rankedagi.com/models/sky-t1-preview-32b |
| 124 | EXAONE Deep | LG | 7.8B | Latest | 41.6% | 39.8% | 43.9% | 23.3% | 65.8% | https://rankedagi.com/models/exaone-deep-7.8b |
| 125 | DeepSeek V3 | DeepSeek | 685B | Latest | 41.6% | 41.2% | 39.5% | 26.6% | 54.4% | https://rankedagi.com/models/deepseek-3 |
| 126 | Gemini Exp 1114 | Google |  | Old | 41.5% | 42.6% | 32.9% | 36.7% | 37.3% | https://rankedagi.com/models/gemini-exp-1114 |
| 127 | Mercury Coder Small | Inception Labs |  | Preview | 41.3% | 41.1% | 37.5% | 36.0% | 33.4% | https://rankedagi.com/models/mercury-coder-small-preview |
| 128 | DeepSeek V3 | DeepSeek | 671B | Old | 41.3% | 39.9% | 36.3% | 33.5% | 42.8% | https://rankedagi.com/models/deepseek-3-old |
| 129 | R1 Distill Llama | DeepSeek | 8B | Latest | 41.0% | 41.1% | 38.1% | 29.9% | 47.3% | https://rankedagi.com/models/r1-distill-llama-8b |
| 130 | Devstral Small 1.1 | Mistral | 24B | Latest | 41.0% | 39.9% | 35.1% | 33.3% | 47.8% | https://rankedagi.com/models/devstral-small-1.1-24b |
| 131 | Qwen 2.5 Max | Alibaba |  | Latest | 40.8% | 36.4% | 37.7% | 33.4% | 42.8% | https://rankedagi.com/models/qwen-2.5-max |
| 132 | Gemini 2.0 Pro | Google |  | Preview | 40.8% | 39.3% | 36.5% | 30.4% | 45.4% | https://rankedagi.com/models/gemini-2.0-pro-exp |
| 133 | QwQ | Alibaba | 32B | Latest | 40.7% | 39.3% | 38.8% | 23.6% | 63.1% | https://rankedagi.com/models/qwq-32b |
| 134 | Gemini 2.5 Flash Lite Thinking | Google |  | Latest | 40.6% | 38.0% | 36.9% | 29.4% | 56.5% | https://rankedagi.com/models/gemini-2.5-flash-lite-thinking |
| 135 | Claude 3.5 Sonnet | Anthropic |  | Latest | 40.4% | 41.6% | 34.2% | 31.5% | 36.3% | https://rankedagi.com/models/claude-3.5-sonnet |
| 136 | R1 Distill Qwen | DeepSeek | 7B | Latest | 40.1% | 37.8% | 38.7% | 26.0% | 55.5% | https://rankedagi.com/models/r1-distill-qwen-7b |
| 137 | R1 Distill Llama | DeepSeek | 70B | Latest | 40.1% | 38.1% | 36.7% | 27.3% | 57.6% | https://rankedagi.com/models/r1-distill-llama-70b |
| 138 | Gemini 1.5 Pro 002 | Google |  | Old | 39.7% | 33.0% | 38.0% | 30.4% | 47.5% | https://rankedagi.com/models/gemini-1.5-pro-002 |
| 139 | Claude 3.5 Sonnet | Anthropic |  | Old | 39.5% | 34.5% | 38.1% | 29.0% | 45.0% | https://rankedagi.com/models/claude-3.5-sonnet-old |
| 140 | Devstral Medium | Mistral |  | Latest | 39.5% | 43.6% | 30.7% | 31.1% | 35.4% | https://rankedagi.com/models/devstral-medium |
| 141 | Mercury Coder Mini | Inception Labs |  | Preview | 39.4% | 36.8% | 34.9% | 33.9% | 33.1% | https://rankedagi.com/models/mercury-coder-mini-preview |
| 142 | Llama 4 Maverick | Meta | 400B | Latest | 39.2% | 33.6% | 40.8% | 24.7% | 48.3% | https://rankedagi.com/models/llama-4-maverick-400b |
| 143 | Gemini 2.5 Flash Lite | Google |  | Latest | 39.1% | 36.1% | 34.6% | 27.4% | 52.3% | https://rankedagi.com/models/gemini-2.5-flash-lite |
| 144 | GPT 4.1 mini | OpenAI |  | Latest | 39.1% | 33.2% | 39.3% | 24.7% | 48.7% | https://rankedagi.com/models/gpt-4.1-mini |
| 145 | EXAONE Deep | LG | 2.4B | Latest | 39.0% | 35.2% | 39.1% | 21.0% | 63.2% | https://rankedagi.com/models/exaone-deep-2.4b |
| 146 | R1 Distill Qwen | DeepSeek | 32B | Latest | 38.6% | 35.4% | 35.9% | 24.6% | 56.3% | https://rankedagi.com/models/r1-distill-qwen-32b |
| 147 | Gemini 2.0 Flash | Google |  | Latest | 38.5% | 34.0% | 36.2% | 26.2% | 42.5% | https://rankedagi.com/models/gemini-2.0-flash |
| 148 | Gemini 2.0 Flash-Lite | Google |  | Latest | 38.5% | 31.6% | 36.5% | 29.9% | 38.5% | https://rankedagi.com/models/gemini-2.0-flash-lite |
| 149 | Grok 4.1 Fast | xAI |  | Latest | 38.3% | 34.2% | 33.7% | 28.1% | 46.4% | https://rankedagi.com/models/grok-4-1-fast |
| 150 | Qwen 2.5 Coder | Alibaba | 14B | Latest | 38.3% | 37.2% | 27.8% | 34.6% | 36.7% | https://rankedagi.com/models/qwen-2.5-coder-14b |
| 151 | QwQ Preview | Alibaba | 32B | Preview | 38.2% | 33.3% | 41.8% | 20.6% | 47.7% | https://rankedagi.com/models/qwq-preview-32b |
| 152 | Claude 3.5 Haiku | Anthropic |  | Latest | 38.2% | 35.3% | 30.3% | 30.4% | 42.4% | https://rankedagi.com/models/claude-3.5-haiku |
| 153 | GPT 4o | OpenAI |  | Latest | 38.2% | 36.3% | 30.4% | 26.0% | 47.7% | https://rankedagi.com/models/gpt-4o |
| 154 | Gemini 1.5 Flash 002 | Google |  | Old | 37.9% | 29.4% | 36.2% | 28.8% | 43.6% | https://rankedagi.com/models/gemini-1.5-flash-002-old |
| 155 | Llama 3.3 | Meta | 70B | Latest | 37.9% | 33.1% | 34.3% | 26.6% | 44.8% | https://rankedagi.com/models/llama-3.3-70b |
| 156 | Grok-2 mini | xAI |  | Latest | 37.8% | 30.2% | 36.7% | 26.7% | 42.3% | https://rankedagi.com/models/grok-2-mini |
| 157 | Llama 3.2 (Vision) | Meta | 90B | Latest | 37.7% | 32.9% | 35.1% | 30.2% | 32.6% | https://rankedagi.com/models/llama-3.2-(vision)-90b |
| 158 | DeepSeek 2.5 | DeepSeek |  | Old | 37.6% | 35.1% | 32.2% | 21.6% | 56.3% | https://rankedagi.com/models/deepseek-2.5-old |
| 159 | Pixtral Large | Mistral | 123B | Latest | 37.5% | 31.4% | 36.0% | 29.2% | 33.7% | https://rankedagi.com/models/pixtral-large-123b |
| 160 | Mistral Large | Mistral | 123B | Old | 37.5% | 30.1% | 35.6% | 26.8% | 42.7% | https://rankedagi.com/models/mistral-large-123b-old |
| 161 | Grok 2 | xAI |  | Latest | 37.1% | 31.9% | 32.8% | 24.8% | 45.0% | https://rankedagi.com/models/grok-2 |
| 162 | Qwen 2.5 Coder | Alibaba | 7B | Latest | 36.6% | 32.6% | 26.5% | 32.9% | 35.7% | https://rankedagi.com/models/qwen-2.5-coder-7b |
| 163 | GPT 4o | OpenAI |  | Old | 36.6% | 31.1% | 31.9% | 26.1% | 36.3% | https://rankedagi.com/models/gpt-4o-old |
| 164 | Gemini 1.5 Pro 001 | Google |  | Old | 36.6% | 31.8% | 29.9% | 28.0% | 37.4% | https://rankedagi.com/models/gemini-1.5-pro-001-old |
| 165 | Phi 4 | Microsoft | 14B | Latest | 36.3% | 29.1% | 35.9% | 23.0% | 42.0% | https://rankedagi.com/models/phi-4-14b |
| 166 | Qwen 2.5 | Alibaba | 32B | Latest | 36.2% | 29.8% | 36.1% | 22.8% | 39.7% | https://rankedagi.com/models/qwen-2.5-32b |
| 167 | Llama 3.1 | Meta | 405B | Latest | 35.9% | 27.7% | 33.7% | 23.2% | 43.6% | https://rankedagi.com/models/llama-3.1-405b |
| 168 | GPT 4o | OpenAI |  | Old | 35.9% | 31.9% | 30.9% | 22.7% | 40.2% | https://rankedagi.com/models/gpt-4o-old2 |
| 169 | Llama 3.1 | Meta | 70B | Old | 35.8% | 26.4% | 34.1% | 24.4% | 41.4% | https://rankedagi.com/models/llama-3.1-70b-old |
| 170 | R1 Distill Qwen | DeepSeek | 1.5B | Latest | 35.5% | 30.6% | 30.7% | 24.6% | 36.9% | https://rankedagi.com/models/r1-distill-qwen-1.5b |
| 171 | Gemini 1.5 Flash 001 | Google |  | Old | 35.4% | 29.4% | 28.7% | 25.4% | 39.3% | https://rankedagi.com/models/gemini-1.5-flash-001-old |
| 172 | DeepSeek Coder 2 | DeepSeek | 236B | Latest | 35.2% | 27.7% | 32.8% | 22.2% | 39.4% | https://rankedagi.com/models/deepseek-coder-2-236b |
| 173 | GPT 4o mini | OpenAI |  | Latest | 35.0% | 27.4% | 30.3% | 24.0% | 39.1% | https://rankedagi.com/models/gpt-4o-mini |
| 174 | DeepSeek 2.5 | DeepSeek | 236B | Old | 34.9% | 27.9% | 32.5% | 21.0% | 40.1% | https://rankedagi.com/models/deepseek-2.5-236b-old |
| 175 | Llama 4 Scout | Meta | 109B | Latest | 34.9% | 28.4% | 35.1% | 18.5% | 41.2% | https://rankedagi.com/models/llama-4-scout-109b |
| 176 | Mistral Small 3 | Mistral | 24B | Latest | 34.8% | 27.4% | 32.8% | 22.4% | 39.6% | https://rankedagi.com/models/mistral-small-3 |
| 177 | Claude 3 Opus | Anthropic |  | Latest | 34.8% | 24.8% | 33.2% | 21.5% | 43.6% | https://rankedagi.com/models/claude-3-opus |
| 178 | Nova Pro | Amazon |  | Latest | 34.6% | 28.1% | 30.6% | 21.5% | 38.1% | https://rankedagi.com/models/nova-pro |
| 179 | Qwen 2.5 | Alibaba | 72B | Latest | 34.6% | 24.8% | 32.1% | 22.0% | 42.2% | https://rankedagi.com/models/qwen-2.5-72b |
| 180 | Mistral Large | Mistral | 123B | Latest | 34.5% | 29.0% | 30.5% | 19.0% | 41.5% | https://rankedagi.com/models/mistral-large-123b |
| 181 | Qwen 2.5 | Alibaba | 14B | Latest | 34.4% | 25.6% | 34.1% | 21.1% | 39.2% | https://rankedagi.com/models/qwen-2.5-14b |
| 182 | Codestral 25.01 | Mistral |  | Latest | 34.3% | 32.6% | 26.5% | 22.1% | 35.1% | https://rankedagi.com/models/codestral-2501 |
| 183 | Command A | Cohere | 111B | Latest | 34.2% | 28.9% | 29.7% | 19.8% | 36.9% | https://rankedagi.com/models/command-a |
| 184 | Mistral Small 3.1 | Mistral | 24B | Latest | 34.2% | 27.7% | 30.9% | 21.0% | 39.9% | https://rankedagi.com/models/mistral-small-3.1-24b |
| 185 | GPT 4.1 nano | OpenAI |  | Latest | 34.2% | 26.8% | 32.1% | 18.7% | 38.8% | https://rankedagi.com/models/gpt-4.1-nano |
| 186 | Qwen 2.5 | Alibaba | 3B | Latest | 34.0% | 27.8% | 22.5% | 31.2% | 30.9% | https://rankedagi.com/models/qwen-2.5-3b |
| 187 | Qwen 2.5 | Alibaba | 7B | Latest | 33.8% | 29.2% | 22.6% | 27.2% | 35.2% | https://rankedagi.com/models/qwen-2.5-7b |
| 188 | Gemma 3 | Google | 27B | Latest | 33.5% | 24.3% | 28.2% | 21.5% | 39.2% | https://rankedagi.com/models/gemma-3-27b |
| 189 | Qwen 2.5 Coder | Alibaba | 3B | Latest | 33.5% | 26.1% | 23.7% | 28.3% | 33.8% | https://rankedagi.com/models/qwen-2.5-coder-3b |
| 190 | Qwen 2.5 Coder | Alibaba | 32B | Latest | 32.9% | 29.2% | 30.3% | 13.3% | 35.4% | https://rankedagi.com/models/qwen-2.5-coder-32b |
| 191 | Gemma 2 | Google | 27B | Latest | 32.0% | 23.9% | 26.3% | 18.2% | 37.1% | https://rankedagi.com/models/gemma-2-27b |
| 192 | Nova Lite | Amazon |  | Latest | 32.0% | 23.6% | 31.4% | 14.4% | 32.9% | https://rankedagi.com/models/nova-lite |
| 193 | Llama 3.2 (Vision) | Meta | 11B | Latest | 32.0% | 22.6% | 25.7% | 23.8% | 30.3% | https://rankedagi.com/models/llama-3.2-(vision)-11b |
| 194 | Gemini 1.5 Flash | Google | 8B | Old | 31.9% | 20.9% | 29.6% | 18.4% | 33.3% | https://rankedagi.com/models/gemini-1.5-flash-8b |
| 195 | Gemma 3 | Google | 12B | Latest | 31.6% | 22.8% | 25.9% | 19.4% | 36.0% | https://rankedagi.com/models/gemme-3-12b |
| 196 | Qwen 2.5 Coder | Alibaba | 1.5B | Latest | 31.3% | 21.8% | 21.9% | 24.8% | 31.8% | https://rankedagi.com/models/qwen-2.5-coder-1.5b |
| 197 | Mistral Nemo | Mistral | 12B | Latest | 31.3% | 21.9% | 27.3% | 17.8% | 36.5% | https://rankedagi.com/models/mistral-nemo-12b |
| 198 | Nova Micro | Amazon |  | Latest | 31.2% | 22.0% | 30.5% | 13.4% | 32.7% | https://rankedagi.com/models/nova-micro |
| 199 | Claude 3 Haiku | Anthropic |  | Old | 31.0% | 21.1% | 24.4% | 18.3% | 36.0% | https://rankedagi.com/models/claude-3-haiku-old |
| 200 | Gemma 3 | Google | 4B | Latest | 30.2% | 21.5% | 23.7% | 17.1% | 33.6% | https://rankedagi.com/models/gemma-3-4b |
| 201 | Gemma 2 | Google | 9B | Latest | 30.2% | 19.3% | 22.7% | 20.4% | 35.1% | https://rankedagi.com/models/gemma-2-9b |
| 202 | Llama 3.1 | Meta | 8B | Latest | 29.9% | 17.7% | 25.1% | 16.2% | 34.4% | https://rankedagi.com/models/llama-3.1-8b |
| 203 | Qwen 2.5 Coder | Alibaba | 0.5B | Latest | 29.3% | 17.7% | 19.6% | 21.9% | 30.6% | https://rankedagi.com/models/qwen-2.5-coder-0.5b |
| 204 | Qwen 2.5 | Alibaba | 1.5B | Latest | 28.2% | 14.8% | 19.7% | 20.2% | 29.4% | https://rankedagi.com/models/qwen-2.5-1.5b |
| 205 | Gemma 2 | Google | 2B | Latest | 28.1% | 13.4% | 19.4% | 21.3% | 29.1% | https://rankedagi.com/models/gemma-2-2b |
| 206 | Llama 3.2 | Meta | 3B | Latest | 27.2% | 13.4% | 21.5% | 12.1% | 32.2% | https://rankedagi.com/models/llama-3.2-3b |
| 207 | Qwen 2.5 | Alibaba | 0.5B | Latest | 25.6% | 10.8% | 15.4% | 16.6% | 26.9% | https://rankedagi.com/models/qwen-2.5-0.5b |
| 208 | Llama 3.2 | Meta | 1B | Latest | 24.6% | 10.0% | 17.4% | 7.9% | 30.5% | https://rankedagi.com/models/llama-3.2-1b |

For methodology, benchmark inclusion, update frequency, and limitations, see https://rankedagi.com/sources.md.
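
Because this Markdown version is meant to be agent-readable, here is a minimal sketch of how a consumer might load the table and re-rank models by a single category. It assumes the 11-column order shown above stays fixed and treats a blank cell as "no score". The `Row`, `parse_leaderboard`, and `top_by` names are hypothetical helpers for illustration, not part of any RankedAGI API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One leaderboard row; score fields are percentages, or None when blank."""
    rank: int
    model: str
    organization: str
    size: str
    version: str
    overall: Optional[float]
    coding: Optional[float]
    reasoning: Optional[float]
    agentic: Optional[float]
    math: Optional[float]
    url: str


def _pct(cell: str) -> Optional[float]:
    """Turn '78.9%' into 78.9; a blank cell becomes None (assumed: no score)."""
    cell = cell.strip()
    return float(cell.rstrip("%")) if cell else None


def parse_leaderboard(markdown: str) -> List[Row]:
    """Parse the pipe table, assuming the 11-column order used on this page."""
    rows: List[Row] = []
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # The header row ("Rank") and the "---" separator have no numeric rank.
        if not cells[0].isdigit():
            continue
        rows.append(Row(
            rank=int(cells[0]), model=cells[1], organization=cells[2],
            size=cells[3], version=cells[4], overall=_pct(cells[5]),
            coding=_pct(cells[6]), reasoning=_pct(cells[7]),
            agentic=_pct(cells[8]), math=_pct(cells[9]), url=cells[10],
        ))
    return rows


def top_by(rows: List[Row], field: str, n: int = 5) -> List[Row]:
    """Re-rank by one category, dropping models with no score in it."""
    scored = [r for r in rows if getattr(r, field) is not None]
    return sorted(scored, key=lambda r: getattr(r, field), reverse=True)[:n]


# Three data rows from the table above, including one with blank cells.
SAMPLE = """\
| Rank | Model | Organization | Size | Version | Overall | Coding | Reasoning | Agentic | Math | Model page |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Claude Mythos | Anthropic |  | Preview | 78.9% | 95.2% | 89.5% | 93.5% | 71.5% | https://rankedagi.com/models/claude-mythos-preview |
| 5 | GPT 5.5 | OpenAI |  | Latest | 73.1% | 85.5% | 83.4% | 81.6% | 72.6% | https://rankedagi.com/models/gpt-5-5 |
| 56 | SWE-1.6 | Cognition |  | Latest | 55.8% | 59.9% |  | 59.9% |  | https://rankedagi.com/models/swe-1-6 |
"""

if __name__ == "__main__":
    rows = parse_leaderboard(SAMPLE)
    for r in top_by(rows, "math"):
        print(r.rank, r.model, r.math)
```

Run on the sample, re-ranking by Math puts GPT 5.5 (72.6) ahead of Claude Mythos (71.5), and SWE-1.6 is dropped because its Math cell is blank. A single-category sort like this ignores how the Overall composite weighs categories; see the methodology link above for how that score is built.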