---
title: "Gemini 2.5 Flash Benchmarks - RankedAGI"
description: "Detailed benchmark and metadata record for Gemini 2.5 Flash."
source: "https://rankedagi.com/models/gemini-2.5-flash-preview"
---

# Gemini 2.5 Flash

| Field | Value |
| --- | --- |
| Organization | Google |
| License | Proprietary |
| Version | Latest |
| Released | 2025-05-20 |
| Context window | 1M |
| Knowledge cutoff | 2025-01-01 |
| Input cost per million tokens | $0.30 |
| Output cost per million tokens | $2.50 |
| Last updated | 2026-05-04T01:28:27.294Z |
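
The two per-million-token prices above can be turned into a rough per-request estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, not part of any Google SDK, and it assumes flat pricing with no caching discounts, tiering, or separately billed token types.

```python
# Prices copied from the metadata table above (USD per one million tokens).
INPUT_USD_PER_MILLION = 0.3
OUTPUT_USD_PER_MILLION = 2.5


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper: assumes flat per-token pricing and nothing else.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * INPUT_USD_PER_MILLION
        + output_tokens * OUTPUT_USD_PER_MILLION
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: a 10,000-token prompt with a 2,000-token response.
    print(f"${estimate_cost_usd(10_000, 2_000):.4f}")
```

Under these assumptions, a 10,000-token prompt with a 2,000-token response comes to a little under one cent, with output tokens accounting for most of it.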

## Benchmarks

| Benchmark | Category | Value | Description | Source |
| --- | --- | --- | --- | --- |
| RankedAGI Coding | coding | 47.4% | RankedAGI Coding Score |  |
| SWE-bench Verified | coding | 60.4% | Agentic Coding | https://www.swebench.com/index.html |
| RankedAGI Agentic | agents | 35.2% | RankedAGI Agentic Score |  |
| LiveCodeBench v6 | coding | 62.0% | LiveCodeBench v6 - Evaluation of Large Language Models for Code |  |
| LiveBench Coding 25.5 | coding | 62.8% | LiveBench Coding Score 2025-05-30 |  |
| LiveCodeBench v5 | coding | 63.9% | LiveCodeBench - Evaluation of Large Language Models for Code |  |
| ChatArena (LMSYS) | coding | 1433 | ChatArena (LMSYS) Coding Elo Score with Style Control |  |
| Aider Polyglot | coding | 61.9% | Aider Polyglot Code Editing Benchmark |  |
| RankedAGI Reasoning | reasoning | 48.2% | RankedAGI Reasoning Score |  |
| Humanity's Last Exam | reasoning | 11.0% | Multidisciplinary Reasoning (no tools) |  |
| GPQA Diamond | reasoning | 82.8% | Graduate-Level Google-Proof Q&A (Diamond subset): PhD-level science reasoning |  |
| LiveBench Reasoning 25.5 | reasoning | 78.5% | LiveBench Reasoning Score 2025-05-30 |  |
| Text Arena | general | 1419 | ChatArena (LMSYS) Elo Score | https://arena.ai/leaderboard/text |
| AIME 2025 | math | 72.0% | AIME 2025 Competition Math |  |
| Vending Bench 2 | agents | $548.84 | Benchmark for measuring AI model performance on running a business over long time horizons. Models are tasked with running a simulated vending machine business over a year and scored on their bank account balance at the end. | https://andonlabs.com/evals/vending-bench-2 |
| LiveBench Math Score | math | 84.1% | LiveBench Math Score 2025-05 |  |
| MMLU | knowledge | 88.4% | Massive Multitask Language Understanding |  |
| MMMU | imaging | 79.7% | Multimodal understanding: college-level visual problem-solving |  |
| LiveBench IF 25.5 | instruction-following | 79.6% | LiveBench Instruction Following Score 2025-05-30 |  |
| LiveBench Average 25.5 | general | 64.7% | LiveBench Average Score 2025-05-30 |  |
| LiveBench Coding 25.4 | coding | 62.8% | LiveBench Coding Score 2025-04 | https://livebench.ai/ |
| LiveBench Agentic Coding 25.5 | coding | 18.3% | LiveBench Agentic Coding Score 2025-05-30 | https://livebench.ai/ |
| RankedAGI Math | math | 63.3% | RankedAGI Math Score |  |
| RankedAGI Overall | general | 47.1% | Overall RankedAGI score |  |
| GDPval-AA Elo | agents | 706 | Office Tasks (Artificial Analysis) |  |
| Code DesignArena | design | 1124 |  | https://www.designarena.ai/leaderboard/code |
| Toolathlon | agents | 3.7% | Benchmark to assess language agents' general tool use in realistic environments. | https://toolathlon.xyz/introduction |
