o3 mini
Latest
OpenAI • Proprietary
# 89
Released
Jan 31, 2025
# 19
Knowledge Cutoff
Oct 2023
# 9
Context Length
200K
Reasoning levels
Scores above are from the best-performing level (high).
| Level | RAGI | Coding | Agentic | Reasoning | Math |
|---|---|---|---|---|---|
| medium | 0.491 | 0.491 | 0.409 | 0.502 | 0.670 |
| low | 0.471 | 0.482 | 0.382 | 0.467 | 0.579 |
| high (shown above) | 0.509 | 0.515 | 0.434 | 0.529 | 0.694 |
Benchmarks
| Rank | Benchmark | Score |
|---|---|---|
| # 86 | Code RankedAGI | 51.5% |
| # 45 | SWEBench Verified | 49.3% |
| # 103 | Agentic RankedAGI | 43.4% |
| # 23 | LiveCodeBench v6 | 68.9% |
| # 1 | LiveCodeBench v5 | 80.5% |
| # 9 | Code LMArena | 1332 |
| # 8 | Codeforces ELO | 2130 |
| # 13 | Aider Polyglot | 60.4% |
| # 2 | Code LiveBench (old) | 82.7% |
| # 84 | Reason RankedAGI | 52.9% |
| # 52 | HLE | 14.0% |
| # 45 | GPQA Diamond | 79.7% |
| # 49 | Text Arena | 1363 |
| # 22 | AIME 2025 I & II | 86.5% |
| # 10 | AIME 2024 | 87.3% |
| # 10 | NYT Connections | 61.4% |
| # 17 | MMLU | 86.9% |
| # 2 | Halluc. Hughes | 0.8% |
| # 2 | Avg LiveBench (old) | 75.8% |
| # 9 | Coding LiveBench 25.4 | 65.5% |
| # 2 | Data LiveBench | 70.6% |
| # 12 | Language LiveBench | 50.7% |
| # 61 | Math RankedAGI | 69.4% |
| # 86 | RAGI RankedAGI | 50.9% |
| # 50 | GDPval AA | 748 |
Pricing
| Rank | Item | Price (USD per 1M tokens) |
|---|---|---|
| # 31 | Input Cost | $1.10 |
| # 36 | Output Cost | $4.40 |
| # 28 | Cached Cost | $0.55 |
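
The prices under Pricing can be turned into a rough per-request estimate. The sketch below is illustrative only. It assumes "/M" means USD per one million tokens, and that the cached rate applies to the portion of input tokens served from a prompt cache. The function name `estimate_cost` and its parameters are hypothetical, not part of any official SDK.

```python
# Prices copied from the Pricing section above.
# Assumption: "/M" means USD per 1,000,000 tokens.
INPUT_PER_M = 1.10
OUTPUT_PER_M = 4.40
CACHED_PER_M = 0.55  # Assumption: billed for input tokens read from cache.


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Return an estimated request cost in USD (hypothetical helper).

    cached_tokens is the subset of input_tokens assumed to be billed at the
    cached rate; the remainder is billed at the regular input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * INPUT_PER_M
        + cached_tokens * CACHED_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Example: 20,000 input tokens (half of them cached) and 5,000 output tokens.
    print(f"${estimate_cost(20_000, 5_000, cached_tokens=10_000):.4f}")
```

For reasoning models, output token counts typically include hidden reasoning tokens, so actual output usage may be noticeably higher than the visible response length suggests.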