A benchmark of several hundred unpublished, highly challenging mathematics problems. Difficulty Tiers 1-3 range from undergraduate-level problems to exploratory problems suitable for an advanced graduate student, while Tier 4 is research-level mathematics.
On 2026-06-12, we released v2, which addressed errors in 42% of problems.
This project is supported by OpenAI.