FrontierMath Tiers 1-4

A benchmark of several hundred unpublished, highly challenging mathematics problems. Tiers 1-3 range from undergraduate-level problems to exploratory problems suitable for an advanced graduate student, while Tier 4 consists of research-level mathematics.

Update (2026-05-11): We're conducting an AI-assisted review of FrontierMath Tiers 1-4. The review has flagged fatal errors in about a third of the problems, and we believe most of these flags are valid. We will release updated scores on a corrected dataset after completing a thorough human review.

This project is supported by OpenAI.
