FrontierMath — benchmarking AI against advanced mathematical research
FrontierMath is a benchmark designed to measure AI systems’ mathematical reasoning capabilities. It includes both carefully crafted challenge problems and open research problems that professional mathematicians have yet to solve.
FrontierMath: Tiers 1-4
A benchmark of several hundred unpublished, highly challenging mathematics problems. Tiers 1-3 span undergraduate- to early-graduate-level difficulty, while Tier 4 consists of research-level mathematics.
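Problems in Tiers 1-4 are designed to have definite, automatically verifiable answers, so a model's submission can be checked by a script rather than by expert review. The sketch below illustrates the idea; the `Problem` class, the `grade` helper, and the toy divisor-sum problem are hypothetical illustrations, not Epoch AI's actual evaluation harness.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Problem:
    """Hypothetical representation of a FrontierMath-style problem."""
    statement: str                  # problem text shown to the model
    tier: int                       # difficulty tier, 1 (easiest) through 4 (research-level)
    verify: Callable[[str], bool]   # checks a submitted answer string

def grade(problem: Problem, submitted_answer: str) -> bool:
    """Return True iff the submitted answer passes the problem's verifier."""
    return problem.verify(submitted_answer)

# Toy example in the spirit of an easy tier: an exact integer answer.
toy = Problem(
    statement="Compute the sum of the positive divisors of 28.",
    tier=1,
    verify=lambda ans: ans.strip() == "56",
)

assert grade(toy, "56")       # correct answer passes
assert not grade(toy, "57")   # anything else fails
```

Exact verification of a definite answer is what allows a benchmark of this size to be scored without human graders, even though producing the answer may require deep mathematical work.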
FrontierMath: Open Problems
A collection of unsolved mathematics problems that have resisted serious attempts by professional mathematicians. An AI solution to any of them would meaningfully advance human mathematical knowledge.