FrontierMath — benchmarking AI against advanced mathematical research
A benchmark of several hundred unpublished, expert-level mathematics problems that take specialists hours to days to solve. Difficulty Tiers 1-3 cover undergraduate through early graduate-level problems, while Tier 4 is research-level mathematics. This project is supported by OpenAI.