Unprecedented difficulty
Each problem demands hours of work from expert mathematicians. Even the most advanced AI systems today, including GPT-4 and Gemini, solve less than 2% of them.
True evaluation
All problems are new and unpublished, eliminating data contamination concerns that plague existing benchmarks.
Mathematical depth
Created in collaboration with over 60 mathematicians, FrontierMath spans the full spectrum of modern mathematics, from algebraic geometry to Zermelo–Fraenkel set theory.

Why we need stronger math benchmarks
Learn more about the development, significance, and future implications of FrontierMath.

Sample problems from the benchmark
Explore representative problems from our benchmark, showcasing the depth and breadth of mathematical challenges.

Expert perspectives
How will AI transform mathematics? Fields Medalists and other leading mathematicians discuss whether they expect AI to automate advanced math research.
Exploring AI’s mathematical limits
Read the full academic paper introducing FrontierMath, including methodology, evaluation procedures, and detailed analysis.
Conflict of interest statement
OpenAI commissioned the production of 300 questions for FrontierMath, including all problems up to the upcoming version FrontierMath_XX-XX-25. OpenAI fully owns these 300 questions. They have access to all statements and solutions for questions up to version FrontierMath_XX-XX-25, except for a subset of 50 solutions added in version FrontierMath_XX-XX-25 randomly withheld for holdout evaluation. Epoch AI retains the right to conduct and publish internal evaluations using all questions in FrontierMath.
Version history
FrontierMath_XX-XX-25 (upcoming): Final 300 problem dataset. 50 solutions from this version will be held out from OpenAI for evaluation purposes.
FrontierMath_12-04-24: Includes 197 problems, of which 5 were published. [Edited to add, 14 Feb 2025: we discovered that 1 problem in this set was duplicated.]
FrontierMath_11-26-24: Includes 180 questions. OpenAI’s o3 announcement on December 20th, 2024, claimed a 25.2% score on this version of the benchmark.
FrontierMath_11-06-24: Includes 147 questions. Internal delivery.
FrontierMath_10-22-24: Includes 119 questions. This version of the benchmark was analyzed in our arXiv paper.