FrontierMath
A math benchmark testing the limits of AI
Unprecedented difficulty
Each problem demands hours of work from expert mathematicians. Even the most advanced AI systems today, including GPT-4 and Gemini, solve less than 2% of them.
True evaluation
All problems are new and unpublished, eliminating data contamination concerns that plague existing benchmarks.
Mathematical depth
Created in collaboration with over 60 mathematicians, FrontierMath spans the full spectrum of modern mathematics, from algebraic geometry to Zermelo–Fraenkel set theory.
Why we need stronger math benchmarks
Learn more about the development, significance, and future implications of FrontierMath.
Sample problems from the benchmark
Explore representative problems that showcase the depth and breadth of the benchmark's mathematical challenges.
Expert perspectives
How will AI transform mathematics? Fields Medalists and other leading mathematicians discuss whether they expect AI to automate advanced math research.
Exploring AI’s mathematical limits
Read the full academic paper introducing FrontierMath, including methodology, evaluation procedures, and detailed analysis.