FrontierMath
A math benchmark testing the limits of AI
Unprecedented difficulty
Each problem demands hours of work from expert mathematicians. Even the most advanced AI systems today, including GPT-4 and Gemini, solve fewer than 2% of them.
True evaluation
All problems are new and unpublished, eliminating data contamination concerns that plague existing benchmarks.
Mathematical depth
Created in collaboration with over 60 mathematicians, FrontierMath spans the full spectrum of modern mathematics, from algebraic geometry to Zermelo–Fraenkel set theory.
Impressions of our research-level problems
Exploring AI’s mathematical limits
Read the full academic paper introducing FrontierMath, including methodology, evaluation procedures, and detailed analysis.