FrontierMath

A math benchmark testing the limits of AI

Unprecedented difficulty

Each problem demands hours of work from expert mathematicians. Even the most advanced AI systems today, including GPT-4 and Gemini, solve less than 2% of them.

True evaluation

All problems are new and unpublished, eliminating data contamination concerns that plague existing benchmarks.

Mathematical depth

Created in collaboration with over 60 mathematicians, FrontierMath spans the full spectrum of modern mathematics, from algebraic geometry to Zermelo–Fraenkel set theory.

Impressions of our research-level problems

“These are extremely challenging... I think they will resist AIs for several years at least.”

Terence Tao
Fields Medalist (2006)

“Getting even one question right would be well beyond what we can do now, let alone saturating them.”

Timothy Gowers
Fields Medalist (1998)

“These are genuinely hard problems... most of them look well above my pay grade.”

Evan Chen
International Mathematical Olympiad Coach

Learn more

Exploring AI’s mathematical limits

Read the full academic paper introducing FrontierMath, covering its methodology, evaluation procedures, and a detailed analysis of results.
