FrontierMath

A math benchmark testing the limits of AI

In collaboration with OpenAI

Unprecedented difficulty

Each problem demands hours of work from expert mathematicians. Even the most advanced AI systems today, including GPT-4 and Gemini, solve less than 2% of them.

True evaluation

All problems are new and unpublished, eliminating data contamination concerns that plague existing benchmarks.

Mathematical depth

Created in collaboration with over 60 mathematicians, FrontierMath spans the full spectrum of modern mathematics, from algebraic geometry to Zermelo–Fraenkel set theory.

Help shape the future of AI in mathematics

We are hosting a competition to establish rigorous human performance baselines for FrontierMath. The competition offers a prize pool of over $30,000, and your participation will contribute directly to measuring AI progress in solving challenging mathematical problems.

Impressions of our research-level problems (top 25% of difficulty)

“These are extremely challenging... I think they will resist AIs for several years at least.”

Terence Tao, Fields Medalist (2006)

“Getting even one question right would be well beyond what we can do now, let alone saturating them.”

Timothy Gowers, Fields Medalist (1998)

“These are genuinely hard problems... most of them look well above my pay grade.”

Evan Chen, International Mathematical Olympiad Coach

Exploring AI’s mathematical limits

Read the full academic paper introducing FrontierMath, including methodology, evaluation procedures, and detailed analysis.

Conflict of interest statement

OpenAI commissioned the production of 300 questions for FrontierMath, including all problems up to the upcoming version FrontierMath_XX-XX-25, and fully owns these 300 questions. OpenAI has access to all statements and solutions for questions up to version FrontierMath_XX-XX-25, except for 50 solutions added in that version, which were randomly withheld as a holdout set for evaluation. Epoch AI retains the right to conduct and publish internal evaluations using all questions in FrontierMath.
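
As a minimal illustration of the holdout arrangement described above, the sketch below randomly withholds 50 of 300 solutions while delivering all problem statements. This is an assumption-laden sketch, not Epoch AI's actual tooling: the record structure, field names, and function name are all hypothetical.

```python
import random

# Hypothetical sketch of the holdout split described above: deliver all
# problem statements, but randomly withhold 50 solutions for holdout
# evaluation. Record structure and names are assumptions, not Epoch AI's
# actual process.

HOLDOUT_SIZE = 50

def split_holdout(problems, holdout_size=HOLDOUT_SIZE, seed=0):
    """Return (shared, withheld): shared records keep their solutions,
    withheld records have the solution removed."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    holdout_ids = set(rng.sample([p["id"] for p in problems], holdout_size))
    shared, withheld = [], []
    for p in problems:
        if p["id"] in holdout_ids:
            # Withheld: statement is shared, solution is kept back.
            withheld.append({"id": p["id"], "statement": p["statement"]})
        else:
            shared.append(p)
    return shared, withheld
```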

Version history

FrontierMath_XX-XX-25 (upcoming): Final 300-problem dataset. 50 solutions from this version will be withheld from OpenAI for holdout evaluation.

FrontierMath_12-04-24: Includes 197 problems, of which 5 were published. [ETA 14 Feb 2025: we discovered that 1 problem in this set was duplicated.]

FrontierMath_11-26-24: Includes 180 questions. OpenAI’s o3 announcement on December 20, 2024, claimed a 25.2% score on this version of the benchmark.

FrontierMath_11-06-24: Includes 147 questions. Internal delivery.

FrontierMath_10-22-24: Includes 119 questions. This version of the benchmark was analyzed in our arXiv paper.