AI in math

Few domains test AI reasoning as clearly as mathematics, where answers can be verified automatically and the hardest problems extend to the frontier of human knowledge. Epoch tracks how AI is performing on mathematical tasks over time, including through FrontierMath, our own benchmark of expert-level problems designed to test the limits of what today's best systems can do.

Newsletter
May 5, 2026
RIP Classic Reasoning Benchmarks. What's Next?

Give up at least one of: text-only problems, short time horizons, easy grading, or expert human superiority.

By Greg Burnham

Podcast
Jan. 29, 2026
AI math capabilities could be jagged for a long time – Daniel Litt

In this episode, Daniel Litt chats with the hosts about AI’s limits in mathematics, accelerating math research, and how to measure progress on open problems.

By Daniel Litt, Greg Burnham, and Anson Ho

Newsletter
Oct. 17, 2025
Less than 70% of FrontierMath is within reach for today’s models

57% of problems have been solved at least once.

By Greg Burnham

Report
Oct. 9, 2025
Evaluating Gemini 2.5 Deep Think's math capabilities

It has improved at using background knowledge and doing precise computations. It can be a helpful research assistant and may take a more conceptual approach to geometry. It shows limited creativity and sometimes struggles with citations.

By Greg Burnham

Data Insight
Sep. 3, 2025
LLMs have not yet solved the hardest problems on high school math contests

By Greg Burnham

Newsletter
Aug. 7, 2025
We didn’t learn much from the IMO

The problems gave AI only a slim chance to show new capabilities.

By Greg Burnham

Newsletter
Aug. 2, 2025
Quantifying the algorithmic improvement from reasoning models

Reasoning models were as big an improvement as the Transformer, at least on some benchmarks.

By Anson Ho and Arden Berg

Report
Jul. 25, 2025
Evaluating Grok 4’s math capabilities

It’s good at involved computations, improving at proofs from a low base, and useful for literature search. It still favors low-level grinds and leans on background knowledge.

By Greg Burnham

Newsletter
Jul. 8, 2025
What will the IMO tell us about AI math capabilities?

Most discussion of AI and the IMO focuses on gold medals, but that isn't what deserves the closest attention.

By Greg Burnham

Newsletter
Jun. 6, 2025
Beyond benchmark scores: Analyzing o3-mini’s mathematical reasoning

Examining o3-mini's math reasoning: an erudite, vibes-based solver that excels in knowledge but lacks precision, creativity, and the formal rigor of human mathematicians.

By Anson Ho, Jean-Stanislas Denain, and Elliot Glazer

Newsletter
May 23, 2025
Is AI already superhuman on FrontierMath?

How do humans and AIs compare on FrontierMath? We ran a competition at MIT to put this to the test.

By Anson Ho

Update
Jan. 23, 2025
Clarifying the creation and use of the FrontierMath benchmark

We clarify that OpenAI commissioned Epoch AI to produce 300 math questions for the FrontierMath benchmark. They own these and have access to the statements and solutions, except for a 50-question holdout set.

By Tamay Besiroglu and Jaime Sevilla

Update
Updated Mar. 18, 2025
FrontierMath competition: Setting benchmarks for AI evaluation

We are hosting a competition to establish rigorous human performance baselines for FrontierMath. With a prize pool of $10,000, your participation will contribute directly to measuring AI progress in solving challenging mathematical problems.

By Tamay Besiroglu, Elliot Glazer, and Caroline Falkman Olsson

Report
Dec. 4, 2024
What is the future of AI in mathematics? Interviews with leading mathematicians

How will AI transform mathematics? Fields Medalists and other leading mathematicians discuss whether they expect AI to automate advanced math research.

By Anson Ho and Tamay Besiroglu

Paper
Nov. 8, 2024
FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI

FrontierMath: a new benchmark of expert-level math problems designed to measure AI's mathematical abilities. See how leading AI models perform against the collective mathematics community.

By Tamay Besiroglu, Elliot Glazer, and Caroline Falkman Olsson