Few domains test AI reasoning as clearly as mathematics, where answers can be verified automatically and the hardest problems extend to the frontier of human knowledge. Epoch tracks how AI is performing on mathematical tasks over time, including through FrontierMath, our own benchmark of expert-level problems designed to test the limits of what today's best systems can do.

Give up at least one of: text-only tasks, short time horizons, easy grading, or expert human superiority.

In this episode, Daniel Litt chats with the hosts about AI’s limits in mathematics, accelerating math research, and how to measure progress on open problems.

57% of FrontierMath problems have been solved at least once.

It has improved at using background knowledge and doing precise computations. It can be a helpful research assistant and may take a more conceptual approach to geometry. It shows limited creativity and sometimes struggles with citations.

The problems gave AI only a slim chance to show new capabilities.

Reasoning models were as big an improvement as the Transformer, at least on some benchmarks.

It’s good at involved computations, improving at proofs from a low base, and useful for literature search. It still favors low-level grinds and leans on background knowledge.

Most discussion of AI and the IMO focuses on gold medals, but that’s not what deserves the most attention.

Examining o3-mini’s math reasoning: an erudite, vibes-based solver that excels in knowledge but lacks precision, creativity, and the formal rigor of human mathematicians.

How do humans and AIs compare on FrontierMath? We ran a competition at MIT to put this to the test.

We clarify that OpenAI commissioned Epoch AI to produce 300 math questions for the FrontierMath benchmark. OpenAI owns these problems and has access to the statements and solutions, except for a 50-question holdout set.

We are hosting a competition to establish rigorous human performance baselines for FrontierMath. The competition offers a $10,000 prize pool, and your participation will contribute directly to measuring AI progress in solving challenging mathematical problems.

How will AI transform mathematics? Fields Medalists and other leading mathematicians discuss whether they expect AI to automate advanced math research.

FrontierMath: a new benchmark of expert-level math problems designed to measure AI's mathematical abilities. See how leading AI models perform against the collective mathematics community.