Clarifying the creation and use of the FrontierMath benchmark

Published

Jan 23, 2025

Authors

FrontierMath is a benchmark we created to evaluate the mathematical capabilities of frontier AI models. We saw a need for high-quality, challenging mathematical problems that could meaningfully test the limits of these systems. This remains our core mission—to help the AI community and the public at large accurately understand and measure AI capabilities.

Building high-quality evaluations at this scale requires substantial resources. After approaching several potential funders, we partnered with OpenAI, who provided both the necessary funding and technical expertise to develop the benchmark.¹ Working with industry sponsors helps make the benchmark more impactful for the AI field.

However, we recognize we have not communicated clearly enough about the relationship between FrontierMath and OpenAI, leading to questions and concerns among contributors, researchers, and the public. To address these issues, here are the facts:

OpenAI commissioned Epoch AI to produce 300 advanced math problems for AI evaluation that form the core of the FrontierMath benchmark. As is typical of commissioned work, OpenAI retains ownership of these questions and has access to the problems and solutions, with the exception of a holdout set.
Epoch AI is free to conduct and publish evaluations of any models using the FrontierMath problem set commissioned by OpenAI. However, we cannot share the questions and answers with other parties without written permission from OpenAI.
We are finalizing a 50-problem holdout set for which OpenAI will receive only the problem statements, not the solutions. This allows us to independently test models from OpenAI and other developers on problems whose solutions no AI developer has access to. OpenAI will still own these problems, as they commissioned them.
Our agreement did not prevent us from disclosing to our contributors that this work was sponsored by an AI company. Many contributors were unaware of these details, and our communication with them should have been more systematic and transparent.
Per our agreement, we needed OpenAI’s permission before publicly disclosing their involvement. We requested this permission ahead of the benchmark announcement in November 2024. We received it ahead of the o3 announcement, then updated our paper and publicly announced the partnership. At that point, we did not clarify the data access and ownership agreement with OpenAI. We are doing so now.

OpenAI has further commissioned work from Epoch AI to expand FrontierMath to include even higher-difficulty math problems. For this and future work, we will improve our disclosure practices to ensure that the public and all future contributors receive clear information about industry sponsorship and data access agreements from the outset. We are also reaching out individually to contributing mathematicians to address their questions and concerns.

Going forward, we will ensure that all contributors have access to information about industry funding and data access agreements before they participate, and we will proactively and publicly disclose benchmark sponsorship and data access agreements.

Notes

¹ For full disclosure, a pilot of the benchmark was funded by Nat Friedman.

About the authors

Former employee

Tamay Besiroglu co-founded Epoch AI and continues to contribute to the organization as a research advisor. He left Epoch to co-lead Mechanize, a startup building virtual work environments, benchmarks, and training data for AI development. His research focuses on the economics of computing and broader trends in machine learning.

Jaime Sevilla is the director of Epoch AI. His research is focused on technological forecasting and the trajectory of AI. He has a background in Mathematics and Computer Science.
