AI Benchmarking
Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.
Benchmarking updates
February 20, 2026
We've updated our methodology for running agentic coding benchmarks.
Learn about the update
February 13, 2026
We analyzed three benchmarks — RLI, GDPval, and APEX-Agents — designed to test AI’s ability to do economically valuable digital work.
Read the report
January 27, 2026
We released FrontierMath: Open Problems, which tests AI on unsolved math research problems.
Discover the benchmark