Our database of benchmark results tracks the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.
GPT-5.5 Pro achieves a new high score of 159 on the Epoch Capabilities Index.
We released early results from MirrorCode, a new long-horizon SWE benchmark co-developed with METR, showing that AI can already complete some weeks-long coding tasks.
AI has solved one of the problems in FrontierMath: Open Problems, our benchmark of real research problems that mathematicians have tried and failed to solve.