Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.
We took another look at the capability gap between open-weight and proprietary models. Since the start of the year, open-weight models have lagged the state of the art by four months.
GPT-5.5 Pro achieves a new high score of 159 on the Epoch Capabilities Index.
We released early results from MirrorCode, a new long-horizon SWE benchmark co-developed with METR, showing that AI can already complete some weeks-long coding tasks.
Need deeper insights? Our team offers custom research and advisory services.
Book a consultation