Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.
We released early results from MirrorCode, a new long-horizon SWE benchmark co-developed with METR, showing that AI models can already complete some weeks-long coding tasks.
We evaluated Meta's MuseSpark on our benchmarking suite and estimated a preliminary ECI score of 154, placing it slightly above GPT-5.2 and slightly below Opus 4.6.
An AI model has solved one of the problems in FrontierMath: Open Problems, our benchmark of real research problems that mathematicians have tried and failed to solve.
Need deeper insights? Our team offers custom research and advisory services.
Book a consultation