Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.
Claude Opus 4.7 scores 156 on ECI, the Epoch Capabilities Index, putting it slightly ahead of Opus 4.6 and behind only GPT-5.4, Gemini 3.1 Pro, and GPT-5.4 Pro.
We released early results from MirrorCode, a new long-horizon software engineering benchmark co-developed with METR, showing that AI can already complete some coding tasks that take humans weeks.
AI has solved one of the problems in FrontierMath: Open Problems, our benchmark of real research problems that mathematicians have tried and failed to solve.
Need deeper insights? Our team offers custom research and advisory services.
Book a consultation