Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.
We recently began tracking 13 new evals on our benchmarking hub. 7 of these have been incorporated into the Epoch Capabilities Index (ECI).
We added nine new external benchmarks to the hub, spanning agentic work, cybersecurity, algorithm engineering, forecasting, and research-level physics.
Claude Fable 5 achieves a new high score of 161 on the ECI, beating GPT-5.5 Pro by 1 point. This is the first time Anthropic has taken the lead on the ECI in over a year.
Need deeper insights? Our team offers custom research and advisory services.
Book a consultation