The Epoch Capabilities Index (ECI) combines scores from many different AI benchmarks into a single “general capability” scale, allowing comparisons between models even over timespans long enough for single benchmarks to reach saturation.
ECI is a composite metric that draws on scores from 37 distinct benchmarks. At a high level, it stitches these benchmarks together and determines their relative difficulty by comparing results wherever the same model has been evaluated on more than one benchmark. A model earns a higher ECI score when it performs better on harder benchmarks.
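The stitching idea can be sketched as a toy example. This is an illustration only, not the ECI implementation: it assumes a simple logistic relationship between a model's capability and a benchmark's difficulty, and every name and score in it (`model_a`, `bench_easy`, the `fit` function, the numbers) is hypothetical. The actual model is described in the Methodology section.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical scores (fraction correct); NOT real benchmark data.
# No model is scored on every benchmark, but model_b links the two.
scores = {
    ("model_a", "bench_easy"): 0.50,
    ("model_b", "bench_easy"): 0.88,
    ("model_b", "bench_hard"): 0.27,
    ("model_c", "bench_hard"): 0.73,
}

def fit(scores, anchor="bench_easy", steps=20000, lr=0.5):
    """Fit one capability per model and one difficulty per benchmark.

    Assumed toy model: expected score = sigmoid(capability - difficulty).
    The anchor benchmark's difficulty is pinned at 0 to fix the scale origin.
    """
    cap = {m: 0.0 for m, _ in scores}
    diff = {b: 0.0 for _, b in scores}
    for _ in range(steps):
        g_cap = {m: 0.0 for m in cap}
        g_diff = {b: 0.0 for b in diff}
        for (m, b), observed in scores.items():
            # Gradient of the cross-entropy loss with respect to the logit.
            g = sigmoid(cap[m] - diff[b]) - observed
            g_cap[m] += g
            g_diff[b] -= g
        for m in cap:
            cap[m] -= lr * g_cap[m]
        for b in diff:
            if b != anchor:  # keep the anchor fixed at 0
                diff[b] -= lr * g_diff[b]
    return cap, diff

cap, diff = fit(scores)
for name, value in sorted(cap.items(), key=lambda kv: kv[1]):
    print(f"{name}: capability {value:.2f}")
print(f"bench_hard difficulty relative to bench_easy: {diff['bench_hard']:.2f}")
```

In this sketch, `model_c` ends up with the highest fitted capability even though its raw score (0.73) is lower than `model_b`'s score on the easier benchmark (0.88). The overlap through `model_b` is what reveals that `bench_hard` is the harder of the two, and it is also what places `model_a` and `model_c` on the same scale despite their never sharing a benchmark.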
We give an overview of our methodology in the Methodology section; further technical details are available in our paper, A Rosetta Stone for AI Benchmarks. The paper was funded by Google DeepMind and written in collaboration with researchers from their AGI Safety & Alignment team. However, the ECI is an independent Epoch AI product over which Epoch has full rights.
Code for the ECI is available in a public repository here.
The interactive ECI leaderboard is available at epoch.ai/eci. For domain-specific ECI scores covering software engineering and math, see the Domain-specific ECI section.