Benchmark scores are well correlated, even across domains

Epoch's work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.

We visualize correlations between the benchmarks in our Benchmarking Hub using a pairwise correlation matrix. All correlations are Spearman (rank) correlations.
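As a rough illustration of how such a matrix could be built, the sketch below computes Spearman correlations for every pair of benchmarks over the models they share. It is not Epoch's code: the function names, the `scores` layout, and the toy numbers are all hypothetical, and the overlap threshold of 5 is borrowed from the text on the assumption that it applies per benchmark pair.

```python
from itertools import combinations
from statistics import mean

MIN_SHARED_MODELS = 5  # assumed per-pair overlap threshold, taken from the text


def average_ranks(values):
    """Rank values from 1 to n; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation, or None when either input has zero variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


def pairwise_spearman(scores, min_shared=MIN_SHARED_MODELS):
    """scores maps benchmark -> {model: score}; returns {(bench_a, bench_b): rho}."""
    result = {}
    for a, b in combinations(sorted(scores), 2):
        shared = sorted(set(scores[a]) & set(scores[b]))
        if len(shared) < min_shared:
            continue  # too few models in common to report a correlation
        rho = spearman([scores[a][m] for m in shared],
                       [scores[b][m] for m in shared])
        if rho is not None:
            result[(a, b)] = rho
    return result


# Invented scores for three hypothetical benchmarks and six hypothetical models.
toy_scores = {
    "bench_math": {"m1": 10, "m2": 25, "m3": 40, "m4": 55, "m5": 70, "m6": 80},
    "bench_code": {"m1": 5, "m2": 30, "m3": 20, "m4": 60, "m5": 65, "m6": 90},
    "bench_qa": {"m1": 50, "m2": 60, "m3": 70},  # shares only 3 models with the others
}
toy_matrix = pairwise_spearman(toy_scores)
```

Here `bench_qa` drops out of the result because it shares fewer than five models with either other benchmark, which is the role the minimum-overlap filter plays.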

Across 17 benchmarks, each sharing at least 5 evaluated models with every other benchmark, the median rank correlation is 0.73. Correlations are nearly as high across benchmark categories as within them: the median correlation is 0.68 among benchmarks from different categories and 0.79 among benchmarks from the same category. This high degree of agreement between benchmarks motivates our Epoch Capabilities Index (ECI), which is designed to capture a single capability factor. Unsurprisingly, ECI correlates well with the underlying benchmarks.
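The within-category and cross-category medians can be sketched as a simple split over benchmark pairs. This is a minimal illustration under assumptions, not the analysis behind the reported figures: the function name, the benchmark names, the category labels, and the correlation values below are all invented.

```python
from statistics import median


def median_or_none(values):
    """Median of a list, or None if the list is empty."""
    return median(values) if values else None


def summarize_by_category(pair_correlations, category_of):
    """Split pairwise correlations by whether both benchmarks share a category.

    pair_correlations maps (bench_a, bench_b) -> rank correlation.
    category_of maps benchmark -> category label.
    """
    within, across = [], []
    for (a, b), rho in pair_correlations.items():
        if category_of[a] == category_of[b]:
            within.append(rho)
        else:
            across.append(rho)
    return {
        "overall": median_or_none(within + across),
        "within": median_or_none(within),
        "across": median_or_none(across),
    }


# Hypothetical benchmarks, categories, and correlation values for illustration only.
toy_categories = {
    "math_a": "math", "math_b": "math",
    "code_a": "coding", "code_b": "coding",
}
toy_correlations = {
    ("math_a", "math_b"): 0.8,
    ("code_a", "code_b"): 0.9,
    ("math_a", "code_a"): 0.6,
    ("math_b", "code_b"): 0.7,
    ("math_a", "code_b"): 0.5,
}
summary = summarize_by_category(toy_correlations, toy_categories)
```

Each benchmark pair is counted once, so the overall median is taken over the union of the two groups rather than averaged from the two group medians.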

Related insights

Epoch’s Capabilities Index stitches together benchmarks across a wide range of difficulties

LLMs have not yet solved the hardest problems on high school math contests
