Benchmarking updates

September 29, 2025
Claude Sonnet 4.5 has set a new state of the art in our evaluations on SWE-Bench Verified.
See results for Claude Sonnet 4.5
July 11, 2025
Introducing FrontierMath Tier 4: a benchmark of extremely challenging research-level math problems, designed to test the limits of AI’s reasoning capabilities.
Read our announcement thread
July 10, 2025
SWE-Bench can be tricky to run. We released a public registry of Docker containers that makes running it fast and easy.
See how