Benchmarking updates

May 30, 2026

We took another look at the capability gap between open-weight and proprietary models. Since the start of the year, open-weight models have lagged the state of the art by four months.

Apr. 28, 2026

GPT-5.5 Pro achieves a new high score of 159 on the Epoch Capabilities Index.

Apr. 10, 2026

We released early results from MirrorCode, a new long-horizon software engineering (SWE) benchmark co-developed with METR, showing that AI can already complete some weeks-long coding tasks.

Trusted by leaders at OpenAI, DeepMind, and governments worldwide

Need deeper insights? Our team offers custom research and advisory services.

Data on AI Capabilities and Benchmarking

Our database of benchmark results tracks the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.