Benchmarking updates

Jul. 1, 2026

We recently began tracking 13 new evals on our benchmarking hub. Of these, 7 have been incorporated into the Epoch Capabilities Index (ECI).

Browse the benchmarks

Jun. 22, 2026

We added nine new external benchmarks to the hub, spanning agentic work, cybersecurity, algorithm engineering, forecasting, and research-level physics.

Browse the benchmarks

Jun. 15, 2026

Claude Fable 5 achieves a new high score of 161 on the ECI, beating GPT-5.5 Pro by 1 point. This is the first time Anthropic has taken the lead on the ECI in over a year.

See the thread

Trusted by leaders at OpenAI, DeepMind, and governments worldwide

Need deeper insights? Our team offers custom research and advisory services.

Book a consultation

Data on AI Capabilities and Benchmarking

Our database of benchmark results features the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends in AI capabilities across time, by benchmark, or by model.

AI Capabilities

Filter

Epoch AI–run benchmarks

Benchmark creator–run benchmarks

Model developer–run benchmarks
