Data Insight

Nov. 27, 2024

Updated Feb. 7, 2025

US models currently outperform non-US models

The best US models have consistently achieved higher accuracy than the best non-US models on GPQA Diamond and MATH Level 5. For example, the best-performing model on GPQA Diamond is OpenAI’s o1, while the leading model on MATH Level 5 is OpenAI’s o3-mini.

However, with the release of DeepSeek-R1 in January 2025, the gap between US and non-US models has narrowed substantially: DeepSeek-R1 trails o3-mini by only 2 percentage points on MATH Level 5, and scores only 4 percentage points lower than o1 on GPQA Diamond.
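The gaps quoted above are differences in percentage points between the best US and the best non-US accuracy on each benchmark. A minimal sketch of that arithmetic follows. The `Result` record, the `best_gap` helper, the model names, and every accuracy value are hypothetical placeholders, not Epoch's data or code; the placeholder scores were chosen only so that the resulting gaps match the 4 and 2 points stated in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    benchmark: str
    model: str       # hypothetical model label, not a real system
    is_us: bool      # True if the developer is US-based
    accuracy: float  # accuracy in percent (0-100)


def best_gap(results: list[Result], benchmark: str) -> float:
    """Best US accuracy minus best non-US accuracy, in percentage points."""
    rows = [r for r in results if r.benchmark == benchmark]
    best_us = max(r.accuracy for r in rows if r.is_us)
    best_non_us = max(r.accuracy for r in rows if not r.is_us)
    return best_us - best_non_us


# Placeholder accuracies for illustration only; these are NOT measured scores.
results = [
    Result("GPQA Diamond", "us-model-a", True, 70.0),
    Result("GPQA Diamond", "us-model-b", True, 60.0),
    Result("GPQA Diamond", "non-us-model-a", False, 66.0),
    Result("MATH Level 5", "us-model-a", True, 90.0),
    Result("MATH Level 5", "non-us-model-a", False, 88.0),
    Result("MATH Level 5", "non-us-model-b", False, 75.0),
]

for name in ("GPQA Diamond", "MATH Level 5"):
    print(f"{name}: gap = {best_gap(results, name):.1f} percentage points")
```

A positive gap means the best US model leads. Note that the comparison is in percentage points (a difference of accuracies), not a relative percentage change.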

Epoch's work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.

