CritPt

CritPt (short for “Complex Research using Integrated Thinking - Physics Test,” and a play on “critical point”) is the first benchmark to test whether AI can reason through complex, open-ended, research-level physics problems. It was created by more than 50 active physics researchers from over 30 institutions, in an effort led by Argonne National Laboratory and the University of Illinois. Its problems are modeled on entry-level original research projects rather than textbook exercises, and they span around a dozen modern physics subfields. All problems are original and unpublished to avoid contamination.

Methodology

We source results from the public CritPt leaderboard, hosted by Artificial Analysis.

CritPt contains 71 composite research challenges decomposed into roughly 190 checkpoint sub-tasks, created and reviewed by domain experts with an average of more than 40 hours of expert review per challenge. Answers use guess-resistant, machine-verifiable formats such as numerical arrays, symbolic expressions, and Python functions, scored by an automated physics-customized grading pipeline. The headline metric is accuracy; scores are very low, reflecting the benchmark’s difficulty.
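To make the idea of a machine-verifiable answer concrete, here is a minimal, hypothetical Python sketch of how a numerical-array answer could be checked against a reference within a tolerance and then aggregated into an accuracy score. The function names (`numeric_array_matches`, `accuracy`), the 1% relative tolerance, and the example answers are illustrative assumptions. This is not CritPt's actual grading pipeline, which also handles symbolic expressions and Python functions.

```python
import math
from typing import Sequence


def numeric_array_matches(predicted: Sequence[float], reference: Sequence[float],
                          rel_tol: float = 1e-2, abs_tol: float = 1e-8) -> bool:
    """Return True only if the shapes match and every entry agrees within tolerance.

    The tolerance values are assumptions chosen for illustration.
    """
    if len(predicted) != len(reference):
        return False  # wrong shape earns no credit
    return all(math.isclose(p, r, rel_tol=rel_tol, abs_tol=abs_tol)
               for p, r in zip(predicted, reference))


def accuracy(results: Sequence[bool]) -> float:
    """Fraction of graded items answered correctly."""
    if not results:
        raise ValueError("no graded items")
    return sum(results) / len(results)


# Hypothetical graded answers: (model answer, reference answer)
graded = [
    ([1.000, 2.001], [1.0, 2.0]),  # within 1% relative tolerance -> correct
    ([0.5, 3.0], [0.5, 2.0]),      # second entry off by 50% -> incorrect
    ([4.2], [4.2, 0.0]),           # wrong shape -> incorrect
]
scores = [numeric_array_matches(p, r) for p, r in graded]
print(scores, accuracy(scores))
```

Requiring the shape to match and every entry to fall within tolerance means that, in this sketch, a partially correct or wrongly shaped answer earns no credit.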

For full details, see the CritPt paper and code.
