ARC-AGI-2

ARC-AGI-2, developed by the ARC Prize Foundation, is the second iteration of the Abstraction and Reasoning Corpus. Like its predecessor, it presents systems with grid-based input-output demonstrations and asks them to infer the underlying transformation rule and apply it to a novel test input, allowing two attempts per test case (pass@2). The benchmark comprises 1,360 tasks in total: 1,000 training tasks and 360 evaluation tasks, with the evaluation tasks split evenly into public, semi-private, and private sets of 120 each.
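
The pass@2 rule can be made concrete with a short sketch. It assumes the JSON-style task layout used in the public ARC repositories (a task holds `train` and `test` lists of `{"input": grid, "output": grid}` pairs, where a grid is a list of rows of integers 0-9), and it assumes a task's score is the fraction of its test inputs solved. The helper names and the toy "mirror left-to-right" task are hypothetical illustrations, not the ARC Prize Foundation's official scoring code.

```python
from typing import List, Sequence

# A grid is a list of rows; each cell is a colour index 0-9 (assumed layout).
Grid = List[List[int]]


def score_test_case(attempts: Sequence[Grid], target: Grid, max_attempts: int = 2) -> bool:
    """pass@2: solved if any of the first two attempts matches the target exactly.

    List equality compares shape and every cell, so a near miss counts as wrong.
    Attempts beyond `max_attempts` are ignored.
    """
    return any(attempt == target for attempt in attempts[:max_attempts])


def score_task(predictions: Sequence[Sequence[Grid]], targets: Sequence[Grid]) -> float:
    """Fraction of a task's test inputs solved (hypothetical aggregation rule)."""
    if len(predictions) != len(targets):
        raise ValueError("need one list of attempts per test input")
    solved = sum(score_test_case(p, t) for p, t in zip(predictions, targets))
    return solved / len(targets)


# Hypothetical toy task: the rule is "mirror the grid left-to-right".
task = {
    "train": [{"input": [[1, 0], [0, 0]], "output": [[0, 1], [0, 0]]}],
    "test": [{"input": [[0, 0], [1, 0]], "output": [[0, 0], [0, 1]]}],
}
targets = [case["output"] for case in task["test"]]

# First attempt copies the input (wrong); second attempt applies the mirror (right).
print(score_task([[[[0, 0], [1, 0]], [[0, 0], [0, 1]]]], targets))  # 1.0

# A correct answer only on a third attempt earns nothing under pass@2.
print(score_task([[[[0, 0], [1, 0]], [[0, 0], [1, 0]], [[0, 0], [0, 1]]]], targets))  # 0.0
```

Exact matching is deliberately strict: a prediction with the right pattern but one wrong cell, or the wrong output dimensions, scores zero.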

ARC-AGI-2 is designed to be substantially harder for AI systems while remaining accessible to humans. Tasks that were susceptible to brute-force program search in the original ARC-AGI have been removed, and new tasks target weaknesses in symbolic interpretation, compositional reasoning, and contextual rule application: areas where current systems struggle to assign meaning beyond surface-level visual patterns or to apply multiple interacting rules simultaneously. Controlled human testing with over 400 participants confirmed that every evaluation task was solved by at least two people in two attempts or fewer, and the average human score was 60%. At the time of ARC-AGI-2's release, pure LLMs scored 0% and frontier reasoning systems achieved only single-digit percentages.
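
The human-solvability criterion combines two thresholds (at least two solvers, each within two attempts), which are easy to conflate. The sketch below checks it against a hypothetical record format: for each task, a list with one entry per participant giving the number of attempts they needed, or `None` if they did not solve it. The record layout, task IDs, and helper names are illustrative assumptions, not the ARC Prize Foundation's actual testing data or protocol.

```python
from typing import Dict, List, Optional


def meets_human_criterion(
    attempts_needed: List[Optional[int]], min_solvers: int = 2, max_attempts: int = 2
) -> bool:
    """True if at least `min_solvers` participants solved the task within `max_attempts`."""
    solvers = sum(1 for n in attempts_needed if n is not None and n <= max_attempts)
    return solvers >= min_solvers


def unverified_tasks(records: Dict[str, List[Optional[int]]]) -> List[str]:
    """IDs of tasks that fail the criterion, sorted for stable output."""
    return sorted(tid for tid, r in records.items() if not meets_human_criterion(r))


# Hypothetical records: attempts needed per participant (None = unsolved).
records = {
    "task_a": [1, None, 2, 3],  # two solvers within two attempts: passes
    "task_b": [1, None, None],  # only one solver: fails
    "task_c": [3, 3, 1],        # three solvers, but only one within two attempts: fails
}
print(unverified_tasks(records))  # ['task_b', 'task_c']
```

`task_c` shows why both thresholds matter: every participant eventually solved it, yet it would still fail a "two people in two attempts or fewer" requirement.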
