FrontierCode

FrontierCode is a benchmark from Cognition that evaluates whether an AI coding agent’s patch for a real open-source issue is good enough to merge, not just whether it passes tests. Each task pairs a checked-out repository with a single issue, and the agent works autonomously in a container. Patches are graded against held-out tests and a maintainer-authored rubric covering behavioral correctness, regression safety, build and style cleanliness, and adherence to project conventions. FrontierCode 1.1 reports two nested subsets: Main contains the 100 hardest tasks, while Extended contains all 150 tasks.

Methodology

We source results from Cognition’s public FrontierCode leaderboard, using only the current 1.1 revision. Our chart reports the Main score: each model’s rubric score on the 100-task Main subset at its best-performing reasoning effort. Cognition also reports a separate binary pass rate, which we do not show.
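As a rough illustration of the "best-performing reasoning effort" selection, the sketch below keeps one row per model: the one with the highest Main score. The field names (`model`, `effort`, `main_score`) and all values are hypothetical placeholders, not the leaderboard's actual schema or results.

```python
def best_effort_rows(rows):
    """Return, for each model, the row with the highest Main score.

    `rows` is a list of dicts with hypothetical keys:
    "model", "effort", and "main_score" (rubric score on the Main subset).
    """
    best = {}
    for row in rows:
        current = best.get(row["model"])
        # Keep the first row seen for a model, then replace it only
        # when a later row has a strictly higher Main score.
        if current is None or row["main_score"] > current["main_score"]:
            best[row["model"]] = row
    return best


# Placeholder data for illustration only; these are not real results.
example_rows = [
    {"model": "model-a", "effort": "low", "main_score": 0.25},
    {"model": "model-a", "effort": "high", "main_score": 0.5},
    {"model": "model-b", "effort": "medium", "main_score": 0.375},
]

charted = best_effort_rows(example_rows)
```

With this placeholder data, `charted["model-a"]` is the `high` effort row, so the chart would show one point per model even though several efforts were evaluated.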

FrontierCode grades each submission with a mean@5 aggregation against a weighted rubric, where failing any blocker criterion yields a zero. Models are run through agent harnesses such as Claude Code, Codex, mini-SWE-agent, and Devin; we keep each model’s harness and best-performing reasoning effort in the data export. FrontierCode 1.1 distinguishes legitimate internet use from consulting solution-bearing sources, zeroing runs flagged for unfair internet use.
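The scoring rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Cognition's implementation: we assume a run's rubric score is the weight-normalized fraction of passed criteria, that mean@5 is the plain average over five independent runs of a task, and that a flagged run simply counts as zero. The dictionary keys (`weight`, `passed`, `blocker`, `flagged`, `criteria`) are hypothetical.

```python
from statistics import mean


def rubric_score(criteria):
    """Score one run against a weighted rubric.

    Assumption: the score is earned weight divided by total weight,
    and failing any blocker criterion forces the score to zero.
    """
    if any(c["blocker"] and not c["passed"] for c in criteria):
        return 0.0
    total = sum(c["weight"] for c in criteria)
    earned = sum(c["weight"] for c in criteria if c["passed"])
    return earned / total


def task_score(runs):
    """Average the per-run scores for one task (mean@5 when given 5 runs).

    Assumption: runs flagged for unfair internet use contribute zero.
    """
    scores = [
        0.0 if run["flagged"] else rubric_score(run["criteria"])
        for run in runs
    ]
    return mean(scores)


def make_criteria(blocker_ok, style_ok):
    """Build a small hypothetical rubric: one blocker and two weighted checks."""
    return [
        {"weight": 2, "passed": blocker_ok, "blocker": True},
        {"weight": 1, "passed": True, "blocker": False},
        {"weight": 1, "passed": style_ok, "blocker": False},
    ]


# Five hypothetical runs of a single task.
example_runs = [
    {"flagged": False, "criteria": make_criteria(True, True)},   # full credit
    {"flagged": False, "criteria": make_criteria(True, False)},  # partial credit
    {"flagged": False, "criteria": make_criteria(False, True)},  # blocker failed
    {"flagged": True, "criteria": make_criteria(True, True)},    # zeroed by flag
    {"flagged": False, "criteria": make_criteria(True, True)},   # full credit
]

example_task_score = task_score(example_runs)
```

Under these assumptions, a single blocker failure or internet-use flag pulls the task average down sharply, which is why a rubric score of this kind can diverge from a binary pass rate.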
