FrontierSWE

FrontierSWE evaluates coding agents on hard, long-horizon software engineering tasks collected from real-world technical domains. The benchmark covers implementation tasks, performance engineering tasks, and research tasks. The public leaderboard reports each system's average rank across tasks and a dominance metric, defined as the system's win rate against a randomly chosen opponent on a task.
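
The sketch below shows one way to compute the dominance metric described above. It is illustrative only and makes several assumptions. The opponent is drawn uniformly from the other systems on the leaderboard. A tie counts as half a win. Per-task win rates are averaged with equal weight across tasks. The `dominance` function and the example scores are hypothetical. They are not the leaderboard's code or data.

```python
# Hypothetical input: scores[system][t] is the score of `system` on task t (higher is better).
# Names and numbers are illustrative, not FrontierSWE data.


def dominance(scores: dict[str, list[float]]) -> dict[str, float]:
    """Mean over tasks of each system's win rate against a uniformly random other system.

    Assumption: a tie counts as half a win.
    """
    systems = list(scores)
    n_tasks = len(scores[systems[0]])
    result: dict[str, float] = {}
    for s in systems:
        total = 0.0
        for t in range(n_tasks):
            wins = 0.0
            for o in systems:
                if o == s:
                    continue  # a system is never its own opponent
                if scores[s][t] > scores[o][t]:
                    wins += 1.0
                elif scores[s][t] == scores[o][t]:
                    wins += 0.5
            total += wins / (len(systems) - 1)  # win rate on task t
        result[s] = total / n_tasks  # equal weight per task (assumption)
    return result


example = {"A": [0.9, 0.2], "B": [0.5, 0.6], "C": [0.1, 0.6]}
print(dominance(example))  # {'A': 0.5, 'B': 0.625, 'C': 0.375}
```

Under these assumptions, suppose there are no ties and every system runs every task. A system ranked r among n systems on a task beats n − r of its n − 1 possible opponents. Its win rate on that task is therefore (n − r)/(n − 1). Dominance and average rank then carry closely related information.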

Methodology

We source results from the public FrontierSWE leaderboard. Our chart defaults to the leaderboard’s dominance metric. We also keep the leaderboard’s average rank and category-specific ranks for implementation, performance, and research tasks in the data export.

FrontierSWE evaluates coding agents on long-horizon software engineering tasks that are intended to require a substantial amount of work. The tasks are grouped into implementation, performance, and research categories. Submissions are run through agent harnesses such as Claude Code, Codex, Gemini CLI, and Kimi CLI. The leaderboard aggregates repeated runs using its Mean@5 setting and reports both an overall average rank and category-specific ranks. We use dominance as the headline metric because it summarizes how often a system would beat another randomly selected system on a task.
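
The following sketch shows how Mean@5 aggregation and average rank could fit together. It is a minimal illustration under stated assumptions, not the leaderboard's implementation. It assumes Mean@5 means averaging each system's score over five runs of a task. Systems are then ranked per task (1 = best), with tied systems sharing the average of the ranks they span. Those ranks are averaged over all tasks, or over only the tasks in one category. The data layout and the function names `task_ranks` and `average_rank` are hypothetical.

```python
from statistics import mean
from typing import Optional

# Hypothetical layout: runs[task][system] is a list of per-run scores (higher is better).
Runs = dict[str, dict[str, list[float]]]


def task_ranks(task_scores: dict[str, float]) -> dict[str, float]:
    """Rank systems on one task (1 = best); tied systems share the average of their ranks."""
    ordered = sorted(task_scores, key=lambda s: task_scores[s], reverse=True)
    ranks: dict[str, float] = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j over every system tied with ordered[i].
        while j + 1 < len(ordered) and task_scores[ordered[j + 1]] == task_scores[ordered[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = shared
        i = j + 1
    return ranks


def average_rank(runs: Runs, categories: dict[str, str],
                 category: Optional[str] = None) -> dict[str, float]:
    """Average per-task rank, optionally restricted to tasks in one category."""
    per_system: dict[str, list[float]] = {}
    for task, by_system in runs.items():
        if category is not None and categories[task] != category:
            continue
        # Assumption: Mean@5 = arithmetic mean of each system's five run scores.
        mean_scores = {system: mean(scores) for system, scores in by_system.items()}
        for system, r in task_ranks(mean_scores).items():
            per_system.setdefault(system, []).append(r)
    return {system: mean(rs) for system, rs in per_system.items()}


runs = {
    "t1": {"A": [0.8] * 5, "B": [0.4] * 5, "C": [0.4] * 5},
    "t2": {"A": [0.1, 0.2, 0.3, 0.2, 0.2], "B": [0.9] * 5, "C": [0.5] * 5},
}
categories = {"t1": "implementation", "t2": "research"}
print(average_rank(runs, categories))              # {'A': 2.0, 'B': 1.75, 'C': 2.25}
print(average_rank(runs, categories, "research"))  # {'B': 1.0, 'C': 2.0, 'A': 3.0}
```

Averaging ranks for ties keeps each task's ranks summing to the same total, so a tie does not favor or penalize the tied systems. The leaderboard may break ties differently.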
