Claude overperforms at software engineering and underperforms at math

Relative to their general Epoch Capabilities Index (ECI) values, Anthropic’s Claude models overperform on software engineering benchmarks (aggregated by the SWE-ECI) and underperform on math (Math-ECI). The SWE overperformance has been consistent across most generations and persists in recent models. The math gap may be narrowing: Opus 4.6 and 4.7 both have Math-ECIs within 1 point of their general ECI, compared to larger gaps for earlier models.

These results come from our Domain-specific ECI Explorer, where you can view Math and SWE ECIs, or design your own variant. The ECI methodology compares performance relative to other large language models, and thus reflects how difficult tasks are on average for AIs, not how difficult they are for humans.

Epoch's work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.

Learn more about this graph

Using Domain-specific Epoch Capabilities Index results, we compare the relative math and software engineering abilities of Anthropic's Claude models by looking at their general ECIs, Math-ECIs and SWE-ECIs. We include only Anthropic models with at least 2 math and 2 SWE benchmarks among the benchmarks used to calculate the ECI.

See our documentation for Domain-specific ECI for details on our methodology.
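
The inclusion rule and comparison described above can be sketched in a few lines of Python. This is a minimal illustration rather than Epoch's actual pipeline: the `ModelRecord` fields, the model names, and all numeric values are hypothetical placeholders, and reading over- or underperformance as the domain ECI minus the general ECI is an assumption based on the wording of this page.

```python
from dataclasses import dataclass

# Threshold stated in the text: at least 2 math and 2 SWE benchmarks.
MIN_BENCHMARKS_PER_DOMAIN = 2


@dataclass
class ModelRecord:
    """Hypothetical record layout; not the schema of the published CSV."""
    name: str
    general_eci: float
    math_eci: float
    swe_eci: float
    n_math_benchmarks: int
    n_swe_benchmarks: int


def is_eligible(model: ModelRecord) -> bool:
    """Apply the inclusion rule: enough benchmarks in both domains."""
    return (
        model.n_math_benchmarks >= MIN_BENCHMARKS_PER_DOMAIN
        and model.n_swe_benchmarks >= MIN_BENCHMARKS_PER_DOMAIN
    )


def domain_gaps(model: ModelRecord) -> dict[str, float]:
    """Assumed gap definition: domain ECI minus general ECI.

    Positive means overperformance relative to the general ECI,
    negative means underperformance.
    """
    return {
        "math": model.math_eci - model.general_eci,
        "swe": model.swe_eci - model.general_eci,
    }


def summarize(records: list[ModelRecord]) -> dict[str, dict[str, float]]:
    """Compute gaps for every model that passes the inclusion rule."""
    return {m.name: domain_gaps(m) for m in records if is_eligible(m)}


# Placeholder values for illustration only; these are NOT real ECI scores.
records = [
    ModelRecord("model-a", 100.0, 97.0, 104.0, 3, 2),
    ModelRecord("model-b", 110.0, 109.5, 113.0, 2, 2),
    ModelRecord("model-c", 105.0, 101.0, 108.0, 1, 4),  # only 1 math benchmark
]

summary = summarize(records)
for name, gaps in summary.items():
    print(f"{name}: math {gaps['math']:+.1f}, SWE {gaps['swe']:+.1f}")
```

In this sketch, `model-c` is dropped because it has only one math benchmark, and `model-b` would count as having a Math-ECI within 1 point of its general ECI.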

Limitations

Download this data

Claude domain-specific ECI data

CSV, Updated May 15, 2026
