Data Insight
May 15, 2026

Claude overperforms at software engineering and underperforms at math

Relative to their general Epoch Capabilities Index (ECI) values, Anthropic’s Claude models overperform on software engineering benchmarks (aggregated by the SWE-ECI) and underperform on math benchmarks (aggregated by the Math-ECI). The SWE overperformance has been consistent across most generations and persists in recent models. The math gap may be narrowing: Opus 4.6 and 4.7 both have Math-ECIs within 1 point of their general ECI, compared to larger gaps for earlier models.

These results come from our Domain-specific ECI Explorer, where you can view Math and SWE ECIs, or design your own variant. The ECI methodology compares performance relative to other large language models, and thus reflects how difficult tasks are on average for AIs, not how difficult they are for humans.

Epoch's work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.

Learn more about this graph

Using Domain-specific Epoch Capabilities Index results, we compare the relative math and software engineering abilities of Anthropic's Claude models by looking at their general ECIs, Math-ECIs, and SWE-ECIs. We include only Anthropic models with results on at least 2 math benchmarks and at least 2 SWE benchmarks from the set of benchmarks used to calculate the ECI.
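The comparison described above can be sketched in a few lines of Python. This is an illustrative sketch only, not Epoch's implementation: the `ModelScores` record, its field names, the helper functions, and every number below are hypothetical placeholders rather than real ECI data. It assumes that "overperformance" is read as the simple difference between a domain ECI and the general ECI, and it applies the inclusion rule of at least 2 math and 2 SWE benchmarks.

```python
from dataclasses import dataclass


@dataclass
class ModelScores:
    """Hypothetical record; field names are assumptions, not Epoch's schema."""
    name: str
    eci: float              # general ECI
    math_eci: float         # Math-ECI
    swe_eci: float          # SWE-ECI
    n_math_benchmarks: int  # math benchmarks with results for this model
    n_swe_benchmarks: int   # SWE benchmarks with results for this model


# Inclusion threshold stated in the text: at least 2 benchmarks per domain.
MIN_BENCHMARKS_PER_DOMAIN = 2


def is_eligible(m: ModelScores) -> bool:
    """Keep only models with enough benchmarks in both domains."""
    return (m.n_math_benchmarks >= MIN_BENCHMARKS_PER_DOMAIN
            and m.n_swe_benchmarks >= MIN_BENCHMARKS_PER_DOMAIN)


def domain_gaps(m: ModelScores) -> dict:
    """Domain ECI minus general ECI; positive means overperformance."""
    return {"math_gap": m.math_eci - m.eci, "swe_gap": m.swe_eci - m.eci}


# Placeholder values chosen for illustration; they are NOT real ECI results.
models = [
    ModelScores("placeholder-model-a", 140.0, 137.0, 144.0, 3, 2),
    ModelScores("placeholder-model-b", 150.0, 149.5, 153.0, 2, 4),
    ModelScores("placeholder-model-c", 130.0, 128.0, 135.0, 1, 3),  # too few math benchmarks
]

gaps = {m.name: domain_gaps(m) for m in models if is_eligible(m)}

for name, g in gaps.items():
    print(f"{name}: math {g['math_gap']:+.1f}, SWE {g['swe_gap']:+.1f}")
```

In this sketch, a negative `math_gap` alongside a positive `swe_gap` would correspond to the pattern described in the text, and a `math_gap` within 1 point of zero would correspond to the narrowing math gap.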

See our documentation for Domain-specific ECI for details on our methodology.

Limitations
