The Agent Company

The Agent Company runs agents, often through open-source frameworks such as OpenHands, on practical tasks and reports four metrics: resolution rate, overall score, steps taken, and cost. Submissions are reproducible and patch outputs are often verified, which helps separate robust agents from prompt-sensitive baselines.
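The four metrics can be aggregated from per-task results roughly as follows. This is a minimal sketch under stated assumptions: the `TaskResult` record and the `summarize` helper are hypothetical. It counts a task as resolved only when the agent fully completes it, and it takes the overall score to be the mean per-task partial-credit score. The benchmark's actual scoring rules and data schema may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    # Hypothetical per-task record; field names are illustrative, not the benchmark's schema.
    resolved: bool    # assumption: True only if the task was fully completed
    score: float      # assumption: partial-credit score in [0, 1]
    steps: int        # number of agent actions taken
    cost_usd: float   # model/API spend for the run


def summarize(results: list[TaskResult]) -> dict[str, float]:
    """Aggregate per-task results into benchmark-level metrics (hypothetical helper)."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        # Fraction of tasks fully resolved.
        "resolution_rate": sum(r.resolved for r in results) / len(results),
        # Mean partial-credit score, which rewards progress on unfinished tasks.
        "overall_score": mean(r.score for r in results),
        # Efficiency metrics: average steps and average cost per task.
        "avg_steps": mean(r.steps for r in results),
        "avg_cost_usd": mean(r.cost_usd for r in results),
    }


# Illustrative, made-up results for four tasks.
runs = [
    TaskResult(resolved=True, score=1.0, steps=12, cost_usd=0.40),
    TaskResult(resolved=False, score=0.5, steps=30, cost_usd=1.10),
    TaskResult(resolved=False, score=0.0, steps=8, cost_usd=0.25),
    TaskResult(resolved=True, score=1.0, steps=20, cost_usd=0.65),
]
print(summarize(runs))
```

Reporting resolution rate next to a partial-credit score shows whether an agent fails outright or makes measurable progress on tasks it does not finish, while steps and cost indicate how efficiently it gets there.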

This benchmark focuses on real-world agent behavior (reading, editing, executing, and iterating) rather than on synthetic proxies. It is useful for gauging an agent's readiness for developer workflows and tool-augmented problem solving.
