The Agent Company
A community-run evaluation of end-to-end software agents that attempt realistic tasks in reproducible environments.
About The Agent Company
The Agent Company runs agents (often via open-source frameworks like OpenHands) on practical tasks and measures four things: resolution rate, overall score, steps taken, and cost. Submissions are reproducible, and patch outputs are often verified, which helps separate robust agents from prompt-sensitive baselines.
This benchmark focuses on real-world agent behavior (reading, editing, executing, and iterating) rather than synthetic proxies. It is useful for gauging how ready an agent is for developer workflows and tool-augmented problem solving.
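To make the reported metrics concrete, here is a minimal Python sketch of how per-task results could be rolled up into a resolution rate, an average score, average steps, and total cost. The `TaskResult` record, its field names, the `summarize` helper, and the sample values are all hypothetical and chosen for illustration; the benchmark's actual result schema and its formula for the overall score are not described here and may differ.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    # Hypothetical per-task record; these field names are assumptions,
    # not the benchmark's actual schema.
    task_id: str
    resolved: bool    # did the agent fully complete the task?
    score: float      # assumed per-task score in [0, 1]
    steps: int        # number of agent steps taken
    cost_usd: float   # assumed cost of the run in US dollars


def summarize(results):
    """Aggregate per-task results into run-level metrics."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        # Fraction of tasks marked as resolved.
        "resolution_rate": sum(r.resolved for r in results) / len(results),
        # Plain averages; a real leaderboard may weight tasks differently.
        "mean_score": mean(r.score for r in results),
        "mean_steps": mean(r.steps for r in results),
        "total_cost_usd": sum(r.cost_usd for r in results),
    }


# Made-up sample data, for illustration only.
results = [
    TaskResult("task-a", True, 1.0, 12, 0.5),
    TaskResult("task-b", False, 0.5, 30, 1.25),
    TaskResult("task-c", True, 1.0, 18, 0.25),
    TaskResult("task-d", False, 0.0, 40, 2.0),
]
summary = summarize(results)
print(summary)
```

Keeping resolution rate separate from the mean score lets partial progress (a nonzero score on an unresolved task) show up without inflating the count of fully completed tasks.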