About The Agent Company

The Agent Company runs agents (often via open-source frameworks like OpenHands) on practical tasks and measures four things: resolution rate, overall score, steps taken, and cost. Submissions are reproducible, and patch outputs are often verified, which helps separate robust agents from prompt-sensitive baselines.
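As a rough illustration of how the four reported metrics relate to per-task results, the sketch below aggregates a list of task records into a summary. The record layout (TaskResult and its field names), the use of plain averages, and the sample values are assumptions made for this example only; they are not the benchmark's actual schema or scoring formula.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskResult:
    # Hypothetical record layout; the field names are illustrative only.
    task_id: str
    resolved: bool    # True if the task was fully completed
    score: float      # per-task score in [0, 1], taken as given
    steps: int        # number of agent steps used on the task
    cost_usd: float   # money spent on the task


def summarize(results):
    """Aggregate per-task records into four headline numbers."""
    if not results:
        raise ValueError("no results to summarize")
    return {
        # Fraction of tasks that were fully resolved.
        "resolution_rate": sum(r.resolved for r in results) / len(results),
        # Simple averages; a real leaderboard may weight tasks differently.
        "mean_score": mean(r.score for r in results),
        "mean_steps": mean(r.steps for r in results),
        "mean_cost_usd": mean(r.cost_usd for r in results),
    }


# Made-up sample data, purely to exercise the function.
results = [
    TaskResult("task-a", True, 1.0, 12, 0.50),
    TaskResult("task-b", False, 0.5, 30, 1.50),
]
summary = summarize(results)
print(summary)
```

Keeping the score separate from the resolved flag lets a run that makes partial progress still register in the overall score, while the resolution rate counts only fully completed tasks.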

This benchmark focuses on real-world agent behavior (reading, editing, executing, and iterating) rather than synthetic proxies. It is useful for gauging how ready an agent is for developer workflows and tool-augmented problem solving.