The Remote Labor Index (RLI), created by the Center for AI Safety and Scale AI, measures how well AI agents can perform real, economically valuable remote work. Its projects are genuine assignments drawn from the freelance market, spanning software development, design, architecture, data analysis, game development, and video animation, and it is designed to tie AI agent capability directly to real paid-work outcomes.
We source results from the public Remote Labor Index leaderboards maintained by the Center for AI Safety and Scale AI.
RLI consists of 240 self-contained real-world projects across 23 freelance domains, representing over 6,000 hours of professional work valued at more than $140,000. An agent attempts each project end-to-end, and its deliverable is judged against the standard that a commissioned human professional’s work would be expected to meet. The headline metric is the automation rate: the percentage of projects completed to a professional-acceptance standard. Scores reported so far are in the low single-digit percentages.
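As a rough illustration of how the automation rate is computed, the sketch below treats each project as a single pass/fail judgment and reports the share of accepted deliverables as a percentage. The `ProjectResult` type, the `automation_rate` function, and the sample data are hypothetical names and values chosen for this example; they are not part of the benchmark's own tooling, and the accepted count is made up rather than taken from any leaderboard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProjectResult:
    """Hypothetical record: one project and whether its deliverable was accepted."""
    project_id: str
    accepted: bool


def automation_rate(results):
    """Return the percentage of projects whose deliverable was accepted."""
    if not results:
        raise ValueError("at least one project result is required")
    accepted = sum(1 for r in results if r.accepted)
    return 100.0 * accepted / len(results)


# Illustrative data only: 240 projects, of which an assumed 6 are accepted.
results = [ProjectResult(f"project-{i}", accepted=(i < 6)) for i in range(240)]
rate = automation_rate(results)
print(f"Automation rate: {rate:.1f}%")
```

Because every project counts equally in this sketch, a single additional accepted project out of 240 moves the rate by roughly 0.4 percentage points.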
For full details, see the Remote Labor Index paper.
A benchmark measuring how well AI agents can complete real, economically valuable remote freelance projects end-to-end.