About OSWorld

OSWorld evaluates computer-use agents in real OS environments running arbitrary desktop and web apps. Agents interact through human-like keyboard/mouse primitives (click, type, scroll, drag, wait) and receive UI state via structured observations, such as accessibility trees.
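As a rough illustration of this interaction loop, the sketch below drives a toy in-memory environment with click and type actions and reads UI state from a simplified accessibility-tree-like list. Every name here (Action, Observation, ToyEnv, run_episode) is hypothetical and invented for illustration; none of it is the OSWorld API, and a real environment would expose far richer observations.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical action record: kind is click, type, scroll, drag, or wait."""
    kind: str
    args: dict = field(default_factory=dict)


@dataclass
class Observation:
    """Simplified stand-in for an accessibility tree: a flat list of nodes."""
    a11y_nodes: list


class ToyEnv:
    """Hypothetical in-memory UI with one text box and one submit button."""

    TEXTBOX_POS = (10, 10)
    BUTTON_POS = (10, 40)

    def __init__(self):
        self.text = ""
        self.focused = False
        self.submitted = False

    def observe(self):
        return Observation(a11y_nodes=[
            {"role": "textbox", "name": "query", "value": self.text,
             "pos": self.TEXTBOX_POS},
            {"role": "button", "name": "submit", "pos": self.BUTTON_POS},
        ])

    def step(self, action):
        if action.kind == "click":
            pos = (action.args["x"], action.args["y"])
            # Clicking the text box focuses it; clicking elsewhere removes focus.
            self.focused = pos == self.TEXTBOX_POS
            if pos == self.BUTTON_POS:
                self.submitted = True
        elif action.kind == "type" and self.focused:
            self.text += action.args["text"]
        # scroll, drag, and wait are no-ops in this toy environment.
        return self.observe()


def find(obs, name):
    """Locate a node by name in the observation."""
    return next(n for n in obs.a11y_nodes if n["name"] == name)


def run_episode(env):
    """Scripted 'agent': focus the text box, type, then click submit."""
    obs = env.observe()
    x, y = find(obs, "query")["pos"]
    obs = env.step(Action("click", {"x": x, "y": y}))
    obs = env.step(Action("type", {"text": "hello"}))
    x, y = find(obs, "submit")["pos"]
    env.step(Action("click", {"x": x, "y": y}))
    return env
```

The scripted policy grounds each click in coordinates read from the observation instead of hard-coding them, which is the part of the loop that a learned agent would have to get right.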

Tasks emphasize grounded interaction and long-horizon planning across file operations, app configuration, information lookup, and multi-app workflows. Success is measured by execution-based checks that inspect the final environment state and verify that the task's goal and constraints are satisfied.
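To make the idea of an execution-based check concrete, here is a minimal sketch for a made-up task: "rename draft.txt to final.txt without changing its contents". The checker inspects only the end state of a working directory, not the actions taken. The task, the file names, and the functions check_rename_task and demo are assumptions for illustration and do not come from the OSWorld task set.

```python
import tempfile
from pathlib import Path

EXPECTED_BODY = "report body\n"  # hypothetical file contents for this toy task


def check_rename_task(workdir: Path) -> bool:
    """Return True if the end state satisfies the goal and the constraint."""
    old, new = workdir / "draft.txt", workdir / "final.txt"
    # Goal: the new file exists and the old one is gone.
    goal_met = new.is_file() and not old.exists()
    # Constraint: the contents must be unchanged by the rename.
    constraint_met = goal_met and new.read_text() == EXPECTED_BODY
    return goal_met and constraint_met


def demo() -> tuple:
    """Run the checker before and after a stand-in for the agent's actions."""
    with tempfile.TemporaryDirectory() as d:
        workdir = Path(d)
        (workdir / "draft.txt").write_text(EXPECTED_BODY)
        before = check_rename_task(workdir)
        # Stand-in for whatever sequence of UI actions an agent would perform.
        (workdir / "draft.txt").rename(workdir / "final.txt")
        after = check_rename_task(workdir)
    return before, after
```

Because the check looks only at the resulting files, any action sequence that reaches the same end state passes, while a run that renames the file but alters its contents fails the constraint.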