OSWorld (Screenshot)
A computer-use benchmark where agents complete desktop or web tasks using only pixel/screenshot observations of the interface.
About OSWorld (Screenshot)
In the screenshot variant of OSWorld, agents observe the UI only through images of the screen, not through privileged DOM or OS APIs. Tasks require grounded interaction (clicking, typing, scrolling, and navigating menus) to accomplish goals such as file operations, application configuration, or information lookup.
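The observe-act loop described above can be illustrated with a toy sketch. Everything here is hypothetical and is not the OSWorld API: ToyScreen stands in for a desktop with a single text box, Action is a made-up action type, and the "perception" step is a trivial pixel scan rather than a learned model. The point is only that the policy reads nothing but the rendered image.

```python
from dataclasses import dataclass
from typing import List, Tuple

# All names below are hypothetical illustrations, not part of OSWorld.


@dataclass
class Action:
    kind: str          # "click" or "type"
    x: int = 0
    y: int = 0
    text: str = ""


@dataclass
class ToyScreen:
    """A stand-in desktop containing one text box at a fixed pixel region."""
    width: int = 64
    height: int = 48
    box: Tuple[int, int, int, int] = (10, 10, 50, 20)  # left, top, right, bottom
    focused: bool = False
    typed: str = ""

    def screenshot(self) -> List[List[int]]:
        """Render a grayscale image; the agent sees only these pixels."""
        left, top, right, bottom = self.box
        shade = 200 if self.focused else 128  # a focused box is drawn brighter
        return [
            [shade if left <= x < right and top <= y < bottom else 0
             for x in range(self.width)]
            for y in range(self.height)
        ]

    def step(self, action: Action) -> None:
        """Apply a click or keystrokes to the toy UI."""
        left, top, right, bottom = self.box
        if action.kind == "click":
            self.focused = left <= action.x < right and top <= action.y < bottom
        elif action.kind == "type" and self.focused:
            self.typed += action.text


def locate_box(image: List[List[int]]) -> Tuple[int, int]:
    """Spatial grounding from pixels alone: centre of the non-black region."""
    coords = [(x, y) for y, row in enumerate(image)
              for x, value in enumerate(row) if value > 0]
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs) + max(xs)) // 2, (min(ys) + max(ys)) // 2


def policy(image: List[List[int]], text: str) -> Action:
    """Choose the next action using only the screenshot."""
    x, y = locate_box(image)
    if image[y][x] < 200:              # box looks unfocused, so click it first
        return Action("click", x=x, y=y)
    return Action("type", text=text)   # box looks focused, so type into it


def run_episode(env: ToyScreen, text: str, max_steps: int = 5) -> List[Action]:
    """Observe, act, repeat; stop once the text has been typed."""
    actions = []
    for _ in range(max_steps):
        action = policy(env.screenshot(), text)
        env.step(action)
        actions.append(action)
        if action.kind == "type":
            break
    return actions


env = ToyScreen()
actions = run_episode(env, "hello")
```

In this sketch the agent first clicks the box centre it inferred from the image, then types once the next screenshot shows the box as focused. A real agent would replace locate_box and policy with a vision model, and the environment would be an actual desktop or browser.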
This setting emphasizes perception, spatial grounding, and long-horizon planning in realistic software environments. Success is typically measured by task completion, that is, whether the outcome of the agent's actions satisfies the goal specification.
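As a rough sketch of how a task-completion metric of this kind might be computed, the snippet below assumes each task comes with a checker function that inspects the final environment state and returns pass or fail. The state layout, the checker names, and the example tasks are all hypothetical and are not taken from the benchmark.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: a final environment state and a per-task checker.
State = Dict[str, Any]
Checker = Callable[[State], bool]


def success_rate(episodes: List[Tuple[State, Checker]]) -> float:
    """Fraction of episodes whose final state satisfies the task's checker."""
    if not episodes:
        return 0.0
    passed = sum(1 for state, check in episodes if check(state))
    return passed / len(episodes)


def file_renamed(state: State) -> bool:
    """Goal: draft.txt was renamed to report.txt."""
    files = state.get("files", [])
    return "report.txt" in files and "draft.txt" not in files


def dark_mode_on(state: State) -> bool:
    """Goal: the application theme is set to dark."""
    return state.get("settings", {}).get("theme") == "dark"


episodes = [
    ({"files": ["report.txt"]}, file_renamed),         # goal met
    ({"files": ["draft.txt"]}, file_renamed),          # file never renamed
    ({"settings": {"theme": "dark"}}, dark_mode_on),   # goal met
    ({"settings": {"theme": "light"}}, dark_mode_on),  # setting unchanged
]
rate = success_rate(episodes)
```

Checking the final state rather than the action sequence means any route to the goal counts as success, which suits open-ended desktop tasks where many click paths are valid.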