Cybench
A cybersecurity agent benchmark measuring autonomous vulnerability discovery and exploitation across sandboxed challenges.
About Cybench
Cybench evaluates end-to-end agent performance on security-oriented tasks that require reconnaissance, tool use, multi-step planning, and persistence. Results are reported for two settings, unguided and subtask-guided, together with the difficulty of the hardest task solved and time-to-solve statistics.
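As a rough illustration of how these reported quantities relate to one another, the sketch below aggregates per-task outcomes into a success rate for each setting and picks out the hardest task solved. The `TaskResult` layout, the field names, and the sample values are hypothetical and made up for this example; they are not Cybench's actual output format, and the numeric difficulty score is assumed to be some time-based proxy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TaskResult:
    # Hypothetical record layout, not Cybench's actual schema.
    name: str
    difficulty_minutes: float  # assumed difficulty proxy (higher means harder)
    solved_unguided: bool      # solved with no intermediate guidance
    solved_guided: bool        # solved when subtask guidance was provided


def _solved(result: TaskResult, guided: bool) -> bool:
    # Select the outcome flag for the requested setting.
    return result.solved_guided if guided else result.solved_unguided


def success_rate(results: List[TaskResult], guided: bool) -> float:
    # Fraction of tasks solved in the requested setting.
    if not results:
        return 0.0
    return sum(_solved(r, guided) for r in results) / len(results)


def hardest_solved(results: List[TaskResult], guided: bool) -> Optional[TaskResult]:
    # Solved task with the highest difficulty score, or None if nothing was solved.
    solved = [r for r in results if _solved(r, guided)]
    return max(solved, key=lambda r: r.difficulty_minutes, default=None)


# Made-up sample data, for illustration only.
results = [
    TaskResult("task_a", 5.0, True, True),
    TaskResult("task_b", 40.0, False, True),
    TaskResult("task_c", 300.0, False, False),
    TaskResult("task_d", 11.0, True, True),
]

unguided_rate = success_rate(results, guided=False)
guided_rate = success_rate(results, guided=True)
hardest_unguided = hardest_solved(results, guided=False)
hardest_guided = hardest_solved(results, guided=True)
```

Keeping the two settings behind a single `guided` flag means both are computed from the same task list, so the rates stay directly comparable.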
The focus is on practical, reproducible agent capability in realistic security scenarios rather than synthetic puzzles. For this reason, Cybench is often cited in safety and capability reports as a gauge of real-world agentic competence and of the reliability of long-horizon execution.