ALE-Bench

ALE-Bench, created by Sakana AI in collaboration with AtCoder, evaluates AI on long-horizon, objective-driven algorithm engineering. Its problems are hard combinatorial optimization tasks (routing, scheduling, planning, and multi-agent control) that reward iterative refinement over long sessions rather than single-shot answers, mirroring real heuristic-programming contests.

Methodology

We source results from the public ALE-Bench leaderboard.

ALE-Bench is built from 40 past AtCoder Heuristic Contest (AHC) problems. A Python library replicates the AHC contest environment, including a code-execution sandbox. Models iteratively submit solutions and receive feedback from public evaluation within a fixed time budget. Each submission is scored and converted into a “Performance” value using an Elo-like rating method, based on where the submission would rank against the human contestants. Our chart defaults to average Performance (higher is better, on a roughly 0–3500 scale) and also shows the equivalent contest rank (lower is better).
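To illustrate how a rank can be turned into an Elo-like Performance, here is a minimal sketch of a generic "performance rating" calculation. It is not ALE-Bench's or AtCoder's actual code. It assumes that each human contestant has a known rating, that the chance a contestant with rating r beats a player with performance X follows the standard Elo logistic curve with base 10 and scale 400, and that X is chosen so the expected number of contestants ranked above the player equals rank - 0.5. The function names and constants are illustrative assumptions; the real benchmark may use different inputs or constants.

```python
from typing import Sequence


def expected_beaten_by(perf: float, field: Sequence[float], scale: float = 400.0) -> float:
    """Expected number of contestants who outrank a player with performance `perf`.

    Assumption: standard Elo logistic win probability (base 10, scale 400).
    """
    return sum(1.0 / (1.0 + 10 ** ((perf - r) / scale)) for r in field)


def performance_from_rank(
    rank: int,
    field: Sequence[float],
    lo: float = -2000.0,
    hi: float = 6000.0,
    iters: int = 100,
) -> float:
    """Bisect for the performance whose expected rank matches the actual rank.

    Hypothetical helper: `rank` is 1-based (1 = best), and the target
    `rank - 0.5` is an assumed convention, not a confirmed ALE-Bench detail.
    """
    target = rank - 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        # expected_beaten_by decreases as perf grows, so if too many
        # contestants are expected above us, the performance must be higher.
        if expected_beaten_by(mid, field) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    humans = [1500.0] * 9  # toy field of nine equally rated contestants
    for r in (1, 5, 9):
        print(r, round(performance_from_rank(r, humans)))
```

With nine contestants all rated 1500, finishing 5th (the middle) yields a performance of 1500, and better ranks yield higher values. Bisection suits this case because the expected-rank function is strictly monotonic in the performance value.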

For full details, see the ALE-Bench paper and code.