AlgoTune

AlgoTune evaluates whether language models can write code that runs faster than established reference implementations while still producing correct outputs. Tasks span mathematics, physics, computer science, and machine learning, including problems such as gzip compression, AES encryption, singular value decomposition, and convex optimization.

Methodology

We source results from the public AlgoTune leaderboard.

AlgoTune contains 154 coding tasks contributed by domain experts. Through the AlgoTuner harness, a model works in an iterative loop under a fixed per-task budget: it edits code, runs and profiles it, and verifies correctness against held-out tests, keeping the fastest valid version. Each solver is timed against a reference implementation from popular libraries such as SciPy, scikit-learn, and CVXPY. The headline score is the harmonic mean of the per-task speedups across all tasks, where 1.0x means no improvement over the reference and higher is better.
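To make the headline score concrete, here is a minimal sketch of how a harmonic mean of speedups could be computed. It assumes speedup is defined as reference time divided by solver time. The task names and timings are made up for illustration and are not leaderboard data, and the AlgoTune harness itself may handle details such as invalid solutions differently.

```python
from statistics import harmonic_mean


def speedup(reference_seconds: float, solver_seconds: float) -> float:
    """Speedup of a solver over the reference (assumed: reference / solver)."""
    return reference_seconds / solver_seconds


# Hypothetical (reference_seconds, solver_seconds) pairs, for illustration only.
timings = {
    "task_a": (2.0, 1.0),  # solver is 2x faster than the reference
    "task_b": (1.0, 1.0),  # no improvement, 1x
    "task_c": (4.0, 1.0),  # solver is 4x faster than the reference
}

speedups = [speedup(ref, sol) for ref, sol in timings.values()]

# Harmonic mean: n / sum(1 / s_i). It is pulled toward the smallest speedups.
score = harmonic_mean(speedups)
arithmetic = sum(speedups) / len(speedups)

print(f"harmonic mean speedup: {score:.3f}x")
print(f"arithmetic mean speedup: {arithmetic:.3f}x")
```

Because the harmonic mean weights small values heavily, a single large speedup on one task cannot offset many tasks that stay near 1.0x. In this toy example the harmonic mean is lower than the arithmetic mean of the same three speedups.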

For full details, see the AlgoTune paper and code.