Vending-Bench 2

Vending-Bench 2, created by Andon Labs, measures an AI agent’s ability to stay coherent and run a business profitably over very long horizons. The agent autonomously operates a simulated vending machine business for a full simulated year, managing inventory, orders, pricing, supplier negotiation, daily fees, and disruptions, over a context spanning many millions of tokens. It targets long-horizon agentic reliability rather than single-shot reasoning, and builds on lessons from real-world deployments.

Methodology

We source results from the public Vending-Bench 2 leaderboard.

Each model runs over a full simulated year, and the headline score is averaged across five runs per model. Compared with the original Vending-Bench, version 2 adds real-world messiness such as adversarial suppliers, delayed or failed deliveries, and customer refund demands, and streamlines scoring to a single headline metric: the agent’s money balance in U.S. dollars at the end of the year (higher is better). Andon Labs estimates that a strong human strategy could reach roughly $63,000 per year, so even top models capture only a small fraction of skilled-human performance.
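As a rough illustration of the scoring arithmetic described above, the sketch below averages end-of-year balances across a model's runs and compares the result with the roughly $63,000 human estimate. The function names and the example balances are hypothetical, chosen only for illustration; they are not leaderboard results and not Andon Labs' code.

```python
from statistics import mean

# Rough strong-human estimate quoted in the text above (USD per simulated year).
HUMAN_ESTIMATE_USD = 63_000


def headline_score(final_balances):
    """Return the mean end-of-year money balance (USD) across a model's runs."""
    if not final_balances:
        raise ValueError("need at least one run")
    return mean(final_balances)


def fraction_of_human(score, human_estimate=HUMAN_ESTIMATE_USD):
    """Return the score as a fraction of the assumed human estimate."""
    return score / human_estimate


# Hypothetical balances for five runs, made up for illustration only.
example_runs = [4000.0, 5500.0, 3000.0, 6000.0, 4500.0]

score = headline_score(example_runs)
print(f"headline score: ${score:,.2f}")
print(f"share of human estimate: {fraction_of_human(score):.1%}")
```

Averaging over several runs matters here because a single year-long simulation can swing widely depending on early decisions, so one run alone would be a noisy estimate of a model's typical outcome.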

For full details, see the original Vending-Bench paper.
