GBAEval

GBAEval evaluates AI coding agents on a long-horizon software engineering task: building a Game Boy Advance emulator from scratch in Rust and WebAssembly. Runs are graded with replay, procedural, and audio tests, so the benchmark measures functional correctness across video output, hardware behavior, and sound.

Methodology

We source results from the public GBAEval leaderboard. The leaderboard reports an overall score that combines replay, procedural, and audio section scores. Our chart shows the overall score by default, and the individual section scores are also available.

In GBAEval, agents are asked to build a Game Boy Advance emulator in Rust and WebAssembly. Candidate emulators are graded against Mesen2, a reference emulator, across three categories: replay tests, which run fixed button-input traces and compare the resulting video frames; procedural tests, which run ROMs that exercise hardware behavior; and DMA audio tests, which compare generated sound output. The overall score weights replay tests most heavily, with procedural and audio tests contributing the remainder.
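
As a rough illustration of how a weighted overall score of this kind can be computed, the sketch below combines three section scores with a weighted average. The actual GBAEval weights are not stated here, so the values in `ILLUSTRATIVE_WEIGHTS` are placeholders chosen only so that replay dominates. The type and function names are hypothetical, as is the assumption that each section score lies between 0 and 1.

```rust
/// Section scores for one run. Assumed (not confirmed by the source)
/// to each lie in the range 0.0..=1.0.
#[derive(Debug, Clone, Copy)]
pub struct SectionScores {
    pub replay: f64,
    pub procedural: f64,
    pub audio: f64,
}

/// Per-section weights. Hypothetical type; the real weighting scheme
/// used by the leaderboard is not given in this document.
#[derive(Debug, Clone, Copy)]
pub struct Weights {
    pub replay: f64,
    pub procedural: f64,
    pub audio: f64,
}

/// Placeholder weights, NOT the benchmark's real values. They only
/// reflect the stated property that replay is weighted most heavily.
pub const ILLUSTRATIVE_WEIGHTS: Weights = Weights {
    replay: 0.6,
    procedural: 0.2,
    audio: 0.2,
};

/// Weighted average of the three section scores.
/// Dividing by the weight total keeps the result on the same scale as
/// the inputs even if the weights do not sum to exactly 1.
pub fn overall_score(scores: SectionScores, weights: Weights) -> f64 {
    let total = weights.replay + weights.procedural + weights.audio;
    assert!(total > 0.0, "weights must sum to a positive value");
    (scores.replay * weights.replay
        + scores.procedural * weights.procedural
        + scores.audio * weights.audio)
        / total
}
```

Under these placeholder weights, a run with strong procedural results but weak replay results would still score low overall, which matches the description of replay tests as the dominant component.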
