About Factorio Learning Environment

Factorio is a video game in which the player manages in-game engineering projects to acquire resources and develop infrastructure. In the Factorio Learning Environment, a model receives instructions followed by the current game state at every step, and must output code expressing the actions it wants to take. The benchmark comprises two settings within that environment:

  1. Lab play, where the agent must build a factory of a given capacity in an environment with fixed resources, and
  2. Open play, where the agent must build as large a factory as possible.

Methodology

We source Factorio Learning Environment (FLE) evaluation results directly from the official leaderboard, maintained by the benchmark creators.

The benchmark evaluates LLMs acting as agents within the Factorio game. Agents interact with the game by generating Python code using a provided API within a Read-Eval-Print Loop (REPL). Each interaction cycle (code generation, execution, observation) constitutes one step.
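
The step cycle described above can be sketched as a small loop. This is a minimal illustration only, not the FLE implementation: `run_episode` and `toy_policy` are hypothetical names, the "agent" is a stub rather than an LLM, and the executed code is ordinary Python instead of calls to the real game API.

```python
import contextlib
import io


def run_episode(policy, max_steps):
    """Run a generate -> execute -> observe loop for max_steps steps."""
    namespace = {}    # persists across steps, like a REPL session
    observation = ""  # output the agent saw after the previous step
    history = []
    for _ in range(max_steps):
        code = policy(observation)         # 1. code generation
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, namespace)      # 2. execution
        except Exception as exc:           # errors are fed back as observations
            buffer.write(f"Error: {exc!r}")
        observation = buffer.getvalue()    # 3. observation
        history.append((code, observation))
    return history


def toy_policy(observation):
    # Hypothetical stand-in for an LLM: increments a counter kept in the REPL state.
    return "count = globals().get('count', 0) + 1\nprint(count)"


history = run_episode(toy_policy, max_steps=3)
```

Because the namespace is shared between iterations, variables defined in one step remain available in later ones, which is the property that makes the interaction REPL-like.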

For the lab-play setting, agents are tasked with building fully automatic production lines for 24 distinct target entities of increasing complexity, such as “Electronic circuit”, “Engine unit” or “Sulfuric acid”. Agents start with an inventory containing enough items to complete the task. Each task involves a trajectory of 128 API calls. A task counts as completed if the agent builds a production line for the target entity that achieves the target throughput during a 60-second period. The benchmark authors manually verified successful production lines to check that the agents did not cheat. The reported metric is the mean success rate across 8 runs per task.
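
One plausible way to aggregate the lab-play metric is to compute a success rate per task over its 8 runs and then average those rates across tasks. The sketch below assumes that reading. The function name `lab_play_success_rate` and the outcome data are hypothetical and are not leaderboard results.

```python
from statistics import mean


def lab_play_success_rate(results):
    """results maps task name -> list of booleans, one per run."""
    per_task = {
        task: mean([1.0 if ok else 0.0 for ok in runs])
        for task, runs in results.items()
    }
    overall = mean(list(per_task.values()))
    return overall, per_task


# Hypothetical outcomes for two tasks, 8 runs each (illustrative only).
example_results = {
    "Electronic circuit": [True] * 6 + [False] * 2,
    "Sulfuric acid": [True] * 2 + [False] * 6,
}
overall, per_task = lab_play_success_rate(example_results)
```

When every task has the same number of runs, averaging the per-task rates gives the same number as pooling all runs together.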

For the open-play setting, agents operate in a world with unbounded space and resources, and are instructed to “build the largest factory possible.” Each run ends after 5000 steps. The reported metric is the median production score across 8 independent runs.
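
As a small illustration of the open-play aggregation, the median over 8 runs can be computed with Python's standard library. The scores below are made up, and the handling of an even number of runs (averaging the two middle values, as `statistics.median` does) is an assumption about how the leaderboard computes it.

```python
from statistics import median

# Hypothetical final production scores from 8 independent runs (illustrative only).
run_scores = [120, 95, 300, 210, 150, 180, 90, 260]

# With an even number of runs, statistics.median averages the two middle values.
median_score = median(run_scores)
```

Using the median rather than the mean keeps a single unusually high or low run from dominating the reported score.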

For detailed methodology, including the specific API provided to agents, scoring calculations, environment setup, and agent scaffolding used for the leaderboard results, please refer to the original Factorio Learning Environment paper (arXiv:2503.09617) and code in the Factorio Learning Environment GitHub repository.