VPCT

Externally evaluated

The Visual Physics Comprehension Test (VPCT) is a benchmark designed to evaluate how well vision models can make predictions about basic physics scenarios.

Each problem consists of an image showing a ball, several ramps, and three buckets, and the task is to predict which bucket the ball will fall into. The full benchmark contains 100 problems of this format, each with a different ramp configuration.
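Because each problem has exactly one correct bucket out of three, a single run can be scored as plain accuracy, and uniform random guessing would be expected to score about 1/3. The sketch below assumes that scoring rule and a hypothetical record type (`VPCTProblem` with `image_path` and `answer_bucket`); these names are illustrative and are not the dataset's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VPCTProblem:
    """One benchmark item (hypothetical schema, for illustration only)."""
    image_path: str     # rendered scene with the ball, ramps and buckets
    answer_bucket: int  # correct bucket: 1, 2 or 3, counted left to right


def run_accuracy(problems: list[VPCTProblem], predictions: list[int | None]) -> float:
    """Fraction of problems whose predicted bucket matches the ground truth.

    A prediction of None (e.g. an unparseable model response) counts as wrong.
    """
    if len(problems) != len(predictions):
        raise ValueError("need exactly one prediction per problem")
    if not problems:
        raise ValueError("no problems to score")
    correct = sum(1 for p, pred in zip(problems, predictions) if pred == p.answer_bucket)
    return correct / len(problems)
```

Counting unparseable responses as wrong is a design choice in this sketch; an evaluation could instead drop or retry them.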

VPCT was introduced by Chase Brower in 2025. The full dataset is available on Hugging Face.

Methodology

We take results directly from the Visual Physics Comprehension Test web page.

The prompt used for the evaluation is the following:

You are an expert physics simulator. Looking at this image of a ball-and-bucket physics simulation, predict which bucket (numbered 1, 2, or 3 from left to right) the ball will eventually fall into.

Let’s think about this step by step:

  1. First, observe the initial position of the ball
  2. Note any obstacles or lines drawn that will affect the ball’s path
  3. Consider how gravity will affect the ball’s trajectory
  4. Think about how the ball will bounce and roll along the surfaces
  5. Analyze how the placement and angle of each line will guide the ball
  6. Factor in that the ball has some elasticity and will bounce slightly when it hits surfaces

Based on your analysis, please conclude with a clear answer in this format: ‘answer(X)’ where X is the bucket number (1, 2, or 3).

Explain your reasoning, then end with your answer in the specified format.
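The prompt asks the model to finish with a line of the form answer(X), so grading needs a step that pulls the bucket number out of free-form reasoning. The sketch below is an assumption about how that could be done, not the benchmark's own parser: it takes the last answer(X) with X in 1 to 3, since earlier mentions may appear in the reasoning. The function name `extract_bucket` is hypothetical.

```python
from __future__ import annotations

import re

# Matches answer(1), answer(2) or answer(3), case-insensitively,
# tolerating whitespace inside the parentheses.
_ANSWER_RE = re.compile(r"answer\(\s*([123])\s*\)", re.IGNORECASE)


def extract_bucket(response: str) -> int | None:
    """Return the bucket from the last answer(X) in a response, or None if absent."""
    matches = _ANSWER_RE.findall(response)  # list of captured digit strings
    return int(matches[-1]) if matches else None


print(extract_bucket("The ball rolls off the second ramp... answer(3)"))  # 3
```

Taking the last match rather than the first favors the model's final conclusion over any tentative answers in its reasoning.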

Reported scores are averaged over a model-dependent number of runs: some models were evaluated over 2 or 3 runs, while others were evaluated over a single run. The evaluation code is available here.
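Since the number of runs differs by model, a reported score is a mean over however many per-run accuracies exist, and single-run scores carry more run-to-run noise than averaged ones. The sketch below illustrates that averaging under this assumption; the function name and the model names in the usage line are hypothetical placeholders, not real results.

```python
from __future__ import annotations

from statistics import mean


def reported_scores(run_accuracies: dict[str, list[float]]) -> dict[str, float]:
    """Average each model's per-run accuracies; models may have 1, 2 or 3 runs."""
    scores: dict[str, float] = {}
    for model, runs in run_accuracies.items():
        if not runs:
            raise ValueError(f"model {model!r} has no runs")
        scores[model] = mean(runs)
    return scores


# Hypothetical inputs: one model with three runs, one with a single run.
scores = reported_scores({"model-a": [0.50, 0.60, 0.55], "model-b": [0.40]})
```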