Training compute growth is driven by larger clusters, longer training, and better hardware

Since 2018, the most significant driver of compute scaling across frontier models has likely been an increase in the quantity of hardware used in training clusters. A shift toward longer training runs and increases in hardware performance have also been important.

These trends are closely linked to a massive surge in investment. AI development budgets have been expanding by around 2-3x per year, enabling vast training and inference clusters and ever-larger models.

Overview

As training compute for frontier AI models grows rapidly, understanding the key drivers of this growth becomes increasingly important. We analyze three main factors: hardware cluster sizes, per-unit hardware computing power, and training duration. By fitting exponential trends to these factors, we calculate their relative multiplicative contributions to the overall growth in training compute. Our analysis focuses on frontier models developed after 2018, a period during which we previously observed a 4-5x annual growth rate in training compute. We also examine potential trends in hardware utilization rates, finding weak evidence of a positive trend. However, since the data quality is poor and the effect is small, we omit utilization from our main analysis.

Data

Our data come from Epoch’s Notable Models dataset, which includes data on the quantity and type of hardware used to train models, as well as training run duration.

In order to focus our analysis on recent frontier AI models, we filter our data as follows:

  1. Drop models that are fine-tuned from a base model.
  2. Of the remaining models, drop those that were not in the top 10 by training compute at the time of their release.
  3. Drop models published before 2018.

We refer to the remaining 63 models as frontier models after 2018. Note that models fine-tuned from base models are excluded in order to simplify analysis, since these are less reflective of trends in pre-training scaling.
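
As an illustration, this selection step might look like the sketch below. This is not the exact code behind our analysis, and the column names `publication_date`, `training_compute_flop`, and `finetuned_from` are hypothetical placeholders for the dataset's actual schema (with `publication_date` assumed to be a datetime column).

```python
import pandas as pd

def select_frontier_models(df: pd.DataFrame, top_n: int = 10,
                           start_year: int = 2018) -> pd.DataFrame:
    """Keep non-fine-tuned models that were in the top `top_n` by training
    compute at the time of their release, published in `start_year` or later."""
    # 1. Drop models fine-tuned from a base model.
    df = df[df["finetuned_from"].isna()].copy()

    # 2. Keep models ranking in the top `top_n` by training compute among
    #    all models released on or before their own publication date.
    df = df.sort_values("publication_date")
    in_top_n = []
    for _, row in df.iterrows():
        released_so_far = df[df["publication_date"] <= row["publication_date"]]
        rank = (released_so_far["training_compute_flop"]
                > row["training_compute_flop"]).sum() + 1
        in_top_n.append(rank <= top_n)
    df = df[in_top_n]

    # 3. Drop models published before the start year.
    return df[df["publication_date"].dt.year >= start_year]
```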

To estimate a trend in hardware computing power (measured in floating-point operations per second, or FLOP/s), we additionally filter down to models where the training hardware is known. Hardware FLOP/s depends on the numeric format used during training, so we use the corresponding FLOP/s rate when the numeric format is known, and otherwise impute the most likely training format. The imputation procedure follows our usual decision procedure when estimating training compute via hardware details:

  1. If arithmetic precision is known based on details in the paper or associated code, use the hardware’s theoretical FLOP/s for that precision.
  2. Otherwise, use FP16/BF16 tensor core performance if the training hardware is able to benefit from it.
  3. Otherwise, if the hardware benefits from training at FP16 (without tensor cores), and the model was published after June 22, 2017 (the release of the V100 and a software update to the P100 enabling FP16 training), then use FP16.
  4. Otherwise, use the hardware’s FP32 FLOP/s value.

Note that we never assume the use of TF32, as FP16/BF16 tensor core performance is better for all hardware models in our dataset.
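
A minimal sketch of this decision rule is shown below. The two-entry spec table is a hypothetical illustration populated with values from public datasheets; it is not the dataset's actual imputation code.

```python
from datetime import date

# Hypothetical spec table: peak FLOP/s by hardware model and numeric format,
# using values from public datasheets (illustration only).
HARDWARE_SPECS = {
    "NVIDIA V100": {"fp16_tensor": 125e12, "fp16": 31.4e12, "fp32": 15.7e12},
    "NVIDIA A100": {"bf16_tensor": 312e12, "fp16": 78e12, "fp32": 19.5e12},
}

# V100 release / P100 software update enabling FP16 training.
FP16_CUTOFF = date(2017, 6, 22)

def imputed_flops_per_second(hardware: str, known_precision: str | None,
                             publication_date: date) -> float:
    specs = HARDWARE_SPECS[hardware]
    # 1. Use the precision stated in the paper or associated code, if known.
    if known_precision is not None and known_precision in specs:
        return specs[known_precision]
    # 2. Otherwise prefer FP16/BF16 tensor core performance, if available.
    for key in ("bf16_tensor", "fp16_tensor"):
        if key in specs:
            return specs[key]
    # 3. Otherwise use plain FP16 if the hardware benefits from it and the
    #    model postdates the June 22, 2017 cutoff.
    if "fp16" in specs and publication_date > FP16_CUTOFF:
        return specs["fp16"]
    # 4. Fall back to FP32.
    return specs["fp32"]
```

For example, a model trained on a V100 at unknown precision and published after mid-2017 would be assigned the 125 TFLOP/s FP16 tensor core rate.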

Finally, when estimating trends for hardware FLOP/s and hardware quantity, we drop models with more than one linked hardware type, since in these cases we do not have unique values for these fields. For instance, some models are pre-trained on one type of hardware, and fine-tuned on another. This filter eliminates only 2 models from our analysis.

Analysis

We fit log-linear models against time for each of hardware FLOP/s, hardware quantity, and training time. We bootstrap with replacement over our models to estimate confidence intervals (n=500 bootstrap samples). Our results are as follows:

Estimated trend in training compute and its underlying drivers (growth per year)

Variable            10th percentile   Median   90th percentile
Hardware FLOP/s     1.36              1.41     1.48
Hardware quantity   1.50              1.69     1.91
Training time       1.37              1.53     1.71
Combined product*   3.58              4.27     4.93
Training compute    3.74              4.17     4.62

* We multiply the prior three variables for models where all exist, then fit an exponential trend to this product series.
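
For concreteness, the fitting and bootstrap procedure can be sketched as follows. Column names such as `hardware_flops_per_second` are hypothetical; the linked Colab notebook contains the actual implementation.

```python
import numpy as np
import pandas as pd

def annual_growth_factor(dates: pd.Series, values: pd.Series) -> float:
    """Fit log10(value) ~ a + b * years and return the implied growth per year, 10**b."""
    years = (dates - dates.min()).dt.days / 365.25
    slope, _intercept = np.polyfit(years, np.log10(values), deg=1)
    return 10 ** slope

def bootstrap_growth(df: pd.DataFrame, column: str, n_boot: int = 500,
                     seed: int = 0) -> np.ndarray:
    """Bootstrap (resampling models with replacement) the annual growth factor of `column`."""
    rng = np.random.default_rng(seed)
    data = df.dropna(subset=["publication_date", column])
    estimates = []
    for _ in range(n_boot):
        sample = data.sample(frac=1.0, replace=True, random_state=rng)
        estimates.append(annual_growth_factor(sample["publication_date"], sample[column]))
    return np.array(estimates)

# Example: 10th/50th/90th percentiles of the bootstrap distribution for one driver.
# growth = bootstrap_growth(frontier_models, "hardware_flops_per_second")
# print(np.percentile(growth, [10, 50, 90]))
```

The same bootstrap samples can be reused to form a distribution of the difference between two slopes, as in the comparison described below.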

The product of the individual median slopes (1.41 × 1.69 × 1.53 ≈ 3.6x per year) is somewhat lower than the overall compute trend (4.2x per year); this is a result of the different patterns of missing data in each variable. However, we find that the combined product trend aligns well with a direct estimate of the trend in recorded training compute. We statistically test the difference between these two slopes across bootstrap samples, finding no statistically significant difference (90% CI: -0.4 to 0.7).

After obtaining estimates of slopes for each component, we calculate the multiplicative contributions of each trend to compute scaling as the logarithm of each trend’s slope divided by the sum of the logarithms of all slopes. This is done for each bootstrap sample, producing the percentages shown in the plot.
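
For example, plugging in the median growth factors from the table above gives approximate point estimates of these shares (the reported percentages are instead computed per bootstrap sample):

```python
import numpy as np

# Median annual growth factors from the table above.
growth = {"hardware_flops": 1.41, "hardware_quantity": 1.69, "training_time": 1.53}

# Each driver's share is log(growth factor) divided by the sum of logs.
total_log = sum(np.log(g) for g in growth.values())
contributions = {name: np.log(g) / total_log for name, g in growth.items()}

for name, share in contributions.items():
    print(f"{name}: {share:.0%}")
# hardware_flops: 27%, hardware_quantity: 41%, training_time: 33%
```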

In addition to the three main factors identified above, we consider hardware utilization as a potential fourth driver. Only a small subset of our models have hardware utilization estimates (n=22); among these models there is a slight positive trend of 1.1x per year (90% CI: 1.0 - 1.2). Expressed as a contribution to overall compute growth, this would amount to 7% (90% CI: 0% - 12%). Due to the small sample size and the small magnitude of the trend, we omit hardware utilization from the rest of this analysis.

We next run a sensitivity analysis, varying N in our selection of the top N models. We find that estimates of the slope of training compute scaling, as well as the slopes and relative contributions of each underlying trend, remain relatively stable for N = {5, 10, 15}, with no statistically significant differences. The most notable difference is in the top-5 subset, where hardware quantity appears to have a relatively larger contribution than in the top-10 subset (median: 0.48 vs 0.40).

Finally, we test the impact of the choice of starting year for our analysis. Overall, results are highly similar for starting years between 2017 and 2019; the most significant change is that hardware FLOP/s grows somewhat faster in the 2017-to-present data (1.6x per year, vs. 1.4x from 2018 on).

Code for all analysis is available in this Colab notebook.

Assumptions

We assume that exponential fits against time are a reasonable functional form for trends in hardware FLOP/s, hardware quantity, and training time. This assumption is more questionable for hardware FLOP/s, which shows a pattern of discrete steps between different generations of leading AI hardware. As a robustness check, we additionally fit a piecewise-constant model for hardware FLOP/s, and find that the distribution of annualized growth between the start and end point is not significantly different from the slope of a simple exponential fit (p-value: 0.79).
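
As a rough sketch of this robustness check (the breakpoint choice here, one constant level per hardware generation, is an assumption, and the column names are hypothetical; the notebook contains the exact specification), the annualized growth implied by a piecewise-constant fit can be computed as follows:

```python
import numpy as np
import pandas as pd

def step_fit_annualized_growth(df: pd.DataFrame) -> float:
    """Fit a step function of hardware FLOP/s over time, with one constant level
    per hardware generation, and return the annualized growth between its first
    and last levels."""
    # Level for each hardware type: mean log10 FLOP/s of the models using it,
    # dated at that hardware's first appearance among frontier models.
    levels = (df.groupby("training_hardware")
                .agg(first_date=("publication_date", "min"),
                     log_flops=("hardware_flops_per_second",
                                lambda x: np.log10(x).mean()))
                .sort_values("first_date"))
    years = (levels["first_date"].iloc[-1] - levels["first_date"].iloc[0]).days / 365.25
    log_growth = levels["log_flops"].iloc[-1] - levels["log_flops"].iloc[0]
    return 10 ** (log_growth / years)
```

Bootstrapping this statistic alongside the exponential slope then allows the two distributions of annualized growth to be compared.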