Power requirements of leading AI supercomputers have doubled every 13 months
Leading AI supercomputers are becoming ever more energy-intensive, using more power-hungry chips in greater numbers. In January 2019, Summit at Oak Ridge National Lab had the highest power capacity of any AI supercomputer at 13 MW. Today, xAI’s Colossus supercomputer uses 280 MW, over 20x as much.
Colossus relies on mobile generators because the local grid has insufficient power capacity for so much hardware. In the future, we may see frontier models trained across geographically distributed supercomputers, to mitigate the difficulty of delivering enormous amounts of power to a single location, similar to the training setup for Gemini 1.0.
Published: June 05, 2025
Last updated: June 05, 2025
Learn more
Data
Data come from our AI Supercomputers dataset, which tracks 728 supercomputers with dedicated AI accelerators, spanning from 2010 to the present. By our estimate, these systems account for roughly 10-20% of all AI chip performance produced before 2025. We focus on the 502 AI supercomputers which became operational in 2019 or later, since these are most relevant to modern AI training.
When available, we use the power capacity reported by an AI supercomputer’s owner. When we cannot find a reported figure, we estimate the maximum theoretical power requirements of each AI supercomputer with the following formula:
Power capacity = chip TDP × number of chips × system overhead × PUE
Chip Thermal Design Power (TDP) represents the maximum power a chip is designed to operate at. This value is public for most AI chips, but it is not available for some Chinese chips and custom silicon, such as Google’s TPU v5p; we omit these systems from our analysis.
System overhead represents the additional power needed to run non-GPU hardware. We apply a value of 1.82 for all systems, based on NVIDIA DGX H100 server specifications. See the Assumptions section for more details on this figure.
Power Usage Effectiveness (PUE) represents the overhead associated with non-IT equipment (e.g. air conditioning and power conversion inefficiencies). PUE has diminished over time as datacenters have improved their efficiency; we obtain annual figures from Shehabi et al. (2024). We subtract 0.29 from the overall PUE in each year, since the same report finds that specialized datacenters (like AI supercomputers) have an average PUE that is 0.29 lower than the overall average.
When power usage is reported by the TOP500 list, which provides average power draw during benchmarks, we multiply this figure by 1.5 to approximate peak power consumption, based on comparisons from cases where both peak and average values are known.
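To make this procedure concrete, here is a minimal Python sketch of the calculation under the assumptions above. The function names, the example chip specification, and the annual PUE table are illustrative placeholders rather than values from our dataset.

```python
# Sketch of the power-capacity estimate described above.
# All names and example values are illustrative assumptions,
# not figures taken directly from the AI Supercomputers dataset.

SYSTEM_OVERHEAD = 1.82          # non-GPU hardware, from NVIDIA DGX H100 specs
PUE_SPECIALIZED_OFFSET = 0.29   # specialized datacenters vs. overall average
TOP500_PEAK_FACTOR = 1.5        # average benchmark draw -> approximate peak

# Placeholder overall PUE figures by year (the report values are not listed here).
OVERALL_PUE_BY_YEAR = {2019: 1.6, 2022: 1.5, 2024: 1.45}


def estimated_power_capacity_w(chip_tdp_w: float, num_chips: int, year: int) -> float:
    """Maximum theoretical power: chip TDP x number of chips x system overhead x PUE."""
    pue = OVERALL_PUE_BY_YEAR[year] - PUE_SPECIALIZED_OFFSET
    return chip_tdp_w * num_chips * SYSTEM_OVERHEAD * pue


def peak_from_top500_average(average_draw_w: float) -> float:
    """Approximate peak power from a TOP500-reported average benchmark draw."""
    return average_draw_w * TOP500_PEAK_FACTOR


# Example: a hypothetical 10,000-chip cluster of 700 W TDP accelerators in 2024.
print(f"{estimated_power_capacity_w(700, 10_000, 2024) / 1e6:.1f} MW capacity")
print(f"{peak_from_top500_average(8.0e6) / 1e6:.1f} MW peak from an 8 MW average draw")
```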
Analysis
To estimate a trend in power requirements for leading AI supercomputers, we first identify 65 “frontier” AI supercomputers, defined as those that were among the top 10 most powerful supercomputers (by 16-bit performance) at the date they began operation. We then filter to the 60 frontier supercomputers for which we were able to obtain a power estimate. To see how the power requirements of these frontier supercomputers have grown over time, we fit a log-linear regression of power capacity on time.
We find that power requirements for frontier AI supercomputers have been growing by 1.9x per year, with a 90% confidence interval of 1.7x to 2.1x.
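As a rough illustration of this fit, the sketch below regresses log power capacity on time and converts the slope into an annual growth factor and a doubling time. The data points are synthetic placeholders, not the 60 frontier systems used in our actual estimate.

```python
import numpy as np

# Synthetic illustration of the log-linear fit; not our actual dataset.
# Each pair is (years since January 2019, power capacity in MW).
frontier = np.array([
    (0.0, 13), (1.2, 25), (2.1, 45), (3.0, 90), (4.4, 150), (5.9, 280),
])
years, power_mw = frontier[:, 0], frontier[:, 1]

# Fit log(power) = slope * t + intercept, so power grows by exp(slope) per year.
slope, intercept = np.polyfit(years, np.log(power_mw), 1)
annual_growth = np.exp(slope)
doubling_time_months = 12 * np.log(2) / slope

print(f"growth: {annual_growth:.2f}x per year, "
      f"doubling every {doubling_time_months:.0f} months")
```

At the estimated rate of 1.9x per year, the implied doubling time is 12 × ln(2) / ln(1.9) ≈ 13 months, the figure quoted in the headline.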
Assumptions and limitations
- Our estimates represent power capacity, not power consumption. Power capacity corresponds to power usage when a supercomputer is running at maximum utilization. AI supercomputers are usually not fully utilized, so total annual energy usage is substantially less than power capacity × hours per year. For instance, we found that TOP500 supercomputers consumed about 65% of their peak power on average when running the Linpack benchmark.
- Owner-reported power figures that were not obtained from TOP500 may not be well standardized. For instance, some may represent only the IT load, while others may account for PUE. We assume the noise introduced by this lack of standardization is minimal relative to the trend.
- We assume that our formula for calculating power usage based on chip TDP is appropriate and unbiased. To test the consistency of our calculations, we regressed log(calculated power) on log(reported power) for all supercomputers where both figures were available. We found a correlation of 0.98 and a power-law exponent of 1.009, indicating that our calculated figures closely match reported figures across a wide range of scales (see the sketch after this list).
- Our power calculations rely on an estimate of non-GPU system overhead obtained from an NVIDIA DGX H100 system. While Hopper chips likely comprise the majority of all AI computing power, there is significant variation in system overhead: across a sample of other systems for which we could obtain figures, overhead ranged from 1.3 to 2.5. The geometric mean of overhead across all systems with available data is very close to the DGX H100 figure.
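Below is a minimal sketch of the consistency check described above; the paired values are hypothetical placeholders, not systems from our dataset.

```python
import numpy as np

# Hypothetical paired values in MW; our actual check uses every system
# with both a reported and a calculated power figure.
reported_mw = np.array([13, 20, 35, 60, 110, 280])
calculated_mw = np.array([14, 19, 38, 57, 115, 265])

# Regress log(calculated power) on log(reported power).
exponent, const = np.polyfit(np.log(reported_mw), np.log(calculated_mw), 1)
correlation = np.corrcoef(np.log(reported_mw), np.log(calculated_mw))[0, 1]

# An exponent near 1 and a high correlation indicate the TDP-based formula
# tracks reported power across scales without systematic bias.
print(f"exponent: {exponent:.3f}, correlation: {correlation:.3f}")
```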