GPU Clusters Documentation

Estimation

Some fields in the dataset require estimation because they are often not reported directly in papers or other sources. Here, we describe how we estimate power requirements and hardware costs, and how we record our confidence in those estimates.

Estimating power requirements

We calculated the peak power demand for each GPU cluster with the following formula:

Power = Chip TDP × number of chips × system overhead × PUE

We collected the Thermal Design Power (TDP) of each chip where it was publicly available, though we did not find the TDP for some Chinese chips and custom silicon such as Google’s TPU v5p. We considered both primary and secondary chips when counting the number and types of chips.

To account for system overhead (the additional power needed for non-GPU server components such as CPUs, network switches, and storage), we used a 1.82× multiplier based on NVIDIA DGX H100 server specifications (NVIDIA, 2025).

We also factored in Power Usage Effectiveness (PUE), which is the ratio of total data center power use to IT power use (with a minimum value of 1). According to the 2024 United States Data Center Energy Usage Report (Shehabi et al., 2024), specialized AI data center facilities had an average PUE of 1.14 in 2023, which is 0.29 lower than the overall national average of 1.43. To estimate the average PUE of AI data centers over time, we took the trend for all data center facilities and subtracted 0.29 from the overall values reported by Shehabi et al. (Table 2).
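
As a rough illustration, the power formula can be sketched in Python. The function names, the way multiple chip types are summed, the clamping of PUE at 1, and the example cluster (10,000 chips at an assumed 700 W TDP each) are our own assumptions for this sketch, not values from the dataset.

```python
SYSTEM_OVERHEAD = 1.82  # multiplier for non-GPU server hardware, from the text
AI_PUE_OFFSET = 0.29    # gap between overall and AI data center PUE, from the text


def ai_pue_from_overall(overall_pue):
    """Estimate AI data center PUE from the overall average PUE."""
    # Clamping at 1.0 is an assumption of this sketch, based on PUE
    # having a minimum value of 1.
    return max(1.0, overall_pue - AI_PUE_OFFSET)


def peak_power_watts(chips, pue, system_overhead=SYSTEM_OVERHEAD):
    """Peak power demand in watts.

    chips: iterable of (tdp_watts, quantity) pairs, one per chip type.
    Summing over chip types is an assumption about how primary and
    secondary chips are combined.
    """
    if pue < 1.0:
        raise ValueError("PUE cannot be below 1")
    chip_power = sum(tdp * quantity for tdp, quantity in chips)
    return chip_power * system_overhead * pue


# Hypothetical cluster: 10,000 chips, each with an assumed 700 W TDP.
pue_2023 = ai_pue_from_overall(1.43)
power_w = peak_power_watts([(700, 10_000)], pue_2023)
print(f"PUE {pue_2023:.2f}, peak power {power_w / 1e6:.1f} MW")
# prints: PUE 1.14, peak power 14.5 MW
```

Keeping PUE as an explicit argument makes it straightforward to pass a year-specific value instead of the 2023 figure.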

For more information, see Appendix B.4 in our paper.

Estimating hardware cost

We use the publicly reported total hardware cost of the GPU cluster in our analysis whenever it is available. When it is unavailable, we estimate the cost from the chip type, chip quantity, and public chip prices. For each type of chip, we multiply the cost per chip by the number of chips and by factors for intra-server and inter-server overhead, then sum these costs if there are multiple types of chips. Then, we adjust for the cost of server-to-server networking equipment, which was estimated to be 19% of final hardware acquisition costs. Finally, we apply a 15% discount to the final hardware cost of the cluster, because large purchasers of AI chips often negotiate a discount on their orders.

Our final formula for estimating hardware cost is as follows:

Hardware acquisition cost = [(Primary AI chip cost × Primary AI chip quantity) + (Secondary AI chip cost × Secondary AI chip quantity)] × Intra-server overhead × Inter-server overhead × Discount factor

In this formula, our intra-server overhead, or “chip-to-server” factor, is 1.64×, our inter-server overhead, or “server-to-cluster” factor, is 1.23×, and our discount factor is 0.85×.
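
The formula and factors above can be sketched in Python as follows. This is a minimal sketch: the function name, the `reported_cost` parameter, and the example chip price of $30,000 are hypothetical and chosen only for illustration.

```python
INTRA_SERVER_OVERHEAD = 1.64  # "chip-to-server" factor from the text
INTER_SERVER_OVERHEAD = 1.23  # "server-to-cluster" factor from the text
DISCOUNT_FACTOR = 0.85        # 15% discount for large purchasers


def hardware_cost_usd(chips, reported_cost=None):
    """Estimate hardware acquisition cost in US dollars.

    chips: iterable of (price_per_chip_usd, quantity) pairs, one per
    chip type (primary and, if present, secondary).
    reported_cost: publicly reported total cost, used as-is when given.
    """
    if reported_cost is not None:
        return reported_cost
    chip_cost = sum(price * quantity for price, quantity in chips)
    return (chip_cost
            * INTRA_SERVER_OVERHEAD
            * INTER_SERVER_OVERHEAD
            * DISCOUNT_FACTOR)


# Hypothetical cluster: 10,000 chips at an assumed $30,000 per chip.
cost = hardware_cost_usd([(30_000, 10_000)])
print(f"${cost / 1e6:.1f}M")
# prints: $514.4M
```

The three factors combine to a single multiplier of about 1.71 on the summed chip cost.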

For more information, see Appendix B.5 in our paper.

Estimating certainty

In many cases, we do not have exact details for a system and must estimate from the information available. When we do so, we explain our reasoning in the “Notes” field and lower the “Certainty” field as warranted.

For example, when we know the power capacity of a system but not its chip type, we sometimes estimate its FLOP/s from the net FLOP/s/W of the chip(s) it is most likely using. For systems several years in the future, where we do not know the specifications of the chips that will exist by then, we assume that the 1.26×/year improvement trend in FLOP/s/W (Appendix D.1 of our paper) will continue.
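
One way to picture this kind of estimate is the sketch below. It assumes that FLOP/s is obtained by multiplying power capacity by net FLOP/s/W and that the efficiency trend compounds annually; the function names and all example figures (a 100 MW system, 5e11 FLOP/s/W in 2024) are hypothetical.

```python
EFFICIENCY_GROWTH_PER_YEAR = 1.26  # FLOP/s/W trend from the text


def projected_flops_per_watt(base_flops_per_watt, base_year, target_year):
    """Extrapolate net FLOP/s/W, assuming the trend compounds annually."""
    years = target_year - base_year
    return base_flops_per_watt * EFFICIENCY_GROWTH_PER_YEAR ** years


def estimated_flops(power_watts, flops_per_watt):
    """Estimate system FLOP/s from power capacity and net efficiency."""
    return power_watts * flops_per_watt


# Hypothetical system: 100 MW capacity planned for 2027, extrapolating
# from an assumed net efficiency of 5e11 FLOP/s/W in 2024.
efficiency_2027 = projected_flops_per_watt(5e11, 2024, 2027)
flops = estimated_flops(100e6, efficiency_2027)
print(f"{flops:.2e} FLOP/s")
# prints: 1.00e+20 FLOP/s
```

At 1.26× per year, net efficiency roughly doubles every three years, so estimates for distant systems are sensitive to the assumed base-year efficiency.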
