AI Supercomputers Documentation
Overview
Our AI Supercomputers Dataset is a collection of records on AI compute clusters built from hardware typically used to train large-scale AI models, with key details such as performance, hardware type, location, and estimated cost and power draw. This dataset is useful for research on trends in the physical infrastructure used for training artificial intelligence.
This documentation describes which AI supercomputers are contained within the dataset, the information in its records (including data fields and definitions), and processes for adding new entries and auditing accuracy. It also includes a changelog and acknowledgements.
The dataset is accessible on our website as a visualization or table, and is available for download as a CSV file, refreshed daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
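For a minimal example of working with the download outside the notebook, the CSV can be loaded directly with pandas. The filename below is a placeholder for wherever you save the file.

import pandas as pd

# Load the downloaded CSV; "ai_supercomputers.csv" is a placeholder filename.
df = pd.read_csv("ai_supercomputers.csv")

print(df.shape)             # number of records and fields
print(df.columns.tolist())  # field names, described under "Fields" below
print(df.head())            # a quick look at the first few records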
If you would like to ask any questions about the database, or suggest any systems that should be added or edited, feel free to contact us at data@epoch.ai.
If this dataset is useful to you, please cite it.
Use This Work
Epoch’s data is free to use, distribute, and reproduce under the Creative Commons Attribution license, provided the source and authors are credited.
Citation
Konstantin Pilz, Robi Rahman, James Sanders, Lennart Heim, ‘Trends in AI Supercomputers’. Published online at epoch.ai. Retrieved from: ‘https://epoch.ai/data/ai-supercomputers’ [online resource], accessed
BibTeX citation
@misc{EpochAISupercomputers2025,
  title = {Trends in AI Supercomputers},
  author = {Konstantin Pilz and Robi Rahman and James Sanders and Lennart Heim},
  year = {2025},
  month = {04},
  url = {https://epoch.ai/data/ai-supercomputers},
  note = {Accessed: }
}
Inclusion
The dataset focuses on AI supercomputers, which are systems that support training of large-scale ML models. They contain large numbers of processors known as AI accelerators, are deployed on a contiguous campus without significant physical separation, and form a single system, typically connected by a high-bandwidth networking fabric. AI supercomputers are sometimes also referred to as “GPU clusters” or “AI datacenters”.
Criteria
A supercomputer should be included in this database if both of the following criteria are satisfied:
- The system contains chips that can accelerate AI workloads. These include NVIDIA’s V100, A100, H100, and GB200 GPUs, Google’s TPUs, and other chips commonly used to train frontier AI models.
- The system has high theoretical performance relative to other AI supercomputers at the time it was built. To train state-of-the-art machine learning models, developers typically need access to more compute than was used for previous models. Additionally, hardware improves rapidly and follows an exponential trend over time, so we use a dynamic threshold during the study period. For inclusion, the theoretical performance of the system must be at least 1% of that of the largest known AI supercomputer that existed on the date the system first became operational.
- Theoretical performance is calculated by multiplying the number of AI chips by their theoretical maximum (non-sparse) FLOP/s, using the highest available FLOP/s figure among 32-, 16-, and 8-bit number formats (see the worked sketch after this list). See here for the minimum FLOP/s count required at any point in time to be included.
Outside of the study period, we apply an inclusion threshold that does not scale over time:
- For systems before 2017, the AI supercomputer is included if its theoretical performance is at least 10^16 FLOP/s, in any numerical precision, which roughly corresponds to 1% of the highest performance of any pre-2017 supercomputer.
- For systems after 2024, the AI supercomputer is included if its theoretical performance is equivalent to at least 1,000 H100s, which is 1% of the size of the leading supercomputer as of the end of 2024.
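To make the threshold concrete, here is a minimal sketch of the calculation for a hypothetical cluster. The H100 figure of roughly 1.98e15 dense FP8 FLOP/s is NVIDIA's published peak for the SXM variant; in general the dataset uses whichever 32-, 16-, or 8-bit non-sparse figure is highest for each chip.

H100_PEAK_FLOPS = 1.98e15  # approx. H100 SXM dense (non-sparse) FP8 peak, in FLOP/s

def theoretical_performance(num_chips: int, peak_flops_per_chip: float) -> float:
    """Theoretical performance = number of AI chips x peak non-sparse FLOP/s per chip."""
    return num_chips * peak_flops_per_chip

# Post-2024 fixed threshold: equivalent to at least 1,000 H100s.
threshold = theoretical_performance(1_000, H100_PEAK_FLOPS)

# Hypothetical candidate: a 4,096-H100 cluster.
candidate = theoretical_performance(4_096, H100_PEAK_FLOPS)
print(candidate >= threshold)  # True, so it would meet the post-2024 threshold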
We also include planned supercomputers if we anticipate they will meet the above requirements once they become operational. These are indicated with a value of “Planned” in the Status field, and can be shown or hidden in the visualization.
Data sources
We collect and maintain data on AI supercomputers from a variety of sources, including machine learning papers, publicly available news articles, press releases, and existing lists of supercomputers.
We created a list of potential supercomputers by using the Google Search API to search key terms like “AI supercomputer” and “GPU cluster” from 2019 to 2025, then used GPT-4o to extract any supercomputers mentioned in the resulting articles. We also added supercomputers from publicly available lists such as Top500 and MLPerf, and GPU rental marketplaces. For each potential supercomputer, we manually searched for public information such as number and type of chips used, when it was first operational, reported performance, owner, and location. A detailed description of our methods can be found in Appendix A of our paper.
Coverage
We estimate that we cover 10-20% of AI supercomputers by computing power as of early 2025.
This includes roughly 20-37% of NVIDIA H100s, 12% of A100s, and 18% of AMD MI300Xs. Meanwhile, we estimate we cover less than 4% of Google’s TPUs and very few custom AI chips designed by AWS, Microsoft, or Meta. We also only cover about 2% of NVIDIA chips designed to be sold in China (including the A800, H800, and H20).
The coverage of different companies varies considerably, from 43% for Meta and 20% for Microsoft to 10% for AWS and 0% for Apple. The coverage of Chinese companies is particularly poor. Our average coverage of 8 major companies is 15%.
From the end of 2020 to the end of 2024, we cover between 10% and 20% of total Chinese 16-bit FLOP/s, based on an estimate by IDC (2025).
For more information, see Appendix C.1.1 of our paper.
Records
An entry in this database contains the details of an AI supercomputer at a point in time, typically the first date it became operational.
Many AI supercomputers are upgraded or expanded over time with newer hardware. In these cases, we create a new record, following the procedure below, with a date reflecting when the upgrade was completed.
Upgrades to supercomputers
If an existing AI supercomputer is upgraded in a way that substantially changes it, we count the result as a new AI supercomputer and create a new entry. We do this when either of the following criteria applies:
- The upgrade increases the AI supercomputer’s performance by more than 20%
- The majority of the AI chips used in the supercomputer are changed
We mark that the later supercomputer builds on the former by linking them in the “Builds Upon” and “Superseded by” fields. (See the table of fields below for further details.)
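As a rough sketch, the two conditions above could be checked as follows; the function and argument names are hypothetical, and only the 20% and majority thresholds come from the criteria.

def upgrade_warrants_new_entry(old_perf_flops: float, new_perf_flops: float,
                               chips_changed: int, total_chips: int) -> bool:
    """New record if performance grows by more than 20%, or if the majority
    of the AI chips in the supercomputer are changed."""
    performance_jump = new_perf_flops > 1.20 * old_perf_flops
    majority_chips_changed = chips_changed > total_chips / 2
    return performance_jump or majority_chips_changed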
Chinese data
We anonymized the data of Chinese systems by concealing names and rounding values of numerical fields to one significant figure. We do this to protect our public data sources and reduce the risk of owners redacting relevant information. We took this step in response to reports about reduced Chinese openness triggered by increased coverage in American think tank reports. We may grant some trusted researchers access to the full dataset upon request. Inquiries should be directed to data@epoch.ai.
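For concreteness, rounding a numerical field to one significant figure works as in the short sketch below; the sample value is made up.

from math import floor, log10

def round_to_one_sig_fig(x: float) -> float:
    """Round a positive value to one significant figure, as done for numerical
    fields of Chinese systems."""
    if x == 0:
        return 0.0
    exponent = floor(log10(abs(x)))
    return round(x, -exponent)

print(round_to_one_sig_fig(23_500.0))  # 20000.0, e.g. an anonymized chip count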
Fields
Estimation
Some fields in the dataset require estimation, because their values are often not directly reported in papers or other sources. Here, we describe our estimation procedures for power requirements and hardware costs, and the metadata we record about the confidence of our estimates.
Estimating power requirements
We calculated the peak power demand for each AI supercomputer with the following formula:
Power = Chip TDP × number of chips × system overhead × PUE
We collected Thermal Design Power (TDP) for most chips when publicly available, though we did not find the TDP for some Chinese chips and custom silicon such as Google’s TPU v5p. We considered both primary and secondary chips when counting the number and types of chips. We applied a 1.82× system overhead multiplier to account for the additional power needed by other server components such as CPUs, network switches, and storage, based on NVIDIA DGX H100 server specifications (NVIDIA, 2025). We also factored in Power Usage Effectiveness (PUE), which is the ratio of total data center power use to IT power use (with a minimum value of 1). According to the 2024 United States Data Center Energy Usage Report (Shehabi et al., 2024), specialized AI datacenter facilities had an average PUE of 1.14 in 2023, which is 0.29 lower than the overall national average of 1.43. To estimate the average PUE of AI datacenters over time, we subtracted 0.29 from the overall values reported by Shehabi et al. (Table 2).
For more information, see Appendix B.4 in our paper.
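As an illustration, the sketch below applies this formula to a hypothetical cluster of 16,384 H100s; the 700 W TDP is NVIDIA's figure for the SXM H100, and the 1.82× system overhead and 1.14 PUE defaults are the values described above.

def peak_power_watts(chip_tdp_w: float, num_chips: int,
                     system_overhead: float = 1.82, pue: float = 1.14) -> float:
    """Power = chip TDP x number of chips x system overhead x PUE."""
    return chip_tdp_w * num_chips * system_overhead * pue

# Hypothetical cluster: 16,384 H100s at 700 W TDP each.
print(peak_power_watts(700, 16_384) / 1e6)  # roughly 23.8 MW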
Estimating hardware cost
We use the publicly reported total hardware cost of the AI supercomputer in our analysis whenever it is available. When it is unavailable, we estimate the cost from the chip types, quantities, and public chip prices. For each type of chip, we multiply the cost per chip by the number of chips, and we sum these costs if there are multiple types of chips. We then multiply by factors for intra-server and inter-server overhead; the inter-server factor accounts for server-to-server networking equipment, which was estimated to be 19% of final hardware acquisition costs. Finally, we apply a discount factor of 15% to the final hardware cost to account for large purchasers of AI chips often negotiating a discount on their order.
Our final formula for estimating hardware cost is as follows:
Hardware acquisition cost = [(Primary AI chip cost × Primary AI chip quantity) + (Secondary AI chip cost × Secondary AI chip quantity)] × Intra-server overhead × Inter-server overhead × Discount factor
In this formula, our intra-server overhead, or “chip-to-server” factor, is 1.64×, our inter-server overhead, or “server-to-cluster” factor, is 1.23×, and our discount factor is 0.85×.
For more information, see Appendix B.5 in our paper.
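The same formula as a short sketch for a hypothetical single-chip-type cluster; the $30,000 per-chip price is an assumption for illustration only, while the 1.64×, 1.23×, and 0.85× factors are the values given above.

def hardware_cost(primary_chip_cost: float, primary_chip_qty: int,
                  secondary_chip_cost: float = 0.0, secondary_chip_qty: int = 0,
                  intra_server_overhead: float = 1.64,
                  inter_server_overhead: float = 1.23,
                  discount_factor: float = 0.85) -> float:
    """[(primary cost x qty) + (secondary cost x qty)] x intra-server overhead
    x inter-server overhead x discount factor."""
    chip_cost = (primary_chip_cost * primary_chip_qty
                 + secondary_chip_cost * secondary_chip_qty)
    return chip_cost * intra_server_overhead * inter_server_overhead * discount_factor

# Hypothetical cluster: 16,384 chips at an assumed $30,000 each.
print(hardware_cost(30_000, 16_384) / 1e9)  # roughly $0.84 billion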
Estimating certainty
In many cases, we don’t have all of the exact details for the system, and so have to make an estimate based on the information we have. When we do so, we explain our reasoning in the “Notes” field, and lower the “Certainty” field as warranted.
For example, when we know the power capacity of the system but not the chip type, we sometimes estimate the FLOP/s from the power capacity using the FLOP/s per watt of the chip(s) the system is most likely using. For systems planned several years in the future, where we don’t know the specifications of the chips that will exist by then, we assume the 1.26×/year improvement trend in FLOP/s/W (Appendix D.1 of our paper) will continue.
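For instance, extrapolating the 1.26×/year FLOP/s-per-watt trend a few years forward looks like the sketch below; the baseline value is purely illustrative.

def projected_flops_per_watt(baseline_flops_per_watt: float, years_ahead: float,
                             annual_growth: float = 1.26) -> float:
    """Extrapolate FLOP/s per watt assuming the 1.26x/year trend continues."""
    return baseline_flops_per_watt * annual_growth ** years_ahead

# Illustrative baseline of 1.4e12 FLOP/s per watt, projected 3 years ahead:
print(projected_flops_per_watt(1.4e12, 3))  # about 2.8e12, since 1.26^3 is roughly 2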
Downloads
We recommend downloading the processed dataset, which filters out clusters not analyzed in our study. This excludes clusters based on low-certainty rumors, clusters not known to be contiguous within a single location, and potential duplicates.
The raw dataset is also available, but we do not recommend it for most purposes.
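If you do work with the raw file, a filter along the lines of the sketch below can approximate the processed dataset; the filename, column names, and values here are hypothetical placeholders, so check the actual field names under Fields before using it.

import pandas as pd

raw = pd.read_csv("ai_supercomputers_raw.csv")  # placeholder filename

# Hypothetical column names; substitute the real field names from the Fields section.
processed_like = raw[
    (raw["certainty"] != "Unlikely")      # drop clusters based on low-certainty rumors
    & (raw["contiguous"])                 # keep clusters known to be in one location
    & (~raw["possible_duplicate"])        # drop potential duplicates
]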
Processed Dataset
CSV, Updated July 16, 2025
Acknowledgements
This data was collected by Epoch AI’s employees and collaborators, including Konstantin Pilz, Angelina Li, Rose Hadshar, Ben Cottier, Robi Rahman, James Sanders, Joanne Fang, Veronika Blablová, Lovis Heindrich, and David Atanasov.
This documentation was written by Konstantin Pilz, James Sanders, and Robi Rahman. Material on estimating supercomputer cost was adapted from previous work by Ben Cottier and Robi Rahman. Material on estimating hardware power consumption was adapted from previous work by Luke Emberson and Robi Rahman.