AI Supercomputers Documentation
Overview
Our AI Supercomputers Dataset is a collection of AI compute clusters made of hardware typically used for training large-scale AI models, with key details such as their performance, hardware type, location, and estimated cost and power draw. This dataset is useful for research about trends in the physical infrastructure used for training artificial intelligence.
This documentation describes which AI supercomputers are contained within the dataset, the information in its records (including data fields and definitions), and processes for adding new entries and auditing accuracy. It also includes a changelog and acknowledgements.
The dataset is accessible on our website as a visualization or table, and is available for download as a CSV file, refreshed daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
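For instance, here is a minimal sketch of loading a downloaded copy of the CSV with pandas; the local filename and the exact column headers are assumptions for illustration (see the field list below for the fields we record).

import pandas as pd

# Load a locally downloaded copy of the dataset (filename is illustrative).
df = pd.read_csv("ai_supercomputers.csv")
print(df.shape)  # number of records and fields
# Column names are assumed to match the field list in this documentation.
print(df[["Name", "Country", "First Operational Date"]].head())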
If you would like to ask any questions about the database, or suggest any systems that should be added or edited, feel free to contact us at data@epoch.ai.
If this dataset is useful for you, please cite it.
Use This Work
Epoch’s data is free to use, distribute, and reproduce under the Creative Commons Attribution license, provided the source and authors are credited.
Citation
Konstantin Pilz, Robi Rahman, James Sanders, Lennart Heim, ‘Trends in AI Supercomputers’. Published online at epoch.ai. Retrieved from https://epoch.ai/data/ai-supercomputers [online resource]. Accessed 2025-08-13.
BibTeX citation
@misc{EpochAISupercomputers2025,
  title = {Trends in AI Supercomputers},
  author = {Konstantin Pilz and Robi Rahman and James Sanders and Lennart Heim},
  year = {2025},
  month = {04},
  url = {https://epoch.ai/data/ai-supercomputers},
  note = {Accessed: 2025-08-13}
}
Inclusion
The dataset focuses on AI supercomputers, which are systems that support training of large-scale ML models. They contain large quantities of processors known as AI accelerators, are deployed on a continuous campus without significant physical separation, and form a single system, typically connected by high-bandwidth networking fabric. AI supercomputers are sometimes also referred to as “GPU clusters” or “AI datacenters”.
Criteria
A supercomputer should be included in this database if both of the following criteria are satisfied:
- The system contains chips that can accelerate AI workloads. These include NVIDIA’s V100, A100, H100, and GB200 GPUs, Google’s TPUs, and other chips commonly used to train frontier AI models.
- The system has high theoretical performance relative to other AI supercomputers at the time it was built. In order to train state-of-the-art machine learning models, developers typically need access to more compute than the amount that was used for previous models. Additionally, hardware improvement is rapid and follows an exponential trend over time, so we use a dynamic threshold during the study period. For inclusion, the theoretical performance of the system must be at least 1% of the largest known AI supercomputer that existed on the date when it first became operational.
- Theoretical performance is calculated by multiplying the number of AI chips by their theoretical maximum (non-sparse) FLOP/s value, for the highest available FLOP/s metric on 32-, 16-, or 8-bit number formats. See here for the minimum FLOP/s count required at any point in time to be included.
Outside of the study period, we apply an inclusion threshold that does not scale over time:
- For systems before 2017, the AI supercomputer is included if its theoretical performance is at least 10^16 FLOP/s, in any numerical precision, which roughly corresponds to 1% of the highest performance of any pre-2017 supercomputer.
- For systems after 2024, the AI supercomputer is included if its theoretical performance is equivalent to at least 1,000 H100s, which is 1% of the size of the leading supercomputer as of the end of 2024.
We also include planned supercomputers if we anticipate that they will meet the above requirements once they become operational. These are indicated with a value of “Planned” in the Status field, and can be shown or hidden in the visualization.
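To make the inclusion threshold concrete, here is a hedged sketch of the check described above. The function names and example numbers are illustrative assumptions; per-chip peak FLOP/s values would come from the Epoch ML Hardware Database.

# Theoretical performance = number of AI chips x peak (non-sparse) FLOP/s per chip,
# using the highest available 32-, 16-, or 8-bit figure.
def theoretical_flops(chip_count: int, peak_flops_per_chip: float) -> float:
    return chip_count * peak_flops_per_chip

# During the study period, a system is included if it reaches at least 1% of the
# largest known AI supercomputer on its first operational date.
def meets_inclusion_threshold(system_flops: float, largest_known_flops: float) -> bool:
    return system_flops >= 0.01 * largest_known_flops

# Hypothetical example: a 1,024-chip cluster at 1e15 FLOP/s per chip, compared
# against a hypothetical leading system of 1e20 FLOP/s.
cluster = theoretical_flops(1_024, 1e15)
print(meets_inclusion_threshold(cluster, largest_known_flops=1e20))  # True: 1.02e18 >= 1e18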
Data sources
We collect and maintain data on AI supercomputers from a variety of sources, including machine learning papers, publicly available news articles, press releases, and existing lists of supercomputers.
We created a list of potential supercomputers by using the Google Search API to search key terms like “AI supercomputer” and “GPU cluster” from 2019 to 2025, then used GPT-4o to extract any supercomputers mentioned in the resulting articles. We also added supercomputers from publicly available lists such as Top500 and MLPerf, and GPU rental marketplaces. For each potential supercomputer, we manually searched for public information such as number and type of chips used, when it was first operational, reported performance, owner, and location. A detailed description of our methods can be found in Appendix A of our paper.
Coverage
We estimate that we cover 10-20% of AI supercomputers by computing power as of early 2025.
This includes roughly 20-37% of NVIDIA H100s, 12% of A100s, and 18% of AMD MI300Xs. Meanwhile, we estimate we cover less than 4% of Google’s TPUs and very few custom AI chips designed by AWS, Microsoft, or Meta. We also only cover about 2% of NVIDIA chips designed to be sold in China (including the A800, H800, and H20).
The coverage of different companies varies considerably, from 43% for Meta and 20% for Microsoft to 10% for AWS and 0% for Apple. The coverage of Chinese companies is particularly poor. Our average coverage of 8 major companies is 15%.
From the end of 2020 to the end of 2024, we cover between 10% and 20% of total Chinese 16-bit FLOP/s, based on an estimate by IDC (2025).
For more information, see Appendix C.1.1 of our paper.
Records
An entry in this database contains the details of an AI supercomputer at a point in time, typically the first date it became operational.
Many AI supercomputers are upgraded or expanded over time with newer hardware. In these cases, we create a new record, following the procedure below, and reflecting the date when the upgrade was completed.
Upgrades to supercomputers
If an existing AI supercomputer is upgraded in a way that substantially changes it, we count the result as a new AI supercomputer and create a new entry. We do this when either of the following criteria applies:
- The upgrade increases the supercomputer’s performance by more than 20%
- The majority of the AI chips used in the supercomputer are changed
We mark that the later supercomputer builds on the former by linking them in the “Builds Upon” and “Superseded by” fields. (See the table of fields below for further details.)
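As an illustration only (the function and parameter names below are not part of the dataset schema), the upgrade rule can be expressed as:

# Create a new entry when an upgrade increases performance by more than 20%
# or replaces the majority of the AI chips. Names are illustrative assumptions.
def upgrade_creates_new_entry(old_flops: float, new_flops: float,
                              fraction_of_chips_changed: float) -> bool:
    performance_increase = (new_flops - old_flops) / old_flops
    return performance_increase > 0.20 or fraction_of_chips_changed > 0.5

print(upgrade_creates_new_entry(1e18, 1.3e18, fraction_of_chips_changed=0.1))  # True: +30% performance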
Chinese data
We anonymized the data of Chinese systems by concealing names and rounding values of numerical fields to one significant figure. We do this to protect our public data sources and reduce the risk of owners redacting relevant information. We took this step in response to reports about reduced Chinese openness triggered by increased coverage in American think tank reports. We may grant some trusted researchers access to the full dataset upon request. Inquiries should be directed to data@epoch.ai.
Fields
Column | Type | Definition | Example from Oak Ridge NL Frontier | Coverage
---|---|---|---|---
Name | Text | The owner’s name, followed by the cluster name used in the most official announcement. | Oak Ridge NL Frontier | 100% (747 of 747 entries)
Status | Categorical (single select) | Existing: The supercomputer (as described in this entry) is believed to be operational (>80% of the GPUs available for training). | Existing | 100% (747 of 747 entries)
Certainty | Categorical (single select) | How likely it is that this cluster exists (or will exist) roughly as specified. | Confirmed | 100% (747 of 747 entries)
Single cluster? | Categorical (single select) | Yes: The source clearly implies a single supercomputer deployed on a continuous campus without significant physical separation, connected by high-bandwidth networking fabric and forming a single system, and there is no evidence otherwise. | Yes | 100% (745 of 747 entries)
Chip type (primary) | Categorical (single select) | The primary AI chip used in the supercomputer. This links to a specific chip in the Epoch ML Hardware Database. | AMD Radeon Instinct MI250X | 61% (452 of 747 entries)
Chip quantity (primary) | Numerical | Total number of the primary AI chips in the supercomputer. | 37,632 | 84% (627 of 747 entries)
Hardware note | Text | Note describing all the AI chips in the supercomputer. | AMD MI250X | 62% (462 of 747 entries)
Max OP/s (log) | Numerical | The base-10 logarithm of the maximum theoretical operations per second the supercomputer can achieve (the highest of 32-, 16-, or 8-bit performance). | 19.158756074028645 | 93% (693 of 747 entries)
16-bit OP/s (log) | Numerical | Base-10 logarithm of the maximum theoretical performance of the supercomputer in 16-bit FLOP/s. | 19.158756074028645 | 90% (673 of 747 entries)
H100 equivalents | Numerical | The supercomputer’s Max OP/s (32-, 16-, or 8-bit) divided by an H100’s FP8 FLOP/s. Note that H100 equivalents is not a well-defined measurement, and this number should be treated as a rough sense of scale rather than a precise figure. It works fairly well for comparisons among supercomputers since ~2021, but makes less sense for older supercomputers. | 7,282 | 93% (693 of 747 entries)
First Operational Date | Date | When the supercomputer was first fully operational (i.e., a workload could be run on at least 80% of the cluster). This is often an approximation to the nearest month or so, since the exact date a supercomputer became operational is rarely announced. By default, we are conservative with this date and set it to the first date on which we have confirmation that the system was operational, which is often a few months after it actually became operational. | 2022-05-30 | 78% (581 of 747 entries)
Country | Categorical (single select) | Country in which the supercomputer is physically located. | United States of America | 89% (663 of 747 entries)
Owner | Categorical (multiple select) | The entity that owns the AI supercomputer itself (the hardware within the datacenter). This can include several entities if it is a joint venture. It can sometimes differ from the entity that owns the datacenter, or from the entity that has a long-term arrangement to rent the supercomputer. | US Department of Energy | 62% (465 of 747 entries)
Sector | Categorical (single select) | Private: Owned by a commercial entity. | Public | 98% (731 of 747 entries)
Power Capacity (MW) | Numerical | The peak power capacity of the system, in megawatts. Uses the reported power capacity if available, and the calculated power capacity otherwise. | 40.0 | 83% (618 of 747 entries)
Hardware Cost | Numerical | The cost to acquire the hardware for the system, in 2025 US dollars. Includes the cost of GPUs, CPUs, networking, etc., but not the datacenter itself. Uses the reported hardware cost if it exists, and the calculated hardware cost otherwise. | $620,445,966 | 72% (537 of 747 entries)
Energy Efficiency | Numerical | Log of the 16-bit FLOP/s per watt. The numerator is the theoretical max 16-bit FLOP/s of the system, and the denominator is the peak power capacity of the system. | 360326400000.0 | 79% (593 of 747 entries)
Rank when first operational | Numerical | The supercomputer’s world rank by performance (maximum theoretical FLOP/s at 32-, 16-, or 8-bit precision) on the day it first became operational. | 2 | 72% (540 of 747 entries)
Location | Text | Specific location where the supercomputer is physically located. | Oak Ridge National Laboratory 5200, 1 Bethel Valley Rd, Oak Ridge, TN 37830 | 76% (571 of 747 entries)
Users | Categorical (multiple select) | Any significant entities known to use the supercomputer. “Cloud” means the supercomputer is offered publicly via the cloud. Often this is just the owner. | US Government, Academia, Industry | 59% (438 of 747 entries)
Quote | Text | A quote from the source specifying the number and type of chips, and/or the reported or manually derived FLOP/s figure, if applicable. | The system has 74 Olympus rack HPE cabinets, each with 128 AMD compute nodes, and a total of 9,408 AMD compute nodes.... Each Frontier compute node consists of [1x] 64-core AMD “Optimized 3rd Gen EPYC” CPU (with 2 hardware threads per physical core) with access to 512 GB of DDR4 memory. Each node also contains [4x] AMD MI250X | 62% (465 of 747 entries)
Note | Text | Notes that give additional context or helpful information about the supercomputer. | (9408 nodes)x(4 GPUs/node) = 37,632 GPUs | 27% (203 of 747 entries)
First Operational Date Note | Text | Note on when the supercomputer was first fully operational (i.e., a workload could be run on at least 80% of the cluster). This is often an approximation to the nearest month or so, since the exact date a supercomputer became operational is rarely announced. By default, we are conservative with this date and set it to the first date on which we have confirmation that the system was operational, which is often a few months after it actually became operational. | 2022-05-30 | 66% (490 of 747 entries)
Certainty Note | Text | A brief explanation of why the given certainty level was selected. | Details released on official government website | 16% (122 of 747 entries)
Builds Upon | Categorical (single select) | If a supercomputer was built in multiple phases, this links to the supercomputer corresponding to the preceding phase. | [empty] | 7% (56 of 747 entries)
Superseded by | Categorical (single select) | If a supercomputer has a later phase (e.g., it was upgraded later), and that phase is already operational, this links to the supercomputer entry corresponding to that phase. | [empty] | 4% (31 of 747 entries)
Possible duplicate | Boolean | True if we think there is a >20% chance that this supercomputer is a duplicate of another supercomputer already in the database. When two entries might correspond to the same supercomputer, we mark this field False for the entry for which we have more information and True for the entry for which we have less, which allows users to filter out all entries with Possible duplicate = True for analysis. | False | 13% (94 of 747 entries)
Possible Duplicate Of | Categorical (multiple select) | If “Possible duplicate” is marked True, this links to the entry that we think this supercomputer duplicates. | [empty] | 7% (55 of 747 entries)
Chip type (secondary) | Categorical (single select) | The secondary AI chip used in the supercomputer, if any. This links to a specific chip in the Epoch ML Hardware Database. | [empty] (Frontier only used one AI chip type) | 4% (27 of 747 entries)
Chip quantity (secondary) | Numerical | Total number of the secondary AI chips in the supercomputer, if any. | [empty] (Frontier only used one AI chip type) | 4% (29 of 747 entries)
Total number of AI chips | Numerical | The number of AI chips in the supercomputer: the maximum of {primary AI chip quantity + secondary AI chip quantity; manually recorded total chip count; number of chips calculated from FLOP/s}. | 37,632 | 85% (637 of 747 entries)
GPU supplier (primary) | Categorical (single select) | The GPU supplier for the primary AI chip type. | AMD | 100% (747 of 747 entries)
GPU supplier (secondary) | Categorical (single select) | The GPU supplier for the secondary AI chip type (if applicable). | [empty] (Frontier only used one AI chip type) | 38% (283 of 747 entries)
Exclude | Boolean | True if the entry should be excluded from analyses, either because it falls below the inclusion threshold or for another reason; False otherwise. | False | 0% (0 of 747 entries)
Include in Standard Analysis | Boolean | True if this entry qualified for the analyses used in the original paper; that is, (Status = “Existing” or “Decommissioned”) AND (Exclude = False) AND (Certainty = “Confirmed” or “Likely”) AND (Possible duplicate = False) AND (Single cluster? = “Yes”). | True | 63% (469 of 747 entries)
Max OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve (the highest of 32-, 16-, or 8-bit performance). | 1.4e19 | 93% (693 of 747 entries)
8-bit OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve in 8-bit numerical precision. | 1.4e19 | 52% (387 of 747 entries)
16-bit OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve in 16-bit numerical precision. | 1.4e19 | 90% (673 of 747 entries)
32-bit OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve in 32-bit numerical precision. | 3.6e18 | 74% (550 of 747 entries)
Calculated Power Capacity (MW) | Numerical | The calculated power capacity of the supercomputer in megawatts, based on the number and type of chips. See the paper for the full methodology. | 40.0667904 | 80% (596 of 747 entries)
Reported power capacity (MW) | Text | Reported power capacity of the supercomputer, in megawatts. | 40.0 | 17% (124 of 747 entries)
Calculated Cost | Numerical | The cost (in 2025 US dollars) calculated from the number and type of chips. See the paper for the full methodology. | $967,488,702 | 72% (536 of 747 entries)
Reported Cost | Numerical | Reported total cost of the supercomputer, in US dollars at the time. | $600,000,000 | 4% (33 of 747 entries)
Reported Cost (Inflation adjusted) | Numerical | Reported total cost of the supercomputer, adjusted to 2025 US dollars. | $620,445,966 | 4% (30 of 747 entries)
Cost Quote | Text | Quote from the source stating how much the supercomputer cost. May refer to a currency other than US dollars, or may not yet be inflation-adjusted. | $600M | 5% (34 of 747 entries)
Largest existing cluster when first operational | Text | The name of the largest known supercomputer existing on this supercomputer’s first operational date. | Microsoft GPT-4 cluster | 72% (540 of 747 entries)
% of largest cluster when first operational | Numerical | The maximum theoretical FLOP/s (32-, 16-, or 8-bit) of this supercomputer, divided by the maximum theoretical FLOP/s of the largest known supercomputer existing at the time. | 0.9239138461538378 | 72% (540 of 747 entries)
Source 1 (through Source n) | URL | URL for a source on the AI supercomputer. Generally saved as a Wayback Machine link to preserve the content. | https://web.archive.org/web/20240720224959/https://docs.olcf.ornl.gov/systems/frontier_user_guide.html | 65% (487 of 747 entries)
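To illustrate how some of the derived fields relate to one another, the sketch below recomputes Frontier’s Max OP/s (log) and H100 equivalents from its chip count. The per-chip peak values are approximate assumptions for illustration; the dataset uses values from the Epoch ML Hardware Database.

import math

chip_quantity = 37_632               # Frontier's primary chip quantity
mi250x_peak_flops = 3.83e14          # assumed ~383 TFLOP/s per MI250X, 16-bit, non-sparse
h100_fp8_peak_flops = 1.98e15        # assumed ~1,979 TFLOP/s per H100, FP8, non-sparse

max_ops = chip_quantity * mi250x_peak_flops
print(math.log10(max_ops))            # ~19.16, matching "Max OP/s (log)"
print(max_ops / h100_fp8_peak_flops)  # ~7,280, close to the "H100 equivalents" value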
Estimation
Some fields in the dataset require estimation, because their values are often not directly reported in papers or other sources. Here, we describe our estimation procedures for power requirements, hardware costs, and the confidence of our estimates.
Estimating power requirements
We calculated the peak power demand for each AI supercomputer with the following formula:
Power = Chip TDP × number of chips × system overhead × PUE
We collected Thermal Design Power (TDP) for most chips when publicly available, though we did not find the TDP for some Chinese chips and custom silicon such as Google’s TPU v5p. We considered both primary and secondary chips when counting the number and types of chips. We used a 1.82× multiplier for non-GPU hardware to account for system overhead (additional power needed for other server components like CPUs, network switches, and storage), based on NVIDIA DGX H100 server specifications (NVIDIA, 2025). We also factored in Power Usage Effectiveness (PUE), which is the ratio of total data center power use to IT power use (with a minimum value of 1). According to the 2024 United States Data Center Energy Usage Report (Shehabi et al., 2024), specialized AI datacenter facilities had an average PUE of 1.14 in 2023, which is 0.29 lower than the overall national average of 1.43. We adjusted the trend for all datacenter facilities to estimate the average PUE of AI datacenters by subtracting 0.29 from the overall values reported by Shehabi et al. (Table 2).
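As a worked sketch of this formula (the chip count and 700 W TDP below are assumptions for illustration; the 1.82x system overhead and 1.14 PUE are the figures described above):

# Peak power = chip TDP x number of chips x system overhead x PUE.
def estimated_power_mw(chip_tdp_watts: float, num_chips: int,
                       system_overhead: float = 1.82, pue: float = 1.14) -> float:
    return chip_tdp_watts * num_chips * system_overhead * pue / 1e6  # watts to megawatts

print(estimated_power_mw(chip_tdp_watts=700, num_chips=10_000))  # ~14.5 MW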
For more information, see Appendix B.4 in our paper.
Estimating hardware cost
We use the publicly reported total hardware cost of the AI supercomputer in our analysis whenever it is available. When it is unavailable, we estimate this cost based on the chip type, quantity, and public chip prices. For each type of chip, we multiply the cost per chip by the number of chips, multiply by factors for intra-server and inter-server overhead, and then sum these costs if there are multiple types of chips. Then, we adjust for the cost of server-to-server networking equipment, which was estimated to be 19% of final hardware acquisition costs. Finally, we apply a discount factor of 15% to the final hardware cost of the AI supercomputer to account for large purchasers of AI chips often negotiating a discount on their order.
Our final formula for estimating hardware cost is as follows:
Hardware acquisition cost =
[(Primary AI chip cost × Primary AI chip quantity)
 + (Secondary AI chip cost × Secondary AI chip quantity)]
 × Intra-server overhead × Inter-server overhead × Discount factor
In this formula, our intra-server overhead, or “chip-to-server” factor, is 1.64×, our inter-server overhead, or “server-to-cluster” factor, is 1.23×, and our discount factor is 0.85×.
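A minimal sketch of this calculation, using the overhead and discount factors above (the chip price and quantity are illustrative assumptions):

# Hardware cost = chip costs x intra-server overhead x inter-server overhead x discount factor.
def estimated_hardware_cost(primary_price: float, primary_qty: int,
                            secondary_price: float = 0.0, secondary_qty: int = 0,
                            intra_server_overhead: float = 1.64,
                            inter_server_overhead: float = 1.23,
                            discount_factor: float = 0.85) -> float:
    chip_cost = primary_price * primary_qty + secondary_price * secondary_qty
    return chip_cost * intra_server_overhead * inter_server_overhead * discount_factor

# Hypothetical example: 10,000 primary chips at $25,000 each.
print(f"${estimated_hardware_cost(25_000, 10_000):,.0f}")  # ~$428,655,000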
For more information, see Appendix B.5 in our paper.
Estimating certainty
In many cases, we don’t have all of the exact details for the system, and so have to make an estimate based on the information we have. When we do so, we explain our reasoning in the “Notes” field, and lower the “Certainty” field as warranted.
For example, when we know the power capacity of the system but not the chip type, we sometimes estimate the FLOP/s from the power capacity, using the FLOP/s per watt of the chip(s) the system is most likely using. For systems several years in the future, where we don’t know the specifications of the chips that will exist by then, we assume that the 1.26x/year improvement trend in FLOP/s/W (Appendix D.1 of our paper) will continue.
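For instance, a sketch of projecting FLOP/s per watt forward under the 1.26x/year assumption (the baseline value and years are illustrative):

# Project chip efficiency forward assuming the 1.26x/year FLOP/s-per-watt trend continues.
def projected_flops_per_watt(base_flops_per_watt: float, base_year: int,
                             target_year: int, annual_growth: float = 1.26) -> float:
    return base_flops_per_watt * annual_growth ** (target_year - base_year)

print(projected_flops_per_watt(2e12, base_year=2024, target_year=2027))  # ~4e12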
Downloads
We recommend downloading the processed dataset, which filters out clusters not analyzed in our study. This excludes clusters based on low-certainty rumors, clusters not known to be contiguous within a single location, and potential duplicates.
The raw dataset is also available but not recommended.
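If you do work from the raw dataset, here is a sketch of reproducing the standard-analysis filter with pandas; the filename and exact column headers are assumptions based on the field list above and may need adjusting to match the CSV.

import pandas as pd

# Reproduce the "Include in Standard Analysis" criteria on the raw download (filename illustrative).
df = pd.read_csv("ai_supercomputers_raw.csv")
analyzed = df[
    df["Status"].isin(["Existing", "Decommissioned"])
    & (df["Exclude"] != True)
    & df["Certainty"].isin(["Confirmed", "Likely"])
    & (df["Possible duplicate"] != True)
    & (df["Single cluster?"] == "Yes")
]
print(len(analyzed))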
Processed Dataset
CSV, Updated August 13, 2025
Acknowledgements
This data was collected by Epoch AI’s employees and collaborators, including Konstantin Pilz, Angelina Li, Rose Hadshar, Ben Cottier, Robi Rahman, James Sanders, Joanne Fang, Veronika Blablová, Lovis Heindrich, and David Atanasov.
This documentation was written by Konstantin Pilz, James Sanders, and Robi Rahman. Material on estimating supercomputer cost was adapted from previous work by Ben Cottier and Robi Rahman. Material on estimating hardware power consumption was adapted from previous work by Luke Emberson and Robi Rahman.