AI Supercomputers Documentation
Overview
Our AI Supercomputers Dataset is a collection of AI compute clusters made of hardware typically used for training large-scale AI models, with key details such as their performance, hardware type, location, and estimated cost and power draw. This dataset is useful for research about trends in the physical infrastructure used for training artificial intelligence.
This documentation describes which AI supercomputers are contained within the dataset, the information in its records (including data fields and definitions), and processes for adding new entries and auditing accuracy. It also includes a changelog and acknowledgements.
The dataset is accessible on our website as a visualization or table, and is available for download as a CSV file, refreshed daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
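For instance, here is a minimal sketch of loading a downloaded copy of the CSV with pandas; the local filename and the exact column headers are assumptions for illustration (see the field list below for the fields we record).

import pandas as pd

# Load a locally downloaded copy of the dataset (filename is illustrative).
df = pd.read_csv("ai_supercomputers.csv")
print(df.shape)  # number of records and fields
# Column names are assumed to match the field list in this documentation.
print(df[["Name", "Country", "First Operational Date"]].head())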
If you would like to ask any questions about the database, or suggest any systems that should be added or edited, feel free to contact us at data@epoch.ai.
If this dataset is useful for you, please cite it.
Use This Work
Epoch’s data is free to use, distribute, and reproduce under the Creative Commons Attribution license, provided the source and authors are credited.
Citation
Konstantin Pilz, Robi Rahman, James Sanders, Lennart Heim, ‘Trends in AI Supercomputers’. Published online at epoch.ai. Retrieved from https://epoch.ai/data/ai-supercomputers [online resource]. Accessed 2025-08-13.
BibTeX citation
@misc{EpochAISupercomputers2025,
  title = {Trends in AI Supercomputers},
  author = {Konstantin Pilz and Robi Rahman and James Sanders and Lennart Heim},
  year = {2025},
  month = {04},
  url = {https://epoch.ai/data/ai-supercomputers},
  note = {Accessed: 2025-08-13}
}
Inclusion
The dataset focuses on AI supercomputers, which are systems that support training of large-scale ML models. They contain large quantities of processors known as AI accelerators, are deployed on a continuous campus without significant physical separation, and form a single system, typically connected by high-bandwidth networking fabric. AI supercomputers are sometimes also referred to as “GPU clusters” or “AI datacenters”.
Criteria
A supercomputer should be included in this database if both of the following criteria are satisfied:
- The system contains chips that can accelerate AI workloads. These include NVIDIA’s V100, A100, H100, and GB200 GPUs, Google’s TPUs, and other chips commonly used to train frontier AI models.
- The system has high theoretical performance relative to other AI supercomputers at the time it was built. In order to train state-of-the-art machine learning models, developers typically need access to more compute than the amount that was used for previous models. Additionally, hardware improvement is rapid and follows an exponential trend over time, so we use a dynamic threshold during the study period. For inclusion, the theoretical performance of the system must be at least 1% of the largest known AI supercomputer that existed on the date when it first became operational.
- Theoretical performance is calculated by multiplying the number of AI chips by their theoretical maximum (non-sparse) FLOP/s value, for the highest available FLOP/s metric on 32-, 16-, or 8-bit number formats. See here for the minimum FLOP/s count required at any point in time to be included.
Outside of the study period, we apply an inclusion threshold that does not scale over time:
- For systems before 2017, the AI supercomputer is included if its theoretical performance is at least 10^16 FLOP/s, in any numerical precision, which roughly corresponds to 1% of the highest performance of any pre-2017 supercomputer.
- For systems after 2024, the AI supercomputer is included if its theoretical performance is equivalent to at least 1,000 H100s, which is 1% of the size of the leading supercomputer as of the end of 2024.
We also include planned supercomputers if we anticipate that they will meet the above requirements once they become operational. These are indicated with a value of “Planned” in the Status field, and can be shown or hidden in the visualization.
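To make the inclusion threshold concrete, here is a hedged sketch of the check described above. The function names and example numbers are illustrative assumptions; per-chip peak FLOP/s values would come from the Epoch ML Hardware Database.

# Theoretical performance = number of AI chips x peak (non-sparse) FLOP/s per chip,
# using the highest available 32-, 16-, or 8-bit figure.
def theoretical_flops(chip_count: int, peak_flops_per_chip: float) -> float:
    return chip_count * peak_flops_per_chip

# During the study period, a system is included if it reaches at least 1% of the
# largest known AI supercomputer on its first operational date.
def meets_inclusion_threshold(system_flops: float, largest_known_flops: float) -> bool:
    return system_flops >= 0.01 * largest_known_flops

# Hypothetical example: a 1,024-chip cluster at 1e15 FLOP/s per chip, compared
# against a hypothetical leading system of 1e20 FLOP/s.
cluster = theoretical_flops(1_024, 1e15)
print(meets_inclusion_threshold(cluster, largest_known_flops=1e20))  # True: 1.02e18 >= 1e18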
Data sources
We collect and maintain data on AI supercomputers from a variety of sources, including machine learning papers, publicly available news articles, press releases, and existing lists of supercomputers.
We created a list of potential supercomputers by using the Google Search API to search key terms like “AI supercomputer” and “GPU cluster” from 2019 to 2025, then used GPT-4o to extract any supercomputers mentioned in the resulting articles. We also added supercomputers from publicly available lists such as Top500 and MLPerf, and GPU rental marketplaces. For each potential supercomputer, we manually searched for public information such as number and type of chips used, when it was first operational, reported performance, owner, and location. A detailed description of our methods can be found in Appendix A of our paper.
Coverage
We estimate that we cover 10-20% of AI supercomputers by computing power as of early 2025.
This includes roughly 20-37% of NVIDIA H100s, 12% of A100s, and 18% of AMD MI300Xs. Meanwhile, we estimate we cover less than 4% of Google’s TPUs and very few custom AI chips designed by AWS, Microsoft, or Meta. We also only cover about 2% of NVIDIA chips designed to be sold in China (including the A800, H800, and H20).
The coverage of different companies varies considerably, from 43% for Meta and 20% for Microsoft to 10% for AWS and 0% for Apple. The coverage of Chinese companies is particularly poor. Our average coverage of 8 major companies is 15%.
From the end of 2020 to the end of 2024, we cover between 10% and 20% of total Chinese 16-bit FLOP/s, based on an estimate by IDC (2025).
For more information, see Appendix C.1.1 of our paper.
Records
An entry in this database contains the details of an AI supercomputer at a point in time, typically the first date it became operational.
Many AI supercomputers are upgraded or expanded over time with newer hardware. In these cases, we create a new record, following the procedure below, and reflecting the date when the upgrade was completed.
Upgrades to supercomputers
If an existing AI supercomputer is upgraded in a way that substantially changes it, we count the result as a new AI supercomputer and create a new entry. We do this when either of the following criteria applies:
- The upgrade increases the supercomputer’s performance by more than 20%
- The majority of the AI chips used in the supercomputer are changed
We mark that the later supercomputer builds on the former by linking them in the “Builds Upon” and “Superseded by” fields. (See the table of fields below for further details.)
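As an illustration only (the function and parameter names below are not part of the dataset schema), the upgrade rule can be expressed as:

# Create a new entry when an upgrade increases performance by more than 20%
# or replaces the majority of the AI chips. Names are illustrative assumptions.
def upgrade_creates_new_entry(old_flops: float, new_flops: float,
                              fraction_of_chips_changed: float) -> bool:
    performance_increase = (new_flops - old_flops) / old_flops
    return performance_increase > 0.20 or fraction_of_chips_changed > 0.5

print(upgrade_creates_new_entry(1e18, 1.3e18, fraction_of_chips_changed=0.1))  # True: +30% performance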
Chinese data
We anonymized the data of Chinese systems by concealing names and rounding values of numerical fields to one significant figure. We do this to protect our public data sources and reduce the risk of owners redacting relevant information. We took this step in response to reports about reduced Chinese openness triggered by increased coverage in American think tank reports. We may grant some trusted researchers access to the full dataset upon request. Inquiries should be directed to data@epoch.ai.
Fields
Column | Type | Definition | Example from Oak Ridge NL Frontier | Coverage
---|---|---|---|---
Name | Text | The owner’s name, followed by the cluster name used in the most official announcement. | Oak Ridge NL Frontier | 100% (747 of 747 entries)
Status | Categorical (single select) | Existing: The supercomputer (as described in this entry) is believed to be operational (>80% of the GPUs available for training). | Existing | 100% (747 of 747 entries)
Certainty | Categorical (single select) | How likely it is that this cluster exists (or will exist) roughly as specified. | Confirmed | 100% (747 of 747 entries)
Single cluster? | Categorical (single select) | Yes: The source clearly implies a single supercomputer deployed on a continuous campus without significant physical separation, connected by high-bandwidth networking fabric and forming a single system, and there is no evidence otherwise. | Yes | 100% (745 of 747 entries)
Chip type (primary) | Categorical (single select) | The primary AI chip used in the supercomputer. This links to a specific chip in the Epoch ML Hardware Database. | AMD Radeon Instinct MI250X | 61% (452 of 747 entries)
Chip quantity (primary) | Numerical | Total number of the primary AI chips in the supercomputer. | 37,632 | 84% (627 of 747 entries)
Hardware note | Text | Note describing all the AI chips in the supercomputer. | AMD MI250X | 62% (462 of 747 entries)
Max OP/s (log) | Numerical | The base-10 logarithm of the maximum theoretical operations per second the supercomputer can achieve (the highest of 32-, 16-, or 8-bit performance). | 19.158756074028645 | 93% (693 of 747 entries)
16-bit OP/s (log) | Numerical | Base-10 logarithm of the maximum theoretical performance of the supercomputer in 16-bit FLOP/s. | 19.158756074028645 | 90% (673 of 747 entries)
H100 equivalents | Numerical | The supercomputer’s Max OP/s (32-, 16-, or 8-bit) divided by an H100’s FP8 FLOP/s. Note that H100 equivalents is not a well-defined measurement, and this number should be treated as a rough sense of scale rather than a precise figure. It works fairly well for comparisons among supercomputers since ~2021, but makes less sense for older supercomputers. | 7,282 | 93% (693 of 747 entries)
First Operational Date | Date | When the supercomputer was first fully operational (i.e., a workload could be run on at least 80% of the cluster). This is often an approximation to the nearest month or so, since the exact date a supercomputer became operational is rarely announced. By default, we are conservative with this date and set it to the first date on which we have confirmation that the system was operational, which is often a few months after it actually became operational. | 2022-05-30 | 78% (581 of 747 entries)
Country | Categorical (single select) | Country in which the supercomputer is physically located. | United States of America | 89% (663 of 747 entries)
Owner | Categorical (multiple select) | The entity that owns the AI supercomputer itself (the hardware within the datacenter). This can include several entities if it is a joint venture. It can sometimes differ from the entity that owns the datacenter, or from the entity that has a long-term arrangement to rent the supercomputer. | US Department of Energy | 62% (465 of 747 entries)
Sector | Categorical (single select) | Private: Owned by a commercial entity. | Public | 98% (731 of 747 entries)
Power Capacity (MW) | Numerical | The peak power capacity of the system, in megawatts. Uses the reported power capacity if available, and the calculated power capacity otherwise. | 40.0 | 83% (618 of 747 entries)
Hardware Cost | Numerical | The cost to acquire the hardware for the system, in 2025 US dollars. Includes the cost of GPUs, CPUs, networking, etc., but not the datacenter itself. Uses the reported hardware cost if it exists, and the calculated hardware cost otherwise. | $620,445,966 | 72% (537 of 747 entries)
Energy Efficiency | Numerical | Log of the 16-bit FLOP/s per watt. The numerator is the theoretical max 16-bit FLOP/s of the system, and the denominator is the peak power capacity of the system. | 360326400000.0 | 79% (593 of 747 entries)
Rank when first operational | Numerical | The supercomputer’s world rank by performance (maximum theoretical FLOP/s at 32-, 16-, or 8-bit precision) on the day it first became operational. | 2 | 72% (540 of 747 entries)
Location | Text | Specific location where the supercomputer is physically located. | Oak Ridge National Laboratory 5200, 1 Bethel Valley Rd, Oak Ridge, TN 37830 | 76% (571 of 747 entries)
Users | Categorical (multiple select) | Any significant entities known to use the supercomputer. “Cloud” means the supercomputer is offered publicly via the cloud. Often this is just the owner. | US Government, Academia, Industry | 59% (438 of 747 entries)
Quote | Text | A quote from the source specifying the number and type of chips, and/or the reported or manually derived FLOP/s figure, if applicable. | The system has 74 Olympus rack HPE cabinets, each with 128 AMD compute nodes, and a total of 9,408 AMD compute nodes.... Each Frontier compute node consists of [1x] 64-core AMD “Optimized 3rd Gen EPYC” CPU (with 2 hardware threads per physical core) with access to 512 GB of DDR4 memory. Each node also contains [4x] AMD MI250X | 62% (465 of 747 entries)
Note | Text | Notes that give additional context or helpful information about the supercomputer. | (9408 nodes)x(4 GPUs/node) = 37,632 GPUs | 27% (203 of 747 entries)
First Operational Date Note | Text | Note on when the supercomputer was first fully operational (i.e., a workload could be run on at least 80% of the cluster). This is often an approximation to the nearest month or so, since the exact date a supercomputer became operational is rarely announced. By default, we are conservative with this date and set it to the first date on which we have confirmation that the system was operational, which is often a few months after it actually became operational. | 2022-05-30 | 66% (490 of 747 entries)
Certainty Note | Text | A brief explanation of why the given certainty level was selected. | Details released on official government website | 16% (122 of 747 entries)
Builds Upon | Categorical (single select) | If a supercomputer was built in multiple phases, this links to the supercomputer corresponding to the preceding phase. | [empty] | 7% (56 of 747 entries)
Superseded by | Categorical (single select) | If a supercomputer has a later phase (e.g., it was upgraded later), and that phase is already operational, this links to the supercomputer entry corresponding to that phase. | [empty] | 4% (31 of 747 entries)
Possible duplicate | Boolean | True if we think there is a >20% chance that this supercomputer is a duplicate of another supercomputer already in the database. When two entries might correspond to the same supercomputer, we mark this field False for the entry for which we have more information and True for the entry for which we have less, which allows users to filter out all entries with Possible duplicate = True for analysis. | False | 13% (94 of 747 entries)
Possible Duplicate Of | Categorical (multiple select) | If “Possible duplicate” is marked True, this links to the entry that we think this supercomputer duplicates. | [empty] | 7% (55 of 747 entries)
Chip type (secondary) | Categorical (single select) | The secondary AI chip used in the supercomputer, if any. This links to a specific chip in the Epoch ML Hardware Database. | [empty] (Frontier only used one AI chip type) | 4% (27 of 747 entries)
Chip quantity (secondary) | Numerical | Total number of the secondary AI chips in the supercomputer, if any. | [empty] (Frontier only used one AI chip type) | 4% (29 of 747 entries)
Total number of AI chips | Numerical | The number of AI chips in the supercomputer: the maximum of {primary AI chip quantity + secondary AI chip quantity; manually recorded total chip count; number of chips calculated from FLOP/s}. | 37,632 | 85% (637 of 747 entries)
GPU supplier (primary) | Categorical (single select) | The GPU supplier for the primary AI chip type. | AMD | 100% (747 of 747 entries)
GPU supplier (secondary) | Categorical (single select) | The GPU supplier for the secondary AI chip type (if applicable). | [empty] (Frontier only used one AI chip type) | 38% (283 of 747 entries)
Exclude | Boolean | True if the entry should be excluded from analyses, either because it falls below the inclusion threshold or for another reason; False otherwise. | False | 0% (0 of 747 entries)
Include in Standard Analysis | Boolean | True if this entry qualified for the analyses used in the original paper; that is, (Status = “Existing” or “Decommissioned”) AND (Exclude = False) AND (Certainty = “Confirmed” or “Likely”) AND (Possible duplicate = False) AND (Single cluster? = “Yes”). | True | 63% (469 of 747 entries)
Max OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve (the highest of 32-, 16-, or 8-bit performance). | 1.4e19 | 93% (693 of 747 entries)
8-bit OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve in 8-bit numerical precision. | 1.4e19 | 52% (387 of 747 entries)
16-bit OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve in 16-bit numerical precision. | 1.4e19 | 90% (673 of 747 entries)
32-bit OP/s | Numerical | The maximum theoretical operations per second the supercomputer can achieve in 32-bit numerical precision. | 3.6e18 | 74% (550 of 747 entries)
Calculated Power Capacity (MW) | Numerical | The calculated power capacity of the supercomputer in megawatts, based on the number and type of chips. See the paper for the full methodology. | 40.0667904 | 80% (596 of 747 entries)
Reported power capacity (MW) | Text | Reported power capacity of the supercomputer, in megawatts. | 40.0 | 17% (124 of 747 entries)
Calculated Cost | Numerical | The cost (in 2025 US dollars) calculated from the number and type of chips. See the paper for the full methodology. | $967,488,702 | 72% (536 of 747 entries)
Reported Cost | Numerical | Reported total cost of the supercomputer, in US dollars at the time. | $600,000,000 | 4% (33 of 747 entries)
Reported Cost (Inflation adjusted) | Numerical | Reported total cost of the supercomputer, adjusted to 2025 US dollars. | $620,445,966 | 4% (30 of 747 entries)
Cost Quote | Text | Quote from the source stating how much the supercomputer cost. May refer to a currency other than US dollars, or may not yet be inflation-adjusted. | $600M | 5% (34 of 747 entries)
Largest existing cluster when first operational | Text | The name of the largest known supercomputer existing on this supercomputer’s first operational date. | Microsoft GPT-4 cluster | 72% (540 of 747 entries)
% of largest cluster when first operational | Numerical | The maximum theoretical FLOP/s (32-, 16-, or 8-bit) of this supercomputer, divided by the maximum theoretical FLOP/s of the largest known supercomputer existing at the time. | 0.9239138461538378 | 72% (540 of 747 entries)
Source 1 (through Source n) | URL | URL for a source on the AI supercomputer. Generally saved as a Wayback Machine link to preserve the content. | https://web.archive.org/web/20240720224959/https://docs.olcf.ornl.gov/systems/frontier_user_guide.html | 65% (487 of 747 entries)
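To illustrate how some of the derived fields relate to one another, the sketch below recomputes Frontier’s Max OP/s (log) and H100 equivalents from its chip count. The per-chip peak values are approximate assumptions for illustration; the dataset uses values from the Epoch ML Hardware Database.

import math

chip_quantity = 37_632               # Frontier's primary chip quantity
mi250x_peak_flops = 3.83e14          # assumed ~383 TFLOP/s per MI250X, 16-bit, non-sparse
h100_fp8_peak_flops = 1.98e15        # assumed ~1,979 TFLOP/s per H100, FP8, non-sparse

max_ops = chip_quantity * mi250x_peak_flops
print(math.log10(max_ops))            # ~19.16, matching "Max OP/s (log)"
print(max_ops / h100_fp8_peak_flops)  # ~7,280, close to the "H100 equivalents" value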
Estimation
Some fields in the dataset require estimation, because their values are often not directly reported in papers or other sources. Here, we describe our estimation procedures for power requirements, hardware costs, and the confidence of our estimates.
Estimating power requirements
We calculated the peak power demand for each AI supercomputer with the following formula:
Power = Chip TDP × number of chips × system overhead × PUE
We collected Thermal Design Power (TDP) for most chips when publicly available, though we did not find the TDP for some Chinese chips and custom silicon such as Google’s TPU v5p. We considered both primary and secondary chips when counting the number and types of chips. We used a 1.82× multiplier for non-GPU hardware to account for system overhead (additional power needed for other server components like CPUs, network switches, and storage), based on NVIDIA DGX H100 server specifications (NVIDIA, 2025). We also factored in Power Usage Effectiveness (PUE), which is the ratio of total data center power use to IT power use (with a minimum value of 1). According to the 2024 United States Data Center Energy Usage Report (Shehabi et al., 2024), specialized AI datacenter facilities had an average PUE of 1.14 in 2023, which is 0.29 lower than the overall national average of 1.43. We adjusted the trend for all datacenter facilities to estimate the average PUE of AI datacenters by subtracting 0.29 from the overall values reported by Shehabi et al. (Table 2).
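As a worked sketch of this formula (the chip count and 700 W TDP below are assumptions for illustration; the 1.82x system overhead and 1.14 PUE are the figures described above):

# Peak power = chip TDP x number of chips x system overhead x PUE.
def estimated_power_mw(chip_tdp_watts: float, num_chips: int,
                       system_overhead: float = 1.82, pue: float = 1.14) -> float:
    return chip_tdp_watts * num_chips * system_overhead * pue / 1e6  # watts to megawatts

print(estimated_power_mw(chip_tdp_watts=700, num_chips=10_000))  # ~14.5 MW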
For more information, see Appendix B.4 in our paper.
Estimating hardware cost
We use the publicly reported total hardware cost of the AI supercomputer in our analysis whenever it is available. When it is unavailable, we estimate this cost based on the chip type, quantity, and public chip prices. For each type of chip, we multiply the cost per chip by the number of chips, multiply by factors for intra-server and inter-server overhead, and then sum these costs if there are multiple types of chips. Then, we adjust for the cost of server-to-server networking equipment, which was estimated to be 19% of final hardware acquisition costs. Finally, we apply a discount factor of 15% to the final hardware cost of the AI supercomputer to account for large purchasers of AI chips often negotiating a discount on their order.
Our final formula for estimating hardware cost is as follows:
Hardware acquisition cost =
[(Primary AI chip cost × Primary AI chip quantity)
 + (Secondary AI chip cost × Secondary AI chip quantity)]
 × Intra-server overhead × Inter-server overhead × Discount factor
In this formula, our intra-server overhead, or “chip-to-server” factor, is 1.64×, our inter-server overhead, or “server-to-cluster” factor, is 1.23×, and our discount factor is 0.85×.
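A minimal sketch of this calculation, using the overhead and discount factors above (the chip price and quantity are illustrative assumptions):

# Hardware cost = chip costs x intra-server overhead x inter-server overhead x discount factor.
def estimated_hardware_cost(primary_price: float, primary_qty: int,
                            secondary_price: float = 0.0, secondary_qty: int = 0,
                            intra_server_overhead: float = 1.64,
                            inter_server_overhead: float = 1.23,
                            discount_factor: float = 0.85) -> float:
    chip_cost = primary_price * primary_qty + secondary_price * secondary_qty
    return chip_cost * intra_server_overhead * inter_server_overhead * discount_factor

# Hypothetical example: 10,000 primary chips at $25,000 each.
print(f"${estimated_hardware_cost(25_000, 10_000):,.0f}")  # ~$428,655,000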
For more information, see Appendix B.5 in our paper.
Estimating certainty
In many cases, we don’t have all of the exact details for the system, and so have to make an estimate based on the information we have. When we do so, we explain our reasoning in the “Notes” field, and lower the “Certainty” field as warranted.
For example, when we know the power capacity of the system but not the chip type, we sometimes estimate the FLOP/s from the power capacity, using the FLOP/s per watt of the chip(s) the system is most likely using. For systems several years in the future, where we don’t know the specifications of the chips that will exist by then, we assume that the 1.26x/year improvement trend in FLOP/s/W (Appendix D.1 of our paper) will continue.
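For instance, a sketch of projecting FLOP/s per watt forward under the 1.26x/year assumption (the baseline value and years are illustrative):

# Project chip efficiency forward assuming the 1.26x/year FLOP/s-per-watt trend continues.
def projected_flops_per_watt(base_flops_per_watt: float, base_year: int,
                             target_year: int, annual_growth: float = 1.26) -> float:
    return base_flops_per_watt * annual_growth ** (target_year - base_year)

print(projected_flops_per_watt(2e12, base_year=2024, target_year=2027))  # ~4e12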
Downloads
We recommend downloading the processed dataset, which filters out clusters not analyzed in our study. This excludes clusters based on low-certainty rumors, clusters not known to be contiguous within a single location, and potential duplicates.
The raw dataset is also available but not recommended.
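If you do work from the raw dataset, here is a sketch of reproducing the standard-analysis filter with pandas; the filename and exact column headers are assumptions based on the field list above and may need adjusting to match the CSV.

import pandas as pd

# Reproduce the "Include in Standard Analysis" criteria on the raw download (filename illustrative).
df = pd.read_csv("ai_supercomputers_raw.csv")
analyzed = df[
    df["Status"].isin(["Existing", "Decommissioned"])
    & (df["Exclude"] != True)
    & df["Certainty"].isin(["Confirmed", "Likely"])
    & (df["Possible duplicate"] != True)
    & (df["Single cluster?"] == "Yes")
]
print(len(analyzed))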
Processed Dataset
CSV, Updated August 13, 2025
Acknowledgements
This data was collected by Epoch AI’s employees and collaborators, including Konstantin Pilz, Angelina Li, Rose Hadshar, Ben Cottier, Robi Rahman, James Sanders, Joanne Fang, Veronika Blablová, Lovis Heindrich, and David Atanasov.
This documentation was written by Konstantin Pilz, James Sanders, and Robi Rahman. Material on estimating supercomputer cost was adapted from previous work by Ben Cottier and Robi Rahman. Material on estimating hardware power consumption was adapted from previous work by Luke Emberson and Robi Rahman.