AI Supercomputers Documentation

Overview

Our AI Supercomputers Dataset is a collection of AI compute clusters made of hardware typically used for training large-scale AI models, with key details such as their performance, hardware type, location, and estimated cost and power draw. This dataset is useful for research about trends in the physical infrastructure used for training artificial intelligence.

This documentation describes which AI supercomputers are contained within the dataset, the information in its records (including data fields and definitions), and processes for adding new entries and auditing accuracy. It also includes a changelog and acknowledgements.

The dataset is accessible on our website as a visualization or table, and is available for download as a CSV file, refreshed daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
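As a minimal sketch of getting started (the file name below is an assumption — use whatever name your CSV download has), the data can be loaded with pandas:

```python
import pandas as pd

# Load the downloaded CSV and take a first look at the records.
# The file name here is an assumption; adjust it to match your download.
df = pd.read_csv("ai_supercomputers.csv")

print(df.shape)             # (number of records, number of fields)
print(df.columns.tolist())  # field names, as documented under "Fields" below
print(df.head())            # the first few entries
```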

If you would like to ask any questions about the database, or suggest any systems that should be added or edited, feel free to contact us at data@epoch.ai.

If this dataset is useful for you, please cite it.

Use This Work

Epoch’s data is free to use, distribute, and reproduce under the Creative Commons Attribution license, provided the source and authors are credited.

Citation

Konstantin Pilz, Robi Rahman, James Sanders, Lennart Heim, ‘Trends in AI Supercomputers’. Published online at epoch.ai. Retrieved from: ‘https://epoch.ai/data/ai-supercomputers’ [online resource], accessed 

BibTeX citation

@misc{EpochAISupercomputers2025,
  title = {Trends in AI Supercomputers},
  author = {Konstantin Pilz and Robi Rahman and James Sanders and Lennart Heim},
  year = {2025},
  month = {04},
  url = {https://epoch.ai/data/ai-supercomputers},
  note = {Accessed: }
}

Inclusion

The dataset focuses on AI supercomputers, which are systems that support training of large-scale ML models. They contain large quantities of processors known as AI accelerators, are deployed on a continuous campus without significant physical separation, and form a single system, typically connected by high-bandwidth networking fabric. AI supercomputers are sometimes also referred to as “GPU clusters” or “AI datacenters”.

Criteria

A supercomputer should be included in this database if both of the following criteria are satisfied:

  • The system contains chips that can accelerate AI workloads. These include NVIDIA’s V100, A100, H100, and GB200 GPUs, Google’s TPUs, and other chips commonly used to train frontier AI models.
  • The system has high theoretical performance relative to other AI supercomputers at the time it was built. In order to train state-of-the-art machine learning models, developers typically need access to more compute than the amount that was used for previous models. Additionally, hardware improvement is rapid and follows an exponential trend over time, so we use a dynamic threshold during the study period. For inclusion, the theoretical performance of the system must be at least 1% of the largest known AI supercomputer that existed on the date when it first became operational.
    • Theoretical performance is calculated by multiplying the number of AI chips by their theoretical maximum (non-sparse) FLOP/s value, for the highest available FLOP/s metric on 32-, 16-, or 8-bit number formats. See here for the minimum FLOP/s count required at any point in time to be included. (A rough sketch of this threshold logic appears after this list.)

    Outside of the study period, we apply an inclusion threshold that does not scale over time:

    • For systems before 2017, the AI supercomputer is included if its theoretical performance is at least 10^16 FLOP/s, in any numerical precision, which roughly corresponds to 1% of the highest performance of any pre-2017 supercomputer.
    • For systems after 2024, the AI supercomputer is included if its theoretical performance is equivalent to at least 1,000 H100s, which is 1% of the size of the leading supercomputer at the end of 2024.
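As an illustration of the threshold logic above, here is a minimal sketch in Python. The H100 FP8 figure is approximate, all inputs are hypothetical, and this is not the code used to build the dataset:

```python
# Illustrative sketch of the inclusion threshold described above.

H100_FP8_FLOPS = 1.98e15  # approximate dense (non-sparse) FP8 FLOP/s of one H100


def theoretical_performance(num_chips: int, peak_flops_per_chip: float) -> float:
    """Number of AI chips times peak non-sparse FLOP/s per chip,
    using the highest of the 32-, 16-, or 8-bit figures."""
    return num_chips * peak_flops_per_chip


def meets_inclusion_threshold(system_flops: float,
                              largest_known_flops: float,
                              year_first_operational: int) -> bool:
    if year_first_operational < 2017:
        return system_flops >= 1e16                   # fixed pre-2017 threshold
    if year_first_operational > 2024:
        return system_flops >= 1000 * H100_FP8_FLOPS  # fixed post-2024 threshold
    # During the study period: at least 1% of the largest known
    # AI supercomputer operational on the same date.
    return system_flops >= 0.01 * largest_known_flops
```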

We also include planned supercomputers if we anticipate they will likely meet the above requirements once they become operational. These are indicated with a value of “Planned” in the Status field, and can be shown or hidden in the visualization.

Data sources

We collect and maintain data on AI supercomputers from a variety of sources, including machine learning papers, publicly available news articles, press releases, and existing lists of supercomputers.

We created a list of potential supercomputers by using the Google Search API to search key terms like “AI supercomputer” and “GPU cluster” from 2019 to 2025, then used GPT-4o to extract any supercomputers mentioned in the resulting articles. We also added supercomputers from publicly available lists such as Top500 and MLPerf, and GPU rental marketplaces. For each potential supercomputer, we manually searched for public information such as number and type of chips used, when it was first operational, reported performance, owner, and location. A detailed description of our methods can be found in Appendix A of our paper.

Coverage

We estimate that we cover 10-20% of AI supercomputers by computing power as of early 2025.

This includes roughly 20-37% of NVIDIA H100s, 12% of A100s, and 18% of AMD MI300Xs. Meanwhile, we estimate we cover less than 4% of Google’s TPUs and very few custom AI chips designed by AWS, Microsoft, or Meta. We also only cover about 2% of NVIDIA chips designed to be sold in China (including the A800, H800, and H20).
The coverage of different companies varies considerably, from 43% for Meta and 20% for Microsoft to 10% for AWS and 0% for Apple. The coverage of Chinese companies is particularly poor. Our average coverage of 8 major companies is 15%.
From the end of 2020 to the end of 2024 we cover between 10-20% of total Chinese 16-bit FLOP/s based on an estimate by IDC (2025).

For more information, see Appendix C.1.1 of our paper.

Records

An entry in this database contains the details of an AI supercomputer at a point in time, typically the first date it became operational.

Many AI supercomputers are upgraded or expanded over time with newer hardware. In these cases, we create a new record, following the procedure below, and reflecting the date when the upgrade was completed.

Upgrades to supercomputers

If an existing AI supercomputer is upgraded in a way that substantially changes it, we count this as a new AI supercomputer and create a new entry. We do this when either of the following criteria applies (see the sketch at the end of this subsection):

  • The AI supercomputer increases performance by more than 20%
  • The majority of the AI chips used in the supercomputer are changed

We mark that the later supercomputer builds on the former by linking them in the “Builds Upon” and “Superseded by” fields. (See the table of fields below for further details.)
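As a rough sketch of this rule (a hypothetical helper, not the dataset’s internal tooling; the inputs are illustrative):

```python
# Hypothetical helper expressing the "new entry on upgrade" rule above.

def upgrade_creates_new_entry(old_flops: float, new_flops: float,
                              majority_of_chips_changed: bool) -> bool:
    performance_increase = (new_flops - old_flops) / old_flops
    return performance_increase > 0.20 or majority_of_chips_changed


# Example: a 30% performance increase warrants a new, linked entry.
print(upgrade_creates_new_entry(1.0e19, 1.3e19, False))  # True
```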

Chinese data

We anonymized the data of Chinese systems by concealing names and rounding values of numerical fields to one significant figure. We do this to protect our public data sources and reduce the risk of owners redacting relevant information. We took this step in response to reports about reduced Chinese openness triggered by increased coverage in American think tank reports. We may grant some trusted researchers access to the full dataset upon request. Inquiries should be directed to data@epoch.ai.
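For example, rounding a numerical field to one significant figure (as applied to Chinese systems) can be sketched as follows. This is an illustration of the rounding rule, not the anonymization code itself:

```python
import math

def round_to_one_significant_figure(value: float) -> float:
    """Round a numerical field to one significant figure."""
    if value == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(value)))
    return float(round(value, -magnitude))


print(round_to_one_significant_figure(37632))  # 40000.0
```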

Fields

Each field is listed below with its name and type, followed by its definition, an example value from Oak Ridge NL Frontier, and its coverage (the share and count of the 747 records with a value for that field).
Name Text

The owner’s name, followed by the cluster name used in the most official announcement.
If the name or the owner’s name is unknown, then other identifying information may be used to distinguish the supercomputer (such as location, year, chip type, an index, etc).
Supercomputers that have several phases are denoted by “Phase 1”, “Phase 2”, etc.

Oak Ridge NL Frontier 100%
747 out of 747 models
Status Categorical (single select)

Existing: The supercomputer (as described in this entry) is believed to be operational (>80% of the GPUs available for training).
Planned: The supercomputer is not yet operational, and this entry describes plans for when it will be operational.
Decommissioned: The supercomputer used to exist, but has since been decommissioned.

Existing 100%
747 out of 747 models
Certainty Categorical (single select)

How likely is it that this cluster exists (or will exist) roughly as specified?
Confirmed: An official announcement that seems highly trustworthy, or expert confirmation. Corresponds to >90% certainty.
Likely: Cluster most likely exists or will exist roughly in the specified form. 50-90% certainty.
Unlikely: Cluster probably doesn’t or won’t exist roughly in the specified form. 10-50% certainty.
Confirmed false: Cluster does not or will not exist roughly in the specified form. Used, for example, if an expert or media report revealed that earlier announcements were wrong. (90% or more likely the cluster does not exist in roughly the specified form.)

Confirmed 100%
747 out of 747 models
Single cluster? Categorical (single select)

Yes: The source clearly implies a single supercomputer deployed on a continuous campus without significant physical separation, connected by high-bandwidth networking fabric, forming a single system. There is no evidence otherwise.
Unclear: No compelling evidence either way.
No: The cluster the source refers to is, in fact, several distinct supercomputers.

Yes 100%
745 out of 747 models
Chip type (primary) Categorical (single select)

The primary AI chip used in the supercomputer. This links to a specific chip in the Epoch ML Hardware Database.

AMD Radeon Instinct MI250X 61%
452 out of 747 models
Chip quantity (primary) Numerical

Total number of the primary AI chips in the supercomputer.
A GB200 is counted as two AI chips for this dataset.

37,632 84%
627 out of 747 models
Hardware note Text

Note describing all the AI chips in the supercomputer.

AMD MI250X 62%
462 out of 747 models
Max OP/s (log) Numerical

The base-10 logarithm of the maximum theoretical operations per second the supercomputer can achieve (the highest of its 32-, 16-, or 8-bit performance).

19.158756074028645 93%
693 out of 747 models
16-bit OP/s (log) Numerical

The base-10 logarithm of the maximum theoretical performance of the supercomputer in 16-bit FLOP/s.

19.158756074028645 90%
673 out of 747 models
H100 equivalents Numerical

The supercomputer’s Max OP/s field (in 32, 16, or 8 bit) divided by an H100’s FP8 FLOP/s. Note that H100 equivalents is not a well-defined measurement, and this number should be treated as a rough indication rather than a precise figure. It works fairly well for comparisons among supercomputers since ~2021, but makes less sense for older supercomputers.

7,282 93%
693 out of 747 models
First Operational Date Date

When the supercomputer was first fully operational (i.e., you could run a workload on at least 80% of the cluster). This will often be an approximation to the nearest month or so, since the exact date a supercomputer became operational is rarely announced. By default, we are conservative with this date, and set it to the first date that we have confirmation that it was operational, which will often be a few months after it was actually first operational.
This data field is of date type, and is automatically calculated based on the text in the “First operational note” field.

2022-05-30 78%
580 out of 747 models
Country Categorical (single select)

Country in which the supercomputer is physically located.

United States of America 89%
662 out of 747 models
Owner Categorical (multiple select)

The entity that owns the AI supercomputer itself (the hardware within the datacenter). This can include several entities if it is a joint venture. It can sometimes differ from the entity that owns the datacenter itself, or the entity that has a long-term arrangement to rent the supercomputer.

US Department of Energy 62%
465 out of 747 models
Sector Categorical (single select)

Private: Owned by a commercial entity.
Public: Owned by a government or university.
Public/Private: A collaboration between a private company and a government or university, where actors from the public and private sectors have each paid for at least 25% of the supercomputer.

Public 98%
731 out of 747 models
Power Capacity (MW) Numerical

The peak power capacity of the system, in megawatts. Will be the reported power capacity if available, and the calculated power capacity if not.

40.0 83%
618 out of 747 models
Hardware Cost Numerical

The cost to acquire the hardware for the system in 2025 US dollars. Includes the cost for the GPUs, CPUs, networking, etc, but not the datacenter itself. Will be the reported hardware cost if it exists, and the calculated hardware cost otherwise.

$620,445,966 72%
537 out of 747 models
Energy Efficiency Numerical

Log of the 16-bit FLOP/s per watt. The numerator is the theoretical max 16-bit FLOP/s of the system, and the denominator is the peak power capacity of the system.

360326400000.0 79%
593 out of 747 models
Rank when first operational Numerical

This supercomputer’s rank in the world by performance (FLOP/s, maximum of 32-, 16-, or 8-bit precision) on the day it first became operational.

2 72%
540 out of 747 models
Location Text

Specific location where the supercomputer is physically located.

Oak Ridge National Laboratory 5200, 1 Bethel Valley Rd, Oak Ridge, TN 37830 76%
571 out of 747 models
Users Categorical (multiple select)

Any significant entities known to use the supercomputer. “Cloud” refers to the supercomputer being offered publicly via the cloud. Will often just be the owner.

US Government, Academia, Industry 59%
438 out of 747 models
Quote Text

A quote from the source specifying the number and type of chips, and/or the reported or manually calculated FLOP/s number, if applicable.

The system has 74 Olympus rack HPE cabinets, each with 128 AMD compute nodes, and a total of 9,408 AMD compute nodes.... Each Frontier compute node consists of [1x] 64-core AMD “Optimized 3rd Gen EPYC” CPU (with 2 hardware threads per physical core) with access to 512 GB of DDR4 memory. Each node also contains [4x] AMD MI250X 62%
465 out of 747 models
Note Text

Notes that give additional context or helpful information about the supercomputer.

(9408 nodes)x(4 GPUs/node) = 37,632 GPUs 27%
202 out of 747 models
First Operational Date Note Text

When the supercomputer was first fully operational (i.e., you could run a workload on at least 80% of the cluster). This will often be an approximation to the nearest month or so, since the exact date a supercomputer became operational is rarely announced. By default, we are conservative with this date, and set it to the first date that we have confirmation that it was operational, which will often be a few months after it was actually first operational.
Will be “Planned” or “Planned for [date]” if the supercomputer is under construction or planned.

2022-05-30 66%
490 out of 747 models
Certainty Note Text

A brief explanation of why the given certainty level was selected.

Details released on official government website 16%
122 out of 747 models
Builds Upon Categorical (single select)

If a supercomputer was built in multiple phases, this links to the supercomputer corresponding to the preceding phase.

[empty] 7%
56 out of 747 models
Superseded by Categorical (single select)

If a supercomputer has a later phase (e.g., was upgraded later in time), and that phase is already operational, this links to the supercomputer entry corresponding to that phase.

[empty] 4%
30 out of 747 models
Possible duplicate Boolean

True if we think there is a >20% chance that this supercomputer is a duplicate of a supercomputer that already exists in this database. In circumstances where we think two entries might correspond to the same supercomputer, we mark this field False for the entry for which we have more information, and True for the entry for which we have less information (this allows the user to filter out all entries with “Possible duplicate”=True for analysis).

False 13%
94 out of 747 models
Possible Duplicate Of Categorical (multiple select)

If “Possible duplicate” is marked True, this will link to the entry that we think this supercomputer is a duplicate of

[empty] 7%
55 out of 747 models
Chip type (secondary) Categorical (single select)

The secondary AI chip used in the supercomputer, if any. This links to a specific chip in the Epoch ML Hardware Database.

[empty]

Frontier only used one AI chip type.
4%
27 out of 747 models
Chip quantity (secondary) Numerical

Total number of the secondary AI chips in the supercomputer, if there are any.
A GB200 is counted as two AI chips for this dataset.

[empty]

Frontier only used one AI chip type.
4%
29 out of 747 models
Total number of AI chips Numerical

The number of AI chips in the supercomputer. Maximum of {Primary AI chip quantity + Secondary AI chip quantity; Manual total chips number; number of chips calculated from FLOP/s}.

37,632 85%
637 out of 747 models
GPU supplier (primary) Categorical (single select)

The GPU supplier for the Primary AI Chip type

AMD 100%
747 out of 747 models
GPU supplier (secondary) Categorical (single select)

The GPU supplier for the Secondary AI Chip type (if applicable)

[empty]

Frontier only used one AI chip type.
38%
283 out of 747 models
Exclude Boolean

True if the entry should be excluded from analyses. This could be because it falls below the inclusion threshold, or there is another reason for excluding it. False otherwise.

False 0%
0 out of 747 models
Include in Standard Analysis Boolean

True if this entry qualified to be used in the analyses for the original paper. That is, (Status=“Existing” or “Decommissioned”) AND (Exclude=False) AND (Certainty=“Confirmed” or “Likely”) AND (Possible duplicate=False) AND (Single cluster?=“Yes”).
False otherwise.

True 63%
468 out of 747 models
Max OP/s Numerical

The maximum theoretical operations per second the supercomputer can achieve (the highest of its 32-, 16-, or 8-bit performance).

1.4e19 93%
693 out of 747 models
8-bit OP/s Numerical

The maximum theoretical operations per second the supercomputer can achieve in 8-bit numerical precision.

1.4e19 52%
387 out of 747 models
16-bit OP/s Numerical

The maximum theoretical operations per second the supercomputer can achieve in 16-bit numerical precision.

1.4e19 90%
673 out of 747 models
32-bit OP/s Numerical

The maximum theoretical operations per second the supercomputer can achieve in 32-bit numerical precision.

3.6e18 74%
550 out of 747 models
Calculated Power Capacity (MW) Numerical

The calculated power capacity of the supercomputer in megawatts, based on the number and type of chips. See the paper for the full methodology.

40.0667904 80%
596 out of 747 models
Reported power capacity (MW) Text

Reported power capacity of the supercomputer in megawatts

40.0 17%
124 out of 747 models
Calculated Cost Numerical

The calculated cost (in 2025 US dollars), based on the number and type of chips. See the paper for the full methodology.

$967,488,702 72%
536 out of 747 models
Reported Cost Numerical

Reported total cost of the supercomputer in US dollars at the time.

$600,000,000 4%
33 out of 747 models
Reported Cost (Inflation adjusted) Numerical

Reported total cost of the supercomputer, adjusted to 2025 US dollars

$620,445,966 4%
30 out of 747 models
Cost Quote Text

Quote from the source saying how much the supercomputer cost. Might refer to a currency besides dollars, or not yet be inflation-adjusted.

$600M 5%
34 out of 747 models
Largest existing cluster when first operational Text

The name of the largest known supercomputer existing on the first operational date of this supercomputer.

Microsoft GPT-4 cluster 72%
540 out of 747 models
% of largest cluster when first operational Numerical

The maximum theoretical FLOP/s (32, 16, or 8 bit) for this supercomputer, divided by the maximum theoretical FLOP/s for the largest known supercomputer existing at the time.

0.9239138461538378 72%
540 out of 747 models
Source 1 (through Source n) URL

URL for a source on the AI supercomputer. Generally saved as a Wayback Machine link to preserve the content.

https://web.archive.org/web/20240720224959/https://docs.olcf.ornl.gov/systems/frontier_user_guide.html 65%
487 out of 747 models

Estimation

Some fields in the dataset can require estimation, because they are often not straightforwardly reported within papers or other sources. Here, we describe our estimation procedure for power requirements, hardware costs, and metadata on the confidence of our estimates.

Estimating power requirements

We calculated the peak power demand for each AI supercomputer with the following formula:

Power = Chip TDP × number of chips × system overhead × PUE

We collected Thermal Design Power (TDP) for most chips when publicly available, though we did not find the TDP for some Chinese chips and custom silicon such as Google’s TPU v5p. We considered both primary and secondary chips when counting the number and types of chips. We used a 1.82× multiplier for non-GPU hardware to account for system overhead (additional power needed for other server components like CPUs, network switches, and storage), based on NVIDIA DGX H100 server specifications (NVIDIA, 2025). We also factored in Power Usage Effectiveness (PUE), which is the ratio of total data center power use to IT power use (with a minimum value of 1). According to the 2024 United States Data Center Energy Usage Report (Shehabi et al., 2024), specialized AI datacenter facilities had an average PUE of 1.14 in 2023, which is 0.29 lower than the overall national average of 1.43. We adjusted the trend for all datacenter facilities to estimate the average PUE of AI datacenters by subtracting 0.29 from the overall values reported by Shehabi et al. (Table 2).
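A minimal sketch of this power formula in code (the 700 W H100 TDP and the chip count in the example are illustrative assumptions; see Appendix B.4 for the actual methodology):

```python
# Sketch of the peak-power formula above:
#   Power = chip TDP x number of chips x system overhead x PUE.

SYSTEM_OVERHEAD = 1.82    # non-GPU server components, per DGX H100 specs
AI_DATACENTER_PUE = 1.14  # average PUE of specialized AI facilities in 2023


def estimated_peak_power_mw(chip_tdp_watts: float, num_chips: int,
                            overhead: float = SYSTEM_OVERHEAD,
                            pue: float = AI_DATACENTER_PUE) -> float:
    watts = chip_tdp_watts * num_chips * overhead * pue
    return watts / 1e6  # convert watts to megawatts


# Hypothetical example: 16,384 H100s at roughly 700 W TDP each
print(estimated_peak_power_mw(700, 16_384))  # about 23.8 MW
```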

For more information, see Appendix B.4 in our paper.

Estimating hardware cost

We use the publicly reported total hardware cost of the AI supercomputer in our analysis whenever it is available. When it is unavailable, we estimate this cost based on the chip type, quantity, and public chip prices. For each type of chip, we multiply the cost per chip by the number of chips, multiply by factors for intra-server and inter-server overhead, and then sum these costs if there are multiple types of chips. Then, we adjust for the cost of server-to-server networking equipment, which was estimated to be 19% of final hardware acquisition costs. Finally, we apply a discount factor of 15% to the final hardware cost of the AI supercomputer to account for large purchasers of AI chips often negotiating a discount on their order.

Our final formula for estimating hardware cost is as follows:

Hardware acquisition cost = [(Primary AI chip cost × Primary AI chip quantity) + (Secondary AI chip cost × Secondary AI chip quantity)] × Intra-server overhead × Inter-server overhead × Discount factor

In this formula, our intra-server overhead, or “chip-to-server” factor, is 1.64×, our inter-server overhead, or “server-to-cluster” factor, is 1.23×, and our discount factor is 0.85×.
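A sketch of this cost formula in code (the per-chip price in the example is a hypothetical placeholder, not a figure from the dataset; see Appendix B.5 for the actual methodology):

```python
# Sketch of the hardware-cost formula above.

INTRA_SERVER_OVERHEAD = 1.64  # chip-to-server factor
INTER_SERVER_OVERHEAD = 1.23  # server-to-cluster (networking) factor
DISCOUNT_FACTOR = 0.85        # bulk-purchase discount


def estimated_hardware_cost(primary_chip_cost: float, primary_chip_qty: int,
                            secondary_chip_cost: float = 0.0,
                            secondary_chip_qty: int = 0) -> float:
    chip_cost = (primary_chip_cost * primary_chip_qty
                 + secondary_chip_cost * secondary_chip_qty)
    return (chip_cost * INTRA_SERVER_OVERHEAD
            * INTER_SERVER_OVERHEAD * DISCOUNT_FACTOR)


# Hypothetical example: 10,000 chips at $30,000 each -> about $514 million
print(f"${estimated_hardware_cost(30_000, 10_000):,.0f}")
```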

For more information, see Appendix B.5 in our paper.

Estimating certainty

In many cases, we don’t have all of the exact details for the system, and so have to make an estimate based on the information we have. When we do so, we explain our reasoning in the “Notes” field, and lower the “Certainty” field as warranted.

For example, when we know the power capacity of the system but not the chip type, we sometimes estimate the FLOP/s from the power capacity, using the FLOP/s per watt of the chip(s) the system is most likely using. For systems several years in the future, where we don’t know the specifications of the chips that will exist by then, we assume that the 1.26×/year trend in FLOP/s/W improvement (Appendix D.1 of our paper) will continue.
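As a rough sketch of this kind of estimate (the baseline efficiency value is a made-up illustrative number, not a dataset value, and this is not the actual estimation code):

```python
# Sketch of estimating FLOP/s from power capacity and projecting
# FLOP/s-per-watt forward at 1.26x per year.

EFFICIENCY_GROWTH_PER_YEAR = 1.26


def projected_flops_per_watt(baseline_flops_per_watt: float,
                             baseline_year: int, target_year: int) -> float:
    years = target_year - baseline_year
    return baseline_flops_per_watt * EFFICIENCY_GROWTH_PER_YEAR ** years


def estimated_flops_from_power(power_watts: float, flops_per_watt: float) -> float:
    return power_watts * flops_per_watt


# Hypothetical 100 MW system planned for 2027, assuming 1e12 FLOP/s per watt in 2024
fpw_2027 = projected_flops_per_watt(1e12, 2024, 2027)
print(estimated_flops_from_power(100e6, fpw_2027))  # roughly 2e20 FLOP/s
```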

Downloads

We recommend downloading the processed dataset, which filters out clusters not analyzed in our study. This excludes clusters based on low-certainty rumors, clusters not known to be contiguous within a single location, and potential duplicates.

The raw dataset is also available but not recommended.
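If you do work with the raw file, here is a rough sketch of reproducing the filter described in the “Include in Standard Analysis” field. The file name, column names, and value encodings are assumptions and should be checked against the actual CSV header:

```python
import pandas as pd

# Sketch of the standard-analysis filter applied to the raw dataset.
raw = pd.read_csv("ai_supercomputers_raw.csv")

standard = raw[
    raw["Status"].isin(["Existing", "Decommissioned"])
    & (raw["Exclude"] == False)
    & raw["Certainty"].isin(["Confirmed", "Likely"])
    & (raw["Possible duplicate"] == False)
    & (raw["Single cluster?"] == "Yes")
]
print(len(standard), "entries qualify for standard analysis")
```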

Processed Dataset

CSV, Updated July 16, 2025

Acknowledgements

This data was collected by Epoch AI’s employees and collaborators, including Konstantin Pilz, Angelina Li, Rose Hadshar, Ben Cottier, Robi Rahman, James Sanders, Joanne Fang, Veronika Blablová, Lovis Heindrich, and David Atanasov.

This documentation was written by Konstantin Pilz, James Sanders, and Robi Rahman. Material on estimating supercomputer cost was adapted from previous work by Ben Cottier and Robi Rahman. Material on estimating hardware power consumption was adapted from previous work by Luke Emberson and Robi Rahman.