The dataset focuses on GPU clusters, which contain large quantities of processors known as accelerators and are deployed on a contiguous campus without significant physical separation. GPU clusters are sometimes configured as AI supercomputers, in which the accelerators are linked with high-bandwidth networking hardware.
A cluster should be included in this database if both of the following criteria are satisfied:
The system contains chips that can accelerate AI workloads, if used with the right infrastructure and software. These include NVIDIA’s V100, A100, H100, and GB200 GPUs, Google’s TPUs, and other chips commonly used to train frontier AI models.
The system has high theoretical performance relative to other GPU clusters at the time it was built. To train state-of-the-art machine learning models, developers typically need access to more compute than was used for previous models. Hardware also improves rapidly, following an exponential trend over time, so we use a dynamic threshold during the study period. For inclusion, the theoretical performance of the system must be at least 1% of that of the largest known GPU cluster that existed on the date when the system first became operational.
Outside of the study period, we apply an inclusion threshold that does not scale over time:
We also include planned clusters if we anticipate that they will meet the above requirements once they become operational. These are indicated with a value of “Planned” in the Status field, and can be shown or hidden in the visualization.
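As a minimal sketch, the dynamic 1% criterion described above could be checked along the following lines. The `Cluster` record, its field names, and the `meets_dynamic_threshold` function are hypothetical and are not part of the dataset or its tooling. The sketch also assumes that a cluster "existed" on a given date if it first became operational on or before that date, and it does not model decommissioned clusters or the fixed threshold used outside the study period.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass(frozen=True)
class Cluster:
    """Hypothetical record; field names are illustrative, not the dataset schema."""
    name: str
    first_operational: date
    peak_flops: float  # theoretical performance, in any one consistent unit


def meets_dynamic_threshold(
    candidate: Cluster,
    known_clusters: Iterable[Cluster],
    fraction: float = 0.01,  # the 1% threshold stated in the text
) -> bool:
    """Check the candidate against the largest cluster known at its start date."""
    # Assumption: a cluster "existed" on a date if it was first operational
    # on or before that date.
    existing = [
        c.peak_flops
        for c in known_clusters
        if c.first_operational <= candidate.first_operational
    ]
    if not existing:
        # No earlier cluster is known, so there is nothing to compare against.
        return True
    return candidate.peak_flops >= fraction * max(existing)
```

For instance, with a made-up reference cluster of 1e18 FLOP/s that became operational in 2022, a 2023 candidate with 2e16 FLOP/s would pass the check, while one with 5e15 FLOP/s would not.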
We collect and maintain data on GPU clusters from a variety of sources, including machine learning papers, publicly available news articles, press releases, and existing lists of supercomputers.
We created a list of potential clusters by using the Google Search API to search for key terms like “AI supercomputer” and “GPU cluster” from 2019 to 2025, then used GPT-4o to extract any clusters or supercomputers mentioned in the resulting articles. We also added supercomputers from publicly available lists such as Top500 and MLPerf, and from GPU rental marketplaces. For each potential cluster, we manually searched for public information such as the number and type of chips used, the date it first became operational, reported performance, owner, and location. A detailed description of our methods can be found in Appendix A of our paper.