Explore the methods behind our analysis
Understand how satellite imagery, permits, and public disclosures are used to inform power and performance estimates.
Our open database of large AI data centers, using satellite and permit data to track compute, power use, and construction timelines.
Epoch’s Frontier Data Centers Hub is an independent database tracking the construction timelines of major US AI data centers through high-resolution satellite imagery, permits, and public documents.
We define an AI data center as a collection of one or more buildings, located near each other, that run hardware specialized for AI, such as GPUs or custom chips like Google’s TPUs. These data centers may be used to experiment on, train, and deploy AI models.
We don’t use a hard limit for how close together the data center buildings have to be, but a rule of thumb is less than 10 km apart. The buildings either need to be networked together, or have a shared owner or user.
We do not distinguish between campuses of multiple buildings and individual buildings in what we classify as a data center.
We aimed for initial coverage of two to three of the largest data centers for each frontier AI lab in the United States, namely Anthropic, Google DeepMind, OpenAI, Meta, and xAI.
Based on our prior work, we believe most, but not all, of the largest data centers are in the United States. Additionally, focusing on one country allowed us to become familiar with permitting standards, providing an additional source for us to validate our methodology. We will continue expanding coverage to other countries.
We collect general information about each data center, such as the location, owner and users. We also collect satellite or aerial images, and track key metrics over time as each data center evolves. Every data center we track has a timeline of total power capacity, compute capacity, and capital cost. We will increasingly cover other metrics such as building area and water use. See our methodology for details on how we obtain this information.
The database is currently a selection of the largest existing or planned data centers globally, most of which are in the US. Our database covers an estimated 15% of AI compute that has been delivered by chip manufacturers globally as of November 2025. We are expanding this coverage in the US and other countries.
The estimated 15% coverage is based on the following reasoning. Nvidia recently disclosed that they’ve shipped 4M Hopper GPUs and 6M Blackwell GPUs as of October 2025, excluding China. The “6M Blackwell GPUs” figure counts dies; since each Blackwell package contains two dies, it corresponds to 3M Blackwell GPUs as traditionally understood. This is the compute equivalent of ~7.5M H100e, for a total of 11.5M Nvidia H100e, plus a smaller stock of older chips such as A100s. We provisionally estimate, with higher uncertainty, that TPU, AMD, and Amazon Trainium chips are around 30–40% of the Nvidia total, increasing the total stock to 15–17M H100e. Since Nvidia certainly makes up the large majority of the total AI compute stock, uncertainty here does not dramatically affect the total. This means that the 2.5M H100e of operational capacity that we track in this data hub make up 15–17% of the shipped total. We will update these estimates in the coming months.
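The arithmetic above can be reproduced directly. The chip counts, H100-equivalence factor, and the 15–17M total are the figures cited in the text, not independent data:

```python
# Reproduce the ~15-17% coverage estimate from the figures cited above.

hopper_gpus = 4.0e6               # Hopper GPUs shipped (per Nvidia, excl. China)
blackwell_dies = 6.0e6            # Nvidia counts Blackwell dies, two per GPU
blackwell_gpus = blackwell_dies / 2   # 3M GPUs as traditionally understood

H100E_PER_BLACKWELL = 2.5         # implied by "~7.5M H100e from 3M Blackwell GPUs"
nvidia_h100e = hopper_gpus + blackwell_gpus * H100E_PER_BLACKWELL   # 11.5M

# Adding non-Nvidia chips (TPU, AMD, Trainium) brings the total to 15-17M H100e.
total_low, total_high = 15.0e6, 17.0e6

tracked = 2.5e6                   # operational H100e tracked in this hub
print(f"coverage: {tracked / total_high:.0%} to {tracked / total_low:.0%}")
```

Running this prints a coverage range of 15% to 17%, matching the figure quoted above.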
The GPU Clusters database has broad historical coverage of computing clusters used for AI and other applications. These clusters may make up just part of the total compute present in one building or campus. In contrast, this database looks at AI data centers at a project level, with a focus on the largest current and upcoming data centers. It also uses primary sources much more, including permitting and satellite imagery, to get greater detail and accuracy on individual data centers. Both databases will be maintained going forward.
Power is estimated for each data center based on the available evidence, which may include permitting documents, cooling equipment, and statements from companies. Compute estimates are based on the specific type and quantity of chips if reported. Otherwise, compute is calculated from power, based on the energy efficiency of chips that we believe are most likely to be used. Cost estimates are entirely calculated from power, based on a general cost-per-watt model. For details, see the methodology.
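A minimal sketch of how a power-to-compute-to-cost conversion might work is shown below. All constants here (the PUE, the per-chip wattage, the FP8 throughput, and the cost per watt) are illustrative assumptions for the sketch, not Epoch’s actual model parameters:

```python
# Illustrative power -> compute -> cost pipeline, in the spirit of the
# methodology above. Constants are assumptions, not Epoch's actual model.

def estimate_from_power(total_power_mw: float) -> dict:
    PUE = 1.3                # assumed facility overhead (cooling, power losses)
    WATTS_PER_H100E = 1500.0 # assumed all-in server watts per H100-equivalent
    OP8_PER_H100E = 2.0e15   # ~2e15 8-bit OP/s peak per H100e (dense)
    COST_PER_WATT = 10.0     # assumed capital cost in $ per watt of capacity

    it_power_w = total_power_mw * 1e6 / PUE     # power reaching the servers
    n_chips = it_power_w / WATTS_PER_H100E      # implied H100-equivalent count
    return {
        "h100e": n_chips,
        "compute_op_s": n_chips * OP8_PER_H100E,
        "cost_usd": total_power_mw * 1e6 * COST_PER_WATT,
    }

# A hypothetical 300 MW facility: ~154k H100e, ~3.1e20 8-bit OP/s, ~$3B.
est = estimate_from_power(300.0)
```

In practice the observed evidence (chip counts, permits, cooling equipment) would override parts of this chain, as the methodology describes.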
We model our uncertainty both in terms of quantities for a given state of a data center, and the timing of that state. For the quantities, we are 80% confident that any given power capacity is accurate within a factor of 1.4. That is, we expect that 80% of our estimates will lie between 70% and 140% of the actual value (or the planned value, if it occurs in the future). For compute capacity, this factor of uncertainty increases to 1.5, and for cost, the factor of uncertainty is 1.6. As for the timing, we are 80% confident that our estimate is within 6 months of the actual state.
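One way to read these factors is as a two-sided 80% interval around the true value. The sketch below computes the interval a given factor implies (this framing is our illustration of the stated numbers, not necessarily Epoch’s internal error model):

```python
# Interpret "accurate within a factor of f at 80% confidence" as a
# two-sided interval [estimate / f, estimate * f] around the true value.

def interval(estimate: float, factor: float) -> tuple[float, float]:
    return estimate / factor, estimate * factor

# Power capacity uses a factor of 1.4; for a hypothetical 100 MW estimate,
# the true value should lie between ~71 MW and 140 MW with 80% confidence.
low, high = interval(100.0, 1.4)
print(f"80% interval for a 100 MW estimate: {low:.0f}-{high:.0f} MW")
```

Compute capacity (factor 1.5) and cost (factor 1.6) widen the interval accordingly.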
These confidence levels are informed by data on how our estimates differ from the most credible reference values, and modeling on top of that data. We expect our estimates to become more accurate as we track more data centers and do more research on how they work.
By default, if we list a user, owner, or other affiliate of a data center, we have strong evidence for it. However, sometimes we are uncertain about these affiliations, especially the data center users. We indicate this uncertainty with “Speculative” and “Likely” tags.
“Speculative” means that we have no record of an affiliation, but we have some reason to believe it. For example, we speculate that Anthropic will use two Amazon data centers in Mississippi because the New York Times reported that at least one of the Anthropic-Amazon Project Rainier data centers is located there, but they didn’t specify which ones.
“Likely” means we have some record of an affiliation, but we aren’t confident. For example, OpenAI is a “Likely” user for Microsoft Fairwater because a Microsoft spokesperson stated it “initially will be used to train OpenAI models”, but Microsoft’s partnership with OpenAI has been weakening since 2023.
Using an entire data center for a single training run is possible, but does not always happen in practice. The total capacity of a data center is more like a cap on the size of a training run in that data center. Data centers often run multiple jobs in parallel. Even if the entire data center is deployed on a single job, hardware failures will slightly reduce the capacity. Relatedly, when we report the total compute capacity of a data center in 8-bit OP/s, this is the theoretical peak capacity for number formats of 8 bits or above. The computational performance in practice is typically about one third of that, due to inefficiencies.
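The gap between theoretical peak and realized throughput can be expressed as a utilization factor. The one-third figure below comes from the text; the capacity value is a made-up example:

```python
# Convert a theoretical peak 8-bit capacity to a rough realized throughput,
# using the ~1/3 utilization figure from the text.

peak_op_s = 1.0e21        # hypothetical data center peak, 8-bit OP/s
UTILIZATION = 1 / 3       # typical realized fraction of peak, per the text

realized = peak_op_s * UTILIZATION
print(f"realized throughput: {realized:.1e} OP/s")
```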
Epoch AI’s data is free to use, distribute, and reproduce under the Creative Commons Attribution license, provided the source and authors are credited.