AI Chip Sales Documentation

Overview

Epoch’s AI Chip Sales dataset tracks aggregate AI accelerator shipments and total compute capacity across major chip designers and manufacturers.

The data is available on our website as a visualization or table, and can be downloaded as a CSV file, updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.

If you would like to ask any questions about the data, or suggest companies that should be added, feel free to contact us at data@epoch.ai.

Use This Work

Epoch’s data is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons Attribution license.

Citation

Josh You, Venkat Somala, Yafah Edelman, ‘Data on AI Chip Sales’. Published online at epoch.ai. Retrieved from: ‘https://epoch.ai/data/ai-chip-sales’ [online resource], accessed 

BibTeX citation

@misc{EpochAIChipSales2025,
  title = {Data on AI Chip Sales},
  author = {Josh You and Venkat Somala and Yafah Edelman},
  year = {2025},
  month = {01},
  url = {https://epoch.ai/data/ai-chip-sales},
  note = {Accessed: }
}

Methodology

Summary

This dataset currently tracks estimates of total sales or shipments of the flagship AI chips designed or sold by Amazon (Trainium/Inferentia), AMD, Huawei, Nvidia, and Google (TPU).

For most chip types and time periods, exact chip sales are not disclosed by the relevant companies, though Nvidia has provided the most informative direct disclosures on total Hopper and Blackwell sales. All of our figures are estimates grounded in data and research, though often with significant uncertainty.

Below, we break down specific methodologies and assumptions for each chip designer. In short:

  • For AMD and Nvidia, we use revenue-based models that start from revenue data from earnings disclosures and commentary. We allocate this to different products using earnings commentary, and estimate average selling prices based on media reporting and analyst estimates.
  • For Google’s TPU, we estimate Google’s TPU spending based on Broadcom earnings commentary and media reporting, and estimate average price paid per chip using a bill-of-materials production cost model, combined with estimates of Broadcom’s profit margin on TPUs.
  • For Amazon, we anchor on estimates of large-scale Trainium data centers, supplemented by estimates and information from analysts.
  • For Huawei, we directly synthesize volume estimates from third-party analysts.

Amazon

Amazon has ramped up deployment of its custom Trainium and Inferentia AI chips in recent years. It launched the latest model, Trainium3, in December 2025, though it is unclear whether volume shipments have begun.

In short, Amazon had probably deployed 2-3 million Trainium2 chips by the end of 2025, along with hundreds of thousands of older Trainium1 and Inferentia chips, with Anthropic as the dominant customer. Here, we explain how we estimate these volumes. We focus on 2024-2025 due to a lack of evidence about previous years and presumed low volumes before 2024.

Summary of evidence

Our estimates of Trainium volumes are primarily based on two stylized facts:

  1. Anthropic is by far the largest user of Amazon Trainium chips, and Amazon/Anthropic have built two frontier-scale data centers totaling an estimated 1.3M Trainium2 chips as of January 2026.
  2. These Anthropic deployments are a very large chunk, perhaps about 50%, of the total Trainium2 stocks deployed to date.

We also have a separate estimate from Omdia of 1.3 million Trainium/Inferentia chips deployed in 2024, without distinguishing between chip types. And SemiAnalysis, an independent semiconductor research firm, has produced estimates of relative Trainium growth over time, though they don’t publicly disclose their estimates of absolute Trainium volumes.

Evidence from Anthropic Amazon data centers

Epoch's Frontier AI Data Centers dataset tracks the scale of two large operational Amazon–Anthropic data centers, both part of "Project Rainier". These numbers come from a mix of satellite data analysis and statements from Amazon. The larger one is located in New Carlisle, Indiana, and the other in Canton/Madison, Mississippi. Both data centers probably exclusively use Trainium2, though this is only confirmed for New Carlisle.

As of June 2025, New Carlisle and Canton together were already at 750k Trainium2 units. By January 9, 2026, New Carlisle will have 1M Trainium2 online (corroborated by Amazon's claim of 1M online by the end of 2025), for a total of ~1.3 million Trainium2 between the two campuses. While it is not confirmed as of writing (January 7, 2026) that 1M Trainium2 are actually operational in New Carlisle, Amazon has very likely procured this many chips intended for that data center, given the time delay between acquiring chips and installing them in a data center. The total could also be an underestimate because the Canton figure dates to June 2025 and updates are lumpy.


Note: the y-axis of this graph is in H100e. 1 H100e ~= 1.5 Trainium2.

Inferring total Trainium volumes from Anthropic data centers

Given that Amazon and Anthropic have deployed ~1.3 million Trainium2 in two large data center campuses, New Carlisle and Canton, what sort of bounds can we place on the total stock of Trainium?

Lower-bounding the total stock

According to SemiAnalysis (as well as common sense), New Carlisle and Canton don't contain all of Anthropic's Trainium chips, much less all Trainium(2) in the world. They estimate that Anthropic's Trainium capacity in Q1 2025, which predates these two campuses, was roughly 15% of its Q4 2025 capacity.

We also know, from research firm Omdia, that Amazon deployed over 1 million of its custom AI chips in 2024, predating these two data centers. These may have been mostly older Trainium1 and Inferentia chips that are around 25% as powerful as Trainium2, but this would still be the equivalent of >200k Trainium2. And Amazon was using 80,000 Trainium1 chips to run a single product (Rufus, the chatbot on Amazon.com) in summer 2024.

Upper-bounding the stock

These Anthropic data centers are probably a large share of total Trainium volumes:

  • Amazon Q3 2025 earnings reported that Trainium2 revenue (meaning cloud revenue from renting out Trainium) “grew 150% quarter over quarter”, coinciding with Project Rainier coming online around June 2025. This suggests that these two campuses are over half of the Trainium cloud fleet. Since New Carlisle doubled in H2 2025, Rainier’s share may have grown further since Q3.
  • SemiAnalysis claims that Anthropic is the only large external Trainium2 user, and “materially larger than Amazon’s internal needs” for services like Bedrock and Alexa. This suggests that including Amazon’s internal Trainium fleet doesn’t dramatically change the picture.
  • Supply chain estimates suggest an upper bound on production. Morgan Stanley used TSMC CoWoS wafer allocations to estimate in early 2025 that 1.1-1.4 million Trainium2 would be produced in 2025.1 This now looks like an underestimate, but does suggest that 2025 Trainium2 shipments are unlikely to be far greater than Rainier's 1.3M total.

Overall, it looks like the 1.3M Project Rainier chips are a large chunk of total Trainium2. This makes much higher volumes (e.g. 4M or more Trainium2) less likely. As a result, we use 2M to 3M Trainium2 as a plausible range of the Trainium2 stock, with 2.5M as a point estimate (implying that New Carlisle and Canton together are about half of the total deployed stock).

Estimating volumes and deployment timelines

Starting from a point estimate of 2.5M Trainium2 deployed as of the end of 2025, how many total Trainium chips (including older models) were shipped/deployed over time?

  • As noted above, Omdia estimates 1.3 million Trainium and Inferentia chips deployed in 2024, without breaking this down by chip type. Trainium2 became available in late 2024, so this production was split between Trainium2 and earlier generations.
  • Anthropic’s Project Rainier data centers ramped heavily in mid-late 2025. At the end of H1 2025, New Carlisle and Canton were already at a combined 750k Trainium2.
  • SemiAnalysis estimated relative growth of Trainium package production over time. Their chart shows that Trainium2 production increased by ~4x between 2024 and 2025, implying that 2024 Trainium2 production was ~20% (~500k) of the cumulative total, though they note there is a time delay between package production and server shipments.2
    • They also show 6x growth over two years, equivalent to ~25% growth per quarter; in 2025, this quarterly growth rate slowed to ~12%. If 2M Trainium2 were shipped in 2025, this range of growth rates implies 780k to 890k were shipped in H1 2025.3 The higher end looks more plausible given that Project Rainier alone had reached 750k by mid-year.
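
For concreteness, the H1/H2 split arithmetic from footnote 3 can be reproduced with a short calculation. This is a minimal sketch assuming constant quarterly growth within the year and a 2M Trainium2 total for 2025; both are simplifying assumptions rather than model outputs.

```python
# Sketch of the H1/H2 split arithmetic (footnote 3), under assumed constant quarterly growth.
total_2025 = 2_000_000  # assumed Trainium2 shipments in 2025

for quarterly_growth in (0.12, 0.25):
    q = 1 + quarterly_growth
    # Normalize Q2 = 1, so Q1 = 1/q, Q3 = q, Q4 = q^2.
    h1 = 1 / q + 1
    h2 = q + q**2
    h1_shipments = total_2025 * h1 / (h1 + h2)
    print(f"{quarterly_growth:.0%} quarterly growth -> H1 2025 of roughly {h1_shipments:,.0f} chips")
```

Running this gives roughly 890k chips at 12% quarterly growth and 780k at 25%, matching the range above.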

Overall, we have this timeline of rough estimates:

  • 2024: 500k-1M Trainium1/Inferentia, and ~500k Trainium2
  • 2025 H1: ~850k Trainium2
  • 2025 H2: ~1.15M Trainium2

Cost per chip

There is limited media and analyst coverage of Trainium costs, though SemiAnalysis estimates a unit cost to Amazon of ~$4k per Trainium2 chip.

This is much lower than competitors like the Nvidia H100 (priced at >$20k each), despite Trainium2 having ~66% of the FLOP/s spec of an H100 and a similar amount of memory. But this figure is plausible because Amazon's custom chip effort is heavily vertically integrated: Amazon acquired the chip designer Annapurna Labs. The Nvidia B200, which is more complex and has 2x more memory than Trainium2, likely only costs around $6k to produce. Amazon still partners with the external chip design firms Marvell and Alchip, but this work is reportedly much less lucrative for them than (e.g.) Google's TPU is for Broadcom.4

Huawei

Huawei’s Ascend 910-series chips are China’s leading domestically-designed AI accelerators, with the 910B and 910C being the primary models in recent years. Huawei Ascend volumes are covered fairly extensively by other analysts, so we primarily summarize and synthesize this external work. We focus on 2024 and 2025, as there is less coverage of previous years, and it is likely that the vast majority of Huawei Ascends are less than two years old.

2024 Production

For the Ascend 910B, analysts largely agree on 400,000–450,000 units. 36kr reported roughly 400,000 shipments, while both SemiAnalysis and Bernstein came in at 450,000. The Financial Times was a notable outlier at just 200,000. For the newer 910C, only SemiAnalysis provided an estimate: around 50,000 units in limited initial production.

2024 totals: ~400k 910B and ~50k 910C

2025 Production

Estimates diverge more for 2025, especially on the 910B/910C mix. SemiAnalysis projects 152,000 910Bs and 653,000 910Cs—a significant shift toward the newer chip. Bernstein sees a more even split at 350,000 of each. Bloomberg reported ~300,000 910Cs for 2025, with plans to double that in 2026.

The U.S. Commerce Department offered a much lower figure—just 200,000 total Ascends. This number is out of step with other estimates and may be scoped to chips produced entirely within China's indigenous supply chain. Until the relevant export controls were tightened recently, Huawei relied on TSMC to acquire logic dies for Ascends, as well as on imported HBM.

For our 910C estimate, we have greater confidence in SemiAnalysis’s higher figure of ~600,000, due to its far greater detail (the other analyst estimates are simply reported in the media) and SemiAnalysis’ relative expertise in semiconductors.

2025 totals (best estimate): ~200k 910B and ~600k 910C

Estimating the compute-equivalent total of Huawei Ascends

Given these estimates of Ascend volumes, what is the total computing power of the Ascend stock? The challenge is that there are multiple published specs for both the 910B and 910C. The 910B is commonly quoted at 320 16-bit TFLOP/s, but subvariant specs range from 280 to 400 TFLOP/s. Similarly, the 910C is most often quoted at 800 TFLOP/s (possibly rounded), though SemiAnalysis claims 780 TFLOP/s.

To handle this uncertainty, we use a Monte Carlo model, with a range of 280 to 400 TFLOP/s for the 910B and 780 to 800 TFLOP/s for the 910C. In addition, for the purpose of this dataset, we use the maximum OP/s rating for 8-bit (or greater) number formats. For the 910B, INT8 specs are 2x higher, or 560 to 800 TOP/s. While published INT8 specs are not available for the 910C, we assume the 910C’s maximum INT8 output is approximately 2x higher than its FP-16 rating, at ~1500 to 1600 TOP/s.

We also apply uncertainty intervals for unit counts based on the research above.
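
As an illustration, a minimal version of this Monte Carlo model might look like the following sketch. The unit-count ranges and the H100 reference rating (assumed here to be 1,979 dense INT8 TOP/s) are placeholders for illustration; the actual model's intervals may differ.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Per-chip INT8 throughput (TOP/s), sampled over the spec ranges discussed above.
tops_910b = rng.uniform(560, 800, N)    # 2x the 280-400 FP16 TFLOP/s range
tops_910c = rng.uniform(1500, 1600, N)  # assumed ~2x the 780-800 FP16 range

# 2024 unit counts (illustrative uncertainty ranges, not the dataset's exact intervals).
units_910b = rng.uniform(350e3, 450e3, N)
units_910c = rng.uniform(30e3, 70e3, N)

H100_INT8_TOPS = 1979  # assumed dense INT8 rating used as the H100-equivalent reference

h100e_2024 = (units_910b * tops_910b + units_910c * tops_910c) / H100_INT8_TOPS
lo, med, hi = np.percentile(h100e_2024, [5, 50, 95])
print(f"2024: {med:,.0f} H100e (90% interval: {lo:,.0f} - {hi:,.0f})")
```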

This model yields 180K H100-equivalents in 2024 (90% credible interval: 130K–240K), of which ~140K come from 910Bs and ~40K from 910Cs, and 530K H100-equivalents in 2025 (450K–620K), split between ~70K from 910Bs and ~460K from 910Cs.

AMD

AMD sells AI chips called Instinct GPUs. AMD is arguably Nvidia's largest competitor in selling AI chips that are not custom-designed for specific companies or applications (as Google/Broadcom's TPU and Amazon's Trainium are).5

We estimate AMD's AI chip sales6 in three steps:

  • Estimating AMD’s total revenue from Instinct GPU sales.
  • Estimating average selling prices for each GPU model based on media reports and analyst coverage.
  • Modeling the production mix across different GPU generations each quarter based on product launch dates, known customer deployments, and company commentary.

With quarterly Instinct revenue, GPU pricing, and production mix estimates, we can calculate unit volumes by dividing revenue by weighted-average prices. We implement this using Monte Carlo simulation to model uncertainty from all input parameters and report results as median values with confidence intervals. The full model can be found here.

How much revenue does AMD make from AI GPUs?

AMD reports its “data center” revenue each quarter (as in, revenue from its division that sells products intended for data centers), but they do not break out how much revenue comes from Instinct AI GPUs versus other products like its EPYC CPUs. To estimate Instinct revenue, we rely on company disclosures and work backwards from a few anchor points.

The most important anchor is AMD’s disclosure that Instinct revenue exceeded $5 billion in 2024. A secondary anchor helps us distribute this $5B across quarters in 2024. In AMD’s Q2 2024 earnings call, Lisa Su, AMD’s CEO, noted that Instinct MI300 revenue exceeded $1 billion for the first time. This helps ground how we model the quarterly distribution of the $5+ billion for 2024.

For 2025, we are less certain about Instinct GPU revenue.

AMD’s data center segment grew overall in 2025, but experienced headwinds in Q1 and Q2 2025 from two factors. First, new U.S. export controls in April 2025 restricted AMD’s MI308 chips designed for the Chinese market, resulting in an $800M charge in Q2. Second, AMD faced unexpectedly weak demand for its new MI325X GPU. We estimate that Instinct’s share of datacenter revenue fell from ~50% in Q4 2024 to ~40% in Q1 2025 and ~30% in Q2 2025.

This decline reversed in Q3 2025 when AMD introduced a new GPU generation, the MI350 series, which “ramped really nicely” and drove Q3 growth. We estimate Instinct represented 35-50% of Q3 2025’s $4.34B datacenter revenue, or ~$1.5-2.3B in revenue.

We have meaningful uncertainty because AMD does not regularly disclose the EPYC/Instinct split. The actual revenue distribution could differ, particularly in quarters where we have limited direct information.

Quarter | DC Revenue | Instinct Share | Instinct Revenue | Notes
Q1 2024 | $2.34B | ~30% | ~$0.7B | Assumption based on ramp trajectory
Q2 2024 | $2.83B | ~36% | >$1B | Company disclosure
Q3 2024 | $3.55B | ~42% | ~$1.5B | Interpolation
Q4 2024 | $3.86B | ~47% | ~$1.8B | Interpolation
2024 Total | $12.6B | 44% | ~$5B | AMD FY2024
Q1 2025 | $3.67B | 35-45% | $1.3-1.7B | MI308 export controls, MI325X weak demand
Q2 2025 | $3.24B | 25-40% | $0.8-1.3B | $800M MI308 charge, continued softness
Q3 2025 | $4.34B | 40-53% | $1.7-2.4B | MI350 ramp

Average Selling Price

In order to convert Instinct revenue into unit volumes, we need to estimate what AMD sells each Instinct GPU for. Our estimates primarily draw from industry analyst reports.

We model each ASP as a uniform distribution within ranges anchored on media reports and analyst estimates.

GPU Model | Price Range | Source
MI250X | $8,000-12,000 | Massed Compute
MI300A | $6,000-7,500 | Derived from El Capitan cost / # of units
MI300X | $10,000-15,000 | Tom's Hardware; Citi via SeekingAlpha (Microsoft pays ~$10k)
MI325X | $10,000-20,000 | Low end reflects hyperscaler discounts amid soft demand
MI350X | $22,000-25,000 | Seeking Alpha (reports of $25k pricing)
MI355X | $25,000-30,000 | Seeking Alpha

GPU Revenue Mix Over Time

Estimating the quarterly split between GPU models is necessary to measure the total compute capacity of AMD chips because GPUs vary in cost-effectiveness.

We model the revenue mix using product launch timelines, known customer deployments, and company commentary about production ramps. We estimate that most 2024 sales came from the MI300 series (primarily the MI300X, plus the MI300A), with a small share for the legacy MI250X. In Q3 2025, the next-gen MI350 series launched ahead of schedule, and we estimate that it quickly captured the majority of Instinct revenue. A more detailed timeline is shown below, with the full parameters available in our model.

We have significant uncertainty about the revenue mix because AMD does not disclose precise product shares. The actual mix could differ substantially, particularly during product transition periods when multiple generations overlap.

GPU | Release date | Description | Revenue share notes
MI250X | Nov 2021 | Previous-gen flagship GPU | Legacy product with minor share in early 2024
MI300A | Jan 2023 | First 300 series, designed for HPC over AI | Most of the 44.5K MI300A for El Capitan were delivered in Q2 and Q3 2024; ramped down after
MI300X | Dec 2023 | Flagship AI GPU, competing with Nvidia H100 | Ramped through 2024, phasing out in 2025 for MI325X
MI308X | Dec 2023 | China-spec 300X to meet export controls | Unclear if high-volume shipments ever took place; exports definitively blocked in April 2025
MI325X | Oct 2024 | 300X refresh with more memory, akin to Nvidia H200 | Shipped Q2 2025 at 5-20% share in Q1 2025, grew to 15-40% by Q2 2025 despite soft demand; timing vs B200 hurt
MI350 | Jun 2025 | Next-gen GPU, competing with Nvidia Blackwell | Volume production ahead of schedule; primary driver of Q3 2025 growth

Volume Calculation

With quarterly Instinct revenue estimates, ASP distributions for each model, and production mix shares, we calculate unit volumes by multiplying total GPU revenue by each model’s production share to get model-specific revenue, then dividing by that model’s ASP.

Rather than working with point estimates, we sample from all the probability distributions simultaneously and compute the resulting unit volumes using Monte Carlo simulations.
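
A minimal sketch of this calculation for a single quarter is below. The revenue interval, mix shares, and ASP ranges are illustrative placeholders loosely based on the tables above, not the model's actual per-quarter parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 50_000

# Illustrative inputs for one quarter (roughly Q3 2025 in the table above).
instinct_revenue = rng.uniform(1.7e9, 2.4e9, N)   # estimated Instinct revenue, $
mix = {"MI325X": 0.25, "MI350 series": 0.75}      # assumed revenue shares (sum to 1)
asp = {
    "MI325X": rng.uniform(10_000, 20_000, N),       # ASP ranges from the pricing table
    "MI350 series": rng.uniform(22_000, 30_000, N),
}

# Units = sum over models of (revenue * model share / model ASP), per Monte Carlo draw.
units = sum(instinct_revenue * share / asp[model] for model, share in mix.items())
lo, med, hi = np.percentile(units, [5, 50, 95])
print(f"Estimated GPUs shipped: {med:,.0f} (90% interval: {lo:,.0f} - {hi:,.0f})")
```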

Limitations

Our analysis has several sources of uncertainty, which create corresponding limitations:

Revenue allocation - AMD doesn't report the EPYC/Instinct split for most quarters. We anchor on disclosed data points, but if AMD's actual EPYC/Instinct mix differs from our estimates (because EPYC grew faster or slower than we assumed), our GPU revenue estimates would be biased.

Production mix - We estimate GPU model shares from product timelines and deployment reports, but AMD doesn't disclose actual splits. The actual splits could differ substantially from our estimates, especially during product transitions. This matters because different models have different prices, and misestimating the mix between a $15,000 GPU and a $25,000 GPU directly affects unit counts.

Pricing - GPU pricing varies by customer, contract structure, and timing in ways our ASP ranges may not fully capture. Large bulk purchases as well as customer demand can meaningfully affect average per-GPU prices. We model ASPs as uniform distributions within reported ranges, but actual contract prices likely have more complex distributions.

Export controls - The impact of MI308 export controls on Q2 2025 and subsequent quarters remains uncertain. AMD disclosed the $800M Q2 charge and a $1.5-1.8B annual impact, but the full effect on product mix, pricing, and customer allocation is difficult to model precisely.

Deployment lag - Our estimates reflect chips produced and sold by AMD, not necessarily chips deployed and operational in customer data centers. There may be deployment lags between when AMD recognizes revenue and when chips are online. For quarterly analysis, these timing differences could shift volumes between periods.

Summary

AMD’s Instinct GPUs compete with Nvidia’s data center accelerators but represent a much smaller share of the AI compute market. We estimate Instinct GPU volumes in 2024 and 2025 to understand AMD’s position in the AI chip market.

AMD bundles its Instinct GPU revenue with EPYC server CPU revenue in a single data center segment. This makes it more difficult to track AMD’s revenue from their Instinct GPU line. We estimate unit volumes by inferring GPU revenue and dividing by average chip prices. The main revenue anchor point is AMD’s statement that 2024 Instinct revenue exceeded $5 billion. We also use their Q2 2024 disclosure that MI300 revenue surpassed $1 billion that quarter. These two points let us estimate how GPU revenue was distributed across other quarters, accounting for product launches, media reports on adoption, and production schedules AMD discussed in earnings calls.

We estimate prices for each GPU model from analyst reports and media coverage of customer contracts. Hyperscalers like Microsoft pay around $10,000 per MI300X in volume, while newer models like the MI350X sell for $22,000-25,000. Production mix across GPU generations comes from product timelines, known deployments, and company statements about manufacturing. We run Monte Carlo simulations to propagate uncertainty rather than using point estimates.

Google

Overview

Google designs custom AI accelerators called Tensor Processing Units (TPUs), which are used by Google DeepMind for training and inference, by Google at large for AI/machine learning tasks, and by external cloud customers such as Anthropic.7

Google partners with Broadcom, a major chip designer, to produce the TPU. Google designs the high-level TPU architecture, while Broadcom handles physical design and supply logistics, including placing manufacturing orders with TSMC. In short, Google buys TPUs from Broadcom, which oversees their production.

Google does not disclose its TPU volumes, but we have a significant amount of evidence we can use to estimate these volumes. We use the following methodology:

  • We estimate how much Google spends on TPUs using Broadcom statements about their revenue from AI chips and AI hardware, and reports and estimates of Google’s share of that revenue.
  • We then estimate the average price of TPUs: first, we use a bottom-up model to estimate what TPUs cost Broadcom to produce. We then adjust this using reports and statements about Broadcom's gross margins on TPUs to estimate how much Google pays per TPU.

We focus on 2024-2025 sales for now because there is more information available on total TPU spending.

Given that many inputs are uncertain, we use Monte Carlo simulations and report results with confidence intervals. The full model can be found here: [LINK]

TPU Spend

Broadcom reports AI semiconductor revenue each quarter, but this includes both custom AI accelerators (so-called XPUs, which include TPUs) and networking products. However, on earnings calls, Broadcom's leadership has said that XPUs account for roughly 65-70% of its AI semiconductor revenue, which allows us to isolate XPU revenue. For example, Broadcom's FY2024 AI semiconductor revenue was $12.2B, implying $8-8.5B in XPU revenue.

The next step is estimating Google’s share of that XPU revenue:

In 2024, Broadcom's two primary XPU customers were Google and Meta. Meta reported spending approximately $987M with Broadcom in 2024.8 Subtracting disclosed customer spending from total XPU revenue implies that Google's TPU-related spend with Broadcom in 2024 was roughly $7-7.5B.

Broadcom had almost exactly $20B in AI semiconductor revenue in 2025. They said XPUs were a ~65% share of AI semiconductor revenue in Q3 2025, similar to 2024, so the full-year share was likely similar as well, implying around $13B in XPU revenue.

We can estimate Google’s TPU/XPU spend a few ways:

  • Google increased its capex spending from $52B in 2024 to $91B-93B in 2025, a roughly 75% increase. If Google’s Broadcom spend scaled similarly, this suggests $12-13B in TPU-related spend (likely a slight overestimate).9
  • Extrapolating estimates that Google accounts for ~90% of Broadcom XPU revenue in 2025, implying roughly $12B in spending. HSBC estimated that Google will make up 78% of Broadcom ASIC revenue for 2026, which suggests that the TPU share of Broadcom ASIC revenue in 2025 could be 85%+, since Broadcom will be scaling additional XPU programs with other designers in 2026.
  • Validating against other analyst estimates, which model Broadcom XPU revenue from Google to be between $11-13B.
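
The sketch below reproduces the rough arithmetic behind these spend estimates. The XPU share, Meta's disclosed spend, and Google's assumed 2025 share are taken from the discussion above; treat the outputs as back-of-the-envelope figures rather than model results.

```python
# Back-of-the-envelope TPU spend arithmetic (all figures in $B).
ai_semi_revenue = {2024: 12.2, 2025: 20.0}  # Broadcom AI semiconductor revenue
xpu_share = (0.65, 0.70)                    # XPUs as a share of AI semiconductor revenue

for year, rev in ai_semi_revenue.items():
    low, high = rev * xpu_share[0], rev * xpu_share[1]
    print(f"{year}: XPU revenue of roughly ${low:.1f}B to ${high:.1f}B")

# 2024: subtract Meta's disclosed ~$0.99B of Broadcom spend to isolate Google.
meta_spend = 0.99
print(f"2024 Google TPU spend: ${12.2 * 0.65 - meta_spend:.1f}B to ${12.2 * 0.70 - meta_spend:.1f}B")

# 2025: assume Google is ~90% of Broadcom XPU revenue (analyst estimate cited above).
print(f"2025 Google TPU spend: roughly ${20.0 * 0.65 * 0.90:.1f}B")
```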

TPU Price Modeling

From the previous step, we have an estimate of how much Google spent on TPUs with Broadcom. To translate this spending into TPU production volumes, we divide estimated revenue by per-chip prices. This has two steps:

  • Estimate the production cost to Broadcom for each TPU generation using a bill of materials model
  • Estimate the price Google pays, given reports of Broadcom’s margins

Bill of materials cost model

The cost of manufacturing a TPU can be broken down into four components.

  • Logic die cost is estimated from publicly reported wafer prices at each process node and die size estimates from industry analysts, adjusted for yield assumptions that vary with die size and process node.
  • HBM cost is based on reports of memory pricing by generation and the memory capacity.
  • Advanced packaging cost covers TSMC’s CoWoS process, estimated from their advanced packaging revenue disclosures and capacity allocation reports.
  • Auxiliary costs include the power delivery hardware, PCB, and module-level assembly and testing.

Our estimates draw from publicly available information including wafer pricing, memory costs by generation, and packaging costs derived from TSMC’s capacity and revenue disclosures. The estimates vary by TPU generation, since newer chips use more advanced processes, larger dies, and more memory.

Broadcom margins

Broadcom may make around a 60% gross profit margin on TPUs, with a plausible range of 50% to 70%:

  • In Q1 2024, CEO Hock Tan said that the custom chip business "can command margins similar to our corporate gross margin," which was around 75% at the time.
  • For FY2023, we estimate margins between 60-75%.
  • In earnings calls in 2025, Broadcom consistently said that XPU margins were lower than for their overall business (which was mid-high 70s), and also hinted that XPU margins were lower than their overall semiconductors segment (mid-high 60s).10
  • One analyst, UBS, estimated a 55% margin on XPUs.

A 60% margin would mean that a chip that costs Broadcom $1k to produce would then sell for $1k / (1 - 60%) = $2,500.
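
A stripped-down version of the cost-plus-margin step is sketched below. The component costs are hypothetical placeholders, not our actual bill-of-materials estimates; only the markup formula mirrors the approach described above.

```python
# Minimal sketch: sum bill-of-materials components, then mark up by Broadcom's gross margin.
def price_to_google(logic_die, hbm, packaging, auxiliary, gross_margin=0.60):
    cost = logic_die + hbm + packaging + auxiliary  # Broadcom's production cost
    return cost / (1 - gross_margin)                # price implied by the gross margin

# Hypothetical chip costing $1,600 to produce, across the 50-70% margin range.
components = dict(logic_die=600, hbm=700, packaging=200, auxiliary=100)
for margin in (0.50, 0.60, 0.70):
    print(f"{margin:.0%} margin -> ~${price_to_google(**components, gross_margin=margin):,.0f} per chip")
```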

Summary of results

TPU Version | Manufacturing Cost | Price to Google | Median Price to Google
v3 | $940-$1400 | $2700-$5000 | $3600
v4 | $1100-$1500 | $3000-$5400 | $3900
v5e | $950-$1400 | $2100-$4000 | $2900
v5p | $2300-$2900 | $5000-$8600 | $6400
v6e | $1600-$1900 | $3400-$5700 | $4300
v7 | $4600-$5500 | $10000-$17000 | $13000

Our cost estimates cover the TPU module only and exclude networking. While networking is a meaningful system-level cost, Broadcom counts it separately from its XPU revenue. We therefore exclude networking to ensure consistency between the revenue data and the cost model.

Our estimates are generally aligned with external reports. One semiconductor research firm estimated prices of $3k for the v5e, $6k for the v5p, and $4k for the v6e, all of which align closely with our median estimates. However, the same source estimated the v7 (formerly v6p) at $8k, significantly lower than our $13k median. Other reporting places the v7 higher: one analyst estimated $13k, while The Information reported $12k. These latter estimates align well with our own.

Production Mix

To convert total TPU revenue into chip counts by generation, we estimate the quarterly production mix across TPU versions. The production mix matters because TPU generations differ substantially in cost. Inference-optimized chips are cheaper than training-focused chips with larger memory footprints, and newer generations are more expensive than older ones due to advanced process nodes and packaging. We model the quarterly production mix as probability distributions across TPU versions, with shares normalized to sum to 100%.

Here is a summary table describing when each TPU reached general availability. The full parameters can be found in our model.

TPU Model | General availability date | Description | Production mix notes
v4 | May 2022 | Efficient and scalable chip | For Q1 and Q2 2023, we assume that effectively all the TPUs made were TPU v4, since the v4 had been generally available since May 2022 and outperforms the v3 on virtually all tasks. TPU v5e entered Preview at the end of August 2023, so for Q3 and Q4 2023 we model the production mix as split between TPU v4 and v5e.
v5e | Nov 2023 | Small, cost-optimized inference chip | Remains a major share of production through 2024 and into 2025 due to broad applicability and lower cost. Share declines gradually as v6e ramps.
v5p | Dec 2023 | Training-focused chip | Represents a smaller but higher-value segment of v5-generation production, with share increasing after GA in early 2024 and declining slowly as v6e scales. External analysis suggests a v5e to v5p production volume ratio of 2:1.
v6e | Dec 2024 | Next-gen cost-optimized chip | Appears in pilot volumes in late 2024, then ramps meaningfully in 2025 following GA. Absorbs a growing share of new capacity additions and becomes the largest contributor to TPU production during 2025, while v5-generation production persists.
v7 | Nov 2025 | Powerful next-gen chip | Enters production in late 2025 with early GA/preview. Contributes a minority share in 2025, with large-scale ramp deferred to 2026.

At a high level, our approach uses observable milestones in Google’s TPU rollout process including preview availability and general availability on Google Cloud Platform (GCP) to ground when a new TPU begins pilot, ramp up, and mass production. We then model how production share evolves as a TPU version moves from limited deployment to scaled use, while older generations decline gradually.

Google typically introduces a new TPU generation in stages. First, a TPU is publicly announced, often with high-level performance claims but without broad availability. This is followed by a preview period a few months later, during which the TPU is available in limited regions, configurations, or early-access programs on GCP. Only later does the TPU reach general availability (GA), at which point it can be broadly used by external customers.

During the preview stage, production is typically in pilot volumes and meant for validation, yield learning, and assessing platform readiness. At this stage, we model the TPU as contributing only a low single-digit share of revenue (~1-5%), reflecting pilot deployments and early silicon that are unlikely to materially affect overall accelerator spend.

General availability (GA) is the first point at which we assume a TPU can begin taking a meaningful but still minority share of production. GA implies that Google can standardize new internal deployments around the TPU and offer it broadly on GCP. In the quarter of GA and the immediately following quarter, we typically model the new TPU as capturing a moderate share of revenue (~10–25%), rather than becoming dominant immediately.

Following GA, we model a gradual ramp into scaled deployment. In this phase, the new generation absorbs incremental capacity additions and displaces older chips in revenue share. Depending on relative performance-per-dollar improvements and supply constraints, the new TPU can become the largest or dominant share over subsequent quarters, rather than in a single step.
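
To summarize the staged rollout described above, the share ranges we use at each stage can be written down roughly as follows. The exact values vary by TPU generation and quarter; these are only the indicative ranges quoted in the surrounding text, and the pre-preview value is an added assumption.

```python
# Indicative revenue-share ranges (low, high) by rollout stage, as described above.
ramp_profile = {
    "announced, not yet available": (0.00, 0.01),  # assumed negligible share
    "preview (pilot volumes)":      (0.01, 0.05),
    "GA quarter and the next":      (0.10, 0.25),
    "scaled deployment":            (0.30, 0.65),  # e.g. v6e two quarters after preview
}

for stage, (low, high) in ramp_profile.items():
    print(f"{stage}: {low:.0%} - {high:.0%} of TPU revenue")
```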

The TPU v5e and v5p are a part of the same generation and have significant overlap in production timelines. The TPU v5e is optimized for cost-efficient inference and mixed workloads, while v5p targets higher-end training workloads.

We ground their production mix using third-party analysis suggesting that the ratio of v5e to v5p production volume is 2:1. While the TPU v5e and v5p have significant production overlap, the TPU v5e entered preview and GA earlier than the v5p. In Q1 FY24, we therefore model v5e as accounting for roughly 75–90% of TPU revenue, with v5p still in limited availability before its December 2023 preview.11 By Q2 FY24, after v5p reaches GA, its share rises to 15–30%, while v5e remains the majority.

Google announced TPU v6e in May 2024, offered a preview in October 2024, and made it generally available in December 2024. Thus we model v6e’s share in Q4 FY24 (Aug – Oct 2024) as 1-8% of Broadcom TPU revenue. After GA, in Q1 FY25, we model v6e moving into the 10–25% range, reflecting enough volume that it is available broadly for external customers but still not constituting the majority of spend.

The TPU v6e delivers substantially better compute-per-dollar than both v5e and v5p, and Google has indicated that it is used for high-priority workloads, including training its Gemini 2.0 models. Two quarters after the v6e preview, we model it reaching roughly 30–65% of TPU revenue.

We explicitly model continued production of v5e and v5p throughout FY25. This reflects both technical and economic realities. TPU v5e remains cheaper per hour to rent, while TPU v5p offers substantially higher memory capacity than v6e, which may still make it attractive for certain training configurations. More broadly, fleet turnover is gradual: existing v5 pods continue scaling, migrations take time, and supply constraints rarely allow a single accelerator to absorb all demand immediately.

Our estimates sit between more aggressive transition assumptions, which model a rapid shift away from v5, and more conservative estimates that still attribute most TPU production in 2025 to v5-generation chips. We treat both as informative bounds and take an intermediate trajectory that reflects v6e's economic advantages as well as the continued value of the v5 generation.

TPU v7 was announced in April 2025, and in November 2025 Google said the v7 was generally available, although some materials referred to TPU v7 as entering preview while others described it as becoming generally available over the following weeks. We interpret this as an intermediate state between early preview and GA.

As a result, we model TPU v7’s late-2025 production as more substantial than the preview phases of earlier TPU generations, but still constrained relative to a mature GA ramp. It appears at low single-digit share in Q3 FY25 and Q4 FY25. We do not assume TPU v7 approaches majority share until 2026.

Volume Estimates

Chip volumes are calculated by dividing quarterly TPU revenue by the weighted-average price per chip. We implement this using Monte Carlo simulations to propagate uncertainty from all input parameters including revenue estimates, production mix shares, manufacturing costs, margins, and more. Results are reported as median values with 90% confidence intervals.
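
For a single quarter, the volume step reduces to the sketch below. The revenue interval and mix shares are placeholder assumptions; the per-chip price ranges are the "Price to Google" intervals from the summary table above.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 50_000

# Placeholder quarter: revenue and production mix are illustrative, not model parameters.
tpu_revenue = rng.uniform(3.0e9, 3.5e9, N)                 # Google TPU spend this quarter, $
mix = {"v5e": 0.20, "v5p": 0.10, "v6e": 0.60, "v7": 0.10}  # assumed production mix (sums to 1)
price = {
    "v5e": rng.uniform(2_100, 4_000, N),   # price-to-Google ranges from the table above
    "v5p": rng.uniform(5_000, 8_600, N),
    "v6e": rng.uniform(3_400, 5_700, N),
    "v7": rng.uniform(10_000, 17_000, N),
}

weighted_price = sum(share * price[v] for v, share in mix.items())
chips = tpu_revenue / weighted_price
lo, med, hi = np.percentile(chips, [5, 50, 95])
print(f"Chips this quarter: {med:,.0f} (90% interval: {lo:,.0f} - {hi:,.0f})")
```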

Limitations

This analysis relies on indirect estimation from proxies, which introduces uncertainty in a few areas:

  • Customer allocation: Broadcom’s XPU revenue is divided among multiple customers, and the exact split between customers is not disclosed. Our estimates rely on limited data points that may not fully represent the competitive dynamics.
  • Contract-specific pricing: Component prices are negotiated individually and can vary significantly by volume and customer relationship. The prices Google pays may differ from the rates reported by second-hand sources we use in our cost model.
  • Production versus deployment: Our estimates reflect chips manufactured by Broadcom, not chips deployed and operational in Google data centers. There may be lags between production and deployment that our model cannot capture.
  • TPU version mix uncertainty: There is little information from Google and media reports on production mix by TPU version. The production split between TPU versions is estimated from product announcement timelines, cloud availability dates, and industry reporting on production ramps. However, it is unclear how quickly new TPU generations ramp up as a share of total production - a newly launched TPU might account for 60% or 85% of quarterly spend depending on manufacturing yields, supply chain readiness, and Google’s deployment priorities. The actual mix could differ substantially, which affects volume estimates since different generations have very different prices. Production volumes are particularly uncertain for older TPU generations that continue production alongside newer flagship releases in any given quarter. While we are directionally confident in our estimates, the individual estimates have significant uncertainty.

Summary

This model estimates Google’s TPU production volumes by combining Broadcom’s publicly disclosed AI semiconductor revenue with bottom-up cost estimates for each TPU generation. We isolate custom accelerator revenue from Broadcom’s reported totals, estimate Google’s share, model per-chip manufacturing costs, and infer the quarterly production mix from product timelines and industry reporting. Dividing estimated spending by weighted-average prices gives us chip production volumes.

There are several sources of uncertainty including the allocation of Broadcom revenue between customers, contract-specific pricing, and the production mix between TPU generations. Results are reported as median values from Monte Carlo simulations with confidence intervals reflecting these uncertainties.

Nvidia

Nvidia is the largest seller of AI chips, which they call “GPUs”. Here, we explain how we estimated total Nvidia AI GPU shipments from 2022 onwards. We focus on Nvidia’s flagship AI GPUs, not its gaming GPUs (which are used by some AI practitioners) or chips for self-driving cars.

We estimate Nvidia AI GPU sales based on Nvidia’s revenue from AI compute, and reported/estimated prices of its GPUs over time, as well as announcements and Nvidia statements of which chip types were shipped over time.

We then compare the results of this model with a public disclosure from Nvidia that it had shipped 4M Hopper GPUs and 3M Blackwell GPUs through October 2025.

Revenue-based model

The code for this model can be found here.

The revenue-based model uses this approach:

  • We find Nvidia’s revenue from AI compute (AI GPUs/servers) over time, which Nvidia reports every quarter. This data runs through October 2025, the most recent quarter.
  • For each quarter, we record or estimate how Nvidia’s compute revenue breaks down by GPU model (e.g. A100, H100), mostly based on Nvidia’s earnings commentary.
  • Then, we estimate the number of GPU units shipped using reports and estimates of the prices of GPUs and other equipment.
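
To make the pipeline concrete, here is a minimal single-quarter sketch. The revenue figure, hardware share, generation split, and average prices are placeholder assumptions drawn loosely from the tables below; the full model samples these inputs per quarter.

```python
# Illustrative single-quarter unit calculation (all inputs are placeholder assumptions).
compute_revenue = 40e9                  # Nvidia Compute revenue for a hypothetical quarter, $
hardware_share = 0.97                   # assume ~95-99% of Compute revenue is hardware sales

revenue_split = {"Hopper": 0.20, "Blackwell": 0.80}  # assumed split by GPU generation
avg_price = {"Hopper": 23_000, "Blackwell": 37_000}  # rough per-GPU prices (see price sections below)

for generation, share in revenue_split.items():
    units = compute_revenue * hardware_share * share / avg_price[generation]
    print(f"{generation}: ~{units:,.0f} GPUs")
```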

We restrict our data to 2022 and later, which should capture a very large majority of the existing stock of Nvidia chips, given a typical lifespan that is likely between 3-6 years12 and rapid growth in sales over time.

Nvidia’s AI GPU revenue

Nvidia breaks out revenue for its “Data Center” division, which sells products intended for data centers. Data Center has been >80% of Nvidia’s total revenue since 2024 and includes all of Nvidia’s mainstream AI chips, but not gaming GPUs and its self-driving car chips (which are out of scope for our counts).

Nvidia breaks down “Data Center” further into Compute and Networking, with Compute making up 80-85% of Data Center in recent years.13 Networking means cables, switches, and other components used to connect chips together. This is vitally important for creating chip clusters, but we mostly ignore networking revenue for the purpose of estimating chip counts.

A small nuance is that “Compute” is more than physical compute sales. Nvidia describes (page 5) its Compute sales as including hardware (GPUs, CPUs, and interconnects), cloud compute revenue (DGX Cloud), and AI-related software products.14 Software and cloud are likely a very small share; for example, they were $2 billion in revenue in 2024, compared to ~$100B in total Compute revenue. We assume that ~95-99% of Compute revenue is from hardware sales. Additionally, Nvidia’s cloud division mostly uses compute that Nvidia sells and then rents back.15

This revenue data is summarized below. Note that Nvidia’s fiscal year ends in January and runs ~11 months ahead of the calendar year (e.g. FY25 ended January 26, 2025). Because their quarters do not line up with calendar quarters, in our visualizations we use interpolation to distribute GPU counts by calendar quarter.

Period | Data center revenue | Compute revenue
Fiscal year 2023 (approx. Feb 2022 - Jan 2023) | $15.0B | $11.3B
Fiscal year 2024 (Feb 2023 - Jan 2024) | $47.5B | $38.9B
Fiscal year 2025 (Feb 2024 - Jan 2025) | $115.3B | $102.3B
FY 2026, first three quarters (Feb 2025 - Oct 2025) | $131.4B | $111.0B

Nvidia’s revenue mix over time

Given total revenue and prices of chips over time, the next step is allocating quarterly revenue to chip type. Nvidia doesn’t consistently break down sales by chip, but they provide substantial details in earnings commentary.

Nvidia’s Hopper generation (H100) launched in late 2022 and ramped massively in 2023 while A100s ramped down. The next major generation, Blackwell, ramped in early to mid-2025. And Nvidia’s China-spec chips made up around 15% of sales while their export was legal (late 2022 to October 2023, and ~Feb 2024 to April 2025; the H200 will also be sold in China in 2026).

An abbreviated timeline with notes is provided below, and more detailed estimates and notes can be found in this sheet.

Note that we round the sales of data center GPUs that don't appear in this list down to zero. The main GPU this applies to is the lower-grade L40, released in early 2023. There is little information available about L40 sales, but they may have made up a small portion of 2023-2024 sales. We also bucket some GPUs together, like H200s with H100s and B100s with B200s.

GPU | First shipped | Approx. revenue share | Notes
A100 | June 2020 | Dominant through late 2022 | Flagship pre-Hopper. By September 2023, A100 revenue was "declining sequentially" and at the "tail end" of the architecture.
H100 and H200 ("Hopper"; bucketed together due to lack of breakout and similar specs) | Late 2022 | ~80% of 2023-2024; ~15-20% in first half of 2025 | Flagship AI chip of the post-ChatGPT era. Started shipping by October 2022, and H100 revenue was "much higher than A100" by January 2023. Hoppers were still sold through late 2025. By April 2025, the transition to Blackwell was "nearly complete" with Blackwell at ~70%, but the following quarter H100/H200 sales actually increased.
A800 | Oct 2022 | A800 and H800 ~20% through late 2023 | China-compliant A100 variant; banned Oct 2023.
H800 | Late 2022 | | China-compliant H100 variant, banned Oct 2023. China was 19% of data center sales during the relevant period.
H20 | Feb 2024 | ~13% from Feb 2024 to April 2025 | Heavily-downgraded Hopper variant for China; eventually banned April 2025. China was about 13% of 2024 revenue, and the H20 had $4.6B in sales in spring 2025, or ~13% of compute sales.
B200 ("Blackwell"; bucketed with B100) | Late 2024 | Majority by mid-2025 | Next generation after Hopper. We bucket the B200 with the less powerful B100 due to lack of breakout and presumed low B100 share. Most B200s are sold in GB200 systems (see notes on B200/B300 below). The quarter ending Jan 2025 had $11B of Blackwell revenue, or ~33% share of data center. The next quarter, Blackwell was "nearly 70%" of data center compute.
B300 (aka Blackwell Ultra) | Mid 2025 | Majority by late 2025, displacing B200 | Mid-cycle refresh of B200 with "tens of billions" in sales by July 2025, perhaps roughly half of Blackwell. The next quarter it reached two-thirds of Blackwell.

Chip prices

Note on servers vs chips

To estimate how many GPUs Nvidia shipped, the next step is to divide their Compute revenue by average prices per chip, which we collect using reports and estimates from the media and analysts.

As mentioned above, their compute revenue includes a small amount of software and cloud revenue. Another potentially significant nuance is that Nvidia sells some GPUs bundled into complete servers, which means Nvidia’s Compute revenue captures revenue from non-chip, non-networking server components as well.16 This would increase the effective Compute revenue that Nvidia makes per AI chip.

For context, most Nvidia AI chips are deployed in servers, which are necessary for training and commercial-scale inference, rather than individual cards. Most of the cost of a server is the GPUs themselves, but server components add costs: one bill-of-materials estimate of an 8-GPU H100 server has the full server sold at a ~38% premium vs the 8 GPUs, including networking components. And as we note below, Blackwell NVL72 systems command a broadly similar premium vs individual Blackwell GPUs, with a roughly 40k to 50k server price per GPU for GB200 NVL72 vs 30-40k for standalone B200s.

However, Nvidia’s overall “server premium” is probably much lower than ~30%:

  • Most Nvidia servers appear not to be purchased from Nvidia fully-assembled. One 2023 article estimates that Nvidia-packaged H100 servers were 20% of Nvidia data center revenues, meaning non-GPU server components made up <10% of Nvidia’s data center revenue.17 The remainder would be assembled or sold by other companies such as Dell and Supermicro.
  • Second, some of these server components would count towards Networking revenue, not Compute.

Overall, a detailed analysis of the Nvidia server market is outside our scope here. We choose uncertainty intervals about chip prices wide enough to incorporate this uncertainty about the server premium.

A100

Standalone A100s cost around $10k in 2023 per CNBC, while SemiAnalysis described it as “$10k+”. A DGX A100 server had a launch MSRP of $199k in 2020 or $25k per GPU (maybe the last Nvidia flagship AI product to have an MSRP), though this includes a server premium, and is before bulk discounts.

For A800s specifically, Financial Times reports a price of ~$10k each in 2023. This is probably a standalone chip price.

Overall, we use an uncertainty interval of $10k to $15k for A100s in 2022-2023, given that reports generally give a lower end of 10k, and a possible server premium.

H100

The Hopper generation has perhaps the best direct evidence of average price, thanks to a disclosure from Jensen Huang himself: Nvidia made a total of $100B in revenue from selling 4 million Hopper GPUs through October 2025 (excluding China). That works out to $25k per GPU on average, though both figures appear to be rounded.

Other reports include:

  • CNBC wrote in March 2024 that analysts generally estimate that H100s cost between 25k and 40k (without distinguishing GPUs vs servers, or whether these price figures are an industry average).
  • The Financial Times quoted SemiAnalysis in September 2024, claiming that Nvidia sold H100s for $20k to 23k, after recently cutting prices.
  • In May 2023, SemiAnalysis estimated that the 8 GPUs in an HGX H100 server sold by Nvidia cost $195k, or $24k per GPU, while the whole server cost ~$34k per GPU. We couldn't find corroborating reports of this standalone H100 price, so it is unclear whether it was actually available to any customer not buying the HGX bundle.
    • In 2024, they listed $24k as the average selling price of both the H100 and H200 GPU; an identical price for the upgraded H200 suggests some rounding.
    • By 2025, SemiAnalysis updated their estimate of an H100 server's cost to $190k for hyperscalers, or $23.7k per GPU, so the GPUs themselves would be cheaper still.
  • In a small dataset we collected of public price listings, H100 servers generally cost ~35k per GPU, without a clear trend over time.

Overall, there is a relative consensus across reports of prices in the low-to-mid $20k range by 2024. We have more uncertainty about the average price in 2023, when there was reportedly a widespread shortage of H100s.

H200 vs H100

In 2024, Nvidia started selling the H200, a mid-cycle update to the H100. The main difference is improved memory: 141 GB vs 80 GB for the H100. High-bandwidth memory generally costs $10-20 per GB, so the H200 is ~$1k more expensive to produce than the H100, implying a price increase of ~$4k if Nvidia applied its overall GPU gross margin of ~75% to this additional cost.
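
The arithmetic behind this estimate is simple enough to write out; the HBM cost range and the 75% margin assumption come from the paragraph above.

```python
# Rough H200-over-H100 price premium implied by the extra HBM capacity.
extra_hbm_gb = 141 - 80            # additional memory on the H200, GB
hbm_cost_per_gb = (10, 20)         # reported HBM cost range, $/GB
gross_margin = 0.75                # assumed overall GPU gross margin applied to the extra cost

for cost_per_gb in hbm_cost_per_gb:
    extra_cost = extra_hbm_gb * cost_per_gb
    premium = extra_cost / (1 - gross_margin)
    print(f"HBM at ${cost_per_gb}/GB -> ~${extra_cost:,} extra cost, ~${premium:,.0f} price premium")
```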

Nvidia never broke out H200 vs H100 sales, but at least one analyst predicted that the majority of Hopper sales would be H200s from H2 2024. So it is possible that the introduction of the H200 meaningfully increased the average price of Hoppers (relative to a scenario where all sales were H100s).18

Overall, our credence intervals for average Hopper (H100/H200) prices are listed below. Because the H100 and H200 were on sale for over three years, we break down our estimates by year.

Year | Low price | High price | Geometric mean | Notes
2022 | 25k | 35k | ~29.6k | Low volume in 2022 in any case
2023 | 25k | 35k | ~29.6k |
2024 | 23k | 30k | ~26.3k | H100s were reportedly 20k-25k by late 2024. This may be pushed up by the H200.
2025 | 20k | 27k | ~23k | Entire H100 server costs reportedly in low 20k range by 2025, but H200s are more expensive.
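
The geometric mean column is simply the square root of the product of the low and high prices:

```python
from math import sqrt

# Geometric mean of each Hopper price interval (values in $k).
for year, (low, high) in {2022: (25, 35), 2023: (25, 35), 2024: (23, 30), 2025: (20, 27)}.items():
    print(f"{year}: ~{sqrt(low * high):.1f}k")
```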

H800

Reuters reported that 8-GPU H800 servers were sold for 2 million yuan, or ~$280k. This is similar to SemiAnalysis’ 2023 report of $270k for an HGX H100 server, suggesting a similar price for the GPU. One article notes a range of “street” prices for H800s from $35k up to $69k, though the higher end is almost certainly not representative.

Overall, we use a range of 25k to 35k, identical to what we use for H100s.

H20

Based on several sources, unit prices were typically in the $10-15k range. Reuters reports they sold for $12k to $15k "per card" in February 2024, while TechPowerUp reported $12k in July 2024. Reuters also reported in February 2025 that "Analysts estimate Nvidia shipped approximately 1 million H20 units in 2024, generating over $12 billion in revenue for the company."

Given these reports, we use the following price assumptions:

Year | Low price | High price | Geometric mean | Notes
2024 | 10k | 15k | ~12.2k | Most reports in 2024 converge on 12k to 15k
2025 | 10k | 13k | ~11.4k |

Blackwell (B200 and B300)

Nvidia’s next-gen Blackwell chips were released in late 2024 and come in three variants: B100, B200, and B300 (aka Blackwell Ultra).

B200s and B300s are the flagship GPUs of Blackwell, with B300 being a mid-cycle upgrade. They are mostly deployed in the form of GB200 and GB300 systems, which contain one Nvidia Grace CPU per two Blackwell GPUs, with the most common configuration being NVL72 server racks containing 72 Blackwell GPUs.19 GB200/GB300 systems have a 10% higher FLOP/s rating per GPU than standalone B200/B300s.

Overall, NVL72 appears to be the overwhelmingly popular form factor for Blackwell.20 Given the predominance of NVL72, B100s are likely a low-volume product and we group them together with B200s.

How much do B200s and B300s cost? Let’s start with B200 and discuss how the B300 differs.

For estimating the price of (G)B200s, there are two possible approaches: (1) find the price of standalone B200 GPUs, or (2) estimate the price-per-GPU of NVL72 systems, and adjust downwards for non-GPU costs like networking.

For standalone B200s:

  • Jensen Huang also said in March 2024 that standalone B200s would be priced between 30k and 40k each.21
  • We estimated that B200s cost ~$6400 [5.7k to 7.3k] to produce based on a bill of materials breakdown. At a gross margin of 75-80%, this implies a price of $25k to $32k, or $22.8k to $36.5k if multiplying the ends of the production cost confidence interval.22 This is arguably a less credible method than using financial reporting, especially when we don’t know the margin for B200 GPUs specifically.

Meanwhile, the price of GB200 NVL72s is fairly well documented:

  • SemiAnalysis estimates that GB200 NVL72 servers cost between 3.2M and 3.5M, varying for hyperscalers and neoclouds, or 44k to 49k per GPU, implying an industry average somewhere in between.
  • Another source (HSBC) gives a somewhat lower estimate: $2.6M for NVL72 GB200s (36k per GPU). Other analyst estimates (from mid-2024, before GB200s were widely available) were at $3 million per system, or ~$41.7k per GPU.
    • HSBC analysts also estimated the price of GB200 superchips (two GPUs) at $60-70k, or $30-35k per GPU.

However, the effective price of GB200s (Nvidia Compute revenue per GPU shipped) will be lower than this:

  • NVL72s contain a significant amount of networking equipment, so part of the price accrues to Nvidia’s Networking segment. Nvidia has said that its Networking revenue is increasing rapidly, “driven by the growth of NVLink compute fabric for GB200 and GB300 systems”, among other factors. Networking was 18% of Nvidia’s total data center revenue that quarter, a rough upper bound on what portion of the cost of an NVL72 accrues to Nvidia Networking (Nvidia also sells cluster-level networking equipment outside the server racks themselves).
  • As with H100 servers, some share of GB200 servers is sold by server OEMs or assembled by hyperscalers, in which case the cost of components that aren’t proprietary to Nvidia don’t flow to Nvidia.

Overall, these NVL72 figures point to a range of 35k to 40k per GPU. We use a price range of 33k to 42k for GB200s (a somewhat unprincipled widening of 35k-40k, which is a narrow range but would be a reasonable choice for a middle-50% interval).

GB300 vs GB200

How might GB300s differ in price from the GB200?

  • HSBC estimates that GB300 NVL72s sell for $3M, or ~15% higher than their $2.6M figure for GB200 NVL72 (this ratio shouldn’t be taken very literally; while the institution is the same, the methods and timeframe could vary and there could be rounding).
  • Apple (not a hyperscale buyer of AI compute) reportedly paid $3.7M to 4M per GB300 rack, or 51-55k per GPU. This is 10-20% higher than SemiAnalysis’ reported “neocloud” pricing for GB200 NVL72.
  • The B300 differs from the B200 in its logic die and other components, but likely the greatest cost driver is the ~100GB increase in memory per GPU. If the HBM costs $15/GB and is priced at a 75% margin (see the H200 discussion above), this would increase the GPU's price by ~$6k vs the B200. This suggests a ~20% price increase.

A naive model of applying a 10-20% premium on our GB200 interval, with no correlation between GB300 premium and GB200 price, suggests a GB300 price range of 38k to 49k. The higher end is consistent with higher-end reports of GB300 NVL72 systems bought by Apple.
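
This naive model can be reproduced directly; the sketch below samples the GB200 price range and an independent 10-20% premium and reports the resulting percentiles, which land close to the 38k-49k range quoted above.

```python
import numpy as np

rng = np.random.default_rng(123)
N = 100_000

gb200_price = rng.uniform(33_000, 42_000, N)   # our GB200 price range per GPU, $
premium = rng.uniform(0.10, 0.20, N)           # assumed GB300 premium, uncorrelated with price

gb300_price = gb200_price * (1 + premium)
lo, med, hi = np.percentile(gb300_price, [5, 50, 95])
print(f"GB300 per-GPU price: ~${med:,.0f} (90% interval: ${lo:,.0f} - ${hi:,.0f})")
```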

Reconciling with direct Nvidia statements

Nvidia CEO Jensen Huang disclosed that through October 2025, Nvidia had shipped a total of 4M Hopper GPUs (H100s and H200s) and 3M Blackwell GPUs, excluding the Chinese market in both cases. (Technically, he said 6M Blackwell GPUs, but he was referring to the two individual dies within each GPU package.)23

How does this compare with our estimates? Our model puts overall Nvidia GPU shipments through October 2025 at:

  • 4.3 million Hoppers (H100s and H200s), excluding A800s and H800s
  • 2.54 million Blackwell GPUs (1.6M B200s and 940k B300s) through October 2025

These aren’t perfectly consistent, especially the Blackwell total which is ~15% lower than 3 million. Since the 3M Blackwell count is actually derived from 6M Blackwell dies divided by two, this isn’t explained by rounding, since ~2.5M Blackwell GPUs would be 5M dies.24

Some other potential explanations for the Blackwell difference:

  • Since our Hopper count is slightly high and our Blackwell count is low, we might be misallocating revenue between generations, though Nvidia's earnings commentary (compiled here) on the Hopper/Blackwell split in 2025 is fairly specific.
  • We use the prices and specs of GB200s and GB300s. But standalone B200 and B300 chips are slightly cheaper, and the somewhat less powerful B100 is cheaper still. So if these variants have a significant share of Blackwell, this would increase GPU counts (while also reducing compute power per GPU). Our GB200/GB300 split could also be off: the GB300 is more expensive.
  • Our Blackwell price estimates could be slightly too high in general.

Given that the conclusions are not far off, we won’t attempt to fully resolve this difference. We intend to continue modeling Nvidia chip shipments after October 2025, and not just Hopper and Blackwell, so our focus is on building the best comprehensive model of Nvidia sales.

Notes

  1. 1.4 million if the wafers they forecast would be used for Trainium3 were reallocated to Trainium2. Trainium3 was announced near the end of 2025. 

  2. This was estimated by digitizing the chart image. 

  3. If H2 shipments are 25-56% higher than H1 and the 2025 total is 2M, then H1 = 2M / (2 + growth) = 2M / 2.25 to 2M / 2.56 ≈ 780k to 890k. 

  4. “Amazon and Annapurna are heavily cost focused and drive their suppliers hard. Compared to Broadcom’s ASIC deals, the Trainium projects have much less profit pool available to the chip design partners Alchip and Marvell.” 

  5. But well behind Google/Broadcom overall. 

  6. Not including AMD’s gaming GPUs, which can be used for AI/ML but are not designed for it. 

  7. Recently, Google/Broadcom have also decided to start selling TPUs directly, e.g. to Anthropic and to other cloud companies. These deliveries will likely begin in 2026, so the TPU volumes we estimate here are presumably owned entirely by Google. 

  8. This was disclosed because Broadcom’s CEO, Hock Tan, is on Meta’s board. Not all of this revenue from Meta necessarily went to XPUs, in which case Google’s share of XPUs would be even higher. 

  9. The assumption that TPU spend with Broadcom scales proportionally with overall CapEx may not hold exactly if spending shifts toward data centers, networking, or other areas. 

  10. For example, they forecast in Q2 that overall gross margins would decline “primarily due to a higher mix of XPUs in AI revenue” implying XPUs have a lower margin than their overall AI business. Hock Tan also didn’t push back against a caller who said “custom [chips] is probably dilutive within semis”. 

  11. Quarterly references correspond to Broadcom’s fiscal quarters, which run ahead of calendar quarters. For example, Broadcom Q4 FY24 spans roughly August - October 2024, and Broadcom Q1 FY25 spans November 2024 - January 2025. 

  12. See estimates of chip lifespan in “Analysis” here. It is possible that some 2022-era chips have already been retired but we don’t explicitly model retirements/failures for simplicity. 

  13. For 2022, an explicit Networking breakout was not available, so we estimate/interpolate this as 75% Compute (networking share has generally gone down since 2020). 

  14. As a very minor point, Nvidia actually reports two sets of breakdowns. Nvidia has two “reportable segments”: “Compute & Networking” and “Graphics”. Nvidia also reports a separate breakdown across several “market platforms”, including Data Center. Within Data Center (which is what we use), revenue is further broken into “Compute” and “Networking”. For whatever reason, Data Center revenue (“Compute” plus “Networking”) is ~1% smaller than the “Compute & Networking” reportable segment. 

  15. Nvidia’s cloud reportedly runs mostly on rented capacity from Oracle and neoclouds. Nvidia also uses external cloud compute for at least some of the compute it uses for internal research and model training. This suggests that Nvidia owns very little of its own hardware (besides undelivered inventory, which we don’t include in our counts). 

  16. Networking equipment within a server would count as Networking revenue. 

  17. “we think if it was any higher than that, OEMs and ODMs would be screaming”, referring to third parties that sell Nvidia servers or assemble them as a service. 

  18. The fact that Nvidia received an export license to sell H200s, but not H100s, to China in late 2025 is perhaps a sign that the H100 was out of production or irrelevant by that time. 

  19. Alternatively, they can be deployed in 8-GPU B200 servers or standalone. 

  20. Nvidia said that by April 2025, hyperscalers were deploying almost 1,000 NVL72 racks per week. At $2 to 3 million in revenue each, a 13-week quarter at that rate would be 13 × 1,000 × $2-3 million = $26B to $39B in revenue, versus $41 billion in total data center revenue the following quarter. 

  21. Full quote (while holding up a physical B200 die): “This will cost, you know, 30, 40 thousand dollars” 

  22. Company-wide gross margin is about 75%, and could be higher for AI GPUs specifically. 

  23. The graph says 6M Blackwell GPUs, but the small text on the slide (“Blackwell and Rubin are 2 GPUs per chip”) indicates that this is a case of Jensen math: he is counting the two GPU dies in each package as separate GPUs. So 6M actually means 3M Blackwell GPUs as traditionally understood (under this accounting, a standalone B200 contains two GPUs, and an NVL72 system contains 144). This is not a reference to the fact that GB200/GB300 superchips contain two (or four) GPUs each. 

  24. And a CEO of a publicly traded company discussing sales figures would most likely round down, not up. 

Records

The AI Chip Sales dataset describes estimated AI accelerator shipments in “timeline” batches broken down by time period, chip designer, and chip type. The level of specificity in time period and chip type varies with the available information and data.

We provide a comprehensive guide to the database’s fields below, including example field values for reference.

If you would like to ask any questions about the database, or request a field that should be added, feel free to contact us at data@epoch.ai.

Timelines by chip

This is the primary data record, batching estimates of AI chip shipments by time period and chip type.

Each field is listed with its type, definition, an example value, and coverage.

  • Name (Text): Brief identifier. Example: FY23Q1 - A100. Coverage: 100% (102 out of 102 records).
  • Chip manufacturer (Text): The company that designs, produces, or sells the relevant chip types. Example: Nvidia. Coverage: 100% (102 out of 102 records).
  • Start date (Date): Start date of the time period. Example: 2022-01-31. Coverage: 100% (102 out of 102 records).
  • End date (Date): End date of the time period. Example: 2022-05-01. Coverage: 100% (102 out of 102 records).
  • Number of Units (Numeric): Estimated number of chips for the specified manufacturer or chip type shipped between Start date and End date. Example: 226248. Coverage: 100% (102 out of 102 records).
  • Compute estimate in H100e (median) (Numeric): Median estimate of the computing power of this batch of chips, normalized to the equivalent number of H100s in terms of peak 8-bit operations per second. Example: 70980. Coverage: 100% (102 out of 102 records).
  • Notes (Text): Relevant notes for context. Example: Date estimates were generated: 01-06-2026 11:57. Coverage: 46% (47 out of 102 records).
  • Chip TDP (Numeric): Per-chip thermal design power in watts. Example: 400.0. Coverage: 89% (91 out of 102 records).
  • Total TDP (W) (Numeric): Total power capacity of the chips (Number of Units × Chip TDP), excluding server and facility overheads. Example: 90499200. Coverage: 100% (102 out of 102 records).
  • Chip type (Link to record): Relevant AI chip type. Example: A100. Coverage: 100% (102 out of 102 records).
  • Cost estimate (USD) (Numeric): Approximate total purchase price of chips in nominal USD, excluding server, system, and facility costs. Example: 2714976000. Coverage: 100% (102 out of 102 records).
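As a minimal illustration of how these fields fit together (a sketch only: the filename below is illustrative, and it assumes the Timelines by chip CSV from the Downloads section has been saved locally; column names follow the field guide above):

```python
# Sketch of working with the Timelines by chip records using pandas.
# Assumes the CSV has been downloaded (see Downloads below) and saved as
# "timelines_by_chip.csv"; the filename is illustrative.
import pandas as pd

df = pd.read_csv("timelines_by_chip.csv")

# Total TDP (W) is derived as Number of Units x Chip TDP (where Chip TDP is present).
derived = df["Number of Units"] * df["Chip TDP"]
print((derived - df["Total TDP (W)"]).abs().max())  # should be ~0

# Implied average purchase price per chip, e.g. ~$12,000 for the A100 example row above.
print((df["Cost estimate (USD)"] / df["Number of Units"]).head())

# Estimated H100-equivalent compute shipped, summed by chip manufacturer.
print(df.groupby("Chip manufacturer")["Compute estimate in H100e (median)"].sum())
```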

Organization

Organizations involved in chip design

  • Name (Text): Organization name. Example: Nvidia. Coverage: 100% (5 out of 5 records).
  • Country (Text): Country of headquarters. Example: United States of America. Coverage: 100% (5 out of 5 records).
  • Type (Categorical, multiple select): Role in AI compute supply, e.g. Chip designer or Compute owner. Example: Chip designer. Coverage: 100% (5 out of 5 records).
  • Chip types (Link, multi → Chip types): Chip models designed by this organization, linked to the Chip types table. Example: H100, A800, H800, H20, A100, B200, B300, H200. Coverage: 100% (5 out of 5 records).

Chip types

Metadata table with specifications for each chip model. This is a consolidated set of key fields; most of the data is synced with the Epoch ML Hardware dataset using the linked record.

  • Name (Text): Chip model name. Example: TPU v3. Coverage: 100% (25 out of 25 records).
  • ML Hardware (linked) (Link, single → ML Hardware): Link to the relevant record in the ML Hardware dataset. Example: Google TPU v3. Coverage: 100% (25 out of 25 records).
  • Primarily for Chinese market (Checkbox): Indicates whether the chip was primarily sold in the mainland China market. Example: False. Coverage: 24% (6 out of 25 records).
  • Designer (Link, single → Organization): The (primary) designer of the chip type. Example: Google. Coverage: 100% (25 out of 25 records).
  • 8-bit OP/s (Numeric): Peak dense 8-bit operations per second. Example: 123000000000000. Coverage: 100% (25 out of 25 records).
  • Notes (Text): Notes for context. Example: [empty]. Coverage: 36% (9 out of 25 records).
  • Memory (Numeric): Memory capacity in bytes. Example: 32000000000.0. Coverage: 92% (23 out of 25 records).
  • H100e (Numeric): Computing power relative to the Nvidia H100 in terms of 8-bit OP/s. Example: 0.0621526023244062. Coverage: 100% (25 out of 25 records).
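To make the H100e normalization concrete, the example row above can be reproduced as follows. The H100 reference value of ~1.979e15 dense 8-bit OP/s is our assumption about the normalization constant; it is consistent with the example values shown but is not stated explicitly in this table:

```python
# H100e expresses a chip's peak dense 8-bit throughput relative to the H100.
# The H100 reference below (~1,979 dense FP8 TFLOPS for the SXM part) is our
# assumed normalization constant; it reproduces the example row above.
chip_ops = 123e12      # example "8-bit OP/s" value from the table above
h100_ops = 1.979e15    # assumed H100 dense 8-bit reference

print(chip_ops / h100_ops)   # ~0.0622, matching the example H100e value
```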

Downloads

Download the AI Chip Sales dataset as individual CSV files for specific data types, or as a complete package containing all datasets.

  • Timelines by chip (CSV), updated January 6, 2026
  • Organizations (CSV), updated January 6, 2026
  • Chip types (CSV), updated January 6, 2026
  • AI chip sales complete package (ZIP), updated January 6, 2026

Acknowledgements

This data was collected by Epoch AI’s employees and collaborators, including John Croxton, Josh You, Venkat Somala, and Yafah Edelman.

This documentation was written by Josh You and Venkat Somala.