What did US export controls mean for China’s AI capabilities?
Four days ago, the US government announced new rules around the export of powerful chips and semiconductor manufacturing equipment to China. This recent update is part of a broader trend of increasing export restrictions, following the announcement of the first export controls by the US Bureau of Industry and Security (BIS) in 2022 and the first revision in 2023.
The BIS documents in which these export controls are announced are quite long. The most recent update takes up two documents, totaling 210 pages, and previous updates weren’t any shorter. In this issue, I aim to cut through most of this noise and give you a broad picture of what impact the chip export controls have had on China’s ability to train and serve powerful AI models.
The high-level takeaway is that current export controls on China give the US a hardware lead of around four years when it comes to training frontier models, but the US has essentially no lead when it comes to serving those models to users. This difference arises because US export controls have thus far targeted the arithmetic performance of chips, which matters more for training than for inference. Memory bandwidth has not been targeted at all, while network bandwidth was targeted only by an ineffective 2022 rule that was abandoned in the 2023 update.
The October 2022 export controls
With the high-level conclusion out of the way, let’s go into some more detail to understand exactly how the chip export controls have changed over time. Under the rules introduced on October 7 2022, for the export of a chip to China to be prohibited, the chip had to meet two criteria at once:
1. It had to have a high enough arithmetic performance. The BIS has their own metric for measuring this, but this bar was set just below the A100, which was NVIDIA’s state-of-the-art machine learning GPU at the time.
2. It had to have a high enough network bandwidth. The threshold set by the BIS was 600 GB/s, which happens to be exactly equal to the NVLink bandwidth of the A100.
Overall, the rules appear to have been set precisely to make the A100 into an effective threshold: any chip weaker than the A100 could be freely exported, while any chip equal to or stronger than the A100 could not. With the new generation H100 poised to become the dominant GPU in 2023, and China not able to produce a domestic equivalent, we might think these rules would have given the US a significant lead in hardware performance. However, this didn’t happen, because together with the new generation H100 NVIDIA also released the H800 specifically for export to the Chinese market.
The H800 had exactly the same arithmetic performance as the H100, which is around three times that of the A100, so normally rule (1) would have banned its export to China. NVIDIA got around this restriction by lowering the H800’s NVLink bandwidth to 400 GB/s, compared to the 900 GB/s supported by the H100, ensuring the H800 remained below the bar set by (2).
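To make the two-criteria logic concrete, here is a minimal sketch of the 2022 rule as a conjunction. Arithmetic performance is expressed relative to the A100 rather than in the BIS's own metric, and the arithmetic threshold of 0.95 is a hypothetical stand-in for "just below the A100"; the NVLink figures are the ones quoted above.

```python
from dataclasses import dataclass

# Illustrative stand-ins, not official BIS figures.
ARITHMETIC_THRESHOLD = 0.95   # hypothetical: "just below the A100", with A100 = 1.0
NETWORK_THRESHOLD_GB_S = 600  # threshold quoted in the text


@dataclass(frozen=True)
class Chip:
    name: str
    relative_arithmetic: float  # arithmetic performance relative to the A100
    nvlink_gb_s: float          # NVLink bandwidth in GB/s


def export_prohibited_2022(chip: Chip) -> bool:
    """Both criteria must hold at once for the export to be prohibited."""
    return (
        chip.relative_arithmetic >= ARITHMETIC_THRESHOLD
        and chip.nvlink_gb_s >= NETWORK_THRESHOLD_GB_S
    )


CHIPS = [
    Chip("A100", 1.0, 600),
    Chip("H100", 3.0, 900),
    Chip("H800", 3.0, 400),  # same arithmetic as the H100, reduced NVLink
]

for chip in CHIPS:
    status = "prohibited" if export_prohibited_2022(chip) else "exportable"
    print(f"{chip.name}: {status}")
```

Because the rule is an AND of the two conditions, lowering either variable below its threshold is enough to escape it, which is the route the H800 took.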
This might still seem like a significant cost: the H800 has less than half of the H100’s network bandwidth, and surely NVIDIA wouldn’t be packing so much NVLink bandwidth into each device if this bandwidth was useless? However, detailed calculations show that 400 GB/s of NVLink bandwidth is still enough to orchestrate a frontier training run without running into communication bottlenecks, mostly because we can hide network communication behind useful work done by the GPUs. In this setting training efficiency is largely about FLOP per second per dollar, and on this metric the H800 was about as good as the H100.
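The idea of hiding communication behind compute can be illustrated with a toy model. This is only a sketch: the per-step compute time and communication volume below are hypothetical numbers chosen for illustration, not measurements from any real training run, and real overlap is never perfect.

```python
def step_time_ms(compute_ms: float, comm_gb: float, nvlink_gb_s: float,
                 overlapped: bool = True) -> float:
    """Toy model of one training step.

    With perfect overlap the step takes as long as the slower of compute and
    communication; without overlap the two costs add up.
    """
    comm_ms = comm_gb / nvlink_gb_s * 1000.0
    if overlapped:
        return max(compute_ms, comm_ms)
    return compute_ms + comm_ms


COMPUTE_MS = 10.0  # hypothetical compute time per step
COMM_GB = 3.0      # hypothetical data sent over NVLink per step

for name, bandwidth in [("H100", 900), ("H800", 400)]:
    hidden = step_time_ms(COMPUTE_MS, COMM_GB, bandwidth, overlapped=True)
    exposed = step_time_ms(COMPUTE_MS, COMM_GB, bandwidth, overlapped=False)
    print(f"{name}: overlapped {hidden:.2f} ms, not overlapped {exposed:.2f} ms")
```

In this toy setting the lower bandwidth only matters once communication time exceeds compute time, so as long as it stays below that point the two GPUs finish each step at the same time.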
For inference, the picture is more mixed. In short-context inference, it’s harder to hide communication behind arithmetic because the batch sizes are smaller and this causes latency to become a bigger problem for various overlapping schemes. Because of this, when it comes to price per output token for a fixed model and at a fixed token generation speed, the H800 stands somewhere between the A100 and the H100, giving the US a lead of around 18 months in price-performance.
In contrast, in long contexts, the cost of inference is dominated by having to read the KV cache from memory for every generated token, or perhaps every few generated tokens if we use speculative sampling. Because the KV cache adds little network communication cost, long-context inference performance is dominated almost entirely by memory bandwidth. The H800 and the H100 have identical memory bandwidth, so they also have very similar long-context inference performance, despite the network bandwidth difference.
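A rough way to see why memory bandwidth dominates here is to compute a lower bound on the time per generated token from the KV cache size alone. The model shape, context length, and memory bandwidth below are all hypothetical values picked for illustration; the only point taken from the text is that the bound depends on memory bandwidth and not on NVLink bandwidth.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_tokens: int, bytes_per_value: int = 2) -> int:
    """Size of the KV cache for one sequence (the factor 2 covers keys and values)."""
    return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_value


def min_ms_per_token(cache_bytes: int, mem_bandwidth_gb_s: float) -> float:
    """Lower bound on per-token latency if the whole cache is read once per token."""
    return cache_bytes / (mem_bandwidth_gb_s * 1e9) * 1000.0


# Hypothetical model and context, for illustration only.
cache = kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, context_tokens=100_000)

# Hypothetical memory bandwidth, assumed identical for both GPUs as in the text.
MEM_BANDWIDTH_GB_S = 3350.0

print(f"KV cache: {cache / 1e9:.1f} GB")
print(f"Lower bound per token: {min_ms_per_token(cache, MEM_BANDWIDTH_GB_S):.2f} ms")
```

Network bandwidth does not appear anywhere in this bound, which is why two chips with the same memory bandwidth come out the same regardless of their NVLink specs.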
Thus, the 2022 export controls created a situation in which the US had an 18-month lead over China in short-context inference, while in frontier model training and in long-context inference there was rough parity in hardware capabilities.
The October 2023 update
On October 17 2023, the BIS issued revised rules, tightening the export restrictions they had announced the previous year. There were two major changes that are relevant for our purposes:
1. Network bandwidth was removed entirely as a variable of interest. The new export control regime for chips was only about arithmetic performance.
2. The old limit at the A100 level for arithmetic power was nominally maintained, but the BIS added a licensing requirement for the export of chips that are at least half as powerful as the A100. In practice, this amounted to halving the ceiling on arithmetic performance for chips that can be exported to China.
Rule change (1) eliminated the H800 as a GPU that could be exported to China, because its arithmetic power was very high and the export controls no longer cared about its low network bandwidth. As a result, NVIDIA had to develop a new GPU for export to the Chinese market.
The GPU that resulted from this was the H20. This is a slightly scaled down version of the H200 when it comes to all variables aside from FLOP per second, but the limit imposed by rule change (2) was so tight that NVIDIA was only able to match around 15% of the H200’s arithmetic power. Because the H20 had an average sales price around half of the H200, this immediately gave the US around a 3x lead in training price-performance.
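The arithmetic behind the 3x figure can be written out explicitly. This is a back-of-the-envelope sketch using the rough ratios quoted above; the assumption that the H200 has about three times the A100's arithmetic performance (the same as the H100) is mine and is not stated in this paragraph.

```python
# Rough ratios from the text.
H20_ARITHMETIC_FRACTION_OF_H200 = 0.15  # H20 arithmetic power relative to the H200
H20_PRICE_FRACTION_OF_H200 = 0.5        # H20 average sales price relative to the H200
CEILING_VS_A100 = 0.5                   # effective 2023 ceiling: half of the A100

# Assumption: the H200 matches the H100's arithmetic, roughly 3x the A100.
H200_ARITHMETIC_VS_A100 = 3.0

# Where the H20 lands relative to the effective ceiling.
h20_arithmetic_vs_a100 = H20_ARITHMETIC_FRACTION_OF_H200 * H200_ARITHMETIC_VS_A100
h20_under_ceiling = h20_arithmetic_vs_a100 < CEILING_VS_A100

# Training price-performance: arithmetic per dollar, relative to the H200.
h20_price_perf_vs_h200 = H20_ARITHMETIC_FRACTION_OF_H200 / H20_PRICE_FRACTION_OF_H200
us_lead_factor = 1.0 / h20_price_perf_vs_h200

print(f"H20 arithmetic vs A100: {h20_arithmetic_vs_a100:.2f} (under ceiling: {h20_under_ceiling})")
print(f"H20 price-performance vs H200: {h20_price_perf_vs_h200:.2f}")
print(f"Implied US lead in training price-performance: {us_lead_factor:.1f}x")
```

Under these ratios the H20 sits just under the halved ceiling and delivers roughly 30% of the H200's arithmetic per dollar, which is where the "around 3x" lead comes from.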
China’s best indigenous AI accelerator, the Ascend 910B produced by Huawei, is reported to have a similar performance to the A100. Just like the H20, the A100 is also around 3x worse than the H100 and H200 when it comes to training price-performance. So if we treat this as a lower bound on Chinese hardware capabilities, then the chip export controls were about as effective as they could have been looking only at hardware performance during training. Of course, in practice China’s limited capacity for scaling up production and its reliance on high-bandwidth memory and manufacturing equipment produced abroad mean the US has more levers it could pull if it wants to damage Chinese capabilities further.
The picture is very different when it comes to inference performance. The H20 has a very high memory and network bandwidth for its amount of arithmetic power, because these variables were not subject to export limitations and were therefore largely copied over from the much bigger H200 GPU. As I explained earlier, these are the decisive variables for determining inference performance. In practice, the slightly worse specs of the H20 are offset by the higher price of the H200, and the two GPUs yield similar price-performance in both short-context and long-context inference.
Piecing all this together, the 2023 update to the export controls created a new situation in which the US acquired a 3x cost advantage, or equivalently a four-year lead, over China in training price-performance. However, the short-context inference advantage acquired after the 2022 controls was lost, as NVIDIA no longer had any reason to limit network bandwidth in their chips. The situation on the inference side returned to the pre-export control status quo in which China and the US were at parity.
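Treating a 3x cost advantage as equivalent to a four-year lead implicitly assumes some rate of improvement in price-performance. The sketch below backs out that rate; the resulting doubling time is derived from the two figures above and is not a number stated in this post.

```python
import math

COST_ADVANTAGE = 3.0  # from the text
LEAD_YEARS = 4.0      # from the text

# Number of price-performance doublings contained in a 3x advantage.
doublings = math.log2(COST_ADVANTAGE)

# Doubling time that makes "3x" and "four years" equivalent (derived, not sourced).
implied_doubling_time_years = LEAD_YEARS / doublings


def lead_in_years(cost_advantage: float, doubling_time_years: float) -> float:
    """Convert a cost advantage into a time lead for a given doubling time."""
    return math.log2(cost_advantage) * doubling_time_years


print(f"Doublings in a 3x advantage: {doublings:.2f}")
print(f"Implied doubling time: {implied_doubling_time_years:.2f} years")
```

The same helper can be used to see how sensitive the "years of lead" framing is to the assumed pace of hardware progress: a faster doubling time turns the same 3x advantage into a shorter lead.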
The December 2024 update
The BIS issued the most recent update to the export controls on December 2 2024. This update didn’t affect chip exports, and focused more on expanding the list of targeted Chinese entities and preventing the export of high-bandwidth memory (HBM) to China. The new rule sets a limit of 2 GB/s/mm^2 on the bandwidth density of exported memory units.
This rule by itself would have prevented the export of all of the chips we’ve discussed in this post so far, as typical bandwidth densities for the HBM units of these GPUs are probably around 5 times larger than the 2 GB/s/mm^2 threshold. However, the control explicitly states that this rule only applies to HBM units sold by themselves, not to chips which come with co-packaged HBM. The upshot is that, despite speculation that low-arithmetic, high-bandwidth GPUs such as the H20 were going to be next on the ban list, it looks like NVIDIA will be able to continue exporting them to China for the next year.
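As a sketch of how the HBM rule and its carve-out interact, the check below computes a bandwidth density and applies the threshold only to memory sold on its own. The stack bandwidth and package area are hypothetical values chosen to land at roughly 5 times the threshold, and treating the limit as a strict "greater than" comparison is an assumption.

```python
HBM_DENSITY_LIMIT = 2.0  # GB/s/mm^2, from the text


def bandwidth_density(bandwidth_gb_s: float, package_area_mm2: float) -> float:
    """Memory bandwidth per unit of package area, in GB/s/mm^2."""
    return bandwidth_gb_s / package_area_mm2


def hbm_export_controlled(density: float, co_packaged_with_chip: bool) -> bool:
    """The density limit applies only to HBM sold by itself.

    Assumption: densities strictly above the limit are controlled.
    """
    return density > HBM_DENSITY_LIMIT and not co_packaged_with_chip


# Hypothetical HBM stack: 1000 GB/s over a 100 mm^2 package.
density = bandwidth_density(bandwidth_gb_s=1000.0, package_area_mm2=100.0)

print(f"Density: {density:.1f} GB/s/mm^2")
print("Standalone HBM controlled:", hbm_export_controlled(density, co_packaged_with_chip=False))
print("Co-packaged HBM controlled:", hbm_export_controlled(density, co_packaged_with_chip=True))
```

The same memory is treated differently depending on how it ships, which is why a GPU such as the H20 is unaffected even though its HBM would exceed the limit if sold separately.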
The updated rules might have a more significant impact on China’s ability to manufacture chips domestically, as China relies on external suppliers such as SK Hynix and Samsung for HBM. However, as H20s match the price-performance of China’s best domestic chips and their export to China remains legal, the December 2024 update hasn’t changed anything when it comes to the balance of current hardware capabilities between the US and China.
What’s going to happen next?
If we assume that the US doesn’t relax existing export controls in the future, then as hardware progress continues the gap in training price-performance between the best US chips and the best chips that can be exported to China will continue to widen. At some point, we expect the gap to be so large that China will get a better deal by manufacturing chips domestically compared to buying them from the US. The important question is when this will happen.
If the US only relied on export controls on chips, then the current four-year gap seems to be the best the US can do because the Ascend 910B is a rough equivalent of the A100. However, the US has more levers to pull when it comes to preventing the export of key components and manufacturing equipment to China, and it’s unclear if China can still produce many Ascend 910Bs without these imports.
I would guess the training price-performance gap between the US and China will remain roughly the same for the next few years, because Chinese efforts to catch up to the technology frontier will be offset by increasing US restrictions on components and equipment that China can import.
When it comes to inference, the future seems harder to predict because it’s unclear if the next revision of the export control rules will target chips with a lot of memory bandwidth or not. There is an explicit clause in the most recent update excluding these chips from being affected by controls on HBM exports, which suggests the BIS considered doing this but decided against it (as opposed to HBM concerns not being on their radar at all). I don’t have a good model of the political process that produced this result, so I can’t make a confident prediction about how we should expect it to change. There’s still a decent chance we’re going to see chips like the H20 on the chopping block next year, though.