Data insights

Epoch AI’s data insights break down complex AI trends into focused, digestible snapshots. Explore topics like training compute, hardware advancements, and AI training costs in a clear and accessible format.

Biology AI models are scaling 2-4x per year after rapid growth from 2019 to 2021

The training compute of top AI models trained on biological data grew rapidly in 2019-2021, but has scaled at a more sustainable pace since then (2-4x per year). Training compute for these models increased by 1,000x-10,000x between 2018 and 2021, but has increased only 10x-100x since 2021. We consider two categories of biology AI models. General-purpose protein language models (PLMs) like Evo 2 learn to predict biological sequences and can generate embeddings that are useful across many tasks. Specialized models like AlphaFold are optimized for specific predictions such as protein structure or mutation effects. PLMs are trained using about 10x more compute than specialized models, but still lag about 100x behind today’s frontier language models, which are now trained with over 1e26 FLOP.
February 21, 2025

The stock of computing power from NVIDIA chips is doubling every 10 months

Total available computing power from NVIDIA chips has grown by approximately 2.3x per year since 2019, enabling the training of ever-larger models. The Hopper generation of NVIDIA AI chips currently accounts for 77% of the total computing power across all of NVIDIA’s AI hardware. At this pace of growth, older generations tend to contribute less than half of cumulative compute around 4 years after their introduction. Note that this analysis does not include TPUs or other specialized AI accelerators, for which less data is available; TPUs may provide comparable total computing power to NVIDIA chips.
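For readers who want to check the conversion between an annual growth factor and a doubling time, the relationship is doubling time (in months) = 12 / log2(annual growth factor). A minimal sketch; the helper function is ours, and only the 2.3x figure comes from the text above:

```python
import math

def doubling_time_months(annual_growth: float) -> float:
    """Convert an annual growth factor into a doubling time in months."""
    return 12 / math.log2(annual_growth)

# 2.3x per year corresponds to a doubling time of roughly 10 months.
print(round(doubling_time_months(2.3), 1))  # ~10.0
```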
February 13, 2025

Over 20 AI models have been trained at the scale of GPT-4

The largest AI models today are trained with over 10²⁵ floating-point operations (FLOP) of compute. The first model trained at this scale was GPT-4, released in March 2023. Since then, we have identified over {{site.data.insights.models-over-1e25.count | pretty_number}} publicly announced AI models from {{site.data.insights.models-over-1e25.organization-count | pretty_number}} different AI developers that we believe to be over the 10²⁵ FLOP training compute threshold. Training a model of this scale costs tens of millions of dollars with current hardware. Despite the high cost, we expect a proliferation of such models: we saw an average of roughly two models over this threshold announced every month during 2024. Models trained at this scale will be subject to additional requirements under the EU AI Act, which come into force in August 2025.
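As a rough illustration of where the “tens of millions of dollars” figure comes from, the sketch below divides 10²⁵ FLOP by an assumed effective throughput and multiplies by an assumed price per GPU-hour. All inputs are illustrative assumptions, not our cost estimates:

```python
# Back-of-the-envelope cost of a 1e25 FLOP training run (illustrative assumptions only).
TRAINING_FLOP = 1e25
PEAK_FLOP_PER_SEC = 1e15   # assumed H100-class peak (~989 dense BF16 TFLOP/s, rounded up)
UTILIZATION = 0.4          # assumed fraction of peak achieved in practice
PRICE_PER_GPU_HOUR = 2.5   # assumed amortized or rental cost in USD

gpu_hours = TRAINING_FLOP / (PEAK_FLOP_PER_SEC * UTILIZATION) / 3600
cost_usd = gpu_hours * PRICE_PER_GPU_HOUR
print(f"{gpu_hours:,.0f} GPU-hours, ~${cost_usd / 1e6:.0f}M")  # ~6.9M GPU-hours, ~$17M
```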
January 30, 2025

Chinese language models have scaled up more slowly than their global counterparts

The training compute of top Chinese language models grew rapidly after 2019, catching up to the top models globally by late 2021. Since then, China’s rate of scaling has fallen behind. The top 10 Chinese language models by training compute have scaled up by about 3x per year since late 2021, slower than the 5x per year trend maintained elsewhere since 2018. At China’s current rate, it would take about two years to reach the level of today’s top global models. The difference looks less dramatic when comparing just the largest known models from each region. Grok-2, the largest known US model, used twice the training compute of China’s largest known model, Doubao-pro, which was released 3 months later. Given China’s current scaling rate of 3x per year, it would take about 8 months to scale from Doubao-pro to Grok-2’s level of compute.
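The catch-up times follow directly from the stated growth rates: closing a compute gap of g while scaling at r per year takes log(g)/log(r) years. A minimal check using only the figures quoted above (the helper function is ours, not part of the analysis):

```python
import math

def years_to_close(gap: float, annual_growth: float) -> float:
    """Years needed to close a compute gap of `gap`x when scaling at `annual_growth`x per year."""
    return math.log(gap) / math.log(annual_growth)

# Doubao-pro to Grok-2: a 2x compute gap closed at 3x per year.
print(round(years_to_close(2, 3) * 12, 1))  # ~7.6 months, i.e. about 8
# Conversely, a two-year catch-up at 3x per year corresponds to roughly a 3**2 = 9x gap.
```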
January 22, 2025

Frontier open models may surpass 10²⁶ FLOP of training compute before 2026

The Biden Administration’s diffusion framework places restrictions on closed-weight models if their training compute surpasses either 10²⁶ FLOP or the training compute of the largest open model. Historical trends suggest that the largest open model will surpass 10²⁶ FLOP by November 2025, and grow at close to 5x per year thereafter. There is an additional reason to expect a large open model before 2026: Mark Zuckerberg indicated in October 2024 that Llama 4 models were already being trained on a cluster “larger than 100k H100s”. The same statement strongly implies that these models will continue to be released with open weights. Models trained at this scale are very likely to surpass 10²⁶ FLOP, and appear to be planned for release in 2025.
January 15, 2025

Training compute growth is driven by larger clusters, longer training, and better hardware

Since 2018, the most significant driver of compute scaling across frontier models has likely been an increase in the quantity of hardware used in training clusters. Also important have been a shift towards longer training runs and increases in hardware performance. These trends are closely linked to a massive surge in investment: AI development budgets have been expanding by around 2-3x per year, enabling vast training and inference clusters and ever-larger models.
January 08, 2025

US models currently outperform non-US models

The best US models have consistently achieved higher accuracies than the best non-US models on GPQA Diamond and MATH Level 5. For example, on GPQA Diamond the best-performing model is OpenAI’s o1, while on MATH Level 5 the leading model is o3-mini. However, with the release of DeepSeek-R1 in January 2025, the gap between US and non-US models has narrowed substantially: DeepSeek-R1 trails o3-mini by only 2 percentage points on MATH Level 5, and scores only 4 percentage points lower than o1 on GPQA Diamond.
November 27, 2024

Models with downloadable weights currently lag behind the top-performing models

The models with the highest GPQA Diamond and MATH Level 5 accuracies tend not to have downloadable weights. For example, on GPQA Diamond, OpenAI’s o1 outperformed the best-performing downloadable model at the time, Phi-4, by 20 percentage points. Similarly, on MATH Level 5, Phi-4 lagged behind o1 by 29 percentage points. We further analyzed how far behind open models are in this article, where we found that the best-performing open LLMs lagged the best-performing closed LLMs by between 6 months on GPQA Diamond and 20 months on MMLU. However, the release of DeepSeek-R1 in January 2025 showed that the performance gap between open-weight and closed-weight models has narrowed significantly. For example, on MATH Level 5, DeepSeek-R1 lags behind the current best-performing model, o3-mini, by only 2 percentage points.
November 27, 2024

Accuracy increases with estimated training compute

GPQA Diamond and MATH Level 5 accuracies increase with estimated training compute. For GPQA Diamond, below 10²⁴ FLOP most models struggle to rise above random-chance performance, or even perform worse than random chance because they fail to follow the question formatting. Past 10²⁴ FLOP, performance increases by around 12 percentage points with every 10x increase in compute. On MATH Level 5, models with high compute estimates also tend to have higher scores: performance increases by around 17 percentage points with every 10x increase in pretraining compute. However, the trend is much noisier than for GPQA Diamond. On both benchmarks, more recent models such as DeepSeek-R1, Phi-4, or Mistral Small 3 outperform older models trained with the same amount of compute, highlighting the role of algorithmic progress. Finally, note that these trends exclude many of the top-performing models, such as OpenAI’s o1, for which we lack compute estimates.
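The quoted slopes describe a log-linear relationship of the form accuracy ≈ a + b · log10(compute). A minimal sketch of how such a slope can be fit, using hypothetical data points for illustration rather than our actual estimates:

```python
import numpy as np

# Hypothetical (training compute in FLOP, accuracy in %) pairs, for illustration only.
compute = np.array([1e24, 3e24, 1e25, 3e25, 1e26])
accuracy = np.array([30.0, 36.0, 42.0, 48.0, 54.0])

# Fit accuracy as a linear function of log10(compute); the slope is the
# percentage-point gain per 10x increase in compute.
slope, intercept = np.polyfit(np.log10(compute), accuracy, 1)
print(f"{slope:.1f} percentage points per 10x of compute")  # ~12.0 with these numbers
```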
November 27, 2024

AI training cluster sizes have increased by more than 20x since 2016

From Google’s NASv3 RL network, trained on 800 GPUs in 2016, to Meta’s Llama 3.1 405B, using 16,384 H100 GPUs in 2024, the number of processors used increased by a factor of over 20. Gemini Ultra was trained with an even larger number of TPUs, but precise details were not reported.
October 23, 2024

Performance per dollar improves around 30% each year

Performance per dollar has improved rapidly, and hardware at any given precision and fixed performance level becomes 30% cheaper each year. At the same time, manufacturers continue to introduce more powerful and expensive hardware.
October 23, 2024

The computational performance of machine learning hardware has doubled every 1.9 years

Measured in 16-bit floating-point operations, ML hardware performance has increased at a rate of {{site.data.pcd.ml-hardware.growth.fp16-performance-growth | format_growth: '%/year'}}, doubling every {{site.data.pcd.ml-hardware.growth.fp16-performance-growth | format_growth: 'doubling time'}}. A similar trend exists in 32-bit performance. Optimized ML number formats and tensor cores provided additional improvements. The improvement was driven by increasing transistor counts and other semiconductor manufacturing advances, as well as specialized design for AI workloads. This progress lowered cost per FLOP, increased energy efficiency, and enabled large-scale AI training.
October 23, 2024

The NVIDIA A100 has been the most popular hardware for training notable machine learning models

The NVIDIA A100 has been the most prolific hardware for highly-cited or state-of-the-art AI models identified in Epoch’s dataset, used for {{site.data.pcd.ml-hardware.counts.total-models-a100}} notable ML models since its release. In second place is the NVIDIA V100, used to train {{site.data.pcd.ml-hardware.counts.total-models-v100}} notable models, followed by Google’s TPU v3, used for {{site.data.pcd.ml-hardware.counts.total-models-tpu-v3}}. However, we estimate that the NVIDIA H100 had outsold the A100 by late 2023, so it may become the most popular GPU for training models in the near future.
October 23, 2024

Performance improves 12x when switching from FP32 to tensor-INT8

GPUs are typically faster when using tensor cores and number formats optimized for AI computing. Compared to non-tensor FP32, using TF32, tensor-FP16, and tensor-INT8 provides around {{site.data.pcd.ml-hardware.growth.fp32-to-tf32-improvement}}x, {{site.data.pcd.ml-hardware.growth.fp32-to-tf16-improvement}}x, and {{site.data.pcd.ml-hardware.growth.fp32-to-int8-improvement}}x greater performance on average in the aggregate performance trends. Some chips achieve even larger speedups; for example, the H100 is 59x faster in INT8 than in FP32. These improvements account for about half of the overall performance trend improvement since their introduction. Models trained with lower-precision formats have become common, especially tensor-FP16, as developers take advantage of this boost in performance.
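The 59x figure for the H100 is simply the ratio of two peak-throughput numbers. A minimal check, assuming the commonly cited H100 SXM spec-sheet values (67 TFLOP/s for non-tensor FP32 and 3,958 TOP/s for INT8 tensor cores with sparsity); these specific values are our assumption here, not part of the dataset:

```python
# Peak-throughput ratio for the NVIDIA H100 SXM, using assumed spec-sheet figures.
fp32_tflops = 67     # non-tensor FP32, TFLOP/s (assumed)
int8_tops = 3958     # INT8 tensor cores with sparsity, TOP/s (assumed)

print(f"{int8_tops / fp32_tflops:.0f}x faster in INT8 than FP32")  # ~59x
```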
October 23, 2024

Leading ML hardware becomes 40% more energy-efficient each year

Historically, the energy efficiency of leading GPUs and TPUs has doubled every {{site.data.pcd.ml-hardware.growth.energy-efficiency | format_growth: 'doubling time'}}. In tensor-FP16 format, the most efficient accelerators are Meta’s MTIA, at up to 2.1 × 10¹² FLOP/s per watt, and the NVIDIA H100, at up to 1.4 × 10¹² FLOP/s per watt. The upcoming Blackwell series of processors may be even more efficient, depending on their power consumption.
October 23, 2024

Leading AI companies have hundreds of thousands of cutting-edge AI chips

The world's leading tech companies—Google, Microsoft, Meta, and Amazon—own AI computing power equivalent to hundreds of thousands of NVIDIA H100s. This compute is used both for their in-house AI development and for cloud customers, including many top AI labs such as OpenAI and Anthropic. Google may have access to the equivalent of over one million H100s, mostly from their TPUs. Microsoft likely has the single largest stock of NVIDIA accelerators, with around 500k H100-equivalents. A large share of AI computing power is collectively held by groups other than these four, including other cloud companies such as Oracle and CoreWeave, compute users such as Tesla and xAI, and national governments. We highlight Google, Microsoft, Meta, and Amazon as they are likely to have the most compute, and there is little public data for others.
October 09, 2024

The power required to train frontier AI models is doubling annually

Training frontier models requires a large and growing amount of power for GPUs, servers, cooling, and other equipment. This is driven by an increase in GPU count; power draw per GPU is also growing, but at only a few percent per year. Training compute has grown even faster, at around 4x per year. However, hardware efficiency (a 12x improvement in the last ten years), the adoption of lower-precision formats (an 8x improvement), and longer training runs (a 4x increase) account for a roughly 2x/year decrease in power requirements relative to training compute.
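As a rough consistency check, annualizing the three stated factors multiplies out to about 2x per year, which matches the gap between ~4x/year compute growth and ~2x/year power growth. Treating all three factors as accumulating over the same ten-year window is our simplifying assumption:

```python
# Annualize the stated efficiency factors (assuming all apply over ~10 years)
# and compare with the gap between ~4x/year compute growth and ~2x/year power growth.
hardware_eff, lower_precision, longer_runs = 12, 8, 4
years = 10

annual_offset = (hardware_eff * lower_precision * longer_runs) ** (1 / years)
print(f"~{annual_offset:.1f}x/year offset vs. a ~{4 / 2:.0f}x/year gap")  # ~1.8x vs ~2x
```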
September 19, 2024

The length of time spent training notable models is growing

Since 2010, the length of training runs has increased by {{site.data.pcd.growth.training-length | format_growth: 'x/year'}} among notable models, excluding those that are fine-tuned from base models. A continuation of this trend would ease hardware constraints by increasing training compute without requiring more chips or power. However, longer training times face a tradeoff: for very long runs, waiting for future improvements to algorithms and hardware might outweigh the benefits of extended training.
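The mechanism is visible in the basic identity: training compute ≈ number of chips × peak FLOP/s per chip × utilization × training time. A minimal sketch with purely illustrative numbers (none of these are estimates from our data):

```python
# Training compute as a function of cluster size and run length (illustrative numbers only).
def training_flop(num_chips: int, peak_flop_per_sec: float, utilization: float, days: float) -> float:
    """Approximate total training compute in FLOP."""
    return num_chips * peak_flop_per_sec * utilization * days * 86_400

# Doubling the run length doubles training compute without adding chips or power capacity.
base = training_flop(10_000, 1e15, 0.4, days=45)
longer = training_flop(10_000, 1e15, 0.4, days=90)
print(f"{base:.1e} FLOP -> {longer:.1e} FLOP")  # ~1.6e25 -> ~3.1e25
```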
August 16, 2024

Language models compose the large majority of large-scale AI models

Out of {{site.data.pcd.large-scale.count}} large-scale models with known compute, {{site.data.pcd.large-scale.counts.language}} are language models, of which {{site.data.pcd.large-scale.counts.vision | pretty_number}} are vision-language models such as GPT-4. The first models trained with over 10²³ FLOP were for game-playing, but language has dominated since 2021. Other large-scale models have been developed for image and video generation, biological sequence modeling, and robotics.
June 19, 2024

Most large-scale models are developed by US companies

Over half of known large-scale models were developed in the United States. A quarter were developed in China, and this proportion has been growing in recent years. The European Union trails them with {{site.data.pcd.large-scale.counts.eu}} models, while the United Kingdom has developed {{site.data.pcd.large-scale.counts.uk}}. Graph axes start at 2020, as the majority of large-scale models were developed after this.
June 19, 2024

The pace of large-scale model releases is accelerating

In 2017, only {{site.data.pcd.large-scale.counts.confirmed-models-up-to-2017 | pretty_number}} models exceeded 10²³ FLOP in training compute. By 2020, this grew to {{site.data.pcd.large-scale.counts.confirmed-models-up-to-2020 | pretty_number}} models; by 2022, there were {{site.data.pcd.large-scale.counts.confirmed-models-up-to-2022 | pretty_number}}; and by 2024, there were {{site.data.pcd.large-scale.counts.confirmed-models-up-to-2024 | pretty_number}} models known to exceed 10²³ FLOP in our database, and {{site.data.pcd.large-scale.counts.unconfirmed-models-up-to-2024 | pretty_number}} more with unconfirmed training compute that likely exceed 10²³ FLOP. As AI investment increases and training hardware becomes more cost-effective, models at this scale come within reach of more and more developers.
June 19, 2024

Almost half of large-scale models have published, downloadable weights

{{site.data.pcd.large-scale.counts.confirmed-downloadable-weights}} large-scale models with known compute have downloadable weights. Most of these have a training compute between 10²³ and 10²⁴ FLOP, which is less compute than the largest proprietary models. The developers that have released the largest downloadable models today are Meta and the Technology Innovation Institute.
June 19, 2024

The size of datasets used to train language models doubles approximately every eight months

Across all domains of ML, models are using more and more training data. In language modeling, datasets are growing at a rate of {{site.data.pcd.growth.language-dataset | format_growth: 'x/year'}}. The largest models currently use datasets with tens of trillions of words. The largest public datasets are about ten times larger than this; for example, Common Crawl contains hundreds of trillions of words before filtering.
June 19, 2024

Training compute costs are doubling every nine months for the largest AI models

The cost of training large-scale ML models is growing at a rate of [{{site.data.pcd.growth.large-scale-training-compute-cost | format_growth: 'x/year'}}]({% link _blog/2024-06-03-how-much-does-it-cost-to-train-frontier-ai-models.md %}). The most advanced models now cost hundreds of millions of dollars, with expenses measured by amortizing cluster costs over the training period. About half of this spending is on GPUs, with the remainder on other hardware and energy.
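A simplified sketch of what amortizing cluster costs over the training period means: the cluster’s capital cost is spread over an assumed useful life, and the run is charged for the share of that life it occupies, plus energy. All numbers below are illustrative assumptions, not our estimates:

```python
# Simplified amortized training-cost sketch (all numbers are illustrative assumptions).
cluster_capex_usd = 3e9     # assumed up-front cost of the training cluster
useful_life_years = 4       # assumed depreciation period
training_days = 100         # assumed length of the training run
energy_cost_usd = 25e6      # assumed electricity cost for the run

amortized_hardware = cluster_capex_usd * training_days / (useful_life_years * 365)
total_cost = amortized_hardware + energy_cost_usd
print(f"~${total_cost / 1e6:.0f}M amortized training cost")  # ~$230M with these assumptions
```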
June 19, 2024

The training compute of notable AI models is doubling roughly every five months

Since 2010, the training compute used to create AI models has been [growing at a rate of {{site.data.pcd.growth.notable-training-compute-post-2010 | format_growth: 'x/year'}}]({% link _blog/2024-05-28-training-compute-of-frontier-ai-models-grows-by-4-5x-per-year.md %}). Most of this growth comes from increased spending, although improvements in hardware have also played a role.
June 19, 2024

Training compute has scaled up faster for language than vision

Before 2020, the largest vision and language models had similar training compute. After that, language models rapidly scaled to use more training compute, driven by the success of transformer-based architectures. Standalone vision models never caught up. Instead, the largest models have recently become multimodal, integrating vision and other modalities into large models such as GPT-4 and Gemini.
June 19, 2024