Paper
Jul. 5, 2022

Machine learning model sizes and the parameter gap


By Pablo Villalobos, Jaime Sevilla, Tamay Besiroglu, Lennart Heim, Anson Ho, and Marius Hobbhahn

Summary: Since 2018, the model size of notable machine learning systems has grown about ten times faster than before. Growth after 2020 has not been entirely continuous: a jump of one order of magnitude opened up and persists to this day. This is relevant for forecasting model size and thus AI capabilities.

In current ML systems, model size (number of parameters) is related to performance via known scaling laws. We used our dataset to analyze trends in the model size of 237 milestone machine learning systems. The systems are categorized into Language, Vision, Games and Other according to the task they solve.

Model size slowly increased by 7 orders of magnitude from the 1950s to around 2018. Since 2018, growth has accelerated for language models, with model size increasing by another 4 orders of magnitude in the four years from 2018 to 2022 (see Figure 1). Other domains like vision have grown at a more moderate pace, but still faster than before 2018.
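A constant growth rate measured in orders of magnitude (OOMs) per year implies exponential growth in parameter count. A minimal sketch of the arithmetic, using the approximate post-2018 language-model rate of about 1 OOM/year quoted above (not an exact fitted value):

```python
def extrapolate_params(start_params, ooms_per_year, years):
    """Project model size forward assuming a constant growth rate
    expressed in orders of magnitude (OOMs) per year."""
    return start_params * 10 ** (ooms_per_year * years)

# ~3e7 params in 2018 growing at ~1 OOM/year reaches ~3e11 params
# after 4 years, i.e. roughly the 2022 scale of the largest models.
print(f"{extrapolate_params(3e7, 1.0, 4):.0e}")  # 3e+11
```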

Figure 1. Left: Transition period around 2018, assuming a single post-2018 trend. Right: the same period, assuming two separate post-2018 trends.

| Period | Data | Scale (start to end) | Slope (OOMs/year) | Doubling time (months) | \(R^2\) |
|---|---|---|---|---|---|
| 1952 to 2018 | \(n=109\) | 1e+1 to 3e+7 params | 0.1 [0.1; 0.1; 0.1] | 39.7 [36.4; 39.7; 40.7] | 0.62 |
| 2018 to 2022 (single trend) | \(n=129\) | 3e+7 to 2e+12 params | 0.9 [0.9; 0.9; 1.0] | 4.0 [3.5; 4.0; 4.3] | 0.31 |
| 2018 to 2022 (above gap) | \(n=27\) | 7e+10 to 2e+12 params | -0.1 [-0.4; -0.1; 0.2] | -14.2 [-52.5; -14.2; 52.0] | 0.00 |
| 2018 to 2022 (below gap) | \(n=102\) | 3e+7 to 2e+10 params | 0.5 [0.4; 0.5; 0.5] | 8.0 [7.0; 8.0; 9.8] | 0.25 |

Table 1. Summary of our main results. Around 2018 there was a general acceleration in growth. This can be decomposed into the previous trend speeding up, plus a separate cluster of very large models appearing above it.
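The slope and doubling-time columns in Table 1 are two views of the same quantity: a doubling adds \(\log_{10} 2 \approx 0.301\) OOMs, so the doubling time in months is \(12 \log_{10} 2\) divided by the slope in OOMs/year. A minimal sketch of the conversion (the table presumably uses unrounded slopes, so the rounded rates here only reproduce its values approximately):

```python
import math

def doubling_time_months(ooms_per_year):
    """Convert a growth rate in orders of magnitude per year into a
    doubling time in months: one doubling adds log10(2) OOMs."""
    return 12 * math.log10(2) / ooms_per_year

# The post-2018 single-trend slope of ~0.9 OOMs/year implies a
# doubling time of about 4 months, matching Table 1.
print(round(doubling_time_months(0.9), 1))  # 4.0
```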

The parameter gap

Starting in 2020, we see many models below 20B parameters and above 70B parameters, but very few in the 20B-70B range. We refer to this scarcity as the parameter gap (see Figure 2).

Figure 2: Model size over time, separated by domain. Red lines highlight the parameter gap. Most systems above the gap are language or multimodal models.
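The gap is straightforward to operationalize: count how many models fall between the two thresholds. A minimal sketch using the 20B and 70B figures from the text; the sample parameter counts are illustrative placeholders, not values from our dataset:

```python
from collections import Counter

# Thresholds from the text: the gap spans roughly 20B-70B parameters.
GAP_LOW, GAP_HIGH = 20e9, 70e9

def classify(params):
    """Place a model below, inside, or above the parameter gap."""
    if params < GAP_LOW:
        return "below"
    if params > GAP_HIGH:
        return "above"
    return "inside"

# Hypothetical parameter counts, for illustration only.
sizes = [1.5e9, 6e9, 13e9, 175e9, 280e9]
print(Counter(classify(s) for s in sizes))
```

With real data, a persistent near-zero count in the "inside" bucket after 2020 is what we call the parameter gap.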

We considered several hypotheses to explain the parameter gap; the two most consistent with the evidence are:

  1. Increasing model size beyond 20B parameters has a high marginal cost due to the need to adopt different parallelism techniques, so that mid-sized models are less cost-effective than bigger or smaller ones.
  2. GPT-3 initiated the gap by ‘jumping’ one order of magnitude in size over previous systems. This gap was maintained because researchers are incentivized to build the cheapest model that can outperform previous models. Those competing with GPT-3 are above the gap; the rest are below.

The existence of the parameter gap suggests that model size has some underlying constraints that might cause discontinuities in the future.
 

Read the full paper now on the arXiv

About the authors

Pablo Villalobos
Pablo Villalobos has a background in Mathematics and Computer Science. After spending some time as a software engineer, he decided to pivot towards AI. His interests include the economic consequences of advanced AI systems and the role of algorithmic improvements in AI progress.
Jaime Sevilla
Jaime Sevilla is the co-founder and director of Epoch AI. His research is focused on technological forecasting and the trajectory of AI. He has a background in Mathematics and Computer Science.
Tamay Besiroglu
Tamay Besiroglu co-founded Epoch AI and continues to contribute to the organization as a research advisor. He left Epoch to co-lead Mechanize, a startup building virtual work environments, benchmarks, and training data for AI development. His research expertise focuses on the economics of computing and broader trends in machine learning.
Lennart Heim
Lennart Heim is a former researcher at Epoch AI. His research focuses on the role of compute for advanced AI systems and how compute can be leveraged as an instrument for AI governance, with an emphasis on policy development and security implications.
Anson Ho
Anson Ho is a researcher at Epoch AI. He is interested in helping develop a more rigorous understanding of future developments in AI and its societal impacts.
Marius Hobbhahn
Marius Hobbhahn builds models for AI timelines and takeoff using historical trends and his best understanding of the future.