The length of time spent training notable models is growing

Since 2010, the length of training runs has increased by 1.2x per year among notable models, excluding those that are fine-tuned from base models.
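As a rough illustration of what 1.2x per year growth implies, the Python sketch below computes the doubling time and projects run lengths forward. The 1.2x rate comes from the text; the 30-day baseline for a 2010-era run is a hypothetical value chosen for illustration, not a figure from the data.

```python
# A minimal sketch of what 1.2x/year growth in training-run length implies.
import math

growth_per_year = 1.2   # growth factor per year (from the text)
baseline_days = 30      # hypothetical 2010-era run length, for illustration only

doubling_time_years = math.log(2) / math.log(growth_per_year)
print(f"Doubling time: {doubling_time_years:.1f} years")  # ~3.8 years

for years_since_2010 in (5, 10, 14):
    days = baseline_days * growth_per_year ** years_since_2010
    print(f"After {years_since_2010} years: ~{days:.0f} days")
```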

A continuation of this trend would ease hardware constraints by increasing training compute without requiring more chips or power.

However, longer training times involve a tradeoff: for very long runs, the gains from waiting for improved algorithms and hardware, and then training with them, might outweigh the benefits of extending the current run.
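One way to see this tradeoff is with a toy model: if the effective compute delivered per unit of training time grows by some factor g per year as hardware and algorithms improve, then for long enough runs it pays to delay the start. The sketch below uses hypothetical values for g and the planned run length D; these are illustrative assumptions, not figures from this analysis.

```python
# A minimal sketch of the wait-vs-train tradeoff, under illustrative assumptions.
# Starting a run of total duration D now yields D units of "today-equivalent"
# effective compute; waiting w years and training for the remaining D - w years
# yields (D - w) * g**w, since each year of waiting multiplies throughput by g.
import math

g = 2.0   # hypothetical combined hardware + algorithmic improvement per year
D = 2.0   # hypothetical planned run length in years

def effective_compute(wait_years: float) -> float:
    """Effective compute if we wait, then train for the remaining time."""
    return (D - wait_years) * g ** wait_years

best_wait = max((w / 100 for w in range(int(D * 100) + 1)), key=effective_compute)
print(f"Start immediately: {effective_compute(0):.2f}")
print(f"Best plan: wait {best_wait:.2f} years -> {effective_compute(best_wait):.2f}")

# Waiting starts to pay off once D exceeds roughly 1 / ln(g) years: beyond that
# point, the marginal gain from faster future hardware outweighs the lost time.
print(f"Break-even run length: {1 / math.log(g):.2f} years")
```

With these illustrative numbers, a two-year run is better split into a delay of about half a year followed by a shorter run, which is the sense in which very long runs lose out to waiting.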

Published: August 16, 2024

Last updated: February 21, 2025

Epoch’s work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.