The database covers AI models, especially those notable for advancing the state of the art or for having a large impact on the world or on the history of the field. Here, we give an overview of how the data have been collected and define the criteria for inclusion and notability.
To be included in the database, an ML model must satisfy all inclusion criteria:
Once added to the database, models are marked as notable if they satisfy any of the following:
Where there are many related models (for example, several checkpoints along a training run, or several sizes within a model family), the database preferentially includes the version that used the most compute, as sketched below. Other versions may be included where they are notable in their own right.
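As an illustration, this selection rule amounts to grouping related models and keeping the highest-compute version from each group. The sketch below is a minimal, hypothetical rendering of that rule: the `family` and `training_compute_flop` fields are placeholders, not the database's actual schema or pipeline.

```python
from collections import defaultdict

def prefer_highest_compute(models):
    """Keep one entry per model family: the version trained with the most compute.

    `models` is a list of dicts with hypothetical "family" and
    "training_compute_flop" keys; missing compute estimates count as zero.
    """
    by_family = defaultdict(list)
    for model in models:
        by_family[model["family"]].append(model)
    return [
        max(group, key=lambda m: m.get("training_compute_flop", 0))
        for group in by_family.values()
    ]

# Illustrative usage: only the highest-compute version of the family is kept.
models = [
    {"name": "Example-7B", "family": "Example", "training_compute_flop": 1e23},
    {"name": "Example-70B", "family": "Example", "training_compute_flop": 1e25},
]
print(prefer_highest_compute(models))  # [{'name': 'Example-70B', ...}]
```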
Identifying whether a model is state of the art can be more involved than simply checking citations or the training compute budget. We consider a model to be state of the art if there is good reason to believe that it was the best existing model at the time for a task of genuine interest. The default way to provide evidence for this is state-of-the-art performance on a recognised benchmark.
To be recognised, a benchmark should have any of the following:
At our discretion, we may also identify models as state of the art where no benchmark result exists but there is convincing evidence of their standing. Eligible sources of evidence include comparison on a non-benchmark database, a high-quality user preference study, or a demonstration of state-of-the-art capabilities. For example, GraphCast is compared against other weather prediction models on a weather database that is not a standalone benchmark; we nevertheless take this as convincing evidence that it is state of the art.
Models can be included on the grounds of historical significance if they marked a major advance in AI history, even if they did not strictly advance the state of the art on any application. For example, many neural network breakthroughs performed worse than other ML techniques, but were directly influential for later AI development. Evidence to support this status may come from citations in later notable models, discussion in reviews or textbooks, or other unambiguous identification as an influential result.
Models can be included at the discretion of Epoch staff if they are as notable as other models in the database but are not covered by the categories above. For example, we may mark a model as notable if it is on the Pareto frontier of cost-efficiency for an important task, despite not having the highest performance on any benchmark.
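Taken together, a model is notable if any single route applies. The sketch below is purely illustrative: the predicate names and thresholds are placeholders drawn from the routes discussed in the surrounding text (citations, training compute, state of the art, historical significance, staff discretion), not Epoch's actual criteria or cutoffs.

```python
from dataclasses import dataclass

# Placeholder thresholds, not Epoch's actual cutoffs.
CITATION_THRESHOLD = 1000
COMPUTE_THRESHOLD_FLOP = 1e23

@dataclass
class Model:
    citations: int
    training_compute_flop: float
    is_state_of_the_art: bool = False       # per the benchmark/evidence rules above
    is_historically_significant: bool = False
    flagged_by_staff: bool = False          # discretionary inclusion

def is_notable(model: Model) -> bool:
    """A model is notable if it satisfies any one of the routes described above."""
    return (
        model.citations >= CITATION_THRESHOLD
        or model.training_compute_flop >= COMPUTE_THRESHOLD_FLOP
        or model.is_state_of_the_art
        or model.is_historically_significant
        or model.flagged_by_staff
    )
```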
| Example | Include? | Why |
|---|---|---|
| Human-level control through deep reinforcement learning | Yes | Well-documented learned model, over 5000 citations, advanced state of the art for autonomous gameplay. |
| Stochastic Neural Analog Reinforcement Calculator | Yes | No individual associated paper, but other sources confirm its existence, and it was indisputably historically significant as one of the first neural learning systems. |
| Theory of neural-analog reinforcement systems and its application to the brain model problem | No | Historically significant, but no experimentally trained model; the result is entirely theoretical. |
| Scaling scaling laws with board games | No | Doesn’t meet any notability criteria: it is not highly cited, its models used little training compute, and it makes no attempt at state-of-the-art results. Rather, it is a paper examining scaling details. |
This dataset has been collected from a variety of sources: literature reviews, historical accounts of AI development, highly cited publications from top conferences, high-profile models from leading industry labs, bibliographies of notable papers, pre-existing datasets curating AI papers (see Acknowledgements), and ad hoc suggestions from contributors.
We monitor news coverage, releases from key AI labs, and benchmarks to identify new models as they are released, though this can introduce a reporting lag. Typically, we aim to add the most prominent releases (e.g. GPT-4) within days of release; for less prominent models, the lag may extend to months.
As of May 4, 2026, the dataset contains 3519 models, a subset of which have compute estimates.
If you would like to ask any questions about the database, or suggest a model that should be added, contact us at data@epoch.ai.