The database covers AI models, especially those notable for advancing the state of the art or for having a large impact on the world or on the history of the field. Here, we give an overview of how the data have been collected and define the criteria for inclusion and notability.
To be included in the database, an ML model must satisfy all inclusion criteria:
Once added to the database, models are marked as notable if they satisfy any of the following:
Where there are many related models (for example, several checkpoints along a training run, or several sizes within a model family), the database preferentially includes the version that used the most compute, as sketched below. Other versions may be included where they are notable in their own right.
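As an illustration, this selection rule amounts to grouping related models and keeping the highest-compute version from each group. The sketch below is a minimal, hypothetical rendering of that rule: the `family` and `training_compute_flop` fields are placeholders, not the database's actual schema or pipeline.

```python
from collections import defaultdict

def prefer_highest_compute(models):
    """Keep one entry per model family: the version trained with the most compute.

    `models` is a list of dicts with hypothetical "family" and
    "training_compute_flop" keys; missing compute estimates count as zero.
    """
    by_family = defaultdict(list)
    for model in models:
        by_family[model["family"]].append(model)
    return [
        max(group, key=lambda m: m.get("training_compute_flop", 0))
        for group in by_family.values()
    ]

# Illustrative usage: only the highest-compute version of the family is kept.
models = [
    {"name": "Example-7B", "family": "Example", "training_compute_flop": 1e23},
    {"name": "Example-70B", "family": "Example", "training_compute_flop": 1e25},
]
print(prefer_highest_compute(models))  # [{'name': 'Example-70B', ...}]
```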
Identifying whether a model is state of the art can be more involved than simply checking citations or the training compute budget. We consider a model to be state of the art if there is good reason to believe that it was the best existing model at the time for a task of genuine interest. The default way to provide evidence for this is state-of-the-art performance on a recognised benchmark.
To be recognised, a benchmark should have any of the following:
At our discretion, we may also identify models as state of the art where no benchmark result exists but there is convincing evidence of their standing. Eligible sources of evidence include comparison on a non-benchmark database, a high-quality user preference study, or a demonstration of state-of-the-art capabilities. For example, GraphCast is compared against other weather prediction models on a weather database that is not a standalone benchmark; we nevertheless take this as convincing evidence that it is state of the art.
Models can be included on the grounds of historical significance if they marked a major advance in AI history, even if they did not strictly advance the state of the art on any application. For example, many neural network breakthroughs performed worse than other ML techniques, but were directly influential for later AI development. Evidence to support this status may come from citations in later notable models, discussion in reviews or textbooks, or other unambiguous identification as an influential result.
Models can be included at the discretion of Epoch staff if they are as notable as other models in the database but are not covered by the categories above. For example, we may mark a model as notable if it is on the Pareto frontier of cost-efficiency for an important task, despite not having the highest performance on any benchmark.
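Taken together, a model is notable if any single route applies. The sketch below is purely illustrative: the predicate names and thresholds are placeholders drawn from the routes discussed in the surrounding text (citations, training compute, state of the art, historical significance, staff discretion), not Epoch's actual criteria or cutoffs.

```python
from dataclasses import dataclass

# Placeholder thresholds, not Epoch's actual cutoffs.
CITATION_THRESHOLD = 1000
COMPUTE_THRESHOLD_FLOP = 1e23

@dataclass
class Model:
    citations: int
    training_compute_flop: float
    is_state_of_the_art: bool = False       # per the benchmark/evidence rules above
    is_historically_significant: bool = False
    flagged_by_staff: bool = False          # discretionary inclusion

def is_notable(model: Model) -> bool:
    """A model is notable if it satisfies any one of the routes described above."""
    return (
        model.citations >= CITATION_THRESHOLD
        or model.training_compute_flop >= COMPUTE_THRESHOLD_FLOP
        or model.is_state_of_the_art
        or model.is_historically_significant
        or model.flagged_by_staff
    )
```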
| Example | Include? | Why |
|---|---|---|
| Human-level control through deep reinforcement learning | Yes | Well-documented learned model, over 5000 citations, advanced state of the art for autonomous gameplay. |
| Stochastic Neural Analog Reinforcement Calculator | Yes | No individual associated paper, but other sources confirm its existence, and it was indisputably historically significant as one of the first neural learning systems. |
| Theory of neural-analog reinforcement systems and its application to the brain model problem | No | Historically significant, but no experimentally trained model; the result is entirely theoretical. |
| Scaling scaling laws with board games | No | Doesn’t meet any notability criteria: it is not highly cited, its models used little training compute, and it makes no attempt at state-of-the-art results. Rather, it is a paper examining scaling details. |
This dataset has been collected from a variety of sources: literature reviews, historical accounts of AI development, highly cited publications from top conferences, high-profile models from leading industry labs, bibliographies of notable papers, pre-existing datasets curating AI papers (see Acknowledgements), and ad hoc suggestions from contributors.
We monitor news coverage, releases from key AI labs, and benchmarks to identify new models as they are released, though this can introduce a reporting lag. Typically, we aim to add the most prominent releases (e.g. GPT-4) within days of release; for less prominent models, the lag may extend to months.
As of May 4, 2026, the dataset contains 3519 models, a subset of which have compute estimates.
If you would like to ask any questions about the database, or suggest a model that should be added, contact us at data@epoch.ai.