This section provides more information about recurring processes in the database: adding new models, updating citation counts, and updating the hosted files by which the dataset can be accessed for analysis.
Entries are added to the dataset near-daily, including both newly-released models and older models newly identified as notable. Typically, most information that can easily be determined from public information is added at the time a model is entered in the database. However, it is common for some information to gradually be entered later. For example, a compute estimate might be omitted at first and only added after we devote further effort to calculating it.
If you would like to ask any questions about the database, or suggest a model that should be added, feel free to contact us at data@epoch.ai.
When models are added to the database, citation counts are recorded for those with academic publications or preprints. At the beginning of each month, citation counts are automatically updated for publications listed in Semantic Scholar. Publications not listed in Semantic Scholar rely on manual entry of citation count.
Epoch AI’s database is hosted as a CSV that is synced with the database daily. The easiest way to load the data in scripts is using the CSV URL. If you need the most up-to-date version reflecting unsynced changes, a CSV can be manually generated from the table view on the website.