Announcing our expanded biology AI coverage

Published

Jan 29, 2025

Authors

We’re pleased to announce an expansion of our Biological Model Dataset, a component of Epoch AI’s larger database of machine learning models. As the role of AI in biology continues to grow, powering advances in drug design, protein engineering, and genomics, the opportunities and governance challenges posed by biological AI models make it increasingly important to track advances in this field.

Our goal with this project is to provide a comprehensive resource for researchers and policymakers. To this end, we have curated information on over 360 models in this update, prioritizing recent models at the frontier of capability, scale, or scientific impact. Alongside details on their developers, intended tasks, and training datasets, we’ve included new estimates of the training compute that went into developing them.

[Interactive visualization: training compute and dataset sizes of biological models over time]

Analyzing compute and data trends can help us understand how heavily the field is investing in scaling as a means to increase performance. The plot above tracks the evolution of training compute and dataset sizes in biological models, highlighting a substantial increase from 2017 to 2021, followed by a relative slowdown. This visualization underscores how quickly the field has advanced, and also suggests that the pace may be changing. By providing transparent, easy-to-explore compute estimates, we hope to enable deeper discussion of what’s driving progress and where bottlenecks may arise in the near future.
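For readers who want to explore this kind of trend themselves, here is a minimal sketch of one way to compare growth rates across two periods: a least-squares fit of log10(training compute) against publication year, done separately before and after a chosen split year. This is an illustration under stated assumptions, not necessarily the method behind the plot. The function names and the (year, compute) record layout are hypothetical and do not reflect the dataset's actual schema.

```python
import math


def fit_log10_trend(years, values):
    """Least-squares fit of log10(value) against year.

    Returns (slope, intercept), where slope is in orders of magnitude per year.
    """
    if len(years) != len(values) or len(years) < 2:
        raise ValueError("need at least two (year, value) pairs")
    logs = [math.log10(v) for v in values]
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def growth_factor_per_year(years, values):
    """Multiplicative growth per year implied by the log-linear fit."""
    slope, _ = fit_log10_trend(years, values)
    return 10 ** slope


def compare_periods(records, split_year):
    """Fit separate trends before and from split_year onward.

    `records` is a list of (year, training_compute) pairs; this layout is an
    assumption made for the example, not the dataset's real column names.
    Returns (early_growth_factor, late_growth_factor).
    """
    early = [(y, v) for y, v in records if y < split_year]
    late = [(y, v) for y, v in records if y >= split_year]
    early_growth = growth_factor_per_year([y for y, _ in early], [v for _, v in early])
    late_growth = growth_factor_per_year([y for y, _ in late], [v for _, v in late])
    return early_growth, late_growth
```

A late-period growth factor well below the early-period one would be consistent with the slowdown described above. A simple fit like this is sensitive to the choice of split year and to which models are included, so treat the result as a starting point for discussion.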


Finally, because biological models can pose dual-use concerns, we have compiled information about safeguards that developers have adopted to mitigate such risks, such as data filtering, risk evaluations, inference-time refusal, and access controls. In our dataset, fewer than 3% of models have any such safeguards, although the most capable models (large foundation models like EvolutionaryScale’s ESM 3, or powerful specialized models like AlphaFold 3) tend to have more safeguards. We encourage developers to continue sharing best practices for mitigating potential misuse.

To protect sensitive information about model safeguards while enabling responsible research, detailed safeguards data is available upon request. Researchers and developers interested in accessing this information can email safeguards@epoch.ai.

You can find additional information about the dataset in our database documentation.

Sentinel Bio provided a grant to fund this data collection project and make it publicly available, and we thank them for their generous support. Epoch AI owns the resulting dataset.

About the authors

Former employee

Pablo Villalobos has a background in Mathematics and Computer Science. After spending some time as a software engineer, he decided to pivot towards AI. His interests include the economic consequences of advanced AI systems and the role of algorithmic improvements in AI progress.

David Atanasov is an undergraduate student studying Bioinformatics and Computer Science. He is interested in the future progress of AI systems and especially their impacts on scientific research and the labor market.
