We've expanded our Biology AI Dataset, which now covers 360+ models. Our analysis shows rapid scaling of biological models from 2017 to 2021, followed by a notable slowdown.
By Pablo Villalobos and David Atanasov
We’re pleased to announce an expansion of our Biological Model Dataset, a component of Epoch AI’s larger database of machine learning models. AI plays a growing role in biology, powering advances in drug design, protein engineering, and genomics. The opportunities and governance challenges these models pose make it increasingly important to track advances in the field.
Our goal with this project is to provide a comprehensive resource for researchers and policymakers. To this end, we have curated information on over 360 models in this update, prioritizing recent models at the frontier of capability, scale, or scientific impact. Alongside details on their developers, intended tasks, and training datasets, we’ve included new estimates of the training compute used to develop them.
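For readers unfamiliar with how training compute is typically estimated, the sketch below shows two common approximations: counting operations from parameter and token counts, and multiplying hardware throughput by training time. It is a minimal illustration under stated assumptions, not necessarily the exact methodology behind the dataset's estimates. The "6 FLOP per parameter per token" rule assumes a dense transformer-style model; the function names, the 30% utilization default, and all example inputs are hypothetical.

```python
def estimate_training_flop(n_params: float, n_tokens: float) -> float:
    """Operation-counting heuristic for a dense transformer:
    roughly 2 FLOP per parameter per token for the forward pass
    plus about 4 for the backward pass, i.e. ~6 * N * D in total."""
    return 6.0 * n_params * n_tokens


def estimate_from_hardware(n_gpus: int, hours: float,
                           peak_flop_per_s: float,
                           utilization: float = 0.3) -> float:
    """Hardware-based heuristic: total GPU-seconds times peak throughput,
    scaled by an assumed utilization (0.3 is an illustrative guess)."""
    return n_gpus * hours * 3600 * peak_flop_per_s * utilization


# Hypothetical protein language model: 650M parameters, 1e11 training tokens.
print(f"{estimate_training_flop(650e6, 1e11):.2e}")  # 3.90e+20

# Hypothetical run: 64 GPUs for 100 hours at 312 TFLOP/s peak (A100 BF16 dense).
print(f"{estimate_from_hardware(64, 100, 312e12):.2e}")  # 2.16e+21
```

When both kinds of information are available for a model, computing both estimates offers a rough consistency check, since they rely on independent inputs.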
Analyzing compute and data trends can help us understand how heavily the field relies on scaling to improve performance. The plot above tracks the evolution of training compute and dataset sizes in biological models, showing a substantial increase from 2017 to 2021, followed by a relative slowdown. This visualization underscores how quickly the field has advanced, and also suggests that the pace may be changing. By providing transparent, easy-to-explore compute estimates, we hope to enable deeper discussion of what’s driving progress and where bottlenecks may arise in the near future.
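One common way to quantify a growth trend like this is to fit a straight line to the logarithm of training compute over time and convert the slope into a doubling time. The sketch below does this with ordinary least squares in plain Python. The function name and the input values are hypothetical and purely illustrative; they are not taken from the dataset.

```python
import math


def doubling_time_years(years: list[float], flops: list[float]) -> float:
    """Fit log10(FLOP) = a + b * year by ordinary least squares and
    convert the slope b (orders of magnitude per year) into a doubling time."""
    n = len(years)
    logs = [math.log10(f) for f in flops]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx  # orders of magnitude gained per year
    return math.log10(2) / slope  # years needed to double


# Hypothetical values: compute growing tenfold per year.
years = [2017, 2018, 2019, 2020, 2021]
flops = [1e18, 1e19, 1e20, 1e21, 1e22]
print(round(doubling_time_years(years, flops) * 12, 1))  # 3.6 (months)
```

Fitting the same model separately to an earlier and a later window, and comparing the resulting doubling times, is one simple way to check whether growth has slowed.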
Finally, because biological models can pose dual-use concerns, we have compiled information about the safeguards developers have adopted to mitigate these risks, including data filtering, risk evaluations, inference-time refusal, and access controls. In our dataset, fewer than 3% of models have any such safeguards, although the most capable models (large foundation models like EvolutionaryScale’s ESM 3, or powerful specialized models like AlphaFold 3) tend to have more of them. We encourage developers to continue sharing best practices for mitigating potential misuse.
To protect sensitive information about model safeguards while enabling responsible research, detailed safeguards data is available upon request. Researchers and developers interested in accessing this information can email [email protected].
Sentinel Bio provided a grant to fund this data collection project and make it publicly available; Epoch AI owns the resulting dataset. We thank Sentinel Bio for their generous support.
About the authors
Pablo Villalobos
Pablo Villalobos has a background in Mathematics and Computer Science. After spending some time as a software engineer, he decided to pivot towards AI. His interests include the economic consequences of advanced AI systems and the role of algorithmic improvements in AI progress.
David Atanasov
David Atanasov is an undergraduate student studying Bioinformatics and Computer Science. He is interested in the future progress of AI systems and especially their impacts on scientific research and the labor market.
Epoch AI’s work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons Attribution license.
Citation
Pablo Villalobos and David Atanasov (2025), "Announcing our expanded biology AI coverage". Published online at epoch.ai. Retrieved from 'https://epoch.ai/blog/announcing-expanded-biology-ai-coverage' [online resource]. Accessed 13 Apr 2026.
BibTeX Citation
@misc{epoch2025announcingexpandedbiologyaicoverage,
title={Announcing our expanded biology AI coverage},
author={Pablo Villalobos and David Atanasov},
year={2025},
url={https://epoch.ai/blog/announcing-expanded-biology-ai-coverage},
note={Accessed: 2026-04-13}}