Biology AI Models Documentation

Inclusion

The dataset focuses on biological ML models: those trained on biological data such as biological sequences, molecular structures, or data about molecular properties. Here, we detail the criteria for inclusion and give an overview of how the data have been collected.

Criteria

To be included in the dataset, an ML model must satisfy all inclusion criteria:

there must be reliable documentation of its existence and relevance to machine learning;
the model must include a learning component and cannot be a non-learned algorithm;
the model must have been trained: it cannot be a theoretical description without experimental results;
the model must be directly and explicitly trained on biological data, including:
- biological sequence data;
- biomolecule structure data;
- fitness, pathogenicity or other biological properties of proteins or other biomolecules;
- cell-level data (cell type, expression levels, spatial or imaging data…).
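The criteria above can be read as a conjunction of four checks. The following is a minimal sketch of that reading, not the dataset's actual tooling: the `Candidate` class, its field names, and the data-kind labels are all hypothetical, introduced only for illustration. Because the list of biological data is introduced with "including", the label set here may not be exhaustive.

```python
from dataclasses import dataclass

# Hypothetical labels mirroring the four kinds of biological data listed above.
# The source list is introduced with "including", so it may not be exhaustive.
BIOLOGICAL_DATA_KINDS = frozenset({
    "sequence",              # biological sequence data
    "structure",             # biomolecule structure data
    "biomolecule_property",  # fitness, pathogenicity, or other properties
    "cell_level",            # cell type, expression levels, spatial or imaging data
})


@dataclass
class Candidate:
    """A candidate model record; every field name is an illustrative assumption."""
    name: str
    has_reliable_documentation: bool
    has_learning_component: bool
    has_been_trained: bool
    training_data_kinds: frozenset = frozenset()


def failed_criteria(candidate: Candidate) -> list[str]:
    """Return the names of the criteria the candidate does not satisfy."""
    failures = []
    if not candidate.has_reliable_documentation:
        failures.append("documentation")
    if not candidate.has_learning_component:
        failures.append("learning component")
    if not candidate.has_been_trained:
        failures.append("trained")
    # At least one kind of training data must be biological.
    if not (candidate.training_data_kinds & BIOLOGICAL_DATA_KINDS):
        failures.append("biological training data")
    return failures


def is_included(candidate: Candidate) -> bool:
    """A model is included only if it satisfies all criteria."""
    return not failed_criteria(candidate)
```

Returning the list of failed criteria, rather than a bare boolean, makes it easy to record why a candidate was excluded.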

Search process

The data have been collected mainly through a literature review. Some models come from other sources, such as high-profile releases from leading industry labs, bibliographies of notable papers, and ad hoc suggestions from contributors.

Coverage

As of May 25, 2026, the dataset contains 383 models, some of which have compute estimates. The dataset does not provide exhaustive coverage of biological models. We attempt to cover the most historically relevant models, as well as significant models released in 2023 and 2024, but we expect some important models to be missing from our dataset.
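The share of models with compute estimates can be recomputed from the records themselves. The sketch below assumes each record is a dictionary and that a missing estimate is stored as `None` or omitted. The field name `training_compute_flop` is a hypothetical placeholder, not a confirmed column name in the dataset.

```python
def compute_coverage(records: list[dict]) -> tuple[int, int, float]:
    """Return (total models, models with a compute estimate, share with an estimate).

    Assumes a hypothetical field "training_compute_flop" that is None or
    absent when no estimate is available.
    """
    total = len(records)
    with_compute = sum(
        1 for record in records
        if record.get("training_compute_flop") is not None
    )
    share = with_compute / total if total else 0.0
    return total, with_compute, share
```

Applied to the full set of records, the first value returned should match the model count stated above.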
