Machine Learning Hardware Documentation

Overview

Epoch AI’s Machine Learning Hardware dataset is a collection of AI accelerators, such as graphics processing units (GPUs) and tensor processing units (TPUs), used to develop and deploy machine learning models in the deep learning era.

This documentation describes the processors included in the dataset, its records, data fields, and definitions, and a changelog and acknowledgements.

The data is available on our website as a visualization or table, and is available for download as a CSV file, updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.

If you would like to ask any questions about the data, or suggest hardware that should be added, feel free to contact us at data@epoch.ai.

If this data is useful for you, please cite it as illustrated:

Citation

      Epoch AI, ‘Data on Machine Learning Hardware’. Published online at epoch.ai. Retrieved from: ‘https://epoch.ai/data/machine-learning-hardware’ [online resource]
    

BibTeX citation

@misc{EpochMLHardware2024,
  title = "Data on Machine Learning Hardware",
  author = {{Epoch AI}},
  year = 2024,
  url = {https://epoch.ai/data/machine-learning-hardware},
  note = "Accessed: "
}

Inclusion

This dataset focuses on machine learning processors. These are processors used to train and deploy ML and AI models, especially those included in our Notable AI Models dataset. Here we explain the inclusion and search process and give an overview of data sources.

Inclusion criteria

To identify ML hardware, we annotated chips used for ML training in our database of Notable AI Models. We additionally added ML hardware that has not been documented in training those systems, but is clearly manufactured for ML - based on its description, supported numerical formats, or belonging to the same chip family as other ML hardware.

We use hardware datasheets, documented for each chip in the dataset, to fill in key information such as computing performance, die size, etc. Not all information is available, or even applicable, for all hardware, so columns can be left empty. We additionally use other sources, such as news coverage or hardware price archives, to fill in the price on release.

Records

This dataset has fields containing various processor details, attributes, and specifications. Records in the dataset have information about three broad areas:

Specifications about the processors, such as their clock speed, memory capacity, and performance.

Provenance details, such as the manufacturer and release date.

Metadata, such as sources containing information about the hardware, and a list of models it has been used to produce.

We provide a comprehensive guide to the data fields, below. This includes examples taken from the NVIDIA A100 SXM4 40 GB datacenter GPU, which is one of the most popular hardware used for machine learning. If you would like to request a field be added, contact us at data@epoch.ai.

Column	Notes	Example from NVIDIA A100 SXM4 40 GB
Name of the hardware	The full name of the hardware, including the manufacturer. For example, “Google TPU v5p”. Note that there can be different variations of hardware based on similar chips, which should be named distinctly, for example “NVIDIA H100 SXM5 80GB” versus “NVIDIA H100 PCIe”.
Manufacturer	The manufacturer of the hardware.	NVIDIA
Type	Indicates whether the hardware is a central processing unit (CPU), graphics processor (GPU), or tensor processor (TPU). For a small number of experimental other accelerators, such as the Meta MTIA series, this is “Other”.	GPU
Release date	The date when the hardware could first be rented, used for machine learning workloads, or purchased (excluding pre-orders).	2020-05-14
Release price (USD)	Price of the processor when released, in nominal US dollars. Prices are collected from hardware catalogs, news sources, or other documentation. Listed prices do not reflect bulk discounts.	0
FP64 (double precision) performance (FLOP/s)	These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training.	0.0e
FP32 (single precision) performance (FLOP/s)	These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training.	0.0e
FP16 (half precision) performance (FLOP/s)	These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. FP16 data excludes processors with greater performance in FP32 than in FP16, because these are not designed to support half-precision calculations.	0.0e
TF32 (TensorFloat-32) performance (FLOP/s)	These are performance figures for tensor operations, specifically optimized for AI training.	0.0e
Tensor-FP16/BF16 performance (FLOP/s)	These are performance figures for tensor operations, specifically optimized for AI training.	0.0e
INT16 performance (OP/s)	These are performance figures for integer operations, at different numerical precisions.	0.0e
INT8 performance (OP/s)	These are performance figures for integer operations, at different numerical precisions.	0.0e
INT4 performance (OP/s)	These are performance figures for integer operations, at different numerical precisions.	0.0e
Memory size per board (byte)	The hardware’s amount of memory, in bytes.	0.0e
Memory bandwidth (byte/s)	Rate of data transfer between memory and processor, in bytes per second.	0.0e
Intranode bandwidth (byte/s)	Data transfer rate within a single node, in bytes per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc.	0.0e
Internode bandwidth (bit/s)	Data transfer rate between separate nodes, in bits per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc.	0.0e
Die size (mm^2)	The physical size or area of the processing chip, in square millimeters.	0
TDP (W)	Thermal design power, the theoretical maximum power that can be dissipated as heat. In theory, this is the maximum sustainable power draw for a given chip.	0
Base clock (MHz)	Default operating frequency of the processor, in megahertz.	0
Boost clock (MHz)	Maximum operating frequency of the processor, in megahertz.	0
Memory clock (MHz)	Operating frequency of the processor’s memory, in megahertz.	0
Memory bus (bit)	Amount of data that can be transferred between the memory and processor per cycle, in bits.	0
Tensor cores	Number of tensor cores, a specialized NVIDIA hardware component designed to accelerate matrix and tensor operations.	432
Process size (nm)	Nominal semiconductor manufacturing scale, in nanometers.	0
Foundry	The semiconductor manufacturer responsible for producing the processor die or chip in a foundry or fabrication plant.	TSMC
Number of transistors (millions)	Number of transistors in the processor, in millions.	0
Link to datasheet	Links to document(s) containing specifications or data about the processor.	https://www.techpowerup.com/gpu-specs/a100-sxm4-40-gb.c3506
Source for the price	Link to source(s) listing the price of the hardware.	https://www.nextplatform.com/2022/05/09/how-much-of-a-premium-will-nvidia-charge-for-hopper-gpus/
ML models	ML models trained with this hardware, cross-referenced from our database of models.	Florence, Luminous-supreme, Falcon-180B, GPT-3.5, GPT-4, StableLM-Base-Alpha-7B, Phi-1.5, WeLM, GLM-130B, BlenderBot 3, GPT-NeoX-20B, TinyLlama-1.1B (1T token checkpoint), TinyLlama-1.1B (3T token checkpoint), StableLM-2-1.6B, DINOv2, Stable Code 3B, Falcon-7B, Qarasu-14B, Flan T5-XXL + BLIP-2, BLIP-2 (Q-Former), Swin Transformer V2 (SwinV2-G), SPHINX (Llama 2 13B), EVA-01, CoRe, InstructBLIP, xTrimoPGLM -100B, MPT-7B, Pythia-12b, Pythia-2.8b, Pythia-6.9b, Pythia-160m, Pythia-1b, Pythia-1.4b, Pythia-70m, Pythia-410m, PLaMo-13B, Falcon 2 11B, Janus 1.3B, Luminous-extended, Luminous-base, TeleChat-7B, TeleChat-3B, TeleChat-12B, aiXcoder-7B Base, Janus-Pro-7B, Janus-Pro-1B, SEA-LION V1 3B, SEA-LION V1 7B, Llama-SEA-LION-v2-8B-IT, Novae, HelixProtX, Sailor-7B-Chat, SEA-LION-v1-7B-IT, SGPT BE 5.8B, ToolFormer, Stable Diffusion 2.1, AntiFormer, GPT-2 Medium (FlashAttention), Llemma 7B, Llemma 34B

Changelog

2024-10-23

Dataset published to epoch.ai.

Acknowledgements

The data have been collected by Epoch AI’s employees and collaborators, including Marius Hobbhahn, Lennart Heim, Gökçe Aydos, Robi Rahman, Josh You, Bartosz Podkanowicz, Luke Emberson, Natalia Martemianova, James Sanders, Joanne Fang, and Veronika Blablová.

We would like to thank Hanna Dohmen and Jacob Feldgoise for generously allowing us to include data from their report A Bigger Yard, A Higher Fence: Understanding BIS’s Expanded Controls on Advanced Computing Exports.

This documentation was written by Robi Rahman and David Owen.