Machine Learning Hardware Documentation

Records

Each record in this dataset describes one processor. Its fields cover three broad areas:

Specifications about the processors, such as their clock speed, memory capacity, and performance.

Provenance details, such as the manufacturer and release date.

Metadata, such as sources containing information about the hardware, and a list of models it has been used to produce.

We provide a comprehensive guide to the data fields below. This includes examples taken from the NVIDIA A100 SXM4 40 GB data center GPU, one of the most popular pieces of hardware used for machine learning. If you would like to request that a field be added, contact us at data@epoch.ai.

Column	Type	Definition	Example from NVIDIA A100 SXM4 40 GB
Hardware name	Text	The full name of the hardware, including the manufacturer. For example, “Google TPU v5p”. Note that there can be different variations of hardware based on similar chips, which should be named distinctly, for example “NVIDIA H100 SXM5 80GB” versus “NVIDIA H100 PCIe”.	NVIDIA A100 SXM4 40 GB
Manufacturer	Text	The manufacturer of the hardware.	NVIDIA
Type	Text	Indicates whether the hardware is a central processing unit (CPU), graphics processing unit (GPU), or tensor processing unit (TPU). For a small number of other experimental accelerators, such as the Meta MTIA series, this is “Other”.	GPU
Release date	Date	The date when the hardware could first be rented, used for machine learning workloads, or purchased (excluding pre-orders).	2020-05-14
Release price (USD)	Numeric	Price of the processor when released, in nominal US dollars. Prices are collected from hardware catalogs, news sources, or other documentation. Listed prices do not reflect bulk discounts.	$15,000
FP64 (double precision) performance (FLOP/s)	Numeric	These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training.	9.7e+12
FP32 (single precision) performance (FLOP/s)	Numeric	These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training.	1.9e+13
FP16 (half precision) performance (FLOP/s)	Numeric	These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. FP16 data excludes processors with greater performance in FP32 than in FP16, because these are not designed to support half-precision calculations.	7.8e+13
TF32 (TensorFloat-32) performance (FLOP/s)	Numeric	These are performance figures for tensor operations, specifically optimized for AI training.	1.6e+14
Tensor-FP16/BF16 performance (FLOP/s)	Numeric	These are performance figures for tensor operations, specifically optimized for AI training.	3.1e+14
INT16 performance (OP/s)	Numeric	These are performance figures for integer operations, at different numerical precisions.	NaN
INT8 performance (OP/s)	Numeric	These are performance figures for integer operations, at different numerical precisions.	6.2e+14
INT4 performance (OP/s)	Numeric	These are performance figures for integer operations, at different numerical precisions.	NaN
Memory size per board (byte)	Numeric	The hardware’s amount of memory, in bytes.	4.0e+10
Memory bandwidth (byte/s)	Numeric	Rate of data transfer between memory and processor, in bytes per second.	1.6e+12
ML OP/s	Numeric	Maximum performance in any format 8 bits or wider, in units of FLOP/s or OP/s.	6.2e+14
Total processing performance (bit-OP/s)	Numeric	Total Processing Performance (TPP) is the maximum theoretical performance of the hardware, measured in bit-operations per second (bit-op/s). It is calculated by multiplying the maximum operations per second for each supported numeric format by the bit width of that format, then using the format that produces the highest value.	5.0e+15
Intranode bandwidth (byte/s)	Numeric	Data transfer rate within a single node, in bytes per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc.	6.0e+11
Internode bandwidth (bit/s)	Numeric	Data transfer rate between separate nodes, in bits per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc.	2.0e+11
Die size (mm^2)	Numeric	The physical size or area of the processing chip, in square millimeters.	826
TDP (W)	Numeric	Thermal design power, the theoretical maximum power that can be dissipated as heat. In theory, this is the maximum sustainable power draw for a given chip.	400
Base clock (MHz)	Numeric	Default operating frequency of the processor, in megahertz.	1095
Boost clock (MHz)	Numeric	Maximum operating frequency of the processor, in megahertz.	1410
Memory clock (MHz)	Numeric	Operating frequency of the processor’s memory, in megahertz.	1215
Memory bus (bit)	Numeric	Amount of data that can be transferred between the memory and processor per cycle, in bits.	5120
Tensor cores	Numeric	Number of tensor cores, a specialized NVIDIA hardware component designed to accelerate matrix and tensor operations.	432
Process size (nm)	Numeric	Nominal semiconductor manufacturing scale, in nanometers.	7
Foundry	Text	The semiconductor manufacturer responsible for producing the processor die or chip in a foundry or fabrication plant.	TSMC
Number of transistors (millions)	Numeric	Number of transistors in the processor, in millions.	54200
Link to datasheet	URL	Links to document(s) containing specifications or data about the processor.	https://www.techpowerup.com/gpu-specs/a100-sxm4-40-gb.c3506
Source for the price	URL	Link to source(s) listing the price of the hardware.	https://www.nextplatform.com/2022/05/09/how-much-of-a-premium-will-nvidia-charge-for-hopper-gpus/
ML models	Categorical (multiple select)	ML models trained with this hardware, cross-referenced from our database of models.	Florence, Luminous-supreme, Falcon-180B, GPT-3.5 (davinci-002), GPT-4 (Mar 2023), StableLM-Base-Alpha-7B, Phi-1.5, WeLM, GLM-130B, BlenderBot 3, GPT-NeoX-20B, TinyLlama-1.1B (1T token checkpoint), TinyLlama-1.1B (3T token checkpoint), StableLM-2-1.6B, DINOv2, Stable Code 3B, Falcon-7B, Qarasu-14B, Flan T5-XXL + BLIP-2, BLIP-2 (Q-Former), Swin Transformer V2 (SwinV2-G), SPHINX (Llama 2 13B), EVA-01, CoRe, InstructBLIP, xTrimoPGLM -100B, MPT-7B, Pythia-12b, Pythia-2.8b, Pythia-6.9b, Pythia-160m, Pythia-1b, Pythia-1.4b, Pythia-70m, Pythia-410m, PLaMo-13B, Falcon 2 11B, Janus 1.3B, Luminous-extended, Luminous-base, TeleChat-7B, TeleChat-3B, TeleChat-12B, aiXcoder-7B Base, Janus-Pro-7B, Janus-Pro-1B, SEA-LION V1 3B, SEA-LION V1 7B, Llama-SEA-LION-v2-8B-IT, Novae, HelixProtX, Sailor-7B-Chat, SEA-LION-v1-7B-IT, SGPT BE 5.8B, ToolFormer, Stable Diffusion 2.1, AntiFormer, GPT-2 Medium (FlashAttention), Llemma 7B, Llemma 34B, Teuken 7B, Aquila2 34B, Aquila2‑70B‑Expr, Deepseek OCR, GPT-4 (Jun 2023)
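
The two derived fields in the table, ML OP/s and Total processing performance, can be illustrated with a minimal Python sketch using the rounded A100 example values above. The names `A100_FORMATS`, `ml_ops`, and `tpp` are hypothetical and not part of the dataset or any Epoch AI tooling. TF32 is left out of the sketch because the definition above does not state which bit width to assign it; that omission is an assumption of this example, not a statement about how the published values are computed.

```python
import math

# Example values copied from the NVIDIA A100 SXM4 40 GB column of the table.
# Each entry maps a format name to (bit width, operations per second).
# NaN marks a field with no reported value. TF32 is omitted (assumption:
# its bit width for this calculation is not given in the field definition).
A100_FORMATS = {
    "FP64": (64, 9.7e12),
    "FP32": (32, 1.9e13),
    "FP16": (16, 7.8e13),
    "Tensor-FP16/BF16": (16, 3.1e14),
    "INT16": (16, math.nan),
    "INT8": (8, 6.2e14),
    "INT4": (4, math.nan),
}


def ml_ops(formats):
    """Maximum OP/s over formats that are 8 bits or wider, ignoring NaN."""
    values = [
        ops
        for bits, ops in formats.values()
        if bits >= 8 and not math.isnan(ops)
    ]
    return max(values) if values else math.nan


def tpp(formats):
    """Maximum of (OP/s x bit width) over all reported formats, ignoring NaN."""
    values = [
        bits * ops
        for bits, ops in formats.values()
        if not math.isnan(ops)
    ]
    return max(values) if values else math.nan


if __name__ == "__main__":
    print(f"ML OP/s: {ml_ops(A100_FORMATS):.1e}")
    print(f"TPP (bit-OP/s): {tpp(A100_FORMATS):.1e}")
```

With these rounded inputs, the INT8 and Tensor-FP16/BF16 rows tie for the largest bit-weighted value (8 × 6.2e14 and 16 × 3.1e14), which agrees with the 5.0e+15 listed in the table after rounding.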
