This dataset has fields containing various processor details, attributes, and specifications. Records in the dataset have information about three broad areas:
- Specifications about the processors, such as their clock speed, memory capacity, and performance.
- Provenance details, such as the manufacturer and release date.
- Metadata, such as sources containing information about the hardware, and a list of models it has been used to produce.
We provide a comprehensive guide to the data fields below. The examples are taken from the NVIDIA A100 SXM4 40 GB datacenter GPU, one of the most popular pieces of hardware used for machine learning. To request that a field be added, contact us at data@epoch.ai.
| Column | Type | Definition | Example from NVIDIA A100 SXM4 40 GB |
|---|---|---|---|
Hardware name | Text | The full name of the hardware, including the manufacturer. For example, “Google TPU v5p”. Note that there can be different variations of hardware based on similar chips, which should be named distinctly, for example “NVIDIA H100 SXM5 80GB” versus “NVIDIA H100 PCIe”. | NVIDIA A100 SXM4 40 GB |
Manufacturer | Text | The manufacturer of the hardware. | NVIDIA |
Type | Text | Indicates whether the hardware is a central processing unit (CPU), graphics processing unit (GPU), or tensor processing unit (TPU). For a small number of other experimental accelerators, such as the Meta MTIA series, this is “Other”. | GPU |
Release date | Date | The date when the hardware could first be rented, used for machine learning workloads, or purchased (excluding pre-orders). | 2020-05-14 |
Release price (USD) | Numeric | Price of the processor when released, in nominal US dollars. Prices are collected from hardware catalogs, news sources, or other documentation. Listed prices do not reflect bulk discounts. | $15,000 |
FP64 (double precision) performance (FLOP/s) | Numeric | These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. | 9.7e+12 |
FP32 (single precision) performance (FLOP/s) | Numeric | These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. | 1.9e+13 |
FP16 (half precision) performance (FLOP/s) | Numeric | These are performance figures for non-tensor operations, at different numerical precisions. Beginning in 2017, ML hardware added tensor cores specifically to optimize tensor operations, which are commonly used in AI training. | 7.8e+13 |
TF32 (TensorFloat-32) performance (FLOP/s) | Numeric | These are performance figures for tensor operations, specifically optimized for AI training. | 1.6e+14 |
Tensor-FP16/BF16 performance (FLOP/s) | Numeric | These are performance figures for tensor operations, specifically optimized for AI training. | 3.1e+14 |
INT16 performance (OP/s) | Numeric | These are performance figures for integer operations, at different numerical precisions. | NaN |
INT8 performance (OP/s) | Numeric | These are performance figures for integer operations, at different numerical precisions. | 6.2e+14 |
INT4 performance (OP/s) | Numeric | These are performance figures for integer operations, at different numerical precisions. | NaN |
Memory size per board (byte) | Numeric | The hardware’s amount of memory, in bytes. | 4.0e+10 |
Memory bandwidth (byte/s) | Numeric | Rate of data transfer between memory and processor, in bytes per second. | 1.6e+12 |
ML OP/s | Numeric | Maximum performance in any format 8 bits or wider, in units of FLOP/s or OP/s. | 6.2e+14 |
Intranode bandwidth (byte/s) | Numeric | Data transfer rate within a single node, in bytes per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc. | 6.0e+11 |
Internode bandwidth (bit/s) | Numeric | Data transfer rate between separate nodes, in bits per second. Nodes typically consist of servers which may contain CPUs, GPUs, memory, storage, etc. | 2.0e+11 |
Die size (mm^2) | Numeric | The physical size or area of the processing chip, in square millimeters. | 826 |
TDP (W) | Numeric | Thermal design power, the theoretical maximum power that can be dissipated as heat. In theory, this is the maximum sustainable power draw for a given chip. | 400 |
Base clock (MHz) | Numeric | Default operating frequency of the processor, in megahertz. | 1095 |
Boost clock (MHz) | Numeric | Maximum operating frequency of the processor, in megahertz. | 1410 |
Memory clock (MHz) | Numeric | Operating frequency of the processor’s memory, in megahertz. | 1215 |
Memory bus (bit) | Numeric | Amount of data that can be transferred between the memory and processor per cycle, in bits. | 5120 |
Tensor cores | Numeric | Number of tensor cores, a specialized NVIDIA hardware component designed to accelerate matrix and tensor operations. | 432 |
Process size (nm) | Numeric | Nominal semiconductor manufacturing scale, in nanometers. | 7 |
Foundry | Text | The semiconductor manufacturer responsible for producing the processor die or chip in a foundry or fabrication plant. | TSMC |
Number of transistors (millions) | Numeric | Number of transistors in the processor, in millions. | 54200 |
Link to datasheet | URL | Links to document(s) containing specifications or data about the processor. | |
Source for the price | URL | Link to source(s) listing the price of the hardware. | |
ML models | Categorical (multiple select) | ML models trained with this hardware, cross-referenced from our database of models. | Florence, Luminous-supreme, Falcon-180B, GPT-3.5 (davinci-002), GPT-4 (Mar 2023), StableLM-Base-Alpha-7B, Phi-1.5, WeLM, GLM-130B, BlenderBot 3, GPT-NeoX-20B, TinyLlama-1.1B (1T token checkpoint), TinyLlama-1.1B (3T token checkpoint), StableLM-2-1.6B, DINOv2, Stable Code 3B, Falcon-7B, Qarasu-14B, Flan T5-XXL + BLIP-2, BLIP-2 (Q-Former), Swin Transformer V2 (SwinV2-G), SPHINX (Llama 2 13B), EVA-01, CoRe, InstructBLIP, xTrimoPGLM -100B, MPT-7B, Pythia-12b, Pythia-2.8b, Pythia-6.9b, Pythia-160m, Pythia-1b, Pythia-1.4b, Pythia-70m, Pythia-410m, PLaMo-13B, Falcon 2 11B, Janus 1.3B, Luminous-extended, Luminous-base, TeleChat-7B, TeleChat-3B, TeleChat-12B, aiXcoder-7B Base, Janus-Pro-7B, Janus-Pro-1B, SEA-LION V1 3B, SEA-LION V1 7B, Llama-SEA-LION-v2-8B-IT, Novae, HelixProtX, Sailor-7B-Chat, SEA-LION-v1-7B-IT, SGPT BE 5.8B, ToolFormer, Stable Diffusion 2.1, AntiFormer, GPT-2 Medium (FlashAttention), Llemma 7B, Llemma 34B, Teuken 7B, Aquila2 34B, Aquila2‑70B‑Expr, Deepseek OCR, GPT-4 (Jun 2023) |
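
The "ML OP/s" definition and the mixed bandwidth units in the table can be illustrated with a short sketch. It is not the dataset's actual derivation, which is not described here. It assumes that a record is a plain dictionary keyed by the column names shown above, that missing values are NaN, and that "8 bits or wider" covers every performance column except INT4. The names `a100`, `WIDE_FORMAT_COLUMNS`, `ml_ops`, and `internode_bytes_per_second` are hypothetical and used only for illustration.

```python
import math

# Example values copied from the NVIDIA A100 SXM4 40 GB column of the table.
a100 = {
    "FP64 (double precision) performance (FLOP/s)": 9.7e12,
    "FP32 (single precision) performance (FLOP/s)": 1.9e13,
    "FP16 (half precision) performance (FLOP/s)": 7.8e13,
    "TF32 (TensorFloat-32) performance (FLOP/s)": 1.6e14,
    "Tensor-FP16/BF16 performance (FLOP/s)": 3.1e14,
    "INT16 performance (OP/s)": math.nan,
    "INT8 performance (OP/s)": 6.2e14,
    "INT4 performance (OP/s)": math.nan,
    "Intranode bandwidth (byte/s)": 6.0e11,
    "Internode bandwidth (bit/s)": 2.0e11,
}

# Assumption: these are the formats that count as "8 bits or wider".
# INT4 is deliberately left out because it is narrower than 8 bits.
WIDE_FORMAT_COLUMNS = [
    "FP64 (double precision) performance (FLOP/s)",
    "FP32 (single precision) performance (FLOP/s)",
    "FP16 (half precision) performance (FLOP/s)",
    "TF32 (TensorFloat-32) performance (FLOP/s)",
    "Tensor-FP16/BF16 performance (FLOP/s)",
    "INT16 performance (OP/s)",
    "INT8 performance (OP/s)",
]


def ml_ops(record):
    """Return the highest known performance across the wide formats, or NaN."""
    values = [record.get(column, math.nan) for column in WIDE_FORMAT_COLUMNS]
    known = [value for value in values if not math.isnan(value)]
    return max(known) if known else math.nan


def internode_bytes_per_second(record):
    """Convert internode bandwidth from bit/s to byte/s (8 bits per byte)."""
    return record["Internode bandwidth (bit/s)"] / 8


print(ml_ops(a100))
print(internode_bytes_per_second(a100))
```

For the A100 record, the maximum comes from the INT8 column, which matches the 6.2e+14 listed for "ML OP/s" in the table. The conversion matters because intranode bandwidth is given in bytes per second while internode bandwidth is given in bits per second, so the two columns are only directly comparable after dividing the latter by 8.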
The Machine Learning Hardware dataset is a collection of AI accelerators, such as graphics processing units (GPUs) and tensor processing units (TPUs), used to develop and deploy machine learning models in the deep learning era.