AI Companies Documentation
Overview
Epoch’s AI Companies Dataset is a collection of key economic data on some of the most important frontier AI companies, including data on revenue, funding, staff counts, compute spending, and usage.
This documentation describes which companies the dataset includes, along with its records, data fields, and definitions, and provides a changelog and acknowledgements.
The data is available on our website as a visualization or table, and is available for download as a CSV file, updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
If you would like to ask any questions about the data, or suggest companies that should be added, feel free to contact us at data@epoch.ai.
If this dataset is useful for you, please cite it.
Use This Work
Epoch’s data is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons Attribution license.
Citation
Josh You, John Croxton, Venkat Somala, Yafah Edelman, ‘Data on AI Companies’. Published online at epoch.ai. Retrieved from: ‘https://epoch.ai/data/ai-companies’ [online resource], accessed
BibTeX citation
@misc{EpochAICompanies2025,
title = {Data on AI Companies},
author = {Josh You and John Croxton and Venkat Somala and Yafah Edelman},
year = {2025},
month = {09},
url = {https://epoch.ai/data/ai-companies},
note = {Accessed: }
}
Inclusion
For the initial phase of the AI Companies Dataset, we focus on foundation model developers, or AI developers for whom training their own models is a core business priority.
There are currently no formal criteria for selection: we prioritized companies if their models are near the frontier in general-purpose AI capabilities, or if they are among the most commercially significant AI companies.
Other major types of AI companies include: AI application developers that use third-party models (e.g. Anysphere and Perplexity), cloud compute companies (e.g. Microsoft and Amazon), semiconductor companies (e.g. NVIDIA and TSMC), and AI data vendors (e.g. Scale AI and Mercor).
Records
The AI companies database contains key economic data about many frontier AI companies. Records in the database have information covering the following:
Financials including their revenue, funding and valuations, and spending
Operations such as staff counts and product usage
Metadata such as notes on the above fields with supporting evidence and context, our confidence in the key figures, etc.
Fields
We provide a comprehensive guide to the database’s fields below. This includes example field values as reference, which are taken from OpenAI unless otherwise indicated. Note that for each company, we collected rich time-series data on the company’s revenue, valuation, etc. We store these as individual records in separate tables, detailed below.
If you would like to ask any questions about the database, or request a field that should be added, feel free to contact us at data@epoch.ai.
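Because the time-series records live in separate tables, a common first step is to link each record back to its row in the Companies table through the Company field. The sketch below illustrates one way to do that with Python's standard library. It assumes that the CSV headers match the Column names documented below, and it uses small inline strings (built from the example values in this documentation) in place of the downloaded files; the helper names `read_rows` and `attach_company` are hypothetical.

```python
import csv
import io

# Inline stand-ins for two of the downloaded CSV files. The values come from
# the example rows in this documentation; the headers are ASSUMED to match
# the documented column names.
COMPANIES_CSV = """Name,Company type,Founding date
OpenAI,Foundation,2015-12-01
"""

STAFF_COUNTS_CSV = """Id,Company,Staff count,Type,Date
"2016, 52",OpenAI,52,Full company,2016-12-31
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_company(records, companies):
    """Add the matching Companies row to each record via its Company field."""
    by_name = {row["Name"]: row for row in companies}
    return [{**rec, "company_record": by_name.get(rec["Company"])} for rec in records]


companies = read_rows(COMPANIES_CSV)
staff = attach_company(read_rows(STAFF_COUNTS_CSV), companies)
for rec in staff:
    founded = rec["company_record"]["Founding date"]
    print(rec["Company"], rec["Date"], rec["Staff count"], "founded", founded)
```

With real files, `read_rows` could be swapped for `csv.DictReader` over an opened file. Records whose Company value has no match get `None` as their `company_record`, so check for that before indexing.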
Companies
| Column | Type | Definition | Example value | Coverage |
|---|---|---|---|---|
| Name | Text | The name of the company. | OpenAI | 100% (9 out of 9 records) |
| Company type | Categorical (multiple select) | The primary business focus of the AI company. Possible values: | Foundation | 100% (9 out of 9 records) |
| Founding date | Date | The date when the company was officially established or incorporated. | 2015-12-01 | 78% (7 out of 9 records) |
| Founding source/ notes | Text | Source for information on founding date. | https://openai.com/index/introducing-openai/ | 78% (7 out of 9 records) |
Revenue reports
| Column | Type | Definition | Example value | Coverage |
|---|---|---|---|---|
| Id | Text | Short description of record. | 2022 28M | 98% (41 out of 42 records) |
| Company | Link to record | The AI company. Linked field to Companies table. | OpenAI | 100% (42 out of 42 records) |
| Date | Date | The as-of date for the reported revenue figure. | 2022-12-31 | 95% (40 out of 42 records) |
| Type | Categorical (multiple select) | The classification of the revenue figure based on its scope and calculation method. Options include: | Yearly revenue | 95% (40 out of 42 records) |
| Revenue info (non-annualized) | Text | Revenue data other than annualized revenue/ annual recurring revenue. | $28,000,000 revenue in 2022 | 14% (6 out of 42 records) |
| Source 1 | Link | Link to the primary source. | https://www.theinformation.com/articles/openai-passes-1-billion-revenue-pace-as-big-companies-boost-ai-spending?rc=9mzoog | 98% (41 out of 42 records) |
| Source 2 | Link | Additional source, if any. | [empty] | 12% (5 out of 42 records) |
| Source 3 | Link | Additional source, if any. | [empty] | 2% (1 out of 42 records) |
| Source type | Categorical (single select) | Type of source for the primary source. Options include: | Media report | 98% (41 out of 42 records) |
| Notes | Text | Notes documenting the reasoning and/or evidence for the revenue estimate, including relevant quotes. | "OpenAI generated just $28 million in revenue last year before it started charging for its groundbreaking chatbot, ChatGPT" | 98% (41 out of 42 records) |
| Report date | Date | Publication date for the primary source. | 2023-08-29 | 98% (41 out of 42 records) |
| Confidence | Categorical (single select) | Metadata describing our confidence in the recorded revenue figure, as a function of the credibility of the source and specificity of the report. | Likely | 98% (41 out of 42 records) |
| Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 17% (7 out of 42 records) |
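The Revenue info (non-annualized) field is free text, so any numeric analysis has to extract the figure first. The following is a minimal sketch of one way to pull a dollar amount out of a value shaped like the documented example. The helper name `parse_usd` is hypothetical, and the pattern only handles digit-and-comma amounts written after a dollar sign; entries phrased differently (for example with a "million" suffix) would need extra handling.

```python
import re


def parse_usd(text):
    """Return the first dollar amount in text as a float, or None if absent."""
    match = re.search(r"\$([\d,]+(?:\.\d+)?)", text)
    if match is None:
        return None
    # Drop thousands separators before converting.
    return float(match.group(1).replace(",", ""))


# Example value taken from the Revenue reports table above.
example = "$28,000,000 revenue in 2022"
amount = parse_usd(example)
print(amount)
```

Returning `None` for text without a dollar amount keeps empty or unparseable cells distinguishable from a genuine zero.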
Funding rounds
| Column | Type | Definition | Example value | Coverage |
|---|---|---|---|---|
| Id | Text | Short description of record. | OpenAI–Nvidia | 100% (39 out of 39 records) |
| Company | Link to record | The AI company. Linked field to Companies table. | OpenAI | 100% (39 out of 39 records) |
| Close date | Date | The date when the funding round was closed (completed). | [empty] | 92% (36 out of 39 records) |
| Status | Categorical (single select) | The current stage of the funding round process. Options include: | Late discussions | 100% (39 out of 39 records) |
| Funding (equity) | Currency | Equity funding raised for the company, in USD. Includes convertible notes (debt that can be converted to equity), but not conventional debt. If it is a secondary share sale, the field will be equal to 0. | 100000000000.0 | 95% (37 out of 39 records) |
| Funding (debt) | Number | Debt funding raised for the company, in USD. Includes loans, credit facilities, and bonds, but excludes convertible notes (which are classified as equity funding). | [empty] | 10% (4 out of 39 records) |
| Valuation (post-money) | Number | The post-money valuation of the company, in USD, which includes the new money raised in a funding round. | [empty] | 72% (28 out of 39 records) |
| Report date | Date | Publication date of the primary source. | 2025-09-22 | 95% (37 out of 39 records) |
| Source 1 | Link | Link to the source of the funding information, preferably an official announcement. | https://openai.com/index/openai-nvidia-systems-partnership/ | 100% (39 out of 39 records) |
| Source 2 | Link | Additional source, if any. | https://fortune.com/2025/12/02/nvidia-openai-deal-not-signed-yet-100-billion-rally-colette-kress/ | 36% (14 out of 39 records) |
| Source 3 | Link | Additional source, if any. | [empty] | 15% (6 out of 39 records) |
| Notes | Text | Metadata documenting the reasoning and/or evidence for the funding figure. Often includes direct quotes from the source that contains the funding figure. | Nvidia has agreed to invest $100B in OpenAI as part of an AI data center deal. This will be phased in as each GW is deployed, so nothing is closed yet. "Strategic partnership enables OpenAI to build and deploy at least 10 gigawatts of AI datacenters with NVIDIA systems representing millions of GPUs for OpenAI’s next-generation AI infrastructure. To support the partnership, NVIDIA intends to invest up to $100 billion in OpenAI progressively as each gigawatt is deployed. The first gigawatt of NVIDIA systems will be deployed in the second half of 2026 on NVIDIA’s Vera Rubin platform." As of December 2025, deal has not been formally signed: "Speaking Tuesday at the UBS Global Technology and AI Conference in Scottsdale, Nvidia EVP and CFO Colette Kress told investors that the much-hyped OpenAI partnership is still at the letter-of-intent stage. “We still haven’t completed a definitive agreement,” Kress said when asked how much of the 10-gigawatt commitment is actually locked in." | 100% (39 out of 39 records) |
| Type | Categorical (multiple select) | Type of funding. Options include: | Primary | 95% (37 out of 39 records) |
| Confidence | Categorical (single select) | Metadata describing our confidence in the recorded funding figure, as a function of the credibility of the source and specificity of the report. | Confident | 97% (38 out of 39 records) |
| Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 13% (5 out of 39 records) |
| Exclude from graph view | Checkbox | Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view. | [empty] | 10% (4 out of 39 records) |
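Since the post-money valuation includes the new money raised in the round, a pre-money figure can be approximated by subtracting the equity raised. The sketch below shows that arithmetic under two assumptions that are not stated in this documentation: that only Funding (equity) counts as new money for this purpose, and that empty CSV cells arrive as empty strings. The helper names and the first set of figures are hypothetical.

```python
def to_float(value):
    """Convert a CSV cell to float; empty cells become None."""
    value = (value or "").strip()
    return float(value) if value else None


def pre_money_valuation(row):
    """Post-money valuation minus new equity raised, or None if either is missing."""
    post = to_float(row.get("Valuation (post-money)"))
    equity = to_float(row.get("Funding (equity)"))
    if post is None or equity is None:
        return None
    return post - equity


# Hypothetical figures for illustration only; NOT taken from the dataset.
hypothetical_round = {
    "Funding (equity)": "2000000000.0",
    "Valuation (post-money)": "20000000000.0",
}
# Mirrors the documented OpenAI–Nvidia example, where the valuation is empty.
open_round = {"Funding (equity)": "100000000000.0", "Valuation (post-money)": ""}

print(pre_money_valuation(hypothetical_round))
print(pre_money_valuation(open_round))
```

For a secondary share sale, where Funding (equity) is 0, this calculation returns the post-money valuation unchanged, which matches the fact that no new money enters the company.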
Staff counts
| Column | Type | Definition | Example value | Coverage |
|---|---|---|---|---|
| Id | Text | Short description of record. | 2016, 52 | 100% (44 out of 44 records) |
| Company | Categorical (single select) | The AI company. Linked field to Companies table. | OpenAI | 100% (44 out of 44 records) |
| Staff count | Number | The number of employees or staff. | 52 | 100% (44 out of 44 records) |
| Type | Categorical (single select) | The scope or category of employees included in Staff Count. Options include: | Full company | 98% (43 out of 44 records) |
| Division name | Text | The specific division the staff count corresponds to, if applicable, further contextualizing the staff count if the Type is not Full company. | [empty] | 36% (16 out of 44 records) |
| Date | Date | As-of date for the staff count information. | 2016-12-31 | 100% (44 out of 44 records) |
| Report date | Date | Publication date of primary source. | 2016-12-31 | 100% (44 out of 44 records) |
| Source 1 | Link | Link to primary source. | https://projects.propublica.org/nonprofits/organizations/810861541/201703459349300445/full | 100% (44 out of 44 records) |
| Source 2 | Link | Additional source, if any. | https://www.nytimes.com/2018/04/19/technology/artificial-intelligence-salaries-openai.html | 9% (4 out of 44 records) |
| Source type | Categorical (single select) | Type of source for the primary source. Options include: | Company disclosure | 98% (43 out of 44 records) |
| Notes | Text | Metadata documenting the reasoning and/or evidence for the staff count estimate. Often includes direct quotes from the source that contains the staff count information. | The OpenAI nonprofit's 990 form for 2016 lists "Total number of individuals employed in calendar year 2016" as 52. NYT also repeats this claim, likely using the same source | 100% (44 out of 44 records) |
| Confidence | Categorical (single select) | Metadata describing our confidence in the recorded staff count figure, as a function of the credibility of the source and specificity of the report. | Confident | 98% (43 out of 44 records) |
| Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 14% (6 out of 44 records) |
Usage reports
| Column | Type | Definition | Example value | Coverage |
|---|---|---|---|---|
| Id | Text | Short description of record. | ChatGPT 50M WAU | 100% (47 out of 47 records) |
| Company | Categorical (single select) | The AI company. Linked field to Companies table. | OpenAI | 100% (47 out of 47 records) |
| Active Users | Number | The number of users for the company’s AI-related product(s). | 50000000.0 | 64% (30 out of 47 records) |
| Active users time period | Number | The time period (Daily, Weekly, or Monthly) over which the reported active users metric is measured. Note that it is not possible to calculate e.g. daily active users from monthly active users or vice versa, while messages or tokens per period can be normalized into different time periods. | Weekly | 64% (30 out of 47 records) |
| Daily messages | Link | Daily user-sent messages, queries, requests, etc. Can be normalized from a report of weekly or monthly messages. | [empty] | 19% (9 out of 47 records) |
| Daily tokens | Number | Daily input and output tokens processed. Can be normalized from a report of weekly or monthly tokens. | [empty] | 26% (12 out of 47 records) |
| Product | Text | The specific product or service, if any, that this usage report applies to. Includes subscription tier, if relevant. | ChatGPT | 100% (47 out of 47 records) |
| Date | Date | As-of date for the information in the report. | 2023-03-25 | 100% (47 out of 47 records) |
| Report date | Date | Publication date of the primary source. | 2025-09-14 | 100% (47 out of 47 records) |
| Source 1 | Link | Link to the primary source of the usage information. | https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf | 100% (47 out of 47 records) |
| Source 2 | Link | Additional source, if any. | https://www.nber.org/papers/w34255 | 40% (19 out of 47 records) |
| Source type | Categorical (single select) | Type of source for the primary source. Options include: | Company disclosure | 100% (47 out of 47 records) |
| Notes | Text | Metadata documenting the reasoning and/or evidence for the usage estimates. Often includes direct quotes from the source that contains the information. | From Figure 3, weekly active ChatGPT users on consumer plans (Free, Plus, Pro). Number extracted using plotdigitizer and rounded to the nearest 5M. | 100% (47 out of 47 records) |
| Confidence | Categorical (single select) | Metadata describing our confidence in the recorded figure, as a function of the credibility of the source and specificity of the report. | Likely | 100% (47 out of 47 records) |
| Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 13% (6 out of 47 records) |
| Exclude from graph view | Checkbox | Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view. | [empty] | 13% (6 out of 47 records) |
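The Daily messages and Daily tokens fields may be normalized from weekly or monthly reports. A minimal sketch of that kind of normalization is shown here. The 30-day month is an assumption made for illustration, since the convention actually used is not stated in this documentation, and the names `DAYS_PER_PERIOD` and `to_daily` are hypothetical.

```python
# Days per reporting period. Treating a month as 30 days is an ASSUMPTION
# made here for illustration only.
DAYS_PER_PERIOD = {"Daily": 1, "Weekly": 7, "Monthly": 30}


def to_daily(total, period):
    """Convert a per-period volume (messages or tokens) to a daily average."""
    return total / DAYS_PER_PERIOD[period]


# Hypothetical report: 7 billion messages sent in one week.
print(to_daily(7_000_000_000, "Weekly"))
```

This division is only meaningful for volumes such as messages and tokens. As noted in the Active users time period definition, active user counts cannot be converted between periods this way, because the same user can be active on many days.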
Compute spend
| Column | Type | Definition | Example value | Coverage |
|---|---|---|---|---|
| id | Text | Short description of record. | OpenAI 2022 compute + data | 100% (11 out of 11 records) |
| Company | Link to record | The AI company. Links to AI Companies table. | OpenAI | 100% (11 out of 11 records) |
| Category | Categorical (multiple select) | The category or type of spending being reported. Options include: | R&D cloud compute,Inference cloud compute,Data | 100% (11 out of 11 records) |
| Period type | Categorical (single select) | The time frame or reporting period to which the spending amount corresponds. | Year | 100% (11 out of 11 records) |
| Source 1 | Link | Link to the primary source of the information. | https://fortune.com/longform/chatgpt-openai-sam-altman-microsoft/ | 100% (11 out of 11 records) |
| Source 2 | Link | Additional source, if any. | [empty] | 36% (4 out of 11 records) |
| Source 3 | Link | Additional source, if any. | [empty] | 18% (2 out of 11 records) |
| Source type | Categorical (single select) | Type of source for the primary source. Options include: | Media report | 100% (11 out of 11 records) |
| Notes | Text | Metadata documenting the reasoning and/or evidence for the reported compute spend. Often includes direct quotes from the source that contains the information. | OpenAI spend "416M on "computing and data", out of ~500M in total spending. "But it was projecting expenses of $416.45 million on computing and data, $89.31 million on staff, and $38.75 million in unspecified other operating expenses" | 100% (11 out of 11 records) |
| Date | Date | The as-of date of the reported figure. For annual spending, this uses the last date of the year. | 2022-12-31 | 100% (11 out of 11 records) |
| Report date | Date | The date when the primary source was published. | 2023-05-04 | 91% (10 out of 11 records) |
| Confidence | Categorical (single select) | Metadata describing our confidence in the recorded figure, as a function of the credibility of the source and specificity of the report. | Likely | 100% (11 out of 11 records) |
| Exclude from graph view | Checkbox | Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view. | [empty] | 27% (3 out of 11 records) |
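Multiple-select fields such as Category appear in the example above as a single comma-separated string. The sketch below shows one way to split such a value into its individual options, assuming (based only on that example) that options are joined with commas and that no option name contains a comma. The helper name `split_categories` is hypothetical.

```python
def split_categories(cell):
    """Split a multiple-select cell into a list of option names."""
    return [part.strip() for part in cell.split(",") if part.strip()]


# Example value taken from the Compute spend table above.
cats = split_categories("R&D cloud compute,Inference cloud compute,Data")
print(cats)

# Membership tests are then exact, rather than substring matches on the raw cell.
has_inference = "Inference cloud compute" in cats
print(has_inference)
```

Splitting first avoids false positives from substring checks, for example matching "Data" inside a longer option name.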
Downloads
Download the AI Companies dataset as individual CSV files for specific data types, or as a complete package containing all datasets.
Companies dataset
CSV, Updated January 10, 2026
Revenue reports
CSV, Updated January 10, 2026
Funding rounds
CSV, Updated January 9, 2026
Staff counts
CSV, Updated January 10, 2026
Usage reports
CSV, Updated January 10, 2026
Compute spend
CSV, Updated October 1, 2025
AI companies ZIP
ZIP, Updated January 10, 2026
Acknowledgements
This data was collected by Epoch AI’s employees and collaborators, including John Croxton, Josh You, Venkat Somala, and Yafah Edelman.
This documentation was written by Josh You and Venkat Somala.