AI Companies Documentation
Overview
Epoch’s AI Companies Dataset is a collection of key economic data on some of the most important frontier AI companies, including data on revenue, funding, staff counts, compute spending, and usage.
This documentation describes which companies are included in the dataset and defines its records and data fields. It also includes a changelog and acknowledgements.
The data is available on our website as a visualization or table, and can be downloaded as CSV files, updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
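As a minimal sketch of working with a downloaded file, the snippet below parses CSV text using only Python's standard library. The inline sample stands in for a real download. Its header names mirror the Revenue reports fields described in this documentation, but the exact headers in the published CSV are an assumption.

```python
import csv
import io

# Stand-in for a downloaded revenue-reports CSV. The header names mirror the
# field names in this documentation; the real file's headers may differ.
SAMPLE_CSV = """Id,Company,Date,Revenue level (normalized to annual),Type,Confidence
2022 28M,OpenAI,2022-12-31,28000000.0,Yearly revenue,Likely
"""

def load_records(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

records = load_records(SAMPLE_CSV)
first = records[0]
# Numeric fields arrive as strings and need explicit conversion.
revenue_usd = float(first["Revenue level (normalized to annual)"])
print(first["Company"], first["Date"], revenue_usd)
```

To read a file from disk instead, pass the contents of `open(path, newline="", encoding="utf-8").read()` to `load_records`.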
If you would like to ask any questions about the data, or suggest companies that should be added, feel free to contact us at data@epoch.ai.
If this dataset is useful for you, please cite it.
Use This Work
Epoch’s data is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons Attribution license.
Citation
Josh You, John Croxton, Venkat Somala, Yafah Edelman, ‘Data on AI Companies’. Published online at epoch.ai. Retrieved from: ‘https://epoch.ai/data/ai-companies’ [online resource], accessed
BibTeX citation
@misc{EpochAICompanies2025,
title = {Data on AI Companies},
author = {Josh You and John Croxton and Venkat Somala and Yafah Edelman},
year = {2025},
month = {09},
url = {https://epoch.ai/data/ai-companies},
note = {Accessed: }
}
Inclusion
For the initial phase of the AI Companies Dataset, we focus on foundation model developers, that is, AI developers for whom training their own models is a core business priority.
There are currently no formal criteria for selection: we prioritized companies whose models are near the frontier in general-purpose AI capabilities, or that are among the most commercially significant AI companies.
Other major types of AI companies include: AI application developers that use third-party models (e.g. Anysphere and Perplexity), cloud compute companies (e.g. Microsoft and Amazon), semiconductor companies (e.g. NVIDIA and TSMC), and AI data vendors (e.g. Scale AI and Mercor).
Records
The AI companies database contains key economic data about many frontier AI companies. Records in the database have information covering the following:
Financials including their revenue, funding and valuations, and spending
Operations such as staff counts and product usage
Metadata such as notes on the above fields with supporting evidence and context, our confidence in the key figures, etc.
Fields
We provide a comprehensive guide to the database’s fields below. This includes example field values as reference, which are taken from OpenAI unless otherwise indicated. Note that for each company, we collected rich time-series data on the company’s revenue, valuation, etc. We store these as individual records in separate tables, detailed below.
If you would like to ask any questions about the database, or request a field that should be added, feel free to contact us at data@epoch.ai.
Companies
Column | Type | Definition | Example value | Coverage |
---|---|---|---|---|
Name | Text | The name of the company. | OpenAI | 100% 7 out of 7 records |
Company type | Categorical (multiple select) | The primary business focus of the AI company. Possible values: Foundation (develops their own AI models); Application (serves products powered by AI models, including third-party models); AI Cloud (rents out AI chips/compute); AI Vendor (sells data or other AI-related services to other companies). | Foundation | 100% 7 out of 7 records |
Founding date | Date | The date when the company was officially established or incorporated. | 2015-12-01 | 100% 7 out of 7 records |
Founding source/ notes | Text | Source for information on founding date. | https://openai.com/index/introducing-openai/ | 100% 7 out of 7 records |
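The Coverage column in these tables can be reproduced with a short calculation. The sketch below assumes that coverage is the share of records with a non-empty value, rounded to the nearest percent. This reading is inferred from the figures shown in the tables rather than stated by the authors, and the function name is illustrative.

```python
def coverage(values):
    """Return (percent, filled, total) for a column, treating "" and None as empty."""
    total = len(values)
    filled = sum(1 for v in values if v not in ("", None))
    percent = round(100 * filled / total) if total else 0
    return percent, filled, total

# 29 filled dates out of 30 records, as for the Date field of Revenue reports.
dates = ["2022-12-31"] * 29 + [""]
print("%d%% %d out of %d records" % coverage(dates))  # 97% 29 out of 30 records
```

Checking coverage before an analysis shows which fields are too sparse to rely on.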
Revenue reports
Column | Type | Definition | Example value | Coverage |
---|---|---|---|---|
Id | Text | Short description of the record. | 2022 28M | 100% 30 out of 30 records |
Company | Link to record | The AI company. Linked field to the Companies table. | OpenAI | 100% 30 out of 30 records |
Date | Date | The as-of date for the reported revenue figure. | 2022-12-31 | 97% 29 out of 30 records |
Revenue level (normalized to annual) | Currency | Revenue rate in USD as indicated by the report, normalized to an annual basis. Note that this is different from a direct report of annualized revenue run rate or annual recurring revenue; for example, this field can include revenue for a full calendar year. | 28000000.0 | 97% 29 out of 30 records |
Type | Categorical (multiple select) | The classification of the revenue figure based on its scope and calculation method. Options include: Annual recurring revenue (ARR) - revenue from recurring or predictable sources, extrapolated over the next year. Annualized revenue - revenue rate extrapolated from shorter periods over the next year. Yearly revenue - actual revenue over one year. Quarterly revenue - revenue for a quarter. Product/division revenue - revenue from a specific product line or business unit. | Yearly revenue | 97% 29 out of 30 records |
Revenue info (non-annualized) | Text | Revenue data other than annualized revenue/annual recurring revenue. | $28,000,000 revenue in 2022 | 13% 4 out of 30 records |
Source 1 | Link | Link to the primary source. | https://www.theinformation.com/articles/openai-passes-1-billion-revenue-pace-as-big-companies-boost-ai-spending?rc=9mzoog | 100% 30 out of 30 records |
Source 2 | Link | Additional source, if any. | [empty] | 13% 4 out of 30 records |
Source 3 | Link | Additional source, if any. | [empty] | 3% 1 out of 30 records |
Source type | Categorical (single select) | Type of source for the primary source. Options include: Company disclosure: official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine. | Media report | 100% 30 out of 30 records |
Notes | Text | Notes documenting the reasoning and/or evidence for the revenue estimate, including relevant quotes. | "OpenAI generated just $28 million in revenue last year before it started charging for its groundbreaking chatbot, ChatGPT" | 100% 30 out of 30 records |
Report date | Date | Publication date for the primary source. | 2023-08-29 | 100% 30 out of 30 records |
Confidence | Categorical (single select) | Metadata describing our confidence in the recorded revenue figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on credible but secondary sources such as news reports with insider sources. Uncertain - based on a less credible source, or a reported claim with significant uncertainty. | Likely | 100% 30 out of 30 records |
Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 3% 1 out of 30 records |
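The "normalized to annual" convention for the revenue level can be illustrated with a small sketch. It assumes normalization is a simple linear scaling of the reported period to twelve months (a quarter times 4, a month times 12). How any particular record was derived is documented in its Notes field, and the names below are illustrative rather than part of the dataset.

```python
# Number of reporting periods per year, by period label (labels are illustrative).
PERIODS_PER_YEAR = {"year": 1, "quarter": 4, "month": 12}

def annualize(revenue_usd, period):
    """Scale revenue reported for one period to an annual rate."""
    return revenue_usd * PERIODS_PER_YEAR[period]

# A full-year figure is unchanged, as in the OpenAI 2022 example record.
print(annualize(28_000_000, "year"))      # 28000000
# A hypothetical quarter with $250M revenue implies a $1B annual rate.
print(annualize(250_000_000, "quarter"))  # 1000000000
```

Because the Type field distinguishes realized yearly revenue from extrapolated rates, it is worth keeping alongside the normalized figure when comparing records.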
Funding rounds
Column | Type | Definition | Example value | Coverage |
---|---|---|---|---|
Id | Text | Short description of the record. | 300B final stage | 100% 30 out of 30 records |
Company | Link to record | The AI company. Linked field to the Companies table. | OpenAI | 100% 30 out of 30 records |
Close date | Date | The date when the funding round was closed (completed). | [empty] | 90% 27 out of 30 records |
Status | Categorical (single select) | The current stage of the funding round process. Options include: Closed: Round has completed. Early discussions: Reports of tentative negotiations. Late discussions: Reports of a firm intention among the company and investors, e.g. with a settled valuation and funding amount, but the round has not closed yet. | Late discussions | 100% 30 out of 30 records |
Funding (equity) | Currency | Equity funding raised for the company, in USD. Includes convertible notes (debt that can be converted to equity), but not conventional debt. If it is a secondary share sale, the field will be equal to 0. | 20000000000.0 | 97% 29 out of 30 records |
Funding (debt) | Number | Debt funding raised for the company, in USD. Includes loans, credit facilities, and bonds, but excludes convertible notes (which are classified as equity funding). | [empty] | 10% 3 out of 30 records |
Valuation (post-money) | Number | The post-money valuation of the company, in USD, which includes the new money raised in a funding round. | 300000000000.0 | 70% 21 out of 30 records |
Report date | Date | Publication date of the primary source. | 2025-03-31 | 93% 28 out of 30 records |
Source 1 | Link | Link to the source of the funding information, preferably an official announcement. | https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html | 100% 30 out of 30 records |
Source 2 | Link | Additional source, if any. | [empty] | 30% 9 out of 30 records |
Source 3 | Link | Additional source, if any. | [empty] | 13% 4 out of 30 records |
Notes | Text | Metadata documenting the reasoning and/or evidence for the funding figure. Often includes direct quotes from the source that contains the funding figure. | Softbank will transfer ~20B to OpenAI by 2025-end "The new investment will be made in two parts, according to a person familiar with the deal who spoke on the condition of anonymity. An initial $10 billion will arrive immediately, with another $30 billion arriving by the end of the year, the person said. SoftBank Group is providing 75 percent of the total, with the rest coming from other investors, including Microsoft, Thrive Capital, Coatue and Altimeter, the person said. Microsoft and Thrive Capital led previous investment rounds in OpenAI." | 100% 30 out of 30 records |
Type | Categorical (multiple select) | Type of funding. Options include: Primary - typical funding round where the company sells equity (shares) in exchange for money. Secondary - major sale of existing shares, such as employee shares. This does not lead to more funding for the company. Debt - Funding via borrowing money rather than selling equity. | Primary | 100% 30 out of 30 records |
Confidence | Categorical (single select) | Metadata describing our confidence in the recorded funding figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on credible but secondary sources such as news reports with insider sources. Uncertain - based on a less credible source, or a reported claim with significant uncertainty. | Likely | 100% 30 out of 30 records |
Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 13% 4 out of 30 records |
Exclude from graph view | Checkbox | Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view. | [empty] | 0% 0 out of 30 records |
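Because the valuation field is post-money, it already includes the new money raised. The sketch below shows the standard relationship for recovering an implied pre-money valuation. The function name and figures are hypothetical, and the dataset itself does not include a pre-money field.

```python
def pre_money_valuation(post_money_usd, equity_raised_usd):
    """Pre-money valuation implied by a post-money valuation and the new equity raised."""
    if post_money_usd is None or equity_raised_usd is None:
        return None  # either field can be empty in the dataset
    return post_money_usd - equity_raised_usd

# Hypothetical primary round: $2B raised at a $10B post-money valuation.
print(pre_money_valuation(10e9, 2e9))   # 8000000000.0
# Secondary sale: Funding (equity) is 0, so pre- and post-money coincide.
print(pre_money_valuation(10e9, 0))     # 10000000000.0
```

Returning `None` for missing inputs keeps incomplete records from silently producing a misleading number.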
Staff counts
Column | Type | Definition | Example value | Coverage |
---|---|---|---|---|
Id | Text | Short description of record | 2016, 52 | 100% 40 out of 40 records |
Company | Categorical (single select) | The AI company. Linked field to Companies table | OpenAI | 100% 40 out of 40 records |
Staff count | Number | The number of employees or staff. | 52 | 100% 40 out of 40 records |
Type | Categorical (single select) | The scope or category of employees included in Staff Count. Options include: Full company - count of full-time, permanent staff for the entire company. Full AI division - count of full-time, permanent staff of the company’s primary AI division, where the company is not a “pure-play” AI company. Division/role - staff count in one company division, or one type of role. Contractors - part-time or non-permanent staff, e.g. data contractors. | Full company | 100% 40 out of 40 records |
Division name | Text | The specific division the staff count corresponds to, if applicable, further contextualizing the staff count if the Type is not Full company. | [empty] | 40% 16 out of 40 records |
Date | Date | As-of date for the staff count information. | 2016-12-31 | 100% 40 out of 40 records |
Report date | Date | Publication date of primary source. | 2016-12-31 | 100% 40 out of 40 records |
Source 1 | Link | Link to primary source. | https://projects.propublica.org/nonprofits/organizations/810861541/201703459349300445/full | 100% 40 out of 40 records |
Source 2 | Link | Additional source, if any. | https://www.nytimes.com/2018/04/19/technology/artificial-intelligence-salaries-openai.html | 8% 3 out of 40 records |
Source type | Categorical (single select) | Type of source for the primary source. Options include: Company disclosure: official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine. Other: This can include reports from non-executive employees, or other evidence. | Company disclosure | 100% 40 out of 40 records |
Notes | Text | Metadata documenting the reasoning and/or evidence for the staff count estimate. Often includes direct quotes from the source that contains the staff count information. | The OpenAI nonprofit's 990 form for 2016 lists "Total number of individuals employed in calendar year 2016" as 52 NYT also repeats this claim, likely using the same source | 100% 40 out of 40 records |
Confidence | Categorical (single select) | Metadata describing our confidence in the recorded staff count figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on reliable but secondary sources such as credible news reports. Uncertain - based on a less credible source, or a reported claim with significant uncertainty. | Confident | 100% 40 out of 40 records |
Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 10% 4 out of 40 records |
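Since staff counts of different Type values are not comparable, a headcount time series usually needs to be restricted to one scope. The sketch below filters to "Full company" records and sorts them by date. Only the 2016 OpenAI row comes from this documentation. "ExampleCo" and its figures are hypothetical, as is the function name.

```python
from datetime import date

# Illustrative records; only the 2016 row comes from this documentation.
staff_records = [
    {"Company": "OpenAI", "Staff count": 52, "Type": "Full company", "Date": date(2016, 12, 31)},
    {"Company": "ExampleCo", "Staff count": 300, "Type": "Division/role", "Date": date(2024, 6, 30)},
    {"Company": "ExampleCo", "Staff count": 1200, "Type": "Full company", "Date": date(2024, 6, 30)},
]

def full_company_series(records, company):
    """Return (date, staff count) pairs for whole-company headcounts, oldest first."""
    rows = [r for r in records if r["Company"] == company and r["Type"] == "Full company"]
    return sorted((r["Date"], r["Staff count"]) for r in rows)

print(full_company_series(staff_records, "OpenAI"))
```

The same pattern applies to the Confidence field, for example to keep only records marked Confident or Likely.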
Usage reports
Column | Type | Definition | Example value | Coverage |
---|---|---|---|---|
Id | Text | Short description of the record. | ChatGPT 50M WAU | 100% 33 out of 33 records |
Company | Categorical (single select) | The AI company. Linked field to the Companies table. | OpenAI | 100% 33 out of 33 records |
Active Users | Number | The number of users for the company’s AI-related product(s). | 50000000.0 | 73% 24 out of 33 records |
Active users time period | Categorical (single select) | The time period (Daily, Weekly, or Monthly) over which the reported active users metric is measured. Note that it is not possible to calculate e.g. daily active users from monthly active users or vice versa, while messages or tokens per period can be normalized into different time periods. | Weekly | 73% 24 out of 33 records |
Daily messages | Number | Daily user-sent messages, queries, requests, etc. Can be normalized from a report of weekly or monthly messages. | [empty] | 27% 9 out of 33 records |
Daily tokens | Number | Daily input and output tokens processed. Can be normalized from a report of weekly or monthly tokens. | [empty] | 12% 4 out of 33 records |
Product | Text | The specific product or service, if any, that this usage report applies to. Includes subscription tier, if relevant. | ChatGPT | 100% 33 out of 33 records |
Date | Date | As-of date for the information in the report. | 2023-03-25 | 100% 33 out of 33 records |
Report date | Date | Publication date of the primary source. | 2025-09-14 | 100% 33 out of 33 records |
Source 1 | Link | Link to the primary source of the usage information. | https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf | 100% 33 out of 33 records |
Source 2 | Link | Additional source, if any. | https://www.nber.org/papers/w34255 | 52% 17 out of 33 records |
Source type | Categorical (single select) | Type of source for the primary source. Options include: Company disclosure: official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine. Other: alternative sources such as credible third-party estimates. | Company disclosure | 100% 33 out of 33 records |
Notes | Text | Metadata documenting the reasoning and/or evidence for the usage estimates. Often includes direct quotes from the source that contains the information. | From Figure 3, weekly active ChatGPT users on consumer plans (Free, Plus, Pro). Number extracted using plotdigitizer and rounded to the nearest 5M. | 100% 33 out of 33 records |
Confidence | Categorical (single select) | Metadata describing our confidence in the recorded figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on reliable but secondary sources such as credible news reports. Uncertain - based on a less credible source, or a reported claim with significant uncertainty. | Likely | 100% 33 out of 33 records |
Graph note | Text | Abbreviated note providing additional context for the graph tooltip. | [empty] | 3% 1 out of 33 records |
Exclude from graph view | Checkbox | Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view. | [empty] | 9% 3 out of 33 records |
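The difference between volume metrics and active-user metrics can be made concrete. In the sketch below, message or token totals are converted to a daily average by dividing by the period length, while active users are shown not to scale this way because the same person may be active on several days. The 30-day month, the names, and all figures are assumptions for illustration only.

```python
DAYS_PER_PERIOD = {"Daily": 1, "Weekly": 7, "Monthly": 30}  # 30-day month is an assumption

def to_daily(total, period):
    """Convert a message or token count over a period into a daily average."""
    return total / DAYS_PER_PERIOD[period]

# Hypothetical report: 7 billion messages per week is 1 billion per day.
print(to_daily(7e9, "Weekly"))  # 1000000000.0

# Active users do not scale this way: the same person can be active on many days.
daily_active = [{"ana", "bo"}, {"ana"}, {"ana", "cy"}]  # users seen on three days
period_active = set().union(*daily_active)               # unique users over the period
print(len(period_active))                                # 3
print(sum(len(d) for d in daily_active) / len(daily_active))  # average daily actives
```

Here three unique users over the period coexist with fewer than two active users on an average day, so neither figure can be derived from the other without user-level data.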
Compute spend
Column | Type | Definition | Example value | Coverage |
---|---|---|---|---|
id | Text | Short description of the record. | OpenAI 2022 compute + data | 100% 11 out of 11 records |
Company | Link to record | The AI company. Links to AI Companies table. | OpenAI | 100% 11 out of 11 records |
Category | Categorical (multiple select) | The category or type of spending being reported. Options include: Inference cloud compute - cloud compute spend for running models to serve users. R&D cloud compute - cloud compute spending for research, development, and model training. | R&D cloud compute,Inference cloud compute,Data | 100% 11 out of 11 records |
Period type | Categorical (single select) | The time frame or reporting period to which the spending amount corresponds. | Year | 100% 11 out of 11 records |
Source 1 | Link | Link to the primary source of the information. | https://fortune.com/longform/chatgpt-openai-sam-altman-microsoft/ | 100% 11 out of 11 records |
Source 2 | Link | Additional source, if any. | [empty] | 36% 4 out of 11 records |
Source 3 | Link | Additional source, if any. | [empty] | 18% 2 out of 11 records |
Source type | Categorical (single select) | Type of source for the primary source. Options include: Company disclosure: official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine. Other: alternative sources such as third-party estimates. | Media report | 100% 11 out of 11 records |
Notes | Text | Metadata documenting the reasoning and/or evidence for the reported compute spend. Often includes direct quotes from the source that contains the information. | OpenAI spend "416M on "computing and data", out of ~500M in total spending. "But it was projecting expenses of $416.45 million on computing and data, $89.31 million on staff, and $38.75 million in unspecified other operating expenses" | 100% 11 out of 11 records |
Date | Date | The as-of date of the reported figure. For annual spending, this uses the last date of the year. | 2022-12-31 | 100% 11 out of 11 records |
Report date | Date | The date when the primary source was published. | 2023-05-04 | 91% 10 out of 11 records |
Confidence | Categorical (single select) | Metadata describing our confidence in the recorded figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on reliable but secondary sources such as credible news reports. Uncertain - based on a less credible source, or a reported claim with significant uncertainty. | Likely | 100% 11 out of 11 records |
Exclude from graph view | Checkbox | Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view. | [empty] | 27% 3 out of 11 records |
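Multiple-select fields such as Category can hold several values in one cell. The sketch below splits such a cell into a list, assuming the values are comma-separated as in the example value shown in the table. The delimiter in the published CSV and the helper name are assumptions.

```python
def parse_categories(cell):
    """Split a multiple-select cell into a list of category names."""
    return [part.strip() for part in cell.split(",") if part.strip()]

example = "R&D cloud compute,Inference cloud compute,Data"
categories = parse_categories(example)
print(categories)

# A record that mixes categories cannot be attributed to compute alone.
is_compute_only = all("compute" in c for c in categories)
print(is_compute_only)  # False
```

Checking for mixed categories matters when summing spend, since a combined "compute and data" figure would overstate compute costs if treated as pure compute.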
Downloads
Download the AI Companies dataset as individual CSV files for specific data types, or as a complete package containing all datasets.
Companies dataset
CSV, Updated September 30, 2025
Revenue reports
CSV, Updated September 30, 2025
Funding rounds
CSV, Updated September 30, 2025
Staff counts
CSV, Updated September 30, 2025
Usage reports
CSV, Updated September 30, 2025
Compute spend
CSV, Updated September 30, 2025
AI companies ZIP
ZIP, Updated September 30, 2025
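For the complete package, a minimal sketch of reading every CSV inside a ZIP archive with Python's standard library is shown below. The file name `companies.csv` and its contents are hypothetical stand-ins, since the actual file names inside the archive are not listed here. The archive is built in memory so the example runs without a download.

```python
import csv
import io
import zipfile

def read_zip_tables(zip_bytes):
    """Return {file name: list of row dicts} for every CSV in a ZIP archive."""
    tables = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".csv"):
                # utf-8-sig drops a byte-order mark if the file has one.
                text = archive.read(name).decode("utf-8-sig")
                tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables

# Build a tiny in-memory archive so the sketch runs without a download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("companies.csv", "Name,Company type\nOpenAI,Foundation\n")

tables = read_zip_tables(buffer.getvalue())
print(tables["companies.csv"][0]["Name"])  # OpenAI
```

For a downloaded archive, pass the result of `open(path, "rb").read()` to `read_zip_tables`.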
Acknowledgements
This data was collected by Epoch AI’s employees and collaborators, including John Croxton, Josh You, Venkat Somala, and Yafah Edelman.
This documentation was written by Josh You and Venkat Somala.