AI Companies Documentation

Overview

Epoch’s AI Companies Dataset is a collection of key economic data on some of the most important frontier AI companies, including data on revenue, funding, staff counts, compute spending, and usage.

This documentation covers which companies are included in the dataset, the dataset's records, fields, and definitions, and a changelog and acknowledgements.

The data is available on our website as a visualization or table, and can be downloaded as a CSV file that is updated daily. For a quick-start example of loading the data and working with it in your research, see this Google Colab demo notebook.
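As a minimal sketch of working with a downloaded CSV, the snippet below parses a small inline sample with Python's standard csv module. The header names mirror fields of the Revenue reports table documented below, but the exact headers in the downloaded file are an assumption, and the "ExampleCo" row is hypothetical.

```python
import csv
import io

# Hypothetical sample; real files may use different headers or column order.
SAMPLE_CSV = """Company,Date,Revenue level (normalized to annual),Confidence
OpenAI,2022-12-31,28000000.0,Likely
ExampleCo,,,Uncertain
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

def to_float(value):
    """Convert a numeric cell to float, treating empty cells as missing."""
    return float(value) if value.strip() else None

rows = load_rows(SAMPLE_CSV)
revenues = [to_float(r["Revenue level (normalized to annual)"]) for r in rows]
print(revenues)  # [28000000.0, None]
```

Treating empty cells as missing rather than zero matters here, because several fields have coverage below 100%.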

If you would like to ask any questions about the data, or suggest companies that should be added, feel free to contact us at data@epoch.ai.

If this dataset is useful for you, please cite it.

Use This Work

Epoch’s data is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons Attribution license.

Citation

Josh You, John Croxton, Venkat Somala, Yafah Edelman, ‘Data on AI Companies’. Published online at epoch.ai. Retrieved from: ‘https://epoch.ai/data/ai-companies’ [online resource], accessed 

BibTeX citation

@misc{EpochAICompanies2025,
  title = {Data on AI Companies},
  author = {Josh You and John Croxton and Venkat Somala and Yafah Edelman},
  year = {2025},
  month = {09},
  url = {https://epoch.ai/data/ai-companies},
  note = {Accessed: }
}

Inclusion

For the initial phase of the AI Companies Dataset, we focus on foundation model developers: AI developers for whom training their own models is a core business priority.

There are currently no formal selection criteria: we prioritized companies whose models are near the frontier in general-purpose AI capabilities, or that are among the most commercially significant AI companies.

Other major types of AI companies include: AI application developers that use third-party models (e.g. Anysphere and Perplexity), cloud compute companies (e.g. Microsoft and Amazon), semiconductor companies (e.g. NVIDIA and TSMC), and AI data vendors (e.g. Scale AI and Mercor).

Records

The AI companies database contains key economic data about many frontier AI companies. Records in the database have information covering the following:

Financials including their revenue, funding and valuations, and spending

Operations such as staff counts and product usage

Metadata such as notes on the above fields with supporting evidence and context, our confidence in the key figures, etc.

Fields

We provide a comprehensive guide to the database’s fields below. This includes example field values as reference, which are taken from OpenAI unless otherwise indicated. Note that for each company, we collected rich time-series data on the company’s revenue, valuation, etc. We store these as individual records in separate tables, detailed below.

If you would like to ask any questions about the database, or request a field that should be added, feel free to contact us at data@epoch.ai.

Companies

Column Type Definition Example value Coverage
Name Text

The name of the Company.

OpenAI 100%
7 out of 7 records
Company type Categorical (multiple select)

The primary business focus of the AI company. Possible values: Foundation (develops its own AI models); Application (serves products powered by AI models, including third-party models); AI Cloud (rents out AI chips/compute); AI Vendor (sells data or other AI-related services to other companies).

Foundation 100%
7 out of 7 records
Founding date Date

The date when the company was officially established or incorporated.

2015-12-01 100%
7 out of 7 records
Founding source/ notes Text

Source for information on founding date.

https://openai.com/index/introducing-openai/ 100%
7 out of 7 records

Revenue reports

Column Type Definition Example value Coverage
Id Text

Short description of record

2022 28M 100%
30 out of 30 records
Company Link to record

The AI company. Linked field to Companies table

OpenAI 100%
30 out of 30 records
Date Date

The as-of date for the reported revenue figure.

2022-12-31 97%
29 out of 30 records
Revenue level (normalized to annual) Currency

Revenue rate in USD as indicated by the report, and normalized to an annual basis. Note that this is different from a direct report of annualized revenue run rate or annual recurring revenue; for example, this field can include revenue for a full calendar year.

28000000.0 97%
29 out of 30 records
Type Categorical (multiple select)

The classification of the revenue figure based on its scope and calculation method. Options include: Annual recurring revenue (ARR) - revenue from recurring or predictable sources, extrapolated over the next year. Annualized revenue - revenue rate extrapolated from shorter periods over the next year. Yearly revenue - actual revenue over one year. Quarterly revenue - revenue for a quarter. Product/division revenue - revenue from a specific product line or business unit.

Yearly revenue 97%
29 out of 30 records
Revenue info (non-annualized) Text

Revenue data other than annualized revenue or annual recurring revenue.

$28,000,000 revenue in 2022 13%
4 out of 30 records
Source 1 Link

Link to the primary source.

https://www.theinformation.com/articles/openai-passes-1-billion-revenue-pace-as-big-companies-boost-ai-spending?rc=9mzoog 100%
30 out of 30 records
Source 2 Link

Additional source, if any.

[empty] 13%
4 out of 30 records
Source 3 Link

Additional source, if any.

[empty] 3%
1 out of 30 records
Source type Categorical (single select)

Type of source for the primary source. Options include: Company disclosure: an official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine.

Media report 100%
30 out of 30 records
Notes Text

Notes documenting the reasoning and/or evidence for the revenue estimate, including relevant quotes.

"OpenAI generated just $28 million in revenue last year before it started charging for its groundbreaking chatbot, ChatGPT" 100%
30 out of 30 records
Report date Date

Publication date for the primary source.

2023-08-29 100%
30 out of 30 records
Confidence Categorical (single select)

Metadata describing our confidence in the recorded revenue figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on credible but secondary sources such as news reports with insider sources. Uncertain - based on a less credible source, or a reported claim with significant uncertainty.

Likely 100%
30 out of 30 records
Graph note Text

Abbreviated note providing additional context for the graph tooltip.

[empty] 3%
1 out of 30 records
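The idea of normalizing a reported figure to an annual basis, and of filtering on the Confidence field, can be sketched as follows. The scaling factors are an assumption (simple multiplication by periods per year); the documentation does not specify the exact normalization procedure, and the sample records are hypothetical.

```python
# Assumed scaling factors; the dataset's own normalization may differ.
PERIODS_PER_YEAR = {"year": 1, "quarter": 4, "month": 12}

def annualize(amount_usd, period):
    """Scale revenue reported over one period to an annual rate."""
    return amount_usd * PERIODS_PER_YEAR[period]

# Ordering of the documented Confidence levels, most to least certain.
CONFIDENCE_RANK = {"Confident": 0, "Likely": 1, "Uncertain": 2}

def at_least(records, minimum):
    """Keep records whose Confidence is at least as strong as `minimum`."""
    cutoff = CONFIDENCE_RANK[minimum]
    return [r for r in records if CONFIDENCE_RANK[r["Confidence"]] <= cutoff]

# Hypothetical records for illustration only.
records = [
    {"Id": "hypothetical A", "Confidence": "Confident"},
    {"Id": "hypothetical B", "Confidence": "Uncertain"},
]
annual = annualize(250_000_000, "quarter")
kept = [r["Id"] for r in at_least(records, "Likely")]
print(annual)  # 1000000000
print(kept)    # ['hypothetical A']
```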

Funding rounds

Column Type Definition Example value Coverage
Id Text

Short description of record

300B final stage 100%
30 out of 30 records
Company Link to record

The AI company. Linked field to Companies table

OpenAI 100%
30 out of 30 records
Close date Date

The date when the funding round was closed (completed).

[empty] 90%
27 out of 30 records
Status Categorical (single select)

The current stage of the funding round process. Options include: Closed: Round has completed. Early discussions: Reports of tentative negotiations. Late discussions: Reports of a firm intention among the company and investors, e.g. with a settled valuation and funding amount, but the round has not closed yet.

Late discussions 100%
30 out of 30 records
Funding (equity) Currency

Equity funding raised for the company, in USD. Includes convertible notes (debt that can be converted to equity), but not conventional debt. If it is a secondary share sale, the field will be equal to 0.

20000000000.0 97%
29 out of 30 records
Funding (debt) Number

Debt funding raised for the company, in USD. Includes loans, credit facilities, and bonds, but excludes convertible notes (which are classified as equity funding).

[empty] 10%
3 out of 30 records
Valuation (post-money) Number

The post-money valuation of the company, in USD, which includes the new money raised in a funding round.

300000000000.0 70%
21 out of 30 records
Report date Date

Publication date of the primary source.

2025-03-31 93%
28 out of 30 records
Source 1 Link

Link to the source of the funding information, preferably an official announcement.

https://www.nytimes.com/2025/03/31/technology/openai-valuation-300-billion.html 100%
30 out of 30 records
Source 2 Link

Additional source, if any.

[empty] 30%
9 out of 30 records
Source 3 Link

Additional source, if any.

[empty] 13%
4 out of 30 records
Notes Text

Metadata documenting the reasoning and/or evidence for the funding figure. Often includes direct quotes from the source that contains the funding figure.

Softbank will transfer ~20B to OpenAI by 2025-end "The new investment will be made in two parts, according to a person familiar with the deal who spoke on the condition of anonymity. An initial $10 billion will arrive immediately, with another $30 billion arriving by the end of the year, the person said. SoftBank Group is providing 75 percent of the total, with the rest coming from other investors, including Microsoft, Thrive Capital, Coatue and Altimeter, the person said. Microsoft and Thrive Capital led previous investment rounds in OpenAI." 100%
30 out of 30 records
Type Categorical (multiple select)

Type of funding. Options include: Primary - typical funding round where the company sells equity (shares) in exchange for money. Secondary - major sale of existing shares, such as employee shares. This does not lead to more funding for the company. Debt - Funding via borrowing money rather than selling equity.

Primary 100%
30 out of 30 records
Confidence Categorical (single select)

Metadata describing our confidence in the recorded funding figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on credible but secondary sources such as news reports with insider sources. Uncertain - based on a less credible source, or a reported claim with significant uncertainty.

Likely 100%
30 out of 30 records
Graph note Text

Abbreviated note providing additional context for the graph tooltip.

[empty] 13%
4 out of 30 records
Exclude from graph view Checkbox

Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view.

[empty] 0%
0 out of 30 records
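Two calculations that follow from the field definitions above can be sketched in a few lines: total equity raised across closed rounds, and the pre-money valuation implied by a post-money figure. This is an illustrative sketch, not the method used for the site's visualizations. The dictionary keys assume the column names shown above, empty cells are assumed to have been parsed to None, and all companies and amounts are hypothetical.

```python
def total_closed_equity(rounds):
    """Sum equity raised per company over rounds whose Status is Closed."""
    totals = {}
    for r in rounds:
        # Skip rounds still under discussion and rounds with no equity figure.
        if r["Status"] != "Closed" or r["Funding (equity)"] is None:
            continue
        company = r["Company"]
        totals[company] = totals.get(company, 0.0) + r["Funding (equity)"]
    return totals

def pre_money(post_money, new_equity):
    """Post-money valuation includes the new money, so subtract it."""
    return post_money - new_equity

# Hypothetical rounds for illustration only.
rounds = [
    {"Company": "ExampleCo", "Status": "Closed", "Funding (equity)": 1e9},
    {"Company": "ExampleCo", "Status": "Closed", "Funding (equity)": 0.0},  # secondary sale
    {"Company": "ExampleCo", "Status": "Late discussions", "Funding (equity)": 5e9},
    {"Company": "OtherCo", "Status": "Closed", "Funding (equity)": None},
]
totals = total_closed_equity(rounds)
print(totals)                # {'ExampleCo': 1000000000.0}
print(pre_money(10e9, 1e9))  # 9000000000.0
```

Secondary sales contribute 0 to the total by construction, which matches the definition of Funding (equity). The pre-money calculation is only meaningful when the equity figure covers the whole round at that valuation.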

Staff counts

Column Type Definition Example value Coverage
Id Text

Short description of record

2016, 52 100%
40 out of 40 records
Company Categorical (single select)

The AI company. Linked field to Companies table

OpenAI 100%
40 out of 40 records
Staff count Number

The number of employees or staff.

52 100%
40 out of 40 records
Type Categorical (single select)

The scope or category of employees included in Staff count. Options include: Full company - count of full-time, permanent staff for the entire company. Full AI division - count of full-time, permanent staff of the company’s primary AI division, where the company is not a “pure-play” AI company. Division/role - staff count in one company division, or one type of role. Contractors - part-time or non-permanent staff, e.g. data contractors.

Full company 100%
40 out of 40 records
Division name Text

The specific division the staff count corresponds to, if applicable, further contextualizing the staff count if the Type is not Full company.

[empty] 40%
16 out of 40 records
Date Date

As-of date for the staff count information.

2016-12-31 100%
40 out of 40 records
Report date Date

Publication date of primary source.

2016-12-31 100%
40 out of 40 records
Source 1 Link

Link to primary source.

https://projects.propublica.org/nonprofits/organizations/810861541/201703459349300445/full 100%
40 out of 40 records
Source 2 Link

Additional source, if any.

https://www.nytimes.com/2018/04/19/technology/artificial-intelligence-salaries-openai.html 8%
3 out of 40 records
Source type Categorical (single select)

Type of source for the primary source. Options include: Company disclosure: an official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine. Other: This can include reports from non-executive employees, or other evidence.

Company disclosure 100%
40 out of 40 records
Notes Text

Metadata documenting the reasoning and/or evidence for the staff count estimate. Often includes direct quotes from the source that contains the staff count information.

The OpenAI nonprofit's 990 form for 2016 lists "Total number of individuals employed in calendar year 2016" as 52 NYT also repeats this claim, likely using the same source 100%
40 out of 40 records
Confidence Categorical (single select)

Metadata describing our confidence in the recorded staff count figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on reliable but secondary sources such as credible news reports. Uncertain - based on a less credible source, or a reported claim with significant uncertainty.

Confident 100%
40 out of 40 records
Graph note Text

Abbreviated note providing additional context for the graph tooltip.

[empty] 10%
4 out of 40 records
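Because staff counts come in several Types, comparisons across companies usually need to filter to one scope first. The sketch below picks the most recent "Full company" count per company. It assumes dates are ISO-formatted strings (as in the example values above), so that string comparison orders them chronologically. The "ExampleCo" records are hypothetical.

```python
def latest_full_company_counts(records):
    """Return the most recent Full company staff count for each company."""
    latest = {}
    for r in records:
        if r["Type"] != "Full company":
            continue
        current = latest.get(r["Company"])
        # ISO dates (YYYY-MM-DD) sort chronologically as plain strings.
        if current is None or r["Date"] > current["Date"]:
            latest[r["Company"]] = r
    return {company: r["Staff count"] for company, r in latest.items()}

records = [
    {"Company": "OpenAI", "Type": "Full company", "Date": "2016-12-31", "Staff count": 52},
    # Hypothetical records below.
    {"Company": "ExampleCo", "Type": "Full company", "Date": "2023-06-30", "Staff count": 100},
    {"Company": "ExampleCo", "Type": "Full company", "Date": "2024-06-30", "Staff count": 250},
    {"Company": "ExampleCo", "Type": "Contractors", "Date": "2025-01-31", "Staff count": 900},
]
counts = latest_full_company_counts(records)
print(counts)  # {'OpenAI': 52, 'ExampleCo': 250}
```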

Usage reports

Column Type Definition Example value Coverage
Id Text

Short description of record

ChatGPT 50M WAU 100%
33 out of 33 records
Company Categorical (single select)

The AI company. Linked field to Companies table

OpenAI 100%
33 out of 33 records
Active Users Number

The number of users for the company’s AI-related product(s)

50000000.0 73%
24 out of 33 records
Active users time period Categorical (single select)

The time period (Daily, Weekly, or Monthly) over which the reported active users metric is measured. Note that it is not possible to calculate e.g. daily active users from monthly active users or vice versa, while messages or tokens per period can be normalized into different time periods.

Weekly 73%
24 out of 33 records
Daily messages Number

Daily user-sent messages, queries, requests, etc. Can be normalized from a report of weekly or monthly messages.

[empty] 27%
9 out of 33 records
Daily tokens Number

Daily input and output tokens processed. Can be normalized from a report of weekly or monthly tokens.

[empty] 12%
4 out of 33 records
Product Text

The specific product or service, if any, that this usage report applies to. Includes subscription tier, if relevant.

ChatGPT 100%
33 out of 33 records
Date Date

As-of date for the information in the report.

2023-03-25 100%
33 out of 33 records
Report date Date

Publication date of the primary source.

2025-09-14 100%
33 out of 33 records
Source 1 Link

Link to the primary source of the usage information.

https://cdn.openai.com/pdf/a253471f-8260-40c6-a2cc-aa93fe9f142e/economic-research-chatgpt-usage-paper.pdf 100%
33 out of 33 records
Source 2 Link

Additional source, if any.

https://www.nber.org/papers/w34255 52%
17 out of 33 records
Source type Categorical (single select)

Type of source for the primary source. Options include: Company disclosure: an official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine. Other: alternative sources such as credible third-party estimates.

Company disclosure 100%
33 out of 33 records
Notes Text

Metadata documenting the reasoning and/or evidence for the usage estimates. Often includes direct quotes from the source that contains the information.

From Figure 3, weekly active ChatGPT users on consumer plans (Free, Plus, Pro). Number extracted using plotdigitizer and rounded to the nearest 5M. 100%
33 out of 33 records
Confidence Categorical (single select)

Metadata describing our confidence in the recorded figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on reliable but secondary sources such as credible news reports. Uncertain - based on a less credible source, or a reported claim with significant uncertainty.

Likely 100%
33 out of 33 records
Graph note Text

Abbreviated note providing additional context for the graph tooltip.

[empty] 3%
1 out of 33 records
Exclude from graph view Checkbox

Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view.

[empty] 9%
3 out of 33 records
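The distinction drawn above between active users (not convertible across time periods) and messages or tokens (convertible) can be sketched as follows. The 30-day month is an assumption made for illustration; the documentation does not state which convention is used.

```python
# Days per reporting period; the 30-day month is an assumed convention.
DAYS_PER_PERIOD = {"Daily": 1, "Weekly": 7, "Monthly": 30}

def to_daily(total, period):
    """Average a message or token total over the days in its period."""
    return total / DAYS_PER_PERIOD[period]

def active_users(record, period):
    """Return active users only if measured over `period`; never convert."""
    if record["Active users time period"] != period:
        return None
    return record["Active Users"]

# Values taken from the example row above.
record = {"Active Users": 50000000.0, "Active users time period": "Weekly"}
print(to_daily(7_000_000, "Weekly"))   # 1000000.0
print(active_users(record, "Weekly"))  # 50000000.0
print(active_users(record, "Daily"))   # None
```

Returning None instead of dividing weekly users by seven reflects that the same person can be active on several days, so user counts do not scale linearly with the time window.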

Compute spend

Column Type Definition Example value Coverage
id Text

Short description of record

OpenAI 2022 compute + data 100%
11 out of 11 records
Company Link to record

The AI company. Links to AI Companies table.

OpenAI 100%
11 out of 11 records
Category Categorical (multiple select)

The category or type of spending being reported. Options include: Inference cloud compute - cloud compute spend for running models to serve users. R&D cloud compute - cloud compute spending for research, development, and model training.

R&D cloud compute,Inference cloud compute,Data 100%
11 out of 11 records
Period type Categorical (single select)

The time frame or reporting period to which the spending amount corresponds.

Year 100%
11 out of 11 records
Source 1 Link

Link to the primary source of the information.

https://fortune.com/longform/chatgpt-openai-sam-altman-microsoft/ 100%
11 out of 11 records
Source 2 Link

Additional source, if any.

[empty] 36%
4 out of 11 records
Source 3 Link

Additional source, if any.

[empty] 18%
2 out of 11 records
Source type Categorical (single select)

Type of source for the primary source. Options include: Company disclosure: an official statement from the company or its executives. Media report: reported in an established media outlet. This means that the information is exclusively reported in the media, usually via an insider source or documents provided to a journalist. It does not mean any fact written in a newspaper or magazine. Other: alternative sources such as third-party estimates.

Media report 100%
11 out of 11 records
Notes Text

Metadata documenting the reasoning and/or evidence for the reported compute spend. Often includes direct quotes from the source that contains the information.

OpenAI spend "416M on "computing and data", out of ~500M in total spending. "But it was projecting expenses of $416.45 million on computing and data, $89.31 million on staff, and $38.75 million in unspecified other operating expenses" 100%
11 out of 11 records
Date Date

The as-of date of the reported figure. For annual spending, this uses the last date of the year.

2022-12-31 100%
11 out of 11 records
Report date Date

The date when the primary source was published.

2023-05-04 91%
10 out of 11 records
Confidence Categorical (single select)

Metadata describing our confidence in the recorded figure, as a function of the credibility of the source and specificity of the report. Confident - based on primary or official sources with specific figures. Likely - based on reliable but secondary sources such as credible news reports. Uncertain - based on a less credible source, or a reported claim with significant uncertainty.

Likely 100%
11 out of 11 records
Exclude from graph view Checkbox

Ad hoc checkbox to exclude data points from our main visualization, usually due to incomparability with other data points. This is not necessarily the only filtering logic in our graph view.

[empty] 27%
3 out of 11 records
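The Category example above shows several values in one cell. A minimal sketch for splitting such a multiple-select cell is given below. It assumes values are joined by commas, as in that example, and that no category name itself contains a comma.

```python
def parse_categories(cell):
    """Split a multiple-select cell into a list of category names."""
    return [part.strip() for part in cell.split(",") if part.strip()]

def is_single_category(cell):
    """True when the reported spend belongs to exactly one category."""
    return len(parse_categories(cell)) == 1

cats = parse_categories("R&D cloud compute,Inference cloud compute,Data")
print(cats)  # ['R&D cloud compute', 'Inference cloud compute', 'Data']
print(is_single_category("Inference cloud compute"))  # True
```

When a record lists several categories, the reported amount is a combined total and cannot be attributed to any one category without further information.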

Downloads

Download the AI Companies dataset as individual CSV files for specific data types, or as a complete package containing all datasets.

Companies dataset

CSV, Updated September 30, 2025

Revenue reports

CSV, Updated September 30, 2025

Funding rounds

CSV, Updated September 30, 2025

Staff counts

CSV, Updated September 30, 2025

Usage reports

CSV, Updated September 30, 2025

Compute spend

CSV, Updated September 30, 2025

AI companies ZIP

ZIP, Updated September 30, 2025

Acknowledgements

This data was collected by Epoch AI’s employees and collaborators, including John Croxton, Josh You, Venkat Somala, and Yafah Edelman.

This documentation was written by Josh You and Venkat Somala.