All publications
  
      
      
data insight · 1 min read
Open-weight models lag the state of the art by around 3 months on average
    
    
    
  
  
      
      
report · 14 min read
What does OSWorld tell us about AI’s ability to use computers?
We review OSWorld, a prominent computer use benchmark. Tasks are relatively simple, many don’t require GUIs, and success often hinges on interpreting ambiguous instructions. The benchmark is also not stable over time.

report · 25 min read
Could decentralized training solve AI’s power problem?
We illustrate a decentralized 10 GW training run across a dozen sites spanning thousands of kilometers. Developers are likely to scale datacenters to multi-gigawatt levels before adopting decentralized training.
    
    
    
  
  
Follow us on X: @EpochAIResearch
          
      
      
          
          
      
    
        
Conventional wisdom in AI is that large-scale pretraining needs to happen in massive contiguous datacenter campuses. But is this true?
Our research suggests that conducting 10 GW training runs across two dozen sites — linked by a network spanning thousands of km — is feasible.
We've launched a new tool to track AI progress!
The tool addresses one of the field's biggest challenges: benchmark saturation.
It's called the Epoch Capabilities Index (ECI) — here's what makes it different:
Stanford mathematician Ravi Vakil, president of the American Mathematical Society, expects AI’s impact on mathematics to come as a phase change, not a slow climb.
Every major shift in math has caught experts off guard, he says. This one will be no different, except that all our predictions will be even more wrong.
-- Timestamps --
00:00:00 – Playing games with imperfect information against AI
00:02:35 – When AI will learn to be truly creative
00:03:48 – AI’s impact will be even more unpredictable than the internet
00:08:02 – What an “AlphaGo moment” would look like for math
00:10:35 – How AI will actually be useful in mathematical research
00:12:20 – Writing “wow”-level math problems for AI
00:15:06 – On a 0-10 scale, AI will change math 8 + 3i
00:16:17 – Is math the next chess?
We evaluated Claude Haiku 4.5 on several benchmarks.
Even with reasoning disabled, Haiku 4.5 performs similarly to or better than early lightweight reasoning models, like o1-mini.
If you ran GPT-5 infinitely many times on FrontierMath—our extremely challenging math benchmark—would it eventually solve every problem?
Probably not. From what we can tell, it caps out below 50%.
What about throwing in *every* available model? Infinitely many times? 🧵

OpenAI is experiencing one of the fastest revenue growth rates in corporate history, with annualized revenue rising 3x a year, from $2 billion at the end of 2023 to $13 billion by August 2025.
  
  
      
      
newsletter · 8 min read
Less than 70% of FrontierMath is within reach for today’s models

newsletter · 7 min read
OpenAI is projecting unprecedented revenue growth

data insight · 1 min read
OpenAI's revenue has been growing 3x a year since 2024

data insight · 3 min read
Most of OpenAI’s 2024 compute went to experiments

report · 23 min read
Evaluating Gemini 2.5 Deep Think's math capabilities
Improved use of knowledge and precision, helpful for research, more conceptual in geometry, but limited creativity and citation issues.

newsletter · 8 min read
How many digital workers could OpenAI deploy?

data insight · 1 min read
AI capabilities have steadily improved over the past year

update · 2 min read
Introducing the AI Companies Data Hub
Our new AI Companies Data Hub tracks key economic and operational data, including frontier AI companies’ revenue, funding, valuations, staff counts, compute spending, and product usage.

newsletter · 7 min read
Why GPT-5 used less training compute than GPT-4.5 (but GPT-6 probably won’t)

data insight · 2 min read
AI developers accurately report GPQA Diamond scores for recent models

newsletter · 8 min read
The huge potential implications of long-context inference

report · 8 min read
What will AI look like in 2030?
If scaling persists to 2030, AI investments will reach hundreds of billions of dollars and require gigawatts of power. Benchmarks suggest AI could improve productivity in valuable areas such as scientific R&D.
    
    
    
  
  
      
      
data insight · 3 min read
What did it take to train Grok 4?

newsletter · 11 min read
Three challenges facing compute-based AI policies

newsletter · 10 min read
Compute scaling will slow down due to increasing lead times

data insight · 5 min read
LLMs have not yet solved the hardest problems on high school math contests

data insight · 2 min read
GPT-5 and GPT-4 were both major leaps in benchmarks from the previous generation

newsletter · 7 min read
Why future AI agents will be trained to work together

data insight · 5 min read
Frontier AI performance becomes accessible on consumer hardware within a year

paper · 4 min read
How much power will frontier AI training demand in 2030?
The power required to train the largest frontier models is growing by more than 2x per year, and is on trend to reach multiple gigawatts by 2030.

data insight · 3 min read
Compute is not a bottleneck for robotic manipulation

newsletter · 10 min read
We didn’t learn much from the IMO

newsletter · 10 min read
Quantifying the algorithmic improvement from reasoning models

data insight · 2 min read
Training open-weight models is becoming more data intensive

newsletter · 12 min read
Why China isn’t about to leap ahead of the West on compute

data insight · 2 min read
Frontier training runs will likely stop getting longer by around 2027

report · 31 min read
Evaluating Grok 4’s math capabilities
It's good at involved computations, improving at proofs, and useful for literature search. It still favors low-level grinds and leans on background knowledge.
    
    
    
  
      
      
newsletter · 11 min read
After the ChatGPT moment: Measuring AI’s adoption

update · 15 min read
How to run SWE-bench Verified in one hour on one machine
We are releasing a public registry of optimized Docker images for SWE-bench. This allows us to run SWE-bench Verified in 62 minutes on a single GitHub Actions VM.

newsletter · 15 min read
What will the IMO tell us about AI math capabilities?

newsletter · 9 min read
How big could an “AI Manhattan Project” get?

data insight · 4 min read
LLMs now accept longer inputs, and the best models can use them more effectively

newsletter · 9 min read
AI and explosive growth redux

paper · 4 min read
Inference economics of language models
We investigate how speed trades off against cost in language model inference. We find that inference latency scales with the square root of model size and the cube root of memory bandwidth, among other results.

newsletter · 19 min read
Do the biorisk evaluations of AI labs actually measure the risk of developing bioweapons?

report · 11 min read
What skills does SWE-bench Verified evaluate?
We take a deep dive into SWE-bench Verified, a prominent agentic coding benchmark. While one of the best public tests of AI coding agents, it is limited by its focus on simple bug fixes in familiar open-source repositories.

data insight · 3 min read
LLM providers offer a trade-off between accuracy and speed

data insight · 8 min read
Over 30 AI models have been trained at the scale of GPT-4

newsletter · 8 min read
Beyond benchmark scores: Analyzing o3-mini’s mathematical reasoning

data insight · 3 min read
Power requirements of leading AI supercomputers have doubled every 13 months
    
    
    
  
  
      
      
data insight · 1 min read
Private-sector companies own a dominant share of GPU clusters

data insight · 2 min read
The US hosts the majority of GPU cluster performance, followed by China

data insight · 1 min read
Acquisition costs of leading AI supercomputers have doubled every 13 months

data insight · 2 min read
The computational performance of leading AI supercomputers has doubled every nine months

update · 9 min read
What is Epoch?
Our director explains Epoch AI’s mission and how we decide our priorities. In short, we work on projects to understand the trajectory of AI, share this knowledge publicly, and inform important decisions about AI.

newsletter · 11 min read
GPQA Diamond: What’s left?

report · 35 min read
How many AI models will exceed compute thresholds?
We project how many notable AI models will exceed training compute thresholds. Model counts grow rapidly, from 10 above 1e26 FLOP by 2026 to over 200 by 2030.

data insight · 4 min read
Widespread adoption of new numeric formats took 3-4 years in past cycles

newsletter · 7 min read
Is AI already superhuman on FrontierMath?

newsletter · 8 min read
How fast can algorithms advance capabilities?

newsletter · 10 min read
How far can reasoning models scale?

newsletter · 10 min read
Where’s my ten minute AGI?

newsletter · 12 min read
The case for multi-decade AI timelines

paper · 4 min read
Trends in AI supercomputers
AI supercomputers double in performance every 9 months, cost billions of dollars, and require as much power as mid-sized cities. Companies now own 80% of all AI supercomputers, while governments’ share has declined.
    
    
    
  
  
      
      
data insight · 2 min read
LLM responses to benchmark questions are getting longer over time

data insight · 7 min read
The combined revenues of leading AI companies grew by over 9x in 2023-2024

newsletter · 4 min read
The real reason AI benchmarks haven’t reflected economic impacts

newsletter · 15 min read
Most AI value will come from broad automation, not from R&D

paper · 5 min read
GATE: Modeling the trajectory of AI and automation
We introduce a compute-centric model of AI automation and its economic effects, illustrating key dynamics of AI development. The model suggests large AI investments and subsequent economic growth.

update · 1 min read
FrontierMath competition: Setting benchmarks for AI evaluation
We are hosting a competition to establish rigorous human performance baselines for FrontierMath. With a prize pool of $10,000, your participation will contribute directly to measuring AI progress in solving challenging mathematical problems.

data insight · 5 min read
LLM inference prices have fallen rapidly but unequally across tasks

newsletter · 10 min read
What AI can currently do is not the story

report · 9 min read
Train once, deploy many: AI and increasing returns
AI's “train-once-deploy-many” advantage yields increasing returns: doubling compute more than doubles output by increasing models' inference efficiency and enabling more deployed inference instances.

data insight · 4 min read
Leading AI chip designs are used for around four years in frontier training

newsletter · 14 min read
The promise of reasoning models

data insight · 3 min read
Biology AI models are scaling 2-4x per year after rapid growth from 2019-2021
    
    
    
  
      
      
newsletter · 11 min read
AI progress is about to speed up

newsletter · 13 min read
Algorithmic progress likely spurs more spending on compute, not less

data insight · 7 min read
The stock of computing power from NVIDIA chips is doubling every 10 months

data insight · 1 min read
US models currently outperform non-US models

data insight · 1 min read
Models with downloadable weights currently lag behind the top-performing models

data insight · 1 min read
Accuracy increases with estimated training compute

newsletter · 22 min read
How much energy does ChatGPT use?

update · 3 min read
A more systematic and transparent AI benchmarking hub
We've overhauled our AI benchmarking infrastructure to provide more transparent, systematic, and up-to-date evaluations of AI model capabilities.

newsletter · 14 min read
What went into training DeepSeek-R1?

update · 2 min read
Announcing our expanded biology AI coverage
We've expanded our Biology AI Dataset, now covering 360+ models. Our analysis reveals rapid scaling from 2017-2021, followed by a notable slowdown in biological model development.

newsletter · 16 min read
AGI could drive wages below subsistence level

update · 2 min read
Clarifying the creation and use of the FrontierMath benchmark
We clarify that OpenAI commissioned Epoch AI to produce 300 math questions for the FrontierMath benchmark. They own these and have access to the statements and solutions, except for a 50-question holdout set.
    
    
    
  
  
      
      
data insight · 4 min read
Chinese language models have scaled up more slowly than their global counterparts

newsletter · 12 min read
How has DeepSeek improved the Transformer architecture?

update · 9 min read
2024 impact report
Epoch's Impact Report for 2024 highlights influential research on AI's trajectory, the launch of FrontierMath, an expanded AI data hub, engagement with leaders, $7M raised, and more.

data insight · 3 min read
Frontier open models may surpass 10²⁶ FLOP of training compute before 2026

newsletter · 16 min read
The economic consequences of automating remote work

data insight · 4 min read
Training compute growth is driven by larger clusters, longer training, and better hardware

newsletter · 9 min read
Moravec’s paradox and its implications

newsletter · 10 min read
How do mixture-of-experts models compare to dense models in inference?

newsletter · 8 min read
Frontier language models have become much smaller

update · 1 min read
Announcing Gradient Updates: Our new weekly newsletter
We are announcing Gradient Updates, Epoch AI’s new weekly newsletter focused on timely and important questions in AI.

newsletter · 8 min read
What did US export controls mean for China’s AI capabilities?

report · 7 min read
What is the future of AI in mathematics? Interviews with leading mathematicians
How will AI transform mathematics? Fields Medalists and other leading mathematicians discuss whether they expect AI to automate advanced math research.
    
    
    
  
  
      
      
update · 6 min read
Introducing the distributed training interactive simulator
We introduce and walk you through an interactive tool that simulates distributed training runs of large language models under ideal conditions.

update · 2 min read
Introducing Epoch AI's AI benchmarking hub
We are launching the AI Benchmarking Hub: a platform presenting our evaluations of leading models on challenging benchmarks, with analysis of trends in AI capabilities.

report · 15 min read
Hardware failures won’t limit AI scaling
Hardware failures won't limit AI training scale: GPU memory checkpointing enables training with millions of GPUs despite failures.

paper · 6 min read
FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI
FrontierMath is a new benchmark of expert-level math problems designed to measure AI's mathematical abilities. See how leading AI models perform against the collective mathematics community.

report · 37 min read
How far behind are open models?
Analysis of open vs. closed AI models reveals the best open model today matches closed models in performance and training compute, but with a one-year lag.

paper · 14 min read
Data movement bottlenecks to large-scale model training: Scaling past 1e28 FLOP
Data movement bottlenecks limit LLM scaling beyond 2e28 FLOP, with a "latency wall" at 2e31 FLOP. We may hit these limits in ~3 years. Aggressive batch size scaling could potentially overcome them.

data insight · 1 min read
AI training cluster sizes increased by more than 20x since 2016

data insight · 1 min read
Performance per dollar improves around 30% each year

data insight · 1 min read
The computational performance of machine learning hardware has doubled every 2.3 years

data insight · 1 min read
The NVIDIA A100 has been the most popular hardware for training notable machine learning models

data insight · 1 min read
Leading ML hardware becomes 40% more energy-efficient each year

data insight · 1 min read
Performance improves 13x when switching from FP32 to tensor-INT8

update · 1 min read
Introducing Epoch AI's machine learning hardware database
Our new database covers hardware used to train AI models, featuring over 100 accelerators (GPUs and TPUs) across the deep learning era.
    
    
    
  
  
      
      
data insight · 8 min read
Leading AI companies have hundreds of thousands of cutting-edge AI chips

data insight · 1 min read
The power required to train frontier AI models is doubling annually

report · 10 min read
Interviewing AI researchers on automation of AI R&D
AI could speed up AI R&D, especially in coding and debugging. We explore predictions on automation and researchers' suggestions for AI R&D evaluations.

report · 83 min read
Can AI scaling continue through 2030?
We investigate four constraints to scaling AI training: power, chip manufacturing, data, and latency. We predict 2e29 FLOP runs will be feasible by 2030.

data insight · 1 min read
The length of time spent training notable models is growing

data insight · 1 min read
Language models compose the large majority of large-scale AI models

data insight · 1 min read
Most large-scale models are developed by US companies

data insight · 1 min read
The pace of large-scale model releases is accelerating

data insight · 1 min read
Almost half of large-scale models have published, downloadable weights

data insight · 1 min read
The size of datasets used to train language models doubles approximately every six months

data insight · 1 min read
Training compute costs are doubling every eight months for the largest AI models

data insight · 1 min read
The training compute of notable AI models has been doubling roughly every six months

data insight · 1 min read
Training compute has scaled up faster for language than vision

update · 1 min read
Announcing Epoch AI’s data hub
We're launching a hub for data and visualizations, featuring our databases on notable and large-scale AI models for users and researchers.
    
    
    
  
  
      
      
paper · 6 min read
Will we run out of data? Limits of LLM scaling based on human-generated data
If trends continue, language models will fully utilize the stock of human-generated public text between 2026 and 2032.

paper · 4 min read
How much does it cost to train frontier AI models?
The cost of training top AI models has grown 2-3x annually for the past eight years. By 2027, the largest models could cost over a billion dollars.

report · 20 min read
Training compute of frontier AI models grows by 4-5x per year
Our expanded AI model database shows that training compute grew 4-5x/year from 2010 to 2024, with similar trends in frontier and large language models.

paper · 10 min read
Do the returns to software R&D point towards a singularity?
Returns to R&D are key to growth dynamics and AI development. Our paper introduces new empirical techniques to estimate this vital parameter.

paper · 4 min read
Chinchilla scaling: A replication attempt
We replicate Hoffmann et al.’s parametric scaling law estimates, finding issues and providing better-fitting estimates that align with their other methods.

report · 16 min read
Tracking large-scale AI models
We present a dataset of 81 large-scale models, from AlphaGo to Gemini, developed across 18 countries, at the leading edge of scale and capabilities.

report · 9 min read
Optimally allocating compute between inference and training
AI labs should spend comparable resources on training and inference, assuming they can flexibly balance compute between the two to maintain performance.

paper · 3 min read
Algorithmic progress in language models
Progress in pretrained language model performance outpaces expectations, occurring at a pace equivalent to doubling computational power every 5 to 14 months.

update · 10 min read
2023 impact report
In 2023, Epoch published nearly 20 reports on AI, added hundreds of models to our database, helped with government policies, and raised over $7 million.

report · 23 min read
Biological sequence models in the context of the AI directives
Our expanded database now includes biological sequence models, highlighting potential regulatory gaps and the growth of training compute in these models.

paper · 3 min read
How predictable is language model benchmark performance?
We investigate large language model performance, finding that compute-focused extrapolations are a promising way to forecast AI capabilities.
    
    
    
  
  
      
      
paper · 4 min read
Limits to the energy efficiency of CMOS microprocessors
How far can the energy efficiency of CMOS microprocessors be pushed before hitting physical limits? We find room for a further 50 to 1000x improvement.

paper · 2 min read
AI capabilities can be significantly improved without expensive retraining
While scaling compute is key to improving LLMs, post-training enhancements can offer gains equivalent to 5-20x more compute at less than 1% of the cost.

paper · 3 min read
Who is leading in AI? An analysis of industry AI research
Industry has emerged as a driving force in AI. We compare top companies on research impact, training runs, and contributions to algorithmic innovations.

report · 31 min read
Challenges in predicting AI automation
Economists propose various approaches to predicting AI's automation of valuable tasks, but disagreements persist, with no consensus on the best method.

report · 27 min read
Trends in machine learning hardware
FLOP/s performance in 47 ML hardware accelerators doubled every 2.3 years. Switching from FP32 to tensor-FP16 led to a further 10x performance increase.

update · 1 min read
Announcing Epoch AI's updated parameter, compute and data trends database
Our database now spans over 700 ML systems, tracking parameters, datasets, and training compute details for notable machine learning models.

paper · 11 min read
Explosive growth from AI: A review of the arguments
Our new article explores whether deployment of advanced AI systems could lead to growth rates ten times higher than those of today’s frontier economies.

report · 27 min read
Trading off compute in training and inference
We characterize techniques that induce a tradeoff between spending resources on training and inference, outlining their implications for AI governance.

report · 10 min read
The limited benefit of recycling foundation models
Reusing pretrained models can save on training costs, but it's unlikely to significantly boost AI capabilities beyond modest improvements.

update · 3 min read
Epoch AI and FRI mentorship program summer 2023
We’re launching the Epoch and FRI mentorship program for women, non-binary, and transgender people interested in AI forecasting.

report · 14 min read
Direct Approach interactive model
When could transformative AI be achieved? We present a simple, user-adjustable model of key inputs that forecasts the date TAI could be deployed.
    
    
    
  
  
      
      
viewpoint · 26 min read
A compute-based framework for thinking about the future of AI
AI’s potential to automate labor could alter the course of human history. The availability of compute is the most important factor driving progress in AI.

viewpoint · 1 min read
Please report your compute
Compute is essential for AI performance, yet often underreported. Adopting reporting norms would improve research, forecasts, and policy decisions.

report · 10 min read
The Direct Approach
We propose a method using neural scaling laws to estimate the compute needed to train AI models to reach human-level performance on various tasks.

paper · 2 min read
Power laws in speedrunning and machine learning
Our model suggests ML benchmarks aren’t near saturation. While large improvements are rare, we find 1 OOM gains happen roughly once in every 50 instances.

update · 1 min read
Announcing Epoch AI’s dashboard of key trends and figures in machine learning
Our dashboard provides key data from our research on machine learning and is a valuable resource for understanding the present and future of the field.

update · 1 min read
2022 impact report
Our impact report for 2022.

report · 66 min read
Trends in the dollar training cost of machine learning systems
How much does it cost to train AI models? Looking at 124 ML systems from between 2009 and 2022, we find the cost has grown by approximately 0.5 OOM/year.

report · 6 min read
Scaling laws literature review
I have collected a database of scaling laws for different tasks and architectures, and reviewed dozens of papers in the scaling law literature.

update · 1 min read
An interactive model of AI takeoff speeds
We have developed an interactive website showcasing a new model of AI takeoff speeds.

report · 16 min read
Literature review of transformative artificial intelligence timelines
We summarize and compare several models and forecasts predicting when transformative AI will be developed.

paper · 2 min read
Revisiting algorithmic progress
Examining over 100 computer vision models, we find that every 9 months, better algorithms contribute the equivalent of a doubling of compute budgets.
    
    
    
  
  
      
      
        paper
      
      
      
      
         · 
        
         3 min read
        
      
    
    
      Will we run out of ML data? Evidence from projecting dataset size trends
    
    
    
      We project dataset growth in language and vision domains, estimating future limits to training by evaluating the availability of unlabeled data over time.
    
    
    
  
  
      
      
        report
      
      
      
      
         · 
        
         12 min read
        
      
    
    
      The longest training run
    
    
    
      Training runs of large ML systems will likely last less than 14-15 months, as shorter runs starting later use better hardware and algorithms.
    
    
    
  
  
      
      
        report
      
      
      
      
         · 
        
         22 min read
        
      
    
    
      A time-invariant version of Laplace’s rule
    
    
    
      We discuss estimating event probabilities with past data, addressing issues with Laplace’s rule and proposing a modification to improve accuracy.
    
    
    
  
  
      
      
        paper
      
      
      
      
         · 
        
         2 min read
        
      
    
    
      Machine learning model sizes and the parameter gap
    
    
    
      Since 2018, the size of ML models has been growing 10 times faster than before. Around 2020, model sizes saw a significant jump, increasing by 1OOM.
    
    
    
  
  
      
      
        report
      
      
      
      
         · 
        
         14 min read
        
      
    
    
      Trends in GPU price-performance
    
    
    
      Improvements in hardware are central to AI progress. Using data on 470 GPUs from 2006 to 2021, we find that FLOP/s per dollar doubles every ~2.5 years.
    
    
    
  
  
      
      
        update
      
      
      
      
         · 
        
         4 min read
        
      
    
    
      Announcing Epoch AI: A research initiative investigating the road to transformative AI
    
    
    
      We are a new research initiative forecasting developments in AI. Come join us!
    
    
    
  
  
      
      
        paper
      
      
      
      
         · 
        
         7 min read
        
      
    
    
      Compute trends across three eras of machine learning
    
    
    
      We’ve compiled a comprehensive dataset of the training compute of AI models, providing key insights into AI development.
    
    
    
  
  
      
      
        report
      
      
      
      
         · 
        
         24 min read
        
      
    
    
      Estimating training compute of deep learning models
    
    
    
      We describe two approaches for estimating the training compute of Deep Learning systems, by counting operations and looking at GPU time.
    
    
    
  
  
      
      
        report
      
      
      
      
         · 
        
         8 min read
        
      
    
    
      What’s the backward-forward FLOP ratio for neural networks?
    
    
    
      Determining the backward-forward FLOP ratio for neural networks, to help calculate their total training compute.
    
    
    
  
  
      
      
        report
      
      
      
      
         · 
        
         9 min read
        
      
    
    
      How to measure FLOP for neural networks empirically?
    
    
    
      Computing the utilization rate for multiple Neural Network architectures.