
We look at reference classes, factory buildout timelines, and upstream component supply to estimate plausible production rates for humanoids, quadrupeds, robotic arms, wheeled robots, and drones.

We investigate progress trends on four capability metrics to determine whether AI capabilities have recently accelerated. Three of four metrics show strong evidence of acceleration, driven by reasoning models.

A rapid increase in go-to-market roles, and hints about upcoming products

New evidence following the MiniMax and Z.ai IPOs

We release a database of over 1,100 biological AI models across nine categories. We analyze their safeguards, accessibility, training data sources, and the foundation models they build on.

Toby Ord argues that RL scaling primarily increases inference costs, creating a persistent economic burden. While the framing is useful, the cost to reach a given capability level falls fast, and the RL scaling data is thin.

We assess the current state of autonomous robotics by evaluating robot performance on concrete tasks across industrial, household, and navigation domains.

We interviewed 18 people across RL environment startups, neolabs, and frontier labs about the state of the field and where it's headed.

Running benchmarks involves many moving parts, each of which can influence the final score. The two most impactful components are scaffolds and API providers.

Public data and our original polling both suggest LLM adoption is roughly on trend, but the underlying drivers are shifting.

Why power is less of a bottleneck than you think.

Most benchmarks saturate too quickly to study long-run AI trends. We solve this using a statistical framework that stitches benchmarks together, with big implications for algorithmic progress and AI forecasting.
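
As a toy illustration of what "stitching" can mean, the sketch below places models and benchmarks on one latent scale by fitting a shared ability per model and a logistic difficulty curve per benchmark, using only the overlaps between benchmarks. The data, parameterization, and fitting procedure here are invented for illustration; they are not the framework's actual model or estimates.

```python
# Toy version of stitching benchmarks onto one latent scale: each model m
# has an ability theta_m, each benchmark b a slope a_b and difficulty d_b,
# and predicted accuracy is sigmoid(a_b * (theta_m - d_b)).
import numpy as np
from scipy.optimize import minimize

# scores[m, b] = accuracy of model m on benchmark b (NaN = not evaluated).
# No single benchmark spans all models, but overlaps chain them together.
scores = np.array([
    [0.55, np.nan, np.nan],   # early model: only benchmark 0
    [0.80, 0.30,   np.nan],   # overlap between benchmarks 0 and 1
    [0.95, 0.60,   0.20],     # overlap across all three
    [np.nan, 0.85, 0.50],     # late model: benchmark 0 saturated, dropped
])
n_models, n_benchmarks = scores.shape
mask = ~np.isnan(scores)

def unpack(params):
    theta = params[:n_models]                             # model abilities
    a = np.exp(params[n_models:n_models + n_benchmarks])  # slopes > 0
    d = params[n_models + n_benchmarks:]                  # difficulties
    return theta, a, d

def loss(params):
    theta, a, d = unpack(params)
    pred = 1.0 / (1.0 + np.exp(-a[None, :] * (theta[:, None] - d[None, :])))
    resid = np.where(mask, scores - pred, 0.0)
    # Pin theta[0] = 0 and theta[-1] = 1 so the latent scale is identified.
    return np.sum(resid**2) + theta[0]**2 + (theta[-1] - 1.0)**2

fit = minimize(loss, np.zeros(n_models + 2 * n_benchmarks), method="L-BFGS-B")
theta, a, d = unpack(fit.x)
print("latent abilities:    ", np.round(theta, 2))
print("benchmark difficulty:", np.round(d, 2))
```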

OpenAI has the inference compute to deploy tens of millions of digital workers, but only on a narrow set of tasks – for now.
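
For intuition, here is a hedged back-of-the-envelope version of that capacity arithmetic; the fleet throughput, model size, and per-worker token rate below are hypothetical placeholders, not OpenAI's actual numbers.

```python
# Toy capacity arithmetic: how many "digital workers" an inference fleet
# could sustain. Every number here is a hypothetical placeholder.
fleet_flop_per_s = 1e21           # assumed total inference FLOP/s across the fleet
flop_per_token = 2 * 300e9        # ~2N FLOP/token for an assumed 300B-param dense model
tokens_per_worker_s = 50          # assumed tokens/s to stand in for one worker

workers = fleet_flop_per_s / (flop_per_token * tokens_per_worker_s)
print(f"Sustainable digital workers: {workers:.1e}")  # ~3e7, i.e. tens of millions
```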

OpenAI focused on scaling post-training on a smaller model

Continual learning, scaling RL, and research feedback loops

Many multi-agent setups are based on fancy prompts, but this is unlikely to persist

We take a deep dive into SWE-bench Verified, a prominent agentic coding benchmark. While it is one of the best public tests of AI coding agents, it is limited by its focus on simple bug fixes in familiar open-source repositories.

Examining o3-mini's math reasoning: an erudite, vibes-based solver that excels in knowledge but lacks the precision, creativity, and formal rigor of human mathematicians.

Historically, AI benchmarks haven't reflected real-world impact because they weren't optimized for it, not because of fundamental limitations. But this might be changing.

While scaling compute for training is key to improving LLM performance, some post-training enhancements offer gains equivalent to training with 5 to 20x more compute, at less than 1% of the cost.
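
As a concrete reading of that claim, here is a short arithmetic sketch of "compute-equivalent gain"; the base compute, the 10x gain, and the 1% cost fraction are illustrative assumptions within the stated 5 to 20x and <1% ranges.

```python
# Back-of-the-envelope "compute-equivalent gain" (CEG) arithmetic.
# All numbers are illustrative assumptions, not figures from the article.
base_train_flop = 1e24                           # hypothetical base training run
ceg = 10                                         # enhancement ~ training with 10x compute
enhancement_cost_flop = 0.01 * base_train_flop   # enhancement costs 1% of the base run

effective_flop = ceg * base_train_flop           # compute needed without the enhancement
actual_flop = base_train_flop + enhancement_cost_flop

print(f"Effective compute: {effective_flop:.1e} FLOP")
print(f"Actual compute:    {actual_flop:.1e} FLOP")
print(f"Leverage from the enhancement: {effective_flop / actual_flop:.1f}x")
```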