Jean-Stanislas Denain

Jean-Stanislas Denain is a senior researcher at Epoch AI, where he leads consultation services. He is also a graduate student in Statistics at UC Berkeley.

js@epoch.ai

Report
Apr. 22, 2026
How Fast Could Robot Production Scale Up?

We look at reference classes, factory buildout timelines, and upstream component supply to estimate plausible production rates for humanoids, quadrupeds, robotic arms, wheeled robots, and drones.

By Jean-Stanislas Denain and Yann Rivière

Report
Apr. 16, 2026
Have AI Capabilities Accelerated?

We investigate progress trends on four capability metrics to determine whether AI capabilities have recently accelerated. Three of four metrics show strong evidence of acceleration, driven by reasoning models.

By Jean-Stanislas Denain and Alexander Barry

Newsletter
Mar. 24, 2026
What do frontier AI companies' job postings reveal about their plans?

A fast increase in go-to-market roles, and hints about upcoming products

By Jean-Stanislas Denain and Campbell Hutcheson

Newsletter
Mar. 23, 2026
Final training runs account for a minority of R&D compute spending

New evidence following the MiniMax and Z.ai IPOs

By Jean-Stanislas Denain and Cheryl Wu

Report
Feb. 20, 2026
Expanding our analysis of biological AI models

We release a database of over 1,100 biological AI models across nine categories. We analyze their safeguards, accessibility, training data sources, and the foundation models they build on.

By David Atanasov, Niccolò Zanichelli, and Jean-Stanislas Denain

Newsletter
Feb. 16, 2026
How persistent is the inference cost burden?

Toby Ord argues that RL scaling primarily increases inference costs, creating a persistent economic burden. While the framing is useful, the cost to reach a given capability level falls fast, and the RL scaling data is thin.

By Jean-Stanislas Denain

Report
Feb. 10, 2026
Where Autonomy Works: Evaluating Robot Capabilities in 2026

We assess the current state of autonomous robotics by evaluating robot performance on concrete tasks across industrial, household, and navigation domains.

By Yann Rivière and Jean-Stanislas Denain

Newsletter
Jan. 12, 2026
An FAQ on Reinforcement Learning Environments

We interviewed 18 people across RL environment startups, neolabs, and frontier labs about the state of the field and where it's headed.

By Jean-Stanislas Denain and Chris Barber

Newsletter
Dec. 23, 2025
Why benchmarking is hard

Running benchmarks involves many moving parts, each of which can influence the final score. The two most impactful components are scaffolds and API providers.

By Florian Brand and Jean-Stanislas Denain

Newsletter
Dec. 19, 2025
The changing drivers of LLM adoption

Public data as well as our original polling suggest LLM adoption is roughly on trend, but the underlying drivers are shifting.

By Jean-Stanislas Denain and Anson Ho

Newsletter
Dec. 17, 2025
Is almost everyone wrong about America’s AI power problem?

Why power is less of a bottleneck than you think.

By Anson Ho, Yafah Edelman, Josh You, and Jean-Stanislas Denain

Paper
Dec. 2, 2025
A Rosetta Stone for AI benchmarks

Most benchmarks saturate too quickly to study long-run AI trends. We solve this using a statistical framework that stitches benchmarks together, with big implications for algorithmic progress and AI forecasting.

By Anson Ho, Jean-Stanislas Denain, David Atanasov, Samuel Albanie, and Rohin Shah

Newsletter
Oct. 3, 2025
How many digital workers could OpenAI deploy?

OpenAI has the inference compute to deploy tens of millions of digital workers, but only on a narrow set of tasks – for now.

By Jean-Stanislas Denain, Anson Ho, and Jaime Sevilla

Newsletter
Sep. 26, 2025
Why GPT-5 used less training compute than GPT-4.5 (but GPT-6 probably won’t)

OpenAI focused on scaling post-training on a smaller model

By Yafah Edelman, Jean-Stanislas Denain, Jaime Sevilla, and Anson Ho

Newsletter
Sep. 19, 2025
The huge potential implications of long-context inference

Continual learning, scaling RL, and research feedback loops

By Jean-Stanislas Denain and Anson Ho

Newsletter
Aug. 22, 2025
Why future AI agents will be trained to work together

Many multi-agent setups are based on fancy prompts, but this is unlikely to persist

By Anson Ho and Jean-Stanislas Denain

Report
Jun. 13, 2025
What skills does SWE-bench Verified evaluate?

We take a deep dive into SWE-bench Verified, a prominent agentic coding benchmark. While it is one of the best public tests of AI coding agents, it is limited by its focus on simple bug fixes in familiar open-source repositories.

By Florian Brand and Jean-Stanislas Denain

Newsletter
Jun. 6, 2025
Beyond benchmark scores: Analyzing o3-mini’s mathematical reasoning

We examine o3-mini's mathematical reasoning: an erudite, vibes-based solver that excels in knowledge but lacks precision, creativity, and formal rigor.

By Anson Ho, Jean-Stanislas Denain, and Elliot Glazer

Data Insight
Apr. 17, 2025
LLM responses to benchmark questions are getting longer over time

By Luke Emberson, Ben Cottier, Josh You, Tom Adamczewski, and Jean-Stanislas Denain

Newsletter
Mar. 28, 2025
The real reason AI benchmarks haven’t reflected economic impacts

AI benchmarks haven't historically reflected real-world impacts because they weren't optimized to do so, not because of fundamental limitations – but this may be changing.

By Anson Ho and Jean-Stanislas Denain

Data Insight
Updated Feb. 7, 2025
Accuracy increases with estimated training compute

By Jean-Stanislas Denain

Data Insight
Updated Feb. 7, 2025
Models with downloadable weights currently lag behind the top-performing models

By Jean-Stanislas Denain

Data Insight
Updated Feb. 7, 2025
US models currently outperform non-US models

By Jean-Stanislas Denain

Paper
Dec. 12, 2023
AI capabilities can be significantly improved without expensive retraining

While scaling training compute is key to improving LLM performance, some post-training enhancements can offer gains equivalent to training with 5 to 20x more compute, at less than 1% of the cost.

By Tom Davidson, Jean-Stanislas Denain, Pablo Villalobos, and Guillem Bas