Greg Burnham

Greg Burnham is a researcher at Epoch AI. Prior to this, he worked at Elemental Cognition and Bridgewater Associates. He has a BA in mathematics from Princeton University.

greg@epoch.ai

By Greg Burnham
What do “economic value” benchmarks tell us?
Report
Feb. 13, 2026
These benchmarks track a wide range of digital work. Progress will correlate with economic utility, but tasks are too self-contained to indicate full automation.

By Florian Brand and Greg Burnham

AI math capabilities could be jagged for a long time – Daniel Litt
Podcast
Jan. 29, 2026
In this episode, Daniel Litt chats with the hosts about AI’s limits in mathematics, accelerating math research, and how to measure progress on open problems.

By Daniel Litt, Greg Burnham, and Anson Ho

Benchmark Scores = General Capability + Claudiness
Newsletter
Nov. 20, 2025
Is this because skills generalize very well, or because developers are pushing on all benchmarks at once?

By Greg Burnham

What does OSWorld tell us about AI’s ability to use computers?
Report
Oct. 30, 2025
We review OSWorld, a prominent computer use benchmark. Its tasks are relatively simple, many don’t require GUIs, and success often hinges on interpreting ambiguous instructions. It is also not stable over time.

By Florian Brand and Greg Burnham

Less than 70% of FrontierMath is within reach for today’s models
Newsletter
Oct. 17, 2025
57% of problems have been solved at least once.

By Greg Burnham

OpenAI is projecting unprecedented revenue growth
Newsletter
Oct. 15, 2025
No company has grown from $10B to $100B in revenue as quickly as OpenAI projects it will.

By Greg Burnham

Evaluating Gemini 2.5 Deep Think's math capabilities
Report
Oct. 9, 2025
It has improved at using background knowledge and doing precise computations. It can be a helpful research assistant and may take a more conceptual approach to geometry. It shows limited creativity and sometimes struggles with citations.

By Greg Burnham

LLMs have not yet solved the hardest problems on high school math contests
Data Insight
Sep. 3, 2025
By Greg Burnham

We didn’t learn much from the IMO
Newsletter
Aug. 7, 2025

The problems gave AI only a slim chance to show new capabilities.

By Greg Burnham

Evaluating Grok 4’s math capabilities
Report
Jul. 25, 2025
It’s good at involved computations, improving at proofs from a low base, and useful for literature search. It still favors low-level grinds and leans on background knowledge.

By Greg Burnham

What will the IMO tell us about AI math capabilities?
Newsletter
Jul. 8, 2025

Most discussion about AI and the IMO focuses on gold medals, but that isn’t what deserves the most attention.

By Greg Burnham

LLMs now accept longer inputs, and the best models can use them more effectively
Data Insight
Jun. 25, 2025
By Greg Burnham and Tom Adamczewski

LLM providers offer a trade-off between accuracy and speed
Data Insight
Jun. 11, 2025
By Greg Burnham and Tom Adamczewski

GPQA Diamond: What’s left?
Newsletter
May 30, 2025

We investigate the validity of the GPQA Diamond benchmark: the flawed questions it contains, the challenges it poses to models, and why it still informs AI evaluation.

By Greg Burnham