Benchmarking updates

December 22, 2025
We benchmarked several open-weight Chinese models on FrontierMath Tiers 1-3. Top scores lag the overall frontier by about 7 months.
Read the thread
December 23, 2025
Why benchmarking is hard: exploring the challenges and complexities at each stage of the benchmarking pipeline.
Understand the challenges
December 23, 2025
AI capabilities progress has sped up: According to our ECI frontier model improvement nearly doubled in 2024.
Explore the data
Trusted by leaders at OpenAI, DeepMind, and governments worldwide.
Need deeper insights? Our team offers custom research and advisory services.
Book a consultation