Benchmarking updates

December 12, 2025
We used ECI to predict METR time horizons for recently-released models.
Check out the predictions
December 11, 2025
We added two benchmarks to the suite we run ourselves: SimpleQA Verified and a new benchmark of chess puzzles.
Read an overview
November 25, 2025
Gemini 3 Pro set a record on GPQA - by improving at organic chemistry
See how
Trusted by leaders at OpenAI, DeepMind, and governments worldwide.
Need deeper insights? Our team offers custom research and advisory services.
Book a consultation