Benchmarking updates

February 6, 2026
Opus 4.6 matched the top score on FrontierMath Tiers 1-3, a first for Anthropic models.
February 4, 2026
Kimi K2.5 set a new record among open-weight models on the Epoch Capabilities Index.
January 27, 2026
We released FrontierMath: Open Problems, which tests AI on unsolved math research problems.