AI Benchmarking
Our database of benchmark results, featuring the performance of leading AI models on challenging tasks. It includes results from benchmarks evaluated internally by Epoch AI as well as data collected from external sources. Explore trends across time, by benchmark, or by model.
Benchmarking updates
September 29, 2025
Claude Sonnet 4.5 just set a new state of the art in our evaluations of SWE-Bench Verified.
See results for Claude Sonnet 4.5
July 11, 2025
Introducing FrontierMath Tier 4: a benchmark of extremely challenging research-level math problems, designed to test the limits of AI's reasoning capabilities.
Read our announcement thread
July 10, 2025
SWE-Bench can be tricky to run. We released a public registry of Docker containers that makes running it fast and easy.
See how