Epoch AI’s work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons Attribution license.

Cite this work as

                
                  Jean-Stanislas Denain (2024), "Models with downloadable weights currently lag behind the top-performing models". Published online at epoch.ai. Retrieved from: 'https://epoch.ai/data-insights/open-vs-closed-model-performance' [online resource]

BibTeX citation

  
  @misc{epoch2024openvsclosedmodelperformance,
    title={Models with downloadable weights currently lag behind the top-performing models},
    author={Jean-Stanislas Denain},
    year={2024},
    url={https://epoch.ai/data-insights/open-vs-closed-model-performance},
    note={Accessed: }
  }
  

              

Models with downloadable weights currently lag behind the top-performing models

The models with the highest GPQA Diamond and MATH Level 5 accuracies tend not to have downloadable weights. For example, on GPQA Diamond OpenAI’s o1 outperformed the best-performing downloadable model at the time, Phi-4, by 20 percentage points. Similarly on MATH Level 5, Phi-4 lagged behind o1 by 29 percentage points.

Enable JavaScript to see an interactive visualization.

We further analyzed how far behind open models are in this article, where we found that the best-performing open LLMs lagged the best-performing closed LLMs by between 6 months on GPQA Diamond and 20 months on MMLU.

However, the release of DeepSeek-R1 in January 2025 showed that the performance gap between open-weights and closed-weights has significantly decreased. For example, on MATH Level 5 DeepSeek-R1 only lags behind the current best-performing model, o3-mini, by 2 percentage points.

Authors

Jean-Stanislas Denain

Published

November 27, 2024

Last major update

February 7, 2025

Explore this data

Data on AI Benchmarking

Benchmark results featuring the performance of leading AI models on challenging tasks.

Models with downloadable weights currently lag behind the top-performing models

Explore this data

Data on AI Benchmarking

Related insights

We value your privacy