Applications of Generative AI (GenAI) are expected to revolutionize a number of different areas, ranging from science & medicine to education. The potential for these seismic changes has triggered a lively debate about the potential risks of the technology, and resulted in calls for tighter regulation, in particular from some of the major tech companies who are leading in AI development. This regulation is likely to put at risk the budding field of open-source generative AI. Using a three-stage framework for GenAI development (near, mid and long-term), we analyze the risks and opportunities of open-source generative AI models with similar capabilities to the ones currently available (near to mid-term) and with greater capabilities (long-term). We argue that, overall, the benefits of open-source GenAI outweigh its risks. As such, we encourage the open sourcing of models, training and evaluation data, and provide a set of recommendations and best practices for managing risks associated with open-source generative AI.

Our Openness Taxonomy

Figure: Pipeline of the components of model (1) training, (2) evaluation and (3) deployment for typical LLMs.

There are several components involved in the training, evaluation and deployment pipeline to obtain a Large Language Model (LLM). Model developers decide whether to make each component of those pipelines private or public, with varying levels of restrictions for the latter.

🎯 The main aim of this taxonomy is to provide a structured way to track the openness of the pipelines involved in training, evaluating and deploying LLMs today.
As discussed in detail in our paper, the openness of the components involved in training, evaluation and deployment has key implications in terms of transparency, reproducibility and safety of these models.

Explore the taxonomy

Contributing

🫶 We need your help to keep this taxonomy up to date! This website is powered by Jekyll and hosted on GitHub Pages.

If there is a model missing or if you spot a mistake, please visit our GitHub repository and submit a pull request.

Contribute (GitHub)

Citation

You can cite our work as:

@article{eiras2024risks,
  title={Risks and Opportunities of Open-Source Generative AI},
  author={Eiras, Francisco and Petrov, Aleksandar and Vidgen, Bertie and de Witt, Christian Schroeder and Pizzati, Fabio and Elkins, Katherine and Mukhopadhyay, Supratik and Bibi, Adel and Csaba, Botos and Steibel, Fabro and others},
  journal={arXiv},
  year={2024}
}

@inproceedings{eiras2024near,
  title={Near to Mid-term Risks and Opportunities of Open Source Generative AI},
  author={Eiras, Francisco and Petrov, Aleksandar and Vidgen, Bertie and de Witt, Christian Schroeder and Pizzati, Fabio and Elkins, Katherine and Mukhopadhyay, Supratik and Bibi, Adel and Csaba, Botos and Steibel, Fabro and others},
  booktitle={International Conference on Machine Learning},
  year={2024}
}