Risks and Benefits of Dual-Use Foundation Models with Widely Available Model Weights
This section considers some of the marginal risks and benefits posed by open foundation models, drawing on the main factors identified in the Executive Order, the comments submitted to NTIA for this Report, and the existing literature. Neither the risks and benefits discussed here, nor the categories they are grouped into, should be considered comprehensive or definitive. Other reports identified in the Executive Order also cover some of these topics at greater length.
One limitation of this Report is that many highly capable AI models with widely available model weights have fewer than 10 billion parameters and are therefore outside the scope of this Report as defined in Executive Order 14110. However, the number of parameters in a model (especially across modalities, such as text-to-image or video generation models) may not correspond to its performance. For instance, advances in model architecture or training techniques can enable newer models with fewer than 10 billion parameters to match the capabilities and performance of models that previously required more than 10 billion. Further, as the science progresses, this dynamic may accelerate, with the number of parameters required for advanced capabilities steadily decreasing.
These limitations, along with other factors, ultimately lead us to recommend that the federal government adopt a monitoring framework to inform ongoing assessments and possible policy action. Future assessments of the risks and benefits of open foundation models would benefit from an evidence base that includes a robust set of “leading indicators,” or measures that can act as warning signs for potential or imminent risk that future open foundation models may introduce. Those leading indicators might include assessments of the capabilities of leading closed-weight foundation models (as similar behaviors and performance are likely to be found in open foundation models within months or years32) and other assessments of the evolving landscape of risks and benefits.
Open foundation model capabilities and limitations are evolving, and it is difficult to extrapolate their future capabilities, or their impact on society, from current evidence. Further, even if we could perfectly extrapolate model performance, quantifying the marginal risks and benefits is extremely difficult. For these reasons, our analysis favors taking steps to develop the evidence base and improve research techniques, as we address in our policy recommendations.
Public Safety
This section examines the marginal risks and benefits to public safety posed by dual-use foundation models with widely available model weights.
Geopolitical Considerations
This section highlights the marginal risks and benefits related to the intersection of open foundation models and geopolitics.
Societal Risks and Well-Being
Dual-use foundation models with widely available model weights have the potential to create benefits across society, primarily by broadening access to artificial intelligence: such models allow a greater range of actors (e.g., the public sector, nonprofits, academic researchers, and independent developers) to build and deploy AI systems than closed corporate models afford.
Competition, Innovation, and Research
This section covers the marginal risks and benefits dual-use foundation models with widely available model weights may introduce to AI competition, innovation, and research.
Uncertainty in Future Risks and Benefits
Many benefits and harms of foundation models are already materializing. However, in some cases, the deep uncertainty inherent in future technology makes epistemic humility the wisest course of action.
32 See, e.g., Center for AI Policy Comment at 6 (“We find that the timeframe between closed and open models right now is around 1.5 years. We can arrive at this conclusion by analyzing benchmark performance between current leading open weight AI models and the best closed source AI models.”); Unlearn.AI Comment at 2 (“Estimating the timeframe between the deployment of a closed model and the deployment of an open foundation model of similar performance on relevant tasks is possible by looking at the gaps in human-evaluated performance between open foundation models and closed counterparts. While this is highly dependent on the specific AI model and its application domain, we can look towards a few examples. At the moment, it takes about 6 months to 1 year for similarly performing open models to be successfully deployed after the deployment of OpenAI’s closed models. The time gap between proprietary image recognition models and high-quality open-source alternatives has narrowed relatively quickly due to robust community engagement and significant public interest. In contrast, more niche or complex applications, such as those requiring extensive domain specific knowledge or data, might see longer timeframes before competitive open models emerge.”); Databricks Comment at 3 (“Databricks believes that major open source model developers are not far behind the closed model developers in creating equally high performance models, and that the gap between the respective development cycles may be closing.”) (internal citation omitted); Stability AI Comment at 17 (“There is ample evidence that closed models exhibiting category state of the art performance will be matched by open models in due course. Previously, it took ~28 months before an open model such as GPT-J from EleutherAI approached the performance of a closed model such as GPT-2 from Open AI on common benchmarks. That gap is closing. Only ~eight months elapsed before open models such as Llama 2-70B from Meta rivaled GPT-3.5 from Open AI, and only ~ten months elapsed before Falcon-180B from the Technology Innovation Institute (funded by the Abu Dhabi Government) exceeded GPT-3.5 performance.”) (internal citation omitted). But see Meta Comment at 13 (“It is not possible to generally estimate this timeframe given the variables involved, including the model deployment developers’ business models and whether, in the case of Llama 2, they download the model weights from Meta directly or accessed it through third-party services like Azure or AWS.”); Hugging Face Comment at 3 (“Timelines vary and estimates will change by utility of the model and costs.”); EleutherAI Comment at 21 (“The timeframe between deployment of an open and closed equally-performing model is difficult to predict reliably. The primary blocker for the capabilities of open models is funding, which can disappear at the whim of a handful of well-resourced individuals. [. . .]”). See also CSET Comment at 2 (“The best way to gauge such timeframes may be to directly contact organizations designing foundation models and acquire information regarding their model performance and release strategies. This is the most viable way to get these estimations, although these organizations may not have the will or obligation to provide such information.”).