
Background


AI Model Weights

An AI model processes an input—such as a user prompt—into a corresponding output, and the contents of that output are determined by a series of numerical parameters that make up the model, known as the model’s weights. The values of these weights, and therefore the behavior of the model, are established by training the model on numerous examples.11 The weights are numerical values that the model learns during training in order to achieve an objective specified by its developers. These parameters encode what a model has learned during the training phase, but they are not the only important component of an AI model. For example, foundation models are trained on great quantities of data; for large language models (LLMs) in particular, training data can be further decomposed into trillions of sub-units, called tokens. Other factors also play a significant role in model performance, such as the model’s architecture, training procedures, the types of data (or modalities) processed by the model, and the complexity of the tasks the model is trained to perform.12
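
To make this concrete, the minimal sketch below (an illustration added here, not drawn from the Report, using the PyTorch library and a toy network) shows that a model’s behavior is fully determined by a set of numerical parameters that can be counted, saved to a file, and reloaded elsewhere; “releasing the weights” amounts to distributing such a file.

    # Illustrative only: a model's learned behavior lives in its numerical
    # parameters ("weights"), which can be serialized and redistributed.
    import torch
    import torch.nn as nn

    def build_model() -> nn.Module:
        # A toy network; dual-use foundation models have billions of such parameters.
        return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

    model = build_model()
    print("number of weights:", sum(p.numel() for p in model.parameters()))

    # "Releasing the model weights" is, in effect, distributing this file.
    torch.save(model.state_dict(), "model_weights.pt")

    # Anyone holding the file can reconstruct the same input-to-output behavior.
    clone = build_model()
    clone.load_state_dict(torch.load("model_weights.pt"))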

Some model developers have chosen to keep these weights guarded from the public, opting to control access through user-focused web interfaces or through APIs (application programming interfaces). Users or software systems can interact with these models by submitting inputs and receiving outputs, but cannot directly access the weights themselves. If a developer does decide to make a model’s weights widely available, three important consequences arise.
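
The sketch below illustrates this access pattern; the endpoint, credential, request fields, and response field are hypothetical placeholders rather than any particular provider’s API. The client submits an input and receives an output, but the weights never leave the provider’s servers.

    # Hypothetical hosted-model API call (endpoint and fields are invented for
    # illustration): the user exchanges inputs and outputs with the provider,
    # but never obtains the model weights themselves.
    import requests

    API_URL = "https://api.example-provider.com/v1/generate"   # placeholder endpoint
    HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}          # placeholder credential

    response = requests.post(
        API_URL,
        headers=HEADERS,
        json={"prompt": "Draft a short weather summary.", "max_tokens": 128},
        timeout=30,
    )
    response.raise_for_status()

    # Only generated text is returned; the weights remain with the provider.
    print(response.json()["output"])  # response field name is illustrative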

First, once weights have been released, individuals and firms can customize the model beyond the developer’s original design. For instance, users can fine-tune models on new data, such as text from a language or cultural context not included in the original training corpus.13 Other techniques, such as quantization,14 pruning,15 and merging multiple models together, do not require new data. Customization techniques typically require significantly less technical knowledge, fewer resources, and less computing power than training a new model from scratch, and the gap between the resources required to customize a pre-trained model and those required to train a full model will likely continue to widen.16, 17 This accessibility afforded by open weights significantly lowers the barrier to entry for fine-tuning models for both beneficial and harmful purposes. Adversarial actors can remove safeguards from open models via fine-tuning and then freely distribute the modified model, ultimately limiting the value of mitigation techniques.18
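
As a rough illustration of how lightweight such customization can be once the weights are in hand, the PyTorch sketch below applies two of the techniques noted above, magnitude pruning (zeroing out a fraction of parameters) and a simplified form of quantization (casting weights to lower-precision numbers), to a toy model. It is a schematic example, not a production recipe.

    # Illustrative customization of already-trained weights: neither pruning nor
    # this simplified quantization requires any new training data.
    import torch
    import torch.nn as nn
    import torch.nn.utils.prune as prune

    model = nn.Sequential(nn.Linear(256, 512), nn.ReLU(), nn.Linear(512, 64))

    # Pruning: zero out the 30% of weights with the smallest magnitudes in the
    # first linear layer (see footnote 15).
    first_layer = model[0]
    prune.l1_unstructured(first_layer, name="weight", amount=0.3)
    prune.remove(first_layer, "weight")  # bake the zeros into the weight tensor

    # Quantization, in simplified form: represent the weights with lower-precision
    # numbers, here 16-bit instead of 32-bit floating point (see footnote 14).
    model = model.half()

    sparsity = (first_layer.weight == 0).float().mean().item()
    print(f"fraction of pruned (zeroed) weights in first layer: {sparsity:.2f}")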

Users can also circumvent some of these safeguards in closed AI models, such as by consulting online information about how to ‘jailbreak’ a model into generating unintended answers (i.e., through creative prompt engineering) or by fine-tuning AI models via APIs.19 However, there are significantly fewer model-based safeguards for open-weight models overall.

Second, developers who publicly release model weights give up control over, and visibility into, their end users’ actions. They cannot rescind access to the weights or perform moderation of model usage.20 While the weights could be removed from distribution platforms, such as Hugging Face, once users have downloaded the weights they can share them through other means.21 For example, the company Mistral AI publicly released Mixtral 8x7B, a dual-use foundation model, by making its weights widely available via BitTorrent, a decentralized peer-to-peer file-sharing protocol designed specifically to evade control by any single party.22

Finally, open model weights allow users to perform inference using their own computational resources, whether on a local machine or rented from a cloud service. Running models locally allows users to leverage them without sharing data with the model’s developers, which can be important for confidentiality and data protection (e.g., in the healthcare and finance industries). However, it also limits the capacity to monitor model use and misuse, in comparison to models that only allow API or web-interface access.
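
The sketch below illustrates this local-inference pattern under stated assumptions: it uses the open-source Hugging Face transformers library and the small, openly available gpt2 model as a stand-in for a foundation model. After the weights are downloaded (or loaded from local storage), the prompt and the generated text are processed entirely on the user’s own hardware.

    # Illustrative local inference with openly available weights: inputs are
    # processed on the user's own hardware rather than sent to a hosted service.
    from transformers import pipeline

    # Loads the weights from the local cache (downloading them once if needed)
    # and runs generation on local compute.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "Internal case note (kept on-premises for confidentiality):"
    result = generator(prompt, max_new_tokens=40, do_sample=False)

    print(result[0]["generated_text"])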

Model size and usage are important factors when considering the effectiveness of legal mechanisms, such as takedown requests, in controlling the wide distribution of model weights. Large models and heavily used models are more likely to rely on commercial datacenter infrastructure than smaller or less frequently used models.

The Spectrum of Model Openness

This Report focuses on widely available model weights, but developers of dual-use foundation models can release their models with varying levels of openness.23 Weights, code, training or fine-tuning data, and documentation can all be made available through multiple channels with varying types of restrictions.

Multiple layers of structured access can provide varying levels of access to different individuals at different times.24 For example, access to model weights could be given to vetted researchers but not to the general public. Model sharing can also involve a staged release, where information and components are gradually released over time. Staging allows time for safety research and for risks at one stage to become apparent before access is expanded. The time scale for staged releases can vary, since “generally substantial sociotechnical research requires multiple weeks, months, and sometimes years.”25 A wide range of AI licenses is currently in use; these can be applied on their own or in conjunction with forms of structured access. Some licenses require the user or downloader to agree to use and redistribution restrictions, sometimes including behavioral or ethical guidelines, though such terms can be hard to enforce.26

Even developers of models that are not “open” can increase transparency and visibility through comprehensive documentation. Model cards are one method for describing a model’s technical details, intended uses, and performance on evaluation and red-teaming efforts.27 Independent of whether the training data itself is widely available, information about the training dataset(s) can be distributed using datasheets, where developers can share the processes they used to train the model and any artifacts or procedures involved in human-in-the-loop training, such as data annotation guidelines or instructions for reinforcement learning from human feedback.28

These openness factors can and should be considered at all stages of the AI lifecycle, including post-deployment. For instance, a dual-use foundation model can be open at one stage of development and closed at another, such as an open base model that is customized to create a downstream, closed, consumer-facing system.

An Approach to Analysis of Marginal Risks and Benefits

As mandated by Executive Order 14110, this Report analyzes “the potential benefits, risks, and implications of dual-use foundation models for which the model weights are widely available.”29 The assessment of policy options to address such models specifically, versus potential interventions to address risks more broadly, is the touchstone of our analysis. This Report will provide a broad assessment of the marginal risks and benefits of dual-use foundation models with widely available model weights. We define marginal risks and benefits as the additional risks and benefits that widely available model weights introduce compared to those that come from non-open foundation models or from other technologies more generally. Public commenters generally agreed that a marginal risk and benefit analysis framework is appropriate for our analysis.30

The consideration of marginal risk is useful to avoid targeting dual-use foundation models with widely available weights with restrictions that are unduly stricter than those applied to alternative systems posing a similar balance of benefits and risks. This does not mean that it is wise to distribute an unsafe open model simply because other equally unsafe systems already exist. Risks from open models and closed models should both be managed, though the particular mitigations required may vary. In some cases, managing the risk of open models may present unique opportunities and challenges for reducing risk while maintaining as many of the benefits of openness as possible.

As the basis for generating policy recommendations for open foundation models, this Report assesses the marginal benefits and risks of harm that could plausibly be affected by policy and regulatory measures. Marginal benefits and risks, as assessed in this Report, meet the following conditions:

  1. There is a difference in magnitude between dual-use foundation models with widely available model weights and such models without widely available weights.
    • Risks and benefits arising equally from both dual-use foundation models with widely available model weights and closed-weight dual-use foundation models are not considered “marginal.”31
  2. The benefits or risks are greater for dual-use foundation models than for non-AI technologies and AI models not fitting the dual-use foundation model definition.
    • Only risks and benefits that arise differently from dual-use foundation models and models that do not meet this definition (e.g., models with fewer than 10 billion parameters) are considered “marginal.”
    • Similarly, the risks and benefits that exist equally in both dual-use foundation models and other technological products or services (such as Internet search engines) are not considered “marginal.”
  3. The risks and benefits arise from models that will have widely available weights in the future over and above those with weights that have already been widely released.
    • As discussed above, once model weights have been widely released, it is difficult to “un-release” them. Any policy that restricts the wide availability of dual-use foundation model weights will be most effective on models that have not yet been widely released.
    • When deciding whether to restrict the availability of a specific future set of dual-use foundation models, it is important to consider whether those future models will present substantially greater marginal risks and/or benefits over existing models with widely available model weights.
    • Not all policy options require restricting the wide availability of model weights; this consideration is most relevant for those policy options that do.

Risks and benefits that satisfy all three conditions are difficult to assess based on current evidence. Most current research on the capabilities of dual-use foundation models is conducted on models that have already been released. Evidence from this research provides a baseline against which to measure marginal risks and benefits, but it cannot preemptively measure the risks and benefits introduced by the wide release of a future model. It therefore provides relatively little support for assessing the marginal risks and benefits of future releases of dual-use foundation models with widely available model weights, except to the extent that it supports a determination about the capabilities of those future models. Without changes in research and monitoring capabilities, this dynamic may persist: any evidence of risks that would justify possible policy interventions to restrict the availability of model weights might arise only after those AI models, closed or open, have been released.

 


 


11 The Report defines AI models as “a component of an information system that implements AI technology and uses computational, statistical, or machine learning techniques to produce outputs from a given set of inputs,” as defined in the “Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence” Executive Order. (Exec. Order No. 14,110 (2023)).

12 Yuksel, S., et al. (2012). Twenty Years of Mixture of Experts. IEEE Transactions on Neural Networks and Learning Systems, 23(8), 1177–1193; Vaswani, A., et al. (2017). Attention is All You Need; What Is RLHF? (n.d.). aws.amazon.com; Whang, O. (2024, April 30). From Baby Talk to Baby A.I. The New York Times.

13 Lin, Y.-T., & Chen, Y.-N. (2023). Taiwan LLM: Bridging the Linguistic Divide with a Culturally Aligned Language Model. ArXiv (Cornell University).

14 Representing weights with lower-precision numbers. See, e.g., Hugging Face. Quantization.

15 Various methods that remove parameters from an AI model. See, e.g., Pruning Tutorial. PyTorch.

16 Criddle, C., & Murgia, M. (2024, May 8). Artificial intelligence companies seek big profits from ‘small’ language models. Financial Times.

17 A CNAS report found that, if trends continue, frontier AI training could require 1,000 times more compute power than GPT-4 by the late 2020s/early 2030s, and that training costs for leading models double approximately every 10 months. However, note that the amount of compute power an actor saves depends on the amount of inference they need to perform on the model. Future-Proofing Frontier AI Regulation. (2024, March 13).

18 See, e.g., Vincent, J. (2023, March 8). Meta’s powerful AI language model has leaked online — what happens now? The Verge; Hubinger, E., et al. (2024, January 17). Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training. ArXiv.

19 Zhan, Q., et al. (2023). Removing RLHF Protections in GPT-4 via Fine-Tuning. UIUC, Stanford.

20 See, e.g., Partnership on AI Comment at 6 (“Some of the features of open models that may be relevant to assessing differential risk include that open release of model weights is irreversible, and that moderation/monitoring of open models post-release is challenging.”).

21  See, e.g., Hugging Face Comment at 3 (“Model weights can be shared individually between parties, on platforms with or without documentation and with or without access management, and via p2p/torrent.”).

22 See Goldman, S. (2023, December 8). Mistral AI bucks release trend by dropping torrent link to new open source LLM. VentureBeat; Coldewey, D. (2023, September 27). Mistral AI makes its first large language model free for everyone. TechCrunch. However, note that not all AI models receive such attention when released. See GitHub Comment at 3 (“Wide availability of model weights is a function of discovery, governed by online platforms. Even for content posted publicly on the internet, the default state is obscurity. Whether content is widely available will depend on ecosystem activity, distribution channels, and, particularly, sharing on platforms that enable virality. Ecosystem monitoring and governance can help inform and implement risk-based mitigations for widely available model weights.”).

23 Exec. Order No. 14,110 (2023).

24 Shevlane, T. (2022). Structured Access: An Emerging Paradigm for Safe AI Deployment. University of Oxford.

25 Solaiman, I. (2023). The Gradient of Generative AI Release: Methods and Considerations. Hugging Face.

26 Contractor, D., et al. (2022, June). Behavioral use licensing for responsible AI. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (pp. 778-788).

27 Mitchell, M., et al. (2019, January). Model cards for model reporting. In Proceedings of the conference on fairness, accountability, and transparency (pp. 220-229).

28 See, e.g., AI Accountability Policy Report, National Telecommunications and Information Administration. (2024, March). at 28 (noting that datasheets “provide salient information about the data on which the AI model was trained, including the ‘motivation, composition, collection process, [and] recommended uses’ of the dataset”).

29 Exec. Order No. 14,110 (2023).

30 See, e.g., AI Policy and Governance Working Group Comment at 2 (“The federal government should prioritize understanding of marginal risk. The risks of open foundation models do not exist in a vacuum. To properly assess the risks of open foundation models, and whether regulations should single out open foundation models, the federal government should directly compare the risk profile to those of closed foundation models and existing technologies. In its report, the NTIA should foreground the marginal risk of open foundation models by directing government agencies to conduct marginal risk assessments, fund marginal risk assessment research, and incorporate marginal risk assessment into procurement processes.”); Rishi Bommasani et al. at 1 (“Foundation models present tremendous benefits and risks to society as central artifacts in the AI ecosystem. In addressing dual use foundation models with widely available weights, the National Telecommunications and Information Administration (NTIA) should consider the marginal risk of open foundation models, defined as the extent to which they increase risk relative to closed foundation models or preexisting technologies like search engines.”) (internal footnote omitted); CTA Comment at 6 (“NTIA’s consideration of risks associated with open weight models should focus on marginal risks arising from such models.”); Public Knowledge Comment at 3 (“The conversation around open foundation models is significantly enriched by a nuanced understanding of the marginal risks they pose compared to their closed counterparts and existing technologies.”); OTI Comment at 17 (“NTIA and other U.S. government agencies must focus vague discussions about the risks of open AI models on the study and precise articulation of the marginal risk these models pose.”); Holistic AI Comment at 10 (“To effectively interrogate and embed these considerations, it is crucial for policy and governance discourses on responsible model release to be anchored around the concept of the marginal risk posed by open foundation models”) (internal hyperlink omitted); CDT Comment at 14 (“In evaluating the risks of [open foundation models], we must consider them in comparison to the existing risks enabled by closed models, by access to existing technologies such as the internet, and by smaller models that carry similar risks but for which controlling proliferation would be much harder if not impossible. In other words, we must consider the marginal risk of [open foundation models].”) (italics in original) (internal citation omitted); Mozilla Comment at 13 (“Debates around safety and ‘open source’ AI should center marginal risk[.]”) (quotation marks in original); Microsoft Comment at 1-2 (“We recommend [...] [p]romoting risk and impact assessments that are grounded in the specific attributes of widely available model weights that present risk, the marginal risk of such availability compared to existing systems[.] [. . .]”); PAI Comment at 6 (“In assessing the risk posed by open foundation models, and appropriate measures to address those risks, policy makers should focus on the marginal risks associated with open access release.”) (internal hyperlink omitted); Databricks Comment at 2 (“The benefits of open models substantially outweigh the marginal risks, so open weights should be allowed, even at the frontier level[.]”); Meta Comment at 16 (“In order to precisely identify and assess risks uniquely presented by open foundation models, it is important to apply a ‘marginal risk analysis’ that takes account of the risks of open models compared to: (1) preexisting technologies, and (2) closed models.”) (internal citation omitted) (quotation marks in original); GitHub Comment at 3 (“Evidence of harmful capabilities in widely available model weights and their use should consider baselines of closed, proprietary AI capabilities and the availability of potentially dangerous information in books and via internet search. [. . .] Today, available evidence of the marginal risks of open release does not substantiate government restrictions.”); BSA Comment at 3 (“Any specific policy options for open foundation models should be considered only as any marginal risk posed by such models are better understood.”); U.S. Chamber of Commerce at 3 (“As indicated in the NIST [Risk Management Framework] 1.0, ‘Risk tolerance and the level of risk acceptable to organizations or society are highly contextual and application and use-case specific.’ This is why we believe it is essential for NTIA to focus on the marginal risk, which is context-specific.”) (internal citation omitted) (quotation marks in original); AI Healthcare Working Group at 1 (“The risks of technology are real, but their promise outweighs those risk and those risks should be viewed and evaluated in the context of marginal risk.”); Johns Hopkins Center for Health Security Comment at 5 (“As Sayesh Kapoor and colleagues caution, it is important to consider the marginal risk that open models pose above preexisting technologies.”) (internal citation omitted). See also generally Center for Democracy & Technology et al. (March 25, 2024). RE: Openness and Transparency in AI Provide Significant Benefits for Society. (letter from civil society organizations promoting a marginal risk assessment); Kapoor, S., et al. (2024). On the Societal Impact of Open Foundation Models. ArXiv. (presenting a marginal risk framework).

31 See Executive Order 14110, section 4.6.