Recommendations

March 27, 2024
Earned Trust through AI System Assurance

The public, consumers, customers, workers, regulators, shareholders, and others need reliable information to make choices about AI systems. To justify public trust in, and reduce potential harms from, AI systems, it will be important to develop “accountability inputs” including better information about AI systems as well as independent evaluations of their performance, limitations, and governance. AI actors should be held accountable for claims they make about AI systems and for meeting established thresholds for trustworthy AI. Government should advance the AI accountability ecosystem by encouraging, supporting, and/or compelling these inputs. Doing this work is a natural follow-on to the AI EO, which establishes a comprehensive set of actions on AI governance; the White House Blueprint for an AI Bill of Rights, which identified the properties that should be expected from algorithmic systems; and NIST’s AI RMF, which recommended a set of approaches to AI risk management. To advance AI accountability policies and practices, we recommend guidance, support, and the development of regulatory requirements.

 

GUIDANCE

Audits and auditors: Federal government agencies should work with stakeholders as appropriate to create guidelines for AI audits and auditors, using existing and/or new authorities.

Independent AI audits and evaluations are central to any accountability structure. To help create clarity and utility around independent audits, we recommend that the government work with stakeholders to create basic guidelines for what an audit covers and how it is conducted – guidance that will undoubtedly have some general components and some domain-specific ones. This work would likely include the creation of auditor certifications and audit methodologies, as well as mechanisms for regulatory recognition of appropriate certifications and methodologies.

Auditors should adhere to consensus standards and audit criteria where possible, recognizing that some will be specific to particular risks (e.g., dangerous capabilities in a foundation model) and/or particular deployment contexts (e.g., discriminatory impact in hiring). Much work is required to create those standards – which NIST and others are undertaking. Audits and other evaluations are being rolled out now concurrently with the development of technical standards. Especially where evaluators are not yet relying on consensus standards, it is important that they show their work so that they too are subject to evaluation. Auditors should disclose methodological choices and auditor independence criteria, with the goal of standardizing such methods and criteria as appropriate. The goals of safeguarding sensitive information and ensuring auditor independence and appropriate expertise may militate towards a certification process for qualified auditors.

AI audits should, at a minimum, be able to evaluate claims made about an AI system’s fitness for purpose, performance, processes, and controls. Regardless of claims made, an audit should apply substantive criteria arrived at through broad stakeholder inquiry across the AI system lifecycle. Areas of review might include the following (a schematic checklist appears after this list):

  • Risk mitigation and management, including harm prevention;
  • Data quality and governance;
  • Communication (e.g., documentation, disclosure, provenance); and
  • Governance or process controls.
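
To make the shape of such an audit scope concrete, the sketch below models the four review areas above as a simple checklist structure in Python. The AuditCriterion class, its fields, and the sample criteria are illustrative assumptions, not a prescribed audit format.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class AuditCriterion:
        """One substantive criterion an auditor evaluates, with its finding."""
        area: str                                          # e.g., "Risk mitigation and management"
        claim: str                                         # the claim or requirement under review
        evidence: List[str] = field(default_factory=list)  # documentation, test results, interviews
        finding: str = "not yet assessed"                  # e.g., "met", "partially met", "not met"

    # Illustrative audit scope covering the four review areas listed above.
    audit_scope = [
        AuditCriterion("Risk mitigation and management", "Harm-prevention measures are documented and tested"),
        AuditCriterion("Data quality and governance", "Training-data provenance and quality checks are recorded"),
        AuditCriterion("Communication", "Documentation, disclosures, and provenance signals are published"),
        AuditCriterion("Governance or process controls", "Access, change-management, and oversight controls are in place"),
    ]

    for criterion in audit_scope:
        print(f"[{criterion.area}] {criterion.claim}: {criterion.finding}")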

As valuable as they are, independent evaluations, including audits, do not derogate from the importance of regulatory inspection of AI systems and their effects.

Disclosure and access: Federal government agencies should work with stakeholders to improve standard information disclosures, using existing and/or new authorities.

Disclosures should be tailored to their audiences, which may require the creation of multiple artifacts at varying levels of detail and/or the establishment of informational intermediaries. Standardizing a baseline disclosure using artifacts like model and system cards, datasheets, and nutritional labels for AI systems can reduce the costs for all constituencies evaluating and assuring AI. As it did with food nutrition labels, the government may have a role in shaping standardized disclosure, whatever the form. We recommend support of the NIST-led process to provide guidance and best practices on standardized baseline disclosures for AI systems and certain models as an input to AI accountability. Working with stakeholders and achieving commitments from government suppliers, contractors, and grantees to implement such standardized baseline disclosures could advance adoption.
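
As one way to picture what a standardized baseline disclosure might contain, the sketch below renders a minimal model-card-style record as machine-readable JSON in Python. The field names and example values are assumptions for illustration only and do not reflect NIST guidance or any adopted labeling format.

    import json

    # A hypothetical baseline disclosure, loosely patterned on model/system cards
    # and datasheets. Every field name here is illustrative, not a standard.
    baseline_disclosure = {
        "system_name": "ExampleResumeScreener",   # hypothetical system
        "developer": "Example Corp",
        "intended_use": "Rank resumes to triage them for human reviewers",
        "out_of_scope_uses": ["Automated hiring decisions without human review"],
        "training_data_summary": "Proprietary resume corpus, 2015-2023",
        "evaluation": {
            "metrics": ["selection-rate parity", "precision at k"],
            "known_limitations": ["Not evaluated on non-English resumes"],
        },
        "governance_contact": "ai-governance@example.com",
        "last_updated": "2024-03-27",
    }

    # Different audiences could consume the same record at different levels of
    # detail: a short summary for consumers, the full record for auditors.
    print(json.dumps(baseline_disclosure, indent=2))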

Liability rules and standards: Federal government agencies should work with stakeholders to make recommendations about applying existing liability rules and standards to AI systems and, as needed, supplementing them.

Stakeholders seek clarification of liability standards for allocating responsibility among AI actors in the value chain. We expect AI liability standards to emerge from the courts through legal actions that clarify responsibilities and redress harms. Regulatory agencies also have an important role in determining how existing laws and regulations apply to AI systems. Of course, Congress and state legislatures will define new liability contours. To help clarify and establish standards for liability, where needed, we encourage further study and collection of stakeholder and government agency input.

To this end, we support a government convening of legal experts and other relevant stakeholders, including affected communities, to inform how policymakers understand the role of liability in the AI accountability ecosystem. The AI accountability inputs we recommend in this Report will feed into legal actions and standards and, by the same token, these inputs should be shaped by the legal community’s emerging needs to vindicate rights and interests. It is also the case that a vibrant practice of independent third-party evaluation of AI systems may depend on both exposure to liability (e.g., perhaps for auditors) and protection from liability (e.g., perhaps for researchers), depending on relevant legal considerations.

SUPPORT

People and tools: Federal government agencies should support and invest in technical infrastructure, AI system access tools, personnel, and international standards work to invigorate the accountability ecosystem.

Robust auditing, red-teaming, and other independent evaluations of AI systems require resources, some of which the federal government already has and should make available, and some of which will require new funding. A significant move in this direction would be for Congress to support the U.S. AI Safety Institute, appropriate funds for it,383 and establish the National AI Research Resource (NAIRR). NAIRR could contribute to the larger set of needed resources, including:

  • Datasets to test for equity, efficacy, and many other attributes and objectives;

  • Compute and cloud infrastructure required to do rigorous evaluations;
  • Appropriate access to AI system components and processes for researchers, regulators, and evaluators, subject to intellectual property, data privacy, and security- and safety-informed constraints;
  • Independent red-teaming support; and
  • International standards development (including broad stakeholder participation) and, where applicable for national security, national standards development.

People are also required. We recommend an investment in federal personnel with appropriate sociotechnical expertise to conduct and review AI evaluations and other AI accountability inputs. Support for education and red-teaming efforts would also grow the ecosystem for independent evaluation and accountability.384

Research: Federal government agencies should conduct and support more research and development related to AI testing and evaluation, tools facilitating access to AI systems for research and evaluation, and provenance technologies through existing and new capacity.

Because of their complexity and importance for AI accountability, the following topics make compelling candidates for research and development investment:

  • Research into the creation of reliable, widely applicable evaluation methodologies for model capabilities and limitations, safety, and trustworthy AI attributes;
  • Research on durable watermarking and other provenance methods (a schematic provenance-record sketch follows this list); and
  • Research into technical tools that facilitate researcher and evaluator access to AI system components in ways that preserve data privacy and the security of sensitive model elements, while retaining openness.
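
To illustrate one narrow slice of the provenance problem, the sketch below binds a content hash to origin metadata with an HMAC signature, using only the Python standard library. The record fields and the shared-key signing scheme are simplifying assumptions for illustration; they are not a watermarking technique and not a proposed or existing provenance standard.

    import hashlib
    import hmac
    import json

    SIGNING_KEY = b"demo-key-not-for-production"  # assumption: key held by the content producer

    def make_provenance_record(content: bytes, producer: str, tool: str) -> dict:
        """Create a record binding origin metadata to a hash of the content."""
        record = {
            "content_sha256": hashlib.sha256(content).hexdigest(),
            "producer": producer,
            "generation_tool": tool,
        }
        payload = json.dumps(record, sort_keys=True).encode()
        record["signature"] = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        return record

    def verify_provenance_record(content: bytes, record: dict) -> bool:
        """Check that the content matches the recorded hash and the signature is intact."""
        unsigned = {k: v for k, v in record.items() if k != "signature"}
        payload = json.dumps(unsigned, sort_keys=True).encode()
        expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()
        return (
            hmac.compare_digest(expected, record["signature"])
            and unsigned["content_sha256"] == hashlib.sha256(content).hexdigest()
        )

    image_bytes = b"...synthetic image bytes..."
    record = make_provenance_record(image_bytes, producer="Example Lab", tool="example-diffusion-v1")
    assert verify_provenance_record(image_bytes, record)
    assert not verify_provenance_record(b"tampered bytes", record)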

Government should build on investments already underway through the U.S. AI Safety Institute and the National Science Foundation.

REGULATORY REQUIREMENTS

Audits and other independent evaluations: Federal agencies should use existing and/or new authorities to require, as needed, independent evaluations and regulatory inspections of high-risk AI model classes and systems.

There are strong arguments for sectoral regulation of AI systems in the United States, as well as for mandatory audits of AI systems deemed to present a high risk of harming rights or safety, according to holistic assessments tailored to deployment and use contexts. Given these arguments, work needs to be done to implement regulatory requirements for audits in some situations. It may not currently be feasible to require audits for all high-risk AI systems because the ecosystem for AI audits is still immature; requirements may need delayed implementation. However, the ecosystem’s maturity will be accelerated by forcing functions. Government may also need to require other forms of information creation and distribution, including documentation and disclosure, in specific sectors and deployment contexts (beyond what it already requires).

Additional consideration should be given to the necessity of pre-release claim substantiation and other certification requirements for certain high-risk AI systems, models, and/or AI systems in high-risk sectors (e.g., health care and finance), as well as periodic claim substantiation for deployed AI systems. Such proactive substantiation would help AI actors to shoulder their burden of assuring AI systems from the start. In the AI context, this marginal additional friction for AI actors could create breathing room for accountability mechanisms to catch up to deployment.

Regardless of the type of inspection model adopted, federal regulatory agencies should coordinate closely with regulators in non-adversarial countries to align inspection regimes in their methods and use of international standards, so that AI products can be evaluated using globally comparable criteria.

Cross-sectoral governmental capacity: The federal government should strengthen its capacity to address cross-sectoral risks and practices related to AI.

Although sector-specific requirements for AI already exist, the exercise of horizontal capacity in the federal government would provide common baseline requirements, reinforce appropriate expertise to oversee AI systems, help to address cross-sectoral risks and practices, allow for better coordination among sectoral regulators that require or consume disclosures and evaluations, and provide regulatory capacity to address foundation models.

Such cross-sectoral horizontal capacity, wherever housed, would be useful for creating accountability inputs such as:

  • A national registry of high-risk AI deployments;
  • A national AI adverse incidents reporting database and platform for receiving reports (an illustrative incident-record sketch follows this list);
  • A national registry of disclosable AI system audits;
  • Coordination of, and participation in, audit standards and auditor certifications, enabling advocacy for the needs of federal agencies and congruence with independent federal audit actions;
  • Pre-release review and certification for high-risk deployments and/or systems or models;
  • The collection of periodic claim substantiation for deployed systems; and
  • Coordination of AI accountability inputs with agency counterparts in non-adversarial states.
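
To make one of these inputs concrete, the sketch below models a single entry in a hypothetical adverse-incident reporting database in Python. Every field name, category, and value is an assumption for illustration and does not describe any existing or planned federal system.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class AIIncidentReport:
        """One record in a hypothetical national AI adverse-incident database."""
        incident_id: str
        reported_by: str                  # e.g., deployer, affected individual, researcher
        system_name: str
        deployment_sector: str            # e.g., "hiring", "health care", "finance"
        description: str
        harms_alleged: List[str] = field(default_factory=list)
        severity: str = "unassessed"      # e.g., "low", "medium", "high"
        status: str = "received"          # e.g., "received", "under review", "closed"

    report = AIIncidentReport(
        incident_id="2024-000123",
        reported_by="affected individual",
        system_name="ExampleResumeScreener",   # hypothetical system
        deployment_sector="hiring",
        description="Qualified applicants in a protected class were screened out at a higher rate.",
        harms_alleged=["discriminatory impact"],
    )
    print(f"{report.incident_id} [{report.deployment_sector}] status={report.status}")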

Contracting: The federal government should require that government suppliers, contractors, and grantees adopt sound AI governance and assurance practices for AI used in connection with the contract or grant, including using AI standards and risk management practices recognized by federal agencies, as applicable.

The government’s significant purchasing power affords it the ability to shape marketplace standards and to prefer suppliers who provide sufficient documentation, access, freedom to evaluate, and other assurance practices. As the National AI Advisory Committee Report recommended, the government should reform procurement practices to promote trustworthy AI. The same principles would apply to government grants. The OMB draft guidance on “Advancing Governance, Innovation, and Risk Management for Agency Use of Artificial Intelligence” represents a significant step in this direction.385

 


383 Without taking a position at this time, we note there may be other models for funding, such as fee-based application revenue from AI companies that seek government assistance. For literature on certain fee models that exist across some federal agencies, see, e.g., Government Accountability Office (GAO), Federal User Fees: Fee Design Options and Implications for Managing Revenue Instability (GAO Report No. GAO-13-820) (Sept. 2013); James M. MacDonald, User-Fee Financing of USDA Meat and Poultry Inspection, Agricultural Economic Report No. 775 (AER-775) (March 1999), Chapter 3.

384 The Government Accountability Office has also noted that “[f]oundational to solving the AI accountability challenge is having a critical mass of digital expertise to help accelerate responsible delivery and adoption of AI capabilities.” Government Accountability Office (GAO), Artificial Intelligence: Key Practices to Help Ensure Accountability in Federal Use (GAO Report No. GAO-23-106811), at 1 (May 16, 2023).

385 See OMB Draft Memo. See also AI EO at Sec. 7.3 (directing the Department of Labor to establish “guidance for Federal contractors regarding nondiscrimination in hiring involving AI and other technology-based hiring systems”).