
Facilitate Internal and Independent Evaluations

March 27, 2024
Earned Trust through AI System Assurance

Commenters noted that self-administered AI system assessments are important for identifying risks and system limitations, building internal capacity for ensuring trustworthy AI, and feeding into independent evaluations. Internal assessments could be a principal object of analysis and verification for independent evaluators to the extent that the assessments are made available.48 Independent external third-party evaluations (for short, independent evaluations), including audits and red-teaming, may be necessary for the riskiest systems under a risk-based approach to accountability.49 These independent evaluations can serve to verify claims made about AI system attributes and performance, and/or to measure achievement with respect to those attributes against external benchmarks. Many commenters insisted that AI accountability mechanisms should be mandatory,50 while others thought that voluntary commitments to audits or other independent evaluations would suffice.51 Still other commenters took intermediate positions, with one noting that “a healthy policy ecosystem likely balances mandatory accountability mechanisms where risks demand it with voluntary incentives and platforms to share best practices.”52

We believe that there should be a mix of internal and independent evaluations, for the reasons stated above. AI actors may well undertake these evaluations voluntarily in the interest of risk management and harm reduction. However, as discussed below, regulatory and legal requirements around evaluations and evaluation inputs may also be necessary to make relevant actors answerable for their choices. Rather than impede innovation, governance that fosters robust evaluations could advance AI development.53



48 See infra Sec. 3.2.4.

49 AI Accountability RFC, 88 Fed. Reg. at 22436 (citations omitted). As discussed in the RFC, “[i]ndependent audits may range from ‘black box’ adversarial audits conducted without the help of the audited entity to ‘white box’ cooperative audits conducted with substantial access to the relevant models and processes.”

50 Anthropic Comment at 10 (recommending mandatory adversarial testing of AI systems before release through NIST or researcher access); Anti-Defamation League (ADL) Comment at 11, 12 (“Public-facing transparency reports, much like the reports required by California’s AB 587, could require information on policies, data handling practices, and training or moderation decisions while prioritizing user privacy and without revealing sensitive or identifying information”); PricewaterhouseCoopers, LLP (PWC) Comment at 8 (“[W]e recommend mandatory disclosure of third-party assurance or an explanation that no AI accountability work has been performed”); AFL-CIO Comment at 5 (advocating mandatory audits); Data & Society Comment at 8 (advocating a mandatory AI accountability framework); Accountable Tech, AI Now, and EPIC, Zero Trust AI Governance Framework at 4 (Aug. 2023) (“It should be clear by now that self-regulation will fail to forestall AI harms. The same is true for any regulatory regime that hinges on voluntary compliance or otherwise outsources key aspects of the process to industry. That includes complex frameworks that rely primarily on auditing – especially first-party (internal) or second-party (contracted vendors) auditing – which Big Tech has increasingly embraced. These approaches may be strong on paper, but in practice, they tend to further empower industry leaders, overburden small businesses, and undercut regulators’ ability to properly enforce the letter and spirit of the law.”).

51 Developers Alliance Comment at 12, 13; R Street Comment at 10-12; Consumer Technology Association Comment at 5; U.S. Chamber of Commerce Comment at 10; Business Roundtable Comment at 5 (“[P]olicymakers should incentivize, support and recognize good faith efforts on the part of industry to implement Responsible AI and encourage self-assessments by internal teams”); OpenAI Comment at 2 (advocates for voluntary commitments “on issues such as pre-deployment testing, content provenance, and trust and safety”).

52 DLA Piper Comment at 12.

53 See Rumman Chowdhury, Submitted Written Testimony for Full Committee Hearing of the House of Representatives Committee on Science, Space, and Technology: Artificial Intelligence: Advancing Innovation Towards the National Interest (July 22, 2023), at 1, 2 (“It is important to dispel the myth that ‘governance stifles innovation’. […] I use the phrase ‘brakes help you drive faster’ to explain this phenomenon - the ability to stop a car in dangerous situations enables us to feel comfortable driving at fast speeds. Governance is innovation.”).