
AI System Evaluations

March 27, 2024
Earned Trust through AI System Assurance

Transparency and disclosures regarding AI systems are primarily valuable insofar as they feed into accountability.172 One essential tool for converting information into accountability is critical evaluation of the AI system. The National Artificial Intelligence Advisory Committee (NAIAC), in its 2023 report, observed that “practices, standards, and frameworks for designing, developing, and deploying trustworthy AI are created in organizations in a relatively ad hoc way depending on the organization, sector, risk level, and even country.”173 We agree with its accompanying observation that it is problematic that “[r]egulations and standards are being proposed that require some form of audit or compliance, but without clear guidance accompanying them.”174

The RFC described different types of evaluation, including audits, impact and risk assessments, and pre-release certifications. Commenters were divided on whether independent audits are feasible now, before agreed-upon criteria exist for all aspects of AI system evaluation, and on whether audits should be mandated.175 Some comments reflected frustration with decades of technology self-regulation that has failed to meet societal expectations for risk management and accountability.176 At the same time, other commenters noted that audit practices (whether required or not) can devolve into rote checklist compliance, industry capture, and audit-washing.177


The scope and use of audits in accountability structures should depend on the risk level, deployment sector, maturity of relevant evaluation methodologies, and availability of resources to conduct the audits. Audits are probably appropriate for any high-risk application or model. At the very least, audits should be capable of validating claims made about system performance and limitations as well as governance controls. Where audits seek to assure a broader range of trustworthy AI attributes, they should ideally use replicable, standardized, and transparent methods. We recommend below that audits be required, regulatory authority permitting, for designated high-risk AI systems and applications and that government act to support a vigorous ecosystem of independent evaluation. We also recommend that audits incorporate the requirements in applicable standards that are recognized by federal agencies. Designating what counts as high risk outside of specific deployment or use contexts is difficult. Nevertheless, in draft guidance for federal agencies, OMB has designated presumptive categories of rights-impacting and safety-impacting AI systems, while providing for context-dependent exemptions.178 This is a promising approach to creating risk buckets for AI systems generally.



172 See, e.g., Generally Intelligent Comment at 4 (cautioning that disclosure requirements without consequence can be a “decoy”); Cordell Institute for Policy in Medicine & Law Comment at 2 (with reference to “[a]udits, assessments and certifications,” cautioning that “[m]ere procedural tools will fail to create meaningful trust and accountability without a backdrop of strong, enforceable consumer and civil rights protections.”); Mike Ananny and Kate Crawford, “Seeing Without Knowing: Limitations of the Transparency Ideal and its Application to Algorithmic Accountability,” New Media & Society, Vol. 20, Iss. 3, at 977-982 (December 13, 2016) (describing ten “[l]imits of the transparency ideal”: that “[t]ransparency can be disconnected from power,” “[t]ransparency can be harmful,” “[t]ransparency can intentionally occlude,” “[t]ransparency can create false binaries,” “[t]ransparency can invoke neoliberal models of agency,” “[t]ransparency does not necessarily build trust,” “[t]ransparency entails professional boundary work,” “[t]ransparency can privilege seeing over understanding,” “[t]ransparency has technical limitations,” and “[t]ransparency has temporal limitations”).

173 National Artificial Intelligence Advisory Committee, Report of the National Artificial Intelligence Advisory Committee (NAIAC), Year 1 (May 2023) at 28.

174 Id.

175 Compare Certification Working Group Comment at 21 (recommending mandating “accountability measures” and auditor and researcher access “for high capability AI systems (those that operate autonomously or semi-autonomously and pose substantial risk of harm, including physical, emotional, economic, or environmental harms)”) with The American Legislative Exchange Council Comment at 8 (“voluntary codes of conduct, industry-driven standards, and individual empowerment should be preferred over government regulation in emerging technology.”).

176 The AFL-CIO Technology Institute Comment at 5 (“Self-regulatory, self-certifying, or self-attesting accountability mechanisms are insufficient to provide the level of protection workers, consumers, and the public deserve. Certifications generally only determine whether the development of the AI product or service has followed a promised set of guidelines, typically established by the developer or company or industry body.”); Center for American Progress Comment at 16 (“In order to get private companies to conduct these assessments and audits, mechanisms must directly impact what developers care about most and be aligned with the for-profit incentives driving their rapid technological development. For these reasons, voluntary measures are insufficient. Government action (such as formal rulemaking, executive orders, and new laws) are clearly needed; we cannot allow the Age of AI to be another age of self-regulation.”).

177 Mozilla Comment at 6 (“[I]t is important to untangle incentives in the auditing ecosystem — only where the incentive structure is right and auditors are sufficiently independent (and have sufficient access) can there be more certainty that audits aren’t simply conducted for the purpose of ‘audit-washing’”); The Cordell Institute for Policy in Medicine & Law Comment at 2 (describing rules built only around transparency and bias mitigation as “AI half-measures” because they provide the appearance of governance but fail (when deployed in isolation) to promote human values or hold liable those who create and deploy AI systems that cause harm). See also Ellen P. Goodman and Julia Trehu, Algorithmic Auditing: Chasing AI Accountability, 39 Santa Clara High Tech. L.J. 289, 302 (2023) (coining the term “audit-washing” to describe the use of weak audit criteria to effectively misrepresent AI system characteristics, performance, or risks).

178 See OMB Draft Memo at 24-25.