
Ecosystem Requirements

March 27, 2024
Earned Trust through AI System Assurance

The supply of capable evaluators lags the pace of AI innovation. A paper produced for Google DeepMind opines: “[i]deally there would exist a rich ecosystem of model auditors providing broad coverage across different risk areas. (This ecosystem is currently under-developed.)”252  Research drawing on auditing experiences across sectors, including pharmaceuticals and aviation, “strongly supports training, standardization, and accreditation for third-party AI auditors.”253  Many commenters addressed this point, observing that the ecosystem for AI assurance requires more investment, diverse stakeholder participation, and professionalization.

Programmatic Support for Auditors and Red-Teamers

The linchpin for robust evaluations is a supply of qualified auditors, researchers capable of doing red-teaming or other adversarial investigations, and critical personnel inside AI companies. There is now a “substantial gap between the demand for experts to implement responsible AI practices and the professionals who are ready to do so.”254  To grow the pipeline of those professionals, our evaluation of the record suggests that there should be more investment in the training of students in applied statistics, data science, machine learning, computer science, engineering, and other disciplines (perhaps including humanities and social sciences) to do AI accountability work. This training should include methods for obtaining and incorporating the input of affected communities.255  Marketplace demand could demonstrate to motivated students that AI assurance work is in fact a viable professional pathway.

 

Red teaming – the practice of outside researchers using adversarial tactics to stress test AI systems for vulnerabilities and risks – is becoming an important part of the accountability ecosystem.256  The largest AI companies are embracing red-teaming.257  But as one such company noted, talent is concentrated inside private AI labs, which reduces the capacity for independent evaluation.258  Another possible drag on red-teaming contributions is the requirement that red teams sign nondisclosure agreements to conduct their probes, which limits what they can share with the public and, ultimately, the ways in which their evaluations can feed into the accountability ecosystem. One goal of the White House engagement with red-teaming has been to diversify and increase the supply of red teams.259  Red teams, like audit teams, should be diverse and multi-disciplinary in their membership and inquiries.260  Techniques to support adversarial testing and evaluation include providing bounties and competitions for the detection of AI system flaws.

Datasets and Compute

Insufficient or inadequate datasets can be an obstacle to evaluating AI systems, as well as to training, testing, and refining them to be equitable and otherwise trustworthy. For example, determining whether an AI system is unlawfully discriminatory when deployed in a particular context may require examination of the training datasets and/or the availability of new datasets for testing.261  Such evaluations require test data that many entities will not have. Commenters noted that limited data or data voids make it difficult to conduct some AI system evaluations.262

The need for publicly supplied datasets for AI system evaluation and advancement is well established. The National AI Research Resource (NAIRR) Task Force, established by the National AI Initiative Act of 2020, was a federal advisory committee with equal representation from government, academia, and private organizations. In 2023, it released an implementation plan for federal infrastructure support for AI research, including “research related to robustness, scalability, reliability, safety, security, privacy, interpretability, and equity of AI systems.”263  To promote American progress in AI, it recommended that Congress establish a research resource (the NAIRR) that would, among other things, make datasets available for training and evaluation, and support research and education around trustworthy AI. The AI EO directed the Director of the National Science Foundation, in coordination with other federal agencies, to launch a pilot program implementing the NAIRR, consistent with past recommendations of the NAIRR Task Force.264  The pilot has now launched.265

In its final report, the NAIRR Task Force recommended that the NAIRR should “provide access to a federated mix of computational and data resources, testbeds, software and testing tools, and user support services via an integrated portal.”266  Commenters vigorously endorsed supporting the NAIRR.267  Some focused on the provision of datasets, even if NAIRR was not specifically mentioned. One commenter, for example, opined that government, civil society, and industry should collaborate “in building data ecosystems which help generate meaningful datasets in quantity and quality, ensuring and enabling a fair and ethical AI ecosystem that provides appropriate levels of data protection.”268  Others stressed that it would advance AI accountability and competition if the federal government made more datasets available to developers.269

Conducting evaluations of AI systems, like building and refining them, requires the underlying computing power to analyze enormous datasets and run applications. With computing power, known as “compute,” concentrated in the largest companies and some elite universities, we underscore recommendations about making more compute available to researchers and businesses.270

Auditor Certification

Another part of the AI accountability ecosystem in need of development is certification for AI system auditors,271  which standards organizations are beginning to establish.272  Auditors should be subject to “professional licensure, professional and ethical standards, and independent quality control and oversight (e.g. peer review and inspection).”273  ForHumanity, a non-profit public charity that provides AI audit services, recommended that such certifications require auditors to be liable for “false assurance of compliance,” be “qualified to provide expert-level service,” be “held to a standard of [p]rofessionalism and [c]ode of [e]thics,” and have “robust systems to support integrity and confidentiality of” audits and independence.274  Professional standards and best practices can potentially help to strengthen the integrity of audits.275  For example, ForHumanity worked with the Partnership on Employment & Accessible Technology (PEAT) to create a Disability Inclusion and Accessibility audit certification, which trains auditors to assess AI systems for risks that could harm people with disabilities.276  However, it is also possible that the gatekeeping effects of professionalization and credentialing could unduly narrow participation. If credentialing is too concentrated or stringent, it could artificially constrain the supply of evaluators. Whether as part of credentialing or in its absence, transparency about audit methodology and goals may be the most important check on quality.277

It is relatively uncontroversial that auditor independence should be measured according to a prescribed professional standard.278  The European Union’s Digital Services Act requires annual independent audits of providers of very large online platforms and very large online search engines; the organizations performing these audits must, among other requirements, be “independent from” and without “any conflicts of interest with” the service providers they audit.279  Auditor independence is determined in part by the type of services auditors may have provided to the auditee in the 12-month period preceding the audit.280  The Sarbanes-Oxley Act of 2002 (“Sarbanes-Oxley”) defines independence in the context of annual financial auditing, and some commenters recommended importing that definition into the AI context in the United States.281  Others cautioned against giving too much credence to these or any other formal independence requirements, noting that de jure and actual independence may diverge because auditors can be “captured” by those who pay for their services.282

Auditors should have subject-matter and assurance experience and reflect the diversity of affected stakeholders.283  Demand for people or teams qualified to conduct AI evaluations who also satisfy the most rigorous independence requirements could outstrip supply. At least in the short term, such a tight supply of qualified auditors could raise the cost of audits.

One concern raised in feedback to the European Commission on independent audits under the Digital Services Act is that only a limited number of entities have both a sufficiently high level of independence and the competencies needed to conduct these audits.284  The dilemma is that lower standards of assurance and independence might increase auditor supply, but perhaps at the cost of audit effectiveness and, ultimately, public wellbeing. To be sure, the desired end state is an abundant supply of highly independent, qualified auditors. Emerging AI auditor certification programs could help.285

 


252 Shevlane, supra note 228, at 6. See also Databricks Comment at 2 (“The AI audit ecosystem is not mature enough to support mandatory third-party audits.”).

253 Inioluwa Deborah Raji, Peggy Xu, Colleen Honigsberg, and Daniel Ho, Outsider Oversight: Designing a Third Party Audit Ecosystem for AI Governance, AIES '22: Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society (July 2022), 557-571, at 565.

254 IAPP Comment at 2.

255 See, e.g., Cornell University Citizens and Technology Lab Comment at 2 (recommending that government fund educational projects involving citizen participation in AI accountability, possibly modeled on the EPA’s program in Participatory Science for Environmental Protection as documented in U.S. Environmental Protection Agency, Office of Science Advisor, Policy and Engagement, Using Participatory Science at EPA: Vision and Principles (June 2022)).

256 DEF CON 2023 held a red-teaming exercise with thousands of people; see Hack The Future. See also Microsoft Comment at 3 (noting that it is “working to extend [red-teaming] beyond traditional cybersecurity assessments to also uncover an AI system’s potential harms”); Stability.ai Comment at 15 (“DEF CON is one example of collaborative efforts to incentivize evaluation and reporting in an unregulated environment.”).

257 See, e.g., Google, Why Red Teams Play a Central Role in Helping Organizations Secure AI Systems (July 2023).

258 See Anthropic Comment at 17.

259 Alan Mislove, Red-Teaming Large Language Models to Identify Novel AI Risks, The White House (August 29, 2023).

260 See, e.g., ADL Comment at 5; Salesforce Comment at 6; Johnson & Johnson Comment at 3 (“Diversity, equity and inclusion must be considered in all aspects of AI (e.g., selecting the issues to address/problems to solve using AI, training and hiring a diverse workforce from the data scientists to programmers, attorneys, and program managers).”).

261 See Amy Dickens and Benjamin Moore, Improving Responsible Access to Demographic Data to Address Bias, Centre for Data Ethics and Innovation Blog (June 14, 2023).

262 See, e.g., BSA | The Software Alliance Comment at 12; BigBear Comment at 23.

263 National Artificial Intelligence Research Resource Task Force, Strengthening and Democratizing the U.S. Artificial Intelligence Innovation Ecosystem: An Implementation Plan for a National Artificial Intelligence Research Resource (January 2023), at A1. See also id. at 33-34 (proposing a data service with curated datasets including from government), 37-39 (proposing educational resources and test beds).

264 AI EO Sec. 5.2(a) (“The program shall pursue the infrastructure, governance mechanisms, and user interfaces to pilot an initial integration of distributed computational, data, model, and training resources to be made available to the research community in support of AI-related research and development.”).

265 National Science Foundation, National Artificial Intelligence Research Resource Pilot.

266 See National Artificial Intelligence Research Resource Task Force, supra note 263, at v.

267 See, e.g., Public Knowledge Comment at 14 (“The NAIRR could be a huge benefit to the development of safe, responsible, and publicly beneficial AI systems but the NAIRR needs more than the power of the purse backing it up in order to ensure that publicly-funded research and development remains publicly beneficial. Linking NAIRR resources with regulatory oversight would ensure enforcement of ethical and accountability standards and prevent public research resources from being unfairly captured for private benefit.”); Google DeepMind Comment at 31; Governing AI, supra note 47, at 25; Software and Information Industry Association Comment at 11; UIUC Comment at 17.

268 Johnson & Johnson Comment at 2. See also Dickens and Moore, supra note 261 (recommending the establishment of demographic data intermediaries or, alternatively, the use of proxy data to infer demographic data in addressing bias).

269 See, e.g., Adobe Comment at 8; U.S. Chamber of Commerce Comment at 11; Kant AI Solutions Comment at 3.

270 See, e.g., A 20-Year Community Roadmap for Artificial Intelligence Research in the US, Computing Community Consortium and AAAI, at 3 (August 2019); National Artificial Intelligence Research Resource Task Force, supra note 263, at ii. See also Nur Ahmed & Muntasir Wahed, The De-democratization of AI: Deep Learning and the Compute Divide in Artificial Intelligence Research, arXiv (Oct. 22, 2020).

271 See, e.g., AI Policy and Governance Working Group Comment at 6 (advocating that government be involved in credentialing auditors, which could lower costs and security risks of system access).

272 ISO is developing standards, ISO/IEC CD 42001 and 42006, for integrated AI management systems and for organizations certifying and auditing those systems respectively. ISO/IEC CD 42001, Information technology — Artificial intelligence — Management system; ISO/IEC CD 42006, Information technology — Artificial intelligence — Requirements for bodies providing audit and certification of artificial intelligence management systems.

273 AICPA Comment at 2.

274 ForHumanity Comment at 5.

275 Raji et al., Outsider Oversight, supra note 253, at 566 (“Fears of legal repercussions or corporate retaliation can weaken the audit inquiry, and professional standards can help determine limited conditions for liability.”).

276 See ForHumanity, FHCert.

277 See also PWC Comment at A1 (“The communication or report on the results of these engagements, regardless of who performs them, should specify, among other disclosures, the type of assurance provided, the scope of the procedures, and the framework under which it was performed”).

278 See, e.g., American Institute of CPAs (AICPA) Comment at 1 (recommending independent third-party assurance to apply “procedures designed to assess the credibility of the information and report on the results of their procedures”); Protofect Comment at 6 (“Calculation of risk should be determined by a 3rd party organization that can independently perform audits and give scores given multiple contexts - including security, privacy assessment, compliance, health and safety impact etc.”).

279 Regulation (EU) 2022/2065 of the European Parliament and of the Council of 19 October 2022 on a Single Market for Digital Services and Amending Directive 2000/31/EC (Digital Services Act), OJ L 277 (October 27, 2022), arts. 37(1), (3).

280 See Digital Services Act, supra note 279, at art. 37(3)(a)(i).

281 See ForHumanity Comment at 5 (referencing Sarbanes-Oxley Act and also recommending that auditors be subject to oversight and held liable for false assurance); Centre for Information Policy Leadership Comment at 18.

282 See, e.g., Data & Society Comment at 3 (“Conflicts of interest for assessors/auditors should be anticipated and mitigated by alternate funding for assurance work.”).

283 See Global Partners Digital Comment at 4 (commenting that audits should be conducted by teams with technical and social science expertise, human rights expertise, subject matter experts, community members, representatives of marginalized groups).

284 See, e.g., Mozilla Foundation, Response to the European Commission’s Call for Feedback on its Draft Delegated Regulation on Independent Audits in the Digital Services Act (June 2023), at 2 (“Fostering optimal conditions requires a diversity of audit practitioners and auditing organizations with a high level of independence and the appropriate competencies. . . . There is currently a limited number of entities prepared to conduct these audits given their enormous scope. Many likely auditing organizations have existing industry ties that limit their independence. A larger and more diverse pool of auditors must be fostered.”).

285 See also Responsible Artificial Intelligence Institute, The Responsible AI Certification Program (October 2022); ForHumanity Comment at 3; Holistic AI Comment at 5.