AI System Access For Researchers and Other Third Parties
Researchers, auditors, red teams, and other affected parties such as workers and unions all need appropriate access to AI systems to evaluate them. While researchers can conduct “adversarial” reviews of public-facing systems without any special access, collaboration between the evaluator and the AI actor will often be required to fully assure that systems are trustworthy.152 Commenters urged the government to facilitate appropriate external access to AI systems.153 Rigorous inquiries could require access to governance controls and design decisions; to AI system processes (for example, to run evaluator-supplied inputs through the system); and to components of the model itself, accompanying software or hardware, data inputs, model outputs, and/or refinements and modifications.
The degree of access required will vary with the questions raised. For the researcher who wants to examine whether an application has produced unlawfully discriminatory outcomes, it may be enough to have input and output data (also known as black-box model access). Commenters noted that much more access may be required to assess the damage that could result from malign use of advanced AI, such as large language models. One commenter referenced the New York Federal Reserve system of embedding a team within every major bank in New York as a model154 and suggested that “[t]o faithfully evaluate models with all of the advantages that a motivated outsider would have with access to a model’s architecture and parameters, auditors must be given resources that enable them to simulate the level of access that would be available to a malign actor if the model architecture and parameters were stolen.”155 Some commenters argued that creators and individuals should be able to request access to AI system datasets to identify and report personal data or copyrighted works.156
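To illustrate what black-box access alone can support, the sketch below checks paired inputs and outputs from a hypothetical lending application for disparities in approval rates across demographic groups. The group labels, example records, and the four-fifths screening threshold are illustrative assumptions rather than a prescribed methodology.

```python
# Minimal sketch of a black-box disparity check: the evaluator sees only the
# inputs it supplies (or observes) and the decisions the system returns, not
# the model's internals. All data and thresholds here are hypothetical.
from collections import defaultdict

def selection_rates(records):
    """Approval rate per demographic group from paired input/output records."""
    approved, total = defaultdict(int), defaultdict(int)
    for rec in records:
        total[rec["group"]] += 1
        if rec["decision"] == "approve":
            approved[rec["group"]] += 1
    return {group: approved[group] / total[group] for group in total}

def disparate_impact_ratio(rates):
    """Ratio of the lowest group selection rate to the highest."""
    return min(rates.values()) / max(rates.values())

if __name__ == "__main__":
    # Hypothetical records an evaluator assembled through black-box queries.
    observed = [
        {"group": "A", "decision": "approve"},
        {"group": "A", "decision": "approve"},
        {"group": "A", "decision": "deny"},
        {"group": "B", "decision": "approve"},
        {"group": "B", "decision": "deny"},
        {"group": "B", "decision": "deny"},
    ]
    rates = selection_rates(observed)
    ratio = disparate_impact_ratio(rates)
    print(f"Selection rates: {rates}")
    print(f"Disparate impact ratio: {ratio:.2f}")
    if ratio < 0.8:  # the common "four-fifths" screening threshold
        print("Flag for further review: possible adverse impact.")
```

A check of this kind requires no visibility into model weights or training data, which is why input and output data can suffice for some discrimination inquiries while deeper questions demand deeper access.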
We note that the European Union’s Digital Services Act requires very large online platforms and search engines to facilitate researcher access to data from those services and their associated algorithmic systems. That regulation treats researcher access as an indispensable part of the platform accountability scheme in certain instances.
Third-party access to AI systems for evaluation purposes comes with risks that need to be managed. Three principal risks are:
- Liability risks to researchers from claims of copyright or contract violation, or from circumventing terms of service (e.g., by scraping data) and other controls that seek to shield AI system components from view.157 A number of commenters proposed a safe harbor from intellectual property or other liability for research into AI risks.158
- Security risks to AI actors from providing access (willingly or not) to AI system components. Outsider access can jeopardize AI actors’ trade secrets as well as the controls they have in place to prevent misuse of AI systems. Application Programming Interfaces (APIs) can be used to mediate access between researchers and AI actors, thereby reducing these risks.159 The first sketch following this list illustrates such tiered, API-mediated access.
- Privacy risks to the subjects of sensitive data that may be revealed when data is accessed for evaluation. For example, evaluating whether an AI system outputs discriminatory loan recommendations might require access to personal data about loan applicants. Researchers usually have processes in place to minimize these risks, such as limiting data collection, obfuscating sensitive data before storing it, and complying with institutional review board requirements. Using existing, and developing new, privacy enhancing technologies can also mitigate these risks.160 The second sketch following this list illustrates this kind of data minimization.
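As a concrete illustration of API-mediated access, the sketch below gates research requests by trust tier, in the spirit of the tiered research API described in footnote 159. The tier names, daily limits, and permitted operations are illustrative assumptions and do not describe any particular provider’s interface.

```python
# Illustrative sketch of trust-tiered, API-mediated researcher access.
# Tier names, daily limits, and permitted operations are assumptions made
# for illustration; they do not reflect any actual provider's API.
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessTier:
    name: str
    requests_per_day: int
    operations: frozenset  # operations a researcher at this tier may invoke

TIERS = {
    "public": AccessTier("public", 100, frozenset({"query"})),
    "vetted": AccessTier("vetted", 10_000, frozenset({"query", "logprobs"})),
    "embedded": AccessTier("embedded", 100_000,
                           frozenset({"query", "logprobs", "eval_fine_tune"})),
}

def authorize(tier_name: str, operation: str, requests_today: int) -> bool:
    """Gate a research request: the tier must permit the operation and stay under its daily limit."""
    tier = TIERS.get(tier_name)
    if tier is None or operation not in tier.operations:
        return False
    return requests_today < tier.requests_per_day

if __name__ == "__main__":
    print(authorize("public", "logprobs", 5))       # False: not permitted at this tier
    print(authorize("vetted", "logprobs", 5))       # True
    print(authorize("embedded", "query", 100_000))  # False: daily limit reached
```

Mediating access this way lets the AI actor expand visibility for vetted or embedded evaluators without exposing model components wholesale.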
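Similarly, the following sketch shows one simple way an evaluator might minimize and obfuscate sensitive data before storing it: unneeded fields are dropped and direct identifiers are replaced with salted hashes. The field names and salt handling are illustrative assumptions; real evaluations would pair such steps with stronger privacy enhancing technologies and institutional review.

```python
# Minimal sketch of data minimization before storage: keep only the fields
# the evaluation needs and replace direct identifiers with salted hashes.
# Field names and salt handling are illustrative assumptions only.
import hashlib
import secrets

SALT = secrets.token_hex(16)  # generated per study and never stored with the data

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a truncated, salted SHA-256 digest."""
    return hashlib.sha256((SALT + value).encode()).hexdigest()[:16]

def minimize(record: dict, keep: set) -> dict:
    """Drop every field the evaluation does not need."""
    return {key: value for key, value in record.items() if key in keep}

if __name__ == "__main__":
    applicant = {
        "name": "Jane Doe",
        "ssn": "123-45-6789",
        "zip": "02139",
        "decision": "deny",
    }
    stored = minimize(applicant, keep={"zip", "decision"})
    stored["applicant_id"] = pseudonymize(applicant["ssn"])
    print(stored)  # e.g., {'zip': '02139', 'decision': 'deny', 'applicant_id': '...'}
```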
The security and privacy risks underscore the need to vet researchers before permitting access to certain AI system components, monitor and limit access, and define other controls on when, why, and how sensitive information is shared.
152 See Jakob Mökander, Jonas Schuett, Hannah Rose Kirk, and Luciano Floridi, Auditing Large Language Models: A Three-Layered Approach, AI Ethics (2023), at 8.
153 See, e.g., OpenMined Comment at 1; Stanford Institute for Human-Centered AI Center for Research on Foundation Models Comment at 6-7 (recommending mandated researcher access to evaluate foundation models (red-teaming), mediated by provider consent and perhaps in the form of a sandbox).
154 ARC Comment at 7.
155 ARC Comment at 9. See also AI Policy and Governance Working Group Comment at 3 (The government should “mandate access to the technical infrastructure to enable varying levels of visibility into different components of (potentially) consequential AI systems”); Stanford Institute for Human-Centered AI Center for Research on Foundation Models Comment at 6-7 (recommending mandated researcher access to evaluate foundation models, mediated by deployer consent and perhaps in the form of a sandbox).
156 See, e.g., Copyright Alliance Comment at 6 (“Best practices from corporations, research institutions, governments, and other organizations that encourage transparency around AI ingestion already exist that enable users of AI systems or those affected by its outputs to know the provenance of those outputs. In particular, except where the AI developer is also the copyright owner of the works being ingested by the AI system, it is vital that AI developers maintain records of which copyrighted works are being ingested and how those works are being used, and make those records publicly accessible as appropriate (and subject to whatever reasonable confidentiality provisions the parties to a license may negotiate).”).
157 In a recent decision, the Supreme Court interpreted the Computer Fraud and Abuse Act in a way that potentially narrows the circumstances under which scraping data for purposes such as researching discrimination might constitute a violation of the statute. See Van Buren v. United States, 141 S. Ct. 1648 (2021). Nevertheless, this and other cases have not fully dispelled the fears of independent researchers. See Sasha Costanza-Chock, Inioluwa Deborah Raji, and Joy Buolamwini, “Who Audits the Auditors? Recommendations from a Field Scan of the Algorithmic Auditing Ecosystem,” Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT '22), 1571–1583, at 1577.
158 See infra Section 5.1.
159 See, e.g., GovAI Comment at 8-9 (noting that a “research API should have different access tiers based on trust” and supporting “the creation of a secure research API” that would be integrated with the National AI Research Resource).
160 See, e.g., OpenMined Comment at 3; GovAI Comment at 8 (noting that “structured transparency can help balance access with security through the use of privacy enhancing technologies”); Researchers at Boston University and University of Chicago Comment at 8 (recommending that federal regulators “encourage the development and use of …privacy enhancing technologies that protect businesses' and consumers’ privacy interests without compromising accountability.”).