Authored by Google DeepMind’s Zoë Brammer (Manager, Government Affairs and Public Policy), Lucy Lim (Research Scientist, Frontier Safety and Governance), and Dorothy Chou (Director, Public Engagement Lab), this essay takes learnings from cybersecurity bug bounty programs and uses technical considerations to advocate for a similar program to address AI security vulnerabilities.
Over the past two years, acceleration in artificial intelligence (AI) research has significantly shifted public and government expectations about risks and rewards presented by this transformative technology. A sharp rise in governance efforts - from global summits, to AI safety institutes and risk frameworks - have highlighted the perennial problem of how to govern novel technologies. The ways in which policymakers are accustomed to working—rules which address known risks and are stable over time—and the ways in which AI is evolving – even more rapidly and dynamically than other technologies, with emergent properties and unpredictable risks – are in sharp contrast.
Simply put, AI complicates traditional regulatory approaches. Blunt instruments like sweeping bans not only risk stifling innovation and potential benefits but also risk failing to effectively address actual harms.
But this governance conundrum isn’t entirely unique. History is littered with examples of emerging technologies that have posed direct challenges to pre-existing governance structures or enabled new behaviours which challenge existing principles. There is an opportunity for policymakers to mirror the creativity researchers and scientists bring in their development of model capabilities with equally creative approaches to developing policy and other governance mechanisms.
In this piece, we explore the potential of bug bounty programs to address AI security vulnerabilities. While safety issues may intersect with the security of AI systems, we believe bug bounty programs are best suited to identify vulnerabilities that affect the confidentiality, integrity, and availability of AI systems, rather than those that create safety issues - i.e. issues related to the behaviour or capabilities of AI systems. Google will be sharing more detailed updates about its AI bug bounty program in the coming weeks (Bug Hunter’s Blog).
Cybersecurity: a case study
In the early days of the Internet, community norms in cybersecurity emerged as part of the blueprint for managing emerging threats rather than banning deployment until technologies were perfected.
In the 1980s, a shared interest emerged among developers and the cybersecurity community, comprised in part of ethical hackers (the “white hats”), to identify and patch vulnerabilities before they could be exploited. They established the first bug bounty programme, followed by Netscape’s adoption of the approach in 1995. The concept was simple yet impressive: reward the members of the cybersecurity community for discovering and responsibly disclosing vulnerabilities to developers, rather than selling them on the black market or exposing them and potentially harming users. It accepted the fundamentally imperfect nature of the technology and worked to manage its evolution. As a result, a new and broadly respected field of security research emerged, alongside a framework of “Coordinated Vulnerability Disclosure” (CVD), facilitating collaboration between technology vendors and the cybersecurity community to identify and address security vulnerabilities.
The “responsible disclosure” norm evolved and standardised over time. Google’s Project Zero, for example, established a 90-day window for developers to address an identified vulnerability, alongside a commitment to wait 30 days after a patch is available to users before publicly releasing details of the vulnerability. This enables companies to patch their software while acknowledging the researcher’s contribution publicly and financially, protecting users and companies from exploitation while positively incentivising and compensating a broad community of security researchers. Google expanded their own programme in the 2010s, broadening its vulnerability rewards programme to multiple products and libraries. Along with other tech companies like Facebook and Microsoft, they built informational networks to address similar challenges across the industry. A collaborative and incentive-based approach proved effective in identifying and mitigating software vulnerabilities to the point that the US Department of Defense adopted bug bounty programmes in 2016, followed by the European Commission in 2019.
Of course, this is not to say that bug bounty programs are a replacement for traditional regulatory approaches, or that such an approach would work in all environments and under all conditions. But the key learning is that aligning incentives among different groups and working to understand the unique nature of the technology in question can lead to more effective outcomes in keeping people safe. Today, countless security researchers, both independent and working in industry, participate in corporate and government bug bounty programmes, making the Internet safer for everyone. Expanding these community driven efforts provides an opportunity to augment more traditional regulatory approaches.
Bug-bounty successes and failures
The success of bug bounty programmes lies in aligning incentives. Security researchers gain financial reward, developers fix vulnerabilities before they’re exploited (often at a lower cost for companies than maintaining a large, in-house security team) and users remain protected. The system also crucially accommodates the dynamism of emerging threats and technologies, as it allows for continuous and organic updating and adaptation.
Bug bounty programs have seen enormous success over the years. While it is hard to measure exactly how many vulnerabilities have been reported and subsequently remediated through these programs because patched vulnerabilities are not always reported to the public, it is clear that these programs help make the Internet safer for everyone. In 2018, for example, Apple’s bug bounty program led to the discovery of a critical FaceTime vulnerability by a 14-year-old user; Google’s Vulnerability Reward Program has paid out millions of dollars since 2010; and Facebook’s bug bounty program has paid out over $7.5 million since its inception in 2011.
Unlike heavy-handed regulations, bug bounty programs incentivise proactive security measures that evolve with the technology. Penalising security researchers for their red teaming work would be counterproductive, leaving developers and users less safe. Instead, these programs provide “safe harbors” for those doing essential policing work. The resulting culture of collaboration and continuous improvement has become a cornerstone of cybersecurity.
But bug bounty programs do not always catch vulnerabilities before they are exploited. Two recent cases - the 2014 Heartbleed bug and the 2021 Log4j vulnerability - caused widespread impact to users and to the broader economy, and highlight notable technical factors under which bug bounty programs appear to be less effective – namely that bug bounties often aren't a good fit for fundamental, open source dependencies on which lots of software runs. Notably, these technological factors are often exceedingly more complex in AI systems than they are in software alone, highlighting a major challenge in adapting the bug bounty program to address AI vulnerabilities. We will return to this later.
Adapting the bug bounty Model to AI: challenges and considerations
Given the success of bug bounty programs for software vulnerabilities, it is worth considering what a similar effort might look like for AI. But applying the cybersecurity model to AI is not a perfect fit. Notably, AI systems are composed of a wide variety of technical inputs, including algorithms that, taken together, create systems that are probabilistic and hard to debug. The continuous retraining and development of models adds further complexity to the identification and disclosure of vulnerabilities, particularly for external actors. For example, an exploit that works in one session may not work in another based on the outcome of probabilistic processes and different initial conditions. This may make identification of model vulnerabilities difficult as they lack distinct “exploited” and “non-exploited” states. This challenge is closely tied to interpretability, as it is not always clear in models’ behaviour whether they have been breached. These factors complicate the ability of researchers to identify and conduct root cause analysis of vulnerabilities, making them more difficult to remediate.
The stakes are in some ways also higher than they were in the Internet era. The rapid pace of AI progress, coupled with the potential for both immense benefits and serious risks creates a sense of urgency and competition that make collaborative efforts harder to stand up. Additional challenges arise from the distinction between narrow (focused on a single task) and general (capable of performing many tasks) AI as well as near-term and long-term harms.
Finally, while the cybersecurity community has a well-established culture of evaluating and testing software for vulnerabilities, a similar culture in the AI community remains nascent. The incentives of all stakeholders involved in AI development and deployment may not be as clearly aligned as they are in cybersecurity – and even there, tension still exists. Consumers, safety researchers, companies, policymakers, and even nation-states possess a range of different priorities and motivations which may not converge on a universally accepted framework for disclosure. Even if there was consensus in these areas, it remains unclear what a satisfactory ‘patch’ for some of these issues could look like. This culture needs to be cultivated by encouraging researchers to share their findings where appropriate and to collaborate on solutions.
Despite these challenges, there have been promising initiatives which signal a place to start. Last year’s DEF CON conference had a sub-conference (called a "village") focused on attempting to break the safeguards of various AI programmes, successfully spanning a range of harms from the production of misinformation, to the extraction of stored credit card numbers, to providing instructions on how to surveil someone.
Technical factors: learning from the cybersecurity community
The cybersecurity community has been working through these challenges for decades, and it is critical that the AI security community learn from their work and avoid reinventing the wheel. Bug bounty programs have effectively incentivised the security community to identify and report vulnerabilities to developers, but their effectiveness is also predicated to some extent on specific technical conditions. These conditions interplay with the design and implementation of bug bounty programs. For instance, a program for a complex, closed-source enterprise system might need to provide more detailed documentation and testing environments compared to a program for an open source web application. The development of bug bounty programs for AI must take these technical factors into account, especially in cases where AI adds nuance and complexity to them.
Open source vs. proprietary
Open source software poses unique challenges to bug bounty programs. While it can provide additional transparency as it is available for anyone to review, the incentives to do so are often lower, and the responsibility of vulnerability remediation often falls to small groups of volunteer developers maintaining open source code bases. Further, open source code is incorporated into millions of proprietary tools and services, complicating efforts to patch vulnerabilities, because in some cases each iteration of the open source integration needs its own patch. Further, even if the original, open source package is patched, there is no guarantee that downstream users will update the version they use in downstream applications. Closed source software, on the other hand, may benefit from "security through obscurity" but miss out on the advantages of community-driven reviews.
The question of openness in AI has been of keen interest to both industry and government in the last couple of years. Many advanced AI systems are proprietary, raising questions about how AI labs can provide enough information for meaningful vulnerability testing without exposing intellectual property. Other AI systems and components are open source, and can be downloaded, fine-tuned, and integrated into other tools and services. This poses additional challenges to vulnerability detection and remediation, for the reasons described above, especially when it is unclear who is responsible for vulnerability remediation.
Complexity and scale
Large, complex systems with numerous integrations and dependencies can pose challenges to bug bounty programs, because these systems often contain more attack surfaces and potential vulnerabilities, they make it harder for researchers to set up test environments, and they increase the difficulty of reproducing and verifying reported issues. For example, cloud-based infrastructures or microservices architectures might present unique challenges compared to monolithic applications.
AI systems, which are highly complex, often require specialised knowledge to audit effectively. For example, the use of natural human language as the input to many AI systems (such as modern LLMs) makes traditional security approaches and activities harder. For instance, in a bug bounty, it's hard to de-duplicate reports that use similar techniques but may have different phrasing of the prompts used to elicit the bad behaviour. Input sanitization is also difficult given that the range of accepted inputs could include all possible sentences in all possible human languages.
Vulnerabilities in AI systems can be subtle or context-dependent, in part due to their complexity, raising questions about what actually constitutes a vulnerability versus expected model behaviour. As a result, one major challenge to AI bug bounty programs is simply finding enough qualified researchers with expertise in both AI and cybersecurity.
Technical complexity
The technical complexity of potentially vulnerable systems can significantly impact bug bounty effectiveness. While technical complexity exists in both traditional software and AI systems, AI systems introduce additional challenges that need to be accounted for when designing bug bounty programs. In traditional software, there is often a clear separation between data and logic, and AI systems fundamentally blur this line by using data to train neural networks, effectively combining data and logic into a single source of potential vulnerabilities. This is further complicated by the use of in-band signalling for controlling how trained AI should behave. This integration makes identifying the root cause of AI vulnerabilities significantly more challenging than in traditional software.
Moreover, AI systems can exhibit unique vulnerabilities that don't exist in traditional software, like adversarial and data poisoning attacks. Such vulnerabilities are particularly challenging to detect through traditional bug bounty methods.
While ongoing discussions about employing "secure by design" principles in AI system development are an important start, it is crucial to consider the impact of this unique technical complexity both when structuring AI bug bounty programs and when recognising their limitations.
API and third party integrations
Systems with numerous APIs or third-party integrations introduce additional challenges to bug bounty programs. These factors can increase the attack surface due to multiple entry points and introduce potential authorization and authentication vulnerabilities.
When AI systems are not open sourced, access to them is often provided via an API. Further, as both open source and proprietary AI systems increasingly enter the mainstream technology market, they are likely to be liberally integrated into third party systems. Both of these factors complicate the ability of security researchers to identify vulnerabilities in the underlying AI systems, and differentiate the source of these vulnerabilities.
This challenge is compounded by emerging supply chain security risks within the AI ecosystem. For instance, models hosted in repositories like Hugging Face, while offering convenience and collaboration benefits, can become vectors for attack. A poisoned model, uploaded by a malicious actor, could easily be integrated into downstream applications by unsuspecting users. Detecting such compromised models is particularly difficult due to the complexities of debugging AI systems and the lack of robust tools for identifying backdoors in models. This highlights the need for increased scrutiny and security measures throughout the AI supply chain, from model development and training to deployment and integration.
Update and patch management
Developers can employ a range of practices in building and deploying software, each of which has implications for identifying and remediating security vulnerabilities down the line. Continuous Integration/Continuous Deployment (CI/CD), for example, is a set of practices that automate the process of software delivery and infrastructure changes, and enables rapid response, automated testing, and smaller, incremental changes. Systems with long update cycles, on the other hand, often require extensive testing before patching, and/or release updates less frequently due to complexity, regulatory requirements, or the critical nature of their operations.
However, while patching a codebase is a well established process, no such process exists for patching AI systems. Additionally AI systems require long update cycles and may need to be retrained or fine tuned to patch vulnerabilities. As a result, AI systems are at greater risk to security vulnerabilities as identified issues may take longer to address and remediate. Further, because AI technologies evolve so quickly, they might outpace traditional bug bounty structures and vulnerability remediation practices.
Deployment and infrastructure
A final consideration for vulnerability identification is the way in which software is deployed and the underlying infrastructure it is deployed upon. Cloud vs. on-premises deployments have different security considerations, due to fundamental differences in infrastructure, control, and management. Further, network architecture and segmentation play a role in overall security posture. AI systems can be deployed in various ways (e.g. cloud APIs, edge devices, etc.), and an effective bug bounty program would ideally cover all relevant deployment scenarios, as each poses a distinct set of security considerations and exploitation pathways.
Opportunities for AI Bug Bounties
Despite the technical challenges posed to responsible disclosure for AI, there are a range of potential application areas that are well suited to bug bounty programs for AI systems. The list below is not an exhaustive overview of the risks that should be part of an AI company’s bug bounty. Instead, it aims to articulate attacks which are novel to AI systems that could be added to existing categories of vulnerabilities. These application areas are further detailed in Google’s Abuse Vulnerability Reward Program and focus on identifying vulnerabilities in specific contexts within AI systems.
Adversarial Perturbation: Security researchers could be incentivized to find inputs that cause deterministic but highly unexpected outputs from AI models, for example by discovering contexts in which an adversary can reliably trigger a misclassification in a security control that can be abused for malicious use or adversarial gain. For instance, researchers might discover that adding a specific pattern of pixels to an image causes an AI-powered security camera to misclassify a person as a harmless object like a tree, potentially allowing unauthorised access to restricted areas.
Training Data Extraction/Membership Inference: Security researchers could be incentivised to identify methods to extract training data or personal information from models, by crafting attacks that are able to successfully reconstruct training examples that contain sensitive information.
Prompt Injection and Manipulation: Security researchers could be incentivised to discover vulnerabilities in how AI systems handle user inputs by crafting adversarial prompts that allow an adversary to influence the behaviour of the model, and hence the output, in ways that were not intended by the application. For example, rewards could be given to security researchers who design prompt injections that are invisible to victims and change the state of the victim’s account or any of their assets; prompt injections into any tools in which the response is used to make decisions that directly affect users; or prompt or preamble extraction in which a user is able to extract the initial prompt used to prime the model.
Model Manipulation: Security researchers could be incentivised to uncover methods by which an attacker is able to covertly change the behaviour of a model such that they can trigger pre-defined adversarial behaviours. For example, rewards could be given to security researchers who identify adversarial outputs or behaviour that an attacker can reliably trigger via specific input in a model, or the identification of attacks in which manipulation of the training data of the model is used to influence the model’s output in a victim’s session according to the attacker’s preference.
Model Theft and Exfiltration: Security researchers could be incentivised to uncover methods to steal or replicate the intellectual property included in AI models, for example by discovering techniques to extract the exact model architecture or weights of a proprietary model.
Best practices and ideas to consider
The rapid advancement of AI presents a unique set of security challenges that necessitate innovative solutions. While traditional regulatory approaches will undoubtedly play a role, the success of bug bounty programs in addressing cyber vulnerabilities offers a compelling model for addressing a range of novel AI security vulnerabilities and incentivizing a community approach to secure AI. This is important, not least because AI continues to evolve, but also because it can act as a general purpose technology rather than a sector-specific one. There are critical steps to take to define effective processes and benchmarks for identifying vulnerabilities first before these can be institutionalised in regulation.
Despite challenges posed by adapting bug bounty programs to address AI security vulnerabilities, the effectiveness of traditional bug bounty programs and the positive economic impact they can have should encourage policymakers, industry actors, and the security community to consider investing in their development. Effective bug bounty programs consider the technical factors outlined above when defining scope, setting up testing environments, and determining reward structures. They also require close collaboration between security and development teams to address discovered vulnerabilities in the context of the specific technical landscape.
Current best practices
The cybersecurity community has already made significant progress in instituting a series of best practices, including:
AI developers already prioritise establishing robust internal review processes and actively engaging with external researchers to identify and address vulnerabilities.
AI developers also invest resources in developing better tooling for debugging and inspecting the internal states of models as they run to better root cause issues in model execution.
Industry should continue to integrate bug bounties within well-established vulnerability management programs and software development lifecycles, ensuring that any identified vulnerabilities can be rapidly triaged and mitigated.
Industry currently provides narrowly scoped bug bounties, focused on well defined areas like specific model types or use cases like those described in the section above.
Some governments follow industry to also provide narrowly scoped bug bounties for government technologies, and expand gradually. This approach should be adopted by more governments, as it could ensure that the basic partnerships and practices for responsible disclosure are in place before bug bounty programs are expanded, and will help avoid a whack-a-mole approach.
Some governments and companies provide dynamic reward structures for cyber bug bounties, a practice which will ensure that bounty rewards can adapt to the evolving AI landscape.
Ideas to consider
As described above, the AI security community is already instituting a range of important best practices, but could consider the following ideas to increase the security of AI systems:
AI developers could build AI systems that are “secure by design.” Many existing software vulnerabilities would not exist today if secure by design had been implemented during development. Because AI is a novel technology, industry has an opportunity to learn from the cybersecurity community, and build systems that are less likely to contain security vulnerabilities. While many AI labs are pursuing these strategies already through efforts like SAIF, the community should work together to develop threat models and other secure by design principles, and work together to define new best practices and tools through efforts like CoSAI.
Governments and Industry could collaborate to establish a standardised framework for classifying AI vulnerabilities to ensure consistency across different bug bounty programs, and to ensure a common language for security researchers working in the space, building on efforts like the MITRE ATLAS Matrix.
Governments and industry should incentivize the development of defensive AI research, to balance the existing focus on adversarial research. This could help defenders identify opportunities to develop new defensive techniques.
Governments could consider the role of insurance in incentivizing companies to participate in AI bug bounty programs, taking lessons from the cyber insurance industry.
Governments could encourage the development of AI-specific security tools and testing frameworks to support bug bounty efforts.
Governments could build collaborative programs that bring together AI research institutions and cybersecurity researchers to ensure adequate expertise is dedicated to identifying and remediating security vulnerabilities.