AI Risk Management Should Incorporate Both Safety and Security






Published 5/31/2024 by Xiangyu Qi, Yangsibo Huang, Yi Zeng, Edoardo Debenedetti, Jonas Geiping, Luxi He, Kaixuan Huang, Udari Madhushani, Vikash Sehwag, Weijia Shi and 15 others


The exposure of security vulnerabilities in safety-aligned language models, e.g., susceptibility to adversarial attacks, has shed light on the intricate interplay between AI safety and AI security. Although the two disciplines now come together under the overarching goal of AI risk management, they have historically evolved separately, giving rise to differing perspectives. Therefore, in this paper, we advocate that stakeholders in AI risk management should be aware of the nuances, synergies, and interplay between safety and security, and unambiguously take into account the perspectives of both disciplines in order to devise the most effective and holistic risk mitigation approaches. Unfortunately, this vision is often obfuscated, as the definitions of the basic concepts of safety and security themselves are often inconsistent and lack consensus across communities. With AI risk management being increasingly cross-disciplinary, this issue is particularly salient. In light of this conceptual challenge, we introduce a unified reference framework to clarify the differences and interplay between AI safety and AI security, aiming to facilitate a shared understanding and effective collaboration across communities.



Overview

  • The paper discusses the importance of incorporating both safety and security considerations in AI risk management.
  • It presents a reference framework for comparing safety and security in AI systems, highlighting the key differences and similarities between the two.
  • The paper emphasizes the need for a holistic approach to AI risk management that addresses both safety and security concerns.

Plain English Explanation

The paper argues that when it comes to managing the risks of advanced AI systems, it's not enough to focus just on safety or just on security. Instead, AI risk management should incorporate both safety and security considerations.

Safety in this context refers to the system behaving as intended and avoiding unintended, potentially harmful consequences. For example, an autonomous vehicle should safely navigate roads without causing accidents. Security, on the other hand, is about protecting the system from external threats, like hackers trying to hijack or manipulate the system.

The paper provides a framework for comparing safety and security in AI systems. It highlights how they differ in terms of the types of failures they aim to prevent, the sources of those failures, and the approaches used to address them. Safety concerns often focus on the internal workings of the AI system, while security is more concerned with external threats.

However, the paper also notes that safety and security are interconnected. For instance, a security breach could undermine the safety of an AI system by allowing an attacker to take control and cause unintended harm. Conversely, safety issues like software bugs could create vulnerabilities that hackers could exploit.

The key point is that AI risk management needs to consider both safety and security in a holistic, comprehensive way. This helps ensure that AI systems are not only designed to behave safely, but also protected against malicious interference or exploitation.

Technical Explanation

The paper presents a reference framework for comparing safety and security in AI systems. It establishes four key dimensions for this comparison: failure modes, failure sources, failure prevention approaches, and failure tolerance.

In terms of failure modes, the paper distinguishes between safety failures (unintended, harmful consequences) and security failures (successful attacks or exploits). Safety failures often stem from issues within the AI system itself, such as design flaws or training data biases. Security failures, on the other hand, are typically caused by external actors trying to manipulate or compromise the system.
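The distinction between failure modes and failure sources can be made concrete with a small sketch. The Python below is illustrative only: the names `FailureSource`, `FailureMode`, `Incident`, and `classify` are hypothetical and do not come from the paper, and the rule "external attacker implies security failure" is a simplification of the summary above, not the paper's formal definition.

```python
from dataclasses import dataclass
from enum import Enum


class FailureSource(Enum):
    # Hypothetical labels chosen for this sketch only.
    DESIGN_FLAW = "design flaw"
    TRAINING_DATA_BIAS = "training data bias"
    EXTERNAL_ATTACKER = "external attacker"


class FailureMode(Enum):
    SAFETY = "safety failure"      # unintended, harmful consequence
    SECURITY = "security failure"  # successful attack or exploit


@dataclass(frozen=True)
class Incident:
    description: str
    source: FailureSource


def classify(incident: Incident) -> FailureMode:
    """Map an incident to a failure mode using only its source.

    Assumption: anything caused by an external actor counts as a security
    failure; everything else is treated as a safety failure.
    """
    if incident.source is FailureSource.EXTERNAL_ATTACKER:
        return FailureMode.SECURITY
    return FailureMode.SAFETY


if __name__ == "__main__":
    jailbreak = Incident("adversarial prompt bypasses safeguards",
                         FailureSource.EXTERNAL_ATTACKER)
    biased = Incident("model output reflects skewed training data",
                      FailureSource.TRAINING_DATA_BIAS)
    print(classify(jailbreak).value)
    print(classify(biased).value)
```

A real incident may involve both modes at once, for example an attacker exploiting a design flaw, which is the kind of interplay the paper argues risk management should account for.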

The approaches used to prevent these failures also differ. Safety focuses on techniques like robustness testing and formal verification to ensure the system behaves as intended. Security, meanwhile, relies more on methods like access control, encryption, and intrusion detection to protect the system from external threats.
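As one illustration of what robustness testing can mean in practice, here is a minimal sketch in Python. It assumes a toy one-dimensional threshold classifier (`toy_model`) and a hypothetical helper `is_robust`; neither appears in the paper. The check is sampling-based, so passing it is evidence of robustness near an input, not a formal guarantee.

```python
import random


def toy_model(x: float) -> int:
    """Stand-in for a real model: classifies by a fixed threshold."""
    return 1 if x >= 0.5 else 0


def is_robust(model, x: float, epsilon: float,
              trials: int = 100, seed: int = 0) -> bool:
    """Return True if the prediction at x is unchanged for every sampled
    perturbation within +/- epsilon of x."""
    rng = random.Random(seed)  # fixed seed keeps the check reproducible
    baseline = model(x)
    return all(
        model(x + rng.uniform(-epsilon, epsilon)) == baseline
        for _ in range(trials)
    )


if __name__ == "__main__":
    # Far from the decision boundary: small perturbations cannot flip the label.
    print(is_robust(toy_model, 0.9, epsilon=0.1))
    # On the decision boundary: perturbations flip the label.
    print(is_robust(toy_model, 0.5, epsilon=0.1))
```

Formal verification would instead aim to prove the property for all perturbations in the range, which random sampling cannot do.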

The paper also highlights the distinction between failure tolerance for safety versus security. Safety-critical systems often have redundancies and fail-safes to ensure continued operation in the face of minor failures. Security-critical systems, in contrast, may be designed to completely shut down or enter a safe mode when a security breach is detected, even at the cost of temporary service disruption.
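The contrast in failure tolerance described above can be sketched as a small decision rule. This is a simplified illustration under stated assumptions: the names `Event`, `Response`, and `respond` are hypothetical, and the policy (keep operating on a redundant component after a minor fault, stop on a detected breach) follows this summary's wording and not any specific procedure from the paper.

```python
from enum import Enum


class Event(Enum):
    MINOR_COMPONENT_FAULT = "minor component fault"
    SECURITY_BREACH = "security breach"


class Response(Enum):
    SWITCH_TO_BACKUP = "switch to a redundant component and keep operating"
    ENTER_SAFE_MODE = "shut down or enter safe mode"


def respond(event: Event, backups_available: int) -> Response:
    """Pick a response to an event (hypothetical policy for illustration)."""
    if event is Event.SECURITY_BREACH:
        # Security posture: accept temporary service disruption.
        return Response.ENTER_SAFE_MODE
    if backups_available > 0:
        # Safety posture: redundancy allows continued operation.
        return Response.SWITCH_TO_BACKUP
    # No redundancy left: fall back to the conservative option.
    return Response.ENTER_SAFE_MODE


if __name__ == "__main__":
    print(respond(Event.MINOR_COMPONENT_FAULT, backups_available=2).value)
    print(respond(Event.SECURITY_BREACH, backups_available=2).value)
```

Note that a detected breach overrides available redundancy in this sketch, since a backup component may be compromised in the same way as the primary one.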

Overall, the framework presented in the paper underscores the need for AI risk management to consider both safety and security concerns in a comprehensive manner, as they are deeply intertwined.

Critical Analysis

The paper makes a compelling case for incorporating both safety and security considerations in AI risk management. It provides a well-structured framework for comparing the two, which can help researchers and practitioners better understand how the domains differ and where they interconnect.

One potential limitation of the paper is that it does not delve deeply into the specific techniques or methodologies for addressing the safety and security challenges it identifies. While it highlights the high-level differences in the approaches used, the paper could have provided more concrete examples or case studies to illustrate how these concepts play out in practice.

Additionally, the paper could have discussed the potential trade-offs or tensions that may arise when trying to optimize for both safety and security simultaneously. In some cases, prioritizing one may come at the expense of the other, and the paper could have explored strategies for navigating these difficult decisions.

Furthermore, the paper could have acknowledged the inherent challenges and uncertainties involved in managing the risks of advanced AI systems. As the technology continues to evolve rapidly, new safety and security challenges are likely to emerge, requiring ongoing research and adaptation.

Overall, the paper makes a valuable contribution to the field of AI risk management by emphasizing the need for a holistic approach that addresses both safety and security concerns. As the development of AI systems becomes increasingly complex, this framework can serve as a useful tool for researchers and practitioners to navigate the multifaceted challenges in this domain.


Conclusion

The paper argues that effective AI risk management must incorporate both safety and security considerations. It presents a reference framework for comparing the two, highlighting the key differences and similarities in terms of failure modes, failure sources, failure prevention approaches, and failure tolerance.

The paper emphasizes the deep interconnectedness between safety and security in AI systems, and the importance of adopting a holistic approach that addresses both concerns. By understanding the nuances of these two domains, researchers and practitioners can develop more robust and resilient AI systems that are not only designed to behave safely, but also protected against malicious interference or exploitation.

As the field of AI continues to evolve rapidly, the insights and framework provided in this paper can help guide the development of advanced AI systems that prioritize both safety and security, ultimately reducing the risks and potential harms associated with these powerful technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Affirmative safety: An approach to risk management for high-risk AI

Akash R. Wasil, Joshua Clymer, David Krueger, Emily Dardaman, Simeon Campos, Evan R. Murphy





Prominent AI experts have suggested that companies developing high-risk AI systems should be required to show that such systems are safe before they can be developed or deployed. The goal of this paper is to expand on this idea and explore its implications for risk management. We argue that entities developing or deploying high-risk AI systems should be required to present evidence of affirmative safety: a proactive case that their activities keep risks below acceptable thresholds. We begin the paper by highlighting global security risks from AI that have been acknowledged by AI experts and world governments. Next, we briefly describe principles of risk management from other high-risk fields (e.g., nuclear safety). Then, we propose a risk management approach for advanced AI in which model developers must provide evidence that their activities keep certain risks below regulator-set thresholds. As a first step toward understanding what affirmative safety cases should include, we illustrate how certain kinds of technical evidence and operational evidence can support an affirmative safety case. In the technical section, we discuss behavioral evidence (evidence about model outputs), cognitive evidence (evidence about model internals), and developmental evidence (evidence about the training process). In the operational section, we offer examples of organizational practices that could contribute to affirmative safety cases: information security practices, safety culture, and emergency response capacity. Finally, we briefly compare our approach to the NIST AI Risk Management Framework. Overall, we hope our work contributes to ongoing discussions about national and global security risks posed by AI and regulatory approaches to address these risks.




Managing extreme AI risks amid rapid progress

Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Soren Mindermann





Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation.




SoK: On the Semantic AI Security in Autonomous Driving

Junjie Shen, Ningfei Wang, Ziwen Wan, Yunpeng Luo, Takami Sato, Zhisheng Hu, Xinyang Zhang, Shengjian Guo, Zhenyu Zhong, Kang Li, Ziming Zhao, Chunming Qiao, Qi Alfred Chen





Autonomous Driving (AD) systems rely on AI components to make safe and correct driving decisions. Unfortunately, today's AI algorithms are known to be generally vulnerable to adversarial attacks. However, for such AI component-level vulnerabilities to be semantically impactful at the system level, they need to address non-trivial semantic gaps both (1) from the system-level attack input spaces to those at AI component level, and (2) from AI component-level attack impacts to those at the system level. In this paper, we define such research space as semantic AI security as opposed to generic AI security. Over the past 5 years, an increasing number of research works have been performed to tackle such semantic AI security challenges in the AD context, and this body of work has started to show an exponential growth trend. In this paper, we perform the first systematization of knowledge of such growing semantic AD AI security research space. In total, we collect and analyze 53 such papers, and systematically taxonomize them based on research aspects critical for the security field. We summarize the 6 most substantial scientific gaps observed based on quantitative comparisons both vertically among existing AD AI security works and horizontally with security works from closely-related domains. With these, we are able to provide insights and potential future directions not only at the design level, but also at the research goal, methodology, and community levels. To address the most critical scientific methodology-level gap, we take the initiative to develop an open-source, uniform, and extensible system-driven evaluation platform, named PASS, for the semantic AD AI security research community. We also use our implemented platform prototype to showcase the capabilities and benefits of such a platform using representative semantic AD AI attacks.




Holistic Safety and Responsibility Evaluations of Advanced AI Models

Laura Weidinger, Joslyn Barnhart, Jenny Brennan, Christina Butterfield, Susie Young, Will Hawkins, Lisa Anne Hendricks, Ramona Comanescu, Oscar Chang, Mikel Rodriguez, Jennifer Beroshi, Dawn Bloxwich, Lev Proleev, Jilin Chen, Sebastian Farquhar, Lewis Ho, Iason Gabriel, Allan Dafoe, William Isaac





Safety and responsibility evaluations of advanced AI models are a critical but developing field of research and practice. In the development of Google DeepMind's advanced AI models, we innovated on and applied a broad set of approaches to safety evaluation. In this report, we summarise and share elements of our evolving approach as well as lessons learned for a broad audience. Key lessons learned include: First, theoretical underpinnings and frameworks are invaluable to organise the breadth of risk domains, modalities, forms, metrics, and goals. Second, theory and practice of safety evaluation development each benefit from collaboration to clarify goals, methods and challenges, and facilitate the transfer of insights between different stakeholders and disciplines. Third, similar key methods, lessons, and institutions apply across the range of concerns in responsibility and safety - including established and emerging harms. For this reason it is important that a wide range of actors working on safety evaluation and safety research communities work together to develop, refine and implement novel evaluation approaches and best practices, rather than operating in silos. The report concludes with outlining the clear need to rapidly advance the science of evaluations, to integrate new evaluations into the development and governance of AI, to establish scientifically-grounded norms and standards, and to promote a robust evaluation ecosystem.

