Risk thresholds for frontier AI

Read original: arXiv:2406.14713 - Published 6/24/2024 by Leonie Koessler, Jonas Schuett, Markus Anderljung

Overview

Frontier artificial intelligence (AI) systems could pose increasing risks to public safety and security.
Two approaches to managing these risks are discussed:
1. Defining capability thresholds - describing AI capabilities beyond which a system is deemed too risky
2. Defining risk thresholds - stating how much risk would be too much (e.g. likelihood of economic damage exceeding a certain level)
Risk thresholds are more principled but harder to evaluate reliably.
The recommended approach is to use risk thresholds to help set capability thresholds, while relying primarily on capability thresholds for decision-making.
Regulators should explore defining risk thresholds, since they are the most legitimate actors to do so.

Plain English Explanation

As frontier AI systems become more advanced, they could start to pose serious risks to public safety and security. The paper discusses two approaches to managing these risks.

One approach is to define capability thresholds - these describe the specific capabilities that an AI system would need to have before it's considered too risky to deploy. For example, you might say that an AI system is too risky if it's capable of causing more than $X million in economic damage.

Another approach is to define risk thresholds - these directly state how much risk would be too much, rather than focusing on specific capabilities. For instance, you might say that the likelihood of cybercriminals using an AI system to cause X amount of economic damage must not increase by more than Y percentage points.

The advantage of risk thresholds is that they are grounded in principles and values rather than in technical capabilities alone. However, they are also harder to evaluate accurately.

As a result, the recommended approach is for companies to:

1. Define risk thresholds to provide a principled foundation
2. Use those risk thresholds to help set appropriate capability thresholds
3. Primarily rely on the capability thresholds when making decisions about deploying AI systems

Regulators should also get involved in defining risk thresholds, since they're in the best position to do so in a way that's legitimate and represents the public interest. And if AI risk assessment methods become more reliable over time, risk thresholds could play an even more direct role in decision-making.

Technical Explanation

The paper discusses two main approaches to managing the increasing risks posed by frontier artificial intelligence (AI) systems:

1. Capability Thresholds: Defining specific AI capabilities beyond which a system is deemed to pose too much risk. For example, stating that an AI system must not be capable of causing more than $X million in economic damage.
2. Risk Thresholds: Directly defining the maximum acceptable level of risk, such as stating that the likelihood of cybercriminals using an AI system to cause X amount of economic damage must not increase by more than Y percentage points.

The key advantage of risk thresholds is that they provide a more principled foundation based on values and priorities, rather than just technical capabilities. However, reliably evaluating and quantifying AI risks is often challenging, making risk thresholds harder to implement in practice.
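To make the form of a risk threshold concrete, here is a minimal Python sketch. It assumes the threshold is expressed as a maximum increase in likelihood measured in percentage points, as in the paper's abstract. The function names and all numbers are hypothetical and chosen only for illustration; the paper does not prescribe an implementation, and in practice the hard part is producing reliable probability estimates, not the comparison itself.

```python
def risk_increase_pp(baseline_prob: float, prob_with_system: float) -> float:
    """Increase in likelihood, in percentage points (not relative percent)."""
    for p in (baseline_prob, prob_with_system):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return (prob_with_system - baseline_prob) * 100.0


def relative_increase_percent(baseline_prob: float, prob_with_system: float) -> float:
    """Relative increase in percent, shown only to contrast with percentage points."""
    if baseline_prob <= 0.0:
        raise ValueError("relative increase is undefined for a zero baseline")
    return (prob_with_system - baseline_prob) / baseline_prob * 100.0


def exceeds_risk_threshold(baseline_prob: float, prob_with_system: float,
                           max_increase_pp: float) -> bool:
    """True if the estimated increase is larger than the allowed Y percentage points."""
    return risk_increase_pp(baseline_prob, prob_with_system) > max_increase_pp


# Hypothetical estimates, for illustration only.
baseline = 0.02     # assumed likelihood of the harm without the system
with_system = 0.03  # assumed likelihood if the system is deployed

print(round(risk_increase_pp(baseline, with_system), 6))           # percentage points
print(round(relative_increase_percent(baseline, with_system), 6))  # relative percent
print(exceeds_risk_threshold(baseline, with_system, max_increase_pp=0.5))
```

The two helper functions give very different numbers for the same pair of estimates, which is why the unit of the threshold ("percentage points" rather than "percent") matters when stating it.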

As a result, the authors recommend that companies take a hybrid approach:

1. Define risk thresholds to provide a principled starting point
2. Use those risk thresholds to help set appropriate capability thresholds
3. Primarily rely on the capability thresholds when making decisions about deploying AI systems
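The three steps above can be sketched as a small Python example. This is an illustration under strong assumptions, not the authors' method: it assumes a single capability measured by a hypothetical evaluation score, a hypothetical table mapping score levels to estimated risk increases (in percentage points), and that estimated risk does not decrease as the score rises. All names and numbers are invented for the example.

```python
from typing import Dict, Optional


def derive_capability_threshold(risk_by_score: Dict[float, float],
                                max_increase_pp: float) -> Optional[float]:
    """Step 2: pick the highest score level whose estimated risk increase
    stays within the risk threshold (step 1). Returns None if no level qualifies.

    Assumes estimated risk is non-decreasing in the capability score.
    """
    threshold = None
    for score, risk_pp in sorted(risk_by_score.items()):
        if risk_pp > max_increase_pp:
            break  # every higher level is assumed to be at least as risky
        threshold = score
    return threshold


def may_deploy(eval_score: float, capability_threshold: Optional[float]) -> bool:
    """Step 3: the deployment decision relies on the capability threshold only."""
    if capability_threshold is None:
        return False  # cautious default when no acceptable level exists
    return eval_score <= capability_threshold


# Step 1: a hypothetical risk threshold of 0.5 percentage points.
MAX_INCREASE_PP = 0.5

# Hypothetical risk estimates per capability score level (percentage points).
risk_estimates = {0.2: 0.1, 0.4: 0.3, 0.6: 0.8, 0.8: 2.0}

cap_threshold = derive_capability_threshold(risk_estimates, MAX_INCREASE_PP)
print(cap_threshold)
print(may_deploy(0.35, cap_threshold))
print(may_deploy(0.50, cap_threshold))
```

In this sketch the uncertain risk estimates are used once, when the capability threshold is set, while day-to-day decisions depend only on the evaluation score, which is easier to measure.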

The paper also suggests that regulators should explore defining risk thresholds, as they are the most legitimate actors to do so in a way that represents the public interest. And if AI risk assessment methods become more reliable over time, risk thresholds could play a more direct role in decision-making.

Critical Analysis

The paper presents a thoughtful discussion of the tradeoffs between capability thresholds and risk thresholds for managing the risks of frontier AI systems. The authors rightly acknowledge the challenges in reliably quantifying AI risks, which is a significant limitation of the risk threshold approach.

One potential criticism is that the paper does not delve deeper into the specific methods and metrics that could be used to define and evaluate risk thresholds. Further research may be needed to develop more robust and standardized approaches in this area.

Additionally, the paper does not address the complexities of how risk thresholds could be implemented in practice, such as the potential for disagreement between different stakeholders (e.g. companies, regulators, the public) on what constitutes "acceptable" risk. Exploring these trade-offs and how to balance risks against benefits could be an important area for future work.

Overall, the paper provides a useful framework for thinking about how to approach the management of AI risks, but there is still significant room for further research and practical application of these concepts, particularly as the capabilities of frontier AI systems continue to evolve.

Conclusion

The paper discusses two main approaches to managing the risks of frontier AI systems: capability thresholds and risk thresholds. While risk thresholds are more principled, they are also more challenging to evaluate reliably.

The recommended approach is for companies to define risk thresholds to provide a foundation, use those to help set capability thresholds, and then primarily rely on the capability thresholds for decision-making. Regulators should also explore defining risk thresholds, as they are the most legitimate actors to do so.

As AI risk assessment methods become more advanced, risk thresholds could play an increasingly direct role in managing the deployment of frontier AI systems. However, further research is still needed to develop more robust and standardized approaches in this area.

Related Papers

Risk thresholds for frontier AI

Leonie Koessler, Jonas Schuett, Markus Anderljung

Frontier artificial intelligence (AI) systems could pose increasing risks to public safety and security. But what level of risk is acceptable? One increasingly popular approach is to define capability thresholds, which describe AI capabilities beyond which an AI system is deemed to pose too much risk. A more direct approach is to define risk thresholds that simply state how much risk would be too much. For instance, they might state that the likelihood of cybercriminals using an AI system to cause X amount of economic damage must not increase by more than Y percentage points. The main upside of risk thresholds is that they are more principled than capability thresholds, but the main downside is that they are more difficult to evaluate reliably. For this reason, we currently recommend that companies (1) define risk thresholds to provide a principled foundation for their decision-making, (2) use these risk thresholds to help set capability thresholds, and then (3) primarily rely on capability thresholds to make their decisions. Regulators should also explore the area because, ultimately, they are the most legitimate actors to define risk thresholds. If AI risk estimates become more reliable, risk thresholds should arguably play an increasingly direct role in decision-making.

6/24/2024

From Principles to Rules: A Regulatory Approach for Frontier AI

Jonas Schuett, Markus Anderljung, Alexis Carlier, Leonie Koessler, Ben Garfinkel

Several jurisdictions are starting to regulate frontier artificial intelligence (AI) systems, i.e. general-purpose AI systems that match or exceed the capabilities present in the most advanced systems. To reduce risks from these systems, regulators may require frontier AI developers to adopt safety measures. The requirements could be formulated as high-level principles (e.g. 'AI systems should be safe and secure') or specific rules (e.g. 'AI systems must be evaluated for dangerous model capabilities following the protocol set forth in...'). These regulatory approaches, known as 'principle-based' and 'rule-based' regulation, have complementary strengths and weaknesses. While specific rules provide more certainty and are easier to enforce, they can quickly become outdated and lead to box-ticking. Conversely, while high-level principles provide less certainty and are more costly to enforce, they are more adaptable and more appropriate in situations where the regulator is unsure exactly what behavior would best advance a given regulatory objective. However, rule-based and principle-based regulation are not binary options. Policymakers must choose a point on the spectrum between them, recognizing that the right level of specificity may vary between requirements and change over time. We recommend that policymakers should initially (1) mandate adherence to high-level principles for safe frontier AI development and deployment, (2) ensure that regulators closely oversee how developers comply with these principles, and (3) urgently build up regulatory capacity. Over time, the approach should likely become more rule-based. Our recommendations are based on a number of assumptions, including (A) risks from frontier AI systems are poorly understood and rapidly evolving, (B) many safety practices are still nascent, and (C) frontier AI developers are best placed to innovate on safety practices.

7/11/2024

Managing extreme AI risks amid rapid progress

Yoshua Bengio, Geoffrey Hinton, Andrew Yao, Dawn Song, Pieter Abbeel, Trevor Darrell, Yuval Noah Harari, Ya-Qin Zhang, Lan Xue, Shai Shalev-Shwartz, Gillian Hadfield, Jeff Clune, Tegan Maharaj, Frank Hutter, Atılım Güneş Baydin, Sheila McIlraith, Qiqi Gao, Ashwin Acharya, David Krueger, Anca Dragan, Philip Torr, Stuart Russell, Daniel Kahneman, Jan Brauner, Soren Mindermann

Artificial Intelligence (AI) is progressing rapidly, and companies are shifting their focus to developing generalist AI systems that can autonomously act and pursue goals. Increases in capabilities and autonomy may soon massively amplify AI's impact, with risks that include large-scale social harms, malicious uses, and an irreversible loss of human control over autonomous AI systems. Although researchers have warned of extreme risks from AI, there is a lack of consensus about how exactly such risks arise, and how to manage them. Society's response, despite promising first steps, is incommensurate with the possibility of rapid, transformative progress that is expected by many experts. AI safety research is lagging. Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems. In this short consensus paper, we describe extreme risks from upcoming, advanced AI systems. Drawing on lessons learned from other safety-critical technologies, we then outline a comprehensive plan combining technical research and development with proactive, adaptive governance mechanisms for a more commensurate preparation.

5/24/2024

On the Limitations of Compute Thresholds as a Governance Strategy

Sara Hooker

At face value, this essay is about understanding a fairly esoteric governance tool called compute thresholds. However, in order to grapple with whether these thresholds will achieve anything, we must first understand how they came to be. To do so, we need to engage with a decades-old debate at the heart of computer science progress, namely, is bigger always better? Does a certain inflection point of compute result in changes to the risk profile of a model? Hence, this essay may be of interest not only to policymakers and the wider public but also to computer scientists interested in understanding the role of compute in unlocking breakthroughs. This discussion is timely given the wide adoption of compute thresholds in both the White House Executive Orders on AI Safety (EO) and the EU AI Act to identify more risky systems. A key conclusion of this essay is that compute thresholds, as currently implemented, are shortsighted and likely to fail to mitigate risk. The relationship between compute and risk is highly uncertain and rapidly changing. Relying upon compute thresholds overestimates our ability to predict what abilities emerge at different scales. This essay ends with recommendations for a better way forward.

7/31/2024