On the Limitations of Compute Thresholds as a Governance Strategy

arXiv:2407.05694. Published 7/31/2024 by Sara Hooker.

Overview

The paper examines the limitations of using compute thresholds as a governance strategy for artificial intelligence (AI) systems.
It explores the complex and uncertain relationship between the amount of compute used to train a model and the associated risks or potential harms.
The paper challenges the simplistic notion that thresholds on training compute can reliably identify, and so help mitigate, the risks of large-scale AI systems.

Plain English Explanation

The research paper discusses the challenges of using compute thresholds as a strategy for governing and managing the risks of AI systems. The core idea is that there may not be a simple, direct relationship between the amount of computing power used to train an AI model and the potential harms or risks it poses.

"On the Limitations of Compute Thresholds as a Governance Strategy" argues that the relationship between compute and risk is more complex and uncertain than it may seem at first glance. An AI system trained with less compute is not necessarily less risky or harmful. Many other factors, like the dataset used, the model architecture, and the intended use case, can influence the potential risks.

The paper also highlights how limiting compute may have unintended consequences, such as incentivizing the development of more efficient but potentially riskier AI systems. It suggests that a more nuanced, multifaceted approach to AI governance is needed, one that considers a broader range of factors beyond just the amount of compute power used.

Technical Explanation

The paper "On the Limitations of Compute Thresholds as a Governance Strategy" challenges the idea that imposing compute thresholds can effectively mitigate the risks of large-scale AI systems. The author argues that the relationship between compute and risk is more complex and uncertain than often assumed.

The paper examines how factors like dataset quality, model architecture, and intended use case can influence the potential harms or risks of an AI system, independent of the amount of compute power used for training. It suggests that limiting compute may have unintended consequences, such as incentivizing the development of more efficient but potentially riskier AI models.
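To make the mechanism concrete, here is a minimal Python sketch of how a compute threshold acts as a filter. It rests on assumptions that do not come from this summary: the common 6 * N * D rule of thumb for the training compute of a dense transformer (N parameters, D training tokens), the commonly cited threshold values of 10^26 operations for the US Executive Order and 10^25 for the EU AI Act, and three hypothetical models. Treat it as an illustration, not as the paper's method.

```python
# Illustrative sketch only. The 6 * N * D estimate, the threshold values,
# and the example models are assumptions, not figures taken from the paper.

# Commonly cited thresholds on total training operations.
THRESHOLDS = {
    "US Executive Order": 1e26,
    "EU AI Act": 1e25,
}


def estimate_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute of a dense transformer as 6 * N * D."""
    return 6.0 * n_params * n_tokens


def thresholds_crossed(training_flops: float) -> list[str]:
    """Return the names of all regimes whose threshold is exceeded."""
    return [name for name, limit in THRESHOLDS.items() if training_flops > limit]


# Hypothetical models: (parameter count, training tokens).
models = {
    "small-efficient": (7e9, 2e12),
    "mid-size": (70e9, 30e12),
    "very-large": (1e12, 20e12),
}

for name, (n_params, n_tokens) in models.items():
    flops = estimate_training_flops(n_params, n_tokens)
    crossed = thresholds_crossed(flops)
    print(f"{name}: {flops:.2e} operations, crosses: {', '.join(crossed) or 'none'}")
```

Note that the check sees only total training operations. Dataset quality, architecture, and intended use, the factors discussed above, never enter it, which is the gap the paper points to.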

The author draws on examples from the research literature, such as the papers "Risk Thresholds for Frontier AI" and "More Compute Is What You Need", to illustrate the limitations of a compute-centric approach to AI governance. The essay also discusses the potential implications for public perceptions and societal-scale AI governance.

Overall, the paper argues for a more nuanced, multifaceted approach to AI governance that considers a broader range of factors beyond just the amount of compute power used, in order to sustainably scale AI while mitigating its risks.

Critical Analysis

The paper raises valid concerns about the limitations of using compute thresholds as a primary strategy for governing the risks of large-scale AI systems. The author makes a compelling case that the relationship between compute and risk is more complex and uncertain than often assumed.

One strength of the paper is its recognition of the many other factors, beyond just compute, that can influence the potential harms or benefits of an AI system. The author rightly points out that characteristics like dataset quality, model architecture, and intended use case can be as important as the amount of compute used, if not more so.

However, the paper could have delved deeper into the specific mechanisms by which compute thresholds may have unintended consequences, such as incentivizing the development of more efficient but riskier models. It could also have explored in more detail the alternative approaches to AI governance that the author suggests are needed beyond compute thresholds.

Overall, the paper makes a valuable contribution by challenging the simplistic notion that limiting compute can effectively mitigate AI risks. It highlights the need for a more nuanced, multifaceted approach to AI governance that considers a broader range of factors. Further research and discussion in this area could help develop more effective and sustainable strategies for governing the development and deployment of large-scale AI systems.

Conclusion

The research paper "On the Limitations of Compute Thresholds as a Governance Strategy" argues that using compute thresholds as the primary approach to governing the risks of AI systems is overly simplistic and flawed. The author contends that the relationship between the amount of compute used and the potential harms or benefits of an AI system is much more complex and uncertain than often assumed.

The paper emphasizes that factors like dataset quality, model architecture, and intended use case can be just as, if not more, important than compute power in determining the risks associated with an AI system. It also suggests that limiting compute may have unintended consequences, such as incentivizing the development of more efficient but potentially riskier models.

Overall, the paper makes a compelling case for a more nuanced, multifaceted approach to AI governance that considers a broader range of factors beyond just compute thresholds. Developing effective strategies for governing the development and deployment of large-scale AI systems remains a critical challenge, and this research contributes valuable insights to this important ongoing discussion.

Related Papers

On the Limitations of Compute Thresholds as a Governance Strategy

Sara Hooker

At face value, this essay is about understanding a fairly esoteric governance tool called compute thresholds. However, in order to grapple with whether these thresholds will achieve anything, we must first understand how they came to be. To do so, we need to engage with a decades-old debate at the heart of computer science progress, namely, is bigger always better? Does a certain inflection point of compute result in changes to the risk profile of a model? Hence, this essay may be of interest not only to policymakers and the wider public but also to computer scientists interested in understanding the role of compute in unlocking breakthroughs. This discussion is timely given the wide adoption of compute thresholds in both the White House Executive Orders on AI Safety (EO) and the EU AI Act to identify more risky systems. A key conclusion of this essay is that compute thresholds, as currently implemented, are shortsighted and likely to fail to mitigate risk. The relationship between compute and risk is highly uncertain and rapidly changing. Relying upon compute thresholds overestimates our ability to predict what abilities emerge at different scales. This essay ends with recommendations for a better way forward.

7/31/2024

Training Compute Thresholds: Features and Functions in AI Governance

Lennart Heim, Leonie Koessler

Regulators in the US and EU are using thresholds based on training compute--the number of computational operations used in training--to identify general-purpose artificial intelligence (GPAI) models that may pose risks of large-scale societal harm. We argue that training compute currently is the most suitable metric to identify GPAI models that deserve regulatory oversight and further scrutiny. Training compute correlates with model capabilities and risks, is quantifiable, can be measured early in the AI lifecycle, and can be verified by external actors, among other advantageous features. These features make compute thresholds considerably more suitable than other proposed metrics to serve as an initial filter to trigger additional regulatory requirements and scrutiny. However, training compute is an imperfect proxy for risk. As such, compute thresholds should not be used in isolation to determine appropriate mitigation measures. Instead, they should be used to detect potentially risky GPAI models that warrant regulatory oversight, such as through notification requirements, and further scrutiny, such as via model evaluations and risk assessments, the results of which may inform which mitigation measures are appropriate. In fact, this appears largely consistent with how compute thresholds are used today. As GPAI technology and market structures evolve, regulators should update compute thresholds and complement them with other metrics into regulatory review processes.

8/7/2024

Risk thresholds for frontier AI

Leonie Koessler, Jonas Schuett, Markus Anderljung

Frontier artificial intelligence (AI) systems could pose increasing risks to public safety and security. But what level of risk is acceptable? One increasingly popular approach is to define capability thresholds, which describe AI capabilities beyond which an AI system is deemed to pose too much risk. A more direct approach is to define risk thresholds that simply state how much risk would be too much. For instance, they might state that the likelihood of cybercriminals using an AI system to cause X amount of economic damage must not increase by more than Y percentage points. The main upside of risk thresholds is that they are more principled than capability thresholds, but the main downside is that they are more difficult to evaluate reliably. For this reason, we currently recommend that companies (1) define risk thresholds to provide a principled foundation for their decision-making, (2) use these risk thresholds to help set capability thresholds, and then (3) primarily rely on capability thresholds to make their decisions. Regulators should also explore the area because, ultimately, they are the most legitimate actors to define risk thresholds. If AI risk estimates become more reliable, risk thresholds should arguably play an increasingly direct role in decision-making.

6/24/2024

More Compute Is What You Need

Zhen Guo

Large language model pre-training has become increasingly expensive, with most practitioners relying on scaling laws to allocate compute budgets for model size and training tokens, commonly referred to as Compute-Optimal or Chinchilla Optimal. In this paper, we hypothesize a new scaling law that suggests model performance depends mostly on the amount of compute spent for transformer-based models, independent of the specific allocation to model size and dataset size. Using this unified scaling law, we predict that (a) for inference efficiency, training should prioritize smaller model sizes and larger training datasets, and (b) assuming the exhaustion of available web datasets, scaling the model size might be the only way to further improve model performance.

5/3/2024