SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation

arXiv: 2310.12508

Published 4/5/2024 by Chongyu Fan, Jiancheng Liu, Yihua Zhang, Eric Wong, Dennis Wei, Sijia Liu

Abstract

With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models. However, existing MU methods focusing on data and/or weight perspectives often suffer limitations in unlearning accuracy, stability, and cross-domain applicability. To address these challenges, we introduce the concept of 'weight saliency' for MU, drawing parallels with input saliency in model explanation. This innovation directs MU's attention toward specific model weights rather than the entire model, improving effectiveness and efficiency. The resultant method that we call saliency unlearning (SalUn) narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting data points). To the best of our knowledge, SalUn is the first principled MU approach that can effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation tasks. For example, SalUn yields a stability advantage in high-variance random data forgetting, e.g., with a 0.2% gap compared to exact unlearning on the CIFAR-10 dataset. Moreover, in preventing conditional diffusion models from generating harmful images, SalUn achieves nearly 100% unlearning accuracy, outperforming current state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not. Codes are available at https://github.com/OPTML-Group/Unlearn-Saliency. (WARNING: This paper contains model outputs that may be offensive in nature.)

Overview

With evolving data regulations, machine unlearning (MU) has become an important tool for fostering trust and safety in today's AI models.
Existing MU methods focusing on data and/or weight perspectives often suffer limitations in unlearning accuracy, stability, and cross-domain applicability.
This paper introduces the concept of 'weight saliency' for MU, drawing parallels with input saliency in model explanation.
The resultant method, called saliency unlearning (SalUn), narrows the performance gap with 'exact' unlearning (model retraining from scratch after removing the forgetting data points).
SalUn can effectively erase the influence of forgetting data, classes, or concepts in both image classification and generation tasks.

Plain English Explanation

As AI models become more widespread, there is a growing need to ensure they are trustworthy and safe. One way to address this is through machine unlearning (MU), which allows AI models to 'forget' certain information or data points.

However, existing MU methods have limitations: they may not be very accurate or stable, and they may not transfer across different types of AI tasks. To address these issues, the researchers in this paper introduced a new approach called saliency unlearning (SalUn).

SalUn is based on the idea of 'weight saliency': instead of updating the entire AI model, it focuses on the specific parts (weights) of the model that matter most for the information that needs to be forgotten. This makes the unlearning process more effective and efficient.

The researchers found that SalUn performs better than existing methods, especially when it comes to forgetting random data or preventing harmful image generation. For example, when forgetting random data points on the CIFAR-10 dataset, SalUn came within a 0.2% gap of completely retraining the model from scratch after removing the forgetting data.

Overall, SalUn seems to be a promising approach for making AI models more trustworthy and secure, by allowing them to selectively 'forget' certain information when needed.

Technical Explanation

The key innovation in this paper is the concept of 'weight saliency' for machine unlearning (MU). This draws a parallel with the idea of 'input saliency' in model explanation, where certain input features are identified as more important for a model's output.

Similarly, the researchers hypothesized that focusing MU on the most salient weights of a model, rather than the entire model, could improve the effectiveness and efficiency of the unlearning process. This led to the development of their saliency unlearning (SalUn) method.

SalUn works by first identifying the most important weights in the model for the information that needs to be forgotten. It then selectively updates these weights to remove the influence of the forgetting data, classes, or concepts. This is in contrast to previous MU methods that would typically retrain the entire model or modify the entire set of weights.
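The two steps described above can be sketched on a toy linear softmax classifier. This is a minimal illustration under stated assumptions, not the authors' implementation: the saliency score is taken to be the magnitude of the forgetting-loss gradient, the cutoff is a quantile (the default of 0.5 is an arbitrary choice), and the update step uses gradient descent on randomly reassigned labels, which is an illustrative forgetting objective not taken from the summary above. All function names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def ce_grad(W, X, y, num_classes):
    """Gradient of the mean cross-entropy loss w.r.t. W for logits = X @ W."""
    probs = softmax(X @ W)
    onehot = np.eye(num_classes)[y]
    return X.T @ (probs - onehot) / len(X)

def saliency_mask(W, X_forget, y_forget, num_classes, quantile=0.5):
    """Step 1 (assumed form): mark weights whose forgetting-loss gradient
    magnitude is at or above the chosen quantile."""
    g = np.abs(ce_grad(W, X_forget, y_forget, num_classes))
    return (g >= np.quantile(g, quantile)).astype(W.dtype)

def masked_unlearn(W, mask, X_forget, y_relabel, num_classes, lr=0.1, steps=50):
    """Step 2 (assumed form): update only the salient weights; the rest stay fixed."""
    W_u = W.copy()
    for _ in range(steps):
        W_u -= lr * mask * ce_grad(W_u, X_forget, y_relabel, num_classes)
    return W_u

# Synthetic stand-ins for a trained model and a forgetting set.
rng = np.random.default_rng(0)
num_classes, dim = 3, 8
W_orig = rng.normal(size=(dim, num_classes))
X_f = rng.normal(size=(20, dim))
y_f = rng.integers(0, num_classes, size=20)
y_rand = rng.integers(0, num_classes, size=20)  # randomly reassigned labels (illustrative)

mask = saliency_mask(W_orig, X_f, y_f, num_classes)
W_unlearned = masked_unlearn(W_orig, mask, X_f, y_rand, num_classes)
```

The point of the mask is visible in the result: every weight with `mask == 0` is identical in `W_orig` and `W_unlearned`, so the update is confined to the weights flagged as salient for the forgetting data.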

The researchers evaluated SalUn on both image classification and generation tasks. For random data forgetting in image classification on CIFAR-10, a high-variance setting, they found that SalUn achieved a stability advantage, with only a 0.2% gap compared to 'exact' unlearning (full model retraining).
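One way to read a "gap compared to exact unlearning" is as the absolute difference between a metric measured on the unlearned model and the same metric measured on a model retrained from scratch. The sketch below assumes that reading and averages the gap over several metrics; the metric names and numbers are made-up placeholders, not results from the paper, and `gap_to_retrain` is a hypothetical helper.

```python
def gap_to_retrain(unlearned, retrained):
    """Return per-metric absolute gaps and their mean, in percentage points."""
    gaps = {name: abs(unlearned[name] - retrained[name]) for name in retrained}
    return gaps, sum(gaps.values()) / len(gaps)

# Placeholder values for illustration only (not reported results).
retrained = {"forget_acc": 94.0, "test_acc": 93.5}
unlearned = {"forget_acc": 94.5, "test_acc": 93.0}

gaps, avg_gap = gap_to_retrain(unlearned, retrained)
```

A smaller average gap means the approximate method behaves more like the retrained reference model, which is the sense in which a method "narrows the performance gap" with exact unlearning.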

In the more challenging domain of conditional diffusion models for image generation, SalUn was able to prevent the model from generating harmful images with nearly 100% unlearning accuracy, outperforming state-of-the-art baselines like Erased Stable Diffusion and Forget-Me-Not.

Critical Analysis

The researchers acknowledge that SalUn, like other MU methods, has some limitations. For example, the weight saliency calculation assumes the model is differentiable, which may not always be the case. Additionally, the method may not be as effective for unlearning highly entangled or distributed representations in the model.

Another potential issue is that the paper does not provide a detailed analysis of the computational efficiency of SalUn compared to other MU approaches. The runtime and memory requirements of the saliency calculation and weight updating steps could be an important practical consideration.

It would also be valuable to see how SalUn performs on a wider range of tasks and datasets, beyond the image classification and generation experiments presented. Evaluating the method's cross-domain applicability and robustness would strengthen the claims of its effectiveness.

That said, the introduction of 'weight saliency' for MU is a novel and promising concept that deserves further exploration. The strong results, especially in the challenging domain of conditional diffusion models, suggest SalUn is a valuable addition to the MU toolkit.

Conclusion

This paper presents a new machine unlearning approach called saliency unlearning (SalUn) that focuses on selectively modifying the most important weights in a model, rather than the entire model. This innovation helps improve the unlearning accuracy, stability, and cross-domain applicability of MU compared to previous methods.

The researchers demonstrate SalUn's effectiveness on both image classification and generation tasks, showing it can outperform state-of-the-art baselines, especially in preventing conditional diffusion models from generating harmful outputs.

While SalUn has some limitations, the core idea of 'weight saliency' for MU is a significant contribution that could inspire further research in this area. As AI models become more widespread, tools like SalUn will be crucial for building trust and ensuring the safety of these systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multi-Class Unlearning for Image Classification via Weight Filtering

Samuele Poppi, Sara Sarto, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

Machine Unlearning is an emerging paradigm for selectively removing the impact of training datapoints from a network. Unlike existing methods that target a limited subset or a single class, our framework unlearns all classes in a single round. We achieve this by modulating the network's components using memory matrices, enabling the network to demonstrate selective unlearning behavior for any class after training. By discovering weights that are specific to each class, our approach also recovers a representation of the classes which is explainable by design. We test the proposed framework on small- and medium-scale image classification datasets, with both convolution- and Transformer-based backbones, showcasing the potential for explainable solutions through unlearning.

6/11/2024

cs.CV cs.AI cs.LG

Single Image Unlearning: Efficient Machine Unlearning in Multimodal Large Language Models

Jiaqi Li, Qianshan Wei, Chuanyi Zhang, Guilin Qi, Miaozeng Du, Yongrui Chen, Sheng Bi

Machine unlearning empowers individuals with the 'right to be forgotten' by removing their private or sensitive information encoded in machine learning models. However, it remains uncertain whether MU can be effectively applied to Multimodal Large Language Models (MLLMs), particularly in scenarios of forgetting the leaked visual data of concepts. To overcome the challenge, we propose an efficient method, Single Image Unlearning (SIU), to unlearn the visual recognition of a concept by fine-tuning a single associated image for few steps. SIU consists of two key aspects: (i) Constructing Multifaceted fine-tuning data. We introduce four targets, based on which we construct fine-tuning data for the concepts to be forgotten; (ii) Jointly training loss. To synchronously forget the visual recognition of concepts and preserve the utility of MLLMs, we fine-tune MLLMs through a novel Dual Masked KL-divergence Loss combined with Cross Entropy loss. Alongside our method, we establish MMUBench, a new benchmark for MU in MLLMs and introduce a collection of metrics for its evaluation. Experimental results on MMUBench show that SIU completely surpasses the performance of existing methods. Furthermore, we surprisingly find that SIU can avoid invasive membership inference attacks and jailbreak attacks. To the best of our knowledge, we are the first to explore MU in MLLMs. We will release the code and benchmark in the near future.

5/30/2024

cs.CV cs.AI

Rethinking Machine Unlearning for Large Language Models

Sijia Liu, Yuanshun Yao, Jinghan Jia, Stephen Casper, Nathalie Baracaldo, Peter Hase, Xiaojun Xu, Yuguang Yao, Hang Li, Kush R. Varshney, Mohit Bansal, Sanmi Koyejo, Yang Liu

We explore machine unlearning (MU) in the domain of large language models (LLMs), referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence (e.g., sensitive or illegal information) and the associated model capabilities, while maintaining the integrity of essential knowledge generation and not affecting causally unrelated information. We envision LLM unlearning becoming a pivotal element in the life-cycle management of LLMs, potentially standing as an essential foundation for developing generative AI that is not only safe, secure, and trustworthy, but also resource-efficient without the need of full retraining. We navigate the unlearning landscape in LLMs from conceptual formulation, methodologies, metrics, and applications. In particular, we highlight the often-overlooked aspects of existing LLM unlearning research, e.g., unlearning scope, data-model interaction, and multifaceted efficacy assessment. We also draw connections between LLM unlearning and related areas such as model editing, influence functions, model explanation, adversarial training, and reinforcement learning. Furthermore, we outline an effective assessment framework for LLM unlearning and explore its applications in copyright and privacy safeguards and sociotechnical harm reduction.

4/8/2024

cs.LG cs.CL

Label Smoothing Improves Machine Unlearning

Zonglin Di, Zhaowei Zhu, Jinghan Jia, Jiancheng Liu, Zafar Takhirov, Bo Jiang, Yuanshun Yao, Sijia Liu, Yang Liu

The objective of machine unlearning (MU) is to eliminate previously learned data from a model. However, it is challenging to strike a balance between computation cost and performance when using existing MU techniques. Taking inspiration from the influence of label smoothing on model confidence and differential privacy, we propose a simple gradient-based MU approach that uses an inverse process of label smoothing. This work introduces UGradSL, a simple, plug-and-play MU approach that uses smoothed labels. We provide theoretical analyses demonstrating why properly introducing label smoothing improves MU performance. We conducted extensive experiments on six datasets of various sizes and different modalities, demonstrating the effectiveness and robustness of our proposed method. The consistent improvement in MU performance is only at a marginal cost of additional computations. For instance, UGradSL improves over the gradient ascent MU baseline by 66% unlearning accuracy without sacrificing unlearning efficiency.

6/13/2024

cs.LG