Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics

2406.06700

Published 6/12/2024 by Ankit Vani, Frederick Tung, Gabriel L. Oliveira, Hossein Sharifi-Noghabi

Abstract

Despite attaining high empirical generalization, the sharpness of models trained with sharpness-aware minimization (SAM) does not always correlate with generalization error. Instead of viewing SAM as minimizing sharpness to improve generalization, our paper considers a new perspective based on SAM's training dynamics. We propose that perturbations in SAM perform perturbed forgetting, where they discard undesirable model biases to exhibit learning signals that generalize better. We relate our notion of forgetting to the information bottleneck principle, use it to explain observations like the better generalization of smaller perturbation batches, and show that perturbed forgetting can exhibit a stronger correlation with generalization than flatness. While standard SAM targets model biases exposed by the steepest ascent directions, we propose a new perturbation that targets biases exposed through the model's outputs. Our output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-{10,100}, while sometimes converging to sharper regions. Our results suggest that the benefits of SAM can be explained by alternative mechanistic principles that do not require flatness of the loss surface.

Overview

This paper asks why Sharpness-Aware Minimization (SAM) generalizes well, given that the sharpness of SAM-trained models does not always correlate with generalization error.
The authors propose that SAM's perturbations perform "perturbed forgetting": they discard undesirable model biases so that the resulting learning signal generalizes better.
The paper also introduces a perturbation that targets biases exposed through the model's outputs, which the authors report outperforms standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-{10,100}.

Plain English Explanation

The paper looks at a machine learning technique called Sharpness-Aware Minimization (SAM). SAM trains a model so that its loss stays low even when its weights are nudged slightly, an idea originally motivated by the belief that flatter solutions generalize better.

The key idea in this paper is that the weight perturbation SAM applies at each training step makes the model temporarily forget some of its undesirable biases. The gradient computed at that perturbed point then carries a learning signal that generalizes better. The authors call this "perturbed forgetting".

This matters because it explains SAM's benefits without relying on flatness. The authors report that perturbed forgetting can correlate with generalization more strongly than flatness does, and that their proposed perturbation sometimes converges to sharper regions while still outperforming standard SAM.

Technical Explanation

The paper reinterprets Sharpness-Aware Minimization (SAM) through its training dynamics rather than through the sharpness it is usually said to minimize. SAM perturbs the model weights, not the input data, at each step and updates the model using the gradient computed at the perturbed weights. The authors propose that these perturbations perform "perturbed forgetting": they discard undesirable model biases so that the resulting gradient exhibits a learning signal that generalizes better.
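For reference, the standard SAM formulation can be written as follows. The notation (L for the training loss, w for the weights, rho for the perturbation radius, eta for the learning rate) is an assumption borrowed from the usual presentation of SAM; it does not appear in this summary and may differ from the paper's own notation.

```latex
\begin{align*}
&\min_{w} \; \max_{\|\epsilon\|_2 \le \rho} \; L(w + \epsilon) \\
&\hat{\epsilon}(w) = \rho \, \frac{\nabla_w L(w)}{\|\nabla_w L(w)\|_2} \\
&w \leftarrow w - \eta \, \nabla_w L(w) \big|_{w + \hat{\epsilon}(w)}
\end{align*}
```

The second line is the first-order approximation of the inner maximization: a step of length rho along the steepest ascent direction. This is the perturbation that the abstract describes as targeting "model biases exposed by the steepest ascent directions".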

The authors analyze SAM's training dynamics rather than only the flatness of the final solution. They relate their notion of forgetting to the information bottleneck principle and use it to explain observations such as the better generalization obtained with smaller perturbation batches. Whereas standard SAM targets the model biases exposed by the steepest ascent directions, the authors propose a new perturbation that targets biases exposed through the model's outputs.

The results suggest that the benefits of SAM can be explained by mechanisms that do not require a flat loss surface. Perturbed forgetting can correlate with generalization more strongly than flatness does, and the proposed output bias forgetting perturbations outperform standard SAM, GSAM, and ASAM on ImageNet, robustness benchmarks, and transfer to CIFAR-{10,100}, while sometimes converging to sharper regions.
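The SAM dynamics discussed here can be illustrated with a minimal sketch of the standard steepest-ascent SAM step on a toy quadratic loss. This is not the paper's output bias forgetting perturbation, and the names `loss`, `grad`, `sam_step`, `curv`, `lr`, and `rho` are hypothetical choices made for this illustration only.

```python
import math

def loss(w, curv):
    """Toy quadratic loss (an assumption for illustration): 0.5 * sum_i curv[i] * w[i]**2."""
    return 0.5 * sum(c * x * x for c, x in zip(curv, w))

def grad(w, curv):
    """Gradient of the toy loss with respect to the weights."""
    return [c * x for c, x in zip(curv, w)]

def sam_step(w, curv, lr=0.1, rho=0.05):
    """One standard SAM update: perturb the weights, then descend with the perturbed gradient."""
    g = grad(w, curv)
    norm = math.sqrt(sum(x * x for x in g)) + 1e-12  # guard against division by zero
    eps = [rho * x / norm for x in g]                # steepest-ascent perturbation of length rho
    w_perturbed = [x + e for x, e in zip(w, eps)]
    g_perturbed = grad(w_perturbed, curv)            # second gradient computation
    # The perturbed gradient is applied to the ORIGINAL weights, not the perturbed ones.
    w_next = [x - lr * gp for x, gp in zip(w, g_perturbed)]
    return w_next, eps

w = [1.0, -2.0]
curv = [1.0, 10.0]
for _ in range(50):
    w, eps = sam_step(w, curv)
```

The perturbation `eps` is recomputed and discarded at every step, so only the gradient taken at the perturbed weights influences training. That gradient is the "learning signal" the paper's forgetting perspective is concerned with.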

Critical Analysis

The paper offers an alternative account of why Sharpness-Aware Minimization (SAM) works: its perturbations discard undesirable model biases, rather than merely steering training toward flat minima. This is a useful contribution because the sharpness of SAM-trained models does not always correlate with generalization error, so flatness alone leaves SAM's success partly unexplained.

One potential limitation is that the evaluation described in the abstract centers on image classification (ImageNet, robustness benchmarks, and transfer to CIFAR-{10,100}), so it is unclear how the results would generalize to other tasks or model architectures. The abstract also names only two kinds of bias, those exposed by the steepest ascent directions and those exposed through the model's outputs, which leaves open whether other perturbations would forget other biases.

Further research could investigate the relationship between SAM and other types of model biases, as well as the potential trade-offs or drawbacks of the "perturbed forgetting" effect. It would also be valuable to explore the practical applications of these findings and how they might inform the use of SAM in different machine learning domains.

Conclusion

This paper reframes Sharpness-Aware Minimization (SAM) in terms of the forgetting of model biases. The key proposal is that SAM's perturbations discard undesirable biases through a process of "perturbed forgetting", which exposes learning signals that generalize better. This suggests that the benefits of SAM do not depend on reducing the sharpness of the loss surface, a view supported by the proposed output bias forgetting perturbations, which outperform standard SAM while sometimes converging to sharper regions.

These insights contribute to our understanding of how SAM works and its potential implications for machine learning. While further research is needed to fully explore the practical applications and limitations of these findings, this paper represents an important step forward in understanding the inner workings of this powerful optimization technique.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning

Jacob Mitchell Springer, Vaishnavh Nagarajan, Aditi Raghunathan

Sharpness-Aware Minimization (SAM) has emerged as a promising alternative optimizer to stochastic gradient descent (SGD). The originally-proposed motivation behind SAM was to bias neural networks towards flatter minima that are believed to generalize better. However, recent studies have shown conflicting evidence on the relationship between flatness and generalization, suggesting that flatness does not fully explain SAM's success. Sidestepping this debate, we identify an orthogonal effect of SAM that is beneficial out-of-distribution: we argue that SAM implicitly balances the quality of diverse features. SAM achieves this effect by adaptively suppressing well-learned features, which gives the remaining features an opportunity to be learned. We show that this mechanism is beneficial in datasets that contain redundant or spurious features where SGD falls for the simplicity bias and would not otherwise learn all available features. Our insights are supported by experiments on real data: we demonstrate that SAM improves the quality of features in datasets containing redundant or spurious features, including CelebA, Waterbirds, CIFAR-MNIST, and DomainBed.

6/3/2024

cs.LG

On the Duality Between Sharpness-Aware Minimization and Adversarial Training

Yihao Zhang, Hangzhou He, Jingyu Zhu, Huanran Chen, Yifei Wang, Zeming Wei

Adversarial Training (AT), which adversarially perturbs the input samples during training, has been acknowledged as one of the most effective defenses against adversarial attacks, yet suffers from inevitably decreased clean accuracy. Instead of perturbing the samples, Sharpness-Aware Minimization (SAM) perturbs the model weights during training to find a flatter loss landscape and improve generalization. However, as SAM is designed for better clean accuracy, its effectiveness in enhancing adversarial robustness remains unexplored. In this work, considering the duality between SAM and AT, we investigate the adversarial robustness derived from SAM. Intriguingly, we find that using SAM alone can improve adversarial robustness. To understand this unexpected property of SAM, we first provide empirical and theoretical insights into how SAM can implicitly learn more robust features, and conduct comprehensive experiments to show that SAM can improve adversarial robustness notably without sacrificing any clean accuracy, shedding light on the potential of SAM to be a substitute for AT when accuracy comes at a higher priority. Code is available at https://github.com/weizeming/SAM_AT.

6/6/2024

cs.LG cs.AI cs.CR

Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models

Yili Wang, Kaixiong Zhou, Ninghao Liu, Ying Wang, Xin Wang

Sharpness-aware minimization (SAM) has received increasing attention in computer vision since it can effectively eliminate the sharp local minima from the training trajectory and mitigate generalization degradation. However, SAM requires two sequential gradient computations during the optimization of each step: one to obtain the perturbation gradient and the other to obtain the updating gradient. Compared with the base optimizer (e.g., Adam), SAM doubles the time overhead due to the additional perturbation gradient. By dissecting the theory of SAM and observing the training gradient of the molecular graph transformer, we propose a new algorithm named GraphSAM, which reduces the training cost of SAM and improves the generalization performance of graph transformer models. There are two key factors that contribute to this result: (i) gradient approximation: we use the updating gradient of the previous step to approximate the perturbation gradient at the intermediate steps smoothly (increases efficiency); (ii) loss landscape approximation: we theoretically prove that the loss landscape of GraphSAM is limited to a small range centered on the expected loss of SAM (guarantees generalization performance). The extensive experiments on six datasets with different tasks demonstrate the superiority of GraphSAM, especially in optimizing the model update process. The code is available at https://github.com/YL-wang/GraphSAM/tree/graphsam

6/21/2024

cs.LG

Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization

Jiaxin Deng, Junbiao Pang, Baochang Zhang

Sharpness-Aware Minimization (SAM) has emerged as a promising approach for effectively reducing the generalization error. However, SAM incurs twice the computational cost of a base optimizer (e.g., SGD). We propose Asymptotic Unbiased Sampling with respect to iterations to accelerate SAM (AUSAM), which maintains the model's generalization capacity while significantly enhancing computational efficiency. Concretely, we probabilistically sample a subset of data points beneficial for SAM optimization based on a theoretically guaranteed criterion, i.e., the Gradient Norm of each Sample (GNS). We further approximate the GNS by the difference in loss values before and after perturbation in SAM. As a plug-and-play, architecture-agnostic method, our approach consistently accelerates SAM across a range of tasks and networks, i.e., classification, human pose estimation and network quantization. On CIFAR10/100 and Tiny-ImageNet, AUSAM achieves results comparable to SAM while providing a speedup of over 70%. Compared to recent dynamic data pruning methods, AUSAM is better suited for SAM and excels in maintaining performance. Additionally, AUSAM accelerates optimization in human pose estimation and model quantization without sacrificing performance, demonstrating its broad practicality.

6/13/2024

cs.CV cs.LG