May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels

arXiv:2408.14284, published 8/27/2024 by Monica Millunzi, Lorenzo Bonicelli, Angelo Porrello, Jacopo Credi, Petter N. Kolm, Simone Calderara


Overview

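This paper proposes Alternate Experience Replay (AER), an approach for continual learning under noisy labels (CLN), where a model learns from a stream of data whose labels are not always correct.
The key idea is to exploit forgetting: complex or mislabeled samples, which hardly fit the previously learned data distribution, are the most likely to be forgotten, so forgetting can be used to separate clean, complex, and noisy samples in the replay buffer. AER is paired with Asymmetric Balanced Sampling (ABS), a strategy for selecting which samples to keep in the buffer.
The authors report an average gain of 4.71 percentage points in accuracy over existing loss-based purification strategies, along with a purer buffer.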

Plain English Explanation

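The paper looks at a common problem in machine learning where the training data has "noisy labels": the labels (the intended correct answers) are not always accurate. This can happen for many reasons, such as human annotators working under time pressure or data being collected automatically from the web. The paper studies this problem in continual learning, where the model learns from data that arrives over time and can keep only a small memory of past examples.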

The main idea of the paper is to use a technique called "alternate replay" to help the model learn despite the noisy labels. The basic approach is:

Leverage Forgetting: The model is allowed to "forget" some of what it has learned, rather than stubbornly trying to memorize everything. Mislabeled examples tend to be forgotten first, so what the model forgets helps reveal which examples are noisy.
Gradient Updates + Replay: The model is trained using a combination of regular gradient updates on the current data and replaying (revisiting) some of its past experience.

The authors test this approach on several standard benchmark datasets and report that it outperforms existing methods that filter noisy examples by their loss. The key is balancing learning from new data against forgetting or replaying past information, so that the memory of past examples stays as clean as possible.

Technical Explanation

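The paper introduces a new method called Alternate Experience Replay (AER) for continual learning with noisy labels. The key insight is that forgetting can be put to use: complex or mislabeled examples, which hardly fit the previously learned data distribution, are the ones most likely to be forgotten, so tracking what the model forgets helps distinguish clean, complex, and noisy samples in the memory buffer.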
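The following sketch illustrates how a forgetting signal could be collected. It rests on an assumption the summary does not spell out: that "alternate" means interleaving epochs that replay the buffer with epochs that skip it, so that buffer samples the model cannot retain on its own show a high loss. The functions `run_task_with_forgetting` and `split_by_forgetting`, the `forget_every` schedule, and the `low`/`high` thresholds are all hypothetical, not the authors' implementation.

```python
from typing import Callable, Dict, List, Tuple


def run_task_with_forgetting(
    n_epochs: int,
    train_epoch: Callable[[bool], None],
    buffer_losses: Callable[[], Dict[int, float]],
    forget_every: int = 2,
) -> Dict[int, float]:
    """Alternate replay and forgetting epochs (hypothetical schedule).

    Every `forget_every`-th epoch trains on the current task only; after each
    such epoch the per-sample buffer loss is recorded, then averaged.
    """
    totals: Dict[int, float] = {}
    n_forget = 0
    for epoch in range(n_epochs):
        replay = (epoch + 1) % forget_every != 0
        train_epoch(replay)  # caller trains with (True) or without (False) the buffer
        if not replay:
            # Samples "forgotten" while the buffer was withheld show a high loss.
            for idx, loss in buffer_losses().items():
                totals[idx] = totals.get(idx, 0.0) + loss
            n_forget += 1
    if n_forget == 0:
        return {}
    return {idx: total / n_forget for idx, total in totals.items()}


def split_by_forgetting(
    avg_loss: Dict[int, float], low: float, high: float
) -> Tuple[List[int], List[int], List[int]]:
    """Bucket buffer samples into (clean, complex, noisy) by hypothetical thresholds."""
    clean = sorted(i for i, v in avg_loss.items() if v < low)
    complex_ = sorted(i for i, v in avg_loss.items() if low <= v < high)
    noisy = sorted(i for i, v in avg_loss.items() if v >= high)
    return clean, complex_, noisy


# Usage with dummy callbacks standing in for a real model and buffer.
flags: List[bool] = []
fixed_losses = {0: 0.1, 1: 1.0, 2: 3.0}
avg = run_task_with_forgetting(4, flags.append, lambda: fixed_losses)
print(flags)  # [True, False, True, False]
print(split_by_forgetting(avg, low=0.5, high=2.0))  # ([0], [1], [2])
```

Averaging over several forgetting epochs is a design choice meant to make the signal less sensitive to a single noisy loss reading; fixed thresholds are only the simplest possible bucketing.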

The approach has two main components:

Alternate Experience Replay: The model is trained on the current (potentially noisy) data while replaying samples from a restricted memory buffer of past data, and forgetting is exploited to keep clean, complex, and noisy buffer samples apart.
Asymmetric Balanced Sampling (ABS): A sample selection strategy that prioritizes purity on the current task while retaining relevant samples from the past.
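To make the idea behind Asymmetric Balanced Sampling concrete (prioritize purity on the current task, retain relevant samples from the past), here is a hypothetical selection rule. The asymmetry is an assumption chosen for illustration: current-task candidates are ranked by loss and only the lowest-loss fraction is eligible, while past-task samples are kept without a loss filter, because a high loss on old data may reflect ordinary forgetting rather than a wrong label. `Candidate`, `asymmetric_balanced_select`, and `purity_frac` are invented names; this is not the paper's ABS procedure.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    idx: int       # sample identifier
    label: int     # stored (possibly noisy) label
    current: bool  # True if the sample comes from the current task
    loss: float    # e.g. loss measured after a forgetting epoch


def asymmetric_balanced_select(
    cands: List[Candidate], capacity: int, purity_frac: float = 0.5
) -> List[Candidate]:
    """Hypothetical ABS-style rule: class-balanced, strict only on the current task."""
    by_class: Dict[int, List[Candidate]] = defaultdict(list)
    for c in cands:
        by_class[c.label].append(c)
    per_class = capacity // max(len(by_class), 1)
    selected: List[Candidate] = []
    for label in sorted(by_class):
        group = by_class[label]
        # Current task: rank by loss, keep only the lowest-loss fraction (purity first).
        cur = sorted((c for c in group if c.current), key=lambda c: c.loss)
        keep_cur = cur[: max(1, int(len(cur) * purity_frac))] if cur else []
        # Past tasks: no loss filter, since a high loss may just mean forgetting.
        past = [c for c in group if not c.current]
        selected.extend((past + keep_cur)[:per_class])
    return selected


cands = [
    Candidate(0, 0, True, 0.1),
    Candidate(1, 0, True, 2.5),   # current task, high loss: filtered out
    Candidate(2, 1, False, 3.0),  # past task, high loss: retained
    Candidate(3, 1, True, 0.2),
]
print([c.idx for c in asymmetric_balanced_select(cands, capacity=4)])  # [0, 2, 3]
```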

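As in most continual learning methods, replaying a restricted buffer of past data counteracts the forgetting of useful information, while gradient updates on the current data let the model adapt to the new task even if that data contains noise. AER's twist is to treat forgetting as a signal rather than only a problem, because the buffer samples the model fails to retain are the likeliest to be mislabeled.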
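For readers unfamiliar with replay, this is a generic sketch of the kind of restricted buffer that replay-based continual learning builds on: a fixed-capacity buffer filled by reservoir sampling, plus a helper that concatenates the current batch with a few replayed samples. It shows plain experience replay under that assumption, not AER's buffer policy; `ReservoirBuffer` and `replay_batch` are illustrative names.

```python
import random
from typing import Generic, List, Sequence, TypeVar

T = TypeVar("T")


class ReservoirBuffer(Generic[T]):
    """Fixed-size replay buffer filled by reservoir sampling (Algorithm R)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[T] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: T) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen,
            # overwriting a uniformly chosen slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k: int) -> List[T]:
        return self.rng.sample(self.items, min(k, len(self.items)))


def replay_batch(current: Sequence[T], buffer: ReservoirBuffer[T], k: int) -> List[T]:
    """Concatenate the current batch with k replayed buffer samples."""
    return list(current) + buffer.sample(k)


buf: ReservoirBuffer[int] = ReservoirBuffer(capacity=10)
for x in range(100):
    buf.add(x)
batch = replay_batch([1000, 1001], buf, k=4)
print(len(buf.items), len(batch))  # 10 6
```

Reservoir sampling keeps every item seen so far in the buffer with equal probability, which is why plain replay tends to reproduce whatever label noise the stream contains; that weakness is what buffer purification targets.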

The authors evaluate AER on several benchmark datasets with synthetic and real-world label noise, and report better accuracy and buffer purity than existing loss-based purification strategies, with an average gain of 4.71 percentage points in accuracy.
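Two quantities in this evaluation are easy to state in code: synthetic label noise and buffer purity. The sketch below assumes one common convention for symmetric noise, where each label is flipped with probability `rate` to a different class chosen uniformly, and defines purity as the fraction of buffer samples whose stored label matches the true one. Both are illustrative definitions; the paper's exact noise protocols (including its real-world noise) may differ.

```python
import random
from typing import List, Sequence


def symmetric_noise(labels: Sequence[int], n_classes: int, rate: float, seed: int = 0) -> List[int]:
    """Flip each label with probability `rate` to a different, uniformly chosen class."""
    rng = random.Random(seed)
    noisy: List[int] = []
    for y in labels:
        if rng.random() < rate:
            y = rng.choice([c for c in range(n_classes) if c != y])
        noisy.append(y)
    return noisy


def buffer_purity(stored: Sequence[int], true: Sequence[int]) -> float:
    """Fraction of buffer samples whose stored label equals the true label."""
    if not stored:
        return 0.0
    return sum(s == t for s, t in zip(stored, true)) / len(stored)


true = [0, 1, 2, 3] * 25
noisy = symmetric_noise(true, n_classes=4, rate=0.4)
# The expected purity is 1 - rate = 0.6; the exact figure depends on the seed.
print(buffer_purity(noisy, true))
```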

Critical Analysis

The paper provides a novel and intriguing approach to dealing with noisy labels in continual learning. By turning the model's tendency to forget into a signal for spotting noisy samples, the authors show that combining replay with careful buffer selection improves both accuracy and buffer purity.

One potential limitation is that the paper focuses on standard benchmark datasets, and it's not clear how well the AER approach would scale to larger, more complex real-world datasets. Additionally, the method relies on some hyperparameters (e.g., the size of the replay buffer) that may require careful tuning for optimal performance.

It would also be interesting to see how AER compares to other techniques for dealing with noisy labels, such as robust loss functions or data-cleaning approaches beyond the loss-based purification strategies it is evaluated against. Combining AER with these other methods could potentially lead to even stronger performance.

Conclusion

This paper presents Alternate Experience Replay (AER), a novel approach for continual learning with noisy labels. The key idea is to exploit forgetting: since mislabeled samples are the most likely to be forgotten, they can be separated from clean and complex ones in the replay buffer, and Asymmetric Balanced Sampling then selects which samples to keep. The authors demonstrate the effectiveness of their method on several benchmark datasets, and the paper opens up interesting avenues for further research on dealing with label noise in machine learning.

This summary was produced with help from an AI and may contain inaccuracies; read the original paper (arXiv:2408.14284) for authoritative details.


Related Papers

May the Forgetting Be with You: Alternate Replay for Learning with Noisy Labels

Monica Millunzi, Lorenzo Bonicelli, Angelo Porrello, Jacopo Credi, Petter N. Kolm, Simone Calderara

Forgetting presents a significant challenge during incremental training, making it particularly demanding for contemporary AI systems to assimilate new knowledge in streaming data environments. To address this issue, most approaches in Continual Learning (CL) rely on the replay of a restricted buffer of past data. However, the presence of noise in real-world scenarios, where human annotation is constrained by time limitations or where data is automatically gathered from the web, frequently renders these strategies vulnerable. In this study, we address the problem of CL under Noisy Labels (CLN) by introducing Alternate Experience Replay (AER), which takes advantage of forgetting to maintain a clear distinction between clean, complex, and noisy samples in the memory buffer. The idea is that complex or mislabeled examples, which hardly fit the previously learned data distribution, are most likely to be forgotten. To grasp the benefits of such a separation, we equip AER with Asymmetric Balanced Sampling (ABS): a new sample selection strategy that prioritizes purity on the current task while retaining relevant samples from the past. Through extensive computational comparisons, we demonstrate the effectiveness of our approach in terms of both accuracy and purity of the obtained buffer, resulting in a remarkable average gain of 4.71 percentage points in accuracy with respect to existing loss-based purification strategies. Code is available at https://github.com/aimagelab/mammoth.

8/27/2024

Adaptive Memory Replay for Continual Learning

James Seale Smith, Lazar Valkov, Shaunak Halbe, Vyshnavi Gutta, Rogerio Feris, Zsolt Kira, Leonid Karlinsky

Foundation Models (FMs) have become the hallmark of modern AI; however, these models are trained on massive data, leading to financially expensive training. Updating FMs as new data becomes available is important, but it can lead to "catastrophic forgetting", where models underperform on tasks related to data sub-populations observed too long ago. This continual learning (CL) phenomenon has been extensively studied, but primarily in a setting where only a small amount of past data can be stored. We advocate for the paradigm where memory is abundant, allowing us to keep all previous data, but computational resources are limited. In this setting, traditional replay-based CL approaches are outperformed by a simple baseline which replays past data selected uniformly at random, indicating that this setting necessitates a new approach. We address this by introducing a framework of adaptive memory replay for continual learning, where sampling of past data is phrased as a multi-armed bandit problem. We utilize Boltzmann sampling to derive a method which dynamically selects past data for training conditioned on the current task, assuming full data access and emphasizing training efficiency. Through extensive evaluations on both vision and language pre-training tasks, we demonstrate the effectiveness of our approach, which maintains high performance while reducing forgetting by up to 10% at no training efficiency cost.

4/22/2024

CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay

Jianshu Zhang, Yankai Fu, Ziheng Peng, Dongyu Yao, Kun He

This paper introduces a novel perspective to significantly mitigate catastrophic forgetting in continual learning (CL), which emphasizes models' capacity to preserve existing knowledge and assimilate new information. Current replay-based methods treat every task and data sample equally and thus cannot fully exploit the potential of the replay buffer. In response, we propose COgnitive REplay (CORE), which draws inspiration from human cognitive review processes. CORE includes two key strategies: Adaptive Quantity Allocation and Quality-Focused Data Selection. The former adaptively modulates the replay buffer allocation for each task based on its forgetting rate, while the latter guarantees the inclusion of representative data that best encapsulates the characteristics of each task within the buffer. Our approach achieves an average accuracy of 37.95% on split-CIFAR10, surpassing the best baseline method by 6.52%. Additionally, it significantly enhances the accuracy of the poorest-performing task by 6.30% compared to the top baseline. Code is available at https://github.com/sterzhang/CORE.

4/10/2024

Prior-free Balanced Replay: Uncertainty-guided Reservoir Sampling for Long-Tailed Continual Learning

Lei Liu, Li Liu, Yawen Cui

Even in the era of large models, one of the well-known issues in continual learning (CL) is catastrophic forgetting, which is significantly challenging when the continual data stream exhibits a long-tailed distribution, termed Long-Tailed Continual Learning (LTCL). Existing LTCL solutions generally require the label distribution of the data stream to achieve re-balanced training. However, obtaining such prior information is often infeasible in real scenarios since the model should learn without pre-identifying the majority and minority classes. To this end, we propose a novel Prior-free Balanced Replay (PBR) framework to learn from long-tailed data streams with less forgetting. Concretely, motivated by our experimental finding that the minority classes are more likely to be forgotten due to their higher uncertainty, we design an uncertainty-guided reservoir sampling strategy to prioritize rehearsing minority data without using any prior information, which is based on the mutual dependence between the model and samples. Additionally, we incorporate two prior-free components to further reduce forgetting: (1) a boundary constraint preserves uncertain boundary-supporting samples for continually re-estimating task boundaries; (2) a prototype constraint maintains the consistency of learned class prototypes throughout training. Our approach is evaluated on three standard long-tailed benchmarks, demonstrating superior performance to existing CL methods and the previous SOTA LTCL approach in both task- and class-incremental learning settings, as well as ordered- and shuffled-LTCL settings.

8/28/2024