Class incremental learning with probability dampening and cascaded gated classifier

arXiv: 2402.01262

Published 5/24/2024 by Jary Pomponi, Alessio Devoto, Simone Scardapane

Abstract

Humans are capable of acquiring new knowledge and transferring learned knowledge into different domains, incurring a small forgetting. The same ability, called Continual Learning, is challenging to achieve when operating with neural networks due to the forgetting affecting past learned tasks when learning new ones. This forgetting can be mitigated by replaying stored samples from past tasks, but a large memory size may be needed for long sequences of tasks; moreover, this could lead to overfitting on saved samples. In this paper, we propose a novel regularisation approach and a novel incremental classifier called, respectively, Margin Dampening and Cascaded Scaling Classifier. The first combines a soft constraint and a knowledge distillation approach to preserve past learned knowledge while allowing the model to learn new patterns effectively. The latter is a gated incremental classifier, helping the model modify past predictions without directly interfering with them. This is achieved by modifying the output of the model with auxiliary scaling functions. We empirically show that our approach performs well on multiple benchmarks against well-established baselines, and we also study each component of our proposal and how the combinations of such components affect the final results.

Overview

The paper proposes two components for class incremental learning: Margin Dampening, a regularization approach, and the Cascaded Scaling Classifier (CSC), a gated incremental classifier.
Class incremental learning is a form of continual learning in which new classes are added to a model over time and the model must keep recognizing the earlier ones.
CSC rescales the model's output with auxiliary scaling functions to mitigate catastrophic forgetting, the tendency of a model to lose previously learned information when it adapts to new data.

Plain English Explanation

The Cascaded Scaling Classifier (CSC) targets class incremental learning, in which a model learns to recognize new classes over time while still recognizing the earlier ones. Its key idea is probability scaling: the model adapts to new classes by rescaling its earlier predictions rather than overwriting them.

Imagine you're training a model to recognize different types of animals. At first, it might learn to recognize only dogs and cats. Later on, you want it to recognize horses and elephants as well. A naive approach would be to retrain the model on the new animal classes alone, but this often makes the model forget how to recognize dogs and cats, a phenomenon known as catastrophic forgetting.

The CSC approach aims to address this by using a technique called probability scaling. Essentially, when the model is learning the new horse and elephant classes, it doesn't completely overwrite the information it had about dogs and cats. Instead, it scales the probabilities associated with the dog and cat classes, allowing it to maintain some knowledge of those classes while also learning the new ones. This helps the model avoid catastrophically forgetting the original classes.
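As a rough numerical illustration of the scaling idea above, the following Python sketch dampens the dog and cat probabilities by a fixed factor and hands the freed probability mass to the new horse and elephant classes. The function name `scale_old_probabilities`, the fixed `scale` value, and the way the freed mass is redistributed are assumptions made for illustration only; the paper describes auxiliary scaling functions whose exact form is not given in this summary.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scale_old_probabilities(old_probs, new_scores, scale):
    """Dampen old-class probabilities and give the freed mass to new classes.

    `scale` is a hypothetical fixed factor in (0, 1); it stands in for
    whatever scaling function a real model would use.
    """
    if not 0.0 < scale < 1.0:
        raise ValueError("scale must lie strictly between 0 and 1")
    kept = [p * scale for p in old_probs]      # old classes are dampened, not erased
    freed = 1.0 - sum(kept)                    # probability mass made available
    new_probs = [freed * q for q in softmax(new_scores)]
    return kept + new_probs


# Dog and cat probabilities before any new class arrives.
old_probs = softmax([2.0, 1.0])

# Horse and elephant arrive; the old probabilities are scaled by 0.7.
combined = scale_old_probabilities(old_probs, new_scores=[0.5, 0.0], scale=0.7)
print("dog, cat, horse, elephant:", [round(p, 3) for p in combined])
```

In this toy version the ratio between the dog and cat probabilities is unchanged after scaling, which is one simple way to read "maintaining some knowledge" of the old classes while still leaving room for the new ones.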

Technical Explanation

The CSC architecture consists of a feature extractor and a cascaded set of classifiers. When new classes are introduced, a new classifier is added to the cascade, and auxiliary scaling functions (gates) scale down the existing classifiers' output probabilities to make room for it. This lets the model modify its past predictions without directly interfering with them, so it can learn new classes without overwriting earlier knowledge.
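The cascade described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `CascadedGatedClassifier`, the sigmoid gates, the per-head softmax, and the random weights are all hypothetical choices, and no training loop is shown. The point is only the structure: each new task adds a head plus one gate per earlier head, and the earlier head weights are left untouched.

```python
import math
import random


def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CascadedGatedClassifier:
    """Toy cascade: one head per task, older heads scaled by later gates."""

    def __init__(self, feature_dim, seed=0):
        self.feature_dim = feature_dim
        self.rng = random.Random(seed)
        self.heads = []  # heads[t]: one weight vector per class of task t
        self.gates = []  # gates[t]: gate vectors added for head t by later tasks

    def _random_vector(self):
        return [self.rng.uniform(-1.0, 1.0) for _ in range(self.feature_dim)]

    def add_task(self, num_classes):
        # Every existing head receives a new gate; its own weights are not modified.
        for gate_list in self.gates:
            gate_list.append(self._random_vector())
        self.heads.append([self._random_vector() for _ in range(num_classes)])
        self.gates.append([])

    def forward(self, features):
        outputs = []
        for head, gate_list in zip(self.heads, self.gates):
            probs = softmax([dot(w, features) for w in head])
            scale = 1.0
            for gate in gate_list:  # cascade: multiply the gates of all later tasks
                scale *= sigmoid(dot(gate, features))
            outputs.extend(scale * p for p in probs)
        return outputs


clf = CascadedGatedClassifier(feature_dim=3)
clf.add_task(num_classes=2)                      # first task: two classes
features = [0.5, -1.0, 2.0]                      # stand-in for extracted features
before = clf.forward(features)
old_head_snapshot = [list(w) for w in clf.heads[0]]

clf.add_task(num_classes=2)                      # second task adds a head and a gate
after = clf.forward(features)
print("before:", before)
print("after: ", after)
```

Because the gates only multiply the older head's outputs, the ranking among old classes is preserved while their overall scores shrink, and the memory cost grows with the number of tasks, which matches the linear-growth concern raised in the critical analysis.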

The authors evaluate the approach on multiple continual learning benchmarks and report that it performs well against well-established baselines, particularly in maintaining accuracy on previously learned classes. They also study each component of the proposal and how combinations of those components affect the final results.
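The abstract describes the second component, Margin Dampening, as combining a soft constraint with knowledge distillation. The sketch below shows one generic way such a pair of terms could look; it is not the paper's formulation. The distillation term is the standard temperature-scaled KL divergence between a frozen old model and the current model on past classes, while `margin_penalty`, its hinge form, and the `margin` value are assumptions introduced purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(old_logits, new_logits, temperature=2.0):
    """KL(teacher || student) over past classes (standard knowledge distillation)."""
    teacher = softmax(old_logits, temperature)
    student = softmax(new_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(teacher, student))


def margin_penalty(probs, target, num_old, margin=0.2):
    """Hypothetical soft constraint: zero once the target class beats the
    strongest past class by at least `margin`, positive otherwise."""
    strongest_old = max(probs[:num_old])
    return max(0.0, margin - (probs[target] - strongest_old))


# Logits of the frozen old model on the two past classes.
frozen_old_logits = [2.0, 0.5]
# Logits of the current model on two past classes followed by two new classes.
current_logits = [1.8, 0.6, 0.1, 2.5]
num_old = 2

kd_term = distillation_loss(frozen_old_logits, current_logits[:num_old])
probs = softmax(current_logits)
satisfied = margin_penalty(probs, target=3, num_old=num_old)  # margin already met
violated = margin_penalty(probs, target=2, num_old=num_old)   # margin not met
total_loss = kd_term + violated
print(kd_term, satisfied, violated, total_loss)
```

The distillation term discourages drift on past classes, and the hinge is "soft" in the sense that it stops contributing once the margin is satisfied instead of pushing old-class probabilities all the way to zero.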

Critical Analysis

The Cascaded Scaling Classifier (CSC) presents a novel and promising approach for addressing the challenge of catastrophic forgetting in class incremental learning. However, the authors acknowledge that the method has some limitations. For example, the number of classifiers in the cascade grows linearly with the number of classes, which could lead to increased computational and memory requirements as the number of classes becomes very large.

Additionally, the authors note that the probability scaling mechanism may not be optimal for all types of class distributions and that further research is needed to understand the effects of different class dynamics on the performance of CSC. It would also be interesting to see how CSC compares to other continual learning approaches, such as those that use weight interpolation or adaptive methods, in more diverse and challenging scenarios.

Overall, the Cascaded Scaling Classifier (CSC) represents an important contribution to the field of continual learning and provides a solid foundation for further research and development in this area.

Conclusion

The Cascaded Scaling Classifier (CSC) is a gated classifier for class incremental learning. By scaling the probabilities of earlier classes instead of overwriting them, it mitigates catastrophic forgetting and lets the model keep expanding its knowledge while still recognizing older classes.

The authors evaluate CSC on multiple continual learning benchmarks and report that it performs well against well-established baselines. While the approach has some limitations, such as the growing number of classifiers, the paper is a useful contribution to continual learning and a basis for further research in this area.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper to verify details.

Related Papers

Feature Expansion and enhanced Compression for Class Incremental Learning

Quentin Ferdinand (ENSTA Bretagne, Lab-STICC_MATRIX), Gilles Le Chenadec (ENSTA Bretagne, Lab-STICC_MATRIX), Benoit Clement (CROSSING, ENSTA Bretagne, Lab-STICC_MATRIX), Panagiotis Papadakis (Lab-STICC_RAMBO, IMT Atlantique - INFO), Quentin Oliveau

Class incremental learning consists in training discriminative models to classify an increasing number of classes over time. However, doing so using only the newly added class data leads to the known problem of catastrophic forgetting of the previous classes. Recently, dynamic deep learning architectures have been shown to exhibit a better stability-plasticity trade-off by dynamically adding new feature extractors to the model in order to learn new classes followed by a compression step to scale the model back to its original size, thus avoiding a growing number of parameters. In this context, we propose a new algorithm that enhances the compression of previous class knowledge by cutting and mixing patches of previous class samples with the new images during compression using our Rehearsal-CutMix method. We show that this new data augmentation reduces catastrophic forgetting by specifically targeting past class information and improving its compression. Extensive experiments performed on the CIFAR and ImageNet datasets under diverse incremental learning evaluation protocols demonstrate that our approach consistently outperforms the state of the art. The code will be made available upon publication of our work.

5/15/2024

cs.LG cs.AI cs.CV

Mitigating Interference in the Knowledge Continuum through Attention-Guided Incremental Learning

Prashant Bhat, Bharath Renjith, Elahe Arani, Bahram Zonooz

Continual learning (CL) remains a significant challenge for deep neural networks, as it is prone to forgetting previously acquired knowledge. Several approaches have been proposed in the literature, such as experience rehearsal, regularization, and parameter isolation, to address this problem. Although almost zero forgetting can be achieved in task-incremental learning, class-incremental learning remains highly challenging due to the problem of inter-task class separation. Limited access to previous task data makes it difficult to discriminate between classes of current and previous tasks. To address this issue, we propose 'Attention-Guided Incremental Learning' (AGILE), a novel rehearsal-based CL approach that incorporates compact task attention to effectively reduce interference between tasks. AGILE utilizes lightweight, learnable task projection vectors to transform the latent representations of a shared task attention module toward task distribution. Through extensive empirical evaluation, we show that AGILE significantly improves generalization performance by mitigating task interference and outperforming rehearsal-based approaches in several CL scenarios. Furthermore, AGILE can scale well to a large number of tasks with minimal overhead while remaining well-calibrated with reduced task-recency bias.

5/24/2024

cs.LG cs.AI cs.CV

Continual Learning with Weight Interpolation

Jędrzej Kozal, Jan Wasilewski, Bartosz Krawczyk, Michał Woźniak

Continual learning poses a fundamental challenge for modern machine learning systems, requiring models to adapt to new tasks while retaining knowledge from previous ones. Addressing this challenge necessitates the development of efficient algorithms capable of learning from data streams and accumulating knowledge over time. This paper proposes a novel approach to continual learning utilizing the weight consolidation method. Our method, a simple yet powerful technique, enhances robustness against catastrophic forgetting by interpolating between old and new model weights after each novel task, effectively merging two models to facilitate exploration of local minima emerging after arrival of new concepts. Moreover, we demonstrate that our approach can complement existing rehearsal-based replay approaches, improving their accuracy and further mitigating the forgetting phenomenon. Additionally, our method provides an intuitive mechanism for controlling the stability-plasticity trade-off. Experimental results showcase the significant performance enhancement to state-of-the-art experience replay algorithms the proposed weight consolidation approach offers. Our algorithm can be downloaded from https://github.com/jedrzejkozal/weight-interpolation-cl.

4/10/2024

cs.LG

Federated Continual Learning Goes Online: Leveraging Uncertainty for Modality-Agnostic Class-Incremental Learning

Giuseppe Serra, Florian Buettner

Given the ability to model more realistic and dynamic problems, Federated Continual Learning (FCL) has been increasingly investigated recently. A well-known problem encountered in this setting is the so-called catastrophic forgetting, for which the learning model is inclined to focus on more recent tasks while forgetting the previously learned knowledge. The majority of the current approaches in FCL propose generative-based solutions to solve said problem. However, this setting requires multiple training epochs over the data, implying an offline setting where datasets are stored locally and remain unchanged over time. Furthermore, the proposed solutions are tailored for vision tasks solely. To overcome these limitations, we propose a new modality-agnostic approach to deal with the online scenario where new data arrive in streams of mini-batches that can only be processed once. To solve catastrophic forgetting, we propose an uncertainty-aware memory-based approach. In particular, we suggest using an estimator based on the Bregman Information (BI) to compute the model's variance at the sample level. Through measures of predictive uncertainty, we retrieve samples with specific characteristics, and - by retraining the model on such samples - we demonstrate the potential of this approach to reduce the forgetting effect in realistic settings.

5/30/2024

cs.LG