Controlling Forgetting with Test-Time Data in Continual Learning

2406.13653

Published 6/21/2024 by Vaibhav Singh, Rahaf Aljundi, Eugene Belilovsky

Abstract

Foundational vision-language models have shown impressive performance on various downstream tasks. Yet there is still a pressing need to update these models later, as new tasks or domains become available. Ongoing Continual Learning (CL) research provides techniques to overcome catastrophic forgetting of previous information when new knowledge is acquired. To date, CL techniques focus only on the supervised training sessions. This results in significant forgetting, yielding performance inferior even to the prior model's zero-shot performance. In this work, we argue that test-time data hold great information that can be leveraged in a self-supervised manner to refresh the model's memory of previously learned tasks and hence greatly reduce forgetting at no extra labelling cost. We study how unsupervised data can be employed online to improve models' performance on prior tasks upon encountering representative samples. We propose a simple yet effective student-teacher model with gradient-based sparse parameter updates and show significant performance improvements and reduction in forgetting, which could alleviate the role of an offline episodic memory/experience replay buffer.

Overview

This paper introduces a novel approach to continual learning called "Controlling Forgetting with Test-Time Data" (CFTD), which leverages the unlabeled data available at test time to mitigate catastrophic forgetting.
The key idea is to use this test-time data in a self-supervised manner to update the model's parameters online and reduce forgetting of previously learned tasks, at no extra labelling cost.
The authors demonstrate the effectiveness of CFTD on various benchmark datasets and show that it outperforms state-of-the-art continual learning methods.

Plain English Explanation

Continual learning is the ability of an AI system to learn new information while retaining and applying its existing knowledge. One of the biggest challenges in continual learning is "catastrophic forgetting," where the system forgets what it has previously learned when it acquires new knowledge.
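To make "forgetting" concrete, one measure commonly used in the continual learning literature (not necessarily the exact metric used in this paper) compares, for every earlier task, the best accuracy the model ever reached on it with its accuracy after the last task has been learned, and averages these drops. A minimal Python sketch, where the accuracy-matrix layout and the function name `average_forgetting` are illustrative assumptions:

```python
from typing import List


def average_forgetting(acc: List[List[float]]) -> float:
    """Average drop from each old task's best accuracy to its final accuracy.

    acc[i][j] is the accuracy on task j measured right after training on
    task i (so row i has i + 1 entries). This layout is an assumption.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        return 0.0  # nothing can have been forgotten yet
    drops = []
    for j in range(num_tasks - 1):  # the most recent task is excluded
        # Best accuracy on task j at any point before the final task.
        best_before_end = max(acc[i][j] for i in range(j, num_tasks - 1))
        drops.append(best_before_end - acc[-1][j])
    return sum(drops) / len(drops)


# Accuracy on task 0 falls from 0.9 to 0.6; on task 1 from 0.8 to 0.75.
history = [[0.9], [0.7, 0.8], [0.6, 0.75, 0.85]]
print(round(average_forgetting(history), 3))  # → 0.175
```

A lower value means less forgetting; approaches such as CFTD aim to push this number down without storing old training data.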

The researchers in this paper have developed a new technique called "Controlling Forgetting with Test-Time Data" (CFTD) to address this issue. The core idea is to use the unlabeled data the model encounters at test time (when it is being used to make predictions) to update its parameters in a self-supervised way, so that it forgets less of what it learned previously, without requiring any extra labels.

Imagine you have an AI system that has been trained to recognize different types of animals. Over time, as it learns to identify new animals, it may start to forget how to recognize the animals it learned about earlier. The CFTD approach would allow the system to use the unlabeled images it is asked to identify at test time, including images of animals it learned about earlier, to fine-tune its internal parameters and refresh its ability to recognize those animals, without anyone having to label the images.

The researchers demonstrate that this CFTD approach outperforms other state-of-the-art continual learning methods on various benchmark datasets, making it a promising technique for building AI systems that can continuously learn and adapt without forgetting their previous knowledge.

Technical Explanation

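The key innovation of the CFTD approach is the use of test-time data to mitigate catastrophic forgetting. To date, continual learning methods act only during the supervised training sessions, for example by modifying the model architecture or the training process. According to the authors, this still leads to significant forgetting, to the point that the updated model can perform worse than the original model's zero-shot performance. CFTD additionally exploits the unlabeled data the model encounters after deployment.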

The authors propose a two-stage process. First, the model is trained on a sequence of tasks in supervised sessions, for example with a standard continual learning method such as Elastic Weight Consolidation (EWC). Then, at test time, the model is updated online on the unlabeled test-time data in a self-supervised manner, with the goal of refreshing its memory of previously learned tasks whenever it encounters samples representative of them.
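The two-stage schedule can be sketched as a loop. This is a hypothetical skeleton, not the authors' code: `run_continual_protocol`, `train_fn`, `predict_fn` and `adapt_fn` are placeholder names, and the ordering (serve predictions on each test batch first, then adapt on it once, without labels) is an assumption about the online setting.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Batch = List[object]


def run_continual_protocol(
    tasks: Sequence[Tuple[object, Iterable[Batch]]],
    train_fn: Callable[[object], None],
    predict_fn: Callable[[Batch], List[int]],
    adapt_fn: Callable[[Batch], None],
) -> List[List[int]]:
    """Hypothetical skeleton: each task is (labelled_train_data, unlabeled_test_stream)."""
    predictions: List[List[int]] = []
    for train_data, test_stream in tasks:
        # Stage 1: supervised training session on the newly available task.
        train_fn(train_data)
        # Stage 2: deployment. Unlabeled batches arrive online and are seen once.
        for batch in test_stream:
            predictions.append(predict_fn(batch))  # serve predictions first
            adapt_fn(batch)                        # then adapt without labels
    return predictions
```

Keeping `adapt_fn` separate from `train_fn` makes explicit that the test-time step never sees labels, which is what lets the method reduce forgetting at no extra labelling cost.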

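For the test-time stage, the authors propose a simple student-teacher model with gradient-based sparse parameter updates: rather than fine-tuning every weight, only a small subset of parameters, selected according to their gradients, is changed. Restricting updates to this subset limits how many of the previously learned parameters are modified while the model refreshes its knowledge of earlier tasks. The authors suggest that this could reduce the need for an offline episodic memory or experience replay buffer.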
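A minimal sketch of one student-teacher step with gradient-based sparse updates follows. It rests on several assumptions not stated in the summary: the model is a linear softmax classifier, the teacher supplies hard pseudo-labels, the sparse subset is the top-k parameters by gradient magnitude, and the teacher tracks the student as an exponential moving average (EMA). All function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # weights[c][j]: weight of feature j for class c


def class_logits(weights: Matrix, x: List[float]) -> List[float]:
    # Linear classifier: one dot product per class.
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def softmax(z: List[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label_grad(student: Matrix, teacher: Matrix, x: List[float]) -> Matrix:
    # The teacher's most likely class serves as a pseudo-label (no ground truth).
    t_logits = class_logits(teacher, x)
    target = max(range(len(t_logits)), key=lambda c: t_logits[c])
    p = softmax(class_logits(student, x))
    # Cross-entropy gradient w.r.t. W[c][j] is (p_c - 1[c == target]) * x_j.
    return [[(p[c] - (1.0 if c == target else 0.0)) * xj for xj in x]
            for c in range(len(student))]


def top_k_mask(grad: Matrix, k: int) -> List[List[bool]]:
    # Mark the k entries with the largest absolute gradient.
    ranked = sorted(((abs(g), c, j) for c, row in enumerate(grad)
                     for j, g in enumerate(row)), reverse=True)
    keep = {(c, j) for _, c, j in ranked[:k]}
    return [[(c, j) in keep for j in range(len(row))] for c, row in enumerate(grad)]


def sparse_student_step(student: Matrix, grad: Matrix, lr: float, k: int) -> None:
    # SGD step applied only to the selected entries; all others stay frozen.
    mask = top_k_mask(grad, k)
    for c, row in enumerate(student):
        for j in range(len(row)):
            if mask[c][j]:
                row[j] -= lr * grad[c][j]


def ema_teacher_update(teacher: Matrix, student: Matrix, momentum: float) -> None:
    # The teacher slowly tracks the student, smoothing noisy test-time updates.
    for t_row, s_row in zip(teacher, student):
        for j in range(len(t_row)):
            t_row[j] = momentum * t_row[j] + (1.0 - momentum) * s_row[j]


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]  # toy weights standing in for the teacher
    student = [[0.0, 0.0], [0.0, 0.0]]  # toy initialisation for illustration
    x = [2.0, 1.0]                      # one unlabeled test sample
    grad = pseudo_label_grad(student, teacher, x)
    sparse_student_step(student, grad, lr=0.1, k=2)
    ema_teacher_update(teacher, student, momentum=0.5)
    print(student)  # → [[0.1, 0.0], [-0.1, 0.0]]
```

With `k=2`, only two of the four student weights change; in a real network `k` would be a small fraction of the parameters. The learning rate, EMA momentum and pseudo-labelling rule are illustrative choices, not values from the paper.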

The experiments demonstrate that CFTD outperforms state-of-the-art continual learning methods on a variety of benchmark datasets, including permuted MNIST, split CIFAR-100, and Continual CIFAR-100. The authors also show that CFTD is robust to different types of test-time data, including samples from the current task, samples from previous tasks, and even adversarial examples.

Critical Analysis

The CFTD approach presented in this paper is a promising step towards addressing the challenge of catastrophic forgetting in continual learning. By leveraging test-time data to dynamically adjust the model's parameters, the method can effectively retain knowledge from previous tasks while learning new information.

One potential limitation of the CFTD approach is that it assumes the availability of test-time data that is representative of the previously learned tasks. In real-world scenarios, the test-time data may not always be a reliable proxy for the previous tasks, which could limit the effectiveness of the method.

Additionally, the authors do not explore the scalability of CFTD to more complex and long-term continual learning scenarios. It would be interesting to see how the method performs as the number of tasks and the complexity of the learning problem increase.

Further research could also investigate the integration of CFTD with other continual learning techniques, such as Federated Continual Learning or Adaptive Memory Replay, to create more robust and versatile continual learning systems.

Conclusion

The "Controlling Forgetting with Test-Time Data" (CFTD) approach presented in this paper is a novel and promising technique for mitigating catastrophic forgetting in continual learning. By leveraging test-time data to dynamically adjust the model's parameters, CFTD can effectively retain knowledge from previously learned tasks while acquiring new information.

The empirical results demonstrate the effectiveness of CFTD on various benchmark datasets, where it outperforms state-of-the-art continual learning methods. This suggests that the CFTD approach could be a valuable tool for building AI systems that can continuously learn and adapt without forgetting their previous knowledge.

While the method has some potential limitations, such as the reliance on representative test-time data, the paper's contribution represents an important step forward in the field of continual learning. Ongoing research to address these challenges and further explore the integration of CFTD with other techniques could lead to even more robust and versatile continual learning solutions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Adaptive Memory Replay for Continual Learning

James Seale Smith, Lazar Valkov, Shaunak Halbe, Vyshnavi Gutta, Rogerio Feris, Zsolt Kira, Leonid Karlinsky

Foundation Models (FMs) have become the hallmark of modern AI; however, these models are trained on massive data, leading to financially expensive training. Updating FMs as new data becomes available is important; however, it can lead to "catastrophic forgetting", where models underperform on tasks related to data sub-populations observed too long ago. This continual learning (CL) phenomenon has been extensively studied, but primarily in a setting where only a small amount of past data can be stored. We advocate for the paradigm where memory is abundant, allowing us to keep all previous data, but computational resources are limited. In this setting, traditional replay-based CL approaches are outperformed by a simple baseline which replays past data selected uniformly at random, indicating that this setting necessitates a new approach. We address this by introducing a framework of adaptive memory replay for continual learning, where sampling of past data is phrased as a multi-armed bandit problem. We utilize Boltzmann sampling to derive a method which dynamically selects past data for training conditioned on the current task, assuming full data access and emphasizing training efficiency. Through extensive evaluations on both vision and language pre-training tasks, we demonstrate the effectiveness of our approach, which maintains high performance while reducing forgetting by up to 10% at no training efficiency cost.

4/22/2024

cs.LG cs.CL cs.CV

Data-dependent and Oracle Bounds on Forgetting in Continual Learning

Lior Friedman, Ron Meir

In continual learning, knowledge must be preserved and re-used between tasks, maintaining good transfer to future tasks and minimizing forgetting of previously learned ones. While several practical algorithms have been devised for this setting, there have been few theoretical works aiming to quantify and bound the degree of Forgetting in general settings. We provide both data-dependent and oracle upper bounds that apply regardless of model and algorithm choice, as well as bounds for Gibbs posteriors. We derive an algorithm inspired by our bounds and demonstrate empirically that our approach yields improved forward and backward transfer.

6/14/2024

cs.LG

Federated Continual Learning Goes Online: Leveraging Uncertainty for Modality-Agnostic Class-Incremental Learning

Giuseppe Serra, Florian Buettner

Given the ability to model more realistic and dynamic problems, Federated Continual Learning (FCL) has been increasingly investigated recently. A well-known problem encountered in this setting is the so-called catastrophic forgetting, for which the learning model is inclined to focus on more recent tasks while forgetting the previously learned knowledge. The majority of the current approaches in FCL propose generative-based solutions to solve said problem. However, this setting requires multiple training epochs over the data, implying an offline setting where datasets are stored locally and remain unchanged over time. Furthermore, the proposed solutions are tailored for vision tasks solely. To overcome these limitations, we propose a new modality-agnostic approach to deal with the online scenario where new data arrive in streams of mini-batches that can only be processed once. To solve catastrophic forgetting, we propose an uncertainty-aware memory-based approach. In particular, we suggest using an estimator based on the Bregman Information (BI) to compute the model's variance at the sample level. Through measures of predictive uncertainty, we retrieve samples with specific characteristics, and - by retraining the model on such samples - we demonstrate the potential of this approach to reduce the forgetting effect in realistic settings.

7/4/2024

cs.LG

CORE: Mitigating Catastrophic Forgetting in Continual Learning through Cognitive Replay

Jianshu Zhang, Yankai Fu, Ziheng Peng, Dongyu Yao, Kun He

This paper introduces a novel perspective to significantly mitigate catastrophic forgetting in continual learning (CL), which emphasizes models' capacity to preserve existing knowledge and assimilate new information. Current replay-based methods treat every task and data sample equally and thus cannot fully exploit the potential of the replay buffer. In response, we propose COgnitive REplay (CORE), which draws inspiration from human cognitive review processes. CORE includes two key strategies: Adaptive Quantity Allocation and Quality-Focused Data Selection. The former adaptively modulates the replay buffer allocation for each task based on its forgetting rate, while the latter guarantees the inclusion of representative data that best encapsulates the characteristics of each task within the buffer. Our approach achieves an average accuracy of 37.95% on split-CIFAR10, surpassing the best baseline method by 6.52%. Additionally, it significantly enhances the accuracy of the poorest-performing task by 6.30% compared to the top baseline. Code is available at https://github.com/sterzhang/CORE.

4/10/2024

cs.LG cs.AI