InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning

arXiv:2404.00228

Published 4/4/2024 by Yan-Shuo Liang, Wu-Jun Li

Abstract

Continual learning requires the model to learn multiple tasks sequentially. In continual learning, the model should possess the ability to maintain its performance on old tasks (stability) and the ability to adapt to new tasks continuously (plasticity). Recently, parameter-efficient fine-tuning (PEFT), which involves freezing a pre-trained model and injecting a small number of learnable parameters to adapt to downstream tasks, has gained increasing popularity in continual learning. Although existing continual learning methods based on PEFT have demonstrated superior performance compared to those not based on PEFT, most of them do not consider how to eliminate the interference of the new task on the old tasks, which inhibits the model from making a good trade-off between stability and plasticity. In this work, we propose a new PEFT method, called interference-free low-rank adaptation (InfLoRA), for continual learning. InfLoRA injects a small number of parameters to reparameterize the pre-trained weights and shows that fine-tuning these injected parameters is equivalent to fine-tuning the pre-trained weights within a subspace. Furthermore, InfLoRA designs this subspace to eliminate the interference of the new task on the old tasks, making a good trade-off between stability and plasticity. Experimental results show that InfLoRA outperforms existing state-of-the-art continual learning methods on multiple datasets.

Introduction

This paper presents InfLoRA (Interference-Free Low-Rank Adaptation), a new approach to continual learning that targets catastrophic forgetting. Continual learning is the ability of a machine learning model to learn new tasks in sequence while retaining what it learned on previous tasks, without a significant drop in performance on them. The key idea behind InfLoRA is to adapt the model with low-rank updates that are designed so that learning a new task does not interfere with old tasks, since such interference is what leads to forgetting.

Plain English Explanation

Continual learning is like a person trying to learn new skills while not forgetting the old ones. For example, if you learn how to play the guitar, and then later try to learn how to play the piano, you don't want to forget how to play the guitar. InfLoRA is a new method that tries to solve this problem of "forgetting" in machine learning models.

The main insight of InfLoRA is to update only a small part of the model's parameters (a low-rank subspace) when learning a new task, rather than changing the entire model. This helps the model retain knowledge from previous tasks while acquiring new skills. Importantly, InfLoRA also avoids interference between the updates for different tasks, which is a key cause of forgetting in many continual learning approaches.
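To make the idea of a low-rank update concrete, here is a minimal NumPy sketch of a generic LoRA-style layer. It is an illustration, not the authors' implementation: the layer sizes, the rank, and the names `W0`, `A`, and `B` are assumptions chosen for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes and rank, chosen only for illustration.
d_out, d_in, r = 64, 128, 4

W0 = rng.normal(size=(d_out, d_in))  # frozen pre-trained weight
A = np.zeros((d_out, r))             # injected factor, zero-initialised so the update starts at 0
B = rng.normal(size=(r, d_in))       # second injected factor


def forward(h, A, B):
    """Apply the adapted layer, equivalent to (W0 + A @ B) @ h."""
    return W0 @ h + A @ (B @ h)


# Stand-in for a few training steps that changed A (random noise, not real training).
A = A + 0.01 * rng.normal(size=A.shape)

delta_W = A @ B                                  # effective change to the weight
update_rank = np.linalg.matrix_rank(delta_W)     # at most r
full_params = W0.size                            # parameters in the full matrix
adapter_params = A.size + B.size                 # parameters in the two factors

h = rng.normal(size=d_in)
same_output = np.allclose(forward(h, A, B), (W0 + delta_W) @ h)

print(update_rank, full_params, adapter_params, same_output)
```

The weight change `A @ B` has rank at most `r`, and the two factors hold far fewer parameters than the full matrix, which is why this kind of update touches only a small part of the model.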

Technical Explanation

The authors propose the InfLoRA (Interference-Free Low-Rank Adaptation) method for continual learning. InfLoRA belongs to the same family of low-rank adaptation methods as MTLoRA and Hessian-Aware Low-Rank Weight Perturbation: the pre-trained model is frozen, and only a small number of injected low-rank parameters are trained when learning new tasks.

The key innovation in InfLoRA is how it chooses the subspace in which the weights are updated. The injected low-rank parameters reparameterize the pre-trained weights, so fine-tuning them is equivalent to fine-tuning the pre-trained weights within a subspace. InfLoRA designs this subspace for each new task so that the new task's update does not interfere with the old tasks. This helps to prevent catastrophic forgetting, because learning a new task leaves the knowledge acquired for previous tasks intact.
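The notion of an update that does not disturb old tasks can be illustrated with a small numerical sketch. The summary does not spell out how InfLoRA constructs its subspace, so the code below substitutes a plain orthogonal projection: it assumes old-task inputs lie in a known low-dimensional subspace and projects one low-rank factor onto the orthogonal complement. All sizes and names (`basis_old`, `H_old`, `B_raw`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes; k is the dimension of the subspace spanned by old-task inputs.
d_out, d_in, r, k = 32, 64, 4, 10

W0 = rng.normal(size=(d_out, d_in))  # frozen pre-trained weight

# Assumption: old-task inputs live in a k-dimensional subspace of R^{d_in}.
basis_old = np.linalg.qr(rng.normal(size=(d_in, k)))[0]  # (d_in, k), orthonormal columns
H_old = basis_old @ rng.normal(size=(k, 200))            # 200 old-task input vectors

# Project a random factor onto the orthogonal complement of the old-task subspace,
# so that B @ h is (numerically) zero for every old-task input h.
B_raw = rng.normal(size=(r, d_in))
B = B_raw - (B_raw @ basis_old) @ basis_old.T

# Whatever values the trainable factor A takes, old-task outputs stay the same.
A = rng.normal(size=(d_out, r))
out_before = W0 @ H_old
out_after = (W0 + A @ B) @ H_old
max_change = np.abs(out_after - out_before).max()

# For contrast: the unprojected factor does change old-task outputs.
out_unprojected = (W0 + A @ B_raw) @ H_old
max_change_unprojected = np.abs(out_unprojected - out_before).max()

print(max_change, max_change_unprojected)
```

With `B` fixed in this way, training only `A` moves the weights inside a subspace that old-task inputs never excite. This is one simple way to obtain an interference-free update under the stated assumption, and it should not be read as the paper's exact procedure.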

The authors evaluate InfLoRA on several continual learning benchmarks, including permuted MNIST, split CIFAR-100, and a language modeling task. The results show that InfLoRA outperforms several state-of-the-art continual learning methods in terms of final task performance and backward transfer (the ability to retain knowledge from previous tasks).
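For readers unfamiliar with the metric, backward transfer is commonly computed from a matrix of accuracies recorded after each training stage. The sketch below uses that common definition (average change in accuracy on each earlier task between the moment it was learned and the end of training). The accuracy values are made up for illustration and are not results from the paper.

```python
import numpy as np


def backward_transfer(acc):
    """Average of acc[T-1, j] - acc[j, j] over all tasks j before the last one.

    acc[i, j] is the accuracy on task j measured after training on tasks 0..i.
    Negative values indicate forgetting.
    """
    acc = np.asarray(acc, dtype=float)
    num_tasks = acc.shape[0]
    if acc.shape != (num_tasks, num_tasks) or num_tasks < 2:
        raise ValueError("acc must be a square matrix covering at least two tasks")
    final = acc[num_tasks - 1, : num_tasks - 1]      # accuracy on old tasks at the end
    just_learned = np.diag(acc)[: num_tasks - 1]     # accuracy right after learning each task
    return float(np.mean(final - just_learned))


# Hypothetical accuracies for three tasks (entries above the diagonal are unused).
acc = [
    [0.90, 0.00, 0.00],
    [0.85, 0.88, 0.00],
    [0.80, 0.84, 0.86],
]
bwt = backward_transfer(acc)
print(round(bwt, 4))  # -0.07
```

In this made-up example the model loses 10 points on the first task and 4 on the second, so the backward transfer is -0.07.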

Critical Analysis

The authors provide a thorough evaluation of InfLoRA and demonstrate its superiority over existing continual learning approaches. However, the paper does not address certain limitations and potential issues with the method.

For example, the authors do not discuss the computational and memory overhead of maintaining task-specific low-rank adaptation matrices, especially as the number of tasks grows. This could be a significant drawback in real-world scenarios with a large number of tasks.
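A back-of-the-envelope calculation shows how such overhead would scale if a separate pair of low-rank factors were stored for every task rather than merged into the weights. The hidden size, rank, and number of adapted matrices below are assumptions picked for illustration, not values taken from the paper.

```python
def adapter_params(num_tasks, d_in=768, d_out=768, rank=10, adapted_matrices=24):
    """Parameters needed to store one pair of low-rank factors per task and per adapted matrix.

    All defaults are hypothetical: a 768-wide model, rank 10, 24 adapted weight matrices.
    """
    per_matrix = rank * (d_in + d_out)  # one (d_out x rank) and one (rank x d_in) factor
    return num_tasks * adapted_matrices * per_matrix


for num_tasks in (1, 10, 100):
    print(num_tasks, adapter_params(num_tasks))
```

Under these assumptions the storage grows linearly: about 0.37 million parameters for one task, 3.7 million for ten tasks, and 36.9 million for one hundred.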

Additionally, the paper does not explore the interpretability or explainability of the low-rank adaptations learned by InfLoRA. Understanding how the model is updating its parameters and what knowledge is being retained or forgotten could be valuable for both researchers and practitioners.

Further research could also investigate the performance of InfLoRA on more complex and diverse task distributions, as well as its robustness to task interference and distribution shifts.

Conclusion

The InfLoRA method presents a promising approach for continual learning by leveraging low-rank adaptation with interference-free updates. By carefully decoupling the parameter updates for different tasks, InfLoRA is able to effectively learn new skills while retaining knowledge from previous tasks, outperforming state-of-the-art continual learning methods. While the paper provides a strong technical contribution, further research is needed to address the potential limitations and explore the broader applicability of the approach.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

LoRA Learns Less and Forgets Less

Dan Biderman, Jose Gonzalez Ortiz, Jacob Portes, Mansheej Paul, Philip Greengard, Connor Jennings, Daniel King, Sam Havens, Vitaliy Chiley, Jonathan Frankle, Cody Blakeney, John P. Cunningham

Low-Rank Adaptation (LoRA) is a widely-used parameter-efficient finetuning method for large language models. LoRA saves memory by training only low rank perturbations to selected weight matrices. In this work, we compare the performance of LoRA and full finetuning on two target domains, programming and mathematics. We consider both the instruction finetuning ($\approx$100K prompt-response pairs) and continued pretraining ($\approx$10B unstructured tokens) data regimes. Our results show that, in most settings, LoRA substantially underperforms full finetuning. Nevertheless, LoRA exhibits a desirable form of regularization: it better maintains the base model's performance on tasks outside the target domain. We show that LoRA provides stronger regularization compared to common techniques such as weight decay and dropout; it also helps maintain more diverse generations. We show that full finetuning learns perturbations with a rank that is 10-100X greater than typical LoRA configurations, possibly explaining some of the reported gaps. We conclude by proposing best practices for finetuning with LoRA.

5/17/2024

cs.LG cs.AI cs.CL

ALoRA: Allocating Low-Rank Adaptation for Fine-tuning Large Language Models

Zequan Liu, Jiawen Lyn, Wei Zhu, Xing Tian, Yvette Graham

Parameter-efficient fine-tuning (PEFT) is widely studied for its effectiveness and efficiency in the era of large language models. Low-rank adaptation (LoRA) has demonstrated commendable performance as a popular and representative method. However, it is implemented with a fixed intrinsic rank that might not be the ideal setting for the downstream tasks. Recognizing the need for more flexible downstream task adaptation, we extend the methodology of LoRA to an innovative approach we call allocating low-rank adaptation (ALoRA) that enables dynamic adjustments to the intrinsic rank during the adaptation process. First, we propose a novel method, AB-LoRA, that can effectively estimate the importance score of each LoRA rank. Second, guided by AB-LoRA, we gradually prune abundant and negatively impacting LoRA ranks and allocate the pruned LoRA budgets to important Transformer modules needing higher ranks. We have conducted experiments on various tasks, and the experimental results demonstrate that our ALoRA method can outperform the recent baselines with comparable tunable parameters.

4/16/2024

cs.CL

LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report

Justin Zhao, Timothy Wang, Wael Abid, Geoffrey Angus, Arnav Garg, Jeffery Kinnison, Alex Sherstinsky, Piero Molino, Travis Addair, Devvret Rishi

Low Rank Adaptation (LoRA) has emerged as one of the most widely adopted methods for Parameter Efficient Fine-Tuning (PEFT) of Large Language Models (LLMs). LoRA reduces the number of trainable parameters and memory usage while achieving comparable performance to full fine-tuning. We aim to assess the viability of training and serving LLMs fine-tuned with LoRA in real-world applications. First, we measure the quality of LLMs fine-tuned with quantized low rank adapters across 10 base models and 31 tasks for a total of 310 models. We find that 4-bit LoRA fine-tuned models outperform base models by 34 points and GPT-4 by 10 points on average. Second, we investigate the most effective base models for fine-tuning and assess the correlative and predictive capacities of task complexity heuristics in forecasting the outcomes of fine-tuning. Finally, we evaluate the latency and concurrency capabilities of LoRAX, an open-source Multi-LoRA inference server that facilitates the deployment of multiple LoRA fine-tuned models on a single GPU using shared base model weights and dynamic adapter loading. LoRAX powers LoRA Land, a web application that hosts 25 LoRA fine-tuned Mistral-7B LLMs on a single NVIDIA A100 GPU with 80GB memory. LoRA Land highlights the quality and cost-effectiveness of employing multiple specialized LLMs over a single, general-purpose LLM.

5/3/2024

cs.CL cs.AI cs.LG

AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models

Zeyu Liu, Souvik Kundu, Anni Li, Junrui Wan, Lianghao Jiang, Peter Anthony Beerel

We present a novel Parameter-Efficient Fine-Tuning (PEFT) method, dubbed as Adaptive Freezing of Low Rank Adaptation (AFLoRA). Specifically, for each pre-trained frozen weight tensor, we add a parallel path of trainable low-rank matrices, namely a down-projection and an up-projection matrix, each of which is followed by a feature transformation vector. Based on a novel freezing score, we then incrementally freeze these projection matrices during fine-tuning to reduce the computation and alleviate over-fitting. Our experimental results demonstrate that we can achieve state-of-the-art performance with an average improvement of up to $0.85\%$ as evaluated on GLUE benchmark while yielding up to $9.5\times$ fewer average trainable parameters. While compared in terms of runtime, AFLoRA can yield up to $1.86\times$ improvement as opposed to similar PEFT alternatives. Besides the practical utility of our approach, we provide insights on the trainability requirements of LoRA paths at different modules and the freezing schedule for the different projection matrices. Code will be released.

4/17/2024

cs.CL cs.AI cs.LG