Continual Learning of Large Language Models: A Comprehensive Survey

2404.16789

Published 4/26/2024 by Haizhou Shi, Zihao Xu, Hengyi Wang, Weiyi Qin, Wenyuan Wang, Yibin Wang, Hao Wang


Abstract

The recent success of large language models (LLMs) trained on static, pre-collected, general datasets has sparked numerous research directions and applications. One such direction addresses the non-trivial challenge of integrating pre-trained LLMs into dynamic data distributions, task structures, and user preferences. Pre-trained LLMs, when tailored for specific needs, often experience significant performance degradation in previous knowledge domains -- a phenomenon known as catastrophic forgetting. While extensively studied in the continual learning (CL) community, it presents new manifestations in the realm of LLMs. In this survey, we provide a comprehensive overview of the current research progress on LLMs within the context of CL. This survey is structured into four main sections: we first describe an overview of continually learning LLMs, consisting of two directions of continuity: vertical continuity (or vertical continual learning), i.e., continual adaptation from general to specific capabilities, and horizontal continuity (or horizontal continual learning), i.e., continual adaptation across time and domains (Section 3). We then summarize three stages of learning LLMs in the context of modern CL: Continual Pre-Training (CPT), Domain-Adaptive Pre-training (DAP), and Continual Fine-Tuning (CFT) (Section 4). Then we provide an overview of evaluation protocols for continual learning with LLMs, along with the current available data sources (Section 5). Finally, we discuss intriguing questions pertaining to continual learning for LLMs (Section 6). The full list of papers examined in this survey is available at https://github.com/Wang-ML-Lab/llm-continual-learning-survey.


Overview

The recent success of large language models (LLMs) has opened numerous research directions, including the challenge of adapting pre-trained LLMs to dynamic data distributions, task structures, and user preferences.
When tailored to specific needs, pre-trained LLMs often suffer significant performance degradation in previously learned knowledge domains, a phenomenon known as catastrophic forgetting.
This survey provides a comprehensive overview of the current research progress on LLMs within the context of continual learning (CL).

Plain English Explanation

Large language models (LLMs) are a type of artificial intelligence system that has become very good at understanding and generating human-like text. These models are trained on vast amounts of online data, which lets them tackle a wide range of language-related tasks. A key challenge, however, is that when these pre-trained LLMs are adapted to more specific tasks or data, they often lose some of their previous knowledge, a problem known as catastrophic forgetting.
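To make the idea concrete, here is a deliberately tiny toy (a hypothetical setup, not taken from the survey): a one-parameter model y = w * x is trained with gradient descent on task A (target slope +1) and then on task B (target slope -1). Because both tasks share the single weight, fitting task B overwrites what was learned for task A, and the loss on task A jumps from near zero to about 4.67. Real LLMs have billions of parameters, so forgetting is usually partial rather than total, but the underlying pressure, shared weights being pulled toward a new objective, is similar in spirit.

```python
# Toy illustration of catastrophic forgetting (hypothetical setup, not from the survey):
# a single shared weight w in y = w * x is trained on task A, then on task B.


def mse(w, data):
    """Mean squared error of y = w * x on a list of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(w, data, lr=0.1, steps=100):
    """Full-batch gradient descent on the MSE; returns the updated weight."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


task_a = [(x, 1.0 * x) for x in (0.5, 1.0, 1.5)]   # task A wants slope +1
task_b = [(x, -1.0 * x) for x in (0.5, 1.0, 1.5)]  # task B wants slope -1

w = train(0.0, task_a)
loss_a_before = mse(w, task_a)  # near zero: task A has been learned
w = train(w, task_b)            # sequential training on task B only
loss_a_after = mse(w, task_a)   # large: task A has been "forgotten"
print(f"task A loss after A: {loss_a_before:.4f}, after B: {loss_a_after:.4f}")
```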

This research survey explores the current efforts to address this issue by integrating LLMs into a continual learning framework. Continual learning aims to allow AI systems to continuously learn and adapt to new information without forgetting what they've learned before.

The survey covers two main directions of continual learning for LLMs: "vertical continuity," where the model adapts from general to more specialized capabilities, and "horizontal continuity," where the model learns across different domains over time. It also summarizes the key stages of continual learning for LLMs, including continual pre-training, domain-adaptive pre-training, and continual fine-tuning.

The goal of this research is to enable LLMs to be more flexible and adaptable, allowing them to continuously learn and evolve to better meet the needs of users and applications over time.

Technical Explanation

The survey begins by providing an overview of the two main directions of continual learning for LLMs: vertical continuity and horizontal continuity. Vertical continuity refers to the continuous adaptation of LLMs from general to more specific capabilities, while horizontal continuity describes the continual adaptation of LLMs across different domains and time periods.

The authors then summarize the three key stages of continual learning for LLMs:

Continual Pre-Training (CPT): Techniques for continuously updating the pre-trained weights of an LLM to maintain and expand its knowledge.
Domain-Adaptive Pre-training (DAP): Methods for further pre-training the LLM on domain-specific data so that it adapts to a particular domain.
Continual Fine-Tuning (CFT): Approaches for fine-tuning the LLM on new tasks or datasets without catastrophically forgetting previous knowledge.
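These stages can be run one after another, and one widely used way to limit forgetting across such phases is rehearsal (experience replay): keep a small buffer of examples from earlier phases and mix some of them into each new batch. The sketch below illustrates that idea under illustrative assumptions and is not a method proposed in the survey. The names `ReservoirBuffer` and `mixed_batches`, the 25% replay ratio, the buffer capacity, and the toy string "examples" are all hypothetical, and the actual optimizer step is left as a placeholder.

```python
import random
from typing import Dict, Iterator, List


class ReservoirBuffer:
    """Fixed-size buffer holding a uniform random sample of all examples added (reservoir sampling)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.items: List[str] = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, item: str) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen, overwriting a random slot.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def mixed_batches(new_data: List[str], replay: List[str], batch_size: int = 8,
                  replay_ratio: float = 0.25, seed: int = 0) -> Iterator[List[str]]:
    """Yield batches of new-phase examples, each topped up with replayed old examples."""
    rng = random.Random(seed)
    # Hypothetical policy: a fixed share of each batch comes from the replay buffer.
    n_replay = min(int(batch_size * replay_ratio), batch_size - 1, len(replay))
    n_new = batch_size - n_replay
    order = list(new_data)
    rng.shuffle(order)
    for start in range(0, len(order), n_new):
        batch = order[start:start + n_new] + rng.sample(replay, n_replay)
        rng.shuffle(batch)
        yield batch


# Toy stand-ins for CPT, DAP, and CFT data (hypothetical).
stages: Dict[str, List[str]] = {
    "CPT": [f"cpt-{i}" for i in range(20)],
    "DAP": [f"dap-{i}" for i in range(20)],
    "CFT": [f"cft-{i}" for i in range(20)],
}
buffer = ReservoirBuffer(capacity=10)
batch_counts: Dict[str, int] = {}
for name, data in stages.items():
    batch_counts[name] = 0
    for batch in mixed_batches(data, buffer.items):
        batch_counts[name] += 1  # a real pipeline would run one optimizer step on `batch` here
    for example in data:
        buffer.add(example)  # make this phase's data available for replay later
```

The first phase has nothing to replay, so its batches are all new data; later phases reserve two of every eight slots for replayed examples. Reservoir sampling keeps the buffer a uniform sample over everything seen so far without storing all of it, which matters when earlier phases are far larger than the buffer.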

The survey also provides an overview of evaluation protocols and available data sources for assessing the performance of continually learning LLMs.
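Evaluation in continual learning is often summarized from an accuracy matrix R, where R[i][j] is the accuracy on task j after training through task i. The sketch below computes three metrics common in the CL literature: final average accuracy, backward transfer, and average forgetting. These are standard definitions offered as an illustration; the exact metrics and formulas used by the works the survey covers may differ, and the accuracy values in the example are made up.

```python
from typing import List

Matrix = List[List[float]]  # R[i][j]: accuracy on task j after training through task i


def average_accuracy(R: Matrix) -> float:
    """Mean accuracy over all tasks, measured after training on the last task."""
    final = R[-1]
    return sum(final) / len(final)


def backward_transfer(R: Matrix) -> float:
    """Mean change on earlier tasks from just after learning them to the end (negative = forgetting)."""
    T = len(R)
    return sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)


def average_forgetting(R: Matrix) -> float:
    """Mean drop from each earlier task's best accuracy to its final accuracy."""
    T = len(R)
    drops = [max(R[i][j] for i in range(j, T - 1)) - R[T - 1][j] for j in range(T - 1)]
    return sum(drops) / len(drops)


# Made-up accuracies for three sequentially learned tasks.
acc = [
    [0.80, 0.10, 0.05],
    [0.70, 0.85, 0.10],
    [0.65, 0.75, 0.90],
]
print(f"ACC={average_accuracy(acc):.3f}  BWT={backward_transfer(acc):.3f}  "
      f"forgetting={average_forgetting(acc):.3f}")
```

For this matrix, backward transfer is -0.125 and average forgetting is 0.125. They coincide here only because each task's best accuracy was reached right after it was learned; in general, forgetting measures the drop from the peak, not from the just-learned value.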

Critical Analysis

The survey highlights the significant challenges involved in integrating pre-trained LLMs into dynamic environments without suffering from catastrophic forgetting. While the continual learning community has extensively studied this problem, the authors note that new manifestations of the issue arise when applying these techniques to large language models.

One potential limitation of the current research, as mentioned in the survey, is the lack of standardized evaluation protocols and benchmarks for continually learning LLMs. This makes it difficult to compare the performance of different approaches and understand their real-world applicability.

Additionally, the survey does not delve into the potential ethical and societal implications of continually learning LLMs. As these models become more adaptable and capable of continuous learning, there may be concerns around the evolution of their biases, the potential for misuse, and the impact on various industries and domains.

Further research is needed to address these challenges and develop robust, scalable, and responsible continual learning techniques for large language models.

Conclusion

This survey provides a comprehensive overview of the current research on continual learning for large language models (LLMs). By addressing the challenge of integrating pre-trained LLMs into dynamic environments without catastrophic forgetting, the research aims to enable these models to become more flexible, adaptable, and capable of continuous learning.

The survey covers the key concepts of vertical and horizontal continuity, as well as the three main stages of continual learning for LLMs: continual pre-training, domain-adaptive pre-training, and continual fine-tuning. While the field shows promising progress, it still lacks standardized evaluation protocols, and the ethical and societal implications of continually learning LLMs deserve further exploration.

As LLMs continue to play an increasingly important role in various applications, the ability to adapt and evolve these models over time will be crucial for unlocking their full potential and ensuring they remain relevant and beneficial in dynamic, real-world environments.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for authoritative details.

Related Papers

Recent Advances of Foundation Language Models-based Continual Learning: A Survey

Yutao Yang, Jie Zhou, Xuanwen Ding, Tianyu Huai, Shunyu Liu, Qin Chen, Liang He, Yuan Xie

Recently, foundation language models (LMs) have marked significant achievements in the domains of natural language processing (NLP) and computer vision (CV). Unlike traditional neural network models, foundation LMs obtain a great ability for transfer learning by acquiring rich commonsense knowledge through pre-training on extensive unsupervised datasets with a vast number of parameters. However, they still cannot emulate human-like continuous learning due to catastrophic forgetting. Consequently, various continual learning (CL)-based methodologies have been developed to refine LMs, enabling them to adapt to new tasks without forgetting previous knowledge. However, a systematic taxonomy of existing approaches and a comparison of their performance are still lacking, which is the gap that our survey aims to fill. We delve into a comprehensive review, summarization, and classification of the existing literature on CL-based approaches applied to foundation language models, such as pre-trained language models (PLMs), large language models (LLMs) and vision-language models (VLMs). We divide these studies into offline CL and online CL, which consist of traditional methods, parameter-efficient-based methods, instruction tuning-based methods and continual pre-training methods. Offline CL encompasses domain-incremental learning, task-incremental learning, and class-incremental learning, while online CL is subdivided into hard task boundary and blurry task boundary settings. Additionally, we outline the typical datasets and metrics employed in CL research and provide a detailed analysis of the challenges and future work for LMs-based continual learning.

5/30/2024

cs.CL


Continual Learning with Pre-Trained Models: A Survey

Da-Wei Zhou, Hai-Long Sun, Jingyi Ning, Han-Jia Ye, De-Chuan Zhan

Nowadays, real-world applications often face streaming data, which requires the learning system to absorb new knowledge as data evolves. Continual Learning (CL) aims to achieve this goal and meanwhile overcome the catastrophic forgetting of former knowledge when learning new ones. Typical CL methods build the model from scratch to grow with incoming data. However, the advent of the pre-trained model (PTM) era has sparked immense research interest, particularly in leveraging PTMs' robust representational capabilities. This paper presents a comprehensive survey of the latest advancements in PTM-based CL. We categorize existing methodologies into three distinct groups, providing a comparative analysis of their similarities, differences, and respective advantages and disadvantages. Additionally, we offer an empirical study contrasting various state-of-the-art methods to highlight concerns regarding fairness in comparisons. The source code to reproduce these evaluations is available at: https://github.com/sun-hailong/LAMDA-PILOT

4/24/2024

cs.LG cs.CV


Towards Lifelong Learning of Large Language Models: A Survey

Junhao Zheng, Shengjie Qiu, Chengming Shi, Qianli Ma

As the applications of large language models (LLMs) expand across diverse fields, the ability of these models to adapt to ongoing changes in data, tasks, and user preferences becomes crucial. Traditional training methods, relying on static datasets, are increasingly inadequate for coping with the dynamic nature of real-world information. Lifelong learning, also known as continual or incremental learning, addresses this challenge by enabling LLMs to learn continuously and adaptively over their operational lifetime, integrating new knowledge while retaining previously learned information and preventing catastrophic forgetting. This survey delves into the sophisticated landscape of lifelong learning, categorizing strategies into two primary groups: Internal Knowledge and External Knowledge. Internal Knowledge includes continual pretraining and continual finetuning, each enhancing the adaptability of LLMs in various scenarios. External Knowledge encompasses retrieval-based and tool-based lifelong learning, leveraging external data sources and computational tools to extend the model's capabilities without modifying core parameters. The key contributions of our survey are: (1) Introducing a novel taxonomy categorizing the extensive literature of lifelong learning into 12 scenarios; (2) Identifying common techniques across all lifelong learning scenarios and classifying existing literature into various technique groups within each scenario; (3) Highlighting emerging techniques such as model expansion and data selection, which were less explored in the pre-LLM era. Through a detailed examination of these groups and their respective categories, this survey aims to enhance the adaptability, reliability, and overall performance of LLMs in real-world applications.

6/11/2024

cs.LG cs.CL

Large Language Model Can Continue Evolving From Mistakes

Haokun Zhao, Haixia Han, Jie Shi, Chengyu Du, Jiaqing Liang, Yanghua Xiao

As world knowledge evolves and new task paradigms emerge, Continual Learning (CL) is crucial for keeping Large Language Models (LLMs) up-to-date and addressing their shortcomings. In practical applications, LLMs often require both continual instruction tuning (CIT) and continual pre-training (CPT) to adapt to new task paradigms and acquire necessary knowledge for task-solving. However, it remains challenging to collect CPT data that addresses the knowledge deficiencies in models while maintaining adequate volume, and improving the efficiency of utilizing this data also presents significant difficulties. Inspired by the 'summarizing mistakes' learning skill, we propose the Continue Evolving from Mistakes (CEM) method, aiming to provide a data-efficient approach for collecting CPT data and continually improving LLMs' performance through iterative evaluation and supplementation with mistake-relevant knowledge. To efficiently utilize these CPT data and mitigate forgetting, we design a novel CL training set construction paradigm that integrates parallel CIT and CPT data. Extensive experiments demonstrate the efficacy of the CEM method, achieving up to a 17% improvement in accuracy in the best case. Furthermore, additional experiments confirm the potential of combining CEM with catastrophic forgetting mitigation methods, enabling iterative and continual model evolution.

6/18/2024

cs.LG cs.AI cs.CL