RE-Adapt: Reverse Engineered Adaptation of Large Language Models

Read original: arXiv:2405.15007 - Published 5/27/2024 by William Fleshman, Benjamin Van Durme

Overview

The paper introduces RE-Adapt, a method for fine-tuning large language models (LLMs) on new domains without degrading their pre-existing instruction-following abilities.
RE-Adapt "reverse engineers" an adapter that isolates what an instruction-tuned model has learned beyond its corresponding pre-trained base model, without requiring any additional data or training.
The base model can then be fine-tuned on a new domain, and the reverse-engineered adapter can be used to readapt the model to instruction following.
The paper also presents a low-rank variant called LoRE-Adapt, which outperforms other fine-tuning methods across multiple popular LLMs and datasets, even when used with retrieval-augmented generation.

Plain English Explanation

RE-Adapt is a way to update large language models (LLMs) to work on new tasks without losing their original abilities. Typically, when you fine-tune an LLM on a new task, it can forget how to do its original tasks. RE-Adapt solves this problem by creating a special "adapter" that remembers the original abilities of the model.

The key idea is that the adapter isolates what the instruction-tuned model has learned beyond its original pre-trained version. This means the base model can be fine-tuned on a new task, and then the adapter can be used to "readapt" the model back to its original instruction-following abilities. Importantly, this process doesn't require any additional data or training: the adapter is derived directly from the two existing models.

The paper also introduces LoRE-Adapt, a more efficient version of RE-Adapt that uses a low-rank approach. Both RE-Adapt and LoRE-Adapt outperform other fine-tuning methods, even when the models are used together with retrieval-augmented generation.

The main benefit of this approach is that it allows LLMs to be quickly adapted to new domains or tasks without losing their original capabilities. This could be very useful in real-world applications where models need to be continuously updated and refined.

Technical Explanation

The key innovation in RE-Adapt is the "reverse engineering" of an adapter that isolates what an instruction-tuned model has learned beyond its corresponding pre-trained base model. This adapter can then be used to "readapt" the base model to its original instruction-following abilities after it has been fine-tuned on a new domain.

Importantly, this reverse-engineered adapter is obtained without requiring any additional data or training. The authors achieve this by working directly from the differences between the instruction-tuned model and its pre-trained base, so that the adapter captures what instruction tuning added on top of the base model.
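As a rough illustration of how an adapter could be obtained without data or training, the sketch below assumes the adapter is simply the per-parameter difference between the instruction-tuned and base weights. The dictionary-of-arrays model representation and the function names reverse_engineer_adapter and readapt are hypothetical, chosen only for this example; it is not the authors' implementation.

```python
import numpy as np

def reverse_engineer_adapter(base, instruct):
    """Assumed form of the adapter: instruct weights minus base weights."""
    return {name: instruct[name] - base[name] for name in base}

def readapt(weights, adapter):
    """Add the adapter back onto a set of weights (e.g. a domain-tuned base)."""
    return {name: weights[name] + adapter[name] for name in weights}

rng = np.random.default_rng(0)

# Toy stand-ins for real checkpoints: one 4x4 weight matrix per model.
base = {"layer.weight": rng.normal(size=(4, 4))}
instruct = {"layer.weight": base["layer.weight"] + 0.1 * rng.normal(size=(4, 4))}

# No data and no gradient steps: the adapter is computed from the two checkpoints.
adapter = reverse_engineer_adapter(base, instruct)

# Sanity check: base + adapter reproduces the instruction-tuned weights.
recovered = readapt(base, adapter)

# Simulate fine-tuning the base on a new domain, then readapt it.
domain_tuned = {"layer.weight": base["layer.weight"] + 0.05 * rng.normal(size=(4, 4))}
readapted = readapt(domain_tuned, adapter)
```

In this toy setting the readapted weights carry both the simulated domain update and the instruction-tuning difference, which is the intuition behind fine-tuning the base model first and restoring instruction following afterwards.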

The paper also introduces a more efficient variant called LoRE-Adapt, which uses a low-rank factorization to further reduce the number of parameters in the adapter. Both RE-Adapt and LoRE-Adapt are evaluated on multiple popular LLMs and datasets, including when used in conjunction with retrieval-augmented generation.
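To make the parameter saving of a low-rank adapter concrete, here is a minimal sketch that factors a weight-difference matrix with a truncated SVD. The use of SVD, the chosen rank, and the helper name low_rank_factors are assumptions made for illustration; the summary above only states that a low-rank factorization is used.

```python
import numpy as np

def low_rank_factors(delta, rank):
    """Approximate delta (m x n) as B @ A with B (m x rank) and A (rank x n)."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    B = U[:, :rank] * S[:rank]   # scale the kept left singular vectors
    A = Vt[:rank, :]             # kept right singular vectors
    return B, A

rng = np.random.default_rng(0)

# Toy weight difference that is exactly rank 2, so a rank-2 factorization is lossless.
delta = rng.normal(size=(64, 2)) @ rng.normal(size=(2, 48))

B, A = low_rank_factors(delta, rank=2)

full_params = delta.size            # parameters stored by the dense adapter
low_rank_params = B.size + A.size   # parameters stored by the two factors
```

Real weight differences are not exactly low rank, so in practice the chosen rank trades approximation error against adapter size.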

The results show that RE-Adapt and LoRE-Adapt outperform the other fine-tuning methods they are compared against, across multiple popular LLMs and datasets, demonstrating the benefits of the reverse-engineered adapter approach.

Critical Analysis

The paper provides a thorough evaluation of RE-Adapt and LoRE-Adapt across multiple datasets and LLMs, including when used with retrieval-augmented generation. This helps to demonstrate the generalizability and robustness of the proposed methods.

However, the paper does not explore the limitations or potential downsides of the approach in depth. For example, it's not clear how the reverse-engineered adapters would perform in scenarios where the original instruction-tuned model had significant overlapping capabilities with the pre-trained base model.

Additionally, the paper focuses on the performance gains achieved by RE-Adapt and LoRE-Adapt, but does not provide much insight into the underlying mechanisms or design choices that enable these improvements. A more in-depth technical analysis of the adapter structure and its relationship to the base model's learning could help readers understand the approach more deeply.

Overall, the paper presents a promising and novel approach to fine-tuning LLMs, but further research may be needed to fully understand its limitations and potential applications in real-world settings.

Conclusion

RE-Adapt and its low-rank variant LoRE-Adapt offer a compelling solution to the problem of fine-tuning large language models on new domains without degrading their pre-existing instruction-following abilities. By reverse-engineering an adapter that isolates what instruction tuning added to the base model, these methods allow the base model to be fine-tuned on a new domain and then restored to instruction following.

The strong results demonstrated in the paper, even when used in conjunction with retrieval-augmented generation, suggest that RE-Adapt and LoRE-Adapt could be valuable tools for practitioners looking to continuously adapt and refine large language models for real-world applications. As the field of large language models continues to evolve, techniques like those presented in this paper will likely play an important role in ensuring the flexibility and robustness of these powerful AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

RE-Adapt: Reverse Engineered Adaptation of Large Language Models

William Fleshman, Benjamin Van Durme

We introduce RE-Adapt, an approach to fine-tuning large language models on new domains without degrading any pre-existing instruction-tuning. We reverse engineer an adapter which isolates what an instruction-tuned model has learned beyond its corresponding pretrained base model. Importantly, this requires no additional data or training. We can then fine-tune the base model on a new domain and readapt it to instruction following with the reverse engineered adapter. RE-Adapt and our low-rank variant LoRE-Adapt both outperform other methods of fine-tuning, across multiple popular LLMs and datasets, even when the models are used in conjunction with retrieval-augmented generation.

5/27/2024

RE-AdaptIR: Improving Information Retrieval through Reverse Engineered Adaptation

William Fleshman, Benjamin Van Durme

Large language models (LLMs) fine-tuned for text-retrieval have demonstrated state-of-the-art results across several information retrieval (IR) benchmarks. However, supervised training for improving these models requires numerous labeled examples, which are generally unavailable or expensive to acquire. In this work, we explore the effectiveness of extending reverse engineered adaptation to the context of information retrieval (RE-AdaptIR). We use RE-AdaptIR to improve LLM-based IR models using only unlabeled data. We demonstrate improved performance both in training domains as well as zero-shot in domains where the models have seen no queries. We analyze performance changes in various fine-tuning scenarios and offer findings of immediate use to practitioners.

6/24/2024

Reformulating Domain Adaptation of Large Language Models as Adapt-Retrieve-Revise: A Case Study on Chinese Legal Domain

Zhen Wan, Yating Zhang, Yexiang Wang, Fei Cheng, Sadao Kurohashi

While large language models (LLMs) like GPT-4 have recently demonstrated astonishing zero-shot capabilities in general domain tasks, they often generate content with hallucinations in specific domains such as Chinese law, hindering their application in these areas. This is typically due to the absence of training data that encompasses such a specific domain, preventing GPT-4 from acquiring in-domain knowledge. A pressing challenge is that it's not plausible to continue training LLMs of such scale on in-domain data. This paper introduces a simple and effective domain adaptation framework for GPT-4 by reformulating generation as an adapt-retrieve-revise process. The initial step is to adapt an affordable 7B LLM to the target domain by continuing learning on in-domain data. When solving a task, we leverage the adapted LLM to generate a draft answer given a task query. Then, the draft answer will be used to retrieve supporting evidence candidates from an external in-domain knowledge base. Finally, the draft answer and retrieved evidence are concatenated into a whole prompt to let GPT-4 assess the evidence and revise the draft answer to generate the final answer. Our proposal combines the advantages of the efficiency of adapting a smaller 7B model with the evidence-assessing capability of GPT-4 and effectively prevents GPT-4 from generating hallucinatory content. In the zero-shot setting of four Chinese legal tasks, our method improves accuracy by 33.3% compared to the direct generation by GPT-4. When compared to two stronger retrieval-based baselines, our method outperforms them by 15.4% and 23.9%. Our code will be released.

8/27/2024

Parameter-Efficient Fine-Tuning With Adapters

Keyu Chen, Yuan Pang, Zi Yang

In the arena of language model fine-tuning, traditional approaches such as Domain-Adaptive Pretraining (DAPT) and Task-Adaptive Pretraining (TAPT), although effective, are computationally intensive. This research introduces a novel adaptation method that uses the UniPELT framework as a base and adds a PromptTuning layer, which significantly reduces the number of trainable parameters while maintaining competitive performance across various benchmarks. Our method employs adapters, which enable efficient transfer of pretrained models to new tasks with minimal retraining of the base model parameters. We evaluate our approach using three diverse datasets: the GLUE benchmark, a domain-specific dataset comprising four distinct areas, and the Stanford Question Answering Dataset 1.1 (SQuAD). Our results demonstrate that our customized adapter-based method achieves performance comparable to full model fine-tuning, DAPT+TAPT, and UniPELT strategies while requiring a smaller or equivalent number of parameters. This parameter efficiency not only alleviates the computational burden but also expedites the adaptation process. The study underlines the potential of adapters in achieving high performance with significantly reduced resource consumption, suggesting a promising direction for future research in parameter-efficient fine-tuning.

5/10/2024