Black-box Prompt Tuning with Subspace Learning

Read original: arXiv:2305.03518 - Published 6/18/2024 by Yuanhang Zheng, Zhixing Tan, Peng Li, Yang Liu

Overview

The paper introduces a new method called Black-box prompt tuning with Subspace Learning (BSL) to enhance the versatility of black-box prompt tuning for large language models (LLMs).
Traditional black-box prompt tuning uses derivative-free optimization algorithms to learn prompts within low-dimensional subspaces, rather than backpropagating through the LLM network.
However, recent studies have found that black-box prompt tuning lacks versatility across tasks and LLMs, which the authors attribute to a suboptimal choice of subspaces.
The BSL method aims to identify common subspaces for similar tasks through meta-learning, allowing for more effective prompt optimization on target tasks.

Plain English Explanation

The paper proposes a new technique called Black-box prompt tuning with Subspace Learning (BSL) to improve the versatility of black-box prompt tuning for large language models.

Traditional black-box prompt tuning finds good prompts for a task without accessing the language model's gradients or modifying its weights. Instead, it searches for prompts within a low-dimensional subspace using derivative-free optimization algorithms. However, this approach has been found to lack versatility, meaning it does not perform consistently well across different tasks or language models.

The key idea behind BSL is to identify common subspaces for similar tasks through a meta-learning process. The intuition is that the optimal prompts for related tasks tend to lie in the same general region of the prompt space. By finding these shared subspaces, the method can then optimize prompts more effectively for new target tasks that are similar to the source tasks used in meta-learning.

The authors report experiments in which BSL consistently achieves competitive performance across a variety of downstream tasks and language models, which suggests that it addresses the versatility problem of traditional black-box prompt tuning.

Technical Explanation

The paper introduces the Black-box prompt tuning with Subspace Learning (BSL) framework to address the versatility issues of standard black-box prompt tuning methods.

Black-box prompt tuning typically operates by optimizing prompts within a low-dimensional subspace using derivative-free optimization algorithms, rather than backpropagating through the entire language model network. However, recent studies have found that this approach lacks versatility across tasks and LLMs, which the authors hypothesize is due to suboptimal subspace selection.
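The subspace search described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the loss function is a toy quadratic standing in for an LLM query, the projection matrix `A` is a fixed random matrix, and a simple elitist evolution strategy is swapped in for whatever derivative-free optimizer a real system would use. All names (`black_box_loss`, `tune_in_subspace`, `A`, `target`) are hypothetical.

```python
import numpy as np

def black_box_loss(prompt, target):
    """Toy stand-in for an LLM query: returns only a scalar loss, no gradients."""
    return float(np.sum((prompt - target) ** 2))

def tune_in_subspace(loss_fn, projection, iters=200, pop=20, sigma=0.5, seed=0):
    """Minimize loss_fn(projection @ z) over z with a simple elitist evolution strategy."""
    rng = np.random.default_rng(seed)
    dim_low = projection.shape[1]
    z = np.zeros(dim_low)
    best = loss_fn(projection @ z)
    for _ in range(iters):
        # Sample a population of candidates around the current best point.
        candidates = z + sigma * rng.standard_normal((pop, dim_low))
        losses = np.array([loss_fn(projection @ c) for c in candidates])
        i = int(np.argmin(losses))
        if losses[i] < best:
            # Keep the best candidate only if it improves on the incumbent.
            z, best = candidates[i], float(losses[i])
        else:
            # No improvement: shrink the search radius.
            sigma *= 0.9
    return z, best

# Assumed sizes: a 100-dimensional prompt embedding searched via 5 coordinates.
rng = np.random.default_rng(42)
D, d = 100, 5
A = rng.standard_normal((D, d)) / np.sqrt(d)   # fixed random projection
target = A @ rng.standard_normal(d)            # toy optimum placed inside the subspace

loss_fn = lambda p: black_box_loss(p, target)
initial_loss = loss_fn(A @ np.zeros(d))
z_best, final_loss = tune_in_subspace(loss_fn, A)
print(f"loss: {initial_loss:.3f} -> {final_loss:.3f}")
```

The optimizer only ever sees scalar loss values, which is what makes the setting black-box. In this toy setup the optimum is placed inside the subspace by construction; when it is not, no amount of search over `z` can reach it, which is the failure mode that motivates choosing the subspace more carefully.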

To enhance versatility, the BSL method first identifies a common subspace for a collection of similar source tasks through a meta-learning process. This rests on the assumption that nearly optimal prompts for similar tasks reside in a shared subspace. The authors then expect that, for a target task similar to the source tasks, optimizing within the identified subspace will yield a prompt that performs well on that task.
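The shared-subspace assumption can be illustrated with a small numerical sketch. Note that this is a simplified stand-in and not the paper's meta-learning procedure: it assumes near-optimal prompts for the source tasks are already available and estimates their common subspace with a plain SVD. The synthetic data, the dimensions, and the names (`true_basis`, `sample_task_prompt`, `learned_basis`) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
D, d, n_source = 64, 4, 12   # prompt dim, subspace dim, number of source tasks

# Hypothetical ground truth: prompts for similar tasks share a hidden d-dim subspace.
true_basis, _ = np.linalg.qr(rng.standard_normal((D, d)))

def sample_task_prompt():
    """Synthetic near-optimal prompt: a point in the shared subspace plus small noise."""
    return true_basis @ rng.standard_normal(d) + 0.01 * rng.standard_normal(D)

# Stack the source-task prompts into an (n_source, D) matrix.
source_prompts = np.stack([sample_task_prompt() for _ in range(n_source)])

# The top-d right singular vectors span the estimated common subspace.
_, _, vt = np.linalg.svd(source_prompts, full_matrices=False)
learned_basis = vt[:d].T     # shape (D, d), orthonormal columns

def relative_residual(basis, vector):
    """Fraction of the vector's norm that lies outside span(basis)."""
    projected = basis @ (basis.T @ vector)
    return np.linalg.norm(vector - projected) / np.linalg.norm(vector)

# A new target task drawn from the same family, compared against a random subspace.
target_prompt = sample_task_prompt()
random_basis, _ = np.linalg.qr(rng.standard_normal((D, d)))
learned_res = relative_residual(learned_basis, target_prompt)
random_res = relative_residual(random_basis, target_prompt)
print(f"residual in learned subspace: {learned_res:.3f}")
print(f"residual in random subspace:  {random_res:.3f}")
```

If the assumption holds, the target task's good prompt lies almost entirely inside the subspace estimated from the source tasks, while a randomly chosen subspace of the same size captures very little of it. A derivative-free search restricted to the learned subspace therefore has a reachable optimum, whereas the same search in the random subspace does not.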

The reported experimental results indicate that the BSL framework consistently achieves competitive performance across a variety of downstream tasks and language models, which supports the hypothesis that subspace choice limits the versatility of standard black-box prompt tuning.

Critical Analysis

The paper makes a compelling case for the BSL approach, providing empirical evidence of its benefits over standard black-box prompt tuning. However, there are a few potential areas for further research and consideration:

The authors acknowledge that the choice of source tasks used in meta-learning can impact the effectiveness of the identified subspaces. Further investigation is needed to understand how to optimally select the source tasks to maximize performance on diverse target tasks.
The paper does not explore the scalability of the meta-learning process as the number of source tasks grows. Efficient techniques for subspace learning may be required for practical application to large-scale task collections.
While the experiments demonstrate strong performance, it would be valuable to understand the specific types of tasks and language models where BSL provides the greatest benefits compared to other prompt tuning approaches, such as soft prompt tuning or task-specific prompt engineering.

Overall, the BSL framework presents a promising direction for enhancing the versatility of prompt-based language model tuning. Continued research on effective subspace learning and the broader applicability of this approach could further advance the field of efficient and adaptable prompt-based model optimization.

Conclusion

The paper introduces a new Black-box prompt tuning with Subspace Learning (BSL) method to improve the versatility of black-box prompt tuning for large language models. By identifying common subspaces for similar tasks through meta-learning, BSL can optimize prompts more effectively for target tasks that share characteristics with the source tasks used in the subspace learning process.

The experimental results indicate that the BSL framework consistently achieves competitive performance across a variety of downstream tasks and language models, addressing the versatility problem of standard black-box prompt tuning approaches. While there are opportunities for further research, particularly around scaling the meta-learning process and understanding the specific task domains where BSL excels, this work is a useful step toward more adaptable and effective prompt-based language model optimization.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Black-box Prompt Tuning with Subspace Learning

Yuanhang Zheng, Zhixing Tan, Peng Li, Yang Liu

Black-box prompt tuning employs derivative-free optimization algorithms to learn prompts within low-dimensional subspaces rather than back-propagating through the network of Large Language Models (LLMs). Recent studies reveal that black-box prompt tuning lacks versatility across tasks and LLMs, which we believe is related to the suboptimal choice of subspaces. In this paper, we introduce Black-box prompt tuning with Subspace Learning (BSL) to enhance the versatility of black-box prompt tuning. Based on the assumption that nearly optimal prompts for similar tasks reside in a common subspace, we propose identifying such subspaces through meta-learning on a collection of similar source tasks. Consequently, for a target task that shares similarities with the source tasks, we expect that optimizing within the identified subspace can yield a prompt that performs well on the target task. Experimental results confirm that our BSL framework consistently achieves competitive performance across various downstream tasks and LLMs.

6/18/2024

Make Prompt-based Black-Box Tuning Colorful: Boosting Model Generalization from Three Orthogonal Perspectives

Qiushi Sun, Chengcheng Han, Nuo Chen, Renyu Zhu, Jingyang Gong, Xiang Li, Ming Gao

Large language models (LLMs) have shown increasing power on various natural language processing (NLP) tasks. However, tuning these models for downstream tasks usually needs exorbitant costs or is unavailable due to commercial considerations. Recently, black-box tuning has been proposed to address this problem by optimizing task-specific prompts without accessing the gradients and hidden representations. However, most existing works have yet to fully exploit the potential of gradient-free optimization under the scenario of few-shot learning. In this paper, we describe BBT-RGB, a suite of straightforward and complementary techniques for enhancing the efficiency and performance of black-box optimization. Specifically, our method includes three plug-and-play components: (1) Two-stage derivative-free optimization strategy that facilitates fast convergence and mitigates overfitting; (2) Automatic verbalizer construction with its novel usage under few-shot settings; (3) Better prompt initialization policy based on instruction search and auto-selected demonstration. Extensive experiments across various tasks on natural language understanding and inference demonstrate the effectiveness of our method. Our codes are publicly available at https://github.com/QiushiSun/BBT-RGB.

5/7/2024

Efficient Prompt Tuning by Multi-Space Projection and Prompt Fusion

Pengxiang Lan, Enneng Yang, Yuting Liu, Guibing Guo, Linying Jiang, Jianzhe Zhao, Xingwei Wang

Prompt tuning is a promising method to fine-tune a pre-trained language model without retraining its large-scale parameters. Instead, it attaches a soft prompt to the input text, whereby downstream tasks can be well adapted by merely learning the embeddings of prompt tokens. Nevertheless, existing methods still suffer from two challenges: (i) they struggle to balance accuracy and efficiency. A longer (shorter) soft prompt generally leads to better (worse) accuracy but at the cost of more (less) training time. (ii) The performance may not be consistent when adapting to different downstream tasks. We attribute it to the same embedding space being responsible for the different requirements of downstream tasks. To address these issues, we propose an Efficient Prompt Tuning method (EPT) by multi-space projection and prompt fusion. Specifically, it decomposes a given soft prompt into a shorter prompt and two low-rank matrices, significantly reducing the training time. Accuracy is also enhanced by leveraging low-rank matrices and the short prompt as additional knowledge sources to enrich the semantics of the original short prompt. In addition, we project the soft prompt into multiple subspaces to improve the performance consistency, and then adaptively learn the combination weights of different spaces through a gating network. Experiments on 13 natural language processing downstream tasks show that our method significantly and consistently outperforms 11 comparison methods with the relative percentage of improvements up to 12.9%, and training time decreased by 14%.

7/2/2024

Visual Prompt Tuning in Null Space for Continual Learning

Yue Lu, Shizhou Zhang, De Cheng, Yinghui Xing, Nannan Wang, Peng Wang, Yanning Zhang

Existing prompt-tuning methods have demonstrated impressive performances in continual learning (CL), by selecting and updating relevant prompts in the vision-transformer models. On the contrary, this paper aims to learn each task by tuning the prompts in the direction orthogonal to the subspace spanned by previous tasks' features, so as to ensure no interference on tasks that have been learned to overcome catastrophic forgetting in CL. However, different from the orthogonal projection in the traditional CNN architecture, the prompt gradient orthogonal projection in the ViT architecture shows completely different and greater challenges, i.e., 1) the high-order and non-linear self-attention operation; 2) the drift of prompt distribution brought by the LayerNorm in the transformer block. Theoretically, we have finally deduced two consistency conditions to achieve the prompt gradient orthogonal projection, which provide a theoretical guarantee of eliminating interference on previously learned knowledge via the self-attention mechanism in visual prompt tuning. In practice, an effective null-space-based approximation solution has been proposed to implement the prompt gradient orthogonal projection. Extensive experimental results demonstrate the effectiveness of anti-forgetting on four class-incremental benchmarks with diverse pre-trained baseline models, and our approach achieves superior performances to state-of-the-art methods. Our code is available at https://github.com/zugexiaodui/VPTinNSforCL.

6/12/2024