Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models

Read original: arXiv:2307.08303 - Published 6/18/2024 by Zhiyuan Peng, Xuyang Wu, Qifan Wang, Yi Fang

Overview

Dense retrieval (DR) converts queries and documents into dense embeddings and measures their similarity in vector space
One challenge in DR is the lack of domain-specific training data
Transfer learning from large public datasets like MS MARCO may not benefit all DR models and domains equally
Researchers have used large language models (LLMs) to improve zero-shot and few-shot DR models, but human-written prompts can lead to low-quality generated queries
The proposed approach, Soft Prompt Tuning for Augmenting DR (SPTAR), uses soft prompt tuning to optimize task-specific prompts and generate high-quality "weak" queries to train DR models

Plain English Explanation

Dense retrieval is a way of finding relevant documents for a given query by converting both the query and the documents into compact numerical representations called embeddings, and then measuring how similar the query embedding is to each document embedding. This allows for efficient information retrieval, as the computations can be done quickly in vector space.
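The embed-then-compare idea described above can be sketched in a few lines of Python. This is only an illustration: the three-dimensional vectors are made-up placeholders standing in for the output of a real encoder, and the function names (`cosine_similarity`, `rank_documents`) are hypothetical, not taken from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: cosine_similarity(query_vec, vec)
              for doc_id, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Placeholder embeddings (assumption): a real encoder would produce
# vectors with hundreds of dimensions.
doc_vecs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.1, 0.8, 0.2],
    "doc_c": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

ranking = rank_documents(query_vec, doc_vecs)
print(ranking)  # ['doc_a', 'doc_b', 'doc_c']
```

In practice the document embeddings are computed once and stored in a vector index, so only the query needs to be encoded at search time.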

One challenge with dense retrieval is that it often requires a lot of training data, specifically examples of queries and the documents that should be retrieved for those queries. While large public datasets like MS MARCO can be used for transfer learning, the benefits may not be equal across all types of content and retrieval tasks.

To address this, some researchers have turned to large language models - powerful AI systems that can generate human-like text. The idea is to use these language models to generate "weak" queries that can be used to train the dense retrieval models, even when there isn't much labeled data available.

However, the prompts used to generate these weak queries are often manually written by humans, which can lead to low-quality results. The new SPTAR approach solves this by using "soft prompt tuning" - automatically optimizing the prompts used to generate the weak queries, based on the limited ground truth data that is available. This helps ensure the generated queries are high-quality and useful for training the dense retrieval models.

The experiments show that SPTAR outperforms the unsupervised baseline BM25 as well as a recently proposed LLM-based method for augmenting dense retrieval.

Technical Explanation

The key idea behind the SPTAR approach is to leverage soft prompt tuning to optimize task-specific prompts that can be used to generate high-quality "weak" queries from large language models (LLMs).
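To make "soft prompt tuning" concrete, here is a deliberately tiny, pure-Python toy. It is not the paper's implementation: the "model" is a frozen linear layer over mean-pooled vectors rather than an LLM, and the data, names, and hyperparameters are all assumptions chosen for illustration. What it does share with real soft prompt tuning is the core mechanic: a learnable vector is prepended to the input embeddings, and gradient descent updates only that vector while the model weights stay fixed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(prompt, token_embs, frozen_w):
    """Frozen toy 'model': mean-pool [prompt; tokens], then a fixed linear layer."""
    rows = [prompt] + token_embs
    dim = len(prompt)
    pooled = [sum(r[i] for r in rows) / len(rows) for i in range(dim)]
    return sigmoid(sum(w * p for w, p in zip(frozen_w, pooled)))

def log_loss(prompt, examples, frozen_w):
    """Average binary cross-entropy over the labeled examples."""
    total = 0.0
    for token_embs, label in examples:
        pred = forward(prompt, token_embs, frozen_w)
        total -= label * math.log(pred) + (1 - label) * math.log(1 - pred)
    return total / len(examples)

def tune_soft_prompt(examples, frozen_w, steps=200, lr=0.5):
    """Gradient descent on the prompt vector only; frozen_w is never modified."""
    prompt = [0.0] * len(frozen_w)
    for _ in range(steps):
        for token_embs, label in examples:
            pred = forward(prompt, token_embs, frozen_w)
            # d(log loss)/d(prompt_i) = (pred - label) * w_i / (num_tokens + 1)
            scale = (pred - label) / (len(token_embs) + 1)
            prompt = [p - lr * scale * w for p, w in zip(prompt, frozen_w)]
    return prompt

# Made-up "limited ground truth": (token embeddings, label) pairs.
examples = [([[3.0, 0.0]], 1), ([[1.0, 0.0]], 0)]
frozen_w = [1.0, -1.0]

initial_loss = log_loss([0.0, 0.0], examples, frozen_w)
tuned_prompt = tune_soft_prompt(examples, frozen_w)
tuned_loss = log_loss(tuned_prompt, examples, frozen_w)
print(initial_loss, tuned_loss)
```

The loss drops even though `frozen_w` never changes, which is the appeal of the technique: only a small number of prompt parameters are trained per task.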

For each target task or domain, the researchers first tune a soft prompt on the limited ground truth data available. This soft prompt is then used to prompt the LLM to generate weak queries for unlabeled documents. This yields a larger set of document-query pairs that can be used to train a dense retrieval model for the target task.
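The augmentation step described above might be organized along these lines. This is a sketch under stated assumptions: `stub_generate_query` is a hypothetical stand-in for an LLM conditioned on the tuned soft prompt, and the documents and the single labeled pair are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

def augment_with_weak_queries(
    unlabeled_docs: Dict[str, str],
    generate_query: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Tag each unlabeled document with one weak query.

    Returns (doc_id, weak_query) pairs.
    """
    return [(doc_id, generate_query(text))
            for doc_id, text in unlabeled_docs.items()]

def stub_generate_query(text: str) -> str:
    """Hypothetical stand-in for an LLM prompted with a tuned soft prompt."""
    first_words = " ".join(text.split()[:2])
    return f"information about {first_words.lower()}"

# Invented data: one ground-truth pair plus two unlabeled documents.
labeled_pairs = [("d1", "how do vaccines work")]
unlabeled_docs = {
    "d2": "Dense retrieval maps text to vectors.",
    "d3": "BM25 ranks documents by term overlap.",
}

weak_pairs = augment_with_weak_queries(unlabeled_docs, stub_generate_query)
training_pairs = labeled_pairs + weak_pairs
print(training_pairs)
```

The combined `training_pairs` list is what a dense retriever would then be trained on; swapping the stub for a real generator leaves the rest of the flow unchanged.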

To further improve the quality of the generated weak queries, the researchers design a filter that selects high-quality example document-query pairs to include in the prompt. Better in-prompt examples, in turn, should yield better weak queries and cleaner training data for the dense retriever.
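A filter of this kind can be sketched as a top-k selection over candidate document-query pairs. The scoring function below is an assumption: it reads per-pair losses from a table of placeholder numbers, whereas a real system would have to compute some quality measure itself (for example, a loss on held-out data). The summary does not specify how the paper scores pairs, so treat this as one plausible shape rather than the authors' method.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (document, query)

def select_example_pairs(
    candidates: List[Pair],
    eval_loss: Callable[[Pair], float],
    k: int,
) -> List[Pair]:
    """Keep the k candidate pairs with the lowest evaluation loss."""
    return sorted(candidates, key=eval_loss)[:k]

# Invented candidates and placeholder losses (lower is better).
candidates = [
    ("doc about solar panels", "how do solar panels work"),
    ("doc about solar panels", "panels"),
    ("doc about tax filing", "when are taxes due"),
]
placeholder_losses = {
    candidates[0]: 0.42,
    candidates[1]: 1.30,
    candidates[2]: 0.57,
}

selected = select_example_pairs(candidates, placeholder_losses.get, k=2)
print(selected)
```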

The experiments compare SPTAR to unsupervised baselines like BM25 as well as a recently proposed LLM-based data augmentation method. The results show that SPTAR outperforms these alternatives, demonstrating the benefits of the soft prompt tuning approach for generating high-quality weak queries to train task-specific dense retrievers.
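For readers unfamiliar with the BM25 baseline, the sketch below shows the standard Okapi BM25 scoring formula in plain Python. It needs no training data, which is why it counts as unsupervised. The whitespace tokenizer, the toy documents, and the parameter values `k1=1.5` and `b=0.75` are assumptions for illustration, and the IDF uses the common non-negative variant; none of this is taken from the paper's experimental setup.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            n_t = doc_freq[term]
            idf = math.log((n_docs - n_t + 0.5) / (n_t + 0.5) + 1)
            # Term frequency saturates via k1; b controls length normalization.
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

docs = [
    "soft prompt tuning for language models",
    "dense retrieval with embeddings",
    "cooking pasta at home",
]
scores = bm25_scores("prompt tuning", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(docs[best])  # soft prompt tuning for language models
```

Because BM25 matches on exact terms, it scores the second and third documents at zero here, whereas a dense retriever can in principle match paraphrases that share no words with the query.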

Critical Analysis

The SPTAR approach addresses an important challenge in dense retrieval: the need for large amounts of domain-specific training data. By combining soft prompt tuning with large language models, the researchers generate "weak" queries that supplement the limited ground truth data while aiming to keep query quality high.

One potential limitation is that the success of the approach may depend on the quality and coverage of the underlying language model. If the LLM has biases or gaps in its knowledge, this could be reflected in the generated weak queries. The researchers do describe a filter that selects high-quality example pairs for the prompt, which may mitigate this, but further investigation into the robustness of the approach across different LLMs and domains would be valuable.

Additionally, the paper does not provide a deep analysis of the characteristics of the generated weak queries, such as their syntactic complexity, topical relevance, or similarity to human-written queries. A more detailed examination of these properties could shed light on how the soft prompt tuning process shapes the query generation.

Overall, the SPTAR approach represents an innovative use of prompt tuning techniques to address a key challenge in dense retrieval. The promising results suggest this is a fruitful direction for further research and development in this area.

Conclusion

The SPTAR approach tackles the problem of limited training data for dense retrieval models by leveraging soft prompt tuning to generate high-quality "weak" queries from large language models. This allows for effective data augmentation, even when ground truth data is scarce.

The experiments demonstrate that SPTAR outperforms the unsupervised BM25 baseline as well as a recently proposed LLM-based data augmentation method for dense retrieval. This highlights the potential of prompt tuning techniques to improve the performance of retrieval systems in low-resource settings.

As the use of large language models continues to grow in AI applications, the SPTAR approach provides a template for how these powerful models can be effectively integrated into data-driven tasks like dense retrieval. Further research on the robustness and generalization of this technique could lead to significant advancements in information access and knowledge discovery.

Related Papers

Soft Prompt Tuning for Augmenting Dense Retrieval with Large Language Models

Zhiyuan Peng, Xuyang Wu, Qifan Wang, Yi Fang

Dense retrieval (DR) converts queries and documents into dense embeddings and measures the similarity between queries and documents in vector space. One of the challenges in DR is the lack of domain-specific training data. While DR models can learn from large-scale public datasets like MS MARCO through transfer learning, evidence shows that not all DR models and domains can benefit from transfer learning equally. Recently, some researchers have resorted to large language models (LLMs) to improve the zero-shot and few-shot DR models. However, the hard prompts or human-written prompts utilized in these works cannot guarantee the good quality of generated weak queries. To tackle this, we propose soft prompt tuning for augmenting DR (SPTAR): For each task, we leverage soft prompt-tuning to optimize a task-specific soft prompt on limited ground truth data and then prompt the LLMs to tag unlabeled documents with weak queries, yielding enough weak document-query pairs to train task-specific dense retrievers. We design a filter to select high-quality example document-query pairs in the prompt to further improve the quality of weak tagged queries. To the best of our knowledge, there is no prior work utilizing soft prompt tuning to augment DR models. The experiments demonstrate that SPTAR outperforms the unsupervised baselines BM25 and the recently proposed LLMs-based augmentation method for DR.

6/18/2024

Selective Prompting Tuning for Personalized Conversations with LLMs

Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang

In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to yield responses that are similar to the ground truths in datasets, while direct fine-tuning tends to produce repetitive or overly generic replies. To alleviate those issues, we propose Selective Prompt Tuning (SPT), which softly prompts LLMs for personalized conversations in a selective way. Concretely, SPT initializes a set of soft prompts and uses a trainable dense retriever to adaptively select suitable soft prompts for LLMs according to different input contexts, where the prompt retriever is dynamically updated through feedback from the LLMs. Additionally, we propose context-prompt contrastive learning and prompt fusion learning to encourage the SPT to enhance the diversity of personalized conversations. Experiments on the CONVAI2 dataset demonstrate that SPT significantly enhances response diversity by up to 90%, along with improvements in other critical performance indicators. Those results highlight the efficacy of SPT in fostering engaging and personalized dialogue generation. The SPT model code (https://github.com/hqsiswiliam/SPT) is publicly available for further exploration.

6/27/2024

APrompt4EM: Augmented Prompt Tuning for Generalized Entity Matching

Yikuan Xia, Jiazun Chen, Xinchi Li, Jun Gao

Generalized Entity Matching (GEM), which aims at judging whether two records represented in different formats refer to the same real-world entity, is an essential task in data management. The prompt tuning paradigm for pre-trained language models (PLMs), including the recent PromptEM model, effectively addresses the challenges of low-resource GEM in practical applications, offering a robust solution when labeled data is scarce. However, existing prompt tuning models for GEM face the challenges of prompt design and information gap. This paper introduces an augmented prompt tuning framework for the challenges, which consists of two main improvements. The first is an augmented contextualized soft token-based prompt tuning method that extracts a guiding soft token benefit for the PLMs' prompt tuning, and the second is a cost-effective information augmentation strategy leveraging large language models (LLMs). Our approach performs well on the low-resource GEM challenges. Extensive experiments show promising advancements of our basic model without information augmentation over existing methods based on moderate-size PLMs (average 5.24%+), and our model with information augmentation achieves comparable performance compared with fine-tuned LLMs, using less than 14% of the API fee.

5/9/2024

Passage-specific Prompt Tuning for Passage Reranking in Question Answering with Large Language Models

Xuyang Wu, Zhiyuan Peng, Krishna Sravanthi Rajanala Sai, Hsin-Tai Wu, Yi Fang

Effective passage retrieval and reranking methods have been widely utilized to identify suitable candidates in open-domain question answering tasks. Recent studies have resorted to LLMs for reranking the retrieved passages by the log-likelihood of the question conditioned on each passage. Although these methods have demonstrated promising results, the performance is notably sensitive to the human-written prompt (or hard prompt), and fine-tuning LLMs can be computationally intensive and time-consuming. Furthermore, this approach limits the leverage of question-passage relevance pairs and passage-specific knowledge to enhance the ranking capabilities of LLMs. In this paper, we propose passage-specific prompt tuning for reranking in open-domain question answering (PSPT): a parameter-efficient method that fine-tunes learnable passage-specific soft prompts, incorporating passage-specific knowledge from a limited set of question-passage relevance pairs. The method involves ranking retrieved passages based on the log-likelihood of the model generating the question conditioned on each passage and the learned soft prompt. We conducted extensive experiments utilizing the Llama-2-chat-7B model across three publicly available open-domain question answering datasets and the results demonstrate the effectiveness of the proposed approach.

6/24/2024