Learning to Retrieve Iteratively for In-Context Learning

2406.14739

Published 6/24/2024 by Yunmo Chen, Tongfei Chen, Harsh Jhamtani, Patrick Xia, Richard Shin, Jason Eisner, Benjamin Van Durme

cs.CL


Abstract

We introduce iterative retrieval, a novel framework that empowers retrievers to make iterative decisions through policy optimization. Finding an optimal portfolio of retrieved items is a combinatorial optimization problem, generally considered NP-hard. This approach provides a learned approximation to such a solution, meeting specific task requirements under a given family of large language models (LLMs). We propose a training procedure based on reinforcement learning, incorporating feedback from LLMs. We instantiate an iterative retriever for composing in-context learning (ICL) exemplars and apply it to various semantic parsing tasks that demand synthesized programs as outputs. By adding only 4M additional parameters for state encoding, we convert an off-the-shelf dense retriever into a stateful iterative retriever, outperforming previous methods in selecting ICL exemplars on semantic parsing datasets such as CalFlow, TreeDST, and MTOP. Additionally, the trained iterative retriever generalizes across different inference LLMs beyond the one used during training.


Overview

Introduces an iterative retrieval framework for in-context learning (ICL), in which a retriever learns through policy optimization to select exemplars one step at a time rather than in a single pass.
Each selection updates the retriever's state, so later choices can take earlier ones into account.
Experiments show the iterative retriever outperforms previous exemplar-selection methods on the semantic parsing datasets CalFlow, TreeDST, and MTOP.

Plain English Explanation

The paper describes a new way for machine learning models to find and use helpful examples when solving a problem. In many tasks, a large language model performs better when its prompt includes a few well-chosen worked examples. This paper introduces an "iterative retriever": a model that repeatedly searches a pool of candidate examples to find the ones most useful for the task at hand.

The key idea is that the model doesn't just retrieve information once, but goes through multiple rounds of searching and refining its understanding of the context. This allows the model to progressively home in on the most useful information, rather than relying on a single retrieval attempt.

The paper shows through experiments that this iterative retrieval approach outperforms previous methods for selecting in-context examples on semantic parsing datasets such as CalFlow, TreeDST, and MTOP, where the model must produce a program as its output. The iterative nature allows the model to build a more complete and accurate understanding of the relevant background information.

Technical Explanation

The paper introduces an "iterative retriever" model for in-context learning tasks. The core idea is to have the model repeatedly retrieve relevant information from a knowledge base to aid in solving a downstream task, rather than relying on a single retrieval attempt.

The iterative retriever consists of three key components:

Context Encoder: Encodes the task input and any provided context into a latent representation.
Retriever: Retrieves the most relevant information from the knowledge base based on the current context encoding.
Integrator: Combines the retrieved information with the current context encoding to update the model's understanding.

This process is repeated for multiple iterations, allowing the model to progressively refine its retrieval and context understanding. Experiments on semantic parsing datasets such as CalFlow, TreeDST, and MTOP demonstrate the effectiveness of this iterative retrieval approach compared to previous methods for selecting ICL exemplars. According to the abstract, the stateful retriever is built from an off-the-shelf dense retriever by adding only 4M parameters for state encoding.
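The encode, retrieve, integrate loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the two-dimensional vectors, the exemplar names, and the hand-written `integrate` rule (which subtracts part of the retrieved vector from the state) are all hypothetical stand-ins for learned components.

```python
import math

# Hypothetical exemplar pool: name -> embedding (toy 2-D vectors, assumed).
POOL = {
    "book_flight": [1.0, 0.0],
    "book_flight_v2": [0.99, 0.14],
    "add_hotel": [0.0, 1.0],
}
QUERY = [0.8, 0.6]  # assumed encoding of "book a flight and add a hotel"

def normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def score(a, b):
    """Dot product, used as the relevance score."""
    return sum(x * y for x, y in zip(a, b))

def one_shot_top_k(query_vec, pool, k):
    """Baseline: rank every exemplar once against the query and keep the top k."""
    q = normalize(query_vec)
    ranked = sorted(pool, key=lambda n: score(q, normalize(pool[n])), reverse=True)
    return ranked[:k]

def integrate(state, retrieved_vec, mix=0.5):
    """Assumed update rule: discount what the retrieved exemplar already covers."""
    return normalize([s - mix * r for s, r in zip(state, retrieved_vec)])

def iterative_retrieve(query_vec, pool, steps, mix=0.5):
    """Pick one exemplar per step, updating the state after each pick."""
    state = normalize(query_vec)                      # context encoder (stand-in)
    remaining = {n: normalize(v) for n, v in pool.items()}
    chosen = []
    for _ in range(min(steps, len(remaining))):
        best = max(remaining, key=lambda n: score(state, remaining[n]))  # retriever
        chosen.append(best)
        state = integrate(state, remaining.pop(best), mix)               # integrator
    return chosen

print(one_shot_top_k(QUERY, POOL, 2))      # ['book_flight_v2', 'book_flight']
print(iterative_retrieve(QUERY, POOL, 2))  # ['book_flight_v2', 'add_hotel']
```

In this toy setup the one-shot ranking returns two near-duplicate exemplars, while the stateful loop picks a second exemplar that covers the part of the query the first one missed. A learned state update would replace the fixed subtraction rule.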

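The retriever is trained with reinforcement learning that incorporates feedback from LLMs, and the abstract reports that the trained retriever generalizes to inference LLMs other than the one used during training. Related directions include demonstration retrieval for relation extraction (Recall, Retrieve and Reason) and fine-tuning the model on specific tasks (Iterative Forward Tuning).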
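To make the idea of optimizing a retrieval policy with LLM feedback concrete, here is a small policy-gradient sketch. It is not the paper's training procedure, which this summary does not specify. The assumptions are: a tabular softmax policy with one logit per exemplar, a hypothetical `llm_reward` function standing in for real LLM feedback, and an exact expected gradient computed by enumerating every exemplar sequence (a practical system would sample rollouts instead).

```python
import math
from itertools import permutations

CANDIDATES = ["ex_a", "ex_b", "ex_c"]  # hypothetical exemplar ids

def llm_reward(sequence):
    """Stand-in for LLM feedback: 1.0 only if both helpful exemplars are in the prompt."""
    return 1.0 if {"ex_a", "ex_b"} <= set(sequence) else 0.0

def sequence_prob_and_grad(theta, sequence):
    """Probability of choosing `sequence` without replacement, and d log p / d theta."""
    remaining = list(theta)
    prob = 1.0
    grad = {name: 0.0 for name in theta}
    for chosen in sequence:
        z = sum(math.exp(theta[n]) for n in remaining)
        probs = {n: math.exp(theta[n]) / z for n in remaining}
        prob *= probs[chosen]
        for n in remaining:
            # Softmax log-likelihood gradient: indicator minus probability.
            grad[n] += (1.0 if n == chosen else 0.0) - probs[n]
        remaining.remove(chosen)
    return prob, grad

def expected_reward_and_grad(theta, k):
    """Exact expected reward and its policy gradient over all length-k sequences."""
    total = 0.0
    grad = {name: 0.0 for name in theta}
    for seq in permutations(theta, k):
        p, g = sequence_prob_and_grad(theta, seq)
        r = llm_reward(seq)
        total += p * r
        for n in theta:
            grad[n] += p * r * g[n]
    return total, grad

def train(steps=50, lr=1.0, k=2):
    """Gradient ascent on expected reward, starting from a uniform policy."""
    theta = {name: 0.0 for name in CANDIDATES}
    history = []
    for _ in range(steps):
        expected, grad = expected_reward_and_grad(theta, k)
        history.append(expected)
        for n in theta:
            theta[n] += lr * grad[n]
    return theta, history

theta, history = train()
print(history[0], history[-1])  # expected reward before and after training
```

The uniform starting policy selects both helpful exemplars one third of the time. Each update raises the logits of `ex_a` and `ex_b` and lowers that of `ex_c`, so the expected reward rises toward 1.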

Critical Analysis

The paper presents a compelling approach to in-context learning, addressing the limitations of standard retrieval-based methods. The iterative nature of the retriever allows the model to gradually build a more comprehensive understanding of the relevant background information, which is a key strength.

However, the paper does not extensively explore the limitations or failure modes of the iterative retriever. For example, it would be interesting to understand how the model performs in situations where the knowledge base is incomplete or noisy, or when the task requires reasoning beyond simple information retrieval.

Additionally, although the retriever is trained with reinforcement learning, this summary does not cover the details of that training procedure, such as how LLM feedback is turned into a reward. A closer look at those details would provide insights into the approach's scalability and generalizability.

Overall, the paper presents a promising direction for in-context learning, but additional research is needed to fully understand the capabilities and limitations of the iterative retriever model.

Conclusion

This paper introduces an iterative retriever model for in-context learning tasks, where a model repeatedly retrieves relevant information from a knowledge base to aid in solving a downstream problem. The key innovation is the iterative nature of the retrieval process, which allows the model to progressively refine its understanding of the context and retrieve more useful information over multiple steps.

Experiments show that this iterative retrieval approach outperforms previous exemplar-selection methods on semantic parsing datasets such as CalFlow, TreeDST, and MTOP. The ability to build a more comprehensive understanding of the relevant context is a significant advantage of this approach.

While the paper presents a compelling solution, further research is needed to explore the limitations and potential extensions of the iterative retriever model. Investigating its performance in more challenging scenarios, and on tasks beyond semantic parsing, could lead to further improvements in in-context learning capabilities.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


RetICL: Sequential Retrieval of In-Context Examples with Reinforcement Learning

Alexander Scarlatos, Andrew Lan

Recent developments in large pre-trained language models have enabled unprecedented performance on a variety of downstream tasks. Achieving best performance with these models often leverages in-context learning, where a model performs a (possibly new) task given one or more examples. However, recent work has shown that the choice of examples can have a large impact on task performance and that finding an optimal set of examples is non-trivial. While there are many existing methods for selecting in-context examples, they generally score examples independently, ignoring the dependency between them and the order in which they are provided to the model. In this work, we propose Retrieval for In-Context Learning (RetICL), a learnable method for modeling and optimally selecting examples sequentially for in-context learning. We frame the problem of sequential example selection as a Markov decision process and train an example retriever using reinforcement learning. We evaluate RetICL on math word problem solving and scientific question answering tasks and show that it consistently outperforms or matches heuristic and learnable baselines. We also use case studies to show that RetICL implicitly learns representations of problem solving strategies.

4/17/2024

cs.CL cs.AI cs.LG

Evaluating the Adversarial Robustness of Retrieval-Based In-Context Learning for Large Language Models

Simon Chi Lok Yu, Jie He, Pasquale Minervini, Jeff Z. Pan

With the emergence of large language models, such as LLaMA and OpenAI GPT-3, In-Context Learning (ICL) gained significant attention due to its effectiveness and efficiency. However, ICL is very sensitive to the choice, order, and verbaliser used to encode the demonstrations in the prompt. Retrieval-Augmented ICL methods try to address this problem by leveraging retrievers to extract semantically related examples as demonstrations. While this approach yields more accurate results, its robustness against various types of adversarial attacks, including perturbations on test samples, demonstrations, and retrieved data, remains under-explored. Our study reveals that retrieval-augmented models can enhance robustness against test sample attacks, outperforming vanilla ICL with a 4.87% reduction in Attack Success Rate (ASR); however, they exhibit overconfidence in the demonstrations, leading to a 2% increase in ASR for demonstration attacks. Adversarial training can help improve the robustness of ICL methods to adversarial attacks; however, such a training scheme can be too costly in the context of LLMs. As an alternative, we introduce an effective training-free adversarial defence method, DARD, which enriches the example pool with those attacked samples. We show that DARD yields improvements in performance and robustness, achieving a 15% reduction in ASR over the baselines. Code and data are released to encourage further research: https://github.com/simonucl/adv-retreival-icl

5/28/2024

cs.CL cs.AI

In-Context Learning or: How I learned to stop worrying and love Applied Information Retrieval

Andrew Parry, Debasis Ganguly, Manish Chandra

With the increasing ability of large language models (LLMs), in-context learning (ICL) has evolved as a new paradigm for natural language processing (NLP), where instead of fine-tuning the parameters of an LLM specific to a downstream task with labeled examples, a small number of such examples is appended to a prompt instruction for controlling the decoder's generation process. ICL, thus, is conceptually similar to a non-parametric approach, such as $k$-NN, where the prediction for each instance essentially depends on the local topology, i.e., on a localised set of similar instances and their labels (called few-shot examples). This suggests that a test instance in ICL is analogous to a query in IR, and similar examples in ICL retrieved from a training set relate to a set of documents retrieved from a collection in IR. While standard unsupervised ranking models can be used to retrieve these few-shot examples from a training set, the effectiveness of the examples can potentially be improved by re-defining the notion of relevance specific to its utility for the downstream task, i.e., considering an example to be relevant if including it in the prompt instruction leads to a correct prediction. With this task-specific notion of relevance, it is possible to train a supervised ranking model (e.g., a bi-encoder or cross-encoder), which potentially learns to optimally select the few-shot examples. We believe that the recent advances in neural rankers can potentially find a use case for this task of optimally choosing examples for more effective downstream ICL predictions.

5/3/2024

cs.IR

Recall, Retrieve and Reason: Towards Better In-Context Relation Extraction

Guozheng Li, Peng Wang, Wenjun Ke, Yikai Guo, Ke Ji, Ziyu Shang, Jiajun Liu, Zijie Xu

Relation extraction (RE) aims to identify relations between entities mentioned in texts. Although large language models (LLMs) have demonstrated impressive in-context learning (ICL) abilities in various tasks, they still suffer from poor performance compared to most supervised fine-tuned RE methods. Utilizing ICL for RE with LLMs encounters two challenges: (1) retrieving good demonstrations from training examples, and (2) enabling LLMs to exhibit strong ICL abilities in RE. On the one hand, retrieving good demonstrations is a non-trivial process in RE, which easily results in low relevance regarding entities and relations. On the other hand, ICL with an LLM achieves poor performance in RE while RE is different from language modeling in nature or the LLM is not large enough. In this work, we propose a novel recall-retrieve-reason RE framework that synergizes LLMs with retrieval corpora (training examples) to enable relevant retrieving and reliable in-context reasoning. Specifically, we distill the consistently ontological knowledge from training datasets to let LLMs generate relevant entity pairs grounded by retrieval corpora as valid queries. These entity pairs are then used to retrieve relevant training examples from the retrieval corpora as demonstrations for LLMs to conduct better ICL via instruction tuning. Extensive experiments on different LLMs and RE datasets demonstrate that our method generates relevant and valid entity pairs and boosts ICL abilities of LLMs, achieving competitive or new state-of-the-art performance on sentence-level RE compared to previous supervised fine-tuning methods and ICL-based methods.

4/30/2024

cs.CL cs.AI