Supervised Knowledge Makes Large Language Models Better In-context Learners

2312.15918

Published 4/12/2024 by Linyi Yang, Shuibai Zhang, Zhuohao Yu, Guangsheng Bao, Yidong Wang, Jindong Wang, Ruochen Xu, Wei Ye, Xing Xie, Weizhu Chen and 1 other

cs.CL cs.AI

Supervised Knowledge Makes Large Language Models Better In-context Learners

Abstract

Large Language Models (LLMs) exhibit emerging in-context learning abilities through prompt engineering. The recent progress in large-scale generative models has further expanded their use in real-world language applications. However, the critical challenge of improving the generalizability and factuality of LLMs in natural language understanding and question answering remains under-explored. While previous in-context learning research has focused on enhancing models to adhere to users' specific instructions and quality expectations, and to avoid undesired outputs, little to no work has explored the use of task-Specific fine-tuned Language Models (SLMs) to improve LLMs' in-context learning during the inference stage. Our primary contribution is the establishment of a simple yet effective framework that enhances the reliability of LLMs as it: 1) generalizes out-of-distribution data, 2) elucidates how LLMs benefit from discriminative models, and 3) minimizes hallucinations in generative tasks. Using our proposed plug-in method, enhanced versions of Llama 2 and ChatGPT surpass their original versions regarding generalizability and factuality. We offer a comprehensive suite of resources, including 16 curated datasets, prompts, model checkpoints, and LLM outputs across 9 distinct tasks. The code and data are released at: https://github.com/YangLinyi/Supervised-Knowledge-Makes-Large-Language-Models-Better-In-context-Learners. Our empirical analysis sheds light on the advantages of incorporating discriminative models into LLMs and highlights the potential of our methodology in fostering more reliable LLMs.

Get summaries of the top AI research delivered straight to your inbox:

Overview

This paper explores how incorporating supervised knowledge can improve the in-context learning capabilities of large language models.
The researchers propose a novel method called "supervised in-context learning" that leverages external knowledge to enhance a model's ability to learn and generalize from few examples.
The paper presents empirical results demonstrating the benefits of this approach across various tasks, including few-shot classification, question answering, and text generation.

Plain English Explanation

Large language models, such as GPT-3, have shown remarkable abilities in tasks like text generation and question answering. However, these models can struggle with "in-context learning" - the ability to rapidly learn and apply new information from just a few examples provided in the input context.

The researchers in this paper hypothesized that giving these models access to additional supervised knowledge, beyond just the few examples in the input, could improve their in-context learning abilities. They developed a new method called "supervised in-context learning" that combines the model's existing knowledge with externally provided information related to the task at hand.

For example, if the model is asked to classify news articles as real or fake, the researchers would provide the model with not just a few sample articles, but also additional information about reliable news sources, common fake news tactics, and other relevant knowledge. This supervised knowledge helps the model better understand the task and leverage its existing capabilities to learn from the few examples more effectively.

The paper demonstrates through extensive experiments that this approach leads to significant performance improvements across a variety of tasks, compared to using just the in-context examples alone. The models are able to learn more quickly and generalize better to new situations, thanks to the supplementary supervised information.

This work highlights the potential for combining a language model's impressive but general knowledge with more targeted, task-specific information to create more capable and adaptable AI systems. By equipping these models with the right kind of background knowledge, they can become better "in-context learners" - a crucial capability for building AI that can flexibly apply its skills to new situations.

Technical Explanation

The paper proposes a novel method called "supervised in-context learning" that aims to improve the in-context learning capabilities of large language models. The key idea is to leverage external supervised knowledge, in addition to the few examples provided in the input context, to enhance the model's ability to rapidly learn and generalize.

The researchers start with a baseline in-context learning setup, where the model is given a small number of examples (e.g., a few labeled data points) and asked to perform a task (e.g., classification, question answering) on a new input. They then introduce the supervised in-context learning approach, which augments this setup by providing the model with additional relevant information, such as:

Task-specific knowledge: Factual information, definitions, or other knowledge relevant to the task at hand (e.g., details about reliable news sources for a fake news classification task).
Heuristics and strategies: High-level guidance or rules-of-thumb for solving the task (e.g., common tactics used in fake news articles).
Exemplar outputs: Examples of desired model outputs for the task (e.g., correctly classified news articles).

The paper demonstrates the benefits of this supervised in-context learning approach through experiments on a range of tasks, including few-shot classification, question answering, and text generation. The results show that the models trained with the additional supervised knowledge significantly outperform the baseline in-context learning models, achieving higher accuracy, better generalization, and faster learning.

The authors hypothesize that the supervised information helps the language models better understand the task context and leverage their existing capabilities more effectively. This enables the models to learn more efficiently from the few examples provided, rather than relying solely on their general, pre-trained knowledge.

Critical Analysis

The paper provides a compelling approach to improving the in-context learning abilities of large language models. By incorporating relevant supervised knowledge, the models are able to learn and generalize more effectively from limited examples, which is a crucial capability for many real-world applications.

However, the paper does not fully address the potential limitations and caveats of this approach. For instance, the researchers acknowledge that the benefits of supervised in-context learning may diminish as the number of training examples increases, as the model can eventually learn the task without the additional information.

Additionally, the paper does not explore the potential risks or pitfalls of providing the models with external knowledge. There could be concerns around the reliability, bias, or safety of the supervised information, and the authors do not discuss strategies for ensuring the integrity of the supplementary knowledge.

Further research is needed to understand the broader implications and generalizability of this approach. For example, it would be valuable to investigate how the supervised in-context learning method performs on more complex, real-world tasks, and how it compares to other strategies for enhancing in-context learning, such as meta-learning or uncertainty quantification.

Conclusion

This paper presents a promising approach to improving the in-context learning capabilities of large language models by leveraging supervised knowledge. The results demonstrate that providing models with relevant task-specific information, heuristics, and exemplars can significantly boost their ability to learn and generalize from limited examples.

The implications of this work extend beyond just enhancing the performance of language models on specific tasks. By combining a model's general knowledge with more targeted, supervised information, we can create AI systems that are more adaptable, flexible, and capable of rapidly learning new skills - a crucial step towards building AI that can flexibly apply its capabilities in the real world, like expanding spoken language understanding or generating better code.

As language models continue to advance, techniques like supervised in-context learning will be essential for unlocking their full potential and creating AI that can truly learn and reason in human-like ways.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

New!Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel Tasks

Anwoy Chatterjee, Eshaan Tanwar, Subhabrata Dutta, Tanmoy Chakraborty

Large Language Models (LLMs) have transformed NLP with their remarkable In-context Learning (ICL) capabilities. Automated assistants based on LLMs are gaining popularity; however, adapting them to novel tasks is still challenging. While colossal models excel in zero-shot performance, their computational demands limit widespread use, and smaller language models struggle without context. This paper investigates whether LLMs can generalize from labeled examples of predefined tasks to novel tasks. Drawing inspiration from biological neurons and the mechanistic interpretation of the Transformer architecture, we explore the potential for information sharing across tasks. We design a cross-task prompting setup with three LLMs and show that LLMs achieve significant performance improvements despite no examples from the target task in the context. Cross-task prompting leads to a remarkable performance boost of 107% for LLaMA-2 7B, 18.6% for LLaMA-2 13B, and 3.2% for GPT 3.5 on average over zero-shot prompting, and performs comparable to standard in-context learning. The effectiveness of generating pseudo-labels for in-task examples is demonstrated, and our analyses reveal a strong correlation between the effect of cross-task examples and model activation similarities in source and target input tokens. This paper offers a first-of-its-kind exploration of LLMs' ability to solve novel tasks based on contextual signals from different task examples.

5/20/2024

cs.CL

🌀

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

Aaron Mueller, Albert Webson, Jackson Petty, Tal Linzen

In-context learning (ICL) is now a common method for teaching large language models (LLMs) new tasks: given labeled examples in the input context, the LLM learns to perform the task without weight updates. Do models guided via ICL infer the underlying structure of the task defined by the context, or do they rely on superficial heuristics that only generalize to identically distributed examples? We address this question using transformations tasks and an NLI task that assess sensitivity to syntax - a requirement for robust language understanding. We further investigate whether out-of-distribution generalization can be improved via chain-of-thought prompting, where the model is provided with a sequence of intermediate computation steps that illustrate how the task ought to be performed. In experiments with models from the GPT, PaLM, and Llama 2 families, we find large variance across LMs. The variance is explained more by the composition of the pre-training corpus and supervision methods than by model size; in particular, models pre-trained on code generalize better, and benefit more from chain-of-thought prompting.

4/11/2024

cs.CL

↗️

New!Enhancing Small Medical Learners with Privacy-preserving Contextual Prompting

Xinlu Zhang, Shiyang Li, Xianjun Yang, Chenxin Tian, Yao Qin, Linda Ruth Petzold

Large language models (LLMs) demonstrate remarkable medical expertise, but data privacy concerns impede their direct use in healthcare environments. Although offering improved data privacy protection, domain-specific small language models (SLMs) often underperform LLMs, emphasizing the need for methods that reduce this performance gap while alleviating privacy concerns. In this paper, we present a simple yet effective method that harnesses LLMs' medical proficiency to boost SLM performance in medical tasks under privacy-restricted scenarios. Specifically, we mitigate patient privacy issues by extracting keywords from medical data and prompting the LLM to generate a medical knowledge-intensive context by simulating clinicians' thought processes. This context serves as additional input for SLMs, augmenting their decision-making capabilities. Our method significantly enhances performance in both few-shot and full training settings across three medical knowledge-intensive tasks, achieving up to a 22.57% increase in absolute accuracy compared to SLM fine-tuning without context, and sets new state-of-the-art results in two medical tasks within privacy-restricted scenarios. Further out-of-domain testing and experiments in two general domain datasets showcase its generalizability and broad applicability. Our code can be found at https://github.com/XZhang97666/PrivacyBoost-SLM.

5/17/2024

cs.CL cs.AI

💬

Uncertainty Quantification for In-Context Learning of Large Language Models

Chen Ling, Xujiang Zhao, Xuchao Zhang, Wei Cheng, Yanchi Liu, Yiyou Sun, Mika Oishi, Takao Osaki, Katsushi Matsuda, Jie Ji, Guangji Bai, Liang Zhao, Haifeng Chen

In-context learning has emerged as a groundbreaking ability of Large Language Models (LLMs) and revolutionized various fields by providing a few task-relevant demonstrations in the prompt. However, trustworthy issues with LLM's response, such as hallucination, have also been actively discussed. Existing works have been devoted to quantifying the uncertainty in LLM's response, but they often overlook the complex nature of LLMs and the uniqueness of in-context learning. In this work, we delve into the predictive uncertainty of LLMs associated with in-context learning, highlighting that such uncertainties may stem from both the provided demonstrations (aleatoric uncertainty) and ambiguities tied to the model's configurations (epistemic uncertainty). We propose a novel formulation and corresponding estimation method to quantify both types of uncertainties. The proposed method offers an unsupervised way to understand the prediction of in-context learning in a plug-and-play fashion. Extensive experiments are conducted to demonstrate the effectiveness of the decomposition. The code and data are available at: https://github.com/lingchen0331/UQ_ICL.

4/1/2024

cs.CL cs.LG