Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Read original: arXiv:2406.00888 - Published 6/4/2024 by Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, Diyi Yang

Overview

This paper presents a method for aligning a language model to a specific user or task using a very small number of demonstrations (fewer than 10) as feedback, rather than the large datasets that supervised fine-tuning or RLHF require.
The key idea is to treat a user's demonstrations as preferred over the model's own outputs, which yields comparison data cheaply.
The method is called Demonstration ITerated Task Optimization (DITTO); "Show, Don't Tell" is the title of the paper, not the name of the method.

Plain English Explanation

The paper describes a new way to train language models, which are AI systems that can understand and generate human language. Typically, these models are trained on large datasets of text, like books and websites. This allows them to learn the patterns and structures of language.

However, the authors argue that this kind of training leaves models sounding like the collective voice of many people and like no one in particular. Steering a model away from that generic output through supervised fine-tuning or RLHF normally requires a large dataset, which is impractical for a new ad-hoc task. Instead, the researchers propose using a handful of "demonstrations": actual examples of the desired output, in domains such as emails, news articles, and blog posts.

By learning from these demonstrations, the model can pick up a user's fine-grained style and task preferences directly. The aim of this "Show, Don't Tell" idea is a model that matches what a specific user actually wants, rather than one that produces generic text.

The goal is to customize a language model to an individual user or setting cheaply, using the demonstrations themselves as the feedback signal.

Technical Explanation

The key innovation in this paper is the use of demonstrations as feedback for aligning a language model, rather than relying on large supervised fine-tuning or RLHF datasets. According to the paper's abstract, the method, Demonstration ITerated Task Optimization (DITTO), rests on three ideas:

Small Demonstration Set: A user supplies a very small number (fewer than 10) of demonstrations of the desired behavior.
Online Comparison Data: DITTO treats those demonstrations as preferred over output from the LLM and from its intermediate checkpoints, which generates comparison data cheaply.
Direct Alignment: The model's outputs are aligned directly to the user's demonstrated behavior, using ideas derived from online imitation learning.
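
The paper's abstract describes DITTO as generating comparison data by treating a user's demonstrations as preferred over output from the LLM and its intermediate checkpoints. The following is a minimal sketch of only that pairing step, not the authors' implementation. The names `Comparison`, `build_comparisons`, and `samples_by_checkpoint` are hypothetical, and the model outputs are assumed to have been sampled already.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Comparison:
    """One preference pair for a prompt (hypothetical container)."""
    prompt: str
    preferred: str
    rejected: str


def build_comparisons(prompt, demonstrations, samples_by_checkpoint):
    """Pair every demonstration with every model sample for one prompt.

    samples_by_checkpoint is a list of lists: one inner list of sampled
    outputs per checkpoint (index 0 is assumed to be the initial model).
    Each demonstration is marked as preferred over each sample, following
    the description in the abstract.
    """
    pairs = []
    for samples in samples_by_checkpoint:
        for demo, sample in product(demonstrations, samples):
            pairs.append(Comparison(prompt, demo, sample))
    return pairs


if __name__ == "__main__":
    example = build_comparisons(
        "Write a short email declining a meeting.",
        ["demo email A", "demo email B"],
        [["initial-model draft"], ["checkpoint-1 draft"]],
    )
    for pair in example:
        print(pair.preferred, ">", pair.rejected)
```

With 2 demonstrations and 2 checkpoints of 1 sample each, this produces 4 pairs. How such pairs are then used to update the model is not specified in this summary, so the sketch stops at data construction.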

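The method, DITTO, is evaluated on fine-grained style and task alignment across domains such as news articles, emails, and blog posts, and in a user study in which 16 participants supplied their own demonstrations. Across the benchmarks and the user study, the abstract reports that DITTO's win rates exceed those of few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19 percentage points.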

Critical Analysis

The "Show, Don't Tell" approach presented in this paper is a promising direction for advancing the capabilities of language models. By grounding the models in real-world demonstrations, rather than just textual data, they can potentially develop a more nuanced and contextual understanding of language and tasks.

However, the paper does not fully address some important limitations and challenges of this approach:

Data Collection: Gathering high-quality demonstration data at scale could be a practical hurdle, since collecting and annotating such datasets takes considerable time and effort.
Generalization: It's unclear how well the models trained on specific demonstrations will generalize to novel situations or instructions. The paper notes this as an area for further research.
Alignment with Human Values: As language models become more capable, there are important questions about aligning their outputs with human preferences and ethical considerations. The paper does not address these broader societal implications.

Overall, the "Show, Don't Tell" approach is an intriguing step towards more capable and contextual language models. However, further research is needed to address the practical and ethical challenges of this technology. Demonstration-based learning remains an active area of exploration in the field of AI.

Conclusion

This paper presents a novel approach to training language models by incorporating demonstrated feedback, rather than relying solely on text-based data. The key idea is to use real-world examples of desired behavior to help the models develop a more intuitive, context-aware understanding of language and tasks.

The method described, DITTO, shows promise: the abstract reports win rates an average of 19 percentage points above few-shot prompting, supervised fine-tuning, and other self-play methods. However, open questions remain around data collection, generalization, and broader societal implications, and these will need to be addressed in future research.

Overall, the "Show, Don't Tell" approach represents an important step towards more capable and aligned language models that can better understand and execute complex instructions. As AI systems become increasingly integrated into our lives, techniques like this will be crucial for developing assistants that can seamlessly and safely interact with humans.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Show, Don't Tell: Aligning Language Models with Demonstrated Feedback

Omar Shaikh, Michelle Lam, Joey Hejna, Yijia Shao, Michael Bernstein, Diyi Yang

Language models are aligned to emulate the collective voice of many, resulting in outputs that align with no one in particular. Steering LLMs away from generic output is possible through supervised finetuning or RLHF, but requires prohibitively large datasets for new ad-hoc tasks. We argue that it is instead possible to align an LLM to a specific setting by leveraging a very small number (<10) of demonstrations as feedback. Our method, Demonstration ITerated Task Optimization (DITTO), directly aligns language model outputs to a user's demonstrated behaviors. Derived using ideas from online imitation learning, DITTO cheaply generates online comparison data by treating users' demonstrations as preferred over output from the LLM and its intermediate checkpoints. We evaluate DITTO's ability to learn fine-grained style and task alignment across domains such as news articles, emails, and blog posts. Additionally, we conduct a user study soliciting a range of demonstrations from participants (N=16). Across our benchmarks and user study, we find that win-rates for DITTO outperform few-shot prompting, supervised fine-tuning, and other self-play methods by an average of 19% points. By using demonstrations as feedback directly, DITTO offers a novel method for effective customization of LLMs.

6/4/2024

Policy Improvement using Language Feedback Models

Victor Zhong, Dipendra Misra, Xingdi Yuan, Marc-Alexandre Côté

We introduce Language Feedback Models (LFMs) that identify desirable behaviour - actions that help achieve tasks specified in the instruction - for imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models (LLMs) on visual trajectories verbalized to language descriptions. First, by using LFMs to identify desirable behaviour to imitate, we improve in task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens. Third, LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation. Finally, LFM can be modified to provide human-interpretable feedback without performance loss, allowing human verification of desirable behaviour for imitation learning.

4/22/2024

SeCoKD: Aligning Large Language Models for In-Context Learning with Fewer Shots

Weixing Wang, Haojin Yang, Christoph Meinel

Previous studies have shown that demonstrations can significantly help Large Language Models (LLMs) perform better on the given tasks. However, this so-called In-Context Learning (ICL) ability is very sensitive to the presenting context, and often dozens of demonstrations are needed. In this work, we investigate if we can reduce the shot number while still maintaining a competitive performance. We present SeCoKD, a self-Knowledge Distillation (KD) training framework that aligns the student model with a heavily prompted variation, thereby increasing the utilization of a single demonstration. We experiment with the SeCoKD across three LLMs and six benchmarks focusing mainly on reasoning tasks. Results show that our method outperforms the base model and Supervised Fine-tuning (SFT), especially in zero-shot and one-shot settings by 30% and 10%, respectively. Moreover, SeCoKD brings little negative artifacts when evaluated on new tasks, which is more robust than Supervised Fine-tuning.

6/21/2024

Babysit A Language Model From Scratch: Interactive Language Learning by Trials and Demonstrations

Ziqiao Ma, Zekun Wang, Joyce Chai

Humans are efficient language learners and inherently social creatures. Our language development is largely shaped by our social interactions, for example, the demonstration and feedback from caregivers. Contrary to human language learning, recent advancements in large language models have primarily adopted a non-interactive training paradigm, and refined pre-trained models through feedback afterward. In this work, we aim to examine how corrective feedback from interactions influences neural language acquisition from the ground up through systematically controlled experiments, assessing whether it contributes to learning efficiency in language models. We introduce a trial-and-demonstration (TnD) learning framework that incorporates three components: student trials, teacher demonstrations, and a reward conditioned on language competence at various developmental stages. Our experiments reveal that the TnD approach accelerates word acquisition for student models of equal and smaller numbers of parameters, and we highlight the significance of both trials and demonstrations. We further show that the teacher's choices of words influence students' word-specific learning efficiency, and a practice-makes-perfect effect is evident by a strong correlation between the frequency of words in trials and their respective learning curves. Our findings suggest that interactive language learning, with teacher demonstrations and student trials, can facilitate efficient word learning in language models.

5/24/2024