QCRD: Quality-guided Contrastive Rationale Distillation for Large Language Models

arXiv:2405.13014

Published 5/24/2024 by Wei Wang, Zhaowei Li, Qi Xu, Yiqing Cai, Hang Song, Qi Qi, Ran Zhou, Zhida Huang, Tao Wang, Li Xiao

cs.CL cs.AI

Abstract

Deploying large language models (LLMs) poses challenges in terms of resource limitations and inference efficiency. To address these challenges, recent research has focused on using smaller task-specific language models, which are enhanced by distilling the knowledge rationales generated by LLMs. However, previous works mostly emphasize the effectiveness of positive knowledge, while overlooking the knowledge noise and the exploration of negative knowledge. In this paper, we first propose a general approach called quality-guided contrastive rationale distillation for reasoning capacity learning, considering contrastive learning perspectives. For the learning of positive knowledge, we collect positive rationales through self-consistency to denoise the LLM rationales generated by temperature sampling. For the negative knowledge distillation, we generate negative rationales using temperature sampling for the iteration-before smaller language models themselves. Finally, a contrastive loss is designed to better distill the positive and negative rationales into the smaller language model, where an online-update discriminator is used to judge the qualities of rationales and assign weights for better optimizing the training process. Through extensive experiments on multiple reasoning tasks, we demonstrate that our method consistently outperforms the previous distillation methods and produces higher-quality rationales.

Overview

Deploying large language models (LLMs) is challenging due to resource limitations and inference efficiency.
Recent research has focused on using smaller task-specific language models enhanced by distilling knowledge from LLMs.
Previous works emphasize the effectiveness of positive knowledge but overlook knowledge noise and negative knowledge.

Plain English Explanation

Large language models like GPT-3 are powerful, but they require a lot of computing power and memory to run. This makes it difficult to deploy them in many real-world applications. Researchers have been exploring ways to address this challenge by creating smaller, more efficient models that can still benefit from the knowledge of the larger models.

One approach is to distill the key insights or "rationales" from the LLM and use them to train a smaller, task-specific model. This allows the smaller model to leverage the broader understanding of the LLM while being more lightweight and efficient. Previous distillation methods have focused on capturing the positive knowledge from the LLM, but they've overlooked two important aspects: the noise or inaccuracies in the LLM's rationales, and the potential value of negative knowledge (i.e., what the model should not do).

This paper proposes a quality-guided contrastive rationale distillation method (QCRD) that addresses these limitations. To obtain cleaner positive rationales, it samples several rationales from the LLM with temperature sampling and keeps those that pass a self-consistency check. To obtain negative rationales, it samples from earlier-iteration versions of the smaller model itself, which shows the model what not to do. A contrastive loss then distills both kinds of rationale, and the paper reports that the resulting smaller models outperform previous distillation approaches on multiple reasoning tasks.
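The self-consistency filtering mentioned above can be sketched roughly as a majority vote over sampled answers. This is a minimal illustration, not the authors' implementation: the function name `select_positive_rationales`, the `(rationale, answer)` pair format, and the toy samples are all assumptions made for the example.

```python
from collections import Counter


def select_positive_rationales(samples):
    """Keep the rationales whose final answer matches the majority answer.

    `samples` is a list of (rationale, answer) pairs, assumed to have been
    drawn from the teacher LLM with temperature sampling (hypothetical format).
    Returns (majority_answer, kept_rationales).
    """
    if not samples:
        return None, []
    # Count how often each final answer appears across the samples.
    counts = Counter(answer for _, answer in samples)
    majority_answer, _ = counts.most_common(1)[0]
    # Rationales that lead to a minority answer are treated as noise.
    kept = [rationale for rationale, answer in samples if answer == majority_answer]
    return majority_answer, kept


# Toy samples, invented for illustration only.
samples = [
    ("3 apples plus 4 apples is 7 apples.", "7"),
    ("3 + 4 = 7.", "7"),
    ("3 times 4 is 12.", "12"),
]
answer, positives = select_positive_rationales(samples)
```

Here the two rationales that agree on the answer are kept and the outlier is dropped. A real pipeline would also need to parse the final answer out of each generated rationale, which this sketch assumes has already been done.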

Technical Explanation

The key technical contributions of this paper are:

Positive Rationale Distillation: The authors collect positive rationales from the LLM using a self-consistency approach, which helps to denoise the rationales generated by temperature sampling.
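Negative Rationale Distillation: The authors generate negative rationales by temperature sampling from earlier-iteration versions of the smaller language model itself, rather than from the LLM. These rationales serve as examples of reasoning that the smaller model should move away from.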
Contrastive Loss: A contrastive loss function is designed to effectively distill both the positive and negative rationales into the smaller language model. An online-update discriminator is used to judge the quality of the rationales and assign appropriate weights during training.
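The summary does not give the paper's loss formula, so the following is only one plausible shape for a quality-weighted contrastive objective, written to make the idea concrete. The margin-based form, the function name `contrastive_rationale_loss`, and the specific weighting scheme (`q_pos` and `1 - q_neg`) are assumptions for illustration and should not be read as the authors' definition.

```python
def contrastive_rationale_loss(logp_pos, logp_neg, q_pos, q_neg, margin=1.0):
    """Illustrative quality-weighted contrastive loss for one training example.

    logp_pos, logp_neg: the student's log-likelihood of the positive and
        negative rationale (assumed to be length-normalized).
    q_pos, q_neg: discriminator quality scores in [0, 1] (assumed interface).
    margin: how much more likely the positive should be than the negative.
    """
    # Likelihood term: pull the student toward the positive rationale,
    # trusting it more when the discriminator rates it as high quality.
    likelihood_term = -q_pos * logp_pos
    # Contrastive term: penalize the student when the negative rationale is
    # not at least `margin` less likely than the positive one. The penalty is
    # stronger when the discriminator rates the negative as low quality.
    gap = logp_pos - logp_neg
    contrastive_term = (1.0 - q_neg) * max(0.0, margin - gap)
    return likelihood_term + contrastive_term


# The student already prefers the positive rationale by a wide gap.
easy = contrastive_rationale_loss(logp_pos=-0.5, logp_neg=-3.0, q_pos=1.0, q_neg=0.0)
# The student currently prefers the negative rationale.
hard = contrastive_rationale_loss(logp_pos=-2.0, logp_neg=-1.0, q_pos=0.5, q_neg=0.2)
```

In this sketch the second case yields the larger loss, since the student assigns higher likelihood to the negative rationale than to the positive one. In the paper the discriminator is updated online during training; here its scores are simply passed in as numbers.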

Through extensive experiments on multiple reasoning tasks, the authors demonstrate that their quality-guided contrastive rationale distillation method consistently outperforms previous distillation approaches and produces higher-quality rationales. This suggests that explicitly considering both positive and negative knowledge can be a valuable strategy for enhancing the reasoning capabilities of smaller language models.

Critical Analysis

The paper presents a thoughtful and well-designed approach to addressing the challenges of deploying LLMs in resource-constrained environments. By considering both positive and negative knowledge distillation, the authors have introduced an important new dimension to the problem that was largely overlooked in prior work.

One potential limitation of the approach is its reliance on temperature sampling to generate the negative rationales. The diversity and quality of sampled rationales depend on the chosen temperature, so the method may be sensitive to this hyperparameter and may require additional tuning. It would be interesting to see whether other methods for extracting negative knowledge, such as those explored in "Improving Language Model Reasoning through Self-Motivated Learning", could be integrated into the framework.
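To illustrate why the temperature matters, the sketch below applies a standard temperature-scaled softmax to a toy set of token scores. The logits and the two temperature values are invented for the example and are not taken from the paper.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Toy scores for three candidate tokens (illustrative values).
logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.5)  # low temperature
flat = softmax_with_temperature(logits, 2.0)   # high temperature
```

A low temperature concentrates probability on the top-scoring token, while a high temperature spreads it toward less likely tokens. Sampled rationales therefore become more varied, and potentially noisier, as the temperature rises.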

Additionally, the paper does not provide much insight into the quality or interpretability of the rationales generated by the smaller models. It would be valuable to see more qualitative analysis of the model outputs to better understand the nature of the positive and negative knowledge being distilled.

Overall, this research represents an important step forward in the field of efficient and effective language model deployment. By focusing on both the positive and negative aspects of knowledge distillation, the authors have opened up new avenues for further exploration and improvement.

Conclusion

This paper presents a novel quality-guided contrastive rationale distillation method for training smaller, task-specific language models that can leverage the broad knowledge of larger LLMs. By explicitly considering both positive and negative rationales, the authors have developed an approach that consistently outperforms previous distillation techniques and produces higher-quality model outputs.

This research highlights the importance of going beyond the simple transfer of positive knowledge when distilling language models. By also accounting for negative knowledge and the quality of the rationales being distilled, the authors have demonstrated a more holistic and effective strategy for enhancing the reasoning capabilities of smaller, more efficient models. This work has important implications for the practical deployment of advanced language technologies in a wide range of real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

RDRec: Rationale Distillation for LLM-based Recommendation

Xinfeng Wang, Jin Cui, Yoshimi Suzuki, Fumiyo Fukumoto

Large language model (LLM)-based recommender models that bridge users and items through textual prompts for effective semantic reasoning have gained considerable attention. However, few methods consider the underlying rationales behind interactions, such as user preferences and item attributes, limiting the reasoning capability of LLMs for recommendations. This paper proposes a rationale distillation recommender (RDRec), a compact model designed to learn rationales generated by a larger language model (LM). By leveraging rationales from reviews related to users and items, RDRec remarkably specifies their profiles for recommendations. Experiments show that RDRec achieves state-of-the-art (SOTA) performance in both top-N and sequential recommendations. Our source code is released at https://github.com/WangXFng/RDRec.

6/17/2024

cs.CL cs.IR

Efficient End-to-End Visual Document Understanding with Rationale Distillation

Wang Zhu, Alekh Agarwal, Mandar Joshi, Robin Jia, Jesse Thomason, Kristina Toutanova

Understanding visually situated language requires interpreting complex layouts of textual and visual elements. Pre-processing tools, such as optical character recognition (OCR), can map document image inputs to textual tokens, then large language models (LLMs) can reason over text. However, such methods have high computational and engineering complexity. Can small pretrained image-to-text models accurately understand visual documents through similar recognition and reasoning steps instead? We propose Rationale Distillation (RD), which incorporates the outputs of OCR tools, LLMs, and larger multimodal models as intermediate rationales, and trains a small student model to predict both rationales and answers. On three visual document understanding benchmarks representing infographics, scanned documents, and figures, our Pix2Struct (282M parameters) student model finetuned with RD outperforms the base model by 4-5% absolute accuracy with only 1% higher computational cost.

4/3/2024

cs.CV cs.CL

Tailoring Self-Rationalizers with Multi-Reward Distillation

Sahana Ramnath, Brihi Joshi, Skyler Hallinan, Ximing Lu, Liunian Harold Li, Aaron Chan, Jack Hessel, Yejin Choi, Xiang Ren

Large language models (LMs) are capable of generating free-text rationales to aid question answering. However, prior work 1) suggests that useful self-rationalization is emergent only at significant scales (e.g., 175B parameter GPT-3); and 2) focuses largely on downstream performance, ignoring the semantics of the rationales themselves, e.g., are they faithful, true, and helpful for humans? In this work, we enable small-scale LMs (approx. 200x smaller than GPT-3) to generate rationales that not only improve downstream task performance, but are also more plausible, consistent, and diverse, assessed both by automatic and human evaluation. Our method, MaRio (Multi-rewArd RatIOnalization), is a multi-reward conditioned self-rationalization algorithm that optimizes multiple distinct properties like plausibility, diversity and consistency. Results on five difficult question-answering datasets StrategyQA, QuaRel, OpenBookQA, NumerSense and QASC show that not only does MaRio improve task accuracy, but it also improves the self-rationalization quality of small LMs across the aforementioned axes better than a supervised fine-tuning (SFT) baseline. Extensive human evaluations confirm that MaRio rationales are preferred vs. SFT rationales, as well as qualitative improvements in plausibility and consistency.

5/24/2024

cs.CL

Beyond Imitation: Learning Key Reasoning Steps from Dual Chain-of-Thoughts in Reasoning Distillation

Chengwei Dai, Kun Li, Wei Zhou, Songlin Hu

As Large Language Models (LLMs) scale up and gain powerful Chain-of-Thoughts (CoTs) reasoning abilities, practical resource constraints drive efforts to distill these capabilities into more compact Smaller Language Models (SLMs). We find that CoTs consist mainly of simple reasoning forms, with a small proportion (approximately 4.7%) of key reasoning steps that truly impact conclusions. However, previous distillation methods typically involve supervised fine-tuning student SLMs only on correct CoTs data produced by teacher LLMs, resulting in students struggling to learn the key reasoning steps, instead imitating the teacher's reasoning forms and making errors or omissions on these steps. To address these issues, drawing an analogy to human learning, where analyzing mistakes according to correct solutions often reveals the crucial steps leading to successes or failures, we propose mistakE-Driven key reasonIng step distillaTion (EDIT), a novel method that further aids SLMs learning key reasoning steps rather than mere simple fine-tuning. Firstly, to expose these crucial steps in CoTs, we design specific prompts to generate dual CoTs data with similar reasoning paths but divergent conclusions. Then, we apply the minimum edit distance algorithm on the dual CoTs data to locate these key steps and optimize the likelihood of these steps. Extensive experiments validate the effectiveness of EDIT across both in-domain and out-of-domain benchmark reasoning datasets. Further analysis shows that EDIT can generate high-quality CoTs with more correct key reasoning steps. Notably, we also explore how different mistake patterns affect performance and find that EDIT benefits more from logical errors than from knowledge or mathematical calculation errors in dual CoTs. Code can be found at https://github.com/C-W-D/EDIT.

5/31/2024

cs.CL cs.AI