Verifiable by Design: Aligning Language Models to Quote from Pre-Training Data

2404.03862

Published 4/8/2024 by Jingyu Zhang, Marc Marone, Tianjian Li, Benjamin Van Durme, Daniel Khashabi

💬

Abstract

For humans to trust the fluent generations of large language models (LLMs), they must be able to verify their correctness against trusted, external sources. Recent efforts aim to increase verifiability through citations of retrieved documents or post-hoc provenance. However, such citations are prone to mistakes that further complicate their verifiability. To address these limitations, we tackle the verifiability goal with a different philosophy: we trivialize the verification process by developing models that quote verbatim statements from trusted sources in pre-training data. We propose Quote-Tuning, which demonstrates the feasibility of aligning LLMs to leverage memorized information and quote from pre-training data. Quote-Tuning quantifies quoting against large corpora with efficient membership inference tools, and uses the amount of quotes as an implicit reward signal to construct a synthetic preference dataset for quoting, without any human annotation. Next, the target model is aligned to quote using preference optimization algorithms. Experimental results show that Quote-Tuning significantly increases the percentage of LLM generation quoted verbatim from high-quality pre-training documents by 55% to 130% relative to untuned models while maintaining response quality. Further experiments demonstrate that Quote-Tuning generalizes quoting to out-of-domain data, is applicable in different tasks, and provides additional benefits to truthfulness. Quote-Tuning not only serves as a hassle-free method to increase quoting but also opens up avenues for improving LLM trustworthiness through better verifiability.

Create account to get full access

Overview

Large language models (LLMs) need to be verifiable against trusted external sources for humans to trust them.
Recent efforts have tried to increase verifiability through citations, but these are prone to mistakes.
This paper proposes a different approach called Quote-Tuning to trivialize the verification process.

Plain English Explanation

For large language models to be trusted by humans, it's important that they can be verified against reliable, external sources. Recent attempts have aimed to increase this verifiability by having the models cite the documents they are referencing. However, these citations can sometimes be inaccurate, which makes the verification process more complicated.

This paper suggests a different way to approach the problem of verifiability. Instead of relying on citations, the researchers developed a method called Quote-Tuning that trains the language models to directly quote verbatim statements from the trusted sources used during their initial training. This makes it much easier for humans to verify the accuracy of the information, since they can simply check the quoted text against the original source.

The Quote-Tuning process works by using efficient tools to quantify how much the model is quoting from the training data, and then optimizing the model to increase the amount of verbatim quotes. This is done without any additional human annotation, making it a relatively hassle-free way to improve the model's trustworthiness and verifiability.

Technical Explanation

The key innovation of this paper is the Quote-Tuning approach, which aims to trivialize the verification process for large language models. Rather than relying on potentially error-prone citations, Quote-Tuning trains the models to directly quote verbatim statements from the trusted sources used during pre-training.

The process works as follows: First, efficient membership inference tools are used to quantify how much the model is quoting from the pre-training corpus. This quoting behavior is then used as an implicit reward signal to construct a synthetic preference dataset, without any human annotation. The target model is then aligned to this quoting preference using optimization algorithms.

Experimental results show that Quote-Tuning can significantly increase the percentage of model generations that are verbatim quotes from high-quality sources, by 55% to 130% compared to untuned models. Crucially, this increase in quoting is achieved while maintaining the overall response quality of the model.

Further experiments demonstrate that Quote-Tuning generalizes well to out-of-domain data, is applicable across different tasks, and provides additional benefits in terms of model truthfulness. By making verification trivial through direct quoting, Quote-Tuning opens up new avenues for improving the trustworthiness of large language models.

Critical Analysis

The Quote-Tuning approach presented in this paper is an innovative solution to the challenge of verifiability for large language models. By training the models to directly quote from trusted sources, it avoids the potential pitfalls of citation-based approaches, which can be prone to mistakes.

However, one limitation of the current work is that it does not address the potential issue of models memorizing and regurgitating harmful or biased content that may be present in the pre-training data. While the focus is on increasing verifiability, further research is needed to ensure that the quoted content is not only factually accurate, but also ethical and unbiased.

Additionally, the paper does not explore the potential impact of Quote-Tuning on the model's ability to capture and reflect human beliefs and preferences. It would be interesting to investigate whether the increased quoting behavior has any effect, positive or negative, on the model's psychometric and predictive capabilities.

Overall, the Quote-Tuning approach is a promising step towards improving the trustworthiness and verifiability of large language models. By making the verification process more transparent and accessible, it has the potential to enhance human trust and confidence in these powerful AI systems.

Conclusion

This paper presents a novel approach called Quote-Tuning to address the challenge of verifiability for large language models. By training the models to directly quote verbatim statements from trusted sources, it trivializes the verification process and allows humans to easily check the accuracy of the information provided by the models.

The experimental results demonstrate that Quote-Tuning can significantly increase the amount of quoted content while maintaining overall response quality. Additionally, the method generalizes well to out-of-domain data and provides benefits in terms of model truthfulness.

Overall, Quote-Tuning represents an important step forward in enhancing the trustworthiness and verifiability of large language models, which is crucial for their widespread adoption and use in real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

📉

Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering

Tobias Schimanski, Jingwei Ni, Mathias Kraus, Elliott Ash, Markus Leippold

Advances towards more faithful and traceable answers of Large Language Models (LLMs) are crucial for various research and practical endeavors. One avenue in reaching this goal is basing the answers on reliable sources. However, this Evidence-Based QA has proven to work insufficiently with LLMs in terms of citing the correct sources (source quality) and truthfully representing the information within sources (answer attributability). In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. Specifically, we introduce a data generation pipeline with automated data quality filters, which can synthesize diversified high-quality training and testing data at scale. We further introduce four test sets to benchmark the robustness of fine-tuned specialist models. Extensive evaluation shows that fine-tuning on synthetic data improves performance on both in- and out-of-distribution. Furthermore, we show that data quality, which can be drastically improved by proposed quality filters, matters more than quantity in improving Evidence-Based QA.

6/4/2024

cs.CL cs.LG

💬

Effective Large Language Model Adaptation for Improved Grounding and Citation Generation

Xi Ye, Ruoxi Sun, Sercan O. Arik, Tomas Pfister

Large language models (LLMs) have achieved remarkable advancements in natural language understanding and generation. However, one major issue towards their widespread deployment in the real world is that they can generate hallucinated answers that are not factual. Towards this end, this paper focuses on improving LLMs by grounding their responses in retrieved passages and by providing citations. We propose a new framework, AGREE, Adaptation for GRounding EnhancEment, that improves the grounding from a holistic perspective. Our framework tunes LLMs to selfground the claims in their responses and provide accurate citations to retrieved documents. This tuning on top of the pre-trained LLMs requires well-grounded responses (with citations) for paired queries, for which we introduce a method that can automatically construct such data from unlabeled queries. The selfgrounding capability of tuned LLMs further grants them a test-time adaptation (TTA) capability that can actively retrieve passages to support the claims that have not been grounded, which iteratively improves the responses of LLMs. Across five datasets and two LLMs, our results show that the proposed tuningbased AGREE framework generates superior grounded responses with more accurate citations compared to prompting-based approaches and post-hoc citing-based approaches

4/4/2024

cs.CL

Source-Aware Training Enables Knowledge Attribution in Language Models

Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng

Large language models (LLMs) learn a vast amount of knowledge during pretraining, but they are often oblivious to the source(s) of such knowledge. We investigate the problem of intrinsic source citation, where LLMs are required to cite the pretraining source supporting a generated response. Intrinsic source citation can enhance LLM transparency, interpretability, and verifiability. To give LLMs such ability, we explore source-aware training -- a post pretraining recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning to teach the LLM to cite a supporting pretraining source when prompted. Source-aware training can easily be applied to pretrained LLMs off the shelf, and diverges minimally from existing pretraining/fine-tuning frameworks. Through experiments on carefully curated data, we demonstrate that our training recipe can enable faithful attribution to the pretraining data without a substantial impact on the model's quality compared to standard pretraining. Our results also highlight the importance of data augmentation in achieving attribution. Code and data available here: url{https://github.com/mukhal/intrinsic-source-citation}

4/12/2024

cs.CL cs.AI

A Realistic Evaluation of LLMs for Quotation Attribution in Literary Texts: A Case Study of LLaMa3

Gaspard Michel, Elena V. Epure, Romain Hennequin, Christophe Cerisara

Large Language Models (LLMs) zero-shot and few-shot performance are subject to memorization and data contamination, complicating the assessment of their validity. In literary tasks, the performance of LLMs is often correlated to the degree of book memorization. In this work, we carry out a realistic evaluation of LLMs for quotation attribution in novels, taking the instruction fined-tuned version of Llama3 as an example. We design a task-specific memorization measure and use it to show that Llama3's ability to perform quotation attribution is positively correlated to the novel degree of memorization. However, Llama3 still performs impressively well on books it has not memorized nor seen. Data and code will be made publicly available.

6/18/2024

cs.CL