Sequence-Level Certainty Reduces Hallucination In Knowledge-Grounded Dialogue Generation

arXiv:2310.18794

Published 4/16/2024 by Yixin Wan, Fanyou Wu, Weijie Xu, Srinivasan H. Sengamedu

Abstract

In this work, we propose sequence-level certainty as a common theme over hallucination in Knowledge Grounded Dialogue Generation (KGDG). We explore the correlation between the level of hallucination in model responses and two types of sequence-level certainty: probabilistic certainty and semantic certainty. Empirical results reveal that higher levels of both types of certainty in model responses are correlated with lower levels of hallucination. We further propose Certainty-based Response Ranking (CRR), a decoding-time hallucination mitigation method that samples several response candidates, ranks them based on sequence-level certainty, and outputs the response with the highest certainty level. Aligning with our definitions of sequence-level certainty, we design 2 types of CRR approaches: Probabilistic CRR (P-CRR) and Semantic CRR (S-CRR). P-CRR ranks individually sampled model responses using the arithmetic mean log-probability of the entire sequence. S-CRR approaches certainty estimation from meaning-space, and ranks model response candidates based on their semantic certainty level as measured by an entailment-based Agreement Score (AS). Through extensive experiments across 3 KGDG datasets, 3 decoding methods, and 4 KGDG models, we validate the effectiveness of CRR for reducing hallucination in KGDG task.

Overview

This paper explores the relationship between sequence-level certainty and hallucination in Knowledge Grounded Dialogue Generation (KGDG) models.
The authors propose two types of sequence-level certainty: probabilistic certainty and semantic certainty, and show that higher levels of both are associated with lower levels of hallucination in model responses.
They introduce Certainty-based Response Ranking (CRR), a decoding-time method that selects the most certain response from a set of candidates to mitigate hallucination.
The paper validates the effectiveness of CRR through extensive experiments on multiple KGDG datasets, decoding methods, and models.

Plain English Explanation

When you have a conversation with someone, you generally expect their responses to be relevant and truthful, based on the information available to them. However, language models used for dialogue generation can sometimes produce responses that are not grounded in facts, a phenomenon known as hallucination.

In this research, the authors explore the idea that the level of certainty in a model's responses may be linked to the likelihood of hallucination. They define two types of sequence-level certainty:

Probabilistic Certainty: How confident the model is in the response as a whole, measured by the average log-probability of its tokens.
Semantic Certainty: How confident the model is at the level of meaning, measured by an entailment-based "Agreement Score" that reflects how strongly a response agrees semantically with the other responses sampled for the same input.

The key finding is that responses with higher levels of both probabilistic and semantic certainty tend to have lower levels of hallucination. This suggests that measuring and optimizing for certainty could be a promising way to reduce hallucination in dialogue models.

To put this idea into practice, the authors propose a technique called Certainty-based Response Ranking (CRR). CRR generates multiple response candidates, then selects the one with the highest level of either probabilistic or semantic certainty. This helps ensure that the final response is more grounded in the provided information and less likely to contain hallucinated content.

Technical Explanation

The paper begins by defining two types of sequence-level certainty for KGDG models:

Probabilistic Certainty: The arithmetic mean of the log-probabilities of the tokens in the generated response. Because it is an average rather than a sum, it measures how likely the sequence is under the model's own output distribution without automatically penalizing longer responses.
Semantic Certainty: An "Agreement Score" (AS) computed with a pre-trained entailment model. It quantifies certainty in meaning space: how strongly a response candidate is semantically consistent with the other sampled candidates, rather than how probable its exact wording is.
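As a minimal sketch of the probabilistic measure described above, the snippet below computes the arithmetic mean log-probability of a response from its per-token probabilities. The function name `probabilistic_certainty` and the toy probabilities are assumptions made for illustration; they are not taken from the paper. The example also shows why averaging matters: a summed log-probability tends to favour short responses, while the mean compares average confidence per token.

```python
import math
from typing import Sequence


def probabilistic_certainty(token_probs: Sequence[float]) -> float:
    """Arithmetic mean of the per-token log-probabilities of one response."""
    if not token_probs:
        raise ValueError("response must contain at least one token")
    return sum(math.log(p) for p in token_probs) / len(token_probs)


# Toy per-token probabilities (made up for illustration, not from the paper).
short_response = [0.5, 0.5]                 # 2 tokens, lower confidence per token
long_response = [0.6, 0.6, 0.6, 0.6, 0.6]   # 5 tokens, higher confidence per token

# Summed log-probability: the short response wins only because it has fewer terms.
sum_short = sum(math.log(p) for p in short_response)
sum_long = sum(math.log(p) for p in long_response)

# Mean log-probability: the response with higher per-token confidence wins.
mean_short = probabilistic_certainty(short_response)
mean_long = probabilistic_certainty(long_response)

print(f"sum:  short={sum_short:.3f}  long={sum_long:.3f}")
print(f"mean: short={mean_short:.3f}  long={mean_long:.3f}")
```

In practice the per-token probabilities would come from the generation model itself; how they are extracted depends on the framework and is left out of this sketch.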

The authors then propose two variations of Certainty-based Response Ranking (CRR) that leverage these certainty measures:

Probabilistic CRR (P-CRR): Samples multiple response candidates and ranks them by their probabilistic certainty (mean log-probability).
Semantic CRR (S-CRR): Samples multiple response candidates and ranks them by their semantic certainty (Agreement Score).
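The two ranking variants can be sketched roughly as follows. This is an illustrative outline under stated assumptions, not the authors' implementation: all function names are hypothetical, the Agreement Score is assumed to be the average entailment probability between a candidate and the other sampled candidates (the summary does not give the exact formula), and `toy_entail_prob` is a crude word-overlap stand-in for a real pre-trained entailment (NLI) model.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A candidate is (response text, per-token probabilities from the generator).
Candidate = Tuple[str, Sequence[float]]


def mean_log_prob(token_probs: Sequence[float]) -> float:
    """Arithmetic mean of per-token log-probabilities."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)


def p_crr(candidates: Sequence[Candidate]) -> str:
    """P-CRR sketch: return the candidate with the highest mean log-probability."""
    best_text, _ = max(candidates, key=lambda c: mean_log_prob(c[1]))
    return best_text


def agreement_score(index: int, texts: Sequence[str],
                    entail_prob: Callable[[str, str], float]) -> float:
    """ASSUMPTION: average probability that each other candidate entails this one."""
    others = [t for j, t in enumerate(texts) if j != index]
    if not others:
        return 0.0
    return sum(entail_prob(other, texts[index]) for other in others) / len(others)


def s_crr(texts: Sequence[str], entail_prob: Callable[[str, str], float]) -> str:
    """S-CRR sketch: return the candidate with the highest Agreement Score."""
    scores: List[float] = [agreement_score(i, texts, entail_prob)
                           for i in range(len(texts))]
    return texts[scores.index(max(scores))]


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: fraction of hypothesis words found in the premise."""
    premise_words = set(premise.lower().split())
    hypothesis_words = hypothesis.lower().split()
    return sum(w in premise_words for w in hypothesis_words) / len(hypothesis_words)


# Toy candidates with made-up token probabilities (illustration only).
candidates: List[Candidate] = [
    ("paris is the capital of france", [0.9] * 6),
    ("the capital of france is paris", [0.7] * 6),
    ("lyon is the capital of france", [0.4] * 6),
]
texts = [text for text, _ in candidates]

print("P-CRR picks:", p_crr(candidates))
print("S-CRR picks:", s_crr(texts, toy_entail_prob))
```

In this toy setup both rankers avoid the outlier candidate: P-CRR because its tokens are less probable, S-CRR because the other candidates do not support it. A real system would replace `toy_entail_prob` with an entailment model's probability and obtain the candidates by sampling from the dialogue model.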

Through extensive experiments on 3 KGDG datasets, 3 decoding methods, and 4 KGDG models, the authors show that both P-CRR and S-CRR are effective at reducing hallucination in generated responses, with S-CRR generally outperforming P-CRR.

Critical Analysis

The paper provides a well-designed and comprehensive study on the relationship between sequence-level certainty and hallucination in KGDG models. The authors' definitions of probabilistic and semantic certainty are intuitive and well-grounded in the literature on language model evaluation.

One potential limitation of the research is that it focuses solely on reducing hallucination, without considering other important aspects of dialogue quality, such as relevance, fluency, and coherence. It would be valuable to explore how the proposed CRR approaches impact these other dimensions of dialogue, and whether there are any trade-offs involved.

Additionally, the paper does not delve into the underlying reasons why higher levels of certainty correlate with lower hallucination. Further investigation into the causal mechanisms behind this relationship could provide valuable insights for improving dialogue generation models more broadly.

Finally, the authors mention the need for human evaluation to assess the real-world impact of their approaches. While the automated metrics used in the study are informative, ultimately, the true test of the CRR methods' effectiveness will come from assessments by human dialogue participants.

Conclusion

This research introduces an intriguing connection between sequence-level certainty and hallucination in KGDG models, and proposes practical techniques to leverage this insight for mitigating hallucination. The Certainty-based Response Ranking (CRR) approaches demonstrate promising results in reducing hallucination across multiple datasets and models, suggesting that certainty-aware decoding could be a valuable tool for improving the reliability and trustworthiness of dialogue systems.

As language models continue to play an increasingly important role in our daily interactions, the ability to ensure the factual grounding and coherence of their responses will be crucial. This work represents an important step towards that goal, and encourages further research into the connections between model certainty, hallucination, and other aspects of dialogue quality.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Cause-Effect Look at Alleviating Hallucination of Knowledge-grounded Dialogue Generation

Jifan Yu, Xiaohan Zhang, Yifan Xu, Xuanyu Lei, Zijun Yao, Jing Zhang, Lei Hou, Juanzi Li

Empowered by large-scale pretrained language models, existing dialogue systems have demonstrated impressive performance in conducting fluent and natural-sounding conversations. However, they are still plagued by the hallucination problem, which causes unpredictable factual errors in the generated responses. Recently, knowledge-grounded dialogue generation models, which intentionally invoke external knowledge resources to produce more informative responses, have also proven effective in reducing hallucination. Following the idea of obtaining high-quality knowledge, a few efforts have achieved good performance on this issue. As some inevitable knowledge noise may also lead to hallucinations, it is urgent to investigate the reasons and future directions for building noise-tolerant methods in KGD tasks. In this paper, we analyze the causal story behind this problem with counterfactual reasoning methods. Based on the causal effect analysis, we propose a possible solution for alleviating hallucination in KGD by exploiting the dialogue-knowledge interaction. Experimental results of our example implementation show that this method can reduce hallucination without disrupting other dialogue performance, while remaining adaptive to different generation models. We hope our efforts can support and call for more attention to developing lightweight techniques towards robust and trustworthy dialogue systems.

4/5/2024

cs.CL cs.AI

Knowledge Verification to Nip Hallucination in the Bud

Fanqi Wan, Xinting Huang, Leyang Cui, Xiaojun Quan, Wei Bi, Shuming Shi

While large language models (LLMs) have demonstrated exceptional performance across various tasks following human alignment, they may still generate responses that sound plausible but contradict factual knowledge, a phenomenon known as hallucination. In this paper, we demonstrate the feasibility of mitigating hallucinations by verifying and minimizing the inconsistency between external knowledge present in the alignment data and the intrinsic knowledge embedded within foundation LLMs. Specifically, we propose a novel approach called Knowledge Consistent Alignment (KCA), which employs a well-aligned LLM to automatically formulate assessments based on external knowledge to evaluate the knowledge boundaries of foundation LLMs. To address knowledge inconsistencies in the alignment data, KCA implements several specific strategies to deal with these data instances. We demonstrate the superior efficacy of KCA in reducing hallucinations across six benchmarks, utilizing foundation LLMs of varying backbones and scales. This confirms the effectiveness of mitigating hallucinations by reducing knowledge inconsistency. Our code, model weights, and data are openly accessible at https://github.com/fanqiwan/KCA.

4/17/2024

cs.CL

Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval

Mengjia Niu, Hao Li, Jie Shi, Hamed Haddadi, Fan Mo

Large language models (LLMs) have demonstrated remarkable capabilities across various domains, although their susceptibility to hallucination poses significant challenges for their deployment in critical areas such as healthcare. To address this issue, retrieving relevant facts from knowledge graphs (KGs) is considered a promising method. Existing KG-augmented approaches tend to be resource-intensive, requiring multiple rounds of retrieval and verification for each factoid, which impedes their application in real-world scenarios. In this study, we propose Self-Refinement-Enhanced Knowledge Graph Retrieval (Re-KGR) to augment the factuality of LLMs' responses with less retrieval efforts in the medical field. Our approach leverages the attribution of next-token predictive probability distributions across different tokens, and various model layers to primarily identify tokens with a high potential for hallucination, reducing verification rounds by refining knowledge triples associated with these tokens. Moreover, we rectify inaccurate content using retrieved knowledge in the post-processing stage, which improves the truthfulness of generated responses. Experimental results on a medical dataset demonstrate that our approach can enhance the factual capability of LLMs across various foundational models as evidenced by the highest scores on truthfulness.

5/13/2024

cs.CL cs.LG

Semantic Entropy Probes: Robust and Cheap Hallucination Detection in LLMs

Jannik Kossen, Jiatong Han, Muhammed Razzak, Lisa Schut, Shreshth Malik, Yarin Gal

We propose semantic entropy probes (SEPs), a cheap and reliable method for uncertainty quantification in Large Language Models (LLMs). Hallucinations, which are plausible-sounding but factually incorrect and arbitrary model generations, present a major challenge to the practical adoption of LLMs. Recent work by Farquhar et al. (2024) proposes semantic entropy (SE), which can detect hallucinations by estimating uncertainty in the space of semantic meaning for a set of model generations. However, the 5-to-10-fold increase in computation cost associated with SE computation hinders practical adoption. To address this, we propose SEPs, which directly approximate SE from the hidden states of a single generation. SEPs are simple to train and do not require sampling multiple model generations at test time, reducing the overhead of semantic uncertainty quantification to almost zero. We show that SEPs retain high performance for hallucination detection and generalize better to out-of-distribution data than previous probing methods that directly predict model accuracy. Our results across models and tasks suggest that model hidden states capture SE, and our ablation studies give further insights into the token positions and model layers for which this is the case.

6/26/2024

cs.CL cs.AI cs.LG