Perceptions of Linguistic Uncertainty by Language Models and Humans

Read original: arXiv:2407.15814 - Published 7/23/2024 by Catarina G Belem, Markelle Kelly, Mark Steyvers, Sameer Singh, Padhraic Smyth

Overview

Examines how language models and humans interpret linguistic expressions of uncertainty such as "probably" or "highly unlikely"
Compares how 10 popular language models and human participants map these expressions to numerical responses
Shows where models match human interpretations and where their own prior knowledge biases them

Plain English Explanation

This paper explores how language models and humans interpret linguistic expressions of uncertainty, such as "probably" or "highly unlikely". Language models are AI systems trained on vast amounts of text data to generate human-like language. The researchers wanted to know whether these models map such expressions to numbers the way people do, and whether they can judge how uncertain another speaker is regardless of what the model itself believes about the statement.

The study evaluated both humans and 10 popular language models on a task built for this purpose. The results show where machines and humans agree in interpreting uncertain language and where they diverge.

Technical Explanation

The researchers investigate how language models map linguistic expressions of uncertainty to numerical responses. The approach tests whether models can employ theory of mind in this setting: understanding another agent's uncertainty about a particular statement, independently of the model's own certainty about that statement. Both humans and 10 popular language models were evaluated on a task created to assess these abilities.

The results showed both similarities and differences. Eight of the 10 models mapped uncertainty expressions to probabilistic responses in a human-like manner. However, the models behaved systematically differently depending on whether a statement was actually true or false, which indicates that they are substantially more susceptible than humans to bias from their prior knowledge.

These findings point to a specific limitation of current language models: their reading of someone else's uncertainty is not fully separated from what they themselves know. The authors note that this raises questions for human-AI alignment and AI-AI communication.
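The paper's abstract reports that model responses differ depending on whether a statement is actually true or false. One way such a comparison could be computed is sketched below. This is a minimal illustration, not the authors' analysis code: the record layout, the 0-100 response scale, the function names, and every number in `toy_responses` are assumptions invented for the example and are not results from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy records, invented for illustration only:
# (uncertainty expression, whether the statement is actually true,
#  numerical response on an assumed 0-100 scale)
toy_responses = [
    ("probably", True, 75),
    ("probably", True, 70),
    ("probably", False, 65),
    ("probably", False, 60),
    ("highly unlikely", True, 15),
    ("highly unlikely", False, 5),
]


def mean_by_expression_and_truth(records):
    """Average the numerical responses for each (expression, truth value) pair."""
    buckets = defaultdict(list)
    for expression, is_true, value in records:
        buckets[(expression, is_true)].append(value)
    return {key: mean(values) for key, values in buckets.items()}


def truth_gap(records):
    """For each expression, mean response on true statements minus mean on false ones.

    A gap near zero would suggest the responder reads the expression the same
    way regardless of the statement's truth; a large gap would suggest that
    prior knowledge about the statement is leaking into the interpretation.
    """
    means = mean_by_expression_and_truth(records)
    expressions = {expression for expression, _ in means}
    return {
        expression: means[(expression, True)] - means[(expression, False)]
        for expression in expressions
        if (expression, True) in means and (expression, False) in means
    }


if __name__ == "__main__":
    for expression, gap in sorted(truth_gap(toy_responses).items()):
        print(f"{expression}: gap = {gap}")
```

Running the same calculation on human responses and on model responses would allow the two gaps to be compared side by side, which is the kind of contrast the abstract describes.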

Critical Analysis

The paper provides valuable insights into how language models and humans interpret linguistic uncertainty. However, the study covers a single task and 10 language models, so the findings may not generalize to all types of language models or applications.

Additionally, the paper does not delve into the potential causes of the observed differences, such as the training data or architectural choices of the language models. Further research is needed to better understand the underlying factors that influence the handling of linguistic uncertainty by both machines and humans.

It would also be interesting to explore the implications of these findings for real-world applications of language models, such as in conversational AI or natural language processing tasks, where the ability to convey and understand uncertainty is crucial.

Conclusion

This paper examines how language models and humans interpret linguistic uncertainty. Most of the models tested (8 out of 10) map uncertainty expressions to probabilistic responses in a human-like manner, but their responses shift depending on whether the statement in question is actually true or false. Humans show substantially less of this bias.

These insights matter for the development and deployment of language models, particularly in applications where interpreting and communicating uncertainty is critical. According to the authors, the findings raise important questions and have broad implications for human-AI alignment and AI-AI communication.

Related Papers

Perceptions of Linguistic Uncertainty by Language Models and Humans

Catarina G Belem, Markelle Kelly, Mark Steyvers, Sameer Singh, Padhraic Smyth

Uncertainty expressions such as "probably" or "highly unlikely" are pervasive in human language. While prior work has established that there is population-level agreement in terms of how humans interpret these expressions, there has been little inquiry into the abilities of language models to interpret such expressions. In this paper, we investigate how language models map linguistic expressions of uncertainty to numerical responses. Our approach assesses whether language models can employ theory of mind in this setting: understanding the uncertainty of another agent about a particular statement, independently of the model's own certainty about that statement. We evaluate both humans and 10 popular language models on a task created to assess these abilities. Unexpectedly, we find that 8 out of 10 models are able to map uncertainty expressions to probabilistic responses in a human-like manner. However, we observe systematically different behavior depending on whether a statement is actually true or false. This sensitivity indicates that language models are substantially more susceptible to bias based on their prior knowledge (as compared to humans). These findings raise important questions and have broad implications for human-AI alignment and AI-AI communication.

7/23/2024

Relying on the Unreliable: The Impact of Language Models' Reluctance to Express Uncertainty

Kaitlyn Zhou, Jena D. Hwang, Xiang Ren, Maarten Sap

As natural language becomes the default interface for human-AI interaction, there is a need for LMs to appropriately communicate uncertainties in downstream applications. In this work, we investigate how LMs incorporate confidence in responses via natural language and how downstream users behave in response to LM-articulated uncertainties. We examine publicly deployed models and find that LMs are reluctant to express uncertainties when answering questions even when they produce incorrect responses. LMs can be explicitly prompted to express confidences, but tend to be overconfident, resulting in high error rates (an average of 47%) among confident responses. We test the risks of LM overconfidence by conducting human experiments and show that users rely heavily on LM generations, whether or not they are marked by certainty. Lastly, we investigate the preference-annotated datasets used in post training alignment and find that humans are biased against texts with uncertainty. Our work highlights new safety harms facing human-LM interactions and proposes design recommendations and mitigating strategies moving forward.

7/11/2024

I'm Not Sure, But...: Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust

Sunnie S. Y. Kim, Q. Vera Liao, Mihaela Vorvoreanu, Stephanie Ballard, Jennifer Wortman Vaughan

Widely deployed large language models (LLMs) can produce convincing yet incorrect outputs, potentially misleading users who may rely on them as if they were correct. To reduce such overreliance, there have been calls for LLMs to communicate their uncertainty to end users. However, there has been little empirical work examining how users perceive and act upon LLMs' expressions of uncertainty. We explore this question through a large-scale, pre-registered, human-subject experiment (N=404) in which participants answer medical questions with or without access to responses from a fictional LLM-infused search engine. Using both behavioral and self-reported measures, we examine how different natural language expressions of uncertainty impact participants' reliance, trust, and overall task performance. We find that first-person expressions (e.g., I'm not sure, but...) decrease participants' confidence in the system and tendency to agree with the system's answers, while increasing participants' accuracy. An exploratory analysis suggests that this increase can be attributed to reduced (but not fully eliminated) overreliance on incorrect answers. While we observe similar effects for uncertainty expressed from a general perspective (e.g., It's not clear, but...), these effects are weaker and not statistically significant. Our findings suggest that using natural language expressions of uncertainty may be an effective approach for reducing overreliance on LLMs, but that the precise language used matters. This highlights the importance of user testing before deploying LLMs at scale.

5/16/2024

Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?

Gal Yona, Roee Aharoni, Mor Geva

We posit that large language models (LLMs) should be capable of expressing their intrinsic uncertainty in natural language. For example, if the LLM is equally likely to output two contradicting answers to the same question, then its generated response should reflect this uncertainty by hedging its answer (e.g., I'm not sure, but I think...). We formalize faithful response uncertainty based on the gap between the model's intrinsic confidence in the assertions it makes and the decisiveness by which they are conveyed. This example-level metric reliably indicates whether the model reflects its uncertainty, as it penalizes both excessive and insufficient hedging. We evaluate a variety of aligned LLMs at faithfully communicating uncertainty on several knowledge-intensive question answering tasks. Our results provide strong evidence that modern LLMs are poor at faithfully conveying their uncertainty, and that better alignment is necessary to improve their trustworthiness.

5/28/2024