Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

2402.03271

Published 5/31/2024 by Zhiyuan Hu, Chumin Liu, Xidong Feng, Yilun Zhao, See-Kiong Ng, Anh Tuan Luu, Junxian He, Pang Wei Koh, Bryan Hooi

cs.CL cs.AI cs.LG


Abstract

In the face of uncertainty, the ability to seek information is of fundamental importance. In many practical applications, such as medical diagnosis and troubleshooting, the information needed to solve the task is not initially given and has to be actively sought by asking follow-up questions (for example, a doctor asking a patient for more details about their symptoms). In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines 1) an uncertainty-aware simulation approach which enables the model to simulate possible future scenarios and how likely they are to occur, 2) uncertainty-based rewards motivated by information gain which incentivizes the model to seek information, and 3) a reward propagation scheme to select the optimal question to ask in a way that maximizes the expected reward. In experiments on medical diagnosis, troubleshooting, and the 20 Questions game, UoT achieves an average performance improvement of 38.1% in the rate of successful task completion across multiple LLMs compared with direct prompting and also improves efficiency (i.e., the number of questions needed to complete the task). Our code has been released.


Overview

Introduces Uncertainty of Thoughts (UoT), an algorithm to enable large language models to actively seek information through effective questioning
UoT combines uncertainty-aware simulation, uncertainty-based rewards, and reward propagation to select optimal questions
Experiments show UoT achieves an average 38.1% improvement in successful task completion across medical diagnosis, troubleshooting, and the "20 Questions" game, compared with direct prompting

Plain English Explanation

When facing uncertainty, the ability to seek information is crucial. For example, in medical diagnosis or troubleshooting, the information needed to solve the problem may not be initially provided, so the model needs to actively ask follow-up questions to gather more details.

The Uncertainty of Thoughts (UoT) algorithm aims to give large language models this capability. UoT has three key components:

Uncertainty-aware simulation: The model can imagine possible future scenarios and estimate how likely they are to occur.
Uncertainty-based rewards: The model is incentivized to seek information that reduces uncertainty and maximizes its expected reward.
Reward propagation: The rewards from these imagined future scenarios are passed back to the current step, so the model can pick the question with the highest expected reward.
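To make the uncertainty-based reward concrete, here is a minimal sketch assuming (as a simplification) that the model treats all remaining candidate answers as equally likely and asks yes/no questions. Under that assumption, the simulated probability of a "yes" is just the fraction of candidates consistent with it, and the information gain of a question reduces to the binary entropy of the split. The function name `information_gain` is our own illustration, not taken from the paper.

```python
import math


def information_gain(n_yes: int, n_no: int) -> float:
    """Expected entropy reduction (in bits) from a yes/no question.

    Assumes a uniform prior over the n_yes + n_no remaining candidates,
    so each answer's probability is its share of the candidates.
    """
    total = n_yes + n_no
    gain = 0.0
    for count in (n_yes, n_no):
        if count > 0:
            p = count / total  # simulated probability of this answer
            gain -= p * math.log2(p)
    return gain


# Suppose 8 candidate diagnoses remain.
print(information_gain(4, 4))  # balanced split: 1.0
print(information_gain(1, 7))  # lopsided split: about 0.544
```

The balanced question halves the candidate set whichever way it is answered, so it earns a full bit; the lopsided one is usually answered "no" and leaves most of the uncertainty in place.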

In experiments on medical diagnosis, troubleshooting, and the "20 Questions" game, UoT improved the success rate by an average of 38.1% compared to directly prompting the language model. It also made the process more efficient by requiring fewer questions to complete the tasks.

Technical Explanation

The Uncertainty of Thoughts (UoT) algorithm combines several key techniques to enable large language models to actively seek information:

Uncertainty-aware simulation: UoT has the model simulate possible future scenarios (candidate questions and the answers they might receive) and estimate how likely each scenario is to occur. This allows the model to reason about the likelihood of different outcomes and the value of gathering additional information.
Uncertainty-based rewards: UoT defines rewards based on information gain, i.e., how much a question's answer is expected to reduce the model's uncertainty over the remaining possibilities (for example, candidate diagnoses), motivating it to ask the most informative questions.
Reward propagation: To select the optimal question to ask, UoT uses a reward propagation scheme that evaluates the expected long-term reward of each possible question, allowing the model to choose the one that maximizes its expected information gain.
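The three components can be illustrated together with a small, self-contained planner. This is a simplified sketch under stated assumptions, not the authors' implementation: candidates are equally likely, answers are yes/no, a plain Python predicate stands in for the LLM's judgment of which candidates a question rules in, and rewards are propagated expectimax-style (probability-weighted average over simulated answers, maximum over candidate questions). The paper's exact propagation rule and reward shaping may differ. All names (`split`, `info_gain`, `question_value`, `best_value`, `choose_question`) are hypothetical.

```python
import math
from typing import Callable, FrozenSet, List, Tuple

# Hypothetical question: a label plus a predicate marking which candidates
# would produce a "yes". In UoT an LLM makes this judgment; a plain Python
# function stands in for it here.
Question = Tuple[str, Callable[[str], bool]]


def split(cands: FrozenSet[str], q: Question) -> Tuple[FrozenSet[str], FrozenSet[str]]:
    """Simulate both answers: candidates consistent with yes, and with no."""
    yes = frozenset(c for c in cands if q[1](c))
    return yes, cands - yes


def info_gain(n_yes: int, n_no: int) -> float:
    """Entropy reduction (bits) under a uniform prior over candidates."""
    total = n_yes + n_no
    return -sum((n / total) * math.log2(n / total) for n in (n_yes, n_no) if n)


def question_value(cands: FrozenSet[str], q: Question,
                   questions: List[Question], depth: int) -> float:
    """Immediate reward plus probability-weighted value of both answers."""
    yes, no = split(cands, q)
    value = info_gain(len(yes), len(no))
    for branch in (yes, no):
        if branch:
            p = len(branch) / len(cands)  # simulated likelihood of this answer
            value += p * best_value(branch, questions, depth - 1)
    return value


def best_value(cands: FrozenSet[str], questions: List[Question], depth: int) -> float:
    """Max over candidate questions; stops at the depth limit or when solved."""
    if depth == 0 or len(cands) <= 1:
        return 0.0
    return max(question_value(cands, q, questions, depth) for q in questions)


def choose_question(cands: FrozenSet[str], questions: List[Question], depth: int = 2) -> str:
    """Pick the question whose propagated expected reward is highest."""
    best = max(questions, key=lambda q: question_value(cands, q, questions, depth))
    return best[0]


# A toy 20 Questions round with eight equally likely animals.
animals = frozenset({"cat", "dog", "eagle", "sparrow", "shark", "salmon", "frog", "snake"})
questions: List[Question] = [
    ("Is it a mammal?", lambda c: c in {"cat", "dog"}),
    ("Can it fly?", lambda c: c in {"eagle", "sparrow"}),
    ("Is it warm-blooded?", lambda c: c in {"cat", "dog", "eagle", "sparrow"}),
    ("Is it a fish?", lambda c: c in {"shark", "salmon"}),
]
print(choose_question(animals, questions))  # Is it warm-blooded?
```

With a two-step lookahead, "Is it warm-blooded?" scores 2.0 expected bits versus 1.5 for each alternative, because both of its branches still admit a follow-up question that halves what remains. A real system would replace the predicates with LLM calls and a richer candidate set.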

The researchers evaluated UoT on three tasks: medical diagnosis, troubleshooting, and the "20 Questions" game. Across these experiments, UoT achieved an average performance improvement of 38.1% in successful task completion compared to directly prompting the language model. UoT also improved efficiency, requiring fewer questions to complete the tasks.

Critical Analysis

The Uncertainty of Thoughts (UoT) algorithm represents an important step towards building language agents that can actively seek information to solve complex, open-ended tasks. However, the paper also acknowledges several limitations and avenues for future research:

Scalability: The computational complexity of the uncertainty-aware simulation and reward propagation mechanisms may limit the scalability of UoT to larger, more complex tasks.
Robustness: The performance of UoT may be sensitive to the quality and reliability of the underlying language model, which could be a concern when deploying such systems in real-world applications.
Ethical considerations: As language agents become more capable of actively questioning users, there may be ethical implications around privacy, trust, and the potential for manipulation that should be carefully considered.
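To give a sense of the scalability concern, here is a back-of-the-envelope count under our own simplifying assumptions (not figures from the paper): if every simulated step considers m candidate questions and each question has two possible answers, looking d steps ahead yields (2m)^d root-to-leaf paths, and each expanded node plausibly needs LLM calls to generate questions and estimate answer likelihoods. The helper `simulated_paths` and the choice m = 3 are hypothetical.

```python
def simulated_paths(questions_per_step: int, depth: int) -> int:
    """Count root-to-leaf paths in a depth-limited question/answer tree.

    Assumes each simulated step expands `questions_per_step` candidate
    questions and every question has exactly two answers (yes/no).
    """
    return (2 * questions_per_step) ** depth


for depth in range(1, 5):
    # Prints depth and path count: 6, 36, 216, 1296 paths for depths 1-4.
    print(depth, simulated_paths(3, depth))
```

The exponential growth in depth is why practical planners typically cap the lookahead or prune candidate questions.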

Further research is needed to address these challenges and explore ways to make uncertainty-aware language agents more robust, scalable, and aligned with human values.

Conclusion

The Uncertainty of Thoughts (UoT) algorithm represents an important step towards building language models that can actively seek information to solve complex, open-ended tasks. By combining uncertainty-aware simulation, uncertainty-based rewards, and reward propagation, UoT enables language models to ask effective questions that improve task success rates and efficiency.

As research on uncertainty-aware language agents and on uncertainty quantification for large language models advances, we can expect more capable language agents that better navigate uncertainty and actively collaborate with humans on a wide range of problems.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for authoritative details.

Related Papers

Towards Uncertainty-Aware Language Agent

Jiuzhou Han, Wray Buntine, Ehsan Shareghi

While Language Agents have achieved promising success by placing Large Language Models at the core of a more versatile design that dynamically interacts with the external world, the existing approaches neglect the notion of uncertainty during these interactions. We present the Uncertainty-Aware Language Agent (UALA), a framework that orchestrates the interaction between the agent and the external world using uncertainty quantification. Compared with other well-known counterparts like ReAct, our extensive experiments across 3 representative tasks (HotpotQA, StrategyQA, MMLU) and various LLM sizes demonstrate that UALA brings a significant improvement of performance, while having a substantially lower reliance on the external world (i.e., reduced number of tool calls and tokens). Our analyses provide various insights including the great potential of UALA compared with agent fine-tuning, and underscore the unreliability of verbalised confidence of LLMs as a proxy for uncertainty.

5/31/2024

cs.CL

Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models

Jinhao Duan, Hao Cheng, Shiqi Wang, Alex Zavalny, Chenan Wang, Renjing Xu, Bhavya Kailkhura, Kaidi Xu

Large Language Models (LLMs) show promising results in language generation and instruction following but frequently hallucinate, making their outputs less reliable. Despite Uncertainty Quantification's (UQ) potential solutions, implementing it accurately within LLMs is challenging. Our research introduces a simple heuristic: not all tokens in auto-regressive LLM text equally represent the underlying meaning, as linguistic redundancy often allows a few keywords to convey the essence of long sentences. However, current methods underestimate this inequality when assessing uncertainty, causing tokens with limited semantics to be equally or excessively weighted in UQ. To correct this, we propose Shifting Attention to more Relevant (SAR) components at both token- and sentence-levels for better UQ. We conduct extensive experiments involving a range of popular off-the-shelf LLMs, such as Vicuna, WizardLM, and LLaMA-2-chat, with model sizes extending up to 33B parameters. We evaluate various free-form question-answering tasks, encompassing domains such as reading comprehension, science Q&A, and medical Q&A. Our experimental results, coupled with a comprehensive demographic analysis, demonstrate the superior performance of SAR. The code is available at https://github.com/jinhaoduan/SAR.

5/30/2024

cs.CL cs.AI cs.LG

To Believe or Not to Believe Your LLM

Yasin Abbasi Yadkori, Ilja Kuzborskij, András György, Csaba Szepesvári

We explore uncertainty quantification in large language models (LLMs), with the goal to identify when uncertainty in responses given a query is large. We simultaneously consider both epistemic and aleatoric uncertainties, where the former comes from the lack of knowledge about the ground truth (such as about facts or the language), and the latter comes from irreducible randomness (such as multiple possible answers). In particular, we derive an information-theoretic metric that allows to reliably detect when only epistemic uncertainty is large, in which case the output of the model is unreliable. This condition can be computed based solely on the output of the model obtained simply by some special iterative prompting based on the previous responses. Such quantification, for instance, allows to detect hallucinations (cases when epistemic uncertainty is high) in both single- and multi-answer responses. This is in contrast to many standard uncertainty quantification strategies (such as thresholding the log-likelihood of a response) where hallucinations in the multi-answer case cannot be detected. We conduct a series of experiments which demonstrate the advantage of our formulation. Further, our investigations shed some light on how the probabilities assigned to a given output by an LLM can be amplified by iterative prompting, which might be of independent interest.

6/5/2024

cs.LG cs.AI cs.CL


Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models

Alfonso Amayuelas, Liangming Pan, Wenhu Chen, William Wang

This paper investigates the capabilities of Large Language Models (LLMs) in the context of understanding their knowledge and uncertainty over questions. Specifically, we focus on addressing known-unknown questions, characterized by high uncertainty due to the absence of definitive answers. To facilitate our study, we collect a new dataset with Known-Unknown Questions (KUQ) and establish a categorization framework to clarify the origins of uncertainty in such queries. Subsequently, we examine the performance of open-source LLMs, fine-tuned using this dataset, in distinguishing between known and unknown queries within open-ended question-answering scenarios. The fine-tuned models demonstrated a significant improvement, achieving a considerable increase in F1-score relative to their pre-fine-tuning state. Through a comprehensive analysis, we reveal insights into the models' improved uncertainty articulation and their consequent efficacy in multi-agent debates. These findings help us understand how LLMs can be trained to identify and express uncertainty, improving our knowledge of how they understand and express complex or unclear information.

6/24/2024

cs.CL cs.AI