Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

2406.05659

Published 6/11/2024 by Maryam Amirizaniani, Elias Martin, Maryna Sivachenko, Afra Mashhadi, Chirag Shah

Abstract

Theory of Mind (ToM) reasoning entails recognizing that other individuals possess their own intentions, emotions, and thoughts, which is vital for guiding one's own thought processes. Although large language models (LLMs) excel in tasks such as summarization, question answering, and translation, they still face challenges with ToM reasoning, especially in open-ended questions. Despite advancements, the extent to which LLMs truly understand ToM reasoning and how closely it aligns with human ToM reasoning remains inadequately explored in open-ended scenarios. Motivated by this gap, we assess the abilities of LLMs to perceive and integrate human intentions and emotions into their ToM reasoning processes within open-ended questions. Our study utilizes posts from Reddit's ChangeMyView platform, which demands nuanced social reasoning to craft persuasive responses. Our analysis, comparing semantic similarity and lexical overlap metrics between responses generated by humans and LLMs, reveals clear disparities in ToM reasoning capabilities in open-ended questions, with even the most advanced models showing notable limitations. To enhance LLM capabilities, we implement a prompt tuning method that incorporates human intentions and emotions, resulting in improvements in ToM reasoning performance. However, despite these improvements, the enhancement still falls short of fully achieving human-like reasoning. This research highlights the deficiencies in LLMs' social reasoning and demonstrates how integrating human intentions and emotions can boost their effectiveness.

Overview

This paper examines whether large language models (LLMs) exhibit human-like theory of mind (ToM), the ability to attribute mental states to oneself and others.
The researchers compare LLM-generated and human-written responses to posts from Reddit's ChangeMyView platform, using semantic similarity and lexical overlap metrics.
Even the most advanced models show notable limitations; a prompt tuning method that incorporates human intentions and emotions improves performance but still falls short of human-like reasoning.

Plain English Explanation

The paper investigates whether large language models (LLMs) can reason the way humans do. Specifically, it looks at whether LLMs have a "theory of mind": the ability to understand that other people have their own thoughts, feelings, and intentions, which may differ from one's own.

The researchers tested this using posts from Reddit's ChangeMyView platform, where writing a persuasive reply requires understanding what the poster intends and feels. They compared replies written by LLMs with replies written by humans to see how closely the models' reasoning tracks human reasoning.

The results show a clear gap between LLM and human responses, even for the most advanced models. Prompting the models with human intentions and emotions narrowed the gap but did not close it. This matters as these models become more advanced and potentially take on more complex decision-making roles: understanding their strengths and limitations in theory of mind helps identify opportunities and risks as these systems are developed and deployed.

Technical Explanation

The paper evaluates the theory of mind capabilities of large language models (LLMs) in open-ended questions, assessing how well they perceive and integrate human intentions and emotions into their reasoning.

The study draws on posts from Reddit's ChangeMyView platform, where crafting a persuasive response demands nuanced social reasoning.

Responses generated by LLMs were compared with responses written by humans using semantic similarity and lexical overlap metrics. The comparison reveals clear disparities in theory of mind reasoning, with even the most advanced models showing notable limitations.
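
To make this kind of comparison concrete, here is a minimal sketch of the two metric families named in the abstract. It is an illustration, not the authors' pipeline: `unigram_f1` is a ROUGE-1-style lexical overlap score, and `bow_cosine` is a bag-of-words cosine that merely stands in for an embedding-based semantic similarity score. The function names, the tokenizer, and the example replies are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def unigram_f1(reference, candidate):
    """Lexical overlap: F1 over shared unigrams (ROUGE-1 style)."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Multiset intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def bow_cosine(reference, candidate):
    """Cosine similarity between bag-of-words count vectors.

    Assumption: a crude stand-in for semantic similarity; a real
    evaluation would compare sentence embeddings instead.
    """
    a = Counter(tokenize(reference))
    b = Counter(tokenize(candidate))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical replies, invented for illustration only.
    human_reply = "I see why you feel that way, but consider the cost."
    llm_reply = "Consider the cost before deciding how you feel."
    print(unigram_f1(human_reply, llm_reply))
    print(bow_cosine(human_reply, llm_reply))
```

Both functions return a value between 0 and 1, where higher means the LLM reply is closer to the human reply. Scores like these measure surface or distributional closeness, so they are only a proxy for how similar the underlying reasoning is.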

The paper also applies a prompt tuning method that incorporates human intentions and emotions into the prompt. This improves theory of mind reasoning performance, although the improvement still falls short of fully human-like reasoning.
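
The abstract describes a prompt tuning method that incorporates human intentions and emotions. The sketch below shows one way such a prompt could be assembled. The function `build_tom_prompt`, its parameters, and the template wording are hypothetical; the paper's actual prompts are not reproduced here.

```python
def build_tom_prompt(post, intention=None, emotion=None):
    """Assemble a prompt for replying to a ChangeMyView post.

    Hypothetical template: the optional intention and emotion cues are
    added as extra context lines before the post itself.
    """
    lines = ["You are replying to a Reddit ChangeMyView post."]
    if intention:
        lines.append(f"The author's apparent intention: {intention}")
    if emotion:
        lines.append(f"The author's apparent emotion: {emotion}")
    lines.append("Post:")
    lines.append(post.strip())
    lines.append("Write a persuasive response that addresses the author's view.")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented example post and cues, for illustration only.
    post = "CMV: Remote work is always better than office work."
    print(build_tom_prompt(post))
    print()
    print(build_tom_prompt(post, intention="seek counterarguments",
                           emotion="frustration"))
```

Calling the function without cues gives a baseline prompt, so replies generated with and without the intention and emotion lines can be scored against the same human replies.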

Critical Analysis

The paper provides a comprehensive evaluation of LLMs' theory of mind capabilities, but it acknowledges several limitations and areas for further research. One key caveat is that the study focuses on open-ended responses, which may not fully capture the reasoning abilities of LLMs when constrained to more specific tasks or outputs.

Additionally, the paper notes that the current generation of LLMs may be biased or limited in their understanding of certain social and cultural contexts, which could affect their theory of mind performance. Expanding the diversity of the data beyond posts from Reddit's ChangeMyView platform could help address this concern.

While the findings suggest that LLMs still have room for improvement when it comes to human-like theory of mind reasoning, the paper encourages further exploration of the relationship between these capabilities and other aspects of language understanding and reasoning. Investigating how theory of mind abilities develop in LLMs as they become more advanced could yield valuable insights for the field of AI alignment.

Conclusion

This paper provides a detailed examination of the theory of mind capabilities of large language models, offering a nuanced perspective on their current strengths and limitations in this area of high-level reasoning. The findings highlight both the progress LLMs have made toward human-like theory of mind and the challenges that remain in capturing the complexity of human social and cognitive abilities.

As LLMs continue to advance and potentially take on more decision-making roles, understanding their theory of mind capabilities will be crucial for aligning these systems with human values and ensuring their actions and outputs are guided by a meaningful understanding of the mental states of those they interact with. The insights provided in this paper lay the groundwork for further research and development in this important area of AI alignment.

This summary was produced with help from an AI and may contain inaccuracies. Consult the original source documents.

Related Papers

LLM Theory of Mind and Alignment: Opportunities and Risks

Winnie Street

Large language models (LLMs) are transforming human-computer interaction and conceptions of artificial intelligence (AI) with their impressive capacities for conversing and reasoning in natural language. There is growing interest in whether LLMs have theory of mind (ToM); the ability to reason about the mental and emotional states of others that is core to human social intelligence. As LLMs are integrated into the fabric of our personal, professional and social lives and given greater agency to make decisions with real-world consequences, there is a critical need to understand how they can be aligned with human values. ToM seems to be a promising direction of inquiry in this regard. Following the literature on the role and impacts of human ToM, this paper identifies key areas in which LLM ToM will show up in human:LLM interactions at individual and group levels, and what opportunities and risks for alignment are raised in each. On the individual level, the paper considers how LLM ToM might manifest in goal specification, conversational adaptation, empathy and anthropomorphism. On the group level, it considers how LLM ToM might facilitate collective alignment, cooperation or competition, and moral judgement-making. The paper lays out a broad spectrum of potential implications and suggests the most pressing areas for future research.

5/15/2024

cs.HC cs.AI

LLMs achieve adult human performance on higher-order theory of mind tasks

Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar

This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.

6/3/2024

cs.AI cs.CL cs.HC

Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models

Zhawnen Chen, Tianchun Wang, Yizhou Wang, Michal Kosinski, Xiang Zhang, Yun Fu, Sheng Li

Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention). However, human reasoning in the wild is often grounded in dynamic scenes across time. Thus, we consider videos a new medium for examining spatio-temporal ToM reasoning ability. Specifically, we ask explicit probing questions about videos with abundant social and emotional reasoning content. We develop a pipeline for multimodal LLM for ToM reasoning using video and text. We also enable explicit ToM reasoning by retrieving key frames for answering a ToM question, which reveals how multimodal LLMs reason about ToM.

6/21/2024

cs.CV cs.AI

OpenToM: A Comprehensive Benchmark for Evaluating Theory-of-Mind Reasoning Capabilities of Large Language Models

Hainiu Xu, Runcong Zhao, Lixing Zhu, Jinhua Du, Yulan He

Neural Theory-of-Mind (N-ToM), machine's ability to understand and keep track of the mental states of others, is pivotal in developing socially intelligent agents. However, prevalent N-ToM benchmarks have several shortcomings, including the presence of ambiguous and artificial narratives, absence of personality traits and preferences, a lack of questions addressing characters' psychological mental states, and limited diversity in the questions posed. In response to these issues, we construct OpenToM, a new benchmark for assessing N-ToM with (1) longer and clearer narrative stories, (2) characters with explicit personality traits, (3) actions that are triggered by character intentions, and (4) questions designed to challenge LLMs' capabilities of modeling characters' mental states of both the physical and psychological world. Using OpenToM, we reveal that state-of-the-art LLMs thrive at modeling certain aspects of mental states in the physical world but fall short when tracking characters' mental states in the psychological world.

6/4/2024

cs.AI cs.CL