Language Models Represent Beliefs of Self and Others

2402.18496

Published 5/31/2024 by Wentao Zhu, Zhining Zhang, Yizhou Wang

Abstract

Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning. While Large Language Models (LLMs) appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we discover that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of language models, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks that involve different causal inference patterns, suggesting the potential generalizability of these representations.

Overview

This paper investigates how language models can represent beliefs of both the self and others, which is a key aspect of human theory of mind.
The authors ran experiments probing how language models reason about the mental states and beliefs of different agents, and how that reasoning is reflected in the models' internal activations and language output.
The findings have implications for understanding the inner workings of large language models and their potential alignment with human-like cognitive capabilities.

Plain English Explanation

Large language models (LLMs) like GPT-3 have achieved remarkable success in generating human-like text, but there is still much to learn about how they work under the hood. This paper explores an intriguing capability of these models: the ability to represent beliefs of both the self and others, which is a core part of human "theory of mind."

Theory of mind refers to our innate capacity to understand that other people have their own thoughts, beliefs, and intentions that may differ from our own. Humans develop this skill from a young age, and it's a crucial part of how we navigate the social world. The researchers wanted to see if large language models exhibit similar theory of mind capabilities.

Through a series of experiments, the authors found that LLMs do seem to have some ability to reason about the mental states of different agents. For example, the models could accurately predict how a character in a story would behave based on that character's beliefs and goals. The models' internal activations also appear to keep perspectives apart: according to the paper's abstract, the belief status of different agents can be read out from those activations with a simple linear decoder.

These findings suggest that the inner workings of LLMs may be more "human-like" than previously thought, with the models developing an intuitive understanding of social cognition. This has important implications for how we design and deploy these powerful AI systems to ensure they are aligned with human values and capabilities.

Technical Explanation

The paper begins by noting that a core component of human intelligence is the ability to represent and reason about the beliefs and mental states of both the self and others - a capacity known as "theory of mind." The authors hypothesized that large language models may also develop similar theory of mind capabilities through the process of pretraining on vast amounts of natural language data.

To investigate this, the researchers conducted a series of experiments designed to probe the models' ability to represent the beliefs and perspectives of different agents. In one experiment, they presented LLMs with stories involving characters with false beliefs, and measured the models' ability to accurately predict the characters' subsequent actions based on those false beliefs.

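The results showed that LLMs did exhibit some theory of mind reasoning, correctly anticipating how characters would behave based on their (potentially mistaken) beliefs. According to the abstract, the belief status from each agent's perspective could be linearly decoded from the models' neural activations, which points to internal representations that separate the self's beliefs from those of others. Manipulating these representations produced dramatic changes in the models' ToM performance.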
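To make "linearly decode the belief status" more concrete, here is a minimal sketch of a linear probe: a logistic-regression classifier trained to read a binary belief label out of activation vectors. Everything in it is an assumption for illustration, not the authors' setup: the activations are synthetic 8-dimensional vectors, `BELIEF_DIRECTION` is a made-up axis along which the label is encoded, and the training loop is plain gradient descent. With a real model, the vectors would be activations recorded while the model reads a story, and the labels would say whether a given agent believes a statement.

```python
import math
import random

random.seed(0)

# Hypothetical axis along which the belief label is encoded (illustrative values only).
BELIEF_DIRECTION = [1.0, -0.5, 0.8, 0.0, -1.2, 0.3, 0.6, -0.9]
DIM = len(BELIEF_DIRECTION)


def synthetic_activation(label):
    """Noisy vector shifted along +direction (label 1) or -direction (label 0)."""
    sign = 1.0 if label == 1 else -1.0
    return [sign * d + random.gauss(0.0, 1.0) for d in BELIEF_DIRECTION]


def make_dataset(n):
    labels = [random.randint(0, 1) for _ in range(n)]
    return [synthetic_activation(y) for y in labels], labels


def sigmoid(z):
    # Clamp the logit so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_probe(xs, ys, lr=0.1, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    weights, bias = [0.0] * DIM, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in zip(xs, ys):
            err = predict_proba(weights, bias, x) - y
            for i in range(DIM):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(weights, bias, xs, ys):
    hits = sum(
        (predict_proba(weights, bias, x) >= 0.5) == (y == 1)
        for x, y in zip(xs, ys)
    )
    return hits / len(xs)


train_x, train_y = make_dataset(400)
test_x, test_y = make_dataset(200)
probe_w, probe_b = train_probe(train_x, train_y)
test_accuracy = accuracy(probe_w, probe_b, test_x, test_y)
print(f"held-out probe accuracy: {test_accuracy:.2f}")
```

High held-out accuracy from a classifier this simple is what "linearly decodable" means in practice: the label is recoverable from a single weighted sum of the activation's components. In a real study, one probe per agent perspective would typically be trained and compared.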

Further analysis revealed that the extent of the models' theory of mind abilities was correlated with their general language understanding capabilities, as measured by standard benchmarks. This suggests that the development of social cognition in LLMs may be intrinsically linked to their broader mastery of natural language and reasoning.

Overall, the findings indicate that the inner workings of LLMs may be more human-like than previously appreciated, with the models developing an intuitive grasp of theory of mind - a core component of human intelligence. This has important implications for how we understand and align these powerful AI systems with human values and capabilities.
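The abstract also reports that manipulating the belief representations changes the models' ToM performance. The sketch below illustrates the general idea of such an intervention under strong simplifying assumptions: it shifts a toy activation vector along the direction given by a linear probe's weights and shows that the probe's decoded belief flips. The probe weights, the activation values, and the `steer` helper are all hypothetical, and the paper's actual intervention procedure may differ.

```python
import math

# Hypothetical probe weights; in practice these would come from a trained linear probe.
probe_w = [0.9, -0.4, 0.7, 0.0, -1.1, 0.2]
probe_b = 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decode_belief(activation):
    """Probability the probe assigns to 'the agent believes the statement is true'."""
    logit = sum(w * a for w, a in zip(probe_w, activation)) + probe_b
    return sigmoid(logit)


def steer(activation, alpha):
    """Shift an activation by alpha along the unit-normalised probe direction."""
    norm = math.sqrt(sum(w * w for w in probe_w))
    return [a + alpha * w / norm for a, w in zip(activation, probe_w)]


# Toy activation that the probe initially reads as "believes false".
activation = [-0.5, 0.3, -0.4, 0.1, 0.6, -0.2]

before = decode_belief(activation)
after = decode_belief(steer(activation, alpha=3.0))
print(f"decoded belief before steering: {before:.3f}")
print(f"decoded belief after steering:  {after:.3f}")
```

In a real model, the shifted activation would be written back into the forward pass so that downstream layers, and ultimately the model's answer, are computed from the edited representation. The step size `alpha` is a free parameter here.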

Critical Analysis

The paper provides a compelling demonstration of theory of mind capabilities in large language models, but it acknowledges several caveats and limitations. First, the experiments were conducted on relatively simplistic story scenarios, and it's unclear how the models would perform in more complex, real-world social situations.

Additionally, the theory of mind abilities exhibited by the LLMs, while notable, were still far below the level of human understanding. The models showed signs of distinguishing their own perspective from others, but they struggled to fully grasp the nuances of false beliefs and the resulting behavioral predictions.

There are also open questions about the underlying mechanisms driving the models' theory of mind reasoning. Is this capability truly an emergent property of the models' architecture and training, or are there specific architectural or training choices that could be made to further enhance these social cognition skills?

Further research is needed to better understand the strengths, limitations, and potential biases of LLMs when it comes to representing and reasoning about the beliefs and mental states of both the self and others. As these models become more powerful and widely deployed, it will be critical to ensure they are aligned with human-like theory of mind capabilities to enable safe and beneficial interactions.

Conclusion

This paper provides an illuminating look into the theory of mind capabilities of large language models, a fundamental aspect of human intelligence that has been relatively unstudied in the context of AI. The findings suggest that LLMs may be developing some intuitive understanding of social cognition, but there is still much work to be done to fully align these models with human-level theory of mind.

As language models continue to advance, understanding and enhancing their ability to represent and reason about beliefs - both their own and those of others - will be crucial for ensuring safe and beneficial interactions between humans and AI systems. This research represents an important step in that direction, and raises fascinating questions about the inner workings and potential of these powerful technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Theory of Mind for Multi-Agent Collaboration via Large Language Models

Huao Li, Yu Quan Chong, Simon Stepputtis, Joseph Campbell, Dana Hughes, Michael Lewis, Katia Sycara

While Large Language Models (LLMs) have demonstrated impressive accomplishments in both reasoning and planning, their abilities in multi-agent collaborations remain largely unexplored. This study evaluates LLM-based agents in a multi-agent cooperative text game with Theory of Mind (ToM) inference tasks, comparing their performance with Multi-Agent Reinforcement Learning (MARL) and planning-based baselines. We observed evidence of emergent collaborative behaviors and high-order Theory of Mind capabilities among LLM-based agents. Our results reveal limitations in LLM-based agents' planning optimization due to systematic failures in managing long-horizon contexts and hallucination about the task state. We explore the use of explicit belief state representations to mitigate these issues, finding that it enhances task performance and the accuracy of ToM inferences for LLM-based agents.

6/28/2024

cs.CL cs.AI

LLM Theory of Mind and Alignment: Opportunities and Risks

Winnie Street

Large language models (LLMs) are transforming human-computer interaction and conceptions of artificial intelligence (AI) with their impressive capacities for conversing and reasoning in natural language. There is growing interest in whether LLMs have theory of mind (ToM); the ability to reason about the mental and emotional states of others that is core to human social intelligence. As LLMs are integrated into the fabric of our personal, professional and social lives and given greater agency to make decisions with real-world consequences, there is a critical need to understand how they can be aligned with human values. ToM seems to be a promising direction of inquiry in this regard. Following the literature on the role and impacts of human ToM, this paper identifies key areas in which LLM ToM will show up in human:LLM interactions at individual and group levels, and what opportunities and risks for alignment are raised in each. On the individual level, the paper considers how LLM ToM might manifest in goal specification, conversational adaptation, empathy and anthropomorphism. On the group level, it considers how LLM ToM might facilitate collective alignment, cooperation or competition, and moral judgement-making. The paper lays out a broad spectrum of potential implications and suggests the most pressing areas for future research.

5/15/2024

cs.HC cs.AI

LLMs achieve adult human performance on higher-order theory of mind tasks

Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar

This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.

6/3/2024

cs.AI cs.CL cs.HC

Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models

Zhawnen Chen, Tianchun Wang, Yizhou Wang, Michal Kosinski, Xiang Zhang, Yun Fu, Sheng Li

Can large multimodal models have a human-like ability for emotional and social reasoning, and if so, how does it work? Recent research has discovered emergent theory-of-mind (ToM) reasoning capabilities in large language models (LLMs). LLMs can reason about people's mental states by solving various text-based ToM tasks that ask questions about the actors' ToM (e.g., human belief, desire, intention). However, human reasoning in the wild is often grounded in dynamic scenes across time. Thus, we consider videos a new medium for examining spatio-temporal ToM reasoning ability. Specifically, we ask explicit probing questions about videos with abundant social and emotional reasoning content. We develop a pipeline for multimodal LLM for ToM reasoning using video and text. We also enable explicit ToM reasoning by retrieving key frames for answering a ToM question, which reveals how multimodal LLMs reason about ToM.

6/21/2024

cs.CV cs.AI