Explicit Modelling of Theory of Mind for Belief Prediction in Nonverbal Social Interactions

Read original: arXiv:2407.06762 - Published 8/29/2024 by Matteo Bortoletto, Constantin Ruhdorfer, Lei Shi, Andreas Bulling

Overview

This paper explores explicit theory of mind (ToM) modeling for predicting human beliefs in nonverbal social interactions.
The researchers propose MToMnet, a neural network that encodes contextual cues (scene videos and object locations) and integrates them with person-specific cues (gaze and body language) in a separate MindNet for each person.
The model was evaluated on two real-world datasets, one focusing on belief prediction and the other on belief dynamics prediction, where it surpassed existing methods by a large margin while requiring significantly fewer parameters.

Plain English Explanation

In social interactions, we often try to understand what other people are thinking or believing, even when they don't express their thoughts verbally. This is known as having a "theory of mind" - the ability to attribute mental states, like beliefs, to others.

The researchers in this paper wanted to create an AI system that could do this kind of theory of mind reasoning to predict what people are thinking, based on their nonverbal behaviors (like their gaze or body language). They developed a neural network model, MToMnet, that was specifically designed to incorporate explicit theory of mind concepts, in order to better infer the beliefs of each person taking part in a social interaction.

The researchers tested their model on two real-world datasets of social interactions and found that it predicted beliefs better than existing approaches, including those that don't explicitly model theory of mind, while using fewer parameters. This suggests that incorporating this kind of reasoning into AI systems could help them better understand and interact with humans in social situations.

Technical Explanation

The paper presents MToMnet, a neural network architecture that explicitly models theory of mind (ToM) reasoning to predict beliefs and their dynamics during human social interactions. The model takes multimodal input: contextual cues (scene videos and object locations) and person-specific cues (human gaze and body language). It outputs predictions of each person's beliefs.
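The flow from cues to belief predictions can be illustrated with a toy example. The paper's abstract describes a separate MindNet per person that combines contextual cues with person-specific cues; everything else below is an assumption made for illustration, not the authors' implementation. The layer sizes, the random weights standing in for trained parameters, the concatenation of the two latents, and the names `MindNet` (as a class), `predict_beliefs`, `matvec`, and `random_matrix` are all hypothetical.

```python
import math
import random

# Hypothetical sizes, chosen only for illustration.
CONTEXT_DIM, PERSON_DIM, LATENT_DIM, N_BELIEFS = 6, 4, 5, 3


def random_matrix(rows, cols, rng):
    """Random weights standing in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MindNet:
    """Toy per-person network: encodes cues, then classifies beliefs."""

    def __init__(self, rng):
        self.encoder = random_matrix(LATENT_DIM, CONTEXT_DIM + PERSON_DIM, rng)
        # The classifier sees this person's latent plus the other person's.
        self.classifier = random_matrix(N_BELIEFS, 2 * LATENT_DIM, rng)

    def encode(self, context_cues, person_cues):
        """Combine shared context with this person's own cues."""
        hidden = matvec(self.encoder, context_cues + person_cues)
        return [math.tanh(h) for h in hidden]

    def predict(self, own_latent, other_latent):
        """Fusion by concatenation is an assumption of this sketch."""
        return softmax(matvec(self.classifier, own_latent + other_latent))


def predict_beliefs(context_cues, cues_a, cues_b, seed=0):
    """Return one belief distribution per person for a two-person scene."""
    rng = random.Random(seed)
    net_a, net_b = MindNet(rng), MindNet(rng)
    latent_a = net_a.encode(context_cues, cues_a)
    latent_b = net_b.encode(context_cues, cues_b)
    return net_a.predict(latent_a, latent_b), net_b.predict(latent_b, latent_a)
```

Calling `predict_beliefs([0.1] * CONTEXT_DIM, [0.2] * PERSON_DIM, [0.7] * PERSON_DIM)` returns two probability vectors of length `N_BELIEFS`, one per person. In a real system the feature vectors would come from video, object, gaze, and pose encoders, and the weights would be learned.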

The key innovation is explicit ToM modeling inside the network. Each person in the interaction gets a separate MindNet, which integrates the shared contextual cues with that person's own gaze and body language. Inspired by prior research on social cognition and computational ToM, the authors propose three MToMnet variants: two involve fusion of latent representations, and one involves re-ranking of classification scores.
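One of the three MToMnet variants listed in the paper's abstract re-ranks classification scores. The abstract does not state the re-ranking rule, so the sketch below only illustrates the general idea: one person's belief scores are adjusted using the other person's scores before the final ranking. The additive blend, the `weight` parameter, and the names `rerank_scores` and `ranking` are assumptions, not the authors' method.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rerank_scores(own_scores, other_scores, weight=0.5):
    """Blend one person's class scores with the other person's.

    Hypothetical rule: add a weighted copy of the other person's raw
    scores, then renormalise with a softmax.
    """
    if len(own_scores) != len(other_scores):
        raise ValueError("score vectors must have the same length")
    blended = [a + weight * b for a, b in zip(own_scores, other_scores)]
    return softmax(blended)


def ranking(probs):
    """Class indices ordered from most to least likely."""
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)


# Made-up scores: person A slightly prefers class 0, person B favours class 1.
own = [2.0, 1.8, 0.1]
other = [0.0, 1.0, 0.0]

before = ranking(softmax(own))
after = ranking(rerank_scores(own, other))
```

With these made-up scores, class 0 ranks first before re-ranking and class 1 ranks first afterwards, because the other person's evidence tips a close decision. Setting `weight=0` leaves the original ranking unchanged.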

The researchers evaluated their model on two challenging real-world datasets, one focusing on belief prediction and the other on belief dynamics prediction. According to the authors, MToMnet surpasses existing methods by a large margin while requiring a significantly smaller number of parameters, which demonstrates the value of incorporating explicit ToM reasoning.

Critical Analysis

The paper makes a compelling case for the importance of modeling theory of mind in AI systems that need to understand and interact with humans in social settings. The explicit ToM module is a novel architectural contribution that could have applications beyond the specific belief prediction task studied here.

However, the paper does not delve deeply into the inner workings of the ToM module or provide a detailed analysis of how it contributes to the model's performance. More insight into the specific ToM reasoning processes and how they interact with the other components of the network would be valuable.

Additionally, the evaluation covers only two datasets. Testing the model on a wider range of social interaction scenarios, including more diverse nonverbal cues and belief dynamics, would help strengthen the generalizability of the findings.

Further research could also explore how this ToM-augmented approach might be combined with other techniques, such as those in "Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models" or "Language Models Represent Beliefs of Self and Others" (both listed under Related Papers), to provide a more comprehensive understanding of social cognition in AI systems.

Conclusion

This paper presents a novel neural network architecture that explicitly models theory of mind reasoning to improve the prediction of beliefs in nonverbal social interactions. The results demonstrate the value of incorporating this cognitive capacity into AI systems that need to understand and interact with humans in social settings.

While further research is needed to fully explore the potential of this approach, the work represents an important step forward in bridging the gap between human-level social intelligence and the capabilities of current AI systems. As AI becomes increasingly integrated into our daily lives, the ability to reason about the beliefs and mental states of others will be crucial for building systems that can engage in more natural and effective social interactions.

Related Papers

Explicit Modelling of Theory of Mind for Belief Prediction in Nonverbal Social Interactions

Matteo Bortoletto, Constantin Ruhdorfer, Lei Shi, Andreas Bulling

We propose MToMnet - a Theory of Mind (ToM) neural network for predicting beliefs and their dynamics during human social interactions from multimodal input. ToM is key for effective nonverbal human communication and collaboration, yet, existing methods for belief modelling have not included explicit ToM modelling or have typically been limited to one or two modalities. MToMnet encodes contextual cues (scene videos and object locations) and integrates them with person-specific cues (human gaze and body language) in a separate MindNet for each person. Inspired by prior research on social cognition and computational ToM, we propose three different MToMnet variants: two involving fusion of latent representations and one involving re-ranking of classification scores. We evaluate our approach on two challenging real-world datasets, one focusing on belief prediction, while the other examining belief dynamics prediction. Our results demonstrate that MToMnet surpasses existing methods by a large margin while at the same time requiring a significantly smaller number of parameters. Taken together, our method opens up a highly promising direction for future work on artificial intelligent systems that can robustly predict human beliefs from their non-verbal behaviour and, as such, more effectively collaborate with humans.

8/29/2024

Learning mental states estimation through self-observation: a developmental synergy between intentions and beliefs representations in a deep-learning model of Theory of Mind

Francesca Bianco, Silvia Rigato, Maria Laura Filippetti, Dimitri Ognibene

Theory of Mind (ToM), the ability to attribute beliefs, intentions, or mental states to others, is a crucial feature of human social interaction. In complex environments, where the human sensory system reaches its limits, behaviour is strongly driven by our beliefs about the state of the world around us. Accessing others' mental states, e.g., beliefs and intentions, allows for more effective social interactions in natural contexts. Yet, these variables are not directly observable, making understanding ToM a challenging quest of interest for different fields, including psychology, machine learning and robotics. In this paper, we contribute to this topic by showing a developmental synergy between learning to predict low-level mental states (e.g., intentions, goals) and attributing high-level ones (i.e., beliefs). Specifically, we assume that learning beliefs attribution can occur by observing one's own decision processes involving beliefs, e.g., in a partially observable environment. Using a simple feed-forward deep learning model, we show that, when learning to predict others' intentions and actions, more accurate predictions can be acquired earlier if beliefs attribution is learnt simultaneously. Furthermore, we show that the learning performance improves even when observed actors have a different embodiment than the observer and the gain is higher when observing beliefs-driven chunks of behaviour. We propose that our computational approach can inform the understanding of human social cognitive development and be relevant for the design of future adaptive social robots able to autonomously understand, assist, and learn from human interaction partners in novel natural environments and tasks.

7/26/2024

Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models

Chani Jung, Dongkwan Kim, Jiho Jin, Jiseon Kim, Yeon Seonwoo, Yejin Choi, Alice Oh, Hyunwoo Kim

While humans naturally develop theory of mind (ToM), the capability to understand other people's mental states and beliefs, state-of-the-art large language models (LLMs) underperform on simple ToM benchmarks. We posit that we can extend our understanding of LLMs' ToM abilities by evaluating key human ToM precursors -- perception inference and perception-to-belief inference -- in LLMs. We introduce two datasets, Percept-ToMi and Percept-FANToM, to evaluate these precursory inferences for ToM in LLMs by annotating characters' perceptions on ToMi and FANToM, respectively. Our evaluation of eight state-of-the-art LLMs reveals that the models generally perform well in perception inference while exhibiting limited capability in perception-to-belief inference (e.g., lack of inhibitory control). Based on these results, we present PercepToM, a novel ToM method leveraging LLMs' strong perception inference capability while supplementing their limited perception-to-belief inference. Experimental results demonstrate that PercepToM significantly enhances LLM's performance, especially in false belief scenarios.

7/10/2024

Language Models Represent Beliefs of Self and Others

Wentao Zhu, Zhining Zhang, Yizhou Wang

Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning. While Large Language Models (LLMs) appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we discover that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of language models, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks that involve different causal inference patterns, suggesting the potential generalizability of these representations.

5/31/2024