ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models

Read original: arXiv:2404.15515 - Published 6/27/2024 by Weizhi Tang, Vaishak Belle


Overview

This paper introduces ToM-LM, an approach that pairs a large language model (LLM) with an external symbolic executor to improve theory of mind (ToM) reasoning.
The key idea is to have the LLM translate a ToM problem from natural language into a symbolic formulation and to delegate the reasoning itself to the executor, which the paper's abstract identifies as the SMCDEL model checker.
According to the abstract, the LLM is first fine-tuned on pairs of natural-language and symbolic representations of ToM problems, and ToM-LM shows a significant improvement over all of the baselines the authors constructed.

Plain English Explanation

The paper discusses a new way to make large language models (LLMs) better at something called "theory of mind" (ToM) reasoning. ToM is the ability to understand and reason about the mental states of other agents, like their beliefs, desires, and intentions.

The researchers created a system called ToM-LM that combines an LLM with an external symbolic "executor", a model checker called SMCDEL. The LLM does what it is good at, reading a problem stated in everyday language and rewriting it in a formal notation. The executor then does the heavy lifting of working out who believes what.

To test this system, the researchers compared ToM-LM against a set of baselines that they constructed.

By delegating the ToM reasoning to the external executor, ToM-LM improved significantly over all of those baselines, according to the paper's abstract. This suggests that combining LLMs with specialized reasoning modules could be a promising direction for developing AI systems with better theory of mind abilities, with the added benefit that the executor's reasoning is transparent and verifiable.

Technical Explanation

The paper introduces ToM-LM, an approach that pairs a large language model (LLM) with an external symbolic executor, specifically the SMCDEL model checker, to improve theory of mind (ToM) reasoning.

The key design choice is to separate translation from reasoning. The LLM is first fine-tuned on pairs of natural-language and symbolic representations of ToM problems, and is then instructed, with a one-shot in-context example, to generate the symbolic formulation of a new problem. The SMCDEL model checker executes that formulation and returns the final result. This lets the system draw on the strengths of both components: the LLM's fluency in natural language and the model checker's transparent, verifiable logical inference about mental states.
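The division of labor described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tuple encoding of formulas, the `KripkeModel` class, the muddy-forehead scenario, and the `translate` lookup table (a hypothetical stand-in for the fine-tuned LLM) are not taken from the paper, and the toy `holds` function is an explicit-state epistemic checker that does not reproduce SMCDEL's input language or algorithms.

```python
from dataclasses import dataclass

# Formulas are nested tuples (an illustrative encoding, not SMCDEL syntax):
#   ("atom", name) | ("not", f) | ("and", f, g) | ("knows", agent, f)


@dataclass(frozen=True)
class KripkeModel:
    valuation: dict  # world -> set of atoms that are true in that world
    partition: dict  # agent -> list of sets of worlds the agent cannot tell apart


def holds(model, world, formula):
    """Toy symbolic executor: is `formula` true at `world` in `model`?"""
    op = formula[0]
    if op == "atom":
        return formula[1] in model.valuation[world]
    if op == "not":
        return not holds(model, world, formula[1])
    if op == "and":
        return holds(model, world, formula[1]) and holds(model, world, formula[2])
    if op == "knows":
        _, agent, inner = formula
        # An agent knows `inner` if it is true in every world the agent
        # considers possible from `world`.
        cell = next(c for c in model.partition[agent] if world in c)
        return all(holds(model, w, inner) for w in cell)
    raise ValueError(f"unknown operator: {op!r}")


# Assumed scenario: Alice may have mud on her forehead.
# Bob can see her forehead; Alice cannot.
MODEL = KripkeModel(
    valuation={"w_muddy": {"muddy"}, "w_clean": set()},
    partition={
        "alice": [{"w_muddy", "w_clean"}],  # Alice cannot tell the worlds apart
        "bob": [{"w_muddy"}, {"w_clean"}],  # Bob can
    },
)
ACTUAL_WORLD = "w_muddy"
MUDDY = ("atom", "muddy")

# Hypothetical stand-in for the fine-tuned LLM: a fixed lookup from
# natural-language questions to symbolic formulas.
CANNED_TRANSLATIONS = {
    "Does Bob know that Alice is muddy?": ("knows", "bob", MUDDY),
    "Does Alice know that she is muddy?": ("knows", "alice", MUDDY),
    "Does Bob know that Alice does not know she is muddy?":
        ("knows", "bob", ("not", ("knows", "alice", MUDDY))),
}


def translate(question):
    """Language step: natural language in, symbolic formulation out."""
    return CANNED_TRANSLATIONS[question]


def answer(question):
    """Full pipeline: translate, then delegate the reasoning to the executor."""
    return holds(MODEL, ACTUAL_WORLD, translate(question))
```

With this toy model, `answer` returns `True`, `False`, and `True` for the three questions in the table, in that order. The point of the design is that the final answer comes from a deterministic check over an explicit formula, so a reader can inspect the formula and re-run the check instead of trusting free-form LLM reasoning.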

To evaluate ToM-LM, the authors constructed a set of baselines and compared them with the full pipeline on ToM problems.

The authors report that ToM-LM shows a significant improvement over all of the constructed baselines. They present the work as a new view on externalizing one particular component of ToM reasoning, mainly reasoning about beliefs, and suggest generalizing it to other aspects of ToM reasoning.

Critical Analysis

The paper presents a compelling approach to improving the theory of mind reasoning of large language models by delegating belief reasoning to an external symbolic executor. This hybrid design leverages the strengths of both components and, according to the abstract, improves significantly over all of the constructed baselines.

However, the approach has limitations and leaves room for further research. For example, it relies on an existing symbolic executor (SMCDEL) and on fine-tuning data that pairs natural-language problems with symbolic formulations, neither of which may be available for other kinds of ToM reasoning. Developing methods to obtain such formulations with less supervision would be an important next step.

Additionally, the abstract describes the externalized component as mainly reasoning about beliefs, which does not capture the full breadth of ToM reasoning required in real-world situations. Extending the approach to other mental states, such as desires and intentions, in line with the authors' suggestion to generalize it, would further stress-test ToM-LM and similar systems.

Finally, while the model checker's execution is described as transparent and verifiable, the same does not automatically hold for the LLM's translation step. Ensuring that the generated symbolic formulation can be inspected and checked against the original problem will be crucial, especially in high-stakes applications.

Conclusion

The ToM-LM approach presented in this paper is a promising step towards large language models with stronger theory of mind reasoning. By having a fine-tuned LLM produce a symbolic formulation of a ToM problem and delegating its execution to the SMCDEL model checker, the system improves significantly over all of the baselines the authors constructed, while keeping the reasoning step transparent and verifiable.

This hybrid approach highlights the potential benefits of combining the language understanding and generation capabilities of LLMs with specialized reasoning modules. As the field of AI continues to advance, such neuro-symbolic integration could play a key role in creating more intelligent and socially aware systems that can better understand and interact with the world around them.



Related Papers

ToM-LM: Delegating Theory of Mind Reasoning to External Symbolic Executors in Large Language Models

Weizhi Tang, Vaishak Belle

Theory of Mind (ToM) refers to the ability of individuals to attribute mental states to others. While Large Language Models (LLMs) have shown some promise with ToM ability, they still struggle with complex ToM reasoning. Our approach leverages an external symbolic executor, specifically the SMCDEL model checker, and fine-tuning to improve the ToM reasoning ability of LLMs. In our approach, an LLM is first fine-tuned through pairs of natural language and symbolic formulation representation of ToM problems and is then instructed to generate the symbolic formulation with a one-shot in-context example. The generated symbolic formulation is then executed by the SMCDEL model checker to perform transparent and verifiable ToM reasoning and give the final result. We demonstrate that our approach, ToM-LM, shows a significant improvement over all the constructed baselines. Our study proposes a novel view about externalizing a particular component of ToM reasoning, mainly reasoning about beliefs, and suggests generalizing it to other aspects of ToM reasoning.

6/27/2024

Language Models Represent Beliefs of Self and Others

Wentao Zhu, Zhining Zhang, Yizhou Wang

Understanding and attributing mental states, known as Theory of Mind (ToM), emerges as a fundamental capability for human social reasoning. While Large Language Models (LLMs) appear to possess certain ToM abilities, the mechanisms underlying these capabilities remain elusive. In this study, we discover that it is possible to linearly decode the belief status from the perspectives of various agents through neural activations of language models, indicating the existence of internal representations of self and others' beliefs. By manipulating these representations, we observe dramatic changes in the models' ToM performance, underscoring their pivotal role in the social reasoning process. Additionally, our findings extend to diverse social reasoning tasks that involve different causal inference patterns, suggesting the potential generalizability of these representations.

5/31/2024

Do LLMs Exhibit Human-Like Reasoning? Evaluating Theory of Mind in LLMs for Open-Ended Responses

Maryam Amirizaniani, Elias Martin, Maryna Sivachenko, Afra Mashhadi, Chirag Shah

Theory of Mind (ToM) reasoning entails recognizing that other individuals possess their own intentions, emotions, and thoughts, which is vital for guiding one's own thought processes. Although large language models (LLMs) excel in tasks such as summarization, question answering, and translation, they still face challenges with ToM reasoning, especially in open-ended questions. Despite advancements, the extent to which LLMs truly understand ToM reasoning and how closely it aligns with human ToM reasoning remains inadequately explored in open-ended scenarios. Motivated by this gap, we assess the abilities of LLMs to perceive and integrate human intentions and emotions into their ToM reasoning processes within open-ended questions. Our study utilizes posts from Reddit's ChangeMyView platform, which demands nuanced social reasoning to craft persuasive responses. Our analysis, comparing semantic similarity and lexical overlap metrics between responses generated by humans and LLMs, reveals clear disparities in ToM reasoning capabilities in open-ended questions, with even the most advanced models showing notable limitations. To enhance LLM capabilities, we implement a prompt tuning method that incorporates human intentions and emotions, resulting in improvements in ToM reasoning performance. However, despite these improvements, the enhancement still falls short of fully achieving human-like reasoning. This research highlights the deficiencies in LLMs' social reasoning and demonstrates how integrating human intentions and emotions can boost their effectiveness.

6/11/2024

LLMs achieve adult human performance on higher-order theory of mind tasks

Winnie Street, John Oliver Siy, Geoff Keeling, Adrien Baranes, Benjamin Barnett, Michael McKibben, Tatenda Kanyere, Alison Lentz, Blaise Aguera y Arcas, Robin I. M. Dunbar

This paper examines the extent to which large language models (LLMs) have developed higher-order theory of mind (ToM); the human ability to reason about multiple mental and emotional states in a recursive manner (e.g. I think that you believe that she knows). This paper builds on prior work by introducing a handwritten test suite -- Multi-Order Theory of Mind Q&A -- and using it to compare the performance of five LLMs to a newly gathered adult human benchmark. We find that GPT-4 and Flan-PaLM reach adult-level and near adult-level performance on ToM tasks overall, and that GPT-4 exceeds adult performance on 6th order inferences. Our results suggest that there is an interplay between model size and finetuning for the realisation of ToM abilities, and that the best-performing LLMs have developed a generalised capacity for ToM. Given the role that higher-order ToM plays in a wide range of cooperative and competitive human behaviours, these findings have significant implications for user-facing LLM applications.

6/3/2024