ARN: Analogical Reasoning on Narratives

2310.00996

Published 4/24/2024 by Zhivar Sourati, Filip Ilievski, Pia Sommerauer, Yifan Jiang

Abstract

As a core cognitive skill that enables the transferability of information across domains, analogical reasoning has been extensively studied for both humans and computational models. However, while cognitive theories of analogy often focus on narratives and study the distinction between surface, relational, and system similarities, existing work in natural language processing has a narrower focus as far as relational analogies between word pairs. This gap brings a natural question: can state-of-the-art large language models (LLMs) detect system analogies between narratives? To gain insight into this question and extend word-based relational analogies to relational system analogies, we devise a comprehensive computational framework that operationalizes dominant theories of analogy, using narrative elements to create surface and system mappings. Leveraging the interplay between these mappings, we create a binary task and benchmark for Analogical Reasoning on Narratives (ARN), covering four categories of far (cross-domain)/near (within-domain) analogies and disanalogies. We show that while all LLMs can largely recognize near analogies, even the largest ones struggle with far analogies in a zero-shot setting, with GPT4.0 scoring below random. Guiding the models through solved examples and chain-of-thought reasoning enhances their analogical reasoning ability. Yet, since even in the few-shot setting, the best model only performs halfway between random and humans, ARN opens exciting directions for computational analogical reasoners.

Overview

• The paper investigates whether large language models (LLMs) can detect system analogies between narratives. Analogical reasoning is a core cognitive skill that enables transferring information across domains.
• The researchers create a comprehensive computational framework that operationalizes dominant theories of analogy, using narrative elements to create surface and system mappings.
• They benchmark the analogical reasoning abilities of LLMs on a binary task called Analogical Reasoning on Narratives (ARN), which covers four categories of far (cross-domain)/near (within-domain) analogies and disanalogies.
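
The four categories can be pictured as the combination of two yes/no properties of a candidate narrative relative to a query narrative. The sketch below is illustrative only and is not the authors' code. It assumes that "analogy" means the system mapping holds and that "near" means the two narratives also share surface elements. The names `Candidate`, `system_match`, and `surface_match` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    NEAR_ANALOGY = "near analogy"
    FAR_ANALOGY = "far analogy"
    NEAR_DISANALOGY = "near disanalogy"
    FAR_DISANALOGY = "far disanalogy"


@dataclass(frozen=True)
class Candidate:
    text: str
    system_match: bool   # assumption: candidate shares the query's system-level mapping
    surface_match: bool  # assumption: candidate shares surface elements with the query


def categorize(candidate: Candidate) -> Category:
    """Map the two mapping flags onto one of the four ARN-style categories."""
    if candidate.system_match:
        # System mapping holds: an analogy, near or far depending on surface overlap.
        return Category.NEAR_ANALOGY if candidate.surface_match else Category.FAR_ANALOGY
    # System mapping does not hold: a disanalogy, near or far by the same criterion.
    return Category.NEAR_DISANALOGY if candidate.surface_match else Category.FAR_DISANALOGY


if __name__ == "__main__":
    example = Candidate("a story from a different domain", system_match=True, surface_match=False)
    print(categorize(example).value)  # far analogy
```

Under this reading, a far analogy is the case where only the deeper system mapping links the two narratives, with no surface overlap to lean on.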

Plain English Explanation

• Analogical reasoning is the ability to recognize similarities between different concepts or situations, which allows us to apply knowledge from one domain to another.
• Cognitive theories of analogy often focus on narratives and distinguish between surface, relational, and system similarities. Existing natural language processing work has a narrower focus on relational analogies between word pairs.
• The researchers wanted to see whether state-of-the-art LLMs could detect system analogies between narratives, a more sophisticated form of analogical reasoning.
• They created a comprehensive framework that uses narrative elements to map out surface and system-level similarities between different stories. This allowed them to design a benchmark task called ARN that tests LLMs' ability to recognize near and far analogies, as well as disanalogies.
• The results showed that while LLMs can largely recognize near analogies, they struggle with far analogies in a zero-shot setting. Even with solved examples and chain-of-thought reasoning, the best model only performed halfway between random and human-level performance on the ARN task.
• This suggests that current LLMs still have limitations in system-level analogical reasoning, and it opens up directions for future research in this area.

Technical Explanation

• The researchers created a comprehensive computational framework that operationalizes dominant theories of analogy, using narrative elements to create surface and system mappings.
• They leveraged the interplay between these mappings to create a binary task and benchmark for Analogical Reasoning on Narratives (ARN), covering four categories of far (cross-domain)/near (within-domain) analogies and disanalogies.
• They evaluated the performance of various LLMs, including the largest ones, on this ARN benchmark in a zero-shot setting. The results showed that while LLMs can largely recognize near analogies, they struggle with far analogies, with even the largest model (GPT-4.0) scoring below random.
• The researchers then explored whether guiding the models through solved examples and chain-of-thought reasoning could enhance their analogical reasoning ability. While this did improve performance, the best model still only performed halfway between random and human-level on the ARN task.
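
To make "below random" and "halfway between random and human-level" concrete, here is a minimal scoring sketch. It is not the paper's evaluation code. It assumes a balanced two-way choice, so chance accuracy is 0.5. The function names and the numbers in the usage example are made up for illustration and are not results from the paper.

```python
from collections import defaultdict

RANDOM_BASELINE = 0.5  # assumption: chance accuracy on a balanced binary task


def accuracy_by_category(records):
    """Compute accuracy per category from (category, is_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, is_correct in records:
        totals[category] += 1
        hits[category] += int(is_correct)
    return {category: hits[category] / totals[category] for category in totals}


def position_between(model_acc, random_acc, human_acc):
    """Place a model on a scale where 0.0 is random level and 1.0 is human level."""
    if human_acc == random_acc:
        raise ValueError("human and random accuracy must differ")
    return (model_acc - random_acc) / (human_acc - random_acc)


if __name__ == "__main__":
    # Illustrative records only, not data from the paper.
    records = [
        ("near analogy", True),
        ("near analogy", True),
        ("far analogy", False),
        ("far analogy", True),
    ]
    print(accuracy_by_category(records))  # {'near analogy': 1.0, 'far analogy': 0.5}
    # A hypothetical model at 0.75 with humans at 1.0 sits halfway between chance and human.
    print(position_between(0.75, RANDOM_BASELINE, 1.0))  # 0.5
```

A negative value from `position_between` would correspond to a model scoring below random on that category.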

Critical Analysis

• The ARN benchmark represents an important step in evaluating the more sophisticated cognitive skill of system-level analogical reasoning in LLMs, going beyond the narrower focus of previous work on word-based relational analogies.
• However, the paper acknowledges that the task is still relatively simple compared to the nuanced and context-dependent nature of analogical reasoning in real-world narratives.
• Additionally, the researchers note that their framework for creating surface and system mappings, while comprehensive, may not fully capture the complexity of how humans perceive and reason about analogies in stories.
• Further research is needed to develop more robust and ecologically valid benchmarks for testing analogical reasoning in LLMs, as well as to explore other approaches for enhancing this cognitive skill, such as incorporating analogous instances or reasoning-as-retrieval strategies.

Conclusion

• This paper highlights the limitations of current LLMs in the more complex cognitive skill of system-level analogical reasoning, as demonstrated by their poor performance on the ARN benchmark.
• While the researchers' computational framework and benchmark represent an important step forward, the results suggest that there is still much work to be done to develop LLMs that can truly reason and behave like humans when it comes to analogical thinking.
• Overcoming these challenges could have significant implications for the broader field of artificial intelligence and its ability to transfer knowledge and reasoning skills across diverse domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Can language models learn analogical reasoning? Investigating training objectives and comparisons to human performance

Molly R. Petersen, Lonneke van der Plas

While analogies are a common way to evaluate word embeddings in NLP, it is also of interest to investigate whether or not analogical reasoning is a task in itself that can be learned. In this paper, we test several ways to learn basic analogical reasoning, specifically focusing on analogies that are more typical of what is used to evaluate analogical reasoning in humans than those in commonly used NLP benchmarks. Our experiments find that models are able to learn analogical reasoning, even with a small amount of data. We additionally compare our models to a dataset with a human baseline, and find that after training, models approach human performance.

5/6/2024

cs.CL

ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge Base

Siyu Yuan, Jiangjie Chen, Changzhi Sun, Jiaqing Liang, Yanghua Xiao, Deqing Yang

Analogical reasoning is a fundamental cognitive ability of humans. However, current language models (LMs) still struggle to achieve human-like performance in analogical reasoning tasks due to a lack of resources for model training. In this work, we address this gap by proposing ANALOGYKB, a million-scale analogy knowledge base (KB) derived from existing knowledge graphs (KGs). ANALOGYKB identifies two types of analogies from the KGs: 1) analogies of the same relations, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs), followed by minor human efforts for data quality control. Evaluations on a series of datasets of two analogical reasoning tasks (analogy recognition and generation) demonstrate that ANALOGYKB successfully enables both smaller LMs and LLMs to gain better analogical reasoning capabilities.

5/20/2024

cs.CL cs.AI

Semantic Structure-Mapping in LLM and Human Analogical Reasoning

Sam Musker, Alex Duchnowski, Raphael Millière, Ellie Pavlick

Analogical reasoning is considered core to human learning and cognition. Recent studies have compared the analogical reasoning abilities of human subjects and Large Language Models (LLMs) on abstract symbol manipulation tasks, such as letter string analogies. However, these studies largely neglect analogical reasoning over semantically meaningful symbols, such as natural language words. This ability to draw analogies that link language to non-linguistic domains, which we term semantic structure-mapping, is thought to play a crucial role in language acquisition and broader cognitive development. We test human subjects and LLMs on analogical reasoning tasks that require the transfer of semantic structure and content from one domain to another. Advanced LLMs match human performance across many task variations. However, humans and LLMs respond differently to certain task variations and semantic distractors. Overall, our data suggest that LLMs are approaching human-level performance on these important cognitive tasks, but are not yet entirely human like.

6/21/2024

cs.CL

Evidence from counterfactual tasks supports emergent analogical reasoning in large language models

Taylor Webb, Keith J. Holyoak, Hongjing Lu

We recently reported evidence that large language models are capable of solving a wide range of text-based analogy problems in a zero-shot manner, indicating the presence of an emergent capacity for analogical reasoning. Two recent commentaries have challenged these results, citing evidence from so-called 'counterfactual' tasks in which the standard sequence of the alphabet is arbitrarily permuted so as to decrease similarity with materials that may have been present in the language model's training data. Here, we reply to these critiques, clarifying some misunderstandings about the test materials used in our original work, and presenting evidence that language models are also capable of generalizing to these new counterfactual task variants.

5/1/2024

cs.CL cs.AI