UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

arXiv:2311.08469. Published 5/2/2024 by Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

Overview

This paper explores the task of "uncommonsense abductive reasoning": given a context with an unusual, unexpected, or unlikely outcome, generate an explanation that makes that outcome more likely.
The researchers curate a new English language corpus called "UNcommonsense" to investigate this ability, which differs from existing commonsense reasoning datasets focused on everyday situations.
They compare human explainers with the best-performing large language models and find that human-written explanations enhanced with model outputs achieve the highest quality.
The paper also experiments with several imitation learning algorithms for training open and accessible language models on this task. Compared with standard supervised fine-tuning, these methods reduce lose rates in human evaluations on both common and uncommonsense abductive reasoning.

Plain English Explanation

The paper focuses on a type of reasoning called "uncommonsense abductive reasoning." This involves trying to explain surprising or unlikely events, rather than the more common everyday situations that existing commonsense reasoning research has looked at.

To study this, the researchers created a new dataset called "UNcommonsense" with examples of unusual, unexpected, and unlikely scenarios. They then compared how well large language models and human explainers could come up with explanations that made these unexpected outcomes seem more plausible.

The key finding is that the best results came from a hybrid approach, where human-written explanations were enhanced using outputs from the language models. According to the authors, these combined explanations reached the highest quality by trading off specificity against diversity.

The researchers also experimented with several imitation learning algorithms to teach open, accessible language models this kind of uncommonsense reasoning. Compared with the standard supervised fine-tuning approach, these methods lowered lose rates in human evaluations on both common and uncommonsense abductive reasoning, suggesting a promising direction for building more capable open reasoning systems.

Technical Explanation

The paper introduces the task of "uncommonsense abductive reasoning," where the goal is to generate an explanation that makes an unexpected or unlikely outcome more plausible given some context. This differs from existing work on commonsense reasoning, which has focused on more common, everyday situations.
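One way to make the task criterion concrete: for a context c and an unexpected outcome o, a good explanation E should satisfy P(o | c, E) > P(o | c). The sketch below checks that inequality. It is an illustration under stated assumptions, not the paper's method or data: `toy_likelihood` is a hypothetical word-overlap stand-in for a language model's likelihood, and the bake-sale story and all function names are made up.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Assumption: a likelihood estimator maps (text, outcome) to an estimate of
# P(outcome | text). A real system would use a language model here.
LikelihoodFn = Callable[[str, str], float]


@dataclass
class AbductiveExample:
    context: str  # the situation leading up to the outcome
    outcome: str  # the unexpected outcome to be explained


def words(text: str) -> set[str]:
    """Lowercased word set; punctuation is dropped."""
    return set(re.findall(r"[a-z']+", text.lower()))


def toy_likelihood(text: str, outcome: str) -> float:
    """Hypothetical stand-in for a language model: a smoothed word-overlap score.

    It is NOT a real probability model; it only rises when the text shares
    more words with the outcome.
    """
    target = words(outcome)
    overlap = len(words(text) & target)
    return (overlap + 1) / (len(target) + 2)


def explanation_helps(ex: AbductiveExample, explanation: str,
                      likelihood: LikelihoodFn = toy_likelihood) -> bool:
    """True if appending the explanation raises the estimated likelihood of the outcome."""
    baseline = likelihood(ex.context, ex.outcome)
    with_explanation = likelihood(f"{ex.context} {explanation}", ex.outcome)
    return with_explanation > baseline


# Hypothetical example, not taken from the UNcommonsense corpus.
ex = AbductiveExample(
    context="Jenny baked cookies for the school bake sale.",
    outcome="None of the students bought a single cookie.",
)
print(explanation_helps(ex, "The school had banned students from buying sweets that week."))  # True
print(explanation_helps(ex, "It was raining."))  # False
```

The scorer is deliberately crude (it even rewards the shared word "the"); the point is the comparison structure, which stays the same if `likelihood` is swapped for a model-based estimate.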

To study this, the researchers curate a new dataset called "UNcommonsense" containing examples of unusual scenarios with unexpected outcomes. They then compare the performance of large language models and human explainers on this task, finding that model-augmented human-written explanations achieve the highest quality.

Additionally, the paper experiments with several imitation learning algorithms to train open and accessible language models on this task. Compared with vanilla supervised fine-tuning, these methods consistently reduce lose rates, as judged by human evaluators, on both common and uncommonsense abductive reasoning.
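The comparison is reported as lose rates from human judgments. This summary does not spell out the exact evaluation protocol, so the sketch below assumes a simple definition: the fraction of pairwise comparisons in which a system's explanation was judged worse than the explanation it was compared against, with ties counted in the denominator. Lower is better. The labels, function name, and judgment lists are hypothetical.

```python
from collections import Counter
from typing import Iterable

VALID = {"win", "tie", "lose"}


def lose_rate(judgments: Iterable[str]) -> float:
    """Fraction of pairwise human judgments in which the system's explanation lost.

    Each judgment compares one system output against a comparison explanation
    and is recorded as "win", "tie", or "lose" from the system's point of view.
    """
    counts = Counter(judgments)
    unknown = set(counts) - VALID
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no judgments to score")
    return counts["lose"] / total


# Made-up judgments for two hypothetical systems, for illustration only.
system_a = ["lose", "win", "tie", "lose"]
system_b = ["win", "tie", "lose", "win"]
print(lose_rate(system_a))  # 0.5
print(lose_rate(system_b))  # 0.25
```

Counting ties in the denominator is one design choice among several; a protocol that excludes ties would give different numbers from the same judgments.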

Critical Analysis

The paper provides a valuable contribution by introducing the "uncommonsense abductive reasoning" task and the associated UNcommonsense dataset. This expands on the typical commonsense reasoning benchmarks, which tend to focus on more commonplace situations.

However, the paper acknowledges some limitations of the dataset and task. For example, the scenarios may still be somewhat constrained and not fully representative of the breadth of unexpected real-world situations. Additionally, the evaluation relies on human ratings, which could be subjective and may not capture all aspects of commonsense reasoning ability.

Further research could explore broader evaluations of reasoning behavior beyond just the quality of generated explanations. Investigating the underlying reasoning processes and commonsense knowledge used by both humans and models could provide additional insights.

Conclusion

This paper makes an important contribution by introducing the task of "uncommonsense abductive reasoning" and a corresponding dataset to study the ability of language models and humans to reason about unexpected and unlikely situations. The key findings suggest that a hybrid approach, combining model outputs with human-written explanations, can achieve the highest quality results.

Additionally, the experiments with imitation learning techniques show promise for developing more capable commonsense reasoning systems in an open and accessible way. Overall, this work extends commonsense reasoning research from everyday situations to unusual ones and points to concrete directions for improving how AI systems reason about them.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

UNcommonsense Reasoning: Abductive Reasoning about Uncommon Situations

Wenting Zhao, Justin T Chiu, Jena D. Hwang, Faeze Brahman, Jack Hessel, Sanjiban Choudhury, Yejin Choi, Xiang Lorraine Li, Alane Suhr

Language technologies that accurately model the dynamics of events must perform commonsense reasoning. Existing work evaluating commonsense reasoning focuses on making inferences about common, everyday situations. To instead investigate the ability to model unusual, unexpected, and unlikely situations, we explore the task of uncommonsense abductive reasoning. Given a piece of context with an unexpected outcome, this task requires reasoning abductively to generate an explanation that makes the unexpected outcome more likely in the context. To this end, we curate and release a new English language corpus called UNcommonsense. We characterize the performance differences between human explainers and the best-performing large language models, finding that model-enhanced human-written explanations achieve the highest quality by trading off between specificity and diversity. Finally, we experiment with several imitation learning algorithms to train open and accessible language models on this task. When compared with the vanilla supervised fine-tuning approach, these methods consistently reduce lose rates on both common and uncommonsense abductive reasoning judged by human evaluators.

5/2/2024

From Data to Commonsense Reasoning: The Use of Large Language Models for Explainable AI

Stefanie Krause, Frieder Stolzenburg

Commonsense reasoning is a difficult task for a computer, but a critical skill for an artificial intelligence (AI). It can enhance the explainability of AI models by enabling them to provide intuitive and human-like explanations for their decisions. This is necessary in many areas especially in question answering (QA), which is one of the most important tasks of natural language processing (NLP). Over time, a multitude of methods have emerged for solving commonsense reasoning problems such as knowledge-based approaches using formal logic or linguistic analysis. In this paper, we investigate the effectiveness of large language models (LLMs) on different QA tasks with a focus on their abilities in reasoning and explainability. We study three LLMs: GPT-3.5, Gemma and Llama 3. We further evaluate the LLM results by means of a questionnaire. We demonstrate the ability of LLMs to reason with commonsense as the models outperform humans on different datasets. While GPT-3.5's accuracy ranges from 56% to 93% on various QA benchmarks, Llama 3 achieved a mean accuracy of 90% on all eleven datasets. Thereby Llama 3 is outperforming humans on all datasets with an average 21% higher accuracy over ten datasets. Furthermore, we can appraise that, in the sense of explainable artificial intelligence (XAI), GPT-3.5 provides good explanations for its decisions. Our questionnaire revealed that 66% of participants rated GPT-3.5's explanations as either good or excellent. Taken together, these findings enrich our understanding of current LLMs and pave the way for future investigations of reasoning and explainability.

7/8/2024

Leveraging Explicit Reasoning for Inference Integration in Commonsense-Augmented Dialogue Models

Sarah E. Finch, Jinho D. Choi

Open-domain dialogue systems need to grasp social commonsense to understand and respond effectively to human users. Commonsense-augmented dialogue models have been proposed that aim to infer commonsense knowledge from dialogue contexts in order to improve response quality. However, existing approaches to commonsense-augmented dialogue rely on implicit reasoning to integrate commonsense inferences during response generation. In this study, we explore the impact of explicit reasoning against implicit reasoning over commonsense for dialogue response generation. Our findings demonstrate that separating commonsense reasoning into explicit steps for generating, selecting, and integrating commonsense into responses leads to better dialogue interactions, improving naturalness, engagement, specificity, and overall quality. Subsequent analyses of these findings unveil insights into the effectiveness of various types of commonsense in generating responses and the particular response traits enhanced through explicit reasoning for commonsense integration. Our work advances research in open-domain dialogue by achieving a new state-of-the-art in commonsense-augmented response generation.

6/14/2024

Hypothesis Search: Inductive Reasoning with Language Models

Ruocheng Wang, Eric Zelikman, Gabriel Poesia, Yewen Pu, Nick Haber, Noah D. Goodman

Inductive reasoning is a core problem-solving capacity: humans can identify underlying principles from a few examples, which robustly generalize to novel scenarios. Recent work evaluates large language models (LLMs) on inductive reasoning tasks by prompting them directly, that is, through in-context learning. This works well for straightforward inductive tasks but performs poorly on complex tasks such as the Abstraction and Reasoning Corpus (ARC). In this work, we propose to improve the inductive reasoning ability of LLMs by generating explicit hypotheses at multiple levels of abstraction: we prompt the LLM to propose multiple abstract hypotheses about the problem, in natural language, then implement the natural language hypotheses as concrete Python programs. These programs can be verified by running on observed examples and generalized to novel inputs. To reduce the hypothesis search space, we explore steps to filter the set of hypotheses to implement: we either ask the LLM to summarize them into a smaller set of hypotheses or ask human annotators to select a subset. We verify our pipeline's effectiveness on the ARC visual inductive reasoning benchmark, its variant 1D-ARC, string transformation dataset SyGuS, and list transformation dataset List Functions. On a random 100-problem subset of ARC, our automated pipeline using LLM summaries achieves 30% accuracy, outperforming the direct prompting baseline (accuracy of 17%). With the minimal human input of selecting from LLM-generated candidates, performance is boosted to 33%. Our ablations show that both abstract hypothesis generation and concrete program representations benefit LLMs on inductive reasoning tasks.

6/3/2024
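The verification step in this abstract, running each implemented hypothesis on the observed examples and keeping only those that reproduce them, can be sketched as follows. This is a minimal illustration under assumptions, not the authors' pipeline: the hypotheses are hand-written lambdas standing in for LLM-generated programs, and the list-transformation examples and function names are made up.

```python
from typing import Callable

# A hypothesis implemented as a program from an input list to an output list.
Program = Callable[[list[int]], list[int]]
Example = tuple[list[int], list[int]]


def consistent(program: Program, examples: list[Example]) -> bool:
    """True if the program reproduces every observed input/output pair."""
    for inp, expected in examples:
        try:
            if program(inp) != expected:
                return False
        except Exception:
            # A program that crashes on an observed example is rejected.
            return False
    return True


def filter_hypotheses(candidates: dict[str, Program],
                      examples: list[Example]) -> list[str]:
    """Keep the natural-language hypotheses whose programs pass all examples."""
    return [name for name, prog in candidates.items() if consistent(prog, examples)]


observed = [([1, 2, 3], [3, 2, 1]), ([4, 5], [5, 4])]
candidates: dict[str, Program] = {
    "reverse the list": lambda xs: list(reversed(xs)),
    "sort in descending order": lambda xs: sorted(xs, reverse=True),
    "drop the first element": lambda xs: xs[1:],
}

survivors = filter_hypotheses(candidates, observed)
print(survivors)  # ['reverse the list', 'sort in descending order']

# Both survivors fit the observed pairs but disagree on a novel input.
novel = [2, 9, 1]
for name in survivors:
    print(name, candidates[name](novel))
```

The last loop shows why verification alone does not settle which hypothesis is right: several programs can agree on every observed example and still generalize differently.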