AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning

2405.10385

Published 5/21/2024 by Mina Ghashami, Soumya Smruti Mishra

Abstract

The SemEval 2024 BRAINTEASER task represents a pioneering venture in Natural Language Processing (NLP) by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. This challenge comprises the Sentence Puzzle and Word Puzzle subtasks and aims to test language models' capacity for divergent thinking. In this paper, we present our approach to the BRAINTEASER task. We employ a holistic strategy by leveraging cutting-edge pre-trained models in a multiple-choice architecture and diversifying the training data with the Sentence and Word Puzzle datasets. To gain further improvement, we fine-tuned the model with a synthetic humor/jokes dataset and the RiddleSense dataset, which helped augment the model's lateral thinking abilities. Empirical results show that our approach achieves 92.5% accuracy in the Sentence Puzzle subtask and 80.2% accuracy in the Word Puzzle subtask.

Overview

  • This paper presents a multi-choice question answering system developed by AmazUtah_NLP for the SemEval-2024 Task 9, which focuses on "Commonsense Defying Reasoning".
  • The task tests lateral thinking: answering Sentence Puzzle and Word Puzzle questions whose correct answers defy default commonsense associations.
  • The paper describes the system's architecture, training approach, and performance on the task.

Plain English Explanation

The researchers at AmazUtah_NLP created a machine learning model that answers multiple-choice questions. These questions are designed to test a system's capacity for lateral thinking, that is, its ability to reason beyond ordinary commonsense knowledge.

Typical commonsense knowledge is what most people would consider normal or expected. However, some questions might require a deeper understanding of the context or the ability to think in unconventional ways. The AmazUtah_NLP system was built to handle these kinds of "commonsense defying" questions.

The paper explains how the researchers designed and trained their model for this task. They started from pre-trained language models set up to choose among several candidate answers, trained them on both the Sentence Puzzle and Word Puzzle data, and added further training on jokes and riddles to strengthen the model's ability to think outside the box.

Overall, this research aims to advance the field of natural language processing and push the boundaries of what AI systems can understand about the world and human reasoning.

Technical Explanation

The AmazUtah_NLP system for the SemEval-2024 Task 9 is a multi-choice question answering model that combines several key components to address the challenge of "Commonsense Defying Reasoning".

The system builds on pre-trained language models used in a multiple-choice architecture, which are then fine-tuned on the task data. Rather than training on a single subtask, the researchers combine the Sentence Puzzle and Word Puzzle datasets, and they further fine-tune with a synthetic humor/jokes dataset and the RiddleSense dataset to strengthen the model's lateral thinking abilities.
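The abstract describes mixing several training sources (Sentence Puzzle, Word Puzzle, a synthetic humor/jokes set, and RiddleSense). One way to picture this is converting every source into a shared multiple-choice schema and pooling the items before fine-tuning. The sketch below is a minimal illustration under that assumption: the `MCExample` schema, its field names, and the toy items are hypothetical and not taken from the paper or the datasets.

```python
import random
from dataclasses import dataclass


@dataclass
class MCExample:
    """One multiple-choice item in a shared schema (hypothetical field names)."""
    question: str
    choices: list[str]
    label: int   # index of the correct choice
    source: str  # which dataset the item came from


def to_pairs(ex: MCExample) -> list[tuple[str, str]]:
    """Expand an item into (question, choice) pairs, one per candidate answer,
    which is the input layout a multiple-choice encoder scores."""
    return [(ex.question, choice) for choice in ex.choices]


def mix_sources(sources: dict[str, list[MCExample]], seed: int = 0) -> list[MCExample]:
    """Pool every source into one training list and shuffle it reproducibly,
    so training batches can mix puzzles, jokes and riddles."""
    merged = [ex for items in sources.values() for ex in items]
    random.Random(seed).shuffle(merged)
    return merged


# Toy items for illustration only; they are not drawn from the real datasets.
riddle = MCExample(
    question="What has keys but cannot open locks?",
    choices=["A piano", "A map", "A door", "None of the above"],
    label=0,
    source="riddles",
)
sentence_puzzle = MCExample(
    question="A man shaves several times a day, yet he still has a beard. How?",
    choices=["He is a barber", "He uses a dull razor", "He shaves only his head", "None of the above"],
    label=0,
    source="sentence_puzzle",
)

train = mix_sources({"riddles": [riddle], "sentence_puzzle": [sentence_puzzle]})
print(len(train), to_pairs(riddle)[0])
```

The source datasets may not all offer the same number of answer choices per question; a real batching pipeline would need to group or pad items by choice count, which this sketch does not attempt.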

During training, the model sees each question together with its candidate answers and learns to select the correct one by scoring every question-answer pair and comparing the scores across the choices.
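In a typical multiple-choice architecture (for example, Hugging Face's `AutoModelForMultipleChoice`), the encoder produces one logit per (question, choice) pair, and a softmax across the choices turns those logits into a distribution trained with cross-entropy. The sketch below shows only that final selection step in plain Python; the per-choice scores are hypothetical stand-ins for encoder outputs, and the paper's exact model and loss are assumed rather than confirmed here.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Turn per-choice logits into probabilities (max-shifted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mc_loss(scores: list[float], label: int) -> float:
    """Cross-entropy of the gold choice under the softmax over all choices."""
    return -math.log(softmax(scores)[label])


def predict(scores: list[float]) -> int:
    """Pick the index of the highest-scoring choice."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical logits an encoder might assign to four (question, choice) pairs.
scores = [2.0, 0.5, -1.0, 0.0]
print(predict(scores), round(mc_loss(scores, label=0), 3))
```

Because the softmax is taken across the choices of a single question, raising the gold choice's score necessarily lowers the probability of the distractors, which is what pushes the model to discriminate between them.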

The researchers report that their approach achieves 92.5% accuracy on the Sentence Puzzle subtask and 80.2% accuracy on the Word Puzzle subtask, showing that the system can answer many questions that defy typical commonsense associations.
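Assuming the reported percentages are plain per-question accuracy (the official task scoring may also report other variants), each figure is simply the share of questions whose predicted choice matches the gold answer. A minimal sketch of that computation, using hypothetical toy predictions:

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of questions whose predicted choice index equals the gold index."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need equally sized, non-empty prediction and label lists")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy check: 3 of 4 hypothetical predictions match the gold labels.
print(accuracy([0, 1, 2, 3], [0, 1, 2, 0]))
```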

Critical Analysis

The paper presents a well-designed and thoughtful approach to SemEval-2024 Task 9 on "Commonsense Defying Reasoning". Rather than relying solely on commonsense knowledge, the researchers deliberately broadened the training data with puzzle, humor, and riddle sources to enhance the system's lateral thinking.

One potential limitation of the research is the reliance on auxiliary datasets, such as the synthetic humor/jokes data and RiddleSense, which may not always match the style or difficulty of the BRAINTEASER puzzles. Curating these sources more closely to the task requirements could be a direction for further work.

Additionally, while combining several data sources appears to help, a closer look at how much each source contributes, and at which puzzles the model still gets wrong, would shed light on the system's strengths and weaknesses. The gap between Sentence Puzzle (92.5%) and Word Puzzle (80.2%) accuracy also suggests that word-level puzzles remain harder for this approach.

Overall, the AmazUtah_NLP system represents a significant step forward in addressing the challenge of "Commonsense Defying Reasoning" and could inspire future research in this area.

Conclusion

The AmazUtah_NLP system presented in this paper demonstrates an approach to multi-choice question answering that goes beyond relying on commonsense knowledge. By fine-tuning pre-trained language models in a multiple-choice architecture on a diverse mix of puzzle, humor, and riddle data, the researchers have developed a system that accurately answers many questions that defy typical commonsense understanding.

This research contributes to the broader field of natural language processing and highlights the importance of building AI systems that can reason flexibly and adapt to the complexities of human knowledge and reasoning. As the field continues to progress, the insights and techniques presented in this paper may inspire further advancements in the pursuit of more versatile and intelligent language understanding systems.



This summary was produced with help from an AI and may contain inaccuracies; check the original source documents for details.

Related Papers

iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers

Harshit Gupta, Manav Chaudhary, Tathagata Raha, Shivansh Subramanian, Vasudeva Varma

This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate the models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro Model, in both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM's reasoning capabilities to improve performance. Our approach demonstrated significant improvements, showing that it performed better than the baseline models by a considerable margin but fell short of performing as well as the human annotators, thus highlighting the efficacy of the proposed strategies.

5/28/2024

BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Baktash Ansari, Mohammadmostafa Rostamkhani, Sauleh Eetemadi

This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense. The task aims to evaluate the ability of language models to think creatively. The dataset comprises multi-choice questions that challenge models to think outside of the box. We fine-tune 2 models, BERT and RoBERTa Large. Next, we employ a Chain of Thought (CoT) zero-shot prompting approach with 6 large language models, such as GPT-3.5, Mixtral, and Llama2. Finally, we utilize ReConcile, a technique that employs a round table conference approach with multiple agents for zero-shot learning, to generate consensus answers among 3 selected language models. Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.

6/10/2024

DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning

Suyash Vardhan Mathur, Akshett Rai Jindal, Manish Shrivastava

While significant work has been done in the field of NLP on vertical thinking, which involves primarily logical thinking, little work has been done towards lateral thinking, which involves looking at problems from an unconventional perspective and defying existing conceptions and notions. Towards this direction, SemEval 2024 introduces the task of BRAINTEASER, which involves two types of questions -- Sentence Puzzles and Word Puzzles that defy conventional common-sense reasoning and constraints. In this paper, we tackle both types of questions using few-shot prompting on GPT-3.5 and gain insights regarding the difference in the nature of the two types. Our prompting strategy placed us 26th on the leaderboard for the Sentence Puzzle and 15th on the Word Puzzle task.

5/21/2024

SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Yifan Jiang, Filip Ilievski, Kaixin Ma

While vertical thinking relies on logical and commonsense reasoning, lateral thinking requires systems to defy commonsense associations and overwrite them through unconventional thinking. Lateral thinking has been shown to be challenging for current models but has received little attention. A recent benchmark, BRAINTEASER, aims to evaluate current models' lateral thinking ability in a zero-shot setting. In this paper, we split the original benchmark to also support fine-tuning setting and present SemEval Task 9: BRAINTEASER(S), the first task at this competition designed to test the system's reasoning and lateral thinking ability. As a popular task, BRAINTEASER(S)'s two subtasks received 483 team submissions from 182 participants during the competition. This paper provides a fine-grained system analysis of the competition results, together with a reflection on what this means for the ability of the systems to reason laterally. We hope that the BRAINTEASER(S) subtasks and findings in this paper can stimulate future work on lateral thinking and robust reasoning by computational models.

4/26/2024