DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning

2405.11559

Published 5/21/2024 by Suyash Vardhan Mathur, Akshett Rai Jindal, Manish Shrivastava

🌐

Abstract

While significant work has been done in the field of NLP on vertical thinking, which involves primarily logical thinking, little work has been done towards lateral thinking, which involves looking at problems from an unconventional perspective and defying existing conceptions and notions. Towards this direction, SemEval 2024 introduces the task of BRAINTEASER, which involves two types of questions -- Sentence Puzzles and Word Puzzles that defy conventional common-sense reasoning and constraints. In this paper, we tackle both types of questions using few-shot prompting on GPT-3.5 and gain insights regarding the difference in the nature of the two types. Our prompting strategy placed us 26th on the leaderboard for the Sentence Puzzle and 15th on the Word Puzzle task.

Create account to get full access

Background

DaVinci at SemEval-2024 Task 9

The paper discusses the approach taken by the DaVinci team in the SemEval-2024 Task 9 competition, which focused on unconventional reasoning. The task involved solving novel "brainteaser" style questions that required creative and unconventional thinking to arrive at the correct answer.

Plain English Explanation

The SemEval-2024 Task 9 competition challenged teams to develop AI systems that could solve unusual, creative puzzles. These types of puzzles don't have a straightforward solution and require thinking outside the box. The DaVinci team decided to use a large language model called GPT-3.5 and a technique called "few-shot prompting" to tackle these unconventional reasoning problems.

The key idea was to provide the GPT-3.5 model with just a few examples of the type of brainteaser questions, along with their solutions. This allows the model to learn the general patterns and strategies needed to solve these types of novel problems, without needing extensive training on a large dataset. The DaVinci team then used this few-shot prompted GPT-3.5 model to generate answers for the competition's test questions.

Technical Explanation

The DaVinci team used the GPT-3.5 large language model as the foundation for their approach. They employed a few-shot prompting technique, where they provided the model with just a handful of example brainteaser questions and their solutions. This allowed the model to learn the general strategies and patterns needed to solve these types of unconventional reasoning problems, without requiring lengthy training on a large dataset.

The team carefully constructed the prompts to guide the model towards the type of creative, lateral thinking required. They also experimented with different prompt formats and prompting strategies to optimize the model's performance on the task.

Critical Analysis

The DaVinci team's approach of using few-shot prompting with a powerful language model like GPT-3.5 is an interesting and promising strategy for tackling unconventional reasoning problems. By leveraging the model's strong language understanding and generation capabilities, and guiding it with just a few relevant examples, they were able to achieve good results without the need for extensive, domain-specific training.

However, the paper does not provide a detailed analysis of the model's limitations or weaknesses. It's possible that the few-shot approach may struggle with more complex or varied types of brainteaser questions, or that the model's responses could be inconsistent or lack the true depth of understanding that a human would have.

Additionally, the paper does not discuss potential biases or safety concerns that could arise from using a large language model in this context. These are important considerations that should be explored further.

Conclusion

The DaVinci team's approach of using few-shot prompting with GPT-3.5 to solve unconventional reasoning problems in the SemEval-2024 Task 9 competition is a novel and intriguing idea. By leveraging the capabilities of a powerful language model and guiding it with just a few relevant examples, they were able to demonstrate promising results without the need for extensive training.

This work highlights the potential of large language models to tackle creative and lateral thinking tasks, which could have important implications for the development of more flexible and adaptable AI systems. However, further research is needed to fully understand the limitations and potential pitfalls of this approach, as well as to address any ethical or safety concerns that may arise.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers

Harshit Gupta, Manav Chaudhary, Tathagata Raha, Shivansh Subramanian, Vasudeva Varma

This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice Question Answering designed to evaluate the models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro Model, in both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM's reasoning capabilities to improve performance. Our approach demonstrated significant improvements, showing that it performed better than the baseline models by a considerable margin but fell short of performing as well as the human annotators, thus highlighting the efficacy of the proposed strategies.

5/28/2024

cs.CL

🏷️

SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Yifan Jiang, Filip Ilievski, Kaixin Ma

While vertical thinking relies on logical and commonsense reasoning, lateral thinking requires systems to defy commonsense associations and overwrite them through unconventional thinking. Lateral thinking has been shown to be challenging for current models but has received little attention. A recent benchmark, BRAINTEASER, aims to evaluate current models' lateral thinking ability in a zero-shot setting. In this paper, we split the original benchmark to also support fine-tuning setting and present SemEval Task 9: BRAIN-TEASER(S), the first task at this competition designed to test the system's reasoning and lateral thinking ability. As a popular task, BRAINTEASER(S)'s two subtasks receive 483 team submissions from 182 participants during the competition. This paper provides a fine-grained system analysis of the competition results, together with a reflection on what this means for the ability of the systems to reason laterally. We hope that the BRAINTEASER(S) subtasks and findings in this paper can stimulate future work on lateral thinking and robust reasoning by computational models.

4/26/2024

cs.AI cs.CL cs.LG

📶

AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning

Mina Ghashami, Soumya Smruti Mishra

The SemEval 2024 BRAINTEASER task represents a pioneering venture in Natural Language Processing (NLP) by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. This challenge comprises of Sentence Puzzle and Word Puzzle subtasks and aims to test language models' capacity for divergent thinking. In this paper, we present our approach to the BRAINTEASER task. We employ a holistic strategy by leveraging cutting-edge pre-trained models in multiple choice architecture, and diversify the training data with Sentence and Word Puzzle datasets. To gain further improvement, we fine-tuned the model with synthetic humor or jokes dataset and the RiddleSense dataset which helped augmenting the model's lateral thinking abilities. Empirical results show that our approach achieve 92.5% accuracy in Sentence Puzzle subtask and 80.2% accuracy in Word Puzzle subtask.

5/21/2024

cs.CL cs.AI cs.IR cs.LG

BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Baktash Ansari, Mohammadmostafa Rostamkhani, Sauleh Eetemadi

This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense. The task aims to evaluate the ability of language models to think creatively. The dataset comprises multi-choice questions that challenge models to think outside of the box. We fine-tune 2 models, BERT and RoBERTa Large. Next, we employ a Chain of Thought (CoT) zero-shot prompting approach with 6 large language models, such as GPT-3.5, Mixtral, and Llama2. Finally, we utilize ReConcile, a technique that employs a round table conference approach with multiple agents for zero-shot learning, to generate consensus answers among 3 selected language models. Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.

6/10/2024

cs.CL