iREL at SemEval-2024 Task 9: Improving Conventional Prompting Methods for Brain Teasers

2405.16129

Published 5/28/2024 by Harshit Gupta, Manav Chaudhary, Tathagata Raha, Shivansh Subramanian, Vasudeva Varma


Abstract

This paper describes our approach for SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense. The BRAINTEASER task comprises multiple-choice question answering designed to evaluate models' lateral thinking capabilities. It consists of Sentence Puzzle and Word Puzzle subtasks that require models to defy default common-sense associations and exhibit unconventional thinking. We propose a unique strategy to improve the performance of pre-trained language models, notably the Gemini 1.0 Pro model, on both subtasks. We employ static and dynamic few-shot prompting techniques and introduce a model-generated reasoning strategy that utilizes the LLM's reasoning capabilities to improve performance. Our approach outperformed the baseline models by a considerable margin, highlighting the efficacy of the proposed strategies, although it still fell short of the performance of human annotators.


Overview

This paper presents the iREL team's approach to SemEval-2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense.
The task is multiple-choice question answering over brain teasers, short puzzles whose correct answer requires lateral thinking rather than default common-sense associations. It has two subtasks, Sentence Puzzle and Word Puzzle.
The iREL team explores ways to improve conventional prompting of a pre-trained language model, Gemini 1.0 Pro, using static and dynamic few-shot prompting and a model-generated reasoning strategy.

Plain English Explanation

The paper describes the iREL team's entry to SemEval-2024 Task 9 (BRAINTEASER: A Novel Task Defying Common Sense). The task challenges researchers to build systems that answer brain teasers, short puzzle-like questions that require creative thinking to answer correctly. Each question comes with several answer choices, and the system must pick the right one.

The iREL team focuses on improving standard prompting techniques, which are the instructions and examples given to a language model to guide its response. Few-shot prompting, for instance, shows the model a handful of already-solved puzzles before asking it the new one. By improving these prompts, the researchers aim to raise the model's accuracy on brain teasers.

Technical Explanation

The iREL system is built on a pre-trained large language model, Gemini 1.0 Pro, and the team's contributions center on how that model is prompted. The abstract names three strategies: static few-shot prompting, in which the same solved examples are shown before every test question; dynamic few-shot prompting, in which the examples are chosen separately for each question; and a model-generated reasoning strategy that draws on the LLM's own reasoning capabilities. Few-shot prompting was also used by other participants, for example DaVinci, who applied it to GPT-3.5.
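The abstract names static and dynamic few-shot prompting but does not say how the dynamic examples are selected. The sketch below is an illustration under stated assumptions, not iREL's implementation: static prompting reuses the first k solved puzzles from a pool, while dynamic prompting ranks the pool by bag-of-words cosine similarity to the test question, a simple stand-in for whatever retrieval the authors actually used. The `Example` class, the prompt wording, and all helper names are hypothetical.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """A multiple-choice brain teaser (hypothetical schema)."""
    question: str
    choices: List[str]
    answer: Optional[int] = None  # index of the correct choice; None for unlabeled test items


def bag_of_words(text: str) -> Counter:
    # Lowercase word counts; a crude stand-in for sentence embeddings (assumption).
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def format_example(ex: Example, with_answer: bool) -> str:
    lines = [f"Question: {ex.question}"]
    lines += [f"({chr(ord('A') + i)}) {choice}" for i, choice in enumerate(ex.choices)]
    lines.append(f"Answer: ({chr(ord('A') + ex.answer)})" if with_answer else "Answer:")
    return "\n".join(lines)


def static_shots(pool: List[Example], k: int) -> List[Example]:
    # Static: the same k exemplars for every test question.
    return pool[:k]


def dynamic_shots(pool: List[Example], query: Example, k: int) -> List[Example]:
    # Dynamic: the k exemplars most similar to this particular question.
    q = bag_of_words(query.question)
    ranked = sorted(pool, key=lambda ex: cosine(q, bag_of_words(ex.question)), reverse=True)
    return ranked[:k]


def build_prompt(shots: List[Example], query: Example) -> str:
    # Hypothetical instruction text; the paper's real prompt wording is not given here.
    header = ("Solve the brain teaser. Think laterally: "
              "the obvious common-sense answer may be a trap.")
    blocks = [format_example(ex, with_answer=True) for ex in shots]
    blocks.append(format_example(query, with_answer=False))
    return header + "\n\n" + "\n\n".join(blocks)
```

In a full pipeline the string returned by `build_prompt(dynamic_shots(pool, query, k), query)` would be sent to the model. Replacing the bag-of-words scorer with sentence embeddings would change only `dynamic_shots`, which is why the selection step is kept separate from prompt formatting.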

iREL's prompting strategies target what makes brain teasers hard for language models: the right answer usually requires setting aside a default common-sense association instead of following it. Other teams approached the same challenge differently; AmazUtah_NLP, for instance, fine-tuned multiple-choice models on diversified puzzle data, while the Mothman and MasonTigers teams described their own systems (an iterative approach in Mothman's case).
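The abstract also mentions a model-generated reasoning strategy that uses the LLM's own reasoning to improve performance, but it does not give the prompt format. One plausible reading, sketched here purely as an assumption, is a two-stage pipeline: ask the model to explain why the gold answer of each solved exemplar is correct, insert those explanations as "Reasoning:" lines in the few-shot prompt, then parse the chosen option letter from the model's completion. `generate` is a hypothetical placeholder for a call to a model such as Gemini 1.0 Pro; no real client library is used, and every function name is illustrative.

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical LLM interface: takes a prompt, returns the model's text completion.
Generate = Callable[[str], str]
# A solved exemplar: (question, choices, index of the correct choice).
Solved = Tuple[str, List[str], int]


def letter(i: int) -> str:
    return chr(ord("A") + i)


def format_question(question: str, choices: List[str]) -> str:
    options = "\n".join(f"({letter(i)}) {c}" for i, c in enumerate(choices))
    return f"Question: {question}\n{options}"


def explain_gold(generate: Generate, ex: Solved) -> str:
    # Stage 1 (assumed): have the model rationalize a known-correct answer.
    question, choices, answer = ex
    prompt = (f"{format_question(question, choices)}\n"
              f"The correct answer is ({letter(answer)}). Briefly explain why, "
              "naming the common-sense assumption the puzzle defies.")
    return generate(prompt).strip()


def build_reasoning_prompt(generate: Generate, shots: List[Solved],
                           question: str, choices: List[str]) -> str:
    # Stage 2 (assumed): few-shot prompt whose exemplars carry model-written rationales.
    blocks = []
    for ex in shots:
        rationale = explain_gold(generate, ex)
        blocks.append(f"{format_question(ex[0], ex[1])}\n"
                      f"Reasoning: {rationale}\nAnswer: ({letter(ex[2])})")
    blocks.append(f"{format_question(question, choices)}\nReasoning:")
    return "\n\n".join(blocks)


def parse_answer(completion: str) -> Optional[int]:
    # Take the last "Answer: (X)" in case the model restates earlier options.
    matches = re.findall(r"Answer:\s*\(([A-Z])\)", completion)
    return ord(matches[-1]) - ord("A") if matches else None


def answer(generate: Generate, shots: List[Solved],
           question: str, choices: List[str]) -> Optional[int]:
    prompt = build_reasoning_prompt(generate, shots, question, choices)
    idx = parse_answer(generate(prompt))
    # Reject letters that do not correspond to an offered choice.
    return idx if idx is not None and idx < len(choices) else None
```

The design point is only that the rationales in the exemplars are written by the model itself rather than by hand; a faithful reproduction would substitute the paper's actual prompts and a real model client for `generate`.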

Critical Analysis

The paper outlines the iREL team's approach to the SemEval-2024 Task 9 challenge, but the description here stays at a high level and does not delve into the specific details or results of their experiments. The abstract reports only that the approach beat the baseline models by a considerable margin and fell short of human annotators; per-subtask results and a comparison of static few-shot, dynamic few-shot, and model-generated reasoning prompts would clarify the relative strengths and weaknesses of each technique.

Additionally, the paper does not address potential limitations or areas for further exploration. It would be valuable to know if the team encountered any challenges or identified any shortcomings in their approach that could be addressed in future work.

Overall, the paper provides a solid foundation for understanding the iREL team's contributions to the task, but more detailed analysis and discussion of the research findings would enhance the reader's ability to critically evaluate the work and its implications.

Conclusion

The iREL team's work on SemEval-2024 Task 9 (BRAINTEASER: A Novel Task Defying Common Sense) is a step toward language models that can solve creative, puzzle-like problems. By refining how a pre-trained model such as Gemini 1.0 Pro is prompted, the researchers aim to push the boundaries of what language models can achieve, with potential applications in areas like education, entertainment, and cognitive training.

While the paper provides a high-level overview of the team's approach, more detailed information on the specific methods, results, and limitations would help strengthen the insights and contributions of this research. Continued exploration and refinement of prompting strategies for brain teasers and other creative problem-solving tasks could lead to significant advancements in the field of natural language processing and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying Reasoning

Mina Ghashami, Soumya Smruti Mishra

The SemEval 2024 BRAINTEASER task represents a pioneering venture in Natural Language Processing (NLP) by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. This challenge comprises Sentence Puzzle and Word Puzzle subtasks and aims to test language models' capacity for divergent thinking. In this paper, we present our approach to the BRAINTEASER task. We employ a holistic strategy by leveraging cutting-edge pre-trained models in a multiple-choice architecture, and we diversify the training data with Sentence and Word Puzzle datasets. To gain further improvement, we fine-tuned the model with a synthetic humor/jokes dataset and the RiddleSense dataset, which helped augment the model's lateral thinking abilities. Empirical results show that our approach achieves 92.5% accuracy in the Sentence Puzzle subtask and 80.2% accuracy in the Word Puzzle subtask.

5/21/2024

cs.CL cs.AI cs.IR cs.LG

BAMO at SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Baktash Ansari, Mohammadmostafa Rostamkhani, Sauleh Eetemadi

This paper outlines our approach to SemEval 2024 Task 9, BRAINTEASER: A Novel Task Defying Common Sense. The task aims to evaluate the ability of language models to think creatively. The dataset comprises multi-choice questions that challenge models to think outside of the box. We fine-tune 2 models, BERT and RoBERTa Large. Next, we employ a Chain of Thought (CoT) zero-shot prompting approach with 6 large language models, such as GPT-3.5, Mixtral, and Llama2. Finally, we utilize ReConcile, a technique that employs a round table conference approach with multiple agents for zero-shot learning, to generate consensus answers among 3 selected language models. Our best method achieves an overall accuracy of 85 percent on the sentence puzzles subtask.

6/10/2024

cs.CL


DaVinci at SemEval-2024 Task 9: Few-shot prompting GPT-3.5 for Unconventional Reasoning

Suyash Vardhan Mathur, Akshett Rai Jindal, Manish Shrivastava

While significant work has been done in the field of NLP on vertical thinking, which involves primarily logical thinking, little work has been done towards lateral thinking, which involves looking at problems from an unconventional perspective and defying existing conceptions and notions. Towards this direction, SemEval 2024 introduces the task of BRAINTEASER, which involves two types of questions -- Sentence Puzzles and Word Puzzles that defy conventional common-sense reasoning and constraints. In this paper, we tackle both types of questions using few-shot prompting on GPT-3.5 and gain insights regarding the difference in the nature of the two types. Our prompting strategy placed us 26th on the leaderboard for the Sentence Puzzle and 15th on the Word Puzzle task.

5/21/2024

cs.CL cs.AI


SemEval-2024 Task 9: BRAINTEASER: A Novel Task Defying Common Sense

Yifan Jiang, Filip Ilievski, Kaixin Ma

While vertical thinking relies on logical and commonsense reasoning, lateral thinking requires systems to defy commonsense associations and overwrite them through unconventional thinking. Lateral thinking has been shown to be challenging for current models but has received little attention. A recent benchmark, BRAINTEASER, aims to evaluate current models' lateral thinking ability in a zero-shot setting. In this paper, we split the original benchmark to also support a fine-tuning setting and present SemEval Task 9: BRAIN-TEASER(S), the first task at this competition designed to test the system's reasoning and lateral thinking ability. As a popular task, BRAINTEASER(S)'s two subtasks received 483 team submissions from 182 participants during the competition. This paper provides a fine-grained system analysis of the competition results, together with a reflection on what this means for the ability of the systems to reason laterally. We hope that the BRAINTEASER(S) subtasks and findings in this paper can stimulate future work on lateral thinking and robust reasoning by computational models.

4/26/2024

cs.AI cs.CL cs.LG