Active Prompting with Chain-of-Thought for Large Language Models

2302.12246

Published 6/10/2024 by Shizhe Diao, Pengcheng Wang, Yong Lin, Tong Zhang

Abstract

The increasing scale of large language models (LLMs) brings emergent abilities to various complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is known that the effective design of task-specific prompts is critical for LLMs' ability to produce high-quality answers. In particular, an effective approach for complex question-and-answer tasks is example-based prompting with chain-of-thought (CoT) reasoning, which significantly improves the performance of LLMs. However, current CoT methods rely on a fixed set of human-annotated exemplars, which are not necessarily the most effective examples for different tasks. This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks with task-specific example prompts (annotated with human-designed CoT reasoning). For this purpose, we propose a solution to the key problem of determining which questions are the most important and helpful ones to annotate from a pool of task-specific queries. By borrowing ideas from the related problem of uncertainty-based active learning, we introduce several metrics to characterize the uncertainty so as to select the most uncertain questions for annotation. Experimental results demonstrate the superiority of our proposed method, achieving state-of-the-art on eight complex reasoning tasks. Further analyses of different uncertainty metrics, pool sizes, zero-shot learning, and accuracy-uncertainty relationship demonstrate the effectiveness of our method. Our code will be available at https://github.com/shizhediao/active-prompt.

Overview

Large language models (LLMs) have shown impressive abilities in complex tasks like arithmetic and commonsense reasoning.
Effective prompt design, particularly using example-based prompting with chain-of-thought (CoT) reasoning, is crucial for high-quality answers from LLMs.
Current CoT methods rely on a fixed set of human-annotated examples, which may not be the most effective for different tasks.
This paper proposes a new method, Active-Prompt, to adapt LLMs to different tasks using task-specific example prompts with human-designed CoT reasoning.

Plain English Explanation

Large language models (LLMs) have become incredibly powerful, and can now tackle complex tasks that require reasoning, like solving math problems or answering questions that involve common sense. However, getting these LLMs to perform well on these tasks often requires carefully designing the "prompt" - the instructions or examples you give the model to guide its responses.

One effective approach is to use "chain-of-thought" prompting, where you provide the model with a series of step-by-step examples that demonstrate how to solve a problem. This helps the model learn the reasoning process, not just the final answer. But the current methods for chain-of-thought prompting rely on a fixed set of examples chosen by humans, which may not be the best examples for every task.
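As a rough illustration of what such a prompt looks like, the sketch below joins a few worked examples and then appends the new question. The `Exemplar` class, the `build_cot_prompt` helper, the "Q:/A:" layout, and the sample arithmetic questions are all assumptions made for this example; they are not taken from the paper's code.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    """One worked example (hypothetical structure, for illustration only)."""
    question: str
    rationale: str  # step-by-step reasoning written by a human annotator
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], test_question: str) -> str:
    """Concatenate the worked examples, then append the unanswered test question."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.rationale} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The final block ends at "A:" so the model continues with its own reasoning.
    blocks.append(f"Q: {test_question}\nA:")
    return "\n\n".join(blocks)


# Made-up exemplar and test question, purely for demonstration.
exemplars = [
    Exemplar(
        question="Tom has 3 apples and buys 2 more. How many apples does he have?",
        rationale="Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5.",
        answer="5",
    )
]
prompt = build_cot_prompt(exemplars, "A box holds 4 pens. How many pens are in 3 boxes?")
print(prompt)
```

Because the exemplars are passed in as a list, swapping a fixed set for a task-specific set changes only the input, not the prompt-building logic.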

The researchers in this paper propose a new method called "Active-Prompt" that picks out the questions most worth annotating for a specific task, so that human effort goes into the examples that help most. They borrow ideas from "active learning" - a technique where the model itself helps choose the most informative examples to learn from. Applied to prompt design, this means asking the model each candidate question several times, measuring how uncertain its answers are, and having humans write step-by-step solutions for the most uncertain questions. Those worked examples then become the prompt for that task, instead of a fixed, one-size-fits-all set.

Technical Explanation

The key innovation of this paper is the "Active-Prompt" method, which aims to automatically select the most informative example prompts to help large language models (LLMs) adapt to different complex reasoning tasks.

Current state-of-the-art methods for improving LLM performance on these tasks rely on chain-of-thought (CoT) prompting, where the model is shown a series of step-by-step examples demonstrating how to solve a problem. This helps the model learn the underlying reasoning process. However, these methods use a fixed set of human-curated examples, which may not be optimal for all tasks.

The Active-Prompt method borrows ideas from active learning, a technique where the model itself helps select the most informative training examples. In this case, the model identifies the most "uncertain" task-specific queries from a pool of examples, and those are annotated with human-designed CoT reasoning to create the optimal prompt.
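The selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `select_for_annotation`, `fake_sampler`, `distinct_fraction`, and the canned answers are hypothetical names and data, and a real system would replace the sampler with repeated calls to an LLM.

```python
from typing import Callable, Sequence


def select_for_annotation(
    pool: Sequence[str],
    sample_answers: Callable[[str, int], list[str]],
    score: Callable[[list[str]], float],
    k: int = 5,
    n: int = 2,
) -> list[str]:
    """Return the n questions whose k sampled answers score as most uncertain."""
    scored = [(score(sample_answers(question, k)), question) for question in pool]
    # Highest uncertainty first; the sort is stable, so ties keep pool order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [question for _, question in scored[:n]]


# Hypothetical canned outputs standing in for k sampled LLM answers per question.
CANNED = {
    "q_easy": ["4", "4", "4", "4", "4"],
    "q_medium": ["7", "7", "9", "7", "7"],
    "q_hard": ["12", "15", "9", "12", "20"],
}


def fake_sampler(question: str, k: int) -> list[str]:
    """Stand-in for querying a model k times (assumption for this sketch)."""
    return CANNED[question][:k]


def distinct_fraction(answers: list[str]) -> float:
    """Simple uncertainty proxy: share of distinct answers among the samples."""
    return len(set(answers)) / len(answers)


selected = select_for_annotation(list(CANNED), fake_sampler, distinct_fraction, k=5, n=2)
print(selected)  # ['q_hard', 'q_medium']
```

The selected questions would then be handed to human annotators, who write the chain-of-thought rationale for each one. The scoring function is passed in as a parameter so that different uncertainty measures can be compared without touching the selection logic.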

The researchers introduce several metrics to quantify this uncertainty, including disagreement among multiple sampled answers, the entropy of the answer distribution, and the model's self-reported confidence. The questions with the highest uncertainty are then selected for annotation and included in the prompt.
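To make the idea of answer-based uncertainty concrete, the sketch below computes two such measures from a list of sampled answers to one question: the share of distinct answers, and the Shannon entropy of the empirical answer distribution. The function names and sample answers are assumptions for illustration, and the exact normalization used in the paper may differ.

```python
import math
from collections import Counter


def disagreement(answers: list[str]) -> float:
    """Number of distinct answers divided by the number of samples."""
    return len(set(answers)) / len(answers)


def entropy(answers: list[str]) -> float:
    """Shannon entropy (natural log) of the empirical answer distribution."""
    total = len(answers)
    counts = Counter(answers)
    # p * log(1 / p) summed over distinct answers; equals 0.0 when all agree.
    return sum((c / total) * math.log(total / c) for c in counts.values())


unanimous = ["4", "4", "4", "4"]      # the model gives the same answer every time
split = ["7", "7", "9", "7"]          # one sample out of four disagrees

print(disagreement(unanimous), entropy(unanimous))
print(disagreement(split), round(entropy(split), 4))
```

Both measures are lowest when every sample agrees and grow as the answers spread out. Entropy also accounts for how evenly the answers are split, whereas the distinct-answer share only counts how many different answers appear.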

Experiments show that Active-Prompt outperforms existing CoT prompting approaches, achieving state-of-the-art results on eight complex reasoning tasks. Further analyses examine different uncertainty metrics, pool sizes, zero-shot learning, and the relationship between the model's uncertainty and its accuracy.

Critical Analysis

The Active-Prompt method presented in this paper is a novel and promising approach to improving large language model (LLM) performance on complex reasoning tasks. By adaptively selecting the most informative example prompts, it avoids the limitations of relying on a fixed set of human-curated examples.

However, the paper does acknowledge some potential limitations and areas for further research. For instance, the method still relies on human annotation of the selected example prompts with chain-of-thought reasoning, which could be time-consuming and costly to scale. Exploring ways to automatically generate high-quality CoT reasoning, or to learn it directly from the model, could further enhance the efficiency and flexibility of this approach.

Additionally, while the experiments demonstrate strong performance on a range of tasks, the paper does not provide a thorough analysis of the types of tasks or queries where Active-Prompt excels compared to other methods. Understanding the strengths and weaknesses of this approach across different problem domains would be valuable for guiding its practical application.

Finally, the paper focuses primarily on the technical details of the Active-Prompt method and its empirical evaluation. Expanding the discussion to consider the broader implications and potential societal impacts of this research could help readers appreciate the significance of this work beyond just the technical advances.

Conclusion

This paper presents a novel "Active-Prompt" method for adapting large language models (LLMs) to complex reasoning tasks. By selecting the most informative questions and having humans annotate them with chain-of-thought reasoning, Active-Prompt achieves state-of-the-art results on eight complex reasoning tasks.

The key innovation is the use of uncertainty-based active learning to identify the most helpful examples to include in the prompt, rather than relying on a fixed set of human-curated examples. This allows the method to effectively tailor the prompt to the specific task at hand.

While the paper focuses on the technical details and empirical evaluation, the Active-Prompt approach has broader implications for improving LLM performance on challenging reasoning tasks. Further research into automating the process of generating high-quality chain-of-thought reasoning, and exploring the method's capabilities across diverse problem domains, could unlock even more potential for this adaptive prompt design technique.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Pattern-Aware Chain-of-Thought Prompting in Large Language Models

Yufeng Zhang, Xuepeng Wang, Lingxiang Wu, Jinqiao Wang

Chain-of-thought (CoT) prompting can guide language models to engage in complex multi-step reasoning. The quality of provided demonstrations significantly impacts the success of downstream inference tasks. While existing automated methods prioritize accuracy and semantics in these demonstrations, we show that the underlying reasoning patterns play a more crucial role in such tasks. In this paper, we propose Pattern-Aware CoT, a prompting method that considers the diversity of demonstration patterns. By incorporating patterns such as step length and reasoning process within intermediate steps, PA-CoT effectively mitigates the issue of bias induced by demonstrations and enables better generalization to diverse scenarios. We conduct experiments on nine reasoning benchmark tasks using two open-source LLMs. The results show that our method substantially enhances reasoning performance and exhibits robustness to errors. The code will be made publicly available.

4/24/2024

cs.CL

Boosting Language Models Reasoning with Chain-of-Knowledge Prompting

Jianing Wang, Qiushi Sun, Xiang Li, Ming Gao

Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks, which aims at designing a simple prompt like "Let's think step by step" or multiple in-context exemplars with well-designed rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often come with mistakes, making unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, where we aim at eliciting LLMs to generate explicit pieces of knowledge evidence in the form of structure triple. This is inspired by our human behaviors, i.e., we can draw a mind map or knowledge map as the reasoning evidence in the brain before answering a complex question. Benefiting from CoK, we additionally introduce a F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For the unreliable response, the wrong evidence can be indicated to prompt the LLM to rethink. Extensive experiments demonstrate that our method can further improve the performance of commonsense, factual, symbolic, and arithmetic reasoning tasks.

6/4/2024

cs.CL

Chain-of-Thought Reasoning Without Prompting

Xuezhi Wang, Denny Zhou

In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models, which were previously obscured by standard greedy decoding.

5/27/2024

cs.CL

Multimodal Chain-of-Thought Reasoning in Language Models

Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola

Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have primarily focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach. With Multimodal-CoT, our model under 1 billion parameters achieves state-of-the-art performance on the ScienceQA benchmark. Our analysis indicates that Multimodal-CoT offers the advantages of mitigating hallucination and enhancing convergence speed. Code is publicly available at https://github.com/amazon-science/mm-cot.

5/21/2024

cs.CL cs.AI cs.CV