Boosting Language Models Reasoning with Chain-of-Knowledge Prompting

2306.06427

Published 6/4/2024 by Jianing Wang, Qiushi Sun, Xiang Li, Ming Gao

Abstract

Recently, Chain-of-Thought (CoT) prompting has delivered success on complex reasoning tasks, which aims at designing a simple prompt like "Let's think step by step" or multiple in-context exemplars with well-designed rationales to elicit Large Language Models (LLMs) to generate intermediate reasoning steps. However, the generated rationales often come with mistakes, making unfactual and unfaithful reasoning chains. To mitigate this brittleness, we propose a novel Chain-of-Knowledge (CoK) prompting, where we aim at eliciting LLMs to generate explicit pieces of knowledge evidence in the form of structured triples. This is inspired by human behavior, i.e., we can draw a mind map or knowledge map as the reasoning evidence in the brain before answering a complex question. Benefiting from CoK, we additionally introduce an F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness. For an unreliable response, the wrong evidence can be indicated to prompt the LLM to rethink. Extensive experiments demonstrate that our method can further improve the performance of commonsense, factual, symbolic, and arithmetic reasoning tasks.

Overview

Introduces a novel prompting technique called Chain-of-Knowledge (CoK) to address the shortcomings of Chain-of-Thought (CoT) prompting
CoK aims to elicit Large Language Models (LLMs) to generate explicit knowledge evidence in the form of structured triples
Proposes an F^2-Verification method to estimate the reliability of the reasoning chains in terms of factuality and faithfulness
Demonstrates improved performance on commonsense, factual, symbolic, and arithmetic reasoning tasks

Plain English Explanation

Chain-of-Thought (CoT) prompting is a technique that tries to get language models to break down complex problems into step-by-step reasoning. However, the reasoning steps generated by language models often contain mistakes or incorrect information.

To address this, the researchers propose a new technique called Chain-of-Knowledge (CoK) prompting. Instead of just asking the model to reason step-by-step, CoK prompts the model to generate explicit pieces of knowledge or facts that can be used as evidence to support the reasoning. This is inspired by how humans might draw a "knowledge map" in their minds before answering a complex question.

The researchers also introduce an "F^2-Verification" method to evaluate how reliable and accurate the reasoning chains generated by the model are. This can identify cases where the model has produced incorrect evidence, allowing the system to prompt the model to rethink its reasoning.

Overall, this approach aims to make the reasoning of language models more transparent, factual, and trustworthy, which could be important for applications like commonsense reasoning, symbolic reasoning, and arithmetic problem-solving.

Technical Explanation

The paper introduces a prompting technique called Chain-of-Knowledge (CoK) prompting to address the shortcomings of Chain-of-Thought (CoT) prompting. CoT prompting elicits step-by-step reasoning from Large Language Models (LLMs), but the resulting chains are often unfactual (individual steps state things that are wrong) or unfaithful (the steps do not actually support the answer the model gives).

To mitigate this, the researchers propose CoK prompting, which aims to elicit LLMs to generate explicit pieces of knowledge evidence in the form of structured triples (subject-predicate-object). This is inspired by how humans might draw a "knowledge map" in their minds before answering a complex question.
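As a rough illustration of what a triple-based exemplar might look like, the sketch below assembles a few-shot prompt whose exemplars list evidence triples before the explanation and answer. The `Triple` and `Exemplar` classes, the `build_cok_prompt` helper, the prompt wording, and the sample question are all assumptions made for this example; the paper's actual prompt template may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    """One piece of knowledge evidence: (subject, relation, object)."""
    subject: str
    relation: str
    obj: str

    def render(self) -> str:
        return f"({self.subject}, {self.relation}, {self.obj})"


@dataclass
class Exemplar:
    """A hypothetical in-context exemplar: question, evidence, explanation, answer."""
    question: str
    evidence: List[Triple]
    explanation: str
    answer: str


def format_exemplar(ex: Exemplar) -> str:
    # Evidence triples come first so the explanation can refer back to them.
    lines = [f"Q: {ex.question}", "Evidence triples:"]
    lines += [f"{i}. {t.render()}" for i, t in enumerate(ex.evidence, start=1)]
    lines.append(f"Explanation: {ex.explanation}")
    lines.append(f"A: {ex.answer}")
    return "\n".join(lines)


def build_cok_prompt(exemplars: List[Exemplar], question: str) -> str:
    # The prompt ends at "Evidence triples:" so the model continues with triples.
    blocks = [format_exemplar(ex) for ex in exemplars]
    blocks.append(f"Q: {question}\nEvidence triples:")
    return "\n\n".join(blocks)


example = Exemplar(
    question="Is the capital of France located in Europe?",
    evidence=[
        Triple("Paris", "capital of", "France"),
        Triple("France", "located in", "Europe"),
    ],
    explanation="Paris is the capital of France, and France is in Europe.",
    answer="yes",
)
prompt = build_cok_prompt([example], "Is the capital of Japan located in Asia?")
print(prompt)
```

Ending the prompt at the "Evidence triples:" line is one simple way to nudge the model into producing triples before any free-text explanation.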

The researchers also introduce an "F^2-Verification" method that estimates the reliability of a reasoning chain along two axes: factuality (whether the generated evidence agrees with known facts) and faithfulness (whether the explanation and final answer actually follow from that evidence). When a response is judged unreliable, the incorrect evidence is pointed out to the model, which is then prompted to rethink its reasoning.
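The two verification axes can be sketched as simple scoring functions. This summary does not say how either score is computed, so everything below is a toy stand-in: factuality is approximated by exact lookup in a small reference set of triples, faithfulness by token overlap between the evidence and the explanation, and the weight `alpha` is a made-up parameter for combining them.

```python
import re
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def factuality(evidence: List[Triple], knowledge_base: Set[Triple]) -> float:
    """Toy factuality: fraction of evidence triples found in the reference set."""
    if not evidence:
        return 0.0
    return sum(t in knowledge_base for t in evidence) / len(evidence)


def _tokens(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def faithfulness(evidence: List[Triple], explanation: str) -> float:
    """Toy faithfulness: share of evidence tokens that reappear in the explanation."""
    evidence_tokens = _tokens(" ".join(" ".join(t) for t in evidence))
    if not evidence_tokens:
        return 0.0
    return len(evidence_tokens & _tokens(explanation)) / len(evidence_tokens)


def reliability(evidence: List[Triple], explanation: str,
                knowledge_base: Set[Triple], alpha: float = 0.5) -> float:
    """Weighted mix of the two scores; alpha is an assumed trade-off weight."""
    return (alpha * factuality(evidence, knowledge_base)
            + (1 - alpha) * faithfulness(evidence, explanation))


def unsupported_triples(evidence: List[Triple],
                        knowledge_base: Set[Triple]) -> List[Triple]:
    """Triples that could be pointed out to the model when asking it to rethink."""
    return [t for t in evidence if t not in knowledge_base]


kb = {("Paris", "capital of", "France"), ("France", "located in", "Europe")}
evidence = [("Paris", "capital of", "France"), ("France", "located in", "Asia")]
explanation = "Paris is the capital of France, and France is located in Europe."

print(factuality(evidence, kb))           # one of two triples is in the reference set
print(faithfulness(evidence, explanation))
print(unsupported_triples(evidence, kb))
```

A real system would likely replace the exact lookup and token overlap with retrieval against a knowledge base and a learned similarity measure; the point here is only the shape of the computation.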

Extensive experiments demonstrate that the proposed CoK prompting approach, coupled with the F^2-Verification method, can further improve the performance of LLMs on commonsense, factual, symbolic, and arithmetic reasoning tasks compared to standard CoT prompting.

Critical Analysis

The paper presents a novel and promising approach to addressing the challenges of Chain-of-Thought (CoT) prompting. The introduction of Chain-of-Knowledge (CoK) prompting and the F^2-Verification method seems to be a significant step forward in making the reasoning of language models more transparent, factual, and trustworthy.

One potential limitation of the approach is that it may be more computationally intensive, as the model needs to generate and verify the knowledge evidence in addition to the reasoning steps. The paper does not provide a detailed analysis of the computational overhead or runtime implications of the proposed methods.
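To make the overhead concern concrete, the sketch below counts model calls in a verify-then-rethink loop. It is a simplified assumption of how such a loop could be wired up, not the paper's procedure: `generate` is a hypothetical callable standing in for the LLM, the reference set `kb` stands in for the verifier, and `threshold` and `max_rounds` are illustrative parameters.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Generate = Callable[[str], List[Triple]]  # hypothetical stand-in for an LLM call


def answer_with_rethinking(question: str, generate: Generate, kb: Set[Triple],
                           threshold: float = 1.0,
                           max_rounds: int = 3) -> Tuple[List[Triple], int]:
    """Regenerate evidence until it passes the check or the round budget runs out.

    Returns the last evidence produced and the number of model calls made.
    """
    prompt = question
    evidence: List[Triple] = []
    calls = 0
    for _ in range(max_rounds):
        evidence = generate(prompt)
        calls += 1
        wrong = [t for t in evidence if t not in kb]
        score = 1 - len(wrong) / len(evidence) if evidence else 0.0
        if score >= threshold:
            break
        # Point out the suspect evidence and ask the model to try again.
        hint = "; ".join(f"({s}, {r}, {o})" for s, r, o in wrong)
        prompt = f"{question}\nThis evidence looks wrong, please rethink: {hint}"
    return evidence, calls


kb = {("Paris", "capital of", "France"), ("France", "located in", "Europe")}


def stub_model(prompt: str) -> List[Triple]:
    # Fake model: wrong on the first try, correct once asked to rethink.
    if "rethink" in prompt.lower():
        return [("Paris", "capital of", "France"), ("France", "located in", "Europe")]
    return [("Paris", "capital of", "France"), ("France", "located in", "Asia")]


final_evidence, n_calls = answer_with_rethinking("Is Paris in Europe?", stub_model, kb)
print(n_calls)
```

Under these assumptions the worst case costs `max_rounds` model calls per question, plus one verification pass per call, which is the kind of overhead a runtime analysis would need to quantify.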

Additionally, the paper focuses on a limited set of reasoning tasks, and it would be interesting to see how the approach performs on a wider range of real-world applications, such as multimodal reasoning, planning, or step-by-step problem-solving. The researchers could also explore ways to further improve the reliability and accuracy of the generated knowledge evidence, perhaps by incorporating additional verification mechanisms or external knowledge sources.

Overall, the paper presents an innovative approach that could have significant implications for the development of more trustworthy and reliable language models, and the researchers have provided a solid foundation for further exploration and refinement of these ideas.

Conclusion

The paper introduces a novel Chain-of-Knowledge (CoK) prompting technique to address the shortcomings of Chain-of-Thought (CoT) prompting in eliciting Large Language Models (LLMs) to generate reliable and factual reasoning chains. By prompting LLMs to generate explicit knowledge evidence in the form of structured triples, and verifying the reliability of the reasoning using an F^2-Verification method, the researchers have demonstrated improved performance on commonsense, factual, symbolic, and arithmetic reasoning tasks.

This approach represents a significant step forward in making the reasoning of language models more transparent and trustworthy, which could have important implications for a wide range of applications, from multimodal reasoning to step-by-step problem-solving. As the field of language models continues to evolve, techniques like CoK prompting and F^2-Verification could play a crucial role in unlocking the full potential of these powerful AI systems while ensuring their reliability and safety.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Chain-of-Thought Reasoning Without Prompting

Xuezhi Wang, Denny Zhou

In enhancing the reasoning capabilities of large language models (LLMs), prior research primarily focuses on specific prompting techniques such as few-shot or zero-shot chain-of-thought (CoT) prompting. These methods, while effective, often involve manually intensive prompt engineering. Our study takes a novel approach by asking: Can LLMs reason effectively without prompting? Our findings reveal that, intriguingly, CoT reasoning paths can be elicited from pre-trained LLMs by simply altering the decoding process. Rather than conventional greedy decoding, we investigate the top-k alternative tokens, uncovering that CoT paths are frequently inherent in these sequences. This approach not only bypasses the confounders of prompting but also allows us to assess the LLMs' intrinsic reasoning abilities. Moreover, we observe that the presence of a CoT in the decoding path correlates with a higher confidence in the model's decoded answer. This confidence metric effectively differentiates between CoT and non-CoT paths. Extensive empirical studies on various reasoning benchmarks show that the proposed CoT-decoding effectively elicits reasoning capabilities from language models, which were previously obscured by standard greedy decoding.

5/27/2024

cs.CL

Pattern-Aware Chain-of-Thought Prompting in Large Language Models

Yufeng Zhang, Xuepeng Wang, Lingxiang Wu, Jinqiao Wang

Chain-of-thought (CoT) prompting can guide language models to engage in complex multi-step reasoning. The quality of provided demonstrations significantly impacts the success of downstream inference tasks. While existing automated methods prioritize accuracy and semantics in these demonstrations, we show that the underlying reasoning patterns play a more crucial role in such tasks. In this paper, we propose Pattern-Aware CoT, a prompting method that considers the diversity of demonstration patterns. By incorporating patterns such as step length and reasoning process within intermediate steps, PA-CoT effectively mitigates the issue of bias induced by demonstrations and enables better generalization to diverse scenarios. We conduct experiments on nine reasoning benchmark tasks using two open-source LLMs. The results show that our method substantially enhances reasoning performance and exhibits robustness to errors. The code will be made publicly available.

4/24/2024

cs.CL

Multimodal Chain-of-Thought Reasoning in Language Models

Zhuosheng Zhang, Aston Zhang, Mu Li, Hai Zhao, George Karypis, Alex Smola

Large language models (LLMs) have shown impressive performance on complex reasoning by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains as the rationale to infer the answer. However, existing CoT studies have primarily focused on the language modality. We propose Multimodal-CoT that incorporates language (text) and vision (images) modalities into a two-stage framework that separates rationale generation and answer inference. In this way, answer inference can leverage better generated rationales that are based on multimodal information. Experimental results on ScienceQA and A-OKVQA benchmark datasets show the effectiveness of our proposed approach. With Multimodal-CoT, our model under 1 billion parameters achieves state-of-the-art performance on the ScienceQA benchmark. Our analysis indicates that Multimodal-CoT offers the advantages of mitigating hallucination and enhancing convergence speed. Code is publicly available at https://github.com/amazon-science/mm-cot.

5/21/2024

cs.CL cs.AI cs.CV

Direct Evaluation of Chain-of-Thought in Multi-hop Reasoning with Knowledge Graphs

Minh-Vuong Nguyen, Linhao Luo, Fatemeh Shiri, Dinh Phung, Yuan-Fang Li, Thuy-Trang Vu, Gholamreza Haffari

Large language models (LLMs) demonstrate strong reasoning abilities when prompted to generate chain-of-thought (CoT) explanations alongside answers. However, previous research on evaluating LLMs has solely focused on answer accuracy, neglecting the correctness of the generated CoT. In this paper, we delve deeper into the CoT reasoning capabilities of LLMs in multi-hop question answering by utilizing knowledge graphs (KGs). We propose a novel discriminative and generative CoT evaluation paradigm to assess LLMs' knowledge of reasoning and the accuracy of the generated CoT. Through experiments conducted on 5 different families of LLMs across 2 multi-hop question-answering datasets, we find that LLMs possess sufficient knowledge to perform reasoning. However, there exists a significant disparity between answer accuracy and faithfulness of the CoT reasoning generated by LLMs, indicating that they often arrive at correct answers through incorrect reasoning.

6/21/2024

cs.CL