Cost-efficient Knowledge-based Question Answering with Large Language Models

2405.17337

Published 5/28/2024 by Junnan Dong, Qinggang Zhang, Chuang Zhou, Hao Chen, Daochen Zha, Xiao Huang

Cost-efficient Knowledge-based Question Answering with Large Language Models

Abstract

Knowledge-based question answering (KBQA) is widely used in many scenarios that necessitate domain knowledge. Large language models (LLMs) bring opportunities to KBQA, while their costs are significantly higher and absence of domain-specific knowledge during pre-training. We are motivated to combine LLMs and prior small models on knowledge graphs (KGMs) for both inferential accuracy and cost saving. However, it remains challenging since accuracy and cost are not readily combined in the optimization as two distinct metrics. It is also laborious for model selection since different models excel in diverse knowledge. To this end, we propose Coke, a novel cost-efficient strategy for KBQA with LLMs, modeled as a tailored multi-armed bandit problem to minimize calls to LLMs within limited budgets. We first formulate the accuracy expectation with a cluster-level Thompson Sampling for either KGMs or LLMs. A context-aware policy is optimized to further distinguish the expert model subject to the question semantics. The overall decision is bounded by the cost regret according to historical expenditure on failures. Extensive experiments showcase the superior performance of Coke, which moves the Pareto frontier with up to 20.89% saving of GPT-4 fees while achieving a 2.74% higher accuracy on the benchmark datasets.

Create account to get full access

Overview

This paper explores a cost-efficient approach to knowledge-based question answering using large language models.
The researchers propose a method to leverage existing knowledge bases and language models to provide accurate answers to questions, while minimizing the computational resources required.
Key innovations include techniques to selectively retrieve relevant knowledge, generate candidate answers, and refine the responses in a computationally efficient manner.

Plain English Explanation

The paper presents a new way to answer questions using large language models and existing knowledge bases, but in a more cost-effective manner. Large language models are powerful AI systems that can understand and generate human-like text, but running them can be computationally expensive. The researchers have developed techniques to selectively retrieve only the most relevant information from knowledge bases, generate possible answers, and then refine those answers to be as accurate as possible - all while using fewer computational resources than traditional approaches. This allows for knowledge-based question answering to be done more efficiently, which could make it more accessible and usable in real-world applications.

Technical Explanation

The paper formulates the problem of cost-efficient knowledge-based question answering as jointly optimizing for answer quality and computational cost. The authors propose a multi-stage approach that leverages large language models and knowledge graphs to efficiently retrieve relevant information, generate candidate answers, and then refine the responses.

Key technical innovations include:

Selective Knowledge Retrieval: Techniques to efficiently identify the most relevant information in knowledge bases for a given question, reducing the computational burden.
Iterative Answer Generation: A multi-step process to generate candidate answers, evaluate them, and refine the responses to improve accuracy.
Knowledge-Infused Reasoning: Incorporating structured knowledge from knowledge graphs to guide the large language model's reasoning process and enhance the final answers.

The paper demonstrates the effectiveness of this approach through extensive experiments on multiple benchmark datasets, showing significant improvements in cost-efficiency compared to prior methods.

Critical Analysis

The paper presents a compelling approach to making knowledge-based question answering more practical and accessible through improved cost-efficiency. The selective retrieval of relevant knowledge and the iterative refinement of answers are clever innovations that help address the computational challenges of traditional methods.

However, the paper does not delve deeply into potential limitations or caveats of the proposed approach. For example, the reliance on high-quality knowledge bases may be a constraint, as building and maintaining such resources can be challenging. Additionally, the paper does not explore the generalization of the techniques to domains beyond the specific benchmarks used in the experiments.

Further research could investigate how the methods could be adapted to work with noisy or incomplete knowledge sources, or to handle open-ended questions that may require more creative reasoning beyond simply retrieving and refining information from knowledge bases.

Conclusion

This paper makes an important contribution to the field of knowledge-based question answering by introducing a cost-efficient approach that leverages large language models and knowledge graphs. The selective retrieval of relevant information and the iterative refinement of answers represent significant advancements, potentially making knowledge-based question answering more practical and accessible in real-world applications.

While the paper does not address all potential limitations, the core innovations and the demonstrated performance improvements are compelling. This research represents a promising step towards more efficient and effective knowledge-based question answering systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Enhancing Question Answering for Enterprise Knowledge Bases using Large Language Models

Feihu Jiang, Chuan Qin, Kaichun Yao, Chuyu Fang, Fuzhen Zhuang, Hengshu Zhu, Hui Xiong

Efficient knowledge management plays a pivotal role in augmenting both the operational efficiency and the innovative capacity of businesses and organizations. By indexing knowledge through vectorization, a variety of knowledge retrieval methods have emerged, significantly enhancing the efficacy of knowledge management systems. Recently, the rapid advancements in generative natural language processing technologies paved the way for generating precise and coherent answers after retrieving relevant documents tailored to user queries. However, for enterprise knowledge bases, assembling extensive training data from scratch for knowledge retrieval and generation is a formidable challenge due to the privacy and security policies of private data, frequently entailing substantial costs. To address the challenge above, in this paper, we propose EKRG, a novel Retrieval-Generation framework based on large language models (LLMs), expertly designed to enable question-answering for Enterprise Knowledge bases with limited annotation costs. Specifically, for the retrieval process, we first introduce an instruction-tuning method using an LLM to generate sufficient document-question pairs for training a knowledge retriever. This method, through carefully designed instructions, efficiently generates diverse questions for enterprise knowledge bases, encompassing both fact-oriented and solution-oriented knowledge. Additionally, we develop a relevance-aware teacher-student learning strategy to further enhance the efficiency of the training process. For the generation process, we propose a novel chain of thought (CoT) based fine-tuning method to empower the LLM-based generator to adeptly respond to user questions using retrieved documents. Finally, extensive experiments on real-world datasets have demonstrated the effectiveness of our proposed framework.

4/23/2024

cs.CL cs.AI cs.IR

EffiQA: Efficient Question-Answering with Strategic Multi-Model Collaboration on Knowledge Graphs

Zixuan Dong, Baoyun Peng, Yufei Wang, Jia Fu, Xiaodong Wang, Yongxue Shan, Xin Zhou

While large language models (LLMs) have shown remarkable capabilities in natural language processing, they struggle with complex, multi-step reasoning tasks involving knowledge graphs (KGs). Existing approaches that integrate LLMs and KGs either underutilize the reasoning abilities of LLMs or suffer from prohibitive computational costs due to tight coupling. To address these limitations, we propose a novel collaborative framework named EffiQA that can strike a balance between performance and efficiency via an iterative paradigm. EffiQA consists of three stages: global planning, efficient KG exploration, and self-reflection. Specifically, EffiQA leverages the commonsense capability of LLMs to explore potential reasoning pathways through global planning. Then, it offloads semantic pruning to a small plug-in model for efficient KG exploration. Finally, the exploration results are fed to LLMs for self-reflection to further improve the global planning and efficient KG exploration. Empirical evidence on multiple KBQA benchmarks shows EffiQA's effectiveness, achieving an optimal balance between reasoning accuracy and computational costs. We hope the proposed new framework will pave the way for efficient, knowledge-intensive querying by redefining the integration of LLMs and KGs, fostering future research on knowledge-based question answering.

6/4/2024

cs.CL

Counter-intuitive: Large Language Models Can Better Understand Knowledge Graphs Than We Thought

Xinbang Dai, Yuncheng Hua, Tongtong Wu, Yang Sheng, Qiu Ji, Guilin Qi

As the parameter scale of large language models (LLMs) grows, jointly training knowledge graph (KG) embeddings with model parameters to enhance LLM capabilities becomes increasingly costly. Consequently, the community has shown interest in developing prompt strategies that effectively integrate KG information into LLMs. However, the format for incorporating KGs into LLMs lacks standardization; for instance, KGs can be transformed into linearized triples or natural language (NL) text. Current prompting methods often rely on a trial-and-error approach, leaving researchers with an incomplete understanding of which KG input format best facilitates LLM comprehension of KG content. To elucidate this, we design a series of experiments to explore LLMs' understanding of different KG input formats within the context of prompt engineering. Our analysis examines both literal and attention distribution levels. Through extensive experiments, we indicate a counter-intuitive phenomenon: when addressing fact-related questions, unordered linearized triples are more effective for LLMs' understanding of KGs compared to fluent NL text. Furthermore, noisy, incomplete, or marginally relevant subgraphs can still enhance LLM performance. Finally, different LLMs have distinct preferences for different formats of organizing unordered triples.

6/18/2024

cs.CL cs.AI

Reasoning on Efficient Knowledge Paths:Knowledge Graph Guides Large Language Model for Domain Question Answering

Yuqi Wang, Boran Jiang, Yi Luo, Dawei He, Peng Cheng, Liangcai Gao

Large language models (LLMs), such as GPT3.5, GPT4 and LLAMA2 perform surprisingly well and outperform human experts on many tasks. However, in many domain-specific evaluations, these LLMs often suffer from hallucination problems due to insufficient training of relevant corpus. Furthermore, fine-tuning large models may face problems such as the LLMs are not open source or the construction of high-quality domain instruction is difficult. Therefore, structured knowledge databases such as knowledge graph can better provide domain back- ground knowledge for LLMs and make full use of the reasoning and analysis capabilities of LLMs. In some previous works, LLM was called multiple times to determine whether the current triplet was suitable for inclusion in the subgraph when retrieving subgraphs through a question. Especially for the question that require a multi-hop reasoning path, frequent calls to LLM will consume a lot of computing power. Moreover, when choosing the reasoning path, LLM will be called once for each step, and if one of the steps is selected incorrectly, it will lead to the accumulation of errors in the following steps. In this paper, we integrated and optimized a pipeline for selecting reasoning paths from KG based on LLM, which can reduce the dependency on LLM. In addition, we propose a simple and effective subgraph retrieval method based on chain of thought (CoT) and page rank which can returns the paths most likely to contain the answer. We conduct experiments on three datasets: GenMedGPT-5k [14], WebQuestions [2], and CMCQA [21]. Finally, RoK can demonstrate that using fewer LLM calls can achieve the same results as previous SOTAs models.

4/17/2024

cs.CL cs.AI cs.IR