Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering

Read original: arXiv:2409.04181 - Published 9/9/2024 by Larissa Pusch, Tim O. F. Conrad

Overview

Natural language processing has revolutionized how we interact with digital information systems, including databases.
However, challenges persist, especially where accuracy is critical, as in the biomedical domain.
A key issue is the "hallucination problem," where models generate information not supported by the underlying data, leading to misinformation.

Plain English Explanation

This research paper presents a novel approach to improve the accuracy and reliability of question-answering systems by combining Large Language Models (LLMs) and Knowledge Graphs (KGs). The researchers used a biomedical KG as an example.

The key idea is a "query checker" that verifies that each LLM-generated Cypher query is syntactically and semantically valid before it is run against the KG. Because the answer is then extracted from the graph itself, this helps reduce errors like hallucinations, where the model makes up information not supported by the data.

The researchers evaluated their approach using a new dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. They found that while GPT-4 Turbo outperformed other models in generating accurate queries, open-source models like llama3:70b also showed promise with appropriate prompt engineering.

To make this approach accessible, the researchers developed a user-friendly web-based interface that allows users to input natural language queries, view the generated and corrected Cypher queries, and verify the resulting paths for accuracy.

Technical Explanation

The researchers built their approach on the LangChain framework and incorporated a "query checker" to ensure the syntactical and semantic validity of LLM-generated queries. These validated queries are then used to extract information from a Knowledge Graph, substantially reducing errors like hallucinations.
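As a rough illustration of what such a query checker might do, the sketch below validates single-hop Cypher patterns against a small schema and flips a relationship arrow that points the wrong way. It is not the authors' implementation and does not use LangChain: the `SCHEMA` triples, the `check_query` function, and the regular expression are all hypothetical, and the pattern covers only a tiny subset of Cypher.

```python
import re

# Hypothetical schema: allowed (source label, relationship type, target label).
SCHEMA = {
    ("Drug", "TREATS", "Disease"),
    ("Gene", "ASSOCIATED_WITH", "Disease"),
}

# Matches simple single-hop patterns such as (d:Drug)-[:TREATS]->(x:Disease)
# or (x:Disease)<-[:TREATS]-(d:Drug). Real Cypher is far richer than this.
PATTERN = re.compile(
    r"\((\w*):(\w+)\)\s*(<?-)\[(\w*):(\w+)\](->?)\s*\((\w*):(\w+)\)"
)


def check_query(query: str) -> tuple[str, list[str]]:
    """Return the (possibly corrected) query and a list of problems found."""
    problems: list[str] = []

    def fix(match: re.Match) -> str:
        lvar, llabel, larrow, rvar, rel, rarrow, tvar, tlabel = match.groups()
        if larrow == "-" and rarrow == "->":      # left-to-right edge
            src, dst = llabel, tlabel
        elif larrow == "<-" and rarrow == "-":    # right-to-left edge
            src, dst = tlabel, llabel
        else:                                     # undirected or malformed
            problems.append(f"unsupported arrow form: {match.group(0)}")
            return match.group(0)

        if (src, rel, dst) in SCHEMA:
            return match.group(0)                 # consistent with the schema
        if (dst, rel, src) in SCHEMA:
            # Labels and relationship exist, but the arrow is reversed: flip it.
            problems.append(f"reversed direction for {rel}")
            if larrow == "-":
                return f"({lvar}:{llabel})<-[{rvar}:{rel}]-({tvar}:{tlabel})"
            return f"({lvar}:{llabel})-[{rvar}:{rel}]->({tvar}:{tlabel})"

        problems.append(f"({src})-[{rel}]->({dst}) is not in the schema")
        return match.group(0)

    return PATTERN.sub(fix, query), problems


corrected, issues = check_query(
    "MATCH (x:Disease)-[:TREATS]->(d:Drug) RETURN d.name"
)
print(corrected)
print(issues)
```

In this toy setup a reversed arrow can be repaired automatically, while a relationship that does not exist in the schema at all is only reported, since there is no safe way to guess what the model meant.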

The researchers evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Their results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering.
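This summary does not say exactly how query accuracy was scored. One plausible approach, sketched here purely as an assumption, is to run each generated query and count a question as correct when its result set matches that of a reference query. The function name `query_accuracy` and the toy data are hypothetical and are not results from the paper.

```python
def query_accuracy(predicted, gold):
    """Fraction of questions whose predicted result set equals the gold set."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same questions")
    if not gold:
        return 0.0
    # Compare as sets so that the order of returned rows does not matter.
    correct = sum(set(p) == set(g) for p, g in zip(predicted, gold))
    return correct / len(gold)


# Toy placeholder data for three questions, NOT results from the paper.
gold = [["aspirin"], ["BRCA1", "BRCA2"], ["insulin"]]
predicted = [["aspirin"], ["BRCA2", "BRCA1"], []]
print(query_accuracy(predicted, gold))
```

An exact-match criterion like this is strict: a query that returns a superset of the correct rows counts as wrong, which may or may not match how the benchmark was actually scored.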

Critical Analysis

The paper addresses an important challenge in the use of LLMs for question-answering systems, especially in critical domains like biomedicine. The researchers' approach of combining LLMs and KGs to reduce hallucinations is a promising solution.

However, the paper does not discuss potential limitations of its approach, such as its reliance on a specific biomedical KG or whether the query checker scales to more complex queries. The researchers could also have evaluated the approach on a wider range of LLMs and knowledge domains to better establish its generalizability.

Conclusion

This research paper presents a hybrid approach that effectively addresses common issues such as data gaps and hallucinations in question-answering systems. By combining LLMs and KGs, the researchers have developed a reliable and intuitive solution that can improve the accuracy and trustworthiness of these systems, especially in critical domains like biomedicine. The availability of a user-friendly web-based interface further enhances the accessibility and practical application of this approach.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering

Larissa Pusch, Tim O. F. Conrad

Advancements in natural language processing have revolutionized the way we can interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLM) and Knowledge Graphs (KG) to improve the accuracy and reliability of question-answering systems, on the example of a biomedical KG. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactical and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors like hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, a user-friendly web-based interface has been developed, allowing users to input natural language queries, view generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question answering systems. The source code for generating the results of this paper and for the user-interface can be found in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui

9/9/2024

Combining Knowledge Graphs and Large Language Models

Amanda Kau, Xuzeng He, Aishwarya Nambissan, Aland Astudillo, Hui Yin, Amir Aryani

In recent years, Natural Language Processing (NLP) has played a significant role in various Artificial Intelligence (AI) applications such as chatbots, text generation, and language translation. The emergence of large language models (LLMs) has greatly improved the performance of these applications, showing astonishing results in language understanding and generation. However, they still show some disadvantages, such as hallucinations and lack of domain-specific knowledge, that affect their performance in real-world tasks. These issues can be effectively mitigated by incorporating knowledge graphs (KGs), which organise information in structured formats that capture relationships between entities in a versatile and interpretable fashion. Likewise, the construction and validation of KGs present challenges that LLMs can help resolve. The complementary relationship between LLMs and KGs has led to a trend that combines these technologies to achieve trustworthy results. This work collected 28 papers outlining methods for KG-powered LLMs, LLM-based KGs, and LLM-KG hybrid approaches. We systematically analysed and compared these approaches to provide a comprehensive overview highlighting key trends, innovative techniques, and common challenges. This synthesis will benefit researchers new to the field and those seeking to deepen their understanding of how KGs and LLMs can be effectively combined to enhance AI applications capabilities.

7/10/2024

An Enhanced Prompt-Based LLM Reasoning Scheme via Knowledge Graph-Integrated Collaboration

Yihao Li, Ru Zhang, Jianyi Liu

While Large Language Models (LLMs) demonstrate exceptional performance in a multitude of Natural Language Processing (NLP) tasks, they encounter challenges in practical applications, including issues with hallucinations, inadequate knowledge updating, and limited transparency in the reasoning process. To overcome these limitations, this study innovatively proposes a collaborative training-free reasoning scheme involving tight cooperation between Knowledge Graph (KG) and LLMs. This scheme first involves using LLMs to iteratively explore KG, selectively retrieving a task-relevant knowledge subgraph to support reasoning. The LLMs are then guided to further combine inherent implicit knowledge to reason on the subgraph while explicitly elucidating the reasoning process. Through such a cooperative approach, our scheme achieves more reliable knowledge-based reasoning and facilitates the tracing of the reasoning results. Experimental results show that our scheme significantly progressed across multiple datasets, notably achieving over a 10% improvement on the QALD10 dataset compared to the best baseline and the fine-tuned state-of-the-art (SOTA) work. Building on this success, this study hopes to offer a valuable reference for future research in the fusion of KG and LLMs, thereby enhancing LLMs' proficiency in solving complex issues.

6/13/2024

Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval

Mengjia Niu, Hao Li, Jie Shi, Hamed Haddadi, Fan Mo

Large language models (LLMs) have demonstrated remarkable capabilities across various domains, although their susceptibility to hallucination poses significant challenges for their deployment in critical areas such as healthcare. To address this issue, retrieving relevant facts from knowledge graphs (KGs) is considered a promising method. Existing KG-augmented approaches tend to be resource-intensive, requiring multiple rounds of retrieval and verification for each factoid, which impedes their application in real-world scenarios. In this study, we propose Self-Refinement-Enhanced Knowledge Graph Retrieval (Re-KGR) to augment the factuality of LLMs' responses with less retrieval efforts in the medical field. Our approach leverages the attribution of next-token predictive probability distributions across different tokens, and various model layers to primarily identify tokens with a high potential for hallucination, reducing verification rounds by refining knowledge triples associated with these tokens. Moreover, we rectify inaccurate content using retrieved knowledge in the post-processing stage, which improves the truthfulness of generated responses. Experimental results on a medical dataset demonstrate that our approach can enhance the factual capability of LLMs across various foundational models as evidenced by the highest scores on truthfulness.

5/13/2024