Biomedical knowledge graph-optimized prompt generation for large language models

2311.17330

Published 5/15/2024 by Karthik Soman, Peter W Rose, John H Morris, Rabia E Akbas, Brett Smith, Braian Peetoom, Catalina Villouta-Reyes, Gabriel Cerono, Yongmei Shi, Angela Rizk-Jackson and 4 others

cs.CL


Abstract

Large Language Models (LLMs) are being adopted at an unprecedented rate, yet still face challenges in knowledge-intensive domains like biomedicine. Solutions such as pre-training and domain-specific fine-tuning add substantial computational overhead, requiring further domain expertise. Here, we introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework by leveraging a massive biomedical KG (SPOKE) with LLMs such as Llama-2-13b, GPT-3.5-Turbo and GPT-4, to generate meaningful biomedical text rooted in established knowledge. Compared to the existing RAG technique for Knowledge Graphs, the proposed method utilizes minimal graph schema for context extraction and uses embedding methods for context pruning. This optimization in context extraction results in more than 50% reduction in token consumption without compromising accuracy, yielding a cost-effective and robust RAG implementation on proprietary LLMs. KG-RAG consistently enhanced the performance of LLMs across diverse biomedical prompts by generating responses rooted in established knowledge, accompanied by accurate provenance and statistical evidence (if available) to substantiate the claims. Further benchmarking on human curated datasets, such as biomedical true/false and multiple-choice questions (MCQ), showed a remarkable 71% boost in the performance of the Llama-2 model on the challenging MCQ dataset, demonstrating the framework's capacity to empower open-source models with fewer parameters for domain-specific questions. Furthermore, KG-RAG enhanced the performance of proprietary GPT models, such as GPT-3.5 and GPT-4. In summary, the proposed framework combines explicit and implicit knowledge of KG and LLM in a token-optimized fashion, thus enhancing the adaptability of general-purpose LLMs to tackle domain-specific questions in a cost-effective fashion.


Overview

Large Language Models (LLMs) are being rapidly adopted, but still face challenges in specialized domains like biomedicine
Existing solutions like pre-training and fine-tuning add computational overhead and require domain expertise
The researchers introduce a token-optimized and robust Knowledge Graph-based Retrieval Augmented Generation (KG-RAG) framework to enhance LLM performance in biomedicine

Plain English Explanation

Powerful language models are becoming more common, but they still struggle with tasks that require deep specialist knowledge, like answering questions about medicine and biology. Typical fixes, such as training the models on large amounts of domain data or fine-tuning them for a specific field, are computationally expensive and require considerable expertise.

In this research, the team developed an approach called KG-RAG that pairs large language models with a knowledge graph, a structured database of facts and the relationships between them. Relevant facts from the graph are retrieved and added to the model's prompt, so the model can generate biomedical text grounded in established scientific knowledge without any additional training or fine-tuning. The key innovations are a leaner way of pulling context out of the knowledge graph and a pruning step that discards less relevant facts, which together cut the number of tokens sent to the model by more than half.

The researchers show that KG-RAG consistently improves the performance of different language models on a variety of biomedical tasks, like answering true/false questions and multiple-choice questions. This is an important step towards making powerful language models more useful in specialized domains like healthcare and life sciences.

Technical Explanation

The KG-RAG framework leverages a large biomedical knowledge graph called SPOKE to enhance the capabilities of LLMs like Llama-2-13b, GPT-3.5-Turbo, and GPT-4 on domain-specific tasks.
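To make the data flow concrete, here is a minimal Python sketch of knowledge-graph-grounded prompt construction, assuming the pipeline first identifies a graph entity in the question and then verbalizes that entity's neighboring edges as context. Everything in it is hypothetical: the toy graph uses placeholder node names rather than real SPOKE content, `find_entity` is a naive exact-match stand-in for entity recognition, and the prompt template is illustrative. It is not the authors' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    obj: str
    provenance: str  # where the fact came from; carried into the prompt


# Hypothetical toy graph with placeholder names; not real SPOKE content.
TOY_KG = [
    Edge("disease_X", "ASSOCIATES", "gene_A", "database_1"),
    Edge("disease_X", "RESEMBLES", "disease_Y", "database_2"),
    Edge("compound_Z", "TREATS", "disease_X", "database_3"),
    Edge("disease_Y", "ASSOCIATES", "gene_B", "database_1"),
]


def find_entity(question: str, graph: list[Edge]) -> str | None:
    """Naive stand-in for entity recognition: first graph node named in the question."""
    nodes = {e.subject for e in graph} | {e.obj for e in graph}
    for token in question.replace("?", " ").split():
        if token in nodes:
            return token
    return None


def one_hop_context(entity: str, graph: list[Edge]) -> list[str]:
    """Verbalize every edge touching the entity as a sentence with its provenance."""
    return [
        f"{e.subject} {e.predicate} {e.obj} (provenance: {e.provenance})."
        for e in graph
        if entity in (e.subject, e.obj)
    ]


def build_prompt(question: str, graph: list[Edge]) -> str:
    """Prepend the retrieved graph context to the user's question."""
    entity = find_entity(question, graph)
    context = one_hop_context(entity, graph) if entity else []
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"


if __name__ == "__main__":
    print(build_prompt("Which genes are associated with disease_X?", TOY_KG))
```

For the sample question, the three edges touching `disease_X` become context sentences, while the `disease_Y`-`gene_B` edge is left out because it is two hops away. Keeping provenance in each sentence is one way a model could cite its sources, in the spirit of the provenance reporting the abstract describes.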

Unlike previous retrieval-augmented generation (RAG) techniques that use knowledge graphs, KG-RAG utilizes a minimal graph schema for context extraction and embedding methods for context pruning. This optimization reduces token consumption by over 50% without compromising accuracy, making the approach more cost-effective and robust when deploying on proprietary LLMs.
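The pruning idea can be sketched as scoring each candidate context sentence by its similarity to the question and discarding the low scorers before the prompt is sent. In this minimal Python sketch, a bag-of-words cosine similarity stands in for a real embedding model, and the `percentile` and `min_similarity` cutoffs are hypothetical knobs, not the paper's exact criteria. Whitespace splitting is only a rough proxy for LLM tokens.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Bag-of-words vector; a crude stand-in for a real sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def prune_context(question, sentences, percentile=75.0, min_similarity=0.1):
    """Keep sentences scoring at or above the given percentile of all scores
    and above an absolute floor (both cutoffs are hypothetical parameters)."""
    q = embed(question)
    scores = [cosine(q, embed(s)) for s in sentences]
    if not scores:
        return []
    ranked = sorted(scores)
    # Nearest-rank percentile over the observed similarity scores.
    idx = min(len(ranked) - 1, max(0, math.ceil(percentile / 100 * len(ranked)) - 1))
    cutoff = max(ranked[idx], min_similarity)
    return [s for s, score in zip(sentences, scores) if score >= cutoff]


def token_count(texts):
    """Whitespace token count; real LLM tokenizers will give different numbers."""
    return sum(len(t.split()) for t in texts)


if __name__ == "__main__":
    question = "Which genes are associated with disease_X?"
    context = [
        "disease_X is associated with gene_A.",
        "disease_X resembles disease_Y.",
        "compound_Z treats disease_X.",
        "disease_X localizes to anatomy_W.",
    ]
    kept = prune_context(question, context, percentile=90)
    print(kept, token_count(context), token_count(kept))
```

Run on these four placeholder sentences with `percentile=90`, only the sentence sharing the most words with the question survives, and the whitespace token count drops from 15 to 5. A real embedding model would also capture semantic matches (for example "associates" versus "associated") that plain word overlap misses.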

Evaluation on biomedical datasets, including true/false questions and multiple-choice questions (MCQs), showed that KG-RAG can significantly boost performance. For example, it led to a 71% improvement in the Llama-2 model's accuracy on the challenging MCQ dataset. The framework also enhanced the capabilities of proprietary GPT models like GPT-3.5 and GPT-4.

Critical Analysis

The paper demonstrates the potential of the KG-RAG approach to help general-purpose language models handle domain-specific tasks more effectively. By streamlining how context is extracted from the knowledge graph and then pruned, the researchers cut token consumption by more than half compared with an existing knowledge-graph RAG technique, lowering the cost of running retrieval-augmented generation on proprietary models.

However, the paper does not provide much insight into the limitations of the approach or potential areas for further research. For example, it would be interesting to understand how the performance of KG-RAG compares to models that are explicitly trained on biomedical data, or how the framework might generalize to other specialized domains beyond biomedicine.

Additionally, the researchers could have delved deeper into the potential ethical considerations of deploying such a system, particularly around issues of transparency and accountability when generating biomedical text that could have real-world implications.

Conclusion

The KG-RAG framework represents a promising approach to enhancing the capabilities of large language models in specialized domains like biomedicine. By leveraging a knowledge graph to provide grounded, evidence-based context, the researchers were able to significantly boost the performance of models like Llama-2 and GPT on challenging biomedical tasks.

This work underscores the potential for hybrid systems that combine the strengths of large language models and structured knowledge to tackle complex, domain-specific challenges. As language models continue to advance, further innovations in this direction could lead to more reliable and trustworthy AI systems for high-stakes applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

Minbyul Jeong, Jiwoong Sohn, Mujeen Sung, Jaewoo Kang

Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from the knowledge corpus and appending them unconditionally or selectively to the input of LLMs for generation. However, when applying existing methods to different domain-specific problems, poor generalization becomes apparent, leading to fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses. We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens. Our work proves that domain-specific components, such as a retriever, domain-related document corpus, and instruction sets are necessary for adhering to domain-related instructions. Using three major medical question-answering benchmark datasets, experimental results of Self-BioRAG demonstrate significant performance gains by achieving a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Overall, we analyze that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge as a medical expert does. We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains.

6/19/2024

cs.CL cs.AI cs.IR


BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine

Mingchen Li, Halil Kilicoglu, Hua Xu, Rui Zhang

Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains; however, these models encounter issues such as generating inaccurate information or hallucinations. Retrieval-augmented generation provided a solution for these models to update knowledge and enhance their performance. In contrast to previous retrieval-augmented LMs, which utilize specialized cross-attention mechanisms to help LLM encode retrieved text, BiomedRAG adopts a simpler approach by directly inputting the retrieved chunk-based documents into the LLM. This straightforward design is easily applicable to existing retrieval and language models, effectively bypassing noise information in retrieved documents, particularly in noise-intensive tasks. Moreover, we demonstrate the potential for utilizing the LLM to supervise the retrieval model in the biomedical domain, enabling it to retrieve the document that assists the LM in improving its predictions. Our experiments reveal that with the tuned scorer, BiomedRAG attains superior performance across 5 biomedical NLP tasks, encompassing information extraction (triple extraction, relation extraction), text classification, link prediction, and question-answering, leveraging over 9 datasets. For instance, in the triple extraction task, BiomedRAG outperforms other triple extraction systems with micro-F1 scores of 81.42 and 88.83 on GIT and ChemProt corpora, respectively.

5/6/2024

cs.CL

Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

Zhongzhen Huang, Kui Xue, Yongqi Fan, Linjie Mu, Ruoyu Liu, Tong Ruan, Shaoting Zhang, Xiaofan Zhang

Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, Retrieval-augmented generation (RAG) has been utilized to provide external knowledge to facilitate the answer generation. However, applying such models to the medical domain faces several challenges due to the lack of domain-specific knowledge and the intricacy of real-world scenarios. In this study, we explore LLMs with RAG framework for knowledge-intensive tasks in the medical field. To evaluate the capabilities of LLMs, we introduce MedicineQA, a multi-round dialogue benchmark that simulates the real-world medication consultation scenario and requires LLMs to answer with retrieved evidence from the medicine database. MedicineQA contains 300 multi-round question-answering pairs, each embedded within a detailed dialogue history, highlighting the challenge posed by this knowledge-intensive task to current LLMs. We further propose a new Distill-Retrieve-Read framework instead of the previous Retrieve-then-Read. Specifically, the distillation and retrieval process utilizes a tool calling mechanism to formulate search queries that emulate the keyword-based inquiries used by search engines. With experimental results, we show that our framework brings notable performance improvements and surpasses the previous counterparts in the evidence retrieval process in terms of evidence retrieval accuracy. This advancement sheds light on applying RAG to the medical domain.

4/30/2024

cs.CL

KnowGPT: Knowledge Graph based Prompting for Large Language Models

Qinggang Zhang, Junnan Dong, Hao Chen, Daochen Zha, Zailiang Yu, Xiao Huang

Large Language Models (LLMs) have demonstrated remarkable capabilities in many real-world applications. Nonetheless, LLMs are often criticized for their tendency to produce hallucinations, wherein the models fabricate incorrect statements on tasks beyond their knowledge and perception. To alleviate this issue, researchers have explored leveraging the factual knowledge in knowledge graphs (KGs) to ground the LLM's responses in established facts and principles. However, most state-of-the-art LLMs are closed-source, making it challenging to develop a prompting framework that can efficiently and effectively integrate KGs into LLMs with hard prompts only. Generally, existing KG-enhanced LLMs usually suffer from three critical issues, including huge search space, high API costs, and laborious prompt engineering, that impede their widespread application in practice. To this end, we introduce a novel Knowledge Graph based PrompTing framework, namely KnowGPT, to enhance LLMs with domain knowledge. KnowGPT contains a knowledge extraction module to extract the most informative knowledge from KGs, and a context-aware prompt construction module to automatically convert extracted knowledge into effective prompts. Experiments on three benchmarks demonstrate that KnowGPT significantly outperforms all competitors. Notably, KnowGPT achieves a 92.6% accuracy on OpenbookQA leaderboard, comparable to human-level performance.

6/5/2024

cs.CL cs.AI