MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering

2309.16035

Published 7/1/2024 by Yucheng Shi, Shaochen Xu, Tianze Yang, Zhengliang Liu, Tianming Liu, Xiang Li, Ninghao Liu

Abstract

Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks like medical question answering (QA). Moreover, they tend to function as black-boxes, making it challenging to modify their behavior. To address the problem, our study delves into retrieval augmented generation (RAG), aiming to improve LLM responses without the need for fine-tuning or retraining. Specifically, we propose a comprehensive retrieval strategy to extract medical facts from an external knowledge base, and then inject them into the query prompt for LLMs. Focusing on medical QA using the MedQA-SMILE dataset, we evaluate the impact of different retrieval models and the number of facts provided to the LLM. Notably, our retrieval-augmented Vicuna-7B model exhibited an accuracy improvement from 44.46% to 48.54%. This work underscores the potential of RAG to enhance LLM performance, offering a practical approach to mitigate the challenges of black-box LLMs.

Overview

Large language models (LLMs) are powerful in general domains, but often perform poorly on specialized tasks like medical question answering.
LLMs also tend to function as black boxes, making it difficult to modify their behavior.
The researchers propose a comprehensive retrieval strategy to enhance LLM performance on medical question answering without the need for fine-tuning or retraining.

Plain English Explanation

Large language models (LLMs) are powerful AI systems that can understand and generate human-like text. However, when it comes to specialized tasks like answering medical questions, these models often struggle. This is because they are trained on a broad range of information, but may not have deep knowledge about specific domains like healthcare.

Additionally, LLMs are often considered "black boxes": it is hard to see how they work or why they produce a particular answer. This makes it difficult to modify their behavior or improve their performance on specific tasks.

To address these issues, the researchers in this study explored a technique called retrieval augmented generation (RAG). The idea is to enhance the LLM's knowledge by retrieving relevant information from an external source, such as a medical knowledge base, and then incorporating that information into the model's responses.

By adding this extra layer of medical knowledge, the researchers were able to improve the accuracy of their LLM on a medical question answering task, without having to completely retrain the model from scratch.

Technical Explanation

The researchers focused on improving the performance of LLMs on medical question answering using the MedQA-SMILE dataset. They proposed a comprehensive retrieval strategy to extract relevant medical facts from an external knowledge base and inject them into the query prompt for the LLM.

The key elements of their approach include:

Retrieval Model: The researchers compared different retrieval models for identifying the most relevant medical facts in the knowledge base, and varied the number of facts supplied to the LLM.
Retrieval Augmentation: The retrieved medical facts were then incorporated into the query prompt, allowing the LLM to leverage this additional information when generating its response.
Evaluation: The researchers evaluated their retrieval-augmented approach with the Vicuna-7B LLM on the MedQA-SMILE dataset. Accuracy rose from 44.46% to 48.54%, a gain of 4.08 percentage points.
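The retrieve-then-inject flow described in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `FACTS` list is a hypothetical toy knowledge base, the bag-of-words cosine similarity is a simple stand-in for whichever retrieval models the paper compares, and the prompt template in `build_prompt` is invented for the example.

```python
import math
import re
from collections import Counter

# Hypothetical toy knowledge base (assumption); the paper's knowledge base is not reproduced here.
FACTS = [
    "Metformin is a first-line drug for type 2 diabetes.",
    "Warfarin is an anticoagulant monitored with INR.",
    "Amoxicillin is a penicillin-class antibiotic.",
    "Insulin lowers blood glucose levels.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, facts, k=2):
    """Return the k facts most similar to the question.

    Bag-of-words cosine is a stand-in retriever (assumption), chosen
    only so that the sketch runs without external dependencies.
    """
    query = Counter(tokenize(question))
    ranked = sorted(
        facts,
        key=lambda fact: cosine(query, Counter(tokenize(fact))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, options, facts):
    """Inject the retrieved facts ahead of the multiple-choice question.

    The wording of this template is an assumption for illustration.
    """
    fact_block = "\n".join(f"- {fact}" for fact in facts)
    option_block = "\n".join(f"{label}. {text}" for label, text in options.items())
    return (
        "Use the following medical facts if they are relevant.\n"
        f"{fact_block}\n\n"
        f"Question: {question}\n{option_block}\nAnswer:"
    )


question = "Which drug is first-line for type 2 diabetes?"
options = {"A": "Warfarin", "B": "Metformin", "C": "Amoxicillin", "D": "Insulin"}
prompt = build_prompt(question, options, retrieve(question, FACTS, k=2))
print(prompt)
```

The resulting `prompt` string would then be sent to the LLM unchanged, which is what lets this style of augmentation work without touching the model's weights. The parameter `k` corresponds to the "number of facts" that the study varies.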

By focusing on retrieval augmentation, the researchers were able to enhance the LLM's performance on medical question answering without the need for fine-tuning or retraining the entire model. This approach offers a practical solution to the challenges of working with black-box LLMs in specialized domains.
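To make the reported numbers concrete, the short sketch below shows how multiple-choice accuracy is typically computed and what the change from 44.46% to 48.54% amounts to in absolute and relative terms. The function names and the four-question example are hypothetical; only the two accuracy figures come from the paper.

```python
def accuracy(predictions, gold):
    """Percentage of predicted answer labels that match the gold labels."""
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and equal in length")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def gain(baseline, augmented):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = augmented - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 2), round(relative, 2)


# Hypothetical four-question example: three of four answers are correct.
print(accuracy(["A", "B", "C", "D"], ["A", "B", "D", "D"]))  # 75.0

# Accuracy figures reported for Vicuna-7B on MedQA-SMILE.
BASELINE_ACC = 44.46
RAG_ACC = 48.54
abs_gain, rel_gain = gain(BASELINE_ACC, RAG_ACC)
print(f"absolute gain: {abs_gain} points, relative gain: {rel_gain}%")
```

The absolute gain is 4.08 percentage points, which is roughly a 9% relative improvement over the baseline.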

Critical Analysis

The researchers acknowledge several limitations and areas for further research in their paper:

The retrieval-augmented approach still lags behind human-level performance on the MedQA-SMILE dataset, indicating that there is room for improvement.
The study focused on a single LLM (Vicuna-7B) and a specific medical question answering dataset. Further research is needed to evaluate the generalizability of the approach across different LLMs and medical domains.
The researchers did not explore the impact of the retrieval strategy on the interpretability or transparency of the LLM's decision-making process. Additional research is needed to understand how the retrieval-augmented approach affects the model's inner workings.

While the researchers have demonstrated the potential of retrieval augmentation to enhance LLM performance on medical tasks, there are still several areas that warrant further investigation. Exploring the scalability of the approach, its impact on model interpretability, and its applicability to a wider range of domains could help unlock the full potential of this technique.

Conclusion

The study presented in this paper offers a promising approach to improving the performance of large language models on specialized tasks like medical question answering. By leveraging retrieval-augmented generation, the researchers were able to enhance the Vicuna-7B LLM's accuracy without the need for extensive fine-tuning or retraining.

This work underscores the potential of retrieval-augmented generation (RAG) to bridge the gap between the general capabilities of LLMs and the domain-specific knowledge required for specialized tasks. Because the method changes only the prompt and leaves the model weights untouched, it is a practical option when an LLM cannot easily be fine-tuned or retrained, although the study itself evaluates only Vicuna-7B on MedQA-SMILE.

Related Papers

Tool Calling: Enhancing Medication Consultation via Retrieval-Augmented Large Language Models

Zhongzhen Huang, Kui Xue, Yongqi Fan, Linjie Mu, Ruoyu Liu, Tong Ruan, Shaoting Zhang, Xiaofan Zhang

Large-scale language models (LLMs) have achieved remarkable success across various language tasks but suffer from hallucinations and temporal misalignment. To mitigate these shortcomings, Retrieval-augmented generation (RAG) has been utilized to provide external knowledge to facilitate the answer generation. However, applying such models to the medical domain faces several challenges due to the lack of domain-specific knowledge and the intricacy of real-world scenarios. In this study, we explore LLMs with RAG framework for knowledge-intensive tasks in the medical field. To evaluate the capabilities of LLMs, we introduce MedicineQA, a multi-round dialogue benchmark that simulates the real-world medication consultation scenario and requires LLMs to answer with retrieved evidence from the medicine database. MedicineQA contains 300 multi-round question-answering pairs, each embedded within a detailed dialogue history, highlighting the challenge posed by this knowledge-intensive task to current LLMs. We further propose a new Distill-Retrieve-Read framework instead of the previous Retrieve-then-Read. Specifically, the distillation and retrieval process utilizes a tool calling mechanism to formulate search queries that emulate the keyword-based inquiries used by search engines. With experimental results, we show that our framework brings notable performance improvements and surpasses the previous counterparts in the evidence retrieval process in terms of evidence retrieval accuracy. This advancement sheds light on applying RAG to the medical domain.

4/30/2024

cs.CL

Improving Medical Reasoning through Retrieval and Self-Reflection with Retrieval-Augmented Large Language Models

Minbyul Jeong, Jiwoong Sohn, Mujeen Sung, Jaewoo Kang

Recent proprietary large language models (LLMs), such as GPT-4, have achieved a milestone in tackling diverse challenges in the biomedical domain, ranging from multiple-choice questions to long-form generations. To address challenges that still cannot be handled with the encoded knowledge of LLMs, various retrieval-augmented generation (RAG) methods have been developed by searching documents from the knowledge corpus and appending them unconditionally or selectively to the input of LLMs for generation. However, when applying existing methods to different domain-specific problems, poor generalization becomes apparent, leading to fetching incorrect documents or making inaccurate judgments. In this paper, we introduce Self-BioRAG, a framework reliable for biomedical text that specializes in generating explanations, retrieving domain-specific documents, and self-reflecting generated responses. We utilize 84k filtered biomedical instruction sets to train Self-BioRAG that can assess its generated explanations with customized reflective tokens. Our work proves that domain-specific components, such as a retriever, domain-related document corpus, and instruction sets are necessary for adhering to domain-related instructions. Using three major medical question-answering benchmark datasets, experimental results of Self-BioRAG demonstrate significant performance gains by achieving a 7.2% absolute improvement on average over the state-of-the-art open-foundation model with a parameter size of 7B or less. Overall, we analyze that Self-BioRAG finds the clues in the question, retrieves relevant documents if needed, and understands how to answer with information from retrieved documents and encoded knowledge as a medical expert does. We release our data and code for training our framework components and model weights (7B and 13B) to enhance capabilities in biomedical and clinical domains.

6/19/2024

cs.CL cs.AI cs.IR

A Survey on RAG Meets LLMs: Towards Retrieval-Augmented Large Language Models

Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, Qing Li

As one of the most advanced techniques in AI, Retrieval-Augmented Generation (RAG) can offer reliable and up-to-date external knowledge, providing huge convenience for numerous tasks. Particularly in the era of AI-Generated Content (AIGC), the powerful capacity of retrieval in providing additional knowledge enables RAG to assist existing generative AI in producing high-quality outputs. Recently, Large Language Models (LLMs) have demonstrated revolutionary abilities in language understanding and generation, while still facing inherent limitations, such as hallucinations and out-of-date internal knowledge. Given the powerful abilities of RAG in providing the latest and helpful auxiliary information, Retrieval-Augmented Large Language Models (RA-LLMs) have emerged to harness external and authoritative knowledge bases, rather than solely relying on the model's internal knowledge, to augment the generation quality of LLMs. In this survey, we comprehensively review existing research studies in RA-LLMs, covering three primary technical perspectives: architectures, training strategies, and applications. As the preliminary knowledge, we briefly introduce the foundations and recent advances of LLMs. Then, to illustrate the practical significance of RAG for LLMs, we systematically review mainstream relevant work by their architectures, training strategies, and application areas, detailing specifically the challenges of each and the corresponding capabilities of RA-LLMs. Finally, to deliver deeper insights, we discuss current limitations and several promising directions for future research. Updated information about this survey can be found at https://advanced-recommender-systems.github.io/RAG-Meets-LLMs/

6/18/2024

cs.CL cs.AI cs.IR

Augmenting Query and Passage for Retrieval-Augmented Generation using LLMs for Open-Domain Question Answering

Minsang Kim, Cheoneum Park, Seungjun Baek

Retrieval-augmented generation (RAG) has received much attention for Open-domain question-answering (ODQA) tasks as a means to compensate for the parametric knowledge of large language models (LLMs). While previous approaches focused on processing retrieved passages to remove irrelevant context, they still rely heavily on the quality of retrieved passages which can degrade if the question is ambiguous or complex. In this paper, we propose a simple yet efficient method called question and passage augmentation via LLMs for open-domain QA. Our method first decomposes the original questions into multiple-step sub-questions. By augmenting the original question with detailed sub-questions and planning, we are able to make the query more specific on what needs to be retrieved, improving the retrieval performance. In addition, to compensate for the case where the retrieved passages contain distracting information or divided opinions, we augment the retrieved passages with self-generated passages by LLMs to guide the answer extraction. Experimental results show that the proposed scheme outperforms the previous state-of-the-art and achieves significant performance gain over existing RAG methods.

6/21/2024

cs.CL cs.AI