Improving Health Question Answering with Reliable and Time-Aware Evidence Retrieval

2404.08359

Published 4/15/2024 by Juraj Vladika, Florian Matthes

Abstract

In today's digital world, seeking answers to health questions on the Internet is a common practice. However, existing question answering (QA) systems often rely on using pre-selected and annotated evidence documents, thus making them inadequate for addressing novel questions. Our study focuses on the open-domain QA setting, where the key challenge is to first uncover relevant evidence in large knowledge bases. By utilizing the common retrieve-then-read QA pipeline and PubMed as a trustworthy collection of medical research documents, we answer health questions from three diverse datasets. We modify different retrieval settings to observe their influence on the QA pipeline's performance, including the number of retrieved documents, sentence selection process, the publication year of articles, and their number of citations. Our results reveal that cutting down on the amount of retrieved documents and favoring more recent and highly cited documents can improve the final macro F1 score up to 10%. We discuss the results, highlight interesting examples, and outline challenges for future research, like managing evidence disagreement and crafting user-friendly explanations.

Overview

This research paper focuses on improving health-related question answering by developing reliable and time-aware evidence retrieval methods.
The authors aim to address the challenges of retrieving relevant and up-to-date information to answer health-related questions accurately.
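The paper varies retrieval settings in a retrieve-then-read pipeline over PubMed (number of retrieved documents, sentence selection, publication year, and citation count) and measures their effect on answer quality across three health QA datasets.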

Plain English Explanation

The paper presents methods to improve the ability of computer systems to answer health-related questions accurately. One of the key challenges in this area is ensuring that the information used to answer questions is reliable and up-to-date. This is crucial when dealing with health and medical topics, as outdated or inaccurate information can have serious consequences.

The researchers studied how to better retrieve relevant and timely evidence to support answering health questions. This involves searching a large collection of medical research articles and deciding how many documents to use, which sentences to keep, and whether to favor articles that are more recent or more frequently cited. By improving the evidence retrieval process, the authors aimed to enhance the overall performance of health question answering systems, making them more useful and trustworthy for users.

Technical Explanation

The paper explores several approaches to improving health question answering, with a focus on reliable and time-aware evidence retrieval.

First, the authors investigate biomedical question answering and the unique challenges it presents, such as the need for up-to-date information and the importance of reliable sources.

To address these challenges, the researchers use the common retrieve-then-read QA pipeline, with PubMed serving as a trustworthy collection of medical research documents. The pipeline first retrieves evidence relevant to a health question from this large knowledge base and then reads that evidence to produce an answer.

The authors then vary the retrieval settings to observe their influence on the QA pipeline's performance: the number of retrieved documents, the sentence selection process, the publication year of the articles, and their number of citations.
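The time-aware and citation-aware part of retrieval can be illustrated with a minimal sketch. It assumes each retrieved document already carries a relevance score from some first-stage retriever, plus its publication year and citation count; the `Doc` schema, the `select_evidence` function, and the threshold values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """A retrieved abstract with the metadata used for filtering (hypothetical schema)."""
    doc_id: str
    relevance: float  # score from any first-stage retriever
    year: int         # publication year
    citations: int    # citation count


def select_evidence(docs, top_k=5, min_year=None, min_citations=0):
    """Keep documents that pass the year and citation thresholds,
    then return the top_k of them ranked by relevance."""
    kept = [
        d for d in docs
        if (min_year is None or d.year >= min_year)
        and d.citations >= min_citations
    ]
    kept.sort(key=lambda d: d.relevance, reverse=True)
    return kept[:top_k]


# Made-up documents, for illustration only.
docs = [
    Doc("a", 0.91, 2003, 4),
    Doc("b", 0.88, 2019, 120),
    Doc("c", 0.80, 2021, 35),
    Doc("d", 0.75, 2022, 2),
]

selected = select_evidence(docs, top_k=2, min_year=2015, min_citations=10)
print([d.doc_id for d in selected])  # ['b', 'c']
```

Hard thresholds are only one way to favor recent and highly cited articles. A weighted score that blends relevance with recency and citations would be another option, and the summary does not say which variant the authors used.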

In the experiments, retrieving fewer documents and favoring more recent and highly cited ones improved the final macro F1 score by up to 10%.
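The abstract reports its gains in macro F1, which averages the per-class F1 scores so that every answer class counts equally regardless of how often it occurs. The sketch below computes it in plain Python; the `macro_f1` helper and the "yes"/"no" labels are illustrative assumptions, since this summary does not state the datasets' label sets.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Made-up gold and predicted answers, for illustration only.
gold = ["yes", "yes", "no", "no"]
pred = ["yes", "no", "no", "no"]
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Here the "yes" class scores 2/3 and the "no" class scores 4/5, so the macro average is 11/15.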

Critical Analysis

The paper provides a comprehensive and well-designed approach to enhancing health question answering systems. The focus on reliable and time-aware evidence retrieval is a crucial aspect, as it addresses a significant challenge in this domain.

One potential limitation of the research is its reliance on a single knowledge base, PubMed. While the authors demonstrate the effectiveness of their methods on this corpus, it would be valuable to see how the system performs on a broader range of health-related information sources, including dynamic and user-generated content.

Additionally, the paper could have explored the potential biases or limitations of the underlying language models and retrieval algorithms, and how these might impact the overall system performance. Addressing such considerations could further strengthen the reliability and trustworthiness of the proposed solution.

Conclusion

This research paper studies how reliable and time-aware evidence retrieval can improve health question answering. By retrieving fewer documents and favoring more recent and highly cited articles from PubMed, the authors' pipeline more effectively identifies relevant and up-to-date evidence, with reported gains of up to 10% in macro F1.

The techniques described in this paper have the potential to significantly enhance the capabilities of health question answering systems, making them more useful and reliable for users seeking medical information. This research contributes to the ongoing efforts to improve the quality and reliability of health-related information available to the public, which is crucial for promoting better-informed decision-making and improved health outcomes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Retrieval Augmented Generation for Domain-specific Question Answering

Sanat Sharma, David Seunghyun Yoon, Franck Dernoncourt, Dewang Sultania, Karishma Bagga, Mengjiao Zhang, Trung Bui, Varun Kotte

Question answering (QA) has become an important application in the advanced development of large language models. General pre-trained large language models for question-answering are not trained to properly understand the knowledge or terminology for a specific domain, such as finance, healthcare, education, and customer service for a product. To better cater to domain-specific understanding, we build an in-house question-answering system for Adobe products. We propose a novel framework to compile a large question-answer database and develop the approach for retrieval-aware finetuning of a Large Language model. We showcase that fine-tuning the retriever leads to major improvements in the final generation. Our overall approach reduces hallucinations during generation while keeping in context the latest retrieval information for contextual grounding.

5/30/2024

cs.CL cs.AI cs.IR cs.LG

To Generate or to Retrieve? On the Effectiveness of Artificial Contexts for Medical Open-Domain Question Answering

Giacomo Frisoni, Alessio Cocchieri, Alex Presepi, Gianluca Moro, Zaiqiao Meng

Medical open-domain question answering demands substantial access to specialized knowledge. Recent efforts have sought to decouple knowledge from model parameters, counteracting architectural scaling and allowing for training on common low-resource hardware. The retrieve-then-read paradigm has become ubiquitous, with model predictions grounded on relevant knowledge pieces from external repositories such as PubMed, textbooks, and UMLS. An alternative path, still under-explored but made possible by the advent of domain-specific large language models, entails constructing artificial contexts through prompting. As a result, to generate or to retrieve is the modern equivalent of Hamlet's dilemma. This paper presents MedGENIE, the first generate-then-read framework for multiple-choice question answering in medicine. We conduct extensive experiments on MedQA-USMLE, MedMCQA, and MMLU, incorporating a practical perspective by assuming a maximum of 24GB VRAM. MedGENIE sets a new state-of-the-art in the open-book setting of each testbed, allowing a small-scale reader to outcompete zero-shot closed-book 175B baselines while using up to 706$\times$ fewer parameters. Our findings reveal that generated passages are more effective than retrieved ones in attaining higher accuracy.

6/14/2024

cs.CL cs.AI

ExpertQA: Expert-Curated Questions and Attributed Answers

Chaitanya Malaviya, Subin Lee, Sihao Chen, Elizabeth Sieber, Mark Yatskar, Dan Roth

As language models are adopted by a more sophisticated and diverse set of users, the importance of guaranteeing that they provide factually correct information supported by verifiable sources is critical across fields of study. This is especially the case for high-stakes fields, such as medicine and law, where the risk of propagating false information is high and can lead to undesirable societal consequences. Previous work studying attribution and factuality has not focused on analyzing these characteristics of language model outputs in domain-specific scenarios. In this work, we conduct human evaluation of responses from a few representative systems along various axes of attribution and factuality, by bringing domain experts in the loop. Specifically, we collect expert-curated questions from 484 participants across 32 fields of study, and then ask the same experts to evaluate generated responses to their own questions. In addition, we ask experts to improve upon responses from language models. The output of our analysis is ExpertQA, a high-quality long-form QA dataset with 2177 questions spanning 32 fields, along with verified answers and attributions for claims in the answers.

4/3/2024

cs.CL cs.AI

EWEK-QA: Enhanced Web and Efficient Knowledge Graph Retrieval for Citation-based Question Answering Systems

Mohammad Dehghan, Mohammad Ali Alomrani, Sunyam Bagga, David Alfonso-Hermelo, Khalil Bibi, Abbas Ghaddar, Yingxue Zhang, Xiaoguang Li, Jianye Hao, Qun Liu, Jimmy Lin, Boxing Chen, Prasanna Parthasarathi, Mahdi Biparva, Mehdi Rezagholizadeh

The emerging citation-based QA systems are gaining more attention especially in generative AI search applications. The importance of extracted knowledge provided to these systems is vital from both accuracy (completeness of information) and efficiency (extracting the information in a timely manner). In this regard, citation-based QA systems are suffering from two shortcomings. First, they usually rely only on web as a source of extracted knowledge and adding other external knowledge sources can hamper the efficiency of the system. Second, web-retrieved contents are usually obtained by some simple heuristics such as fixed length or breakpoints which might lead to splitting information into pieces. To mitigate these issues, we propose our enhanced web and efficient knowledge graph (KG) retrieval solution (EWEK-QA) to enrich the content of the extracted knowledge fed to the system. This has been done through designing an adaptive web retriever and incorporating KGs triples in an efficient manner. We demonstrate the effectiveness of EWEK-QA over the open-source state-of-the-art (SoTA) web-based and KG baseline models using a comprehensive set of quantitative and human evaluation experiments. Our model is able to: first, improve the web-retriever baseline in terms of extracting more relevant passages (>20%), the coverage of answer span (>25%) and self containment (>35%); second, obtain and integrate KG triples into its pipeline very efficiently (by avoiding any LLM calls) to outperform the web-only and KG-only SoTA baselines significantly in 7 quantitative QA tasks and our human evaluation.

6/18/2024

cs.CL