MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction

Read original: arXiv:2405.18015 - Published 5/29/2024 by Xiang Dai, Sarvnaz Karimi, Abeed Sarker, Ben Hachey, Cecile Paris

Overview

• This paper introduces MultiADE, a new benchmark for adverse drug event (ADE) extraction from several types of text, such as scientific literature and social media posts.

• The benchmark covers multiple domains, where a domain is a text type, making it a more comprehensive resource for training and evaluating ADE extraction models and for measuring how well they generalise across domains.

• The paper also presents several baseline models for adverse drug event extraction on the MultiADE dataset, providing a starting point for further research in this area.

Plain English Explanation

The paper describes the creation of a new benchmark called MultiADE that can be used to train and test machine learning models that automatically detect adverse drug events in text from different sources.

Adverse drug events are negative reactions that can occur when a person takes a medication. They are an important medical issue, as they can lead to serious health problems and even hospitalizations.

The MultiADE benchmark combines several existing datasets sampled from different text types with a newly created dataset, CADECv2, which extends CADEC (Karimi et al., 2015) to online posts about a more diverse set of drugs. This makes it broader than previous datasets and shared tasks, which tended to focus on a single type of text.

By providing this diverse dataset, the researchers hope to spur the development of more robust and accurate machine learning models for identifying adverse drug events. This could ultimately lead to improved patient safety and better medical care.

Technical Explanation

The paper introduces MultiADE, a new multi-domain benchmark for adverse drug event extraction. Here a domain is a text type, such as scientific literature or social media posts, so the benchmark is broader than previous datasets that each focused on a single type of text.

The benchmark comprises several existing datasets plus the newly created CADECv2, which human annotators labelled following detailed annotation guidelines. The authors then evaluated several baseline machine learning models for the task of adverse drug event extraction on the MultiADE dataset, including transformer-based models and graph neural networks.
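A multi-domain benchmark makes it possible to train on one domain and test on every other. The sketch below illustrates that evaluation grid only; the two tiny domains, the example sentences, and the lexicon-lookup "model" are hypothetical stand-ins, not MultiADE data or the paper's baselines. For brevity it reports recall and scores the in-domain cells on the training examples themselves, whereas a real benchmark would use held-out test splits.

```python
# Toy cross-domain evaluation grid.
# All data and the lexicon "model" are hypothetical, for illustration only.

# Each example is (text, set of gold ADE mentions found in that text).
DOMAINS = {
    "literature": [
        ("the patient developed hepatotoxicity after methotrexate", {"hepatotoxicity"}),
        ("nausea was reported following cisplatin", {"nausea"}),
    ],
    "social_media": [
        ("this med gave me nausea and i felt dizzy all day", {"nausea", "dizzy"}),
        ("cant sleep since starting it and super dizzy", {"cant sleep", "dizzy"}),
    ],
}


def train_lexicon(examples):
    """'Train' by memorising every ADE mention seen in the source domain."""
    lexicon = set()
    for _text, mentions in examples:
        lexicon |= mentions
    return lexicon


def predict(lexicon, text):
    """Predict every memorised term that occurs as a substring of the text."""
    return {term for term in lexicon if term in text}


def recall(lexicon, examples):
    """Fraction of gold ADE mentions that the lexicon recovers."""
    found = total = 0
    for text, gold in examples:
        found += len(predict(lexicon, text) & gold)
        total += len(gold)
    return found / total if total else 0.0


def cross_domain_grid(domains):
    """Score every (train domain, test domain) pair."""
    grid = {}
    for source, train_examples in domains.items():
        lexicon = train_lexicon(train_examples)
        for target, test_examples in domains.items():
            grid[(source, target)] = recall(lexicon, test_examples)
    return grid


if __name__ == "__main__":
    for (source, target), score in cross_domain_grid(DOMAINS).items():
        print(f"train={source:<12} test={target:<12} recall={score:.2f}")
```

In this toy setup the off-diagonal cells score lower than the diagonal because the vocabulary used to describe side effects differs between the two text types, which is the kind of gap a cross-domain grid is meant to expose.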

The results demonstrate that the MultiADE dataset presents a challenging benchmark for adverse drug event extraction, with the best-performing model achieving an F1 score of 0.74. The authors also analyse the model performances and find that trained models generalise poorly to unseen domains, highlighting the need for further work on domain adaptation, including cost-effective ways to select useful training instances.
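For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below computes it for extracted spans under an exact-match assumption: a predicted span counts only if its start offset, end offset, and label all equal a gold span. This summary does not state which matching criterion the paper uses, and the offsets below are made up for illustration.

```python
def span_prf(gold_spans, predicted_spans):
    """Return (precision, recall, F1) under exact span matching.

    Each span is a (start, end, label) tuple. Exact matching is an
    assumption made for this sketch, not a detail taken from the paper.
    """
    gold, pred = set(gold_spans), set(predicted_spans)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical character offsets: 5 gold spans, 4 predictions, 3 exact matches.
gold = [(10, 16, "ADE"), (42, 51, "ADE"), (80, 88, "ADE"),
        (95, 103, "ADE"), (120, 131, "ADE")]
pred = [(10, 16, "ADE"), (42, 51, "ADE"), (80, 88, "ADE"), (60, 66, "ADE")]

precision, recall, f1 = span_prf(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here precision is 3/4 and recall is 3/5, so F1 is 2 × 0.75 × 0.6 / (0.75 + 0.6) ≈ 0.67.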

Critical Analysis

The MultiADE dataset represents a valuable contribution to the field of adverse drug event detection, as it provides a more diverse and challenging benchmark than previous datasets. The inclusion of multiple text types is a particular strength, as it better reflects the range of data sources, such as medical literature and social media, that adverse event surveillance draws on in practice.

However, the paper does not address some potential limitations of the dataset. For example, the authors do not provide details on the distribution of adverse drug events across the different domains, which could affect how difficult the task is in each one. Additionally, the annotation process and the reliability of the ground-truth labels are not thoroughly discussed.

Furthermore, the paper focuses primarily on the development and evaluation of the dataset, but does not delve deeply into the insights gained from the baseline model performances. A more in-depth analysis of the strengths and weaknesses of the different approaches, as well as suggestions for future research directions, could have strengthened the paper's contribution to the field.

Conclusion

The MultiADE dataset introduced in this paper represents a significant advancement in the field of adverse drug event extraction. By providing a multi-domain benchmark, the researchers have created a more comprehensive resource for training and evaluating machine learning models in this important area of medical informatics.

The baseline model results highlight the challenges posed by the MultiADE dataset and the need for further work on domain adaptation so that a single model can handle different types of text. The availability of the MultiADE dataset should encourage the development of more robust and accurate models, ultimately leading to improved patient safety and better medical care.

Related Papers

MultiADE: A Multi-domain Benchmark for Adverse Drug Event Extraction

Xiang Dai, Sarvnaz Karimi, Abeed Sarker, Ben Hachey, Cecile Paris

Objective. Active adverse event surveillance monitors Adverse Drug Events (ADE) from different data sources, such as electronic health records, medical literature, social media and search engine logs. Over the years, many datasets have been created, and shared tasks organised, to facilitate active adverse event surveillance. However, most, if not all, datasets or shared tasks focus on extracting ADEs from a particular type of text. Domain generalisation, the ability of a machine learning model to perform well on new, unseen domains (text types), is under-explored. Given the rapid advancements in natural language processing, one unanswered question is how far we are from having a single ADE extraction model that is effective on various types of text, such as scientific literature and social media posts. Methods. We contribute to answering this question by building a multi-domain benchmark for adverse drug event extraction, which we named MultiADE. The new benchmark comprises several existing datasets sampled from different text types and our newly created dataset, CADECv2, which is an extension of CADEC (Karimi et al., 2015), covering online posts regarding more diverse drugs than CADEC. Our new dataset is carefully annotated by human annotators following detailed annotation guidelines. Conclusion. Our benchmark results show that the generalisation of the trained models is far from perfect, making it infeasible to deploy them to process different types of text. In addition, although intermediate transfer learning is a promising approach to utilising existing resources, further investigation is needed on methods of domain adaptation, particularly cost-effective methods to select useful training instances.

5/29/2024

Enhancing Adverse Drug Event Detection with Multimodal Dataset: Corpus Creation and Model Development

Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Aman Chadha, Samrat Mondal

The mining of adverse drug events (ADEs) is pivotal in pharmacovigilance, enhancing patient safety by identifying potential risks associated with medications, facilitating early detection of adverse events, and guiding regulatory decision-making. Traditional ADE detection methods are reliable but slow, not easily adaptable to large-scale operations, and offer limited information. With the exponential increase in data sources like social media content, biomedical literature, and Electronic Medical Records (EMR), extracting relevant ADE-related information from these unstructured texts is imperative. Previous ADE mining studies have focused on text-based methodologies, overlooking visual cues, limiting contextual comprehension, and hindering accurate interpretation. To address this gap, we present a MultiModal Adverse Drug Event (MMADE) detection dataset, merging ADE-related textual information with visual aids. Additionally, we introduce a framework that leverages the capabilities of LLMs and VLMs for ADE detection by generating detailed descriptions of medical images depicting ADEs, aiding healthcare professionals in visually identifying adverse events. Using our MMADE dataset, we showcase the significance of integrating visual cues from images to enhance overall performance. This approach holds promise for patient safety, ADE awareness, and healthcare accessibility, paving the way for further exploration in personalized healthcare.

5/28/2024

CT-ADE: An Evaluation Benchmark for Adverse Drug Event Prediction from Clinical Trial Results

Anthony Yazdani, Alban Bornet, Philipp Khlebnikov, Boya Zhang, Hossein Rouhizadeh, Poorya Amini, Douglas Teodoro

Adverse drug events (ADEs) significantly impact clinical research, causing many clinical trial failures. ADE prediction is key for developing safer medications and enhancing patient outcomes. To support this effort, we introduce CT-ADE, a dataset for multilabel predictive modeling of ADEs in monopharmacy treatments. CT-ADE integrates data from 2,497 unique drugs, encompassing 168,984 drug-ADE pairs extracted from clinical trials, annotated with patient and contextual information, and comprehensive ADE concepts standardized across multiple levels of the MedDRA ontology. Preliminary analyses with large language models (LLMs) achieved F1-scores up to 55.90%. Models using patient and contextual information showed F1-score improvements of 21%-38% over models using only chemical structure data. Our results highlight the importance of target population and treatment regimens in the predictive modeling of ADEs, offering greater performance gains than LLM domain specialization and scaling. CT-ADE provides an essential tool for researchers aiming to leverage artificial intelligence and machine learning to enhance patient safety and minimize the impact of ADEs on pharmaceutical research and development. The dataset is publicly accessible at https://github.com/ds4dh/CT-ADE.

7/31/2024

Knowledge-augmented Graph Neural Networks with Concept-aware Attention for Adverse Drug Event Detection

Shaoxiong Ji, Ya Gao, Pekka Marttinen

Adverse drug events (ADEs) are an important aspect of drug safety. Various texts such as biomedical literature, drug reviews, and user posts on social media and medical forums contain a wealth of information about ADEs. Recent studies have applied word embedding and deep learning-based natural language processing to automate ADE detection from text. However, they did not explore incorporating explicit medical knowledge about drugs and adverse reactions or the corresponding feature learning. This paper adopts the heterogeneous text graph which describes relationships between documents, words and concepts, augments it with medical knowledge from the Unified Medical Language System, and proposes a concept-aware attention mechanism which learns features differently for the different types of nodes in the graph. We further utilize contextualized embeddings from pretrained language models and convolutional graph neural networks for effective feature representation and relational learning. Experiments on four public datasets show that our model achieves performance competitive to the recent advances and the concept-aware attention consistently outperforms other attention mechanisms.

5/21/2024