MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

Read original: arXiv:2407.12309 - Published 7/18/2024 by Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu, Jian-Zhe Wang, Shun Liu, Jun-En Ding, David Restrepo, Feng Liu, Fang-Ming Hung, Wen-Chih Peng

Overview

This paper introduces MEDFuse, a novel approach for fusing multimodal data from electronic health records (EHRs) to enable improved computer-aided diagnosis.
The key innovations of MEDFuse include masked lab-test modeling and the use of large language models for feature extraction and fusion.
The authors evaluate MEDFuse on the public MIMIC-III dataset and the in-house FEMH dataset, reporting over 90% F1 score on a 10-disease multi-label classification task and improvements over existing multimodal fusion methods.

Plain English Explanation

MEDFuse is a new technique that aims to improve the accuracy of computer systems that help doctors diagnose medical conditions. It does this by combining different types of data from a patient's electronic health record, such as lab test results, medical notes, and demographic information.

The researchers developed two main innovations in MEDFuse. First, they used a technique called "masked lab-test modeling" to help the system better understand the relationships between different lab test results, even when some of the test data is missing. Second, they leveraged large language models - powerful AI systems trained on vast amounts of text - to extract meaningful features from the unstructured medical notes and other data.

By combining these innovations, MEDFuse was able to outperform other state-of-the-art methods for fusing multimodal EHR data. This suggests it could be a valuable tool for building more accurate computer-aided diagnosis systems, which could ultimately help doctors make better-informed decisions and provide more personalized care for patients.

Technical Explanation

The key technical innovations in MEDFuse include:

Masked Lab-Test Modeling: The authors developed a novel masking strategy to handle missing lab test data in EHRs. They train a masked lab-test prediction model to learn the underlying relationships between different lab test results, allowing the system to infer and impute missing values.
Large Language Model Integration: MEDFuse uses large language models (LLMs) fine-tuned on free clinical text to extract embeddings from the unstructured clinical notes in the EHRs.
Multimodal Fusion: The authors design a disentangled transformer module, optimized by a mutual information loss, that decouples modality-specific from modality-shared information and combines the lab-test embeddings with the clinical-note embeddings into a joint representation for downstream tasks like disease diagnosis.
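
The masked lab-test idea in the list above can be illustrated with a small sketch. This is not the authors' implementation: the function names (`mask_lab_row`, `reconstruction_loss`, `column_mean_baseline`), the 30% default mask ratio, and the column-mean "model" are all assumptions chosen for illustration. The paper uses a masked tabular transformer in place of the baseline shown here. The sketch only shows the training signal: hide some observed lab values, predict them, and score the prediction on the hidden positions alone.

```python
import random

MASK = None  # marks a hidden (or genuinely missing) lab value


def mask_lab_row(row, mask_ratio=0.3, rng=None):
    """Hide a random subset of the observed values in one patient's lab row.

    Returns (masked_row, targets), where targets maps each hidden column
    index to the original value that a model should reconstruct.
    Values that were already missing are never selected as targets.
    """
    rng = rng or random.Random()
    observed = [i for i, v in enumerate(row) if v is not MASK]
    # Hide at least one observed value, but never more than exist.
    n_hide = min(len(observed), max(1, round(mask_ratio * len(observed))))
    hidden = set(rng.sample(observed, n_hide))
    masked = [MASK if i in hidden else v for i, v in enumerate(row)]
    targets = {i: row[i] for i in hidden}
    return masked, targets


def reconstruction_loss(predicted, targets):
    """Mean squared error over the hidden positions only."""
    if not targets:
        return 0.0
    return sum((predicted[i] - t) ** 2 for i, t in targets.items()) / len(targets)


def column_mean_baseline(rows):
    """Stand-in for a learned model: predict each column's observed mean."""
    means = []
    for c in range(len(rows[0])):
        vals = [r[c] for r in rows if r[c] is not MASK]
        means.append(sum(vals) / len(vals) if vals else 0.0)
    return means


# Hypothetical lab panel with four tests per patient; None = not measured.
rows = [
    [5.1, 140.0, None, 0.9],
    [4.8, 138.0, 13.5, 1.1],
    [5.4, None, 12.9, 1.0],
]
masked, targets = mask_lab_row(rows[1], mask_ratio=0.5, rng=random.Random(0))
loss = reconstruction_loss(column_mean_baseline(rows), targets)
```

Scoring only the hidden positions is what lets the same model later fill in values that were never measured: at inference time the genuinely missing entries play the role of the masked ones.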

The authors evaluate MEDFuse on two real-world EHR datasets: the public MIMIC-III dataset and the in-house FEMH dataset. They report over 90% F1 score on a 10-disease multi-label classification task and performance improvements over existing multimodal fusion methods, which they attribute to the masked lab-test modeling and large language model integration.
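
For readers unfamiliar with the metric, the sketch below shows how an F1 score can be computed for a multi-label task, where each patient may have several diseases at once. The source does not say which averaging scheme the reported F1 uses, so both macro and micro averaging are shown as assumptions. The function names and the toy labels (3 diseases, 4 patients) are hypothetical and are not data from the paper.

```python
def f1_per_label(y_true, y_pred):
    """Return one F1 score per label column for binary multi-label data."""
    scores = []
    for j in range(len(y_true[0])):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no positives and no predictions scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(y_true, y_pred):
    """Average the per-label F1 scores, weighting every label equally."""
    scores = f1_per_label(y_true, y_pred)
    return sum(scores) / len(scores)


def micro_f1(y_true, y_pred):
    """Pool true/false positives and false negatives across all labels."""
    tp = fp = fn = 0
    for t_row, p_row in zip(y_true, y_pred):
        for t, p in zip(t_row, p_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: rows are patients, columns are diseases (1 = present).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
macro = macro_f1(y_true, y_pred)
micro = micro_f1(y_true, y_pred)
```

Macro averaging treats rare and common diseases alike, while micro averaging is dominated by the most frequent labels, so the two can differ noticeably on imbalanced EHR data.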

Critical Analysis

The authors acknowledge several limitations of their work:

The performance of MEDFuse is still dependent on the quality and completeness of the underlying EHR data, which can vary across healthcare systems and institutions.
The masked lab-test modeling approach may not fully capture complex dependencies between different lab tests, and further research is needed to improve the imputation accuracy.
The integration of large language models, while beneficial, could also introduce biases and artifacts present in the pre-trained models, which may affect the final performance.

Additionally, the authors do not provide a detailed analysis of the computational costs and model complexity of MEDFuse, which could be important considerations for real-world deployment in clinical settings.

Further research could explore ways to make the system more robust to noisy or incomplete EHR data, as well as investigate methods to better understand and mitigate potential biases introduced by the language models.

Conclusion

The MEDFuse framework combines masked lab-test modeling with LLM-based text features to fuse multimodal EHR data for computer-aided diagnosis. The reported results suggest that this combination can support more accurate and reliable decision support systems for healthcare professionals.

The improvements shown by MEDFuse over existing methods suggest it could have a meaningful impact on improving patient outcomes and supporting the transition towards more personalized and precision-driven healthcare. However, further research is needed to address the limitations and ensure the long-term robustness and reliability of such systems in real-world clinical settings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MEDFuse: Multimodal EHR Data Fusion with Masked Lab-Test Modeling and Large Language Models

Thao Minh Nguyen Phan, Cong-Tinh Dao, Chenwei Wu, Jian-Zhe Wang, Shun Liu, Jun-En Ding, David Restrepo, Feng Liu, Fang-Ming Hung, Wen-Chih Peng

Electronic health records (EHRs) are multimodal by nature, consisting of structured tabular features like lab tests and unstructured clinical notes. In real-life clinical practice, doctors use complementary multimodal EHR data sources to get a clearer picture of patients' health and support clinical decision-making. However, most EHR predictive models do not reflect these procedures, as they either focus on a single modality or overlook the inter-modality interactions/redundancy. In this work, we propose MEDFuse, a Multimodal EHR Data Fusion framework that incorporates masked lab-test modeling and large language models (LLMs) to effectively integrate structured and unstructured medical data. MEDFuse leverages multimodal embeddings extracted from two sources: LLMs fine-tuned on free clinical text and masked tabular transformers trained on structured lab test results. We design a disentangled transformer module, optimized by a mutual information loss to 1) decouple modality-specific and modality-shared information and 2) extract useful joint representation from the noise and redundancy present in clinical notes. Through comprehensive validation on the public MIMIC-III dataset and the in-house FEMH dataset, MEDFuse demonstrates great potential in advancing clinical predictions, achieving over 90% F1 score in the 10-disease multi-label classification task.

7/18/2024

Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision

Yingbo Ma, Suraj Kolla, Zhenhong Hu, Dhruv Kaliraman, Victoria Nolan, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Jeremy A. Balch, Tyler J. Loftus, Parisa Rashidi, Azra Bihorac, Benjamin Shickel

Modern electronic health records (EHRs) hold immense promise in tracking personalized patient health trajectories through sequential deep learning, owing to their extensive breadth, scale, and temporal granularity. Nonetheless, how to effectively leverage multiple modalities from EHRs poses significant challenges, given its complex characteristics such as high dimensionality, multimodality, sparsity, varied recording frequencies, and temporal irregularities. To this end, this paper introduces a novel multimodal contrastive learning framework, specifically focusing on medical time series and clinical notes. To tackle the challenge of sparsity and irregular time intervals in medical time series, the framework integrates temporal cross-attention transformers with a dynamic embedding and tokenization scheme for learning multimodal feature representations. To harness the interconnected relationships between medical time series and clinical notes, the framework equips a global contrastive loss, aligning a patient's multimodal feature representations with the corresponding discharge summaries. Since discharge summaries uniquely pertain to individual patients and represent a holistic view of the patient's hospital stay, machine learning models are led to learn discriminative multimodal features via global contrasting. Extensive experiments with a real-world EHR dataset demonstrated that our framework outperformed state-of-the-art approaches on the exemplar task of predicting the occurrence of nine postoperative complications for more than 120,000 major inpatient surgeries using multimodal data from UF health system split among three hospitals (UF Health Gainesville, UF Health Jacksonville, and UF Health Jacksonville-North).

4/11/2024

Large Language Multimodal Models for 5-Year Chronic Disease Cohort Prediction Using EHR Data

Jun-En Ding, Phan Nguyen Minh Thao, Wen-Chih Peng, Jian-Zhe Wang, Chun-Cheng Chug, Min-Chen Hsieh, Yun-Chien Tseng, Ling Chen, Dongsheng Luo, Chi-Te Wang, Pei-fu Chen, Feng Liu, Fang-Ming Hung

Chronic diseases such as diabetes are the leading causes of morbidity and mortality worldwide. Numerous research studies have been attempted with various deep learning models in diagnosis. However, most previous studies had certain limitations, including using publicly available datasets (e.g. MIMIC), and imbalanced data. In this study, we collected five-year electronic health records (EHRs) from the Taiwan hospital database, including 1,420,596 clinical notes, 387,392 laboratory test results, and more than 1,505 laboratory test items, focusing on research pre-training large language models. We proposed a novel Large Language Multimodal Models (LLMMs) framework incorporating multimodal data from clinical notes and laboratory test results for the prediction of chronic disease risk. Our method combined a text embedding encoder and multi-head attention layer to learn laboratory test values, utilizing a deep neural network (DNN) module to merge blood features with chronic disease semantics into a latent space. In our experiments, we observe that clinicalBERT and PubMed-BERT, when combined with attention fusion, can achieve an accuracy of 73% in multiclass chronic diseases and diabetes prediction. By transforming laboratory test values into textual descriptions and employing the Flan T-5 model, we achieved a 76% Area Under the ROC Curve (AUROC), demonstrating the effectiveness of leveraging numerical text data for training and inference in language models. This approach significantly improves the accuracy of early-stage diabetes prediction.

9/2/2024

EMERGE: Integrating RAG for Improved Multimodal EHR Predictive Modeling

Yinghao Zhu, Changyu Ren, Zixiang Wang, Xiaochen Zheng, Shiyun Xie, Junlan Feng, Xi Zhu, Zhoujun Li, Liantao Ma, Chengwei Pan

The integration of multimodal Electronic Health Records (EHR) data has notably advanced clinical predictive capabilities. However, current models that utilize clinical notes and multivariate time-series EHR data often lack the necessary medical context for precise clinical tasks. Previous methods using knowledge graphs (KGs) primarily focus on structured knowledge extraction. To address this, we propose EMERGE, a Retrieval-Augmented Generation (RAG) driven framework aimed at enhancing multimodal EHR predictive modeling. Our approach extracts entities from both time-series data and clinical notes by prompting Large Language Models (LLMs) and aligns them with professional PrimeKG to ensure consistency. Beyond triplet relationships, we include entities' definitions and descriptions to provide richer semantics. The extracted knowledge is then used to generate task-relevant summaries of patients' health statuses. These summaries are fused with other modalities utilizing an adaptive multimodal fusion network with cross-attention. Extensive experiments on the MIMIC-III and MIMIC-IV datasets for in-hospital mortality and 30-day readmission tasks demonstrate the superior performance of the EMERGE framework compared to baseline models. Comprehensive ablation studies and analyses underscore the efficacy of each designed module and the framework's robustness to data sparsity. EMERGE significantly enhances the use of multimodal EHR data in healthcare, bridging the gap with nuanced medical contexts crucial for informed clinical predictions.

6/4/2024