Temporal Cross-Attention for Dynamic Embedding and Tokenization of Multimodal Electronic Health Records

arXiv:2403.04012

Published 4/3/2024 by Yingbo Ma, Suraj Kolla, Dhruv Kaliraman, Victoria Nolan, Zhenhong Hu, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Tyler J. Loftus and 3 others

cs.LG

Abstract

The breadth, scale, and temporal granularity of modern electronic health record (EHR) systems offer great potential for estimating personalized and contextual patient health trajectories using sequential deep learning. However, learning useful representations of EHR data is challenging due to its high dimensionality, sparsity, multimodality, irregular and variable-specific recording frequency, and timestamp duplication when multiple measurements are recorded simultaneously. Although recent efforts to fuse structured EHR and unstructured clinical notes suggest the potential for more accurate prediction of clinical outcomes, less focus has been placed on EHR embedding approaches that directly address temporal EHR challenges by learning time-aware representations from multimodal patient time series. In this paper, we introduce a dynamic embedding and tokenization framework for precise representation of multimodal clinical time series that combines novel methods for encoding time and sequential position with temporal cross-attention. Our embedding and tokenization framework, when integrated into a multitask transformer classifier with sliding window attention, outperformed baseline approaches on the exemplar task of predicting the occurrence of nine postoperative complications for more than 120,000 major inpatient surgeries using multimodal data from three hospitals and two academic health centers in the United States.

Overview

This paper presents a novel approach for embedding and tokenizing multimodal electronic health records (EHRs) that captures the dynamic temporal relationships between different data modalities.
The proposed method, called Temporal Cross-Attention, uses a transformer-based architecture to learn representations that account for the evolving and interdependent nature of clinical time series data.
The authors demonstrate the effectiveness of their approach on the exemplar task of predicting nine postoperative complications across more than 120,000 major inpatient surgeries, where it outperformed baseline approaches.

Plain English Explanation

Electronic health records (EHRs) contain a wealth of information about patient health, including medical history, test results, and treatment plans. However, effectively utilizing this data can be challenging due to its complex and dynamic nature.

The researchers in this study developed a new way to represent the information in EHRs that better captures the evolving relationships between different types of medical data over time. Their approach, called Temporal Cross-Attention, uses a neural network architecture inspired by transformers to learn these dynamic representations.

Transformers are a type of machine learning model that is particularly good at understanding the context and relationships within sequential data, like the information found in EHRs. By incorporating temporal cross-attention, the model can learn how different medical factors, such as symptoms, lab results, and medications, interact and change over the course of a patient's care.

The researchers report that, when built into a multitask transformer classifier, their approach outperforms baseline methods at predicting nine postoperative complications across more than 120,000 major inpatient surgeries. This suggests that it captures the nuanced, time-varying patterns in EHR data better than the baselines it was compared against.

Overall, this work represents an important advancement in the field of medical machine learning, as it provides a more powerful and flexible way to leverage the rich information contained within electronic health records.

Technical Explanation

The key innovation of this paper is the Temporal Cross-Attention (TCA) module, which is used to encode the dynamic temporal relationships in multimodal EHR data. The TCA module consists of two main components:

Temporal Attention: This component learns to attend to relevant past time steps when representing the current state of a clinical time series, capturing the evolving nature of the data.
Cross-Attention: This component learns to attend to related modalities (e.g., lab tests, medications, diagnoses) when representing a particular data type, capturing the interdependencies between different aspects of a patient's health.
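
Both components can be expressed with the same scaled dot-product attention, differing only in where the queries, keys, and values come from and which positions are masked. The following is a minimal sketch in plain Python, not the authors' implementation: the `attention` helper, the `allowed` callback, and the toy `labs` and `meds` vectors are hypothetical names chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (-inf entries get weight 0)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values, allowed=None):
    """Scaled dot-product attention over lists of equal-width vectors.

    allowed(i, j) -> bool decides whether query i may attend to key j.
    Assumes every query has at least one allowed key.
    Returns (outputs, weights).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if allowed is None or allowed(i, j):
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
            else:
                scores.append(float("-inf"))  # masked position
        w = softmax(scores)
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Toy embeddings (hypothetical): three lab time steps, two medication events.
labs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
meds = [[1.0, 0.0], [0.0, 1.0]]

# Temporal attention: each lab step attends only to itself and earlier steps.
temporal_out, temporal_w = attention(labs, labs, labs,
                                     allowed=lambda i, j: j <= i)

# Cross-attention: lab steps (queries) attend to medication events (keys/values).
cross_out, cross_w = attention(labs, meds, meds)
```

In this toy setup the first lab step can only attend to itself under the causal mask, while the third lab step, which is equally similar to both medication vectors, splits its cross-attention weight evenly between them.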

The TCA module is integrated into a transformer-based architecture that jointly embeds and tokenizes the multimodal EHR data. This allows the model to learn representations that are sensitive to both the temporal dynamics and cross-modal relationships present in the data.
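
To make the idea of tokenizing irregular clinical time series concrete, here is a rough sketch under stated assumptions. It flattens per-variable observations into one time-sorted token sequence and attaches a continuous-time encoding to each token. The standard sinusoidal encoding is used as a stand-in; the paper's own time and position encodings are described as novel and are not reproduced here. The function names, the hours-since-admission unit, and the sample values are all hypothetical.

```python
import math


def tokenize(series):
    """Flatten {variable: [(hours, value), ...]} into one time-sorted token list.

    Tokens that share a timestamp keep that timestamp but receive distinct
    sequential positions (ties are broken by variable name).
    """
    events = [(t, name, v) for name, obs in series.items() for t, v in obs]
    events.sort(key=lambda e: (e[0], e[1]))
    return [
        {"hours": t, "variable": name, "value": v, "position": i}
        for i, (t, name, v) in enumerate(events)
    ]


def time_encoding(hours, dim=4, max_period=10000.0):
    """Standard sinusoidal encoding applied to a continuous timestamp (a stand-in)."""
    enc = []
    for k in range(dim // 2):
        freq = 1.0 / (max_period ** (2 * k / dim))
        enc += [math.sin(hours * freq), math.cos(hours * freq)]
    return enc


# Hypothetical patient: a frequently sampled vital sign and a sparse lab value
# that happens to share a timestamp with one of the vital-sign readings.
series = {
    "heart_rate": [(0.0, 82.0), (0.5, 90.0), (1.0, 88.0)],
    "lactate": [(0.5, 2.1)],
}

tokens = tokenize(series)
encodings = [time_encoding(tok["hours"]) for tok in tokens]
```

The two measurements recorded at 0.5 hours end up with identical time encodings but different positions, which is one simple way to represent the timestamp duplication the abstract mentions.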

The authors evaluate their Temporal Cross-Attention model on the exemplar task of predicting the occurrence of nine postoperative complications, using multimodal data from more than 120,000 major inpatient surgeries. According to the abstract, the framework outperformed baseline approaches when integrated into a multitask transformer classifier with sliding window attention.
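
The abstract describes the classifier as multitask, predicting nine postoperative complications at once. One common way to train such a model is one sigmoid output per outcome with binary cross-entropy averaged over tasks. The sketch below illustrates that generic setup only; the loss choice, the `multitask_bce` helper, and the example logits and labels are assumptions, not details taken from the paper.

```python
import math

NUM_TASKS = 9  # nine postoperative complications, per the abstract


def sigmoid(x):
    """Logistic function; fine for the moderate logits used in this example."""
    return 1.0 / (1.0 + math.exp(-x))


def multitask_bce(logits, labels, eps=1e-12):
    """Mean binary cross-entropy across tasks for a single patient."""
    losses = []
    for z, y in zip(logits, labels):
        p = sigmoid(z)
        losses.append(-(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)))
    return sum(losses) / len(losses)


# Hypothetical patient who experienced complications 0 and 4.
labels = [1, 0, 0, 0, 1, 0, 0, 0, 0]

# An uninformed model (all logits 0, i.e. probability 0.5 for every task).
uninformed_loss = multitask_bce([0.0] * NUM_TASKS, labels)

# A model that leans the right way on every task.
confident_logits = [5.0 if y == 1 else -5.0 for y in labels]
confident_loss = multitask_bce(confident_logits, labels)
```

Sharing one encoder across all nine outputs lets the tasks benefit from a common patient representation, while each output keeps its own decision threshold.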

Critical Analysis

The authors acknowledge several limitations of their work. First, although the data come from three hospitals and two academic health centers, all are in the United States and the evaluation centers on a single exemplar task, so the generalizability of their findings to other healthcare settings and prediction tasks remains to be seen. Additionally, the interpretability of the learned representations is not extensively explored, which could be an important consideration for real-world clinical applications.

Another potential concern is the computational complexity of the TCA module, which may limit its scalability to very large-scale EHR datasets. The authors do not provide a detailed analysis of the model's runtime or memory requirements, which would be helpful for assessing its practical feasibility.
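
A back-of-the-envelope count helps frame the scalability question. Full self-attention scores every pair of tokens, so cost grows quadratically with sequence length, whereas the sliding window attention mentioned in the abstract restricts each token to a local neighbourhood. The sketch below counts query-key pairs for both cases; the sequence length and window size are hypothetical values, not figures from the paper.

```python
def full_attention_pairs(n):
    """Every token attends to every token: n * n score computations."""
    return n * n


def sliding_window_pairs(n, window):
    """Each token attends to tokens within `window` positions on either side."""
    return sum(
        min(n - 1, i + window) - max(0, i - window) + 1
        for i in range(n)
    )


n, window = 4096, 128  # hypothetical sequence length and one-sided window size
full = full_attention_pairs(n)
local = sliding_window_pairs(n, window)
print(full, local, round(full / local, 1))
```

With these illustrative numbers the windowed variant computes roughly one sixteenth as many attention scores, and the gap widens as sequences get longer because the windowed count grows linearly in `n`.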

Despite these limitations, the Temporal Cross-Attention approach represents a significant advancement in the field of medical machine learning. By explicitly modeling the dynamic and interdependent nature of EHR data, the authors have developed a more powerful and nuanced way to leverage this critical healthcare information.

Conclusion

This paper introduces a novel transformer-based architecture, called Temporal Cross-Attention, that is designed to effectively encode the dynamic and multimodal nature of electronic health records. The authors demonstrate the effectiveness of their approach on the task of predicting nine postoperative complications, showing that it outperforms baseline approaches.

While the research has some limitations, it represents an important step forward in the field of medical machine learning. By capturing the evolving relationships between different aspects of patient health, the Temporal Cross-Attention model has the potential to unlock new insights and improve clinical decision-making. As healthcare systems continue to generate increasingly rich and complex data, approaches like this will become increasingly valuable for extracting meaningful information and driving progress in the field.

Related Papers

Global Contrastive Training for Multimodal Electronic Health Records with Language Supervision

Yingbo Ma, Suraj Kolla, Zhenhong Hu, Dhruv Kaliraman, Victoria Nolan, Ziyuan Guan, Yuanfang Ren, Brooke Armfield, Tezcan Ozrazgat-Baslanti, Jeremy A. Balch, Tyler J. Loftus, Parisa Rashidi, Azra Bihorac, Benjamin Shickel

Modern electronic health records (EHRs) hold immense promise in tracking personalized patient health trajectories through sequential deep learning, owing to their extensive breadth, scale, and temporal granularity. Nonetheless, how to effectively leverage multiple modalities from EHRs poses significant challenges, given their complex characteristics such as high dimensionality, multimodality, sparsity, varied recording frequencies, and temporal irregularities. To this end, this paper introduces a novel multimodal contrastive learning framework, specifically focusing on medical time series and clinical notes. To tackle the challenge of sparsity and irregular time intervals in medical time series, the framework integrates temporal cross-attention transformers with a dynamic embedding and tokenization scheme for learning multimodal feature representations. To harness the interconnected relationships between medical time series and clinical notes, the framework employs a global contrastive loss, aligning a patient's multimodal feature representations with the corresponding discharge summaries. Since discharge summaries uniquely pertain to individual patients and represent a holistic view of the patient's hospital stay, machine learning models are led to learn discriminative multimodal features via global contrasting. Extensive experiments with a real-world EHR dataset demonstrated that our framework outperformed state-of-the-art approaches on the exemplar task of predicting the occurrence of nine postoperative complications for more than 120,000 major inpatient surgeries using multimodal data from the UF Health system split among three hospitals (UF Health Gainesville, UF Health Jacksonville, and UF Health Jacksonville-North).

4/11/2024

cs.LG cs.CL

Time-aware Heterogeneous Graph Transformer with Adaptive Attention Merging for Health Event Prediction

Shibo Li, Hengliang Cheng, Weihua Li

The widespread application of Electronic Health Records (EHR) data in the medical field has led to early successes in disease risk prediction using deep learning methods. These methods typically require extensive data for training due to their large parameter sets. However, existing works do not exploit the full potential of EHR data. A significant challenge arises from the infrequent occurrence of many medical codes within EHR data, limiting their clinical applicability. Current research is often lacking in three critical areas: 1) incorporating disease domain knowledge; 2) heterogeneously learning disease representations with rich meanings; 3) capturing the temporal dynamics of disease progression. To overcome these limitations, we introduce a novel heterogeneous graph learning model designed to assimilate disease domain knowledge and elucidate the intricate relationships between drugs and diseases. This model innovatively incorporates temporal data into visit-level embeddings and leverages a time-aware transformer alongside an adaptive attention mechanism to produce patient representations. When evaluated on two healthcare datasets, our approach demonstrated notable enhancements in both prediction accuracy and interpretability over existing methodologies, signifying a substantial advancement towards personalized and proactive healthcare management.

5/13/2024

cs.LG

Predictive Modeling with Temporal Graphical Representation on Electronic Health Records

Jiayuan Chen, Changchang Yin, Yuanlong Wang, Ping Zhang

Deep learning-based predictive models, leveraging Electronic Health Records (EHR), are receiving increasing attention in healthcare. An effective representation of a patient's EHR should hierarchically encompass both the temporal relationships between historical visits and medical events, and the inherent structural information within these elements. Existing patient representation methods can be roughly categorized into sequential representation and graphical representation. The sequential representation methods focus only on the temporal relationships among longitudinal visits. On the other hand, the graphical representation approaches, while adept at extracting the graph-structured relationships between various medical events, fall short in effectively integrating temporal information. To capture both types of information, we model a patient's EHR as a novel temporal heterogeneous graph. This graph includes historical visit nodes and medical event nodes. It propagates structured information from medical event nodes to visit nodes and utilizes time-aware visit nodes to capture changes in the patient's health status. Furthermore, we introduce a novel temporal graph transformer (TRANS) that integrates temporal edge features, global positional encoding, and local structural encoding into heterogeneous graph convolution, capturing both temporal and structural information. We validate the effectiveness of TRANS through extensive experiments on three real-world datasets. The results show that our proposed approach achieves state-of-the-art performance.

5/8/2024

cs.LG cs.AI

TA-RNN: an Attention-based Time-aware Recurrent Neural Network Architecture for Electronic Health Records

Mohammad Al Olaimat (for the Alzheimer's Disease Neuroimaging Initiative), Serdar Bozdag (for the Alzheimer's Disease Neuroimaging Initiative)

Motivation: Electronic Health Records (EHR) represent a comprehensive resource of a patient's medical history. EHR are essential for utilizing advanced technologies such as deep learning (DL), enabling healthcare providers to analyze extensive data, extract valuable insights, and make precise and data-driven clinical decisions. DL methods such as Recurrent Neural Networks (RNN) have been utilized to analyze EHR to model disease progression and predict diagnosis. However, these methods do not address some inherent irregularities in EHR data such as irregular time intervals between clinical visits. Furthermore, most DL models are not interpretable. In this study, we propose two interpretable DL architectures based on RNN, namely Time-Aware RNN (TA-RNN) and TA-RNN-Autoencoder (TA-RNN-AE), to predict a patient's clinical outcome in EHR at the next visit and multiple visits ahead, respectively. To mitigate the impact of irregular time intervals, we propose incorporating time embedding of the elapsed times between visits. For interpretability, we propose employing a dual-level attention mechanism that operates between visits and features within each visit. Results: The results of the experiments conducted on the Alzheimer's Disease Neuroimaging Initiative (ADNI) and National Alzheimer's Coordinating Center (NACC) datasets indicated superior performance of the proposed models for predicting Alzheimer's Disease (AD) compared to state-of-the-art and baseline approaches based on F2 and sensitivity. Additionally, TA-RNN showed superior performance on the Medical Information Mart for Intensive Care (MIMIC-III) dataset for mortality prediction. In our ablation study, we observed enhanced predictive performance by incorporating time embedding and attention mechanisms. Finally, investigating attention weights helped identify influential visits and features in predictions.

4/5/2024

cs.LG cs.AI