Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes

Read original: arXiv:2405.03327 - Published 5/7/2024 by Xiaochen Zheng, Manuel Schurch, Xingyu Chen, Maria Angeliki Komninou, Reto Schupbach, Ahmed Allam, Jan Bartussek, Michael Krauthammer

Overview

Identifying distinct phenotypes (sets of observable characteristics) within complex diseases or syndromes is crucial for precision medicine, which aims to tailor healthcare to individual patient needs.
Postoperative delirium (POD) is a complex neuropsychiatric condition with significant variability in its clinical presentation and underlying causes.
The researchers hypothesize that POD comprises multiple distinct phenotypes that are not directly observable in clinical practice.
Uncovering these phenotypes could enhance our understanding of POD's pathogenesis and enable the development of more targeted prevention and treatment strategies.

Plain English Explanation

The paper explores a novel approach to uncover potential phenotypes within the complex condition of postoperative delirium (POD). POD is a neurological and psychiatric disorder that can occur after surgery, and it can manifest in different ways across patients.

The researchers believe that POD may actually consist of several distinct subtypes or "phenotypes" that are not easily observed in a clinical setting. By identifying these underlying phenotypes, the hope is to gain a better understanding of how POD develops and what causes it. This could then lead to more personalized and effective prevention and treatment strategies for patients.

The researchers combine two machine learning steps. First, a supervised model predicts each patient's risk of developing POD. Then a clustering algorithm groups patients by how the model explains those predictions, in search of hidden phenotypes. They first test this approach on synthetic data, where they can simulate patient cohorts with predefined phenotypes. This lets them check whether the method recovers the known underlying phenotypes.

The researchers then apply their approach to a real-world dataset of elderly surgical patients, demonstrating its ability to uncover clinically relevant subtypes of POD. This paves the way for more personalized and targeted management of this complex disorder.

Technical Explanation

The researchers propose an approach that integrates supervised machine learning for personalized POD risk prediction with unsupervised clustering techniques to uncover potential POD phenotypes.

First, they use synthetic data to simulate patient cohorts with predefined phenotypes based on distinct sets of informative features. This allows them to test their method in a controlled setting where the true underlying phenotypes are known.
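A cohort of this kind can be sketched in a few lines. This is a hypothetical generator, not the one used in the paper. `simulate_cohort` and its parameters are illustrative names. Each phenotype is defined by its own block of informative features, which is shifted upward. Every phenotype member is labelled a POD case, and controls have no elevated block.

```python
import numpy as np


def simulate_cohort(n_per_phenotype=200, n_controls=600, n_phenotypes=3,
                    features_per_phenotype=2, n_noise=20, shift=2.0, seed=0):
    """Hypothetical synthetic cohort with predefined phenotypes.

    Phenotype k raises its own block of informative features by `shift`;
    all other values are standard-normal noise.
    Returns X (patients x features), y (1 = POD case, 0 = control) and
    phenotype labels (0..n_phenotypes-1 for cases, -1 for controls).
    """
    rng = np.random.default_rng(seed)
    n_features = n_phenotypes * features_per_phenotype + n_noise

    X_parts, labels = [], []
    for k in range(n_phenotypes):
        X_k = rng.normal(size=(n_per_phenotype, n_features))
        start = k * features_per_phenotype
        # Phenotype-specific signal: only this block is elevated for phenotype k.
        X_k[:, start:start + features_per_phenotype] += shift
        X_parts.append(X_k)
        labels.append(np.full(n_per_phenotype, k))

    # Controls: no informative block is elevated.
    X_parts.append(rng.normal(size=(n_controls, n_features)))
    labels.append(np.full(n_controls, -1))

    X = np.vstack(X_parts)
    phenotype = np.concatenate(labels)
    # In this sketch, every phenotype member is a POD case.
    y = (phenotype >= 0).astype(int)
    return X, y, phenotype


X, y, phenotype = simulate_cohort()
print(X.shape, y.sum())  # (1200, 26) 600
```

Because the phenotype labels are known by construction, any clustering of the cases can later be scored against them.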

The researchers train a predictive model on the synthetic data and then use SHAP (SHapley Additive exPlanations) to compute, for each patient, how much every feature contributed to that patient's predicted risk. They represent each patient by this vector of SHAP values and cluster patients in this SHAP feature importance space. This recovers the true underlying phenotypes and outperforms clustering in the raw feature space.
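The comparison between SHAP-space and raw-space clustering can be illustrated with a small sketch. Everything here is an assumption for illustration, not the authors' code:

- a toy cohort;
- a logistic-regression risk model;
- exact SHAP values computed in closed form instead of with the `shap` library. For a linear model with independent features, a feature's SHAP value is its coefficient times its deviation from the mean.

K-means then clusters the POD cases in each space, and the adjusted Rand index (ARI) scores agreement with the true phenotypes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)

# Hypothetical toy cohort: 3 phenotypes x 200 cases plus 200 controls.
# Features 0-5 are informative (phenotype k elevates features 2k and 2k+1);
# 20 noise features are on a larger scale, mimicking mixed clinical units.
n, noise_scale = 200, 3.0
X_parts, labels = [], []
for k in range(4):  # k == 3 is the control group
    X_k = np.hstack([rng.normal(size=(n, 6)),
                     noise_scale * rng.normal(size=(n, 20))])
    if k < 3:
        X_k[:, 2 * k:2 * k + 2] += 2.0
    X_parts.append(X_k)
    labels.append(np.full(n, k if k < 3 else -1))
X, phenotype = np.vstack(X_parts), np.concatenate(labels)
y = (phenotype >= 0).astype(int)

# 1) Supervised step: a POD risk model.
model = LogisticRegression(max_iter=1000).fit(X, y)


def linear_shap(coef, X, background):
    """Exact SHAP values (log-odds scale) for a linear model, assuming
    independent features: phi_ij = w_j * (x_ij - mean_j)."""
    return coef * (X - background.mean(axis=0))


# 2) Explanation step: one SHAP vector per patient.
phi = linear_shap(model.coef_[0], X, background=X)

# 3) Unsupervised step: cluster only the POD cases, in each space.
cases = y == 1
kmeans_args = dict(n_clusters=3, n_init=10, random_state=0)
raw_labels = KMeans(**kmeans_args).fit_predict(X[cases])
shap_labels = KMeans(**kmeans_args).fit_predict(phi[cases])

ari_raw = adjusted_rand_score(phenotype[cases], raw_labels)
ari_shap = adjusted_rand_score(phenotype[cases], shap_labels)
print(f"ARI, raw feature space: {ari_raw:.2f}")
print(f"ARI, SHAP space:        {ari_shap:.2f}")
```

The representation is the design choice that matters here. In the raw space, the large-scale noise features dominate Euclidean distances. SHAP values instead rescale each feature by how much the risk model relies on it, so uninformative features contribute little. Real clinical risk models are usually nonlinear. In that case SHAP values would be estimated with a model-appropriate explainer rather than computed in closed form.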

The researchers then apply their approach to a real-world dataset of elderly surgical patients. The results demonstrate the utility of this method in uncovering clinically relevant subtypes of complex disorders like POD, paving the way for more personalized and targeted treatment strategies.

Critical Analysis

The paper presents a novel and promising approach for uncovering hidden phenotypes within complex clinical conditions like postoperative delirium. The use of synthetic data to validate the method in a controlled setting is a particular strength, as it allows the researchers to confirm that the clustering algorithm can accurately recover known underlying phenotypes.

However, the paper does not provide extensive details on the quality or representativeness of the real-world dataset used in the case study. It would be helpful to have more information on the characteristics of the patient cohort, the completeness of the data, and any potential biases or limitations.

Additionally, the paper does not discuss the clinical implications or practical applications of the identified POD phenotypes. More research would be needed to understand how these insights could be translated into improved prevention, diagnosis, or treatment strategies for patients.

Overall, this research represents an important step towards a better understanding of the heterogeneity within complex clinical conditions. By combining predictive modeling and unsupervised clustering, the authors demonstrate a valuable approach that could be applied to a wide range of medical domains.

Conclusion

This paper proposes a novel approach that integrates supervised machine learning and unsupervised clustering to uncover potential phenotypes within the complex condition of postoperative delirium (POD). The researchers first validate their method using synthetic data, where they can confirm its ability to recover known underlying phenotypes. They then apply the approach to real-world data from a cohort of elderly surgical patients, demonstrating its utility in uncovering clinically relevant subtypes of POD.

By identifying distinct POD phenotypes, this research represents an important step towards a better understanding of the pathogenesis of this complex disorder. The insights gained could ultimately facilitate the development of more personalized and targeted prevention and treatment strategies, aligning with the goals of precision medicine. Further research is needed to explore the practical applications of these findings and their broader implications for the management of complex clinical conditions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Clustering of Disease Trajectories with Explainable Machine Learning: A Case Study on Postoperative Delirium Phenotypes

Xiaochen Zheng, Manuel Schurch, Xingyu Chen, Maria Angeliki Komninou, Reto Schupbach, Ahmed Allam, Jan Bartussek, Michael Krauthammer

The identification of phenotypes within complex diseases or syndromes is a fundamental component of precision medicine, which aims to adapt healthcare to individual patient characteristics. Postoperative delirium (POD) is a complex neuropsychiatric condition with significant heterogeneity in its clinical manifestations and underlying pathophysiology. We hypothesize that POD comprises several distinct phenotypes, which cannot be directly observed in clinical practice. Identifying these phenotypes could enhance our understanding of POD pathogenesis and facilitate the development of targeted prevention and treatment strategies. In this paper, we propose an approach that combines supervised machine learning for personalized POD risk prediction with unsupervised clustering techniques to uncover potential POD phenotypes. We first demonstrate our approach using synthetic data, where we simulate patient cohorts with predefined phenotypes based on distinct sets of informative features. We aim to mimic any clinical disease with our synthetic data generation method. By training a predictive model and applying SHAP, we show that clustering patients in the SHAP feature importance space successfully recovers the true underlying phenotypes, outperforming clustering in the raw feature space. We then present a case study using real-world data from a cohort of elderly surgical patients. The results showcase the utility of our approach in uncovering clinically relevant subtypes of complex disorders like POD, paving the way for more precise and personalized treatment strategies.

5/7/2024

Discovery of Generalizable TBI Phenotypes Using Multivariate Time-Series Clustering

Hamid Ghaderi, Brandon Foreman, Chandan K. Reddy, Vignesh Subbian

Traumatic Brain Injury (TBI) presents a broad spectrum of clinical presentations and outcomes due to its inherent heterogeneity, leading to diverse recovery trajectories and varied therapeutic responses. While many studies have delved into TBI phenotyping for distinct patient populations, identifying TBI phenotypes that consistently generalize across various settings and populations remains a critical research gap. Our research addresses this by employing multivariate time-series clustering to unveil TBI's dynamic intricacies. Utilizing a self-supervised learning-based approach to clustering multivariate time-Series data with missing values (SLAC-Time), we analyzed both the research-centric TRACK-TBI and the real-world MIMIC-IV datasets. Remarkably, the optimal hyperparameters of SLAC-Time and the ideal number of clusters remained consistent across these datasets, underscoring SLAC-Time's stability across heterogeneous datasets. Our analysis revealed three generalizable TBI phenotypes (α, β, and γ), each exhibiting distinct non-temporal features during emergency department visits, and temporal feature profiles throughout ICU stays. Specifically, phenotype α represents mild TBI with a remarkably consistent clinical presentation. In contrast, phenotype β signifies severe TBI with diverse clinical manifestations, and phenotype γ represents a moderate TBI profile in terms of severity and clinical diversity. Age is a significant determinant of TBI outcomes, with older cohorts recording higher mortality rates. Importantly, while certain features varied by age, the core characteristics of TBI manifestations tied to each phenotype remain consistent across diverse populations.

8/22/2024

Feature importance to explain multimodal prediction models. A clinical use case

Jorn-Jan van de Beld, Shreyasi Pathak, Jeroen Geerdink, Johannes H. Hegeman, Christin Seifert

Surgery to treat elderly hip fracture patients may cause complications that can lead to early mortality. An early warning system for complications could provoke clinicians to monitor high-risk patients more carefully and address potential complications early, or inform the patient. In this work, we develop a multimodal deep-learning model for post-operative mortality prediction using pre-operative and per-operative data from elderly hip fracture patients. Specifically, we include static patient data, hip and chest images before surgery in pre-operative data, vital signals, and medications administered during surgery in per-operative data. We extract features from image modalities using ResNet and from vital signals using LSTM. Explainable model outcomes are essential for clinical applicability, therefore we compute Shapley values to explain the predictions of our multimodal black box model. We find that i) Shapley values can be used to estimate the relative contribution of each modality both locally and globally, and ii) a modified version of the chain rule can be used to propagate Shapley values through a sequence of models supporting interpretable local explanations. Our findings imply that a multimodal combination of black box models can be explained by propagating Shapley values through the model sequence.

4/30/2024

Deep Representation Learning-Based Dynamic Trajectory Phenotyping for Acute Respiratory Failure in Medical Intensive Care Units

Alan Wu, Tilendra Choudhary, Pulakesh Upadhyaya, Ayman Ali, Philip Yang, Rishikesan Kamaleswaran

Sepsis-induced acute respiratory failure (ARF) is a serious complication with a poor prognosis. This paper presents a deep representation learning-based phenotyping method to identify distinct groups of clinical trajectories of septic patients with ARF. For this retrospective study, we created a dataset from electronic medical records (EMR) consisting of data from sepsis patients admitted to medical intensive care units who required at least 24 hours of invasive mechanical ventilation at a quaternary care academic hospital in southeast USA for the years 2016-2021. A total of N=3349 patient encounters were included in this study. Clustering Representation Learning on Incomplete Time Series Data (CRLI) algorithm was applied to a parsimonious set of EMR variables in this data set. To validate the optimal number of clusters, the K-means algorithm was used in conjunction with dynamic time warping. Our model yielded four distinct patient phenotypes that were characterized as liver dysfunction/heterogeneous, hypercapnia, hypoxemia, and multiple organ dysfunction syndrome by a critical care expert. A Kaplan-Meier analysis to compare the 28-day mortality trends exhibited significant differences (p < 0.005) between the four phenotypes. The study demonstrates the utility of our deep representation learning-based approach in unraveling phenotypes that reflect the heterogeneity in sepsis-induced ARF in terms of different mortality outcomes and severity. These phenotypes might reveal important clinical insights into an effective prognosis and tailored treatment strategies.

5/7/2024