Semi-Supervised Generative Models for Disease Trajectories: A Case Study on Systemic Sclerosis

Read original: arXiv:2407.11427 - Published 7/17/2024 by Cécile Trottet, Manuel Schurch, Ahmed Allam, Imon Barua, Liubov Petelytska, Oliver Distler, Anna-Maria Hoffmann-Vold, Michael Krauthammer, the EUSTAR collaborators

Overview

This paper proposes a semi-supervised generative model for modeling disease trajectories, using systemic sclerosis as a case study. The key ideas are:

Developing a framework to jointly model both labeled and unlabeled disease progression data
Leveraging the unlabeled data to improve the model's ability to capture the underlying disease dynamics
Demonstrating the model's effectiveness on a dataset of patients with systemic sclerosis, a chronic autoimmune disease characterized by complex, heterogeneous progression patterns

Plain English Explanation

The researchers wanted to create a model that could better understand and predict how diseases like systemic sclerosis progress over time. Systemic sclerosis is a chronic autoimmune disease that can be very unpredictable, with patients showing a wide range of symptoms and disease progression patterns.

The researchers developed a new type of machine learning model that could use both labeled data (information from patients with known disease progression) and unlabeled data (information from patients where the disease progression is unknown). By combining these two types of data, the model was able to better capture the underlying patterns and dynamics of how the disease evolves.

This is important because it allows doctors and researchers to get a clearer picture of how systemic sclerosis affects different patients, which can help them provide more personalized and effective treatments. The model could also be applied to study the progression of other complex, heterogeneous diseases.

Technical Explanation

The researchers proposed a semi-supervised generative model for modeling disease trajectories. The key technical elements include:

A latent variable model that jointly handles labeled data (disease progression is known) and unlabeled data (disease progression is unknown)
Use of the unlabeled data to better capture the underlying disease dynamics and progression patterns
Application of this framework to a systemic sclerosis dataset, which exhibits complex, heterogeneous progression patterns
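
To make the idea of jointly modeling labeled and unlabeled data more concrete, here is a minimal sketch of a generic semi-supervised, VAE-style loss. It is not the paper's actual objective: the function names, the squared-error reconstruction term, the single binary label, and the weight `alpha` are all assumptions chosen for illustration. The point it shows is that every sample contributes a reconstruction term and a KL term, while only labeled samples add a supervised term.

```python
import math

def gaussian_kl(mu, log_var):
    """KL divergence from N(mu, exp(log_var)) to N(0, 1), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))

def squared_error(x, x_hat):
    """Reconstruction error between observed values x and decoded values x_hat."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))

def binary_cross_entropy(label, logit):
    """Supervised penalty for a binary label predicted from the latent space."""
    p = 1.0 / (1.0 + math.exp(-logit))
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1 - label) * math.log(1 - p + eps))

def semi_supervised_loss(x, x_hat, mu, log_var, label=None, label_logit=None, alpha=1.0):
    """Hypothetical per-sample loss: unlabeled samples skip the supervised term."""
    loss = squared_error(x, x_hat) + gaussian_kl(mu, log_var)
    if label is not None:
        # Assumption: alpha trades off the supervised term against the generative terms.
        loss += alpha * binary_cross_entropy(label, label_logit)
    return loss

# Same visit, once treated as unlabeled and once with a known (hypothetical) label.
unlabeled = semi_supervised_loss([1.0, 2.0], [0.9, 2.2], [0.1], [0.0])
labeled = semi_supervised_loss([1.0, 2.0], [0.9, 2.2], [0.1], [0.0], label=1, label_logit=0.5)
print(unlabeled, labeled)
```

In a real model the reconstruction, latent mean, latent log-variance, and label logit would come from trained encoder and decoder networks. Here they are passed in as plain lists so the structure of the loss is easy to see.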

The model outperformed baseline approaches on tasks such as predicting future disease progression and identifying subgroups of patients with similar trajectories. This suggests that combining labeled and unlabeled data can yield more robust and informative models of complex disease processes.
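
As a rough illustration of what finding patients with similar trajectories can look like once each patient is summarized by a latent vector, the sketch below ranks patients by Euclidean distance in latent space. The patient identifiers, the two-dimensional latent vectors, and the choice of Euclidean distance are all made up for this example and are not taken from the paper.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two latent vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar_patients(latents, query_id, k=2):
    """Return the ids of the k patients whose latent vectors are closest to the query patient."""
    query = latents[query_id]
    distances = [
        (euclidean(query, z), pid)
        for pid, z in latents.items()
        if pid != query_id
    ]
    distances.sort()  # nearest first
    return [pid for _, pid in distances[:k]]

# Hypothetical latent summaries of four patient trajectories.
latents = {
    "patient_a": [0.0, 0.0],
    "patient_b": [0.1, -0.1],
    "patient_c": [2.0, 2.1],
    "patient_d": [0.3, 0.2],
}

neighbours = most_similar_patients(latents, "patient_a", k=2)
print(neighbours)  # ['patient_b', 'patient_d']
```

The same latent vectors could also be passed to a clustering algorithm to group patients into subtypes. Nearest-neighbour lookup is simply the smallest example of reusing the learned representation.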

Critical Analysis

The paper provides a thoughtful approach to leveraging both labeled and unlabeled data for modeling disease trajectories. The use of a semi-supervised generative model is a promising direction, as it allows the model to learn from the broader pool of available data, including cases where the full disease progression is unknown.

However, the paper notes some limitations, such as the relatively small size of the systemic sclerosis dataset used for evaluation. Larger and more diverse datasets would help further validate the model's performance and generalizability.

Additionally, while the paper demonstrates the model's ability to identify subgroups of patients with similar trajectories, more work may be needed to understand the clinical relevance and implications of these subgroups. Collaboration with domain experts could help provide deeper insights in this area.

Conclusion

This research presents a semi-supervised generative model for complex disease trajectories, using systemic sclerosis as a case study. By jointly leveraging labeled and unlabeled data, the model better captures the underlying dynamics of disease progression, which could support clinical decision-making and the development of more personalized treatments.

The findings demonstrate the potential of combining machine learning techniques with domain knowledge to gain a deeper understanding of chronic, heterogeneous diseases. Further research in this direction could lead to significant advancements in the management and care of patients with complex health conditions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Semi-Supervised Generative Models for Disease Trajectories: A Case Study on Systemic Sclerosis

Cécile Trottet, Manuel Schurch, Ahmed Allam, Imon Barua, Liubov Petelytska, Oliver Distler, Anna-Maria Hoffmann-Vold, Michael Krauthammer, the EUSTAR collaborators

We propose a deep generative approach using latent temporal processes for modeling and holistically analyzing complex disease trajectories, with a particular focus on Systemic Sclerosis (SSc). We aim to learn temporal latent representations of the underlying generative process that explain the observed patient disease trajectories in an interpretable and comprehensive way. To enhance the interpretability of these latent temporal processes, we develop a semi-supervised approach for disentangling the latent space using established medical knowledge. By combining the generative approach with medical definitions of different characteristics of SSc, we facilitate the discovery of new aspects of the disease. We show that the learned temporal latent processes can be utilized for further data analysis and clinical hypothesis testing, including finding similar patients and clustering SSc patient trajectories into novel sub-types. Moreover, our method enables personalized online monitoring and prediction of multivariate time series with uncertainty quantification.

7/17/2024

Probabilistic Temporal Prediction of Continuous Disease Trajectories and Treatment Effects Using Neural SDEs

Joshua Durso-Finley, Berardino Barile, Jean-Pierre Falet, Douglas L. Arnold, Nick Pawlowski, Tal Arbel

Personalized medicine based on medical images, including predicting future individualized clinical disease progression and treatment response, would have an enormous impact on healthcare and drug development, particularly for diseases (e.g. multiple sclerosis (MS)) with long term, complex, heterogeneous evolutions and no cure. In this work, we present the first stochastic causal temporal framework to model the continuous temporal evolution of disease progression via Neural Stochastic Differential Equations (NSDE). The proposed causal inference model takes as input the patient's high dimensional images (MRI) and tabular data, and predicts both factual and counterfactual progression trajectories on different treatments in latent space. The NSDE permits the estimation of high-confidence personalized trajectories and treatment effects. Extensive experiments were performed on a large, multi-centre, proprietary dataset of patient 3D MRI and clinical data acquired during several randomized clinical trials for MS treatments. Our results present the first successful uncertainty-based causal Deep Learning (DL) model to: (a) accurately predict future patient MS disability evolution (e.g. EDSS) and treatment effects leveraging baseline MRI, and (b) permit the discovery of subgroups of patients for which the model has high confidence in their response to treatment even in clinical trials which did not reach their clinical endpoints.

6/19/2024

Semi-Supervised Learning for Deep Causal Generative Models

Yasin Ibrahim, Hermione Warr, Konstantinos Kamnitsas

Developing models that are capable of answering questions of the form 'How would x change if y had been z?' is fundamental to advancing medical image analysis. Training causal generative models that address such counterfactual questions, though, currently requires that all relevant variables have been observed and that the corresponding labels are available in the training data. However, clinical data may not have complete records for all patients and state of the art causal generative models are unable to take full advantage of this. We thus develop, for the first time, a semi-supervised deep causal generative model that exploits the causal relationships between variables to maximise the use of all available data. We explore this in the setting where each sample is either fully labelled or fully unlabelled, as well as the more clinically realistic case of having different labels missing for each sample. We leverage techniques from causal inference to infer missing values and subsequently generate realistic counterfactuals, even for samples with incomplete labels.

7/15/2024

ImageFlowNet: Forecasting Multiscale Trajectories of Disease Progression with Irregularly-Sampled Longitudinal Medical Images

Chen Liu, Ke Xu, Liangbo L. Shen, Guillaume Huguet, Zilong Wang, Alexander Tong, Danilo Bzdok, Jay Stewart, Jay C. Wang, Lucian V. Del Priore, Smita Krishnaswamy

The forecasting of disease progression from images is a holy grail for clinical decision making. However, this task is complicated by the inherent high dimensionality, temporal sparsity and sampling irregularity in longitudinal image acquisitions. Existing methods often rely on extracting hand-crafted features and performing time-series analysis in this vector space, leading to a loss of rich spatial information within the images. To overcome these challenges, we introduce ImageFlowNet, a novel framework that learns latent-space flow fields that evolve multiscale representations in joint embedding spaces using neural ODEs and SDEs to model disease progression in the image domain. Notably, ImageFlowNet learns multiscale joint representation spaces by combining cohorts of patients together so that information can be transferred between the patient samples. The dynamics then provide plausible trajectories of progression, with the SDE providing alternative trajectories from the same starting point. We provide theoretical insights that support our formulation of ODEs, and motivate our regularizations involving high-level visual features, latent space organization, and trajectory smoothness. We then demonstrate ImageFlowNet's effectiveness through empirical evaluations on three longitudinal medical image datasets depicting progression in retinal geographic atrophy, multiple sclerosis, and glioblastoma.

7/15/2024