3DTINC: Time-Equivariant Non-Contrastive Learning for Predicting Disease Progression from Longitudinal OCTs

2312.16980

Published 5/14/2024 by Taha Emre, Arunava Chakravarty, Antoine Rivail, Dmitrii Lachinov, Oliver Leingang, Sophie Riedl, Julia Mai, Hendrik P. N. Scholl, Sobha Sivaprasad, Daniel Rueckert and 3 others

cs.CV cs.LG

✨

Abstract

Self-supervised learning (SSL) has emerged as a powerful technique for improving the efficiency and effectiveness of deep learning models. Contrastive methods are a prominent family of SSL that extract similar representations of two augmented views of an image while pushing away others in the representation space as negatives. However, the state-of-the-art contrastive methods require large batch sizes and augmentations designed for natural images that are impractical for 3D medical images. To address these limitations, we propose a new longitudinal SSL method, 3DTINC, based on non-contrastive learning. It is designed to learn perturbation-invariant features for 3D optical coherence tomography (OCT) volumes, using augmentations specifically designed for OCT. We introduce a new non-contrastive similarity loss term that learns temporal information implicitly from intra-patient scans acquired at different times. Our experiments show that this temporal information is crucial for predicting progression of retinal diseases, such as age-related macular degeneration (AMD). After pretraining with 3DTINC, we evaluated the learned representations and the prognostic models on two large-scale longitudinal datasets of retinal OCTs where we predict the conversion to wet-AMD within a six months interval. Our results demonstrate that each component of our contributions is crucial for learning meaningful representations useful in predicting disease progression from longitudinal volumetric scans.

Create account to get full access

Overview

This paper proposes a new self-supervised learning (SSL) method called 3DTINC for learning representations from 3D optical coherence tomography (OCT) volumes of the retina.
Existing contrastive SSL methods require large batch sizes and augmentations designed for natural images, which are impractical for 3D medical images like OCT.
3DTINC uses a non-contrastive similarity loss to learn perturbation-invariant features from intra-patient OCT scans acquired at different timepoints, capturing temporal information crucial for predicting disease progression.
The authors evaluate the learned representations on two large-scale longitudinal datasets for predicting conversion to wet age-related macular degeneration (AMD), demonstrating the importance of their contributions.

Plain English Explanation

Self-supervised learning is a powerful technique for improving the performance of deep learning models, especially when labeled data is scarce. Contrastive methods are a prominent family of SSL that learn similar representations for two modified versions of an image, while pushing away representations of other images as negatives.

However, existing contrastive SSL methods have limitations when applied to 3D medical images like optical coherence tomography (OCT) of the retina. They require large batches of images and data augmentations designed for natural images, which are not suitable for the 3D structure and noise characteristics of OCT scans.

To address these challenges, the researchers propose a new SSL method called 3DTINC that uses a non-contrastive approach. Instead of pushing away negative examples, 3DTINC learns perturbation-invariant features by exploiting the temporal information inherent in longitudinal OCT scans of the same patient acquired at different timepoints.

The key insight is that the changes in a patient's retinal structure over time, even if subtle, contain valuable information about the progression of diseases like age-related macular degeneration (AMD). By learning to extract features that are consistent across these intra-patient scans, 3DTINC can capture this temporal information, which is crucial for predicting disease progression.

Technical Explanation

The 3DTINC method is designed to learn representations from 3D OCT volumes that are robust to perturbations and can capture temporal information relevant for predicting disease progression. It differs from contrastive SSL approaches in several ways:

Non-contrastive learning: Instead of pushing away "negative" examples in the representation space, 3DTINC uses a novel non-contrastive similarity loss that encourages the model to learn features that are consistent across different timepoints for the same patient, as opposed to across different patients.
Temporal information: By leveraging the longitudinal nature of OCT scans, 3DTINC can implicitly learn temporal information about disease progression, which is crucial for predicting outcomes like conversion to wet AMD.
Targeted augmentations: The authors design data augmentations specifically for 3D OCT volumes, such as applying random rotations, scaling, and noise, to ensure the learned representations are robust to the unique characteristics of this modality.

The authors evaluate the representations learned by 3DTINC on two large-scale longitudinal datasets of retinal OCT scans, where the task is to predict the conversion to wet AMD within a 6-month interval. They show that each component of their approach - the non-contrastive loss, the use of temporal information, and the specialized augmentations - is crucial for achieving state-of-the-art performance on this prognostic task.

Critical Analysis

The 3DTINC method addresses important limitations of existing contrastive SSL approaches when applied to 3D medical imaging data. By focusing on learning perturbation-invariant features from longitudinal scans, the authors demonstrate the value of incorporating temporal information, which is often overlooked in computer vision tasks.

However, a potential limitation of the work is the reliance on intra-patient scans, which may not always be available, especially for rare diseases. An interesting direction for future research could be to explore ways of leveraging inter-patient information, perhaps through anatomical conditioning or adapting self-supervised learning to computational pathology.

Additionally, the authors could further investigate the interpretability of the learned representations and how they capture clinically relevant features for disease progression. Techniques like LATIM could be leveraged to better understand the temporal dynamics encoded in the representations.

Overall, the 3DTINC method is a promising approach for learning robust representations from 3D medical imaging data, with potential applications in a wide range of prognostic tasks beyond the specific case of AMD studied in this work.

Conclusion

The 3DTINC method proposed in this paper addresses key limitations of existing contrastive self-supervised learning techniques when applied to 3D medical imaging data, such as optical coherence tomography (OCT) of the retina. By leveraging the temporal information inherent in longitudinal scans and using specialized data augmentations, 3DTINC can learn perturbation-invariant features that are crucial for predicting disease progression, as demonstrated by its state-of-the-art performance on predicting conversion to wet age-related macular degeneration.

This work highlights the importance of tailoring self-supervised learning approaches to the unique characteristics of medical imaging modalities, and the potential of exploiting temporal information for prognostic tasks. The insights from this research could inspire further advancements in self-supervised representation learning for 3D medical imaging, with broader implications for improving the efficiency and effectiveness of deep learning in healthcare applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Time-Equivariant Contrastive Learning for Degenerative Disease Progression in Retinal OCT

Taha Emre, Arunava Chakravarty, Dmitrii Lachinov, Antoine Rivail, Ursula Schmidt-Erfurth, Hrvoje Bogunovi'c

Contrastive pretraining provides robust representations by ensuring their invariance to different image transformations while simultaneously preventing representational collapse. Equivariant contrastive learning, on the other hand, provides representations sensitive to specific image transformations while remaining invariant to others. By introducing equivariance to time-induced transformations, such as disease-related anatomical changes in longitudinal imaging, the model can effectively capture such changes in the representation space. In this work, we pro-pose a Time-equivariant Contrastive Learning (TC) method. First, an encoder embeds two unlabeled scans from different time points of the same patient into the representation space. Next, a temporal equivariance module is trained to predict the representation of a later visit based on the representation from one of the previous visits and the corresponding time interval with a novel regularization loss term while preserving the invariance property to irrelevant image transformations. On a large longitudinal dataset, our model clearly outperforms existing equivariant contrastive methods in predicting progression from intermediate age-related macular degeneration (AMD) to advanced wet-AMD within a specified time-window.

5/16/2024

cs.CV

LaTiM: Longitudinal representation learning in continuous-time models to predict disease progression

Rachid Zeghlache, Pierre-Henri Conze, Mostafa El Habib Daho, Yihao Li, Hugo Le Boit'e, Ramin Tadayoni, Pascal Massin, B'eatrice Cochener, Alireza Rezaei, Ikram Brahim, Gwenol'e Quellec, Mathieu Lamard

This work proposes a novel framework for analyzing disease progression using time-aware neural ordinary differential equations (NODE). We introduce a time-aware head in a framework trained through self-supervised learning (SSL) to leverage temporal information in latent space for data augmentation. This approach effectively integrates NODEs with SSL, offering significant performance improvements compared to traditional methods that lack explicit temporal integration. We demonstrate the effectiveness of our strategy for diabetic retinopathy progression prediction using the OPHDIAT database. Compared to the baseline, all NODE architectures achieve statistically significant improvements in area under the ROC curve (AUC) and Kappa metrics, highlighting the efficacy of pre-training with SSL-inspired approaches. Additionally, our framework promotes stable training for NODEs, a commonly encountered challenge in time-aware modeling.

4/11/2024

cs.LG cs.AI

Similarity-aware Syncretic Latent Diffusion Model for Medical Image Translation with Representation Learning

Tingyi Lin, Pengju Lyu, Jie Zhang, Yuqing Wang, Cheng Wang, Jianjun Zhu

Non-contrast CT (NCCT) imaging may reduce image contrast and anatomical visibility, potentially increasing diagnostic uncertainty. In contrast, contrast-enhanced CT (CECT) facilitates the observation of regions of interest (ROI). Leading generative models, especially the conditional diffusion model, demonstrate remarkable capabilities in medical image modality transformation. Typical conditional diffusion models commonly generate images with guidance of segmentation labels for medical modal transformation. Limited access to authentic guidance and its low cardinality can pose challenges to the practical clinical application of conditional diffusion models. To achieve an equilibrium of generative quality and clinical practices, we propose a novel Syncretic generative model based on the latent diffusion model for medical image translation (S$^2$LDM), which can realize high-fidelity reconstruction without demand of additional condition during inference. S$^2$LDM enhances the similarity in distinct modal images via syncretic encoding and diffusing, promoting amalgamated information in the latent space and generating medical images with more details in contrast-enhanced regions. However, syncretic latent spaces in the frequency domain tend to favor lower frequencies, commonly locate in identical anatomic structures. Thus, S$^2$LDM applies adaptive similarity loss and dynamic similarity to guide the generation and supplements the shortfall in high-frequency details throughout the training process. Quantitative experiments confirm the effectiveness of our approach in medical image translation. Our code will release lately.

6/21/2024

eess.IV cs.CV

🖼️

Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling

Gregory Holste, Mingquan Lin, Ruiwen Zhou, Fei Wang, Lei Liu, Qi Yan, Sarah H. Van Tassel, Kyle Kovacs, Emily Y. Chew, Zhiyong Lu, Zhangyang Wang, Yifan Peng

Deep learning has enabled breakthroughs in automated diagnosis from medical imaging, with many successful applications in ophthalmology. However, standard medical image classification approaches only assess disease presence at the time of acquisition, neglecting the common clinical setting of longitudinal imaging. For slow, progressive eye diseases like age-related macular degeneration (AMD) and primary open-angle glaucoma (POAG), patients undergo repeated imaging over time to track disease progression and forecasting the future risk of developing disease is critical to properly plan treatment. Our proposed Longitudinal Transformer for Survival Analysis (LTSA) enables dynamic disease prognosis from longitudinal medical imaging, modeling the time to disease from sequences of fundus photography images captured over long, irregular time periods. Using longitudinal imaging data from the Age-Related Eye Disease Study (AREDS) and Ocular Hypertension Treatment Study (OHTS), LTSA significantly outperformed a single-image baseline in 19/20 head-to-head comparisons on late AMD prognosis and 18/20 comparisons on POAG prognosis. A temporal attention analysis also suggested that, while the most recent image is typically the most influential, prior imaging still provides additional prognostic value.

5/15/2024

cs.CV cs.AI