A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds

Read original: arXiv:2405.07977 - Published 5/14/2024 by Anton Orlichenko, Gang Qu, Ziyu Zhou, Anqi Liu, Hong-Wen Deng, Zhengming Ding, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Yu-Ping Wang

Overview

The researchers create a variational autoencoder (VAE) model called DemoVAE to remove demographic confounds from fMRI data and generate high-quality synthetic fMRI data based on user-supplied demographics.
They train and validate their model on two large, widely used datasets: the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP).
The goal is to address the challenge of demographic confounds in fMRI data and the restricted access to many fMRI datasets.

Plain English Explanation

Functional magnetic resonance imaging (fMRI) is a powerful tool that can be used to study the brain and predict various cognitive and health-related outcomes. However, factors like a person's age, sex, and race can also influence the fMRI data, which can make it difficult to draw accurate conclusions.

To address this issue, the researchers developed DemoVAE, a machine learning model that separates the effects of demographic factors from the fMRI data. The model can also generate new, synthetic fMRI data for a chosen set of demographics, capturing the natural variation seen among real subjects.

The researchers tested their DemoVAE model on two large fMRI datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and the Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). They found that the DemoVAE model was able to recapitulate the group differences in the fMRI data while also capturing the full range of individual variations.

Importantly, the researchers also found that most of the predictions made using the original fMRI data, such as predicting general intelligence or mental health status, were actually driven by the demographic factors rather than the underlying brain activity. This suggests that many of the previous findings using fMRI data may have been biased by these demographic confounds.

By developing the DemoVAE model, the researchers have provided a way to generate high-quality synthetic fMRI data that can be used by researchers without the restrictions of accessing the original datasets. This can help advance our understanding of the brain and its relationship to various cognitive and health outcomes, while also addressing the issue of demographic biases in the data.

Technical Explanation

The researchers create a variational autoencoder (VAE)-based model called DemoVAE to decorrelate fMRI features from demographic factors like age, sex, and race, and generate high-quality synthetic fMRI data conditioned on user-supplied demographics.
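
The objective behind this kind of model can be written down in generic form. The expression below is a sketch of a demographic-conditioned VAE loss, not the loss used in the paper: the symbols $x$ (an fMRI feature vector), $d$ (demographics), $z$ (latents), the weights $\beta$ and $\lambda$, and the squared-correlation penalty are all assumptions chosen for illustration.

```latex
\mathcal{L}(x, d) =
  \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[-\log p_\theta(x \mid z, d)\right]}_{\text{reconstruction}}
  + \beta \, \underbrace{D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, \mathcal{N}(0, I)\right)}_{\text{latent prior}}
  + \lambda \, \underbrace{\sum_{j,k} \operatorname{corr}(z_j, d_k)^2}_{\text{decorrelation}}
```

Under this reading, the decoder receives $d$ directly, so the latents have little reason to encode demographics, and the last term penalizes any correlation that remains. Sampling $z \sim \mathcal{N}(0, I)$ and decoding with a chosen $d$ would then yield synthetic data for those demographics.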

They train and validate their model using two large, widely used fMRI datasets: the Philadelphia Neurodevelopmental Cohort (PNC) and the Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). The model is designed to recapitulate group differences in the fMRI data while capturing the full breadth of individual variations.

The researchers find that DemoVAE is able to generate fMRI data that captures the full distribution of functional connectivity (FC) better than traditional VAE or GAN models. Importantly, they also find that most clinical and cognitive fields that are correlated with the original fMRI data are not correlated with the latent representations learned by DemoVAE.
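
For readers unfamiliar with functional connectivity, FC is commonly computed as the Pearson correlation between the time series of each pair of brain regions, and the unique off-diagonal entries are used as the feature vector. The sketch below illustrates that common definition on toy data. The region count, time-series length, and helper names (`functional_connectivity`, `upper_triangle`) are hypothetical, and the paper's own preprocessing may differ.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def functional_connectivity(timeseries):
    """Region-by-region correlation matrix.

    timeseries: list of regions, each a list of signal samples.
    """
    r = len(timeseries)
    fc = [[1.0] * r for _ in range(r)]
    for i in range(r):
        for j in range(i + 1, r):
            fc[i][j] = fc[j][i] = pearson(timeseries[i], timeseries[j])
    return fc


def upper_triangle(fc):
    """Flatten the unique off-diagonal entries into a feature vector."""
    r = len(fc)
    return [fc[i][j] for i in range(r) for j in range(i + 1, r)]


# Toy data (hypothetical): 4 regions, 200 time points.
# Regions 0 and 1 share a common signal; regions 2 and 3 are independent noise.
rng = random.Random(0)
shared = [rng.gauss(0, 1) for _ in range(200)]
ts = [
    [s + 0.3 * rng.gauss(0, 1) for s in shared],
    [s + 0.3 * rng.gauss(0, 1) for s in shared],
    [rng.gauss(0, 1) for _ in range(200)],
    [rng.gauss(0, 1) for _ in range(200)],
]

fc = functional_connectivity(ts)
features = upper_triangle(fc)  # 4 regions give 4 * 3 / 2 = 6 features
```

A generative model of FC would be trained on vectors like `features`, one per subject or scan.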

One exception is that several fields related to schizophrenia medication and symptom severity remain correlated with the DemoVAE latents, which indicates that these fields carry signal beyond demographics. For most other fields, the loss of correlation suggests that the original fMRI-based predictions were largely driven by demographic confounds rather than by demographic-independent brain activity.
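
The confounding effect itself is easy to reproduce on simulated data. In the sketch below, a hypothetical FC feature and a hypothetical cognitive score both depend on age and on nothing else they share. They appear correlated until age is regressed out of both. Linear residualization is used here only as a simple stand-in for confound removal; it is not the DemoVAE procedure, and all variable names and coefficients are assumptions.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def residualize(y, x):
    """Remove the least-squares linear effect of x from y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum(
        (a - mx) ** 2 for a in x
    )
    return [b - my - slope * (a - mx) for a, b in zip(x, y)]


# Simulated subjects (hypothetical): age drives both variables independently.
rng = random.Random(0)
n = 2000
age = [rng.uniform(8, 22) for _ in range(n)]
fc_feature = [0.05 * a + rng.gauss(0, 0.2) for a in age]
score = [2.0 * a + rng.gauss(0, 8.0) for a in age]

# Raw correlation looks like the FC feature "predicts" the score.
raw_r = pearson(fc_feature, score)

# After removing age from both, the apparent relationship largely vanishes.
adj_r = pearson(residualize(fc_feature, age), residualize(score, age))

print(f"raw correlation:      {raw_r:.3f}")
print(f"age-adjusted (partial): {adj_r:.3f}")
```

A field that stayed correlated after adjustment, as the schizophrenia-related fields do with the DemoVAE latents, would point to a relationship that age alone does not explain.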

Critical Analysis

The researchers provide a thorough evaluation of their DemoVAE model and the implications of their findings. They acknowledge the limitations of their approach, such as the potential for the model to learn spurious correlations between the latent representations and certain clinical/cognitive fields.

Additionally, while the researchers demonstrate that DemoVAE can generate high-quality synthetic fMRI data, they do not explore the potential applications or practical implications of this capability in depth. Further research could investigate how these synthetic datasets could be used to advance neuroscience research or address challenges in data sharing and access.

Another area for further exploration is the extent to which the demographic confounds identified in this study generalize to other fMRI-based prediction tasks and datasets. The researchers focus on a specific set of datasets and outcomes, and it would be valuable to investigate the broader prevalence of these demographic biases in the field.

Overall, the DemoVAE model and the researchers' findings represent an important contribution to the field of neuroimaging, highlighting the need to carefully account for demographic factors in data analysis and the potential benefits of using generative models to address these challenges.

Conclusion

The researchers have developed a DemoVAE model that can remove demographic confounds from fMRI data and generate high-quality synthetic fMRI data based on user-supplied demographics. This addresses a significant challenge in the field of neuroimaging, where demographic factors can strongly influence the results of fMRI-based predictions.

By validating their model on two large, widely used fMRI datasets, the researchers have demonstrated the effectiveness of their approach and its potential to advance our understanding of the brain and its relationship to cognitive and health outcomes. Additionally, the ability to generate synthetic fMRI data can help address the challenge of data sharing and access, which is a significant barrier in the field.

Overall, the DemoVAE model represents an important step forward in the effort to mitigate the effects of demographic biases in neuroimaging research and unlock the full potential of fMRI data to provide insights into the human brain and mind.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Demographic-Conditioned Variational Autoencoder for fMRI Distribution Sampling and Removal of Confounds

Anton Orlichenko, Gang Qu, Ziyu Zhou, Anqi Liu, Hong-Wen Deng, Zhengming Ding, Julia M. Stephen, Tony W. Wilson, Vince D. Calhoun, Yu-Ping Wang

Objective: fMRI and derived measures such as functional connectivity (FC) have been used to predict brain age, general fluid intelligence, psychiatric disease status, and preclinical neurodegenerative disease. However, it is not always clear that all demographic confounds, such as age, sex, and race, have been removed from fMRI data. Additionally, many fMRI datasets are restricted to authorized researchers, making dissemination of these valuable data sources challenging. Methods: We create a variational autoencoder (VAE)-based model, DemoVAE, to decorrelate fMRI features from demographics and generate high-quality synthetic fMRI data based on user-supplied demographics. We train and validate our model using two large, widely used datasets, the Philadelphia Neurodevelopmental Cohort (PNC) and Bipolar and Schizophrenia Network for Intermediate Phenotypes (BSNIP). Results: We find that DemoVAE recapitulates group differences in fMRI data while capturing the full breadth of individual variations. Significantly, we also find that most clinical and computerized battery fields that are correlated with fMRI data are not correlated with DemoVAE latents. An exception are several fields related to schizophrenia medication and symptom severity. Conclusion: Our model generates fMRI data that captures the full distribution of FC better than traditional VAE or GAN models. We also find that most prediction using fMRI data is dependent on correlation with, and prediction of, demographics. Significance: Our DemoVAE model allows for generation of high quality synthetic data conditioned on subject demographics as well as the removal of the confounding effects of demographics. We identify that FC-based prediction tasks are highly influenced by demographic confounds.

5/14/2024

Disentangling Hippocampal Shape Variations: A Study of Neurological Disorders Using Mesh Variational Autoencoder with Contrastive Learning

Jakaria Rabbi, Johannes Kiechle, Christian Beaulieu, Nilanjan Ray, Dana Cobzas

This paper presents a comprehensive study focused on disentangling hippocampal shape variations from diffusion tensor imaging (DTI) datasets within the context of neurological disorders. Leveraging a Graph Variational Autoencoder (VAE) enhanced with Supervised Contrastive Learning, our approach aims to improve interpretability by disentangling two distinct latent variables corresponding to age and the presence of diseases. In our ablation study, we investigate a range of VAE architectures and contrastive loss functions, showcasing the enhanced disentanglement capabilities of our approach. This evaluation uses synthetic 3D torus mesh data and real 3D hippocampal mesh datasets derived from the DTI hippocampal dataset. Our supervised disentanglement model outperforms several state-of-the-art (SOTA) methods like attribute and guided VAEs in terms of disentanglement scores. Our model distinguishes between age groups and disease status in patients with Multiple Sclerosis (MS) using the hippocampus data. Our Graph VAE with Supervised Contrastive Learning shows the volume changes of the hippocampus of MS populations at different ages, and the result is consistent with the current neuroimaging literature. This research provides valuable insights into the relationship between neurological disorder and hippocampal shape changes in different age groups of MS populations using a Graph VAE with Supervised Contrastive loss.

9/11/2024

Privacy-preserving datasets by capturing feature distributions with Conditional VAEs

Francesco Di Salvo, David Tafler, Sebastian Doerrich, Christian Ledig

Large and well-annotated datasets are essential for advancing deep learning applications, but they are often costly or impossible for a single entity to obtain. In many areas, including the medical domain, approaches relying on data sharing have become critical to address those challenges. While effective in increasing dataset size and diversity, data sharing raises significant privacy concerns. Commonly employed anonymization methods based on the k-anonymity paradigm often fail to preserve data diversity, affecting model robustness. This work introduces a novel approach using Conditional Variational Autoencoders (CVAEs) trained on feature vectors extracted from large pre-trained vision foundation models. Foundation models effectively detect and represent complex patterns across diverse domains, allowing the CVAE to faithfully capture the embedding space of a given data distribution to generate (sample) a diverse, privacy-respecting, and potentially unbounded set of synthetic feature vectors. Our method notably outperforms traditional approaches in both medical and natural image domains, exhibiting greater dataset diversity and higher robustness against perturbations while preserving sample privacy. These results underscore the potential of generative models to significantly impact deep learning applications in data-scarce and privacy-sensitive environments. The source code is available at https://github.com/francescodisalvo05/cvae-anonymization .

8/2/2024

Latent 3D Brain MRI Counterfactual

Wei Peng, Tian Xia, Fabio De Sousa Ribeiro, Tomas Bosschieter, Ehsan Adeli, Qingyu Zhao, Ben Glocker, Kilian M. Pohl

The number of samples in structural brain MRI studies is often too small to properly train deep learning models. Generative models show promise in addressing this issue by effectively learning the data distribution and generating high-fidelity MRI. However, they struggle to produce diverse, high-quality data outside the distribution defined by the training data. One way to address the issue is using causal models developed for 3D volume counterfactuals. However, accurately modeling causality in high-dimensional spaces is a challenge, so these models generally generate 3D brain MRIs of lower quality. To address these challenges, we propose a two-stage method that constructs a Structural Causal Model (SCM) within the latent space. In the first stage, we employ a VQ-VAE to learn a compact embedding of the MRI volume. Subsequently, we integrate our causal model into this latent space and execute a three-step counterfactual procedure using a closed-form Generalized Linear Model (GLM). Our experiments conducted on real-world high-resolution MRI data (1mm) demonstrate that our method can generate high-quality 3D MRI counterfactuals.

9/10/2024