Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI

arXiv: 2404.05468

Published 5/29/2024 by Hugo Caselles-Dupré, Charles Mellerio, Paul Hérent, Alizée Lopez-Persem, Benoit Béranger, Mathieu Soularue, Pierre Fautrel, Gauthier Vernier, Matthieu Cord

cs.CV cs.LG


Abstract

The reconstruction of images observed by subjects from fMRI data collected during visual stimuli has made strong progress in the past decade, thanks to the availability of extensive fMRI datasets and advancements in generative models for image generation. However, the application of visual reconstruction has remained limited. Reconstructing visual imagination presents a greater challenge, with potentially revolutionary applications ranging from aiding individuals with disabilities to verifying witness accounts in court. The primary hurdles in this field are the absence of data collection protocols for visual imagery and the lack of datasets on the subject. Traditionally, fMRI-to-image models rely on data collected from subjects exposed to visual stimuli, which poses problems for reconstructing visual imagery because brain activity differs between visual stimulation and visual imagery. For the first time, we have compiled a substantial dataset (around 6 hours of scans) on visual imagery, along with a proposed data collection protocol. We then train a modified version of an fMRI-to-image model and demonstrate the feasibility of reconstructing images from two modes of imagination: from memory and from pure imagination. The resulting pipeline, which we call Mind-to-Image, marks a step towards a technology that allows direct reconstruction of visual imagery.


Overview

• This paper presents a novel method for projecting visual mental imagery from functional Magnetic Resonance Imaging (fMRI) data, allowing researchers to reconstruct what a person is imagining in their mind.

• The technique, called "Mind-to-Image," leverages deep learning models to translate brain activity patterns captured by fMRI scans into corresponding visual images.

• This research builds on previous work on reconstructing seen images from fMRI and extends it from visual perception to visual imagery, a harder problem for which the authors propose a data collection protocol and compile a dataset of around 6 hours of scans.

Plain English Explanation

The researchers have developed a way to reconstruct what someone is mentally picturing or imagining from their brain activity alone. They used fMRI scans, which track changes in blood flow and oxygenation in the brain as an indirect measure of neural activity, to capture the patterns that occur when people picture images in their minds, either recalled from memory or purely imagined.

By training AI models on these fMRI data paired with the corresponding images, the researchers were able to essentially "reverse engineer" the connection between brain activity and mental imagery. Given new fMRI scans, their models can then generate a visual reconstruction of what the person is picturing in their mind's eye.

This is an exciting advance that could have applications in brain-computer interfaces, neurorehabilitation, and cognitive neuroscience research. It opens up new ways of understanding and interacting with the human mind.

Technical Explanation

The core of the "Mind-to-Image" approach is a deep learning model that learns to map fMRI brain activity patterns to corresponding images. The researchers first proposed a data collection protocol for visual imagery and used it to compile a dataset of around 6 hours of fMRI scans, recorded while participants imagined images in two modes: recalling them from memory and pure imagination.

They then trained a modified version of an existing fMRI-to-image model on this imagery data. Such models are usually trained on scans recorded while subjects view images, and brain activity differs between seeing and imagining, so the model had to learn the relationship between multi-voxel fMRI patterns and images in the imagery setting. This allowed it to "translate" a person's neural activity into a reconstructed image.
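
The exact architecture is not detailed here. As a hedged illustration of the basic voxel-to-image mapping idea, a common baseline in fMRI decoding maps voxel patterns linearly onto an image-embedding space with ridge regression, after which a separate generative model turns the predicted embedding into pixels. The sketch below covers only the linear mapping step, on synthetic data; the array shapes, the regularization strength `alpha`, the function name `fit_ridge`, and the embedding targets are all assumptions for illustration, not the authors' setup.

```python
import numpy as np


def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y.

    X: (n_trials, n_voxels) fMRI response patterns, one row per trial.
    Y: (n_trials, emb_dim) target image embeddings (hypothetical targets,
       e.g. features from a pre-trained vision model).
    """
    n_voxels = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_voxels)
    # Solving the linear system is more stable than forming the inverse explicitly.
    return np.linalg.solve(gram, X.T @ Y)


# Synthetic stand-in for real scans: 400 trials, 200 voxels, 32-dim embeddings.
rng = np.random.default_rng(0)
n_trials, n_voxels, emb_dim = 400, 200, 32
W_true = rng.normal(size=(n_voxels, emb_dim))
X = rng.normal(size=(n_trials, n_voxels))
Y = X @ W_true + 0.5 * rng.normal(size=(n_trials, emb_dim))

# Hold out the last 100 trials to mimic evaluation on unseen scans.
X_train, X_test = X[:300], X[300:]
Y_train, Y_test = Y[:300], Y[300:]

# Standardize voxels and center targets using training-split statistics only,
# so no information from the held-out trials leaks into the fit.
x_mean, x_std = X_train.mean(axis=0), X_train.std(axis=0)
y_mean = Y_train.mean(axis=0)
W = fit_ridge((X_train - x_mean) / x_std, Y_train - y_mean, alpha=1.0)
Y_pred = ((X_test - x_mean) / x_std) @ W + y_mean

# Per-dimension Pearson correlation between predicted and true embeddings.
corr = [np.corrcoef(Y_pred[:, j], Y_test[:, j])[0, 1] for j in range(emb_dim)]
print(f"mean held-out correlation: {np.mean(corr):.3f}")
```

Real fMRI data usually has far more voxels than trials, in which case the equivalent dual form of ridge regression (solving an n_trials by n_trials system) is cheaper, and `alpha` is normally chosen by cross-validation rather than fixed.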

The researchers experimented with different model architectures and training strategies, including incorporating knowledge from pre-trained vision models to improve performance. Their final "Mind-to-Image" model reconstructed images from unseen fMRI data in both modes of imagination studied (from memory and from pure imagination), demonstrating that reconstructing visual imagery from fMRI is feasible.
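
One generic way to check that outputs decoded from unseen scans carry trial-specific information, rather than a generic "average" image, is a retrieval test: compare each predicted embedding against the embeddings of all candidate images and count how often the closest one (by cosine similarity) is the correct image. Which metrics the authors report is not stated here; this is a hypothetical sketch with synthetic embeddings, and the function names are illustrative.

```python
import numpy as np


def cosine_similarity_matrix(A, B):
    """Pairwise cosine similarity between rows of A (n, d) and rows of B (m, d)."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T


def top1_retrieval_accuracy(pred_emb, true_emb):
    """Fraction of trials whose predicted embedding is closest to its own target.

    Row i of pred_emb is the prediction for trial i; row i of true_emb is the
    embedding of the image trial i was meant to evoke. Chance level is 1 / n.
    """
    sims = cosine_similarity_matrix(pred_emb, true_emb)
    return float(np.mean(np.argmax(sims, axis=1) == np.arange(len(pred_emb))))


rng = np.random.default_rng(1)
true_emb = rng.normal(size=(50, 64))
good_pred = true_emb + 0.5 * rng.normal(size=true_emb.shape)  # informative predictions
random_pred = rng.normal(size=true_emb.shape)                 # uninformative predictions

print(top1_retrieval_accuracy(good_pred, true_emb))
print(top1_retrieval_accuracy(random_pred, true_emb))
```

Comparing against the chance level (here 1/50) is what makes the number meaningful: predictions that ignore the scan should score near chance, while predictions that track the imagined image should score well above it.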

Critical Analysis

While the results presented in this paper are impressive, the authors acknowledge several important limitations and areas for further research. For instance, the image reconstructions are still somewhat blurry and lack fine-grained details compared to the original mental imagery.

Additionally, the experiments were conducted in a controlled lab setting, and it remains to be seen how well the models would generalize to more naturalistic, unconstrained thought processes. There are also open questions about the interpretability of the learned representations and the cognitive mechanisms underlying the brain-to-image mapping.

Further work is needed to address these challenges and refine the "Mind-to-Image" approach. Incorporating advancements in generative modeling and leveraging multimodal brain data could help improve the fidelity and robustness of the reconstructions.

Conclusion

The "Mind-to-Image" technique represents a significant step forward in our ability to decode and externalize human visual mental imagery from brain activity. By bridging the gap between neural representations and perceptual experiences, this research opens up new frontiers for brain-computer interaction, cognitive neuroscience, and our understanding of the human mind.

While the current implementation has some limitations, the potential implications of this work are vast. As the field of "brain reading" and neural decoding continues to advance, we may one day be able to effortlessly share our inner mental worlds and collaborate in ways that were previously unimaginable.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
