DREAM: Visual Decoding from Reversing Human Visual System

2310.02265

Published 4/11/2024 by Weihao Xia, Raoul de Charette, Cengiz Oztireli, Jing-Hao Xue

Abstract

In this work we present DREAM, an fMRI-to-image method for reconstructing viewed images from brain activities, grounded on fundamental knowledge of the human visual system. We craft reverse pathways that emulate the hierarchical and parallel nature of how humans perceive the visual world. These tailored pathways are specialized to decipher semantics, color, and depth cues from fMRI data, mirroring the forward pathways from visual stimuli to fMRI recordings. To do so, two components mimic the inverse processes within the human visual system: the Reverse Visual Association Cortex (R-VAC) which reverses pathways of this brain region, extracting semantics from fMRI data; the Reverse Parallel PKM (R-PKM) component simultaneously predicting color and depth from fMRI signals. The experiments indicate that our method outperforms the current state-of-the-art models in terms of the consistency of appearance, structure, and semantics. Code will be made publicly available to facilitate further research in this field.

Overview

Presents a method called DREAM for reconstructing viewed images from brain activity data (fMRI)
Leverages knowledge of the human visual system to create reverse pathways that extract semantics, color, and depth cues from fMRI signals
Outperforms current state-of-the-art models in terms of appearance, structure, and semantic consistency

Plain English Explanation

The paper describes a new technique called DREAM that reconstructs images from brain activity recorded with fMRI. The key idea is to run the human visual system's processing in reverse: where the brain turns a viewed image into neural activity, DREAM turns the recorded activity back into an image.

Specifically, DREAM has two main components. The first, the Reverse Visual Association Cortex (R-VAC), extracts semantic meaning (high-level concepts) from the fMRI data, reversing the role the visual association cortex plays when the brain processes visual information. The second, the Reverse Parallel PKM (R-PKM), simultaneously predicts color and depth from the fMRI signals, mirroring how the human visual system processes these low-level visual cues in parallel.

By engineering these reverse pathways to mirror the forward flow of visual processing in the brain, the DREAM method is able to reconstruct viewed images more accurately than previous approaches. The resulting images have better consistency in terms of appearance, structure, and semantics.

Technical Explanation

The DREAM method proposed in this work aims to reconstruct viewed images from functional magnetic resonance imaging (fMRI) data, building on our fundamental understanding of the hierarchical and parallel nature of the human visual system.

The key components of DREAM are:

Reverse Visual Association Cortex (R-VAC): This module is designed to extract semantic information from the fMRI data, mimicking the function of the visual association cortex in the brain, which processes high-level visual concepts.
Reverse Parallel PKM (R-PKM): This component simultaneously predicts color and depth cues from the fMRI signals, emulating the parallel processing of low-level visual features in the human visual system. PKM refers to the parvocellular, koniocellular, and magnocellular pathways.
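
The two components above can be pictured as separate decoders reading the same fMRI vector. The following is a minimal sketch of that data flow only, not the authors' implementation: the linear maps, the random weights, the sizes (N_VOXELS, EMBED_DIM, H, W), and the function names r_vac and r_pkm are all hypothetical stand-ins chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical and chosen only to keep the example small.
N_VOXELS = 512   # number of fMRI voxels per sample
EMBED_DIM = 64   # size of the semantic embedding
H, W = 8, 8      # spatial resolution of the predicted color/depth maps


def sigmoid(x):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def r_vac(fmri, w_sem):
    """Stand-in for R-VAC: map fMRI to a unit-norm semantic embedding."""
    emb = fmri @ w_sem
    return emb / np.linalg.norm(emb, axis=-1, keepdims=True)


def r_pkm(fmri, w_color, w_depth):
    """Stand-in for R-PKM: predict color and depth from the same fMRI input."""
    color = sigmoid(fmri @ w_color).reshape(-1, H, W, 3)
    depth = sigmoid(fmri @ w_depth).reshape(-1, H, W, 1)
    return color, depth


# Random weights stand in for trained parameters (assumption).
w_sem = rng.normal(size=(N_VOXELS, EMBED_DIM)) * 0.01
w_color = rng.normal(size=(N_VOXELS, H * W * 3)) * 0.01
w_depth = rng.normal(size=(N_VOXELS, H * W)) * 0.01

fmri_batch = rng.normal(size=(4, N_VOXELS))  # 4 synthetic fMRI samples
semantics = r_vac(fmri_batch, w_sem)
color, depth = r_pkm(fmri_batch, w_color, w_depth)

print(semantics.shape, color.shape, depth.shape)
```

In this sketch the semantic, color, and depth outputs would then be handed to an image generator as conditioning signals; that final stage is omitted here because this summary does not describe it.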

By crafting these specialized reverse pathways, DREAM is able to capture the different types of information encoded in the fMRI data and use them to reconstruct the original viewed images. The experiments show that this approach outperforms current state-of-the-art models in terms of the consistency of the reconstructed images with the actual appearance, structure, and semantics of the viewed stimuli.
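
To make "consistency" concrete, here is a small sketch of two generic similarity scores one might compute between a viewed image and its reconstruction: a pixel-level correlation for appearance and a cosine similarity between feature embeddings for semantics. These are common choices but an assumption on our part; this summary does not state which metrics the paper uses, and structural measures are not covered. The images and embeddings are synthetic, and the function names are hypothetical.

```python
import numpy as np


def pixel_correlation(img_a, img_b):
    """Pearson correlation between two images, computed over all pixels."""
    a = img_a.ravel() - img_a.mean()
    b = img_b.ravel() - img_b.mean()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


rng = np.random.default_rng(0)

# Synthetic stand-ins: a "viewed" image, a noisy copy, and an unrelated image.
original = rng.random((8, 8, 3))
reconstruction = np.clip(original + 0.05 * rng.normal(size=original.shape), 0.0, 1.0)
unrelated = rng.random((8, 8, 3))

close_score = pixel_correlation(original, reconstruction)
far_score = pixel_correlation(original, unrelated)

# Synthetic stand-ins for semantic embeddings of the two images.
emb_original = rng.normal(size=16)
emb_reconstruction = emb_original + 0.1 * rng.normal(size=16)
semantic_score = cosine_similarity(emb_original, emb_reconstruction)

print(close_score > far_score, semantic_score)
```

A faithful reconstruction should score higher than an unrelated image on both measures; in practice the embeddings would come from a pretrained vision model rather than random vectors.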

Critical Analysis

The paper provides a compelling approach to image reconstruction from fMRI data, grounded in our understanding of the human visual system. By designing specialized reverse pathways to extract semantic, color, and depth information, the DREAM method is able to generate more coherent and faithful reconstructions compared to previous techniques.

However, the paper does not discuss some potential limitations of the approach. For example, the experiments are conducted on a relatively small dataset, and it's unclear how well DREAM would scale to more diverse and complex visual stimuli. Additionally, fMRI measures a slow blood-oxygenation response, sampled on the order of seconds over millimeter-scale voxels, so relying on it may limit the quality of the reconstructed images compared to approaches using higher-resolution neural recordings.

Further research could explore ways to incorporate additional neural and cognitive information to improve the reconstruction accuracy and robustness of the DREAM method. Studying the interpretability of the learned reverse pathways could also provide valuable insights into the neural mechanisms underlying human visual perception.

Conclusion

The DREAM method presented in this work offers a promising approach to reconstructing viewed images from fMRI data, leveraging our understanding of the human visual system. By engineering reverse pathways that extract semantic, color, and depth information, DREAM is able to generate more consistent and faithful image reconstructions compared to previous state-of-the-art techniques.

While the paper demonstrates the effectiveness of this approach, further research is needed to address potential limitations and explore ways to enhance the reconstruction quality and generalizability of the method. Overall, this work represents an important step forward in the field of neural image reconstruction, with valuable implications for our understanding of human visual perception and potential applications in brain-computer interfaces and cognitive neuroscience.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI

Hugo Caselles-Dupré, Charles Mellerio, Paul Hérent, Alizée Lopez-Persem, Benoit Béranger, Mathieu Soularue, Pierre Fautrel, Gauthier Vernier, Matthieu Cord

The reconstruction of images observed by subjects from fMRI data collected during visual stimuli has made strong progress in the past decade, thanks to the availability of extensive fMRI datasets and advancements in generative models for image generation. However, the application of visual reconstruction has remained limited. Reconstructing visual imagination presents a greater challenge, with potentially revolutionary applications ranging from aiding individuals with disabilities to verifying witness accounts in court. The primary hurdles in this field are the absence of data collection protocols for visual imagery and the lack of datasets on the subject. Traditionally, fMRI-to-image relies on data collected from subjects exposed to visual stimuli, which poses issues for generating visual imagery based on the difference of brain activity between visual stimulation and visual imagery. For the first time, we have compiled a substantial dataset (around 6h of scans) on visual imagery along with a proposed data collection protocol. We then train a modified version of an fMRI-to-image model and demonstrate the feasibility of reconstructing images from two modes of imagination: from memory and from pure imagination. The resulting pipeline, which we call Mind-to-Image, marks a step towards creating a technology that allows direct reconstruction of visual imagery.

5/29/2024

cs.CV cs.LG

Animate Your Thoughts: Decoupled Reconstruction of Dynamic Natural Vision from Slow Brain Activity

Yizhuo Lu, Changde Du, Chong Wang, Xuanliu Zhu, Liuyun Jiang, Huiguang He

Reconstructing human dynamic vision from brain activity is a challenging task with great scientific significance. The difficulty stems from two primary issues: (1) vision-processing mechanisms in the brain are highly intricate and not fully revealed, making it challenging to directly learn a mapping between fMRI and video; (2) the temporal resolution of fMRI is significantly lower than that of natural videos. To overcome these issues, this paper proposes a two-stage model named Mind-Animator, which achieves state-of-the-art performance on three public datasets. Specifically, during the fMRI-to-feature stage, we decouple semantic, structural, and motion features from fMRI through fMRI-vision-language tri-modal contrastive learning and sparse causal attention. In the feature-to-video stage, these features are merged into videos by an inflated Stable Diffusion. We substantiate that the reconstructed video dynamics are indeed derived from fMRI, rather than hallucinations of the generative model, through permutation tests. Additionally, the visualization of voxel-wise and ROI-wise importance maps confirms the neurobiological interpretability of our model.

5/7/2024

cs.CV cs.AI

Automating the Diagnosis of Human Vision Disorders by Cross-modal 3D Generation

Li Zhang, Yuankun Yang, Ziyang Xie, Zhiyuan Yuan, Jianfeng Feng, Xiatian Zhu, Yu-Gang Jiang

Understanding the hidden mechanisms behind human visual perception is a fundamental quest in neuroscience and underpins a wide variety of critical applications, e.g. clinical diagnosis. To that end, investigating the neural responses of human mind activities, such as functional Magnetic Resonance Imaging (fMRI), has been a significant research vehicle. However, analyzing fMRI signals is challenging, costly, daunting, and demanding for professional training. Despite remarkable progress in artificial intelligence (AI) based fMRI analysis, existing solutions are limited and far away from being clinically meaningful. In this context, we leap forward to demonstrate how AI can go beyond the current state of the art by decoding fMRI into visually plausible 3D visuals, enabling automatic clinical analysis of fMRI data, even without healthcare professionals. Innovatively, we reformulate the task of analyzing fMRI data as a conditional 3D scene reconstruction problem. We design a novel cross-modal 3D scene representation learning method, Brain3D, that takes as input the fMRI data of a subject who was presented with a 2D object image, and yields as output the corresponding 3D object visuals. Importantly, we show that in simulated scenarios our AI agent captures the distinct functionalities of each region of the human visual system as well as their intricate interplay relationships, aligning remarkably with the established discoveries of neuroscience. Non-expert diagnosis indicates that Brain3D can successfully identify the disordered brain regions, such as V1, V2, V3, V4, and the medial temporal lobe (MTL) within the human visual system. We also present results in a cross-modal 3D visual construction setting, showcasing the perception quality of our 3D scene generation.

5/27/2024

cs.CV

Neuro-Vision to Language: Image Reconstruction and Interaction via Non-invasive Brain Recordings

Guobin Shen, Dongcheng Zhao, Xiang He, Linghao Feng, Yiting Dong, Jihang Wang, Qian Zhang, Yi Zeng

Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. This unified feature extractor efficiently aligns fMRI features with multiple levels of visual embeddings, eliminating the need for subject-specific models and allowing extraction from single-trial data. The extractor consolidates multi-level visual features into one network, simplifying integration with Large Language Models (LLMs). Additionally, we have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development. Integrating with LLMs enhances decoding capabilities, enabling tasks such as brain captioning, complex reasoning, concept localization, and visual reconstruction. Our approach demonstrates superior performance across these tasks, precisely identifying language-based concepts within brain signals, enhancing interpretability, and providing deeper insights into neural processes. These advances significantly broaden the applicability of non-invasive brain decoding in neuroscience and human-computer interaction, setting the stage for advanced brain-computer interfaces and cognitive models.

5/24/2024

cs.NE