Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

2403.07721

Published 4/8/2024 by Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu

Abstract

How to decode human vision through neural signals has attracted a long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models improved the performance of fMRI-based visual decoding and reconstruction. However, the high cost and low temporal resolution of fMRI limit their applications in brain-computer interfaces (BCIs), prompting a high need for EEG-based visual reconstruction. In this study, we present an EEG-based visual reconstruction framework. It consists of a plug-and-play EEG encoder called the Adaptive Thinking Mapper (ATM), which is aligned with image embeddings, and a two-stage EEG guidance image generator that first transforms EEG features into image priors and then reconstructs the visual stimuli with a pre-trained image generator. Our approach allows EEG embeddings to achieve superior performance in image classification and retrieval tasks. Our two-stage image generation strategy vividly reconstructs images seen by humans. Furthermore, we analyzed the impact of signals from different time windows and brain regions on decoding and reconstruction. The versatility of our framework is demonstrated in the magnetoencephalogram (MEG) data modality. We report that EEG-based visual decoding achieves SOTA performance, highlighting the portability, low cost, and high temporal resolution of EEG, enabling a wide range of BCI applications. The code of ATM is available at https://github.com/dongyangli-del/EEG_Image_decode.

Overview

This research paper proposes a method for decoding and reconstructing visual stimuli from electroencephalography (EEG) data using EEG embeddings and a guided diffusion model.
The key contributions include: 1) a plug-and-play EEG encoder, the Adaptive Thinking Mapper (ATM), whose output is aligned with image embeddings, 2) a two-stage EEG-guided image generator that first transforms EEG features into image priors and then reconstructs the stimulus with a pre-trained image generator, and 3) experiments on image classification, retrieval, and reconstruction, including an analysis of different time windows and brain regions and a test on magnetoencephalogram (MEG) data.

Plain English Explanation

The paper describes a way to recreate visual images from brain activity data measured using EEG sensors. EEG is a technique that can detect the electrical signals produced by the brain when we see things. The researchers developed a system that can take these EEG signals and use them to generate an image that the person was likely looking at.

The core idea is to first transform the raw EEG data into a more meaningful representation, or "embedding", using an EEG encoder called the Adaptive Thinking Mapper (ATM). The encoder is aligned with image embeddings, so the EEG embedding of a viewing episode should land close to the embedding of the image the person was seeing. Then, a "guided diffusion" model is used to generate a visual image that matches this EEG embedding.
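The alignment between EEG embeddings and image embeddings can be illustrated with a contrastive objective. The sketch below is a minimal example, assuming a CLIP-style symmetric InfoNCE loss; the summary does not state the exact loss used for ATM, and the function names, the temperature of 0.07, and the random toy data are all hypothetical choices made for illustration.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def contrastive_alignment_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss (assumed form, not confirmed by the source).

    Row i of eeg_emb and row i of img_emb are treated as a matching pair;
    every other row in the batch serves as a negative.
    """
    eeg = l2_normalize(eeg_emb)
    img = l2_normalize(img_emb)
    logits = eeg @ img.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(logits))
    loss_eeg_to_img = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_img_to_eeg = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_eeg_to_img + loss_img_to_eeg)

# Toy data: "EEG" embeddings that nearly match the image embeddings,
# versus EEG embeddings that are unrelated to them.
rng = np.random.default_rng(0)
img = rng.normal(size=(8, 16))
aligned_loss = contrastive_alignment_loss(img + 0.01 * rng.normal(size=img.shape), img)
random_loss = contrastive_alignment_loss(rng.normal(size=img.shape), img)
```

In this toy setting the loss is lower for the well-aligned pair set than for the unrelated one, which is the behavior an encoder trained with such an objective is pushed toward.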

The guided diffusion model starts with random noise and iteratively refines it, being "guided" by the EEG embedding, until it produces a recognizable image. This process allows the model to reconstruct the visual information that the brain activity represents, even though the EEG signals themselves don't directly encode a complete image.
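The iterative, guided refinement described above can be sketched on a toy problem. This is only an illustration under strong assumptions: it uses a deterministic DDIM-style update with classifier-free guidance, which the summary does not confirm as the paper's sampler, and it replaces the trained denoising network with a hypothetical closed-form predictor (`toy_noise_predictor`) that is exact only when every data point equals a single target vector. The noise schedule and all names are likewise invented for the example.

```python
import numpy as np

def toy_noise_predictor(x_t, alpha_bar_t, target):
    """Stand-in for a trained network: the exact noise estimate when
    the data distribution is a single point at `target`."""
    return (x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)

def guided_sample(condition, steps=50, guidance_scale=1.0, seed=0):
    """Start from Gaussian noise and refine it step by step toward `condition`."""
    rng = np.random.default_rng(seed)
    # alpha_bar[t] is the share of signal left at step t (0.999 = almost clean).
    alpha_bar = np.linspace(0.999, 0.01, steps)
    x = rng.normal(size=condition.shape)
    for t in range(steps - 1, -1, -1):
        a_t = alpha_bar[t]
        a_prev = alpha_bar[t - 1] if t > 0 else 1.0
        # Conditional and unconditional noise estimates; the unconditional
        # branch of this toy model assumes data centred at zero.
        eps_cond = toy_noise_predictor(x, a_t, condition)
        eps_uncond = toy_noise_predictor(x, a_t, np.zeros_like(condition))
        # Classifier-free guidance: push the estimate toward the condition.
        eps = eps_uncond + guidance_scale * (eps_cond - eps_uncond)
        # Deterministic DDIM-style step: estimate the clean sample, then re-noise
        # it to the (lower) noise level of the previous step.
        x0_pred = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        x = np.sqrt(a_prev) * x0_pred + np.sqrt(1.0 - a_prev) * eps
    return x

condition = np.array([0.5, -1.0, 2.0])   # plays the role of the EEG-derived guidance
sample = guided_sample(condition)
```

With `guidance_scale=1.0` the toy sampler ends at the condition vector itself, and larger scales exaggerate it. In a real system the condition would be an embedding rather than the image, and a learned network would replace the closed-form predictor.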

The authors show that this approach can effectively reconstruct natural images that people were looking at, based on their EEG data. This demonstrates the potential of using brain activity to decode and reconstruct visual experiences, with applications in areas like brain-computer interfaces and cognitive neuroscience.

Technical Explanation

The paper presents a method for visual decoding and reconstruction from EEG data using guided diffusion. The key components are:

ATM for EEG embedding: The Adaptive Thinking Mapper (ATM), a plug-and-play EEG encoder, transforms raw EEG signals into a vector representation, or "embedding", that is aligned with image embeddings and captures the relevant visual features.
Guided Diffusion for Reconstruction: A two-stage, EEG-guided image generator produces images from the EEG embeddings. The first stage transforms EEG features into image priors; the second stage uses a pre-trained image generator that starts with random noise and iteratively refines it, guided by those priors, until a recognizable image is produced.
Experiments: The authors evaluate the EEG embeddings on image classification and retrieval, and evaluate the full pipeline on natural image reconstruction. They also analyze how signals from different time windows and brain regions affect decoding and reconstruction, and test the framework on MEG data. This builds on prior work in EEG-based image reconstruction and general image reconstruction from brain activity.
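The retrieval evaluation mentioned in the components above can be sketched as a nearest-neighbor search in the shared embedding space. This is a minimal example assuming cosine similarity and a top-k hit criterion; the function name, the 200-candidate pool, the embedding size, and the noise level are hypothetical values chosen only for illustration, not figures from the paper.

```python
import numpy as np

def top_k_retrieval_accuracy(eeg_emb, img_emb, k=5):
    """Fraction of EEG embeddings whose matching image (same row index)
    is among the k most similar candidate images by cosine similarity."""
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    sims = eeg @ img.T                                  # (n_eeg, n_images)
    top_k = np.argsort(-sims, axis=1)[:, :k]            # best k image indices per EEG row
    hits = (top_k == np.arange(len(eeg))[:, None]).any(axis=1)
    return hits.mean()

# Toy data: image embeddings plus heavy noise stand in for EEG embeddings.
rng = np.random.default_rng(0)
img_emb = rng.normal(size=(200, 64))
eeg_emb = img_emb + 2.0 * rng.normal(size=img_emb.shape)

top1 = top_k_retrieval_accuracy(eeg_emb, img_emb, k=1)
top5 = top_k_retrieval_accuracy(eeg_emb, img_emb, k=5)
```

Top-5 accuracy can never fall below top-1 accuracy, because the best match is always included among the five best matches.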

Critical Analysis

The paper presents a novel and promising approach for decoding and reconstructing visual information from EEG data. The authors acknowledge several limitations, such as the need for further improvements in image quality and the challenge of scaling to more complex visual stimuli.

One potential concern is the reliance on the specific ATM architecture for encoding the EEG data. While the authors demonstrate its effectiveness, it would be valuable to explore the performance of alternative EEG embedding methods, or even end-to-end approaches that jointly learn the embedding and reconstruction.

Additionally, the evaluation centers on classification, retrieval, and reconstruction of natural images. Extending the approach to other visual tasks, such as scene understanding, could further demonstrate its broad applicability and usefulness in cognitive neuroscience research.

Conclusion

This research presents an innovative technique for decoding and reconstructing visual information from EEG data using a guided diffusion model. The proposed approach shows promising results in natural image reconstruction, with potential applications in brain-computer interfaces and cognitive neuroscience. While the method has some limitations, it represents an important step forward in leveraging neural activity to decode and reconstruct complex visual experiences.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EEG classification for visual brain decoding with spatio-temporal and transformer based paradigms

Akanksha Sharma, Jyoti Nigam, Abhishek Rathore, Arnav Bhavsar

In this work, we delve into the EEG classification task in the domain of visual brain decoding via two frameworks, involving two different learning paradigms. Considering the spatio-temporal nature of EEG data, one of our frameworks is based on a CNN-BiLSTM model. The other involves a CNN-Transformer architecture which inherently involves the more versatile attention-based learning paradigm. In both cases, a special 1D-CNN feature extraction module is used to generate the initial embeddings with 1D convolutions in the time and the EEG channel domains. Considering that EEG signals are noisy and non-stationary, and that their discriminative features are even less clear than in semantically structured data such as text or images, we also follow a window-based classification followed by majority voting during inference, to yield labels at a signal level. To illustrate how brain patterns correlate with different image classes, we visualize t-SNE plots of the BiLSTM embeddings alongside brain activation maps for the top 10 classes. These visualizations provide insightful revelations into the distinct neural signatures associated with each visual category, showcasing the BiLSTM's capability to capture and represent the discriminative brain activity linked to visual stimuli. We demonstrate the performance of our approach on the updated EEG-Imagenet dataset with positive comparisons with state-of-the-art methods.

6/12/2024

cs.HC

Decoding Natural Images from EEG for Object Recognition

Yonghao Song, Bingchuan Liu, Xiang Li, Nanlin Shi, Yijun Wang, Xiaorong Gao

Electroencephalography (EEG) signals, known for convenient non-invasive acquisition but low signal-to-noise ratio, have recently gained substantial attention due to the potential to decode natural images. This paper presents a self-supervised framework to demonstrate the feasibility of learning image representations from EEG signals, particularly for object recognition. The framework utilizes image and EEG encoders to extract features from paired image stimuli and EEG responses. Contrastive learning aligns these two modalities by constraining their similarity. With the framework, we attain significantly above-chance results on a comprehensive EEG-image dataset, achieving a top-1 accuracy of 15.6% and a top-5 accuracy of 42.8% in challenging 200-way zero-shot tasks. Moreover, we perform extensive experiments to explore the biological plausibility by resolving the temporal, spatial, spectral, and semantic aspects of EEG signals. Besides, we introduce attention modules to capture spatial correlations, providing implicit evidence of the brain activity perceived from EEG data. These findings yield valuable insights for neural decoding and brain-computer interfaces in real-world scenarios. The code will be released on https://github.com/eeyhsong/NICE-EEG.

4/5/2024

cs.HC cs.AI eess.SP

Neuro-Vision to Language: Image Reconstruction and Interaction via Non-invasive Brain Recordings

Guobin Shen, Dongcheng Zhao, Xiang He, Linghao Feng, Yiting Dong, Jihang Wang, Qian Zhang, Yi Zeng

Decoding non-invasive brain recordings is pivotal for advancing our understanding of human cognition but faces challenges due to individual differences and complex neural signal representations. Traditional methods often require customized models and extensive trials, lacking interpretability in visual reconstruction tasks. Our framework integrates 3D brain structures with visual semantics using a Vision Transformer 3D. This unified feature extractor efficiently aligns fMRI features with multiple levels of visual embeddings, eliminating the need for subject-specific models and allowing extraction from single-trial data. The extractor consolidates multi-level visual features into one network, simplifying integration with Large Language Models (LLMs). Additionally, we have enhanced the fMRI dataset with diverse fMRI-image-related textual data to support multimodal large model development. Integrating with LLMs enhances decoding capabilities, enabling tasks such as brain captioning, complex reasoning, concept localization, and visual reconstruction. Our approach demonstrates superior performance across these tasks, precisely identifying language-based concepts within brain signals, enhancing interpretability, and providing deeper insights into neural processes. These advances significantly broaden the applicability of non-invasive brain decoding in neuroscience and human-computer interaction, setting the stage for advanced brain-computer interfaces and cognitive models.

5/24/2024

cs.NE

Mind's Eye: Image Recognition by EEG via Multimodal Similarity-Keeping Contrastive Learning

Chi-Sheng Chen, Chun-Shu Wei

Decoding images from non-invasive electroencephalographic (EEG) signals has been a grand challenge in understanding how the human brain processes visual information in real-world scenarios. To cope with the issues of signal-to-noise ratio and nonstationarity, this paper introduces a MUltimodal Similarity-keeping contrastivE learning (MUSE) framework for zero-shot EEG-based image classification. We develop a series of multivariate time-series encoders tailored for EEG signals and assess the efficacy of regularized contrastive EEG-Image pretraining using an extensive visual EEG dataset. Our method achieves state-of-the-art performance, with a top-1 accuracy of 19.3% and a top-5 accuracy of 48.8% in 200-way zero-shot image classification. Furthermore, we visualize neural patterns via model interpretation, shedding light on the visual processing dynamics in the human brain. The code repository for this work is available at: https://github.com/ChiShengChen/MUSE_EEG.

6/26/2024

eess.SP cs.AI cs.HC cs.LG