Visual Neural Decoding via Improved Visual-EEG Semantic Consistency

Read original: arXiv:2408.06788 - Published 8/14/2024 by Hongzhou Chen, Lianghua He, Yihang Liu, Longzhen Yang

Visual Neural Decoding via Improved Visual-EEG Semantic Consistency

Overview

EEG-based visual neural decoding
Multimodal contrastive learning (MCL) to improve visual-EEG semantic consistency
Mutual information maximization between visual and EEG representations

Plain English Explanation

In this research, the authors explore ways to improve the ability to decode visual information from electroencephalography (EEG) data. EEG measures the brain's electrical activity, and by analyzing EEG signals, researchers can try to understand what a person is perceiving or thinking about.

The key innovation in this work is the use of multimodal contrastive learning (MCL). MCL is a technique that tries to learn representations (mathematical encodings) of visual information and EEG data that are semantically consistent - in other words, the representations capture the underlying meaning and relationships between the two data modalities.

The goal is to maximize the mutual information between the visual and EEG representations. This means finding ways to ensure that the representations of the visual and EEG data are closely related, so that one can be used to predict or reconstruct the other. By improving this semantic consistency, the researchers aim to enhance the ability to decode or reconstruct visual information from EEG signals.

Technical Explanation

The paper proposes a novel multimodal contrastive learning (MCL) framework to improve the semantic consistency between visual and EEG representations for enhanced visual neural decoding. The key steps are:

Visual Encoder: A convolutional neural network (CNN) is used to encode visual stimuli into a compact visual representation.
EEG Encoder: A transformer-based network is used to encode the EEG signals into a latent EEG representation.
Multimodal Contrastive Learning: The visual and EEG representations are trained to be semantically consistent by maximizing their mutual information. This is done by pulling together positive pairs (corresponding visual and EEG data) and pushing apart negative pairs (non-corresponding data).
Decoding: The trained visual and EEG encoders are used to perform visual neural decoding, where the EEG representation is used to reconstruct or predict the corresponding visual information.

The authors show that this MCL-based approach outperforms previous methods for visual neural decoding from EEG signals, demonstrating the benefits of improving the semantic consistency between the two modalities.

Critical Analysis

The paper presents a well-designed study with a thorough evaluation, but there are a few potential limitations and areas for future research:

The experiments are conducted on a relatively small dataset, so further validation on larger, more diverse datasets would be valuable.
The decoding performance, while improved over baselines, is still not perfect, suggesting there is room for further advancements in the field.
The interpretability of the learned representations is not deeply explored, which could provide additional insights into the neural mechanisms underlying visual processing.

Overall, the research makes a compelling case for the value of multimodal learning techniques like MCL in enhancing our ability to decode visual information from brain signals, with potential applications in brain-computer interfaces and the study of human cognition.

Conclusion

This paper introduces a novel multimodal contrastive learning approach to improve the semantic consistency between visual and EEG representations, leading to enhanced visual neural decoding performance. By maximizing the mutual information between the two modalities, the method learns more effective representations for reconstructing visual stimuli from EEG signals. While further research is needed, this work represents an important step forward in the field of EEG-based visual neural decoding and its real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Visual Neural Decoding via Improved Visual-EEG Semantic Consistency

Hongzhou Chen, Lianghua He, Yihang Liu, Longzhen Yang

Visual neural decoding refers to the process of extracting and interpreting original visual experiences from human brain activity. Recent advances in metric learning-based EEG visual decoding methods have delivered promising results and demonstrated the feasibility of decoding novel visual categories from brain activity. However, methods that directly map EEG features to the CLIP embedding space may introduce mapping bias and cause semantic inconsistency among features, thereby degrading alignment and impairing decoding performance. To further explore the semantic consistency between visual and neural signals. In this work, we construct a joint semantic space and propose a Visual-EEG Semantic Decouple Framework that explicitly extracts the semantic-related features of these two modalities to facilitate optimal alignment. Specifically, a cross-modal information decoupling module is introduced to guide the extraction of semantic-related information from modalities. Then, by quantifying the mutual information between visual image and EEG features, we observe a strong positive correlation between the decoding performance and the magnitude of mutual information. Furthermore, inspired by the mechanisms of visual object understanding from neuroscience, we propose an intra-class geometric consistency approach during the alignment process. This strategy maps visual samples within the same class to consistent neural patterns, which further enhances the robustness and the performance of EEG visual decoding. Experiments on a large Image-EEG dataset show that our method achieves state-of-the-art results in zero-shot neural decoding tasks.

8/14/2024

BrainDecoder: Style-Based Visual Decoding of EEG Signals

Minsuk Choi, Hiroshi Ishikawa

Decoding neural representations of visual stimuli from electroencephalography (EEG) offers valuable insights into brain activity and cognition. Recent advancements in deep learning have significantly enhanced the field of visual decoding of EEG, primarily focusing on reconstructing the semantic content of visual stimuli. In this paper, we present a novel visual decoding pipeline that, in addition to recovering the content, emphasizes the reconstruction of the style, such as color and texture, of images viewed by the subject. Unlike previous methods, this ``style-based'' approach learns in the CLIP spaces of image and text separately, facilitating a more nuanced extraction of information from EEG signals. We also use captions for text alignment simpler than previously employed, which we find work better. Both quantitative and qualitative evaluations show that our method better preserves the style of visual stimuli and extracts more fine-grained semantic information from neural signals. Notably, it achieves significant improvements in quantitative results and sets a new state-of-the-art on the popular Brain2Image dataset.

9/10/2024

Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu

How to decode human vision through neural signals has attracted a long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models improved the performance of fMRI-based visual decoding and reconstruction. However, the high cost and low temporal resolution of fMRI limit their applications in brain-computer interfaces (BCIs), prompting a high need for EEG-based visual reconstruction. In this study, we present an EEG-based visual reconstruction framework. It consists of a plug-and-play EEG encoder called the Adaptive Thinking Mapper (ATM), which is aligned with image embeddings, and a two-stage EEG guidance image generator that first transforms EEG features into image priors and then reconstructs the visual stimuli with a pre-trained image generator. Our approach allows EEG embeddings to achieve superior performance in image classification and retrieval tasks. Our two-stage image generation strategy vividly reconstructs images seen by humans. Furthermore, we analyzed the impact of signals from different time windows and brain regions on decoding and reconstruction. The versatility of our framework is demonstrated in the magnetoencephalogram (MEG) data modality. We report that EEG-based visual decoding achieves SOTA performance, highlighting the portability, low cost, and high temporal resolution of EEG, enabling a wide range of BCI applications. The code of ATM is available at https://github.com/dongyangli-del/EEG_Image_decode.

4/8/2024

SEE: Semantically Aligned EEG-to-Text Translation

Yitian Tao, Yan Liang, Luoyu Wang, Yongqing Li, Qing Yang, Han Zhang

Decoding neurophysiological signals into language is of great research interest within brain-computer interface (BCI) applications. Electroencephalography (EEG), known for its non-invasiveness, ease of use, and cost-effectiveness, has been a popular method in this field. However, current EEG-to-Text decoding approaches face challenges due to the huge domain gap between EEG recordings and raw texts, inherent data bias, and small closed vocabularies. In this paper, we propose SEE: Semantically Aligned EEG-to-Text Translation, a novel method aimed at improving EEG-to-Text decoding by seamlessly integrating two modules into a pre-trained BART language model. These two modules include (1) a Cross-Modal Codebook that learns cross-modal representations to enhance feature consolidation and mitigate domain gap, and (2) a Semantic Matching Module that fully utilizes pre-trained text representations to align multi-modal features extracted from EEG-Text pairs while considering noise caused by false negatives, i.e., data from different EEG-Text pairs that have similar semantic meanings. Experimental results on the Zurich Cognitive Language Processing Corpus (ZuCo) demonstrate the effectiveness of SEE, which enhances the feasibility of accurate EEG-to-Text decoding.

9/26/2024