EEG classification for visual brain decoding with spatio-temporal and transformer based paradigms

2406.07153

Published 6/12/2024 by Akanksha Sharma, Jyoti Nigam, Abhishek Rathore, Arnav Bhavsar

Abstract

In this work, we delve into the EEG classification task in the domain of visual brain decoding via two frameworks, involving two different learning paradigms. Considering the spatio-temporal nature of EEG data, one of our frameworks is based on a CNN-BiLSTM model. The other involves a CNN-Transformer architecture which inherently involves the more versatile attention-based learning paradigm. In both cases, a special 1D-CNN feature extraction module is used to generate the initial embeddings with 1D convolutions in the time and the EEG channel domains. Considering the EEG signals are noisy, non-stationary and the discriminative features are even less clear (than in semantically structured data such as text or image), we also follow a window-based classification followed by majority voting during inference, to yield labels at a signal level. To illustrate how brain patterns correlate with different image classes, we visualize t-SNE plots of the BiLSTM embeddings alongside brain activation maps for the top 10 classes. These visualizations provide insightful revelations into the distinct neural signatures associated with each visual category, showcasing the BiLSTM's capability to capture and represent the discriminative brain activity linked to visual stimuli. We demonstrate the performance of our approach on the updated EEG-Imagenet dataset with positive comparisons with state-of-the-art methods.

Overview

This research paper explores the use of Electroencephalography (EEG) signals for visual brain decoding using spatio-temporal and transformer-based models.
The researchers investigate different deep learning architectures, including Convolutional Neural Network (CNN) - Bidirectional Long Short-Term Memory (BiLSTM) and CNN-Transformer models, to classify EEG data for visual recognition tasks.
The study utilizes the EEG-Imagenet dataset, which contains EEG recordings of participants viewing natural images.

Plain English Explanation

The paper examines how brain signals, captured through EEG technology, can be used to decode or interpret what a person is seeing. The researchers explore different deep learning models, which are algorithms that can learn patterns from data, to classify the EEG signals into the specific images that the person was viewing.

The key idea is to use the EEG data, which reflects the brain's electrical activity, as a way to "read" the person's visual perception. By training deep learning models on this EEG data, the researchers aim to develop systems that can accurately identify the images a person is seeing, even without the person explicitly telling the system what they are looking at.

This research builds on previous work on using EEG for brain-computer interfaces (BCIs) and decoding visual information from EEG signals. The researchers explore new model architectures, such as combining CNNs and transformers, to see if they can improve the accuracy and reliability of this visual brain decoding process.

Technical Explanation

The researchers investigate two main deep learning architectures for EEG classification and visual brain decoding:

CNN-BiLSTM: A 1D-CNN feature extraction module applies 1D convolutions along the time and EEG channel dimensions to generate the initial embeddings. A Bidirectional Long Short-Term Memory (BiLSTM) network then models the temporal dynamics of those embeddings in both the forward and backward directions.
CNN-Transformer: The same 1D-CNN module feeds a Transformer-based module in place of the BiLSTM. Its attention mechanism relates every time step to every other time step directly, rather than passing information step by step as a recurrent network does, which is intended to help with long-range dependencies in the EEG data.
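
The idea of convolving along both the time and the EEG channel dimensions can be illustrated with a minimal NumPy sketch. It is not the authors' feature extraction module: the window size, the fixed kernels, the single filter per axis, and the function names `conv1d_along_axis` and `embed_window` are all assumptions made for illustration, and a real model would learn many filters.

```python
import numpy as np


def conv1d_along_axis(x, kernel, axis):
    """Valid-mode 1D cross-correlation of `x` with `kernel` along `axis`."""
    # np.convolve flips the kernel, so flip it first to obtain the
    # cross-correlation that deep learning "convolution" layers compute.
    flipped = kernel[::-1]
    return np.apply_along_axis(
        lambda v: np.convolve(v, flipped, mode="valid"), axis, x
    )


def embed_window(window, time_kernel, channel_kernel):
    """Map one EEG window of shape (channels, samples) to a feature map."""
    # Filter each channel over time, then mix neighbouring channels.
    temporal = conv1d_along_axis(window, time_kernel, axis=1)
    spatial = conv1d_along_axis(temporal, channel_kernel, axis=0)
    return np.maximum(spatial, 0.0)  # ReLU non-linearity


rng = np.random.default_rng(0)
window = rng.standard_normal((8, 64))      # hypothetical: 8 channels, 64 samples
time_kernel = np.array([0.25, 0.5, 0.25])  # hypothetical temporal filter
channel_kernel = np.array([0.5, 0.5])      # hypothetical channel-mixing filter
features = embed_window(window, time_kernel, channel_kernel)
print(features.shape)  # (7, 62)
```

With valid-mode filtering, each axis shrinks by the kernel length minus one, which is why the 8 x 64 window becomes a 7 x 62 feature map.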

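The researchers evaluate these models on the updated EEG-Imagenet dataset, which contains EEG recordings of participants viewing a diverse set of natural images. The models are trained to classify the EEG data into the corresponding image categories. Because EEG signals are noisy and non-stationary, classification is performed on windows of the signal, and at inference a majority vote over the window predictions yields the label for the whole signal.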
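
The window-based classification with majority voting described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: the window length, the stride, the toy signal, and the stand-in `toy_classifier` are hypothetical, and in the paper a trained network would play the role of the per-window classifier.

```python
from collections import Counter

import numpy as np


def sliding_windows(signal, window_len, stride):
    """Split a (channels, samples) signal into fixed-length windows."""
    n_samples = signal.shape[1]
    starts = range(0, n_samples - window_len + 1, stride)
    return [signal[:, s:s + window_len] for s in starts]


def majority_vote(labels):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def classify_signal(signal, classify_window, window_len, stride):
    """Predict a label per window, then vote to get a signal-level label."""
    windows = sliding_windows(signal, window_len, stride)
    window_labels = [classify_window(w) for w in windows]
    return majority_vote(window_labels), window_labels


def toy_classifier(window):
    """Hypothetical stand-in for a trained model: thresholds the mean."""
    return int(window.mean() > 0)


# Toy signal: 2 channels, 100 samples, positive for the first 60 samples.
signal = np.hstack([np.ones((2, 60)), -np.ones((2, 40))])
label, window_labels = classify_signal(signal, toy_classifier, window_len=20, stride=20)
print(window_labels, "->", label)  # [1, 1, 1, 0, 0] -> 1
```

Voting lets a few misclassified windows be outweighed by the rest, which is the motivation the abstract gives for using it with noisy signals.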

The results show that the CNN-Transformer model outperforms the CNN-BiLSTM model, demonstrating the effectiveness of the Transformer architecture in capturing the complex spatio-temporal patterns in the EEG data for visual brain decoding tasks.
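
The attention-based learning paradigm behind the CNN-Transformer framework can be illustrated with single-head scaled dot-product self-attention over a sequence of time-step embeddings. This is a generic sketch, not the paper's Transformer module: the sequence length, embedding size, random projection matrices, and the function name `self_attention` are assumptions for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over (steps, dim) tokens."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    # Every time step scores every other time step in one matrix product.
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


rng = np.random.default_rng(0)
tokens = rng.standard_normal((10, 16))  # hypothetical: 10 time steps, 16-dim embeddings
w_q, w_k, w_v = (rng.standard_normal((16, 16)) * 0.1 for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)
print(attended.shape, weights.shape)  # (10, 16) (10, 10)
```

Each row of `weights` sums to one and describes how much one time step draws on every other, so distant time steps interact directly instead of through a chain of recurrent updates.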

Critical Analysis

The paper provides a comprehensive exploration of deep learning architectures for EEG-based visual brain decoding. The researchers have thoughtfully designed their experiments and discussed the strengths and limitations of the proposed models.

The paper takes a first step toward interpretability by visualizing t-SNE plots of the BiLSTM embeddings alongside brain activation maps for the top 10 classes. One area for further research is to extend this analysis, since understanding which features and patterns the models learn from the EEG data could provide insight into the neural mechanisms underlying visual perception.

Additionally, the researchers could explore the robustness of these models to different types of visual stimuli, such as dynamic scenes or abstract art, to better understand the generalization capabilities of the proposed approaches.

Conclusion

This research paper presents novel deep learning models for classifying EEG signals to decode visual brain activity. The CNN-Transformer architecture demonstrates superior performance compared to the CNN-BiLSTM model, highlighting the potential of transformer-based models in capturing the complex spatio-temporal patterns in EEG data.

The findings of this study contribute to the ongoing efforts in brain-computer interface (BCI) technology and the broader field of decoding and reconstructing visual information from neural signals. This research has important implications for developing more accurate and reliable systems for interpreting and interacting with the human visual system using non-invasive EEG recordings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion

Dongyang Li, Chen Wei, Shiying Li, Jiachen Zou, Quanying Liu

How to decode human vision through neural signals has attracted a long-standing interest in neuroscience and machine learning. Modern contrastive learning and generative models improved the performance of fMRI-based visual decoding and reconstruction. However, the high cost and low temporal resolution of fMRI limit their applications in brain-computer interfaces (BCIs), prompting a high need for EEG-based visual reconstruction. In this study, we present an EEG-based visual reconstruction framework. It consists of a plug-and-play EEG encoder called the Adaptive Thinking Mapper (ATM), which is aligned with image embeddings, and a two-stage EEG guidance image generator that first transforms EEG features into image priors and then reconstructs the visual stimuli with a pre-trained image generator. Our approach allows EEG embeddings to achieve superior performance in image classification and retrieval tasks. Our two-stage image generation strategy vividly reconstructs images seen by humans. Furthermore, we analyzed the impact of signals from different time windows and brain regions on decoding and reconstruction. The versatility of our framework is demonstrated in the magnetoencephalogram (MEG) data modality. We report that EEG-based visual decoding achieves SOTA performance, highlighting the portability, low cost, and high temporal resolution of EEG, enabling a wide range of BCI applications. The code of ATM is available at https://github.com/dongyangli-del/EEG_Image_decode.

4/8/2024

cs.HC eess.SP

EEGEncoder: Advancing BCI with Transformer-Based Motor Imagery Classification

Wangdan Liao, Weidong Wang

Brain-computer interfaces (BCIs) harness electroencephalographic signals for direct neural control of devices, offering a significant benefit for individuals with motor impairments. Traditional machine learning methods for EEG-based motor imagery (MI) classification encounter challenges such as manual feature extraction and susceptibility to noise. This paper introduces EEGEncoder, a deep learning framework that employs modified transformers and TCNs to surmount these limitations. We innovatively propose a fusion architecture, namely Dual-Stream Temporal-Spatial Block (DSTS), to capture temporal and spatial features, improving the accuracy of Motor Imagery classification task. Additionally, we use multiple parallel structures to enhance the performance of the model. When tested on the BCI Competition IV-2a dataset, our model results outperform current state-of-the-art techniques.

6/26/2024

cs.HC cs.LG

Geometric Neural Network based on Phase Space for BCI-EEG decoding

Igor Carrara, Bruno Aristimunha, Marie-Constance Corsi, Raphael Y. de Camargo, Sylvain Chevallier, Théodore Papadopoulo

The integration of Deep Learning (DL) algorithms on brain signal analysis is still in its nascent stages compared to their success in fields like Computer Vision, especially in Brain-Computer Interface (BCI), where the brain activity is decoded to control external devices without requiring muscle control. Electroencephalography (EEG) is a widely adopted choice for designing BCI systems due to its non-invasive and cost-effective nature and excellent temporal resolution. Still, it comes at the expense of limited training data, poor signal-to-noise, and a large variability across and within-subject recordings. Finally, setting up a BCI system with many electrodes takes a long time, hindering the widespread adoption of reliable DL architectures in BCIs outside research laboratories. To improve adoption, we need to improve user comfort using, for instance, reliable algorithms that operate with few electrodes. Approach: Our research aims to develop a DL algorithm that delivers effective results with a limited number of electrodes. Taking advantage of the Augmented Covariance Method with SPDNet, we propose the SPDNet$_{\psi}$ architecture and analyze its performance and computational impact, as well as the interpretability of the results. The evaluation is conducted on 5-fold cross-validation, using only three electrodes positioned above the Motor Cortex. The methodology was tested on nearly 100 subjects from several open-source datasets using the Mother Of All BCI Benchmark (MOABB) framework. Main results: The results of our SPDNet$_{\psi}$ demonstrate that the augmented approach combined with the SPDNet significantly outperforms all the current state-of-the-art DL architecture in MI decoding. Significance: This new architecture is explainable, with a low number of trainable parameters and a reduced carbon footprint.

6/24/2024

eess.SP cs.AI cs.LG

Recurrent and Convolutional Neural Networks in Classification of EEG Signal for Guided Imagery and Mental Workload Detection

Filip Postepski, Grzegorz M. Wojcik, Krzysztof Wrobel, Andrzej Kawiak, Katarzyna Zemla, Grzegorz Sedek

The Guided Imagery technique is reported to be used by therapists all over the world in order to increase the comfort of patients suffering from a variety of disorders from mental to oncology ones and proved to be successful in numerous ways. Possible support for the therapists can be estimation of the time at which the subject goes into deep relaxation. This paper presents the results of the investigations of a cohort of 26 students exposed to Guided Imagery relaxation technique and mental task workloads conducted with the use of dense array electroencephalographic amplifier. The research reported herein aimed at verifying whether it is possible to detect differences between those two states and to classify them using deep learning methods and recurrent neural networks such as EEGNet, Long Short-Term Memory-based classifier, 1D Convolutional Neural Network and hybrid model of 1D Convolutional Neural Network and Long Short-Term Memory. The data processing pipeline was presented from the data acquisition, through the initial data cleaning, preprocessing and postprocessing. The classification was based on two datasets: one of them using 26 so-called cognitive electrodes and the other one using signal collected from 256 channels. So far there have not been such comparisons in the application being discussed. The classification results are presented by the validation metrics such as: accuracy, recall, precision, F1-score and loss for each case. It turned out that it is not necessary to collect signals from all electrodes as classification of the cognitive ones gives the results similar to those obtained for the full signal and extending input to 256 channels does not add much value. In the Discussion, an optimal classifier was proposed, along with some suggestions concerning the prospective development of the project.

5/29/2024

cs.LG