Multi-modal Mood Reader: Pre-trained Model Empowers Cross-Subject Emotion Recognition

2405.19373

Published 5/31/2024 by Yihang Dong, Xuhang Chen, Yanyan Shen, Michael Kwok-Po Ng, Tao Qian, Shuqiang Wang


Abstract

Emotion recognition based on electroencephalography (EEG) has gained significant attention and developed in diverse directions in fields such as neural signal processing and affective computing. However, the unique brain anatomy of each individual leads to non-negligible natural differences in EEG signals across subjects, posing challenges for cross-subject emotion recognition. While recent studies have attempted to address these issues, they still face limitations in practical effectiveness and model framework unity. Current methods often struggle to capture the complex spatial-temporal dynamics of EEG signals and fail to effectively integrate multimodal information, resulting in suboptimal performance and limited generalizability across subjects. To overcome these limitations, we develop a Pre-trained model based Multimodal Mood Reader for cross-subject emotion recognition that utilizes masked brain signal modeling and an interlinked spatial-temporal attention mechanism. The model learns universal latent representations of EEG signals through pre-training on a large-scale dataset, and employs the interlinked spatial-temporal attention mechanism to process Differential Entropy (DE) features extracted from EEG data. Subsequently, a multi-level fusion layer is proposed to integrate the discriminative features, maximizing the advantages of features across different dimensions and modalities. Extensive experiments on public datasets demonstrate Mood Reader's superior performance in cross-subject emotion recognition tasks, outperforming state-of-the-art methods. Additionally, the model is dissected from an attention perspective, providing a qualitative analysis of emotion-related brain areas and offering valuable insights for affective research in neural signal processing.


Overview

This paper presents a pre-trained model for cross-subject emotion recognition from electroencephalography (EEG) signals that integrates features across different dimensions and modalities.
The proposed model, called "Multi-modal Mood Reader," applies an interlinked spatial-temporal attention mechanism to differential entropy (DE) features extracted from EEG to capture the complex patterns in brain activity associated with emotional states, and combines the resulting features with a multi-level fusion layer.
The model is pre-trained on a large-scale EEG dataset using masked brain signal modeling, which helps it generalize to new subjects and addresses the common challenge of subject-specific variability in emotion recognition tasks.

Plain English Explanation

The researchers have developed a new system that can recognize a person's emotions by analyzing their brain activity recorded with EEG. The system is "multi-modal": it combines different kinds of features derived from the brain signals to get a more complete picture of the brain's response to emotions.

The key innovation of this system is that it uses a "pre-trained" model, which means the system has already been trained on a large amount of data from many different people. This helps the system generalize to new people, addressing a common challenge in emotion recognition, where systems struggle to work reliably across different individuals.

The system uses special attention mechanisms to focus on the most relevant parts of the brain signals when trying to identify emotions. This helps it capture the complex patterns in brain activity that are associated with different emotional states.

Overall, this multi-modal emotion recognition system is a step forward for machine learning in EEG-based emotion recognition, because it is designed to work across different people rather than being tuned to a single individual or dataset.

Technical Explanation

The researchers propose a pre-trained model called "Multi-modal Mood Reader" for cross-subject emotion recognition from EEG signals. The model uses an interlinked spatial-temporal attention mechanism to capture the complex patterns in brain activity associated with different emotional states.
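
The summary does not describe the exact attention design, so the following is only a rough illustration of what spatial-temporal attention over EEG features can look like. This NumPy sketch applies plain scaled dot-product self-attention first across electrodes within each time window, then across time windows for each electrode. The spatial-then-temporal ordering, the absence of learned projections, the feature shapes, and the function names are all assumptions for illustration; this is not the authors' interlinked mechanism.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x):
    """Single-head scaled dot-product self-attention over the token axis.

    x has shape (..., n_tokens, d). Queries, keys and values are x itself
    (no learned projections) to keep the sketch minimal.
    Returns attended tokens (..., n_tokens, d) and weights (..., n_tokens, n_tokens).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ x, weights


def spatial_temporal_attention(features):
    """Hypothetical spatial-then-temporal attention over per-window EEG features.

    features: (n_windows, n_channels, d), e.g. d = number of frequency bands.
    """
    # Spatial step: electrodes attend to each other inside every time window.
    spatial_out, spatial_w = self_attention(features)        # (T, C, d), (T, C, C)
    # Temporal step: windows attend to each other, separately for every electrode.
    per_channel = np.swapaxes(spatial_out, 0, 1)             # (C, T, d)
    temporal_out, temporal_w = self_attention(per_channel)   # (C, T, d), (C, T, T)
    return np.swapaxes(temporal_out, 0, 1), spatial_w, temporal_w


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 62, 5))  # 4 windows, 62 electrodes, 5 bands (illustrative)
out, spatial_w, temporal_w = spatial_temporal_attention(x)
print(out.shape, spatial_w.shape, temporal_w.shape)  # (4, 62, 5) (4, 62, 62) (62, 4, 4)
```

Averaging `spatial_w` over windows and query electrodes gives a rough per-electrode attention score. An attention-based analysis of emotion-related brain areas would inspect this kind of quantity.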

The key innovation of this work is the pre-training approach, which helps the model generalize to new subjects and addresses the common challenge of subject-specific variability in emotion recognition tasks. The model is pre-trained with masked brain signal modeling on a large-scale dataset, allowing it to learn universal latent representations of EEG signals that transfer across individuals.
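
The abstract attributes the pre-training to masked brain signal modeling. A common form of masked modeling hides random patches of the input and trains a network to reconstruct them, scoring the reconstruction only on the hidden patches. This NumPy sketch covers only the data side of that objective. The patch length, the 50% mask ratio, zero-filling of hidden patches, and the helper names are illustrative assumptions. The encoder-decoder network that would produce the reconstruction is omitted.

```python
import numpy as np


def patchify(signal, patch_len):
    """Split a (n_channels, n_samples) EEG segment into non-overlapping patches.

    Returns (n_channels, n_patches, patch_len); leftover samples that do not
    fill a whole patch are dropped.
    """
    n_ch, n_samp = signal.shape
    n_patches = n_samp // patch_len
    return signal[:, : n_patches * patch_len].reshape(n_ch, n_patches, patch_len)


def random_patch_mask(n_channels, n_patches, mask_ratio, rng):
    """Boolean (n_channels, n_patches) mask; True marks a hidden patch."""
    n_masked = int(round(mask_ratio * n_patches))
    mask = np.zeros((n_channels, n_patches), dtype=bool)
    for c in range(n_channels):
        # Each channel hides its own random subset of patches.
        mask[c, rng.choice(n_patches, size=n_masked, replace=False)] = True
    return mask


def apply_mask(patches, mask, fill_value=0.0):
    """Replace hidden patches with a constant: this is what an encoder would see."""
    return np.where(mask[..., None], fill_value, patches)


def masked_reconstruction_loss(patches, reconstruction, mask):
    """Mean squared error averaged over hidden patches only."""
    per_patch_mse = ((reconstruction - patches) ** 2).mean(axis=-1)
    return float(per_patch_mse[mask].mean())


rng = np.random.default_rng(0)
eeg = rng.standard_normal((62, 1000))       # illustrative: 62 channels, 1000 samples
patches = patchify(eeg, patch_len=100)      # (62, 10, 100)
mask = random_patch_mask(62, 10, mask_ratio=0.5, rng=rng)
encoder_input = apply_mask(patches, mask)
# A trained network would map encoder_input to a reconstruction. As a trivial
# baseline, "reconstructing" with the zero-filled input scores the hidden patches' power.
baseline_loss = masked_reconstruction_loss(patches, encoder_input, mask)
```

Restricting the loss to hidden patches means the model gets no credit for copying visible input. This is what pushes masked-modeling pre-training toward learning structure in the signal.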

The abstract describes three main parts of the architecture: a pre-trained EEG encoder that supplies universal latent representations, an interlinked spatial-temporal attention module that processes differential entropy (DE) features extracted from the EEG, and a multi-level fusion layer that integrates the resulting discriminative features across different dimensions and modalities.
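
Differential entropy (DE) features are commonly computed by band-pass filtering each EEG channel. If the filtered signal is assumed to be approximately Gaussian with variance σ², DE has the closed form h = ½ log(2πeσ²). This NumPy sketch follows that recipe with a crude FFT-based band-pass filter. The band edges, the 200 Hz sampling rate and 4-second window in the usage lines, and the function names are assumptions. The paper's own preprocessing may differ, for example by using a proper FIR or IIR filter.

```python
import numpy as np

# Conventional EEG bands in Hz (illustrative; band edges differ between studies).
BANDS = {
    "delta": (1.0, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 14.0),
    "beta": (14.0, 31.0),
    "gamma": (31.0, 50.0),
}


def bandpass_fft(x, fs, low, high):
    """Crude band-pass: zero every FFT bin outside [low, high) Hz along the last axis."""
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spectrum[..., (freqs < low) | (freqs >= high)] = 0
    return np.fft.irfft(spectrum, n=x.shape[-1], axis=-1)


def differential_entropy(x):
    """DE under a Gaussian assumption: 0.5 * log(2 * pi * e * variance), per row."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x, axis=-1))


def de_features(window, fs):
    """Map a (n_channels, n_samples) window to DE features of shape (n_channels, n_bands)."""
    per_band = [differential_entropy(bandpass_fft(window, fs, lo, hi))
                for lo, hi in BANDS.values()]
    return np.stack(per_band, axis=-1)


fs = 200                                     # assumed sampling rate in Hz
rng = np.random.default_rng(0)
window = rng.standard_normal((62, 4 * fs))   # one 4-second window, 62 channels
features = de_features(window, fs)           # shape (62, 5)
```

Stacking such per-window matrices over time yields a (windows, channels, bands) tensor, which is the kind of input a spatial-temporal attention module would consume.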

The researchers evaluate Multi-modal Mood Reader on public emotion recognition datasets and report that it outperforms state-of-the-art methods in cross-subject emotion recognition. They also analyze the model from an attention perspective, giving a qualitative picture of which brain areas are related to emotion.
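
Cross-subject results are typically reported under a protocol in which test subjects are never seen during training. Leave-one-subject-out (LOSO) cross-validation is one common choice. The summary does not state which protocol the paper uses, so this NumPy sketch only illustrates LOSO splitting and the usual mean and standard deviation across held-out subjects. The subject labels are hypothetical. scikit-learn's `LeaveOneGroupOut` provides an equivalent splitter.

```python
import numpy as np


def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_idx, test_idx), one fold per distinct subject.

    subject_ids gives the subject of every sample; each fold trains on all other
    subjects, so the held-out subject is never seen during training.
    """
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_idx = np.flatnonzero(subject_ids == subject)
        train_idx = np.flatnonzero(subject_ids != subject)
        yield subject, train_idx, test_idx


def summarize_folds(fold_accuracies):
    """Mean and (population) standard deviation of per-subject accuracies."""
    accs = np.asarray(fold_accuracies, dtype=float)
    return float(accs.mean()), float(accs.std())


subject_ids = np.repeat([1, 2, 3], 4)  # 3 hypothetical subjects, 4 samples each
for subject, train_idx, test_idx in leave_one_subject_out(subject_ids):
    print(subject, train_idx.size, test_idx.size)
```

Reporting the spread across held-out subjects, not just the mean, shows how evenly a model handles the inter-subject variability the paper targets.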

Critical Analysis

The paper presents a compelling approach to address the challenge of subject-specific variability in EEG-based emotion recognition, a longstanding issue in the field. Combining a pre-trained model with multimodal feature fusion is a promising direction, as it exploits complementary information from different feature types to improve the robustness and generalization of the emotion recognition system.

However, the paper could have provided more details on the pre-training process, such as the specific datasets used, the training hyperparameters, and the strategies employed to ensure the model learns transferable representations. Additionally, the authors could have discussed the potential limitations of the pre-training approach, such as the requirement of a large and diverse dataset for effective pre-training, and the potential challenges in adapting the model to new, unseen datasets or modalities.

Furthermore, although the authors analyze the model from an attention perspective and identify emotion-related brain areas qualitatively, a more quantitative examination of the spatial-temporal attention maps and how they relate to known neurophysiology could provide valuable insights for the field of neuroscience-informed machine learning.

Overall, the "Multi-modal Mood Reader" presents a promising approach to improve the generalization and performance of emotion recognition systems, and the authors have made a valuable contribution to the field of EEG-based emotion recognition. However, further research and validation on a wider range of datasets and real-world applications would be necessary to fully assess the potential and limitations of this approach.

Conclusion

The "Multi-modal Mood Reader" proposed in this paper is a notable advance in cross-subject emotion recognition from EEG signals. Its key innovation is a pre-trained model that generalizes to new subjects, addressing a longstanding challenge in the field.

By combining pre-trained EEG representations, interlinked spatial-temporal attention over DE features, and multi-level feature fusion, the model captures complex patterns in brain activity associated with different emotional states. The superior cross-subject performance reported in the experiments suggests that this approach could enable more reliable and practical emotion recognition systems, with applications in areas such as mental health monitoring, human-computer interaction, and affective computing.

Overall, this research contributes to the growing body of work on neuroscience-informed machine learning and highlights the importance of developing robust and generalizable emotion recognition models to unlock the full potential of brain-computer interfaces and emotional intelligence technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Multimodal Physiological Signals Representation Learning via Multiscale Contrasting for Depression Recognition

Kai Shao, Rui Wang, Yixue Hao, Long Hu, Min Chen, Hans Arno Jacobsen

Depression recognition based on physiological signals such as functional near-infrared spectroscopy (fNIRS) and electroencephalogram (EEG) has made considerable progress. However, most existing studies ignore the complementarity and semantic consistency of multimodal physiological signals under the same stimulation task in complex spatio-temporal patterns. In this paper, we introduce a multimodal physiological signals representation learning framework using Siamese architecture via multiscale contrasting for depression recognition (MRLMC). First, fNIRS and EEG are transformed into different but correlated data based on a time-domain data augmentation strategy. Then, we design a spatio-temporal contrasting module to learn the representation of fNIRS and EEG through weight-sharing multiscale spatio-temporal convolution. Furthermore, to enhance the learning of semantic representation associated with stimulation tasks, a semantic consistency contrast module is proposed, aiming to maximize the semantic similarity of fNIRS and EEG. Extensive experiments on publicly available and self-collected multimodal physiological signals datasets indicate that MRLMC outperforms the state-of-the-art models. Moreover, our proposed framework is capable of transferring to multimodal time series downstream tasks.

6/27/2024

cs.LG cs.AI

EmT: A Novel Transformer for Generalized Cross-subject EEG Emotion Recognition

Yi Ding, Chengxuan Tong, Shuailei Zhang, Muyun Jiang, Yong Li, Kevin Lim Jun Liang, Cuntai Guan

Integrating prior knowledge of neurophysiology into neural network architecture enhances the performance of emotion decoding. While numerous techniques emphasize learning spatial and short-term temporal patterns, there has been limited emphasis on capturing the vital long-term contextual information associated with emotional cognitive processes. In order to address this discrepancy, we introduce a novel transformer model called emotion transformer (EmT). EmT is designed to excel in both generalized cross-subject EEG emotion classification and regression tasks. In EmT, EEG signals are transformed into a temporal graph format, creating a sequence of EEG feature graphs using a temporal graph construction module (TGC). A novel residual multi-view pyramid GCN module (RMPG) is then proposed to learn dynamic graph representations for each EEG feature graph within the series, and the learned representations of each graph are fused into one token. Furthermore, we design a temporal contextual transformer module (TCT) with two types of token mixers to learn the temporal contextual information. Finally, the task-specific output module (TSO) generates the desired outputs. Experiments on four publicly available datasets show that EmT achieves higher results than the baseline methods for both EEG emotion classification and regression tasks. The code is available at https://github.com/yi-ding-cs/EmT.

6/27/2024

cs.LG eess.SP


A Supervised Information Enhanced Multi-Granularity Contrastive Learning Framework for EEG Based Emotion Recognition

Xiang Li, Jian Song, Zhigang Zhao, Chunxiao Wang, Dawei Song, Bin Hu

This study introduces a novel Supervised Info-enhanced Contrastive Learning framework for EEG based Emotion Recognition (SI-CLEER). SI-CLEER employs multi-granularity contrastive learning to create robust EEG contextual representations, potentially improving emotion recognition effectiveness. Unlike existing methods solely guided by classification loss, we propose a joint learning model combining self-supervised contrastive learning loss and supervised classification loss. This model optimizes both loss functions, capturing subtle EEG signal differences specific to emotion detection. Extensive experiments demonstrate SI-CLEER's robustness and superior accuracy on the SEED dataset compared to state-of-the-art methods. Furthermore, we analyze electrode performance, highlighting the significance of central frontal and temporal brain region EEGs in emotion detection. This study offers a universally applicable approach with potential benefits for diverse EEG classification tasks.

5/14/2024

cs.LG cs.AI eess.SP

Feature Fusion Based on Mutual-Cross-Attention Mechanism for EEG Emotion Recognition

Yimin Zhao, Jin Gu

An objective and accurate emotion diagnostic reference is vital to psychologists, especially when dealing with patients who are difficult to communicate with for pathological reasons. Nevertheless, current systems based on Electroencephalography (EEG) data utilized for sentiment discrimination have some problems, including excessive model complexity, mediocre accuracy, and limited interpretability. Consequently, we propose a novel and effective feature fusion mechanism named Mutual-Cross-Attention (MCA). Combined with a specially customized 3D Convolutional Neural Network (3D-CNN), this purely mathematical mechanism adeptly discovers the complementary relationship between time-domain and frequency-domain features in EEG data. Furthermore, the newly designed Channel-PSD-DE 3D feature also contributes to the high performance. The proposed method eventually achieves 99.49% (valence) and 99.30% (arousal) accuracy on the DEAP dataset.

6/21/2024

cs.LG cs.AI