Emotion Loss Attacking: Adversarial Attack Perception for Skeleton based on Multi-dimensional Features

Read original: arXiv:2406.19815 - Published 7/1/2024 by Feng Liu, Qing Xu, Qijian Zheng

Overview

This research paper focuses on developing a new adversarial attack method for skeleton-based emotion recognition models.
The proposed approach, called "Emotion Loss Attacking" (ELA), leverages multi-dimensional features to generate adversarial examples that can fool the target emotion recognition model.
ELA aims to provide a better understanding of the vulnerabilities of skeleton-based emotion recognition systems and inform the development of more robust models.

Plain English Explanation

Emotion recognition from skeleton data, which tracks the movement and positions of a person's joints, is an important task in computer vision and AI. However, these models can be vulnerable to adversarial attacks, where small, carefully crafted changes to the input data can cause the model to make mistakes.

The researchers in this paper propose a new type of adversarial attack called "Emotion Loss Attacking" (ELA). ELA works by analyzing multiple aspects of the skeleton data, such as the positions, velocities, and accelerations of the joints, to find ways to subtly modify the input that will confuse the emotion recognition model.

The key idea behind ELA is that by targeting these multi-dimensional features of the skeleton data, the researchers can generate adversarial examples that are more effective at fooling the model compared to attacks that only consider the joint positions. This provides valuable insights into the weaknesses of current emotion recognition systems and can help guide the development of more robust and secure models.

Technical Explanation

The researchers first propose a skeleton-based emotion recognition model that takes multi-dimensional features of the skeleton data as input: joint positions, velocities, and accelerations. This model is trained on skeleton sequences annotated with emotion labels.
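To make the multi-dimensional features concrete, here is a minimal sketch of how velocities and accelerations might be derived from joint positions by finite differences. The function name motion_features, the (frames, joints, 3) array layout, and the use of np.gradient are illustrative assumptions; the summary does not say how the paper computes these features.

```python
import numpy as np

def motion_features(positions, dt=1.0):
    """Stack positions, velocities and accelerations for a skeleton sequence.

    positions: array of shape (T, J, 3), i.e. T frames, J joints, xyz coordinates
               (an assumed layout, chosen for illustration).
    dt:        time between frames (assumed constant).
    Returns an array of shape (T, J, 9).
    """
    positions = np.asarray(positions, dtype=float)
    # First derivative along the time axis: central differences in the
    # interior, one-sided differences at the first and last frame.
    velocities = np.gradient(positions, dt, axis=0)
    # Second derivative, obtained by differentiating the velocities again.
    accelerations = np.gradient(velocities, dt, axis=0)
    return np.concatenate([positions, velocities, accelerations], axis=-1)
```

For a joint moving at constant speed, the velocity channels are constant and the acceleration channels are zero, which is a quick sanity check on the layout.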

To generate adversarial examples, the researchers develop the ELA approach, which consists of the following steps:

1. Compute the emotion classification loss of the target model on the input skeleton data.
2. Compute the gradients of this loss with respect to the multi-dimensional skeleton features.
3. Update the skeleton data by taking a small step in the direction of the gradients, effectively modifying the joint positions, velocities, and accelerations.
4. Repeat steps 1-3 until the modified skeleton data is able to fool the emotion recognition model.
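The loop described in these steps can be sketched as a generic iterative gradient attack. This is not the authors' implementation: the target model here is a toy linear softmax classifier (so the gradient has a closed form), and the sign-of-gradient step, the step size, and the eps clipping budget are assumptions borrowed from standard iterative attacks rather than details given in this summary.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def loss_and_grad(x, W, b, label):
    """Cross-entropy loss of a linear softmax classifier and its gradient w.r.t. x."""
    p = softmax(W @ x + b)
    loss = -np.log(p[label] + 1e-12)
    onehot = np.zeros_like(p)
    onehot[label] = 1.0
    grad = W.T @ (p - onehot)    # d(loss)/dx for logits = W @ x + b
    return loss, grad

def iterative_attack(x, W, b, label, step=0.01, eps=0.1, max_iter=100):
    """Ascend the loss until the prediction changes or the budget runs out.

    x is a flattened feature vector (for example positions, velocities and
    accelerations). Returns the perturbed input and the number of steps taken.
    """
    x_adv = x.copy()
    for i in range(max_iter):
        if np.argmax(W @ x_adv + b) != label:
            return x_adv, i                           # step 4: model is fooled
        _, grad = loss_and_grad(x_adv, W, b, label)   # steps 1 and 2
        x_adv = x_adv + step * np.sign(grad)          # step 3: small ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)      # assumed L-infinity budget
    return x_adv, max_iter
```

In practice the toy classifier would be replaced by the real recognition network and the analytic gradient by automatic differentiation; the structure of the loop stays the same.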

The researchers evaluate ELA on several benchmark datasets for skeleton-based emotion recognition and compare it with other adversarial attack methods. They report that ELA achieves higher attack success rates while keeping the changes to the input imperceptible.
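The two quantities mentioned here, attack success rate and perturbation size, can be computed along the following lines. Both definitions are assumptions made for illustration: success is counted only over samples the model classified correctly before the attack, and perturbation size is the mean L2 norm of the difference per sample. The paper may define its metrics differently.

```python
import numpy as np

def attack_success_rate(clean_pred, adv_pred, labels):
    """Fraction of originally correct samples whose prediction is wrong after the attack."""
    clean_pred, adv_pred, labels = map(np.asarray, (clean_pred, adv_pred, labels))
    correct = clean_pred == labels        # samples the model got right before the attack
    if not correct.any():
        return 0.0
    return float(np.mean(adv_pred[correct] != labels[correct]))

def mean_l2_perturbation(clean, adv):
    """Average L2 norm of (adv - clean), computed per sample over all features."""
    diff = np.asarray(adv, dtype=float) - np.asarray(clean, dtype=float)
    flat = diff.reshape(len(diff), -1)    # one row per sample
    return float(np.mean(np.linalg.norm(flat, axis=1)))
```

Reporting both numbers together matters because a high success rate is only meaningful when the perturbation stays small.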

Critical Analysis

The research presented in this paper provides a valuable contribution to the understanding of the vulnerabilities of skeleton-based emotion recognition models. By developing the ELA approach, the researchers demonstrate the importance of considering multi-dimensional features of the skeleton data when designing adversarial attacks.

However, the paper does not fully address the potential real-world implications of these attacks. While the adversarial examples generated by ELA may be imperceptible to human observers, it is unclear how these attacks would translate to practical scenarios, such as real-time interaction systems or wearable devices. Further research is needed to understand the feasibility and impact of such attacks in deployed systems.

Additionally, the paper does not provide a comprehensive discussion of potential countermeasures or defense mechanisms against ELA. While the researchers mention the need for more robust emotion recognition models, they do not explore specific techniques or architectures that could increase the resilience of these systems to adversarial attacks.

Conclusion

This research paper presents a novel adversarial attack method called "Emotion Loss Attacking" (ELA) that targets the multi-dimensional features of skeleton data to fool emotion recognition models. By considering a wider range of features, ELA is able to generate more effective adversarial examples compared to previous approaches.

The findings of this paper highlight the importance of developing secure and robust emotion recognition systems that can withstand adversarial attacks. The insights gained from this research can inform the design of improved emotion recognition models that are better equipped to handle the real-world challenges and potential security threats faced by these technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Emotion Loss Attacking: Adversarial Attack Perception for Skeleton based on Multi-dimensional Features

Feng Liu, Qing Xu, Qijian Zheng

Adversarial attack on skeletal motion is a hot topic. However, existing research only considers part of the dynamic features when measuring the distance between skeleton graph sequences, which results in poor imperceptibility. To this end, we propose a novel adversarial attack method to attack action recognizers for skeletal motions. Firstly, our method systematically proposes a dynamic distance function to measure the difference between skeletal motions. Meanwhile, we innovatively introduce emotional features for complementary information. In addition, we use the Alternating Direction Method of Multipliers (ADMM) to solve the constrained optimization problem, which generates adversarial samples with better imperceptibility to deceive the classifiers. Experiments show that our method is effective on multiple action classifiers and datasets. When the perturbation magnitude measured by l norms is the same, the dynamic perturbations generated by our method are much lower than those of other methods. What's more, we are the first to prove the effectiveness of emotional features, and provide a new idea for measuring the distance between skeletal motions.

7/1/2024

Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Xun Yang, Meng Wang, He Wang

Skeletal motion plays a pivotal role in human activity recognition (HAR). Recently, attack methods have been proposed to identify the universal vulnerability of skeleton-based HAR (S-HAR). However, research on adversarial transferability in S-HAR is largely missing. More importantly, existing attacks all struggle to transfer across unknown S-HAR models. We observed that the key reason is that the loss landscape of the action recognizers is rugged and sharp. Given the established correlation in prior studies [qin2022boosting, wu2020towards] between loss landscape and adversarial transferability, we assume and empirically validate that smoothing the loss landscape could potentially improve adversarial transferability on S-HAR. This is achieved by proposing a new post-train Dual Bayesian strategy, which can effectively explore the model posterior space for a collection of surrogates without the need for re-training. Furthermore, to craft adversarial examples along the motion manifold, we incorporate the attack gradient with information of the motion dynamics in a Bayesian manner. Evaluated on benchmark datasets, e.g. HDM05 and NTU 60, the average transfer success rate can reach as high as 35.9% and 45.5% respectively. In comparison, current state-of-the-art skeletal attacks achieve only 3.6% and 9.8%. The high adversarial transferability remains consistent across various surrogate, victim, and even defense models. Through a comprehensive analysis of the results, we provide insights on what surrogates are more likely to exhibit transferability, to shed light on future research.

9/6/2024

Adversarial Robustness in RGB-Skeleton Action Recognition: Leveraging Attention Modality Reweighter

Chao Liu, Xin Liu, Zitong Yu, Yonghong Hou, Huanjing Yue, Jingyu Yang

Deep neural networks (DNNs) have been applied in many computer vision tasks and achieved state-of-the-art (SOTA) performance. However, misclassification will occur when DNNs predict adversarial examples which are created by adding human-imperceptible adversarial noise to natural examples. This limits the application of DNNs in security-critical fields. In order to enhance the robustness of models, previous research has primarily focused on the unimodal domain, such as image recognition and video understanding. Although multi-modal learning has achieved advanced performance in various tasks, such as action recognition, research on the robustness of RGB-skeleton action recognition models is scarce. In this paper, we systematically investigate how to improve the robustness of RGB-skeleton action recognition models. We initially conducted empirical analysis on the robustness of different modalities and observed that the skeleton modality is more robust than the RGB modality. Motivated by this observation, we propose the Attention-based Modality Reweighter (AMR), which utilizes an attention layer to re-weight the two modalities, enabling the model to learn more robust features. Our AMR is plug-and-play, allowing easy integration with multimodal models. To demonstrate the effectiveness of AMR, we conducted extensive experiments on various datasets. For example, compared to the SOTA methods, AMR exhibits a 43.77% improvement against PGD20 attacks on the NTU-RGB+D 60 dataset. Furthermore, it effectively balances the differences in robustness between different modalities.

7/30/2024

Self-supervised Gait-based Emotion Representation Learning from Selective Strongly Augmented Skeleton Sequences

Cheng Song, Lu Lu, Zhen Ke, Long Gao, Shuai Ding

Emotion recognition is an important part of affective computing. Extracting emotional cues from human gaits yields benefits such as natural interaction, a nonintrusive nature, and remote detection. Recently, the introduction of self-supervised learning techniques offers a practical solution to the issues arising from the scarcity of labeled data in the field of gait-based emotion recognition. However, due to the limited diversity of gaits and the incompleteness of feature representations for skeletons, the existing contrastive learning methods are usually inefficient for the acquisition of gait emotions. In this paper, we propose a contrastive learning framework utilizing selective strong augmentation (SSA) for self-supervised gait-based emotion representation, which aims to derive effective representations from limited labeled gait data. First, we propose an SSA method for the gait emotion recognition task, which includes upper body jitter and random spatiotemporal mask. The goal of SSA is to generate more diverse and targeted positive samples and prompt the model to learn more distinctive and robust feature representations. Then, we design a complementary feature fusion network (CFFN) that facilitates the integration of cross-domain information to acquire topological structural and global adaptive features. Finally, we implement the distributional divergence minimization loss to supervise the representation learning of the generally and strongly augmented queries. Our approach is validated on the Emotion-Gait (E-Gait) and Emilya datasets and outperforms the state-of-the-art methods under different evaluation protocols.

5/9/2024