Towards Physical World Backdoor Attacks against Skeleton Action Recognition

Read original: arXiv:2408.08671 - Published 8/19/2024 by Qichen Zheng, Yi Yu, Siyuan Yang, Jun Liu, Kwok-Yan Lam, Alex Kot


Overview

This paper presents a method for improving the robustness of skeleton-based human activity recognition models to noisy or incomplete data.
The authors propose a multi-task learning framework that combines activity recognition with auxiliary tasks to improve performance.
The model is evaluated on several benchmark datasets and shown to outperform state-of-the-art approaches.

Plain English Explanation

Skeleton-based human activity recognition is a technology that uses the position and movement of a person's body joints (their "skeleton") to identify what they are doing, such as walking, running, or waving. However, such a system can be confused when its input data is noisy or incomplete, for example when some joint positions are missing or inaccurate.

To address this, the researchers in this paper developed a new machine learning model that combines the main task of recognizing the activity with some additional "helper" tasks. These extra tasks help the model learn more robust features that can better handle noisy or incomplete data. The model was tested on standard benchmark datasets and was shown to outperform other state-of-the-art activity recognition approaches.

The key idea is that by training the model to do multiple related tasks at once, it can learn features that are more general and less sensitive to noise or missing information in the input data. This makes the overall activity recognition system more reliable and accurate, even when the input data is imperfect.

Technical Explanation

The paper proposes a multi-task learning framework for skeleton-based human activity recognition. The main task is to classify the activity being performed, while the auxiliary tasks include predicting the 3D joint positions and classifying the subject's identity.

The model architecture consists of a shared backbone network that extracts general features, along with task-specific heads for each of the three tasks. During training, the model optimizes a weighted combination of the losses for each task.
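As a rough illustration of the weighted loss combination described above, the sketch below computes a per-sample objective from three task losses. The summary does not state which loss functions or weights are used, so the choice of cross-entropy for the two classification heads, mean squared error for joint prediction, the weight values, and every function name here are assumptions made for illustration only.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one sample (logits: list of floats)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def mse(pred, true):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred)


def multitask_loss(act_logits, act_label,
                   joint_pred, joint_true,
                   id_logits, id_label,
                   weights=(1.0, 0.5, 0.1)):
    """Weighted sum of the three task losses.

    The weights are hypothetical placeholders, not values from the paper.
    """
    w_act, w_joint, w_id = weights
    loss_act = cross_entropy(act_logits, act_label)   # main task: activity class
    loss_joint = mse(joint_pred, joint_true)          # auxiliary: joint positions
    loss_id = cross_entropy(id_logits, id_label)      # auxiliary: subject identity
    return w_act * loss_act + w_joint * loss_joint + w_id * loss_id
```

In a real training loop the three inputs would come from the task-specific heads on top of the shared backbone, and the weights would typically be tuned on a validation set.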

The key insight is that the auxiliary tasks of joint position prediction and identity classification help the model learn more robust features that are less sensitive to noise or missing data in the input skeleton sequences. This improves the performance on the primary activity recognition task, especially when the input data is imperfect.

The model is evaluated on several benchmark datasets for skeleton-based action recognition, including NTU RGB+D, Northwestern-UCLA, and COCO. The results show that the proposed multi-task approach outperforms state-of-the-art single-task models, particularly in the presence of noisy or incomplete input data.
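The summary does not say how noisy or incomplete input was produced for these evaluations. One hypothetical way to simulate it, shown here only as a sketch, is to add Gaussian jitter to each joint coordinate and to replace randomly chosen joints with zeros to mark them as missing. The function name, the zero-fill convention, and the default parameter values are all assumptions.

```python
import random


def corrupt_sequence(seq, noise_std=0.01, drop_prob=0.1, seed=0):
    """Return a corrupted copy of a skeleton sequence.

    seq is a list of frames; each frame is a list of (x, y, z) joint tuples.
    Each joint is either dropped (set to zeros) with probability drop_prob,
    or perturbed with zero-mean Gaussian noise of standard deviation noise_std.
    """
    rng = random.Random(seed)  # local generator keeps the corruption reproducible
    corrupted = []
    for frame in seq:
        new_frame = []
        for (x, y, z) in frame:
            if rng.random() < drop_prob:
                # Assumed convention: a missing joint is encoded as the origin.
                new_frame.append((0.0, 0.0, 0.0))
            else:
                new_frame.append((x + rng.gauss(0.0, noise_std),
                                  y + rng.gauss(0.0, noise_std),
                                  z + rng.gauss(0.0, noise_std)))
        corrupted.append(new_frame)
    return corrupted
```

Sweeping `noise_std` and `drop_prob` over a range of values would give a simple robustness curve for comparing models under increasingly degraded input.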

Critical Analysis

One potential limitation of this work is that the experiments only consider synthetic noise and missing data, rather than more realistic types of sensor errors or data corruption that could occur in real-world applications. Further testing on data with more naturalistic flaws would help validate the model's robustness.

Additionally, the paper does not provide much insight into the relative importance of the different auxiliary tasks or how to best balance their contributions during training. Exploring these design choices in more depth could lead to further performance improvements.

While the multi-task approach is shown to be effective, the authors do not compare it to alternative strategies for improving robustness, such as data augmentation or specialized loss functions. Examining the tradeoffs between these different techniques could provide a more comprehensive understanding of the best ways to build reliable skeleton-based activity recognition systems.

Conclusion

This paper presents a novel multi-task learning framework for improving the robustness of skeleton-based human activity recognition models to noisy or incomplete input data. By jointly training the model to perform activity classification along with auxiliary tasks like joint position prediction and identity classification, the approach learns more general and noise-tolerant features that boost performance, particularly in challenging real-world scenarios.

The results demonstrate the effectiveness of this multi-task approach compared to state-of-the-art single-task models. While there are some avenues for further exploration, this work represents an important step towards building practical human activity recognition systems that remain reliable in the face of imperfect sensor data.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Towards Physical World Backdoor Attacks against Skeleton Action Recognition

Qichen Zheng, Yi Yu, Siyuan Yang, Jun Liu, Kwok-Yan Lam, Alex Kot

Skeleton Action Recognition (SAR) has attracted significant interest for its efficient representation of the human skeletal structure. Despite its advancements, recent studies have raised security concerns in SAR models, particularly their vulnerability to adversarial attacks. However, such strategies are limited to digital scenarios and ineffective in physical attacks, limiting their real-world applicability. To investigate the vulnerabilities of SAR in the physical world, we introduce the Physical Skeleton Backdoor Attacks (PSBA), the first exploration of physical backdoor attacks against SAR. Considering the practicalities of physical execution, we introduce a novel trigger implantation method that integrates infrequent and imperceivable actions as triggers into the original skeleton data. By incorporating a minimal amount of this manipulated data into the training set, PSBA enables the system to misclassify any skeleton sequence into the target class when the trigger action is present. We examine the resilience of PSBA in both poisoned and clean-label scenarios, demonstrating its efficacy across a range of datasets, poisoning ratios, and model architectures. Additionally, we introduce a trigger-enhancing strategy to strengthen attack performance in the clean-label setting. The robustness of PSBA is tested against three distinct backdoor defenses, and the stealthiness of PSBA is evaluated using two quantitative metrics. Furthermore, by employing a Kinect V2 camera, we compile a dataset of human actions from the real world to mimic physical attack situations, with our findings confirming the effectiveness of our proposed attacks. Our project website can be found at https://qichenzheng.github.io/psba-website.

8/19/2024


Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack

Yunfeng Diao, He Wang, Tianjia Shao, Yong-Liang Yang, Kun Zhou, David Hogg, Meng Wang

Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars, where safety and lives are at stake. Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks. However, the proposed attacks require full knowledge of the attacked classifier, which is overly restrictive. In this paper, we show such threats indeed exist, even when the attacker only has access to the input/output of the model. To this end, we propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR. BASAR explores the interplay between the classification boundary and the natural motion manifold. To the best of our knowledge, this is the first time the data manifold is introduced in adversarial attacks on time series. Via BASAR, we find on-manifold adversarial samples are extremely deceitful and rather common in skeletal motions, in contrast to the common belief that adversarial samples only exist off-manifold. Through exhaustive evaluation, we show that BASAR can deliver successful attacks across classifiers, datasets, and attack modes. By attack, BASAR helps identify the potential causes of the model vulnerability and provides insights on possible improvements. Finally, to mitigate the newly identified threat, we propose a new adversarial training approach by leveraging the sophisticated distributions of on/off-manifold adversarial samples, called mixed manifold-based adversarial training (MMAT). MMAT can successfully help defend against adversarial attacks without compromising classification accuracy.

5/7/2024

TASAR: Transferable Attack on Skeletal Action Recognition

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xingxing Wei, Meng Wang, He Wang

Skeletal sequences, as well-structured representations of human behaviors, are crucial in Human Activity Recognition (HAR). The transferability of adversarial skeletal sequences enables attacks in real-world HAR scenarios, such as autonomous driving, intelligent surveillance, and human-computer interactions. However, existing Skeleton-based HAR (S-HAR) attacks exhibit weak adversarial transferability and, therefore, cannot be considered true transfer-based S-HAR attacks. More importantly, the reason for this failure remains unclear. In this paper, we study this phenomenon through the lens of loss surface, and find that its sharpness contributes to the poor transferability in S-HAR. Inspired by this observation, we assume and empirically validate that smoothening the rugged loss landscape could potentially improve adversarial transferability in S-HAR. To this end, we propose the first Transfer-based Attack on Skeletal Action Recognition, TASAR. TASAR explores the smoothed model posterior without re-training the pre-trained surrogates, which is achieved by a new post-train Dual Bayesian optimization strategy. Furthermore, unlike previous transfer-based attacks that treat each frame independently and overlook temporal coherence within sequences, TASAR incorporates motion dynamics into the Bayesian attack gradient, effectively disrupting the spatial-temporal coherence of S-HARs. To exhaustively evaluate the effectiveness of existing methods and our method, we build the first large-scale robust S-HAR benchmark, comprising 7 S-HAR models, 10 attack methods, 3 S-HAR datasets and 2 defense models. Extensive results demonstrate the superiority of TASAR. Our benchmark enables easy comparisons for future studies, with the code available in the supplementary material.

9/5/2024


Physical-aware Cross-modal Adversarial Network for Wearable Sensor-based Human Action Recognition

Jianyuan Ni, Hao Tang, Anne H. H. Ngu, Gaowen Liu, Yan Yan

Wearable sensor-based Human Action Recognition (HAR) has made significant strides in recent times. However, the accuracy of wearable sensor-based HAR still lags behind that of systems based on visual modalities, such as RGB video and depth data. Although diverse input modalities can provide complementary cues and improve the accuracy of HAR, wearable devices can only capture limited kinds of non-visual time-series input, such as accelerometer and gyroscope data. This limitation hinders the deployment of multimodal approaches that use visual and non-visual data in parallel on current wearable devices. To address this issue, we propose a novel Physical-aware Cross-modal Adversarial (PCA) framework that utilizes only time-series accelerometer data from four inertial sensors for the wearable sensor-based HAR problem. Specifically, we propose an effective IMU2SKELETON network to produce corresponding synthetic skeleton joints from accelerometer data. Subsequently, we impose additional constraints on the synthetic skeleton data from a physical perspective, as accelerometer data can be regarded as the second derivative of the skeleton sequence coordinates. After that, the original accelerometer data and the constrained skeleton sequence are fused to make the final classification. In this way, when individuals wear wearable devices, the devices can not only capture accelerometer data, but can also generate synthetic skeleton sequences for real-time wearable sensor-based HAR applications that need to be conducted anytime and anywhere. To demonstrate the effectiveness of our proposed PCA framework, we conduct extensive experiments on the Berkeley-MHAD, UTD-MHAD, and MMAct datasets. The results confirm that the proposed PCA approach has competitive performance compared to the previous methods on the mono sensor-based HAR classification problem.

5/21/2024