Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack

Read original: arXiv:2211.11312 - Published 5/7/2024 by Yunfeng Diao, He Wang, Tianjia Shao, Yong-Liang Yang, Kun Zhou, David Hogg, Meng Wang

Overview

This paper explores the robustness of skeleton-based human activity recognition (HAR) models to adversarial attacks.
The authors propose a new black-box adversarial attack approach called BASAR that exploits the interplay between the classification boundary and the natural motion manifold.
The research finds that on-manifold adversarial samples are common in skeletal motions, challenging the belief that adversarial samples only exist off-manifold.
The authors also propose a new adversarial training approach called mixed manifold-based adversarial training (MMAT) to defend against these attacks.

Plain English Explanation

The paper focuses on the security and reliability of human activity recognition (HAR) systems, which are used in safety-critical applications like self-driving cars. The systems studied here work from skeleton data, i.e. the positions of body joints tracked over time, and use it to recognize human actions and behaviors.

The researchers examine "adversarial attacks": small, often imperceptible changes to the input data that cause the system to make incorrect predictions. Previous research had already shown that such attacks are possible against skeleton-based HAR, but those attacks required the attacker to have full knowledge of the model's inner workings, which is a very strong assumption.

In this paper, the authors show that such attacks also work when the attacker only has access to the model's inputs and outputs, without knowing the model's architecture or parameters. They propose a new attack method called BASAR that exploits the relationship between the classification boundary and the natural movements that humans make.
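To make the black-box setting concrete, here is a minimal sketch of a generic decision-based building block: a binary search along the line between a correctly classified sample and a misclassified one, using only predicted labels. This is not BASAR itself. The names `QueryOnlyModel`, `bisect_to_boundary`, and `toy_classifier` are hypothetical and exist only for illustration.

```python
import math
from typing import Callable, List

Vector = List[float]


class QueryOnlyModel:
    """Hypothetical wrapper: the attacker sees predicted labels and nothing else."""

    def __init__(self, predict_fn: Callable[[Vector], int]) -> None:
        self._predict_fn = predict_fn
        self.num_queries = 0  # black-box attacks are usually judged by query count

    def predict(self, x: Vector) -> int:
        self.num_queries += 1
        return self._predict_fn(x)


def interpolate(a: Vector, b: Vector, t: float) -> Vector:
    """Point at fraction t of the way from a to b."""
    return [(1.0 - t) * ai + t * bi for ai, bi in zip(a, b)]


def l2_distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def bisect_to_boundary(model: QueryOnlyModel, x_clean: Vector,
                       x_start: Vector, tol: float = 1e-4) -> Vector:
    """Binary-search the segment x_clean -> x_start for a misclassified
    point that lies close to the decision boundary on that segment."""
    true_label = model.predict(x_clean)
    if model.predict(x_start) == true_label:
        raise ValueError("x_start must already be misclassified")
    lo, hi = 0.0, 1.0  # lo: still classified correctly, hi: misclassified
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if model.predict(interpolate(x_clean, x_start, mid)) == true_label:
            lo = mid
        else:
            hi = mid
    return interpolate(x_clean, x_start, hi)


def toy_classifier(x: Vector) -> int:
    """Stand-in for a real HAR model: class 1 if the features sum above 1."""
    return 1 if sum(x) > 1.0 else 0


model = QueryOnlyModel(toy_classifier)
x_clean = [0.0, 0.0]
x_start = [2.0, 2.0]
x_adv = bisect_to_boundary(model, x_clean, x_start)
print(toy_classifier(x_adv), l2_distance(x_clean, x_adv), model.num_queries)
```

The search never touches gradients or model weights, which is what makes it usable with input/output access only. A real attack on skeletal motion would also have to keep the perturbed motion plausible, which this sketch ignores.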

Surprisingly, the researchers found that many of these adversarial samples are not completely unnatural or "off-manifold" (i.e., outside the normal range of human motion), as previously believed. Instead, they exist on the "manifold" of natural human movements, making them even more deceptive.

To address this threat, the authors developed a new defense mechanism called mixed manifold-based adversarial training (MMAT). This technique helps the model become more robust to both on-manifold and off-manifold adversarial samples, without significantly impacting its overall accuracy.

Technical Explanation

The paper proposes a new black-box adversarial attack approach for skeleton-based human activity recognition (HAR) models, called BASAR (Black-box Attack on Skeletal Action Recognition).

BASAR exploits the interplay between the classification boundary and the natural motion manifold to generate adversarial samples. The authors hypothesize that adversarial samples do not necessarily have to be completely unnatural or "off-manifold" (i.e., outside the range of normal human motion), as commonly believed. Instead, they may exist on the "manifold" of natural human movements, making them even more deceptive.
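As a rough illustration of what staying on the motion manifold could mean, the sketch below restores the original bone lengths of a perturbed skeleton pose. It assumes that constant bone length is the only manifold constraint, which is a simplification: this summary does not describe how BASAR actually projects samples onto the manifold. The function `restore_bone_lengths` and the three-joint chain are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Joint = Tuple[float, ...]


def bone_length(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def restore_bone_lengths(perturbed: List[Joint], reference: List[Joint],
                         parents: List[int]) -> List[Joint]:
    """Rescale every bone of `perturbed` to its length in `reference`.

    parents[i] is the parent joint of joint i, or -1 for the root.
    Assumes each parent index is smaller than its child's index.
    """
    projected = list(perturbed)
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the root joint keeps its perturbed position
        target = bone_length(reference[child], reference[parent])
        # Keep the perturbed bone direction, but fix its length.
        direction = [c - p for c, p in zip(perturbed[child], perturbed[parent])]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:
            # Degenerate bone: fall back to the reference direction.
            direction = [c - p for c, p in zip(reference[child], reference[parent])]
            norm = target
        if norm == 0.0:
            projected[child] = projected[parent]
            continue
        scale = target / norm
        projected[child] = tuple(
            p + scale * d for p, d in zip(projected[parent], direction)
        )
    return projected


# Toy three-joint chain (e.g. hip -> knee -> ankle), one frame only.
reference = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]
perturbed = [(0.0, 0.0, 0.0), (0.1, 1.2, 0.0), (0.1, 2.0, 0.3)]
parents = [-1, 0, 1]

projected = restore_bone_lengths(perturbed, reference, parents)
print([round(bone_length(projected[i], projected[parents[i]]), 6) for i in (1, 2)])
```

A perturbation that changes bone lengths is an obvious off-manifold sample, since no real body moves that way. Fixing bone lengths alone does not guarantee a natural motion, because joint limits and smooth dynamics over time also matter.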

To validate this hypothesis, the researchers evaluate BASAR extensively across classifiers, datasets, and attack modes. The results show that BASAR delivers successful attacks in a black-box setting, where the attacker can only query the target model's inputs and outputs.

Furthermore, the authors propose a new adversarial training approach called mixed manifold-based adversarial training (MMAT) to defend against these attacks. MMAT leverages the sophisticated distributions of on-manifold and off-manifold adversarial samples to improve the model's robustness without compromising its classification accuracy.
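The idea of training on a mixture of sample types can be sketched as a batch builder that draws from three pools: clean samples, on-manifold adversarial samples, and off-manifold adversarial samples. This is a generic illustration of mixed-batch adversarial training, not the MMAT procedure; the summary does not say how MMAT combines the distributions. The function `build_mixed_batch` and its default ratios are assumptions.

```python
import random
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[List[float], int]  # (features, label)


def build_mixed_batch(clean: Sequence[Sample],
                      on_manifold_adv: Sequence[Sample],
                      off_manifold_adv: Sequence[Sample],
                      batch_size: int,
                      ratios: Tuple[float, float, float] = (0.5, 0.25, 0.25),
                      rng: Optional[random.Random] = None) -> List[Sample]:
    """Draw one training batch from the three pools.

    `ratios` gives the (clean, on-manifold, off-manifold) shares.
    These values are placeholders, not taken from the paper.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = rng or random.Random()
    # Round the adversarial shares down so the clean share absorbs
    # the remainder and is never negative.
    n_on = int(batch_size * ratios[1])
    n_off = int(batch_size * ratios[2])
    n_clean = batch_size - n_on - n_off
    batch = (
        [rng.choice(clean) for _ in range(n_clean)]
        + [rng.choice(on_manifold_adv) for _ in range(n_on)]
        + [rng.choice(off_manifold_adv) for _ in range(n_off)]
    )
    rng.shuffle(batch)  # avoid a fixed ordering of sample types
    return batch


# Toy pools: the single feature value marks which pool a sample came from.
clean_pool = [([0.0], 0)]
on_pool = [([1.0], 0)]
off_pool = [([2.0], 0)]

batch = build_mixed_batch(clean_pool, on_pool, off_pool,
                          batch_size=8, rng=random.Random(0))
counts = {v: sum(1 for features, _ in batch if features == [v])
          for v in (0.0, 1.0, 2.0)}
print(counts)
```

Keeping clean samples in every batch is one common way to limit the accuracy loss that adversarial training often causes. How the ratios should be set for skeletal data is left open here.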

Critical Analysis

The paper makes a significant contribution by introducing what the authors describe as the first black-box adversarial attack approach for skeleton-based HAR models and by challenging the common belief that adversarial samples only exist off-manifold. The authors evaluate BASAR across classifiers, datasets, and attack modes and report successful attacks in each setting.

However, the paper does not address the computational complexity and practical feasibility of implementing BASAR in real-world scenarios, where the attacker may have limited resources and time. Additionally, the proposed MMAT defense mechanism is evaluated only on a limited set of models and datasets, and its generalizability to other HAR systems remains to be explored.

Further research could investigate the transferability of BASAR attacks across different HAR models and the development of more efficient and scalable defense mechanisms. Additionally, it would be valuable to understand the broader implications of on-manifold adversarial samples and their potential impact on the reliability and trustworthiness of skeleton-based HAR systems in safety-critical applications.

Conclusion

This paper presents a significant advancement in the security and robustness of skeleton-based human activity recognition (HAR) systems. By proposing the BASAR attack and the MMAT defense, the authors provide new insights into the nature of adversarial samples and the vulnerabilities of these models.

The findings challenge the common assumption that adversarial samples must be completely unnatural or off-manifold, and highlight the need for more comprehensive defense strategies that can handle both on-manifold and off-manifold adversarial threats. The proposed MMAT approach offers a promising direction for improving the reliability of HAR systems, which are used in safety-critical applications like self-driving cars.

Overall, this research contributes to the ongoing efforts to ensure the trustworthiness and robustness of AI-powered systems, particularly in domains where human safety and security are at stake.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Understanding the Vulnerability of Skeleton-based Human Activity Recognition via Black-box Attack

Yunfeng Diao, He Wang, Tianjia Shao, Yong-Liang Yang, Kun Zhou, David Hogg, Meng Wang

Human Activity Recognition (HAR) has been employed in a wide range of applications, e.g. self-driving cars, where safety and lives are at stake. Recently, the robustness of skeleton-based HAR methods has been questioned due to their vulnerability to adversarial attacks. However, the proposed attacks require full knowledge of the attacked classifier, which is overly restrictive. In this paper, we show such threats indeed exist, even when the attacker only has access to the input/output of the model. To this end, we propose the very first black-box adversarial attack approach in skeleton-based HAR called BASAR. BASAR explores the interplay between the classification boundary and the natural motion manifold. To the best of our knowledge, this is the first time the data manifold is introduced in adversarial attacks on time series. Via BASAR, we find on-manifold adversarial samples are extremely deceitful and rather common in skeletal motions, in contrast to the common belief that adversarial samples only exist off-manifold. Through exhaustive evaluation, we show that BASAR can deliver successful attacks across classifiers, datasets, and attack modes. By attack, BASAR helps identify the potential causes of the model vulnerability and provides insights on possible improvements. Finally, to mitigate the newly identified threat, we propose a new adversarial training approach by leveraging the sophisticated distributions of on/off-manifold adversarial samples, called mixed manifold-based adversarial training (MMAT). MMAT can successfully help defend against adversarial attacks without compromising classification accuracy.

5/7/2024

TASAR: Transferable Attack on Skeletal Action Recognition

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Ajian Liu, Xingxing Wei, Meng Wang, He Wang

Skeletal sequences, as well-structured representations of human behaviors, are crucial in Human Activity Recognition (HAR). The transferability of adversarial skeletal sequences enables attacks in real-world HAR scenarios, such as autonomous driving, intelligent surveillance, and human-computer interactions. However, existing Skeleton-based HAR (S-HAR) attacks exhibit weak adversarial transferability and, therefore, cannot be considered true transfer-based S-HAR attacks. More importantly, the reason for this failure remains unclear. In this paper, we study this phenomenon through the lens of loss surface, and find that its sharpness contributes to the poor transferability in S-HAR. Inspired by this observation, we assume and empirically validate that smoothening the rugged loss landscape could potentially improve adversarial transferability in S-HAR. To this end, we propose the first Transfer-based Attack on Skeletal Action Recognition, TASAR. TASAR explores the smoothed model posterior without re-training the pre-trained surrogates, which is achieved by a new post-train Dual Bayesian optimization strategy. Furthermore, unlike previous transfer-based attacks that treat each frame independently and overlook temporal coherence within sequences, TASAR incorporates motion dynamics into the Bayesian attack gradient, effectively disrupting the spatial-temporal coherence of S-HARs. To exhaustively evaluate the effectiveness of existing methods and our method, we build the first large-scale robust S-HAR benchmark, comprising 7 S-HAR models, 10 attack methods, 3 S-HAR datasets and 2 defense models. Extensive results demonstrate the superiority of TASAR. Our benchmark enables easy comparisons for future studies, with the code available in the supplementary material.

9/5/2024

Towards Physical World Backdoor Attacks against Skeleton Action Recognition

Qichen Zheng, Yi Yu, Siyuan Yang, Jun Liu, Kwok-Yan Lam, Alex Kot

Skeleton Action Recognition (SAR) has attracted significant interest for its efficient representation of the human skeletal structure. Despite its advancements, recent studies have raised security concerns in SAR models, particularly their vulnerability to adversarial attacks. However, such strategies are limited to digital scenarios and ineffective in physical attacks, limiting their real-world applicability. To investigate the vulnerabilities of SAR in the physical world, we introduce the Physical Skeleton Backdoor Attacks (PSBA), the first exploration of physical backdoor attacks against SAR. Considering the practicalities of physical execution, we introduce a novel trigger implantation method that integrates infrequent and imperceivable actions as triggers into the original skeleton data. By incorporating a minimal amount of this manipulated data into the training set, PSBA causes the system to misclassify any skeleton sequence into the target class when the trigger action is present. We examine the resilience of PSBA in both poisoned and clean-label scenarios, demonstrating its efficacy across a range of datasets, poisoning ratios, and model architectures. Additionally, we introduce a trigger-enhancing strategy to strengthen attack performance in the clean-label setting. The robustness of PSBA is tested against three distinct backdoor defenses, and the stealthiness of PSBA is evaluated using two quantitative metrics. Furthermore, by employing a Kinect V2 camera, we compile a dataset of human actions from the real world to mimic physical attack situations, with our findings confirming the effectiveness of our proposed attacks. Our project website can be found at https://qichenzheng.github.io/psba-website.

8/19/2024

Boosting Adversarial Transferability for Skeleton-based Action Recognition via Exploring the Model Posterior Space

Yunfeng Diao, Baiqi Wu, Ruixuan Zhang, Xun Yang, Meng Wang, He Wang

Skeletal motion plays a pivotal role in human activity recognition (HAR). Recently, attack methods have been proposed to identify the universal vulnerability of skeleton-based HAR (S-HAR). However, research on adversarial transferability in S-HAR is largely missing. More importantly, existing attacks all struggle to transfer across unknown S-HAR models. We observed that the key reason is that the loss landscape of the action recognizers is rugged and sharp. Given the correlation between loss landscape and adversarial transferability established in prior studies [qin2022boosting, wu2020towards], we assume and empirically validate that smoothing the loss landscape could potentially improve adversarial transferability on S-HAR. This is achieved by proposing a new post-train Dual Bayesian strategy, which can effectively explore the model posterior space for a collection of surrogates without the need for re-training. Furthermore, to craft adversarial examples along the motion manifold, we incorporate the attack gradient with information of the motion dynamics in a Bayesian manner. Evaluated on benchmark datasets, e.g. HDM05 and NTU 60, the average transfer success rate can reach as high as 35.9% and 45.5% respectively. In comparison, current state-of-the-art skeletal attacks achieve only 3.6% and 9.8%. The high adversarial transferability remains consistent across various surrogate, victim, and even defense models. Through a comprehensive analysis of the results, we provide insights on what surrogates are more likely to exhibit transferability, to shed light on future research.

9/6/2024