Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches

Read original: arXiv:2407.17312 - Published 7/25/2024 by Chenxing Zhao, Yang Li, Shihao Wu, Wenyi Tan, Shuangju Zhou, Quan Pan

Overview

This paper explores a physical adversarial attack on monocular depth estimation models.
The researchers create shape-varying patches that can be printed and placed in the environment to fool the depth estimation model.
The attack is effective across multiple depth estimation models and datasets.

Plain English Explanation

The paper describes a way to trick monocular depth estimation models, which are computer vision systems that can determine the distance of objects from a single camera image. The researchers developed special physical patches - essentially stickers or objects that can be placed in the environment - that are designed to confuse the depth estimation model.

These shape-varying patches are optimized in appearance, shape, and placement so that the depth estimation model misjudges the distance of the whole object the patch is attached to, not just the area under the patch. For example, the model might think a car is much farther away than it really is.

The researchers tested this attack on several different depth estimation models and found it was effective across the board. This shows that these types of physical adversarial attacks can be a real threat to the reliability of computer vision systems, especially in safety-critical applications like self-driving cars.

Technical Explanation

The paper proposes a physical adversarial attack on monocular depth estimation models using shape-varying patches. The attack works by crafting a patch with a carefully designed shape that, when placed in the environment, causes the depth estimation model to incorrectly predict the depth of the scene.

The key innovation is a parametric representation of the patch shape, which allows the shape itself to be optimized to fool the depth model. Because the patch is rendered into the scene differentiably, the depth map the model predicts for a given patch shape can be differentiated with respect to the shape parameters, and those parameters are then optimized to maximize the depth prediction error.
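To make the idea of a parametric, optimizable patch shape concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the sigmoid-edged circular mask, the `sharpness` parameter, and the helper names `soft_circle_mask`, `apply_patch`, and `mask_area` are all assumptions chosen for illustration. The point is only that a soft mask varies smoothly with its shape parameter (here, the radius), so a quantity computed from the patched image can have a non-zero gradient with respect to the shape, which a hard 0/1 mask would not provide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_circle_mask(height, width, cx, cy, radius, sharpness=4.0):
    """Soft circular mask: close to 1 inside the circle, close to 0 outside.

    The sigmoid edge is an illustrative assumption; it makes the mask vary
    smoothly with `radius` instead of jumping between 0 and 1.
    """
    return [
        [sigmoid(sharpness * (radius - math.hypot(x - cx, y - cy)))
         for x in range(width)]
        for y in range(height)
    ]


def apply_patch(image, patch, mask):
    """Per-pixel blend: mask * patch + (1 - mask) * image."""
    return [
        [m * p + (1.0 - m) * i for i, p, m in zip(img_row, patch_row, mask_row)]
        for img_row, patch_row, mask_row in zip(image, patch, mask)
    ]


def mask_area(mask):
    """Sum of mask values, a smooth stand-in for the patch area in pixels."""
    return sum(sum(row) for row in mask)


H, W = 16, 16
image = [[0.2] * W for _ in range(H)]   # toy grayscale scene
patch = [[0.9] * W for _ in range(H)]   # toy patch content
mask = soft_circle_mask(H, W, cx=8.0, cy=8.0, radius=4.0)
attacked = apply_patch(image, patch, mask)

# Finite-difference check that the masked area responds smoothly to the radius.
eps = 1e-3
grad_radius = (
    mask_area(soft_circle_mask(H, W, 8.0, 8.0, 4.0 + eps)) - mask_area(mask)
) / eps
print("center pixel:", attacked[8][8], "corner pixel:", attacked[0][0])
print("d(area)/d(radius) is positive:", grad_radius > 0)
```

In a real attack the finite-difference step would be replaced by automatic differentiation through the depth network, and the objective would be the depth error on the target rather than the mask area used here as a stand-in.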

Experiments show the attack is effective across multiple depth estimation models, including MonoDepth2 and PackNet-SfM, as well as on different datasets like KITTI and TartanAir. The shape-varying patch can be physically printed and placed in the environment to reliably fool the depth estimation in the real world.

This work demonstrates the vulnerability of monocular depth estimation to physical adversarial attacks, which has important implications for the safety and reliability of computer vision systems in autonomous vehicles and other applications.
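The paper's abstract reports results as an average depth error on the target and the fraction of the target area that is affected. The sketch below shows one plausible way to compute such numbers from a clean and an attacked depth map. The exact definitions used in the paper are not given in this summary, so the function name `target_depth_metrics`, the use of absolute error in meters, and the `threshold_m` cutoff for counting a pixel as "affected" are assumptions.

```python
def target_depth_metrics(clean_depth, attacked_depth, target_mask, threshold_m=1.0):
    """Mean absolute depth change on the target, and the share of target
    pixels whose depth changed by more than `threshold_m` meters.

    All three inputs are 2D lists of equal size; `target_mask` is truthy
    where a pixel belongs to the target object. The threshold is a
    hypothetical choice, not a value taken from the paper.
    """
    errors = [
        abs(a - c)
        for c_row, a_row, m_row in zip(clean_depth, attacked_depth, target_mask)
        for c, a, m in zip(c_row, a_row, m_row)
        if m
    ]
    if not errors:
        raise ValueError("target mask is empty")
    mean_error = sum(errors) / len(errors)
    affected_ratio = sum(e > threshold_m for e in errors) / len(errors)
    return mean_error, affected_ratio


# Toy 2x3 depth maps in meters; the last column is background, not target.
clean = [[10.0, 10.0, 10.0],
         [10.0, 10.0, 10.0]]
attacked = [[30.0, 28.0, 10.0],
            [10.5, 25.0, 10.0]]
target = [[1, 1, 0],
          [1, 1, 0]]

mean_error, affected_ratio = target_depth_metrics(clean, attacked, target)
print(mean_error, affected_ratio)  # 13.375 0.75
```

Restricting both numbers to the target mask is what separates an attack that only corrupts depth under the patch from one that shifts the depth of the whole object.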

Critical Analysis

The paper provides a thorough technical explanation of the shape-varying patch attack and validates its effectiveness across multiple depth estimation models and datasets. However, some potential limitations and avenues for future work are not discussed:

The attack may be more difficult to scale to complex, cluttered real-world environments beyond the controlled lab settings tested.
Defenses like model-agnostic adversarial patch detection could potentially be developed to mitigate this type of attack.
The impact of the attack on downstream tasks like object detection and tracking that rely on depth estimation is not explored.

Additionally, while the technical details are sound, the paper could benefit from a deeper discussion of the broader implications and societal impact of such adversarial vulnerabilities in safety-critical computer vision systems.

Conclusion

This paper presents a novel physical adversarial attack on monocular depth estimation models using shape-varying patches. The attack is shown to be effective across multiple depth estimation models and datasets, demonstrating the vulnerability of these systems to carefully crafted physical perturbations.

The work highlights the need for robust defenses against adversarial attacks, especially in safety-critical applications like autonomous vehicles where depth estimation is a crucial component. Further research is needed to understand the broader implications and develop comprehensive mitigation strategies to ensure the reliability of computer vision systems in the real world.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Physical Adversarial Attack on Monocular Depth Estimation via Shape-Varying Patches

Chenxing Zhao, Yang Li, Shihao Wu, Wenyi Tan, Shuangju Zhou, Quan Pan

Adversarial attacks against monocular depth estimation (MDE) systems pose significant challenges, particularly in safety-critical applications such as autonomous driving. Existing patch-based adversarial attacks for MDE are confined to the vicinity of the patch, making it difficult to affect the entire target. To address this limitation, we propose a physics-based adversarial attack on monocular depth estimation, employing a framework called Attack with Shape-Varying Patches (ASP), aiming to optimize patch content, shape, and position to maximize effectiveness. We introduce various mask shapes, including quadrilateral, rectangular, and circular masks, to enhance the flexibility and efficiency of the attack. Furthermore, we propose a new loss function to extend the influence of the patch beyond the overlapping regions. Experimental results demonstrate that our attack method generates an average depth error of 18 meters on the target car with a patch area of 1/9, affecting over 98% of the target area.

7/25/2024

Adversarial Manhole: Challenging Monocular Depth Estimation and Semantic Segmentation Models with Patch Attack

Naufal Suryanto, Andro Aprila Adiputra, Ahmada Yusril Kadiptya, Yongsu Kim, Howon Kim

Monocular depth estimation (MDE) and semantic segmentation (SS) are crucial for the navigation and environmental interpretation of many autonomous driving systems. However, their vulnerability to practical adversarial attacks is a significant concern. This paper presents a novel adversarial attack using practical patches that mimic manhole covers to deceive MDE and SS models. The goal is to cause these systems to misinterpret scenes, leading to false detections of near obstacles or non-passable objects. We use Depth Planar Mapping to precisely position these patches on road surfaces, enhancing the attack's effectiveness. Our experiments show that these adversarial patches cause a 43% relative error in MDE and achieve a 96% attack success rate in SS. These patches create affected error regions over twice their size in MDE and approximately equal to their size in SS. Our studies also confirm the patch's effectiveness in physical simulations, the adaptability of the patches across different target models, and the effectiveness of our proposed modules, highlighting their practical implications.

8/28/2024

Self-supervised Adversarial Training of Monocular Depth Estimation against Physical-World Attacks

Zhiyuan Cheng, Cheng Han, James Liang, Qifan Wang, Xiangyu Zhang, Dongfang Liu

Monocular Depth Estimation (MDE) plays a vital role in applications such as autonomous driving. However, various attacks target MDE models, with physical attacks posing significant threats to system security. Traditional adversarial training methods, which require ground-truth labels, are not directly applicable to MDE models that lack ground-truth depth. Some self-supervised model hardening techniques (e.g., contrastive learning) overlook the domain knowledge of MDE, resulting in suboptimal performance. In this work, we introduce a novel self-supervised adversarial training approach for MDE models, leveraging view synthesis without the need for ground-truth depth. We enhance adversarial robustness against real-world attacks by incorporating L_0-norm-bounded perturbation during training. We evaluate our method against supervised learning-based and contrastive learning-based approaches specifically designed for MDE. Our experiments with two representative MDE networks demonstrate improved robustness against various adversarial attacks, with minimal impact on benign performance.

6/11/2024

BadPart: Unified Black-box Adversarial Patch Attacks against Pixel-wise Regression Tasks

Zhiyuan Cheng, Zhaoyi Liu, Tengda Guo, Shiwei Feng, Dongfang Liu, Mingjie Tang, Xiangyu Zhang

Pixel-wise regression tasks (e.g., monocular depth estimation (MDE) and optical flow estimation (OFE)) have been widely involved in our daily life in applications like autonomous driving, augmented reality and video composition. Although certain applications are security-critical or bear societal significance, the adversarial robustness of such models is not sufficiently studied, especially in the black-box scenario. In this work, we introduce the first unified black-box adversarial patch attack framework against pixel-wise regression tasks, aiming to identify the vulnerabilities of these models under query-based black-box attacks. We propose a novel square-based adversarial patch optimization framework and employ probabilistic square sampling and score-based gradient estimation techniques to generate the patch effectively and efficiently, overcoming the scalability problem of previous black-box patch attacks. Our attack prototype, named BadPart, is evaluated on both MDE and OFE tasks, utilizing a total of 7 models. BadPart surpasses 3 baseline methods in terms of both attack performance and efficiency. We also apply BadPart on the Google online service for portrait depth estimation, causing 43.5% relative distance error with 50K queries. State-of-the-art (SOTA) countermeasures cannot defend against our attack effectively.

5/28/2024