TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers

Read original: arXiv:2407.03946 - Published 7/8/2024 by Fatemeh Nourilenjan Nokabadi, Yann Batiste Pequignot, Jean-François Lalonde, Christian Gagné

Overview

This research paper proposes a white-box attack called TrackPGD to fool robust transformer-based object trackers.
The attack relies on the object binary mask predicted by the tracker: it adapts the SegPGD segmentation attack to compute a perturbation of the input frames that corrupts this mask and degrades the tracker's performance.
Experiments show TrackPGD can effectively attack transformer-based trackers such as MixFormerM, OSTrackSTS, and TransT-SEG on several tracking datasets.

Plain English Explanation

The paper describes a new way to trick advanced object tracking systems that use transformer models. These trackers have held up well against earlier attacks, but the researchers found a technique that gets past this robustness.

The key idea is to target the tracker's own "binary mask", the pixel-by-pixel outline it predicts around the target object. The attack adds a small, carefully computed perturbation to the video frames that corrupts this predicted mask, causing the tracker to lose track of the target object. Because the perturbation is kept small, the video is meant to still look normal to a human viewer.

By applying this "TrackPGD" attack, the researchers were able to significantly degrade the performance of several state-of-the-art transformer-based trackers. This highlights the need for even more robust defenses against adversarial attacks in computer vision systems.

Technical Explanation

The paper introduces TrackPGD, a white-box attack that uses binary masks to degrade the performance of transformer-based object trackers. The key steps are:

The attacker has access to the target tracker's model parameters and architecture (white-box setting).
They compute a perturbation that, when added to the video frames, corrupts the tracker's predicted object binary mask and causes it to lose the target object.
The perturbation is optimized with a Projected Gradient Descent (PGD) approach, adapted from the SegPGD segmentation attack, to degrade the tracker's output while keeping the change to each frame small.
Experiments show TrackPGD can effectively attack transformer-based trackers such as MixFormerM, OSTrackSTS, and TransT-SEG on several tracking datasets.
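
The PGD loop in the steps above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the "tracker" is a hypothetical per-pixel linear mask predictor (toy_mask_logits), the loss is plain binary cross-entropy against the mask, and the budget epsilon, step size alpha, and step count are assumed values chosen only for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def toy_mask_logits(x, w, b):
    # Hypothetical stand-in for a tracker's mask head: one logit per pixel.
    return w * x + b

def bce_loss_and_grad(x, mask, w, b):
    # Mean binary cross-entropy between the predicted and reference mask,
    # together with its analytic gradient with respect to the input x.
    p = sigmoid(toy_mask_logits(x, w, b))
    tiny = 1e-12
    loss = -np.mean(mask * np.log(p + tiny) + (1 - mask) * np.log(1 - p + tiny))
    grad = (p - mask) * w / x.size
    return loss, grad

def pgd_attack(x, mask, w, b, epsilon=8 / 255, alpha=2 / 255, steps=10):
    # Assumed L-infinity budget (epsilon) and step size (alpha).
    x_adv = x.copy()
    for _ in range(steps):
        _, grad = bce_loss_and_grad(x_adv, mask, w, b)
        # Gradient ascent step on the loss, using only the gradient sign.
        x_adv = x_adv + alpha * np.sign(grad)
        # Project back into the epsilon-ball around the clean frame.
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
        # Keep pixel values in the valid range [0, 1].
        x_adv = np.clip(x_adv, 0.0, 1.0)
    return x_adv
```

Calling pgd_attack(x, mask, w, b) on a frame x with values in [0, 1] returns a perturbed frame that differs from x by at most epsilon in every pixel. A real white-box attack would replace the toy predictor and its hand-written gradient with the tracker network and automatic differentiation.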

The paper demonstrates the vulnerability of even robust transformer-based trackers to this type of white-box attack using binary masks. This highlights the importance of developing even more secure defense mechanisms against adversarial attacks in computer vision.
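
The paper's abstract describes TrackPGD as adapting the SegPGD segmentation attack. As a rough illustration of that starting point, the sketch below implements the pixel weighting used by SegPGD as we understand it: at attack step t out of T, correctly predicted pixels get weight 1 - λ and wrongly predicted pixels get weight λ, with λ = (t - 1) / (2T). The function name and inputs are hypothetical, and the loss actually used by TrackPGD may differ from this.

```python
import numpy as np

def segpgd_weighted_loss(pixel_loss, pred_mask, true_mask, step, total_steps):
    # SegPGD-style schedule (assumed): lambda grows from 0 towards 1/2,
    # so early steps focus on pixels that are still predicted correctly.
    lam = (step - 1) / (2.0 * total_steps)
    correct = pred_mask == true_mask
    # Weight 1 - lambda for correct pixels, lambda for wrong pixels.
    weights = np.where(correct, 1.0 - lam, lam)
    # Average the weighted per-pixel losses over all H * W pixels.
    return float(np.sum(weights * pixel_loss) / pixel_loss.size)
```

In an attack loop, this weighted value would take the place of the plain mean loss before the gradient with respect to the input frame is computed.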

Critical Analysis

The paper evaluates the TrackPGD attack on several trackers and tracking datasets. However, several limitations are worth noting:

The attack assumes white-box access to the target tracker, which may not always be the case in real-world scenarios.
The attack is tailored to transformer-based trackers and may not generalize to other architectures.
The paper does not explore defense mechanisms that could mitigate the TrackPGD attack.

Additionally, one could question whether relying on the predicted binary mask is truly necessary, as the attacker could potentially target other tracker outputs to achieve similar results.

Overall, the paper makes a valuable contribution by exposing the vulnerability of robust transformer-based trackers to a novel white-box attack. However, further research is needed to develop more comprehensive defense strategies against a wider range of adversarial attacks.

Conclusion

The TrackPGD attack presented in this paper demonstrates the susceptibility of state-of-the-art transformer-based object trackers to carefully crafted adversarial perturbations. By targeting the binary masks these trackers predict, the attacker can significantly degrade tracking accuracy with only a small perturbation of the video frames.

This research highlights the need for continued advancements in the security and robustness of computer vision systems, especially as they become more widely deployed in real-world applications. Developing effective defense mechanisms against white-box and black-box attacks will be crucial for ensuring the reliability and trustworthiness of these technologies.



Related Papers

TrackPGD: A White-box Attack using Binary Masks against Robust Transformer Trackers

Fatemeh Nourilenjan Nokabadi, Yann Batiste Pequignot, Jean-François Lalonde, Christian Gagné

Object trackers with transformer backbones have achieved robust performance on visual object tracking datasets. However, the adversarial robustness of these trackers has not been well studied in the literature. Due to the backbone differences, the adversarial white-box attacks proposed for object tracking are not transferable to all types of trackers. For instance, transformer trackers such as MixFormerM still function well after black-box attacks, especially in predicting the object binary masks. We are proposing a novel white-box attack named TrackPGD, which relies on the predicted object binary mask to attack the robust transformer trackers. That new attack focuses on annotation masks by adapting the well-known SegPGD segmentation attack, allowing to successfully conduct the white-box attack on trackers relying on transformer backbones. The experimental results indicate that the TrackPGD is able to effectively attack transformer-based trackers such as MixFormerM, OSTrackSTS, and TransT-SEG on several tracking datasets.

7/8/2024

Reproducibility Study on Adversarial Attacks Against Robust Transformer Trackers

Fatemeh Nourilenjan Nokabadi, Jean-François Lalonde, Christian Gagné

New transformer networks have been integrated into object tracking pipelines and have demonstrated strong performance on the latest benchmarks. This paper focuses on understanding how transformer trackers behave under adversarial attacks and how different attacks perform on tracking datasets as their parameters change. We conducted a series of experiments to evaluate the effectiveness of existing adversarial attacks on object trackers with transformer and non-transformer backbones. We experimented on 7 different trackers, including 3 that are transformer-based, and 4 which leverage other architectures. These trackers are tested against 4 recent attack methods to assess their performance and robustness on VOT2022ST, UAV123 and GOT10k datasets. Our empirical study focuses on evaluating adversarial robustness of object trackers based on bounding box versus binary mask predictions, and attack methods at different levels of perturbations. Interestingly, our study found that altering the perturbation level may not significantly affect the overall object tracking results after the attack. Similarly, the sparsity and imperceptibility of the attack perturbations may remain stable against perturbation level shifts. By applying a specific attack on all transformer trackers, we show that new transformer trackers having a stronger cross-attention modeling achieve a greater adversarial robustness on tracking datasets, such as VOT2022ST and GOT10k. Our results also indicate the necessity for new attack methods to effectively tackle the latest types of transformer trackers. The codes necessary to reproduce this study are available at https://github.com/fatemehN/ReproducibilityStudy.

6/5/2024

CosPGD: an efficient white-box adversarial attack for pixel-wise prediction tasks

Shashank Agnihotri, Steffen Jung, Margret Keuper

While neural networks allow highly accurate predictions in many tasks, their lack of robustness towards even slight input perturbations often hampers their deployment. Adversarial attacks such as the seminal projected gradient descent (PGD) offer an effective means to evaluate a model's robustness and dedicated solutions have been proposed for attacks on semantic segmentation or optical flow estimation. While they attempt to increase the attack's efficiency, a further objective is to balance its effect, so that it acts on the entire image domain instead of isolated point-wise predictions. This often comes at the cost of optimization stability and thus efficiency. Here, we propose CosPGD, an attack that encourages more balanced errors over the entire image domain while increasing the attack's overall efficiency. To this end, CosPGD leverages a simple alignment score computed from any pixel-wise prediction and its target to scale the loss in a smooth and fully differentiable way. It leads to efficient evaluations of a model's robustness for semantic segmentation as well as regression models (such as optical flow, disparity estimation, or image restoration), and it allows it to outperform the previous SotA attack on semantic segmentation. We provide code for the CosPGD algorithm and example usage at https://github.com/shashankskagnihotri/cospgd.

7/9/2024

PG-Attack: A Precision-Guided Adversarial Attack Framework Against Vision Foundation Models for Autonomous Driving

Jiyuan Fu, Zhaoyu Chen, Kaixun Jiang, Haijing Guo, Shuyong Gao, Wenqiang Zhang

Vision foundation models are increasingly employed in autonomous driving systems due to their advanced capabilities. However, these models are susceptible to adversarial attacks, posing significant risks to the reliability and safety of autonomous vehicles. Adversaries can exploit these vulnerabilities to manipulate the vehicle's perception of its surroundings, leading to erroneous decisions and potentially catastrophic consequences. To address this challenge, we propose a novel Precision-Guided Adversarial Attack (PG-Attack) framework that combines two techniques: Precision Mask Perturbation Attack (PMP-Attack) and Deceptive Text Patch Attack (DTP-Attack). PMP-Attack precisely targets the attack region to minimize the overall perturbation while maximizing its impact on the target object's representation in the model's feature space. DTP-Attack introduces deceptive text patches that disrupt the model's understanding of the scene, further enhancing the attack's effectiveness. Our experiments demonstrate that PG-Attack successfully deceives a variety of advanced multi-modal large models, including GPT-4V, Qwen-VL, and imp-V1. Additionally, we won First-Place in the CVPR 2024 Workshop Challenge: Black-box Adversarial Attacks on Vision Foundation Models and codes are available at https://github.com/fuhaha824/PG-Attack.

7/19/2024