Under-confidence Backdoors Are Resilient and Stealthy Backdoors

Read original: arXiv:2202.11203 - Published 7/23/2024 by Minlong Peng, Zidi Xiong, Quang H. Nguyen, Mingming Sun, Khoa D. Doan, Ping Li

Overview

Backdoor attacks aim to make a model produce designed outputs on inputs with pre-designed backdoors.
Existing attack methods often cause the victim model to over-fit severely to the backdoors, which makes the attack effective but easier to detect.
This paper proposes a label-smoothing strategy to overcome the over-fitting problem and improve the stealthiness of these attacks.

Plain English Explanation

In machine learning, researchers sometimes try to "trick" a model by adding a small number of poisoned samples to the training data. This is called a "backdoor attack." The goal is to make the model produce a specific output whenever it sees an input with a pre-designed "backdoor."

However, the existing methods for doing this often cause the model to become too focused on the backdoor, making the attack easy to detect. In this paper, the researchers propose a new technique called "label-smoothing" to address this problem.

The key idea is to not change the label of the poisoned samples to the target class 100% of the time. Instead, they only change it with a certain probability, which is designed to make the model's prediction for the target class only slightly higher than the other classes. This makes the attack more stealthy while still maintaining a high success rate.
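The relabeling step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `smooth_poison_labels` and the single fixed `flip_prob` are assumptions made for the example, whereas the paper uses a per-sample probability p_n(x).

```python
import random


def smooth_poison_labels(labels, poisoned_idx, target_class, flip_prob, seed=0):
    """Relabel each poisoned sample to target_class with probability flip_prob.

    Hypothetical helper for illustration only. A conventional backdoor attack
    corresponds to flip_prob == 1.0 (every poisoned sample is relabeled).
    """
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    rng = random.Random(seed)      # local RNG so the result is reproducible
    new_labels = list(labels)      # leave the caller's list untouched
    for i in poisoned_idx:
        if rng.random() < flip_prob:
            new_labels[i] = target_class
        # otherwise the poisoned sample keeps its original label
    return new_labels
```

Samples that carry the trigger but keep their original label are what pull the model's confidence in the target class down, so the backdoor is learned less aggressively than with always-on relabeling.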

The researchers found that this label-smoothing strategy can significantly improve the stealthiness of existing backdoor attack methods, while still allowing the attacker to control the model's output to some degree.

Technical Explanation

The researchers propose a Label-Smoothed Backdoor Attack (LSBA) to address the over-fitting problem of existing backdoor attack methods. In LSBA, the label of the poisoned sample x is changed to the target class with a probability p_n(x) instead of 100%. The value of p_n(x) is designed to make the prediction probability for the target class only slightly greater than those of the other classes.
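To see why a relabeling probability below 100% lowers the model's confidence, consider a deliberately simplified case. Assume a triggered input x whose clean class is y, a target class t different from y, and a model that fits the expected training label distribution q exactly. These assumptions, the margin symbol ε, and the resulting formula are illustrative only; they are not the paper's definition of p_n(x).

```latex
% Simplified two-class view of one triggered input; not the paper's p_n(x).
\begin{align*}
q(t \mid x) &= p, & q(y \mid x) &= 1 - p, \\
q(t \mid x) - q(y \mid x) &= 2p - 1 = \varepsilon
\quad\Longrightarrow\quad p = \frac{1 + \varepsilon}{2}.
\end{align*}
```

Under these assumptions, a margin of ε = 0.1 corresponds to p = 0.55, while always relabeling (p = 1) gives the maximal margin of 1, which matches the over-confident behavior the paper attributes to conventional attacks.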

The researchers show through empirical studies on several existing backdoor attacks that their label-smoothing strategy can significantly improve the stealthiness of these attacks while still achieving a high attack success rate. Additionally, the strategy allows the attacker to manually control the prediction probability of the desired output by manipulating the number of LSBAs applied and activated.
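The two quantities discussed here can be measured from a model's outputs on triggered inputs. The sketch below uses the common definition of attack success rate (the fraction of triggered inputs from non-target classes that are classified as the target) and reports mean target-class confidence as a rough indicator of how strongly the model has fit the backdoor. The function name `attack_stats` is hypothetical, and the confidence figure is an illustrative proxy, not the stealthiness measure used in the paper.

```python
def attack_stats(probs, true_labels, target_class):
    """Return (attack success rate, mean target-class confidence).

    probs: per-class probability lists for inputs that carry the trigger.
    Inputs whose true class already equals target_class are skipped, since
    predicting the target for them says nothing about the backdoor.
    """
    kept = [p for p, y in zip(probs, true_labels) if y != target_class]
    if not kept:
        raise ValueError("no triggered inputs from non-target classes")
    # An input counts as a hit when the target class has the highest probability.
    hits = sum(
        1 for p in kept
        if max(range(len(p)), key=p.__getitem__) == target_class
    )
    asr = hits / len(kept)
    mean_conf = sum(p[target_class] for p in kept) / len(kept)
    return asr, mean_conf
```

A conventional backdoor would typically show both values close to 1, whereas the behavior described for LSBA is a high success rate paired with a target-class confidence only slightly above that of the other classes.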

Critical Analysis

Note that LSBA is an attack, not a defense: it does not reduce the risk posed by backdoors but shows that they can be made harder to detect. The model is still biased toward the backdoor, only less strongly. The strategy also depends on choosing p_n(x) carefully to balance stealthiness against attack success rate.

Further research could explore ways to more robustly defend against backdoor attacks, such as by developing proactive defense mechanisms or clean-label backdoor mitigation techniques. It would also be interesting to investigate the potential for adversarial trigger reverse engineering to detect and remove backdoors in a more general way.

Conclusion

This paper presents a novel label-smoothing strategy to improve the stealthiness of backdoor attacks while maintaining a high attack success rate. By carefully controlling the probability of changing the label of poisoned samples, the researchers show that they can make these attacks harder to detect without sacrificing their effectiveness.

Because this technique makes backdoors harder to detect rather than preventing them, it highlights the importance of considering model robustness and security during the training process. As machine learning models become more widely deployed, defending against such attacks will be a crucial challenge for the field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Under-confidence Backdoors Are Resilient and Stealthy Backdoors

Minlong Peng, Zidi Xiong, Quang H. Nguyen, Mingming Sun, Khoa D. Doan, Ping Li

By injecting a small number of poisoned samples into the training set, backdoor attacks aim to make the victim model produce designed outputs on any input injected with pre-designed backdoors. In order to achieve a high attack success rate using as few poisoned training samples as possible, most existing attack methods change the labels of the poisoned samples to the target class. This practice often results in severe over-fitting of the victim model to the backdoors, making the attack quite effective in output control but easier to identify by human inspection or automatic defense algorithms. In this work, we proposed a label-smoothing strategy to overcome the over-fitting problem of these attack methods, obtaining a Label-Smoothed Backdoor Attack (LSBA). In the LSBA, the label of the poisoned sample x will be changed to the target class with a probability of p_n(x) instead of 100%, and the value of p_n(x) is specifically designed to make the prediction probability of the target class only slightly greater than those of the other classes. Empirical studies on several existing backdoor attacks show that our strategy can considerably improve the stealthiness of these attacks and, at the same time, achieve a high attack success rate. In addition, our strategy makes it possible to manually control the prediction probability of the designed output by manipulating the number of applied and activated LSBAs. Source code will be published at https://github.com/v-mipeng/LabelSmoothedAttack.git.

7/23/2024

Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks

Quang H. Nguyen, Nguyen Ngoc-Hieu, The-Anh Ta, Thanh Nguyen-Tang, Kok-Seng Wong, Hoang Thanh-Tung, Khoa D. Doan

Deep neural networks are vulnerable to backdoor attacks, a type of adversarial attack that poisons the training data to manipulate the behavior of models trained on such data. Clean-label attacks are a more stealthy form of backdoor attacks that can perform the attack without changing the labels of poisoned data. Early works on clean-label attacks added triggers to a random subset of the training set, ignoring the fact that samples contribute unequally to the attack's success. This results in high poisoning rates and low attack success rates. To alleviate the problem, several supervised learning-based sample selection strategies have been proposed. However, these methods assume access to the entire labeled training set and require training, which is expensive and may not always be practical. This work studies a new and more practical (but also more challenging) threat model where the attacker only provides data for the target class (e.g., in face recognition systems) and has no knowledge of the victim model or any other classes in the training set. We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate in this setting. Our threat model poses a serious threat in training machine learning models with third-party datasets, since the attack can be performed effectively with limited information. Experiments on benchmark datasets illustrate the effectiveness of our strategies in improving clean-label backdoor attacks.

7/17/2024

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Shaokui Wei, Hongyuan Zha, Baoyuan Wu

Data-poisoning backdoor attacks are serious security threats to machine learning models, where an adversary can manipulate the training dataset to inject backdoors into models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. Unlike most existing methods that primarily detect and remove/unlearn suspicious samples to mitigate malicious backdoor attacks, we propose a novel defense approach called PDB (Proactive Defensive Backdoor). Specifically, PDB leverages the home field advantage of defenders by proactively injecting a defensive backdoor into the model during training. Taking advantage of controlling the training process, the defensive backdoor is designed to suppress the malicious backdoor effectively while remaining secret to attackers. In addition, we introduce a reversible mapping to determine the defensive target label. During inference, PDB embeds a defensive trigger in the inputs and reverses the model's prediction, suppressing malicious backdoor and ensuring the model's utility on the original task. Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks.

5/28/2024

Model-agnostic clean-label backdoor mitigation in cybersecurity environments

Giorgio Severi, Simona Boboila, John Holodnak, Kendra Kratkiewicz, Rauf Izmailov, Alina Oprea

The training phase of machine learning models is a delicate step, especially in cybersecurity contexts. Recent research has surfaced a series of insidious training-time attacks that inject backdoors in models designed for security classification tasks without altering the training labels. With this work, we propose new techniques that leverage insights in cybersecurity threat models to effectively mitigate these clean-label poisoning attacks, while preserving the model utility. By performing density-based clustering on a carefully chosen feature subspace, and progressively isolating the suspicious clusters through a novel iterative scoring procedure, our defensive mechanism can mitigate the attacks without requiring many of the common assumptions in the existing backdoor defense literature. To show the generality of our proposed mitigation, we evaluate it on two clean-label model-agnostic attacks on two different classic cybersecurity data modalities: network flows classification and malware classification, using gradient boosting and neural network models.

7/12/2024