BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting

Read original: arXiv:2312.04902 - Published 4/26/2024 by Huming Qiu, Junjie Sun, Mi Zhang, Xudong Pan, Min Yang

Overview

This paper introduces BELT (Backdoor Exclusivity LifTing), a technique that helps existing backdoor attacks evade state-of-the-art defenses.
Backdoor attacks are a machine learning vulnerability in which malicious functionality is embedded in a model, typically through poisoned training data, so that an attacker can trigger incorrect classifications at inference time.
The authors show that old-school attacks such as BadNet, which modern defenses would otherwise detect, evade most of those defenses once enhanced with BELT.

Plain English Explanation

The paper presents BELT, a technique that lets backdoor attacks slip past current defenses. A backdoor is planted when a model is trained on maliciously crafted data. The model behaves normally on ordinary inputs, but an input carrying the attacker's trigger, such as a small pattern stamped on an image, is misclassified.

Previous research proposed defenses that were thought to stop these attacks. BELT shows that attackers can still sneak backdoors past them: with BELT applied, older and simpler backdoor techniques, which state-of-the-art defenses were believed to handle, go undetected.

The key idea behind BELT is to "lift" (raise) the exclusivity of the backdoor, so that it responds to the attacker's original trigger but not to approximate, "fuzzy" versions of it. Old-school triggers are learned so strongly that near-copies can also activate the backdoor, and defenses exploit this by searching for such near-copies. Once the backdoor no longer reacts to them, it becomes much harder to detect and remove, while the original trigger still works.

Technical Explanation

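The paper proposes a technique called BELT (Backdoor Exclusivity LifTing). It builds on a new characteristic the authors call backdoor exclusivity, which the abstract describes as measuring the ability of backdoor triggers to remain effective in the presence of input variation. BELT enhances exclusivity by suppressing the association between the backdoor and fuzzy triggers, which is what allows it to evade state-of-the-art defenses.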

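Old-school backdoor attacks such as BadNet use strong trigger features that victim models learn easily. For example, an attacker might train a model to classify any image carrying a particular pattern as a chosen target class. The resulting backdoor is robust to input variation, but that robustness also makes unintentional activations more likely. Existing defenses exploit these traces: through reverse engineering (as in Neural Cleanse) or sample overlay, they find approximate replacements that activate the backdoor without being identical to the original trigger.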
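
To make the old-school setting concrete, here is a minimal sketch of patch-style data poisoning in the spirit of BadNet. It is an illustration under assumptions, not the paper's code: images are plain nested lists, and the helper names (`stamp_trigger`, `poison_dataset`), the patch size, and the poison rate are all hypothetical choices.

```python
import random


def stamp_trigger(image, patch_value=1.0, patch_size=2):
    """Return a copy of `image` (a list of rows) with a square patch
    written into its bottom-right corner. The patch is the trigger."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - patch_size, height):
        for c in range(width - patch_size, width):
            stamped[r][c] = patch_value
    return stamped


def poison_dataset(samples, target_label, poison_rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of (image, label) pairs
    and relabel those pairs to the attacker's target class."""
    rng = random.Random(seed)
    n_poison = int(len(samples) * poison_rate)
    poison_idx = set(rng.sample(range(len(samples)), n_poison))
    poisoned = []
    for i, (image, label) in enumerate(samples):
        if i in poison_idx:
            poisoned.append((stamp_trigger(image), target_label))
        else:
            poisoned.append((image, label))
    return poisoned


# Toy data: twenty blank 4x4 "images" with labels 0, 1, 2 (assumed for the demo).
clean = [([[0.0] * 4 for _ in range(4)], i % 3) for i in range(20)]
poisoned = poison_dataset(clean, target_label=7, poison_rate=0.2)
num_relabelled = sum(1 for _, label in poisoned if label == 7)
print(num_relabelled, "of", len(poisoned), "samples carry the trigger")
```

A model trained on `poisoned` would learn to associate the corner patch with the target label. Because such a patch is a strong, easily learned feature, variations of it tend to activate the backdoor as well.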

BELT suppresses the association between the backdoor and these fuzzy triggers, so that approximate replacements no longer activate it. This makes the backdoor harder to detect and remove, because the defenses depend on finding such replacements.
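
The paper's abstract describes BELT as suppressing the association between the backdoor and fuzzy triggers. One way this idea could be sketched at the data level is shown below: samples with the full trigger get the target label, while "cover" samples with a partially masked trigger keep their true label. This is an assumption made for illustration and should not be read as the authors' exact procedure; the function names, the random pixel masking, and the `keep_fraction` parameter are hypothetical.

```python
import random


def trigger_coords(height, width, patch_size=3):
    """Pixel coordinates of a square trigger patch in the bottom-right corner."""
    return [(r, c)
            for r in range(height - patch_size, height)
            for c in range(width - patch_size, width)]


def stamp(image, coords, value=1.0):
    """Return a copy of `image` with `value` written at each coordinate."""
    stamped = [row[:] for row in image]
    for r, c in coords:
        stamped[r][c] = value
    return stamped


def fuzzy_coords(coords, keep_fraction, rng):
    """Keep only a random part of the trigger pixels (a 'fuzzy' trigger)."""
    n_keep = max(1, int(len(coords) * keep_fraction))
    return rng.sample(coords, n_keep)


def build_training_set(samples, target_label, n_poison, n_cover,
                       keep_fraction=0.5, seed=0):
    """Full trigger -> target label; fuzzy trigger -> original label."""
    rng = random.Random(seed)
    height, width = len(samples[0][0]), len(samples[0][0][0])
    full = trigger_coords(height, width)
    order = list(range(len(samples)))
    rng.shuffle(order)
    poison_idx = set(order[:n_poison])
    cover_idx = set(order[n_poison:n_poison + n_cover])
    out = []
    for i, (image, label) in enumerate(samples):
        if i in poison_idx:
            out.append((stamp(image, full), target_label, "poison"))
        elif i in cover_idx:
            partial = fuzzy_coords(full, keep_fraction, rng)
            out.append((stamp(image, partial), label, "cover"))
        else:
            out.append((image, label, "clean"))
    return out


# Toy data: thirty blank 6x6 "images" with labels 0, 1, 2 (assumed for the demo).
samples = [([[0.0] * 6 for _ in range(6)], i % 3) for i in range(30)]
train = build_training_set(samples, target_label=9, n_poison=3, n_cover=3)
for role in ("poison", "cover", "clean"):
    print(role, sum(1 for _, _, r in train if r == role))
```

The cover samples teach the model that an incomplete trigger is not a reason to predict the target class, which is the intuition behind making approximate triggers ineffective.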

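The paper evaluates BELT on three popular backdoor benchmarks, applying it to four old-school backdoor attacks and testing against seven state-of-the-art countermeasures. After exclusivity lifting, the attacks evade these defenses at almost no cost to attack success rate or normal utility. For example, BadNet enhanced by BELT evades most of the defenses, including ABS and MOTH, which would otherwise recognize the backdoored model. The authors also analyze the technique and its properties.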
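
The two quantities mentioned in the evaluation, attack success rate and normal utility (clean accuracy), can be computed as in the sketch below. The two toy "models" are hand-written stand-ins that differ only in how much trigger strength they require; they are assumptions for illustration and do not reproduce any result from the paper.

```python
def clean_accuracy(predict, samples):
    """Fraction of untouched samples that the model classifies correctly."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)


def attack_success_rate(predict, samples, add_trigger, target_label):
    """Fraction of non-target samples sent to the target class once triggered."""
    victims = [x for x, y in samples if y != target_label]
    hits = sum(1 for x in victims if predict(add_trigger(x)) == target_label)
    return hits / len(victims)


def make_toy_model(threshold, target_label=1):
    """Toy backdoored classifier over inputs [feature, trigger_strength].
    The backdoor fires only when trigger_strength reaches `threshold`."""
    def predict(x):
        feature, trigger_strength = x
        if trigger_strength >= threshold:
            return target_label
        return 0 if feature < 0 else 1
    return predict


def exact_trigger(x):
    return [x[0], 1.0]  # the attacker's original trigger


def fuzzy_trigger(x):
    return [x[0], 0.6]  # an approximate replacement for the trigger


samples = [([-1.0, 0.0], 0), ([-0.5, 0.0], 0), ([0.5, 0.0], 1), ([1.0, 0.0], 1)]
models = {
    "low exclusivity": make_toy_model(threshold=0.5),
    "high exclusivity": make_toy_model(threshold=1.0),
}
results = {}
for name, predict in models.items():
    results[name] = (
        clean_accuracy(predict, samples),
        attack_success_rate(predict, samples, exact_trigger, target_label=1),
        attack_success_rate(predict, samples, fuzzy_trigger, target_label=1),
    )
    print(name, results[name])
```

In this toy setup both models have the same clean accuracy and the same success rate for the exact trigger, but only the low-exclusivity model reacts to the fuzzy trigger. That reaction is the kind of trace the abstract says existing defenses rely on.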

Critical Analysis

BELT is a notable advance in backdoor attack methods: it lets even one of the earliest attacks, BadNet, bypass most of the state-of-the-art defenses it was tested against. This is concerning, as it suggests that even sophisticated defenses may not be sufficient to protect against these attacks.

One limitation of the paper is that it does not explore potential countermeasures or defenses against the BELT approach. The authors acknowledge that developing effective defenses against BELT-style attacks is an important area for future research.

Additionally, the paper focuses on a specific type of backdoor attack and does not address other emerging threats, such as targeted data poisoning or horizontal class backdoors. It would be valuable for the research community to take a more comprehensive view of the various backdoor attack vectors and explore holistic defense strategies.

Conclusion

BELT shows that state-of-the-art defenses previously thought to be effective can be evaded by old-school attacks once backdoor exclusivity is enhanced. This highlights the ongoing arms race between attackers and defenders in machine learning security.

While the paper provides a detailed technical explanation of the BELT approach, further research is needed to develop robust defenses that can withstand a broader range of backdoor attacks and poisoning threats. Maintaining the security of machine learning systems is a crucial challenge that requires continued attention from the research community.

Related Papers

BELT: Old-School Backdoor Attacks can Evade the State-of-the-Art Defense with Backdoor Exclusivity Lifting

Huming Qiu, Junjie Sun, Mi Zhang, Xudong Pan, Min Yang

Deep neural networks (DNNs) are susceptible to backdoor attacks, where malicious functionality is embedded to allow attackers to trigger incorrect classifications. Old-school backdoor attacks use strong trigger features that can easily be learned by victim models. Although these triggers are robust against input variation, this robustness increases the likelihood of unintentional trigger activations. This leaves traces to existing defenses, which find approximate replacements for the original triggers that can activate the backdoor without being identical to the original trigger via, e.g., reverse engineering and sample overlay. In this paper, we propose and investigate a new characteristic of backdoor attacks, namely, backdoor exclusivity, which measures the ability of backdoor triggers to remain effective in the presence of input variation. Building upon the concept of backdoor exclusivity, we propose Backdoor Exclusivity LifTing (BELT), a novel technique which suppresses the association between the backdoor and fuzzy triggers to enhance backdoor exclusivity for defense evasion. Extensive evaluation on three popular backdoor benchmarks validates that our approach substantially enhances the stealthiness of four old-school backdoor attacks, which, after backdoor exclusivity lifting, are able to evade seven state-of-the-art backdoor countermeasures, at almost no cost of the attack success rate and normal utility. For example, one of the earliest backdoor attacks, BadNet, enhanced by BELT, evades most of the state-of-the-art defenses including ABS and MOTH which would otherwise recognize the backdoored model.

4/26/2024

Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack

Mingli Zhu, Siyuan Liang, Baoyuan Wu

Deep neural networks face persistent challenges in defending against backdoor attacks, leading to an ongoing battle between attacks and defenses. While existing backdoor defense strategies have shown promising performance on reducing attack success rates, can we confidently claim that the backdoor threat has truly been eliminated from the model? To address it, we re-investigate the characteristics of the backdoored models after defense (denoted as defense models). Surprisingly, we find that the original backdoors still exist in defense models derived from existing post-training defense strategies, and the backdoor existence is measured by a novel metric called backdoor existence coefficient. It implies that the backdoors just lie dormant rather than being eliminated. To further verify this finding, we empirically show that these dormant backdoors can be easily re-activated during inference, by manipulating the original trigger with well-designed tiny perturbation using universal adversarial attack. More practically, we extend our backdoor reactivation to black-box scenario, where the defense model can only be queried by the adversary during inference, and develop two effective methods, i.e., query-based and transfer-based backdoor re-activation attacks. The effectiveness of the proposed methods is verified on both image classification and multimodal contrastive learning (i.e., CLIP) tasks. In conclusion, this work uncovers a critical vulnerability that has never been explored in existing defense strategies, emphasizing the urgency of designing more robust and advanced backdoor defense mechanisms in the future.

5/31/2024

Exploiting the Vulnerability of Large Language Models via Defense-Aware Architectural Backdoor

Abdullah Arafat Miah, Yu Bi

Deep neural networks (DNNs) have long been recognized as vulnerable to backdoor attacks. By providing poisoned training data in the fine-tuning process, the attacker can implant a backdoor into the victim model. This enables input samples meeting specific textual trigger patterns to be classified as target labels of the attacker's choice. While such black-box attacks have been well explored in both computer vision and natural language processing (NLP), backdoor attacks relying on white-box attack philosophy have hardly been thoroughly investigated. In this paper, we take the first step to introduce a new type of backdoor attack that conceals itself within the underlying model architecture. Specifically, we propose to design separate backdoor modules consisting of two functions: trigger detection and noise injection. The add-on modules of model architecture layers can detect the presence of input trigger tokens and modify layer weights using Gaussian noise to disturb the feature distribution of the baseline model. We conduct extensive experiments to evaluate our attack methods using two model architecture settings on five different large language datasets. We demonstrate that the training-free architectural backdoor on a large language model poses a genuine threat. Unlike state-of-the-art work, it can survive the rigorous fine-tuning and retraining process, as well as evade output probability-based defense methods (i.e. BDDR). All the code and data are available at https://github.com/SiSL-URI/Arch_Backdoor_LLM.

9/10/2024

Rethinking Backdoor Detection Evaluation for Language Models

Jun Yan, Wenjie Jacky Mo, Xiang Ren, Robin Jia

Backdoor attacks, in which a model behaves maliciously when given an attacker-specified trigger, pose a major security risk for practitioners who depend on publicly released language models. Backdoor detection methods aim to detect whether a released model contains a backdoor, so that practitioners can avoid such vulnerabilities. While existing backdoor detection methods have high accuracy in detecting backdoored models on standard benchmarks, it is unclear whether they can robustly identify backdoors in the wild. In this paper, we examine the robustness of backdoor detectors by manipulating different factors during backdoor planting. We find that the success of existing methods highly depends on how intensely the model is trained on poisoned data during backdoor planting. Specifically, backdoors planted with either more aggressive or more conservative training are significantly more difficult to detect than the default ones. Our results highlight a lack of robustness of existing backdoor detectors and the limitations in current benchmark construction.

9/4/2024