SEEP: Training Dynamics Grounds Latent Representation Search for Mitigating Backdoor Poisoning Attacks

2405.11575

Published 5/21/2024 by Xuanli He, Qiongkai Xu, Jun Wang, Benjamin I. P. Rubinstein, Trevor Cohn

Abstract

Modern NLP models are often trained on public datasets drawn from diverse sources, rendering them vulnerable to data poisoning attacks. These attacks can manipulate the model's behavior in ways engineered by the attacker. One such tactic involves the implantation of backdoors, achieved by poisoning specific training instances with a textual trigger and a target class label. Several strategies have been proposed to mitigate the risks associated with backdoor attacks by identifying and removing suspected poisoned examples. However, we observe that these strategies fail to offer effective protection against several advanced backdoor attacks. To remedy this deficiency, we propose a novel defensive mechanism that first exploits training dynamics to identify poisoned samples with high precision, followed by a label propagation step to improve recall and thus remove the majority of poisoned instances. Compared with recent advanced defense methods, our method considerably reduces the success rates of several backdoor attacks while maintaining high classification accuracy on clean test sets.
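The two-stage idea in the abstract (a high-precision seed set from training dynamics, then label propagation to raise recall) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that poisoned samples are fit unusually early, so the samples with the highest mean confidence across epochs serve as seeds, and it stands in a simple k-nearest-neighbour vote over latent features for the label propagation step. The function names, the confidence matrix layout, and the parameters n_seeds, k, and min_fraction are all hypothetical.

```python
import numpy as np


def select_seeds(confidences, n_seeds):
    """Pick seed suspects from training dynamics.

    confidences: array of shape (n_epochs, n_samples) holding the probability
    the model assigned to each sample's training label at each epoch.
    Assumption (not from the abstract): poisoned samples are learned early,
    so they have the highest mean confidence across epochs.
    """
    mean_conf = confidences.mean(axis=0)
    # Indices of the n_seeds samples with the highest mean confidence.
    return np.argsort(-mean_conf)[:n_seeds]


def propagate_flags(features, seeds, k=5, min_fraction=0.5, max_rounds=10):
    """Spread the 'suspected poisoned' flag through latent space.

    A sample becomes flagged when at least min_fraction of its k nearest
    neighbours (Euclidean distance over the latent features) are flagged.
    Rounds repeat until nothing changes or max_rounds is reached.
    """
    n_samples = features.shape[0]
    flagged = np.zeros(n_samples, dtype=bool)
    flagged[seeds] = True

    # Dense pairwise squared distances; fine for a sketch, O(n^2) memory.
    diff = features[:, None, :] - features[None, :, :]
    dists = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dists, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]

    for _ in range(max_rounds):
        votes = flagged[neighbours].mean(axis=1)
        updated = flagged | (votes >= min_fraction)
        if np.array_equal(updated, flagged):
            break
        flagged = updated
    return flagged
```

In this sketch, the samples flagged by propagate_flags would be dropped before retraining. The seed count trades precision for coverage: a small seed set keeps false positives low, and the propagation step is what recovers the remaining suspects.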

Overview

This document provides formatting instructions for submissions to the Transactions of the Association for Computational Linguistics (TACL) journal.
It covers common violations of submission rules that can result in desk rejections, as well as general guidelines for formatting the paper, including sections on the title, author information, abstract, body, references, and supplementary material.
The instructions are based on the tacl2021v1-template.tex and tacl2021v1.sty files, dated December 15, 2021.

Plain English Explanation

This paper outlines the formatting requirements and submission guidelines for publishing research papers in the Transactions of the Association for Computational Linguistics (TACL) journal. It's essentially a set of instructions that authors must follow when preparing their manuscripts for submission.

The key points covered include:

Avoiding common mistakes that can lead to an outright rejection of the paper, even before it goes through the full review process. These "desk rejections" are often due to violations of basic submission rules.
General formatting requirements, such as the structure of the paper (title, author information, abstract, main body, references, etc.), as well as specific formatting rules for things like citations, equations, figures, and tables.

The goal is to ensure a consistent and professional look and feel for all TACL publications, making the review and publication process as smooth as possible for both authors and editors.

Technical Explanation

The document begins with a set of "courtesy warnings": common mistakes that have led to desk rejections of past TACL submissions. These include exceeding the page limit, failing to properly anonymize the paper, and including author information in the wrong places.

The main body of the instructions covers the general formatting guidelines. This includes details on the required sections of the paper (title, author information, abstract, body, references, etc.), as well as specific formatting rules for elements like citations, equations, figures, and tables. The instructions also provide guidance on preparing supplementary material and handling revisions.

Throughout the document, the authors reference the tacl2021v1-template.tex and tacl2021v1.sty files, which contain the LaTeX code and style definitions needed to properly format a TACL submission.

Critical Analysis

The formatting instructions provided in this document are comprehensive and clearly aimed at ensuring a consistent, high-quality presentation of TACL publications. The emphasis on avoiding common mistakes that can lead to desk rejections is particularly valuable, as it helps authors understand the key requirements upfront and avoid wasting time on submissions that are likely to be rejected.

That said, the level of detail and technical nature of the instructions may be overwhelming for some authors, especially those who are new to academic publishing or not familiar with LaTeX. The editors may want to consider providing additional resources or support for authors who are unfamiliar with the required formatting.

Additionally, while the instructions cover a wide range of formatting rules, there may be edge cases or unique situations that are not fully addressed. It would be helpful if the editors were open to providing clarification or exceptions on a case-by-case basis, rather than strictly enforcing the guidelines without any flexibility.

Conclusion

The formatting instructions for TACL submissions provide a clear and comprehensive set of guidelines for authors to follow when preparing their manuscripts. By outlining common mistakes that can lead to desk rejections and providing detailed formatting requirements, the editors are able to maintain a consistent look and feel across all TACL publications.

While the technical nature of the instructions may be challenging for some authors, the overall goal of streamlining the submission and review process is a valuable one. As the field of computational linguistics continues to evolve, it will be important for TACL to regularly review and update these guidelines to ensure they remain relevant and effective.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor

Shaokui Wei, Hongyuan Zha, Baoyuan Wu

Data-poisoning backdoor attacks are serious security threats to machine learning models, where an adversary can manipulate the training dataset to inject backdoors into models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be potentially poisoned. Unlike most existing methods that primarily detect and remove/unlearn suspicious samples to mitigate malicious backdoor attacks, we propose a novel defense approach called PDB (Proactive Defensive Backdoor). Specifically, PDB leverages the home field advantage of defenders by proactively injecting a defensive backdoor into the model during training. Taking advantage of controlling the training process, the defensive backdoor is designed to suppress the malicious backdoor effectively while remaining secret to attackers. In addition, we introduce a reversible mapping to determine the defensive target label. During inference, PDB embeds a defensive trigger in the inputs and reverses the model's prediction, suppressing malicious backdoor and ensuring the model's utility on the original task. Experimental results across various datasets and models demonstrate that our approach achieves state-of-the-art defense performance against a wide range of backdoor attacks.
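The inference step described above (embed a defensive trigger, then reverse the model's prediction through a reversible mapping) can be illustrated with a small sketch. This is not the PDB authors' code. The cyclic shift used as the reversible mapping, and the names make_reversible_mapping, pdb_predict, and add_trigger, are assumptions chosen for illustration; the abstract only states that the mapping is reversible.

```python
def make_reversible_mapping(num_classes, shift=1):
    """Return a label mapping and its inverse.

    Assumption: a cyclic shift over class indices, chosen only because it is
    trivially invertible; the abstract does not specify the mapping.
    """
    def forward(label):
        return (label + shift) % num_classes

    def inverse(label):
        return (label - shift) % num_classes

    return forward, inverse


def pdb_predict(model, x, add_trigger, inverse):
    """Predict with the defensive trigger applied, then undo the mapping.

    model:       callable returning a class index for an input
    add_trigger: callable that embeds the defensive trigger in the input
    inverse:     inverse of the mapping used for the defensive target label
    """
    mapped_label = model(add_trigger(x))
    return inverse(mapped_label)
```

Here a model trained with the defensive backdoor is expected to output the mapped label whenever the defensive trigger is present, so applying the inverse recovers the label for the original task.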

5/28/2024

cs.CR cs.CV

Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. DNNs trained on a poisoned dataset are embedded with a backdoor, so they behave well on clean data while outputting malicious predictions whenever a trigger is applied. To exploit the abundant information contained in the input-to-label mapping, our scheme uses a network trained on the clean dataset as a trigger generator to produce poisons that significantly raise the success rate of backdoor attacks compared with conventional approaches. Specifically, we provide a new categorization of triggers inspired by adversarial techniques and develop a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT), which effectively moves the input closer to the target label on benign classifiers. After the classifier is trained on the poisoned dataset, we can generate an input-label-aware trigger to make the infected classifier predict any target label for any given input with high probability. Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets, including SVHN, CIFAR10, GTSRB, and Tiny ImageNet. Furthermore, the PPT attack can elude a variety of classical backdoor defenses, demonstrating its effectiveness.

5/10/2024

cs.CV cs.CR

Partial train and isolate, mitigate backdoor attack

Yong Li, Han Gao

Neural networks are widely known to be vulnerable to backdoor attacks, which poison a portion of the training data so that the target model performs well on normal data while outputting attacker-specified or random categories on the poisoned samples. Backdoor attacks pose a serious threat. Poisoned samples are becoming increasingly similar to their normal counterparts, to the point that even the human eye cannot easily distinguish them. At the same time, the accuracy of backdoored models on normal samples is no different from that of clean models. In this article, by observing the characteristics of backdoor attacks, we provide a new model training method (PT) that freezes part of the model to train a model that can isolate suspicious samples. Then, on this basis, a clean model is fine-tuned to resist backdoor attacks.

6/7/2024

cs.CV

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

Ziqiang Li, Hong Sun, Pengfei Xia, Heng Li, Beihao Xia, Yi Wu, Bin Li

Recent deep neural networks (DNNs) have come to rely on vast amounts of training data, providing an opportunity for malicious attackers to exploit and contaminate the data to carry out backdoor attacks. However, existing backdoor attack methods make unrealistic assumptions, assuming that all training data comes from a single source and that attackers have full access to the training data. In this paper, we introduce a more realistic attack scenario where victims collect data from multiple sources, and attackers cannot access the complete training data. We refer to this scenario as data-constrained backdoor attacks. In such cases, previous attack methods suffer from severe efficiency degradation due to the entanglement between benign and poisoning features during the backdoor injection process. To tackle this problem, we introduce three CLIP-based technologies from two distinct streams, Clean Feature Suppression and Poisoning Feature Augmentation, as an effective solution for data-constrained backdoor attacks. The results demonstrate remarkable improvements, with some settings achieving over 100% improvement compared to existing attacks in data-constrained scenarios. Code is available at https://github.com/sunh1113/Efficient-backdoor-attacks-for-deep-neural-networks-in-real-world-scenarios

4/22/2024

cs.CR cs.CV