Poisoning Web-Scale Training Datasets is Practical

2302.10149

171

Published 5/7/2024 by Nicholas Carlini, Matthew Jagielski, Christopher A. Choquette-Choo, Daniel Paleka, Will Pearce, Hyrum Anderson, Andreas Terzis, Kurt Thomas, Florian Tram`er

cs.CR cs.LG

🏋️

Abstract

Deep learning models are often trained on distributed, web-scale datasets crawled from the internet. In this paper, we introduce two new dataset poisoning attacks that intentionally introduce malicious examples to a model's performance. Our attacks are immediately practical and could, today, poison 10 popular datasets. Our first attack, split-view poisoning, exploits the mutable nature of internet content to ensure a dataset annotator's initial view of the dataset differs from the view downloaded by subsequent clients. By exploiting specific invalid trust assumptions, we show how we could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60 USD. Our second attack, frontrunning poisoning, targets web-scale datasets that periodically snapshot crowd-sourced content -- such as Wikipedia -- where an attacker only needs a time-limited window to inject malicious examples. In light of both attacks, we notify the maintainers of each affected dataset and recommended several low-overhead defenses.

Get summaries of the top AI research delivered straight to your inbox:

Overview

Deep learning models are often trained on large datasets crawled from the internet
This paper introduces two new attacks that can intentionally introduce malicious examples into these datasets
The attacks could be used to poison 10 popular datasets today for a low cost

Plain English Explanation

The paper describes two new ways that bad actors could secretly insert malicious content into the massive online datasets used to train popular AI models.

The first attack, called "split-view poisoning", exploits the fact that internet content can change over time. An attacker could make a dataset annotator see one version of a web page, while secretly providing a different, malicious version to the model during training. By doing this for just 0.01% of a huge dataset like LAION-400M or COYO-700M, an attacker could poison the entire dataset for only $60.

The second attack, "frontrunning poisoning", targets datasets that regularly take snapshots of crowd-sourced content like Wikipedia. Here, an attacker only needs a short window of time to inject malicious examples before the snapshot is taken, allowing them to contaminate the entire dataset.

The researchers notified the maintainers of the affected datasets and suggested some simple defenses against these attacks.

Technical Explanation

The paper introduces two novel dataset poisoning attacks that could be used to maliciously contaminate the large web-crawled datasets commonly used to train deep learning models.

In the "split-view poisoning" attack, the researchers exploit the mutable nature of internet content. By ensuring that a dataset annotator sees a benign version of a web page, while a subsequent client downloading the dataset receives a malicious version, the researchers show how they could have poisoned 0.01% of the LAION-400M or COYO-700M datasets for just $60.

The "frontrunning poisoning" attack targets datasets that take periodic snapshots of crowd-sourced content, like Wikipedia. Here, an attacker only needs a limited time window to inject malicious examples before the snapshot is taken, allowing them to contaminate the entire dataset.

The researchers responsibly disclosed these attacks to the maintainers of the affected datasets and provided recommendations for low-overhead defenses.

Critical Analysis

The paper presents compelling evidence of the vulnerability of web-crawled datasets to poisoning attacks. The "split-view poisoning" and "frontrunning poisoning" techniques appear to be immediately practical and could be used to contaminate major datasets used in deep learning today.

One limitation is that the paper does not explore the long-term impact of these attacks on downstream model performance and robustness. While the researchers demonstrate the ability to insert malicious content, more research is needed to understand how this would translate to real-world harms.

Additionally, the proposed defenses, while sensible, may not be sufficient to fully mitigate these threats. Ongoing vigilance and more sophisticated techniques for detecting and removing malicious content may be necessary as attackers become more sophisticated.

Overall, this research highlights the need for deep learning practitioners to carefully scrutinize the data they use and implement robust safeguards against adversarial manipulation. As the field of AI continues to advance, addressing dataset security will be crucial to ensuring the reliability and trustworthiness of these powerful technologies.

Conclusion

This paper introduces two practical attacks that bad actors could use to secretly contaminate the large web-crawled datasets commonly used to train deep learning models. By exploiting the mutable nature of internet content and the periodic snapshot approach of some datasets, the researchers demonstrate how attackers could poison 10 popular datasets for a low cost.

While the researchers provided some initial defenses, this work underscores the broader challenge of maintaining the integrity of training data in an era of web-scale AI. Addressing these dataset security vulnerabilities will be crucial to ensuring the reliability and trustworthiness of deep learning models as they become increasingly ubiquitous in our lives.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Poisoning-based Backdoor Attacks for Arbitrary Target Label with Positive Triggers

Binxiao Huang, Jason Chun Lok, Chang Liu, Ngai Wong

Poisoning-based backdoor attacks expose vulnerabilities in the data preparation stage of deep neural network (DNN) training. The DNNs trained on the poisoned dataset will be embedded with a backdoor, making them behave well on clean data while outputting malicious predictions whenever a trigger is applied. To exploit the abundant information contained in the input data to output label mapping, our scheme utilizes the network trained from the clean dataset as a trigger generator to produce poisons that significantly raise the success rate of backdoor attacks versus conventional approaches. Specifically, we provide a new categorization of triggers inspired by the adversarial technique and develop a multi-label and multi-payload Poisoning-based backdoor attack with Positive Triggers (PPT), which effectively moves the input closer to the target label on benign classifiers. After the classifier is trained on the poisoned dataset, we can generate an input-label-aware trigger to make the infected classifier predict any given input to any target label with a high possibility. Under both dirty- and clean-label settings, we show empirically that the proposed attack achieves a high attack success rate without sacrificing accuracy across various datasets, including SVHN, CIFAR10, GTSRB, and Tiny ImageNet. Furthermore, the PPT attack can elude a variety of classical backdoor defenses, proving its effectiveness.

5/10/2024

cs.CV cs.CR

Manipulating Recommender Systems: A Survey of Poisoning Attacks and Countermeasures

Thanh Toan Nguyen, Quoc Viet Hung Nguyen, Thanh Tam Nguyen, Thanh Trung Huynh, Thanh Thi Nguyen, Matthias Weidlich, Hongzhi Yin

Recommender systems have become an integral part of online services to help users locate specific information in a sea of data. However, existing studies show that some recommender systems are vulnerable to poisoning attacks, particularly those that involve learning schemes. A poisoning attack is where an adversary injects carefully crafted data into the process of training a model, with the goal of manipulating the system's final recommendations. Based on recent advancements in artificial intelligence, such attacks have gained importance recently. While numerous countermeasures to poisoning attacks have been developed, they have not yet been systematically linked to the properties of the attacks. Consequently, assessing the respective risks and potential success of mitigation strategies is difficult, if not impossible. This survey aims to fill this gap by primarily focusing on poisoning attacks and their countermeasures. This is in contrast to prior surveys that mainly focus on attacks and their detection methods. Through an exhaustive literature review, we provide a novel taxonomy for poisoning attacks, formalise its dimensions, and accordingly organise 30+ attacks described in the literature. Further, we review 40+ countermeasures to detect and/or prevent poisoning attacks, evaluating their effectiveness against specific types of attacks. This comprehensive survey should serve as a point of reference for protecting recommender systems against poisoning attacks. The article concludes with a discussion on open issues in the field and impactful directions for future research. A rich repository of resources associated with poisoning attacks is available at https://github.com/tamlhp/awesome-recsys-poisoning.

4/24/2024

cs.CR cs.IR cs.LG

Hard Work Does Not Always Pay Off: Poisoning Attacks on Neural Architecture Search

Zachary Coalson, Huazheng Wang, Qingyun Wu, Sanghyun Hong

In this paper, we study the robustness of data-centric approaches to finding neural network architectures (known as neural architecture search) to data distribution shifts. To audit this robustness, we present a data poisoning attack, when injected to the training data used for architecture search that can prevent the victim algorithm from finding an architecture with optimal accuracy. We first define the attack objective for crafting poisoning samples that can induce the victim to generate sub-optimal architectures. To this end, we weaponize existing search algorithms to generate adversarial architectures that serve as our objectives. We also present techniques that the attacker can use to significantly reduce the computational costs of crafting poisoning samples. In an extensive evaluation of our poisoning attack on a representative architecture search algorithm, we show its surprising robustness. Because our attack employs clean-label poisoning, we also evaluate its robustness against label noise. We find that random label-flipping is more effective in generating sub-optimal architectures than our clean-label attack. Our results suggests that care must be taken for the data this emerging approach uses, and future work is needed to develop robust algorithms.

5/13/2024

cs.LG cs.CR

Efficient Backdoor Attacks for Deep Neural Networks in Real-world Scenarios

Ziqiang Li, Hong Sun, Pengfei Xia, Heng Li, Beihao Xia, Yi Wu, Bin Li

Recent deep neural networks (DNNs) have came to rely on vast amounts of training data, providing an opportunity for malicious attackers to exploit and contaminate the data to carry out backdoor attacks. However, existing backdoor attack methods make unrealistic assumptions, assuming that all training data comes from a single source and that attackers have full access to the training data. In this paper, we introduce a more realistic attack scenario where victims collect data from multiple sources, and attackers cannot access the complete training data. We refer to this scenario as data-constrained backdoor attacks. In such cases, previous attack methods suffer from severe efficiency degradation due to the entanglement between benign and poisoning features during the backdoor injection process. To tackle this problem, we introduce three CLIP-based technologies from two distinct streams: Clean Feature Suppression and Poisoning Feature Augmentation.effective solution for data-constrained backdoor attacks. The results demonstrate remarkable improvements, with some settings achieving over 100% improvement compared to existing attacks in data-constrained scenarios. Code is available at https://github.com/sunh1113/Efficient-backdoor-attacks-for-deep-neural-networks-in-real-world-scenarios

4/22/2024

cs.CR cs.CV