Hidden in Plain Sound: Environmental Backdoor Poisoning Attacks on Whisper, and Mitigations

Read original: arXiv:2409.12553 - Published 9/20/2024 by Jonatan Bartolini, Todor Stoyanov, Alberto Giaretta

Hidden in Plain Sound: Environmental Backdoor Poisoning Attacks on Whisper, and Mitigations

Overview

The paper describes a new type of backdoor attack on speech recognition models, specifically targeting the Whisper model.
The attack, called "Environmental Backdoor Poisoning", involves injecting imperceptible audio triggers into the environment that can cause the model to misclassify speech.
The paper also proposes mitigations against this attack, including data augmentation and robust training techniques.

Plain English Explanation

The researchers have discovered a new way to trick speech recognition models, like the popular Whisper model, into making mistakes. They call this an "Environmental Backdoor Poisoning" attack.

The idea is to hide small, barely noticeable audio triggers in the environment around the model. When the model hears these triggers, it gets confused and starts misinterpreting what the person is saying. This could be used, for example, to make a voice assistant like Siri or Alexa ignore certain commands or carry out unintended actions.

The researchers also tested ways to prevent these attacks, such as training the models to be more robust and adding extra audio data during the training process. This makes it harder for the backdoor triggers to slip past the model's defenses.

Overall, this research highlights an important security vulnerability in modern speech recognition systems and provides some initial steps towards making them more secure against these types of attacks.

Technical Explanation

The paper presents a novel "Environmental Backdoor Poisoning" attack that exploits the voice activity detection (VAD) component of speech recognition models, like Whisper, to trigger misclassifications.

The attack works by injecting imperceptible audio triggers into the environment, which cause the VAD module to incorrectly activate and lead the model to interpret normal speech as the attacker's desired output. The researchers demonstrate the attack on the Whisper model, showing that it can effectively bypass defenses like adversarial training.

To mitigate these attacks, the paper proposes several techniques, including:

Data Augmentation: Introducing diverse environmental noise and triggers during training to improve the model's robustness.
Robust VAD: Designing more secure VAD modules that are resistant to environmental manipulation.
Anomaly Detection: Monitoring for unusual VAD activations that could indicate an ongoing attack.

The researchers evaluate these mitigation strategies and find that they can effectively reduce the impact of the environmental backdoor poisoning attacks on Whisper's performance.

Critical Analysis

The paper provides a comprehensive analysis of this new type of backdoor attack and demonstrates its effectiveness against state-of-the-art speech recognition models. However, the authors acknowledge that their proposed mitigations, while promising, may not be a complete solution.

One potential concern is the difficulty of anticipating and defending against all possible types of environmental triggers that an attacker could use. The researchers focused on a specific set of triggers, but a more determined adversary might be able to find other ways to manipulate the VAD component.

Additionally, the mitigations proposed in the paper, such as data augmentation and robust VAD design, may come with their own performance trade-offs or increased computational costs. Striking the right balance between security and usability will be an important challenge for future research in this area.

Finally, the paper does not explore the broader implications of these attacks on real-world applications of speech recognition technology. Understanding the potential impact on privacy, security, and user trust will be crucial as these systems become more widely deployed.

Conclusion

The "Hidden in Plain Sound" paper uncovers a new and concerning vulnerability in speech recognition models, demonstrating how environmental backdoor poisoning attacks can be used to subvert their performance. The proposed mitigation strategies offer promising avenues for improving the security of these systems, but ongoing research is needed to address the inherent challenges of this problem.

As speech-based interfaces become more ubiquitous, ensuring their robustness against adversarial attacks will be crucial for maintaining user trust and safeguarding the integrity of these important technologies. This paper serves as an important step forward in understanding and addressing these emerging security threats.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Hidden in Plain Sound: Environmental Backdoor Poisoning Attacks on Whisper, and Mitigations

Jonatan Bartolini, Todor Stoyanov, Alberto Giaretta

Thanks to the popularisation of transformer-based models, speech recognition (SR) is gaining traction in various application fields, such as industrial and robotics environments populated with mission-critical devices. While transformer-based SR can provide various benefits for simplifying human-machine interfacing, the research on the cybersecurity aspects of these models is lacklustre. In particular, concerning backdoor poisoning attacks. In this paper, we propose a new poisoning approach that maps different environmental trigger sounds to target phrases of different lengths, during the fine-tuning phase. We test our approach on Whisper, one of the most popular transformer-based SR model, showing that it is highly vulnerable to our attack, under several testing conditions. To mitigate the attack proposed in this paper, we investigate the use of Silero VAD, a state-of-the-art voice activity detection (VAD) model, as a defence mechanism. Our experiments show that it is possible to use VAD models to filter out malicious triggers and mitigate our attacks, with a varying degree of success, depending on the type of trigger sound and testing conditions.

9/20/2024

$Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models$

Muting Whisper: A Universal Acoustic Adversarial Attack on Speech Foundation Models

Vyas Raina, Rao Ma, Charles McGhee, Kate Knill, Mark Gales

Recent developments in large speech foundation models like Whisper have led to their widespread use in many automatic speech recognition (ASR) applications. These systems incorporate `special tokens' in their vocabulary, such as $texttt{}$, to guide their language generation process. However, we demonstrate that these tokens can be exploited by adversarial attacks to manipulate the model's behavior. We propose a simple yet effective method to learn a universal acoustic realization of Whisper's $texttt{}$ token, which, when prepended to any speech signal, encourages the model to ignore the speech and only transcribe the special token, effectively `muting' the model. Our experiments demonstrate that the same, universal 0.64-second adversarial audio segment can successfully mute a target Whisper ASR model for over 97% of speech samples. Moreover, we find that this universal adversarial audio segment often transfers to new datasets and tasks. Overall this work demonstrates the vulnerability of Whisper models to `muting' adversarial attacks, where such attacks can pose both risks and potential benefits in real-world settings: for example the attack can be used to bypass speech moderation systems, or conversely the attack can also be used to protect private speech data.

7/18/2024

Imperceptible Rhythm Backdoor Attacks: Exploring Rhythm Transformation for Embedding Undetectable Vulnerabilities on Speech Recognition

Wenhan Yao, Jiangkun Yang, Yongqiang He, Jia Liu, Weiping Wen

Speech recognition is an essential start ring of human-computer interaction, and recently, deep learning models have achieved excellent success in this task. However, when the model training and private data provider are always separated, some security threats that make deep neural networks (DNNs) abnormal deserve to be researched. In recent years, the typical backdoor attacks have been researched in speech recognition systems. The existing backdoor methods are based on data poisoning. The attacker adds some incorporated changes to benign speech spectrograms or changes the speech components, such as pitch and timbre. As a result, the poisoned data can be detected by human hearing or automatic deep algorithms. To improve the stealthiness of data poisoning, we propose a non-neural and fast algorithm called Random Spectrogram Rhythm Transformation (RSRT) in this paper. The algorithm combines four steps to generate stealthy poisoned utterances. From the perspective of rhythm component transformation, our proposed trigger stretches or squeezes the mel spectrograms and recovers them back to signals. The operation keeps timbre and content unchanged for good stealthiness. Our experiments are conducted on two kinds of speech recognition tasks, including testing the stealthiness of poisoned samples by speaker verification and automatic speech recognition. The results show that our method has excellent effectiveness and stealthiness. The rhythm trigger needs a low poisoning rate and gets a very high attack success rate.

8/23/2024

FlowMur: A Stealthy and Practical Audio Backdoor Attack with Limited Knowledge

Jiahe Lan, Jie Wang, Baochen Yan, Zheng Yan, Elisa Bertino

Speech recognition systems driven by DNNs have revolutionized human-computer interaction through voice interfaces, which significantly facilitate our daily lives. However, the growing popularity of these systems also raises special concerns on their security, particularly regarding backdoor attacks. A backdoor attack inserts one or more hidden backdoors into a DNN model during its training process, such that it does not affect the model's performance on benign inputs, but forces the model to produce an adversary-desired output if a specific trigger is present in the model input. Despite the initial success of current audio backdoor attacks, they suffer from the following limitations: (i) Most of them require sufficient knowledge, which limits their widespread adoption. (ii) They are not stealthy enough, thus easy to be detected by humans. (iii) Most of them cannot attack live speech, reducing their practicality. To address these problems, in this paper, we propose FlowMur, a stealthy and practical audio backdoor attack that can be launched with limited knowledge. FlowMur constructs an auxiliary dataset and a surrogate model to augment adversary knowledge. To achieve dynamicity, it formulates trigger generation as an optimization problem and optimizes the trigger over different attachment positions. To enhance stealthiness, we propose an adaptive data poisoning method according to Signal-to-Noise Ratio (SNR). Furthermore, ambient noise is incorporated into the process of trigger generation and data poisoning to make FlowMur robust to ambient noise and improve its practicality. Extensive experiments conducted on two datasets demonstrate that FlowMur achieves high attack performance in both digital and physical settings while remaining resilient to state-of-the-art defenses. In particular, a human study confirms that triggers generated by FlowMur are not easily detected by participants.

7/8/2024