DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution

Read original: arXiv:2305.17000 - Published 7/11/2024 by Matías P. Pizarro B., Dorothea Kolossa, Asja Fischer

Overview

The paper proposes DistriBlock, an efficient strategy to detect adversarial attacks on automatic speech recognition (ASR) systems.
Adversarial attacks can mislead ASR systems into predicting an arbitrary target text, posing a security threat.
DistriBlock measures characteristics of the probability distribution over output tokens at each time step and uses these to distinguish benign and adversarial inputs.
The authors demonstrate the strong performance of their approach across different ASR systems and datasets.

Plain English Explanation

Automatic speech recognition (ASR) systems are used in many applications, such as virtual assistants and voice-controlled devices. However, these systems can be vulnerable to adversarial attacks, where small, carefully crafted changes to the input audio can cause the system to output completely different text than what was said.

To address this security threat, the researchers developed a method called DistriBlock. DistriBlock analyzes the probability distribution of the output tokens at each time step during ASR. It looks at the median, maximum, and minimum of these probabilities, as well as measures of the distribution's entropy and how different it is from the distribution in the next time step.

By using machine learning models to detect patterns in these distribution characteristics, DistriBlock can reliably distinguish between normal speech and adversarial attacks across different ASR systems and datasets. The authors report a mean area under the ROC curve of 99% when separating targeted adversarial examples from clean data and 97% when separating them from noisy data, outperforming previous approaches.

One key insight is that adversarial attacks that can bypass DistriBlock become much noisier, making them easier to detect through other filtering methods. This provides an additional layer of protection for ASR systems.

Technical Explanation

The paper proposes DistriBlock, a strategy for detecting adversarial attacks on automatic speech recognition (ASR) systems. In such attacks, small, carefully crafted perturbations of the input audio cause the system to transcribe an attacker-chosen target text instead of what was said.

DistriBlock focuses on analyzing the probability distribution over output tokens at each time step of the ASR system. Specifically, it measures:

The median, maximum, and minimum of the output probabilities
The entropy of the distribution
The Kullback-Leibler and Jensen-Shannon divergences between the distribution and that of the subsequent time step
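
The characteristics listed above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: it assumes the ASR system exposes a matrix of shape (T, V) whose rows are probability distributions over V output tokens, and the names `step_characteristics`, `kl_divergence`, and `EPS` are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against log(0); an illustrative choice, not from the paper


def kl_divergence(p, q):
    """Row-wise KL(p || q) for arrays of shape (..., vocab)."""
    p = np.clip(p, EPS, 1.0)
    q = np.clip(q, EPS, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)


def step_characteristics(probs):
    """Compute per-time-step characteristics of a (T, V) probability matrix.

    Each row of `probs` is assumed to be a distribution over V output tokens.
    Divergences compare step t with step t + 1, so they have length T - 1.
    """
    probs = np.asarray(probs, dtype=float)
    current, following = probs[:-1], probs[1:]
    mixture = 0.5 * (current + following)  # midpoint used by Jensen-Shannon
    return {
        "median": np.median(probs, axis=1),
        "max": probs.max(axis=1),
        "min": probs.min(axis=1),
        "entropy": -np.sum(probs * np.log(np.clip(probs, EPS, 1.0)), axis=1),
        "kld": kl_divergence(current, following),
        "jsd": 0.5 * kl_divergence(current, mixture)
        + 0.5 * kl_divergence(following, mixture),
    }
```

The natural logarithm is used here, so entropy and divergences are in nats; a different base would only rescale the values.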

The authors then leverage these distribution characteristics to train binary classifiers, including simple thresholds, ensemble models, and neural networks, to distinguish between benign and adversarial inputs.
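
The simplest of these classifiers, a single threshold, might look like the sketch below. Several details are assumptions made for illustration rather than taken from the paper: the per-step values are averaged into one score per utterance, higher scores are treated as more suspicious, and the threshold is set from benign data only. The function names are hypothetical.

```python
import numpy as np


def utterance_score(per_step_values):
    """Collapse a per-time-step characteristic into one score (mean is an assumption)."""
    return float(np.mean(per_step_values))


def fit_threshold(benign_scores, target_false_positive_rate=0.01):
    """Use the (1 - rate) quantile of benign scores as the decision threshold."""
    return float(np.quantile(benign_scores, 1.0 - target_false_positive_rate))


def is_adversarial(score, threshold):
    """Flag an input whose score exceeds the threshold."""
    return score > threshold
```

For a characteristic where adversarial inputs tend to score lower than benign ones, the comparison would need to be flipped.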

Through extensive evaluation across different state-of-the-art ASR systems and language datasets, the authors demonstrate the strong performance of DistriBlock. It achieves a mean area under the receiver operating characteristic (ROC) curve of 99% when distinguishing targeted adversarial examples from clean data, and 97% when distinguishing them from noisy data.
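
For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen adversarial input receives a higher detection score than a randomly chosen benign one. The sketch below computes it by direct pairwise comparison; it is an illustrative helper with a hypothetical name, not the paper's evaluation code, and it assumes higher scores mean "more likely adversarial".

```python
import numpy as np


def auroc(benign_scores, adversarial_scores):
    """Probability that a random adversarial score exceeds a random benign score.

    Ties count as one half, which matches the area under the ROC curve.
    """
    benign = np.asarray(benign_scores, dtype=float)[:, None]
    adversarial = np.asarray(adversarial_scores, dtype=float)[None, :]
    wins = np.sum(adversarial > benign)
    ties = np.sum(adversarial == benign)
    return float((wins + 0.5 * ties) / (benign.size * adversarial.size))
```

A value of 1.0 means the two groups are perfectly separable by some threshold, while 0.5 corresponds to chance. The pairwise comparison uses memory proportional to the product of the two sample sizes, so it suits small illustrations rather than large test sets.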

To assess the robustness of their method, the authors also show that adaptive adversarial examples designed to bypass DistriBlock become much noisier. This makes them easier to detect through additional filtering, providing an extra layer of protection for ASR systems.
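
One common way to quantify how noisy an adversarial example is, used here as an assumption rather than as the paper's confirmed metric, is the signal-to-noise ratio between the clean audio and the added perturbation. The hypothetical helper below expects two equal-length waveforms.

```python
import numpy as np


def snr_db(clean, adversarial):
    """Signal-to-noise ratio in dB, treating (adversarial - clean) as the noise."""
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(adversarial, dtype=float) - clean
    signal_power = np.sum(clean ** 2)
    noise_power = np.sum(noise ** 2)
    # Note: identical inputs give zero noise power, so the ratio is infinite.
    return float(10.0 * np.log10(signal_power / noise_power))
```

A lower value indicates a louder perturbation, so under this measure a "much noisier" adaptive example would show a markedly lower SNR than a standard one.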

Critical Analysis

The paper presents a comprehensive and well-designed study on detecting adversarial attacks against automatic speech recognition systems. The authors thoroughly evaluate their DistriBlock approach across multiple ASR models and datasets, showcasing its strong performance.

One limitation is that the paper does not explore the computational overhead or real-time performance of DistriBlock, which would be important considerations for practical deployment. Additionally, the authors only test against targeted adversarial attacks, and it would be valuable to examine the method's effectiveness against more general, non-targeted attacks as well.

The paper could also be strengthened by a more in-depth discussion of the underlying reasons why the distribution characteristics used by DistriBlock are effective at distinguishing benign and adversarial inputs. A deeper analysis of the distribution properties could provide additional insights.

Furthermore, the authors could explore the applicability of DistriBlock to other audio-based security domains, such as speaker verification or speech emotion recognition, where adversarial attacks pose similar threats.

Overall, the DistriBlock approach represents a promising step towards enhancing the robustness of automatic speech recognition systems against adversarial attacks. Research that addresses the identified limitations and expands the scope of the method could strengthen its practical impact.

Conclusion

The paper proposes DistriBlock, an efficient strategy for detecting adversarial attacks on automatic speech recognition (ASR) systems. By analyzing the probability distribution of output tokens at each time step, DistriBlock can reliably distinguish between benign speech and adversarial inputs across different ASR models and datasets.

The strong performance of DistriBlock, along with the insight that adaptive adversarial attacks become much noisier and easier to detect, suggests it could be a valuable tool for enhancing the security and robustness of ASR systems. Further research to optimize the method's efficiency and explore its applicability to other audio-based security domains could expand its impact and importance in the field.

Related Papers

DistriBlock: Identifying adversarial audio samples by leveraging characteristics of the output distribution

Matías P. Pizarro B., Dorothea Kolossa, Asja Fischer

Adversarial attacks can mislead automatic speech recognition (ASR) systems into predicting an arbitrary target text, thus posing a clear security threat. To prevent such attacks, we propose DistriBlock, an efficient detection strategy applicable to any ASR system that predicts a probability distribution over output tokens in each time step. We measure a set of characteristics of this distribution: the median, maximum, and minimum over the output probabilities, the entropy of the distribution, as well as the Kullback-Leibler and the Jensen-Shannon divergence with respect to the distributions of the subsequent time step. Then, by leveraging the characteristics observed for both benign and adversarial data, we apply binary classifiers, including simple threshold-based classification, ensembles of such classifiers, and neural networks. Through extensive analysis across different state-of-the-art ASR systems and language data sets, we demonstrate the supreme performance of this approach, with a mean area under the receiver operating characteristic curve for distinguishing target adversarial examples against clean and noisy data of 99% and 97%, respectively. To assess the robustness of our method, we show that adaptive adversarial examples that can circumvent DistriBlock are much noisier, which makes them easier to detect through filtering and creates another avenue for preserving the system's robustness.

7/11/2024

Detecting and Defending Against Adversarial Attacks on Automatic Speech Recognition via Diffusion Models

Nikolai L. Kuhne, Astrid H. F. Kitchen, Marie S. Jensen, Mikkel S. L. Brøndt, Martin Gonzalez, Christophe Biscio, Zheng-Hua Tan

Automatic speech recognition (ASR) systems are known to be vulnerable to adversarial attacks. This paper addresses detection and defence against targeted white-box attacks on speech signals for ASR systems. While existing work has utilised diffusion models (DMs) to purify adversarial examples, achieving state-of-the-art results in keyword spotting tasks, their effectiveness for more complex tasks such as sentence-level ASR remains unexplored. Additionally, the impact of the number of forward diffusion steps on performance is not well understood. In this paper, we systematically investigate the use of DMs for defending against adversarial attacks on sentences and examine the effect of varying forward diffusion steps. Through comprehensive experiments on the Mozilla Common Voice dataset, we demonstrate that two forward diffusion steps can completely defend against adversarial attacks on sentences. Moreover, we introduce a novel, training-free approach for detecting adversarial attacks by leveraging a pre-trained DM. Our experimental results show that this method can detect adversarial attacks with high accuracy.

9/13/2024

$DA^3$: A Distribution-Aware Adversarial Attack against Language Models

Yibo Wang, Xiangjue Dong, James Caverlee, Philip S. Yu

Language models can be manipulated by adversarial attacks, which introduce subtle perturbations to input data. While recent attack methods can achieve a relatively high attack success rate (ASR), we've observed that the generated adversarial examples have a different data distribution compared with the original examples. Specifically, these adversarial examples exhibit reduced confidence levels and greater divergence from the training data distribution. Consequently, they are easy to detect using straightforward detection methods, diminishing the efficacy of such attacks. To address this issue, we propose a Distribution-Aware Adversarial Attack ($DA^3$) method. $DA^3$ considers the distribution shifts of adversarial examples to improve attacks' effectiveness under detection methods. We further design a novel evaluation metric, the Non-detectable Attack Success Rate (NASR), which integrates both ASR and detectability for the attack task. We conduct experiments on four widely used datasets to validate the attack effectiveness and transferability of adversarial examples generated by $DA^3$ against both the white-box BERT-base and RoBERTa-base models and the black-box LLaMA2-7b model.

9/24/2024

Diffusion-Based Adversarial Purification for Speaker Verification

Yibo Bai, Xiao-Lei Zhang, Xuelong Li

Recently, automatic speaker verification (ASV) based on deep learning is easily contaminated by adversarial attacks, which is a new type of attack that injects imperceptible perturbations to audio signals so as to make ASV produce wrong decisions. This poses a significant threat to the security and reliability of ASV systems. To address this issue, we propose a Diffusion-Based Adversarial Purification (DAP) method that enhances the robustness of ASV systems against such adversarial attacks. Our method leverages a conditional denoising diffusion probabilistic model to effectively purify the adversarial examples and mitigate the impact of perturbations. DAP first introduces controlled noise into adversarial examples, and then performs a reverse denoising process to reconstruct clean audio. Experimental results demonstrate the efficacy of the proposed DAP in enhancing the security of ASV and meanwhile minimizing the distortion of the purified audio signals.

7/10/2024