Deciphering the Definition of Adversarial Robustness for post-hoc OOD Detectors

2406.15104

Published 6/27/2024 by Peter Lorenz, Mario Fernandez, Jens Müller, Ullrich Köthe

Abstract

Detecting out-of-distribution (OOD) inputs is critical for safely deploying deep learning models in real-world scenarios. In recent years, many OOD detectors have been developed, and even the benchmarking has been standardized, e.g. by OpenOOD. The number of post-hoc detectors is growing fast; they offer a way to protect a pre-trained classifier against natural distribution shifts and claim to be ready for real-world scenarios. However, their efficacy in handling adversarial examples has been neglected in the majority of studies. This paper investigates the adversarial robustness of 16 post-hoc detectors against several evasion attacks and discusses a roadmap towards adversarial defense in OOD detectors.

Overview

• This paper examines the definition of "adversarial robustness" for post-hoc out-of-distribution (OOD) detectors, which are methods applied on top of an already-trained classifier to flag inputs that differ from the training distribution.

• The researchers explore how various definitions of adversarial robustness can affect the performance and behavior of OOD detectors, and provide guidance on evaluating these detectors.

Plain English Explanation

• OOD detectors are used to identify data that is different from what a machine learning model was originally trained on. This is important because models can behave unpredictably when presented with data that is outside their normal operating range.

• The paper looks at how the way we define "adversarial robustness" - the ability of a model to resist adversarial attacks that try to trick it - can impact the performance of OOD detectors. Depending on the definition used, the detectors may behave very differently when faced with unusual or "out-of-distribution" data.

• By understanding these nuances, the researchers aim to help developers build more reliable and effective OOD detectors that can better handle the real-world complexity of data that may differ from the original training set. This is crucial for deploying machine learning models in sensitive or high-stakes applications.
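
To make the idea of a post-hoc detector concrete, here is a minimal sketch of one widely used baseline, the maximum softmax probability (MSP) score. It is an illustration only and is not taken from the paper: the logits, the 3-class setup, and the threshold of 0.7 are all hypothetical values chosen for the example.

```python
import numpy as np

def softmax(logits):
    # Subtract the row-wise max for numerical stability.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def msp_score(logits):
    """Maximum softmax probability: higher means 'looks in-distribution'."""
    return softmax(logits).max(axis=-1)

def flag_ood(logits, threshold=0.7):
    """Flag inputs whose confidence falls below a hypothetical threshold."""
    return msp_score(logits) < threshold

# Hypothetical logits from a frozen 3-class classifier (assumed values).
logits = np.array([
    [8.0, 0.5, 0.2],   # confident prediction
    [1.1, 1.0, 0.9],   # near-uniform prediction
])
scores = msp_score(logits)
flags = flag_ood(logits)
print(scores, flags)
```

The detector is "post-hoc" in the sense that it only reads the outputs of the existing classifier; nothing is retrained. In practice the threshold would be tuned on held-out data rather than fixed by hand.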

Technical Explanation

• The paper first provides an overview of related work on adversarial robustness, OOD detection, and adversarial examples.

• It then examines different definitions of adversarial robustness and how they affect the behavior of post-hoc OOD detectors, evaluating 16 post-hoc detectors against several evasion attacks.

• The experiments show that the choice of adversarial robustness definition can drastically affect the performance and failure modes of OOD detectors, with some definitions leading to detectors that are more robust to distributional shift but less robust to adversarial perturbations.

• The paper provides guidelines for researchers and practitioners on how to thoughtfully define and evaluate adversarial robustness when designing and testing OOD detectors.
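
The kind of evasion attack discussed above can be sketched on a toy problem. The example below assumes a small linear softmax classifier with made-up weights, uses the maximum softmax probability as the detector score, and applies a single signed-gradient step (in the style of FGSM) to push an input's score upward so that it passes as in-distribution. The weights, the input, the step size `eps`, and the threshold are all hypothetical; the summary does not list the specific attacks or detectors the paper uses, so this should be read as a sketch of the general technique rather than the authors' setup.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax for a single logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical linear classifier: logits = W @ x (3 classes, 2 features).
W = np.array([[2.0, 0.0],
              [0.0, 2.0],
              [-2.0, -2.0]])

def msp_score(x):
    """Detector score: maximum softmax probability of the classifier."""
    return softmax(W @ x).max()

def msp_gradient(x):
    """Analytic gradient of the MSP score with respect to the input x."""
    p = softmax(W @ x)
    k = p.argmax()
    one_hot = np.zeros_like(p)
    one_hot[k] = 1.0
    # d p_k / d z_j = p_k * (delta_kj - p_j); chain rule through z = W @ x.
    grad_z = p[k] * (one_hot - p)
    return W.T @ grad_z

def sign_gradient_attack(x, eps):
    """One signed-gradient step that tries to raise the detector score."""
    return x + eps * np.sign(msp_gradient(x))

threshold = 0.7                      # assumed decision threshold
x = np.array([0.1, 0.05])            # assumed low-confidence input
x_adv = sign_gradient_attack(x, eps=0.5)

clean_score = msp_score(x)
adv_score = msp_score(x_adv)
print(f"clean: {clean_score:.3f}, perturbed: {adv_score:.3f}")
```

With these assumed values the clean input falls below the threshold and would be flagged, while the perturbed input rises above it. Flipping the sign of the step gives the opposite attack, which pushes an in-distribution input toward being rejected; which of these directions counts toward "robustness" is exactly the kind of definitional choice the paper examines.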

Critical Analysis

• The paper acknowledges that the insights are specific to post-hoc OOD detectors, and may not generalize to other types of OOD detection approaches.

• It also notes that the evaluation of adversarial robustness is inherently challenging, as there is no universally agreed-upon definition or set of benchmarks.

• While the paper provides a structured analysis of how different robustness definitions impact OOD detectors, it does not offer a definitive solution or recommendation for the "best" way to define adversarial robustness in this context.

• Further research may be needed to explore the broader implications of these findings, as well as to investigate alternative approaches to building reliable and robust OOD detection systems.

Conclusion

• This paper highlights the importance of carefully defining and evaluating adversarial robustness when developing post-hoc OOD detectors.

• The researchers demonstrate how different robustness definitions can lead to vastly different detector behaviors, underscoring the need for a nuanced understanding of these tradeoffs.

• By providing a structured analysis and guidelines, the paper aims to help researchers and practitioners build more reliable and effective OOD detection systems, which are crucial for the safe and responsible deployment of machine learning models in high-stakes applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey

Naveen Karunanayake, Ravin Gunawardena, Suranga Seneviratne, Sanjay Chawla

Deep neural networks (DNNs) deployed in real-world applications can encounter out-of-distribution (OOD) data and adversarial examples. These represent distinct forms of distributional shifts that can significantly impact DNNs' reliability and robustness. Traditionally, research has addressed OOD detection and adversarial robustness as separate challenges. This survey focuses on the intersection of these two areas, examining how the research community has investigated them together. Consequently, we identify two key research directions: robust OOD detection and unified robustness. Robust OOD detection aims to differentiate between in-distribution (ID) data and OOD data, even when they are adversarially manipulated to deceive the OOD detector. Unified robustness seeks a single approach to make DNNs robust against both adversarial attacks and OOD inputs. Accordingly, first, we establish a taxonomy based on the concept of distributional shifts. This framework clarifies how robust OOD detection and unified robustness relate to other research areas addressing distributional shifts, such as OOD detection, open set recognition, and anomaly detection. Subsequently, we review existing work on robust OOD detection and unified robustness. Finally, we highlight the limitations of the existing work and propose promising research directions that explore adversarial and OOD inputs within a unified framework.

4/9/2024

cs.LG

OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift

Lin Li, Yifei Wang, Chawin Sitawarin, Michael Spratling

Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This omission is concerning as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. The latter enables the prediction of OOD robustness from ID robustness. We then predict and verify that existing methods are unlikely to achieve high OOD robustness. Novel methods are therefore required to achieve OOD robustness beyond our prediction. To facilitate the development of these methods, we investigate a wide range of techniques and identify several promising directions. Code and models are available at: https://github.com/OODRobustBench/OODRobustBench.

6/5/2024

cs.LG cs.CV

Robust Image Classification in the Presence of Out-of-Distribution and Adversarial Samples Using Attractors in Neural Networks

Nasrin Alipour, Seyyed Ali SeyyedSalehi

The proper handling of out-of-distribution (OOD) samples in deep classifiers is a critical concern for ensuring the suitability of deep neural networks in safety-critical systems. Existing approaches developed for robust OOD detection in the presence of adversarial attacks lose their performance by increasing the perturbation levels. This study proposes a method for robust classification in the presence of OOD samples and adversarial attacks with high perturbation levels. The proposed approach utilizes a fully connected neural network that is trained to use training samples as its attractors, enhancing its robustness. This network has the ability to classify inputs and identify OOD samples as well. To evaluate this method, the network is trained on the MNIST dataset, and its performance is tested on adversarial examples. The results indicate that the network maintains its performance even when classifying adversarial examples, achieving 87.13% accuracy when dealing with highly perturbed MNIST test data. Furthermore, by using fashion-MNIST and CIFAR-10-bw as OOD samples, the network can distinguish these samples from MNIST samples with an accuracy of 98.84% and 99.28%, respectively. In the presence of severe adversarial attacks, these measures decrease slightly to 98.48% and 98.88%, indicating the robustness of the proposed method.

6/18/2024

cs.CV cs.LG eess.IV

A noisy elephant in the room: Is your out-of-distribution detector robust to label noise?

Galadrielle Humblot-Renaux, Sergio Escalera, Thomas B. Moeslund

The ability to detect unfamiliar or unexpected images is essential for safe deployment of computer vision systems. In the context of classification, the task of detecting images outside of a model's training domain is known as out-of-distribution (OOD) detection. While there has been a growing research interest in developing post-hoc OOD detection methods, there has been comparably little discussion around how these methods perform when the underlying classifier is not trained on a clean, carefully curated dataset. In this work, we take a closer look at 20 state-of-the-art OOD detection methods in the (more realistic) scenario where the labels used to train the underlying classifier are unreliable (e.g. crowd-sourced or web-scraped labels). Extensive experiments across different datasets, noise types & levels, architectures and checkpointing strategies provide insights into the effect of class label noise on OOD detection, and show that poor separation between incorrectly classified ID samples vs. OOD samples is an overlooked yet important limitation of existing methods. Code: https://github.com/glhr/ood-labelnoise

4/3/2024

cs.CV cs.AI cs.LG