Out-of-Distribution Data: An Acquaintance of Adversarial Examples -- A Survey

arXiv:2404.05219

Published 4/9/2024 by Naveen Karunanayake, Ravin Gunawardena, Suranga Seneviratne, Sanjay Chawla

Abstract

Deep neural networks (DNNs) deployed in real-world applications can encounter out-of-distribution (OOD) data and adversarial examples. These represent distinct forms of distributional shifts that can significantly impact DNNs' reliability and robustness. Traditionally, research has addressed OOD detection and adversarial robustness as separate challenges. This survey focuses on the intersection of these two areas, examining how the research community has investigated them together. Consequently, we identify two key research directions: robust OOD detection and unified robustness. Robust OOD detection aims to differentiate between in-distribution (ID) data and OOD data, even when they are adversarially manipulated to deceive the OOD detector. Unified robustness seeks a single approach to make DNNs robust against both adversarial attacks and OOD inputs. Accordingly, first, we establish a taxonomy based on the concept of distributional shifts. This framework clarifies how robust OOD detection and unified robustness relate to other research areas addressing distributional shifts, such as OOD detection, open set recognition, and anomaly detection. Subsequently, we review existing work on robust OOD detection and unified robustness. Finally, we highlight the limitations of the existing work and propose promising research directions that explore adversarial and OOD inputs within a unified framework.

Overview

This paper surveys research at the intersection of out-of-distribution (OOD) detection and adversarial robustness, two problems that have traditionally been studied separately.
OOD data refers to inputs that are substantially different from the training data, often causing machine learning models to perform poorly.
The paper builds a taxonomy around the concept of distributional shift, reviews existing work on robust OOD detection and unified robustness, and discusses the limitations of that work along with directions for future research.

Plain English Explanation

Machine learning models, such as those used for image recognition or language processing, are trained on a specific set of data. However, in the real world, these models may encounter inputs that are quite different from the training data. These "out-of-distribution" (OOD) samples can cause the models to perform poorly or make incorrect predictions.

For example, imagine a model trained to recognize different types of animals in photographs. The model might work well on images of common pets or farm animals, but struggle with more unusual or exotic animals that it hasn't seen before. This OOD problem is closely related to the concept of adversarial examples, which are inputs that have been intentionally modified to trick the model into making mistakes.

Researchers are actively studying ways to detect and handle OOD data, as well as improving the fairness of models when faced with OOD samples. This is an important area of research, as the robustness and reliability of machine learning systems are crucial for their widespread adoption and real-world applications.

Technical Explanation

The paper begins by providing background on deep neural networks and their susceptibility to OOD data and adversarial examples. It then delves into various approaches for detecting OOD samples, such as using confidence thresholds, auxiliary classifiers, and generative models.
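
To make the confidence-threshold idea concrete, here is a minimal sketch of a maximum-softmax-probability detector. It is an illustration under stated assumptions, not the survey's own method: the function names, the example logits, and the 0.7 threshold are all hypothetical, and in practice the threshold would be tuned on held-out in-distribution data.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def msp_score(logits):
    """Maximum softmax probability; higher suggests 'more in-distribution'."""
    return max(softmax(logits))

def is_ood(logits, threshold=0.7):
    """Flag an input as OOD when the top-class confidence falls below the
    threshold. The default of 0.7 is an arbitrary value for illustration."""
    return msp_score(logits) < threshold

# Hypothetical classifier outputs (logits) for two inputs.
confident_logits = [8.0, 0.5, 0.2]   # one class clearly dominates
flat_logits = [1.0, 0.9, 1.1]        # no class stands out

print(is_ood(confident_logits))
print(is_ood(flat_logits))
```

A detector like this only looks at the classifier's own confidence, so an input perturbed to raise or lower that confidence can fool it, which is the failure mode that robust OOD detection is meant to address.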

The authors also discuss techniques for handling OOD data, including data augmentation, adversarial training, and test-time adaptation. These methods aim to improve the robustness of machine learning models to unexpected or unusual inputs.
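
As a rough illustration of how the perturbed inputs used in adversarial training can be generated, the sketch below applies the fast gradient sign method (FGSM) to a tiny logistic-regression model. FGSM is chosen here only as a well-known example; the summary does not say which attacks the survey covers, and the weights, input, and step size `eps` are made-up values.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, eps):
    """One FGSM step: x_adv = x + eps * sign(dLoss/dx).

    For binary cross-entropy on a logistic model, the gradient of the
    loss with respect to the input is (p - y) * w.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical model parameters and a correctly classified input.
w, b = [2.0, -1.0], 0.0
x, y = [1.0, 0.5], 1

x_adv = fgsm(w, b, x, y, eps=0.8)
print(predict(w, b, x))      # confidence on the clean input
print(predict(w, b, x_adv))  # confidence after the perturbation
```

Adversarial training would then add examples such as `x_adv`, paired with the original label `y`, to the training set so that the model learns to classify them correctly.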

Additionally, the paper examines the fairness implications of OOD data, as certain demographic groups may be more likely to encounter OOD samples, leading to potential biases in the model's performance.

Critical Analysis

The paper provides a comprehensive overview of the OOD data problem and the various approaches researchers have explored to address it. However, the authors acknowledge that there are still many open challenges and areas for further research.

For example, the detection of OOD samples can be challenging, as the boundary between in-distribution and OOD data is often not well-defined. Additionally, the handling techniques discussed, such as adversarial training, can be computationally expensive and may not generalize well to all types of OOD data.

The fairness implications of OOD data are also an important area that requires more investigation. The paper suggests that certain demographic groups may be disproportionately affected by OOD samples, but more research is needed to fully understand the extent of this problem and develop effective solutions.

Conclusion

This survey paper provides a comprehensive overview of the out-of-distribution (OOD) data problem, which is closely related to the challenge of adversarial examples. The authors examine various techniques for detecting and handling OOD samples, as well as the fairness implications of this issue.

As machine learning systems become more widely deployed in real-world applications, the robustness and reliability of these models to unexpected or unusual inputs will be crucial. The research discussed in this paper represents an important step towards improving the overall robustness and trustworthiness of AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

OODRobustBench: a Benchmark and Large-Scale Analysis of Adversarial Robustness under Distribution Shift

Lin Li, Yifei Wang, Chawin Sitawarin, Michael Spratling

Existing works have made great progress in improving adversarial robustness, but typically test their method only on data from the same distribution as the training data, i.e. in-distribution (ID) testing. As a result, it is unclear how such robustness generalizes under input distribution shifts, i.e. out-of-distribution (OOD) testing. This omission is concerning as such distribution shifts are unavoidable when methods are deployed in the wild. To address this issue we propose a benchmark named OODRobustBench to comprehensively assess OOD adversarial robustness using 23 dataset-wise shifts (i.e. naturalistic shifts in input distribution) and 6 threat-wise shifts (i.e., unforeseen adversarial threat models). OODRobustBench is used to assess 706 robust models using 60.7K adversarial evaluations. This large-scale analysis shows that: 1) adversarial robustness suffers from a severe OOD generalization issue; 2) ID robustness correlates strongly with OOD robustness in a positive linear way. The latter enables the prediction of OOD robustness from ID robustness. We then predict and verify that existing methods are unlikely to achieve high OOD robustness. Novel methods are therefore required to achieve OOD robustness beyond our prediction. To facilitate the development of these methods, we investigate a wide range of techniques and identify several promising directions. Code and models are available at: https://github.com/OODRobustBench/OODRobustBench.

6/5/2024

cs.LG cs.CV

Out-of-distribution Detection in Medical Image Analysis: A survey

Zesheng Hong, Yubiao Yue, Yubin Chen, Huanjie Lin, Yuanmei Luo, Mini Han Wang, Weidong Wang, Jialong Xu, Xiaoqi Yang, Zhenzhang Li, Sihong Xie

Computer-aided diagnostics has benefited from the development of deep learning-based computer vision techniques in these years. Traditional supervised deep learning methods assume that the test sample is drawn from the identical distribution as the training data. However, it is possible to encounter out-of-distribution samples in real-world clinical scenarios, which may cause silent failure in deep learning-based medical image analysis tasks. Recently, research has explored various out-of-distribution (OOD) detection situations and techniques to enable a trustworthy medical AI system. In this survey, we systematically review the recent advances in OOD detection in medical image analysis. We first explore several factors that may cause a distributional shift when using a deep-learning-based model in clinic scenarios, with three different types of distributional shift well defined on top of these factors. Then a framework is suggested to categorize and feature existing solutions, while the previous studies are reviewed based on the methodology taxonomy. Our discussion also includes evaluation protocols and metrics, as well as the challenge and a research direction lack of exploration.

4/30/2024

cs.CV

Continual Unsupervised Out-of-Distribution Detection

Lars Doorenbos, Raphael Sznitman, Pablo Márquez-Neila

Deep learning models excel when the data distribution during training aligns with testing data. Yet, their performance diminishes when faced with out-of-distribution (OOD) samples, leading to great interest in the field of OOD detection. Current approaches typically assume that OOD samples originate from an unconcentrated distribution complementary to the training distribution. While this assumption is appropriate in the traditional unsupervised OOD (U-OOD) setting, it proves inadequate when considering the place of deployment of the underlying deep learning model. To better reflect this real-world scenario, we introduce the novel setting of continual U-OOD detection. To tackle this new setting, we propose a method that starts from a U-OOD detector, which is agnostic to the OOD distribution, and slowly updates during deployment to account for the actual OOD distribution. Our method uses a new U-OOD scoring function that combines the Mahalanobis distance with a nearest-neighbor approach. Furthermore, we design a confidence-scaled few-shot OOD detector that outperforms previous methods. We show our method greatly improves upon strong baselines from related fields.

6/5/2024

cs.CV cs.LG

Toward a Realistic Benchmark for Out-of-Distribution Detection

Pietro Recalcati, Fabio Garcea, Luca Piano, Fabrizio Lamberti, Lia Morra

Deep neural networks are increasingly used in a wide range of technologies and services, but remain highly susceptible to out-of-distribution (OOD) samples, that is, drawn from a different distribution than the original training set. A common approach to address this issue is to endow deep neural networks with the ability to detect OOD samples. Several benchmarks have been proposed to design and validate OOD detection techniques. However, many of them are based on far-OOD samples drawn from very different distributions, and thus lack the complexity needed to capture the nuances of real-world scenarios. In this work, we introduce a comprehensive benchmark for OOD detection, based on ImageNet and Places365, that assigns individual classes as in-distribution or out-of-distribution depending on the semantic similarity with the training set. Several techniques can be used to determine which classes should be considered in-distribution, yielding benchmarks with varying properties. Experimental results on different OOD detection techniques show how their measured efficacy depends on the selected benchmark and how confidence-based techniques may outperform classifier-based ones on near-OOD samples.

4/17/2024

cs.LG cs.CV