Feature Density Estimation for Out-of-Distribution Detection via Normalizing Flows

2402.06537

Published 5/1/2024 by Evan D. Cook, Marc-Antoine Lavoie, Steven L. Waslander

Abstract

Out-of-distribution (OOD) detection is a critical task for safe deployment of learning systems in the open world setting. In this work, we investigate the use of feature density estimation via normalizing flows for OOD detection and present a fully unsupervised approach which requires no exposure to OOD data, avoiding researcher bias in OOD sample selection. This is a post-hoc method which can be applied to any pretrained model, and involves training a lightweight auxiliary normalizing flow model to perform the out-of-distribution detection via density thresholding. Experiments on OOD detection in image classification show strong results for far-OOD data detection with only a single epoch of flow training, including 98.2% AUROC for ImageNet-1k vs. Textures, which exceeds the state of the art by 7.8%. We additionally explore the connection between the feature space distribution of the pretrained model and the performance of our method. Finally, we provide insights into training pitfalls that have plagued normalizing flows for use in OOD detection.

Overview

This paper investigates the use of feature density estimation via normalizing flows for out-of-distribution (OOD) detection.
The authors present a fully unsupervised approach that does not require any exposure to OOD data, avoiding researcher bias in OOD sample selection.
The method involves training a lightweight auxiliary normalizing flow model to perform OOD detection via density thresholding.
Experiments on OOD detection in image classification show strong results for far-OOD data detection with only a single epoch of flow training.
The paper also explores the connection between the feature space distribution of the pretrained model and the performance of the OOD detection method.
Additionally, the paper provides insights into training pitfalls that have plagued normalizing flows for use in OOD detection.

Plain English Explanation

Out-of-distribution (OOD) detection is an important task for ensuring the safe deployment of machine learning systems in the real world. Imagine you've trained a model to recognize different types of animals in images, but you want to make sure it can also detect when it sees something that isn't an animal at all, like a car or a building. That's where OOD detection comes in.

In this paper, the researchers investigate a new approach to OOD detection that uses a technique called "normalizing flows." Normalizing flows are a way of modeling the distribution of data, which can be useful for identifying when a new sample doesn't fit that distribution. The key advantage of this approach is that it doesn't require any prior examples of out-of-distribution data, which can be tricky to obtain and can introduce bias.

Instead, the method works by training a separate, lightweight model to estimate the density of the features learned by the main model. This density information can then be used to identify when a new sample is significantly different from the data the main model was trained on, and thus likely to be out-of-distribution.
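The thresholding step described above can be illustrated with a minimal sketch. It assumes the density model has already produced a log-density score for each sample. The function names, the toy scores, and the choice to reject the lowest 10% of in-distribution scores are illustrative assumptions, not details from the paper.

```python
def pick_threshold(id_log_densities, reject_fraction=0.1):
    """Pick a cutoff so that roughly `reject_fraction` of in-distribution scores fall below it."""
    ordered = sorted(id_log_densities)
    cut_index = min(round(len(ordered) * reject_fraction), len(ordered) - 1)
    return ordered[cut_index]


def is_ood(log_density, threshold):
    """Flag a sample as out-of-distribution when its log-density is below the cutoff."""
    return log_density < threshold


# Toy log-density scores for held-out in-distribution samples (made-up numbers).
id_scores = [-3.1, -2.8, -3.4, -2.9, -3.0, -3.3, -2.7, -3.2, -9.5, -3.05]
threshold = pick_threshold(id_scores, reject_fraction=0.1)

flag_far = is_ood(-12.0, threshold)      # very low density, so it is flagged
flag_typical = is_ood(-3.0, threshold)   # typical density, so it is accepted
```

In this sketch the threshold is chosen from in-distribution data alone, which mirrors the point that no OOD examples are needed to set up the detector.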

The researchers report that this approach works well, especially at detecting OOD data that is very different from the training data (far-OOD). For example, with ImageNet-1k as the in-distribution dataset and Textures as the OOD dataset, the method reaches 98.2% AUROC, a metric that measures how reliably the detector ranks in-distribution samples above OOD samples. This exceeds the previous state of the art by 7.8%.

The paper also explores how the properties of the main model's feature space can impact the performance of the OOD detection method, and provides insights into some of the challenges that have historically made it difficult to use normalizing flows for this purpose.

Technical Explanation

The core idea of this work is to use feature density estimation via normalizing flows for the task of out-of-distribution (OOD) detection. Normalizing flows are a class of generative models that can learn a flexible probability density function over the input space. By training a lightweight auxiliary normalizing flow model on the feature representations of a pre-trained classification model, the authors show that effective OOD detection can be achieved in a fully unsupervised manner, without requiring any exposure to OOD data.
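For reference, the density a normalizing flow assigns to a sample follows from the standard change-of-variables formula. The notation here is chosen for illustration and is not taken from the paper: f_θ is the invertible flow, h is a feature vector from the pretrained model, and p_Z is a simple base density such as a standard Gaussian.

```latex
\log p_\theta(h) = \log p_Z\bigl(f_\theta(h)\bigr)
  + \log \left| \det \frac{\partial f_\theta(h)}{\partial h} \right|
```

Under this formulation, density thresholding amounts to flagging a sample as OOD when \log p_\theta(h) < \tau for some threshold \tau chosen from in-distribution data.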

The key advantages of this approach are:

It avoids the challenge of OOD sample selection, which can introduce researcher bias.
The auxiliary flow model can be trained efficiently in a single epoch, making it a practical post-hoc method that can be applied to any pre-trained model.
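To make the post-hoc, single-epoch idea concrete, here is a toy sketch. It is not the authors' architecture: it swaps in the simplest possible flow, an elementwise affine map with a standard normal base density, and trains it with one pass of stochastic gradient descent on synthetic 2-D "features". The learning rate, the synthetic data, and all names are assumptions made for illustration.

```python
import math
import random


def train_affine_flow(features, lr=0.01):
    """One SGD pass on negative log-likelihood for z = (x - shift) * exp(-log_scale)."""
    dim = len(features[0])
    shift = [0.0] * dim
    log_scale = [0.0] * dim
    for x in features:  # a single epoch: each sample is visited exactly once
        for d in range(dim):
            inv_scale = math.exp(-log_scale[d])
            z = (x[d] - shift[d]) * inv_scale
            # Gradients of the per-dimension loss 0.5 * z**2 + log_scale.
            grad_shift = -z * inv_scale
            grad_log_scale = 1.0 - z * z
            shift[d] -= lr * grad_shift
            log_scale[d] -= lr * grad_log_scale
    return shift, log_scale


def log_density(x, shift, log_scale):
    """Log-density under the affine flow via the change-of-variables formula."""
    total = 0.0
    for d in range(len(x)):
        z = (x[d] - shift[d]) * math.exp(-log_scale[d])
        total += -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - log_scale[d]
    return total


# Synthetic stand-in for features extracted from a frozen pretrained model.
rng = random.Random(0)
id_features = [[rng.gauss(2.0, 0.5), rng.gauss(-1.0, 1.5)] for _ in range(2000)]

shift, log_scale = train_affine_flow(id_features)
typical_score = log_density([2.0, -1.0], shift, log_scale)
far_score = log_density([10.0, 10.0], shift, log_scale)
```

The pretrained model is never updated in this setup; only the small auxiliary density model is fit, which is what makes the approach post-hoc. A practical flow would use far more expressive layers than this affine map.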

The authors evaluate their OOD detection method on image classification benchmarks, considering both "near-OOD" and "far-OOD" settings. For the far-OOD case, they report 98.2% AUROC on the ImageNet-1k vs. Textures task, exceeding the previous state of the art by 7.8%.
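As a reminder of what the AUROC figure measures, the sketch below computes it directly from its pairwise definition: the probability that a randomly chosen in-distribution sample receives a higher score than a randomly chosen OOD sample. The toy scores are made up, and treating higher log-density as "more in-distribution" is a convention assumed here.

```python
def auroc(id_scores, ood_scores):
    """Fraction of (ID, OOD) pairs where the ID score is higher; ties count as half."""
    wins = 0.0
    for s_id in id_scores:
        for s_ood in ood_scores:
            if s_id > s_ood:
                wins += 1.0
            elif s_id == s_ood:
                wins += 0.5
    return wins / (len(id_scores) * len(ood_scores))


# Made-up log-density scores for illustration only.
id_log_densities = [-1.0, -2.0, -3.0]
ood_log_densities = [-2.5, -5.0, -6.0]
score = auroc(id_log_densities, ood_log_densities)  # 8 of the 9 pairs are ranked correctly
```

Because AUROC depends only on the ranking of scores, it does not require committing to a specific threshold, which is why it is a common summary metric for OOD detectors.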

Additionally, the paper provides insights into the connection between the feature space distribution of the pre-trained model and the performance of the OOD detection method. The authors found that models with more compact feature distributions tend to yield better OOD detection results using their approach.

Finally, the paper discusses training pitfalls that have historically made it challenging to use normalizing flows effectively for OOD detection, and how their method helps address these issues.

Critical Analysis

The authors present a compelling and practical approach to out-of-distribution detection that avoids the challenges of OOD sample selection. By leveraging normalizing flows to estimate the density of the feature representations, their method can be applied as a post-hoc technique to any pre-trained model without requiring access to OOD data.

One notable strength of this work is its empirical results, particularly on far-OOD detection tasks. The 98.2% AUROC on ImageNet-1k vs. Textures exceeds the previous state of the art by 7.8%.

That said, the paper does not extensively explore the limitations or potential failure modes of the proposed approach. For example, it would be valuable to understand how the method performs on more subtle or "near-OOD" cases, where the out-of-distribution samples are more similar to the in-distribution data. Additionally, the paper does not provide much insight into the computational overhead or inference latency of the auxiliary normalizing flow model, which could be an important practical consideration for real-world deployment.

Another area for further research could be investigating how the OOD detection performance varies across different types of pre-trained models (e.g., convolutional neural networks vs. transformers) or different training regimes (e.g., self-supervised learning vs. supervised training). Exploring these connections could yield additional insights into the relationship between the feature space distribution and OOD detection capabilities.

Overall, this work presents a promising and innovative approach to the critical problem of out-of-distribution detection. While there are still avenues for further research and refinement, the authors have made a significant contribution to the field.

Conclusion

This paper investigates the use of normalizing flows for out-of-distribution (OOD) detection, a crucial task for ensuring the safe deployment of machine learning systems in real-world settings. The authors present a fully unsupervised approach that avoids the challenges of OOD sample selection, which can introduce researcher bias.

By training a lightweight auxiliary normalizing flow model to estimate the density of the feature representations learned by a pre-trained classification model, the method can effectively identify OOD samples without any prior exposure to out-of-distribution data. The strong empirical results, particularly on far-OOD detection tasks, demonstrate the significant potential of this approach.

While the paper does not extensively explore the limitations of the method, it provides valuable insights into the relationship between the feature space distribution and OOD detection performance. Additionally, the authors highlight important training pitfalls that have historically plagued the use of normalizing flows for this purpose.

Overall, this work represents an important step forward in the field of OOD detection, with the potential to enable more reliable and robust machine learning systems that can safely operate in the open world. As the field continues to evolve, this research can serve as a foundation for further advancements in this critical area of study.
