Approximations to the Fisher Information Metric of Deep Generative Models for Out-Of-Distribution Detection

arXiv:2403.01485

Published 5/28/2024 by Sam Dauncey, Chris Holmes, Christopher Williams, Fabian Falck

Abstract

Likelihood-based deep generative models such as score-based diffusion models and variational autoencoders are state-of-the-art machine learning models approximating high-dimensional distributions of data such as images, text, or audio. One of many downstream tasks they can be naturally applied to is out-of-distribution (OOD) detection. However, seminal work by Nalisnick et al. which we reproduce showed that deep generative models consistently infer higher log-likelihoods for OOD data than data they were trained on, marking an open problem. In this work, we analyse using the gradient of a data point with respect to the parameters of the deep generative model for OOD detection, based on the simple intuition that OOD data should have larger gradient norms than training data. We formalise measuring the size of the gradient as approximating the Fisher information metric. We show that the Fisher information matrix (FIM) has large absolute diagonal values, motivating the use of chi-square distributed, layer-wise gradient norms as features. We combine these features to make a simple, model-agnostic and hyperparameter-free method for OOD detection which estimates the joint density of the layer-wise gradient norms for a given data point. We find that these layer-wise gradient norms are weakly correlated, rendering their combined usage informative, and prove that the layer-wise gradient norms satisfy the principle of (data representation) invariance. Our empirical results indicate that this method outperforms the Typicality test for most deep generative models and image dataset pairings.

Overview

The paper proposes a method for detecting out-of-distribution (OOD) samples with deep generative models, based on approximations to the Fisher Information Metric (FIM).
OOD detection is important for ensuring the reliability and safety of machine learning systems in real-world applications.
The authors formalise measuring the size of the gradient of a data point with respect to the model's parameters as approximating the FIM, and use layer-wise gradient norms as features whose joint density is estimated for each data point.
The resulting method is model-agnostic and hyperparameter-free, and is reported to outperform the Typicality test for most deep generative model and image dataset pairings.

Plain English Explanation

The paper focuses on an important problem in machine learning called "out-of-distribution" (OOD) detection. This refers to the ability of a model to recognize when it's being presented with data that is very different from the type of data it was trained on.

For example, imagine a model that was trained to recognize images of dogs and cats. If you showed it an image of a car, the model should be able to detect that this is something completely different from what it was trained on, and flag it as "out-of-distribution".

The authors propose using a mathematical concept called the "Fisher Information Metric" (FIM) to help detect these OOD samples. The FIM measures how sensitive the model's predicted likelihood is to small changes in the model's parameters.

The key intuition is that OOD samples should produce larger gradients with respect to the model's parameters than the data the model was trained on. Measuring the size of these gradients, which the authors formalise as approximating the FIM, gives a signal for identifying OOD cases.

In practice, the method computes a gradient norm separately for each layer of the model and estimates the joint density of these layer-wise norms, so that a data point with an unusual combination of norms can be flagged as OOD.

Ultimately, the goal is to make machine learning models more reliable and robust, so they can be safely deployed in real-world applications where encountering unexpected, "out-of-distribution" data is a common challenge.

Technical Explanation

The paper analyses approximations to the Fisher Information Metric (FIM) for the purpose of out-of-distribution (OOD) detection in deep generative models such as score-based diffusion models and variational autoencoders.

The starting point is the gradient of a data point with respect to the parameters of the deep generative model. The authors' intuition is that OOD data should have larger gradient norms than training data, and they formalise measuring the size of this gradient as approximating the FIM.
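For reference, the quantities involved can be written as follows. This is a sketch in standard notation rather than the paper's own: the symbols $\theta$, $F(\theta)$, $\theta_j$ and the per-layer scales $\alpha_j$ are assumptions introduced here for illustration, and the block-diagonal form is only one way to read the abstract's remark about large diagonal values.

```latex
% Score of a data point x: gradient of the log-likelihood w.r.t. the parameters
\nabla_\theta \ell(x) = \nabla_\theta \log p_\theta(x)

% Fisher information matrix
F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[ \nabla_\theta \ell(x)\, \nabla_\theta \ell(x)^\top \right]

% Size of the score measured under the Fisher information metric
\lVert \nabla_\theta \ell(x) \rVert_{F^{-1}}^2
  = \nabla_\theta \ell(x)^\top F(\theta)^{-1} \nabla_\theta \ell(x)

% Assumed approximation: F is a multiple of the identity on each layer j
F(\theta) \approx \operatorname{blockdiag}(\alpha_1 I, \dots, \alpha_J I)
\;\Longrightarrow\;
\lVert \nabla_\theta \ell(x) \rVert_{F^{-1}}^2
  \approx \sum_{j=1}^{J} \frac{\lVert \nabla_{\theta_j} \ell(x) \rVert_2^2}{\alpha_j}
```

Under standard regularity conditions the score has mean zero and covariance $F(\theta)$ under the model, so if it is additionally assumed to be approximately Gaussian, the quadratic form above is chi-square distributed. This is consistent with the abstract's description of chi-square distributed, layer-wise gradient norms.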

The method rests on three components:

Gradient norms as an approximation to the FIM: The method takes the gradient of the model's log-likelihood at a data point with respect to the model's parameters, and treats measuring the size of this gradient as approximating the Fisher information metric.
Layer-wise gradient norms as features: The authors show that the Fisher information matrix has large absolute diagonal values, which motivates using chi-square distributed, layer-wise gradient norms as features.
Joint density estimation: The features are combined by estimating the joint density of the layer-wise gradient norms for a given data point, giving a simple, model-agnostic and hyperparameter-free OOD detector. The layer-wise norms are found to be weakly correlated, so combining them is informative, and they are proven to satisfy the principle of (data representation) invariance.
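As a rough illustration of the layer-wise gradient-norm features described in the abstract, the sketch below uses a toy diagonal Gaussian as the "generative model", with its mean and log-standard-deviation vectors standing in for two layers. Everything here is an assumption made for illustration: the toy model, the function names (`layer_grad_sq_norms`, `fit_feature_density`, `ood_score`), and in particular the density estimator (independent Gaussians fitted to the log squared norms). It is not the authors' implementation.

```python
import math
import random


def layer_grad_sq_norms(x, mu, log_sigma):
    """Squared gradient norms of log p(x) for a toy diagonal Gaussian.

    `mu` and `log_sigma` are treated as two separate "layers" (an assumption
    of this sketch), so two squared norms are returned.
    """
    g_mu_sq, g_ls_sq = 0.0, 0.0
    for xi, m, s in zip(x, mu, log_sigma):
        sigma = math.exp(s)
        z = (xi - m) / sigma
        g_mu_sq += (z / sigma) ** 2    # d log p / d mu_i = (x_i - mu_i) / sigma_i^2
        g_ls_sq += (z * z - 1.0) ** 2  # d log p / d log_sigma_i = z_i^2 - 1
    return [g_mu_sq, g_ls_sq]


def features(x, mu, log_sigma, eps=1e-12):
    """Log squared layer-wise gradient norms (eps avoids log(0))."""
    return [math.log(n + eps) for n in layer_grad_sq_norms(x, mu, log_sigma)]


def fit_feature_density(feature_rows):
    """Fit an independent Gaussian to each feature: list of (mean, std)."""
    stats = []
    for col in zip(*feature_rows):
        mean = sum(col) / len(col)
        var = sum((v - mean) ** 2 for v in col) / len(col)
        stats.append((mean, math.sqrt(var) + 1e-8))
    return stats


def ood_score(x, mu, log_sigma, stats):
    """Log joint density of the features; lower values suggest OOD."""
    total = 0.0
    for f, (mean, std) in zip(features(x, mu, log_sigma), stats):
        total += (-0.5 * math.log(2 * math.pi) - math.log(std)
                  - 0.5 * ((f - mean) / std) ** 2)
    return total


def sample(rng, dim, shift=0.0):
    """Draw one point from N(shift, 1) in each of `dim` dimensions."""
    return [rng.gauss(shift, 1.0) for _ in range(dim)]


rng = random.Random(0)
dim = 8
train = [sample(rng, dim) for _ in range(500)]

# Fit the toy model to the training data by maximum likelihood.
columns = list(zip(*train))
mu = [sum(col) / len(col) for col in columns]
log_sigma = [0.5 * math.log(sum((v - m) ** 2 for v in col) / len(col))
             for col, m in zip(columns, mu)]

# Fit the feature density on the training data's gradient-norm features.
stats = fit_feature_density([features(x, mu, log_sigma) for x in train])

# Score fresh in-distribution points and shifted (OOD) points.
in_scores = [ood_score(sample(rng, dim), mu, log_sigma, stats)
             for _ in range(200)]
ood_scores = [ood_score(sample(rng, dim, shift=3.0), mu, log_sigma, stats)
              for _ in range(200)]
mean_in = sum(in_scores) / len(in_scores)
mean_ood = sum(ood_scores) / len(ood_scores)
print("mean in-distribution score:", mean_in)
print("mean OOD score:", mean_ood)
```

In this toy setting the shifted points produce much larger gradient norms in both layers, so their features fall far into the tails of the fitted density and their scores are lower on average. For a real deep generative model the per-layer gradients would come from automatic differentiation rather than closed-form expressions.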

The method is evaluated empirically, and the results indicate that it outperforms the Typicality test for most deep generative model and image dataset pairings.

Critical Analysis

The paper investigates gradient-based approximations to the FIM for OOD detection, pairing theoretical results, such as the invariance proof, with empirical evaluation on image datasets.

One potential limitation is that the method relies on access to the model's parameters and their gradients, which may not always be readily available, especially in the context of large language models where the models are often treated as black boxes.

Additionally, the paper focuses on deep generative models, and it's unclear how well the proposed techniques would generalize to other model architectures or tasks beyond image generation.

Further research could explore ways to make the FIM approximations more efficient and applicable to a wider range of machine learning models and real-world scenarios.

Conclusion

This paper contributes to the field of out-of-distribution detection by proposing a simple, model-agnostic and hyperparameter-free method based on approximations to the Fisher Information Metric. The authors report that it outperforms the Typicality test for most deep generative model and image dataset pairings, offering a more robust way to detect unfamiliar data when machine learning systems are deployed in real-world applications.

The insights and techniques presented in this work have the potential to improve the overall robustness and trustworthiness of deep learning models, which is crucial as these models become increasingly ubiquitous in high-stakes decision-making processes. Further developments in this area could have far-reaching implications for the responsible and ethical deployment of AI systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Gradient-Regularized Out-of-Distribution Detection

Sina Sharifi, Taha Entesari, Bardia Safaei, Vishal M. Patel, Mahyar Fazlyab

One of the challenges for neural networks in real-life applications is the overconfident errors these models make when the data is not from the original training distribution. Addressing this issue is known as Out-of-Distribution (OOD) detection. Many state-of-the-art OOD methods employ an auxiliary dataset as a surrogate for OOD data during training to achieve improved performance. However, these methods fail to fully exploit the local information embedded in the auxiliary dataset. In this work, we propose the idea of leveraging the information embedded in the gradient of the loss function during training to enable the network to not only learn a desired OOD score for each sample but also to exhibit similar behavior in a local neighborhood around each sample. We also develop a novel energy-based sampling method to allow the network to be exposed to more informative OOD samples during the training phase. This is especially important when the auxiliary dataset is large. We demonstrate the effectiveness of our method through extensive experiments on several OOD benchmarks, improving the existing state-of-the-art FPR95 by 4% on our ImageNet experiment. We further provide a theoretical analysis through the lens of certified robustness and Lipschitz analysis to showcase the theoretical foundation of our work. We will publicly release our code after the review process.

4/24/2024

cs.CV cs.LG

A Geometric Explanation of the Likelihood OOD Detection Paradox

Hamidreza Kamkari, Brendan Leigh Ross, Jesse C. Cresswell, Anthony L. Caterini, Rahul G. Krishnan, Gabriel Loaiza-Ganem

Likelihood-based deep generative models (DGMs) commonly exhibit a puzzling behaviour: when trained on a relatively complex dataset, they assign higher likelihood values to out-of-distribution (OOD) data from simpler sources. Adding to the mystery, OOD samples are never generated by these DGMs despite having higher likelihoods. This two-pronged paradox has yet to be conclusively explained, making likelihood-based OOD detection unreliable. Our primary observation is that high-likelihood regions will not be generated if they contain minimal probability mass. We demonstrate how this seeming contradiction of large densities yet low probability mass can occur around data confined to low-dimensional manifolds. We also show that this scenario can be identified through local intrinsic dimension (LID) estimation, and propose a method for OOD detection which pairs the likelihoods and LID estimates obtained from a pre-trained DGM. Our method can be applied to normalizing flows and score-based diffusion models, and obtains results which match or surpass state-of-the-art OOD detection benchmarks using the same DGM backbones. Our code is available at https://github.com/layer6ai-labs/dgm_ood_detection.

6/13/2024

cs.LG cs.AI cs.CV stat.ML

Trajectory Volatility for Out-of-Distribution Detection in Mathematical Reasoning

Yiming Wang, Pei Zhang, Baosong Yang, Derek F. Wong, Zhuosheng Zhang, Rui Wang

Real-world data deviating from the independent and identically distributed (i.i.d.) assumption of in-distribution training data poses security threats to deep networks, thus advancing out-of-distribution (OOD) detection algorithms. Detection methods in generative language models (GLMs) mainly focus on uncertainty estimation and embedding distance measurement, with the latter proven to be most effective in traditional linguistic tasks like summarization and translation. However, another complex generative scenario mathematical reasoning poses significant challenges to embedding-based methods due to its high-density feature of output spaces, but this feature causes larger discrepancies in the embedding shift trajectory between different samples in latent spaces. Hence, we propose a trajectory-based method TV score, which uses trajectory volatility for OOD detection in mathematical reasoning. Experiments show that our method outperforms all traditional algorithms on GLMs under mathematical reasoning scenarios and can be extended to more applications with high-density features in output spaces, such as multiple-choice questions.

5/24/2024

cs.CL cs.AI cs.LG

An Information-Theoretic Framework for Out-of-Distribution Generalization

Wenliang Liu, Guanding Yu, Lele Wang, Renjie Liao

We study the Out-of-Distribution (OOD) generalization in machine learning and propose a general framework that provides information-theoretic generalization bounds. Our framework interpolates freely between Integral Probability Metric (IPM) and $f$-divergence, which naturally recovers some known results (including Wasserstein- and KL-bounds), as well as yields new generalization bounds. Moreover, we show that our framework admits an optimal transport interpretation. When evaluated in two concrete examples, the proposed bounds either strictly improve upon existing bounds in some cases or recover the best among existing OOD generalization bounds.

4/1/2024

cs.IT cs.LG