FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning

Read original: arXiv:2407.13711 - Published 7/19/2024 by Tristan Cinquin, Marvin Pförtner, Vincent Fortuin, Philipp Hennig, Robert Bamler

Overview

The paper proposes "FSP-Laplace", an approach to Bayesian deep learning that combines the Laplace approximation with priors defined in function space.
The Laplace approximation is a popular way to add uncertainty estimates to deep networks, but the choice of prior strongly affects the resulting posterior, and in practice the prior is usually limited to an isotropic Gaussian over the weights, which is known to behave pathologically as depth increases.
FSP-Laplace addresses this by placing a Gaussian process (GP) prior directly on the function the network represents, rather than on its parameters.
The authors report improved results where prior knowledge is abundant, such as many scientific inference tasks, and competitive performance on black-box regression and classification tasks.

Plain English Explanation

The paper proposes a new way to do Bayesian deep learning, a family of techniques that lets neural networks express uncertainty in their predictions.

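One popular tool for this is the Laplace approximation, which approximates the posterior distribution over the model's parameters with a Gaussian centered on the trained weights. It can be applied without changing the network's predictions and scales to large models. However, the result depends strongly on the prior placed on the parameters, and in practice that prior is usually a simple isotropic Gaussian, because weights are hard to interpret and richer priors are hard to compute with.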

The key idea in this paper is to define the prior directly on the function the neural network computes, rather than on its individual parameters. With this "function-space prior" (FSP), one can state interpretable assumptions, such as how smooth or how periodic the function should be, while still benefiting from the implicit inductive biases that help deep networks generalize. The approach is aimed at better uncertainty estimates, which matter for tasks like detecting when the model is used on data very different from what it was trained on.

Technical Explanation

FSP-Laplace introduces a new way to perform Bayesian inference in deep neural networks using the Laplace approximation. The traditional Laplace approximation places a prior over the model parameters, typically an isotropic Gaussian, and the resulting posterior depends strongly on that choice.
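For reference, the conventional weight-space Laplace approximation can be sketched on a toy problem. The example below is a minimal illustration and not code from the paper: it assumes a logistic regression model with an isotropic Gaussian prior, and the names `laplace_logistic` and `prior_precision` are hypothetical. It finds the posterior mode with Newton's method and uses the inverse Hessian at the mode as the Gaussian covariance.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neg_log_posterior_grad_hess(w, X, y, prior_precision):
    """Gradient and Hessian of the negative log posterior for logistic
    regression with an isotropic prior N(0, I / prior_precision)."""
    p = sigmoid(X @ w)
    grad = X.T @ (p - y) + prior_precision * w
    hess = X.T @ (X * (p * (1.0 - p))[:, None]) + prior_precision * np.eye(X.shape[1])
    return grad, hess

def laplace_logistic(X, y, prior_precision=1.0, n_iter=50):
    """Return the MAP weights and the Laplace covariance (inverse Hessian)."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):  # plain Newton steps; adequate for this toy problem
        grad, hess = neg_log_posterior_grad_hess(w, X, y, prior_precision)
        w = w - np.linalg.solve(hess, grad)
    _, hess = neg_log_posterior_grad_hess(w, X, y, prior_precision)
    return w, np.linalg.inv(hess)

# Synthetic data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w_true = np.array([1.5, -2.0])
y = (rng.uniform(size=200) < sigmoid(X @ w_true)).astype(float)

# The same data under a weak and a strong isotropic prior.
w_map, cov = laplace_logistic(X, y, prior_precision=1.0)
w_strong, cov_strong = laplace_logistic(X, y, prior_precision=100.0)
print("weak prior:  ", w_map, np.trace(cov))
print("strong prior:", w_strong, np.trace(cov_strong))
```

Changing only `prior_precision` moves both the mode and the posterior covariance, which is the dependence on the prior that the text describes.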

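To address this issue, the authors propose using a function-space prior instead of a parameter-space prior. This means defining the prior directly on the function mapping from inputs to outputs, rather than on the individual weights and biases of the network. Concretely, the prior is a Gaussian process (GP) restricted to the functions the network can represent, which makes it possible to encode structured assumptions such as regularity or periodicity.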
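To make the idea of a prior over functions more concrete, the sketch below evaluates a GP prior at a finite set of input points. It is an illustration under stated assumptions, not the paper's implementation: the periodic kernel, the evaluation grid `x_ctx`, the `jitter` term, and the helper `gp_penalty` are all choices made for this example. The penalty `0.5 * f^T K^{-1} f` is the negative log density of a zero-mean GP at those points, up to a constant.

```python
import numpy as np

def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0, variance=1.0):
    """Standard periodic (exp-sine-squared) kernel on 1D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

def gp_penalty(f_vals, K, jitter=1e-4):
    """0.5 * f^T (K + jitter I)^{-1} f, computed via a Cholesky factor."""
    L = np.linalg.cholesky(K + jitter * np.eye(len(f_vals)))
    alpha = np.linalg.solve(L, f_vals)
    return 0.5 * alpha @ alpha

# Points at which the prior is evaluated (an assumed grid).
x_ctx = np.linspace(0.0, 4.0, 40)
K = periodic_kernel(x_ctx, x_ctx, period=1.0)

# Two candidate functions: one matches the prior's periodicity, one does not.
f_periodic = np.sin(2.0 * np.pi * x_ctx)
f_linear = x_ctx - 2.0

pen_periodic = gp_penalty(f_periodic, K)
pen_linear = gp_penalty(f_linear, K)
print("penalty, periodic function:", pen_periodic)
print("penalty, linear function:  ", pen_linear)

# Draw a few functions from the prior to inspect what it favors.
rng = np.random.default_rng(0)
L = np.linalg.cholesky(K + 1e-4 * np.eye(len(x_ctx)))
samples = L @ rng.normal(size=(len(x_ctx), 3))  # each column is one prior draw
```

The periodic function receives a much smaller penalty than the linear one, so a prior of this kind expresses "the function should repeat with period 1" without referring to any network weights.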

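The key technical contribution is a way to make the Laplace approximation work with such a prior. Because Lebesgue densities do not exist on infinite-dimensional function spaces, the authors recast training as finding the so-called weak mode of the posterior measure under the GP prior. After linearizing the model, the training objective induces a negative log-posterior density, to which they apply a Laplace approximation using scalable matrix-free linear algebra.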
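A heavily simplified picture of a Laplace approximation under a function-space prior is sketched below. It is not the authors' algorithm: it assumes a Gaussian likelihood, a model that is linear in its parameters (so the Jacobian is just the feature matrix and linearization is exact), a GP prior evaluated at a finite set of assumed points `x_ctx`, and dense linear solves. For a neural network, the Jacobians would be taken at the trained weights. All names are hypothetical.

```python
import numpy as np

def features(x):
    """Jacobian of f(x; w) = w0 + w1*x + w2*sin(2*pi*x) + w3*cos(2*pi*x) w.r.t. w."""
    return np.stack([np.ones_like(x), x,
                     np.sin(2.0 * np.pi * x), np.cos(2.0 * np.pi * x)], axis=1)

def periodic_kernel(x1, x2, period=1.0, lengthscale=1.0):
    d = x1[:, None] - x2[None, :]
    return np.exp(-2.0 * np.sin(np.pi * d / period) ** 2 / lengthscale ** 2)

noise_std = 0.1
x_train = np.linspace(0.0, 1.0, 20)   # training inputs
x_ctx = np.linspace(-1.0, 3.0, 50)    # points where the GP prior is evaluated
x_test = np.array([0.5, 3.25])        # one point inside the data, one far outside

J_train, J_ctx, J_test = features(x_train), features(x_ctx), features(x_test)

# Likelihood term of the Hessian. With a Gaussian likelihood it does not
# depend on the targets, so no y values are needed for the covariance.
data_precision = J_train.T @ J_train / noise_std ** 2

# Weight-space Laplace: isotropic Gaussian prior with unit precision.
P_weight = data_precision + np.eye(J_train.shape[1])

# Function-space variant: the prior precision is J_ctx^T K^{-1} J_ctx.
K_ctx = periodic_kernel(x_ctx, x_ctx) + 1e-4 * np.eye(len(x_ctx))
P_func = data_precision + J_ctx.T @ np.linalg.solve(K_ctx, J_ctx)

def predictive_variance(J, P):
    """Diagonal of J P^{-1} J^T: variance of the linearized model output."""
    return np.einsum("ij,ij->i", J, np.linalg.solve(P, J.T).T)

var_weight = predictive_variance(J_test, P_weight)
var_func = predictive_variance(J_test, P_func)
print("weight-space prior   :", var_weight)
print("function-space prior :", var_func)
```

In this toy setting the periodic GP prior strongly constrains the non-periodic linear term, so the predictive variance far from the data is smaller than under the isotropic weight prior. Dense solves like these do not scale to real networks, which is where the matrix-free methods mentioned in the text come in.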

The authors evaluate FSP-Laplace on a range of tasks. According to the abstract, it improves results where prior knowledge is abundant, for example in many scientific inference tasks, and stays competitive on black-box regression and classification tasks, where neural networks typically excel.

Critical Analysis

The paper presents a compelling approach to addressing a key limitation of the Laplace approximation in Bayesian deep learning. By moving to a function-space prior, the authors are able to sidestep issues with parameter-space priors that can lead to poor posterior estimates.

However, some potential limitations of the FSP-Laplace method deserve attention. For example, the computational cost of the method may be a practical concern for large-scale models, even with the matrix-free linear algebra the authors rely on. Additionally, how sensitive the results are to the choice of function-space prior itself, such as the GP kernel, is an area that could benefit from further study.

More broadly, while the paper demonstrates promising empirical results, there may be open questions about the theoretical properties and generalization of the function-space prior approach. Further research into the mathematical foundations could help solidify the conceptual advantages of this technique.

Overall, this paper represents an important step forward in Bayesian deep learning, but there remain opportunities to build upon this work and explore its broader implications.

Conclusion

FSP-Laplace introduces a function-space prior approach to improve the Laplace approximation in Bayesian deep learning. By defining the prior directly on the function the network represents rather than on the model parameters, the method aims to produce posterior estimates that reflect interpretable prior knowledge.

This has significant potential benefits for tasks like uncertainty quantification and out-of-distribution detection, which are crucial for the safe and reliable deployment of deep learning systems. While the paper leaves some avenues for further research, it represents an important advance in the field of Bayesian deep learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

FSP-Laplace: Function-Space Priors for the Laplace Approximation in Bayesian Deep Learning

Tristan Cinquin, Marvin Pförtner, Vincent Fortuin, Philipp Hennig, Robert Bamler

Laplace approximations are popular techniques for endowing deep networks with epistemic uncertainty estimates as they can be applied without altering the predictions of the neural network, and they scale to large models and datasets. While the choice of prior strongly affects the resulting posterior distribution, computational tractability and lack of interpretability of weight space typically limit the Laplace approximation to isotropic Gaussian priors, which are known to cause pathological behavior as depth increases. As a remedy, we directly place a prior on function space. More precisely, since Lebesgue densities do not exist on infinite-dimensional function spaces, we have to recast training as finding the so-called weak mode of the posterior measure under a Gaussian process (GP) prior restricted to the space of functions representable by the neural network. Through the GP prior, one can express structured and interpretable inductive biases, such as regularity or periodicity, directly in function space, while still exploiting the implicit inductive biases that allow deep networks to generalize. After model linearization, the training objective induces a negative log-posterior density to which we apply a Laplace approximation, leveraging highly scalable methods from matrix-free linear algebra. Our method provides improved results where prior knowledge is abundant, e.g., in many scientific inference tasks. At the same time, it stays competitive for black-box regression and classification tasks where neural networks typically excel.

7/19/2024

Generalized Laplace Approximation

Yinsong Chen, Samson S. Yu, Zhong Li, Chee Peng Lim

In recent years, the inconsistency in Bayesian deep learning has garnered increasing attention. Tempered or generalized posterior distributions often offer a direct and effective solution to this issue. However, understanding the underlying causes and evaluating the effectiveness of generalized posteriors remain active areas of research. In this study, we introduce a unified theoretical framework to attribute Bayesian inconsistency to model misspecification and inadequate priors. We interpret the generalization of the posterior with a temperature factor as a correction for misspecified models through adjustments to the joint probability model, and the recalibration of priors by redistributing probability mass on models within the hypothesis space using data samples. Additionally, we highlight a distinctive feature of Laplace approximation, which ensures that the generalized normalizing constant can be treated as invariant, unlike the typical scenario in general Bayesian learning where this constant varies with model parameters post-generalization. Building on this insight, we propose the generalized Laplace approximation, which involves a simple adjustment to the computation of the Hessian matrix of the regularized loss function. This method offers a flexible and scalable framework for obtaining high-quality posterior distributions. We assess the performance and properties of the generalized Laplace approximation on state-of-the-art neural networks and real-world datasets.

5/27/2024

Efficient Bayesian Updates for Deep Learning via Laplace Approximations

Denis Huseljic, Marek Herde, Lukas Rauch, Paul Hahn, Zhixin Huang, Daniel Kottke, Stephan Vogt, Bernhard Sick

Since training deep neural networks takes significant computational resources, extending the training dataset with new data is difficult, as it typically requires complete retraining. Moreover, specific applications do not allow costly retraining due to time or computational constraints. We address this issue by proposing a novel Bayesian update method for deep neural networks by using a last-layer Laplace approximation. Concretely, we leverage second-order optimization techniques on the Gaussian posterior distribution of a Laplace approximation, computing the inverse Hessian matrix in closed form. This way, our method allows for fast and effective updates upon the arrival of new data in a stationary setting. A large-scale evaluation study across different data modalities confirms that our updates are a fast and competitive alternative to costly retraining. Furthermore, we demonstrate its applicability in a deep active learning scenario by using our update to improve existing selection strategies.

7/15/2024

Variational Linearized Laplace Approximation for Bayesian Deep Learning

Luis A. Ortega, Simón Rodríguez Santana, Daniel Hernández-Lobato

The Linearized Laplace Approximation (LLA) has been recently used to perform uncertainty estimation on the predictions of pre-trained deep neural networks (DNNs). However, its widespread application is hindered by significant computational costs, particularly in scenarios with a large number of training points or DNN parameters. Consequently, additional approximations of LLA, such as Kronecker-factored or diagonal approximate GGN matrices, are utilized, potentially compromising the model's performance. To address these challenges, we propose a new method for approximating LLA using a variational sparse Gaussian Process (GP). Our method is based on the dual RKHS formulation of GPs and retains, as the predictive mean, the output of the original DNN. Furthermore, it allows for efficient stochastic optimization, which results in sub-linear training time in the size of the training dataset. Specifically, its training cost is independent of the number of training points. We compare our proposed method against accelerated LLA (ELLA), which relies on the Nystrom approximation, as well as other LLA variants employing the sample-then-optimize principle. Experimental results, both on regression and classification datasets, show that our method outperforms these already existing efficient variants of LLA, both in terms of the quality of the predictive distribution and in terms of total computational time.

5/24/2024