Amortized Variational Inference: When and Why?

2307.11018

Published 5/27/2024 by Charles C. Margossian, David M. Blei

✨

Abstract

In a probabilistic latent variable model, factorized (or mean-field) variational inference (F-VI) fits a separate parametric distribution for each latent variable. Amortized variational inference (A-VI) instead learns a common inference function, which maps each observation to its corresponding latent variable's approximate posterior. Typically, A-VI is used as a step in the training of variational autoencoders, however it stands to reason that A-VI could also be used as a general alternative to F-VI. In this paper we study when and why A-VI can be used for approximate Bayesian inference. We derive conditions on a latent variable model which are necessary, sufficient, and verifiable under which A-VI can attain F-VI's optimal solution, thereby closing the amortization gap. We prove these conditions are uniquely verified by simple hierarchical models, a broad class that encompasses many models in machine learning. We then show, on a broader class of models, how to expand the domain of AVI's inference function to improve its solution, and we provide examples, e.g. hidden Markov models, where the amortization gap cannot be closed.

Create account to get full access

Overview

In a probabilistic latent variable model, variational inference is a way to approximate the true posterior distribution of the latent variables.
Factorized (or mean-field) variational inference fits a separate parametric distribution for each latent variable.
Amortized variational inference learns a common inference function that maps observations to their corresponding latent variable's approximate posterior.
This paper studies when and why amortized variational inference can be used as a general alternative to factorized variational inference.

Plain English Explanation

In machine learning, we often work with models that have hidden or "latent" variables. These latent variables represent underlying factors that we can't directly observe, but that influence the data we do observe. Variational inference is a technique used to approximate the true distribution of these latent variables.

One approach to variational inference is called "factorized" or "mean-field" variational inference. This method fits a separate probability distribution for each latent variable. An alternative approach is "amortized" variational inference, which learns a single inference function that can map any observation to the approximate posterior distribution of its corresponding latent variables.

Amortized variational inference is often used as part of training variational autoencoders, but this paper explores whether it could be used more broadly as an alternative to factorized variational inference.

Technical Explanation

The paper derives the conditions under which amortized variational inference (A-VI) can achieve the same optimal solution as factorized (or mean-field) variational inference (F-VI).

The authors show that these conditions are uniquely verified by a broad class of "simple hierarchical models," which encompass many common machine learning models. This means that for this class of models, A-VI can be used as a general alternative to F-VI without sacrificing accuracy.

For a broader class of models, the paper discusses how to expand the domain of the A-VI inference function to improve its solution. The authors also provide examples, such as hidden Markov models, where the "amortization gap" between A-VI and F-VI cannot be closed.

Critical Analysis

The paper provides a rigorous theoretical analysis of the conditions under which amortized variational inference can achieve the same optimal solution as factorized variational inference. This is an important result, as it demonstrates that A-VI can be used as a general-purpose alternative to F-VI in many common machine learning models.

However, the paper also acknowledges that there are limitations to this result. The class of "simple hierarchical models" where the conditions are uniquely verified may not capture all the models researchers are interested in. Additionally, for a broader class of models, the authors show that the "amortization gap" cannot be closed, suggesting that A-VI may not be suitable in those cases.

Further research could explore ways to expand the applicability of A-VI beyond the simple hierarchical models, or investigate hybrid approaches that combine the strengths of both A-VI and F-VI. Additionally, empirical studies comparing the performance of A-VI and F-VI in various real-world scenarios would help validate the theoretical insights presented in this paper.

Conclusion

This paper makes an important contribution to the understanding of amortized variational inference and its relationship to factorized variational inference. By deriving the conditions under which A-VI can achieve the same optimal solution as F-VI, the authors have shown that A-VI can be a viable and general-purpose alternative in a broad class of machine learning models.

This work has the potential to inform the design and selection of variational inference techniques for a wide range of probabilistic modeling tasks, and may lead to further advancements in the field of scalable and efficient Bayesian inference [^1].

[^1]: ASPIRE: Iterative Amortized Posterior Inference for Bayesian Inverse Problems

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ASPIRE: Iterative Amortized Posterior Inference for Bayesian Inverse Problems

Rafael Orozco, Ali Siahkoohi, Mathias Louboutin, Felix J. Herrmann

Due to their uncertainty quantification, Bayesian solutions to inverse problems are the framework of choice in applications that are risk averse. These benefits come at the cost of computations that are in general, intractable. New advances in machine learning and variational inference (VI) have lowered the computational barrier by learning from examples. Two VI paradigms have emerged that represent different tradeoffs: amortized and non-amortized. Amortized VI can produce fast results but due to generalizing to many observed datasets it produces suboptimal inference results. Non-amortized VI is slower at inference but finds better posterior approximations since it is specialized towards a single observed dataset. Current amortized VI techniques run into a sub-optimality wall that can not be improved without more expressive neural networks or extra training data. We present a solution that enables iterative improvement of amortized posteriors that uses the same networks architectures and training data. The benefits of our method requires extra computations but these remain frugal since they are based on physics-hybrid methods and summary statistics. Importantly, these computations remain mostly offline thus our method maintains cheap and reusable online evaluation while bridging the approximation gap these two paradigms. We denote our proposed method ASPIRE - Amortized posteriors with Summaries that are Physics-based and Iteratively REfined. We first validate our method on a stylized problem with a known posterior then demonstrate its practical use on a high-dimensional and nonlinear transcranial medical imaging problem with ultrasound. Compared with the baseline and previous methods from the literature our method stands out as an computationally efficient and high-fidelity method for posterior inference.

5/10/2024

cs.LG stat.ML

🤯

Learning to solve Bayesian inverse problems: An amortized variational inference approach using Gaussian and Flow guides

Sharmila Karumuri, Ilias Bilionis

Inverse problems, i.e., estimating parameters of physical models from experimental data, are ubiquitous in science and engineering. The Bayesian formulation is the gold standard because it alleviates ill-posedness issues and quantifies epistemic uncertainty. Since analytical posteriors are not typically available, one resorts to Markov chain Monte Carlo sampling or approximate variational inference. However, inference needs to be rerun from scratch for each new set of data. This drawback limits the applicability of the Bayesian formulation to real-time settings, e.g., health monitoring of engineered systems, and medical diagnosis. The objective of this paper is to develop a methodology that enables real-time inference by learning the Bayesian inverse map, i.e., the map from data to posteriors. Our approach is as follows. We parameterize the posterior distribution as a function of data. This work outlines two distinct approaches to do this. The first method involves parameterizing the posterior using an amortized full-rank Gaussian guide, implemented through neural networks. The second method utilizes a Conditional Normalizing Flow guide, employing conditional invertible neural networks for cases where the target posterior is arbitrarily complex. In both approaches, we learn the network parameters by amortized variational inference which involves maximizing the expectation of evidence lower bound over all possible datasets compatible with the model. We demonstrate our approach by solving a set of benchmark problems from science and engineering. Our results show that the posterior estimates of our approach are in agreement with the corresponding ground truth obtained by Markov chain Monte Carlo. Once trained, our approach provides the posterior distribution for a given observation just at the cost of a forward pass of the neural network.

5/28/2024

stat.ML cs.LG

🤔

Variational inference, Mixture of Gaussians, Bayesian Machine Learning

Tom Huix, Anna Korba, Alain Durmus, Eric Moulines

Variational inference (VI) is a popular approach in Bayesian inference, that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recently, and mostly when the parametric family is the one of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of Mixture of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be casted as the minimization of a Mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the localization of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first one is an optimization result, that is a descent lemma establishing that the algorithm decreases the objective at each iteration. The second one is an approximation error, that upper bounds the objective between an optimal finite mixture and the target distribution.

6/11/2024

stat.ML cs.LG

🧠

Neural Methods for Amortised Parameter Inference

Andrew Zammit-Mangion, Matthew Sainsbury-Dale, Raphael Huser

Simulation-based methods for statistical inference have evolved dramatically over the past 50 years, keeping pace with technological advancements. The field is undergoing a new revolution as it embraces the representational capacity of neural networks, optimisation libraries and graphics processing units for learning complex mappings between data and inferential targets. The resulting tools are amortised, in the sense that they allow rapid inference through fast feedforward operations. In this article we review recent progress in the context of point estimation, approximate Bayesian inference, summary-statistic construction, and likelihood approximation. We also cover software, and include a simple illustration to showcase the wide array of tools available for amortised inference and the benefits they offer over Markov chain Monte Carlo methods. The article concludes with an overview of relevant topics and an outlook on future research directions.

6/27/2024

stat.ML cs.LG