Batch and match: black-box variational inference with a score-based divergence

Read original: arXiv:2402.14758 - Published 6/13/2024 by Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

Overview

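This paper presents batch and match (BaM), an approach to black-box variational inference (BBVI) built on a score-based divergence instead of the evidence lower bound (ELBO).
The key idea is to compare the score functions of the variational distribution and the target posterior, where a score is the gradient of a log density with respect to the latent variables.
For Gaussian variational families with full covariance matrices, this divergence can be optimized with a closed-form proximal update, and in the paper's experiments BaM typically converges in fewer gradient evaluations than ELBO-based BBVI.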

Plain English Explanation

The paper discusses a new way to do approximate inference, which is the process of estimating an unknown probability distribution (the "true" posterior) using a simpler, easier-to-work-with distribution (the "variational" distribution).

Traditionally, this has been done by maximizing the evidence lower bound (ELBO), which is equivalent to minimizing the Kullback-Leibler divergence (a mathematical way to quantify how different two distributions are) between the variational and true posterior distributions. The authors instead propose a different measure of mismatch, called a "score-based divergence".

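The score function of a distribution is the gradient of its log density with respect to the variable itself, so it points in the direction in which the density increases fastest. The score-based divergence measures how much the scores of the variational distribution and the true posterior disagree. The authors report that optimizing this divergence typically needs fewer gradient evaluations to converge than standard ELBO-based methods, which often converge slowly because their gradient estimates have high variance and are sensitive to hyperparameters.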

This is an important advance because it can lead to better approximate inference, which is crucial for many machine learning and statistical modeling problems where the true posterior distribution is too complex to work with directly.

Technical Explanation

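The authors start from standard black-box variational inference (BBVI) methods, which optimize a variational distribution to approximate an intractable true posterior distribution. Most leading implementations do this by optimizing a stochastic evidence lower bound (ELBO), and they often converge slowly because of the high variance of their gradient estimates and their sensitivity to hyperparameters.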
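To make the ELBO-based baseline concrete, here is a minimal sketch of stochastic ELBO maximization with a reparameterized gradient. It is an illustration only, not any particular library's implementation: the 1D Gaussian target (mean 3, standard deviation 2), the function names, the step size, and the batch size are all assumptions chosen for the example.

```python
import math
import random

def grad_log_target(z, mean=3.0, std=2.0):
    """Score of a toy 1D Gaussian target: d/dz log p(z). (Assumed target.)"""
    return -(z - mean) / std**2

def elbo_gradient(mu, log_sigma, batch_size, rng):
    """Reparameterized Monte Carlo estimate of the ELBO gradient
    for q = N(mu, sigma^2) with sigma = exp(log_sigma)."""
    sigma = math.exp(log_sigma)
    g_mu, g_ls = 0.0, 0.0
    for _ in range(batch_size):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps              # reparameterization: z depends on (mu, sigma)
        g = grad_log_target(z)
        g_mu += g                         # dz/dmu = 1
        g_ls += g * sigma * eps           # dz/dlog_sigma = sigma * eps
    # The Gaussian entropy adds d/dlog_sigma [log sigma] = 1.
    return g_mu / batch_size, g_ls / batch_size + 1.0

def fit_elbo(steps=2000, batch_size=32, learning_rate=0.05, seed=0):
    """Plain stochastic gradient ascent on the ELBO."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        g_mu, g_ls = elbo_gradient(mu, log_sigma, batch_size, rng)
        mu += learning_rate * g_mu
        log_sigma += learning_rate * g_ls
    return mu, math.exp(log_sigma)

if __name__ == "__main__":
    print(fit_elbo())  # approaches the target mean and standard deviation, with noise
```

The iterates keep fluctuating around the optimum because every step uses a noisy gradient estimate, and the result depends on the chosen learning rate and batch size. Those are the two sensitivities the text above attributes to ELBO-based BBVI.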

They then introduce the key idea of using a "score-based divergence" as the objective function for BBVI. This divergence is defined in terms of the score functions of the variational and target distributions, that is, the gradients of their log densities with respect to the latent variables, and it is zero when the two scores agree.
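As a rough illustration of what a score-based divergence measures, the sketch below estimates the plain Fisher divergence, E_q[(score_q(z) - score_p(z))^2], between two 1D Gaussians by sampling from q. This unweighted form is a stand-in chosen for simplicity; the exact divergence used in the paper should be taken from the original. The function names and the Gaussian test cases are assumptions made for the example.

```python
import random

def gaussian_score(z, mean, std):
    """Score of N(mean, std^2): d/dz log density = -(z - mean) / std^2."""
    return -(z - mean) / std**2

def score_divergence(q_mean, q_std, p_mean, p_std, batch_size=10000, seed=0):
    """Monte Carlo estimate of the (unweighted) Fisher divergence
    E_q[(score_q(z) - score_p(z))^2], using samples drawn from q."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(batch_size):
        z = rng.gauss(q_mean, q_std)
        diff = gaussian_score(z, q_mean, q_std) - gaussian_score(z, p_mean, p_std)
        total += diff * diff
    return total / batch_size

if __name__ == "__main__":
    print(score_divergence(0.0, 1.0, 0.0, 1.0))  # identical distributions
    print(score_divergence(0.0, 1.0, 1.0, 1.0))  # means differ by 1
```

Only the score of the target enters the estimate, so the target's normalizing constant is never needed. When the two unit-variance Gaussians differ only in their means, the scores differ by a constant, and the estimate equals the squared mean difference regardless of which samples are drawn.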

The authors show that this score-based divergence can be optimized by a closed-form proximal update when the variational family is Gaussian with a full covariance matrix. They analyze the convergence of BaM for Gaussian targets and prove that, in the limit of infinite batch size, the variational parameter updates converge exponentially quickly to the target mean and covariance.
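For intuition about why scores pair well with Gaussian families, note that the score of a 1D Gaussian N(m, s^2) is linear in z: -(z - m) / s^2. A batch of (sample, score) pairs from a Gaussian target therefore pins down m and s through an ordinary least-squares line fit. The sketch below does just that. It is a simplified illustration and not the paper's update: it omits the proximal term that keeps each iterate close to the previous one, it is 1D only, and all names and the toy target are assumptions.

```python
import random

def target_score(z, mean=3.0, std=2.0):
    """Score of a toy 1D Gaussian target (hypothetical stand-in for grad log p)."""
    return -(z - mean) / std**2

def match_gaussian_to_scores(samples, scores):
    """Fit score(z) ~ slope * z + intercept by least squares, then map the line
    back to Gaussian parameters: slope = -1/variance, intercept = mean/variance."""
    n = len(samples)
    z_bar = sum(samples) / n
    g_bar = sum(scores) / n
    var_z = sum((z - z_bar) ** 2 for z in samples)
    cov_zg = sum((z - z_bar) * (g - g_bar) for z, g in zip(samples, scores))
    slope = cov_zg / var_z
    if slope >= 0.0:
        raise ValueError("scores are not consistent with a Gaussian fit")
    intercept = g_bar - slope * z_bar
    variance = -1.0 / slope
    return intercept * variance, variance ** 0.5

def demo(batch_size=16, seed=0):
    """Batch: draw samples from an initial q = N(0, 1) and score them under the target.
    Match: fit a Gaussian whose score agrees with the batch."""
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(batch_size)]
    scores = [target_score(z) for z in samples]
    return match_gaussian_to_scores(samples, scores)

if __name__ == "__main__":
    print(demo())  # recovers the target mean and standard deviation up to rounding
```

With an exactly Gaussian target the fit recovers the parameters from a single small batch, because the scores lie on a straight line. For non-Gaussian targets the scores are no longer linear, so no single fit is exact.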

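They evaluate BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.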

Critical Analysis

The authors provide a theoretical convergence analysis of their score-based BBVI approach, but the results summarized in the abstract come with limits on their scope:

The closed-form proximal update is stated for Gaussian variational families with full covariance matrices, whose number of parameters grows quadratically with the dimension of the problem.
The convergence proof covers Gaussian targets in the limit of infinite batch size. In practice the method still relies on sampling-based approximations with finite batches, which can suffer from high variance, especially in high-dimensional problems.

Additionally, while the empirical results are promising, they cover a selected set of Gaussian and non-Gaussian targets from hierarchical and deep generative models. More research would be needed to assess the scalability and performance of this approach on large-scale, real-world applications.

Overall, this paper introduces an intriguing new perspective on BBVI that merits further investigation and development. The score-based divergence approach could lead to significant improvements in approximate inference, but its practical impact will depend on addressing the remaining challenges.

Conclusion

This paper presents batch and match (BaM), a black-box variational inference (BBVI) method that uses a score-based divergence as the optimization objective. By matching the scores of the variational and target distributions, the authors obtain a closed-form proximal update for Gaussian variational families and report convergence in fewer gradient evaluations than ELBO-based methods.

The key theoretical and empirical contributions of this work advance the state of the art in approximate inference, which is a fundamental challenge in machine learning and statistical modeling. While some limitations remain, this score-based BBVI framework shows promising potential to improve the flexibility and robustness of variational inference techniques across a wide range of applications.

Related Papers

Batch and match: black-box variational inference with a score-based divergence

Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.

6/13/2024

A Framework for Improving the Reliability of Black-box Variational Inference

Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, Jonathan H. Huggins

Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization methods for BBVI remain unreliable and require substantial expertise and hand-tuning to apply effectively. In this paper, we propose Robust and Automated Black-box VI (RABVI), a framework for improving the reliability of BBVI optimization. RABVI is based on rigorously justified automation techniques, includes just a small number of intuitive tuning parameters, and detects inaccurate estimates of the optimal variational approximation. RABVI adaptively decreases the learning rate by detecting convergence of the fixed-learning-rate iterates, then estimates the symmetrized Kullback-Leibler (KL) divergence between the current variational approximation and the optimal one. It also employs a novel optimization termination criterion that enables the user to balance desired accuracy against computational cost by comparing (i) the predicted relative decrease in the symmetrized KL divergence if a smaller learning rate were used and (ii) the predicted computation required to converge with the smaller learning rate. We validate the robustness and accuracy of RABVI through carefully designed simulation studies and on a diverse set of real-world model and data examples.

5/17/2024

Efficient Mixture Learning in Black-Box Variational Inference

Alexandra Hotti, Oskar Kviman, Ricky Molén, Víctor Elvira, Jens Lagergren

Mixture variational distributions in black box variational inference (BBVI) have demonstrated impressive results in challenging density estimation tasks. However, currently scaling the number of mixture components can lead to a linear increase in the number of learnable parameters and a quadratic increase in inference time due to the evaluation of the evidence lower bound (ELBO). Our two key contributions address these limitations. First, we introduce the novel Multiple Importance Sampling Variational Autoencoder (MISVAE), which amortizes the mapping from input to mixture-parameter space using one-hot encodings. Fortunately, with MISVAE, each additional mixture component incurs a negligible increase in network parameters. Second, we construct two new estimators of the ELBO for mixtures in BBVI, enabling a tremendous reduction in inference time with marginal or even improved impact on performance. Collectively, our contributions enable scalability to hundreds of mixture components and provide superior estimation performance in shorter time, with fewer network parameters compared to previous Mixture VAEs. Experimenting with MISVAE, we achieve astonishing, SOTA results on MNIST. Furthermore, we empirically validate our estimators in other BBVI settings, including Bayesian phylogenetic inference, where we improve inference times for the SOTA mixture model on eight data sets.

6/12/2024

Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

Kyurae Kim, Yian Ma, Jacob R. Gardner

We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called linear) rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, onto which the projection is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.

4/24/2024