Provably Scalable Black-Box Variational Inference with Structured Variational Families

2401.10989

Published 6/4/2024 by Joohwan Ko, Kyurae Kim, Woo Chang Kim, Jacob R. Gardner

Abstract

Variational families with full-rank covariance approximations are known not to work well in black-box variational inference (BBVI), both empirically and theoretically. In fact, recent computational complexity results for BBVI have established that full-rank variational families scale poorly with the dimensionality of the problem compared to e.g. mean-field families. This is particularly critical to hierarchical Bayesian models with local variables; their dimensionality increases with the size of the datasets. Consequently, one gets an iteration complexity with an explicit \(\mathcal{O}(N^2)\) dependence on the dataset size \(N\). In this paper, we explore a theoretical middle ground between mean-field variational families and full-rank families: structured variational families. We rigorously prove that certain scale matrix structures can achieve a better iteration complexity of \(\mathcal{O}\left(N\right)\), implying better scaling with respect to \(N\). We empirically verify our theoretical results on large-scale hierarchical models.

Overview

Variational families with full-rank covariance approximations are known to perform poorly in black-box variational inference (BBVI)
Recent research has shown that full-rank variational families scale poorly with the dimensionality of the problem compared to mean-field families
This is particularly problematic for hierarchical Bayesian models with local variables, as their dimensionality increases with the size of the datasets
The authors explore a middle ground between mean-field and full-rank families: structured variational families

Plain English Explanation

In black-box variational inference (BBVI), researchers try to approximate complex probability distributions using simpler, more manageable distributions. One common approach is to use "full-rank" distributions, which can capture more intricate relationships between the variables. However, recent studies have shown that these full-rank distributions don't work very well in BBVI, especially as the number of variables (the "dimensionality") gets larger.
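
As a rough illustration of the idea, the sketch below estimates the evidence lower bound (ELBO), the objective that BBVI maximizes, using only evaluations of a log-density and samples drawn from a mean-field Gaussian. The toy target `log_target`, the function name `elbo_estimate`, and all parameter values are assumptions made for this example; none of them come from the paper.

```python
import numpy as np

def log_target(z):
    # Hypothetical toy target: unnormalized log-density of a standard normal.
    return -0.5 * np.sum(z**2, axis=-1)

def elbo_estimate(mean, log_std, n_samples, rng):
    """Monte Carlo ELBO for a mean-field (diagonal) Gaussian approximation."""
    d = mean.shape[0]
    std = np.exp(log_std)
    # Reparameterization: z = mean + std * u with u ~ N(0, I).
    u = rng.standard_normal((n_samples, d))
    z = mean + std * u
    # Closed-form entropy of a diagonal Gaussian.
    entropy = np.sum(log_std) + 0.5 * d * (1.0 + np.log(2.0 * np.pi))
    return np.mean(log_target(z)) + entropy

rng = np.random.default_rng(0)
d = 3
# Approximation that matches the target versus one with a shifted mean.
good = elbo_estimate(np.zeros(d), np.zeros(d), 20_000, rng)
bad = elbo_estimate(np.ones(d), np.zeros(d), 20_000, rng)
print(good, bad)
```

The approximation that matches the target should score the higher ELBO. The method is "black-box" in the sense that the estimate only needs to evaluate `log_target`, not to know its form.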

This is a particular problem for hierarchical Bayesian models with local variables, where the dimensionality increases as the dataset gets bigger. In these cases, the computational complexity of the BBVI algorithm ends up scaling poorly with the dataset size, making it impractical for large-scale problems.
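
To make the growth concrete, the following sketch counts latent dimensions and scale-matrix entries for a hypothetical hierarchical model with a few shared global variables and one local variable per data point. The sizes `d_global` and `d_local` are assumptions chosen for illustration. The counts are only a proxy for problem size; they are not the iteration complexity that the paper analyzes.

```python
def latent_dim(n_data, d_global=2, d_local=1):
    """Total latent dimension: shared globals plus one local block per data point."""
    return d_global + n_data * d_local

def n_scale_params(d, family):
    """Number of free entries in the scale matrix of a Gaussian family."""
    if family == "mean_field":
        return d  # diagonal scale matrix
    if family == "full_rank":
        return d * (d + 1) // 2  # dense lower-triangular scale matrix
    raise ValueError(f"unknown family: {family}")

for n in (10, 100, 1000):
    d = latent_dim(n)
    print(n, d, n_scale_params(d, "mean_field"), n_scale_params(d, "full_rank"))
```

With these assumed sizes, the dimension grows linearly in the number of data points, so the dense scale matrix grows quadratically.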

To address this issue, the authors of this paper explore a middle ground between the simple "mean-field" distributions and the more complex full-rank distributions. They investigate "structured" variational families, which can capture some of the important relationships between variables while still scaling better than the full-rank approach. Through rigorous mathematical analysis, they show that certain structured distributions can achieve better scaling with respect to the dataset size, making them a promising alternative for large-scale hierarchical models.

Technical Explanation

The authors build on prior work that has established the computational complexity issues of full-rank variational families in BBVI. Specifically, they show that the iteration complexity of BBVI with full-rank families has an explicit \(\mathcal{O}(N^2)\) dependence on the dataset size \(N\), where \(N\) is the number of data points. This is in contrast to the \(\mathcal{O}(N)\) scaling of simpler mean-field variational families.

To address this problem, the authors explore a class of "structured" variational families that lie between mean-field and full-rank. They rigorously prove that certain structured scale matrix forms can achieve an \(\mathcal{O}(N)\) iteration complexity, implying better scaling with respect to the dataset size. The key insight is that these structured families can capture important dependencies between variables while avoiding the prohibitive costs of the full-rank approach.
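
One way to picture a structured scale matrix is the sketch below. It assumes a lower-triangular, bordered block-diagonal pattern: a dense block for the global variables, one small block per local variable, and a coupling block linking each local variable to the globals. This pattern is an assumption for illustration and is not necessarily the exact structure the authors analyze; `structured_mask`, `sample`, and the block sizes are hypothetical names and values.

```python
import numpy as np

def structured_mask(n_local, d_global=2, d_local=1):
    """Sparsity pattern of a lower-triangular, bordered block-diagonal scale matrix."""
    d = d_global + n_local * d_local
    mask = np.zeros((d, d), dtype=bool)
    # Dense lower-triangular block for the global variables.
    mask[:d_global, :d_global] = np.tril(np.ones((d_global, d_global), dtype=bool))
    for i in range(n_local):
        lo = d_global + i * d_local
        hi = lo + d_local
        mask[lo:hi, :d_global] = True  # coupling between local block i and the globals
        mask[lo:hi, lo:hi] = np.tril(np.ones((d_local, d_local), dtype=bool))
    return mask

def sample(mean, scale, n_samples, rng):
    """Reparameterized draws z = mean + scale @ u with u ~ N(0, I)."""
    u = rng.standard_normal((n_samples, mean.shape[0]))
    return mean + u @ scale.T

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    mask = structured_mask(n)
    d = mask.shape[0]
    # Nonzeros in the structured pattern versus a dense lower triangle.
    print(n, int(mask.sum()), d * (d + 1) // 2)

mask = structured_mask(5)
scale = np.where(mask, 0.1 * rng.standard_normal(mask.shape), 0.0)
np.fill_diagonal(scale, 1.0)  # keep the diagonal positive
draws = sample(np.zeros(mask.shape[0]), scale, 4, rng)
```

Under these assumptions the number of nonzero entries grows linearly with the number of local variables, while the dense lower triangle grows quadratically. The coupling blocks still let each local variable correlate with the globals, which a diagonal (mean-field) scale matrix cannot express.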

The authors empirically verify their theoretical results on large-scale hierarchical Bayesian models, demonstrating the practical benefits of their structured variational families. This work contributes to the ongoing research on improving the reliability and efficiency of BBVI and advancing the state-of-the-art in variational inference.

Critical Analysis

The authors provide a rigorous theoretical analysis of the computational complexity trade-offs between different variational families in the context of BBVI. Their work highlights the importance of carefully designing the structure of the variational family to balance expressiveness and scalability, especially for large-scale hierarchical models.

That said, the authors acknowledge that the specific structured forms they analyze may not be the only viable options, and there could be other structured families that achieve similar or better scaling properties. Additionally, the empirical evaluation is limited to a few hierarchical models, and further testing on a wider range of real-world problems would help validate the broader applicability of their approach.

It would also be interesting to see how the structured variational families perform compared to other recent advancements in BBVI, such as variance control techniques or kernel-based methods. A more comprehensive comparative analysis could further elucidate the strengths and limitations of the structured variational approach.

Overall, this work makes an important contribution to the theoretical understanding of BBVI and provides a promising direction for developing more scalable variational inference methods for large-scale hierarchical models.

Conclusion

This paper explores a middle ground between the simple mean-field and the more complex full-rank variational families in the context of black-box variational inference (BBVI). The authors rigorously prove that certain structured variational families can achieve better computational scaling with respect to the dataset size compared to full-rank families, while still capturing important dependencies between variables.

The theoretical insights and empirical results presented in this work have the potential to significantly improve the applicability of BBVI to large-scale hierarchical Bayesian models, which are widely used in a variety of domains. By developing more scalable variational inference methods, the authors contribute to the ongoing efforts to make Bayesian modeling and inference more practical and accessible for real-world problems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

A Framework for Improving the Reliability of Black-box Variational Inference

Manushi Welandawe, Michael Riis Andersen, Aki Vehtari, Jonathan H. Huggins

Black-box variational inference (BBVI) now sees widespread use in machine learning and statistics as a fast yet flexible alternative to Markov chain Monte Carlo methods for approximate Bayesian inference. However, stochastic optimization methods for BBVI remain unreliable and require substantial expertise and hand-tuning to apply effectively. In this paper, we propose Robust and Automated Black-box VI (RABVI), a framework for improving the reliability of BBVI optimization. RABVI is based on rigorously justified automation techniques, includes just a small number of intuitive tuning parameters, and detects inaccurate estimates of the optimal variational approximation. RABVI adaptively decreases the learning rate by detecting convergence of the fixed-learning-rate iterates, then estimates the symmetrized Kullback-Leibler (KL) divergence between the current variational approximation and the optimal one. It also employs a novel optimization termination criterion that enables the user to balance desired accuracy against computational cost by comparing (i) the predicted relative decrease in the symmetrized KL divergence if a smaller learning rate were used and (ii) the predicted computation required to converge with the smaller learning rate. We validate the robustness and accuracy of RABVI through carefully designed simulation studies and on a diverse set of real-world model and data examples.

5/17/2024

stat.ML cs.LG

Linear Convergence of Black-Box Variational Inference: Should We Stick the Landing?

Kyurae Kim, Yian Ma, Jacob R. Gardner

We prove that black-box variational inference (BBVI) with control variates, particularly the sticking-the-landing (STL) estimator, converges at a geometric (traditionally called linear) rate under perfect variational family specification. In particular, we prove a quadratic bound on the gradient variance of the STL estimator, one which encompasses misspecified variational families. Combined with previous works on the quadratic variance condition, this directly implies convergence of BBVI with the use of projected stochastic gradient descent. For the projection operator, we consider a domain with triangular scale matrices, which the projection onto is computable in $\Theta(d)$ time, where $d$ is the dimensionality of the target posterior. We also improve existing analysis on the regular closed-form entropy gradient estimators, which enables comparison against the STL estimator, providing explicit non-asymptotic complexity guarantees for both.

4/24/2024

stat.ML cs.LG

Manifold Gaussian Variational Bayes on the Precision Matrix

Martin Magris, Mostafa Shabani, Alexandros Iosifidis

We propose an optimization algorithm for Variational Inference (VI) in complex models. Our approach relies on natural gradient updates where the variational space is a Riemann manifold. We develop an efficient algorithm for Gaussian Variational Inference whose updates satisfy the positive definite constraint on the variational covariance matrix. Our Manifold Gaussian Variational Bayes on the Precision matrix (MGVBP) solution provides simple update rules, is straightforward to implement, and the use of the precision matrix parametrization has a significant computational advantage. Due to its black-box nature, MGVBP stands as a ready-to-use solution for VI in complex models. Over five datasets, we empirically validate our feasible approach on different statistical and econometric models, discussing its performance with respect to baseline methods.

4/17/2024

stat.ML cs.LG

Batch and match: black-box variational inference with a score-based divergence

Diana Cai, Chirag Modi, Loucas Pillaud-Vivien, Charles C. Margossian, Robert M. Gower, David M. Blei, Lawrence K. Saul

Most leading implementations of black-box variational inference (BBVI) are based on optimizing a stochastic evidence lower bound (ELBO). But such approaches to BBVI often converge slowly due to the high variance of their gradient estimates and their sensitivity to hyperparameters. In this work, we propose batch and match (BaM), an alternative approach to BBVI based on a score-based divergence. Notably, this score-based divergence can be optimized by a closed-form proximal update for Gaussian variational families with full covariance matrices. We analyze the convergence of BaM when the target distribution is Gaussian, and we prove that in the limit of infinite batch size the variational parameter updates converge exponentially quickly to the target mean and covariance. We also evaluate the performance of BaM on Gaussian and non-Gaussian target distributions that arise from posterior inference in hierarchical and deep generative models. In these experiments, we find that BaM typically converges in fewer (and sometimes significantly fewer) gradient evaluations than leading implementations of BBVI based on ELBO maximization.

6/13/2024

stat.ML cs.AI cs.LG