Probabilistic Programming with Programmable Variational Inference

Read original: arXiv:2406.15742 - Published 6/26/2024 by McCoy R. Becker, Alexander K. Lew, Xiaoyan Wang, Matin Ghavami, Mathieu Huot, Martin C. Rinard, Vikash K. Mansinghka

Overview

Probabilistic programming languages (PPLs) provide a wide range of advanced Monte Carlo methods, but their support for variational inference (VI) is less developed.
Users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically in PPL backends without formal correctness arguments.
The paper proposes a more modular approach to supporting VI in PPLs, based on compositional program transformation.

Plain English Explanation

Probabilistic programming languages (PPLs) are tools that let researchers and developers build complex statistical models and run probabilistic inference on them. One of the inference techniques PPLs support is variational inference (VI). VI approximates the posterior distribution of a statistical model by optimizing over a family of simpler distributions, and it is often computationally cheaper than other inference techniques such as Markov chain Monte Carlo (MCMC).
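To make the idea of approximating a posterior concrete, here is a minimal Python sketch of the quantity that VI methods most commonly optimize, the evidence lower bound (ELBO), estimated by Monte Carlo. The toy model (z ~ N(0, 1), x | z ~ N(z, 1), a single observation x = 2) and all function names are assumptions chosen for illustration; none of this is taken from the paper or from genjax.vi.

```python
import math
import random

def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def log_joint(z, x_obs):
    """log p(x, z) for the assumed toy model: z ~ N(0, 1), x | z ~ N(z, 1)."""
    return log_normal(z, 0.0, 1.0) + log_normal(x_obs, z, 1.0)

def elbo_estimate(mu, sigma, x_obs, num_samples, rng):
    """Monte Carlo estimate of E_q[log p(x, z) - log q(z)] with q = N(mu, sigma)."""
    total = 0.0
    for _ in range(num_samples):
        z = rng.gauss(mu, sigma)
        total += log_joint(z, x_obs) - log_normal(z, mu, sigma)
    return total / num_samples

x_obs = 2.0
# For this conjugate model the marginal is x ~ N(0, sqrt(2)) and the
# exact posterior is z | x ~ N(x / 2, sqrt(1 / 2)).
log_evidence = log_normal(x_obs, 0.0, math.sqrt(2.0))

rng = random.Random(0)
elbo_exact_q = elbo_estimate(1.0, math.sqrt(0.5), x_obs, 100, rng)  # q = posterior
elbo_prior_q = elbo_estimate(0.0, 1.0, x_obs, 2000, rng)            # q = prior
```

When q is the exact posterior, the integrand equals log p(x) for every sample, so the estimate matches the log evidence. A poorer choice of q (here, the prior) gives a smaller value, and closing that gap is what VI optimizes for.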

However, compared to the wide range of advanced Monte Carlo methods available in modern PPLs, the support for VI is relatively less developed. Users are typically limited to a predefined set of variational objectives and gradient estimators, which are implemented as a monolithic (single, non-modular) system within the PPL backend. This means that users have limited flexibility in customizing or extending the VI implementation to suit their specific needs.

To address this limitation, the paper proposes a more modular approach to supporting VI in PPLs. The key idea is to express variational objectives as programs, which can employ first-class constructs for computing densities and expected values under user-defined models and variational families. These programs are then systematically transformed into unbiased gradient estimators that can be used to optimize the variational objectives.
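As a rough illustration of what such a transformation has to produce, the sketch below writes out by hand an unbiased reparameterization gradient estimator for the ELBO of a toy Gaussian model and uses it for stochastic gradient ascent. The model, the hand-derived gradients, and every name here are assumptions made for this example. The paper's system derives estimators automatically from the objective program, and this sketch does not use or imitate its API.

```python
import math
import random

X_OBS = 2.0  # assumed toy model: z ~ N(0, 1), x | z ~ N(z, 1), one observation

def dlogjoint_dz(z):
    # d/dz [log N(z; 0, 1) + log N(X_OBS; z, 1)] = -z + (X_OBS - z)
    return X_OBS - 2.0 * z

def elbo_gradient(mu, log_sigma, batch_size, rng):
    """Unbiased estimate of the ELBO gradient w.r.t. (mu, log_sigma).

    Samples z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization) and
    uses the closed-form Gaussian entropy, whose derivative w.r.t. log_sigma is 1.
    """
    sigma = math.exp(log_sigma)
    g_mu = 0.0
    g_log_sigma = 0.0
    for _ in range(batch_size):
        eps = rng.gauss(0.0, 1.0)
        dz = dlogjoint_dz(mu + sigma * eps)
        g_mu += dz                        # chain rule: dz/dmu = 1
        g_log_sigma += dz * sigma * eps   # chain rule: dz/dlog_sigma = sigma * eps
    return g_mu / batch_size, g_log_sigma / batch_size + 1.0

def fit(num_steps=3000, step_size=0.02, batch_size=16, seed=0):
    """Plain stochastic gradient ascent on the ELBO."""
    rng = random.Random(seed)
    mu, log_sigma = 0.0, 0.0
    for _ in range(num_steps):
        g_mu, g_log_sigma = elbo_gradient(mu, log_sigma, batch_size, rng)
        mu += step_size * g_mu
        log_sigma += step_size * g_log_sigma
    return mu, math.exp(log_sigma)

mu_hat, sigma_hat = fit()
```

For this conjugate model the exact posterior is N(1, sqrt(1/2)), so mu_hat and sigma_hat should land near 1 and 0.71. Deriving even this small estimator by hand takes some care, which is the kind of work a systematic transformation is meant to remove.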

The paper's modular design enables compositional reasoning about the interacting concerns in VI, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies. It also increases the expressiveness of VI in PPLs along three axes:

Support for an open-ended set of user-defined variational objectives, rather than a fixed menu of options.
Support for a combinatorial space of gradient estimation strategies, many of which are not automated by today's PPLs.
Support for a broader class of models and variational families, including constructs for approximate marginalization and normalization.
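The second axis is easier to appreciate with a small experiment. The Python sketch below compares two standard unbiased estimators, score-function (REINFORCE) and reparameterization, for the gradient of E[z^2] with z ~ N(mu, 1). The objective and all names are assumptions picked for illustration, and the sketch says nothing about how genjax.vi exposes these strategies.

```python
import random
import statistics

def f(z):
    return z * z

def score_function_sample(mu, rng):
    """One draw of f(z) * d/dmu log N(z; mu, 1), where the score is z - mu."""
    z = rng.gauss(mu, 1.0)
    return f(z) * (z - mu)

def reparam_sample(mu, rng):
    """One draw of d/dmu f(mu + eps) = 2 * (mu + eps), with eps ~ N(0, 1)."""
    eps = rng.gauss(0.0, 1.0)
    return 2.0 * (mu + eps)

def summarize(sampler, mu, num_samples, seed):
    """Empirical mean and variance of a single-sample gradient estimator."""
    rng = random.Random(seed)
    draws = [sampler(mu, rng) for _ in range(num_samples)]
    return statistics.fmean(draws), statistics.variance(draws)

mu = 1.5
true_grad = 2.0 * mu  # since E[z^2] = mu^2 + 1
sf_mean, sf_var = summarize(score_function_sample, mu, 20000, seed=1)
rp_mean, rp_var = summarize(reparam_sample, mu, 20000, seed=1)
```

Both estimators average to the true gradient 2 * mu, but in this example the score-function estimator has a much larger variance. Because the right choice can differ for each random variable in a program, the number of possible strategies grows combinatorially with the size of the model.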

The authors implement their approach in an extension to the Gen probabilistic programming system, called genjax.vi, and evaluate it on several deep generative modeling tasks. The results show minimal performance overhead compared to hand-coded implementations and performance competitive with well-established open-source PPLs.

Technical Explanation

The paper introduces a modular approach to supporting variational inference (VI) in probabilistic programming languages (PPLs). In contrast to the monolithic implementation of VI in existing PPL backends, the authors propose a compositional program transformation framework that allows users to express variational objectives as programs.

These programs can employ first-class constructs for computing densities and expected values under user-defined models and variational families. The authors then systematically transform these programs into unbiased gradient estimators that can be used to optimize the variational objectives.
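For reference, the standard ELBO and the two most common identities behind unbiased gradient estimators can be written as follows. The notation (p, q_\theta, f, g_\theta, p_0) is assumed here for illustration and is not necessarily the paper's; both identities also rely on the usual regularity conditions that allow differentiation under the expectation.

```latex
\begin{align*}
% ELBO for a model p(x, z) and a variational family q_\theta(z)
\mathcal{L}(\theta)
  &= \mathbb{E}_{z \sim q_\theta}\left[ \log p(x, z) - \log q_\theta(z) \right] \\
% Score-function (REINFORCE) identity, for f that does not depend on \theta
\nabla_\theta \, \mathbb{E}_{z \sim q_\theta}\left[ f(z) \right]
  &= \mathbb{E}_{z \sim q_\theta}\left[ f(z) \, \nabla_\theta \log q_\theta(z) \right] \\
% Reparameterization identity, when z = g_\theta(\epsilon) with \epsilon \sim p_0
\nabla_\theta \, \mathbb{E}_{z \sim q_\theta}\left[ f(z) \right]
  &= \mathbb{E}_{\epsilon \sim p_0}\left[ \nabla_\theta f\big(g_\theta(\epsilon)\big) \right]
\end{align*}
```

Averaging the bracketed quantity on either right-hand side over samples gives an unbiased estimate of the gradient on the left.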

According to the authors, the design allows modular reasoning about many interacting concerns, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies.

Compared to existing support for VI in PPLs, the authors' approach increases expressiveness along three key axes:

Support for an open-ended set of user-defined variational objectives, rather than a fixed menu of options.
Support for a combinatorial space of gradient estimation strategies, many of which are not automated by today's PPLs.
Support for a broader class of models and variational families, through constructs for approximate marginalization and normalization that were previously introduced only for Monte Carlo inference.
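As one example of an objective beyond the standard ELBO, the Python sketch below estimates the well-known importance-weighted bound, which averages K importance weights inside the logarithm. It is used here only as a familiar alternative objective; the toy model, the choice of q, and all names are assumptions, and the sketch does not claim to reproduce an objective or an API from the paper.

```python
import math
import random

def log_normal(x, mean, std):
    """Log density of a univariate normal distribution."""
    return -0.5 * math.log(2 * math.pi) - math.log(std) - 0.5 * ((x - mean) / std) ** 2

def log_joint(z, x_obs):
    """log p(x, z) for the assumed toy model: z ~ N(0, 1), x | z ~ N(z, 1)."""
    return log_normal(z, 0.0, 1.0) + log_normal(x_obs, z, 1.0)

def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def importance_weighted_bound(mu, sigma, x_obs, num_particles, num_outer, rng):
    """Estimate E[log (1/K) sum_k p(x, z_k) / q(z_k)] with z_k ~ q = N(mu, sigma)."""
    total = 0.0
    for _ in range(num_outer):
        log_weights = []
        for _ in range(num_particles):
            z = rng.gauss(mu, sigma)
            log_weights.append(log_joint(z, x_obs) - log_normal(z, mu, sigma))
        total += log_mean_exp(log_weights)
    return total / num_outer

x_obs = 2.0
log_evidence = log_normal(x_obs, 0.0, math.sqrt(2.0))  # marginal: x ~ N(0, sqrt(2))

rng = random.Random(0)
bound_k1 = importance_weighted_bound(0.0, 1.0, x_obs, 1, 2000, rng)    # standard ELBO
bound_k50 = importance_weighted_bound(0.0, 1.0, x_obs, 50, 2000, rng)  # tighter bound
```

With K = 1 the bound reduces to the ordinary ELBO. Increasing K tightens it toward the log evidence, even though q (the prior in this sketch) is left unchanged.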

The authors implement their approach as genjax.vi, an extension to the Gen probabilistic programming system written in JAX, and evaluate it on several deep generative modeling tasks. The reported results show minimal performance overhead relative to hand-coded implementations and performance competitive with well-established open-source PPLs.

Critical Analysis

The paper's main strength is its modular, compositional design for variational inference (VI) in probabilistic programming languages (PPLs), which contrasts with the monolithic implementations typically found in existing PPL backends.

By allowing users to express variational objectives as programs, the authors enable greater flexibility and customization in the VI process. This is a crucial feature, as the specific requirements and constraints of VI can vary widely across different applications and research domains.

The authors' focus on unbiased gradient estimation and compositional reasoning is another strength, since existing monolithic implementations typically come without formal correctness arguments.

However, some practical questions remain open. For example, the evaluation covers several deep generative modeling tasks, and those results do not make clear how the overhead of the approach behaves on other kinds of models, even though computational efficiency is a critical concern in many real-world applications.

Additionally, while the authors demonstrate the performance competitiveness of their implementation, it would be valuable to see a more in-depth analysis of the trade-offs and potential drawbacks compared to other VI techniques and PPL implementations.

Overall, the paper represents a significant contribution to the field of probabilistic programming and variational inference. The authors' modular and compositional approach to VI is a promising direction for future research and development in this area.

Conclusion

This paper proposes a novel and more modular approach to supporting variational inference (VI) in probabilistic programming languages (PPLs). By allowing users to express variational objectives as programs and systematically transforming them into unbiased gradient estimators, the authors enable greater flexibility, customization, and compositional reasoning in the VI process.

The authors' approach addresses several limitations of the monolithic implementations typically found in existing PPL backends, and increases the expressiveness of VI along three key axes: support for an open-ended set of user-defined variational objectives, a combinatorial space of gradient estimation strategies, and a broader class of models and variational families.

The authors' implementation, genjax.vi, demonstrates competitive performance with well-established open-source PPLs, making it a promising tool for researchers and developers working in the field of probabilistic programming and deep generative modeling.

While the paper does not address all potential limitations and challenges, it represents a significant advancement in the state of the art for supporting VI in PPLs. The authors' modular and compositional approach is likely to inspire further research and innovation in this important area of probabilistic inference.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Probabilistic Programming with Programmable Variational Inference

McCoy R. Becker, Alexander K. Lew, Xiaoyan Wang, Matin Ghavami, Mathieu Huot, Martin C. Rinard, Vikash K. Mansinghka

Compared to the wide array of advanced Monte Carlo methods supported by modern probabilistic programming languages (PPLs), PPL support for variational inference (VI) is less developed: users are typically limited to a predefined selection of variational objectives and gradient estimators, which are implemented monolithically (and without formal correctness arguments) in PPL backends. In this paper, we propose a more modular approach to supporting variational inference in PPLs, based on compositional program transformation. In our approach, variational objectives are expressed as programs, that may employ first-class constructs for computing densities of and expected values under user-defined models and variational families. We then transform these programs systematically into unbiased gradient estimators for optimizing the objectives they define. Our design enables modular reasoning about many interacting concerns, including automatic differentiation, density accumulation, tracing, and the application of unbiased gradient estimation strategies. Additionally, relative to existing support for VI in PPLs, our design increases expressiveness along three axes: (1) it supports an open-ended set of user-defined variational objectives, rather than a fixed menu of options; (2) it supports a combinatorial space of gradient estimation strategies, many not automated by today's PPLs; and (3) it supports a broader class of models and variational families, because it supports constructs for approximate marginalization and normalization (previously introduced only for Monte Carlo inference). We implement our approach in an extension to the Gen probabilistic programming system (genjax.vi, implemented in JAX), and evaluate on several deep generative modeling tasks, showing minimal performance overhead vs. hand-coded implementations and performance competitive with well-established open-source PPLs.

6/26/2024

A Primer on Variational Inference for Physics-Informed Deep Generative Modelling

Alex Glyn-Davies, Arnaud Vadeboncoeur, O. Deniz Akyildiz, Ieva Kazlauskaite, Mark Girolami

Variational inference (VI) is a computationally efficient and scalable methodology for approximate Bayesian inference. It strikes a balance between accuracy of uncertainty quantification and practical tractability. It excels at generative modelling and inversion tasks due to its built-in Bayesian regularisation and flexibility, essential qualities for physics related problems. Deriving the central learning objective for VI must often be tailored to new learning tasks where the nature of the problems dictates the conditional dependence between variables of interest, such as arising in physics problems. In this paper, we provide an accessible and thorough technical introduction to VI for forward and inverse problems, guiding the reader through standard derivations of the VI framework and how it can best be realized through deep learning. We then review and unify recent literature exemplifying the creative flexibility allowed by VI. This paper is designed for a general scientific audience looking to solve physics-based problems with an emphasis on uncertainty quantification.

9/11/2024

Particle Semi-Implicit Variational Inference

Jen Ning Lim, Adam M. Johansen

Semi-implicit variational inference (SIVI) enriches the expressiveness of variational families by utilizing a kernel and a mixing distribution to hierarchically define the variational distribution. Existing SIVI methods parameterize the mixing distribution using implicit distributions, leading to intractable variational densities. As a result, directly maximizing the evidence lower bound (ELBO) is not possible and so, they resort to either: optimizing bounds on the ELBO, employing costly inner-loop Markov chain Monte Carlo runs, or solving minimax objectives. In this paper, we propose a novel method for SIVI called Particle Variational Inference (PVI) which employs empirical measures to approximate the optimal mixing distributions characterized as the minimizer of a natural free energy functional via a particle approximation of an Euclidean--Wasserstein gradient flow. This approach means that, unlike prior works, PVI can directly optimize the ELBO; furthermore, it makes no parametric assumption about the mixing distribution. Our empirical results demonstrate that PVI performs favourably against other SIVI methods across various tasks. Moreover, we provide a theoretical analysis of the behaviour of the gradient flow of a related free energy functional: establishing the existence and uniqueness of solutions as well as propagation of chaos results.

7/2/2024

Variational inference, Mixture of Gaussians, Bayesian Machine Learning

Tom Huix, Anna Korba, Alain Durmus, Eric Moulines

Variational inference (VI) is a popular approach in Bayesian inference, that looks for the best approximation of the posterior distribution within a parametric family, minimizing a loss that is typically the (reverse) Kullback-Leibler (KL) divergence. Despite its empirical success, the theoretical properties of VI have only received attention recently, and mostly when the parametric family is the one of Gaussians. This work aims to contribute to the theoretical study of VI in the non-Gaussian case by investigating the setting of Mixture of Gaussians with fixed covariance and constant weights. In this view, VI over this specific family can be casted as the minimization of a Mollified relative entropy, i.e. the KL between the convolution (with respect to a Gaussian kernel) of an atomic measure supported on Diracs, and the target distribution. The support of the atomic measure corresponds to the localization of the Gaussian components. Hence, solving variational inference becomes equivalent to optimizing the positions of the Diracs (the particles), which can be done through gradient descent and takes the form of an interacting particle system. We study two sources of error of variational inference in this context when optimizing the mollified relative entropy. The first one is an optimization result, that is a descent lemma establishing that the algorithm decreases the objective at each iteration. The second one is an approximation error, that upper bounds the objective between an optimal finite mixture and the target distribution.

6/11/2024