Improving Variational Autoencoder Estimation from Incomplete Data with Mixture Variational Families

arXiv: 2403.03069

Published 6/28/2024 by Vaidotas Simkus, Michael U. Gutmann

Abstract

We consider the task of estimating variational autoencoders (VAEs) when the training data is incomplete. We show that missing data increases the complexity of the model's posterior distribution over the latent variables compared to the fully-observed case. The increased complexity may adversely affect the fit of the model due to a mismatch between the variational and model posterior distributions. We introduce two strategies based on (i) finite variational-mixture and (ii) imputation-based variational-mixture distributions to address the increased posterior complexity. Through a comprehensive evaluation of the proposed approaches, we show that variational mixtures are effective at improving the accuracy of VAE estimation from incomplete data.

Overview

This paper proposes two methods for improving the estimation of Variational Autoencoders (VAEs) from incomplete data.
The key idea is to use mixture variational distributions to better approximate the model's posterior distribution over the latent variables, which becomes more complex when data are missing, rather than relying on a single parametric family.
The authors report that variational mixtures improve the accuracy of VAE estimation from incomplete data compared with the standard single-distribution approach.

Plain English Explanation

Variational Autoencoders (VAEs) are a powerful type of machine learning model that can learn to generate new data that is similar to a given dataset. However, when the data has missing values, it can be challenging to train VAEs effectively.

The standard approach for training VAEs from incomplete data is to fit the model to the observed values only, using a single simple probability distribution to approximate which latent representations could explain each partially observed example. This paper introduces techniques that improve on this standard approach.

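The key insight is that missing data makes this approximation harder. When some values are missing, the model's posterior distribution over the latent variables becomes more complex than in the fully observed case; for example, several quite different latent representations can be consistent with the values that remain. Instead of approximating this posterior with a single probability distribution, the authors propose using a mixture of distributions, which can follow such complex shapes more closely and so reduces the mismatch that would otherwise hurt the fit of the model.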

By using these mixture variational families, the authors show that their approach improves on standard VAE estimation from incomplete data. This could matter for applications where VAEs need to work with incomplete data, such as manifold learning, multi-modal learning, and condition monitoring.

Technical Explanation

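The standard approach for training VAEs from incomplete data fits the model to the observed entries x_obs only. It maximizes a lower bound on the log-likelihood of the observed data, known as the Evidence Lower Bound (ELBO): log p(x_obs) >= E_q[log p(x_obs | z)] - KL(q(z | x_obs) || p(z)), where the variational distribution q(z | x_obs) is typically a single Gaussian produced by an encoder network. The bound is tight only when q matches the model posterior p(z | x_obs).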
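
As a minimal sketch of this objective, assume a toy linear-Gaussian decoder with a one-dimensional latent: z ~ N(0, 1) and x_d | z ~ N(w_d z + b_d, sigma^2). The toy model, the function names, and the parameter values are illustrative assumptions, not the authors' architecture. Missing entries are handled by dropping their likelihood terms, which marginalizes them out under this factorized likelihood. Because the toy model is conjugate, the exact posterior and log p(x_obs) have closed forms, so the code can check that the ELBO is a lower bound that becomes tight when q equals the exact posterior.

```python
import math
from typing import List, Sequence, Tuple

# Assumed toy model: z ~ N(0, 1), x_d | z ~ N(w_d * z + b_d, sigma^2).


def masked_elbo(x: Sequence[float], mask: Sequence[bool], w: Sequence[float],
                b: Sequence[float], sigma: float, q_mean: float, q_std: float) -> float:
    """Closed-form ELBO on the observed entries, with q(z | x_obs) = N(q_mean, q_std^2)."""
    expected_loglik = 0.0
    for xd, md, wd, bd in zip(x, mask, w, b):
        if not md:
            continue  # missing entry: its likelihood term is marginalised out, i.e. dropped
        r = xd - bd
        # E_q[log N(xd | wd*z + bd, sigma^2)], using E_q[(r - wd*z)^2] = (r - wd*m)^2 + wd^2 * s^2
        expected_loglik += (-0.5 * math.log(2 * math.pi * sigma ** 2)
                            - ((r - wd * q_mean) ** 2 + wd ** 2 * q_std ** 2) / (2 * sigma ** 2))
    # KL(N(m, s^2) || N(0, 1)) in closed form
    kl = 0.5 * (q_std ** 2 + q_mean ** 2 - 1.0) - math.log(q_std)
    return expected_loglik - kl


def observed(x, mask, w, b) -> List[Tuple[float, float]]:
    """Residuals (x_d - b_d) and weights w_d for the observed entries only."""
    return [(xd - bd, wd) for xd, md, wd, bd in zip(x, mask, w, b) if md]


def exact_posterior(x, mask, w, b, sigma) -> Tuple[float, float]:
    """Mean and std of p(z | x_obs), which is Gaussian for this conjugate toy model."""
    obs = observed(x, mask, w, b)
    precision = 1.0 + sum(wd * wd for _, wd in obs) / sigma ** 2
    mean = sum(wd * r for r, wd in obs) / sigma ** 2 / precision
    return mean, 1.0 / math.sqrt(precision)


def exact_log_marginal(x, mask, w, b, sigma) -> float:
    """log p(x_obs), where x_obs - b_obs ~ N(0, sigma^2 I + w_obs w_obs^T)."""
    obs = observed(x, mask, w, b)
    s2 = sigma ** 2
    ww = sum(wd * wd for _, wd in obs)
    rr = sum(r * r for r, _ in obs)
    wr = sum(wd * r for r, wd in obs)
    logdet = len(obs) * math.log(s2) + math.log(1.0 + ww / s2)  # matrix determinant lemma
    quad = rr / s2 - wr ** 2 / (s2 * (s2 + ww))                 # Sherman-Morrison inverse
    return -0.5 * (len(obs) * math.log(2 * math.pi) + logdet + quad)


x = [1.0, 2.0, -0.5, 0.3]
mask = [True, False, True, True]  # x[1] is missing
w = [0.8, -1.2, 0.5, 1.5]
b = [0.0, 0.1, -0.2, 0.3]
sigma = 0.7

m, s = exact_posterior(x, mask, w, b, sigma)
log_px = exact_log_marginal(x, mask, w, b, sigma)
tight = masked_elbo(x, mask, w, b, sigma, m, s)            # equals log_px up to rounding
loose = masked_elbo(x, mask, w, b, sigma, m + 0.5, 2 * s)  # strictly below log_px
print(log_px, tight, loose)
```

The value stored at a missing position never enters the objective, which is what "fitting to the observed entries only" means here.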

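The authors propose two strategies that replace this single variational distribution with a mixture, to cope with the more complex posterior that missing data induces. In (i), a finite variational mixture, the variational distribution is a weighted combination of a fixed number of components, such as Gaussians. In (ii), an imputation-based variational mixture, the components are obtained by filling in the missing values with different imputations, so that each component corresponds to a different plausible completion of the missing data.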
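
The sketch below illustrates the shape of both mixtures for a one-dimensional latent. A finite mixture is a weighted set of Gaussian components; the imputation-based version is read here as an equal-weight mixture over K completions of the data, each passed through the encoder. The encoder `toy_encoder`, the imputed values, and all function names are hypothetical placeholders, not the authors' networks or imputation procedure; in practice the imputations would be drawn from a model of the missing values given the observed ones.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Gaussian = Tuple[float, float]  # (mean, std) of a 1-D Gaussian component


def log_normal(z: float, mean: float, std: float) -> float:
    return -0.5 * math.log(2 * math.pi * std ** 2) - (z - mean) ** 2 / (2 * std ** 2)


def logsumexp(values: Sequence[float]) -> float:
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def mixture_log_density(z: float, components: Sequence[Gaussian],
                        log_weights: Sequence[float]) -> float:
    """log sum_k pi_k N(z | mu_k, sd_k^2), computed stably with log-sum-exp."""
    return logsumexp([lw + log_normal(z, m, s) for (m, s), lw in zip(components, log_weights)])


def sample_mixture(components: Sequence[Gaussian], log_weights: Sequence[float],
                   rng: random.Random) -> float:
    """Ancestral sampling: pick a component index, then sample from that Gaussian."""
    weights = [math.exp(lw) for lw in log_weights]
    k = rng.choices(range(len(components)), weights=weights)[0]
    mean, std = components[k]
    return rng.gauss(mean, std)


def toy_encoder(x_completed: Sequence[float]) -> Gaussian:
    """Hypothetical encoder mapping a fully completed vector to (mean, std) of q(z | x)."""
    return 0.5 * sum(x_completed), 0.3


def imputation_mixture(x: Sequence[float], mask: Sequence[bool],
                       imputations: Sequence[Sequence[float]],
                       encoder: Callable[[Sequence[float]], Gaussian]
                       ) -> Tuple[List[Gaussian], List[float]]:
    """Equal-weight mixture with one encoder output per imputed completion of x."""
    components = []
    for imp in imputations:
        # keep observed entries, take missing entries from this imputation
        completed = [xd if md else xi for xd, md, xi in zip(x, mask, imp)]
        components.append(encoder(completed))
    log_w = [-math.log(len(components))] * len(components)
    return components, log_w


x = [1.0, 0.0]  # x[1] is missing; its stored value is ignored
mask = [True, False]
imputations = [[0.0, 2.0], [0.0, -2.0], [0.0, 0.5]]  # hypothetical imputed values for x[1]
comps, log_w = imputation_mixture(x, mask, imputations, toy_encoder)
print(comps)  # [(1.5, 0.3), (-0.5, 0.3), (0.75, 0.3)]

# A finite mixture uses the same density, with weights that would be learned instead.
finite = [(-1.0, 0.5), (1.0, 0.5)]
finite_log_w = [math.log(0.3), math.log(0.7)]
z = sample_mixture(finite, finite_log_w, random.Random(0))
print(z, mixture_log_density(z, finite, finite_log_w))
```

Working in log space with log-sum-exp keeps the mixture density numerically stable when individual components assign very small probabilities.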

By optimizing these mixture variational distributions, the authors show that the variational distribution can match the model posterior more closely, which improves the fit of the model when some values are missing. They evaluate the approaches on several benchmark datasets, including image and tabular data with varying degrees of missingness.
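
A small numerical illustration of why a mixture can help, under an assumed toy model rather than anything from the paper: let z ~ N(0, 1), x1 | z ~ N(z, 0.5^2) and x2 | z ~ N(z^2, 0.5^2). When x1 is observed, the sign of z is pinned down and almost all posterior mass sits on one side; when x1 is missing, only z^2 is informative and the posterior splits into two symmetric modes near z = ±sqrt(x2). The code evaluates everything by quadrature on a grid. As a swapped-in shortcut (not the authors' training procedure, which optimizes the ELBO), each Gaussian component is placed by a Laplace approximation at a posterior mode.

```python
import math

SIGMA = 0.5                                  # assumed observation noise std
DZ = 0.001                                   # quadrature step
GRID = [i * DZ for i in range(-6000, 6001)]  # symmetric grid over z in [-6, 6]


def log_normal(v, mean, std):
    return -0.5 * math.log(2 * math.pi * std ** 2) - (v - mean) ** 2 / (2 * std ** 2)


def logsumexp(values):
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def log_joint(z, x1, x2):
    """log p(x_obs, z); pass x1=None when x1 is missing (its term is marginalised out)."""
    lp = log_normal(z, 0.0, 1.0) + log_normal(x2, z * z, SIGMA)
    if x1 is not None:
        lp += log_normal(x1, z, SIGMA)
    return lp


def log_evidence(x1, x2):
    """log p(x_obs) by Riemann-sum quadrature."""
    return logsumexp([log_joint(z, x1, x2) for z in GRID]) + math.log(DZ)


def posterior_mass_below_zero(x1, x2):
    """P(z < 0 | x_obs) on the grid."""
    logs = [log_joint(z, x1, x2) for z in GRID]
    norm = logsumexp(logs)
    return sum(math.exp(lp - norm) for z, lp in zip(GRID, logs) if z < 0)


def elbo(log_q, x1, x2):
    """E_q[log p(x_obs, z) - log q(z)] by quadrature; points where q underflows are skipped."""
    total = 0.0
    for z in GRID:
        lq = log_q(z)
        q = math.exp(lq)
        if q > 0.0:
            total += q * (log_joint(z, x1, x2) - lq) * DZ
    return total


x2 = 4.0
mass_observed = posterior_mass_below_zero(2.0, x2)  # x1 observed: close to 0
mass_missing = posterior_mass_below_zero(None, x2)  # x1 missing: close to 0.5

# Laplace approximation at the right-hand mode of p(z | x2): z^2 = x2 - SIGMA^2 / 2
mode = math.sqrt(x2 - SIGMA ** 2 / 2)
width = SIGMA / (2 * mode)


def log_q_single(z):
    return log_normal(z, mode, width)


def log_q_mixture(z):
    return logsumexp([log_normal(z, mode, width), log_normal(z, -mode, width)]) - math.log(2)


log_px = log_evidence(None, x2)
elbo_single = elbo(log_q_single, None, x2)
elbo_mixture = elbo(log_q_mixture, None, x2)
print(mass_observed, mass_missing)
print(log_px, elbo_mixture, elbo_single)  # elbo_mixture - elbo_single is about log 2
```

The single Gaussian can cover only one of the two modes, so its ELBO sits roughly log 2 nats below the two-component mixture, which is nearly tight. This is a one-dimensional caricature of the mismatch between the variational and model posteriors that the paper attributes to missing data.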

Critical Analysis

The authors acknowledge several limitations and areas for future research in their paper. For example, they note that the mixture variational approach can be computationally more expensive than the standard VAE method, especially as the number of components in the mixture increases.

Additionally, the authors suggest that their method could be further improved by incorporating more flexible variational families, such as normalizing flows or implicit distributions, to better capture the true posterior distribution.

Another potential limitation is that the benefit of the mixture approaches may depend on the characteristics of the dataset, such as the pattern and amount of missing data. How the methods behave across a wider range of missingness scenarios is a natural area for future research.

Conclusion

This paper presents two approaches for improving the estimation of Variational Autoencoders (VAEs) from incomplete data. By using mixture variational distributions to better approximate the model's posterior distribution over the latent variables, the authors show that their methods improve the accuracy of VAE estimation compared with the standard approach.

This work could benefit applications where VAEs need to learn from incomplete data, such as manifold learning, multi-modal learning, and condition monitoring. By improving the ability of VAEs to learn from incomplete data, this research could lead to more robust and accurate models in real-world settings where missing values are common.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper for details.

Related Papers

Efficient Mixture Learning in Black-Box Variational Inference

Alexandra Hotti, Oskar Kviman, Ricky Molén, Víctor Elvira, Jens Lagergren

Mixture variational distributions in black box variational inference (BBVI) have demonstrated impressive results in challenging density estimation tasks. However, currently scaling the number of mixture components can lead to a linear increase in the number of learnable parameters and a quadratic increase in inference time due to the evaluation of the evidence lower bound (ELBO). Our two key contributions address these limitations. First, we introduce the novel Multiple Importance Sampling Variational Autoencoder (MISVAE), which amortizes the mapping from input to mixture-parameter space using one-hot encodings. Fortunately, with MISVAE, each additional mixture component incurs a negligible increase in network parameters. Second, we construct two new estimators of the ELBO for mixtures in BBVI, enabling a tremendous reduction in inference time with marginal or even improved impact on performance. Collectively, our contributions enable scalability to hundreds of mixture components and provide superior estimation performance in shorter time, with fewer network parameters compared to previous Mixture VAEs. Experimenting with MISVAE, we achieve astonishing, SOTA results on MNIST. Furthermore, we empirically validate our estimators in other BBVI settings, including Bayesian phylogenetic inference, where we improve inference times for the SOTA mixture model on eight data sets.

6/12/2024

cs.LG stat.ML

Towards Model-Agnostic Posterior Approximation for Fast and Accurate Variational Autoencoders

Yaniv Yacoby, Weiwei Pan, Finale Doshi-Velez

Inference for Variational Autoencoders (VAEs) consists of learning two models: (1) a generative model, which transforms a simple distribution over a latent space into the distribution over observed data, and (2) an inference model, which approximates the posterior of the latent codes given data. The two components are learned jointly via a lower bound to the generative model's log marginal likelihood. In early phases of joint training, the inference model poorly approximates the latent code posteriors. Recent work showed that this leads optimization to get stuck in local optima, negatively impacting the learned generative model. As such, recent work suggests ensuring a high-quality inference model via iterative training: maximizing the objective function relative to the inference model before every update to the generative model. Unfortunately, iterative training is inefficient, requiring heuristic criteria for reverting from iterative to joint training for speed. Here, we suggest an inference method that trains the generative and inference models independently. It approximates the posterior of the true model a priori; fixing this posterior approximation, we then maximize the lower bound relative to only the generative model. By conventional wisdom, this approach should rely on the true prior and likelihood of the true model to approximate its posterior (which are unknown). However, we show that we can compute a deterministic, model-agnostic posterior approximation (MAPA) of the true model's posterior. We then use MAPA to develop a proof-of-concept inference method. We present preliminary results on low-dimensional synthetic data that (1) MAPA captures the trend of the true posterior, and (2) our MAPA-based inference performs better density estimation with less computation than baselines. Lastly, we present a roadmap for scaling the MAPA-based inference method to high-dimensional data.

6/14/2024

stat.ML cs.LG

Manifold Learning by Mixture Models of VAEs for Inverse Problems

Giovanni S. Alberti, Johannes Hertrich, Matteo Santacesaria, Silvia Sciutto

Representing a manifold of very high-dimensional data with generative models has been shown to be computationally efficient in practice. However, this requires that the data manifold admits a global parameterization. In order to represent manifolds of arbitrary topology, we propose to learn a mixture model of variational autoencoders. Here, every encoder-decoder pair represents one chart of a manifold. We propose a loss function for maximum likelihood estimation of the model weights and choose an architecture that provides us the analytical expression of the charts and of their inverses. Once the manifold is learned, we use it for solving inverse problems by minimizing a data fidelity term restricted to the learned manifold. To solve the arising minimization problem we propose a Riemannian gradient descent algorithm on the learned manifold. We demonstrate the performance of our method for low-dimensional toy examples as well as for deblurring and electrical impedance tomography on certain image manifolds.

6/13/2024

cs.LG stat.ML

Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds

Marcel Hirt, Domenico Campolo, Victoria Leong, Juan-Pablo Ortega

Devising deep latent variable models for multi-modal data has been a long-standing theme in machine learning research. Multi-modal Variational Autoencoders (VAEs) have been a popular generative model class that learns latent representations that jointly explain multiple modalities. Various objective functions for such models have been suggested, often motivated as lower bounds on the multi-modal data log-likelihood or from information-theoretic considerations. To encode latent variables from different modality subsets, Product-of-Experts (PoE) or Mixture-of-Experts (MoE) aggregation schemes have been routinely used and shown to yield different trade-offs, for instance, regarding their generative quality or consistency across multiple modalities. In this work, we consider a variational bound that can tightly approximate the data log-likelihood. We develop more flexible aggregation schemes that generalize PoE or MoE approaches by combining encoded features from different modalities based on permutation-invariant neural networks. Our numerical experiments illustrate trade-offs for multi-modal variational bounds and various aggregation schemes. We show that tighter variational bounds and more flexible aggregation models can become beneficial when one wants to approximate the true joint distribution over observed modalities and latent variables in identifiable models.

4/22/2024

stat.ML cs.LG