Subtractive Mixture Models via Squaring: Representation and Learning

Read original: arXiv:2310.00724 - Published 4/29/2024 by Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari

👁️

Overview

Mixture models are typically represented and learned by adding multiple distributions as components
Allowing mixtures to subtract probability mass or density can significantly reduce the number of components needed to model complex distributions
Learning such subtractive mixtures while ensuring they encode a non-negative function is challenging
This paper investigates how to learn and perform inference on deep subtractive mixtures by squaring them
The research is conducted in the framework of probabilistic circuits, which enable the representation of tensorized mixtures and generalization of other subtractive models

Plain English Explanation

Mixture models are a way of representing complex distributions by combining multiple simpler distributions, like Gaussian mixture models. Traditionally, these mixtures are built by adding the different components together.

However, the authors of this paper propose a different approach - allowing the mixture to subtract probability mass or density from certain areas. This can drastically reduce the number of components needed to model complex real-world distributions accurately.

The challenge is that learning these "subtractive" mixtures while ensuring they still represent a valid, non-negative probability distribution is difficult. The researchers investigate how to overcome this by "squaring" the subtractive mixtures, which forces them to be non-negative.

They do this using a framework called "probabilistic circuits", which allows them to represent these more expressive mixtures in a tensorized way and generalize other subtractive models, like curriculum learning for parity functions.

Technical Explanation

The key technical contribution of this paper is developing a way to learn and perform inference on deep subtractive mixture models. Traditionally, mixture models are built by adding multiple distribution components together. The authors show that allowing the mixture to subtract probability mass or density can dramatically reduce the number of components needed to model complex real-world distributions.

To learn these subtractive mixtures while ensuring they still represent a valid, non-negative probability distribution, the researchers propose squaring the mixture components. They do this in the framework of probabilistic circuits, which enables them to represent the mixtures in a tensorized way and generalize several other subtractive models, like QN-Mixer and Mod-MLL.

Theoretically, the authors prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixture models. Empirically, they demonstrate this increased expressiveness on a series of real-world distribution estimation tasks.

Critical Analysis

The paper presents a compelling approach to learning more expressive mixture models by incorporating subtractive components. This has the potential to significantly improve the efficiency and accuracy of density estimation for complex real-world distributions.

However, the authors acknowledge that learning these subtractive mixtures while ensuring they remain valid probability distributions is a significant challenge. The "squaring" technique they propose is an interesting solution, but it remains to be seen how well it scales to high-dimensional, large-scale problems.

Additionally, the paper focuses primarily on the theoretical and empirical benefits of the subtractive mixture approach, without delving deeply into potential limitations or drawbacks. It would be helpful to see a more thorough discussion of the potential issues or edge cases that may arise when applying these models in practice.

Overall, this research represents an important step forward in the field of mixture modeling, and the ideas presented here could have far-reaching implications for a wide range of machine learning and statistical applications. However, further work is needed to fully explore the strengths, weaknesses, and practical considerations of this approach.

Conclusion

This paper introduces a novel approach to learning mixture models by allowing the mixture components to subtract probability mass or density, in contrast to the traditional additive approach. The researchers demonstrate that this subtractive mixture modeling technique can significantly reduce the number of components required to accurately model complex real-world distributions.

By squaring the mixture components and representing them using probabilistic circuits, the authors are able to learn these subtractive mixtures while ensuring they encode a valid, non-negative probability distribution. Theoretically, they prove that this class of squared circuits with subtractions can be exponentially more expressive than traditional additive mixtures.

Empirically, the researchers show the increased expressiveness of their subtractive mixture models on a series of real-world density estimation tasks. This work represents an important advance in the field of mixture modeling, with potential applications in a wide range of machine learning and statistical domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

👁️

Subtractive Mixture Models via Squaring: Representation and Learning

Lorenzo Loconte, Aleksanteri M. Sladek, Stefan Mengel, Martin Trapp, Arno Solin, Nicolas Gillis, Antonio Vergari

Mixture models are traditionally represented and learned by adding several distributions as components. Allowing mixtures to subtract probability mass or density can drastically reduce the number of components needed to model complex distributions. However, learning such subtractive mixtures while ensuring they still encode a non-negative function is challenging. We investigate how to learn and perform inference on deep subtractive mixtures by squaring them. We do this in the framework of probabilistic circuits, which enable us to represent tensorized mixtures and generalize several other subtractive models. We theoretically prove that the class of squared circuits allowing subtractions can be exponentially more expressive than traditional additive mixtures; and, we empirically show this increased expressiveness on a series of real-world distribution estimation tasks.

4/29/2024

Sum of Squares Circuits

Lorenzo Loconte, Stefan Mengel, Antonio Vergari

Designing expressive generative models that support exact and efficient inference is a core question in probabilistic ML. Probabilistic circuits (PCs) offer a framework where this tractability-vs-expressiveness trade-off can be analyzed theoretically. Recently, squared PCs encoding subtractive mixtures via negative parameters have emerged as tractable models that can be exponentially more expressive than monotonic PCs, i.e., PCs with positive parameters only. In this paper, we provide a more precise theoretical characterization of the expressiveness relationships among these models. First, we prove that squared PCs can be less expressive than monotonic ones. Second, we formalize a novel class of PCs -- sum of squares PCs -- that can be exponentially more expressive than both squared and monotonic PCs. Around sum of squares PCs, we build an expressiveness hierarchy that allows us to precisely unify and separate different tractable model classes such as Born Machines and PSD models, and other recently introduced tractable probabilistic models by using complex parameters. Finally, we empirically show the effectiveness of sum of squares circuits in performing distribution estimation.

8/22/2024

On the Relationship Between Monotone and Squared Probabilistic Circuits

Benjie Wang, Guy Van den Broeck

Probabilistic circuits are a unifying representation of functions as computation graphs of weighted sums and products. Their primary application is in probabilistic modeling, where circuits with non-negative weights (monotone circuits) can be used to represent and learn density/mass functions, with tractable marginal inference. Recently, it was proposed to instead represent densities as the square of the circuit function (squared circuits); this allows the use of negative weights while retaining tractability, and can be exponentially more compact than monotone circuits. Unfortunately, we show the reverse also holds, meaning that monotone circuits and squared circuits are incomparable in general. This raises the question of whether we can reconcile, and indeed improve upon the two modeling approaches. We answer in the positive by proposing InceptionPCs, a novel type of circuit that naturally encompasses both monotone circuits and squared circuits as special cases, and employs complex parameters. Empirically, we validate that InceptionPCs can outperform both monotone and squared circuits on image datasets.

8/6/2024

🏅

Learning Mixtures of Gaussians Using Diffusion Models

Khashayar Gatmiry, Jonathan Kelner, Holden Lee

We give a new algorithm for learning mixtures of $k$ Gaussians (with identity covariance in $mathbb{R}^n$) to TV error $varepsilon$, with quasi-polynomial ($O(n^{text{poly log}left(frac{n+k}{varepsilon}right)})$) time and sample complexity, under a minimum weight assumption. Unlike previous approaches, most of which are algebraic in nature, our approach is analytic and relies on the framework of diffusion models. Diffusion models are a modern paradigm for generative modeling, which typically rely on learning the score function (gradient log-pdf) along a process transforming a pure noise distribution, in our case a Gaussian, to the data distribution. Despite their dazzling performance in tasks such as image generation, there are few end-to-end theoretical guarantees that they can efficiently learn nontrivial families of distributions; we give some of the first such guarantees. We proceed by deriving higher-order Gaussian noise sensitivity bounds for the score functions for a Gaussian mixture to show that that they can be inductively learned using piecewise polynomial regression (up to poly-logarithmic degree), and combine this with known convergence results for diffusion models. Our results extend to continuous mixtures of Gaussians where the mixing distribution is supported on a union of $k$ balls of constant radius. In particular, this applies to the case of Gaussian convolutions of distributions on low-dimensional manifolds, or more generally sets with small covering number.

4/30/2024