Lifting Architectural Constraints of Injective Flows

Read original: arXiv:2306.01843 - Published 6/28/2024 by Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Lea Zimmermann, Ullrich Kothe

Overview

Normalizing Flow models learn a full-dimensional likelihood of the training data, but real data often lies on a lower-dimensional manifold, so much of the computation is spent modeling noise.
Injective Flows address this by jointly learning the data manifold and the distribution on it, but have been limited by restrictive architectures or high computational cost.
This paper introduces a new efficient maximum likelihood estimator that enables the use of free-form bottleneck architectures, and a stable training objective to avoid divergent solutions when jointly learning the manifold and distribution.

Plain English Explanation

Normalizing Flows are a type of machine learning model that tries to learn the full, multidimensional distribution of the training data. However, real-world data often only exists on a lower-dimensional surface or "manifold" within the full data space. This means Normalizing Flows spend much of their computational effort modeling noise or irrelevant dimensions.

Injective Flows aim to solve this by jointly learning both the data manifold and the probability distribution on that manifold. This allows the model to focus its capacity on the relevant, lower-dimensional structure of the data. But so far, Injective Flow models have been limited by either very specific architectural constraints or high computational costs.
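To make "a distribution on a manifold" concrete, here is a minimal sketch of the standard change-of-variables formula for an injective map g from a d-dimensional latent space into D-dimensional data space: log p(x) = log p_Z(z) - 1/2 log det(J^T J), where J is the D x d Jacobian of g at z. The parabola decoder, the function names, and the finite-difference Jacobian are illustrative assumptions chosen for a toy example, not the architecture used in the paper.

```python
import numpy as np

def parabola_decoder(z):
    """Hypothetical injective map from a 1-D latent z to a curve in 2-D."""
    return np.array([z[0], z[0] ** 2])

def numerical_jacobian(g, z, eps=1e-6):
    """Central finite-difference Jacobian of g at z, shape (D, d)."""
    z = np.asarray(z, dtype=float)
    cols = []
    for i in range(z.size):
        step = np.zeros_like(z)
        step[i] = eps
        cols.append((g(z + step) - g(z - step)) / (2 * eps))
    return np.stack(cols, axis=1)

def manifold_log_density(g, z):
    """log p(x) at x = g(z), assuming a standard-normal latent distribution."""
    z = np.asarray(z, dtype=float)
    J = numerical_jacobian(g, z)
    # Latent log-density under N(0, I).
    log_pz = -0.5 * np.sum(z ** 2) - 0.5 * z.size * np.log(2 * np.pi)
    # Volume change of the embedding: 1/2 log det(J^T J).
    _, logdet = np.linalg.slogdet(J.T @ J)
    return log_pz - 0.5 * logdet

z = np.array([0.3])
log_px = manifold_log_density(parabola_decoder, z)
print(log_px)
```

For this curve the Jacobian is (1, 2z), so the volume term reduces to 1/2 log(1 + 4z^2). In higher dimensions, computing this determinant exactly is what becomes expensive.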

The paper introduces a more efficient way to train Injective Flow models that works with flexible, "free-form" architectures. It also proposes a training objective that avoids divergent solutions when the manifold and the distribution are learned together, which makes training more stable.

Technical Explanation

The core technical contribution of this paper is a new efficient maximum likelihood estimator for Injective Flow models, which enables the use of flexible, "free-form" bottleneck architectures. Previous Injective Flow approaches were limited to more restrictive architectures or incurred high computational costs.
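This summary does not spell out how the estimator works. One common ingredient of efficient likelihood estimators in general is the Hutchinson trace estimator, which replaces an exact trace (such as the one that appears when a log-determinant is differentiated) with an average of cheap vector-matrix-vector products. The sketch below illustrates only that generic idea on an explicit matrix. Treating it as relevant here is an assumption, and `hutchinson_trace` is a hypothetical helper rather than the authors' estimator.

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_samples, rng):
    """Estimate trace(A) using only products A @ v with random sign vectors v."""
    total = 0.0
    for _ in range(n_samples):
        v = rng.choice([-1.0, 1.0], size=dim)  # Rademacher probe vector
        total += v @ matvec(v)
    return total / n_samples

rng = np.random.default_rng(0)

# A dense matrix stands in for a Jacobian; in practice only matvec is needed.
A = rng.normal(size=(5, 5))
estimate = hutchinson_trace(lambda v: A @ v, dim=5, n_samples=20000, rng=rng)
exact = np.trace(A)
print(estimate, exact)

# For a diagonal matrix the Rademacher estimate is exact with a single probe.
D = np.diag([1.0, 2.0, 3.0])
diag_estimate = hutchinson_trace(lambda v: D @ v, dim=3, n_samples=1, rng=rng)
```

The appeal of this kind of estimator is that it never forms the full matrix: each probe costs one matrix-vector product, which automatic differentiation can supply for a neural network Jacobian.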

The authors show that naively training Injective Flows to jointly learn the data manifold and the distribution on that manifold can lead to divergent solutions, where the learned manifold and distribution drift apart. To address this, they propose a new training objective that encourages the manifold and distribution to stay aligned.
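One common way to keep a learned manifold and its density tied together is to add a reconstruction penalty to the negative log-likelihood, so that the decoder must map latent codes back onto the data. The sketch below shows that general shape of objective for a linear encoder and decoder. The weight `beta`, the linear maps, and the function name are illustrative assumptions, and the paper's actual objective may differ in form.

```python
import numpy as np

def bottleneck_loss(x, enc, dec, beta):
    """Toy objective: on-manifold negative log-likelihood plus reconstruction.

    x: (N, D) data, enc: (d, D) linear encoder, dec: (D, d) linear decoder.
    Returns (total, mean_nll, mean_reconstruction_error).
    """
    z = x @ enc.T            # latent codes, shape (N, d)
    x_hat = z @ dec.T        # reconstructions, shape (N, D)
    d = enc.shape[0]
    # Standard-normal latent log-density.
    log_pz = -0.5 * np.sum(z ** 2, axis=1) - 0.5 * d * np.log(2 * np.pi)
    # Volume term of the (linear) decoder: 1/2 log det(dec^T dec).
    _, logdet = np.linalg.slogdet(dec.T @ dec)
    nll = -(log_pz - 0.5 * logdet)
    recon = np.sum((x - x_hat) ** 2, axis=1)
    total = np.mean(nll + beta * recon)
    return float(total), float(np.mean(nll)), float(np.mean(recon))

# Decoder embeds a 2-D latent plane into 3-D; encoder is its transpose.
dec = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
enc = dec.T

on_plane = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
off_plane = np.array([[0.0, 0.0, 1.0]])

loss_on = bottleneck_loss(on_plane, enc, dec, beta=10.0)
loss_off = bottleneck_loss(off_plane, enc, dec, beta=10.0)
print(loss_on, loss_off)
```

Points on the plane incur no reconstruction cost, while the off-plane point is penalized in proportion to `beta`. This is how a reconstruction term discourages the learned manifold from moving away from the data.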

The authors demonstrate the effectiveness of their approach through extensive experiments on toy datasets, tabular data, and image data, showing competitive performance compared to prior Normalizing Flow and Injective Flow methods, such as MixerFlow, Universality of Coupling-based Normalizing Flows, Kernelized Normalizing Flows, and Generative Assignment Flows.

Critical Analysis

The authors acknowledge that their approach, like other Injective Flow models, still requires some architectural constraints to ensure the learned manifold is a valid geometric object. While the free-form bottleneck architecture provides more flexibility, there are still limitations on the types of manifolds that can be represented.

Additionally, the authors note that their training objective, while effective, is heuristic in nature and does not provide any formal guarantees of convergence or optimality. Further theoretical analysis of the properties of this training objective would be valuable.

Finally, the authors only evaluate their method on relatively simple datasets. Scaling Injective Flow models to higher-dimensional, more complex real-world datasets remains an open challenge that is not fully addressed in this work.

Conclusion

This paper presents an important advance in Injective Flow models, a class of neural generative models that can jointly learn the data manifold and the distribution on that manifold. By introducing a new efficient maximum likelihood estimator and a stabilized training objective, the authors enable the use of more flexible architectural designs while maintaining strong empirical performance.

These developments help address the core limitation of standard Normalizing Flows, which is their tendency to waste capacity on modeling irrelevant dimensions of the data. As machine learning models are applied to more high-dimensional, real-world datasets, techniques like Injective Flows that adapt to the intrinsic dimensionality of the data are likely to become increasingly valuable.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Lifting Architectural Constraints of Injective Flows

Peter Sorrenson, Felix Draxler, Armand Rousselot, Sander Hummerich, Lea Zimmermann, Ullrich Kothe

Normalizing Flows explicitly maximize a full-dimensional likelihood on the training data. However, real data is typically only supported on a lower-dimensional manifold leading the model to expend significant compute on modeling noise. Injective Flows fix this by jointly learning a manifold and the distribution on it. So far, they have been limited by restrictive architectures and/or high computational cost. We lift both constraints by a new efficient estimator for the maximum likelihood loss, compatible with free-form bottleneck architectures. We further show that naively learning both the data manifold and the distribution on it can lead to divergent solutions, and use this insight to motivate a stable maximum likelihood training objective. We perform extensive experiments on toy, tabular and image data, demonstrating the competitive performance of the resulting model.

6/28/2024

Injective Flows for parametric hypersurfaces

Marcello Massimo Negri, Jonathan Aellen, Volker Roth

Normalizing Flows (NFs) are powerful and efficient models for density estimation. When modeling densities on manifolds, NFs can be generalized to injective flows but the Jacobian determinant becomes computationally prohibitive. Current approaches either consider bounds on the log-likelihood or rely on some approximations of the Jacobian determinant. In contrast, we propose injective flows for parametric hypersurfaces and show that for such manifolds we can compute the Jacobian determinant exactly and efficiently, with the same cost as NFs. Furthermore, we show that for the subclass of star-like manifolds we can extend the proposed framework to always allow for a Cartesian representation of the density. We showcase the relevance of modeling densities on hypersurfaces in two settings. Firstly, we introduce a novel Objective Bayesian approach to penalized likelihood models by interpreting level-sets of the penalty as star-like manifolds. Secondly, we consider Bayesian mixture models and introduce a general method for variational inference by defining the posterior of mixture weights on the probability simplex.

6/14/2024

On the Universality of Coupling-based Normalizing Flows

Felix Draxler, Stefan Wahl, Christoph Schnorr, Ullrich Kothe

We present a novel theoretical framework for understanding the expressive power of normalizing flows. Despite their prevalence in scientific applications, a comprehensive understanding of flows remains elusive due to their restricted architectures. Existing theorems fall short as they require the use of arbitrarily ill-conditioned neural networks, limiting practical applicability. We propose a distributional universality theorem for well-conditioned coupling-based normalizing flows such as RealNVP. In addition, we show that volume-preserving normalizing flows are not universal, what distribution they learn instead, and how to fix their expressivity. Our results support the general wisdom that affine and related couplings are expressive and in general outperform volume-preserving flows, bridging a gap between empirical results and theoretical understanding.

6/6/2024

MixerFlow: MLP-Mixer meets Normalising Flows

Eshant English, Matthias Kirchler, Christoph Lippert

Normalising flows are generative models that transform a complex density into a simpler density through the use of bijective transformations enabling both density estimation and data generation from a single model. However, the requirement for bijectivity imposes the use of specialised architectures. In the context of image modelling, the predominant choice has been the Glow-based architecture, whereas alternative architectures remain largely unexplored in the research community. In this work, we propose a novel architecture called MixerFlow, based on the MLP-Mixer architecture, further unifying the generative and discriminative modelling architectures. MixerFlow offers an efficient mechanism for weight sharing for flow-based models. Our results demonstrate comparative or superior density estimation on image datasets and good scaling as the image resolution increases, making MixerFlow a simple yet powerful alternative to the Glow-based architectures. We also show that MixerFlow provides more informative embeddings than Glow-based architectures and can integrate many structured transformations such as splines or Kolmogorov-Arnold Networks.

6/28/2024