Neural Diffusion Models

arXiv:2310.08337

Published 6/4/2024 by Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

Abstract

Diffusion models have shown remarkable performance on many generative tasks. Despite recent success, most diffusion models are restricted in that they only allow linear transformation of the data distribution. In contrast, a broader family of transformations can potentially help train generative distributions more efficiently, simplifying the reverse process and closing the gap between the true negative log-likelihood and the variational approximation. In this paper, we present Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables defining and learning time-dependent non-linear transformations of data. We show how to optimise NDMs using a variational bound in a simulation-free setting. Moreover, we derive a time-continuous formulation of NDMs, which allows fast and reliable inference using off-the-shelf numerical ODE and SDE solvers. Finally, we demonstrate the utility of NDMs with learnable transformations through experiments on standard image generation benchmarks, including CIFAR-10, downsampled versions of ImageNet and CelebA-HQ. NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.

Overview

Diffusion models have shown impressive performance in generative tasks, but most are restricted to linear transformations of the data distribution.
This paper introduces Neural Diffusion Models (NDMs), a generalization that allows for time-dependent non-linear transformations.
NDMs can be optimized using a variational bound in a simulation-free setting and have a time-continuous formulation for fast and reliable inference.
Experiments on image generation benchmarks show NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples.

Plain English Explanation

Diffusion models are a type of machine learning model that has become very good at generating new images, text, and other types of data. They work by starting with random noise and gradually transforming it into realistic-looking outputs through a step-by-step process.

However, most diffusion models are limited in that they only apply linear transformations to the data. In the standard setup, the forward (noising) process simply scales the data and adds Gaussian noise, so the path between data and noise is fixed in advance rather than learned.

In contrast, Neural Diffusion Models (NDMs) allow for more complex, non-linear transformations of the data. This gives the model more flexibility to learn the true underlying distribution of the data, potentially leading to better generation performance.

The authors of this paper show how to train NDMs with a variational bound in a simulation-free way: the noisy version of a data point at any time can be sampled directly, without simulating the forward process step by step. They also develop a time-continuous formulation of NDMs that enables fast and reliable inference using standard numerical ODE and SDE solvers.
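To make "standard numerical methods" concrete, here is a minimal sketch of a fixed-step Euler solver drawing one sample by integrating an ODE from noise (t = 1) back to data (t = 0). Everything in it is an assumption chosen for illustration and is not taken from the paper: the data is a 1-D Gaussian, the schedule is alpha_t = 1 - t and sigma_t = t, and the drift is written in closed form. In a trained diffusion model the drift would be computed from a neural network instead.

```python
import math

MU, S = 2.0, 0.5  # toy data distribution N(MU, S^2), assumed for illustration


def marginal(t):
    """Mean and std of z_t under the toy schedule alpha_t = 1 - t, sigma_t = t."""
    mean = (1.0 - t) * MU
    std = math.sqrt((1.0 - t) ** 2 * S ** 2 + t ** 2)
    return mean, std


def drift(z, t):
    """Closed-form velocity dz/dt that transports the Gaussian marginals.

    It follows from z_t = mean(t) + std(t) * u with u held constant.
    A real model would replace this function with a learned network.
    """
    mean, std = marginal(t)
    d_mean = -MU
    d_std = (t - (1.0 - t) * S ** 2) / std
    return d_mean + (d_std / std) * (z - mean)


def euler_sample(z1, steps=1000):
    """Integrate the ODE backwards from t = 1 (noise) to t = 0 (data)."""
    z, dt = z1, -1.0 / steps
    for k in range(steps):
        t = 1.0 - k / steps
        z += dt * drift(z, t)
    return z


# A noise value of 0 maps to the data mean; a noise value of 1 maps to
# roughly one data standard deviation above the mean.
x_center = euler_sample(0.0)
x_upper = euler_sample(1.0)
```

Swapping `euler_sample` for a higher-order or adaptive solver changes only the integration loop, which is the practical benefit of having a time-continuous formulation.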

When tested on standard image generation benchmarks like CIFAR-10, downsampled ImageNet, and CelebA-HQ, NDMs achieved better likelihood than conventional diffusion models (that is, they assigned higher probability to real data) while still producing high-quality samples.

Technical Explanation

The key innovation in this paper is the introduction of Neural Diffusion Models (NDMs), which generalize conventional diffusion models by allowing for time-dependent non-linear transformations of the data distribution.

Whereas standard diffusion models are restricted to linear transformations, NDMs can learn a broader family of transformations. This can help train the generative distribution more efficiently, simplifying the reverse process and narrowing the gap between the true negative log-likelihood and the variational approximation.
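The contrast described above can be written compactly. The following is a sketch in notation commonly used for Gaussian diffusion; the symbols (noise schedule \alpha_t, \sigma_t, learnable transformation F_\varphi, latent z_t, variational bound \mathcal{L}) are assumptions introduced here for illustration and may differ from the paper's exact notation. The last line is the standard identity behind "the gap": the variational bound exceeds the true negative log-likelihood by a KL divergence between the forward process and the model's posterior over all latents z.

```latex
\begin{align}
\text{conventional:}\quad & q(z_t \mid x) = \mathcal{N}\!\left(z_t;\ \alpha_t\, x,\ \sigma_t^2 I\right) \\
\text{NDM:}\quad & q_\varphi(z_t \mid x) = \mathcal{N}\!\left(z_t;\ \alpha_t\, F_\varphi(x, t),\ \sigma_t^2 I\right) \\
\text{gap:}\quad & \mathcal{L}(x) - \bigl(-\log p_\theta(x)\bigr)
  = \mathrm{KL}\bigl(q_\varphi(z \mid x)\ \|\ p_\theta(z \mid x)\bigr) \ \ge\ 0
\end{align}
```

Setting F_\varphi(x, t) = x recovers the conventional case, which is why NDMs can be read as a generalization. A learnable F_\varphi gives the forward process freedom to shrink the KL term.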

The authors show how to optimize NDMs using a variational bound in a simulation-free setting, bypassing the need to run the full forward process. They also derive a time-continuous formulation of NDMs, which enables fast and reliable inference using off-the-shelf numerical ODE and SDE solvers.
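A minimal sketch of what "simulation-free" means in practice: given a data point and a time t, the noisy latent is drawn in one step from a Gaussian centered on the transformed data, with no loop over earlier time steps. The cosine schedule and the `toy_transform` function below are hypothetical stand-ins chosen for illustration; in an NDM the transformation would be a trained neural network.

```python
import math
import random


def alpha_sigma(t):
    """Toy variance-preserving schedule (an assumption): alpha^2 + sigma^2 = 1."""
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    return alpha, sigma


def toy_transform(x, t, weight=0.5):
    """Hypothetical stand-in for a learned time-dependent non-linear transformation."""
    return [xi + weight * t * math.tanh(xi) for xi in x]


def sample_latent(x, t, rng, transform=toy_transform):
    """Draw z_t ~ N(alpha_t * F(x, t), sigma_t^2 I) directly, in a single step."""
    alpha, sigma = alpha_sigma(t)
    fx = transform(x, t)
    return [alpha * f + sigma * rng.gauss(0.0, 1.0) for f in fx]


rng = random.Random(0)
x = [0.3, -1.2, 2.0]            # a toy "data point"
t = rng.random()                # a random time in [0, 1), as in training
z_t = sample_latent(x, t, rng)  # noisy latent at time t, no step-by-step simulation
```

Passing `transform=lambda x, t: list(x)` reduces this to the conventional linear forward process, so the same sampling routine covers both cases.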

Experimental results on standard image generation benchmarks, including CIFAR-10, downsampled ImageNet, and CelebA-HQ, show that NDMs outperform conventional diffusion models in terms of likelihood and produce high-quality samples. This suggests that learnable non-linear transformations can be a valuable addition to the diffusion modeling toolbox.

Critical Analysis

The paper presents a compelling extension of diffusion models by introducing Neural Diffusion Models (NDMs) with learnable non-linear transformations. However, there are a few potential limitations and areas for further research:

Computational Complexity: While the time-continuous formulation enables efficient inference, the training of NDMs may be more computationally intensive than standard diffusion models due to the additional complexity of the non-linear transformations.
Stability and Convergence: The authors note that training NDMs can be sensitive to hyperparameter settings, and more work may be needed to ensure stable and reliable training across a variety of datasets and tasks.
Interpretability: The increased flexibility of NDMs comes at the cost of reduced interpretability compared to linear diffusion models. Understanding the learned transformations and their relationship to the underlying data distribution may be a challenge.
Further research into methods for interpreting and analyzing diffusion models, such as those explored in "Diffusion Models as Probabilistic Neural Operators", could help address this limitation.
Alternative approaches, like "Learning to Discretize" as discussed in the related work, may also offer opportunities to enhance the flexibility of diffusion models while maintaining interpretability.
The relationship between NDMs and other techniques for incorporating physics-informed priors, as in "Physics-Informed Diffusion Models", could also be an interesting area for further exploration.

Overall, the Neural Diffusion Models presented in this paper represent a promising direction for advancing the state-of-the-art in generative modeling, but additional research is needed to fully understand the capabilities, limitations, and applications of this approach.

Conclusion

This paper introduces Neural Diffusion Models (NDMs), a generalization of conventional diffusion models that enables learning time-dependent non-linear transformations of the data distribution. NDMs are shown to outperform standard diffusion models in terms of likelihood, while producing high-quality samples, on several image generation benchmarks.

The time-continuous formulation of NDMs and the simulation-free optimization technique are key technical contributions that enable efficient training and inference. While NDMs show promise, further research is needed to address potential limitations around computational complexity, training stability, and model interpretability.

Overall, this work represents an important step forward in enhancing the flexibility and performance of diffusion-based generative models, with potential applications in a variety of domains that could benefit from high-quality, likelihood-based sample generation.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Neural Flow Diffusion Models: Learnable Forward Process for Improved Diffusion Modelling

Grigory Bartosh, Dmitry Vetrov, Christian A. Naesseth

Conventional diffusion models typically rely on a fixed forward process, which implicitly defines complex marginal distributions over latent variables. This can often complicate the reverse process' task in learning generative trajectories, and results in costly inference for diffusion models. To address these limitations, we introduce Neural Flow Diffusion Models (NFDM), a novel framework that enhances diffusion models by supporting a broader range of forward processes beyond the standard Gaussian. We also propose a novel parameterization technique for learning the forward process. Our framework provides an end-to-end, simulation-free optimization objective, effectively minimizing a variational upper bound on the negative log-likelihood. Experimental results demonstrate NFDM's strong performance, evidenced by state-of-the-art likelihood estimation. Furthermore, we investigate NFDM's capacity for learning generative dynamics with specific characteristics, such as deterministic straight-line trajectories, and demonstrate how the framework may be adopted for learning bridges between two distributions. The results underscore NFDM's versatility and its potential for a wide range of applications.

6/4/2024

stat.ML cs.CV cs.LG

Image Neural Field Diffusion Models

Yinbo Chen, Oliver Wang, Richard Zhang, Eli Shechtman, Xiaolong Wang, Michael Gharbi

Diffusion models have shown an impressive ability to model complex data distributions, with several key advantages over GANs, such as stable training, better coverage of the training distribution's modes, and the ability to solve inverse problems without extra training. However, most diffusion models learn the distribution of fixed-resolution images. We propose to learn the distribution of continuous images by training diffusion models on image neural fields, which can be rendered at any resolution, and show its advantages over fixed-resolution models. To achieve this, a key challenge is to obtain a latent space that represents photorealistic image neural fields. We propose a simple and effective method, inspired by several recent techniques but with key changes to make the image neural fields photorealistic. Our method can be used to convert existing latent diffusion autoencoders into image neural field autoencoders. We show that image neural field diffusion models can be trained using mixed-resolution image datasets, outperform fixed-resolution diffusion models followed by super-resolution models, and can solve inverse problems with conditions applied at different scales efficiently.

6/12/2024

cs.CV

Fast Sampling via Discrete Non-Markov Diffusion Models

Zixiang Chen, Huizhuo Yuan, Yongqian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu

Discrete diffusion models have emerged as powerful tools for high-quality data generation. Despite their success in discrete spaces, such as text generation tasks, the acceleration of discrete diffusion models remains underexplored. In this paper, we propose a discrete non-Markov diffusion model, which admits an accelerated reverse sampling for discrete data generation. Our method significantly reduces the number of function evaluations (i.e., calls to the neural network), making the sampling process much faster. Furthermore, we study the transition from finite to infinite step sampling, offering new insights into bridging the gap between discrete and continuous-time processes for discrete diffusion models. Extensive experiments on natural language generation and machine translation tasks demonstrate the superior performance of our method in terms of both generation speed and sample quality compared to existing methods for discrete diffusion models.

6/28/2024

cs.LG cs.AI stat.ML

Neural Residual Diffusion Models for Deep Scalable Vision Generation

Zhiyuan Ma, Liangliang Zhao, Biqing Qi, Bowen Zhou

The most advanced diffusion models have recently adopted increasingly deep stacked networks (e.g., U-Net or Transformer) to promote the generative emergence capabilities of vision generation models similar to large language models (LLMs). However, progressively deeper stacked networks will intuitively cause numerical propagation errors and reduce noise prediction capabilities on generative data, which hinders massively deep scalable training of vision generation models. In this paper, we first show that the ability of neural networks to perform generative denoising effectively lies in the fact that the intrinsic residual unit has dynamic properties consistent with the input signal's reverse diffusion process, thus supporting excellent generative abilities. Afterwards, we stand on the shoulders of two common types of deep stacked networks to propose a unified and massively scalable Neural Residual Diffusion Models framework (Neural-RDM for short), which is a simple yet meaningful change to the common architecture of deep generative networks by introducing a series of learnable gated residual parameters that conform to the generative dynamics. Experimental results on various generative tasks show that the proposed neural residual models obtain state-of-the-art scores on image and video generation benchmarks. Rigorous theoretical proofs and extensive experiments also demonstrate the advantages of this simple gated residual mechanism consistent with dynamic modeling in improving the fidelity and consistency of generated content and supporting large-scale scalable training. Code is available at https://github.com/Anonymous/Neural-RDM.

6/21/2024

cs.CV cs.AI