Discrete Flow Matching

Read original: arXiv:2407.15595 - Published 7/23/2024 by Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman

Overview

This paper introduces a novel approach called Discrete Flow Matching for generating high-quality discrete data, such as language, using flow-based models.
Discrete Flow Matching offers several key contributions, including a general framework for probability path interpolation, a generic sampling formula, and improved generative performance compared to previous discrete flow and diffusion models.
The approach scales up to 1.7B parameters and reaches 6.7% Pass@1 and 13.4% Pass@10 on HumanEval, and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP, narrowing the gap between autoregressive and discrete flow models for generating discrete data.

Plain English Explanation

Discrete Flow Matching is a new way to generate high-quality discrete data, like text or computer code, using a type of machine learning model called a flow-based model. Flow-based models are powerful for generating continuous data like images, but they haven't worked as well for discrete data until now.

The key ideas behind Discrete Flow Matching are:

Probability Paths: It uses a general family of probability paths that can smoothly interpolate between the starting distribution (e.g., random noise) and the target distribution (e.g., natural language). This gives the model flexibility in how it generates the data.
Sampling Formula: It has a generic formula for sampling from these probability paths using learned posteriors, such as a denoiser that predicts the clean data from a noisy sample (x-prediction) or a model that predicts the noise (epsilon-prediction). This lets the same sampling recipe work across different paths.
Improved Performance: By using different schedules for the probability paths, the model reaches considerably better generative perplexity than previous discrete flow and diffusion models. This helps it get closer to the performance of autoregressive models, which generate data one piece at a time.
Scalability: The authors scale up Discrete Flow Matching to have 1.7 billion parameters, which allows it to achieve strong results on coding benchmarks, where it can generate high-quality computer code.
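To make the probability-path idea concrete, here is a minimal Python sketch. It assumes one simple member of such a family: a per-token mixture in which each position independently shows the target token with probability kappa(t) and the source token otherwise. The names kappa, sample_path and MASK, the quadratic scheduler, and the toy token ids are illustrative assumptions, not details taken from this summary.

```python
import numpy as np

MASK = 0  # hypothetical id for the "noise" (mask) token


def kappa(t, power=2.0):
    """Assumed scheduler: fraction of positions that already show the target at time t."""
    return t ** power


def sample_path(x0, x1, t, rng):
    """Draw x_t from the mixture path between source tokens x0 and target tokens x1.

    Each position independently takes the target token with probability
    kappa(t) and keeps the source token otherwise.
    """
    take_target = rng.random(x1.shape) < kappa(t)
    return np.where(take_target, x1, x0)


rng = np.random.default_rng(0)
x1 = np.array([5, 9, 3, 7, 2, 8])   # toy "sentence" of token ids
x0 = np.full_like(x1, MASK)         # all-mask source sequence

for t in (0.0, 0.5, 1.0):
    print(t, sample_path(x0, x1, t, rng))
```

At t = 0 the sample is pure source, at t = 1 it is the target, and in between it is a random blend. Changing the scheduler changes how quickly the blend moves toward the data, which is the knob the "different schedules" item refers to.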

Overall, Discrete Flow Matching represents an important step forward in using flow-based models to generate discrete data, potentially opening up new applications in areas like language modeling and code generation.

Technical Explanation

Discrete Flow Matching builds on the success of flow-based models and diffusion models for generating continuous data like images, and applies these ideas to the problem of generating discrete data, such as text or computer code.

The key technical contributions of the paper are:

Probability Path Formulation: The authors define a general family of probability paths that can smoothly interpolate between the starting distribution (e.g., random noise) and the target distribution (e.g., natural language). This gives the model flexibility in how it generates the data.
Sampling Formula: The paper provides a generic formula for sampling from these probability paths using learned posteriors, such as the probability denoiser (x-prediction) and noise-prediction (epsilon-prediction). The same formula therefore applies across the family of paths rather than to a single hand-derived case.
Improved Generative Performance: By focusing on specific probability paths defined with different schedulers, the authors show that Discrete Flow Matching considerably improves generative perplexity compared to previous discrete flow and diffusion models. This narrows the gap between autoregressive models and discrete flow models.
Scalability: The authors scale Discrete Flow Matching up to 1.7 billion parameters, reaching 6.7% Pass@1 and 13.4% Pass@10 on HumanEval and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP.
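The sampling contribution can be illustrated with a rough Python sketch. It is not the authors' implementation. It assumes an all-mask source, a per-token mixture path with scheduler kappa(t) = t^2, and an x-prediction denoiser. The function toy_denoiser is a hypothetical stand-in for a trained network, and the step uses the finite-difference unmasking probability (kappa(t_next) - kappa(t)) / (1 - kappa(t)), whose first-order approximation is the Euler rate h * kappa'(t) / (1 - kappa(t)).

```python
import numpy as np

VOCAB, MASK = 5, 0  # hypothetical toy vocabulary; token 0 is the mask


def kappa(t):
    """Assumed scheduler: kappa(0) = 0 (all mask), kappa(1) = 1 (all data)."""
    return t ** 2


def toy_denoiser(x_t, t):
    """Stand-in for a learned posterior p(x_1 | x_t), one row per position.

    A real model would be a neural network; this one is uniform over
    the non-mask tokens so the example runs without training.
    """
    probs = np.full((x_t.shape[0], VOCAB), 1.0 / (VOCAB - 1))
    probs[:, MASK] = 0.0
    return probs


def sampling_step(x_t, t, t_next, denoiser, rng):
    """Advance the sequence from time t to t_next."""
    probs = denoiser(x_t, t)
    # Probability that a still-masked position is revealed during this step.
    reveal_prob = min(1.0, (kappa(t_next) - kappa(t)) / (1.0 - kappa(t)))
    # Propose a clean token for every position from the predicted posterior.
    proposals = np.array([rng.choice(VOCAB, p=p) for p in probs])
    # Only masked positions may change; revealed tokens are kept.
    reveal = (x_t == MASK) & (rng.random(x_t.shape) < reveal_prob)
    return np.where(reveal, proposals, x_t)


def generate(length, n_steps, denoiser, rng):
    """Start from an all-mask sequence and step from t = 0 to t = 1."""
    x = np.full(length, MASK)
    times = np.linspace(0.0, 1.0, n_steps + 1)
    for t, t_next in zip(times[:-1], times[1:]):
        x = sampling_step(x, t, t_next, denoiser, rng)
    return x


print(generate(8, 10, toy_denoiser, np.random.default_rng(0)))
```

Because every position can be revealed in any step, the number of steps is decoupled from the sequence length. That is the practical difference from autoregressive decoding, which emits one token per model call.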

Critical Analysis

The paper presents a promising approach for using flow-based models to generate high-quality discrete data, but there are a few potential limitations and areas for further research:

Benchmark Limitations: While the coding benchmarks used in the paper are valuable, they may not fully capture the complexity of natural language generation. Further testing on more diverse language tasks would help validate the approach.
Computational Complexity: Scaling the Discrete Flow Matching model to 1.7 billion parameters suggests it may have high computational and memory requirements, which could limit its practical applicability, especially on resource-constrained devices.
Interpretability: As with many large-scale neural models, the inner workings of Discrete Flow Matching may be difficult to interpret, which could hinder its use in applications that require explainable AI.
Potential Biases: Like other data-driven approaches, Discrete Flow Matching may inherit biases present in the training data, which could lead to undesirable outputs. Careful monitoring and mitigation of such biases would be important.

Overall, Discrete Flow Matching represents an interesting advance in the field of discrete data generation, but further research and real-world testing would be needed to fully evaluate its capabilities and limitations.

Conclusion

Discrete Flow Matching is a novel approach that extends the power of flow-based models to the generation of high-quality discrete data, such as text and code. By combining a general framework for probability path interpolation with a generic sampling formula, the model improves generative perplexity over previous discrete flow and diffusion models and narrows the gap with autoregressive approaches.

The scalability of Discrete Flow Matching, demonstrated by the 1.7 billion parameter model, suggests that this approach could have significant practical applications in areas like language modeling and code generation. However, further research is needed to address potential limitations, such as computational complexity, interpretability, and potential biases.

Overall, Discrete Flow Matching represents an exciting development in the field of generative modeling, and its ability to generate high-quality discrete data in a non-autoregressive fashion could pave the way for new applications and advancements in AI.

Related Papers

Discrete Flow Matching

Itai Gat, Tal Remez, Neta Shaul, Felix Kreuk, Ricky T. Q. Chen, Gabriel Synnaeve, Yossi Adi, Yaron Lipman

Despite Flow Matching and diffusion models having emerged as powerful generative paradigms for continuous variables such as images and videos, their application to high-dimensional discrete data, such as language, is still limited. In this work, we present Discrete Flow Matching, a novel discrete flow paradigm designed specifically for generating discrete data. Discrete Flow Matching offers several key contributions: (i) it works with a general family of probability paths interpolating between source and target distributions; (ii) it allows for a generic formula for sampling from these probability paths using learned posteriors such as the probability denoiser ($x$-prediction) and noise-prediction ($\epsilon$-prediction); (iii) practically, focusing on specific probability paths defined with different schedulers considerably improves generative perplexity compared to previous discrete diffusion and flow models; and (iv) by scaling Discrete Flow Matching models up to 1.7B parameters, we reach 6.7% Pass@1 and 13.4% Pass@10 on HumanEval and 6.7% Pass@1 and 20.6% Pass@10 on 1-shot MBPP coding benchmarks. Our approach is capable of generating high-quality discrete data in a non-autoregressive fashion, significantly closing the gap between autoregressive models and discrete flow models.

7/23/2024

Fisher Flow Matching for Generative Modeling over Discrete Data

Oscar Davis, Samuel Kessler, Mircea Petrache, İsmail İlkan Ceylan, Michael Bronstein, Avishek Joey Bose

Generative modeling over discrete data has recently seen numerous success stories, with applications spanning language modeling, biological sequence design, and graph-structured molecular data. The predominant generative modeling paradigm for discrete data is still autoregressive, with more recent alternatives based on diffusion or flow-matching falling short of their impressive performance in continuous data settings, such as image or video generation. In this work, we introduce Fisher-Flow, a novel flow-matching model for discrete data. Fisher-Flow takes a manifestly geometric perspective by considering categorical distributions over discrete data as points residing on a statistical manifold equipped with its natural Riemannian metric: the $\textit{Fisher-Rao metric}$. As a result, we demonstrate discrete data itself can be continuously reparameterised to points on the positive orthant of the $d$-hypersphere $\mathbb{S}^d_+$, which allows us to define flows that map any source distribution to target in a principled manner by transporting mass along (closed-form) geodesics of $\mathbb{S}^d_+$. Furthermore, the learned flows in Fisher-Flow can be further bootstrapped by leveraging Riemannian optimal transport leading to improved training dynamics. We prove that the gradient flow induced by Fisher-Flow is optimal in reducing the forward KL divergence. We evaluate Fisher-Flow on an array of synthetic and diverse real-world benchmarks, including designing DNA Promoter, and DNA Enhancer sequences. Empirically, we find that Fisher-Flow improves over prior diffusion and flow-matching models on these benchmarks.

5/30/2024

Flow Map Matching

Nicholas M. Boffi, Michael S. Albergo, Eric Vanden-Eijnden

Generative models based on dynamical transport of measure, such as diffusion models, flow matching models, and stochastic interpolants, learn an ordinary or stochastic differential equation whose trajectories push initial conditions from a known base distribution onto the target. While training is cheap, samples are generated via simulation, which is more expensive than one-step models like GANs. To close this gap, we introduce flow map matching -- an algorithm that learns the two-time flow map of an underlying ordinary differential equation. The approach leads to an efficient few-step generative model whose step count can be chosen a-posteriori to smoothly trade off accuracy for computational expense. Leveraging the stochastic interpolant framework, we introduce losses for both direct training of flow maps and distillation from pre-trained (or otherwise known) velocity fields. Theoretically, we show that our approach unifies many existing few-step generative models, including consistency models, consistency trajectory models, progressive distillation, and neural operator approaches, which can be obtained as particular cases of our formalism. With experiments on CIFAR-10 and ImageNet 32x32, we show that flow map matching leads to high-quality samples with significantly reduced sampling cost compared to diffusion or stochastic interpolant methods.

6/12/2024

Dirichlet Flow Matching with Applications to DNA Sequence Design

Hannes Stark, Bowen Jing, Chenyu Wang, Gabriele Corso, Bonnie Berger, Regina Barzilay, Tommi Jaakkola

Discrete diffusion or flow models could enable faster and more controllable sequence generation than autoregressive models. We show that naive linear flow matching on the simplex is insufficient toward this goal since it suffers from discontinuities in the training target and further pathologies. To overcome this, we develop Dirichlet flow matching on the simplex based on mixtures of Dirichlet distributions as probability paths. In this framework, we derive a connection between the mixtures' scores and the flow's vector field that allows for classifier and classifier-free guidance. Further, we provide distilled Dirichlet flow matching, which enables one-step sequence generation with minimal performance hits, resulting in $O(L)$ speedups compared to autoregressive models. On complex DNA sequence generation tasks, we demonstrate superior performance compared to all baselines in distributional metrics and in achieving desired design targets for generated sequences. Finally, we show that our classifier-free guidance approach improves unconditional generation and is effective for generating DNA that satisfies design targets. Code is available at https://github.com/HannesStark/dirichlet-flow-matching.

6/3/2024