RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

Read original: arXiv:2405.14677 - Published 5/24/2024 by Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Di Zhang, Yang Song, Kun Gai, Yadong Mu

Overview

This paper explores a new problem: customizing diffusion models to generate identity-preserving images from user-provided reference images.
Existing approaches typically require extensive training on domain-specific images, which limits their flexibility across different use cases.
The researchers propose using classifier guidance, a training-free technique that steers diffusion models with an existing classifier, to enable personalized image generation.

Plain English Explanation

The paper discusses a new way to create personalized images using diffusion models, a type of AI system that can generate realistic-looking images. Typically, getting such a model to preserve the identity of a particular subject requires training it on many images from a specific domain (like faces or objects).

However, the researchers found a way to avoid this extensive training process. They use a technique called "classifier guidance" that allows the diffusion model to be steered using an existing image classifier (a model that can recognize and categorize images). This makes the process more flexible, as it doesn't require training the diffusion model on a large dataset of domain-specific images.

The key insight is that, on top of a rectified flow model, the guidance can be computed with a simple fixed-point procedure, so the model generates images that preserve the identity shown in a user-provided reference image without extensive training. This allows for more personalized and customized image generation. Related work includes ID Animator: Zero-Shot Identity Preserving Human Animation, Relation Rectification Diffusion Model, and Customize Your Own Paired Data via Few-Shot Learning.

Technical Explanation

The paper builds on a recent generative framework called "rectified flow." Vanilla classifier guidance requires a special classifier, in practice one trained to handle the noisy intermediate images of the sampling process. The key innovation is showing that, on a rectified flow, this limitation can be resolved with a simple fixed-point solution, which allows the use of off-the-shelf image discriminators.
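Rectified flows are trained to follow nearly straight paths from noise to data. The toy sketch below illustrates why that matters here: on a perfectly straight path, the clean endpoint can be estimated from any intermediate point, so a discriminator that only understands clean images could in principle be applied to that estimate. Everything in it is an assumption for illustration (the hand-written `velocity` field, the 2-D "latents", and the function names); it is not the paper's model or code.

```python
def velocity(z, t, target):
    """Toy velocity field whose exact paths are straight lines ending at `target`.

    A real rectified flow would use a trained network here (assumption).
    """
    return [(m - x) / (1.0 - t) for x, m in zip(z, target)]


def euler_sample(z0, target, num_steps):
    """Integrate dz/dt = velocity(z, t) from t=0 to t=1 with Euler steps."""
    z = list(z0)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt  # t stays strictly below 1, so velocity() never divides by zero
        v = velocity(z, t, target)
        z = [x + dt * vx for x, vx in zip(z, v)]
    return z


def predict_endpoint(z_t, t, target):
    """One-step estimate of the clean endpoint from an intermediate point."""
    v = velocity(z_t, t, target)
    return [x + (1.0 - t) * vx for x, vx in zip(z_t, v)]


noise = [2.0, -1.0]
target = [0.5, 0.25]

one_step = euler_sample(noise, target, 1)
many_steps = euler_sample(noise, target, 8)

# A point halfway along the straight path, and the endpoint estimated from it.
midpoint = [0.5 * (a + b) for a, b in zip(noise, target)]
from_mid = predict_endpoint(midpoint, 0.5, target)

print(one_step, many_steps, from_mid)
```

In this idealized setting one Euler step, eight Euler steps, and the midpoint estimate all land on the same endpoint. A trained model is only approximately straight, so the agreement would be approximate in practice.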

Specifically, the paper shows that anchoring the solving procedure to a reference flow trajectory makes it stable, with a convergence guarantee, enabling flexible personalization with a variety of pre-trained image discriminators. The researchers implement this method on rectified flow and evaluate it on generating personalized images of human faces, live subjects, and certain objects, reporting advantageous personalization results.
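To make the idea of an anchored fixed-point solve concrete, here is a minimal toy sketch. It assumes the guided endpoint is defined implicitly as the anchor (the unguided endpoint of a reference trajectory) plus a scaled gradient of a discriminator log-score evaluated at the guided endpoint itself. The quadratic log-score, the `scale` parameter, and all function names are assumptions chosen so the answer can be checked by hand; the paper's actual update rule is not given in this summary.

```python
def grad_log_score(z, reference):
    """Gradient of the toy log-score -0.5 * ||z - reference||^2 with respect to z.

    Stands in for the gradient of an off-the-shelf discriminator (assumption).
    """
    return [r - x for x, r in zip(z, reference)]


def anchored_fixed_point(anchor, reference, scale, num_iters=50, tol=1e-12):
    """Iterate z <- anchor + scale * grad_log_score(z) until it stops changing.

    Each update restarts from the fixed `anchor` rather than from the previous
    iterate, which is what "anchored" means in this sketch.
    """
    z = list(anchor)
    for _ in range(num_iters):
        g = grad_log_score(z, reference)
        z_new = [a + scale * gx for a, gx in zip(anchor, g)]
        if max(abs(p - q) for p, q in zip(z_new, z)) < tol:
            return z_new
        z = z_new
    return z


anchor = [0.0, 0.0]      # unguided endpoint of the reference trajectory
reference = [3.0, 3.0]   # embedding of the user-provided reference (toy)
guided = anchored_fixed_point(anchor, reference, scale=0.5)

# For this quadratic score the fixed point has a closed form:
# z = anchor + scale * (reference - z)  =>  z = (anchor + scale * reference) / (1 + scale)
closed_form = [(a + 0.5 * r) / 1.5 for a, r in zip(anchor, reference)]

print(guided, closed_form)
```

In this toy the update is a contraction only when `scale` is below 1; larger values make the naive iteration oscillate and diverge. That is a property of the sketch and says nothing about the conditions of the paper's convergence guarantee.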

The papers LCM: Lookahead Encoder-based Text-to-Image and Subject Diffusion: Open-Domain Personalized Text-to-Image explore related ideas in personalized image generation, though they use different techniques.

Critical Analysis

The paper presents a compelling solution to the problem of customizing diffusion models for personalized image generation. The key strength is the flexibility of the approach, which avoids the need for extensive domain-specific training and instead leverages off-the-shelf image classifiers.

One potential limitation is the reliance on having a suitable pre-trained image classifier available. While the researchers demonstrate the method works with various classifiers, the performance may still be dependent on the quality and relevance of the chosen classifier.

Additionally, the paper focuses on generating images that preserve the identity of the subject, but it does not explore other aspects of personalization, such as generating images with specific styles, poses, or compositions. Further research could investigate expanding the personalization capabilities of the approach.

Overall, the paper makes a valuable contribution to the field of diffusion models and personalized image generation, and the proposed technique could be a useful tool for a variety of applications.

Conclusion

This paper presents a novel approach to customizing diffusion models for personalized image generation, using a training-free technique called classifier guidance. By exploiting a recent rectified flow framework, the researchers demonstrate a simple fixed-point solution that allows the use of off-the-shelf image classifiers, enabling flexible personalization without the need for extensive domain-specific training.

The reported results show the method generating identity-preserving images for human faces, live subjects, and certain objects, with advantageous personalization results. This work is a step toward more accessible and versatile personalized image generation. Related work includes LCM: Lookahead Encoder-based Text-to-Image, ID Animator: Zero-Shot Identity Preserving Human Animation, and Customize Your Own Paired Data via Few-Shot Learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance

Zhicheng Sun, Zhenhao Yang, Yang Jin, Haozhe Chi, Kun Xu, Kun Xu, Liwei Chen, Hao Jiang, Di Zhang, Yang Song, Kun Gai, Yadong Mu

Customizing diffusion models to generate identity-preserving images from user-provided reference images is an intriguing new problem. The prevalent approaches typically require training on extensive domain-specific images to achieve identity preservation, which lacks flexibility across different use cases. To address this issue, we exploit classifier guidance, a training-free technique that steers diffusion models using an existing classifier, for personalized image generation. Our study shows that based on a recent rectified flow framework, the major limitation of vanilla classifier guidance in requiring a special classifier can be resolved with a simple fixed-point solution, allowing flexible personalization with off-the-shelf image discriminators. Moreover, its solving procedure proves to be stable when anchored to a reference flow trajectory, with a convergence guarantee. The derived method is implemented on rectified flow with different off-the-shelf image discriminators, delivering advantageous personalization results for human faces, live subjects, and certain objects. Code is available at https://github.com/feifeiobama/RectifID.

5/24/2024

Improving the Training of Rectified Flows

Sangyun Lee, Zinan Lin, Giulia Fanti

Diffusion models have shown great promise for image and video generation, but sampling from state-of-the-art models requires expensive numerical integration of a generative ODE. One approach for tackling this problem is rectified flows, which iteratively learn smooth ODE paths that are less susceptible to truncation error. However, rectified flows still require a relatively large number of function evaluations (NFEs). In this work, we propose improved techniques for training rectified flows, allowing them to compete with knowledge distillation methods even in the low NFE setting. Our main insight is that under realistic settings, a single iteration of the Reflow algorithm for training rectified flows is sufficient to learn nearly straight trajectories; hence, the current practice of using multiple Reflow iterations is unnecessary. We thus propose techniques to improve one-round training of rectified flows, including a U-shaped timestep distribution and LPIPS-Huber premetric. With these techniques, we improve the FID of the previous 2-rectified flow by up to 72% in the 1 NFE setting on CIFAR-10. On ImageNet 64$\times$64, our improved rectified flow outperforms the state-of-the-art distillation methods such as consistency distillation and progressive distillation in both one-step and two-step settings and rivals the performance of improved consistency training (iCT) in FID. Code is available at https://github.com/sangyun884/rfpp.

5/31/2024

Text-to-Image Rectified Flow as Plug-and-Play Priors

Xiaofeng Yang, Cheng Chen, Xulei Yang, Fayao Liu, Guosheng Lin

Large-scale diffusion models have achieved remarkable performance in generative tasks. Beyond their initial training applications, these models have proven their ability to function as versatile plug-and-play priors. For instance, 2D diffusion models can serve as loss functions to optimize 3D implicit models. Rectified flow, a novel class of generative models, enforces a linear progression from the source to the target distribution and has demonstrated superior performance across various domains. Compared to diffusion-based methods, rectified flow approaches surpass in terms of generation quality and efficiency, requiring fewer inference steps. In this work, we present theoretical and experimental evidence demonstrating that rectified flow based methods offer similar functionalities to diffusion models - they can also serve as effective priors. Besides the generative capabilities of diffusion priors, motivated by the unique time-symmetry properties of rectified flow models, a variant of our method can additionally perform image inversion. Experimentally, our rectified flow-based priors outperform their diffusion counterparts - the SDS and VSD losses - in text-to-3D generation. Our method also displays competitive performance in image inversion and editing.

6/6/2024

Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training

Ke Niu, Haiyang Yu, Xuelin Qian, Teng Fu, Bin Li, Xiangyang Xue

Existing person re-identification (Re-ID) methods principally deploy the ImageNet-1K dataset for model initialization, which inevitably results in sub-optimal situations due to the large domain gap. One of the key challenges is that building large-scale person Re-ID datasets is time-consuming. Some previous efforts address this problem by collecting person images from the internet e.g., LUPerson, but it struggles to learn from unlabeled, uncontrollable, and noisy data. In this paper, we present a novel paradigm Diffusion-ReID to efficiently augment and generate diverse images based on known identities without requiring any cost of data collection and annotation. Technically, this paradigm unfolds in two stages: generation and filtering. During the generation stage, we propose Language Prompts Enhancement (LPE) to ensure the ID consistency between the input image sequence and the generated images. In the diffusion process, we propose a Diversity Injection (DI) module to increase attribute diversity. In order to make the generated data have higher quality, we apply a Re-ID confidence threshold filter to further remove the low-quality images. Benefiting from our proposed paradigm, we first create a new large-scale person Re-ID dataset Diff-Person, which consists of over 777K images from 5,183 identities. Next, we build a stronger person Re-ID backbone pre-trained on our Diff-Person. Extensive experiments are conducted on four person Re-ID benchmarks in six widely used settings. Compared with other pre-training and self-supervised competitors, our approach shows significant superiority.

6/11/2024