UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Read original: arXiv:2408.05939 - Published 9/9/2024 by Junjie He, Yifeng Geng, Liefeng Bo

Overview

The paper proposes UniPortrait, a unified framework for identity-preserving single- and multi-human image personalization.
It targets high face fidelity, extensive facial editability, free-form input descriptions, and diverse layout generation.
The same framework handles both single-ID and multi-ID customization using two plug-and-play modules: an ID embedding module and an ID routing module.

Plain English Explanation

The UniPortrait framework is designed to allow people to personalize images of themselves or others in a way that preserves the individual's identity. This means you can make changes to the image, such as altering the clothing or background, without dramatically changing how the person in the image looks.

The framework works for both single-person images and images with multiple people. This allows users to customize group photos or images with multiple individuals in a consistent way. The key innovation is the ability to maintain the core identity of the people in the image even as other elements are changed.

This could be useful for tasks like editing family photos, creating personalized avatars, or generating images for virtual communication, while still ensuring the people remain recognizable. The framework aims to strike a balance between personalization and preserving the individual's identity.

Technical Explanation

According to the paper's abstract, UniPortrait consists of only two plug-and-play modules: an ID embedding module and an ID routing module.

The ID embedding module extracts versatile, editable facial features for each identity (ID) using a decoupling strategy, then embeds them into the context space of a diffusion model. The aim is to keep each face editable through the text description without losing the person's identity.

For multi-human images, the ID routing module combines these embeddings and adaptively distributes each one to its respective region within the synthesized image, so that each person's identity is applied only where that person appears. The same mechanism covers the single-ID case. The modules are trained with a two-stage scheme, and the authors report advantages over existing approaches in both quantitative and qualitative experiments, along with compatibility with existing generative control tools.
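
The paper's abstract describes an ID routing module that distributes each ID embedding to its own region of the synthesized image. The abstract does not spell out how routing is computed, so the following is only a minimal sketch of the general idea, assuming scaled dot-product scores between spatial features and per-ID keys, followed by a hard choice of one ID per location. The names `route_ids`, `routed_context`, `id_keys`, and `id_values` are hypothetical and do not come from the paper or its code.

```python
import math


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_ids(pixel_feats, id_keys):
    """Return, per spatial location, the winning ID index and soft weights.

    ASSUMPTION: scaled dot-product scoring with argmax selection; the
    actual UniPortrait router may differ.
    """
    scale = 1.0 / math.sqrt(len(id_keys[0]))
    assignments, weights = [], []
    for feat in pixel_feats:
        probs = softmax([dot(feat, key) * scale for key in id_keys])
        weights.append(probs)
        assignments.append(max(range(len(probs)), key=probs.__getitem__))
    return assignments, weights


def routed_context(pixel_feats, id_keys, id_values):
    """Give each location only the embedding of the ID it was routed to."""
    assignments, _ = route_ids(pixel_feats, id_keys)
    return [id_values[k] for k in assignments]


# Toy data (made up for illustration): two IDs, four spatial locations.
id_keys = [[1.0, 0.0], [0.0, 1.0]]
id_values = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.8]]
pixel_feats = [[2.0, 0.1], [1.5, 0.3], [0.2, 1.8], [0.0, 2.2]]

assignments, weights = route_ids(pixel_feats, id_keys)
context = routed_context(pixel_feats, id_keys, id_values)
print(assignments)
```

In this toy setup the first two locations align with the first ID's key and the last two with the second, so each half of the "image" receives a different person's embedding. A hard choice per location is one way to keep identities from blending across regions; a trainable version would need a differentiable relaxation, which this sketch leaves out.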

Overall, the UniPortrait framework represents a significant advance in the field of identity-preserving image personalization, with applications in areas like photography, virtual communication, and digital avatar creation.

Critical Analysis

The paper provides a comprehensive and technically robust framework for identity-preserving image personalization. However, some potential limitations and areas for further research are worth noting.

The framework relies on the ID embedding module extracting faithful yet editable facial features and on the ID routing module assigning each embedding to the correct region. Both can be challenging, especially for more diverse or complex facial features. Further research may be needed to improve the robustness and generalization of these core components.

Additionally, the paper does not address potential privacy concerns or ethical considerations around the use of such technology. As with any system that can manipulate human images, there are risks of misuse or unintended consequences that should be carefully considered.

Furthermore, the paper focuses primarily on technical aspects and does not provide much insight into user experience or real-world deployment challenges. Understanding how such a framework would be received and used by the general public would be an important area for future research.

Conclusion

The UniPortrait framework represents a significant advancement in the field of identity-preserving image personalization. By enabling both single-human and multi-human image customization while preserving individual identities, the framework has the potential to impact a wide range of applications, from photography and virtual communication to digital avatar creation.

The ID embedding module, the ID routing module, and the two-stage training scheme demonstrate the depth of the research. However, further work is needed to address potential limitations and ensure the ethical and responsible deployment of such technology. Overall, the UniPortrait framework is an exciting step forward in the field of image personalization.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization

Junjie He, Yifeng Geng, Liefeng Bo

This paper presents UniPortrait, an innovative human image personalization framework that unifies single- and multi-ID customization with high face fidelity, extensive facial editability, free-form input description, and diverse layout generation. UniPortrait consists of only two plug-and-play modules: an ID embedding module and an ID routing module. The ID embedding module extracts versatile editable facial features with a decoupling strategy for each ID and embeds them into the context space of diffusion models. The ID routing module then combines and distributes these embeddings adaptively to their respective regions within the synthesized image, achieving the customization of single and multiple IDs. With a carefully designed two-stage training scheme, UniPortrait achieves superior performance in both single- and multi-ID customization. Quantitative and qualitative experiments demonstrate the advantages of our method over existing approaches as well as its good scalability, e.g., the universal compatibility with existing generative control tools. The project page is at https://aigcdesigngroup.github.io/UniPortrait-Page/ .

9/9/2024

ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving

Jiehui Huang, Xiao Dong, Wenhui Song, Hanhui Li, Jun Zhou, Yuhao Cheng, Shutao Liao, Long Chen, Yiqiang Yan, Shengcai Liao, Xiaodan Liang

Diffusion-based technologies have made significant strides, particularly in personalized and customized facial generation. However, existing methods face challenges in achieving high-fidelity and detailed identity (ID) consistency, primarily due to insufficient fine-grained control over facial areas and the lack of a comprehensive strategy for ID preservation by fully considering intricate facial details and the overall face. To address these limitations, we introduce ConsistentID, an innovative method crafted for diverse identity-preserving portrait generation under fine-grained multimodal facial prompts, utilizing only a single reference image. ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions. Together, these components significantly enhance the accuracy of ID preservation by introducing fine-grained multimodal ID information from facial regions. To facilitate training of ConsistentID, we present a fine-grained portrait dataset, FGID, with over 500,000 facial images, offering greater diversity and comprehensiveness than existing public facial datasets. Experimental results substantiate that our ConsistentID achieves exceptional precision and diversity in personalized facial generation, surpassing existing methods in the MyStyle dataset. Furthermore, while ConsistentID introduces more multimodal ID information, it maintains a fast inference speed during generation.

4/26/2024

Portrait3D: 3D Head Generation from Single In-the-wild Portrait Image

Jinkun Hao, Junshu Tang, Jiangning Zhang, Ran Yi, Yijia Hong, Moran Li, Weijian Cao, Yating Wang, Lizhuang Ma

While recent works have achieved great success on one-shot 3D common object generation, high quality and fidelity 3D head generation from a single image remains a great challenge. Previous text-based methods for generating 3D heads were limited by text descriptions and image-based methods struggled to produce high-quality head geometry. To handle this challenging problem, we propose a novel framework, Portrait3D, to generate high-quality 3D heads while preserving their identities. Our work incorporates the identity information of the portrait image into three parts: 1) geometry initialization, 2) geometry sculpting, and 3) texture generation stages. Given a reference portrait image, we first align the identity features with text features to realize ID-aware guidance enhancement, which contains the control signals representing the face information. We then use the canny map, ID features of the portrait image, and a pre-trained text-to-normal/depth diffusion model to generate ID-aware geometry supervision, and 3D-GAN inversion is employed to generate ID-aware geometry initialization. Furthermore, with the ability to inject identity information into 3D head generation, we use ID-aware guidance to calculate ID-aware Score Distillation (ISD) for geometry sculpting. For texture generation, we adopt the ID Consistent Texture Inpainting and Refinement which progressively expands the view for texture inpainting to obtain an initialization UV texture map. We then use the id-aware guidance to provide image-level supervision for noisy multi-view images to obtain a refined texture map. Extensive experiments demonstrate that we can generate high-quality 3D heads with accurate geometry and texture from single in-the-wild portrait images. The project page is at https://jinkun-hao.github.io/Portrait3D/.

6/26/2024

CapHuman: Capture Your Moments in Parallel Universes

Chao Liang, Fan Ma, Linchao Zhu, Yingying Deng, Yi Yang

We concentrate on a novel human-centric image synthesis task, that is, given only one reference facial photograph, it is expected to generate specific individual images with diverse head positions, poses, facial expressions, and illuminations in different contexts. To accomplish this goal, we argue that our generative model should be capable of the following favorable characteristics: (1) a strong visual and semantic understanding of our world and human society for basic object and human image generation. (2) generalizable identity preservation ability. (3) flexible and fine-grained head control. Recently, large pre-trained text-to-image diffusion models have shown remarkable results, serving as a powerful generative foundation. As a basis, we aim to unleash the above two capabilities of the pre-trained model. In this work, we present a new framework named CapHuman. We embrace the "encode then learn to align" paradigm, which enables generalizable identity preservation for new individuals without cumbersome tuning at inference. CapHuman encodes identity features and then learns to align them into the latent space. Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner. Extensive qualitative and quantitative analyses demonstrate our CapHuman can produce well-identity-preserved, photo-realistic, and high-fidelity portraits with content-rich representations and various head renditions, superior to established baselines. Code and checkpoint will be released at https://github.com/VamosC/CapHuman.

5/20/2024