LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

arXiv:2407.03168, published 7/4/2024 by Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang

Overview

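This paper presents "LivePortrait", a framework for efficient and controllable portrait animation: it animates a single source portrait image using motion (facial expressions and head pose) taken from a driving video.
Instead of a diffusion model, it builds on implicit keypoints, which balances computational efficiency and controllability.
Key additions are a stitching module and two retargeting modules (for the eyes and the lips), each a small MLP that adjusts the keypoints with negligible overhead.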

Plain English Explanation

The paper introduces a system called "LivePortrait" that takes a single photo of a person's face and animates it, copying the head movements and facial expressions from a separate "driving" video. On top of its core animation model, the system adds two key techniques:

Stitching: The face is animated from a cropped region of the photo. The stitching module adjusts the animation so the cropped face can be pasted back into the full original image without visible seams or misalignment at the crop border (for example, around the shoulders). This avoids jarring glitches in the final animation.
Retargeting Control: Two retargeting modules let the user control how open the eyes and the lips are, independently of the driving video. This helps, for example, to make sure the eyes can close fully when the person in the driving video has a very different face, or to make the portrait blink or close its mouth on command. This level of control is useful for applications like virtual avatars or video production.
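
The retargeting controls are easiest to picture as a single number per facial region. The sketch below is a hypothetical illustration, not LivePortrait's code: it assumes an openness ratio defined as the distance between two landmarks (upper and lower eyelid, or upper and lower lip) divided by the distance between the two corners of the eye or mouth. A value near 0 means closed; larger values mean more open. The function name and the choice of landmarks are assumptions.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # (x, y) image coordinates


def openness_ratio(upper: Point, lower: Point, corner_a: Point, corner_b: Point) -> float:
    """Hypothetical openness measure: upper-to-lower gap divided by corner-to-corner width."""
    width = math.dist(corner_a, corner_b)
    if width == 0.0:
        raise ValueError("corner landmarks coincide")
    return math.dist(upper, lower) / width


# An open eye 4 units wide with lids 2 units apart, and the same eye closed.
open_eye = openness_ratio((2.0, 1.0), (2.0, -1.0), (0.0, 0.0), (4.0, 0.0))
closed_eye = openness_ratio((2.0, 0.0), (2.0, 0.0), (0.0, 0.0), (4.0, 0.0))
print(open_eye, closed_eye)  # 0.5 0.0
```

A retargeting control then amounts to asking the model for a target ratio, for example 0.0 to force a blink, while the rest of the motion still comes from the driving video.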

The core innovations in this paper are the stitching and retargeting modules. Both are small add-ons to the main model, so they add control without noticeably slowing down generation, which lets LivePortrait produce high-quality, customizable portrait animations efficiently.

Technical Explanation

The LivePortrait system takes a source portrait image and a driving video as input and produces an animation of the source face that follows the driving motion. Motion is represented with compact implicit keypoints, which the authors find can act as a kind of blendshapes. The key technical innovations are:

Stitching Module: A small MLP takes the source and driving keypoints and predicts an offset for the driving keypoints, so that the animated crop lines up with the original image when it is pasted back. This avoids pixel misalignment at the crop boundary, such as in the shoulder region, while preserving the driving motion.
Retargeting Modules: Two further small MLPs control the eyes and the lips. Each is conditioned on the source keypoints and on openness ratios for its region and predicts a keypoint offset, so the user can set how open the eyes or mouth are. Because these MLPs operate only on keypoints, their computational overhead is negligible.
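
To make the keypoint pipeline concrete, here is a minimal pure-Python sketch. It assumes the face-vid2vid-style transformation that implicit-keypoint methods build on, where canonical keypoints x_c are rotated by R, shifted by an expression offset delta, scaled by s and translated by t, giving x = s * (x_c R + delta) + t. The stitching and eye-retargeting steps are modelled as small MLPs whose predicted offsets are added to the driving keypoints. The keypoint count K = 21, the TinyMLP class, its layer sizes and random weights, the ratio values, and all helper names are hypothetical placeholders, not the released LivePortrait code or trained weights.

```python
import random
from typing import List

Keypoints = List[List[float]]  # K rows of (x, y, z)
Matrix3 = List[List[float]]


def transform_keypoints(x_c: Keypoints, R: Matrix3, delta: Keypoints,
                        s: float, t: List[float]) -> Keypoints:
    """Apply x = s * (x_c R + delta) + t row-wise (each keypoint is a row vector)."""
    out = []
    for xk, dk in zip(x_c, delta):
        rotated = [sum(xk[i] * R[i][j] for i in range(3)) for j in range(3)]
        out.append([s * (rotated[j] + dk[j]) + t[j] for j in range(3)])
    return out


def flatten(kp: Keypoints) -> List[float]:
    return [v for row in kp for v in row]


def unflatten(v: List[float]) -> Keypoints:
    return [v[i:i + 3] for i in range(0, len(v), 3)]


class TinyMLP:
    """Hypothetical Linear-ReLU-Linear network (no biases) standing in for a trained module."""

    def __init__(self, n_in: int, n_hidden: int, n_out: int,
                 seed: int = 0, scale: float = 0.01):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_hidden)]
        self.w2 = [[rng.gauss(0.0, scale) for _ in range(n_hidden)] for _ in range(n_out)]

    def __call__(self, x: List[float]) -> List[float]:
        h = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.w2]


def add_offsets(x_d: Keypoints, *offsets: Keypoints) -> Keypoints:
    """Return x_d plus every offset (e.g. stitching, eyes and lip offsets)."""
    out = [list(row) for row in x_d]
    for off in offsets:
        for row, d in zip(out, off):
            for j in range(3):
                row[j] += d[j]
    return out


# Example with made-up values.
K = 21  # hypothetical number of implicit keypoints
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ZEROS = [[0.0, 0.0, 0.0] for _ in range(K)]
rng = random.Random(1)
x_c = [[rng.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(K)]

x_s = transform_keypoints(x_c, IDENTITY, ZEROS, 1.0, [0.0, 0.0, 0.0])   # source pose
x_d = transform_keypoints(x_c, IDENTITY, ZEROS, 1.1, [0.05, 0.0, 0.0])  # driving pose

stitching = TinyMLP(2 * 3 * K, 64, 3 * K)     # input: source + driving keypoints
eyes = TinyMLP(3 * K + 2, 32, 3 * K, seed=2)  # input: source keypoints + 2 ratios
c_s_eyes, c_d_eyes = 0.35, 0.0                # hypothetical ratios: request closed eyes

delta_st = unflatten(stitching(flatten(x_s) + flatten(x_d)))
delta_eyes = unflatten(eyes(flatten(x_s) + [c_s_eyes, c_d_eyes]))
x_d_new = add_offsets(x_d, delta_st, delta_eyes)
```

Because each module maps only a few dozen keypoint coordinates to offsets, it costs a handful of small matrix multiplications per frame, which fits the description of these modules as having negligible overhead next to the image-warping and decoding networks.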

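The paper also describes the architecture (an appearance feature extractor, a motion extractor, a warping module and a decoder), the two-stage training procedure (the base model first, then the stitching and retargeting modules with the base model frozen), and evaluations against related non-diffusion and diffusion-based methods. To improve quality and generalization, training is scaled up to about 69 million high-quality frames with a mixed image-video strategy. The reported generation speed is 12.8 ms on an RTX 4090 GPU with PyTorch.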
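
As a quick sanity check on what the abstract's 12.8 ms figure means for throughput, assuming it is the per-frame generation time on the RTX 4090:

```latex
\text{fps} = \frac{1000~\text{ms/s}}{12.8~\text{ms/frame}} = 78.125~\text{frames/s} \approx 78~\text{fps}
```

That comfortably exceeds common 25 to 30 fps video frame rates, although end-to-end throughput also depends on face cropping, pre- and post-processing, and I/O, which this figure may not include.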

Critical Analysis

LivePortrait shows that an implicit-keypoint framework, scaled up with more training data and an improved architecture and objectives, can produce results competitive with diffusion-based methods while staying fast enough for practical use. The stitching and retargeting modules add useful control at very little computational cost.

Some potential limitations deserve further study. For example, the system may struggle with driving videos that have significant occlusions or poor lighting, or with very large head-pose differences between the source image and the driving video. Additionally, the retargeting controls cover only eye and lip openness, and it would be interesting to see whether the approach could be extended to more nuanced and personalized animation controls.

Overall, the LivePortrait system is a promising step forward in the field of portrait animation, and the techniques introduced in this paper could have important implications for applications such as virtual avatars, video production, and human-computer interaction. Further research and development in this area could lead to even more advanced and versatile portrait animation systems.

Conclusion

The LivePortrait system animates a single portrait image with motion from a driving video, building on an implicit-keypoint framework instead of diffusion. Its stitching module and its eye and lip retargeting modules, each a small MLP, provide seamless paste-back into the original image and fine control over the eyes and mouth with negligible computational overhead.

Open questions remain, but the techniques introduced could have important applications in domains such as virtual avatars, video production, and human-computer interaction, and further research could lead to even more advanced and versatile portrait animation systems.

This summary was produced with help from an AI and may contain inaccuracies; read the original source documents to verify details.

Related Papers

LivePortrait: Efficient Portrait Animation with Stitching and Retargeting Control

Jianzhu Guo, Dingyun Zhang, Xiaoqiang Liu, Zhizhou Zhong, Yuan Zhang, Pengfei Wan, Di Zhang

Portrait Animation aims to synthesize a lifelike video from a single source image, using it as an appearance reference, with motion (i.e., facial expressions and head pose) derived from a driving video, audio, text, or generation. Instead of following mainstream diffusion-based methods, we explore and extend the potential of the implicit-keypoint-based framework, which effectively balances computational efficiency and controllability. Building upon this, we develop a video-driven portrait animation framework named LivePortrait with a focus on better generalization, controllability, and efficiency for practical usage. To enhance the generation quality and generalization ability, we scale up the training data to about 69 million high-quality frames, adopt a mixed image-video training strategy, upgrade the network architecture, and design better motion transformation and optimization objectives. Additionally, we discover that compact implicit keypoints can effectively represent a kind of blendshapes and meticulously propose a stitching and two retargeting modules, which utilize a small MLP with negligible computational overhead, to enhance the controllability. Experimental results demonstrate the efficacy of our framework even compared to diffusion-based methods. The generation speed remarkably reaches 12.8ms on an RTX 4090 GPU with PyTorch. The inference code and models are available at https://github.com/KwaiVGI/LivePortrait

7/4/2024

X-Portrait: Expressive Portrait Animation with Hierarchical Motion Attention

You Xie, Hongyi Xu, Guoxian Song, Chao Wang, Yichun Shi, Linjie Luo

We propose X-Portrait, an innovative conditional diffusion model tailored for generating expressive and temporally coherent portrait animation. Specifically, given a single portrait as appearance reference, we aim to animate it with motion derived from a driving video, capturing both highly dynamic and subtle facial expressions along with wide-range head movements. At its core, we leverage the generative prior of a pre-trained diffusion model as the rendering backbone, while achieving fine-grained head pose and expression control with novel controlling signals within the framework of ControlNet. In contrast to conventional coarse explicit controls such as facial landmarks, our motion control module is learned to interpret the dynamics directly from the original driving RGB inputs. The motion accuracy is further enhanced with a patch-based local control module that effectively enhances the motion attention to small-scale nuances like eyeball positions. Notably, to mitigate the identity leakage from the driving signals, we train our motion control modules with scaling-augmented cross-identity images, ensuring maximized disentanglement from the appearance reference modules. Experimental results demonstrate the universal effectiveness of X-Portrait across a diverse range of facial portraits and expressive driving sequences, and showcase its proficiency in generating captivating portrait animations with consistently maintained identity characteristics.

7/29/2024

Portrait Video Editing Empowered by Multimodal Generative Priors

Xuan Gao, Haiyao Xiao, Chenglai Zhong, Shimin Hu, Yudong Guo, Juyong Zhang

We introduce PortraitGen, a powerful portrait video editing method that achieves consistent and expressive stylization with multimodal prompts. Traditional portrait video editing methods often struggle with 3D and temporal consistency, and typically lack in rendering quality and efficiency. To address these issues, we lift the portrait video frames to a unified dynamic 3D Gaussian field, which ensures structural and temporal coherence across frames. Furthermore, we design a novel Neural Gaussian Texture mechanism that not only enables sophisticated style editing but also achieves rendering speed over 100FPS. Our approach incorporates multimodal inputs through knowledge distilled from large-scale 2D generative models. Our system also incorporates expression similarity guidance and a face-aware portrait editing module, effectively mitigating degradation issues associated with iterative dataset updates. Extensive experiments demonstrate the temporal consistency, editing efficiency, and superior rendering quality of our method. The broad applicability of the proposed approach is demonstrated through various applications, including text-driven editing, image-driven editing, and relighting, highlighting its great potential to advance the field of video editing. Demo videos and released code are provided in our project page: https://ustc3dv.github.io/PortraitGen/

9/23/2024

MobilePortrait: Real-Time One-Shot Neural Head Avatars on Mobile Devices

Jianwen Jiang, Gaojie Lin, Zhengkun Rong, Chao Liang, Yongming Zhu, Jiaqi Yang, Tianyun Zhong

Existing neural head avatar methods have achieved significant progress in the image quality and motion range of portrait animation. However, these methods neglect the computational overhead, and to the best of our knowledge, none is designed to run on mobile devices. This paper presents MobilePortrait, a lightweight one-shot neural head avatar method that reduces learning complexity by integrating external knowledge into both the motion modeling and image synthesis, enabling real-time inference on mobile devices. Specifically, we introduce a mixed representation of explicit and implicit keypoints for precise motion modeling and precomputed visual features for enhanced foreground and background synthesis. With these two key designs and using simple U-Nets as backbones, our method achieves state-of-the-art performance with less than one-tenth the computational demand. It has been validated to reach speeds of over 100 FPS on mobile devices and support both video and audio-driven inputs.

7/9/2024