EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting

Read original: arXiv:2404.09918 - Published 4/16/2024 by Min-Hui Lin, Mahesh Reddy, Guillaume Berger, Michel Sarkis, Fatih Porikli, Ning Bi

Overview

This paper presents EdgeRelight360, a system that generates high-dynamic-range (HDR) 360-degree images for real-time video portrait relighting on mobile devices.
The system uses a text-conditional generative model to create HDR images that match a given lighting description, enabling on-device video portrait relighting without the need for complex 3D reconstruction.
The researchers demonstrate that EdgeRelight360 can produce high-quality results in real-time on mobile devices, making it suitable for applications like video conferencing and content creation.

Plain English Explanation

EdgeRelight360 is a system that can change the lighting on a person in a video in real time on your phone or tablet. It works by using an artificial intelligence (AI) model that generates a high-dynamic-range (HDR) 360-degree image of the surrounding light, often called an environment map, from a text description of the desired lighting. That map is then used to relight the portrait video.

For example, if you wanted to make a video of yourself look like it was taken in bright, warm sunlight, you could type a description like "bright, warm sunlight" and EdgeRelight360 would automatically adjust the lighting in the video to match that description. This allows you to change the lighting of your video without having to set up complex 3D lighting equipment.

The key advantage of EdgeRelight360 is that it can do this lighting adjustment in real-time, directly on your mobile device, without needing to send the video to a powerful computer first. This makes it useful for applications like video calls or live-streamed content creation, where you want to be able to quickly and easily adjust the lighting to get the perfect look.

Technical Explanation

According to the paper's abstract, EdgeRelight360 uses a diffusion-based text-to-360-degree image generator that operates in the HDR domain and takes advantage of the HDR10 standard. The generator takes a text description of the desired lighting as input and outputs a 360-degree HDR image (HDRI) map. The on-device relighting stage then uses that map as the target illumination for the portrait video.
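The paper's abstract says the generator works in the HDR domain and takes advantage of the HDR10 standard. HDR10 encodes luminance with the SMPTE ST 2084 perceptual quantizer (PQ) curve, which maps absolute luminance up to 10,000 cd/m^2 into a bounded [0, 1] signal. The sketch below shows only that standard transfer function in plain Python. How EdgeRelight360 actually uses HDR10 is not detailed in this summary, and the function names here are illustrative assumptions.

```python
# SMPTE ST 2084 (PQ) constants, as used by HDR10.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0  # PQ is defined up to 10,000 cd/m^2


def pq_encode(nits: float) -> float:
    """Map absolute luminance in cd/m^2 to a PQ signal in [0, 1]."""
    y = min(max(nits, 0.0), PEAK_NITS) / PEAK_NITS
    yp = y ** M1
    return ((C1 + C2 * yp) / (1.0 + C3 * yp)) ** M2


def pq_decode(signal: float) -> float:
    """Invert pq_encode: map a PQ signal in [0, 1] back to cd/m^2."""
    e = min(max(signal, 0.0), 1.0) ** (1.0 / M2)
    y = (max(e - C1, 0.0) / (C2 - C3 * e)) ** (1.0 / M1)
    return y * PEAK_NITS


def to_10bit(signal: float) -> int:
    """Quantize a [0, 1] signal to a full-range 10-bit code value."""
    return round(signal * 1023)


if __name__ == "__main__":
    # Hypothetical luminance levels: dim room, typical display white, bright sky.
    for nits in (1.0, 100.0, 4000.0):
        s = pq_encode(nits)
        print(nits, round(s, 4), to_10bit(s))
```

A bounded encoding like this is convenient for image generators, which typically work with values in a fixed range, while the decode step recovers linear light for lighting calculations.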

The researchers trained the model on HDR 360-degree images with associated text descriptions, allowing it to learn the relationship between lighting attributes and the resulting environment map. During inference, the model generates a new HDR 360-degree map that matches the input text prompt.
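To give some intuition for how a 360-degree HDR map acts as a light source, the sketch below computes the diffuse light arriving at a surface from an equirectangular environment map. This is classical image-based lighting, not the paper's relighting method. The function names, the tiny list-of-rows image format, and the Lambertian surface model are all assumptions made for illustration.

```python
import math


def direction(row, col, height, width):
    """Unit direction for the center of an equirectangular pixel (y is up)."""
    theta = math.pi * (row + 0.5) / height      # polar angle from +y, 0..pi
    phi = 2.0 * math.pi * (col + 0.5) / width   # azimuth, 0..2*pi
    s = math.sin(theta)
    return (s * math.cos(phi), math.cos(theta), s * math.sin(phi))


def diffuse_irradiance(env, normal):
    """Cosine-weighted sum of radiance over the hemisphere around `normal`.

    `env` is a list of rows; each pixel is a linear-light (r, g, b) tuple.
    """
    height, width = len(env), len(env[0])
    total = [0.0, 0.0, 0.0]
    for row in range(height):
        theta = math.pi * (row + 0.5) / height
        # A pixel's solid angle shrinks toward the poles by sin(theta).
        d_omega = math.sin(theta) * (math.pi / height) * (2.0 * math.pi / width)
        for col in range(width):
            d = direction(row, col, height, width)
            cos_term = max(0.0, sum(a * b for a, b in zip(d, normal)))
            for c in range(3):
                total[c] += env[row][col][c] * cos_term * d_omega
    return total


def shade(albedo, env, normal):
    """Lambertian shading: outgoing radiance = albedo * irradiance / pi."""
    irradiance = diffuse_irradiance(env, normal)
    return tuple(a * e / math.pi for a, e in zip(albedo, irradiance))


if __name__ == "__main__":
    # Hypothetical map: bright upper hemisphere, dim lower hemisphere.
    sky = [[(5.0, 5.0, 5.0) if r < 16 else (0.5, 0.5, 0.5)] * 64 for r in range(32)]
    print(shade((0.8, 0.6, 0.5), sky, (0.0, 1.0, 0.0)))   # surface facing up
    print(shade((0.8, 0.6, 0.5), sky, (0.0, -1.0, 0.0)))  # surface facing down
```

Because HDR maps store values well above 1.0, a small bright region such as the sun can dominate the sum, which is why relighting needs HDR rather than ordinary 8-bit panoramas.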

To enable real-time performance on mobile devices, the researchers developed a lightweight neural network architecture and applied optimization and quantization techniques. This allows EdgeRelight360 to run efficiently on mobile hardware, supporting video portrait relighting at 30 frames per second.
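The summary mentions quantization without giving details. As a rough illustration of the general idea, the sketch below applies symmetric per-tensor int8 quantization to a list of weights. This is one common scheme, chosen here as an assumption. It is not taken from the paper, and the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: floats -> int8 codes plus one scale."""
    max_abs = max(abs(w) for w in weights)
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 0.3]  # made-up example values
    codes, scale = quantize_int8(weights)
    restored = dequantize(codes, scale)
    worst = max(abs(a - b) for a, b in zip(weights, restored))
    print(codes, scale, worst)
```

Storing each weight in 8 bits instead of 32 cuts memory by a factor of four, and the rounding error per weight is bounded by half the scale. At 30 frames per second the whole pipeline has roughly 33 ms per frame, which is why this kind of size and compute reduction matters on mobile hardware.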

Critical Analysis

The paper presents a promising approach for real-time video portrait relighting using text-conditional HDR image generation. The researchers demonstrate compelling results and a viable system for practical applications like video conferencing and content creation.

However, the paper does not discuss potential limitations or biases in the model, such as how it may handle diverse skin tones, lighting conditions, or cultural contexts. Further research is needed to assess the robustness and fairness of the system.

Additionally, while the real-time performance on mobile devices is a key strength, the quality of the generated HDR images could potentially be improved by leveraging more powerful hardware or exploring alternative neural network architectures.

Overall, EdgeRelight360 represents an interesting advance in the field of computational photography and augmented reality, with the potential to significantly enhance the user experience for various video-based applications.

Conclusion

The EdgeRelight360 system presents a novel approach to real-time video portrait relighting using text-conditional HDR image generation. By leveraging a lightweight neural network architecture, the system can run efficiently on mobile devices, enabling users to easily adjust the lighting in their videos with natural language prompts.

This technology has the potential to transform various video-based applications, from video conferencing to content creation, by giving users more control over the visual aesthetic of their content. As the researchers continue to refine and expand the capabilities of EdgeRelight360, it could become an increasingly valuable tool for enhancing the user experience and enabling more expressive and visually compelling video communications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting

Min-Hui Lin, Mahesh Reddy, Guillaume Berger, Michel Sarkis, Fatih Porikli, Ning Bi

In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method proposes a diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in portrait video relighting task. Unlike the previous relighting frameworks, our proposed system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing ensures both privacy and guarantees low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach paves the way for new possibilities in real-time video applications, including video conferencing, gaming, and augmented reality, by allowing dynamic, text-based control of lighting conditions.

4/16/2024

Lite2Relight: 3D-aware Single Image Portrait Relighting

Pramod Rao, Gereon Fox, Abhimitra Meka, Mallikarjun B R, Fangneng Zhan, Tim Weyrich, Bernd Bickel, Hanspeter Pfister, Wojciech Matusik, Mohamed Elgharib, Christian Theobalt

Achieving photorealistic 3D view synthesis and relighting of human portraits is pivotal for advancing AR/VR applications. Existing methodologies in portrait relighting demonstrate substantial limitations in terms of generalization and 3D consistency, coupled with inaccuracies in physically realistic lighting and identity preservation. Furthermore, personalization from a single view is difficult to achieve and often requires multiview images during the testing phase or involves slow optimization processes. This paper introduces Lite2Relight, a novel technique that can predict 3D consistent head poses of portraits while performing physically plausible light editing at interactive speed. Our method uniquely extends the generative capabilities and efficient volumetric representation of EG3D, leveraging a lightstage dataset to implicitly disentangle face reflectance and perform relighting under target HDRI environment maps. By utilizing a pre-trained geometry-aware encoder and a feature alignment module, we map input images into a relightable 3D space, enhancing them with a strong face geometry and reflectance prior. Through extensive quantitative and qualitative evaluations, we show that our method outperforms the state-of-the-art methods in terms of efficacy, photorealism, and practical application. This includes producing 3D-consistent results of the full head, including hair, eyes, and expressions. Lite2Relight paves the way for large-scale adoption of photorealistic portrait editing in various domains, offering a robust, interactive solution to a previously constrained problem. Project page: https://vcai.mpi-inf.mpg.de/projects/Lite2Relight/

7/16/2024

Personalized Video Relighting With an At-Home Light Stage

Jun Myeong Choi, Max Christman, Roni Sengupta

In this paper, we develop a personalized video relighting algorithm that produces high-quality and temporally consistent relit videos under any pose, expression, and lighting condition in real-time. Existing relighting algorithms typically rely either on publicly available synthetic data, which yields poor relighting results, or on actual light stage data which is difficult to acquire. We show that by just capturing recordings of a user watching YouTube videos on a monitor we can train a personalized algorithm capable of performing high-quality relighting under any condition. Our key contribution is a novel image-based neural relighting architecture that effectively separates the intrinsic appearance features - the geometry and reflectance of the face - from the source lighting and then combines them with the target lighting to generate a relit image. This neural architecture enables smoothing of intrinsic appearance features leading to temporally stable video relighting. Both qualitative and quantitative evaluations show that our architecture improves portrait image relighting quality and temporal consistency over state-of-the-art approaches on both casually captured 'Light Stage at Your Desk' (LSYD) and light-stage-captured 'One Light At a Time' (OLAT) datasets.

9/30/2024

Baking Relightable NeRF for Real-time Direct/Indirect Illumination Rendering

Euntae Choi, Vincent Carpentier, Seunghun Shin, Sungjoo Yoo

Relighting, which synthesizes a novel view under a given lighting condition (unseen at training time), is a must-have feature for an immersive photo-realistic experience. However, real-time relighting is challenging due to the high computation cost of the rendering equation, which requires shape and material decomposition and a visibility test to model shadows. Additionally, indirect illumination requires evaluating the rendering equation at each secondary surface point (where reflection occurs), which makes real-time relighting even more challenging. We propose a novel method that executes a CNN renderer to compute primary surface points and rendering parameters, required for direct illumination. We also present a lightweight hash grid-based renderer, for indirect illumination, which is recursively executed to perform the secondary ray tracing process. Both renderers are trained by distillation from a pre-trained teacher model and provide real-time physically-based rendering under unseen lighting conditions at a negligible loss of rendering quality.

9/17/2024