Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Read original: arXiv:2406.09389 - Published 6/14/2024 by Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

Overview

This paper introduces Sagiri, a two-stage deep learning method for enhancing low dynamic range (LDR) images of high dynamic range (HDR) scenes.
The first stage maps color and brightness to an appropriate range while keeping existing details; the second stage uses a generative diffusion prior to restore content lost in over- and underexposed regions.
The generative refinement stage can also be used as a plug-and-play module on top of existing LDR enhancement models.

Plain English Explanation

Sagiri is a new technique that takes a poorly exposed photo, one with blown-out highlights or crushed shadows, and turns it into a better-looking image. It first corrects color and brightness, then uses a kind of artificial intelligence called a "diffusion model" to fill in detail that the camera never recorded.

Diffusion models are generative models that have been applied to a range of image tasks, including generating HDR images and reconstructing HDR video. In Sagiri, the diffusion model supplies plausible content for the brightest and darkest regions of the image, where the original pixel values are close to 0 or 255, while the details that survived capture are kept.

This can be useful for enhancing smartphone photos or fixing up old, badly exposed images. Because the missing content is generated by a learned model, Sagiri can restore regions that brightness and color correction alone cannot.

Technical Explanation

The core idea behind Sagiri is to leverage the capabilities of diffusion models for the task of LDR image enhancement. Diffusion models have shown great potential in semantic-aware inverse tone mapping and perceptual optimization of HDR images, making them a natural choice for this application.

Sagiri works in two stages. The first stage maps the color and brightness of the LDR input to an appropriate range while keeping the details that are already present. The second stage uses a diffusion prior to generate the content that was lost during capture in the dynamic range extremes, which the authors define as regions with pixel values close to 0 or 255.

The key innovation in Sagiri is this generative refinement stage, which steers the output toward plausible content for regions that hold little or no recoverable signal. Traditional LDR enhancement methods, by contrast, focus primarily on color mapping: they expand the color range and adjust brightness but fail to restore content in those extremes. The refinement stage can also be attached to existing LDR enhancement models as a plug-and-play module.
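
To make the notion of "dynamic range extremes" concrete, here is a minimal sketch that marks pixels close to 0 or 255 and swaps in generated content only there. The thresholds, the function names `extreme_mask` and `composite`, and the mask-based blend itself are assumptions for illustration; the abstract does not say how Sagiri combines its two stages, and the `generated` array is a stand-in for diffusion output.

```python
from typing import List

Image = List[List[int]]  # grayscale 8-bit image as rows of ints in [0, 255]


def extreme_mask(img: Image, low: int = 5, high: int = 250) -> List[List[bool]]:
    """Mark pixels near 0 or 255. The thresholds are illustrative assumptions."""
    return [[p <= low or p >= high for p in row] for row in img]


def composite(stage1: Image, generated: Image, mask: List[List[bool]]) -> Image:
    """Keep stage-one pixels where detail exists; use generated pixels in the extremes."""
    return [
        [g if m else s for s, g, m in zip(srow, grow, mrow)]
        for srow, grow, mrow in zip(stage1, generated, mask)
    ]


img = [[0, 12, 128], [255, 251, 90]]
generated = [[40, 40, 40], [200, 200, 200]]  # hypothetical stand-in for diffusion output
mask = extreme_mask(img)
out = composite(img, generated, mask)
print(mask)
print(out)
```

Restricting generated content to the masked regions is one simple way to keep the existing details intact, which is the property the abstract asks of the first stage.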

Critical Analysis

The authors of the Sagiri paper make a compelling case for the usefulness of diffusion models in LDR image enhancement. By pairing a color and brightness mapping stage with a generative refinement stage, they report markedly improved quality and detail in LDR images, including in over- and underexposed regions.

One potential limitation of the Sagiri approach is that it may struggle with particularly challenging or low-quality input images, where the diffusion model may have difficulty generating a faithful HDR representation. Additionally, the computational complexity of the diffusion process could be a bottleneck for real-time applications or resource-constrained devices.

It would also be interesting to see how Sagiri compares to other diffusion-based approaches for blind image restoration in terms of both performance and generalization capabilities. Further research in this direction could help to better understand the strengths and weaknesses of the diffusion-based paradigm for image enhancement tasks.

Conclusion

The Sagiri paper presents a novel two-stage approach to LDR image enhancement using a generative diffusion prior. The authors show that a diffusion model can regenerate content lost in the brightest and darkest regions of an image, with potential applications in areas such as computational photography, image editing, and visual media enhancement.

While the method shows promising results, there are still some open challenges and areas for further exploration. Nonetheless, the Sagiri technique represents an important step forward in the field of image enhancement and demonstrates the potential of diffusion-based approaches to tackle complex visual tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas. Traditional LDR image enhancement methods primarily focus on color mapping, which enhances the visual representation by expanding the image's color range and adjusting the brightness. However, these approaches fail to effectively restore content in dynamic range extremes, which are regions with pixel values close to 0 or 255. To address the full scope of challenges in HDR imaging and surpass the limitations of current models, we propose a novel two-stage approach. The first stage maps the color and brightness to an appropriate range while keeping the existing details, and the second stage utilizes a diffusion prior to generate content in dynamic range extremes lost during capture. This generative refinement module can also be used as a plug-and-play module to enhance and complement existing LDR enhancement models. The proposed method markedly improves the quality and details of LDR images, demonstrating superior performance through rigorous experimental validation. The project page is at https://sagiri0208.github.io

6/14/2024

Exposure Diffusion: HDR Image Generation by Consistent LDR denoising

Mojtaba Bemana, Thomas Leimkuhler, Karol Myszkowski, Hans-Peter Seidel, Tobias Ritschel

We demonstrate generating high-dynamic range (HDR) images using the concerted action of multiple black-box, pre-trained low-dynamic range (LDR) image diffusion models. Common diffusion models are not HDR as, first, there is no sufficiently large HDR image dataset available to re-train them, and second, even if it was, re-training such models is impossible for most compute budgets. Instead, we seek inspiration from the HDR image capture literature that traditionally fuses sets of LDR images, called brackets, to produce a single HDR image. We operate multiple denoising processes to generate multiple LDR brackets that together form a valid HDR result. To this end, we introduce an exposure consistency term into the diffusion process to couple the brackets such that they agree across the exposure range they share. We demonstrate HDR versions of state-of-the-art unconditional and conditional as well as restoration-type (LDR2HDR) generative modeling.

5/24/2024
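
The bracket fusion that the Exposure Diffusion abstract draws inspiration from can be sketched for a single pixel as follows. This is the classical weighted merge from the HDR capture literature, not the paper's diffusion-based method; the triangle weighting, the function names, and the assumption of already linear pixel values in [0, 1] are illustrative choices.

```python
from typing import Sequence


def hat_weight(z: float) -> float:
    """Triangle weight: trust mid-tones, distrust values near 0 or 1."""
    return 1.0 - abs(2.0 * z - 1.0)


def merge_pixel(values: Sequence[float], exposures: Sequence[float]) -> float:
    """Fuse one pixel from several linear LDR brackets into a radiance estimate.

    values: pixel value in [0, 1] from each bracket (assumed linear).
    exposures: relative exposure time of each bracket.
    """
    weights = [hat_weight(z) for z in values]
    total = sum(weights)
    if total == 0.0:
        # Every bracket is clipped: fall back to an unweighted average.
        return sum(z / t for z, t in zip(values, exposures)) / len(values)
    return sum(w * z / t for w, z, t in zip(weights, values, exposures)) / total


# A scene radiance of 0.8 captured at three exposures; the longest one clips at 1.0.
exposures = [0.5, 1.0, 2.0]
values = [0.4, 0.8, 1.0]
radiance = merge_pixel(values, exposures)
print(radiance)
```

The clipped bracket receives zero weight, so the estimate comes from the two brackets that still carry signal.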

Diffusion-Promoted HDR Video Reconstruction

Yuanshen Guan, Ruikang Xu, Mingde Yao, Ruisheng Gao, Lizhi Wang, Zhiwei Xiong

High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works solely rely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed HDR-V-Diff, which incorporates a diffusion model to capture the HDR distribution. As such, HDR-V-Diff can reconstruct HDR videos with realistic details while alleviating ghosting artifacts. However, the direct introduction of video diffusion models would impose massive computational burden. Instead, to alleviate this burden, we first propose an HDR Latent Diffusion Model (HDR-LDM) to learn the distribution prior of single HDR frames. Specifically, HDR-LDM incorporates a tonemapping strategy to compress HDR frames into the latent space and a novel exposure embedding to aggregate the exposure information into the diffusion process. We then propose a Temporal-Consistent Alignment Module (TCAM) to learn the temporal information as a complement for HDR-LDM, which conducts coarse-to-fine feature alignment at different scales among video frames. Finally, we design a Zero-Init Cross-Attention (ZiCA) mechanism to effectively integrate the learned distribution prior and temporal information for generating HDR frames. Extensive experiments validate that HDR-V-Diff achieves state-of-the-art results on several representative datasets.

6/13/2024

Semantic Aware Diffusion Inverse Tone Mapping

Abhishek Goswami, Aru Ranjan Singh, Francesco Banterle, Kurt Debattista, Thomas Bashford-Rogers

The range of real-world scene luminance is larger than the capture capability of many digital camera sensors which leads to details being lost in captured images, most typically in bright regions. Inverse tone mapping attempts to boost these captured Standard Dynamic Range (SDR) images back to High Dynamic Range (HDR) by creating a mapping that linearizes the well exposed values from the SDR image, and provides a luminance boost to the clipped content. However, in most cases, the details in the clipped regions cannot be recovered or estimated. In this paper, we present a novel inverse tone mapping approach for mapping SDR images to HDR that generates lost details in clipped regions through a semantic-aware diffusion based inpainting approach. Our method proposes two major contributions - first, we propose to use a semantic graph to guide SDR diffusion based inpainting in masked regions in a saturated image. Second, drawing inspiration from traditional HDR imaging and bracketing methods, we propose a principled formulation to lift the SDR inpainted regions to HDR that is compatible with generative inpainting methods. Results show that our method demonstrates superior performance across different datasets on objective metrics, and subjective experiments show that the proposed method matches (and in most cases outperforms) state-of-art inverse tone mapping operators in terms of objective metrics and outperforms them for visual fidelity.

5/27/2024
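
The basic inverse tone mapping described in the abstract above (linearize the well-exposed values, boost the clipped ones) can be sketched as below. The gamma of 2.2, the clipping threshold, and the boost factor are assumptions chosen for illustration, not values from the paper.

```python
def linearize(v: int, gamma: float = 2.2) -> float:
    """Map an 8-bit SDR value to linear light, assuming a simple power-law encoding."""
    return (v / 255.0) ** gamma


def inverse_tone_map(v: int, clip_at: int = 250, boost: float = 4.0) -> float:
    """Linearize a pixel and apply a luminance boost if it looks clipped.

    clip_at and boost are hypothetical parameters for this sketch.
    """
    lin = linearize(v)
    return lin * boost if v >= clip_at else lin


for v in (0, 128, 249, 255):
    print(v, inverse_tone_map(v))
```

A uniform boost like this brightens clipped regions but creates no new detail in them, which is the gap the paper's diffusion-based inpainting is meant to fill.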