Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network

Read original: arXiv:2405.00244 - Published 5/2/2024 by Yong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou

Overview

This paper presents a large-scale benchmark dataset for high dynamic range (HDR) video reconstruction and a two-stage alignment network for the task.
The dataset, called Real-HDRV, contains 500 LDR-HDR video pairs (about 28,000 LDR frames and 4,000 HDR labels) covering daytime, nighttime, indoor, and outdoor scenes, which gives a broad testbed for evaluating HDR video reconstruction methods.
The proposed two-stage alignment network performs global alignment first and local alignment second to handle misalignment between frames captured with alternating exposures.

Plain English Explanation

In this paper, the researchers have developed a new dataset and a network to help improve the quality of HDR video reconstruction. HDR video refers to video with a wider range of brightness and color than standard video, which can provide a more realistic and vivid viewing experience.

The researchers created the Real-HDRV dataset, which contains 500 pairs of low dynamic range (LDR) and HDR video sequences recorded in real-world settings. This dataset provides a benchmark for evaluating different methods for reconstructing HDR video from LDR footage captured with alternating exposures.

To address the challenge of HDR video reconstruction, the researchers also proposed a two-stage alignment network. This network first performs a global alignment to correct for large-scale shifts between frames (for example, from camera movement), and then a local alignment to address smaller-scale misalignment. Doing the global step first makes the local step easier, so the network can handle the various types of misalignment that occur in real-world recordings.

Technical Explanation

The paper presents Real-HDRV, a large-scale real-world dataset for evaluating HDR video reconstruction methods. According to the abstract, the dataset contains 500 LDR-HDR video pairs, comprising about 28,000 LDR frames and 4,000 HDR labels, and covers daytime, nighttime, indoor, and outdoor scenes with diverse motion patterns. The authors describe it as, to their knowledge, the largest real-world HDR video reconstruction dataset. It lets researchers assess reconstruction algorithms under realistic conditions rather than on the synthetic datasets that most existing methods are trained on.
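The paper's abstract describes the input as LDR sequences with alternating exposures. As a rough illustration of why exposure matters, the sketch below maps LDR frames into a shared linear radiance domain before they are compared. The gamma value of 2.2, the clipping thresholds, and the function names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def ldr_to_linear(ldr, exposure_time, gamma=2.2):
    """Map an LDR frame in [0, 1] to linear radiance.

    Assumes a simple gamma camera curve (hypothetical; real cameras differ).
    """
    ldr = np.asarray(ldr, dtype=np.float64)
    return np.power(ldr, gamma) / exposure_time

def well_exposed_mask(ldr, low=0.05, high=0.95):
    """Mark pixels that are neither too dark nor saturated (thresholds assumed)."""
    ldr = np.asarray(ldr)
    return (ldr > low) & (ldr < high)

# Three hypothetical scene radiance values seen at two alternating exposures.
radiance = np.array([0.02, 0.2, 0.8])
short = np.clip(radiance * 1.0, 0.0, 1.0) ** (1 / 2.2)  # exposure time 1
long_ = np.clip(radiance * 4.0, 0.0, 1.0) ** (1 / 2.2)  # exposure time 4

# Unsaturated pixels agree after linearization; the brightest pixel is
# clipped in the long exposure and has to come from the short one.
print(ldr_to_linear(short, 1.0))
print(ldr_to_linear(long_, 4.0))
print(well_exposed_mask(long_))
```

The saturated pixel in the long exposure is the kind of region where information from a neighboring frame is needed, which is why the frames must be aligned first.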

To address the challenges of HDR video reconstruction, the authors propose an end-to-end network that performs alignment sequentially in two stages. The first stage performs global alignment using adaptively estimated global offsets, which corrects large-scale displacement (such as that caused by camera motion) and reduces the difficulty of the subsequent alignment. The second stage performs local alignment implicitly, in a coarse-to-fine manner at the feature level, using adaptive separable convolution; this addresses smaller-scale misalignment such as that caused by object motion. By combining these two steps, the network can handle the diverse types of misalignment encountered in real-world recordings.
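The two stages can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' network: the global offset is estimated here with classical phase correlation (a stand-in for the learned, adaptively estimated offsets), and the per-pixel separable kernels are set by hand, whereas in the paper they would be predicted by the network and applied to features rather than pixels. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def estimate_global_shift(ref, moving):
    """Integer (dy, dx) roll that aligns `moving` to `ref` (phase correlation)."""
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(moving))
    cross /= np.abs(cross) + 1e-12           # keep phase only
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = ref.shape
    # Peaks past the midpoint correspond to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)

def adaptive_separable_conv(img, kv, kh):
    """Filter each pixel with its own K x K kernel outer(kv[y, x], kh[y, x]).

    img: (H, W); kv, kh: (H, W, K) vertical and horizontal 1D kernels, K odd.
    """
    k = kv.shape[-1]
    padded = np.pad(img, k // 2, mode="edge")
    patches = sliding_window_view(padded, (k, k))   # (H, W, K, K)
    return np.einsum("yxij,yxi,yxj->yx", patches, kv, kh)

rng = np.random.default_rng(0)
ref = rng.random((32, 32))
moving = np.roll(ref, shift=(-3, 5), axis=(0, 1))   # simulated global motion

# Stage 1: global alignment with a single offset for the whole frame.
shift = estimate_global_shift(ref, moving)
coarse = np.roll(moving, shift=shift, axis=(0, 1))

# Stage 2: local alignment. Identity kernels here, since nothing is left to fix.
K = 5
kv = np.zeros((32, 32, K))
kv[..., K // 2] = 1.0
kh = kv.copy()
refined = adaptive_separable_conv(coarse, kv, kh)

print(shift, np.allclose(refined, ref))
```

Two 1D kernels of length K per pixel stand in for a full K x K kernel, so the number of values to predict per pixel drops from K*K to 2K. Moving the nonzero tap away from the center makes a kernel sample a neighboring pixel, which is how such a filter can compensate for small local motion without an explicit warp.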

The authors evaluate their two-stage alignment network on the Real-HDRV dataset and report that it outperforms previous state-of-the-art HDR video reconstruction methods. They also report that models trained on Real-HDRV perform better on real scenes than models trained on synthetic datasets.
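Comparisons of this kind are usually reported with objective image quality metrics. This summary does not name the ones used; a metric commonly seen in HDR reconstruction work is PSNR computed after μ-law tonemapping, sketched below. Whether the paper uses exactly this metric, and the value μ = 5000, are assumptions.

```python
import numpy as np

def mu_law_tonemap(hdr, mu=5000.0):
    """Compress HDR values in [0, 1] with the mu-law curve."""
    hdr = np.clip(np.asarray(hdr, dtype=np.float64), 0.0, 1.0)
    return np.log1p(mu * hdr) / np.log1p(mu)

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB; infinite for identical inputs."""
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def psnr_mu(pred_hdr, target_hdr, mu=5000.0):
    """PSNR measured in the tonemapped domain."""
    return psnr(mu_law_tonemap(pred_hdr, mu), mu_law_tonemap(target_hdr, mu))

# Hypothetical prediction: ground truth plus a small amount of noise.
rng = np.random.default_rng(0)
target = rng.random((16, 16))
pred = np.clip(target + 0.01 * rng.standard_normal((16, 16)), 0.0, 1.0)
print(psnr(pred, target), psnr_mu(pred, target))
```

Tonemapping before measuring error gives dark regions more weight than a PSNR computed directly on linear radiance would.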

Critical Analysis

The Real-HDRV dataset presented in this paper is a valuable contribution to HDR video research, as it provides a more realistic and comprehensive testbed for evaluating HDR video reconstruction algorithms. However, the dataset is limited to a specific set of real-world scenes, and it may not capture the full diversity of scenarios that HDR video reconstruction methods encounter in practical applications.

The two-stage alignment network proposed in the paper addresses a crucial challenge in HDR video reconstruction, but its performance may be dependent on the quality and consistency of the input video frames. In real-world scenarios, video footage can be subject to various artifacts, such as motion blur, compression artifacts, and sensor noise, which could potentially degrade the network's alignment performance.

Furthermore, the authors do not provide a detailed analysis of the computational complexity and runtime of their proposed network, which is an important consideration for practical deployment, especially in applications that require real-time processing.

Conclusion

This paper advances HDR video reconstruction by introducing the large-scale Real-HDRV dataset and a two-stage alignment network that handles global and local misalignment in sequence. The reported results are promising and could pave the way for further improvements in HDR video reconstruction and in other applications that require high-quality HDR imaging.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Towards Real-World HDR Video Reconstruction: A Large-Scale Benchmark Dataset and A Two-Stage Alignment Network

Yong Shu, Liquan Shen, Xiangyu Hu, Mengyao Li, Zihao Zhou

As an important and practical way to obtain high dynamic range (HDR) video, HDR video reconstruction from sequences with alternating exposures is still less explored, mainly due to the lack of large-scale real-world datasets. Existing methods are mostly trained on synthetic datasets, which perform poorly in real scenes. In this work, to facilitate the development of real-world HDR video reconstruction, we present Real-HDRV, a large-scale real-world benchmark dataset for HDR video reconstruction, featuring various scenes, diverse motion patterns, and high-quality labels. Specifically, our dataset contains 500 LDRs-HDRs video pairs, comprising about 28,000 LDR frames and 4,000 HDR labels, covering daytime, nighttime, indoor, and outdoor scenes. To our best knowledge, our dataset is the largest real-world HDR video reconstruction dataset. Correspondingly, we propose an end-to-end network for HDR video reconstruction, where a novel two-stage strategy is designed to perform alignment sequentially. Specifically, the first stage performs global alignment with the adaptively estimated global offsets, reducing the difficulty of subsequent alignment. The second stage implicitly performs local alignment in a coarse-to-fine manner at the feature level using the adaptive separable convolution. Extensive experiments demonstrate that: (1) models trained on our dataset can achieve better performance on real scenes than those trained on synthetic datasets; (2) our method outperforms previous state-of-the-art methods. Our dataset is available at https://github.com/yungsyu99/Real-HDRV.

5/2/2024

Efficient HDR Reconstruction from Real-World Raw Images

Qirui Yang, Yihao Liu, Qihua Chen, Huanjing Yue, Kun Li, Jingyu Yang

The widespread usage of high-definition screens on edge devices stimulates a strong demand for efficient high dynamic range (HDR) algorithms. However, many existing HDR methods either deliver unsatisfactory results or consume too much computational and memory resources, hindering their application to high-resolution images (usually with more than 12 megapixels) in practice. In addition, existing HDR dataset collection methods often are labor-intensive. In this work, in a new aspect, we discover an excellent opportunity for HDR reconstructing directly from raw images and investigating novel neural network structures that benefit the deployment of mobile devices. Our key insights are threefold: (1) we develop a lightweight-efficient HDR model, RepUNet, using the structural re-parameterization technique to achieve fast and robust HDR; (2) we design a new computational raw HDR data formation pipeline and construct a real-world raw HDR dataset, RealRaw-HDR; (3) we propose a plug-and-play motion alignment loss to mitigate motion ghosting under limited bandwidth conditions. Our model contains less than 830K parameters and takes less than 3 ms to process an image of 4K resolution using one RTX 3090 GPU. While being highly efficient, our model also outperforms the state-of-the-art HDR methods in terms of PSNR, SSIM, and a color difference metric.

6/6/2024

HDR Imaging for Dynamic Scenes with Events

Li Xiaopeng, Zeng Zhaoyuan, Fan Cien, Zhao Chen, Deng Lei, Yu Lei

High dynamic range imaging (HDRI) for real-world dynamic scenes is challenging because moving objects may lead to hybrid degradation of low dynamic range and motion blur. Existing event-based approaches only focus on a separate task, while cascading HDRI and motion deblurring would lead to sub-optimal solutions, and unavailable ground-truth sharp HDR images aggravate the predicament. To address these challenges, we propose an Event-based HDRI framework within a Self-supervised learning paradigm, i.e., Self-EHDRI, which generalizes HDRI performance in real-world dynamic scenarios. Specifically, a self-supervised learning strategy is carried out by learning cross-domain conversions from blurry LDR images to sharp LDR images, which enables sharp HDR images to be accessible in the intermediate process even though ground-truth sharp HDR images are missing. Then, we formulate the event-based HDRI and motion deblurring model and conduct a unified network to recover the intermediate sharp HDR results, where both the high dynamic range and high temporal resolution of events are leveraged simultaneously for compensation. We construct large-scale synthetic and real-world datasets to evaluate the effectiveness of our method. Comprehensive experiments demonstrate that the proposed Self-EHDRI outperforms state-of-the-art approaches by a large margin. The codes, datasets, and results are available at https://lxp-whu.github.io/Self-EHDRI.

4/5/2024

Diffusion-Promoted HDR Video Reconstruction

Yuanshen Guan, Ruikang Xu, Mingde Yao, Ruisheng Gao, Lizhi Wang, Zhiwei Xiong

High dynamic range (HDR) video reconstruction aims to generate HDR videos from low dynamic range (LDR) frames captured with alternating exposures. Most existing works solely rely on the regression-based paradigm, leading to adverse effects such as ghosting artifacts and missing details in saturated regions. In this paper, we propose a diffusion-promoted method for HDR video reconstruction, termed HDR-V-Diff, which incorporates a diffusion model to capture the HDR distribution. As such, HDR-V-Diff can reconstruct HDR videos with realistic details while alleviating ghosting artifacts. However, the direct introduction of video diffusion models would impose massive computational burden. Instead, to alleviate this burden, we first propose an HDR Latent Diffusion Model (HDR-LDM) to learn the distribution prior of single HDR frames. Specifically, HDR-LDM incorporates a tonemapping strategy to compress HDR frames into the latent space and a novel exposure embedding to aggregate the exposure information into the diffusion process. We then propose a Temporal-Consistent Alignment Module (TCAM) to learn the temporal information as a complement for HDR-LDM, which conducts coarse-to-fine feature alignment at different scales among video frames. Finally, we design a Zero-Init Cross-Attention (ZiCA) mechanism to effectively integrate the learned distribution prior and temporal information for generating HDR frames. Extensive experiments validate that HDR-V-Diff achieves state-of-the-art results on several representative datasets.

6/13/2024