Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos

arXiv:2310.16139, published 4/26/2024 by Caixin Wang, Jie Zhang, Matthew A. Wilson, Ralph Etienne-Cummings

Overview

This paper addresses the challenge of capturing high-speed, high dynamic range (HDR) video of dynamic scenes with fast motion and varying light conditions.
Existing methods acquire multi-exposure frames, which sacrifices frame rate, and motion that is misaligned across those frames causes artifacts during HDR fusion.
The authors propose a novel approach that samples individual pixels at different exposure times and phases, enabling high-speed HDR video capture.
They implement this on a monochrome pixel-wise programmable image sensor and use deep neural networks with end-to-end learned weights to transform the pixel-wise outputs into HDR video with minimal motion blur.
The authors demonstrate aliasing-free HDR video capture at 1000 frames per second, resolving fast motion under low-light conditions and against bright backgrounds.

Plain English Explanation

Taking high-quality video of fast-moving objects in changing lighting is hard for regular cameras. They often can't capture very bright and very dark areas in the same shot; covering both at once is called high dynamic range (HDR). Taking several exposures one after another to cover that range slows the camera down, and anything that moves between shots shows up as blur or ghosting.

Instead of taking full frames at different exposures, this new approach samples individual pixels at different exposure times. It uses a special image sensor that can program each pixel to collect light for a different amount of time, starting at a different moment. By mixing pixels with different exposure levels, it can cover the full dynamic range with much less motion blur.

The researchers then use a deep learning model to take these mixed-exposure pixel readings and reconstruct a high-quality HDR video. This lets them capture fast-moving objects in very bright or very dark scenes, something regular cameras have a hard time with. Combining the flexible pixel sampling with powerful AI processing makes this system much more adaptable to challenging real-world conditions.

Technical Explanation

The key innovation in this paper is a novel HDR video capture approach that samples individual pixels at varying exposure times and phase offsets, rather than taking full multi-exposure frames. This is implemented on a monochrome programmable image sensor that can control the exposure of each pixel independently.
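To make the sampling idea concrete, the sketch below simulates one pixel-wise coded readout from a stack of high-speed sub-frames. It is a minimal illustration under stated assumptions, not the authors' implementation: the 2x2 tile, its (exposure length, phase offset) pairs, the linear noise-free pixel response, the saturation level `full_well`, and the function name `coded_exposure` are all hypothetical. Each pixel sums the scene irradiance over its own window [phase, phase + exposure) and clips at saturation, so short-exposure pixels keep detail in bright regions while long-exposure pixels collect more light in dark ones.

```python
import numpy as np

# Hypothetical 2x2 tile of (exposure length, phase offset) pairs, in sub-frame
# units. The actual Pix2HDR sampling pattern is not specified in this summary.
TILE = [[(1, 0), (2, 1)],
        [(4, 0), (8, 0)]]


def coded_exposure(scene, tile=TILE, full_well=1.0):
    """Simulate one pixel-wise coded readout (illustrative assumption only).

    scene: array of shape (T, H, W) with per-sub-frame irradiance.
    Each pixel integrates scene[t, y, x] for t in [phase, phase + exposure)
    using the tile entry at (y % tile_h, x % tile_w), then clips at full_well
    to model saturation. Returns (coded readout, per-pixel exposure lengths).
    """
    T, H, W = scene.shape
    th, tw = len(tile), len(tile[0])
    coded = np.zeros((H, W))
    exposure = np.zeros((H, W), dtype=int)
    for y in range(H):
        for x in range(W):
            exp_len, phase = tile[y % th][x % tw]
            if phase + exp_len > T:
                raise ValueError("exposure window exceeds the sub-frame stack")
            # Linear integration of irradiance over this pixel's own window.
            coded[y, x] = scene[phase:phase + exp_len, y, x].sum()
            exposure[y, x] = exp_len
    # Saturation: a pixel cannot report more than its full-well capacity.
    return np.minimum(coded, full_well), exposure
```

For a uniform scene such as `np.full((8, 4, 4), 0.2)`, the first returned array holds readouts of about 0.2, 0.4 and 0.8 for the 1-, 2- and 4-sub-frame pixels, while the 8-sub-frame pixel saturates at 1.0.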

By mixing pixels with different exposure lengths, the system captures fast motion and high dynamic range at the same time, while avoiding the misaligned-motion artifacts that frame-based multi-exposure fusion suffers from. The authors then transform these pixel-wise outputs into HDR video using deep neural networks with end-to-end learned weights, achieving high spatiotemporal resolution with minimal motion blur.
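Recovering HDR values from such a readout is the step the paper hands to a learned network. For contrast, here is a hand-crafted baseline that is our own illustration, not the paper's method: assuming a linear, noise-free sensor and a 2x2 tile, it divides each unsaturated pixel by its exposure length and fills saturated pixels with the mean estimate from the unsaturated pixels in the same tile. The helper name `naive_hdr` and its parameters are hypothetical.

```python
import numpy as np


def naive_hdr(coded, exposure, full_well=1.0, tile=2):
    """Hand-crafted HDR baseline (NOT the paper's learned network).

    Estimates irradiance as readout / exposure length, then replaces each
    saturated pixel with the mean estimate of the unsaturated pixels in the
    same tile x tile block.
    """
    est = coded / exposure
    valid = coded < full_well  # pixels at full_well are treated as saturated
    out = est.copy()
    H, W = coded.shape
    for y0 in range(0, H, tile):
        for x0 in range(0, W, tile):
            blk = (slice(y0, y0 + tile), slice(x0, x0 + tile))
            good = valid[blk]
            if good.any():
                fill = est[blk][good].mean()
            else:
                # Whole block saturated: each estimate is only a lower bound,
                # so keep the largest (tightest) one.
                fill = est[blk].max()
            sub = out[blk]  # basic slicing returns a view into `out`
            sub[~good] = fill
    return out
```

This baseline only works when the pixels in a tile see the same static irradiance. With fast motion, neighbors integrate over different time windows and pick up different amounts of blur, so plain averaging breaks down; that is presumably where end-to-end learned weights pay off.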

The experiments demonstrate aliasing-free HDR video capture at 1000 frames per second. The system resolves fast motion under low-light conditions and against bright backgrounds, both of which are challenging for conventional cameras. According to the authors, combining flexible pixel-wise sampling patterns with the ability of deep networks to decode complex scenes is what improves the system's adaptability and performance in dynamic conditions.

Critical Analysis

The paper makes a compelling case for this pixel-wise HDR video capture approach over frame-based multi-exposure methods. Because each pixel is sampled at its own exposure, it sidesteps the inter-frame misalignment that causes artifacts in frame-based HDR fusion. Using deep learning to decode the complex pixel-wise outputs into high-quality HDR video is a clever solution.

That said, the paper does not address some potential limitations of the approach. For example, the reliance on a specialized programmable image sensor may limit the scalability and cost-effectiveness compared to solutions that could be implemented on off-the-shelf hardware. Additionally, the deep learning model adds complexity and could introduce its own artifacts or biases if not carefully designed and trained.

Further research could explore adapting this approach to more widely available image sensor hardware, or investigate the robustness and generalization of the deep learning component. Evaluating the perceptual quality of the reconstructed HDR video would also be a useful direction for future work.

Overall, this paper presents a novel and promising direction for high-speed HDR video capture that could have significant implications for a wide range of vision applications dealing with dynamic, high-contrast scenes.

Conclusion

This paper introduces a novel approach for capturing high-speed, high dynamic range video that addresses the limitations of conventional frame-based multi-exposure methods. By sampling individual pixels at varying exposures and phase offsets, and using deep learning to reconstruct the final HDR video, the system can handle fast motion and challenging lighting conditions that would trip up regular cameras.

The ability to record aliasing-free HDR video at 1000 frames per second opens up new possibilities for vision applications that require both high temporal resolution and high dynamic range, such as autonomous driving, sports analysis, and scientific imaging. While the current implementation relies on specialized hardware, the core concepts of flexible pixel-level sampling and AI-powered reconstruction could potentially be adapted to more widely available image sensors in the future.

This summary was produced with help from an AI and may contain inaccuracies; read the original paper to verify the details.

Related Papers

Pix2HDR -- A pixel-wise acquisition and deep learning-based synthesis approach for high-speed HDR videos

Caixin Wang, Jie Zhang, Matthew A. Wilson, Ralph Etienne-Cummings

Accurately capturing dynamic scenes with wide-ranging motion and light intensity is crucial for many vision applications. However, acquiring high-speed high dynamic range (HDR) video is challenging because the camera's frame rate restricts its dynamic range. Existing methods sacrifice speed to acquire multi-exposure frames. Yet, misaligned motion in these frames can still pose complications for HDR fusion algorithms, resulting in artifacts. Instead of frame-based exposures, we sample the videos using individual pixels at varying exposures and phase offsets. Implemented on a monochrome pixel-wise programmable image sensor, our sampling pattern simultaneously captures fast motion at a high dynamic range. We then transform pixel-wise outputs into an HDR video using end-to-end learned weights from deep neural networks, achieving high spatiotemporal resolution with minimized motion blurring. We demonstrate aliasing-free HDR video acquisition at 1000 FPS, resolving fast motion under low-light conditions and against bright backgrounds - both challenging conditions for conventional cameras. By combining the versatility of pixel-wise sampling patterns with the strength of deep neural networks at decoding complex scenes, our method greatly enhances the vision system's adaptability and performance in dynamic conditions.

4/26/2024

Efficient HDR Reconstruction from Real-World Raw Images

Qirui Yang, Yihao Liu, Qihua Chen, Huanjing Yue, Kun Li, Jingyu Yang

The widespread usage of high-definition screens on edge devices stimulates a strong demand for efficient high dynamic range (HDR) algorithms. However, many existing HDR methods either deliver unsatisfactory results or consume too much computational and memory resources, hindering their application to high-resolution images (usually with more than 12 megapixels) in practice. In addition, existing HDR dataset collection methods are often labor-intensive. In this work, we identify an opportunity to reconstruct HDR directly from raw images and investigate novel neural network structures that ease deployment on mobile devices. Our key insights are threefold: (1) we develop a lightweight, efficient HDR model, RepUNet, using the structural re-parameterization technique to achieve fast and robust HDR; (2) we design a new computational raw HDR data formation pipeline and construct a real-world raw HDR dataset, RealRaw-HDR; (3) we propose a plug-and-play motion alignment loss to mitigate motion ghosting under limited bandwidth conditions. Our model contains fewer than 830K parameters and takes less than 3 ms to process an image of 4K resolution using one RTX 3090 GPU. While being highly efficient, our model also outperforms the state-of-the-art HDR methods in terms of PSNR, SSIM, and a color difference metric.

6/6/2024

HDR Imaging for Dynamic Scenes with Events

Li Xiaopeng, Zeng Zhaoyuan, Fan Cien, Zhao Chen, Deng Lei, Yu Lei

High dynamic range imaging (HDRI) for real-world dynamic scenes is challenging because moving objects may lead to hybrid degradation of low dynamic range and motion blur. Existing event-based approaches only focus on a separate task, while cascading HDRI and motion deblurring would lead to sub-optimal solutions, and unavailable ground-truth sharp HDR images aggravate the predicament. To address these challenges, we propose an Event-based HDRI framework within a Self-supervised learning paradigm, i.e., Self-EHDRI, which generalizes HDRI performance in real-world dynamic scenarios. Specifically, a self-supervised learning strategy is carried out by learning cross-domain conversions from blurry LDR images to sharp LDR images, which enables sharp HDR images to be accessible in the intermediate process even though ground-truth sharp HDR images are missing. Then, we formulate the event-based HDRI and motion deblurring model and conduct a unified network to recover the intermediate sharp HDR results, where both the high dynamic range and high temporal resolution of events are leveraged simultaneously for compensation. We construct large-scale synthetic and real-world datasets to evaluate the effectiveness of our method. Comprehensive experiments demonstrate that the proposed Self-EHDRI outperforms state-of-the-art approaches by a large margin. The codes, datasets, and results are available at https://lxp-whu.github.io/Self-EHDRI.

4/5/2024

FastHDRNet: A new efficient method for SDR-to-HDR Translation

Siyuan Tian, Hao Wang, Yiren Rong, Junhao Wang, Renjie Dai, Zhengxiao He

Modern displays nowadays possess the capability to render video content with a high dynamic range (HDR) and an extensive color gamut. However, the majority of available resources are still in standard dynamic range (SDR). Therefore, we need to identify an effective methodology for this objective. The existing deep neural network (DNN) based SDR-to-HDR conversion methods outperform conventional methods, but they are either too large to implement or generate some terrible artifacts. We propose a neural network for SDR-to-HDR conversion, termed FastHDRNet. This network includes two parts, Adaptive Universal Color Transformation (AUCT) and Local Enhancement (LE). The architecture is designed as a lightweight network that utilizes global statistics and local information with super high efficiency. In our experiments, we find that our proposed method achieves state-of-the-art performance in both quantitative comparisons and visual quality with a lightweight structure and an enhanced inference speed.

5/14/2024