UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation

Read original: arXiv:2305.15078 - Published 7/12/2024 by Yunfan LU, Guoqiang Liang, Yusheng Wang, Lin Wang, Hui Xiong

Overview

The paper proposes a novel approach called UniINR to recover high-frame-rate global shutter (GS) sharp frames from a rolling shutter (RS) blur frame and paired events.
Rolling shutter cameras exhibit both distortion and blur during fast camera movements, which makes it challenging to recover clear high-frame-rate video.
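The key idea is to use a spatial-temporal implicit neural representation (INR) that maps position and time coordinates directly to color values, addressing the interlocking degradations of distortion and blur in one model.
The model is lightweight (0.38M parameters) and fast at inference (2.83 ms/frame for 31x frame interpolation), and the authors report that it significantly outperforms prior approaches.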

Plain English Explanation

When a camera moves quickly, the images it captures can become distorted and blurry. This is because many cameras use a "rolling shutter," which captures different rows of the frame at slightly different times. This can lead to the "Jello effect," where the image appears to wobble, and to motion blur.
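The row-by-row capture described above can be illustrated with a minimal Python sketch. It is not taken from the paper: the function name `simulate_rolling_shutter` and its arguments are hypothetical, and the sketch models only the geometric distortion, not the blur caused by a finite exposure time.

```python
import numpy as np

def simulate_rolling_shutter(gs_frames):
    """Build one rolling-shutter frame from a stack of global-shutter frames.

    gs_frames: array of shape (T, H, W), T sharp frames spanning one readout.
    Row r of the output is copied from the frame closest in time to the
    moment that row would be read out (hypothetical, simplified model).
    """
    gs_frames = np.asarray(gs_frames)
    num_frames, height, _ = gs_frames.shape
    # Normalised readout time of each row: 0 for the top row, 1 for the bottom.
    row_times = np.linspace(0.0, 1.0, height)
    # Map each row time to the nearest available frame index.
    frame_idx = np.rint(row_times * (num_frames - 1)).astype(int)
    rs_frame = gs_frames[frame_idx, np.arange(height), :]
    return rs_frame, row_times

# A vertical bar that moves one pixel to the right per frame.
T, H, W = 5, 5, 8
frames = np.zeros((T, H, W))
for t in range(T):
    frames[t, :, t] = 1.0

rs, times = simulate_rolling_shutter(frames)
print(np.argmax(rs, axis=1))  # column of the bar in each row: [0 1 2 3 4]
```

Every sharp frame contains a perfectly vertical bar, yet the simulated rolling-shutter frame shows it slanted, because each row was sampled at a later moment than the row above it.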

The researchers propose a new way to fix these problems by using a special type of AI model called a "spatial-temporal implicit neural representation" (ST-INR). This model takes a pixel position and a moment in time and returns the color at that point, which lets it undo the distortion and blur caused by the rolling shutter.

The key innovation is that the model can do all of these corrections at once, instead of trying to fix the distortion and blur separately. This helps avoid compounding errors and produces better-quality results.

The model is also designed to be lightweight and efficient, so it can run quickly and be used in real-world applications. The researchers show that their method outperforms previous approaches in terms of both image quality and processing speed.

Overall, this research provides a powerful new tool for fixing the problems that can occur when capturing video with a rolling shutter camera, which is common in many consumer and industrial devices. By using advanced AI techniques, the researchers have found a way to produce crisp, clear video even during fast camera movements.

Technical Explanation

The paper proposes a novel method called UniINR to recover high-frame-rate global shutter (GS) sharp frames from a rolling shutter (RS) blur frame and paired events. Rolling shutter cameras frequently exhibit both distortion and blur during fast camera movements, making it challenging to recover clear high-frame-rate video.

The key idea is to use a spatial-temporal implicit neural representation (ST-INR) to directly map position and time coordinates to color values, addressing the interlocking degradations. Specifically, the method introduces a spatial-temporal implicit encoding (STE) to convert the RS blur image and events into a spatial-temporal representation (STR). To query a specific sharp frame (GS or RS), the model embeds the exposure time into the STR and decodes the embedded features pixel-by-pixel to recover a sharp frame.
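The query step can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the STR is replaced by random features, the decoder is an untrained two-layer MLP, and the sinusoidal time embedding, the function names, and all sizes are hypothetical choices made only to show how one representation can be queried for either a GS or an RS frame.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_embedding(t, num_freqs=4):
    """Sinusoidal embedding of times t (any shape) -> (..., 2 * num_freqs)."""
    t = np.asarray(t, dtype=np.float64)[..., None]
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi
    return np.concatenate([np.sin(freqs * t), np.cos(freqs * t)], axis=-1)

def decode_frame(str_features, pixel_times, w1, b1, w2, b2):
    """Decode one frame from a spatial-temporal representation.

    str_features: (H, W, C) feature map standing in for the STR.
    pixel_times:  (H, W) query time of every pixel, in [0, 1].
    Returns an (H, W, 3) image with values in (0, 1).
    """
    emb = time_embedding(pixel_times)                   # (H, W, 2F)
    x = np.concatenate([str_features, emb], axis=-1)    # (H, W, C + 2F)
    hidden = np.maximum(x @ w1 + b1, 0.0)               # per-pixel MLP with ReLU
    return 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))    # sigmoid -> colours

H, W, C, F, HID = 6, 8, 16, 4, 32
str_features = rng.normal(size=(H, W, C))  # placeholder for the encoder output
w1 = rng.normal(scale=0.1, size=(C + 2 * F, HID))
b1 = np.zeros(HID)
w2 = rng.normal(scale=0.1, size=(HID, 3))
b2 = np.zeros(3)

# Global-shutter query: every pixel shares one timestamp.
gs_times = np.full((H, W), 0.5)
gs_frame = decode_frame(str_features, gs_times, w1, b1, w2, b2)

# Rolling-shutter query: the timestamp increases from the top row to the bottom.
rs_times = np.repeat(np.linspace(0.0, 1.0, H)[:, None], W, axis=1)
rs_frame = decode_frame(str_features, rs_times, w1, b1, w2, b2)
```

In this sketch the only difference between the GS and RS queries is the per-pixel time map, which is the sense in which a single representation can serve correction, deblurring, and interpolation at once.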

The UniINR model is lightweight, with only 0.38M parameters, and achieves an inference time of 2.83 ms/frame for 31x frame interpolation of an RS blur frame. The authors report extensive experiments in which the method significantly outperforms prior methods.
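To put the reported figures in perspective, the following back-of-the-envelope calculation may help. It assumes that 2.83 ms is the cost of each output frame, that the 31 frames are decoded one after another, and that weights are stored as 32-bit floats; none of these assumptions is confirmed by the summary.

```python
PARAMS = 0.38e6          # parameter count reported in the paper
MS_PER_FRAME = 2.83      # reported inference time per frame
FRAMES_PER_INPUT = 31    # reported interpolation factor

# Assumption: float32 weights, 4 bytes each.
fp32_megabytes = PARAMS * 4 / 1e6
# Assumption: the 31 output frames are decoded sequentially.
ms_per_input = MS_PER_FRAME * FRAMES_PER_INPUT
frames_per_second = 1000.0 / MS_PER_FRAME

print(f"{fp32_megabytes:.2f} MB of float32 weights")   # 1.52 MB
print(f"{ms_per_input:.2f} ms per RS input frame")     # 87.73 ms
print(f"{frames_per_second:.0f} output frames per second")  # 353
```

Under those assumptions, one RS blur frame would expand into 31 sharp frames in under a tenth of a second.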

Critical Analysis

The paper presents a compelling and novel approach to recovering high-quality video from rolling shutter cameras during fast motion. Using a spatial-temporal implicit neural representation that maps position and time directly to color is an elegant solution that avoids the need for separate distortion correction and deblurring steps.

One potential limitation is that the method relies on having paired event data, which may not always be available, especially in legacy or consumer-grade camera systems. It would be interesting to see if the model could be adapted to work with RS blur frames alone, perhaps by incorporating some form of motion flow estimation or neural radiance fields.

Additionally, the paper does not provide much insight into the specific failure modes or limitations of the UniINR model. It would be valuable to understand the types of scenes or motion patterns where the method may struggle, as well as any potential artifacts or quality degradation that could arise.

Overall, the UniINR approach represents an impressive step forward in addressing the challenging problem of rolling shutter distortion and blur. The efficient inference and strong performance compared to prior methods suggest that this work could have significant practical applications in areas such as computational photography, robotics, and video production.

Conclusion

The paper presents UniINR, a novel method for recovering high-frame-rate global shutter (GS) sharp frames from a rolling shutter (RS) blur frame and paired events. By using a spatial-temporal implicit neural representation, the approach maps position and time directly to color, addressing the interlocking degradations caused by rolling shutter distortion and blur.

UniINR features a lightweight model with efficient inference, outperforming prior methods in both image quality and processing speed. This research represents an important advancement in computational photography and video processing, providing a powerful tool for capturing clear, high-quality footage even during fast camera movements. The insights and techniques developed in this work could have far-reaching impacts in fields like robotics, autonomous vehicles, and immersive media.

Related Papers

UniINR: Event-guided Unified Rolling Shutter Correction, Deblurring, and Interpolation

Yunfan LU, Guoqiang Liang, Yusheng Wang, Lin Wang, Hui Xiong

Video frames captured by rolling shutter (RS) cameras during fast camera movement frequently exhibit RS distortion and blur simultaneously. Naturally, recovering high-frame-rate global shutter (GS) sharp frames from an RS blur frame must simultaneously consider RS correction, deblur, and frame interpolation. A naive way is to decompose the whole process into separate tasks and cascade existing methods; however, this results in cumulative errors and noticeable artifacts. Event cameras enjoy many advantages, e.g., high temporal resolution, making them potential for our problem. To this end, we propose the first and novel approach, named UniINR, to recover arbitrary frame-rate sharp GS frames from an RS blur frame and paired events. Our key idea is unifying spatial-temporal implicit neural representation (INR) to directly map the position and time coordinates to color values to address the interlocking degradations. Specifically, we introduce spatial-temporal implicit encoding (STE) to convert an RS blur image and events into a spatial-temporal representation (STR). To query a specific sharp frame (GS or RS), we embed the exposure time into STR and decode the embedded features pixel-by-pixel to recover a sharp frame. Our method features a lightweight model with only 0.38M parameters, and it also enjoys high inference efficiency, achieving 2.83ms/frame in 31 times frame interpolation of an RS blur frame. Extensive experiments show that our method significantly outperforms prior methods. Code is available at https://github.com/yunfanLu/UniINR.

7/12/2024

HR-INR: Continuous Space-Time Video Super-Resolution via Event Camera

Yunfan Lu, Zipeng Wang, Yusheng Wang, Hui Xiong

Continuous space-time video super-resolution (C-STVSR) aims to simultaneously enhance video resolution and frame rate at an arbitrary scale. Recently, implicit neural representation (INR) has been applied to video restoration, representing videos as implicit fields that can be decoded at an arbitrary scale. However, the highly ill-posed nature of C-STVSR limits the effectiveness of current INR-based methods: they assume linear motion between frames and use interpolation or feature warping to generate features at arbitrary spatiotemporal positions with two consecutive frames. This restrains C-STVSR from capturing rapid and nonlinear motion and long-term dependencies (involving more than two frames) in complex dynamic scenes. In this paper, we propose a novel C-STVSR framework, called HR-INR, which captures both holistic dependencies and regional motions based on INR. It is assisted by an event camera, a novel sensor renowned for its high temporal resolution and low latency. To fully utilize the rich temporal information from events, we design a feature extraction consisting of (1) a regional event feature extractor - taking events as inputs via the proposed event temporal pyramid representation to capture the regional nonlinear motion and (2) a holistic event-frame feature extractor for long-term dependence and continuity motion. We then propose a novel INR-based decoder with spatiotemporal embeddings to capture long-term dependencies with a larger temporal perception field. We validate the effectiveness and generalization of our method on four datasets (both simulated and real data), showing the superiority of our method.

5/24/2024

Rolling Shutter Correction with Intermediate Distortion Flow Estimation

Mingdeng Cao, Sidi Yang, Yujiu Yang, Yinqiang Zheng

This paper proposes to correct the rolling shutter (RS) distorted images by estimating the distortion flow from the global shutter (GS) to RS directly. Existing methods usually perform correction using the undistortion flow from the RS to GS. They initially predict the flow from consecutive RS frames, subsequently rescaling it as the displacement fields from the RS frame to the underlying GS image using time-dependent scaling factors. Following this, RS-aware forward warping is employed to convert the RS image into its GS counterpart. Nevertheless, this strategy is prone to two shortcomings. First, the undistortion flow estimation is rendered inaccurate by merely linear scaling the flow, due to the complex non-linear motion nature. Second, RS-aware forward warping often results in unavoidable artifacts. To address these limitations, we introduce a new framework that directly estimates the distortion flow and rectifies the RS image with the backward warping operation. More specifically, we first propose a global correlation-based flow attention mechanism to estimate the initial distortion flow and GS feature jointly, which are then refined by the following coarse-to-fine decoder layers. Additionally, a multi-distortion flow prediction strategy is integrated to mitigate the issue of inaccurate flow estimation further. Experimental results validate the effectiveness of the proposed method, which outperforms state-of-the-art approaches on various benchmarks while maintaining high efficiency. The project is available at https://github.com/ljzycmd/DFRSC.

4/10/2024

Single Image Rolling Shutter Removal with Diffusion Models

Zhanglei Yang, Haipeng Li, Mingbo Hong, Bing Zeng, Shuaicheng Liu

We present RS-Diffusion, the first Diffusion Models-based method for single-frame Rolling Shutter (RS) correction. RS artifacts compromise visual quality of frames due to the row wise exposure of CMOS sensors. Most previous methods have focused on multi-frame approaches, using temporal information from consecutive frames for the motion rectification. However, few approaches address the more challenging but important single frame RS correction. In this work, we present an "image-to-motion" framework via diffusion techniques, with a designed patch-attention module. In addition, we present the RS-Real dataset, comprised of captured RS frames alongside their corresponding Global Shutter (GS) ground-truth pairs. The GS frames are corrected from the RS ones, guided by the corresponding Inertial Measurement Unit (IMU) gyroscope data acquired during capture. Experiments show that our RS-Diffusion surpasses previous single RS correction methods. Our method and proposed RS-Real dataset lay a solid foundation for advancing the field of RS correction.

7/4/2024