DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

Read original: arXiv:2406.07951 - Published 6/13/2024 by Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha

Overview

• This paper presents a new deep learning-based method called DemosaicFormer for demosaicing images captured by a Hybrid Event-Based Vision Sensor (HybridEVS) camera, which integrates traditional frame-based and event-based sensing.

• Demosaicing is the process of reconstructing a full-color image from the incomplete color samples captured by the camera's sensor, which uses a color filter array.

• The DemosaicFormer approach uses a coarse-to-fine strategy, with an initial coarse demosaicing followed by a refinement stage to produce the final high-quality output.

• The authors demonstrate the effectiveness of their method on the MIPI 2024 Challenge dataset for HybridEVS camera demosaicing.

Plain English Explanation

The paper describes a new way to process images from a special type of camera called a HybridEVS camera. These cameras capture both traditional video frames and additional information about changes in brightness over time (called events).

To get a full-color image from a HybridEVS camera, a process called demosaicing is needed. This is because the camera's sensor only captures a partial color sample at each pixel location, using a special color filter. The DemosaicFormer method first does a coarse reconstruction of the full-color image, and then refines it to produce the final high-quality result.
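
To make the idea of a partial color sample concrete, the following minimal sketch simulates what a color filter array does to a full-color image. It assumes a plain RGGB Bayer layout purely for illustration; the actual HybridEVS sensor layout may differ, and the function name `bayer_mosaic` is hypothetical rather than taken from the paper.

```python
import numpy as np

def bayer_mosaic(rgb):
    """Sample an H x W x 3 image through an RGGB color filter array.

    Assumption: a plain RGGB Bayer layout, used only for illustration.
    Returns the single-channel raw mosaic and an H x W x 3 boolean mask
    marking which color each pixel actually recorded.
    """
    h, w, _ = rgb.shape
    mask = np.zeros((h, w, 3), dtype=bool)
    mask[0::2, 0::2, 0] = True   # red on even rows, even columns
    mask[0::2, 1::2, 1] = True   # green on even rows, odd columns
    mask[1::2, 0::2, 1] = True   # green on odd rows, even columns
    mask[1::2, 1::2, 2] = True   # blue on odd rows, odd columns
    raw = (rgb * mask).sum(axis=2)  # keep only the recorded color per pixel
    return raw, mask

rng = np.random.default_rng(0)
image = rng.random((4, 4, 3))
raw, mask = bayer_mosaic(image)
print(raw.shape)          # one value per pixel instead of three
print(mask.sum(axis=2))   # number of colors recorded at each location
```

Two of the three color values at every pixel are discarded by the mosaic, and demosaicing is the task of estimating them again.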

The authors show that their DemosaicFormer approach works well on a dataset created for a recent challenge focused on demosaicing for HybridEVS cameras. This suggests it could be a useful tool for processing images from these types of cameras, which have applications in areas like autonomous vehicles and robotic vision.

Technical Explanation

The paper introduces a new deep learning-based method called DemosaicFormer for demosaicing images captured by a HybridEVS camera. The DemosaicFormer approach uses a coarse-to-fine strategy, with an initial coarse demosaicing followed by a refinement stage to produce the final high-quality output.

The coarse demosaicing stage takes the HybridEVS raw data as input and produces a preliminary estimate of the RGB image. According to the paper's abstract, the key architectural component is a Multi-Scale Gating Module (MSGM), which integrates cross-scale features so that information can flow between different scales.

The refinement stage is a pixel correction network that improves the restoration quality of the coarse output and mitigates the impact of defective pixels. This two-stage approach allows the model to first recover the overall structure of the image and then correct the remaining pixel-level errors.
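
The two-stage structure can be illustrated with a small classical stand-in. This is not the authors' network: the coarse stage here is a simple neighborhood average over known samples and the correction stage is a simple outlier replacement, both chosen only to show how a coarse estimate is passed on to a second stage. The RGGB layout, the function names, and the threshold value are all assumptions made for this sketch.

```python
import numpy as np

def rggb_mask(h, w):
    """Boolean H x W x 3 mask of recorded colors (assumed RGGB layout)."""
    mask = np.zeros((h, w, 3), dtype=bool)
    mask[0::2, 0::2, 0] = True
    mask[0::2, 1::2, 1] = True
    mask[1::2, 0::2, 1] = True
    mask[1::2, 1::2, 2] = True
    return mask

def box_sum3(x):
    """Sum over each pixel's 3x3 neighborhood (zero padded), per channel."""
    p = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    h, w = x.shape[:2]
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3))

def coarse_stage(raw, mask):
    """Stage 1: fill missing color samples with the mean of known neighbors."""
    known = raw[..., None] * mask
    count = box_sum3(mask.astype(float))
    estimate = box_sum3(known) / np.maximum(count, 1.0)
    return np.where(mask, known, estimate)  # measured samples stay untouched

def correction_stage(coarse, threshold=0.25):
    """Stage 2: replace values that deviate strongly from their neighbors."""
    neighbors = box_sum3(np.ones_like(coarse)) - 1.0
    local_mean = (box_sum3(coarse) - coarse) / neighbors
    defective = np.abs(coarse - local_mean) > threshold
    return np.where(defective, local_mean, coarse)

def demosaic(raw, mask):
    """Coarse estimate first, pixel-level correction second."""
    return correction_stage(coarse_stage(raw, mask))

mask = rggb_mask(8, 8)
raw = np.full((8, 8), 0.5)
raw[2, 2] = 1.0  # simulated defective (hot) pixel on a red site
coarse = coarse_stage(raw, mask)
refined = correction_stage(coarse)
# The hot value survives the coarse stage and is pulled toward its
# neighbors by the correction stage.
print(coarse[2, 2, 0], refined[2, 2, 0])
```

In DemosaicFormer both stages are learned networks, so the hand-written rules above only mirror the division of labor, not the quality of the result.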

The authors evaluate their DemosaicFormer method on the MIPI 2024 Challenge dataset for HybridEVS camera demosaicing, and show that it outperforms several baseline approaches. They also conduct ablation studies to analyze the contribution of the individual components of their architecture.
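
The summary does not name the evaluation metrics. As an assumption, peak signal-to-noise ratio (PSNR) is used below because it is a common measure for image restoration tasks; the sketch shows how a reconstructed image could be scored against a ground-truth image under that assumption.

```python
import numpy as np

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak].

    Assumption: PSNR is used here as a typical restoration metric; the
    summary does not state which metrics the challenge reports.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    mse = np.mean((reference - estimate) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

truth = np.zeros((4, 4, 3))
noisy = np.full((4, 4, 3), 0.1)
print(psnr(truth, noisy))  # higher values mean a closer reconstruction
```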

Critical Analysis

The paper presents a well-designed approach to the problem of demosaicing for HybridEVS cameras. The coarse-to-fine strategy separates recovering the overall image from correcting the remaining pixel-level errors, which gives each stage a more narrowly defined task.

One potential limitation of the work is that it is evaluated only on the MIPI 2024 Challenge dataset, which may not fully represent the diversity of real-world HybridEVS camera applications. It would be interesting to see how the DemosaicFormer method performs on a broader range of datasets and use cases.

Additionally, the paper does not discuss the computational complexity or runtime performance of the proposed approach. This information would be useful for understanding the practicality of deploying DemosaicFormer in real-world systems, especially those with tight resource constraints, such as autonomous vehicles or mobile robots.

Further research could also explore the integration of the DemosaicFormer method with other neuromorphic vision techniques, or investigate ways to leverage the event-based information captured by HybridEVS cameras to further improve the demosaicing process.

Conclusion

The DemosaicFormer method presented in this paper advances demosaicing for HybridEVS cameras. By combining a coarse demosaicing stage with a refinement network, the authors have developed an effective approach that can produce high-quality reconstructed images from the partial color information captured by these specialized sensors.

The strong performance of DemosaicFormer on the MIPI 2024 Challenge dataset suggests it could be a valuable tool for applications that rely on HybridEVS cameras, such as autonomous vehicles, robotics, and computational photography. Further research to expand the method's capabilities and explore its integration with other neuromorphic vision techniques could unlock even more exciting possibilities for this emerging sensor technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DemosaicFormer: Coarse-to-Fine Demosaicing Network for HybridEVS Camera

Senyan Xu, Zhijing Sun, Jiaying Zhu, Yurui Zhu, Xueyang Fu, Zheng-Jun Zha

Hybrid Event-Based Vision Sensor (HybridEVS) is a novel sensor integrating traditional frame-based and event-based sensors, offering substantial benefits for applications requiring low-light, high dynamic range, and low-latency environments, such as smartphones and wearable devices. Despite its potential, the lack of Image signal processing (ISP) pipeline specifically designed for HybridEVS poses a significant challenge. To address this challenge, in this study, we propose a coarse-to-fine framework named DemosaicFormer which comprises coarse demosaicing and pixel correction. Coarse demosaicing network is designed to produce a preliminary high-quality estimate of the RGB image from the HybridEVS raw data while the pixel correction network enhances the performance of image restoration and mitigates the impact of defective pixels. Our key innovation is the design of a Multi-Scale Gating Module (MSGM) applying the integration of cross-scale features, which allows feature information to flow between different scales. Additionally, the adoption of progressive training and data augmentation strategies further improves model's robustness and effectiveness. Experimental results show superior performance against the existing methods both qualitatively and visually, and our DemosaicFormer achieves the best performance in terms of all the evaluation metrics in the MIPI 2024 challenge on Demosaic for Hybridevs Camera. The code is available at https://github.com/QUEAHREN/DemosaicFormer.

6/13/2024

Event Camera Demosaicing via Swin Transformer and Pixel-focus Loss

Yunfan Lu, Yijie Xu, Wenzong Ma, Weiyu Guo, Hui Xiong

Recent research has highlighted improvements in high-quality imaging guided by event cameras, with most of these efforts concentrating on the RGB domain. However, these advancements frequently neglect the unique challenges introduced by the inherent flaws in the sensor design of event cameras in the RAW domain. Specifically, this sensor design results in the partial loss of pixel values, posing new challenges for RAW domain processes like demosaicing. The challenge intensifies as most research in the RAW domain is based on the premise that each pixel contains a value, making the straightforward adaptation of these methods to event camera demosaicing problematic. To end this, we present a Swin-Transformer-based backbone and a pixel-focus loss function for demosaicing with missing pixel values in RAW domain processing. Our core motivation is to refine a general and widely applicable foundational model from the RGB domain for RAW domain processing, thereby broadening the model's applicability within the entire imaging process. Our method harnesses multi-scale processing and space-to-depth techniques to ensure efficiency and reduce computing complexity. We also proposed the Pixel-focus Loss function for network fine-tuning to improve network convergence based on our discovery of a long-tailed distribution in training loss. Our method has undergone validation on the MIPI Demosaic Challenge dataset, with subsequent analytical experimentation confirming its efficacy. All code and trained models are released here: https://github.com/yunfanLu/ev-demosaic

4/4/2024

EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang

Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representations. Additionally, there is room for further reduction in pixel shifts in the feature maps before constructing the cost volume. In this paper, we propose EV-MGDispNet, a novel event-based stereo disparity estimation method. Firstly, we propose an edge-aware aggregation (EAA) module, which fuses event frames and motion confidence maps to generate a novel clear event representation. Then, we propose a motion-guided attention (MGA) module, where motion confidence maps utilize deformable transformer encoders to enhance the feature map with more accurate edges. Finally, we also add a census left-right consistency loss function to enhance the left-right consistency of stereo event representation. Through conducting experiments within challenging real-world driving scenarios, we validate that our method outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE) metrics.

8/13/2024

V2CE: Video to Continuous Events Simulator

Zhongyang Zhang, Shuyang Cui, Kaidong Chai, Haowen Yu, Subhasis Dasgupta, Upal Mahbub, Tauhidur Rahman

Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in terms of dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of ample labeled datasets. Prior efforts to convert APS data into events often grapple with issues such as a considerable domain shift from real events, the absence of quantified validation, and layering problems within the time axis. In this paper, we present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of DVS. A series of carefully designed losses helps enhance the quality of generated event voxels significantly. We also propose a novel local dynamic-aware timestamp inference strategy to accurately recover event timestamps from event voxels in a continuous fashion and eliminate the temporal layering problem. Results from rigorous validation through quantified metrics at all stages of the pipeline establish our method unquestionably as the current state-of-the-art (SOTA).

4/30/2024