Event-Enhanced Snapshot Compressive Videography at 10K FPS

Read original: arXiv:2404.07551 - Published 4/12/2024 by Bo Zhang, Jinli Suo, Qionghai Dai

Overview

This paper presents a new approach for capturing ultra-high-speed video at 10,000 frames per second (FPS) using a combination of snapshot compressive imaging and event cameras.
The researchers developed a dual-path optical setup and a dual-branch Transformer network to efficiently capture and reconstruct high-speed video from a single snapshot and event data.
This technique could enable new applications in fields like high-speed cinematography, scientific imaging, and autonomous systems that require rapid visual processing.

Plain English Explanation

The paper describes a new way to capture super-fast video at 10,000 frames per second (FPS). Normally, capturing video this fast would require expensive, specialized cameras. But the researchers have come up with a clever solution using two different types of cameras working together.

The first camera is an ordinary low-frame-rate camera that records a single coded snapshot, in which many moments of the scene are blended together. The second is an event camera, which reports brightness changes at individual pixels with very fine timing. By combining the information from the two, the researchers can reconstruct a high-speed video from a single snapshot plus the events recorded alongside it.

This is done using a special dual-path optical setup that splits the light between the two cameras. A machine learning algorithm then analyzes the snapshot and event camera data to recreate the full video sequence. This allows them to capture ultra-high-speed video without needing an extremely fast traditional camera.

The key advantages of this approach are that the setup is compact and works with a low-cost CMOS image sensor running at only 24 FPS, rather than a dedicated high-speed camera. It could open up new applications in fields like scientific research, movie production, and self-driving cars, where being able to see and respond to fast-moving events is crucial.

Technical Explanation

The paper presents an Event-Enhanced Snapshot Compressive Videography system that can capture 10,000 FPS video using a combination of snapshot compressive imaging and an event camera.
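The 10,000 FPS figure follows from numbers given in the paper's abstract: frames are recovered at 0.1 ms intervals while the CMOS sensor itself runs at 24 FPS. The short calculation below works through that arithmetic. The variable names are illustrative, and the last value is only an upper bound on the time steps per sensor frame period, since the summary does not state the exposure time actually used.

```python
# Figures taken from the paper's abstract.
interval_us = 100   # 0.1 ms between reconstructed frames, in microseconds
sensor_fps = 24     # frame rate of the low-cost CMOS sensor

# Reconstructed frame rate implied by the 0.1 ms interval.
effective_fps = 1_000_000 // interval_us

# Number of 0.1 ms time steps that fit in one sensor frame period.
# Assumption: the whole frame period is usable, which may not hold in practice.
steps_per_sensor_frame = effective_fps / sensor_fps

print(effective_fps)                     # 10000
print(round(steps_per_sensor_frame, 1))  # 416.7
```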

The key innovation is a dual-path optical setup that splits the incoming light between a standard snapshot camera and an event camera. A dual-branch Transformer network is then used to reconstruct the high-speed video sequence from the single snapshot and the event data.
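To make the "single snapshot" concrete, here is a minimal sketch of the standard video SCI forward model, in which each high-speed frame is multiplied by a coding pattern and the coded frames are summed into one 2D measurement. This is the textbook formulation, not the authors' code; the function name `sci_snapshot`, the toy array sizes, and the random binary masks are all assumptions made for illustration.

```python
import numpy as np

def sci_snapshot(frames, masks):
    """Collapse T coded frames into one 2D measurement: Y = sum_t C_t * X_t."""
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.shape != masks.shape:
        raise ValueError("frames and masks must share shape (T, H, W)")
    # Element-wise coding, then integration over time on the sensor.
    return (frames * masks).sum(axis=0)

rng = np.random.default_rng(0)
T, H, W = 8, 4, 4                            # hypothetical toy sizes
frames = rng.random((T, H, W))               # stand-in for the high-speed scene
masks = rng.integers(0, 2, size=(T, H, W))   # random binary coding patterns
snapshot = sci_snapshot(frames, masks)
print(snapshot.shape)  # (4, 4)
```

A binary mask blocks part of the light at every time step. In this toy model the blocked light is exactly `sci_snapshot(frames, 1 - masks)`, which gives a sense of the photons that the abstract describes as discarded in conventional video SCI and collected by the second optical path.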

The event camera provides high-temporal-resolution information about changes in the scene, including the dynamics between coded frames that a conventional video SCI system loses, which helps the network recover the full video frames from the compressed snapshot. Experiments on both simulated and real-captured data show that this approach reconstructs 10,000 FPS video with high fidelity, outperforming state-of-the-art video SCI and video frame interpolation (VFI) methods.
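The kind of signal an event camera contributes can be sketched with the commonly used idealized event model: a pixel emits an event whenever its log intensity has changed by more than a contrast threshold since that pixel's last event. The code below is a simplified simulation under that assumption, not the paper's pipeline; `simulate_events`, the threshold value, and the toy frames are hypothetical.

```python
import numpy as np

def simulate_events(frames, threshold=0.2, eps=1e-3):
    """Return (t, y, x, polarity) tuples from an idealized event model."""
    log_frames = np.log(np.asarray(frames, dtype=np.float64) + eps)
    reference = log_frames[0].copy()  # per-pixel log intensity at the last event
    events = []
    for t in range(1, len(log_frames)):
        diff = log_frames[t] - reference
        fired = np.abs(diff) >= threshold
        for y, x in zip(*np.nonzero(fired)):
            events.append((t, int(y), int(x), 1 if diff[y, x] > 0 else -1))
        # Reset the reference only at pixels that fired.
        reference[fired] = log_frames[t][fired]
    return events

# A bright dot moving one pixel to the right per time step.
frames = np.array([[[1.0, 0.0, 0.0]],
                   [[0.0, 1.0, 0.0]],
                   [[0.0, 0.0, 1.0]]])
events = simulate_events(frames)
print(len(events))  # 4
```

Each step of the dot produces a negative event where it leaves and a positive event where it arrives. This simplification emits at most one event per pixel per time step, whereas a real sensor fires asynchronously and may emit several events for a large change.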

Critical Analysis

The paper presents a compelling technical solution for high-speed videography, but there are a few potential limitations and areas for further research:

The current setup requires a custom dual-path optical system, which may limit its practical deployment compared to a single-sensor approach.
The Transformer network architecture, while effective, may have high computational requirements that could constrain real-time performance on resource-limited devices.
The experiments cover simulated and real-captured data, but further testing is needed to assess the system's robustness to less controlled real-world conditions like varying lighting, motion blur, and occlusions.

Despite these caveats, the core idea of leveraging event cameras to enhance snapshot compressive imaging is a promising direction. Continued research in this area could lead to more affordable and accessible ultra-high-speed imaging solutions across a range of applications.

Conclusion

This paper introduces an Event-Enhanced Snapshot Compressive Videography system that captures 10,000 FPS video by pairing snapshot compressive imaging with an event camera. The dual-path optical setup and dual-branch Transformer network reconstruct the high-speed video from a single compressed snapshot and the accompanying event data.

This work demonstrates the potential of hybrid imaging approaches to extend what conventional camera hardware can do. If developed further, the technique could benefit fields like high-speed cinematography, scientific imaging, and autonomous systems that require rapid visual processing.

Related Papers

Event-Enhanced Snapshot Compressive Videography at 10K FPS

Bo Zhang, Jinli Suo, Qionghai Dai

Video snapshot compressive imaging (SCI) encodes the target dynamic scene compactly into a snapshot and reconstructs its high-speed frame sequence afterward, greatly reducing the required data footprint and transmission bandwidth as well as enabling high-speed imaging with a low frame rate intensity camera. In implementation, high-speed dynamics are encoded via temporally varying patterns, and only frames at corresponding temporal intervals can be reconstructed, while the dynamics occurring between consecutive frames are lost. To unlock the potential of conventional snapshot compressive videography, we propose a novel hybrid intensity+event imaging scheme by incorporating an event camera into a video SCI setup. Our proposed system consists of a dual-path optical setup to record the coded intensity measurement and intermediate event signals simultaneously, which is compact and photon-efficient by collecting the half photons discarded in conventional video SCI. Correspondingly, we developed a dual-branch Transformer utilizing the reciprocal relationship between two data modes to decode dense video frames. Extensive experiments on both simulated and real-captured data demonstrate our superiority to state-of-the-art video SCI and video frame interpolation (VFI) methods. Benefiting from the new hybrid design leveraging both intrinsic redundancy in videos and the unique feature of event cameras, we achieve high-quality videography at 0.1ms time intervals with a low-cost CMOS image sensor working at 24 FPS.

4/12/2024

Towards Real-time Video Compressive Sensing on Mobile Devices

Miao Cao, Lishun Wang, Huan Wang, Guoqing Wang, Xin Yuan

Video Snapshot Compressive Imaging (SCI) uses a low-speed 2D camera to capture high-speed scenes as snapshot compressed measurements, followed by a reconstruction algorithm to retrieve the high-speed video frames. The fast evolving mobile devices and existing high-performance video SCI reconstruction algorithms motivate us to develop mobile reconstruction methods for real-world applications. Yet, it is still challenging to deploy previous reconstruction algorithms on mobile devices due to the complex inference process, let alone real-time mobile reconstruction. To the best of our knowledge, there is no video SCI reconstruction model designed to run on the mobile devices. Towards this end, in this paper, we present an effective approach for video SCI reconstruction, dubbed MobileSCI, which can run at real-time speed on the mobile devices for the first time. Specifically, we first build a U-shaped 2D convolution-based architecture, which is much more efficient and mobile-friendly than previous state-of-the-art reconstruction methods. Besides, an efficient feature mixing block, based on the channel splitting and shuffling mechanisms, is introduced as a novel bottleneck block of our proposed MobileSCI to alleviate the computational burden. Finally, a customized knowledge distillation strategy is utilized to further improve the reconstruction quality. Extensive results on both simulated and real data show that our proposed MobileSCI can achieve superior reconstruction quality with high efficiency on the mobile devices. Particularly, we can reconstruct a 256 X 256 X 8 snapshot compressed measurement with real-time performance (about 35 FPS) on an iPhone 15. Code is available at https://github.com/mcao92/MobileSCI.

8/15/2024

A Simple Low-bit Quantization Framework for Video Snapshot Compressive Imaging

Miao Cao, Lishun Wang, Huan Wang, Xin Yuan

Video Snapshot Compressive Imaging (SCI) aims to use a low-speed 2D camera to capture high-speed scene as snapshot compressed measurements, followed by a reconstruction algorithm to reconstruct the high-speed video frames. State-of-the-art (SOTA) deep learning-based algorithms have achieved impressive performance, yet with heavy computational workload. Network quantization is a promising way to reduce computational cost. However, a direct low-bit quantization will bring large performance drop. To address this challenge, in this paper, we propose a simple low-bit quantization framework (dubbed Q-SCI) for the end-to-end deep learning-based video SCI reconstruction methods which usually consist of a feature extraction, feature enhancement, and video reconstruction module. Specifically, we first design a high-quality feature extraction module and a precise video reconstruction module to extract and propagate high-quality features in the low-bit quantized model. In addition, to alleviate the information distortion of the Transformer branch in the quantized feature enhancement module, we introduce a shift operation on the query and key distributions to further bridge the performance gap. Comprehensive experimental results manifest that our Q-SCI framework can achieve superior performance, e.g., 4-bit quantized EfficientSCI-S derived by our Q-SCI framework can theoretically accelerate the real-valued EfficientSCI-S by 7.8X with only 2.3% performance gap on the simulation testing datasets. Code is available at https://github.com/mcao92/QuantizedSCI.

8/1/2024

Deep Optics for Video Snapshot Compressive Imaging

Ping Wang, Lishun Wang, Xin Yuan

Video snapshot compressive imaging (SCI) aims to capture a sequence of video frames with only a single shot of a 2D detector, whose backbones rest in optical modulation patterns (also known as masks) and a computational reconstruction algorithm. Advanced deep learning algorithms and mature hardware are putting video SCI into practical applications. Yet, there are two clouds in the sunshine of SCI: i) low dynamic range as a victim of high temporal multiplexing, and ii) existing deep learning algorithms' degradation on real system. To address these challenges, this paper presents a deep optics framework to jointly optimize masks and a reconstruction network. Specifically, we first propose a new type of structural mask to realize motion-aware and full-dynamic-range measurement. Considering the motion awareness property in measurement domain, we develop an efficient network for video SCI reconstruction using Transformer to capture long-term temporal dependencies, dubbed Res2former. Moreover, sensor response is introduced into the forward model of video SCI to guarantee end-to-end model training close to real system. Finally, we implement the learned structural masks on a digital micro-mirror device. Experimental results on synthetic and real data validate the effectiveness of the proposed framework. We believe this is a milestone for real-world video SCI. The source code and data are available at https://github.com/pwangcs/DeepOpticsSCI.

4/9/2024