EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Read original: arXiv:2408.05452 - Published 8/13/2024 by Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang


Overview

Event-based stereo disparity estimation is a challenging task due to the sparse and asynchronous nature of event data.
The authors propose a novel deep learning model called EV-MGDispNet that leverages motion guidance and left-right consistency to improve stereo disparity estimation from event cameras.
EV-MGDispNet fuses event frames with motion confidence maps to build a clearer event representation, uses those motion cues to sharpen edges in its feature maps, and adds a census left-right consistency loss during training so that the left and right stereo event representations agree.
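To make "sparse and asynchronous" concrete: an event camera emits a stream of events, each usually recorded as a pixel location, a timestamp, and a polarity (brightness went up or down). Many learning-based methods first accumulate the events from a short time window into an image-like event frame, and EV-MGDispNet's pipeline also starts from event frames, although the abstract does not say exactly how they are built. The sketch below shows the simplest polarity-sum version; the function name `events_to_frame` and the `(x, y, t, polarity)` tuple layout are assumptions for illustration, not the paper's code.

```python
# Minimal sketch: accumulating an asynchronous event stream into a 2D event frame.
# Assumption: each event is (x, y, t, polarity) with polarity in {+1, -1}.
# This polarity sum is a common baseline representation, not the paper's method.
from typing import List, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp in seconds, polarity)


def events_to_frame(events: List[Event], width: int, height: int,
                    t_start: float, t_end: float) -> List[List[int]]:
    """Sum event polarities per pixel for events with t_start <= t < t_end."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, p in events:
        # Keep only events inside the time window and inside the sensor.
        if t_start <= t < t_end and 0 <= x < width and 0 <= y < height:
            frame[y][x] += p
    return frame


if __name__ == "__main__":
    stream = [(1, 0, 0.001, 1), (1, 0, 0.002, 1), (2, 1, 0.003, -1), (0, 0, 0.020, 1)]
    # The last event falls outside the 10 ms window and is ignored.
    print(events_to_frame(stream, width=3, height=2, t_start=0.0, t_end=0.010))
    # [[0, 2, 0], [0, 0, -1]]
```

Pixels that saw no events stay at zero, which is why event frames are mostly empty. A fixed time window is only one choice; fixed event counts or time surfaces are common alternatives.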

Plain English Explanation

Event cameras are sensors that report per-pixel changes in light intensity as they happen, instead of capturing full frames at fixed intervals. This gives them very fast response times, high dynamic range, and low power consumption, making them useful for applications like robotics and autonomous vehicles. However, using event data for stereo disparity estimation (measuring how far each point shifts between the images of two side-by-side cameras, which determines its distance from the cameras) is challenging because the data is sparse and asynchronous.
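Disparity and distance are linked by standard rectified-stereo geometry: depth Z = f·B/d, where f is the focal length in pixels, B is the baseline (distance between the two cameras), and d is the disparity in pixels. The helper below is a hypothetical illustration with made-up camera parameters; it is not part of EV-MGDispNet, which predicts disparity and leaves this conversion to the application.

```python
# Minimal sketch: converting disparity to depth for a rectified stereo pair.
# Assumptions: pinhole cameras, rectified images, focal length in pixels,
# baseline in metres. The numbers below are illustrative, not from the paper.


def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Return depth Z = f * B / d in metres; non-positive disparity has no finite depth."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive to yield a finite depth")
    return focal_px * baseline_m / disparity_px


if __name__ == "__main__":
    # A point that shifts 20 px between views, with f = 500 px and B = 0.12 m.
    print(disparity_to_depth(20.0, focal_px=500.0, baseline_m=0.12))  # 3.0
```

Because depth is inversely proportional to disparity, a one-pixel disparity error matters far more for distant objects (small d) than for nearby ones, which is why small reductions in disparity error can be significant.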

To address this, the researchers developed a deep learning model called EV-MGDispNet that uses two key techniques to improve stereo disparity estimation from event cameras:

Motion Guidance: EV-MGDispNet uses motion confidence maps, which indicate where motion in the scene is producing events, to guide the disparity estimation process. Fusing these maps with event frames yields a clearer event representation, and using them inside an attention module helps the model produce feature maps with more accurate edges.
Left-Right Consistency: During training, the model uses a census-based loss that encourages the stereo event representations from the left and right event cameras to agree. This is intended to make the final disparity estimates more reliable and accurate.
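A classic form of left-right consistency in stereo vision compares the disparity maps computed for each view: a left pixel at column x should land on a right pixel at column x - d whose own disparity is about the same d. Pixels that fail the check are often occluded or mismatched. The sketch below shows this generic check on a single image row only to illustrate the idea; EV-MGDispNet's own term is described in its abstract as a census left-right consistency loss, and the function name and 1-pixel threshold here are assumptions.

```python
# Minimal sketch of the classic left-right consistency check for one image row.
# Assumptions: rectified views, integer disparities, and a left pixel at column x
# matching the right pixel at column x - d. The 1 px threshold is illustrative.
from typing import List


def lr_consistency_mask(disp_left: List[int], disp_right: List[int],
                        max_diff: int = 1) -> List[bool]:
    """Mark left-view pixels whose disparity agrees with the matched right-view pixel."""
    width = len(disp_left)
    mask = []
    for x, d in enumerate(disp_left):
        xr = x - d  # column of the matching pixel in the right view
        if 0 <= xr < width:
            mask.append(abs(d - disp_right[xr]) <= max_diff)
        else:
            mask.append(False)  # the match falls outside the right image
    return mask


if __name__ == "__main__":
    print(lr_consistency_mask([0, 2, 2, 2], [2, 2, 0, 0]))
    # [False, False, True, True]
```

In a training loss, the same idea is usually expressed softly (warping one view into the other and penalizing differences) rather than as a hard true/false mask.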

By combining these techniques, EV-MGDispNet produces stereo disparity maps with lower errors than previously published methods in the authors' experiments on real-world driving scenes.

Technical Explanation

The key components of the EV-MGDispNet architecture are:

Edge-Aware Aggregation (EAA) Module: This module fuses event frames with motion confidence maps to generate a clearer event representation, addressing the observation that earlier methods do not fully exploit the temporal information in the event stream.
Motion-Guided Attention (MGA) Module: This module uses the motion confidence maps together with deformable transformer encoders to enhance the feature maps with more accurate edges, reducing pixel shifts in the feature maps before the cost volume is constructed.
Cost Volume and Disparity Estimation: The enhanced left and right feature maps are used to construct a cost volume, from which the final disparity map is estimated.
Census Left-Right Consistency Loss: This loss function, added during training, enhances the left-right consistency of the stereo event representation.
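The abstract names a census left-right consistency loss but does not give its formula. The census transform itself is standard: each pixel is encoded as a bit string recording which neighbours are darker than it, so comparisons depend on local structure rather than absolute intensity, a useful property for event representations whose magnitudes vary with motion. The sketch below, under assumed choices (3x3 window, integer disparities, nearest-pixel warping, averaged Hamming distance), compares each left pixel's census code with the code of its disparity-matched right pixel. The function names are hypothetical, and a trainable version would need a soft, differentiable census instead.

```python
# Minimal sketch of a census-based left-right consistency loss.
# Assumptions (not taken from the paper): a 3x3 census window, integer disparities,
# nearest-pixel warping, and a plain averaged Hamming distance.
from typing import List

Grid = List[List[float]]


def census(img: Grid) -> List[List[int]]:
    """Encode each pixel as 8 bits: 1 where a 3x3 neighbour is darker than the centre."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            code = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    # Clamp at the border so every pixel gets a full 8-bit code.
                    ny = min(max(y + dy, 0), h - 1)
                    nx = min(max(x + dx, 0), w - 1)
                    code = (code << 1) | int(img[ny][nx] < img[y][x])
            out[y][x] = code
    return out


def census_lr_loss(left: Grid, right: Grid, disp_left: List[List[int]]) -> float:
    """Mean Hamming distance between left census codes and disparity-matched right codes."""
    cl, cr = census(left), census(right)
    total, count = 0, 0
    for y, row in enumerate(disp_left):
        for x, d in enumerate(row):
            xr = x - d  # matching column in the right view
            if 0 <= xr < len(row):  # skip pixels whose match leaves the image
                total += bin(cl[y][x] ^ cr[y][xr]).count("1")
                count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    img = [[0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0]]
    zero_disp = [[0] * 4 for _ in range(3)]
    print(census_lr_loss(img, img, zero_disp))  # 0.0
```

Identical views with zero disparity give zero loss, while mismatched structure raises the average Hamming distance; minimizing such a term pushes the two views' representations toward agreement.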

The authors evaluate EV-MGDispNet in challenging real-world driving scenarios and report that it outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE). The motion guidance and left-right consistency components are presented as the main contributors to this improvement.
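MAE and RMSE, the two metrics reported in the abstract, both measure disparity error in pixels, but RMSE squares each error before averaging and therefore penalizes large mistakes more heavily. The sketch below computes both, assuming pixels without ground truth are marked `None` and excluded, since ground truth for event stereo is often sparse (for example, projected from LiDAR). The function name and masking convention are assumptions; the paper's exact evaluation protocol may differ.

```python
# Minimal sketch of MAE and RMSE over a (flattened) disparity map.
# Assumption: pixels without ground truth are None and are excluded from both metrics.
import math
from typing import List, Optional, Tuple


def disparity_errors(pred: List[float], gt: List[Optional[float]]) -> Tuple[float, float]:
    """Return (MAE, RMSE) in pixels over positions where ground truth exists."""
    diffs = [p - g for p, g in zip(pred, gt) if g is not None]
    if not diffs:
        raise ValueError("no valid ground-truth pixels")
    mae = sum(abs(e) for e in diffs) / len(diffs)
    rmse = math.sqrt(sum(e * e for e in diffs) / len(diffs))
    return mae, rmse


if __name__ == "__main__":
    pred = [10.0, 12.0, 7.0, 3.0]
    gt = [11.0, 12.0, None, 6.0]  # third pixel has no ground truth
    mae, rmse = disparity_errors(pred, gt)
    print(round(mae, 3), round(rmse, 3))  # 1.333 1.826
```

RMSE is always at least as large as MAE; a big gap between them signals a few large outliers rather than uniformly small errors.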

Critical Analysis

The approach has a few potential limitations:

The performance of EV-MGDispNet is still sensitive to the quality of the event data, and it may not work as well in challenging environments with high noise or low contrast.
The deformable transformer encoders in the motion-guided attention module make the model computationally more complex than some simpler disparity estimation approaches, which could be a concern for real-time applications with limited computational resources.
The reported experiments focus on real-world driving scenarios, so further testing in other environments and with other event camera setups would be needed to fully assess its practical applicability.

Overall, the EV-MGDispNet approach represents an interesting and promising step forward in event-based stereo disparity estimation. However, further research is needed to address the limitations and improve the robustness and efficiency of the model.

Conclusion

The EV-MGDispNet model proposed in this paper shows how motion guidance and a census left-right consistency loss can reduce stereo disparity errors from event-based sensors. This is a meaningful advance in event-based computer vision, with potential applications in areas like robotics and autonomous driving. While the model has some limitations, the authors' approach and results suggest that event-based stereo vision is a promising direction for future research.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

EV-MGDispNet: Motion-Guided Event-Based Stereo Disparity Estimation Network with Left-Right Consistency

Junjie Jiang, Hao Zhuang, Xinjie Huang, Delei Kong, Zheng Fang

Event cameras have the potential to revolutionize the field of robot vision, particularly in areas like stereo disparity estimation, owing to their high temporal resolution and high dynamic range. Many studies use deep learning for event camera stereo disparity estimation. However, these methods fail to fully exploit the temporal information in the event stream to acquire clear event representations. Additionally, there is room for further reduction in pixel shifts in the feature maps before constructing the cost volume. In this paper, we propose EV-MGDispNet, a novel event-based stereo disparity estimation method. Firstly, we propose an edge-aware aggregation (EAA) module, which fuses event frames and motion confidence maps to generate a novel clear event representation. Then, we propose a motion-guided attention (MGA) module, where motion confidence maps utilize deformable transformer encoders to enhance the feature map with more accurate edges. Finally, we also add a census left-right consistency loss function to enhance the left-right consistency of stereo event representation. Through conducting experiments within challenging real-world driving scenarios, we validate that our method outperforms currently known state-of-the-art methods in terms of mean absolute error (MAE) and root mean square error (RMSE) metrics.

8/13/2024

EvGGS: A Collaborative Learning Framework for Event-based Generalizable Gaussian Splatting

Jiaxu Wang, Junhao He, Ziyi Zhang, Mingyuan Sun, Jingkai Sun, Renjing Xu

Event cameras offer promising advantages such as high dynamic range and low latency, making them well-suited for challenging lighting conditions and fast-moving scenarios. However, reconstructing 3D scenes from raw event streams is difficult because event data is sparse and does not carry absolute color information. To release its potential in 3D reconstruction, we propose the first event-based generalizable 3D reconstruction framework, called EvGGS, which reconstructs scenes as 3D Gaussians from only event input in a feedforward manner and can generalize to unseen cases without any retraining. This framework includes a depth estimation module, an intensity reconstruction module, and a Gaussian regression module. These submodules connect in a cascading manner, and we collaboratively train them with a designed joint loss to make them mutually promote. To facilitate related studies, we build a novel event-based 3D dataset with various material objects and calibrated labels of grayscale images, depth maps, camera poses, and silhouettes. Experiments show models that have jointly trained significantly outperform those trained individually. Our approach performs better than all baselines in reconstruction quality, and depth/intensity predictions with satisfactory rendering speed.

6/4/2024

IMU-Aided Event-based Stereo Visual Odometry

Junkai Niu, Sheng Zhong, Yi Zhou

Direct methods for event-based visual odometry solve the mapping and camera pose tracking sub-problems by establishing implicit data association in a way that the generative model of events is exploited. The main bottlenecks faced by state-of-the-art work in this field include the high computational complexity of mapping and the limited accuracy of tracking. In this paper, we improve our previous direct pipeline Event-based Stereo Visual Odometry in terms of accuracy and efficiency. To speed up the mapping operation, we propose an efficient strategy of edge-pixel sampling according to the local dynamics of events. The mapping performance in terms of completeness and local smoothness is also improved by combining the temporal stereo results and the static stereo results. To circumvent the degeneracy issue of camera pose tracking in recovering the yaw component of general 6-DoF motion, we introduce as a prior the gyroscope measurements via pre-integration. Experiments on publicly available datasets justify our improvement. We release our pipeline as an open-source software for future research in this field.

5/8/2024

Temporal Event Stereo via Joint Learning with Stereoscopic Flow

Hoonhee Cho, Jae-Young Kang, Kuk-Jin Yoon

Event cameras are dynamic vision sensors inspired by the biological retina, characterized by their high dynamic range, high temporal resolution, and low power consumption. These features make them capable of perceiving 3D environments even in extreme conditions. Event data is continuous across the time dimension, which allows a detailed description of each pixel's movements. To fully utilize the temporally dense and continuous nature of event cameras, we propose a novel temporal event stereo, a framework that continuously uses information from previous time steps. This is accomplished through the simultaneous training of an event stereo matching network alongside stereoscopic flow, a new concept that captures all pixel movements from stereo cameras. Since obtaining ground truth for optical flow during training is challenging, we propose a method that uses only disparity maps to train the stereoscopic flow. The performance of event-based stereo matching is enhanced by temporally aggregating information using the flows. We have achieved state-of-the-art performance on the MVSEC and the DSEC datasets. The method is computationally efficient, as it stacks previous information in a cascading manner. The code is available at https://github.com/mickeykang16/TemporalEventStereo.

7/16/2024