eTraM: Event-based Traffic Monitoring Dataset

Read original: arXiv:2403.19976 - Published 4/3/2024 by Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang

eTraM Statistics

The eTraM dataset consists of 10 hours of data collected with the Prophesee EVK4 HD camera. In addition to the annotated static perception data, the dataset includes sequences of ego-motion event-based data, which add diversity and open opportunities for further experimentation.

Figure 1: Average Duration Spent by Objects from Each Class: The bar plot illustrates the average duration, in seconds, spent by objects of different classes, providing insights into the temporal characteristics of each class in the dataset.

Figure 1 shows the average time that instances of each class spend at the traffic sites. The classes have distinct temporal dynamics: pedestrians and wheelchair users spend the most time at the sites, reflecting their slower movement speeds, while vehicle classes spend relatively less time there.

Figure 2: Analysis of the distribution of objects categorized by size (small, medium, and large)

Figure 2 shows how objects in the vehicle (VH), pedestrian (PED), and micro-mobility (MM) categories are distributed by the area they cover: small, medium, or large.
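As a rough illustration of how such a size breakdown could be computed from bounding-box annotations, here is a minimal Python sketch. The area thresholds follow the COCO convention (32x32 and 96x96 pixels), which is an assumption: this summary does not state the cut-offs used for eTraM. The function names and the sample boxes are hypothetical.

```python
from collections import Counter

# Assumed thresholds in square pixels (COCO convention); eTraM may use others.
SMALL_MAX_AREA = 32 * 32
MEDIUM_MAX_AREA = 96 * 96


def size_bucket(width, height):
    """Return 'small', 'medium', or 'large' for a box of the given size."""
    area = width * height
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


def size_distribution(boxes):
    """Count boxes per (category, size bucket).

    `boxes` is an iterable of (category, width, height) tuples.
    """
    return Counter((cat, size_bucket(w, h)) for cat, w, h in boxes)


# Hypothetical annotations, for illustration only.
boxes = [("VH", 120, 90), ("PED", 20, 45), ("MM", 25, 30), ("VH", 60, 40)]
distribution = size_distribution(boxes)
```

Counting per (category, bucket) pair keeps the result directly usable for a grouped bar plot like the one described for Figure 2.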

Figure 3: Aspect Ratio Distribution in eTraM: A histogram depicting the frequency distribution of aspect ratios across different classes in eTraM, providing a comprehensive overview of the dataset’s characteristics.

Figure 4: Impact of Spatiotemporal Filtering on Event Camera Data: Comparison of a noisy pre-filtered image (left) and the enhanced clarity achieved post-filtering (right) on daytime (top row) and nighttime data (bottom row).

The paper establishes benchmarks based on object size classifications, as shown in Table 1. The models show similar performance trends. For the pedestrian and vehicle categories, performance is best on medium-sized instances. The gap is small for vehicles, whose performance is similar across all three size classifications, but pedestrian performance drops significantly on small-sized instances. In contrast, micro-mobility performs better on small-sized instances than on medium-sized ones, although even its best size class scores below the worst size class of the pedestrian and vehicle categories.

These results indicate a performance degradation when dealing with small-sized objects, particularly micro-mobility. This may be due to the lack of contour and color information in the raw event data. Figure 3 also presents the frequency of aspect ratios for each class in the eTraM dataset.

Figure 5: Traffic Participant Object Detection by RVT: Snapshots illustrating the detection results of RVT at various traffic sites, showcasing its performance in diverse real-world scenarios.

Figure 7: Illustration of Intersection-over-Union based Multi-Object Tracking on the detection results of RVT

Denoising Using Spatiotemporal Filter

A denoising step is applied to the eTraM event stream to reduce noise, which is most pronounced in nighttime data because of increased reflections and point light sources from streets and vehicles. Figure 4 qualitatively illustrates the effectiveness of the spatiotemporal filter [3] with a side-by-side comparison of event frames before and after filtering.
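To make the idea of spatiotemporal filtering concrete, the sketch below implements one common variant, a background-activity filter: an event is kept only if a neighbouring pixel fired within a short time window. Whether this matches the exact filter of [3] is not stated in this summary, so treat the function name, the neighbourhood radius, and the 10 ms window as assumptions.

```python
import numpy as np


def spatiotemporal_filter(events, width, height, radius=1, window_us=10_000):
    """Return a boolean mask marking events that have recent spatial support.

    events: array of shape (N, 4) with columns x, y, t (microseconds), p,
            sorted by timestamp.
    """
    # Latest timestamp seen at each pixel, padded by `radius` so that
    # neighbourhood slices never leave the array at the sensor border.
    last_t = np.full((height + 2 * radius, width + 2 * radius), -np.inf)
    keep = np.zeros(len(events), dtype=bool)
    for i, (x, y, t, _p) in enumerate(events):
        cx, cy = int(x) + radius, int(y) + radius
        patch = last_t[cy - radius:cy + radius + 1,
                       cx - radius:cx + radius + 1].copy()
        patch[radius, radius] = -np.inf  # ignore the pixel's own history
        keep[i] = (t - patch.max()) <= window_us
        last_t[cy, cx] = t
    return keep


# Hypothetical events: two neighbours close in time, one isolated event,
# and one neighbour that arrives too late to be supported.
events = np.array([
    [5, 5, 0, 1],
    [6, 5, 100, 1],
    [50, 40, 200, 0],
    [5, 6, 20_000, 1],
])
mask = spatiotemporal_filter(events, width=64, height=48)
filtered = events[mask]
```

In this variant the first event of any burst is always discarded because nothing precedes it, which is a known trade-off of the approach. Isolated events, such as sensor noise or flicker from a single light source, never gain support and are removed.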

Implementation Details

The paper examines how well event-based models perform on the eTraM dataset. The authors trained three state-of-the-art architectures - RVT, RED, and YOLOv8 - on 7 hours of data, and evaluated them on 1.5 hours of validation and test data. Different learning rates were used for the models.

The paper discusses two input representations used in the experiments: Histogram of Events and Time Surfaces. Histogram of Events assigns each event to a cell based on its position and timestamp, and tallies the counts in each cell and time bin, with separate counts for each polarity. Time Surfaces record the timestamp of the most recent event for each pixel, with an exponential decay applied to diminish the influence of older events. The input representations are used as three-dimensional tensors, with the dimensions corresponding to the relevant parameters.
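The two representations described above can be sketched in NumPy as follows. This is an illustrative version under stated assumptions, not the paper's formulation: the channel layout (time bin and polarity folded into the first axis), the decay constant `tau`, and the function names are all choices made for this example.

```python
import numpy as np


def histogram_of_events(events, width, height, num_bins, t_start, t_end):
    """Count events per time bin, polarity, and pixel.

    events: (N, 4) array with columns x, y, t, p, where p is 0 or 1.
    Returns a tensor of shape (num_bins * 2, height, width).
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2].astype(float)
    p = events[:, 3].astype(int)
    # Map each timestamp to a bin index in [0, num_bins - 1].
    b = ((t - t_start) / (t_end - t_start) * num_bins).astype(int)
    b = np.clip(b, 0, num_bins - 1)
    hist = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    np.add.at(hist, (b, p, y, x), 1.0)  # unbuffered add handles repeats
    return hist.reshape(num_bins * 2, height, width)


def time_surface(events, width, height, t_ref, tau):
    """Exponentially decayed time of the latest event per polarity and pixel.

    Returns a tensor of shape (2, height, width); pixels without events are 0.
    """
    x = events[:, 0].astype(int)
    y = events[:, 1].astype(int)
    t = events[:, 2].astype(float)
    p = events[:, 3].astype(int)
    last_t = np.full((2, height, width), -np.inf)
    np.maximum.at(last_t, (p, y, x), t)  # keep the most recent timestamp
    return np.exp((last_t - t_ref) / tau)  # exp(-inf) evaluates to 0


# Hypothetical events on a 4x3 sensor, timestamps in arbitrary units.
events = np.array([
    [1, 1, 0, 1],
    [1, 1, 40, 1],
    [2, 0, 60, 0],
    [3, 2, 90, 1],
], dtype=float)
hist = histogram_of_events(events, 4, 3, num_bins=2, t_start=0, t_end=100)
surface = time_surface(events, 4, 3, t_ref=100, tau=50)
```

Both functions return three-dimensional tensors, consistent with the description above. The histogram keeps how many events fell in each cell, while the time surface keeps only how recently each pixel fired.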

The mathematical formulations for updating the Histogram of Events and Time Surfaces are provided in the paper.

Detection and Tracking Examples

This section presents detection results from the tensor-based methods, RVT and RED, on the eTraM dataset. The detections are then used for tracking with an IoU-based thresholding technique, which yields a Multi-Object Tracking Accuracy (MOTA) of 0.18 and a Multi-Object Tracking Precision (MOTP) of 0.28 on the eTraM test set. Precise evaluation of tracking performance is made possible by the object IDs included in the eTraM dataset. Figure 7 illustrates an example of ground-truth objects and their corresponding tracks.
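A minimal sketch of IoU-threshold tracking is given below: each detection in a new frame inherits the ID of the previous-frame box it overlaps most, provided the overlap exceeds a threshold. The greedy matching strategy, the 0.3 threshold, and the class name `IoUTracker` are assumptions for illustration. This summary does not describe the paper's exact association procedure.

```python
def iou(a, b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Greedy frame-to-frame association based on an IoU threshold."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}  # track id -> box from the previous frame
        self.next_id = 0

    def update(self, detections):
        """Return one track id per detection in the current frame."""
        # All (IoU, track id, detection index) candidates, best overlap first.
        pairs = sorted(
            ((iou(box, det), tid, j)
             for tid, box in self.tracks.items()
             for j, det in enumerate(detections)),
            reverse=True,
        )
        assigned, used_tracks = {}, set()
        for score, tid, j in pairs:
            if score < self.iou_threshold:
                break  # remaining candidates overlap even less
            if tid in used_tracks or j in assigned:
                continue
            assigned[j] = tid
            used_tracks.add(tid)

        ids, new_tracks = [], {}
        for j, det in enumerate(detections):
            tid = assigned.get(j)
            if tid is None:  # no sufficiently overlapping track: start a new one
                tid = self.next_id
                self.next_id += 1
            new_tracks[tid] = det
            ids.append(tid)
        self.tracks = new_tracks  # unmatched tracks are dropped
        return ids


# Hypothetical detections over three frames.
tracker = IoUTracker(iou_threshold=0.3)
ids_1 = tracker.update([(10, 10, 50, 50), (100, 100, 140, 150)])
ids_2 = tracker.update([(102, 103, 142, 152), (12, 11, 52, 51)])
ids_3 = tracker.update([(300, 300, 340, 340)])
```

Because this simple version drops a track as soon as it goes unmatched for one frame, missed detections cause ID switches. That is one plausible reason a tracker of this kind scores modestly on MOTA.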

Related Papers

eTraM: Event-based Traffic Monitoring Dataset

Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang

Event cameras, with their high temporal and dynamic range and minimal memory usage, have found applications in various fields. However, their potential in static traffic monitoring remains largely unexplored. To facilitate this exploration, we present eTraM - a first-of-its-kind, fully event-based traffic monitoring dataset. eTraM offers 10 hr of data from different traffic scenarios in various lighting and weather conditions, providing a comprehensive overview of real-world situations. Providing 2M bounding box annotations, it covers eight distinct classes of traffic participants, ranging from vehicles to pedestrians and micro-mobility. eTraM's utility has been assessed using state-of-the-art methods for traffic participant detection, including RVT, RED, and YOLOv8. We quantitatively evaluate the ability of event-based models to generalize on nighttime and unseen scenes. Our findings substantiate the compelling potential of leveraging event cameras for traffic monitoring, opening new avenues for research and application. eTraM is available at https://eventbasedvision.github.io/eTraM

4/3/2024

MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset

Zaid A. El Shair, Samir A. Rawashdeh

In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories. Additionally, MEVDT includes manually annotated ground truth labels, consisting of object classifications, pixel-precise bounding boxes, and unique object IDs, which are provided at a labeling frequency of 24 Hz. Designed to advance the research in the domain of event-based vision, MEVDT aims to address the critical need for high-quality, real-world annotated datasets that enable the development and evaluation of object detection and tracking algorithms in automotive environments.

7/31/2024

Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline

Xiao Wang, Ju Huang, Shiao Wang, Chuanming Tang, Bo Jiang, Yonghong Tian, Jin Tang, Bin Luo

Current event-/frame-event based trackers undergo evaluation on short-term tracking datasets, however, the tracking of real-world scenarios involves long-term tracking, and the performance of existing tracking algorithms in these scenarios remains unclear. In this paper, we first propose a new long-term and large-scale frame-event single object tracking dataset, termed FELT. It contains 742 videos and 1,594,474 RGB frames and event stream pairs and has become the largest frame-event tracking dataset to date. We re-train and evaluate 15 baseline trackers on our dataset for future works to compare. More importantly, we find that the RGB frames and event streams are naturally incomplete due to the influence of challenging factors and spatially sparse event flow. In response to this, we propose a novel associative memory Transformer network as a unified backbone by introducing modern Hopfield layers into multi-head self-attention blocks to fuse both RGB and event data. Extensive experiments on RGB-Event (FELT), RGB-Thermal (RGBT234, LasHeR), and RGB-Depth (DepthTrack) datasets fully validated the effectiveness of our model. The dataset and source code can be found at https://github.com/Event-AHU/FELT_SOT_Benchmark.

4/4/2024

ES-PTAM: Event-based Stereo Parallel Tracking and Mapping

Suman Ghosh, Valentina Cavinato, Guillermo Gallego

Visual Odometry (VO) and SLAM are fundamental components for spatial perception in mobile robots. Despite enormous progress in the field, current VO/SLAM systems are limited by their sensors' capability. Event cameras are novel visual sensors that offer advantages to overcome the limitations of standard cameras, enabling robots to expand their operating range to challenging scenarios, such as high-speed motion and high dynamic range illumination. We propose a novel event-based stereo VO system by combining two ideas: a correspondence-free mapping module that estimates depth by maximizing ray density fusion and a tracking module that estimates camera poses by maximizing edge-map alignment. We evaluate the system comprehensively on five real-world datasets, spanning a variety of camera types (manufacturers and spatial resolutions) and scenarios (driving, flying drone, hand-held, egocentric, etc). The quantitative and qualitative results demonstrate that our method outperforms the state of the art in majority of the test sequences by a margin, e.g., trajectory error reduction of 45% on RPG dataset, 61% on DSEC dataset, and 21% on TUM-VIE dataset. To benefit the community and foster research on event-based perception systems, we release the source code and results: https://github.com/tub-rip/ES-PTAM

8/29/2024