MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset

Read original: arXiv:2407.20446 - Published 7/31/2024 by Zaid A. El Shair, Samir A. Rawashdeh

Overview

This paper presents MEVDT, a new multi-modal dataset for event-based vision
The dataset provides synchronized event data and grayscale images of traffic scenes, captured with a DAVIS 240c hybrid event-based camera
The data can be used to develop and evaluate event-based object detection and tracking algorithms for automotive environments

Plain English Explanation

Event-based vision is a type of visual sensing that records changes in brightness at each pixel as they occur, instead of capturing full images at a fixed rate like traditional cameras. This allows for much higher temporal resolution and lower power consumption. The MEVDT dataset provides a collection of this type of event data, recorded in traffic scenes together with synchronized grayscale images from the same camera.

The dataset can be used to develop and test machine learning models that work with event data, either on its own or combined with conventional images. These models could enable applications like fast vehicle tracking, low-latency autonomous navigation, and energy-efficient visual sensing. By providing real-world, manually annotated event data, this dataset aims to accelerate progress in this emerging field of computer vision.

Technical Explanation

The MEVDT dataset comprises 63 multi-modal sequences with approximately 13k grayscale images, 5M events, 10k object labels, and 85 unique object tracking trajectories. The data was captured with the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c, a hybrid event-based camera that outputs a stream of events corresponding to local brightness changes over time, alongside synchronized grayscale images.
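To make the idea of an event stream concrete, here is a minimal sketch of one common way to visualize events: summing their polarities per pixel over a time slice. It assumes each event is an (x, y, timestamp, polarity) tuple; the `Event` type, the `accumulate_events` function, and the 240x180 default resolution are illustrative assumptions and do not describe the dataset's actual file format.

```python
from typing import Iterable, List, NamedTuple


class Event(NamedTuple):
    """Hypothetical event record; field names are assumptions."""
    x: int         # pixel column
    y: int         # pixel row
    t_us: int      # timestamp in microseconds
    polarity: int  # +1 for a brightness increase, -1 for a decrease


def accumulate_events(events: Iterable[Event],
                      width: int = 240,
                      height: int = 180) -> List[List[int]]:
    """Sum event polarities per pixel into a height x width grid."""
    frame = [[0] * width for _ in range(height)]
    for ev in events:
        # Skip events that fall outside the assumed sensor array.
        if 0 <= ev.x < width and 0 <= ev.y < height:
            frame[ev.y][ev.x] += ev.polarity
    return frame


if __name__ == "__main__":
    sample = [Event(1, 2, 10, 1), Event(1, 2, 20, 1), Event(3, 0, 30, -1)]
    grid = accumulate_events(sample, width=4, height=3)
    for row in grid:
        print(row)
```

Pixels with no brightness change stay at zero, which is why such accumulated frames tend to show only moving objects such as vehicles.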

The dataset covers traffic scenes. Its ground truth labels were annotated manually and consist of object classifications, pixel-precise bounding boxes, and unique object IDs, provided at a labeling frequency of 24 Hz. Researchers can use this data to train and evaluate object detection and tracking models on event data, on grayscale images, or on both modalities together.
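The paper's abstract gives a labeling frequency of 24 Hz, which means one set of labels roughly every 41.7 ms. A detector trained on events typically needs the events that belong to each labeled instant. The sketch below shows one way to pair them, assuming sorted event timestamps in microseconds and a window that ends at each label time; the function name, the timestamp unit, and the windowing choice are assumptions, not the authors' method.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple

LABEL_RATE_HZ = 24
# Window length matching one labeling period, in microseconds.
WINDOW_US = round(1_000_000 / LABEL_RATE_HZ)


def events_per_label(event_times_us: Sequence[int],
                     label_times_us: Sequence[int],
                     window_us: int = WINDOW_US) -> List[Tuple[int, int]]:
    """For each label time t, return (start, stop) indices of the events
    whose timestamps fall in the half-open interval (t - window_us, t].

    event_times_us must be sorted in ascending order.
    """
    slices = []
    for t in label_times_us:
        start = bisect_right(event_times_us, t - window_us)
        stop = bisect_right(event_times_us, t)
        slices.append((start, stop))
    return slices


if __name__ == "__main__":
    event_times = [0, 10_000, 41_667, 50_000, 83_334]
    label_times = [41_667, 83_334]
    for t, (lo, hi) in zip(label_times, events_per_label(event_times, label_times)):
        print(t, event_times[lo:hi])
```

Binary search keeps the lookup cheap for long recordings, and the half-open interval ensures that consecutive windows never share an event.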

Critical Analysis

The MEVDT dataset provides a valuable resource for developing and evaluating event-based vision models. However, it focuses on vehicles in traffic scenes, and with 63 sequences and 85 object trajectories it may not capture the full diversity of event-based scenes. Additionally, the annotations, while produced manually, may not perfectly match the subjective interpretations of different researchers.

Some key areas for further research include extending the dataset to other viewpoints and scenarios, improving the consistency and granularity of annotations, and exploring semi-supervised or self-supervised learning techniques that can leverage the unique properties of event-based data.

Conclusion

The MEVDT dataset is a useful step forward for the field of event-based vision. By providing a manually annotated collection of real-world event data and synchronized grayscale images, it enables researchers to develop and evaluate object detection and tracking models for automotive environments. As event-based vision technology continues to mature, datasets like MEVDT will play an important role in unlocking its potential for tasks like high-speed object tracking, low-power autonomy, and efficient visual perception.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset

Zaid A. El Shair, Samir A. Rawashdeh

In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories. Additionally, MEVDT includes manually annotated ground truth labels, consisting of object classifications, pixel-precise bounding boxes, and unique object IDs, which are provided at a labeling frequency of 24 Hz. Designed to advance the research in the domain of event-based vision, MEVDT aims to address the critical need for high-quality, real-world annotated datasets that enable the development and evaluation of object detection and tracking algorithms in automotive environments.

7/31/2024

SEVD: Synthetic Event-based Vision Dataset for Ego and Fixed Traffic Perception

Manideep Reddy Aliminati, Bharatesh Chakravarthi, Aayush Atul Verma, Arpitsinh Vaghela, Hua Wei, Xuesong Zhou, Yezhou Yang

Recently, event-based vision sensors have gained attention for autonomous driving applications, as conventional RGB cameras face limitations in handling challenging dynamic conditions. However, the availability of real-world and synthetic event-based vision datasets remains limited. In response to this gap, we present SEVD, a first-of-its-kind multi-view ego, and fixed perception synthetic event-based dataset using multiple dynamic vision sensors within the CARLA simulator. Data sequences are recorded across diverse lighting (noon, nighttime, twilight) and weather conditions (clear, cloudy, wet, rainy, foggy) with domain shifts (discrete and continuous). SEVD spans urban, suburban, rural, and highway scenes featuring various classes of objects (car, truck, van, bicycle, motorcycle, and pedestrian). Alongside event data, SEVD includes RGB imagery, depth maps, optical flow, semantic, and instance segmentation, facilitating a comprehensive understanding of the scene. Furthermore, we evaluate the dataset using state-of-the-art event-based (RED, RVT) and frame-based (YOLOv8) methods for traffic participant detection tasks and provide baseline benchmarks for assessment. Additionally, we conduct experiments to assess the synthetic event-based dataset's generalization capabilities. The dataset is available at https://eventbasedvision.github.io/SEVD

4/24/2024

DeepSense-V2V: A Vehicle-to-Vehicle Multi-Modal Sensing, Localization, and Communications Dataset

Joao Morais, Gouranga Charan, Nikhil Srinivas, Ahmed Alkhateeb

High data rate and low-latency vehicle-to-vehicle (V2V) communication are essential for future intelligent transport systems to enable coordination, enhance safety, and support distributed computing and intelligence requirements. Developing effective communication strategies, however, demands realistic test scenarios and datasets. This is important at the high-frequency bands where more spectrum is available, yet harvesting this bandwidth is challenged by the need for direction transmission and the sensitivity of signal propagation to blockages. This work presents the first large-scale multi-modal dataset for studying mmWave vehicle-to-vehicle communications. It presents a two-vehicle testbed that comprises data from a 360-degree camera, four radars, four 60 GHz phased arrays, a 3D lidar, and two precise GPSs. The dataset contains vehicles driving during the day and night for 120 km in intercity and rural settings, with speeds up to 100 km per hour. More than one million objects were detected across all images, from trucks to bicycles. This work further includes detailed dataset statistics that prove the coverage of various situations and highlights how this dataset can enable novel machine-learning applications.

6/27/2024

eTraM: Event-based Traffic Monitoring Dataset

Aayush Atul Verma, Bharatesh Chakravarthi, Arpitsinh Vaghela, Hua Wei, Yezhou Yang

Event cameras, with their high temporal and dynamic range and minimal memory usage, have found applications in various fields. However, their potential in static traffic monitoring remains largely unexplored. To facilitate this exploration, we present eTraM - a first-of-its-kind, fully event-based traffic monitoring dataset. eTraM offers 10 hr of data from different traffic scenarios in various lighting and weather conditions, providing a comprehensive overview of real-world situations. Providing 2M bounding box annotations, it covers eight distinct classes of traffic participants, ranging from vehicles to pedestrians and micro-mobility. eTraM's utility has been assessed using state-of-the-art methods for traffic participant detection, including RVT, RED, and YOLOv8. We quantitatively evaluate the ability of event-based models to generalize on nighttime and unseen scenes. Our findings substantiate the compelling potential of leveraging event cameras for traffic monitoring, opening new avenues for research and application. eTraM is available at https://eventbasedvision.github.io/eTraM

4/3/2024