DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

Read original: arXiv:2407.05106 - Published 7/16/2024 by Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang

Overview

This paper introduces DailyDVS-200, a large-scale benchmark dataset for event-based action recognition using neuromorphic sensors.
The dataset covers 200 action categories captured in real-world scenarios, with a focus on daily life activities.
It provides a comprehensive evaluation of existing event-based action recognition methods and establishes a new state-of-the-art performance.

Plain English Explanation

The paper presents a new dataset called DailyDVS-200, which is designed to advance the field of event-based action recognition. Event-based sensors, also known as neuromorphic sensors, work differently from traditional cameras. Instead of capturing images at a fixed rate, they only record changes in brightness, which results in a highly efficient and low-power data representation.
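To make the "only record changes in brightness" idea concrete, here is a minimal sketch of the idealized event-generation model for a single pixel. It is an illustration, not the sensor or pipeline used for DailyDVS-200: the function name `events_from_pixel` and the `threshold` value are assumptions introduced for this example.

```python
import math


def events_from_pixel(samples, threshold=0.5):
    """Emit (timestamp, polarity) events for one pixel.

    samples:   list of (timestamp, intensity) pairs, intensity > 0.
    threshold: hypothetical contrast threshold on log intensity.
    """
    events = []
    _, first_intensity = samples[0]
    reference = math.log(first_intensity)
    for timestamp, intensity in samples[1:]:
        delta = math.log(intensity) - reference
        # One event per threshold crossing; the reference level then
        # moves by one threshold step in the direction of the change.
        while abs(delta) >= threshold:
            polarity = 1 if delta > 0 else -1
            events.append((timestamp, polarity))
            reference += polarity * threshold
            delta = math.log(intensity) - reference
    return events


# A static pixel produces no data; a brightening pixel produces an ON event.
static = events_from_pixel([(0, 1.0), (1, 1.0), (2, 1.0)])
brighter = events_from_pixel([(0, 1.0), (1, 2.0)])
darker = events_from_pixel([(0, 2.0), (1, 1.0)])
```

The static pixel yielding an empty list is the point of the design: unchanged parts of a scene cost nothing to record.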

The DailyDVS-200 dataset contains 200 types of human action, such as walking, cooking, and playing sports, recorded in real-world settings. This is a significant improvement over previous datasets, which tended to be smaller and focused on a more limited set of activities. By providing a diverse and challenging benchmark, the authors aim to drive progress in event-based action recognition, which has applications in areas like robotics, surveillance, and assistive technology.

The paper also includes a comprehensive evaluation of existing event-based action recognition methods using the new dataset. The results show that the DailyDVS-200 dataset presents a significant challenge for current algorithms, and the authors establish a new state-of-the-art performance, paving the way for further advancements in this field.

Technical Explanation

The authors of this paper introduce DailyDVS-200, a large-scale benchmark dataset for event-based action recognition. The dataset was captured using a neuromorphic sensor, which records changes in brightness rather than traditional frames. This results in a compact and efficient data representation that is well-suited for real-time and low-power applications.
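As a rough illustration of what an event stream looks like to downstream code, the sketch below stores each event as a (time, x, y, polarity) tuple and sums polarities over a time window into a 2D frame. The `Event` layout follows the common convention for event cameras; it is an assumption here, not the confirmed file format of DailyDVS-200, and `accumulate` is a hypothetical helper.

```python
from collections import namedtuple

# Common event-camera convention: timestamp, pixel column, pixel row,
# polarity (+1 brighter, -1 darker). Assumed, not taken from the dataset.
Event = namedtuple("Event", ["t", "x", "y", "p"])


def accumulate(events, width, height, t_start, t_end):
    """Sum event polarities per pixel for events with t_start <= t < t_end."""
    frame = [[0] * width for _ in range(height)]
    for ev in events:
        if t_start <= ev.t < t_end:
            frame[ev.y][ev.x] += ev.p
    return frame


stream = [
    Event(t=0, x=0, y=0, p=1),
    Event(t=5, x=1, y=0, p=-1),
    Event(t=6, x=1, y=0, p=-1),
    Event(t=20, x=0, y=1, p=1),  # outside the window below
]
frame = accumulate(stream, width=2, height=2, t_start=0, t_end=10)
```

Accumulated frames like this are one simple way to feed events to frame-based models; other representations keep more of the timing information.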

The dataset contains 200 action categories covering a wide range of daily life activities, such as walking, cooking, and playing sports. The actions were recorded in natural environments, providing a more realistic and challenging setting than previous datasets. The authors argue that this diversity and complexity are crucial for advancing the state of the art in event-based action recognition.

To establish a benchmark for the dataset, the authors evaluate several existing event-based action recognition methods, including EV-Flow, HATS, and DVSAC. The results show that the DailyDVS-200 dataset presents a significant challenge, with the best-performing method achieving an accuracy of only 50%. The authors then propose a new state-of-the-art model that outperforms the existing approaches, paving the way for further advancements in this field.
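For readers unfamiliar with how such accuracy figures are computed, the following is a minimal sketch of top-k classification accuracy. The summary does not say which metric the paper reports, so treating it as top-k accuracy is an assumption; `top_k_accuracy` and the toy scores are hypothetical and unrelated to the paper's results.

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest scores.

    scores: list of per-class score lists, one list per sample.
    labels: list of true class indices, one per sample.
    """
    correct = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(labels)


# Toy example with 3 samples and 3 classes.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 2]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

With 200 classes, random guessing would give a top-1 accuracy of 1/200, which is the baseline any reported figure should be read against.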

Critical Analysis

The DailyDVS-200 dataset represents a significant contribution to the field of event-based action recognition. By providing a diverse and challenging benchmark, the authors have highlighted the limitations of existing algorithms and encouraged the development of more robust and adaptive models.

However, the paper does not address several potential limitations of the dataset. For example, the authors do not provide detailed information about the environmental conditions, lighting, and camera viewpoints used during the data collection process. This information could be crucial for understanding the dataset's biases and the generalization capabilities of the trained models.

Additionally, the authors could have explored the potential trade-offs between dataset complexity and model performance. It would be valuable to understand how the models' accuracy and efficiency scale as the dataset size and diversity increase, as this could inform the design of future event-based action recognition datasets and algorithms.

Conclusion

The DailyDVS-200 dataset represents a significant advancement in the field of event-based action recognition. By providing a large-scale, diverse, and challenging benchmark, the authors have laid the groundwork for the development of more robust and adaptive event-based action recognition models. The comprehensive evaluation of existing methods and the establishment of a new state-of-the-art performance demonstrate the dataset's potential to drive progress in this emerging field, with important applications in areas such as robotics, surveillance, and assistive technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang

Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain of event-based action recognition, recognizing their vast potential for advancement. However, the development in this field is currently slowed by the lack of comprehensive, large-scale datasets, which are critical for developing robust recognition frameworks. To bridge this gap, we introduce DailyDVS-200, a meticulously curated benchmark dataset tailored for the event-based action recognition community. DailyDVS-200 is extensive, covering 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences. This dataset is designed to reflect a broad spectrum of action types, scene complexities, and data acquisition diversity. Each sequence in the dataset is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions. Moreover, DailyDVS-200 is structured to facilitate a wide range of research paths, offering a solid foundation for both validating existing approaches and inspiring novel methodologies. By setting a new benchmark in the field, we challenge the current limitations of neuromorphic data processing and invite a surge of new approaches in event-based action recognition techniques, which paves the way for future explorations in neuromorphic computing and beyond. The dataset and source code are available at https://github.com/QiWang233/DailyDVS-200.

7/16/2024

V2CE: Video to Continuous Events Simulator

Zhongyang Zhang, Shuyang Cui, Kaidong Chai, Haowen Yu, Subhasis Dasgupta, Upal Mahbub, Tauhidur Rahman

Dynamic Vision Sensor (DVS)-based solutions have recently garnered significant interest across various computer vision tasks, offering notable benefits in terms of dynamic range, temporal resolution, and inference speed. However, as a relatively nascent vision sensor compared to Active Pixel Sensor (APS) devices such as RGB cameras, DVS suffers from a dearth of ample labeled datasets. Prior efforts to convert APS data into events often grapple with issues such as a considerable domain shift from real events, the absence of quantified validation, and layering problems within the time axis. In this paper, we present a novel method for video-to-events stream conversion from multiple perspectives, considering the specific characteristics of DVS. A series of carefully designed losses helps enhance the quality of generated event voxels significantly. We also propose a novel local dynamic-aware timestamp inference strategy to accurately recover event timestamps from event voxels in a continuous fashion and eliminate the temporal layering problem. Results from rigorous validation through quantified metrics at all stages of the pipeline establish our method unquestionably as the current state-of-the-art (SOTA).

4/30/2024

Event Stream based Human Action Recognition: A High-Definition Benchmark Dataset and Algorithms

Xiao Wang, Shiao Wang, Pengpeng Shao, Bo Jiang, Lin Zhu, Yonghong Tian

Human Action Recognition (HAR) stands as a pivotal research domain in both computer vision and artificial intelligence, with RGB cameras dominating as the preferred tool for investigation and innovation in this field. However, in real-world applications, RGB cameras encounter numerous challenges, including light conditions, fast motion, and privacy concerns. Consequently, bio-inspired event cameras have garnered increasing attention due to their advantages of low energy consumption, high dynamic range, etc. Nevertheless, most existing event-based HAR datasets are low resolution ($346 \times 260$). In this paper, we propose a large-scale, high-definition ($1280 \times 800$) human action recognition dataset based on the CeleX-V event camera, termed CeleX-HAR. It encompasses 150 commonly occurring action categories, comprising a total of 124,625 video sequences. Various factors such as multi-view, illumination, action speed, and occlusion are considered when recording these data. To build a more comprehensive benchmark dataset, we report over 20 mainstream HAR models for future works to compare. In addition, we also propose a novel Mamba vision backbone network for event stream based HAR, termed EVMamba, which equips the spatial plane multi-directional scanning and novel voxel temporal scanning mechanism. By encoding and mining the spatio-temporal information of event streams, our EVMamba has achieved favorable results across multiple datasets. Both the dataset and source code will be released on https://github.com/Event-AHU/CeleX-HAR

8/20/2024

Deep Learning for Event-based Vision: A Comprehensive Survey and Benchmarks

Xu Zheng, Yexin Liu, Yunfan Lu, Tongyan Hua, Tianbo Pan, Weiming Zhang, Dacheng Tao, Lin Wang

Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously and produce event streams encoding the time, pixel position, and polarity (sign) of the intensity changes. Event cameras possess a myriad of advantages over canonical frame-based cameras, such as high temporal resolution, high dynamic range, low latency, etc. Being capable of capturing information in challenging visual conditions, event cameras have the potential to overcome the limitations of frame-based cameras in the computer vision and robotics community. In very recent years, deep learning (DL) has been brought to this emerging field and inspired active research endeavors in mining its potential. However, there is still a lack of taxonomies in DL techniques for event-based vision. We first scrutinize the typical event representations with quality enhancement methods as they play a pivotal role as inputs to the DL models. We then provide a comprehensive survey of existing DL-based methods by structurally grouping them into two major categories: 1) image/video reconstruction and restoration; 2) event-based scene understanding and 3D vision. We conduct benchmark experiments for the existing methods in some representative research directions, i.e., image reconstruction, deblurring, and object recognition, to identify some critical insights and problems. Finally, we have discussions regarding the challenges and provide new perspectives for inspiring more research studies.

4/12/2024