Event-based dataset for the detection and classification of manufacturing assembly tasks

Read original: arXiv:2405.14626 - Published 5/24/2024 by Laura Duarte, Pedro Neto

Overview

The Event-based Dataset of Assembly Tasks (EDAT24) showcases manufacturing primitive tasks like idle, pick, place, and screw, which are basic actions performed by human operators in manufacturing assembly.
The data was captured using a DAVIS240C event camera, an asynchronous vision sensor that records changes in light intensity.
The dataset includes 400 samples of event data and grayscale frames for the 4 manufacturing primitives, with 100 samples per primitive.
The user interacts with objects from the open-source CT-Benchmark in front of the static DAVIS event camera.
The data is provided in raw (.aedat) and pre-processed (.npy) formats, along with custom Python code to extend the dataset.

Plain English Explanation

The EDAT24 dataset showcases some basic actions that human workers perform during manufacturing assembly, like picking up parts, placing them, and screwing them together. The researchers used a special type of camera called an "event camera" to capture this data. Unlike a regular camera that takes full pictures at set intervals, an event camera records only the pixels where brightness changes, for example when something moves. Because static parts of the scene produce no data, the output is lightweight and well suited to real-time analysis of human motion.
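To make the "only records changes" idea concrete, here is a minimal Python sketch that derives events from two grayscale frames. It is an illustration, not the DAVIS240C's actual circuitry or the authors' code: the function name `events_from_frames`, the threshold of 0.2, and the synthetic frames are all assumptions chosen for the example. The only hardware fact used is the 240x180 pixel resolution of the DAVIS240C.

```python
import numpy as np

def events_from_frames(prev, curr, threshold=0.2, eps=1e-3):
    """Return an (N, 3) array of (x, y, polarity) for pixels whose
    log-intensity change between two frames reaches the threshold.
    The threshold value is an illustrative assumption."""
    # Event sensors respond to relative (log) brightness change, not absolute level.
    diff = np.log(curr.astype(np.float64) + eps) - np.log(prev.astype(np.float64) + eps)
    ys, xs = np.nonzero(np.abs(diff) >= threshold)
    # +1 for a brightness increase, -1 for a decrease.
    polarity = np.where(diff[ys, xs] > 0, 1, -1)
    return np.stack([xs, ys, polarity], axis=1)

# Synthetic static scene at DAVIS240C resolution (240 wide, 180 high).
prev = np.full((180, 240), 100, dtype=np.uint8)
curr = prev.copy()
curr[50:60, 100:110] = 200  # a 10x10 patch gets brighter
curr[0:5, 0:5] = 40         # a 5x5 patch gets darker

events = events_from_frames(prev, curr)
print(len(events), "events out of", prev.size, "pixels")  # 125 events out of 43200 pixels
```

Only the 125 changed pixels produce output, while the remaining pixels of the static background produce nothing. A real sensor emits events asynchronously with microsecond timestamps rather than by comparing frames, so this is a simplification.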

The dataset includes 400 samples of this event data, with 100 samples for each of the 4 manufacturing primitives (idle, pick, place, and screw). The user was recorded interacting with objects from the open-source CT-Benchmark while the event camera stayed in a fixed position. The researchers provide the raw event data as well as pre-processed versions to make it easier for other researchers to work with. They also include custom Python code that allows people to add new types of manufacturing tasks or collect more samples to expand the dataset.

Technical Explanation

The EDAT24 dataset contains recordings of four basic manufacturing tasks - idle, pick, place, and screw - performed by a human operator in front of a DAVIS240C event camera. Event cameras are asynchronous vision sensors that only register changes in light intensity, recording "events" rather than full frames. This lightweight data format is well-suited for real-time detection and analysis of human motion.
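A common way to feed an event stream to frame-based models is to accumulate events over a time window into a 2D image. The sketch below shows that idea under stated assumptions: the field names `t`, `x`, `y`, `p`, the microsecond timestamps, and the helper `accumulate` are hypothetical and are not taken from the EDAT24 file format or its accompanying code.

```python
import numpy as np

WIDTH, HEIGHT = 240, 180  # DAVIS240C sensor resolution

# Hypothetical event layout: timestamp, pixel coordinates, polarity (+1 / -1).
event_dtype = np.dtype([("t", np.int64), ("x", np.int16), ("y", np.int16), ("p", np.int8)])

def accumulate(events, t_start, t_end):
    """Sum event polarities per pixel for events with t_start <= t < t_end."""
    frame = np.zeros((HEIGHT, WIDTH), dtype=np.int32)
    sel = events[(events["t"] >= t_start) & (events["t"] < t_end)]
    # np.add.at performs an unbuffered add, so repeated pixels are counted every time.
    np.add.at(frame, (sel["y"], sel["x"]), sel["p"])
    return frame

events = np.array(
    [
        (1000, 10, 20, 1),
        (1500, 10, 20, 1),   # same pixel fires again inside the window
        (2500, 11, 20, -1),
        (9000, 10, 20, 1),   # outside the window below
    ],
    dtype=event_dtype,
)

frame = accumulate(events, 0, 5000)
print(frame[20, 10], frame[20, 11])  # 2 -1
```

The window length trades temporal resolution against how much motion each frame captures; shorter windows keep more of the sensor's timing detail but yield sparser frames.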

Each of the 400 samples in the dataset (100 per primitive task) includes both the raw event data (.aedat format) and pre-processed versions (.npy format) for easier use. The user interacts with objects from the open-source CT-Benchmark while the static event camera captures their movements. The researchers also provide custom Python code to help others extend the dataset by adding new manufacturing primitives or collecting additional samples.
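Files in .npy format can be read with NumPy's `np.load`. The sketch below indexes samples and derives a class label for each one. The directory layout (`<root>/<primitive>/<name>.npy`), the file names, and the array shape are hypothetical, since this summary does not describe how the published files are organized; the example builds a small stand-in dataset in a temporary folder so it runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np

PRIMITIVES = ["idle", "pick", "place", "screw"]

def index_samples(root):
    """Return (path, label_index) pairs for an assumed <root>/<primitive>/*.npy layout."""
    pairs = []
    for label, name in enumerate(PRIMITIVES):
        for path in sorted((Path(root) / name).glob("*.npy")):
            pairs.append((path, label))
    return pairs

# Stand-in data: 3 random arrays per primitive (the real files will differ).
root = Path(tempfile.mkdtemp())
rng = np.random.default_rng(0)
for name in PRIMITIVES:
    (root / name).mkdir()
    for i in range(3):
        np.save(root / name / f"{name}_{i}.npy", rng.integers(0, 2, size=(5, 4)))

samples = index_samples(root)
first_array = np.load(samples[0][0])
print(len(samples), first_array.shape)  # 12 (5, 4)
```

When working with the real download, inspect one file's `shape` and `dtype` first and adapt `index_samples` to the actual folder structure.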

EDAT24 adds to a growing set of event-based vision resources, such as the SEVD dataset and the v2e video-to-events toolbox, and aims to advance research in real-time human motion analysis for manufacturing applications.

Critical Analysis

The EDAT24 dataset provides a valuable resource for researchers working on real-time human motion analysis, particularly in the context of manufacturing assembly tasks. The use of an event camera to capture the data is a key strength, as it results in a lightweight, high-temporal-resolution dataset well-suited for the intended applications.

However, the dataset is relatively small, with only 100 samples per primitive task. While the researchers provide the ability to extend the dataset, it would be beneficial to see a larger-scale version with more diverse samples to improve the robustness and generalizability of any models developed using this data.
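With only 100 samples per primitive, one way to keep evaluation fair is a stratified split that holds out the same number of samples from every class. The sketch below is a generic NumPy illustration, not the authors' evaluation protocol; the helper `stratified_split`, the 80/20 ratio, and the seed are assumptions.

```python
import numpy as np

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test, keeping class proportions equal."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    train_idx, test_idx = [], []
    for cls in np.unique(labels):
        # Shuffle the indices of this class, then cut off the test share.
        idx = rng.permutation(np.flatnonzero(labels == cls))
        n_test = int(round(len(idx) * test_fraction))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return np.sort(train_idx), np.sort(test_idx)

# 4 primitives x 100 samples, matching the dataset's stated composition.
labels = np.repeat(np.arange(4), 100)
train, test = stratified_split(labels)
print(len(train), len(test), np.bincount(labels[test]))  # 320 80 [20 20 20 20]
```

Repeating the split with several seeds and reporting the spread of results gives a rough sense of how sensitive a model is to the small sample count.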

Additionally, the dataset is focused solely on the four specified manufacturing primitives. Expanding the scope to include a wider range of assembly tasks or even non-manufacturing human activities could further broaden the usefulness and applicability of the dataset.

Conclusion

The EDAT24 dataset offers a unique and valuable resource for researchers working on real-time human motion analysis, particularly in the context of manufacturing assembly tasks. By using an event camera to capture the data, the researchers have created a lightweight, high-temporal-resolution dataset well-suited for the intended applications.

While the current dataset is relatively small in scope, the ability to extend it with new primitives or additional samples is a positive feature that could lead to further advancements in this field. As event-based vision continues to evolve, datasets like EDAT24 will play an increasingly important role in driving research and development of robust, real-time systems for human-centric applications in manufacturing and beyond.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Event-based dataset for the detection and classification of manufacturing assembly tasks

Laura Duarte, Pedro Neto

The featured dataset, the Event-based Dataset of Assembly Tasks (EDAT24), showcases a selection of manufacturing primitive tasks (idle, pick, place, and screw), which are basic actions performed by human operators in any manufacturing assembly. The data were captured using a DAVIS240C event camera, an asynchronous vision sensor that registers events when changes in light intensity value occur. Events are a lightweight data format for conveying visual information and are well-suited for real-time detection and analysis of human motion. Each manufacturing primitive has 100 recorded samples of DAVIS240C data, including events and greyscale frames, for a total of 400 samples. In the dataset, the user interacts with objects from the open-source CT-Benchmark in front of the static DAVIS event camera. All data are made available in raw form (.aedat) and in pre-processed form (.npy). Custom-built Python code is made available together with the dataset to aid researchers to add new manufacturing primitives or extend the dataset with more samples.

5/24/2024

DailyDVS-200: A Comprehensive Benchmark Dataset for Event-Based Action Recognition

Qi Wang, Zhou Xu, Yuming Lin, Jingtao Ye, Hongsheng Li, Guangming Zhu, Syed Afaq Ali Shah, Mohammed Bennamoun, Liang Zhang

Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional frame-based cameras. The distinctive capabilities of event cameras have ignited significant interest in the domain of event-based action recognition, recognizing their vast potential for advancement. However, the development in this field is currently slowed by the lack of comprehensive, large-scale datasets, which are critical for developing robust recognition frameworks. To bridge this gap, we introduce DailyDVS-200, a meticulously curated benchmark dataset tailored for the event-based action recognition community. DailyDVS-200 is extensive, covering 200 action categories across real-world scenarios, recorded by 47 participants, and comprises more than 22,000 event sequences. This dataset is designed to reflect a broad spectrum of action types, scene complexities, and data acquisition diversity. Each sequence in the dataset is annotated with 14 attributes, ensuring a detailed characterization of the recorded actions. Moreover, DailyDVS-200 is structured to facilitate a wide range of research paths, offering a solid foundation for both validating existing approaches and inspiring novel methodologies. By setting a new benchmark in the field, we challenge the current limitations of neuromorphic data processing and invite a surge of new approaches in event-based action recognition techniques, which paves the way for future explorations in neuromorphic computing and beyond. The dataset and source code are available at https://github.com/QiWang233/DailyDVS-200.

7/16/2024

MEVDT: Multi-Modal Event-Based Vehicle Detection and Tracking Dataset

Zaid A. El Shair, Samir A. Rawashdeh

In this data article, we introduce the Multi-Modal Event-based Vehicle Detection and Tracking (MEVDT) dataset. This dataset provides a synchronized stream of event data and grayscale images of traffic scenes, captured using the Dynamic and Active-Pixel Vision Sensor (DAVIS) 240c hybrid event-based camera. MEVDT comprises 63 multi-modal sequences with approximately 13k images, 5M events, 10k object labels, and 85 unique object tracking trajectories. Additionally, MEVDT includes manually annotated ground truth labels (consisting of object classifications, pixel-precise bounding boxes, and unique object IDs), which are provided at a labeling frequency of 24 Hz. Designed to advance the research in the domain of event-based vision, MEVDT aims to address the critical need for high-quality, real-world annotated datasets that enable the development and evaluation of object detection and tracking algorithms in automotive environments.

7/31/2024

Event Data Association via Robust Model Fitting for Event-based Object Tracking

Haosheng Chen, Shuyuan Lin, Yan Yan, Hanzi Wang, Xinbo Gao

Event-based approaches, which are based on bio-inspired asynchronous event cameras, have achieved promising performance on various computer vision tasks. However, the study of the fundamental event data association problem is still in its infancy. In this paper, we propose a novel Event Data Association (called EDA) approach to explicitly address the event association and fusion problem. The proposed EDA seeks event trajectories that best fit the event data, in order to perform unifying data association and information fusion. In EDA, we first asynchronously fuse the event data based on its information entropy. Then, we introduce a deterministic model hypothesis generation strategy, which effectively generates model hypotheses from the fused events, to represent the corresponding event trajectories. After that, we present a two-stage weighting algorithm, which robustly weighs and selects true models from the generated model hypotheses, through multi-structural geometric model fitting. Meanwhile, we also propose an adaptive model selection strategy to automatically determine the number of the true models. Finally, we use the selected true models to associate and fuse the event data, without being affected by sensor noise and irrelevant structures. We evaluate the performance of the proposed EDA on the object tracking task. The experimental results show the effectiveness of EDA under challenging scenarios, such as high speed, motion blur, and high dynamic range conditions.

4/10/2024