RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio

Read original: arXiv:2408.16703 - Published 8/30/2024 by Kian Behzad, Rojin Zandi, Elaheh Motamedi, Hojjat Salehinejad, Milad Siami

Overview

Presents RoboMNIST, a multimodal dataset for multi-robot activity recognition (MRAR)
Combines WiFi channel state information (CSI), video, and audio recorded while two robotic arms perform activities
Intended to support research on multi-robot activity recognition and multimodal robotic perception

Plain English Explanation

The RoboMNIST dataset is a collection of data that can be used to train AI systems to recognize the activities of multiple robots. It includes information from different sensors, like WiFi, video cameras, and microphones, to capture a comprehensive view of the robots' actions.

The key idea is that by combining these diverse data sources, researchers can develop more accurate and robust models for recognizing the activities of multiple robots at the same time. This could be useful for applications like coordinating the actions of robot teams or enhancing the capabilities of individual robots.

The dataset can also serve as a benchmark for researchers to test and compare different multi-modal perception approaches for robotics applications.

Technical Explanation

The RoboMNIST dataset was created by researchers at Northeastern University. According to the paper's abstract, it consists of data collected from two Franka Emika robotic arms performing a variety of activities, recorded simultaneously by three cameras, three WiFi sniffers, and three microphones.

The data includes:

WiFi channel state information (CSI) from three sniffers, which captures how the robots' movements perturb the wireless signal
Video footage from three cameras to observe the robots' physical actions
Audio recordings from three microphones to detect sounds associated with the robots' activities

By combining these modalities, the researchers aimed to create a comprehensive dataset that can support the development of advanced multi-robot activity recognition algorithms.
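One simple way to combine the three modalities is late fusion: each modality gets its own classifier, and the per-class probabilities are averaged. The sketch below is only an illustration of that idea, not the method used in the paper. The activity labels, probability values, and weights are hypothetical placeholders.

```python
from typing import Dict, List

# Hypothetical activity labels; the summary does not list the real classes.
ACTIVITIES = ["activity_a", "activity_b", "activity_c"]


def late_fusion(probs_by_modality: Dict[str, List[float]],
                weights: Dict[str, float]) -> List[float]:
    """Return the weighted average of per-modality class probabilities."""
    total = sum(weights[m] for m in probs_by_modality)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(next(iter(probs_by_modality.values())))
    fused = [0.0] * n_classes
    for modality, probs in probs_by_modality.items():
        if len(probs) != n_classes:
            raise ValueError("all modalities must score the same classes")
        for i, p in enumerate(probs):
            fused[i] += weights[modality] * p / total
    return fused


def predict(fused: List[float]) -> str:
    """Pick the activity with the highest fused probability."""
    return ACTIVITIES[max(range(len(fused)), key=fused.__getitem__)]


# Made-up classifier outputs for one recording (assumption, not real data).
example = {
    "csi": [0.5, 0.3, 0.2],
    "video": [0.2, 0.6, 0.2],
    "audio": [0.3, 0.4, 0.3],
}
equal = {"csi": 1.0, "video": 1.0, "audio": 1.0}
fused = late_fusion(example, equal)
print(predict(fused))
```

In this toy case the CSI classifier alone would pick the first class, but the fused scores favor the second, which is the kind of disagreement a multimodal dataset lets researchers study.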

The dataset is designed to be challenging, with varying lighting conditions, camera perspectives, and robot configurations. This is intended to encourage the development of robust and adaptable activity recognition models that can handle the complexities of real-world multi-robot scenarios.
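A common way to test this kind of robustness is to hold out one condition entirely (for example, one camera perspective) during training and evaluate only on it. The sketch below shows such a leave-one-group-out split. The metadata fields (`id`, `camera`, `label`) and their values are hypothetical and do not reflect the dataset's actual file layout.

```python
from typing import Dict, Iterator, List, Tuple

Recording = Dict[str, str]


def leave_one_group_out(
    recordings: List[Recording], key: str
) -> Iterator[Tuple[str, List[Recording], List[Recording]]]:
    """Yield (held_out_value, train, test) for each distinct value of `key`."""
    groups = sorted({r[key] for r in recordings})
    for held_out in groups:
        train = [r for r in recordings if r[key] != held_out]
        test = [r for r in recordings if r[key] == held_out]
        yield held_out, train, test


# Hypothetical metadata; field names and values are illustrative only.
recordings = [
    {"id": "r1", "camera": "cam1", "label": "a"},
    {"id": "r2", "camera": "cam2", "label": "a"},
    {"id": "r3", "camera": "cam3", "label": "b"},
    {"id": "r4", "camera": "cam1", "label": "b"},
]

splits = list(leave_one_group_out(recordings, "camera"))
for held_out, train, test in splits:
    print(held_out, len(train), len(test))
```

The same function could group by any other recorded condition, such as a robot configuration, if the dataset's metadata exposes it.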

Critical Analysis

The RoboMNIST dataset is a valuable contribution to the field of multi-robot activity recognition. By releasing synchronized WiFi, video, and audio data, the authors give other researchers a way to measure how much each sensing modality, and their combination, actually helps with this task.

However, the dataset is limited in scale and diversity. It covers only two robotic arms of a single type performing a relatively narrow set of activities. Advancing the state of the art in multi-robot activity recognition would require a larger and more comprehensive dataset, with a greater variety of robot types, environmental conditions, and task scenarios.

Additionally, the dataset does not provide any information about the ground truth labeling process or the reliability of the annotations. This makes it difficult to assess the quality and validity of the data, which is a crucial consideration for any dataset used in machine learning research.
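If two independent sets of labels were available for the same recordings, annotation reliability could be quantified with an agreement statistic such as Cohen's kappa. The sketch below is a generic illustration of that check. The label lists are made up, and nothing here implies that the dataset ships multiple annotations.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two annotators, corrected for chance."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    # Fraction of items on which the two annotators agree.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Made-up labels from two hypothetical annotators (assumption, not real data).
annotator_1 = ["a", "a", "b", "b", "c", "c"]
annotator_2 = ["a", "a", "b", "c", "c", "c"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(round(kappa, 2))
```

Values near 1 indicate strong agreement, while values near 0 indicate agreement no better than chance.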

Conclusion

The RoboMNIST dataset is an important step towards advancing the state of the art in multi-robot activity recognition. By combining WiFi, video, and audio data in a single dataset, the authors have opened up new avenues for exploring how different sensing modalities complement one another in this task.

While the dataset has some limitations in terms of scale and diversity, it still serves as a valuable resource for the research community. By encouraging the development of robust and adaptable activity recognition models, the RoboMNIST dataset can contribute to the advancement of multi-robot coordination and collaboration capabilities, with potential applications in a wide range of domains.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

RoboMNIST: A Multimodal Dataset for Multi-Robot Activity Recognition Using WiFi Sensing, Video, and Audio

Kian Behzad, Rojin Zandi, Elaheh Motamedi, Hojjat Salehinejad, Milad Siami

We introduce a novel dataset for multi-robot activity recognition (MRAR) using two robotic arms integrating WiFi channel state information (CSI), video, and audio data. This multimodal dataset utilizes signals of opportunity, leveraging existing WiFi infrastructure to provide detailed indoor environmental sensing without additional sensor deployment. Data were collected using two Franka Emika robotic arms, complemented by three cameras, three WiFi sniffers to collect CSI, and three microphones capturing distinct yet complementary audio data streams. The combination of CSI, visual, and auditory data can enhance robustness and accuracy in MRAR. This comprehensive dataset enables a holistic understanding of robotic environments, facilitating advanced autonomous operations that mimic human-like perception and interaction. By repurposing ubiquitous WiFi signals for environmental sensing, this dataset offers significant potential aiming to advance robotic perception and autonomous systems. It provides a valuable resource for developing sophisticated decision-making and adaptive capabilities in dynamic environments.

8/30/2024

RoboFiSense: Attention-Based Robotic Arm Activity Recognition with WiFi Sensing

Rojin Zandi, Kian Behzad, Elaheh Motamedi, Hojjat Salehinejad, Milad Siami

Despite the current surge of interest in autonomous robotic systems, robot activity recognition within restricted indoor environments remains a formidable challenge. Conventional methods for detecting and recognizing robotic arms' activities often rely on vision-based or light detection and ranging (LiDAR) sensors, which require line-of-sight (LoS) access and may raise privacy concerns, for example, in nursing facilities. This research pioneers an innovative approach harnessing channel state information (CSI) measured from WiFi signals, subtly influenced by the activity of robotic arms. We developed an attention-based network to classify eight distinct activities performed by a Franka Emika robotic arm in different situations. Our proposed bidirectional vision transformer-concatenated (BiVTC) methodology aspires to predict robotic arm activities accurately, even when trained on activities with different velocities, all without dependency on external or internal sensors or visual aids. Considering the high dependency of CSI data on the environment motivated us to study the problem of sniffer location selection, by systematically changing the sniffer's location and collecting different sets of data. Finally, this paper also marks the first publication of the CSI data of eight distinct robotic arm activities, collectively referred to as RoboFiSense. This initiative aims to provide a benchmark dataset and baselines to the research community, fostering advancements in the field of robotics sensing.

5/8/2024

MARS: Multimodal Active Robotic Sensing for Articulated Characterization

Hongliang Zeng, Ping Zhang, Chengjiong Wu, Jiahua Wang, Tingyu Ye, Fang Li

Precise perception of articulated objects is vital for empowering service robots. Recent studies mainly focus on point cloud, a single-modal approach, often neglecting vital texture and lighting details and assuming ideal conditions like optimal viewpoints, unrepresentative of real-world scenarios. To address these limitations, we introduce MARS, a novel framework for articulated object characterization. It features a multi-modal fusion module utilizing multi-scale RGB features to enhance point cloud features, coupled with reinforcement learning-based active sensing for autonomous optimization of observation viewpoints. In experiments conducted with various articulated object instances from the PartNet-Mobility dataset, our method outperformed current state-of-the-art methods in joint parameter estimation accuracy. Additionally, through active sensing, MARS further reduces errors, demonstrating enhanced efficiency in handling suboptimal viewpoints. Furthermore, our method effectively generalizes to real-world articulated objects, enhancing robot interactions. Code is available at https://github.com/robhlzeng/MARS.

7/2/2024

CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments

Yang Zhou, Long Quang, Carlos Nieto-Granda, Giuseppe Loianno

In the past decade, although single-robot perception has made significant advancements, the exploration of multi-robot collaborative perception remains largely unexplored. This involves fusing compressed, intermittent, limited, heterogeneous, and asynchronous environmental information across multiple robots to enhance overall perception, despite challenges like sensor noise, occlusions, and sensor failures. One major hurdle has been the lack of real-world datasets. This paper presents a pioneering and comprehensive real-world multi-robot collaborative perception dataset to boost research in this area. Our dataset leverages the untapped potential of air-ground robot collaboration featuring distinct spatial viewpoints, complementary robot mobilities, coverage ranges, and sensor modalities. It features raw sensor inputs, pose estimation, and optional high-level perception annotation, thus accommodating diverse research interests. Compared to existing datasets predominantly designed for Simultaneous Localization and Mapping (SLAM), our setup ensures a diverse range and adequate overlap of sensor views to facilitate the study of multi-robot collaborative perception algorithms. We demonstrate the value of this dataset qualitatively through multiple collaborative perception tasks. We believe this work will unlock the potential research of high-level scene understanding through multi-modal collaborative perception in multi-robot settings.

5/24/2024