MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty

Read original: arXiv:2401.12761 - Published 7/18/2024 by Tim Brodermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool

Overview

Autonomous vehicles require robust semantic visual perception systems to navigate diverse driving conditions
Existing datasets often lack important sensor modalities or do not effectively utilize them to improve semantic annotations
To address this, the researchers introduce MUSES, a multimodal dataset for semantic perception in challenging conditions

Plain English Explanation

The researchers have developed a new dataset called MUSES (MUlti-SEnsor Semantic perception dataset) to help improve the visual perception capabilities of autonomous vehicles. Autonomous cars need to be able to accurately identify and understand the objects and conditions around them, like other vehicles, pedestrians, road signs, and weather, in order to drive safely.

However, the datasets currently used to train computer vision models for autonomous cars often omit sensor modalities that real self-driving cars rely on, such as lidar, radar, and event cameras. Others include those sensors but do not use them to help annotate the camera images, which matters most in challenging conditions like rain or darkness.

The MUSES dataset aims to address these limitations. It includes synchronized recordings from multiple sensor types: a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor. The researchers also developed a new two-stage panoptic annotation protocol that captures both the type of each object (its semantic class) and the specific instance it belongs to, as well as the uncertainty in those labels. This allows models to not only recognize what objects are present, but also report how confident they are in that recognition.

By providing this rich, multimodal dataset and uncertainty-aware annotation, the researchers hope to spur new developments in semantic perception for autonomous vehicles that can handle a wide range of real-world driving conditions. This is a critical capability for enabling level-5 driving automation, where the vehicle can drive itself in any scenario without human intervention.

Technical Explanation

The core contribution of this work is the MUSES dataset, which includes synchronized multimodal sensor recordings (frame camera, lidar, radar, event camera, IMU/GNSS) along with 2D panoptic annotations for 2500 images. Panoptic annotations label every pixel with a semantic class and, for countable objects, an instance identity. In MUSES they also capture uncertainty in the ground truth labels.
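To make the idea of a synchronized multimodal sample concrete, here is a minimal sketch of how one annotated scene could be represented in code. The class name, field names, modality keys, and the "rain/night" condition string are all hypothetical and chosen for illustration; they are not the dataset's actual file layout or API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical modality keys mirroring the sensors named in the text.
# The real MUSES directory layout and naming may differ.
MODALITIES = ("frame_camera", "lidar", "radar", "event_camera", "imu_gnss")


@dataclass
class MultimodalSample:
    """One annotated scene: a file path per sensor plus a panoptic label file."""

    scene_id: str
    condition: str  # illustrative naming, e.g. "rain/night"
    sensor_paths: Dict[str, str] = field(default_factory=dict)
    panoptic_label_path: Optional[str] = None

    def missing_modalities(self) -> List[str]:
        """Return the modalities that have no recording for this scene."""
        return [m for m in MODALITIES if m not in self.sensor_paths]

    def is_complete(self) -> bool:
        """True when every sensor and the panoptic label are present."""
        return not self.missing_modalities() and self.panoptic_label_path is not None


sample = MultimodalSample(
    scene_id="demo_0001",
    condition="rain/night",
    sensor_paths={"frame_camera": "demo_0001.png", "lidar": "demo_0001.bin"},
)
print(sample.missing_modalities())
```

A completeness check like this is useful before training a fusion model, since a model that expects all five inputs cannot silently handle a scene with a missing recording.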

The researchers developed a new two-stage annotation protocol to generate these rich ground truth labels. First, annotators classified each object into semantic categories. Then, in a second pass, they also labeled the specific instance of each object and indicated the level of uncertainty in both the class and instance labels.

This uncertainty-aware panoptic annotation enables a novel task of uncertainty-aware panoptic segmentation, where models must not only identify and segment the objects in the scene, but also output their confidence in those predictions. The MUSES dataset provides a benchmark for evaluating models on this task, in addition to the standard semantic and panoptic segmentation tasks.
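For readers unfamiliar with how standard panoptic segmentation is scored, the sketch below computes the widely used Panoptic Quality (PQ) metric on toy data. It shows only the standard metric, not the uncertainty-aware variant introduced with MUSES, and the segment representation (a class id paired with a set of pixel indices) is an assumption made for brevity.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection over union of two sets of pixel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(pred_segments, gt_segments):
    """Standard PQ, averaged over classes.

    Each argument is a list of (class_id, set_of_pixel_indices) tuples.
    A predicted and a ground-truth segment of the same class match when
    their IoU exceeds 0.5, which guarantees that matches are unique.
    """
    by_class = defaultdict(lambda: ([], []))
    for cls, pixels in pred_segments:
        by_class[cls][0].append(pixels)
    for cls, pixels in gt_segments:
        by_class[cls][1].append(pixels)

    per_class = []
    for preds, gts in by_class.values():
        matched, iou_sum, tp = set(), 0.0, 0
        for p in preds:
            for j, g in enumerate(gts):
                score = iou(p, g)
                if j not in matched and score > 0.5:
                    matched.add(j)
                    iou_sum += score
                    tp += 1
                    break
        fp = len(preds) - tp  # unmatched predictions
        fn = len(gts) - tp    # unmatched ground-truth segments
        per_class.append(iou_sum / (tp + 0.5 * fp + 0.5 * fn))
    return sum(per_class) / len(per_class) if per_class else 0.0


gt = [(1, {0, 1, 2, 3}), (2, {4, 5})]
pred = [(1, {0, 1, 2}), (2, {6, 7})]
print(panoptic_quality(pred, gt))
```

In this toy case the class-1 segment is matched with IoU 0.75, while the class-2 prediction misses entirely and counts as both a false positive and a false negative, so the class-averaged score drops sharply.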

The diverse sensor suite and challenging environmental conditions (varying weather, illumination, etc.) captured in MUSES are designed to stress-test the robustness of semantic perception systems for autonomous driving. The researchers demonstrate that existing state-of-the-art models struggle on this dataset, highlighting the need for further research and innovation in this area.
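One common way to expose this kind of robustness gap is to report a segmentation score separately for each visual condition rather than as a single average. The sketch below does this with mean intersection-over-union (mIoU) on toy label lists. The condition names and the flat list representation of label maps are assumptions for illustration, not the benchmark's official evaluation code.

```python
from collections import defaultdict


def mean_iou(gt, pred, num_classes):
    """Mean IoU over the classes that occur in gt or pred (flat label lists)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for g, p in zip(gt, pred) if g == c and p == c)
        union = sum(1 for g, p in zip(gt, pred) if g == c or p == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


def miou_by_condition(samples, num_classes):
    """Pool pixels per condition, then score each condition separately.

    samples: iterable of (condition, gt_labels, pred_labels) tuples.
    """
    pooled = defaultdict(lambda: ([], []))
    for condition, gt, pred in samples:
        pooled[condition][0].extend(gt)
        pooled[condition][1].extend(pred)
    return {c: mean_iou(g, p, num_classes) for c, (g, p) in pooled.items()}


samples = [
    ("clear/day", [0, 0, 1, 1], [0, 0, 1, 1]),  # hypothetical condition names
    ("fog/night", [0, 0, 1, 1], [0, 1, 1, 1]),
]
for condition, score in miou_by_condition(samples, num_classes=2).items():
    print(condition, round(score, 3))
```

Splitting the score this way makes it visible when a model that looks strong on average is carried by easy daytime scenes.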

Critical Analysis

The MUSES dataset represents an important step forward in providing a more comprehensive and realistic benchmark for evaluating semantic perception in autonomous vehicles. By including a broader range of sensor modalities and capturing uncertainty in the ground truth annotations, it pushes the field to develop more robust and capable perception systems.

However, the dataset is still limited to a single driving environment (a suburb near Zurich, Switzerland), and it remains to be seen how well the models trained on MUSES will generalize to other locations and conditions. Additionally, the annotation process, while detailed, relies on human annotators and may introduce its own biases and inconsistencies.

Further research is also needed to fully leverage the uncertainty information provided in the annotations. The novel uncertainty-aware panoptic segmentation task is an interesting starting point, but there may be other ways to incorporate uncertainty into perception models and decision-making pipelines for autonomous driving.
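As one illustration of how a perception model could expose uncertainty to downstream components, the sketch below turns per-pixel class logits into a normalized entropy score and flags pixels whose prediction is confident. This is a generic technique, not the method used in the MUSES paper, and the threshold value is an arbitrary assumption.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy scaled to [0, 1]; 1 means a uniform distribution."""
    k = len(probs)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(k)


def confident_mask(pixel_logits, max_entropy=0.5):
    """True for pixels whose prediction entropy is at most max_entropy.

    max_entropy is an illustrative threshold, not a recommended value.
    """
    return [normalized_entropy(softmax(l)) <= max_entropy for l in pixel_logits]


# Two toy pixels with three classes: one peaked, one completely ambiguous.
print(confident_mask([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))
```

A planner could treat the flagged pixels as reliable and handle the remaining ones conservatively, for example by slowing down when a large image region is uncertain.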

Finally, while the dataset is publicly available, the compute and storage requirements for working with the multimodal sensor data may limit its accessibility, especially for smaller research groups or individual developers. Strategies for efficient data processing and model training on MUSES will be an important area of investigation.

Conclusion

The MUSES dataset provides a more realistic and challenging benchmark for semantic perception in autonomous vehicles. By incorporating a diverse range of sensor modalities and capturing uncertainty in the ground truth annotations, it pushes the field toward perception systems that can handle the complex, real-world conditions self-driving cars will encounter.

While there are still some limitations and open questions, MUSES opens up new avenues for research in multimodal and uncertainty-aware perception, which will be crucial for realizing the promise of level-5 driving automation. As the autonomous vehicle industry and research community continue to make progress, datasets like MUSES will play an important role in driving innovation and improving the safety and reliability of self-driving technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MUSES: The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty

Tim Brodermann, David Bruggemann, Christos Sakaridis, Kevin Ta, Odysseas Liagouris, Jason Corkill, Luc Van Gool

Achieving level-5 driving automation in autonomous vehicles necessitates a robust semantic visual perception system capable of parsing data from different sensors across diverse conditions. However, existing semantic perception datasets often lack important non-camera modalities typically used in autonomous vehicles, or they do not exploit such modalities to aid and improve semantic annotations in challenging conditions. To address this, we introduce MUSES, the MUlti-SEnsor Semantic perception dataset for driving in adverse conditions under increased uncertainty. MUSES includes synchronized multimodal recordings with 2D panoptic annotations for 2500 images captured under diverse weather and illumination. The dataset integrates a frame camera, a lidar, a radar, an event camera, and an IMU/GNSS sensor. Our new two-stage panoptic annotation protocol captures both class-level and instance-level uncertainty in the ground truth and enables the novel task of uncertainty-aware panoptic segmentation we introduce, along with standard semantic and panoptic segmentation. MUSES proves both effective for training and challenging for evaluating models under diverse visual conditions, and it opens new avenues for research in multimodal and uncertainty-aware dense semantic perception. Our dataset and benchmark are publicly available at https://muses.vision.ee.ethz.ch.

7/18/2024

SemanticSpray++: A Multimodal Dataset for Autonomous Driving in Wet Surface Conditions

Aldi Piroli, Vinzenz Dallabetta, Johannes Kopp, Marc Walessa, Daniel Meissner, Klaus Dietmayer

Autonomous vehicles rely on camera, LiDAR, and radar sensors to navigate the environment. Adverse weather conditions like snow, rain, and fog are known to be problematic for both camera and LiDAR-based perception systems. Currently, it is difficult to evaluate the performance of these methods due to the lack of publicly available datasets containing multimodal labeled data. To address this limitation, we propose the SemanticSpray++ dataset, which provides labels for camera, LiDAR, and radar data of highway-like scenarios in wet surface conditions. In particular, we provide 2D bounding boxes for the camera image, 3D bounding boxes for the LiDAR point cloud, and semantic labels for the radar targets. By labeling all three sensor modalities, the SemanticSpray++ dataset offers a comprehensive test bed for analyzing the performance of different perception methods when vehicles travel on wet surface conditions. Together with comprehensive label statistics, we also evaluate multiple baseline methods across different tasks and analyze their performances. The dataset will be available at https://semantic-spray-dataset.github.io .

6/17/2024

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

Yanbo Ding, Shaobin Zhuang, Kunchang Li, Zhengrong Yue, Yu Qiao, Yali Wang

Despite recent advancements in text-to-image generation, most existing methods struggle to create images with multiple objects and complex spatial relationships in 3D world. To tackle this limitation, we introduce a generic AI system, namely MUSES, for 3D-controllable image generation from user queries. Specifically, our MUSES addresses this challenging task by developing a progressive workflow with three key components, including (1) Layout Manager for 2D-to-3D layout lifting, (2) Model Engineer for 3D object acquisition and calibration, (3) Image Artist for 3D-to-2D image rendering. By mimicking the collaboration of human professionals, this multi-modal agent pipeline facilitates the effective and automatic creation of images with 3D-controllable objects, through an explainable integration of top-down planning and bottom-up generation. Additionally, we find that existing benchmarks lack detailed descriptions of complex 3D spatial relationships of multiple objects. To fill this gap, we further construct a new benchmark of T2I-3DisBench (3D image scene), which describes diverse 3D image scenes with 50 detailed prompts. Extensive experiments show the state-of-the-art performance of MUSES on both T2I-CompBench and T2I-3DisBench, outperforming recent strong competitors such as DALL-E 3 and Stable Diffusion 3. These results demonstrate a significant step of MUSES forward in bridging natural language, 2D image generation, and 3D world.

8/22/2024

Multiagent Multitraversal Multimodal Self-Driving: Open MARS Dataset

Yiming Li, Zhiheng Li, Nuo Chen, Moonjun Gong, Zonglin Lyu, Zehong Wang, Peili Jiang, Chen Feng

Large-scale datasets have fueled recent advancements in AI-based autonomous vehicle research. However, these datasets are usually collected from a single vehicle's one-time pass of a certain location, lacking multiagent interactions or repeated traversals of the same place. Such information could lead to transformative enhancements in autonomous vehicles' perception, prediction, and planning capabilities. To bridge this gap, in collaboration with the self-driving company May Mobility, we present the MARS dataset which unifies scenarios that enable MultiAgent, multitraveRSal, and multimodal autonomous vehicle research. More specifically, MARS is collected with a fleet of autonomous vehicles driving within a certain geographical area. Each vehicle has its own route and different vehicles may appear at nearby locations. Each vehicle is equipped with a LiDAR and surround-view RGB cameras. We curate two subsets in MARS: one facilitates collaborative driving with multiple vehicles simultaneously present at the same location, and the other enables memory retrospection through asynchronous traversals of the same location by multiple vehicles. We conduct experiments in place recognition and neural reconstruction. More importantly, MARS introduces new research opportunities and challenges such as multitraversal 3D reconstruction, multiagent perception, and unsupervised object discovery. Our data and codes can be found at https://ai4ce.github.io/MARS/.

6/14/2024