AVM-SLAM: Semantic Visual SLAM with Multi-Sensor Fusion in a Bird's Eye View for Automated Valet Parking

Read original: arXiv:2309.08180 - Published 7/2/2024 by Ye Li, Wenchao Yang, Dekun Lin, Qianlei Wang, Zhe Cui, Xiaolin Qin

Overview

Accurately localizing a vehicle in challenging garage environments is crucial for automated valet parking (AVP) tasks.
This research introduces AVM-SLAM, a semantic visual SLAM (Simultaneous Localization and Mapping) architecture with multi-sensor fusion in a bird's eye view (BEV).
The system uses four fisheye cameras, wheel encoders, and an inertial measurement unit (IMU) to construct a robust SLAM system.
Key innovations include a flare removal technique for improved road marking detection and semantic feature extraction, as well as a semantic pre-qualification (SPQ) module to handle repetitive textures.
The researchers have released a specialized multi-sensor and high-resolution dataset of an underground garage to encourage further exploration and validation of their approach.

Plain English Explanation

Automated valet parking (AVP) systems need to accurately determine a vehicle's location, even in challenging garage environments with poor lighting, sparse textures, repetitive structures, dynamic scenes, and no GPS signal. To address these issues, the researchers developed a new system called AVM-SLAM.

AVM-SLAM uses multiple cameras, wheel sensors, and an inertial measurement unit to create a detailed map of the environment and track the vehicle's location within it. A key innovation is the use of a "bird's eye view" perspective, which helps the system better detect road markings and other important features.

The researchers also developed a technique to remove glare and reflections from the camera images, which can interfere with feature detection. Additionally, they created a "semantic pre-qualification" module to help the system recognize and handle repetitive patterns in the environment, which can be difficult for traditional SLAM systems.

By releasing a specialized dataset of an underground garage, the authors invite other researchers to test and improve upon AVM-SLAM in similarly challenging environments.

Technical Explanation

AVM-SLAM is a semantic visual SLAM architecture that fuses data from multiple sensors and works in a bird's eye view (BEV) representation of the environment. Its inputs are four fisheye cameras, wheel encoders, and an inertial measurement unit (IMU).
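To make the role of the wheel encoders and IMU concrete, here is a minimal dead-reckoning sketch in Python. It is not the paper's fusion method, which this summary does not describe. It only illustrates the generic idea of taking travelled distance from encoder ticks and heading change from the gyro. The names `Pose2D`, `propagate`, and `metres_per_tick`, along with the default calibration value, are assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Pose2D:
    x: float    # metres, in the map frame
    y: float    # metres, in the map frame
    yaw: float  # radians, counter-clockwise from the x axis


def propagate(pose, left_ticks, right_ticks, gyro_yaw_rate, dt,
              metres_per_tick=0.001):
    """Advance a planar pose by one time step.

    Distance comes from the mean of the two wheel encoders and the heading
    change comes from the IMU gyro. metres_per_tick is a hypothetical
    calibration constant, not a value from the paper.
    """
    distance = 0.5 * (left_ticks + right_ticks) * metres_per_tick
    dyaw = gyro_yaw_rate * dt
    # Midpoint integration: move along the heading halfway through the step.
    mid_yaw = pose.yaw + 0.5 * dyaw
    x = pose.x + distance * math.cos(mid_yaw)
    y = pose.y + distance * math.sin(mid_yaw)
    # Wrap the new heading into (-pi, pi].
    new_yaw = pose.yaw + dyaw
    yaw = math.atan2(math.sin(new_yaw), math.cos(new_yaw))
    return Pose2D(x, y, yaw)


if __name__ == "__main__":
    pose = Pose2D(0.0, 0.0, 0.0)
    # Ten steps of 0.1 s each, with made-up encoder and gyro readings.
    for _ in range(10):
        pose = propagate(pose, left_ticks=100, right_ticks=110,
                         gyro_yaw_rate=0.05, dt=0.1)
    print(pose)
```

Dead reckoning of this kind drifts over time, which is one reason a SLAM system would combine it with camera-based observations and loop detection rather than rely on it alone.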

A distinguishing feature of AVM-SLAM is a flare removal technique applied to the BEV imagery. According to the authors, removing flare improves road marking detection and the semantic features extracted by convolutional neural networks, which in turn improves mapping and localization.

The researchers also introduce a semantic pre-qualification (SPQ) module to handle environments with repetitive textures, such as those found in many parking garages. The module is intended to improve loop detection and overall system robustness.
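The summary does not say how SPQ decides what qualifies, so the Python sketch below shows only one plausible reading of "pre-qualification". Before attempting loop detection on a submap, it checks that the submap contains enough semantic features and that they are not dominated by a single repeated class. The function name `prequalify`, the class labels, and all thresholds are hypothetical and are not taken from the paper.

```python
from collections import Counter


def prequalify(submap_labels, min_features=20, min_classes=2,
               max_dominant_share=0.9):
    """Decide whether a submap is worth passing to loop detection.

    submap_labels is a list of semantic class names, one per detected
    feature. All thresholds are illustrative assumptions.
    """
    counts = Counter(submap_labels)
    total = sum(counts.values())
    # Too few features, or only one kind of feature: matching is ambiguous.
    if total < min_features or len(counts) < min_classes:
        return False
    # Reject submaps where one repeated class dominates almost everything.
    dominant_share = max(counts.values()) / total
    return dominant_share <= max_dominant_share


if __name__ == "__main__":
    repetitive = ["parking_line"] * 50
    varied = ["parking_line"] * 15 + ["arrow"] * 10
    print(prequalify(repetitive), prequalify(varied))
```

Filtering candidates this way keeps rows of identical parking lines from producing false loop closures, at the cost of skipping some valid loops in feature-poor areas.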

To support evaluation of AVM-SLAM, the researchers have released a specialized multi-sensor and high-resolution dataset of an underground garage, accessible at https://yale-cv.github.io/avm-slam_dataset. The dataset is intended to encourage further exploration and validation of their approach in similarly challenging settings.

Critical Analysis

The paper presents a comprehensive and innovative solution to the problem of accurate localization in complex garage environments. The use of multi-sensor fusion, including fisheye cameras, wheel encoders, and an IMU, is a key strength of the AVM-SLAM system, as it allows for a more robust and reliable mapping and tracking of the vehicle's position.

The flare removal technique and the semantic pre-qualification module are also notable contributions: the former targets glare and reflections in the camera images, and the latter targets the repetitive structures typical of parking garages.

However, the paper does not provide a detailed evaluation of the system's performance compared to other state-of-the-art SLAM approaches, such as BundleSLAM or LetsMap. Additionally, the paper does not discuss the computational complexity or real-time performance of the AVM-SLAM system, which are critical factors for practical deployment in autonomous parking applications.

Further research could also explore the generalizability of the AVM-SLAM approach to other types of environments, beyond just parking garages, as well as investigate the potential for tighter integration with other components of an autonomous driving system, such as vision-based localization and planning algorithms.

Conclusion

The AVM-SLAM system introduced in this research advances visual SLAM for automated valet parking applications. By combining multi-sensor fusion, flare removal, the semantic pre-qualification module, and a released dataset, the researchers contribute to robust localization for autonomous vehicles in complex parking garage settings.

While further research is needed to fully evaluate the system's performance and generalizability, the AVM-SLAM approach shows how a semantic visual SLAM architecture can address limitations of traditional techniques and support the deployment of automated valet parking systems.

Related Papers

AVM-SLAM: Semantic Visual SLAM with Multi-Sensor Fusion in a Bird's Eye View for Automated Valet Parking

Ye Li, Wenchao Yang, Dekun Lin, Qianlei Wang, Zhe Cui, Xiaolin Qin

Accurate localization in challenging garage environments -- marked by poor lighting, sparse textures, repetitive structures, dynamic scenes, and the absence of GPS -- is crucial for automated valet parking (AVP) tasks. Addressing these challenges, our research introduces AVM-SLAM, a cutting-edge semantic visual SLAM architecture with multi-sensor fusion in a bird's eye view (BEV). This novel framework synergizes the capabilities of four fisheye cameras, wheel encoders, and an inertial measurement unit (IMU) to construct a robust SLAM system. Unique to our approach is the implementation of a flare removal technique within the BEV imagery, significantly enhancing road marking detection and semantic feature extraction by convolutional neural networks for superior mapping and localization. Our work also pioneers a semantic pre-qualification (SPQ) module, designed to adeptly handle the challenges posed by environments with repetitive textures, thereby enhancing loop detection and system robustness. To demonstrate the effectiveness and resilience of AVM-SLAM, we have released a specialized multi-sensor and high-resolution dataset of an underground garage, accessible at https://yale-cv.github.io/avm-slam_dataset, encouraging further exploration and validation of our approach within similar settings.

7/2/2024

BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation

Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, Song Han

Multi-sensor fusion is essential for an accurate and reliable autonomous driving system. Recent approaches are based on point-level fusion: augmenting the LiDAR point cloud with camera features. However, the camera-to-LiDAR projection throws away the semantic density of camera features, hindering the effectiveness of such methods, especially for semantic-oriented tasks (such as 3D scene segmentation). In this paper, we break this deeply-rooted convention with BEVFusion, an efficient and generic multi-task multi-sensor fusion framework. It unifies multi-modal features in the shared bird's-eye view (BEV) representation space, which nicely preserves both geometric and semantic information. To achieve this, we diagnose and lift key efficiency bottlenecks in the view transformation with optimized BEV pooling, reducing latency by more than 40x. BEVFusion is fundamentally task-agnostic and seamlessly supports different 3D perception tasks with almost no architectural changes. It establishes the new state of the art on nuScenes, achieving 1.3% higher mAP and NDS on 3D object detection and 13.6% higher mIoU on BEV map segmentation, with 1.9x lower computation cost. Code to reproduce our results is available at https://github.com/mit-han-lab/bevfusion.

9/4/2024

Automated Parking Planning with Vision-Based BEV Approach

Yuxuan Zhao

Automated Valet Parking (AVP) is a crucial component of advanced autonomous driving systems, focusing on the endpoint task within the human-vehicle interaction process to tackle the challenges of the last mile. The perception module of the automated parking algorithm has evolved from local perception using ultrasonic radar and global scenario precise map matching for localization to a high-level map-free Birds Eye View (BEV) perception solution. The BEV scene places higher demands on the real-time performance and safety of automated parking planning tasks. This paper proposes an improved automated parking algorithm based on the A* algorithm, integrating vehicle kinematic models, heuristic function optimization, bidirectional search, and Bezier curve optimization to enhance the computational speed and real-time capabilities of the planning algorithm. Numerical optimization methods are employed to generate the final parking trajectory, ensuring the safety of the parking path. The proposed approach is experimentally validated in the commonly used industrial CARLA-ROS joint simulation environment. Compared to traditional algorithms, this approach demonstrates reduced computation time with more challenging collision-risk test cases and improved performance in comfort metrics.

6/26/2024

SlideSLAM: Sparse, Lightweight, Decentralized Metric-Semantic SLAM for Multi-Robot Navigation

Xu Liu, Jiuzhou Lei, Ankit Prabhu, Yuezhan Tao, Igor Spasojevic, Pratik Chaudhari, Nikolay Atanasov, Vijay Kumar

This paper develops a real-time decentralized metric-semantic Simultaneous Localization and Mapping (SLAM) approach that leverages a sparse and lightweight object-based representation to enable a heterogeneous robot team to autonomously explore 3D environments featuring indoor, urban, and forested areas without relying on GPS. We use a hierarchical metric-semantic representation of the environment, including high-level sparse semantic maps of object models and low-level voxel maps. We leverage the informativeness and viewpoint invariance of the high-level semantic map to obtain an effective semantics-driven place-recognition algorithm for inter-robot loop closure detection across aerial and ground robots with different sensing modalities. A communication module is designed to track each robot's own observations and those of other robots whenever communication links are available. Such observations are then used to construct a merged map. Our framework enables real-time decentralized operations onboard robots, allowing them to opportunistically leverage communication. We integrate and deploy our proposed framework on three types of aerial and ground robots. Extensive experimental results show an average inter-robot localization error of approximately 20 cm in position and 0.2 degrees in orientation, an object mapping F1 score consistently over 0.9, and a communication packet size of merely 2-3 megabytes per kilometer trajectory with as many as 1,000 landmarks. The project website can be found at https://xurobotics.github.io/slideslam/.

7/26/2024