Dynamic Open Vocabulary Enhanced Safe-landing with Intelligence (DOVESEI)

Read original: arXiv:2308.11471 - Published 5/7/2024 by Haechan Mark Bong, Rongge Zhang, Ricardo de Azambuja, Giovanni Beltrame


Overview

This research focuses on the crucial step of safe landing for urban airborne robots.
The primary focus is on the segmentation aspect of the safe landing perception stack.
The researchers present a streamlined reactive UAV system that uses visual servoing and open vocabulary image segmentation.
The system is designed for descents starting from altitudes of 100 meters, addressing a gap in previous research, which has mostly covered altitudes up to 30 meters.
The researchers introduce a dynamic focus mechanism to improve the reliability of the segmentation output.

Plain English Explanation

The researchers are working on enabling safe landing for drones and other urban airborne robots. They believe that the most crucial part of this is the ability to accurately segment the environment and identify suitable landing spots.

To achieve this, they have developed a reactive drone control system that steers directly from camera images (visual servoing) and relies on open vocabulary image segmentation, which finds regions described by free-form text rather than a fixed set of trained categories. This allows the system to adapt to different environments without being extensively trained on specific scenarios.

The researchers focus on descents that start at 100 meters, an altitude chosen because of limits imposed by local authorities. Previous work has primarily addressed altitudes up to 30 meters, which matches the range of small stereo cameras, so the higher starting altitude covers a range that has received little attention.

One challenge the researchers encountered was that the segmentation output could fluctuate abruptly between video frames, which could disrupt the landing process. To address this, they introduced a "dynamic focus" mechanism that adjusts the segmentation mask based on the current stage of the landing maneuver. This helps the control system avoid areas that are beyond the drone's safety radius, improving the reliability of the landing.

Their experiments show that the system, using only a monocular camera and image segmentation, can guide the drone down to altitudes as low as 20 meters, and that adding the dynamic focus improves the landing success rate almost tenfold compared with segmenting the whole image (a global segmentation approach).

Technical Explanation

The researchers present a streamlined reactive UAV system that performs visual servoing driven by open vocabulary image segmentation. Because the segmentation model is open vocabulary, the system can adapt to various scenarios with minimal adjustments, without the extensive data collection normally needed to refine internal models.
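
The visual servoing loop can be illustrated with a minimal sketch. It assumes a binary "safe to land" mask has already been produced by the segmentation model, and it simply steers toward the centroid of the safe pixels with a proportional gain. The function name `servo_command`, the gain, and the speed limit are hypothetical and not taken from the DOVESEI code; the authors' actual controller may differ.

```python
import numpy as np


def servo_command(safe_mask: np.ndarray, gain: float = 0.5, max_speed: float = 1.0):
    """Hypothetical proportional visual-servoing step.

    Returns (vx, vy) in normalised image axes: vx > 0 moves toward the
    image's right edge, vy > 0 toward its bottom edge.
    """
    h, w = safe_mask.shape
    ys, xs = np.nonzero(safe_mask)
    if xs.size == 0:
        # Nothing safe in view: hold position (an assumed fallback policy).
        return 0.0, 0.0
    # Image centre, i.e. the point directly below a downward-facing camera.
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    # Centroid error normalised to roughly [-1, 1] by the half-image size.
    ex = (xs.mean() - cx) / (w / 2.0)
    ey = (ys.mean() - cy) / (h / 2.0)
    vx = float(np.clip(gain * ex, -max_speed, max_speed))
    vy = float(np.clip(gain * ey, -max_speed, max_speed))
    return vx, vy
```

Using the centroid keeps the sketch short, but with two disjoint safe regions it can point at an unsafe pixel between them; a real system would also need to map image axes to the drone's body frame according to how the camera is mounted.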

The primary focus of the research is on descents originating from 100 meters, a choice driven by limits imposed by local authorities. Previous work has typically dealt with altitudes up to 30 meters, in line with the range of small stereo cameras. The system therefore handles the descent from 100 meters down to about 20 meters, and the researchers leave the remaining 20 meters to conventional 3D path planning methods.
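
The altitude split described above amounts to a simple handoff rule. The sketch below encodes it with the 100-meter starting altitude and the 20-meter handoff from the summary; the module names and the choice to hand off exactly at 20 meters are assumptions for illustration.

```python
# Thresholds taken from the summary; module names are hypothetical labels.
MAX_START_ALTITUDE_M = 100.0  # starting altitude set by local-authority limits
HANDOFF_ALTITUDE_M = 20.0     # lowest altitude handled by monocular segmentation


def descent_module(altitude_m: float) -> str:
    """Return which (hypothetical) module should control the descent at this altitude."""
    if altitude_m > MAX_START_ALTITUDE_M:
        raise ValueError("altitude is above the 100 m operating ceiling")
    if altitude_m > HANDOFF_ALTITUDE_M:
        # 100 m down to 20 m: reactive visual servoing on segmentation output.
        return "monocular_segmentation"
    # Final 20 m: within small-stereo range, left to conventional 3D path planning.
    return "3d_path_planning"
```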

Using monocular cameras and image segmentation, the researchers show that the system can successfully carry out the landing maneuver down to altitudes as low as 20 meters. However, they observe that this approach is vulnerable to intermittent and occasionally abrupt fluctuations in the segmentation between frames of a video stream.
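
One way to make the frame-to-frame fluctuation concrete is to compare consecutive segmentation masks with intersection over union (IoU). The helper below is an illustrative diagnostic, not part of the described system; `flicker_scores` is a hypothetical name, and a score near 1 marks an abrupt change between two frames.

```python
import numpy as np


def mask_iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(inter) / float(union)


def flicker_scores(masks):
    """1 - IoU for each pair of consecutive masks; high values flag abrupt changes."""
    return [1.0 - mask_iou(prev, cur) for prev, cur in zip(masks, masks[1:])]
```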

To address this challenge, the researchers introduce a dynamic focus: a masking mechanism that self-adjusts according to the current landing stage. This dynamic focus guides the control system to avoid regions beyond the drone's safety radius projected onto the ground, mitigating the problems caused by segmentation fluctuations. Their experiments show that this supplementary layer can improve the landing success rate by almost tenfold compared to using a global segmentation approach.
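
A minimal sketch of a dynamic focus mask, assuming a downward-facing pinhole camera and a fixed safety radius R on the ground. By similar triangles, a ground radius R seen from altitude h spans r_px = f_px * R / h pixels, where f_px is the focal length in pixels, so the focus disk widens as the drone descends. Tying the mask to altitude this way is an assumption about how the mask "self-adjusts according to the current landing stage"; the function names and parameters are hypothetical and not taken from the DOVESEI repository.

```python
import math

import numpy as np


def focus_radius_px(image_width_px: int, hfov_deg: float,
                    safety_radius_m: float, altitude_m: float) -> float:
    """Project a ground-plane safety radius into image pixels (pinhole, nadir camera)."""
    if altitude_m <= 0:
        raise ValueError("altitude must be positive")
    # Focal length in pixels, derived from the horizontal field of view.
    f_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    # Similar triangles: r_px / f_px = R / h.
    return f_px * safety_radius_m / altitude_m


def dynamic_focus_mask(height_px: int, width_px: int, radius_px: float) -> np.ndarray:
    """Boolean disk centred on the image centre (the point below a nadir camera)."""
    ys, xs = np.ogrid[:height_px, :width_px]
    cy, cx = (height_px - 1) / 2.0, (width_px - 1) / 2.0
    return (xs - cx) ** 2 + (ys - cy) ** 2 <= radius_px ** 2


def apply_dynamic_focus(safe_mask, hfov_deg: float,
                        safety_radius_m: float, altitude_m: float) -> np.ndarray:
    """Discard 'safe' pixels that lie outside the projected safety radius."""
    safe_mask = np.asarray(safe_mask, dtype=bool)
    h, w = safe_mask.shape
    r = focus_radius_px(w, hfov_deg, safety_radius_m, altitude_m)
    return safe_mask & dynamic_focus_mask(h, w, r)
```

With pixels outside the disk discarded before the servoing step, regions beyond the projected safety radius cannot attract the controller even if they briefly appear safe in a single frame.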

Critical Analysis

The researchers acknowledge the limitations of their approach, such as the vulnerability to segmentation fluctuations between video frames. While the dynamic focus mechanism helps address this issue, it is not a complete solution, and further refinements may be necessary.

Additionally, starting the descent from 100 meters is an important step, but the final 20 meters still depend on conventional 3D path planning methods, which may bring their own challenges. Advances in small object detection could potentially improve the reliability of the landing process at these lower altitudes.

It would also be interesting to see how the researchers' approach performs in more complex urban environments with a wider variety of obstacles and landing surfaces. Further testing and evaluation in real-world scenarios would help validate the system's robustness and identify any additional limitations.

Conclusion

This research presents a significant step toward enabling safe landing for urban airborne robots. The use of open vocabulary image segmentation, combined with the dynamic focus mechanism, offers an innovative approach to the challenges of reliable landing perception.

The researchers' focus on descents starting from 100 meters is particularly noteworthy, as it extends the operating range of drone landing systems beyond the roughly 30 meters covered by most previous work. While some challenges remain, this research contributes to ongoing efforts to develop robust and adaptable landing solutions for urban airborne robots.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Dynamic Open Vocabulary Enhanced Safe-landing with Intelligence (DOVESEI)

Haechan Mark Bong, Rongge Zhang, Ricardo de Azambuja, Giovanni Beltrame

This work targets what we consider to be the foundational step for urban airborne robots, a safe landing. Our attention is directed toward what we deem the most crucial aspect of the safe landing perception stack: segmentation. We present a streamlined reactive UAV system that employs visual servoing by harnessing the capabilities of open vocabulary image segmentation. This approach can adapt to various scenarios with minimal adjustments, bypassing the necessity for extensive data accumulation for refining internal models, thanks to its open vocabulary methodology. Given the limitations imposed by local authorities, our primary focus centers on operations originating from altitudes of 100 meters. This choice is deliberate, as numerous preceding works have dealt with altitudes up to 30 meters, aligning with the capabilities of small stereo cameras. Consequently, we leave the remaining 20m to be navigated using conventional 3D path planning methods. Utilizing monocular cameras and image segmentation, our findings demonstrate the system's capability to successfully execute landing maneuvers at altitudes as low as 20 meters. However, this approach is vulnerable to intermittent and occasionally abrupt fluctuations in the segmentation between frames in a video stream. To address this challenge, we enhance the image segmentation output by introducing what we call a dynamic focus: a masking mechanism that self adjusts according to the current landing stage. This dynamic focus guides the control system to avoid regions beyond the drone's safety radius projected onto the ground, thus mitigating the problems with fluctuations. Through the implementation of this supplementary layer, our experiments have reached improvements in the landing success rate of almost tenfold when compared to global segmentation. All the source code is open source and available online (github.com/MISTLab/DOVESEI).

5/7/2024

PEACE: Prompt Engineering Automation for CLIPSeg Enhancement in Aerial Robotics

Haechan Mark Bong, Rongge Zhang, Antoine Robillard, Ricardo de Azambuja, Giovanni Beltrame

Safe landing is an essential aspect of flight operations in fields ranging from industrial to space robotics. With the growing interest in artificial intelligence, we focus on learning-based methods for safe landing. Our previous work, Dynamic Open-Vocabulary Enhanced SafE-Landing with Intelligence (DOVESEI), demonstrated the feasibility of using prompt-based segmentation for identifying safe landing zones with open vocabulary models. However, relying on a heuristic selection of words for prompts is not reliable, as it cannot adapt to changing environments, potentially leading to harmful outcomes if the observed environment is not accurately represented by the chosen prompt. To address this issue, we introduce PEACE (Prompt Engineering Automation for CLIPSeg Enhancement), an enhancement to DOVESEI that automates prompt engineering to adapt to shifts in data distribution. PEACE can perform safe landings using only monocular cameras and image segmentation. PEACE shows significant improvements in prompt generation and engineering for aerial images compared to standard prompts used for CLIP and CLIPSeg. By combining DOVESEI and PEACE, our system improved the success rate of safe landing zone selection by at least 30% in both simulations and indoor experiments.

9/10/2024


Embedded light-weight approach for safe landing in populated areas

Tilemahos Mitroudas, Vasiliki Balaska, Athanasios Psomoulis, Antonios Gasteratos

Landing safety is a challenge heavily engaging the research community recently, due to the increasing interest in applications availed by aerial vehicles. In this paper, we propose a landing safety pipeline based on state of the art object detectors and OctoMap. First, a point cloud of surface obstacles is generated, which is then inserted in an OctoMap. The unoccupied areas are identified, thus resulting to a list of safe landing points. Due to the low inference time achieved by state of the art object detectors and the efficient point cloud manipulation using OctoMap, it is feasible for our approach to deploy on low-weight embedded systems. The proposed pipeline has been evaluated in many simulation scenarios, varying in people density, number, and movement. Simulations were executed with an Nvidia Jetson Nano in the loop to confirm the pipeline's performance and robustness in a low computing power hardware. The experiments yielded promising results with a 95% success rate.

4/9/2024


Visual Environment Assessment for Safe Autonomous Quadrotor Landing

Mattia Secchiero, Nishanth Bobbili, Yang Zhou, Giuseppe Loianno

Autonomous identification and evaluation of safe landing zones are of paramount importance for ensuring the safety and effectiveness of aerial robots in the event of system failures, low battery, or the successful completion of specific tasks. In this paper, we present a novel approach for detection and assessment of potential landing sites for safe quadrotor landing. Our solution efficiently integrates 2D and 3D environmental information, eliminating the need for external aids such as GPS and computationally intensive elevation maps. The proposed pipeline combines semantic data derived from a Neural Network (NN), to extract environmental features, with geometric data obtained from a disparity map, to extract critical geometric attributes such as slope, flatness, and roughness. We define several cost metrics based on these attributes to evaluate safety, stability, and suitability of regions in the environments and identify the most suitable landing area. Our approach runs in real-time on quadrotors equipped with limited computational capabilities. Experimental results conducted in diverse environments demonstrate that the proposed method can effectively assess and identify suitable landing areas, enabling the safe and autonomous landing of a quadrotor.

5/6/2024