Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation

2405.06749

Published 5/17/2024 by Vasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios Voulodimos

cs.CV cs.LG

Abstract

In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles with efficiency and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset which is the largest (to the best of our knowledge) air-to-air airborne object dataset.

Overview

This paper presents a vision-based and real-time framework for collision avoidance in unmanned aerial vehicles (UAVs) using object detection, tracking, and distance estimation.
The proposed system aims to improve UAV safety by detecting non-cooperative aerial vehicles, tracking them, and estimating their distance.
The framework relies on a single monocular camera and on deep learning models trained on the Amazon Airborne Object Tracking (AOT) dataset.

Plain English Explanation

This research outlines a way to help keep drones safe and avoid collisions. The key idea is to use cameras and computer vision to detect and track objects around the drone, and then estimate how far away those objects are. With this information, the drone can navigate and adjust its flight path to steer clear of obstacles in real-time.

The "Attention-Based Deep Learning Architecture for Real-Time" and "C2FDrone: Coarse-to-Fine Drone-to-Drone" papers explore related approaches for drone perception and navigation. Similarly, the "Unified Control Framework for Real-Time Interception of Obstacles" and "D-VAT: End-to-End Visual Active Tracking" papers tackle the challenge of enabling drones to navigate safely and track targets. The "Leveraging Edge Detection and Neural Networks for Better UAV" work also examines computer vision techniques for drone perception.

The goal of this research is to make drones more capable of operating safely in shared airspace by equipping them with the ability to detect and track other aerial vehicles and estimate their distance in real time using only an onboard camera. This could have important applications in areas like search and rescue, infrastructure inspection, and package delivery.

Technical Explanation

The paper proposes a vision-based and real-time framework for collision avoidance in UAVs that consists of three main components:

Object Detection: The system uses a deep learning-based object detection model to identify and localize non-cooperative aerial vehicles in the camera image.
Object Tracking: Once an object is detected, the framework employs a tracking module to continuously monitor the object's movement across frames.
Distance Estimation: A separate lightweight encoder-decoder network estimates depth from the monocular camera image alone. The authors formulate this task as image-to-image translation rather than integrating depth estimation into the object detector, as previous approaches did.
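
The data flow between the three components can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `Detection`, `iou`, `associate`, `box_distance`, and `process_frame` are hypothetical names, the tracker is a simple greedy IoU matcher standing in for whatever tracker the paper uses, and the depth map is assumed to be a 2-D grid (list of rows) of per-pixel distances in metres produced by a separate depth network.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Detection:
    """Axis-aligned box in pixels: (x1, y1) top-left, (x2, y2) bottom-right."""
    x1: int
    y1: int
    x2: int
    y2: int
    score: float


def iou(a, b):
    """Intersection-over-union of two boxes."""
    iw = max(0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, min_iou=0.3):
    """Greedy IoU association (a stand-in tracker, assumed for illustration).

    Each detection joins the best-overlapping unmatched track, or starts a
    new track. Tracks with no matching detection are dropped.
    """
    updated = {}
    free = dict(tracks)
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        best_id = max(free, key=lambda t: iou(free[t], det), default=None)
        if best_id is not None and iou(free[best_id], det) >= min_iou:
            updated[best_id] = det
            del free[best_id]
        else:
            updated[next_id] = det
            next_id += 1
    return updated


def box_distance(depth_map, det):
    """Median predicted depth inside the box (assumes a non-empty box)."""
    values = [d for row in depth_map[det.y1:det.y2] for d in row[det.x1:det.x2]]
    return median(values)


def process_frame(tracks, detections, depth_map):
    """Detector output feeds both the tracker and the depth read-out."""
    tracks = associate(tracks, detections)
    distances = {tid: box_distance(depth_map, det) for tid, det in tracks.items()}
    return tracks, distances
```

Taking the median rather than the mean inside the box is a design choice made here for robustness to background pixels; the paper may read out distance differently.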

The object detection module identifies and localizes obstacles and passes this information to both the tracking module, which monitors obstacle movement, and the depth estimation module, which calculates distances. The authors evaluate their approach on the Airborne Object Tracking (AOT) dataset, which they describe as the largest air-to-air airborne object dataset to the best of their knowledge.
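
One simple way to turn per-frame distance estimates into a collision-risk signal is a closing-speed check. The sketch below is an assumption made for illustration, not a method described in the paper: `time_to_collision` is a hypothetical helper that assumes a constant closing speed between two distance samples of the same tracked object.

```python
import math


def time_to_collision(d_prev, d_curr, dt):
    """Seconds until the distance reaches zero at constant closing speed.

    d_prev, d_curr: distances in metres at two consecutive samples.
    dt: time between the samples in seconds.
    Returns math.inf when the object is not approaching.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    closing_speed = (d_prev - d_curr) / dt  # m/s, positive when approaching
    if closing_speed <= 0:
        return math.inf
    return d_curr / closing_speed
```

In practice the distance estimates would be noisy, so a real system would likely smooth them over several frames before computing such a quantity.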

Critical Analysis

The paper presents a comprehensive framework for UAV collision avoidance, addressing key technical challenges such as object detection, tracking, and distance estimation. The authors have taken a well-rounded approach by integrating multiple computer vision techniques to enable the drone's perception and decision-making capabilities.

One potential limitation of the proposed system is its reliance on a monocular camera alone, which could be affected by environmental conditions (e.g., poor visibility, occlusions) or the drone's own motion. Integrating additional sensors, such as LiDAR or radar, could enhance the system's robustness.

Additionally, as described in the abstract, the framework centers on sensing (detection, tracking, and distance estimation), and it would be interesting to see how its outputs could be coupled with planning and control to execute avoidance maneuvers. The Unified Control Framework for Real-Time Interception of Obstacles paper explores some approaches to this challenge.

Overall, the research presented in this paper represents a significant contribution to the field of UAV safety and collision avoidance, and the proposed framework could have important implications for the widespread adoption and safe operation of drones in various applications.

Conclusion

This paper introduces a vision-based and real-time framework for collision avoidance in unmanned aerial vehicles (UAVs). By leveraging computer vision techniques for object detection, tracking, and distance estimation, the proposed system enables drones to perceive their surroundings and navigate safely, avoiding obstacles in their path.

The key innovations of this work include formulating distance estimation as image-to-image translation with a separate lightweight encoder-decoder network, integrating detection, tracking, and depth estimation into a cohesive framework for UAV safety, and focusing on real-time performance. The authors' evaluation on the AOT dataset further highlights the framework's practical potential.

As drones become increasingly prevalent in various industries and applications, ensuring their safe operation is of paramount importance. The research presented in this paper is a step toward addressing this challenge and toward deploying UAVs in settings that range from search and rescue missions to infrastructure inspection and package delivery.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

UVCPNet: A UAV-Vehicle Collaborative Perception Network for 3D Object Detection

Yuchao Wang, Peirui Cheng, Pengju Tian, Ziyang Yuan, Liangjin Zhao, Jing Tian, Wensheng Wang, Zhirui Wang, Xian Sun

With the advancement of collaborative perception, the role of aerial-ground collaborative perception, a crucial component, is becoming increasingly important. The demand for collaborative perception across different perspectives to construct more comprehensive perceptual information is growing. However, challenges arise due to the disparities in the field of view between cross-domain agents and their varying sensitivity to information in images. Additionally, when we transform image features into Bird's Eye View (BEV) features for collaboration, we need accurate depth information. To address these issues, we propose a framework specifically designed for aerial-ground collaboration. First, to mitigate the lack of datasets for aerial-ground collaboration, we develop a virtual dataset named V2U-COO for our research. Second, we design a Cross-Domain Cross-Adaptation (CDCA) module to align the target information obtained from different domains, thereby achieving more accurate perception results. Finally, we introduce a Collaborative Depth Optimization (CDO) module to obtain more precise depth estimation results, leading to more accurate perception outcomes. We conduct extensive experiments on both our virtual dataset and a public dataset to validate the effectiveness of our framework. Our experiments on the V2U-COO dataset and the DAIR-V2X dataset demonstrate that our method improves detection accuracy by 6.1% and 2.7%, respectively.

6/10/2024

cs.CV

Clustering-based Learning for UAV Tracking and Pose Estimation

Jiaping Xiao, Phumrapee Pisutsin, Cheng Wen Tsao, Mir Feroskhan

UAV tracking and pose estimation plays an imperative role in various UAV-related missions, such as formation control and anti-UAV measures. Accurately detecting and tracking UAVs in a 3D space remains a particularly challenging problem, as it requires extracting sparse features of micro UAVs from different flight environments and continuously matching correspondences, especially during agile flight. Generally, cameras and LiDARs are the two main types of sensors used to capture UAV trajectories in flight. However, both sensors have limitations in UAV classification and pose estimation. This technical report briefly introduces the method proposed by our team NTU-ICG for the CVPR 2024 UG2+ Challenge Track 5. This work develops a clustering-based learning detection approach, CL-Det, for UAV tracking and pose estimation using two types of LiDARs, namely Livox Avia and LiDAR 360. We combine the information from the two data sources to locate drones in 3D. We first align the timestamps of Livox Avia data and LiDAR 360 data and then separate the point cloud of objects of interest (OOIs) from the environment. The point cloud of OOIs is clustered using the DBSCAN method, with the midpoint of the largest cluster assumed to be the UAV position. Furthermore, we utilize historical estimations to fill in missing data. The proposed method shows competitive pose estimation performance and ranks 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge.

5/28/2024

cs.RO cs.AI

UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping

Pengju Tian, Peirui Cheng, Yuchao Wang, Zhechao Wang, Zhirui Wang, Menglong Yan, Xue Yang, Xian Sun

Multi-UAV collaborative 3D object detection can perceive and comprehend complex environments by integrating complementary information, with applications encompassing traffic monitoring, delivery services and agricultural management. However, the extremely broad observations in aerial remote sensing and significant perspective differences across multiple UAVs make it challenging to achieve precise and consistent feature mapping from 2D images to 3D space in multi-UAV collaborative 3D object detection paradigm. To address the problem, we propose an unparalleled camera-based multi-UAV collaborative 3D object detection paradigm called UCDNet. Specifically, the depth information from the UAVs to the ground is explicitly utilized as a strong prior to provide a reference for more accurate and generalizable feature mapping. Additionally, we design a homologous points geometric consistency loss as an auxiliary self-supervision, which directly influences the feature mapping module, thereby strengthening the global consistency of multi-view perception. Experiments on AeroCollab3D and CoPerception-UAVs datasets show our method increases 4.7% and 10% mAP respectively compared to the baseline, which demonstrates the superiority of UCDNet.

6/10/2024

cs.CV

An Attention-Based Deep Learning Architecture for Real-Time Monocular Visual Odometry: Applications to GPS-free Drone Navigation

Olivier Brochu Dufour, Abolfazl Mohebbi, Sofiane Achiche

Drones are increasingly used in fields like industry, medicine, research, disaster relief, defense, and security. Technical challenges, such as navigation in GPS-denied environments, hinder further adoption. Research in visual odometry is advancing, potentially solving GPS-free navigation issues. Traditional visual odometry methods use geometry-based pipelines which, while popular, often suffer from error accumulation and high computational demands. Recent studies utilizing deep neural networks (DNNs) have shown improved performance, addressing these drawbacks. Deep visual odometry typically employs convolutional neural networks (CNNs) and sequence modeling networks like recurrent neural networks (RNNs) to interpret scenes and deduce visual odometry from video sequences. This paper presents a novel real-time monocular visual odometry model for drones, using a deep neural architecture with a self-attention module. It estimates the ego-motion of a camera on a drone, using consecutive video frames. An inference utility processes the live video feed, employing deep learning to estimate the drone's trajectory. The architecture combines a CNN for image feature extraction and a long short-term memory (LSTM) network with a multi-head attention module for video sequence modeling. Tested on two visual odometry datasets, this model converged 48% faster than a previous RNN model and showed a 22% reduction in mean translational drift and a 12% improvement in mean translational absolute trajectory error, demonstrating enhanced robustness to noise.

4/30/2024

cs.RO cs.CV cs.LG eess.IV