Clustering-based Learning for UAV Tracking and Pose Estimation

2405.16867

Published 5/28/2024 by Jiaping Xiao, Phumrapee Pisutsin, Cheng Wen Tsao, Mir Feroskhan


Abstract

UAV tracking and pose estimation play a crucial role in various UAV-related missions, such as formation control and anti-UAV measures. Accurately detecting and tracking UAVs in a 3D space remains a particularly challenging problem, as it requires extracting sparse features of micro UAVs from different flight environments and continuously matching correspondences, especially during agile flight. Generally, cameras and LiDARs are the two main types of sensors used to capture UAV trajectories in flight. However, both sensors have limitations in UAV classification and pose estimation. This technical report briefly introduces the method proposed by our team NTU-ICG for the CVPR 2024 UG2+ Challenge Track 5. This work develops a clustering-based learning detection approach, CL-Det, for UAV tracking and pose estimation using two types of LiDARs, namely Livox Avia and LiDAR 360. We combine the information from the two data sources to locate drones in 3D. We first align the timestamps of Livox Avia data and LiDAR 360 data and then separate the point cloud of objects of interest (OOIs) from the environment. The point cloud of OOIs is clustered using the DBSCAN method, with the midpoint of the largest cluster assumed to be the UAV position. Furthermore, we utilize historical estimations to fill in missing data. The proposed method shows competitive pose estimation performance and ranks 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge.
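The abstract says the Livox Avia and LiDAR 360 timestamps are aligned first, but it does not say how. The following is a minimal sketch that assumes nearest-timestamp matching with a tolerance. The function name `align_timestamps` and the 0.05 s default tolerance are hypothetical and are not taken from the paper.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


def align_timestamps(
    avia_ts: Sequence[float],
    lidar360_ts: Sequence[float],
    max_gap: float = 0.05,  # hypothetical tolerance in seconds
) -> List[Tuple[int, Optional[int]]]:
    """Pair each Livox Avia frame with the nearest LiDAR 360 frame.

    Hypothetical helper: the paper does not describe its alignment method.
    Both inputs are timestamps in seconds, sorted ascending. Returns
    (avia_index, lidar360_index) pairs; the second element is None when
    no LiDAR 360 frame lies within max_gap of the Avia frame.
    """
    pairs: List[Tuple[int, Optional[int]]] = []
    for i, t in enumerate(avia_ts):
        # Insertion point of t among the sorted LiDAR 360 timestamps.
        k = bisect.bisect_left(lidar360_ts, t)
        # The nearest frame is either just before or just after t.
        candidates = [j for j in (k - 1, k) if 0 <= j < len(lidar360_ts)]
        best = min(candidates, key=lambda j: abs(lidar360_ts[j] - t), default=None)
        if best is not None and abs(lidar360_ts[best] - t) <= max_gap:
            pairs.append((i, best))
        else:
            pairs.append((i, None))
    return pairs


avia = [0.0, 0.1, 0.2, 0.3]
lidar360 = [0.01, 0.11, 0.26]
print(align_timestamps(avia, lidar360))  # [(0, 0), (1, 1), (2, None), (3, 2)]
```

Leaving unmatched frames as `None`, rather than forcing a pair, keeps the downstream gap-filling step in charge of frames with no reliable counterpart.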


Overview

This paper presents CL-Det, a clustering-based learning detection approach for tracking unmanned aerial vehicles (UAVs) and estimating their pose, developed by team NTU-ICG for Track 5 of the CVPR 2024 UG2+ Challenge.
The method combines point clouds from two LiDARs, a Livox Avia and a LiDAR 360, and groups the points belonging to objects of interest using the DBSCAN clustering algorithm.
The UAV's 3D position is taken to be the midpoint of the largest cluster, and missing estimates are filled in using historical estimations.

Plain English Explanation

The paper describes a way to find drones (also called UAVs) in 3D space and follow them over time using two LiDAR sensors. A LiDAR measures distances with laser pulses and produces a cloud of 3D points. The system first lines up the recordings from the two sensors in time. It then removes the points that belong to the surrounding environment, so that only points from objects of interest remain. Next, it groups nearby points into clusters and treats the middle of the largest cluster as the drone's position. When an estimate is missing, the system fills the gap using its earlier estimates.

The key idea is that, once the environment is removed, the points returned from the drone form a compact group that clustering can pick out, while stray points are set aside as noise. The authors report that the approach is competitive with other entries in the CVPR 2024 UG2+ Challenge.

Technical Explanation

The paper presents CL-Det, a clustering-based learning detection approach for UAV tracking and pose estimation that uses two types of LiDAR, a Livox Avia and a LiDAR 360. The method first aligns the timestamps of the Livox Avia and LiDAR 360 data so that the two sources can be combined to locate drones in 3D. It then separates the point cloud of objects of interest (OOIs) from the environment.
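The abstract says the OOI point cloud is separated from the environment but does not give the method. The sketch below assumes one common option that is not taken from the paper: a voxel background map. Voxels that are occupied in many frames of a mostly static scene are treated as environment, and points that fall in them are discarded. The names `voxel_of`, `build_background` and `extract_ooi`, the voxel size, and the hit threshold are all hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_of(p: Point, size: float) -> Voxel:
    """Integer index of the cubic voxel (edge length `size`) containing p."""
    return (math.floor(p[0] / size), math.floor(p[1] / size), math.floor(p[2] / size))


def build_background(frames: Iterable[List[Point]], size: float, min_hits: int) -> Set[Voxel]:
    """Hypothetical helper: voxels occupied in >= min_hits frames count as environment.

    Assumes a mostly static scene, so the environment keeps reappearing in
    the same voxels while a moving UAV does not.
    """
    counts: Counter = Counter()
    for frame in frames:
        # Use a set so each voxel is counted at most once per frame.
        counts.update({voxel_of(p, size) for p in frame})
    return {v for v, c in counts.items() if c >= min_hits}


def extract_ooi(frame: List[Point], background: Set[Voxel], size: float) -> List[Point]:
    """Hypothetical helper: keep only points that fall outside background voxels."""
    return [p for p in frame if voxel_of(p, size) not in background]


history = [
    [(5.0, 0.0, 1.0), (5.0, 0.2, 1.0)],
    [(5.05, 0.0, 1.0)],
    [(5.0, 0.1, 1.0), (2.0, 2.0, 2.0)],
]
bg = build_background(history, size=0.5, min_hits=2)
print(extract_ooi([(5.02, 0.1, 1.1), (1.0, 1.0, 3.0)], bg, size=0.5))  # [(1.0, 1.0, 3.0)]
```

Under this assumption, a UAV that hovers in one place long enough would itself be absorbed into the background. That is one reason the choice of `min_hits` matters.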

The OOI point cloud is then clustered with DBSCAN. This density-based algorithm groups points that have enough close neighbors into clusters and labels isolated points as noise, so the number of clusters does not have to be fixed in advance. The midpoint of the largest cluster is assumed to be the UAV's position in 3D.
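The abstract states that the OOI point cloud is clustered with DBSCAN and that the midpoint of the largest cluster is taken as the UAV position. The following is a plain-Python DBSCAN with a brute-force O(n²) neighbor search, written for clarity rather than speed. A practical system would more likely use a spatial index or a library implementation such as scikit-learn's `sklearn.cluster.DBSCAN`. The `eps` and `min_pts` values are hypothetical. "Midpoint" is interpreted here as the center of the cluster's axis-aligned bounding box. That is an assumption; the cluster centroid is an equally plausible reading.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
UNVISITED, NOISE = -2, -1


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1).

    A point is a core point if at least `min_pts` points (itself included)
    lie within distance `eps`; clusters grow outward from core points.
    """
    n = len(points)
    labels = [UNVISITED] * n

    def neighbors(i: int) -> List[int]:
        # Brute-force radius query: O(n) per call.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # former noise joins as a border point
            if labels[j] != UNVISITED:
                continue  # already assigned: do not expand from it
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:
                queue.extend(j_seeds)  # j is also a core point: keep growing
    return labels


def largest_cluster_midpoint(points: Sequence[Point], labels: Sequence[int]) -> Optional[Point]:
    """Bounding-box center of the most populous cluster (assumed meaning of 'midpoint')."""
    sizes = Counter(label for label in labels if label >= 0)
    if not sizes:
        return None  # everything was noise: no estimate for this frame
    biggest, _ = sizes.most_common(1)[0]
    members = [p for p, label in zip(points, labels) if label == biggest]
    # zip(*members) yields the x, y and z coordinates as separate tuples.
    return tuple((min(axis) + max(axis)) / 2 for axis in zip(*members))


cloud = [
    (0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0), (0.1, 0.1, 0.2),  # larger group
    (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (5.0, 5.1, 5.0),                    # smaller group
    (10.0, 0.0, 0.0),                                                    # isolated point
]
labels = dbscan(cloud, eps=0.3, min_pts=3)
print(labels)                                   # [0, 0, 0, 0, 1, 1, 1, -1]
print(largest_cluster_midpoint(cloud, labels))  # (0.05, 0.05, 0.1)
```

Returning `None` when every point is noise gives the pipeline an explicit "missing" signal for that frame instead of a spurious position.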

Where position estimates are missing, the method fills them in using historical estimations. The proposed method shows competitive pose estimation performance and ranks 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge.
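The abstract says historical estimations are used to fill in missing data but does not specify the model. The following is a minimal sketch that assumes constant-velocity extrapolation from the two most recent estimates at evenly spaced frames. When only one past estimate exists, the sketch holds that position. The function name `fill_missing` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def fill_missing(track: Sequence[Optional[Point]]) -> List[Optional[Point]]:
    """Replace None entries in a per-frame position track using past estimates.

    Hypothetical helper with an assumed model (not from the paper): constant
    velocity between evenly spaced frames, so next = last + (last - prev).
    Filled values feed later gaps; leading gaps with no history stay None.
    """
    filled: List[Optional[Point]] = []
    prev: Optional[Point] = None
    last: Optional[Point] = None
    for est in track:
        if est is None and last is not None:
            if prev is None:
                est = last  # only one past estimate: hold position
            else:
                est = tuple(2 * b - a for a, b in zip(prev, last))
        if est is not None:
            prev, last = last, est
        filled.append(est)
    return filled


track = [None, (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), None, None]
print(fill_missing(track))
# [None, (0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0), (3.0, 0.0, 1.0)]
```

Real frames are not always evenly spaced. A variant keyed on timestamps, or a smoother such as a Kalman filter, would be a natural refinement of this assumption.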

Critical Analysis

The paper presents a compact, practical pipeline for UAV tracking and pose estimation built on point cloud clustering. Its challenge ranking indicates that this comparatively simple approach is competitive.

However, the report does not provide a detailed analysis of the pipeline's computational cost or runtime, which matters for UAV applications where low latency is crucial. The authors also do not explore more challenging scenarios, such as tracking several UAVs at once, where the assumption that the largest cluster is the UAV may break down. Nor do they examine fusing LiDAR with camera data, which could further broaden the practical applicability of the technique.

Additionally, the report would benefit from a more thorough discussion of the limitations and potential failure modes of the clustering-based approach. For example, how sensitive is performance to DBSCAN's hyperparameters, namely the neighborhood radius and the minimum number of points required to form a cluster? What types of environments or scenarios might cause the largest cluster to correspond to something other than the UAV?

Overall, the research presents an interesting and potentially impactful contribution to the field of UAV perception and tracking. However, further analysis and exploration of the practical considerations and limitations would strengthen the paper and guide future research in this area.

Conclusion

This paper introduces CL-Det, a clustering-based approach for tracking UAVs and estimating their pose from the point clouds of two LiDARs, a Livox Avia and a LiDAR 360. The method aligns the two data streams and separates objects of interest from the environment. It then clusters the remaining points with DBSCAN, takes the midpoint of the largest cluster as the UAV position, and fills gaps using historical estimations.

The approach shows competitive pose estimation performance, placing 5th on the final leaderboard of the CVPR 2024 UG2+ Challenge. Accurate 3D tracking of UAVs matters for missions such as formation control and anti-UAV measures. While the report leaves some open questions, it represents a promising step forward in UAV perception and tracking.

This summary was produced with help from an AI and may contain inaccuracies; consult the original source documents.

Related Papers

Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge

Tianchen Deng, Yi Zhou, Wenhua Wu, Mingrui Li, Jingwei Huang, Shuhong Liu, Yanzeng Song, Hao Zuo, Yanbo Wang, Yutao Yue, Hesheng Wang, Weidong Chen

This technical report presents the 1st winning model for UG2+, a task in CVPR 2024 UAV Tracking and Pose-Estimation Challenge. This challenge faces difficulties in drone detection, UAV-type classification and 2D/3D trajectory estimation in extreme weather conditions with multi-modal sensor information, including stereo vision, various Lidars, Radars, and audio arrays. Leveraging this information, we propose a multi-modal UAV detection, classification, and 3D tracking method for accurate UAV classification and tracking. A novel classification pipeline which incorporates sequence fusion, region of interest (ROI) cropping, and keyframe selection is proposed. Our system integrates cutting-edge classification techniques and sophisticated post-processing steps to boost accuracy and robustness. The designed pose estimation pipeline incorporates three modules: dynamic points analysis, a multi-object tracker, and trajectory completion techniques. Extensive experiments have validated the effectiveness and precision of our approach. In addition, we also propose a novel dataset pre-processing method and conduct a comprehensive ablation study for our design. We finally achieved the best performance in the classification and tracking of the MMUAD dataset. The code and configuration of our method are available at https://github.com/dtc111111/Multi-Modal-UAV.

5/28/2024

cs.RO cs.CV

Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation

Vasileios Karampinis, Anastasios Arsenos, Orfeas Filippopoulos, Evangelos Petrongonas, Christos Skliros, Dimitrios Kollias, Stefanos Kollias, Athanasios Voulodimos

In the last twenty years, unmanned aerial vehicles (UAVs) have garnered growing interest due to their expanding applications in both military and civilian domains. Detecting non-cooperative aerial vehicles with efficiency and estimating collisions accurately are pivotal for achieving fully autonomous aircraft and facilitating Advanced Air Mobility (AAM). This paper presents a deep-learning framework that utilizes optical sensors for the detection, tracking, and distance estimation of non-cooperative aerial vehicles. In implementing this comprehensive sensing framework, the availability of depth information is essential for enabling autonomous aerial vehicles to perceive and navigate around obstacles. In this work, we propose a method for estimating the distance information of a detected aerial object in real time using only the input of a monocular camera. In order to train our deep learning components for the object detection, tracking and depth estimation tasks we utilize the Amazon Airborne Object Tracking (AOT) Dataset. In contrast to previous approaches that integrate the depth estimation module into the object detector, our method formulates the problem as image-to-image translation. We employ a separate lightweight encoder-decoder network for efficient and robust depth estimation. In a nutshell, the object detection module identifies and localizes obstacles, conveying this information to both the tracking module for monitoring obstacle movement and the depth estimation module for calculating distances. Our approach is evaluated on the Airborne Object Tracking (AOT) dataset which is the largest (to the best of our knowledge) air-to-air airborne object dataset.

5/17/2024

cs.CV cs.LG

Leveraging edge detection and neural networks for better UAV localization

Theo Di Piazza, Enric Meinhardt-Llopis, Gabriele Facciolo, Benedicte Bascle, Corentin Abgrall, Jean-Clement Devaux

We propose a novel method for geolocalizing Unmanned Aerial Vehicles (UAVs) in environments lacking Global Navigation Satellite Systems (GNSS). Current state-of-the-art techniques employ an offline-trained encoder to generate a vector representation (embedding) of the UAV's current view, which is then compared with pre-computed embeddings of geo-referenced images to determine the UAV's position. Here, we demonstrate that the performance of these methods can be significantly enhanced by preprocessing the images to extract their edges, which exhibit robustness to seasonal and illumination variations. Furthermore, we establish that utilizing edges enhances resilience to orientation and altitude inaccuracies. Additionally, we introduce a confidence criterion for localization. Our findings are substantiated through synthetic experiments.

6/4/2024

cs.CV

UCDNet: Multi-UAV Collaborative 3D Object Detection Network by Reliable Feature Mapping

Pengju Tian, Peirui Cheng, Yuchao Wang, Zhechao Wang, Zhirui Wang, Menglong Yan, Xue Yang, Xian Sun

Multi-UAV collaborative 3D object detection can perceive and comprehend complex environments by integrating complementary information, with applications encompassing traffic monitoring, delivery services and agricultural management. However, the extremely broad observations in aerial remote sensing and significant perspective differences across multiple UAVs make it challenging to achieve precise and consistent feature mapping from 2D images to 3D space in multi-UAV collaborative 3D object detection paradigm. To address the problem, we propose an unparalleled camera-based multi-UAV collaborative 3D object detection paradigm called UCDNet. Specifically, the depth information from the UAVs to the ground is explicitly utilized as a strong prior to provide a reference for more accurate and generalizable feature mapping. Additionally, we design a homologous points geometric consistency loss as an auxiliary self-supervision, which directly influences the feature mapping module, thereby strengthening the global consistency of multi-view perception. Experiments on AeroCollab3D and CoPerception-UAVs datasets show our method increases 4.7% and 10% mAP respectively compared to the baseline, which demonstrates the superiority of UCDNet.

6/10/2024

cs.CV