C2FDrone: Coarse-to-Fine Drone-to-Drone Detection using Vision Transformer Networks

arXiv:2404.19276

Published 5/1/2024 by Sairam VC Rebbapragada, Pranoy Panda, Vineeth N Balasubramanian

Abstract

A vision-based drone-to-drone detection system is crucial for various applications like collision avoidance, countering hostile drones, and search-and-rescue operations. However, detecting drones presents unique challenges, including small object sizes, distortion, occlusion, and real-time processing requirements. Current methods integrating multi-scale feature fusion and temporal information have limitations in handling extreme blur and minuscule objects. To address this, we propose a novel coarse-to-fine detection strategy based on vision transformers. We evaluate our approach on three challenging drone-to-drone detection datasets, achieving F1 score enhancements of 7%, 3%, and 1% on the FL-Drones, AOT, and NPS-Drones datasets, respectively. Additionally, we demonstrate real-time processing capabilities by deploying our model on an edge-computing device. Our code will be made publicly available.

Overview

This paper proposes a novel vision-based drone detection system that uses a coarse-to-fine detection strategy based on vision transformers.
The system aims to address the unique challenges of drone detection, such as small object sizes, distortion, occlusion, and real-time processing requirements.
The authors evaluate their approach on three challenging drone-to-drone detection datasets and demonstrate real-time processing capabilities on an edge-computing device.

Plain English Explanation

Drones, or unmanned aerial vehicles, are becoming increasingly common, and spotting one drone from another matters for applications ranging from collision avoidance to search-and-rescue operations. However, detecting drones is difficult: they often occupy only a few pixels in the image, they may be blurred, distorted, or partly hidden, and the detection has to happen in real time.

To address these challenges, the researchers have developed a new drone detection system that uses a "coarse-to-fine" approach. This means the system first scans the scene broadly to flag regions that might contain a drone, and then examines those regions more closely to pin down exactly where each drone is.

The key innovation is the use of "vision transformers," a type of machine learning model that is well-suited for handling complex visual information. The authors report that their system outperforms previous methods on three benchmark datasets, and that it can run in real time on an edge-computing device.

Technical Explanation

The paper proposes a novel coarse-to-fine drone detection strategy based on vision transformers. The approach first uses a coarse detection stage to identify potential drone-like objects, and then a fine detection stage to accurately localize and classify the drones.
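The two-stage idea can be illustrated with a minimal sketch. It is not the authors' model: both stages below are hypothetical stand-ins (local contrast for the coarse stage, a brightness threshold for the fine stage), and the names `coarse_stage`, `fine_stage`, `detect`, `cell`, `k`, and `thresh` are assumptions made for illustration. The point is only the control flow, where a cheap pass over the whole frame selects a few candidate regions and a second pass localizes objects inside them.

```python
import numpy as np

def coarse_stage(frame, cell=32, k=3):
    """Hypothetical coarse stage: score every cell x cell block of a 2-D
    grayscale frame and return the origins (y0, x0) of the k highest-scoring
    blocks. Local contrast (standard deviation) stands in for a learned model."""
    h, w = frame.shape
    rows, cols = h // cell, w // cell
    blocks = frame[:rows * cell, :cols * cell].reshape(rows, cell, cols, cell)
    scores = blocks.std(axis=(1, 3))              # shape (rows, cols)
    order = np.argsort(scores.ravel())[::-1][:k]  # best blocks first
    return [(int(i // cols) * cell, int(i % cols) * cell) for i in order]

def fine_stage(frame, y0, x0, cell=32, thresh=0.5):
    """Hypothetical fine stage: inside one candidate block, return a tight
    box (x_min, y_min, x_max, y_max) around pixels above `thresh`, or None."""
    crop = frame[y0:y0 + cell, x0:x0 + cell]
    ys, xs = np.nonzero(crop > thresh)
    if ys.size == 0:
        return None
    return (x0 + int(xs.min()), y0 + int(ys.min()),
            x0 + int(xs.max()) + 1, y0 + int(ys.max()) + 1)

def detect(frame, cell=32, k=3, thresh=0.5):
    """Run the coarse pass on the full frame, then refine each candidate."""
    boxes = []
    for y0, x0 in coarse_stage(frame, cell, k):
        box = fine_stage(frame, y0, x0, cell, thresh)
        if box is not None:
            boxes.append(box)
    return boxes

# A 128 x 128 synthetic frame containing one tiny bright "drone".
frame = np.zeros((128, 128))
frame[40:44, 70:75] = 1.0
print(detect(frame))
```

In this toy version an object that straddles two blocks would be split into two boxes; a real system would need overlapping regions or a merging step, which the summary above does not describe.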

The coarse detection stage leverages a transformer-based architecture to capture multi-scale features and temporal information, which helps handle challenges like small object sizes and occlusion. The fine detection stage then refines the bounding boxes and class predictions using a similar transformer-based model.
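For readers unfamiliar with the building blocks of a vision transformer, the following sketch shows the two generic operations such models rely on: splitting an image into patch tokens and mixing those tokens with self-attention. It is a generic single-head illustration in NumPy with random, untrained weights, not the architecture used in the paper; the patch size, embedding width, and function names are assumptions.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into non-overlapping patches and flatten
    each one, giving an array of shape (num_patches, patch * patch * C)."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * gw, patch * patch * c)

def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token sequence.
    Returns the mixed tokens and the (N, N) attention weights."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((64, 64, 3))            # stand-in for a video frame
patches = patchify(image)                  # (16, 768)
embed = patches @ rng.normal(scale=0.02, size=(768, 32))  # linear patch embedding
wq, wk, wv = (rng.normal(scale=0.1, size=(32, 32)) for _ in range(3))
mixed, attn = self_attention(embed, wq, wk, wv)
print(patches.shape, mixed.shape, attn.shape)
```

Because every token attends to every other token, a patch containing a tiny drone can draw on context from the whole frame, which is one commonly cited reason for using transformers on small-object tasks.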

The authors evaluate their approach on three challenging drone-to-drone detection datasets: FL-Drones, AOT, and NPS-Drones. They report F1 score improvements of 7%, 3%, and 1%, respectively, over previous state-of-the-art methods. Additionally, they demonstrate that their model can run in real-time on an edge-computing device, which is crucial for many practical applications.
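As a reminder of what the reported metric measures, the F1 score is the harmonic mean of precision and recall. The sketch below computes it from counts of true positives, false positives, and false negatives. How a detection is matched to a ground-truth box (for example, the overlap threshold) is not stated in this summary, so the counts are taken as given, and the function name is an assumption.

```python
def f1_score(tp, fp, fn):
    """F1 from detection counts: tp = correct detections, fp = spurious
    detections, fn = missed drones. Returns 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 80 drones found correctly, 20 false alarms, 20 drones missed.
print(f1_score(80, 20, 20))
```

Since F1 penalizes both false alarms and misses, a gain on this metric means the detector improved without simply trading one kind of error for the other.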

Critical Analysis

The paper presents a well-designed and thorough evaluation of the proposed drone detection system. The use of vision transformers is a novel and promising approach to tackle the unique challenges of drone detection, such as small object sizes and real-time processing requirements.

However, the paper does not fully address the potential limitations of the system. For example, it would be helpful to understand how the system performs in scenarios with extreme environmental conditions, such as heavy fog or rain, which could further degrade the visual input. Additionally, the paper does not discuss the power consumption and hardware requirements of the real-time deployment, which could be an important consideration for some applications.

Furthermore, the authors could have provided more insight into the specific architectural choices and hyperparameter tuning that led to the reported performance improvements. This information would be valuable for researchers looking to build upon this work or adapt the approach to other domains.

Overall, the paper presents a significant contribution to the field of drone detection, and the authors state that their code will be made publicly available, which is commendable. Further research and testing in more diverse and challenging real-world scenarios could help validate the broader applicability and robustness of the proposed system.

Conclusion

This paper introduces a novel coarse-to-fine drone detection system based on vision transformers, which aims to address the unique challenges of drone perception, such as small object sizes, distortion, and real-time processing requirements.

The authors report that their approach outperforms previous state-of-the-art methods on three benchmark datasets (FL-Drones, AOT, and NPS-Drones) and can run in real time on an edge-computing device. This is a significant step toward robust and practical drone-to-drone detection for a wide range of applications, from collision avoidance to search-and-rescue operations.

While the paper presents a well-designed and thorough evaluation, further research is needed to address potential limitations, such as the system's performance in extreme environmental conditions and the hardware requirements for real-time deployment. Nonetheless, the authors' stated plan to release the code publicly is commendable and could spur further advancements in this important and rapidly evolving field.
