Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing

2404.07405

Published 4/12/2024 by Jaemin Kang, Hoeseok Yang, Hyungshin Kim

Abstract

Deep learning has been successfully applied to object detection from remotely sensed images. Images are typically processed on the ground rather than on-board due to the computation power of the ground system. Such offloaded processing causes delays in acquiring target mission information, which hinders its application to real-time use cases. For on-device object detection, research has been conducted on designing efficient detectors or model compression to reduce inference latency. However, highly accurate two-stage detectors still need further exploitation for acceleration. In this paper, we propose a model simplification method for two-stage object detectors. Instead of constructing a general feature pyramid, we utilize only one feature extraction in the two-stage detector. To compensate for the accuracy drop, we apply a high pass filter to the RPN's score map. Our approach is applicable to any two-stage detector using a feature pyramid network. In experiments with state-of-the-art two-stage detectors such as ReDet, Oriented-RCNN, and LSKNet, our method reduced computation costs by up to 61.2% with accuracy loss within 2.1% on the DOTAv1.5 dataset. Source code will be released.

Overview

This paper focuses on simplifying two-stage object detectors to enable on-device inference for remote sensing applications.
The authors propose using a single feature extraction in place of a general feature pyramid, and applying a high pass filter to the Region Proposal Network (RPN) score map to compensate for the resulting accuracy drop.
According to the abstract, experiments with ReDet, Oriented-RCNN, and LSKNet on the DOTAv1.5 dataset show computation costs reduced by up to 61.2% with accuracy loss within 2.1%.

Plain English Explanation

Object detection is a crucial task in remote sensing, where the goal is to identify and locate objects of interest (such as vehicles, buildings, or ships) in aerial or satellite imagery. Traditional object detectors often use a two-stage approach, where the first stage generates potential object proposals, and the second stage classifies and refines these proposals.
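The two-stage flow described above can be sketched in a few lines of Python. This is a toy illustration only: the names `Proposal`, `propose`, and `refine_and_classify`, the grid-cell boxes, and the 0.5 threshold are all hypothetical and are not taken from the paper or from any detection library.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


@dataclass
class Proposal:
    box: Box
    objectness: float


def propose(score_map: List[List[float]], threshold: float = 0.5,
            cell: int = 16) -> List[Proposal]:
    """Stage 1 (toy): every grid cell whose score passes the threshold
    becomes a candidate box. A real RPN regresses box shapes as well."""
    proposals = []
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score >= threshold:
                box = (c * cell, r * cell, (c + 1) * cell, (r + 1) * cell)
                proposals.append(Proposal(box, score))
    return proposals


def refine_and_classify(proposals: List[Proposal],
                        classify: Callable[[Box], str]):
    """Stage 2 (toy): attach a class label to each proposal.
    Box refinement is omitted to keep the sketch short."""
    return [(p.box, classify(p.box), p.objectness) for p in proposals]


if __name__ == "__main__":
    candidates = propose([[0.1, 0.9], [0.7, 0.2]])
    print(refine_and_classify(candidates, lambda box: "ship"))
```

The point of the sketch is the division of labor: the first function only decides where something might be, and the second decides what it is.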

While two-stage detectors generally achieve high accuracy, they can be computationally expensive and resource-intensive, making them challenging to deploy on edge devices with limited computational power and memory. This paper presents a set of techniques to simplify two-stage detectors, making them more efficient for on-device inference in remote sensing applications.

The key ideas are to use only one feature extraction instead of building a general feature pyramid, and to apply a high pass filter to the RPN's score map to recover the accuracy lost by that simplification. The abstract reports that the simplified detectors stay within 2.1% of the original accuracy while cutting computation costs by up to 61.2%, which makes deployment on edge devices more practical.
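As a rough intuition for why working on a single feature level instead of a full feature pyramid (as the abstract describes) can save work, the sketch below counts spatial positions per pyramid level. The 1024-pixel input, the strides 4 to 64, and the choice of stride 8 as the kept level are assumptions made for illustration. Position counts are only a crude proxy for cost, and this does not attempt to reproduce the 61.2% figure from the paper.

```python
def positions_per_level(image_size: int, strides):
    """Number of spatial positions on each feature level of a square image."""
    return {s: (image_size // s) ** 2 for s in strides}


def single_level_fraction(image_size: int, pyramid_strides, kept_stride: int) -> float:
    """Fraction of all pyramid positions that remain if only one level is kept."""
    per_level = positions_per_level(image_size, pyramid_strides)
    return per_level[kept_stride] / sum(per_level.values())


if __name__ == "__main__":
    strides = (4, 8, 16, 32, 64)  # assumed pyramid strides
    print(positions_per_level(1024, strides))
    print(round(single_level_fraction(1024, strides, kept_stride=8), 3))
```

Which level a simplified detector should keep is a design decision that this sketch does not address; it only shows that the per-level counts differ by a factor of four between adjacent strides.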

This research is particularly relevant for real-world remote sensing applications, where the ability to perform object detection on-device, without relying on a powerful central server, can unlock new opportunities for low-latency and privacy-preserving AI-powered services in areas such as intelligent transportation systems, urban planning, and environmental monitoring.

Technical Explanation

The authors start from the observation that highly accurate two-stage detectors remain expensive to run, which hinders on-board deployment and forces images to be processed on the ground. To address this, the abstract describes the following elements:

Single Feature Extraction: Instead of constructing a general feature pyramid, the simplified detector uses only one feature extraction, which lowers the computation cost.
High Pass Filter on the RPN Score Map: To compensate for the accuracy drop caused by removing the pyramid, a high pass filter is applied to the score map of the Region Proposal Network (RPN).
General Applicability: The approach is described as applicable to any two-stage detector that uses a feature pyramid network.
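The abstract mentions applying a high pass filter to the RPN's score map but does not specify the filter. The sketch below shows one generic possibility, a 3x3 Laplacian-style kernel with replicated borders, written in plain Python. The kernel, the border handling, and the name `high_pass` are assumptions for illustration and should not be read as the authors' implementation.

```python
# Zero-sum 3x3 kernel: flat regions map to 0, isolated peaks are emphasized.
HIGH_PASS_KERNEL = [
    [-1, -1, -1],
    [-1,  8, -1],
    [-1, -1, -1],
]


def high_pass(score_map, kernel=HIGH_PASS_KERNEL):
    """Apply a 3x3 filter to a 2D score map, replicating border values."""
    height, width = len(score_map), len(score_map[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Clamp indices so border cells reuse their nearest neighbor.
                    rr = min(max(r + dr, 0), height - 1)
                    cc = min(max(c + dc, 0), width - 1)
                    acc += kernel[dr + 1][dc + 1] * score_map[rr][cc]
            out[r][c] = acc
    return out


if __name__ == "__main__":
    peak = [[0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0]]
    for row in high_pass(peak):
        print(row)
```

A filter of this kind suppresses uniform background response and sharpens localized peaks, which is one plausible reason a high pass filter could help a score map computed from a single feature level.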

The method is evaluated on the DOTAv1.5 dataset with state-of-the-art two-stage detectors such as ReDet, Oriented-RCNN, and LSKNet. The abstract reports computation costs reduced by up to 61.2% with accuracy loss within 2.1%, which makes these detectors more suitable for on-device deployment in remote sensing applications.

Critical Analysis

The paper presents a well-designed and comprehensive approach to simplifying two-stage object detectors for on-device inference in remote sensing. The authors have thoroughly examined the limitations of existing two-stage detectors and proposed a set of targeted modifications to address these challenges.

One potential area for further research is the choice of backbone network, as it can have a significant impact on overall model efficiency and performance. Additionally, the authors could investigate neural architecture search techniques to automatically discover efficient detector architectures tailored for remote sensing use cases.

Moreover, the paper could have delved deeper into the trade-offs between model efficiency and detection accuracy, as in some real-world scenarios, a slight decrease in accuracy may be acceptable if it enables the deployment of the model on resource-constrained edge devices. A more nuanced discussion of these trade-offs and their implications for different remote sensing applications would further strengthen the paper's contribution.

Despite these potential areas for improvement, the paper presents a valuable contribution to the field of on-device object detection for remote sensing, offering practical solutions that can enable the deployment of AI-powered services closer to the data sources, reducing latency and improving privacy.

Conclusion

This paper tackles the challenge of enabling efficient on-device object detection for remote sensing applications by simplifying two-stage detectors. The proposed techniques, replacing the feature pyramid with a single feature extraction and applying a high pass filter to the RPN's score map, reduce computation costs by up to 61.2% with accuracy loss within 2.1% on the DOTAv1.5 dataset, according to the abstract.

The findings of this research are particularly relevant for real-world remote sensing use cases, where the ability to perform object detection directly on edge devices can unlock new opportunities for low-latency and privacy-preserving AI-powered services. The authors' work serves as a valuable contribution to the ongoing efforts to bring powerful computer vision capabilities closer to the data sources, paving the way for more intelligent and responsive remote sensing systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!
