A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

Read original: arXiv:2406.05755 - Published 6/18/2024 by Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng


Overview

Presents a novel deep learning model for detecting tiny objects in aerial images
Combines a DeNoising Feature Pyramid Network (DN-FPN) with a Transformer-based R-CNN detector (Trans R-CNN); the authors call the combined framework DNTR
Aims to enhance the detection of small and cluttered objects in challenging remote sensing scenarios

Plain English Explanation

This research paper introduces a new deep learning model designed to detect tiny objects, like vehicles or structures, in aerial imagery. The key innovations are a Denoising FPN that helps reduce noise and enhance small details, and a Transformer-based R-CNN that uses self-attention mechanisms to better identify small, crowded objects.

The goal is to improve upon existing object detection methods, which can struggle to find tiny targets amidst clutter or low resolution in aerial/satellite imagery. By combining denoising and transformer-based components, the model aims to be more robust and accurate for these challenging remote sensing scenarios.

Technical Explanation

The proposed model, which the authors call DNTR, consists of two main components: a DeNoising Feature Pyramid Network (DN-FPN) and a Transformer-based R-CNN detector (Trans R-CNN).

The DeNoising FPN builds on a standard FPN architecture and is designed as a plug-in module. Fusing features from different scales can produce noisy features, because nothing regularizes the features across pyramid levels. DN-FPN suppresses that noise in each level's features along the top-down path, which helps preserve information about tiny objects that could otherwise be lost.
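To make the fusion step concrete, here is a minimal sketch of the plain top-down merge that a standard FPN performs and that DN-FPN builds on. It is an illustration under simplifying assumptions, not the authors' code: the maps are single-channel nested lists, the lateral 1x1 convolutions are left out, and the helper names `upsample_nearest_2x` and `top_down_merge` are hypothetical.

```python
# Illustrative sketch only: plain FPN top-down fusion on single-channel maps.
# Lateral 1x1 convolutions and the paper's denoising step are omitted.

def upsample_nearest_2x(feature):
    """Double height and width by repeating each value (nearest neighbour)."""
    out = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def top_down_merge(coarse, fine):
    """Add the upsampled coarse map to the finer map, element by element."""
    up = upsample_nearest_2x(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly 2x the size of the coarse map")
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


# A 2x2 coarse level fused into a 4x4 fine level.
coarse = [[1.0, 2.0],
          [3.0, 4.0]]
fine = [[0.1] * 4 for _ in range(4)]
merged = top_down_merge(coarse, fine)
```

Each coarse value is spread over a 2x2 block of the finer map before the addition. Nothing in this step constrains how well the two levels agree, which is the gap the denoising module is meant to address.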

The Transformer R-CNN (Trans R-CNN) keeps the two-stage detection framework but replaces the conventional R-CNN detection head with a Transformer-based one. Self-attention lets the model capture long-range dependencies and contextual information, which is crucial for identifying small, clustered objects. Transformer-based detectors such as the DETR family have also shown promise for tiny object detection.
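The self-attention operation mentioned above can be sketched in a few lines. This is generic scaled dot-product self-attention, not the Trans R-CNN head itself: it assumes identity query, key, and value projections and a single head, and the toy `tokens` stand in for feature vectors taken from one region proposal.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real head would learn these projections.
    """
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        # Similarity of this token to every token, scaled by sqrt(dim).
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in tokens]
        weights = softmax(scores)
        # Each output is a weighted average of all tokens.
        attended.append([sum(w * value[i] for w, value in zip(weights, tokens))
                         for i in range(dim)])
    return attended


# Three toy 2-d tokens standing in for features from one region proposal.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Every output row mixes information from all tokens, so a feature covering only a few pixels can draw on context from the rest of the region.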

The authors use contrastive learning to carry out this denoising: DN-FPN applies it to each level's features in the top-down path of the FPN, which in turn improves the model's ability to recognize small targets.
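As a rough picture of what a contrastive objective does, the sketch below implements a generic InfoNCE-style loss. It is not the authors' formulation: the choice of cosine similarity, the `temperature` value, and the way positives and negatives are picked are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: low when the anchor is closer to its positive
    than to any negative. The temperature value is an assumed default."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    # log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_sum - logits[0]


# Positive aligned with the anchor: the loss is close to zero.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
# A negative is aligned with the anchor instead: the loss is much larger.
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]])
```

Minimizing a loss of this shape pulls matching features together and pushes mismatched ones apart, which is the general mechanism a contrastive denoising objective relies on.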

Critical Analysis

The paper presents a well-designed approach that leverages several state-of-the-art techniques to tackle the challenging problem of tiny object detection in aerial imagery. The combination of the DeNoising FPN and Transformer R-CNN appears to be a reasonable and innovative solution.

However, the authors do not provide extensive comparisons to other recent methods like C2FDrone, which also aim to address tiny object detection. More thorough benchmarking against a wider range of baselines would help better evaluate the relative merits of this approach.

Additionally, the paper would benefit from a more in-depth discussion of potential limitations or failure cases of the proposed model. For example, it's unclear how the model would perform on extremely crowded scenes or in the presence of significant occlusions, which could still pose challenges even with the denoising and transformer components.

Conclusion

This research introduces a novel deep learning architecture that combines a denoising feature pyramid network and a transformer-based object detector to enhance the detection of tiny objects in aerial imagery. The technical innovations, including the DeNoising FPN and Transformer R-CNN, show promise for improving upon existing methods in this challenging computer vision task.

While the paper presents a well-designed approach, further evaluation against a broader set of baselines and a more thorough discussion of potential limitations would strengthen the overall contribution. Nonetheless, this work represents an interesting advance in the field of remote sensing object detection, with potential applications in areas like urban planning, infrastructure monitoring, and disaster response.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

A DeNoising FPN With Transformer R-CNN for Tiny Object Detection

Hou-I Liu, Yu-Wen Tseng, Kai-Cheng Chang, Pin-Jyun Wang, Hong-Han Shuai, Wen-Huang Cheng

Despite notable advancements in the field of computer vision, the precise detection of tiny objects continues to pose a significant challenge, largely owing to the minuscule pixel representation allocated to these objects in imagery data. This challenge resonates profoundly in the domain of geoscience and remote sensing, where high-fidelity detection of tiny objects can facilitate a myriad of applications ranging from urban planning to environmental monitoring. In this paper, we propose a new framework, namely, DeNoising FPN with Trans R-CNN (DNTR), to improve the performance of tiny object detection. DNTR consists of an easy plug-in design, DeNoising FPN (DN-FPN), and an effective Transformer-based detector, Trans R-CNN. Specifically, feature fusion in the feature pyramid network is important for detecting multiscale objects. However, noisy features may be produced during the fusion process since there is no regularization between the features of different scales. Therefore, we introduce a DN-FPN module that utilizes contrastive learning to suppress noise in each level's features in the top-down path of FPN. Second, based on the two-stage framework, we replace the obsolete R-CNN detector with a novel Trans R-CNN detector to focus on the representation of tiny objects with self-attention. Experimental results manifest that our DNTR outperforms the baselines by at least 17.4% in terms of APvt on the AI-TOD dataset and 9.6% in terms of AP on the VisDrone dataset, respectively. Our code will be available at https://github.com/hoiliu-0801/DNTR.

6/18/2024

LR-FPN: Enhancing Remote Sensing Object Detection with Location Refined Feature Pyramid Network

Hanqian Li, Ruinan Zhang, Ye Pan, Junchi Ren, Fei Shen

Remote sensing target detection aims to identify and locate critical targets within remote sensing images, finding extensive applications in agriculture and urban planning. Feature pyramid networks (FPNs) are commonly used to extract multi-scale features. However, existing FPNs often overlook extracting low-level positional information and fine-grained context interaction. To address this, we propose a novel location refined feature pyramid network (LR-FPN) to enhance the extraction of shallow positional information and facilitate fine-grained context interaction. The LR-FPN consists of two primary modules: the shallow position information extraction module (SPIEM) and the contextual interaction module (CIM). Specifically, SPIEM first maximizes the retention of solid location information of the target by simultaneously extracting positional and saliency information from the low-level feature map. Subsequently, CIM injects this robust location information into different layers of the original FPN through spatial and channel interaction, explicitly enhancing the object area. Moreover, in spatial interaction, we introduce a simple local and non-local interaction strategy to learn and retain the saliency information of the object. Lastly, the LR-FPN can be readily integrated into common object detection frameworks to improve performance significantly. Extensive experiments on two large-scale remote sensing datasets (i.e., DOTAV1.0 and HRSC2016) demonstrate that the proposed LR-FPN is superior to state-of-the-art object detection approaches. Our code and models will be publicly available.

4/3/2024

DQ-DETR: DETR with Dynamic Query for Tiny Object Detection

Hou-I Liu, Yi-Xin Huang, Hong-Han Shuai, Wen-Huang Cheng

Despite previous DETR-like methods having performed successfully in generic object detection, tiny object detection is still a challenging task for them since the positional information of object queries is not customized for detecting tiny objects, whose scale is extraordinarily smaller than general objects. Also, DETR-like methods using a fixed number of queries make them unsuitable for aerial datasets, which only contain tiny objects, and the numbers of instances are imbalanced between different images. Thus, we present a simple yet effective model, named DQ-DETR, which consists of three different components: categorical counting module, counting-guided feature enhancement, and dynamic query selection to solve the above-mentioned problems. DQ-DETR uses the prediction and density maps from the categorical counting module to dynamically adjust the number of object queries and improve the positional information of queries. Our model DQ-DETR outperforms previous CNN-based and DETR-like methods, achieving state-of-the-art mAP 30.2% on the AI-TOD-V2 dataset, which mostly consists of tiny objects. Our code will be available at https://github.com/Katie0723/DQ-DETR.

9/24/2024

DETRs Beat YOLOs on Real-time Object Detection

Yian Zhao, Wenyu Lv, Shangliang Xu, Jinman Wei, Guanzhong Wang, Qingqing Dang, Yi Liu, Jie Chen

The YOLO series has become the most popular framework for real-time object detection due to its reasonable trade-off between speed and accuracy. However, we observe that the speed and accuracy of YOLOs are negatively affected by the NMS. Recently, end-to-end Transformer-based detectors (DETRs) have provided an alternative to eliminating NMS. Nevertheless, the high computational cost limits their practicality and hinders them from fully exploiting the advantage of excluding NMS. In this paper, we propose the Real-Time DEtection TRansformer (RT-DETR), the first real-time end-to-end object detector to our best knowledge that addresses the above dilemma. We build RT-DETR in two steps, drawing on the advanced DETR: first we focus on maintaining accuracy while improving speed, followed by maintaining speed while improving accuracy. Specifically, we design an efficient hybrid encoder to expeditiously process multi-scale features by decoupling intra-scale interaction and cross-scale fusion to improve speed. Then, we propose the uncertainty-minimal query selection to provide high-quality initial queries to the decoder, thereby improving accuracy. In addition, RT-DETR supports flexible speed tuning by adjusting the number of decoder layers to adapt to various scenarios without retraining. Our RT-DETR-R50 / R101 achieves 53.1% / 54.3% AP on COCO and 108 / 74 FPS on T4 GPU, outperforming previously advanced YOLOs in both speed and accuracy. We also develop scaled RT-DETRs that outperform the lighter YOLO detectors (S and M models). Furthermore, RT-DETR-R50 outperforms DINO-R50 by 2.2% AP in accuracy and about 21 times in FPS. After pre-training with Objects365, RT-DETR-R50 / R101 achieves 55.3% / 56.2% AP. The project page: https://zhao-yian.github.io/RTDETR.

4/4/2024