Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision

Read original: arXiv:2304.01484 - Published 8/26/2024 by Xinyi Ying, Li Liu, Yingqian Wang, Ruojing Li, Nuo Chen, Zaiping Lin, Weidong Sheng, Shilin Zhou

Overview

Training convolutional neural networks (CNNs) to detect infrared small targets under full supervision requires a large number of per-pixel annotations, which are labor-intensive to produce.
To address this, the researchers propose label evolution with single point supervision (LESPS), a method that trains CNNs using only point-level supervision rather than full pixel-level annotations.
When training on point labels, the researchers observe that CNNs first learn to segment a cluster of pixels near the target, and then gradually converge to predict the ground truth point labels.
Motivated by this, the LESPS framework progressively expands the point labels using the CNN's intermediate predictions, allowing the network to learn a pixel-level target mask in an end-to-end manner.

Plain English Explanation

The paper focuses on training convolutional neural networks (CNNs) to detect small infrared targets. Typically, this requires detailed annotations for every pixel in the image, which is very time-consuming and expensive.

To make this process easier, the researchers propose a new method called label evolution with single point supervision (LESPS). With LESPS, the CNN only needs to be trained on simple point-level annotations: a single point marking each target.

Interestingly, the researchers found that during this point-level training, the CNN first learns to segment a cluster of pixels around the target, and then gradually shrinks that prediction until it matches only the labeled ground truth point. Inspired by this, the LESPS framework dynamically expands the point labels using the CNN's own predictions. This allows the network to eventually learn a full pixel-level target mask, without needing the labor-intensive full pixel annotations.

Technical Explanation

The key innovation in this paper is the label evolution with single point supervision (LESPS) framework. Traditionally, training CNNs for infrared small target detection requires detailed per-pixel annotations, which is very labor-intensive.

To address this, the researchers propose using only simple point-level supervision during training. Surprisingly, they find that even with just these point labels, the CNN first learns to segment a cluster of pixels around the target, and then gradually converges to predict the exact ground truth point.

Motivated by this "mapping degeneration" phenomenon, the LESPS method dynamically expands the point labels using the CNN's own intermediate predictions. Specifically, the point labels are iteratively grown into larger regions based on the network's outputs. This allows the CNN to gradually learn a full pixel-level target mask in an end-to-end manner, without needing the costly full pixel annotations.
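The label-growing idea in the paragraph above can be illustrated with a small sketch. It is not the authors' update rule: the function name `evolve_label`, the relative threshold `rel_thresh`, the neighborhood `radius`, and the toy prediction map are all assumptions made for illustration, and the sketch handles a single target using plain Python lists to stay dependency-free.

```python
def evolve_label(label, pred, rel_thresh=0.5, radius=1):
    """One simplified label-evolution step (illustrative, not the paper's rule).

    label: 2D list of 0/1, the current pseudo label.
    pred:  2D list of floats in [0, 1], an intermediate network prediction.
    Pixels near the current label are added when their predicted score is
    at least `rel_thresh` times the peak score in that neighborhood.
    """
    height, width = len(label), len(label[0])

    # Collect every pixel within `radius` of an already labeled pixel.
    window = set()
    for y in range(height):
        for x in range(width):
            if label[y][x]:
                for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                        window.add((ny, nx))

    new_label = [row[:] for row in label]  # copy, so the input is not mutated
    if not window:
        return new_label

    peak = max(pred[y][x] for y, x in window)
    if peak <= 0:
        return new_label  # nothing confident enough to grow into

    for y, x in window:
        if pred[y][x] >= rel_thresh * peak:
            new_label[y][x] = 1
    return new_label


# Toy prediction: a plus-shaped blob centered on the annotated point.
pred = [
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.2, 0.6, 0.1, 0.0],
    [0.0, 0.7, 0.9, 0.5, 0.0],
    [0.0, 0.1, 0.6, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
point_label = [[0] * 5 for _ in range(5)]
point_label[2][2] = 1  # the single-point annotation

step1 = evolve_label(point_label, pred)
step2 = evolve_label(step1, pred)
print(sum(map(sum, step1)))  # 5 labeled pixels after one step
print(step2 == step1)        # True: the label stops growing at the blob edge
```

In this toy case the label grows from one pixel to the five-pixel blob and then stabilizes, because the surrounding scores fall below the relative threshold. In actual training the prediction map changes between updates, so the pseudo label and the network evolve together.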

The researchers conduct extensive experiments to validate the effectiveness of LESPS. They show that CNNs trained with LESPS can recover the target masks from just point-level supervision, achieving over 70% of the performance of fully supervised CNNs in terms of pixel-level intersection over union (IoU), and over 95% in terms of object-level probability of detection (Pd).
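For readers unfamiliar with the two metrics, the following sketch shows one way they could be computed. The helper names `pixel_iou` and `probability_of_detection` are hypothetical, and the matching rule for Pd (a target counts as detected when a predicted centroid lies within `max_dist` pixels of its centroid) and the default value of 3.0 are assumptions of this sketch rather than details confirmed by the summary.

```python
import math


def pixel_iou(pred_mask, gt_mask):
    """Pixel-level IoU between two binary masks given as 2D lists."""
    inter = union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 1.0  # two empty masks agree fully


def probability_of_detection(pred_centroids, gt_centroids, max_dist=3.0):
    """Fraction of ground truth targets matched by a nearby prediction.

    Each predicted centroid can match at most one target. The distance
    rule and the default `max_dist` are assumptions of this sketch.
    """
    if not gt_centroids:
        return 1.0
    unused = list(pred_centroids)
    hits = 0
    for gy, gx in gt_centroids:
        for candidate in unused:
            if math.hypot(candidate[0] - gy, candidate[1] - gx) <= max_dist:
                hits += 1
                unused.remove(candidate)
                break
    return hits / len(gt_centroids)


gt_mask = [[0, 1, 0],
           [1, 1, 0],
           [0, 1, 0]]
pred_mask = [[0, 1, 0],
             [0, 1, 0],
             [0, 1, 0]]
iou = pixel_iou(pred_mask, gt_mask)  # 3 shared pixels / 4 in the union

pd = probability_of_detection(
    pred_centroids=[(2.0, 3.0), (20.0, 20.0)],
    gt_centroids=[(2.0, 2.0), (10.0, 10.0)],
)  # one of the two targets is matched
print(iou, pd)  # 0.75 0.5
```

The example illustrates why the two numbers can diverge: a prediction can land on nearly every target (high Pd) while still missing part of each target's extent (lower IoU).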

Critical Analysis

The LESPS framework represents an innovative approach to reducing the annotation burden for training CNNs on infrared small target detection. By leveraging the network's own intermediate predictions to progressively expand the point-level labels, the researchers are able to avoid the need for full pixel-level annotations.

However, the paper does not fully address potential limitations or edge cases of the LESPS method. For example, it's unclear how well the approach would scale to more complex scenes with many closely-spaced targets, or how robust it would be to noisy or inaccurate point labels provided by human annotators.

Additionally, while the researchers provide strong quantitative results, more qualitative analysis or visualizations of the intermediate stages of label evolution could help readers better understand the underlying dynamics and failure modes of the approach.

Further research could also explore ways to make the label expansion process more principled or automated, rather than relying on heuristics. Incorporating uncertainty estimates or active learning strategies may be fruitful avenues to investigate.

Overall, the LESPS framework represents an important step forward in reducing annotation costs for infrared small target detection. With further refinement and analysis, it could become a valuable tool in the computer vision practitioner's toolkit.

Conclusion

This paper presents a novel label evolution with single point supervision (LESPS) framework for training convolutional neural networks (CNNs) to detect infrared small targets. By leveraging the CNN's own intermediate predictions to progressively expand simple point-level annotations, LESPS recovers a large share of fully supervised performance without the need for labor-intensive full pixel-level annotations.

The researchers' key insight is that CNNs trained on point labels first learn to segment a cluster around the target, and then gradually converge to the exact ground truth point. Motivated by this "mapping degeneration" phenomenon, LESPS dynamically grows the point labels to allow the network to learn a full pixel-level target mask in an end-to-end manner.

Extensive experiments show that CNNs trained with LESPS can recover detailed target masks while achieving over 70% of the performance of fully supervised networks in terms of pixel-level intersection over union (IoU), and over 95% in object-level probability of detection (Pd). This represents an important step forward in reducing annotation costs for infrared small target detection, with potential applications in a variety of computer vision tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Mapping Degeneration Meets Label Evolution: Learning Infrared Small Target Detection with Single Point Supervision

Xinyi Ying, Li Liu, Yingqian Wang, Ruojing Li, Nuo Chen, Zaiping Lin, Weidong Sheng, Shilin Zhou

Training a convolutional neural network (CNN) to detect infrared small targets in a fully supervised manner has gained remarkable research interest in recent years, but is highly labor-expensive since a large number of per-pixel annotations are required. To handle this problem, in this paper, we make the first attempt to achieve infrared small target detection with point-level supervision. Interestingly, during the training phase supervised by point labels, we discover that CNNs first learn to segment a cluster of pixels near the targets, and then gradually converge to predict groundtruth point labels. Motivated by this mapping degeneration phenomenon, we propose a label evolution framework named label evolution with single point supervision (LESPS) to progressively expand the point label by leveraging the intermediate predictions of CNNs. In this way, the network predictions can finally approximate the updated pseudo labels, and a pixel-level target mask can be obtained to train CNNs in an end-to-end manner. We conduct extensive experiments with insightful visualizations to validate the effectiveness of our method. Experimental results show that CNNs equipped with LESPS can well recover the target masks from corresponding point labels, and can achieve over 70% and 95% of their fully supervised performance in terms of pixel-level intersection over union (IoU) and object-level probability of detection (Pd), respectively. Code is available at https://github.com/XinyiYing/LESPS.

8/26/2024

Refined Infrared Small Target Detection Scheme with Single-Point Supervision

Jinmiao Zhao, Zelin Shi, Chuang Yu, Yunpeng Liu

Recently, infrared small target detection with single-point supervision has attracted extensive attention. However, the detection accuracy of existing methods has difficulty meeting actual needs. Therefore, we propose an innovative refined infrared small target detection scheme with single-point supervision, which has excellent segmentation accuracy and detection rate. Specifically, we introduce the label evolution with single point supervision (LESPS) framework and explore the performance of various excellent infrared small target detection networks based on this framework. Meanwhile, to improve the comprehensive performance, we construct a complete post-processing strategy. On the one hand, to improve the segmentation accuracy, we use a combination of test-time augmentation (TTA) and conditional random field (CRF) for post-processing. On the other hand, to improve the detection rate, we introduce an adjustable sensitivity (AS) strategy for post-processing, which fully considers the advantages of multiple detection results and reasonably adds some areas with low confidence to the fine segmentation image in the form of centroid points. In addition, to further improve the performance and explore the characteristics of this task, on the one hand, we construct a multi-stage loss and find that it is helpful for fine-grained detection. On the other hand, we find that a reasonable sliding window cropping strategy for test samples has better performance for actual multi-size samples. Extensive experimental results show that the proposed scheme achieves state-of-the-art (SOTA) performance. Notably, the proposed scheme won third place in the ICPR 2024 Resource-Limited Infrared Small Target Detection Challenge Track 1: Weakly Supervised Infrared Small Target Detection.

8/7/2024

Beyond Full Label: Single-Point Prompt for Infrared Small Target Label Generation

Shuai Yuan, Hanlin Qin, Renke Kou, Xiang Yan, Zechuan Li, Chenxu Peng, Abd-Krim Seghouane

In this work, we make the first attempt to construct a learning-based single-point annotation paradigm for infrared small target label generation (IRSTLG). Our intuition is that label generation requires just one more point prompt than target detection: IRSTLG can be regarded as an infrared small target detection (IRSTD) task with the target location hint. Based on this insight, we introduce an energy double guided single-point prompt (EDGSP) framework, which adeptly transforms the target detection network into a refined label generation method. Specifically, the proposed EDGSP includes: 1) target energy initialization (TEI) to create a foundational outline for sufficient shape evolution of pseudo label, 2) double prompt embedding (DPE) for rapid localization of interested regions and reinforcement of individual differences to avoid label adhesion, and 3) bounding box-based matching (BBM) to eliminate false alarms. Experimental results show that pseudo labels generated by three baselines equipped with EDGSP achieve 100% object-level probability of detection (Pd) and 0% false-alarm rate (Fa) on SIRST, NUDT-SIRST, and IRSTD-1k datasets, with a pixel-level intersection over union (IoU) improvement of 13.28% over state-of-the-art (SOTA) label generation methods. In the practical application of downstream IRSTD, EDGSP realizes, for the first time, a single-point generated pseudo mask beyond the full label. Even with coarse single-point annotations, it still achieves 99.5% performance of full labeling.

8/22/2024

Hybrid Mask Generation for Infrared Small Target Detection with Single-Point Supervision

Weijie He, Mushui Liu, Yunlong Yu, Zheming Lu, Xi Li

Single-frame infrared small target (SIRST) detection poses a significant challenge due to the requirement to discern minute targets amidst complex infrared background clutter. Recently, deep learning approaches have shown promising results in this domain. However, these methods heavily rely on extensive manual annotations, which are particularly cumbersome and resource-intensive for infrared small targets owing to their minute sizes. To address this limitation, we introduce a Hybrid Mask Generation (HMG) approach that recovers high-quality masks for each target from only a single-point label for network training. Specifically, our HMG approach consists of a handcrafted Points-to-Mask Generation strategy coupled with a pseudo mask updating strategy to recover and refine pseudo masks from point labels. The Points-to-Mask Generation strategy is divided into two distinct stages: Points-to-Box conversion, where individual point labels are transformed into bounding boxes, and subsequently, Box-to-Mask prediction, where these bounding boxes are elaborated into precise masks. The mask updating strategy integrates the complementary strengths of handcrafted and deep-learning algorithms to iteratively refine the initial pseudo masks. Experimental results across three datasets demonstrate that our method outperforms the existing methods for infrared small target detection with single-point supervision.

9/9/2024