P2RBox: Point Prompt Oriented Object Detection with SAM

Read original: arXiv:2311.13128 - Published 5/24/2024 by Guangming Cao, Xuehui Yu, Wenwen Yu, Xumeng Han, Xue Yang, Guorong Li, Jianbin Jiao, Zhenjun Han


Overview

This paper introduces P2RBox, a method for oriented object detection in remote sensing scenarios using single-point annotations.
The key innovation is the use of point prompts to generate high-quality rotated bounding box (RBox) annotations, addressing the granularity ambiguity of previous point-based methods.
P2RBox leverages the Segment Anything Model (SAM) to generate mask proposals, which are then refined using semantic and spatial information from the annotation points.
Two advanced guidance cues, Boundary Sensitive Mask guidance and Centrality guidance, are incorporated to enhance detection capabilities.
The effectiveness of P2RBox is demonstrated by integrating it with three different detectors. It outperforms the state-of-the-art point-annotated generative method PointOBB by about 29 mAP points (62.43% vs 33.31%) on the DOTA-v1.0 dataset.

Plain English Explanation

Oriented object detection means locating objects in aerial or satellite images with rotated bounding boxes that follow each object's orientation, and it is a challenging task. One approach that is gaining attention is single-point annotation, where annotators mark only a single point within each object of interest instead of drawing a full rotated box. This is cheaper, but a single point says nothing about the object's size or orientation, and it can even be unclear whether the point refers to a part of an object or to the whole object.

The researchers behind P2RBox have come up with a way to address this issue. Their method uses the Segment Anything Model (SAM) to generate initial mask proposals for the objects based on the single-point annotations. These masks are then refined using additional information about the objects' semantic and spatial characteristics.

The key innovations in P2RBox are two "guidance cues" that help improve the quality of the final object detections:

- Boundary Sensitive Mask guidance: This uses the semantic information about the object's boundaries to refine the mask.
- Centrality guidance: This utilizes the spatial information about the object's center point to reduce the ambiguity in the mask.

By incorporating these guidance cues, P2RBox is able to generate much more accurate rotated bounding boxes (RBoxes) for the objects compared to previous single-point annotation methods. The researchers demonstrate that integrating P2RBox with three different object detectors leads to significant performance improvements.
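To make the centrality idea concrete, here is a minimal sketch, not the paper's formulation. It assumes masks are plain sets of pixel coordinates, and the scoring function `centrality_score` is a hypothetical stand-in: it simply rewards candidate masks whose centroid lies close to the annotated point, relative to the mask's size.

```python
import math

def centroid(mask):
    """Mean (x, y) of a mask given as a set of pixel coordinates."""
    n = len(mask)
    return (sum(x for x, _ in mask) / n, sum(y for _, y in mask) / n)

def centrality_score(mask, point):
    """Hypothetical score in [0, 1]: 1 when the point sits on the centroid.

    The distance is divided by sqrt(area) so that large and small masks
    are compared on the same scale.
    """
    if point not in mask:
        return 0.0  # a mask that misses the annotated point is ruled out
    cx, cy = centroid(mask)
    dist = math.hypot(point[0] - cx, point[1] - cy)
    return 1.0 / (1.0 + dist / math.sqrt(len(mask)))

def rect(x0, y0, x1, y1):
    """Axis-aligned block of pixels, inclusive on both ends."""
    return {(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)}

# Three candidate masks for one annotated point, at different granularities.
point = (10, 5)
candidates = {
    "part": rect(9, 4, 20, 6),                      # only a part of the object
    "object": rect(0, 0, 20, 10),                   # the whole object
    "object_plus_background": rect(0, 0, 60, 10),   # object plus surroundings
}
scores = {name: centrality_score(m, point) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setup the point lies exactly at the center of the "object" candidate, so that mask wins over both the partial mask and the oversized one. The example only works because the annotated point is near the object's center; a point placed elsewhere would need additional cues.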

Importantly, P2RBox outperforms the previous state-of-the-art point-annotated method, PointOBB, by about 29 percentage points of mAP (62.43% vs 33.31%) on DOTA-v1.0, a standard remote sensing dataset. This shows the potential for P2RBox to enable practical applications of single-point annotation, which could greatly reduce the cost and effort required for object detection in aerial and satellite imagery.

Technical Explanation

The key technical innovations in P2RBox are:

- Mask Generation: P2RBox uses the Segment Anything Model (SAM) to generate initial mask proposals for the objects based on the single-point annotations.
- Mask Refinement: The generated masks are then refined using two advanced guidance cues:
  - Boundary Sensitive Mask guidance: This leverages the semantic information about the object's boundaries to improve the mask quality.
  - Centrality guidance: This utilizes the spatial information about the object's center point to reduce the ambiguity in the mask.
- RBox Generation: The refined masks are converted into oriented bounding boxes (RBoxes) based on the feature directions suggested by the model.
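The last step, turning a mask into a rotated box, can be sketched as follows. This is an illustration under a stated assumption, not the paper's procedure: the paper takes the box direction from feature directions suggested by the model, whereas this sketch substitutes the principal axis of the mask's pixel coordinates (PCA). The function name `mask_to_rbox` and the `(cx, cy, w, h, angle)` output layout are chosen here for illustration.

```python
import math

def mask_to_rbox(pixels):
    """Fit a rotated box (cx, cy, w, h, angle_deg) to mask pixels.

    Assumption: the box is aligned with the principal axis of the pixel
    cloud (PCA). The paper's own direction estimate may differ.
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Entries of the 2x2 covariance matrix of the pixel coordinates.
    sxx = sum((x - cx) ** 2 for x, _ in pixels) / n
    syy = sum((y - cy) ** 2 for _, y in pixels) / n
    sxy = sum((x - cx) * (y - cy) for x, y in pixels) / n
    # Closed-form orientation of the leading eigenvector.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)   # major axis
    vx, vy = -uy, ux                            # minor axis
    # Project pixels onto both axes; the extents give the box sides.
    along = [(x - cx) * ux + (y - cy) * uy for x, y in pixels]
    across = [(x - cx) * vx + (y - cy) * vy for x, y in pixels]
    w = max(along) - min(along)
    h = max(across) - min(across)
    # Center the box on the midpoint of the extents, not the centroid.
    mid_a = (max(along) + min(along)) / 2.0
    mid_c = (max(across) + min(across)) / 2.0
    bx = cx + mid_a * ux + mid_c * vx
    by = cy + mid_a * uy + mid_c * vy
    return bx, by, w, h, math.degrees(theta)

# A thin diagonal strip of pixels, two pixels thick.
strip = [(i, i + d) for i in range(50) for d in (0, 1)]
bx, by, w, h, angle = mask_to_rbox(strip)
print(f"center=({bx:.1f}, {by:.1f}) w={w:.1f} h={h:.1f} angle={angle:.1f}")
```

For the diagonal strip the fitted angle comes out close to 45 degrees and the box is much longer than it is wide, where an axis-aligned box would cover a roughly square region that is mostly background. PCA is a reasonable default for elongated objects but becomes unstable for near-square or near-circular masks, where the principal axis is poorly defined.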

To demonstrate the effectiveness of P2RBox, the authors integrate it with three different detectors and report improvements over the baseline.

Furthermore, P2RBox outperforms the state-of-the-art point-annotated generative method, PointOBB, by about 29 mAP points (62.43% vs 33.31%) on the DOTA-v1.0 dataset. This demonstrates the effectiveness of P2RBox in addressing the granularity ambiguity inherent in single-point annotations and its potential for practical applications in remote sensing scenarios.
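Because the gap is easy to misread as a relative improvement, the short calculation below separates the two readings. It uses only the two mAP figures quoted above; the variable names are illustrative.

```python
p2rbox_map = 62.43    # mAP (%) reported for P2RBox on DOTA-v1.0
pointobb_map = 33.31  # mAP (%) reported for PointOBB on DOTA-v1.0

absolute_gain = p2rbox_map - pointobb_map            # percentage points
relative_gain = absolute_gain / pointobb_map * 100   # percent of the baseline

print(f"absolute gain: {absolute_gain:.2f} points")  # 29.12 points
print(f"relative gain: {relative_gain:.1f}%")        # 87.4%
```

The "about 29" figure is therefore an absolute difference in mAP; relative to PointOBB's score, the improvement is closer to 87%.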

Critical Analysis

The paper provides a novel and promising approach to oriented object detection using single-point annotations. The key strengths of the P2RBox method are:

- Addressing Granularity Ambiguity: The incorporation of the Boundary Sensitive Mask guidance and Centrality guidance cues effectively addresses the granularity ambiguity that has plagued previous point-based methods, leading to significant performance improvements.
- Flexibility and Generalizability: The authors demonstrate the versatility of P2RBox by integrating it with three different object detectors, suggesting that it can be readily adapted to work with a wide range of detection architectures.

However, the paper also raises a few points for further consideration:

- Dataset Limitations: The evaluation is primarily conducted on the DOTA-v1.0 dataset, which may not fully capture the diversity of real-world remote sensing scenarios. It would be valuable to assess the method's performance on a broader range of remote sensing datasets.
- Computational Efficiency: The paper does not provide detailed information about the computational cost and inference time of the P2RBox method. As real-world applications often require fast and efficient processing, this aspect should be further investigated.
- Interpretability and Explainability: While the method demonstrates impressive performance, it would be beneficial to explore the interpretability and explainability of the generated RBoxes, particularly to understand the impact of the different guidance cues on the final outputs.

Overall, the P2RBox method represents a significant advancement in the field of oriented object detection using single-point annotations, and the promising results suggest its potential for practical applications in remote sensing scenarios. Further research to address the identified limitations and explore the method's broader applicability would be valuable.

Conclusion

The P2RBox method introduced in this paper offers a novel approach to oriented object detection in remote sensing scenarios using single-point annotations. By leveraging the Segment Anything Model to generate mask proposals and refining them with advanced guidance cues, P2RBox is able to effectively address the granularity ambiguity inherent in point-based annotations.

The demonstrated integration of P2RBox with three different object detectors and its significant performance improvement over the state-of-the-art point-annotated generative method PointOBB suggest that this approach has the potential to enable practical applications of single-point annotation in remote sensing. This could lead to significant cost savings and reduced effort in the annotation process, ultimately facilitating more widespread adoption of object detection in aerial and satellite imagery.

While the paper highlights the strengths of the P2RBox method, further research is needed to address the identified limitations, such as exploring its performance on a broader range of datasets, investigating its computational efficiency, and delving into the interpretability and explainability of the generated RBoxes. Nonetheless, this work represents an important step forward in the field of oriented object detection and demonstrates the value of innovative approaches to leveraging single-point annotations.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


P2RBox: Point Prompt Oriented Object Detection with SAM

Guangming Cao, Xuehui Yu, Wenwen Yu, Xumeng Han, Xue Yang, Guorong Li, Jianbin Jiao, Zhenjun Han

Single-point annotation in oriented object detection of remote sensing scenarios is gaining increasing attention due to its cost-effectiveness. However, due to the granularity ambiguity of points, there is a significant performance gap between previous methods and fully supervised ones. In this study, we introduce P2RBox, which employs point prompts to generate rotated box (RBox) annotations for oriented object detection. P2RBox employs the SAM model to generate high-quality mask proposals. These proposals are then refined using the semantic and spatial information from annotation points. The best masks are converted into oriented boxes based on the feature directions suggested by the model. P2RBox incorporates two advanced guidance cues: Boundary Sensitive Mask guidance, which leverages semantic information, and Centrality guidance, which utilizes spatial information to reduce granularity ambiguity. This combination enhances detection capabilities significantly. To demonstrate the effectiveness of this method, three different detectors were integrated, and enhancements over the baseline were observed. Furthermore, compared to the state-of-the-art point-annotated generative method PointOBB, P2RBox outperforms by about 29% mAP (62.43% vs 33.31%) on the DOTA-v1.0 dataset, which provides possibilities for the practical application of point annotations.

5/24/2024

Point-supervised Brain Tumor Segmentation with Box-prompted MedSAM

Xiaofeng Liu, Jonghye Woo, Chao Ma, Jinsong Ouyang, Georges El Fakhri

Delineating lesions and anatomical structure is important for image-guided interventions. Point-supervised medical image segmentation (PSS) has great potential to alleviate costly expert delineation labeling. However, due to the lack of precise size and boundary guidance, the effectiveness of PSS often falls short of expectations. Although recent vision foundational models, such as the medical segment anything model (MedSAM), have made significant advancements in bounding-box-prompted segmentation, it is not straightforward to utilize point annotation, and is prone to semantic ambiguity. In this preliminary study, we introduce an iterative framework to facilitate semantic-aware point-supervised MedSAM. Specifically, the semantic box-prompt generator (SBPG) module has the capacity to convert the point input into potential pseudo bounding box suggestions, which are explicitly refined by the prototype-based semantic similarity. This is then succeeded by a prompt-guided spatial refinement (PGSR) module that harnesses the exceptional generalizability of MedSAM to infer the segmentation mask, which also updates the box proposal seed in SBPG. Performance can be progressively improved with adequate iterations. We conducted an evaluation on BraTS2018 for the segmentation of whole brain tumors and demonstrated its superior performance compared to traditional PSS methods and on par with box-supervised methods.

8/2/2024

Robust Box Prompt based SAM for Medical Image Segmentation

Yuhao Huang, Xin Yang, Han Zhou, Yan Cao, Haoran Dou, Fajin Dong, Dong Ni

The Segment Anything Model (SAM) can achieve satisfactory segmentation performance under high-quality box prompts. However, SAM's robustness is compromised by the decline in box quality, limiting its practicality in clinical reality. In this study, we propose a novel Robust Box prompt based SAM (RoBox-SAM) to ensure SAM's segmentation performance under prompts with different qualities. Our contribution is three-fold. First, we propose a prompt refinement module to implicitly perceive the potential targets, and output the offsets to directly transform the low-quality box prompt into a high-quality one. We then provide an online iterative strategy for further prompt refinement. Second, we introduce a prompt enhancement module to automatically generate point prompts to assist the box-promptable segmentation effectively. Last, we build a self-information extractor to encode the prior information from the input image. These features can optimize the image embeddings and attention calculation, thus, the robustness of SAM can be further enhanced. Extensive experiments on the large medical segmentation dataset including 99,299 images, 5 modalities, and 25 organs/targets validated the efficacy of our proposed RoBox-SAM.

8/1/2024

Category-Aware Dynamic Label Assignment with High-Quality Oriented Proposal

Mingkui Feng, Hancheng Yu, Xiaoyu Dang, Ming Zhou

Objects in aerial images are typically embedded in complex backgrounds and exhibit arbitrary orientations. When employing oriented bounding boxes (OBB) to represent arbitrary oriented objects, the periodicity of angles could lead to discontinuities in label regression values at the boundaries, inducing abrupt fluctuations in the loss function. To address this problem, an OBB representation based on the complex plane is introduced in the oriented detection framework, and a trigonometric loss function is proposed. Moreover, leveraging prior knowledge of complex background environments and significant differences in large objects in aerial images, a conformer RPN head is constructed to predict angle information. The proposed loss function and conformer RPN head jointly generate high-quality oriented proposals. A category-aware dynamic label assignment based on predicted category feedback is proposed to address the limitations of solely relying on IoU for proposal label assignment. This method makes negative sample selection more representative, ensuring consistency between classification and regression features. Experiments were conducted on four realistic oriented detection datasets, and the results demonstrate superior performance in oriented object detection with minimal parameter tuning and time costs. Specifically, mean average precision (mAP) scores of 82.02%, 71.99%, 69.87%, and 98.77% were achieved on the DOTA-v1.0, DOTA-v1.5, DIOR-R, and HRSC2016 datasets, respectively.

7/4/2024