Semi-Supervised Object Detection: A Survey on Progress from CNN to Transformer

Read original: arXiv:2407.08460 - Published 7/17/2024 by Tahira Shehzadi, Ifza, Didier Stricker, Muhammad Zeshan Afzal


Overview

• This paper provides a comprehensive survey of semi-supervised object detection (SSOD), which aims to leverage unlabeled data to improve object detection performance.
• The survey covers the evolution of semi-supervised object detection approaches, from convolutional neural networks (CNNs) to more recent Transformer-based models.
• Key topics discussed include leveraging unlabeled data to boost object detection, collaboration between multiple teacher models, and end-to-end semi-supervised object detection.

Plain English Explanation

Object detection is a computer vision task that involves identifying and localizing objects of interest within an image. Traditionally, object detection models are trained on large, labeled datasets, which can be time-consuming and expensive to create. Semi-supervised object detection aims to address this by leveraging both labeled and unlabeled data to improve model performance.

The paper starts by reviewing previous research on semi-supervised object detection, including approaches that use multiple "teacher" models to boost performance and end-to-end semi-supervised frameworks. It then explores how the field has evolved, moving from CNN-based models to more powerful Transformer-based architectures.

The key idea behind semi-supervised object detection is to use unlabeled data, which is usually much easier to obtain than labeled data, to help the model learn more robust and generalizable features. A common approach is pseudo-labeling: a "teacher" model predicts boxes on unlabeled images, and its confident predictions serve as training targets for a "student" model. Related techniques include self-supervised learning, where the model is trained to perform auxiliary tasks on the unlabeled data, and cooperative supervision, where multiple teacher models work together to guide a single student model.
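To make the teacher-guides-student idea concrete, here is a minimal Python sketch of confidence-thresholded pseudo-labeling, a common way a teacher's predictions on unlabeled images become training targets for a student. The `Detection` type, the `make_pseudo_labels` function, the 0.7 threshold, and the example detections are all hypothetical illustrations, not code from the survey.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Detection:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
    label: str
    score: float  # teacher confidence in [0, 1]


def make_pseudo_labels(teacher_dets: List[Detection], threshold: float = 0.7) -> List[Detection]:
    """Keep only the teacher's confident detections as pseudo-labels for the student.

    The 0.7 default is an illustrative assumption; real methods tune or adapt it.
    """
    return [d for d in teacher_dets if d.score >= threshold]


# Hypothetical teacher output on one unlabeled image
teacher_output = [
    Detection((10, 10, 50, 60), "car", 0.92),
    Detection((70, 20, 90, 40), "person", 0.55),
    Detection((5, 80, 40, 120), "dog", 0.81),
]
pseudo = make_pseudo_labels(teacher_output)
print([d.label for d in pseudo])  # ['car', 'dog']
```

Raising the threshold yields cleaner but fewer pseudo-labels; lowering it recovers more objects at the cost of more noise, which is the trade-off that pseudo-labeling strategies try to manage.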

By incorporating these semi-supervised techniques, object detection models can achieve better performance with less labeled data, which can be especially useful in domains where labeling data is particularly challenging or expensive, such as medical imaging or autonomous driving.

Technical Explanation

The paper begins by reviewing previous surveys and research on semi-supervised object detection. This includes approaches like leveraging unlabeled data to boost object detection performance, collaboration between multiple teacher models, and end-to-end semi-supervised object detection frameworks.

The survey then traces the evolution of semi-supervised object detection, starting with CNN-based models and progressing to more recent Transformer-based architectures. Transformer-based detectors such as DETR have shown promising results within semi-supervised frameworks, using self-attention to learn rich feature representations from both labeled and unlabeled data.
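For readers unfamiliar with self-attention, the sketch below shows single-head scaled dot-product attention over a small set of feature tokens in plain Python. It is a simplified illustration: DETR itself uses multi-head attention with learned projections inside a full encoder-decoder, whereas here the projection matrices are passed in by hand, and all helper names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x m) and b (m x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, wq: Matrix, wk: Matrix, wv: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention.

    x: n tokens x d features; wq, wk, wv: d x d_k projections (assumed given).
    Each output token is a weighted mix of all value vectors, so every
    position can draw on context from every other position.
    """
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(wk[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


identity = [[1.0, 0.0], [0.0, 1.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(tokens, identity, identity, identity)
```

With identity projections, each output row is a probability-weighted mix of all tokens, and each token attends most strongly to itself.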

The paper also discusses the core components of semi-supervised object detection: data augmentation, pseudo-labeling strategies, consistency regularization, and adversarial training. Related techniques include self-supervised learning, where the model learns to perform auxiliary tasks on unlabeled data to extract useful features, and cooperative supervision, where multiple teacher models work together to guide a single student model.
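One simple way several teachers could cooperate is consensus voting: a pseudo-box is kept only if enough teachers predict an overlapping box, and the agreeing boxes are averaged. This is an illustrative sketch, not the mechanism of any specific method in the survey; the function names, the IoU threshold of 0.5, and the two-vote minimum are assumptions, and class labels are ignored for brevity.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def consensus_pseudo_labels(teacher_boxes: List[List[Box]], min_votes: int = 2,
                            iou_thr: float = 0.5) -> List[Box]:
    """Keep a pseudo-box only if at least `min_votes` teachers predict an overlapping box."""
    kept: List[Box] = []
    for boxes in teacher_boxes:
        for box in boxes:
            # Skip candidates already covered by an accepted pseudo-box
            if any(iou(box, k) >= iou_thr for k in kept):
                continue
            # Each teacher votes with its best-overlapping box, if it overlaps enough
            votes = []
            for other in teacher_boxes:
                best = max(other, key=lambda b: iou(box, b), default=None)
                if best is not None and iou(box, best) >= iou_thr:
                    votes.append(best)
            if len(votes) >= min_votes:
                # Fuse the agreeing boxes by averaging each coordinate
                kept.append(tuple(sum(c) / len(votes) for c in zip(*votes)))
    return kept


teacher_a = [(0, 0, 10, 10), (50, 50, 60, 60)]
teacher_b = [(1, 0, 11, 10)]
teacher_c = [(0, 1, 10, 11), (80, 80, 90, 90)]
fused = consensus_pseudo_labels([teacher_a, teacher_b, teacher_c])
```

Here only the box near the top-left corner is predicted by all three teachers, so it is the only one kept; the two boxes seen by a single teacher are discarded as likely noise.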

Critical Analysis

The paper provides a thorough and well-structured survey of progress in semi-supervised object detection, covering a range of relevant techniques and architectures. However, semi-supervised learning also comes with its own challenges and limitations.

One potential issue is the reliability and generalization of the features learned from unlabeled data. If the unlabeled data is not sufficiently diverse or representative of the target domain, the model may learn spurious correlations or biases that could negatively impact its performance on real-world data.

Additionally, the paper does not examine in depth the computational and memory requirements of semi-supervised approaches, which could be a significant concern in resource-constrained settings; teacher-student frameworks, for instance, keep at least two copies of the detector in memory during training.

Further research is needed to address these challenges and to explore more efficient and robust semi-supervised object detection methods that can be readily deployed in practical applications.

Conclusion

This survey paper provides a comprehensive overview of the progress in semi-supervised object detection, covering the evolution from CNN-based models to the more recent Transformer-based architectures. The paper highlights key techniques, such as self-supervised learning and cooperative supervision, that have been used to leverage unlabeled data to boost object detection performance.

The findings in this paper have important implications for the development of more efficient and accessible object detection systems, as semi-supervised learning can help reduce the reliance on large, heavily annotated datasets. This could be particularly beneficial in domains where data labeling is challenging or expensive, such as medical imaging or autonomous driving.

However, potential limitations remain and point to areas for further research, such as ensuring the reliability and generalization of features learned from unlabeled data and addressing the computational and memory requirements of semi-supervised approaches. Addressing these challenges will be crucial for the wider adoption and real-world deployment of semi-supervised object detection techniques.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Semi-Supervised Object Detection: A Survey on Progress from CNN to Transformer

Tahira Shehzadi, Ifza, Didier Stricker, Muhammad Zeshan Afzal

The impressive advancements in semi-supervised learning have driven researchers to explore its potential in object detection tasks within the field of computer vision. Semi-Supervised Object Detection (SSOD) leverages a combination of a small labeled dataset and a larger, unlabeled dataset. This approach effectively reduces the dependence on large labeled datasets, which are often expensive and time-consuming to obtain. Initially, SSOD models encountered challenges in effectively leveraging unlabeled data and managing noise in generated pseudo-labels for unlabeled data. However, numerous recent advancements have addressed these issues, resulting in substantial improvements in SSOD performance. This paper presents a comprehensive review of 27 cutting-edge developments in SSOD methodologies, from Convolutional Neural Networks (CNNs) to Transformers. We delve into the core components of semi-supervised learning and its integration into object detection frameworks, covering data augmentation techniques, pseudo-labeling strategies, consistency regularization, and adversarial training methods. Furthermore, we conduct a comparative analysis of various SSOD models, evaluating their performance and architectural differences. We aim to ignite further research interest in overcoming existing challenges and exploring new directions in semi-supervised learning for object detection.

7/17/2024
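Consistency regularization, one of the components listed in the abstract above, asks a detector to make matching predictions for an image and a transformed copy of it. A minimal sketch for horizontal flipping follows; it assumes the boxes predicted on the two views are already paired by index (a real detector would need to match them first), and the function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def hflip_box(box: Box, image_width: float) -> Box:
    """Mirror a box across the vertical center line; x1 and x2 swap roles so x1 <= x2 holds."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def flip_consistency_loss(preds_orig: List[Box], preds_flipped: List[Box],
                          image_width: float) -> float:
    """Mean per-box L1 distance between predictions on the original image and
    predictions on the flipped image mapped back to original coordinates.

    Assumes preds_orig[i] and preds_flipped[i] describe the same object.
    """
    if len(preds_orig) != len(preds_flipped):
        raise ValueError("prediction lists must be paired one-to-one")
    if not preds_orig:
        return 0.0
    total = 0.0
    for a, b in zip(preds_orig, preds_flipped):
        b_back = hflip_box(b, image_width)  # flipping again restores the original frame
        total += sum(abs(p - q) for p, q in zip(a, b_back))
    return total / len(preds_orig)


# A perfectly consistent prediction pair on a 100-pixel-wide image gives zero loss
print(flip_consistency_loss([(10, 20, 30, 40)], [(70, 20, 90, 40)], 100))  # 0.0
```

Because the loss compares the model only with itself, it can be computed on unlabeled images without any ground-truth boxes.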

SOOD++: Leveraging Unlabeled Data to Boost Oriented Object Detection

Dingkang Liang, Wei Hua, Chunsheng Shi, Zhikang Zou, Xiaoqing Ye, Xiang Bai

Semi-supervised object detection (SSOD), leveraging unlabeled data to boost object detectors, has become a hot topic recently. However, existing SSOD approaches mainly focus on horizontal objects, leaving multi-oriented objects common in aerial images unexplored. At the same time, the annotation cost of multi-oriented objects is significantly higher than that of their horizontal counterparts. Therefore, in this paper, we propose a simple yet effective Semi-supervised Oriented Object Detection method termed SOOD++. Specifically, we observe that objects in aerial images usually have arbitrary orientations and small scales and tend to appear in aggregated clusters, which inspires the following core designs: a Simple Instance-aware Dense Sampling (SIDS) strategy is used to generate comprehensive dense pseudo-labels; the Geometry-aware Adaptive Weighting (GAW) loss dynamically modulates the importance of each pair between pseudo-label and corresponding prediction by leveraging the intricate geometric information of aerial objects; we treat aerial images as global layouts and explicitly build the many-to-many relationship between the sets of pseudo-labels and predictions via the proposed Noise-driven Global Consistency (NGC). Extensive experiments conducted on various multi-oriented object datasets under various labeled settings demonstrate the effectiveness of our method. For example, on the DOTA-V1.5 benchmark, the proposed method outperforms previous state-of-the-art (SOTA) by a large margin (+2.92, +2.39, and +2.57 mAP under 10%, 20%, and 30% labeled data settings, respectively) with single-scale training and testing. More importantly, it still improves upon a strong supervised baseline with 70.66 mAP, trained using the full DOTA-V1.5 train-val set, by +1.82 mAP, resulting in 72.48 mAP and setting a new state of the art. The code will be made available.

7/2/2024

Class-balanced Open-set Semi-supervised Object Detection for Medical Images

Zhanyun Lu, Renshu Gu, Huimin Cheng, Siyu Pang, Mingyu Xu, Peifang Xu, Yaqi Wang, Yuichiro Kinoshita, Juan Ye, Gangyong Jia, Qing Wu

Medical image datasets in the real world are often unlabeled and imbalanced, and Semi-Supervised Object Detection (SSOD) can utilize unlabeled data to improve an object detector. However, existing approaches predominantly assumed that the unlabeled data and test data do not contain out-of-distribution (OOD) classes. The few open-set semi-supervised object detection methods have two weaknesses: first, the class imbalance is not considered; second, the OOD instances are distinguished and simply discarded during pseudo-labeling. In this paper, we consider the open-set semi-supervised object detection problem which leverages unlabeled data that contain OOD classes to improve object detection for medical images. Our study incorporates two key innovations: Category Control Embed (CCE) and out-of-distribution Detection Fusion Classifier (OODFC). CCE is designed to tackle dataset imbalance by constructing a Foreground information Library, while OODFC tackles open-set challenges by integrating the "unknown" information into basic pseudo-labels. Our method outperforms the state-of-the-art SSOD performance, achieving a 4.25 mAP improvement on the public Parasite dataset.

8/23/2024


Collaboration of Teachers for Semi-supervised Object Detection

Liyu Chen, Huaao Tang, Yi Wen, Hanting Chen, Wei Li, Junchao Liu, Jie Hu

Recent semi-supervised object detection (SSOD) has achieved remarkable progress by leveraging unlabeled data for training. Mainstream SSOD methods rely on Consistency Regularization methods and Exponential Moving Average (EMA), which form a cyclic data flow. However, the EMA updating training approach leads to weight coupling between the teacher and student models. This coupling in a cyclic data flow results in a decrease in the utilization of unlabeled data information and the confirmation bias on low-quality or erroneous pseudo-labels. To address these issues, we propose the Collaboration of Teachers Framework (CTF), which consists of multiple pairs of teacher and student models for training. In the learning process of CTF, the Data Performance Consistency Optimization module (DPCO) identifies the teacher model that produced the best pseudo-labels during the past training process, and the reliable pseudo-labels generated by this best-performing teacher then guide the other student models. As a consequence, this framework greatly improves the utilization of unlabeled data and prevents the positive feedback cycle of unreliable pseudo-labels. The CTF achieves outstanding results on numerous SSOD datasets, including a 0.71% mAP improvement on the 10% annotated COCO dataset and a 0.89% mAP improvement on the VOC dataset compared to LabelMatch, and converges significantly faster. Moreover, the CTF is plug-and-play and can be integrated with other mainstream SSOD methods.

5/24/2024
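The EMA update mentioned in the Collaboration of Teachers abstract above is typically the Mean Teacher rule: after each training step, every teacher weight moves a small fraction of the way toward the corresponding student weight. The plain-Python sketch below assumes parameters stored as named lists of floats; the `ema_update` name and the default decay of 0.999 are illustrative assumptions, not values from the paper.

```python
from typing import Dict, List

Params = Dict[str, List[float]]  # parameter name -> flat list of weights


def ema_update(teacher: Params, student: Params, decay: float = 0.999) -> None:
    """In place: teacher <- decay * teacher + (1 - decay) * student."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, t_vals in teacher.items():
        s_vals = student[name]
        for i in range(len(t_vals)):
            t_vals[i] = decay * t_vals[i] + (1.0 - decay) * s_vals[i]


teacher = {"w": [0.0, 2.0]}
student = {"w": [1.0, 2.0]}
ema_update(teacher, student, decay=0.9)
# The first weight moves 10% of the way toward the student; the second already matches.
print(teacher["w"])
```

Because the teacher is a running average of the student, errors the student absorbs from bad pseudo-labels gradually flow back into the teacher, which is the weight coupling and confirmation-bias loop the abstract describes.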