Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation

Read original: arXiv:2306.02314 - Published 8/21/2024 by Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang


Overview

The key challenge in label-efficient semantic segmentation is to produce high-quality pseudo-labels to leverage a large amount of unlabeled or weakly labeled data.
Commonly, only the highly confident predictions are selected as pseudo-ground-truths, leaving most pixels unused due to unreliability.
This paper argues that even unreliable and ambiguous pixels can provide valuable information for model training.

Plain English Explanation

Semantic segmentation is the task of assigning a category label to each pixel in an image. Label-efficient semantic segmentation aims to achieve good performance with a small amount of labeled data by using unlabeled or weakly labeled data.

A common approach is to select the pixels that the model is most confident about and use those as pseudo-ground-truths to train the model. However, this leads to a problem where most pixels are left unused because the model isn't very confident about them.

This paper argues that even the unreliable and ambiguous pixels can provide useful information for training the model. The key insight is that while an unreliable prediction may be confused between the top few classes, it should still be confident that the pixel does not belong to the other, less likely classes. By treating these ambiguous pixels as "negative keys" for the unlikely classes, the model can learn from them.
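To make the "negative key" idea concrete, here is a minimal sketch in plain Python. It assumes a pixel's prediction is a list of class probabilities and that the "unlikely" classes are simply those ranked below the top few. The function name `unlikely_classes` and the `top_k` cutoff are illustrative assumptions, not names or values taken from the paper.

```python
def unlikely_classes(probs, top_k=2):
    """Return the class indices ranked below the top_k most probable classes.

    Hypothetical helper: the rank-based cutoff is an assumption made for
    illustration, not the paper's exact selection rule.
    """
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    return sorted(ranked[top_k:])


# A pixel the model cannot decide on: class 0 and class 1 are nearly tied,
# so no trustworthy pseudo-label can be assigned.
ambiguous_pixel = [0.45, 0.40, 0.10, 0.05]

# The pixel can still act as a negative example for classes 2 and 3.
print(unlikely_classes(ambiguous_pixel))  # [2, 3]
```

The prediction is useless as a positive label, but it still carries a training signal: whatever the pixel is, it is very unlikely to be class 2 or class 3.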

The paper presents a method that makes effective use of all available pixels, both reliable and unreliable, to train the model. It also adaptively adjusts the threshold that separates reliable from unreliable pixels as training progresses.

Technical Explanation

The proposed method consists of the following key steps:

Separating Reliable and Unreliable Pixels: The model's prediction entropy is used to distinguish between reliable and unreliable pixels. Reliable pixels have low entropy (high confidence), while unreliable pixels have high entropy (ambiguous).
Managing Unreliable Pixels: A queue of "negative keys" is maintained for each category. Each unreliable pixel is pushed into the queues of all the classes it is unlikely to belong to (i.e., the classes with low prediction scores).
Adaptive Threshold: As training progresses, the threshold for separating reliable and unreliable pixels is adjusted adaptively to make better use of the available data.
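The three steps above can be sketched together in plain Python. This is a rough illustration under stated assumptions, not the authors' implementation: the linear schedule in `unreliable_fraction`, the `top_k` cutoff, the queue size of 256, and all function names are hypothetical, and the queues store pixel indices here, where a real system would store feature vectors.

```python
import math
from collections import deque


def entropy(probs):
    """Shannon entropy (in nats) of one pixel's class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def unreliable_fraction(epoch, total_epochs, start=0.5):
    """Assumed linear schedule: the unreliable share shrinks from `start` to 0."""
    return start * (1 - epoch / total_epochs)


def partition(pixels, fraction):
    """Flag the `fraction` of pixels with the highest entropy as unreliable."""
    order = sorted(range(len(pixels)), key=lambda i: entropy(pixels[i]), reverse=True)
    unreliable = set(order[: int(fraction * len(pixels))])
    return [i in unreliable for i in range(len(pixels))]


def push_negatives(queues, pixel_id, probs, top_k=2):
    """Push the pixel into the queue of every class ranked below top_k."""
    ranked = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)
    for c in ranked[top_k:]:
        queues[c].append(pixel_id)


# Four pixels, four classes (toy probabilities).
pixels = [
    [0.97, 0.01, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.15, 0.10],  # ambiguous
    [0.90, 0.05, 0.03, 0.02],  # fairly confident
    [0.30, 0.30, 0.20, 0.20],  # most ambiguous
]

# One bounded queue per category; old negatives fall out when a queue is full.
queues = [deque(maxlen=256) for _ in range(4)]

mask = partition(pixels, unreliable_fraction(epoch=0, total_epochs=10))
for i, is_unreliable in enumerate(mask):
    if is_unreliable:
        push_negatives(queues, i, pixels[i])

print(mask)                     # [False, True, False, True]
print([list(q) for q in queues])  # [[], [], [1, 3], [1, 3]]
```

With this assumed schedule, half of the pixels count as unreliable at epoch 0, a quarter at epoch 5, and none at the final epoch, reflecting the idea that predictions become more trustworthy as training proceeds.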

The authors evaluate their approach on various semantic segmentation benchmarks and show that it outperforms state-of-the-art alternatives. The key advantage is the ability to effectively leverage all the available pixels, including the unreliable ones, to improve model performance.

Critical Analysis

The paper presents a novel and promising approach to address the challenge of label-efficient semantic segmentation. The adaptive threshold mechanism and the use of unreliable pixels as negative keys are interesting ideas that can potentially be applied to other semi-supervised or weakly supervised learning tasks.

One potential limitation is the reliance on prediction entropy to separate reliable and unreliable pixels. While this is a reasonable heuristic, there may be more sophisticated ways to make this distinction, especially as the model's confidence calibration can change during training.

Additionally, the paper does not explore the sensitivity of the method to hyperparameters, such as the size of the category-wise queues or the specific adaptive threshold strategy. Further analysis in this direction could provide more insights into the robustness and limitations of the approach.

Overall, this research represents a valuable contribution to the field of label-efficient semantic segmentation and opens up interesting avenues for future work.

Conclusion

This paper presents a novel approach to leverage both reliable and unreliable pixel predictions for label-efficient semantic segmentation. By treating unreliable pixels as negative keys for unlikely classes, the method can effectively utilize all available data to train the model, leading to improved performance compared to state-of-the-art alternatives.

The adaptive threshold mechanism and the core idea of extracting value from ambiguous pixels are promising and could potentially be applied to other semi-supervised or weakly supervised learning tasks beyond semantic segmentation. Further research exploring the method's robustness and potential extensions could lead to even more impactful developments in this important area of machine learning.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Using Unreliable Pseudo-Labels for Label-Efficient Semantic Segmentation

Haochen Wang, Yuchao Wang, Yujun Shen, Junsong Fan, Yuxi Wang, Zhaoxiang Zhang

The crux of label-efficient semantic segmentation is to produce high-quality pseudo-labels to leverage a large amount of unlabeled or weakly labeled data. A common practice is to select the highly confident predictions as the pseudo-ground-truths for each pixel, but it leads to a problem that most pixels may be left unused due to their unreliability. However, we argue that every pixel matters to the model training, even those unreliable and ambiguous pixels. Intuitively, an unreliable prediction may get confused among the top classes, however, it should be confident about the pixel not belonging to the remaining classes. Hence, such a pixel can be convincingly treated as a negative key to those most unlikely categories. Therefore, we develop an effective pipeline to make sufficient use of unlabeled data. Concretely, we separate reliable and unreliable pixels via the entropy of predictions, push each unreliable pixel to a category-wise queue that consists of negative keys, and manage to train the model with all candidate pixels. Considering the training evolution, we adaptively adjust the threshold for the reliable-unreliable partition. Experimental results on various benchmarks and training settings demonstrate the superiority of our approach over the state-of-the-art alternatives.

8/21/2024

Self Adaptive Threshold Pseudo-labeling and Unreliable Sample Contrastive Loss for Semi-supervised Image Classification

Xuerong Zhang, Li Huang, Jing Lv, Ming Yang

Semi-supervised learning is attracting blooming attention, due to its success in combining unlabeled data. However, pseudo-labeling-based semi-supervised approaches suffer from two problems in image classification: (1) Existing methods might fail to adopt suitable thresholds since they either use a pre-defined/fixed threshold or an ad-hoc threshold adjusting scheme, resulting in inferior performance and slow convergence. (2) Discarding unlabeled data with confidence below the thresholds results in the loss of discriminating information. To solve these issues, we develop an effective method to make sufficient use of unlabeled data. Specifically, we design a self adaptive threshold pseudo-labeling strategy, in which the thresholds for each class can be dynamically adjusted to increase the number of reliable samples. Meanwhile, in order to effectively utilise unlabeled data with confidence below the thresholds, we propose an unreliable sample contrastive loss to mine the discriminative information in low-confidence samples by learning the similarities and differences between sample features. We evaluate our method on several classification benchmarks under partially labeled settings and demonstrate its superiority over the other approaches.

7/8/2024

Semi-supervised Video Semantic Segmentation Using Unreliable Pseudo Labels for PVUW2024

Biao Wu, Diankai Zhang, Si Gao, Chengjian Zheng, Shaoli Liu, Ning Wang

Pixel-level Scene Understanding is one of the fundamental problems in computer vision, which aims at recognizing object classes, masks and semantics of each pixel in the given image. Compared with image scene parsing, video scene parsing introduces temporal information, which can effectively improve the consistency and accuracy of prediction, because the real world is actually video-based rather than a static state. In this paper, we adopt a semi-supervised video semantic segmentation method based on unreliable pseudo labels. Then, we ensemble the teacher network model with the student network model to generate pseudo labels and retrain the student network. Our method achieves the mIoU scores of 63.71% and 67.83% on development test and final test respectively. Finally, we obtain the 1st place in the Video Scene Parsing in the Wild Challenge at CVPR 2024.

6/4/2024

Weighting Pseudo-Labels via High-Activation Feature Index Similarity and Object Detection for Semi-Supervised Segmentation

Prantik Howlader, Hieu Le, Dimitris Samaras

Semi-supervised semantic segmentation methods leverage unlabeled data by pseudo-labeling them. Thus the success of these methods hinges on the reliability of the pseudo-labels. Existing methods mostly choose high-confidence pixels in an effort to avoid erroneous pseudo-labels. However, high confidence does not guarantee correct pseudo-labels, especially in the initial training iterations. In this paper, we propose a novel approach to reliably learn from pseudo-labels. First, we unify the predictions from a trained object detector and a semantic segmentation model to identify reliable pseudo-label pixels. Second, we assign different learning weights to pseudo-labeled pixels to avoid noisy training signals. To determine these weights, we first use the reliable pseudo-label pixels identified from the first step and labeled pixels to construct a prototype for each class. Then, the per-pixel weight is the structural similarity between the pixel and the prototype measured via rank-statistics similarity. This metric is robust to noise, making it better suited for comparing features from unlabeled images, particularly in the initial training phases where wrong pseudo labels are prone to occur. We show that our method can be easily integrated into four semi-supervised semantic segmentation frameworks, and improves them in both Cityscapes and Pascal VOC datasets.

7/18/2024