Self-training via Metric Learning for Source-Free Domain Adaptation of Semantic Segmentation

arXiv:2212.04227

Published 4/10/2024 by Ibrahim Batuhan Akkaya, Ugur Halici

Abstract

Unsupervised source-free domain adaptation methods aim to train a model for the target domain utilizing a pretrained source-domain model and unlabeled target-domain data, particularly when accessibility to source data is restricted due to intellectual property or privacy concerns. Traditional methods usually use self-training with pseudo-labeling, which is often subjected to thresholding based on prediction confidence. However, such thresholding limits the effectiveness of self-training due to insufficient supervision. This issue becomes more severe in a source-free setting, where supervision comes solely from the predictions of the pre-trained source model. In this study, we propose a novel approach by incorporating a mean-teacher model, wherein the student network is trained using all predictions from the teacher network. Instead of employing thresholding on predictions, we introduce a method to weight the gradients calculated from pseudo-labels based on the reliability of the teacher's predictions. To assess reliability, we introduce a novel approach using proxy-based metric learning. Our method is evaluated in synthetic-to-real and cross-city scenarios, demonstrating superior performance compared to existing state-of-the-art methods.

Overview

Unsupervised source-free domain adaptation aims to train a model for a target domain using a pre-trained source-domain model and unlabeled target-domain data.
Traditional methods rely on self-training with pseudo-labels and usually keep only predictions above a confidence threshold, which leaves the model with too little supervision.
This shortage of supervision is more severe in source-free settings, where the only training signal comes from the predictions of the pre-trained source model.

Plain English Explanation

In some situations, organizations may have a machine learning model that was trained on data from one setting (the "source" domain), but they want to use that model to make predictions in a different setting (the "target" domain). For example, a model trained on medical images from one hospital might be used to analyze images from a different hospital.

The challenge is twofold. The target-domain data is unlabeled, meaning there are no correct answers for the model to learn from, and the original source data can no longer be accessed, for example because of privacy or intellectual-property restrictions. Unsupervised source-free domain adaptation methods typically handle this by using the pre-trained source model to generate "pseudo-labels" for the unlabeled target data and then fine-tuning the model on those pseudo-labels.

However, traditional pseudo-labeling methods have limitations. They often use a confidence threshold to decide which pseudo-labels to trust, and every prediction that falls below the threshold is discarded. This leaves the model with fewer examples to learn from, which is especially costly in a "source-free" setting, where the source model's predictions are the only source of supervision.
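To make the effect of thresholding concrete, here is a minimal sketch in plain Python. The probabilities, the 0.9 threshold, and the function names `pseudo_labels` and `supervised_fraction` are illustrative assumptions, not values or code from the paper.

```python
def pseudo_labels(probs, threshold=None):
    """Return one pseudo-label per pixel; None marks a pixel that is ignored.

    probs: list of per-pixel class-probability lists (toy input, assumed).
    threshold: if set, pixels whose top probability is below it get no label.
    """
    labels = []
    for p in probs:
        best = max(range(len(p)), key=lambda c: p[c])  # argmax class
        if threshold is not None and p[best] < threshold:
            labels.append(None)  # discarded: contributes no supervision
        else:
            labels.append(best)
    return labels


def supervised_fraction(labels):
    """Fraction of pixels that still contribute to training."""
    return sum(label is not None for label in labels) / len(labels)


# Toy softmax outputs of a source model on four target pixels, three classes.
probs = [
    [0.95, 0.03, 0.02],
    [0.50, 0.45, 0.05],
    [0.40, 0.35, 0.25],
    [0.10, 0.85, 0.05],
]

all_labels = pseudo_labels(probs)                  # every pixel gets a label
kept_labels = pseudo_labels(probs, threshold=0.9)  # only confident pixels survive
print(supervised_fraction(all_labels), supervised_fraction(kept_labels))
```

In this toy case the threshold keeps only one pixel in four, which illustrates how much of the already scarce supervision can be lost.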

Technical Explanation

The paper proposes a novel approach that incorporates a "mean-teacher" model, where the student network is trained using all predictions from the teacher network, rather than just the high-confidence predictions. Instead of thresholding, the gradients computed from the pseudo-labels are weighted according to the reliability of the teacher's predictions. To estimate that reliability, the authors introduce a method based on "proxy-based metric learning."
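The two mechanisms described above can be sketched in a few lines of plain Python. This is a simplified illustration under stated assumptions, not the authors' implementation: weights are flat lists of floats, the reliability scores are supplied as given numbers, and the names `ema_update` and `weighted_pseudo_label_loss` as well as the momentum value are hypothetical. The teacher update follows the standard mean-teacher recipe, in which the teacher's weights are an exponential moving average of the student's.

```python
import math


def ema_update(teacher, student, momentum=0.99):
    """Move each teacher weight toward the student weight (exponential moving average)."""
    return [momentum * t + (1.0 - momentum) * s for t, s in zip(teacher, student)]


def weighted_pseudo_label_loss(student_probs, teacher_probs, reliability):
    """Cross-entropy against the teacher's argmax labels, scaled per pixel.

    Every pixel contributes. A low reliability shrinks a pixel's loss term
    (and therefore its gradient) instead of dropping the pixel entirely.
    """
    total = 0.0
    for s, t, w in zip(student_probs, teacher_probs, reliability):
        label = max(range(len(t)), key=lambda c: t[c])  # teacher pseudo-label
        total += -w * math.log(s[label])
    return total / len(student_probs)


# Toy example: two pixels, two classes (all numbers are assumptions).
teacher_probs = [[0.9, 0.1], [0.55, 0.45]]
student_probs = [[0.8, 0.2], [0.5, 0.5]]
reliability = [1.0, 0.2]  # the second pseudo-label is trusted much less

loss = weighted_pseudo_label_loss(student_probs, teacher_probs, reliability)
new_teacher = ema_update([1.0, 0.0], [0.0, 1.0], momentum=0.9)
```

The design point is that the uncertain second pixel still produces a training signal, only a smaller one, whereas a hard threshold would remove it.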

The proposed method is evaluated in two kinds of adaptation scenarios: synthetic-to-real, where a model trained on synthetic data is adapted to real-world images, and cross-city, where a model is adapted from one city environment to another. The authors report superior performance compared to existing state-of-the-art methods for unsupervised source-free domain adaptation.

Critical Analysis

The paper presents a promising approach to address the limitations of traditional pseudo-labeling methods in the context of source-free domain adaptation. By leveraging all predictions from the teacher model, rather than just high-confidence ones, the method can potentially make better use of the available information.

However, the paper does not fully explore the potential limitations or caveats of the proposed approach. For instance, it would be valuable to understand how the method performs when the source and target domains are very different, or when the pre-trained source model has significant biases or errors.

Additionally, the authors could have provided more insight into the proxy-based metric learning approach used to assess pseudo-label reliability. A deeper examination of this component and its impact on the overall performance would strengthen the technical contribution.
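For readers who want a concrete picture of what a proxy-based reliability score could look like, the sketch below shows one common pattern from proxy-based metric learning: each class has a proxy vector in an embedding space, and a sample is scored by how close its embedding lies to the proxy of its assigned class. This is an assumption made for illustration only. The summary does not give the authors' formulation, and the function names, the cosine-softmax scoring, and the temperature value are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def proxy_reliability(embedding, proxies, label, temperature=0.1):
    """Softmax over cosine similarities to class proxies, read at the pseudo-label's class.

    Returns a value in (0, 1): high when the embedding sits close to the
    proxy of its pseudo-label and far from the other proxies.
    """
    sims = [cosine(embedding, p) / temperature for p in proxies]
    shift = max(sims)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in sims]
    return exps[label] / sum(exps)


# Two hypothetical class proxies in a 2-D embedding space.
proxies = [[1.0, 0.0], [0.0, 1.0]]

near = proxy_reliability([0.9, 0.1], proxies, label=0)       # close to proxy 0
ambiguous = proxy_reliability([0.5, 0.5], proxies, label=0)  # between both proxies
```

Under this assumed scoring, an embedding that lies between two proxies receives a reliability near 0.5, so its pseudo-label would be down-weighted rather than discarded.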

Conclusion

This paper introduces a novel unsupervised source-free domain adaptation method that addresses the limitations of traditional pseudo-labeling approaches. By incorporating a mean-teacher model and a proxy-based reliability assessment, the proposed technique demonstrates improved performance compared to existing state-of-the-art methods.

The research highlights the importance of developing robust domain adaptation techniques, particularly in scenarios where direct access to source data is restricted. While the paper provides a promising solution, further exploration of the method's limitations and potential improvements could strengthen its impact and applicability in real-world settings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

ProtoGMM: Multi-prototype Gaussian-Mixture-based Domain Adaptation Model for Semantic Segmentation

Nazanin Moradinasab, Laura S. Shankman, Rebecca A. Deaton, Gary K. Owens, Donald E. Brown

Domain adaptive semantic segmentation aims to generate accurate and dense predictions for an unlabeled target domain by leveraging a supervised model trained on a labeled source domain. The prevalent self-training approach involves retraining the dense discriminative classifier of $p(\text{class} \mid \text{pixel feature})$ using the pseudo-labels from the target domain. While many methods focus on mitigating the issue of noisy pseudo-labels, they often overlook the underlying data distribution $p(\text{pixel feature} \mid \text{class})$ in both the source and target domains. To address this limitation, we propose the multi-prototype Gaussian-Mixture-based (ProtoGMM) model, which incorporates the GMM into contrastive losses to perform guided contrastive learning. Contrastive losses are commonly executed in the literature using memory banks, which can lead to class biases due to underrepresented classes. Furthermore, memory banks often have fixed capacities, potentially restricting the model's ability to capture diverse representations of the target/source domains. An alternative approach is to use global class prototypes (i.e. averaged features per category). However, the global prototypes are based on the unimodal distribution assumption per class, disregarding within-class variation. To address these challenges, we propose the ProtoGMM model. This novel approach involves estimating the underlying multi-prototype source distribution by utilizing the GMM on the feature space of the source samples. The components of the GMM model act as representative prototypes. To achieve increased intra-class semantic similarity, decreased inter-class similarity, and domain alignment between the source and target domains, we employ multi-prototype contrastive learning between source distribution and target samples. The experiments show the effectiveness of our method on UDA benchmarks.

6/28/2024

cs.CV

Self-Training: A Survey

Massih-Reza Amini, Vasilii Feofanov, Loic Pauletto, Lies Hadjadj, Emilie Devijver, Yury Maximov

Semi-supervised algorithms aim to learn prediction functions from a small set of labeled observations and a large set of unlabeled observations. Because this framework is relevant in many applications, they have received a lot of interest in both academia and industry. Among the existing techniques, self-training methods have undoubtedly attracted greater attention in recent years. These models are designed to find the decision boundary on low density regions without making additional assumptions about the data distribution, and use the unsigned output score of a learned classifier, or its margin, as an indicator of confidence. The working principle of self-training algorithms is to learn a classifier iteratively by assigning pseudo-labels to the set of unlabeled training samples with a margin greater than a certain threshold. The pseudo-labeled examples are then used to enrich the labeled training data and to train a new classifier in conjunction with the labeled training set. In this paper, we present self-training methods for binary and multi-class classification; as well as their variants and two related approaches, namely consistency-based approaches and transductive learning. We examine the impact of significant self-training features on various methods, using different general and image classification benchmarks, and we discuss our ideas for future research in self-training. To the best of our knowledge, this is the first thorough and complete survey on this subject.

5/28/2024

cs.LG

Source-Free Domain Adaptation Guided by Vision and Vision-Language Pre-Training

Wenyu Zhang, Li Shen, Chuan-Sheng Foo

Source-free domain adaptation (SFDA) aims to adapt a source model trained on a fully-labeled source domain to a related but unlabeled target domain. While the source model is a key avenue for acquiring target pseudolabels, the generated pseudolabels may exhibit source bias. In the conventional SFDA pipeline, a large data (e.g. ImageNet) pre-trained feature extractor is used to initialize the source model at the start of source training, and subsequently discarded. Despite having diverse features important for generalization, the pre-trained feature extractor can overfit to the source data distribution during source training and forget relevant target domain knowledge. Rather than discarding this valuable knowledge, we introduce an integrated framework to incorporate pre-trained networks into the target adaptation process. The proposed framework is flexible and allows us to plug modern pre-trained networks into the adaptation process to leverage their stronger representation learning capabilities. For adaptation, we propose the Co-learn algorithm to improve target pseudolabel quality collaboratively through the source model and a pre-trained feature extractor. Building on the recent success of the vision-language model CLIP in zero-shot image recognition, we present an extension Co-learn++ to further incorporate CLIP's zero-shot classification decisions. We evaluate on 3 benchmark datasets and include more challenging scenarios such as open-set, partial-set and open-partial SFDA. Experimental results demonstrate that our proposed strategy improves adaptation performance and can be successfully integrated with existing SFDA methods.

5/7/2024

cs.CV cs.LG

Multi-Target Unsupervised Domain Adaptation for Semantic Segmentation without External Data

Yonghao Xu, Pedram Ghamisi, Yannis Avrithis

Multi-target unsupervised domain adaptation (UDA) aims to learn a unified model to address the domain shift between multiple target domains. Due to the difficulty of obtaining annotations for dense predictions, it has recently been introduced into cross-domain semantic segmentation. However, most existing solutions require labeled data from the source domain and unlabeled data from multiple target domains concurrently during training. Collectively, we refer to this data as external. When faced with new unlabeled data from an unseen target domain, these solutions either do not generalize well or require retraining from scratch on all data. To address these challenges, we introduce a new strategy called multi-target UDA without external data for semantic segmentation. Specifically, the segmentation model is initially trained on the external data. Then, it is adapted to a new unseen target domain without accessing any external data. This approach is thus more scalable than existing solutions and remains applicable when external data is inaccessible. We demonstrate this strategy using a simple method that incorporates self-distillation and adversarial learning, where knowledge acquired from the external data is preserved during adaptation through one-way adversarial learning. Extensive experiments in several synthetic-to-real and real-to-real adaptation settings on four benchmark urban driving datasets show that our method significantly outperforms current state-of-the-art solutions, even in the absence of external data. Our source code is available online (https://github.com/YonghaoXu/UT-KD).

5/13/2024

cs.CV