Supervised domain adaptation for building extraction from off-nadir aerial images

Read original: arXiv:2311.03867 - Published 8/9/2024 by Bipul Neupane, Jagannath Aryal, Abbas Rajabifard


Overview

Building extraction is crucial for inventory management and urban planning, but it is affected by misalignment between labels and off-nadir source imagery in training data.
Existing solutions rely on teacher-student learning of noise-tolerant convolutional neural networks (CNNs), but the student networks typically have lower accuracy and cannot surpass the teacher's performance.
This paper proposes a supervised domain adaptation (SDA) of encoder-decoder networks (EDNs) between noisy and clean datasets to tackle the problem.

Plain English Explanation

The paper focuses on the challenge of building extraction, the process of identifying and delineating buildings in satellite or aerial imagery. This information is essential for inventory management and urban planning, as it allows cities to keep track of their building stock and make informed decisions about development and infrastructure.

However, the training data used to develop building extraction models often suffers from misalignment between the building labels (the ground truth describing the location and extent of buildings) and the imagery used to train the model. This happens when the imagery is captured from an off-nadir (not directly overhead) angle: rooftops and facades then appear shifted relative to the labels, so the labels no longer line up with the buildings as they appear in the image.
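
To make the effect of misalignment concrete, here is a minimal sketch (not taken from the paper) that shifts a square rooftop mask sideways against a fixed footprint label and reports the intersection over union (IoU). The grid size, building size, and shift values are assumptions chosen only for illustration.

```python
def make_mask(size, top, left, height, width):
    """Return a size x size binary grid containing one filled rectangle."""
    return [
        [1 if top <= r < top + height and left <= c < left + width else 0
         for c in range(size)]
        for r in range(size)
    ]


def iou(a, b):
    """Intersection over union of two binary grids of equal shape."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    union = sum(x | y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return inter / union if union else 1.0


# Hypothetical 8x8-pixel building label on a 20x20 tile.
label = make_mask(20, 5, 5, 8, 8)

for shift in (0, 2, 4):
    # The rooftop as it appears in the image, displaced to the right.
    roof = make_mask(20, 5, 5 + shift, 8, 8)
    print(shift, round(iou(label, roof), 3))
# 0 1.0
# 2 0.6
# 4 0.333
```

In this toy case, a shift of only a few pixels removes a large share of the overlap between label and rooftop, which is the kind of label noise the training data carries.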

Existing solutions, such as using teacher-student learning to train noise-tolerant convolutional neural networks (CNNs), have been able to address this problem to some degree. In this approach, a high-performing "teacher" network is used to guide the training of a less accurate "student" network, helping it become more robust to the noise and distortions in the data.
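
The teacher-student idea can be sketched as a loss with two terms: one that compares the student's per-pixel building probabilities with the (possibly noisy) labels, and one that compares them with the teacher's soft predictions. The version below is a simplified binary form written for illustration; the weighting `alpha` and the function names are assumptions, and the paper's actual KD configuration is not described in this summary.

```python
import math


def bce(p, t, eps=1e-7):
    """Binary cross-entropy between a predicted probability p and a target t in [0, 1]."""
    p = min(max(p, eps), 1 - eps)
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))


def distillation_loss(student_probs, teacher_probs, labels, alpha=0.5):
    """Weighted sum of a hard-label term and a teacher-matching term.

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    n = len(labels)
    hard = sum(bce(s, y) for s, y in zip(student_probs, labels)) / n
    soft = sum(bce(s, t) for s, t in zip(student_probs, teacher_probs)) / n
    return alpha * hard + (1 - alpha) * soft


teacher = [0.9, 0.1]
labels = [1, 0]
close_student = [0.9, 0.1]   # agrees with the teacher
unsure_student = [0.5, 0.5]  # has learned nothing yet

print(distillation_loss(close_student, teacher, labels))
print(distillation_loss(unsure_student, teacher, labels))
```

Because the student is pulled toward the teacher's outputs, the teacher's own errors become part of the student's training target.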

However, these teacher-student approaches have a clear limitation: the student networks typically achieve lower accuracy and cannot surpass the performance of the teacher.

To address this, the paper proposes a new method called supervised domain adaptation (SDA) of encoder-decoder networks (EDNs). The key idea is to configure the EDNs with high-performing, lightweight encoder architectures like EfficientNet, ResNeSt, and MobileViT, and then use SDA to adapt the networks trained on noisy data to perform well on clean, high-quality datasets.
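
As a rough illustration of the two-stage idea, the toy example below trains a one-feature logistic classifier on misaligned ("noisy") labels and then continues training the same weights on clean labels. It is a sketch under strong assumptions, not the paper's method: the paper adapts encoder-decoder networks on image datasets, whereas everything here (the 1-D data, the learning rates, the step counts, the function names) is hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(w, b, xs, ys, lr, steps):
    """Full-batch gradient descent on binary cross-entropy."""
    n = len(xs)
    for _ in range(steps):
        residuals = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(r * x for r, x in zip(residuals, xs)) / n
        b -= lr * sum(residuals) / n
    return w, b


def accuracy(w, b, xs, ys):
    hits = sum((sigmoid(w * x + b) >= 0.5) == (y == 1) for x, y in zip(xs, ys))
    return hits / len(xs)


xs = [i / 20 for i in range(21)]
noisy_ys = [1 if i >= 6 else 0 for i in range(21)]   # label boundary shifted to x = 0.30
clean_ys = [1 if i >= 10 else 0 for i in range(21)]  # true boundary at x = 0.50

# Stage 1: train from scratch on the noisy (source) labels.
w, b = train(0.0, 0.0, xs, noisy_ys, lr=1.0, steps=2000)
acc_before, boundary_before = accuracy(w, b, xs, clean_ys), -b / w

# Stage 2: adapt the same weights on the clean (target) labels.
w, b = train(w, b, xs, clean_ys, lr=0.5, steps=2000)
acc_after, boundary_after = accuracy(w, b, xs, clean_ys), -b / w

print(acc_before, acc_after)
print(boundary_before, boundary_after)
```

The second stage starts from the weights learned on noisy data rather than from scratch, so the model keeps what it learned and only corrects the bias introduced by the misaligned labels.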

The authors compare this SDA approach against existing teacher-student techniques, namely knowledge distillation (KD) and deep mutual learning (DML), using three newly developed datasets. The methods are evaluated on low-rise, mid-rise, high-rise, and skyscraper buildings, where misalignment increases with building height and spatial resolution.
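
Why misalignment depends on height and resolution can be seen from a simple geometric approximation: on flat terrain, a roof at height h viewed at an off-nadir angle θ appears displaced by roughly h·tan(θ) on the ground, and dividing by the ground sampling distance (GSD) gives the offset in pixels. The sketch below uses this approximation; the building heights, the 10 degree angle, and the 0.3 m GSD are illustrative assumptions, not values from the paper.

```python
import math


def roof_offset_px(height_m, off_nadir_deg, gsd_m):
    """Approximate horizontal roof displacement in pixels (flat terrain assumed)."""
    offset_m = height_m * math.tan(math.radians(off_nadir_deg))
    return offset_m / gsd_m


# Hypothetical representative heights in metres for each building class.
heights = {"low-rise": 6, "mid-rise": 20, "high-rise": 60, "skyscraper": 200}

for name, h in heights.items():
    print(name, round(roof_offset_px(h, off_nadir_deg=10, gsd_m=0.3), 1))
```

Taller buildings and finer resolution (a smaller GSD) both increase the pixel offset between rooftop and label.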

Technical Explanation

The paper proposes a supervised domain adaptation (SDA) approach to train encoder-decoder networks (EDNs) for building extraction, addressing the misalignment between labels and off-nadir source imagery in the training data.

The experimental design involved:

Network Architectures: The authors benchmarked 43 lightweight CNN architectures, including popular models like EfficientNet, ResNeSt, and MobileViT, to identify the best-performing encoders for the EDN setup.
Optimization: Five different optimizers were evaluated, including Adam, SGD, and RMSProp, to find the best-performing configuration.
Loss Functions: Nine different loss functions were tested, such as Dice loss, Binary Cross-Entropy, and their combinations, to identify the most suitable objective for the task.
EDN Configurations: Seven different EDN configurations were explored, varying the encoder and decoder components, to determine the optimal architecture for the SDA approach.
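
One of the loss combinations mentioned above, Binary Cross-Entropy plus Dice, can be sketched in a few lines. This is a generic formulation on flattened per-pixel probabilities; the equal weighting of the two terms and the smoothing constant are assumptions, and the summary does not say which of the nine losses performed best.

```python
import math


def bce_loss(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over all pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1 - eps)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """One minus the soft Dice coefficient; smooth avoids division by zero."""
    inter = sum(p * t for p, t in zip(probs, targets))
    denom = sum(probs) + sum(targets)
    return 1.0 - (2.0 * inter + smooth) / (denom + smooth)


def bce_dice_loss(probs, targets):
    """Unweighted sum of the two terms (the weighting is an assumption)."""
    return bce_loss(probs, targets) + dice_loss(probs, targets)


targets = [1, 1, 0, 0]
print(bce_dice_loss([0.9, 0.8, 0.1, 0.2], targets))  # good prediction, lower loss
print(bce_dice_loss([0.4, 0.3, 0.6, 0.7], targets))  # poor prediction, higher loss
```

BCE scores every pixel independently, while Dice scores the overlap of the predicted and true building regions, which helps when buildings cover only a small part of the tile.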

The proposed SDA method was then compared against the existing teacher-student learning approaches, knowledge distillation (KD) and deep mutual learning (DML), on three newly developed datasets. The evaluation covers low-rise, mid-rise, high-rise, and skyscraper buildings, with misalignment increasing with building height and spatial resolution.

The results show that the SDA of the best-performing EDN from the study significantly outperformed the KD and DML methods, achieving F1 scores of up to 0.943, 0.868, 0.912, and 0.697 for low-rise, mid-rise, high-rise, and skyscraper buildings, respectively.
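
For readers unfamiliar with the metric, the F1 score is the harmonic mean of precision and recall over building pixels. The sketch below computes it for flattened binary masks; the example masks are made up for illustration and are unrelated to the paper's data.

```python
def f1_score(pred, truth):
    """F1 score for binary masks given as flat lists of 0/1 values."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # share of predicted building pixels that are correct
    recall = tp / (tp + fn)     # share of true building pixels that were found
    return 2 * precision * recall / (precision + recall)


pred = [1, 1, 1, 0, 0, 0, 1, 0]
truth = [1, 1, 0, 0, 0, 0, 1, 1]
print(f1_score(pred, truth))  # 0.75
```

A score of 0.943 therefore means predictions and labels overlap almost completely, while 0.697 for skyscrapers reflects many more missed or spurious building pixels.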

Critical Analysis

The paper presents a well-designed study that systematically explores different network architectures, optimization strategies, and loss functions to tackle the challenging problem of building extraction in the presence of label-image misalignment.

One potential limitation is that the evaluation relies on three datasets newly developed by the authors, which may not capture the full variety of real-world urban environments. It would be valuable to also evaluate the proposed SDA approach on independent, established datasets to confirm its effectiveness in practical scenarios.

Additionally, the paper does not provide much insight into the computational efficiency or inference speed of the proposed SDA method, which could be an important consideration for practical applications. A comparison of the training and inference times of the SDA approach against the baseline methods would help readers understand the trade-offs and practical implications of the proposed solution.

Further research could also explore the potential of self-supervised learning or unsupervised domain adaptation techniques to address the label-image misalignment problem, potentially eliminating the need for manually annotated clean datasets required by the supervised domain adaptation approach.

Conclusion

This paper presents a robust supervised domain adaptation (SDA) approach to train encoder-decoder networks (EDNs) for building extraction, addressing the challenge of misalignment between labels and off-nadir source imagery in the training data.

The extensive experimental evaluation, which includes benchmarking various network architectures, optimization strategies, and loss functions, demonstrates the effectiveness of the proposed SDA method in outperforming existing teacher-student learning techniques, such as knowledge distillation and deep mutual learning.

The findings of this study can be highly valuable for researchers and practitioners working on developing accurate and reliable building extraction models, which are crucial for urban inventory management and planning. The insights and techniques presented in this paper can serve as a foundation for further advancements in this important field of research.



Related Papers


Supervised domain adaptation for building extraction from off-nadir aerial images

Bipul Neupane, Jagannath Aryal, Abbas Rajabifard

Building extraction, needed for inventory management and planning of urban environment, is affected by the misalignment between labels and off-nadir source imagery in training data. Teacher-Student learning of noise-tolerant convolutional neural networks (CNNs) is the existing solution, but the Student networks typically have lower accuracy and cannot surpass the Teacher's performance. This paper proposes a supervised domain adaptation (SDA) of encoder-decoder networks (EDNs) between noisy and clean datasets to tackle the problem. EDNs are configured with high-performing lightweight encoders such as EfficientNet, ResNeSt, and MobileViT. The proposed method is compared against the existing Teacher-Student learning methods like knowledge distillation (KD) and deep mutual learning (DML) with three newly developed datasets. The methods are evaluated for different urban buildings (low-rise, mid-rise, high-rise, and skyscrapers), where misalignment increases with the increase in building height and spatial resolution. For a robust experimental design, 43 lightweight CNNs, five optimisers, nine loss functions, and seven EDNs are benchmarked to obtain the best-performing EDN for SDA. The SDA of the best-performing EDN from our study significantly outperformed KD and DML with up to 0.943, 0.868, 0.912, and 0.697 F1 scores in the low-rise, mid-rise, high-rise, and skyscrapers respectively. The proposed method and the experimental findings will be beneficial in training robust CNNs for building extraction.

8/9/2024


Unsupervised Domain Adaptation Architecture Search with Self-Training for Land Cover Mapping

Clifford Broni-Bediako, Junshi Xia, Naoto Yokoya

Unsupervised domain adaptation (UDA) is a challenging open problem in land cover mapping. Previous studies show encouraging progress in addressing cross-domain distribution shifts on remote sensing benchmarks for land cover mapping. The existing works are mainly built on large neural network architectures, which makes them resource-hungry systems, limiting their practical impact for many real-world applications in resource-constrained environments. Thus, we proposed a simple yet effective framework to search for lightweight neural networks automatically for land cover mapping tasks under domain shifts. This is achieved by integrating Markov random field neural architecture search (MRF-NAS) into a self-training UDA framework to search for efficient and effective networks under a limited computation budget. This is the first attempt to combine NAS with self-training UDA as a single framework for land cover mapping. We also investigate two different pseudo-labelling approaches (confidence-based and energy-based) in self-training scheme. Experimental results on two recent datasets (OpenEarthMap & FLAIR #1) for remote sensing UDA demonstrate a satisfactory performance. With only less than 2M parameters and 30.16 GFLOPs, the best-discovered lightweight network reaches state-of-the-art performance on the regional target domain of OpenEarthMap (59.38% mIoU) and the considered target domain of FLAIR #1 (51.19% mIoU). The code is at https://github.com/cliffbb/UDA-NAS.

4/24/2024

Self-degraded contrastive domain adaptation for industrial fault diagnosis with bi-imbalanced data

Gecheng Chen, Zeyu Yang, Chengwen Luo, Jianqiang Li

Modern industrial fault diagnosis tasks often face the combined challenge of distribution discrepancy and bi-imbalance. Existing domain adaptation approaches pay little attention to the prevailing bi-imbalance, leading to poor domain adaptation performance or even negative transfer. In this work, we propose a self-degraded contrastive domain adaptation (Sd-CDA) diagnosis framework to handle the domain discrepancy under the bi-imbalanced data. It first pre-trains the feature extractor via imbalance-aware contrastive learning based on model pruning to learn the feature representation efficiently in a self-supervised manner. Then it forces the samples away from the domain boundary based on supervised contrastive domain adversarial learning (SupCon-DA) and ensures the features generated by the feature extractor are discriminative enough. Furthermore, we propose the pruned contrastive domain adversarial learning (PSupCon-DA) to pay automatically re-weighted attention to the minorities to enhance the performance towards bi-imbalanced data. We show the superiority of the proposed method via two experiments.

6/3/2024

Physics-augmented Deep Learning with Adversarial Domain Adaptation: Applications to STM Image Denoising

Jianxin Xie, Wonhee Ko, Rui-Xing Zhang, Bing Yao

Image denoising is a critical task in various scientific fields such as medical imaging and material characterization, where the accurate recovery of underlying structures from noisy data is essential. Although supervised denoising techniques have achieved significant advancements, they typically require large datasets of paired clean-noisy images for training. Unsupervised methods, while not reliant on paired data, typically necessitate a set of unpaired clean images for training, which are not always accessible. In this paper, we propose a physics-augmented deep learning with adversarial domain adaption (PDA-Net) framework for unsupervised image denoising, with applications to denoise real-world scanning tunneling microscopy (STM) images. Our PDA-Net leverages the underlying physics to simulate and envision the ground truth for denoised STM images. Additionally, built upon Generative Adversarial Networks (GANs), we incorporate a cycle-consistency module and a domain adversarial module into our PDA-Net to address the challenge of lacking paired training data and achieve information transfer between the simulated and real experimental domains. Finally, we propose to implement feature alignment and weight-sharing techniques to fully exploit the similarity between simulated and real experimental images, thereby enhancing the denoising performance in both the simulation and experimental domains. Experimental results demonstrate that the proposed PDA-Net successfully enhances the quality of STM images, offering promising applications to enhance scientific discovery and accelerate experimental quantum material research.

9/10/2024