WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images

Read original: arXiv:2406.18765 - Published 6/28/2024 by Yannik Glaser, Justin E. Stopa, Linnea M. Wolniewicz, Ralph Foster, Doug Vandemark, Alexis Mouche, Bertrand Chapron, Peter Sadowski

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images

Yannik Glaser, Justin E. Stopa, Linnea M. Wolniewicz, Ralph Foster, Doug Vandemark, Alexis Mouche, Bertrand Chapron, Peter Sadowski

The European Space Agency's Copernicus Sentinel-1 (S-1) mission is a constellation of C-band synthetic aperture radar (SAR) satellites that provide unprecedented monitoring of the world's oceans. S-1's wave mode (WV) captures 20x20 km image patches at 5 m pixel resolution and is unaffected by cloud cover or time-of-day. The mission's open data policy has made SAR data easily accessible for a range of applications, but the need for manual image annotations is a bottleneck that hinders the use of machine learning methods. This study uses nearly 10 million WV-mode images and contrastive self-supervised learning to train a semantic embedding model called WV-Net. In multiple downstream tasks, WV-Net outperforms a comparable model that was pre-trained on natural images (ImageNet) with supervised learning. Experiments show improvements for estimating wave height (0.50 vs 0.60 RMSE using linear probing), estimating near-surface air temperature (0.90 vs 0.97 RMSE), and performing multilabel-classification of geophysical and atmospheric phenomena (0.96 vs 0.95 micro-averaged AUROC). WV-Net embeddings are also superior in an unsupervised image-retrieval task and scale better in data-sparse settings. Together, these results demonstrate that WV-Net embeddings can support geophysical research by providing a convenient foundation model for a variety of data analysis and exploration tasks.

6/28/2024
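
The WV-Net abstract reports its wave-height result "using linear probing", meaning a linear model is fit on top of frozen embeddings and scored by RMSE. The following is a minimal sketch of that evaluation pattern, not the authors' code: the embeddings and targets are synthetic stand-ins, and the function names, ridge penalty, and array sizes are all assumptions made for illustration.

```python
import numpy as np


def fit_linear_probe(train_emb, train_y, l2=1e-3):
    """Fit ridge regression on frozen embeddings (closed form).

    Returns a weight vector whose last entry is the bias term.
    """
    X = np.hstack([train_emb, np.ones((train_emb.shape[0], 1))])
    A = X.T @ X + l2 * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ train_y)


def predict(weights, emb):
    """Apply the fitted linear probe to a batch of embeddings."""
    X = np.hstack([emb, np.ones((emb.shape[0], 1))])
    return X @ weights


def rmse(y_true, y_pred):
    """Root-mean-square error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in for frozen embeddings and a scalar target such as
# wave height. Real WV-Net embeddings would replace `emb` here.
rng = np.random.default_rng(0)
emb = rng.normal(size=(500, 32))
true_w = rng.normal(size=32)
target = emb @ true_w + 0.1 * rng.normal(size=500)

# Hold out the last 100 samples for evaluation.
weights = fit_linear_probe(emb[:400], target[:400])
probe_rmse = rmse(target[400:], predict(weights, emb[400:]))

# Trivial baseline: always predict the training mean.
baseline_rmse = rmse(target[400:], np.full(100, target[:400].mean()))
print(f"probe RMSE: {probe_rmse:.3f}, mean-baseline RMSE: {baseline_rmse:.3f}")
```

Because only the linear layer is trained, the resulting RMSE reflects how much task-relevant information the frozen embeddings already carry, which is what makes it a natural way to compare two pre-trained encoders.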

Deformation monitoring with Sentinel-1 Wave mode data

Piyush S. Agram, Matthew T. Calef, Kelly M. Olsen, Kimberly Carlson, Scott Arko

We describe the salient characteristics of Sentinel-1 wave (WV) mode vignettes. We describe our approach for working with WV mode data that enables vignette-based data access and processing, thereby eliminating the Sentinel-1 Single Look Complex (SLC) data packaging and current archive metadata conventions as a bottleneck to large scale processing. We discuss the spatial and temporal coverage of Sentinel-1 WV mode data and show that a large volume of data has been acquired over land masses in this mode, thus allowing us to use it for land monitoring applications as well as ocean applications. For targeted infrastructure monitoring studies, we are able to generate coregistered, geocoded stacks of WV mode SLCs for any area of interest (AOI) with sufficient wave mode coverage, in a few minutes. We demonstrate the applicability of using WV mode data for deformation monitoring applications. Finally, we discuss the benefits and limitations of working with Sentinel-1 WV mode data.

6/24/2024

Wavelet-based Bi-dimensional Aggregation Network for SAR Image Change Detection

Jiangwei Xie, Feng Gao, Xiaowei Zhou, Junyu Dong

Synthetic aperture radar (SAR) image change detection is critical in remote sensing image analysis. Recently, the attention mechanism has been widely used in change detection tasks. However, existing attention mechanisms often employ down-sampling operations such as average pooling on the Key and Value components to enhance computational efficiency. These irreversible operations result in the loss of high-frequency components and other important information. To address this limitation, we develop Wavelet-based Bi-dimensional Aggregation Network (WBANet) for SAR image change detection. We design a wavelet-based self-attention block that includes discrete wavelet transform and inverse discrete wavelet transform operations on Key and Value components. Hence, the feature undergoes downsampling without any loss of information, while simultaneously enhancing local contextual awareness through an expanded receptive field. Additionally, we have incorporated a bi-dimensional aggregation module that boosts the non-linear representation capability by merging spatial and channel information via broadcast mechanism. Experimental results on three SAR datasets demonstrate that our WBANet significantly outperforms contemporary state-of-the-art methods. Specifically, our WBANet achieves a percentage of correct classification (PCC) of 98.33%, 96.65%, and 96.62% on the respective datasets, highlighting its superior performance. Source code is available at https://github.com/summitgao/WBANet.

7/19/2024
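
The WBANet abstract contrasts average pooling, which is irreversible, with wavelet-based downsampling, which loses no information. The sketch below illustrates that property with a single-level 2-D Haar transform in NumPy. It is not the WBANet attention block: the choice of the Haar wavelet, the function names, and the 8x8 test array are assumptions made for illustration.

```python
import numpy as np


def haar_dwt2(x):
    """One-level 2-D Haar transform of an array with even height and width.

    Returns four half-resolution sub-bands: the low-pass band and three
    detail bands (difference across columns, across rows, and diagonal).
    """
    a, b = x[0::2, 0::2], x[0::2, 1::2]
    c, d = x[1::2, 0::2], x[1::2, 1::2]
    low = (a + b + c + d) / 2.0
    d_col = (a - b + c - d) / 2.0
    d_row = (a + b - c - d) / 2.0
    d_diag = (a - b - c + d) / 2.0
    return low, d_col, d_row, d_diag


def haar_idwt2(low, d_col, d_row, d_diag):
    """Invert haar_dwt2 and recover the full-resolution array."""
    h, w = low.shape
    x = np.empty((2 * h, 2 * w), dtype=float)
    x[0::2, 0::2] = (low + d_col + d_row + d_diag) / 2.0
    x[0::2, 1::2] = (low - d_col + d_row - d_diag) / 2.0
    x[1::2, 0::2] = (low + d_col - d_row - d_diag) / 2.0
    x[1::2, 1::2] = (low - d_col - d_row + d_diag) / 2.0
    return x


rng = np.random.default_rng(0)
image = rng.normal(size=(8, 8))

bands = haar_dwt2(image)
restored = haar_idwt2(*bands)

# 2x2 average pooling keeps only a scaled copy of the low-pass band,
# so the three detail bands are what pooling throws away.
avg_pooled = image.reshape(4, 2, 4, 2).mean(axis=(1, 3))

print("max reconstruction error:", np.abs(restored - image).max())
print("pooling equals low/2:", np.allclose(avg_pooled, bands[0] / 2.0))
```

Each sub-band has half the spatial resolution of the input, so attention over them is cheaper than over the full feature map, while the inverse transform can still recover the original values.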

Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining

Yi Wang, Conrad M Albrecht, Xiao Xiang Zhu

Self-supervised pretraining on large-scale satellite data has raised great interest in building Earth observation (EO) foundation models. However, many important resources beyond pure satellite imagery, such as land-cover-land-use products that provide free global semantic information, as well as vision foundation models that hold strong knowledge of the natural world, tend to be overlooked. In this work, we show these free additional resources not only help resolve common contrastive learning bottlenecks, but also significantly boost the efficiency and effectiveness of EO pretraining. Specifically, we first propose soft contrastive learning that optimizes cross-scene soft similarity based on land-cover-generated multi-label supervision, naturally solving the issue of multiple positive samples and overly strict positive matching in complex scenes. Second, we explore cross-domain continual pretraining for both multispectral and SAR imagery, building efficient EO foundation models from the strongest vision models such as DINOv2. Integrating simple weight-initialization and Siamese masking strategies into our soft contrastive learning framework, we demonstrate impressive continual pretraining performance even when the input channels and modalities are not aligned. Without prohibitive training, we produce multispectral and SAR foundation models that achieve significantly better results in 9 out of 10 downstream tasks than most existing SOTA models. For example, our ResNet50/ViT-S achieve 84.8/85.0 linear probing mAP scores on BigEarthNet-10%, which are better than most existing ViT-L models; under the same setting, our ViT-B sets a new record of 86.8 in multispectral, and 82.5 in SAR, the latter even better than many multispectral models. Dataset and models are available at https://github.com/zhu-xlab/softcon.

6/3/2024
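
The soft contrastive learning abstract describes optimizing "cross-scene soft similarity" derived from multi-label supervision instead of hard positive/negative pairs. One plausible reading of that idea is sketched below, purely as an illustration and not as the authors' loss: the cosine similarity between multi-hot label vectors as the soft target, the sigmoid cross-entropy form, the temperature value, and all function names are assumptions. The sketch also assumes every scene carries at least one label, since an all-zero label vector cannot be normalized.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def soft_contrastive_loss(view1, view2, labels, temperature=0.1):
    """Cross-entropy between embedding similarity and label similarity.

    view1, view2: embeddings of two augmented views, shape (n, d).
    labels: multi-hot label matrix, shape (n, k), at least one label per row.
    """
    # Soft targets in [0, 1]: scenes sharing more labels count as more similar.
    targets = cosine_similarity_matrix(labels, labels)
    logits = cosine_similarity_matrix(view1, view2) / temperature
    probs = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    bce = -(targets * np.log(probs + eps)
            + (1.0 - targets) * np.log(1.0 - probs + eps))
    return float(bce.mean())


# Three hypothetical scenes: the first two share a label, the third differs.
labels = np.array([[1.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0]])

# Embeddings that agree with the label structure.
aligned_loss = soft_contrastive_loss(labels, labels, labels)

# Embeddings for the second view shuffled so they contradict the labels.
shuffled = labels[[2, 0, 1]]
misaligned_loss = soft_contrastive_loss(labels, shuffled, labels)

print(f"aligned: {aligned_loss:.3f}, misaligned: {misaligned_loss:.3f}")
```

With hard contrastive targets, the first two scenes would be pushed apart despite sharing a label. Here their target similarity is 1, which is the "multiple positive samples" situation the abstract mentions.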