Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

arXiv:2407.07504, published 7/16/2024 by Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng


Overview

This paper presents a self-supervised pre-training approach for learning slide-level representations of pan-cancer histopathology whole-slide images (WSIs), using a position-aware masked autoencoder (PAMA).
PAMA is trained to reconstruct masked portions of a WSI while encoding where each region sits within the slide.
The pre-trained model can then be fine-tuned on downstream tasks such as pan-cancer classification, where the authors report that it outperforms state-of-the-art WSI analysis methods.

Plain English Explanation

The paper focuses on a technique called the position-aware masked autoencoder (PAMA), which can be used to pre-train models on large datasets of whole-slide histopathology images. These images contain detailed visual information about tissue samples, which is useful for tasks like cancer classification.

The key idea behind PAMA is to partially hide, or "mask", portions of a slide and train the model to reconstruct the missing parts. The model also keeps track of where each region sits within the full slide, which pushes it to learn how tissue content is arranged spatially instead of treating each region in isolation.

Once PAMA is pre-trained in this way, it can be fine-tuned on specific downstream tasks, such as classifying cancer types. The authors report that it outperforms seven existing WSI analysis methods on pan-cancer classification.

The benefit of this technique is that it allows models to learn rich, generalizable representations from large, unlabeled histopathology datasets, which can then be effectively applied to various medical imaging tasks. This is particularly valuable since collecting and annotating medical image data can be time-consuming and expensive.

Technical Explanation

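The paper introduces a position-aware masked autoencoder (PAMA) for pre-training WSI-level representations on pan-cancer histopathology data. Most existing self-supervised methods for histopathology learn patch-level features; PAMA targets the whole slide instead. Its key component is a position-aware cross-attention (PACA) module with two additions: a kernel reorientation (KRO) strategy, which the authors say captures the complete semantic structure of a WSI and eliminates ambiguity, and an anchor dropout (AD) mechanism, which improves the robustness and generalization of the model.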
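The abstract does not spell out PACA's internals, so the following is only a hypothetical sketch of the kind of relative geometry a position-aware cross-attention could consume, not the authors' implementation. It assumes patches and anchors are 2D grid coordinates, that each anchor carries a reference direction, and that measuring angles relative to that direction is one way to remove orientation ambiguity; that last point is our assumed reading of KRO, not something the source confirms. The anchor-dropout function likewise assumes AD simply drops random anchors during training. `relative_geometry`, `anchor_dropout`, and all parameters are illustrative names.

```python
import math
import random
from typing import List, Tuple

Coord = Tuple[float, float]


def relative_geometry(
    patches: List[Coord], anchors: List[Coord], anchor_refs: List[float]
) -> List[List[Tuple[float, float]]]:
    """Return, for each anchor, the (distance, angle) to every patch.

    Hypothetical stand-in for kernel reorientation: angles (radians) are
    measured relative to each anchor's own reference direction, so rotating
    the slide together with those directions leaves the angles unchanged.
    """
    table = []
    for (ax, ay), ref in zip(anchors, anchor_refs):
        row = []
        for px, py in patches:
            dx, dy = px - ax, py - ay
            dist = math.hypot(dx, dy)
            # Remove the anchor's reference direction, then wrap into [0, 2*pi).
            angle = (math.atan2(dy, dx) - ref) % (2 * math.pi)
            row.append((dist, angle))
        table.append(row)
    return table


def anchor_dropout(n_anchors: int, p: float, rng: random.Random) -> List[int]:
    """Keep each anchor index with probability 1 - p (assumed form of AD).

    At least one anchor is always kept so the attention has a target.
    """
    if n_anchors < 1:
        raise ValueError("need at least one anchor")
    kept = [i for i in range(n_anchors) if rng.random() >= p]
    return kept if kept else [rng.randrange(n_anchors)]
```

In a full model, each (distance, angle) pair could be bucketed and mapped to a learned bias or embedding inside the attention; that step is omitted here.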

Specifically, PAMA operates on the whole slide rather than on a single patch: a portion of the slide's patches is masked, and the model is trained to reconstruct the masked content from the visible context. Because the model encodes where patches sit relative to one another (the "position-aware" part of its name), it is pushed to learn how tissue content is organized across the slide instead of treating patches as an unordered set.
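The masking-and-reconstruction objective can be sketched in a generic MAE style. This is a minimal sketch, not the authors' code: it assumes the slide has already been turned into a sequence of patch feature vectors ("tokens"), and the uniform random masking, the mean-squared-error target, and the names `random_mask` and `masked_mse` are illustrative assumptions. The encoder and decoder themselves are omitted.

```python
import random
from typing import List, Sequence, Tuple


def random_mask(
    n_tokens: int, mask_ratio: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Split token indices into (visible, masked) uniformly at random."""
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    n_masked = int(n_tokens * mask_ratio)  # floor, so some tokens stay visible
    order = list(range(n_tokens))
    rng.shuffle(order)
    return sorted(order[n_masked:]), sorted(order[:n_masked])


def masked_mse(
    target: Sequence[Sequence[float]],
    pred: Sequence[Sequence[float]],
    masked: Sequence[int],
) -> float:
    """Mean squared error averaged over the masked tokens only."""
    if not masked:
        return 0.0
    total, count = 0.0, 0
    for i in masked:
        for t, p in zip(target[i], pred[i]):
            total += (t - p) ** 2
            count += 1
    return total / count
```

Scoring only the masked tokens means the model gets no credit for copying visible inputs, which is what forces it to infer hidden content from context.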

The authors experiment with different masking strategies, such as random masking and structured masking, to encourage the model to learn different types of spatial relationships in the data.
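The summary does not define "structured" masking, so the sketch below assumes one simple form for illustration: hiding a single contiguous rectangle of patches on the slide grid. It also adds a hypothetical diagnostic, the fraction of masked cells that still have a visible 4-neighbour, to show how the two strategies differ in the local context they leave behind. Function names and the rectangular-block choice are assumptions.

```python
import random
from typing import Iterable, List


def block_mask(rows: int, cols: int, block_h: int, block_w: int,
               rng: random.Random) -> List[int]:
    """Mask one contiguous block_h x block_w rectangle on a rows x cols grid.

    Cells are indexed row-major: index = r * cols + c.
    """
    if not (0 < block_h <= rows and 0 < block_w <= cols):
        raise ValueError("block must fit inside the grid")
    top = rng.randrange(rows - block_h + 1)
    left = rng.randrange(cols - block_w + 1)
    return [r * cols + c
            for r in range(top, top + block_h)
            for c in range(left, left + block_w)]


def visible_neighbour_fraction(masked: Iterable[int], rows: int, cols: int) -> float:
    """Fraction of masked cells with at least one visible 4-neighbour."""
    mset = set(masked)
    if not mset:
        return 0.0
    hits = 0
    for idx in mset:
        r, c = divmod(idx, cols)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and nr * cols + nc not in mset:
                hits += 1
                break
    return hits / len(mset)
```

Scattered random masks tend to leave most masked cells next to a visible one, so local interpolation can solve much of the task; a large block removes that shortcut for its interior cells and forces longer-range reasoning.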

After pre-training PAMA on a large, unlabeled dataset of WSIs, the model can be fine-tuned on downstream tasks. The authors evaluate it on pan-cancer classification across six large-scale datasets from multiple organs and compare it with seven WSI analysis methods, reporting that PAMA outperforms the state of the art.
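Fine-tuning for slide-level classification needs a head on top of the pre-trained encoder. The summary does not say which head PAMA uses, so this sketch assumes the simplest option: mean-pooling the encoder's output tokens into one slide vector and applying a linear layer with a numerically stable softmax. `mean_pool`, `linear_softmax`, and the pooling choice are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(tokens: Sequence[Sequence[float]]) -> List[float]:
    """Average token vectors into a single slide-level vector."""
    if not tokens:
        raise ValueError("need at least one token")
    dim = len(tokens[0])
    return [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]


def linear_softmax(x: Sequence[float], weights: Sequence[Sequence[float]],
                   bias: Sequence[float]) -> List[float]:
    """Class probabilities from a linear layer (one weight row per class)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

During fine-tuning, the weights of this head (and optionally the encoder) would be trained on slide-level labels.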

The PAMAE pre-training approach builds on recent advancements in self-supervised learning for microscopy images and whole-slide representations, as well as position-aware models for spatial and topological understanding.

Critical Analysis

The paper presents a compelling approach for pre-training models on large, unlabeled histopathology datasets, which can be an effective way to leverage the wealth of available data and improve performance on downstream tasks.

One potential limitation is that PAMA learns by reconstructing regions hidden by random or structured masking, which may not fully capture the complex spatial relationships and hierarchical structures present in real histopathology data. It would be interesting to explore more biologically inspired masking strategies that better reflect the underlying tissue organization.

Additionally, the evaluation focuses on pan-cancer classification. It would be valuable to see how the pre-trained model generalizes to a wider range of histopathology applications, such as tumor segmentation.

Overall, the paper presents a promising direction for leveraging self-supervised learning to improve the performance and generalization of histopathology models, which could have significant impact in the field of computational pathology and cancer diagnosis.

Conclusion

The paper introduces a position-aware masked autoencoder (PAMA) for pre-training on large, unlabeled histopathology datasets. PAMA is trained to reconstruct masked portions of whole-slide images while encoding the position of each region within the slide, encouraging it to learn a detailed spatial understanding of the image content.

The pre-trained PAMA model can then be fine-tuned on downstream tasks like pan-cancer classification, where the authors report performance superior to state-of-the-art methods. This approach allows models to leverage the wealth of available histopathology data to learn rich, generalizable representations that can be effectively applied to various medical imaging tasks.

The paper's findings suggest that self-supervised pre-training strategies, such as PAMA, can be a powerful tool for advancing the field of computational pathology and improving the performance of AI systems in medical imaging applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

Pan-cancer Histopathology WSI Pre-training with Position-aware Masked Autoencoder

Kun Wu, Zhiguo Jiang, Kunming Tang, Jun Shi, Fengying Xie, Wei Wang, Haibo Wu, Yushan Zheng

Large-scale pre-training models have promoted the development of histopathology image analysis. However, existing self-supervised methods for histopathology images focus on learning patch features, while there is still a lack of available pre-training models for WSI-level feature learning. In this paper, we propose a novel self-supervised learning framework for pan-cancer WSI-level representation pre-training with the designed position-aware masked autoencoder (PAMA). Meanwhile, we propose the position-aware cross-attention (PACA) module with a kernel reorientation (KRO) strategy and an anchor dropout (AD) mechanism. The KRO strategy can capture the complete semantic structure and eliminate ambiguity in WSIs, and the AD contributes to enhancing the robustness and generalization of the model. We evaluated our method on 6 large-scale datasets from multiple organs for pan-cancer classification tasks. The results have demonstrated the effectiveness of PAMA in generalized and discriminative WSI representation learning and pan-cancer WSI pre-training. The proposed method was also compared with 7 WSI analysis methods. The experimental results have indicated that our proposed PAMA is superior to the state-of-the-art methods. The code and checkpoints are available at https://github.com/WkEEn/PAMA.

7/16/2024

A self-supervised framework for learning whole slide representations

Xinhai Hou, Cheng Jiang, Akhil Kondepudi, Yiwei Lyu, Asadur Chowdury, Honglak Lee, Todd C. Hollon

Whole slide imaging is fundamental to biomedical microscopy and computational pathology. Previously, learning representations for gigapixel-sized whole slide images (WSIs) has relied on multiple instance learning with weak labels, which do not annotate the diverse morphologic features and spatial heterogeneity of WSIs. A high-quality self-supervised learning method for WSIs would provide transferable visual representations for downstream computational pathology tasks, without the need for dense annotations. We present Slide Pre-trained Transformers (SPT) for gigapixel-scale self-supervision of WSIs. Treating WSI patches as tokens, SPT combines data transformation strategies from language and vision modeling into a general and unified framework to generate views of WSIs for self-supervised pretraining. SPT leverages the inherent regional heterogeneity, histologic feature variability, and information redundancy within WSIs to learn high-quality whole slide representations. We benchmark SPT visual representations on five diagnostic tasks across three biomedical microscopy datasets. SPT significantly outperforms baselines for histopathologic diagnosis, cancer subtyping, and genetic mutation prediction. Finally, we demonstrate that SPT consistently improves whole slide representations when using off-the-shelf, in-domain, and foundational patch encoders for whole slide multiple instance learning.

5/27/2024

PD-L1 Classification of Weakly-Labeled Whole Slide Images of Breast Cancer

Giacomo Cignoni, Cristian Scatena, Chiara Frascarelli, Nicola Fusco, Antonio Giuseppe Naccarato, Giuseppe Nicolò Fanelli, Alina Sîrbu

Specific and effective breast cancer therapy relies on the accurate quantification of PD-L1 positivity in tumors, which appears in the form of brown stainings in high resolution whole slide images (WSIs). However, the retrieval and extensive labeling of PD-L1 stained WSIs is a time-consuming and challenging task for pathologists, resulting in low reproducibility, especially for borderline images. This study aims to develop and compare models able to classify PD-L1 positivity of breast cancer samples based on WSI analysis, relying only on WSI-level labels. The task consists of two phases: identifying regions of interest (ROI) and classifying tumors as PD-L1 positive or negative. For the latter, two model categories were developed, with different feature extraction methodologies. The first encodes images based on the colour distance from a base color. The second uses a convolutional autoencoder to obtain embeddings of WSI tiles, and aggregates them into a WSI-level embedding. For both model types, features are fed into downstream ML classifiers. Two datasets from different clinical centers were used in two different training configurations: (1) training on one dataset and testing on the other; (2) combining the datasets. We also tested the performance with or without human preprocessing to remove brown artefacts. Colour distance based models achieve the best performances on testing configuration (1) with artefact removal, while autoencoder-based models are superior in the remaining cases, which are prone to greater data variability.

4/17/2024

Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation

Pengfei Gu, Yejia Zhang, Huimin Li, Chaoli Wang, Danny Z. Chen

Masked Autoencoders (MAEs) have been shown to be effective in pre-training Vision Transformers (ViTs) for natural and medical image analysis problems. By reconstructing missing pixel/voxel information in visible patches, a ViT encoder can aggregate contextual information for downstream tasks. But, existing MAE pre-training methods, which were specifically developed with the ViT architecture, lack the ability to capture geometric shape and spatial information, which is critical for medical image segmentation tasks. In this paper, we propose a novel extension of known MAEs for self pre-training (i.e., models pre-trained on the same target dataset) for 3D medical image segmentation. (1) We propose a new topological loss to preserve geometric shape information by computing topological signatures of both the input and reconstructed volumes, learning geometric shape information. (2) We introduce a pre-text task that predicts the positions of the centers and eight corners of 3D crops, enabling the MAE to aggregate spatial information. (3) We extend the MAE pre-training strategy to a hybrid state-of-the-art (SOTA) medical image segmentation architecture and co-pretrain it alongside the ViT. (4) We develop a fine-tuned model for downstream segmentation tasks by complementing the pre-trained ViT encoder with our pre-trained SOTA model. Extensive experiments on five public 3D segmentation datasets show the effectiveness of our new approach.

7/17/2024