Data Efficient Contrastive Learning in Histopathology using Active Sampling

Read original: arXiv:2303.16247 - Published 7/23/2024 by Tahsin Reasat, Asif Sushmit, David S. Smith

Overview

Deep learning (DL) algorithms can provide accurate and robust quantitative analysis in digital pathology.
These algorithms require large amounts of annotated training data, which is impractical to obtain in pathology because histopathological images have very high resolution.
Self-supervised methods have been proposed to learn features through ad-hoc pretext tasks, but self-supervised training is time-consuming because it relies on a large unlabeled dataset.

Plain English Explanation

Deep learning algorithms have the potential to provide detailed and reliable analysis of digital pathology images. However, these algorithms need a lot of labeled training data, which is difficult to obtain for pathology images because they have very high resolution. To address this, researchers have developed self-supervised learning methods that can learn useful features from large unlabeled datasets. But the self-supervised training process is still very time-consuming.

In this work, the researchers propose a new method that can actively select the most informative samples from the training dataset, reducing the number of samples needed by 93% and the training time by 62%, while maintaining the same performance as the traditional self-supervised learning approach. This makes the training process much more efficient and practical for real-world pathology applications.

Technical Explanation

The researchers developed a new method for actively sampling informative members from the training set using a small proxy network. This reduces the sample requirement by 93% and the training time by 62%, while still achieving the same performance as the traditional self-supervised learning method.
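The two headline figures are relative reductions against the traditional self-supervised baseline. The short sketch below shows what they imply for a workload; the 100,000-patch dataset and the 50-hour baseline are illustrative assumptions, not numbers from the paper, and `reduced_budget` is a hypothetical helper.

```python
def reduced_budget(n_samples: int, baseline_hours: float,
                   sample_reduction: float = 0.93,
                   time_reduction: float = 0.62) -> tuple[int, float]:
    """Apply the reported relative reductions to a baseline training budget.

    Returns (samples kept, estimated training hours). The defaults are the
    93% sample and 62% training-time reductions reported in the abstract.
    """
    kept = round(n_samples * (1.0 - sample_reduction))
    hours = baseline_hours * (1.0 - time_reduction)
    return kept, hours


# Hypothetical baseline: 100,000 unlabeled patches, 50 hours of training.
kept, hours = reduced_budget(100_000, 50.0)
print(kept, round(hours, 2))  # 7000 19.0
```

The sample and time savings are reported separately: keeping 7% of the samples leaves training at 38% of the baseline time rather than 7%, so the time saving cannot be derived from the sample saving.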

The key idea is to use a small, inexpensive "proxy" network to identify the most informative samples in the large unlabeled dataset, rather than training on the entire dataset. The proxy network scores how useful each sample is likely to be for the main self-supervised learning task, and only the most informative samples are used for training. This significantly reduces training time and data requirements, making the overall approach much more efficient and practical for real-world digital pathology applications.
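The summary does not say how the proxy network scores samples. The sketch below illustrates one plausible active-sampling step under a loudly stated assumption: each patch's informativeness is taken to be its per-sample SimCLR-style contrastive (NT-Xent) loss under a small proxy encoder, and only the highest-loss fraction is kept for training the main model. The functions `per_sample_nt_xent` and `select_informative` are hypothetical, random vectors stand in for the proxy's embeddings of two augmented views, and this is not the authors' code (which is published at https://github.com/Reasat/data_efficient_cl).

```python
import numpy as np


def per_sample_nt_xent(z_a: np.ndarray, z_b: np.ndarray, temperature: float = 0.5) -> np.ndarray:
    """Per-sample NT-Xent (SimCLR-style) loss for N pairs of view embeddings.

    z_a, z_b: (N, D) embeddings of two augmented views of the same N patches,
    e.g. produced by a small proxy encoder (hypothetical here).
    Returns an (N,) array: each pair's loss averaged over both view directions.
    """
    n = z_a.shape[0]
    z = np.concatenate([z_a, z_b], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)    # unit vectors, so dot product = cosine similarity
    sim = z @ z.T / temperature                         # (2N, 2N) temperature-scaled similarities
    np.fill_diagonal(sim, -np.inf)                      # a view is never compared with itself
    partner = np.concatenate([np.arange(n, 2 * n), np.arange(n)])  # index of each view's positive
    # Numerically stable log-sum-exp over each row (the positive plus all negatives).
    row_max = sim.max(axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    loss = log_denom - sim[np.arange(2 * n), partner]   # -log softmax at the positive, shape (2N,)
    return 0.5 * (loss[:n] + loss[n:])


def select_informative(scores: np.ndarray, keep_fraction: float = 0.07) -> np.ndarray:
    """Indices of the highest-scoring samples.

    ASSUMPTION: a higher proxy loss is treated as "more informative".
    The default 0.07 mirrors the reported 93% sample reduction.
    """
    k = max(1, int(round(keep_fraction * len(scores))))
    return np.argsort(scores)[::-1][:k]


# Toy usage: random vectors stand in for proxy-encoder embeddings.
rng = np.random.default_rng(0)
n, d = 1000, 32
z_a = rng.normal(size=(n, d))
z_b = z_a + 0.1 * rng.normal(size=(n, d))   # most pairs: the two views nearly agree
z_b[:50] = rng.normal(size=(50, d))         # first 50 pairs: the views disagree ("hard" pairs)

scores = per_sample_nt_xent(z_a, z_b)
subset = select_informative(scores, keep_fraction=0.05)
print(len(subset))  # 50
```

In this toy setup the selected subset should be the 50 mismatched pairs, since their positives are no more similar than the negatives. Selecting high-loss pairs is an illustrative choice; another criterion, such as diversity in embedding space, could replace the score passed to `select_informative` without changing the rest of the loop.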

Critical Analysis

The researchers acknowledge that their method still requires a small amount of labeled data to train the proxy network, which could be a limitation in some scenarios. Additionally, the effectiveness of the method may depend on the specific self-supervised pretext task and the characteristics of the pathology dataset.

Further research could explore ways to reduce or eliminate the need for any labeled data, perhaps by using more sophisticated techniques for identifying informative samples. It would also be valuable to test the method on a wider range of pathology datasets to better understand its general applicability.

Overall, the researchers have presented a promising approach to making self-supervised learning more efficient and practical for digital pathology applications, which could have a significant impact on the field.

Conclusion

The researchers have developed a new method for actively sampling informative training samples for self-supervised deep learning in digital pathology. This approach significantly reduces the amount of data and training time required, while maintaining the same performance as traditional self-supervised learning. By making the training process more efficient and practical, this work has the potential to enable more widespread adoption of deep learning for quantitative analysis in digital pathology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Data Efficient Contrastive Learning in Histopathology using Active Sampling

Tahsin Reasat, Asif Sushmit, David S. Smith

Deep learning (DL) based diagnostics systems can provide accurate and robust quantitative analysis in digital pathology. These algorithms require large amounts of annotated training data, which is impractical to obtain in pathology due to the high resolution of histopathological images. Hence, self-supervised methods have been proposed to learn features using ad-hoc pretext tasks. The self-supervised training process uses a large unlabeled dataset, which makes the learning process time-consuming. In this work, we propose a new method for actively sampling informative members from the training set using a small proxy network, decreasing sample requirement by 93% and training time by 62% while maintaining the same performance as the traditional self-supervised learning method. The code is available at https://github.com/Reasat/data_efficient_cl

7/23/2024

Classification of Breast Cancer Histopathology Images using a Modified Supervised Contrastive Learning Method

Matina Mahdizadeh Sani, Ali Royat, Mahdieh Soleymani Baghshah

Deep neural networks have reached remarkable achievements in medical image processing tasks, specifically classifying and detecting various diseases. However, when confronted with limited data, these networks face a critical vulnerability, often succumbing to overfitting by excessively memorizing the limited information available. This work addresses the challenge mentioned above by improving the supervised contrastive learning method to reduce the impact of false positives. Unlike most existing methods that rely predominantly on fully supervised learning, our approach leverages the advantages of self-supervised learning in conjunction with employing the available labeled data. We evaluate our method on the BreakHis dataset, which consists of breast cancer histopathology images, and demonstrate an increase in classification accuracy by 1.45% at the image level and 1.42% at the patient level compared to the state-of-the-art method. This improvement corresponds to 93.63% absolute accuracy, highlighting our approach's effectiveness in leveraging data properties to learn more appropriate representation space.

5/7/2024

Dataset Distillation for Histopathology Image Classification

Cong Cong, Shiyu Xuan, Sidong Liu, Maurice Pagnucco, Shiliang Zhang, Yang Song

Deep neural networks (DNNs) have exhibited remarkable success in the field of histopathology image analysis. On the other hand, the contemporary trend of employing large models and extensive datasets has underscored the significance of dataset distillation, which involves compressing large-scale datasets into a condensed set of synthetic samples, offering distinct advantages in improving training efficiency and streamlining downstream applications. In this work, we introduce a novel dataset distillation algorithm tailored for histopathology image datasets (Histo-DD), which integrates stain normalisation and model augmentation into the distillation process. Such integration can substantially enhance the compatibility with histopathology images that are often characterised by high colour heterogeneity. We conduct a comprehensive evaluation of the effectiveness of the proposed algorithm and the generated histopathology samples in both patch-level and slide-level classification tasks. The experimental results, carried out on three publicly available WSI datasets, including Camelyon16, TCGA-IDH, and UniToPath, demonstrate that the proposed Histo-DD can generate more informative synthetic patches than previous coreset selection and patch sampling methods. Moreover, the synthetic samples can preserve discriminative information, substantially reduce training efforts, and exhibit architecture-agnostic properties. These advantages indicate that synthetic samples can serve as an alternative to large-scale datasets.

8/20/2024

MyriadAL: Active Few Shot Learning for Histopathology

Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, Xingyu Li

Active Learning (AL) and Few Shot Learning (FSL) are two label-efficient methods which have achieved excellent results recently. However, most prior work in both learning paradigms fails to explore the wealth of the vast unlabelled data. In this study, we address this issue in the scenario where the annotation budget is very limited, yet a large amount of unlabelled data for the target task is available. We frame this work in the context of histopathology where labelling is prohibitively expensive. To this end, we introduce an active few shot learning framework, Myriad Active Learning (MAL), including a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Specifically, we propose to massage unlabelled data in a self-supervised manner, where the obtained data representations and clustering knowledge form the basis to activate the AL loop. With feedback from the oracle in each AL cycle, the pseudo-labels of the unlabelled data are refined by optimizing a shallow task-specific net on top of the encoder. These updated pseudo-labels serve to inform and improve the active learning query selection process. Furthermore, we introduce a novel recipe to combine existing uncertainty measures and utilize the entire uncertainty list to reduce sample redundancy in AL. Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve a comparable test accuracy to a fully supervised algorithm while labelling only 5% of the dataset.

4/26/2024