MyriadAL: Active Few Shot Learning for Histopathology

Read original: arXiv:2310.16161 - Published 4/26/2024 by Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, Xingyu Li

Overview

This paper introduces a new active few-shot learning framework called Myriad Active Learning (MAL) that leverages unlabeled data to improve label efficiency.
MAL uses a contrastive learning encoder, pseudo-label generation, and a novel query sample selection process to activate the active learning loop.
The authors evaluate MAL on two public histopathology datasets and show it can achieve comparable test accuracy to a fully supervised model while only labeling 5% of the dataset.

Plain English Explanation

Active learning and few-shot learning are two methods that can help machine learning models learn efficiently with limited labeled data. However, most previous approaches in these areas have not taken advantage of the large amounts of unlabeled data that are often available.

This paper proposes a new framework called Myriad Active Learning (MAL) that combines active learning and few-shot learning to leverage both labeled and unlabeled data. The key ideas are:

Use contrastive learning to learn useful data representations from the unlabeled data in a self-supervised way.
Generate "pseudo-labels" for the unlabeled data based on the learned representations, and refine these labels as the active learning process progresses.
Use a novel query sample selection method that combines existing uncertainty measures to reduce redundancy when selecting new samples to label.

The authors test MAL on two histopathology image datasets and show that it can achieve high accuracy while labeling only a small fraction (5%) of the total data. This could be very useful in domains like medical imaging where labeling is prohibitively expensive.

Technical Explanation

The core of the MAL framework is a contrastive learning encoder that learns useful data representations from the unlabeled images in a self-supervised way. These representations and the clustering knowledge obtained serve as the foundation for the active learning loop.
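To make the idea of contrastive pre-training concrete, here is a minimal sketch of one widely used contrastive objective, the NT-Xent loss from SimCLR. The summary does not say which contrastive loss the authors use, so treat this choice, the function names, and the temperature value as assumptions for illustration only. Two augmented views of the same image form a positive pair, and every other embedding in the batch acts as a negative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(view_a, view_b, temperature=0.5):
    """Mean NT-Xent loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are embeddings of two augmentations of image i.
    The temperature is an illustrative value, not one taken from the paper.
    """
    embeddings = view_a + view_b  # 2N vectors in total
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same image
        # Exponentiated, temperature-scaled similarity to every other embedding.
        weights = {
            j: math.exp(cosine(embeddings[i], embeddings[j]) / temperature)
            for j in range(2 * n)
            if j != i
        }
        total += -math.log(weights[positive] / sum(weights.values()))
    return total / (2 * n)
```

The loss is small when the two views of each image land close together in embedding space and far from the views of other images, which is what lets an encoder learn structure from unlabeled patches.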

In each active learning cycle, the model generates pseudo-labels for the unlabeled data by optimizing a shallow task-specific network on top of the encoder. These updated pseudo-labels are then used to inform the active learning query selection process.
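One cycle of this step can be sketched as follows, assuming the encoder is frozen and the "shallow task-specific network" is a single softmax layer trained on the few labeled embeddings. The layer type, the hyperparameters, and the function names (`fit_linear_head`, `pseudo_label`) are hypothetical; the summary does not specify the authors' head architecture or training schedule.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, x):
    """Class probabilities of a linear softmax head for one embedding x."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


def fit_linear_head(features, labels, num_classes, epochs=200, lr=0.5):
    """Train a softmax layer on labeled embeddings with plain SGD."""
    dim = len(features[0])
    weights = [[0.0] * dim for _ in range(num_classes)]
    biases = [0.0] * num_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = predict_proba(weights, biases, x)
            for c in range(num_classes):
                # Gradient of the cross-entropy loss w.r.t. the score of class c.
                grad = probs[c] - (1.0 if c == y else 0.0)
                biases[c] -= lr * grad
                for d in range(dim):
                    weights[c][d] -= lr * grad * x[d]
    return weights, biases


def pseudo_label(weights, biases, unlabeled_features):
    """Assign each unlabeled embedding its most probable class."""
    labels = []
    for x in unlabeled_features:
        probs = predict_proba(weights, biases, x)
        labels.append(max(range(len(probs)), key=probs.__getitem__))
    return labels
```

In a full loop, the head would be refit after each batch of oracle labels arrives, and the pseudo-labels recomputed from the updated head.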

The authors introduce a novel query sample selection method that combines existing uncertainty measures, such as entropy and margin, to reduce sample redundancy and improve label efficiency. This approach draws on the entire ranked uncertainty list, rather than only the most uncertain samples at its top.
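The summary does not give the authors' exact recipe, so the following is only a hypothetical stand-in that shows the two ingredients named above: a score that blends entropy with margin, and a selection rule that samples across the whole ranking instead of taking only its head. The equal weighting and the evenly spaced selection are assumptions made for this sketch, not the paper's method.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def margin_uncertainty(probs):
    """1 minus the gap between the two most probable classes."""
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)


def combined_uncertainty(probs, weight=0.5):
    """Blend normalized entropy and margin uncertainty into one score in [0, 1]."""
    max_entropy = math.log(len(probs))
    return (weight * entropy(probs) / max_entropy
            + (1.0 - weight) * margin_uncertainty(probs))


def select_queries(prob_rows, budget):
    """Pick `budget` sample indices spread evenly over the uncertainty ranking.

    Assumes 1 <= budget <= len(prob_rows).
    """
    ranked = sorted(range(len(prob_rows)),
                    key=lambda i: combined_uncertainty(prob_rows[i]),
                    reverse=True)
    stride = len(ranked) / budget
    return [ranked[int(k * stride)] for k in range(budget)]
```

Taking only the top of the ranking tends to return many near-identical ambiguous samples; spacing the picks along the list is one simple way to trade some uncertainty for variety.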

Extensive experiments on two public histopathology datasets show that MAL achieves better test accuracy, macro F1-score, and label efficiency than prior active learning and few-shot learning methods. The results show that MAL can achieve test accuracy comparable to a fully supervised model while labeling only 5% of the dataset.
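Macro F1-score is reported alongside accuracy because it weights every class equally, which matters when some tissue classes are rare. A minimal sketch of the standard definition follows; the function name and any labels passed to it are illustrative and not taken from the paper's results.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```

For example, a classifier that always predicts the majority class on labels [0, 0, 0, 0, 1] is 80% accurate, but its F1 on the minority class is zero, so its macro F1 falls well below its accuracy.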

Critical Analysis

The authors acknowledge several limitations of their work. First, the performance of MAL is still dependent on the quality of the initial contrastive learning representations. If these representations are not sufficiently informative, the subsequent active learning process may be hindered.

Additionally, the authors note that the pseudo-label refinement process, while effective, still relies on a small amount of labeled data. It would be interesting to explore ways to further reduce this dependency, perhaps by incorporating more advanced semi-supervised learning techniques.

Another potential area for improvement is the query sample selection method. While the authors' proposed approach combining multiple uncertainty measures is effective, there may be room for more sophisticated techniques that better capture the informativeness and diversity of candidate samples.

Finally, the authors only evaluate MAL on histopathology image datasets. It would be valuable to assess its performance on a wider range of tasks and datasets to better understand its broader applicability and potential limitations.

Conclusion

This paper introduces a novel active few-shot learning framework called Myriad Active Learning (MAL) that leverages unlabeled data to improve label efficiency. By using contrastive learning, pseudo-label generation, and a novel query sample selection process, MAL is able to achieve high test accuracy on histopathology image datasets while labeling only a small fraction of the total data.

The results of this work have significant implications for domains where data labeling is prohibitively expensive, such as medical imaging. By reducing the amount of labeled data required, MAL could help make machine learning models more accessible and practical in these settings. Further research to address the identified limitations and expand the approach to a wider range of applications would be valuable contributions to the field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MyriadAL: Active Few Shot Learning for Histopathology

Nico Schiavone, Jingyi Wang, Shuangzhi Li, Roger Zemp, Xingyu Li

Active Learning (AL) and Few Shot Learning (FSL) are two label-efficient methods which have achieved excellent results recently. However, most prior arts in both learning paradigms fail to explore the wealth of the vast unlabelled data. In this study, we address this issue in the scenario where the annotation budget is very limited, yet a large amount of unlabelled data for the target task is available. We frame this work in the context of histopathology where labelling is prohibitively expensive. To this end, we introduce an active few shot learning framework, Myriad Active Learning (MAL), including a contrastive-learning encoder, pseudo-label generation, and novel query sample selection in the loop. Specifically, we propose to massage unlabelled data in a self-supervised manner, where the obtained data representations and clustering knowledge form the basis to activate the AL loop. With feedback from the oracle in each AL cycle, the pseudo-labels of the unlabelled data are refined by optimizing a shallow task-specific net on top of the encoder. These updated pseudo-labels serve to inform and improve the active learning query selection process. Furthermore, we introduce a novel recipe to combine existing uncertainty measures and utilize the entire uncertainty list to reduce sample redundancy in AL. Extensive experiments on two public histopathology datasets show that MAL has superior test accuracy, macro F1-score, and label efficiency compared to prior works, and can achieve a comparable test accuracy to a fully supervised algorithm while labelling only 5% of the dataset.

4/26/2024

Focused Active Learning for Histopathological Image Classification

Arne Schmidt, Pablo Morales-Álvarez, Lee A. D. Cooper, Lee A. Newberg, Andinet Enquobahrie, Aggelos K. Katsaggelos, Rafael Molina

Active Learning (AL) has the potential to solve a major problem of digital pathology: the efficient acquisition of labeled data for machine learning algorithms. However, existing AL methods often struggle in realistic settings with artifacts, ambiguities, and class imbalances, as commonly seen in the medical field. The lack of precise uncertainty estimations leads to the acquisition of images with a low informative value. To address these challenges, we propose Focused Active Learning (FocAL), which combines a Bayesian Neural Network with Out-of-Distribution detection to estimate different uncertainties for the acquisition function. Specifically, the weighted epistemic uncertainty accounts for the class imbalance, aleatoric uncertainty for ambiguous images, and an OoD score for artifacts. We perform extensive experiments to validate our method on MNIST and the real-world Panda dataset for the classification of prostate cancer. The results confirm that other AL methods are 'distracted' by ambiguities and artifacts which harm the performance. FocAL effectively focuses on the most informative images, avoiding ambiguities and artifacts during acquisition. For both experiments, FocAL outperforms existing AL approaches, reaching a Cohen's kappa of 0.764 with only 0.69% of the labeled Panda data.

4/9/2024

Data Efficient Contrastive Learning in Histopathology using Active Sampling

Tahsin Reasat, Asif Sushmit, David S. Smith

Deep learning (DL) based diagnostics systems can provide accurate and robust quantitative analysis in digital pathology. These algorithms require large amounts of annotated training data which is impractical in pathology due to the high resolution of histopathological images. Hence, self-supervised methods have been proposed to learn features using ad-hoc pretext tasks. The self-supervised training process uses a large unlabeled dataset which makes the learning process time consuming. In this work, we propose a new method for actively sampling informative members from the training set using a small proxy network, decreasing sample requirement by 93% and training time by 62% while maintaining the same performance of the traditional self-supervised learning method. The code is available on https://github.com/Reasat/data_efficient_cl

7/23/2024

Federated Active Learning Framework for Efficient Annotation Strategy in Skin-lesion Classification

Zhipeng Deng, Yuqiao Yang, Kenji Suzuki

Federated Learning (FL) enables multiple institutes to train models collaboratively without sharing private data. Current FL research focuses on communication efficiency, privacy protection, and personalization and assumes that the data of FL have already been ideally collected. In medical scenarios, however, data annotation demands both expertise and intensive labor, which is a critical problem in FL. Active learning (AL) has shown promising performance in reducing the number of data annotations in medical image analysis. We propose a federated AL (FedAL) framework in which AL is executed periodically and interactively under FL. We exploit a local model in each hospital and a global model acquired from FL to construct an ensemble. We use ensemble-entropy-based AL as an efficient data-annotation strategy in FL. Therefore, our FedAL framework can decrease the amount of annotated data and preserve patient privacy while maintaining the performance of FL. To our knowledge, this is the first FedAL framework applied to medical images. We validated our framework on real-world dermoscopic datasets. Using only 50% of samples, our framework was able to achieve state-of-the-art performance on a skin-lesion classification task. Our framework performed better than several state-of-the-art AL methods under FL and achieved comparable performance to full-data FL.

6/18/2024