Predictive Accuracy-Based Active Learning for Medical Image Segmentation

2405.00452

Published 7/2/2024 by Jun Shi, Shulan Ruan, Ziqi Zhu, Minfan Zhao, Hong An, Xudong Xue, Bing Yan

Abstract

Active learning is considered a viable solution to alleviate the contradiction between the high dependency of deep learning-based segmentation methods on annotated data and the expensive pixel-level annotation cost of medical images. However, most existing methods suffer from unreliable uncertainty assessment and the struggle to balance diversity and informativeness, leading to poor performance in segmentation tasks. In response, we propose an efficient Predictive Accuracy-based Active Learning (PAAL) method for medical image segmentation, first introducing predictive accuracy to define uncertainty. Specifically, PAAL mainly consists of an Accuracy Predictor (AP) and a Weighted Polling Strategy (WPS). The former is an attached learnable module that can accurately predict the segmentation accuracy of unlabeled samples relative to the target model with the predicted posterior probability. The latter provides an efficient hybrid querying scheme by combining predicted accuracy and feature representation, aiming to ensure the uncertainty and diversity of the acquired samples. Extensive experiment results on multiple datasets demonstrate the superiority of PAAL. PAAL achieves comparable accuracy to fully annotated data while reducing annotation costs by approximately 50% to 80%, showcasing significant potential in clinical applications. The code is available at https://github.com/shijun18/PAAL-MedSeg.

Overview

This research paper introduces a new active learning method for medical image segmentation tasks, which aims to improve the predictive accuracy of the segmentation model with a limited annotation budget.
The proposed approach, called Predictive Accuracy-Based Active Learning (PAAL), selects unlabeled samples for annotation by predicting how accurately the current model segments each one, and combines that predicted accuracy with feature representations so that the selected samples are both uncertain and diverse.
The PAAL method is evaluated on several medical image segmentation datasets and demonstrates superior performance compared to other active learning strategies.

Plain English Explanation

The paper presents a new way to train medical image segmentation models more efficiently. In many medical imaging tasks, such as analyzing MRI scans or X-rays, the models need to be trained on a large number of labeled images to perform well. However, labeling these images can be time-consuming and expensive, as it requires expert radiologists or clinicians to manually annotate the relevant structures in each image.

Related work such as "Focused Active Learning for Histopathological Image Classification" and "Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation" has explored similar active learning approaches to address this challenge. The key idea behind the new method, called Predictive Accuracy-Based Active Learning (PAAL), is to select the most informative unlabeled images for annotation, rather than annotating images randomly or using other heuristics.

PAAL does this by predicting how accurately the current model would segment each unlabeled image, without needing the ground-truth annotation. Images with low predicted accuracy are the ones the model is most uncertain about, so they are prioritized for annotation, and the selection also takes image features into account so that the chosen images are not redundant. This helps to ensure that the limited annotation budget is used as efficiently as possible, leading to better segmentation models with fewer annotated images.

The researchers evaluate PAAL on several medical image segmentation datasets and show that it outperforms other active learning strategies, such as uncertainty-based or diversity-based approaches. This suggests that PAAL could be a valuable tool for training more accurate medical image segmentation models while reducing the annotation burden on domain experts.

Technical Explanation

The paper introduces a new active learning method called Predictive Accuracy-Based Active Learning (PAAL) for medical image segmentation tasks. The key idea behind PAAL is to define uncertainty through predictive accuracy: an Accuracy Predictor (AP), a learnable module attached to the segmentation model, predicts the segmentation accuracy of each unlabeled sample from the model's predicted posterior probability.

Specifically, PAAL first trains an initial segmentation model using a small set of labeled images. It then applies the model to the unlabeled images and uses the AP to predict how accurately each one is segmented. A Weighted Polling Strategy (WPS) combines the predicted accuracy with feature representations to choose samples that are both uncertain and diverse. These samples are annotated and added to the training set, and the process repeats.
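The iterative procedure described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_accuracy` is a hypothetical stand-in for a learned accuracy predictor (here it simply scores samples by their distance to the nearest labeled sample), the feature vectors are random toy data, and retraining the segmentation model is left as a comment.

```python
import numpy as np

def predict_accuracy(features, labeled_idx):
    """Hypothetical stand-in for a learned accuracy predictor.

    Returns one score in (0, 1] per sample. Samples far from every
    labeled sample in feature space get a lower predicted accuracy.
    """
    labeled = features[labeled_idx]  # shape: (n_labeled, d)
    dists = np.linalg.norm(features[:, None, :] - labeled[None, :, :], axis=2)
    return 1.0 / (1.0 + dists.min(axis=1))

def active_learning_loop(features, initial_idx, budget_per_round, rounds):
    """Grow the labeled set by querying low-predicted-accuracy samples."""
    labeled = list(initial_idx)
    for _ in range(rounds):
        # In a real pipeline, the segmentation model (and the accuracy
        # predictor) would be retrained on the labeled set here.
        scores = predict_accuracy(features, labeled)
        labeled_set = set(labeled)
        unlabeled = [i for i in range(len(features)) if i not in labeled_set]
        # Lowest predicted accuracy first: these are treated as most uncertain.
        ranked = sorted(unlabeled, key=lambda i: scores[i])
        labeled.extend(ranked[:budget_per_round])
    return labeled

rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))  # toy feature vectors, one per image
selected = active_learning_loop(features, initial_idx=[0, 1],
                                budget_per_round=5, rounds=3)
print(len(selected), "images labeled after 3 rounds")
```

The sketch ranks purely by predicted accuracy; the abstract's querying scheme additionally accounts for diversity, which this loop omits for brevity.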

The authors evaluate PAAL on multiple medical image segmentation datasets. According to the abstract, PAAL reaches accuracy comparable to training on fully annotated data while reducing annotation costs by approximately 50% to 80%. It is compared against other active learning strategies, such as uncertainty-based and diversity-based approaches, as well as a random sampling baseline, and is reported to achieve higher segmentation accuracy with fewer annotated images.
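To make "segmentation accuracy" concrete, the sketch below computes the Dice coefficient between a predicted mask and a reference mask. Dice is assumed here only because it is a common overlap metric for medical image segmentation; this summary does not state which metric the paper uses, and the function name and `eps` smoothing term are illustrative choices.

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks (1.0 means perfect overlap)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred_mask = [[1, 1], [0, 0]]    # model marks two pixels as foreground
target_mask = [[1, 0], [0, 0]]  # annotation marks one pixel as foreground
dice = dice_score(pred_mask, target_mask)
print(round(float(dice), 3))
```

An accuracy predictor in the sense of the abstract would try to estimate a score like this for unlabeled images, where no reference mask is available.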

The authors also analyze the impact of PAAL's components. As described in the abstract, these are the Accuracy Predictor, which supplies the uncertainty estimate, and the Weighted Polling Strategy, which provides the hybrid querying scheme for selecting samples.
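The abstract describes the sample selection as a hybrid of predicted accuracy and feature representation. One simple way such a hybrid query could look is sketched below: cluster the unlabeled samples by feature, then take turns across clusters, each time picking the sample with the lowest predicted accuracy. This cluster-then-round-robin scheme is an assumption made for illustration and is not the paper's Weighted Polling Strategy; the `kmeans` and `hybrid_query` helpers and all parameters are hypothetical.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Tiny k-means; returns a cluster index for each sample."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(k):
            members = features[assign == c]
            if len(members) > 0:
                centers[c] = members.mean(axis=0)
    return assign

def hybrid_query(features, predicted_acc, n_query, k=4):
    """Pick n_query samples, cycling over clusters for diversity and
    taking the lowest predicted accuracy within each cluster."""
    assign = kmeans(features, k)
    queues = []
    for c in range(k):
        members = np.flatnonzero(assign == c)
        order = members[np.argsort(predicted_acc[members])]
        queues.append(list(order))  # ascending predicted accuracy
    picked = []
    while len(picked) < n_query and any(queues):
        for queue in queues:
            if queue and len(picked) < n_query:
                picked.append(int(queue.pop(0)))
    return picked

rng = np.random.default_rng(1)
feats = rng.normal(size=(40, 5))   # toy feature vectors
pred_acc = rng.uniform(size=40)    # toy predicted accuracies in [0, 1)
picked = hybrid_query(feats, pred_acc, n_query=8)
print(picked)
```

Cycling over clusters prevents the query from concentrating on one region of feature space, while the within-cluster ordering keeps the focus on samples the model is expected to segment poorly.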

"MyriadAL: Active Few-Shot Learning for Histopathology" and "Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts" have explored related active learning techniques, but according to the authors, PAAL is the first to use predicted segmentation accuracy to define uncertainty.

Critical Analysis

The PAAL method proposed in this paper is a promising approach for improving the efficiency of medical image segmentation tasks. By actively selecting the most informative unlabeled samples for annotation, the method can achieve higher segmentation accuracy with fewer labeled images, which can significantly reduce the annotation burden on domain experts.

One potential limitation of the PAAL method is that it relies on an additional learned module, the Accuracy Predictor, to estimate the segmentation accuracy of each unlabeled sample. The reliability of these estimates is likely a critical factor in the overall performance of PAAL. Exploring alternative prediction or selection schemes, such as those in "AnchorAL: Computationally Efficient Active Learning for Large-Imbalanced Datasets", could potentially further improve the method's performance.

Another area for further research could be investigating how PAAL performs in more realistic, practical scenarios, where the initial labeled dataset may be small and the distribution of the unlabeled data may differ from the labeled data. The authors' experiments were conducted on relatively large initial labeled datasets, and it would be valuable to understand how PAAL scales to more challenging real-world settings.

Overall, the PAAL method represents an exciting advancement in active learning for medical image segmentation, and the promising results reported in this paper suggest that it could have a significant impact on improving the efficiency and effectiveness of these critical healthcare applications.

Conclusion

This research paper introduces a new active learning method called Predictive Accuracy-Based Active Learning (PAAL) for medical image segmentation tasks. PAAL aims to improve the predictive accuracy of segmentation models by selectively annotating the most informative unlabeled samples, rather than annotating randomly or using other heuristic-based approaches.

The key innovation of PAAL is its use of predicted segmentation accuracy as the measure of uncertainty: an Accuracy Predictor estimates how well the model segments each unlabeled sample, and a Weighted Polling Strategy combines those estimates with feature representations to select samples that are both uncertain and diverse. This helps to ensure that the limited annotation budget is used as efficiently as possible, leading to better segmentation models with fewer annotated images.

The authors' evaluation of PAAL on several medical image segmentation datasets shows that it outperforms other active learning strategies, demonstrating the method's effectiveness in reducing the annotation burden while maintaining high segmentation accuracy. This suggests that PAAL could be a valuable tool for training more accurate medical image segmentation models, with important implications for a wide range of healthcare applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Focused Active Learning for Histopathological Image Classification

Arne Schmidt, Pablo Morales-Álvarez, Lee A. D. Cooper, Lee A. Newberg, Andinet Enquobahrie, Aggelos K. Katsaggelos, Rafael Molina

Active Learning (AL) has the potential to solve a major problem of digital pathology: the efficient acquisition of labeled data for machine learning algorithms. However, existing AL methods often struggle in realistic settings with artifacts, ambiguities, and class imbalances, as commonly seen in the medical field. The lack of precise uncertainty estimations leads to the acquisition of images with a low informative value. To address these challenges, we propose Focused Active Learning (FocAL), which combines a Bayesian Neural Network with Out-of-Distribution detection to estimate different uncertainties for the acquisition function. Specifically, the weighted epistemic uncertainty accounts for the class imbalance, aleatoric uncertainty for ambiguous images, and an OoD score for artifacts. We perform extensive experiments to validate our method on MNIST and the real-world Panda dataset for the classification of prostate cancer. The results confirm that other AL methods are 'distracted' by ambiguities and artifacts which harm the performance. FocAL effectively focuses on the most informative images, avoiding ambiguities and artifacts during acquisition. For both experiments, FocAL outperforms existing AL approaches, reaching a Cohen's kappa of 0.764 with only 0.69% of the labeled Panda data.

4/9/2024

cs.CV cs.AI

Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial Images

Lianlei Shan, Weiqiang Wang, Ke Lv, Bin Luo

Semantic segmentation requires pixel-level annotation, which is time-consuming. Active Learning (AL) is a promising method for reducing data annotation costs. Due to the gap between aerial and natural images, the previous AL methods are not ideal, mainly caused by unreasonable labeling units and the neglect of class imbalance. Previous labeling units are based on images or regions, which does not consider the characteristics of segmentation tasks and aerial images, i.e., the segmentation network often makes mistakes in the edge region, and the edge of aerial images is often interlaced and irregular. Therefore, an edge-guided labeling unit is proposed and supplemented as the new unit. On the other hand, the class imbalance is severe, manifested in two aspects: the aerial image is seriously imbalanced, and the AL strategy does not fully consider the class balance. Both seriously affect the performance of AL in aerial images. We comprehensively ensure class balance from all steps that may occur imbalance, including initial labeled data, subsequent labeled data, and pseudo-labels. Through the two improvements, our method achieves more than 11.2% gains compared to state-of-the-art methods on three benchmark datasets, Deepglobe, Potsdam, and Vaihingen, and more than 18.6% gains compared to the baseline. Sufficient ablation studies show that every module is indispensable. Furthermore, we establish a fair and strong benchmark for future research on AL for aerial image segmentation.

5/29/2024

cs.CV

Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation

Bart M. van Marrewijk, Charbel Dandjinou, Dan Jeric Arcega Rustia, Nicolas Franco Gonzalez, Boubacar Diallo, Jérôme Dias, Paul Melki, Pieter M. Blok

Optimizing deep learning models requires large amounts of annotated images, a process that is both time-intensive and costly. Especially for semantic segmentation models in which every pixel must be annotated. A potential strategy to mitigate annotation effort is active learning. Active learning facilitates the identification and selection of the most informative images from a large unlabelled pool. The underlying premise is that these selected images can improve the model's performance faster than random selection to reduce annotation effort. While active learning has demonstrated promising results on benchmark datasets like Cityscapes, its performance in the agricultural domain remains largely unexplored. This study addresses this research gap by conducting a comparative study of three active learning-based acquisition functions: Bayesian Active Learning by Disagreement (BALD), stochastic-based BALD (PowerBALD), and Random. The acquisition functions were tested on two agricultural datasets: Sugarbeet and Corn-Weed, both containing three semantic classes: background, crop and weed. Our results indicated that active learning, especially PowerBALD, yields a higher performance than Random sampling on both datasets. But due to the relatively large standard deviations, the differences observed were minimal; this was partly caused by high image redundancy and imbalanced classes. Specifically, more than 89% of the pixels belonged to the background class on both datasets. The absence of significant results on both datasets indicates that further research is required for applying active learning on agricultural datasets, especially if they contain a high-class imbalance and redundant images. Recommendations and insights are provided in this paper to potentially resolve such issues.

4/4/2024

cs.CV cs.AI

Think Twice Before Selection: Federated Evidential Active Learning for Medical Image Analysis with Domain Shifts

Jiayi Chen, Benteng Ma, Hengfei Cui, Yong Xia

Federated learning facilitates the collaborative learning of a global model across multiple distributed medical institutions without centralizing data. Nevertheless, the expensive cost of annotation on local clients remains an obstacle to effectively utilizing local data. To mitigate this issue, federated active learning methods suggest leveraging local and global model predictions to select a relatively small amount of informative local data for annotation. However, existing methods mainly focus on all local data sampled from the same domain, making them unreliable in realistic medical scenarios with domain shifts among different clients. In this paper, we make the first attempt to assess the informativeness of local data derived from diverse domains and propose a novel methodology termed Federated Evidential Active Learning (FEAL) to calibrate the data evaluation under domain shift. Specifically, we introduce a Dirichlet prior distribution in both local and global models to treat the prediction as a distribution over the probability simplex and capture both aleatoric and epistemic uncertainties by using the Dirichlet-based evidential model. Then we employ the epistemic uncertainty to calibrate the aleatoric uncertainty. Afterward, we design a diversity relaxation strategy to reduce data redundancy and maintain data diversity. Extensive experiments and analysis on five real multi-center medical image datasets demonstrate the superiority of FEAL over the state-of-the-art active learning methods in federated scenarios with domain shifts. The code will be available at https://github.com/JiayiChen815/FEAL.

4/23/2024

cs.CV