Annotating Ambiguous Images: General Annotation Strategy for High-Quality Data with Real-World Biomedical Validation

2306.12189

Published 4/30/2024 by Lars Schmarje, Vasco Grossmann, Claudius Zelenka, Johannes Brünger, Reinhard Koch

Abstract

In the field of image classification, existing methods often struggle with biased or ambiguous data, a prevalent issue in real-world scenarios. Current strategies, including semi-supervised learning and class blending, offer partial solutions but lack a definitive resolution. Addressing this gap, our paper introduces a novel strategy for generating high-quality labels in challenging datasets. Central to our approach is a clearly designed flowchart, based on a broad literature review, which enables the creation of reliable labels. We validate our methodology through a rigorous real-world test case in the biomedical field, specifically in deducing height reduction from vertebral imaging. Our empirical study, leveraging over 250,000 annotations, demonstrates the effectiveness of our strategy's decisions compared to their alternatives.

Overview

This paper presents a general annotation strategy for image classification, with a real-world biomedical application on vertebral fracture diagnosis.
The approach aims to address the challenge of annotating ambiguous images, which is common in medical imaging tasks.
The authors validate their strategy on a vertebral fracture dataset, demonstrating its effectiveness in improving classification performance.

Plain English Explanation

The paper describes a new way to label and categorize medical images, focusing on the challenge of labeling images that are unclear or ambiguous. This is a common problem in medical imaging, where doctors may disagree on how to interpret certain images.

The researchers developed a general annotation strategy that can be used to improve the accuracy of image classification, even for ambiguous or difficult-to-interpret images. They tested this approach on a dataset of spinal X-rays, where the goal was to identify vertebral fractures.

The key idea is to gather input from multiple experts and use that to train a more robust image classification model. This helps address the subjectivity and uncertainty that can arise when interpreting medical images. The approach involves iteratively refining the annotations and model, with the goal of producing a final system that can accurately classify new images.

The researchers demonstrate that their strategy leads to significant improvements in vertebral fracture detection compared to existing methods. This suggests the approach could be valuable for a wide range of medical imaging tasks where ambiguity and subjectivity are challenges.

Technical Explanation

The paper presents a general annotation strategy for image classification tasks, with a focus on addressing the challenge of annotating ambiguous images. The proposed approach involves iteratively refining the annotations and training a classification model, leveraging input from multiple expert raters.

The authors first collect annotations from multiple experts on a dataset of vertebral X-ray images, where the task is to identify the presence of vertebral fractures. These initial annotations capture the inherent ambiguity and disagreement that can arise in medical image interpretation.
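One way to make this disagreement concrete is to turn each image's votes into a soft label and an entropy score. The sketch below is illustrative only and is not taken from the paper: the function names, the class names, and the five-rater votes are assumptions.

```python
from collections import Counter
from math import log2


def soft_label(votes):
    """Return the fraction of annotators that chose each class."""
    counts = Counter(votes)
    total = len(votes)
    return {label: n / total for label, n in counts.items()}


def disagreement(votes):
    """Shannon entropy (in bits) of the votes; 0.0 means full agreement."""
    return sum(-p * log2(p) for p in soft_label(votes).values())


# Hypothetical votes from five raters for two images (assumed data).
clear_image = ["fracture"] * 5
ambiguous_image = ["fracture", "fracture", "no_fracture", "no_fracture", "fracture"]

print(soft_label(ambiguous_image))
print(disagreement(clear_image), disagreement(ambiguous_image))
```

Images with high entropy are the ones where a single hard label hides the most information, so they are natural candidates for extra annotation.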

To address this, the researchers employ an iterative refinement strategy that alternates between improving the annotations and training an improved classification model. This involves:

1. Aggregating the multiple expert annotations to create a consolidated ground truth label for each image.
2. Training a classification model using the consolidated labels.
3. Identifying images where the model's predictions disagree with the current ground truth, indicating ambiguity.
4. Soliciting additional annotations for these ambiguous images from the experts.
5. Updating the ground truth labels and repeating the process.
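The five steps above can be sketched as a small loop. This is a toy illustration under loud assumptions, not the authors' implementation: the "model" is a one-dimensional threshold classifier, majority voting stands in for aggregation, and `ask_expert` is a hypothetical callback that simulates a reliable extra annotation.

```python
from collections import Counter


def majority_vote(votes):
    """Step 1: consolidate votes; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


def train_threshold_model(features, labels):
    """Step 2: toy model, the threshold with the fewest training errors."""
    best_t, best_err = None, None
    for t in sorted(features):
        err = sum((x >= t) != bool(y) for x, y in zip(features, labels))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def refine(features, votes, ask_expert, rounds=5):
    """Alternate between consolidating labels and retraining the toy model."""
    votes = [list(v) for v in votes]  # work on a copy of the caller's votes
    for _ in range(rounds):
        labels = [majority_vote(v) for v in votes]              # step 1
        threshold = train_threshold_model(features, labels)     # step 2
        flagged = [i for i, x in enumerate(features)
                   if int(x >= threshold) != labels[i]]         # step 3
        if not flagged:
            break
        for i in flagged:
            votes[i].append(ask_expert(i))                      # step 4
        # Step 5: the next iteration recomputes the labels from all votes.
    labels = [majority_vote(v) for v in votes]
    return labels, train_threshold_model(features, labels)


# Assumed toy data: one scalar feature per image, true class is x >= 0.5.
features = [0.05, 0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
true_labels = [int(x >= 0.5) for x in features]
votes = [[y] * 3 for y in true_labels]
votes[1] = [1, 1, 0]  # two of three raters mislabel the second image

labels, threshold = refine(features, votes, ask_expert=lambda i: true_labels[i])
print(labels, threshold)
```

In this toy run the mislabeled image is flagged because the model disagrees with its consolidated label, and two additional votes are enough to flip the majority. A real system would replace the threshold classifier with a trained network and the callback with an actual annotation request.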

This iterative process continues until the model's performance on a held-out test set converges. The authors demonstrate the effectiveness of this approach through experiments on a large vertebral fracture dataset, showing significant improvements in classification accuracy compared to existing methods.
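The summary does not say how convergence is decided. A minimal sketch of one possible stopping rule is shown below, assuming convergence means that held-out accuracy has stopped improving by more than a small margin; the function name, the `patience` and `min_delta` parameters, and the accuracy values are all hypothetical.

```python
def has_converged(history, patience=2, min_delta=0.005):
    """True when each of the last `patience` rounds gained less than min_delta."""
    if len(history) <= patience:
        return False  # not enough rounds to judge yet
    recent = history[-(patience + 1):]
    gains = [after - before for before, after in zip(recent, recent[1:])]
    return all(gain < min_delta for gain in gains)


# Hypothetical held-out accuracy after each refinement round (assumed values).
still_improving = [0.70, 0.78, 0.81]
flattened_out = [0.70, 0.78, 0.81, 0.812, 0.813]

print(has_converged(still_improving), has_converged(flattened_out))
```

In practice a separate validation split is commonly used for such stopping decisions, so that the final test score is not influenced by the choice of when to stop.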

Critical Analysis

The paper presents a well-designed and thorough approach to addressing the challenge of annotating ambiguous medical images. The iterative refinement strategy is a clever way to leverage multiple expert opinions to improve the model's performance on difficult-to-label cases.

However, the authors acknowledge several limitations and areas for further research. For example, the approach requires a significant amount of expert effort to obtain the initial and refined annotations, which may not be feasible in all real-world settings. Additionally, the paper does not explore the impact of the number of expert raters or the quality of their individual annotations on the final results.

It would also be valuable to understand how the proposed strategy compares to other approaches for handling ambiguity in medical image annotation, such as uncertainty-guided annotation or diagnosis-guided bootstrapping. Expanding the evaluation to additional medical imaging tasks beyond vertebral fracture detection could also provide further insights into the broader applicability of the method.

Conclusion

This paper presents a novel annotation strategy that effectively addresses the challenge of labeling ambiguous medical images, a common issue in many healthcare applications. The iterative refinement approach, which leverages input from multiple experts, demonstrates significant improvements in vertebral fracture detection performance compared to existing methods.

While the technique requires substantial expert effort, the potential benefits in terms of improved classification accuracy and robustness to ambiguity make it a promising direction for further research and development. The insights from this work could inform the design of more effective and reliable computer-aided diagnosis systems, ultimately leading to better patient outcomes.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Ambiguous Annotations: When is a Pedestrian not a Pedestrian?

Luisa Schwirten, Jannes Scholz, Daniel Kondermann, Janis Keuper

Datasets labelled by human annotators are widely used in the training and testing of machine learning models. In recent years, researchers are increasingly paying attention to label quality. However, it is not always possible to objectively determine whether an assigned label is correct or not. The present work investigates this ambiguity in the annotation of autonomous driving datasets as an important dimension of data quality. Our experiments show that excluding highly ambiguous data from the training improves model performance of a state-of-the-art pedestrian detector in terms of LAMR, precision and F1 score, thereby saving training time and annotation costs. Furthermore, we demonstrate that, in order to safely remove ambiguous instances and ensure the retained representativeness of the training data, an understanding of the properties of the dataset and class under investigation is crucial.

5/15/2024

cs.CV

Mitigating annotation shift in cancer classification using single image generative models

Marta Buetas Arcas, Richard Osuala, Karim Lekadir, Oliver Díaz

Artificial Intelligence (AI) has emerged as a valuable tool for assisting radiologists in breast cancer detection and diagnosis. However, the success of AI applications in this domain is restricted by the quantity and quality of available data, posing challenges due to limited and costly data annotation procedures that often lead to annotation shifts. This study simulates, analyses and mitigates annotation shifts in cancer classification in the breast mammography domain. First, a high-accuracy cancer risk prediction model is developed, which effectively distinguishes benign from malignant lesions. Next, model performance is used to quantify the impact of annotation shift. We uncover a substantial impact of annotation shift on multiclass classification performance particularly for malignant lesions. We thus propose a training data augmentation approach based on single-image generative models for the affected class, requiring as few as four in-domain annotations to considerably mitigate annotation shift, while also addressing dataset imbalance. Lastly, we further increase performance by proposing and validating an ensemble architecture based on multiple models trained under different data augmentation regimes. Our study offers key insights into annotation shift in deep learning breast cancer classification and explores the potential of single-image generative models to overcome domain shift challenges.

5/31/2024

cs.CV cs.AI

Multi-rater Prompting for Ambiguous Medical Image Segmentation

Jinhong Wang, Yi Cheng, Jintai Chen, Hongxia Xu, Danny Chen, Jian Wu

Multi-rater annotations commonly occur when medical images are independently annotated by multiple experts (raters). In this paper, we tackle two challenges arising in multi-rater annotations for medical image segmentation (called ambiguous medical image segmentation): (1) How to train a deep learning model when a group of raters produces a set of diverse but plausible annotations, and (2) how to fine-tune the model efficiently when computation resources are not available for re-training the entire model on a different dataset domain. We propose a multi-rater prompt-based approach to address these two challenges altogether. Specifically, we introduce a series of rater-aware prompts that can be plugged into the U-Net model for uncertainty estimation to handle multi-annotation cases. During the prompt-based fine-tuning process, only 0.3% of learnable parameters are required to be updated compared to training the entire model. Further, in order to integrate expert consensus and disagreement, we explore different multi-rater incorporation strategies and design a mix-training strategy for comprehensive insight learning. Extensive experiments verify the effectiveness of our new approach for ambiguous medical image segmentation on two public datasets while alleviating the heavy burden of model re-training.

4/12/2024

cs.CV

Uncertainty-guided annotation enhances segmentation with the human-in-the-loop

Nadieh Khalili, Joey Spronck, Francesco Ciompi, Jeroen van der Laak, Geert Litjens

Deep learning algorithms, often critiqued for their 'black box' nature, traditionally fall short in providing the necessary transparency for trusted clinical use. This challenge is particularly evident when such models are deployed in local hospitals, encountering out-of-domain distributions due to varying imaging techniques and patient-specific pathologies. Yet, this limitation offers a unique avenue for continual learning. The Uncertainty-Guided Annotation (UGA) framework introduces a human-in-the-loop approach, enabling AI to convey its uncertainties to clinicians, effectively acting as an automated quality control mechanism. UGA eases this interaction by quantifying uncertainty at the pixel level, thereby revealing the model's limitations and opening the door for clinician-guided corrections. We evaluated UGA on the Camelyon dataset for lymph node metastasis segmentation which revealed that UGA improved the Dice coefficient (DC), from 0.66 to 0.76 by adding 5 patches, and further to 0.84 with 10 patches. To foster broader application and community contribution, we have made our code accessible at

4/12/2024

cs.CV cs.AI cs.HC