Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification

Read original: arXiv:2207.06104 - Published 8/27/2024 by Matthias Rottmann, Marco Reese

Overview

This research paper presents a method for detecting label errors in image datasets used for semantic segmentation.
Semantic segmentation datasets require extensive human labor to annotate, and label errors can easily be overlooked during review.
Label errors can lead to biased benchmarks and performance degradation of deep neural networks trained on these datasets.
The authors propose a novel approach that leverages the pixel-wise predictions of semantic segmentation models and component-level uncertainty quantification to identify label errors.

Plain English Explanation

The researchers have developed a new way to detect label errors in image datasets used for semantic segmentation. Semantic segmentation is the task of assigning a category label to each pixel in an image, such as "road," "building," or "person." Creating these types of datasets requires a lot of manual effort from human annotators, and it's easy for mistakes to slip through during the review process.

These label errors can cause problems when the datasets are used to train and evaluate deep neural networks for semantic segmentation. Benchmarks built on flawed labels become biased, and in extreme cases the networks themselves perform worse. To address this, the researchers use the networks' predictions together with uncertainty estimates to detect label errors.

The difficulty is that segmentation networks are naturally uncertain about pixels at the boundaries between different objects or regions, so pixel-wise uncertainty on its own tends to highlight object outlines rather than actual mistakes. The researchers instead assess uncertainty over entire connected components of the prediction (groups of neighboring pixels that share a predicted class) and use it to decide which annotated labels are likely to be wrong. This allows errors in the dataset to be caught and fixed without manually reviewing every single pixel.
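To make the boundary effect concrete, here is a minimal sketch that computes a per-pixel uncertainty score along one image row. It assumes Shannon entropy of the softmax output as the uncertainty measure, which is a common choice but not necessarily the one used in the paper; the probabilities and class names are made up for illustration.

```python
import math

def pixel_entropy(probs):
    """Shannon entropy (in nats) of one pixel's class-probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical softmax outputs along one image row for two classes
# (road, sidewalk). The middle pixel sits on the object boundary.
row = [
    [0.98, 0.02],  # clearly road
    [0.95, 0.05],
    [0.55, 0.45],  # boundary pixel: the model is torn between classes
    [0.04, 0.96],
    [0.01, 0.99],  # clearly sidewalk
]

entropies = [pixel_entropy(p) for p in row]
# Index of the pixel with the highest entropy in this row.
most_uncertain = max(range(len(row)), key=lambda i: entropies[i])
```

In this toy row the boundary pixel carries the highest entropy even though nothing is mislabeled, which is why a purely pixel-wise threshold is a poor label error detector.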

Technical Explanation

The researchers present a novel approach for detecting label errors in semantic segmentation datasets. Semantic segmentation models produce pixel-wise predictions, which makes identifying label errors through uncertainty quantification a complex task. The authors propose lifting the consideration of uncertainty to the level of predicted components (connected regions of pixels) rather than individual pixels.
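The idea of moving from pixels to predicted components can be sketched as follows. This is a simplified heuristic under stated assumptions, not the authors' pipeline: it groups 4-connected pixels of the same predicted class, averages a per-pixel confidence over each group, and measures how often the annotation disagrees with the prediction. The function names, class ids, and the thresholds (full disagreement, mean confidence of at least 0.7) are hypothetical. Plain Python is used to stay dependency-free; for real images a library routine such as `scipy.ndimage.label` would be the usual choice.

```python
from collections import deque

def connected_components(mask):
    """Group 4-connected pixels that share a predicted class.

    Returns a list of (class_id, [(row, col), ...]) tuples.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if seen[r][c]:
                continue
            cls = mask[r][c]
            pixels, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and mask[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append((cls, pixels))
    return components

def component_scores(pred, confidence, annotation):
    """Per component: mean confidence and share of pixels annotated differently."""
    scores = []
    for cls, pixels in connected_components(pred):
        n = len(pixels)
        mean_conf = sum(confidence[y][x] for y, x in pixels) / n
        disagreement = sum(annotation[y][x] != cls for y, x in pixels) / n
        scores.append({"class": cls, "size": n,
                       "mean_conf": mean_conf, "disagreement": disagreement})
    return scores

# Toy 4x4 example with hypothetical class ids: 0 = road, 1 = person.
pred = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
confidence = [
    [0.9, 0.9, 0.9, 0.9],
    [0.9, 0.8, 0.8, 0.9],
    [0.9, 0.8, 0.8, 0.9],
    [0.9, 0.9, 0.9, 0.9],
]
# The annotator missed the person and labeled every pixel as road.
annotation = [[0] * 4 for _ in range(4)]

scores = component_scores(pred, confidence, annotation)
# Assumed rule: flag confident components with no matching annotation at all.
candidates = [s for s in scores
              if s["disagreement"] == 1.0 and s["mean_conf"] >= 0.7]
```

Because each component is judged as a whole, a thin band of uncertain boundary pixels no longer dominates the decision.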

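They benchmark their approach by deliberately dropping labels from the Cityscapes dataset and from a dataset extracted from the CARLA driving simulator, where the ground-truth labels are fully under control. Their experiments show that the method detects the vast majority of label errors while keeping the number of false detections under control. According to the authors, this is the first method for detecting label errors in semantic segmentation datasets; pixel-level uncertainty alone is hard to use for this purpose because it is particularly pronounced at the transitions between predicted components.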
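The benchmarking idea (remove known labels, then check how many removals a detector finds) can be sketched as below. This is an illustrative protocol under stated assumptions rather than the paper's evaluation code: the helper names `drop_labels` and `evaluate`, the 25% drop fraction, and the stand-in detector output are all hypothetical.

```python
import random

def drop_labels(instance_ids, drop_fraction, seed=0):
    """Pick instances whose labels get removed to simulate annotation errors."""
    rng = random.Random(seed)
    k = round(drop_fraction * len(instance_ids))
    return set(rng.sample(sorted(instance_ids), k))

def evaluate(flagged, dropped):
    """Recall and precision over induced errors, plus the false detection count."""
    true_pos = len(flagged & dropped)
    false_pos = len(flagged - dropped)
    recall = true_pos / len(dropped) if dropped else 0.0
    precision = true_pos / len(flagged) if flagged else 0.0
    return {"recall": recall, "precision": precision,
            "false_detections": false_pos}

instance_ids = set(range(20))
dropped = drop_labels(instance_ids, drop_fraction=0.25)

# Stand-in for a detector's output: it finds all but one dropped instance
# and raises one false alarm on an instance that was labeled correctly.
missed = min(dropped)
false_alarm = min(instance_ids - dropped)
flagged = (dropped - {missed}) | {false_alarm}

metrics = evaluate(flagged, dropped)
```

Since the dropped instances are known exactly, both missed errors and false detections can be counted, which is not possible on a dataset whose true label errors are unknown.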

Additionally, the researchers apply their method to widely used semantic segmentation datasets and provide a collection of identified label errors, along with sample statistics. This gives the computer vision community a resource for improving the quality of these benchmarks.

The key technical contribution is the insight that component-level uncertainty quantification is more effective than pixel-level for detecting label errors in semantic segmentation. By focusing on the uncertainty patterns of entire connected regions rather than individual pixels, the method is able to more reliably identify mislabeled areas.

Critical Analysis

The researchers have presented a well-designed and thorough approach to the important problem of label error detection in semantic segmentation datasets. The use of component-level uncertainty quantification is a clever and effective solution to the challenges posed by the high uncertainty at object boundaries.

One potential limitation of the study is that the quantitative benchmark relies on controlled settings: Cityscapes with artificially dropped labels and synthetic data from the CARLA simulator. While these provide a controlled environment to assess the method's performance, it would be valuable to see comparable quantitative results on larger, real-world datasets collected in the wild, where the true label errors are not known in advance.

Additionally, this summary leaves open how the detected label errors should be prioritized for human review and correction. In a large-scale dataset, the method may identify thousands of potential errors, and guidance on which ones to focus on first would be a helpful next step.
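If each detected component comes with a score, one simple way to order candidates for review is to sort by that score and cut off at a review budget. The sketch below is only an illustration of that idea; the `score` and `size` fields, the identifiers, and the tie-breaking rule are assumptions and are not taken from the paper.

```python
def rank_for_review(candidates, budget):
    """Return the `budget` highest-scoring candidates; larger components win ties."""
    ordered = sorted(candidates, key=lambda c: (c["score"], c["size"]), reverse=True)
    return ordered[:budget]

# Hypothetical detections: an id, a label error score, and a pixel count.
candidates = [
    {"id": "img_003/comp_7", "score": 0.62, "size": 140},
    {"id": "img_011/comp_2", "score": 0.97, "size": 35},
    {"id": "img_011/comp_9", "score": 0.97, "size": 410},
    {"id": "img_020/comp_1", "score": 0.81, "size": 90},
]

review_queue = [c["id"] for c in rank_for_review(candidates, budget=3)]
```

Breaking ties by component size is one possible design choice, on the assumption that a large mislabeled region affects more pixels than a small one.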

Despite these minor concerns, the research represents a significant contribution to the field of computer vision. By improving the quality of semantic segmentation datasets, the work has the potential to lead to more robust and reliable deep learning models for a wide range of applications, from autonomous driving to medical image analysis.

Conclusion

This research paper presents a novel approach for detecting label errors in semantic segmentation datasets. The key innovation is the use of component-level uncertainty quantification, which allows the method to reliably identify mislabeled regions in the data.

The authors have demonstrated the effectiveness of their approach through experiments on controlled benchmarks (Cityscapes and CARLA data with dropped labels), and they have also applied it to widely used segmentation datasets, providing a collection of identified label errors. This work has important implications for improving the quality of semantic segmentation datasets and, by extension, the performance of deep learning models trained on them.

Overall, this research represents a significant advancement in the field of computer vision, and it lays the groundwork for further improvements in dataset curation and model robustness.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Automated Detection of Label Errors in Semantic Segmentation Datasets via Deep Learning and Uncertainty Quantification

Matthias Rottmann, Marco Reese

In this work, we for the first time present a method for detecting label errors in image datasets with semantic segmentation, i.e., pixel-wise class labels. Annotation acquisition for semantic segmentation datasets is time-consuming and requires plenty of human labor. In particular, review processes are time-consuming and label errors can easily be overlooked by humans. The consequences are biased benchmarks and in extreme cases also performance degradation of deep neural networks (DNNs) trained on such datasets. DNNs for semantic segmentation yield pixel-wise predictions, which makes detection of label errors via uncertainty quantification a complex task. Uncertainty is particularly pronounced at the transitions between connected components of the prediction. By lifting the consideration of uncertainty to the level of predicted components, we enable the usage of DNNs together with component-level uncertainty quantification for the detection of label errors. We present a principled approach to benchmarking the task of label error detection by dropping labels from the Cityscapes dataset as well as from a dataset extracted from the CARLA driving simulator, where in the latter case we have the labels under control. Our experiments show that our approach is able to detect the vast majority of label errors while controlling the number of false label error detections. Furthermore, we apply our method to semantic segmentation datasets frequently used by the computer vision community and present a collection of label errors along with sample statistics.

8/27/2024

Improving Label Error Detection and Elimination with Uncertainty Quantification

Johannes Jakubik, Michael Vossing, Manil Maskey, Christopher Wolfle, Gerhard Satzger

Identifying and handling label errors can significantly enhance the accuracy of supervised machine learning models. Recent approaches for identifying label errors demonstrate that a low self-confidence of models with respect to a certain label represents a good indicator of an erroneous label. However, latest work has built on softmax probabilities to measure self-confidence. In this paper, we argue that -- as softmax probabilities do not reflect a model's predictive uncertainty accurately -- label error detection requires more sophisticated measures of model uncertainty. Therefore, we develop a range of novel, model-agnostic algorithms for Uncertainty Quantification-Based Label Error Detection (UQ-LED), which combine the techniques of confident learning (CL), Monte Carlo Dropout (MCD), model uncertainty measures (e.g., entropy), and ensemble learning to enhance label error detection. We comprehensively evaluate our algorithms on four image classification benchmark datasets in two stages. In the first stage, we demonstrate that our UQ-LED algorithms outperform state-of-the-art confident learning in identifying label errors. In the second stage, we show that removing all identified errors from the training data based on our approach results in higher accuracies than training on all available labeled data. Importantly, besides our contributions to the detection of label errors, we particularly propose a novel approach to generate realistic, class-dependent label errors synthetically. Overall, our study demonstrates that selectively cleaning datasets with UQ-LED algorithms leads to more accurate classifications than using larger, noisier datasets.

5/17/2024

Improving Uncertainty-Error Correspondence in Deep Bayesian Medical Image Segmentation

Prerak Mody, Nicolas F. Chaves-de-Plaza, Chinmay Rao, Eleftheria Astrenidou, Mischa de Ridder, Nienke Hoekstra, Klaus Hildebrandt, Marius Staring

Increased usage of automated tools like deep learning in medical image segmentation has alleviated the bottleneck of manual contouring. This has shifted manual labour to quality assessment (QA) of automated contours, which involves detecting errors and correcting them. A potential solution to semi-automated QA is to use deep Bayesian uncertainty to recommend potentially erroneous regions, thus reducing time spent on error detection. Previous work has investigated the correspondence between uncertainty and error; however, no work has been done on improving the utility of Bayesian uncertainty maps such that uncertainty is only present in inaccurate regions and not in the accurate ones. Our work trains the FlipOut model with the Accuracy-vs-Uncertainty (AvU) loss, which promotes uncertainty to be present only in inaccurate regions. We apply this method on datasets of two radiotherapy body sites, namely head-and-neck CT and prostate MR scans. Uncertainty heatmaps (i.e. predictive entropy) are evaluated against voxel inaccuracies using Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves. Numerical results show that when compared to the Bayesian baseline the proposed method successfully suppresses uncertainty for accurate voxels, with similar presence of uncertainty for inaccurate voxels. Code to reproduce experiments is available at https://github.com/prerakmody/bayesuncertainty-error-correspondence

9/6/2024

Uncertainty Quantification for Bird's Eye View Semantic Segmentation: Methods and Benchmarks

Linlin Yu, Bowen Yang, Tianhao Wang, Kangshuo Li, Feng Chen

The fusion of raw features from multiple sensors on an autonomous vehicle to create a Bird's Eye View (BEV) representation is crucial for planning and control systems. There is growing interest in using deep learning models for BEV semantic segmentation. Anticipating segmentation errors and improving the explainability of DNNs is essential for autonomous driving, yet it is under-studied. This paper introduces a benchmark for predictive uncertainty quantification in BEV segmentation. The benchmark assesses various approaches across three popular datasets using two representative backbones and focuses on the effectiveness of predicted uncertainty in identifying misclassified and out-of-distribution (OOD) pixels, as well as calibration. Empirical findings highlight the challenges in uncertainty quantification. Our results find that evidential deep learning based approaches show the most promise by efficiently quantifying aleatoric and epistemic uncertainty. We propose the Uncertainty-Focal-Cross-Entropy (UFCE) loss, designed for highly imbalanced data, which consistently improves the segmentation quality and calibration. Additionally, we introduce a vacuity-scaled regularization term that enhances the model's focus on high uncertainty pixels, improving epistemic uncertainty quantification.

6/3/2024