How to Fix a Broken Confidence Estimator: Evaluating Post-hoc Methods for Selective Classification with Deep Neural Networks

2305.15508

Published 5/27/2024 by Luís Felipe P. Cattelan, Danilo Silva


Abstract

This paper addresses the problem of selective classification for deep neural networks, where a model is allowed to abstain from low-confidence predictions to avoid potential errors. We focus on so-called post-hoc methods, which replace the confidence estimator of a given classifier without modifying or retraining it, thus being practically appealing. Considering neural networks with softmax outputs, our goal is to identify the best confidence estimator that can be computed directly from the unnormalized logits. This problem is motivated by the intriguing observation in recent work that many classifiers appear to have a broken confidence estimator, in the sense that their selective classification performance is much worse than what could be expected by their corresponding accuracies. We perform an extensive experimental study of many existing and proposed confidence estimators applied to 84 pretrained ImageNet classifiers available from popular repositories. Our results show that a simple $p$-norm normalization of the logits, followed by taking the maximum logit as the confidence estimator, can lead to considerable gains in selective classification performance, completely fixing the pathological behavior observed in many classifiers. As a consequence, the selective classification performance of any classifier becomes almost entirely determined by its corresponding accuracy. Moreover, these results are shown to be consistent under distribution shift. Our code is available at https://github.com/lfpc/FixSelectiveClassification.


Overview

This paper addresses the problem of selective classification for deep neural networks, where a model can choose to abstain from low-confidence predictions to avoid potential errors.
The focus is on "post-hoc" methods, which can improve the confidence estimation of a given classifier without modifying or retraining it.
The goal is to identify the best confidence estimator that can be computed directly from the unnormalized logits of a neural network with softmax outputs.
The researchers explore this problem based on the observation that many classifiers have a "broken" confidence estimator, leading to poor selective classification performance.
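The abstention mechanism described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `ABSTAIN` sentinel, the function name `selective_predict`, the toy logits, and the threshold value are all hypothetical, and the confidence scores here are just a placeholder for whichever estimator is being evaluated.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel meaning "the model declines to predict"

def selective_predict(logits, confidence, threshold):
    """Return the argmax class where confidence >= threshold, else ABSTAIN.

    The classifier itself is untouched: only the confidence scores decide
    which predictions are kept, which is what makes the approach post-hoc.
    """
    logits = np.asarray(logits, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    predictions = logits.argmax(axis=1)
    return np.where(confidence >= threshold, predictions, ABSTAIN)

# Toy logits for three inputs and three classes (made-up numbers).
logits = np.array([[4.0, 0.5, 0.1],
                   [1.0, 1.1, 0.9],
                   [0.2, 0.1, 3.0]])
confidence = logits.max(axis=1)  # placeholder confidence estimator
print(selective_predict(logits, confidence, threshold=2.0))  # [ 0 -1  2]
```

Swapping in a different confidence estimator changes which inputs are rejected but never changes the predicted class of an accepted input.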

Plain English Explanation

When we use deep neural networks for classification tasks, there may be some instances where the network is not very confident about its prediction. In these cases, it would be better for the network to abstain from making a prediction, rather than risk making an incorrect one. This paper looks at ways to improve a neural network's ability to identify and abstain from low-confidence predictions, without having to retrain the entire network.

The researchers focus on "post-hoc" methods, which means they can be applied to an existing, pre-trained classifier to improve its confidence estimation, without modifying the classifier itself. They explore different ways of using the raw, unnormalized outputs (called "logits") of the neural network to come up with a better measure of the network's confidence in its predictions.
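As a rough illustration of what "computing confidence from the logits" means, the sketch below shows two simple estimators: the maximum softmax probability, a commonly used baseline (naming it here is an assumption of this sketch, since the summary does not name one), and the raw maximum logit. The function names are hypothetical.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the class axis."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=1, keepdims=True)  # shift for stability; result unchanged
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def msp_confidence(logits):
    """Maximum softmax probability: the largest class probability per input."""
    return softmax(logits).max(axis=1)

def max_logit_confidence(logits):
    """Raw maximum logit, with no normalization at all."""
    return np.asarray(logits, dtype=float).max(axis=1)

logits = np.array([[4.0, 0.5, 0.1],   # peaked: one class clearly dominates
                   [1.0, 1.1, 0.9]])  # flat: the classes are nearly tied
print(msp_confidence(logits))
print(max_logit_confidence(logits))
```

Both functions leave the predicted class unchanged; they differ only in how they rank inputs from most to least trustworthy.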

This is an important problem because some existing classifiers have been found to have a "broken" confidence estimator, meaning their selective classification performance is much worse than their accuracy would lead one to expect. By finding a better way to estimate confidence, the researchers aim to improve selective classification performance, that is, the ability of the network to abstain from low-confidence predictions and only make high-confidence ones.
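One common way to quantify selective classification performance is a risk-coverage curve: sort inputs by confidence, then track the error rate among the accepted inputs as more of them are accepted. The sketch below is an illustration under that assumption; the summary does not say which metric the paper uses, and the function names and toy data are hypothetical. It also shows how two estimators attached to the same predictions, and therefore the same accuracy, can differ widely in selective performance.

```python
import numpy as np

def risk_coverage_curve(confidence, correct):
    """Coverage and selective risk as inputs are accepted in order of confidence."""
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    order = np.argsort(-confidence, kind="stable")  # most confident first
    errors = (~correct[order]).astype(float)
    accepted = np.arange(1, len(errors) + 1)
    coverage = accepted / len(errors)       # fraction of inputs accepted
    risk = np.cumsum(errors) / accepted     # error rate among accepted inputs
    return coverage, risk

def aurc(confidence, correct):
    """Area under the risk-coverage curve (lower is better), as a simple mean."""
    _, risk = risk_coverage_curve(confidence, correct)
    return risk.mean()

# Same predictions (3 of 4 correct), two different confidence estimators.
correct = np.array([True, True, False, True])
good_confidence = np.array([0.9, 0.8, 0.2, 0.7])    # the error is ranked last
broken_confidence = np.array([0.2, 0.3, 0.9, 0.4])  # the error is ranked first

print(aurc(good_confidence, correct), aurc(broken_confidence, correct))
```

At full coverage both estimators give the same risk, since nothing is rejected; the difference appears only when the model is allowed to abstain.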

Technical Explanation

The paper's key contribution is an extensive experimental study of many existing and proposed confidence estimators, applied to 84 pre-trained ImageNet classifiers from popular repositories. The researchers focus on neural networks with softmax outputs, and their goal is to identify the best confidence estimator that can be computed directly from the unnormalized logits.

Their results show that a simple "p-norm normalization" of the logits, followed by taking the maximum logit as the confidence estimator, can lead to significant improvements in selective classification performance. This essentially "fixes" the pathological behavior observed in many classifiers, where the selective classification performance was much worse than expected based on their corresponding accuracies.
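A minimal sketch of this estimator follows, written from the description above rather than from the authors' repository. The function name `max_logit_pnorm`, the default `p=4.0`, and the optional mean-centering step are assumptions of this sketch; in practice p is a hyperparameter that would presumably be tuned on held-out data.

```python
import numpy as np

def max_logit_pnorm(logits, p=4.0, center=True, eps=1e-12):
    """Confidence = max logit after dividing each logit vector by its p-norm.

    `center=True` subtracts the per-input mean logit first (an assumption of
    this sketch), which makes the score invariant to adding a constant to
    all logits. `eps` guards against division by zero.
    """
    z = np.asarray(logits, dtype=float)
    if center:
        z = z - z.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(z, ord=p, axis=1, keepdims=True)
    return (z / (norms + eps)).max(axis=1)

logits = np.array([[4.0, 0.5, 0.1],   # peaked logit vector
                   [1.0, 1.1, 0.9]])  # nearly flat logit vector
print(max_logit_pnorm(logits))
```

Because each logit vector is divided by its own norm, the score does not change when all logits are multiplied by a positive constant. Like the other post-hoc estimators, it needs only the logits the classifier already produces and leaves the predicted class unchanged.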

Furthermore, the researchers demonstrate that these results are consistent under distribution shift, where the test data are drawn from a different distribution than the training data. This suggests that the proposed confidence estimator remains effective beyond the in-distribution setting.

The code for this work is available at https://github.com/lfpc/FixSelectiveClassification.

Critical Analysis

The paper provides a comprehensive and rigorous analysis of the problem of selective classification for deep neural networks. The researchers' exploration of a wide range of confidence estimators, applied to a large number of pre-trained models, gives confidence in the generalizability of their findings.

However, the paper does not address some potential limitations or areas for further research. For example, the researchers only consider neural networks with softmax outputs, and it's unclear if their proposed method would work equally well for other types of neural network architectures or output representations, such as hierarchical classification or probabilistic outputs.

Additionally, while the researchers demonstrate the robustness of their method under distribution shift, they do not explore the performance of their approach in more extreme situations, such as post-hoc reversal or other challenging domain adaptation scenarios.

Overall, the paper presents a valuable contribution to the field of selective classification, offering a simple yet effective solution to a common problem in deep learning. Further research exploring the limits and broader applicability of this approach would be a welcome addition to the literature.

Conclusion

This paper addresses the important problem of selective classification for deep neural networks, where models can choose to abstain from low-confidence predictions to avoid potential errors. The researchers focus on "post-hoc" methods that can improve confidence estimation without modifying or retraining the underlying classifier.

Their key finding is that a simple "p-norm normalization" of the neural network's logits, followed by taking the maximum logit as the confidence estimator, can lead to significant improvements in selective classification performance. This "fixes" the broken confidence estimation observed in many existing classifiers, making the selective classification performance almost entirely determined by the model's accuracy.

These results are shown to be robust even under distribution shift, suggesting the proposed confidence estimation method is a general and practical solution to the problem of selective classification. While the paper does not address all potential limitations, it represents an important step forward in improving the reliability and trustworthiness of deep learning models in real-world applications.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers


Selective Prediction for Semantic Segmentation using Post-Hoc Confidence Estimation and Its Performance under Distribution Shift

Bruno Laboissiere Camargos Borges, Bruno Machado Pacheco, Danilo Silva

Semantic segmentation plays a crucial role in various computer vision applications, yet its efficacy is often hindered by the lack of high-quality labeled data. To address this challenge, a common strategy is to leverage models trained on data from different populations, such as publicly available datasets. This approach, however, leads to the distribution shift problem, presenting a reduced performance on the population of interest. In scenarios where model errors can have significant consequences, selective prediction methods offer a means to mitigate risks and reduce reliance on expert supervision. This paper investigates selective prediction for semantic segmentation in low-resource settings, thus focusing on post-hoc confidence estimators applied to pre-trained models operating under distribution shift. We propose a novel image-level confidence measure tailored for semantic segmentation and demonstrate its effectiveness through experiments on three medical imaging tasks. Our findings show that post-hoc confidence estimators offer a cost-effective approach to reducing the impacts of distribution shift.

5/8/2024

cs.LG cs.CV

Investigating Calibration and Corruption Robustness of Post-hoc Pruned Perception CNNs: An Image Classification Benchmark Study

Pallavi Mitra, Gesina Schwalbe, Nadja Klein

Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in many computer vision tasks. However, high computational and storage demands hinder their deployment into resource-constrained environments, such as embedded devices. Model pruning helps to meet these restrictions by reducing the model size, while maintaining superior performance. Meanwhile, safety-critical applications pose more than just resource and performance constraints. In particular, predictions must not be overly confident, i.e., provide properly calibrated uncertainty estimations (proper uncertainty calibration), and CNNs must be robust against corruptions like naturally occurring input perturbations (natural corruption robustness). This work investigates the important trade-off between uncertainty calibration, natural corruption robustness, and performance for current state-of-research post-hoc CNN pruning techniques in the context of image classification tasks. Our study reveals that post-hoc pruning substantially improves the model's uncertainty calibration, performance, and natural corruption robustness, sparking hope for safe and robust embedded CNNs. Furthermore, uncertainty calibration and natural corruption robustness are not mutually exclusive targets under pruning, as evidenced by the improved safety aspects obtained by post-hoc unstructured pruning with increasing compression.

6/3/2024

cs.CV cs.AI

Hierarchical Selective Classification

Shani Goren, Ido Galil, Ran El-Yaniv

Deploying deep neural networks for risk-sensitive tasks necessitates an uncertainty estimation mechanism. This paper introduces hierarchical selective classification, extending selective classification to a hierarchical setting. Our approach leverages the inherent structure of class relationships, enabling models to reduce the specificity of their predictions when faced with uncertainty. In this paper, we first formalize hierarchical risk and coverage, and introduce hierarchical risk-coverage curves. Next, we develop algorithms for hierarchical selective classification (which we refer to as inference rules), and propose an efficient algorithm that guarantees a target accuracy constraint with high probability. Lastly, we conduct extensive empirical studies on over a thousand ImageNet classifiers, revealing that training regimes such as CLIP, pretraining on ImageNet21k and knowledge distillation boost hierarchical selective performance.

5/21/2024

cs.LG cs.CV


Calibrated Selective Classification

Adam Fisch, Tommi Jaakkola, Regina Barzilay

Selective classification allows models to abstain from making predictions (e.g., say "I don't know") when in doubt in order to obtain better effective accuracy. While typical selective models can be effective at producing more accurate predictions on average, they may still allow for wrong predictions that have high confidence, or skip correct predictions that have low confidence. Providing calibrated uncertainty estimates alongside predictions -- probabilities that correspond to true frequencies -- can be as important as having predictions that are simply accurate on average. However, uncertainty estimates can be unreliable for certain inputs. In this paper, we develop a new approach to selective classification in which we propose a method for rejecting examples with uncertain uncertainties. By doing so, we aim to make predictions with well-calibrated uncertainty estimates over the distribution of accepted examples, a property we call selective calibration. We present a framework for learning selectively calibrated models, where a separate selector network is trained to improve the selective calibration error of a given base model. In particular, our work focuses on achieving robust calibration, where the model is intentionally designed to be tested on out-of-domain data. We achieve this through a training strategy inspired by distributionally robust optimization, in which we apply simulated input perturbations to the known, in-domain training data. We demonstrate the empirical effectiveness of our approach on multiple image classification and lung cancer risk assessment tasks.

6/24/2024

cs.LG