On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines

arXiv:2405.20459

Published 6/3/2024 by Selim Kuzucu, Kemal Oksuz, Jonathan Sadeghi, Puneet K. Dokania


Abstract

Reliable usage of object detectors requires them to be calibrated -- a crucial problem that requires careful attention. Recent approaches towards this involve (1) designing new loss functions to obtain calibrated detectors by training them from scratch, and (2) post-hoc Temperature Scaling (TS) that learns to scale the likelihood of a trained detector to output calibrated predictions. These approaches are then evaluated based on a combination of Detection Expected Calibration Error (D-ECE) and Average Precision. In this work, via extensive analysis and insights, we highlight that these recent evaluation frameworks, evaluation metrics, and the use of TS have notable drawbacks leading to incorrect conclusions. As a step towards fixing these issues, we propose a principled evaluation framework to jointly measure calibration and accuracy of object detectors. We also tailor efficient and easy-to-use post-hoc calibration approaches such as Platt Scaling and Isotonic Regression specifically for the object detection task. Contrary to the common notion, our experiments show that once designed and evaluated properly, post-hoc calibrators, which are extremely cheap to build and use, are much more powerful and effective than the recent train-time calibration methods. To illustrate, D-DETR with our post-hoc Isotonic Regression calibrator outperforms the recent train-time state-of-the-art calibration method Cal-DETR by more than 7 D-ECE on the COCO dataset. Additionally, we propose improved versions of the recently proposed Localization-aware ECE and show the efficacy of our method on these metrics as well. Code is available at: https://github.com/fiveai/detection_calibration.


Overview

The paper discusses the crucial problem of calibrating object detectors to ensure reliable usage.
It highlights the limitations of recent approaches, including designing new loss functions and post-hoc Temperature Scaling (TS).
The paper proposes a principled evaluation framework to jointly measure calibration and accuracy of object detectors.
It also introduces efficient and easy-to-use post-hoc calibration approaches, such as Platt Scaling and Isotonic Regression, specifically for object detection tasks.
Contrary to common belief, the paper shows that post-hoc calibrators are more powerful and effective than recent train-time calibration methods.

Plain English Explanation

Object detectors are computer vision models that can identify and locate objects in images or videos. However, for these detectors to be reliable, they need to be properly calibrated, which means their output probabilities should accurately reflect the true likelihood of an object being present.
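One common way to write this requirement down, offered here as a generic formalization rather than the paper's exact definition, is that among all detections assigned a confidence of p, a fraction p should turn out to be correct. The symbol \hat{p} (the detector's confidence) and the notion of "correct" (usually: the box matches a ground-truth object well enough) are assumptions of this sketch.

```latex
\mathbb{P}\big(\text{detection is correct} \,\big|\, \hat{p} = p\big) = p
\qquad \text{for all } p \in [0, 1]
```

For instance, under this reading, roughly 80 out of every 100 detections reported with confidence 0.8 should be correct.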

Recent approaches to this problem have focused on two main strategies: (1) designing new loss functions to train object detectors from scratch to be well-calibrated, and (2) using a technique called Temperature Scaling (TS) to adjust the output probabilities of a trained detector to make them more calibrated.
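To make the second strategy concrete, here is a minimal sketch of Temperature Scaling for a single-score (sigmoid) detector output. It is not the paper's implementation: the function names, the grid search over T, and the toy held-out data are all assumptions chosen for illustration. The idea is to divide each raw logit by one scalar T, chosen to minimize the negative log-likelihood on held-out detections.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nll(logits, labels, temperature):
    """Mean binary negative log-likelihood after dividing logits by T."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = sigmoid(z / temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(logits)

def fit_temperature(logits, labels, lo=0.05, hi=20.0, steps=400):
    """Grid search over T on a log scale; keeps the T with the lowest NLL."""
    best_t, best_loss = 1.0, nll(logits, labels, 1.0)
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        loss = nll(logits, labels, t)
        if loss < best_loss:
            best_t, best_loss = t, loss
    return best_t

# Hypothetical held-out detections: raw logits and 1/0 correctness labels.
val_logits = [4.0, 3.5, 3.0, 2.5, 2.0, -1.0, -2.0, 3.8]
val_labels = [1, 0, 1, 0, 1, 0, 0, 1]

T = fit_temperature(val_logits, val_labels)
calibrated = [sigmoid(z / T) for z in val_logits]
```

Because a single positive T rescales every logit the same way, the ranking of detections is unchanged; only how extreme the confidences are is adjusted.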

These approaches are then evaluated using a combination of metrics, such as Detection Expected Calibration Error (D-ECE) and Average Precision.
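As a rough illustration of what D-ECE measures, the sketch below bins detections by confidence and compares each bin's mean confidence with its precision. It is a confidence-only simplification and not the paper's evaluation code: the function name `detection_ece`, the ten equal-width bins, and the toy inputs are assumptions, and deciding which detections count as true positives (typically via an IoU threshold) is assumed to have happened beforehand.

```python
def detection_ece(confidences, is_true_positive, num_bins=10):
    """Weighted average of |precision - mean confidence| over confidence bins."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, tp in zip(confidences, is_true_positive):
        # Equal-width bins on [0, 1]; a confidence of exactly 1.0 goes in the last bin.
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, tp))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        precision = sum(t for _, t in members) / len(members)
        ece += (len(members) / n) * abs(precision - mean_conf)
    return ece

# Hypothetical detections: confidence and whether each one is a true positive.
toy_conf = [0.95, 0.92, 0.85, 0.81, 0.35, 0.31]
toy_tp = [1, 1, 1, 0, 0, 1]
toy_error = detection_ece(toy_conf, toy_tp)
```

A value of 0 would mean that, within every bin, the average confidence equals the fraction of correct detections.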

However, the paper argues that these recent evaluation frameworks, metrics, and the use of TS have notable drawbacks, leading to incorrect conclusions about the effectiveness of these calibration methods.

To address these issues, the paper proposes a new, more principled evaluation framework that can jointly measure the calibration and accuracy of object detectors. It also introduces efficient and easy-to-use post-hoc calibration approaches, such as Platt Scaling and Isotonic Regression, specifically tailored for object detection tasks.
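For intuition about Isotonic Regression as a post-hoc calibrator, the following sketch fits a non-decreasing step function from raw confidence to observed correctness using the classic Pool Adjacent Violators procedure. It is a generic illustration, not the paper's detection-specific variant; the function names and toy data are assumptions, and tied scores are not given special treatment.

```python
import bisect

def fit_isotonic(scores, targets):
    """Pool Adjacent Violators: returns (thresholds, values) of a
    non-decreasing step function fitted to the (score, target) pairs."""
    pairs = sorted(zip(scores, targets))
    blocks = []  # each block: [sum of targets, count, largest score in block]
    for s, t in pairs:
        blocks.append([float(t), 1, s])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count, right = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2] = right
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values

def predict_isotonic(model, score):
    """Look up the calibrated value of the block that covers `score`."""
    thresholds, values = model
    idx = bisect.bisect_left(thresholds, score)
    return values[min(idx, len(values) - 1)]

# Hypothetical held-out detections: confidence and 1/0 correctness.
val_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
val_correct = [0, 0, 1, 0, 0, 1, 1, 0, 1]
model = fit_isotonic(val_scores, val_correct)
```

Unlike a single temperature, the fitted mapping can bend differently in different confidence ranges, while still never assigning a lower calibrated value to a higher raw score.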

Surprisingly, the paper finds that these post-hoc calibrators, which are cheap to implement and use, are actually more powerful and effective than the recent train-time calibration methods, contrary to common belief. For example, the paper reports that D-DETR with the post-hoc Isotonic Regression calibrator outperforms Cal-DETR, a state-of-the-art train-time calibration method, by more than 7 D-ECE on the COCO dataset.

Additionally, the paper proposes improved versions of the recently proposed Localization-aware ECE metric, which takes into account the localization accuracy of object detectors when evaluating their calibration.

Technical Explanation

The paper begins by highlighting the importance of calibrating object detectors to ensure reliable usage. It then reviews two main approaches from recent literature: (1) designing new loss functions to train calibrated detectors from scratch, and (2) using post-hoc Temperature Scaling (TS) to scale the output probabilities of a trained detector.

The paper argues that these recent evaluation frameworks, metrics (such as D-ECE), and the use of TS have notable drawbacks, leading to incorrect conclusions about the effectiveness of these calibration methods. For example, the paper shows that TS can sometimes worsen the calibration of object detectors.

To address these issues, the paper proposes a new, principled evaluation framework that can jointly measure the calibration and accuracy of object detectors. This framework uses a combination of metrics, including a tailored version of Expected Calibration Error (ECE) that takes into account the localization accuracy of detectors.
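To show how localization quality can enter a calibration error, the sketch below follows one simplified reading of a localization-aware ECE: within each confidence bin, the mean confidence is compared against precision multiplied by the mean IoU of the true positives. This is an assumption-laden illustration and not the paper's improved metrics; the function name, the IoU threshold of 0.5, the equal-width bins, and the omission of per-class averaging are all choices made here.

```python
def localization_aware_ece(confidences, ious, iou_threshold=0.5, num_bins=10):
    """Weighted average of |mean confidence - precision * mean TP IoU| over bins.

    `ious` holds each detection's IoU with its matched ground-truth box
    (0.0 when nothing was matched)."""
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for conf, iou in zip(confidences, ious):
        idx = min(int(conf * num_bins), num_bins - 1)
        bins[idx].append((conf, iou))

    error = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        tp_ious = [iou for _, iou in members if iou >= iou_threshold]
        precision = len(tp_ious) / len(members)
        mean_tp_iou = sum(tp_ious) / len(tp_ious) if tp_ious else 0.0
        error += (len(members) / n) * abs(mean_conf - precision * mean_tp_iou)
    return error

# Hypothetical case: every detection is a true positive, but the boxes are loose.
loose_box_error = localization_aware_ece([0.95, 0.95], [0.6, 0.6])
```

Under this reading, a detector that reports 0.95 confidence for boxes overlapping the object with an IoU of only 0.6 is penalized even though every detection counts as a true positive.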

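The paper then introduces efficient and easy-to-use post-hoc calibration approaches, such as Platt Scaling and Isotonic Regression, specifically designed for object detection tasks. Rather than retraining the detector, these methods use held-out data to learn a mapping from the detector's raw confidence scores to calibrated ones: Platt Scaling fits a logistic function, while Isotonic Regression fits a non-decreasing step function.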
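A minimal sketch of Platt Scaling is given below, assuming the calibrator takes the form sigmoid(a * logit(score) + b) and is fitted by plain gradient descent on the negative log-likelihood. The function names, the learning rate, the number of epochs, and the toy data are illustrative assumptions, not details taken from the paper.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p, eps=1e-6):
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def apply_platt(params, score):
    a, b = params
    return sigmoid(a * logit(score) + b)

def mean_nll(params, scores, labels):
    total = 0.0
    for s, y in zip(scores, labels):
        p = apply_platt(params, s)
        total -= math.log(p) if y else math.log(1.0 - p)
    return total / len(scores)

def fit_platt(scores, labels, lr=0.1, epochs=5000):
    """Fit slope a and offset b by gradient descent on the mean NLL."""
    a, b = 1.0, 0.0  # start from the identity mapping
    xs = [logit(s) for s in scores]
    n = len(xs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, labels):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

# Hypothetical held-out detections: confidence and 1/0 correctness.
val_scores = [0.99, 0.97, 0.95, 0.90, 0.85, 0.80, 0.30, 0.20]
val_correct = [1, 0, 1, 0, 1, 1, 0, 0]
params = fit_platt(val_scores, val_correct)
calibrated = [apply_platt(params, s) for s in val_scores]
```

With only two parameters, this calibrator is cheap to fit and hard to overfit, at the cost of being less flexible than a step function.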

Contrary to the common belief that train-time calibration methods are superior, the paper's extensive experiments show that these post-hoc calibrators are much more powerful and effective than the recent train-time calibration methods. For example, the paper reports that D-DETR with the post-hoc Isotonic Regression calibrator outperforms the state-of-the-art train-time calibration method Cal-DETR by more than 7 D-ECE on the COCO dataset.

Critical Analysis

The paper provides a thorough and insightful analysis of the current state of object detector calibration research. While the proposed solutions are technically sound, there are a few potential limitations and areas for further research that could be considered:

The paper's evaluation is primarily focused on the COCO dataset, which may not fully capture the diversity of real-world object detection scenarios. It would be valuable to extend the analysis to other datasets and application domains to assess the generalizability of the findings.
The paper does not provide a comprehensive comparison of the computational and memory overhead of the various calibration methods. This information could be useful for practitioners who need to deploy calibrated object detectors in resource-constrained environments.
The paper discusses the limitations of current evaluation metrics, such as D-ECE, and refines the existing Localization-aware ECE rather than proposing an entirely new metric. Further research could explore developing more robust and comprehensive evaluation frameworks for object detector calibration.
The paper's focus is on post-hoc calibration methods, but there may still be value in exploring train-time calibration approaches, especially if they can be combined with the proposed post-hoc techniques for even better performance.

Overall, the paper presents a significant contribution to the field of object detector calibration, challenging the current state-of-the-art and proposing practical solutions. However, as with any research, there is always room for further exploration and refinement to address the remaining challenges.

Conclusion

This paper tackles the crucial problem of calibrating object detectors to ensure their reliable usage. It highlights the limitations of recent approaches, including designing new loss functions and post-hoc Temperature Scaling, and proposes a more principled evaluation framework to jointly measure the calibration and accuracy of object detectors.

The paper introduces efficient and easy-to-use post-hoc calibration approaches, such as Platt Scaling and Isotonic Regression, specifically tailored for object detection tasks. Contrary to common belief, the paper's extensive experiments show that these post-hoc calibrators are much more powerful and effective than the recent train-time calibration methods.

This work represents a significant step forward in the field of object detector calibration, providing valuable insights and practical solutions that can help improve the reliability and trustworthiness of computer vision systems in real-world applications.

