Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes

Read original: arXiv:2308.12017 - Published 8/28/2024 by Donghao Zhou, Jialin Li, Jinpeng Li, Jiancheng Huang, Qiang Nie, Yong Liu, Bin-Bin Gao, Qiong Wang, Pheng-Ann Heng, Guangyong Chen

Overview

Accurately annotated datasets are crucial for effective object detection models
Obtaining accurate bounding box annotations is labor-intensive and challenging
Noisy bounding boxes can degrade detection performance

Plain English Explanation

Object detection models, which identify and locate objects in images, rely on large datasets of annotated images to train effectively. However, manually drawing accurate bounding boxes around objects in images is a time-consuming and difficult task. This often results in some level of "noise" or inaccuracy in the annotations, which can then negatively impact the performance of the object detection models trained on that data.

The researchers behind this paper observed that the true location of an object usually lies in the region where the proposals assigned to a noisy annotation cluster together. They developed a technique called DISCO (DIStribution-aware CalibratiOn) that models the spatial distribution of these proposals and uses it to calibrate the supervision signals used to train the object detection model. This helps the model overcome the limitations of noisy ground-truth annotations.

Technical Explanation

The DISCO approach involves modeling the spatial distribution of proposed bounding boxes to statistically infer the potential locations of objects. Based on this modeled distribution, the researchers developed three techniques:

Distribution-Aware Proposal Augmentation (DA-Aug): Uses the distribution information to generate additional, diverse object proposals to improve the model's classification performance.
Distribution-Aware Box Refinement (DA-Ref): Refines the locations of predicted bounding boxes based on the modeled distribution, improving the model's localization accuracy.
Distribution-Aware Confidence Estimation (DA-Est): Estimates the confidence of predicted bounding boxes using the distribution information, providing better interpretability of the model's outputs.
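
The three techniques above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the proposals assigned to one noisy box are summarized by an independent Gaussian per coordinate (x1, y1, x2, y2), and the function names, the blending factor `alpha`, and the `scale` parameter are hypothetical choices made only for illustration.

```python
import numpy as np

def model_proposal_distribution(proposals, scores=None):
    """Fit an independent Gaussian to each coordinate of the proposals.

    Assumption: a per-coordinate mean and variance is a sufficient summary.
    Optional scores weight each proposal; otherwise weights are uniform.
    """
    proposals = np.asarray(proposals, dtype=float)
    if scores is None:
        weights = np.full(len(proposals), 1.0 / len(proposals))
    else:
        scores = np.asarray(scores, dtype=float)
        weights = scores / scores.sum()
    mean = weights @ proposals
    var = weights @ (proposals - mean) ** 2
    return mean, var

def augment_proposals(mean, var, n, rng):
    """DA-Aug-style step (hypothetical): sample extra proposals from the Gaussian."""
    return rng.normal(loc=mean, scale=np.sqrt(var), size=(n, 4))

def refine_box(noisy_box, mean, alpha=0.5):
    """DA-Ref-style step (hypothetical): pull the noisy box toward the mean."""
    return (1.0 - alpha) * np.asarray(noisy_box, dtype=float) + alpha * mean

def confidence_from_variance(var, scale=10.0):
    """DA-Est-style step (hypothetical): tighter spread gives a value nearer 1."""
    return np.exp(-np.sqrt(var) / scale)

# Three proposals clustered around an object; the annotation is shifted.
proposals = [[10, 10, 50, 50], [12, 12, 52, 52], [14, 14, 54, 54]]
noisy_box = [20, 20, 60, 60]

mean, var = model_proposal_distribution(proposals)
extra = augment_proposals(mean, var, n=5, rng=np.random.default_rng(0))
refined = refine_box(noisy_box, mean)
confidence = confidence_from_variance(var)
print(mean, refined, confidence)
```

In this toy case the refined box moves halfway from the noisy annotation toward the center of the proposal cluster. How strongly to trust the cluster over the annotation is a design choice that this sketch leaves to `alpha`.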

The researchers evaluated DISCO on Pascal VOC and MS-COCO with noisy bounding box annotations and report state-of-the-art detection performance, especially at high noise levels.
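
To make "noise level" concrete, the sketch below perturbs a clean box and measures how far it drifts using intersection over union (IoU). The noise model is an assumption for illustration only: each coordinate is shifted by a uniform random fraction of the box width or height. This summary does not specify how the benchmark noise was generated, and `add_box_noise` is a hypothetical helper.

```python
import numpy as np

def add_box_noise(box, noise_level, rng):
    """Shift each coordinate by up to noise_level times the box width/height.

    Assumed noise model (not taken from the paper): uniform shifts in
    [-noise_level, noise_level], scaled by width for x and height for y.
    """
    box = np.asarray(box, dtype=float)
    w, h = box[2] - box[0], box[3] - box[1]
    shift = rng.uniform(-noise_level, noise_level, size=4) * np.array([w, h, w, h])
    x1, y1, x2, y2 = box + shift
    # Reorder corners so the result is still a valid (x1, y1, x2, y2) box.
    return np.array([min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)])

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

rng = np.random.default_rng(0)
clean = [0.0, 0.0, 10.0, 10.0]
for level in (0.0, 0.2, 0.4):
    noisy = add_box_noise(clean, level, rng)
    print(level, round(iou(clean, noisy), 3))
```

Under this assumed model, larger noise levels allow larger shifts, so the overlap between the clean and noisy boxes tends to fall as the level rises.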

Critical Analysis

The DISCO approach provides a promising solution to the challenge of training effective object detectors on noisy datasets. By modeling the spatial distribution of proposed bounding boxes, the technique is able to better calibrate the supervision signals used during training, leading to improved classification, localization, and interpretability.

However, the paper does not fully address the potential limitations of this approach. For instance, the effectiveness of DISCO may depend on the quality and diversity of the initial set of proposed bounding boxes, which could be a source of bias. Additionally, the computational overhead of the distribution modeling and calibration steps may be a concern, especially for large-scale deployment.

Further research could explore ways to make the DISCO approach more efficient, as well as investigate its performance on a wider range of datasets and object detection architectures. Addressing these potential limitations could help solidify DISCO as a robust and practical solution for training accurate object detectors from noisy data.

Conclusion

The DISCO technique offers a valuable approach for improving object detection performance on datasets with noisy bounding box annotations. By modeling the spatial distribution of proposed bounding boxes and using this information to calibrate the supervision signals during training, DISCO can help object detection models overcome the limitations of inaccurate ground-truth annotations. As datasets continue to grow in size and complexity, techniques like DISCO will become increasingly important for developing robust and reliable object detection systems.

Related Papers

Distribution-Aware Calibration for Object Detection with Noisy Bounding Boxes

Donghao Zhou, Jialin Li, Jinpeng Li, Jiancheng Huang, Qiang Nie, Yong Liu, Bin-Bin Gao, Qiong Wang, Pheng-Ann Heng, Guangyong Chen

Large-scale well-annotated datasets are of great importance for training an effective object detector. However, obtaining accurate bounding box annotations is laborious and demanding. Unfortunately, the resultant noisy bounding boxes could cause corrupt supervision signals and thus diminish detection performance. Motivated by the observation that the real ground-truth is usually situated in the aggregation region of the proposals assigned to a noisy ground-truth, we propose DIStribution-aware CalibratiOn (DISCO) to model the spatial distribution of proposals for calibrating supervision signals. In DISCO, spatial distribution modeling is performed to statistically extract the potential locations of objects. Based on the modeled distribution, three distribution-aware techniques, i.e., distribution-aware proposal augmentation (DA-Aug), distribution-aware box refinement (DA-Ref), and distribution-aware confidence estimation (DA-Est), are developed to improve classification, localization, and interpretability, respectively. Extensive experiments on large-scale noisy image datasets (i.e., Pascal VOC and MS-COCO) demonstrate that DISCO can achieve state-of-the-art detection performance, especially at high noise levels. Code is available at https://github.com/Correr-Zhou/DISCO.

8/28/2024

Distribution-Aware Robust Learning from Long-Tailed Data with Noisy Labels

Jae Soon Baik, In Young Yoon, Kun Hoon Kim, Jun Won Choi

Deep neural networks have demonstrated remarkable advancements in various fields using large, well-annotated datasets. However, real-world data often exhibit long-tailed distributions and label noise, significantly degrading generalization performance. Recent studies addressing these issues have focused on noisy sample selection methods that estimate the centroid of each class based on high-confidence samples within each target class. The performance of these methods is limited because they use only the training samples within each class for class centroid estimation, making the quality of centroids susceptible to long-tailed distributions and noisy labels. In this study, we present a robust training framework called Distribution-aware Sample Selection and Contrastive Learning (DaSC). Specifically, DaSC introduces a Distribution-aware Class Centroid Estimation (DaCC) to generate enhanced class centroids. DaCC performs weighted averaging of the features from all samples, with weights determined based on model predictions. Additionally, we propose a confidence-aware contrastive learning strategy to obtain balanced and robust representations. The training samples are categorized into high-confidence and low-confidence samples. Our method then applies Semi-supervised Balanced Contrastive Loss (SBCL) using high-confidence samples, leveraging reliable label information to mitigate class bias. For the low-confidence samples, our method computes Mixup-enhanced Instance Discrimination Loss (MIDL) to improve their representations in a self-supervised manner. Our experimental results on CIFAR and real-world noisy-label datasets demonstrate the superior performance of the proposed DaSC compared to previous approaches.

7/25/2024

Harnessing Uncertainty-aware Bounding Boxes for Unsupervised 3D Object Detection

Ruiyang Zhang, Hu Zhang, Hang Yu, Zhedong Zheng

Unsupervised 3D object detection aims to identify objects of interest from unlabeled raw data, such as LiDAR points. Recent approaches usually adopt pseudo 3D bounding boxes (3D bboxes) from a clustering algorithm to initialize the model training, and then iteratively update both the pseudo labels and the trained model. However, pseudo bboxes inevitably contain noise, and such inaccurate annotation accumulates in the final model, compromising the performance. Therefore, in an attempt to mitigate the negative impact of pseudo bboxes, we introduce a new uncertainty-aware framework. In particular, our method consists of two primary components: uncertainty estimation and uncertainty regularization. (1) In the uncertainty estimation phase, we incorporate an extra auxiliary detection branch alongside the primary detector. The prediction disparity between the primary and auxiliary detectors is leveraged to estimate uncertainty at the box coordinate level, including position, shape, and orientation. (2) Based on the assessed uncertainty, we regularize the model training by adaptively adjusting the coordinates of every 3D bbox. For pseudo bbox coordinates with high uncertainty, we assign a relatively low loss weight. Experiments verify that the proposed method is robust against the noisy pseudo bboxes, yielding substantial improvements on nuScenes and Lyft compared to existing techniques, with increases of 6.9% in AP$_{BEV}$ and 2.5% in AP$_{3D}$ on nuScenes, and 2.2% in AP$_{BEV}$ and 1.0% in AP$_{3D}$ on Lyft.

8/2/2024

Distribution Discrepancy and Feature Heterogeneity for Active 3D Object Detection

Huang-Yu Chen, Jia-Fong Yeh, Jia-Wei Liao, Pin-Hsuan Peng, Winston H. Hsu

LiDAR-based 3D object detection is a critical technology for the development of autonomous driving and robotics. However, the high cost of data annotation limits its advancement. We propose a novel and effective active learning (AL) method called Distribution Discrepancy and Feature Heterogeneity (DDFH), which simultaneously considers geometric features and model embeddings, assessing information from both the instance-level and frame-level perspectives. Distribution Discrepancy evaluates the difference and novelty of instances within the unlabeled and labeled distributions, enabling the model to learn efficiently with limited data. Feature Heterogeneity ensures the heterogeneity of intra-frame instance features, maintaining feature diversity while avoiding redundant or similar instances, thus minimizing annotation costs. Finally, multiple indicators are efficiently aggregated using Quantile Transform, providing a unified measure of informativeness. Extensive experiments demonstrate that DDFH outperforms the current state-of-the-art (SOTA) methods on the KITTI and Waymo datasets, effectively reducing the bounding box annotation cost by 56.3% and showing robustness when working with both one-stage and two-stage models.

9/12/2024