Consensus Focus for Object Detection and minority classes

Read original: arXiv:2401.05530 - Published 6/4/2024 by Erik Isai Valle Salgado, Chen Li, Yaqi Han, Linchao Shi, Xinghui Li

Overview

The paper proposes a modified consensus focus method for semi-supervised and long-tailed object detection.
The method introduces a voting system based on source confidence to determine the contribution of each model in a consensus.
It allows the user to choose the relevance of each class in the target label space to avoid suppressing minority bounding boxes.
The method combines multiple models' results without discarding the "poisonous" networks.
In tests on synthetic driving datasets, the method yields higher-confidence and more accurate bounding boxes than NMS, Soft-NMS, and WBF.

Plain English Explanation

The paper concerns ensemble methods, which combine several detectors trained on one or more source datasets to address problems such as domain adaptation and multi-source transfer learning. These problems arise when a model trained on one dataset (the "source" domain) must make predictions on a different dataset (the "target" domain).

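The authors propose a method called "modified consensus focus" that combines the predictions of multiple models. They note that earlier ensemble approaches improved results on small or tail categories but hurt the remaining ones; the new method aims to keep rare categories from being suppressed (long-tailed object detection).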

The key ideas are:

A voting system that weighs the contribution of each model based on its confidence in the predictions.
Allowing the user to choose which classes are more important in the target dataset, so the method doesn't suppress the bounding boxes for minority classes.
Combining the results of multiple models, even if some of them are not performing well ("poisonous" networks).

The authors tested the method on synthetic driving datasets and found it outperformed existing techniques like Non-Maximum Suppression (NMS), Soft-NMS, and Weighted Boxes Fusion (WBF) in terms of confidence and accuracy of the bounding boxes.
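
To make "suppressing" a bounding box concrete, here is a minimal sketch of the standard greedy NMS baseline in plain Python. It is not taken from the paper's repository; the box format `(x1, y1, x2, y2)`, the function names, and the 0.5 threshold are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: boxes 0 and 1 overlap heavily, box 2 is separate.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.6, 0.8]
kept = nms(boxes, scores)  # box 1 is suppressed by the higher-scoring box 0
```

Because the lower-scoring box is simply discarded, a rare-class detection that overlaps a confident majority-class box can disappear, which is the behaviour the paper's class-relevance setting is meant to relax.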

Technical Explanation

The paper proposes a "modified consensus focus" method for semi-supervised and long-tailed object detection. The key elements of the method are:

Voting system based on source confidence: The method introduces a voting system that weighs the contribution of each model in the consensus based on its confidence in the predictions. This allows the method to focus on the most reliable models.
User-specified class relevance: The method lets the user choose the relevance of each class in the target label space. This helps avoid suppressing bounding boxes for minority classes, which is a common issue with existing ensemble methods.
Combining multiple models: The method combines the results of multiple models, even if some of them are not performing well ("poisonous" networks). This is in contrast to other ensemble methods that discard poorly performing models.
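
The three elements above can be sketched together as a confidence-weighted vote. This is only an illustrative approximation, not the authors' algorithm: the clustering rule, the scoring formula, the thresholds, and every name (`weighted_consensus`, `source_conf`, `class_relevance`) are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def weighted_consensus(detections, source_conf, class_relevance,
                       iou_thr=0.5, keep_thr=0.3):
    """detections: list of (model, cls, score, box) tuples.
    source_conf: assumed per-model trust weight (every model gets a vote,
    including weak ones). class_relevance: assumed per-class multiplier
    chosen by the user; classes not listed default to 1.0."""
    total = sum(source_conf.values())
    clusters = []
    # Group same-class boxes that overlap the cluster's top-scoring box.
    for det in sorted(detections, key=lambda d: d[2], reverse=True):
        for members in clusters:
            lead = members[0]
            if lead[1] == det[1] and iou(lead[3], det[3]) >= iou_thr:
                members.append(det)
                break
        else:
            clusters.append([det])

    results = []
    for members in clusters:
        weights = [source_conf[m] * s for m, _, s, _ in members]
        wsum = sum(weights)
        if wsum == 0:
            continue
        # Fused box: coordinate-wise average weighted by each model's vote.
        box = tuple(sum(w * d[3][i] for w, d in zip(weights, members)) / wsum
                    for i in range(4))
        # Models that did not detect the object contribute nothing to the vote.
        score = min(1.0, (wsum / total) * class_relevance.get(members[0][1], 1.0))
        if score >= keep_thr:
            results.append((members[0][1], score, box))
    return results


# Hypothetical setup: model "c" is the weak ("poisonous") source.
source_conf = {"a": 0.6, "b": 0.3, "c": 0.1}
detections = [
    ("a", "car", 0.9, (0, 0, 10, 10)),
    ("b", "car", 0.8, (1, 1, 11, 11)),
    ("c", "car", 0.4, (0, 0, 10, 10)),
    ("b", "cyclist", 0.5, (20, 20, 26, 30)),  # seen by one model only
]
fused = weighted_consensus(detections, source_conf,
                           {"car": 1.0, "cyclist": 2.5})
```

In this toy setup the cyclist box survives only because its class relevance is raised; with the default relevance of 1.0 its vote falls below `keep_thr` and it is dropped. The weak model still votes, but its low source confidence limits how far it can move the fused box or score.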

The authors evaluated the proposed method on synthetic driving datasets and compared it to existing techniques like NMS, Soft-NMS, and WBF. The results show that the modified consensus focus approach achieves higher confidence and more accurate bounding boxes than the baseline methods.
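
For comparison, Soft-NMS lowers the scores of overlapping boxes instead of deleting them. The following is a minimal sketch of its Gaussian variant; the `sigma` and `score_threshold` values and the function names are illustrative assumptions, not settings reported in the paper.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_threshold=0.001):
    """Gaussian Soft-NMS: repeatedly pick the best remaining box, then decay
    the scores of the others by exp(-iou^2 / sigma) rather than removing them.
    Returns a list of (index, final_score) in selection order."""
    scores = list(scores)  # work on a copy so the caller's list is untouched
    remaining = list(range(len(boxes)))
    selected = []
    while remaining:
        best = max(remaining, key=lambda i: scores[i])
        remaining.remove(best)
        selected.append((best, scores[best]))
        for i in remaining:
            overlap = iou(boxes[best], boxes[i])
            scores[i] *= math.exp(-(overlap ** 2) / sigma)
        # Drop boxes whose decayed score has become negligible.
        remaining = [i for i in remaining if scores[i] >= score_threshold]
    return selected


# Hypothetical detections: box 1 overlaps box 0 and is down-weighted, not removed.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
result = soft_nms(boxes, [0.9, 0.6, 0.8])
```

Here the overlapping box stays in the output with a reduced score, so a downstream threshold decides whether it is reported.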

Critical Analysis

The paper introduces a novel approach to ensemble learning for object detection, addressing important challenges like domain adaptation and long-tailed distributions. The key strengths of the method are the flexibility to prioritize certain classes, the robustness to "poisonous" models, and the improved performance on small or rare categories.

However, the paper does not provide a detailed analysis of the method's limitations or potential issues. For example, it would be helpful to understand how the method scales with the number of source models, or how sensitive the performance is to the user's choice of class relevance. Additionally, the evaluation on synthetic driving datasets, while informative, may not fully capture the complexities of real-world object detection scenarios.

Further research could explore how well the modified consensus focus approach generalizes to other domains and tasks, and investigate ways to automate the selection of class relevance to reduce the burden on the user. A more thorough comparison with state-of-the-art ensemble methods in the object detection literature could also provide deeper insight into the method's strengths and weaknesses.

Conclusion

The paper presents a novel ensemble method, modified consensus focus, for semi-supervised and long-tailed object detection. The key innovations are a voting system based on source confidence, user-specified class relevance, and the ability to combine multiple models without discarding poorly performing ones. Experiments on synthetic driving datasets show the method outperforms existing techniques like NMS, Soft-NMS, and WBF.

The modified consensus focus approach offers a flexible and robust solution to the challenges of domain adaptation and long-tailed distributions in machine learning. While the paper lacks a detailed analysis of the method's limitations, it showcases an important step forward in the field of ensemble learning for object detection.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Consensus Focus for Object Detection and minority classes

Erik Isai Valle Salgado, Chen Li, Yaqi Han, Linchao Shi, Xinghui Li

Ensemble methods exploit the availability of a given number of classifiers or detectors trained in single or multiple source domains and tasks to address machine learning problems such as domain adaptation or multi-source transfer learning. Existing research measures the domain distance between the sources and the target dataset, trains multiple networks on the same data with different samples per class, or combines predictions from models trained under varied hyperparameters and settings. Their solutions enhanced the performance on small or tail categories but hurt the rest. To this end, we propose a modified consensus focus for semi-supervised and long-tailed object detection. We introduce a voting system based on source confidence that spots the contribution of each model in a consensus, lets the user choose the relevance of each class in the target label space so that it relaxes minority bounding boxes suppression, and combines multiple models' results without discarding the poisonous networks. Our tests on synthetic driving datasets retrieved higher confidence and more accurate bounding boxes than the NMS, soft-NMS, and WBF. The code used to generate the results is available in our GitHub repository: http://github.com/ErikValle/Consensus-focus-for-object-detection.

6/4/2024

Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment

Muhammad Sohail Danish, Muhammad Haris Khan, Muhammad Akhtar Munir, M. Saquib Sarfraz, Mohsen Ali

In this work, we tackle the problem of domain generalization for object detection, specifically focusing on the scenario where only a single source domain is available. We propose an effective approach that involves two key steps: diversifying the source domain and aligning detections based on class prediction confidence and localization. Firstly, we demonstrate that by carefully selecting a set of augmentations, a base detector can outperform existing methods for single domain generalization by a good margin. This highlights the importance of domain diversification in improving the performance of object detectors. Secondly, we introduce a method to align detections from multiple views, considering both classification and localization outputs. This alignment procedure leads to better generalized and well-calibrated object detector models, which are crucial for accurate decision-making in safety-critical applications. Our approach is detector-agnostic and can be seamlessly applied to both single-stage and two-stage detectors. To validate the effectiveness of our proposed methods, we conduct extensive experiments and ablations on challenging domain-shift scenarios. The results consistently demonstrate the superiority of our approach compared to existing methods. Our code and models are available at: https://github.com/msohaildanish/DivAlign

5/24/2024

Multi-clue Consistency Learning to Bridge Gaps Between General and Oriented Object in Semi-supervised Detection

Chenxu Wang, Chunyan Xu, Ziqi Gu, Zhen Cui

While existing semi-supervised object detection (SSOD) methods perform well in general scenes, they encounter challenges in handling oriented objects in aerial images. We experimentally find three gaps between general and oriented object detection in semi-supervised learning: 1) Sampling inconsistency: the common center sampling is not suitable for oriented objects with larger aspect ratios when selecting positive labels from labeled data. 2) Assignment inconsistency: balancing the precision and localization quality of oriented pseudo-boxes poses greater challenges which introduces more noise when selecting positive labels from unlabeled data. 3) Confidence inconsistency: there exists more mismatch between the predicted classification and localization qualities when considering oriented objects, affecting the selection of pseudo-labels. Therefore, we propose a Multi-clue Consistency Learning (MCL) framework to bridge gaps between general and oriented objects in semi-supervised detection. Specifically, considering various shapes of rotated objects, the Gaussian Center Assignment is specially designed to select the pixel-level positive labels from labeled data. We then introduce the Scale-aware Label Assignment to select pixel-level pseudo-labels instead of unreliable pseudo-boxes, which is a divide-and-rule strategy suited for objects with various scales. The Consistent Confidence Soft Label is adopted to further boost the detector by maintaining the alignment of the predicted results. Comprehensive experiments on DOTA-v1.5 and DOTA-v1.0 benchmarks demonstrate that our proposed MCL can achieve state-of-the-art performance in the semi-supervised oriented object detection task.

7/9/2024

Utilizing dataset affinity prediction in object detection to assess training data

Stefan Becker, Jens Bayer, Ronny Hug, Wolfgang Hubner, Michael Arens

Data pooling offers various advantages, such as increasing the sample size, improving generalization, reducing sampling bias, and addressing data sparsity and quality, but it is not straightforward and may even be counterproductive. Assessing the effectiveness of pooling datasets in a principled manner is challenging due to the difficulty in estimating the overall information content of individual datasets. Towards this end, we propose incorporating a data source prediction module into standard object detection pipelines. The module runs with minimal overhead during inference time, providing additional information about the data source assigned to individual detections. We show the benefits of the so-called dataset affinity score by automatically selecting samples from a heterogeneous pool of vehicle datasets. The results show that object detectors can be trained on a significantly sparser set of training samples without losing detection accuracy.

5/9/2024