A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation

Read original: arXiv:2409.18686 - Published 9/30/2024 by Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan


Overview

Low-shot object counters estimate the number of objects in an image using few or no annotated examples.
Objects are localized by matching them to prototypes, which are constructed by unsupervised image-wide object appearance aggregation.
Because object appearances can be diverse, existing approaches often overgeneralize, which leads to false positive detections.
The best-performing methods train object localization with a surrogate loss that predicts a unit Gaussian at each object center. This loss is sensitive to annotation errors and hyperparameters, and it does not directly optimize the detection task.
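To make the surrogate loss concrete, here is a minimal sketch, assuming "unit Gaussian" means a Gaussian kernel normalized to unit mass (as in density-map counters), so the target map sums to the object count. The names `gaussian_density_map`, `count_from_density`, `pixel_mse` and the `sigma` value are hypothetical, not taken from the paper. Shifting every annotated center by one pixel leaves the count unchanged but still gives a nonzero pixel-wise loss, which illustrates why this kind of loss is sensitive to annotation error and to the choice of `sigma`.

```python
import math

def gaussian_density_map(centers, height, width, sigma=2.0):
    """Target map with one unit-mass Gaussian per annotated object center (hypothetical helper)."""
    density = [[0.0] * width for _ in range(height)]
    for cy, cx in centers:
        # Unnormalized Gaussian for this object, evaluated over the whole grid
        kernel = [[math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2))
                   for x in range(width)] for y in range(height)]
        total = sum(map(sum, kernel))
        # Normalize so each object contributes exactly 1 to the map's sum
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / total
    return density

def count_from_density(density):
    """The count is the sum over the density map."""
    return sum(map(sum, density))

def pixel_mse(pred, target):
    """Pixel-wise mean squared error, a typical surrogate regression loss."""
    n = len(pred) * len(pred[0])
    return sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n

centers = [(5, 5), (10, 20), (25, 12)]
dmap = gaussian_density_map(centers, 32, 32, sigma=2.0)

# Simulate a one-pixel annotation error on every object
shifted = gaussian_density_map([(cy + 1, cx) for cy, cx in centers], 32, 32, sigma=2.0)

print(round(count_from_density(dmap), 6))     # 3.0
print(round(count_from_density(shifted), 6))  # 3.0
print(pixel_mse(shifted, dmap) > 0)           # True
```

Both maps yield the same count, yet the loss penalizes the shift; this is the sense in which the surrogate objective does not directly target detection or counting.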

Plain English Explanation

GeCo, introduced in "A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation", is a new approach to counting objects in images using only a few annotated examples, or even none at all.

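Detection-based counters work by first locating each object in the image and then counting the detections. They find objects by matching image regions to "prototypes" built by aggregating object appearances across the whole image. When objects vary a lot in appearance, these prototypes can become too general and match things that are not the target objects, which produces false positive detections and inflated counts.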

GeCo addresses this problem with a new "dense object query" formulation that generalizes the prototypes (compact representations of what the target objects look like) more robustly across varied object appearances.
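For intuition, the sketch below shows generic few-shot prototype matching, not GeCo's dense object query formulation: the exemplar features are averaged into a single prototype, and every feature-map location whose cosine similarity to it reaches a threshold counts as a match. The feature values, the helper names, and the `threshold` parameter are all made up for illustration. Lowering the threshold makes the prototype match more loosely and adds a false positive in this toy example, the kind of overgeneralization GeCo aims to avoid.

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def build_prototype(exemplar_features):
    """Average the annotated exemplars' features into one prototype (hypothetical helper)."""
    n = len(exemplar_features)
    return [sum(f[i] for f in exemplar_features) / n for i in range(len(exemplar_features[0]))]

def match(feature_map, prototype, threshold=0.9):
    """Return (row, col) locations whose features are similar enough to the prototype."""
    return [(r, c)
            for r, row in enumerate(feature_map)
            for c, feat in enumerate(row)
            if cosine(feat, prototype) >= threshold]

# Toy 2x3 feature map with 2-D features per location
feature_map = [
    [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]],
    [[0.1, 0.9], [0.9, 0.1], [0.5, 0.5]],
]
prototype = build_prototype([[1.0, 0.1], [0.9, 0.0]])  # -> [0.95, 0.05]

print(match(feature_map, prototype, threshold=0.9))       # [(0, 0), (0, 2), (1, 1)]
print(len(match(feature_map, prototype, threshold=0.7)))  # 4: location (1, 2) now also matches
```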

GeCo also introduces a new counting loss that directly optimizes the detection task, rather than relying on the standard indirect "surrogate" loss, which is sensitive to annotation errors in the training data and to hyperparameter choices. This helps GeCo produce more accurate counts.

Technical Explanation

GeCo is a low-shot object counter that uses a unified architecture to achieve accurate object detection, segmentation, and count estimation. It addresses the limitations of existing approaches by:

Robustly generalizing object prototypes across diverse appearances through a novel "dense object query" formulation.
Introducing a new counting loss that directly optimizes the detection task, avoiding the issues of standard surrogate losses.

These innovations in GeCo's architecture and training let it reduce the total count mean absolute error (MAE) by about 25% relative to the leading few-shot detection-based counters, while also achieving superior detection accuracy and setting a new state of the art across all low-shot counting setups.
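The 25% figure is a relative improvement in MAE. The sketch below computes count MAE and the relative reduction between two counters on made-up counts (not results from the paper), reading "about 25%" as (baseline MAE minus new MAE) divided by baseline MAE. The function names are illustrative.

```python
def count_mae(predicted, ground_truth):
    """Mean absolute error between predicted and ground-truth counts, one entry per image."""
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(ground_truth)

def relative_reduction(baseline_mae, new_mae):
    """Fraction by which new_mae improves on baseline_mae."""
    return (baseline_mae - new_mae) / baseline_mae

# Hypothetical ground-truth and predicted counts for three images
ground_truth = [10, 25, 40]
baseline_pred = [18, 17, 48]  # absolute errors 8, 8, 8
new_pred = [16, 19, 46]       # absolute errors 6, 6, 6

baseline_mae = count_mae(baseline_pred, ground_truth)
new_mae = count_mae(new_pred, ground_truth)
print(baseline_mae, new_mae)                      # 8.0 6.0
print(relative_reduction(baseline_mae, new_mae))  # 0.25
```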

Critical Analysis

The paper presents a strong technical contribution, with a well-designed architecture and training approach that outperforms previous low-shot counting methods. However, some potential limitations or areas for further research include:

The paper focuses on evaluating GeCo on standard low-shot counting benchmarks, but it would be interesting to see how it performs on more diverse or challenging real-world datasets.
The authors mention that GeCo's dense object query mechanism helps it generalize across diverse object appearances, but they don't provide a deep analysis of this aspect.
While the new counting loss is a key innovation, the paper doesn't explore in depth the trade-offs of directly optimizing the detection task versus using a surrogate loss.

Overall, GeCo represents a significant advancement in low-shot object counting, and the proposed techniques could have broader applications in other computer vision tasks that require robust handling of diverse object appearances.

Conclusion

GeCo introduces a novel unified architecture for low-shot object counting that outperforms previous state-of-the-art methods. By using a robust prototype matching approach and a counting-specific loss function, GeCo can accurately detect, segment, and count objects in images with far fewer annotated examples than traditional supervised methods.

This research represents an important step forward in making object counting more accessible and scalable, with potential applications in areas like retail analytics, wildlife monitoring, and medical imaging. The technical innovations in GeCo's design and training could also inspire future work in other computer vision domains that require handling diverse and challenging object appearances.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


A Novel Unified Architecture for Low-Shot Counting by Detection and Segmentation

Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

Low-shot object counters estimate the number of objects in an image using few or no annotated exemplars. Objects are localized by matching them to prototypes, which are constructed by unsupervised image-wide object appearance aggregation. Due to potentially diverse object appearances, the existing approaches often lead to overgeneralization and false positive detections. Furthermore, the best-performing methods train object localization by a surrogate loss that predicts a unit Gaussian at each object center. This loss is sensitive to annotation error and hyperparameters, and does not directly optimize the detection task, leading to suboptimal counts. We introduce GeCo, a novel low-shot counter that achieves accurate object detection, segmentation, and count estimation in a unified architecture. GeCo robustly generalizes the prototypes across object appearances through a novel dense object query formulation. In addition, a novel counting loss is proposed that directly optimizes the detection task and avoids the issues of the standard surrogate loss. GeCo surpasses the leading few-shot detection-based counters by ~25% in the total count MAE, achieves superior detection accuracy and sets a new solid state-of-the-art result across all low-shot counting setups.

9/30/2024

Zero-shot Object Counting with Good Exemplars

Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual associations between the identified classes and image content. To this end, we propose the Visual Association-based Zero-shot Object Counting (VA-Count) framework. VA-Count consists of an Exemplar Enhancement Module (EEM) and a Noise Suppression Module (NSM) that synergistically refine the process of class exemplar identification while minimizing the consequences of incorrect object identification. The EEM utilizes advanced vision-language pretraining models to discover potential exemplars, ensuring the framework's adaptability to various classes. Meanwhile, the NSM employs contrastive learning to differentiate between optimal and suboptimal exemplar pairs, reducing the negative effects of erroneous exemplars. VA-Count demonstrates its effectiveness and scalability in zero-shot contexts with superior performance on two object counting datasets.

7/10/2024


DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

Low-shot counters estimate the number of objects corresponding to a selected category, based on only few or no exemplars annotated in the image. The current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however, fall behind in the total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases the recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in the total count MAE, it outperforms the most recent detection-based counter by ~20% in detection quality and sets a new state-of-the-art in zero-shot as well as text-prompt-based counting.

4/26/2024

AFreeCA: Annotation-Free Counting for All

Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh

Object counting methods typically rely on manually annotated datasets. The cost of creating such datasets has restricted the versatility of these networks to count objects from specific classes (such as humans or penguins), and counting objects from diverse categories remains a challenge. The availability of robust text-to-image latent diffusion models (LDMs) raises the question of whether these models can be utilized to generate counting datasets. However, LDMs struggle to create images with an exact number of objects based solely on text prompts but they can be used to offer a dependable sorting signal by adding and removing objects within an image. Leveraging this data, we initially introduce an unsupervised sorting methodology to learn object-related features that are subsequently refined and anchored for counting purposes using counting data generated by LDMs. Further, we present a density classifier-guided method for dividing an image into patches containing objects that can be reliably counted. Consequently, we can generate counting data for any type of object and count them in an unsupervised manner. Our approach outperforms other unsupervised and few-shot alternatives and is not restricted to specific object classes for which counting data is available. Code to be released upon acceptance.

8/6/2024