DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting

Read original: arXiv:2404.16622 - Published 4/26/2024 by Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

Overview

The paper proposes DAVE, a low-shot object counting method designed to avoid two weaknesses of existing approaches: missing object locations in density-based counters and lower total count accuracy in detection-based counters.
DAVE first generates a high-recall detection set, then verifies the detections to identify and remove outliers, which raises recall and precision together and yields accurate counts.
DAVE outperforms state-of-the-art density-based counters on total count accuracy and the most recent detection-based counter on detection quality.

Plain English Explanation

Object counting is an important computer vision task with many real-world applications. Low-shot counters aim to estimate the number of objects of a selected category, even when only a few or no examples are annotated in the image.

Existing state-of-the-art methods take two main approaches. Density-based counters estimate the total count by summing over a density map, but don't provide the individual object locations and sizes. Detection-based counters do detect individual objects, but tend to be less accurate on the total count.
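The difference between the two families can be shown with a toy sketch. It is not taken from the paper: the density map, boxes, scores, and the function names `count_from_density` and `count_from_detections` are all made up for illustration.

```python
# Toy illustration only (not DAVE's code): two ways a counter can report a count.
# All values and function names below are hypothetical.

def count_from_density(density_map):
    """Density-based: the count is the sum of all density-map values.
    No per-object location or size is produced."""
    return sum(sum(row) for row in density_map)

def count_from_detections(boxes, scores, threshold=0.5):
    """Detection-based: the count is the number of boxes kept after
    score thresholding, and each kept box localizes one object."""
    kept = [box for box, score in zip(boxes, scores) if score >= threshold]
    return len(kept), kept

# A 4x4 density map in which two objects each contribute a total mass of 1.0.
density_map = [
    [0.0, 0.25, 0.25, 0.0],
    [0.0, 0.25, 0.25, 0.0],
    [0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.0],
]

# Three candidate boxes as (x1, y1, x2, y2); the last one has a low score.
boxes = [(1, 0, 3, 2), (0, 2, 1, 4), (2, 2, 4, 4)]
scores = [0.9, 0.8, 0.3]

density_count = count_from_density(density_map)
detection_count, kept_boxes = count_from_detections(boxes, scores)
print(density_count, detection_count, kept_boxes)
```

Both routes report two objects here, but only the detection route returns boxes that say where the objects are and how large they are.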

The proposed DAVE method takes a "detect-and-verify" approach to address these limitations. First, it generates a high-recall set of object detections. Then, it verifies these detections to identify and remove any false positives. This joint approach of high-recall detection and selective verification leads to accurate counts, outperforming previous state-of-the-art methods.

DAVE also works well in zero-shot and text-prompt-based counting settings, where no image examples are provided and the task is specified via text.

Technical Explanation

The key innovation of DAVE is its "detect-and-verify" paradigm. First, it generates a high-recall set of object detections using a modified object detection model. Then, it verifies these detections to remove any false positives, improving both recall and precision.

The detection module uses a backbone network to extract features, followed by a set of detection heads that predict bounding boxes and object categories. A novel "cropping and classification" head is added to aid in the verification step.

The verification module takes the high-recall detections and classifies them as true or false positives. It does this by extracting features from the cropped detection regions and passing them through a series of fully connected layers.
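The two stages described above can be sketched in a few lines. This is a minimal sketch under loud assumptions, not the authors' implementation: the verification rule used here (cosine similarity between a detection's feature vector and an exemplar feature) is a stand-in chosen for illustration, and the thresholds, feature vectors, and function names `detect` and `verify` are hypothetical.

```python
import math

# Minimal detect-and-verify sketch. ASSUMPTION: verification is modeled as a
# cosine-similarity check against an exemplar feature; this is an illustrative
# stand-in, not the verification procedure from the paper.

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def detect(candidates, low_threshold=0.2):
    """Stage 1: keep even weakly scored candidates to favor recall."""
    return [c for c in candidates if c["score"] >= low_threshold]

def verify(detections, exemplar_feature, min_similarity=0.8):
    """Stage 2: drop detections whose features do not resemble the exemplar."""
    return [d for d in detections
            if cosine(d["feature"], exemplar_feature) >= min_similarity]

# Hypothetical candidates with 2-D features standing in for crop embeddings.
candidates = [
    {"box": (10, 10, 30, 30), "score": 0.90, "feature": [1.0, 0.1]},
    {"box": (50, 12, 70, 32), "score": 0.30, "feature": [0.9, 0.2]},
    {"box": (80, 40, 120, 90), "score": 0.70, "feature": [0.1, 1.0]},  # other class
    {"box": (5, 60, 15, 70), "score": 0.05, "feature": [1.0, 0.0]},    # below threshold
]
exemplar_feature = [1.0, 0.0]

high_recall = detect(candidates)
verified = verify(high_recall, exemplar_feature)
count = len(verified)
print(len(high_recall), count)
```

The low detection threshold lets the weakly scored second candidate through, which helps recall, while the verification step removes the confidently detected object of a different class, which helps precision.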

By combining high-recall detection and selective verification, DAVE outperforms the top density-based counters by about 20% in total count MAE and the most recent detection-based counter by about 20% in detection quality. It also sets a new state of the art in zero-shot and text-prompt-based counting.
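For reference, total count accuracy in this setting is reported as mean absolute error (MAE), the metric named in the paper's abstract. The standard definition over N test images, with ground-truth count c_i and predicted count \hat{c}_i, is given below. Reading the roughly 20% figure as a relative reduction in MAE against a baseline is an assumption made here, not something the summary states.

```latex
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{c}_i - c_i \right|
\qquad
\text{relative reduction} =
  \frac{\mathrm{MAE}_{\text{baseline}} - \mathrm{MAE}_{\text{DAVE}}}
       {\mathrm{MAE}_{\text{baseline}}}
```

Under that reading, a hypothetical baseline MAE of 10.0 and a new MAE of 8.0 would correspond to a 20% reduction.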

Critical Analysis

The paper thoroughly evaluates DAVE on multiple benchmarks and shows its superiority over existing methods. However, some potential limitations are worth considering:

The verification module adds computational complexity, which could impact real-time performance for certain applications.
The paper does not explore the model's robustness to dataset bias or distribution shift, which is an important consideration for practical deployment.
While DAVE outperforms other methods, there is still room for improvement in total count accuracy, especially for more challenging scenes with occlusion or clutter.

Nonetheless, the key ideas behind DAVE, such as the "detect-and-verify" paradigm and the novel cropping and classification head, represent a promising direction for advancing the state-of-the-art in low-shot object counting. Further research could explore ways to streamline the architecture and investigate the model's generalization capabilities.

Conclusion

The proposed DAVE method addresses the limitations of existing low-shot object counting approaches by combining high-recall detection and selective verification. This joint strategy improves both total count accuracy and detection quality, surpassing the best density-based and detection-based counters.

DAVE's strong performance in zero-shot and text-prompt-based counting settings also demonstrates its versatility and potential for real-world applications where annotated data is scarce. Overall, the research represents an important step forward in the field of low-shot object counting, with implications for a wide range of computer vision tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DAVE -- A Detect-and-Verify Paradigm for Low-Shot Counting

Jer Pelhan, Alan Lukežič, Vitjan Zavrtanik, Matej Kristan

Low-shot counters estimate the number of objects corresponding to a selected category, based on only few or no exemplars annotated in the image. The current state-of-the-art estimates the total counts as the sum over the object location density map, but does not provide individual object locations and sizes, which are crucial for many applications. This is addressed by detection-based counters, which, however fall behind in the total count accuracy. Furthermore, both approaches tend to overestimate the counts in the presence of other object classes due to many false positives. We propose DAVE, a low-shot counter based on a detect-and-verify paradigm, that avoids the aforementioned issues by first generating a high-recall detection set and then verifying the detections to identify and remove the outliers. This jointly increases the recall and precision, leading to accurate counts. DAVE outperforms the top density-based counters by ~20% in the total count MAE, it outperforms the most recent detection-based counter by ~20% in detection quality and sets a new state-of-the-art in zero-shot as well as text-prompt-based counting.

4/26/2024

AFreeCA: Annotation-Free Counting for All

Adriano D'Alessandro, Ali Mahdavi-Amiri, Ghassan Hamarneh

Object counting methods typically rely on manually annotated datasets. The cost of creating such datasets has restricted the versatility of these networks to count objects from specific classes (such as humans or penguins), and counting objects from diverse categories remains a challenge. The availability of robust text-to-image latent diffusion models (LDMs) raises the question of whether these models can be utilized to generate counting datasets. However, LDMs struggle to create images with an exact number of objects based solely on text prompts but they can be used to offer a dependable sorting signal by adding and removing objects within an image. Leveraging this data, we initially introduce an unsupervised sorting methodology to learn object-related features that are subsequently refined and anchored for counting purposes using counting data generated by LDMs. Further, we present a density classifier-guided method for dividing an image into patches containing objects that can be reliably counted. Consequently, we can generate counting data for any type of object and count them in an unsupervised manner. Our approach outperforms other unsupervised and few-shot alternatives and is not restricted to specific object classes for which counting data is available. Code to be released upon acceptance.

8/6/2024

Zero-shot Object Counting with Good Exemplars

Huilin Zhu, Jingling Yuan, Zhengwei Yang, Yu Guo, Zheng Wang, Xian Zhong, Shengfeng He

Zero-shot object counting (ZOC) aims to enumerate objects in images using only the names of object classes during testing, without the need for manual annotations. However, a critical challenge in current ZOC methods lies in their inability to identify high-quality exemplars effectively. This deficiency hampers scalability across diverse classes and undermines the development of strong visual associations between the identified classes and image content. To this end, we propose the Visual Association-based Zero-shot Object Counting (VA-Count) framework. VA-Count consists of an Exemplar Enhancement Module (EEM) and a Noise Suppression Module (NSM) that synergistically refine the process of class exemplar identification while minimizing the consequences of incorrect object identification. The EEM utilizes advanced vision-language pretraining models to discover potential exemplars, ensuring the framework's adaptability to various classes. Meanwhile, the NSM employs contrastive learning to differentiate between optimal and suboptimal exemplar pairs, reducing the negative effects of erroneous exemplars. VA-Count demonstrates its effectiveness and scalability in zero-shot contexts with superior performance on two object counting datasets.

7/10/2024

Iterative Object Count Optimization for Text-to-image Diffusion Models

Oz Zafar, Lior Wolf, Idan Schwartz

We address a persistent challenge in text-to-image models: accurately generating a specified number of objects. Current models, which learn from image-text pairs, inherently struggle with counting, as training data cannot depict every possible number of objects for any given object. To solve this, we propose optimizing the generated image based on a counting loss derived from a counting model that aggregates an object's potential. Employing an out-of-the-box counting model is challenging for two reasons: first, the model requires a scaling hyperparameter for the potential aggregation that varies depending on the viewpoint of the objects, and second, classifier guidance techniques require modified models that operate on noisy intermediate diffusion steps. To address these challenges, we propose an iterated online training mode that improves the accuracy of inferred images while altering the text conditioning embedding and dynamically adjusting hyperparameters. Our method offers three key advantages: (i) it can consider non-derivable counting techniques based on detection models, (ii) it is a zero-shot plug-and-play solution facilitating rapid changes to the counting techniques and image generation methods, and (iii) the optimized counting token can be reused to generate accurate images without additional optimization. We evaluate the generation of various objects and show significant improvements in accuracy. The project page is available at https://ozzafar.github.io/count_token.

8/22/2024