On the Evaluation Metric for Hashing

Read original: arXiv:1905.10951 - Published 5/7/2024 by Qing-Yuan Jiang, Ming-Wei Li, Wu-Jun Li

🧠

Overview

Hashing has become a widely used technique for large-scale approximate nearest neighbor (ANN) search due to its low storage cost and fast query speed.
Bucket search, also known as hash lookup, can achieve fast query speed with a sub-linear time cost by using an inverted index table constructed from hash codes.
Existing metrics used to evaluate hashing algorithms are improper for evaluating hash codes for bucket search, as they either ignore retrieval time cost or suffer from other issues.
This paper proposes a new evaluation metric called radius aware mean average precision (RAMAP) to better assess hash codes for bucket search, and also introduces two coding strategies to demonstrate the problems with existing metrics.

Plain English Explanation

Hashing is a technique that converts data into a compact code, called a hash code. Hashing has become a popular way to quickly search through large datasets because the hash codes are small and can be looked up very fast.

One way to use hash codes for searching is called "bucket search" or "hash lookup." This involves creating an index table from the hash codes, which allows the search system to quickly find the relevant data by looking up the hash code. This type of search can be very efficient, with a search time that grows sub-linearly as the dataset size increases.

However, the metrics commonly used to evaluate hashing algorithms are not well-suited for assessing the performance of hash codes for bucket search. Some of these metrics ignore the important factor of how long the search takes, while others have other issues that make them poor indicators of real-world search quality.

To address this, the researchers propose a new evaluation metric called "radius aware mean average precision" (RAMAP). This metric is designed to better capture the strengths of hash codes for bucket search, including the speed and accuracy of the searches. The researchers also introduce two new coding strategies to further illustrate the problems with existing metrics.

Technical Explanation

The paper first discusses how hashing has become a widely used technique for large-scale approximate nearest neighbor (ANN) search, enabled by the low storage cost and fast query speed of hash codes. Bucket search, or hash lookup, is highlighted as a particularly efficient form of ANN search that can achieve sub-linear query time by using an inverted index constructed from the hash codes.

The authors then explain that existing evaluation metrics for hashing algorithms are inadequate for assessing the performance of hash codes for bucket search. Some metrics ignore the crucial factor of retrieval time cost, while others suffer from issues like the "uncertainty problem" due to the integer-valued Hamming distance used in ranking, or a lack of sensitivity to the Hamming radius. Metrics focused on a specific Hamming radius also fail to capture the overall performance.

To address these shortcomings, the paper proposes a new evaluation metric called radius aware mean average precision (RAMAP). RAMAP is designed to properly account for both the search accuracy and the retrieval time cost when evaluating hash codes for bucket search.

Furthermore, the authors introduce two coding strategies - "cluster-based coding" and "anchor-aware coding" - to qualitatively demonstrate the problems of existing metrics. These strategies show how certain metrics can fail to accurately reflect the true performance of hash codes for bucket search.

Experimental results are presented to validate that the proposed RAMAP metric can provide a more appropriate evaluation of hash codes for bucket search compared to existing metrics.

Critical Analysis

The paper makes a strong case for the need to develop more suitable evaluation metrics for assessing the performance of hash codes in the context of bucket search. The authors convincingly demonstrate the limitations of existing metrics, which fail to adequately capture the importance of retrieval time cost or are insensitive to key factors like Hamming radius.

The introduction of the RAMAP metric is a promising solution, as it appears to address these shortcomings by jointly considering search accuracy and retrieval time. However, the authors do not provide a deep analysis of the specific tradeoffs or edge cases that might arise when using RAMAP, which would be helpful for understanding its practical implications and limitations.

Additionally, the paper does not discuss the computational complexity of calculating RAMAP or how it might scale as the dataset size increases. This could be an important consideration, especially for large-scale ANN search applications.

Overall, the research presented in this paper is a valuable contribution to the hashing community, as it highlights an important gap in existing evaluation practices and proposes a new metric that could lead to more meaningful assessments of hashing algorithms for bucket search. Further exploration of RAMAP's properties and performance characteristics would help strengthen the case for its adoption.

Conclusion

This paper identifies a critical issue in the evaluation of hashing algorithms for large-scale approximate nearest neighbor (ANN) search, specifically in the context of bucket search or hash lookup. The authors demonstrate that existing metrics used to assess hashing techniques are inadequate, as they either ignore the important factor of retrieval time cost or suffer from other problems like the "uncertainty problem" and lack of sensitivity to Hamming radius.

To address these shortcomings, the paper introduces a new evaluation metric called radius aware mean average precision (RAMAP), which is designed to better capture the strengths of hash codes for bucket search by jointly considering search accuracy and retrieval time. The authors also propose two novel coding strategies to illustrate the issues with current evaluation methods.

The research presented in this paper is a valuable contribution to the hashing community, as it highlights an important gap in current evaluation practices and proposes a new metric that could lead to more meaningful assessments of hashing algorithms for large-scale ANN search applications. Further exploration of RAMAP's properties and performance characteristics would help strengthen the case for its adoption and drive the development of more effective hashing techniques.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🧠

On the Evaluation Metric for Hashing

Qing-Yuan Jiang, Ming-Wei Li, Wu-Jun Li

Due to its low storage cost and fast query speed, hashing has been widely used for large-scale approximate nearest neighbor (ANN) search. Bucket search, also called hash lookup, can achieve fast query speed with a sub-linear time cost based on the inverted index table constructed from hash codes. Many metrics have been adopted to evaluate hashing algorithms. However, all existing metrics are improper to evaluate the hash codes for bucket search. On one hand, all existing metrics ignore the retrieval time cost which is an important factor reflecting the performance of search. On the other hand, some of them, such as mean average precision (MAP), suffer from the uncertainty problem as the ranked list is based on integer-valued Hamming distance, and are insensitive to Hamming radius as these metrics only depend on relative Hamming distance. Other metrics, such as precision at Hamming radius R, fail to evaluate global performance as these metrics only depend on one specific Hamming radius. In this paper, we first point out the problems of existing metrics which have been ignored by the hashing community, and then propose a novel evaluation metric called radius aware mean average precision (RAMAP) to evaluate hash codes for bucket search. Furthermore, two coding strategies are also proposed to qualitatively show the problems of existing metrics. Experiments demonstrate that our proposed RAMAP can provide more proper evaluation than existing metrics.

5/7/2024

🖼️

Multiple Code Hashing for Efficient Image Retrieval

Ming-Wei Li, Qing-Yuan Jiang, Wu-Jun Li

Due to its low storage cost and fast query speed, hashing has been widely used in large-scale image retrieval tasks. Hash bucket search returns data points within a given Hamming radius to each query, which can enable search at a constant or sub-linear time cost. However, existing hashing methods cannot achieve satisfactory retrieval performance for hash bucket search in complex scenarios, since they learn only one hash code for each image. More specifically, by using one hash code to represent one image, existing methods might fail to put similar image pairs to the buckets with a small Hamming distance to the query when the semantic information of images is complex. As a result, a large number of hash buckets need to be visited for retrieving similar images, based on the learned codes. This will deteriorate the efficiency of hash bucket search. In this paper, we propose a novel hashing framework, called multiple code hashing (MCH), to improve the performance of hash bucket search. The main idea of MCH is to learn multiple hash codes for each image, with each code representing a different region of the image. Furthermore, we propose a deep reinforcement learning algorithm to learn the parameters in MCH. To the best of our knowledge, this is the first work that proposes to learn multiple hash codes for each image in image retrieval. Experiments demonstrate that MCH can achieve a significant improvement in hash bucket search, compared with existing methods that learn only one hash code for each image.

5/7/2024

Not All Pairs are Equal: Hierarchical Learning for Average-Precision-Oriented Video Retrieval

Yang Liu, Qianqian Xu, Peisong Wen, Siran Dai, Qingming Huang

The rapid growth of online video resources has significantly promoted the development of video retrieval methods. As a standard evaluation metric for video retrieval, Average Precision (AP) assesses the overall rankings of relevant videos at the top list, making the predicted scores a reliable reference for users. However, recent video retrieval methods utilize pair-wise losses that treat all sample pairs equally, leading to an evident gap between the training objective and evaluation metric. To effectively bridge this gap, in this work, we aim to address two primary challenges: a) The current similarity measure and AP-based loss are suboptimal for video retrieval; b) The noticeable noise from frame-to-frame matching introduces ambiguity in estimating the AP loss. In response to these challenges, we propose the Hierarchical learning framework for Average-Precision-oriented Video Retrieval (HAP-VR). For the former challenge, we develop the TopK-Chamfer Similarity and QuadLinear-AP loss to measure and optimize video-level similarities in terms of AP. For the latter challenge, we suggest constraining the frame-level similarities to achieve an accurate AP loss estimation. Experimental results present that HAP-VR outperforms existing methods on several benchmark datasets, providing a feasible solution for video retrieval tasks and thus offering potential benefits for the multi-media application.

7/23/2024

A Bi-metric Framework for Fast Similarity Search

Haike Xu, Sandeep Silwal, Piotr Indyk

We propose a new bi-metric framework for designing nearest neighbor data structures. Our framework assumes two dissimilarity functions: a ground-truth metric that is accurate but expensive to compute, and a proxy metric that is cheaper but less accurate. In both theory and practice, we show how to construct data structures using only the proxy metric such that the query procedure achieves the accuracy of the expensive metric, while only using a limited number of calls to both metrics. Our theoretical results instantiate this framework for two popular nearest neighbor search algorithms: DiskANN and Cover Tree. In both cases we show that, as long as the proxy metric used to construct the data structure approximates the ground-truth metric up to a bounded factor, our data structure achieves arbitrarily good approximation guarantees with respect to the ground-truth metric. On the empirical side, we apply the framework to the text retrieval problem with two dissimilarity functions evaluated by ML models with vastly different computational costs. We observe that for almost all data sets in the MTEB benchmark, our approach achieves a considerably better accuracy-efficiency tradeoff than the alternatives, such as re-ranking.

6/6/2024