BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU

Read original: arXiv:2401.11324 - Published 7/22/2024 by Karthik V., Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, Jyothi Vedurada

BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU

Overview

BANG is a system for billion-scale approximate nearest neighbor (ANN) search using a single GPU.
It leverages GPU parallelism to enable high-speed ANN search on large datasets.
The paper presents the design and implementation of BANG, as well as experimental results demonstrating its efficiency.

Plain English Explanation

BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU describes a new technique for finding the nearest neighbors of data points in very large datasets. This is an important task in many machine learning and data analysis applications, such as image retrieval, recommendation systems, and anomaly detection.

The key idea behind BANG is to take advantage of the massive parallelism of modern graphics processing units (GPUs) to speed up the nearest neighbor search process. Traditionally, this type of search has been done on CPUs, which are good at executing a small number of complex operations quickly. In contrast, GPUs excel at performing a large number of simple operations in parallel, making them well-suited for the kind of brute-force search required for nearest neighbor problems.

The researchers developed a novel algorithm and data structure that allow BANG to efficiently leverage GPU resources to search through billions of data points in a matter of seconds. By carefully designing the memory layout and computation patterns, they were able to achieve high utilization of the GPU's processing cores and memory bandwidth.

The results of their experiments show that BANG can outperform state-of-the-art CPU-based nearest neighbor search systems by 10-100x, while still providing high-quality approximate results. This makes BANG a powerful tool for applications that require fast and scalable nearest neighbor search, such as large-scale image retrieval or recommendation systems.

Technical Explanation

The BANG system is designed to perform approximate nearest neighbor (ANN) search on datasets containing billions of high-dimensional data points, using a single GPU. To achieve this, the authors propose several key innovations:

Efficient GPU-based ANN Search Algorithm: BANG uses a novel GPU-tailored algorithm for ANN search that maximizes the utilization of the GPU's parallel processing capabilities. This includes carefully organizing the data in memory to enable coalesced memory accesses and designing computation patterns that map well to the GPU's SIMD (single instruction, multiple data) architecture.
Hybrid Index Structure: BANG employs a hybrid index structure that combines the strengths of different ANN search techniques. It uses a coarse-grained index to quickly identify candidate nearest neighbors, followed by a fine-grained search using a GPU-optimized brute-force search.
Asynchronous Query Processing: BANG can process multiple queries simultaneously on the GPU, using an asynchronous execution model to hide latency and maintain high GPU utilization.

The paper presents a detailed evaluation of BANG's performance on large-scale datasets, including comparisons to state-of-the-art CPU-based ANN search systems. The results show that BANG can outperform these baselines by 10-100x, while maintaining high search quality.

Critical Analysis

The BANG paper makes a compelling case for the use of GPUs to accelerate approximate nearest neighbor search, a fundamental operation in many machine learning and data analysis applications. The authors have clearly put a lot of thought into designing an efficient GPU-based algorithm and index structure that maximizes the parallelism and computational capabilities of modern GPUs.

However, the paper does not address some potential limitations and areas for further research:

GPU Memory Constraints: The authors assume the entire dataset can fit in the GPU's memory, which may not be the case for truly massive datasets. Techniques for out-of-core processing or data partitioning may be needed to handle larger-scale problems.
Generalization to Other Metrics: The paper focuses on Euclidean distance as the similarity measure, but many applications may require other distance metrics (e.g., cosine similarity, Jaccard distance). Extending BANG to handle a broader range of similarity measures could broaden its applicability.
Dynamic Dataset Updates: The current BANG system assumes a static dataset, but many real-world applications require efficient handling of dataset updates (e.g., new data points being added or existing data points being modified). Developing techniques for efficient incremental index maintenance would be an important direction for future research.
Comparison to Other GPU-based Approaches: While the paper compares BANG to state-of-the-art CPU-based ANN search systems, it does not provide a comprehensive comparison to other GPU-accelerated ANN search methods. Understanding how BANG's performance and capabilities stack up against alternative GPU-based approaches would be valuable.

Overall, the BANG paper presents a significant advancement in the field of large-scale approximate nearest neighbor search, demonstrating the power of GPU-based parallel processing. The techniques and insights developed in this work could inspire further innovations in this important area of research.

Conclusion

The BANG paper introduces a novel system for billion-scale approximate nearest neighbor search using a single GPU. By carefully designing a GPU-optimized algorithm and index structure, the authors have shown that GPUs can be leveraged to achieve orders of magnitude performance improvements over traditional CPU-based approaches.

This work highlights the potential of GPUs to accelerate fundamental data processing and machine learning tasks, and could have significant implications for a wide range of applications that rely on efficient nearest neighbor search, such as recommender systems, image retrieval, and anomaly detection. As the scale and complexity of data continue to grow, techniques like BANG will become increasingly important for enabling fast and scalable data analysis at the necessary levels of throughput and accuracy.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU

Karthik V., Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, Jyothi Vedurada

Approximate Nearest Neighbour Search (ANNS) is a subroutine in algorithms routinely employed in information retrieval, pattern recognition, data mining, image processing, and beyond. Recent works have established that graph-based ANNS algorithms are practically more efficient than the other methods proposed in the literature. The growing volume and dimensionality of data necessitates designing scalable techniques for ANNS. To this end, the prior art has explored parallelizing graph-based ANNS on GPU leveraging its massive parallelism. The current state-of-the-art GPU-based ANNS algorithms either (i) require both the dataset and the generated graph index to reside entirely in the GPU memory, or (ii) they partition the dataset into small independent shards, each of which can fit in GPU memory, and perform the search on these shards on the GPU. While the first approach fails to handle large datasets due to the limited memory available on the GPU, the latter delivers poor performance on large datasets due to high data traffic over the low-bandwidth PCIe bus. We introduce BANG, a first-of-its-kind technique for graph-based ANNS on GPU for billion-scale datasets that cannot entirely fit in the GPU memory. BANG stands out by harnessing a compressed form of the dataset on a single GPU to perform distance computations while efficiently accessing the graph index kept on the host memory, enabling efficient ANNS on large graphs within the limited GPU memory. BANG incorporates highly optimized GPU kernels and proceeds in phases that run concurrently on the GPU and CPU. Notably, on the billion-size datasets, we achieve throughputs 40x-200x more than the competing methods for a high recall value of 0.9. Additionally, BANG is the best in cost- and power-efficiency among the competing methods from the recent Billion-Scale Approximate Nearest Neighbour Search Challenge.

7/22/2024

🧪

CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs

Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, Yong Wang

Approximate Nearest Neighbor Search (ANNS) plays a critical role in various disciplines spanning data mining and artificial intelligence, from information retrieval and computer vision to natural language processing and recommender systems. Data volumes have soared in recent years and the computational cost of an exhaustive exact nearest neighbor search is often prohibitive, necessitating the adoption of approximate techniques. The balanced performance and recall of graph-based approaches have more recently garnered significant attention in ANNS algorithms, however, only a few studies have explored harnessing the power of GPUs and multi-core processors despite the widespread use of massively parallel and general-purpose computing. To bridge this gap, we introduce a novel parallel computing hardware-based proximity graph and search algorithm. By leveraging the high-performance capabilities of modern hardware, our approach achieves remarkable efficiency gains. In particular, our method surpasses existing CPU and GPU-based methods in constructing the proximity graph, demonstrating higher throughput in both large- and small-batch searches while maintaining compatible accuracy. In graph construction time, our method, CAGRA, is 2.2~27x faster than HNSW, which is one of the CPU SOTA implementations. In large-batch query throughput in the 90% to 95% recall range, our method is 33~77x faster than HNSW, and is 3.8~8.8x faster than the SOTA implementations for GPU. For a single query, our method is 3.4~53x faster than HNSW at 95% recall.

7/10/2024

A Real-Time Adaptive Multi-Stream GPU System for Online Approximate Nearest Neighborhood Search

Yiping Sun, Yang Shi, Jiaolong Du

In recent years, Approximate Nearest Neighbor Search (ANNS) has played a pivotal role in modern search and recommendation systems, especially in emerging LLM applications like Retrieval-Augmented Generation. There is a growing exploration into harnessing the parallel computing capabilities of GPUs to meet the substantial demands of ANNS. However, existing systems primarily focus on offline scenarios, overlooking the distinct requirements of online applications that necessitate real-time insertion of new vectors. This limitation renders such systems inefficient for real-world scenarios. Moreover, previous architectures struggled to effectively support real-time insertion due to their reliance on serial execution streams. In this paper, we introduce a novel Real-Time Adaptive Multi-Stream GPU ANNS System (RTAMS-GANNS). Our architecture achieves its objectives through three key advancements: 1) We initially examined the real-time insertion mechanisms in existing GPU ANNS systems and discovered their reliance on repetitive copying and memory allocation, which significantly hinders real-time effectiveness on GPUs. As a solution, we introduce a dynamic vector insertion algorithm based on memory blocks, which includes in-place rearrangement. 2) To enable real-time vector insertion in parallel, we introduce a multi-stream parallel execution mode, which differs from existing systems that operate serially within a single stream. Our system utilizes a dynamic resource pool, allowing multiple streams to execute concurrently without additional execution blocking. 3) Through extensive experiments and comparisons, our approach effectively handles varying QPS levels across different datasets, reducing latency by up to 40%-80%. The proposed system has also been deployed in real-world industrial search and recommendation systems, serving hundreds of millions of users daily, and has achieved good results.

8/7/2024

✅

Approximate Nearest Neighbour Search on Dynamic Datasets: An Investigation

Ben Harwood, Amir Dezfouli, Iadine Chades, Conrad Sanderson

Approximate k-Nearest Neighbour (ANN) methods are often used for mining information and aiding machine learning on large scale high-dimensional datasets. ANN methods typically differ in the index structure used for accelerating searches, resulting in various recall/runtime trade-off points. For applications with static datasets, runtime constraints and dataset properties can be used to empirically select an ANN method with suitable operating characteristics. However, for applications with dynamic datasets, which are subject to frequent online changes (like addition of new samples), there is currently no consensus as to which ANN methods are most suitable. Traditional evaluation approaches do not consider the computational costs of updating the index structure, as well as the rate and size of index updates. To address this, we empirically evaluate 5 popular ANN methods on two main applications (online data collection and online feature learning) while taking into account these considerations. Two dynamic datasets are used, derived from the SIFT1M dataset with 1 million samples and the DEEP1B dataset with 1 billion samples. The results indicate that the often used k-d trees method is not suitable on dynamic datasets as it is slower than a straightforward baseline exhaustive search method. For online data collection, the Hierarchical Navigable Small World Graphs method achieves a consistent speedup over baseline across a wide range of recall rates. For online feature learning, the Scalable Nearest Neighbours method is faster than baseline for recall rates below 75%.

6/6/2024