AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval

Read original: arXiv:2404.06004 - Published 4/10/2024 by Kento Tatsuno, Daisuke Miyashita, Taiga Ikeda, Kiyoshi Ishiyama, Kazunari Sumiyoshi, Jun Deguchi

AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval

Overview

The paper introduces AiSAQ, a novel approach for approximate nearest neighbor search (ANNS) that eliminates the need for DRAM by performing all operations within storage.
AiSAQ combines graph-based ANNS algorithms with product quantization, a technique for efficient vector compression, to enable fast and memory-efficient information retrieval.
The researchers demonstrate that AiSAQ outperforms state-of-the-art ANNS solutions in terms of retrieval quality and search speed, while also significantly reducing the memory footprint.

Plain English Explanation

AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval is a new way to quickly search through large collections of data, like images or documents, to find the ones that are most similar to a given query.

Typically, this type of approximate nearest neighbor search (ANNS) requires a lot of fast memory (DRAM) to store and process the data. AiSAQ eliminates the need for DRAM by doing all the processing right inside the storage device itself, like a hard drive or SSD.

AiSAQ combines two key techniques:

Graph-based ANNS algorithms: These algorithms organize the data into a special graph structure that makes it very fast to search and find similar items.
Product quantization: This is a way to compress the data so it takes up less space, without losing too much of the important information.

By putting these two ideas together, the researchers show that AiSAQ can search through data much faster and with a much smaller memory footprint than other state-of-the-art ANNS solutions. This could be really useful for applications like image search, recommendation systems, or natural language processing, where you need to quickly find similar items in a large database.

Technical Explanation

The paper introduces a new ANNS system called AiSAQ that performs all operations within storage, eliminating the need for DRAM. AiSAQ combines graph-based ANNS algorithms with the product quantization technique for efficient vector compression.

The key components of AiSAQ are:

Graph-based ANNS algorithm: AiSAQ uses a graph-based ANNS algorithm to organize the data into a special graph structure. This allows for fast and efficient nearest neighbor search, as the graph encodes information about similar items.
Product quantization: To reduce the memory footprint, AiSAQ leverages product quantization, a technique that compresses high-dimensional vectors by splitting them into smaller subvectors and quantizing each subvector independently. This enables efficient storage and processing of the data entirely within the storage device.

The researchers evaluate AiSAQ on several standard ANNS benchmarks and show that it outperforms state-of-the-art ANNS solutions in both retrieval quality and search speed, while also significantly reducing the memory footprint. This is particularly beneficial for applications that require fast and memory-efficient information retrieval, such as image super-resolution, neural network quantization, and hardware-aware deep learning.

Critical Analysis

The paper provides a comprehensive evaluation of AiSAQ and demonstrates its superiority over existing ANNS solutions. However, the authors do not discuss the potential limitations or challenges of the approach.

One potential concern is the impact of the quantization process on the quality of the search results. While the authors show that AiSAQ maintains high retrieval quality, it would be interesting to see a more detailed analysis of the tradeoffs between compression and accuracy.

Additionally, the paper does not explore the scalability of AiSAQ to very large datasets or its performance under different data distributions. Further research is needed to understand the broader applicability and limitations of the proposed approach.

Conclusion

The AiSAQ system presents a promising solution for efficient and memory-friendly approximate nearest neighbor search. By combining graph-based ANNS algorithms with product quantization, the researchers have developed a DRAM-free approach that outperforms state-of-the-art methods in both retrieval quality and search speed.

This work has significant implications for a wide range of applications that rely on fast and memory-efficient information retrieval, such as image search, recommendation systems, and natural language processing. The ability to perform ANNS entirely within storage could enable new hardware-efficient solutions for these and other data-intensive tasks.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval

Kento Tatsuno, Daisuke Miyashita, Taiga Ikeda, Kiyoshi Ishiyama, Kazunari Sumiyoshi, Jun Deguchi

In approximate nearest neighbor search (ANNS) methods based on approximate proximity graphs, DiskANN achieves good recall-speed balance for large-scale datasets using both of RAM and storage. Despite it claims to save memory usage by loading compressed vectors by product quantization (PQ), its memory usage increases in proportion to the scale of datasets. In this paper, we propose All-in-Storage ANNS with Product Quantization (AiSAQ), which offloads the compressed vectors to storage. Our method achieves $sim$10 MB memory usage in query search even with billion-scale datasets with minor performance degradation. AiSAQ also reduces the index load time before query search, which enables the index switch between muitiple billion-scale datasets and significantly enhances the flexibility of retrieval-augmented generation (RAG). This method is applicable to all graph-based ANNS algorithms and can be combined with higher-spec ANNS methods in the future.

4/10/2024

➖

RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search

Jianyang Gao, Cheng Long

Searching for approximate nearest neighbors (ANN) in the high-dimensional Euclidean space is a pivotal problem. Recently, with the help of fast SIMD-based implementations, Product Quantization (PQ) and its variants can often efficiently and accurately estimate the distances between the vectors and have achieved great success in the in-memory ANN search. Despite their empirical success, we note that these methods do not have a theoretical error bound and are observed to fail disastrously on some real-world datasets. Motivated by this, we propose a new randomized quantization method named RaBitQ, which quantizes $D$-dimensional vectors into $D$-bit strings. RaBitQ guarantees a sharp theoretical error bound and provides good empirical accuracy at the same time. In addition, we introduce efficient implementations of RaBitQ, supporting to estimate the distances with bitwise operations or SIMD-based operations. Extensive experiments on real-world datasets confirm that (1) our method outperforms PQ and its variants in terms of accuracy-efficiency trade-off by a clear margin and (2) its empirical performance is well-aligned with our theoretical analysis.

5/22/2024

🧪

CAGRA: Highly Parallel Graph Construction and Approximate Nearest Neighbor Search for GPUs

Hiroyuki Ootomo, Akira Naruse, Corey Nolet, Ray Wang, Tamas Feher, Yong Wang

Approximate Nearest Neighbor Search (ANNS) plays a critical role in various disciplines spanning data mining and artificial intelligence, from information retrieval and computer vision to natural language processing and recommender systems. Data volumes have soared in recent years and the computational cost of an exhaustive exact nearest neighbor search is often prohibitive, necessitating the adoption of approximate techniques. The balanced performance and recall of graph-based approaches have more recently garnered significant attention in ANNS algorithms, however, only a few studies have explored harnessing the power of GPUs and multi-core processors despite the widespread use of massively parallel and general-purpose computing. To bridge this gap, we introduce a novel parallel computing hardware-based proximity graph and search algorithm. By leveraging the high-performance capabilities of modern hardware, our approach achieves remarkable efficiency gains. In particular, our method surpasses existing CPU and GPU-based methods in constructing the proximity graph, demonstrating higher throughput in both large- and small-batch searches while maintaining compatible accuracy. In graph construction time, our method, CAGRA, is 2.2~27x faster than HNSW, which is one of the CPU SOTA implementations. In large-batch query throughput in the 90% to 95% recall range, our method is 33~77x faster than HNSW, and is 3.8~8.8x faster than the SOTA implementations for GPU. For a single query, our method is 3.4~53x faster than HNSW at 95% recall.

7/10/2024

BANG: Billion-Scale Approximate Nearest Neighbor Search using a Single GPU

Karthik V., Saim Khan, Somesh Singh, Harsha Vardhan Simhadri, Jyothi Vedurada

Approximate Nearest Neighbour Search (ANNS) is a subroutine in algorithms routinely employed in information retrieval, pattern recognition, data mining, image processing, and beyond. Recent works have established that graph-based ANNS algorithms are practically more efficient than the other methods proposed in the literature. The growing volume and dimensionality of data necessitates designing scalable techniques for ANNS. To this end, the prior art has explored parallelizing graph-based ANNS on GPU leveraging its massive parallelism. The current state-of-the-art GPU-based ANNS algorithms either (i) require both the dataset and the generated graph index to reside entirely in the GPU memory, or (ii) they partition the dataset into small independent shards, each of which can fit in GPU memory, and perform the search on these shards on the GPU. While the first approach fails to handle large datasets due to the limited memory available on the GPU, the latter delivers poor performance on large datasets due to high data traffic over the low-bandwidth PCIe bus. We introduce BANG, a first-of-its-kind technique for graph-based ANNS on GPU for billion-scale datasets that cannot entirely fit in the GPU memory. BANG stands out by harnessing a compressed form of the dataset on a single GPU to perform distance computations while efficiently accessing the graph index kept on the host memory, enabling efficient ANNS on large graphs within the limited GPU memory. BANG incorporates highly optimized GPU kernels and proceeds in phases that run concurrently on the GPU and CPU. Notably, on the billion-size datasets, we achieve throughputs 40x-200x more than the competing methods for a high recall value of 0.9. Additionally, BANG is the best in cost- and power-efficiency among the competing methods from the recent Billion-Scale Approximate Nearest Neighbour Search Challenge.

7/22/2024