Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

Read original: arXiv:2405.03267 - Published 5/8/2024 by Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

Overview

The paper explores the trade-off between performance and index size in billion-scale vector search, and proposes a solution using second-tier memory.
It characterizes the performance-index size dilemma and presents a novel approach to address it, breaking the traditional trade-off.
The proposed solution leverages second-tier memory, such as solid-state drives (SSDs), to achieve high performance while maintaining a compact index size.

Plain English Explanation

Searching through massive collections of data, like billions of images or documents, can be a big challenge. Researchers often have to choose between having a fast search system or a compact index that takes up less storage space. This can be a difficult dilemma, as faster searches usually require more storage, while smaller indexes are slower.

The researchers in this paper looked at this trade-off in detail and found a way to break it. They propose using a two-tier memory system, with a fast main memory for the most important data, and a slower but larger second-tier memory, like an SSD, for the rest. This allows them to maintain a compact index that fits in the main memory, while still providing fast search speeds by offloading less-used data to the second-tier.

By leveraging this two-tier approach, the researchers were able to achieve high-performance billion-scale vector search without the usual trade-off of ballooning index size. This could be very useful for applications that need to search through massive datasets, like image search or large-scale recommender systems, while still keeping the overall system size and cost manageable.

Technical Explanation

The paper begins by characterizing the fundamental dilemma between performance and index size in billion-scale vector search. The researchers explain how traditional index structures, such as product quantization or binary token indexes, face a trade-off between query speed and index size.

To address this challenge, the authors propose leveraging second-tier memory, such as solid-state drives (SSDs), to break the traditional performance-index size trade-off. Their key insight is that by offloading less-frequently accessed data to the second-tier memory, they can maintain a compact in-memory index while still providing high-performance query processing.

The paper presents the design and implementation of their two-tier memory system, which includes techniques for efficiently managing data placement and access across the main memory and second-tier storage. They also describe how their approach can be integrated with existing vector search systems to provide significant performance improvements without the need for a larger index.

Through extensive experiments on billion-scale datasets, the authors demonstrate the effectiveness of their solution. They show that their two-tier memory system can achieve query latencies comparable to in-memory indexes while maintaining a much smaller index size, ultimately breaking the traditional dilemma.

Critical Analysis

The paper presents a well-designed and thoroughly evaluated solution to the longstanding challenge of balancing performance and index size in billion-scale vector search. The authors' use of second-tier memory is a clever way to sidestep the traditional trade-off, and their techniques for data placement and access management seem robust.

However, one potential limitation of the approach is the reliance on second-tier storage, which may have higher latency and lower bandwidth than main memory. While the authors demonstrate that their system can still achieve high performance, there may be workloads or applications where the overhead of second-tier access is unacceptable. Additionally, the paper does not explore the energy or cost implications of the two-tier memory architecture, which could be an important consideration for some use cases.

Another area for further research could be exploring the integration of this technique with other advanced index structures, such as deep learning-based retrieval models or multi-vector dense retrieval. Combining the benefits of second-tier memory with these more sophisticated indexing approaches could lead to even more powerful and efficient billion-scale vector search systems.

Conclusion

This paper provides a significant advancement in the field of billion-scale vector search by presenting a novel solution to the long-standing dilemma of balancing performance and index size. By leveraging second-tier memory, the researchers have found a way to maintain a compact in-memory index while still delivering high-performance query processing.

The implications of this work are substantial, as it could enable a new generation of large-scale search and retrieval systems that are both efficient and scalable. This could have far-reaching impacts on a wide range of applications, from image search and recommendation engines to knowledge management and decision support systems.

Overall, the paper represents a compelling and innovative contribution to the field of vector search and large-scale data processing. The authors' insights and techniques could inspire further research and development in this important area of computer science.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

Characterizing the Dilemma of Performance and Index Size in Billion-Scale Vector Search and Breaking It with Second-Tier Memory

Rongxin Cheng, Yifan Peng, Xingda Wei, Hongrui Xie, Rong Chen, Sijie Shen, Haibo Chen

Vector searches on large-scale datasets are critical to modern online services like web search and RAG, which necessity storing the datasets and their index on the secondary storage like SSD. In this paper, we are the first to characterize the trade-off of performance and index size in existing SSD-based graph and cluster indexes: to improve throughput by 5.7$times$ and 1.7$times$, these indexes have to pay a 5.8$times$ storage amplification and 7.7$times$ with respect to the dataset size, respectively. The root cause is that the coarse-grained access of SSD mismatches the fine-grained random read required by vector indexes with small amplification. This paper argues that second-tier memory, such as remote DRAM/NVM connected via RDMA or CXL, is a powerful storage for addressing the problem from a system's perspective, thanks to its fine-grained access granularity. However, putting existing indexes -- primarily designed for SSD -- directly on second-tier memory cannot fully utilize its power. Meanwhile, second-tier memory still behaves more like storage, so using it as DRAM is also inefficient. To this end, we build a graph and cluster index that centers around the performance features of second-tier memory. With careful execution engine and index layout designs, we show that vector indexes can achieve optimal performance with orders of magnitude smaller index amplification, on a variety of second-tier memory devices. Based on our improved graph and vector indexes on second-tier memory, we further conduct a systematic study between them to facilitate developers choosing the right index for their workloads. Interestingly, the findings on the second-tier memory contradict the ones on SSDs.

5/8/2024

🤔

Semi-Parametric Retrieval via Binary Token Index

Jiawei Zhou, Li Dong, Furu Wei, Lei Chen

The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored. To address these demands, we introduce Semi-parametric Vocabulary Disentangled Retrieval (SVDR). SVDR is a novel semi-parametric retrieval framework that supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. In our evaluation on three open-domain question answering benchmarks with the entire Wikipedia as the retrieval corpus, SVDR consistently demonstrates superiority. It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and an 9% higher top-1 accuracy compared to BM25 when using a binary token index. Specifically, the adoption of a binary token index reduces index preparation time from 30 GPU hours to just 2 CPU hours and storage size from 31 GB to 2 GB, achieving a 90% reduction compared to an embedding-based index.

5/6/2024

🏷️

DEX: Scalable Range Indexing on Disaggregated Memory [Extended Version]

Baotong Lu, Kaisong Huang, Chieh-Jan Mike Liang, Tianzheng Wang, Eric Lo

Memory disaggregation can potentially allow memory-optimized range indexes such as B+-trees to scale beyond one machine while attaining high hardware utilization and low cost. Designing scalable indexes on disaggregated memory, however, is challenging due to rudimentary caching, unprincipled offloading and excessive inconsistency among servers. This paper proposes DEX, a new scalable B+-tree for memory disaggregation. DEX includes a set of techniques to reduce remote accesses, including logical partitioning, lightweight caching and cost-aware offloading. Our evaluation shows that DEX can outperform the state-of-the-art by 1.7--56.3X, and the advantage remains under various setups, such as cache size and skewness.

5/24/2024

STT-RAM-based Hierarchical In-Memory Computing

Dhruv Gajaria, Kevin Antony Gomez, Tosiron Adegbija

In-memory computing promises to overcome the von Neumann bottleneck in computer systems by performing computations directly within the memory. Previous research has suggested using Spin-Transfer Torque RAM (STT-RAM) for in-memory computing due to its non-volatility, low leakage power, high density, endurance, and commercial viability. This paper explores hierarchical in-memory computing, where different levels of the memory hierarchy are augmented with processing elements to optimize workload execution. The paper investigates processing in memory (PiM) using non-volatile STT-RAM and processing in cache (PiC) using volatile STT-RAM with relaxed retention, which helps mitigate STT-RAM's write latency and energy overheads. We analyze tradeoffs and overheads associated with data movement for PiC versus write overheads for PiM using STT-RAMs for various workloads. We examine workload characteristics, such as computational intensity and CPU-dependent workloads with limited instruction-level parallelism, and their impact on PiC/PiM tradeoffs. Using these workloads, we evaluate computing in STT-RAM versus SRAM at different cache hierarchy levels and explore the potential of heterogeneous STT-RAM cache architectures with various retention times for PiC and CPU-based computing. Our experiments reveal significant advantages of STT-RAM-based PiC over PiM for specific workloads. Finally, we describe open research problems in hierarchical in-memory computing architectures to further enhance this paradigm.

7/30/2024