Semi-Parametric Retrieval via Binary Token Index

2405.01924

Published 5/6/2024 by Jiawei Zhou, Li Dong, Furu Wei, Lei Chen

🤔

Abstract

The landscape of information retrieval has broadened from search services to a critical component in various advanced applications, where indexing efficiency, cost-effectiveness, and freshness are increasingly important yet remain less explored. To address these demands, we introduce Semi-parametric Vocabulary Disentangled Retrieval (SVDR). SVDR is a novel semi-parametric retrieval framework that supports two types of indexes: an embedding-based index for high effectiveness, akin to existing neural retrieval methods; and a binary token index that allows for quick and cost-effective setup, resembling traditional term-based retrieval. In our evaluation on three open-domain question answering benchmarks with the entire Wikipedia as the retrieval corpus, SVDR consistently demonstrates superiority. It achieves a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR when using an embedding-based index and an 9% higher top-1 accuracy compared to BM25 when using a binary token index. Specifically, the adoption of a binary token index reduces index preparation time from 30 GPU hours to just 2 CPU hours and storage size from 31 GB to 2 GB, achieving a 90% reduction compared to an embedding-based index.

Get summaries of the top AI research delivered straight to your inbox:

Overview

The paper introduces a novel semi-parametric retrieval framework called Semi-parametric Vocabulary Disentangled Retrieval (SVDR)
SVDR supports two types of indexes: an embedding-based index for high effectiveness, and a binary token index for quick and cost-effective setup
SVDR consistently outperforms existing methods on open-domain question answering benchmarks, achieving higher top-1 retrieval accuracy

Plain English Explanation

Information retrieval, the process of finding relevant information from a large dataset, has become a critical component in various advanced applications, such as question answering and multi-vector retrieval. In these applications, indexing efficiency, cost-effectiveness, and data freshness are increasingly important but remain less explored.

To address these demands, the researchers developed a new framework called SVDR. SVDR combines two types of indexes: an embedding-based index, similar to existing neural retrieval methods, and a binary token index, which is like the traditional term-based retrieval but more cost-effective.

The embedding-based index provides high retrieval effectiveness, while the binary token index allows for quick and cost-effective setup. By using these two indexes together, SVDR can achieve superior performance on open-domain question answering tasks compared to existing methods.

Technical Explanation

The researchers evaluated SVDR on three open-domain question answering benchmarks, using the entire Wikipedia as the retrieval corpus. SVDR consistently demonstrated superior performance compared to other methods.

When using an embedding-based index, SVDR achieved a 3% higher top-1 retrieval accuracy compared to the dense retriever DPR. And when using a binary token index, SVDR achieved a 9% higher top-1 accuracy compared to BM25, a traditional term-based retrieval method.

Importantly, the binary token index in SVDR significantly reduces the cost and time required for index preparation. Compared to the embedding-based index, the binary token index reduces the index preparation time from 30 GPU hours to just 2 CPU hours, and the storage size from 31 GB to 2 GB – a 90% reduction.

Critical Analysis

The paper presents a compelling solution to the growing demands of information retrieval in advanced applications. By introducing the SVDR framework with its two-pronged index approach, the researchers have addressed the key challenges of indexing efficiency, cost-effectiveness, and data freshness.

While the results are impressive, the paper does not discuss potential limitations or areas for further research. For example, it would be interesting to understand how SVDR performs on more diverse types of retrieval tasks beyond open-domain question answering, or how it compares to other emerging binary token representation techniques.

Additionally, the paper could have provided more insight into the trade-offs between the embedding-based and binary token indexes, and how users might choose between them depending on their specific needs and constraints.

Conclusion

The SVDR framework introduced in this paper represents a significant advancement in information retrieval, addressing critical challenges in indexing efficiency, cost-effectiveness, and data freshness. By combining an embedding-based index for high effectiveness and a binary token index for quick and cost-effective setup, SVDR demonstrates superior performance on open-domain question answering tasks.

The potential impact of this research extends beyond question answering, as the SVDR framework could be applied to a wide range of advanced applications that rely on efficient and cost-effective information retrieval. As the field of information retrieval continues to evolve, the insights and techniques presented in this paper will likely play an important role in shaping the future of this critical technology.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

💬

BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models

Qingqing Cao, Sewon Min, Yizhong Wang, Hannaneh Hajishirzi

Retrieval augmentation addresses many critical problems in large language models such as hallucination, staleness, and privacy leaks. However, running retrieval-augmented language models (LMs) is slow and difficult to scale due to processing large amounts of retrieved text. We introduce binary token representations (BTR), which use 1-bit vectors to precompute every token in passages, significantly reducing computation during inference. Despite the potential loss of accuracy, our new calibration techniques and training objectives restore performance. Combined with offline and runtime compression, this only requires 127GB of disk space for encoding 3 billion tokens in Wikipedia. Our experiments show that on five knowledge-intensive NLP tasks, BTR accelerates state-of-the-art inference by up to 4x and reduces storage by over 100x while maintaining over 95% task performance.

5/6/2024

cs.CL cs.AI

🚀

Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations

Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini

Learned sparse representations form an attractive class of contextual embeddings for text retrieval. That is so because they are effective models of relevance and are interpretable by design. Despite their apparent compatibility with inverted indexes, however, retrieval over sparse embeddings remains challenging. That is due to the distributional differences between learned embeddings and term frequency-based lexical models of relevance such as BM25. Recognizing this challenge, a great deal of research has gone into, among other things, designing retrieval algorithms tailored to the properties of learned sparse representations, including approximate retrieval systems. In fact, this task featured prominently in the latest BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on a large benchmark dataset by throughput and recall. In this work, we propose a novel organization of the inverted index that enables fast yet effective approximate retrieval over learned sparse embeddings. Our approach organizes inverted lists into geometrically-cohesive blocks, each equipped with a summary vector. During query processing, we quickly determine if a block must be evaluated using the summaries. As we show experimentally, single-threaded query processing using our method, Seismic, reaches sub-millisecond per-query latency on various sparse embeddings of the MS MARCO dataset while maintaining high recall. Our results indicate that Seismic is one to two orders of magnitude faster than state-of-the-art inverted index-based solutions and further outperforms the winning (graph-based) submissions to the BigANN Challenge by a significant margin.

4/30/2024

cs.IR

🔎

Rethinking the Role of Token Retrieval in Multi-Vector Retrieval

Jinhyuk Lee, Zhuyun Dai, Sai Meher Karthik Duddu, Tao Lei, Iftekhar Naim, Ming-Wei Chang, Vincent Y. Zhao

Multi-vector retrieval models such as ColBERT [Khattab and Zaharia, 2020] allow token-level interactions between queries and documents, and hence achieve state of the art on many information retrieval benchmarks. However, their non-linear scoring function cannot be scaled to millions of documents, necessitating a three-stage process for inference: retrieving initial candidates via token retrieval, accessing all token vectors, and scoring the initial candidate documents. The non-linear scoring function is applied over all token vectors of each candidate document, making the inference process complicated and slow. In this paper, we aim to simplify the multi-vector retrieval by rethinking the role of token retrieval. We present XTR, ConteXtualized Token Retriever, which introduces a simple, yet novel, objective function that encourages the model to retrieve the most important document tokens first. The improvement to token retrieval allows XTR to rank candidates only using the retrieved tokens rather than all tokens in the document, and enables a newly designed scoring stage that is two-to-three orders of magnitude cheaper than that of ColBERT. On the popular BEIR benchmark, XTR advances the state-of-the-art by 2.8 nDCG@10 without any distillation. Detailed analysis confirms our decision to revisit the token retrieval stage, as XTR demonstrates much better recall of the token retrieval stage compared to ColBERT.

4/10/2024

cs.CL cs.IR

Event-enhanced Retrieval in Real-time Search

Yanan Zhang, Xiaoling Bai, Tianhua Zhou

The embedding-based retrieval (EBR) approach is widely used in mainstream search engine retrieval systems and is crucial in recent retrieval-augmented methods for eliminating LLM illusions. However, existing EBR models often face the semantic drift problem and insufficient focus on key information, leading to a low adoption rate of retrieval results in subsequent steps. This issue is especially noticeable in real-time search scenarios, where the various expressions of popular events on the Internet make real-time retrieval heavily reliant on crucial event information. To tackle this problem, this paper proposes a novel approach called EER, which enhances real-time retrieval performance by improving the dual-encoder model of traditional EBR. We incorporate contrastive learning to accompany pairwise learning for encoder optimization. Furthermore, to strengthen the focus on critical event information in events, we include a decoder module after the document encoder, introduce a generative event triplet extraction scheme based on prompt-tuning, and correlate the events with query encoder optimization through comparative learning. This decoder module can be removed during inference. Extensive experiments demonstrate that EER can significantly improve the real-time search retrieval performance. We believe that this approach will provide new perspectives in the field of information retrieval. The codes and dataset are available at https://github.com/open-event-hub/Event-enhanced_Retrieval .

4/10/2024

cs.CL cs.IR