Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

Read original: arXiv:2408.09439 - Published 8/20/2024 by Zeyuan Chen, Haiyan Wu, Kaixin Wu, Wei Chen, Mingjie Zhong, Jia Xu, Zhongyi Liu, Wei Zhang

Overview

This paper proposes a new approach called "Progressive Retrieved Behavior-augmented Prompting" to improve the relevance modeling capabilities of large language models (LLMs) in search engine and recommendation applications.
The key idea is to iteratively incorporate user behavior signals (e.g., clicks, dwell time) into the prompting process to better align the LLM's predictions with user preferences.
The authors conduct experiments on two real-world datasets and show that their approach outperforms standard LLM-based relevance modeling approaches.

Plain English Explanation

The paper explores ways to make large language models better at determining the relevance of content for search engines and recommendation systems. The core issue is that these models don't always understand what users truly care about or prefer.

To address this, the researchers developed a new technique called "Progressive Retrieved Behavior-augmented Prompting." The basic idea is to gradually incorporate signals about how users interact with content (e.g., how long they spend on a page, whether they click on it) into the prompts given to the language model. This helps the model learn to prioritize content that users find more relevant and useful.

Through experiments on real-world datasets, the authors show this approach can outperform standard language model-based relevance modeling methods. By taking the user's actual behavior into account, the model can make better predictions about what content a user is likely to find valuable.

Technical Explanation

The paper proposes a relevance modeling approach called "Progressive Retrieved Behavior-augmented Prompting" (ProRBP) to enhance the performance of large language models (LLMs) in search and recommendation tasks.

The key innovation is the iterative incorporation of user behavior signals, such as clicks and dwell time, into the prompting process. Specifically, the authors start with an initial prompt to the LLM, then progressively augment this prompt with behavioral information retrieved from the search/recommendation history. This allows the LLM to better align its relevance predictions with user preferences over successive iterations.
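The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the `Behavior` record, the word-overlap retrieval, the prompt wording, and the `stub_llm` stand-in are all hypothetical, and a real system would call an actual model and use a proper retriever over search logs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Behavior:
    """Hypothetical log record: a past query, the item clicked, and a click count."""
    query: str
    clicked_item: str
    clicks: int


def retrieve_behaviors(query: str, log: List[Behavior], k: int) -> List[Behavior]:
    """Toy retrieval (an assumption): rank log entries by word overlap, then clicks."""
    q_words = set(query.lower().split())
    scored = [(len(q_words & set(b.query.lower().split())), b.clicks, b) for b in log]
    scored = [s for s in scored if s[0] > 0]  # drop entries sharing no word with the query
    scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
    return [b for _, _, b in scored[:k]]


def build_prompt(query: str, item: str, behaviors: List[Behavior]) -> str:
    """Assemble a relevance prompt, optionally augmented with retrieved behaviors."""
    lines = [f"Query: {query}", f"Item: {item}"]
    if behaviors:
        lines.append("Users who issued similar queries clicked:")
        lines += [f"- {b.query!r} -> {b.clicked_item!r}" for b in behaviors]
    lines.append("Is the item relevant to the query? Answer yes or no.")
    return "\n".join(lines)


def progressive_relevance(query: str, item: str, log: List[Behavior],
                          llm: Callable[[str], str], rounds: int = 3) -> List[str]:
    """Query the LLM once per round, adding one more retrieved behavior each round."""
    retrieved = retrieve_behaviors(query, log, k=rounds - 1)
    answers = []
    for n in range(rounds):
        # Round 0 uses the plain prompt; round n adds the top-n behaviors.
        answers.append(llm(build_prompt(query, item, retrieved[:n])))
    return answers


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model: answers "yes" only when behavior evidence is present."""
    return "yes" if "-> " in prompt else "no"


demo_log = [
    Behavior("running shoes men", "trail runner X", 12),
    Behavior("laptop bag", "sleeve 13in", 5),
    Behavior("shoes for running", "trail runner X", 7),
]
demo_answers = progressive_relevance("running shoes", "trail runner X", demo_log, stub_llm)
```

Returning the per-round answers, rather than only the last one, leaves room for a later aggregation step; how the rounds should be combined is left open in this sketch.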

The authors evaluate their approach on two real-world datasets: a web search dataset and a product search dataset. They compare it against standard LLM-based relevance modeling techniques, as well as other prompt engineering methods such as APEER. The results show that it outperforms these baselines, suggesting that incorporating user behavior signals into the prompting process helps.

Critical Analysis

The paper presents a promising approach to improving the relevance modeling capabilities of LLMs, which is an important and practical problem. The iterative incorporation of user behavior data is a clever idea that allows the model to progressively learn what users find relevant.

However, the paper does not address some potential limitations and areas for further research:

The experiments are conducted on relatively narrow domains (web search, product search). It's unclear how well the approach would generalize to other types of search/recommendation tasks.
The paper does not explore the impact of different types of user behavior signals or how to best combine them in the prompting process.
There are open questions around the computational overhead and scalability of the approach, especially as the amount of behavioral data grows.

Additionally, while the results are promising, the paper could have provided more critical analysis of the tradeoffs and potential downsides of this approach. For example, how might the use of user behavior data raise privacy concerns, and how can these be addressed?

Conclusion

This paper presents a novel relevance modeling technique called "Progressive Retrieved Behavior-augmented Prompting" that aims to enhance the performance of large language models in search and recommendation tasks. By iteratively incorporating user behavior signals into the prompting process, the approach allows the LLM to better align its predictions with user preferences.

The experimental results demonstrate the potential of this approach, which outperforms standard LLM-based relevance modeling methods. However, several areas remain open for further research, such as the generalizability of the approach, the impact of different behavior signals, and its scalability and privacy implications.

Overall, this work offers a promising direction for improving the relevance modeling capabilities of LLMs, with potential applications in a variety of search and recommendation scenarios.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Towards Boosting LLMs-driven Relevance Modeling with Progressive Retrieved Behavior-augmented Prompting

Zeyuan Chen, Haiyan Wu, Kaixin Wu, Wei Chen, Mingjie Zhong, Jia Xu, Zhongyi Liu, Wei Zhang

Relevance modeling is a critical component for enhancing user experience in search engines, with the primary objective of identifying items that align with users' queries. Traditional models only rely on the semantic congruence between queries and items to ascertain relevance. However, this approach represents merely one aspect of the relevance judgement, and is insufficient in isolation. Even powerful Large Language Models (LLMs) still cannot accurately judge the relevance of a query and an item from a semantic perspective. To augment LLMs-driven relevance modeling, this study proposes leveraging user interactions recorded in search logs to yield insights into users' implicit search intentions. The challenge lies in the effective prompting of LLMs to capture dynamic search intentions, which poses several obstacles in real-world relevance scenarios, i.e., the absence of domain-specific knowledge, the inadequacy of an isolated prompt, and the prohibitive costs associated with deploying LLMs. In response, we propose ProRBP, a novel Progressive Retrieved Behavior-augmented Prompting framework for integrating search scenario-oriented knowledge with LLMs effectively. Specifically, we perform the user-driven behavior neighbors retrieval from the daily search logs to obtain domain-specific knowledge in time, retrieving candidates that users consider to meet their expectations. Then, we guide LLMs for relevance modeling by employing advanced prompting techniques that progressively improve the outputs of the LLMs, followed by a progressive aggregation with comprehensive consideration of diverse aspects. For online serving, we have developed an industrial application framework tailored for the deployment of LLMs in relevance modeling. Experiments on real-world industry data and online A/B testing demonstrate our proposal achieves promising performance.

8/20/2024

Robust Interaction-based Relevance Modeling for Online E-Commerce and LLM-based Retrieval

Ben Chen, Huangyu Dai, Xiang Ma, Wen Jiang, Wei Ning

Semantic relevance calculation is crucial for e-commerce search engines, as it ensures that the items selected closely align with customer intent. Inadequate attention to this aspect can detrimentally affect user experience and engagement. Traditional text-matching techniques are prevalent but often fail to capture the nuances of search intent accurately, so neural networks now have become a preferred solution to processing such complex text matching. Existing methods predominantly employ representation-based architectures, which strike a balance between high traffic capacity and low latency. However, they exhibit significant shortcomings in generalization and robustness when compared to interaction-based architectures. In this work, we introduce a robust interaction-based modeling paradigm to address these shortcomings. It encompasses 1) a dynamic length representation scheme for expedited inference, 2) a professional terms recognition method to identify subjects and core attributes from complex sentence structures, and 3) a contrastive adversarial training protocol to bolster the model's robustness and matching capabilities. Extensive offline evaluations demonstrate the superior robustness and effectiveness of our approach, and online A/B testing confirms its ability to improve relevance in the same exposure position, resulting in more clicks and conversions. To the best of our knowledge, this method is the first interaction-based approach for large e-commerce search relevance calculation. Notably, we have deployed it for the entire search traffic on alibaba.com, the largest B2B e-commerce platform in the world.

6/5/2024

APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

Can Jin, Hongwu Peng, Shiyu Zhao, Zhenting Wang, Wujiang Xu, Ligong Han, Jiahui Zhao, Kai Zhong, Sanguthevar Rajasekaran, Dimitris N. Metaxas

Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly reranking, underexplored. Directly applying current prompt engineering algorithms to relevance ranking is challenging due to the integration of query and long passage pairs in the input, where the ranking complexity surpasses classification tasks. To reduce human effort and unlock the potential of prompt optimization in reranking, we introduce a novel automatic prompt engineering algorithm named APEER. APEER iteratively generates refined prompts through feedback and preference optimization. Extensive experiments with four LLMs and ten datasets demonstrate the substantial performance improvement of APEER over existing state-of-the-art (SoTA) manual prompts. Furthermore, we find that the prompts generated by APEER exhibit better transferability across diverse tasks and LLMs. Code is available at https://github.com/jincan333/APEER.

6/21/2024

Prompt Recursive Search: A Living Framework with Adaptive Growth in LLM Auto-Prompting

Xiangyu Zhao, Chengqian Ma

Large Language Models (LLMs) exhibit remarkable proficiency in addressing a diverse array of tasks within the Natural Language Processing (NLP) domain, with various prompt design strategies significantly augmenting their capabilities. However, these prompts, while beneficial, each possess inherent limitations. The primary prompt design methodologies are twofold: The first, exemplified by the Chain of Thought (CoT), involves manually crafting prompts specific to individual datasets, hence termed Expert-Designed Prompts (EDPs). Once these prompts are established, they are unalterable, and their effectiveness is capped by the expertise of the human designers. When applied to LLMs, the static nature of EDPs results in a uniform approach to both simple and complex problems within the same dataset, leading to the inefficient use of tokens for straightforward issues. The second method involves prompts autonomously generated by the LLM, known as LLM-Derived Prompts (LDPs), which provide tailored solutions to specific problems, mitigating the limitations of EDPs. However, LDPs may encounter a decline in performance when tackling complex problems due to the potential for error accumulation during the solution planning process. To address these challenges, we have conceived a novel Prompt Recursive Search (PRS) framework that leverages the LLM to generate solutions specific to the problem, thereby conserving tokens. The framework incorporates an assessment of problem complexity and an adjustable structure, ensuring a reduction in the likelihood of errors. We have substantiated the efficacy of PRS framework through extensive experiments using LLMs with different numbers of parameters across a spectrum of datasets in various domains. Compared to the CoT method, the PRS method has increased the accuracy on the BBH dataset by 8% using Llama3-7B model, achieving a 22% improvement.

8/6/2024