Exploiting Positional Bias for Query-Agnostic Generative Content in Search

2405.00469

Published 5/2/2024 by Andrew Parry, Sean MacAvaney, Debasis Ganguly

Abstract

In recent years, neural ranking models (NRMs) have been shown to substantially outperform their lexical counterparts in text retrieval. In traditional search pipelines, a combination of features leads to well-defined behaviour. However, as neural approaches become increasingly prevalent as the final scoring component of engines or as standalone systems, their robustness to malicious text and, more generally, semantic perturbation needs to be better understood. We posit that the transformer attention mechanism can induce exploitable defects through positional bias in search models, leading to an attack that could generalise beyond a single query or topic. We demonstrate such defects by showing that non-relevant text--such as promotional content--can be easily injected into a document without adversely affecting its position in search results. Unlike previous gradient-based attacks, we demonstrate these biases in a query-agnostic fashion. In doing so, without the knowledge of topicality, we can still reduce the negative effects of non-relevant content injection by controlling injection position. Our experiments are conducted with simulated on-topic promotional text automatically generated by prompting LLMs with topical context from target documents. We find that contextualisation of a non-relevant text further reduces negative effects whilst likely circumventing existing content filtering mechanisms. In contrast, lexical models are found to be more resilient to such content injection attacks. We then investigate a simple yet effective compensation for the weaknesses of the NRMs in search, validating our hypotheses regarding transformer bias.

Overview

Neural ranking models (NRMs) have been shown to outperform traditional lexical models in text retrieval
However, as NRMs become more prevalent, their robustness to malicious text and semantic perturbation needs to be better understood
The authors posit that the transformer attention mechanism can induce exploitable defects through positional bias in search models, leading to an attack that could generalize beyond a single query or topic
They demonstrate how non-relevant text, such as promotional content, can be easily injected into a document without adversely affecting its position in search results
In contrast, lexical models are found to be more resilient to such content injection attacks

Plain English Explanation

Neural ranking models (NRMs) are a type of search algorithm that has been shown to outperform traditional keyword-based search methods. As NRMs become more widely used, it's important to understand how they can be manipulated, for example by someone inserting irrelevant content into a document without that document losing its place in the rankings.

The researchers in this study found that the way NRMs use attention mechanisms can make them susceptible to an attack in which non-relevant text, like promotional content, is injected into a document without hurting its position in the search results. Unlike previous attacks, it does not need to be tailored to a particular query, so it can work across multiple queries and topics.

To demonstrate this, the researchers used language models to automatically generate promotional text, which was then injected into target documents. They found that when the promotional text was written to match the topic of the document, the harm to the document's ranking shrank further, and such text is also likely to slip past existing content filters. In contrast, the more traditional lexical search models were more resilient to this type of content injection attack.

The researchers suggest a simple way to help compensate for these weaknesses in NRMs, which could be an important step in making these powerful search algorithms more robust and secure.

Technical Explanation

The researchers in this study investigate the vulnerability of neural ranking models (NRMs) to malicious text injection attacks. They posit that the transformer attention mechanism used in many NRMs can induce exploitable defects through positional bias, leading to an attack that could generalize beyond a single query or topic.

To demonstrate this, the authors conduct experiments where they inject simulated on-topic promotional text automatically generated by prompting large language models (LLMs) with topical context from target documents. They find that this non-relevant text can be easily inserted into a document without adversely affecting its position in search results.
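One way to make "without adversely affecting its position" concrete is to compare a document's rank before and after injection against a fixed set of competing documents. The sketch below is illustrative only: `rank_of`, `rank_change`, and the toy `density_score` scorer are hypothetical names introduced here, not the paper's metric or models, and the example texts are made up. In a real experiment the scorer would be an NRM or a lexical model.

```python
import re
from typing import Callable, Sequence

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def density_score(query: str, document: str) -> float:
    """Toy stand-in for a ranker: fraction of document tokens that are query terms."""
    terms = set(tokenize(query))
    tokens = tokenize(document)
    return sum(t in terms for t in tokens) / len(tokens) if tokens else 0.0


def rank_of(target_score: float, other_scores: Sequence[float]) -> int:
    """1-based rank: one plus the number of competitors scoring strictly higher."""
    return 1 + sum(s > target_score for s in other_scores)


def rank_change(score: Scorer, query: str, original: str, injected: str,
                competitors: Sequence[str]) -> int:
    """Positive values mean the injected document dropped in the ranking."""
    other_scores = [score(query, d) for d in competitors]
    before = rank_of(score(query, original), other_scores)
    after = rank_of(score(query, injected), other_scores)
    return after - before


query = "solar panels"
original = "solar panels convert sunlight"
injected = original + " buy acme gadgets now"  # made-up promotional span
competitors = ["solar energy is growing fast",
               "panels and solar cells for every roof"]
change = rank_change(density_score, query, original, injected, competitors)
print(change)  # prints 1: the injected copy falls from rank 1 to rank 2
```

Under this toy scorer the injection costs the document one rank. The paper's claim is that, for NRMs, the equivalent change can be kept small.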

Unlike previous gradient-based attacks, the researchers show that these biases can be exploited in a query-agnostic fashion. This means that without knowledge of the specific topic, they can still reduce the negative effects of non-relevant content injection by controlling the position of the injected text.
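Controlling the injection position needs no knowledge of the query, which is what makes the setting query-agnostic. A minimal sketch of the mechanics follows, assuming sentence-level placement at the start, middle, or end of a document. The `inject` helper, the naive sentence splitter, and the three position labels are assumptions made for illustration; the summary does not specify the paper's exact procedure or which position is least harmful.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def inject(document: str, span: str, position: str = "end") -> str:
    """Insert `span` as a new sentence at the start, middle, or end of `document`."""
    sentences = split_sentences(document)
    if position == "start":
        index = 0
    elif position == "middle":
        index = len(sentences) // 2
    elif position == "end":
        index = len(sentences)
    else:
        raise ValueError(f"unknown position: {position!r}")
    return " ".join(sentences[:index] + [span] + sentences[index:])


doc = "A one. B two. C three. D four."
promo = "BUY NOW."  # made-up non-relevant span
for where in ("start", "middle", "end"):
    print(where, "->", inject(doc, promo, where))
```

Each variant could then be scored by the ranker under test to see how the document's rank depends on where the span sits.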

Interestingly, the researchers find that contextualizing the non-relevant text further reduces its negative effects while likely circumventing existing content filtering mechanisms. In contrast, they discover that lexical models are more resilient to such content injection attacks.
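One property of lexical scorers is relevant here: BM25-style models treat a document as a bag of words, so the position of an injected span cannot change the score. Only the added length and any added term matches matter. The sketch below shows this with a simplified BM25 in which the IDF of every term is fixed to 1 and the average document length is an assumed constant; both are simplifications for illustration, not the paper's experimental setup, and the example texts are made up.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_score(query: str, document: str, avg_len: float,
               k1: float = 1.2, b: float = 0.75) -> float:
    """Simplified BM25: IDF is fixed to 1 for every term (an assumption)."""
    tokens = tokenize(document)
    tf = Counter(tokens)
    # Length normalisation: longer documents get a larger denominator.
    norm = k1 * (1 - b + b * len(tokens) / avg_len)
    return sum(tf[t] * (k1 + 1) / (tf[t] + norm)
               for t in sorted(set(tokenize(query))))


query = "neural rankers"
doc = ("Neural rankers score passages with transformers. "
       "They outperform lexical models.")
promo = "Try AcmeVPN today for safe browsing."  # made-up non-relevant span
avg_len = 12.0  # assumed average document length in tokens

original_score = bm25_score(query, doc, avg_len)
start_score = bm25_score(query, promo + " " + doc, avg_len)
end_score = bm25_score(query, doc + " " + promo, avg_len)

print(start_score == end_score)      # position makes no difference
print(start_score < original_score)  # the added length lowers the score
```

Because the penalty is the same at every position, an attacker has no positional lever to pull against a scorer of this kind.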

Critical Analysis

The researchers provide a compelling demonstration of how positional bias in the attention mechanisms of NRMs can be exploited to inject non-relevant content into documents that still rank well in search results. This is an important finding, as it highlights a vulnerability in these powerful models that could be leveraged by bad actors.

While the researchers mention that their attack can generalize beyond a single query or topic, it would be helpful to see more extensive testing across a wider range of queries and domains to fully understand the scope of this issue. Additionally, the use of automatically generated promotional text, while effective for the purposes of the study, may not fully capture the nuances of real-world adversarial content.

It's also worth considering the potential implications of this research beyond search engines. As NRMs become more widely adopted in other domains, such as recommendation systems or text classification, similar vulnerabilities could arise and need to be addressed.

Overall, this study provides valuable insights into the potential pitfalls of relying too heavily on NRMs without a thorough understanding of their limitations and weaknesses. The proposed compensation approach is a promising step, but further research and development may be necessary to ensure the robustness of these models in the face of evolving adversarial threats.

Conclusion

This paper highlights a concerning vulnerability in neural ranking models (NRMs): the attention mechanism used in these models can be exploited to inject non-relevant content into documents without adversely affecting their ranking in search results. The researchers demonstrate this issue through experiments using automatically generated promotional text, and find that lexical models are more resilient to such content injection attacks.

The implications of this research extend beyond search engines, as NRMs are increasingly being adopted in a variety of applications. Understanding and addressing the potential weaknesses of these models will be crucial to ensuring their responsible and trustworthy deployment in high-stakes domains. The proposed compensation approach is a step in the right direction, but further work is needed to fully mitigate these types of adversarial attacks and ensure the long-term reliability of neural ranking systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Technical Report: Impact of Position Bias on Language Models in Token Classification

Mehdi Ben Amor, Michael Granitzer, Jelena Mitrović

Language Models (LMs) have shown state-of-the-art performance in Natural Language Processing (NLP) tasks. Downstream tasks such as Named Entity Recognition (NER) or Part-of-Speech (POS) tagging are known to suffer from data imbalance issues, particularly regarding the ratio of positive to negative examples and class disparities. This paper investigates an often-overlooked issue of encoder models, specifically the position bias of positive examples in token classification tasks. For completeness, we also include decoders in the evaluation. We evaluate the impact of position bias using different position embedding techniques, focusing on BERT with Absolute Position Embedding (APE), Relative Position Embedding (RPE), and Rotary Position Embedding (RoPE). Therefore, we conduct an in-depth evaluation of the impact of position bias on the performance of LMs when fine-tuned on token classification benchmarks. Our study includes CoNLL03 and OntoNote5.0 for NER, English Tree Bank UD_en, and TweeBank for POS tagging. We propose an evaluation approach to investigate position bias in transformer models. We show that LMs can suffer from this bias with an average drop ranging from 3% to 9% in their performance. To mitigate this effect, we propose two methods: Random Position Shifting and Context Perturbation, that we apply on batches during the training process. The results show an improvement of approximately 2% in the performance of the model on CoNLL03, UD_en, and TweeBank.

4/12/2024

cs.CL cs.AI

Mitigate Position Bias in Large Language Models via Scaling a Single Dimension

Yijiong Yu, Huiqiang Jiang, Xufang Luo, Qianhui Wu, Chin-Yew Lin, Dongsheng Li, Yuqing Yang, Yongfeng Huang, Lili Qiu

Large Language Models (LLMs) are increasingly applied in various real-world scenarios due to their excellent generalization capabilities and robust generative abilities. However, they exhibit position bias, also known as lost in the middle, a phenomenon that is especially pronounced in long-context scenarios, which indicates the placement of the key information in different positions of a prompt can significantly affect accuracy. This paper first explores the micro-level manifestations of position bias, concluding that attention weights are a micro-level expression of position bias. It further identifies that, in addition to position embeddings, causal attention mask also contributes to position bias by creating position-specific hidden states. Based on these insights, we propose a method to mitigate position bias by scaling these positional hidden states. Experiments on the NaturalQuestions Multi-document QA, KV retrieval, LongBench and timeline reorder tasks, using various models including RoPE models, context-window-extended models, and Alibi models, demonstrate the effectiveness and generalizability of our approach. Our method can improve performance by up to 15.2% by modifying just one dimension of hidden states. Our code is available at https://aka.ms/PositionalHidden.

6/5/2024

cs.CL cs.LG

Ranking Manipulation for Conversational Search Engines

Samuel Pfrommer, Yatong Bai, Tanmay Gautam, Somayeh Sojoudi

Major search engine providers are rapidly incorporating Large Language Model (LLM)-generated content in response to user queries. These conversational search engines operate by loading retrieved website text into the LLM context for summarization and interpretation. Recent research demonstrates that LLMs are highly vulnerable to jailbreaking and prompt injection attacks, which disrupt the safety and quality goals of LLMs using adversarial strings. This work investigates the impact of prompt injections on the ranking order of sources referenced by conversational search engines. To this end, we introduce a focused dataset of real-world consumer product websites and formalize conversational search ranking as an adversarial problem. Experimentally, we analyze conversational search rankings in the absence of adversarial injections and show that different LLMs vary significantly in prioritizing product name, document content, and context position. We then present a tree-of-attacks-based jailbreaking technique which reliably promotes low-ranked products. Importantly, these attacks transfer effectively to state-of-the-art conversational search engines such as perplexity.ai. Given the strong financial incentive for website owners to boost their search ranking, we argue that our problem formulation is of critical importance for future robustness work.

6/14/2024

cs.CL

Where is the answer? Investigating Positional Bias in Language Model Knowledge Extraction

Kuniaki Saito, Kihyuk Sohn, Chen-Yu Lee, Yoshitaka Ushiku

Large language models require updates to remain up-to-date or adapt to new domains by fine-tuning them with new documents. One key is memorizing the latest information in a way that the memorized information is extractable with a query prompt. However, LLMs suffer from a phenomenon called perplexity curse; despite minimizing document perplexity during fine-tuning, LLMs struggle to extract information through a prompt sentence. In this new knowledge acquisition and extraction, we find a very intriguing fact that LLMs can accurately answer questions about the first sentence, but they struggle to extract information described in the middle or end of the documents used for fine-tuning. Our study suggests that the auto-regressive training causes this issue; each token is prompted by reliance on all previous tokens, which hinders the model from recalling information from training documents by question prompts. To conduct the in-depth study, we publish both synthetic and real datasets, enabling the evaluation of the QA performance w.r.t. the position of the corresponding answer in a document. Our investigation shows that even a large model suffers from the perplexity curse, but regularization such as denoising auto-regressive loss can enhance the information extraction from diverse positions. These findings will be (i) a key to improving knowledge extraction from LLMs and (ii) new elements to discuss the trade-off between RAG and fine-tuning in adapting LLMs to a new domain.

5/24/2024

cs.CL cs.AI