Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

Read original: arXiv:2404.17196 - Published 4/29/2024 by Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, Yu Jiang

Overview

Researchers have developed advanced frameworks that allow large language models (LLMs) to augment their knowledge with external content using a technique called retrieval augmented generation (RAG).
However, these frameworks do not adequately consider the risk of external content, leaving LLM-powered applications vulnerable to attacks.
This paper introduces a new threat called "retrieval poisoning," where attackers can guide LLM-powered applications to generate malicious responses by crafting documents that appear benign but mislead the RAG process.

Plain English Explanation

Large language models (LLMs) like GPT-3 have become increasingly powerful at tasks like generating text, answering questions, and even engaging in conversations. To make these LLMs even more useful, researchers have developed frameworks that allow them to augment their knowledge with additional information from external sources using a technique called retrieval augmented generation (RAG).

The RAG process works by having the application first retrieve relevant information from a collection of documents and then pass it to the LLM, which uses it to generate a response. However, the paper reveals that the current design of these frameworks doesn't do enough to protect against a new kind of attack called "retrieval poisoning."

In this attack, malicious actors craft documents that look the same as benign ones to a human reader, and whose visible information can even be correct. Once such a document is used as a reference source in the RAG process, however, the application is misled into producing incorrect or even harmful responses.

The researchers' preliminary experiments showed that attackers can mislead LLMs in this way about 88% of the time, and achieved a 67% success rate against a real-world application. This suggests the attack could have serious impact if carried out in practice.

Technical Explanation

The paper examines the vulnerability of LLM-powered applications that use retrieval augmented generation (RAG) to extend the LLM's knowledge. In the RAG process, the application first retrieves relevant content from a set of external documents and then supplies it to the LLM as reference material for generating the final response.
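The retrieve-then-generate flow described above can be sketched in a few lines. This is a toy illustration, not the pipeline of any particular framework or of the paper: the bag-of-words retriever, the prompt wording, the sample documents, and the function names (`retrieve`, `build_prompt`) are all assumptions made for the example, and the final LLM call is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (hypothetical retriever)."""
    query_vec = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda doc: cosine_similarity(query_vec, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, retrieved):
    """Place the retrieved text in the prompt; the LLM receives it verbatim."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return (
        "Answer using the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )


# Illustrative corpus and query (made up for this sketch).
documents = [
    "The package can be installed with the official installer.",
    "Bananas are rich in potassium.",
]
query = "How do I install the package?"
prompt = build_prompt(query, retrieve(query, documents))
print(prompt)
```

The point of the sketch is the last step: whatever text the retriever returns is pasted into the prompt as trusted reference material, which is why the content of external documents matters so much.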

The researchers analyzed the design of existing LLM application frameworks and found that they do not adequately consider the risks of the external content used in the RAG process. This leaves applications built on them vulnerable to a new attack called "retrieval poisoning."

By analyzing how these frameworks process documents, attackers can craft documents that are visually indistinguishable from benign ones. The information a human sees in the document remains correct, but when the document is used as a reference source for RAG, the LLM-powered application is misled into generating incorrect or even malicious responses.
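To make "visually indistinguishable" concrete, here is one hypothetical way rendered content and extracted content can diverge. The summary does not describe the authors' actual crafting technique, so this sketch is an assumption for illustration only: it uses an HTML page with a CSS-hidden element and a deliberately naive extractor (`NaiveTextExtractor`, a made-up class built on Python's standard `html.parser`) that ignores styling, and the hidden string is a harmless placeholder.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collects every text node and ignores CSS, like a simple document loader."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Keep any non-empty text, whether or not a browser would display it.
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return all text found in the HTML, joined by single spaces."""
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A browser would render only the first sentence; the span is hidden by CSS.
page = (
    "<p>Install the tool from the official site.</p>"
    '<span style="display:none">PLACEHOLDER HIDDEN TEXT</span>'
)

extracted = extract_text(page)
print(extracted)
```

A human reviewing the rendered page would see only the first sentence, while the extracted string that would be indexed for retrieval also contains the hidden text. Comparing what is rendered with what is extracted is one place where such a gap could be noticed.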

In their preliminary experiments, the researchers misled the LLMs with an 88.33% success rate. They also tested the attack against a real-world application and achieved a 66.67% success rate, demonstrating the potential impact of this threat.

Critical Analysis

The paper raises important concerns about the security implications of using retrieval augmented generation (RAG) in LLM-powered applications. While the RAG technique can enhance the capabilities of these applications, the researchers have identified a significant vulnerability that could be exploited by malicious actors.

One limitation of the research is that it primarily focuses on the technical aspects of the attack, without delving deeply into the potential real-world consequences or the ethical considerations around the use of large language models. Beyond the single real-world application test, the paper says little about how such attacks would play out at scale, and it does not discuss the potential societal impact if they were to occur.

Additionally, the paper does not provide clear recommendations for how LLM application frameworks can be designed to better mitigate the risks of retrieval poisoning. While the researchers have identified the problem, more work is needed to develop effective solutions that can be implemented by developers and researchers working in this field.

Overall, the paper highlights an important security concern that deserves further attention and research. Continued exploration of this issue and the development of robust countermeasures will be crucial as LLM-powered applications become more prevalent in our daily lives.

Conclusion

This paper introduces a new threat to LLM-powered applications called "retrieval poisoning," where attackers can craft seemingly benign documents that mislead the application's retrieval augmented generation (RAG) process, causing it to generate incorrect or even malicious responses.

The researchers' preliminary experiments demonstrate the potential impact of this attack, with an 88.33% success rate against LLMs and a 66.67% success rate in a real-world application. This highlights the need for LLM application frameworks to better consider the risks of external content used in the RAG process, and to develop more robust security measures to protect against such attacks.

As LLMs continue to play an increasingly important role in our digital lives, addressing vulnerabilities like retrieval poisoning will be crucial to ensuring the safety and reliability of the applications powered by these powerful language models.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Human-Imperceptible Retrieval Poisoning Attacks in LLM-Powered Applications

Quan Zhang, Binqi Zeng, Chijin Zhou, Gwihwan Go, Heyuan Shi, Yu Jiang

Presently, with the assistance of advanced LLM application development frameworks, more and more LLM-powered applications can effortlessly augment the LLMs' knowledge with external content using the retrieval augmented generation (RAG) technique. However, these frameworks' designs do not have sufficient consideration of the risk of external content, thereby allowing attackers to undermine the applications developed with these frameworks. In this paper, we reveal a new threat to LLM-powered applications, termed retrieval poisoning, where attackers can guide the application to yield malicious responses during the RAG process. Specifically, through the analysis of LLM application frameworks, attackers can craft documents visually indistinguishable from benign ones. Despite the documents providing correct information, once they are used as reference sources for RAG, the application is misled into generating incorrect responses. Our preliminary experiments indicate that attackers can mislead LLMs with an 88.33% success rate, and achieve a 66.67% success rate in the real-world application, demonstrating the potential impact of retrieval poisoning.

4/29/2024

On the Vulnerability of Applying Retrieval-Augmented Generation within Knowledge-Intensive Application Domains

Xun Xian, Ganghua Wang, Xuan Bi, Jayanth Srinivasa, Ashish Kundu, Charles Fleming, Mingyi Hong, Jie Ding

Retrieval-Augmented Generation (RAG) has been empirically shown to enhance the performance of large language models (LLMs) in knowledge-intensive domains such as healthcare, finance, and legal contexts. Given a query, RAG retrieves relevant documents from a corpus and integrates them into the LLMs' generation process. In this study, we investigate the adversarial robustness of RAG, focusing specifically on examining the retrieval system. First, across 225 different setup combinations of corpus, retriever, query, and targeted information, we show that retrieval systems are vulnerable to universal poisoning attacks in medical Q&A. In such attacks, adversaries generate poisoned documents containing a broad spectrum of targeted information, such as personally identifiable information. When these poisoned documents are inserted into a corpus, they can be accurately retrieved by any users, as long as attacker-specified queries are used. To understand this vulnerability, we discovered that the deviation from the query's embedding to that of the poisoned document tends to follow a pattern in which the high similarity between the poisoned document and the query is retained, thereby enabling precise retrieval. Based on these findings, we develop a new detection-based defense to ensure the safe use of RAG. Through extensive experiments spanning various Q&A domains, we observed that our proposed method consistently achieves excellent detection rates in nearly all cases.

9/27/2024

PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

Wei Zou, Runpeng Geng, Binghui Wang, Jinyuan Jia

Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. The key idea of RAG is to ground the answer generation of an LLM on external knowledge retrieved from a knowledge database. Existing studies mainly focus on improving the accuracy or efficiency of RAG, leaving its security largely unexplored. We aim to bridge the gap in this work. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG, where an attacker could inject a few malicious texts into the knowledge database of a RAG system to induce an LLM to generate an attacker-chosen target answer for an attacker-chosen target question. We formulate knowledge corruption attacks as an optimization problem, whose solution is a set of malicious texts. Depending on the background knowledge (e.g., black-box and white-box settings) of an attacker on a RAG system, we propose two solutions to solve the optimization problem, respectively. Our results show PoisonedRAG could achieve a 90% attack success rate when injecting five malicious texts for each target question into a knowledge database with millions of texts. We also evaluate several defenses and our results show they are insufficient to defend against PoisonedRAG, highlighting the need for new defenses.

8/14/2024

BadRAG: Identifying Vulnerabilities in Retrieval Augmented Generation of Large Language Models

Jiaqi Xue, Mengxin Zheng, Yebowen Hu, Fei Liu, Xun Chen, Qian Lou

Large Language Models (LLMs) are constrained by outdated information and a tendency to generate incorrect data, commonly referred to as hallucinations. Retrieval-Augmented Generation (RAG) addresses these limitations by combining the strengths of retrieval-based methods and generative models. This approach involves retrieving relevant information from a large, up-to-date dataset and using it to enhance the generation process, leading to more accurate and contextually appropriate responses. Despite its benefits, RAG introduces a new attack surface for LLMs, particularly because RAG databases are often sourced from public data, such as the web. In this paper, we propose TrojRAG to identify the vulnerabilities and attacks on retrieval parts (RAG database) and their indirect attacks on generative parts (LLMs). Specifically, we identify that poisoning several customized content passages could achieve a retrieval backdoor, where the retrieval works well for clean queries but always returns customized poisoned adversarial queries. Triggers and poisoned passages can be highly customized to implement various attacks. For example, a trigger could be a semantic group like The Republican Party, Donald Trump, etc. Adversarial passages can be tailored to different contents, not only linked to the triggers but also used to indirectly attack generative LLMs without modifying them. These attacks can include denial-of-service attacks on RAG and semantic steering attacks on LLM generations conditioned by the triggers. Our experiments demonstrate that by just poisoning 10 adversarial passages can induce 98.2% success rate to retrieve the adversarial passages. Then, these passages can increase the reject ratio of RAG-based GPT-4 from 0.01% to 74.6% or increase the rate of negative responses from 0.22% to 72% for targeted queries.

6/7/2024