KGPA: Robustness Evaluation for Large Language Models via Cross-Domain Knowledge Graphs

2406.10802

Published 6/18/2024 by Aihua Pei (Waseda University), Zehua Yang (Waseda University), Shunan Zhu (Waseda University), Ruoxi Cheng (Southeast University), Ju Jia (Southeast University), Lina Wang (Wuhan University)

cs.CL cs.AI

Abstract

Existing frameworks for assessing robustness of large language models (LLMs) overly depend on specific benchmarks, increasing costs and failing to evaluate performance of LLMs in professional domains due to dataset limitations. This paper proposes a framework that systematically evaluates the robustness of LLMs under adversarial attack scenarios by leveraging knowledge graphs (KGs). Our framework generates original prompts from the triplets of knowledge graphs and creates adversarial prompts by poisoning, assessing the robustness of LLMs through the results of these adversarial attacks. We systematically evaluate the effectiveness of this framework and its modules. Experiments show that adversarial robustness of the ChatGPT family ranks as GPT-4-turbo > GPT-4o > GPT-3.5-turbo, and the robustness of large language models is influenced by the professional domains in which they operate.

Overview

• This paper proposes KGPA, a framework for evaluating the robustness of large language models (LLMs) under adversarial attack scenarios.

• KGPA generates original prompts from the triplets of cross-domain knowledge graphs and poisons them to create adversarial prompts, so the evaluation does not depend on a fixed benchmark dataset and can extend to professional domains.

• The paper evaluates the framework and its modules, finding that adversarial robustness within the ChatGPT family ranks as GPT-4-turbo > GPT-4o > GPT-3.5-turbo and that robustness varies with the professional domain.

Plain English Explanation

The paper focuses on evaluating the robustness of large language models (LLMs) - powerful AI systems that can generate human-like text. LLMs have shown impressive capabilities, but they can also be vulnerable to adversarial attacks, where small, carefully crafted changes to the input can cause the model to make mistakes.

The researchers developed a new approach called KGPA that uses knowledge graphs - structured representations of information - to find vulnerabilities in LLMs. Knowledge graphs contain a wealth of information about the world, and the researchers used this to create adversarial inputs that could challenge the models in unexpected ways.

For example, a fact stored in a knowledge graph can be turned into a prompt, and that prompt can then be deliberately altered ("poisoned") to see whether the model still responds correctly. Because the test prompts come from knowledge graphs rather than a fixed benchmark, the same procedure can be applied to specialized professional domains.

The paper's experiments show that, within the ChatGPT family, GPT-4-turbo is the most robust to these attacks, followed by GPT-4o and then GPT-3.5-turbo, and that a model's robustness depends on the professional domain in which it operates. This work is important for improving the reliability and safety of these powerful language models as they become more widely used.

Technical Explanation

The key idea behind KGPA is to leverage the rich information contained in cross-domain knowledge graphs to construct adversarial inputs that can challenge the capabilities of LLMs. Knowledge graphs represent entities (e.g., people, places, concepts) and the relationships between them, providing a structured way to model real-world knowledge.
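To make the idea of prompts built from knowledge graph triplets concrete, here is a minimal Python sketch. It is illustrative only: the `Triplet` class, the relation names, the `TEMPLATES` table, and the true/false question wording are all assumptions made for this example, not the templates or prompt format used in the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One knowledge graph fact: (subject, relation, object)."""
    subject: str
    relation: str
    obj: str


# Hypothetical sentence templates keyed by relation name (not from the paper).
TEMPLATES = {
    "capital_of": "{subject} is the capital of {obj}.",
    "born_in": "{subject} was born in {obj}.",
}


def triplet_to_statement(t: Triplet) -> str:
    """Render a triplet as a plain sentence, with a generic fallback template."""
    template = TEMPLATES.get(t.relation, "{subject} {relation} {obj}.")
    return template.format(
        subject=t.subject,
        relation=t.relation.replace("_", " "),
        obj=t.obj,
    )


def build_prompt(t: Triplet) -> str:
    """Wrap the statement in an assumed true/false question for an LLM."""
    statement = triplet_to_statement(t)
    return (
        "Is the following statement true or false? Answer with one word.\n"
        f"Statement: {statement}"
    )


if __name__ == "__main__":
    print(build_prompt(Triplet("Paris", "capital_of", "France")))
```

A template table like this is only one possible design; prompts could equally be written by hand or by another model, and the text here does not say which route the authors take.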

The KGPA approach works as follows:

• Original prompt generation: The framework draws triplets (subject, relation, object) from knowledge graphs and converts them into original prompts, so the test data comes from the graph rather than from a fixed benchmark.
• Adversarial prompt generation: It poisons the original prompts to produce adversarial prompts.
• Robustness evaluation: It queries the LLM with the adversarial prompts and assesses the model's robustness from the results of these attacks.
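The adversarial generation and evaluation steps above can be sketched roughly as follows. This is a toy under stated assumptions: the adjacent-character swap in `poison` stands in for whatever perturbation strategy the paper actually uses, `attack_success_rate` is a commonly used robustness metric that is assumed here rather than taken from the paper, and `model` is any callable that maps a prompt string to an answer string.

```python
import random
from typing import Callable, List, Tuple


def poison(prompt: str, rng: random.Random) -> str:
    """Toy perturbation: swap two adjacent inner characters of one word.

    Only purely alphabetic words of length >= 4 are touched, and their first
    and last characters stay in place. If the two swapped characters happen
    to be identical, the prompt comes back unchanged.
    """
    words = prompt.split(" ")
    candidates = [i for i, w in enumerate(words) if len(w) >= 4 and w.isalpha()]
    if not candidates:
        return prompt
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(1, len(w) - 2)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def attack_success_rate(
    model: Callable[[str], str],
    examples: List[Tuple[str, str]],
    rng: random.Random,
) -> float:
    """Share of correctly answered prompts whose answer breaks after poisoning."""
    attacked = 0
    succeeded = 0
    for prompt, label in examples:
        if model(prompt) != label:
            continue  # only attack prompts the model already answers correctly
        attacked += 1
        if model(poison(prompt, rng)) != label:
            succeeded += 1
    return succeeded / attacked if attacked else 0.0


if __name__ == "__main__":
    examples = [("Paris is the capital of France.", "true")]
    constant_model = lambda prompt: "true"  # stand-in for a real LLM call
    print(attack_success_rate(constant_model, examples, random.Random(0)))
```

A lower attack success rate indicates a more robust model under this toy metric. In a real evaluation, `model` would wrap an API call to the LLM under test.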

The experiments in the paper evaluate both the framework as a whole and its individual modules. Within the ChatGPT family, adversarial robustness ranks as GPT-4-turbo > GPT-4o > GPT-3.5-turbo, and the robustness of a model varies with the professional domain in which it operates.
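Comparing robustness across domains comes down to grouping attack outcomes by domain and computing a rate for each group. The sketch below shows that bookkeeping. The domain names and outcomes are made-up placeholders, not results from the paper, and the attack success rate metric is again an assumption for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def asr_by_domain(outcomes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the attack success rate per domain.

    Each outcome is (domain, attack_succeeded) for one adversarial prompt.
    """
    totals: Dict[str, int] = defaultdict(int)
    successes: Dict[str, int] = defaultdict(int)
    for domain, attack_succeeded in outcomes:
        totals[domain] += 1
        successes[domain] += int(attack_succeeded)
    return {domain: successes[domain] / totals[domain] for domain in totals}


def rank_most_robust_first(asr: Dict[str, float]) -> List[str]:
    """Order domains from lowest to highest attack success rate."""
    return sorted(asr, key=lambda domain: asr[domain])


if __name__ == "__main__":
    # Placeholder outcomes for illustration only (not data from the paper).
    toy_outcomes = [
        ("general", False),
        ("general", True),
        ("medical", True),
        ("medical", True),
    ]
    rates = asr_by_domain(toy_outcomes)
    print(rates)
    print(rank_most_robust_first(rates))
```

The same grouping works for comparing models instead of domains by using the model name as the key.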

Critical Analysis

The paper provides a comprehensive and rigorous evaluation of KGPA, and the results are compelling. However, there are a few potential limitations and areas for further research:

• Generalization to Unseen Domains: While KGPA leverages cross-domain knowledge graphs, any evaluation can cover only a limited set of domains. It would be valuable to assess how well the framework generalizes to a wider range of domains and tasks.
• Sensitivity to Knowledge Graph Quality: The effectiveness of KGPA is inherently dependent on the quality and coverage of the underlying knowledge graph. Errors or biases in the knowledge graph could lead to suboptimal adversarial examples. Exploring techniques to improve knowledge graph construction and curation would be a valuable direction.
• Computational Efficiency: The process of generating adversarial examples using KGPA may be computationally intensive, especially for large-scale evaluations. Investigating ways to improve the efficiency of the approach could make it more practical for real-world deployment.
• Ethical Considerations: While the paper focuses on evaluating the robustness of LLMs, the techniques developed could also be used to create adversarial attacks that harm users or systems. Careful consideration of the ethical implications and potential misuse of KGPA is warranted.

Conclusion

The KGPA approach presented in this paper represents a significant advancement in the field of robustness evaluation for large language models. By leveraging cross-domain knowledge graphs, the researchers have developed a powerful tool for uncovering vulnerabilities in state-of-the-art LLMs, which is crucial for improving the reliability and safety of these powerful AI systems.

The empirical results demonstrate the effectiveness of KGPA, and the paper also highlights important areas for further research and development. As LLMs continue to become more prominent in various applications, the insights and techniques presented in this work will be invaluable for ensuring the robust and responsible deployment of these technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Assessing Adversarial Robustness of Large Language Models: An Empirical Study

Zeyu Yang, Zhao Meng, Xiaochen Zheng, Roger Wattenhofer

Large Language Models (LLMs) have revolutionized natural language processing, but their robustness against adversarial attacks remains a critical concern. We present a novel white-box style attack approach that exposes vulnerabilities in leading open-source LLMs, including Llama, OPT, and T5. We assess the impact of model size, structure, and fine-tuning strategies on their resistance to adversarial perturbations. Our comprehensive evaluation across five diverse text classification tasks establishes a new benchmark for LLM robustness. The findings of this study have far-reaching implications for the reliable deployment of LLMs in real-world applications and contribute to the advancement of trustworthy AI systems.

5/7/2024

cs.CL cs.LG

BiasKG: Adversarial Knowledge Graphs to Induce Bias in Large Language Models

Chu Fei Luo, Ahmad Ghawanmeh, Xiaodan Zhu, Faiza Khan Khattak

Modern large language models (LLMs) have a significant amount of world knowledge, which enables strong performance in commonsense reasoning and knowledge-intensive tasks when harnessed properly. The language model can also learn social biases, which has a significant potential for societal harm. There have been many mitigation strategies proposed for LLM safety, but it is unclear how effective they are for eliminating social biases. In this work, we propose a new methodology for attacking language models with knowledge graph augmented generation. We refactor natural language stereotypes into a knowledge graph, and use adversarial attacking strategies to induce biased responses from several open- and closed-source language models. We find our method increases bias in all models, even those trained with safety guardrails. This demonstrates the need for further research in AI safety, and further work in this new adversarial space.

5/9/2024

cs.CL cs.LG

AttacKG+:Boosting Attack Knowledge Graph Construction with Large Language Models

Yongheng Zhang, Tingwen Du, Yunshan Ma, Xiang Wang, Yi Xie, Guozheng Yang, Yuliang Lu, Ee-Chien Chang

Attack knowledge graph construction seeks to convert textual cyber threat intelligence (CTI) reports into structured representations, portraying the evolutionary traces of cyber attacks. Even though previous research has proposed various methods to construct attack knowledge graphs, they generally suffer from limited generalization capability to diverse knowledge types as well as requirement of expertise in model design and tuning. Addressing these limitations, we seek to utilize Large Language Models (LLMs), which have achieved enormous success in a broad range of tasks given exceptional capabilities in both language understanding and zero-shot task fulfillment. Thus, we propose a fully automatic LLM-based framework to construct attack knowledge graphs named: AttacKG+. Our framework consists of four consecutive modules: rewriter, parser, identifier, and summarizer, each of which is implemented by instruction prompting and in-context learning empowered by LLMs. Furthermore, we upgrade the existing attack knowledge schema and propose a comprehensive version. We represent a cyber attack as a temporally unfolding event, each temporal step of which encapsulates three layers of representation, including behavior graph, MITRE TTP labels, and state summary. Extensive evaluation demonstrates that: 1) our formulation seamlessly satisfies the information needs in threat event analysis, 2) our construction framework is effective in faithfully and accurately extracting the information defined by AttacKG+, and 3) our attack graph directly benefits downstream security practices such as attack reconstruction. All the code and datasets will be released upon acceptance.

5/9/2024

cs.CR cs.AI

KnowGPT: Knowledge Graph based Prompting for Large Language Models

Qinggang Zhang, Junnan Dong, Hao Chen, Daochen Zha, Zailiang Yu, Xiao Huang

Large Language Models (LLMs) have demonstrated remarkable capabilities in many real-world applications. Nonetheless, LLMs are often criticized for their tendency to produce hallucinations, wherein the models fabricate incorrect statements on tasks beyond their knowledge and perception. To alleviate this issue, researchers have explored leveraging the factual knowledge in knowledge graphs (KGs) to ground the LLM's responses in established facts and principles. However, most state-of-the-art LLMs are closed-source, making it challenging to develop a prompting framework that can efficiently and effectively integrate KGs into LLMs with hard prompts only. Generally, existing KG-enhanced LLMs usually suffer from three critical issues, including huge search space, high API costs, and laborious prompt engineering, that impede their widespread application in practice. To this end, we introduce a novel Knowledge Graph based PrompTing framework, namely KnowGPT, to enhance LLMs with domain knowledge. KnowGPT contains a knowledge extraction module to extract the most informative knowledge from KGs, and a context-aware prompt construction module to automatically convert extracted knowledge into effective prompts. Experiments on three benchmarks demonstrate that KnowGPT significantly outperforms all competitors. Notably, KnowGPT achieves a 92.6% accuracy on OpenbookQA leaderboard, comparable to human-level performance.

6/5/2024

cs.CL cs.AI