Ghostbuster: Detecting Text Ghostwritten by Large Language Models

2305.15047

Published 4/9/2024 by Vivek Verma, Eve Fleisig, Nicholas Tomlin, Dan Klein

Abstract

We introduce Ghostbuster, a state-of-the-art system for detecting AI-generated text. Our method works by passing documents through a series of weaker language models, running a structured search over possible combinations of their features, and then training a classifier on the selected features to predict whether documents are AI-generated. Crucially, Ghostbuster does not require access to token probabilities from the target model, making it useful for detecting text generated by black-box models or unknown model versions. In conjunction with our model, we release three new datasets of human- and AI-generated text as detection benchmarks in the domains of student essays, creative writing, and news articles. We compare Ghostbuster to a variety of existing detectors, including DetectGPT and GPTZero, as well as a new RoBERTa baseline. Ghostbuster achieves 99.0 F1 when evaluated across domains, which is 5.9 F1 higher than the best preexisting model. It also outperforms all previous approaches in generalization across writing domains (+7.5 F1), prompting strategies (+2.1 F1), and language models (+4.4 F1). We also analyze the robustness of our system to a variety of perturbations and paraphrasing attacks and evaluate its performance on documents written by non-native English speakers.

Overview

Ghostbuster is a new system for detecting AI-generated text.
It works by passing documents through a series of weaker language models, running a structured search over combinations of their features, and then training a classifier on the selected features to predict whether the text was generated by AI.
Ghostbuster doesn't require access to token probabilities from the target AI model, making it useful for detecting text from black-box models or unknown model versions.
The researchers also released three new datasets for benchmarking AI text detection in different domains.
Ghostbuster outperformed existing detectors like DetectGPT and GPTZero, as well as a new RoBERTa baseline, reaching 99.0 F1 across domains (5.9 F1 higher than the best preexisting model).

Plain English Explanation

Ghostbuster is a new tool that can tell if a piece of text was written by a human or generated by an AI system. It works by sending the text through a series of weaker language models, which are simpler than the AI system that may have produced the text. Ghostbuster then searches for the combinations of these models' outputs that best separate human from AI writing and uses them to train a classifier that predicts whether the text is human-written or AI-generated.

One key advantage of Ghostbuster is that it doesn't need token probabilities from the AI model that generated the text. This makes it useful for detecting text from black-box models or from model versions that you may not have information about.

To test Ghostbuster, the researchers created three new datasets of human- and AI-written text covering student essays, creative writing, and news articles. They found that Ghostbuster detected AI-generated text with an F1 score of 99.0 when evaluated across domains, 5.9 F1 higher than the best existing detector. It also generalized better across writing domains (+7.5 F1), prompting strategies (+2.1 F1), and language models (+4.4 F1).

Technical Explanation

The core of the Ghostbuster system is a structured search over features extracted from a series of weaker language models. First, the input document is passed through these models, which in the paper range from a unigram model and a trigram model to early, non-instruction-tuned GPT-3 models (ada and davinci). Each model assigns a probability to every token in the document, and these per-token probabilities are the raw material for the features.
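As a minimal sketch of what passing a document through a weaker language model can yield, the following builds a toy unigram model and scores each token of a document. The add-one smoothing, the whitespace tokenizer, the tiny corpus, and the function names train_unigram and token_log_probs are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def train_unigram(corpus_tokens):
    """Return a function mapping a token to its add-one-smoothed probability."""
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen tokens

    def prob(token):
        return (counts[token] + 1) / (total + vocab_size)

    return prob


def token_log_probs(text, prob):
    """Per-token log-probabilities of a lowercased, whitespace-tokenized document."""
    return [math.log(prob(tok)) for tok in text.lower().split()]


# Toy reference corpus (hypothetical; a real system would use a large corpus).
corpus = "the cat sat on the mat and the dog sat on the rug".split()
unigram = train_unigram(corpus)

# One log-probability per token; rarer or unseen tokens score lower.
log_probs = token_log_probs("The cat sat on the sofa", unigram)
for token, lp in zip("the cat sat on the sofa".split(), log_probs):
    print(f"{token:>5}  {lp:.3f}")
```

Running the same document through several models of different strength gives one such vector per model, which is the kind of input the feature search operates on.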

Ghostbuster then searches over possible combinations of these features, looking for the set that best distinguishes human-written and AI-generated text. This structured search allows the system to identify the most informative signals without requiring access to the internals of the target AI model.
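The search over feature combinations can be sketched as a greedy forward selection. This is a simplified illustration, not the authors' exact procedure: the particular vector operations, the reducers, the nearest-centroid scoring function, and the model names "unigram" and "neural" are all assumptions made for the example.

```python
import statistics
from itertools import combinations

# Vector operations combine two per-token probability lists elementwise.
VECTOR_OPS = {
    "sub": lambda a, b: [x - y for x, y in zip(a, b)],
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],
}
# Scalar reducers collapse a vector into one number per document.
REDUCERS = {"mean": statistics.fmean, "max": max, "min": min}


def candidate_features(model_names):
    """Enumerate (model_a, op, model_b, reducer) feature descriptions."""
    feats = [(m, None, None, r) for m in model_names for r in REDUCERS]
    for a, b in combinations(model_names, 2):
        feats += [(a, op, b, r) for op in VECTOR_OPS for r in REDUCERS]
    return feats


def evaluate(feature, doc):
    """Compute one scalar feature value for one document."""
    a, op, b, reducer = feature
    vec = doc[a] if op is None else VECTOR_OPS[op](doc[a], doc[b])
    return REDUCERS[reducer](vec)


def centroid_accuracy(features, docs, labels):
    """Training accuracy of a nearest-centroid classifier on the given features."""
    rows = [[evaluate(f, d) for f in features] for d in docs]
    cents = {}
    for c in set(labels):
        members = [r for r, y in zip(rows, labels) if y == c]
        cents[c] = [statistics.fmean(col) for col in zip(*members)]

    def predict(r):
        return min(cents, key=lambda c: sum((x - m) ** 2 for x, m in zip(r, cents[c])))

    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)


def forward_select(docs, labels, model_names, max_features=3):
    """Greedily add the feature that most improves the score; stop when none helps."""
    selected, best = [], 0.0
    pool = candidate_features(model_names)
    while len(selected) < max_features:
        scored = [(centroid_accuracy(selected + [f], docs, labels), -i, f)
                  for i, f in enumerate(pool) if f not in selected]
        score, _, feat = max(scored)  # ties go to the earliest candidate
        if score <= best:
            break
        selected.append(feat)
        best = score
    return selected, best


# Toy per-token probabilities from two hypothetical models (made-up numbers).
docs = [
    {"unigram": [0.1, 0.2, 0.1], "neural": [0.8, 0.9, 0.7]},  # labeled AI
    {"unigram": [0.2, 0.1, 0.2], "neural": [0.9, 0.8, 0.9]},  # labeled AI
    {"unigram": [0.1, 0.2, 0.2], "neural": [0.3, 0.2, 0.4]},  # labeled human
    {"unigram": [0.2, 0.1, 0.1], "neural": [0.2, 0.4, 0.3]},  # labeled human
]
labels = [1, 1, 0, 0]
selected, score = forward_select(docs, labels, ["unigram", "neural"])
print(selected, score)
```

Scoring on training accuracy keeps the sketch short; a realistic search would score candidates on held-out data to avoid selecting features that only fit the training set.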

The selected features are then used to train a final classifier that predicts whether a given document was written by a human or generated by an AI system. The researchers evaluated Ghostbuster on three new benchmark datasets covering student essays, creative writing, and news articles. Evaluated across these domains, Ghostbuster achieved 99.0 F1, which is 5.9 F1 higher than the best preexisting detector.
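The final step, training a classifier on the selected features and reporting F1, can be sketched as follows. The sketch assumes a logistic regression trained by plain gradient descent; the feature values, labels, and hyperparameters are made up for illustration and are not the paper's data or settings.

```python
import math


def train_logreg(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1 / (1 + math.exp(-z))  # predicted probability of class 1 (AI)
            for j, xj in enumerate(xi):
                grad_w[j] += (p - yi) * xj
            grad_b += p - yi
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict(w, b, x):
    """Return 1 (AI-generated) if the linear score is positive, else 0 (human)."""
    return int(sum(wj * xj for wj, xj in zip(w, x)) + b > 0)


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical standardized feature values; label 1 = AI-generated, 0 = human.
X_train = [[1.0, 0.5], [0.8, 0.9], [-0.9, -0.4], [-0.9, -1.0]]
y_train = [1, 1, 0, 0]
w, b = train_logreg(X_train, y_train)

# Held-out documents; the last one is a human text that looks AI-like.
X_test = [[0.7, 0.6], [0.9, 0.4], [-0.8, -0.5], [0.9, 0.8]]
y_test = [1, 1, 0, 0]
y_pred = [predict(w, b, x) for x in X_test]
print(y_pred, f1_score(y_test, y_pred))
```

F1 differs from accuracy because it ignores true negatives and penalizes false positives and false negatives together. The paper reports it on a 0 to 100 scale, so 99.0 F1 corresponds to 0.990 in this function's units.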

Critical Analysis

The Ghostbuster paper provides a comprehensive and rigorous evaluation of the system's performance. The researchers carefully designed their experiments to assess Ghostbuster's ability to generalize across different writing styles, prompting strategies, and language models. This is an important consideration, as real-world AI-generated text may come from a wide variety of sources.

However, although the paper analyzes robustness to a variety of perturbations and paraphrasing attacks, stronger adversarial settings, such as paraphrasing targeted at the detector or fine-tuning the generator to evade it, remain open questions. Likewise, while the researchers did test Ghostbuster's performance on text written by non-native English speakers, additional evaluation on more diverse populations would be valuable.

Furthermore, the computational and memory requirements of Ghostbuster's structured search and multi-model architecture may limit its practical deployability, especially for real-time detection. Exploring more efficient architectures or distillation techniques could help address this.

Conclusion

Overall, the Ghostbuster system represents a significant advance in the field of AI-generated text detection. By leveraging a structured search over features from multiple language models, the system achieves state-of-the-art performance without requiring access to the internals of the target AI model. The release of new benchmark datasets in various domains also provides valuable resources for further research in this area.

As AI-generated text becomes more prevalent, tools like Ghostbuster will be crucial for maintaining the integrity of written communication and combating the spread of misinformation. The authors' careful evaluation and critical analysis of their work set a high standard for future advances in this important and rapidly evolving field.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Efficient Detection of LLM-generated Texts with a Bayesian Surrogate Model

Yibo Miao, Hongcheng Gao, Hao Zhang, Zhijie Deng

The detection of machine-generated text, especially from large language models (LLMs), is crucial in preventing serious social problems resulting from their misuse. Some methods train dedicated detectors on specific datasets but fall short in generalizing to unseen test data, while other zero-shot ones often yield suboptimal performance. Although the recent DetectGPT has shown promising detection performance, it suffers from significant inefficiency issues, as detecting a single candidate requires querying the source LLM with hundreds of its perturbations. This paper aims to bridge this gap. Concretely, we propose to incorporate a Bayesian surrogate model, which allows us to select typical samples based on Bayesian uncertainty and interpolate scores from typical samples to other samples, to improve query efficiency. Empirical results demonstrate that our method significantly outperforms existing approaches under a low query budget. Notably, when detecting the text generated by LLaMA family models, our method with just 2 or 3 queries can outperform DetectGPT with 200 queries.

6/5/2024

cs.LG cs.AI cs.CL

Detecting AI Generated Text Based on NLP and Machine Learning Approaches

Nuzhat Prova

Recent advances in natural language processing (NLP) may enable artificial intelligence (AI) models to generate writing that is identical to human written form in the future. This might have profound ethical, legal, and social repercussions. This study aims to address this problem by offering an accurate AI detector model that can differentiate between electronically produced text and human-written text. Our approach includes machine learning methods such as XGB Classifier, SVM, BERT architecture deep learning models. Furthermore, our results show that the BERT performs better than previous models in identifying information generated by AI from information provided by humans. Provide a comprehensive analysis of the current state of AI-generated text identification in our assessment of pertinent studies. Our testing yielded positive findings, showing that our strategy is successful, with the BERT emerging as the most probable answer. We analyze the research's societal implications, highlighting the possible advantages for various industries while addressing sustainability issues pertaining to morality and the environment. The XGB classifier and SVM give 0.84 and 0.81 accuracy in this article, respectively. The greatest accuracy in this research is provided by the BERT model, which provides 0.93% accuracy.

4/17/2024

cs.LG cs.CL

Vietnamese AI Generated Text Detection

Quang-Dan Tran, Van-Quan Nguyen, Quang-Huy Pham, K. B. Thang Nguyen, Trong-Hop Do

In recent years, Large Language Models (LLMs) have become integrated into our daily lives, serving as invaluable assistants in completing tasks. Widely embraced by users, the abuse of LLMs is inevitable, particularly in using them to generate text content for various purposes, leading to difficulties in distinguishing between text generated by LLMs and that written by humans. In this study, we present a dataset named ViDetect, comprising 6.800 samples of Vietnamese essay, with 3.400 samples authored by humans and the remainder generated by LLMs, serving the purpose of detecting text generated by AI. We conducted evaluations using state-of-the-art methods, including ViT5, BartPho, PhoBERT, mDeberta V3, and mBERT. These results contribute not only to the growing body of research on detecting text generated by AI but also demonstrate the adaptability and effectiveness of different methods in the Vietnamese language context. This research lays the foundation for future advancements in AI-generated text detection and provides valuable insights for researchers in the field of natural language processing.

5/7/2024

cs.CL cs.AI

Deepfake Text Detection in the Wild

Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang

Large language models (LLMs) have achieved human-level text generation, emphasizing the need for effective AI-generated text detection to mitigate risks like the spread of fake news and plagiarism. Existing research has been constrained by evaluating detection methods on specific domains or particular language models. In practical scenarios, however, the detector faces texts from various domains or LLMs without knowing their sources. To this end, we build a comprehensive testbed by gathering texts from diverse human writings and texts generated by different LLMs. Empirical results show challenges in distinguishing machine-generated texts from human-authored ones across various scenarios, especially out-of-distribution. These challenges are due to the decreasing linguistic distinctions between the two sources. Despite challenges, the top-performing detector can identify 86.54% out-of-domain texts generated by a new LLM, indicating the feasibility for application scenarios. We release our resources at https://github.com/yafuly/MAGE.

5/22/2024

cs.CL