Authorship Obfuscation in Multilingual Machine-Generated Text Detection

Read original: arXiv:2401.07867 - Published 6/19/2024 by Dominik Macko, Robert Moro, Adaku Uchendu, Ivan Srba, Jason Samuel Lucas, Michiharu Yamashita, Nafis Irtiza Tripto, Dongwon Lee, Jakub Simko, Maria Bielikova

🔎

Overview

Recent advancements in large language models (LLMs) have led to concerns about their potential for misuse, such as the massive generation and spread of disinformation.
Machine-generated text (MGT) detection is crucial to address these threats, but it is susceptible to authorship obfuscation (AO) methods like paraphrasing, which can help MGTs evade detection.
Previous evaluations of MGT detection were limited to monolingual settings, so the susceptibility of recently proposed multilingual detectors is still unknown.

Plain English Explanation

The paper investigates the ability of modern language models to generate high-quality text, which has raised concerns about their potential misuse in spreading misinformation or disinformation on a large scale. To address this, the researchers focus on the challenge of detecting machine-generated text, which is an important tool for combating the spread of such content.

However, the researchers note that current MGT detection methods are susceptible to authorship obfuscation techniques, such as paraphrasing, that can help machine-generated text evade detection. Previous studies have only looked at this problem in single-language settings, so the researchers set out to investigate the performance of multilingual MGT detectors when faced with these obfuscation techniques.

Technical Explanation

The paper presents a comprehensive study of the performance of 10 well-known authorship obfuscation (AO) methods in attacking 37 machine-generated text (MGT) detection methods across 11 different languages. This amounts to a total of 4,070 combinations (10 AO methods × 37 MGT detectors × 11 languages) that were evaluated.

The researchers also investigated the effect of data augmentation, using obfuscated texts, on the adversarial robustness of the MGT detection methods. The results show that all the tested AO methods were able to cause evasion of automated detection in all the languages tested, with homoglyph attacks being particularly successful. However, some of the AO methods severely damaged the text, making it difficult for humans to read or recognize.

Critical Analysis

The study provides a thorough and comprehensive evaluation of the susceptibility of MGT detection methods to authorship obfuscation techniques in a multilingual setting. This is an important contribution, as previous research had only focused on monolingual scenarios, leaving the performance of multilingual detectors largely unknown.

One potential limitation of the study is that it did not explicitly evaluate the impact of the obfuscated texts on human readability and comprehension. While the researchers note that some of the AO methods severely damaged the text, a more in-depth assessment of the human-perceivable quality of the obfuscated texts could have provided additional insights.

Additionally, the study does not delve into the potential reasons why certain AO methods, such as homoglyph attacks, were more successful than others. Exploring the underlying mechanisms and vulnerabilities of the MGT detection methods could inform the development of more robust and comprehensive solutions.

Conclusion

This study highlights the significant vulnerability of current machine-generated text detection methods to authorship obfuscation techniques, even in multilingual settings. The findings underscore the urgent need for further research and development of more adversarially robust MGT detection approaches that can withstand a wide range of obfuscation attacks. As language models continue to advance, addressing the challenges posed by the misuse of their text generation capabilities will be a critical priority for the research community and policymakers alike.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🔎

Authorship Obfuscation in Multilingual Machine-Generated Text Detection

Dominik Macko, Robert Moro, Adaku Uchendu, Ivan Srba, Jason Samuel Lucas, Michiharu Yamashita, Nafis Irtiza Tripto, Dongwon Lee, Jakub Simko, Maria Bielikova

High-quality text generation capability of recent Large Language Models (LLMs) causes concerns about their misuse (e.g., in massive generation/spread of disinformation). Machine-generated text (MGT) detection is important to cope with such threats. However, it is susceptible to authorship obfuscation (AO) methods, such as paraphrasing, which can cause MGTs to evade detection. So far, this was evaluated only in monolingual settings. Thus, the susceptibility of recently proposed multilingual detectors is still unknown. We fill this gap by comprehensively benchmarking the performance of 10 well-known AO methods, attacking 37 MGT detection methods against MGTs in 11 languages (i.e., 10 $times$ 37 $times$ 11 = 4,070 combinations). We also evaluate the effect of data augmentation on adversarial robustness using obfuscated texts. The results indicate that all tested AO methods can cause evasion of automated detection in all tested languages, where homoglyph attacks are especially successful. However, some of the AO methods severely damaged the text, making it no longer readable or easily recognizable by humans (e.g., changed language, weird characters).

6/19/2024

Zero-Shot Machine-Generated Text Detection Using Mixture of Large Language Models

Matthieu Dubois, Franc{c}ois Yvon, Pablo Piantanida

The dissemination of Large Language Models (LLMs), trained at scale, and endowed with powerful text-generating abilities has vastly increased the threats posed by generative AI technologies by reducing the cost of producing harmful, toxic, faked or forged content. In response, various proposals have been made to automatically discriminate artificially generated from human-written texts, typically framing the problem as a classification problem. Most approaches evaluate an input document by a well-chosen detector LLM, assuming that low-perplexity scores reliably signal machine-made content. As using one single detector can induce brittleness of performance, we instead consider several and derive a new, theoretically grounded approach to combine their respective strengths. Our experiments, using a variety of generator LLMs, suggest that our method effectively increases the robustness of detection.

9/14/2024

Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack

Ying Zhou, Ben He, Le Sun

With the development of large language models (LLMs), detecting whether text is generated by a machine becomes increasingly challenging in the face of malicious use cases like the spread of false information, protection of intellectual property, and prevention of academic plagiarism. While well-trained text detectors have demonstrated promising performance on unseen test data, recent research suggests that these detectors have vulnerabilities when dealing with adversarial attacks such as paraphrasing. In this paper, we propose a framework for a broader class of adversarial attacks, designed to perform minor perturbations in machine-generated content to evade detection. We consider two attack settings: white-box and black-box, and employ adversarial learning in dynamic scenarios to assess the potential enhancement of the current detection model's robustness against such attacks. The empirical results reveal that the current detection models can be compromised in as little as 10 seconds, leading to the misclassification of machine-generated text as human-written content. Furthermore, we explore the prospect of improving the model's robustness over iterative adversarial learning. Although some improvements in model robustness are observed, practical applications still face significant challenges. These findings shed light on the future development of AI-text detectors, emphasizing the need for more accurate and robust detection methods.

4/3/2024

Machine-Generated Text Localization

Zhongping Zhang, Wenda Qin, Bryan A. Plummer

Machine-Generated Text (MGT) detection aims to identify a piece of text as machine or human written. Prior work has primarily formulated MGT detection as a binary classification task over an entire document, with limited work exploring cases where only part of a document is machine generated. This paper provides the first in-depth study of MGT that localizes the portions of a document that were machine generated. Thus, if a bad actor were to change a key portion of a news article to spread misinformation, whole document MGT detection may fail since the vast majority is human written, but our approach can succeed due to its granular approach. A key challenge in our MGT localization task is that short spans of text, e.g., a single sentence, provides little information indicating if it is machine generated due to its short length. To address this, we leverage contextual information, where we predict whether multiple sentences are machine or human written at once. This enables our approach to identify changes in style or content to boost performance. A gain of 4-13% mean Average Precision (mAP) over prior work demonstrates the effectiveness of approach on five diverse datasets: GoodNews, VisualNews, WikiText, Essay, and WP. We release our implementation at https://github.com/Zhongping-Zhang/MGT_Localization.

6/12/2024