Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

arXiv: 2403.07183

Published 6/18/2024 by Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang and 2 others

Abstract

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.

Overview

  • This paper examines how large language models (LLMs) such as ChatGPT are affecting peer review at AI conferences.
  • The researchers developed a maximum likelihood approach that estimates, at the corpus level, the fraction of text likely to have been substantially modified or produced by an LLM, and applied it to reviews from ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023.
  • They estimate that between 6.5% and 16.9% of review text at these conferences could have been substantially modified by LLMs, and discuss the implications for the academic community.

Plain English Explanation

The paper examines how the rise of powerful language models like ChatGPT is affecting the peer review process for academic papers, particularly in the field of artificial intelligence (AI). Rather than flagging individual reviews, the researchers built a statistical method that estimates what fraction of a large collection of text is likely to have been substantially modified or produced by an LLM.

They then applied this method to peer reviews from four AI conferences held after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023. The results suggest that between 6.5% and 16.9% of the review text could have been substantially modified by LLMs, beyond spell-checking or minor writing updates. The estimated fraction is higher in reviews that report lower confidence, were submitted close to the deadline, or come from reviewers who are less likely to respond to author rebuttals.

This raises important questions about the integrity of the peer review process and the potential impacts on the quality of research. The paper discusses the implications for the academic community, such as the need to develop new policies and guidelines to address the use of AI in peer review.

Technical Explanation

The researchers developed an approach for estimating the fraction of text in a large corpus that is likely to have been substantially modified or produced by an LLM. Their maximum likelihood model uses expert-written and AI-generated reference texts to estimate LLM use at the corpus level, and they applied it to peer reviews from ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023.

The key elements of their approach include:

  • Collecting expert-written and AI-generated reference texts
  • Fitting a maximum likelihood model that estimates the corpus-level fraction of LLM-modified text, rather than classifying individual reviews
  • Applying the estimator to a large corpus of peer reviews from the four conferences
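A corpus-level maximum likelihood estimate of the kind the abstract describes can be illustrated with a small sketch. This is not the authors' implementation. It assumes a simple two-component mixture in which each token of the target corpus is drawn from a human reference distribution with probability 1 - alpha and from an AI reference distribution with probability alpha. The function names (`token_distribution`, `log_likelihood`, `estimate_alpha`), the whitespace tokenization, and the add-one smoothing are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def token_distribution(docs, vocab, smoothing=1.0):
    """Smoothed token frequencies over a fixed vocabulary (illustrative only)."""
    counts = Counter(tok for doc in docs for tok in doc.split() if tok in vocab)
    total = sum(counts.values()) + smoothing * len(vocab)
    return {tok: (counts[tok] + smoothing) / total for tok in vocab}


def log_likelihood(alpha, token_counts, p_human, p_ai):
    """Log-likelihood of the target tokens under the assumed mixture
    (1 - alpha) * p_human + alpha * p_ai."""
    return sum(
        n * math.log((1.0 - alpha) * p_human[tok] + alpha * p_ai[tok])
        for tok, n in token_counts.items()
    )


def estimate_alpha(tokens, p_human, p_ai, tol=1e-6):
    """Maximum likelihood estimate of the AI fraction alpha in [0, 1].

    The log-likelihood is concave in alpha (a sum of logs of functions
    that are linear in alpha), so a ternary search finds the maximizer.
    """
    # Ignore tokens outside the shared vocabulary of the two references.
    token_counts = Counter(t for t in tokens if t in p_human and t in p_ai)
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if log_likelihood(m1, token_counts, p_human, p_ai) < log_likelihood(
            m2, token_counts, p_human, p_ai
        ):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Toy reference distributions over a two-word vocabulary (made-up numbers).
    p_human = {"a": 0.9, "b": 0.1}
    p_ai = {"a": 0.1, "b": 0.9}
    target = ["a"] * 74 + ["b"] * 26
    print(round(estimate_alpha(target, p_human, p_ai), 3))  # close to 0.2
```

In the toy example, the target corpus contains 74% "a", and the mixture assigns "a" a probability of 0.9 - 0.8 * alpha, so the likelihood is maximized at alpha = 0.2. The estimate describes the corpus as a whole and says nothing about which individual documents were LLM-modified.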

Through this analysis, the researchers estimate that between 6.5% and 16.9% of the text submitted as peer reviews to the conferences studied could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The estimated fraction is higher in reviews that report lower confidence, were submitted close to the deadline, or come from reviewers who are less likely to respond to author rebuttals.

The paper discusses the implications of these findings, including the potential impacts on the quality and integrity of peer review, as well as the need for the academic community to develop new policies and guidelines to address the use of AI in this context.

Critical Analysis

The paper provides a valuable case study on the impact of LLMs like ChatGPT on the peer review process, an issue that is becoming increasingly important as these technologies become more widely available and used.

One potential limitation of the research is that the case study covers only AI conferences (ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023). While this provides a useful starting point, the prevalence of AI-assisted peer reviewing may vary across different research fields and publication venues. Expanding the analysis to a broader range of academic disciplines could yield additional insights.

Additionally, the paper does not delve deeply into the potential downstream consequences of AI-assisted peer review, such as the impact on research quality, the fairness and objectivity of the review process, or the broader societal implications. Further research in these areas would be valuable.

That said, the paper makes a compelling case for the academic community to proactively address the challenges posed by the use of LLMs in peer review. The development of clear guidelines and best practices, as well as tools to help detect and mitigate AI-generated content, will be crucial to maintaining the integrity of the peer review system.

Conclusion

This paper provides an important case study on the impact of large language models like ChatGPT on the peer review process for academic conferences, particularly in the field of AI. The researchers developed a maximum likelihood approach for estimating LLM use at the corpus level and estimate that between 6.5% and 16.9% of review text at ICLR 2024, NeurIPS 2023, CoRL 2023, and EMNLP 2023 could have been substantially modified by LLMs.

These findings highlight the need for the academic community to urgently address the challenges posed by the use of AI in peer review. Developing new policies, guidelines, and tools to ensure the integrity of the review process will be critical to maintaining the quality and trustworthiness of academic research. As language model usage continues to grow, this issue will only become more pressing in the years to come.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates

Giuseppe Russo Latona, Manoel Horta Ribeiro, Tim R. Davidson, Veniamin Veselovsky, Robert West

Journals and conferences worry that peer reviews assisted by artificial intelligence (AI), in particular, large language models (LLMs), may negatively influence the validity and fairness of the peer-review system, a cornerstone of modern science. In this work, we address this concern with a quasi-experimental study of the prevalence and impact of AI-assisted peer reviews in the context of the 2024 International Conference on Learning Representations (ICLR), a large and prestigious machine-learning conference. Our contributions are threefold. Firstly, we obtain a lower bound for the prevalence of AI-assisted reviews at ICLR 2024 using the GPTZero LLM detector, estimating that at least 15.8% of reviews were written with AI assistance. Secondly, we estimate the impact of AI-assisted reviews on submission scores. Considering pairs of reviews with different scores assigned to the same paper, we find that in 53.4% of pairs the AI-assisted review scores higher than the human review (p = 0.002; relative difference in probability of scoring higher: +14.4% in favor of AI-assisted reviews). Thirdly, we assess the impact of receiving an AI-assisted peer review on submission acceptance. In a matched study, submissions near the acceptance threshold that received an AI-assisted peer review were 4.9 percentage points (p = 0.024) more likely to be accepted than submissions that did not. Overall, we show that AI-assisted reviews are consequential to the peer-review process and offer a discussion on future implications of current trends.

5/6/2024

Delving into ChatGPT usage in academic writing through excess vocabulary

Dmitry Kobak, Rita González Márquez, Emőke-Ágnes Horvát, Jan Lause

Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.

6/12/2024

Is ChatGPT Transforming Academics' Writing Style?

Mingmeng Geng, Roberto Trotta

Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT's writing style in their abstracts by means of a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. We find that ChatGPT is having an increasing impact on arXiv abstracts, especially in the field of computer science, where the fraction of ChatGPT-revised abstracts is estimated to be approximately 35%, if we take the output of one of the simplest prompts, "revise the following sentences", as a baseline. We conclude with an analysis of both positive and negative aspects of the penetration of ChatGPT into academics' writing style.

4/15/2024

A Perspective Study on Chinese Social Media regarding LLM for Education and Beyond

Yao Tian, Chengwei Tong, Lik-Hang Lee, Reza Hadi Mogavi, Yong Liao, Pengyuan Zhou

The application of AI-powered tools has piqued the interest of many fields, particularly in the academic community. This study uses ChatGPT, currently the most powerful and popular AI tool, as a representative example to analyze how the Chinese public perceives the potential of large language models (LLMs) for educational and general purposes. Although facing accessibility challenges, we found that the number of discussions on ChatGPT per month is 16 times that of Ernie Bot developed by Baidu, the most popular alternative product to ChatGPT in the mainland, making ChatGPT a more suitable subject for our analysis. The study also serves as the first effort to investigate the changes in public opinion as AI technologies become more advanced and intelligent. The analysis reveals that, upon first encounters with advanced AI that was not yet highly capable, some social media users believed that AI advancements would benefit education and society, while others feared that advanced AI, like ChatGPT, would make humans feel inferior and lead to problems such as cheating and a decline in moral principles. The majority of users remained neutral. Interestingly, with the rapid development and improvement of AI capabilities, public attitudes have tended to shift in a positive direction. We present a thorough analysis of the trending shift and a roadmap to ensure the ethical application of ChatGPT-like models in education and beyond.

6/3/2024