Delving into ChatGPT usage in academic writing through excess vocabulary

2406.07016

Published 6/12/2024 by Dmitry Kobak, Rita González Márquez, Emőke-Ágnes Horvát, Jan Lause

Abstract

Recent large language models (LLMs) can generate and revise text with human-level performance, and have been widely commercialized in systems like ChatGPT. These models come with clear limitations: they can produce inaccurate information, reinforce existing biases, and be easily misused. Yet, many scientists have been using them to assist their scholarly writing. How wide-spread is LLM usage in the academic literature currently? To answer this question, we use an unbiased, large-scale approach, free from any assumptions on academic LLM usage. We study vocabulary changes in 14 million PubMed abstracts from 2010-2024, and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. Our analysis based on excess words usage suggests that at least 10% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, and was as high as 30% for some PubMed sub-corpora. We show that the appearance of LLM-based writing assistants has had an unprecedented impact in the scientific literature, surpassing the effect of major world events such as the Covid pandemic.

Overview

  • This paper investigates how widely large language models (LLMs) such as ChatGPT are used in academic writing.
  • The researchers analyze vocabulary changes in 14 million PubMed abstracts from 2010-2024 and track excess vocabulary: style words whose frequency rose abruptly after LLMs appeared.
  • Based on this excess word usage, they estimate that at least 10% of 2024 abstracts were processed with LLMs, with the lower bound reaching 30% in some PubMed sub-corpora.
  • The related papers listed below explore similar questions for arXiv abstracts, conference peer reviews, and student essays.

Plain English Explanation

The researchers wanted to know how much of the scientific literature is now written or revised with the help of ChatGPT and similar AI tools. Using an approach that makes no assumptions about how academics use LLMs, they looked at how word usage in PubMed abstracts changed between 2010 and 2024. After LLMs appeared, certain style words abruptly became much more frequent. The size of that jump gives a lower bound on how many abstracts were processed with an LLM: at least 10% of 2024 abstracts overall, and as much as 30% in some subsets.

For context, the related papers listed below look at similar questions in other settings, including arXiv abstracts, peer reviews at AI conferences, and student essay writing.

Technical Explanation

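The paper studies vocabulary changes in 14 million PubMed abstracts published between 2010 and 2024. The authors describe their approach as unbiased and large-scale, free from any assumptions about academic LLM usage. They show that the appearance of LLMs led to an abrupt increase in the frequency of certain style words, and they use this excess word usage to estimate how many abstracts were processed with LLMs.

The resulting estimate is a lower bound: at least 10% of 2024 abstracts. The bound differs across disciplines, countries, and journals, and is as high as 30% for some PubMed sub-corpora. The authors also report that this impact on the scientific literature surpasses the effect of major world events such as the Covid pandemic.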
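To make the idea of excess word usage concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`doc_frequency`, `extrapolate`, `excess`), the regex tokenizer, and the choice of a linear extrapolation from two pre-LLM years as the expected frequency are all hypothetical, and the numbers are toy values rather than results from the paper.

```python
import re


def doc_frequency(abstracts, word):
    """Fraction of abstracts that contain `word` at least once.

    Assumption: a crude lowercase, letters-only tokenizer is good enough
    for illustration.
    """
    if not abstracts:
        return 0.0
    hits = sum(
        1 for text in abstracts
        if word in re.findall(r"[a-z]+", text.lower())
    )
    return hits / len(abstracts)


def extrapolate(freq_by_year, target_year):
    """Expected frequency in `target_year`, assuming the trend between the
    two latest baseline years continues linearly (an assumed baseline).
    The result is clipped at zero because a frequency cannot be negative.
    """
    (y1, f1), (y2, f2) = sorted(freq_by_year.items())[-2:]
    slope = (f2 - f1) / (y2 - y1)
    return max(0.0, f2 + slope * (target_year - y2))


def excess(observed, expected):
    """Return (gap, ratio) between observed and expected frequency."""
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Toy example with made-up frequencies for a single word.
baseline = {2021: 0.001, 2022: 0.002}   # hypothetical pre-LLM frequencies
expected_2024 = extrapolate(baseline, 2024)
gap, ratio = excess(0.03, expected_2024)  # 0.03 is a hypothetical 2024 value
print(f"expected={expected_2024:.4f} gap={gap:.4f} ratio={ratio:.1f}")
```

Under this reading, the gap for a single word is a rough floor on the share of abstracts affected, because each abstract is counted at most once per word. That is one plausible way to see why the abstract reports its estimate as a lower bound.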

Critical Analysis

The paper offers a large-scale estimate of LLM usage in academic writing, but the headline figure should be read with care. The 10% figure is a lower bound based on excess word usage, so the true share of abstracts processed with LLMs may be higher, and corpus-level word frequencies do not by themselves identify which individual abstracts were affected. The analysis also covers PubMed abstracts only, so the results may not carry over to other fields or to full-text articles.

The abstract also notes that LLMs can produce inaccurate information, reinforce existing biases, and be easily misused, which raises questions about academic integrity when they are used for scholarly writing. Readers should consult the original paper for the authors' own discussion of limitations.

Conclusion

This paper presents a timely investigation into the use of ChatGPT and similar LLMs in academic writing, using excess vocabulary in PubMed abstracts as evidence. Its central finding is that at least 10% of 2024 abstracts were processed with LLMs, with higher lower bounds in some disciplines, countries, and journals. The authors conclude that LLM-based writing assistants have had an unprecedented impact on the scientific literature, which makes continued measurement and discussion important in this rapidly evolving field.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Is ChatGPT Transforming Academics' Writing Style?

Mingmeng Geng, Roberto Trotta

Based on one million arXiv papers submitted from May 2018 to January 2024, we assess the textual density of ChatGPT's writing style in their abstracts by means of a statistical analysis of word frequency changes. Our model is calibrated and validated on a mixture of real abstracts and ChatGPT-modified abstracts (simulated data) after a careful noise analysis. We find that ChatGPT is having an increasing impact on arXiv abstracts, especially in the field of computer science, where the fraction of ChatGPT-revised abstracts is estimated to be approximately 35%, if we take the output of one of the simplest prompts, "revise the following sentences", as a baseline. We conclude with an analysis of both positive and negative aspects of the penetration of ChatGPT into academics' writing style.

4/15/2024

Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews

Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, Xuandong Zhao, Lingjiao Chen, Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou

We present an approach for estimating the fraction of text in a large corpus which is likely to be substantially modified or produced by a large language model (LLM). Our maximum likelihood model leverages expert-written and AI-generated reference texts to accurately and efficiently examine real-world LLM-use at the corpus level. We apply this approach to a case study of scientific peer review in AI conferences that took place after the release of ChatGPT: ICLR 2024, NeurIPS 2023, CoRL 2023 and EMNLP 2023. Our results suggest that between 6.5% and 16.9% of text submitted as peer reviews to these conferences could have been substantially modified by LLMs, i.e. beyond spell-checking or minor writing updates. The circumstances in which generated text occurs offer insight into user behavior: the estimated fraction of LLM-generated text is higher in reviews which report lower confidence, were submitted close to the deadline, and from reviewers who are less likely to respond to author rebuttals. We also observe corpus-level trends in generated text which may be too subtle to detect at the individual level, and discuss the implications of such trends on peer review. We call for future interdisciplinary work to examine how LLM use is changing our information and knowledge practices.

6/18/2024

An empirical study to understand how students use ChatGPT for writing essays and how it affects their ownership

Andrew Jelson, Sang Won Lee

As large language models (LLMs) become more powerful and ubiquitous, systems like ChatGPT are increasingly used by students to help them with writing tasks. To better understand how these tools are used, we investigate how students might use an LLM for essay writing, for example, to study the queries asked to ChatGPT and the responses that ChatGPT gives. To that end, we plan to conduct a user study that will record the user writing process and present them with the opportunity to use ChatGPT as an AI assistant. This study's findings will help us understand how these tools are used and how practitioners -- such as educators and essay readers -- should consider writing education and evaluation based on essay writing.

5/24/2024

Inclusivity in Large Language Models: Personality Traits and Gender Bias in Scientific Abstracts

Naseela Pervez, Alexander J. Titus

Large language models (LLMs) are increasingly utilized to assist in scientific and academic writing, helping authors enhance the coherence of their articles. Previous studies have highlighted stereotypes and biases present in LLM outputs, emphasizing the need to evaluate these models for their alignment with human narrative styles and potential gender biases. In this study, we assess the alignment of three prominent LLMs - Claude 3 Opus, Mistral AI Large, and Gemini 1.5 Flash - by analyzing their performance on benchmark text-generation tasks for scientific abstracts. We employ the Linguistic Inquiry and Word Count (LIWC) framework to extract lexical, psychological, and social features from the generated texts. Our findings indicate that, while these models generally produce text closely resembling human authored content, variations in stylistic features suggest significant gender biases. This research highlights the importance of developing LLMs that maintain a diversity of writing styles to promote inclusivity in academic discourse.

7/1/2024