Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models

2402.14207

Published 4/9/2024 by Yijia Shao, Yucheng Jiang, Theodore A. Kanell, Peter Xu, Omar Khattab, Monica S. Lam

Abstract

We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.

Overview

This paper presents STORM, a writing system that uses large language models to write Wikipedia-like articles from scratch.
The key idea is to model the pre-writing stage: STORM researches the topic by asking questions from multiple perspectives, grounds the answers in trusted Internet sources, and curates the collected information into an outline before writing.
The researchers also curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and use it to evaluate the system.

Plain English Explanation

STORM is a system that helps write new Wikipedia-style articles from scratch. The researchers behind it wanted to produce long articles that are grounded in sources and well organized, with breadth and depth comparable to Wikipedia pages.

The way it works is by having an AI language model do the research a human writer would do before drafting. STORM first identifies different perspectives on the topic. It then simulates conversations in which writers holding those perspectives ask questions of a topic expert whose answers are grounded in trusted Internet sources. Finally, it organizes the collected information into an outline that guides the article. For example, if you wanted to create a new Wikipedia page on a topic you're not an expert in, STORM could provide an initial outline and draft that you could then refine and expand upon.

To test the system, the researchers built FreshWiki, a dataset of recent high-quality Wikipedia articles. FreshWiki is used for evaluation: the researchers formulate outline assessments to measure how well the pre-writing stage works.

Technical Explanation

The core of STORM is its model of the pre-writing stage, the research and outlining that precede drafting. The system drives a large language model through three steps: (1) discovering diverse perspectives for researching the given topic, (2) simulating conversations in which writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, and (3) curating the collected information to create an outline.

Each step serves one of the paper's stated goals. Asking questions from several perspectives is meant to broaden what the research covers, grounding the expert's answers in retrieved sources is meant to keep the article tied to evidence, and the outline gives the final article its organization.
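The three pre-writing steps listed in the abstract (perspective discovery, simulated conversations, outline curation) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` and `Search` callables, the `Turn` record, every function name, and all prompt wording are hypothetical placeholders for whatever model and retrieval backend is actually used.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (assumptions, not from the paper):
LLM = Callable[[str], str]           # prompt -> generated text
Search = Callable[[str], List[str]]  # query -> source snippets


@dataclass
class Turn:
    """One question/answer exchange in a simulated conversation."""
    perspective: str
    question: str
    answer: str
    sources: List[str] = field(default_factory=list)


def discover_perspectives(topic: str, llm: LLM, k: int = 3) -> List[str]:
    """Step 1: ask the model for distinct viewpoints on the topic."""
    reply = llm(f"List {k} distinct perspectives for researching: {topic}")
    cleaned = [line.strip("-* \t") for line in reply.splitlines()]
    return [p for p in cleaned if p][:k]


def simulate_conversation(topic: str, perspective: str, llm: LLM,
                          search: Search, rounds: int = 2) -> List[Turn]:
    """Step 2: a writer with one perspective questions a source-grounded expert."""
    turns: List[Turn] = []
    for _ in range(rounds):
        asked = " | ".join(t.question for t in turns)
        question = llm(
            f"As a writer focused on {perspective}, ask a new question "
            f"about {topic}. Already asked: {asked}"
        )
        sources = search(question)  # ground the expert's answer in retrieval
        answer = llm(f"Answer using only these sources: {sources}\n"
                     f"Question: {question}")
        turns.append(Turn(perspective, question, answer, sources))
    return turns


def build_outline(topic: str, turns: List[Turn], llm: LLM) -> List[str]:
    """Step 3: curate the collected information into section headings."""
    notes = "\n".join(f"Q: {t.question}\nA: {t.answer}" for t in turns)
    reply = llm(f"Write an outline for an article on {topic} "
                f"from these notes:\n{notes}")
    cleaned = [line.strip("-*# \t") for line in reply.splitlines()]
    return [h for h in cleaned if h]


def prewrite(topic: str, llm: LLM, search: Search) -> List[str]:
    """Run all three steps and return the outline headings."""
    turns: List[Turn] = []
    for perspective in discover_perspectives(topic, llm):
        turns.extend(simulate_conversation(topic, perspective, llm, search))
    return build_outline(topic, turns, llm)
```

Passing the model and the search backend in as plain callables keeps the sketch independent of any particular API; a real system would also need to deduplicate questions and track which source supports which claim.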

When a user wants to create a new article, they provide STORM with a topic. The system carries out the pre-writing stage and then writes the article from the resulting outline. To evaluate the pre-writing stage, the researchers formulate outline assessments on the FreshWiki dataset, and they also gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles were deemed organized (a 25% absolute increase) and broad in coverage (a 10% increase).
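The abstract mentions outline assessments for the pre-writing stage but this summary does not say how they are computed. As a purely illustrative stand-in, one could score a generated outline by the fraction of headings in the human-written article that it reproduces. The function names and the exact-match rule below are assumptions for the sake of the example, not the metrics used in the paper.

```python
from typing import Iterable


def normalize(heading: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(heading.lower().split())


def heading_recall(generated: Iterable[str], reference: Iterable[str]) -> float:
    """Fraction of reference headings that also appear in the generated outline.

    Hypothetical exact-match metric; returns 0.0 when there are no
    reference headings to recall.
    """
    ref = {normalize(h) for h in reference}
    gen = {normalize(h) for h in generated}
    if not ref:
        return 0.0
    return len(ref & gen) / len(ref)


generated = ["Background", "Reception", "Legacy"]
reference = ["background", "Plot", "Reception", "Cast"]
print(heading_recall(generated, reference))  # 0.5 (2 of 4 reference headings matched)
```

Exact matching is deliberately crude: it gives no credit for a heading such as "Critical response" when the reference says "Reception", which is why a practical assessment would likely need some form of semantic matching.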

Critical Analysis

The STORM research represents an interesting and potentially valuable application of large language models. By combining retrieval from Internet sources with the generation capabilities of these models, the system can provide a helpful starting point for creating new Wikipedia-style content.

However, the paper also acknowledges some important limitations and areas for further work. Feedback from experienced Wikipedia editors identified new challenges for generating grounded long articles, such as source bias transfer, where bias in the retrieved sources carries over into the article, and over-association of unrelated facts. There are also questions about how well the system would handle more complex or niche topics, where fewer trusted sources may be available to draw from.

Additionally, while STORM can accelerate the article-writing process, there are concerns about the potential for over-reliance on the AI-generated content. It will be important to ensure that human authors remain actively engaged in the writing and editing process, rather than simply adopting the machine-generated text wholesale.

Further research could explore ways to better integrate the human and AI contributions, perhaps by having the language model act more as a collaborator or research assistant than a primary author. Exploring the ethical implications of such systems will also be crucial as they become more prevalent.

Conclusion

Overall, the STORM research represents an intriguing step forward in leveraging large language models to assist in the creation of high-quality, encyclopedic content. By automating some of the initial research and outlining tasks, the system has the potential to make it easier for people to contribute to collaborative knowledge repositories like Wikipedia.

However, the paper also highlights the need to carefully consider the limitations and potential risks of these AI-powered writing tools. As language models continue to advance, it will be important to find the right balance between human and machine contributions, ensuring that the final products maintain reliability, accuracy, and a strong authorial voice.

Simple techniques for enhancing the capabilities of language models, like the ones explored in this research, could play a valuable role in the future of collaborative content creation. But the responsible development and deployment of such systems will be crucial to realize their full potential.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Navigating the Path of Writing: Outline-guided Text Generation with Large Language Models

Yukyung Lee, Soonwon Ka, Bokyung Son, Pilsung Kang, Jaewook Kang

Large Language Models (LLMs) have significantly impacted the writing process, enabling collaborative content creation and enhancing productivity. However, generating high-quality, user-aligned text remains challenging. In this paper, we propose Writing Path, a framework that uses explicit outlines to guide LLMs in generating goal-oriented, high-quality pieces of writing. Our approach draws inspiration from structured writing planning and reasoning paths, focusing on capturing and reflecting user intentions throughout the writing process. We construct a diverse dataset from unstructured blog posts to benchmark writing performance and introduce a comprehensive evaluation framework assessing the quality of outlines and generated texts. Our evaluations with GPT-3.5-turbo, GPT-4, and HyperCLOVA X demonstrate that the Writing Path approach significantly enhances text quality according to both LLMs and human evaluations. This study highlights the potential of integrating writing-specific techniques into LLMs to enhance their ability to meet the diverse writing needs of users.

4/23/2024

cs.CL cs.AI cs.HC

Eliciting Topic Hierarchies from Large Language Models

Grace Li, Tao Long, Lydia B. Chilton

Current research has explored how Generative AI can support the brainstorming process for content creators, but a gap remains in exploring support-tools for the pre-writing process. Specifically, our research is focused on supporting users in finding topics at the right level of specificity for their audience. This process is called topic scoping. Topic scoping is a cognitively demanding task, requiring users to actively recall subtopics in a given domain. This manual approach also reduces the diversity of subtopics that a user is able to explore. We propose using Large Language Models (LLMs) to support the process of topic scoping by iteratively generating subtopics at increasing levels of specificity: dynamically creating topic hierarchies. We tested three different prompting strategies and found that increasing the amount of context included in the prompt improves subtopic generation by 20 percentage points. Finally, we discuss applications of this research in education, content creation, and product management.

6/19/2024

cs.HC

Language-Agnostic Modeling of Wikipedia Articles for Content Quality Assessment across Languages

Paramita Das, Isaac Johnson, Diego Saez-Trumper, Pablo Aragón

Wikipedia is the largest web repository of free knowledge. Volunteer editors devote time and effort to creating and expanding articles in more than 300 language editions. As content quality varies from article to article, editors also spend substantial time rating articles with specific criteria. However, keeping these assessments complete and up-to-date is largely impossible given the ever-changing nature of Wikipedia. To overcome this limitation, we propose a novel computational framework for modeling the quality of Wikipedia articles. State-of-the-art approaches to model Wikipedia article quality have leveraged machine learning techniques with language-specific features. In contrast, our framework is based on language-agnostic structural features extracted from the articles, a set of universal weights, and a language version-specific normalization criterion. Therefore, we ensure that all language editions of Wikipedia can benefit from our framework, even those that do not have their own quality assessment scheme. Using this framework, we have built datasets with the feature values and quality scores of all revisions of all articles in the existing language versions of Wikipedia. We provide a descriptive analysis of these resources and a benchmark of our framework. In addition, we discuss possible downstream tasks to be addressed with these datasets, which are released for public use.

4/16/2024

cs.CY

Can Large Language Models Automatically Score Proficiency of Written Essays?

Watheq Mansour, Salam Albatarni, Sohaila Eltanbouly, Tamer Elsayed

Although several methods were proposed to address the problem of automated essay scoring (AES) in the last 50 years, there is still much to desire in terms of effectiveness. Large Language Models (LLMs) are transformer-based models that demonstrate extraordinary capabilities on various tasks. In this paper, we test the ability of LLMs, given their powerful linguistic knowledge, to analyze and effectively score written essays. We experimented with two popular LLMs, namely ChatGPT and Llama. We aim to check if these models can do this task and, if so, how their performance is positioned among the state-of-the-art (SOTA) models across two levels, holistically and per individual writing trait. We utilized prompt-engineering tactics in designing four different prompts to bring their maximum potential to this task. Our experiments conducted on the ASAP dataset revealed several interesting observations. First, choosing the right prompt depends highly on the model and nature of the task. Second, the two LLMs exhibited comparable average performance in AES, with a slight advantage for ChatGPT. Finally, despite the performance gap between the two LLMs and SOTA models in terms of predictions, they provide feedback to enhance the quality of the essays, which can potentially help both teachers and students.

4/17/2024

cs.CL cs.AI