Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding

arXiv:2408.08506. Published 9/4/2024 by Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen


Overview

This paper proposes Ex3 (Extracting, Excelsior and Expanding), a method for automatically writing long-form novels with large language models (LLMs).
The key ideas are to extract structure information from existing novels, combine it with the novel text to build an instruction-following dataset for fine-tuning an LLM, and then use a tree-like expansion method to generate novels of arbitrary length.
The authors compare Ex3 with previous long-form generation methods and report that it produces higher-quality novels.

Plain English Explanation

The paper presents a system for automatically writing new novels by learning from existing ones. The core idea is to extract the underlying structure of a collection of novels, use that structure to teach a language model how to write, and then have the model grow a story step by step into new, original text.

The first step is to extract structure information from raw novel text: higher-level descriptions of how a book is organized. This might include summaries of plot points, characters and events. The authors use techniques like text summarization to condense long passages into this structure.
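As a rough illustration of the extraction step, the sketch below splits raw novel text into chunks, summarizes each chunk, and then summarizes the summaries to form a two-level structure tree. It is a minimal sketch under stated assumptions, not the authors' method: `StructureNode`, `lead_summary` and `extract_structure` are hypothetical names, and `lead_summary` is a naive "keep the first sentence" placeholder standing in for the LLM or trained summarizer a real system would use.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class StructureNode:
    """One node of a hypothetical novel-structure tree."""
    summary: str
    text: str = ""
    children: list[StructureNode] = field(default_factory=list)


def lead_summary(text: str) -> str:
    """Placeholder summarizer: keep only the first sentence.

    Assumption: a real system would call an LLM or a trained summarizer here.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return sentences[0] if sentences else ""


def extract_structure(novel: str, chunk_sep: str = "\n\n") -> StructureNode:
    """Split raw text into chunks, summarize each, then join the summaries."""
    chunks = [c.strip() for c in novel.split(chunk_sep) if c.strip()]
    children = [StructureNode(summary=lead_summary(c), text=c) for c in chunks]
    # Book-level summary built from the chunk-level summaries.
    book_summary = " ".join(child.summary for child in children)
    return StructureNode(summary=book_summary, children=children)


novel = (
    "Anna arrived in the village at dusk. The inn was full.\n\n"
    "She met the old ferryman. He knew her father."
)
root = extract_structure(novel)
print(root.summary)  # Anna arrived in the village at dusk. She met the old ferryman.
```

Keeping the raw chunk text next to its summary matters for the next stage, where each (summary, text) pair can become a training example.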

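Next, the extracted structure is paired with the original novel text to build an instruction-following dataset, for instance by asking the model to write a passage from its structural description. Fine-tuning an LLM on this dataset is the "Excelsior" step: it aims to make the model's writing more coherent and engaging than that of a general-purpose model.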
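The paper's abstract describes this stage as combining extracted structure with the novel text into an instruction-following dataset used for fine-tuning. The sketch below shows one plausible way to build such records; the exact prompt wording, the inclusion of the previous section's summary as context, and the Alpaca-style `instruction`/`input`/`output` fields are all assumptions, not the authors' format. `build_instruction_examples` and `to_jsonl` are hypothetical helpers.

```python
import json


def build_instruction_examples(
    book_summary: str, sections: list[tuple[str, str]]
) -> list[dict]:
    """Turn (section_summary, section_text) pairs into instruction records.

    Hypothetical format: each record asks the model to write one section given
    the book-level summary, the previous section's summary and the current outline.
    """
    examples = []
    previous = "(start of the novel)"
    for summary, text in sections:
        examples.append({
            "instruction": "Write the next section of the novel following the outline.",
            "input": (
                f"Book summary: {book_summary}\n"
                f"Previous section: {previous}\n"
                f"Outline: {summary}"
            ),
            "output": text,  # the original novel text is the training target
        })
        previous = summary
    return examples


def to_jsonl(examples: list[dict]) -> str:
    """Serialize one JSON object per line, a common fine-tuning input format."""
    return "\n".join(json.dumps(ex, ensure_ascii=False) for ex in examples)


sections = [
    ("Anna arrives.", "Anna arrived in the village at dusk."),
    ("She meets the ferryman.", "She met the old ferryman."),
]
records = build_instruction_examples("Anna returns home.", sections)
print(to_jsonl(records))
```

Passing the previous section's summary is one simple way to expose the model to cross-section context during fine-tuning; the paper may use a different context window.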

Finally, the fine-tuned model writes a new novel through tree-like expansion: a high-level description is expanded into progressively more detailed parts, and repeating this expansion lets the system produce novels of arbitrary length.
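The abstract describes the expansion stage as a tree-like method that supports arbitrarily long novels. A minimal sketch of that idea, under stated assumptions: each node is expanded into children until a chosen depth, and the leaves read in depth-first order form the story, so the length grows roughly as branching^depth. `expand_tree` and `toy_expand` are hypothetical; `toy_expand` is a placeholder for the fine-tuned LLM, which in a real system would likely also receive parent and sibling context to keep the story coherent.

```python
from typing import Callable

# Hypothetical stand-in for the fine-tuned LLM: given an outline and the
# remaining depth, return more detailed sub-outlines (or prose at depth 1).
ExpandFn = Callable[[str, int], list[str]]


def expand_tree(outline: str, depth: int, expand: ExpandFn) -> list[str]:
    """Depth-first tree expansion; returns the leaf nodes in story order."""
    if depth == 0:
        return [outline]
    leaves: list[str] = []
    for child in expand(outline, depth):
        leaves.extend(expand_tree(child, depth - 1, expand))
    return leaves


def toy_expand(outline: str, depth: int) -> list[str]:
    # Placeholder: split every outline into two numbered parts.
    # The depth argument is unused here; a real model could use it to switch
    # from writing sub-outlines to writing prose at the last level.
    return [f"{outline}.{i}" for i in (1, 2)]


passages = expand_tree("Premise", depth=3, expand=toy_expand)
print(len(passages))  # 8
print(passages[:2])   # ['Premise.1.1.1', 'Premise.1.1.2']
```

Increasing `depth` by one doubles the number of leaves with this toy branching factor, which is why a tree structure makes the target length a tunable parameter rather than a limit of the model's context.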

The key motivation is that earlier plan-then-write approaches could produce novel-length text, but their plots lacked logical coherence and appeal, and their depiction of characters and events was weak. By learning from the structure of real novels, Ex3 aims to close this quality gap. This could be useful for applications like creative writing assistance or novel prototyping.

Technical Explanation

The authors propose a three-stage framework called "Ex3" for automatic novel writing:

Extraction: The system first extracts structure information from raw novel data, using techniques like text summarization. This structure describes the novel at a higher level than the raw text.
Excelsior: The extracted structure is combined with the original novel text to construct an instruction-following dataset, which is then used to fine-tune an LLM for higher ("excelsior") generation quality.
Expansion: Finally, a tree-like expansion method uses the fine-tuned model to grow a story from a high-level description into progressively more detailed text, enabling the generation of arbitrarily long novels.

The authors evaluate their approach against previous methods for long-form novel generation and report that Ex3 produces higher-quality long-form novels.

Critical Analysis

The authors acknowledge several limitations and areas for future work. For example, the quality of the generated text is still imperfect and may require further refinement. Additionally, the model learns from a fixed corpus of existing novels, so its output is shaped by the styles and structures present in that training data.

One potential issue is that by heavily leveraging existing material, the system may struggle to create truly original narratives and characters. There is a risk of simply recombining common tropes and plotlines in predictable ways.

Further research could explore ways to increase the creativity and divergence of the generated text, perhaps by incorporating more diverse sources of inspiration or techniques for open-ended content generation. Integrating the system with interactive writing tools could also enhance the user experience and enable more collaborative forms of novel creation.

Conclusion

This paper presents a promising approach to automating novel writing: extracting structure from existing novels, fine-tuning an LLM on instruction data built from that structure, and expanding stories in a tree-like fashion. While not a complete solution, the Ex3 framework demonstrates the potential for AI-assisted creativity in fiction writing.

As language models and text generation techniques continue to advance, systems like this could become powerful tools for writers, enabling faster prototyping, ideation, and even collaborative authorship. However, care must be taken to ensure the generated content retains a sense of originality and avoids simply regurgitating common tropes.

Overall, this research is a step toward AI systems that can write long-form fiction with coherent plots and well-developed characters.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper (arXiv:2408.08506) for details.


Related Papers

Ex3: Automatic Novel Writing by Extracting, Excelsior and Expanding

Lei Huang, Jiaming Guo, Guanhua He, Xishan Zhang, Rui Zhang, Shaohui Peng, Shaoli Liu, Tianshi Chen

Generating long-term texts such as novels using artificial intelligence has always been a challenge. A common approach is to use large language models (LLMs) to construct a hierarchical framework that first plans and then writes. Despite the fact that the generated novels reach a sufficient length, they exhibit poor logical coherence and appeal in their plots and deficiencies in character and event depiction, ultimately compromising the overall narrative quality. In this paper, we propose a method named Extracting, Excelsior and Expanding (Ex3). Ex3 initially extracts structure information from raw novel data. By combining this structure information with the novel data, an instruction-following dataset is meticulously crafted. This dataset is then utilized to fine-tune the LLM, aiming for excelsior generation performance. In the final stage, a tree-like expansion method is deployed to facilitate the generation of arbitrarily long novels. Evaluation against previous methods showcases Ex3's ability to produce higher-quality long-form novels.

9/4/2024

Scaling Up Summarization: Leveraging Large Language Models for Long Text Extractive Summarization

Léo Hemamou, Mehdi Debiane

In an era where digital text is proliferating at an unprecedented rate, efficient summarization tools are becoming indispensable. While Large Language Models (LLMs) have been successfully applied in various NLP tasks, their role in extractive text summarization remains underexplored. This paper introduces EYEGLAXS (Easy Yet Efficient larGe LAnguage model for eXtractive Summarization), a framework that leverages LLMs, specifically LLAMA2-7B and ChatGLM2-6B, for extractive summarization of lengthy text documents. Instead of abstractive methods, which often suffer from issues like factual inaccuracies and hallucinations, EYEGLAXS focuses on extractive summarization to ensure factual and grammatical integrity. Utilizing state-of-the-art techniques such as Flash Attention and Parameter-Efficient Fine-Tuning (PEFT), EYEGLAXS addresses the computational and resource challenges typically associated with LLMs. The system sets new performance benchmarks on well-known datasets like PubMed and ArXiv. Furthermore, we extend our research through additional analyses that explore the adaptability of LLMs in handling different sequence lengths and their efficiency in training on smaller datasets. These contributions not only set a new standard in the field but also open up promising avenues for future research in extractive text summarization.

8/29/2024


Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

Bernd Bohnet, Kevin Swersky, Rosanne Liu, Pranjal Awasthi, Azade Nova, Javier Snaider, Hanie Sedghi, Aaron T Parisi, Michael Collins, Angeliki Lazaridou, Orhan Firat, Noah Fiedel

We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing, but the emergence of transformers with a context size of 1 million or more tokens now enables entirely automatic approaches. Our objective is to test the capabilities of LLMs to analyze, understand, and reason over problems that require a detailed comprehension of long spans of text, such as questions involving character arcs, broader themes, or the consequences of early actions later in the story. We propose a holistic pipeline for automatic data generation including question generation, answering, and model scoring using an "Evaluator". We find that a relative approach, comparing answers between models in a pairwise fashion and ranking with a Bradley-Terry model, provides a more consistent and differentiating scoring mechanism than an absolute scorer that rates answers individually. We also show that LLMs from different model families produce moderate agreement in their ratings. We ground our approach using the manually curated NarrativeQA dataset, where our evaluator shows excellent agreement with human judgement and even finds errors in the dataset. Using our automatic evaluation approach, we show that using an entire book as context produces superior reading comprehension performance compared to baseline no-context (parametric knowledge only) and retrieval-based approaches.

6/4/2024

A System for Automatic English Text Expansion

Silvia García Méndez, Milagros Fernández Gavilanes, Enrique Costa Montenegro, Jonathan Juncal Martínez, Francisco Javier González Castaño, Ehud Reiter

We present an automatic text expansion system to generate English sentences, which performs automatic Natural Language Generation (NLG) by combining linguistic rules with statistical approaches. Here, automatic means that the system can generate coherent and correct sentences from a minimum set of words. From its inception, the design is modular and adaptable to other languages. This adaptability is one of its greatest advantages. For English, we have created the highly precise aLexiE lexicon with wide coverage, which represents a contribution on its own. We have evaluated the resulting NLG library in an Augmentative and Alternative Communication (AAC) proof of concept, both directly (by regenerating corpus sentences) and manually (from annotations) using a popular corpus in the NLG field. We performed a second analysis by comparing the quality of text expansion in English to Spanish, using an ad-hoc Spanish-English parallel corpus. The system might also be applied to other domains such as report and news generation.

5/29/2024