Evaluating Diversity in Automatic Poetry Generation

Read original: arXiv:2406.15267 - Published 6/24/2024 by Yanran Chen, Hannes Groner, Sina Zarrieß, Steffen Eger

Overview

• This paper evaluates the diversity of automatic poetry generation models by comparing the poems they generate with human-written poetry along structural, lexical, semantic, and stylistic dimensions.

• The researchers analyze several automatic poetry generation models, including ones based on large language models, to assess their output diversity.

• The paper explores how factors like training data size, model architecture, and decoding methods can impact the diversity of the generated poems.

Plain English Explanation

Automatic poetry generation is an area of artificial intelligence (AI) research that aims to create computer programs capable of generating original poems. This paper looks at how well these AI poetry generators are able to produce a diverse range of unique and novel poems.

The researchers analyze several different automatic poetry generation models, including ones that use large language models, to assess how diverse the poems they create are. They examine how factors like the size of the training data, the structure of the AI model, and the techniques used to generate the final poems can influence the diversity of the output.

The goal is to better understand the capabilities and limitations of current automatic poetry generation systems, and identify ways to improve their ability to generate a wide variety of creative and original poems.

Technical Explanation

The paper investigates the diversity of automatic poetry generation models by comparing the distribution of generated poems with that of human-written poems along structural, lexical, semantic, and stylistic dimensions. The researchers evaluate several model types: word-level and character-level models, and general-purpose LLMs (including LLaMA3) alongside poetry-specific models.

Key factors examined include the size of the training dataset, the model architecture, and the decoding methods used to generate the final poems. The researchers use a combination of automatic and human evaluation metrics to assess the diversity of the generated poems.
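To make the idea of an automatic diversity metric concrete, here is a minimal sketch of two widely used lexical measures: distinct-n (the share of n-grams in a set of poems that are unique) and a novelty ratio (the share of generated n-grams that never appear in a reference corpus). This is an illustration only and not necessarily the metric set used in the paper; the function names `distinct_n` and `novel_n`, the whitespace tokenization, and the toy poems are all assumptions made for the example.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(poems: Iterable[str], n: int = 2) -> float:
    """Fraction of n-grams across all poems that are unique (distinct-n).

    Tokenization is naive lowercase whitespace splitting (an assumption).
    """
    all_ngrams: List[NGram] = []
    for poem in poems:
        all_ngrams.extend(ngrams(poem.lower().split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def novel_n(generated: Iterable[str], reference: Iterable[str], n: int = 2) -> float:
    """Fraction of distinct generated n-grams that do not occur in the reference poems."""
    seen: Set[NGram] = set()
    for poem in reference:
        seen.update(ngrams(poem.lower().split(), n))
    produced: Set[NGram] = set()
    for poem in generated:
        produced.update(ngrams(poem.lower().split(), n))
    if not produced:
        return 0.0
    return len(produced - seen) / len(produced)


# Toy data, invented for illustration only.
human = ["the rose is red", "the sky is blue"]
repetitive = ["the rose is red", "the rose is red"]

print(distinct_n(human, 2))           # every bigram occurs once
print(distinct_n(repetitive, 2))      # each bigram occurs twice
print(novel_n(repetitive, human, 2))  # nothing new relative to the reference
```

A system that repeats itself scores low on distinct-n, and one that copies its reference corpus scores low on novelty. Neither measure says anything about rhyme, meter, or meaning, which is one reason lexical metrics are usually combined with structural and semantic comparisons.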

Critical Analysis

The paper provides a thorough evaluation of diversity in automatic poetry generation, but there are some potential limitations to consider. The analysis is focused on a specific set of poetry generation models, and the results may not generalize to other approaches or future advancements in the field.

Additionally, the human evaluation of poem diversity could be subjective, and the automatic metrics used may not fully capture all nuances of creative expression. Further research is needed to develop more robust and comprehensive ways to assess the diversity of machine-generated poetry.

The paper also does not address potential biases or representational issues in the training data and models, which could impact the diversity and inclusivity of the generated poems. Comparing machine-generated with user-generated content could be an interesting area for future exploration.

Conclusion

This paper offers valuable insights into the current state of automatic poetry generation and the factors that influence the diversity of the output. The findings suggest that while progress has been made, there is still room for improvement in developing AI systems that can reliably produce a wide range of unique and creative poems.

The research highlights the importance of carefully designing poetry generation models and training approaches to maximize diversity, and the need for more robust evaluation methods to assess the creative capabilities of these systems. As the field of AI poetry generation continues to evolve, addressing these challenges will be crucial for unlocking the full potential of machine-generated verse.

Related Papers

Evaluating Diversity in Automatic Poetry Generation

Yanran Chen, Hannes Groner, Sina Zarrieß, Steffen Eger

Natural Language Generation (NLG), and more generally generative AI, are among the currently most impactful research fields. Creative NLG, such as automatic poetry generation, is a fascinating niche in this area. While most previous research has focused on forms of the Turing test when evaluating automatic poetry generation - can humans distinguish between automatic and human generated poetry - we evaluate the diversity of automatically generated poetry, by comparing distributions of generated poetry to distributions of human poetry along structural, lexical, semantic and stylistic dimensions, assessing different model types (word vs. character-level, general purpose LLMs vs. poetry-specific models), including the very recent LLaMA3, and types of fine-tuning (conditioned vs. unconditioned). We find that current automatic poetry systems are considerably underdiverse along multiple dimensions - they often do not rhyme sufficiently, are semantically too uniform and even do not match the length distribution of human poetry. Our experiments reveal, however, that style-conditioning and character-level modeling clearly increases diversity across virtually all dimensions we explore. Our identified limitations may serve as the basis for more genuinely diverse future poetry generation models.

6/24/2024

LLM-based multi-agent poetry generation in non-cooperative environments

Ran Zhang, Steffen Eger

Despite substantial progress of large language models (LLMs) for automatic poetry generation, the generated poetry lacks diversity while the training process differs greatly from human learning. Under the rationale that the learning process of the poetry generation systems should be more human-like and their output more diverse and novel, we introduce a framework based on social learning where we emphasize non-cooperative interactions besides cooperative interactions to encourage diversity. Our experiments are the first attempt at LLM-based multi-agent systems in non-cooperative environments for poetry generation employing both TRAINING-BASED agents (GPT-2) and PROMPTING-BASED agents (GPT-3 and GPT-4). Our evaluation based on 96k generated poems shows that our framework benefits the poetry generation process for TRAINING-BASED agents resulting in 1) a 3.0-3.7 percentage point (pp) increase in diversity and a 5.6-11.3 pp increase in novelty according to distinct and novel n-grams. The generated poetry from TRAINING-BASED agents also exhibits group divergence in terms of lexicons, styles and semantics. PROMPTING-BASED agents in our framework also benefit from non-cooperative environments and a more diverse ensemble of models with non-homogeneous agents has the potential to further enhance diversity, with an increase of 7.0-17.5 pp according to our experiments. However, PROMPTING-BASED agents show a decrease in lexical diversity over time and do not exhibit the group-based divergence intended in the social network. Our paper argues for a paradigm shift in creative tasks such as automatic poetry generation to include social learning processes (via LLM-based agent modeling) similar to human interaction.

9/9/2024

Automatic Metrics in Natural Language Generation: A Survey of Current Evaluation Practices

Patrícia Schmidtová, Saad Mahamood, Simone Balloccu, Ondřej Dušek, Albert Gatt, Dimitra Gkatzia, David M. Howcroft, Ondřej Plátek, Adarsa Sivaprasad

Automatic metrics are extensively used to evaluate natural language processing systems. However, there has been increasing focus on how they are used and reported by practitioners within the field. In this paper, we have conducted a survey on the use of automatic metrics, focusing particularly on natural language generation (NLG) tasks. We inspect which metrics are used as well as why they are chosen and how their use is reported. Our findings from this survey reveal significant shortcomings, including inappropriate metric usage, lack of implementation details and missing correlations with human judgements. We conclude with recommendations that we believe authors should follow to enable more rigour within the field.

8/20/2024

Decoding AI and Human Authorship: Nuances Revealed Through NLP and Statistical Analysis

Mayowa Akinwande, Oluwaseyi Adeliyi, Toyyibat Yussuph

This research explores the nuanced differences in texts produced by AI and those written by humans, aiming to elucidate how language is expressed differently by AI and humans. Through comprehensive statistical data analysis, the study investigates various linguistic traits, patterns of creativity, and potential biases inherent in human-written and AI-generated texts. The significance of this research lies in its contribution to understanding AI's creative capabilities and its impact on literature, communication, and societal frameworks. By examining a meticulously curated dataset comprising 500K essays spanning diverse topics and genres, generated by LLMs or written by humans, the study uncovers the deeper layers of linguistic expression and provides insights into the cognitive processes underlying both AI and human-driven textual compositions. The analysis revealed that human-authored essays tend to have a higher total word count on average than AI-generated essays but a shorter average word length, and while both groups exhibit high levels of fluency, the vocabulary diversity of human-authored content is higher than that of AI-generated content. However, AI-generated essays show a slightly higher level of novelty, suggesting the potential for generating more original content through AI systems. The paper addresses challenges in assessing the language generation capabilities of AI models and emphasizes the importance of datasets that reflect the complexities of human-AI collaborative writing. Through systematic preprocessing and rigorous statistical analysis, this study offers valuable insights into the evolving landscape of AI-generated content and informs future developments in natural language processing (NLP).

8/6/2024