LLM-based multi-agent poetry generation in non-cooperative environments

Read original: arXiv:2409.03659 - Published 9/9/2024 by Ran Zhang, Steffen Eger

Overview

The paper studies large language models (LLMs) for multi-agent poetry generation, using a social learning framework that emphasizes non-cooperative interactions alongside cooperative ones.
It tests both training-based agents (GPT-2) and prompting-based agents (GPT-3 and GPT-4), with the aim of making generated poetry more diverse and novel.
The authors argue that creative tasks such as automatic poetry generation should include social learning processes similar to human interaction.

Plain English Explanation

The paper describes a system in which multiple AI agents, each built on a large language model, generate poetry while interacting with one another. Some of those interactions are cooperative, and others are deliberately non-cooperative.

The motivation is that LLM-generated poetry tends to lack diversity and that these models learn very differently from humans. Human learning is social, so the researchers build both cooperative and non-cooperative interaction into the generation process.

The key idea is that non-cooperative interaction can push agents away from one another, leading to more diverse and novel poetry. It also raises questions about the role of AI in creative work and whether AI-generated poetry can be considered on par with human-written poetry.

Technical Explanation

The paper presents a social learning framework for multi-agent poetry generation that emphasizes non-cooperative interactions in addition to cooperative ones. It uses two kinds of agents: training-based agents (GPT-2) and prompting-based agents (GPT-3 and GPT-4).

The authors describe their experiments as the first attempt at LLM-based multi-agent systems in non-cooperative environments for poetry generation. Diversity and novelty are measured with distinct and novel n-grams over 96k generated poems.

For training-based agents, the framework yields a 3.0-3.7 percentage point (pp) increase in diversity and a 5.6-11.3 pp increase in novelty, and the generated poetry shows group divergence in lexicon, style, and semantics. Prompting-based agents also benefit from non-cooperative environments, and a more diverse ensemble of non-homogeneous agents has the potential to raise diversity further, by 7.0-17.5 pp in the reported experiments. However, prompting-based agents lose lexical diversity over time and do not show the group-based divergence intended in the social network.
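
The paper's abstract reports diversity and novelty "according to distinct and novel n-grams". The following is a minimal Python sketch of how such metrics are commonly computed, not the authors' evaluation code. The function names (`distinct_n`, `novel_n`), the lowercased whitespace tokenization, and the choice to measure novelty over unique n-gram types are all assumptions made for illustration.

```python
from typing import Iterable, List, Tuple

Ngram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[Ngram]:
    """Return all contiguous n-grams of a token list, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tokenize(poem: str) -> List[str]:
    """Assumed tokenizer: lowercase and split on whitespace."""
    return poem.lower().split()


def distinct_n(poems: Iterable[str], n: int) -> float:
    """Diversity: unique n-grams divided by total n-grams over all poems."""
    all_ngrams = [g for poem in poems for g in ngrams(tokenize(poem), n)]
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


def novel_n(poems: Iterable[str], reference: Iterable[str], n: int) -> float:
    """Novelty: share of generated n-gram types absent from a reference corpus."""
    seen = {g for poem in reference for g in ngrams(tokenize(poem), n)}
    generated = {g for poem in poems for g in ngrams(tokenize(poem), n)}
    if not generated:
        return 0.0
    return len(generated - seen) / len(generated)


if __name__ == "__main__":
    # Hypothetical toy data, only to show how the functions are called.
    generated_poems = ["the rose is red", "the rose is blue"]
    training_poems = ["the rose is red"]
    print(distinct_n(generated_poems, 2))
    print(novel_n(generated_poems, training_poems, 2))
```

Both scores lie between 0 and 1, so a percentage point change is the difference between two scores multiplied by 100. For example, a hypothetical move from 0.60 to 0.63 in distinct-2 would be a 3.0 pp increase.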

Critical Analysis

The paper presents an intriguing concept, but it has limitations that call for further research. The reported gains are clearest for training-based agents; prompting-based agents show a decrease in lexical diversity over time and do not exhibit the intended group-based divergence. There are also concerns about the authenticity and emotional depth of AI-generated poetry compared to human-written works.

Additionally, the long-term implications of AI systems competing to create content are not fully explored. There are ethical questions around the role of AI in creative domains and whether it could lead to the devaluation of human artistic expression.

The researchers acknowledge these concerns and suggest the need for further study on the social and cultural impacts of AI-driven poetry generation, as well as the development of techniques to ensure the responsible deployment of such systems.

Conclusion

This paper explores the intersection of large language models, multi-agent systems, and creative expression. By placing LLM agents in non-cooperative as well as cooperative interactions, the researchers show that social learning can make generated poetry more diverse and novel, most clearly for training-based agents.

However, this research also raises important questions about the role of AI in creative domains and the long-term societal implications of AI-driven content creation. As the field of generative AI continues to evolve, it will be crucial to carefully consider the ethical and cultural ramifications of these technologies.

This summary was produced with help from an AI and may contain inaccuracies. Consult the original paper (arXiv:2409.03659) for details.

Related Papers

LLM-based multi-agent poetry generation in non-cooperative environments

Ran Zhang, Steffen Eger

Despite substantial progress of large language models (LLMs) for automatic poetry generation, the generated poetry lacks diversity while the training process differs greatly from human learning. Under the rationale that the learning process of the poetry generation systems should be more human-like and their output more diverse and novel, we introduce a framework based on social learning where we emphasize non-cooperative interactions besides cooperative interactions to encourage diversity. Our experiments are the first attempt at LLM-based multi-agent systems in non-cooperative environments for poetry generation employing both TRAINING-BASED agents (GPT-2) and PROMPTING-BASED agents (GPT-3 and GPT-4). Our evaluation based on 96k generated poems shows that our framework benefits the poetry generation process for TRAINING-BASED agents resulting in 1) a 3.0-3.7 percentage point (pp) increase in diversity and a 5.6-11.3 pp increase in novelty according to distinct and novel n-grams. The generated poetry from TRAINING-BASED agents also exhibits group divergence in terms of lexicons, styles and semantics. PROMPTING-BASED agents in our framework also benefit from non-cooperative environments and a more diverse ensemble of models with non-homogeneous agents has the potential to further enhance diversity, with an increase of 7.0-17.5 pp according to our experiments. However, PROMPTING-BASED agents show a decrease in lexical diversity over time and do not exhibit the group-based divergence intended in the social network. Our paper argues for a paradigm shift in creative tasks such as automatic poetry generation to include social learning processes (via LLM-based agent modeling) similar to human interaction.

9/9/2024

LLM-POET: Evolving Complex Environments using Large Language Models

Fuma Aki, Riku Ikeda, Takumi Saito, Ciaran Regan, Mizuki Oka

Creating systems capable of generating virtually infinite variations of complex and novel behaviour without predetermined goals or limits is a major challenge in the field of AI. This challenge has been addressed through the development of several open-ended algorithms that can continuously generate new and diverse behaviours, such as the POET and Enhanced-POET algorithms for co-evolving environments and agent behaviour. One of the challenges with existing methods however, is that they struggle to continuously generate complex environments. In this work, we propose LLM-POET, a modification of the POET algorithm where the environment is both created and mutated using a Large Language Model (LLM). By fine-tuning a LLM with text representations of Evolution Gym environments and captions that describe the environment, we were able to generate complex and diverse environments using natural language. We found that not only could the LLM produce a diverse range of environments, but compared to the CPPNs used in Enhanced-POET for environment generation, the LLM allowed for a 34% increase in the performance gain of co-evolution. This increased performance suggests that the agents were able to learn a more diverse set of skills by training on more complex environments.

6/10/2024

Evaluating Diversity in Automatic Poetry Generation

Yanran Chen, Hannes Groner, Sina Zarrieß, Steffen Eger

Natural Language Generation (NLG), and more generally generative AI, are among the currently most impactful research fields. Creative NLG, such as automatic poetry generation, is a fascinating niche in this area. While most previous research has focused on forms of the Turing test when evaluating automatic poetry generation - can humans distinguish between automatic and human generated poetry - we evaluate the diversity of automatically generated poetry, by comparing distributions of generated poetry to distributions of human poetry along structural, lexical, semantic and stylistic dimensions, assessing different model types (word vs. character-level, general purpose LLMs vs. poetry-specific models), including the very recent LLaMA3, and types of fine-tuning (conditioned vs. unconditioned). We find that current automatic poetry systems are considerably underdiverse along multiple dimensions - they often do not rhyme sufficiently, are semantically too uniform and even do not match the length distribution of human poetry. Our experiments reveal, however, that style-conditioning and character-level modeling clearly increases diversity across virtually all dimensions we explore. Our identified limitations may serve as the basis for more genuinely diverse future poetry generation models.

6/24/2024

The Fellowship of the LLMs: Multi-Agent Workflows for Synthetic Preference Optimization Dataset Generation

Samee Arif, Sualeha Farid, Abdul Hameed Azeemi, Awais Athar, Agha Ali Raza

This paper presents synthetic Preference Optimization (PO) datasets generated using multi-agent workflows and evaluates the effectiveness and potential of these workflows in the dataset generation process. PO dataset generation requires two modules: (1) response evaluation, and (2) response generation. In the response evaluation module, the responses from Large Language Models (LLMs) are evaluated and ranked - a task typically carried out by human annotators that we automate using LLMs. We assess the response evaluation module in a 2 step process. In step 1, we assess LLMs as evaluators using three distinct prompting strategies. In step 2, we apply the winning prompting strategy to compare the performance of LLM-as-a-Judge, LLMs-as-a-Jury, and LLM Debate. In each step, we use inter-rater agreement using Cohen's Kappa between human annotators and LLMs. For the response generation module, we compare different configurations for the LLM Feedback Loop using the identified LLM evaluator configuration. We use the win rate (the fraction of times a generation framework is selected as the best by an LLM evaluator) to determine the best multi-agent configuration for generation. After identifying the best configurations for both modules, we use models from the GPT, Gemma, and Llama families to generate our PO datasets using the above pipeline. We generate two types of PO datasets, one to improve the generation capabilities of individual LLM and the other to improve the multi-agent workflow. Our evaluation shows that GPT-4o-as-a-Judge is more consistent across datasets when the candidate responses do not include responses from the GPT family. Additionally, we find that the LLM Feedback Loop, with Llama as the generator and Gemma as the reviewer, achieves a notable 71.8% and 73.8% win rate over single-agent Llama and Gemma, respectively.

9/10/2024