Designing and Evaluating Dialogue LLMs for Co-Creative Improvised Theatre

2405.07111

Published 5/14/2024 by Boyd Branch, Piotr Mirowski, Kory Mathewson, Sophia Ppali, Alexandra Covaci

Abstract

Social robotics researchers are increasingly interested in multi-party trained conversational agents. With a growing demand for real-world evaluations, our study presents Large Language Models (LLMs) deployed in a month-long live show at the Edinburgh Festival Fringe. This case study investigates human improvisers co-creating with conversational agents in a professional theatre setting. We explore the technical capabilities and constraints of on-the-spot multi-party dialogue, providing comprehensive insights from both audience and performer experiences with AI on stage. Our human-in-the-loop methodology underlines the challenges of these LLMs in generating context-relevant responses, stressing the user interface's crucial role. Audience feedback indicates an evolving interest for AI-driven live entertainment, direct human-AI interaction, and a diverse range of expectations about AI's conversational competence and utility as a creativity support tool. Human performers express immense enthusiasm, varied satisfaction, and the evolving public opinion highlights mixed emotions about AI's role in arts.

Overview

  • This paper explores the design and evaluation of large language models (LLMs) for co-creative improvised theatre performances.
  • The researchers investigate how LLMs can be integrated into interactive theatre experiences and assess their ability to engage in human-like dialogue.
  • The study involves developing and testing prototype LLM-based systems within an immersive theatre festival, examining their performance and the audience's reactions.

Plain English Explanation

The researchers in this paper are looking at how they can use large language models (LLMs) - powerful AI systems that can generate human-like text - to create interactive theatre experiences. The idea is to have the LLMs act as characters that can improvise and engage in dialogue with human actors and audience members.

The researchers developed prototype LLM-based systems and tested them in a live show at the Edinburgh Festival Fringe, where the AI characters interacted with the audience and human performers. They wanted to see how well the LLMs could hold up their end of the conversation and whether the audience found the experience engaging and believable.

The key goals were to figure out how to design these AI theatre partners effectively and to evaluate how well they perform in a real-world, interactive setting. This research could help pave the way for more sophisticated and immersive AI-powered theatrical experiences in the future.

Technical Explanation

The paper describes the researchers' efforts to design and evaluate dialogue-based LLMs for use in co-creative improvised theatre performances. They developed prototype LLM-based systems and deployed them in a month-long live show at the Edinburgh Festival Fringe, where the AI characters engaged in conversational interactions with both human actors and audience members.

The researchers examined various design considerations for these AI theatre partners, such as [link: https://aimodels.fyi/papers/arxiv/dialogbench-evaluating-llms-as-human-like-dialogue]how to make the language model responses more human-like and engaging[/link]. They also looked at [link: https://aimodels.fyi/papers/arxiv/llm-discussion-enhancing-creativity-large-language-models]techniques for enhancing the creativity and contextual awareness of the LLMs[/link] to enable more natural and coherent dialogue.

To evaluate the performance of the systems, the researchers studied [link: https://aimodels.fyi/papers/arxiv/large-language-model-based-situational-dialogues-second]how well the LLMs handled different dialogue scenarios[/link] and [link: https://aimodels.fyi/papers/arxiv/is-this-real-life-is-this-just]the audience's perceptions of the AI's believability and engagement[/link]. They also examined [link: https://aimodels.fyi/papers/arxiv/exploring-autonomous-agents-through-lens-large-language]the broader implications of using LLMs in interactive, autonomous agents[/link].

Overall, the study provides insights into the design considerations and performance characteristics of dialogue-based LLMs within the context of co-creative, improvised theatre experiences.
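The abstract refers to a human-in-the-loop methodology and stresses the role of the user interface. The sketch below illustrates one way such a loop could be organised; it is not the authors' system. All names (`Scene`, `ai_turn`, `stub_generate`) and the six-turn context window are assumptions made for illustration, and the generator is a stand-in stub rather than a real language model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence


@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class Scene:
    """Rolling multi-party transcript shared by performers and the AI."""
    turns: List[Turn] = field(default_factory=list)
    max_context_turns: int = 6  # hypothetical window size, not from the paper

    def add(self, speaker: str, text: str) -> None:
        self.turns.append(Turn(speaker, text))

    def prompt(self, ai_name: str) -> str:
        # Keep only the most recent turns, then cue the AI character to speak.
        recent = self.turns[-self.max_context_turns:]
        lines = [f"{t.speaker}: {t.text}" for t in recent]
        lines.append(f"{ai_name}:")
        return "\n".join(lines)


# A generator maps (prompt, n) to n candidate lines; a chooser is the human
# operator's decision, returning the index of the line to perform.
Generator = Callable[[str, int], Sequence[str]]
Chooser = Callable[[Sequence[str]], int]


def ai_turn(scene: Scene, ai_name: str, generate: Generator,
            choose: Chooser, n_candidates: int = 3) -> Optional[str]:
    """Generate candidates, let a human pick one, and record it in the scene.

    An out-of-range index is treated as a veto: nothing is spoken.
    """
    candidates = list(generate(scene.prompt(ai_name), n_candidates))
    if not candidates:
        return None
    index = choose(candidates)
    if not 0 <= index < len(candidates):
        return None
    line = candidates[index]
    scene.add(ai_name, line)
    return line


def stub_generate(prompt: str, n: int) -> List[str]:
    """Stand-in for a language model: riffs on the last spoken line."""
    lines = prompt.splitlines()
    last = lines[-2].split(": ", 1)[1] if len(lines) >= 2 else ""
    return [f"Yes, and... (option {i + 1}) {last}" for i in range(n)]


if __name__ == "__main__":
    scene = Scene()
    scene.add("Alex", "We are on a ship.")
    # Here the "operator" always picks the second candidate.
    print(ai_turn(scene, "Robot", stub_generate, lambda c: 1))
```

Keeping the operator's choice as a plain callable means the same loop can be driven by a console prompt, a tablet interface, or a scripted test, which is one reason the interface can be swapped without touching the dialogue logic.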

Critical Analysis

The paper provides a thoughtful exploration of the challenges and potential of using LLMs for interactive theatre, but it also acknowledges several limitations and areas for further research.

One key limitation is the relatively small scale and controlled nature of the theatre festival setting used for evaluation. The researchers note that more research is needed to understand how these AI theatre partners would perform in larger-scale, less structured interactions with the public.

Additionally, the paper touches on concerns around the ethical implications of using LLMs in interactive, autonomous agents. While the researchers aimed to design the systems with transparency and user safety in mind, there are still open questions about issues like bias, accountability, and the potential for misuse that warrant further investigation.

Overall, the study represents an important step towards understanding how LLMs can be leveraged for immersive, co-creative experiences. However, the researchers rightly emphasize the need for continued exploration and refinement of these technologies to address the remaining challenges and concerns.

Conclusion

This paper presents a detailed exploration of the design and evaluation of dialogue-based LLMs for use in co-creative, improvised theatre performances. The researchers developed prototype systems and integrated them into an interactive theatre festival, allowing them to assess the performance of the AI theatre partners and the audience's perceptions.

The findings offer valuable insights into the design considerations and technical capabilities of these LLM-powered systems, as well as the opportunities and challenges associated with integrating them into live, interactive experiences. While the study demonstrates the potential of using LLMs to enable more engaging and immersive theatrical performances, it also highlights the need for further research to address the remaining technical, ethical, and practical concerns.

Overall, this work represents an important contribution to the growing field of AI-powered interactive and creative applications, and it lays the groundwork for future advancements in the use of LLMs for innovative, human-centered experiences.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DialogBench: Evaluating LLMs as Human-like Dialogue Systems

Jiao Ou, Junda Lu, Che Liu, Yihong Tang, Fuzheng Zhang, Di Zhang, Kun Gai

Large language models (LLMs) have achieved remarkable breakthroughs in new dialogue capabilities by leveraging instruction tuning, which refreshes human impressions of dialogue systems. The long-standing goal of dialogue systems is to be human-like enough to establish long-term connections with users. Therefore, there has been an urgent need to evaluate LLMs as human-like dialogue systems. In this paper, we propose DialogBench, a dialogue evaluation benchmark that contains 12 dialogue tasks to probe the capabilities that LLMs should have as human-like dialogue systems. Specifically, we prompt GPT-4 to generate evaluation instances for each task. We first design the basic prompt based on widely used design principles and further mitigate the existing biases to generate higher-quality evaluation instances. Our extensive tests on English and Chinese DialogBench of 26 LLMs show that instruction tuning improves the human likeness of LLMs to a certain extent, but most LLMs still have much room for improvement as human-like dialogue systems. Interestingly, results also show that the positioning of assistant AI can make instruction tuning weaken the human emotional perception of LLMs and their mastery of information about human daily life.

4/1/2024

A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians

Piotr Wojciech Mirowski, Juliette Love, Kory W. Mathewson, Shakir Mohamed

We interviewed twenty professional comedians who perform live shows in front of audiences and who use artificial intelligence in their artistic process as part of 3-hour workshops on "AI x Comedy" conducted at the Edinburgh Festival Fringe in August 2023 and online. The workshop consisted of a comedy writing session with large language models (LLMs), a human-computer interaction questionnaire to assess the Creativity Support Index of AI as a writing tool, and a focus group interrogating the comedians' motivations for and processes of using AI, as well as their ethical concerns about bias, censorship and copyright. Participants noted that existing moderation strategies used in safety filtering and instruction-tuned LLMs reinforced hegemonic viewpoints by erasing minority groups and their perspectives, and qualified this as a form of censorship. At the same time, most participants felt the LLMs did not succeed as a creativity support tool, by producing bland and biased comedy tropes, akin to "cruise ship comedy material from the 1950s, but a bit less racist". Our work extends scholarship about the subtle difference between, on the one hand, harmful speech, and on the other hand, "offensive" language as a practice of resistance, satire and "punching up". We also interrogate the global value alignment behind such language models, and discuss the importance of community-based value alignment and data ownership to build AI tools that better suit artists' needs.

6/5/2024

LLM Discussion: Enhancing the Creativity of Large Language Models via Discussion Framework and Role-Play

Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, Shao-Hua Sun

Large language models (LLMs) have shown exceptional proficiency in natural language processing but often fall short of generating creative and original responses to open-ended questions. To enhance LLM creativity, our key insight is to emulate the human process of inducing collective creativity through engaging discussions with participants from diverse backgrounds and perspectives. To this end, we propose LLM Discussion, a three-phase discussion framework that facilitates vigorous and diverging idea exchanges and ensures convergence to creative answers. Moreover, we adopt a role-playing technique by assigning distinct roles to LLMs to combat the homogeneity of LLMs. We evaluate the efficacy of the proposed framework with the Alternative Uses Test, Similarities Test, Instances Test, and Scientific Creativity Test through both LLM evaluation and human study. Our proposed framework outperforms single-LLM approaches and existing multi-LLM frameworks across various creativity metrics.

5/21/2024

Assistive Large Language Model Agents for Socially-Aware Negotiation Dialogues

Yuncheng Hua, Lizhen Qu, Gholamreza Haffari

We develop assistive agents based on Large Language Models (LLMs) that aid interlocutors in business negotiations. Specifically, we simulate business negotiations by letting two LLM-based agents engage in role play. A third LLM acts as a remediator agent to rewrite utterances violating norms for improving negotiation outcomes. We introduce a simple tuning-free and label-free In-Context Learning (ICL) method to identify high-quality ICL exemplars for the remediator, where we propose a novel selection criterion, called value impact, to measure the quality of the negotiation outcomes. We provide rich empirical evidence to demonstrate its effectiveness in negotiations across three different negotiation topics. The source code and the generated dataset will be publicly available upon acceptance.

6/19/2024