Introducing MeMo: A Multimodal Dataset for Memory Modelling in Multiparty Conversations

Read original: arXiv:2409.13715 - Published 9/24/2024 by Maria Tsfasman, Bernd Dudzik, Kristian Fenech, Andras Lorincz, Catholijn M. Jonker, Catharine Oertel

Overview

A new multimodal dataset called MeMo for studying memory modeling in multiparty conversations
Includes first-person annotations of memory retention and encoding
Aims to advance research on memory in social and conversational contexts

Plain English Explanation

The researchers have created a new dataset called MeMo for studying how people remember things during group conversations. The dataset includes multimodal data (such as audio, video, and text) from group interactions. It also includes annotations in which the participants themselves report what they remembered and what helped them remember it.

This is valuable because it allows researchers to better understand the cognitive processes involved in how people encode and retain information in social, conversational settings, rather than just in isolated experiments. The goal is to advance research on memory modeling and develop systems that can better understand and support human memory in real-world conversations.

Technical Explanation

The MeMo dataset consists of recordings of multiparty conversations, along with annotations in which participants self-report what they remembered from the conversations and what helped them remember it. According to the paper's abstract, the corpus contains 31 hours of small-group discussions on the topic of Covid-19, repeated over a period of 2 weeks. The dataset includes audio, video, and transcript data, as well as annotations of social signals like gaze, gestures, and emotional expressions.

The dataset includes two types of memory annotations: "memory retention", where participants indicate what they remember after a conversation, and "memory encoding", where they identify aspects of the conversation that helped them remember things. Together, these make it possible to study the cognitive and social processes involved in how people store and recall information in group settings.
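To make the idea of per-participant retention reports concrete, here is a minimal Python sketch. It does not reflect the actual MeMo file format or API: the class names (`MemoryReport`, `Session`), the field names, and the notion of a moment ID are all hypothetical, and the toy data is invented for illustration. The sketch only shows how self-reports from several participants could be compared to find which moments of a conversation are remembered by everyone and which are remembered by only some of the group.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryReport:
    """One participant's self-report for one session (hypothetical schema)."""
    participant_id: str
    # IDs of conversation moments the participant said they remembered.
    remembered_moments: set[str] = field(default_factory=set)


@dataclass
class Session:
    """A recorded group discussion with its self-reports (hypothetical schema)."""
    session_id: str
    reports: list[MemoryReport] = field(default_factory=list)


def retention_rate(session: Session) -> dict[str, float]:
    """Fraction of participants who reported remembering each moment."""
    if not session.reports:
        return {}
    counts: dict[str, int] = {}
    for report in session.reports:
        for moment in report.remembered_moments:
            counts[moment] = counts.get(moment, 0) + 1
    n = len(session.reports)
    return {moment: c / n for moment, c in counts.items()}


def shared_memories(session: Session) -> set[str]:
    """Moments that every participant reported remembering."""
    if not session.reports:
        return set()
    return set.intersection(*(r.remembered_moments for r in session.reports))


# Toy data, invented for illustration only (not taken from the corpus).
session = Session(
    session_id="s1",
    reports=[
        MemoryReport("p1", {"m1", "m2"}),
        MemoryReport("p2", {"m2"}),
        MemoryReport("p3", {"m2", "m3"}),
    ],
)
rates = retention_rate(session)
common = shared_memories(session)
```

In this toy example, only moment `m2` is reported by all three participants, while `m1` and `m3` are each remembered by a single person. Differences of this kind are what the summary describes as mismatches in a group's perceived common ground.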

The researchers describe the dataset collection process, including the conversational tasks, participant demographics, and annotation procedures. They also provide baseline experiments demonstrating the dataset's utility for training memory modeling and multimodal understanding models.

Critical Analysis

The MeMo dataset represents an important step forward in the study of memory in social, conversational contexts. By including first-person annotations of memory retention and encoding, it provides a unique window into the cognitive processes involved. However, the dataset is limited to a single cultural context (Western), and the conversational tasks may not fully capture the breadth of real-world interactions.

Additionally, the baseline models presented in the paper suggest there is significant room for improvement in accurately modeling memory from multimodal data. More research is needed to understand the factors that influence memory in group settings and develop systems that can effectively support human memory.

Conclusion

The MeMo dataset provides a valuable new resource for advancing research on memory modeling in multiparty conversations. By including first-person annotations of memory processes, it enables a more nuanced understanding of how people encode and retain information in social contexts. This has important implications for developing conversational AI systems that can better support human memory and cognition.

Related Papers

Introducing MeMo: A Multimodal Dataset for Memory Modelling in Multiparty Conversations

Maria Tsfasman, Bernd Dudzik, Kristian Fenech, Andras Lorincz, Catholijn M. Jonker, Catharine Oertel

The quality of human social relationships is intricately linked to human memory processes, with memory serving as the foundation for the creation of social bonds. Since human memory is selective, differing recollections of the same events within a group can lead to misunderstandings and misalignments in what is perceived to be common ground in the group. Yet, conversational facilitation systems, aimed at advancing the quality of group interactions, usually focus on tracking users' states within an individual session, ignoring what remains in each participant's memory after the interaction. Conversational memory is the process by which humans encode, retain and retrieve verbal, non-verbal and contextual information from a conversation. Understanding conversational memory can be used as a source of information on the long-term development of social connections within a group. This paper introduces the MeMo corpus, the first conversational dataset annotated with participants' memory retention reports, aimed at facilitating computational modelling of human conversational memory. The MeMo corpus includes 31 hours of small-group discussions on the topic of Covid-19, repeated over the term of 2 weeks. It integrates validated behavioural and perceptual measures, and includes audio, video, and multimodal annotations, offering a valuable resource for studying and modelling conversational memory and group dynamics. By introducing the MeMo corpus, presenting an analysis of its validity, and demonstrating its usefulness for future research, this paper aims to pave the way for future research in conversational memory modelling for intelligent system development.

9/24/2024

MM-Conv: A Multi-modal Conversational Dataset for Virtual Humans

Anna Deichler, Jim O'Regan, Jonas Beskow

In this paper, we present a novel dataset captured using a VR headset to record conversations between participants within a physics simulator (AI2-THOR). Our primary objective is to extend the field of co-speech gesture generation by incorporating rich contextual information within referential settings. Participants engaged in various conversational scenarios, all based on referential communication tasks. The dataset provides a rich set of multimodal recordings such as motion capture, speech, gaze, and scene graphs. This comprehensive dataset aims to enhance the understanding and development of gesture generation models in 3D scenes by providing diverse and contextually rich data.

10/2/2024

MemBench: Towards Real-world Evaluation of Memory-Augmented Dialogue Systems

Junqing He, Liang Zhu, Qi Wei, Rui Wang, Jiaxing Zhang

Long-term memory is so important for chatbots and dialogue systems (DS) that researchers have developed numerous memory-augmented DS. However, their evaluation methods are different from the real situation in human conversation. They only measured the accuracy of factual information or the perplexity of generated responses given a query, which hardly reflected their performance. Moreover, they only consider passive memory retrieval based on similarity, neglecting diverse memory-recalling paradigms in humans, e.g. emotions and surroundings. To bridge the gap, we construct a novel benchmark covering various memory recalling paradigms based on cognitive science and psychology theory. The Memory Benchmark (MemBench) contains two tasks according to the two-phrase theory in cognitive science: memory retrieval, memory recognition and injection. The benchmark considers both passive and proactive memory recalling based on meta information for the first time. In addition, novel scoring aspects are proposed to comprehensively measure the generated responses. Results from the strongest embedding models and LLMs on MemBench show that there is plenty of room for improvement in existing dialogue systems. Extensive experiments also reveal the correlation between memory injection and emotion supporting (ES) skillfulness, and intimacy. Our code and dataset will be released.

9/24/2024

Mixed-Session Conversation with Egocentric Memory

Jihyoung Jang, Taeyoung Kim, Hyounghun Kim

Recently introduced dialogue systems have demonstrated high usability. However, they still fall short of reflecting real-world conversation scenarios. Current dialogue systems exhibit an inability to replicate the dynamic, continuous, long-term interactions involving multiple partners. This shortfall arises because there have been limited efforts to account for both aspects of real-world dialogues: deeply layered interactions over the long-term dialogue and widely expanded conversation networks involving multiple participants. As the effort to incorporate these aspects combined, we introduce Mixed-Session Conversation, a dialogue system designed to construct conversations with various partners in a multi-session dialogue setup. We propose a new dataset called MiSC to implement this system. The dialogue episodes of MiSC consist of 6 consecutive sessions, with four speakers (one main speaker and three partners) appearing in each episode. Also, we propose a new dialogue model with a novel memory management mechanism, called Egocentric Memory Enhanced Mixed-Session Conversation Agent (EMMA). EMMA collects and retains memories from the main speaker's perspective during conversations with partners, enabling seamless continuity in subsequent interactions. Extensive human evaluations validate that the dialogues in MiSC demonstrate a seamless conversational flow, even when conversation partners change in each session. EMMA trained with MiSC is also evaluated to maintain high memorability without contradiction throughout the entire conversation.

10/4/2024