EMMI -- Empathic Multimodal Motivational Interviews Dataset: Analyses and Annotations

Read original: arXiv:2406.16478 - Published 6/26/2024 by Lucie Galland, Catherine Pelachaud, Florian Pecune

Overview

This paper presents EMMI (Empathic Multimodal Motivational Interviews), a dataset built to study how therapists combine the task goal of motivational interviewing with the social goal of building trust and expressing empathy.
EMMI combines two publicly available corpora of simulated motivational interviews, AnnoMI and the Motivational Interviewing Dataset, in which actors play the roles of therapist and patient; the material covers audio, video, and transcripts of conversations on health-related topics.
The researchers add multimodal annotations of empathic and motivational behaviors to these corpora, providing a resource for studying multimodal empathy in conversation.

Plain English Explanation

The researchers built EMMI from two existing, publicly available collections of simulated counseling conversations, in which actors play therapists and patients talking about health. The recordings include audio, video, and text transcripts. The researchers then analyzed the recordings and marked the empathic and motivational behaviors that the therapists used during the interviews.

With both the multimodal data (audio, video, and text) and the detailed behavior annotations, researchers can study how empathy is expressed through verbal and non-verbal cues together, rather than through words alone.

The availability of this dataset could lead to the development of more empathetic conversational interfaces that can better understand and respond to human emotions and needs. This could have important applications in areas like mental health support, customer service, and educational coaching.

Technical Explanation

The EMMI dataset consists of 120 audio-visual recordings of simulated motivational interviews on health-related topics, drawn from two publicly available corpora: AnnoMI and the Motivational Interviewing Dataset. The researchers annotated the recordings for empathic and motivational behaviors, including verbal expressions of empathy, reflective listening, affirmations, and open-ended questions, as well as non-verbal behaviors such as head nods, eye contact, and facial expressions.

The annotations were performed by trained human raters using a coding scheme developed based on existing literature on empathic and motivational interviewing techniques. The inter-rater reliability of the annotations was assessed and found to be satisfactory.
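The summary does not say which agreement statistic was used. As an illustration only, the sketch below computes Cohen's kappa, one common measure of agreement between two raters that corrects for chance. The function name, the behavior labels, and the example ratings are hypothetical and are not taken from EMMI.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical labels for six therapist utterances (not real EMMI data).
rater_1 = ["reflection", "question", "question", "affirmation", "reflection", "question"]
rater_2 = ["reflection", "question", "reflection", "affirmation", "reflection", "question"]
kappa = cohen_kappa(rater_1, rater_2)
print(f"kappa = {kappa:.3f}")
```

Here the raters agree on five of six utterances, but kappa comes out lower than the raw agreement rate because some of that agreement would be expected by chance.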

Analyzing these annotations, the authors report three clusters of patients whose behavior differs significantly, and they find that therapists adapt their own behavior to each type. The combination of multimodal data and behavior-level annotation makes EMMI a useful resource for modeling empathy in conversation and for building virtual agents that adjust to both the current state of the dialog and the type of user.
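The paper's abstract reports three clusters of patients, but this summary does not state how they were obtained. Purely as a sketch of the general idea, the code below groups hypothetical per-patient behavior profiles with plain k-means. The two features, the profile values, and the choice of k-means are all assumptions made for illustration, not the authors' method.

```python
import math


def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic initialisation (the first k points)."""
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids


# Hypothetical profiles: (share of turns favouring change, share resisting change).
profiles = [
    (0.80, 0.10),
    (0.10, 0.80),
    (0.40, 0.40),
    (0.75, 0.15),
    (0.15, 0.70),
    (0.45, 0.35),
]
labels, centroids = kmeans(profiles, k=3)
print(labels)
```

Initialising from the first k points keeps the example reproducible; a real analysis would normally use several random restarts and check how stable the clusters are.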

Critical Analysis

The EMMI dataset is an important contribution to multimodal empathy research. However, it covers a specific context: simulated motivational interviews about health-related topics, performed by actors. It remains to be seen how well findings and models built on it generalize to real therapy sessions and to other kinds of conversation.

Additionally, the dataset only includes interactions between counselors and clients, and does not capture peer-to-peer empathic exchanges. Expanding the dataset to include a wider range of conversational scenarios could further enrich our understanding of empathic behaviors.

The two underlying corpora are publicly available, and because the conversations are simulated by actors, they do not expose real patients' health information. Even so, careful consideration of ethical guidelines and data protection measures remains advisable when working with recordings of health-related conversations.

Conclusion

The EMMI dataset gives researchers a resource for studying empathy in multimodal conversation. Its detailed annotations of empathic and motivational behaviors support the development of models that recognize and generate empathic responses in conversational systems.

These models could in turn improve emotionally intelligent conversational interfaces, with potential applications in mental health support, customer service, and educational coaching.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

EMMI -- Empathic Multimodal Motivational Interviews Dataset: Analyses and Annotations

Lucie Galland, Catherine Pelachaud, Florian Pecune

The study of multimodal interaction in therapy can yield a comprehensive understanding of therapist and patient behavior that can be used to develop a multimodal virtual agent supporting therapy. This investigation aims to uncover how therapists skillfully blend therapy's task goal (employing classical steps of Motivational Interviewing) with the social goal (building a trusting relationship and expressing empathy). Furthermore, we seek to categorize patients into various "types" requiring tailored therapeutic approaches. To this intent, we present multimodal annotations of a corpus consisting of simulated motivational interviewing conversations, wherein actors portray the roles of patients and therapists. We introduce EMMI, composed of two publicly available MI corpora, AnnoMI and the Motivational Interviewing Dataset, for which we add multimodal annotations. We analyze these annotations to characterize functional behavior for developing a virtual agent performing motivational interviews emphasizing social and empathic behaviors. Our analysis found three clusters of patients expressing significant differences in behavior and adaptation of the therapist's behavior to those types. This shows the importance of a therapist being able to adapt their behavior depending on the current situation within the dialog and the type of user.

6/26/2024

Towards Multimodal Emotional Support Conversation Systems

Yuqi Chu, Lizi Liao, Zhiyuan Zhou, Chong-Wah Ngo, Richang Hong

The integration of conversational artificial intelligence (AI) into mental health care promises a new horizon for therapist-client interactions, aiming to closely emulate the depth and nuance of human conversations. Despite the potential, the current landscape of conversational AI is markedly limited by its reliance on single-modal data, constraining the systems' ability to empathize and provide effective emotional support. This limitation stems from a paucity of resources that encapsulate the multimodal nature of human communication essential for therapeutic counseling. To address this gap, we introduce the Multimodal Emotional Support Conversation (MESC) dataset, a first-of-its-kind resource enriched with comprehensive annotations across text, audio, and video modalities. This dataset captures the intricate interplay of user emotions, system strategies, system emotion, and system responses, setting a new precedent in the field. Leveraging the MESC dataset, we propose a general Sequential Multimodal Emotional Support framework (SMES) grounded in Therapeutic Skills Theory. Tailored for multimodal dialogue systems, the SMES framework incorporates an LLM-based reasoning model that sequentially generates user emotion recognition, system strategy prediction, system emotion prediction, and response generation. Our rigorous evaluations demonstrate that this framework significantly enhances the capability of AI systems to mimic therapist behaviors with heightened empathy and strategic responsiveness. By integrating multimodal data in this innovative manner, we bridge the critical gap between emotion recognition and emotional support, marking a significant advancement in conversational AI for mental health support.

8/9/2024

M3TCM: Multi-modal Multi-task Context Model for Utterance Classification in Motivational Interviews

Sayed Muddashir Hossain, Jan Alexandersson, Philipp Muller

Accurate utterance classification in motivational interviews is crucial to automatically understand the quality and dynamics of client-therapist interaction, and it can serve as a key input for systems mediating such interactions. Motivational interviews exhibit three important characteristics. First, there are two distinct roles, namely client and therapist. Second, they are often highly emotionally charged, which can be expressed both in text and in prosody. Finally, context is of central importance to classify any given utterance. Previous works did not adequately incorporate all of these characteristics into utterance classification approaches for mental health dialogues. In contrast, we present M3TCM, a Multi-modal, Multi-task Context Model for utterance classification. Our approach for the first time employs multi-task learning to effectively model both joint and individual components of therapist and client behaviour. Furthermore, M3TCM integrates information from the text and speech modality as well as the conversation context. With our novel approach, we outperform the state of the art for utterance classification on the recently introduced AnnoMI dataset with a relative improvement of 20% for the client- and by 15% for therapist utterance classification. In extensive ablation studies, we quantify the improvement resulting from each contribution.

4/5/2024

Empathy Through Multimodality in Conversational Interfaces

Mahyar Abbasian, Iman Azimi, Mohammad Feli, Amir M. Rahmani, Ramesh Jain

Agents represent one of the most emerging applications of Large Language Models (LLMs) and Generative AI, with their effectiveness hinging on multimodal capabilities to navigate complex user environments. Conversational Health Agents (CHAs), a prime example of this, are redefining healthcare by offering nuanced support that transcends textual analysis to incorporate emotional intelligence. This paper introduces an LLM-based CHA engineered for rich, multimodal dialogue, especially in the realm of mental health support. It adeptly interprets and responds to users' emotional states by analyzing multimodal cues, thus delivering contextually aware and empathetically resonant verbal responses. Our implementation leverages the versatile openCHA framework, and our comprehensive evaluation involves neutral prompts expressed in diverse emotional tones: sadness, anger, and joy. We evaluate the consistency and repeatability of the planning capability of the proposed CHA. Furthermore, human evaluators critique the CHA's empathic delivery, with findings revealing a striking concordance between the CHA's outputs and evaluators' assessments. These results affirm the indispensable role of vocal (soon multimodal) emotion recognition in strengthening the empathetic connection built by CHAs, cementing their place at the forefront of interactive, compassionate digital health solutions.

5/9/2024