Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction

Read original: arXiv:2407.13166 - Published 7/19/2024 by Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim

Overview

This paper investigates how users' follow-up queries in conversational search engines relate to user satisfaction.
The researchers develop a taxonomy of 18 follow-up query patterns along two axes, motivations (7) and actions (11), and analyze how these patterns correlate with satisfaction.
The goal is to provide satisfaction signals that support automated evaluation of conversational search and realistic user simulations.

Plain English Explanation

When interacting with conversational search engines, users often ask follow-up questions to get more information or clarification. This paper explores how the kinds of follow-up queries users ask relate to how satisfied they are with the conversation.

The researchers first created a detailed classification system, or taxonomy, for different types of follow-up questions. This taxonomy can help developers understand the patterns and purposes of these kinds of queries.

They then analyzed how the various categories of follow-up questions correlate with user ratings of the conversation. This allows them to identify which types of follow-up queries are associated with higher or lower levels of user satisfaction.

The goal is to use these insights to improve the design of conversational AI systems. By understanding which kinds of follow-up questions tend to signal satisfaction or dissatisfaction, developers can detect when a conversation is going poorly and work to generate more relevant and helpful responses.

Technical Explanation

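The researchers developed a taxonomy of 18 follow-up query patterns from a qualitative analysis of 250 conversational turns collected in an in-lab user evaluation of Naver Cue:, a commercial conversational search engine. The taxonomy has two axes: users' motivations for continuing the conversation (7 categories) and the actions their follow-up queries perform (11 categories). Compared with prior work on query reformulation, it adds patterns such as asking for subjective opinions and giving natural language feedback on the engine's responses.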
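The paper's abstract describes the taxonomy as two axes, motivations (N = 7) and actions (N = 11). A minimal sketch of how one annotation might be represented is shown here. The label names are hypothetical placeholders, since the abstract gives only the axis sizes, and the one-motivation-plus-one-action shape of a label is an assumption rather than the authors' annotation scheme.

```python
from dataclasses import dataclass

# Hypothetical placeholder names: the abstract reports only the axis sizes
# (7 motivations, 11 actions), not the full label lists.
MOTIVATIONS = frozenset(f"motivation_{i}" for i in range(1, 8))
ACTIONS = frozenset(f"action_{i}" for i in range(1, 12))


@dataclass(frozen=True)
class FollowUpLabel:
    """One annotation of a follow-up query on both taxonomy axes.

    Assumption: each query receives exactly one motivation and one action.
    """
    motivation: str
    action: str

    def __post_init__(self):
        # Reject labels that are not part of the taxonomy.
        if self.motivation not in MOTIVATIONS:
            raise ValueError(f"unknown motivation: {self.motivation}")
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action}")
```

Keeping the two axes as separate fields lets later analysis group conversations by motivation, by action, or by both.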

To apply the taxonomy at scale, they built an LLM-powered classifier that labels follow-up queries with 73% accuracy. Using this classifier, they analyzed 2,061 conversational tuples from real-world usage logs of Cue: and examined how the taxonomy's patterns correlate with user satisfaction.
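The abstract reports that the LLM-powered classifier reached 73% accuracy. A minimal sketch of how such a figure is typically computed follows, assuming one predicted label per query compared against one human-annotated label. The abstract does not describe the evaluation protocol, so the function name and the toy data are illustrative only.

```python
def accuracy(predicted, reference):
    """Fraction of predicted labels that exactly match the reference labels."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


# Toy data only; the pattern names are taken from the abstract,
# the label assignments are made up for illustration.
human_labels = ["Clarifying Queries", "Excluding Condition",
                "Substituting Condition", "Clarifying Queries"]
model_labels = ["Clarifying Queries", "Excluding Condition",
                "Clarifying Queries", "Clarifying Queries"]
toy_accuracy = accuracy(model_labels, human_labels)
```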

The initial findings suggest that some follow-up patterns signal dissatisfaction, including Clarifying Queries, Excluding Condition, and Substituting Condition. The authors describe these findings as initial.
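One common way to quantify the association between a query pattern and satisfaction is the phi coefficient of a 2x2 contingency table. The sketch below assumes each conversation is reduced to two booleans (whether the pattern occurred, and whether the user was satisfied). How the paper actually measures satisfaction and correlation is not stated in the abstract, so this is an illustration of the general technique, not the authors' method.

```python
import math


def phi_coefficient(has_pattern, satisfied):
    """Phi coefficient between two equal-length sequences of booleans.

    Ranges from -1 (pattern always co-occurs with dissatisfaction)
    to +1 (pattern always co-occurs with satisfaction).
    """
    if len(has_pattern) != len(satisfied):
        raise ValueError("inputs must have the same length")
    pairs = list(zip(has_pattern, satisfied))
    # Cells of the 2x2 contingency table.
    n11 = sum(1 for p, s in pairs if p and s)
    n10 = sum(1 for p, s in pairs if p and not s)
    n01 = sum(1 for p, s in pairs if not p and s)
    n00 = sum(1 for p, s in pairs if not p and not s)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # One variable is constant, so the association is undefined;
        # returning 0.0 is a convention chosen for this sketch.
        return 0.0
    return (n11 * n00 - n10 * n01) / denom
```

A negative value for a given pattern would be consistent with that pattern acting as a dissatisfaction signal, though correlation alone does not establish why.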

These findings can inform the design of conversational AI systems, helping them generate more satisfying responses to follow-up queries. By understanding the patterns of user follow-up behavior and its relationship to satisfaction, developers can work to anticipate and respond to these queries in ways that enhance the overall conversational experience.

Critical Analysis

The study has several limitations worth keeping in mind. The taxonomy was derived from 250 conversational turns in an in-lab evaluation of a single search engine, so it may not capture all possible variations. The correlation analysis relies on a classifier with 73% accuracy, so some of the 2,061 tuples from real-world logs are likely mislabeled.

Additionally, the study focused on the correlation between follow-up queries and user satisfaction, but did not investigate the causal mechanisms underlying these relationships. Further research is needed to understand how the content and context of follow-up questions influence user perceptions and experiences.

It would also be valuable to explore how these findings apply to diverse user populations and conversational domains, as the study was limited to a single commercial conversational search engine, Naver Cue:.

Conclusion

This research provides valuable insights into the relationship between conversational follow-up queries and user satisfaction in conversational search. By developing a taxonomy of 18 follow-up query patterns and analyzing how they correlate with satisfaction, the researchers have identified patterns that can guide the design and evaluation of conversational AI systems.

The findings suggest that certain follow-up patterns, such as Clarifying Queries, Excluding Condition, and Substituting Condition, may signal dissatisfaction, which could support automated evaluation of conversational search and more realistic user simulations. As conversational AI becomes increasingly prevalent, this work represents a step towards creating more satisfying human-AI interactions.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Using LLMs to Investigate Correlations of Conversational Follow-up Queries with User Satisfaction

Hyunwoo Kim, Yoonseo Choi, Taehyun Yang, Honggu Lee, Chaneon Park, Yongju Lee, Jin Young Kim, Juho Kim

With large language models (LLMs), conversational search engines shift how users retrieve information from the web by enabling natural conversations to express their search intents over multiple turns. Users' natural conversation embodies rich but implicit signals of users' search intents and evaluation of search results to understand user experience with the system. However, it is underexplored how and why users ask follow-up queries to continue conversations with conversational search engines and how the follow-up queries signal users' satisfaction. From qualitative analysis of 250 conversational turns from an in-lab user evaluation of Naver Cue:, a commercial conversational search engine, we propose a taxonomy of 18 users' follow-up query patterns from conversational search, comprising two major axes: (1) users' motivations behind continuing conversations (N = 7) and (2) actions of follow-up queries (N = 11). Compared to the existing literature on query reformulations, we uncovered a new set of motivations and actions behind follow-up queries, including asking for subjective opinions or providing natural language feedback on the engine's responses. To analyze conversational search logs with our taxonomy in a scalable and efficient manner, we built an LLM-powered classifier (73% accuracy). With our classifier, we analyzed 2,061 conversational tuples collected from real-world usage logs of Cue: and examined how the conversation patterns from our taxonomy correlates with satisfaction. Our initial findings suggest some signals of dissatisfactions, such as Clarifying Queries, Excluding Condition, and Substituting Condition with follow-up queries. We envision our approach could contribute to automated evaluation of conversation search experience by providing satisfaction signals and grounds for realistic user simulations.

7/19/2024

Follow-Up Questions Improve Documents Generated by Large Language Models

Bernadette J Tix

This study investigates the impact of Large Language Models (LLMs) generating follow-up questions in response to user requests for short (1-page) text documents. Users interacted with a novel web-based AI system designed to ask follow-up questions. Users requested documents they would like the AI to produce. The AI then generated follow-up questions to clarify the user's needs or offer additional insights before generating the requested documents. After answering the questions, users were shown a document generated using both the initial request and the questions and answers, and a document generated using only the initial request. Users indicated which document they preferred and gave feedback about their experience with the question-answering process. The findings of this study show clear benefits to question-asking both in document preference and in the qualitative user experience. This study further shows that users found more value in questions which were thought-provoking, open-ended, or offered unique insights into the user's request as opposed to simple information-gathering questions.

8/16/2024

Generate then Retrieve: Conversational Response Retrieval Using LLMs as Answer and Query Generators

Zahra Abbasiantaeb, Mohammad Aliannejadi

CIS is a prominent area in IR which focuses on developing interactive knowledge assistants. These systems must adeptly comprehend the user's information requirements within the conversational context and retrieve the relevant information. To this aim, the existing approaches model the user's information needs by generating a single query rewrite or a single representation of the query in the query space embedding. However, to answer complex questions, a single query rewrite or representation is often ineffective. To address this, a system needs to do reasoning over multiple passages. In this work, we propose using a generate-then-retrieve approach to improve the passage retrieval performance for complex user queries. In this approach, we utilize large language models (LLMs) to (i) generate an initial answer to the user's information need by doing reasoning over the context of the conversation, and (ii) ground this answer to the collection. Based on the experiments, our proposed approach significantly improves the retrieval performance on TREC iKAT 23, TREC CAsT 20 and 22 datasets, under various setups. Also, we show that grounding the LLM's answer requires more than one searchable query, where an average of 3 queries outperforms human rewrites.

6/27/2024

PerkwE_COQA: enhance Persian Conversational Question Answering by combining contextual keyword extraction with Large Language Models

Pardis Moradbeiki, Nasser Ghadiri

Smart cities need the involvement of their residents to enhance quality of life. Conversational query-answering is an emerging approach for user engagement. There is an increasing demand of an advanced conversational question-answering that goes beyond classic systems. Existing approaches have shown that LLMs offer promising capabilities for CQA, but may struggle to capture the nuances of conversational contexts. The new approach involves understanding the content and engaging in a multi-step conversation with the user to fulfill their needs. This paper presents a novel method to elevate the performance of Persian Conversational question-answering (CQA) systems. It combines the strengths of Large Language Models (LLMs) with contextual keyword extraction. Our method extracts keywords specific to the conversational flow, providing the LLM with additional context to understand the user's intent and generate more relevant and coherent responses. We evaluated the effectiveness of this combined approach through various metrics, demonstrating significant improvements in CQA performance compared to an LLM-only baseline. The proposed method effectively handles implicit questions, delivers contextually relevant answers, and tackles complex questions that rely heavily on conversational context. The findings indicate that our method outperformed the evaluation benchmarks up to 8% higher than existing methods and the LLM-only baseline.

4/16/2024