Navigating User Experience of ChatGPT-based Conversational Recommender Systems: The Effects of Prompt Guidance and Recommendation Domain

Read original: arXiv:2405.13560 - Published 5/24/2024 by Yizhe Zhang, Yucheng Jin, Li Chen, Ting Yang

Overview

The paper explores the use of large language models (LLMs) to enhance conversational recommender systems (CRS), which allow users to provide feedback and express preferences through natural language.
The study investigates the impact of two key factors on the user experience of an LLM-powered CRS: prompt guidance (PG) and recommendation domain (RD).
The researchers developed a ChatGPT-based CRS and conducted an online empirical study to evaluate the user experience of the system in two recommendation domains, both with and without prompt guidance.

Plain English Explanation

Conversational recommender systems (CRS) allow users to communicate their preferences and provide feedback in natural language, as they would in an ordinary conversation. With the rise of large language models (LLMs) such as ChatGPT, there is growing interest in using these models to enhance the user experience of CRS and generate personalized recommendations.

However, the effectiveness of an LLM-powered CRS depends on how the system is designed and the specific context in which it is used. This study explores two key factors that can influence the user experience: prompt guidance and recommendation domain.

Prompt guidance refers to the way the system guides the user's input and feedback. Some CRS may provide clear instructions or prompts to help users articulate their preferences, while others may leave the conversation more open-ended. The researchers wanted to see how this level of guidance affects the user's perception of the system's explainability, adaptability, ease of use, and transparency.
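The difference between a guided and an open-ended conversation can be sketched in a few lines. This is only an illustration: the function name `opening_message`, the greeting text, and the example prompts are all hypothetical, since this summary does not describe the prompts the researchers actually used.

```python
# Hypothetical example prompts; NOT taken from the paper.
EXAMPLE_PROMPTS = {
    "book": [
        "Recommend a novel similar to one I enjoyed recently.",
        "Why did you suggest this book?",
    ],
    "job": [
        "Recommend a job that matches my skills.",
        "Why did you suggest this job?",
    ],
}


def opening_message(domain, with_guidance):
    """Build the system's first message, with or without prompt guidance."""
    greeting = f"Hi! Tell me what kind of {domain} recommendations you want."
    if not with_guidance:
        # Open-ended condition: the user gets no hints on what to type.
        return greeting
    # Guided condition: append example prompts the user could try.
    hints = "\n".join(f"- {prompt}" for prompt in EXAMPLE_PROMPTS[domain])
    return f"{greeting}\nYou could try asking:\n{hints}"


print(opening_message("book", with_guidance=True))
```

In the guided condition the user sees concrete suggestions for phrasing a request or asking for an explanation; in the open-ended condition only the greeting is shown.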

The recommendation domain refers to the specific area the CRS is designed for, such as book recommendations or job recommendations. The researchers hypothesized that user preferences and behaviors may differ depending on the recommendation domain, and they wanted to explore how this factor interacts with prompt guidance.

To investigate these questions, the researchers developed a ChatGPT-based CRS and conducted an online study with 100 participants. Each participant used the system either with or without prompt guidance, and every participant tried it in both recommendation domains, so the effects of the two factors could be examined separately and in combination.

Technical Explanation

The researchers developed a ChatGPT-based conversational recommender system (CRS) to investigate the effects of prompt guidance (PG) and recommendation domain (RD) on the user experience.

The study employed a mixed-method approach, using a between-subjects design for the PG variable (with vs. without) and a within-subjects design for the RD variable (book recommendations vs. job recommendations). Participants (N = 100) were recruited for an online empirical evaluation of the CRS.
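A minimal sketch of how participants could be allocated in such a design follows. The function name `assign_conditions`, the balanced 50/50 split, and the counterbalanced domain order are assumptions made for illustration; this summary does not state how the researchers assigned participants or ordered the domains.

```python
import random

PG_LEVELS = ("with_pg", "without_pg")  # between-subjects factor
DOMAINS = ("book", "job")              # within-subjects factor


def assign_conditions(participant_ids, seed=0):
    """Map each participant to one PG level and an order for both domains."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)  # randomize who ends up in which group
    plan = {}
    for index, pid in enumerate(ids):
        # Alternate PG levels so the two groups are the same size.
        pg = PG_LEVELS[index % 2]
        # ASSUMPTION: domain order is counterbalanced within each PG group.
        if (index // 2) % 2 == 0:
            order = list(DOMAINS)
        else:
            order = list(reversed(DOMAINS))
        plan[pid] = {"pg": pg, "domain_order": order}
    return plan


plan = assign_conditions(range(100))
group_sizes = {
    level: sum(1 for entry in plan.values() if entry["pg"] == level)
    for level in PG_LEVELS
}
print(group_sizes)
```

Each participant appears in exactly one PG group but contributes data for both domains, which is what makes PG a between-subjects factor and RD a within-subjects factor.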

The findings reveal that prompt guidance can substantially enhance the system's explainability, adaptability, perceived ease of use, and transparency. Additionally, users tend to perceive a greater sense of novelty and demonstrate a higher propensity to engage with and try recommended items in the context of book recommendations compared to job recommendations.

Furthermore, the researchers found that the influence of prompt guidance on certain user experience metrics and interactive behaviors is modulated by the recommendation domain, as evidenced by the interaction effects between the two examined factors.
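An interaction effect in a two-factor design can be read as a "difference of differences": the effect of PG in one domain minus its effect in the other. The sketch below computes that contrast from cell means. The ratings are placeholder values chosen for illustration, not results from the paper, and `interaction_contrast` is a hypothetical helper. A real analysis would apply an inferential test, which this summary does not specify.

```python
from statistics import mean


def interaction_contrast(ratings):
    """Return (PG effect in books) minus (PG effect in jobs)."""
    cell = {key: mean(values) for key, values in ratings.items()}
    pg_effect_book = cell[("with_pg", "book")] - cell[("without_pg", "book")]
    pg_effect_job = cell[("with_pg", "job")] - cell[("without_pg", "job")]
    return pg_effect_book - pg_effect_job


# Placeholder ratings for illustration only; NOT data from the study.
toy_ratings = {
    ("with_pg", "book"): [5, 6, 7],
    ("without_pg", "book"): [4, 5, 6],
    ("with_pg", "job"): [5, 5, 5],
    ("without_pg", "job"): [5, 5, 5],
}

contrast = interaction_contrast(toy_ratings)
print(contrast)
```

A contrast of zero would mean PG shifts ratings by the same amount in both domains; a nonzero value indicates that the effect of PG depends on the domain.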

Critical Analysis

The study provides valuable insights into the design and evaluation of LLM-powered conversational recommender systems. The researchers have thoughtfully considered the impact of prompt guidance and recommendation domain, which are important factors that can influence the user experience.

One potential limitation of the study is the use of a single LLM (ChatGPT) in the CRS implementation. It would be interesting to see how the findings might differ if the researchers had explored other LLM architectures or compared the performance of different models. Additionally, the study focuses on two specific recommendation domains (books and jobs), and it would be beneficial to investigate a wider range of domains to further understand the generalizability of the findings.

While the study highlights the importance of prompt guidance, the researchers do not delve into the specific design of the prompts used in the CRS. A more detailed exploration of prompt engineering and its impact on the user experience could provide additional insights for designing effective conversational recommender systems.

Overall, this research contributes to the growing body of work on the user-centered evaluation of ChatGPT-based conversational systems, offering practical design considerations for enhancing the user experience in CRS.

Conclusion

This study investigates the impact of prompt guidance and recommendation domain on the user experience of a ChatGPT-based conversational recommender system (CRS). The findings indicate that prompt guidance can substantially improve the system's explainability, adaptability, perceived ease of use, and transparency.

Furthermore, the researchers found that users tend to perceive greater novelty and exhibit higher engagement with recommended items in the context of book recommendations compared to job recommendations. The interaction between prompt guidance and recommendation domain also suggests that the influence of these factors on user experience can be modulated by the specific domain.

These insights contribute to the development of more effective and user-friendly LLM-powered CRS, providing practical design guidance for researchers and practitioners working in this field. By considering the nuances of prompt design and recommendation context, the study highlights the importance of a user-centered approach to the development of conversational recommender systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Navigating User Experience of ChatGPT-based Conversational Recommender Systems: The Effects of Prompt Guidance and Recommendation Domain

Yizhe Zhang, Yucheng Jin, Li Chen, Ting Yang

Conversational recommender systems (CRS) enable users to articulate their preferences and provide feedback through natural language. With the advent of large language models (LLMs), the potential to enhance user engagement with CRS and augment the recommendation process with LLM-generated content has received increasing attention. However, the efficacy of LLM-powered CRS is contingent upon the use of prompts, and the subjective perception of recommendation quality can differ across various recommendation domains. Therefore, we have developed a ChatGPT-based CRS to investigate the impact of these two factors, prompt guidance (PG) and recommendation domain (RD), on the overall user experience of the system. We conducted an online empirical study (N = 100) by employing a mixed-method approach that utilized a between-subjects design for the variable of PG (with vs. without) and a within-subjects design for RD (book recommendations vs. job recommendations). The findings reveal that PG can substantially enhance the system's explainability, adaptability, perceived ease of use, and transparency. Moreover, users are inclined to perceive a greater sense of novelty and demonstrate a higher propensity to engage with and try recommended items in the context of book recommendations as opposed to job recommendations. Furthermore, the influence of PG on certain user experience metrics and interactive behaviors appears to be modulated by the recommendation domain, as evidenced by the interaction effects between the two examined factors. This work contributes to the user-centered evaluation of ChatGPT-based CRS by investigating two prominent factors and offers practical design guidance.

5/24/2024

Evaluating ChatGPT as a Recommender System: A Rigorous Approach

Dario Di Palma, Giovanni Maria Biancofiore, Vito Walter Anelli, Fedelucio Narducci, Tommaso Di Noia, Eugenio Di Sciascio

Large Language Models (LLMs) have recently shown impressive abilities in handling various natural language-related tasks. Among different LLMs, current studies have assessed ChatGPT's superior performance across manifold tasks, especially under the zero/few-shot prompting conditions. Given such successes, the Recommender Systems (RSs) research community have started investigating its potential applications within the recommendation scenario. However, although various methods have been proposed to integrate ChatGPT's capabilities into RSs, current research struggles to comprehensively evaluate such models while considering the peculiarities of generative models. Often, evaluations do not consider hallucinations, duplications, and out-of-the-closed domain recommendations and solely focus on accuracy metrics, neglecting the impact on beyond-accuracy facets. To bridge this gap, we propose a robust evaluation pipeline to assess ChatGPT's ability as an RS and post-process ChatGPT recommendations to account for these aspects. Through this pipeline, we investigate ChatGPT-3.5 and ChatGPT-4 performance in the recommendation task under the zero-shot condition employing the role-playing prompt. We analyze the model's functionality in three settings: the Top-N Recommendation, the cold-start recommendation, and the re-ranking of a list of recommendations, and in three domains: movies, music, and books. The experiments reveal that ChatGPT exhibits higher accuracy than the baselines on books domain. It also excels in re-ranking and cold-start scenarios while maintaining reasonable beyond-accuracy metrics. Furthermore, we measure the similarity between the ChatGPT recommendations and the other recommenders, providing insights about how ChatGPT could be categorized in the realm of recommender systems. The evaluation pipeline is publicly released for future research.

6/5/2024

Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li

This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks. Advanced LLMs (e.g., ChatGPT) are limited in domain-specific CRS tasks for 1) generating grounded responses with recommendation-oriented knowledge, or 2) proactively leading the conversations through different dialogue goals. In this work, we first analyze those limitations through a comprehensive evaluation, showing the necessity of external knowledge and goal guidance which contribute significantly to the recommendation accuracy and language quality. In light of this finding, we propose a novel ChatCRS framework to decompose the complex CRS task into several sub-tasks through the implementation of 1) a knowledge retrieval agent using a tool-augmented approach to reason over external Knowledge Bases and 2) a goal-planning agent for dialogue goal prediction. Experimental results on two multi-goal CRS datasets reveal that ChatCRS sets new state-of-the-art benchmarks, improving language quality of informativeness by 17% and proactivity by 27%, and achieving a tenfold enhancement in recommendation accuracy.

5/6/2024

RecGPT: Generative Personalized Prompts for Sequential Recommendation via ChatGPT Training Paradigm

Yabin Zhang, Wenhui Yu, Erhan Zhang, Xu Chen, Lantao Hu, Peng Jiang, Kun Gai

ChatGPT has achieved remarkable success in natural language understanding. Considering that recommendation is indeed a conversation between users and the system with items as words, which has similar underlying pattern with ChatGPT, we design a new chat framework in item index level for the recommendation task. Our novelty mainly contains three parts: model, training and inference. For the model part, we adopt Generative Pre-training Transformer (GPT) as the sequential recommendation model and design a user modular to capture personalized information. For the training part, we adopt the two-stage paradigm of ChatGPT, including pre-training and fine-tuning. In the pre-training stage, we train GPT model by auto-regression. In the fine-tuning stage, we train the model with prompts, which include both the newly-generated results from the model and the user's feedback. For the inference part, we predict several user interests as user representations in an autoregressive manner. For each interest vector, we recall several items with the highest similarity and merge the items recalled by all interest vectors into the final result. We conduct experiments with both offline public datasets and online A/B test to demonstrate the effectiveness of our proposed method.

4/16/2024