IntentionQA: A Benchmark for Evaluating Purchase Intention Comprehension Abilities of Language Models in E-commerce

2406.10173

Published 6/17/2024 by Wenxuan Ding, Weiqi Wang, Sze Heng Douglas Kwok, Minghao Liu, Tianqing Fang, Jiaxin Bai, Junxian He, Yangqiu Song

Abstract

Enhancing Language Models' (LMs) ability to understand purchase intentions in E-commerce scenarios is crucial for their effective assistance in various downstream tasks. However, previous approaches that distill intentions from LMs often fail to generate meaningful and human-centric intentions applicable in real-world E-commerce contexts. This raises concerns about the true comprehension and utilization of purchase intentions by LMs. In this paper, we present IntentionQA, a double-task multiple-choice question answering benchmark to evaluate LMs' comprehension of purchase intentions in E-commerce. Specifically, LMs are tasked to infer intentions based on purchased products and utilize them to predict additional purchases. IntentionQA consists of 4,360 carefully curated problems across three difficulty levels, constructed using an automated pipeline to ensure scalability on large E-commerce platforms. Human evaluations demonstrate the high quality and low false-negative rate of our benchmark. Extensive experiments across 19 language models show that they still struggle with certain scenarios, such as understanding products and intentions accurately, jointly reasoning with products and intentions, and more, in which they fall far behind human performance. Our code and data are publicly available at https://github.com/HKUST-KnowComp/IntentionQA.

Overview

  • This paper introduces IntentionQA, a new benchmark dataset for evaluating language models' ability to comprehend purchase intention in e-commerce contexts.
  • The benchmark contains 4,360 multiple-choice problems spanning two tasks (inferring intentions from purchased products, and using those intentions to predict additional purchases) and three difficulty levels.
  • The authors evaluate 19 language models and find that they still fall far behind human performance in several scenarios, highlighting areas for improvement in modeling purchase intention.

Plain English Explanation

The researchers created a new benchmark called IntentionQA to measure how well AI language models understand why people buy things online. It contains 4,360 multiple-choice questions. Some ask a model to work out the likely intention behind a purchase (for example, someone who bought a tent and a sleeping bag is probably planning a camping trip); others ask it to use such an intention to predict what else the customer is likely to buy.

The researchers then tested 19 language models on this benchmark. The results showed there is still considerable room for improvement: the models struggled to understand products and intentions accurately and to reason about the two together, and in these situations they fell far behind human performance.

The goal of this work is to help advance AI-powered virtual assistants and chatbots that better understand online shoppers' underlying purchase motivations and can therefore hold more natural, helpful conversations with them.

Technical Explanation

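The IntentionQA benchmark was created to evaluate language models' comprehension of purchase intentions in e-commerce settings. It is framed as a double-task multiple-choice question answering benchmark: in the first task, a model must infer intentions from the products a customer purchased; in the second, it must use those intentions to predict additional purchases. The benchmark contains 4,360 problems spread across three difficulty levels. Rather than being written by hand, the problems were constructed with an automated pipeline so that the approach can scale to large e-commerce platforms, and human evaluations indicate that the resulting benchmark has high quality and a low false-negative rate.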
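To make the two tasks concrete, here is a minimal sketch of how one problem might be represented and rendered as a prompt. Everything in it is an assumption for illustration: the `MCQProblem` fields, the task labels, the prompt wording, the four-option A to D layout, the idea that the purchase task supplies an intention as extra context, and the toy products are all hypothetical and are not taken from the released data at https://github.com/HKUST-KnowComp/IntentionQA.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical: assumes at most four answer options, labelled A-D.
LETTERS = "ABCD"


@dataclass
class MCQProblem:
    """One multiple-choice problem; field names are illustrative, not IntentionQA's schema."""
    task: str                        # "intention" (infer intent) or "purchase" (predict next buy)
    products: List[str]              # products the customer already bought
    options: List[str]               # candidate intentions or candidate products
    answer: int                      # index of the correct option
    difficulty: str                  # one of three levels; the names used here are made up
    intention: Optional[str] = None  # assumed: intention given as context in the purchase task


def build_prompt(p: MCQProblem) -> str:
    """Render a problem as a plain-text multiple-choice prompt for a language model."""
    if not 0 < len(p.options) <= len(LETTERS):
        raise ValueError("expected between 1 and 4 options")
    bought = "; ".join(p.products)
    if p.task == "intention":
        question = f"A customer bought: {bought}. What is their most likely intention?"
    elif p.task == "purchase":
        if p.intention is None:
            raise ValueError("purchase task needs an intention")
        question = (f"A customer bought: {bought}. Their intention: {p.intention}. "
                    "Which product are they most likely to buy as well?")
    else:
        raise ValueError(f"unknown task: {p.task}")
    lines = [question]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(p.options)]
    lines.append("Answer:")
    return "\n".join(lines)


# Toy example (invented data, not drawn from the benchmark).
example = MCQProblem(
    task="intention",
    products=["camping tent", "sleeping bag"],
    options=["to go camping", "to repaint a kitchen", "to learn piano", "to bake bread"],
    answer=0,
    difficulty="easy",
)
print(build_prompt(example))
```

Keeping the two tasks in one record type with a `task` switch is only a convenience for this sketch; separate types per task would work just as well.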

The authors run extensive experiments across 19 language models. The results show that current models still struggle in several scenarios, such as understanding products and intentions accurately and jointly reasoning over products and intentions, and that in these scenarios they fall far behind human performance.
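Because IntentionQA poses multiple-choice problems at three difficulty levels, one natural way to summarise a model's results is accuracy broken down by level. The sketch below assumes accuracy as the metric, four options labelled A to D, and a strict rule for reading the chosen letter from a model's reply; `parse_choice` and `accuracy_by_difficulty` are hypothetical helpers, not part of the released IntentionQA code.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

LETTERS = "ABCD"  # assumed option labels, matching a four-option format


def parse_choice(output: str) -> Optional[int]:
    """Map a raw model reply such as 'B', '(C)' or 'D. lamp' to an option index.

    Deliberately strict: only a leading option letter counts, so replies like
    'Answer: A' or free-form text return None and are scored as wrong.
    """
    text = output.strip().lstrip("(")
    if text and text[0] in LETTERS and (len(text) == 1 or not text[1].isalpha()):
        return LETTERS.index(text[0])
    return None


def accuracy_by_difficulty(records: List[Tuple[str, int, str]]) -> Dict[str, float]:
    """records holds (difficulty, gold_index, raw_model_output) triples."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for difficulty, gold, output in records:
        total[difficulty] += 1
        if parse_choice(output) == gold:
            correct[difficulty] += 1
    scores = {d: correct[d] / total[d] for d in total}
    scores["overall"] = sum(correct.values()) / sum(total.values()) if total else 0.0
    return scores


# Invented model replies for illustration only.
records = [
    ("easy", 0, "A"),
    ("easy", 1, "(B) sleeping bag"),
    ("hard", 2, "D. lamp"),
    ("hard", 3, "I am not sure"),
]
print(accuracy_by_difficulty(records))  # prints {'easy': 1.0, 'hard': 0.0, 'overall': 0.5}
```

Counting unparseable replies as wrong keeps the comparison conservative; a looser parser would change the scores, so whichever rule is used should be reported alongside the numbers.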

This benchmark and its model evaluations are an important step towards more intelligent virtual shopping assistants and product recommendation systems. By better understanding why customers buy what they buy, conversational AI can hold more natural and helpful dialogues, ultimately improving the e-commerce experience.

Critical Analysis

The IntentionQA dataset and benchmark presented in this paper are a valuable contribution to the field of conversational AI for e-commerce. The authors highlight the importance of accurately modeling purchase intention, which is a crucial aspect of delivering a personalized and effective shopping experience.

That said, the benchmark appears to be text-based, and it remains to be seen how well its insights will generalize to more complex, multimodal shopping interactions where product images also carry information. Additionally, because the problems are produced by an automated pipeline, the benchmark's reliability rests on the human evaluations used to validate it; potential biases or limitations of those evaluators could affect how much confidence to place in the reported quality and false-negative rate.

Further research is needed to explore more advanced techniques for intent recognition, such as incorporating contextual information about the user's browsing history, product details, and conversational flow. Techniques from related areas, such as multi-intent learning and user intention understanding, may also prove valuable in this domain.

Overall, the IntentionQA benchmark represents an important step forward, but continued innovation will be necessary to build truly intelligent and empathetic virtual shopping assistants.

Conclusion

The IntentionQA dataset and benchmark introduced in this paper provide a valuable tool for advancing the state of the art in e-commerce conversational AI. By focusing on the critical task of purchase intention comprehension, the researchers have highlighted an area where current language models still struggle and have set the stage for future improvements.

Developing AI systems that can better understand a customer's underlying motivations and intent will be crucial for delivering personalized, contextual, and human-like shopping experiences. This work lays the groundwork for more sophisticated virtual assistants and recommendation engines that can truly cater to the unique needs and preferences of each individual customer.

As the field of conversational AI continues to evolve, benchmarks like IntentionQA will play an important role in driving progress and ensuring that the technology delivers tangible benefits to both businesses and consumers alike.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

MIND: Multimodal Shopping Intention Distillation from Large Vision-language Models for E-commerce Purchase Understanding

Baixuan Xu, Weiqi Wang, Haochen Shi, Wenxuan Ding, Huihao Jing, Tianqing Fang, Jiaxin Bai, Long Chen, Yangqiu Song

Improving user experience and providing personalized search results in E-commerce platforms heavily rely on understanding purchase intention. However, existing methods for acquiring large-scale intentions bank on distilling large language models with human annotation for verification. Such an approach tends to generate product-centric intentions, overlook valuable visual information from product images, and incurs high costs for scalability. To address these issues, we introduce MIND, a multimodal framework that allows Large Vision-Language Models (LVLMs) to infer purchase intentions from multimodal product metadata and prioritize human-centric ones. Using Amazon Review data, we apply MIND and create a multimodal intention knowledge base, which contains 1,264,441 intentions derived from 126,142 co-buy shopping records across 107,215 products. Extensive human evaluations demonstrate the high plausibility and typicality of our obtained intentions and validate the effectiveness of our distillation framework and filtering mechanism. Additional experiments reveal that our obtained intentions significantly enhance large language models in two intention comprehension tasks.

6/18/2024

InterIntent: Investigating Social Intelligence of LLMs via Intention Understanding in an Interactive Game Context

Ziyi Liu, Abhishek Anand, Pei Zhou, Jen-tse Huang, Jieyu Zhao

Large language models (LLMs) have demonstrated the potential to mimic human social intelligence. However, most studies focus on simplistic and static self-report or performance-based tests, which limits the depth and validity of the analysis. In this paper, we developed a novel framework, InterIntent, to assess LLMs' social intelligence by mapping their ability to understand and manage intentions in a game setting. We focus on four dimensions of social intelligence: situational awareness, self-regulation, self-awareness, and theory of mind. Each dimension is linked to a specific game task: intention selection, intention following, intention summarization, and intention guessing. Our findings indicate that while LLMs exhibit high proficiency in selecting intentions, achieving an accuracy of 88%, their ability to infer the intentions of others is significantly weaker, trailing human performance by 20%. Additionally, game performance correlates with intention understanding, highlighting the importance of the four components towards success in this game. These findings underline the crucial role of intention understanding in evaluating LLMs' social intelligence and highlight the potential of using social deduction games as a complex testbed to enhance LLM evaluation. InterIntent contributes a structured approach to bridging the evaluation gap in social intelligence within multiplayer games.

6/19/2024

Identifying Shopping Intent in Product QA for Proactive Recommendations

Besnik Fetahu, Nachshon Cohen, Elad Haramaty, Liane Lewin-Eytan, Oleg Rokhlenko, Shervin Malmasi

Voice assistants have become ubiquitous in smart devices allowing users to instantly access information via voice questions. While extensive research has been conducted in question answering for voice search, little attention has been paid to how to enable proactive recommendations from a voice assistant to its users. This is a highly challenging problem that often leads to user friction, mainly due to recommendations provided to the users at the wrong time. We focus on the domain of e-commerce, namely in identifying Shopping Product Questions (SPQs), where the user asking a product-related question may have an underlying shopping need. Identifying a user's shopping need allows voice assistants to enhance shopping experience by determining when to provide recommendations, such as product or deal recommendations, or proactive shopping actions recommendation. Identifying SPQs is a challenging problem and cannot be done from question text alone, and thus requires inferring latent user behavior patterns from the user's past shopping history. We propose features that capture the user's latent shopping behavior from their purchase history, and combine them using a novel Mixture-of-Experts (MoE) model. Our evaluation shows that the proposed approach is able to identify SPQs with a high score of F1=0.91. Furthermore, based on an online evaluation with real voice assistant users, we identify SPQs in real-time and recommend shopping actions to users to add the queried product into their shopping list. We demonstrate that we are able to accurately identify SPQs, as indicated by the significantly higher rate of added products to users' shopping lists when being prompted after SPQs vs random PQs.

4/10/2024

Question Suggestion for Conversational Shopping Assistants Using Product Metadata

Nikhita Vedula, Oleg Rokhlenko, Shervin Malmasi

Digital assistants have become ubiquitous in e-commerce applications, following the recent advancements in Information Retrieval (IR), Natural Language Processing (NLP) and Generative Artificial Intelligence (AI). However, customers are often unsure or unaware of how to effectively converse with these assistants to meet their shopping needs. In this work, we emphasize the importance of providing customers a fast, easy to use, and natural way to interact with conversational shopping assistants. We propose a framework that employs Large Language Models (LLMs) to automatically generate contextual, useful, answerable, fluent and diverse questions about products, via in-context learning and supervised fine-tuning. Recommending these questions to customers as helpful suggestions or hints to both start and continue a conversation can result in a smoother and faster shopping experience with reduced conversation overhead and friction. We perform extensive offline evaluations, and discuss in detail the potential customer impact, and the type, length and latency of our generated product questions if incorporated into a real-world shopping assistant.

5/6/2024