MerRec: A Large-scale Multipurpose Mercari Dataset for Consumer-to-Consumer Recommendation Systems

Read original: arXiv:2402.14230 - Published 7/18/2024 by Lichi Li, Zainul Abi Din, Zhen Tan, Sam London, Tianlong Chen, Ajay Daptardar

Overview

  • This paper presents MerRec, a large-scale dataset collected from Mercari, a consumer-to-consumer (C2C) e-commerce platform.
  • The dataset contains information about products, user interactions, and transactions, making it a valuable resource for researchers working on consumer-to-consumer recommendation systems.
  • The dataset covers a wide range of product categories and user activities, providing a comprehensive view of the C2C e-commerce ecosystem.

Plain English Explanation

The researchers have created a large dataset from the Mercari e-commerce platform, where consumers buy from and sell to each other directly. The dataset includes details about the products being sold, how users interact with them, and the transactions that take place. This makes it a valuable resource for researchers developing recommendation systems for consumer-to-consumer marketplaces.

The dataset covers a broad range of product categories and user activities, giving researchers a comprehensive understanding of the C2C e-commerce landscape. By using this dataset, researchers can study how consumers discover, engage with, and purchase products in a direct, peer-to-peer environment, which is quite different from the traditional business-to-consumer e-commerce model. This could lead to the development of more effective recommendation systems that better cater to the unique needs and preferences of consumers in a C2C setting.

Technical Explanation

The researchers collected data from the Mercari C2C e-commerce platform, including information about products, user interactions, and transactions. The dataset spans a wide range of product categories. According to the paper's abstract, it includes standard fields such as user_id, item_id, and session_id, along with timestamped action types, product taxonomy, and textual product attributes.

In addition to product data, the dataset records user activity as timestamped action types tied to user, item, and session identifiers. This allows researchers to study how consumers discover and engage with products in a C2C marketplace, which can inform the design of more effective recommendation systems.
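As a minimal sketch of how interaction data of this kind is often prepared, the Python below groups events into time-ordered sequences per user session. It assumes each event carries the user_id, item_id, and session_id fields named in the paper's abstract. The `action` and `timestamp` field names, the action values, and the toy rows are hypothetical and are not taken from the dataset itself.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Event(NamedTuple):
    user_id: str
    session_id: str
    item_id: str
    action: str      # hypothetical field name for the action type
    timestamp: int   # hypothetical field name; any sortable time value works


def build_session_sequences(
    events: List[Event],
) -> Dict[Tuple[str, str], List[Tuple[str, str]]]:
    """Group events by (user_id, session_id) and order each group by time."""
    grouped = defaultdict(list)
    for ev in events:
        grouped[(ev.user_id, ev.session_id)].append(ev)
    return {
        key: [(ev.item_id, ev.action) for ev in sorted(evs, key=lambda e: e.timestamp)]
        for key, evs in grouped.items()
    }


# Toy rows invented for illustration; they are not real MerRec records.
toy_events = [
    Event("u1", "s1", "i9", "view", 30),
    Event("u1", "s1", "i7", "view", 10),
    Event("u1", "s1", "i7", "like", 20),
    Event("u2", "s5", "i9", "view", 5),
]
sequences = build_session_sequences(toy_events)
```

Keying on both user and session keeps sessions from different users apart even if session identifiers were to repeat across users, which is an assumption made here for safety rather than a documented property of the data.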

The dataset is large in scale: according to the abstract, it covers millions of users and products over 6 months in 2023, and the authors evaluate it on four recommendation tasks. With this dataset, researchers can explore aspects of C2C e-commerce such as user preferences, product discovery, and the dynamics of peer-to-peer transactions.
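One common way to turn interaction histories into a recommendation benchmark is a leave-last-out split for next-item prediction. The sketch below illustrates that general protocol only; it is not claimed to be the evaluation setup used in the paper, and the function name, the `min_len` parameter, and the toy sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def leave_last_out(
    sequences: Dict[str, List[str]], min_len: int = 2
) -> Tuple[Dict[str, List[str]], Dict[str, str]]:
    """Split each time-ordered item list into a history and a held-out last item."""
    histories: Dict[str, List[str]] = {}
    targets: Dict[str, str] = {}
    for user_id, items in sequences.items():
        if len(items) < min_len:
            continue  # too short to yield both a history and a target
        histories[user_id] = items[:-1]
        targets[user_id] = items[-1]
    return histories, targets


# Toy sequences invented for illustration; they are not real MerRec records.
toy_sequences = {
    "u1": ["i7", "i9", "i3"],
    "u2": ["i9"],
    "u3": ["i3", "i7"],
}
histories, targets = leave_last_out(toy_sequences)
```

A model would then be given each entry in `histories` and scored on whether it ranks the matching entry in `targets` highly. Users with fewer than `min_len` interactions are skipped because they cannot supply both parts.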

Critical Analysis

MerRec is a valuable resource for the research community because it provides a broad view of the consumer-to-consumer e-commerce landscape. However, the dataset is limited to a single C2C platform, Mercari, and may not be representative of all C2C marketplaces.

Additionally, according to this analysis, the dataset does not include product images or detailed product specifications. This could limit the research it supports, especially recommendation based on visual features.

Furthermore, the dataset does not include information about the demographic characteristics of the users, which could be useful for understanding how different consumer segments interact with and purchase products in a C2C environment.

Despite these limitations, MerRec remains a significant contribution to research on consumer-to-consumer recommendation systems. Researchers should keep these caveats in mind when using the dataset in their studies.

Conclusion

MerRec is a large-scale, multipurpose dataset that provides a broad view of the consumer-to-consumer e-commerce ecosystem. By releasing it publicly, the authors support the development of more effective recommendation systems for C2C marketplaces, which could have important implications for the future of peer-to-peer commerce.

The dataset covers a wide range of product categories and user activities, allowing researchers to explore various aspects of consumer behavior and preferences in a direct, peer-to-peer environment. While the dataset has some limitations, it remains a valuable resource for advancing research in the field of consumer-to-consumer recommendation systems.



This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers

MerRec: A Large-scale Multipurpose Mercari Dataset for Consumer-to-Consumer Recommendation Systems

Lichi Li, Zainul Abi Din, Zhen Tan, Sam London, Tianlong Chen, Ajay Daptardar

In the evolving e-commerce field, recommendation systems crucially shape user experience and engagement. The rise of Consumer-to-Consumer (C2C) recommendation systems, noted for their flexibility and ease of access for customer vendors, marks a significant trend. However, the academic focus remains largely on Business-to-Consumer (B2C) models, leaving a gap filled by the limited C2C recommendation datasets that lack in item attributes, user diversity, and scale. The intricacy of C2C recommendation systems is further accentuated by the dual roles users assume as both sellers and buyers, introducing a spectrum of less uniform and varied inputs. Addressing this, we introduce MerRec, the first large-scale dataset specifically for C2C recommendations, sourced from the Mercari e-commerce platform, covering millions of users and products over 6 months in 2023. MerRec not only includes standard features such as user_id, item_id, and session_id, but also unique elements like timestamped action types, product taxonomy, and textual product attributes, offering a comprehensive dataset for research. This dataset, extensively evaluated across four recommendation tasks, establishes a new benchmark for the development of advanced recommendation algorithms in real-world scenarios, bridging the gap between academia and industry and propelling the study of C2C recommendations. Our experiment code is available at https://github.com/mercari/mercari-ml-merrec-pub-us and dataset at https://huggingface.co/datasets/mercari-us/merrec.

7/18/2024

Image Score: Learning and Evaluating Human Preferences for Mercari Search

Chingis Oinar, Miao Cao, Shanshan Fu

Mercari is the largest C2C e-commerce marketplace in Japan, having more than 20 million active monthly users. Search being the fundamental way to discover desired items, we have always had a substantial amount of data with implicit feedback. Although we actively take advantage of that to provide the best service for our users, the correlation of implicit feedback for such tasks as image quality assessment is not trivial. Many traditional lines of research in Machine Learning (ML) are similarly motivated by the insatiable appetite of Deep Learning (DL) models for well-labelled training data. Weak supervision is about leveraging higher-level and/or noisier supervision over unlabeled data. Large Language Models (LLMs) are being actively studied and used for data labelling tasks. We present how we leverage a Chain-of-Thought (CoT) to enable LLM to produce image aesthetics labels that correlate well with human behavior in e-commerce settings. Leveraging LLMs is more cost-effective compared to explicit human judgment, while significantly improving the explainability of deep image quality evaluation which is highly important for customer journey optimization at Mercari. We propose a cost-efficient LLM-driven approach for assessing and predicting image quality in e-commerce settings, which is very convenient for proof-of-concept testing. We show that our LLM-produced labels correlate with user behavior on Mercari. Finally, we show our results from an online experimentation, where we achieved a significant growth in sales on the web platform.

8/22/2024

MobileConvRec: A Conversational Dataset for Mobile Apps Recommendations

Srijata Maji, Moghis Fereidouni, Vinaik Chhetri, Umar Farooq, A. B. Siddique

Existing recommendation systems have focused on two paradigms: 1- historical user-item interaction-based recommendations and 2- conversational recommendations. Conversational recommendation systems facilitate natural language dialogues between users and the system, allowing the system to solicit users' explicit needs while enabling users to inquire about recommendations and provide feedback. Due to substantial advancements in natural language processing, conversational recommendation systems have gained prominence. Existing conversational recommendation datasets have greatly facilitated research in their respective domains. Despite the exponential growth in mobile users and apps in recent years, research in conversational mobile app recommender systems has faced substantial constraints. This limitation can primarily be attributed to the lack of high-quality benchmark datasets specifically tailored for mobile apps. To facilitate research for conversational mobile app recommendations, we introduce MobileConvRec. MobileConvRec simulates conversations by leveraging real user interactions with mobile apps on the Google Play store, originally captured in large-scale mobile app recommendation dataset MobileRec. The proposed conversational recommendation dataset synergizes sequential user-item interactions, which reflect implicit user preferences, with comprehensive multi-turn conversations to effectively grasp explicit user needs. MobileConvRec consists of over 12K multi-turn recommendation-related conversations spanning 45 app categories. Moreover, MobileConvRec presents rich metadata for each app such as permissions data, security and privacy-related information, and binary executables of apps, among others. We demonstrate that MobileConvRec can serve as an excellent testbed for conversational mobile app recommendation through a comparative study of several pre-trained large language models.

5/29/2024

GenRec: A Flexible Data Generator for Recommendations

Erica Coppolillo, Simone Mungari, Ettore Ritacco, Giuseppe Manco

The scarcity of realistic datasets poses a significant challenge in benchmarking recommender systems and social network analysis methods and techniques. A common and effective solution is to generate synthetic data that simulates realistic interactions. However, although various methods have been proposed, the existing literature still lacks generators that are fully adaptable and allow easy manipulation of the underlying data distributions and structural properties. To address this issue, the present work introduces GenRec, a novel framework for generating synthetic user-item interactions that exhibit realistic and well-known properties observed in recommendation scenarios. The framework is based on a stochastic generative process based on latent factor modeling. Here, the latent factors can be exploited to yield long-tailed preference distributions, and at the same time they characterize subpopulations of users and topic-based item clusters. Notably, the proposed framework is highly flexible and offers a wide range of hyper-parameters for customizing the generation of user-item interactions. The code used to perform the experiments is publicly available at https://anonymous.4open.science/r/GenRec-DED3.

7/24/2024