A Model-based Multi-Agent Personalized Short-Video Recommender System

2405.01847

Published 5/6/2024 by Peilun Zhou, Xiaoxiao Xu, Lantao Hu, Han Li, Peng Jiang

Abstract

A recommender selects and presents the top-K items to the user at each online request, and a recommendation session consists of several sequential requests. Formulating a recommendation session as a Markov decision process and solving it with a reinforcement learning (RL) framework has attracted increasing attention from both the academic and industrial communities. In this paper, we propose an RL-based industrial short-video recommender ranking framework, which models and maximizes user watch-time in an environment of user multi-aspect preferences through a collaborative multi-agent formulation. Moreover, our proposed framework adopts a model-based learning approach to alleviate sample selection bias, which is a crucial but intractable problem in industrial recommender systems. Extensive offline evaluations and live experiments confirm the effectiveness of our proposed method over alternatives. Our proposed approach has been deployed in our real large-scale short-video sharing platform, successfully serving hundreds of millions of users.

Overview

The paper proposes a reinforcement learning (RL)-based framework for a short-video recommender system that models and maximizes user watch-time while considering user multi-aspect preferences.
The framework adopts a model-based learning approach to address the sample selection bias, a crucial but intractable problem in industrial recommender systems.
The proposed approach has been deployed in a large-scale short-video sharing platform, successfully serving hundreds of millions of users.

Plain English Explanation

In the world of online content, recommendation systems play a crucial role in helping users discover new and engaging content. A reinforcement learning-based approach for sequential recommender systems can be particularly effective in this context.

The proposed framework in this paper models the recommendation process as a Markov decision process, where the recommender system selects and presents the top-K items to the user at each request. The system then learns to maximize the user's watch-time through a collaborative multi-agent formulation, considering the user's multi-aspect preferences.

One key challenge in industrial recommender systems is sample selection bias: the logged training data typically contains feedback only for the items the system chose to show, so it does not represent how users would respond to everything else. To address this, the authors adopt a model-based learning approach, which helps alleviate the bias. Related techniques that could be explored further in this context include off-policy reinforcement learning for customized multi-objective recommendation and reformulating sequential recommendation as learning dynamic user interest.

The effectiveness of the proposed approach has been validated through extensive offline evaluations and live experiments, and it has been successfully deployed in a large-scale short-video sharing platform, serving hundreds of millions of users. Cache-aware reinforcement learning for large-scale recommender systems and prompt-based multi-interest learning for sequential recommendation are other relevant techniques that could enhance the performance of such a recommender system.

Technical Explanation

The paper proposes a reinforcement learning-based framework for a short-video recommender system that models and maximizes user watch-time while considering user multi-aspect preferences. The authors formulate the recommendation process as a Markov decision process, where the recommender system selects and presents the top-K items to the user at each request.
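
As a rough illustration of this formulation, the sketch below treats each request as one step: the action is a top-K slate and the reward is the watch-time observed for that request. The function names, the tie-breaking rule, and the discount factor `gamma` are assumptions made for this example; the summary does not state the paper's exact reward or discounting.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Return the k item ids with the highest scores (ties broken by item id)."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


def discounted_watch_time(watch_times: List[float], gamma: float = 0.9) -> float:
    """Discounted sum of per-request watch-time rewards over one session.

    watch_times[t] is the watch-time (e.g. in seconds) observed after request t.
    gamma is a hypothetical discount factor, not a value taken from the paper.
    """
    return sum((gamma ** t) * reward for t, reward in enumerate(watch_times))


# One action: pick the 2 best of 3 scored candidates.
slate = select_top_k({"a": 0.2, "b": 0.9, "c": 0.5}, k=2)

# One session of three requests, discounted with gamma = 0.5.
session_return = discounted_watch_time([10.0, 20.0, 30.0], gamma=0.5)
```

An RL policy in this setting would be trained to pick slates that maximize the session-level return rather than the immediate watch-time of a single request.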

The proposed framework adopts a collaborative multi-agent formulation to learn the optimal recommendation policy. This approach allows the system to jointly consider various user preferences, such as content, context, and user interaction history, to provide personalized recommendations that maximize the user's watch-time.
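
One way to picture a collaborative multi-agent ranker is sketched below: each agent scores candidates for a single preference aspect, the per-agent scores are summed into a joint score, and all agents learn from the same watch-time reward. The additive mixing, the linear agents, and every name here (`AspectAgent`, `shared_update`, the learning rate) are assumptions chosen for brevity, not the architecture confirmed by the paper.

```python
from typing import Dict, List, Sequence


class AspectAgent:
    """A hypothetical agent that scores items for one preference aspect."""

    def __init__(self, name: str, weights: List[float]) -> None:
        self.name = name
        self.weights = list(weights)

    def score(self, features: Sequence[float]) -> float:
        # Linear score over the item features visible to this agent.
        return sum(w * x for w, x in zip(self.weights, features))


def joint_score(agents: List[AspectAgent], features: Sequence[float]) -> float:
    """Additive mixing: the team score is the sum of the agents' scores."""
    return sum(agent.score(features) for agent in agents)


def rank_top_k(agents: List[AspectAgent],
               candidates: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Rank candidates by joint score and keep the top k (ties broken by id)."""
    scores = {item: joint_score(agents, f) for item, f in candidates.items()}
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]


def shared_update(agents: List[AspectAgent], features: Sequence[float],
                  watch_time: float, lr: float = 0.1) -> None:
    """All agents take a gradient step on the same team-level error."""
    error = watch_time - joint_score(agents, features)
    for agent in agents:
        agent.weights = [w + lr * error * x
                         for w, x in zip(agent.weights, features)]


agents = [AspectAgent("content", [1.0, 0.0]), AspectAgent("context", [0.0, 2.0])]
candidates = {"v1": [0.5, 0.1], "v2": [0.2, 0.4], "v3": [0.9, 0.0]}
slate = rank_top_k(agents, candidates, k=2)
```

Because every agent is updated from the same reward signal, no agent is rewarded for improving its own aspect at the expense of total watch-time, which is the sense of "collaborative" used here.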

To address the sample selection bias, a crucial problem in industrial recommender systems, the authors employ a model-based learning approach. This approach involves learning a predictive model of the user's response to recommendations, which can then be used to generate synthetic data and alleviate the bias in the observed data.
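
The idea can be sketched in a few lines, under strong simplifying assumptions: a user-response model is fitted on logged feedback for exposed items, then queried for items that were never shown, yielding synthetic watch-times for training. The one-feature linear model and the names `fit_response_model` and `synthesize_feedback` are hypothetical stand-ins; the summary does not describe the paper's actual model.

```python
from typing import Dict, List, Tuple


def fit_response_model(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Least-squares fit of watch_time ~ slope * feature + intercept.

    xs: one hypothetical relevance feature per exposed item.
    ys: the watch-time logged for each of those items.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        # No variation in the feature: fall back to predicting the mean.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def synthesize_feedback(model: Tuple[float, float],
                        unexposed: Dict[str, float]) -> Dict[str, float]:
    """Predict watch-time for items the user was never shown (clipped at 0)."""
    slope, intercept = model
    return {item: max(0.0, slope * x + intercept)
            for item, x in unexposed.items()}


# Fit on logged (exposed) samples, then fill in feedback for unexposed items.
model = fit_response_model([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
synthetic = synthesize_feedback(model, {"v9": 4.0, "v0": -1.0})
```

The synthetic samples let the ranking policy learn from items outside the logged slates, though their usefulness depends entirely on how accurate the response model is away from the exposed data.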

The authors conduct extensive offline evaluations and live experiments to validate the effectiveness of their proposed approach. The framework has been successfully deployed in a large-scale short-video sharing platform, serving hundreds of millions of users.

Critical Analysis

The paper presents a promising approach to improving the performance of industrial recommender systems, particularly in the context of short-video platforms. The authors' focus on modeling and maximizing user watch-time, while considering multi-aspect user preferences, is a compelling strategy for enhancing user engagement and satisfaction.

One potential limitation of the research is the reliance on a collaborative multi-agent formulation, which may introduce additional complexity and computational requirements. It would be interesting to explore alternative RL-based approaches, such as robust reinforcement learning objectives for sequential recommender systems or prompt-based multi-interest learning for sequential recommendation, to see if they can achieve similar or better performance with lower overhead.

Additionally, while the authors address the sample selection bias through a model-based learning approach, it would be valuable to understand the specific techniques used and their effectiveness in addressing this problem. Exploring other bias mitigation strategies, such as off-policy reinforcement learning algorithms for customized multi-objective recommendation or cache-aware reinforcement learning for large-scale recommender systems, could provide further insights.

Overall, the proposed framework represents a significant contribution to the field of recommender systems and has demonstrated its real-world applicability through successful deployment in a large-scale platform. Continued research and refinement of the approach could lead to even more impactful advancements in the industry.

Conclusion

The paper presents a reinforcement learning-based framework for a short-video recommender system that models and maximizes user watch-time while considering multi-aspect user preferences. The key innovations include the collaborative multi-agent formulation and the adoption of a model-based learning approach to address the sample selection bias in industrial recommender systems.

The proposed framework has been successfully deployed in a large-scale short-video sharing platform, serving hundreds of millions of users. The research advances the state-of-the-art in recommender systems and demonstrates the potential of reinforcement learning-based approaches to enhance user engagement and satisfaction in online content platforms.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

An LLM-based Recommender System Environment

Nathan Corecco, Giorgio Piatti, Luca A. Lanzendorfer, Flint Xiaofeng Fan, Roger Wattenhofer

Reinforcement learning (RL) has gained popularity in the realm of recommender systems due to its ability to optimize long-term rewards and guide users in discovering relevant content. However, the successful implementation of RL in recommender systems is challenging because of several factors, including the limited availability of online data for training on-policy methods. This scarcity requires expensive human interaction for online model training. Furthermore, the development of effective evaluation frameworks that accurately reflect the quality of models remains a fundamental challenge in recommender systems. To address these challenges, we propose a comprehensive framework for synthetic environments that simulate human behavior by harnessing the capabilities of large language models (LLMs). We complement our framework with in-depth ablation studies and demonstrate its effectiveness with experiments on movie and book recommendations. By utilizing LLMs as synthetic users, this work introduces a modular and novel framework for training RL-based recommender systems. The software, including the RL environment, is publicly available.

6/5/2024

cs.IR cs.LG

Multi-Scenario Combination Based on Multi-Agent Reinforcement Learning to Optimize the Advertising Recommendation System

Yang Zhao, Chang Zhou, Jin Cao, Yi Zhao, Shaobo Liu, Chiyu Cheng, Xingchen Li

This paper explores multi-scenario optimization on large platforms using multi-agent reinforcement learning (MARL). We address this by treating scenarios like search, recommendation, and advertising as a cooperative, partially observable multi-agent decision problem. We introduce the Multi-Agent Recurrent Deterministic Policy Gradient (MARDPG) algorithm, which aligns different scenarios under a shared objective and allows for strategy communication to boost overall performance. Our results show marked improvements in metrics such as click-through rate (CTR), conversion rate, and total sales, confirming our method's efficacy in practical settings.

7/4/2024

cs.LG cs.AI

Robust Reinforcement Learning Objectives for Sequential Recommender Systems

Melissa Mozifian, Tristan Sylvain, Dave Evans, Lili Meng

Attention-based sequential recommendation methods have shown promise in accurately capturing users' evolving interests from their past interactions. Recent research has also explored the integration of reinforcement learning (RL) into these models, in addition to generating superior user representations. By framing sequential recommendation as an RL problem with reward signals, we can develop recommender systems that incorporate direct user feedback in the form of rewards, enhancing personalization for users. Nonetheless, employing RL algorithms presents challenges, including off-policy training, expansive combinatorial action spaces, and the scarcity of datasets with sufficient reward signals. Contemporary approaches have attempted to combine RL and sequential modeling, incorporating contrastive-based objectives and negative sampling strategies for training the RL component. In this work, we further emphasize the efficacy of contrastive-based objectives paired with augmentation to address datasets with extended horizons. Additionally, we recognize the potential instability issues that may arise during the application of negative sampling. These challenges primarily stem from the data imbalance prevalent in real-world datasets, which is a common issue in offline RL contexts. Furthermore, we introduce an enhanced methodology aimed at providing a more effective solution to these challenges. Experimental results across several real datasets show our method with increased robustness and state-of-the-art performance.

4/19/2024

cs.LG cs.AI cs.IR

Multimodal Pretraining, Adaptation, and Generation for Recommendation: A Survey

Qijiong Liu, Jieming Zhu, Yanting Yang, Quanyu Dai, Zhaocheng Du, Xiao-Ming Wu, Zhou Zhao, Rui Zhang, Zhenhua Dong

Personalized recommendation serves as a ubiquitous channel for users to discover information tailored to their interests. However, traditional recommendation models primarily rely on unique IDs and categorical features for user-item matching, potentially overlooking the nuanced essence of raw item contents across multiple modalities such as text, image, audio, and video. This underutilization of multimodal data poses a limitation to recommender systems, especially in multimedia services like news, music, and short-video platforms. The recent advancements in large multimodal models offer new opportunities and challenges in developing content-aware recommender systems. This survey seeks to provide a comprehensive exploration of the latest advancements and future trajectories in multimodal pretraining, adaptation, and generation techniques, as well as their applications in enhancing recommender systems. Furthermore, we discuss current open challenges and opportunities for future research in this dynamic domain. We believe that this survey, alongside the curated resources, will provide valuable insights to inspire further advancements in this evolving landscape.

7/4/2024

cs.IR cs.MM