Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta

arXiv:2311.09544 - Published 5/24/2024 by Wei Zhang, Dai Li, Chen Liang, Fang Zhou, Zhongke Zhang, Xuewei Wang, Ru Li, Yi Zhou, Yaning Huang, Dong Liang and 10 others


Overview

Efficient user representations are crucial for personalized advertising, but strict constraints on training throughput, serving latency, and memory often limit the complexity and input feature set of online ads ranking models.
This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, making it impractical to tailor user representation learning for each model.
To address these challenges, the authors present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system and designed to enable efficient, scalable sharing of online user representations across hundreds of ads models.

Plain English Explanation

Personalized advertising is crucial for businesses to reach the right customers, but the models that decide which ads to show are bound by strict limits on training throughput, serving latency, and memory. These limits restrict how complex the models can be and how many user features they can take as input. The problem is especially acute at large companies like Meta (Facebook), which run hundreds of different ads models with different requirements.

To solve this problem, the researchers developed a system called Scaling User Modeling (SUM). SUM uses a few "upstream" models to create detailed representations of users based on a large amount of user data. These user representations are then shared with the many "downstream" ads models, allowing them to make better decisions about what ads to show without needing to process all the user data themselves.

To keep these user representations up to date as users' interests and behaviors change, SUM also includes an online serving system called SOAP (SUM Online Asynchronous Platform). SOAP lets the user models be updated frequently and computes fresh user representations on every user request, without adding latency for users.

Overall, SUM helps large advertising systems like Meta's run more efficiently and effectively by enabling the sharing of high-quality user representations across many different models.

Technical Explanation

The SUM framework leverages a few designated "upstream" user models to synthesize user embeddings from massive amounts of user features using advanced modeling techniques. These user embeddings then serve as inputs to the many "downstream" online ads ranking models, promoting efficient representation sharing.
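The upstream/downstream split can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: `upstream_user_model` stands in for a large neural user model (here it just computes three statistics), `DownstreamRanker` is a toy linear scorer, and the feature names are invented. The point it shows is the sharing pattern described above: the user embedding is computed once and then concatenated with each downstream model's own inputs, so models with different input specifications reuse the same representation.

```python
from typing import Dict, List

Embedding = List[float]


def upstream_user_model(user_features: Dict[str, float]) -> Embedding:
    """Hypothetical upstream model: compresses many raw user features into a
    small fixed-size embedding (here just three simple statistics)."""
    values = [user_features[k] for k in sorted(user_features)]
    return [sum(values), sum(values) / len(values), max(values)]


class DownstreamRanker:
    """Hypothetical downstream ads model: a linear scorer over
    [shared user embedding] + [its own model-specific ad features]."""

    def __init__(self, name: str, weights: List[float]) -> None:
        self.name = name
        self.weights = weights

    def score(self, user_emb: Embedding, ad_features: List[float]) -> float:
        x = user_emb + ad_features  # concatenate shared and model-specific inputs
        if len(x) != len(self.weights):
            raise ValueError(f"{self.name}: expected {len(self.weights)} inputs, got {len(x)}")
        return sum(w * v for w, v in zip(self.weights, x))


# Invented example features; the embedding is computed once per user...
raw_user = {"clicks_7d": 4.0, "likes_7d": 2.0, "sessions_7d": 6.0}
user_emb = upstream_user_model(raw_user)

# ...and reused by downstream models with different input specifications.
ctr_model = DownstreamRanker("ctr", [0.1, 0.0, 0.5, 1.0])
cvr_model = DownstreamRanker("cvr", [0.0, 0.2, 0.0, 0.5, 0.5])
ctr = ctr_model.score(user_emb, [0.3])
cvr = cvr_model.score(user_emb, [1.0, 2.0])
print(user_emb, round(ctr, 6), round(cvr, 6))
```

In this sketch, adding another downstream model only requires consuming the existing embedding, not re-processing the raw user features, which mirrors the efficiency argument the summary attributes to SUM.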

To adapt to the dynamic nature of user features and ensure embedding freshness, the authors designed the SUM Online Asynchronous Platform (SOAP), a latency-free online serving system complemented with model freshness and embedding stabilization. SOAP enables frequent user model updates and online inference of user embeddings upon each user request.
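The summary describes SOAP only at a high level. The following minimal Python sketch shows one way "latency-free" serving with per-request refresh could work, under an assumption that the summary does not confirm: the ranking request immediately reads whatever embedding is already stored for the user, while inference for a fresh embedding runs off the critical path and is written back for the user's next request. `upstream_user_model`, `AsyncEmbeddingServer`, and the in-memory dict standing in for an embedding store are all hypothetical, and the summary's "embedding stabilization" is not modeled.

```python
import asyncio
from typing import Dict, List, Set, Tuple


def upstream_user_model(features: List[float]) -> List[float]:
    """Hypothetical stand-in for an expensive upstream user model."""
    mean = sum(features) / len(features)
    return [2.0 * f for f in features] + [mean]


class AsyncEmbeddingServer:
    """Hypothetical sketch (assumed design, not SOAP's confirmed one): the request
    path is only a store lookup, and a fresh embedding is computed in the
    background for the user's next request."""

    def __init__(self, dim: int) -> None:
        self.store: Dict[str, List[float]] = {}  # stands in for an embedding store
        self.default = [0.0] * dim               # fallback before any embedding exists
        self.pending: Set[asyncio.Task] = set()  # keep references so tasks are not GC'd

    async def _refresh(self, user_id: str, features: List[float]) -> None:
        await asyncio.sleep(0)  # yield once to mimic work done off the critical path
        self.store[user_id] = upstream_user_model(features)

    def handle_request(self, user_id: str, features: List[float]) -> List[float]:
        # 1. Serve the stored embedding immediately (possibly stale, or the default).
        served = list(self.store.get(user_id, self.default))
        # 2. Schedule asynchronous inference; this request does not wait for it.
        task = asyncio.get_running_loop().create_task(self._refresh(user_id, features))
        self.pending.add(task)
        task.add_done_callback(self.pending.discard)
        return served


async def demo() -> Tuple[List[float], List[float]]:
    server = AsyncEmbeddingServer(dim=3)
    first = server.handle_request("u1", [1.0, 3.0])   # no embedding yet -> default
    await asyncio.gather(*server.pending)             # let the background refresh finish
    second = server.handle_request("u1", [1.0, 3.0])  # now served from the store
    await asyncio.gather(*server.pending)
    return first, second


first, second = asyncio.run(demo())
print(first, second)  # [0.0, 0.0, 0.0] [2.0, 6.0, 2.0]
```

The trade-off in this design is that each request may see an embedding that is one request old, in exchange for keeping model inference entirely out of the serving latency budget.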

The authors share their hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. SUM has been launched to hundreds of ads ranking models in Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and improved infrastructure efficiency.

Critical Analysis

The paper provides a thorough and well-documented approach to addressing the challenges of user representation learning in large-scale advertising systems. The authors demonstrate the practical deployment and effectiveness of the SUM framework within Meta's ads ranking system, which is a significant achievement.

However, the paper does not explicitly discuss potential limitations or areas for further research. For example, it would be interesting to understand how the SUM framework handles the privacy and ethical considerations of using large amounts of user data, or how it might be adapted to work with different types of personalized systems beyond advertising.

Additionally, while the paper highlights the performance gains achieved through SUM, a more detailed analysis of the trade-offs between the complexity of the user representation and the downstream model performance could provide valuable insights for researchers and practitioners in the field.

Conclusion

The Scaling User Modeling (SUM) framework presented in this paper is a significant contribution to the field of personalized advertising and user representation learning. By enabling efficient and scalable sharing of user representations across hundreds of ads models, SUM helps large advertising systems like Meta's operate more effectively while adhering to strict technical constraints.

The authors' practical deployment experience and the demonstrated performance gains highlight the real-world impact of the SUM framework. As demand for personalized advertising grows, approaches like SUM are likely to become more important for keeping these systems scalable and efficient, though they must also account for the ethical implications of using large amounts of user data.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!


Related Papers


Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta

Wei Zhang, Dai Li, Chen Liang, Fang Zhou, Zhongke Zhang, Xuewei Wang, Ru Li, Yi Zhou, Yaning Huang, Dong Liang, Kai Wang, Zhangyuan Wang, Zhengxing Chen, Fenggang Wu, Minghai Chen, Huayu Li, Yunnan Wu, Zhan Shu, Mindi Yuan, Sri Reddy

Effective user representations are pivotal in personalized advertising. However, stringent constraints on training throughput, serving latency, and memory often limit the complexity and input feature set of online ads ranking models. This challenge is magnified in extensive systems like Meta's, which encompass hundreds of models with diverse specifications, rendering the tailoring of user representation learning for each model impractical. To address these challenges, we present Scaling User Modeling (SUM), a framework widely deployed in Meta's ads ranking system, designed to facilitate efficient and scalable sharing of online user representations across hundreds of ads models. SUM leverages a few designated upstream user models to synthesize user embeddings from massive amounts of user features with advanced modeling techniques. These embeddings then serve as inputs to downstream online ads ranking models, promoting efficient representation sharing. To adapt to the dynamic nature of user features and ensure embedding freshness, we designed SUM Online Asynchronous Platform (SOAP), a latency-free online serving system complemented with model freshness and embedding stabilization, which enables frequent user model updates and online inference of user embeddings upon each user request. We share our hands-on deployment experiences for the SUM framework and validate its superiority through comprehensive experiments. To date, SUM has been launched to hundreds of ads ranking models in Meta, processing hundreds of billions of user requests daily, yielding significant online metric gains and improved infrastructure efficiency.

5/24/2024


EmbSum: Leveraging the Summarization Capabilities of Large Language Models for Content-Based Recommendations

Chiyu Zhang, Yifei Sun, Minghao Wu, Jun Chen, Jie Lei, Muhammad Abdul-Mageed, Rong Jin, Angli Liu, Ji Zhu, Sem Park, Ning Yao, Bo Long

Content-based recommendation systems play a crucial role in delivering personalized content to users in the digital world. In this work, we introduce EmbSum, a novel framework that enables offline pre-computations of users and candidate items while capturing the interactions within the user engagement history. By utilizing the pretrained encoder-decoder model and poly-attention layers, EmbSum derives User Poly-Embedding (UPE) and Content Poly-Embedding (CPE) to calculate relevance scores between users and candidate items. EmbSum actively learns the long user engagement histories by generating user-interest summaries with supervision from a large language model (LLM). The effectiveness of EmbSum is validated on two datasets from different domains, surpassing state-of-the-art (SoTA) methods with higher accuracy and fewer parameters. Additionally, the model's ability to generate summaries of user interests serves as a valuable by-product, enhancing its usefulness for personalized content recommendations.

8/20/2024

Async Learned User Embeddings for Ads Delivery Optimization

Mingwei Tang, Meng Liu, Hong Li, Junjie Yang, Chenglin Wei, Boyang Li, Dai Li, Rengan Xu, Yifan Xu, Zehua Zhang, Xiangyu Wang, Linfeng Liu, Yuelei Xie, Chengye Liu, Labib Fawaz, Li Li, Hongnan Wang, Bill Zhu, Sri Reddy

In recommendation systems, high-quality user embeddings can capture subtle preferences, enable precise similarity calculations, and adapt to changing preferences over time to maintain relevance. The effectiveness of recommendation systems depends on the quality of user embeddings. We propose to asynchronously learn high-fidelity user embeddings for billions of users each day from sequence-based multimodal user activities through a Transformer-like large-scale feature learning module. The async learned user representation embeddings (ALURE) are further converted to user similarity graphs through graph learning and then combined with user real-time activities to retrieve highly related ads candidates for the ads delivery system. Our method shows significant gains in both offline and online experiments.

6/26/2024

SoMeR: Multi-View User Representation Learning for Social Media

Siyi Guo, Keith Burghardt, Valeria Pantè, Kristina Lerman

User representation learning aims to capture user preferences, interests, and behaviors in low-dimensional vector representations. These representations have widespread applications in recommendation systems and advertising; however, existing methods typically rely on specific features like text content, activity patterns, or platform metadata, failing to holistically model user behavior across different modalities. To address this limitation, we propose SoMeR, a Social Media user Representation learning framework that incorporates temporal activities, text content, profile information, and network interactions to learn comprehensive user portraits. SoMeR encodes user post streams as sequences of timestamped textual features, uses transformers to embed this along with profile data, and jointly trains with link prediction and contrastive learning objectives to capture user similarity. We demonstrate SoMeR's versatility through two applications: 1) Identifying inauthentic accounts involved in coordinated influence operations by detecting users posting similar content simultaneously, and 2) Measuring increased polarization in online discussions after major events by quantifying how users with different beliefs moved farther apart in the embedding space. SoMeR's ability to holistically model users enables new solutions to important problems around disinformation, societal tensions, and online behavior understanding.

5/10/2024