RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

Original paper: arXiv:2408.02354. Published 8/15/2024 by Danil Gusak, Gleb Mezentsev, Ivan Oseledets, Evgeny Frolov

Overview

The paper proposes a new loss function, Reduced Cross-Entropy (RECE), for training sequential recommender systems with large item catalogues.
It targets the main practical drawback of full cross-entropy (CE) loss. CE delivers state-of-the-art recommendation quality, but the logits tensor it requires (one score per catalogue item at every sequence position) consumes excessive GPU memory when the catalogue is large.
The key idea is to approximate that large logits tensor with a GPU-efficient, locality-sensitive-hashing-like algorithm, so that the loss is computed over a small set of informative negative items instead of the entire catalogue.

Plain English Explanation

The paper is about improving how recommendation systems are trained when there is a huge number of products or items to choose from. Many sequential recommenders are trained with cross-entropy loss, which scores every item in the catalogue to learn which item a user is most likely to want next. This works very well, but when the catalogue contains hundreds of thousands or millions of items, storing all those scores quickly exhausts GPU memory.
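To make the cost concrete, here is a back-of-the-envelope sketch of the size of the dense logits tensor that full cross-entropy materializes during training. The helper name `logits_tensor_bytes` and the batch size, sequence length, and catalogue size are hypothetical choices for illustration, not figures from the paper.

```python
def logits_tensor_bytes(batch_size: int, seq_len: int, n_items: int,
                        bytes_per_value: int = 4) -> int:
    """Size in bytes of a dense (batch_size, seq_len, n_items) logits tensor.

    Full CE scores every catalogue item at every sequence position; during
    backpropagation the gradient of this tensor has the same shape again.
    """
    return batch_size * seq_len * n_items * bytes_per_value


if __name__ == "__main__":
    # Hypothetical setting: 128 sequences of length 200, a 500k-item
    # catalogue, fp32 scores (4 bytes each).
    size = logits_tensor_bytes(128, 200, 500_000)
    print(f"{size / 1e9:.1f} GB")  # prints "51.2 GB" for the logits alone
```

The paper's reported saving (up to 12 times lower peak training memory) refers to the whole training step, so it should not be read off this single-tensor calculation.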

The researchers developed a new approach called Reduced Cross-Entropy (RECE) that aims to keep the accuracy of full cross-entropy while using far less memory. The key ideas are:

Focusing on informative negatives: Instead of scoring every item the user did not choose, the model works with a small set of "negative" items. Rather than picking them at random, RECE uses a hashing-like trick to find the negatives the model currently scores highest, because those items dominate the cross-entropy calculation.
Approximating the full loss: Cross-entropy is computed over the correct item plus these selected negatives, which approximates the full loss while storing only a small fraction of the scores. This "reduced" version of cross-entropy loss is what gives the approach its name.
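As a minimal sketch of the idea, assuming the reduced loss is an ordinary softmax cross-entropy restricted to the positive item plus a chosen set of negatives (the exact RECE formulation is given in the paper), the following pure-Python code compares it with full cross-entropy on toy scores. The function names and the toy logits are hypothetical.

```python
import math
from typing import Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def full_ce(scores: Sequence[float], pos: int) -> float:
    """Cross-entropy with a softmax over the whole catalogue."""
    return logsumexp(scores) - scores[pos]


def reduced_ce(scores: Sequence[float], pos: int, negatives: Sequence[int]) -> float:
    """Cross-entropy with a softmax over the positive plus selected negatives only."""
    subset = [scores[pos]] + [scores[j] for j in negatives if j != pos]
    return logsumexp(subset) - scores[pos]


if __name__ == "__main__":
    scores = [2.0, 0.5, 3.1, -1.0, 0.0, 2.8, -2.5, 0.1]  # toy logits for 8 items
    pos = 0  # index of the item the user actually interacted with next
    print("full CE:   ", full_ce(scores, pos))
    # Keep only the two highest-scoring negatives (items 2 and 5).
    print("reduced CE:", reduced_ce(scores, pos, negatives=[2, 5]))
```

Because the reduced softmax drops terms from the normalizer, the reduced loss never exceeds the full one, and the gap is small when the dropped terms (the low-scoring items) contribute little.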

Using these techniques, the researchers report that RECE cuts peak training memory by up to 12 times compared to existing methods while matching or exceeding the recommendation quality of full cross-entropy. This could make high-quality sequential recommenders practical for real-world applications with very large product catalogues.

Technical Explanation

The paper introduces a new loss function called Reduced Cross-Entropy (RECE) for training large-catalogue sequential recommender systems. The key innovations are:

Hard-negative selection: Full CE loss evaluates the model's output for every item in the catalogue, producing a logits tensor whose size grows with batch size × sequence length × catalogue size. RECE avoids materializing this tensor. It uses a GPU-efficient, locality-sensitive-hashing-like algorithm to identify, for each sequence position, a small set of negative items with large logits, which contribute most to the softmax normalization term.
Reduced loss calculation: The cross-entropy is then computed over the positive item and these selected negatives only. This approximates the full logits tensor where it matters most, so memory no longer scales with a full batch × sequence × catalogue tensor of scores.
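The abstract describes the selection step only as a GPU-efficient, locality-sensitive-hashing-like algorithm. The sketch below illustrates the general idea with a generic random-projection bucketing scheme in pure Python, one query at a time: embeddings that point in similar directions tend to land in the same bucket, so logits are computed only inside the query's bucket and the top k are kept as negatives. This is not the authors' GPU implementation; `candidate_negatives`, `bucket_of`, `n_buckets`, `k`, and the bucketing rule are illustrative assumptions.

```python
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product, i.e. the logit of an item embedding for a query."""
    return sum(x * y for x, y in zip(a, b))


def bucket_of(vec: Vector, projections: Sequence[Vector]) -> int:
    """Hypothetical hash: index of the random direction with the largest projection."""
    return max(range(len(projections)), key=lambda i: dot(vec, projections[i]))


def candidate_negatives(query: Vector, items: Sequence[Vector], pos: int,
                        n_buckets: int = 4, k: int = 2, seed: int = 0) -> List[int]:
    """Return up to k high-scoring negatives for one query (illustrative only).

    Vectors pointing in similar directions are more likely to share a bucket,
    so logits are computed only for items in the query's bucket rather than
    for the whole catalogue.
    """
    rng = random.Random(seed)
    dim = len(query)
    # Random Gaussian directions shared by queries and items.
    projections = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_buckets)]
    buckets: Dict[int, List[int]] = {}
    for idx, emb in enumerate(items):
        buckets.setdefault(bucket_of(emb, projections), []).append(idx)
    # Candidates: items in the query's bucket, excluding the positive item.
    candidates = [j for j in buckets.get(bucket_of(query, projections), []) if j != pos]
    # Score only the candidates and keep the k largest logits.
    candidates.sort(key=lambda j: dot(query, items[j]), reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    rng = random.Random(42)
    items = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(1_000)]  # toy catalogue
    query = [rng.gauss(0.0, 1.0) for _ in range(8)]  # toy sequence representation
    print(candidate_negatives(query, items, pos=7, n_buckets=8, k=5))
```

Because logits are inner products, an item with a large norm can score highly even from a different bucket, which is one reason such bucketing is only an approximation. A practical version would hash all positions in a batch at once and typically repeat the hashing with several independent projections to reduce the chance of missing a high-scoring item.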

The paper presents RECE as a memory-efficient approximation of full CE loss. Experiments on several recommendation datasets show that RECE cuts peak training memory usage by up to 12 times compared to existing methods while retaining or exceeding the recommendation quality of full CE loss, which makes it especially attractive for datasets with very large item catalogues.

Critical Analysis

The paper presents a thorough evaluation of RECE, comparing it against existing loss functions on multiple recommendation datasets. The authors motivate the approach clearly and demonstrate its practical effectiveness, most notably the large reduction in peak GPU memory.

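One potential limitation is that RECE's quality depends on how well its hashing-like procedure finds the highest-scoring negatives. The approximation introduces its own settings (for example, how many negatives to keep), which trade memory for fidelity to full CE, so practitioners should check how sensitive results are to these settings on their own data. Further comparisons with other negative selection strategies could also clarify when the extra machinery pays off.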
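The choice of negatives matters because a reduced normalizer always underestimates the full one. For a fixed budget of k negatives, the k highest-scoring items give the smallest gap, since they contribute the largest exponentials. The sketch below checks this numerically on toy Gaussian logits; the helper names are hypothetical, and the uniform sampler is used without any correction term (corrected samplers would behave differently).

```python
import math
import random
from typing import List, Sequence


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def normalizer_gap(scores: Sequence[float], pos: int, negatives: Sequence[int]) -> float:
    """Full log-normalizer minus the reduced one.

    Non-negative whenever `negatives` are distinct indices that exclude `pos`.
    """
    subset = [scores[pos]] + [scores[j] for j in negatives]
    return logsumexp(scores) - logsumexp(subset)


def top_k_negatives(scores: Sequence[float], pos: int, k: int) -> List[int]:
    """The k highest-scoring items other than the positive."""
    others = [j for j in range(len(scores)) if j != pos]
    return sorted(others, key=lambda j: scores[j], reverse=True)[:k]


def uniform_negatives(n_items: int, pos: int, k: int, rng: random.Random) -> List[int]:
    """k distinct items drawn uniformly at random, excluding the positive."""
    others = [j for j in range(n_items) if j != pos]
    return rng.sample(others, k)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [rng.gauss(0.0, 2.0) for _ in range(10_000)]  # toy logits
    pos, k = 0, 64
    print("top-k gap:  ", normalizer_gap(scores, pos, top_k_negatives(scores, pos, k)))
    print("uniform gap:", normalizer_gap(scores, pos, uniform_negatives(len(scores), pos, k, rng)))
```

The top-k gap is never larger than the gap for any other set of k negatives, which is why a good approximate search for high-scoring items is central to this kind of reduced loss.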

Additionally, the paper does not discuss the potential impact of the RECE approach on model interpretability or the ability to explain recommendations to users. As recommendation systems become more widely deployed, there is an increasing focus on developing models that are not only accurate but also transparent and explainable.

Conclusion

The RECE approach presented in this paper is a notable advance for large-catalogue sequential recommendation. By tackling the GPU memory cost of full cross-entropy loss, RECE lets models be trained with far less memory while retaining, or even exceeding, the recommendation accuracy of full CE. This could enable the deployment of more powerful and scalable recommendation systems, particularly in domains with extremely large product catalogues.

The paper's contributions to improving the training process for large-scale recommender systems are an important step towards more efficient and effective recommendation technologies. As the field continues to evolve, further research on negative sampling strategies, model interpretability, and real-world deployment considerations could build upon the foundations laid in this work.

This summary was produced with help from an AI and may contain inaccuracies; consult the original paper (arXiv:2408.02354) for details.

Related Papers

RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

Danil Gusak, Gleb Mezentsev, Ivan Oseledets, Evgeny Frolov

Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating large tensor of logits, this paper introduces a novel RECE (REduced Cross-Entropy) loss. RECE significantly reduces memory consumption while allowing one to enjoy the state-of-the-art performance of full CE loss. Experimental results on various datasets show that RECE cuts training peak memory usage by up to 12 times compared to existing methods while retaining or exceeding performance metrics of CE loss. The approach also opens up new possibilities for large-scale applications in other domains.

8/15/2024

Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders

Cong Xu, Zhangchi Zhu, Mo Yu, Jun Wang, Jianyong Wang, Wei Zhang

Large language models (LLMs) have been garnering increasing attention in the recommendation community. Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve 'state-of-the-art' performance in sequential recommendation. However, most of the baselines used for comparison are trained using a pointwise/pairwise loss function. This inconsistent experimental setting leads to the underestimation of traditional methods and further fosters over-confidence in the ranking capability of LLMs. In this study, we provide theoretical justification for the superiority of the cross-entropy loss by demonstrating its two desirable properties: tightness and coverage. Furthermore, this study sheds light on additional novel insights: 1) Taking into account only the recommendation performance, CE is not yet optimal as it is not a quite tight bound in terms of some ranking metrics. 2) In scenarios where full softmax cannot be performed, an effective alternative is to scale up the sampled normalizing term. These findings then help unleash the potential of traditional recommendation models, allowing them to surpass LLM-based counterparts. Given the substantial computational burden, existing LLM-based methods are not as effective as claimed for sequential recommendation. We hope that these theoretical understandings in conjunction with the empirical results will facilitate an objective evaluation of LLM-based recommendation in the future.

8/27/2024

SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

Xiaodong Yang, Huiyuan Chen, Yuchen Yan, Yuxin Tang, Yuying Zhao, Eric Xu, Yiwei Cai, Hanghang Tong

The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address this issue, the recently proposed Sampled Softmax Cross-Entropy (SSM) compares one positive sample with multiple negative samples, leading to better performance. Our comprehensive experiments confirm that recommender systems consistently benefit from multiple negative samples during training. Furthermore, we introduce a Simplified Sampled Softmax Cross-Entropy Loss (SimCE), which simplifies the SSM using its upper bound. Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.

6/26/2024

Adaptive Retrieval and Scalable Indexing for k-NN Search with Cross-Encoders

Nishant Yadav, Nicholas Monath, Manzil Zaheer, Rob Fergus, Andrew McCallum

Cross-encoder (CE) models which compute similarity by jointly encoding a query-item pair perform better than embedding-based models (dual-encoders) at estimating query-item relevance. Existing approaches perform k-NN search with CE by approximating the CE similarity with a vector embedding space fit either with dual-encoders (DE) or CUR matrix factorization. DE-based retrieve-and-rerank approaches suffer from poor recall on new domains and the retrieval with DE is decoupled from the CE. While CUR-based approaches can be more accurate than the DE-based approach, they require a prohibitively large number of CE calls to compute item embeddings, thus making it impractical for deployment at scale. In this paper, we address these shortcomings with our proposed sparse-matrix factorization based method that efficiently computes latent query and item embeddings to approximate CE scores and performs k-NN search with the approximate CE similarity. We compute item embeddings offline by factorizing a sparse matrix containing query-item CE scores for a set of train queries. Our method produces a high-quality approximation while requiring only a fraction of CE calls as compared to CUR-based methods, and allows for leveraging DE to initialize the embedding space while avoiding compute- and resource-intensive finetuning of DE via distillation. At test time, the item embeddings remain fixed and retrieval occurs over rounds, alternating between a) estimating the test query embedding by minimizing error in approximating CE scores of items retrieved thus far, and b) using the updated test query embedding for retrieving more items. Our k-NN search method improves recall by up to 5% (k=1) and 54% (k=100) over DE-based approaches. Additionally, our indexing approach achieves a speedup of up to 100x over CUR-based and 5x over DE distillation methods, while matching or improving k-NN search recall over baselines.

5/7/2024