Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders

Read original: arXiv:2408.14238 - Published 8/27/2024 by Cong Xu, Zhangchi Zhu, Mo Yu, Jun Wang, Jianyong Wang, Wei Zhang

Overview

The paper examines whether traditional sequential recommenders can match or beat recommenders built on large language models (LLMs).
It proposes a simple scaled cross-entropy loss function that can unleash the power of traditional recommenders and outperform LLM-based approaches.
The authors demonstrate the effectiveness of their approach through experiments on various benchmark datasets.

Plain English Explanation

Recommender systems are algorithms that suggest products, content, or services that users might be interested in. This paper compares two families of them: traditional sequential recommenders and recommenders based on large language models (LLMs).

Traditional sequential recommenders are comparatively small models that predict a user's next item from the ordered history of that user's past interactions. LLM-based recommenders instead take a large language model, pretrained on vast amounts of data, and fine-tune it to produce the recommendations.

This paper suggests that traditional sequential recommenders may have more potential than we give them credit for. The researchers developed a simple scaled cross-entropy loss function that can significantly improve the performance of traditional recommenders, allowing them to outperform even the latest LLM-based approaches.

By changing only the training loss, the authors show that traditional recommenders can be just as effective as the more complex LLM-based systems, if not more so. This could have important implications for the field of recommender systems, as traditional approaches may be more efficient and cost-effective than relying solely on large language models.

Technical Explanation

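The paper starts by questioning the prevailing assumption that LLM-based recommenders are already the best approach. The authors point out that the LLM-based models in earlier comparisons were fine-tuned with a cross-entropy (CE) loss over a full softmax, while most of the traditional baselines were trained with pointwise or pairwise losses, so the comparison was not like-for-like. They argue that traditional sequential recommenders still have significant untapped potential that can be unlocked through simple modifications to the loss function.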

To demonstrate this, the researchers propose a Scaled Cross-Entropy (SCE) loss function, a small modification of the standard cross-entropy loss commonly used in recommender systems. According to the abstract, it targets scenarios where a full softmax over the whole item catalog cannot be performed: the loss is computed over a sample of items, and the sampled normalizing term is scaled up to compensate for the items that were left out.
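
The abstract describes the alternative to a full softmax as scaling up the sampled normalizing term. The sketch below shows one reading of that idea in plain Python, next to the full cross-entropy it approximates. The function names, the scaling factor `alpha`, and the choice of `alpha` as (non-target items) / (sampled negatives) are assumptions made for illustration, not the authors' implementation.

```python
import math
import random


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def full_cross_entropy(scores, target):
    """Cross-entropy with a full softmax over every item in the catalog."""
    return log_sum_exp(scores) - scores[target]


def scaled_sampled_cross_entropy(scores, target, negatives, alpha):
    """Sampled cross-entropy whose negative normalizing term is scaled by alpha.

    loss = -log( exp(s_t) / (exp(s_t) + alpha * sum_j exp(s_j)) ), j in negatives.
    Multiplying the sum by alpha equals adding log(alpha) to each negative score.
    `alpha` is a hypothetical hyperparameter name (assumption).
    """
    shifted = [scores[j] + math.log(alpha) for j in negatives]
    return log_sum_exp([scores[target]] + shifted) - scores[target]


if __name__ == "__main__":
    rng = random.Random(0)
    num_items = 1000
    scores = [rng.gauss(0.0, 1.0) for _ in range(num_items)]  # toy item scores
    target = 3
    candidates = [i for i in range(num_items) if i != target]
    negatives = rng.sample(candidates, 100)

    print("full CE         :", full_cross_entropy(scores, target))
    print("sampled, alpha=1:", scaled_sampled_cross_entropy(scores, target, negatives, 1.0))
    # Illustrative choice only: scale by how many negatives each sample stands for.
    alpha = len(candidates) / len(negatives)
    print("sampled, scaled :", scaled_sampled_cross_entropy(scores, target, negatives, alpha))
```

With `alpha = 1` and every non-target item used as a negative, the scaled loss reduces to the full cross-entropy; with fewer negatives, a larger `alpha` raises the normalizing term toward what the full softmax would have produced.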

The authors conduct extensive experiments on several benchmark datasets, comparing traditional recommenders trained with the SCE loss against both state-of-the-art LLM-based approaches and traditional recommenders trained with the standard cross-entropy loss. The results show that traditional recommenders trained with SCE can outperform the LLM-based methods, highlighting the potential of traditional recommenders when properly optimized.
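
The summary does not name the metrics behind these comparisons. As a hedged illustration, the sketch below computes two ranking metrics that are common in next-item recommendation, hit rate at k and NDCG at k, for a single held-out target item. Treating these as the relevant metrics is an assumption, and all function names are hypothetical.

```python
import math


def rank_of_target(scores, target):
    """1-based rank of the target when items are sorted by descending score."""
    return 1 + sum(1 for s in scores if s > scores[target])


def hit_rate_at_k(rank, k):
    """1.0 if the target appears in the top-k list, else 0.0."""
    return 1.0 if rank <= k else 0.0


def ndcg_at_k(rank, k):
    """NDCG@k with one relevant item: the ideal DCG is 1, so NDCG = 1 / log2(1 + rank)."""
    return 1.0 / math.log2(1 + rank) if rank <= k else 0.0


if __name__ == "__main__":
    scores = [0.1, 0.9, 0.4, 0.7]  # toy scores for a 4-item catalog
    rank = rank_of_target(scores, target=2)
    print("rank:", rank)
    print("HR@2:", hit_rate_at_k(rank, 2), "NDCG@5:", ndcg_at_k(rank, 5))
```

Both metrics depend only on the rank of the target item, which is why a loss that bounds them more tightly can translate into better reported numbers.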

The paper also discusses the practical implications of their findings, suggesting that traditional recommenders may be more efficient and scalable than LLM-based systems, particularly in scenarios with large item catalogs. This could make traditional approaches more appealing for real-world deployment, especially in resource-constrained environments.
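
One reason large catalogs matter is that a full softmax needs a score for every item at every training position. The arithmetic below uses illustrative sizes that are not taken from the paper (batch size, sequence length, catalog size, and 32-bit floats are all assumptions) to show how the score tensor grows with the catalog and how much smaller it is when only a sample of items is scored.

```python
def logits_bytes(batch_size, seq_len, num_scored_items, bytes_per_value=4):
    """Memory for one dense score tensor of shape (batch, seq_len, items)."""
    return batch_size * seq_len * num_scored_items * bytes_per_value


if __name__ == "__main__":
    # Hypothetical setup: 256 sequences of length 50, one million items.
    full = logits_bytes(256, 50, 1_000_000)
    # Same setup, scoring the target plus 100 sampled negatives per position.
    sampled = logits_bytes(256, 50, 1 + 100)
    print(f"full softmax scores: {full / 1e9:.1f} GB")
    print(f"sampled scores     : {sampled / 1e6:.1f} MB")
```

This counts only the score tensor itself; activations, gradients, and model parameters add to the total in either case.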

Critical Analysis

The paper makes a compelling case for the continued relevance and potential of traditional sequential recommenders, which is an important counterpoint to the growing dominance of LLM-based approaches in the field. By demonstrating the effectiveness of a simple modification to the loss function, the authors show that traditional recommenders can be highly competitive with state-of-the-art LLM-based systems.

However, the paper does not delve into the potential limitations or caveats of the proposed SCE loss function. It would be valuable to understand the specific conditions or datasets where this approach may be less effective, as well as any potential drawbacks or tradeoffs that users should be aware of.

Additionally, this summary does not cover why the loss function is so effective. The abstract states that the paper justifies cross-entropy theoretically through two properties, tightness and coverage, and notes that CE is still not quite a tight bound for some ranking metrics. A closer look at that analysis could provide valuable insights for the broader recommender systems community.

Finally, while the paper highlights the practical implications of its findings, it does not address the potential challenges or obstacles that may arise in deploying traditional recommenders at scale, particularly in the face of the increasing dominance of LLM-based systems. Addressing these practical considerations could further strengthen the paper's impact and relevance.

Conclusion

This paper challenges the assumption that LLM-based recommenders are inherently superior to traditional sequential recommenders. By proposing a simple scaled cross-entropy loss function, the authors demonstrate that traditional approaches can be highly competitive and, in some cases, outperform state-of-the-art LLM-based systems.

The findings of this research have the potential to reshape the landscape of recommender systems, suggesting that traditional techniques still have significant untapped potential that can be unlocked through targeted optimizations. This could lead to more efficient and scalable recommender systems, particularly in resource-constrained environments, and inspire further research into the continued evolution of traditional recommender approaches.

Overall, this paper offers a compelling perspective on the ongoing debate between LLM-based and traditional recommender systems, and its insights could have far-reaching implications for the future development of recommendation technologies.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Are LLM-based Recommenders Already the Best? Simple Scaled Cross-entropy Unleashes the Potential of Traditional Sequential Recommenders

Cong Xu, Zhangchi Zhu, Mo Yu, Jun Wang, Jianyong Wang, Wei Zhang

Large language models (LLMs) have been garnering increasing attention in the recommendation community. Some studies have observed that LLMs, when fine-tuned by the cross-entropy (CE) loss with a full softmax, could achieve 'state-of-the-art' performance in sequential recommendation. However, most of the baselines used for comparison are trained using a pointwise/pairwise loss function. This inconsistent experimental setting leads to the underestimation of traditional methods and further fosters over-confidence in the ranking capability of LLMs. In this study, we provide theoretical justification for the superiority of the cross-entropy loss by demonstrating its two desirable properties: tightness and coverage. Furthermore, this study sheds light on additional novel insights: 1) Taking into account only the recommendation performance, CE is not yet optimal as it is not quite a tight bound in terms of some ranking metrics. 2) In scenarios where a full softmax cannot be performed, an effective alternative is to scale up the sampled normalizing term. These findings then help unleash the potential of traditional recommendation models, allowing them to surpass LLM-based counterparts. Given the substantial computational burden, existing LLM-based methods are not as effective as claimed for sequential recommendation. We hope that these theoretical understandings in conjunction with the empirical results will facilitate an objective evaluation of LLM-based recommendation in the future.

8/27/2024

RECE: Reduced Cross-Entropy Loss for Large-Catalogue Sequential Recommenders

Danil Gusak, Gleb Mezentsev, Ivan Oseledets, Evgeny Frolov

Scalability is a major challenge in modern recommender systems. In sequential recommendations, full Cross-Entropy (CE) loss achieves state-of-the-art recommendation quality but consumes excessive GPU memory with large item catalogs, limiting its practicality. Using a GPU-efficient locality-sensitive hashing-like algorithm for approximating a large tensor of logits, this paper introduces a novel RECE (REduced Cross-Entropy) loss. RECE significantly reduces memory consumption while allowing one to enjoy the state-of-the-art performance of full CE loss. Experimental results on various datasets show that RECE cuts training peak memory usage by up to 12 times compared to existing methods while retaining or exceeding performance metrics of CE loss. The approach also opens up new possibilities for large-scale applications in other domains.

8/15/2024

SimCE: Simplifying Cross-Entropy Loss for Collaborative Filtering

Xiaodong Yang, Huiyuan Chen, Yuchen Yan, Yuxin Tang, Yuying Zhao, Eric Xu, Yiwei Cai, Hanghang Tong

The learning objective is integral to collaborative filtering systems, where the Bayesian Personalized Ranking (BPR) loss is widely used for learning informative backbones. However, BPR often experiences slow convergence and suboptimal local optima, partially because it only considers one negative item for each positive item, neglecting the potential impacts of other unobserved items. To address this issue, the recently proposed Sampled Softmax Cross-Entropy (SSM) compares one positive sample with multiple negative samples, leading to better performance. Our comprehensive experiments confirm that recommender systems consistently benefit from multiple negative samples during training. Furthermore, we introduce a Simplified Sampled Softmax Cross-Entropy Loss (SimCE), which simplifies the SSM using its upper bound. Our validation on 12 benchmark datasets, using both MF and LightGCN backbones, shows that SimCE significantly outperforms both BPR and SSM.

6/26/2024

Entropic Distribution Matching in Supervised Fine-tuning of LLMs: Less Overfitting and Better Diversity

Ziniu Li, Congliang Chen, Tian Xu, Zeyu Qin, Jiancong Xiao, Ruoyu Sun, Zhi-Quan Luo

Large language models rely on Supervised Fine-Tuning (SFT) to specialize in downstream tasks. Cross Entropy (CE) loss is the de facto choice in SFT, but it often leads to overfitting and limited output diversity due to its aggressive updates to the data distribution. This paper aims to address these issues by introducing the maximum entropy principle, which favors models with flatter distributions that still effectively capture the data. Specifically, we develop a new distribution matching method called GEM, which solves reverse Kullback-Leibler divergence minimization with an entropy regularizer. For the SFT of Llama-3-8B models, GEM outperforms CE in several aspects. First, when applied to the UltraFeedback dataset to develop general instruction-following abilities, GEM exhibits reduced overfitting, evidenced by lower perplexity and better performance on the IFEval benchmark. Furthermore, GEM enhances output diversity, leading to performance gains of up to 7 points on math reasoning and code generation tasks using best-of-n sampling, even without domain-specific data. Second, when fine-tuning with domain-specific datasets for math reasoning and code generation, GEM also shows less overfitting and improvements of up to 10 points compared with CE.

8/30/2024