Learning from Streaming Data when Users Choose

2406.01481

Published 6/4/2024 by Jinyan Su, Sarah Dean

Learning from Streaming Data when Users Choose

Abstract

In digital markets comprised of many competing services, each user chooses between multiple service providers according to their preferences, and the chosen service makes use of the user data to incrementally improve its model. The service providers' models influence which service the user will choose at the next time step, and the user's choice, in return, influences the model update, leading to a feedback loop. In this paper, we formalize the above dynamics and develop a simple and efficient decentralized algorithm to locally minimize the overall user loss. Theoretically, we show that our algorithm asymptotically converges to stationary points of of the overall loss almost surely. We also experimentally demonstrate the utility of our algorithm with real world data.

Create account to get full access

Overview

The paper explores learning algorithms for streaming data scenarios where users choose which data to interact with.
This setting is common in real-world applications like online advertising, recommendation systems, and interactive AI assistants.
The key challenge is that the learning algorithm only observes the data that users choose to engage with, rather than the full data distribution.
The paper proposes new learning algorithms and analysis techniques to handle this "selective labeling" problem.

Plain English Explanation

In many real-world applications, such as online advertising, recommendation systems, and interactive AI assistants, the data that the learning algorithm observes is not a random sample of the full data distribution. Instead, users choose which data to interact with, so the algorithm only sees the data that users engage with.

This "selective labeling" problem creates challenges for traditional machine learning approaches, which assume the training data is representative of the full data distribution. The paper proposes new learning algorithms and analysis techniques to handle this issue.

The core idea is to leverage the patterns in how users choose to interact with the data, in addition to the data itself, to improve the learning algorithm's performance. By understanding the user's selection process, the algorithm can better estimate the true underlying data distribution and make more accurate predictions.

This research has important implications for building effective machine learning systems that can learn and adapt in real-world, user-driven environments. The techniques developed in this paper could help enable more robust and timely learning from user interactions, as well as better align the system's objectives with user preferences and human feedback.

Technical Explanation

The paper proposes a new learning framework for scenarios where the training data is selectively labeled by users. Specifically, the authors consider a setting where the learner observes a stream of data points, and for each data point, the user decides whether to interact with it or not. The learner only sees the data points that the user chooses to interact with, rather than the full data distribution.

To address this challenge, the authors develop two new learning algorithms: a generic algorithm that can be applied to a wide range of base models, and a more specialized algorithm for logistic regression. Both algorithms work by estimating the user's selection process, and then using this understanding to correct the bias in the observed data and improve the overall learning performance.

The key technical insights are:

Modeling the user's selection process as a latent variable in the learning problem, and developing efficient inference techniques to estimate this process.
Deriving generalization bounds that characterize the impact of selective labeling on the learner's performance, and using these bounds to design improved algorithms.
Showing that the proposed algorithms can achieve strong theoretical guarantees, including fast convergence rates and robustness to distribution shift.

The paper demonstrates the effectiveness of the proposed approaches through experiments on several real-world datasets, including online advertising and movie recommendation scenarios. The results show that the new algorithms can significantly outperform traditional methods that do not account for selective labeling, particularly in settings with high user-data mismatch or distribution shift.

Critical Analysis

The paper makes an important contribution to the field of machine learning by addressing the crucial issue of learning from selectively labeled data, which is a common challenge in many real-world applications. The proposed algorithms and analysis techniques provide a solid foundation for building more robust and effective learning systems in user-driven environments.

However, the paper does have some limitations and areas for further research:

The analysis focuses on the batch setting, where the full stream of data is available upfront. Extending the techniques to the truly online, streaming setting with limited memory and computational resources would be an important next step.
The paper assumes that the user's selection process can be accurately modeled as a latent variable. In practice, this process may be more complex and difficult to estimate, especially in settings with diverse user behaviors or evolving preferences.
The experimental evaluation is limited to relatively small-scale datasets. Assessing the scalability and practical performance of the algorithms on large-scale, industrial-level applications would help validate their real-world applicability.

Additionally, future research could explore ways to jointly optimize the user interaction design and the learning algorithm, in order to better align the system's objectives with user preferences and feedback. This could lead to more effective uplift modeling approaches that can better leverage user interactions to improve learning performance.

Conclusion

This paper tackles an important and practical problem in machine learning: learning from streaming data when users selectively choose which data to engage with. The proposed algorithms and analysis techniques provide a strong foundation for building more robust and effective learning systems in real-world, user-driven environments.

By accounting for the user's selective labeling process, the new methods can outperform traditional approaches that assume the training data is representative of the full data distribution. This has significant implications for a wide range of applications, from online advertising and recommendation systems to interactive AI assistants.

While the paper has some limitations, it represents an important step forward in addressing the challenges of learning from selectively labeled data. Further research in this direction, including extensions to the truly online, streaming setting and tighter integration with user interaction design, could lead to even more powerful and user-centric machine learning systems.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

⚙️

Emergent specialization from participation dynamics and multi-learner retraining

Sarah Dean, Mihaela Curmei, Lillian J. Ratliff, Jamie Morgenstern, Maryam Fazel

Numerous online services are data-driven: the behavior of users affects the system's parameters, and the system's parameters affect the users' experience of the service, which in turn affects the way users may interact with the system. For example, people may choose to use a service only for tasks that already works well, or they may choose to switch to a different service. These adaptations influence the ability of a system to learn about a population of users and tasks in order to improve its performance broadly. In this work, we analyze a class of such dynamics -- where users allocate their participation amongst services to reduce the individual risk they experience, and services update their model parameters to reduce the service's risk on their current user population. We refer to these dynamics as emph{risk-reducing}, which cover a broad class of common model updates including gradient descent and multiplicative weights. For this general class of dynamics, we show that asymptotically stable equilibria are always segmented, with sub-populations allocated to a single learner. Under mild assumptions, the utilitarian social optimum is a stable equilibrium. In contrast to previous work, which shows that repeated risk minimization can result in (Hashimoto et al., 2018; Miller et al., 2021), we find that repeated myopic updates with multiple learners lead to better outcomes. We illustrate the phenomena via a simulated example initialized from real data.

4/30/2024

cs.LG cs.GT stat.ML

Scale-Robust Timely Asynchronous Decentralized Learning

Purbesh Mitra, Sennur Ulukus

We consider an asynchronous decentralized learning system, which consists of a network of connected devices trying to learn a machine learning model without any centralized parameter server. The users in the network have their own local training data, which is used for learning across all the nodes in the network. The learning method consists of two processes, evolving simultaneously without any necessary synchronization. The first process is the model update, where the users update their local model via a fixed number of stochastic gradient descent steps. The second process is model mixing, where the users communicate with each other via randomized gossiping to exchange their models and average them to reach consensus. In this work, we investigate the staleness criteria for such a system, which is a sufficient condition for convergence of individual user models. We show that for network scaling, i.e., when the number of user devices $n$ is very large, if the gossip capacity of individual users scales as $Omega(log n)$, we can guarantee the convergence of user models in finite time. Furthermore, we show that the bounded staleness can only be guaranteed by any distributed opportunistic scheme by $Omega(n)$ scaling.

5/1/2024

cs.IT cs.LG cs.MA cs.NI eess.SP

🐍

From Stream to Pool: Dynamic Pricing Beyond i.i.d. Arrivals

Titing Cui, Su Jia, Thomas Lavastida

Dynamic pricing models often posit that a $textbf{stream}$ of customer interactions occur sequentially, where customers' valuations are drawn independently. However, this model is not entirely reflective of the real world, as it overlooks a critical aspect, the law of diminishing marginal utility, which states that a customer's marginal utility from each additional unit declines. This causes the valuation distribution to shift towards the lower end, which is not captured by the stream model. This motivates us to study a pool-based model, where a $textbf{pool}$ of customers repeatedly interacts with a monopolist seller, each of whose valuation diminishes in the number of purchases made according to a discount function. In particular, when the discount function is constant, our pool model recovers the stream model. We focus on the most fundamental special case, where a customer's valuation becomes zero once a purchase is made. Given $k$ prices, we present a non-adaptive, detail-free (i.e., does not know the valuations) policy that achieves a $1/k$ competitive ratio, which is optimal among non-adaptive policies. Furthermore, based on a novel debiasing technique, we propose an adaptive learn-then-earn policy with a $tilde O(k^{2/3} n^{2/3})$ regret.

6/10/2024

cs.LG cs.GT

Do Not Wait: Learning Re-Ranking Model Without User Feedback At Serving Time in E-Commerce

Yuan Wang, Zhiyu Li, Changshuo Zhang, Sirui Chen, Xiao Zhang, Jun Xu, Quan Lin

Recommender systems have been widely used in e-commerce, and re-ranking models are playing an increasingly significant role in the domain, which leverages the inter-item influence and determines the final recommendation lists. Online learning methods keep updating a deployed model with the latest available samples to capture the shifting of the underlying data distribution in e-commerce. However, they depend on the availability of real user feedback, which may be delayed by hours or even days, such as item purchases, leading to a lag in model enhancement. In this paper, we propose a novel extension of online learning methods for re-ranking modeling, which we term LAST, an acronym for Learning At Serving Time. It circumvents the requirement of user feedback by using a surrogate model to provide the instructional signal needed to steer model improvement. Upon receiving an online request, LAST finds and applies a model modification on the fly before generating a recommendation result for the request. The modification is request-specific and transient. It means the modification is tailored to and only to the current request to capture the specific context of the request. After a request, the modification is discarded, which helps to prevent error propagation and stabilizes the online learning procedure since the predictions of the surrogate model may be inaccurate. Most importantly, as a complement to feedback-based online learning methods, LAST can be seamlessly integrated into existing online learning systems to create a more adaptive and responsive recommendation experience. Comprehensive experiments, both offline and online, affirm that LAST outperforms state-of-the-art re-ranking models.

6/21/2024

cs.IR cs.LG