DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation

Read original: arXiv:2406.06433 - Published 6/14/2024 by Jason Shuo Zhang, Benjamin Howson, Panayiota Savva, Eleanor Loh

Overview

• This paper presents DISCO, an end-to-end contextual bandit framework for allocating personalized discount codes to customers of the e-commerce retailer ASOS.

• The framework adapts the Thompson Sampling algorithm and integrates it within an integer program, so that discounts can be learned adaptively per customer while operational cost stays under control.

• DISCO aims to offer each customer a relevant discount while managing overall discount spend, addressing the challenge that the retailer only observes a customer's response to the discount actually offered (partial information).

Plain English Explanation

• Retailers often struggle to provide customers with discounts that are both appealing and profitable. [https://aimodels.fyi/papers/arxiv/contextual-dynamic-pricing-algorithms-optimality-local-differential] DISCO is a system that uses machine learning to automatically determine the best discounts to offer each individual customer.

• The key idea is to treat the discount allocation problem as a "bandit" problem, where the retailer must repeatedly decide which discount to offer a customer, with the goal of maximizing total revenue over time. [https://aimodels.fyi/papers/arxiv/distributed-multi-task-learning-stochastic-bandits-context]

• DISCO uses a technique called Thompson Sampling to learn each customer's sensitivity to different discounts. Over time, it adapts the discounts offered to each customer to maximize the retailer's profits while also providing value to the customer.

• This personalized approach contrasts with traditional one-size-fits-all discount strategies, which may not effectively capture the diverse preferences of a retailer's customer base. [https://aimodels.fyi/papers/arxiv/leveraging-offline-data-linear-latent-bandits]
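As a rough illustration of the Thompson Sampling idea described above, the following toy sketch picks among a handful of fixed discount levels. It is not the method from the paper, which works with a continuous discount space and customer context. The discount levels, the purchase probabilities, and the function name `run_thompson_sampling` are all assumptions made up for this example.

```python
import random

# Hypothetical discount levels and purchase probabilities (illustrative only).
DISCOUNTS = [0.0, 0.1, 0.2, 0.4]
TRUE_PURCHASE_RATES = [0.05, 0.10, 0.50, 0.55]


def run_thompson_sampling(rounds=10000, seed=0):
    """Return how often each discount level was offered."""
    rng = random.Random(seed)
    n_arms = len(DISCOUNTS)
    successes = [1] * n_arms  # Beta(1, 1) prior: alpha parameters
    failures = [1] * n_arms   # Beta(1, 1) prior: beta parameters
    offers = [0] * n_arms

    for _ in range(rounds):
        # Sample a plausible purchase rate for each arm, then score the arm
        # by the fraction of revenue kept after applying the discount.
        scores = [
            rng.betavariate(successes[a], failures[a]) * (1.0 - DISCOUNTS[a])
            for a in range(n_arms)
        ]
        arm = max(range(n_arms), key=lambda a: scores[a])

        # Simulate the customer's response and update that arm's posterior.
        purchased = rng.random() < TRUE_PURCHASE_RATES[arm]
        if purchased:
            successes[arm] += 1
        else:
            failures[arm] += 1
        offers[arm] += 1

    return offers
```

Calling `run_thompson_sampling()` returns a list of offer counts per discount level. Early rounds spread offers across all levels; as the posteriors sharpen, offers should concentrate on the level with the best trade-off between purchase rate and discount cost.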

Technical Explanation

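• The paper formulates the discount allocation problem as a contextual bandit problem: a customer's attributes form the context, the discount offered is the action (the "arm" being pulled), and the customer's observed response is the reward. Because the discount level is continuous, the action space is in effect infinite-armed.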

• The key components of DISCO are:

A neural network that extracts low-dimensional context embeddings from customer attributes.
Radial basis functions that represent the continuous discount (action) space, which enables pooled learning across similar discounts.
A Thompson Sampling layer over these feature representations that facilitates exploration.
An integer program that allocates discount codes across the customer base under a global cost constraint.

• The authors evaluate DISCO in two ways. Offline analysis shows that it explores effectively and improves its performance over time despite the global constraint. An online A/B test shows a significant improvement of more than 1% in average basket value relative to the legacy systems. [https://aimodels.fyi/papers/arxiv/causal-contextual-bandits-adaptive-context]
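The paper's abstract describes representing the continuous discount space with radial basis functions and exploring with Thompson Sampling over those features. A minimal sketch of that combination, using Bayesian linear regression as the reward model, might look like the following. Everything concrete here is an assumption for illustration: the class and function names, the RBF centres and width, the Gaussian prior, the known noise variance, the one-dimensional stand-in for a context embedding, and the synthetic reward curve. The sketch omits the neural network, the integer program, and any constraint enforcing negative price elasticity.

```python
import numpy as np


def rbf_features(discount, centres, width):
    """Gaussian radial basis features of a scalar discount."""
    return np.exp(-((discount - centres) ** 2) / (2.0 * width ** 2))


def joint_features(context, discount, centres, width):
    """Combine a context embedding with the discount's RBF features."""
    return np.kron(context, rbf_features(discount, centres, width))


class LinearThompsonSampler:
    """Bayesian linear regression with prior N(0, I / lam) and known noise."""

    def __init__(self, dim, lam=1.0, noise_var=0.01, seed=0):
        self.precision = lam * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)              # precision-weighted mean
        self.noise_var = noise_var
        self.rng = np.random.default_rng(seed)

    def posterior(self):
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2.0  # remove numerical asymmetry
        return cov @ self.b, cov

    def sample_weights(self):
        mean, cov = self.posterior()
        return self.rng.multivariate_normal(mean, cov)

    def update(self, x, reward):
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var


def best_on_grid(weights, context, grid, centres, width):
    """Discount on the grid with the highest predicted reward."""
    scores = [joint_features(context, d, centres, width) @ weights for d in grid]
    return float(grid[int(np.argmax(scores))])


def simulate(rounds=500, seed=0):
    """Run the sampler on a synthetic reward curve that peaks at 0.3."""
    rng = np.random.default_rng(seed)
    centres = np.linspace(0.0, 0.5, 6)
    width = 0.1
    grid = np.linspace(0.0, 0.5, 51)
    context = np.array([1.0])  # stand-in for a learned context embedding
    sampler = LinearThompsonSampler(centres.size * context.size, seed=seed)

    for _ in range(rounds):
        # Thompson step: draw one weight vector and act greedily on it.
        d = best_on_grid(sampler.sample_weights(), context, grid, centres, width)
        reward = np.exp(-((d - 0.3) ** 2) / (2.0 * width ** 2)) + rng.normal(0.0, 0.1)
        sampler.update(joint_features(context, d, centres, width), reward)

    mean, _ = sampler.posterior()
    return best_on_grid(mean, context, grid, centres, width)
```

Because neighbouring basis functions overlap, an observation at one discount level also informs the predicted reward at nearby levels, which is one way to read the abstract's "pooled learning across similar actions". `simulate()` returns the discount that the posterior mean prefers after the run, which should land near the peak of the synthetic curve.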

Critical Analysis

• The paper provides a robust and principled approach to the challenge of personalized discount allocation, which is an important problem in the retail industry.

• One potential limitation is the reliance on Thompson Sampling, which may be sensitive to the choice of prior distributions and could struggle in high-dimensional settings. Exploring alternative bandit algorithms may be an area for future research.

• The evaluation comes from a single retailer (ASOS), and further validation on a broader range of real-world scenarios would help strengthen the claims about DISCO's effectiveness. [https://aimodels.fyi/papers/arxiv/online-continuous-hyperparameter-optimization-generalized-linear-contextual]

• Ethical considerations around the use of personalized pricing strategies, and their potential impact on consumer welfare, are not discussed in depth and could be an important topic for future work.

Conclusion

• DISCO presents a promising end-to-end framework for personalizing discount allocation in retail environments, using contextual bandit techniques to adaptively learn and optimize discount policies for individual customers.

• The reported gain in average basket value, achieved under a global cost constraint, suggests that bandit-based allocation can improve on legacy discounting systems in e-commerce.

• While the paper makes a valuable contribution, further research is needed to explore the framework's robustness, scalability, and ethical implications in real-world settings.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

DISCO: An End-to-End Bandit Framework for Personalised Discount Allocation

Jason Shuo Zhang, Benjamin Howson, Panayiota Savva, Eleanor Loh

Personalised discount codes provide a powerful mechanism for managing customer relationships and operational spend in e-commerce. Bandits are well suited for this product area, given the partial information nature of the problem, as well as the need for adaptation to the changing business environment. Here, we introduce DISCO, an end-to-end contextual bandit framework for personalised discount code allocation at ASOS. DISCO adapts the traditional Thompson Sampling algorithm by integrating it within an integer program, thereby allowing for operational cost control. Because bandit learning is often worse with high dimensional actions, we focused on building low dimensional action and context representations that were nonetheless capable of good accuracy. Additionally, we sought to build a model that preserved the relationship between price and sales, in which customers increasing their purchasing in response to lower prices (negative price elasticity). These aims were achieved by using radial basis functions to represent the continuous (i.e. infinite armed) action space, in combination with context embeddings extracted from a neural network. These feature representations were used within a Thompson Sampling framework to facilitate exploration, and further integrated with an integer program to allocate discount codes across ASOS's customer base. These modelling decisions result in a reward model that (a) enables pooled learning across similar actions, (b) is highly accurate, including in extrapolation, and (c) preserves the expected negative price elasticity. Through offline analysis, we show that DISCO is able to effectively enact exploration and improves its performance over time, despite the global constraint. Finally, we subjected DISCO to a rigorous online A/B test, and find that it achieves a significant improvement of >1% in average basket value, relative to the legacy systems.

6/14/2024

Distributed Multi-Task Learning for Stochastic Bandits with Context Distribution and Stage-wise Constraints

Jiabin Lin, Shana Moothedath

We present the problem of conservative distributed multi-task learning in stochastic linear contextual bandits with heterogeneous agents. This extends conservative linear bandits to a distributed setting where M agents tackle different but related tasks while adhering to stage-wise performance constraints. The exact context is unknown, and only a context distribution is available to the agents as in many practical applications that involve a prediction mechanism to infer context, such as stock market prediction and weather forecast. We propose a distributed upper confidence bound (UCB) algorithm, DiSC-UCB. Our algorithm constructs a pruned action set during each round to ensure the constraints are met. Additionally, it includes synchronized sharing of estimates among agents via a central server using well-structured synchronization steps. We prove the regret and communication bounds on the algorithm. We extend the problem to a setting where the agents are unaware of the baseline reward. For this setting, we provide a modified algorithm, DiSC-UCB2, and we show that the modified algorithm achieves the same regret and communication bounds. We empirically validated the performance of our algorithm on synthetic data and real-world Movielens-100K data.

4/11/2024

DISCO: Embodied Navigation and Interaction via Differentiable Scene Semantics and Dual-level Control

Xinyu Xu, Shengcheng Luo, Yanchao Yang, Yong-Lu Li, Cewu Lu

Building a general-purpose intelligent home-assistant agent skilled in diverse tasks by human commands is a long-term blueprint of embodied AI research, which poses requirements on task planning, environment modeling, and object interaction. In this work, we study primitive mobile manipulations for embodied agents, i.e. how to navigate and interact based on an instructed verb-noun pair. We propose DISCO, which features non-trivial advancements in contextualized scene modeling and efficient controls. In particular, DISCO incorporates differentiable scene representations of rich semantics in object and affordance, which is dynamically learned on the fly and facilitates navigation planning. Besides, we propose dual-level coarse-to-fine action controls leveraging both global and local cues to accomplish mobile manipulation tasks efficiently. DISCO easily integrates into embodied tasks such as embodied instruction following. To validate our approach, we take the ALFRED benchmark of large-scale long-horizon vision-language navigation and interaction tasks as a test bed. In extensive experiments, we make comprehensive evaluations and demonstrate that DISCO outperforms the art by a sizable +8.6% success rate margin in unseen scenes, even without step-by-step instructions. Our code is publicly released at https://github.com/AllenXuuu/DISCO.

7/23/2024

DISCO: Efficient Diffusion Solver for Large-Scale Combinatorial Optimization Problems

Kexiong Yu, Hang Zhao, Yuhang Huang, Renjiao Yi, Kai Xu, Chenyang Zhu

Combinatorial Optimization (CO) problems are fundamentally crucial in numerous practical applications across diverse industries, characterized by entailing enormous solution space and demanding time-sensitive response. Despite significant advancements made by recent neural solvers, their limited expressiveness does not conform well to the multi-modal nature of CO landscapes. While some research has pivoted towards diffusion models, they require simulating a Markov chain with many steps to produce a sample, which is time-consuming and does not meet the efficiency requirement of real applications, especially at scale. We propose DISCO, an efficient DIffusion Solver for Combinatorial Optimization problems that excels in both solution quality and inference speed. DISCO's efficacy is two-pronged: Firstly, it achieves rapid denoising of solutions through an analytically solvable form, allowing for direct sampling from the solution space with very few reverse-time steps, thereby drastically reducing inference time. Secondly, DISCO enhances solution quality by restricting the sampling space to a more constrained, meaningful domain guided by solution residues, while still preserving the inherent multi-modality of the output probabilistic distributions. DISCO achieves state-of-the-art results on very large Traveling Salesman Problems with 10000 nodes and challenging Maximal Independent Set benchmarks, with its per-instance denoising time up to 44.8 times faster. Through further combining a divide-and-conquer strategy, DISCO can be generalized to solve arbitrary-scale problem instances off the shelf, even outperforming models trained specifically on corresponding scales.

7/22/2024