Dynamic Pricing and Learning with Long-term Reference Effects

Read original: arXiv:2402.12562 - Published 7/23/2024 by Shipra Agrawal, Wei Tang

Overview

This paper considers a dynamic pricing problem where customer response to the current price is influenced by their price expectation, known as the reference price.
The authors propose a simple and novel reference price mechanism where the reference price is the average of the past prices offered by the seller.
They show that under this mechanism, a markdown policy (starting with a higher price and decreasing it over time) is near-optimal, regardless of the model parameters.
The paper also tackles a more challenging problem where the demand model parameters are initially unknown and the seller must learn them online while optimizing revenue. For linear demand models, the authors give an efficient learning algorithm with an optimal $\tilde{O}(\sqrt{T})$ regret upper bound.

Plain English Explanation

In this paper, the authors look at a problem where a seller is trying to figure out the best prices to charge over time. The twist is that customers don't just look at the current price - they also have an "expected price" in mind based on the prices the seller has charged in the past. The authors propose a simple way for the seller to calculate this expected price, by just taking the average of all the past prices.

Interestingly, the authors show that with this approach, a near-optimal strategy for the seller is to start with a high price and then lower it over time, whatever the model parameters are. This matches the common intuition that customers feel like they're getting a good deal when prices drop. For linear demand models, the authors also characterize this "markdown" pricing strategy in detail and show how to compute it efficiently.

The paper then tackles a harder problem, where the seller doesn't even know the underlying demand model at first. Instead, they have to learn it over time by observing how customers respond to the prices they set. The goal is to minimize the "regret": the revenue lost over all rounds compared to the optimal strategy the seller could follow if the demand model were known in advance. For linear demand models, the authors provide an efficient learning algorithm that achieves the best possible regret bound.

Technical Explanation

The paper studies a dynamic pricing problem where the customer's response to the current price is influenced by their reference price, which is their expectation of the price based on past prices offered by the seller. In contrast to the more commonly studied exponential smoothing mechanism, the authors' mechanism sets the reference price to the average of the past prices, so the prices the seller offers have a longer-term effect on future customer expectations.
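To make the difference between the two mechanisms concrete, here is a minimal Python sketch. It rests on assumptions that are not stated in this summary: the function names are hypothetical, the initial reference price `r1` is treated as one observation in the running average, and `alpha` is a generic smoothing weight for the exponential smoothing update.

```python
from typing import List, Sequence


def average_reference_prices(prices: Sequence[float], r1: float) -> List[float]:
    """Reference price faced in each round under the averaging mechanism.

    Assumption: the initial reference price r1 counts as one observation,
    so the reference price in round t+1 is the mean of r1 and p_1..p_t.
    """
    refs = [r1]
    total = r1
    for t, p in enumerate(prices[:-1], start=1):
        total += p
        refs.append(total / (t + 1))
    return refs


def smoothed_reference_prices(prices: Sequence[float], r1: float,
                              alpha: float) -> List[float]:
    """Reference price faced in each round under exponential smoothing.

    Assumption: r_{t+1} = alpha * r_t + (1 - alpha) * p_t with 0 <= alpha < 1.
    """
    refs = [r1]
    for p in prices[:-1]:
        refs.append(alpha * refs[-1] + (1 - alpha) * p)
    return refs


if __name__ == "__main__":
    # One early high price followed by a long run of low prices.
    path = [10.0] + [2.0] * 9
    print(average_reference_prices(path, r1=10.0))
    print(smoothed_reference_prices(path, r1=10.0, alpha=0.5))
```

Under averaging, every past price keeps the same weight in the reference price, whereas under exponential smoothing the weight of an old price shrinks geometrically. That is one way to read the "longer-term effect" the authors describe.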

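Under this mechanism, the authors show that a markdown policy (starting with a higher price and decreasing it over time) is near-optimal, regardless of the demand model parameters. This matches the intuition that customers feel they are getting a bargain on items that are ordinarily more expensive. For linear demand models, the authors also give a detailed characterization of the near-optimal markdown policy and an efficient way of computing it.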
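A small simulator can help when experimenting with price paths under an averaged reference price. The sketch below is illustrative only: the demand form `a - b*p + c*(r - p)` is a common textbook linear reference-effect model assumed here, not necessarily the exact model in the paper, and the names `total_revenue` and `is_markdown` and the parameters `a`, `b`, `c` are hypothetical.

```python
from typing import Sequence


def total_revenue(prices: Sequence[float], r1: float,
                  a: float, b: float, c: float) -> float:
    """Total revenue of a price path under an assumed linear reference-demand model.

    Assumed demand in each round: max(0, a - b*p + c*(r - p)), where r is the
    average of the initial reference price r1 and all previously offered prices.
    """
    ref_sum = r1   # running sum of r1 and all past prices
    count = 1      # number of terms in that sum
    revenue = 0.0
    for p in prices:
        r = ref_sum / count
        demand = max(0.0, a - b * p + c * (r - p))
        revenue += p * demand
        ref_sum += p
        count += 1
    return revenue


def is_markdown(prices: Sequence[float]) -> bool:
    """True if the price path never increases from one round to the next."""
    return all(x >= y for x, y in zip(prices, prices[1:]))


if __name__ == "__main__":
    fixed = [5.0] * 4
    stepped = [6.0, 5.5, 5.0, 4.5]
    for path in (fixed, stepped):
        print(path, is_markdown(path), total_revenue(path, 5.0, 10.0, 1.0, 1.0))
```

Which path earns more in a particular run depends on the assumed parameters and horizon. The paper's near-optimality guarantee is a statement about its own model, not about this toy simulator.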

The paper then considers a more challenging dynamic pricing and learning problem where the demand model parameters are initially unknown. The seller must learn them online from customer responses while simultaneously optimizing revenue. The goal is to minimize regret, the $T$-round revenue loss compared to a clairvoyant optimal policy that knows the parameters. Because the reference price changes with the price history, this amounts to learning a non-stationary optimal policy in a time-variant Markov Decision Process (MDP). For linear demand models, the authors provide an efficient learning algorithm with an optimal $\tilde{O}(\sqrt{T})$ regret upper bound.
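The regret objective can be written out as follows. The notation is an assumption made for illustration and may differ from the paper's: $p_t$ is the price the learning algorithm offers in round $t$, $D_t$ is the resulting demand, and $\mathrm{Rev}^{*}(T)$ is the expected $T$-round revenue of the clairvoyant optimal policy.

```latex
\mathrm{Regret}(T)
  \;=\; \mathrm{Rev}^{*}(T) \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} p_t \, D_t\right],
\qquad
\mathrm{Regret}(T) \;=\; \tilde{O}\!\left(\sqrt{T}\right)
\quad \text{(linear demand models)}.
```

Here $\tilde{O}$ hides logarithmic factors in $T$, so the average per-round revenue loss shrinks roughly like $1/\sqrt{T}$ as the horizon grows.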

Critical Analysis

The paper presents a novel analysis of dynamic pricing with a reference price mechanism based on the average of past prices. The authors show that a markdown policy is near-optimal in this setting, giving a theoretical justification for the common intuition that a seller may be better off starting with a higher price and then decreasing it.

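One potential limitation is that the detailed characterization of the markdown policy and the learning algorithm are restricted to linear demand models, even though the near-optimality of markdown pricing itself is shown irrespective of the model parameters. It would be interesting to see whether these results extend to more general demand functions. Additionally, the authors do not consider strategic customer behavior, where customers may delay purchases in anticipation of future price drops.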

The learning problem addressed in the second part of the paper is quite challenging, as the seller must simultaneously learn the demand model and optimize the pricing policy. The authors' algorithm achieves the best possible regret bound, but it would be valuable to understand the practical performance of the algorithm and how it compares to heuristic approaches used in industry.

Overall, this paper makes a significant contribution to the dynamic pricing literature by introducing a novel reference price mechanism and providing a rigorous analysis of the resulting optimal pricing policies. The insights can help inform the design of practical pricing strategies in e-commerce and other settings where customer price expectations play an important role.

Conclusion

This paper presents a comprehensive study of dynamic pricing problems with a focus on the role of customer price expectations, or reference prices. The authors propose a simple reference price mechanism based on the average of past prices and demonstrate that under this mechanism, a markdown pricing policy is near-optimal.

The paper also tackles a more complex problem where the demand model parameters are initially unknown, requiring the seller to learn them online while optimizing revenue. The authors provide an efficient learning algorithm with an optimal regret bound, addressing an important challenge in dynamic pricing.

The insights from this work can help inform the design of practical pricing strategies in e-commerce and other settings where understanding and managing customer price expectations is crucial for maximizing revenue. The paper's technical contributions also advance the state of the art in dynamic pricing research, paving the way for further developments in this active area of study.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Related Papers

Dynamic Pricing and Learning with Long-term Reference Effects

Shipra Agrawal, Wei Tang

We consider a dynamic pricing problem where customer response to the current price is impacted by the customer price expectation, aka reference price. We study a simple and novel reference price mechanism where reference price is the average of the past prices offered by the seller. As opposed to the more commonly studied exponential smoothing mechanism, in our reference price mechanism the prices offered by seller have a longer term effect on the future customer expectations. We show that under this mechanism, a markdown policy is near-optimal irrespective of the parameters of the model. This matches the common intuition that a seller may be better off by starting with a higher price and then decreasing it, as the customers feel like they are getting bargains on items that are ordinarily more expensive. For linear demand models, we also provide a detailed characterization of the near-optimal markdown policy along with an efficient way of computing it. We then consider a more challenging dynamic pricing and learning problem, where the demand model parameters are a priori unknown, and the seller needs to learn them online from the customers' responses to the offered prices while simultaneously optimizing revenue. The objective is to minimize regret, i.e., the $T$-round revenue loss compared to a clairvoyant optimal policy. This task essentially amounts to learning a non-stationary optimal policy in a time-variant Markov Decision Process (MDP). For linear demand models, we provide an efficient learning algorithm with an optimal $\tilde{O}(\sqrt{T})$ regret upper bound.

7/23/2024

Dynamic pricing with Bayesian updates from online reviews

José Correa, Mathieu Mari, Andrew Xia

When launching new products, firms face uncertainty about market reception. Online reviews provide valuable information not only to consumers but also to firms, allowing firms to adjust the product characteristics, including its selling price. In this paper, we consider a pricing model with online reviews in which the quality of the product is uncertain, and both the seller and the buyers Bayesianly update their beliefs to make purchasing & pricing decisions. We model the seller's pricing problem as a basic bandits' problem and show a close connection with the celebrated Catalan numbers, allowing us to efficiently compute the overall future discounted reward of the seller. With this tool, we analyze and compare the optimal static and dynamic pricing strategies in terms of the probability of effectively learning the quality of the product.

4/24/2024

Contextual Dynamic Pricing: Algorithms, Optimality, and Local Differential Privacy Constraints

Zifeng Zhao, Feiyu Jiang, Yi Yu

We study the contextual dynamic pricing problem where a firm sells products to $T$ sequentially arriving consumers that behave according to an unknown demand model. The firm aims to maximize its revenue, i.e. minimize its regret over a clairvoyant that knows the model in advance. The demand model is a generalized linear model (GLM), allowing for a stochastic feature vector in $\mathbb{R}^d$ that encodes product and consumer information. We first show that the optimal regret upper bound is of order $\sqrt{dT}$, up to a logarithmic factor, improving upon existing upper bounds in the literature by a $\sqrt{d}$ factor. This sharper rate is materialised by two algorithms: a confidence bound-type (supCB) algorithm and an explore-then-commit (ETC) algorithm. A key insight of our theoretical result is an intrinsic connection between dynamic pricing and the contextual multi-armed bandit problem with many arms based on a careful discretization. We further study contextual dynamic pricing under the local differential privacy (LDP) constraints. In particular, we propose a stochastic gradient descent based ETC algorithm that achieves an optimal regret upper bound of order $d\sqrt{T}/\epsilon$, up to a logarithmic factor, where $\epsilon>0$ is the privacy parameter. The regret upper bounds with and without LDP constraints are accompanied by newly constructed minimax lower bounds, which further characterize the cost of privacy. Extensive numerical experiments and a real data application on online lending are conducted to illustrate the efficiency and practical value of the proposed algorithms in dynamic pricing.

6/5/2024

Improved Algorithms for Contextual Dynamic Pricing

Matilde Tullii, Solenne Gaucher, Nadav Merlis, Vianney Perchet

In contextual dynamic pricing, a seller sequentially prices goods based on contextual information. Buyers will purchase products only if the prices are below their valuations. The goal of the seller is to design a pricing strategy that collects as much revenue as possible. We focus on two different valuation models. The first assumes that valuations linearly depend on the context and are further distorted by noise. Under minor regularity assumptions, our algorithm achieves an optimal regret bound of $\tilde{\mathcal{O}}(T^{2/3})$, improving the existing results. The second model removes the linearity assumption, requiring only that the expected buyer valuation is $\beta$-Hölder in the context. For this model, our algorithm obtains a regret $\tilde{\mathcal{O}}(T^{(d+2\beta)/(d+3\beta)})$, where $d$ is the dimension of the context space.

6/18/2024