A pragmatic policy learning approach to account for users' fatigue in repeated auctions

Read original: arXiv:2407.10504 - Published 7/16/2024 by Benjamin Heymann (FAIRPLAY), R'emi Chan--Renous-Legoubin, Alexandre Gilotte

🖼️

Overview

Online advertising banners are sold through real-time auctions
The value of each additional banner shown to a user decreases
Basic machine learning (ML) models can predict the decrease in value, but this is not enough to optimize bids
Most bidders in the literature are "impatient" and do not fully account for the repeated nature of the auctions
Impatience induces a cost that is important to consider

Plain English Explanation

When companies want to display ads on websites, they often use real-time auctions to buy the advertising "space" or "banners." The more banners a user sees, the less valuable each additional banner becomes to that user. Basic machine learning models can predict this decrease in value, but this knowledge alone is not enough to place bids that maximize the long-term profit.

Most bidders in previous research have been "impatient" - they only focus on winning the current auction and do not fully consider how winning now will impact the value of future auctions. This impatience comes at a cost, which the researchers demonstrate through an offline analysis and by showing that mitigating the cost of impatience leads to better business metrics.

Technical Explanation

The researchers note that the decreasing marginal value of advertising banners can be detected by basic ML models, which can predict how previously won auctions will impact the current auction's opportunity value. However, simply using these predictions to maximize the expected payoff of the current auction leads to an "impatient" policy that does not fully account for the repeated nature of the auctions.

The researchers provide two empirical arguments for the importance of this cost of impatience. First, they conduct an offline counterfactual analysis, and second, they demonstrate a notable improvement in key business metrics by mitigating the cost of impatience through policy learning. This aligns with prior work on the need for more realistic modeling of ad auctions.

Critical Analysis

The researchers acknowledge that learning is not enough to produce bids that fully account for the impact of winning the current auction on future values. They provide empirical evidence for the importance of considering this cost of impatience, but do not explore the underlying reasons in depth.

It would be interesting to see further research on the psychological and behavioral factors that contribute to impatient bidding strategies, as well as the potential cognitive biases or heuristics that may lead bidders to focus solely on short-term gains. Additionally, the researchers could investigate how different auction mechanisms or market structures might influence the prevalence and cost of impatient bidding.

Conclusion

This research highlights the need for bidding strategies in online advertising auctions to consider the repeated and dynamic nature of the auctions, rather than simply optimizing for the current auction. By accounting for the cost of impatience, the researchers demonstrate that businesses can improve their key metrics, pointing to the potential for more sophisticated reinforcement learning approaches to drive innovation in this space.

This summary was produced with help from an AI and may contain inaccuracies - check out the links to read the original source documents!

Follow @aimodelsfyi on 𝕏 →

Related Papers

🖼️

A pragmatic policy learning approach to account for users' fatigue in repeated auctions

Benjamin Heymann (FAIRPLAY), R'emi Chan--Renous-Legoubin, Alexandre Gilotte

Online advertising banners are sold in real-time through auctions.Typically, the more banners a user is shown, the smaller the marginalvalue of the next banner for this user is. This fact can be detected bybasic ML models, that can be used to predict how previously won auctionsdecrease the current opportunity value. However, learning is not enough toproduce a bid that correctly accounts for how winning the current auctionimpacts the future values. Indeed, a policy that uses this prediction tomaximize the expected payoff of the current auction could be dubbedimpatient because such policy does not fully account for the repeatednature of the auctions. Under this perspective, it seems that most biddersin the literature are impatient. Unsurprisingly, impatience induces a cost.We provide two empirical arguments for the importance of this cost ofimpatience. First, an offline counterfactual analysis and, second, a notablebusiness metrics improvement by mitigating the cost of impatience withpolicy learning

7/16/2024

🤖

Learning in Repeated Multi-Unit Pay-As-Bid Auctions

Rigel Galgana, Negin Golrezaei

Motivated by Carbon Emissions Trading Schemes, Treasury Auctions, and Procurement Auctions, which all involve the auctioning of homogeneous multiple units, we consider the problem of learning how to bid in repeated multi-unit pay-as-bid auctions. In each of these auctions, a large number of (identical) items are to be allocated to the largest submitted bids, where the price of each of the winning bids is equal to the bid itself. The problem of learning how to bid in pay-as-bid auctions is challenging due to the combinatorial nature of the action space. We overcome this challenge by focusing on the offline setting, where the bidder optimizes their vector of bids while only having access to the past submitted bids by other bidders. We show that the optimal solution to the offline problem can be obtained using a polynomial time dynamic programming (DP) scheme. We leverage the structure of the DP scheme to design online learning algorithms with polynomial time and space complexity under full information and bandit feedback settings. We achieve an upper bound on regret of $O(Msqrt{Tlog |mathcal{B}|})$ and $O(Msqrt{|mathcal{B}|Tlog |mathcal{B}|})$ respectively, where $M$ is the number of units demanded by the bidder, $T$ is the total number of auctions, and $|mathcal{B}|$ is the size of the discretized bid space. We accompany these results with a regret lower bound, which match the linear dependency in $M$. Our numerical results suggest that when all agents behave according to our proposed no regret learning algorithms, the resulting market dynamics mainly converge to a welfare maximizing equilibrium where bidders submit uniform bids. Lastly, our experiments demonstrate that the pay-as-bid auction consistently generates significantly higher revenue compared to its popular alternative, the uniform price auction.

7/17/2024

📉

Autobidders with Budget and ROI Constraints: Efficiency, Regret, and Pacing Dynamics

Brendan Lucier, Sarath Pattathil, Aleksandrs Slivkins, Mengxiao Zhang

We study a game between autobidding algorithms that compete in an online advertising platform. Each autobidder is tasked with maximizing its advertiser's total value over multiple rounds of a repeated auction, subject to budget and return-on-investment constraints. We propose a gradient-based learning algorithm that is guaranteed to satisfy all constraints and achieves vanishing individual regret. Our algorithm uses only bandit feedback and can be used with the first- or second-price auction, as well as with any intermediate auction format. Our main result is that when these autobidders play against each other, the resulting expected liquid welfare over all rounds is at least half of the expected optimal liquid welfare achieved by any allocation. This holds whether or not the bidding dynamics converges to an equilibrium.

6/13/2024

➖

Advancing Ad Auction Realism: Practical Insights & Modeling Implications

Ming Chen, Sareh Nabi, Marciano Siniscalchi

Contemporary real-world online ad auctions differ from canonical models [Edelman et al., 2007; Varian, 2009] in at least four ways: (1) values and click-through rates can depend upon users' search queries, but advertisers can only partially tune their bids to specific queries; (2) advertisers do not know the number, identity, and precise value distribution of competing bidders; (3) advertisers only receive partial, aggregated feedback, and (4) payment rules are only partially known to bidders. These features make it virtually impossible to fully characterize equilibrium bidding behavior. This paper shows that, nevertheless, one can still gain useful insight into modern ad auctions by modeling advertisers as agents governed by an adversarial bandit algorithm, independent of auction mechanism intricacies. To demonstrate our approach, we first simulate soft-floor auctions [Zeithammer, 2019], a complex, real-world pricing rule for which no complete equilibrium characterization is known. We find that (i) when values and click-through rates are query-dependent, soft floors can improve revenues relative to standard auction formats even if bidder types are drawn from the same distribution; and (ii) with distributional asymmetries that reflect relevant real-world scenario, we find that soft floors yield lower revenues than suitably chosen reserve prices, even restricting attention to a single query. We then demonstrate how to infer advertiser value distributions from observed bids for a variety of pricing rules, and illustrate our approach with aggregate data from an e-commerce website.

4/11/2024