
A Data-Driven Metric of Incentive Compatibility

Yuan Deng, Duke University, Durham, NC
Sébastien Lahaie, Google Research, New York, NY
Vahab Mirrokni, Google Research, New York, NY
Song Zuo, Google Research, Beijing, China

ABSTRACT

An incentive-compatible auction incentivizes buyers to truthfully reveal their private valuations. However, many ad auction mechanisms deployed in practice are not incentive-compatible, such as first-price auctions (for display advertising) and the generalized second-price auction (for search advertising). We introduce a new metric to quantify incentive compatibility in both static and dynamic environments. Our metric is data-driven and can be computed directly through black-box auction simulations without relying on reference mechanisms or complex optimizations. We provide interpretable characterizations of our metric and prove that it is monotone in auction parameters for several mechanisms used in practice, such as soft floors and dynamic reserve prices. We empirically evaluate our metric on ad auction data from a major ad exchange and a major search engine to demonstrate its broad applicability in practice.

CCS CONCEPTS

• Theory of computation → Computational advertising theory.

KEYWORDS

incentive compatibility metric, ad auction, truthfulness

ACM Reference Format:
Yuan Deng, Sébastien Lahaie, Vahab Mirrokni, and Song Zuo. 2020. A Data-Driven Metric of Incentive Compatibility. In Proceedings of The Web Conference 2020 (WWW ’20), April 20–24, 2020, Taipei, Taiwan. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3366423.3380249

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.
WWW ’20, April 20–24, 2020, Taipei, Taiwan
© 2020 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.
ACM ISBN 978-1-4503-7023-3/20/04.
https://doi.org/10.1145/3366423.3380249

1 INTRODUCTION

Mechanism design addresses the problem of achieving desirable outcomes by eliciting private valuation information held by multiple agents. A mechanism is incentive-compatible (also called truthful or strategyproof) if it guarantees that an agent’s (weakly) best strategy is to truthfully reveal its private information. This leads to straightforward participation with predictable actions on the part of the agent. However, many auction mechanisms fielded in practice are not incentive-compatible, notably in the domain of ad auctions. The display advertising industry, including Google, has recently switched to first-price auctions [8, 21, 34]. Moreover, some ad exchanges have allegedly used soft floors in addition to standard reserve prices [37], which mixes the incentives of first- and second-price auctions. Generalized second-price auctions are widely deployed for search advertising; although inspired by Vickrey auctions, it was recognized early on that these auctions are not truthful [15, 26, 36].

As originally defined, incentive compatibility is a binary notion: a mechanism is either incentive-compatible or it is not. To achieve a more nuanced comparison between mechanisms, or between different parametrizations of a single mechanism, there has been growing interest in developing metrics that quantify incentive compatibility. The most common approach is to rely on the concept of regret, which is a buyer’s utility difference between best responding and truthful reporting [4, 6, 10, 13, 32]. Regret is an appealing measure because its units are linked to utility and directly interpretable. However, computing a regret-based metric requires solving for the best response of a buyer for every possible valuation, which can be a complex optimization task. Even worse, auction systems in practice are becoming more complex and opaque, so that their mechanics can be difficult to model, let alone optimize against.

In this paper, we introduce a new data-driven metric to quantify incentive compatibility for both static and dynamic environments based on Myerson’s classic characterization of the relationship between allocation and payment rules of incentive-compatible auctions [30]. To compute our metric, one simply applies small perturbations to bids and records the resulting bidder utilities. For ad auctions this can be achieved by black-box simulations over auction logs or, for a more faithful evaluation that captures bidder behavior, by applying the perturbations to small slices of experimental traffic. Once this data is collected, our metric can be computed using straightforward database queries, along with the usual standard errors and confidence intervals. This simplicity and scalability is a major advantage over previous methods that rely on knowing the reference mechanisms that are incentive-compatible [27] or on complex optimizations to compute profile-by-profile best responses [4, 13, 32].

Our metric takes the form of an index that lies between 0 and 1 for reasonable mechanisms: we show that it is non-negative if utility under truthful bidding is non-decreasing in an agent’s true valuation, and that it is at most 1 if overbidding is a weakly dominated strategy. The bounds of our metric are meaningful: it is always 1 for incentive-compatible auctions, and for first-price auctions, where the bid most directly influences the payment, it has value 0. Moreover, for a mixture of a truthful auction and the corresponding first-price auction with the same allocation rule, our metric exactly evaluates to the fraction of the truthful auction in the mixture. Our metric can be viewed as a measure of the marginal benefit of a bidding strategy which bids a scaled version of the true value, with the same scaling factor throughout (i.e., uniform bid-shading). To add another interpretation, we show that the metric is associated with the difference between the payment function that truthfully implements the allocation rule and the one used in the auction.

Our metric can be applied to both static and dynamic mechanisms. A dynamic mechanism maintains state so that an agent’s bids can influence future payoffs. For instance, an ad auction might use dynamic reserve prices, which are set based on past bid distributions [22]; or the auction might throttle bidders who have lost too many auctions in the past. To the best of our knowledge, regret-based approaches have not been extended to dynamic environments. We provide closed-form characterizations of our metric for several static and dynamic mechanisms deployed in practice, such as soft floors and dynamic reserve pricing. This leads us to a notion of incentive monotonicity, which is the property that our metric is monotone in an auction parameter. We have found this useful for reasoning about how certain parameters influence incentives. For dynamic reserve pricing, our metric demonstrates an interesting and intuitive trade-off between incentive compatibility and revenue: the metric achieves its lowest value under the most aggressive form of dynamic reserves.

We demonstrate the broad applicability of our metric by drawing on data from the auction logs of the Google Ad Exchange and the Google search engine. In the display ads setting, we simulate a simple mixture mechanism between first- and second-price auctions, as well as first- and second-price soft floors. We also simulate dynamic reserve pricing policies based on quantiles of past bid distributions or Myerson-optimal reserves. The empirical results quantify the effect of various mechanism parameters; for instance, we find that using Myerson-optimal dynamic reserves with a second-price auction reduces our metric from 1.0 to 0.8 over the data, an equivalent effect to directly mixing in a first-price auction with probability 0.2. The results also empirically confirm the incentive monotonicity (or non-monotonicity) properties of various auction parameters, as established by the theory. In the search ads setting, we use our metric to understand how the incentives of the Generalized Second Price (GSP) auction vary with the number of positions. We also examine the sensitivity of our metric to both anonymous and personalized reserve prices, based on dynamic reserve pricing policies reportedly implemented at Yahoo [31].

Related Work

Parkes et al. [32] initiate the idea of designing combinatorial auctions with small regret, which is later extended to mechanisms beyond combinatorial auctions by Day and Milgrom [10]. Duetting et al. [13] demonstrate a connection between the expected ex-post regret and the error in multi-class classifiers, enabling the application of structural support vector machines to the design of low-regret mechanisms. Recently, Duetting et al. [12] introduce the idea of using deep learning for auction design and Balcan et al. [4] apply statistical learning techniques to estimate interim incentive compatibility using regret as a measure. All of these regret-based approaches require complex optimizations in order to compute profile-by-profile best responses, while our metric can be computed via black-box auction simulations and simple database queries.

Recent work attempts to address these computational difficulties by approximating regret. Feng et al. [20] design online algorithms to minimize (and therefore compute) regret, with an eye towards fast convergence, and evaluate their algorithms over GSP with synthetic data. Colini-Baldeschi et al. [9] introduce the concept of IC-Envy, which is easier to compute but bounds or even equates to regret in important domains like position auctions. IC-Envy can also be used to bound social welfare loss due to misreports, but we are not aware of a way to extend the concept to dynamic environments.

Other than the regret-based approaches, Pathak and Sönmez [33] provide a ranking for mechanisms without payment, such as matching mechanisms, based on the number of instances that are manipulable. Troyan and Morrill [35] introduce the concept of an “obvious manipulation” and propose to compare mechanism incentives based on this criterion. Lubin and Parkes [27] quantify incentive compatibility according to the divergence between a mechanism’s payoffs and those of a strategyproof reference mechanism. In contrast, our metric does not rely on access to a reference mechanism, which may be hard to characterize or compute in certain domains.

Another line of related research is on testing incentive compatibility, initiated by Lahaie et al. [24]. They propose a general framework to test incentive compatibility by segmenting query traffic into buckets and systematically perturbing buyers’ valuations within buckets. Deng and Lahaie [11] extend their framework to test incentive compatibility in dynamic environments. Our work builds upon this framework and uses similar bid perturbation techniques. However, our metric is meant to be evaluated on the sell-side (e.g., the exchange) rather than the buy-side (the bidders), because its purpose is to provide insights into how incentives vary and are affected by mechanism parameters, rather than to inform bidding. We provide a unified metric that can capture both the static and dynamic aspects of incentive compatibility at once.

2 PRELIMINARIES

Consider $n$ buyers participating in a dynamic auction with a single seller (e.g., an ad exchange). The dynamic auction consists of a sequence of queries. The buyer winning a query gets the chance to display ads on the website. For convenience, we assume that there is one query arriving at each stage, so that the $t$-th query arrives at stage $t$. The queries arrive in an online manner such that the ad slots must be sold once they arrive.

For ease of presentation, we take the perspective of a single buyer. We assume that the buyer’s valuation $v_t \in V = [0, 1]$ at stage $t$ is drawn independently from a continuous distribution $F_t$ over $V$ with density function $f_t$; the distributions are not necessarily identical across stages. Upon the arrival of the $t$-th query, the buyer first draws her valuation $v_t$ from the distribution $F_t$ and then submits her bid $b_t \in V$ to the seller. After receiving all buyers’ bids, the seller decides how to allocate the ad slots and how much payment to collect from each buyer. In general, a dynamic auction can be represented by $\langle x, p \rangle$, where the allocation rule $x_t : V^t \to [0, 1]$ maps the buyer’s historical bids from the first $t$ stages to an allocation probability at stage $t$, and the payment rule $p_t : V^t \to \mathbb{R}$ maps the buyer’s historical bids from the first $t$ stages to a payment at stage $t$. We call $\langle x_t, p_t \rangle$ the stage mechanism at stage $t$. Since we are taking the perspective of a single buyer, we assume that the allocation $x_t$ and the payment $p_t$ subsume the other players’ bids.

In line with the literature, we assume that each buyer’s utility is quasi-linear, such that her utility at stage $t$ with true valuation $v_t$ is

$$
u_t\big(b_{(1,t)}; v_t\big) = v_t \cdot x_t\big(b_{(1,t)}\big) - p_t\big(b_{(1,t)}\big).
$$

We use the notation $a_{(t',t'')} = (a_{t'}, \cdots, a_{t''})$ to represent a consecutive sequence of $a$ from stage $t'$ to stage $t''$. For convenience, let $\hat{u}_t\big(b_{(1,t)}\big) = b_t \cdot x_t\big(b_{(1,t)}\big) - p_t\big(b_{(1,t)}\big)$, i.e., the buyer’s utility at stage $t$ assuming she reports truthfully at stage $t$ such that $b_t = v_t$.

2.1 Dynamic Incentive Compatibility

In a dynamic auction, the buyer’s objective is to maximize her time-discounted cumulative utility with respect to a discounting factor $\gamma \in [0, 1]$. However, her optimal bidding strategy at stage $t$ depends on her strategies for the future stages. We adopt the classic notion of dynamic incentive-compatibility [11, 29], in which the buyer is assumed to report her valuations truthfully for all future stages and dynamic incentive-compatibility is required to hold for all stages.

Formally, a dynamic auction $\langle x, p \rangle$ is dynamic incentive-compatible if for any stage $t$, bidding history $b_{(1,t-1)}$, and valuation $v_t$, we have

$$
v_t \in \arg\max_{b_t} \Big[ u_t\big(b_{(1,t-1)}, b_t; v_t\big) + U_t\big(b_{(1,t-1)}, b_t\big) \Big], \qquad \text{(Dynamic-IC)}
$$

where $U_t$ is the continuation utility of the buyer, defined as her expected future utility assuming truthful reporting:

$$
U_t\big(b_{(1,t)}\big) = \sum_{\tau=t+1}^{T} \gamma^{\tau-t} \cdot E_{v_{(t+1,\tau)}}\Big[ \hat{u}_\tau\big(b_{(1,t)}, v_{(t+1,\tau)}\big) \Big].
$$

2.2 Stage Incentive Compatibility

In particular, for myopic buyers with discounting factor $\gamma = 0$, the dynamic incentive-compatibility notion simply requires that the stage mechanism $\langle x_t, p_t \rangle$ satisfies stage incentive-compatibility:

$$
v_t \in \arg\max_{b_t} u_t\big(b_{(1,t-1)}, b_t; v_t\big), \qquad \text{(Stage-IC)}
$$

for any stage $t$, bidding history $b_{(1,t-1)}$, and valuation $v_t$.

The celebrated Myerson’s lemma [30] characterizes the relationship between the allocation rule and the payment rule for any Stage-IC mechanism $\langle x, p \rangle$:

• The allocation rule $x$ is non-decreasing;
• $p(v) = v \cdot x(v) - \int_0^v x(z)\,dz$.

These relationships form the basis of our analysis.

2.3 Stage Auction Formats

In this paper, we examine two kinds of stage mechanisms: a mixture between first- and second-price auctions with reserve, denoted by MixtureAuction [11], and a posted-price auction with first/second-price soft floors, denoted by first/second-price SoftFloor [37]. For ease of demonstration, denote the first and the second highest bids in a realized bidding profile by $b_1$ and $b_2$, respectively.

In the MixtureAuction with reserve $r$, let $\kappa \in [0, 1]$ denote the first-price weight. The auction allocates the ad slot to the bidder with the highest bid if and only if $b_1 \geq r$, and charges her price $\kappa b_1 + (1 - \kappa) \max(b_2, r)$. Intuitively, with probability $\kappa$, the seller implements a first-price auction with reserve $r$, while with the remaining probability $1 - \kappa$, the seller implements a second-price auction with reserve $r$.

In the SoftFloor auction, let $r_l < r_h$ be the reserve prices; $r_l$ and $r_h$ are also known as the low floor and the high floor, respectively. For the first-price SoftFloor auction, if $b_1 \geq r_l$, the auction allocates the ad slot to the bidder with the highest bid and charges her $\max(b_2, \min(b_1, r_h))$. In contrast, for the second-price SoftFloor auction, the auction allocates the ad slot to the bidder with the highest bid if $b_1 \geq r_l$, but the winner is charged $\max(b_2, r_h)$ if $b_1 \geq r_h$ and $\max(b_2, r_l)$ if $r_l \leq b_1 < r_h$. Intuitively, in both auctions, the bidder with the highest bid gets allocated if her bid is above the low floor. Moreover, the payments are the same when her bid is also above the high floor, but when her bid falls between $r_l$ and $r_h$, the payment rule becomes first-pricing (or second-pricing) in the first-price (or second-price) SoftFloor auction.

In our analysis of stage mechanisms, we assume that the competing bid (the highest bid among the other buyers) is drawn independently from a continuous distribution $G_t$ with density function $g_t$ at stage $t$. For convenience, given a function $a(v)$, let its left limit at $v$ be $a^-(v) = \lim_{\delta \to 0^+} a(v - \delta)$, and similarly, let its right limit at $v$ be $a^+(v) = \lim_{\delta \to 0^+} a(v + \delta)$. Throughout this paper, we assume that the allocation rule $x_t$ and payment rule $p_t$ of the stage mechanism are piecewise continuous with finitely many discontinuities, and moreover, that their left and right limits are well-defined. Note that these properties are satisfied by MixtureAuction and SoftFloor.

3 METRIC FOR STAGE AUCTIONS

In this section, we introduce our metric of incentive-compatibility for a stage auction in isolation, i.e., a stage mechanism with a fixed bidding history. For the sake of clarity we drop the stage subscript $t$ in this section and denote the stage mechanism by $\langle x, p \rangle$. We further drop its dependence on bidding history for convenience. Let $\hat{u}(b) = b \cdot x(b) - p(b)$ be the buyer’s utility assuming she reports truthfully such that $b = v$.

Definition 3.1 (Individual Stage-IC Metric). The individual stage-IC metric for a buyer with valuation distribution $F$ in a stage mechanism $\langle x, p \rangle$ is defined as

$$
\text{i-SIC} = \lim_{\alpha \to 0} \frac{E_{v \sim F}\big[\hat{u}\big((1+\alpha)v\big)\big] - E_{v \sim F}\big[\hat{u}\big((1-\alpha)v\big)\big]}{2\alpha \cdot E_{v \sim F}[v \cdot x(v)]}.
$$

Intuitively, the numerator of i-SIC is the difference between the expected utility when the buyer’s valuations are perturbed multiplicatively up versus down by $\alpha$, while the denominator is $2\alpha$ times the expected welfare of the buyer when she reports truthfully. The metric can be viewed as a test of the envelope condition which is at the heart of Myerson’s lemma, except that only the mean value of the test is reported. This mean value can also be interpreted as the marginal benefit of uniform bid-shading. If the envelope condition test fails and the metric evaluates to smaller than 1, this indicates that the buyer may benefit from slight uniform bid-shading, and vice versa for bid-raising. We elaborate on this aspect in more mathematical detail in Section 3.1.

To further justify our metric, we first show that our metric is 1 for any incentive-compatible stage mechanism, regardless of the buyer’s valuation distribution.

Theorem 3.2. For any stage mechanism $\langle x, p \rangle$ that is Stage-IC, $\text{i-SIC} = 1$ for any distribution $F$.

Proof. Note that it suffices to show that for any Stage-IC mechanism $\langle x, p \rangle$ and any distribution $F$, we have

$$
\lim_{\alpha \to 0} \frac{E\big[\hat{u}\big((1+\alpha)v\big)\big] - E\big[\hat{u}\big((1-\alpha)v\big)\big]}{2\alpha} = E[v \cdot x(v)].
$$

Applying Myerson’s lemma for the Stage-IC mechanism, we get $\hat{u}(v) = \int_0^v x(z)\,dz$, which is a continuous function. Therefore, for the left-hand side, we have

$$
\begin{aligned}
\lim_{\alpha \to 0} \frac{E\big[\hat{u}\big((1+\alpha)v\big)\big] - E\big[\hat{u}\big((1-\alpha)v\big)\big]}{2\alpha}
&= \lim_{\alpha \to 0} \int_0^1 f(v)\, v \cdot \frac{\hat{u}\big((1+\alpha)v\big) - \hat{u}\big((1-\alpha)v\big)}{2\alpha v}\,dv \\
&= \lim_{\alpha \to 0} \int_0^1 f(v)\, v \cdot \left[ \frac{\hat{u}\big((1+\alpha)v\big) - \hat{u}(v)}{2\alpha v} + \frac{\hat{u}(v) - \hat{u}\big((1-\alpha)v\big)}{2\alpha v} \right] dv \\
&= \int_0^1 f(v)\, v \cdot \frac{\hat{u}'^{+}(v) + \hat{u}'^{-}(v)}{2}\,dv \\
&= \int_0^1 f(v)\, v \cdot \frac{x^{+}(v) + x^{-}(v)}{2}\,dv \\
&= E[v \cdot x(v)],
\end{aligned}
$$

where $\hat{u}'^{+}(v)$ is the right derivative of $\hat{u}(v)$ and $\hat{u}'^{-}(v)$ is the left derivative. The third and fourth equations follow from $\hat{u}(v) = \int_0^v x(z)\,dz$ and the existence of the left and right limits of $x$. The last equation holds because $F$ is continuous and $x$ has only finitely many discontinuities. □

Next, we provide an alternative form of our metric for added intuition. Recall that Myerson’s lemma characterizes the relationship between the payment and the allocation in any Stage-IC mechanism: $x(v)$ pins down a unique payment rule $p^*(v) = v \cdot x(v) - \int_0^v x(z)\,dz$ that (truthfully) implements the allocation rule $x(v)$.

Lemma 3.3. The individual stage-IC metric for a buyer with valuation distribution $F$ in a stage mechanism $\langle x, p \rangle$ can be equivalently rewritten as

$$
\text{i-SIC} = 1 - \frac{\Delta}{E[v \cdot x(v)]}, \quad \text{where}
$$

$$
\Delta = \lim_{\alpha \to 0} \int_0^1 f(v) \cdot v \cdot \frac{p\big((1+\alpha)v\big) - p\big((1-\alpha)v\big)}{2\alpha v}\,dv
- \lim_{\alpha \to 0} \int_0^1 f(v) \cdot v \cdot \frac{p^*\big((1+\alpha)v\big) - p^*\big((1-\alpha)v\big)}{2\alpha v}\,dv.
$$

In particular, when both $x$ and $p$ are differentiable, we have

$$
\text{i-SIC} = 1 - \frac{E_{v \sim F}\big[v \cdot \big(p'(v) - p^{*\prime}(v)\big)\big]}{E_{v \sim F}[v \cdot x(v)]}.
$$

Proof. Let $\tilde{u}(b) = b \cdot x(b) - p^*(b)$. Note that $\hat{u}(b) = b \cdot x(b) - p(b) = \tilde{u}(b) + p^*(b) - p(b)$. According to Theorem 3.2, we know that the stage-IC metric for the buyer with valuation distribution $F$ is 1 in the stage mechanism $\langle x, p^* \rangle$. Therefore, we can conclude the proof of the alternative form of i-SIC by plugging $\hat{u}(b) = \tilde{u}(b) + p^*(b) - p(b)$ into the formula of i-SIC from Definition 3.1.

When $p$ is differentiable, we have:

$$
\begin{aligned}
\lim_{\alpha \to 0} \frac{E\big[p\big((1+\alpha)v\big) - p\big((1-\alpha)v\big)\big]}{2\alpha}
&= \lim_{\alpha \to 0} \frac{\int_0^1 f(v) \cdot \big[p\big((1+\alpha)v\big) - p\big((1-\alpha)v\big)\big]\,dv}{2\alpha} \\
&= \int_0^1 f(v) \cdot v \cdot \lim_{\alpha \to 0} \frac{p\big((1+\alpha)v\big) - p\big((1-\alpha)v\big)}{2\alpha v}\,dv \\
&= \int_0^1 f(v) \cdot v \cdot p'(v)\,dv = E\big[v \cdot p'(v)\big],
\end{aligned}
$$

and similarly, when $x$ is differentiable, we have

$$
\lim_{\alpha \to 0} \frac{E\big[p^*\big((1+\alpha)v\big) - p^*\big((1-\alpha)v\big)\big]}{2\alpha} = E\big[v \cdot p^{*\prime}(v)\big].
$$

□

By Lemma 3.3, when both $x$ and $p$ are differentiable, our metric captures the difference between the gradient of $p(v)$ used in the stage mechanism and the gradient of $p^*(v)$ that truthfully implements the allocation rule $x(v)$.

3.1 A Metric Bounded Between 0 and 1

We next demonstrate that our metric is bounded between 0 and 1 for reasonable mechanisms.

Lemma 3.4. i-SIC is non-negative if $\hat{u}(v)$ is non-decreasing in terms of $v$; and i-SIC is at most 1 if overbidding is always a weakly dominated strategy.

Proof. First, it is straightforward to verify that i-SIC is non-negative if $\hat{u}(v)$ is non-decreasing in terms of $v$.

If overbidding $b > v$ is not beneficial to the buyer, then we have $v \cdot x(b) - p(b) \leq v \cdot x(v) - p(v)$, which is equivalent to

$$
\forall\, b > v, \quad \frac{\hat{u}(b) - \hat{u}(v)}{b - v} \leq x(b).
$$

It then implies that for all $v$,

$$
\frac{\hat{u}\big((1+\alpha)v\big) - \hat{u}\big((1-\alpha)v\big)}{2\alpha} \leq \frac{1}{2\alpha} \cdot \int_{(1-\alpha)v}^{(1+\alpha)v} x(z)\,dz.
$$

As a result, we have

$$
\begin{aligned}
\lim_{\alpha \to 0} \frac{E\big[\hat{u}\big((1+\alpha)v\big)\big] - E\big[\hat{u}\big((1-\alpha)v\big)\big]}{2\alpha}
&\leq \lim_{\alpha \to 0} \int_0^1 f(v) \cdot \frac{\int_{(1-\alpha)v}^{(1+\alpha)v} x(z)\,dz}{2\alpha}\,dv \\
&= \int_0^1 f(v) \cdot v \cdot \frac{x^{+}(v) + x^{-}(v)}{2}\,dv = E[v \cdot x(v)],
\end{aligned}
$$

where the second-to-last equation follows from the existence of the left and right limits of $x$, and the last equation holds because $F$ is continuous and $x$ has only finitely many discontinuities. □

The proofs of Theorem 3.2 and Lemma 3.4 imply that when the metric evaluates to less than, equal to, or larger than 1, this corresponds to cases where uniform bid-shading is beneficial, neutral, or harmful at the margin, respectively. This interpretation may be particularly relevant in practice as uniform bid-shading is a common bid optimization strategy [5, 19]. Nevertheless, we emphasize that the metric only provides information about the marginal benefits of bid shading, not the optimal shading factor. In particular, if the metric evaluates close to 0 (or 1) this does not imply a relatively small (or large) optimal shading factor.

3.2 Stage-IC Monotonicity

3.2.1 MixtureAuction. We further investigate whether our metric displays monotonicity in terms of certain auction parameters. To begin with, we show that our metric exactly evaluates to $1 - \kappa$ in MixtureAuction, where $\kappa$ is the first-price weight.

Theorem 3.5. In a mixture between first- and second-price auction with first-price weight $\kappa$ and any reserve, $\text{i-SIC} = 1 - \kappa$.

Proof. Note that the utility $\hat{u}$ in MixtureAuction with first-price weight $\kappa$ can be decomposed as $\hat{u}(b) = \kappa \cdot \hat{u}_1(b) + (1 - \kappa) \cdot \hat{u}_2(b)$, where $\hat{u}_1(b)$ and $\hat{u}_2(b)$ are the buyer’s utility under a pure first-price auction and a pure second-price auction, respectively. Moreover, observe that the allocation rule is the same no matter what $\kappa$ is, i.e., the first-price weight only affects the payment rule. Therefore, since a pure second-price auction is Stage-IC, by Theorem 3.2,

$$
\lim_{\alpha \to 0} \frac{E\big[\hat{u}_2\big((1+\alpha)v\big)\big] - E\big[\hat{u}_2\big((1-\alpha)v\big)\big]}{2\alpha \cdot E[v \cdot x(v)]} = 1.
$$

Finally, observe that $\hat{u}_1(b) = 0$ for all $b$; therefore, we have

$$
\begin{aligned}
\text{i-SIC} &= \lim_{\alpha \to 0} \frac{E\big[\hat{u}\big((1+\alpha)v\big)\big] - E\big[\hat{u}\big((1-\alpha)v\big)\big]}{2\alpha \cdot E[v \cdot x(v)]} \\
&= \lim_{\alpha \to 0} (1 - \kappa) \cdot \frac{E\big[\hat{u}_2\big((1+\alpha)v\big)\big] - E\big[\hat{u}_2\big((1-\alpha)v\big)\big]}{2\alpha \cdot E[v \cdot x(v)]} = 1 - \kappa.
\end{aligned}
$$

□

According to Theorem 3.5, as the first-price weight $\kappa$ increases, our metric of stage incentive-compatibility decreases. In particular, when $\kappa = 1$ we have a pure first-price auction and our metric evaluates to 0. A careful reader will notice that our stage-IC metric does not provide any guidance on how a buyer should shade their bids in MixtureAuction. For instance, the metric is always 0 for a pure first-price auction, but obviously the optimal level of bid shading (i.e., how much to scale down the value to form the bid) for a bidder can vary widely, depending on its value distribution and the reserve price.

Notice that the proof of Theorem 3.5 only depends on the facts that the pure second-price auction is incentive-compatible, the buyer’s utility in the pure first-price auction is 0, and their allocation rules are the same. Therefore, given any incentive-compatible auction $\langle x^*, p^* \rangle$, we can construct a corresponding first-price auction with the same allocation rule $\langle x^F, p^F \rangle$ such that $x^F(v) = x^*(v)$ for all $v$ and $p^F(v) = x^*(v) \cdot v$. In a mixture $\langle x, p \rangle$ between them with the first-price weight $\kappa$, let $x(v) = \kappa \cdot x^F(v) + (1 - \kappa) \cdot x^*(v)$ and $p(v) = \kappa \cdot p^F(v) + (1 - \kappa) \cdot p^*(v)$.

Corollary 3.6. In a mixture between an incentive-compatible mechanism and the corresponding first-price auction with the same allocation rule, $\text{i-SIC} = 1 - \kappa$, where $\kappa$ is the first-price weight.

3.2.2 SoftFloor. We now turn to the SoftFloor auction.

Theorem 3.7. In a first-price SoftFloor auction with a fixed low floor $r_l$,

$$
\text{i-SIC} = 1 - \frac{E\big[v \cdot x(v) \cdot \mathbb{1}\{r_l \leq v \leq r_h\}\big]}{E[v \cdot x(v)]},
$$

which is decreasing as the high floor $r_h$ increases.

Intuitively, in a first-price SoftFloor auction, i-SIC is exactly one minus the fraction of the buyer’s total welfare that comes from valuations lying between the low floor $r_l$ and the high floor $r_h$.

Proof of Theorem 3.7. Recall that $G$ is the distribution of the competing bids. The allocation rule of the considered buyer is

$$
x(v) = \begin{cases} 0 & v < r_l \\ G(v) & v \geq r_l \end{cases}.
$$

Moreover, the payment rule is given as

$$
p(v) = \begin{cases} 0 & v < r_l \\ v \cdot G(v) & r_l \leq v < r_h \\ r_h \cdot G(r_h) + \int_{r_h}^{v} z \cdot g(z)\,dz & v \geq r_h \end{cases}.
$$

Therefore, the utility function is

$$
\hat{u}(v) = \begin{cases} 0 & v < r_l \\ 0 & r_l \leq v < r_h \\ v \cdot G(v) - r_h \cdot G(r_h) - \int_{r_h}^{v} z \cdot g(z)\,dz & v \geq r_h \end{cases}.
$$

Observe that the utility function $\hat{u}(v)$ is continuous for $0 < v < 1$ and moreover, for any $v$, both the left derivative and the right derivative exist for $\hat{u}$. In particular, for $v > r_h$, we have

$$
\hat{u}'^{+}(v) = \hat{u}'^{-}(v) = G(v) = x(v),
$$

and for $v < r_h$, $\hat{u}'^{+}(v) = \hat{u}'^{-}(v) = 0$. Therefore, we have

$$
\begin{aligned}
\lim_{\alpha \to 0} \frac{E\big[\hat{u}\big((1+\alpha)v\big)\big] - E\big[\hat{u}\big((1-\alpha)v\big)\big]}{2\alpha}
&= \lim_{\alpha \to 0} \int_0^1 f(v)\, v \cdot \frac{\hat{u}\big((1+\alpha)v\big) - \hat{u}\big((1-\alpha)v\big)}{2\alpha v}\,dv \\
&= \lim_{\alpha \to 0} \int_0^1 f(v)\, v \cdot \left[ \frac{\hat{u}\big((1+\alpha)v\big) - \hat{u}(v)}{2\alpha v} + \frac{\hat{u}(v) - \hat{u}\big((1-\alpha)v\big)}{2\alpha v} \right] dv \\
&= \int_0^1 f(v)\, v \cdot \frac{\hat{u}'^{+}(v) + \hat{u}'^{-}(v)}{2}\,dv = \int_{r_h}^1 f(v)\, v \cdot x(v)\,dv,
\end{aligned}
$$

where the last equality holds because $F$ is a continuous function. As a result, we have

$$
\text{i-SIC} = \frac{\int_{r_h}^1 f(v) \cdot v \cdot x(v)\,dv}{E[v \cdot x(v)]}
= \frac{\int_{r_h}^1 f(v) \cdot v \cdot x(v)\,dv}{\int_{r_l}^1 f(v) \cdot v \cdot x(v)\,dv}
= 1 - \frac{E\big[v \cdot x(v) \cdot \mathbb{1}\{r_l \leq v \leq r_h\}\big]}{E[v \cdot x(v)]},
$$

which concludes the proof. □
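As a rough numerical check of Theorem 3.7, the sketch below compares a simulated finite-difference estimate of i-SIC with the theorem's closed form in one special case. Everything specific here is an assumption made for illustration: valuations and competing bids are uniform on [0, 1], so that $x(v) = v$ above the low floor and the closed form reduces to $(1 - r_h^3)/(1 - r_l^3)$; the function names are hypothetical.

```python
import random


def first_price_soft_floor_i_sic(values, competing_bids, low, high, alpha=0.01):
    """Finite-difference i-SIC estimate for a first-price SoftFloor auction."""

    def truthful_utility(bid, competing):
        # The buyer wins only as the highest bidder and only above the low floor.
        if bid < low or bid < competing:
            return 0.0
        price = max(competing, min(bid, high))  # first-price below the high floor
        return bid - price  # u_hat: the bid is treated as the true value

    up = down = welfare = 0.0
    for v, c in zip(values, competing_bids):
        up += truthful_utility((1.0 + alpha) * v, c)
        down += truthful_utility((1.0 - alpha) * v, c)
        if v >= low and v >= c:
            welfare += v  # realized v * x(v) under truthful bidding
    return (up - down) / (2.0 * alpha * welfare)


def closed_form_uniform(low, high):
    # Theorem 3.7 when F and G are uniform on [0, 1] (assumption): x(v) = v for v >= low.
    return (1.0 - high ** 3) / (1.0 - low ** 3)


rng = random.Random(1)
n = 50_000
values = [rng.random() for _ in range(n)]
competing = [rng.random() for _ in range(n)]
low = 0.2
for high in (0.3, 0.5, 0.7):
    estimate = first_price_soft_floor_i_sic(values, competing, low, high)
    print(high, round(estimate, 3), round(closed_form_uniform(low, high), 3))
```

The estimates should track the closed form and shrink as the high floor rises, which is the monotonicity stated in the theorem. Other distributions would need a different closed form but the same simulation routine.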

However, in a first-price SoftFloor auction with a fixed high floor $r_h$, the individual stage-IC metric may not be monotone in terms of the low floor $r_l$, since a change in $r_l$ would change both $E[v \cdot x(v) \cdot \mathbb{1}\{r_l \leq v \leq r_h\}]$ and $E[v \cdot x(v)]$.

Moreover, there is no stage-IC monotonicity in a second-price SoftFloor auction even in terms of the high floor $r_h$. Intuitively, the individual stage-IC metric should depend on the density function $f$ around the high floor: if the buyer’s valuation is concentrated slightly above the high floor, then truthful reporting would give the buyer almost 0 utility but misreporting to bid slightly below the high floor can result in a utility about $r_h - r_l > 0$; but if the buyer’s valuation is concentrated slightly below the high floor, then the buyer has no incentive to misreport. We derive a closed-form formula of the individual stage-IC metric in Lemma 3.8, which confirms this intuition (proof omitted due to the space limitation).

Lemma 3.8. In a second-price SoftFloor auction with a fixed $r_l$,

$$
\text{i-SIC} = 1 - \frac{f(r_h) \cdot r_h \cdot \big(r_h \cdot x(r_h) - p^*(r_h)\big)}{E_{v \sim F}[v \cdot x(v)]},
$$

where $p^*$ is the payment function that truthfully implements $x$.

Note that the utility of truthful bidding in a second-price SoftFloor auction is not non-decreasing, and in particular, $\hat{u}(r_h) = 0$ while $\hat{u}^{-}(r_h) > 0$. As a result, i-SIC in a second-price SoftFloor auction might be negative according to Lemma 3.8.

The stage-IC monotonicity of first- vs. second-price SoftFloor auctions suggests that the first-price SoftFloor auction may lead to more predictable behavior than the second-price SoftFloor auction in practice: the seller is able to adjust i-SIC by changing the gap between $r_l$ and $r_h$ in a first-price SoftFloor auction, while i-SIC of a second-price SoftFloor auction is difficult to control as it is sensitive to the probability density of the buyer’s valuation around $r_h$.

4 METRIC FOR DYNAMIC AUCTIONS

In this section, we generalize our metric to a dynamic environment.

Definition 4.1 (Individual Dynamic-IC Metric). The individual dynamic-IC metric at the first stage for a buyer with valuation distributions $F_{(1,T)}$ and discounting factor $\gamma$ in a dynamic mechanism $\langle x, p \rangle$ is defined by

$$
\text{i-DIC} = \lim_{\alpha \to 0} \frac{\Delta\text{-}s_1(\alpha) + \sum_{\tau=2}^{T} \gamma^{\tau-1} \cdot \Delta\text{-}d_\tau(\alpha)}{2\alpha \cdot E_{v_1}\big[v_1 \cdot x_1(v_1)\big]}, \quad \text{where}
$$

$$
\Delta\text{-}s_1(\alpha) = E_{v_1}\Big[\hat{u}_1\big((1+\alpha)v_1\big) - \hat{u}_1\big((1-\alpha)v_1\big)\Big], \quad \text{and}
$$

$$
\Delta\text{-}d_\tau(\alpha) = E_{v_{(1,\tau)}}\Big[\hat{u}_\tau\big((1+\alpha)v_1, v_{(2,\tau)}\big) - \hat{u}_\tau\big((1-\alpha)v_1, v_{(2,\tau)}\big)\Big].
$$

Notice that when $\gamma = 0$, we have $\text{i-DIC} = \text{i-SIC}$ for the first stage. In fact, i-DIC is a direct extension of i-SIC. Recall that at the first stage, dynamic incentive-compatibility requires that

$$
v_1 \in \arg\max_{b_1} \Big[ u_1(b_1; v_1) + U_1(b_1) \Big], \quad \text{where} \quad
U_1(b_1) = \sum_{\tau=2}^{T} \gamma^{\tau-1} \cdot E_{v_{(2,\tau)}}\Big[\hat{u}_\tau\big(b_1, v_{(2,\tau)}\big)\Big].
$$

Compared to the stage incentive-compatibility requirement in which

$$
v_1 \in \arg\max_{b_1} u_1(b_1; v_1),
$$

the difference is that dynamic incentive-compatibility has an additional continuation-utility term $U_1(b_1)$. If we consider an alternative payment function

$$
p_1^d(b_1) = p_1(b_1) - U_1(b_1),
$$

then $\langle x, p \rangle$ is dynamic-IC at the first stage if the payment function $p_1^d$ truthfully implements the allocation rule $x_1$. Indeed, i-DIC is exactly i-SIC for the “stage” mechanism $\langle x_1, p_1^d \rangle$.

4.1 Dynamic Reserve Pricing

We are particularly interested in second-price auctions with dynamic reserve, denoted by DynamicReserve. For the purpose of analysis, we assume that there are three stages and the buyer’s distributions are independent and identical across these three stages: $F = F_0 = F_1 = F_2$. The reserve at stage 1 is set to be the $\kappa_1$ quantile of distribution $F$. A $\kappa$ quantile is the output of a quantile function such that $q(\kappa) = F^{-1}(\kappa)$. We are interested in quantifying the dynamic incentive compatibility for stage 1. In our experiment of stage 1, the buyer’s valuation $v_1$ is drawn from a distribution $F$ and then she submits a bid $\beta \cdot v_1$ with $\beta \in \{(1-\alpha), 1, (1+\alpha)\}$. At stage 2, the seller sets the reserve to be the $\kappa_2$ quantile of the distribution $F'$ such that $\Pr_{v \sim F'}[v \leq \beta \cdot a] = \Pr_{v \sim F}[v \leq a]$ for all $a \geq 0$.

Lemma 4.2. For any distribution $F$, $\gamma \in (0, 1]$, and a fixed $\kappa_2$, i-DIC is non-increasing as $\kappa_1$ increases.

This lemma suggests that the buyer is more likely to misreport when the reserve is higher. Intuitively, this is because when the buyer’s true valuation is below the reserve, she can misreport to any bid lower than her true valuation to reduce the reserve for stage 2 without losing any utility.

Proof of Lemma 4.2. First, notice that the stage mechanism at stage 1 is a second-price auction with reserve, and therefore, by Theorem 3.2, we have

$$
\lim_{\alpha \to 0} \frac{\Delta\text{-}s_1(\alpha)}{2\alpha \cdot E_{v_1}\big[v_1 \cdot x_1(v_1)\big]} = 1,
$$

which implies that

$$
\text{i-DIC} = 1 + \lim_{\alpha \to 0} \frac{\gamma \cdot \Delta\text{-}d_2(\alpha)}{2\alpha \cdot E_{v_1}\big[v_1 \cdot x_1(v_1)\big]}.
$$

Moreover, notice that $\Delta\text{-}d_2(\alpha)$ is independent of $\kappa_1$ and additionally, $\Delta\text{-}d_2(\alpha)$ is non-positive since a $(1+\alpha)$ perturbation of bids in stage 1 will lead to a higher reserve in stage 2 than a $(1-\alpha)$ perturbation, which induces smaller utility in stage 2. Finally, we can conclude the proof by noticing that $E_{v_1}\big[v_1 \cdot x_1(v_1)\big]$ is non-increasing as $\kappa_1$ increases, since a larger $\kappa_1$ induces a higher reserve for stage 1, reducing the buyer’s welfare at stage 1. □

Lemma 4.3. For any distribution $F$, $\gamma \in (0, 1]$, and a fixed $\kappa_1$,

$$
\text{i-DIC} = 1 - \frac{\gamma \cdot (1 - \kappa_2) \cdot q(\kappa_2) \cdot G\big(q(\kappa_2)\big)}{E_{v_1}\big[v_1 \cdot x_1(v_1)\big]}.
$$
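To make the closed form in Lemma 4.3 concrete, the sketch below evaluates it in one special case. The uniform distributions are an assumption made purely for illustration: with $F$ and $G$ uniform on [0, 1], the quantile function is $q(\kappa) = \kappa$, the winning probability at the stage-2 reserve is $G(q(\kappa_2)) = \kappa_2$, and the stage-1 welfare is $E[v_1 \cdot x_1(v_1)] = \int_{\kappa_1}^{1} v \cdot v\,dv = (1 - \kappa_1^3)/3$. The function name `i_dic_uniform` is hypothetical.

```python
def i_dic_uniform(gamma, kappa1, kappa2):
    """Lemma 4.3 evaluated for F and G uniform on [0, 1] (illustrative assumption)."""
    q2 = kappa2        # q(kappa2) = F^{-1}(kappa2) = kappa2 for uniform F
    win_prob = q2      # G(q(kappa2)) for uniform G
    reserve1 = kappa1  # stage-1 reserve q(kappa1); requires kappa1 < 1
    # E[v1 * x1(v1)] with x1(v) = G(v) = v for v >= reserve1, and 0 below it.
    welfare1 = (1.0 - reserve1 ** 3) / 3.0
    return 1.0 - gamma * (1.0 - kappa2) * q2 * win_prob / welfare1


# Sweep the stage-1 reserve quantile with the stage-2 quantile held fixed.
for kappa1 in (0.0, 0.2, 0.6):
    print(kappa1, round(i_dic_uniform(gamma=1.0, kappa1=kappa1, kappa2=0.5), 3))
```

Under these assumptions the value falls as $\kappa_1$ grows, matching the direction stated in Lemma 4.2, and setting `gamma=0` recovers the stage value of 1 for a second-price auction with reserve.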

Notice that $(1 - \kappa_2) \cdot q(\kappa_2)$ is exactly the expected revenue of setting the reserve as $q(\kappa_2)$ when there are no competing bids. Therefore, the result suggests that the more the seller optimizes the reserve to extract revenue from the buyer, the smaller the individual dynamic-IC metric for DynamicReserve. Another factor in this formula is $G(q(\kappa_2))$, which is the winning probability of the buyer when she bids $q(\kappa_2)$. Thus, the formula suggests that the individual dynamic-IC metric for DynamicReserve is smaller when the buyer's winning probability is larger.

Proof of Lemma 4.3. Recall that in the proof of Lemma 4.2, we have derived a formula of i-DIC such that
\[
\text{i-DIC} = 1 + \lim_{\alpha \to 0} \frac{\gamma \cdot \Delta\text{-}d_2(\alpha)}{2\alpha \cdot \mathbb{E}_{v_1}\big[v_1 \cdot x_1(v_1)\big]}.
\]
When $\kappa_1$ is fixed, $\mathbb{E}_{v_1}[v_1 \cdot x_1(v_1)]$ is fixed, too. Therefore, i-DIC only depends on $\lim_{\alpha \to 0} \frac{\Delta\text{-}d_2(\alpha)}{2\alpha}$, which is
\[
\lim_{\alpha \to 0} \frac{\mathbb{E}_{v_2}\big[\hat{u}(v_2; 1+\alpha) - \hat{u}(v_2; 1-\alpha)\big]}{2\alpha},
\]
where $\hat{u}(v_2; \beta) = 0$ for $v_2 < \beta \cdot q(\kappa_2)$, and for $v_2 \ge \beta \cdot q(\kappa_2)$,
\[
\hat{u}(v_2; \beta) = \big(v_2 - \beta q(\kappa_2)\big) \cdot G\big(\beta q(\kappa_2)\big) + \int_{\beta q(\kappa_2)}^{v_2} (v_2 - z) \cdot g(z)\, dz.
\]
As a result, we have
\[
\mathbb{E}_{v_2}\big[\hat{u}(v_2; \beta)\big] = \int_{\beta \cdot q(\kappa_2)}^{1} f(v_2) \cdot \Big[\big(v_2 - \beta \cdot q(\kappa_2)\big) \cdot G\big(\beta \cdot q(\kappa_2)\big) + \int_{\beta \cdot q(\kappa_2)}^{v_2} (v_2 - z) \cdot g(z)\, dz\Big]\, dv_2.
\]
Therefore, taking the derivative of $\mathbb{E}_{v_2}[\hat{u}(v_2; \beta)]$ with respect to $\beta$ at $\beta = 1$, we have
\[
\lim_{\alpha \to 0} \frac{\mathbb{E}_{v_2}\big[\hat{u}(v_2; 1+\alpha) - \hat{u}(v_2; 1-\alpha)\big]}{2\alpha} = -\int_{q(\kappa_2)}^{1} f(v_2)\, q(\kappa_2)\, G\big(q(\kappa_2)\big)\, dv_2 = -(1 - \kappa_2)\, q(\kappa_2)\, G\big(q(\kappa_2)\big),
\]
which concludes the proof. □

The following corollary directly follows the observation that $(1 - \kappa_2) \cdot q(\kappa_2)$ is exactly the expected revenue of setting the reserve as $q(\kappa_2)$ when there are no competing bids.

Corollary 4.4. When there are no competing bids, the individual dynamic-IC metric is minimized when the reserve is set to be the Myerson reserve: $\kappa_2^* = F(r^*)$ where $r^* = \operatorname{argmax}_r\, r \cdot (1 - F(r))$.

Recall that for regular distributions, the quantity $r \cdot (1 - F(r))$ is non-decreasing for $r < r^*$ and is non-increasing for $r > r^*$ [30]. Therefore, we have

Corollary 4.5. When there are no competing bids, for any regular distribution $F$, the individual dynamic-IC metric is non-increasing for $\kappa_2 < F(r^*)$ and is non-decreasing for $\kappa_2 > F(r^*)$.

5 EXPERIMENTS

In this section we apply our metrics over real bid data from search and display ad auctions. Our search and display ad data is obtained from the auction logs of the Google search engine and Google Ad Exchange, respectively. The evaluation is semi-synthetic: we draw on real bid data, but we simulate the artificial and stylized mechanisms of Sections 3 and 4, rather than the actual mechanisms implemented by the ad exchange or search engine. Our goal is to validate the theory on IC metrics developed so far, and more generally to obtain quantitative comparisons between the incentive properties of different mechanisms which can be intuitively reasoned about.

The experimental setup is the same in both cases. We obtain data from two consecutive days. On day 1, auctions are randomly partitioned into three experimental buckets with multiplicative bid perturbations of $(1-\alpha)$, $1$, $(1+\alpha)$ respectively. This provides the empirical data needed to compute the individual stage-IC metric for each bidder according to Definition 3.1. The expectations are evaluated simply by taking the empirical means over the data. A default reserve price is used on day 1, but bid distributions from that day are used to implement a dynamic reserve pricing policy on day 2 (e.g., by taking bid distribution quantiles). To form bid distributions, the seller needs to define traffic buckets to pool auction data (e.g., by keyword or website); if the seller uses bidder identities to segment the traffic, the resulting reserves are personalized, otherwise they are anonymous. By taking the utility difference on day 2 due to dynamic reserves from upwards or downwards bid perturbations on day 1, we can then compute the dynamic-IC metric for each bidder according to Definition 4.1. We use discounting factor $\gamma = 1$ for simplicity, to equally weigh the static and dynamic effects of bid perturbations in the metric. We set the perturbation level $\alpha$ depending on the domain, choosing a value that achieves narrow confidence intervals without introducing too much bias. The selection of $\alpha$ makes a trade-off between the bias and variance of our estimator: the bias is bounded by $O(\alpha^2)$ while the standard error is $O(\alpha^{-1} \cdot n^{-1})$ where $n$ is the number of auctions.

The IC metrics are defined at the level of a bidder. To aggregate across bidders, we take a weighted average where each bidder is weighted by its empirical allocation probability $\mathbb{E}_v[x(v)]$. We also considered weighting bidders by their empirical welfare $\mathbb{E}_v[v \cdot x(v)]$, but because different ad auctions can have orders of magnitude differences in welfare (e.g., text versus video ads in display), this tended to make the final metrics much more variable. Throughout we use the jackknife to estimate standard errors and report 95% confidence intervals as plus/minus twice the standard error [17]. The jackknife can lead to conservative (i.e., larger) estimates of variance compared to other resampling techniques like the bootstrap, but it is much more efficient to run over large datasets [16].

5.1 Display Ad Auctions

Our display ad dataset consists of over 5M auction records from the logs of the Google Ad Exchange sampled over three consecutive days which we number 0, 1, 2. For each auction we record the bid and a buyer identifier for each bidder. Some auctions only have singleton bids. We also record the publisher identifier (similar to a website URL) and device type (mobile or desktop). The buyer id,

publisher id, and device type are used to form bid distributions to compute dynamic reserve prices; the reserves in this study are therefore personalized.

[Figure 1 (plots omitted). Three panels, each with the Dynamic IC Metric on the vertical axis. Horizontal axes: (a) First-Price Weight, (b) High Reserve Multiplier, (c) Myerson Reserve Scaling. Legend entries: 20th quantile, 80th quantile, Myerson, No reserve; 1st-price, 2nd-price; first-price weight.]

(a) IC monotonicity of MixtureAuction, varying the first-price weight $\kappa$ on day 1 and the dynamic reserve pricing policy on day 2.
(b) IC monotonicity of first- and second-price SoftFloor, varying the ratio $r_h / r_l$ for a fixed $r_l$ on day 1 and the dynamic reserve pricing policy on day 2.
(c) IC monotonicity of MixtureAuction, where we apply and vary a scaling factor on the Myerson dynamic reserve pricing policy on day 2.

Figure 1: Empirical IC monotonicity of MixtureAuction and SoftFloor. Note that when no reserve prices are applied on day 2 (red lines), the DIC metric coincides with the SIC metric.

Using this data we simulate MixtureAuction and first- and second-price SoftFloor, varying their main parameters $\kappa$ and $r_h$ respectively. We use the 60th quantile of bid distributions on day 0 to set the day 1 reserve price for MixtureAuction and the lower reserve $r_l$ for SoftFloor. The day 0 data is not otherwise used to run simulations and compute IC metrics. We consider two kinds of dynamic reserve pricing policies: 1) quantile reserves which correspond to a fixed quantile of the previous-day bid distribution, and 2) Myerson reserves computed as $r^* = \operatorname{argmax}_r\, r \cdot (1 - F(r))$ where $F$ is the empirical bid distribution. Using bid quantiles for reserves is a natural approach because it directly controls the rate at which bids are filtered. We apply 1% perturbations to bids to evaluate the IC metrics. Again, MixtureAuction, SoftFloor, and the quantile/Myerson dynamic reserve policies are artificial mechanisms simulated to validate the theory, and do not represent what the ad exchange actually implements in practice.

Monotonicity in Auction Parameters. Figure 1 plots the results of the experiments on the display ad data. Note that the metric always empirically lies between 0 and 1, in line with the theory. We first consider Figure 1a which shows the trend in the aggregate DIC metric, varying the first-price weight $\kappa$ from 0 (pure second-price) to 1 (pure first-price). We plot several reserve pricing policies: the 20th and 80th bid quantiles, and the Myerson reserves. For comparison we also plot the policy of not using a reserve, which is equivalent to just evaluating SIC. In agreement with Theorem 3.5, the metric is linear in $\kappa$ with very little variance. At $\kappa = 0$, using Myerson dynamic reserves lowers the DIC metric from 1.0 to 0.8, an effect equivalent to directly mixing in a first-price auction with probability 0.2. There is an intuitive ordering in the reserve policies: the 20th quantile has a smaller effect on the DIC metric than the 80th quantile, and the Myerson policy is even more aggressive. The added effect of dynamic reserves shrinks as one moves towards $\kappa = 1$, because at that setting SIC itself reaches 0.

For our experiments with SoftFloor we held the low reserve $r_l$ fixed and set $r_h$ to a multiple of $r_l$, ranging from 1 to 20. The resulting trends in the DIC metric are shown in Figure 1b. In agreement with Theorem 3.7, the SIC metric is decreasing in $r_h$ for first-price SoftFloor, holding $r_l$ fixed. (Recall that when there are no dynamic reserves, the SIC metric and DIC metric have the same value, which is the red line in the plot.) Also, in agreement with the insights from Lemma 3.8, the SIC metric is not monotone in $r_h$ for second-price SoftFloor. For the latter mechanism, the ranking of dynamic reserve policies is the same as for MixtureAuction: Myerson reserves are more aggressive than 80th quantiles, followed by 20th quantiles. This is also initially true for first-price SoftFloor at low $r_h$, but an inversion occurs at a multiplier of 10 where the 20th quantile reserves achieve the lowest value of the DIC metric. This is not a contradiction: the DIC metric is not always monotone in price aggressiveness.

Monotonicity in Dynamic Reserves. In Figure 1c we plot the DIC metric for MixtureAuction with $\kappa = 0, 0.2, 0.4$, where recall that 0 yields the standard second-price auction. In this experiment we applied Myerson reserves, but these were scaled up and down by 25% and 50%. Under the assumptions of Corollaries 4.4–4.5, the DIC metric attains its minimum at the Myerson reserve, and should monotonically decrease (increase) before (after) that point under regular bid distributions. These trends are borne out in Figure 1c. As $\kappa$ increases, the trend flattens up to the point where the metric is identically zero at $\kappa = 1$ (not shown).

We should stress, however, that this experiment violates the assumptions in the corollaries, so the monotonicity trends were not formally guaranteed. There are competing bids in our simulations, and we did not confirm whether bid distributions are always regular. Nonetheless, the implications of the theory are intuitive and in this case they were borne out empirically.
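The estimation procedure used in this section (perturb bids, take empirical means, attach a jackknife standard error) can be sketched on synthetic data. The sketch below is illustrative only and rests on several assumptions that are not from the paper: values and a single competing bid are drawn uniformly at random, the MixtureAuction charges the expected payment $\kappa \cdot \text{bid} + (1-\kappa) \cdot \text{second price}$, the same auctions are replayed under both perturbations instead of being split into buckets, and the jackknife deletes one auction at a time. The function names are hypothetical.

```python
import math
import random


def mixture_utility(bid, competing_bid, reserve, kappa):
    """Utility in a first/second-price mixture, treating the bid as the value."""
    price_to_beat = max(competing_bid, reserve)
    if bid < price_to_beat:
        return 0.0
    payment = kappa * bid + (1.0 - kappa) * price_to_beat
    return bid - payment


def sic_with_jackknife(auctions, reserve, kappa, alpha=0.01):
    """Return (estimate, standard error) of the stage-IC metric as a ratio of sums."""
    num, den = [], []
    for value, competing_bid in auctions:
        up = mixture_utility((1 + alpha) * value, competing_bid, reserve, kappa)
        down = mixture_utility((1 - alpha) * value, competing_bid, reserve, kappa)
        won = value >= max(competing_bid, reserve)
        num.append(up - down)                  # utility difference under perturbation
        den.append(2 * alpha * value * won)    # 2 * alpha * v * x(v) under truthful bidding
    total_num, total_den = sum(num), sum(den)
    estimate = total_num / total_den
    # Delete-one jackknife: recompute the ratio with each auction left out.
    n = len(auctions)
    leave_one_out = [(total_num - a) / (total_den - b) for a, b in zip(num, den)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return estimate, math.sqrt(variance)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(50_000)]
    for kappa in (0.0, 0.2, 0.5, 1.0):
        est, se = sic_with_jackknife(data, reserve=0.3, kappa=kappa)
        print(f"kappa={kappa:.1f}  SIC={est:.3f}  +/- {2 * se:.3f}")
```

Under these assumptions the estimate should track $1 - \kappa$, matching the linear trend attributed to Theorem 3.5 above, and it is exactly zero at $\kappa = 1$ because the utility under a pure first-price payment is zero for every auction.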

5.2 Search Ad Auctions

The auction mechanism used to allocate ad slots around search results is known as a position auction (also called a keyword or sponsored search auction), and the most common position auction across major search engines today is the celebrated Generalized Second-Price (GSP) auction [25]. In a position auction a buyer $i$ places a bid of $b_i$ per click to appear in one of $m$ ordered ad slots. The search engine associates a weight of $w_i$ to the bidder, which is proportional to the bidder's probability of being clicked if placed in the top position. Each ad slot $j$ has a position normalizer $\lambda_j$, and it's assumed that higher slots lead to more clicks: $\lambda_1 > \cdots > \lambda_m$. Under the standard separability assumption, the expected click-rate of bidder $i$ in position $j$ is $w_i \lambda_j$.

All position auctions rank bidders in descending order of their score $w_i b_i$ [15, 36]. Without loss of generality, assume bidders are indexed so that $w_1 b_1 > \cdots > w_n b_n$; in other words, bidder $i$ obtains slot $i$ if $i \le m$, and is otherwise not shown. In GSP, a winning bidder $i$ is charged $w_{i+1} b_{i+1} / w_i$ per click. Intuitively, this is the lowest bid that would maintain bidder $i$'s position. The VCG auction under this model charges the bidder (per click) [2]:
\[
\frac{1}{w_i \lambda_i} \sum_{j > i} (\lambda_{j-1} - \lambda_j)\, w_j b_j.
\]
Note that the VCG payment formula depends on the position normalizers, whereas GSP does not. (If separability does not hold, the VCG formula is more complicated.) Another historically significant mechanism is the Generalized First-Price (GFP) auction where buyers simply pay their bids per click, used by Overture in the very first position auctions [14, 18].

Our dataset consists of over 1M keyword auctions from the logs of the Google search engine, sampled over two consecutive days in 2019. We restrict our attention to auctions with exactly 4 slots, shown at the top of the search results page, and only record the information of shown bidders. For each bidder, the data has a bid $b_i$ (in a common currency), a normalized click probability (the $w_i$), and a click probability within its shown position; from the latter two estimates we back out a position normalizer $\lambda_i$. Given this data we simulate GSP and VCG exactly as described above; the actual keyword auction run in practice is naturally much more complex, but our purpose in this paper is to evaluate these mechanisms in their original and purest form.

The main import of using real ad auction data to validate our metric is that the data captures realistic market variation. We would like to confirm that our metric can be estimated to a useful level of precision: at the very least, the metric should be significantly different from 1 under GSP with at least two slots or dynamic reserve prices. The most important aspects of the data, from the perspective of incentives and our metric, are the position normalizers.¹ However, separability is a questionable assumption in practice: an ad's click-rate can depend in general on the quality of neighboring ads [1, 3, 7], other page elements like images [28], and may be confounded by other factors like ad formats and extensions. The simulation results have to be interpreted subject to these caveats. Because of the difficulty in simulating how click-rate varies with position in a real ad auction, we believe that the best way to evaluate the IC metrics in an actual deployed mechanism would be via bid perturbation experiments over live traffic.

¹ For some added intuition, suppose the normalizers for positions two and lower are trivial compared to the first position. Then GSP is effectively a second-price auction, and our metric would evaluate to 1 regardless of the bid distributions. At the other extreme, if all normalizers are almost equal, this induces the most shading because there is little to gain in terms of click-rate from being in a higher position, but potentially much to lose in terms of payment.

Our simulation runs over two days. On the first day, bids are perturbed upwards and downwards by 10% in order to evaluate the SIC metric. We use a small, fixed global reserve price on this day. On the second day, we apply dynamic reserves modeled on the market reserve pricing policy once used by Yahoo, as described by Ostrovsky and Schwarz [31]. Their approach was to partition the space of keywords, assume that bids are i.i.d. within each partition, fit the distributions, and apply scaled Myerson reserves. Ostrovsky and Schwarz report that scaling factors of 0.4, 0.5, 0.6 were used because they led to comparable revenue uplifts to full reserves without filtering as many bidders. We used these same scaling factors in our simulations, and report on results under 0.5 scaling along with the full optimal reserves. Note that this reserve pricing policy is anonymous, in the sense that bids of different bidders are pooled together by keyword partition. We apply an anonymous policy in the same vein, but also investigate a personalized policy that pools bids and applies reserves by bidder identity. Specifically, each bid in our dataset is associated with a campaign and adgroup (a campaign consists of multiple adgroups); we compute personalized reserves at the campaign level but evaluate the individual IC metrics at the adgroup level. We evaluate the full DIC metric by incorporating the utility impact of the reserves on the second day.

[Figure 2 (plot omitted). Vertical axis: IC Metric; horizontal axis: Number of Positions (1 to 4). Legend entries: VCG, GSP; stage, dynamic.]

Figure 2: Empirical IC metrics for VCG and GSP under personalized reserve prices computed from previous-day bids.

Number of Positions. The reason that GSP is not incentive compatible is that bidders have an incentive to shade their values when

there is more than one position. It is therefore interesting to consider how our IC metrics vary with the number of positions. To restrict the number of slots in our simulations, we set the position normalizers of lower slots to 0. Figure 2 plots the SIC and DIC metrics aggregated over all bidders (adgroups), using the full personalized Myerson reserves on the second day. VCG always has an SIC of 1 in these results as it should, whereas the GSP SIC starts at 1 under a single position (where it's equivalent to the second-price auction), and monotonically decreases to 0.57 with four positions. The added impact of the dynamic personalized reserves on the overall DIC metric is substantial, dropping the SIC metric down by approximately 0.25 at each number of positions; for VCG the effect increases with the number of positions. We note that the confidence intervals are large here in comparison to the display ad setting. Increasing the size of the dataset did not help in this respect. The reason for the wide intervals is that each bidder here (an adgroup) participates in relatively few auctions, compared to display ads where some bidders are large demand-side platforms (representing many advertisers) that participate in most auctions. This is not a drawback of the IC metrics, but simply reflects the fact that bidder incentives in GSP can vary depending on the position a bidder achieves. The SIC confidence intervals for VCG are much narrower as one would expect, because the metric evaluates to 1 in this case (as the bid perturbation shrinks) no matter what a bidder's position.

Anonymous vs. Personalized Reserves. We next focus on the DIC metric and examine the effect of using anonymous and personalized reserve prices. Intuitively, anonymous reserves should be more difficult for a bidder to manipulate, and the DIC metric can be used to quantify this effect. It is also informative to see how scaling down the reserves as in Ostrovsky and Schwarz [31] may affect incentives.

Table 1 summarizes the SIC and DIC metrics for several kinds of position auctions. We include two kinds of GSP auctions: score-GSP is the mechanism examined so far where bidders are ranked by $w_i b_i$, while bid-GSP ranks bidders solely by bid $b_i$, which was a mechanism reportedly used by Yahoo [23]. The table also lists results for a 50-50 mixture of VCG and GFP for comparison. In agreement with Corollary 3.6, this mixture has an SIC of 0.5 (to within two standard errors), and its confidence intervals are narrow because the SIC metric is always 0 pointwise for GFP. According to the results in the table, the effect of anonymous reserves is to lower DIC to around 0.05 below SIC for all mechanisms. Perhaps surprisingly, the decline in DIC is very slight even comparing the half-scaled anonymous reserves to the full anonymous reserves. The effect of scaling personalized reserves is more pronounced, in line with qualitative intuition, with the point estimate for the DIC metric dropping for score-GSP from 0.483 under 0.5 scaling to 0.312 under no scaling.

Auction        SIC              DIC-anon.        DIC-pers.
scaling = 0.5
  score-GSP    0.567 (0.099)    0.536 (0.104)    0.483 (0.108)
  bid-GSP      0.565 (0.102)    0.538 (0.105)    0.471 (0.11)
  Mixture      0.511 (0.006)    0.486 (0.016)    0.464 (0.021)
  VCG          0.994 (0.014)    0.960 (0.031)    0.824 (0.054)
scaling = 1.0
  score-GSP    0.567 (0.099)    0.503 (0.109)    0.312 (0.133)
  bid-GSP      0.565 (0.102)    0.518 (0.109)    0.306 (0.133)
  Mixture      0.511 (0.006)    0.453 (0.026)    0.241 (0.069)
  VCG          0.994 (0.014)    0.924 (0.040)    0.634 (0.097)

Table 1: Empirical IC metrics for various position auctions, with Myerson dynamic reserves scaled by 0.5 and 1.0 on day 2. Standard errors are given in parentheses.

6 CONCLUSIONS

This paper introduced a new metric to quantify the incentive compatibility of mechanisms in both static and dynamic environments, with concrete applications to ad auctions. The metric relies on estimating buyer utilities under small bid perturbations, either through black-box simulations or experiments on small slices of live auction traffic. With this data in hand, computing the metric is a matter of straightforward database queries; there is no need for optimization or access to a strategyproof reference mechanism. One can easily compute standard errors or confidence intervals to evaluate the metric alongside others such as revenue or click-rate.

We showed that our metric takes the form of an index bounded between 0 and 1 under reasonable assumptions on the underlying mechanism. The value of 0 is achieved by a first-price payment rule, and 1 is achieved if the mechanism is incentive-compatible. This may be useful to check whether an implemented mechanism is incentive-compatible as planned. For example, the VCG formula for position auctions relies on a separability assumption for click-rates which may not hold in practice; our metric would allow one to quantify any deviations from intended incentives. For mechanisms like soft floors and dynamic reserve prices it is possible to obtain closed-form characterizations of the incentive compatibility metric, and we have found that it is informative to examine the metric's monotonicity (or non-monotonicity) in various auction parameters to reason about their effect on incentives.

Nonetheless, care is needed to avoid reading too much into the metric: it is meant as a sell-side measure to gain insight into deployed mechanisms. It is not useful as a buy-side measure to inform bidding, for instance, because there is no relationship between the individual-level stage-IC metric and the optimal or equilibrium levels of bid shading.

We validated the theory via simulations over real ad auction data from the Google Ad Exchange and Google search engine. Among our findings, we validated empirically that the dynamic-IC metric achieves its minimum under Myerson dynamic reserve prices for first- and second-price mixture auctions, demonstrating a trade-off between incentive compatibility and revenue performance. We also provided the first quantitative assessment of the incentive properties of GSP over real data.

In future work we plan to develop analytical characterizations of the IC metric for broader classes of mechanisms, and investigate its potential relationship with other measures such as the buyer's regret and her envy towards other buyers.
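The endpoints of the index described above (0 under a first-price payment rule, 1 under an incentive-compatible rule) can be illustrated on a toy position auction. The sketch below is a minimal illustration under assumptions that are not from the paper: synthetic uniform values and competing scores, unit weights in the demo, made-up position normalizers, and the same auctions replayed under both perturbations. The function names are hypothetical; the GSP and VCG payments follow the per-click formulas of Section 5.2 multiplied by the expected clicks $w_i \lambda_j$.

```python
import random


def outcome(bid, weight, competing_scores, normalizers, rule):
    """Expected clicks and expected payment for one bidder in one position auction."""
    score = weight * bid
    below = sorted((s for s in competing_scores if s < score), reverse=True)
    slot = len(competing_scores) - len(below)   # 0-based slot: competitors at or above us
    if slot >= len(normalizers):
        return 0.0, 0.0                          # ranked too low to be shown
    lam = list(normalizers) + [0.0]              # zero normalizer for "not shown"
    clicks = weight * lam[slot]                  # separability: click-rate = w_i * lambda_j
    if rule == "GFP":
        payment = clicks * bid                   # pay own bid per click
    elif rule == "GSP":
        next_score = below[0] if below else 0.0
        payment = clicks * next_score / weight   # per-click price w_{i+1} b_{i+1} / w_i
    elif rule == "VCG":
        # Total payment: sum over lower positions j of (lambda_{j-1} - lambda_j) * w_j b_j.
        payment = sum((lam[j - 1] - lam[j]) * s
                      for j, s in zip(range(slot + 1, len(lam)), below))
    else:
        raise ValueError(f"unknown rule: {rule}")
    return clicks, payment


def ic_metric(rule, auctions, normalizers, alpha=0.01):
    """Stage-IC estimate: utility difference over 2 * alpha * welfare."""
    num = den = 0.0
    for value, weight, competing_scores in auctions:
        def utility(bid):
            # The perturbed bid is treated as the value when computing utility.
            clicks, payment = outcome(bid, weight, competing_scores, normalizers, rule)
            return bid * clicks - payment
        num += utility((1 + alpha) * value) - utility((1 - alpha) * value)
        clicks, _ = outcome(value, weight, competing_scores, normalizers, rule)
        den += 2 * alpha * value * clicks
    return num / den


if __name__ == "__main__":
    rng = random.Random(0)
    auctions = [(rng.random(), 1.0, [rng.random() for _ in range(4)])
                for _ in range(50_000)]
    normalizers = (1.0, 0.7, 0.5, 0.3)           # assumed values, for illustration only
    for rule in ("GFP", "GSP", "VCG"):
        print(rule, round(ic_metric(rule, auctions, normalizers), 3))
```

In this toy setting GFP yields exactly 0, VCG yields a value close to 1, and GSP falls strictly below VCG when there are several slots; with a single slot GSP reduces to a second-price auction and its value is again close to 1.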

REFERENCES

[1] Gagan Aggarwal, Jon Feldman, Shanmugavelayutham Muthukrishnan, and Martin Pál. 2008. Sponsored search auctions with markovian users. In International Workshop on Internet and Network Economics. Springer, 621–628.
[2] Gagan Aggarwal, Ashish Goel, and Rajeev Motwani. 2006. Truthful auctions for pricing search keywords. In Proceedings of the 7th ACM conference on Electronic commerce. ACM, 1–7.
[3] Susan Athey and Glenn Ellison. 2011. Position auctions with consumer search. The Quarterly Journal of Economics 126, 3 (2011), 1213–1270.
[4] Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. 2019. Estimating Approximate Incentive Compatibility. In Proceedings of the 2019 ACM Conference on Economics and Computation (Phoenix, AZ, USA) (EC '19). ACM, New York, NY, USA, 867–867. https://doi.org/10.1145/3328526.3329628
[5] Santiago Balseiro, Anthony Kim, Mohammad Mahdian, and Vahab Mirrokni. 2017. Budget management strategies in repeated auctions. In Proceedings of the 26th International Conference on World Wide Web. 15–23.
[6] Gabriel Carroll. 2011. A quantitative approach to incentives: Application to voting rules. Unpublished Manuscript. Massachusetts Institute of Technology (2011).
[7] Olivier Chapelle and Ya Zhang. 2009. A dynamic bayesian network click model for web search ranking. In Proceedings of the 18th international conference on World wide web. ACM, 1–10.
[8] Yuyu Chen. 2017. Programmatic advertising is preparing for the first-price auction era. https://digiday.com/marketing/programmatic-advertising-readying-first-price-auction-era/. Accessed: 2019-10-04.
[9] Riccardo Colini-Baldeschi, Stefano Leonardi, Okke Schrijvers, and Eric Sodomka. 2019. Envy, Regret, and Social Welfare Loss. arXiv preprint arXiv:1907.07721 (2019).
[10] Robert Day and Paul Milgrom. 2008. Core-selecting package auctions. International Journal of Game Theory 36, 3-4 (2008), 393–407.
[11] Yuan Deng and Sébastien Lahaie. 2019. Testing Dynamic Incentive Compatibility in Display Ad Auctions. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (Anchorage, AK, USA) (KDD '19). 1616–1624.
[12] Paul Duetting, Zhe Feng, Harikrishna Narasimhan, David Parkes, and Sai Srivatsa Ravindranath. 2019. Optimal Auctions through Deep Learning. In International Conference on Machine Learning. 1706–1715.
[13] Paul Duetting, Felix Fischer, Pichayut Jirapinyo, John K Lai, Benjamin Lubin, and David C Parkes. 2015. Payment rules through discriminant-based classifiers. ACM Transactions on Economics and Computation (TEAC) 3, 1 (2015), 5.
[14] Benjamin Edelman and Michael Ostrovsky. 2007. Strategic bidder behavior in sponsored search auctions. Decision Support Systems 43, 1 (2007), 192–198.
[15] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. 2007. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. American Economic Review 97, 1 (2007), 242–259.
[16] Bradley Efron and Trevor Hastie. 2016. Computer age statistical inference. Vol. 5. Cambridge University Press.
[17] Bradley Efron and Charles Stein. 1981. The jackknife estimate of variance. The Annals of Statistics (1981), 586–596.
[18] Daniel C Fain and Jan O Pedersen. 2006. Sponsored search: A brief history. Bulletin of the american Society for Information Science and technology 32, 2 (2006), 12–13.
[19] Jon Feldman, Shanmugavelayutham Muthukrishnan, Martin Pal, and Cliff Stein. 2007. Budget optimization in search-based advertising auctions. In Proceedings of the 8th ACM conference on Electronic commerce. 40–49.
[20] Zhe Feng, Okke Schrijvers, and Eric Sodomka. 2019. Online Learning for Measuring Incentive Compatibility in Ad Auctions. In The World Wide Web Conference. ACM, 2729–2735.
[21] Google. [n.d.]. Transition schedule to first-price auction. https://support.google.com/admanager/answer/9298211?hl=en. Accessed: 2019-10-04.
[22] Yash Kanoria and Hamid Nazerzadeh. 2017. Dynamic reserve prices for repeated auctions: Learning from bids. (2017).
[23] Sébastien Lahaie. 2006. An analysis of alternative slot auction designs for sponsored search. In Proceedings of the 7th ACM Conference on Electronic Commerce. ACM, 218–227.
[24] Sébastien Lahaie, Andrés Munoz Medina, Balasubramanian Sivan, and Sergei Vassilvitskii. 2018. Testing Incentive Compatibility in Display Ad Auctions. In Proceedings of the 2018 World Wide Web Conference (WWW). 1419–1428.
[25] Sébastien Lahaie, David M Pennock, Amin Saberi, and Rakesh V Vohra. 2007. Sponsored search auctions. Algorithmic Game Theory (2007), 699–716.
[26] Renato Paes Leme and Eva Tardos. 2010. Pure and Bayes-Nash price of anarchy for generalized second price auction. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science. IEEE, 735–744.
[27] Benjamin Lubin and David C Parkes. 2009. Quantifying the strategyproofness of mechanisms via metrics on payoff distributions. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence. AUAI Press, 349–358.
[28] Pavel Metrikov, Fernando Diaz, Sebastien Lahaie, and Justin Rao. 2014. Whole page optimization: how page elements interact with the position auction. In Proceedings of the fifteenth ACM conference on Economics and computation. ACM, 583–600.
[29] Vahab Mirrokni, Renato Paes Leme, Pingzhong Tang, and Song Zuo. 2018. Non-clairvoyant dynamic mechanism design. In Proceedings of the 2018 ACM Conference on Economics and Computation (EC). ACM, 169–169.
[30] Roger B Myerson. 1981. Optimal auction design. Mathematics of Operations Research 6, 1 (1981), 58–73.
[31] Michael Ostrovsky and Michael Schwarz. 2011. Reserve prices in internet advertising auctions: a field experiment. In Proceedings of the 12th ACM conference on Electronic commerce. ACM, 59–60.
[32] David C. Parkes, Jayant Kalagnanam, and Marta Eso. 2001. Achieving Budget-balance with Vickrey-based Payment Schemes in Exchanges. In Proceedings of the 17th International Joint Conference on Artificial Intelligence - Volume 2 (Seattle, WA, USA) (IJCAI'01). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1161–1168. http://dl.acm.org/citation.cfm?id=1642194.1642250
[33] Parag A Pathak and Tayfun Sönmez. 2013. School admissions reform in Chicago and England: Comparing mechanisms by their vulnerability to manipulation. American Economic Review 103, 1 (2013), 80–106.
[34] Sarah Sluis. 2019. Google Switches To First-Price Auction. https://adexchanger.com/online-advertising/google-switches-to-first-price-auction/#close-olyticsmodal. Accessed: 2019-10-04.
[35] Peter Troyan and Thayer Morrill. 2020. Obvious manipulations. Journal of Economic Theory 185 (2020), 104970.
[36] Hal R Varian. 2007. Position auctions. International Journal of Industrial Organization 25, 6 (2007), 1163–1178.
[37] Robert Zeithammer. 2019. Soft Floors in Auctions. Management Science 65, 9 (2019), 4204–4221. https://doi.org/10.1287/mnsc.2018.3164