Generalization Guarantees for Multi-item Profit Maximization: Pricing, Auctions, and Randomized Mechanisms
Maria-Florina Balcan Tuomas Sandholm Ellen Vitercik Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University [email protected] Optimized Markets, Inc. [email protected] Strategic Machine, Inc. Strategy Robot, Inc. [email protected]
April 13, 2021
Abstract We study the design of multi-item mechanisms that maximize expected profit with respect to a distribution over buyers’ values. In practice, a full description of the distribution is typically unavailable. Therefore, we study the setting where the designer only has samples from the distribution and the goal is to find a high-profit mechanism within a class of mechanisms. If the class is complex, a mechanism may have high average profit over the samples but low expected profit. This raises the question: how many samples are sufficient to ensure that a mechanism’s average profit is close to its expected profit? To answer this question, we uncover structure shared by many pricing, auction, and lottery mechanisms: for any set of buyers’ values, profit is piecewise linear in the mechanism’s parameters. Using this structure, we prove new bounds for mechanism classes not yet studied in the sample-based mechanism design literature and match or improve over the best known guarantees for many classes. Finally, we provide tools for optimizing an important tradeoff: more complex mechanisms typically have higher average profit over the samples than simpler mechanisms, but more samples are required to ensure that average profit nearly matches expected profit.
1 Introduction
arXiv:1705.00243v5 [cs.LG] 10 Apr 2021

The design of profit-maximizing∗ mechanisms is a fundamental problem with diverse applications including Internet retailing, advertising markets, strategic sourcing, and artwork sales. This problem has traditionally been studied under the assumption that there is a joint distribution from which the buyers’ values are drawn and that the mechanism designer knows this distribution in advance. This assumption has led to groundbreaking theoretical results in the single-item setting [63], but transitioning from theory to practice is challenging because the true distribution over buyers’ values is typically unknown. Moreover, in the dramatically more challenging setting where there are multiple items for sale, the support of the distribution alone is often doubly exponential (even if there were just a single buyer with a finite type space), so obtaining and storing the distribution is typically impossible.

∗In this work, we study the standard setting of profit maximization with risk-neutral agents.

We relax this strong assumption and instead assume that the mechanism designer only has a set of independent samples from the distribution, an approach introduced by Likhodedov and Sandholm [54, 55] and Sandholm and Likhodedov [69]. Specifically, a single sample is a random draw from the distribution over buyers’ values, listing each buyer’s value for each set of items for sale. When the mechanism designer uses a set of samples rather than a description of the distribution to design a mechanism, we refer to this procedure as sample-based mechanism design. Sample-based mechanism design reflects current industry practices since many companies, such as online ad exchanges [47, 58], sponsored search platforms [14, 35, 72], travel companies [82], and resellers of returned items [79], use historical purchase data to adjust the sales mechanism. We provide provable guarantees for sample-based mechanism design.

A mechanism designer must be cautious when performing sample-based mechanism design. In most multi-item settings, the form of the revenue-maximizing mechanism is still a mystery. Therefore, rather than use the samples to uncover the optimal mechanism, much of the literature on sample-based mechanism design suggests that we first fix a reasonably expressive mechanism class and then use the samples to optimize over the class. If, however, the mechanism class is large and complex, a mechanism with high average profit over the set of samples may have low expected profit on the actual unknown distribution if the number of samples is not sufficiently large. This motivates an important question in sample-based mechanism design:

Given a set of samples and a mechanism class M, what is the difference between the average profit over the samples and the expected profit on the unknown distribution for any mechanism in M?

If this difference is small, the mechanism designer can be assured that the mechanism in M that maximizes average profit over the set of samples nearly maximizes expected profit over the distribution as well. Thus, he can be confident in using the set of samples in lieu of a precise description of the distribution.
In this paper, we present a general theory for deriving uniform convergence generalization guarantees in multi-item settings, as well as data-dependent guarantees when the distribution over buyers’ values is well-behaved. A generalization guarantee for a mechanism class M bounds the difference between the average profit over the samples and expected profit on the distribution for any mechanism in M. The bound holds uniformly across all mechanisms in M and is a function of both the number of samples and the intrinsic complexity of M. This paper is part of a line of research that studies how learning theory can be used to design and analyze mechanisms, beginning with seminal research by Balcan et al. [7, 8]. However, the majority of these papers have studied only single-item settings [2, 12, 18, 25, 30, 36, 41, 43, 45, 48, 59, 61, 67]. In contrast, we focus on multi-item mechanism design, as have recent papers by Cai and Daskalakis [20], Medina and Vassilvitskii [58], Morgenstern and Roughgarden [62], Syrgkanis [71] and Gonczarowski and Weinberg [42].
1.1 Our contributions
Our contributions come in three interrelated parts.

1.1.1 A general theory that unifies diverse mechanism classes. We provide a clean, easy-to-use, general theorem for deriving generalization guarantees and we demonstrate its application to a large number of widely-used mechanism classes. This paper thus expands our understanding of uniform convergence in multi-item mechanism design, which had thus far focused on deriving guarantees for a small number of specific mechanism classes that are “simple” by design [62, 71].

We uncover a key structural property shared by a variety of mechanisms which allows us to prove generalization guarantees: for any fixed set of bids, profit is a piecewise linear function of the mechanism’s parameters. Our main theorem provides generalization guarantees for any class exhibiting this structure. To prove this theorem, we relate the complexity of the partition splitting the parameter space into linear portions to the intrinsic complexity of the mechanism class, which we quantify using pseudo-dimension. In turn, pseudo-dimension bounds imply generalization bounds. We prove that many seemingly disparate mechanisms share this structure, and thus our main theorem yields learnability guarantees. Table 1 summarizes some of the main mechanism classes we analyze and Tables 2, 3, and 4 summarize our bounds.

| Category | Mechanism class | Valuations | Result |
|---|---|---|---|
| Pricing mechanisms | Item-pricing mechanisms | General, unit-demand, additive | Lemmas 3.18, 4.5, 9.7, 9.8 |
| Pricing mechanisms | Two-part tariffs | General | Lemma 3.15 |
| Pricing mechanisms | Non-linear pricing mechanisms | General | Lemmas 3.17, 8.7 |
| Auctions | Second-price auctions with reserves | Additive | Lemmas 3.19, 4.4, 9.9 |
| Auctions | Affine maximizer auctions | General | Lemma 3.21 |
| Auctions | Virtual valuation combinatorial auctions | General | Lemma 3.21 |
| Auctions | Mixed-bundling auctions with reserves | General | Lemma 3.20 |
| Randomized mechanisms | Lotteries | Additive, unit-demand | Lemmas 3.23, 4.6 |

Table 1: Brief summary of some of the main mechanism classes we analyze in Sections 3.4 and 4 as well as the Electronic Companion 9.1.

We prove that our main theorem applies to randomized mechanisms, making us the first to provide generalization bounds for these mechanisms. Our guarantees apply to lotteries, a general representation of randomized mechanisms. Randomized mechanisms are known to generate higher expected revenue than deterministic mechanisms in many settings [28, 32]. Our results imply, for example, that if the mechanism designer plans to offer a menu of $\ell$ lotteries over $m$ items to an additive or unit-demand buyer, the difference between any such menu’s average profit over $N$ samples and its expected profit is $\tilde{O}\left(U\sqrt{\ell m/N}\right)$, where $U$ is the maximum profit achievable over the support of the buyer’s valuation distribution.

We also provide guarantees for pricing mechanisms using our main theorem. These include item-pricing mechanisms, also known as posted-price mechanisms, where each item has a price and buyers buy their utility-maximizing bundles. Additionally, we study multi-part tariffs, where there is an upfront fee and a price per unit. We are the first to provide generalization bounds for these tariffs and other non-linear pricing mechanisms, which have been studied in economics for decades [38, 64, 80]. For instance, our main theorem guarantees that if there are $\kappa$ units of a single good for sale, the difference between any two-part tariff’s average profit over $N$ samples and its expected profit is $\tilde{O}\left(U\sqrt{\kappa/N}\right)$.

Our main theorem implies generalization bounds for many auction classes, such as second price auctions. We also study several well-studied generalized VCG auctions, such as affine maximizer auctions, virtual valuations combinatorial auctions, and mixed-bundling auctions [33, 49, 52, 66, 69].

A key challenge which differentiates our generalization guarantees from those typically found in machine learning is the sensitivity of these mechanisms to small changes in their parameters. For example, changing the price of a good can cause a steep drop in profit if the buyer no longer wants to buy it. Meanwhile, for many well-understood function classes in machine learning, there is a close connection between the distance in parameter space between two parameter vectors and the distance in function space between the two corresponding functions. Understanding this connection is often the key to quantifying the class’s intrinsic complexity. Intrinsic complexity can be quantified by pseudo-dimension, for example, which allows us to derive learnability guarantees. Since profit functions do not exhibit this predictable behavior, we must carefully analyze the structure of the mechanisms we study in order to derive our generalization guarantees.

| Valuations | Auction class | Our bounds | Prior bounds |
|---|---|---|---|
| Additive or unit-demand | Length-$\ell$ lottery menu | $U\sqrt{\ell m \log(\ell m)/N}$ | N/A |
| Additive, item-independent∗ | Length-$\ell$ item lottery menu | $U\sqrt{\ell \log \ell/N}$ | N/A |

∗ Additive cost function.

Table 2: Generalization bounds in big-$\tilde{O}$ notation for lotteries. We denote the maximum profit achievable by any mechanism in the class over the support of the bidders’ valuation distribution by $U$. There are $m$ items, $N$ samples, and the cost function is general unless otherwise noted.
Data-dependent generalization guarantees for profit maximization. We strengthen our main theorem when the distribution over buyers’ values is “well-behaved,” proving generalization guarantees that are independent of the number of items for item-pricing mechanisms, second price auctions with reserves, and a subset of lottery mechanisms. Under anonymous prices, our bounds do not depend on the number of bidders either. These guarantees hold when the bidders are additive with values drawn from item-independent distributions (bidder $i_1$’s value for item $j$ is independent from her value for item $j'$, but her value for item $j$ may be arbitrarily correlated with bidder $i_2$’s value for item $j$).

Bidders with item-independent value distributions have been studied extensively in prior research [5, 20, 22, 23, 39, 44, 81]. Cai and Daskalakis [20] provide learning algorithms for bidders with valuations drawn from product distributions, which are item-independent. As we describe in Section 6, we improve a generalization bound by Cai and Daskalakis [20] and Morgenstern and Roughgarden [62] for learning a mechanism with expected revenue that is nearly a constant fraction of optimal when the buyers have additive values drawn from a product distribution. The improvement is by a multiplicative factor of $O(\sqrt{m})$, where $m$ is the number of items.
Structural profit maximization. Many of the mechanism classes we study exhibit a hierarchical structure. For example, when designing a pricing mechanism, the designer can segment the population into k groups and charge each group a different price. This is prevalent throughout daily life: movie theaters and amusement parks have different admission prices per market segment, with groups such as Child, Student, Adult, and Senior Citizen. In the simplest case, k = 1 and the prices are anonymous. If k equals the number of buyers, the prices are non-anonymous, thus forming a hierarchy of mechanisms. In general, the designer should not choose the simplest class to optimize over simply to guarantee good generalization because more complex classes are more likely to contain nearly optimal mechanisms. We show how the mechanism designer can determine the precise level in the hierarchy assuring him the optimal tradeoff between profit maximization and generalization.
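One generic way to act on this tradeoff, in the spirit of structural risk minimization from learning theory, is to pick the hierarchy level whose best average profit minus a complexity penalty is largest. The sketch below is only an illustration under assumptions and is not the procedure developed in this paper: the function `select_level`, the penalty shape $U\sqrt{d_k/N}$ (constants and log factors dropped), and all numbers in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple


def select_level(
    best_avg_profit: Sequence[float],  # best average profit found within level k
    complexity: Sequence[float],       # assumed complexity term d_k of level k
    num_samples: int,                  # N, the number of samples
    max_profit: float,                 # U, the maximum achievable profit
) -> Tuple[int, float]:
    """Return (index, score) of the level maximizing profit minus penalty.

    The penalty U * sqrt(d_k / N) is an assumed stand-in for a
    generalization bound; it is not a bound proved in the text.
    """
    scores = [
        avg - max_profit * math.sqrt(d / num_samples)
        for avg, d in zip(best_avg_profit, complexity)
    ]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

With hypothetical average profits `[5.0, 6.0, 6.2]`, complexities `[1, 4, 16]`, and `U = 10`, the rule picks the middle level when `N = 400` and the richest level when `N = 40000`: more samples make it safe to optimize over a more complex class.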
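| Valuations | Mechanism class | Price class | Our bounds | Prior bounds |
|---|---|---|---|---|
| General | Length-$\ell$ menus of two-part tariffs over $\kappa$ units | Anonymous | $U\sqrt{\ell \log(\kappa n \ell)/N}$ | N/A |
| General | Length-$\ell$ menus of two-part tariffs over $\kappa$ units | Non-anonymous | $U\sqrt{n\ell \log(\kappa n \ell)/N}$ | N/A |
| General | Non-linear pricing | Anonymous | $U\sqrt{m \prod_{i=1}^m (\kappa_i + 1)/N}$ ‡ | N/A |
| General | Non-linear pricing | Non-anonymous | $U\sqrt{nm \prod_{i=1}^m (\kappa_i + 1)/N}$ ‡ | N/A |
| General | Additively decomposable non-linear pricing | Anonymous | $U\sqrt{m \sum_{i=1}^m \kappa_i/N}$ ‡ | N/A |
| General | Additively decomposable non-linear pricing | Non-anonymous | $U\sqrt{nm \sum_{i=1}^m \kappa_i/N}$ ‡ | N/A |
| General | Item-pricing | Anonymous | $U\sqrt{m^2/N}$ | $U\sqrt{m^2/N}$ § |
| General | Item-pricing | Non-anonymous | $U\sqrt{nm(m + \log n)/N}$ | $U\sqrt{nm^2 \log n/N}$ § |
| Unit-demand | Item-pricing | Anonymous | $U\sqrt{m \cdot \min\{m, \log(nm)\}/N}$ | $U\sqrt{m^2/N}$ § |
| Unit-demand | Item-pricing | Non-anonymous | $U\sqrt{nm \log(nm)/N}$ | $U\sqrt{nm^2 \log n/N}$ § |
| Additive | Item-pricing | Anonymous | $U\sqrt{m \log m/N}$ | $U\sqrt{m \log m/N}$ §, $(U/\delta)\sqrt{m \log(nN)/N}$ † |
| Additive | Item-pricing | Non-anonymous | $U\sqrt{nm \log(nm)/N}$ | $U\sqrt{nm \log(nm)/N}$ §, $(U/\delta)\sqrt{nm \log(N)/N}$ † |
| Additive, item-independent∗ | Item-pricing | Anonymous | $U\sqrt{1/N}$ | $U\sqrt{m \log m/N}$ §, $(U/\delta)\sqrt{m \log(nN)/N}$ † |
| Additive, item-independent∗ | Item-pricing | Non-anonymous | $U\sqrt{n \log n/N}$ | $U\sqrt{nm \log(nm)/N}$ §, $(U/\delta)\sqrt{nm \log(N)/N}$ † |

∗ Additive cost function; ‡ $\kappa_i$ is an upper bound on the number of units available of item $i$; § Morgenstern and Roughgarden [62]; † Syrgkanis [71]. The probability these bounds fail to hold is $\delta$. In all other bounds, $\delta$ appears in a log so we suppress it using big-$\tilde{O}$ notation.

Table 3: Generalization bounds in big-$\tilde{O}$ notation for pricing mechanisms. We denote the maximum profit achievable by any mechanism in the class over the support of the bidders’ valuation distribution by $U$. There are $m$ items, $n$ buyers, and $N$ samples. The cost function is general unless otherwise noted.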
1.2 Related research
Sample-based mechanism design was introduced in the context of automated mechanism design (AMD). In AMD, the goal is to design algorithms that take as input information about a set of buyers and return a mechanism that maximizes an objective such as revenue [27, 29, 68]. The input information about the buyers in early AMD was an explicit description of the distribution over their valuations. The support of the distribution’s prior is often doubly exponential, for example in combinatorial auctions, so obtaining and storing the distribution is impractical. In response, sample-based mechanism design was introduced where the input is a set of samples from this distribution [54, 55, 69]. Those papers also introduced the idea of searching for a high-revenue mechanism in a parameterized space where any parameter vector yields a mechanism that satisfies the individual rationality and incentive-compatibility constraints. This was in contrast to the traditional approach of representing mechanism design as an unrestricted optimization problem where those constraints need to be explicitly modeled. The parameterized work studied algorithms for designing combinatorial auctions with high empirical revenue. We follow the parameterized approach, but we study generalization guarantees, which they did not address.
| Valuations | Auction class | Our bounds | Prior bounds |
|---|---|---|---|
| General | AMAs and $\lambda$-auctions | $U\sqrt{n^{m+1} m \log n/N}$ | $cU\sqrt{m/N}\, n^{m+2}\left(n^2 + \sqrt{n^m}\right)$ ¶‖ |
| General | VVCAs | $U\sqrt{n^2 m 2^m \log n/N}$ | $cU\sqrt{m/N}\, n^{m+2}\left(n^2 + \sqrt{n^m}\right)$ ¶‖ |
| General | MBARPs | $U\sqrt{m(\log n + m)/N}$ | $U\sqrt{m^3 \log n/N}$ ¶ |
| Additive | Second price item auctions with anonymous reserve prices | $U\sqrt{m \log m/N}$ | $U\sqrt{m \log m/N}$ § |
| Additive | Second price item auctions with non-anonymous reserve prices | $U\sqrt{nm \log(nm)/N}$ | $U\sqrt{nm \log(nm)/N}$ § |
| Additive, item-independent∗ | Second price item auctions with anonymous reserve prices | $U\sqrt{1/N}$ | $U\sqrt{m \log m/N}$ § |
| Additive, item-independent∗ | Second price item auctions with non-anonymous reserve prices | $U\sqrt{n \log n/N}$ | $U\sqrt{nm \log(nm)/N}$ § |

∗ Additive cost function; ‖ The value of $c > 1$ depends on the range of the auction parameters; § Morgenstern and Roughgarden [62]; ¶ Balcan et al. [10].

Table 4: Generalization bounds in big-$\tilde{O}$ notation for auctions. We denote the maximum profit achievable by any mechanism in the class over the support of the bidders’ valuation distribution by $U$. There are $m$ items, $n$ buyers, and $N$ samples. The cost function is general unless otherwise noted.
Balcan et al. [7, 8] were the first to study the connection between learning theory and revenue maximization, employing classic learning-theoretic tools such as covering numbers. They showed how to use an algorithm A that returns a high-revenue, manipulable mechanism in order to find a high-revenue, incentive-compatible mechanism. Their approach randomly splits the bidders into two groups. With the first group’s bids as input, they use the algorithm A to find a high-revenue mechanism and they field that mechanism on the second group (and vice versa). While the mechanism is not incentive compatible for the first group, it is incentive compatible for the second. They prove that the mechanism’s revenue over the first group’s bids nearly matches the revenue obtained from the second group, so long as the number of bidders is sufficiently large compared to the covering number of the mechanism class being optimized over. Balcan et al. [7, 8] study settings with unrestricted supply, whereas we primarily focus on settings with limited supply.

More recent research has provided generalization guarantees when there is limited supply, with a particular focus on single-parameter (e.g., single-item) settings [2, 18, 24, 25, 30, 36, 41, 43, 45, 48, 59, 61, 67]. From an algorithmic perspective, Devanur et al. [30], Gonczarowski and Nisan [41], Guo et al. [43], and Hartline and Taggart [45] provide computationally efficient algorithms for learning nearly-optimal single-item auctions in various settings. In contrast, we study multi-parameter settings.

Balcan et al. [9] drew on classic tools from learning theory to provide algorithms and generalization guarantees for a related problem in algorithmic game theory: learning agents’ preferences. Specifically, they made connections to the concept of generalized linear functions from the structured prediction literature [e.g., 26, which we discuss in Appendix 9].
Their algorithms make use of past data describing the purchases of a utility-maximizing agent in order to predict the agent’s future purchases.

Morgenstern and Roughgarden [62] later used the same concept of generalized linear functions from the structured prediction literature to provide sample complexity guarantees for multi-item revenue maximization. They proposed a technique for bounding a mechanism class’s pseudo-dimension that requires two steps; we describe these two steps at a high level here and refer the reader to Appendix 9 for more details. First, one must show that for any mechanism in the class, its allocation function is a d-dimensional linear function, for some d ∈ Z (see Appendix 9 for the formal definition). Next, fixing an arbitrary set of samples and an allocation per sample, one must bound the pseudo-dimension of the set of revenue functions across all mechanisms that induce those allocations. They show how these two steps imply a bound on the mechanism class’s pseudo-dimension.

The guarantees presented in this paper offer several advantages over Morgenstern and Roughgarden’s approach. First, our main theorem depends on a structural property (the piecewise-linear form of the revenue function) that is not defined in terms of any learning theory concept (such as generalized linear functions or pseudo-dimension) and thus can be more readily applicable. Moreover, in several cases, Morgenstern and Roughgarden [62] proved loose guarantees using structured prediction; in the appendix, they used a first-principles approach to prove stronger guarantees. Their structured prediction proof technique requires them to bound the total number of allocations a mechanism class can induce on a set of samples. Their bound is a bit loose, and we are able to tighten it using the techniques we develop in this paper, as we detail in Appendix 9. By combining our analysis techniques with tools from structured prediction, we are able to match the tighter bounds that Morgenstern and Roughgarden [62] proved via a first-principles approach.
Finally, we apply our guarantees to a wide variety of mechanism classes, both simple and complex (as summarized in Table 1), whereas Morgenstern and Roughgarden [62] applied their guarantees to three mechanism classes that are “simple” by design: item-pricing mechanisms, grand-bundle-pricing mechanisms (where an agent can buy either the grand bundle consisting of all items or nothing at all), and second-price item auctions.

Syrgkanis [71] also suggests a general technique for providing generalization guarantees which he applies to several “simple” mechanism classes: the same three as Morgenstern and Roughgarden [62] as well as single-item t-level auctions [61]. His generalization guarantees apply only to empirical revenue maximization algorithms, which return the mechanism (from within a particular mechanism class) that maximizes average revenue over the samples. This is in contrast to our bounds (as well as those by Morgenstern and Roughgarden [62]), which apply uniformly to every mechanism in a given class. This is crucial when it may not be computationally feasible to determine a mechanism with highest average revenue over the samples, but only an approximation. Given a set of samples S of size N, Syrgkanis’s bound depends on a quantity he calls the split-sample growth rate, which counts how many different mechanisms the empirical revenue maximization algorithm can return on any sub-sample of S of size N/2. Another advantage of our generalization bounds (beyond applying uniformly to every mechanism in a given class) is that they grow only logarithmically with a term $1/\delta$, where $\delta$ is the probability that the bound fails to hold (as do those by Morgenstern and Roughgarden [62]). In contrast, the bounds by Syrgkanis [71] grow linearly in $1/\delta$.
In Section 6, we provide more details on how our results compare to those by Morgenstern and Roughgarden [62] and Syrgkanis [71], as well as a detailed comparison of our results to other papers on the sample complexity of multi-item revenue maximization [10, 20, 42, 58]. A problem that is similar but distinct from ours is dynamic pricing, where prices are adjusted over a finite time horizon and the consumer demand function is unknown [e.g., 3, 15, 17, 40, 50, all of whom study the single-item setting]. The goal is typically to minimize regret, which is the difference between the cumulative profit of the best prices in hindsight and that of the algorithm’s selected prices.
2 Preliminaries and notation
We consider the problem of selling $m$ heterogeneous items to $n$ buyers. We denote a bundle of items as a quantity vector $\boldsymbol{q} \in \mathbb{Z}^m_{\geq 0}$. The number of units of item $i$ in the bundle represented by $\boldsymbol{q}$ is denoted by its $i$th component $q[i]$. Accordingly, the bundle consisting of only one copy of the $i$th item is denoted by the standard basis vector $\boldsymbol{e}_i$, where $e_i[i] = 1$ and $e_i[j] = 0$ for all $j \neq i$. Each buyer $j \in [n]$ has a valuation function $v_j$ over bundles of items. If one bundle $\boldsymbol{q}_0$ is contained within another bundle $\boldsymbol{q}_1$ (i.e., $q_0[i] \leq q_1[i]$ for all $i \in [m]$), then $v_j(\boldsymbol{q}_0) \leq v_j(\boldsymbol{q}_1)$ and $v_j(\boldsymbol{0}) = 0$. We denote an allocation as $Q = (\boldsymbol{q}_1, \dots, \boldsymbol{q}_n)$ where $\boldsymbol{q}_j$ is the bundle of items that buyer $j$ receives under allocation $Q$. The cost to produce the bundle $\boldsymbol{q}$ is denoted as $c(\boldsymbol{q})$ and the cost to produce the allocation $Q$ is denoted as $c(Q)$. Suppose there are $\kappa_i$ units available of item $i$. Let $K = \prod_{i=1}^m \kappa_i$. We use $\boldsymbol{v}_j = (v_j(\boldsymbol{q}_1), \dots, v_j(\boldsymbol{q}_K))$ to denote buyer $j$’s values for all of the $K$ bundles and we use $\boldsymbol{v} = (\boldsymbol{v}_1, \dots, \boldsymbol{v}_n)$ to denote a vector of buyer values. We use the notation $\mathcal{X}$ to denote the set of all valuation vectors $\boldsymbol{v}$. We also study additive buyers ($v_j(\boldsymbol{q}) = \sum_{i=1}^m q[i]\, v_j(\boldsymbol{e}_i)$) and unit-demand buyers ($v_j(\boldsymbol{q}) = \max_{i : q[i] \geq 1} v_j(\boldsymbol{e}_i)$). Every auction in the classes we study is dominant strategy incentive compatible, so we assume that the bids equal the bidders’ valuations.

There is an unknown distribution $\mathcal{D}$ over buyers’ values, which means the support of $\mathcal{D}$ is contained in the set $\mathcal{X}$. We make very few assumptions about this distribution. First of all, we do not assume the distribution belongs to a parametric family. Moreover, we do not assume the buyers’ values are independently or identically distributed. In particular, a bidder’s values for multiple bundles may be correlated, and multiple bidders may have correlated values as well.

We say that $\text{profit}_M(\boldsymbol{v})$ is the profit of a mechanism $M$ on the valuation vector $\boldsymbol{v}$. For a distribution $\mathcal{D}$ over buyers’ values, we denote the expected profit of $M$ over $\mathcal{D}$ as $\text{profit}_{\mathcal{D}}(M) = \mathbb{E}_{\boldsymbol{v} \sim \mathcal{D}}[\text{profit}_M(\boldsymbol{v})]$ and for a set of samples $\mathcal{S}$, we denote the average profit of $M$ over $\mathcal{S}$ as $\text{profit}_{\mathcal{S}}(M) = \frac{1}{|\mathcal{S}|} \sum_{\boldsymbol{v} \in \mathcal{S}} \text{profit}_M(\boldsymbol{v})$.

We study real-valued functions parameterized by vectors $\boldsymbol{p}$ in $\mathbb{R}^d$, denoted as $f_{\boldsymbol{p}} : \mathcal{X} \to \mathbb{R}$. For a fixed $\boldsymbol{v} \in \mathcal{X}$, we often consider $f_{\boldsymbol{p}}(\boldsymbol{v})$ as a function of its parameters, which we denote as $f_{\boldsymbol{v}}(\boldsymbol{p})$.
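As a concrete reading of this notation, the following sketch represents a bundle as a sequence of per-item quantities and computes additive values, unit-demand values, and the average profit of a mechanism over a set of samples. It is an illustration rather than code from the paper: the function names are hypothetical, and the toy posted-price mechanism (one item, one buyer, zero production cost, the buyer purchases when her value is at least the price) is an assumption made only for the example.

```python
from typing import Callable, Sequence, TypeVar

V = TypeVar("V")            # type of one sample (a valuation vector)
Bundle = Sequence[int]      # q[i] = number of units of item i in the bundle


def additive_value(item_values: Sequence[float], q: Bundle) -> float:
    """Additive buyer: v(q) = sum_i q[i] * v(e_i)."""
    return sum(qi * vi for qi, vi in zip(q, item_values))


def unit_demand_value(item_values: Sequence[float], q: Bundle) -> float:
    """Unit-demand buyer: v(q) = max of v(e_i) over items with q[i] >= 1.

    The empty bundle has value 0.
    """
    held = [vi for qi, vi in zip(q, item_values) if qi >= 1]
    return max(held, default=0.0)


def average_profit(profit: Callable[[V], float], samples: Sequence[V]) -> float:
    """profit_S(M) = (1 / |S|) * sum over v in S of profit_M(v)."""
    return sum(profit(v) for v in samples) / len(samples)


def posted_price_profit(price: float) -> Callable[[float], float]:
    """Hypothetical toy mechanism: one item, one buyer, zero cost."""
    return lambda value: price if value >= price else 0.0
```

For instance, with item values `[3.0, 5.0]` and bundle `[2, 1]`, the additive value is 11.0 and the unit-demand value is 5.0. Posting a price of 4.0 against sampled values `[3.0, 5.0, 6.0, 2.0]` gives an average profit of 2.0.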
3 Generalization guarantees
In this section, we provide generalization guarantees for sample-based mechanism design. A generalization guarantee for a given mechanism class informs the mechanism designer how well the expected profit of any mechanism in the class matches its average profit over the samples. Therefore, if a mechanism class comes with strong generalization guarantees, the designer can be confident that the set of samples can serve as a surrogate for knowledge of the distribution. In particular, if he finds a mechanism in the class with high revenue over the set of samples, then that mechanism will almost certainly have high revenue in expectation over the distribution over buyers’ values. Generalization guarantees have been a central topic in computational learning theory dating back to seminal work by Vapnik and Chervonenkis [78].

Definition 3.1 (Generalization guarantee). A generalization guarantee for a mechanism class $\mathcal{M}$ is a function $\epsilon_{\mathcal{M}} : \mathbb{Z}_{\geq 1} \times (0, 1) \to \mathbb{R}_{\geq 0}$ defined such that for any $\delta \in (0, 1)$, any sample size $N \in \mathbb{Z}_{\geq 1}$, and any distribution $\mathcal{D}$ over buyers’ valuations, with probability at least $1 - \delta$ over the draw of a set $\mathcal{S} \sim \mathcal{D}^N$, for any mechanism $M$ in $\mathcal{M}$, the difference between the average profit of $M$ over $\mathcal{S}$ and the expected profit of $M$ over $\mathcal{D}$ is at most $\epsilon_{\mathcal{M}}(N, \delta)$. In other words,

$$\Pr_{\mathcal{S} \sim \mathcal{D}^N}\left[\exists M \in \mathcal{M} \text{ such that } \left|\frac{1}{N} \sum_{\boldsymbol{v} \in \mathcal{S}} \text{profit}_M(\boldsymbol{v}) - \mathbb{E}_{\boldsymbol{v} \sim \mathcal{D}}\left[\text{profit}_M(\boldsymbol{v})\right]\right| > \epsilon_{\mathcal{M}}(N, \delta)\right] < \delta.$$

If $\epsilon_{\mathcal{M}}(N, \delta)$ tends to 0 as $N$ tends to infinity, then we say that $\mathcal{M}$ is uniformly learnable.

The $1 - \delta$ high probability caveat must be present in any generalization guarantee because it is always possible that the set of samples, no matter how large, is completely unrepresentative of the distribution over buyers’ values. Therefore, we can only guarantee that the difference between the average and expected profit of a mechanism is small with probability $1 - \delta$ over the draw of the set of samples. As is intuitively clear, $\epsilon_{\mathcal{M}}(N, \delta)$ ought to grow as $\delta$ shrinks since we need to ensure that the difference between the average and expected profit of each mechanism in $\mathcal{M}$ is at most $\epsilon_{\mathcal{M}}(N, \delta)$ with probability at least $1 - \delta$. Moreover, the larger the mechanism designer’s set of samples is, the smaller the difference between average profit over the samples and expected profit will be. Since this difference is bounded by $\epsilon_{\mathcal{M}}(N, \delta)$, we can conclude that $\epsilon_{\mathcal{M}}(N, \delta)$ ought to shrink as $N$ grows. The final element that any generalization bound $\epsilon_{\mathcal{M}}(N, \delta)$ depends on is the specific mechanism class $\mathcal{M}$. Computational learning theory informs us that the more “complex” the mechanism class $\mathcal{M}$ is, the more difficult it is to bound the difference between the average and expected profit of every mechanism in $\mathcal{M}$. Mathematically, this means that richer mechanism classes $\mathcal{M}$ have larger generalization bounds $\epsilon_{\mathcal{M}}(N, \delta)$, so $\epsilon_{\mathcal{M}}(N, \delta)$ grows with the complexity of $\mathcal{M}$.

Generalization guarantees are useful beyond their ability to bound the difference between the average profit of a mechanism and its expected profit. For an arbitrary class $\mathcal{M}$, generalization guarantees allow the mechanism designer to relate the expected profit of a mechanism in $\mathcal{M}$ which achieves maximum average profit over the set of samples to the expected profit of an optimal mechanism in $\mathcal{M}$. We summarize this connection in the following remark, which is well-known for learning over general function classes in machine learning (see, for example, the textbooks by Shalev-Shwartz and Ben-David [70] and Mohri et al. [60]).

Remark 3.2. For a set of samples $\mathcal{S}$ from the distribution over buyers’ values, let $\hat{M}$ be the mechanism in $\mathcal{M}$ that maximizes average profit over the set of samples and let $M^*$ be the mechanism in $\mathcal{M}$ that maximizes expected profit over $\mathcal{D}$. Finally, let $U$ be the maximum profit achievable by any mechanism in $\mathcal{M}$ over the support of the distribution $\mathcal{D}$. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$ over the draw of a set of $N$ samples $\mathcal{S}$ from $\mathcal{D}$, the difference between the expected profit of $\hat{M}$ over $\mathcal{D}$ and the expected profit of $M^*$ over $\mathcal{D}$ is at most $2\epsilon_{\mathcal{M}}(N, \delta)$. In other words, if $\hat{M} = \text{argmax}_{M \in \mathcal{M}} \left\{\sum_{\boldsymbol{v} \in \mathcal{S}} \text{profit}_M(\boldsymbol{v})\right\}$, $M^* = \text{argmax}_{M \in \mathcal{M}} \left\{\mathbb{E}_{\boldsymbol{v} \sim \mathcal{D}}[\text{profit}_M(\boldsymbol{v})]\right\}$, and $U = \max_{M \in \mathcal{M}, \boldsymbol{v} \in \text{supp}(\mathcal{D})} \left\{\text{profit}_M(\boldsymbol{v})\right\}$, then

$$\Pr_{\mathcal{S} \sim \mathcal{D}^N}\left[\mathbb{E}_{\boldsymbol{v} \sim \mathcal{D}}\left[\text{profit}_{M^*}(\boldsymbol{v}) - \text{profit}_{\hat{M}}(\boldsymbol{v})\right] > 2\epsilon_{\mathcal{M}}(N, \delta)\right] < \delta.$$

Therefore, so long as there is a strong generalization bound $\epsilon_{\mathcal{M}}(N, \delta)$ associated with the class $\mathcal{M}$, the mechanism designer can be confident that an optimal mechanism over the set of samples competes with an optimal mechanism in $\mathcal{M}$. Similar bounds also hold for mechanisms with approximately optimal average profit over the samples (see Corollaries 8.1 and 8.2 in Appendix 8).
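The factor of 2 in Remark 3.2 comes from a standard three-term decomposition, sketched here for completeness. Write $\epsilon$ for the generalization bound at $(N, \delta)$, and $\text{profit}_{\mathcal{S}}$ and $\text{profit}_{\mathcal{D}}$ for average and expected profit as in Section 2. On the event, of probability at least $1 - \delta$, that every mechanism’s average and expected profit differ by at most $\epsilon$:

```latex
\begin{align*}
\text{profit}_{\mathcal{D}}(M^*) - \text{profit}_{\mathcal{D}}(\hat{M})
  &= \underbrace{\text{profit}_{\mathcal{D}}(M^*) - \text{profit}_{\mathcal{S}}(M^*)}_{\le\, \epsilon}
   + \underbrace{\text{profit}_{\mathcal{S}}(M^*) - \text{profit}_{\mathcal{S}}(\hat{M})}_{\le\, 0}
   + \underbrace{\text{profit}_{\mathcal{S}}(\hat{M}) - \text{profit}_{\mathcal{D}}(\hat{M})}_{\le\, \epsilon} \\
  &\le 2\epsilon .
\end{align*}
```

The middle term is non-positive because $\hat{M}$ maximizes average profit over the samples, and the two outer terms are each controlled by the uniform guarantee applied to $M^*$ and to $\hat{M}$.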
3.1 General structure for sample-based mechanism design
Our general theorem uses structure shared by a myriad of mechanism classes to characterize the function $\epsilon_{\mathcal{M}}(N, \delta)$. Our results apply broadly to parameterized sets $\mathcal{M}$ of mechanisms. This means that every mechanism in $\mathcal{M}$ is defined by a vector $\boldsymbol{p} \in \mathbb{R}^d$, where the value of $d$ depends on the mechanism class. For example, $\boldsymbol{p}$ might be a vector of prices. Our guarantees apply to mechanism classes where for every valuation vector $\boldsymbol{v} \in \mathcal{X}$, the profit as a function of the parameters $\boldsymbol{p}$, denoted $\text{profit}_{\boldsymbol{v}}(\boldsymbol{p})$, is piecewise linear. We begin by illustrating this property via several simple examples.
Figure 1: This figure illustrates the partition of the two-part tariff parameter space into piecewise-linear portions in the following scenario: there is one buyer whose value for one unit is $v_1(1) = 6$, value for two units is $v_1(2) = 9$, value for three units is $v_1(3) = 11$, and value for $i \geq 4$ units is $v_1(i) = 12$. We assume the seller produces at most four units. The buyer will buy exactly one unit if $v_1(1) - p_1 - p_2 > v_1(i) - p_1 - i \cdot p_2$ for all $i \in \{2, 3, 4\}$ and $v_1(1) - p_1 - p_2 > 0$. This region of the parameter space is colored orange (the top region), and within it, profit is linear in $p_1$ and $p_2$. By similar logic, the buyer will buy exactly two units in the blue region (the second-from-the-top region), exactly three units in the green region (the second-from-the-bottom region), and exactly four units in the red region (the bottom region), and profit is linear in $p_1$ and $p_2$ in each of these regions.
Example 3.3 (Two-part tariffs). In a two-part tariff, there are multiple units (i.e., copies) of a single item for sale. The seller sets an upfront fee $p_1$ and a price per unit $p_2$. Here, we consider the simple case where there is a single buyer†. If the buyer wishes to buy $t \geq 1$ units, she pays the upfront fee $p_1$ plus $p_2 \cdot t$, and if she does not want to buy anything, she does not pay anything. Two-part tariffs have been studied extensively [38, 64, 80] and are prevalent throughout daily life. For example, health club membership programs often require an upfront membership fee plus a fee per month. Amusement parks often require an entrance fee with an additional payment per ride, as Oi [64] analyzed in his paper “A Disneyland Dilemma.” In many cities, purchasing a public transportation card requires a small upfront fee and an additional cost per ride. Many coffee machines, such as those made by Keurig and Nespresso, require specialty coffee pods. Purchasing these pods amounts to paying a fee per unit on top of the upfront fee, which is the cost of the coffee machine. Following the publication of the conference version of this paper [11], Balcan, Prasad, and Sandholm [12] showed how to efficiently learn two-part tariffs that maximize average revenue over a training set, as well as menus of two-part tariffs, which we discuss in Section 3.4.1.

Revenue is a piecewise-linear function of the two-part tariff parameters $p_1$ and $p_2$ for the following reason. Suppose there are $\kappa$ units of the item for sale. The buyer will buy exactly $t \in \{1, \ldots, \kappa\}$ units so long as $v_1(t) - (p_1 + p_2 \cdot t) > v_1(t') - (p_1 + p_2 \cdot t')$ for all $t' \neq t$ and $v_1(t) - (p_1 + p_2 \cdot t) > 0$. Therefore, for a fixed set of buyer values, there are at most $\binom{\kappa+1}{2}$ hyperplanes (one for each pair of quantities $t \neq t'$ and one for each comparison of a quantity $t$ against buying nothing, for $\binom{\kappa}{2} + \kappa = \binom{\kappa+1}{2}$ in total) splitting $\mathbb{R}^2$ into convex regions such that within any one region, the number of units bought does not vary. So long as the number of units bought is invariant, profit is a linear function of $p_1$ and $p_2$. See Figure 1 for an illustration.
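The buyer's choice rule in Example 3.3 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `units_bought` and `revenue` are hypothetical, production cost is assumed to be zero (so profit equals revenue), and ties are broken toward fewer units, a case the example does not address because it uses strict inequalities. The values are those of the Figure 1 scenario.

```python
from typing import Sequence

# Buyer values from the Figure 1 scenario: v1(1)=6, v1(2)=9, v1(3)=11, v1(4)=12.
FIGURE_1_VALUES = [6.0, 9.0, 11.0, 12.0]


def units_bought(values: Sequence[float], p1: float, p2: float) -> int:
    """Number of units a utility-maximizing buyer purchases.

    values[t - 1] is the buyer's value for t units. Buying nothing yields
    utility 0. Ties go to the smaller quantity (an assumption of this sketch).
    """
    best_t, best_utility = 0, 0.0
    for t, value in enumerate(values, start=1):
        utility = value - (p1 + p2 * t)
        if utility > best_utility:
            best_t, best_utility = t, utility
    return best_t


def revenue(values: Sequence[float], p1: float, p2: float) -> float:
    """Seller revenue: upfront fee plus per-unit price times units sold."""
    t = units_bought(values, p1, p2)
    return p1 + p2 * t if t >= 1 else 0.0


# Within a region where the quantity is fixed at t, revenue is p1 + t * p2,
# which is linear in (p1, p2).
print(units_bought(FIGURE_1_VALUES, 1.0, 2.5), revenue(FIGURE_1_VALUES, 1.0, 2.5))
```

Sweeping `p1` and `p2` over a grid and coloring each point by `units_bought` should recover a partition of the kind the Figure 1 caption describes, under the stated assumptions.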
Example 3.4 (Item-pricing mechanisms). Under an item-pricing mechanism, also known as a posted-price mechanism, there are multiple items for sale and multiple buyers. Unlike in Example 3.3, we assume there is a single unit of each item for sale. The mechanism designer sets a price per item and buyers buy their utility-maximizing bundles. These mechanisms are prevalent
†We generalize to multiple buyers in Section 3.4.
throughout the economics and computation literature (e.g., [4, 22, 37]). In a bit more detail, an item-pricing mechanism can be defined by either anonymous or non-anonymous prices. Under anonymous prices, the seller sets a price pi per item i. Under non-anonymous prices, there is a buyer-specific price per item. There is some fixed but arbitrary ordering on the buyers such that the first buyer in the ordering arrives first and buys the bundle of items that maximizes his utility, then the next buyer in the ordering arrives and buys the bundle of remaining items that maximizes his utility, and so on. Consider the simple case where the prices are anonymous‡.

Figure 2: This figure illustrates the partition of the item-pricing parameter space into piecewise-linear portions in the following scenario: there are two items for sale and there are two buyers. Buyer 1’s value for the first item is v1(1, 0) = 2, her value for the second item is v1(0, 1) = 1, and her value for both items is v1(1, 1) = 2.5. Buyer 2’s values are v2(1, 0) = 0, v2(0, 1) = 1, and v2(1, 1) = 1. Suppose buyer 1 comes before buyer 2 in the ordering, which means that buyer 1 will first choose to buy the bundle of items that maximizes her utility and then buyer 2 will buy the bundle of remaining items that maximizes her utility. In the orange region, buyer 1 will buy item 1 because v1(1, 0) − p1 > v1(0, 1) − p2, v1(1, 0) − p1 > v1(1, 1) − (p1 + p2), and v1(1, 0) − p1 > 0. Buyer 2 will not buy anything because the only remaining item is item 2 and v2(0, 1) − p2 < 0. By the same logic, in the red region, neither buyer will buy any item. In the blue region, buyer 1 will buy item 1 and buyer 2 will buy item 2. In the green region, buyer 1 will buy item 2 and buyer 2 will not buy anything. Finally, in the white region, buyer 1 will buy both items and buyer 2 will not buy anything.
For a given buyer $j$ and bundle pair $\boldsymbol{q}_1, \boldsymbol{q}_2 \in \{0, 1\}^m$, buyer $j$ will prefer bundle $\boldsymbol{q}_1$ over bundle $\boldsymbol{q}_2$ so long as $v_j(\boldsymbol{q}_1) - \sum_{i : q_1[i] = 1} p_i > v_j(\boldsymbol{q}_2) - \sum_{i : q_2[i] = 1} p_i$. Therefore, for a fixed set of buyer values and for each buyer $j$, her preference ordering over the bundles is completely determined by the $\binom{2^m}{2}$ hyperplanes

$$\left\{ v_j(\boldsymbol{q}_1) - \sum_{i : q_1[i] = 1} p_i = v_j(\boldsymbol{q}_2) - \sum_{i : q_2[i] = 1} p_i \; : \; \boldsymbol{q}_1, \boldsymbol{q}_2 \in \{0, 1\}^m \right\}.$$

Once the buyers’ preference orderings are all fixed, the set of bundles they will each buy is also fixed. Furthermore, in any region of the price space where the bundles the buyers buy are fixed, profit is a linear function of the prices. See Figure 2 for an illustration.

We provide generalization guarantees that are closely dependent on the “complexity” of the partition splitting $\mathbb{R}^d$ into regions such that $\mathrm{profit}_{\boldsymbol{v}}(\boldsymbol{p})$ is linear. Inspired by structure exhibited by many mechanism classes, such as Examples 3.3 and 3.4, we require that this partition be defined by a finite number of hyperplanes. We give the following name to this type of mechanism class:

Definition 3.5 ($(d, t)$-delineable). We say a mechanism class $\mathcal{M}$ is $(d, t)$-delineable if:
‡We study non-anonymous prices as well in Section 3.4.
(a) A partition by hyperplanes. (b) Another partition by hyperplanes. (c) Overlay of partitions (a) and (b). (d) A further subdivision of each region.
Figure 3: Illustrations of the proof of Lemma 3.10.
1. The class $\mathcal{M}$ consists of mechanisms parameterized by vectors $\boldsymbol{p}$ from a set $\mathcal{P} \subseteq \mathbb{R}^d$; and
2. For any valuation vector $\boldsymbol{v} \in \mathcal{X}$, there is a set $\mathcal{H}$ of $t$ hyperplanes such that for any connected component $\mathcal{P}'$ of $\mathcal{P} \setminus \mathcal{H}$, the function $\mathrm{profit}_{\boldsymbol{v}}(\boldsymbol{p})$ is linear over $\mathcal{P}'$.

As is standard, $\mathcal{P} \setminus \mathcal{H}$ indicates set removal. For example, in Figures 3a and 3b, there are four connected components in the partition defined by the hyperplanes and in Figure 3c, there are nine connected components. We relate delineability to the mechanism class’s intrinsic complexity using pseudo-dimension, which we describe in the following section. In turn, pseudo-dimension bounds imply generalization guarantees.
3.2 Pseudo-dimension

Pseudo-dimension [65] is a well-studied learning-theoretic tool used to measure the complexity of a function class. In the case of mechanism design, when we refer to a function class, we mean the set of all profit functions $\mathrm{profit}_M$ defined by mechanisms $M$ from a fixed mechanism class $\mathcal{M}$. Pseudo-dimension captures the following intuition: the richer a function class is, the better functions within that class are able to fit complex patterns. Pseudo-dimension is a quantity that measures just how well functions within a class are able to match these complex patterns. To formally define pseudo-dimension, we first introduce the notion of shattering, which is a fundamental concept in machine learning theory. We first describe it for general function classes, and then instantiate it for profit functions.
Definition 3.6 (Shattering). Let $\mathcal{F}$ be a set of real-valued functions $f : \mathcal{A} \to \mathbb{R}$ whose domain is an abstract set $\mathcal{A}$. Let $\mathcal{S} = \{x^{(1)}, \ldots, x^{(N)}\}$ be a subset of $\mathcal{A}$ and let $z^{(1)}, \ldots, z^{(N)} \in \mathbb{R}$ be a set of witnesses. We say that $z^{(1)}, \ldots, z^{(N)}$ witness the shattering of $\mathcal{S}$ by $\mathcal{F}$ if for all $T \subseteq \mathcal{S}$, there exists some function $f_T \in \mathcal{F}$ such that for all $x^{(i)} \in T$, $f_T(x^{(i)}) \leq z^{(i)}$ and for all $x^{(i)} \notin T$, $f_T(x^{(i)}) > z^{(i)}$.

Figure 4 provides a visualization. Intuitively, the larger the set a function class can shatter, the more complex that function class is. The notion of pseudo-dimension formalizes this intuition.
Definition 3.7 (Pseudo-dimension [65]). Let F be a set of real-valued functions f : A → R whose domain is an abstract set A. Let S ⊆ A be the largest set that can be shattered by F. Then the pseudo-dimension of F, denoted Pdim(F), is defined to be |S|.
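For a finite family of functions, Definition 3.6 can be checked by brute force: record, for each function, which points fall at or below their witnesses, and test whether all $2^N$ patterns occur. The sketch below is illustrative; the helper name `is_shattered` and the specific points, witnesses, and functions are assumptions chosen for the example, not taken from the paper.

```python
from typing import Callable, Sequence


def is_shattered(
    functions: Sequence[Callable[[float], float]],
    points: Sequence[float],
    witnesses: Sequence[float],
) -> bool:
    """Check Definition 3.6 for a finite function family.

    A pattern records, for each point x_i, whether f(x_i) <= z_i (the point is
    in T) or f(x_i) > z_i (the point is outside T). The witnesses shatter the
    points if every one of the 2^N patterns is realized by some function.
    """
    realized = set()
    for f in functions:
        pattern = tuple(f(x) <= z for x, z in zip(points, witnesses))
        realized.add(pattern)
    return len(realized) == 2 ** len(points)


points = [0.0, 1.0]
witnesses = [0.0, 0.0]

# Four affine functions x -> a*x + b, one per subset T of the two points.
affine_functions = [
    lambda x: -1.0,           # below both witnesses: T = {x1, x2}
    lambda x: 1.0,            # above both witnesses: T = {}
    lambda x: 2.0 * x - 1.0,  # below at x1, above at x2: T = {x1}
    lambda x: 1.0 - 2.0 * x,  # above at x1, below at x2: T = {x2}
]

# Constant functions alone can never separate the two points in both directions.
constant_functions = [lambda x, c=c: c for c in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0)]

print(is_shattered(affine_functions, points, witnesses))
print(is_shattered(constant_functions, points, witnesses))
```

The affine family realizes all four patterns on these two points, which is the situation Figure 4 depicts, while the constant family does not.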
For the sake of clarity, we now translate pseudo-dimension into the language of mechanism design. Let $\mathcal{S} = \{\boldsymbol{v}^{(1)}, \ldots, \boldsymbol{v}^{(N)}\}$ be a subset of $\mathcal{X}$ and let $z^{(1)}, \ldots, z^{(N)} \in \mathbb{R}$ be a set of witnesses. We say that $z^{(1)}, \ldots, z^{(N)}$ witness the shattering of $\mathcal{S}$ by $\mathcal{M}$ if for all $T \subseteq \mathcal{S}$, there exists some mechanism $M_T \in \mathcal{M}$ such that for all $\boldsymbol{v}^{(i)} \in T$, $\mathrm{profit}_{M_T}(\boldsymbol{v}^{(i)}) \leq z^{(i)}$ and for all $\boldsymbol{v}^{(i)} \notin T$, $\mathrm{profit}_{M_T}(\boldsymbol{v}^{(i)}) > z^{(i)}$. If there exists some $\boldsymbol{z} \in \mathbb{R}^N$ that witnesses the shattering of $\mathcal{S}$ by $\mathcal{M}$, then we say that $\mathcal{S}$ is shatterable by $\mathcal{M}$. Finally, the pseudo-dimension of $\mathcal{M}$, denoted $\mathrm{Pdim}(\mathcal{M})$, is the size of the largest set that is shatterable by $\mathcal{M}$.

(a) Illustration of an affine function $f^{(1)} : \mathbb{R} \to \mathbb{R}$ where $f^{(1)}(x^{(1)})$ is larger than the witness $z^{(1)}$ and $f^{(1)}(x^{(2)}) < z^{(2)}$. (b) Illustration of another affine function $f^{(2)} : \mathbb{R} \to \mathbb{R}$. Here, $f^{(2)}(x^{(1)}) < z^{(1)}$ and $f^{(2)}(x^{(2)}) > z^{(2)}$. (c) Illustration of a third function $f^{(3)}$. Here, $f^{(3)}(x^{(1)}) > z^{(1)}$ and $f^{(3)}(x^{(2)}) > z^{(2)}$. (d) Illustration of one last function $f^{(4)}$. Here, $f^{(4)}(x^{(1)}) < z^{(1)}$ and $f^{(4)}(x^{(2)}) < z^{(2)}$.

Figure 4: The two points $x^{(1)}$ and $x^{(2)}$ can be shattered by the set $\mathcal{F}$ of affine functions mapping $\mathbb{R}$ to $\mathbb{R}$, i.e., $\mathcal{F} = \{x \mapsto ax + b : a, b \in \mathbb{R}\}$.

Pollard [65] and Dudley [34] provide generalization guarantees§ in terms of pseudo-dimension. We translate those guarantees into the language of mechanism design below in Theorem 3.8. The theorem holds for any mechanism class $\mathcal{M}$, delineable or otherwise.
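Theorem 3.8 ([34, 65]). For any mechanism class $\mathcal{M}$ and any distribution $\mathcal{D}$ over buyers’ values, let $U$ be the maximum profit achievable by mechanisms in $\mathcal{M}$ over the support of $\mathcal{D}$. Then there exists a generalization guarantee $\epsilon_{\mathcal{M}} : \mathbb{Z}_{\geq 1} \times (0, 1) \to \mathbb{R}_{\geq 0}$ defined such that

$$\epsilon_{\mathcal{M}}(N, \delta) = 120U\sqrt{\frac{\mathrm{Pdim}(\mathcal{M})}{N}} + 4U\sqrt{\frac{2\ln(4/\delta)}{N}}.$$

In other words, with probability at least $1 - \delta$ over the draw of the samples $\mathcal{S} = \{\boldsymbol{v}^{(1)}, \ldots, \boldsymbol{v}^{(N)}\} \sim \mathcal{D}^N$, for all mechanisms $M \in \mathcal{M}$,

$$\left| \frac{1}{N} \sum_{i=1}^{N} \mathrm{profit}_M\left(\boldsymbol{v}^{(i)}\right) - \mathbb{E}_{\boldsymbol{v} \sim \mathcal{D}}\left[\mathrm{profit}_M(\boldsymbol{v})\right] \right| \leq 120U\sqrt{\frac{\mathrm{Pdim}(\mathcal{M})}{N}} + 4U\sqrt{\frac{2\ln(4/\delta)}{N}}.$$

By Theorem 3.8, if we can upper bound the pseudo-dimension of a mechanism class $\mathcal{M}$, then we can bound the difference between the average profit over a set of samples and the actual expected profit of any mechanism in $\mathcal{M}$. In the next section, we provide a generalization guarantee $\epsilon_{\mathcal{M}}(N, \delta)$ by upper bounding $\mathrm{Pdim}(\mathcal{M})$ using the delineability parameters $d$ and $t$.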
§Pseudo-dimension also implies multiplicative generalization guarantees in addition to the additive generalization guarantee presented in Theorem 3.8 [46, 53, 75].
3.3 General theorem for sample-based mechanism design

In the following theorem, which is our main theorem, we relate pseudo-dimension to delineability (Definition 3.5).

Theorem 3.9. Suppose $\mathcal{M}$ is a mechanism class that is $(d, t)$-delineable. Given a distribution $\mathcal{D}$ over buyers’ values, let $U$ be the maximum profit achievable by mechanisms in $\mathcal{M}$ over the support of $\mathcal{D}$. There exists a generalization guarantee $\epsilon_{\mathcal{M}} : \mathbb{Z}_{\geq 1} \times (0, 1) \to \mathbb{R}_{\geq 0}$ defined such that

$$\epsilon_{\mathcal{M}}(N, \delta) = 120U\sqrt{\frac{9d\log(4dt)}{N}} + 4U\sqrt{\frac{2\ln(4/\delta)}{N}}.$$

In other words, with probability at least $1 - \delta$ over the draw of a set $\mathcal{S} = \{\boldsymbol{v}^{(1)}, \ldots, \boldsymbol{v}^{(N)}\} \sim \mathcal{D}^N$, for all mechanisms $M \in \mathcal{M}$,

$$\left| \frac{1}{N} \sum_{i=1}^{N} \mathrm{profit}_M\left(\boldsymbol{v}^{(i)}\right) - \mathbb{E}_{\boldsymbol{v} \sim \mathcal{D}}\left[\mathrm{profit}_M(\boldsymbol{v})\right] \right| \leq 120U\sqrt{\frac{9d\log(4dt)}{N}} + 4U\sqrt{\frac{2\ln(4/\delta)}{N}}.$$

Proof. This theorem follows directly from the following lemma, where we prove that the pseudo-dimension of $\mathcal{M}$ is at most $9d\log(4dt)$, and Theorem 3.8, which provides a generalization guarantee in terms of pseudo-dimension.
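To get a feel for the magnitudes in Theorem 3.9, the bound can be evaluated numerically. The sketch below is illustrative: the function names are hypothetical, and it assumes that $\log$ denotes the base-2 logarithm (which matches the step in the proof of Lemma 3.10 that starts from a power of two) while $\ln$ is the natural logarithm. Because the bound has the form $C/\sqrt{N}$, the number of samples needed for a target error follows in closed form.

```python
import math


def pdim_bound(d: int, t: int) -> float:
    """Upper bound 9 d log(4 d t) on the pseudo-dimension (log base 2 assumed)."""
    return 9 * d * math.log2(4 * d * t)


def generalization_bound(d: int, t: int, U: float, N: int, delta: float) -> float:
    """Evaluate the right-hand side of Theorem 3.9 for N samples."""
    complexity_term = 120 * U * math.sqrt(pdim_bound(d, t) / N)
    confidence_term = 4 * U * math.sqrt(2 * math.log(4 / delta) / N)
    return complexity_term + confidence_term


def samples_for_error(d: int, t: int, U: float, eps: float, delta: float) -> int:
    """Smallest N for which the bound above is at most eps.

    The bound equals C / sqrt(N) with C independent of N, so N >= (C / eps)^2.
    """
    c = 120 * U * math.sqrt(pdim_bound(d, t)) + 4 * U * math.sqrt(2 * math.log(4 / delta))
    return math.ceil((c / eps) ** 2)


# Example: d = 2 and t = 10, with profit scaled so that U = 1.
n_needed = samples_for_error(d=2, t=10, U=1.0, eps=0.1, delta=0.05)
print(n_needed, generalization_bound(2, 10, 1.0, n_needed, 0.05))
```

The constants in the theorem are worst-case, so the sample counts this produces are conservative rather than a prediction of how many samples a designer needs in practice.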
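Lemma 3.10. If $\mathcal{M}$ is a mechanism class that is $(d, t)$-delineable, then the pseudo-dimension of $\mathcal{M}$ is at most $9d\log(4dt)$.

Proof. For any positive integer $N$, consider a set of $N$ valuation vectors $\mathcal{S} = \{\boldsymbol{v}^{(1)}, \ldots, \boldsymbol{v}^{(N)}\}$ and real values $z^{(1)}, \ldots, z^{(N)} \in \mathbb{R}$. We will show that there exists a partitioning of the parameter space into at most $dN^d \cdot d(Nt)^d$ regions such that for all $\boldsymbol{p}$ in any one region and all $\boldsymbol{v}^{(i)}$, $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p})$ is either greater than $z^{(i)}$ or less than $z^{(i)}$ (but not both). We will then use this fact to bound the pseudo-dimension of $\mathcal{M}$.

To this end, for $\boldsymbol{v}^{(i)} \in \mathcal{S}$, let $\mathcal{H}^{(i)}$ be the set of $t$ hyperplanes such that for any connected component $\mathcal{P}'$ of $\mathcal{P} \setminus \mathcal{H}^{(i)}$, $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p})$ is linear over $\mathcal{P}'$. We now consider the overlay of all $N$ partitions $\mathcal{P} \setminus \mathcal{H}^{(1)}, \ldots, \mathcal{P} \setminus \mathcal{H}^{(N)}$. Formally, this overlay is made up of the sets $\mathcal{P}_1, \ldots, \mathcal{P}_\tau$, which are the connected components of $\mathcal{P} \setminus \bigcup_{i=1}^{N} \mathcal{H}^{(i)}$. For each set $\mathcal{P}_j$ and each $i \in [N]$, $\mathcal{P}_j$ is completely contained in a single connected component of $\mathcal{P} \setminus \mathcal{H}^{(i)}$, which means that $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p})$ is linear over $\mathcal{P}_j$. (See Figures 3a, 3b, and 3c for illustrations.) Since $|\mathcal{H}^{(i)}| \leq t$ for all $i \in [N]$, $\tau < d(Nt)^d$ (this follows from Theorem 1 by Buck [19]).

Consider a single connected component $\mathcal{P}_j$ of $\mathcal{P} \setminus \bigcup_{i=1}^{N} \mathcal{H}^{(i)}$. For any valuation vector $\boldsymbol{v}^{(i)} \in \mathcal{S}$, we know that $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p})$ is linear over $\mathcal{P}_j$. Let $\boldsymbol{a}_j^{(i)} \in \mathbb{R}^d$ and $b_j^{(i)} \in \mathbb{R}$ be the weight vector and offset such that $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p}) = \boldsymbol{a}_j^{(i)} \cdot \boldsymbol{p} + b_j^{(i)}$ for all $\boldsymbol{p} \in \mathcal{P}_j$. We know that there is a hyperplane $\boldsymbol{a}_j^{(i)} \cdot \boldsymbol{p} + b_j^{(i)} = z^{(i)}$ where on one side of the hyperplane, $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p}) \leq z^{(i)}$ and on the other side, $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p}) > z^{(i)}$ for all $\boldsymbol{p} \in \mathcal{P}_j$. Let $\mathcal{H}_{\mathcal{P}_j}$ be all $N$ hyperplanes for all $N$ samples, i.e., $\mathcal{H}_{\mathcal{P}_j} = \left\{ \boldsymbol{a}_j^{(i)} \cdot \boldsymbol{p} + b_j^{(i)} = z^{(i)} : i \in [N] \right\}$. Notice that in any connected component $\mathcal{P}'$ of $\mathcal{P}_j \setminus \mathcal{H}_{\mathcal{P}_j}$ (illustrated in Figure 3d), for all $i \in [N]$, $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p})$ is either greater than $z^{(i)}$ or less than $z^{(i)}$ (but not both) for all $\boldsymbol{p} \in \mathcal{P}'$. In total, the number of connected components of $\mathcal{P}_j \setminus \mathcal{H}_{\mathcal{P}_j}$ is smaller than $dN^d$. The same holds for every set $\mathcal{P}_j$ in the overlay. Thus, the total number of regions where for all $i \in [N]$, $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p})$ is either greater than $z^{(i)}$ or less than $z^{(i)}$ (but not both) is smaller than $dN^d \cdot d(Nt)^d$.

We will now use this fact to bound the pseudo-dimension of $\mathcal{M}$. Suppose $\mathrm{Pdim}(\mathcal{M}) = \bar{N}$. By definition, there exists a set $\left\{\boldsymbol{v}^{(1)}, \ldots, \boldsymbol{v}^{(\bar{N})}\right\}$ that is shattered by $\mathcal{M}$. Let $z^{(1)}, \ldots, z^{(\bar{N})} \in \mathbb{R}$ be the points that witness this shattering. Again, by definition, we know that for any $T \subseteq [\bar{N}]$, there exists a parameter vector $\boldsymbol{p}_T \in \mathcal{P}$ such that if $i \in T$, then $\mathrm{profit}_{\boldsymbol{p}_T}(\boldsymbol{v}^{(i)}) \geq z^{(i)}$ and if $i \notin T$, then $\mathrm{profit}_{\boldsymbol{p}_T}(\boldsymbol{v}^{(i)}) < z^{(i)}$. Let $\mathcal{P}^* = \left\{\boldsymbol{p}_T : T \subseteq [\bar{N}]\right\}$. We know that there exist $k \leq d\bar{N}^d \cdot d(\bar{N}t)^d$ regions $\mathcal{P}_1, \ldots, \mathcal{P}_k$ where for each region $\mathcal{P}_j$ and each $i \in [\bar{N}]$, either $\mathrm{profit}_{\boldsymbol{v}^{(i)}}(\boldsymbol{p})$ is greater than $z^{(i)}$ for all $\boldsymbol{p} \in \mathcal{P}_j$ or it is less than $z^{(i)}$. At most one vector in $\mathcal{P}^*$ can come from any one region. This means that $|\mathcal{P}^*| = 2^{\bar{N}} < d\bar{N}^d \cdot d(\bar{N}t)^d$, which means that $\bar{N} < 2d\log\bar{N} + 2\log d + d\log t$. By Lemma 8.3 in Appendix 8, $\bar{N} < 8d\log(4d) + d\log t + 2\log d \leq 9d\log(4dt)$.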
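The last two inequalities of the proof compress a few steps of arithmetic. A worked version is sketched below, assuming $\log$ is taken base 2 and $d, t \geq 1$; the middle step is Lemma 8.3, which is used as stated and not re-derived here.

```latex
\begin{align*}
2^{\bar{N}} < d\bar{N}^d \cdot d(\bar{N}t)^d = d^2\,\bar{N}^{2d}\,t^d
  &\;\Longrightarrow\; \bar{N} < 2\log d + 2d\log\bar{N} + d\log t
  && \text{(take $\log_2$ of both sides)} \\
  &\;\Longrightarrow\; \bar{N} < 8d\log(4d) + d\log t + 2\log d
  && \text{(Lemma 8.3)} \\
\intertext{For the final comparison, expand the target bound:}
9d\log(4dt) &= 8d\log(4d) + d\log(4d) + 9d\log t . \\
\intertext{Since $t \geq 1$ gives $9d\log t \geq d\log t$, and $d \geq 1$ gives
$d\log(4d) = 2d + d\log d \geq 2\log d$, it follows that}
8d\log(4d) + d\log t + 2\log d &\leq 9d\log(4dt).
\end{align*}
```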
In this section, we saw that if the profit functions of mechanisms from a class $\mathcal{M}$ are piecewise linear functions of the parameters, then we can provide a generalization guarantee $\epsilon_{\mathcal{M}}(N, \delta)$ in terms of the class’s delineability parameters. Thus, we can bound the difference between the average profit over a set of samples and the actual expected profit of any mechanism in the class.
3.4 Delineable mechanism classes

We now show that a diverse array of mechanism classes are delineable, so we can apply Theorem 3.9. We warm up with Examples 3.3 and 3.4. The full proofs of all results are in Appendix 8.

Lemma 3.11. Let $\mathcal{M}$ be the class of two-part tariffs over a single buyer and $\kappa$ units (i.e., copies) of a single item. Then $\mathcal{M}$ is $\left(2, \binom{\kappa+1}{2}\right)$-delineable.

Proof. The parameter space defining $\mathcal{M}$ is $\mathbb{R}^2$. As we saw in Example 3.3, for any valuation vector $\boldsymbol{v}$, there are $\binom{\kappa+1}{2}$ hyperplanes splitting $\mathbb{R}^2$ into regions over which $\mathrm{profit}_{\boldsymbol{v}}(\boldsymbol{p})$ is linear.

Lemma 3.12. Let $\mathcal{M}$ be the class of item-pricing mechanisms with anonymous prices. Then $\mathcal{M}$ is $\left(m, n\binom{2^m}{2}\right)$-delineable.

Proof. This class’s parameter space is $\mathbb{R}^m$, since there is one price per item. As we saw in Example 3.4, for any valuation vector $\boldsymbol{v}$, there are $n\binom{2^m}{2}$ hyperplanes splitting $\mathbb{R}^m$ into regions such that within any one region, $\mathrm{profit}_{\boldsymbol{v}}(\boldsymbol{p})$ is linear.
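The sequential purchasing process behind Lemma 3.12 can be simulated directly. The sketch below is illustrative: the function names are hypothetical, production cost is assumed to be zero (so profit equals revenue), a buyer who is indifferent keeps the first bundle encountered, and the valuations are those of the Figure 2 scenario.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Bundle = Tuple[int, ...]


def best_bundle(valuation: Dict[Bundle, float], prices: Sequence[float],
                available: Sequence[bool]) -> Bundle:
    """Utility-maximizing bundle among the items that are still available."""
    m = len(prices)
    best, best_utility = (0,) * m, 0.0
    for bundle in product([0, 1], repeat=m):
        if any(b and not a for b, a in zip(bundle, available)):
            continue  # bundle contains an item that has already been sold
        utility = valuation[bundle] - sum(p for p, b in zip(prices, bundle) if b)
        if utility > best_utility:
            best, best_utility = bundle, utility
    return best


def item_pricing_revenue(valuations: Sequence[Dict[Bundle, float]],
                         prices: Sequence[float]) -> Tuple[List[Bundle], float]:
    """Buyers arrive in list order and buy from the remaining items."""
    available = [True] * len(prices)
    purchases, revenue = [], 0.0
    for valuation in valuations:
        bundle = best_bundle(valuation, prices, available)
        purchases.append(bundle)
        revenue += sum(p for p, b in zip(prices, bundle) if b)
        available = [a and not b for a, b in zip(available, bundle)]
    return purchases, revenue


# Valuations from the Figure 2 scenario (bundles are (item 1, item 2) indicators).
buyer_1 = {(0, 0): 0.0, (1, 0): 2.0, (0, 1): 1.0, (1, 1): 2.5}
buyer_2 = {(0, 0): 0.0, (1, 0): 0.0, (0, 1): 1.0, (1, 1): 1.0}

print(item_pricing_revenue([buyer_1, buyer_2], prices=(1.0, 1.5)))
```

While the purchased bundles stay fixed, revenue is a fixed sum of prices and therefore linear in the price vector, which is the structure the lemma relies on.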
3.4.1 Non-linear pricing mechanisms. Non-linear pricing mechanisms are specifically used to sell multiple units (i.e., copies) of a set of items. We discuss two-part tariffs and general non-linear pricing mechanisms. We make the following natural assumption which informally states that as the number of units in an allocation grows, the cost of that allocation will exceed the buyers’ welfare. This implies delineability. Our assumption is formalized as follows.
Assumption 3.13. There is some cap $\kappa_i$ per item $i$ such that it costs more to produce $\kappa_i$ units of item $i$ than the buyers will pay. In other words, there exists $(\kappa_1, \ldots, \kappa_m) \in \mathbb{R}^m$ such that for all $\boldsymbol{v}$ in the support of the distribution $\mathcal{D}$ over buyers’ values and all allocations $Q = (\boldsymbol{q}_1, \ldots, \boldsymbol{q}_n)$, if there exists an item $i$ such that $\sum_{j=1}^{n} q_j[i] > \kappa_i$, then $\sum_{j=1}^{n} v_j(\boldsymbol{q}_j) - c(Q) < 0$.

For example, suppose there are multiple units of a single item for sale and a single buyer whose value for $\tau$ units is always at most $8 + \tau$. Moreover, suppose the cost of producing $\tau$ units is $3\tau$. Then $\kappa_1 = 4$ because for $\tau \geq 5$, $8 + \tau < 3\tau$. In the remainder of this section, we describe why this assumption implies $(d, t)$-delineability with $d < \infty$ and $t < \infty$.
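The cap in the single-item example above can be recovered by a direct search. This is a minimal sketch under stated assumptions: the helper `smallest_cap` is hypothetical, it inspects quantities only up to a finite `horizon`, and it takes an upper bound on the buyer's value and the production cost as plain functions of the number of units.

```python
from typing import Callable


def smallest_cap(max_value: Callable[[int], float], cost: Callable[[int], float],
                 horizon: int = 1000) -> int:
    """Smallest kappa with max_value(tau) - cost(tau) < 0 for all kappa < tau <= horizon.

    Scans downward from the horizon and stops at the first quantity whose
    surplus is non-negative. Returns the horizon itself if even the largest
    quantity inspected has non-negative surplus (no cap found in range).
    """
    kappa = horizon
    for tau in range(horizon, 0, -1):
        if max_value(tau) - cost(tau) < 0:
            kappa = tau - 1
        else:
            break
    return kappa


# Value at most 8 + tau, cost 3 * tau: surplus 8 - 2 * tau is negative exactly when tau >= 5.
print(smallest_cap(lambda tau: 8 + tau, lambda tau: 3 * tau))
```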
Menus of two-part tariffs. Menus of two-part tariffs are a generalization of Example 3.3. At a high level, the seller offers the buyers $\ell$ different two-part tariffs and each buyer simultaneously chooses the tariff and number of units that maximizes his utility. Menus of two-part tariffs are common throughout daily life: consumers often choose among various membership tiers (typically with a larger upfront fee in exchange for lower future payments) for health clubs, wholesale stores like Costco, amusement parks, credit cards, and cellphone plans.

More formally, in the case of non-anonymous prices, let $\left(p_{1,j}^{(1)}, p_{2,j}^{(1)}\right), \ldots, \left(p_{1,j}^{(\ell)}, p_{2,j}^{(\ell)}\right)$ be the menu of two-part tariffs that the seller offers to buyer $j$. Here, $p_{1,j}^{(i)}$ is the upfront fee of the $i$th tariff and $p_{2,j}^{(i)}$ is the price per unit. If the prices are anonymous, then for each tariff $i$, $p_{1,1}^{(i)} = \cdots = p_{1,n}^{(i)}$ and $p_{2,1}^{(i)} = \cdots = p_{2,n}^{(i)}$.¶ We assume each buyer simultaneously chooses the tariff and the number of units maximizing his utility. In this context, an allocation is simply a vector $(q_1, \ldots, q_n)$ where $q_j \in \mathbb{Z}_{\geq 0}$ is the number of units buyer $j$ chooses to buy. If buyer $j$ chooses the tariff $t_j \in [\ell]$ and chooses to buy $q_j \geq 1$ units, he will pay $p_{1,j}^{(t_j)} + p_{2,j}^{(t_j)} \cdot q_j$. Thus, the seller’s profit is

$$\sum_{j=1}^{n} \left( p_{1,j}^{(t_j)} \cdot \mathbf{1}_{\{q_j \geq 1\}} + p_{2,j}^{(t_j)} \cdot q_j \right) - c(Q)$$

where $Q = (q_1, \ldots, q_n)$. Note that this set of mechanisms is parameterized by vectors in $\mathbb{R}^{2n\ell}$ if there are non-anonymous prices and $\mathbb{R}^{2\ell}$ if the prices are anonymous.

Assumption 3.13 states that there is some cap $\kappa \in \mathbb{R}$ such that for all $\boldsymbol{v}$ in the support of the distribution $\mathcal{D}$ over buyers’ values and all allocations $Q = (q_1, \ldots, q_n)$, if $\sum_{j=1}^{n} q_j > \kappa$, then $\sum_{j=1}^{n} v_j(q_j) - c(Q) < 0$. We also make the natural assumption that the mechanism designer will not choose prices that cause the seller to have negative utility. In other words, we assume he will select a profit non-negative menu of two-part tariffs, formalized as follows.

Definition 3.14. In the case of anonymous prices (respectively, non-anonymous), let $\mathcal{P} \subseteq \mathbb{R}^{2\ell}$ (respectively, $\mathcal{P}' \subseteq \mathbb{R}^{2n\ell}$) be the set of prices where no matter which tariff each buyer chooses and no matter how many units he buys, the seller will obtain non-negative utility. In other words, let $\mathcal{P}$ (respectively, $\mathcal{P}'$) be the set of mechanism parameters such that for each buyer $j \in [n]$, each tariff $t_j \in [\ell]$, and each allocation $Q = (q_1, \ldots, q_n)$, the seller’s utility is non-negative:

$$\sum_{j=1}^{n} \left( p_{1,j}^{(t_j)} \cdot \mathbf{1}_{\{q_j \geq 1\}} + p_{2,j}^{(t_j)} \cdot q_j \right) - c(Q) \geq 0.$$

The set of profit non-negative menus of two-part tariffs is defined by parameters in $\mathcal{P}$ (respectively, $\mathcal{P}'$).

Under Assumption 3.13, we are guaranteed that no matter which profit non-negative parameters the mechanism designer chooses in $\mathcal{P}$ or $\mathcal{P}'$, if all buyers simultaneously choose the tariff and the number of units $(q_1, \ldots, q_n)$ that maximize their utilities, then $\sum_{j=1}^{n} q_j \leq \kappa$. See Lemma 8.4 in Appendix 8 for the proof. This fact is crucial to proving delineability since the number of hyperplanes splitting the parameter space into regions where profit is linear depends on $\kappa$.

Lemma 3.15. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of anonymous and non-anonymous profit non-negative length-$\ell$ menus of two-part tariffs. Under Assumption 3.13, $\mathcal{M}$ is $\left(2\ell, n(\kappa\ell)^2\right)$-delineable and $\mathcal{M}'$ is $\left(2n\ell, n(\kappa\ell)^2\right)$-delineable.
¶The fact that the buyers arrive simultaneously and that there are multiple units of each item for sale differentiates this setting from that of item-pricing, which we describe in Example 3.4 and later in this section.
General non-linear pricing mechanisms. We study general non-linear pricing mechanisms under Wilson’s bundling interpretation [80]: if the prices are anonymous, there is a price per quantity vector $\boldsymbol{q}$ denoted $p(\boldsymbol{q})$. The buyers simultaneously choose the bundles maximizing their utility: buyer $j$ will purchase the bundle that maximizes $v_j(\boldsymbol{q}) - p(\boldsymbol{q})$. If the prices are non-anonymous, there is a price per quantity vector $\boldsymbol{q}$ and buyer $j \in [n]$ denoted $p_j(\boldsymbol{q})$. These general non-linear pricing mechanisms include multi-part tariffs as a special case.

Without any assumptions, the parameter space of this mechanism class has an infinite number of dimensions, since the mechanism designer, in theory, could choose prices for each and every bundle $\boldsymbol{q} \in \mathbb{Z}_{\geq 0}^m$. In Lemma 8.5, we show that under Assumption 3.13, no buyer will ever choose a bundle $\boldsymbol{q}$ such that $q[i] > \kappa_i$ for any $i \in [m]$. This holds so long as the mechanism designer chooses a profit non-negative non-linear pricing mechanism (see Definition 3.16, which is similar to Definition 3.14). Thus, the seller only needs to carefully set the prices of the bundles $\boldsymbol{q}$ such that $q[i] \leq \kappa_i$ for all $i \in [m]$. Letting $K = \prod_{i=1}^{m} (\kappa_i + 1)$ be the number of such bundles, the set of anonymous prices the mechanism designer needs to carefully set is a subset of $\mathbb{R}^K$ and the set of non-anonymous prices is a subset of $\mathbb{R}^{nK}$. We now provide the definition of profit non-negative non-linear pricing mechanisms.

Definition 3.16. In the case of anonymous prices (respectively, non-anonymous), let $\mathcal{P}$ (respectively, $\mathcal{P}'$) be the set of mechanism parameters such that for each buyer $j \in [n]$ and each allocation $Q = (\boldsymbol{q}_1, \ldots, \boldsymbol{q}_n)$, the seller’s utility is non-negative:

$$\sum_{j=1}^{n} p_j(\boldsymbol{q}_j) - c(Q) \geq 0.$$

The set of profit non-negative non-linear pricing mechanisms is defined by parameters in $\mathcal{P}$ (respectively, $\mathcal{P}'$).

Under Assumption 3.13, we are guaranteed that no matter which profit non-negative parameters the mechanism designer chooses in $\mathcal{P}$ or $\mathcal{P}'$, if all buyers simultaneously choose the bundles that maximize their utilities, then $\sum_{j=1}^{n} q_j[i] \leq \kappa_i$ for all $i \in [m]$. See Lemma 8.5 in Appendix 8 for the proof. As in the case of two-part tariffs, this fact is crucial to proving delineability since the number of hyperplanes splitting the parameter space into regions where profit is linear depends on $\kappa_1, \ldots, \kappa_m$. Moreover, under these assumptions, the set of mechanism parameters is effectively $K$-dimensional in the case of anonymous prices and $nK$-dimensional in the case of non-anonymous prices: without loss of generality, the mechanism designer might as well set the price of any bundle $\boldsymbol{q}$ such that $q[i] > \kappa_i$ for some $i \in [m]$ to $\infty$.

Lemma 3.17. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of anonymous and non-anonymous profit non-negative non-linear pricing mechanisms. Let $K = \prod_{i=1}^{m} (\kappa_i + 1)$. Under Assumption 3.13, $\mathcal{M}$ is $\left(K, nK^2\right)$-delineable and $\mathcal{M}'$ is $\left(nK, nK^2\right)$-delineable.

We prove polynomial bounds when prices are additive over items (Lemma 8.7 in Appendix 8).
3.4.2 Item-pricing mechanisms. We now apply Theorem 3.9 to anonymous and non-anonymous item-pricing mechanisms. Unlike non-linear pricing mechanisms, in this setting, there is only a single unit of each item for sale. Under anonymous prices, the seller sets a price per item. Under non-anonymous prices, there is a buyer-specific price per item. Since there is only one unit of each item for sale, we cannot assume the buyers arrive simultaneously and buy the bundle of goods maximizing their utilities, as we did when studying non-linear pricing. After all, this may lead to over-demand. Rather, we assume that there is some fixed but arbitrary ordering on the buyers such that the first buyer in the ordering arrives first and buys the bundle of items that maximizes his utility, then the next buyer in the ordering arrives and buys the bundle of remaining items that maximizes his utility, and so on. This assumption is prevalent throughout the economics and computation literature (e.g., [4, 22, 37]).
Lemma 3.18. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of item-pricing mechanisms with anonymous prices and non-anonymous prices, respectively. If the buyers are additive‖, then $\mathcal{M}$ is $(m, m)$-delineable and $\mathcal{M}'$ is $(nm, nm)$-delineable.
In Appendix 9, we connect the hyperplane structure we investigate in this paper to the struc- tured prediction literature in machine learning (e.g., [26]), thus proving even stronger generalization bounds for item-pricing mechanisms under buyers with unit-demand and general valuations and answering an open question by Morgenstern and Roughgarden [62]. Balcan et al. [9] were the first to explore the connection between structured prediction and mechanism design, though in a different setting from us: they provided algorithms that make use of past data describing the purchases of a utility-maximizing agent to produce a hypothesis function that can accurately forecast the future behavior of the agent. In Section 6, we compare these results with those from prior research [62, 71].
3.4.3 Auctions. We now present applications of Lemma 3.10 to auctions. In each setting, there is a single unit of each item for sale.
Second price item auctions with item reserves. These auctions are only strategy proof for additive bidders, so we restrict our attention to this setting. In the case of non-anonymous reserves, there is a price pj (ei) for each item i and each bidder j. The bidders submit bids on the items. For each item i, the highest bidder j wins the item if and only if her bid is above pj (ei). She pays the maximum of the second highest bid and pj (ei). If the bidder with the highest bid bids below her reserve, the item goes unsold. In the case of anonymous reserves, p1 (ei) = p2 (ei) = ··· = pn (ei) for each item i.
Lemma 3.19. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of anonymous and non-anonymous second price item auctions. Then $\mathcal{M}$ is $(m, m)$-delineable and $\mathcal{M}'$ is $(nm, m)$-delineable.
In Section 6, we provide a detailed comparison of this lemma with results from prior research [30, 62, 71], in both the single- and multi-item setting.
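The payment rule for second price item auctions with anonymous reserves, as described above, can be sketched as follows. The function name is hypothetical, production cost is assumed to be zero, and a bid exactly equal to the reserve is treated as winning, a boundary case the description leaves open.

```python
from typing import Sequence


def second_price_revenue(bids: Sequence[Sequence[float]],
                         reserves: Sequence[float]) -> float:
    """Total revenue over items for additive bidders.

    bids[j][i] is bidder j's bid on item i and reserves[i] is the anonymous
    reserve for item i. Each item is sold separately: the highest bidder wins
    if her bid meets the reserve and pays max(second-highest bid, reserve).
    """
    revenue = 0.0
    for i, reserve in enumerate(reserves):
        item_bids = sorted((bidder[i] for bidder in bids), reverse=True)
        highest = item_bids[0]
        second = item_bids[1] if len(item_bids) > 1 else 0.0
        if highest >= reserve:
            revenue += max(second, reserve)
    return revenue


# Two bidders, two items: bids on item 1 are (5, 3), bids on item 2 are (1, 4).
bids = [[5, 1], [3, 4]]
print(second_price_revenue(bids, reserves=[2, 2]))
```

For fixed bids, the revenue from each item is piecewise linear in that item's reserve: it is constant at the second-highest bid while the reserve is below it, equal to the reserve between the two highest bids, and zero once the reserve exceeds the highest bid.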
Mixed bundling auctions with reserve prices (MBARPs). MBARPs [73] are a variation on the VCG mechanism with item reserve prices, with an additional fixed boost to the social welfare of any allocation where some bidder receives the grand bundle. Recall that in a single-item VCG auction (i.e., second-price auction) with a reserve price, the item is only sold if the highest bidder’s bid exceeds the reserve price, and the winner must pay the maximum of the second highest bid and the reserve price. To generalize this intuition to the multi-item case, we enlarge the set of agents to include the seller, whose valuation for a set of items is the set’s reserve price. An MBARP gives
‖In the two cases where the buyers have unit-demand and general valuations, we prove generalization bounds by connecting the hyperplane structure we investigate in this paper to the structured prediction literature in machine learning. See Appendix 9 for details.
an additional additive boost to the social welfare of any allocation where some bidder receives the grand bundle, and then runs the VCG mechanism over this enlarged set of bidders. The allocation is the boosted social welfare maximizer and the payments are the VCG payments on the boosted social welfare values. Importantly, the seller makes no payments, no matter her allocation.

Formally, MBARPs are defined by a parameter $\gamma \geq 0$ and $m$ reserve prices $p(e_1), \ldots, p(e_m)$. Let $\lambda$ be a function such that $\lambda(Q) = \gamma$ if some bidder receives the grand bundle under allocation $Q$ and $0$ otherwise. For an allocation $Q$, let $\boldsymbol{q}_Q$ be the items not allocated. Given a valuation vector $\boldsymbol{v}$, the MBARP allocation is

$$Q^* = (\boldsymbol{q}_1^*, \ldots, \boldsymbol{q}_n^*) = \mathrm{argmax} \left\{ \sum_{j=1}^{n} v_j(\boldsymbol{q}_j) + \sum_{i : q_Q[i] = 1} p(e_i) + \lambda(Q) - c(Q) \right\}.$$

Using the notation

$$Q^{-j} = \left(\boldsymbol{q}_1^{-j}, \ldots, \boldsymbol{q}_n^{-j}\right) = \mathrm{argmax} \left\{ \sum_{\ell \neq j} v_\ell(\boldsymbol{q}_\ell) + \sum_{i : q_Q[i] = 1} p(e_i) + \lambda(Q) - c(Q) \right\},$$

bidder $j$ pays

$$\sum_{\ell \neq j} v_\ell\left(\boldsymbol{q}_\ell^{-j}\right) + \sum_{i : q_{Q^{-j}}[i] = 1} p(e_i) + \lambda\left(Q^{-j}\right) - c\left(Q^{-j}\right) - \sum_{\ell \neq j} v_\ell(\boldsymbol{q}_\ell^*) - \sum_{i : q_{Q^*}[i] = 1} p(e_i) - \lambda(Q^*) + c(Q^*).$$
Lemma 3.20. Let $\mathcal{M}$ be the set of MBARPs. Then $\mathcal{M}$ is $\left(m + 1, (n + 1)^{2m+1}\right)$-delineable.

In Table 4, we compare this lemma with results from prior research [10]. Mixed-bundling auctions [49] are a particularly simple class of auctions which correspond to MBARPs with no reserve prices. In this specific case, the class’s profit functions display additional structure that allows us to provide an even better guarantee than that implied by Lemma 3.10: for any valuation vector $\boldsymbol{v}$, the function $\mathrm{profit}_{\boldsymbol{v}}$ is one-dimensional and has exactly one strict local maximum. See Appendix 8.2 for the details.
Affine maximizer auctions (AMAs). AMAs are an expressive mechanism class: Roberts [66] proved that AMAs are the only ex post truthful mechanisms over unrestricted value domains. Later, Lavi et al. [52] proved that under natural assumptions, every truthful multi-item auction is an “almost” AMA, that is, an AMA for sufficiently high values.∗∗ An AMA is defined by a weight per bidder $w_j \in \mathbb{R}_{>0}$ and a boost per allocation $\lambda(Q) \in \mathbb{R}_{\geq 0}$. By increasing any $w_j$ or $\lambda(Q)$, the seller can increase bidder $j$’s bids or increase the likelihood that $Q$ is the auction’s allocation. The AMA allocation $Q^*$ is the one which maximizes the weighted social welfare, i.e., $Q^* = (\boldsymbol{q}_1^*, \ldots, \boldsymbol{q}_n^*) = \mathrm{argmax} \left\{ \sum_{j=1}^{n} w_j v_j(\boldsymbol{q}_j) + \lambda(Q) - c(Q) \right\}$. The payments have the same form as the VCG payments, with the parameters factored in to ensure truthfulness. Formally, using the notation

$$Q^{-j} = \left(\boldsymbol{q}_1^{-j}, \ldots, \boldsymbol{q}_n^{-j}\right) = \mathrm{argmax} \left\{ \sum_{\ell \neq j} w_\ell v_\ell(\boldsymbol{q}_\ell) + \lambda(Q) - c(Q) \right\},$$
∗∗Surprisingly, even when the buyers have additive values, AMAs can generate higher revenue than running a separate Myerson auction for each item [69], even though the buyers’ values do not exhibit any complementarity or substitutability. While this may seem surprising, it is to be expected due to prior research in bundle pricing in catalog offers [1, 6, 57].
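each bidder $j$ pays

$$\frac{1}{w_j} \left[ \sum_{\ell \neq j} w_\ell v_\ell\left(\boldsymbol{q}_\ell^{-j}\right) + \lambda\left(Q^{-j}\right) - c\left(Q^{-j}\right) - \left( \sum_{\ell \neq j} w_\ell v_\ell(\boldsymbol{q}_\ell^*) + \lambda(Q^*) - c(Q^*) \right) \right].$$

A virtual valuation combinatorial auction (VVCA) [54] is an AMA where each $\lambda(Q)$ is split into $n$ terms such that $\lambda(Q) = \sum_{j=1}^{n} \lambda_j(Q)$ where $\lambda_j(Q) = c_{j,\boldsymbol{q}}$ for all allocations $Q$ that give bidder $j$ exactly bundle $\boldsymbol{q}$. Finally, $\lambda$-auctions [49] are a special case of AMAs where the bidder weights equal 1.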
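For very small instances, the AMA allocation and payments can be computed by enumerating every allocation. The sketch below is a brute-force illustration, not an efficient or canonical implementation: all function names are hypothetical, valuations, boosts, and costs are passed as plain Python callables, and the payment is computed as the weighted welfare of the other bidders under $Q^{-j}$ minus their weighted welfare under $Q^*$, divided by $w_j$.

```python
from itertools import product
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple

Allocation = Tuple[int, ...]  # allocation[i] = bidder who gets item i, or -1 if unsold


def bundle_of(allocation: Allocation, j: int) -> FrozenSet[int]:
    """Items that bidder j receives under the allocation."""
    return frozenset(i for i, owner in enumerate(allocation) if owner == j)


def weighted_welfare(allocation: Allocation, values, weights, boost, cost,
                     exclude: Optional[int] = None) -> float:
    """Weighted welfare plus boost minus cost, optionally ignoring one bidder's value."""
    total = boost(allocation) - cost(allocation)
    for j, (value, weight) in enumerate(zip(values, weights)):
        if j != exclude:
            total += weight * value(bundle_of(allocation, j))
    return total


def run_ama(values: Sequence[Callable], weights: Sequence[float],
            boost: Callable, cost: Callable, m: int) -> Tuple[Allocation, List[float]]:
    """Return the AMA allocation Q* and each bidder's payment by enumeration."""
    n = len(values)
    all_allocations = list(product(range(-1, n), repeat=m))
    q_star = max(all_allocations,
                 key=lambda Q: weighted_welfare(Q, values, weights, boost, cost))
    payments = []
    for j in range(n):
        q_minus_j = max(all_allocations,
                        key=lambda Q: weighted_welfare(Q, values, weights, boost, cost, exclude=j))
        others_without_j = weighted_welfare(q_minus_j, values, weights, boost, cost, exclude=j)
        others_at_q_star = weighted_welfare(q_star, values, weights, boost, cost, exclude=j)
        payments.append((others_without_j - others_at_q_star) / weights[j])
    return q_star, payments


def additive(item_values: Sequence[float]) -> Callable[[FrozenSet[int]], float]:
    """Additive valuation built from per-item values."""
    return lambda bundle: sum(item_values[i] for i in bundle)


no_boost = lambda Q: 0
no_cost = lambda Q: 0

# One item, two bidders valuing it at 5 and 3, unit weights, no boosts.
print(run_ama([additive([5]), additive([3])], [1, 1], no_boost, no_cost, m=1))
```

With unit weights and no boosts on a single item, this reduces to a second-price auction, which gives a quick sanity check. The enumeration visits $(n+1)^m$ allocations, so it is only practical for toy examples.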
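Lemma 3.21. Let $\mathcal{M}$, $\mathcal{M}'$, and $\mathcal{M}''$ be the classes of AMAs, VVCAs, and $\lambda$-auctions, respectively. Letting $t = (n + 1)^{2m+1}$, we have that $\mathcal{M}$ is $\left(2n(n + 1) + (n + 1)^{m+1}, t\right)$-delineable, $\mathcal{M}'$ is $\left(n2^m(3 + 2n), t\right)$-delineable, and $\mathcal{M}''$ is $\left((n + 1)^m, t\right)$-delineable.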
In Table 4, we compare this lemma with results from prior research [10]. Lemma 3.21 implies exponentially-many samples are sufficient to ensure that average profit over the samples and expected profit are close. In Section 3.5, we prove an exponential number of samples is also necessary. We now study two hierarchies of AMAs. In Section 5, we show how to learn which level of the hierarchy optimizes the tradeoff between generalization and profit for the setting at hand.
Q-boosted AMAs and λ-auctions. Let Q be a set of allocations. The set of Q-boosted AMAs (resp., λ-auctions) consists of all AMAs (resp., λ-auctions) where only allocations in Q are boosted. In other words, if λ (Q) > 0, then Q ∈ Q.
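Lemma 3.22. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of $\mathcal{Q}$-boosted AMAs and λ-auctions. Then $\mathcal{M}$ is $\left(3n(n + |\mathcal{Q}|), (n+1)^{2(m+1)}\right)$-delineable and $\mathcal{M}'$ is $\left(|\mathcal{Q}|, (n+1)(|\mathcal{Q}|+1)^2\right)$-delineable.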
3.4.4 Lotteries.
We now apply Theorem 3.9 to lottery menus. Lotteries are randomized mechanisms which are known to generate higher expected revenue than deterministic mechanisms in many settings (e.g., [28, 56, 74]). A length-$\ell$ lottery menu is a set $M = \left\{ \left(\phi^{(0)}, p^{(0)}\right), \left(\phi^{(1)}, p^{(1)}\right), \dots, \left(\phi^{(\ell)}, p^{(\ell)}\right) \right\} \subseteq \mathbb{R}^m \times \mathbb{R}$, where $\phi^{(0)} = 0$ and $p^{(0)} = 0$. As is typical in the lottery literature, we assume there is a single additive buyer; our results easily generalize to unit-demand buyers and multiple buyers (see Appendix 8.1 for details). Under the lottery $\left(\phi^{(j)}, p^{(j)}\right)$, the buyer receives each item $i$ with probability $\phi^{(j)}[i]$ and pays a price of $p^{(j)}$. Since $v_1(e_i) \cdot \phi^{(j)}[i]$ is his value for item $i$ times the probability that he receives that item, his expected utility is $\sum_{i=1}^m v_1(e_i) \cdot \phi^{(j)}[i] - p^{(j)}$. Given a buyer with values defined by $v$, let $(\phi_v, p_v) \in M$ be the lottery that maximizes the buyer’s expected utility. We use the notation $q \sim \phi_v$ to denote a random allocation of the lottery $(\phi_v, p_v)$. Specifically, for all $i \in [m]$, $q[i] = 1$ with probability $\phi_v[i]$ and $q[i] = 0$ with probability $1 - \phi_v[i]$. The expected profit is $\text{profit}_M(v) = p_v - \mathbb{E}_{q \sim \phi_v}[c(q)]$.

Let $\mathcal{M}$ be the class of all length-$\ell$ lottery menus. The key challenge in bounding $\text{Pdim}(\mathcal{M})$ is that $\mathbb{E}_{q \sim \phi_v}[c(q)]$ is not a piecewise linear function of the parameters $\phi^{(0)}, \dots, \phi^{(\ell)}$. To overcome this challenge, rather than bounding $\text{Pdim}(\mathcal{M})$, we bound the pseudo-dimension of a related class $\mathcal{M}'$. We then show that optimizing over $\mathcal{M}'$ amounts to optimizing over $\mathcal{M}$ itself. To motivate the definition of $\mathcal{M}'$, notice that if $z \sim U([0,1]^m)$, the probability that $z[j]$ is smaller than $\phi_v[j]$ is $\phi_v[j]$. Therefore,
$$\mathbb{E}_{q \sim \phi_v}[c(q)] = \mathbb{E}_z\left[ c\left( \sum_{j : z[j] < \phi_v[j]} e_j \right) \right].$$
For a lottery menu $M$, we define $\text{profit}'_M(v, z) := p_v - c\left( \sum_{j : z[j] < \phi_v[j]} e_j \right)$ and define $\mathcal{M}' = \left\{ \text{profit}'_M : M \in \mathcal{M} \right\}$. The important insight is that $\mathcal{M}'$ is delineable because for a fixed pair $(v, z)$, both the lottery the buyer chooses and the bundle $\sum_{j : z[j] < \phi_v[j]} e_j$ are defined by a set of hyperplanes.

Lemma 3.23. The class $\mathcal{M}'$ is $\left(\ell(m+1), (\ell+1)^2 + m\ell\right)$-delineable.

The following lemma guarantees that optimizing over $\mathcal{M}'$ amounts to optimizing over $\mathcal{M}$ itself. It follows from the fact that for all $v$ and $M$, $\text{profit}_M(v) = \mathbb{E}_z\left[\text{profit}'_M(v, z)\right]$.

Lemma 3.24. With probability $1 - \delta$ over the draw $\left\{ \left(v^{(1)}, z^{(1)}\right), \dots, \left(v^{(N)}, z^{(N)}\right) \right\} \sim \left(D \times U([0,1]^m)\right)^N$, for all mechanisms $M \in \mathcal{M}$,
$$\left| \frac{1}{N} \sum_{i=1}^N \text{profit}'_M\left(v^{(i)}, z^{(i)}\right) - \mathbb{E}_{v \sim D}\left[\text{profit}_M(v)\right] \right| \leq 120U \sqrt{\frac{\text{Pdim}(\mathcal{M}')}{N}} + 4U \sqrt{\frac{2 \ln(4/\delta)}{N}}.$$

As we have seen in this section, a wide variety of mechanism classes are delineable. Therefore, Theorem 3.9 immediately implies a generalization guarantee $\epsilon_{\mathcal{M}}(N, \delta)$ for a diverse array of mechanism classes $\mathcal{M}$.
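The coupling between a lottery's random allocation and a uniform vector z can be made concrete with a small sketch. It is an illustration under assumptions, not the paper's code: the function names are hypothetical, a menu is a list of (phi, p) pairs, bundles are 0/1 tuples, the buyer is additive, and ties in the buyer's choice go to the first maximizer in the list.

```python
from itertools import product

def chosen_lottery(menu, v):
    """Return the (phi, p) pair maximizing the additive buyer's expected utility."""
    return max(menu, key=lambda lot: sum(vi * fi for vi, fi in zip(v, lot[0])) - lot[1])

def expected_profit(menu, v, cost):
    """profit_M(v) = p_v - E_{q ~ phi_v}[c(q)], computed by enumerating all 2^m bundles."""
    phi, p = chosen_lottery(menu, v)
    exp_cost = 0.0
    for q in product((0, 1), repeat=len(phi)):
        prob = 1.0
        for qi, fi in zip(q, phi):
            prob *= fi if qi else 1.0 - fi
        exp_cost += prob * cost(q)
    return p - exp_cost

def profit_prime(menu, v, z, cost):
    """profit'_M(v, z): item i is allocated exactly when z[i] < phi_v[i]."""
    phi, p = chosen_lottery(menu, v)
    q = tuple(int(zi < fi) for zi, fi in zip(z, phi))
    return p - cost(q)
```

For fixed v and z, `profit_prime` depends on the menu only through threshold comparisons, which is what makes the related class amenable to the hyperplane argument, while averaging it over uniform z recovers `expected_profit`.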
3.5 Sample complexity
Generalization guarantees can be inverted to provide sample complexity guarantees. A sample complexity guarantee bounds the number of samples required to ensure a desired bound on generalization. In other words, suppose that for some $\delta \in (0,1)$ and some $\epsilon > 0$, the mechanism designer requires that with probability at least $1 - \delta$, for all mechanisms in the class $\mathcal{M}$, average profit is $\epsilon$-close to expected profit. By choosing $N$ so that $\epsilon_{\mathcal{M}}(N, \delta) < \epsilon$, we can be assured that this will be the case. This leads to the following well-known remark.

Remark 3.25. Let $\mathcal{M}$ be a mechanism class with a generalization guarantee $\epsilon_{\mathcal{M}}(N, \delta) = U\sqrt{\frac{C_1}{N}} + 4U\sqrt{\frac{2}{N} \log \frac{C_2}{\delta}}$ for some $C_1, C_2 \in \mathbb{R}$ and let $D$ be the distribution over buyers’ values. Then for $\delta \in (0,1)$ and $\epsilon > 0$, a sample size of $N = \frac{2U^2}{\epsilon^2}\left(C_1 + 32 \log \frac{C_2}{\delta}\right)$ is sufficient to ensure that with probability at least $1 - \delta$ over the draw of a set $S \sim D^N$, for any mechanism $M \in \mathcal{M}$, the difference between the average profit of $M$ over $S$ and the expected profit of $M$ over $D$ is at most $\epsilon$.

For every mechanism class $\mathcal{M}$ we study, $\epsilon_{\mathcal{M}}(N, \delta) = U\sqrt{\frac{C_1}{N}} + 4U\sqrt{\frac{2}{N} \log \frac{C_2}{\delta}}$ for some $C_1, C_2 \in \mathbb{R}$, so Remark 3.25 applies. Notice that the number of samples required to ensure a strong generalization guarantee for the classes of AMAs, VVCAs, and λ-auctions is exponential in $m$. We prove that this exponential factor is, in fact, necessary.
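The inversion in Remark 3.25 follows from the inequality $(a + b)^2 \leq 2a^2 + 2b^2$ applied to the two square-root terms. The sketch below evaluates both sides numerically. The function names and the sample parameter values are hypothetical, and the logarithm is assumed to be the natural logarithm (the argument goes through for any base as long as the same base is used in both formulas).

```python
import math

def guarantee(N, delta, U, C1, C2):
    """epsilon_M(N, delta) = U*sqrt(C1/N) + 4U*sqrt((2/N) * log(C2/delta))."""
    return U * math.sqrt(C1 / N) + 4 * U * math.sqrt(2 * math.log(C2 / delta) / N)

def sufficient_samples(eps, delta, U, C1, C2):
    """Sample size from Remark 3.25, rounded up to an integer."""
    return math.ceil(2 * U**2 / eps**2 * (C1 + 32 * math.log(C2 / delta)))

# Hypothetical constants, chosen only to exercise the formulas.
N = sufficient_samples(eps=0.1, delta=0.05, U=1.0, C1=50.0, C2=4.0)
achieved = guarantee(N, delta=0.05, U=1.0, C1=50.0, C2=4.0)
```

Because the inequality is generally loose, `achieved` is typically somewhat smaller than the target, so the sample size in the remark is sufficient rather than minimal.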
Theorem 3.26. For a class of auctions $\mathcal{M}$, let $N_{\mathcal{M}}(\epsilon, \delta)$ be the number of samples required to ensure that for any distribution $D$, with probability at least $1 - \delta$ over the draw of a set of samples of size $N_{\mathcal{M}}(\epsilon, \delta)$ from $D$, for all auctions $M \in \mathcal{M}$, average profit is $\epsilon$-close to expected profit.
1. If $\mathcal{M}$ is the set of AMAs or λ-auctions, then $N_{\mathcal{M}}(\epsilon, \delta) \geq \frac{n^m - n}{2}$.
2. If $\mathcal{M}$ is the set of VVCAs, then $N_{\mathcal{M}}(\epsilon, \delta) \geq 2^m - 2$.
We prove Theorem 3.26 in Appendix 8.3. Thus, delineability implies not only generalization guarantees but also sample complexity guarantees which are nearly tight for AMAs and VVCAs.
4 Distribution-dependent generalization guarantees
In this section, we provide two ways to strengthen the results in Section 3 when the buyers’ values are additive and drawn from item-independent distributions. We say a distribution over buyers’ values is item-independent if for all bidders $i_1, i_2 \in [n]$ and all items $j, j' \in [m]$ with $j \neq j'$, bidder $i_1$’s value for item $j$ is independent from her value for item $j'$, but her value for item $j$ may be arbitrarily correlated with bidder $i_2$’s value for item $j$ or $j'$. We also require that the mechanism class’s profit functions decompose additively. For example, under item-pricing mechanisms, the profit function decomposes into the profit obtained from selling item 1, plus the profit obtained by selling item 2, and so on. Surprisingly, our bounds do not depend on the number of items and under anonymous prices, they do not depend on the number of bidders either.

To obtain these distribution-dependent guarantees, we move from pseudo-dimension to Rademacher complexity [13, 51]. This is the key advantage of Rademacher complexity over pseudo-dimension: pseudo-dimension implies generalization guarantees that are worst-case over the distribution, whereas Rademacher complexity implies distribution-dependent guarantees. We prove that this shift to Rademacher complexity from pseudo-dimension is in fact necessary in order to obtain guarantees that are independent of the number of items (Theorem 4.7).

Before defining Rademacher complexity, we make the notion of a distribution-dependent generalization guarantee more formal, as follows.
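Definition 4.1. A distribution-dependent generalization guarantee for a mechanism class $\mathcal{M}$ and a distribution $D$ over buyers’ values is a function $\epsilon^D_{\mathcal{M}} : \mathbb{Z}_{\geq 1} \times (0,1) \to \mathbb{R}_{\geq 0}$ defined such that for any sample size $N \in \mathbb{Z}_{\geq 1}$ and any $\delta \in (0,1)$, with probability at least $1 - \delta$ over the draw of a set $S \sim D^N$, for any mechanism $M$ in $\mathcal{M}$, the difference between the average profit of $M$ over $S$ and the expected profit of $M$ over $D$ is at most $\epsilon^D_{\mathcal{M}}(N, \delta)$. In other words,
$$\Pr_{S \sim D^N}\left[ \exists M \in \mathcal{M} \text{ such that } \left| \frac{1}{N} \sum_{v \in S} \text{profit}_M(v) - \mathbb{E}_{v \sim D}\left[\text{profit}_M(v)\right] \right| > \epsilon^D_{\mathcal{M}}(N, \delta) \right] < \delta.$$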
The generalization guarantee $\epsilon_{\mathcal{M}}$ is worst case in that it holds for any distribution $D$. In contrast, the distribution-dependent generalization guarantee $\epsilon^D_{\mathcal{M}}$ may be much tighter when the distribution $D$ is “well-behaved”.

We now define Rademacher complexity, which allows us to characterize distribution-dependent generalization guarantees. At a high level, Rademacher complexity measures the extent to which a class of mechanism profit functions fit random noise. Intuitively, this measures the richness of a class of mechanisms because more complex classes should be able to fit random noise better than simple classes. Empirical Rademacher complexity can be measured on the set of samples and implies generalization guarantees that improve based on structure exhibited by the set of samples. Formally, given a set $S = \left\{ v^{(1)}, \dots, v^{(N)} \right\}$, the empirical Rademacher complexity of $\mathcal{M}$ with respect to $S$ is defined as
$$\widehat{R}_S(\mathcal{M}) = \mathbb{E}_\sigma\left[ \sup_{M \in \mathcal{M}} \frac{1}{N} \sum_{i=1}^N \sigma_i \cdot \text{profit}_M\left(v^{(i)}\right) \right],$$
where $\sigma_i \sim U(\{-1, 1\})$. Classic results from learning theory [13, 51, 70] imply the distribution-dependent generalization guarantee
$$\epsilon^D_{\mathcal{M}}(N, \delta) = 2\, \mathbb{E}_{S \sim D^N}\left[ \widehat{R}_S(\mathcal{M}) \right] + U \sqrt{\frac{2 \ln(2/\delta)}{N}},$$
where $U$ is the maximum profit achievable by mechanisms in $\mathcal{M}$ over the support of $D$ (for example, Theorem 26.5 by Shalev-Shwartz and Ben-David [70]). In other words, with probability $1 - \delta$ over the draw $\left\{ v^{(1)}, \dots, v^{(N)} \right\} \sim D^N$, for any mechanism $M \in \mathcal{M}$,
$$\left| \frac{1}{N} \sum_{i=1}^N \text{profit}_M\left(v^{(i)}\right) - \mathbb{E}_{v \sim D}\left[\text{profit}_M(v)\right] \right| \leq 2\, \mathbb{E}_{S \sim D^N}\left[ \widehat{R}_S(\mathcal{M}) \right] + U \sqrt{\frac{2 \ln(2/\delta)}{N}}. \quad (1)$$
If the distribution $D$ is “well-behaved” in the sense that $\mathbb{E}_{S \sim D^N}\left[ \widehat{R}_S(\mathcal{M}) \right]$ is small, then Equation 1 will provide a generalization guarantee that might be much tighter than the worst-case guarantee based on pseudo-dimension from Theorem 3.9.

In the following corollary of Theorem 3.9, we show that if the profit functions of a class $\mathcal{M}$ decompose additively into a number of simpler functions, then we can easily bound $\widehat{R}_S(\mathcal{M})$ using the Rademacher complexity of those simpler functions. We then demonstrate the power of this corollary by proving stronger guarantees for many well-studied mechanism classes when the bidders are additive and their valuations are drawn from item-independent distributions. This includes bidders with values drawn from product distributions as a special case, a setting which has been extensively studied in the mechanism design literature (e.g., [5, 20, 22, 39, 44, 81]).

We say that a mechanism class $\mathcal{M}$ decomposes additively if for all mechanisms $M \in \mathcal{M}$, there exist $T$ functions $f_{1,M}, \dots, f_{T,M}$ such that the function $\text{profit}_M$ can be written as $\text{profit}_M(\cdot) = f_{1,M}(\cdot) + \cdots + f_{T,M}(\cdot)$. We note that this decomposition is not necessarily into coordinates (for example, it is not necessary that each function $f_{i,M}$ corresponds to some buyer or good).

Corollary 4.2. Suppose that $\mathcal{M}$ is a set of additively decomposable mechanisms. Moreover, suppose that for all mechanisms $M \in \mathcal{M}$, the range of $f_{i,M}$ over the support of $D$ is $[0, U_i]$ and that the class $\{f_{i,M} : M \in \mathcal{M}\}$ is $(d_i, t_i)$-delineable. Then for any set of samples $S \sim D^N$, $\widehat{R}_S(\mathcal{M}) \leq 180 \sum_{i=1}^T U_i \sqrt{d_i \log(4 d_i t_i) / N}$.

Proof.
The corollary follows from Theorem 3.9 and Lemma 4.3, which shows that pseudo-dimension can be used to bound Rademacher complexity, and from the fact that for any sets $G$ and $G'$ of functions mapping an abstract domain $A$ to $\mathbb{R}$ and any set $S \subseteq A$, $\widehat{R}_S\left(\{g + g' : g \in G, g' \in G'\}\right) \leq \widehat{R}_S(G) + \widehat{R}_S(G')$.

It is well-known that Rademacher complexity and pseudo-dimension are closely connected, as summarized by the following lemma.

Lemma 4.3 ([34, 65]). For any mechanism class $\mathcal{M}$ and any distribution $D$ over buyers’ values, let $U$ be the maximum profit achievable by mechanisms in $\mathcal{M}$ over the support of $D$. Then for any set of samples $S$ of size $N$, $\widehat{R}_S(\mathcal{M}) = O\left( U \sqrt{\frac{\text{Pdim}(\mathcal{M})}{N}} \right)$.
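For intuition about the empirical Rademacher complexity used in this section, the following sketch computes it exactly for a finite set of mechanisms on a small sample by enumerating all $2^N$ sign vectors. It is an illustration under assumptions rather than part of the paper: the function name is hypothetical, and the class is represented by a table whose row k holds the profit of mechanism k on each sample.

```python
from itertools import product

def empirical_rademacher(profits):
    """Exact empirical Rademacher complexity of a finite class.

    profits[k][i] is the profit of mechanism k on sample i. The expectation
    over sigma is taken by enumerating all 2^N sign vectors, so this is only
    practical for small N.
    """
    N = len(profits[0])
    total = 0.0
    for sigma in product((-1, 1), repeat=N):
        # Supremum over the class of the sign-weighted average profit.
        total += max(sum(s * x for s, x in zip(sigma, row)) for row in profits) / N
    return total / 2**N
```

A class containing a single mechanism has complexity zero because the signs average out, and adding mechanisms can only increase the value, matching the intuition that richer classes fit random noise better.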
We now instantiate Corollary 4.2 for several mechanism classes. The proofs are in Appendix 10.

Lemma 4.4. Let $\mathcal{M}$ and $\mathcal{M}'$ be the sets of second-price auctions with anonymous and non-anonymous reserves. Suppose the bidders are additive, $D$ is item-independent, and the cost function is additive. For any set $S \sim D^N$, $\widehat{R}_S(\mathcal{M}) \leq 180U\sqrt{1/N}$ and $\widehat{R}_S(\mathcal{M}') \leq 180U\sqrt{n \log(4n)/N}$.

Lemma 4.5. Let $\mathcal{M}$ and $\mathcal{M}'$ be the sets of anonymous and non-anonymous item-pricing mechanisms, respectively. Suppose the bidders are additive, $D$ is item-independent, and the cost function is additive. For any set of samples $S \sim D^N$, $\widehat{R}_S(\mathcal{M}) \leq 180U\sqrt{1/N}$ and $\widehat{R}_S(\mathcal{M}') \leq 180U\sqrt{n \log(4n)/N}$.
Menus of item lotteries. A length-$\ell$ item lottery menu is a set of $\ell$ lotteries per item. The menu for item $i$ is $M_i = \left\{ \left(\phi_i^{(0)}, p_i^{(0)}\right), \left(\phi_i^{(1)}, p_i^{(1)}\right), \dots, \left(\phi_i^{(\ell)}, p_i^{(\ell)}\right) \right\}$, where $\phi_i^{(0)} = p_i^{(0)} = 0$. The buyer chooses one lottery $\left(\phi_i^{(j_i)}, p_i^{(j_i)}\right)$ per menu $M_i$, pays $\sum_{i=1}^m p_i^{(j_i)}$, and receives each item $i$ with probability $\phi_i^{(j_i)}$.

Lemma 4.6. Let $\mathcal{M}$ be the set of length-$\ell$ item lottery menus. If the bidder is additive, $D$ is item-independent, and the cost function is additive, then for any set $S \sim D^N$, $\widehat{R}_S(\mathcal{M}) \leq 180U\sqrt{2\ell \log\left(8\ell^3\right)/N}$.
Finally, we prove lower bounds showing that one could not hope to prove the generalization guarantees implied by Lemmas 4.4 and 4.5 using pseudo-dimension alone.

Theorem 4.7. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of anonymous and non-anonymous item-pricing mechanisms. Then $\text{Pdim}(\mathcal{M}) \geq m$ and $\text{Pdim}(\mathcal{M}') \geq nm$. The same holds if $\mathcal{M}$ and $\mathcal{M}'$ are the classes of second-price auctions with anonymous and non-anonymous reserves.
As Theorem 4.7 demonstrates, shifting to Rademacher complexity from pseudo-dimension is necessary in order to obtain guarantees that are independent of the number of items (Lemmas 4.4 and 4.5).
5 Optimizing the profit-generalization tradeoff
In this section, we use our results from Section 3 to provide tools for optimizing the profit-generalization tradeoff, taking inspiration from classic machine learning results on structural risk minimization [16, 76, 77]. We begin by demonstrating this tradeoff pictorially. For the sake of illustration, suppose that $\mathcal{M}$ is a mechanism class that decomposes into a nested sequence of subclasses $\mathcal{M}_1 \subseteq \cdots \subseteq \mathcal{M}_t = \mathcal{M}$. For example, if $\mathcal{M}$ is the class of AMAs, then $\mathcal{M}_k$ could be the class of all $\mathcal{Q}$-boosted AMAs with $|\mathcal{Q}| = k$. Prior work [10] gave uniform convergence bounds for AMAs without taking advantage of the class’s hierarchical structure. We illustrate†† uniform convergence bounds with the purple dashed line in Figure 5 with $t = 4$. On the x-axis, we illustrate the intrinsic complexity of the nested subclasses, which can be measured using pseudo-dimension, for example. On the y-axis, for $i = 1, 2, 3, 4$, we illustrate the average profit over a fixed set of samples $S$ of the mechanism $\hat{M}_i \in \mathcal{M}_i$ that maximizes average profit (the orange solid line). In particular, the dot on the orange solid line above $\mathcal{M}_i$ illustrates $\text{profit}_S(\hat{M}_i)$. Since $\mathcal{M}_i \subseteq \mathcal{M}_j$ for $i \leq j$, the average profit of $\hat{M}_j$ over $S$ is at least as high as the average profit of $\hat{M}_i$ over $S$, which is why the orange solid line is increasing. Similarly, the dot on the blue dotted line above $\mathcal{M}_i$ illustrates the expected profit of $\hat{M}_i$. This blue dotted line begins decreasing when the complexity of the subclass grows to the point that overfitting occurs: a mechanism’s average profit over the samples is no longer indicative of its expected profit on the unknown distribution. The purple dashed line illustrates a naïve lower bound on the expected profit of $\hat{M}_i$, which is equal to $\text{profit}_S(\hat{M}_i) - \epsilon_{\mathcal{M}}(N, \delta)$. Since the average profit $\text{profit}_S(\hat{M}_i)$ over $S$ increases with $i$, this lower bound also increases with $i$, so the mechanism designer may erroneously think that $\hat{M}_4$ is the best mechanism to field.
Our general theorem allows us to be more careful since we can easily derive bounds $\epsilon_{\mathcal{M}_i}(N, \delta)$ for each class $\mathcal{M}_i$. Then, we can spread the confidence parameter $\delta$ across all subsets $\mathcal{M}_1, \dots, \mathcal{M}_t$ using a weight function $w : \mathbb{N} \to [0,1]$ such that $\sum_i w(i) \leq 1$. More formally, by a union bound, we are guaranteed that with probability at least $1 - \delta$, for all mechanisms $M \in \mathcal{M}$, the difference between $\text{profit}_S(M)$ and $\text{profit}_D(M)$ is at most $\min_{i : M \in \mathcal{M}_i} \epsilon_{\mathcal{M}_i}(N, \delta \cdot w(i))$. This is illustrated by the green dashed-dotted line in Figure 5, where for $i = 1, 2, 3, 4$, the lower bound on the expected profit of $\hat{M}_i$ is its average profit minus $\epsilon_{\mathcal{M}_i}(N, \delta \cdot w(i))$. This complexity-dependent lower bound indicates when overfitting begins. By maximizing this complexity-dependent lower bound on expected profit, the designer can correctly determine that $\hat{M}_2$ is a better mechanism to field than $\hat{M}_4$. Both the decomposition of $\mathcal{M}$ into subsets and the choice of a weight function allow the designer to encode his prior knowledge about the market. For example, if mechanisms in $\mathcal{M}_i$ are likely more profitable than others, he can increase $w(i)$. The larger the weight $w(i)$ assigned to $\mathcal{M}_i$ is, the larger $\delta \cdot w(i)$ is, and a larger $\delta \cdot w(i)$ implies a smaller $\epsilon_{\mathcal{M}_i}(N, \delta \cdot w(i))$, thereby implying stronger guarantees.

††These figures are purely illustrative; they are not based on a simulation or real data.

Figure 5: An illustration of uniform generalization guarantees versus stronger complexity-dependent bounds for a mechanism class $\mathcal{M}$ that decomposes into a nested sequence of subclasses $\mathcal{M}_1 \subseteq \mathcal{M}_2 \subseteq \mathcal{M}_3 \subseteq \mathcal{M}_4$. The x-axis is meant to measure each subclass’s intrinsic complexity using a quantity such as pseudo-dimension. Given a fixed set of samples $S$, the orange solid line plots the hypothetical average profit of the mechanism $\hat{M}_i \in \mathcal{M}_i$ that maximizes average profit over the samples. Specifically, the dot on the orange solid line above $\mathcal{M}_i$ illustrates $\text{profit}_S(\hat{M}_i)$. Similarly, the dot on the blue dotted line above $\mathcal{M}_i$ illustrates the expected profit of $\hat{M}_i$. The purple dashed line illustrates the lower bound on expected profit given by the uniform generalization guarantee: $\text{profit}_S(\hat{M}_i) - \epsilon_{\mathcal{M}}(N, \delta)$. Meanwhile, the green dashed-dotted line illustrates the lower bound on expected profit given by the complexity-dependent generalization guarantee: $\text{profit}_S(\hat{M}_i) - \epsilon_{\mathcal{M}_i}(N, \delta)$. We describe the figure in more detail in Section 5.

We now present an application of this complexity-dependent analysis to item pricing. Market segmentation is prevalent throughout daily life: movie theaters, amusement parks, and tourist attractions have different admission prices per market segment, with groups such as Child, Student, Adult, and Senior Citizen. Formally, the designer can segment the buyers into $k$ groups and charge each group a different price. If $k = 1$, the prices are anonymous and if $k = n$, they are non-anonymous, thus forming a mechanism hierarchy. For $k \in [n]$, let $\mathcal{M}_k$ be the class of non-anonymous pricing mechanisms where there are $k$ price groups. In other words, for all mechanisms in $\mathcal{M}_k$, there is a partition of the buyers $B_1, \dots, B_k$ such that for all $t \in [k]$, all pairs of buyers $j, j' \in B_t$, and all items $i \in [m]$, $p_j(e_i) = p_{j'}(e_i)$. We derive the following guarantee for this hierarchy.
Theorem 5.1. Let $\mathcal{M}$ be the class of non-anonymous item-pricing mechanisms over additive bidders and let $w : [n] \to \mathbb{R}$ be a weight function such that $\sum_{i=1}^n w(i) \leq 1$. Then for any $\delta \in (0,1)$, with probability at least $1 - \delta$ over the draw $S \sim D^N$, for any $k \in [n]$ and any mechanism $M \in \mathcal{M}_k$,
$$\left| \text{profit}_S(M) - \text{profit}_D(M) \right| \leq 360U \sqrt{\frac{km \log(4nm)}{N}} + 4U \sqrt{\frac{2}{N} \ln \frac{4}{\delta \cdot w(k)}}.$$
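The way Theorem 5.1 can guide the choice of the number of price groups is sketched below. This is an illustrative sketch under assumptions, not a procedure from the paper's text: the function names are hypothetical, both logarithms are taken to be natural, and the average profits passed in would come from maximizing average profit within each level on the designer's own sample.

```python
import math

def segmented_pricing_bound(k, n, m, N, delta, U, weight):
    """Right-hand side of Theorem 5.1 for k price groups with weight w(k)."""
    return (360 * U * math.sqrt(k * m * math.log(4 * n * m) / N)
            + 4 * U * math.sqrt(2 / N * math.log(4 / (delta * weight))))

def best_level(avg_profits, n, m, N, delta, U, weights):
    """Pick the level k maximizing avg_profits[k] minus the Theorem 5.1 bound.

    avg_profits[k] is the best average profit found in level k on the sample;
    weights[k] is w(k), with the weights summing to at most 1.
    """
    lower = {
        k: avg_profits[k] - segmented_pricing_bound(k, n, m, N, delta, U, weights[k])
        for k in avg_profits
    }
    return max(lower, key=lower.get), lower
```

With uniform weights, the bound grows with k, so a level is preferred only if its gain in average profit outweighs the loss in the guarantee. Non-uniform weights let the designer favor levels believed to be more profitable.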
We also prove the following theorem for the hierarchy of AMAs defined by the classes of $\mathcal{Q}$-boosted AMAs. For an AMA $M$, let $\mathcal{Q}_M$ be the set of all allocations $Q$ such that $\lambda(Q) > 0$.
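Theorem 5.2. Let $\mathcal{M}$ be the class of AMAs and let $w$ be a weight function that maps sets of allocations $\mathcal{Q}$ to $[0,1]$ such that $\sum_{\mathcal{Q}} w(\mathcal{Q}) \leq 1$. With probability $1 - \delta$ over the draw $S \sim D^N$, for any $M \in \mathcal{M}$,
$$\left| \text{profit}_S(M) - \text{profit}_D(M) \right| \leq 360U \sqrt{\frac{nm\left(n + |\mathcal{Q}_M|\right) \log(4n)}{N}} + 4U \sqrt{\frac{2}{N} \ln \frac{4}{\delta \cdot w(\mathcal{Q}_M)}}.$$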
Theorem 5.2 follows from Theorems 3.8 and 3.9 as well as Lemma 3.22: we only need to multiply the weight term with δ as it appears in the resulting bound. Lemmas 3.15, 3.22, and 3.23 similarly imply results for two-part tariffs, Q-boosted λ-auctions, and lottery menus (see Theorems 11.1, 11.2, and 11.3 in Appendix 11).
6 Comparison of our results to prior research
Here, we provide a brief comparison of this paper’s results to prior research. These works provide generalization guarantees for a subset of the mechanisms we study. We match or improve over the best-known guarantees for many of the special classes that these papers also studied.

Morgenstern and Roughgarden [62] studied “simple” multi-item mechanisms: item-pricing mechanisms and second-price item auctions. See Appendix 9.1 and Tables 3 and 4 for a comparison of our guarantees.

Syrgkanis [71] provided generalization guarantees specifically for the mechanism that maximizes average revenue over the samples. This is in contrast to our bounds, which apply uniformly to every mechanism in a given class. This is crucial when it may not be computationally feasible to determine a mechanism with highest average revenue over the samples, but only an approximation. Syrgkanis [71] analyzed multi-item, multi-bidder item-pricing mechanisms when the bidders have additive valuations and there is no supplier cost. Let $\mathcal{M}$ be this mechanism class with anonymous prices and let $\mathcal{M}'$ be this class with non-anonymous prices. For a set $S$ of samples, let $\hat{M}$ and $\hat{M}'$ be the mechanisms in $\mathcal{M}$ and $\mathcal{M}'$ that maximize average revenue over the samples. Syrgkanis [71] proved that with probability $1 - \delta$ over the draw of $S \sim D^N$, $\text{profit}_D(\hat{M}) - \max_{M \in \mathcal{M}} \text{profit}_D(M) = O\left( (U/\delta) \sqrt{m \log(nN)/N} \right)$. When $D$ is item-independent, our bound of $O\left( U\sqrt{\log(1/\delta)/N} \right)$ improves over this bound. Otherwise, our bound of $O\left( U\sqrt{m \log(m)/N} + U\sqrt{\log(1/\delta)/N} \right)$ is incomparable. Syrgkanis [71] also proved that with probability $1 - \delta$, $\text{profit}_D(\hat{M}') - \max_{M \in \mathcal{M}'} \text{profit}_D(M) = O\left( (U/\delta) \sqrt{nm \log(N)/N} \right)$. Our bound of $O\left( U\sqrt{nm \log(nm)/N} + U\sqrt{\log(1/\delta)/N} \right)$ is incomparable.
Cai and Daskalakis [20] provided learning algorithms for bidders with valuations drawn from product distributions, which are item-independent‡‡. At a high level, their algorithms return mechanisms with expected revenue that is nearly a constant fraction of the optimal expected revenue ($OPT$) obtainable by any randomized and Bayesian truthful mechanism. When the buyers have additive valuations, they proved that given $N$ samples, the difference between the expected revenue of the mechanism their algorithm outputs and $\frac{OPT}{32}$ is $\epsilon^D_{\mathcal{M}}(N, \delta)$ (as defined in Definition 4.1), where $D$ is the distribution over buyers’ values and $\mathcal{M}$ is the set of non-anonymous item-pricing mechanisms (Lemma 15 and Theorem 13 of the full paper by Cai and Daskalakis [21]). At the time, the bound of Morgenstern and Roughgarden [62], $\epsilon^D_{\mathcal{M}}(N, \delta) = O\left( U\sqrt{nm \log(nm)/N} + U\sqrt{\log(1/\delta)/N} \right)$, was the best-known bound.§§ Via Lemma 4.5, we improve this to $\epsilon^D_{\mathcal{M}}(N, \delta) = O\left( U\sqrt{n \log n / N} + U\sqrt{\log(1/\delta)/N} \right)$, removing the dependence on the number of items. They also provided algorithms for bidders with other types of valuations, such as subadditive and XOS. In these cases, their algorithms return item-pricing mechanisms with entry fees. Our main theorem would provide pessimistic guarantees for these mechanisms due to the exponentially large number of parameters.

Medina and Vassilvitskii [58] studied single-bidder, multi-item pricing in a different model from ours, where there is no bound on the number of items but each item is defined by a feature vector. Their pricing algorithm has access to a bid predictor mapping from feature vectors to bids. They related their algorithm’s performance to the bid predictor’s accuracy, among other factors.

Devanur et al. [31] studied several single-item auction classes, including the class $\mathcal{M}$ of second price item auctions with non-anonymous reserves and no cost function.
When the buyers’ values are contained in the interval $[1, U]$, they proved that $N = O\left( (U/\epsilon)^2 \left( n \log(U/\epsilon) + \log(1/\delta) \right) \right)$ samples are sufficient to ensure with probability $1 - \delta$ over the draw $S \sim D^N$, for all $M \in \mathcal{M}$, $\left| \text{profit}_S(M) - \text{profit}_D(M) \right| \leq \epsilon$ (Section 6.1 of the full paper by Devanur et al. [31]). Our Lemma 3.19 for the multi-item case, specialized to the single-item setting, implies that $O\left( \left(\frac{U}{\epsilon}\right)^2 \left( n \log n + \log \frac{1}{\delta} \right) \right)$ samples are sufficient, which is incomparable to Devanur et al.’s bound due to the log factors.

Gonczarowski and Weinberg [42] study a setting where there are $n$ bidders with additive, independent values in the interval $[0, H]$ for $m$ items, as well as a generalization to Lipschitz valuation functions. They prove that $\text{poly}(n, m, H, 1/\epsilon)$ samples are sufficient to learn an approximately Bayesian incentive compatible mechanism whose revenue is within an $\epsilon$-factor of optimal. They prove the same result for approximately dominant strategy incentive compatible mechanisms as well. This is an information-theoretic result: their learning algorithm is not computationally efficient since it is not known how to find an $\epsilon$-approximately optimal mechanism in polynomial time even when the distribution over buyers’ values is known explicitly. In contrast, we are able to handle arbitrarily correlated distributions over combinatorial valuations, though our guarantees apply only to parameterized mechanism classes that may not compete with the optimal incentive compatible mechanism. Meanwhile, Gonczarowski and Weinberg [42] bound the number of samples sufficient to compete with the optimal incentive compatible mechanism, and thus are not restricted to a parameterized set of mechanisms, but require the buyers’ values to be independent and Lipschitz.
‡‡Item-independent distributions are discussed in Section 4.
§§For this bound by Morgenstern and Roughgarden [62] to hold, revenue must be contained in the interval $[0, U]$, as we assume throughout this paper.
7 Conclusion
We studied profit maximization while relaxing the assumption that the mechanism designer knows the distribution over buyers’ values. We only assumed he has a set of samples from that distribution. We provided a unifying framework for bounding the complexity of a wide variety of mechanism classes. We characterized structural similarities of mechanism classes including non-linear pricing mechanisms, combinatorially challenging variants of the VCG mechanism such as affine maximizer auctions, and randomized mechanisms (lotteries): profit is a piecewise-linear function of the mechanism class’s parameters. These similarities led us to a general theorem that offers generalization guarantees for a broad range of mechanism classes. It offers the first generalization guarantees for many important mechanism design settings. It also matches or improves over many of the special-purpose generalization guarantees already provided in the nascent literature on sample-based mechanism design. Finally, we provided tools for optimizing a fundamental tradeoff in sample-based mechanism design: more complex mechanisms typically have higher average profit over the samples than simpler mechanisms, but require more samples to ensure that average profit over the samples nearly matches expected profit, that is, to avoid overfitting.
Acknowledgements. This material is based on work supported by the National Science Foundation under grants CCF-1422910, CCF-1535967, CCF-1733556, CCF-1910321, IIS-1617590, IIS-1618714, IIS-1718457, IIS-1901403, SES-1919453, and a Graduate Research Fellowship; the ARO under awards W911NF2010081 and W911NF1710082; the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003; an Amazon Research Award; a Microsoft Research Faculty Fellowship; an AWS Machine Learning Research Award; a Bloomberg Data Science research grant; an IBM PhD Fellowship; and a fellowship from Carnegie Mellon University’s Center for Machine Learning and Health.
References
[1] William James Adams and Janet L. Yellen. Commodity bundling and the burden of monopoly. Quarterly Journal of Economics, 90(3):475–498, Aug 1976.
[2] Noga Alon, Moshe Babaioff, Yannai A Gonczarowski, Yishay Mansour, Shay Moran, and Amir Yehudayoff. Submultiplicative Glivenko-Cantelli and uniform convergence of revenues. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2017.
[3] Victor F. Araman and René Caldentey. Dynamic pricing for nonperishable products with demand learning. Operations Research, 57(5):1169–1188, 2009.
[4] Moshe Babaioff, Nicole Immorlica, Brendan Lucier, and S. Matthew Weinberg. A simple and approximately optimal mechanism for an additive buyer. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2014.
[5] Moshe Babaioff, Yannai A Gonczarowski, and Noam Nisan. The menu-size complexity of revenue approximation. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2017.
[6] Yannis Bakos and Erik Brynjolfsson. Bundling information goods: Pricing, profits, and efficiency. Management Science, 45(12):1613–1630, Dec 1999.
[7] Maria-Florina Balcan, Avrim Blum, Jason D. Hartline, and Yishay Mansour. Mechanism design via machine learning. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pages 605–614, 2005.
[8] Maria-Florina Balcan, Avrim Blum, Jason Hartline, and Yishay Mansour. Reducing mechanism design to algorithm design via machine learning. Journal of Computer and System Sciences, 74:78–89, December 2008.
[9] Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, and Vijay V Vazirani. Learning economic parameters from revealed preferences. In Proceedings of the Conference on Web and Internet Economics (WINE), 2014.
[10] Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Sample complexity of automated mechanism design. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2016.
[11] Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. A general theory of sample complexity for multi-item profit maximization. Proceedings of the ACM Conference on Economics and Computation (EC), 2018.
[12] Maria-Florina Balcan, Siddharth Prasad, and Tuomas Sandholm. Efficient algorithms for learning revenue-maximizing two-part tariffs. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2020.
[13] Peter L Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.
[14] Michael Benisch, Norman Sadeh, and Tuomas Sandholm. Methodology for designing reasonably expressive mechanisms with application to ad auctions. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2009. Earlier version in proceedings of ACM EC-08 Workshop on Advertisement Auctions.
[15] Omar Besbes and Assaf Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.
[16] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Occam’s razor. Information processing letters, 24(6):377–380, 1987.
[17] Josef Broder and Paat Rusmevichientong. Dynamic pricing under a general parametric choice model. Operations Research, 60(4):965–980, 2012.
[18] Sébastien Bubeck, Nikhil R Devanur, Zhiyi Huang, and Rad Niazadeh. Online auctions and multi-scale online learning. Proceedings of the ACM Conference on Economics and Computation (EC), 2017.
[19] R. C. Buck. Partition of space. Amer. Math. Monthly, 50:541–544, 1943. ISSN 0002-9890.
[20] Yang Cai and Constantinos Daskalakis. Learning multi-item auctions with (or without) samples. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2017.
[21] Yang Cai and Constantinos Daskalakis. Learning multi-item auctions with (or without) samples. arXiv preprint arXiv:1709.00228, 2017.
[22] Yang Cai, Nikhil R. Devanur, and S. Matthew Weinberg. A duality based unified approach to Bayesian mechanism design. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2016.
[23] Shuchi Chawla, Jason D Hartline, and Robert Kleinberg. Algorithmic pricing via virtual valuations. In Proceedings of the ACM Conference on Economics and Computation (EC), 2007.
[24] Shuchi Chawla, Jason Hartline, and Denis Nekipelov. Mechanism design for data science. In Proceedings of the ACM Conference on Economics and Computation (EC), 2014.
[25] Richard Cole and Tim Roughgarden. The sample complexity of revenue maximization. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2014.
[26] Michael Collins. Discriminative reranking for natural language parsing. Proceedings of the International Conference on Machine Learning (ICML), 2000.
[27] Vincent Conitzer and Tuomas Sandholm. Complexity of mechanism design. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2002.
[28] Vincent Conitzer and Tuomas Sandholm. Applications of automated mechanism design. In UAI-03 workshop on Bayesian Modeling Applications, 2003.
[29] Vincent Conitzer and Tuomas Sandholm. Self-interested automated mechanism design and implications for optimal combinatorial auctions. In Proceedings of the ACM Conference on Economics and Computation (EC), pages 132–141, 2004.
[30] Nikhil R Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. The sample complexity of auctions with side information. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2016.
[31] Nikhil R Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. The sample complexity of auctions with side information. arXiv preprint arXiv:1511.02296, 2017.
[32] Shahar Dobzinski and Shaddin Dughmi. On the power of randomization in algorithmic mechanism design. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2009.
[33] Shahar Dobzinski and Mukund Sundararajan. On characterizations of truthful mechanisms for combinatorial auctions and scheduling. In Proceedings of the ACM Conference on Economics and Computation (EC), 2008.
[34] Richard Dudley. Universal Donsker classes and metric entropy. The Annals of Probability, 15(4):1306–1326, 1987.
[35] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. The American Economic Review, 97(1):242–259, March 2007. ISSN 0002-8282.
[36] Edith Elkind. Designing and learning optimal finite support auctions. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007.
[37] Michal Feldman, Nick Gravin, and Brendan Lucier. Combinatorial auctions via posted prices. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015.
[38] Martin S Feldstein. Equity and efficiency in public sector pricing: the optimal two-part tariff. The Quarterly Journal of Economics, pages 176–187, 1972.
[39] Kira Goldner and Anna R Karlin. A prior-independent revenue-maximizing auction for multiple additive bidders. In Proceedings of the Conference on Web and Internet Economics (WINE), 2016.
[40] Negin Golrezaei, Adel Javanmard, and Vahab Mirrokni. Dynamic incentive-aware learning: Robust pricing in contextual auctions. Operations Research, 2020. (forthcoming).
[41] Yannai A Gonczarowski and Noam Nisan. Efficient empirical revenue maximization in single-parameter auction environments. In Proceedings of the Annual Symposium on Theory of Computing (STOC), pages 856–868, 2017.
[42] Yannai A Gonczarowski and S Matthew Weinberg. The sample complexity of up-to-ε multi-dimensional revenue maximization. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2018.
[43] Chenghao Guo, Zhiyi Huang, and Xinzhi Zhang. Settling the sample complexity of single-parameter revenue maximization. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2019.
[44] Sergiu Hart and Noam Nisan. Approximate revenue maximization with multiple items. In Proceedings of the ACM Conference on Economics and Computation (EC), 2012.
[45] Jason Hartline and Samuel Taggart. Non-revelation mechanism design. arXiv preprint arXiv:1608.01875, 2016.
[46] David Haussler. Generalizing the PAC model: Sample size bounds from metric dimension-based uniform convergence results. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 1989.
[47] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, and Joaquin Quinonero Candela. Practical lessons from predicting clicks on ads at Facebook. In Proceedings of the International Workshop on Data Mining for Online Advertising, 2014.
[48] Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. Making the most of your samples. In Proceedings of the ACM Conference on Economics and Computation (EC), 2015.
[49] Philippe Jehiel, Moritz Meyer-Ter-Vehn, and Benny Moldovanu. Mixed bundling auctions. Journal of Economic Theory, 134(1):494–512, 2007.
[50] Yash Kanoria and Hamid Nazerzadeh. Incentive-compatible learning of reserve prices for repeated auctions. Operations Research, 2020. (forthcoming).
[51] Vladimir Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Transactions on Information Theory, 47(5):1902–1914, 2001.
[52] Ron Lavi, Ahuva Mu’Alem, and Noam Nisan. Towards a characterization of truthful combinatorial auctions. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2003.
[53] Yi Li, Philip M Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62(3):516–527, 2001.
[54] Anton Likhodedov and Tuomas Sandholm. Methods for boosting revenue in combinatorial auctions. In Proceedings of the AAAI Conference on Artificial Intelligence, 2004.
[55] Anton Likhodedov and Tuomas Sandholm. Approximating revenue-maximizing combinatorial auctions. In Proceedings of the AAAI Conference on Artificial Intelligence, 2005.
[56] Alejandro M Manelli and Daniel R Vincent. Bundling as an optimal selling mechanism for a multiple-good monopolist. Journal of Economic Theory, 127(1):1–35, 2006.
[57] R. Preston McAfee, John McMillan, and Michael D. Whinston. Multiproduct monopoly, commodity bundling, and correlation of values. Quarterly Journal of Economics, 104(2):371– 383, May 1989.
[58] Andrés Muñoz Medina and Sergei Vassilvitskii. Revenue optimization with approximate bid predictions. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2017.
[59] Mehryar Mohri and Andrés Muñoz Medina. Learning theory and algorithms for revenue optimization in second price auctions with reserve. In Proceedings of the International Conference on Machine Learning (ICML), 2014.
[60] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learning. MIT Press, 2012.
[61] Jamie Morgenstern and Tim Roughgarden. On the pseudo-dimension of nearly optimal auctions. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2015.
[62] Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. In Proceedings of the Conference on Learning Theory (COLT), 2016.
[63] Roger Myerson. Optimal auction design. Mathematics of Operations Research, 6:58–73, 1981.
[64] Walter Y Oi. A Disneyland dilemma: Two-part tariffs for a Mickey Mouse monopoly. The Quarterly Journal of Economics, 85(1):77–96, 1971.
[65] David Pollard. Convergence of Stochastic Processes. Springer, 1984.
[66] Kevin Roberts. The characterization of implementable social choice rules. In J-J Laffont, editor, Aggregation and Revelation of Preferences. North-Holland Publishing Company, 1979.
[67] Tim Roughgarden and Okke Schrijvers. Ironing in the dark. In Proceedings of the ACM Conference on Economics and Computation (EC), 2016.
[68] Tuomas Sandholm. Automated mechanism design: A new application area for search algorithms. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP), 2003.
[69] Tuomas Sandholm and Anton Likhodedov. Automated design of revenue-maximizing combinatorial auctions. Operations Research, 63(5):1000–1025, September–October 2015.
[70] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.
[71] Vasilis Syrgkanis. A sample complexity measure with applications to learning optimal auctions. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2017.
[72] Pingzhong Tang. Reinforcement mechanism design. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017.
[73] Pingzhong Tang and Tuomas Sandholm. Mixed-bundling auctions with reserve prices. In Proceedings of the Conference for Autonomous Agents and Multi-Agent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, 2012.
[74] John Thanassoulis. Haggling over substitutes. Journal of Economic Theory, 117(2):217–245, 2004.
[75] Vladimir Vapnik. Estimation of dependences based on empirical data. Springer Science & Business Media, 2006.
[76] Vladimir Vapnik. The nature of statistical learning theory. Springer Science & Business Media, 2013.
[77] Vladimir Vapnik and Alexey Chervonenkis. Theory of pattern recognition, 1974.
[78] VN Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.
[79] William E Walsh, David C Parkes, Tuomas Sandholm, and Craig Boutilier. Computing reserve prices and identifying the value distribution in real-world auctions with market disruptions. In Proceedings of the AAAI Conference on Artificial Intelligence, 2008.
[80] Robert B Wilson. Nonlinear pricing. Oxford University Press on Demand, 1993.
[81] Andrew Chi-Chih Yao. An n-to-1 bidder reduction for multi-item auctions and its applications. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014.
[82] Hector Yee and Bar Ifrach. Aerosolve: Machine learning for humans. Open Source, 2015. URL http://nerds.airbnb.com/aerosolve/.
8 Proofs from Section 3
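Corollary 8.1. Let $M^* \in \mathcal{M}$ be the mechanism that maximizes expected profit over the distribution over buyers’ values. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$ over the draw of a set of samples $\mathcal{S}$ of size $N$ from the distribution over buyers’ values, the difference between the expected profit of $\hat{M}_\rho$ and the expected profit of $M^*$ is at most $\rho + \epsilon_{\mathcal{M}}\left(N, \frac{\delta}{2}\right) + U\sqrt{\frac{1}{2N}\ln\frac{4}{\delta}}$.

Proof. Let $M(\mathcal{S})$ be the mechanism in $\mathcal{M}$ that maximizes empirical profit over $\mathcal{S}$. With probability at least $1 - \delta$,

\begin{align}
\text{profit}_{\mathcal{D}}\left(\hat{M}_\rho\right) + \epsilon_{\mathcal{M}}\left(N, \frac{\delta}{2}\right) &\geq \text{profit}_{\mathcal{S}}\left(\hat{M}_\rho\right) \tag{2}\\
&\geq \text{profit}_{\mathcal{S}}\left(M(\mathcal{S})\right) - \rho \tag{3}\\
&\geq \text{profit}_{\mathcal{S}}\left(M^*\right) - \rho \tag{4}\\
&\geq \text{profit}_{\mathcal{D}}\left(M^*\right) - U\sqrt{\frac{2\ln(4/\delta)}{2N}} - \rho. \tag{5}
\end{align}

Inequality (2) follows from standard uniform convergence bounds: with probability at least $1 - \delta/2$,

\[
\left|\text{profit}_{\mathcal{D}}\left(\hat{M}_\rho\right) - \text{profit}_{\mathcal{S}}\left(\hat{M}_\rho\right)\right| \leq \epsilon_{\mathcal{M}}\left(N, \frac{\delta}{2}\right).
\]

Inequality (3) follows from the fact that $\hat{M}_\rho$ has empirical profit within an additive $\rho$ of the empirical optimum over the set of samples, that is, $\text{profit}_{\mathcal{S}}\left(\hat{M}_\rho\right) \geq \text{profit}_{\mathcal{S}}\left(M(\mathcal{S})\right) - \rho$. Inequality (4) follows because $M(\mathcal{S})$ is the empirical profit maximizer (i.e., it maximizes $\text{profit}_{\mathcal{S}}(M)$). Finally, inequality (5) is a result of Hoeffding’s inequality, which guarantees that with probability at least $1 - \delta/2$, $\text{profit}_{\mathcal{S}}\left(M^*\right) \geq \text{profit}_{\mathcal{D}}\left(M^*\right) - U\sqrt{\frac{2\ln(4/\delta)}{2N}}$. Rearranging, we get that

\[
\text{profit}_{\mathcal{D}}\left(\hat{M}_\rho\right) \geq \text{profit}_{\mathcal{D}}\left(M^*\right) - \epsilon_{\mathcal{M}}\left(N, \frac{\delta}{2}\right) - U\sqrt{\frac{2\ln(4/\delta)}{2N}} - \rho,
\]

as claimed.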
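To make the bound in Corollary 8.1 concrete, the following minimal Python sketch evaluates it numerically. It uses the Hoeffding term in the form given in the corollary statement, $U\sqrt{\ln(4/\delta)/(2N)}$. The function names are hypothetical, and the uniform convergence term $\epsilon_{\mathcal{M}}(N, \delta/2)$ is treated as a caller-supplied number (`eps_uniform`) because its value depends on the mechanism class.

```python
import math


def hoeffding_term(U: float, N: int, delta: float) -> float:
    """U * sqrt(ln(4/delta) / (2N)), the last term in the statement of Corollary 8.1."""
    return U * math.sqrt(math.log(4.0 / delta) / (2.0 * N))


def corollary_8_1_gap(eps_uniform: float, U: float, N: int,
                      delta: float, rho: float) -> float:
    """Upper bound on profit_D(M*) - profit_D(M_hat_rho).

    eps_uniform is an assumed, caller-supplied value of eps_M(N, delta/2);
    it is an input because it depends on the mechanism class.
    """
    return rho + eps_uniform + hoeffding_term(U, N, delta)


def samples_for_hoeffding_term(U: float, delta: float, target: float) -> int:
    """Smallest N (up to floating-point rounding) with hoeffding_term <= target."""
    return math.ceil(U ** 2 * math.log(4.0 / delta) / (2.0 * target ** 2))
```

For instance, with $U = 1$ and $\delta = 0.05$, `samples_for_hoeffding_term(1.0, 0.05, 0.1)` gives the number of samples after which the Hoeffding term alone drops below 0.1; the uniform convergence term typically dominates and must be bounded separately.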
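Corollary 8.2. Let $\mathcal{M}$ be a mechanism class and let $M^*$ be a mechanism in $\mathcal{M}$ with maximum expected profit. Given a set of samples $\mathcal{S}$, let $\hat{M}_\alpha$ be a mechanism in $\mathcal{M}$ with empirical profit that is at least an $\alpha$-fraction of the empirically optimal: $\text{profit}_{\mathcal{S}}\left(\hat{M}_\alpha\right) \geq \alpha \cdot \max_{M \in \mathcal{M}} \text{profit}_{\mathcal{S}}(M)$. For any $\delta \in (0, 1)$, with probability at least $1 - \delta$ over the draw of a set of samples $\mathcal{S}$ of size $N$ from the distribution over buyers’ values, the difference between the expected profit of $\hat{M}_\alpha$ and an $\alpha$-fraction of the expected profit of $M^*$ is at most $\epsilon_{\mathcal{M}}\left(N, \frac{\delta}{2}\right) + U\alpha\sqrt{\frac{\ln(4/\delta)}{2N}}$. In other words,

\[
\text{profit}_{\mathcal{D}}\left(\hat{M}_\alpha\right) \geq \alpha \cdot \text{profit}_{\mathcal{D}}\left(M^*\right) - \epsilon_{\mathcal{M}}\left(N, \frac{\delta}{2}\right) - U\alpha\sqrt{\frac{2\ln(4/\delta)}{2N}}.
\]

Proof. Let $\mathcal{S} = \left\{\boldsymbol{v}^{(1)}, \dots, \boldsymbol{v}^{(N)}\right\}$ be a set of samples of bidder valuations. With probability at least $1 - \delta$,

\begin{align*}
\text{profit}_{\mathcal{D}}\left(\hat{M}_\alpha\right) + \epsilon_{\mathcal{M}}\left(N, \frac{\delta}{2}\right) &\geq \text{profit}_{\mathcal{S}}\left(\hat{M}_\alpha\right) \geq \alpha \cdot \max_{M \in \mathcal{M}} \text{profit}_{\mathcal{S}}(M)\\
&\geq \alpha \cdot \text{profit}_{\mathcal{S}}\left(M^*\right) \geq \alpha \cdot \text{profit}_{\mathcal{D}}\left(M^*\right) - U\alpha\sqrt{\frac{2\ln(4/\delta)}{2N}}.
\end{align*}

These inequalities follow for the same reasons as in the proof of Corollary 8.1.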
Lemma 8.3 (Shalev-Shwartz and Ben-David [70], Lemma A.2). Let a ≥ 1 and b > 0. Then x < a log x + b implies that x < 4a log(2a) + 2b.
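Lemma 8.3 is typically used to turn an implicit condition of the form $x < a \log x + b$ into an explicit bound on $x$. The sketch below is a numerical sanity check of the contrapositive on a small grid, not a proof. It assumes $\log$ denotes the natural logarithm, and all function names are hypothetical.

```python
import math


def explicit_threshold(a: float, b: float) -> float:
    """Return 4a*log(2a) + 2b, the explicit bound from Lemma 8.3 (natural log assumed)."""
    if a < 1 or b <= 0:
        raise ValueError("Lemma 8.3 assumes a >= 1 and b > 0")
    return 4.0 * a * math.log(2.0 * a) + 2.0 * b


def implicit_condition_holds(x: float, a: float, b: float) -> bool:
    """True when x < a*log(x) + b, the hypothesis of Lemma 8.3."""
    return x < a * math.log(x) + b


def check_grid() -> bool:
    """Contrapositive check: at or above the threshold, the hypothesis must fail."""
    for a in (1.0, 2.0, 10.0, 100.0):
        for b in (0.1, 1.0, 50.0):
            x0 = explicit_threshold(a, b)
            for scale in (1.0, 1.5, 10.0):
                if implicit_condition_holds(scale * x0, a, b):
                    return False
    return True
```

If `check_grid()` returned `False` for some pair `(a, b)` in the grid, that pair would be a counterexample to the lemma under the natural-log reading.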
Lemma 8.4. No matter which parameters the mechanism designer chooses in $\mathcal{P}$ or $\mathcal{P}'$, if all buyers simultaneously choose the tariff and the number of units $(q_1, \dots, q_n)$ that maximize their utilities, then $\sum_{j=1}^n q_j \leq \kappa$.

Proof. We prove this lemma for non-anonymous prices; the lemma for anonymous prices follows since they are a special case of non-anonymous prices. For a contradiction, suppose there exists a set of buyers’ values $\boldsymbol{v}$ and a non-anonymous menu of two-part tariffs with parameters in $\mathcal{P}'$ such that if $t_j$ is the tariff that buyer $j$ chooses and $q_j$ is the number of units he chooses, $\sum_{j=1}^n q_j > \kappa$. Since the mechanisms are profit non-negative, we know that $\sum_{j=1}^n \left(p_{1,j}^{(t_j)} \cdot \mathbb{1}_{\{q_j \geq 1\}} + p_{2,j}^{(t_j)} \cdot q_j\right) - c(Q) \geq 0$, where $Q = (q_1, \dots, q_n)$. We also know that each buyer’s value for the units he bought is at least the price he pays, so $\sum_{j=1}^n v_j(q_j) \geq \sum_{j=1}^n \left(p_{1,j}^{(t_j)} \cdot \mathbb{1}_{\{q_j \geq 1\}} + p_{2,j}^{(t_j)} \cdot q_j\right)$. Therefore, $\sum_{j=1}^n v_j(q_j) - c(Q) \geq 0$. However, this contradicts Assumption 3.13, so the lemma holds.
Lemma 3.15. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of anonymous and non-anonymous profit non-negative length-$\ell$ menus of two-part tariffs. Under Assumption 3.13, $\mathcal{M}$ is $\left(2\ell, n(\kappa\ell)^2\right)$-delineable and $\mathcal{M}'$ is $\left(2n\ell, n(\kappa\ell)^2\right)$-delineable.

Proof. In the case of anonymous prices, every length-$\ell$ menu of two-part tariffs is defined by $d = 2\ell$ parameters: the fixed fee and unit price for each of the $\ell$ menu entries. Buyer $j$ will choose the quantity $q$ and menu entry $\left(p_0^{(i)}, p_1^{(i)}\right)$ that maximize $v_j(q) - \left(p_0^{(i)} \cdot \mathbb{1}_{q > 0} + p_1^{(i)} q\right)$. By Lemma 8.4, we know each buyer will choose at most $\kappa$ units. Therefore, the quantity $q$ and menu entry that she chooses are determined by $(\kappa\ell)^2$ hyperplanes in $\mathbb{R}^d$ of the form $v_j(q) - \left(p_0^{(i)} \cdot \mathbb{1}_{q > 0} + p_1^{(i)} q\right) \geq v_j(q') - \left(p_0^{(k)} \cdot \mathbb{1}_{q' > 0} + p_1^{(k)} q'\right)$. In total, there are $n(\ell\kappa)^2$ hyperplanes that determine the menu entry and quantity demanded by all $n$ buyers; within any region induced by these hyperplanes, those choices are fixed and profit is linear in the fixed fees and unit prices.

In the case of non-anonymous prices, the same argument holds, except that every length-$\ell$ menu of two-part tariffs is defined by $2n\ell$ parameters: for each buyer, we must set the fixed fee and unit price for each of the $\ell$ menu entries.
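The argument in the proof of Lemma 3.15 can be illustrated with a short Python sketch of an anonymous menu of two-part tariffs. It is a minimal illustration under stated assumptions: buyers' values are given as lists with `values[q]` equal to $v_j(q)$ for $q = 0, \dots, \kappa$, production costs are omitted (so the code computes revenue rather than profit), ties are broken toward the first option found, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Tariff = Tuple[float, float]  # (fixed fee p0, unit price p1)


def tariff_payment(tariff: Tariff, q: int) -> float:
    """Payment p0 * 1[q > 0] + p1 * q for buying q units under one tariff."""
    p0, p1 = tariff
    return (p0 if q > 0 else 0.0) + p1 * q


def buyer_choice(values: Sequence[float], menu: List[Tariff]) -> Tuple[int, int]:
    """Return (quantity, menu index) maximizing v(q) - payment; first maximizer wins ties."""
    best, best_utility = (0, 0), values[0] - tariff_payment(menu[0], 0)
    for i, tariff in enumerate(menu):
        for q in range(len(values)):
            utility = values[q] - tariff_payment(tariff, q)
            if utility > best_utility:
                best, best_utility = (q, i), utility
    return best


def revenue(all_values: List[Sequence[float]], menu: List[Tariff]) -> float:
    """Total payment collected from all buyers (costs are ignored in this sketch)."""
    total = 0.0
    for values in all_values:
        q, i = buyer_choice(values, menu)
        total += tariff_payment(menu[i], q)
    return total


def delineability_two_part(n: int, ell: int, kappa: int,
                           anonymous: bool = True) -> Tuple[int, int]:
    """(d, t) from Lemma 3.15: d parameters and t = n * (kappa * ell)^2 hyperplanes."""
    d = 2 * ell if anonymous else 2 * n * ell
    return d, n * (kappa * ell) ** 2
```

As long as small changes to the fees and unit prices do not change any buyer's output from `buyer_choice`, `revenue` is a linear function of those parameters, which is the piecewise-linear structure the lemma formalizes.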
Lemma 8.5. No matter which parameters the mechanism designer chooses in $\mathcal{P}$ or $\mathcal{P}'$, if all buyers simultaneously choose the bundles that maximize their utilities, then $\sum_{j=1}^n \boldsymbol{q}_j[i] \leq \kappa_i$ for all $i \in [m]$.

Proof. We prove this lemma for non-anonymous prices; the lemma for anonymous prices follows since they are a special case of non-anonymous prices. For a contradiction, suppose there exists a set of buyers’ values $\boldsymbol{v}$ and a non-anonymous non-linear pricing mechanism with parameters in $\mathcal{P}'$ such that if $\boldsymbol{q}_j$ is the bundle buyer $j$ chooses, $\sum_{j=1}^n \boldsymbol{q}_j[i] > \kappa_i$ for some $i \in [m]$. Since the mechanisms are profit non-negative, we know that $\sum_{j=1}^n p_j(\boldsymbol{q}_j) - c(Q) \geq 0$, where $Q = (\boldsymbol{q}_1, \dots, \boldsymbol{q}_n)$. We also know that each buyer’s value for the units he bought is at least the price he pays, so $\sum_{j=1}^n v_j(\boldsymbol{q}_j) \geq \sum_{j=1}^n p_j(\boldsymbol{q}_j)$. Therefore, $\sum_{j=1}^n v_j(\boldsymbol{q}_j) - c(Q) \geq 0$. However, this contradicts Assumption 3.13, so the lemma holds.
Lemma 3.17. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of anonymous and non-anonymous profit non-negative non-linear pricing mechanisms. Let $K = \prod_{i=1}^m (\kappa_i + 1)$. Under Assumption 3.13, $\mathcal{M}$ is $\left(K, nK^2\right)$-delineable and $\mathcal{M}'$ is $\left(nK, nK^2\right)$-delineable.

Proof. We begin by analyzing the case where there are anonymous prices. By Lemma 8.5, the mechanism designer might as well set the price of any bundle $\boldsymbol{q}$ such that $\boldsymbol{q}[i] > \kappa_i$ for some $i \in [m]$ to $\infty$. Therefore, every non-linear pricing mechanism is defined by $d = \prod_{i=1}^m (\kappa_i + 1)$ parameters because that is the number of different bundles and there is a price per bundle. Buyer $j$ will prefer the bundle corresponding to the quantity vector $\boldsymbol{q}$ over the bundle corresponding to the quantity vector $\boldsymbol{q}'$ if $v_j(\boldsymbol{q}) - p(\boldsymbol{q}) \geq v_j(\boldsymbol{q}') - p(\boldsymbol{q}')$. Therefore, there are at most $\prod_{i=1}^m (\kappa_i + 1)^2$ hyperplanes in $\mathbb{R}^d$ determining each buyer’s preferred bundle, one hyperplane per pair of bundles. This means that there are a total of $n \prod_{i=1}^m (\kappa_i + 1)^2$ hyperplanes in $\mathbb{R}^d$ such that in any one region induced by these hyperplanes, the bundles demanded by all $n$ buyers are fixed and profit is linear in the prices of these $n$ bundles.

In the case of non-anonymous prices, the same argument holds, except that every non-linear pricing mechanism is defined by $n \prod_{i=1}^m (\kappa_i + 1)$ parameters, one parameter per bundle-buyer pair.
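The counting in the proof of Lemma 3.17 can be checked with a small Python sketch. It enumerates the bundles with per-item supply caps $\kappa_i$, computes the pair $(d, t)$ from the lemma, and picks a buyer's preferred bundle under a given price table. The dictionary-based representation of values and prices and all function names are assumptions made for illustration.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Bundle = Tuple[int, ...]


def bundles(kappas: Sequence[int]) -> List[Bundle]:
    """All quantity vectors q with 0 <= q[i] <= kappa_i."""
    return list(itertools.product(*(range(k + 1) for k in kappas)))


def nonlinear_pricing_delineability(kappas: Sequence[int], n: int,
                                    anonymous: bool = True) -> Tuple[int, int]:
    """(d, t) from Lemma 3.17 with K = prod_i (kappa_i + 1)."""
    K = math.prod(k + 1 for k in kappas)
    d = K if anonymous else n * K
    return d, n * K ** 2


def preferred_bundle(value: Dict[Bundle, float], price: Dict[Bundle, float]) -> Bundle:
    """Bundle maximizing v(q) - p(q); ties go to the first bundle in the price table."""
    return max(price, key=lambda q: value[q] - price[q])
```

For example, with two items and caps $\kappa = (2, 1)$ there are $K = 6$ bundles, so an anonymous mechanism has 6 price parameters and, for $n = 3$ buyers, at most $3 \cdot 6^2 = 108$ hyperplanes.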
Definition 8.6 (Additively decomposable non-linear pricing mechanisms). Additively decomposable non-linear pricing mechanisms are a subset of non-linear pricing mechanisms where the prices are additive over the items. Specifically, if the prices are anonymous, there exist $m$ functions $p^{(i)} : [\kappa_i] \to \mathbb{R}$ for all $i \in [m]$ such that for every quantity vector $\boldsymbol{q}$, $p(\boldsymbol{q}) = \sum_{i : \boldsymbol{q}[i] \geq 1} p^{(i)}(\boldsymbol{q}[i])$. If the prices are non-anonymous, there exist $nm$ functions $p_j^{(i)} : [\kappa_i] \to \mathbb{R}$ for all $i \in [m]$ and $j \in [n]$ such that for every quantity vector $\boldsymbol{q}$, $p_j(\boldsymbol{q}) = \sum_{i : \boldsymbol{q}[i] \geq 1} p_j^{(i)}(\boldsymbol{q}[i])$.
Lemma 8.7. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of additively decomposable non-linear pricing mechanisms with anonymous and non-anonymous prices, respectively. Then $\mathcal{M}$ is

\[
\left(\sum_{i=1}^m (\kappa_i + 1),\; n \prod_{i=1}^m (\kappa_i + 1)^2\right)\text{-delineable}
\]

and $\mathcal{M}'$ is $\left(n \sum_{i=1}^m (\kappa_i + 1),\; n \prod_{i=1}^m (\kappa_i + 1)^2\right)$-delineable.
Proof. In the case of anonymous prices, any additively decomposable non-linear pricing mechanism is defined by $d = \sum_{i=1}^m (\kappa_i + 1)$ parameters. As in the proof of Lemma 3.17, there are a total of $n \prod_{i=1}^m (\kappa_i + 1)^2$ hyperplanes in $\mathbb{R}^d$ such that in any one region induced by these hyperplanes, the bundles demanded by all $n$ buyers are fixed and profit is linear in the prices of these $n$ bundles.

In the case of non-anonymous prices, the same argument holds, except that every non-linear pricing mechanism is defined by $n \sum_{i=1}^m (\kappa_i + 1)$ parameters, one parameter per item, quantity, and buyer tuple.
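The saving in Lemma 8.7 comes from the parameter count: a sum over items instead of a product. The Python sketch below evaluates an additively decomposable anonymous price as in Definition 8.6 and compares the two counts. The list-of-dictionaries price representation and the function names are assumptions for illustration only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def additive_price(q: Sequence[int], item_prices: List[Dict[int, float]]) -> float:
    """p(q) = sum over items i with q[i] >= 1 of p^(i)(q[i]), following Definition 8.6.

    item_prices[i] maps a quantity in 1..kappa_i to the price of that many units of item i.
    """
    return sum(item_prices[i][qi] for i, qi in enumerate(q) if qi >= 1)


def parameter_counts(kappas: Sequence[int]) -> Tuple[int, int]:
    """(general, additive): prod_i (kappa_i + 1) from Lemma 3.17 vs. sum_i (kappa_i + 1) from Lemma 8.7."""
    general = math.prod(k + 1 for k in kappas)
    additive = sum(k + 1 for k in kappas)
    return general, additive
```

With three items and $\kappa_i = 9$ for each, `parameter_counts([9, 9, 9])` contrasts 1000 parameters for general non-linear pricing with 30 for the additively decomposable class, while the hyperplane count is the same in both lemmas.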
Lemma 3.18. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of item-pricing mechanisms with anonymous prices and non-anonymous prices, respectively. If the buyers are additive¶¶, then $\mathcal{M}$ is $(m, m)$-delineable and $\mathcal{M}'$ is $(nm, nm)$-delineable.

Proof. In the case of anonymous prices, every item-pricing mechanism is defined by $m$ prices $\boldsymbol{p} \in \mathbb{R}^m$, so the parameter space is $\mathbb{R}^m$. Let $j_i$ be the buyer with the highest value for item $i$. We know that item $i$ will be bought so long as $v_{j_i}(e_i) \geq p(e_i)$. Once the items bought are fixed, profit is linear. Therefore, there are $m$ hyperplanes splitting $\mathbb{R}^m$ into regions where profit is linear.

In the case of non-anonymous prices, the parameter space is $\mathbb{R}^{nm}$ since there is a price per buyer and per item. The items each buyer $j$ is willing to buy are determined by $m$ hyperplanes: $v_j(e_i) \geq p_j(e_i)$. So long as these preferences are fixed, profit is a linear function of the prices. Therefore, there are $nm$ hyperplanes splitting $\mathbb{R}^{nm}$ into regions where profit is linear.

¶¶In the two cases where the buyers have unit-demand and general valuations, we prove generalization bounds by connecting the hyperplane structure we investigate in this paper to the structured prediction literature in machine learning. See Appendix 9 for details.

Lemma 3.19. Let $\mathcal{M}$ and $\mathcal{M}'$ be the classes of anonymous and non-anonymous second price item auctions. Then $\mathcal{M}$ is $(m, m)$-delineable and $\mathcal{M}'$ is $(nm, m)$-delineable.

Proof. For a given valuation vector $\boldsymbol{v}$, let $j_i$ be the highest bidder for item $i$ and let $j_i'$ be the second highest bidder. Under anonymous prices, item $i$ will be bought so long as $v_{j_i}(e_i) \geq p(e_i)$. If buyer $j_i$ buys item $i$, his payment depends on whether or not $v_{j_i'}(e_i) \geq p(e_i)$. Therefore, there are $t = 2m$ hyperplanes splitting $\mathbb{R}^m$ into regions where profit is linear. In the case of non-anonymous prices, the only difference is that the parameter space is $\mathbb{R}^{nm}$.

Lemma 3.20. Let $\mathcal{M}$ be the set of MBARPs. Then $\mathcal{M}$ is $\left(m + 1, (n + 1)^{2m+1}\right)$-delineable.
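Proof. An MBARP is defined by $m + 1$ parameters since there is one reserve per item and one allocation boost. Let $K = (n + 1)^m$ be the total number of allocations. Fix some valuation vector $\boldsymbol{v}$. We claim that the allocation of any MBARP is determined by at most $(n + 1)K^2$ hyperplanes in $\mathbb{R}^{m+1}$. To see why this is, let $Q^k = \left(\boldsymbol{q}_1^k, \dots, \boldsymbol{q}_n^k\right)$ and $Q^\ell = \left(\boldsymbol{q}_1^\ell, \dots, \boldsymbol{q}_n^\ell\right)$ be any two allocations and let $\boldsymbol{q}_{Q^k}$ and $\boldsymbol{q}_{Q^\ell}$ be the bundles of items not allocated. Consider the $\binom{K}{2}$ hyperplanes defined as

\[
\sum_{i=1}^n v_i\left(\boldsymbol{q}_i^\ell\right) + \sum_{i : \boldsymbol{q}_{Q^\ell}[i] = 1} p(e_i) + \lambda\left(Q^\ell\right) - c\left(Q^\ell\right) = \sum_{i=1}^n v_i\left(\boldsymbol{q}_i^k\right) + \sum_{i : \boldsymbol{q}_{Q^k}[i] = 1} p(e_i) + \lambda\left(Q^k\right) - c\left(Q^k\right).
\]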