<<

Generalization Guarantees for Multi-item Profit Maximization: Pricing, Auctions, and Randomized Mechanisms

Maria-Florina Balcan Tuomas Sandholm Ellen Vitercik Carnegie Mellon University Carnegie Mellon University Carnegie Mellon University [email protected] Optimized Markets, Inc. [email protected] Strategic Machine, Inc. Strategy Robot, Inc. [email protected]

April 13, 2021

Abstract We study the design of multi-item mechanisms that maximize expected profit with respect to a distribution over buyers’ values. In practice, a full description of the distribution is typically unavailable. Therefore, we study the setting where the designer only has samples from the distribution and the goal is to find a high-profit mechanism within a class of mechanisms. If the class is complex, a mechanism may have high average profit over the samples but low expected profit. This raises the question: how many samples are sufficient to ensure that a mechanism’s average profit is close to its expected profit? To answer this question, we uncover structure shared by many pricing, auction, and lottery mechanisms: for any set of buyers’ values, profit is piecewise linear in the mechanism’s parameters. Using this structure, we prove new bounds for mechanism classes not yet studied in the sample-based mechanism design literature and match or improve over the best known guarantees for many classes. Finally, we provide tools for optimizing an important tradeoff: more complex mechanisms typically have higher average profit over the samples than simpler mechanisms, but more samples are required to ensure that average profit nearly matches expected profit.

1 Introduction

The design of profit-maximizing∗ mechanisms is a fundamental problem with diverse applications

arXiv:1705.00243v5 [cs.LG] 10 Apr 2021 including Internet retailing, advertising markets, strategic sourcing, and artwork sales. This prob- lem has traditionally been studied under the assumption that there is a joint distribution from which the buyers’ values are drawn and that the mechanism designer knows this distribution in ad- vance. This assumption has led to groundbreaking theoretical results in the single-item setting [63], but transitioning from theory to practice is challenging because the true distribution over buyers’ values is typically unknown. Moreover, in the dramatically more challenging setting where there are multiple items for sale, the support of the distribution alone is often doubly exponential (even if there were just a single buyer with a finite type space), so obtaining and storing the distribution is typically impossible. We relax this strong assumption and instead assume that the mechanism designer only has a set of independent samples from the distribution, an approach introduced by Likhodedov and

∗In this work, we study the standard setting of profit maximization with risk-neutral agents.

1 Sandholm [54, 55] and Sandholm and Likhodedov [69]. Specifically, a single sample is a random draw from the distribution over buyers’ values, listing each buyer’s value for each set of items for sale. When the mechanism designer uses a set of samples rather than a description of the distribution to design a mechanism, we refer to this procedure as sample-based mechanism design. Sample-based mechanism design reflects current industry practices since many companies, such as online ad exchanges [47, 58], sponsored search platforms [14, 35, 72], travel companies [82], and resellers of returned items [79], use historical purchase data to adjust the sales mechanism. We provide provable guarantees for sample-based mechanism design. A mechanism designer must be cautious when performing sample-based mechanism design. In most multi-item settings, the form of the revenue-maximizing mechanism is still a mystery. Therefore, rather than use the samples to uncover the optimal mechanism, much of the literature on sample-based mechanism design suggests that we first fix a reasonably expressive mechanism class and then use the samples to optimize over the class. If, however, the mechanism class is large and complex, a mechanism with high average profit over the set of samples may have low expected profit on the actual unknown distribution if the number of samples is not sufficiently large. This motivates an important question in sample-based mechanism design: Given a set of samples and a mechanism class M, what is the difference between the average profit over the samples and the expected profit on the unknown distribution for any mechanism in M? If this difference is small, the mechanism designer can be assured that the mechanism in M that maximizes average profit over the set of samples nearly maximizes expected profit over the distri- bution as well. Thus, he can be confident in using the set of samples in lieu of a precise description of the distribution. In this paper, we present a general theory for deriving uniform convergence generalization guarantees in multi-item settings, as well as data-dependent guarantees when the distribution over buyers’ values is well-behaved. A generalization guarantee for a mechanism class M bounds the difference between the average profit over the samples and expected profit on the distribution for any mechanism in M. The bound holds uniformly across all mechanisms in M and is a function of both the number of samples and the intrinsic complexity of M. This paper is part of a line of research that studies how learning theory can be used to design and analyze mechanisms, beginning with seminal research by Balcan et al. [7, 8]. However, the majority of these papers have studied only single-item settings [2, 12, 18, 25, 30, 36, 41, 43, 45, 48, 59, 61, 67]. In contrast, we focus on multi-item mechanism design, as have recent papers by Cai and Daskalakis [20], Medina and Vassilvitskii [58], Morgenstern and Roughgarden [62], Syrgkanis [71] and Gonczarowski and Weinberg [42].

1.1 Our contributions Our contributions come in three interrelated parts.

1.1.1 A general theory that unifies diverse mechanism classes. We provide a clean, easy-to-use, general theorem for deriving generalization guarantees and we demonstrate its application to a large number of widely-used mechanism classes. This paper thus expands our understanding of uniform convergence in multi-item mechanism design, which had thus far focused on deriving guarantees for a small number of specific mechanism classes that are “simple” by design [62, 71]. We uncover a key structural property shared by a variety of mechanisms which allows us to prove generalization guarantees: for any fixed set of bids, profit is a piecewise linear function of the mechanism’s parameters. Our main theorem provides generalization guarantees for

2 Category Mechanism class Valuations Result Pricing mechanisms Item-pricing mechanisms General, unit-demand, additive Lemmas 3.18, 4.5, 9.7, 9.8 Two-part tariffs General Lemma 3.15 Non-linear pricing mechanisms General Lemmas 3.17, 8.7 Auctions Second-price auctions with Additive Lemmas 3.19, 4.4, 9.9 reserves Affine maximizer auctions General Lemma 3.21 Virtual valuation General Lemma 3.21 combinatorial auctions Mixed-bundling auctions with General Lemma 3.20 reserves Randomized mechanisms Lotteries Additive, unit-demand Lemmas 3.23, 4.6

Table 1: Brief summary of some of the main mechanism classes we analyze in Sections 3.4 and 4 as well as the Electronic Companion 9.1. any class exhibiting this structure. To prove this theorem, we relate the complexity of the partition splitting the parameter space into linear portions to the intrinsic complexity of the mechanism class, which we quantify using pseudo-dimension. In turn, pseudo-dimension bounds imply generalization bounds. We prove that many seemingly disparate mechanisms share this structure, and thus our main theorem yields learnability guarantees. Table 1 summarizes some of the main mechanism classes we analyze and Tables 2, 3, and 4 summarize our bounds. We prove that our main theorem applies to randomized mechanisms, making us the first to provide generalization bounds for these mechanisms. Our guarantees apply to lotteries, a general representation of randomized mechanisms. Randomized mechanisms are known to generate higher expected revenue than deterministic mechanisms in many settings [28, 32]. Our results imply, for example, that if the mechanism designer plans to offer a menu of ` lotteries over m items to an additive or unit-demand buyer, the difference between any such menu’s average profit over N   samples and its expected profit is O˜ Up`m/N , where U is the maximum profit achievable over the support of the buyer’s valuation distribution. We also provide guarantees for pricing mechanisms using our main theorem. These include item-pricing mechanisms, also known as posted-price mechanisms, where each item has a price and buyers buy their utility-maximizing bundles. Additionally, we study multi-part tariffs, where there is an upfront fee and a price per unit. We are the first to provide generalization bounds for these tariffs and other non-linear pricing mechanisms, which have been studied in economics for decades [38, 64, 80]. For instance, our main theorem guarantees that if there are κ units of a single good for sale, the difference between any two-part tariff’s average profit over N samples and its   expected profit is O˜ Upκ/N . Our main theorem implies generalization bounds for many auction classes, such as second price auctions. We also study several well-studied generalized VCG auctions, such as affine maximizer auctions, virtual valuations combinatorial auctions, and mixed-bundling auctions [33, 49, 52, 66, 69]. A key challenge which differentiates our generalization guarantees from those typically found in machine learning is the sensitivity of these mechanisms to small changes in their parameters. For example, changing the price of a good can cause a steep drop in profit if the buyer no longer wants to buy it. Meanwhile, for many well-understood function classes in machine learning, there is a close connection between the distance in parameter space between two parameter vectors and the

3 Valuations Auction class Our bounds Prior bounds Additive or Length-` lottery menu Up`m log(`m)/N N/A unit- demand Additive, Length-` item lottery menu Up` log `/N N/A item- independent∗

∗ Additive cost function Table 2: Generalization bounds in big-O˜ notation for lotteries. We denote the maximum profit achievable by any mechanism in the class over the support of the bidders’ valuation distribution by U. There are m items, N samples, and the cost function is general unless otherwise noted. distance in function space between the two corresponding functions. Understanding this connection is often the key to quantifying the class’s intrinsic complexity. Intrinsic complexity can be quantified by pseudo-dimension, for example, which allows us to derive learnability guarantees. Since profit functions do not exhibit this predictable behavior, we must carefully analyze the structure of the mechanisms we study in order to derive our generalization guarantees.

Data-dependent generalization guarantees for profit maximization. We strengthen our main theorem when the distribution over buyers’ values is “well-behaved,” proving generalization guarantees that are independent of the number of items for item-pricing mechanisms, second price auctions with reserves, and a subset of lottery mechanisms. Under anonymous prices, our bounds do not depend on the number of bidders either. These guarantees hold when the bidders are additive with values drawn from item-independent distributions (bidder i1’s value for item j is independent 0 from her value for item j , but her value for item j may be arbitrarily correlated with bidder i2’s value for item j). Bidders with item-independent value distributions have been studied extensively in prior re- search [5, 20, 22, 23, 39, 44, 81]. Cai and Daskalakis [20] provide learning algorithms for bidders with valuations drawn from product distributions, which are item-independent. As we describe in Section 6, we improve a generalization bound by Cai and Daskalakis [20] and Morgenstern and Roughgarden [62] for learning a mechanism with expected revenue that is nearly a constant frac- tion of optimal when the buyers have additive values drawn from a product distribution. The √ improvement is by a multiplicative factor of O( m), where m is the number of items.

Structural profit maximization. Many of the mechanism classes we study exhibit a hierar- chical structure. For example, when designing a pricing mechanism, the designer can segment the population into k groups and charge each group a different price. This is prevalent throughout daily life: movie theaters and amusement parks have different admission prices per market segment, with groups such as Child, Student, Adult, and Senior Citizen. In the simplest case, k = 1 and the prices are anonymous. If k equals the number of buyers, the prices are non-anonymous, thus forming a hierarchy of mechanisms. In general, the designer should not choose the simplest class to optimize over simply to guarantee good generalization because more complex classes are more likely to contain nearly optimal mechanisms. We show how the mechanism designer can determine the precise level in the hierarchy assuring him the optimal tradeoff between profit maximization and generalization.

4 Valuations Mechanism class Price class Our bounds Prior bounds General Length-` menus of Anonymous Up` log(κn`)/N N/A two-part tariffs p over κ units Non-anonymous U n` log(κn`)/N N/A p Qm ‡ Non-linear pricing Anonymous U m i=1(κi + 1)/N N/A p Qm ‡ Non-anonymous U nm i=1(κi + 1)/N N/A p Pm ‡ Additively Anonymous U m i=1 κi/N N/A decomposable p Pm ‡ non-linear pricing Non-anonymous U nm i=1 κi/N N/A Item-pricing Anonymous Upm2/N Upm2/N § Non-anonymous Upnm(m + log n)/N Upnm2 log n/N § Unit- Item-pricing Anonymous Upm · min{m, log(nm)}/N Upm2/N § demand Non-anonymous Upnm log(nm)/N Upnm2 log n/N § Additive Item-pricing Anonymous Upm log m/N Upm log m/N §, (U/δ) pm log (nN) /N † Non-anonymous Upnm log(nm)/N Upnm log(nm)/N §, (U/δ) pnm log (N) /N † Additive, Item-pricing Anonymous Up1/N Upm log m/N §, item- (U/δ) pm log (nN) /N † independent∗ Non-anonymous Upn log n/N Upnm log(nm)/N §, (U/δ) pnm log (N) /N †

∗ ‡ § Additive cost function; κi is an upper bound on the number of units available of item i; Morgenstern and Roughgarden [62]; † Syrgkanis [71]. The probability these bounds fail to hold is δ. In all other bounds, δ appears in a log so we suppress it using big-O˜ notation.

Table 3: Generalization bounds in big-O˜ notation for pricing mechanisms. We denote the maxi- mum profit achievable by any mechanism in the class over the support of the bidders’ valuation distribution by U. There are m items, n buyers, and N samples. The cost function is general unless otherwise noted.

1.2 Related research Sample-based mechanism design was introduced in the context of automated mechanism design (AMD). In AMD, the goal is to design algorithms that take as input information about a set of buyers and return a mechanism that maximizes an objective such as revenue [27, 29, 68]. The input information about the buyers in early AMD was an explicit description of the distribution over their valuations. The support of the distribution’s prior is often doubly exponential, for example in combinatorial auctions, so obtaining and storing the distribution is impractical. In response, sample-based mechanism design was introduced where the input is a set of samples from this distribution [54, 55, 69]. Those papers also introduced the idea of searching for a high-revenue mechanism in a parameterized space where any parameter vector yields a mechanism that satisfies the individual rationality and incentive-compatibility constraints. This was in contrast to the traditional approach of representing mechanism design as an unrestricted optimization problem where those constraints need to be explicitly modeled. The parameterized work studied algorithms for designing combinatorial auctions with high empirical revenue. We follow the parameterized approach, but we study generalization guarantees, which they did not address.

5 Valuations Auction class Our bounds Prior bounds √ General AMAs and λ-auctions Upnm+1m log n/N cUpm/Nnm+2 n2 + nm¶‖ √ VVCAs Upn2m2m log n/N cUpm/Nnm+2 n2 + nm¶‖ MBARPs Upm(log n + m)/N Upm3 log n/N ¶ Additive Second price item auctions with Upm log m/N Upm log m/N § anonymous reserve prices Second price item auctions with Upnm log(nm)/N Upnm log(nm)/N § non-anonymous reserve prices Additive, Second price item auctions with Up1/N Upm log m/N § item- anonymous reserve prices independent∗ Second price item auctions with Upn log n/N Upnm log(nm)/N § non-anonymous reserve prices

∗ Additive cost function; ‖ The value of c > 1 depends on the range of the auction parameters; § Morgenstern and Roughgarden [62]; ¶ Balcan et al. [10].

Table 4: Generalization bounds in big-O˜ notation for auctions. We denote the maximum profit achievable by any mechanism in the class over the support of the bidders’ valuation distribution by U. There are m items, n buyers, and N samples. The cost function is general unless otherwise noted.

Balcan et al. [7, 8] were the first to study the connection between learning theory and revenue maximization, employing classic learning-theoretic tools such as covering numbers. They showed how to use an algorithm A that returns a high-revenue, manipulable mechanism in order to find a high-revenue, incentive-compatible mechanism. Their approach randomly splits the bidders into two groups. With the first group’s bids as input, they use the algorithm A to find a high-revenue mechanism and they field that mechanism on the second group (and vice versa). While the mech- anism is not incentive compatible for the first group, it is incentive compatible for the second. They prove that the mechanism’s revenue over the first group’s bids nearly matches the revenue obtained from the second group, so long as the number of bidders is sufficiently large compared to the covering number of the mechanism class being optimized over. Balcan et al. [7, 8] study settings with unrestricted supply, whereas we primarily focus on settings with limited supply. More recent research has provided generalization guarantees when there is limited supply, with a particular focus on single-parameter (e.g., single-item) settings [2, 18, 24, 25, 30, 36, 41, 43, 45, 48, 59, 61, 67]. From an algorithmic perspective, Devanur et al. [30], Gonczarowski and Nisan [41], Guo et al. [43], and Hartline and Taggart [45] provide computationally efficient algorithms for learning nearly-optimal single-item auctions in various settings. In contrast, we study multi-parameter settings. Balcan et al. [9] drew on classic tools from learning theory to provide algorithms and general- ization guarantees for a related problem in algorithmic : learning agents’ preferences. Specifically, they made connections to the concept of generalized linear functions from the struc- tured prediction literature [e.g., 26, which we discuss in Appendix 9]. Their algorithms make use of past data describing the purchases of a utility-maximizing agent in order to predict the agent’s future purchases. Morgenstern and Roughgarden [62] later used the same concept of generalized linear functions from the structured prediction literature to provide sample complexity guarantees for multi-item revenue maximization. They proposed a technique for bounding a mechanism class’s pseudo-

6 dimension that requires two steps; we describe these two steps at a high level here and refer the reader to Appendix 9 for more details. First, one must show that for any mechanism in the class, its allocation function is a d-dimensional linear function, for some d ∈ Z (see Appendix 9 for the formal definition). Next, fixing an arbitrary set of samples and an allocation per sample, one must bound the pseudo-dimension of the set of revenue functions across all mechanisms that induce those allocations. They show how these two steps imply a bound on the mechanism class’s pseudo-dimension. The guarantees presented in this paper offer several advantages over Morgenstern and Rough- garden’s approach. First, our main theorem depends on a structural property—the piecewise-linear form of the revenue function—that is not defined in terms of any learning theory concept (such as generalized linear functions or pseudo-dimension) and thus can be more readily applicable. More- over, in several cases, Morgenstern and Roughgarden [62] proved loose guarantees using structured prediction; in the appendix, they used a first-principles approach to prove stronger guarantees. Their structured prediction proof technique requires them to bound the total number of allocations a mechanism class can induce on a set of samples. Their bound is a bit loose, and we are able to tighten it using the techniques we develop in this paper, as we detail in Appendix 9. By combining our analysis techniques with tools from structured prediction, we are able to match the tighter bounds that Morgenstern and Roughgarden [62] proved via a first-principals approach. Finally, we apply our guarantees to a wide variety of mechanism classes, both simple and complex (as sum- marized in Table 1), whereas Morgenstern and Roughgarden [62] applied their guarantees to three mechanism classes that are “simple” by design: item-pricing mechanisms, grand-bundle-pricing mechanisms (where an agent can buy either the grand bundle consisting of all items or nothing at all), and second-price item auctions. Syrgkanis [71] also suggests a general technique for providing generalization guarantees which he applies to several “simple” mechanism classes: the same three as Morgenstern and Roughgarden [62] as well as single-item t-level auctions [61]. His generalization guarantees apply only to empirical revenue maximization algorithms, which return the mechanism (from within a particular mechanism class) that maximizes average revenue over the samples. This is in contrast to our bounds (as well as those by Morgenstern and Roughgarden [62]), which apply uniformly to every mechanism in a given class. This is crucial when it may not be computationally feasible to determine a mechanism with highest average revenue over the samples, but only an approximation. Given a set of samples S of size N, Syrgkanis’s bound depends on a quantity he calls the split-sample growth rate, which counts how many different mechanisms the empirical revenue maximization algorithm can return on any sub-sample of S of size N/2. Another advantage of our generalization bounds (beyond applying uniformly to every mechanism in a given class) is that they grow only logarithmically 1 with a term δ , where δ is the probability that the bound fails to hold (as do those by Morgenstern 1 and Roughgarden [62]). In contrast, the bounds by Syrgkanis [71] grow linearly in δ . In Section 6, we provide more details on how our results compare to those by Morgenstern and Roughgarden [62] and Syrgkanis [71], as well as a detailed comparison of our results to other papers on the sample complexity of multi-item revenue maximization [10, 20, 42, 58]. A problem that is similar but distinct from ours is dynamic pricing, where prices are adjusted over a finite time horizon and the consumer demand function is unknown [e.g., 3, 15, 17, 40, 50, all of whom study the single-item setting]. The goal is typically to minimize regret, which is the difference between the cumulative profit of the best prices in hindsight and that of the algorithm’s selected prices.

7 2 Preliminaries and notation

We consider the problem of selling m heterogeneous items to n buyers. We denote a bundle of m items as a quantity vector q ∈ Z≥0. The number of units of item i in the bundle represented by q is denoted by its ith component q[i]. Accordingly, the bundle consisting of only one copy of the th i item is denoted by the standard basis vector ei, where ei[i] = 1 and ei[j] = 0 for all j 6= i. Each buyer j ∈ [n] has a valuation function vj over bundles of items. If one bundle q0 is contained within another bundle q1 (i.e., q0[i] ≤ q1[i] for all i ∈ [m]), then vj (q0) ≤ vj (q1) and vj (0) = 0. We denote an allocation as Q = (q1,..., qn) where qj is the bundle of items that buyer j receives under allocation Q. The cost to produce the bundle q is denoted as c (q) and the cost to produce the Qm allocation Q is denoted as c (Q). Suppose there are κi units available of item i. Let K = i=1 κi. We use vj = (vj (q1) , . . . , vj (qK )) to denote buyer j’s values for all of the K bundles and we use v = (v1,..., vn) to denote a vector of buyer values. We use the notation X to denote the set of Pm all valuation vectors v. We also study additive buyers (vj (q) = q[i]vj (ei)) and unit-demand  i=1 buyers vj (q) = maxi:q[i]≥1 vj (ei) . Every auction in the classes we study is dominant strategy incentive compatible, so we assume that the bids equal the bidders’ valuations. There is an unknown distribution D over buyers’ values, which means the support of D is contained in the set X . We make very few assumptions about this distribution. First of all, we do not assume the distribution belongs to a parametric family. Moreover, we do not assume the buyers’ values are independently or identically distributed. In particular, a bidder’s values for multiple bundles may be correlated, and multiple bidders may have correlated values as well. We say that profitM (v) is the profit of a mechanism M on the valuation vector v. For a dis- tribution D over buyers’ values, we denote the expected profit of M over D as profitD (M) = Ev∼D [profitM (v)] and for a set of samples S, we denote the average profit of M over S as 1 P profitS (M) = |S| v∈S profitM (v). d We study real-valued functions parameterized by vectors p in R , denoted as fp : X → R. For a fixed v ∈ X , we often consider fp (v) as a function of its parameters, which we denote as fv (p).

3 Generalization guarantees

In this section, we provide generalization guarantees for sample-based mechanism design. A gen- eralization guarantee for a given mechanism class informs the mechanism designer how well the expected profit of any mechanism in the class matches its average profit over the samples. There- fore, if a mechanism class comes with strong generalization guarantees, the designer can be confident that the set of samples can serve as a surrogate for knowledge of the distribution. In particular, if he finds a mechanism in the class with high revenue over the set of samples, then that mechanism will almost certainly have high revenue in expectation over the distribution over buyers’ values. Generalization guarantees have been a central topic in computational learning theory dating back to seminal work by Vapnik and Chervonenkis [78]. Definition 3.1 (Generalization guarantee). A generalization guarantee for a mechanism class M is a function M : Z≥1 ×(0, 1) → R≥0 defined such that for any δ ∈ (0, 1), any sample size N ∈ Z≥1, and any distribution D over buyers’ valuations, with probability at least 1 − δ over the draw of a set S ∼ DN , for any mechanism M in M, the difference between the average profit of M over S and the expected profit of M over D is at most M(N, δ). In other words, " # 1 X Pr ∃M ∈ M such that profitM (v) − E [profitM (v)] > M(N, δ) < δ. S∼DN N v∼D v∈S

8 If M(N, δ) tends to 0 as N tends to infinity, then we say that M is uniformly learnable. The 1 − δ high probability caveat must be present in any generalization guarantee because it is always possible that the set of samples, no matter how large, is completely unrepresentative of the distribution over buyers’ values. Therefore, we can only guarantee that the difference between the average and expected profit of a mechanism is small with probability 1 − δ over the draw of the set of samples. As is intuitively clear, M(N, δ) ought grow as δ shrinks since we need to ensure that the difference between the average and expected profit of each mechanism in M is at most M(N, δ) with probability at least 1 − δ. Moreover, the larger the mechanism designer’s set of samples is, the smaller the difference between average profit over the samples and expected profit will be. Since this difference is bounded by M(N, δ), we can conclude that M(N, δ) ought to shrink as N grows. The final element that any generalization bound M(N, δ) depends on is the specific mechanism class M. Computational learning theory informs us that the more “complex” the mechanism class M is, the more difficult it is to bound the difference between the average and expected profit of every mechanism in M. Mathematically, this means that richer mechanism classes M have larger generalization bounds M(N, δ), so M(N, δ) grows with the complexity of M. Generalization guarantees are useful beyond their ability to bound the difference between the average profit of a mechanism and its expected profit. For an arbitrary class M, generalization guarantees allow the mechanism designer to relate the expected profit of a mechanism in M which achieves maximum average profit over the set of samples to the expected profit of an optimal mechanism in M. We summarize this connection in the following remark, which is well-known for learning over general function classes in machine learning (see, for example, the textbooks by Shalev-Shwartz and Ben-David [70] and Mohri et al. [60]). Remark 3.2. For a set of samples S from the distribution over buyers’ values, let Mˆ be the mech- anism in M that maximizes average profit over the set of samples and let M ∗ be the mechanism in M that maximizes expected profit over D. Finally, let U be the maximum profit achievable by any mechanism in M over the support of the distribution D. For any δ ∈ (0, 1), with prob- ability at least 1 − δ over the draw of a set of N samples S from D, the difference between the ˆ ∗ expected profit of M over D and the expected profit of M over D is at most 2M (N, δ). In ˆ P ∗ other words, if M = argmaxM∈M v∈S profitM (v) , M = argmaxM∈M {Ev∼D [profitM (v)]}, and U = maxM∈M,v∈supp(D) {profitM (v)}, then     Pr E profitM ∗ (v) − profitMˆ (v) > 2M (N, δ) < δ. S∼DN v∼D

Therefore, so long as there is a strong generalization bound M(N, δ) associated with the class M, the mechanism designer can be confident that an optimal mechanism over the set of sam- ples competes with an optimal mechanism in M. Similar bounds also hold for mechanisms with approximately optimal average profit over the samples (see Corollaries 8.1 and 8.2 in Appendix 8).

3.1 General structure for sample-based mechanism design Our general theorem uses structure shared by a myriad of mechanism classes to characterize the function M (N, δ). Our results apply broadly to parameterized sets M of mechanisms. This means d that every mechanism in M is defined by a vector p ∈ R , where the value of d depends on the mechanism class. For example, p might be a vector of prices. Our guarantees apply to mechanism classes where for every valuation vector v ∈ X , the profit as a function of the parameters p, denoted profitv (p), is piecewise linear. We begin by illustrating this property via several simple examples.

9 Figure 1: This figure illustrates the partition of the two-part tariff parameter space into piecewise- linear portions in the following scenario: there is one buyer whose value for one unit is v1(1) = 6, value for two units is v1(2) = 9, value for three units is v1(3) = 11, and value for i ≥ 4 units is v1(i) = 12. We assume the seller produces at most four units. The buyer will buy exactly one unit if v1(1) − p1 − p2 > v1(i) − p1 − i · p2 for all i ∈ {2, 3, 4} and v1(1) − p1 − p2 > 0. This region of the parameter space is colored orange (the top region), and within, profit is linear in p1 and p2. By similar logic, the buyer will buy exactly two units in the blue region (the second-the-top region), exactly three units in the green region (the second-the-bottom region), and exactly four units in the red region (the bottom region), and profit is linear in p1 and p2 in each of these regions.

Example 3.3 (Two-part tariffs). In a two-part tariff, there are multiple units (i.e., copies) of a single item for sale. The seller sets an upfront fee p1 and a price per unit p2. Here, we consider the simple case where there is a single buyer†. If the buyer wishes to buy t ≥ 1 units, she pays the upfront fee p1 plus p2 · t, and if she does not want to buy anything, she does not pay anything. Two-part tariffs have been studied extensively [38, 64, 80] and are prevalent throughout daily life. For example, health club membership programs often require an upfront membership fee plus a fee per month. Amusement parks often require an entrance fee with an additional payment per ride, as Oi [64] analyzed in his paper “A Disneyland Dilemma.” In many cities, purchasing a public transportation card requires a small upfront fee and an additional cost per ride. Many coffee machines, such as those made by Keurig and Nespresso, require specialty coffee pods. Purchasing these pods amounts to paying a fee per unit on top of the upfront fee, which is the cost of the coffee machine. Following the publication of the conference version of this paper [11], Balcan, Prasad, and Sandholm [12] showed how to efficiently learn two-part tariffs that maximize average revenue over a training set, as well as menus of two-part tariffs, which we discuss in Section 3.4.1. Revenue is a piecewise-linear function of the two-part tariff parameters p1 and p2 for the follow- ing reason. Suppose there are κ units of the item for sale. The buyer will buy exactly t ∈ {1, . . . , κ} 0 0 0 units so long as v1 (t)−(p1 + p2 · t) > v1 (t )−(p1 + p2 · t ) for all t 6= t and v1 (t)−(p1 + p2 · t) > 0. κ+1 2 Therefore, for a fixed set of buyer values, there are at most 2 hyperplanes splitting R into convex regions such that within any one region, the number of units bought does not vary. So long as the number of units bought is invariant, profit is a linear function of p1 and p2. See Figure 1 for an illustration.

Example 3.4 (Item-pricing mechanisms). Under an item-pricing mechanism, also known as a posted-price mechanism, there are multiple items for sale and multiple buyers. Unlike in Exam- ple 3.3, we assume there is a single unit of each item for sale. The mechanism designer sets a price per item and buyers buy their utility-maximizing bundles. These mechanisms are prevalent

†We generalize to multiple buyers in Section 3.4.

10 Figure 2: This figure illustrates the partition of the item-pricing parameter space into piecewise- linear portions in the following scenario: there are two items for sale and there are two buyers. Buyer 1’s value for the first item is v1(1, 0) = 2, her value for the second item is v1(0, 1) = 1, and her value for both items is v1(1, 1) = 2.5. Buyer 2’s values are v2(1, 0) = 0, v2(0, 1) = 1, and v2(1, 1) = 1. Suppose buyer 1 comes before buyer 2 in the ordering, which means that buyer 1 will first choose to buy the bundle of items that maximizes her utility and then buyer 2 will buy the bundle of remaining items that maximizes her utility. In the orange region, buyer 1 will buy item 1 because v1(1, 0) − p1 > v1(0, 1) − p2, v1(1, 0) − p1 > v1(1, 1) − (p1 + p2), and v1(1, 0) − p1 > 0. Buyer 2 will not buy anything because the only remaining item is item 2 and v2(0, 1) − p2 < 0. By the same logic, in the red region, neither buyer will buy any item. In the blue region, buyer 1 will buy item 1 and buyer 2 will buy item 2. In the green region, buyer 1 will buy item 2 and buyer 2 will not buy anything. Finally, in the white region, buyer 1 will buy both items and buyer 2 will not buy anything. throughout the economics and computation literature (e.g., [4, 22, 37]). In a bit more detail, an item-pricing mechanism can be defined by either anonymous or non-anonymous prices. Under anonymous prices, the seller sets a price pi per item i. Under non-anonymous prices, there is a buyer-specific price per item. There is some fixed but arbitrary ordering on the buyers such that the first buyer in the ordering arrives first and buys the bundle of items that maximizes his utility, then the next buyer in the ordering arrives and buys the bundle of remaining items that maximizes his utility, and so on. Consider the simple case where the prices are anonymous‡. For a given buyer j and bundle pair q , q ∈ {0, 1}m, buyer j will prefer bundle q over bundle q so long as v (q ) − P p > 1 2 1 2 j 1 i:q1[i]=1 i v (q ) − P p . Therefore, for a fixed set of buyer values and for each buyer j, her preference j 2 i:q2[i]=1 i 2m ordering over the bundles is completely determined by the 2 hyperplanes    X X m vj(q1) − pi = vj(q2) − pi : q1, q2 ∈ {0, 1} .

 i:q1[i]=1 i:q2[i]=1  Once the buyers’ preference orderings are all fixed, the set of bundles they will each buy is also fixed. Furthermore, in any region of the price space where the bundles the buyers buy are fixed, profit is a linear function of the prices. See Figure 2 for an illustration. We provide generalization guarantees that are closely dependent on the “complexity” of the d partition splitting R into regions such that profitv (p) is linear. Inspired by structure exhibited by many mechanism classes, such as Examples 3.3 and 3.4, we require that this partition be defined by a finite number of hyperplanes. We give the following name to this type of mechanism class: Definition 3.5 ((d, t)-delineable). We say a mechanism class M is (d, t)-delineable if:

‡We study non-anonymous prices as well in Section 3.4.

11 (a) A partition by hy- (b) Another partition (c) Overlay of parti- (d) A further subdivi- perplanes. by hyperplanes. tions (a) and (b). sion of each region.

Figure 3: Illustrations of the proof of Lemma 3.10.

d 1. The class M consists of mechanisms parameterized by vectors p from a set P ⊆ R ; and 2. For any valuation vector v ∈ X , there is a set H of t hyperplanes such that for any connected 0 0 component P of P \ H, the function profitv (p) is linear over P . As is standard, P \ H indicates set removal. For example, in Figures 3a and 3b, there are four connected components in the partition defined by the hyperplanes and in Figure 3c, there are nine connected components. We relate delineability to the mechanism class’s intrinsic complexity using pseudo-dimension, which we describe in the following section. In turn, pseudo-dimension bounds imply generalization guarantees.

3.2 Pseudo-dimension Pseudo-dimension [65] is a well-studied learning-theoretic tool used to measure the complexity of a function class. In the case of mechanism design, when we refer to a function class, we mean the set of all profit functions profitM defined by mechanisms M from a fixed mechanism class M. Pseudo- dimension captures the following intuition: the richer a function class is, the better functions within that class are able to fit complex patterns. Pseudo-dimension is a quantity that measures just how well functions within a class are able to match these complex patterns. To formally define pseudo- dimension, we first introduce the notion of shattering, which is a fundamental concept in machine learning theory. We first describe it for general function classes, and then instantiate it for profit functions.

Definition 3.6 (Shattering). Let F be a set of real-valued functions f : A → R whose domain is  (1) (N) (1) (N) an abstract set A. Let S = x , . . . , x be a subset of A and let z , . . . , z ∈ R be a set of witnesses. We say that z(1), . . . , z(N) witness the shattering of S by F if for all T ⊆ S, there exists (i) (i) (i) (i) (i) (i) some function fT ∈ F such that for all x ∈ T , fT x ≤ z and for all x 6∈ T , fT x > z . Figure 4 provides a visualization. Intuitively, the larger the set a function class can shatter, the more complex that function class is. The notion of pseudo-dimension formalizes this intuition.

Definition 3.7 (Pseudo-dimension [65]). Let F be a set of real-valued functions f : A → R whose domain is an abstract set A. Let S ⊆ A be the largest set that can be shattered by F. Then the pseudo-dimension of F, denoted Pdim(F), is defined to be |S|.

For the sake of clarity, we now translate pseudo-dimension into the language of mechanism  (1) (N) (1) (N) design. Let S = v ,..., v be a subset of X and let z , . . . , z ∈ R be a set of witnesses. We say that z(1), . . . , z(N) witness the shattering of S by M if for all T ⊆ S, there exists some mechanism M ∈ such that for all v(i) ∈ T , profit v(i) ≤ z(i) and for all v(i) 6∈ T , T M MT

12 (a) Illustration of an affine function f (1) : R → R (b) Illustration of another affine function f (2) : (1) (1) (1) (2) (1) (1) (2) (2) where f x is larger than the witness z R → R. Here, f x < z and f x > and f (1) x(2) < z(2). z(2).

(c) Illustration of a third function f (3). Here, (d) Illustration of one last function f (4). Here, f (3) x(1) > z(1) and f (3) x(2) > z(2). f (4) x(1) < z(1) and f (4) x(2) < z(2).

Figure 4: The two points x(1) and x(2) can be shattered by the set F of affine functions mapping R to R, i.e., F = {x 7→ ax + b : a, b ∈ R}. profit v(i) > z(i). If there exists some z ∈ N that witnesses the shattering of by , then MT R S M we say that S is shatterable by M. Finally, the pseudo-dimension of M, denoted Pdim (M), is the size of the largest set that is shatterable by M. Pollard [65] and Dudley [34] provide generalization guarantees§ in terms of pseudo-dimension. We translate those guarantees into the language of mechanism design below in Theorem 3.8. The theorem holds for any mechanism class M, delineable or otherwise.

Theorem 3.8 ([34, 65]). For any mechanism class M and any distribution D over buyers’ values, let U be the maximum profit achievable by mechanisms in M over the support of D. Then there exists a generalization guarantee M : Z≥1 × (0, 1) → R≥0 defined such that r r Pdim(M) 2 ln(4/δ)  (N, δ) = 120U + 4U . M N N

In other words, with probability at least 1 − δ over the draw of the samples S = v(1),..., v(N) ∼ DN , for all mechanisms M ∈ M,

N r r 1   Pdim(M) 2 ln(4/δ) X (i) profitM v − E [profitM (v)] ≤ 120U + 4U . N v∼D N N i=1 By Theorem 3.8, if we can upper bound the pseudo-dimension of a mechanism class M, then we can bound the difference between the average profit over a set of samples and the actual expected profit of any mechanism in M. In the next section, we provide a generalization guarantee M(N, δ) by upper bounding Pdim(M) using the delineability parameters d and t.

§Pseudo-dimension also implies multiplicative generalization guarantees in addition to the additive generalization guarantee presented in Theorem 3.8 [46, 53, 75].

13 3.3 General theorem for sample-based mechanism design In the following theorem, which is our main theorem, we relate pseudo-dimension to delineability (Definition 3.5). Theorem 3.9. Suppose M is a mechanism class that is (d, t)-delineable. Given a distribution D over buyers’ values, let U be the maximum profit achievable by mechanisms in M over the support of D. There exists a generalization guarantee M : Z≥1 × (0, 1) → R≥0 defined such that r r 9d log(4dt) 2 ln(4/δ)  (N, δ) = 120U + 4U . M N N In other words, with probability at least 1 − δ over the draw of a set S = v(1),..., v(N) ∼ DN , for all mechanisms M ∈ M,

N r r 1   9d log(4dt) 2 ln(4/δ) X (i) profitM v − E [profitM (v)] ≤ 120U + 4U . N v∼D N N i=1 Proof. This theorem follows directly from the following lemma, where we prove that the pseudo dimension of M is at most 9d log(4dt), and Theorem 3.8, which provides a generalization guarantee in terms of pseudo-dimension.

Lemma 3.10. If M is a mechanism class that is (d, t)-delineable, then the pseudo dimension of M is at most 9d log(4dt). Proof. For any positive integer N, consider a set of N valuation vectors S = v(1),..., v(N) and (1) (N) real values z , . . . , z ∈ R. We will show that there exists a partitioning of the parameter space d d (i) into at most dN · d(Nt) regions such that for all p in any one region and all v , profitv(i) (p) is either greater than z(i) or less than z(i) (but not both). We will then use this fact to bound the pseudo-dimension of M. To this end, for v(i) ∈ S, let H(i) be the set of t hyperplanes such that for any connected 0 (i) 0 component P of P \ H , profitv(i) (p) is linear over P . We now consider the overlay of all N (1) (N) partitions P \H ,..., P \H . Formally, this overlay is made up of the sets P1,..., Pτ , which are SN (i) the connected components of P \ i=1 H . For each set Pj and each i ∈ [N], Pj is completely (i) contained in a single connected component of P \ H , which means that profitv(i) (p) is linear over (i) d Pj. (See Figures 3a, 3b, and 3c for illustrations.) Since H ≤ t for all i ∈ [N], τ < d(Nt) (this follows from Theorem 1 by Buck [19]). SN (i) (i) Consider a single connected component Pj of P\ i=1 H . For any valuation vector v ∈ S, (i) d (i) we know that profitv(i) (p) is linear over Pj. Let aj ∈ R and bj ∈ R be the weight vector and (i) (i) offset such that profitv(i) (p) = aj · p + bj for all p ∈ Pj. We know that there is a hyperplane (i) (i) (i) (i) aj · p + bj = z where on one side of the hyperplane, profitv(i) (p) ≤ z and on the other (i) side, profitv(i) (p) > z for all p ∈ Pj. Let HPj be all N hyperplanes for all N samples, i.e., n (i) (i) (i) o 0 HPj = aj · p + bj = z : i ∈ [N] . Notice that in any connected component P of Pj \ HPj (i) (i) (illustrated in Figure 3d), for all i ∈ [N], profitv(i) (p) is either greater than z or less than z 0 (but not both) for all p ∈ P . In total, the number of connected components of Pj \ HPj is smaller d than dN . The same holds for every partition Pj. Thus, the total number of regions where for (i) (i) all i ∈ [N], profitv(i) (p) is either greater than z or less than z (but not both) is smaller than dN d · d(Nt)d.

14 We will now use this fact to bound the pseudo-dimension of M. Suppose Pdim(M) = N¯. By n (1) (N¯)o (1) (N¯) definition, there exists a set v ,..., v that is shattered by M. Let z , . . . , z ∈ R be the points that witness this shattering. Again, by definition, we know that for any T ⊆ [N¯], there exists a parameter vector p ∈ such that if i ∈ T , then profit v(i) ≥ z(i) and if i 6∈ T , then T P pT profit v(i) < z(i). Let ∗ = p : T ⊆ [N¯] . We know that there exist k ≤ dN¯ d ·d(Nt¯ )d regions pT P T ¯ (i) P1,..., Pk where for each region Pj and each i ∈ [N], either profitv(i) (p) is greater than than z (i) ∗ for all p ∈ Pj or it is less than z . At most one vector in P can come from any one region. This means that |P ∗| = 2N¯ < dN¯ d · d(Nt¯ )d, which means that N¯ < 2d log N¯ + 2 log d + d log t. By Lemma 8.3 in Appendix 8, N¯ < 8d log(4d) + d log t + 2 log d ≤ 9d log(4dt).

In this section, we saw that if the profit functions of mechanisms from a class M are piecewise linear functions of the parameters, then we can provide a generalization guarantee M(N, δ) in terms of the class’s deliniability parameters. Thus, we can bound the difference between the average profit over a set of samples and the actual expected profit of any mechanism in the class.

3.4 Delineable mechanism classes We now show that a diverse array of mechanism classes are delineable, so we can apply Theorem 3.9. We warm up with Examples 3.3 and 3.4. The full proofs of all results are in Appendix 8. Lemma 3.11. Let M be the class of two-part tariffs over a single buyer and κ units (i.e., copies) κ+1 of a single item. Then M is 2, 2 -delineable. 2 Proof. The parameter space defining M is R . As we saw in Example 3.3, for any valuation vector κ+1 2 v, there are 2 hyperplanes splitting R into regions over which profitv (p) is linear. Lemma 3.12. Let M be the class of item-pricing mechanisms with anonymous prices. Then M  2m is m, n 2 -delineable. m Proof. This class’s parameter space is R , since there is one price per item. As we saw in Exam- 2m m ple 3.4, for any valuation vector v, there are n 2 hyperplanes splitting R into regions such that within any one region, profitv (p) is linear.

3.4.1 Non-linear pricing mechanisms. Non-linear pricing mechanisms are specifically used to sell multiple units (i.e., copies) of a set of items. We discuss two-part tariffs and general non-linear pricing mechanisms. We make the following natural assumption which informally states that as the number of units in an allocation grows, the cost of that allocation will exceed the buyers’ welfare. This implies delineability. Our assumption is formalized as follows.

Assumption 3.13. There is some cap κi per item i such that it costs more to produce κi units of m item i than the buyers will pay. In other words, there exists (κ1, . . . , κm) ∈ R such that for all v in the support of the distribution D over buyers’ values and all allocations Q = (q1,..., qn), if Pn Pn there exists an item i such that j=1 qj[i] > κi, then j=1 vj (qj) − c (Q) < 0. For example, suppose there are multiple units of a single item for sale and a single buyer whose value for τ units is always at most 8 + τ. Moreover, suppose the cost of producing τ units is 3τ. Then κ1 = 4 because for τ ≥ 5, 8 + τ < 3τ. In the remainder of this section, we describe why this assumption implies (d, t)-delineability with d < ∞ and t < ∞.

15 Menus of two-part tariffs. Menus of two-part tariffs are a generalization of Example 3.3. At a high level, the seller offers the buyers ` different two-part tariffs and each buyer simultaneously chooses the tariff and number of units that maximizes his utility. Menus of two-part tariffs are common throughout daily life: consumers often choose among various membership tiers—typically with a larger upfront fee in exchange for lower future payments—for health clubs, wholesale stores like Costco, amusement parks, credit cards, and cellphone plans.  (1) (1)  (`) (`) More formally, in the case of non-anonymous prices, let p1,j , p2,j ,..., p1,j, p2,j be the menu (i) th of two-part tariffs that the seller offers to buyer j. Here, p1,j is the upfront fee of the i tariff and (i) (i) (i) p2,j is the price per unit. If the prices are anonymous, then for each tariff i, p1,1 = ··· = p1,n and (i) (i) ¶ p2,1 = ··· = p2,n. We assume each buyer simultaneously chooses the tariff and the number of units maximizing his utility. In this context, an allocation is simply a vector (q1, . . . , qn) where qj ∈ Z≥0 is the number of units buyer j chooses to buy. If buyer j chooses the tariff tj ∈ [`] and chooses to (tj ) (tj ) buy qj ≥ 1 units, he will pay p1,j + p2,j · qj. Thus, the seller’s profit is n X (tj ) (tj ) p1,j · 1{qj ≥1} + p2,j · qj − c (Q) j=1

2n` where Q = (q1, . . . , qn). Note that this set of mechanisms is parameterized by vectors in R if 2` there are non-anonymous prices and R if the prices are anonymous. Assumption 3.13 states that there is some cap κ ∈ R such that for all v in the support of Pn the distribution D over buyers’ values and all allocations Q = (q1, . . . , qn), if j=1 qj > κ, then Pn j=1 vj (qj) − c (Q) < 0. We also make the natural assumption that the mechanism designer will not choose prices that cause the seller to have negative utility. In other words, we assume he will select a profit non-negative menu of two-part tariffs, formalized as follows. 2` Definition 3.14. In the case of anonymous prices (respectively, non-anonymous), let P ⊆ R 0 2n` (respectively, P ⊆ R ) be the set of prices where no matter which tariff each buyer chooses and no matter how many units he buys, the seller will obtain non-negative utility. In other words, let P (respectively, P 0) be the set of mechanism parameters such that for each buyer j ∈ [n], each tariff tj ∈ [`], and each allocation Q = (q1, . . . , qn), the seller’s utility is non-negative: n X (tj ) (tj ) p1,j · 1{qj ≥1} + p2,j · qj − c (Q) ≥ 0. j=1 The set of profit non-negative menus of two-part tariffs is defined by parameters in P (respectively, P 0). Under Assumption 3.13, we are guaranteed that no matter which profit non-negative parameters the mechanism designer chooses in P or P 0, if all buyers simultaneously choose the tariff and the Pn number of units (q1, . . . , qn) that maximize their utilities, then j=1 qj ≤ κ. See Lemma 8.4 in Appendix 8 for the proof. This fact is crucial to proving delineability since the number of hyperplanes splitting the parameter space into regions where profit is linear depends on κ. Lemma 3.15. Let M and M0 be the classes of anonymous and non-anonymous profit non-negative   length-` menus of two-part tariffs. Under Assumption 3.13, M is 2`, n (κ`)2 -delineable and M0   is 2n`, n (κ`)2 -delineable.

¶The fact that the buyers arrive simultaneously and that there are multiple units of each item for sale differentiates this setting from that of item-pricing, which we describe in Example 3.4 and later in this section.

16 General non-linear pricing mechanisms. We study general non-linear pricing mechanisms under Wilson’s bundling interpretation [80]: if the prices are anonymous, there is a price per quantity vector q denoted p (q). The buyers simultaneously choose the bundles maximizing their utility: buyer j will purchase the bundle that maximizes vj (q) − p (q). If the prices are non- anonymous, there is a price per quantity vector q and buyer j ∈ [n] denoted pj (q). These general non-linear pricing mechanisms include multi-part tariffs as a special case. Without any assumptions, the parameter space of this mechanism class has an infinite number of dimensions, since the mechanism designer, in theory, could choose prices for each and every m bundle q ∈ Z≥0. In Lemma 8.5, we show that under Assumption 3.13, no buyer will ever choose a bundle q such that q[i] > κi for any i ∈ [m]. This holds so long as the mechanism designer chooses a profit non-negative non-linear pricing mechanism (see Definition 3.16, which is similar to Definition 3.14). Thus, the seller only needs to carefully set the prices of the bundles q such Qm that q[i] ≤ κi for all i ∈ [m]. Letting K = i=1(κi + 1) be the number of such bundles, the set K of anonymous prices the mechanism designer needs to carefully set is a subset of R and the set nK of non-anonymous prices is a subset of R . We now provide the definition of profit non-negative non-linear pricing mechanisms. Definition 3.16. In the case of anonymous prices (respectively, non-anonymous), let P (respec- tively, P 0) be the set of mechanism parameters such that for each buyer j ∈ [n] and each allocation Q = (q1,..., qn), the seller’s utility is non-negative:

n X pj(qj) − c (Q) ≥ 0. j=1

The set of profit non-negative non-linear pricing mechanisms is defined by parameters in P (re- spectively, P 0). Under Assumption 3.13, we are guaranteed that no matter which profit non-negative parameters the mechanism designer chooses in P or P 0, if all buyers simultaneously choose the bundles that Pn maximize their utilities, then j=1 qj[i] ≤ κi for all i ∈ [m]. See Lemma 8.5 in Appendix 8 for the proof. As in the case of two-part tariffs, this fact is crucial to proving delineability since the number of hyperplanes splitting the parameter space into regions where profit is linear depends on κ1, . . . , κm. Moreover, under these assumptions, the set of mechanism parameters is effectively K-dimensional in the case of anonymous prices and nK-dimensional in the case of non-anonymous prices: without loss of generality, the mechanism designer might as well set the price of any bundle q such that q[i] ≥ κi for some i ∈ [m] to ∞. Lemma 3.17. Let M and M0 be the classes of anonymous and non-anonymous profit non-negative Qm 2 non-linear pricing mechanisms. Let K = i=1 (κi + 1). Under Assumption 3.13, M is K, nK - delineable and M0 is nK, nK2-delineable. We prove polynomial bounds when prices are additive over items (Lemma 8.7 in Appendix 8).

3.4.2 Item-pricing mechanisms. We now apply Theorem 3.9 to anonymous and non-anonymous item-pricing mechanisms. Unlike non-linear pricing mechanisms, in this setting, there is only a single unit of each item for sale. Under anonymous prices, the seller sets a price per item. Under non-anonymous prices, there is a buyer-specific price per item. Since there is only one unit of each item for sale, we cannot assume the buyers arrive simultaneously and buy the bundle of goods maximizing their utilities, as we did

17 when studying non-linear pricing. After all, this may lead to over-demand. Rather, we assume that there is some fixed but arbitrary ordering on the buyers such that the first buyer in the ordering arrives first and buys the bundle of items that maximizes his utility, then the next buyer in the ordering arrives and buys the bundle of remaining items that maximizes his utility, and so on. This assumption is prevalent throughout the economics and computation literature (e.g., [4, 22, 37]).

Lemma 3.18. Let M and M0 be the classes of item-pricing mechanisms with anonymous prices and non-anonymous prices, respectively. If the buyers are additive‖, then M is (m, m)-delineable and M0 is (nm, nm)-delineable.

In Appendix 9, we connect the hyperplane structure we investigate in this paper to the struc- tured prediction literature in machine learning (e.g., [26]), thus proving even stronger generalization bounds for item-pricing mechanisms under buyers with unit-demand and general valuations and answering an open question by Morgenstern and Roughgarden [62]. Balcan et al. [9] were the first to explore the connection between structured prediction and mechanism design, though in a different setting from us: they provided algorithms that make use of past data describing the purchases of a utility-maximizing agent to produce a hypothesis function that can accurately forecast the future behavior of the agent. In Section 6, we compare these results with those from prior research [62, 71].

3.4.3 Auctions. We now present applications of Lemma 3.10 to auctions. In each setting, there is a single unit of each item for sale.

Second price item auctions with item reserves. These auctions are only strategy proof for additive bidders, so we restrict our attention to this setting. In the case of non-anonymous reserves, there is a price pj (ei) for each item i and each bidder j. The bidders submit bids on the items. For each item i, the highest bidder j wins the item if and only if her bid is above pj (ei). She pays the maximum of the second highest bid and pj (ei). If the bidder with the highest bid bids below her reserve, the item goes unsold. In the case of anonymous reserves, p1 (ei) = p2 (ei) = ··· = pn (ei) for each item i.

Lemma 3.19. Let M and M0 be the classes of anonymous and non-anonymous second price item auctions. Then M is (m, m)-delineable and M0 is (nm, m)-delineable.

In Section 6, we provide a detailed comparison of this lemma with results from prior research [30, 62, 71], in both the single- and multi-item setting.

Mixed bundling auctions with reserve prices (MBARPs). MBARPs [73] are a variation on the VCG mechanism with item reserve prices, with an additional fixed boost to the social welfare of any allocation where some bidder receives the grand bundle. Recall that in a single-item VCG auction (i.e., second-price auction) with a reserve price, the item is only sold if the highest bidder’s bid exceeds the reserve price, and the winner must pay the maximum of the second highest bid and the reserve price. To generalize this intuition to the multi-item case, we enlarge the set of agents to include the seller, whose valuation for a set of items is the set’s reserve price. An MBARP gives

‖In the two cases where the buyers have unit-demand and general valuations, we prove generalization bounds by connecting the hyperplane structure we investigate in this paper to the structured prediction literature in machine learning. See Appendix 9 for details.

18 an additional additive boost to the social welfare of any allocation where some bidder receives the grand bundle, and then runs the VCG mechanism over this enlarged set of bidders. The allocation is the boosted social welfare maximizer and the payments are the VCG payments on the boosted social welfare values. Importantly, the seller makes no payments, no matter her allocation. Formally, MBARPs are defined by a parameter γ ≥ 0 and m reserve prices p (e1) , . . . , p (em). Let λ be a function such that λ (Q) = γ if some bidder receives the grand bundle under allocation Q and 0 otherwise. For an allocation Q, let qQ be the items not allocated. Given a valuation vector v, the MBARP allocation is

 n  ∗ ∗ ∗ X X  Q = (q1,..., qn) = argmax vj (qj) + p (ei) + λ (Q) − c (Q) .

j=1 i:qQ[i]=1  Using the notation   −j  −j −j X X  Q = q1 ,..., qn = argmax v` (q`) + p (ei) + λ (Q) − c (Q) , `6=j i:qQ[i]=1  bidder j pays

X  −j X −j −j X ∗ X ∗ ∗ v` q` + p (ei) + λ Q − c Q − v` (q` ) − p (ei) − λ (Q ) + c (Q ) . `6=j `6=j i:qQ−j [i]=1 i:qQ∗ [i]=1

Lemma 3.20. Let M be the set of MBARPs. Then M is m + 1, (n + 1)2m+1-delineable. In Table 4, we compare this lemma with results from prior research [10]. Mixed-bundling auctions [49] are a particularly simple class of auctions which correspond to MBARPs with no reserve prices. In this specific case, the class’s profit functions display additional structure that allows us to provide an even better guarantee than that implied by Lemma 3.10: for any valuation vector v, the function profitv is one-dimensional and has exactly one strict local maximum. See Appendix 8.2 for the details.

Affine maximizer auctions (AMAs). AMAs are an expressive mechanism class: Roberts [66] proved that AMAs are the only ex post truthful mechanisms over unrestricted value domains. Later, Lavi et al. [52] proved that under natural assumptions, every truthful multi-item auction is an “almost” AMA, that is, an AMA for sufficiently high values.∗∗ An AMA is defined by a weight per bidder wj ∈ R>0 and a boost per allocation λ (Q) ∈ R≥0. By increasing any wj or λ (Q), the seller can increase bidder j’s bids or increase the likelihood that Q is the auction’s allocation. The AMA allocation Q∗ is the one which maximizes the weighted social welfare, i.e., ∗ ∗ ∗ nPn o Q = (q1,..., qn) = argmax j=1 wjvj (qj) + λ (Q) − c (Q) . The payments have the same form as the VCG payments, with the parameters factored in to ensure truthfulness. Formally, using the notation   −j  −j −j X  Q = q1 ,..., qn = argmax w`v` (q`) + λ (Q) − c (Q) , `6=j 

∗∗Surprisingly, even when the buyers have additive values, AMAs can generate higher revenue than running a separate Myerson auction for each item [69], even though the buyers’ values do not exhibit any complementarity or substitutability. While this may seem surprising, it is to be expected due to prior research in bundle pricing in catalog offers [1, 6, 57].

19 each bidder j pays    1 X  −j −j −j X ∗ ∗ ∗  w`v` q` + λ Q − c Q −  w`v` (q` ) + λ (Q ) − c (Q ) . wj `6=j `6=j

A virtual valuation combinational auction (VVCA) [54] is an AMA where each λ (Q) is split Pn into n terms such that λ (Q) = j=1 λj (Q) where λj (Q) = cj,q for all allocations Q that give bidder j exactly bundle q. Finally, λ-auctions [49] are a special case of AMAs where the bidder weights equal 1.

Lemma 3.21. Let M, M0, and M00 be the classes of AMAs, VVCAs, and λ-auctions, respec- tively. Letting t = (n + 1)2m+1, we have that M is 2n(n + 1) + (n + 1)m+1, t-delineable, M0 is (n2m(3 + 2n), t)-delineable, and M00 is ((n + 1)m , t)-delineable.

In Table 4, we compare this lemma with results from prior research [10]. Lemma 3.21 implies exponentially-many samples are sufficient to ensure that average profit over the samples and expected profit are close. In Section 3.5, we prove an exponential number of samples is also necessary. We now study two hierarchies of AMAs. In Section 5, we show how to learn which level of the hierarchy optimizes the tradeoff between generalization and profit for the setting at hand.

Q-boosted AMAs and λ-auctions. Let Q be a set of allocations. The set of Q-boosted AMAs (resp., λ-auctions) consists of all AMAs (resp., λ-auctions) where only allocations in Q are boosted. In other words, if λ (Q) > 0, then Q ∈ Q.

Lemma 3.22. Let M and M0 be the classes of Q-boosted AMAs and λ-auctions. Then M is     3n(n + |Q|), (n + 1)2(m+1) -delineable and M0 is |Q|, (n + 1) (|Q| + 1)2 -delineable.

3.4.4 Lotteries. We now apply Theorem 3.9 to lottery menus. Lotteries are randomized mechanisms which are known to generate higher expected revenue than deterministic mechanisms in many settings (e.g., [28, 56, 74])  (0) (0) (1) (1) (`) (`) m A length-` lottery menu is a set M = φ , p , φ , p ,..., φ , p ⊆ R ×R, where φ(0) = 0 and p(0) = 0. As is typical in the lottery literature, we assume there is a single additive buyer; our results easily generalize to unit-demand buyers and multiple buyers (see Appendix 8.1 for details). Under the lottery φ(j), p(j), the buyer receives each item i with probability φ(j)[i] (j) (j) and pays a price of p . Since v1(ei) · φ [i] is his value for item i times the probability they get Pm (j) (j) that item, his expected utility is i=1 v1(ei) · φ [i] − p . Given a buyer with values defined by v, let (φv, pv) ∈ M be the lottery that maximizes the buyer’s expected utility. We use the notation q ∼ φv to denote a random allocation of the lottery (φv, pv). Specifically, for all i ∈ [m], q[i] = 1 with probability φv[i] and q[i] = 0 with probability 1 − φv[i]. The expected profit is profitM (v) = pv − Eq∼φv [c (q)]]. Let M be the class of all length-` lottery menus. The key challenge in bounding Pdim (M) is that Eq∼φv [c (q)] is not a piecewise linear function of the parameters φ(0),..., φ(`). To overcome this challenge, rather than bounding Pdim (M), we bound the pseudo-dimension of a related class M0. We then show that optimizing over M0 amounts to optimizing over M itself. To motivate the definition of M0, notice that if m z ∼ U ([0, 1] ), the probability that z[j] is smaller than φv[j] is φv[j]. Therefore, Eq∼φv [c (q)] =

20 h  i   c P e . For a lottery M, we define profit0 (v, z) := p − c P e and Ez j:z[j]<φv[j] j M v j:z[j]<φv[j] j 0  0 0 define M = profitM : M ∈ M . The important insight is that M is delineable because for a fixed pair (v, z), both the lottery the buyer chooses and the bundle P e are defined by j:z[j]<φv[j] j a set of hyperplanes.   Lemma 3.23. The class M0 is ` (m + 1) , (` + 1)2 + m` -delineable. The following lemma guarantees that optimizing over M0 amounts to optimizing over M itself. 0 It follows from the fact that for all v and M, profitM (v) = Ez[profitM (v, z)]. Lemma 3.24. With probability 1 − δ over the draw n   o v(1), z(1) ,..., v(N), z(N) ∼ (D × U ([0, 1])m)N , for all mechanisms M ∈ M, N r r 1   Pdim(M0) 2 ln(4/δ) X 0 (i) (i) profitM v , z − E [profitM (v)] ≤ 120U + 4U . N v∼D N N i=1 As we have seen in this section, a wide variety of mechanism classes are delineable. There- fore, Theorem 3.9 immediately implies a generalization guarantee M(N, δ) for a diverse array of mechanism classes M.

3.5 Sample complexity Generalization guarantees can be inverted to provide sample complexity guarantees. A sample complexity guarantee bounds the number of samples required to ensure a desired bound on gener- alization. In other words, suppose that for some δ ∈ (0, 1) and some  > 0, the mechanism designer requires that with probability at least 1 − δ, for all mechanisms in the class M, average profit is -close to expected profit. By choosing N so that M(N, δ) < , we can be assured that this will be the case. This leads to the following well-known remark. q C1 Remark 3.25. Let M be a mechanism class with a generalization guarantee M(N, δ) = U N + q 2 C2 4U N log δ for some C1,C2 ∈ R and let D be the distribution over buyers’ values. Then for 2 2U C2  δ ∈ (0, 1) and  > 0, a sample size of N = 2 C1 + 32 log δ is sufficient to ensure that with probability at least 1−δ over the draw of a set S ∼ DN , for any mechanism M ∈ M, the difference between the average profit of M over S and the expected profit of M over D is at most . q q C1 2 C2 For every mechanism class M we study, M(N, δ) = U N +4U N log δ for some C1,C2 ∈ R, so Remark 3.25 applies. Notice that the number of samples required to ensure a strong generaliza- tion guarantee for the classes of AMAs, VVCAs, and λ-auctions is expononential in m. We prove that this exponential factor is, in fact, necessary.

Theorem 3.26. For a class of auctions M, let NM(, δ) be the number of samples required to ensure that for any distribution D, with probability at least 1 − δ over the draw of a set of samples of size NM(, δ) from D, for all auctions M ∈ M, average profit is -close to expected profit. nm−n 1. If M is the set of AMAs or λ-auctions, then NM(, δ) ≥ 2 . m 2. If M is the set of VVCAs, then NM(, δ) ≥ 2 − 2. We prove Theorem 3.26 in Appendix 8.3. Thus, delineability implies not only generalization guarantees but also sample complexity guarantees which are nearly tight for AMAs and VVCAs.

21 4 Distribution-dependent generalization guarantees

In this section, we provide two ways to strengthen the results in Section 3 when the buyers’ values are additive and drawn from item-independent distributions. We say a distribution over buyers’ values is item-independent if for all i1, i2 ∈ [n] and j ∈ [m], bidder i1’s value for item j is independent from 0 her value for item j , but her value for item j may be arbitrarily correlated with bidder i2’s value for item j or j0. We also require that the mechanism class’s profit functions decompose additively. For example, under item-pricing mechanisms, the profit function decomposes into the profit obtained from selling item 1, plus the profit obtained by selling item 2, and so on. Surprisingly, our bounds do not depend on the number of items and under anonymous prices, they do not depend on the number of bidders either. To obtain our data-dependent guarantees, we move from pseudo-dimension to Rademacher complexity [13, 51], which allows us to prove distribution-dependent generalization guarantees. This is the key advantage of Rademacher complexity over pseudo-dimension; pseudo-dimension implies generalization guarantees that are worst-case over the distribution whereas Rademacher complexity implies distribution-dependent guarantees. We prove that this shift to Rademacher complexity from pseudo-dimension is in fact necessary in order to obtain guarantees that are independent of the number of items (Theorem 4.7). Before defining Rademacher complexity, we make the notion of a distribution-dependent gen- eralization guarantee more formal, as follows.

Definition 4.1. A distribution-dependent generalization guarantee for a mechanism class M and D a distribution D over buyers’ values is a function M : Z≥1 × (0, 1) → R≥0 defined such that for any sample size N ∈ Z≥1 and any δ ∈ (0, 1), with probability at least 1 − δ over the draw of a set S ∼ DN , for any mechanism M in M, the difference between the average profit of M over S and D the expected profit of M over D is at most M(N, δ). In other words, " # 1 X D Pr ∃M ∈ M such that profitM (v) − E [profitM (v)] > M(N, δ) < δ. S∼DN N v∼D v∈S

The generalization guarantee M is worst case in that in holds for any distribution D. In D contrast, the distribution-dependent generalization guarantee M may be much tighter when the distribution D is “well-behaved”. We now define Rademacher complexity, which allows us to characterize distribution-dependent generalization guarantees. At a high level, Rademacher complexity measures the extent to which a class of mechanism profit functions fit random noise. Intuitively, this measures the richness of a class of mechanisms because more complex classes should be able to fit random noise better than simple classes. Empirical Rademacher complexity can be measured on the set of samples and implies generalization guarantees that improve based on structure exhibited by the set of samples. Formally, given a set S = v(1),..., v(N) , the empirical Rademacher complexity of M with respect to S is defined as " N # 1 X  (i) RbS (M) = sup σi · profitM v , Eσ N M∈M i=1 where σi ∼ U ({−1, 1}). Classic results from learning theory [13, 51, 70] imply the distribution- dependent generalization guarantee r D h i 2 ln (2/δ) M(N, δ) = 2 E RbS (M) + U , S∼DN N

22 where U is the maximum profit achievable by mechanisms in M over the support of D (for example, Theorem 26.5 by Shalev-Shwartz and Ben-David [70]). In other words, with probability 1 − δ over the draw v(1),..., v(N) ∼ DN , for any mechanism M ∈ M,

N r 1   h i 2 ln (2/δ) X (i) profitM v − E [profitM (v)] ≤ 2 E RbS (M) + U . (1) N v∼D N N i=1 S∼D h i If the distribution D is “well-behaved” in the sense that ES∼DN RbS (M) is small, then Equa- tion 1 will provide a generalization guarantee that might be much tighter than the worst-case guarantee based on pseudo-dimension from Theorem 3.9. In the following corollary of Theorem 3.9, we show that if the profit functions of a class M decompose additively into a number of simpler functions, then we can easily bound RbS (M) using the Rademacher complexity of those simpler functions. We then demonstrate the power of this corollary by proving stronger guarantees for many well-studied mechanism classes when the bidders are additive and their valuations are drawn from item-independent distributions. This includes bidders with values drawn from product distributions as a special case, a setting which has been extensively studied in the mechanism design literature (e.g., [5, 20, 22, 39, 44, 81]). We say that a mechanism class M decomposes additively if for all mechanisms M ∈ M, there exist T functions f1,M , . . . , fT,M such that the function profitM can be written as profitM (·) = f1,M (·) + ··· + fT,M (·). We note that this decomposition is not necessarily into coordinates (for example, it is not necessary that each function fi,M corresponds to some buyer or good). Corollary 4.2. Suppose that M is a set of additively decomposable mechanisms. Moreover, suppose that for all mechanisms M ∈ M, the range of fi,M over the support of D is [0,Ui] and that the N class {fi,M : M ∈ M} is (di, ti)-delineable. Then for any set of samples S ∼ D , RbS (M) ≤ PT p 180 i=1 Ui di log (4diti) /N. Proof. The corollary follows from Theorem 3.9 and Lemma 4.3 which shows that pseudo-dimension can be used to bound Rademacher complexity, and the fact that for any sets G and G0 of functions 0 0 0 mapping an abstract domain A to R and any set S ⊆ A, RbS ({g + g : g ∈ G, g ∈ G }) ≤ RbS (G) + 0 RbS (G ). It is well-known that Rademacher complexity and pseudo-dimension are closely connected, as summarized by the following lemma. Lemma 4.3. [[34, 65]] For any mechanism class M and any distribution D over buyers’ values, let U be the maximum profit achievable by mechanisms in M over the support of D. Then for any   q Pdim(M) set of samples S of size N, RbS (M) = O U N .

We now instantiate Corollary 4.2 for several mechanism classes. The proofs are in Appendix 10. Lemma 4.4. Let M and M0 be the sets of second-price auctions with anonymous and non- anonymous reserves. Suppose the bidders are additive, D is item-independent, and the cost function N p 0 p is additive. For any set S ∼ D , RbS (M) ≤ 180U 1/N and RbS (M ) ≤ 180U n log(4n)/N.

Lemma 4.5. Let M and M0 be the sets of anonymous and non-anonymous item-pricing mech- anisms, respectively. Suppose the bidders are additive, D is item-independent, and the cost func- N p 0 tion is additive. For any set of samples S ∼ D , RbS (M) ≤ 180U 1/N and RbS (M ) ≤ 180Upn log(4n)/N.

23 Menus of item lotteries. A length-` item lottery menu is a set of ` lotteries per item. The n (0) (0)  (1) (1)  (`) (`)o (0) (0) menu for item i is Mi = φi , pi , φi , pi ,..., φi , pi , where φi = pi = 0. The   (ji) (ji) Pm (ji) buyer chooses one lottery φi , pi per menu Mi, pays i=1 p , and receives each item i with (ji) probability φi . Lemma 4.6. Let M be the set of length-` item lottery menus. If the bidder is additive, D N is item-independent, and the cost function is additive, then for any set S ∼ D , RbS (M) ≤ 180p2` log (8`3).

Finally, we prove lower bounds showing that one could not hope to prove the generalization guarantees implied by Lemmas 4.4 and 4.5 using pseudo-dimension alone. Theorem 4.7. Let M and M0 be the classes of anonymous and non-anonymous item-pricing mechanisms. Then Pdim (M) ≥ m and Pdim (M0) ≥ nm. The same holds if M and M0 are the classes of second-price auctions with anonymous and non-anonymous reserves.

As Theorem 4.7 demonstrates, shifting to Rademacher complexity from pseudo-dimension is necessary in order to obtain guarantees that are independent of the number of items (Lemmas 4.4 and 4.5).

5 Optimizing the profit-generalization tradeoff

In this section, we use our results from Section 3 to provide tools for optimizing the profit- generalization tradeoff, taking inspiration from classic machine learning results on structural risk minimization [16, 76, 77]. We begin by demonstrating this tradeoff pictorially. For the sake of illus- tration, suppose that M is a mechanism class that decomposes into a nested sequence of subclasses M1 ⊆ · · · ⊆ Mt = M. For example, if M is the class of AMAs, then Mk could be the class of all Q-boosted AMAs with |Q| = k. Prior work [10] gave uniform convergence bounds for AMAs without taking advantage of the class’s hierarchical structure. We illustrate†† uniform convergence bounds with the purple dashed line in Figure 5 with t = 4. On the x-axis, we illustrate the intrinsic complexity of the nested subclasses which can be measured using pseudo-dimension, for example. On the y-axis, for i = 1, 2, 3, 4, we illustrate the average profit over a fixed set of samples S of the mechanism Mˆ i ∈ Mi that maximizes average profit (the orange solid line). In particular, the dot  ˆ  on the orange solid line above Mi illustrates profitS Mi . Since Mi ⊆ Mj for i ≤ j, the average profit of Mˆ j over S is at least as high as the average profit of Mˆ i over S, which is why the orange solid line is increasing. Similarly, the dot on the blue dotted line above Mi illustrates the expected profit of Mˆ i. This blue dotted line begins decreasing when the complexity of the subclass grows to the point that overfitting occurs: a mechanism’s average profit over the samples is no longer indicative of its expected profit on the unknown distribution. The purple dashed line illustrates a ˆ  ˆ  na¨ıve lower bound on the expected profit of Mi which is equal to profitS Mi − M (N, δ). Since  ˆ  the average profit profitS Mi over S increases with i, this lower bound also increases with i, so the mechanism designer may erroneously think that Mˆ 4 is the best mechanism to field.

Our general theorem allows us to be more careful since we can easily derive bounds Mi (N, δ) for each class Mi. Then, we can spread the confidence parameter δ across all subsets M1,..., Mt using P a weight function w : N → [0, 1] such that w (i) ≤ 1. More formally, by a union bound, we are ††These figures are purely illustrative; they are not based on a simulation or real data.

24 Figure 5: An illustration of uniform generalization guarantees versus stronger complexity-dependent bounds for a mechanism class M that decomposes into a nested sequence of subclasses M1 ⊆ M2 ⊆ M3 ⊆ M4. The x-axis is meant to measure each subclass’s intrinsic complexity using a quantity such as pseudo-dimension. Given a fixed set of samples S, the orange solid line plots the hypothetical average profit of the mechanism Mˆ i ∈ Mi that maximizes average profit over the  ˆ  samples. Specifically, the dot on the orange solid line above Mi illustrates profitS Mi . Similarly, the dot on the blue dotted line above Mi illustrates the expected profit of Mˆ i. The purple dashed line illustrates the lower bound on expected profit given by the uniform generalization guarantee:  ˆ  profitS Mi − M (N, δ). Meanwhile, the green dashed-dotted line illustrates the lower bound  ˆ  on expected profit given by the complexity-dependent generalization guarantee: profitS Mi −

Mi (N, δ). We describe the figure in more detail in Section 5. guaranteed that with probability at least 1 − δ, for all mechanisms M ∈ M, the difference between profitS (M) and profitD (M) is at most mini:M∈Mi Mi (N, δ · w (i)). This is illustrated by the green dashed-dotted line in Figure 5, where for i = 1, 2, 3, 4, the lower bound on the expected profit of ˆ Mi is its average profit minus Mi (N, δ · w (i)). This complexity-dependent lower bound indicates when overfitting begins. By maximizing this complexity-dependent lower bound on expected profit, the designer can correctly determine that Mˆ 2 is a better mechanism to field than Mˆ 4. Both the decomposition of M into subsets and the choice of a weight function allow the designer to encode his prior knowledge about the market. For example, if mechanisms in Mi are likely more profitable than others, he can increase w (i). The larger the weight w (i) assigned to Mi is, the larger δ · w (i) is, and a larger δ · w (i) implies a smaller Mi (N, δ · w (i)), thereby implying stronger guarantees. We now present an application of this complexity-dependent analysis to item pricing. Market segmentation is prevalent throughout daily life: movie theaters, amusement parks, and tourist attractions have different admission prices per market segment, with groups such as Child, Student, Adult, and Senior Citizen. Formally, the designer can segment the buyers into k groups and charge each group a different price. If k = 1, the prices are anonymous and if k = n, they are non-anonymous, thus forming a mechanism hierarchy. For k ∈ [n], let Mk be the class of non- anonymous pricing mechanisms where there are k price groups. In other words, for all mechanisms in Mk, there is a partition of the buyers B1,...,Bk such that for all t ∈ [k], all pairs of buyers 0 j, j ∈ Bt, and all items i ∈ [m], pj (ei) = pj0 (ei). We derive the following guarantee for this hierarchy.

25 Theorem 5.1. Let M be the class of non-anonymous item-pricing mechanisms over additive bid- Pn ders and let w :[n] → R be a weight function such that i=1 w (i) ≤ 1. Then for any δ ∈ (0, 1), N with probability at least 1−δ over the draw S ∼ D , for any k ∈ [n] and any mechanism M ∈ Mk, r s km log (4nm) 2 4 |profit (M) − profit (M)| ≤ 360U + 4U ln . S D N N δ · w (k)

We also prove the following theorem for the hierarchy of AMAs defined by the classes of Q- boosted AMAs. For an AMA M, let QM be the set of all allocations Q such that λ (Q) > 0.

Theorem 5.2. Let M be the class of AMAs and let w be a weight function that maps sets of allocations Q to [0, 1] such that P w (Q) ≤ 1. With probability 1 − δ over the draw S ∼ DN , for any M ∈ M, r s nm (n + |QM |) log(4n) 2 4 |profitS (M) − profitD (M)| ≤ 360U + 4U ln . N N δ · w (QM )

Theorem 5.2 follows from Theorems 3.8 and 3.9 as well as Lemma 3.22: we only need to multiply the weight term with δ as it appears in the resulting bound. Lemmas 3.15, 3.22, and 3.23 similarly imply results for two-part tariffs, Q-boosted λ-auctions, and lottery menus (see Theorems 11.1, 11.2, and 11.3 in Appendix 11).

6 Comparison of our results to prior research

Here, we provide a brief comparison of this paper’s results to prior research. These works provide generalization guarantees for a subset of the mechanisms we study. We match or improve over the best-known guarantees for many of the special classes that these papers also studied. Morgenstern and Roughgarden [62] studied “simple” multi-item mechanisms: item-pricing mechanisms and second-price item auctions. See Appendix 9.1 and Tables 3 and 4 for a com- parison of our guarantees. Syrgkanis [71] provided generalization guarantees specifically for the mechanism that maxi- mizes average revenue over the samples. This is in contrast to our bounds, which apply uni- formly to every mechanism in a given class. This is crucial when it may not be computation- ally feasible to determine a mechanism with highest average revenue over the samples, but only an approximation. Syrgkanis [71] analyzed multi-item, multi-bidder item-pricing mechanisms when the bidders have additive valuations and there is no supplier cost. Let M be this mech- anism class with anonymous prices and let M0 be this class with non-anonymous prices. For a set S of samples, let Mˆ and Mˆ 0 be the mechanisms in M and M0 that maximize average revenue over the samples. Syrgkanis [71] proved that with probability 1 − δ over the draw of     N ˆ p S ∼ D , profitD M − maxM∈M profitD (M) = O (U/δ) m log (nN) /N . When D is item-   independent, our bound of O Uplog (1/δ) /N improves over this bound. Otherwise, our bound   of O Upm log (m) /N + Uplog (1/δ) /N is incomparable. Syrgkanis [71] also proved that with     ˆ 0 p probability 1 − δ, profitD M − maxM∈M0 profitD (M) = O (U/δ) nm log (N) /N . Our   bound of O Upnm log (nm) /N + Uplog (1/δ) /N is incomparable.

26 Cai and Daskalakis [20] provided learning algorithms for bidders with valuations drawn from product distributions, which are item-independent‡‡. At a high level, their algorithms return mechanisms with expected revenue that is nearly a constant fraction of the optimal expected revenue (OPT ) obtainable by any randomized and Bayesian truthful mechanism. When the buyers have additive valuations, they proved that given N samples, the difference between the OPT D expected revenue of the mechanism their algorithm outputs and 32 is M(N, δ) (as defined in Definition 4.1), where D is the distribution over buyers’ values and M is the set of non- anonymous item-pricing mechanisms (Lemma 15 and Theorem 13 of the full paper by Cai and Daskalakis [21]). At the time, Morgenstern and Roughgarden’s (2016) bound of D (N, δ) =   M O Upnm log(nm)/N + Uplog(1/δ)/N was the best-known bound.§§ Via Lemma 4.5, we im- D  p p  prove this to M(N, δ) = O U n log n/N + U log(1/δ)/N , removing the dependence on the number of items. They also provided algorithms for bidders with other types of valuations, such as subadditive and XOS. In these cases, their algorithms return item-pricing mechanisms with entry fees. Our main theorem would provide pessimistic guarantees for these mechanisms due to the exponentially large number of parameters. Medina and Vassilvitskii [58] studied single-bidder, multi-item pricing in a different model from ours, where there is no bound on the number of items but each item is defined by a feature vector. Their pricing algorithm has access to a bid predictor mapping from feature vectors to bids. They related their algorithm’s performance to the bid predictor’s accuracy, among other factors. Devanur et al. [31] studied several single-item auction classes, including the class M of second price item auctions with non-anonymous reserves and no cost function. When the buyers’ values   are contained in the interval [1,U], they proved that N = O (U/)2 (n log (U/) + log (1/δ)) samples are sufficient to ensure with probability 1 − δ over the draw S ∼ DN , for all M ∈ M, |profitS (M) − profitD (M)| ≤  (Section 6.1 of the full paper by Devanur et al. [31]). Our Lemma 3.19 for the multi-item case, specialized to the single-item setting, implies that ! U 2  1 O n log n + log  δ samples are sufficient, which is incomparable to Devanur et al.’s bound due to the log factors. Gonczarowski and Weinberg [42] study a setting where there are n bidders with additive, inde- pendent values in the interval [0,H] for m items, as well as a generalization to Lipschitz valuation functions. They prove that poly(n, m, H, 1/) samples are sufficient to learn an approximately Bayesian incentive compatible mechanism whose revenue is within an -factor of optimal. They prove the same result for approximately dominant strategy incentive compatible mechanisms as well. This is an information-theoretic result: their learning algorithm is not computationally effi- cient since it is not known how to find an -approximately optimal mechanism in polynomial time even when the distribution over buyers’ values is known explicitly. In contrast, we are able to handle arbitrarily correlated distributions over combinatorial valuations, though our guarantees apply only to parameterized mechanism classes that may not compete with the optimal incentive compatible mechanism. Meanwhile, Gonczarowski and Weinberg [42] bound the number of samples sufficient to compete with the optimal incentive compatible mechanism, and thus are not restricted to a parameterized set of mechanisms, but require the buyers’ values to be independent and Lipschitz.

‡‡Item-independent distributions are discussed in Section 4. §§For this bound by Morgenstern and Roughgarden [62] to hold, revenue must be contained in the interval [0,U], as we assume throughout this paper.

27 7 Conclusion

We studied profit maximization while relaxing the assumption that the mechanism designer knows the distribution over buyers’ values. We only assumed he has a set of samples from that distribu- tion. We provided a unifying framework for bounding the complexity of a wide variety of mech- anism classes. We characterized structural similarities of mechanism classes including non-linear pricing mechanisms, combinatorially challenging variants of the VCG mechanism such as affine maximizer auctions, and randomized mechanisms (lotteries): profit is a piecewise-linear function of the mechanism class’s parameters. These similarities led us to a general theorem that offers generalization guarantees for a broad range of mechanism classes. It offers the first generalization guarantees for many important mechanism design settings. It also matches and improves over many of the special-purpose generalization guarantees already provided in the nascent literature on sample-based mechanism design. Finally, we provided tools for optimizing a fundamental tradeoff in sample-based mechanism design: more complex mechanisms have higher average profit over the samples than simpler mechanisms, but require more samples to ensure that average profit over samples nearly matches actual expected profit, that is, to avoid overfitting.

Acknowledgements. This material is based on work supported by the National Science Foun- dation under grants CCF-1422910, CCF-1535967, CCF-1733556, CCF-1910321, IIS-1617590, IIS- 1618714, IIS-1718457, IIS-1901403, SES-1919453, and a Graduate Research Fellowship; the ARO under awards W911NF2010081 and W911NF1710082; the Defense Advanced Research Projects Agency under cooperative agreement HR00112020003; an Amazon Research Award; a Microsoft Research Faculty Fellowship; an AWS Machine Learning Research Award; a Bloomberg Data Sci- ence research grant; an IBM PhD Fellowship; and a fellowship from Carnegie Mellon University’s Center for Machine Learning and Health.

References

[1] William James Adams and Janet L. Yellen. Commodity bundling and the burden of monopoly. Quarterly Journal of Economics, 90(3):475–498, Aug 1976.

[2] Noga Alon, Moshe Babaioff, Yannai A Gonczarowski, Yishay Mansour, Shay Moran, and Amir Yehudayoff. Submultiplicative Glivenko-Cantelli and uniform convergence of revenues. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2017.

[3] Victor F. Araman and Ren´eCaldentey. Dynamic pricing for nonperishable products with demand learning. Operations Research, 57(5):1169–1188, 2009.

[4] Moshe Babaioff, Nicole Immorlica, Brendan Lucier, and S. Matthew Weinberg. A simple and approximately optimal mechanism for an additive buyer. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2014.

[5] Moshe Babaioff, Yannai A Gonczarowski, and Noam Nisan. The menu-size complexity of revenue approximation. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2017.

[6] Yannis Bakos and Erik Brynjolfsson. Bundling information goods: Pricing, profits, and effi- ciency. Management Science, 45(12):1613–1630, Dec 1999.

28 [7] Maria-Florina Balcan, Avrim Blum, Jason D. Hartline, and Yishay Mansour. Mechanism de- sign via machine learning. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), pages 605–614, 2005.

[8] Maria-Florina Balcan, Avrim Blum, Jason Hartline, and Yishay Mansour. Reducing mech- anism design to algorithm design via machine learning. Journal of Computer and System Sciences, 74:78–89, December 2008.

[9] Maria-Florina Balcan, Amit Daniely, Ruta Mehta, Ruth Urner, and Vijay V Vazirani. Learning economic parameters from revealed preferences. In Proceedings of the Conference on Web and Internet Economics (WINE), 2014.

[10] Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. Sample complexity of automated mechanism design. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2016.

[11] Maria-Florina Balcan, Tuomas Sandholm, and Ellen Vitercik. A general theory of sample com- plexity for multi-item profit maximization. Proceedings of the ACM Conference on Economics and Computation (EC), 2018.

[12] Maria-Florina Balcan, Siddharth Prasad, and Tuomas Sandholm. Efficient algorithms for learning revenue-maximizing two-part tariffs. In Proceedings of the International Joint Con- ference on Artificial Intelligence (IJCAI), 2020.

[13] Peter L Bartlett and Shahar Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3(Nov):463–482, 2002.

[14] Michael Benisch, Norman Sadeh, and Tuomas Sandholm. Methodology for designing reason- ably expressive mechanisms with application to ad auctions. In Proceedings of the Interna- tional Joint Conference on Artificial Intelligence (IJCAI), 2009. Earlier version in proceedings of ACM EC-08 Workshop on Advertisement Auctions.

[15] Omar Besbes and Assaf Zeevi. Dynamic pricing without knowing the demand function: Risk bounds and near-optimal algorithms. Operations Research, 57(6):1407–1420, 2009.

[16] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K Warmuth. Occam’s razor. Information processing letters, 24(6):377–380, 1987.

[17] Josef Broder and Paat Rusmevichientong. Dynamic pricing under a general parametric choice model. Operations Research, 60(4):965–980, 2012.

[18] S´ebastienBubeck, Nikhil R Devanur, Zhiyi Huang, and Rad Niazadeh. Online auctions and multi-scale online learning. Proceedings of the ACM Conference on Economics and Computa- tion (EC), 2017.

[19] R. C. Buck. Partition of space. Amer. Math. Monthly, 50:541–544, 1943. ISSN 0002-9890.

[20] Yang Cai and Constantinos Daskalakis. Learning multi-item auctions with (or without) sam- ples. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2017.

[21] Yang Cai and Constantinos Daskalakis. Learning multi-item auctions with (or without) sam- ples. arXiv preprint arXiv:1709.00228, 2017.

29 [22] Yang Cai, Nikhil R. Devanur, and S. Matthew Weinberg. A duality based unified approach to Bayesian mechanism design. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2016. [23] Shuchi Chawla, Jason D Hartline, and Robert Kleinberg. Algorithmic pricing via virtual valuations. In Proceedings of the ACM Conference on Economics and Computation (EC), 2007. [24] Shuchi Chawla, Jason Hartline, and Denis Nekipelov. Mechanism design for data science. In Proceedings of the ACM Conference on Economics and Computation (EC), 2014. [25] Richard Cole and . The sample complexity of revenue maximization. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2014. [26] Michael Collins. Discriminative reranking for natural language parsing. Proceedings of the International Conference on Machine Learning (ICML), 2000. [27] Vincent Conitzer and Tuomas Sandholm. Complexity of mechanism design. In Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2002. [28] Vincent Conitzer and Tuomas Sandholm. Applications of automated mechanism design. In UAI-03 workshop on Bayesian Modeling Applications, 2003. [29] Vincent Conitzer and Tuomas Sandholm. Self-interested automated mechanism design and implications for optimal combinatorial auctions. In Proceedings of the ACM Conference on Economics and Computation (EC), pages 132–141, 2004. [30] Nikhil R Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. The sample complexity of auctions with side information. In Proceedings of the Annual Symposium on Theory of Computing (STOC), 2016. [31] Nikhil R Devanur, Zhiyi Huang, and Christos-Alexandros Psomas. The sample complexity of auctions with side information. arXiv preprint arXiv:1511.02296, 2017. [32] Shahar Dobzinski and Shaddin Dughmi. On the power of randomization in algorithmic mech- anism design. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2009. [33] Shahar Dobzinski and Mukund Sundararajan. On characterizations of truthful mechanisms for combinatorial auctions and scheduling. In Proceedings of the ACM Conference on Economics and Computation (EC), 2008. [34] Richard Dudley. Universal Donsker classes and metric entropy. The Annals of Probability, 15 (4):1306–1326, 1987. [35] Benjamin Edelman, Michael Ostrovsky, and Michael Schwarz. Internet advertising and the generalized second-price auction: Selling billions of dollars worth of keywords. The American Economic Review, 97(1):242–259, March 2007. ISSN 0002-8282. [36] Edith Elkind. Designing and learning optimal finite support auctions. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2007. [37] Michal Feldman, Nick Gravin, and Brendan Lucier. Combinatorial auctions via posted prices. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2015.

30 [38] Martin S Feldstein. Equity and efficiency in public sector pricing: the optimal two-part tariff. The Quarterly Journal of Economics, pages 176–187, 1972.

[39] Kira Goldner and Anna R Karlin. A prior-independent revenue-maximizing auction for mul- tiple additive bidders. In Proceedings of the Conference on Web and Internet Economics (WINE), 2016.

[40] Negin Golrezaei, Adel Javanmard, and Vahab Mirrokni. Dynamic incentive-aware learning: Robust pricing in contextual auctions. Operations Research, 2020. (forthcoming).

[41] Yannai A Gonczarowski and Noam Nisan. Efficient empirical revenue maximization in single- parameter auction environments. In Proceedings of the Annual Symposium on Theory of Com- puting (STOC), pages 856–868, 2017.

[42] Yannai A Gonczarowski and S Matthew Weinberg. The sample complexity of up-to-ε multi- dimensional revenue maximization. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2018.

[43] Chenghao Guo, Zhiyi Huang, and Xinzhi Zhang. Settling the sample complexity of single- parameter revenue maximization. Proceedings of the Annual Symposium on Theory of Com- puting (STOC), 2019.

[44] Sergiu Hart and Noam Nisan. Approximate revenue maximization with multiple items. In Proceedings of the ACM Conference on Economics and Computation (EC), 2012.

[45] Jason Hartline and Samuel Taggart. Non-revelation mechanism design. arXiv preprint arXiv:1608.01875, 2016.

[46] David Haussler. Generalizing the PAC model: Sample size bounds from metric dimension- based uniform convergence results. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 1989.

[47] Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atal- lah, Ralf Herbrich, Stuart Bowers, and Joaquin Quinonero Candela. Practical lessons from predicting clicks on ads at Facebook. In Proceedings of the International Workshop on Data Mining for Online Advertising, 2014.

[48] Zhiyi Huang, Yishay Mansour, and Tim Roughgarden. Making the most of your samples. In Proceedings of the ACM Conference on Economics and Computation (EC), 2015.

[49] Philippe Jehiel, Moritz Meyer-Ter-Vehn, and Benny Moldovanu. Mixed bundling auctions. Journal of Economic Theory, 134(1):494–512, 2007.

[50] Yash Kanoria and Hamid Nazerzadeh. Incentive-compatible learning of reserve prices for repeated auctions. Operations Research, 2020. (forthcoming).

[51] Vladimir Koltchinskii. Rademacher penalties and structural risk minimization. IEEE Trans- actions on Information Theory, 47(5):1902–1914, 2001.

[52] Ron Lavi, Ahuva Mu’Alem, and Noam Nisan. Towards a characterization of truthful combina- torial auctions. In Proceedings of the IEEE Symposium on Foundations of Computer Science (FOCS), 2003.

31 [53] Yi Li, Philip M Long, and Aravind Srinivasan. Improved bounds on the sample complexity of learning. Journal of Computer and System Sciences, 62(3):516–527, 2001.

[54] Anton Likhodedov and Tuomas Sandholm. Methods for boosting revenue in combinatorial auctions. In Proceedings of the AAAI Conference on Artificial Intelligence, 2004.

[55] Anton Likhodedov and Tuomas Sandholm. Approximating revenue-maximizing combinatorial auctions. In Proceedings of the AAAI Conference on Artificial Intelligence, 2005.

[56] Alejandro M Manelli and Daniel R Vincent. Bundling as an optimal selling mechanism for a multiple-good monopolist. Journal of Economic Theory, 127(1):1–35, 2006.

[57] R. Preston McAfee, John McMillan, and Michael D. Whinston. Multiproduct monopoly, commodity bundling, and correlation of values. Quarterly Journal of Economics, 104(2):371– 383, May 1989.

[58] Andr´esMu˜nozMedina and Sergei Vassilvitskii. Revenue optimization with approximate bid predictions. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2017.

[59] Mehryar Mohri and Andr´esMu˜nozMedina. Learning theory and algorithms for revenue opti- mization in second price auctions with reserve. In Proceedings of the International Conference on Machine Learning (ICML), 2014.

[60] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of Machine Learn- ing. MIT Press, 2012.

[61] Jamie Morgenstern and Tim Roughgarden. On the pseudo-dimension of nearly optimal auc- tions. In Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2015.

[62] Jamie Morgenstern and Tim Roughgarden. Learning simple auctions. In Proceedings of the Conference on Learning Theory (COLT), 2016.

[63] Roger Myerson. Optimal auction design. Mathematics of Operation Research, 6:58–73, 1981.

[64] Walter Y Oi. A Disneyland dilemma: Two-part tariffs for a Mickey Mouse monopoly. The Quarterly Journal of Economics, 85(1):77–96, 1971.

[65] David Pollard. Convergence of Stochastic Processes. Springer, 1984.

[66] Kevin Roberts. The characterization of implementable social choice rules. In J-J Laffont, editor, Aggregation and Revelation of Preferences. North-Holland Publishing Company, 1979.

[67] Tim Roughgarden and Okke Schrijvers. Ironing in the dark. In Proceedings of the ACM Conference on Economics and Computation (EC), 2016.

[68] Tuomas Sandholm. Automated mechanism design: A new application area for search algo- rithms. In Proceedings of the International Conference on Principles and Practice of Constraint Programming (CP), 2003.

[69] Tuomas Sandholm and Anton Likhodedov. Automated design of revenue-maximizing combi- natorial auctions. Operations Research, 63(5):1000–1025, September–October 2015.

32 [70] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: From theory to algorithms. Cambridge University Press, 2014.

[71] Vasilis Syrgkanis. A sample complexity measure with applications to learning optimal auctions. Proceedings of the Annual Conference on Neural Information Processing Systems (NIPS), 2017.

[72] Pingzhong Tang. Reinforcement mechanism design. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), 2017.

[73] Pingzhong Tang and Tuomas Sandholm. Mixed-bundling auctions with reserve prices. In Proceedings of the Conference for Autonomous Agents and Multi-Agent Systems (AAMAS). International Foundation for Autonomous Agents and Multiagent Systems, 2012.

[74] John Thanassoulis. Haggling over substitutes. Journal of Economic theory, 117(2):217–245, 2004.

[75] Vladimir Vapnik. Estimation of dependences based on empirical data. Springer Science & Business Media, 2006.

[76] Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media, 2013.

[77] Vladimir Vapnik and Alexey Chervonenkis. Theory of pattern recognition, 1974.

[78] VN Vapnik and A Ya Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.

[79] William E Walsh, David C Parkes, Tuomas Sandholm, and Craig Boutilier. Computing reserve prices and identifying the value distribution in real-world auctions with market disruptions. In Proceedings of the AAAI Conference on Artificial Intelligence, 2008.

[80] Robert B Wilson. Nonlinear pricing. Oxford University Press on Demand, 1993.

[81] Andrew Chi-Chih Yao. An n-to-1 bidder reduction for multi-item auctions and its applications. In Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA), 2014.

[82] Hector Yee and Bar Ifrach. Aerosolve: Machine learning for humans. Open Source, 2015. URL http://nerds.airbnb.com/aerosolve/.

8 Proofs from Section 3

Corollary 8.1. Let M ∗ ∈ M be the mechanism that maximizes expected profit over the distribution over buyers’ values. For any δ ∈ (0, 1), with probability at least 1 − δ over the draw of a set of samples S of size N from the distribution over buyers’ values, the difference between the expected q ˆ ∗ δ  1 4 profit of Mρ and expected profit of M is at most ρ + M N, 2 + U 2N ln δ . Proof. Let M(S) be the mechanism in M that maximizes empirical profit over S. With probability at least 1 − δ,

   δ    profit Mˆ +  N, ≥ profit Mˆ (2) D ρ M 2 S ρ

33 ≥ profitS (M(S)) − ρ (3) ∗ ≥ profitS (M ) − ρ (4) r 2 ln(4/δ) ≥ profit (M ∗) − U − ρ. (5) D 2N Inequality (2) follows from standard uniform convergence bounds: with probability at least 1 − δ/2,      δ  profit Mˆ − profit Mˆ ≤  N, . D ρ S ρ M 2

Inequality (3) follows from the fact that Mˆ ρ has empirical profit that is within an additive fac-  ˆ  tor of ρ from empirically optimal over the set of samples, or in other words, profitS Mρ ≥ profitS (M(S)) − ρ. Inequality (4) follows because M(S) is the empirical profit maximizer (i.e., it maximizes profitS (M)). Finally, inequality (5) is a result of Hoeffding’s inequality, which guaran- q ∗ ∗ 2 ln(4/δ) tees that with probability at least 1 − δ/2, profitS (M ) ≥ profitD (M ) − U 2N . Rearranging, we get that r    δ  2 ln(4/δ) profit Mˆ ≥ profit (M ∗) −  N, − U − ρ, D ρ D M 2 2N as claimed.

Corollary 8.2. Let M be a mechanism class and let M ∗ be a mechanism in M with maximum expected profit. Given a set of samples S, let Mˆ α be a mechanism in M with empirical profit that  ˆ  is at least an α-fraction of the empirically optimal: profitS Mα ≥ α · maxM∈M profitS (M). For any δ ∈ (0, 1), with probability at least 1 − δ over the draw of a set of samples S of size N from the distribution over buyers’ values, the difference between the expected profit of Mˆ α and an α-fraction q ∗ δ  ln(4/δ) of the expected profit of M is at most M N, 2 + Uα 2N . In other words, r    δ  2 ln(4/δ) profit Mˆ ≥ α · profit (M ∗) −  N, − Uα . D α D M 2 2N

Proof. Let S = v1,..., vN be a set of samples of bidder valuations. With probability at least 1 − δ,    ˆ  δ  ˆ  profitD Mα + M N, ≥ profitS Mα ≥ α · max profitS (M) 2 M∈M r 2 ln(4/δ) ≥ α · profit (M ∗) ≥ α · profit (M ∗) − Uα . S D 2N These inequalities follow for the same reasons as in the proof of Corollary 8.1.

Lemma 8.3 (Shalev-Shwartz and Ben-David [70], Lemma A.2). Let a ≥ 1 and b > 0. Then x < a log x + b implies that x < 4a log(2a) + 2b.

Lemma 8.4. No matter which parameters the mechanism designer chooses in P or P 0, if all buyers simultaneously choose the tariff and the number of units (q1, . . . , qn) that maximize their utilities, Pn then j=1 qj ≤ κ.

34 Proof. We prove this lemma for non-anonymous prices, and the lemma for anonymous prices follow since they are a special case of non-anonymous prices. For a contradiction, suppose there exists a set of buyers’ values v and a non-anonymous menu of two-part tariffs with parameters in P 0 such Pn that if tj is the tariff that buyer j chooses and qj is the number of units he chooses, j=1 qj > κ. Pn (tj ) (tj ) Since the mechanisms are profit non-negative, we know that j=1 p1,j ·1{qj ≥1} +p2,j ·qj −c (Q) ≥ 0, where Q = (q1, . . . , qn). We also know that each buyer’s value for the units he bought is greater Pn Pn (tj ) (tj ) Pn than the price: j=1 vj(qj) ≥ j=1 p1,j · 1{qj ≥1} + p2,j · qj. Therefore, j=1 vj(qj) − c(Q) ≥ 0. However, this contradicts Assumption 3.13, so the lemma holds.

Lemma 3.15. Let M and M0 be the classes of anonymous and non-anonymous profit non-negative   length-` menus of two-part tariffs. Under Assumption 3.13, M is 2`, n (κ`)2 -delineable and M0   is 2n`, n (κ`)2 -delineable.

Proof. In the case of anonymous prices, every length-` menu of two-part tariffs is defined by d = 2` parameters: the fixed fee and unit price for each of the ` menu entries. Buyer j will choose the (i) (i) (i) (i) quantity q and menu entry (p0 , p1 ) that maximizes vj(q) − (p0 · 1q>0 + p1 q). By Lemma 8.4, we know each buyer will only choose at most κ units. Therefore, the quantity q and menu entry 2 d (i) (i) that she chooses is determined by (κ`) hyperplanes in R of the form vj(q) − (p0 · 1q>0 + p1 q) ≥ 0 (k) (k) 0 2 vj(q ) − (p0 · 1q0>0 + p1 q ). In total, there are n(`κ) hyperplanes that determine the menu entry and quantity demanded by all n buyers, over which profit is linear in the fixed fees and unit prices. In the case of non-anonymous reserve prices, the same argument holds, except that every length- ` menu of two-part tariffs is defined by 2n` parameters: for each buyer, we must set the fixed fee and unit price for each of the ` menu entries.

Lemma 8.5. No matter which parameters the mechanism designer chooses in P or P 0, if all buyers Pn simultaneously choose the bundles that maximize their utilities, then j=1 qj[i] ≤ κi for all i ∈ [m]. Proof. We prove this lemma for non-anonymous prices, and the lemma for anonymous prices follow since they are a special case of non-anonymous prices. For a contradiction, suppose there exists a set of buyers’ values v and a non-anonymous non-linear pricing mechanism with parameters in P 0 such Pn that if qj is the bundle buyer j chooses, j=1 qj[i] > κi for some i ∈ [m]. Since the mechanisms are Pn profit non-negative, we know that j=1 pj(qj) − c (Q) ≥ 0, where Q = (q1,..., qn). We also know Pn Pn that each buyer’s value for the units he bought is greater than the price: j=1 vj(qj) ≥ j=1 pj(qj). Pn Therefore, j=1 vj(qj) − c(Q) ≥ 0. However, this contradicts Assumption 3.13, so the lemma holds.

Lemma 3.17. Let M and M0 be the classes of anonymous and non-anonymous profit non-negative Qm 2 non-linear pricing mechanisms. Let K = i=1 (κi + 1). Under Assumption 3.13, M is K, nK - delineable and M0 is nK, nK2-delineable.

Proof. We begin by analyzing the case where there are anonymous prices. By Lemma 8.5, the mechanism designer might as well set the price of any bundle q such that q[i] ≥ κi for some i ∈ [m] Qm to ∞. Therefore, every non-linear pricing mechanism is defined by d = i=1 (κi + 1) parameters because that is the number of different bundles and there is a price per bundle. Buyer j will prefer the bundle corresponding to the quantity vector q over the bundle corresponding to the quantity 0 0 0 Qm 2 vector q if vj(q)−p(q) ≥ vj(q )−p(q ). Therefore, there are at most i=1 (κi + 1) hyperplanes in d R determining each buyer’s preferred bundle — one hyperplane per pair of bundles. This means Qm 2 d that there are a total of n i=1 (κi + 1) hyperplanes in R such that in any one region induced by

35 these hyperplanes, the bundles demanded by all n buyers are fixed and profit is linear in the prices of these n bundles. In the case of non-anonymous prices, the same argument holds, except that every non-linear Qm pricing mechanism is defined by n i=1 (κi + 1) parameters — one parameter per bundle-buyer pair.

Definition 8.6 (Additively decomposable non-linear pricing mechanisms). Additively decompos- able non-linear pricing mechanisms are a subset of non-linear pricing mechanisms where the prices are additive over the items. Specifically, if the prices are anonymous, there exist m functions (i) P (i) p :[κi] → R for all i ∈ [m] such that for every quantity vector q, p(q) = i:q[i]≥1 p (q[i]). If the (i) prices are non-anonymous, there exist nm functions pj :[κi] → R for all i ∈ [m] and j ∈ [n] such P (i) that for every quantity vector q, pj(q) = i:q[i]≥1 pj (q[i]).

Lemma 8.7. Let M and M0 be the classes of additively decomposable non-linear pricing mecha- nisms with anonymous and non-anonymous prices, respectively. Then M is

m m ! X Y 2 (κi + 1), n (κi + 1) -delineable i=1 i=1

0  Pm Qm 2 and M is n i=1 (κi + 1) , n i=1 (κi + 1) -delineable.

Proof. In the case of anonymous prices, any additively decomposable non-linear pricing mechanism Pm is defined by d = i=1(κi + 1) parameters. As in the proof of Lemma 3.17, there are a total of Qm 2 d n i=1(κi + 1) hyperplanes in R such that in any one region induced by these hyperplanes, the bundles demanded by all n buyers are fixed and profit is linear in the prices of these n bundles. In the case of non-anonymous prices, the same argument holds, except that every non-linear Pm pricing mechanism is defined by n i=1(κi + 1) parameters — one parameter per item, quantity, and buyer tuple.

Lemma 3.18. Let M and M0 be the classes of item-pricing mechanisms with anonymous prices and non-anonymous prices, respectively. If the buyers are additive¶¶, then M is (m, m)-delineable and M0 is (nm, nm)-delineable.

Proof. In the case of anonymous prices, every item-pricing mechanisms is defined by m prices m m p ∈ R , so the parameter space is R . Let ji be the buyer with the highest value for item i. We know that item i will be bought so long as vji (ei) ≥ p(ei). Once the items bought are fixed, profit m is linear. Therefore, there are m hyperplanes splitting R into regions where profit is linear. nm In the case of non-anonymous prices, the parameter space is R since there is a price per buyer and per item. The items each buyer j is willing to buy is defined by m hyperplanes: vj(ei) ≥ pj(ei). So long as these preferences are fixed, profit is a linear function of the prices. Therefore, there are nm nm hyperplanes splitting R into regions where profit is linear. Lemma 3.19. Let M and M0 be the classes of anonymous and non-anonymous second price item auctions. Then M is (m, m)-delineable and M0 is (nm, m)-delineable. ¶¶In the two cases where the buyers have unit-demand and general valuations, we prove generalization bounds by connecting the hyperplane structure we investigate in this paper to the structured prediction literature in machine learning. See Appendix 9 for details.

36 0 Proof. For a given valuation vector v, let ji be the highest bidder for item i and let ji be the second highest bidder. Under anonymous prices, item i will be bought so long as vji (ei) ≥ p(ei). If buyer j buys item i, his payment depends on whether or not v 0 (e ) ≥ p(e ). Therefore, there are t = 2m i ji i i m hyperplanes splitting R into regions where profit is linear. In the case of non-anonymous prices, nm the only difference is that the parameter space is R . Lemma 3.20. Let M be the set of MBARPs. Then M is m + 1, (n + 1)2m+1-delineable.

Proof. An MBARP is defined by m + 1 parameters since there is one reserve per item and one allocation boost. Let K = (n + 1)m be the total number of allocations. Fix some valuation vector v. We claim that the allocation of any MBARP is determined by at most (n + 1)K2 hyperplanes m+1 k k k ` ` `  in R . To see why this is, let Q = q1 ,..., qn and Q = q1,..., qn be any two allocations K and let qQk and qQ` be the bundles of items not allocated. Consider the 2 hyperplanes defined as n n X  ` X  `  ` X  k X  k  k vi qi + p (ei) + λ Q − c Q = vi qi + p (ei) + λ Q − c Q . i=1 i=1 j:qQ` [i]=1 j:qQk [i]=1

K In the intersection of these 2 hyperplanes, the allocation of the MBARP is fixed. By a similar argument, it is straightforward to see that K2 hyperplanes determine the allocation of any MBARP in this restricted space without any one bidder’s participation. This leads us to a total of (n + 1)K2 hyperplanes which partition the space of MBARP parameters in a way such that for any two parameter vectors in the same region, the auction allocations are the same, as are the allocations without any one bidder’s participation. Once these allocations are fixed, profit is a linear function in this parameter space.

Lemma 3.21. Let M, M0, and M00 be the classes of AMAs, VVCAs, and λ-auctions, respec- tively. Letting t = (n + 1)2m+1, we have that M is 2n(n + 1) + (n + 1)m+1, t-delineable, M0 is (n2m(3 + 2n), t)-delineable, and M00 is ((n + 1)m , t)-delineable.

Proof. Let K = (n + 1)m be the total number of allocations and let p be a parameter vector where the first n components correspond to the bidder weights wj for j ∈ [n], the next n components n correspond to 1/wj for j ∈ [n], the next 2 2 components correspond to wi/wj for all i 6= j, the next K components correspond to λ(Q) for every allocation Q, and the final nK components correspond to λ(Q)/wj for all allocations Q and all bidders j ∈ [n]. In total, the dimension of this parameter space is at most 2n + 2n2 + K + nK = O(nK). Let v be a valuation vector. We claim that this parameter space can be partitioned using t = (n + 1)K2 hyperplanes into regions where in any one 0 0 region P , there exists a vector k such that profitv(p) = k · p for all p ∈ P . To this end, an allocation Q = (q1,..., qn) will be the allocation of the AMA so long as Pn Pn 0 0 0 0 0 0 i=1 wivi (qi)+λ(Q)−c(Q) ≥ i=1 wivi (qi)+λ (Q )−c(Q ) for all allocations Q = (q1,..., qn) 6= Q. Since the number of different allocations is at most K, the allocation of the auction on v 2 d −1 −n is defined by at most K hyperplanes in R . Similarly, the allocations Q ,...,Q are also 2 d determined by at most K hyperplanes in R . Once these allocations are fixed, profit is a linear function of this parameter space. The proof for VVCAs follows the same argument except that we redefine the parameter space to consist of vectors where the first n components correspond to the bidder weights wj for j ∈ [n], n the next n components correspond to 1/wj for j ∈ [n], the next 2 2 components correspond to 0 m wi/wj for all i 6= j, the next K = n2 components correspond to the bidder-specific bundle boosts 0 cj,q for every quantity vector q and bidder j ∈ [n], and the final nK components correspond to

37 ck,q/wj for every quantity vector q and every pair of bidders j, k ∈ [n]. The dimension of this parameter space is at most 2n + 2n2 + K0 + nK0 ≤ 2K0 + nK0 + K0 + nK0 = O(nK0). Finally, the proof for λ-auctions follows the same argument as the proof for AMAs except there are zero bidder weights. Therefore, the parameter space consists of vectors with K components corresponding to λ(Q) for every allocation Q.

Lemma 3.22. Let M and M0 be the classes of Q-boosted AMAs and λ-auctions. Then M is     3n(n + |Q|), (n + 1)2(m+1) -delineable and M0 is |Q|, (n + 1) (|Q| + 1)2 -delineable.

Proof. Let K = (n + 1)m be the total number of allocations and let p be a parameter vector where the first n components correspond to the bidder weights wj for j ∈ [n], the next n components n correspond to 1/wj for j ∈ [n], the next 2 2 components correspond to wi/wj for all i 6= j, the next |Q| components correspond to λ(Q) for every allocation Q ∈ |Q|, and the final n|Q| components correspond to λ(Q)/wj for all allocations Q ∈ Q and all bidders j ∈ [n]. In total, the dimension of this parameter space is at most 2n + 2n2 + |Q| + n|Q| < (n + 2)(n + |Q|) ≤ 3n(n + |Q|). We set d = 3n(n + |Q|). Fix some valuation vector v. We claim that the allocation of any Q-boosted 2 d AMA is determined by at most (n + 1)K hyperplanes in R . To see why this is, the allocation will j  j j  P  j j j P k k k be Q = q1,..., qn where wivi qi + λ Q − c(Q ) ≥ wivi qi + λ Q − c(Q ) for all k k k allocations Q = q1 ,..., qn . This decision governing which of the K possible allocations will be the AMA allocation is defined by the K2 hyperplanes, one per pair of distinct allocations Qj and Qk. By a similar argument, it is straightforward to see that K2 hyperplanes determine the allocation of any AMA in this restricted space without any one bidder’s participation. This leads us to a total of (n + 1)K2 hyperplanes which partition the space of Q-boosted AMA parameters in a way such that for any two parameter vectors in the same region, the auction allocations are the same, as are the allocations without any one bidder’s participation. Once these allocations are fixed, profit is a linear function in this parameter space. The proof for λ-auctions is very similar to that for AMAs. However, we claim that the al- location of any Q-boosted λ-auction is determined by at most (n + 1)(|Q| + 1)2 hyperplanes in |Q| R . This is because without the bidder weights, the allocation of the Q-boosted λ-auction will either be a boosted allocation or the VCG allocation if it is not boosted. Therefore, there are only (|Q| + 1)2 hyperplanes determining the allocation of the λ-auction, and the same number of hyper- planes determine the allocation of the λ-auction in this restricted space without any one bidder’s participation. Once these allocations are fixed, profit is linear function of the λ-terms.

 0  Lemma 8.8. For all v ∈ X and all M ∈ M, profitM (v) = Ez profitM (v, z) . 0 Proof. By definition of profitm,

 0  profitM (v, z) Ez    X = pv − c  ej Ez j:z[j]<φv[j] X Y Y = pv − c (r) Pr [z[j] < φv[j]] Pr [z[j] ≥ φv[j]] r∈{0,1}m j:r[j]=1 j:r[j]=0 X Y Y = pv − c (r) φv[j] (1 − φv[j]) . r∈{0,1}m j:r[j]=1 j:r[j]=0

38 From the other direction,

profitM (v) = pv − E [c(q)] q∼φv X Y Y = pv − c (r) Pr [q[j] = 1] Pr [q[j] = 0] r∈{0,1}m j:r[j]=1 j:r[j]=0 X Y Y = pv − c (r) φv[j] (1 − φv[j]) . r∈{0,1}m j:r[j]=1 j:r[j]=0

 0  Therefore, profitM (v) = Ez profitM (v, z) . Lemma 3.24. With probability 1 − δ over the draw n   o v(1), z(1) ,..., v(N), z(N) ∼ (D × U ([0, 1])m)N , for all mechanisms M ∈ M,

N r r 1   Pdim(M0) 2 ln(4/δ) X 0 (i) (i) profitM v , z − E [profitM (v)] ≤ 120U + 4U . N v∼D N N i=1 Proof. We know that with probability at least 1 − δ over the draw of a sample n   o v(1), z(1) ,..., v(N), z(N) ∼ (D × U([0, 1])m)N , for all mechanisms M ∈ M,

N 1 X 0  (j) (j)  0  profitM v , z − E profitM (v, z) N v,z∼D×U([0,1])m j=1 r r ! P dim(M0) log(1/δ) = O U + U . N N

We also know from Lemma 8.8 that

 0  E profitM (v, z) = E [profitM (v)] . v,z∼D×U([0,1])m v∼D

Therefore, the theorem statement holds.   Lemma 3.23. The class M0 is ` (m + 1) , (` + 1)2 + m` -delineable.

Proof. The buyer will prefer lottery j ∈ {0, . . . , `} so long as v · φ(j) − p(j) > v · φ(k) − p(k) for `+1 `(m+1) any k 6= j, which amount to 2 hyperplanes in R defining the lottery the buyer chooses. (k) (k) P Next, for each lottery φ , p , there are m hyperplanes determining the vector j:z[j]<φ(k)[j] ej, P  (k) and thus the cost c j:z[j]<φ(k)[j] ej . These vectors have the form z[j] = φ [j]. Thus, there are a total of `m hyperplanes determining the costs. Let H be the union of all (` + 1)2 + m` `(m+1) hyperplanes. Within any connected component of R \ H, the lottery the buyer buys is fixed P  and for each lottery, c j:z[j]<φ(k)[j] ej is fixed. Therefore, profit is a linear function of the prices p(1), . . . , p(`).

39 Figure 6: An example of the γ-MBA revenue of a single bidding instance as γ varies.

8.1 Additional lottery results Lotteries for a unit-demand buyer. Recall that if the buyer is unit-demand, then for any m (j) (j) bundle q ∈ {0, 1} , v1 (q) = maxi:q[i]≥1 v1 (ei). We assume that under a lottery φ , p with a unit-demand buyer, the buyer will only receive one item, and the probability that item is item (j) Pm (j) (j) i is φ [i]. Thus, we assume that i=1 φ [i] ≤ 1. Since v1(ei) · φ [i] is their value for item i Pm (j) (j) times the probability they get that item, their expected utility is i=1 v1(ei) · φ [i] − p , as in the case with an additive buyer. Therefore, the following theorem follows by the exact same proof as Lemma 3.23.

Theorem 8.9. Let M0 be the class of functions defined in Section 3.4.4. Then M0 is   ` (m + 1) , (` + 1)2 + m` -delineable.

Lotteries for multiple unit-demand or additive buyers. In order to generalize to multi- buyer settings, we assume that there are n units of each item for sale and that each buyer will receive at most one unit of each item. The buyers arrive simultaneously and each will buy the lottery that maximizes her expected utility. Thus, the following is a corollary of Lemma 3.23.

Theorem 8.10. Let M0 be the class of functions defined in Section 3.4.4. Then M0 is    ` (m + 1) , n (` + 1)2 + m` -delineable.

8.2 Mixed bundling auctions Mixed bundling auctions (MBAs) are defined by a single parameter γ. They correspond to a λ-auction where λ(Q) = γ if some bidder receives the grand bundle under allocation Q and 0 otherwise. The class of MBAs is particularly simple, and we prove an even tighter bound on the Rademacher complexity of MBAs than that guaranteed by Theorem 3.9. Our analysis requires us to understand how the profit of a γ-MBA on a single bidding instance changes as a function of γ. We take advantage of this function’s structural properties, first uncovered by Jehiel et al. [49]: no matter the number of bidders and no matter the number of items, there exists an easily characterizable value γ∗ such that the function in question is increasing as γ grows from 0 to γ∗, and then it is non-increasing as γ grows beyond γ∗. This is depicted in Figure 6. Intuitively, γ∗ represents the number at which γ has grown so large that the MBA has morphed into a second price auction on the grand bundle. As a result, no matter how much larger γ grows beyond γ∗, the value of γ no longer factors into the profit function. This simple structure allows us to prove the strong generalization guarantee described in Theorem 8.11.

40 γ value Profit on v1 Profit on v2 0 0 ≤ z(1) 2 ≤ z(2) 1.5 3 ≤ z(1) 5 > z(2) 1.75 3.5 > z(1) 5.5 > z(2) 2.5 5 > z(1) 4 ≤ z(2)

Table 5: Example of a shattered set of size 2

Theorem 8.11. Let M be the class of MBAs. Then Pdim(M) = 2. Proof. First, we show that the pseudo-dimension of the class of n-bidder, m-item MBAs is at most 2. Let S = v(1),..., v(N) be a set of n-bidder valuation functions that can be shattered by a set Γ of 2N MBAs. This means that there exist N witnesses z(1), . . . , z(N) such that each MBA in Γ induces a binary labeling of the samples v(j) of S (whether the profit of the MBA on v(j) is at (j) least zj or strictly less than z ). Since S is shatterable, we can thus label S in every possible way using MBAs in Γ. Now, fix one sample v(i) ∈ S. We denote the profit of the γ-MBA on v(i) as a function of γ ∗ as profitv(i) (γ). From Lemma 8.12, we know that there exists γi ∈ [0, ∞), such that profitv(i) (γ) ∗ ∗ is non-decreasing on the interval [0, γi ] and non-increasing on the interval (γi , ∞). Therefore, (1) ∗ (2) ∗ there exist two thresholds ti ∈ [0, γi ] and ti ∈ (γi , ∞) ∪ {∞} such that profitv(i) (γ) is below (1) (1) (2) its threshold for γ ∈ [0, ti ), above its threshold for γ ∈ (ti , ti ), and below its threshold for (2) γ ∈ (ti , ∞). Now, merge these thresholds for all N samples on the real line and consider the interval (t1, t2) between two adjacent thresholds. The binary labeling of the samples in S on this (j) (j) interval is fixed. In other words, for any sample v ∈ S, profitv(j) (γ) is either at least z or (j) strictly less than z for all γ ∈ (t1, t2). There are at most 2N + 1 intervals between adjacent thresholds, so at most 2N + 1 different binary labelings of S. Since we assumed S is shatterable, it must be that 2N ≤ 2N + 1, so N ≤ 2. Finally, we show that the pseudo-dimension of the class of n-bidder, m-item MBAs is at least 2 by constructing a set S = v(1), v(2) that can be shattered by the set of MBAs. To construct this set of samples S, let  ( 0 if ||q||1 < bm/2c (1) (1) 0 if ||q||1 < bm/2c (2) (2)  v1 (q) = v2 (q) = and v1 (q) = v2 (q) = 3 if bm/2c ≤ ||q||1 < m 3 if bm/2c ≤ ||q||1  4 if ||q||1 = m.

Finally, let bidders 3 through n have all-zero valuations in both v(1) and v(2) and let the cost function be 0 for all allocations. Now, let z(1) = 3 and z(2) = 4. We define four MBAs parameterized by the coefficients γ1 = 0, γ2 = 1.5, γ3 = 1.75, γ4 = 2.5. It is easy to check that this set of MBAs shatters S, witnessed by z(1) and z(2). For example, see Table 5. The generalization guarantee follows from Theorem 3.8.

Lemma 8.12. For a valuation vector v, let profitv(γ) be the profit of the γ-MBA on v as a ∗ ∗ function of γ. There exists γ ∈ [0, ∞) such that profitv(γ) is non-decreasing on the interval [0, γ ] and non-increasing on the interval (γ∗, ∞). For additive bidders, this lemma is implied by Theorem 1 in the paper by Jehiel et al. [49] which provides the derivative of profitv(γ). The techniques used by Jehiel et al. [49] extend immediately to general bidders as well, as we show here.

41 Proof of Lemma 8.12. We will show that profitv can be decomposed into simple components, each of which can be easily analyzed on its own, and by combining these analyses, we prove ∗ ∗ ∗ the lemma statement. Suppose Q = (q1,..., qn) is the resulting allocation of a certain γ- −i −i −i MBA M and Q = q1 ,..., qn is the boosted social-welfare maximizing allocation with- ∗ Pn out bidder i’s participation. More explicitly, Q = argmax { i=1 vi (qi) + λ (Q) − c(Q)} and −i nP o Q = argmax k6=i vk (qk) + λ (Q) − c(Q) , where λ (Q) is set according to the MBA alloca- tion boosting rule for all Q. Then bidder i pays

 n   ∗ X ∗ ∗ ∗ X  −i −i −i pi,v (γ) = vi (qi ) −  vj qj + λ (Q ) − c(Q ) −  vj qj + λ Q − c(Q ) . j=1 j6=i This means that n X profitv(γ) = pi,v (γ) i=1 n n X ∗ ∗ ∗ X X  −i −i −i = (1 − n) vi (qi ) − n (λ (Q ) − c (Q )) + vj qj + λ Q − c Q . i=1 i=1 j6=i

P  −i −i −i The profit function can be split into n+1 functions: fi,v(γ) = j6=i vj qj +λ Q −c Q Pn ∗ ∗ ∗ for i ∈ {1, . . . , n} and gv(γ) = (1 − n) i=1 vi (qi ) − n (λ (Q ) − c (Q )) . We claim that fi,v(γ) is continuous for all i, whereas gv(γ) has at most one discontinuity. This means that profitv(γ) = Pn Pn i=1 fi,v(γ) + gv(γ) has at most one discontinuity as well. Moreover, the slope of i=1 fi,v(γ) is between zero and n, whereas the slope of gv(γ) is zero until its discontinuity, and then is −n. Therefore, the slope of profitv(γ) is at least zero before its discontinuity and at most zero after its discontinuity. This is enough to prove the lemma statement. ˜−i −i −i To see why these properties are true for the functions fi,v(γ), first let Q = q˜1 ,..., q˜n be n o ˜−i P the VCG allocation without bidder i, i.e., Q = argmax k6=i vk (qk) − c(Q) . If one bidder is allocated the grand bundle in allocation Q˜−i, then this allocation will only be more valuable as n o ˜−i P γ grows, so Q = argmax k6=i vk (qk) + λ (Q) − c(Q) for all values of γ, which means that           P −i ˜−i ˜−i P −i ˜−i fi,v(γ) = j6=i vj q˜j + λ Q − c Q = j6=i vj q˜j + γ − c Q for all values of γ as well. Clearly, in this case, fi,v(γ) is increasing and continuous. Otherwise, using the notation c1 to denote the cost of producing the grand bundle, we know there exists some value γ such           i P −i ˜−i ˜−i P −i ˜−i  1 that j6=i vj q˜j + λ Q − c Q = j6=i vj q˜j − c Q ≥ maxk6=i vk (1) + γ − c     P −i ˜−i  1 ˜−i if γ ≤ γi and j6=i vj q˜j − c Q < maxk6=i vk (1) + γ − c if γ > γi. This means that Q is the allocation of the γ-MBA without bidder i’s participation for γ ≤ γi, and the allocation of the γ-MBA without bidder i’s participation for γ > γi is the one where the highest bidder for the grand bundle (excluding bidder i) wins the grand bundle. Therefore,     (P −i ˜−i j6=i vj q˜j − c Q if γ ≤ γi fi,v(γ) =  1 maxk6=i vk (1) + γ − c if γ > γi.     P −i ˜−i  1 Notice that j6=i vj q˜j −c Q = maxk6=i vk (1) + γi − c , so fi,v(γ) is continuous. Finally, Pn it is clear that the slope of each fi,v(γ) is between 0 and 1, so the slope of i=1 fi,v(γ) is between 0 and n.

42 Similarly, let Q˜ = (q˜1,..., q˜n) be the allocation of the VCG mechanism run on v. Then there exists some γ∗ such that Q˜ is the allocation of the γ-MBA for γ ≤ γ∗ and the allocation of the γ-MBA for γ > γ∗ is the one where the highest bidder for the grand bundle wins the grand     Pn ˜ ˜  1 ∗ bundle. More explicitly, i=1 vi (q˜i) + λ Q − c Q ≥ maxk∈[n] vk (1) + γ − c if γ ≤ γ and     Pn ˜ ˜  1 ∗ i=1 vi (q˜i) + λ Q − c Q < maxk∈[n] vk (1) + γ − c if γ > γ . Therefore,      ( Pn ˜ ˜ ∗ (1 − n) i=1 vi (q˜i) − n λ Q − c Q if γ ≤ γ gv(γ) = 1 ∗ (1 − n) max {vk (1)} − n γ − c if γ > γ .

∗ Therefore, gv(γ) has at most one discontinuity, which falls at γ . Moreover, the slope of gv(γ) is 0 ∗ ∗ for γ < γ and −n for γ > γ . As described, these properties of fi,v(γ) and gv(γ) are enough to show that the lemma statement holds.

8.3 Proof of Theorem 3.26 In this section, we prove Theorem 3.26. First, we prove part 1 and then we prove part 2.

nm−n Lower Bound on Sample Complexity for λ-Auctions. We prove that NM(, δ) ≥ 2 samples are required to ensure that for any distribution D, with probability at least 1 − δ over the draw of a set of samples of size NM(, δ) from D, for all λ-auctions M ∈ M, average profit is -close to expected profit. Since λ-auctions are a subset of AMAs, this lower bound applies to AMAs as well. To prove Theorem 8.16, we construct a set V of n-bidder, m-item valuation functions taking values in {0, 1} where, under each valuation function, each bidder is interested in a specific subset of items, and these subsets are all pairwise disjoint. Moreover, |V | = nm − n. The high level idea is to show that for any subset H of V , there exists a λ-auction that has high profit over valuation functions in H, but low profit on the valuation functions in V \ H. Theorem 8.13 describes V in more detail. Now suppose that the distribution over the bidders’ valuation functions is the uniform distribution over V . This means that if a set of samples consist of only a small subset of V , then we cannot guarantee that every profit function will achieve average profit over the set of samples which is close to its expected profit over the distribution, as we require. We now present Theorem 8.13, wherein we describe the set V of valuation functions which we will use to prove Theorem 8.16.

Theorem 8.13. For any n, m ≥ 2 and any β ∈ (0, 1), there exists a set of N = nm − n n-bidder, m-item additive valuation functions V = v1,..., vN such that for any H ⊆ V , there exists a i i i λ-auction MH with profit 0 on v if v 6∈ H and profit at least 2 − 2β on v otherwise. Proof. We define the set V = v1,..., vN of n-bidder, m-item additive valuation functions, where

j  j j j j  v = v1(e1), . . . , v1(em), . . . , vn(e1) . . . , vn(em) ,

m with N = n − n. Recall that every allocation Q is written as (q1,..., qn) where q1,..., qn are disjoint subsets of the m items being auctioned. First, let Qˆj be the allocation where bidder j receives all m items. Next, let Q˜1,..., Q˜N be a fixed ordering of the nm − n allocations where all m items are allocated except Qˆ1,..., Qˆn. Let the bundles allocated to the n bidders in Q˜` be ` ` ` q˜ ,..., q˜ and let S` be the set of bidders who are allocated some item in allocation Q˜ . In other 1 n n o words, S = j | q˜` 6= 0 . For a sanity check, notice that P q˜` = 1. ` j i∈S` i

43 We will now define the valuation vectors v1,..., vN in terms of this set of special allocations Q˜1,..., Q˜N , so each vector v` depends on the allocation Q˜`. Specifically, we define v` for ` ∈ [N] `  ` as follows. If i 6∈ S` i.e., q˜i = 0 , set vi (ej) = 0 for all j ∈ [m]. Otherwise, set ( 0 if q˜`[j] = 0 v`(e ) = i . i j ` 1 if q˜i [j] = 1 We proceed to prove that for any subset H ⊆ V , there exists a λ-auction with 0 profit on all valuation functions in V \ H and at least 2 − 2β profit on all valuation functions in H. To define this λ-auction, we set the λ terms such that ( 0 if Q = Q˜` for some v` ∈ H λ (Q) = . 1 − β otherwise

Lemma 8.14. If v` ∈ H, then the profit on v` is at least 2 − 2β.   Pn ` ` ˜` Proof of Lemma 8.14. First, note that i=1 vi q˜i + λ Q = m, and for all allocations Q = ˜` Pn ` (q1,..., qn) 6= Q , i=1 vi (qi) + λ (Q) ≤ m − 1 + 1 − β < m. Therefore, the λ-auction allocation is Q˜`. In order to analyze the profit of this λ-auction, we must understand the payments of each bidder, which means that we must investigate what the outcome of this λ-auction would be without any one bidder’s participation. To this end, suppose i ∈ S , so bidder i is allocated some item in Q,˜     ` ` P ` ` ˜` ` i.e., q˜i 6= 0. Then j6=i vj q˜j + λ Q = m − q˜i 1 because bidder i’s valuation for the bundle ` ` q˜i is exactly q˜i 1. By construction, no bidder receives all m items in Q˜`, so we know that there exists some   0 0 ˜`,−i `,−i `,−i i ∈ S`, i 6= i. With this fact in mind, let Q = q˜1 ,..., q˜n be the allocation where all ` bidders in S` are allocated the same items as they are in Q˜ and bidder i receives the empty set. This is one possible allocation of the λ-auction without bidder i’s participation, and therefore the social welfare of the other bidders will be at least as high under this allocation as it would be in the true   allocation of the λ-auction without bidder i’s participation. By construction, λ Q˜`,−i = 1 − β.     P ` `,−i ˜`,−i ` Therefore, `6=i vj q˜j + λ Q = m − q˜i 1 + 1 − β which means that bidder i must pay `  `  at least m − q˜i 1 + 1 − β − m − q˜i 1 = 1 − β. We know that |S`| ≥ 2, i.e., there are at least 2 bidders who receive a non-empty bundle and therefore must pay at least 1 − β, so the profit of this λ-auction is at least 2 − 2β.

Lemma 8.15. If v` 6∈ H, then the profit on v` is 0.   Pn ` ` ˜` Proof of Lemma 8.15. First, note that i=1 vi q˜i + λ Q = m + 1 − β, and for all allocations ` Pn ` Q = (q1,..., qn) 6= Q˜ , v (qi) + λ (Q) ≤ m − 1 + 1 − β < m, so the λ-auction allocation is i=1 i     ˜` P ` ` ˜` ` Q . Now, suppose i ∈ S`. Then j6=i vj q˜j + λ Q = m − q˜i 1 + 1 − β. Since bidder i is the only bidder with nonzero valuations for the items in q˜` under v`, any allocation Q˜`,−i without his i    P ` `,−i ˜`,−i ` participation will have social welfare at most j6=i vj q˜j + λ Q ≤ m − q˜i 1 + 1 − β. Therefore, bidder i pays nothing. Of course, for any bidder i 6∈ S`, her presence in the auction makes no difference on the resulting allocation because her valuation function under v` is 0 on all items, so he pays nothing as well. Therefore, the profit on v` is 0.

44 Putting Lemmas 8.14 and 8.15 together, we have the desired result.

We now use Theorem 8.13 to prove Theorem 8.16.

Theorem 8.16. For any  ∈ (0, 1), there exists a distribution D and a λ-auction M ∗ such that, nm−n with probability 1 over the draw of a set of samples S of size at most 2 , 1 X profitM ∗ (v) − E [profitM ∗ (v)] > . |S| v∼D v∈S Proof. Let β = 1 −  and let V be the set of valuation functions proven to exist in Theorem 8.13 corresponding to β (i.e. for any H ⊆ V , there exists a λ-auction MH with profit 0 on v if v ∈ H and profit at least 2 − 2β on v otherwise). Let D be the uniform distribution on V . nm−n ∗ Suppose that S is a set of at most 2 samples. Of course, S ⊆ V , so let M be the λ-auction with 0 profit on every valuation function not in the set of samples and profit at least 2 − 2β on every valuation function in the set of samples. We know that M ∗ exists due to Theorem 8.13. Notice that the average empirical profit of M ∗ on S is at least 2−2β. Meanwhile, the probability, on a random draw v ∼ D that profitM ∗ (v) is 0 is exactly the probability that v 6∈ S. Given that |S| 1 the set of training examples has measure nm−n ≤ 2 , we have that

1 X profitM ∗ (v) − E [profitM ∗ (v)] ≥ 2 − 2β − (2 − 2β) Pr [v ∈ S] |S| v∼D v∼D v∈S > 2 − 2β − (1 − β) = 1 − β = . as desired.

Lower Bound on Sample Complexity for VVCAs.We now prove that it is not possible to learn over the set of VVCA profit function under and arbitrary distribution with subexponential sample complexity. In particular, we prove that no algorithm can learn over the class of n-bidder, m-item VVCA profit functions with sample complexity 2m −2. This holds even when the bidders’ valuation functions are additive. The format of this proof similar to that of Theorem 8.16. Namely, we construct a set V of n-bidder, m-item valuation functions such that |V | = 2m − 2. We then show that for any subset H of V , there exists a VVCA that has high profit over valuation functions in H, but low profit on the valuation functions in V \ H. The set V is described in more detail in Theorem 8.17. As described in Theorem 8.16, this immediately implies hardness for learning over the uniform distribution on V . Given the parallel proof structure, we present Theorem 8.17 and refer the reader to Theorem 8.16 to see how it implies hardness for learning.

Theorem 8.17. For any m ≥ 2 and any β ∈ (0, 1), there exists a set of N = 2m − 2 2-bidder additive valuation functions V = {v1,..., vN } such that for any H ⊆ V , there exists a VVCA with profit 0 on vi if vi ∈ V and profit 1 − β on vi if vi 6∈ V .

Proof. We define the set V = {v1,..., vN } of 2-bidder valuation functions, where

j j j j j v = (v1(e1), . . . , v1(em), v2(e1) . . . , v2(em)),

m with N = 2 − 2. Recall that every allocation vector Q can be written as (q1, q2) where q1 and q2 are disjoint subsets of the m items being auctioned. In order to define the valuation functions in V ,

45 we define q˜1,..., q˜N to be a arbitrary, fixed ordering of the vectors in the set {0, 1}m \{0, 1}. We will define each valuation function in V in terms of this ordering. In particular, let Q˜` = 1 − q˜`, q˜` be the allocation where bidder 1 receives 1 − q˜` and bidder 2 receives q˜`. Finally, let v` for ` ∈ [N] be defined as follows:

( ` ( ` ` 1 if q˜ [i] = 0 ` 1 if q˜ [i] = 1 v (ei) = and v (ei) = . 1 0 otherwise 2 0 otherwise

Clearly, if w1 = w2 = 1 and λ1(Q) = λ2(Q) = 0 for all Q ∈ Q, then the VVCA allocation on any v` ∈ S is the one in which bidder 2 receives 1 − q˜` and bidder 1 receives q˜`. This has a social welfare of m, whereas any other allocation has a social welfare at most m − 1. We claim that for any H ⊆ V , there exists a VVCA with profit 0 on vi if vi ∈ H and profit i i ` 1 − β on v if v 6∈ H. The VVCA has bidder weights w1 = w2 = 1, and for all v ∈ H, we set ˜` ˜` λ1(Q ) = c1,1−q˜` = c2,q˜` = λ2(Q ) = 0. Otherwise, we set λi(Q) = (1 − β)/2 for each i ∈ {1, 2}. Lemma 8.18. If v` ∈ H, then the profit on v` is 1 − β.

` ` ` ` Proof of Lemma 8.18. First, note that v1(1 − q˜ ) + v2(q˜ ) + λ1(Q˜ ) + λ2(Q˜ ) = m, and for all ` allocations Q = (q1, q2) 6= Q˜ , v1(q1) + v2(q2) + λ1(Q) + λ2(Q) ≤ m − 1 + 1 − β. Therefore, the VVCA allocation is Q˜`. However, this is neither bidder 1 nor bidder 2’s favorite weighted allocation, ` ` ` ` ` ` since v1(1 − q˜ ) + λ1(Q˜ ) = |1 − q˜ | < v1(1) + c1,1 = |1 − q˜ | + (1 − β)/2 and v2(q˜ ) + λ2(Q˜ ) = ` ` ` ` |q˜ | < v2(1) + c2,1 = |q˜ | + (1 − β)/2. This follows from the fact that q˜ 6= 1 and 1 − q˜ 6= 1 for all ` ∈ [N], so it must be that c1,1 = c2,1 = (1 − β)/2. Since |1−q˜`| and |q˜`| are bidder 1 and 2’s highest valuations for any allocation, respectively, and because (1 − β)/2 is the highest value of any λ term, v1(1) + c1,1 and v2(1) + c2,1 are the maximum weighted valuation that either bidder has for any allocation under this VVCA. Therefore, the profit ` ` ` ` of this VVCA on v` is |q˜ | + |1 − q˜ | + 1 − β − |q˜ | − |1 − q˜ | = 1 − β. Lemma 8.19. If v` 6∈ H, then the profit on that valuation function pair is 0.

` ` ` ` Proof of Lemma 8.19. First, note that v1(1 − q˜ ) + v2(q˜ ) + λ1(Q˜ ) + λ2(Q˜ ) = m + 1 − β, and for ` all allocations Q = (q1, q2) 6= Q˜ , v1(q1) + v2(q2) + λ1(Q) + λ2(Q) ≤ m − 1 + 1 − β < m + 1 − β, ` ` ` so the AMA allocation is Q˜ . Moreover, for all allocations Q = (q1, q2), v1(1 − q˜ ) + λ1(Q˜ ) = ` ` ` ` |1 − q˜ | + (1 − β)/2 ≥ v1(q1) + λ1(Q) and v2(q˜ ) + λ2(Q˜ ) = |q˜ | + (1 − β)/2 ≥ v2(q2) + λ2(Q). Therefore, both bidders receive one of their favorite weighted allocations, so the profit is 0.

9 Connection to structured prediction

In this section, we connect the hyperplane structure we investigate in this paper to the structured prediction literature in machine learning (e.g., [26]), thus proving even stronger generalization bounds for item-pricing mechanisms under buyers with unit-demand and general valuations and answering an open question by Morgenstern and Roughgarden [62]. Balcan et al. [9] were the first to explore the connection between structured prediction and mechanism design, though in a different setting from us: they provided algorithms that make use of past data describing the purchases of a utility-maximizing agent to produce a hypothesis function that can accurately forecast the future behavior of the agent. Morgenstern and Roughgarden [62] used structured prediction to provide sample complexity guarantees for several “simple” mechanism classes. They observed that these classes have profit

46 functions which are the composition of two simpler functions: A generalized allocation function (1) (2) fp : X → Y and a simplified profit function fp : X × Y → R such that profitp (v) = (2)  (1)  fp v, fp (v) . For example, Y might be the set of allocations. In this case, we say that (1) (2) (1) n (1) o (2) n (2) o M is F , F -decomposable, where F = fp : p ∈ P and F = fp : p ∈ P . See Example 9.1 for an example of this decomposition. Example 9.1 (Item-pricing mechanisms [62]). Let M be the class of anonymous item-pricing mechanisms over a single additive bidder and let p = (p1, . . . , pm) be a vector of prices. In this (1) m th (1) case, we can define fp : X → {0, 1} where the i component of fp (v) is 1 if and only if the buyer buys item i. Define ψ(v, α) = (v(α), −α) and define wp = (1, p). Then the α that maximizes p (1) hw , ψ(v, α)i is the α that maximizes the buyer’s utility, i.e., fp (v), as desired. Finally, we define (2) (2)  (1)  fp (v, α) = hα, pi, and we have that profitp(v) = fp v, fp (v) , as desired. Morgenstern and Roughgarden [62] bound Pdim (M) using the “complexity” of F (1), which they quantified using tools from structured prediction, namely, generalized linear functions.

Definition 9.2 (a-dimensional linear class). A set F = {fp : X → Y | p ∈ P} is an a-dimensional a p a linear class if there is a function ψ : X × Y → R and a vector w ∈ R for each p ∈ P such that p p fp (v) ∈ argmaxα∈Y hw , ψ (v, α)i and |argmaxα∈Y hw , ψ (v, α)i| = 1. If M is F (1), F (2)-decomposable and F (1) is an a-dimensional linear class over Y, we say that M is an a-dimensional linear class over Y. The bounds Morgenstern and Roughgarden [62] provided using linear separability are loose in several settings: for anonymous and non-anonymous item-pricing mechanisms under additive bidders, their structured prediction approach gives a pseudo-dimension bound of O m2 and O nm2 log m, respectively. They left as an open question whether linear separability can be used to prove tighter guarantees. Using the hyperplane structures we study in this paper, we prove that the answer is “yes.” We require the following refined notion of (d, t)-delineable classes.

Definition 9.3 ((d, t1, t2)-divisible). Suppose M consists of mechanisms parameterized by vectors d (1) (2) p ⊆ R and that M is F , F -decomposable. We say that M is (d, t1, t2)-divisible if: 0 1. For any v ∈ X , there is a set H of t1 hyperplanes such that for any connected component P d (1) 0 of R \ H, the function fv (p) is constant over all p ∈ P .

2. For any v ∈ X and any α ∈ Y, there is a set H2 of t2 hyperplanes such that for any connected 0 d (2) 0 component P of R \ H2, the function fv,α (p) is linear over all p ∈ P .

Note that (d, t1, t2)-divisibility implies (d, t1 + t2)-delineability. Theorem 9.4 connects linear separability and divisibility with pseudo-dimension.

Theorem 9.4. Suppose M is mechanism class that is (d, t1, t2)-divisible with t1, t2 ≥ 1 and an n a do a-dimensional linear class over Y. Let ω = min |Y| , d (at1) . Then

Pdim (M) = O ((d + a) log (d + a) + d log t2 + log ω) . Proof. To prove this theorem, we will use the following standard notation. For a class F of real-  (1) (N) valued functions mapping X to R, let S = v ,..., v be a subset of X . We define    1 (1) (1)  {f(v )≥z }   .   ΠF (S) = max  .  : f ∈ F . z(1),...,z(N)∈   R  1   {f(v(N))≥z(N)} 

47 |S| The pseudo-dimension of F is the size of the largest set S such that ΠF (S) = 2 . We also use the notation f(S) to denote the vector f(v(1)), . . . , f(v(N)). Morgenstern and Roughgarden [62] proved the following lemma. Lemma 9.5 (Morgenstern and Roughgarden [62]). Suppose M is F (1), F (2)-decomposable and an a-dimensional linear class. Let S = v(1),..., v(N) be a subset of X . Then n  o 0 (1) 0 0 0 ΠM(S) ≤ S , fp (S ) : S ⊆ S, |S | = a, p ∈ P n n (1) (1)  (N) (N)oo · max ΠF (2) v , α ,..., v , α . α(1),...,α(N)∈Y

Suppose the pseudo-dimension of M is N. By definition, there exists a set S = v(1),..., v(N) that is shattered by M. By Lemmas 9.5 and 9.6, this means that

N a n n (1) (1)  (N) (N)oo 2 = ΠM(S) ≤ N ω max ΠF (2) v , α ,..., v , α . α(1),...,α(N)∈Y To prove this theorem, we will show that

n n (1) (1)  (N) (N)oo 2 2 d max ΠF (2) v , α ,..., v , α < d N t2 , (6) α(1),...,α(N)∈Y

N 2d+a 2 d which means that 2 < N d t2ω, and thus N = O ((d + a) log(d + a) + d log t2 + log ω). To this end, let α(1),..., α(N) be N arbitrary elements of Y and let z(1), . . . , z(N) be N arbitrary (i) elements of R. Since M is (d, t1, t2)-divisible, we know that for each i ∈ [N], there is a set H2 of t hyperplanes such that for any connected component P 0 of P \ H(i), f (2) (p) is linear over 2 2 v(i),α(i) 0 (1) (N) all p ∈ P . We now consider the overlay of all N partitions P \ H2 ,..., P \ H2 . Formally, this SN (i) overlay is made up of the sets P1,..., Pτ , which are the connected components of P \ i=1 H2 . For each set Pj and each i ∈ [N], Pj is completely contained in a single connected component of P \ H(i), which means that f (2) (p) is linear over P . Since H(i) ≤ t for all i ∈ [N], 2 v(i),α(i) j 2 2 d τ < d(Nt2) [19]. SN (i) (i) Now, consider a single connected component Pj of P \ i=1 H2 . For any sample v ∈ S, we know that f (2) (p) is linear over P . Let a(i) ∈ d and b(i) ∈ be the weight vector and offset v(i),α(i) j j R j R such that f (2) (p) = a(i)·p+b(i) for all p ∈ P . We know that there is a hyperplane a(i)·p+b(i) = v(i),α(i) j j j j j z(i) where on one side of the hyperplane, f (2) (p) ≤ z(i) and on the other side, f (2) (p) > z(i). v(i),α(i) v(i),α(i) n (i) (i) (i) o Let HPj be all N hyperplanes for all N samples, i.e., HPj = aj · p + bj = z : i ∈ [N] . Notice that in any connected component P 0 of P \ H , for all i ∈ [N], f (2) (p) is either greater than j Pj v(i),α(i) z(i) or less than z(i) (but not both) for all p ∈ P 0. d In total, the number of connected components of Pj \ HPj is smaller than dN . The same holds for every partition P . Thus, the total number of regions where for all i ∈ [N], f (2) (p) is either j v(i),α(i) (i) (i) d d greater than z or less than z (but not both) is smaller than dN · d(Nt2) . In other words,      (2) (1) (1) (1)  1 fp v , α ≥ z      .  d d  .  : p ∈ P ≤ dN · d(Nt2) .      (2) (N) (N) (N)   1 fp v , α ≥ z 

48 Since we chose α(1),..., α(N) and z(1), . . . , z(N) arbitrarily, we may conclude that Inequality (6) holds.

Lemma 9.6. Suppose M is an a-dimensional linear class over Y and (d, t1, t2)-divisible. Then for any set S ⊆ X of size N, n o n o 0 (1) 0 0 0 a a d (S , fp (S )) : S ⊆ S, |S | = a, p ∈ P ≤ N min |Y| , d(at1) . Proof. To begin with, there are of course at most N a ways to choose a set S0 ⊆ S of size a. How many ways are there to label a fixed set S0 = v(i1),..., v(ia) of size a using functions from F (1)? An easy upper bound is |Y|a. Alternatively, we can use the structure of M to prove that there are d 0 (i ) 0 d(at1) ways to label S . Since M is (d, t1, t2)-divisible, we know that for any v j ∈ S , there is (ij ) 0 (ij ) (1) a set H of t1 hyperplanes such that for any connected component P of P \ H , f (p) is 1 1 v(ij ) (i ) 0 j (ij ) 0 constant over all p ∈ P . We now consider the overlay of all a partitions P \ H1 for all v ∈ S . Formally, this overlay is made up of the sets P1,..., Pτ , which are the connected components  (i ) S j (ij ) 0 of P \ (i ) H . For each set P and each v ∈ S , P is completely contained in a v j ∈S0 1 t t (ij ) (1) single connected component of P \ H , which means that f (p) is constant over Pt. This 1 v(ij ) (i ) 0 j (ij ) 0 means that the number of ways to label S is at most τ. Since H1 ≤ t1 for all v ∈ S , n  o d 0 (1) 0 0 0 a  a d τ < d(at1) [19]. Therefore, S , fp (S ) : S ⊆ S, |S | = a, p ∈ P ≤ N min |Y| , d(att) , so the lemma statement holds.

9.1 Divisible mechanism classes We now instantiate Theorem 9.4. Theorem 9.7. Let M and M0 be the classes of item-pricing mechanisms with anonymous prices and non-anonymous prices. If the buyers are unit-demand, then M is m, nm2, 1-divisible and M0 is nm, nm2, 1-divisible. Also, M and M0 are (m + 1)- and (nm + 1)-dimensionally linearly separable over {0, 1}m and [n]m. Therefore, Pdim (M) = O min m2, m log (nm)  and Pdim M0 = O (nm log(nm)) .

(1) m th Proof. We begin with anonymous reserves. Let fp : X → {0, 1} be defined so that the i m component is 1 if and only if item i is sold. For each bidder j, there are 2 hyperplanes defining their preference ordering on the items: vj(ei) − p(ei) = vj(ek) − p(ek) for all i 6= k. This gives a 2 m (1) total of at most t1 = nm hyperplanes splitting R into regions where fv (p) is constant. Next, (2) we can write fp (v, α) = α · p, which is always linear, so we may set t2 = 1. (1) nm Under non-anonymous reserve prices, let fp : X → {0, 1} be defined so that for every bidder (1) j and every item i, there is a component of fp (v) that is 1 if and only if bidder j receives item 2 nm i. As with anonymous prices, there are t1 = nm hyperplanes splitting R into regions where (1) (2) fv (p) is constant. Next, we can write fp (v, α) = α · p, which is always linear, so we may set t2 = 1. Morgenstern and Roughgarden [62] proved that M and M0 are (m + 1)- and (nm + 1)- dimensionally linearly separable over {0, 1}m and [n]m, respectively.

When prices are anonymous, if n < 2m, Theorem 9.7 improves on the pseudo-dimension bound of O m2 Morgenstern and Roughgarden [62] gave for this class, and otherwise it matches their bound. When the prices are non-anonymous our bound improves on their bound of O nm2 log n.

49 Theorem 9.8. Let M and M0 be the classes of item-pricing mechanisms with anonymous prices and non-anonymous prices, respectively. If the buyers have general values, then M is m, n22m, 1- divisible and M0 is nm, n22m, 1-divisible. Also, M is (m + 1)-dimensionally linearly separable over {0, 1}m and M0 is (nm + 1)-dimensionally linearly separable over [n]m. Thus, P dim (M) = O m2 and P dim (M0) = O (nm (m + log n)).

(1) m th Proof. We begin with anonymous reserves. Let fp : X → {0, 1} be defined so that the i 2m component is 1 if and only if item i is sold. For each bidder j, there are 2 hyperplanes defining P 0 P their preference ordering on the bundles: vj(q) − i:q[i]=1 p(ei) = vj(q ) − i:q0[i]=1 p(ei) for all 0 m 2m m q, q ∈ {0, 1} . This gives a total of at most t1 = n2 hyperplanes splitting R into regions where (1) (2) fv (p) is constant. Next, we can write fp (v, α) = α · p, which is always linear, so we may set t2 = 1. (1) nm Under non-anonymous reserve prices, let fp : X → {0, 1} be defined so that for every bidder (1) j and every item i, there is a component of fp (v) that is 1 if and only if bidder j receives item 2m nm i. As with anonymous prices, there are t1 = n2 hyperplanes splitting R into regions where (1) (2) fv (p) is constant. Next, we can write fp (v, α) = α · p, which is always linear, so we may set t2 = 1. Morgenstern and Roughgarden [62] proved that M and M0 are (m + 1)- and (nm + 1)- dimensionally linearly separable over {0, 1}m and [n]m, respectively.

When there are anonymous prices, the number of hyperplanes in the partition is large, so considering the hyperplane partition does not help us. As a result, Theorem 9.8 implies the same bound Morgenstern and Roughgarden [62] gave. In the case of non-anonymous prices, analyzing the hyperplane partition gives a better bound than their bound of O nm2 log n. In Theorem 9.9, we use Theorem 9.4 to prove pseudo-dimension bounds of O (m log m) and O (nm log nm) for the classes of second price auctions for additive bidders with anonymous and non- anonymous reserves, respectively. In Theorem 9.10, we prove the same for item-pricing mechanisms. We thus answer the open question by Morgenstern and Roughgarden [62]. These bounds match those implied by Lemmas 3.18 and 3.19. Theorem 9.9. Let M and M0 be the classes of anonymous and non-anonymous second price item auctions. Then M is (m, m, m)-divisible and M0 is (nm, m, m)-divisible. Also, M and M0 are (m + 1)- and (nm + 1)-dimensionally linearly separable over {0, 1}m and [n]m. Therefore, P dim(M) = O(m log m) and P dim(M0) = O(nm log(nm)).

Proof. We begin with anonymous reserves. For a given valuation vector v, let ji be the highest 0 (1) m bidder for item i and let ji be the second highest bidder. Let fp : X → {0, 1} be defined so that th m the i component is 1 if and only if item i is sold. There are t1 = m hyperplanes splitting R into (1) th (1) regions where fv (p) is constant: the i component of fv (p) is 1 if and only if vji (ei) ≥ p(ei). (2) P n o Next, we can write fp (v, α) = max v 0 (e ), p(e ) − c(α), which is linear so long as i:α[i]=1 ji i i either v 0 (e ) < p(e ) or v 0 (e ) ≥ p(e ) for all i ∈ [m]. Therefore, there are t = m hyperplanes H ji i i ji i i 2 2 0 (2) 0 such that for any connected component P of P \ H2, fv,α(p) is linear over all p ∈ P . (1) nm Under non-anonymous reserve prices, let fp : X → {0, 1} be defined so that for every bidder (1) j and every item i, there is a component of fp (v) that is 1 if and only if bidder j receives item i. nm (1) There are t1 = m hyperplanes splitting R into regions where fv (p) is constant: for every item i, the component corresponding to bidder ji is 1 if and only if vji (ei) ≥ pj(ei). Next, we can write (2) P n o fp (v, α) = max v 0 (e ), p (e ) −c(α), which is linear so long as either v 0 (e ) < p (e ) i:α[i]=1 ji i ji i ji i ji i

50 or v 0 (e ) ≥ p (e ) for all i ∈ [m]. Therefore, there are t = m hyperplanes H such that for any ji i ji i 2 2 0 (2) 0 connected component P of P \ H2, fv,α(p) is linear over all p ∈ P . Morgenstern and Roughgarden [62] proved that M and M0 are (m + 1)- and (nm + 1)- dimensionally linearly separable over {0, 1}m and [n]m, respectively.

Theorem 9.10. Let M and M0 be the classes of item-pricing mechanisms with anonymous prices and non-anonymous prices, respectively. If the buyers are additive, then M is (m, m, 1)-divisible and M0 is (nm, nm, 1)-divisible. Also, M and M0 are (m + 1)- and (nm + 1)-dimensionally linearly separable over {0, 1}m and [n]m. Therefore, P dim(M) = O(m log m) and P dim(M0) = O(nm log(nm)).

Proof. We begin with anonymous reserves. For a given valuation vector v, let ji be the buyer with (1) m th the highest valuation for item i. Let fp : X → {0, 1} be defined so that the i component is 1 m (1) if and only if item i is sold. There are t1 = m hyperplanes splitting R into regions where fv (p) th (1) is constant: the i component of fv (p) is 1 if and only if vji (ei) ≥ p(ei). Next, we can write (2) fp (v, α) = α · p, which is always linear, so we may set t2 = 1. (1) nm Under non-anonymous reserve prices, let fp : X → {0, 1} be defined so that for every bidder (1) j and every item i, there is a component of fp (v) that is 1 if and only if bidder j receives item i. nm (1) There are t1 = nm hyperplanes splitting R into regions where fv (p) is constant: vj(ei) = pj(ei) (2) for all i and all j. Next, we can write fp (v, α) = α · p, which is always linear, so we may set t2 = 1. Morgenstern and Roughgarden [62] proved that M and M0 are (m + 1)- and (nm + 1)- dimensionally linearly separable over {0, 1}m and [n]m, respectively.

10 Proofs from Section 4

Lemma 10.1. Let X = X1 × · · · × Xd. Let F = {fp : p ∈ P} be a set of functions mapping X to R, parameterized by a set P = P1 × · · · × Pd. Suppose for i ∈ [d], there exists a class n (i) o Fi = fp : p ∈ Pi of functions mapping Xi to R such that for any p = (p[1], . . . , p[d]) ∈ P, fp Pd (i) decomposes additively as fp (v1, . . . , vd) = i=1 fp[i] (vi). Then

d X (i) sup fp(v) = sup fp (v) . v∈X ,p∈P i=1 v∈Xi,p∈Pi

Proof. Recall that for any set A ⊆ R, s = sup A if and only if: 1. For all  > 0, there exists a ∈ A such that a > s − , and

2. For all a ∈ A, a ≤ s.

(i) Pd Let ti = supv∈Xi,p∈Pi fp (v) and let t = i=1 ti. We will show that t = supv∈X ,p∈P fp(v). First, we will show that condition (1) holds. In particular, we want to show that for all  > 0, (i) there exists v ∈ X and p ∈ P such that fp(v) > t − . Since ti = supv∈Xi,p∈Pi fp (v), we know (i) that there exists vi ∈ Xi, pi ∈ P such that fpi (vi) > ti − /d. Therefore, letting p = (p1, . . . , pd), Pd (i) Pd we know that fp (v1, . . . , vd) = i=1 fpi (vi) > i=1 ti −  = t − . Since (v1, . . . , vd) ∈ X and (p1, . . . , pd) ∈ P, we may conclude that condition (1) holds.

51 Next, we will show that condition (2) holds. In particular, we want to show that for all v ∈ X (i) Pd (i) and p ∈ P, fp(v) ≤ t. We know that fp[i] (v[i]) ≤ ti, which means that fp(v) = i=1 fp[i] (v[i]) ≤ Pd i=1 ti = t. Therefore, condition (2) holds. Lemma 4.4. Let M and M0 be the sets of second-price auctions with anonymous and non- anonymous reserves. Suppose the bidders are additive, D is item-independent, and the cost function N p 0 p is additive. For any set S ∼ D , RbS (M) ≤ 180U 1/N and RbS (M ) ≤ 180U n log(4n)/N. Proof. We begin with anonymous second-price auctions, which are parameterized by a set P ⊂ m R . Without loss of generality, we may write P = P1 × · · · × Pm, where Pi ⊂ R. Given a n valuation vector v and an item i, let v(i) ∈ R be all n buyers’ values for item i. Let profitp(v(i)) be the profit obtained by selling item i with a reserve price of p. Notice that for any p ∈ P, Pm profitp(v) = i=1 profitp[i](v(i)). Let Xi be the support of the distribution over v(i) and let

Ui = supp∈Pi,v(i)∈Xi profitp(v(i)). Next, let X be the support of D. By definition, since U is the maximum profit achievable via second price auctions over valuation vectors from X , we may write U = supv∈X ,p∈P profitp(v). Since D is item-independent, we know that X = X1 × · · · × Xm. Pm Therefore, we may apply Lemma 10.1, which tells us that U = Ui. Finally, each class of  i=1 functions profitp : p ∈ Pi is (1, 2)-delineable, since for v(i) ∈ Xi, profitv(i)(p) is linear so long as p is larger than the largest component of v(i), between the second largest and largest component of v(i), or smaller than the second largest component of v(i). By Corollary 4.2, we may conclude N  p  that for any set of samples S ∼ D , RbS (M) ≤ O U 1/N . 0 The bound on RbS (M ) follows by almost the exact same logic, except for a few adjustments. nm First of all, the class is defined by nm parameters coming from some set P ⊆ R , since there are n non-anonymous prices per item. Without loss of generality, we assume P = P1 × · · · × Pm, n where Pi ⊆ R is the set of non-anonymous prices for item i. Given a set of non-anonymous prices n p ∈ R for item i, let profitp(v(i)) be the profit of selling the item the bidders defined by v(i) given the reserve prices p. Notice that profitv(i)(p) is linear so long as for each bidder j, p[j] is  either larger than their value for item i or smaller than their value. Thus, the set profitp : p ∈ Pi is (n, n)-delineable. Defining each Ui in the same way as before, Lemma 10.1 guarantees that Pm N U = i=1 Ui. Therefore, by Corollary 4.2, we may conclude that for any set of samples S ∼ D , 0  p  RbS (M ) ≤ O U n log n/N .

Lemma 4.5. Let M and M0 be the sets of anonymous and non-anonymous item-pricing mech- anisms, respectively. Suppose the bidders are additive, D is item-independent, and the cost func- N p 0 tion is additive. For any set of samples S ∼ D , RbS (M) ≤ 180U 1/N and RbS (M ) ≤ 180Upn log(4n)/N.

Proof. We begin with anonymous item-pricing mechanisms, which are parameterized by a set P ⊂ m R . Without loss of generality, we may write P = P1 × · · · × Pm, where Pi ⊂ R. Given a valuation n vector v and an item i, let v(i) ∈ R be all n buyers’ values for item i. Let profitp(v(i)) be the profit obtained by selling item i at a price of p, i.e., profitp(v(i)) = 1{||v(i)||∞≥p}(p−c(ei)). Notice that for Pm any p ∈ P, profitp(v) = i=1 profitp[i](v(i)). Let Xi be the support of the distribution over v(i) and let Ui = supp∈Pi,v(i)∈Xi profitp(v(i)). Next, let X be the support of D. By definition, since U is the maximum profit achievable via item-pricing mechanisms over valuation vectors from X , we may write U = supv∈X ,p∈P profitp(v). Since D is item-independent, we know that X = X1 × · · · × Xm. Pm Therefore, we may apply Lemma 10.1, which tells us that U = Ui. Finally, each class of  i=1 functions profitp : p ∈ Pi is (1, 1)-delineable, since for v(i) ∈ Xi, profitv(i)(p) is linear so long

52 as ||v(i)||∞ ≤ p or ||v(i)||∞ > p. By Corollary 4.2, we may conclude that for any set of samples N  p  S ∼ D , RbS (M) ≤ O U 1/N . 0 The bound on RbS (M ) follows by almost the exact same logic, except for a few adjustments. nm First of all, the class is defined by nm parameters coming from some set P ⊆ R , since there are n non-anonymous prices per item. Without loss of generality, we assume P = P1 × · · · × Pm, n where Pi ⊆ R is the set of non-anonymous prices for item i. Given a set of non-anonymous prices n p ∈ R for item i, let profitp(v(i)) be the profit of selling the item the bidders defined by v(i) given the prices p. Notice that profitv(i)(p) is linear so long as for each bidder j, p(ej) is either larger  than their value for item i or smaller than their value. Thus, the set profitp : p ∈ Pi is (n, n)- delineable. Defining each Ui in the same way as before, Lemma 10.1 in Appendix 10 guarantees that Pm N U = i=1 Ui. Therefore, by Corollary 4.2, we may conclude that for any set of samples S ∼ D , 0  p  RbS (M ) ≤ O U n log n/N .

Lemma 4.6. Let M be the set of length-` item lottery menus. If the bidder is additive, D N is item-independent, and the cost function is additive, then for any set S ∼ D , RbS (M) ≤ 180p2` log (8`3).

Proof. For a given menu M = (M1,...,Mm) of item lotteries, let profitMi (v) be the profit achieved from menu Mi. Since the cost function is additive,

profit (v) = p − [c(q)] = p − c(e ) · φ , Mi i,v E i,v i i,v q∼φi,v where (pi,v, φi,v) is the lottery in Mi that maximizes the buyer’s utility. Notice that profitM (v) = Pm i=1 profitMi (v(ei)). Let Xi be the support of the distribution Di over v(ei) and let Ui = supMi,v(ei)∈Xi profitMi (v(ei)). By definition, since U is the maximum profit achievable via item menus over valuation vectors from X , we may write U = supv∈X ,M∈M profitM (v). Since D is a product distribution, we know that X = X1 × · · · × Xm. Therefore, we may apply Lemma 10.1, Pm which tells us that U = i=1 Ui. Finally, for each i ∈ [n], the class of all single-item lotteries 2 `+1 Mi is (2`, ` )-delineable, since for v(ei) ∈ Xi, the lottery the buyer chooses depends on the 2 0 0 (j) (j) (j ) (j ) 0 hyperplanes φi v(ei) − pi = φi v(ei) − pi for j, j ∈ {0, . . . , `}, and once the lottery is fixed, profitMi (v) is a linear function. Theorem 4.7. Let M and M0 be the classes of anonymous and non-anonymous item-pricing mechanisms. Then Pdim (M) ≥ m and Pdim (M0) ≥ nm. The same holds if M and M0 are the classes of second-price auctions with anonymous and non-anonymous reserves.

Proof. Let M be the class of item-pricing mechanisms with anonymous prices. We construct a set S of m single-bidder, m-item valuation vectors that can be shattered by M. Let v(i) be valuation (i) (i)  (1) (m) vector where v1 (ei) = 3 and v1 (ej) = 0 for all j 6= i and let S = v ,..., v . For any T ⊆ [m], let MT be the mechanism defined such that the price of item i is 2 if i ∈ T and otherwise, its price is 0. If i ∈ T , then profit (v(i)) = 2 and otherwise, profit (v(i)) = 0. Therefore, the MT MT targets z(1) = ··· = z(m) = 1 witness the shattering of S by M. This example also proves that the pseudo-dimension of the class of second-price auctions with anonymous reserve prices is also at least m, since in the single-bidder case, this class is identical to M. Next, let M0 be the class of item-pricing mechanisms with non-anonymous prices. We construct a set S of nm n-bidder, m-item valuation vectors that can be shattered by M0. For i ∈ [m] and (i,j) (i,j) (i,j) 0 0 j ∈ [n], let v be valuation vector where vj (ei) = 3 and vj0 (ei0 ) = 0 for all (i , j ) 6= (i, j). Let

53  (i,j) S = v i∈[m],j∈[n]. For any T ⊆ [m] × [n], let MT be the mechanism defined such that the price of item i for bidder j is 2 if (i, j) ∈ T and otherwise, it is 0. If (i, j) ∈ T , then profit (v(i,j)) = 2 MT and otherwise, profit (v(i,j)) = 0. Therefore, the targets z(i,j) = 1 for all i ∈ [m], j ∈ [n] witness MT the shattering of S by M. This example with the prices as reserve prices also proves that the pseudo-dimension of the class of second-price auctions with non-anonymous reserve prices is at least nm.

11 Proofs from Section 5

Theorem 5.1. Let M be the class of non-anonymous item-pricing mechanisms over additive bid- Pn ders and let w :[n] → R be a weight function such that i=1 w (i) ≤ 1. Then for any δ ∈ (0, 1), N with probability at least 1−δ over the draw S ∼ D , for any k ∈ [n] and any mechanism M ∈ Mk, r s km log (4nm) 2 4 |profit (M) − profit (M)| ≤ 360U + 4U ln . S D N N δ · w (k)

Proof. This theorem follows from the fact that Mk is (km, nm)-delineable. Every mechanism in Mk is defined by km parameters, one price per item per price group, and for every buyer j, the items they are willing to buy are defined by the m hyperplanes vj(ei) = pj(ei) for every item i. Therefore, the theorem follows from Theorems 3.8 and 3.9, and by multiplying δ with w(k).

Two-part tariffs. Let M be the class of anonymous two-part tariff menus, by which we mean the union of all length-` menus of two-part tariffs with anonymous prices. Similarly, let M0 be the class of non-anonymous two-part tariff menus. For a given menu M of two-part tariffs, let `M be the length of its menu. P Theorem 11.1. Let w : N → [0, 1] be a weight function such that w(i) ≤ 1. Then for any δ ∈ (0, 1), with probability at least 1 − δ over the draw of a set of samples of size N from D, for any mechanism M ∈ M, the difference between the average profit of M over the set of samples and the expected profit of M over D is r s ! ` log(nκ` ) 1 1 O U M M + U log . N N δ · w(`M )

Also, with probability at least 1 − δ over the draw of a set of samples of size N from D, for any mechanism M ∈ M0, the difference between the average profit of M over the set of samples and the expected profit of M over D is at most r s ! n` log(nκ` ) 1 1 O U M M + U log . N N δ · w(`M )

Q-boosted λ-auctions. For the next theorem, given a λ-auction M, let QM be the set of all allocations Q such that λ(Q) > 0.

Theorem 11.2. Let M be the class of λ-auctions and let w be a weight function which maps sets of allocations Q to [0, 1] such that P w(Q) ≤ 1. Then for any δ ∈ (0, 1), with probability at least 1−δ over the draw of a set of samples of size N from D, for any mechanism M ∈ M, the difference

54 between the average profit of M over the set of samples and the expected profit of M over D is at most r s ! |Q | log(n|Q |) 1 1 O U M M + U log . N N δ · w(QM )

Menu lotteries. Let M be the class of lottery menus, by which we mean the union of all length-` lottery menus. For a given lottery menu M, let `M be the length of its menu. P Theorem 11.3. Let w : N → [0, 1] be a weight function such that w(i) ≤ 1. Then for any δ ∈ (0, 1), with probability at least 1 − δ over the draw of a set of samples of size N from D, for any mechanism M ∈ M, the difference between the average profit of M over the set of samples and the expected profit of M over D is r s ! ` log(n` ) 1 1 O U M M + U log . N N δ · w(`M )

55