Repeated Games with Many Players∗

Takuo Sugaya and Alexander Wolitzky Stanford GSB and MIT

October 1, 2021

Abstract

Motivated by the problem of sustaining cooperation in large communities with limited information, we analyze sequences of repeated games where the population size N, the discount factor δ, and the monitoring structure (with channel capacity denoted by C) vary together. We show that if (1 − δ)N/C → ∞ then all payoffs are consistent with approximately myopic play. A folk theorem under a novel identifiability condition provides a near converse. For example, if (1 − δ)N log(N)/C → 0 then a folk theorem holds under random auditing, where each player’s action is monitored with the same probability in every period. If attention is restricted to linear equilibria (a generalization of strongly symmetric equilibria), cooperation is possible only under much more severe parameter restrictions. Methodologically, we develop connections between repeated games and information theory.

Keywords: repeated games, large populations, information theory, Fisher information, channel capacity, random auditing, linear equilibrium

JEL codes: C72, C73

∗For helpful comments, we thank Tommaso Denti, Drew Fudenberg, Stephen Morris, Yuliy Sannikov, Satoru Takahashi, and participants in several seminars. Wolitzky acknowledges financial support from the NSF.

1 Introduction

Large groups of individuals often have a remarkable capacity for cooperation, even in the absence of external contractual enforcement (Ostrom, 1990; Ellickson, 1991; Seabright, 2004). Cooperation in large groups usually seems to rely on high-quality monitoring of individual agents’ actions, together with sanctions that narrowly target deviators. These are key features of the community resource management settings documented by Ostrom (1990), as well as the local public goods provision environment studied by Miguel and Gugerty (2005), who in a development context found that parents who fell behind on their school fees and other voluntary contributions faced social sanctions. Large cartels seem to operate similarly. For example, the Federation of Quebec Maple Syrup Producers – a government-sanctioned cartel that organizes more than 7,000 producers, accounting for over 90% of Canadian maple syrup production – strictly monitors its members’ sales, and producers who violate its rules regularly have their sugar shacks searched and their syrup impounded, and can also face fines, legal action, and ultimately the seizure of their farms (Kuitenbrouwer, 2016; Edmiston and Hamilton, 2018). In contrast, we are not aware of any evidence that individual maple syrup producers – or the parents studied by Miguel and Gugerty, or the farmers, fishers, and herders studied by Ostrom – are incentivized by the fear of starting a price war or other general breakdown of cooperation.

The principle that large-group cooperation requires precise monitoring and personalized sanctions seems like common sense, but it is not reflected in current models. The standard analysis of repeated games with patient players (e.g., Fudenberg, Levine, and Maskin, 1994, henceforth FLM) fixes all parameters of the game except the discount factor δ and considers the limit as δ → 1. This approach does not capture situations where, while players are patient (δ ≈ 1), they are not necessarily patient compared to the population size N (so (1 − δ)N may or may not be close to 0). In addition, since standard results are based on statistical identifiability conditions that hold generically regardless of the number of players, they also do not capture the possibility that more information may be required to support cooperation in larger groups. Finally, since there is typically a vast multiplicity of cooperative equilibria in the δ → 1 limit, standard results also say little about what kind of strategies must be used to support large-group cooperation, for example whether it is better to rely on individual sanctions (e.g., fines) or collective ones (e.g., price wars).

In this paper, we extend the standard analysis of repeated games with imperfect monitoring by considering sequences of games where the population size, discount factor, stage game, and monitoring structure all vary together. These sequences of games can be quite general: we assume only uniform upper bounds on the magnitude of the players’ stage-game payoffs and the number of actions available to each player, and a uniform lower bound on “noise,” formalized as a constraint that each player takes each available action with positive probability. Our results characterize sequences of N, δ, and measures of the “informativeness” of the monitoring structure for which, in the limit, cooperation is either impossible (i.e., play is almost myopic in every equilibrium) or possible (i.e., a folk theorem holds), for any stage games satisfying our conditions. We also show that cooperation is possible only under much more restrictive conditions if society exclusively relies on collective sanctions. In sum, we show that large-group cooperation requires a lot of patience and/or a lot of information, and cannot be based on collective sanctions for reasonable parameter values.

We now preview our main ideas and results. A useful measure of the informativeness of a monitoring structure is its channel capacity, C. This is a standard measure in information theory, which in the current context is defined as the maximum expected reduction in uncertainty (entropy) about the realized action profile that results from observing all players’ signals (which can be distinct, as monitoring need not be public), where the maximum is taken over strategy profiles. Channel capacity obeys the elementary inequality C ≤ log|Y|, where Y is the set of possible signal realizations. Due to this inequality, our results based on channel capacity are more general than they would be if we simply measured informativeness by the cardinality of the set of signal realizations.

Our first result is that if (1 − δ)N/C → ∞ along a sequence of repeated games satisfying our assumptions, then in the limit payoffs in any Nash or communication equilibrium are consistent with almost-myopic play. This result is an immediate corollary of some general connections between repeated games and information theory that we develop. These connections take the form of two lemmas, the first of which shows that, for any communication equilibrium in any repeated game with a full-support monitoring structure, and any possible deviation by any player, we have

$$\text{deviation gain} \;\le\; \sqrt{\frac{\delta}{1-\delta} \times (\text{detectability}) \times (\text{payoff variance})}.$$

In this inequality, a player’s deviation gain and payoff variance are both assessed at the equilibrium occupation measure, which is the discounted, time-averaged distribution over action profiles induced by the equilibrium; and a deviation’s detectability is defined as the expected square likelihood difference of the signal under the deviation, as compared to equilibrium play.1 The inequality is established by a variance decomposition approach. Our second lemma then shows that, in games with a uniform lower bound on noise, per-capita detectability is at most a constant multiple of per-capita channel capacity. This lemma is established by standard information theory methods, such as Pinsker’s inequality. Putting the lemmas together, we see that if (1 − δ)N/C → ∞ then the per-capita deviation gain at the occupation measure converges to 0, and hence payoffs are consistent with almost-myopic play.

Our second result provides a near-converse to the first, under some additional structure on monitoring. It shows that, for any sequence of repeated games with public, product-structure monitoring that satisfies an individual identifiability condition with η slack (where η varies along the sequence), and where (1 − δ) log(N)/η → 0, a folk theorem holds in the limit.2 A simple example of public, product-structure monitoring that satisfies individual identifiability with η slack is random auditing, where in each period ηN randomly chosen players have their realized actions revealed perfectly, and nothing is revealed about the other players’ actions. Since the channel capacity of random auditing is a constant multiple of ηN, this example describes a class of monitoring structures where a folk theorem holds whenever (1 − δ)N log(N)/C → 0. This shows that the condition of our first theorem is tight up to log(N) slack.

Our final result considers the implications of restricting society to “collective” sanctions and rewards. We formalize this restriction by focusing on linear equilibria, where all on-equilibrium-path continuation payoff vectors lie on a line in R^N. Note that, when the stage game is symmetric and the line in question is the 45° line, linear equilibria reduce to strongly symmetric equilibria, which are a standard model of collusion through the threat of price wars (Green and Porter, 1984; Abreu, Pearce, and Stacchetti, 1986; Athey, Bagwell, and Sanchirico, 2004). We show that if there exists ρ > 0 such that (1 − δ) exp(N^{1−ρ}) → ∞

1 “Detectability” is thus a discrete version of the Fisher information of the signal, a standard concept in Bayesian statistics.
2 The definitions of public, product-structure monitoring and individual identifiability follow FLM. The required slack in identifiability is measured in terms of detectability/Fisher information.

along a sequence of repeated games, then equilibrium payoffs are almost-myopic in the limit. Since this condition holds even if N → ∞ much slower than δ → 1, we interpret this result as an impossibility theorem for large-group cooperation in linear equilibria.3

The paper is organized as follows: following a discussion of related literature, Section 2 presents the model, Section 3 presents our anti-folk theorem, Section 4 presents our folk theorem, Section 5 presents our stronger anti-folk theorem for linear equilibria, and Section 6 concludes. Proofs are in Appendix A, with some details deferred to (Online) Appendix B.

1.1 Related Literature

Prior research on repeated games has established folk theorems in the δ → 1 limit for fixed N, as well as anti-folk theorems in the N → ∞ limit for fixed δ, but has not considered the case where N and δ vary together. As this is the case where monitoring precision is critical, our main results on the three-way interaction among N, δ, and monitoring do not have close antecedents in the literature.

The most relevant folk theorem papers are FLM and Kandori and Matsushima (1998, henceforth KM). We build on their analysis (as well as that of Abreu, Pearce, and Stacchetti, 1990, henceforth APS), but as we explain in Section 4 their proofs of the folk theorem do not easily extend to the case where N and δ vary together. We thus take a different approach, which is based on “block strategies” as in Matsushima (2004) and Hörner and Olszewski (2006), and involves a novel application of some large deviations bounds.

The most relevant anti-folk theorem papers are Fudenberg, Levine, and Pesendorfer (1998), Al-Najjar and Smorodinsky (2000, 2001), Pai, Roth, and Ullman (2014), and Awaya and Krishna (2016, 2019). Following earlier work by Green (1980) and Sabourian (1990), most of these papers establish conditions under which play becomes approximately myopic as N → ∞ for fixed δ.4 As we explain in Section 3, these papers can be adapted to give results that apply when N, δ, and monitoring vary together, but these results are substantially weaker than ours, and in particular are not tight up to log terms. The key difference is that these papers do not use variance decomposition or information theory tools, which are

3 It is well-known that strongly symmetric equilibria are typically less efficient than general public perfect equilibria in games with public monitoring. Our result is instead that the relationship between N and δ required for any non-trivial incentive provision differs dramatically between strongly symmetric (more generally, linear) equilibria and general ones.
4 Awaya and Krishna instead establish conditions under which access to cheap talk is valuable.

not required for their goal of establishing convergence to myopic play as N → ∞ for fixed δ, but are needed for our goal of quantifying the interaction among N, δ, and monitoring.5

Since the monitoring structure can vary with δ in our model, we also relate to the literature on repeated games with frequent actions, where the monitoring structure varies with δ in a particular, parametric manner (Abreu, Milgrom, and Pearce, 1991; Fudenberg and Levine, 2007, 2009; Sannikov and Skrzypacz, 2007, 2010; Rahman, 2014). The most related results here are Sannikov and Skrzypacz’s (2007) theorem on the impossibility of collusion in duopoly with frequent actions and Brownian noise, as well as a similar result by Fudenberg and Levine (2007). These results are related to our anti-folk theorem for linear equilibrium, as we explain in Section 5, but are not closely related to our general results.

We believe this paper is the first in the repeated games literature to measure monitoring precision by the mutual information between signals and actions, and to investigate maximal equilibrium payoffs subject to a constraint on mutual information.6 Our results thus concern optimal monitoring structure design in repeated games, although we consider only asymptotic results rather than exact optimality for fixed parameters. In static moral hazard problems, optimal monitoring design subject to information-theoretic constraints was recently studied by Georgiadis and Szentes (2020), Li and Yang (2020), and Hoffman, Inderst, and Opp (2021), while an earlier literature (Maskin and Riley, 1985; Khalil and Lawarree, 1995; Lewis and Sappington, 1995) studied the choice between monitoring inputs and outputs. Random auditing, which we find to be approximately optimal, also arises in static, costly state-verification models (Reinganum and Wilde, 1985; Border and Sobel, 1987; Mookherjee and Png, 1989).

Finally, in earlier work (Sugaya and Wolitzky, 2021a) we studied the relation among N, δ, and monitoring in a repeated random-matching game with private monitoring and incomplete information, where each player is “bad” with some probability. In that model, society has enough information to determine which players are bad after a single period

5 For fixed N, Hörner and Takahashi (2016) consider the rate of convergence in δ of the public perfect equilibrium payoff set to the limit set characterized by Fudenberg and Levine (1994) and FLM. While their analysis concerns rates of convergence in one parameter to a known limit payoff set, we ask how the relationship among different parameters determines the limit payoff set. Farther afield, there is also some work suggesting that cooperation in repeated games is harder to sustain in larger groups, based on evolutionary models (Boyd and Richerson, 1988) and simulations (Bowles and Gintis, 2011, Chapter 5).
6 Entropy methods have previously been used in the repeated games literature to study automaton strategies (Neyman and Okada, 1999, 2000), communication (Gossner, Hernández, and Neyman, 2006), and reputation effects (Gossner, 2011; Ekmekci, Gossner, and Wilson, 2011; Faingold, 2020).

of play, but this information is decentralized, so the question is whether the players can aggregate information fast enough to ensure that it pays to be good. In contrast, in the current paper there is complete information and monitoring can be public, so the analysis concerns monitoring precision (the “amount” of information available to society) rather than the speed of information aggregation (the “distribution” of information).

2 Model

Stage Games. A stage game G = (I, A, û) consists of a finite set of players I = {1, . . . , N}, a finite action set A_i for each player i ∈ I, and a payoff function û_i : A → R for each i ∈ I (where A = ∏_{i∈I} A_i and û(a) = (û_i(a))_{i∈I} for a ∈ A). We assume that there is some independent noise in the execution of players’ actions: for each player i, there is a row-stochastic noise matrix π^i ∈ [0, 1]^{A_i×A_i} such that, for any a_i, â_i ∈ A_i, when player i chooses an intended action a_i, her realized action is â_i with probability π^i_{a_i,â_i}. Let π̲^i = min_{a_i,â_i∈A_i} π^i_{a_i,â_i} and assume that π̲^i > 0 for each i. Define π ∈ [0, 1]^{A×A} by π_{a,â} = ∏_i π^i_{a_i,â_i} for all a, â ∈ A. Given a stage game G and noise π, define the expected payoff function u_i : A → R by u_i(a) = Σ_â π_{a,â} û_i(â) for each i. Denote the range of player i’s expected payoffs by ū_i = max_a u_i(a) − min_a u_i(a). As we will explain, the independent noise assumption is necessary for our anti-folk theorem (Theorem 1), but not for any of our other results, including our general bound on the strength of players’ incentives (Lemma 1).
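To make the execution-noise construction concrete, here is a minimal numerical sketch (the two-player game, payoff numbers, and tremble probability below are illustrative choices, not taken from the paper):

```python
# Minimal sketch of the execution-noise model: independent trembles per player,
# product noise matrix pi_{a,a_hat}, and expected payoffs u_i(a).
import numpy as np
from itertools import product

A_sizes = [2, 2]                      # |A_1| = |A_2| = 2 (assumed for illustration)
eps = 0.1                             # each intended action is flipped w.p. eps
pi_i = np.array([[1 - eps, eps],      # row-stochastic noise matrix pi^i:
                 [eps, 1 - eps]])     # rows = intended action, cols = realized action

def pi_joint(a, a_hat):
    """pi_{a, a_hat} = prod_i pi^i_{a_i, a_hat_i}: noise is independent across players."""
    return np.prod([pi_i[a[j], a_hat[j]] for j in range(len(a))])

# Stage-game payoffs u_hat_i over *realized* profiles (a prisoner's dilemma;
# action 0 = Cooperate, 1 = Defect -- hypothetical numbers).
u_hat = {(0, 0): (1, 1), (0, 1): (-1, 2), (1, 0): (2, -1), (1, 1): (0, 0)}

def u(a):
    """Expected payoff u_i(a) = sum_{a_hat} pi_{a, a_hat} * u_hat_i(a_hat)."""
    profiles = list(product(*[range(n) for n in A_sizes]))
    return tuple(sum(pi_joint(a, ah) * u_hat[ah][i] for ah in profiles)
                 for i in range(len(a)))

print(u((0, 0)))  # expected payoffs when both players intend to cooperate
```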

Monitoring Structures and Channel Capacity. A monitoring structure (Y, p) consists of a finite set of signals of the form y = (y^i)_{i∈I} and a family of conditional probability distributions p(y | â). The signal distribution thus depends only on the realized action profile. Given a monitoring structure (Y, p), denote the probability of signal y at intended action

profile a by q(y | a) = Σ_â π_{a,â} p(y | â). Assume without loss that for each y ∈ Y, there exists â ∈ A such that p(y | â) > 0. Together with the assumption that π̲^i > 0 for each i, this implies that q has full support: q(y | a) > 0 for all y and a.

Given a distribution of realized action profiles α̂ ∈ Δ(A), a standard measure of the informativeness of a monitoring structure (Y, p) about the realized action profile â is the

6 mutual information between these random variables, defined as

$$I(\hat{\alpha}) = \sum_{y \in Y} \sum_{\hat{a} \in A} \hat{\alpha}(\hat{a})\, p(y \mid \hat{a}) \log \frac{p(y \mid \hat{a})}{\sum_{\hat{a}' \in A} \hat{\alpha}(\hat{a}')\, p(y \mid \hat{a}')}.{}^{7}$$

Mutual information measures the expected reduction in uncertainty (entropy) about â that results from observing y. This is an endogenous object, as it depends on the prior distribution α̂ over realized action profiles. The channel capacity of (Y, p) is defined as

$$C = \max_{\hat{\alpha} \in \Delta(A)} I(\hat{\alpha}).$$

This is an exogenous measure of the informativeness of (Y, p) about â. Note that C is no greater than the entropy of the signal y, which in turn is at most log|Y| (Theorem 2.6.3 of Cover and Thomas, 2006, henceforth CT).

Our folk theorem (Theorem 2) will assume that monitoring is public and has a product structure. A monitoring structure (Y, p) is public if all players observe the same signal: y^i = y^j for all i, j ∈ I, y ∈ Y. In this case, in a slight abuse of notation, we simply denote the public signal by y. A public monitoring structure (Y, p) has a product structure if there exist sets (Y_i)_{i∈I} and families of conditional distributions (q_i(y_i | a_i))_{i∈I, y_i∈Y_i, a_i∈A_i} such that Y = ∏_i Y_i and q(y | a) = ∏_i q_i(y_i | a_i) for all y, a: that is, the public signal y consists of conditionally independent signals of each player’s action.8 A sufficient condition

for (Y, p) to have a product structure is that there exist sets (Y_i)_{i∈I} and families of conditional distributions (p_i(y_i | â_i))_{i∈I, y_i∈Y_i, â_i∈A_i} such that Y = ∏_i Y_i and p(y | â) = ∏_i p_i(y_i | â_i) for all y, â.
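The mutual information and channel capacity defined above are computable objects. The following sketch evaluates I(α̂) and approximates C with the standard Blahut–Arimoto algorithm; the 2 × 2 channel is a toy example, not one of the paper’s monitoring structures:

```python
# Sketch: mutual information I(alpha_hat) and channel capacity C for a finite
# monitoring structure, via Blahut-Arimoto. Logs are base e, as in the paper.
import numpy as np

p = np.array([[0.9, 0.1],   # p(y | a_hat): rows = realized profiles, cols = signals
              [0.2, 0.8]])  # (hypothetical full-support channel)

def mutual_info(alpha, p):
    """I(alpha) = sum_{a,y} alpha(a) p(y|a) log[ p(y|a) / sum_{a'} alpha(a') p(y|a') ]."""
    py = alpha @ p                          # marginal signal distribution
    ratio = np.where(p > 0, p / py, 1.0)    # ratio is 1 where p = 0 (contributes 0)
    return float(np.sum(alpha[:, None] * p * np.log(ratio)))

def capacity(p, iters=500):
    """Blahut-Arimoto: maximize I(alpha) over input distributions alpha."""
    n = p.shape[0]
    alpha = np.full(n, 1.0 / n)
    for _ in range(iters):
        py = alpha @ p
        # D(a) = KL divergence between p(.|a) and the current output marginal
        D = np.sum(np.where(p > 0, p * np.log(p / py), 0.0), axis=1)
        alpha = alpha * np.exp(D)
        alpha /= alpha.sum()
    return mutual_info(alpha, p)

print(capacity(p), "<= log|Y| =", np.log(p.shape[1]))  # checks C <= log|Y| (in nats)
```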

Repeated Games. A repeated game is described by a stage game, a noise matrix, a discount factor δ ∈ (0, 1), and a monitoring structure: that is, a tuple Γ = (G, π, δ, Y, p). In each period t = 1, 2, . . ., (i) the players observe the outcome of a public randomizing device z_t drawn from the uniform distribution over [0, 1], (ii) the players take actions a, (iii) the realized action profile â is drawn according to π_{a,â}, (iv) the signal y is drawn according to p(y | â), and (v) each player i observes y^i.9 A history h^t_i for player i at the beginning of period

7 In this paper, all logarithms are base e.
8 Our notation is thus that Y^i denotes the set of possible signals observed by player i (for any monitoring structure), while Y_i denotes the set of public signals of player i’s action (for a public monitoring structure).
9 The analysis and results are unchanged if player i also observes her own realized action â_i.

t thus takes the form h^t_i = ((z_τ, a_{i,τ}, y^i_τ)^{t−1}_{τ=1}, z_t), with h^1_i = z_1, while a strategy σ_i for player i maps histories h^t_i to distributions of intended actions Δ(A_i), for each t. Players maximize discounted expected payoffs.

We consider sequences of repeated games Γ indexed by k ∈ N. Thus, I, A, û, π, δ, Y, and p (and hence also C) all implicitly depend on k, although we usually suppress this dependence to simplify notation. Throughout, we restrict attention to sequences (Γ)_k with uniformly bounded payoffs and uniformly bounded noise: there exist ū > 0 and π̲ > 0 such that, for all k ∈ N and i ∈ I^k, we have ū^k_i ≤ ū and π̲^k_i ≥ π̲. Note that since π̲^k_i |A^k_i| ≤ 1, the latter assumption implies that |A^k_i| ≤ 1/π̲ for all k and i.

Our anti-folk theorem will apply not only for any Nash equilibrium, but also for any

communication equilibrium with a mediator who observes y (but not a or â). To define the relevant communication equilibrium concept, consider the mediated repeated game, where there is a mediator who at the end of each period observes y, and at the beginning of each period sends a private message r_i ∈ R_i to each player i, for some sets (R_i)_{i∈I}.10 An outcome μ of the game is a distribution over paths of actions and signals, (A × Y)^∞. For any equilibrium concept, the set of equilibrium outcomes is larger in the mediated game than in the unmediated game, because the mediator can always babble. An equilibrium in the mediated

game is canonical if, for each player i, R_i = A_i and player i’s equilibrium strategy is to take

a_i = r_i at every history. The mediator’s messages are thus interpreted as recommendations in a canonical equilibrium. Since q has full support, the set of sequential equilibrium outcomes in the mediated game equals the set of canonical Nash equilibrium outcomes in the mediated game.11 The elements of this set are the communication equilibria.12

Sets of Payoffs. Given a stage game G with (expected) payoff functions (ui)i, we define

some relevant sets of payoff vectors. The feasible payoff set is V = co{u(a)}_{a∈A}, where co denotes convex hull. Let V* ⊆ V denote the set of payoff vectors that Pareto-dominate a Nash equilibrium payoff in G: that is, v ∈ V* iff v ∈ V and there exists a static Nash

10 This formalism can even allow the possibility that the mediator observes more information than all the “real players” in the game put together, by adding a dummy player who does not take actions but whose signal is observed by the mediator; equivalently, we could add an N + 1st component to the signal y, which only the mediator observes.
11 This is intuitive, and formally follows from Proposition 4.1 of Sugaya and Wolitzky (2021b).
12 A communication equilibrium in our sense is thus the same as a communication equilibrium as defined by Forges (1986) in the game with an additional dummy player who takes no actions but observes y, whose only role is to report y to the mediator.

equilibrium α ∈ ∏_i Δ(A_i) in G such that v ≥ u(α).13 For each v ∈ R^N and ε > 0, let B_v(ε) = ∏_i [v_i − ε, v_i + ε], and let B(ε) = {v ∈ R^N : B_v(ε) ⊆ V*}. That is, B(ε) is the set of payoff vectors v ∈ R^N such that the cube with center v and side-length 2ε lies entirely within V*.

A manipulation for a player i is a mapping α̃_i : A_i → Δ(A_i). The interpretation is that when player i is recommended action r_i, she instead plays the mixed action α̃_i(r_i). Player i’s deviation gain from manipulation α̃_i at a (possibly correlated) action profile distribution α ∈ Δ(A) is

$$D_i(\tilde{\alpha}_i \mid \alpha) = \sum_{r \in A} \alpha(r)\left(u_i(\tilde{\alpha}_i(r_i), r_{-i}) - u_i(r)\right),$$

and her maximum deviation gain at α ∈ Δ(A) is

$$D_i(\alpha) = \max_{\tilde{\alpha}_i : A_i \to \Delta(A_i)} D_i(\tilde{\alpha}_i \mid \alpha).$$

The per-player average deviation gain at α is Σ_i D_i(α)/N. For each ε > 0, the set of action distributions consistent with ε-myopic play is

$$A(\varepsilon) = \left\{\alpha \in \Delta(A) : \frac{1}{N}\sum_i D_i(\alpha) \le \varepsilon\right\},$$

and the set of payoff vectors consistent with ε-myopic play is

$$M(\varepsilon) = \left\{v \in \mathbb{R}^N : v = u(\alpha) \text{ for some } \alpha \in A(\varepsilon)\right\}.$$

Note that a few players can have large deviation gains at an action distribution α ∈ A(ε). However, so long as the effect of each player’s action on the sum of her opponents’ payoffs is bounded, population payoffs at any α ∈ A(ε) must be close to those at an action distribution where all players’ deviation gains are less than ε. Note also that D_i(α) is convex as the maximum of affine functions, and hence A(ε) and M(ε) are convex sets.

Our anti-folk theorem will provide conditions under which all equilibrium payoff vectors are contained in M(ε), while our folk theorem will provide conditions under which all payoff vectors in B(ε) arise in equilibrium. As a check that the set B(ε) is reasonably large for all N, in Appendix B.1 we consider a canonical public-goods game where each player chooses

13 Here and throughout, we linearly extend payoff functions to mixed actions.

Contribute or Don’t Contribute, and a player’s payoff is the fraction of players who contribute, less a constant c ∈ (0, 1) (independent of N) if she contributes herself. In this game, we show that for every v ∈ (0, 1 − c) there exists ε > 0 such that the symmetric payoff vector where all players receive payoff v lies in B(ε) for all N.
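As a quick concrete rendering of this example, here is a minimal sketch of the public-goods payoff function (the cost c = 0.4 and the population size are arbitrary illustrative values):

```python
# Sketch of the Appendix B.1 public-goods game:
# u_i = (#contributors)/N - c * 1{i contributes}, with c in (0, 1) fixed as N grows.
def public_goods_payoff(contributes_i, n_contributors, N, c=0.4):
    return n_contributors / N - (c if contributes_i else 0.0)

N = 100
print(public_goods_payoff(True, N, N))    # all contribute: 1 - c for everyone
print(public_goods_payoff(False, 0, N))   # no one contributes: static Nash payoff 0
```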

3 Necessary Conditions for Cooperation

3.1 Anti-Folk Theorem

Our first result is that whenever per-capita channel capacity is much smaller than the discount rate, payoffs are consistent with almost-myopic play.

Theorem 1 Consider any sequence of repeated games (Γ)_k such that (1 − δ)N/C → ∞ (where δ, N, and C depend on k) and any corresponding sequence of communication equilibrium payoffs (v)_k. For any ε > 0, there exists k̄ such that, for every k ≥ k̄, the payoff vector v^k is consistent with ε-myopic play.

Theorem 1 implies that cooperation in large groups requires a large amount of information, a high degree of patience, or both. Whether the implied necessary condition for cooperation – that (1 − δ)N/C is not too large – is highly restrictive for large N depends on the type of game under consideration. If the space of possible signal realizations Y is fixed independently of N, then C ≤ log|Y| for all N, so the necessary condition implies that δ must converge to 1 at least as fast as N → ∞, which is a restrictive condition. This negative conclusion applies for traditional applications of repeated games with public monitoring where the signal space is fixed independently of N, such as when the public signal is the market price facing Cournot competitors, the level of pollution in a common water source, or the output of team production.

However, in other types of games C naturally scales linearly with N, so that (1 − δ)N/C is small whenever players are patient. In repeated games with random matching (Kandori, 1992; Ellison, 1994; Deb, Sugaya, and Wolitzky, 2020), players match in pairs each period and y^i_t = a_{m(i,t),t}, where m(i, t) ∈ I \ {i} denotes player i’s period-t partner. In these games, C = N log|A_i|, so per-capita channel capacity is independent of N. Intuitively, in random matching games each player gets a distinct piece of information, so the total amount

of information available to society is proportional to population size. Channel capacity may also scale linearly with N in public-monitoring games where the public information includes a distinct signal of each player’s action, as in the ratings systems used by websites like eBay and AirBnB. In general, C/N may be constant in games where each player is monitored “separately,” but it converges to 0 as N → ∞ in games where the players are jointly monitored through an aggregate statistic which is independent of N.

Remark 1 In applications like Cournot competition, pollution, or team production, the signal space may be modeled as a continuum, in which case the constraint C ≤ log|Y| is vacuous. However, our results extend to the case where Y is a compact metric space and there exists a compact metric space X and a function f^N : A^N → X (which can vary with N) such that the signal distribution admits a conditional density of the form p_{Y|X}(y | x), where Y, X, and p_{Y|X} are fixed independent of N. (For example, in Cournot competition x is industry output and y is the market price; and the market price depends on x and noise, where the “amount of noise” is independent of N.) In this case,

$$C = \max_{\hat{\alpha} \in \Delta(A)} \sum_{\hat{a} \in A} \hat{\alpha}(\hat{a}) \int_{y \in Y} p_{Y|X}\!\left(y \mid f^N(\hat{a})\right) \log \frac{p_{Y|X}\!\left(y \mid f^N(\hat{a})\right)}{\sum_{\hat{a}' \in A} \hat{\alpha}(\hat{a}')\, p_{Y|X}\!\left(y \mid f^N(\hat{a}')\right)}\, dy,$$

which is bounded by

$$\bar{C} = \max_{p_X \in \Delta(X)} \int_{x \in X} p_X(x) \int_{y \in Y} p_{Y|X}(y \mid x) \log \frac{p_{Y|X}(y \mid x)}{\int_{x' \in X} p_X(x')\, p_{Y|X}(y \mid x')\, dx'}\, dy\, dx.$$

Since C̄ is independent of N, it follows that C is bounded independently of N.

The proof of Theorem 1 proceeds in two steps. First, we bound the strength of players’ incentives in terms of the detectability of manipulations (Lemma 1 in Section 3.2). This incentive bound holds generally for any repeated game and may have other applications. Second, we show that in our model with noisy action execution, detectability is bounded by per-capita channel capacity (Lemma 2 in Section 3.3). Combining the two bounds yields the theorem.

11 3.2 Bounding Incentives by Detectability

Note that if an action profile r ∈ A is recommended but player i manipulates according to α̃_i : A_i → Δ(A_i), the probability that signal y realizes equals

$$q(y \mid r; \tilde{\alpha}_i) := \sum_{a_i \in A_i} \tilde{\alpha}_i(a_i \mid r_i)\, q(y \mid a_i, r_{-i}),$$

where α̃_i(a_i | r_i) is the probability that α̃_i(r_i) assigns to action a_i. The following is a key definition.

Definition 1 The detectability of manipulation α̃_i at action profile r is

$$F_i(\tilde{\alpha}_i \mid r) := \sum_{y \in Y} q(y \mid r)\left(\frac{q(y \mid r; \tilde{\alpha}_i) - q(y \mid r)}{q(y \mid r)}\right)^2. \qquad (1)$$

The maximum detectability of manipulation α̃_i is F_i(α̃_i) := max_{r∈A} F_i(α̃_i | r).

That is, F_i(α̃_i | r) is the expected square likelihood difference when player i takes α̃_i(r_i) rather than r_i and her opponents take r_{−i}. Note that F_i(α̃_i | r) (and hence F_i(α̃_i)) is well-defined because q has full support. For interpretation, note that F_i(α̃_i | r) is a discrete version of the Fisher information in Bayesian statistics, which in the current setting would equal

$$\sum_{y \in Y} q(y \mid r)\left(\frac{\frac{\partial}{\partial r_i} q(y \mid r_i, r_{-i})}{q(y \mid r)}\right)^2.$$

The Fisher information is the variance of the score, (∂/∂r_i) log q(y | r_i, r_{−i}), which is the key information measure in moral hazard problems (Mirrlees, 1975; Holmström, 1979). Since the expectation of the score equals 0, the score is “often large” iff the Fisher information is large, in which case the signal is useful for providing incentives.14
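Definition 1 is straightforward to evaluate once q(· | r) and q(· | r; α̃_i) are in hand. A minimal sketch (the two signal distributions below are made-up inputs, not derived from a particular game):

```python
# Sketch of Definition 1: detectability as a discrete Fisher information.
import numpy as np

def detectability(q_r, q_dev):
    """F_i(alpha_tilde_i | r) = sum_y q(y|r) * ((q(y|r; alpha~) - q(y|r)) / q(y|r))^2.

    q_r   : equilibrium signal probabilities q(y | r), all > 0 (full support)
    q_dev : signal probabilities under the manipulation, q(y | r; alpha_tilde_i)
    """
    return float(np.sum(q_r * ((q_dev - q_r) / q_r) ** 2))

# Example: a binary signal whose distribution the deviation shifts slightly.
print(detectability(np.array([0.7, 0.3]), np.array([0.6, 0.4])))
```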

Denote the variance of player i’s payoff under an action profile distribution α by V_i(α) =

Var_{a∼α}(u_i(a)). For any subset of players J ⊆ I, action profile distribution α ∈ Δ(A), and profile of manipulations α̃_J = (α̃_i)_{i∈J} for players i ∈ J, we also define “group average”

14 It has previously been observed that the Fisher information plays an important role in moral hazard problems with quadratic utility (Jewitt, Kadan, and Swinkels, 2008; Hebert, 2018) or frequent actions (Sadzik and Stacchetti, 2015).

versions of the deviation gain D_i, detectability F_i, and payoff variance V_i, by

$$D_J(\tilde{\alpha}_J \mid \alpha) = \frac{1}{|J|}\sum_{i \in J} D_i(\tilde{\alpha}_i \mid \alpha), \qquad D_J(\alpha) = \frac{1}{|J|}\sum_{i \in J} D_i(\alpha),$$
$$F_J(\tilde{\alpha}_J \mid r) = \frac{1}{|J|}\sum_{i \in J} F_i(\tilde{\alpha}_i \mid r), \qquad F_J(\tilde{\alpha}_J) = \frac{1}{|J|}\sum_{i \in J} F_i(\tilde{\alpha}_i), \qquad V_J(\alpha) = \frac{1}{|J|}\sum_{i \in J} V_i(\alpha).$$

Finally, given an outcome μ ∈ Δ((A × Y)^∞), let α^μ_t ∈ Δ(A) denote the marginal of μ over period-t action profiles, and define α^μ ∈ Δ(A), the occupation measure over action profiles induced by μ, by

$$\alpha^\mu(a) = (1 - \delta)\sum_{t=1}^{\infty} \delta^{t-1} \alpha^\mu_t(a) \quad \text{for each } a \in A.$$

Note that the repeated-game payoffs under outcome μ equal

$$(1 - \delta)\sum_{t=1}^{\infty} \delta^{t-1} \sum_{a \in A} \alpha^\mu_t(a)\, u(a) = \sum_{a \in A} (1 - \delta)\sum_{t=1}^{\infty} \delta^{t-1} \alpha^\mu_t(a)\, u(a) = \sum_{a \in A} \alpha^\mu(a)\, u(a) = u(\alpha^\mu). \qquad (2)$$

We can now state our incentive bound.
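Before the lemma, a small numerical illustration of the occupation measure α^μ and identity (2) may be useful (the play path and discount factor below are arbitrary choices):

```python
# Sketch of the occupation measure: the discounted time-average of the per-period
# action distributions, alpha^mu(a) = (1-delta) * sum_t delta^(t-1) * alpha_t(a).
import numpy as np

def occupation_measure(alpha_ts, delta):
    """alpha_ts: (T, |A|) array of period-t action distributions, T large."""
    T = alpha_ts.shape[0]
    weights = (1 - delta) * delta ** np.arange(T)
    return weights @ alpha_ts / weights.sum()   # renormalize the truncated tail

# Two action profiles; play shifts from the first to the second after 50 periods.
alpha_ts = np.array([[1.0, 0.0]] * 50 + [[0.0, 1.0]] * 950)
print(occupation_measure(alpha_ts, delta=0.98))
```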

Lemma 1 For any communication equilibrium outcome μ, any subset of players J, and any profile of manipulations α̃_J, we have

$$D_J(\tilde{\alpha}_J \mid \alpha^\mu) \le \sqrt{\frac{\delta}{1-\delta}\, F_J(\tilde{\alpha}_J)\, V_J(\alpha^\mu)}. \qquad (3)$$

In particular, any communication equilibrium payoff vector v is consistent with ε-myopic play, where

$$\varepsilon = \sqrt{\frac{\delta}{1-\delta}\left(\max_{\tilde{\alpha}_I} F_I(\tilde{\alpha}_I)\right)\bar{u}^2}.$$

Lemma 1 bounds players’ deviation gains from any manipulation under the occupation measure α^μ in terms of the discount factor, the variance of the players’ continuation payoffs under α^μ, and the maximum detectability of the manipulation. Notable features of the bound include that it is explicit, relies only on “static” variables, applies for any communication equilibrium and any monitoring structure, and depends on the ratio of detectability and (1 − δ). In contrast, the recursive characterizations of APS and FLM apply only for public

15 See, e.g., Proposition 4.1 of Awaya and Krishna (2019), which actually involves i q i (y r;α ˜i) q i (y r) TV , where q i denotes the marginal of q on Y − . This modification leads to ak tighter− | incentive− − bound| k for signals where− the impact of a player’s action on the distribution of y is much greater than its impact on the distribution of y i. The distinction between q i and q plays an important role in Awaya and Krishna’sanalysis but is irrelevant− in ours; our result and theirs− are thus complementary. 16 Notably, recursively decomposing the variance of continuation payoffs gives a useful bound on the strength of incentives, even though the equilibrium payoff set generally lacks a recursive structure (as is well known, cf. Kandori, 2002). This observation may be of independent interest. 17 Once (3) is established for singleton J, the extension to non-singleton J is immediate from Cauchy- Schwarz.

Since Lemma 1 may be of independent interest, we illustrate it with an example.

Example 1. Prisoner’s Dilemma with Public, Product-Structure Monitoring. Suppose the stage game is the 2-player prisoner’s dilemma:

        C         D
C     1, 1     −1, 2
D     2, −1     0, 0

Assume that monitoring is public and has a product structure, where Y = {C, D} × {C, D} and p_i(y_i = a_i | a_i) = χ ∈ (1/2, 1) for each i and a_i ∈ {C, D}.

Let J = {1}, and consider the problem of finding α̃_i to maximize D_i(α̃_i | α^μ) subject to (3). For any (intended) action profile distribution α = (α_CC, α_CD, α_DC, α_DD), we have

D_1(α̃_1 | α) = (α_CC + α_CD) α̃_1(D | C) − 2(α_DC + α_DD) α̃_1(C | D). Since D_1(α̃_1 | α) is decreasing in α̃_1(C | D) and F_i(α̃_i) is increasing in α̃_1(C | D) (as a manipulation that disobeys the recommendation with higher probability is weakly more detectable), the solution to this prob-

lem involves α̃_1(C | D) = 0. Next, when α̃_1(C | D) = 0, the detectability of manipulation α̃_1 (F_1(α̃_1 | r)) is maximized at r_1 = C (and any value of r_2, since monitoring has a product structure). Letting α̃_1(D | C) = ρ to ease notation, we therefore have

$$F_1(\tilde{\alpha}_1) = \chi\left(\frac{\rho(1-\chi) + (1-\rho)\chi - \chi}{\chi}\right)^2 + (1-\chi)\left(\frac{\rho\chi + (1-\rho)(1-\chi) - (1-\chi)}{1-\chi}\right)^2 = \frac{\rho^2(2\chi-1)^2}{\chi(1-\chi)}.$$

Note that D_1(α̃_1 | α) and √(F_1(α̃_1)) are both linear in ρ, so the manipulation that maximizes D_i(α̃_i | α^μ) subject to (3) is given by (α̃_1(C | D) = 0, α̃_1(D | C) = 1). Therefore, (3) reduces to the requirement that, in any communication equilibrium, we have

$$\alpha_{CC} + \alpha_{CD} \le \sqrt{\frac{\delta}{1-\delta}\,\frac{(2\chi-1)^2}{\chi(1-\chi)}\, V_i(\alpha)},$$

where V_i(α) = α_CC + 4α_DC + α_CD − (α_CC + 2α_DC − α_CD)². We thus obtain a simple upper bound on the probability that the players cooperate, as a function of the discount factor δ and the monitoring precision χ, which applies for any communication equilibrium.
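The bound in Example 1 can be evaluated numerically. A sketch, assuming the payoff and monitoring specification above (the candidate occupation measure, δ, and χ are illustrative values):

```python
# Numerical check of the Example 1 bound: inequality (3) with the maximal
# manipulation (always play D when told C) requires
#   a_CC + a_CD <= sqrt( delta/(1-delta) * (2*chi-1)^2/(chi*(1-chi)) * V_1(alpha) ).
import numpy as np

def cooperation_bound(alpha, delta, chi):
    a_cc, a_cd, a_dc, a_dd = alpha
    # Player 1's stage payoffs: u1(CC)=1, u1(CD)=-1, u1(DC)=2, u1(DD)=0
    mean = a_cc - a_cd + 2 * a_dc
    var = a_cc + a_cd + 4 * a_dc - mean ** 2          # V_1(alpha)
    F = (2 * chi - 1) ** 2 / (chi * (1 - chi))        # max detectability (rho = 1)
    lhs = a_cc + a_cd                                 # deviation gain at rho = 1
    rhs = np.sqrt(delta / (1 - delta) * F * var)
    return lhs, rhs

lhs, rhs = cooperation_bound((0.9, 0.0, 0.0, 0.1), delta=0.9, chi=0.7)
print(f"P(player 1 told C) = {lhs:.2f}, bound = {rhs:.2f}, feasible: {lhs <= rhs}")
```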

3.3 Bounding Detectability by Channel Capacity

We now show that detectability can be bounded by per-capita channel capacity.

Lemma 2 For any subset of players J ⊆ I and any profile of manipulations α̃_J, we have

$$F_J(\tilde{\alpha}_J) \le \frac{4C}{\underline{\pi}^2 |J|}. \qquad (4)$$

The proof of Lemma 2 (in Appendix A.2) follows from a relatively straightforward application of information theory results such as Pinsker’s inequality, as well as another application of Cauchy-Schwarz. Theorem 1 follows immediately from Lemmas 1 and 2.

Proof of Theorem 1. By Lemmas 1 and 2, all communication equilibrium payoff vectors are consistent with ε-myopic play, where

$$\varepsilon = \sqrt{\frac{4\bar{u}^2}{\underline{\pi}^2} \times \frac{\delta C}{(1-\delta)N}}.$$

Since ū and π̲ are fixed independently of k, it follows that if lim_k (1 − δ)N/C = ∞ then lim_k ε = 0, completing the proof.
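To see how this bound behaves quantitatively, here is a back-of-the-envelope sketch of the ε formula from the proof (all parameter values, including the fixed signal-space size behind C = log 10, are illustrative):

```python
# Sketch: eps = sqrt( (4*u_bar^2 / pi_lb^2) * delta*C / ((1-delta)*N) ).
# With a fixed signal space, C <= log|Y| stays bounded as N grows, so eps
# vanishes only if delta -> 1 fast enough relative to N.
import math

def myopic_eps(u_bar, pi_lb, delta, C, N):
    return math.sqrt((4 * u_bar ** 2 / pi_lb ** 2) * delta * C / ((1 - delta) * N))

for N in (10, 1_000, 100_000):
    print(N, myopic_eps(u_bar=1.0, pi_lb=0.1, delta=0.99, C=math.log(10), N=N))
```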

We can now explain why Theorem 1 requires noisy actions. Without noise, π_{a,â} = 1{a = â}, and hence q(y | a) = p(y | a) and π̲ = 0, so Lemma 2 is vacuous. Indeed, detectability cannot be bounded by channel capacity in the absence of noise. To see this, suppose that the stage game is an N-player prisoner’s dilemma with a binary public monitoring structure, where y = 0 if every player cooperates, and y = 1 if any player defects. Obviously, mutual cooperation is an equilibrium outcome in this game with a moderate discount factor, regardless of N: under grim trigger strategies where the signal y = 1 triggers permanent defection, each player’s incentives in this game are the same as in a 2-player prisoner’s dilemma with perfect monitoring. This observation is consistent with Lemma 1 because detectability is infinite in this example: when the other players cooperate, a deviation to defection is perfectly detectable. However, channel capacity in this example is only log 2, so detectability is infinitely greater than channel capacity. Thus, in the absence of noise a monitoring structure can effectively detect deviations (and support strong incentives) even if it is not very “informative” in terms of channel capacity. In contrast, Lemma 2 shows that when noise is present, only informative signals can effectively detect deviations.

4 Sufficient Conditions for Cooperation

4.1 Folk Theorem

We prove a folk theorem for unmediated repeated games with public, product-structure monitoring, which implies that the relationship among N, δ, and C in Theorem 1 is tight for these games (up to log N slack). As we explain below, this result also implies a folk theorem for mediated repeated games with arbitrary (non-public) monitoring structures.18

Under public monitoring, the public history h^t at the beginning of period t takes the form h^t = ((z_τ, y_τ)^{t−1}_{τ=1}, z_t). A strategy σ_i for player i is public if it depends on player i’s history h^t_i only through its public component h^t. A perfect public equilibrium (PPE) is a profile of public strategies that, beginning at any period t and any public history h^t, forms a Nash equilibrium from that period on.19 Denote the set of PPE payoff vectors by E ⊆ R^N.

For any η > 0, we say that a public monitoring structure satisfies individual identifiability with η slack if

$$\sum_{y_i : q_i(y_i \mid a_i) \ge \eta} q_i(y_i \mid a_i)\left(\frac{q_i(y_i \mid \alpha_i) - q_i(y_i \mid a_i)}{q_i(y_i \mid a_i)}\right)^2 \ge \eta \quad \text{for all } i \in I,\; a_i \in A_i,\; \alpha_i \in \Delta(A_i \setminus \{a_i\}). \qquad (5)$$

This condition is a variant of FLM’s individual full rank condition and KM’s assumption

(A2”). It says that the detectability (discrete Fisher information) of a deviation from a_i to any mixed action α_i supported on A_i \ {a_i} is at least η, from the perspective of an observer who ignores all signals that occur with probability less than η under a_i. Intuitively, this requires that deviations from a_i are detectable, and that in addition “detectability” does not come entirely from a few very rare signal realizations. This assumption will ensure that player i can be incentivized through rewards with “magnitude” of order (1 − δ)/η, when the magnitude of a reward is measured either by its maximum absolute value or its variance.20 For example, suppose that the monitoring structure is given by random auditing, where in

18 We also discuss possible extensions to non-product structure monitoring.
19 As usual, this definition allows players to consider deviations to arbitrary, non-public strategies; but such deviations are irrelevant because, whenever a player’s opponents use public strategies, she has a public strategy that is a best response.
20 If (5) were weakened by taking the sum over all y_i (rather than only y_i such that q_i(y_i | a_i) ≥ η), player i could be incentivized by rewards with variance O((1 − δ)/η), but not necessarily with maximum absolute value O((1 − δ)/η). Our proof of the folk theorem requires controlling both the variance and absolute value of players’ rewards, so we need the stronger condition.

17 every period ηN players (i.e., fraction η of the population) are selected uniformly at random,

and the public signal reveals their identities and their realized actions: that is, Y_i = A_i ∪ {∅} and

$$p_i(y_i \mid \hat{a}_i) = \begin{cases} \eta & \text{if } y_i = \hat{a}_i, \\ 0 & \text{if } y_i \in A_i \setminus \{\hat{a}_i\}, \\ 1-\eta & \text{if } y_i = \varnothing, \end{cases} \qquad \text{so that} \qquad q_i(y_i \mid a_i) = \begin{cases} \eta\, \pi_{a_i, y_i} & \text{if } y_i \in A_i, \\ 1-\eta & \text{if } y_i = \varnothing. \end{cases}$$

For simplicity, suppose also that π_{a_i, a'_i} = π < 1/(|A_i| + 1) for all i, a_i ≠ a'_i. Since q_i(y_i | a_i) ≥ ηπ for all a_i, y_i, we then have

$$\sum_{y_i : q_i(y_i \mid a_i) \ge \eta\pi} q_i(y_i \mid a_i)\left(\frac{q_i(y_i \mid \alpha_i) - q_i(y_i \mid a_i)}{q_i(y_i \mid a_i)}\right)^2 \ge \frac{\left(q_i(a_i \mid a_i) - \max_{a'_i \ne a_i} q_i(a_i \mid a'_i)\right)^2}{q_i(a_i \mid a_i)} = \frac{\left(\eta(1 - (|A_i| - 1)\pi) - \eta\pi\right)^2}{\eta(1 - (|A_i| - 1)\pi)} \ge \eta\pi^2,$$

so random auditing satisfies individual identifiability with ηπ² slack.
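The ηπ² slack computation above can be verified numerically. A sketch for a hypothetical player with three actions and uniform trembles (the audit rate, tremble size, and action count are illustrative choices):

```python
# Sketch verifying the eta*pi^2 identifiability slack of random auditing.
import numpy as np

def q_i(eta, noise, n_actions, a):
    """q_i(. | a): realized action revealed w.p. eta, null signal w.p. 1 - eta."""
    probs = np.full(n_actions + 1, eta * noise)        # audited, realized action != a
    probs[a] = eta * (1 - (n_actions - 1) * noise)     # audited, realized action = a
    probs[-1] = 1 - eta                                # the null signal (not audited)
    return probs

def identifiability_lhs(eta, noise, n_actions, a, alpha_dev):
    """LHS of condition (5), summing over y_i with q_i(y_i | a) >= eta*noise."""
    q_a = q_i(eta, noise, n_actions, a)
    q_dev = sum(alpha_dev[b] * q_i(eta, noise, n_actions, b)
                for b in range(n_actions) if b != a)
    mask = q_a >= eta * noise
    return float(np.sum(q_a[mask] * ((q_dev[mask] - q_a[mask]) / q_a[mask]) ** 2))

eta, noise, n = 0.05, 0.1, 3                           # noise < 1/(|A_i| + 1)
worst = min(identifiability_lhs(eta, noise, n, a=0, alpha_dev={1: w, 2: 1 - w})
            for w in np.linspace(0, 1, 101))
print(worst, ">=", eta * noise ** 2, worst >= eta * noise ** 2)
```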

Theorem 2 Consider any sequence of repeated games (Γ)_k with public, product-structure monitoring, and any sequence of positive numbers (η)_k such that (1 − δ) log(N)/η → 0 (where δ, N, and η depend on k) and in game Γ^k individual identifiability is satisfied with η slack. For any ε > 0, there exists k̄ such that, for every k ≥ k̄, in game Γ^k we have B(ε) ⊆ E.

Theorem 2 implies that the relationship among N, δ, and C in Theorem 1 is tight up to log(N) slack. To see this, again consider random auditing, which satisfies individual identifiability with ηπ² slack. Since the number of possible realized actions for each player

is |A_i| ≤ 1/π, random auditing has a channel capacity of at most ηN log(1/π). Therefore, under random auditing Theorem 1 implies that payoffs in any communication equilibrium are consistent with almost-myopic play if (1 − δ)/η → ∞, while Theorem 2 implies that a folk theorem holds in PPE if (1 − δ) log(N)/η → 0.

Theorem 2 also implies a folk theorem in communication equilibrium for arbitrary (non-public) monitoring structures. To see this, given an arbitrary monitoring structure (Ỹ, p̃), call its publicization the public monitoring structure (Y, p) where Y^i = Ỹ for all i and p^i(ỹ | â) = p̃(ỹ | â) for all i, ỹ, â, where p^i denotes the marginal of p on Y^i. That is, the signal profile (ỹ^i)_{i∈I} is drawn from the same family of conditional distributions (p(· | â))_{â∈A} under (Ỹ, p̃) and (Y, p), but under (Ỹ, p̃) each player i observes only ỹ^i, while under (Y, p) each player i observes the entire profile (ỹ^i)_{i∈I}. Under our assumption that the mediator observes the signal profile, the Nash equilibrium payoff set in a mediated repeated game with an arbitrary monitoring structure (Ỹ, p̃) (i.e., the communication equilibrium payoff set with monitoring structure (Ỹ, p̃)) is larger than the Nash equilibrium payoff set in the mediated repeated game with the publicized monitoring structure (Y, p), because the set of canonical strategy profiles is the same in both games, but the set of possible deviations is larger in the latter game. The Nash equilibrium payoff set in the mediated game with public monitoring structure (Y, p) is also obviously larger than the PPE payoff set in the unmediated game with the same monitoring structure. Therefore, whenever the publicization of an arbitrary monitoring structure (Ỹ, p̃) satisfies the conditions of Theorem 2, the conclusion of the theorem applies for the communication equilibrium payoff set with monitoring structure (Ỹ, p̃).21

Finally, note that Theorem 2 is a “Nash threat” folk theorem, recalling that V* is defined as the set of payoffs that Pareto-dominate a static Nash equilibrium. To extend this result to a “minmax threat” theorem, players must be made indifferent among all actions in the support of a mixed strategy that minmaxes an opponent. This requires a stronger identifiability condition, similar to Kandori and Matsushima’s assumption (A1).22

4.2 Remarks on the Proof

Theorem 2 is a folk theorem for PPE in games with public monitoring. The standard proof approach for such results, following FLM and KM, relies on transferring continuation payoffs among the players along hyperplanes that are tangent to the boundary of the PPE payoff set. Unfortunately, this approach encounters serious difficulties when N and δ vary together. The basic problem is that when N is large, changing each player’s continuation payoff by a small amount can result in a large overall movement in the continuation payoff vector, which makes self-generation difficult to satisfy. Mathematically, FLM’s proof relies on the equivalence

21 As well as for the perfect communication equilibrium payoff set, defined by Tomala (2009).
22 Specifically, consider the |Y_i| × |A_i| matrix P_i whose (y_i, a_i) entry is q_i(y_i | a_i). We assume this matrix has full column rank, and hence there exists an |A_i| × |Y_i| matrix P_i^{−1} such that P_i^{−1} P_i is the identity matrix. A sufficient condition for the minmax threat folk theorem is that the absolute value of each entry of P_i^{−1} is no more than 1/η (and (1 − δ) log(N)/η → 0).

of the L¹ norm and the Euclidean norm in R^N. Since this equivalence is not uniform in N, their proof does not apply when N and δ vary together. To see the problem in more detail, note that when individual identifiability holds with η slack, the per-period movement in each player’s continuation payoff required to provide incentives is of order (1 − δ)/η, so the movement of the continuation payoff vector in R^N is O(√N (1 − δ)/η). Fix a ball B contained in V*, and consider the problem of generating the point v = argmax_{w∈B} w_1 – the point in B that maximizes player 1’s payoff – using continuation payoffs drawn from B. Since player 1’s continuation payoff must be within O(1 − δ) distance of v, the greatest movement along a tangent hyperplane is O(√(1 − δ)). FLM’s proof approach thus requires √N (1 − δ)/η ≪ √(1 − δ), or (1 − δ)N/η² ≪ 1, while we assume only (1 − δ) log(N)/η ≪ 1. So, while the conditions for Theorem 2 are tight up to log(N) slack, FLM’s approach requires slack N.

Our proof of Theorem 2 (in Appendix A.3) is instead based on the “block strategy” approach introduced by Matsushima (2004) and Hörner and Olszewski (2006) in the context of repeated games with private monitoring. We view the repeated game as a sequence of T-period blocks of periods, where T is a number proportional to 1/(1 − δ). At the beginning of each block, a target payoff vector is determined by public randomization, and with high probability the players take actions throughout the block that deliver the target payoff. Players accrue promised continuation payoff adjustments throughout the block based on the public signals of their actions, and the distribution of target payoffs in the next block is set so as to deliver the promised adjustments. Since individual identifiability is satisfied with η slack, the required adjustment to each player’s continuation payoff in every period is O(1/η). By the law of large numbers, when T ≫ 1/η, with high probability the total adjustment that a given player accrues over a T-period block is of order less than T, and is thus small enough that it can be delivered by appropriately specifying the distribution of target payoffs at the start of the next block.

The main difficulty in the proof is caused by the low-probability event that a player accrues an unusually large total adjustment over a block, so that at some point the distribution of target payoffs for the next block cannot be modified any further. In this case, the player can no longer be incentivized to take a non-myopic best response, and all players’ behavior in the current block must change. Thus, if any player’s payoff adjustment is “abnormal,” all players’ payoffs in that block may be far from the target equilibrium payoffs.

To prove the theorem, we must ensure that rare payoff-adjustment abnormalities do not compromise either ex ante efficiency or the players’ incentives. Efficiency is preserved if the blocks are long enough that the probability that any player’s payoff adjustment is abnormal is small. Since the per-period payoff adjustment for each player is O(1/η) and the length of a block is O(1/(1 − δ)), standard concentration bounds imply that the probability that a given player’s payoff adjustment is abnormal is exp(−O(η/(1 − δ))). Hence, by the union bound, the probability that any player’s adjustment is abnormal is at most N exp(−O(η/(1 − δ))), which converges to 0 when (1 − δ) log(N)/η → 0 (see the sketch below). This step in the proof accounts for the log(N) slack.

Finally, since all players’ payoffs are affected when any player’s payoff adjustment becomes abnormal, incentives would be threatened if one player’s action affected the probability that another player’s adjustment becomes abnormal. We avoid this problem by letting each player’s adjustment depend only on the signals of her own actions. This separation of payoff adjustments across players is possible because we assume product structure monitoring. We do not know if Theorem 2 can be extended to non-product structure monitoring.23
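The union-bound arithmetic referenced above is summarized in the following sketch (the constant inside the exponent is normalized to 1 for illustration, so the numbers are qualitative only):

```python
# Back-of-the-envelope for the abnormality union bound: the probability that
# *some* player's block adjustment is abnormal is at most N * exp(-eta/(1-delta)),
# which vanishes exactly when (1 - delta) * log(N) / eta -> 0.
import math

def union_bound(N, delta, eta):
    return min(1.0, N * math.exp(-eta / (1 - delta)))

for N, delta in [(10**6, 0.9), (10**6, 0.99), (10**6, 0.9999)]:
    print(N, delta, union_bound(N, delta, eta=0.5))
```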

5 Comparison with Linear Equilibria

We say that a communication equilibrium is linear if the continuation payoff vectors at all mediator histories lie on a line: for each player i ≠ 1, there exists a constant b_i ∈ R such that, for all on-path mediator histories h, h′, we have w_i(h′) − w_i(h) = b_i(w_1(h′) − w_1(h)), where w_i(h) denotes player i’s equilibrium continuation payoff at mediator history h.24 Relabeling the players if necessary, we can take |b_i| ≤ 1 for all i without loss. Note that this notion of linear equilibrium generalizes that of a linear public perfect equilibrium in a game with public monitoring, where continuation payoff vectors at all public histories lie on a line, as well as that of a strongly symmetric equilibrium (SSE) in a symmetric game with public monitoring, where the line in question is additionally required to be the 45° line.

23 As noted above, we conjecture that the approach of FLM and KM yields a folk theorem if (1 − δ)N/η² → 0. Since their approach requires only pairwise identifiability, we conjecture that product structure can be relaxed to pairwise identifiability if (1 − δ)N/η² → 0. We do not know if such a relaxation is possible under the more general hypothesis of Theorem 2.
24 A mediator history h takes the form ((z_τ, y_τ, r_τ)^{t−1}_{τ=1}, z_t) for some t ∈ N. If h is on path in a communication equilibrium then (a_τ)^{t−1}_{τ=1} = (r_τ)^{t−1}_{τ=1}, so the mediator history encodes past actions as well as signals and public randomization realizations.

21 Linear equilibria are of interest because they capture collective incentive provision. If

b_i ≥ 0 for all i, all players have the same preferences over histories, and are thus all rewarded or punished together. If b_i < 0 for some i, the players can be divided into two groups, where each group is rewarded when the other is punished. Our final result is that cooperation in a linear equilibrium is possible only under extremely restrictive conditions on N and δ. We view this result as essentially an impossibility theorem for large-group cooperation under collective incentives.

Theorem 3 Consider any sequence of repeated games (Γ)_k for which there exists ρ > 0 (independent of k) such that (1 − δ) exp(N^{1−ρ}) → ∞, and consider any corresponding sequence of linear equilibrium payoffs (v)_k. For any ε > 0, there exists k̄ such that, for every k ≥ k̄, the payoff vector v^k is consistent with ε-myopic play.

Theorem 3 differs from Theorem 1 not only in the required relationship between N and δ, but also in that Theorem 3’s negative conclusion applies no matter how informative the monitoring structure is. Intuitively, this is because optimal linear equilibria have a bang-bang form even when the realized action profile is perfectly observed, so a binary signal that indicates which of two extreme continuation payoff vectors should be played is as effective as any more informative signal.

Theorem 3 is proved in Appendix A.4. To see the main idea, consider the case where

the game is symmetric and b_i = 1 for all i, so linear equilibria are SSE. Suppose we wish to

enforce a symmetric pure action profile a⃗_0 = (a_0, . . . , a_0), where D_i(a⃗_0) = η, and suppose for

simplicity that |A_i| = 2 and π_{a_i,â_i} = 1 − π whenever a_i = â_i, and π_{a_i,â_i} = π otherwise. By standard arguments, an optimal SSE takes the form of a “tail test,” where the players are

all punished if the number of players n who take action a_0 falls below a threshold n*. Due to the i.i.d. noise π, the distribution of n is approximately normal when N is large, with mean (1 − π)N and standard deviation √(π(1 − π)N). Denote the threshold z-score of the tail test by z* = (n* − (1 − π)N)/√(π(1 − π)N), let φ and Φ denote the standard normal pdf and cdf, and let x ∈ [0, ū/(1 − δ)] denote the size of the penalty when the tail test is failed. We then must have

$$\frac{\phi(z^*)}{\sqrt{\pi(1-\pi)N}}\, x \ge \eta \qquad \text{and} \qquad \Phi(z^*)\, x \le \bar{u},$$

22 where the first inequality is incentive compatibility, and the second inequality says that the expected penalty cannot exceed the maximum difference between any two stage game payoffs. Dividing the first inequality by the second, we obtain

$$\frac{\phi(z^*)}{\Phi(z^*)} \ge \frac{\eta\sqrt{\pi(1-\pi)N}}{\bar{u}}.$$

The left-hand side of this inequality is the statistical score of a normal tail test, which is approximately equal to −z* for z* < 0. Hence, z* must decrease at least linearly with √N. But since φ(z*) decreases exponentially with z*², and hence exponentially with N, Theorem 3 now follows from incentive compatibility, which implies that the product of φ(z*)/√(π(1 − π)N) and ū/(1 − δ) must exceed η.25

Theorem 3 is related to Proposition 1 of Sannikov and Skrzypacz (2007), which is an anti-folk theorem for SSE in a two-player repeated game where actions are observed with additive, normally distributed noise, with variance proportional to 1/(1 − δ). (The interpretation is that players change their actions every Δ units of time, where δ = e^{−rΔ} for fixed r > 0 and variance is inversely proportional to Δ, for example as a consequence of observing the average increments of a Brownian process.) As a tail test is also optimal in their setting, the reasoning just given implies that incentives can be provided only if 1/(1 − δ) increases exponentially with the variance of the noise. Since in their model 1/(1 − δ) increases with the variance only linearly, they likewise obtain an anti-folk theorem. Also, Proposition 2 of Fudenberg and Levine (2007) is an anti-folk theorem in a game with one long-run player and a series of short-run players, where the long-run player’s action is observed with additive, normal noise with variance proportional to 1/(1 − δ)^ρ for some ρ > 0; and their Proposition 3 is a folk theorem when the variance is constant in δ. Theorem 3 suggests that their anti-folk theorem extends whenever the variance asymptotically dominates (log 1/(1 − δ))^{1/(1−ρ)} for some ρ > 0, while their folk theorem extends whenever the variance is asymptotically dominated by (log 1/(1 − δ))^{1/(1+ρ)} for some ρ > 0.26

25 Conversely, if π_{a_i,a_i} is sufficiently large for each a_i and (1 − δ) exp(N^{1+ρ}) → 0 for some ρ > 0, then a folk theorem holds for linear equilibria. Intuitively, a target action profile a can now be enforced by a tail test where the players are all punished only if â_i ≠ a_i for all i, so that no one took the prescribed action. An earlier version of this paper contains a formal statement of such a result.
26 The analysis of tail tests as optimal incentive contracts under normal noise goes back to Mirrlees (1975). The logic of Theorem 3 shows that the size of the penalty in a Mirrleesian tail test must increase exponentially with the variance of the noise. We are not aware of references to this point in the literature.
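The exponential blow-up of the required patience can be illustrated directly from the two tail-test inequalities (a sketch with illustrative noise, deviation-gain, and payoff-range values; the constant factors are not the paper’s):

```python
# Sketch of the tail-test arithmetic behind Theorem 3. The score condition forces
# z* ~ -eta*sqrt(pi*(1-pi)*N)/u_bar, and incentive compatibility then requires
# x = u_bar/(1-delta) >= eta*sqrt(pi*(1-pi)*N)/phi(z*), which grows like exp(c*N).
import math

def required_patience(N, noise=0.1, eta=0.1, u_bar=1.0):
    sd = math.sqrt(noise * (1 - noise) * N)
    z_star = -eta * sd / u_bar               # threshold z-score forced by the score condition
    phi = math.exp(-z_star ** 2 / 2) / math.sqrt(2 * math.pi)
    # smallest 1/(1-delta) for which the maximal penalty satisfies IC
    return eta * sd / (phi * u_bar)

for N in (100, 10_000, 1_000_000):
    print(N, f"1/(1-delta) must exceed ~{required_patience(N):.3e}")
```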

23 6 Conclusion

This paper has developed a theory of large-group cooperation by considering sequences of repeated games where the population size, discount factor, stage game, and monitoring structure all vary together in a flexible manner. We derived general necessary and suffi cient conditions for cooperation to be possible in the limit of such a sequence. For a specific class of monitoring structures (random auditing), these conditions coincide up to log (N) slack. We also showed that cooperation in a linear equilibrium is possible only under much more stringent conditions. Methodologically, our results rely on some new connections between repeated games and information theory. We hope our results raise questions for future theoretical and applied research. On the theory side, we believe that the connections between repeated games, mechanism design, and information theory remain under-explored. Our earlier work (Sugaya and Wolitzky, 2021a) exploited these connections in the context of repeated random-matching games with incomplete information; other promising topics for such an approach include repeated games with frequent actions and static moral hazard in teams problems with large teams. As for applied work, it would be interesting to see more systematic empirical or experimental evidence on the determinants of large-group cooperation, for example on the relative effi cacy of individual and collective sanctions in populations of different sizes.

A Appendix

A.1 Proof of Lemma 1

First, note that the first sentence of the lemma implies the second, because if (3) holds µ 2 for J = I then α A (ε) for ε = (δ/ (1 δ)) (maxα˜ FI (˜αI ))u ¯ , and hence, by (2), ∈ − I t 1 µ µ (1 δ) δ − α (a) u (a) = u (α ) M (ε). t N a A t − ∈ ∈ ∈ µ Next, note that it suffi ces to establish (3) for singleton J, because if Di (˜αi α ) P P | ≤

24 µ (δ/ (1 δ)) Fi (˜αi) Vi (α ) for each i I, then by Cauchy-Schwarz, for any J I, − ∈ ⊆ p µ 1 µ 1 δ µ DJ (˜αJ α ) = Di (˜αi α ) Fi (˜αi) Vi (α ) | J | ≤ J 1 δ i J i J r | | X∈ | | X∈ − 1 δ µ δ µ Fi (˜αi) Vi (α ) = FJ (˜αJ ) Vi (α ). ≤ J 1 δ 1 δ s i J i J r | | − X∈ X∈ − The remainder of the proof thus fixes i I and establishes (3) for J = i . Given a path ∈ { } of action profiles (at)t∞=1, let ui,t = ui (at), and denote player i’s continuation payoff at the beginning of period t by ∞ t0 t wi,t = (1 δ) δ − ui,t. − t =t X0 Denote a history of actions and signals at the beginning of period t by ht = (at, yt). t Fix a communication equilibrium outcome µ and a manipulation α˜i. Let H denote the set of period-t histories ht that are reached with positive probability under µ, and define a t t t t t t H -measurable random variable Wi,t : H R by Wi,t (h ) = E [wi,t h ] for all h H . By → | ∈ the law of total variance, we have

t t Var (Wi,t+1) = Var E Wi,t+1 h + E Var Wi,t+1 h . (6) | |     t t t t t t Similarly, define Ui,t : H R by Ui,t (h ) = E [ui,t h ] for all h H . Let µ (h , r) denote → | ∈ the probability that history ht is reached in period t and then action profile r is played.

Claim 1 For each period t, we have

t 1 1 δ Var E Wi,t+1 h Var (Wi,t) − Var (Ui,t) and (7) | ≥ δ − δ 2 µ 2  t 1 δ Di (˜αi αt ) E Var(Wi,t+1 h ) − | . (8) | ≥ δ F (˜α )   i i   Proof. The derivation of equation (7) is elementary and is deferred to Appendix B.2. For equation (8), since µ is an equilibrium outcome, we have

1 δ t t − Di (˜αi) µ h , rt (q (yt rt) q (yt rt;α ˜i)) Wi,t+1 h , rt, yt . δ ≤ t | − | h ,rt,yt X  

This holds because, if she manipulates according to α˜i in period t, player i can guarantee an

25 t t expected continuation payoff of t µ (h , rt) q (yt rt;α ˜i) Wi,t+1 (h , rt, yt) by following the h ,rt,yt | mediator’srecommendations fromP period t+1 onwards. (In other words, in the continuation

game player i plays as if her period-t action were ri,t rather than α˜i (ri,t). This continuation play may not be optimal, but we are only giving a necessary condition.) Therefore,

1 δ µ t t − Di (˜αi αt ) µ h , rt (q (yt rt) q (yt rt;α ˜i)) Wi,t+1 h , rt, yt δ | ≤ t | − | h ,rt,yt X   t q (yt rt) q (yt rt;α ˜i) t t = µ h , rt q (yt rt) | − | Wi,t+1 h , rt, yt E Wi,t+1 h t | q (yt rt) − | h ,rt,yt  |  X     2 t q(yt rt) q(yt rt;˜αi) t µ (h , r ) q (y r ) | − | h ,rt,yt t t t q(yt rt) r | | ≤ t  t  t 2 P t µ (h , rt) q (yt rt)(Wi,t+1 (h , rt, yt) E [Wi,t+1 h ]) × h ,rt,yt | − | qP t Fi (˜αi) E [Var (Wi,t+1 h )], ≤ | p q where the second inequality follows from Cauchy-Schwarz, and the third follows from the

definition of Fi (˜αi). Squaring both sides and rearranging yields (8). By (6), (7), and (8), for each period t, we have

2 µ 2 1 1 δ 1 δ Di (˜αi αt ) Var (Wi,t+1) Var (Wi,t) − Var (Ui,t) + − | . ≥ δ − δ δ F (˜α )   i i

Since Var (Wi,1) = 0, recursively applying this inequality, for each T N we have ∈

T µ 2 T t 1 1 δ Di (˜αi αt ) δ Var (W ) (1 δ) δ − − | Var (U ) . i,T +1 δ F (˜α ) i,t ≥ − t=1 i i − ! X As payoffs are bounded, the left-hand side of this inequality converges to 0 as T , while → ∞ t 1 µ 2 the right-hand side converges to (1 δ) ∞ δ − ((1 δ) /δ) Di (˜αi α ) /Fi (˜αi) Var (Ui,t) , − t=1 − | t − which therefore must be non-positive. Hence,P we have   

µ 2 ∞ t 1 ∞ t 1 1 δ Di (˜αi αt ) (1 δ) δ − Var (U ) (1 δ) δ − − | i,t δ F (˜α ) − t=1 ≥ − t=1 i i X X t 1 µ 2 1 δ (1 δ) ∞ δ − Di (˜αi α ) − − t=1 | t , (9) ≥ δ F (˜α ) P i i 

26 where the second inequality follows from Jensen. Moreover, we have

µ µ µ 2 Vi (α ) = α (a)(ui (a) ui (α )) a − X ∞ t 1 µ µ 2 = (1 δ) δ − αt (a)(ui (a) ui (α )) − t=1 a − X X ∞ t 1 µ µ 2 (1 δ) δ − αt (a)(ui (a) ui (αt )) ≥ − t=1 a − X X ∞ ∞ t 1 t t 1 t = (1 δ) δ − Var ui (1 δ) δ − Var Ui , (10) − t=1 ≥ − t=1 X  X  2 2 where the first inequality follows because E (X x) E (X E [X]) for all x R, − ≥ − ∈ and the second follows by the law of total variance; and  

µ µ Di (˜αi α ) = α (a)(ui (˜αi (ai) , a i) ui (a)) − | a − X ∞ t 1 µ ∞ t 1 µ = (1 δ) δ − αt (a)(ui(˜αi (ai) , a i) ui (a)) = (1 δ) δ − Di (˜αi αt ) (11). − − t=1 a − − t=1 | X X X By (9), (10), and (11), we have

µ 2 µ 1 δ Di (˜αi α ) Vi (α ) − | . ≥ δ Fi (˜αi)

Rearranging and taking square roots yields (3).

A.2 Proof of Lemma 2

For any r A, let Pr ( r) denote the probability distribution over realized action profiles ∈ ·| aˆ and outcomes y that results under intended action profile r. Throughout the proof, aˆi

denotes a realized action for player i. Note that, for all aˆi Ai, y Y , and r A, we have ∈ ∈ ∈ Pr (ˆai, y r) = πr ,aˆ Pr (y r, aˆi) = q (y r) Pr (ˆai r, y), and therefore, since πr ,aˆ π, | i i | | | i i ≥

2 2 2 q (y r) q (y r) (Pr (y r, aˆi) q (y r)) = | (Pr (ˆai r, y) πri,aˆi ) | (Pr (ˆai r, y) πri,aˆi ) . | − | πri,aˆi | − ≤ π | −    (12)

27 For any r A and ai Ai, we thus have ∈ ∈ 2 1 2 1 (q (y ai, r i) q (y r)) = (πai,aˆi πri,aˆi ) Pr (y r, aˆi) q (y r) | − − | q (y r) − | ! y | y | aˆi X X X 2 1 = (πa ,aˆ πr ,aˆ ) (Pr (y r, aˆi) q (y r)) q (y r) i i − i i | − | y aˆ ! X | Xi 2 1 2 (πa ,aˆ πr ,aˆ ) (Pr (y r, aˆi) q (y r)) ≤ i i − i i q (y r) | − | aˆ y aˆ Xi X | Xi 2 2 q (y r) (Pr (ˆai r, y) πr ,aˆ ) ≤ π2 | | − i i y aˆi X X 2 2 q (y r) Pr (ˆai r, y) πr ,aˆ ≤ π2 | | | − i i | y aˆ ! X Xi 4 Pr (ˆai r, y) 2 q (y r) Pr (ˆai r, y) log | ≤ π | | πr ,aˆ y aˆ i i X Xi 4 Pr (ˆai r, y) = 2 Pr (ˆai, y r) log | . π | πr ,aˆ aˆ ,y i i Xi where the first inequality follows by Cauchy-Schwarz, the second follows by (12) and 2 (πa ,aˆ πr ,aˆ ) 2, the third is immediate, and the fourth follows by Pinsker’s in- aˆi i i − i i ≤ 2 equality (CT, Lemma 11.6.1). Note that the last line equals (4/π ) I (ˆai; y r), where I ( ; r) P | · ·| denotes mutual information conditional on r.

Hence, for any subset of players J I, any profile of manipulations α˜J , and any action ⊆ profile r A, we have ∈ 2 1 (q (y a˜i (ri) , r i) q (y r)) 4 4 FJ (˜αJ r) = | − − | I (ˆai; y r) = I (ˆaJ ; y r) , | J q (y r) ≤ π2 J | π2 J | i J y i J | | X∈ X | | | X∈ | | where the last equality follows because (ˆai)i J are independent conditional on r. ∈ Now note that

I (ˆaJ ; y r) = I (ˆa; y r) I aˆI J ; y r, aˆJ I (ˆa; y r) = I (ˆa, r; y) I (r; y) | | − \ | ≤ | − = I (ˆa; y) + I (r; y aˆ) I (r;y) = I (ˆa; y) + 0 I (r; y) I (ˆa; y) C, | − − ≤ ≤ where the first equality follows by the chain rule for mutual information (CT, Theorem

28 2.2.1), the first inequality follows because mutual information is non-negative, the second and third equalities again follow by the chain rule, the fourth equality follows because r and y are independent conditional on aˆ, the second inequality again follows by non-negativity, and the last inequality follows from the definition of channel capacity. Hence,

4C FJ (˜αJ r) for all r A, | ≤ π2 J ∈ | | and (4) follows.

A.3 Proof of Theorem 2

A.3.1 Preliminaries

Fix any ε > 0. If ε u/¯ 2 then B (ε) = and the conclusion of the theorem is trivial, so ≥ ∅ assume without loss that ε < u/¯ 2. We begin with two preliminary lemmas. First, for each

i I and ri Ai, we define a function fi,ri : Yi R that will later be used to specify player ∈ ∈ → i’scontinuation payoff as a function of yi.

Lemma 3 If individual identifiability holds with η slack, then for each i I and ri Ai ∈ ∈ there exists a function fi,ri : Yi R such that →

E [fi,ri (yi) ri] E [fi,ri (yi) ai0 ] u¯ for all ai0 = ri, (13) | − | ≥ 6

E [fi,ri (yi) ri] = 0, (14) | 2 Var (fi,r (yi) ri) u¯ /η, (15) i | ≤ fi,r (yi) 2¯u/η for all yi. (16) | i | ≤

Proof. Fix i and ri. Let Y ∗ = yi : qi (yi ri) η , and let i { | ≥ }

qi (yi ri) ˜ qi (yi ai0 ) Qi (ri; Yi∗) = | and Qi (ri; Yi∗) = | . qi (yi ri) qi (yi ri) !yi Y a0 =ri !yi Y | ∈ i∗ [i6 | ∈ i∗ p p Note that (5) is equivalent to

d Qi (ri; Y ∗) , co Q˜i (ri; Y ∗) √η for all i I, ri Ai, i i ≥ ∈ ∈   

29 Y where d ( , ) denotes Euclidean distance in R i∗ . Hence, by the separating hyperplane theo- · · | | Yi∗ rem, there exists x = (x (yi))yi Y ∗ R such that x = 1 and (Qi (ri; Yi∗) q) x √η for ∈ i ∈ | | k k − · ≥ all q Q˜ (r ; Y ). By definition of Q and Q˜ , this implies that (q (y r ) q (y a )) x (y ) i i i∗ i i yi Y i i i i i i0 i ∈ ∈ i∗ | − | ≥ q (y r ) η for all a = r . Defining f (y ) =u ¯ x (y ) q (˜y r ) x (˜y ) / q (y r ) η i i i i0 i i,ri i i y˜i PYi i i i i i i i | 6 − ∈ | | pfor all yi Yi∗ and fi,ri (yi) = 0 for all yi Yi∗, conditions (13) and (14) hold. p Moreover, ∈ 6∈ P since [f (y ) r ] = 0 and the term q (˜y r ) x (˜y ) is independent of y , we have E i,ri i i y˜i Yi i i i i i | ∈ |

P 2 2 2 2 2 2 u¯ x (yi) ux¯ i (yi) u¯ x (yi) u¯ Var (fi,ri (yi) ri) = E E , | qi (yi ri) η − qi (yi ri) η ≤ η ≤ η " # " # yi Y | | X∈ i∗ p and hence (15) holds. Finally, (16) holds since, for each yi Y ∗, ∈ i

fi,r (yi) u¯ x (yi) + q (˜yi ri) xi (˜yi) / qi (yi ri) η u¯ 1 + q (˜yi ri) /η 2¯u/η. | i | ≤ | | | | | | ≤  |  ≤ y˜i Y y˜i Y X∈ i∗ p X∈ i∗    

Now fix i I and ri Ai, and suppose that yi,t qi ( ri) for each period t N, ∈ ∈ ∼ ·| ∈ independently across periods. By (15), for any T N, we have ∈

T T 2T 2 t 1 2(t 1) 1 δ u¯ Var δ − f (y ) = δ − Var (f (y )) − . i,ri i,t i,ri i,t 2 η t=1 ! t=1 ≤ 1 δ X X − Together with (14) and (16), Bernstein’sinequality (Boucheron, Lugosi, and Massart, 2013, ¯ Theorem 2.10) now implies that, for any T N and f R+, we have ∈ ∈

T ¯2 t 1 ¯ f η Pr δ − fi,ri (yi,t) f exp . (17) ≥ ≤ − 1 δ2T 2 2 ¯  t=1 ! 2 1− δ2 u¯ + 3 fu¯ X −    Our second lemma defines T and f¯ (as a function of k) so that the bound in (17) is suffi ciently small, and some other conditions used in the proof also hold. This lemma is where we use the assumption that (1 δ) log (N) /η 0. − →

Lemma 4 There exists k¯ such that, for every k k¯, there exist T N and f¯ R that ≥ ∈ ∈

30 satisfy the following three inequalities:

2 f¯ 3 η 60¯uN exp ε, (18) − 1 δ2T 2 2 f¯  ≤ 2 1− δ2 u¯ + 3 3 u¯  −    1 δ 2¯u 8 − f¯+ ε, (19) 1 δT η ≤ −   1 δT 1 δ 2¯u 4¯u − + − f¯+ ε. (20) δT δT η ≤   Proof. For each k, let T be the largest integer such that 8¯u 1 δT /δT ε, and let − ≤  60¯u 1 δT u¯2 f¯ = 36 log log (N) − . s ε 1 δ η   −

Note that 1 δT ε/ (ε + 8¯u) (a constant independent of k) as k . Since (1 δ) log (N) /η − → → ∞ − → 0, it follows that (1 δ) log (N) / η 1 δT 0. Therefore, there exists k¯ such that, for − − → ¯ every k k, we have  ≥

4 60¯u 1 δ 1 36 log log (N) − T 1 and (21) 9s ε 1 δ η ≤   − 60¯u 1 δ 1 1 δ 2 8¯u 36 log log (N) − T + − T ε. (22) s ε 1 δ η 1 δ η ≤   − − ! It now follows from straightforward algebra (provided in Appendix B.3) that (18)—(20) hold for every k k¯. ≥

A.3.2 Equilibrium Construction

Fix any k, T , and f¯ that satisfy (18)—(20), as well any v B (ε). For each extreme point ∈ v∗ of Bv (ε/2), we construct a PPE in a T -period, finitely repeated game augmented with

continuation values drawn from Bv (ε/2) that generates payoff vector v∗. By standard argu- 27 ments, this implies that Bv (ε/2) E (Γ), and hence that v E (Γ). Since v B (ε) was ⊆ ∈ ∈ chosen arbitrarily, it follows that B (ε) E (Γ). ⊆ 27 Specifically, at each history hT +1 that marks the end of a block, public randomization can be used to select an extreme point v∗ to be targeted in the following block, with probabilities chosen so that the T +1 expected payoff E [v∗] equals the promised continuation value w h .  31 N Specifically, for each ζ 1, 1 and v∗ = argmaxv Bv(ε/2) ζ v, we construct a public ∈ {− } ∈ · strategy profile σ in a T -period, finitely repeated game (which we call a block strategy profile) together with a continuation value function w : HT +1 RN that satisfy →

σ T t 1 T T +1 Promise Keeping. vi∗ = E (1 δ) δ − ui,t + δ wi h for all i I. − t=1 ∈ h P i σ˜i,σ i T t 1 T T +1 Incentive Compatibility. σi argmaxσ˜ E − (1 δ) δ − ui,t + δ wi h for ∈ i − t=1 all i I. h i ∈ P T +1 T +1 Self Generation. w h Bv (ε/2) for all h . (Note that, since Bv (ε/2) is cube with ∈ T +1 side-length ε and v∗ = argmaxv Bv(ε/2) ζ v, this is equivalent to ζi wi h vi∗ ∈ · − ∈ T +1 [ ε, 0] for all i and h .)   −

T +1 T T +1 Defining πi h = δ / (1 δ) wi h v∗ , these conditions can be rewritten as − − i     Promise Keeping.

T 1 δ σ t 1 T +1 vi∗ = − T E δ − ui,t + πi h for all i. (23) 1 δ " t=1 # − X  Incentive Compatibility.

T σ˜i,σ i t 1 T +1 σi argmax E − δ − ui,t + πi h for all i. (24) σ˜ ∈ i " t=1 # X  Self Generation. 1 δ T +1 T +1 ζ − πi h [ ε, 0] for all i and h . (25) i δT ∈ −  N Fix ζ 1, 1 and v∗ = argmaxv Bv(ε/2) ζ v. We construct a block strategy profile σ ∈ {− } ∈ · and continuation value function π which, in the next subsection, we show satisfy these three conditions. This will complete the proof of the theorem. First, fix a correlated action profile α¯ ∆ (A) such that ∈

ui (¯α) = vi∗ + ζiε/2 for all i, (26)

NE NE and fix a static Nash equilibrium α ∆ (Ai) such that ui α v∗ ε/2 for all i. ∈ i ≤ i − NE Such α¯ and α exist because v∗ Bv (ε/2) and Bv (ε) F ∗.  ∈ Q ⊆ 32 We now construct the block strategy profile σ. For each player i I and period t ∈ ∈ 1,...,T , we define a state Fi,t 0, 1 for player i in period t, which will determine player { } ∈ { } i’sprescribed equilibrium action in period t. The states are determined by the public history, and so are among the players. We first specify players’prescribed actions as a function of the state, and then specify the state as a function of the public history.

Prescribed Equilibrium Actions: For each period t, let rt A be a pure action ∈ profile which is drawn by public randomization at the start of period t from the distribution α¯ ∆ (A) fixed in (26).28 The prescribed equilibrium actions are defined as follows. ∈

1. If Fi,t = 0 for all i I, the players take at = rt. ∈

2. If there is a unique player i such that Fi,t = 1, the players take at = (ri0, r i,t) for − NE some ri0 BRi (r i,t) if ζi = 1, and they take α if ζi = 1, where BRi (r i) = ∈ − − −

argmaxai Ai ui (ai, r i) is the set of i’sbest responses to r i. ∈ − −

NE 3. If there is more than one player i such that Fi,t = 1, the players take α .

Let α∗ ∆ (Ai) denote the distribution of prescribed equilibrium actions, prior to t ∈ i public randomizationQ zt.

(It may be helpful to informally summarize the prescribed actions. So long as Fi,t = 0 for all players, the players take actions drawn from the target action distribution α¯. If Fi,t = 1 for multiple players, the Pareto-dominated Nash equilibrium αNE is played. The most subtle case is when there is a unique player i such that Fi,t = 1. Intuitively, this case will correspond to situations where the signals of player i’sactions are “abnormal,”which later in the proof will imply that her continuation payoffs cannot be adjusted further without violating the self- generation constraint. In this case, player i starts taking static best responses. Moreover, if ζ = 1– so that player i’scontinuation payoff is already “low”– αNE is played.) i − It will be useful to introduce the following additional state variable Si,t, which summarizes player i’sprescribed action as a function of (Fj,t)j I : ∈ 28 Technically, the public randomization device Zt is always a uniform [0, 1] random variable. Throughout the proof, whenever we say that a certain variable is “drawn by public randomization,” we mean that its realization is determined by public randomization, independently of the other variables in the construction. Since we define only a finite number B of such variables, this can be done by, for example, specifying that if n = b mod B then the nth digit of z is used to encode the realization of the bth such variable we define.

33 1. Si,t = 0 if Fj,t = 0 for all j I, or if there exists a unique player j = i such that ∈ 6 Fj,t = 1, and for this player we have ζj = 1. In this case, player i is prescribed to take

ai,t = ri,t.

2. Si,t = NE if Fi,t = 0 and either (i) there exists a unique player j such that Fj,t = 1,

and for this player we have ζ = 1, or (ii) there are two distinct players j, j0 such j − NE that Fj,t = Fj0,t = 1. In this case, player i is prescribed to take αi .

3. Si,t = BR if Fi,t = 1. In this case, player i is prescribed to best respond to her NE opponents’actions (which equal either r i,t or α i , depending on (Fj,t)j=i.) − − 6

States: At the start of each period t, conditional on the public randomization draw of rt A described above, an additional random variable y˜t Y is also drawn by public ran- ∈ ∈ domization, with distribution q (˜yt rt). That is, the distribution of the public randomization | draw y˜t conditional on the draw rt is the same as the distribution of the realized public signal

profile y˜t when the profile of the players’intended actions is rt; however, the distribution of

y˜t depends only on the public randomization draw rt, and not on the players’actions. For

each player i and period t, let fi,ri,t : Yi R be defined as in Lemma 3, and let →

fi,ri,t (yi,t) if Si,t = 0,

fi,t =  fi,ri,t (˜yi,t) if Si,t = NE, (27)   0 if Si,t = BR.   Thus, the value of fi,t depends on the state (Fn,t)n I , the target action profile rt (which ∈ is drawn from distribution α¯ as described above), the public signal yt, and the additional 29 variable y˜t. Later in the proof, fi,t will be a component of the “reward”earned by player i in period t, which will be reflected in player i’s end-of-block continuation payoff function π : HT +1 R. → We can finally define Fi,t as

t 1 0− t00 1 ¯ Fi,t = 1 t0 t : δ − fi,t00 f . (28) (∃ ≤ ≥ ) t00=1 X 29 Intuitively, introducing the variable y˜t, rather than simply using yi,t everywhere in (27), ensures that the distribution of fi,t does not depend on player i’sopponents’strategies.

34 That is, Fi,t is the indicator function for the event that the magnitude of the component of t0 1 ¯ player i’sreward captured by (fi,t ) − exceeds f at any time t0 t. 00 t00=1 ≤ This completes the definition of the equilibrium block strategy profile σ. Before proceed- ing further, we note that a unilateral deviation from σ by any player i does not affect the T distribution of the state vector (Fj,t)j=i . (However, such a deviation does affect the 6 t=1 T distribution of (Fi,t)t=1.)  

Lemma 5 For any player i and block strategy σ˜i, the distribution of the random vector T (Fj,t)j=i is the same under block strategy profile (˜σi, σ i) as under block strategy profile 6 t=1 − σ. 

Proof. Since Fj,t = 1 implies Fj,t+1 = 1, it suffi ces to show that, for each t, each J I i , ⊆ \{ } t t each h such that J = j I i : Fj,t = 0 , and each zt, the probability Pr (Fj,t+1)j J h , zt, ai,t { ∈ \{ } } ∈ | t is independent of ai,t. Since Fj,t+1 is determined by h and fj,t, it is enough to show that  t Pr (fj,t)j J h , zt, ai,t is independent of ai,t. ∈ | t Recall that Sj,t is determined by h , and that if j J (that is, Fj,t = 0) then Sj,t ∈ ∈ 0,NE . If Sj,t = 0 then player j takes rj,t, which is determined by zt, yj,t is distributed ac- { } cording to pj (yj,t rj,t), and fj,t is determined by yj,t, independently across players conditional | on zt. If Sj,t = NE then y˜j,t is distributed according to pj (˜yj,t rj,t), where rj,t is determined | by zt, and fj,t is determined by y˜j,t, independently across players conditional on zt. Thus, t Pr (fj,t)j J h , zt, ai,t = j=i Pr (fj,t Sj,t, rj,t), which is independent of ai,t as desired. ∈ | 6 | Continuation Value Q Function: We now construct the continuation value function π : HT +1 RN . For each player i and end-of-block history hT +1, player i’s continuation → T +1 value πi h will be defined as the sum of T “rewards” πi,t, where t = 1,...,T , and a T +1 constant term ci that does not depend on h .

The rewards πi,t are defined as follows:

1. If Fj,t = 0 for all j I, then ∈

t 1 πi,t = δ − v∗ + ζ ε/4 ui (α∗) + fi,r (yi,t) . (29) i i − t i,t 

2. If Fi,t = 1 and Fj,t = 0 for all j = i, 6

t 1 πi,t = δ − (v∗ + ζ ε/4 ui (α∗)) . (30) i i − t

35 3. Otherwise, t 1 πi,t = δ − ζ u¯ u (α∗) + 1 Si,t = 0 fi,r (yi,t) . (31) − i − t { } i,t  The constant ci is defined as

T T t 1 1 δ ci = E δ − 1 max Fj,t = 0 (vi∗ + ζiε/4) 1 max Fj,t = 1 ζiu¯ + − vi∗. − j=i − j=i 1 δ " t=1   6   6  # − X (32)

Note that, since vi∗ + ζiε/4 and vi∗ are both feasible payoffs, we have

1 δT ci 2¯u − . (33) | | ≤ 1 δ − Finally, for each i and hT +1, player i’scontinuation value at end-of-block history hT +1 is defined as T T +1 πi h = ci + πi,t. (34) t=1  X A.3.3 Verification of the Equilibrium Conditions

We now verify that σ and π satisfy promise keeping, incentive compatibility, and self gener- ation. We first show that Fi,t = 0 for all i and t with high probability, and then verify the three desired conditions in turn.

Lemma 6 We have ε Pr max Fi,t = 0 1 . (35) i I,t 1,...,T ≥ − 20¯u  ∈ ∈{ } 

Proof. By union bound, it suffi ces to show that, for each i, Pr maxt 1,...,T Fi,t = 1 ∈{ } ≤ ε/20¯uN, or equivalently 

t t 1 ¯ ε Pr max δ − fi,t0 f . (36) t 1,...,T ≥ ! ≤ 20¯uN ∈{ } t0=1 X

T ˜ ˜ To see this, let fi,t = fi,ri,t (˜yi,t). Note that the variables fi,t are independent (unlike the t=1 t T ˜ t   variables (fi,t) ). Since fi,t and (fi,t ) have the same distribution if Si,t = BR, t=1 0 0 t0=1 t0=1 6  

36 while fi,t = 0 if Si,t = BR, we have

t t t 1 ¯ t 1 ˜ ¯ Pr max δ − fi,t0 f Pr max δ − fi,t0 f . (37) t 1,...,T ≥ ! ≤ t 1,...,T ≥ ! ∈{ } t0=1 ∈{ } t0=1 X X

T ˜ Since fi,t are independent, Etemadi’sinequality (Billingsley, 1995; Theorem 22.5) im- t=1 plies that 

t t t 1 ˜ ¯ t 1 ˜ ¯ Pr max δ − fi,t0 f 3 max Pr δ − fi,t0 f/3 . (38) t 1,...,T ≥ ! ≤ t 1,...,T ≥ ! ∈{ } t0=1 ∈{ } t0=1 X X

t 1 ˜ Letting xi,t = δ − fi,t, note that xi,t 2¯u/η with probability 1 by (16), E [xi,t] = 0 by (14), | | ≤ and t t T 1 δT u¯2 Var xi,t = Var (xi,t ) Var (xi,t ) = − by (15). 0 0 ≤ 0 1 δ η t =1 ! t =1 t =1 X0 X0 X0 − T ˜ Therefore, by Bernstein’s inequality ((17), which again applies because fi,t are inde- t=1 pendent) and (18), we have, for each t T ,   ≤

t ε t0 1 ˜ ¯ Pr δ − fi,t0 f/3 . (39) ≥ ! ≤ 60¯uN t0=1 X

Finally, (37), (38), and (39) together imply (36). Incentive Compatibility: We use the following lemma (proof in Appendix B.4).

Lemma 7 For each player i and block strategy profile σ, incentive compatibility holds (i.e., (24) is satisfied) if and only if

t σ i t 1 t t supp σi h argmax E − δ − ui,t + πi,t h , ai,t for all t and h . (40) ⊆ ai,t Ai | ∈    t In addition, for all t t0 and h , we have ≤

σ t0 1 t σ t0 1 t E δ − ui,t + πi,t h = E δ − 1 max Fj,t = 0 (vi∗ + ζiε/4) 1 max Fj,t = 1 ζiu¯ h . 0 j=i 0 j=i 0 | 6 − 6 | h i       (41) 

We now verify (40). Fix a player i, period t, and history ht. We consider several cases,

which parallel the definition of the reward πi,t.

37 1. If Fj,t = 0 for all j I, recall that the equilibrium action profile is rt that is prescribed ∈ by public randomization zt. For each action ai = ri,t, by (13) and (29), and recalling 6 that u¯ maxa ui (a) mina ui (a), we have ≥ −

σ i t 1 t σ i t 1 t E − δ − ui,t + πi,t h , zt, ai,t = ri,t E − δ − ui,t + πi,t h , zt, ai,t = ai | − | t 1 = δ − E ui (rt) + fi,ri,t (yi,t) ai,t = ri,t E ui (ai, r i,t) + fi,ri,t (yi,t) ai,t= ai | − − | 0, so (40) holds.    ≤

2. If Fi,t = 1 and Fj,t = 0 for all j = i, then the reward πi,t specified by (30) does not 6 t depend on yi,t. Hence, (40) reduces to the condition that every action in supp σi (h ) t is a static best responses to σ i (h ). This conditions holds for the prescribed action − NE profile, (ri0 BRi (r i,t) , r i,t) or α . ∈ − − 3. Otherwise,

(a) If Si,t = 0, then (40) holds because it holds in Case 1 above and (29) and (31)

differ only by a constant independent of yi,t.

(b) If Si,t = 0, then either Fj,t = Fj ,t = 1 for distinct players j, j0, or there exists a 6 0 unique player j = i with Fj,t = 1, and for this player we have ζ = 1. In both 6 j − NE cases, α is prescribed. Since the reward πi,t specified by (31) does not depend t on yi,t, (40) reduces to the condition that every action in supp σi (h ) is a static t NE best responses to σ i (h ), which holds for the prescribed action profile α . −

Promise Keeping: This essentially holds by construction: we have

T 1 δ σ t 1 T +1 − E δ − ui,t + πi h 1 δT − " t=1 # XT  1 δ σ t 1 = − E δ − ui,t + πi,t + ci (by (34)) 1 δT − " t=1 # ! TX  1 δ σ t 1 = − E δ − 1 max Fj,t = 0 (vi∗ + ζiε/4) 1 max Fj,t = 1 ζiu¯ + ci (by (41)) T j=i j=i 1 δ " t=1 6 − 6 # − X       = vi∗ (by (32)), so (23) holds.

Self Generation: We use the following lemma (proof in Appendix B.5).

38 Lemma 8 For every end-of-block history hT +1, we have

T 2¯u ζ π f¯+ and (42) i i,t η t=1 ≤ X T T ¯ 2¯u 1 δ πi,t f + + 2¯u − . (43) ≤ η 1 δ t=1 − X

In addition, 1 δT ε ζ ci − . (44) i ≤ − 1 δ 8 − T +1 T +1 To establish self generation ((25)), it suffi ces to show that, for each h , ζ πi h 0 i ≤ T T +1 and (1 δ) /δ πi h ε. This now follows because  − ≤   T T T +1 1 δ ε ¯ ζ πi h = ζ ci + πi,t − + f + 2¯u/η (by (42) and (44)) i i ≤ − 1 δ 8 t=1 ! −  X 1 δT 1 δ − ε + 8 − f¯+ 2¯u/η 0 (by (19)), and ≤ 8 (1 δ) − 1 δT ≤ −     T −  1 δ T +1 1 δ −T πi h −T ci + πi,t δ ≤ δ | | ! t=1  XT 1 δ 1 δ ¯ − 4¯u − + f + 2¯u/η (by (33) and (43)) ≤ δT 1 δ  −  1 δT 1 δ = − 4¯u + − f¯+ 2¯u/η ε (by (20)), δT δT ≤  which completes the proof.

A.4 Proof of Theorem 3

Fix a linear equilibrium with weights b = (1, b2, . . . , bN ), where bi 1 for all i. Let | | ≤ + I = i : bi 0 and I− = i : bi 0 . Define { ≥ } { ≤ }

+ + infh ui (h) if i I , suph ui (h) if i I , vi = ∈ and v¯i = ∈ .  sup ui (h) if i I−  infh ui (h) if i I−  h ∈  ∈

By standard arguments, for every β [0, 1], there exists a linear equilibrium with the same ∈ weights b and expected payoff v = (1 β) v + βv¯ such that the set v : h s.t. v = u (h) is − { ∃ }

39 closed and there exist histories h and h0 such that u (h) = v and u (h0) =v ¯. Since M (ε) is convex, it thus suffi ces to show that v, v¯ M (ε). ∈ In the following lemma, given α ∆ (A) and a function f : A Y R, Eα [f (r, y)] ∈ × → α,a denotes expectation where r α and then y q ( r), and E i0 [f (r, y)] denotes expectation ∼ ∼ ·| where r α and then y q ( ai0 , r i). The proof is deferred to Appendix B.6. ∼ ∼ ·| −

Lemma 9 There exist α ∆ (A) and x : A Y R such that ∈ × →

α v¯ = E [u (r) bx (r, y)] , − α α,ai0 E [ui (r) bix (r, y) ri = ai] E [ui (ai0 , r i) bix (r, y) ri = ai] for all i, ai supp αi, ai0 Ai − | ≥ − − | ∈ ∈ δ x (r, y) 0, u¯ for all r, y. ∈ 1 δ  −  If the constraint x (r, y) [0, (δ/ (1 δ))u ¯] is replaced with x (r, y) [ (δ/ (1 δ))u, ¯ 0], ∈ − ∈ − − then the same statement holds with v in place of v¯.

Taking α and x as in Lemma 9, we have, for any player i and manipulation α˜i,

α,α˜i(ai) α Di (˜αi α) αi (ai) E [bix (r, y) ri = ai] E [bix (r, y) ri = ai] | ≤ | − | ai X  α,a0 α αi (ai) max E i [x (r, y) ri = ai] E [x (r, y) ri = ai] ≤ ai0 | − | ai X α (r) max E [x (r, y) r, ai] E [x (r, y) r] , ai ≤ r | | − | | X where the second inequality uses bi 1. Hence, | | ≤ 1 1 Di (α) α (r) max E [x (r, y) r, ai] E [x (r, y) r] N N ai i ≤ i r | | − | | X X1X max E [x (y) ai, r i] E [x (y) r] . r,a N − ≤ i | | − | | X

40 We conclude that i Di (α) /N is bounded by the solution to the program

P 1 max E [x (y) ai, r i] E [x (y) r] s.t. (Y,p),r,a,x N − i | | − | | X δ x (y) 0, u¯ for all y, ∈ 1 δ  −  E [x (y) r] u,¯ | ≤ where the last constraint follows because E [x1 (y) r] = u1 (r) v¯1 u¯. The remainder of the | − ≤ proof shows that the value of this program converges to 0 along any sequence of equilibria satisfying the theorem’shypothesis. We first consider the sub-program where (Y, p) is fixed, so maximization is over (r, a, x).

Recall that q (y a) = πa,aˆp (y aˆ). Note that the value of the sub-program with signal | aˆ | distribution q is greaterP than that with signal distribution qˆ, if qˆ is a garbling of q. (That is, there exists a Markov matrix M such that qˆ = Mq.) To see this, fix any (r, a, xˆ) that are feasible with signal distribution qˆ, and define x (r, y) = M (ˆy y)x ˆ (r, yˆ). Then yˆ | P q (y a) x (r, y) = q (y a) M (ˆy y)x ˆ (r, yˆ) = qˆ(ˆy a)x ˆ (r, yˆ) for all a, | | | | y y yˆ yˆ X X X X so (r, a, x) is feasible with signal distribution q and yields the same value in the sub-program. Consequently, it is without loss to let Y = A and p (y aˆ) =a ˆ for all y, aˆ, so that | q (ˆa a) = πa,aˆ for all a, aˆ. Now fix r, a A, and for each i, define | ∈

1 π if aˆi = ai, 1 π if aˆi = ri, − − i i i π¯ai,aˆi =  π if aˆi = ri, , π¯ri,aˆi =  π if aˆi = ai, , π¯a˜i,aˆi = 1 aˆi =a ˜i for a˜i / ai, ri .   { } ∈ { }  0 otherwise  0 otherwise     Then define π¯ = π¯i for all a,˜ aˆ. The following lemma (proved in Appendix B.7) a,˜ aˆ i a˜i,aˆi implies that the valueQ of the program is upper-bounded by that with π =π ¯.

Lemma 10 π is a garbling of π¯.

41 We conclude that i Di (α) /N is bounded by the solution to the program

P 1 max E [x (ˆa) ai, r i] E [x (ˆa) r] s.t. (45) r,a,x N − i | | − | | X δ x (ˆa) 0, u¯ for all a,ˆ (46) ∈ 1 δ  −  E [x (ˆa) r] u,¯ (47) | ≤ where aˆ is distributed π¯a,˜ aˆ. Note that, for a˜ = r or a˜ = (ai, r i) for some i, π¯a,˜ aˆ > 0 iff − aˆ ai, ri . Note also that it is without loss to take ai = ri for all i. For if ai = ri then ∈ i { } 6 the programQ becomes

1 max E [x i (ˆa i) aj, r j] E [x i (ˆa i) r] s.t. (46), (47). a i,r i,x i:A i R N | − − | − − − − | | − − − − → j=i X6

Any feasible triple (a i, r i, x i) in this reduced program can be extended to a feasible − − − triple (a, r, x) with ai = ri in the original program which gives the same value, by defining 6 x (ˆa) = x i (ˆa i) for all aˆ. We thus assume that ai = ri for all i. − − 6 We now show that the value of program (45)—(47) converges to 0, which will complete the proof. Note that this value is less than the sum of the values of the two programs

1 max (E [x (ˆa) ai, r i] E [x (ˆa) r])+ s.t. (46), (47), and r,a,x N − i | − | 1 X max (E [x (ˆa) r] E [x (ˆa) ai, r i])+ s.t. (46), (47). r,a,x N − i | − | X We show that the value of the first of these programs converges to 0. A symmetric argument shows that the value of the second program also converges to 0, which implies that the value of program (45)—(47) converges to 0 as well, as desired. Letting λ 0 denote the multiplier on (47), it is immediate that the solution to the first ≥ program takes the form

1 π¯ π¯ π¯r,aˆ (a ,r ),aˆ δ N i (ai,r i),aˆ δ 1 i i − − u¯ if − > λ + 1, 1 δ u¯ if  π¯  > λ, 1 δ N i π¯r,aˆ x (ˆa) = P r,aˆ = − π¯ . − π¯ π¯r,aˆ (a ,r ),aˆ  (ai,r i),aˆ−  1 i i 0 if − < λ 0 if N Pi π¯ − < λ + 1  π¯r,aˆ  r,aˆ   P

42 For all aˆ ai, ri , ∈ i { } 1 π − if aˆ = a , Q π¯(ai,r i),aˆ π i i − = . π π¯r,aˆ  1 π if aˆi = ri  −

Since (1 π) /π > π/ (1 π) (as π < 1/2), it follows that there exists n∗ 0, 1,...,N − −  ∈ { } and β [0, 1] such that ∈

δ 1 δ u¯ if i :a ˆi = ai > n∗, − { } x (ˆa) = δ .  β 1 δ u¯ if i :a ˆi = ai = n∗,  − { }  0 if i :a ˆi = ai < n∗ { }   Let n = i :a ˆi = ai and let n i = j = i :a ˆj = aj . Note that, for any n∗, |{ }| − |{ 6 }|

Pr (n = n∗ ai, r i) = (1 π) Pr (n i = n∗ 1 r i) + π Pr (n i = n∗ r i) , and | − − − − | − − | − Pr (n = n∗ r) = π Pr (n i = n∗ 1 r i) + (1 π) Pr (n i = n∗ r i) , | − − | − − − | −

and hence Pr (n n∗ ai, r i) Pr (n n∗ r i) = (1 2π) Pr (n i = n∗ 1 r i). Therefore, ≥ | − − ≥ | − − − − | − the program becomes

δ max u¯ (1 2π)(β Pr (n i = n∗ 1 r i) + (1 β) Pr (n i = n∗ r i)) (48) n∗ 0,1,...,N ,β [0,1] 1 δ − − − | − − − | − ∈{ } ∈ − 1 δ s.t. β Pr (n = n∗ r) + Pr (n n∗ + 1 r) − , (49) | ≥ | ≤ δ where

N 1 N n∗ N 1 n∗ n∗ N n∗ Pr (n i = n∗ r i) = − π (1 π) − − and Pr (n = n∗ r) = π (1 π) − . − | − n − | n −  ∗   ∗

1 ρ Fix a sequence, indexed by k, of games with (1 δ) exp (N − ) for some ρ > 0 and − → ∞ pairs (n∗, β) that satisfy the constraint (49). Fix ε > 0, and suppose towards a contradiction that, for every k¯, there is some k k¯ such that the value of the objective (48) exceeds ε. ≥ Taking a subsequence and relabeling k¯ if necessary, this implies that there exists k¯ such that, for every k k¯, the value of the objective (48) exceeds ε. ≥ We consider two cases, and derive a contradiction in each of them. First, suppose that there exists c > 0 such that, for every k˜, there is some k k˜ satisfying ≥

43 π (n∗ 1) / (N 1) > c. By Hoeffding’sinequality, | − − − |

2 n∗ 1 Pr (n i n∗ 1 r i) exp 2 π − (N 1) . − ≥ − | − ≤ − − N 1 −  −  !

Hence, for every k˜, there is some k k˜ such that the value of (48) is at most ≥

2 δ n∗ 1 δ u¯ (1 2π) exp 2 π − (N 1) u¯ (1 2π) exp 2c2 (N 1) . 1 δ − − − N 1 − ≤ 1 δ − − − −  −  ! −  1 ρ 2 Since (1 δ) exp (N − ) , we have exp ( c N) / (1 δ) 0 for all c > 0, and hence − → ∞ − − → (48) is less than ε for suffi ciently large k, a contradiction. Second, suppose that for any c > 0, there exists k˜ such that, for every k k˜, we have ≥

n∗ 1 π − c. (50) − N 1 ≤

Fix c > 0 and take k suffi ciently large that (50) holds. For any m N, we have ∈

N n n∗ Pr (n n∗ + 1 r) N (1 π) (N n∗)!n∗! π − ≥ | = − − Pr (n = n r ) N n (N n)!n! 1 π i ∗ i n=n +1 ∗ − | − X∗ − −  −  N n n∗ n n∗ N (1 c) N n∗ − n∗ 1 c (N 1) − − − − − − N 1 n N n + c (N 1) ≥ n=n +1 ∗ X∗ −    − −  n∗+m m N n∗ n∗ 1 c (N 1) (1 c) − − − − ≥ − n∗ + m × N n∗ + c (N 1) n=n∗+1  − −  X m N n∗ n∗ 1 c (N 1) = m (1 c) − − − − . − n + m × N n + c (N 1)  ∗ − ∗ − 

By (50), for any γ0 > 0, for suffi ciently large k we have (n∗ 1) / (n∗ + m) 1 γ0, and − ≥ − hence

N n∗ n∗ 1 c (N 1) N n∗ n∗ 1 c (N 1) − − − − (1 γ0) − − − − n∗ + m × N n∗ + c (N 1) ≥ − n∗ 1 × N n∗ + c (N 1) − − − N 1 − − 1 c − n∗ 1 = (1 γ0) − N −1 − 1 + c − N n∗ c− 1 (1 γ )(π 2c) (1 π c) − π c 0 (1 γ0) −c = − − − − , ≥ − 1 + 1 π c (π c) (1 π) − − − −

44 which converges to 1 γ0 as c 0. Hence, for any γ > 0, there exists k˜ suffi ciently large − → such that for every k k˜, ≥ m Pr (n n∗ + 1 r) (1 γ0)(π 2c) (1 π c) ≥ | m (1 c) − − − − m (1 γ) . Pr (n i = n∗ r i) ≥ − (π c) (1 π) ≥ − − | −  − −  We therefore have

β Pr (n = n∗ r) + Pr (n n∗ + 1 r) Pr (n n∗ + 1 r) | ≥ | ≥ | m (1 γ) . Pr (n i = n∗ r i) ≥ Pr (n i = n∗ r i) ≥ − − | − − | − Similarly, for each m and γ > 0, there exists k˜ such that for every k k˜, we have ≥

β Pr (n = n∗ r) + Pr (n n∗ + 1 r) | ≥ | m (1 γ) . Pr (n i = n∗ 1 r i) ≥ − − − | − Thus, for each m and γ > 0, there exists k˜ such that for every k k˜, we have ≥

β Pr (n = n∗ r) + Pr (n n∗ + 1 r) | ≥ | m (1 γ) . β Pr (n i = n∗ 1 r i) + (1 β) Pr (n i = n∗ r i) ≥ − − − | − − − | − The value of (48) thus satisfies

δ u¯ (1 2π)(β Pr (n i = n∗ 1 r i) + (1 β) Pr (n i = n∗ r i)) 1 δ − − − | − − − | − − β Pr (n i = n∗ 1 r i) + (1 β) Pr (n i = n∗ r i) u¯ (1 2π) − − | − − − | − (by (49)) ≤ − β Pr (n = n r) + Pr (n n + 1 r) ∗| ≥ ∗ | u¯ (1 2π) − . ≤ m (1 γ) − Taking m and γ such that u¯ (1 2π) / (m (1 γ)) < ε gives the desired contradiction. − − References

[1] Abreu, Dilip, David Pearce, and Ennio Stacchetti. “Optimal Cartel Equilibria with Imperfect Monitoring.”Journal of Economic Theory 39.1 (1986): 251-269.

[2] Abreu, Dilip, David Pearce, and Ennio Stacchetti. “Toward a Theory of Discounted Repeated Games with Imperfect Monitoring.”Econometrica 58.5 (1990): 1041-1063.

[3] Abreu, Dilip, Paul Milgrom, and David Pearce. “Information and Timing in Repeated Partnerships.”Econometrica 59.6 (1991): 1713-1733.

45 [4] Al-Najjar, Nabil I., and Rann Smorodinsky. “Pivotal Players and the Characterization of Influence.”Journal of Economic Theory 92.2 (2000): 318-342.

[5] Al-Najjar, Nabil I., and Rann Smorodinsky. “Large Nonanonymous Repeated Games.” Games and Economic Behavior 37.1 (2001): 26-39.

[6] Athey, Susan, Kyle Bagwell, and Chris Sanchirico. “Collusion and Price Rigidity.”Re- view of Economic Studies 71.2 (2004): 317-349.

[7] Awaya, Yu, and Vijay Krishna. “On Communication and Collusion.” American Eco- nomic Review 106.2 (2016): 285-315.

[8] Awaya, Yu, and Vijay Krishna. “Communication and Cooperation in Repeated Games.” Theoretical Economics 14.2 (2019): 513-553.

[9] Billingsley, Patrick. Probability and Measure, 3rd ed. Wiley (1995).

[10] Border, Kim C., and Joel Sobel. “Samurai Accountant: A Theory of Auditing and Plunder.”Review of Economic Studies 54.4 (1987): 525-540.

[11] Boucheron, Stéphane, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.

[12] Bowles, Samuel, and Herbert Gintis. A Cooperative Species: Human Reciprocity and its Evolution. Princeton University Press, 2011.

[13] Boyd, Robert, and Peter J. Richerson. “The Evolution of Reciprocity in Sizable Groups.”Journal of Theoretical Biology 132.3 (1988): 337-356.

[14] Cover, Thomas M., and Joy A. Thomas. Elements of Information Theory, 2nd ed. Wiley (2006).

[15] Deb, Joyee, Takuo Sugaya, and Alexander Wolitzky. “The Folk Theorem in Repeated Games with Anonymous Random Matching.”Econometrica 88.3 (2020): 917-964.

[16] Edmiston, Jake, and Graeme Hamilton, “The Last Days of Quebec’s Maple Syrup Rebellion,”National Post, April 6, 2018.

[17] Ekmekci, Mehmet, Olivier Gossner, and Andrea Wilson. “Impermanent Types and Per- manent Reputations.”Journal of Economic Theory 147.1 (2012): 162-178.

[18] Ellickson, Robert C. Order without Law: How Neighbors Settle Disputes. Harvard Uni- versity Press, 1991.

[19] Ellison, Glenn. “Cooperation in the Prisoner’s Dilemma with Anonymous Random Matching.”Review of Economic Studies 61.3 (1994): 567-588.

[20] Faingold, Eduardo. “Reputation and the Flow of Information in Repeated Games.” Econometrica 88.4 (2020): 1697-1723.

46 [21] Forges, Francoise. “An Approach to Communication Equilibria.” Econometrica 54.6 (1986): 1375-1385.

[22] Fudenberg, Drew, and David K. Levine. “Effi ciency and Observability with Long-Run and Short-Run Players.”Journal of Economic Theory 62.1 (1994): 103-135.

[23] Fudenberg, Drew, and David K. Levine. “Continuous Time Limits of Repeated Games with Imperfect Public Monitoring.” Review of Economic Dynamics 10.2 (2007): 173- 192.

[24] Fudenberg, Drew, and David K. Levine. “Repeated Games with Frequent Signals.” Quarterly Journal of Economics 124.1 (2009): 233-265.

[25] Fudenberg, Drew, David Levine, and Eric Maskin. “The Folk Theorem with Imperfect Public Information.”Econometrica 62.5 (1994): 997-1039.

[26] Fudenberg, Drew, David Levine, and Wolfgang Pesendorfer. “When are Nonanonymous Players Negligible?”Journal of Economic Theory 79.1 (1998): 46-71.

[27] Georgiadis, George, and Balazs Szentes. “Optimal Monitoring Design.” Econometrica 88.5 (2020): 2075-2107.

[28] Gossner, Olivier. “Simple Bounds on the Value of a Reputation.” Econometrica 79.5 (2011): 1627-1641.

[29] Gossner, Olivier, Penelope Hernández, and Abraham Neyman. “Optimal Use of Com- munication Resources.”Econometrica 74.6 (2006): 1603-1636.

[30] Green, Edward J. “Noncooperative Price Taking in Large Dynamic Markets.”Journal of Economic Theory 22.2 (1980): 155-182.

[31] Green, Edward J., and Robert H. Porter. “Noncooperative Collusion under Imperfect Price Information.”Econometrica 52.1 (1984): 87-100.

[32] Hébert, Benjamin. “Moral Hazard and the Optimality of Debt.” Review of Economic Studies 85.4 (2018): 2214-2252.

[33] Hoffmann, Florian, Roman Inderst, and Marcus Opp. “Only Time will Tell: A Theory of Deferred Compensation.”Review of Economic Studies 88.3 (2021): 1253-1278.

[34] Holmström, Bengt. “Moral Hazard and Observability.”Bell Journal of Economics 10.1 (1979): 74-91.

[35] Hörner, Johannes, and Satoru Takahashi. “How Fast do Equilibrium Payoff Sets Con- verge in Repeated Games?.”Journal of Economic Theory 165 (2016): 332-359.

[36] Jewitt, Ian, Ohad Kadan, and Jeroen M. Swinkels. “Moral Hazard with Bounded Pay- ments.”Journal of Economic Theory 143.1 (2008): 59-82.

47 [37] Kandori, Michihiro. “Social Norms and Community Enforcement.”Review of Economic Studies 59.1 (1992): 63-80.

[38] Kandori, Michihiro. “Introduction to Repeated Games with Private Monitoring.”Jour- nal of Economic Theory 102.1 (2002): 1-15.

[39] Kandori, Michihiro, and Hitoshi Matsushima. “Private Observation, Communication and Collusion.”Econometrica 66.3 (1998): 627-652.

[40] Khalil, Fahad, and Jacques Lawarree. “Input Versus Output Monitoring: Who is the Residual Claimant?.”Journal of Economic Theory 66.1 (1995): 139-157.

[41] Kuitenbrouwer, “‘It’sCrazy, Isn’tIt’: Quebec’sMaple Syrup Rebels Face Ruin as Cartel Crushes Dissent,”Financial Post, December 5, 2016.

[42] Lewis, Tracy R., and David E.M. Sappington. “Using Markets to Allocate Pollution Permits and Other Scarce Resource Rights under Limited Information.” Journal of Public Economics 57.3 (1995): 431-455.

[43] Li, Anqi, and Ming Yang. “Optimal Incentive Contract with Endogenous Monitoring Technology.”Theoretical Economics 15.3 (2020): 1135-1173.

[44] Maskin, Eric, and John Riley. “Input Versus Output Incentive Schemes.” Journal of Public Economics 28.1 (1985): 1-23.

[45] Matsushima, Hitoshi. “Repeated Games with Private Monitoring: Two Players.”Econo- metrica 72.3 (2004): 823-852.

[46] Miguel, Edward, and Mary Kay Gugerty. “Ethnic Diversity, Social Sanctions, and Public Goods in Kenya.”Journal of Public Economics 89.11-12 (2005): 2325-2368.

[47] Mirrlees, James A. “The Theory of Moral Hazard and Unobservable Behaviour: Part I.”Working Paper (1975) (published in Review of Economic Studies 66.1 (1999): 3-21).

[48] Mookherjee, Dilip, and Ivan Png. “Optimal Auditing, Insurance, and Redistribution.” Quarterly Journal of Economics 104.2 (1989): 399-415.

[49] Neyman, Abraham, and Daijiro Okada. “Strategic Entropy and Complexity in Repeated Games.”Games and Economic Behavior 29.1-2 (1999): 191-223.

[50] Neyman, Abraham, and Daijiro Okada. “Repeated Games with Bounded Entropy.” Games and Economic Behavior 30.2 (2000): 228-247.

[51] Ostrom, Elinor. Governing the Commons: The Evolution of Institutions for Collective Action. Cambridge University Press, 1990.

[52] Pai, Mallesh M., Aaron Roth, and Jonathan Ullman. “An Antifolk Theorem for Large Repeated Games.” ACM Transactions on Economics and Computation (TEAC) 5.2 (2016): 1-20.

48 [53] Rahman, David. “The Power of Communication.”American Economic Review 104.11 (2014): 3737-51.

[54] Reinganum, Jennifer F., and Louis L. Wilde. “Income Tax Compliance in a Principal- Agent Framework.”Journal of Public Economics 26.1 (1985): 1-18.

[55] Sabourian, Hamid. “Anonymous Repeated Games with a Large Number of Players and Random Outcomes.”Journal of Economic Theory 51.1 (1990): 92-110.

[56] Sadzik, Tomasz, and Ennio Stacchetti. “Agency Models with Frequent Actions.”Econo- metrica 83.1 (2015): 193-237.

[57] Sannikov, Yuliy, and Andrzej Skrzypacz. “Impossibility of Collusion under Imperfect Monitoring with Flexible Production.”American Economic Review 97.5 (2007): 1794- 1823.

[58] Sannikov, Yuliy, and Andrzej Skrzypacz. “The Role of Information in Repeated Games with Frequent Actions.”Econometrica 78.3 (2010): 847-882.

[59] Seabright, Paul. The Company of Strangers. Princeton University Press, 2004.

[60] Sugaya, Takuo, and Alexander Wolitzky, “Communication and Community Enforce- ment.”Journal of Political Economy 129.9 (2021a): 2595-2628.

[61] Sugaya, Takuo, and Alexander Wolitzky, “The Revelation Principle in Multistage Games.”Review of Economic Studies 88.3 (2021b): 1503-1540.

[62] Tomala, Tristan. “Perfect communication equilibria in repeated games with imperfect monitoring.”Games and Economic Behavior 67.2 (2009): 682-694.

49 B Online Appendix

B.1 The Set B (ε) in A Public-Goods Game

Consider the public-goods game where each player chooses Contribute or Don’t Contribute, and a player’s payoff is the fraction of players who contribute less a constant c (0, 1) ∈ (independent of N) if she contributes herself. Fix any v (0, 1 c), let v = (v, . . . , v) RN , ∈ − ∈ and let ε = cv (1 c v) /4 > 0. We show that Bv (ε) V for all N. Since no one − − ⊆ contributing is a Nash equilibrium with 0 payoffs, this implies that Bv (ε) V ∗, and hence ⊆ v B (ε), for all N. ∈ To see this, fix any N. Since the game is symmetric, to show that Bv (ε) V it suffi ces ⊆ to show that, for any number n 0,...,N , there exists a feasible payoff vector where n ∈ { } “favored”players receive payoffs no less than v + ε, and the remaining N n “disfavored” − players receive payoffs no more than v ε. Fix such an n, and let x = n/N. − Consider the mixed action profile α1 where favored players Contribute with probability (v + ε) / (1 c) (0, 1) and disfavored players always Contribute. At this profile, favored − ∈ players receive payoff v + ε v + ε f (x) := x + (1 x) (1) c , 1 c − − 1 c − − while disfavored players receive payoff

v + ε g (x) := x + (1 x) (1) c. 1 c − − −

Note that f 0 (x) < 0, so f (x) f (1) = v + ε. ≥ Now, with f (x) so defined, consider the mixed action profile α2 where favored players Contribute with probability (v + ε)2 / ((1 c) f (x)) (0, 1) and disfavored players Con- − ∈ tribute with probability (v + ε) /f (x) (0, 1). Note that each player’s payoff at profile α2 ∈ equals her payoff at profile α1 multiplied by (v + ε) /f (x). Therefore, at profile α2, favored players receive payoff v + ε f (x) = v + ε, f (x) while disfavored players receive payoff

v + ε v + ε v + ε g (x) = f (x) 1 c f (x) − − 1 c f (x)   −   v + ε v + ε 1 c (v + ε) (since f (x) 1) ≤ − − 1 c ≤   v ε, − ≤ −

50 where the last inequality follows from ε = cv (1 c v) /4 and straightforward algebra. − −

B.2 Derivation of Equation (7)

t t Since wi,t = (1 δ) ui,t + δwi,t+1, for every history h H we have − ∈

t t t t 1 t 1 δ t Wi,t h = (1 δ) Ui,t h +δE Wi,t+1 h E Wi,t+1 h = Wi,t h − Ui,t h . − | ⇐⇒ | δ − δ         Therefore,

t 1 1 δ Var E Wi,t+1 h = Var Wi,t − Ui,t | δ − δ     1 1 δ 2 1 δ = Var (Wi,t) + − Var (Ui,t) 2 − Cov (Ui,t,Wi,t) δ2 δ − δ2   1 1 δ 2 1 δ 1 δ Var (Wi,t) + − Var (Ui,t) − Var (Ui,t) − Var (Wi,t) ≥ δ2 δ − δ2 − δ2   1 1 δ = Var (Wi,t) − Var (Ui,t) . δ − δ

B.3 Omitted Details for the Proof of Lemma 4

We show that, with the stated definitions of T and f¯, (21) and (22) imply (18)—(20). First, note that 1 δ2 (1 + δ) (1 δ) 1 δ − = − < 2 − . 1 δ2T 1 + δT 1 δT 1 δT − − − Hence,  

2f¯ 1 δ2 4 1 δ 60¯u 1 δT u¯2 − < − 36 log log (N) − 9¯u 1 δ2T 9¯u 1 δT s ε 1 δ η −  −   −  4 60¯u 1 δ 1 = 36 log log (N) − T 1 (by (21)). 9s ε 1 δ η ≤   − Therefore,

2 2 f¯ f¯ η η ¯2 − 3 − 3 f η 60¯uN exp 60¯uN exp = 60¯uN exp − 2T .  1 δ2T  2 f¯   1 δ2T   1 δ2T  1 δ 2 2 ≤ 2 2 36 − 2 u¯ 2 1− δ2 u¯ + 3 3 u¯ 2 1− δ2 u¯ + 1− δ2 u¯ 1 δ !  −   − −  −      

51 Moreover,

T ¯2 60¯u 1 δ f η 36 log ε log (N) 1− δ 1 + δ 60¯u 60¯u 2T = 2T − = log log (N) log log (N) . 36 1 δ u¯2 36 1 δ 1 + δT ε ≥ ε 1− δ2 1− δ2     − − Hence, we have

2 f¯ 3 η 60¯u 60¯uN exp − 60¯uN exp log log (N) = ε.  1 δ2T 2 2 f¯  ≤ − ε 2 1− δ2 u¯ + 3 3 u¯      −     This establishes (18). Next, we have

1 δ ¯ 2¯u 60¯u 1 δ 1 1 δ 2 8 − T f + = 8¯u 36 log log (N) − T + − T ε (by (22)). 1 δ η s ε 1 δ η 1 δ η ≤ −     − − ! (51) This establishes (19). Finally, by (51) and 8¯u 1 δT /δT ε, we have − ≤  1 δT 1 δ 2¯u 1 δT 1 δT 1 δ 2¯u ε ε ε 4¯u − + − f¯+ = 4¯u − + − − f¯+ 4 + ε. δT δT η δT δT 1 δT η ≤ 8 8 8 ≤   −   This establishes (20).

B.4 Proof of Lemma 7

t We show that player i has a profitable one-shot deviation from σi at some history h if and only if (40) is violated at ht. To see this, we first calculate player i’s continuation payoff under σ from period t + 1 onwards (net of the constant ci and the rewards already accrued t πi,t ). For each t0 t + 1, there are several cases to consider. t0=1 0 ≥

P 1. If Fj,t0 = 0 for all j, then by (14) and (29) we have

σ t0 1 t0 t0 1 E δ − ui,t + πi,t h = δ − ui (αt∗ ) + vi∗ + ζiε/4 ui (αt∗ ) + E fi,r (yi,t ) ri,t 0 0 | 0 − 0 i,t0 0 | 0 h i t0 1  h i = δ − (vi∗ + ζiε/4) .

2. If Fi,t = 1 and Fj,t = 0 for all j = i, then by (30) we have 0 0 6

σ t0 1 t0 t0 1 t0 1 E δ − ui,t + πi,t h = δ − (ui (αt∗ ) + vi∗ + ζiε/4 ui (αt∗ )) = δ − (vi∗ + ζiε/4) . 0 0 | 0 − 0 h i 52 3. Otherwise,

(a) If Si,t0 = 0, then by (14) and (31) (and recalling that player i’sequilibrium action

is ri,t0 when Si,t0 = 0) we have

σ t0 1 t0 t0 1 E δ − ui,t + πi,t h = δ − ui (αt∗ ) ζiu¯ u (αt∗ ) + E fi,r (yi,t ) ri,t 0 0 | 0 − − 0 i,t0 0 | 0 t 1 h i = δ 0− ( ζ u¯) . h i − i

(b) If Si,t = 0, then by (31) we have 0 6

σ t0 1 t0 t0 1 t0 1 E δ − ui,t + πi,t h = δ − (ui (αt∗ ) ζiu¯ u (αt∗ )) = δ − ( ζiu¯) 0 0 | 0 − − 0 − h i In total, (41) holds, and player i’scontinuation payoff under σ from period t + 1 onwards equals

T σ t0 1 t E δ − 1 max Fj,t = 0 (vi∗ + ζiε/4) 1 max Fj,t = 1 ζiu¯ h . j=i 0 − j=i 0 | "t =t+1 6 6 # 0X       T

By Lemma 5, the distribution of (Fn,t0 )n=i does not depend on player i’s period- 6 t0=t+1 t action, and hence neither does player i’s continuation payoff under σ from period t + 1 onwards. Therefore, player i’s period-t action ai,t maximizes her continuation payoff from

σ i t 1 t period t onwards if and only if it maximizes E − [δ − ui,t + πi,t h , ai,t]. |

B.5 Proof of Lemma 8

Define

t 1 v δ − (vi∗ + ζiε/4 ui (αt∗)) if Fj,t = 0 for all j = i, πi,t = t 1 − 6 , and ( δ − ( ζiu¯ ui (αt∗)) otherwise − − t 1 f δ − fi,ai,t (yi,t) if either Fj,t = 0 for all j or Si,t = 0, πi,t = ( 0 otherwise.

53 v f Note that, by (29)—(31), we can write πi,t = πi,t + πi,t. We show that, for every end-of-block history hT +1, we have

T 1 δT ζ πv 2¯u − , 0 and (52) i i,t ∈ − 1 δ t=1  −  XT f ¯ ζi πi,t f + 2¯u/η. (53) ≤ t=1 X v f Since πi,t = πi,t + πi,t, (52) and (53) imply (42) and (43), which proves the first part of the lemma.

For (52), note that, by definition of the prescribed equilibrium actions, if Fj,t = 0 for all j = i, then (i) if ζi = 1, we have ui (αt∗) a α¯ (a) min ui (a) , maxa0 ui (ai0 , a i) i − 6 ≥ NE ≥ ui (¯α) = v∗ + ε/2, by (26); and (ii) if ζ = 1, we have ui (α∗) max ui (¯α) , ui α = i i − P t ≤ ui (¯α) = v∗ ε/2, again by (26). In total, we have ζ (v∗ + ζ ε/4 ui (α∗)) ε/4. Since i − i i i −  t ≤ −  obviously ζ (v∗ + ζ ε/4 ui (α∗)) 2¯u and u¯ ζ ui (α∗) 2¯u, we have i i i − t ≥ − − − i t ≥ −

t 1 v δ − ζi (vi∗ + ζiε/4 ui (αt∗)) if Fj,t = 0 for all j = i t 1 ζiπi,t = t 1 − 6 2¯uδ − , 0 . ( δ − ( u¯ ζiui (αt∗)) otherwise ∈ − − −  

For (53), note that Si,t = 0 implies Fi,t = 0, and hence

T T f t 1 ζi πi,t ζi 1 Fi,t = 0 δ − fi,ai,t (yi,t) . ≤ { } t=1 t=1 X X

t 1 ¯ Since Fi,t+1 = 1 whenever δ − fi,a (yi,t) f, and in addition fi,a (yi,t) 2¯u/η t0=1,..,t i,t ≥ i,t ≤ by (16), this inequality implies (53). P

54 For the second part of the lemma, by (32), we have

T T t 1 1 δ ζici = ζi E δ − 1 max Fj,t = 0 (vi∗ + ζiε/4) 1 max Fj,t = 1 ζiu¯ + − vi∗ j=i j=i 1 δ − " t=1 6 − 6 # ! X       − T t 1 = E δ − 1 max Fj,t = 0 ( ε/4) + 1 max Fj,t = 1 (¯u + ζivi∗)   j=i j=i  t=1 6 − 6     [0,2¯u] X  ∈   T   t 1 | {z } E δ − 1 max Fj,t = 0 ( ε/4) + 1 max Fj,t = 1 2¯u j=i j=i ≤ " t=1 6 − 6 # X       1 δT ε ε ε − 1 + 2¯u (by (35)) ≤ − 1 δ − 20¯u 4 20¯u − 1 δT ε     − (as ε < u/¯ 2). ≤ − 1 δ 8 − B.6 Proof of Lemma 9

Let E = (1 β) v + βv¯ : β [0, 1] . By standard arguments (e.g., APS), E is self-generating: { − ∈ } for any v E, there exist α ∆ (A) and w : A Y E such that ∈ ∈ × →

α v = E [u (r) + δw (r, y)] and

α α,ai0 E [ui (r) + δwi (r, y) ri = ai] E [ui (ai0 , r i) + δwi (r, y) ri = ai] for all i, ai supp αi, ai0 Ai. | ≥ − | ∈ ∈ Since v E and w (r, y) E for all r, y, we have ∈ ∈

vi wi (r, y) = bi (v1 w1 (r, y)) for all i = 1, r A, y Y. − − 6 ∈ ∈

Since v¯1 v1 for all v E, if v =v ¯ then w1 (r, y) v1 for all r, y. Hence, taking ≥ ∈ ≤ v =v ¯ = (1 δ) u (α) + δbE [w (r, y) α] and defining x (r, y) = (δ/ (1 δ)) (¯v1 w1 (r, y)) − | − − ∈ [0, (δ/ (1 δ))u ¯] for all r, y, and letting E [ ] denote expectation where y q ( a), we have, − · ∼ ·| for all a, r,

δ u (a) bE [x (r, y)] = u (a) bE (¯v1 w1 (r, y)) − − 1 δ −  −  δ = u (a) E (¯v w (r, y)) = (1 δ) u (a) + δE [w (r, y)] , − 1 δ − −  − 

and the result follows. Similarly, if v = v then w1 (r, y) v1 for all r, y, and the symmetric ≥ argument applies.

55 B.7 Proof of Lemma 10

i Since (ˆai)i I are independent conditional on (ai)i I , it suffi ces to show that π is a garbling ∈ ∈ of π¯i for each i. Since π < 1/2, the matrix π¯i is invertible, with inverse matrix πˆi given by

1 π 1 π 1 −2π if aˆi = ai, 1 −2π if aˆi = ri, − − πˆi = π if aˆ = r , , πˆi = π if aˆ = a , , πˆi = 1 aˆ =a ˜ for a˜ / a , r . ai,aˆi  1 2π i i ri,aˆi  1 2π i i a˜i,aˆi i i i i i − − − − { } ∈ { }  0 otherwise  0 otherwise

  The matrix M i := πiπˆi is easily calculated as

i 1 π i π π − 1 π if a , a a , r , i a˜i,aˆi 1 2π a˜i,aˆi 1 2π i i0 i i Ma˜i,aˆi = − − − − ∈ { } . ( 1 aˆi =a ˜i otherwise { } 

Note that, for a˜i ai, ri , ∈ { }

i Ai 1 π π Ma˜i,aˆi = | | − − ( Ai 1) = 1, Ai 1 Ai π − | | − Ai 1 Ai π aˆ Xi | | − − | | | | − − | | i i and clearly aˆ Ma˜ ,aˆ = 1 for a˜i / ai, ri . In addition, since πa˜ ,a π for all a˜i, ai0 , we have i i i ∈ { } i i0 ≥

P i i πa˜ ,aˆ (1 π) 1 πa ,aˆ π π (1 π) (1 π) π i i − − − i i − − − = 0, 1 2π ≥ Ai 1 Ai π −  | | − − | | i i i i i and clearly M 1 for all ai, a0 . So M is a Markov matrix and π = M π¯ , completing ai,aˆi ≤ i the proof.

56