The Repeated Prisoners’ Dilemma with Private Monitoring: an N-player Case*

Ichiro Obara, Department of Economics, University of Pennsylvania, [email protected]. First version: December 1998. This version: April 2000.

Abstract This paper studies the repeated prisoners’ dilemma with private monitoring for an arbitrary number of players. It is shown that a mixture of a grim trigger strategy and permanent defection can achieve an almost efficient outcome for some range of discount factors if private monitoring is almost perfect and symmetric, and if the number of players is large. This approximate efficiency result also holds when the number of players is two for any prisoners’ dilemma as long as monitoring is almost perfect. A detailed characterization of these sequential equilibria is provided.

1. Introduction

This paper examines the repeated prisoners’ dilemma with an arbitrary number of players, where players only observe private and imperfect signals about the other players’ actions. This game belongs to the class of repeated games with private monitoring. While repeated games with public monitoring have been extensively analyzed in, for example, Abreu, Pearce and Stacchetti [1] or Fudenberg, Levine, and Maskin [8], few things are known about repeated games with private monitoring. It is shown in Compte [5] and Kandori and Matsushima [10] that a folk theorem still holds in this class of games with communication between players, but it is difficult to analyze such games without communication because the simple recursive structure is lost. The two player prisoner’s dilemma was already examined in Sekiguchi [16], which is the first paper to show that the efficient outcome can be achieved in some repeated prisoner’s dilemmas with almost perfect private monitoring. This paper is an extension of Sekiguchi [16] in the sense that (1): a similar grim trigger strategy is employed,

*I am grateful to my advisers George Mailath and Andrew Postlewaite for their encouragement and valuable suggestions, and to Johannes Horner, Stephen Morris, Nicolas Persico, and Tadashi Sekiguchi for their helpful comments. All remaining errors are mine.

(2): the efficient outcome is obtained for any prisoner’s dilemma with two players, (3): this efficiency result for the two player case is extended to the case of an arbitrary number of players with a symmetry condition on the monitoring structure, and (4): the sequential equilibrium corresponding to this grim trigger strategy is explicitly constructed. In Sekiguchi [16], the critical step of the argument is to obtain the unique optimal action with respect to a player’s subjective belief about the other player’s continuation strategy. Since players randomize between a grim trigger strategy and permanent defection in the first period, each player’s continuation strategy is always one of these two strategies after any history. This means that a player’s subjective belief about the other player’s strategy can be summarized in one parameter: the subjective probability that the other player is playing permanent defection. In this paper, a clear-cut characterization of the optimal action is provided as a function of this belief. In Sekiguchi [16], it is shown that a player should start defecting if she is very confident that the other player has started defecting, and a player should cooperate if she is really confident that the other player is still cooperating. However, it is not clear what a player should do if the belief is somewhere in the middle. This paper fills in that gap. Through a detailed examination of players’ incentives, we also find that the payoff restriction imposed in Sekiguchi [16] is not really necessary, that is, the approximate efficiency result is valid for any prisoner’s dilemma game.
Although the same kind of clear characterization of the optimal action is possible with many players, it is not straightforward to extend this efficiency result to the n player case. The dynamics of beliefs is richer with more than two players. In particular, it is possible to have a belief that some player started defecting but the other players are still cooperating. In such a case, a player might think that it is better to continue cooperating because it might keep cooperative players from starting defection. So, it is no longer clear when players should pull the trigger. Under the assumption that the probability of any signal profile only depends on the total number of errors it contains, it is shown that the efficient outcome can be supported with a mixture of permanent defection and a certain kind of grim trigger strategy, where players start defecting if they observe even one signal of deviation by any other player. This strategy generates an extreme belief dynamics under this assumption on the signal distribution, which in turn rationalizes the use of this strategy. As soon as a player observes any bad signal from any other player, she expects that some other players also got bad signals with high probability. Then she becomes pessimistic enough to start defecting herself, because defection should prevail among all players using the same strategy at least from the next period on. A sequence of papers has refined the result of Sekiguchi [16] for the two player case. Piccione [15] also achieves the efficient outcome for any prisoners’ dilemma with two players and almost perfect private monitoring. Moreover, he establishes a folk theorem for a class of prisoners’ dilemmas using a strategy which allows players to randomize between cooperation and defection after every history. The strategy used in his paper can be represented as an automaton with countably infinite states.
Ely and Välimäki [7] prove a folk theorem using a similar strategy, but their strategy is “simple” in the sense that it is a two-state automaton. Bhaskar [3] is closest to this paper in terms of results and strategies employed in the two player case. He essentially shows (2) and (4), and also proves a folk theorem for a class of prisoner’s dilemmas through a different line of attack from Piccione [15] or Ely and Välimäki [7], using a public randomization device. Mailath and Morris [11] is the first paper to deal with the n player case in the private monitoring framework. They show that a perfect equilibrium with public monitoring is robust to the introduction of private monitoring if players’ continuation strategies are approximately common knowledge after every history. A folk theorem can be obtained when information is almost public and almost perfect. Although the stage game in this paper has a more specific structure, the information structure allowed in this paper is not nested in their information structure. In particular, private signals can be independent across players in this paper. This paper is organized as follows. In Section 2, the model is described. In Section 3, the assumptions on the information structure are presented. Section 4 discusses the optimal action with respect to players’ beliefs and the belief dynamics generated by the equilibrium strategy proposed in this paper. A sequential equilibrium is constructed in Section 5. Section 6 gives a detailed characterization of the sequential equilibrium constructed in Section 5. Section 7 concludes with a comparison to other related papers.

2. The Model

Let $N = \{1, 2, \ldots, n\}$ be the set of players and $g$ be the stage game played by those players. The stage game $g$ is as follows. Player $i$ chooses an action $a_i$ from the action set $A_i = \{C, D\}$. Actions are taken simultaneously and are not observable to the other players. An $n$-tuple action profile is denoted by $a \in A = \prod_{i=1}^n A_i$, and a profile of all players’ actions but player $i$’s by $a_{-i} \in \prod_{j \neq i} A_j$. Each player receives a private signal about all the other players’ actions within that period. Let $\omega_i = (\omega_{i,1}, \ldots, \omega_{i,i-1}, \omega_{i,i+1}, \ldots, \omega_{i,n}) \in \{C, D\}^{n-1} = \Omega_i$ be a generic signal received by player $i$, where $\omega_{i,j}$ stands for the signal which player $i$ receives about player $j$’s action. A generic signal profile is denoted by $\omega = (\omega_1, \ldots, \omega_n) \in \Omega$. All players have the same payoff function $u$. Player $i$’s payoff $u(a_i, \omega_i)$ depends on her own action $a_i$ and private signal $\omega_i$. Other players’ actions affect player $i$’s payoff only through the distribution of the signal which player $i$ receives. The distribution conditional on $a$ is denoted by $p(\omega|a)$. It is assumed that the $p(\omega|a)$ are full support distributions, that is, $p(\omega|a) > 0$ for all $a$ and $\omega$. The space of systems of full support distributions $\{p(\omega|a)\}_{a \in A}$ is denoted by $P$. Since we are interested in the situation where information is almost perfect, we restrict attention to a subset of $P$ where information is almost perfect. Information is almost perfect when every player’s signal profile is equal to the actual action profile in that period with probability larger than $1 - \varepsilon$ for some small number $\varepsilon$. To sum up, the space of the information structure we mainly deal with in this

paper is the following subset of $P$:
$$P_\varepsilon = \left\{ \{p(\omega|a)\}_{a \in A} \in \mathbb{R}^{n \times (n-1) \times 2}_{++} \;\middle|\; \sum_{\omega} p(\omega|a) = 1 \text{ and } p(\omega|a) > 1 - \varepsilon \text{ if } \omega_i = a_{-i} \text{ for all } i,\ \forall a \right\}$$

and $p_\varepsilon$ is a generic element of $P_\varepsilon$. We also introduce the perfectly informative signal distribution $p^0 = \{p^0(\omega|a)\}_{a \in A}$, where, for any $a \in A$, $p^0(\omega|a) = 1$ if $\omega_i = a_{-i}$ for all $i$. The whole space of information structures $P \cup \{p^0\}$ is endowed with the Euclidean norm. The stage game payoff only depends on the number of signals $C$ and $D$ a player receives. Let $d(\omega_i)$ be the number of “$D$”s contained in $\omega_i$. Then $u(a_i, \omega'_i) = u(a_i, \omega''_i)$ if $d(\omega'_i) = d(\omega''_i)$ for any $a_i$. Let $u(a_i, D^k)$ be the payoff of player $i$ when $d(\omega_i) = k$. The deviation gain when $k$ defections are observed is $M(k) = u(D, D^k) - u(C, D^k) > 0$. The payoffs $\sum_{\omega} u(C, \omega_i)\, p(\omega|(C, \ldots, C))$ and $\sum_{\omega} u(D, \omega_i)\, p(\omega|(D, \ldots, D))$ are normalized to 1 and 0 respectively for all $i$. It is assumed that $(1, \ldots, 1)$ is an efficient stage game payoff. The largest and smallest deviation gains are $\overline{M}$ and $\underline{M}$ respectively, where $\overline{M} = \max_{1 \le k \le n-1} M(k)$ and $\underline{M} = \min_{1 \le k \le n-1} M(k)$. The stage game $g$ is repeated infinitely many times by $n$ players, who discount their payoffs with a common discount factor $\delta \in (0, 1)$. Time is discrete and denoted by $t = 1, 2, \ldots$. Player $i$’s private history is $h_i^t = ((a_i^1, \omega_i^1), \ldots, (a_i^{t-1}, \omega_i^{t-1}))$ for $t \ge 2$ and $h_i^1 = \emptyset$. Let $H_i^t$ be the set of all such histories $h_i^t$ and $H_i = \bigcup_{t=1}^{\infty} H_i^t$. Player $i$’s strategy is a sequence of mappings $\sigma_i = (\sigma_{i,1}, \sigma_{i,2}, \ldots)$, each $\sigma_{i,t}$ being a mapping from $H_i^t$ to probability measures on $A_i$. The discounted average payoff is $V_i(\sigma : p, \delta) = (1 - \delta) \sum_{t=1}^{\infty} \delta^{t-1} E[u(a_i^t, \omega_i^t) \mid \sigma, p]$, where the probability measure on $H_i$ is generated by $(\sigma, p)$. The least upper bound and greatest lower bound of the discounted average payoff are denoted by $\overline{V}$ and $\underline{V}$ respectively. For this $n$-player repeated prisoner’s dilemma, the grim trigger strategy $\sigma_C$ and the permanent defection $\sigma_D$ are defined as follows:
$$\sigma_C(h_i^t) = \begin{cases} C & \text{if } h_i^t = ((C, C), \ldots, (C, C)) \text{ or } t = 1 \\ D & \text{otherwise} \end{cases}$$
$$\sigma_D(h_i^t) = D \quad \text{for all } h_i^t \in H_i$$

We also use $\sigma_C$ or $\sigma_D$ for any continuation strategy which is identical to $\sigma_C$ or $\sigma_D$, that is, any continuation strategy from period $t + 1$ on such that $\sigma_i(h_i^{t+k}) = \sigma_{a_i}(h_i^k)$ for $k = 1, 2, \ldots$ and $a_i = C$ or $D$. Moreover, any continuation strategy which is realization equivalent to $\sigma_C$ or $\sigma_D$ is also denoted by $\sigma_C$ or $\sigma_D$ respectively.¹ This grim trigger strategy is the harshest one among all the variations of grim trigger strategies in the n player case. Players using $\sigma_C$ switch to $\sigma_D$ as soon as they observe any signal profile which is not fully cooperative. When player $i$ mixes $\sigma_C$ and $\sigma_D$ with probabilities $(q_i, 1 - q_i)$, this strategy is denoted by $q_i \sigma_C + (1 - q_i) \sigma_D$.

1 A strategy is realization equivalent to another strategy if the former generates the same outcome distribution as the latter independent of the other players’ strategies.
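As an illustration of the strategies just defined, the following sketch (ours, not the paper’s; the history encoding is an assumption) implements $\sigma_C$ and $\sigma_D$ as functions of a private history, a history being a list of (own action, signal profile) pairs:

```python
# Illustrative encoding (ours, not the paper's): a private history is a list of
# (own_action, signal_profile) pairs, a signal profile being the tuple of
# 'C'/'D' signals about the n - 1 opponents.

def sigma_C(history):
    """Grim trigger: play C iff every past own action and observed signal was C."""
    for own_action, signal_profile in history:
        if own_action != 'C' or any(s != 'C' for s in signal_profile):
            return 'D'
    return 'C'

def sigma_D(history):
    """Permanent defection: play D after every history."""
    return 'D'

assert sigma_C([]) == 'C'                    # cooperate in the first period
assert sigma_C([('C', ('C', 'C'))]) == 'C'   # fully cooperative history so far
assert sigma_C([('C', ('C', 'D'))]) == 'D'   # one bad signal pulls the trigger
assert sigma_D([]) == 'D'
```

Note how harsh this trigger is: a single $D$ signal about any single opponent is enough to switch to permanent defection.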

Suppose that $\sigma_C$ or $\sigma_D$ is chosen by all players. Let $\theta \in \Theta = \{0, 1, \ldots, n-1\}$ be the number of players using $\sigma_D$ as a continuation strategy among player $i$’s $n - 1$ opponents. Then a probability measure $q_{-i}(h_i^t; p)$ on the space $\Theta = \{0, 1, \ldots, n-1\}$ is derived conditional on the realization of the private history $h_i^t$.² Clearly, this measure also depends on the initial level of mixing between $\sigma_C$ and $\sigma_D$ by every player, but this dependence is not shown explicitly as it is obvious. In the two player case, a player’s strategy can be represented as a function of this belief, using the fact that the other player is always playing either $\sigma_C$ or $\sigma_D$ on and off the equilibrium path. Here the space of the other players’ “types” is much larger. However, there is a convenient way of summarizing the information further. We classify $\Theta$ into two sets: $\{0\}$ and $\{1, \ldots, n-1\}$, that is, the state where no one has ever switched to $\sigma_D$ and the state where there is at least one player who has already switched to $\sigma_D$. Player $i$’s conditional subjective probability that no player has started using $\sigma_D$ is denoted by $\phi(q_{-i}) = q_{-i}(h_i^t; p)(0)$. The reason why we focus on this number is that the exact number of players who are playing permanent defection does not make much difference to what will happen in the future given everyone’s strategy. As soon as someone starts playing $\sigma_D$, every other player starts playing $\sigma_D$ with very high probability from the very next period on by the assumption of almost perfect monitoring. What is important is not how many players have switched to $\sigma_D$, but whether anyone has switched to $\sigma_D$ or not. Let $V_i(\sigma_i, \theta : p, \delta)$ be player $i$’s discounted average payoff when $\theta$ other players are playing $\sigma_D$ and $n - \theta - 1$ other players are playing $\sigma_C$. This notation is justified under the assumption of a symmetric distribution, which is introduced in the next section. Finally, we need the following notation:

$$V_i(\sigma_i, q_{-i} : p, \delta) = \sum_{\theta=0}^{n-1} V_i(\sigma_i, \theta : p, \delta)\, q_{-i}(\theta)$$
$$M(q_{-i}; p) = \sum_{\theta=0}^{n-1} \left\{ U((D, D^\theta) : p) - U((C, D^\theta) : p) \right\} q_{-i}(\theta)$$

3. Information Structure

In this section, various assumptions on the information structure are proposed and discussed. In the following sections, a sequential equilibrium is constructed with a mixture of a grim trigger strategy and permanent defection, which achieves an approximately efficient outcome for some range of discount factors. As is the case with any equilibrium based on simple grim trigger strategies, this equilibrium satisfies the following property: players stick to the grim trigger strategy as long as they have an optimistic belief about the others, and they switch to permanent defection once

2 Here, $q_{-i}$ is a mapping from the space of private histories (and the monitoring technology $p$) to the space of probability measures on $\Theta$. However, we also use $q_{-i}$ as a point in the $n - 1$ dimensional simplex $\{(p_l)_{l=0}^{n-1} \in \mathbb{R}^n_+ \mid \sum_{l=0}^{n-1} p_l = 1\}$, where $p_l$ means the probability of the event $\theta = l$.

they become pessimistic, and never come back. This property is satisfied in games with perfect monitoring, but not so easily satisfied in games with imperfect private monitoring. In order to achieve a certain level of coordination, which is necessary for an equilibrium with trigger strategies, we impose some assumptions on $p(\omega|a)$ in addition to the assumption that it is almost perfect. The first assumption, which is maintained throughout this paper, is

Assumption 1. $p(\omega|a) = p\left((\omega_{\tau(i)\tau(j)}) \mid (a_{\tau(1)}, \ldots, a_{\tau(i)}, \ldots, a_{\tau(N)})\right)$ for any permutation $\tau: N \to N$.

This assumption implies the following. First, each player’s expected payoff is the same in the same situation, combined with the assumption on $u$. Second, only the numbers of $C$ and $D$ played matter in terms of expected payoffs. Finally, the dynamics of beliefs is the same for each player with the same sort of private history. This assumption allows us to treat agents symmetrically and justifies all the notation introduced in Section 2. Although Assumption 1 is strong enough to achieve an almost efficient outcome for two players, a stronger assumption is called upon to achieve similar results with more than two players. Let $\#(\omega|a)$ stand for the number of errors in $\omega$. The following assumption is strong enough for that purpose:

Assumption 2.

$p(\omega'|a) = p(\omega''|a)$ if $\#(\omega'|a) = \#(\omega''|a)$ for any $\omega', \omega'' \in \Omega$, $a \in A$

A couple of remarks on these assumptions are in order. First, Assumption 1 is a relatively weak assumption about the symmetry of a signal distribution and is satisfied in most of the referenced papers which analyze the repeated prisoner’s dilemma with almost perfect private monitoring. Second, while Assumption 2 is much stronger than Assumption 1 in general, it is very close to Assumption 1 in the two player case. Consequently, this assumption is also satisfied in those papers, as most of them concentrate on the two player case. Assumption 2 means that the probability of a signal profile only depends on the number of errors contained in that profile. For example, given that everyone is playing $C$, the probability that one player receives two “$D$” signals while the other players get correct signals is equal to the probability that two players each receive one “$D$” signal while the rest of the players get correct signals.

For example, the following information structure satisfies Assumption 2 for general n:

• Example: Totally Decomposable Case

$p(\omega|a) = \prod_i \prod_{j \neq i} p_j(\omega_{i,j}|a_j)$ for all $a \in A$ and $\omega \in \Omega$, where $p_j(\omega_{i,j}|a_j) = 1 - \varepsilon$ if $\omega_{i,j} = a_j$. Given the action by player $j$, the probability that player $i \neq j$ receives the right signal or the wrong signal about player $j$’s action is the same across $i \neq j$. Also note that players’ signals are conditionally independent across players.
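The claim that this example satisfies Assumption 2 can be verified by brute force for a small case. The following sketch (ours, not the paper’s) enumerates all signal profiles for $n = 3$ under the totally decomposable distribution and checks that profiles with the same number of errors have the same probability:

```python
from itertools import product

# n = 3 players, all playing C; omega maps (observer i, observed j) to a signal.
n, eps, a = 3, 0.1, ('C', 'C', 'C')
pairs = [(i, j) for i in range(n) for j in range(n) if i != j]

def p_decomposable(omega, eps):
    """Totally decomposable distribution: each signal is independently correct
    with probability 1 - eps (so, with binary signals, wrong with prob. eps)."""
    prob = 1.0
    for (i, j), s in omega.items():
        prob *= (1 - eps) if s == a[j] else eps
    return prob

def num_errors(omega):
    return sum(1 for (i, j), s in omega.items() if s != a[j])

by_errors = {}
for signals in product('CD', repeat=len(pairs)):
    omega = dict(zip(pairs, signals))
    by_errors.setdefault(num_errors(omega), set()).add(
        round(p_decomposable(omega, eps), 12))

# Assumption 2: every profile with the same error count has the same probability.
assert all(len(probs) == 1 for probs in by_errors.values())
```

With 6 opponent-signal slots, the probability of a profile with $k$ errors is $(1-\varepsilon)^{6-k}\varepsilon^k$, which indeed depends only on $k$.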

4. Belief Dynamics and the Best Response

We represent our equilibrium strategy as a mapping from beliefs to actions in $\Delta\{C, D\}$. More explicitly, our strategy takes the following form:
$$\rho(q_{-i}) = \begin{cases} C & \text{if } q_{-i} \in Q^C \\ q & \text{if } q_{-i} \in Q^I \\ D & \text{if } q_{-i} \in Q^D \end{cases}$$

where $Q^C, Q^I, Q^D$ are mutually exclusive and exhaustive sets in the space of probability measures on $\Theta$, and $q$ means playing $C$ with probability $q$ and playing $D$ with probability $1 - q$. The steps to achieve the approximately efficient payoff are as follows. First, we give an almost complete characterization of the unique optimal action as a function of beliefs. As a next step, we analyze the dynamics of beliefs when players are playing either $\sigma_C$ or $\sigma_D$. Since the dynamics of beliefs in the $n$-player case has some particular features which do not appear in the two player case, we analyze it in detail. The third step is to construct a sequential equilibrium. We first have to find $\rho(q_{-i})$ which assigns the optimal action for any possible belief on and off the equilibrium path. For this purpose, we strengthen the characterization result of the first step to get this function $\rho(q_{-i})$. In order to prove that $\rho(q_{-i})$ is a sequential equilibrium, we also have to check consistency, that is, to check whether players are actually playing either $\sigma_C$ or $\sigma_D$ on the equilibrium path by following $\rho(q_{-i})$. Since this sequential equilibrium is constructed for a discount factor in the middle range, as the final step, we divide the original repeated game into component repeated games or use a public randomization device to implement the same payoff by effectively reducing a high $\delta$.

4.1. Belief and the Best Response

Before analyzing the unique optimal action, we first extend one property which obviously holds in the two player case to the $n$-player case. In the two player case, the difference between the payoffs of $\sigma_C$ and $\sigma_D$ is linear in the belief and, with perfect monitoring, there is a unique $q^*(\delta, p^0)$ where a player is indifferent between $\sigma_C$ and $\sigma_D$ if $\delta$ is high enough, as shown in Figure 1. When private monitoring is almost perfect, there is $q^*(\delta, p_\varepsilon)$ where a player is indifferent between $\sigma_C$ and $\sigma_D$, and $q^*(\delta, p_\varepsilon) \to q^*(\delta, p^0)$ as $\varepsilon \to 0$.

Put Figure 1 here.

When the number of players is more than two, the corresponding object $V_i(\sigma_C, q_{-i} : p^0, \delta) - V_i(\sigma_D, q_{-i} : p^0, \delta)$ for the $n$-player case is slightly more complex. Even when players randomize independently and symmetrically, playing $C$ with probability $q$ and $D$ with probability $1 - q$, that is, $q_{-i}(\theta) = \binom{n-1}{\theta} (1-q)^\theta q^{n-1-\theta}$ for $\theta = 0, \ldots, n-1$, it is a degree $n - 1$ polynomial in $q \in [0, 1]$. Potentially, this equation may have $n - 1$ solutions between 0 and 1, as shown in Figure 2. We define $q^*(\delta, p^0)$ to be the solution of $V_i(\sigma_C, q_{-i} : p^0, \delta) - V_i(\sigma_D, q_{-i} : p^0, \delta) = 0$ which is closest to 0, where $q_{-i}(\theta) = \binom{n-1}{\theta} (1-q)^\theta q^{n-1-\theta}$.

Put Figure 2 here.
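The binomial belief used above is easy to sanity-check numerically. A small sketch (ours; the function name is an assumption) computes $q_{-i}(\theta) = \binom{n-1}{\theta}(1-q)^\theta q^{n-1-\theta}$ and confirms it is a probability distribution on $\Theta$:

```python
from math import comb

def binomial_belief(q, n):
    """Belief q_{-i} over theta (the number of opponents playing sigma_D) when
    each of the n - 1 opponents independently plays sigma_C with probability q."""
    return [comb(n - 1, theta) * (1 - q) ** theta * q ** (n - 1 - theta)
            for theta in range(n)]

# With n = 4, the highest power of q is n - 1 = 3, so any expected payoff
# difference sum_theta c_theta * q_{-i}(theta) is a cubic in q, which can
# cross zero up to n - 1 times on [0, 1], as the text notes.
belief = binomial_belief(0.7, 4)
assert abs(sum(belief) - 1) < 1e-12        # a probability distribution on Theta
assert binomial_belief(1.0, 4)[0] == 1.0   # all cooperate => theta = 0 for sure
```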

When monitoring is almost perfect, $V_i(\sigma_C, q_{-i} : p_\varepsilon, \delta) - V_i(\sigma_D, q_{-i} : p_\varepsilon, \delta)$ is very close to $V_i(\sigma_C, q_{-i} : p^0, \delta) - V_i(\sigma_D, q_{-i} : p^0, \delta)$. Actually, it is easy to confirm that the former converges to the latter uniformly in $q_{-i}$ as $\varepsilon \to 0$.³ If $\delta > \frac{M(0)}{1 + M(0)}$, then $V_i(\sigma_C, 0 : p^0, \delta) - V_i(\sigma_D, 0 : p^0, \delta) > 0$, which implies that there exists $q^*(\delta, p^0)$ between 0 and 1. The following lemma extends a useful property of the two player case to the $n$-player case.

Lemma 1. $q^*(\delta, p^0) \to 1$ as $\delta \downarrow \frac{M(0)}{1 + M(0)}$

Proof. See Appendix.

Now we show that the optimal action as a function of $q_{-i}$ is still similar to the one with perfect monitoring if the dynamics of $q_{-i}$ is very close to the dynamics of $q_{-i}$ with perfect monitoring. As a first step, the following lemma shows that $\sigma_D$ is still optimal if a player knows that someone has switched to permanent defection and $\varepsilon$ is small.

Lemma 2. There exists an $\hat{\varepsilon} > 0$ such that $V_i(\sigma_i, q_{-i} : p_\varepsilon, \delta)$ is maximized by $\sigma_D$ for any $p_\varepsilon$ with $\varepsilon \in (0, \hat{\varepsilon})$ if $q_{-i}(\theta) = 1$ for any $\theta \neq 0$.

3 Also note that the convergence of $V_i(\sigma_i, q_{-i} : p_\varepsilon)$ to $V_i(\sigma_i, q_{-i} : p^0)$ is independent of the choice of the associated sequence $\{p_\varepsilon\}$ because of the definition of $P_\varepsilon$.

Proof. Take $\sigma_D$ and any strategy which starts with $C$. The least deviation gain is $(1 - \delta)\underline{M}$. The largest loss caused by the difference between the continuation payoffs of $\sigma_D$ and the latter strategy is $\delta \varepsilon \overline{V}$. Setting $\hat{\varepsilon}$ small enough guarantees that $(1 - \delta)\underline{M} > \delta \varepsilon \overline{V}$ for any $\varepsilon \in (0, \hat{\varepsilon})$. Then, $D$ must be the optimal action for any such $\varepsilon$. Since players are using permanent defection, $q_{-i}(\theta) = 1$ for some $\theta \neq 0$ in the next period. This implies that $D$ is the optimal action in all the following periods. ∎
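The sufficient condition used in this proof, that the one-period deviation gain dominates the maximal continuation loss, can be rearranged into an explicit bound on $\hat{\varepsilon}$. A quick numerical check with illustrative numbers (ours, not taken from the paper):

```python
def eps_hat(delta, M_low, V_up):
    """Bound on eps suggested by the proof of Lemma 2: the one-period deviation
    gain (1 - delta) * M_low must exceed the continuation loss delta*eps*V_up,
    which holds for every eps strictly below this threshold."""
    return (1 - delta) * M_low / (delta * V_up)

# Illustrative parameter values (not from the paper).
delta, M_low, V_up = 0.9, 0.5, 2.0
threshold = eps_hat(delta, M_low, V_up)
for eps in (threshold * 0.5, threshold * 0.99):
    assert (1 - delta) * M_low > delta * eps * V_up
```

The bound shrinks as $\delta \to 1$: a patient player loses more from a wrong continuation, so the monitoring must be correspondingly more accurate.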

Using $p(\cdot|\cdot)$ and given the fact that all players are playing either $\sigma_C$ or $\sigma_D$, we introduce a notation for the transition probability of the number of players who have switched to $\sigma_D$. Let $\pi(l|m)$ be the probability that $l$ players will play $\sigma_D$ from the next period on when $m$ players are playing $\sigma_D$ now. In other words, $\pi(l|m)$ is the probability that $l - m$ players playing $C$ receive the signal $D$ when $n - m$ players play $C$ and $m$ players play $D$. Of course, $\pi(l|m) > 0$ if $l \ge m$ and $\pi(l|m) = 0$ if $l < m$. The following lemma, which is a sort of maximum theorem, provides various informative and useful bounds on the variations of discounted average payoffs caused by introducing a small imperfection into private monitoring.

Lemma 3.
1. $\inf_{p_\varepsilon \in P_\varepsilon} V_i(\sigma_C, 0 : p_\varepsilon, \delta) \ge \dfrac{(1 - \delta) + \delta \varepsilon \underline{V}}{1 - \delta(1 - \varepsilon)}$
2. Given $\delta \in \left(\frac{M(0)}{1 + M(0)}, 1\right)$, there exists an $\overline{\varepsilon} > 0$ such that for any $\varepsilon \in [0, \overline{\varepsilon}]$, $\sup_{\sigma_i, p_\varepsilon \in P_\varepsilon} V_i(\sigma_i, 0 : p_\varepsilon, \delta) \le \dfrac{1 - \delta + \delta \varepsilon \overline{V}}{1 - \delta(1 - \varepsilon)}$

Proof. (1): For any $\varepsilon \in (0, 1)$ and $p_\varepsilon \in P_\varepsilon$,
$$V_i(\sigma_C, 0 : p_\varepsilon, \delta) \ge (1 - \delta) + \delta \pi(0|0)\, V_i(\sigma_C, 0 : p_\varepsilon, \delta) + \delta(1 - \pi(0|0))\, \underline{V}$$
So,
$$V_i(\sigma_C, 0 : p_\varepsilon, \delta) \ge \frac{(1 - \delta) + \delta(1 - \pi(0|0)) \underline{V}}{1 - \delta \pi(0|0)} \ge \frac{(1 - \delta) + \delta \varepsilon \underline{V}}{1 - \delta(1 - \varepsilon)}$$
(2): Given $\delta \in \left(\frac{M(0)}{1 + M(0)}, 1\right)$, it is easy to check that $V_i(\sigma_C, 0 : p^0, \delta) > V_i(\sigma_D, 0 : p^0, \delta)$. Pick $\overline{\varepsilon}$ small enough such that (i) $V_i(\sigma_C, 0 : p_\varepsilon, \delta) > V_i(\sigma_D, 0 : p_\varepsilon, \delta)$ for any $p_\varepsilon$ and (ii) $\overline{\varepsilon} < \hat{\varepsilon}$. Let $\sigma_0^*$ be the optimal strategy given that everyone else is using $\sigma_C$.⁴ Suppose that $\sigma_0^*$ assigns $D$ for the first period. Then for any $\varepsilon \in [0, \overline{\varepsilon}]$,
$$V_i(\sigma_0^*, 0 : p_\varepsilon) \le (1 - \delta)\, U((D, D^0) : p_\varepsilon) + \delta \left\{ \pi(1|1)\, V_i(\sigma_0^*, 0 : p_\varepsilon, \delta) + \sum_{\theta=2}^{n} \pi(\theta|1)\, V_i(\sigma_D, \theta - 1 : p_\varepsilon, \delta) \right\}$$

4 This $\sigma_0^*$ exists because the strategy space is compact in the product topology, on which discounted average payoff functions are continuous. Of course, this $\sigma_0^*$ depends on the choice of $p_\varepsilon$.

In this inequality, the second component represents what player $i$ could get if she knew the true continuation strategies of her opponents at each possible state. To see that this additional information is valuable, suppose that the continuation strategy of $\sigma_0^*$ led to a higher expected payoff than $V_i(\sigma_0^*, 0 : p_\varepsilon, \delta)$ or $V_i(\sigma_D, \theta - 1 : p_\varepsilon, \delta)$ at the corresponding states; then this would contradict the optimality of $\sigma_0^*$ or $\sigma_D$ by Lemma 2. So this inequality holds. Then, for any $\varepsilon \in [0, \overline{\varepsilon}]$,

$$V_i(\sigma_0^*, 0 : p_\varepsilon, \delta) \le \frac{(1 - \delta)\, U((D, D^0) : p_\varepsilon) + \delta \sum_{\theta=2}^{n} \pi(\theta|1)\, V_i(\sigma_D, \theta - 1 : p_\varepsilon, \delta)}{1 - \delta \pi(1|1)} = V_i(\sigma_D, 0 : p_\varepsilon, \delta) < V_i(\sigma_C, 0 : p_\varepsilon, \delta)$$
Since this contradicts the optimality of $\sigma_0^*$, $\sigma_0^*$ has to assign $C$ for the first period. Now,

$$V_i(\sigma_0^*, 0 : p_\varepsilon, \delta) \le (1 - \delta) + \delta \pi(0|0)\, V_i(\sigma_0^*, 0 : p_\varepsilon, \delta) + \delta(1 - \pi(0|0))\, \overline{V}$$

So,
$$V_i(\sigma_0^*, 0 : p_\varepsilon, \delta) \le \frac{(1 - \delta) + \delta(1 - \pi(0|0)) \overline{V}}{1 - \delta \pi(0|0)} \le \frac{(1 - \delta) + \delta \varepsilon \overline{V}}{1 - \delta(1 - \varepsilon)}$$
This implies that $\sup_{\sigma_i, p_\varepsilon \in P_\varepsilon} V_i(\sigma_i, 0 : p_\varepsilon, \delta) \le \frac{1 - \delta + \delta \varepsilon \overline{V}}{1 - \delta(1 - \varepsilon)}$ for any $\varepsilon \in [0, \overline{\varepsilon}]$. ∎
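The two bounds in Lemma 3 squeeze the equilibrium-path payoff from both sides as $\varepsilon \to 0$. A small numerical sketch (illustrative numbers, ours, not from the paper) makes this concrete:

```python
def lower_bound(delta, eps, V_low):
    """Lemma 3.1-style lower bound on V_i(sigma_C, 0 : p_eps, delta)."""
    return (1 - delta + delta * eps * V_low) / (1 - delta * (1 - eps))

def upper_bound(delta, eps, V_up):
    """Lemma 3.2-style upper bound on the sup over all strategies."""
    return (1 - delta + delta * eps * V_up) / (1 - delta * (1 - eps))

# Illustrative numbers (not from the paper): the normalization puts the
# cooperative payoff at 1, so both bounds should squeeze toward 1 as eps -> 0.
delta, V_low, V_up = 0.9, 0.0, 1.5
gaps = [upper_bound(delta, e, V_up) - lower_bound(delta, e, V_low)
        for e in (1e-2, 1e-4, 1e-6)]
assert gaps == sorted(gaps, reverse=True) and gaps[-1] < 1e-4
```

The gap between the bounds is $\delta\varepsilon(\overline{V} - \underline{V}) / (1 - \delta(1-\varepsilon))$, which vanishes linearly in $\varepsilon$.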

(1) means that a small departure from perfect monitoring does not reduce the payoff of $\sigma_C$ much when all the other players are using the grim trigger strategy. (2) means that there is not much to be exploited by using strategies other than $\sigma_C$ with a small imperfection in the private signals, as long as all the other players are using the grim trigger strategy. The following result shows that the unique optimal action is almost completely characterized as a function of $q_{-i}$, except on an arbitrarily small neighborhood, and is equivalent to the optimal action with perfect monitoring.

Proposition 1. Given $\delta$, for any $\eta > 0$ there exists an $\overline{\varepsilon} > 0$ such that for any $p_\varepsilon$:

• it is not optimal for player $i$ to play $C$ if $q_{-i}$ satisfies $\phi(q_{-i}) \le \frac{1 - \delta}{\delta} M(q_{-i}, p^0) - \eta$

• it is not optimal for player $i$ to play $D$ if $q_{-i}$ satisfies $\phi(q_{-i}) \ge \frac{1 - \delta}{\delta} M(q_{-i}, p^0) + \eta$

Proof: (1): It is not optimal to play C if

$$(1 - \delta)\, M(q_{-i}; p_\varepsilon) > \delta \left[ \phi(q_{-i}) \left\{ (1 - \varepsilon) \sup_{\sigma_i} V_i(\sigma_i, 0 : p_\varepsilon, \delta) + \varepsilon \overline{V} \right\} + (1 - \phi(q_{-i}))\, \varepsilon \overline{V} \right]$$

By Lemma 3.2, this inequality is satisfied for any $\varepsilon \in [0, \overline{\varepsilon}]$ and any $p_\varepsilon$ if

$$(1 - \delta)\, M(q_{-i}; p_\varepsilon) > \delta \left[ \phi(q_{-i}) \left\{ (1 - \varepsilon)\, \frac{1 - \delta + \delta \varepsilon \overline{V}}{1 - \delta(1 - \varepsilon)} + \varepsilon \overline{V} \right\} + (1 - \phi(q_{-i}))\, \varepsilon \overline{V} \right]$$

The LHS converges to $(1 - \delta)\, M(q_{-i}; p^0)$ and the RHS converges to $\delta \phi(q_{-i})$ as $\varepsilon \to 0$. So, if $q_{-i}$ satisfies $\phi(q_{-i}) \le \frac{1 - \delta}{\delta} M(q_{-i}, p^0) - \eta$ for some $\eta > 0$, then there exist an $\varepsilon'(\delta, \eta, q_{-i}) \in (0, \overline{\varepsilon})$ and a neighborhood $B(q_{-i})$ of $q_{-i}$ such that $C$ is not optimal for any $p_{\varepsilon'(\delta, \eta, q_{-i})}$ and any $q'_{-i} \in B(q_{-i})$. This $\varepsilon'(\delta, \eta, q_{-i}) > 0$ can be set independent of $q_{-i}$ by the standard arguments because $q_{-i}$ lies in a compact space.

(2): It is not optimal to play $D$ if

$$(1 - \delta)\, M(q_{-i}; p_\varepsilon) < \delta \left[ \phi(q_{-i}) \left\{ (1 - \varepsilon)\, V_i(\sigma_C, 0 : p_\varepsilon, \delta) + \varepsilon \underline{V} \right\} + (1 - \phi(q_{-i}))\, \varepsilon \underline{V} - \varepsilon \overline{V} \right]$$

This inequality is satisfied for any $\varepsilon \in (0, 1)$ and any $p_\varepsilon$ if

$$(1 - \delta)\, M(q_{-i}; p_\varepsilon) < \delta \left[ \phi(q_{-i}) \left\{ (1 - \varepsilon)\, \frac{1 - \delta + \delta \varepsilon \underline{V}}{1 - \delta(1 - \varepsilon)} + \varepsilon \underline{V} \right\} + (1 - \phi(q_{-i}))\, \varepsilon \underline{V} - \varepsilon \overline{V} \right]$$

This inequality converges to $\phi(q_{-i}) \ge \frac{1 - \delta}{\delta} M(q_{-i}, p^0)$ as $\varepsilon \to 0$. So, if $q_{-i}$ satisfies $\phi(q_{-i}) \ge \frac{1 - \delta}{\delta} M(q_{-i}, p^0) + \eta$ for some $\eta > 0$, there exists an $\varepsilon''(\delta, \eta, q_{-i})$ such that $D$ is not optimal for any $p_\varepsilon$ with $\varepsilon \in (0, \varepsilon''(\delta, \eta, q_{-i}))$ and any $q'_{-i}$ around $q_{-i}$. Again, $\varepsilon''(\delta, \eta, q_{-i})$ can be set independent of $q_{-i}$. Finally, setting $\overline{\varepsilon}(\delta, \eta) = \min\{\varepsilon'(\delta, \eta), \varepsilon''(\delta, \eta)\}$ completes the proof. ∎

This proposition implies that the optimal action can be completely characterized except on an arbitrarily small neighborhood of the manifold satisfying $\phi(q_{-i}) = \frac{1 - \delta}{\delta} M(q_{-i}, p^0)$ in the $n - 1$ dimensional simplex, where player $i$ is indifferent between $\sigma_C$ and $\sigma_D$ with perfect monitoring. Note that this unique optimal action is equivalent to the one with perfect monitoring. Since the continuation payoffs with private monitoring converge to the continuation payoffs with perfect monitoring as monitoring gets accurate, a player prefers $C$ ($D$) to $D$ ($C$) with private monitoring if and only if she prefers $C$ ($D$) to $D$ ($C$) with perfect monitoring, except in the region where she is almost indifferent between $C$ and $D$ with perfect monitoring. A slight noise still matters in this region, but we also characterize the best response for this region later. Although this argument is essentially the path dominance argument used in Sekiguchi [16] for $n = 2$, it extends that argument to the $n$-player case and provides a sharper characterization even for $n = 2$. In fact, this proposition is the reason why an almost efficient outcome can be achieved for any prisoner’s dilemma in the two player case. An immediate corollary of this proposition is that $C$ is the unique optimal action given that $\phi$ is sufficiently close to 1, $\delta > \frac{M(0)}{1 + M(0)}$, and $\varepsilon$ is small:

Corollary 1. Given $\delta > \frac{M(0)}{1 + M(0)}$, there exist $\underline{\phi} > 0$ and $\overline{\varepsilon} > 0$ such that for any $p_\varepsilon$ with $\varepsilon \in (0, \overline{\varepsilon})$, it is not optimal for player $i$ to play $D$ if $\phi \in [\underline{\phi}, 1]$.

4.2. Belief Dynamics

Since the optimal action is almost characterized as a function of $q_{-i}$, we next analyze the dynamics of $q_{-i}$ associated with grim trigger strategies. Given the optimal action shown above, all we have to make sure of for the consistency of grim trigger strategies is that $q_{-i}$ stays in the “$C$ area” described by Proposition 1 as long as player $i$ has observed fully cooperative signals from the beginning, and that $q_{-i}$ stays in the “$D$ area” once player $i$ has received a bad signal or started playing defection herself. Assumption 2 on the signal distribution is required here for the first time, as the following arguments show. First, consider a history where player $i$ observes some defection for the first time. If this is the first period, player $i$ interprets this as a signal of $\sigma_D$ rather than as an error if $\varepsilon$ is small.⁵ Suppose next that this kind of history is reached after the first period. Also suppose that the number of players is three for simplicity and that player 1 observes one defection by player 2. With Assumption 2, player 1 can interpret this as a 1-error event and still believe that everyone is cooperative. On the other hand, it is equally likely that player 2’s observation contained one error in the last period and the current signal is correct. Note that there are two such events: the player for whom player 2 observed “$D$” last period can be player 1 or player 3. Since someone should have already defected after all other possible histories, the probability that no one has switched to $\sigma_D$ is at most $\frac{1}{3}$. Obviously, this flexibility of interpretation increases as the number of players increases, which makes it easier to move $\phi$ closer to 0 after this sort of history. Note that this upper bound on $\phi$ does not depend on the level of $\varepsilon$. Second, consider a history where player $i$ has already started defection. Suppose that all players but player $i$ have been cooperative until the present. Also suppose again that the number of players is three and $i = 1$ for the sake of simple exposition.

5 We need to let players randomize between $\sigma_C$ and $\sigma_D$ in the initial period for this argument to work in the initial period. If players start with, say, $\sigma_C$ with probability 1, no learning occurs after the initial period. This was first observed by Matsushima [12]. Note that this is the only reason why the initial randomization is needed. The rest of the argument does not depend on this initial randomization.

For everyone to be still cooperative after the current period, all players but player 1 should observe the wrong signal $C$ about player 1 and the correct signal $C$ about the other players in the current period. Again, there are other events with the same probability, where some player switches to $\sigma_D$. For example, player 2 may observe the correct signal $D$ about player 1 and the wrong signal $D$ about player 3. This event contains the same number of errors. Since there are 5 such events, the probability that at least one player has switched to $\sigma_D$ is at least $\frac{5}{6}$, even though it is assumed that all players but player $i$ have been cooperative until the current period. With positive probability that someone has already started defection, the posterior $\phi$ is strictly less than $\frac{1}{6}$.⁶ This argument is again independent of the level of $\varepsilon$. The above arguments strongly suggest that once players start playing $\sigma_D$, they continue to do so. The following proposition summarizes the above arguments and also provides another useful property to support $\sigma_C$. Let $W_k = \{q_{-i} \mid \phi(q_{-i}) \ge \frac{1 - \delta}{\delta} M(q_{-i}, p^0) - k\}$, where $k > 0$ and $0 < \frac{1 - \delta}{\delta} \underline{M} - (n - 1)k$. When $k$ is small, $W_k$ just covers the region where $C$ is the unique optimal action and the region where the unique optimal action is indeterminate. We show that $\phi_t > \underline{\phi}$ if a player plays $C$, observes all-$C$ signals, and the previous belief lies in $W_k$.

Proposition 2. Suppose that every player plays qσ_C + (1−q)σ_D with q ∈ (0, 1) in the first period, and that (i) Assumption 1 is satisfied and n = 2, or (ii) Assumption 2 is satisfied. Then for all i and t = 2, 3, ...:

• For any φ₀ > 0 and k > 0, there exists ε̄ > 0 such that for any ε ∈ (0, ε̄), φ(q_{-i}(h_i^{t+n})) ≥ φ₀ for h_i^{t+n} = (h_i^{t−1}, (C, C), ..., (C, C)) when q_{-i}(h_i^{t−1}) ∈ W_k.

• φ(q_{-i}(h_i^t)) ≤ 1/n after histories such as

  h_i^t = (h_i^{t−1}, (C, ω_i^{t−1})) for t ≥ 3, or h_i^t = (C, ω_i^{t−1}) for t = 2, with ω_i^{t−1} ≠ C,

or

  h_i^t = (h_i^{t−1}, (D, ω_i^{t−1})) for t ≥ 3, or h_i^t = (D, ω_i^{t−1}) for t = 2.

Proof. See appendix.

6 This last argument is specific to the case of n ≥ 3 players. The two player case has to be treated separately; see Sekiguchi [16] for that case.
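The event count in the second argument above (one all-clear event against five equally likely switching events when n = 3) can be verified by brute-force enumeration. The sketch below is ours, not the paper's: it fixes the true action profile (player 1 defects, players 2 and 3 cooperate) and counts the signal profiles containing exactly two errors, the minimum number consistent with nobody observing a defection.

```python
from itertools import product

# True actions: player 1 has switched to D; players 2 and 3 still cooperate.
true_action = {1: "D", 2: "C", 3: "C"}
# Each cooperator observes one private signal about each other player.
observations = [(2, 1), (2, 3), (3, 1), (3, 2)]  # (observer, observed)

# All signal profiles containing exactly two errors -- the minimum needed
# for BOTH cooperators to mis-see player 1's defection as cooperation.
two_error_profiles = [
    signals
    for signals in product("CD", repeat=len(observations))
    if sum(s != true_action[obs] for s, (_, obs) in zip(signals, observations)) == 2
]

all_clear = [p for p in two_error_profiles if set(p) == {"C"}]  # nobody sees a D
triggering = [p for p in two_error_profiles if "D" in p]        # someone switches

print(len(two_error_profiles), len(all_clear), len(triggering))  # 6 1 5
```

Conditional on a two-error event, the probability that at least one player switches to σ_D next period is therefore 5/6, matching the bound in the text.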

5. Sequential Equilibrium with the “Grim Trigger” Strategy

As in section 2, the equilibrium strategy is represented as a mapping from the space of beliefs Q to Δ{C, D}. The following notations are useful:

Q^C_ε = { q_{-i} | V_i(σ_C, q_{-i} : p_ε, δ) > V_i(σ_D, q_{-i} : p_ε, δ) }
Q^I_ε = { q_{-i} | V_i(σ_C, q_{-i} : p_ε, δ) = V_i(σ_D, q_{-i} : p_ε, δ) }
Q^D_ε = { q_{-i} | V_i(σ_C, q_{-i} : p_ε, δ) < V_i(σ_D, q_{-i} : p_ε, δ) }
Q^n = { q_{-i} | φ(q_{-i}) ≤ 1/n }

These are subsets of the (n−1)-dimensional simplex Θ. The subset Q^n is a set containing the absorbing set of the dynamics of q_{-i} under the grim trigger. Q^I_ε is a manifold where player i is indifferent between σ_C and σ_D; in particular, q*(δ, p_ε) ∈ Q^I_ε by definition. Note that Q^C_ε converges to Q^C_0 = { q_{-i} | φ(q_{-i}) > ((1−δ)/δ) M(q_{-i}, p_0) } and Q^D_ε converges to Q^D_0 = { q_{-i} | φ(q_{-i}) < ((1−δ)/δ) M(q_{-i}, p_0) } as ε → 0.

Now define ρ*_ε as a mapping from q_{-i} ∈ Q to Δ{C, D} in the following way:

ρ*_ε(q_{-i}) = C if q_{-i} ∈ Q^C_ε; = q*(δ, p_ε) if q_{-i} ∈ Q^I_ε; = D if q_{-i} ∈ Q^D_ε.

We know from Proposition 1 that this function assigns the best response action almost everywhere except for a neighborhood of Q^I_ε when ε is small. Thanks to the last proposition about the dynamics of beliefs, we can now verify that ρ*_ε(q_{-i}) actually assigns the unique optimal action for any belief q_{-i} ∉ Q^I_ε under one weak assumption.

Proposition 3. Given δ, if Q^n ⊂ Q^D_0 and ε is small enough, then

• C is the unique optimal action if and only if q_{-i} ∈ Q^C_ε,

• D is the unique optimal action if and only if q_{-i} ∈ Q^D_ε.

Proof: Fix η > 0 as in Proposition 1 and set k > 0 slightly larger than η. Take any q_{-i} such that φ(q_{-i}) > ((1−δ)/δ) M(q_{-i}, p_0) − k and q_{-i} ∈ Q^D_ε. We prove that D is the unique optimal action in this region. If a player plays C, then Proposition 2 implies that the continuation strategy is σ_C if ε is sufficiently small. Since σ_C is dominated by σ_D in this region, the unique optimal action should be D. Similarly, take any q_{-i} such that ((1−δ)/δ) M(q_{-i}, p_0) + η > φ(q_{-i}) and q_{-i} ∈ Q^C_ε. If D is played, then Proposition 2 and the assumption Q^n ⊂ Q^D_0 imply that the continuation strategy is σ_D as long as ε is small enough. Since σ_D is dominated by σ_C in this region, the unique optimal action should be C. Since any other q_{-i} ∈ Q^D_ε satisfies φ(q_{-i}) ≤ ((1−δ)/δ) M(q_{-i}, p_0) − k, and any other q_{-i} ∈ Q^C_ε satisfies φ(q_{-i}) ≥ ((1−δ)/δ) M(q_{-i}, p_0) + η, for small ε > 0 the proof is complete. ∎
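As an illustration of the limit (ε → 0) decision rule behind ρ*_ε, here is a minimal sketch. It is ours, not the paper's: it reads φ(q_{-i}) as the probability q_{-i}(0) that no opponent has switched, and stands in for M(q_{-i}, p_0) with the expected deviation gain Σ_θ q_{-i}(θ) M(θ), both modeling shortcuts on our part.

```python
def classify(q, M, delta):
    """Limit (epsilon -> 0) sketch of the map rho*_eps from beliefs to actions.

    q:     q[theta] = subjective probability that theta opponents have
           switched to permanent defection (theta = 0, ..., n-1)
    M:     M[theta] = deviation gain when theta opponents defect
    delta: discount factor

    Returns "C" on Q^C_0, "D" on Q^D_0, "indifferent" on the boundary Q^I_0.
    """
    phi = q[0]  # probability that everyone is still cooperative
    expected_gain = sum(qt * mt for qt, mt in zip(q, M))  # stands in for M(q, p_0)
    threshold = (1 - delta) / delta * expected_gain
    if phi > threshold:
        return "C"
    if phi < threshold:
        return "D"
    return "indifferent"
```

For instance, with two players, constant gain M ≡ 1 and δ = 0.6, the belief q_{-i} = (0.9, 0.1) lies in the C region, since (1−δ)/δ = 2/3 < 0.9.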

The last thing we have to make sure of is that players are actually playing either σ_C or σ_D on the equilibrium path after they initially randomize between these strategies. It is almost obvious from the arguments we used to prove the propositions. The following proposition shows that ρ*_ε(q_{-i}) is a symmetric sequential equilibrium, which generates the same outcome distribution as (..., q*(δ, p_ε)σ_C + (1 − q*(δ, p_ε))σ_D, ...).

Proposition 4. Suppose that (i) Assumption 1 is satisfied and n = 2, or (ii) Assumption 2 is satisfied. Given δ ∈ (M(0)/(1+M(0)), 1), if Q^n ⊂ Q^D_0, then there is an ε̄ > 0 such that for any ε ∈ (0, ε̄), ρ*_ε(q_{-i}) is a symmetric sequential equilibrium, which generates the same outcome distribution as (..., q*(δ, p_ε)σ_C + (1 − q*(δ, p_ε))σ_D, ...).

Proof. Since δ ∈ (M(0)/(1+M(0)), 1), q*(δ, p_ε) exists in (0, 1) if ε is small enough; Q^n ⊂ Q^D_ε is also satisfied for small ε. Suppose that player i chooses σ_C as her strategy in the first period. Set ε > 0 small enough for Corollary 1 and Proposition 2 to hold. As long as (C, C) is the outcome, φ > φ₀ and C is the unique optimal action by Corollary 1 and Proposition 2. Consider a history where player i observed some D for the first time, or a history where player i started playing D. After this sort of history, q_{-i}(h_i^t), with φ_i^t = φ(q_{-i}(h_i^t)), is going to stay in Q^n forever by Proposition 2 as long as she continues playing D. Since Q^n ⊂ Q^D_ε, this continuation strategy is clearly σ_D. We also know that ρ*_ε(q_{-i}) assigns the best response after any off-the-equilibrium-path history. Moreover, consistency is obvious. Taking ε small enough that all the above arguments go through, we confirm that ρ*_ε(q_{-i}) is a symmetric sequential equilibrium, which generates the same outcome distribution as (..., q*(δ, p_ε)σ_C + (1 − q*(δ, p_ε))σ_D, ...) by construction. ∎

Since the probability that everyone chooses σ_C in this sequential equilibrium, q*(δ, p_ε)^{n−1}, gets closer to 1 as δ gets closer to M(0)/(1+M(0)) by Lemma 1, an outcome arbitrarily close to the efficient outcome can be achieved for δ arbitrarily close to M(0)/(1+M(0)). For high δ, we can use a public randomization device again to reduce δ effectively. It is also possible to use Ellison's trick as in Ellison [6] to achieve an almost efficient outcome, although the strategy is more complex and no longer a grim trigger.

Here is the corollary of Proposition 3 regarding an approximately efficient outcome.

Corollary 2. Suppose that (i) Assumption 1 is satisfied and n = 2, or (ii) Assumption 2 is satisfied. For any k > 0, if Q^n ⊂ Q^D_0, then there is an ε̄ > 0 such that for any ε ∈ (0, ε̄), there exists a sequential equilibrium whose payoff is more than 1 − k.

Finally, we have to check that the assumption Q^n ⊂ Q^D_0 used above to prove the propositions is not too strong. First of all, it is always satisfied when n = 2 for a range of δ with q*(δ, p_0) ∈ (1/2, 1), that is, when δ is between M(0)/(1+M(0)) and (M(0)+M(1))/(1+M(0)+M(1)). The following proposition provides sufficient conditions for Q^n ⊂ Q^D_0 to hold for general n.
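The public randomization device mentioned above can be made concrete: if play continues with probability p after each period (and otherwise ends), payoffs are effectively discounted by pδ each period. A one-line sketch of this standard construction (the names are ours):

```python
def continuation_prob(delta, delta_target):
    """Probability p with which a public randomization device lets play
    continue each period, so the effective discount factor becomes p * delta.
    (Our sketch of the standard construction.)"""
    assert 0 < delta_target <= delta < 1
    return delta_target / delta

# Example: a patient group (delta = 0.95) mimics an effective delta of 0.60.
p = continuation_prob(0.95, 0.60)
print(round(p, 4), round(p * 0.95, 2))  # 0.6316 0.6
```

This is how a discount factor above the interval identified by Lemma 1 can be brought back into it.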

Proposition 5.

• If M(k) = M for k = 1, ..., n−1, then Q^n ⊂ Q^D_0 and Q^I_0 ≠ ∅ for δ ∈ (M/(1+M), nM/(1+nM)).

• Regarding n as a parameter, take a sequence of stage games with n = 2, 3, .... If there exists a lower bound M > 0 such that min_{1≤k≤n−1} M(k) ≥ M independent of n, then there exists n̄ such that for all n ≥ n̄, Q^n ⊂ Q^D_0(n) and Q^I_0(n) ≠ ∅ for δ ∈ (M(0)/(1+M(0)), nM/(1+nM)).

Proof: If the deviation gain is constant, Q^I_0 = { q_{-i} | φ(q_{-i}) = ((1−δ)/δ) M }. So Q^n ⊂ Q^D_0(n) if and only if 1/n < ((1−δ)/δ) M. Combining this inequality with the condition for Q^I_0 ≠ ∅, δ ∈ (M/(1+M), nM/(1+nM)) is obtained.

If M is independent of n, then Q^n ⊂ Q^D_0(n) if 1/n < ((1−δ)/δ) M. So Q^n ⊂ Q^D_0(n) and Q^I_0(n) ≠ ∅ for δ ∈ (M(0)/(1+M(0)), nM/(1+nM)), for all n ≥ n̄, if n̄ is chosen such that M(0)/(1+M(0)) < n̄M/(1+n̄M). ∎

The assumption of constant deviation gain is natural in team production, where a player's cost comes from the disutility associated with her own effort level.
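The equivalence used twice in this proof, 1/n < ((1−δ)/δ)M ⟺ δ < nM/(1+nM), is elementary algebra; it can be spot-checked exactly with rational arithmetic (our sketch):

```python
from fractions import Fraction

def qn_in_qd(delta, M, n):
    """Q^n subset of Q^D_0(n) in the constant-gain case: 1/n < ((1-delta)/delta)*M."""
    return Fraction(1, n) < (1 - delta) / delta * M

def delta_upper_bound(M, n):
    """Equivalent characterization: delta < n*M / (1 + n*M)."""
    return n * M / (1 + n * M)

# Exact rational spot-check of the equivalence on a parameter grid.
for n in (2, 3, 5, 10):
    for M in (Fraction(1, 2), Fraction(1), Fraction(2)):
        for k in range(1, 100):
            delta = Fraction(k, 100)
            assert qn_in_qd(delta, M, n) == (delta < delta_upper_bound(M, n))
print("equivalence verified")
```

Using `Fraction` keeps the boundary cases (where both sides hold with equality) free of floating-point artifacts.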

6. Conclusion

In this paper, we clarify the incentive structure in a general repeated prisoner's dilemma with private monitoring when players use a mixture of a grim trigger strategy and permanent defection, and we provide sufficient conditions under which the simple grim trigger strategy supports the efficient outcome as a sequential equilibrium for some range of discount factors. It is also shown that the best response to a mixture of a grim trigger strategy and permanent defection can be characterized almost uniquely, which makes it possible to provide a clear representation of the sequential equilibrium supporting the efficient outcome.

There have been two lines of research exploring the sustainability of the efficient outcome, or the possibility of a folk theorem, in repeated prisoners' dilemmas with private monitoring. One direction of research is based on grim trigger strategies; Bhaskar [3] and Sekiguchi [16] belong to this literature, as does this paper. The emphasis of these papers is on the coordination of players' actions and beliefs. As shown in Mailath and Morris [11], almost common knowledge of continuation strategies is very important for subgame perfect equilibria or perfect public equilibria to be robust to perturbations with respect to private noise, because it enables players to keep coordinating with others without public signals. In this paper, in fact, a player strongly believes that the other players are playing σ_C when she continues playing C, and believes that the other players are playing σ_D once she has started playing D. However, note that our strategy does not exhibit such strong coordination as is required in Mailath and Morris [11]. While Mailath and Morris [11] require that players' continuation strategies be almost common knowledge after any history, a discoordination of beliefs or continuation strategies can occur almost surely for our

mixing grim trigger strategy: when a player observes D for the first time, she is still uncertain about the other players' continuation strategies.

The other direction of research is based on completely mixed strategies, which make the other player indifferent over many strategies. Ely and Välimäki [7] and Piccione [15] are among the papers in this direction.⁷ One advantage of the first approach compared to the second is:

1. The equilibrium needs to use mixing only in the first period of the game, while the latter approach uses a behavior strategy which requires players to randomize in every period after every history.

Another advantage, which is closely related to the first one, is as follows:

2. Since our strategy is almost pure, it is very easy to justify the use of a mixed strategy. In particular, our strategy satisfies the refinement criterion for repeated game equilibria with respect to incomplete information on players' payoffs proposed by Bhaskar [4]. This criterion prohibits players from playing different continuation strategies after different histories if players are indifferent between these continuation strategies after all these different histories. Strategies in Ely and Välimäki [7] or Piccione [15] violate this criterion at the second period when private signals are independent. While purification is straightforward for our strategy by introducing a small amount of uncertainty into stage game payoffs, payoff uncertainty has to depend on a private history in a particular way to purify the equilibria in Ely and Välimäki [7] or Piccione [15]. It is also easy to adopt Nash's population interpretation to purify our equilibrium. What we have in mind is a pool of players who are matched with other players to play repeated games, where most of the players use the grim trigger strategy and only a small fraction use permanent defection.

The case where private monitoring is not almost perfect has not been systematically investigated, and it remains an important topic for future research.

7 Obara [14] uses the same kind of strategy for repeated partnership games with imperfect public monitoring and constructs a sequential equilibrium which cannot be supported by perfect public equilibria.

Appendix. Proof of Lemma 1. When δ = M(0)/(1+M(0)), q*(δ, p_0) = 1 is the solution of the equation in q:

V_i(σ_C, q_{-i} : p_0, δ) − V_i(σ_D, q_{-i} : p_0, δ) = 0,

where q_{-i}(θ) = C(n−1, θ) (1−q)^θ q^{n−1−θ} for θ = 0, ..., n−1. We just need to show that ∂q*(δ, p_0)/∂δ < 0 at δ = M(0)/(1+M(0)), using the implicit function theorem. Since

V_i(σ_C, q_{-i} : p_0, δ) − V_i(σ_D, q_{-i} : p_0, δ)
= −(1−δ) Σ_{θ=0}^{n−1} C(n−1, θ) (1−q)^θ q^{n−1−θ} M(θ; p_0) + δ q^{n−1},

∂q*(δ, p_0)/∂δ |_{δ = M(0)/(1+M(0))}
= − [∂(V_i(σ_C, q*_{-i} : p_0, δ) − V_i(σ_D, q*_{-i} : p_0, δ))/∂δ] / [∂(V_i(σ_C, q*_{-i} : p_0, δ) − V_i(σ_D, q*_{-i} : p_0, δ))/∂q] |_{δ = M(0)/(1+M(0))}
= − (1 + M(0)) / [δ(n−1) − (1−δ)(n−1)(M(0) − M(1))] |_{δ = M(0)/(1+M(0))}
= − (1 + M(0))² / [(n−1) M(1)] < 0. ∎
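The implicit-function computation can be cross-checked numerically. The sketch below (ours) solves the indifference condition δq^{n−1} = (1−δ) Σ_θ C(n−1, θ)(1−q)^θ q^{n−1−θ} M(θ) by bisection for δ just above M(0)/(1+M(0)), and compares the finite-difference slope of q* with the closed form −(1+M(0))²/((n−1)M(1)); the specific gains M(θ) used here are hypothetical.

```python
from math import comb

def F(q, delta, M, n):
    """Value difference V_C - V_D under the indifference condition above."""
    gain = sum(comb(n - 1, t) * (1 - q) ** t * q ** (n - 1 - t) * M[t]
               for t in range(n))
    return delta * q ** (n - 1) - (1 - delta) * gain

def q_star(delta, M, n, tol=1e-13):
    """Bisection for the root of F in (0, 1]; valid for delta just above
    M[0]/(1+M[0]), where F(0) < 0 and F(1) > 0."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F(mid, delta, M, n) < 0 else (lo, mid)
    return (lo + hi) / 2

n, M = 3, [2.0, 1.0, 1.0]              # hypothetical deviation gains M(theta)
delta0 = M[0] / (1 + M[0])             # q* = 1 exactly at this delta (Lemma 1)
h = 1e-6
slope = (q_star(delta0 + h, M, n) - 1.0) / h
predicted = -(1 + M[0]) ** 2 / ((n - 1) * M[1])  # closed form (ours): -4.5
print(slope, predicted)
```

The finite-difference slope is negative and close to the closed form, and q* stays near 1 for δ near the lower bound, which is the efficiency property used in section 5.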

Proof of Proposition 2.

Case 1: Let h_i^t = (h_i^{t−1}, (C, C)) with q_{-i}(h_i^{t−1}) ∈ W_k, and write φ_i^{t−1} = φ(q_{-i}(h_i^{t−1})).

Applying Bayes' rule,⁸

φ_i^t = φ(q_{-i}(h_i^t)) = φ_i^{t−1} P(ω_i^{t−1} = C | θ_{t−1} = 0, h_i^{t−1}) / [(1−φ_i^{t−1}) P(ω_i^{t−1} = C | θ_{t−1} ≠ 0, h_i^{t−1}) + φ_i^{t−1} P(ω_i^{t−1} = C | θ_{t−1} = 0, h_i^{t−1})].

This function is increasing in φ_i^{t−1} and crosses the 45° line once. Note that it is bounded below by

ψ(φ_i^{t−1}) = φ_i^{t−1}(1−ε) / [(1−φ_i^{t−1})ε + φ_i^{t−1}].

Let φ̂ be the unique interior fixed point of this mapping. Given that q_{-i}(h_i^{t−1}) ∈ W_k, it is easy to see that

8 All the conditional distributions implicitly depend on the level of the initial mixture between σ_C and σ_D.

ψ(φ_i^{t−1}) and φ̂ can be made larger than any φ₀ > 0, uniformly in q_{-i}, by choosing ε small enough. If φ_i^{t−1} < φ̂, then, as long as player i continues to observe C, φ_i^t is going to increase monotonically toward φ̂: since φ_i^t ≥ ψ(φ_i^{t−1}) and φ_i^{t+n} ≥ ψ(φ_i^{t+n−1}), φ_i^{t+n−1} is larger than ψⁿ(φ_i^{t−1}) for any n = 0, 1, .... If φ_i^{t−1} ≥ φ̂, then φ_i^{t+n−1} ≥ φ̂ > φ₀ for any n. This implies that {φ_i^{t+n}}_{n=0}^∞ always stays above φ₀.

Case 2: h_i^t = (h_i^{t−1}, (C, ω_i^{t−1})) for t ≥ 3, or h_i^t = (C, ω_i^{t−1}) for t = 2, with ω_i^{t−1} = ω′_i ≠ C.

Suppose that t ≥ 3. By Bayes' rule,

φ_i^t = φ(q_{-i}(h_i^t)) = φ_i^{t−2} P((ω_{-i}^{t−2}, ω_i^{t−2}, ω_i^{t−1}) = (C, C, ω′_i) | θ_{t−2} = 0) / [(1−φ_i^{t−2}) P((ω_i^{t−2}, ω_i^{t−1}) = (C, ω′_i) | θ_{t−2} ≠ 0, h_i^{t−2}) + φ_i^{t−2} P((ω_i^{t−2}, ω_i^{t−1}) = (C, ω′_i) | θ_{t−2} = 0)].

This is bounded above by

P((ω_{-i}^{t−2}, ω_i^{t−2}, ω_i^{t−1}) = (C, C, ω′_i) | θ_{t−2} = 0) / P((ω_i^{t−2}, ω_i^{t−1}) = (C, ω′_i) | θ_{t−2} = 0)
≤ P(#(ω′_i | C)) / [P(#(ω′_i | C)) + (n−1)^{#(ω′_i | C)} P(#(ω′_i | C))]
= 1 / [1 + (n−1)^{#(ω′_i | C)}]
≤ 1/n,

where P(#(ω′_i | C)) is the probability of the event that #(ω′_i | C) errors occur. So, once players observe a bad signal for the first time, the posterior φ_i jumps down at least below 1/n, independent of the prior φ_i^{t−1} or q_{-i}^{t−1}, for t ≥ 3. This argument is independent of the level of ε. When t = 2, φ_i^t decreases enough to be less than 1/n if ε is very small. This is because players do not interpret a bad signal as an error but as a signal of σ_D in the first period.
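The final bound can be tabulated directly: 1/(1 + (n−1)^j) equals 1/n when exactly one bad signal has been observed (j = 1) and only falls further for j ≥ 2 (our check, writing j = #(ω′_i | C)):

```python
def posterior_bound(n, j):
    """Upper bound 1/(1 + (n-1)^j) on the posterior that everyone is still
    cooperative, after a history whose last signal profile contains j >= 1
    bad signals (case 2 above)."""
    return 1.0 / (1.0 + (n - 1) ** j)

for n in range(2, 8):
    assert posterior_bound(n, 1) == 1.0 / n   # equality with a single bad signal
    assert all(posterior_bound(n, j) <= 1.0 / n for j in range(1, 6))
print("posterior never exceeds 1/n after a bad signal")
```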

Case 3: h_i^t = (h_i^{t−1}, (D, ω_i^{t−1})) for t ≥ 3, or h_i^t = (D, ω_i^{t−1}) for t = 2, with ω_i^{t−1} = ω″_i.

We have to treat (i) n ≥ 3 and (ii) n = 2 separately again.

(i): n ≥ 3. By Bayes' rule,

φ_i^t = φ(q_{-i}(h_i^t)) = φ_i^{t−1} P((ω_{-i}^{t−1}, ω_i^{t−1}) = (C, ω″_i) | θ_{t−1} = 0) / [(1−φ_i^{t−1}) P(ω_i^{t−1} = ω″_i | θ_{t−1} ≠ 0) + φ_i^{t−1} P(ω_i^{t−1} = ω″_i | θ_{t−1} = 0)].

This is bounded above by

P((ω_{-i}^{t−1}, ω_i^{t−1}) = (C, ω″_i) | θ_{t−1} = 0) / P(ω_i^{t−1} = ω″_i | θ_{t−1} = 0)
≤ P(#(ω″_i | C) + n−1) / [P(#(ω″_i | C) + n−1) + Σ_{m=1}^{n−1} C((n−2)(n−1), m) C(n−1, m) P(#(ω″_i | C) + n−1)]
= 1 / [1 + Σ_{m=1}^{n−1} C((n−2)(n−1), m) C(n−1, m)]
≤ 1/6.

This holds with equality when n = 3.
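The combinatorial factor Σ_{m=1}^{n−1} C((n−2)(n−1), m) C(n−1, m) can be evaluated directly; for n = 3 it equals 5, recovering the five competing events of section 4.2, and it grows quickly with n, so 1/6 is the worst case over n ≥ 3 (our check, assuming this reading of the sum):

```python
from math import comb

def competing_events(n):
    """Number of equally likely error events in which some player has already
    switched to sigma_D (the denominator sum in case 3, as read above)."""
    return sum(comb((n - 2) * (n - 1), m) * comb(n - 1, m) for m in range(1, n))

print([competing_events(n) for n in (3, 4, 5)])  # [5, 83, 1819]
assert competing_events(3) == 5            # the five events of section 4.2
assert all(1.0 / (1.0 + competing_events(n)) <= 1.0 / 6.0 for n in range(3, 12))
```

This is the sense in which the "flexibility of interpretation" grows with the number of players: the posterior bound 1/(1 + Σ) shrinks rapidly as n increases.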

This argument is independent of ε, too.

(ii): n = 2. By Bayes' rule,

φ(q_{-i}(h_i^{t−1}, (D, C))) = [φ_i^{t−1}(1 − 2P(1) − P(2)) + (1−φ_i^{t−1})(P(1) + P(2))] / [φ_i^{t−1}(1 − P(1) − P(2)) + (1−φ_i^{t−1})(P(1) + P(2))],

φ(q_{-i}(h_i^{t−1}, (D, D))) = [φ_i^{t−1} P(1) + (1−φ_i^{t−1})(1 − P(1) − P(2))] / [φ_i^{t−1}(P(1) + P(2)) + (1−φ_i^{t−1})(1 − P(1) − P(2))],

where P(k) is the probability that k errors occur. It can be shown that φ(q_{-i}(h_i^{t−1}, (D, ω_i))) ≤ 1/2 when ε is small; see Sekiguchi [16] for the details.

With case 2 and case 3, we can conclude that φ(q_{-i}(h_i^t)) ≤ 1/n after any history such as

• h_i^t = (h_i^{t−1}, (C, ω_i^{t−1})) for t ≥ 3, or h_i^t = (C, ω_i^{t−1}) for t = 2, with ω_i^{t−1} ≠ C,

or

• h_i^t = (h_i^{t−1}, (D, ω_i^{t−1})) for t ≥ 3, or h_i^t = (D, ω_i^{t−1}) for t = 2.

∎

References

[1] D. Abreu, D. Pearce, and E. Stacchetti, Toward a theory of discounted repeated games with imperfect monitoring, Econometrica 58 (1990), 1041-1064.

[2] V. Bhaskar, Informational constraints and the overlapping generations model: folk and anti-folk theorems, Rev. Econ. Stud. 65 (1998), 135-149.

[3] V. Bhaskar, Sequential equilibria in the repeated prisoner's dilemma with private monitoring, mimeo, 1999.

[4] V. Bhaskar, The robustness of repeated game equilibria to incomplete payoff information, mimeo, 1999.

[5] O. Compte, Communication in repeated games with imperfect private monitoring, Econometrica 66 (1998), 597-626.

[6] G. Ellison, Cooperation in the prisoner's dilemma with anonymous random matching, Rev. Econ. Stud. 61 (1994), 567-588.

[7] J. C. Ely and J. Välimäki, A robust folk theorem for the prisoner's dilemma, mimeo, 1999.

[8] D. Fudenberg, D. Levine, and E. Maskin, The folk theorem with imperfect public information, Econometrica 62 (1994), 997-1040.

[9] J. Harsanyi, Games with randomly distributed payoffs: a new rationale for mixed-strategy equilibrium points, Int. J. Game Theory 2 (1973), 1-23.

[10] M. Kandori and H. Matsushima, Private observation, communication, and collusion, Econometrica 66 (1998), 627-652.

[11] G. Mailath and S. Morris, Repeated games with almost-public monitoring, mimeo, 1999.

[12] H. Matsushima, On the theory of repeated games with non-observable actions, part I: anti-folk theorem without communication, Econ. Letters 35 (1990), 253-256.

[13] I. Obara, The repeated prisoners' dilemma with private monitoring: a N-player case, mimeo, 1999.

[14] I. Obara, Private strategy and efficiency: repeated partnership games revisited, mimeo, 1999.

[15] M. Piccione, The repeated prisoner's dilemma with imperfect private monitoring, mimeo, 1998.

[16] T. Sekiguchi, Efficiency in repeated prisoner's dilemma with private monitoring, J. Econ. Theory 76 (1997), 345-361.
