
Problem Set 4 GTSB Fall 2015

Due on 11/15. If you are working with a partner, you and your partner may turn in a single copy of the problem set. Please show your work and acknowledge any additional resources consulted. Questions marked with an (∗) are intended for math-and-game-theory-heads who are interested in deeper, formal exploration, perhaps as preparation for grad school. The questions typically demonstrate the robustness of the results from class or other problems, and the answers do not change the interpretation of those results. Moreover, this material will not play a large role on the exam and tends to be worth relatively little on the problem sets. Some folks might consequently prefer to skip these problems.

1 Grim Trigger in the Repeated Prisoner’s Dilemma (70 points)

In one instance of the prisoner’s dilemma, each player chooses whether to pay some cost c > 0 in order to confer a benefit b > c onto the other player. The payoffs from a single iteration of this prisoner’s dilemma are therefore:

                Cooperate         Defect
Cooperate       (b − c, b − c)    (−c, b)
Defect          (b, −c)           (0, 0)

The repeated prisoner’s dilemma¹ is built out of several stages, each of which is a copy of the above game. At the end of each stage, the two players repeat the prisoner’s dilemma again with probability δ, where 0 ≤ δ ≤ 1. A strategy in the repeated prisoner’s dilemma is a rule which determines whether a player will cooperate or defect in each given stage. This rule may depend on which round it is, and on either player’s actions in previous rounds. For example, the grim trigger strategy is described by the following rule: cooperate if neither player has ever defected, and defect otherwise. The goal of this problem is to show that the strategy pair in which both players play grim trigger is a Nash equilibrium if δ > c/b.
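As a minimal sketch of the setup above, the Python snippet below encodes the stage payoffs and the grim trigger rule, then plays a fixed number of stages. The values b = 3 and c = 1, the function names, and the fixed stage count (used in place of random continuation with probability δ) are illustrative assumptions and not part of the problem.

```python
# Illustrative values only; the problem leaves b and c general (b > c > 0).
B, C = 3.0, 1.0

def stage_payoffs(a1, a2, b=B, c=C):
    """Payoffs (player 1, player 2) for one stage; actions are 'C' or 'D'."""
    p1 = (b if a2 == "C" else 0.0) - (c if a1 == "C" else 0.0)
    p2 = (b if a1 == "C" else 0.0) - (c if a2 == "C" else 0.0)
    return p1, p2

def grim_trigger(own_history, other_history):
    """Cooperate if neither player has ever defected; defect otherwise."""
    return "D" if "D" in own_history or "D" in other_history else "C"

def all_d(own_history, other_history):
    """Defect in every stage, regardless of history."""
    return "D"

def play(strategy1, strategy2, stages):
    """Play a fixed number of stages and return the list of action pairs."""
    h1, h2 = [], []
    for _ in range(stages):
        # Each strategy sees its own history first, then the opponent's.
        a1, a2 = strategy1(h1, h2), strategy2(h2, h1)
        h1.append(a1)
        h2.append(a2)
    return list(zip(h1, h2))
```

Calling `play(grim_trigger, all_d, 3)` lists the action pairs when a grim trigger player meets an unconditional defector, and `stage_payoffs` converts each pair into that stage's payoffs.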

(a) Suppose that player 1 and player 2 are both following the grim trigger strategy. What actions will be played in each stage of the repeated game? What are the payoffs to players 1 and 2 in each stage?

(b) Using your result from part (a), write down the expected payoff to player 1 from the entire repeated prisoner’s dilemma in terms of c, b, and δ. Hint: Remember that, if |δ| < 1:

a + aδ + aδ² + aδ³ + ... = a/(1 − δ)

¹ Please consult Section 5 of the Game Theory handout on Repeated Games for details.

(c) Now we will check whether player 1 can improve his payoff by deviating from the grim trigger strategy. Argue that we only need to check the case where player 1 plays all-D, that is, player 1 defects in every round.

(d) Suppose that player 2 plays grim trigger and player 1 deviates from grim trigger and plays all-D. What is the total payoff to player 1 from the entire repeated prisoner’s dilemma?

(e) For grim trigger to be a Nash equilibrium, we need that the payoff to player 1 from playing grim trigger is greater than or equal to the payoff to player 1 from playing all-D, assuming player 2’s strategy is fixed. Using your results from parts (b) and (d), write down an inequality that must be satisfied in order for grim trigger to be a Nash equilibrium. Simplify this inequality to obtain the condition δ > c/b.

(f) (∗) - 10 points. Show that the Grim Trigger is a Subgame Perfect equilibrium in addition to being a Nash equilibrium [Hint: use the one-stage deviation principle]. For a formal discussion of subgame perfection, see the Game Theory Handout.

So far we have focused on the Grim Trigger because it is a relatively simple strategy to understand, but not necessarily because we think it is used in practice. Importantly, many of the insights we have learned from studying the Grim Trigger generalize to any Nash equilibrium.

(g) (∗) - 10 points. Show that in any Nash equilibrium in which both players play C at each period, player 2 must cooperate less in the future if player 1 were to deviate and play D at any period instead of C. Interpret this result in terms of ‘reciprocity,’ as discussed in lecture.
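The geometric-series identity from the hint in part (b) can be checked numerically. The sketch below compares a long partial sum with the closed form a/(1 − δ); the values a = 2 and δ = 0.9 and the function names are assumptions chosen only for illustration.

```python
def partial_geometric_sum(a, delta, terms):
    """Return a + a*delta + ... + a*delta**(terms - 1)."""
    return sum(a * delta**k for k in range(terms))

def geometric_limit(a, delta):
    """Closed form a / (1 - delta), valid only when |delta| < 1."""
    if abs(delta) >= 1:
        raise ValueError("the series converges only for |delta| < 1")
    return a / (1 - delta)

a, delta = 2.0, 0.9  # assumed illustrative values
approx = partial_geometric_sum(a, delta, 500)
exact = geometric_limit(a, delta)
print(approx, exact)
```

The identity matters here because stage k of the repeated game is reached with probability δ^(k−1), so a constant per-stage payoff a has expected total a/(1 − δ).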

2 No Cooperation for Small δ (50 points)

In lecture, we argued that cooperative equilibria exist in the repeated prisoner’s dilemma if and only if δ > c/b. In problem 1, you showed that we can have a Nash equilibrium in which both players always cooperate (specifically, the equilibrium in which both players play grim trigger) if δ > c/b. In this problem, we will show that if δ < c/b, then the only Nash equilibrium is (all-D, all-D). That is, cooperative equilibria exist only if δ > c/b. Combined, your responses to these two questions thus provide a complete proof of our claim from lecture.

(a) Suppose that the strategy pair (s1, s2) is a Nash equilibrium, and let U1(s1, s2) and U2(s1, s2) be the payoffs to players 1 and 2, respectively. Show that U1(s1, s2) ≥ 0 and U2(s1, s2) ≥ 0.

(b) Notice that, in each round of the prisoner’s dilemma, the sum of the payoffs to players 1 and 2 is either 2(b − c), b − c, or 0. Show that, if (s1, s2) is any strategy pair, then U1(s1, s2) + U2(s1, s2) ≤ 2(b − c)/(1 − δ).

(c) Now assume δ < c/b. Using your result from part (b), show that U1(s1, s2) + U2(s1, s2) < 2b for any strategy pair (s1, s2). Use this to conclude that, if (s1, s2) is a Nash equilibrium, at least one player receives total payoff less than b.

(d) Suppose that, when players 1 and 2 play s1 and s2, both players cooperate in some round k. Without loss of generality, we may assume that k = 1 (otherwise we apply the argument from parts (a)-(c) to the subgame starting at round k, introducing a factor of δ^(k−1)). Using your result from part (c), show that one of the players can improve his payoff by deviating.

(e) Next we need to rule out the possibility of a round in which one player cooperates and the other defects. Repeat the argument of part (b) using the additional result that players 1 and 2 never simultaneously cooperate (so the sum of their payoffs in a given round is either b − c or 0). Show that U1(s1, s2) + U2(s1, s2) ≤ (b − c)/(1 − δ).

(f) Again assume that δ < c/b. Use your results from parts (a) and (e) to conclude that each player’s payoff is less than b; that is, U1(s1, s2) < b and U2(s1, s2) < b.

(g) Now suppose that, in the first round, player 1 cooperates and player 2 defects. By your reasoning from part (f), player 2 receives total payoff less than b. Show that player 2 can improve his payoff by deviating, so that (s1, s2) is not a Nash equilibrium.

Using this proof by contradiction, you have shown that a strategy pair (s1, s2) which involves cooperation in any period cannot be a Nash equilibrium if δ < c/b. It follows that (all-D, all-D) is the only equilibrium in this case.
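A quick numerical sanity check of the bound from parts (b) and (c) is sketched below: for sampled discount factors below c/b, the upper bound 2(b − c)/(1 − δ) stays below 2b. The values b = 3 and c = 1, the sample grid, and the function name are assumptions for illustration; the check does not replace the algebraic argument the problem asks for.

```python
def total_payoff_bound(b, c, delta):
    """Upper bound 2(b - c)/(1 - delta) on U1 + U2 from part (b)."""
    return 2 * (b - c) / (1 - delta)

b, c = 3.0, 1.0  # assumed illustrative values with b > c > 0
threshold = c / b

# Sample discount factors strictly below the threshold c/b.
deltas = [threshold * k / 10 for k in range(10)]
bounds = [total_payoff_bound(b, c, d) for d in deltas]

for d, bound in zip(deltas, bounds):
    print(f"delta = {d:.3f}: bound = {bound:.3f}, 2b = {2 * b:.3f}")
```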

3 Panchanathan and Boyd (2004)

Recall the model presented in Panchanathan and Boyd (2004):

“. . . we consider a large population subdivided into randomly formed social groups of size n. Social life consists of two stages. First, individuals decide whether or not to contribute to a one-shot collective action game at a net personal cost C in order to create a benefit B shared equally amongst the n−1 other group members, where B > C. Second, individuals engage in a multi-period ‘mutual aid game’. . . In each period of the mutual aid game, one randomly selected individual from each group is ‘needy’. Each of his n − 1 neighbours can help him an amount b at a personal cost c, where b > c > 0. Each individual’s behavioural history is known to all group members. This assumption is essential because it is known that indirect reciprocity cannot evolve when information quality is poor. The mutual aid game repeats with probability w and terminates with probability 1 − w, thus lasting for 1/(1 − w) periods on average.”
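The claim that the mutual aid game lasts 1/(1 − w) periods on average can be checked with a short calculation. The sketch below assumes the first period is always played and each later period follows with probability w, so the game lasts exactly k periods with probability w^(k−1)(1 − w). The value w = 0.9, the truncation point, and the function name are illustrative assumptions.

```python
def expected_periods(w, max_periods=2000):
    """Truncated expectation of the number of periods played.

    Assumes the game lasts exactly k periods with probability
    w**(k - 1) * (1 - w), for k = 1, 2, 3, ...
    """
    return sum(k * w**(k - 1) * (1 - w) for k in range(1, max_periods + 1))

w = 0.9  # assumed illustrative continuation probability
numeric = expected_periods(w)
closed_form = 1 / (1 - w)
print(numeric, closed_form)
```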

Recall, also, the “shunner” strategy:

“Shunners contribute to the collective action and then try to help those needy individuals who have good reputations during the mutual aid game, but mistakenly fail owing to errors with probability e. . . Shunners never help needy recipients who are in bad standing.”

[Figure 1: An Abstract Information Structure. Three states with prior probabilities q, r, and s.]

(a) The authors argue that (shunner, shunner, . . . , shunner) is an equilibrium iff ((n − 1)/n) · ((1 − e)/(1 − w)) · (b − c)(1 − we) > C. Argue that this is the case, even if one considers deviations to all possible strategies in this game, and not just to the two other strategies described by the authors. To simplify the math, feel free to set e = 0.

(b) (∗) When is contribution to the public good sustained as part of a Nash equilibrium? What property must any strategy that sustains contributions to the public good have?

4 Introduction to Information Structures

Consider the information structure presented in Fig. 1. Recall, an information structure has three components. The first component is a set of states of the world. In this information structure, there are three states, each represented by a ball. The second component is the priors: the probability with which each state occurs. In Fig. 1, the priors are presented below the balls: state 1 occurs with probability q, state 2 occurs with probability r, and state 3 occurs with probability s = 1 − (q + r). The third component is a partition of the states for each player that identifies states that the player can and cannot distinguish. In Fig. 1, player 1 (top) cannot distinguish states 2 and 3 (blue) from each other, but can distinguish state 1 (red) from states 2 and 3. Player 2 (bottom) cannot distinguish states 1 and 2 (red) from each other, but can distinguish state 3 (blue) from states 1 and 2.

Suppose the two players whose information structure is represented in Fig. 1 are playing the coordination game presented in Fig. 2. That is, first, nature randomly draws a ball according to the prior probabilities. Each player sees the color associated with that state by their partition. Then, each player plays an action in the coordination game and receives payoffs according to their actions, as presented in the payoff matrix.
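The three components described above (states, priors, partitions) map naturally onto a small data structure. The sketch below is one possible encoding, with the partitions taken from the description of Fig. 1; the numeric priors q = 1/2, r = 1/3, s = 1/6 and all names are assumptions for illustration. It also computes what a player believes about the state after seeing a color, using Bayes' rule.

```python
from fractions import Fraction

# Assumed illustrative priors; the text only requires q + r + s = 1.
priors = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Partitions as described in the text: each cell groups the states a
# player cannot tell apart, labelled by the color that player sees.
partitions = {
    "player 1": {"red": {1}, "blue": {2, 3}},
    "player 2": {"red": {1, 2}, "blue": {3}},
}

def posterior(player, color):
    """Beliefs over states after `player` observes `color` (Bayes' rule)."""
    cell = partitions[player][color]
    total = sum(priors[s] for s in cell)
    return {s: priors[s] / total for s in sorted(cell)}

print(posterior("player 1", "blue"))
print(posterior("player 2", "red"))
```

With general priors, player 1 seeing blue assigns probability r/(r + s) to state 2, and player 2 seeing red assigns probability r/(q + r) to state 2.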

(a) A state-dependent strategy is one in which a player chooses one action in some states of the world, and another in other states. Identify a valid, state-dependent strategy for player 1. Do the same for player 2.

        L    R
   L    a    b
   R    c    d

a > c, d > b;  p = (d − b)/(d − b + a − c)

Figure 2: The Coordination Game

[Figure 3: An Information Structure Illustrating the Importance of Higher-Order Beliefs. Four states with prior probabilities q, r, s, and t.]

(b) Consider the strategy pair where player 1 plays “A when red and B when blue” (a.k.a. “A iff red”) and player 2 plays “B when green and A when orange” (a.k.a. “A iff orange”). What are each player’s payoffs?

(c) A Bayesian Nash equilibrium (BNE) is simply a pair of strategies such that neither player gains a higher expected payoff by unilaterally deviating to a different strategy. This can easily be checked by showing that, for each player, there is no color such that playing a different action at that color would give the player a higher expected payoff. Show that when r/(r + s) < p and r/(q + r) < 1 − p, the strategy pair “A iff red; A iff green” is a BNE. Note: when at least one of the players is playing a state-dependent strategy in equilibrium, we call this an equilibrium with state-dependent strategies (ESDS).
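The threshold p = (d − b)/(d − b + a − c) given under Fig. 2 can be read as the belief at which a player is indifferent between the two actions. The sketch below checks this under one assumed reading of the payoff matrix: the entry in row X, column Y is a player's payoff from playing X when the other player plays Y. The numeric payoffs and function names are illustrative assumptions, and the actions are labelled L and R as in Fig. 2.

```python
def threshold(a, b, c, d):
    """p = (d - b) / (d - b + a - c), as given under Fig. 2."""
    return (d - b) / ((d - b) + (a - c))

def expected_payoffs(a, b, c, d, prob_left):
    """Expected payoffs from L and from R when the other player is
    believed to play L with probability prob_left (assumed reading)."""
    payoff_left = prob_left * a + (1 - prob_left) * b
    payoff_right = prob_left * c + (1 - prob_left) * d
    return payoff_left, payoff_right

a, b, c, d = 4.0, 0.0, 1.0, 2.0  # assumed values with a > c and d > b
p = threshold(a, b, c, d)
print(p, expected_payoffs(a, b, c, d, p))
```

Under this reading, a player prefers L when the believed probability that the other player chooses L exceeds p, and prefers R when it falls below p.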

5 Using Information Structures to Understand Higher-Order Beliefs

Next, consider the information structure presented in Fig. 3. Again, suppose the two players whose information structure is represented in Fig. 3 are playing the coordination game presented in Fig. 2. That is, first, nature randomly draws a ball according to the prior probabilities. Each player sees the color associated with that state by their partition. Then, each player plays an action in the coordination game and receives payoffs according to their actions, as presented in the payoff matrix.

(a) In this example, both players know, when the state is the rightmost one, that it is not the leftmost one, and vice versa. Assume r/(r + s) > p and s/(s + t) > p. Show there is no ESDS where both play A in the leftmost state and B in the rightmost state. To do this:

(i) Suppose that both play A in the leftmost state. Player 2 must play A one state to the right, too, since this state is also green. Show that player 1 will maximize his payoffs by playing A when blue.

(ii) Show that player 2 will then maximize his payoffs by playing A when orange.

(iii) Argue that player 1 will maximize his payoffs by playing A when yellow.

(b) What is the relationship to higher-order beliefs? To answer this, first suppose that 2 plays A when he sees green. Then answer each of the following:

(i) Suppose player 2 sees orange. What does player 2 think about what 1 sees?

(ii) What does player 2 think that 1 thinks about what 2 sees?

(iii) What does player 2 think that 1 thinks that 2 will do?

(iv) What does player 2 think that 1 will do in response?

(v) How should player 2 respond?