Repeated Games

ISCI 330 Lecture 16

March 13, 2007

Lecture Overview

Intro

Up to this point, in our discussion of extensive-form games we have allowed players to specify the action that they would take at every choice node of the game. This implies that players know the node they are in and all the prior choices, including those of other agents. We may want to model agents needing to act with partial or no knowledge of the actions taken by others, or even by themselves. This is possible using imperfect-information extensive-form games:
- each player's choice nodes are partitioned into information sets;
- if two choice nodes are in the same information set, then the agent cannot distinguish between them.

5.2 Imperfect-information extensive-form games

Definition 5.2.1 An imperfect-information game (in extensive form) is a tuple (N, A, H, Z, χ, ρ, σ, u, I), where:
- (N, A, H, Z, χ, ρ, σ, u) is a perfect-information extensive-form game, and
- I = (I1, ..., In), where Ii is an equivalence relation on (that is, a partition of) {h ∈ H : ρ(h) = i} with the property that χ(h) = χ(h′) whenever h and h′ are in the same equivalence class of Ii.

Note that in order for the choice nodes to be truly indistinguishable, we require that the set of actions at each choice node in an information set be the same (otherwise, the player would be able to distinguish the nodes). Thus, if I ∈ Ii is an equivalence class, we can unambiguously use the notation χ(I) to denote the set of actions available to player i at any node in information set I.

1 "b " b L " q bR " b 2 " b 2 "b " b (1,1) A " qbB q " b " 1 b %e %e ℓ % q e r ℓ % q e r %% ee %% ee (0,0) (2,4) (2,4) (0,0) q q q q Figure 5.10 An imperfect-information game. What are the equivalence classes for each player? ConsiderThe the imperfect-information pure strategies for each extensive-form player are a game choice shown of an in action Figure in 5.10. In this game, player 1 has two information sets: the set including the top choice node, and each equivalence class. the set including the bottom choice nodes. Note that the two bottom choice nodes in the second information set have the same set of possible actions. We can regard player Repeated Games ISCI 330 Lecture 16, Slide 4 1 as not knowing whether player 2 chose A or B when she makes her choice between ℓ and r.

5.2.2 Strategies and equilibria

A pure strategy for an agent in an imperfect-information game selects one of the available actions in each information set of that agent:

Definition 5.2.2 Given an imperfect-information game as above, a pure strategy for agent i with information sets Ii,1, ..., Ii,k is a vector (a1, ..., ak) such that aj ∈ χ(Ii,j).
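As a concrete illustration of this definition, here is a minimal Python sketch. The representation is hypothetical, with the information sets and actions taken from the description of player 1 in Figure 5.10; since a pure strategy picks one action per information set, the set of pure strategies is just a cross product.

    from itertools import product

    # chi maps each of player 1's information sets to its available actions.
    # This is well defined because all nodes in an information set must share
    # the same action set. The names are illustrative, taken from Figure 5.10.
    chi = {
        "I_1,1": ["L", "R"],   # the top choice node
        "I_1,2": ["l", "r"],   # the two indistinguishable bottom nodes
    }

    # Definition 5.2.2: a pure strategy is a vector with one action per
    # information set, i.e. an element of the cross product of action sets.
    pure_strategies = list(product(*chi.values()))
    print(pure_strategies)   # [('L', 'l'), ('L', 'r'), ('R', 'l'), ('R', 'r')]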

Thus perfect-information games can be thought of as a special case of imperfect-information games, in which every equivalence class of each partition is a singleton. Consider again the Prisoner's Dilemma game, shown as a normal form game in Figure 3.2. An equivalent imperfect-information game in extensive form is given in Figure 5.11. Note that we could have chosen to make player 2 choose first and player 1 choose second.


We can represent any normal form game.

[Figure 5.11: The Prisoner's Dilemma game in extensive form. Player 1 chooses C or D at the root; player 2, without observing that choice, chooses c or d; the payoffs are (−1,−1), (−4,0), (0,−4), (−3,−3).]

Note that it would also be the same if we put player 2 at the root node.

Recall that perfect-information games were not expressive enough to capture the Prisoner's Dilemma game and many other ones. In contrast, as is obvious from this example, any normal-form game can be trivially transformed into an equivalent imperfect-information game. However, this example is also special in that the Prisoner's Dilemma is a game with a dominant strategy solution, and thus in particular a pure-strategy Nash equilibrium. This is not true in general for imperfect-information games. To be precise about the equivalence between a normal form game and its extensive-form image we must consider mixed strategies, and this is where we encounter a new subtlety.

As we did for perfect-information games, we can define the normal form game corresponding to any given imperfect-information game; this normal form game is again defined by enumerating the pure strategies of each agent. Now, we define the set of mixed strategies of an imperfect-information game as simply the set of mixed strategies in its image normal form game; in the same way, we can also define the set of Nash equilibria.⁴ However, we can also define the set of behavioral strategies in the extensive-form game. These are the strategies in which each agent's (potentially probabilistic) choice at each node is made independently of his choices at other nodes. The difference is substantive, and we illustrate it in the special case of perfect-information games. For example, consider the game of Figure 5.2. A strategy for player 1 that selects A with probability .5 and G with probability .3 is a behavioral strategy. In contrast, the mixed strategy (.6(A, G), .4(B, H)) is not a behavioral strategy for that player, since the choices made by him at the two nodes are not independent (in fact, they are perfectly correlated).

In general, the expressive power of behavioral strategies and the expressive power of mixed strategies are non-comparable; in some games there are outcomes that are achieved via mixed strategies but not any behavioral strategies, and in some games it is the other way around. Consider for example the game in Figure 5.12. In this game, when considering mixed strategies (but not behavioral strategies), R is a strictly dominant strategy for agent 1, D is agent 2's strict best response, and thus (R, D) is the unique Nash equilibrium.

4. Note that we have defined two transformations: one from any normal form game to an imperfect-information game, and one in the other direction. However, the first transformation is not one to one, and so if we transform a normal form game to an extensive-form one and then back to normal form, we will not in general get back the same game we started out with. However, we will get a game with identical strategy spaces and equilibria.
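The correlation point from the passage above can be made concrete with a small sketch. This is illustrative Python, not anything from the text beyond the probabilities it quotes; the two functions sample player 1's choices at the two nodes of Figure 5.2 (actions A/B at the first node, G/H at the second).

    import random

    # Mixed strategy (.6(A,G), .4(B,H)): randomize once over pure strategies,
    # so the two choices are perfectly correlated.
    def sample_mixed():
        return ("A", "G") if random.random() < 0.6 else ("B", "H")

    # Behavioral strategy: randomize independently at each choice node
    # (A with probability .5 at the first node, G with probability .3 at the
    # second), as in the text's example.
    def sample_behavioral():
        first = "A" if random.random() < 0.5 else "B"
        second = "G" if random.random() < 0.3 else "H"
        return (first, second)

    # Under the mixed strategy the profile (A, H) never occurs; under the
    # behavioral strategy it occurs with probability 0.5 * 0.7 = 0.35.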


Induced Normal Form

Same as before: enumerate pure strategies for all agents.
Mixed strategies are just mixtures over the pure strategies, as before.
Nash equilibria are also preserved.
Note that we are now able both to convert NF games to EF, and EF games to NF (see the sketch below).
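As an illustration of the EF-to-NF direction, here is a sketch that induces the normal form of the Figure 5.10 game. The tree structure is an assumption, read off the figure as: L ends the game at (1,1); R leads to player 2's choice of A or B, after which player 1 chooses ℓ or r without observing it, with payoffs (0,0)/(2,4) after A and (2,4)/(0,0) after B.

    # Player 1's pure strategies pick an action for each of her two
    # information sets; player 2's pick A or B. Evaluating every profile
    # yields the induced normal form.
    def play(s1, s2):
        root, bottom = s1
        if root == "L":
            return (1, 1)                      # L ends the game immediately
        if s2 == "A":
            return (0, 0) if bottom == "l" else (2, 4)
        else:                                  # s2 == "B"
            return (2, 4) if bottom == "l" else (0, 0)

    p1_strategies = [("L", "l"), ("L", "r"), ("R", "l"), ("R", "r")]
    p2_strategies = ["A", "B"]
    for s1 in p1_strategies:
        print(s1, [play(s1, s2) for s2 in p2_strategies])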

Lecture Overview

Introduction

Play the same normal-form game over and over; each round is called a "stage game". Questions we'll need to answer:
- what will agents be able to observe about others' play?
- how much will agents be able to remember about what has happened?
- what is an agent's utility for the whole game?
Some of these questions will have different answers for finitely- and infinitely-repeated games.

Finitely Repeated Games

Everything is straightforward if we repeat a game a finite number of times:
- we can write the whole thing as an extensive-form game with imperfect information;
- at each round players don't know what the others have done; afterwards they do;
- the overall payoff function is additive: the sum of payoffs in the stage games.


6.1.1 Finitely repeated games

One way to completely disambiguate the semantics of a finitely repeated game is to specify it as an imperfect-information game in extensive form. Figure 6.2 describes the twice-played Prisoner's Dilemma game in extensive form. Note that it captures the assumption that at each iteration the players do not know what the other player is playing, but afterwards they do. Also note that the payoff function of each agent is additive; that is, it is the sum of payoffs in the two stage games.

[Figure 6.2: Twice-played Prisoner's Dilemma in extensive form. The 16 leaves carry the summed stage payoffs, e.g. (−2,−2) after two rounds of mutual cooperation and (−6,−6) after two rounds of mutual defection.]
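A minimal sketch of the additivity claim, using the stage-game payoffs of Figure 6.1; it enumerates the 16 leaves of Figure 6.2 as sums of two stage outcomes.

    from itertools import product

    stage = {("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
             ("D", "C"): (0, -4), ("D", "D"): (-3, -3)}

    # One leaf per pair of (round 1, round 2) action profiles.
    for round1, round2 in product(stage, repeat=2):
        u, v = stage[round1], stage[round2]
        total = (u[0] + v[0], u[1] + v[1])
        print(round1, round2, "->", total)
    # e.g. (C,C) then (C,C) gives (-2,-2); (D,D) twice gives (-6,-6),
    # matching the leaf payoffs in Figure 6.2.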

The extensive form also makes it clear that the strategy space of the repeated game is much richer than the strategy space in the stage game. Certainly one strategy in the repeated game is to adopt the same strategy in each stage game; clearly, this memoryless strategy, called a stationary strategy, is a behavioral strategy in the extensive-form representation of the game. But in general, the action (or mixture of actions) played at a stage game can depend on the history of play thus far. Since this fact plays a particularly important role in infinitely repeated games, we postpone further discussion of this to the next section. Indeed, in the finite, known repetition case, we encounter again the phenomenon of backward induction, which we first encountered when we introduced subgame-perfect equilibria. Recall that in the centipede game, discussed in Section 5.1.3, the unique SPE was to go down and terminate the game at every node. Now consider a finitely repeated Prisoner's Dilemma case. Again, it can be argued that in the last round it is a dominant strategy to defect, no matter what happened so far. This is common knowledge, and no choice of action in the preceding rounds will impact the play in the last round. Thus in the second-to-last round too it is a dominant strategy to defect. Similarly, by induction, it can be argued that the only equilibrium in this case is to always defect. However, as in the case of the centipede game, this argument is vulnerable to both empirical and theoretical criticisms.
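The key step of the induction can be made concrete with a small sketch: since defection strictly dominates in the stage game, the last-round best response is D regardless of the opponent's action or of the history, and once continuation play is fixed the same computation applies to each earlier round.

    stage = {("C", "C"): (-1, -1), ("C", "D"): (-4, 0),
             ("D", "C"): (0, -4), ("D", "D"): (-3, -3)}

    def best_response(opponent_action):
        # Player 1's stage-game best response. Once continuation play is
        # known to be history-independent (all-D in the later rounds),
        # maximizing total payoff reduces to maximizing the current stage
        # payoff.
        return max(["C", "D"], key=lambda a: stage[(a, opponent_action)][0])

    print(best_response("C"), best_response("D"))  # D D: defect either way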

6.1.2 Infinitely repeated games

When the infinitely repeated game is transformed into extensive form, the result is an infinite tree. So the payoffs cannot be attached to any terminal nodes, nor can they be defined as the sum of the payoffs in the stage games (which in general will be infinite).


Example

            C         D                       C         D
      C  −1, −1    −4, 0               C   −1, −1    −4, 0
                                 ⇒
      D   0, −4    −3, −3              D    0, −4    −3, −3

Figure 6.1 Twice-played Prisoner’s Dilemma.

(e.g., the computation of Nash equilibria can be provably faster, or pure-strategy Nash equilibria can be proven to always exist). In this chapter we will present various different representations that address these limitations of the normal and extensive forms. In Section 6.1 we will begin by considering the special case of extensive-form games which are constructed by repeatedly playing a normal-form game, and then we will extend our consideration to the case where the normal form is repeated infinitely. This will lead us to stochastic games in Section 6.2, which are like repeated games but do not require that the same normal-form game is played in each time step. In Section 6.3 we will consider structure of a different kind: instead of considering time, we will consider games involving uncertainty. Specifically, in Bayesian games agents face uncertainty—and hold private information—about the game's payoffs. Section 6.4 describes congestion games, which model situations in which agents contend for scarce resources. Finally, in Section 6.5 we will consider representations that are motivated primarily by compactness and by their usefulness for permitting efficient computation (e.g., of Nash equilibria). Such compact representations can extend upon any other existing representation such as normal form games, extensive-form games or Bayesian games.

6.1 Repeated games

In repeated games, a given game (often thought of in normal form) is played multiple times by the same set of players. The game being repeated is called the stage game. For example, Figure 6.1 depicts two players playing the Prisoner's Dilemma exactly twice in a row.

This representation of the repeated game, while intuitive, obscures some key factors. Do agents see what the other agents played earlier? Do they remember what they knew? And, while the utility of each stage game is specified, what is the utility of the entire repeated game?

We answer these questions in two steps. We first consider the case in which the game is repeated a finite and commonly known number of times. Then we consider the case in which the game is repeated infinitely often, or a finite but unknown number of times.



Notes

Observe that the strategy space is much richer than it was in the NF setting.
Repeating a Nash strategy in each stage game will be an equilibrium (called a stationary strategy); however, there can also be other equilibria.
In general, the strategies adopted can depend on the actions played so far.
We can apply backward induction in these games when the normal form game has a dominant strategy.

Lecture Overview

Infinitely Repeated Games

Consider an infinitely repeated game in extensive form: an infinite tree! Thus, payoffs cannot be attached to terminal nodes, nor can they be defined as the sum of the payoffs in the stage games (which in general will be infinite).

Definition

Given an infinite sequence of payoffs r1, r2, ... for player i, the average reward of i is

    lim_{k→∞} Σ_{j=1}^{k} r_j / k.
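A minimal numerical sketch of this definition (the payoff stream is invented for illustration): for a stream alternating 3, 1, 3, 1, ..., the running average converges to 2.

    def average_reward(r, k):
        # Finite-k approximation of lim_{k -> infinity} sum_{j=1}^{k} r_j / k.
        return sum(r(j) for j in range(1, k + 1)) / k

    r = lambda j: 3 if j % 2 == 1 else 1   # hypothetical payoff stream
    for k in (10, 1_000, 100_000):
        print(k, average_reward(r, k))      # converges to 2.0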

Discounted reward

Definition

Given an infinite sequence of payoffs r1, r2, ... for player i and a discount factor β with 0 ≤ β ≤ 1, the future discounted reward of i is

    Σ_{j=1}^{∞} β^j r_j.

Interpreting the discount factor:
1. the agent cares more about his well-being in the near term than in the long term;
2. the agent cares about the future just as much as the present, but with probability 1 − β the game will end in any given round.
The analysis of the game is the same under both perspectives, as the sketch below suggests.
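A small sketch of why the two readings agree (the values β = 0.9 and a constant per-round reward of 1 are arbitrary choices): the discounted sum of rewards equals the expected undiscounted total when the game survives each round with probability β.

    import random

    beta, r = 0.9, 1.0
    horizon = 10_000   # truncation of the infinite sum

    # Interpretation 1: deterministic discounting.
    discounted = sum(beta ** j * r for j in range(1, horizon))

    # Interpretation 2: undiscounted payoffs, but the game ends with
    # probability 1 - beta in each round.
    def one_run():
        total, j = 0.0, 1
        while j < horizon and random.random() < beta:   # survive into round j
            total += r
            j += 1
        return total

    estimate = sum(one_run() for _ in range(50_000)) / 50_000
    print(discounted, estimate)   # both close to beta / (1 - beta) = 9.0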

Repeated Games ISCI 330 Lecture 16, Slide 14 a choice of action at every decision point here, that means an action at every stage game ...which is an infinite number of actions! Some famous strategies (repeated PD): Tit-for-tat: Start out cooperating. If the opponent defected, defect in the next round. Then go back to cooperation. Trigger: Start out cooperating. If the opponent ever defects, defect forever.

Strategy Space

What is a pure strategy in an infinitely repeated game?


- a choice of action at every decision point; here, that means an action at every stage game
- ...which is an infinite number of actions!

Some famous strategies (repeated PD):
- Tit-for-tat: start out cooperating. If the opponent defected, defect in the next round; then go back to cooperation.
- Trigger: start out cooperating. If the opponent ever defects, defect forever.

Both are sketched below.
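A minimal sketch of the two strategies as functions from the opponent's observed history to an action, with a tiny simulator; the representation is illustrative.

    def tit_for_tat(opp_history):
        # Cooperate first; afterwards copy the opponent's previous action.
        return "C" if not opp_history else opp_history[-1]

    def trigger(opp_history):
        # Cooperate until the opponent defects once; then defect forever.
        return "D" if "D" in opp_history else "C"

    def simulate(strat1, strat2, rounds=5):
        h1, h2 = [], []                      # action histories of each player
        for _ in range(rounds):
            a1, a2 = strat1(h2), strat2(h1)  # each sees the other's history
            h1.append(a1)
            h2.append(a2)
        return list(zip(h1, h2))

    print(simulate(tit_for_tat, trigger))
    # [('C','C'), ('C','C'), ('C','C'), ('C','C'), ('C','C')]:
    # two cooperative strategies sustain mutual cooperation.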

Nash Equilibria

With an infinite number of strategies, what can we say about Nash equilibria?
- We won't be able to construct an induced normal form and then appeal to Nash's theorem to say that an equilibrium exists: Nash's theorem only applies to finite games.
- Furthermore, with an infinite number of strategies, there could be an infinite number of pure-strategy equilibria!
It turns out we can characterize a set of payoffs that are achievable under equilibrium, without having to enumerate the equilibria.
