Basics of game theory 1 (10/1/17 to 2/2/17)

5 What is a game?

We wish to model real situations using games. So first we look at the issues that must be considered when the model is being constructed. We use the words agent and player interchangeably.

a) The number of agents (or player groups) in a game. Each game must have at least two players.
b) Information availability: whether all players in the game have full and complete information when it is their turn to play. These are called games of perfect or complete information. For example, chess is such a game since both players know all the moves made by the other, and the rules of the game (known to both players) constrain what moves can be made at each point in future. This applies largely to sequential games where players move in turn. Games where players must move simultaneously (so neither player knows the other's move), or where some information is not available as in many card games, are examples of games with imperfect information.
c) Moves or actions available to each player. This is normally specified in the form of a set of strategies for each player. Informally, a strategy is a rule or algorithm which tells a player what move to make at each point.
d) The rules of the game. These are normally expected to be known to all players, but in real-life applications that are modelled as games one or more players may not be fully aware of the rules.
e) Games can be one-shot, repetitive or sequential. A one-shot game is played only once. A repetitive game is one that is played repeatedly between the same players; it may be repeated finitely or infinitely many times. Player strategies/behaviour can vary depending on whether a game is one-shot or repetitive. A sequential game is one where players take turns to play and the game typically lasts over multiple rounds of play (e.g. chess). A sequential game can also be repetitive if it is played again and again.
f) Presence of a stochastic or random element. A player called 'Nature' is invented when a stochastic or random element enters the game, for example dealing cards or throwing dice. In such cases a probability distribution is defined over all possible moves/states to specify the random move played by Nature.
g) Nature of payoff. The goal of any player is to maximize his/her payoff, or alternately to win the game for the player (or sometimes the group). For any game a payoff function for each player has to be defined. The domain of the payoff function is the set of strategy profiles (informally, if the game has n players a strategy profile is the n-tuple $(s_1, \ldots, s_n)$ where $s_i$ is a strategy from the strategy set of the $i$-th player) and the range is R.
h) Cooperation: in some games, called cooperative games, one or more agents may cooperate with another agent or group.

i) Static versus dynamic: there are two senses in which games can be dynamic. Games with incomplete information (e.g. when a player does not know the payoff or costs or other information that is private to other players) are dynamic since more information may become available as a game proceeds thereby changing available or chosen strategies. In the second sense evolutionary games are dynamic as strategies that have lower fitness gradually vanish from the population (see section 4 in lec. 1).

6 An example game

Consider an example game described below:

• Two langurs, L (big) and l (small), are below a fruit tree that has 10 fruits on it. One or both of them must climb and shake the tree so that the fruits drop to the ground, after which they can eat them.

Figure 1: L plays first. Figure 2: l plays first. Figure 3: L, l play simultaneously.

• Here is some more data on the game:

– L takes 2 units of fruit energy to climb and shake the branches.
– l (being light and small) takes 0 units of fruit energy for the same.
– If L climbs then the fruits are shared L-6, l-4.
– If l climbs then the fruits are shared L-9, l-1.
– If both L and l climb then the fruits are shared L-7, l-3.

• The goal of each langur is to get the maximum net fruit energy units (1 fruit ≡ 1 fruit energy unit).
• Two moves are available to both langurs: either climb (c) or wait below (w).
• There are 3 possible scenarios:

– L moves first.
– l moves first.
– Both move simultaneously. Note that neither knows what move/strategy the other will play.

We represent the game as a game tree. Figures 1 and 2 are the game trees when L and l play first respectively, and figure 3 is the tree when both play simultaneously. In figure 3 L is drawn moving first, but the tree can also be drawn with l moving first - it is symmetric with respect to the two players. Figures 1 and 2 are perfect information games since both players know everything that the other player does. Figure 3 is an imperfect or incomplete information game since neither knows which strategy/move will be played by the other. So the c and w nodes are connected with a dashed line to form what is called an information set. In figure 3 l does not know which of c or w L has played before making its own move since both play simultaneously. The nodes in an information set are indistinguishable for the player whose turn it is to play. A single node is also an information set - a set with just one element, namely the node itself. So we can always talk in terms of information sets when talking about game trees.

The game trees in figures 1 to 3 are finite game trees - that is, they have finitely many nodes. We can also have infinite game trees where the tree has infinitely many nodes - for example a chess game where repetition of moves is allowed. A game tree may have an infinite number of nodes but simultaneously have finite depth. This can happen if one or more nodes have infinitely many children but the depth of a path from the root to any leaf node is finite. Such games are called finite horizon games. An example of such a game is one where at a node a player's move corresponds to choosing any number from a real interval, e.g. [a, b], a < b, a, b ∈ R.
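The ideas above can be made concrete with a small data structure. Below is a minimal Python sketch (illustrative, not part of the notes; the Node class and field names are our own) of a game-tree node that records the player to move, the children indexed by move, the payoff vector at a leaf, and an information-set label. The simultaneous-move game of figure 3 is encoded by giving l's two decision nodes the same information-set id.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class Node:
    """A node of an extensive-form game tree (hypothetical representation)."""
    player: Optional[str] = None                                 # player to move; None at a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)    # move -> child node
    payoff: Optional[Tuple[int, int]] = None                     # (payoff to L, payoff to l) at a leaf
    info_set: Optional[str] = None                               # nodes sharing this id are indistinguishable

# Figure 3: L and l move simultaneously, so l's two decision nodes share one information set.
fig3 = Node(player="L", children={
    "c": Node(player="l", info_set="l-after-unknown", children={
        "c": Node(payoff=(5, 3)), "w": Node(payoff=(4, 4))}),
    "w": Node(player="l", info_set="l-after-unknown", children={
        "c": Node(payoff=(9, 1)), "w": Node(payoff=(0, 0))}),
})
```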

P2 (column player l)

P1 (row player L)    [(c,c)(w,c)]   [(c,c)(w,w)]   [(c,w)(w,c)]   [(c,w)(w,w)]
c                    5,3            5,3            4,4            4,4
w                    9,1            0,0            9,1            0,0

Table 1: The normal form of the langur game.

A game represented as a tree which shows the details of who moves when is called a game in extensive form. A game can also be represented as a table or matrix (n-dimensional, where n is the number of players) in which the size of the i-th dimension is the number of strategies available to player i and each entry gives the payoffs to the players for the corresponding combination of strategies. This is called the normal form of the game. For example, the tree in figure 1 converts to the normal form game shown in table 1; l's strategies say what l will play when L plays c and when it plays w. The game tree in figure 3 has the following normal form. Note that here l has only two strategies since nodes in an information set are indistinguishable.

P2 (column player l)

P1 (row player L)    c      w
c                    5,3    4,4
w                    9,1    0,0

6.1 Analysis of the game

We assume both players are rational, which means each player tries to maximize its payoff. Consider the game in figure 1 where L plays first. L reasons that playing w is best since the payoff is 9 if l climbs, and l must climb because it then gets at least 1, otherwise it gets 0. Note that L has just 2 strategies - w and c. However, l has 4 strategies. It can decide to: i) play c irrespective of what L plays, ii) play w irrespective of what L plays, iii) play the same move that L plays, iv) play the opposite of what L plays. These 4 strategies are clear from the normal form of the game. To completely define a strategy for player i we have to give a rule/algorithm that gives the move for each information set when it is i's turn to play. For the game in figure 2 (l plays first) l can play w knowing that rationality demands L will play c to maximize its payoff, resulting in a payoff of 4 for both.

6.1.1 Threats, promises

Suppose for the game in figure 1 l threatens to play w irrespective of what L plays (the strategy [(c,w)(w,w)] in the normal form) and makes L believe the threat. Then clearly L should play c because its payoff is then 4, which is better than 0 (if it plays w). This is usually called an incredible threat since the key question is how l makes L believe that it will carry out the threat, given that, absent the threat, w is rationally the best strategy for L. Notice that if the threat does not work then both players lose. Similarly, l can make a promise that it will play c if L plays c. This gives L a payoff of 5 while l gets 3, so L has 1 more and l has 1 less compared to the threat. But in this case L need not accept the promise, since if it plays w it stands to gain 9 and l will be forced to play c to maximize its payoff. A promise implies it is good for the other player but not for the promiser; if it was good for both then both should prefer it in the normal course of play. Threats and promises need some a priori commitment from the players: l must, in advance, convince L of the threat or the promise for either to be effective.

6.1.2 Mixed and pure strategies

For the game in figure 3, since neither player knows what the other will play, the best each can do is to choose among its strategies according to a suitable probability distribution. Such strategies are called mixed strategies. The earlier strategies (in the normal form table) are called pure strategies.

Assume the row player L chooses c with probability $p_r$ and the column player l chooses c with probability $p_c$. Then L's expected payoff is:

\[ \Pi_r = 5 p_r p_c + 9 (1 - p_r) p_c + 4 p_r (1 - p_c) \]

Differentiating with respect to $p_r$ and equating to 0 gives:

\[ \frac{\partial \Pi_r}{\partial p_r} = 5 p_c - 9 p_c + 4(1 - p_c) = 4 - 8 p_c = 4(1 - 2 p_c) = 0 \]

or $p_c = \frac{1}{2}$. Repeating the calculation for l gives:

\[ \Pi_c = 3 p_r p_c + (1 - p_r) p_c + 4 p_r (1 - p_c) \]

\[ \frac{\partial \Pi_c}{\partial p_c} = 3 p_r + (1 - p_r) - 4 p_r = 1 - 2 p_r = 0 \]
or $p_r = \frac{1}{2}$. Both $\frac{\partial \Pi_r}{\partial p_r}$ and $\frac{\partial \Pi_c}{\partial p_c}$ have essentially the same form, so let us analyse $\frac{\partial \Pi_r}{\partial p_r}$ further. We see that
\[ \frac{\partial \Pi_r}{\partial p_r} \;\begin{cases} > 0 & p_c < \frac{1}{2} \\ = 0 & p_c = \frac{1}{2} \\ < 0 & p_c > \frac{1}{2} \end{cases} \]

Since $\frac{\partial \Pi_r}{\partial p_r} > 0$ when $p_c < \frac{1}{2}$ we can increase $p_r$ to the maximum possible value (that is 1) to maximize $\Pi_r$ - that is, use the pure strategy. Similar reasoning works when $p_c > \frac{1}{2}$ to give $p_r = 0$. This implies:

 1  1 1 pc < 1 pr <  2  2 pr = [0, 1] pc = 0 and pc = [0, 1] pr = 0    1  1 0 pc > 2 0 pr > 2

6.1.3 Nature moves

If stochastic events/actions are possible at a node we define a new player called Nature. This is also how we model the different states of the world that are possible in some games (e.g. in games of chance). In both cases we define probability distributions over actions or states of the world. Consider the game tree in figure 4. The non-leaf node labels have the format (node-name, player-ID). From the tree we see that player P1 is able to see Nature's moves since nodes N1, N2 are distinct. However, player P2 is not able to see Nature's moves since it responds only to whether P1 plays L or R - L1, L2 form an information set and R1, R2 form another information set. So P2 does not know whether Nature has played N1 or N2. Player P1 has the following 4 strategies:

1. (N1 L, N2 L) - Play L irrespective of what Nature plays.
2. (N1 R, N2 R) - Play R irrespective of what Nature plays.
3. (N1 L, N2 R) - Play L if Nature plays N1 and R otherwise.

Figure 4: Game tree with Nature (N) playing the first move.

4. (N1 R,N2 L) - Play R if Nature plays N1 and L otherwise.

Player P2 has the following 4 strategies:

a) (L1L2 l, R1R2 l) - Play l irrespective of the information set.
b) (L1L2 r, R1R2 r) - Play r irrespective of the information set.
c) (L1L2 l, R1R2 r) - Play l if P1 plays L and r if P1 plays R.
d) (L1L2 r, R1R2 l) - Play r if P1 plays L and l if P1 plays R.

Payoffs are now expectation based. For example:

\[ \Pi_{P1}(1, a) = 0.6 \times \Pi_{P1}(1, a, N_1) + 0.4 \times \Pi_{P1}(1, a, N_2) = 0.6 \times 4 + 0.4 \times 6 = 4.8 \]

\[ \Pi_{P2}(2, c) = 0.6 \times \Pi_{P2}(2, c, N_1) + 0.4 \times \Pi_{P2}(2, c, N_2) = 0.6 \times 1 + 0.4 \times 6 = 3.0 \]
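These expectations are easy to compute mechanically. The snippet below (a small illustration, not part of the notes) folds Nature's distribution from figure 4 (0.6 on N1, 0.4 on N2) over the conditional payoffs used in the two sample calculations above.

```python
# Nature's distribution in figure 4 (0.6 on N1, 0.4 on N2).
nature = {"N1": 0.6, "N2": 0.4}

def expected_payoff(conditional: dict) -> float:
    """Expectation of a payoff that depends only on Nature's move."""
    return sum(nature[n] * v for n, v in conditional.items())

# Pi_P1(1, a): P1 gets 4 if Nature plays N1 and 6 if Nature plays N2.
print(expected_payoff({"N1": 4, "N2": 6}))   # 4.8
# Pi_P2(2, c): P2 gets 1 if Nature plays N1 and 6 if Nature plays N2.
print(expected_payoff({"N1": 1, "N2": 6}))   # 3.0
```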

6.1.4 Equilibrium strategies and Nash equilibrium

If we look at the normal form of the game in figure 1 (table 1), the strategy pairs (w,[(c,c)(w,c)]), (w,[(c,w)(w,c)]) and (c,[(c,w)(w,w)]) are such that L's payoff is the maximum among all entries in that column (so L cannot do better by changing its row) and simultaneously l's payoff is the maximum among all entries in that row (so l cannot do better by changing its column). Such strategy pairs are called equilibrium strategies or, more commonly, Nash equilibria. In a Nash equilibrium each player's strategy is the best response to the strategies chosen by the other player.
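The check just described is mechanical: a cell is a pure Nash equilibrium if the row player's entry is maximal in its column and the column player's entry is maximal in its row. A small Python sketch of this check (illustrative code, the function name is our own), applied to table 1:

```python
def pure_nash_equilibria(payoffs):
    """Return (row, col) index pairs that are pure Nash equilibria of a bimatrix game.

    payoffs[r][c] = (row player's payoff, column player's payoff).
    """
    n_rows, n_cols = len(payoffs), len(payoffs[0])
    equilibria = []
    for r in range(n_rows):
        for c in range(n_cols):
            row_pay, col_pay = payoffs[r][c]
            row_best = all(payoffs[r2][c][0] <= row_pay for r2 in range(n_rows))
            col_best = all(payoffs[r][c2][1] <= col_pay for c2 in range(n_cols))
            if row_best and col_best:
                equilibria.append((r, c))
    return equilibria

# Table 1: rows are L's strategies (c, w); columns are l's four strategies.
table1 = [
    [(5, 3), (5, 3), (4, 4), (4, 4)],   # L plays c
    [(9, 1), (0, 0), (9, 1), (0, 0)],   # L plays w
]
# Prints [(0, 3), (1, 0), (1, 2)], i.e. (c,[(c,w)(w,w)]), (w,[(c,c)(w,c)]), (w,[(c,w)(w,c)]).
print(pure_nash_equilibria(table1))
```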

6.2 Notation, definitions, some results

This section fixes the notation and conventions used in this document and defines all the terms being used more precisely.

For game trees, terminal (leaf) nodes are rectangular with the label being the payoff values. Intermediate (interior) nodes are ovals with the node label in the form (node-name, player-ID). The root of the tree is always called root.

Definition 1 (Game in extensive form and related). A game in extensive form contains a finite set of n players; a rooted game tree where each interior node of the tree is mapped to one of the players and each edge emanating from the node is labelled by a move that the player can make at that node; the number of children at each interior node is equal to the number of moves that the player can make at that node. Each terminal (leaf) node is assigned a payoff value that is a real number (this can be done through a payoff function). Further, due to imperfect information, if some nodes are indistinguishable for a particular player then they form an equivalence set of nodes called an information set. The information set functions exactly like a normal interior node - that is, it has a single player associated with all nodes in the set and the possible moves from each node in the information set are identical. If there is a chance or stochastic element in the game then a fictitious player called Nature is added to the set of players and the edges emanating from Nature are labelled with the probability of making that move. The game is finite if the game tree has finitely many nodes. The game tree is of finite extent if the game tree has infinitely many nodes but has finite depth. This means some nodes in the game tree have infinitely many children.

Nodes that are part of an information set are connected by dashed undirected edges when game trees are drawn. In some depictions the entire information set may be circled with a dotted or undotted oval. Note that all nodes in a game tree can be treated as information sets: genuine information sets will contain more than one node while degenerate information sets will contain just a single node. The words action and move are synonymous in the context of a player playing a game and are used interchangeably.

Definition 2 (Strategy, pure strategy, mixed strategy). In a game a strategy for a player is a rule or algorithm that completely defines the sequence of moves for the player for every choice of moves by the other players. A pure strategy is one where all moves in the strategy are completely determined and each move is one of the moves available to the player.

If $s_1, \ldots, s_k \in S_i$ are pure strategies for player $i$ then a mixed strategy is a probability distribution over $s_1, \ldots, s_k$, written as $m_i = p_1 s_1 + p_2 s_2 + \cdots + p_k s_k$ with $\sum_{j=1}^{k} p_j = 1$. The set of $s_j$ with $p_j \neq 0$ is called the support of $m_i$. If only one $p_j \neq 0$ (i.e. it is 1) then $m_i$ is the pure strategy $s_j \in S_i$.

Definition 3 (Strategy profile). Let $S_i$ be the set of strategies available to player $i$. Then in an n-player game the n-tuple $s = (s_1, s_2, \ldots, s_n)$ where $s_i \in S_i$ is called a pure strategy profile - that is, $s \in S_1 \times S_2 \times \cdots \times S_n$. $S$ is the set of all pure strategy profiles, that is, the Cartesian product above, and $s \in S$.

In an n-player game a mixed strategy profile is $m = (m_1, \ldots, m_n)$ where $m_i$ is a mixed strategy for player $i$. $M$ is the set of mixed strategy profiles.

Definition 4 (Payoff). The payoff function $\Pi : M \to \mathbb{R}$ defines the payoff $\Pi(m)$ for a mixed strategy profile $m \in M$. Assuming players' choices are made independently, the payoff for player $i$ from mixed strategy profile $m$ is the expected value
\[ \Pi_i(m) = \sum_{s_1 \in S_1} \cdots \sum_{s_n \in S_n} p_{s_1} \cdots p_{s_n}\, \Pi_i((s_1, \ldots, s_n)) \]
where $p_{s_j}$ is the probability that $m_j$ assigns to the pure strategy $s_j$.
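The expectation in Definition 4 can be spelled out in a few lines of code. The sketch below (illustrative, not from the notes; the helper name is our own) enumerates all pure strategy profiles with itertools.product and weights each by the product of the mixed-strategy probabilities; it is applied to the simultaneous-move langur game at the mixed profile $((\frac{1}{2},\frac{1}{2}),(\frac{1}{2},\frac{1}{2}))$ found earlier.

```python
from itertools import product

def expected_payoffs(payoff, mixed):
    """Expected payoff vector for a mixed strategy profile.

    payoff[(s1, ..., sn)] = tuple of payoffs, one per player.
    mixed[i][s] = probability that player i plays pure strategy s.
    """
    n = len(mixed)
    totals = [0.0] * n
    for profile in product(*(m.keys() for m in mixed)):
        prob = 1.0
        for i, s in enumerate(profile):
            prob *= mixed[i][s]
        for i in range(n):
            totals[i] += prob * payoff[profile][i]
    return totals

# Simultaneous langur game; both players mix 50-50 over c and w.
payoff = {("c", "c"): (5, 3), ("c", "w"): (4, 4), ("w", "c"): (9, 1), ("w", "w"): (0, 0)}
mixed = [{"c": 0.5, "w": 0.5}, {"c": 0.5, "w": 0.5}]
print(expected_payoffs(payoff, mixed))   # [4.5, 2.0]
```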

Definition 5 (Normal/strategic form of game). The normal or strategic form of a game has n players, a set $S_i$ of pure strategies for each player $i$, and a payoff function $\Pi_i : S \to \mathbb{R}$ for player $i$, where $S$ is the set of strategy profiles. A normal form game can be specified as a $|S_1| \times |S_2| \times \cdots \times |S_n|$ sized matrix where the possible indices of the $i$-th dimension are the strategies in $S_i$.

For example the normal form of the langur game is shown in table 1. To convert an extensive form game into normal form we can use the following steps:

a) The number of players in normal form is the same as that in extensive form.

b) For each player a pure strategy is a choice of move at each information set labelled by the player in the game tree. The number of pure strategies for player $i$ with $m_i$ information sets in the game tree is $\prod_{j=1}^{m_i} \text{moves}_j$ - that is, the product of the number of moves available at each information set labelled by the player in the game tree.

c) If $T$ is the set of terminal nodes then the payoff for player $i$ for strategy profile $s$ is given by $\Pi_i(s) = \sum_{t \in T} p(s, t)\, \Pi_i(t)$ - this is the general form that allows Nature moves. Any strategy profile $s$ defines a set of paths starting at the root and ending at terminal nodes $t$ in the game tree that have non-zero probability given the strategy profile $s$; $p(s, t)$ is the probability of the path corresponding to the terminal node $t$.

The game tree in figure 4 has the normal form shown in table 2 below. The sample calculations for the strategy profile

P2: column player

P1: row player     [(L1L2 l)(R1R2 l)]   [(L1L2 l)(R1R2 r)]   [(L1L2 r)(R1R2 l)]   [(L1L2 r)(R1R2 r)]
[(N1 L)(N2 L)]     4.8, 4.0             2.4, 1.6             2.4, 4.6             0.0, 3.0
[(N1 L)(N2 R)]     2.4, 2.4             4.0, 4.8             0.0, 3.0             1.6, 5.4
[(N1 R)(N2 L)]     5.4, 1.6             3.0, 0.0             3.0, 2.2             0.6, 0.6
[(N1 R)(N2 R)]     3.0, 0.0             4.6, 2.4             0.6, 0.6             2.2, 3.0

Table 2: The normal form of the game in figure 4 with Nature moves.

s = [(N1 L)(N2 L), (L1L2 l)(R1R2 l)] in table 2, for each player P1 and P2, look as follows:

\[ \Pi_{P1}(s) = 0.6 \times 4 + 0.4 \times 6 = 4.8 \]

\[ \Pi_{P2}(s) = 0.6 \times 4 + 0.4 \times 4 = 4.0 \]

Let G be a game in normal form with n players. The following data is given for the $i$-th player: $S_i$, the set of pure strategies; $\Pi_i : S \to \mathbb{R}$, the payoff function, where $S$ is the set of strategy profiles; $\Delta S_i$, the set of mixed strategies; and $\Delta^* S = \Delta S_1 \times \Delta S_2 \times \cdots \times \Delta S_n$, the set of mixed strategy profiles. If $m \in \Delta^* S$ is a mixed strategy profile then $m_i$ refers to the $i$-th component of $m$, which is a mixed strategy of player $i$; clearly $m_i \in \Delta S_i$.

Let $m \in \Delta^* S$ and $m' \in \Delta S_i$. Then $(m_{-i}, m')$ (equivalently written $(m', m_{-i})$) denotes the profile obtained by replacing $m_i$ in $m$ by $m'$. Given the above notation we can now define Nash equilibrium.

Definition 6 (Nash Equilibrium). Strategy profile $m^* = (m_1^*, \ldots, m_n^*) \in \Delta^* S$ is a Nash equilibrium if for every player $i \in 1..n$ and for every $m' \in \Delta S_i$, $\Pi_i(m^*) \geq \Pi_i(m_{-i}^*, m')$. So, for player $i$, choosing $m_i^*$ is at least as good as choosing any other $m' \in \Delta S_i$ given that the other players play $m_{-i}^*$.

So for a strategy profile $m^*$ that is a Nash equilibrium (NE) each player's response is the best response to the strategies chosen by the other players. Other individual responses may be as good for particular moves but no strategy is strictly better. In many games one can accurately predict that players choose strategies that implement the NE. If players deviate from the NE then there is a good chance that they do not understand the game properly or the payoff function from the players' point of view is different from the one that has been assumed in the model. We now state a few results.

Theorem 1. Every extensive form game has a unique normal form but the converse is not true.

Theorem 2 (Fundamental theorem of game theory - Nash). If each player in an n-player game has a finite number of pure strategies then the game has at least one Nash equilibrium, possibly in mixed strategies.

Theorem 3 (Fundamental theorem of mixed strategy NE). Let $m = (m_1, \ldots, m_n)$ be a mixed strategy profile for an n-player game. Then $m$ is a NE iff for every player $i \in 1..n$ with pure strategy set $S_i$ the following holds:

a) If $s, s' \in S_i$ both occur with positive weight (probability) in $m_i$ then the payoffs for $s$ and $s'$ are equal when played against $m_{-i}$.

b) If $s$ has positive weight and $s'$ has 0 weight then the payoff for $s'$ is less than or equal to the payoff for $s$ when played against $m_{-i}$.

Proof. Consider a). The argument is fairly easy. Assume $s$ and $s'$ have probabilities $p > 0$ and $p' > 0$ respectively in $m$ (which is an NE). If the payoff of one of them is greater than the other then clearly we get a higher payoff by shifting all the probability mass of the strategy with the lower payoff to the strategy with the higher payoff, keeping everything else unchanged, implying that $m$ is not a NE. A very similar argument works for b).

For the converse direction let a) and b) be true. Let $m = (m_1, \ldots, m_n)$ be a mixed strategy profile such that for each $m_i$ all the strategies that have positive probability weight have equal payoff, and all strategies with 0 weight in $m_i$ have payoff less than or equal to those with non-zero weight. Since the above is true for all players, for any particular player, say $i$, shifting weight to strategies with 0 weight cannot increase the payoff for player $i$, assuming the others continue with their strategies. So for all $i$ the strategy profile $m$ is such that $\Pi_i(m) \geq \Pi_i((m_{-i}, m'))$ where $m'$ is any strategy obtained from $m_i$ by shifting some probability weight from strategies with non-zero weight to strategies with 0 weight. This implies that $m$ is an equilibrium strategy, i.e. a Nash equilibrium.
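Part a) is easy to check numerically. The sketch below (illustrative) verifies the indifference condition for the mixed equilibrium $p_r = p_c = \frac{1}{2}$ of the simultaneous langur game: each of L's pure strategies earns the same expected payoff against l's mix, and likewise for l.

```python
# Simultaneous langur game payoffs: (L, l) for each pure profile.
payoff = {("c", "c"): (5, 3), ("c", "w"): (4, 4), ("w", "c"): (9, 1), ("w", "w"): (0, 0)}
l_mix = {"c": 0.5, "w": 0.5}   # l's equilibrium mix
L_mix = {"c": 0.5, "w": 0.5}   # L's equilibrium mix

# L's expected payoff from each pure strategy against l's mix (4.5 in both cases).
for s in ("c", "w"):
    print("L plays", s, "->", sum(l_mix[t] * payoff[(s, t)][0] for t in l_mix))
# l's expected payoff from each pure strategy against L's mix (2.0 in both cases).
for t in ("c", "w"):
    print("l plays", t, "->", sum(L_mix[s] * payoff[(s, t)][1] for s in L_mix))
```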

6.3 Solution to mixed strategy NE and dominant strategies

Consider the following 2-player normal form (NF) game, with Row the row player and Col the column player:

                Col
Row         l              r
L           a1, a2         b1, b2
R           c1, c2         d1, d2

We derive conditions for checking for a mixed strategy equilibrium. Let the row and column players play the mixed strategies $m_r = (p_r, 1 - p_r)$ and $m_c = (p_c, 1 - p_c)$ respectively, where the probabilities $p_r, p_c > 0$. From theorem 3, Row's payoffs for strategies L and R must be equal given that Col plays $m_c$, and the same holds for strategies l and r for Col given that Row plays $m_r$. This gives:

\[ p_c a_1 + (1 - p_c) b_1 = p_c c_1 + (1 - p_c) d_1 \tag{1} \]

\[ p_r a_2 + (1 - p_r) c_2 = p_r b_2 + (1 - p_r) d_2 \tag{2} \]

From equations (1) and (2) we get:

\[ p_c = \frac{d_1 - b_1}{(a_1 - b_1) - (c_1 - d_1)} = \frac{1}{1 + \frac{a_1 - c_1}{d_1 - b_1}} \tag{3} \]
\[ p_r = \frac{d_2 - c_2}{(d_2 - c_2) + (a_2 - b_2)} = \frac{1}{1 + \frac{a_2 - b_2}{d_2 - c_2}} \tag{4} \]

For $p_c$ and $p_r$ to be probabilities:
\[ \frac{a_1 - c_1}{d_1 - b_1} \geq 0 \tag{5} \]
\[ \frac{a_2 - b_2}{d_2 - c_2} \geq 0 \tag{6} \]
In the mixed NE the payoffs for the Row and Col players are:
\[ \text{payoff}_r = a_1 p_r p_c + b_1 p_r (1 - p_c) + c_1 (1 - p_r) p_c + d_1 (1 - p_r)(1 - p_c) \tag{7} \]
\[ \text{payoff}_c = a_2 p_r p_c + b_2 p_r (1 - p_c) + c_2 p_c (1 - p_r) + d_2 (1 - p_r)(1 - p_c) \tag{8} \]

The number of configurations that must be checked for a 2×2 game is 4 pure configurations and 1 mixed configuration. In addition we can have pure-mixed NE, which adds another 4 configurations, giving a total of 9. For an n×m NF game the number of ways in which non-zero probability mass can be distributed over the row player's n strategies is $\binom{n}{1} + \binom{n}{2} + \cdots + \binom{n}{n} = 2^n - 1$; the summation counts the number of ways in which probability weight can be distributed over n strategies, leaving out the empty support. Similarly, for the second player we get $2^m - 1$ configurations. So the total number of configurations to be checked becomes $(2^n - 1) \times (2^m - 1)$.
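Equations (3), (4), (7) and (8) translate directly into code. Below is a small Python sketch (illustrative only; the function name is our own) that returns the interior mixed equilibrium of a 2×2 bimatrix game when one exists; it returns None when a denominator vanishes or the resulting values are not valid probabilities.

```python
def mixed_ne_2x2(a1, a2, b1, b2, c1, c2, d1, d2):
    """Interior mixed NE of the 2x2 game [[(a1,a2),(b1,b2)],[(c1,c2),(d1,d2)]].

    Returns (pr, pc, payoff_r, payoff_c) or None if no interior mixed NE exists.
    pr = probability Row plays the first row; pc = probability Col plays the first column.
    """
    denom_c = (a1 - b1) - (c1 - d1)
    denom_r = (d2 - c2) + (a2 - b2)
    if denom_c == 0 or denom_r == 0:
        return None
    pc = (d1 - b1) / denom_c                      # equation (3)
    pr = (d2 - c2) / denom_r                      # equation (4)
    if not (0 < pr < 1 and 0 < pc < 1):
        return None
    payoff_r = a1*pr*pc + b1*pr*(1-pc) + c1*(1-pr)*pc + d1*(1-pr)*(1-pc)   # equation (7)
    payoff_c = a2*pr*pc + b2*pr*(1-pc) + c2*(1-pr)*pc + d2*(1-pr)*(1-pc)   # equation (8)
    return pr, pc, payoff_r, payoff_c

# Simultaneous langur game: rows (c, w) for L, columns (c, w) for l.
print(mixed_ne_2x2(5, 3, 4, 4, 9, 1, 0, 0))   # (0.5, 0.5, 4.5, 2.0)
```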

6.3.1 Dominance

For a game in NF let $s_i, s_i'$ be two strategies for player $i$. Then the strict dominance relation ($\succ$) and the weak dominance relation ($\succeq$) are defined as follows:

$s_i \succ s_i'$ ($s_i$ strictly dominates $s_i'$) if for every choice of strategies by the other players the payoff for $i$ from using $s_i$ is greater than from using $s_i'$.

$s_i \succeq s_i'$ ($s_i$ weakly dominates $s_i'$) if for every choice of strategies by the other players the payoff for $i$ from using $s_i$ is greater than or equal to the payoff from using $s_i'$, and for some choice of strategies by the other players the payoff for $i$ from using $s_i$ is strictly greater than the payoff from using $s_i'$.

Note that $\succ$ implies $\succeq$. By a rational agent we mean that if $s \succ s_i$ then $i$ will not choose $s_i$. The following example will make the concept of dominance more concrete.

              Col
Row       a         b         c         d
i       (1, 2)   (-1, 2)    (2, 4)   (-1, 3)
ii      (4, 1)    (1, 1)    (3, 1)    (0, 2)

For Row, ii $\succ$ i since for any strategy of Col, Row's payoff for ii is greater than for i. For Col, d $\succ$ a, b and c $\succeq$ a, b. If all players have dominant strategies then playing them yields a NE; in a NE no player plays a strictly dominated strategy.
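A dominance check is a pair of element-wise comparisons. The sketch below (illustrative; the helper names are our own) tests strict and weak dominance in the example above.

```python
def strictly_dominates(p, q):
    """True if the strategy with payoffs p strictly dominates the one with payoffs q."""
    return all(x > y for x, y in zip(p, q))

def weakly_dominates(p, q):
    """True if p is never worse than q and strictly better at least once."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))

# Row player's payoffs in the example above, against Col's strategies a, b, c, d.
row_i  = [1, -1, 2, -1]
row_ii = [4,  1, 3,  0]
print(strictly_dominates(row_ii, row_i))        # True: ii strictly dominates i

# Col's payoffs against Row's strategies i, ii, for its strategies a, c, d.
col = {"a": [2, 1], "c": [4, 1], "d": [3, 2]}
print(strictly_dominates(col["d"], col["a"]))   # True: d strictly dominates a
print(weakly_dominates(col["c"], col["a"]))     # True: c weakly dominates a
```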

6.4 Backward induction, iterative elimination of dominated strategies

Given a finite extensive game with perfect information we can compute the best strategies for all players by the method of backward induction. This assumes that the players are rational and will choose moves at each stage that maximize their payoff. Algorithm 1 gives the details. If we use the algorithm on the game tree in figure 1 then the node c is labelled with (4, 4) and node w with (9, 1); these now become terminal nodes. The root is then labelled with (9, 1). So the strategies returned are: L: w; l: [w, c]. This is a NE. A code sketch of this procedure is given after Algorithm 1 below.

For NF games we simplify the game by iteratively eliminating strongly dominated strategies. The order in which strongly dominated strategies are eliminated does not change the final reduced game. Similarly, we can eliminate weakly dominated strategies, but now the order of removal can change the resulting game. Consider the following NF for the langur game.

Algorithm 1: The backward induction algorithm.
  input : full (perfect) information game tree
  output: strategies of all players
  repeat
      choose a node n all of whose children are terminal nodes
          (let the player playing at node n be i);
      choose the child of n with the maximum payoff for player i, label n with that
          child's payoff vector and record the move as part of the strategy for player i
          (n has now become a terminal node);
  until the tree has only one node;
  return the strategies of all players
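Here is a minimal Python sketch of backward induction (illustrative, not the notes' code), run on the figure 1 game tree; it reproduces the labels (4, 4) and (9, 1) and the root value (9, 1) mentioned above.

```python
def backward_induction(node, strategies):
    """Return the payoff vector of `node`, filling `strategies` along the way.

    A leaf is a tuple of payoffs; an interior node is (player_index, {move: child}).
    """
    if isinstance(node, tuple):          # leaf: payoff vector
        return node
    player, children = node
    best_move, best_payoff = None, None
    for move, child in children.items():
        payoff = backward_induction(child, strategies)
        if best_payoff is None or payoff[player] > best_payoff[player]:
            best_move, best_payoff = move, payoff
    strategies.setdefault(player, []).append(best_move)
    return best_payoff

# Figure 1: L (player 0) moves first, then l (player 1). Leaves are (L, l) payoffs.
fig1 = (0, {"c": (1, {"c": (5, 3), "w": (4, 4)}),
            "w": (1, {"c": (9, 1), "w": (0, 0)})})
strategies = {}
print(backward_induction(fig1, strategies))   # (9, 1)
print(strategies)                             # {1: ['w', 'c'], 0: ['w']}
```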

                  l

L        (cc wc)    (cc ww)    (cw wc)    (cw ww)
c        (5, 3)     (5, 3)     (4, 4)     (4, 4)
w        (9, 1)     (0, 0)     (9, 1)     (0, 0)

For L neither c nor w dominates the other. For l we get:

1. (cw wc) $\succ$ (cc ww)
2. (cc wc) $\succeq$ (cc ww)
3. (cw wc) $\succeq$ (cc wc)
4. (cw ww) $\succeq$ (cc ww)
5. (cw wc) $\succeq$ (cw ww)

6.5 Examples of games

We now look at some well-known examples of games.

Marble choosing game: Row has a box with 2 marbles; Col also has a box with 2 marbles. Both play simultaneously and each chooses either one or two marbles from its box. If both have drawn the same number of marbles Row wins, else Col wins. The loser pays the winner Re 1. The NF of the game is:

            Col
Row       1          2
1         (1, -1)    (-1, 1)
2         (-1, 1)    (1, -1)

There is no pure NE. For a mixed NE, using equations (3) and (4) we get $p_r = \frac{1}{2}$ and $p_c = \frac{1}{2}$. The expected payoffs for Row and Col, using equations (7) and (8), are: $\text{Payoff}_{Row} = \text{Payoff}_{Col} = 0$.
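Plugging the marble game's payoffs into equations (3), (4), (7) and (8) directly (a small self-contained check, not from the notes):

```python
# Marble game payoffs: a=(1,-1), b=(-1,1), c=(-1,1), d=(1,-1).
a1, a2, b1, b2, c1, c2, d1, d2 = 1, -1, -1, 1, -1, 1, 1, -1

pc = (d1 - b1) / ((a1 - b1) - (c1 - d1))            # equation (3) -> 0.5
pr = (d2 - c2) / ((d2 - c2) + (a2 - b2))            # equation (4) -> 0.5
payoff_r = a1*pr*pc + b1*pr*(1-pc) + c1*(1-pr)*pc + d1*(1-pr)*(1-pc)   # equation (7) -> 0.0
payoff_c = a2*pr*pc + b2*pr*(1-pc) + c2*(1-pr)*pc + d2*(1-pr)*(1-pc)   # equation (8) -> 0.0
print(pr, pc, payoff_r, payoff_c)
```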

Battle of the sexes: Ram and Sita love each other and would rather be together than apart. But Ram prefers to go to a music program while Sita wants to watch the T-20 game. The NF of the game is:

             Sita
Ram        Music      T-20
Music      (2, 1)     (0, 0)
T-20       (0, 0)     (1, 2)

There are two pure strategy NE: (Music, Music) and (T-20, T-20). For the mixed strategy NE we get $p_r = \frac{2}{3}$, $p_c = \frac{1}{3}$. The payoffs are: $\text{Payoff}_{Ram} = \frac{2}{3}$, $\text{Payoff}_{Sita} = \frac{2}{3}$. Note that the pure strategy NEs are better than the mixed strategy NE. There is yet another way to find the mixed NE. Let $p_r$ be the probability that Ram chooses music and $p_c$ the probability that Sita chooses music.

\[ \Pi_{Ram} = 2 p_r p_c + (1 - p_r)(1 - p_c) = 3 p_r p_c - p_r - p_c + 1 \tag{9} \]

\[ \Pi_{Sita} = p_r p_c + 2 (1 - p_r)(1 - p_c) = 3 p_r p_c - 2 p_r - 2 p_c + 2 \tag{10} \]

Differentiating equation (9):

\[ \frac{\partial \Pi_{Ram}}{\partial p_r} = 3 p_c - 1 \]
This implies:
\[ \frac{\partial \Pi_{Ram}}{\partial p_r} \;\begin{cases} > 0, & p_c > \frac{1}{3} \\ = 0, & p_c = \frac{1}{3} \\ < 0, & p_c < \frac{1}{3} \end{cases} \]

Note that $\frac{\partial^2 \Pi_{Ram}}{\partial p_r^2} = 0$: $\Pi_{Ram}$ is linear in $p_r$, so the maximum lies at a boundary value of $p_r$ unless $p_c = \frac{1}{3}$.

The best $p_r$ value is given by:
\[ p_r = \begin{cases} 1, & p_c > \frac{1}{3} \\ [0, 1], & p_c = \frac{1}{3} \\ 0, & p_c < \frac{1}{3} \end{cases} \]

The reason for the above is that since the slope $\frac{\partial \Pi_{Ram}}{\partial p_r}$ is positive for $p_c > \frac{1}{3}$ we can push $p_r$ to its maximum value 1 to get the maximum possible payoff $\Pi_{Ram}$. A similar argument works for $p_c < \frac{1}{3}$, pushing $p_r$ to its minimum value 0. We can repeat the above calculation with equation (10) to get:

\[ \frac{\partial \Pi_{Sita}}{\partial p_c} \;\begin{cases} > 0, & p_r > \frac{2}{3} \\ = 0, & p_r = \frac{2}{3} \\ < 0, & p_r < \frac{2}{3} \end{cases} \qquad \text{and} \qquad p_c = \begin{cases} 1, & p_r > \frac{2}{3} \\ [0, 1], & p_r = \frac{2}{3} \\ 0, & p_r < \frac{2}{3} \end{cases} \]

Figure 5 shows the pc, pr response functions.


Figure 5: pc response function (red), pr response function (black).

Hawk-Dove game: Assume there is an area with a lot of bird food and two birds, Row and Col, are fighting for possession. The strategies available to both birds are: Hawk - fight till you are injured or the opponent retreats; Dove - threaten to fight but retreat before sustaining injury if the opponent continues to fight. Assume $v$ = value of the food, $w$ = cost of injury, and $w > v$. The NF is given below:

             Col
Row        Hawk                    Dove
Hawk       ((v-w)/2, (v-w)/2)      (v, 0)
Dove       (0, v)                  (v/2, v/2)

More generally, with x < l < t < u:

             Col
Row        Hawk        Dove
Hawk       (x, x)      (u, l)
Dove       (l, u)      (t, t)

A symmetric NE is one where both/all players play the same strategy. Players cannot condition their strategies on whether they play first or second. So in the Hawk-Dove game, though (Hawk, Dove) and (Dove, Hawk) are NEs, the players cannot use them since they are not symmetric. This also implies that both players have to play the same mixed strategy for a symmetric NE. Let $h$ be the probability of playing Hawk and $(1 - h)$ the probability of playing Dove. Then the payoffs for playing Hawk and Dove (for either player) against this mix are:
\[ \Pi_{Hawk} = h\,\frac{v - w}{2} + (1 - h)v \qquad \text{(payoff for playing pure Hawk)} \]
\[ \Pi_{Dove} = h \cdot 0 + (1 - h)\frac{v}{2} \qquad \text{(payoff for playing pure Dove)} \]
By theorem 3 the two payoffs must be equal. Equating and simplifying we get $h = \frac{v}{w}$, so the NE value is $h^* = \frac{v}{w}$, and the equilibrium payoffs are $\Pi_{Hawk} = \Pi_{Dove} = \frac{v}{2}\,\frac{w - v}{w}$.

If $w \approx v$ then the payoff is close to 0 and all value is lost in fighting. On the other hand if $w \gg v$ then the payoff is approximately $\frac{v}{2}$. This leads to the MAD (mutually assured destruction) principle. If both play Hawk then the payoff is negative and approximately $-\frac{w}{2}$. This game was used to model the cold war and nuclear deterrence. It also gives some justification for the NPT (nuclear non-proliferation treaty), which is a highly discriminatory setup but which is supported strongly by many nations. The reasoning is that if many countries have nuclear weapons then this increases the chance of someone playing Hawk, perhaps due to some error, with catastrophic consequences for everyone.

Prisoner's Dilemma game: Two prisoners are jailed for a serious crime that they committed jointly and which carries a heavy sentence. The prisoners are held separately and cannot communicate with each other. The evidence is not sufficient to convict them for the serious crime but it is enough to convict them for a different, less serious crime that carries a lighter sentence. However, if only one of them confesses then the other one will get the heavy sentence and the confessor will be let off. If both confess both get a medium-heavy sentence. The strategies available to both prisoners are: Defect (D) - that is, confess; Cooperate (C) - do not confess. Note: do not confuse C with confess; it is the opposite. The NF game is given below, with u > w > v > 0:

                   Col Prisoner
Row Prisoner     C           D
C                (v, v)      (u, 0)
D                (0, u)      (w, w)

Note that (D, D) is an NE. In this case we minimize the payoff since it is a jail sentence; it can be converted to a standard maximization problem by inverting the sign of all payoffs. Row reasons as follows: if Col cooperates then Row should defect since s/he gets off; if Col defects then Row should defect since w < u. So in either case Row should defect. Similar reasoning convinces Col to also defect, and this is indeed the NE. The dilemma is that if both had cooperated then both would have got a better outcome - the lighter sentence v.

The canonical Prisoner's Dilemma game (PD) looks as follows, with t > r > p > s:

                      Col prisoner
Row prisoner     Cooperate     Defect
Cooperate        (r, r)        (s, t)
Defect           (t, s)        (p, p)

We have r > p since mutual cooperation is better than mutual defection. Also, defection is the dominant strategy for both players, so t > r and p > s. This gives the condition t > r > p > s for the PD. Here the NE is calculated in the normal (maximization) way, not by minimization.
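A quick numeric check of the canonical PD (illustrative; the values t = 5, r = 3, p = 1, s = 0 are just one choice satisfying t > r > p > s): the cell-by-cell best-response test from section 6.1.4 finds (Defect, Defect) as the only pure NE.

```python
t, r, p, s = 5, 3, 1, 0        # any values with t > r > p > s give the same answer
strategies = ["Cooperate", "Defect"]
payoffs = [[(r, r), (s, t)],   # Row cooperates
           [(t, s), (p, p)]]   # Row defects

for i, row in enumerate(strategies):
    for j, col in enumerate(strategies):
        rp, cp = payoffs[i][j]
        row_best = all(payoffs[k][j][0] <= rp for k in range(2))
        col_best = all(payoffs[i][k][1] <= cp for k in range(2))
        if row_best and col_best:
            print("Pure NE:", (row, col))   # prints only ('Defect', 'Defect')
```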

7 Cooperative games

In multi-player games it may be better for one or more individual players to join with others so that their individual payoff is higher as part of a coalition compared to when they play as individuals. In non-cooperative games each player tries to maximize his/her own payoff; in cooperative games forming coalitions can improve/increase the payoff of an individual player. Note that cooperation is distinct from altruism. Altruism means that the individual displaying altruism pays a price for being altruistic; in cooperation the individual actually gains by being part of a coalition. Let us see this through an example.

Example: Fête game. Five children - Amar, Akbar, Anthony, Sita, Gita - are friends and are going to a fête where each child has brought some food items that they could find at home. Each child hopes to sell her/his item for Rs.10/-. This is what each has brought: Sita - 4 bananas; Gita - 2 mangoes and some sugar; Amar - 500ml of milk; Akbar - 250gms of maida and 3 eggs; Anthony - 200gms malai and some baking powder.

When they look at what each has brought, Sita observes that if they cooperate they should actually be able to do better (assuming all items are sold for Rs.10/- per unit):

Coalition                                What                               Coalition payoff (Rs)   Individual payoff (Rs)
Sita, Gita                               4 plates of fruit salad            40                      20
Sita, Gita, Amar                         4 banana-mango milk shakes         40                      13.33
Amar, Akbar                              4 egg dosas                        40                      20
Sita, Gita, Amar, Akbar, Anthony         Banana-mango loaf, 16 slices       160                     32

As can be seen, different coalitions give different payoffs to individuals, more than the individual payoffs if they do not cooperate. Multiple coalitions can be present simultaneously, for example {Sita, Gita} and {Amar, Akbar}. The grand coalition, which involves everyone, gives a payoff that is more than 3 times the individual payoff without a coalition. One important observation is that such cooperation is possible only if the payoff to the coalition is transferable, for example currency of some kind.

To formalize cooperative games, define a finite set of players $P = \{1, \ldots, n\}$, the power set $\mathcal{P}(P)$ of $P$, and a cooperative game $G = (P, v)$ where $v : \mathcal{P}(P) \to \mathbb{R}^+$ is the characteristic or coalition function. $v$ describes the payoff or utility obtained by forming coalitions and satisfies:

1. v(∅) = 0.

2. For any s1, s2 ∈ P(P ) s1 ⊆ s2 =⇒ v(s1) ≤ v(s2)

Defining $v$ can be expensive since the domain is the power set of $P$.

Definition 7 (Coalition structure). A coalition structure for a cooperative game $G = (P, v)$ with transferable utility is a partition $\xi = (c_1, \ldots, c_k)$ of $P$ such that $P = \cup_{i=1}^{k} c_i$ and $c_i \cap c_j = \emptyset$ for $i \neq j$.

For example we have the following possible coalition structures in the cooperative fête game described earlier:

ξ1 = {{Sita}, {Gita}, {Amar}, {Akbar}, {Anthony}}
ξ2 = {{Sita, Gita}, {Amar}, {Akbar}, {Anthony}}
ξ3 = {{Sita, Gita, Amar}, {Akbar}, {Anthony}}
ξ4 = {{Sita, Gita}, {Amar, Akbar}, {Anthony}}
ξ5 = {{Sita, Gita, Amar, Akbar, Anthony}}

Definition 8 (Outcome). The outcome of the game $G = (P, v)$ is a pair $(\xi, a)$ where $a = (a_1, \ldots, a_n)$ is a payoff vector with $a_i \geq 0$ and $\sum_{i \in c} a_i = v(c)$ for every $c \in \xi$. The payoff of a coalition is completely distributed to the individuals in the coalition.

For example in the fête game the outcomes can be:

(ξ1, a1), a1 = (10, 10, 10, 10, 10)
(ξ2, a2), a2 = (20, 20, 10, 10, 10)
(ξ3, a3), a3 = ($13\frac{1}{3}$, $13\frac{1}{3}$, $13\frac{1}{3}$, 10, 10)
(ξ4, a4), a4 = (20, 20, 20, 20, 10)
(ξ5, a5), a5 = (32, 32, 32, 32, 32)

Cooperative games can also have non-transferable utility, for example if the utility is in kind (some form of barter) that is not arbitrarily shareable.

Definition 9 (Superadditive game). A game $G = (P, v)$ is superadditive if for any two disjoint coalitions $c_1, c_2$: $v(c_1 \cup c_2) \geq v(c_1) + v(c_2)$.

In a superadditive game a grand coalition is likely to form, so the outcome can be represented just by $a$. For example, if $v(c) = |c|^2$ then for disjoint $c_1, c_2$ we have $v(c_1 \cup c_2) = (|c_1| + |c_2|)^2 \geq |c_1|^2 + |c_2|^2 = v(c_1) + v(c_2)$, so this $v$ is superadditive.

For example the fête game is not superadditive: consider $c_1 = \{Sita, Gita\}$, $c_2 = \{Amar\}$. We have $v(c_1) = 40$, $v(c_2) = 10$, but $v(\{Sita, Gita, Amar\}) = 40 < v(c_1) + v(c_2)$.

Finding Nash equilibria in cooperative games is harder since now we have to analyse coalitions of players. If the outcome of a coalition makes an individual $i$ worse off then $i$ will break off from the coalition. For superadditive games players will join a grand coalition only if the resulting payoff vector $a = (a_1, \ldots, a_n)$ satisfies $a_i \geq v(\{i\})$ for all $i \in P$.

Definition 10 (Imputation). The imputation for $G$ is the set of payoff vectors that satisfy the above condition. That is, the imputation $I(G)$ is:

\[ I(G) = \{ a \in \mathbb{R}^n \mid \sum_{i=1}^{n} a_i = v(P) \text{ and } a_i \geq v(\{i\}) \ \forall i \in P \} \]

Definition 11 (Core of G).
\[ Core(G) = \{ a \in I(G) \mid \sum_{i \in c} a_i \geq v(c) \text{ for all coalitions } c \subseteq P \} \]

If $a \in Core(G)$ then no subgroup can be better off by breaking away from the grand coalition - so the game is stable.

Example: $n \geq 3$ players. Players are paired to play a game, say chess. Each pair receives Rs.1000/- (for both players together) for participating in the game. So $v$ is defined, for any coalition $c$, as:
\[ v(c) = \begin{cases} \frac{|c|}{2} \times 1000 & |c| \text{ even} \\ \frac{|c| - 1}{2} \times 1000 & |c| \text{ odd} \end{cases} \]
If $n \geq 4$ and $n$ is even then $(500, 500, \ldots, 500) \in Core(G)$ and it is the only vector in the core. Note that in this situation no pair can be better off by breaking away.

For $n \geq 3$ and $n$ odd the core is empty. Take $n = 3$ and assume $a = (a_1, a_2, a_3)$ is in the core. Then $a_1 + a_2 + a_3 = v(\{1, 2, 3\}) = 1000$. At least one $a_i$ is positive, say $a_1 > 0$; then $a_2 + a_3 < 1000$ but $v(\{2, 3\}) = 1000$, violating the core condition. Another way to see this is to observe that at least one person must get less than 500 (possibly two, if all payoffs are positive); they are better off breaking away since they can form another pair and share 1000.
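A brute-force core check is straightforward for small n. The sketch below (illustrative, not from the notes; the helper names are our own) tests the core condition over all coalitions for the chess-pairing game: with n = 4 the equal split (500, 500, 500, 500) passes, while for n = 3 a coarse search over allocations of 1000 finds no vector satisfying all coalition constraints (a numeric illustration, not a proof).

```python
from itertools import combinations

def v(c):
    """Chess-pairing game: each pair in the coalition earns 1000."""
    return (len(c) // 2) * 1000

def in_core(a):
    """Check the core condition: every coalition gets at least v(coalition)."""
    players = range(len(a))
    if abs(sum(a) - v(players)) > 1e-9:          # outcome must distribute v(P) exactly
        return False
    return all(sum(a[i] for i in c) >= v(c)
               for k in range(1, len(a) + 1)
               for c in combinations(players, k))

print(in_core((500, 500, 500, 500)))             # True: equal split is in the core for n = 4

# n = 3: search allocations of 1000 in steps of 50; none is in the core.
grid = range(0, 1001, 50)
candidates = [(x, y, 1000 - x - y) for x in grid for y in grid if x + y <= 1000]
print(any(in_core(a) for a in candidates))       # False
```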
