
AM 121: Intro to Optimization Models and Methods

Lecture 11: Yiling Chen SEAS

Lesson Plan

• Two-player, zero-sum games
• 4 techniques to solve such games
• The theorem and
• Solving poker (for those interested)

Very elegant connection between game theory and duality theory!

Example: Political Campaign
• Two days, two cities. Available strategies:
  – one day in each city (E)
  – two days in city A
  – two days in city B
• Payoff for Row player:
  – votes won from opponent
  – (column player's payoff is –(payoff to row player))

Each cell lists (payoff to row, payoff to column):

                     column
                E        A        B
         E    1, -1    2, -2    4, -4
   row   A    1, -1    0, 0     5, -5
         B    0, 0     1, -1    -1, 1

Example: Political Campaign (payoff to row player only)

                   column
                E     A     B
         E      1     2     4
   row   A      1     0     5
         B      0     1    -1

The family of games we consider

• Two-player, zero-sum games
• $\{1,\dots,m\}$ strategies for row player and $\{1,\dots,n\}$ strategies for column player
• Denote entry $a_{ij} \in \mathbb{R}$ in payoff table, the payoff to row when row plays $i$ and column plays $j$
• “Payoff” to players: $\pi_r(i,j) = a_{ij}$, $\pi_c(i,j) = -a_{ij}$
  ⇒ $\pi_c(i,j) + \pi_r(i,j) = 0$ (the “zero-sum” property)

Goal: compute a solution to the game that provides an optimal strategy for each player

Solution Concept: Nash Equilibrium

• Roughly speaking, a strategy profile is a NE iff every player's strategy is optimal given other players' strategies

I. Solving via Iterated removal of strictly dominated strategies

• Row: Strategy $i$ is strictly dominated by strategy $i'$ when
  – $\pi_r(i',j) \ge \pi_r(i,j)$ for all $j \in \{1,\dots,n\}$, and
  – $\pi_r(i',j) > \pi_r(i,j)$ for some $j \in \{1,\dots,n\}$
• Column: Strategy $j$ is strictly dominated by strategy $j'$ when
  – $\pi_c(i,j') \ge \pi_c(i,j)$ for all $i \in \{1,\dots,m\}$, and
  – $\pi_c(i,j') > \pi_c(i,j)$ for some $i \in \{1,\dots,m\}$
• Can apply iteratively: iterated removal of strictly dominated strategies
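The dominance conditions above can be sketched in code. This is a minimal illustration, not the lecture's own implementation: the function name `iterated_removal` is hypothetical, the matrix holds payoffs to the row player (the column player receives the negative), and the demo data is the payoff table of the Political Campaign example.

```python
def iterated_removal(A):
    """Iteratively remove dominated rows and columns of a zero-sum game.

    A[i][j] is the payoff to the row player; the column player gets -A[i][j].
    Dominance follows the condition stated above: at least as good against
    every surviving opposing strategy, and better against at least one.
    Returns the lists of surviving row and column indices.
    """
    rows = list(range(len(A)))
    cols = list(range(len(A[0])))

    def row_dominated(i):
        return any(
            all(A[k][j] >= A[i][j] for j in cols)
            and any(A[k][j] > A[i][j] for j in cols)
            for k in rows if k != i
        )

    def col_dominated(j):
        # The column player prefers smaller entries of A.
        return any(
            all(A[i][k] <= A[i][j] for i in rows)
            and any(A[i][k] < A[i][j] for i in rows)
            for k in cols if k != j
        )

    changed = True
    while changed:
        changed = False
        for i in rows:
            if row_dominated(i):
                rows.remove(i)      # safe: we leave the loop immediately
                changed = True
                break
        if changed:
            continue
        for j in cols:
            if col_dominated(j):
                cols.remove(j)
                changed = True
                break
    return rows, cols


# Rows and columns are ordered E, A, B.
campaign = [[1, 2, 4],
            [1, 0, 5],
            [0, 1, -1]]
rows, cols = iterated_removal(campaign)
print(rows, cols, campaign[rows[0]][cols[0]])
```

On this table only row E and column E survive, and the remaining entry is the value of the game.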

Example: Political Campaign
• Two days, two cities. Available strategies:
  – one day in each city (E)
  – two days in city A
  – two days in city B
• Can solve by iterated removal of strictly dominated strategies:

                E     A     B
         E      1     2     4
         A      1     0     5
         B      0     1    -1

  Order of removal: (1) row B, (2) column B, (3) row A, (4) column A.

Solution is (E,E). Say game has value 1. (Payoff to row player in solution.)

Political Campaign: Variation 1
• No dominated strategies!

                E     A     B
         E     -3    -2     6
         A      2     0     2
         B      5    -2    -4

• Consider “best-response” dynamics.

The arrow from (E,E) to (B,E) indicates that (B,E) has more payoff for row than (E,E) and that row would deviate and play B if it knew column was playing E. Not all arrows are shown! (E.g., the one from (A,E) to (B,E) is not shown.) Unique stable solution is (A,A).

Political Campaign: Variation 1

• Can also consider the “minimax criterion”:

                    E     A     B     min value
            E      -3    -2     6        -3
            A       2     0     2         0     ← maximin strategy
            B       5    -2    -4        -4
    max value       5     0     6
                          ↑
                   minimax strategy

  – min value: worst-case payoff row can expect to achieve when it plays each strategy
  – max value: worst-case payoff column can expect to achieve when it plays each strategy
  – saddle point: (A,A)

II. Solving via Saddle points

• Minimax criterion:
  – Row: $\max_i \min_j \pi_r(i,j)$
  – Column: $\min_j \max_i \pi_r(i,j)$ $(= -\max_j \min_i \pi_c(i,j))$
• In example, same entry (A,A) yields the “maximin” value for row and the “minimax” value for column
• Called a “saddle point”
• This is why the maximin and minimax strategies form a stable point and solve this game. Even if column knows row is playing A, column does not want to deviate (and vice versa)
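The saddle-point test can be sketched as follows. This is an illustrative check under the usual characterization (an entry that is the minimum of its row and the maximum of its column); the function name `saddle_points` is hypothetical, and the two matrices are the Variation 1 and Variation 2 payoff tables with rows and columns ordered E, A, B.

```python
def saddle_points(A):
    """Return all (i, j) where A[i][j] is both the minimum of row i
    and the maximum of column j (payoffs are to the row player)."""
    row_min = [min(row) for row in A]
    col_max = [max(A[i][j] for i in range(len(A))) for j in range(len(A[0]))]
    return [
        (i, j)
        for i in range(len(A))
        for j in range(len(A[0]))
        if A[i][j] == row_min[i] == col_max[j]
    ]


variation_1 = [[-3, -2, 6],
               [2, 0, 2],
               [5, -2, -4]]
variation_2 = [[0, -2, 2],
               [3, 4, -3],
               [2, 3, -4]]

maximin_1 = max(min(row) for row in variation_1)          # row's pure maximin value
minimax_1 = min(max(col) for col in zip(*variation_1))    # column's pure minimax value
print(saddle_points(variation_1), maximin_1, minimax_1)
print(saddle_points(variation_2))
```

For Variation 1 the maximin and minimax values coincide at entry (A,A); for Variation 2 the list is empty, so pure strategies alone do not give a stable solution there.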

Political Campaign: Variation 2
• Strategy B for row is dominated by strategy A.
• Apply minimax criterion?

                    E     A     B     min value
            E       0    -2     2        -2     ← maximin strategy for row
            A       3     4    -3        -3
            B       2     3    -4
    max value       3     4     2
                                ↑
                   minimax strategy for column

But (E,B) is not a saddle point.
⇒ column can deviate and play A, then row can deviate and play A, and so on…

Political Campaign: Variation 2
• Consider best-response dynamics:

                E     A     B
         E      0    -2     2
         A      3     4    -3
         B      2     3    -4

No stable solution! If one player knows the other player's strategy then he or she has a useful deviation.

Simpler Example: Odds and Evens

• Two players (Odd and Even) simultaneously choose the number of fingers (1 or 2) to put out. If the sum is odd, then Odd wins $1 from Even. If the sum is even, Even wins $1 from Odd.

                     Even
                  one    two
   Odd    one     -1     +1
          two     +1     -1

• No stable solution. How do you play this game?
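To see concretely why no pure-strategy profile is stable here, one can follow the players' profitable deviations. The sketch below is one simple way to simulate best-response dynamics; the function name `best_response_dynamics` and the rule that row moves first when both players could improve are assumptions of this example, not part of the lecture.

```python
def best_response_dynamics(A, start, max_steps=100):
    """Follow profitable pure-strategy deviations starting from `start`.

    Row maximizes A[i][j]; column minimizes it. Returns (path, stable):
    stable is True if a profile is reached where neither player can gain
    by deviating, and False if the dynamics return to a visited profile.
    """
    i, j = start
    path = [(i, j)]
    for _ in range(max_steps):
        best_i = max(range(len(A)), key=lambda k: A[k][j])
        best_j = min(range(len(A[0])), key=lambda k: A[i][k])
        if A[best_i][j] > A[i][j]:        # row has a useful deviation
            i = best_i
        elif A[i][best_j] < A[i][j]:      # column has a useful deviation
            j = best_j
        else:
            return path, True
        if (i, j) in path:                # cycle detected
            return path, False
        path.append((i, j))
    return path, False


# Payoff to Odd (row); strategies ordered one, two.
odds_and_evens = [[-1, 1],
                  [1, -1]]
path, stable = best_response_dynamics(odds_and_evens, (0, 0))
print(path, stable)
```

Starting from (one, one), the players cycle through all four profiles and return to the start, so the search never settles.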

Mixed strategies
• Let $x = (x_1,\dots,x_m)$, $x_i \ge 0$, $\sum_{i=1}^m x_i = 1$ denote a mixed strategy of row player; the probability $x_i$ with which row plays each strategy $i \in \{1,\dots,m\}$
• Let $y = (y_1,\dots,y_n)$, $y_j \ge 0$, $\sum_{j=1}^n y_j = 1$ denote a mixed strategy of column player
• Define a maximin strategy $x^*$ as solving $\max_x \min_{j \in \{1,\dots,n\}} \sum_{i=1}^m x_i a_{ij}$
  – Find the mixed strategy $x$ that maximizes my worst-case payoff (given that column knows what I will do)
• Define a minimax strategy $y^*$ as solving $\min_y \max_{i \in \{1,\dots,m\}} \sum_{j=1}^n y_j a_{ij}$

[Figure: Graphical method for row in Odds and Evens. Plot expected value (EV) to row for each pure strategy of column as mixed strategy of row varies. Horizontal axis: $x_1$ = prob. of ‘one’ by row, from 0 to 1. Vertical axis: EV to row player, from -1 to 1. Lines: EV when column plays ‘one’, EV when column plays ‘two’. The lower envelope is the minimal payoff row would receive for each strategy $(x_1, 1-x_1)$; its highest point is the maximin for row.]

[Figure: Graphical method for column in Odds and Evens. Plot EV to row for each pure strategy of row as mixed strategy of column varies. Horizontal axis: $y_1$ = prob. of ‘one’ by column, from 0 to 1. Vertical axis: EV to row player, from -1 to 1. Lines: EV when row plays ‘one’, EV when row plays ‘two’. The upper envelope is the maximal $\pi_r$ $(= -\pi_c)$ for each strategy $(y_1, 1-y_1)$ by column; its lowest point is the minimax for column.]

• In the example the “maximin criterion” suggests $x^* = (1/2, 1/2)$ and $y^* = (1/2, 1/2)$ as solutions
• Note: the value of the maximin strategy to row = the value of the minimax strategy to column = 0
• Also a stable solution: if column plays $y^*$ then row does not wish to deviate from $x^*$, and vice versa
• Consider this a solution to the game

III. Solving via the Graphical method
• Can be used in a game with only two pure strategies for each player

• Row: vary $x_1$ and plot EV to row for each possible strategy $j \in \{1,\dots,n\}$ of column. Take the lower envelope and find $x_1^*$ that maximizes.
• Column: vary $y_1$ and plot EV to row for each possible strategy $i \in \{1,\dots,m\}$ of row. Take the upper envelope and find $y_1^*$ that minimizes.
• Can also use graphical method as long as one of the players has only two pure strategies; can solve for other player algebraically.

IV. Solving via LP

Computing maximin strategy via LP

               y1    y2    y3
               E     A     B
   x1    E     0    -2     2
   x2    A     3     4    -3

• $\max_x \min_j \sum_{i=1}^m a_{ij} x_i$
  s.t. $\sum_{i=1}^m x_i = 1$
       $x_i \ge 0$
• Equivalently, as an LP:
  $\max v$
  s.t. $v \le \sum_{i=1}^m a_{ij} x_i, \quad \forall j \in \{1,\dots,n\}$
       $\sum_{i=1}^m x_i = 1$
       $x_i \ge 0, \quad \forall i \in \{1,\dots,m\}$
       $v$ free

Example: Political campaign
• $\max v$
  s.t. $-3x_2 + v \le 0$
       $2x_1 - 4x_2 + v \le 0$
       $-2x_1 + 3x_2 + v \le 0$
       $x_1 + x_2 = 1$
       $x_1, x_2 \ge 0$
       $v$ free
• Solution: $x^* = (7/11, 4/11)$, $v^* = 2/11$
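Because row has only two pure strategies in this example ($x_2 = 1 - x_1$), the LP solution can be cross-checked with the lower-envelope idea of the graphical method. The sketch below is not an LP solver: it evaluates the lower envelope at its breakpoints, which is enough because the envelope is piecewise linear and concave. The function name `maximin_two_rows` is hypothetical, and exact fractions are used so the result can be compared with the stated solution.

```python
from fractions import Fraction
from itertools import combinations


def maximin_two_rows(A):
    """Maximin mixed strategy for a row player with exactly two pure strategies.

    Row plays (x1, 1 - x1). Against column j the expected value to row is
    EV_j(x1) = A[0][j] * x1 + A[1][j] * (1 - x1). The maximum of the lower
    envelope min_j EV_j is attained at x1 = 0, x1 = 1, or where two lines cross.
    """
    top, bottom = A
    n = len(top)

    def ev(j, x1):
        return top[j] * x1 + bottom[j] * (1 - x1)

    candidates = {Fraction(0), Fraction(1)}
    for j, k in combinations(range(n), 2):
        slope_diff = (top[j] - bottom[j]) - (top[k] - bottom[k])
        if slope_diff != 0:
            x1 = Fraction(bottom[k] - bottom[j], slope_diff)  # EV_j(x1) == EV_k(x1)
            if 0 <= x1 <= 1:
                candidates.add(x1)

    def envelope(x1):
        return min(ev(j, x1) for j in range(n))

    best_x1 = max(candidates, key=envelope)
    return (best_x1, 1 - best_x1), envelope(best_x1)


# Rows E and A of the Political Campaign example; columns E, A, B.
x_star, v_star = maximin_two_rows([[0, -2, 2],
                                   [3, 4, -3]])
print(x_star, v_star)
```

The same function applied to Odds and Evens returns the mix (1/2, 1/2) with value 0.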

Computing minimax strategy via LP

               y1    y2    y3
               E     A     B
   x1    E     0    -2     2
   x2    A     3     4    -3

• $\min_y \max_i \sum_{j=1}^n y_j a_{ij}$
  s.t. $\sum_{j=1}^n y_j = 1$
       $y_j \ge 0$
• Equivalently, as an LP:
  $\min w$
  s.t. $w \ge \sum_{j=1}^n a_{ij} y_j, \quad \forall i \in \{1,\dots,m\}$
       $\sum_{j=1}^n y_j = 1$
       $y_j \ge 0, \quad \forall j \in \{1,\dots,n\}$
       $w$ free

Example: Political campaign
• $\min w$
  s.t. $2y_2 - 2y_3 + w \ge 0$
       $-3y_1 - 4y_2 + 3y_3 + w \ge 0$
       $y_1 + y_2 + y_3 = 1$
       $y_1, y_2, y_3 \ge 0$
       $w$ free
• Solution: $y^* = (0, 5/11, 6/11)$, $w^* = 2/11$

Minimax Theorem (von Neumann 1928)

• Theorem. For every zero-sum, two-player game the pair of strategies $(x^*, y^*)$, optimal according to the minimax criterion, is a Nash equilibrium, with $v^* = w^* = v$ (the value of the game).
• Definition. Strategies $(x,y)$ are a Nash equilibrium of a game if $\pi_r(x,y) \ge \pi_r(x',y)$ for all $x' \ne x$ and $\pi_c(x,y) \ge \pi_c(x,y')$ for all $y' \ne y$
• Theorem. Nash (1950). Existence of NE in all matrix-form games.
• Given $(x,y)$, the prob. of play $(i,j)$ is $x_i y_j$, so expected payoff $\pi_r(x,y) = \sum_i \sum_j x_i a_{ij} y_j$; also have $\pi_c(x,y) = -\pi_r(x,y)$
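The expected-payoff formula and the Nash condition can be checked numerically on the political campaign example. This is a small sketch with hypothetical helper names (`expected_payoff`, `is_nash`); it tests only pure-strategy deviations, which suffices because each player's expected payoff is linear in that player's own mixed strategy.

```python
from fractions import Fraction as F


def expected_payoff(x, A, y):
    """pi_r(x, y) = sum_i sum_j x_i * a_ij * y_j (payoff to the row player)."""
    return sum(x[i] * A[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def is_nash(x, A, y):
    """True if no pure-strategy deviation improves either player's payoff."""
    v = expected_payoff(x, A, y)
    m, n = len(A), len(A[0])
    row_ok = all(sum(A[i][j] * y[j] for j in range(n)) <= v for i in range(m))
    col_ok = all(sum(x[i] * A[i][j] for i in range(m)) >= v for j in range(n))
    return row_ok and col_ok


A = [[0, -2, 2],      # row E against columns E, A, B
     [3, 4, -3]]      # row A
x_star = [F(7, 11), F(4, 11)]
y_star = [F(0), F(5, 11), F(6, 11)]

print(expected_payoff(x_star, A, y_star), is_nash(x_star, A, y_star))
print(is_nash([1, 0], A, [0, 0, 1]))   # the pure profile (E, B)
```

The pair of LP solutions passes the check with expected payoff 2/11, while the pure profile (E,B) fails it because column would rather play A.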

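Example: Political campaign

               y1    y2    y3
               E     A     B
   x1    E     0    -2     2
   x2    A     3     4    -3

• Maximin problem:
  $\max v$
  s.t. $-3x_2 + v \le 0$
       $2x_1 - 4x_2 + v \le 0$
       $-2x_1 + 3x_2 + v \le 0$
       $x_1 + x_2 = 1$
       $x_1, x_2 \ge 0$
       $v$ free
• Minimax problem:
  $\min w$
  s.t. $2y_2 - 2y_3 + w \ge 0$
       $-3y_1 - 4y_2 + 3y_3 + w \ge 0$
       $y_1 + y_2 + y_3 = 1$
       $y_1, y_2, y_3 \ge 0$
       $w$ free
• The minimax problem is the dual of the maximin problem. Both are feasible (easy to see), so by the duality theorem they must have the same value: $w^* = v^*$.

For a general 2 × 2 game:

              y1     y2
   x1        a11    a12
   x2        a21    a22

• Maximin problem:
  $\max\ 0x_1 + 0x_2 + v$
  s.t. $-a_{11}x_1 - a_{21}x_2 + v \le 0$
       $-a_{12}x_1 - a_{22}x_2 + v \le 0$
       $x_1 + x_2 = 1$
       $x_1, x_2 \ge 0$, $v$ free
  In matrix form:
  $\max\ \begin{pmatrix} 0^T & 1 \end{pmatrix} \begin{pmatrix} x \\ v \end{pmatrix}$
  s.t. $\begin{pmatrix} -A^T & \mathbf{1} \end{pmatrix} \begin{pmatrix} x \\ v \end{pmatrix} \le 0$, $\begin{pmatrix} \mathbf{1}^T & 0 \end{pmatrix} \begin{pmatrix} x \\ v \end{pmatrix} = 1$, $x \ge 0$, $v$ free
• Minimax problem:
  $\min\ 0y_1 + 0y_2 + w$
  s.t. $-a_{11}y_1 - a_{12}y_2 + w \ge 0$
       $-a_{21}y_1 - a_{22}y_2 + w \ge 0$
       $y_1 + y_2 = 1$
       $y_1, y_2 \ge 0$, $w$ free
  In matrix form:
  $\min\ \begin{pmatrix} 0^T & 1 \end{pmatrix} \begin{pmatrix} y \\ w \end{pmatrix}$
  s.t. $\begin{pmatrix} -A & \mathbf{1} \end{pmatrix} \begin{pmatrix} y \\ w \end{pmatrix} \ge 0$, $\begin{pmatrix} \mathbf{1}^T & 0 \end{pmatrix} \begin{pmatrix} y \\ w \end{pmatrix} = 1$, $y \ge 0$, $w$ free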

Minimax Theorem (von Neumann 1928)

⇒ still need to show that $(x^*, y^*)$ is a Nash equilibrium

              y1     y2
   x1        a11    a12
   x2        a21    a22

• Maximin LP:
  $\max v$
  s.t. $v \le \sum_i a_{ij} x_i, \quad \forall j$
       $\sum_i x_i = 1$
       $x_i \ge 0, \quad \forall i$
       $v$ free
• Minimax LP:
  $\min w$
  s.t. $w \ge \sum_j a_{ij} y_j, \quad \forall i$
       $\sum_j y_j = 1$
       $y_j \ge 0, \quad \forall j$
       $w$ free
• Complementary slackness: $y_j^* s_j^* = 0, \ \forall j$, where $s_j^* = \sum_i a_{ij} x_i^* - v^*$ is the slack in the $j$th maximin constraint
  ⇒ $\sum_j y_j^* \left( \sum_i a_{ij} x_i^* - v^* \right) = 0$
  ⇒ $\sum_j \sum_i y_j^* a_{ij} x_i^* = \sum_j y_j^* v^* = v^*$
• Shows that $v^* = \sum_i \sum_j x_i^* a_{ij} y_j^*$ = expected payoff to row given $(x^*, y^*)$
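The complementary slackness step can be verified on the political campaign solutions. The sketch below plugs in the stated optima; the names `primal_slack` and `dual_slack` (and the symbol $t_i$ for the slack of the $i$th minimax constraint) are labels chosen for this illustration.

```python
from fractions import Fraction as F

A = [[0, -2, 2],      # row E against columns E, A, B
     [3, 4, -3]]      # row A
x_star, v_star = [F(7, 11), F(4, 11)], F(2, 11)
y_star, w_star = [F(0), F(5, 11), F(6, 11)], F(2, 11)

# s_j = sum_i a_ij x_i* - v*  (slack in the j-th maximin constraint)
primal_slack = [sum(A[i][j] * x_star[i] for i in range(2)) - v_star for j in range(3)]
# t_i = w* - sum_j a_ij y_j*  (slack in the i-th minimax constraint)
dual_slack = [w_star - sum(A[i][j] * y_star[j] for j in range(3)) for i in range(2)]

# Feasibility: every slack is non-negative.
feasible = all(s >= 0 for s in primal_slack) and all(t >= 0 for t in dual_slack)
# Complementary slackness: y_j* s_j = 0 for all j and x_i* t_i = 0 for all i.
complementary = (all(y * s == 0 for y, s in zip(y_star, primal_slack))
                 and all(x * t == 0 for x, t in zip(x_star, dual_slack)))

print(primal_slack, dual_slack, feasible, complementary)
```

Only the constraint for column E has positive slack, and that is exactly the column to which $y^*$ assigns probability zero.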

• For any mixed strategies $x$ and $y$: multiply the $i$th dual (minimax) constraint by $x_i$ and sum, multiply the $j$th primal (maximin) constraint by $y_j$ and sum, and use strong duality ($w^* = v^*$) in the middle:

  $\sum_i \sum_j x_i a_{ij} y_j^* \le \sum_i x_i w^* = w^* = v^* = \sum_j y_j v^* \le \sum_i \sum_j x_i^* a_{ij} y_j$

  – Left inequality: $x^*$ is best-response for row player (recall that $v^* = \sum_i \sum_j x_i^* a_{ij} y_j^*$)
  – Right inequality: $y^*$ is best-response for column player (recall that $w^* = \sum_i \sum_j x_i^* a_{ij} y_j^*$)

Minimax Theorem (von Neumann 1928)


• Useful: shows that we can solve two-player, zero-sum games via linear programming!

Example: Tic-tac-toe
• Model as an extensive-form game:

  [Figure: game tree with decision nodes for player 1 (X) and player 2; branches labelled corner, center, side, …]

• How would you convert to “matrix form”?
• Hint: what is a (pure) strategy $i \in \{1,\dots,m\}$?
• Can handle ‘extensive-form’ games by constructing all possible strategies
• Strategy == complete description of how to play in all possible states of the game
• Given this, possible to construct the payoff matrix

Example: Poker (Kuhn, 1950)
• Players A and B. Cards {1,2,3}. Put “ante” $1 into “kitty”. If “bet” then put another $1 in kitty. Player A goes first. Goal is to win back the kitty.
• Each dealt one card. Game stops when bet followed by bet, pass by pass, or bet by pass (“fold”).
• In first two, winner decided by comparing cards. In third, player with bet wins.
• Possible plays:
  – A pass, B pass: $1 to holder of higher card
  – A pass, B bet, A pass: $1 to B
  – A pass, B bet, A bet: $2 to holder of higher card
  – A bet, B pass: $1 to A
  – A bet, B bet: $2 to holder of higher card

What are the strategies?
• After being dealt a card, A has three “lines”:
  1. Pass. If B bets, pass again.
  2. Pass. If B bets, bet.
  3. Bet.
• After being dealt a card, B has four “lines”:
  1. Pass no matter what.
  2. If A passes, pass. If A bets, bet.
  3. If A passes, bet. If A bets, pass.
  4. Bet no matter what.
• Pure strategy = statement about line for each possible card that player is dealt.
• Defined by triples (a1,a2,a3) and (b1,b2,b3)
  – where ai is the line row player will use when holding card i and bi is the line column player will use when holding card i. E.g., (3,1,2) for player A and (3,2,4) for player B

How to determine the payoff $a_{ij}$?
• E.g., (3,1,2) for player A and (3,2,4) for player B
• Six ways in which cards can be dealt

   card dealt
    A     B     betting session                  payment A to B
    1     2     A bets, B bets                          2
    1     3     A bets, B bets                          2
    2     1     A passes, B bets, A passes              1
    2     3     A passes, B bets, A passes              1
    3     1     A passes, B bets, A bets               -2
    3     2     A passes, B passes                     -1

• Average payment = (2+2+1+1-2-1)/6 = 1/2

How many combinations are there?

• Player A has 3 x 3 x 3 = 27 pure strategies
• Player B has 4 x 4 x 4 = 64 pure strategies
• Hence there are 27 x 64 = 1728 pairs
• Can first apply iterated elimination of strictly dominated strategies.
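The payoff calculation and the strategy counts can be reproduced with a short script. This is a sketch built directly from the rules and “lines” listed earlier in this example; the function names `payment_a_to_b` and `average_payment` are hypothetical, and payoffs are expressed as the payment from A to B to match the table.

```python
from fractions import Fraction
from itertools import permutations, product


def payment_a_to_b(card_a, card_b, line_a, line_b):
    """Dollars A pays B for one deal (negative when A wins)."""
    showdown = 1 if card_b > card_a else -1      # +1 if B holds the higher card
    if line_a == 3:                              # A bets
        b_bets = line_b in (2, 4)                # B's lines 2 and 4 answer a bet with a bet
        return 2 * showdown if b_bets else -1    # bet-bet, or B folds
    b_bets = line_b in (3, 4)                    # after A passes, B's lines 3 and 4 bet
    if not b_bets:
        return showdown                          # pass-pass
    return 2 * showdown if line_a == 2 else 1    # A answers with a bet (line 2) or folds (line 1)


def average_payment(strategy_a, strategy_b):
    """Average payment from A to B over the six equally likely deals."""
    deals = list(permutations((1, 2, 3), 2))
    total = sum(
        payment_a_to_b(ca, cb, strategy_a[ca - 1], strategy_b[cb - 1])
        for ca, cb in deals
    )
    return Fraction(total, len(deals))


strategies_a = list(product((1, 2, 3), repeat=3))      # one line per card for A
strategies_b = list(product((1, 2, 3, 4), repeat=3))   # one line per card for B

print(average_payment((3, 1, 2), (3, 2, 4)))
print(len(strategies_a), len(strategies_b), len(strategies_a) * len(strategies_b))
```

For the pair (3,1,2) against (3,2,4) this reproduces the average payment of 1/2 from A to B.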

Eliminating strictly dominated strategies
• Player holding 1 should never answer a bet with a bet, since the player will lose regardless and will lose less by passing.
  – Prune: (2,a2,a3). Prune: (2,b2,b3), (4,b2,b3)
• Player holding 3 should never answer a bet with a pass, since by passing the player will lose but by betting the player will win.
  – Prune: (a1,a2,1). Prune: (b1,b2,1), (b1,b2,3)
• Player holding 3 should always answer a pass with a bet, since in either case the player will win, but answering with a bet opens the possibility that the opponent will bet and increases the size of the win.
  – Prune: (b1,b2,2)

Additional pruning…
• Player A has 2 x 3 x 2 = 12 pure strategies and player B has 2 x 4 x 1 = 8 pure strategies. Dropped to 96 pairs. Iterate…
• When holding a 2, player A should not play line 3. Prune (a1,3,a3). Player B has either a 1 (in which case plays line 1 or 3) or a 3 (in which case plays line 4) … can establish that it is better for A to pass in first round.
• When holding a 2, player B should not play lines 3 or 4. Prune (b1,3,b3) and (b1,4,b3)
• Player A has 2 x 2 x 2 = 8 pure strategies and player B has 2 x 2 x 1 = 4 pure strategies. Dropped to 32 pairs.

Construct the payoff matrix

Entries are average payments from A to B:

                (1,1,4)   (1,2,4)   (3,1,4)   (3,2,4)
   (1,1,2)         0         0        1/6       1/6
   (1,1,3)         0       -1/6       1/3       1/6
   (1,2,2)        1/6       1/6      -1/6      -1/6
   (1,2,3)        1/6        0         0       -1/6
   (3,1,2)       -1/6       1/3        0        1/2
   (3,1,3)       -1/6       1/6       1/6       1/2
   (3,2,2)         0        1/2      -1/3       1/6
   (3,2,3)         0        1/3      -1/6       1/6

…and use LP to solve the game
• Remaining strategies:
  – Player A: {(1,1,2),(1,1,3),(1,2,2),(1,2,3),(3,1,2),(3,1,3),(3,2,2),(3,2,3)}
  – Player B: {(1,1,4),(1,2,4),(3,1,4),(3,2,4)}
• x* = (1/2, 0, 0, 1/3, 0, 0, 0, 1/6), y* = (2/3, 0, 0, 1/3)
• Player A:
  – when holding 1, mix lines 1 and 3 in 5:1 proportion
  – when holding 2, mix lines 1 and 2 in 1:1 proportion
  – when holding 3, mix lines 2 and 3 in 1:1 proportion
• Player B:
  – when holding 1, mix lines 1 and 3 in 2:1 proportion
  – when holding 2, mix lines 1 and 2 in 2:1 proportion
  – when holding 3, use line 4.

Players sometimes bluff and sometimes underbid!
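As a check on the reduced game, the payoff matrix can be rebuilt from the rules and the stated mixed strategies tested against it. This is a sketch under the same reading of the rules as in the examples above; the helper names are hypothetical, and the expected payment of 1/18 that appears below is what this script computes, not a figure quoted in the lecture.

```python
from fractions import Fraction
from itertools import permutations, product


def payment_a_to_b(card_a, card_b, line_a, line_b):
    """Dollars A pays B for one deal (negative when A wins)."""
    showdown = 1 if card_b > card_a else -1
    if line_a == 3:                               # A bets; B calls with lines 2, 4
        return 2 * showdown if line_b in (2, 4) else -1
    if line_b not in (3, 4):                      # A passes, B passes
        return showdown
    return 2 * showdown if line_a == 2 else 1     # A passes, B bets, A calls or folds


def average_payment(strategy_a, strategy_b):
    """Average payment from A to B over the six equally likely deals."""
    deals = list(permutations((1, 2, 3), 2))
    total = sum(payment_a_to_b(ca, cb, strategy_a[ca - 1], strategy_b[cb - 1])
                for ca, cb in deals)
    return Fraction(total, len(deals))


rows = list(product((1, 3), (1, 2), (2, 3)))      # A's 8 remaining strategies
cols = list(product((1, 3), (1, 2), (4,)))        # B's 4 remaining strategies
M = [[average_payment(a, b) for b in cols] for a in rows]

x_star = [Fraction(1, 2), 0, 0, Fraction(1, 3), 0, 0, 0, Fraction(1, 6)]
y_star = [Fraction(2, 3), 0, 0, Fraction(1, 3)]

# Expected payment A -> B when A mixes with x* against each pure column of B,
# and when B mixes with y* against each pure row of A.
col_payments = [sum(x_star[i] * M[i][j] for i in range(8)) for j in range(4)]
row_payments = [sum(M[i][j] * y_star[j] for j in range(4)) for i in range(8)]

print(col_payments)
print(row_payments)
```

Since the entries are payments from A to B, player A is the minimizer here. Under x* player A pays the same expected amount against every column, and under y* player B receives that same amount against every row, so neither player can gain by deviating.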