Part 4: Cooperation & Competition 11/26/07
Lecture 26

Matching Pennies
• Al and Barb each independently pick either heads or tails
• If they are both heads or both tails, Al wins
• If they are different, Barb wins
Payoff Matrix

                 Barb
             head      tail
 Al   head  +1, –1    –1, +1
      tail  –1, +1    +1, –1

The minimum of each pure strategy is the same.

Mixed Strategy
• Although we cannot use maximin to select a pure strategy, we can use it to select a mixed strategy
• Take the maximum of the minimum payoffs over all assignments of probabilities
• von Neumann proved you can always find an equilibrium if mixed strategies are permitted
Analysis: Al’s Expected Payoff from Penny Game
• Let PA = probability Al picks head
• Let PB = probability Barb picks head
• Al’s expected payoff:
  E{A} = PA PB – PA (1 – PB) – (1 – PA) PB + (1 – PA) (1 – PB)
       = (2 PA – 1)(2 PB – 1)
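The payoff formula above can be checked numerically. This is a small illustrative sketch (not part of the lecture); the function name `expected_payoff` is my own:

```python
# Mixed-strategy expected payoff for Matching Pennies, following the slide's
# derivation E{A} = (2 PA - 1)(2 PB - 1).

def expected_payoff(pa, pb):
    """Al's expected payoff when Al plays heads with prob. pa, Barb with pb."""
    return (pa * pb                 # both heads: Al wins +1
            - pa * (1 - pb)         # Al heads, Barb tails: Al loses 1
            - (1 - pa) * pb         # Al tails, Barb heads: Al loses 1
            + (1 - pa) * (1 - pb))  # both tails: Al wins +1

# The expanded sum agrees with the factored form:
assert abs(expected_payoff(0.7, 0.4) - (2*0.7 - 1) * (2*0.4 - 1)) < 1e-12

# At the mixed-strategy equilibrium PA = PB = 1/2 the expected payoff is 0,
# and Barb's 50/50 play neutralizes any choice of PA:
assert abs(expected_payoff(0.5, 0.5)) < 1e-12
assert abs(expected_payoff(0.9, 0.5)) < 1e-12
```

Because E{A} factors into (2 PA – 1)(2 PB – 1), either player choosing probability 1/2 forces the product to zero regardless of the other's choice.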
How Barb’s Behavior Affects Al’s Expected Payoff
(figures)
More General Analysis (Differing Payoffs)
• Let A’s payoffs be: H = HH, h = HT, t = TH, T = TT
• E{A} = PA PB H + PA (1 – PB) h + (1 – PA) PB t + (1 – PA)(1 – PB) T
       = (H + T – h – t) PA PB + (h – T) PA + (t – T) PB + T
• To find the saddle point, set ∂E{A}/∂PA = 0 and ∂E{A}/∂PB = 0 to get:
  PA = (T – t) / (H + T – h – t),   PB = (T – h) / (H + T – h – t)

Random Rationality
“It seems difficult, at first, to accept the idea that ‘rationality’ — which appears to demand a clear, definite plan, a deterministic resolution — should be achieved by the use of probabilistic devices. Yet precisely such is the case.”
— Morgenstern
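The general saddle-point formulas can be sketched in a few lines; the helper name `saddle_point` is my own, and the code assumes the denominator H + T – h – t is nonzero:

```python
# Saddle-point probabilities for the general 2x2 zero-sum game, with A's
# payoffs H = HH, h = HT, t = TH, T = TT (as defined in the slide).

def saddle_point(H, h, t, T):
    """Return (PA, PB) that zero both partial derivatives of E{A}."""
    d = H + T - h - t          # assumed nonzero
    return (T - t) / d, (T - h) / d

# Matching pennies (H = T = +1, h = t = -1) recovers the 50/50 mix:
pa, pb = saddle_point(1, -1, -1, 1)
assert pa == 0.5 and pb == 0.5
```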
Probability in Games of Chance and Strategy
“In games of chance the task is to determine and then to evaluate probabilities inherent in the game; in games of strategy we introduce probability in order to obtain the optimal choice of strategy.”
— Morgenstern

Review of von Neumann’s Solution
• Every two-person zero-sum game has a maximin solution, provided we allow mixed strategies
• But it applies only to two-person zero-sum games
• Arguably, few “games” in real life are zero-sum, except literal games (i.e., invented games for amusement)
Nonconstant Sum Games
• There is no agreed-upon definition of rationality for nonconstant sum games
• Two common criteria:
  – dominant strategy equilibrium
  – Nash equilibrium

Dominant Strategy Equilibrium
• Dominant strategy:
  – consider each of your opponents’ strategies, and what your best strategy is in each situation
  – if the same strategy is best in all situations, it is the dominant strategy
• Dominant strategy equilibrium: occurs if each player has a dominant strategy and plays it
Another Example: Price Competition

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

There is no dominant strategy.
(Example from McCain’s Game Theory: An Introductory Sketch)

Nash Equilibrium
• Developed by John Nash in 1950
• His 27-page PhD dissertation: Non-Cooperative Games
• Received the Nobel Prize in Economics for it in 1994
• Subject of A Beautiful Mind
Definition of Nash Equilibrium
• A set of strategies with the property: no player can benefit by changing actions while the others keep their strategies unchanged
• Players are in equilibrium if any change of strategy would lead to lower reward for that player
• For mixed strategies, we consider expected reward

Another Example (Reconsidered)

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

(p = 3, p = 3) is not a Nash equilibrium: unilaterally cutting price to p = 2 is better for Alpha (90) and better for Beta (90).
(Example from McCain’s Game Theory: An Introductory Sketch)
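The definition above can be applied mechanically to the price-competition matrix. This is an illustrative sketch (the function `pure_nash` is my own, not from the lecture):

```python
# Brute-force search for pure-strategy Nash equilibria in the McCain
# price-competition game. payoff[i][j] = (Alpha's payoff, Beta's payoff)
# when Alpha prices at i+1 and Beta at j+1.

payoff = [
    [(  0,   0), ( 50, -10), ( 40, -20)],
    [(-10,  50), ( 20,  20), ( 90,  10)],
    [(-20,  40), ( 10,  90), ( 50,  50)],
]

def pure_nash(payoff):
    """Cells where neither player gains by a unilateral deviation."""
    n, m = len(payoff), len(payoff[0])
    eqs = []
    for i in range(n):
        for j in range(m):
            a, b = payoff[i][j]
            best_a = max(payoff[k][j][0] for k in range(n))  # Alpha's options
            best_b = max(payoff[i][k][1] for k in range(m))  # Beta's options
            if a == best_a and b == best_b:
                eqs.append((i, j))
    return eqs

# Only mutual price-cutting (p = 1, p = 1) survives, even though it pays (0, 0):
assert pure_nash(payoff) == [(0, 0)]
```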
The Nash Equilibrium

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

The Nash equilibrium is (p = 1, p = 1): neither player can gain by unilaterally raising price.
(Example from McCain’s Game Theory: An Introductory Sketch)

Extensions of the Concept of a Rational Solution
• Every maximin solution is a dominant strategy equilibrium
• Every dominant strategy equilibrium is a Nash equilibrium
Cooperation Better for Both: A Dilemma

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

Cooperation at (p = 3, p = 3) pays (50, 50), better for both players than the Nash equilibrium payoff of (0, 0).
(Example from McCain’s Game Theory: An Introductory Sketch)

Dilemmas
• Dilemma: “A situation that requires choice between options that are or seem equally unfavorable or mutually exclusive” – Am. Her. Dict.
• In game theory: each player acts rationally, but the result is undesirable (less reward)
The Prisoners’ Dilemma
• Devised by Melvin Dresher & Merrill Flood in 1950 at the RAND Corporation
• Further developed by mathematician Albert W. Tucker in a 1950 presentation to psychologists
• It “has given rise to a vast body of literature in subjects as diverse as philosophy, ethics, biology, sociology, political science, economics, and, of course, game theory.” — S.J. Hagenmayer
• “This example, which can be set out in one page, could be the most influential one page in the social sciences in the latter half of the twentieth century.” — R.A. McCain

Prisoners’ Dilemma: The Story
• Two criminals have been caught
• They cannot communicate with each other
• If both confess, they will each get 10 years
• If one confesses and accuses the other:
  – the confessor goes free
  – the accused gets 20 years
• If neither confesses, they will both get 1 year on a lesser charge
Prisoners’ Dilemma Payoff Matrix

                       Bob
              cooperate    defect
     cooperate  –1, –1     –20, 0
 Ann
     defect      0, –20    –10, –10

• defect = confess, cooperate = don’t confess
• payoffs < 0 because they are punishments (losses)

Ann’s “Rational” Analysis (Dominant Strategy)
• if she cooperates, she may get 20 years
• if she defects, she may get 10 years
• ∴ best to defect
Bob’s “Rational” Analysis (Dominant Strategy)
• if he cooperates, he may get 20 years
• if he defects, he may get 10 years
• ∴ best to defect

Suboptimal Result of “Rational” Analysis
• each acts individually rationally ⇒ both get 10 years (dominant strategy equilibrium)
• “irrationally” deciding to cooperate ⇒ only 1 year each
Summary
• Individually rational actions lead to a result that all agree is less desirable
• In such a situation you cannot act unilaterally in your own best interest
• Just one example of a (game-theoretic) dilemma
• Can there be a situation in which it would make sense to cooperate unilaterally?
  – Yes, if the players can expect to interact again in the future

Classification of Dilemmas
General Payoff Matrix

                       Bob
              cooperate       defect
     cooperate  CC (R)         CD (S)
 Ann            Reward         Sucker
     defect     DC (T)         DD (P)
                Temptation     Punishment

General Conditions for a Dilemma
• You always benefit if the other cooperates:
  – CC > CD and DC > DD
• You sometimes benefit from defecting:
  – DC > CC or DD > CD
• Mutual cooperation is preferable to mutual defection:
  – CC > DD
• Consider the relative sizes of CC, CD, DC, DD:
  – think of them as permutations of R, S, T, P
  – only three result in dilemmas
Three Possible Orders
(figure: the outcomes CC (R), DC (T), CD (S), DD (P) arranged by payoff order; the three dilemmas are TRSP, RTPS, TRPS)

The Three Dilemmas
• Chicken (TRSP)
  – DC > CC > CD > DD
  – characterized by mutual defection being worst
  – two Nash equilibria (DC, CD)
• Stag Hunt (RTPS)
  – CC > DC > DD > CD
  – better to cooperate with a cooperator
  – Nash equilibrium is CC
• Prisoners’ Dilemma (TRPS)
  – DC > CC > DD > CD
  – better to defect on a cooperator
  – Nash equilibrium is DD
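The claim that only three orderings yield dilemmas can be verified by exhaustion. A sketch (my own helper `dilemmas`, not lecture code):

```python
# Enumerate all strict orderings of the four outcomes CC (R), CD (S), DC (T),
# DD (P) and keep those meeting the dilemma conditions from the previous
# slide. Exactly three survive: Chicken, Stag Hunt, Prisoners' Dilemma.

from itertools import permutations

LABEL = {"CC": "R", "CD": "S", "DC": "T", "DD": "P"}

def dilemmas():
    found = set()
    for ranks in permutations(range(4)):
        v = dict(zip(("CC", "CD", "DC", "DD"), ranks))
        always = v["CC"] > v["CD"] and v["DC"] > v["DD"]    # other's C always helps
        sometimes = v["DC"] > v["CC"] or v["DD"] > v["CD"]  # D sometimes helps
        mutual = v["CC"] > v["DD"]                          # CC preferable to DD
        if always and sometimes and mutual:
            order = sorted(v, key=v.get, reverse=True)      # best outcome first
            found.add("".join(LABEL[o] for o in order))
    return found

assert dilemmas() == {"TRSP", "RTPS", "TRPS"}  # Chicken, Stag Hunt, PD
```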
Assumptions
• No mechanism for enforceable threats or commitments
• No way to foresee a player’s move
• No way to eliminate the other player or avoid interaction
• No way to change the other player’s payoffs
• Communication only through direct interaction

The Iterated Prisoners’ Dilemma and Robert Axelrod’s Experiments
Axelrod’s Experiments
• Intuitively, the expectation of future encounters may affect the rationality of defection
• Various programs compete for 200 rounds
  – each encounters every other program and itself
• Each program can remember:
  – its own past actions
  – its competitors’ past actions
• 14 programs were submitted for the first experiment

IPD Payoff Matrix

                       B
              cooperate    defect
     cooperate   3, 3       0, 5
 A
     defect      5, 0       1, 1

N.B. Unless DC + CD < 2 CC (i.e., T + S < 2 R), a player can win by alternating defection/cooperation.
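The N.B. above can be checked arithmetically for Axelrod’s payoffs; a minimal sketch:

```python
# With Axelrod's payoffs (T = 5, R = 3, P = 1, S = 0), a pair alternating
# defect/cooperate averages (T + S)/2 = 2.5 per round, which is worse than
# steady mutual cooperation (R = 3) because T + S < 2 R.

T, R, P, S = 5, 3, 1, 0

alternating_avg = (T + S) / 2   # one round as exploiter, one as sucker
assert alternating_avg < R      # alternation cannot beat mutual cooperation
assert T + S < 2 * R            # the condition stated in the N.B.
```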
Indefinite Number of Future Encounters
• Cooperation depends on the expectation of an indefinite number of future encounters
• Suppose a known finite number of encounters:
  – No reason to cooperate on the last encounter
  – Since defection is expected on the last, no reason to cooperate on the next to last
  – And so forth: there is no reason to cooperate at all

Analysis of Some Simple Strategies
• Three simple strategies:
  – ALL-D: always defect
  – ALL-C: always cooperate
  – RAND: randomly cooperate/defect
• Effectiveness depends on the environment
  – ALL-D optimizes local (individual) fitness
  – ALL-C optimizes global (population) fitness
  – RAND compromises
Expected Scores

 ⇓ playing ⇒   ALL-C   RAND   ALL-D   Average
 ALL-C          3.0     1.5    0.0     1.5
 RAND           4.0     2.25   0.5     2.25
 ALL-D          5.0     3.0    1.0     3.0

Result of Axelrod’s Experiments
• Winner is Rapoport’s TFT (Tit-for-Tat)
  – cooperate on the first encounter
  – reply in kind on succeeding encounters
• Second experiment:
  – 62 programs
  – all knew TFT was the previous winner
  – TFT wins again
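The expected-scores table can be reproduced by treating each strategy as a fixed probability of cooperating. A sketch (my own helper names, not the lecture’s code):

```python
# Expected per-round scores for ALL-C, RAND, ALL-D against each other,
# using Axelrod's payoffs (R, S, T, P) = (3, 0, 5, 1).

R, S, T, P = 3, 0, 5, 1
COOP_PROB = {"ALL-C": 1.0, "RAND": 0.5, "ALL-D": 0.0}

def expected_score(p, q):
    """My expected payoff when I cooperate with prob. p and opponent with q."""
    return p*q*R + p*(1 - q)*S + (1 - p)*q*T + (1 - p)*(1 - q)*P

table = {a: {b: expected_score(COOP_PROB[a], COOP_PROB[b]) for b in COOP_PROB}
         for a in COOP_PROB}

# Matches the table on the slide:
assert table["ALL-C"] == {"ALL-C": 3.0, "RAND": 1.5,  "ALL-D": 0.0}
assert table["RAND"]  == {"ALL-C": 4.0, "RAND": 2.25, "ALL-D": 0.5}
assert table["ALL-D"] == {"ALL-C": 5.0, "RAND": 3.0,  "ALL-D": 1.0}
```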
Characteristics of Successful Strategies
• Don’t be envious
  – at best, TFT ties other strategies
• Be nice
  – i.e., don’t be the first to defect
• Reciprocate
  – reward cooperation, punish defection
• Don’t be too clever
  – sophisticated strategies may be unpredictable and look random; be clear

Demonstration of Iterated Prisoners’ Dilemma
Run NetLogo demonstration PD N-Person Iterated.nlogo
Effects of Many Kinds of Noise Have Been Studied
• Misimplementation noise
• Misperception noise
  – noisy channels
• Stochastic effects on payoffs
• General conclusions:
  – sufficiently little noise ⇒ generosity is best
  – greater noise ⇒ generosity avoids unnecessary conflict but invites exploitation

Tit-for-Two-Tats
• More forgiving than TFT
• Waits for two successive defections before punishing
• Beats TFT in a noisy environment
  – e.g., an unintentional defection will lead TFTs into an endless cycle of retaliation
• May be exploited by feigning accidental defection
More Characteristics of Successful Strategies
• Should be a generalist (robust)
  – i.e., do sufficiently well in a wide variety of environments
• Should do well with its own kind
  – since successful strategies will propagate
• Should be cognitively simple
• Should be an evolutionarily stable strategy
  – i.e., resistant to invasion by other strategies

Kant’s Categorical Imperative
“Act on maxims that can at the same time have for their object themselves as universal laws of nature.”
Reading
• CS 420/594: Flake, ch. 18 (Natural & Analog Computation)

Ecological & Spatial Models
Ecological Model
• What if more successful strategies spread in the population at the expense of less successful ones?
• Models the success of programs as a fraction of the total population
• Fraction of a strategy = probability a random program obeys this strategy

Variables
• Pi(t) = probability = proportional population of strategy i at time t
• Si(t) = score achieved by strategy i
• Rij(t) = relative score achieved by strategy i playing against strategy j over many rounds
  – fixed (not time-varying) for now
Computing Score of a Strategy
• Let n = number of strategies in the ecosystem
• Compute the score achieved by strategy i:
  Si(t) = Σk=1..n Rik(t) Pk(t)
• In matrix form: S(t) = R(t) P(t)

Updating Proportional Population
• Each strategy’s proportion grows in proportion to its score:
  Pi(t+1) = Pi(t) Si(t) / Σj=1..n Pj(t) Sj(t)
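The two update equations can be implemented directly. This is a sketch, not the lecture’s NetLogo model; the R matrix reuses the expected scores of ALL-C, RAND, and ALL-D from the earlier table:

```python
# Ecological update: S(t) = R P(t), then
# P_i(t+1) = P_i(t) S_i(t) / sum_j P_j(t) S_j(t).

R = [[3.0, 1.5,  0.0],   # ALL-C's score vs ALL-C, RAND, ALL-D
     [4.0, 2.25, 0.5],   # RAND
     [5.0, 3.0,  1.0]]   # ALL-D

def step(P):
    n = len(P)
    S = [sum(R[i][k] * P[k] for k in range(n)) for i in range(n)]
    total = sum(P[i] * S[i] for i in range(n))
    return [P[i] * S[i] / total for i in range(n)]

P = [1/3, 1/3, 1/3]          # start with equal proportions
for _ in range(100):
    P = step(P)

assert abs(sum(P) - 1) < 1e-9    # proportions stay normalized
assert P[2] > 0.99               # ALL-D takes over this (TFT-free) ecosystem
```

With no reciprocating strategy in the mix, ALL-D’s row dominates and it fixates; this is the motivation for adding TFT and PAV in the simulations below.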
Some Simulations

Demonstration Simulation
• 60% ALL-C
• 20% RAND
• 10% ALL-D, 10% TFT
• Usual Axelrod payoff matrix
• 200 rounds per step
Collectively Stable Strategy
• Let w = probability of future interactions
• Suppose cooperation based on reciprocity has been established
• Then no one can do better than TFT provided:
  w ≥ max( (T – R) / (R – S), (T – R) / (T – P) )
• The TFT users are in a Nash equilibrium

NetLogo Demonstration of Ecological IPD
Run EIPD.nlogo
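The collective-stability condition is easy to evaluate for Axelrod’s payoffs; a minimal check:

```python
# TFT is collectively stable when w >= max((T - R)/(R - S), (T - R)/(T - P)).
# With Axelrod's payoffs, future interaction must be likely enough: w >= 2/3.

T, R, P, S = 5, 3, 1, 0

w_min = max((T - R) / (R - S), (T - R) / (T - P))
assert w_min == 2/3   # (5-3)/(3-0) = 2/3 dominates (5-3)/(5-1) = 1/2
```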
“Win-Stay, Lose-Shift” Strategy
• Win-stay, lose-shift:
  – begin cooperating
  – if the other cooperates, continue current behavior
  – if the other defects, switch to the opposite behavior
• Called PAV (because it suggests Pavlovian learning)

Simulation without Noise
• 20% each strategy
• no noise
Effects of Noise
• Consider the effects of noise or other sources of error in responses
• TFT:
  – cycle of alternating defections (CD, DC)
  – broken only by another error
• PAV:
  – eventually self-corrects (CD, DC, DD, CC)
  – can exploit ALL-C in a noisy environment

Simulation with Noise
• 20% each strategy
• 0.5% noise
• Noise added into the computation of Rij(t)
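The contrast between TFT and PAV after a single error can be replayed deterministically. A sketch under my own naming (`tft`, `pav`, `play`), injecting one accidental defection into self-play:

```python
# After one accidental defection, two TFTs lock into alternating retaliation,
# while two PAV (win-stay, lose-shift) players pass through DC, DD and then
# return to mutual cooperation.

def tft(my_prev, opp_prev):
    return opp_prev                        # reply in kind

def pav(my_prev, opp_prev):
    if opp_prev == "C":
        return my_prev                     # win-stay
    return "D" if my_prev == "C" else "C"  # lose-shift

def play(strategy, rounds, error_round=1):
    """Self-play; player 0's move is flipped to D at error_round."""
    hist = [("C", "C")]
    for r in range(1, rounds):
        a = strategy(hist[-1][0], hist[-1][1])
        b = strategy(hist[-1][1], hist[-1][0])
        if r == error_round:
            a = "D"                        # the unintended defection
        hist.append((a, b))
    return hist

assert play(tft, 6) == [("C","C"), ("D","C"), ("C","D"), ("D","C"),
                        ("C","D"), ("D","C")]   # endless echo of the error
assert play(pav, 6) == [("C","C"), ("D","C"), ("D","D"), ("C","C"),
                        ("C","C"), ("C","C")]   # self-corrects within 2 rounds
```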
Spatial Effects
• The previous simulation assumes that each agent is equally likely to interact with each other agent
• So strategy interactions are proportional to fractions in the population
• More realistically, interactions with “neighbors” are more likely
  – “neighbor” can be defined in many ways
• Neighbors are more likely to use the same strategy

Spatial Simulation
• Toroidal grid
• Each agent interacts only with its eight neighbors
• Each agent adopts the strategy of its most successful neighbor
• Ties favor the current strategy
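One update step of this rule can be sketched on a small torus. This is an illustration under simplifying assumptions (one-shot ALL-C/ALL-D payoffs, a 5×5 grid), not the sipd.nlogo model:

```python
# One step on a toroidal grid: every agent plays its eight neighbors
# (payoffs R, S, T, P = 3, 0, 5, 1), then adopts the strategy of its most
# successful neighbor; ties favor the current strategy.

PAY = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
N = 5
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
           if (di, dj) != (0, 0)]

def neighbors(i, j):
    return [((i + di) % N, (j + dj) % N) for di, dj in OFFSETS]

def step(grid):
    score = {(i, j): sum(PAY[(grid[i][j], grid[ni][nj])]
                         for ni, nj in neighbors(i, j))
             for i in range(N) for j in range(N)}
    new = [row[:] for row in grid]
    for i in range(N):
        for j in range(N):
            best = max(neighbors(i, j), key=lambda c: score[c])
            if score[best] > score[(i, j)]:   # ties keep the current strategy
                new[i][j] = grid[best[0]][best[1]]
    return new

grid = [["C"] * N for _ in range(N)]
grid[2][2] = "D"                 # a lone defector in a field of cooperators
grid = step(grid)

# The defector's high score (8 * 5 = 40) spreads defection to its neighbors:
assert sum(row.count("D") for row in grid) == 9
```

The lone defector exploits all eight neighbors, outscores them, and is imitated; whether the resulting defector block keeps growing depends on the payoffs and the mix of strategies, which is what the figures below explore.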
Typical Simulation (t = 1)
(figure; colors: ALL-C, TFT, RAND, PAV, ALL-D)

Typical Simulation (t = 5)
(figure; same color legend)
Typical Simulation (t = 10)
(figure; colors: ALL-C, TFT, RAND, PAV, ALL-D)

Typical Simulation (t = 10), Zooming In
(figure)
Typical Simulation (t = 20)
(figure; colors: ALL-C, TFT, RAND, PAV, ALL-D)

Typical Simulation (t = 50)
(figure; same color legend)
Typical Simulation (t = 50), Zoom In
(figure)

Simulation of Spatial Iterated Prisoners’ Dilemma
Run sipd.nlogo
SIPD Without Noise
(figure)

Conclusions: Spatial IPD
• Small clusters of cooperators can exist in a hostile environment
• Parasitic agents can exist only in limited numbers
• Stability of cooperation depends on the expectation of future interaction
• Adaptive cooperation/defection beats unilateral cooperation or defection
Additional Bibliography
1. von Neumann, J., & Morgenstern, O. Theory of Games and Economic Behavior. Princeton, 1944.
2. Morgenstern, O. “Game Theory,” in Dictionary of the History of Ideas, Charles Scribner’s Sons, 1973, vol. 2, pp. 263–75.
3. Axelrod, R. The Evolution of Cooperation. Basic Books, 1984.
4. Axelrod, R., & Dion, D. “The Further Evolution of Cooperation,” Science 242 (1988): 1385–90.
5. Poundstone, W. Prisoner’s Dilemma. Doubleday, 1992.