Part 4: Cooperation & Competition 11/26/07
Lecture 26

Matching Pennies
• Al and Barb each independently pick either heads or tails
• If they are both heads or both tails, Al wins
• If they are different, Barb wins
Payoff Matrix

                 Barb
             head      tail
 Al   head  +1, –1    –1, +1
      tail  –1, +1    +1, –1

The minimum of each pure strategy is the same.

Mixed Strategy
• Although we cannot use maximin to select a pure strategy, we can use it to select a mixed strategy
• Take the maximum of the minimum payoffs over all assignments of probabilities
• von Neumann proved you can always find an equilibrium if mixed strategies are permitted
Analysis: Al’s Expected Payoff from Penny Game
• Let PA = probability Al picks head
• Let PB = probability Barb picks head
• Al’s expected payoff:
  E{A} = PA PB – PA (1 – PB) – (1 – PA) PB + (1 – PA) (1 – PB)
       = (2 PA – 1)(2 PB – 1)
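The payoff formula above can be checked numerically. This is a small illustrative sketch (not part of the lecture); the function name `expected_payoff` is my own:

```python
# Mixed-strategy expected payoff for Matching Pennies, following the slide's
# derivation E{A} = (2 PA - 1)(2 PB - 1).

def expected_payoff(pa, pb):
    """Al's expected payoff when Al plays heads with prob. pa, Barb with pb."""
    return (pa * pb                 # both heads: Al wins +1
            - pa * (1 - pb)         # Al heads, Barb tails: Al loses 1
            - (1 - pa) * pb         # Al tails, Barb heads: Al loses 1
            + (1 - pa) * (1 - pb))  # both tails: Al wins +1

# The expanded sum agrees with the factored form:
assert abs(expected_payoff(0.7, 0.4) - (2*0.7 - 1) * (2*0.4 - 1)) < 1e-12

# At the mixed-strategy equilibrium PA = PB = 1/2 the expected payoff is 0,
# and Barb's 50/50 play neutralizes any choice of PA:
assert abs(expected_payoff(0.5, 0.5)) < 1e-12
assert abs(expected_payoff(0.9, 0.5)) < 1e-12
```

Because E{A} factors into (2 PA – 1)(2 PB – 1), either player choosing probability 1/2 forces the product to zero regardless of the other's choice.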
How Barb’s Behavior Affects Al’s Expected Payoff
(figures)
More General Analysis (Differing Payoffs)
• Let A’s payoffs be: H = HH, h = HT, t = TH, T = TT
• E{A} = PA PB H + PA (1 – PB) h + (1 – PA) PB t + (1 – PA)(1 – PB) T
       = (H + T – h – t) PA PB + (h – T) PA + (t – T) PB + T
• To find the saddle point, set ∂E{A}/∂PA = 0 and ∂E{A}/∂PB = 0 to get:
  PA = (T – t) / (H + T – h – t),   PB = (T – h) / (H + T – h – t)

Random Rationality
“It seems difficult, at first, to accept the idea that ‘rationality’ — which appears to demand a clear, definite plan, a deterministic resolution — should be achieved by the use of probabilistic devices. Yet precisely such is the case.”
— Morgenstern
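The general saddle-point formulas can be sketched in a few lines; the helper name `saddle_point` is my own, and the code assumes the denominator H + T – h – t is nonzero:

```python
# Saddle-point probabilities for the general 2x2 zero-sum game, with A's
# payoffs H = HH, h = HT, t = TH, T = TT (as defined in the slide).

def saddle_point(H, h, t, T):
    """Return (PA, PB) that zero both partial derivatives of E{A}."""
    d = H + T - h - t          # assumed nonzero
    return (T - t) / d, (T - h) / d

# Matching pennies (H = T = +1, h = t = -1) recovers the 50/50 mix:
pa, pb = saddle_point(1, -1, -1, 1)
assert pa == 0.5 and pb == 0.5
```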
Probability in Games of Chance and Strategy
“In games of chance the task is to determine and then to evaluate probabilities inherent in the game; in games of strategy we introduce probability in order to obtain the optimal choice of strategy.”
— Morgenstern

Review of von Neumann’s Solution
• Every two-person zero-sum game has a maximin solution, provided we allow mixed strategies
• But it applies only to two-person zero-sum games
• Arguably, few “games” in real life are zero-sum, except literal games (i.e., invented games for amusement)
Nonconstant Sum Games
• There is no agreed-upon definition of rationality for nonconstant sum games
• Two common criteria:
  – dominant strategy equilibrium
  – Nash equilibrium

Dominant Strategy Equilibrium
• Dominant strategy:
  – consider each of your opponents’ strategies, and what your best strategy is in each situation
  – if the same strategy is best in all situations, it is the dominant strategy
• Dominant strategy equilibrium: occurs if each player has a dominant strategy and plays it
Another Example: Price Competition

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

There is no dominant strategy.
(Example from McCain’s Game Theory: An Introductory Sketch)

Nash Equilibrium
• Developed by John Nash in 1950
• His 27-page PhD dissertation: Non-Cooperative Games
• Received the Nobel Prize in Economics for it in 1994
• Subject of A Beautiful Mind
Definition of Nash Equilibrium
• A set of strategies with the property: no player can benefit by changing actions while the others keep their strategies unchanged
• Players are in equilibrium if any change of strategy would lead to lower reward for that player
• For mixed strategies, we consider expected reward

Another Example (Reconsidered)

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

(p = 3, p = 3) is not a Nash equilibrium: unilaterally cutting price to p = 2 is better for Alpha (90) and better for Beta (90).
(Example from McCain’s Game Theory: An Introductory Sketch)
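The definition above can be applied mechanically to the price-competition matrix. This is an illustrative sketch (the function `pure_nash` is my own, not from the lecture):

```python
# Brute-force search for pure-strategy Nash equilibria in the McCain
# price-competition game. payoff[i][j] = (Alpha's payoff, Beta's payoff)
# when Alpha prices at i+1 and Beta at j+1.

payoff = [
    [(  0,   0), ( 50, -10), ( 40, -20)],
    [(-10,  50), ( 20,  20), ( 90,  10)],
    [(-20,  40), ( 10,  90), ( 50,  50)],
]

def pure_nash(payoff):
    """Cells where neither player gains by a unilateral deviation."""
    n, m = len(payoff), len(payoff[0])
    eqs = []
    for i in range(n):
        for j in range(m):
            a, b = payoff[i][j]
            best_a = max(payoff[k][j][0] for k in range(n))  # Alpha's options
            best_b = max(payoff[i][k][1] for k in range(m))  # Beta's options
            if a == best_a and b == best_b:
                eqs.append((i, j))
    return eqs

# Only mutual price-cutting (p = 1, p = 1) survives, even though it pays (0, 0):
assert pure_nash(payoff) == [(0, 0)]
```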
The Nash Equilibrium

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

The Nash equilibrium is (p = 1, p = 1): neither player can gain by unilaterally raising price.
(Example from McCain’s Game Theory: An Introductory Sketch)

Extensions of the Concept of a Rational Solution
• Every maximin solution is a dominant strategy equilibrium
• Every dominant strategy equilibrium is a Nash equilibrium
Cooperation Better for Both: A Dilemma

                       Beta
              p = 1     p = 2     p = 3
        p = 1   0, 0    50, –10   40, –20
 Alpha  p = 2  –10, 50  20, 20    90, 10
        p = 3  –20, 40  10, 90    50, 50

Cooperation at (p = 3, p = 3) pays (50, 50), better for both players than the Nash equilibrium payoff of (0, 0).
(Example from McCain’s Game Theory: An Introductory Sketch)

Dilemmas
• Dilemma: “A situation that requires choice between options that are or seem equally unfavorable or mutually exclusive” – Am. Her. Dict.
• In game theory: each player acts rationally, but the result is undesirable (less reward)
The Prisoners’ Dilemma
• Devised by Melvin Dresher & Merrill Flood in 1950 at the RAND Corporation
• Further developed by mathematician Albert W. Tucker in a 1950 presentation to psychologists
• It “has given rise to a vast body of literature in subjects as diverse as philosophy, ethics, biology, sociology, political science, economics, and, of course, game theory.” — S.J. Hagenmayer
• “This example, which can be set out in one page, could be the most influential one page in the social sciences in the latter half of the twentieth century.” — R.A. McCain

Prisoners’ Dilemma: The Story
• Two criminals have been caught
• They cannot communicate with each other
• If both confess, they will each get 10 years
• If one confesses and accuses the other:
  – the confessor goes free
  – the accused gets 20 years
• If neither confesses, they will both get 1 year on a lesser charge
Prisoners’ Dilemma Payoff Matrix

                       Bob
              cooperate    defect
     cooperate  –1, –1     –20, 0
 Ann
     defect      0, –20    –10, –10

• defect = confess, cooperate = don’t confess
• payoffs < 0 because they are punishments (losses)

Ann’s “Rational” Analysis (Dominant Strategy)
• if she cooperates, she may get 20 years
• if she defects, she may get 10 years
• ∴ best to defect
Bob’s “Rational” Analysis (Dominant Strategy)
• if he cooperates, he may get 20 years
• if he defects, he may get 10 years
• ∴ best to defect

Suboptimal Result of “Rational” Analysis
• each acts individually rationally ⇒ both get 10 years (dominant strategy equilibrium)
• “irrationally” deciding to cooperate ⇒ only 1 year each
Summary
• Individually rational actions lead to a result that all agree is less desirable
• In such a situation you cannot act unilaterally in your own best interest
• Just one example of a (game-theoretic) dilemma
• Can there be a situation in which it would make sense to cooperate unilaterally?
  – Yes, if the players can expect to interact again in the future

Classification of Dilemmas
General Payoff Matrix

                       Bob
              cooperate       defect
     cooperate  CC (R)         CD (S)
 Ann            Reward         Sucker
     defect     DC (T)         DD (P)
                Temptation     Punishment

General Conditions for a Dilemma
• You always benefit if the other cooperates:
  – CC > CD and DC > DD
• You sometimes benefit from defecting:
  – DC > CC or DD > CD
• Mutual cooperation is preferable to mutual defection:
  – CC > DD
• Consider the relative sizes of CC, CD, DC, DD:
  – think of them as permutations of R, S, T, P
  – only three result in dilemmas
Three Possible Orders
(figure: the outcomes CC (R), DC (T), CD (S), DD (P) arranged by payoff order; the three dilemmas are TRSP, RTPS, TRPS)

The Three Dilemmas
• Chicken (TRSP)
  – DC > CC > CD > DD
  – characterized by mutual defection being worst
  – two Nash equilibria (DC, CD)
• Stag Hunt (RTPS)
  – CC > DC > DD > CD
  – better to cooperate with a cooperator
  – Nash equilibrium is CC
• Prisoners’ Dilemma (TRPS)
  – DC > CC > DD > CD
  – better to defect on a cooperator
  – Nash equilibrium is DD
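The claim that only three orderings yield dilemmas can be verified by exhaustion. A sketch (my own helper `dilemmas`, not lecture code):

```python
# Enumerate all strict orderings of the four outcomes CC (R), CD (S), DC (T),
# DD (P) and keep those meeting the dilemma conditions from the previous
# slide. Exactly three survive: Chicken, Stag Hunt, Prisoners' Dilemma.

from itertools import permutations

LABEL = {"CC": "R", "CD": "S", "DC": "T", "DD": "P"}

def dilemmas():
    found = set()
    for ranks in permutations(range(4)):
        v = dict(zip(("CC", "CD", "DC", "DD"), ranks))
        always = v["CC"] > v["CD"] and v["DC"] > v["DD"]    # other's C always helps
        sometimes = v["DC"] > v["CC"] or v["DD"] > v["CD"]  # D sometimes helps
        mutual = v["CC"] > v["DD"]                          # CC preferable to DD
        if always and sometimes and mutual:
            order = sorted(v, key=v.get, reverse=True)      # best outcome first
            found.add("".join(LABEL[o] for o in order))
    return found

assert dilemmas() == {"TRSP", "RTPS", "TRPS"}  # Chicken, Stag Hunt, PD
```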
Assumptions
• No mechanism for enforceable threats or commitments
• No way to foresee a player’s move
• No way to eliminate the other player or avoid interaction
• No way to change the other player’s payoffs
• Communication only through direct interaction

The Iterated Prisoners’ Dilemma and Robert Axelrod’s Experiments
Axelrod’s Experiments
• Intuitively, the expectation of future encounters may affect the rationality of defection
• Various programs compete for 200 rounds
  – each encounters every other program and itself
• Each program can remember:
  – its own past actions
  – its competitors’ past actions
• 14 programs were submitted for the first experiment

IPD Payoff Matrix

                       B
              cooperate    defect
     cooperate   3, 3       0, 5
 A
     defect      5, 0       1, 1

N.B. Unless DC + CD < 2 CC (i.e., T + S < 2 R), a player can win by alternating defection/cooperation.
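The N.B. above can be checked arithmetically for Axelrod’s payoffs; a minimal sketch:

```python
# With Axelrod's payoffs (T = 5, R = 3, P = 1, S = 0), a pair alternating
# defect/cooperate averages (T + S)/2 = 2.5 per round, which is worse than
# steady mutual cooperation (R = 3) because T + S < 2 R.

T, R, P, S = 5, 3, 1, 0

alternating_avg = (T + S) / 2   # one round as exploiter, one as sucker
assert alternating_avg < R      # alternation cannot beat mutual cooperation
assert T + S < 2 * R            # the condition stated in the N.B.
```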
Indefinite Number of Future Encounters
• Cooperation depends on the expectation of an indefinite number of future encounters
• Suppose a known finite number of encounters:
  – No reason to cooperate on the last encounter
  – Since defection is expected on the last, no reason to cooperate on the next to last
  – And so forth: there is no reason to cooperate at all

Analysis of Some Simple Strategies
• Three simple strategies:
  – ALL-D: always defect
  – ALL-C: always cooperate
  – RAND: randomly cooperate/defect
• Effectiveness depends on the environment
  – ALL-D optimizes local (individual) fitness
  – ALL-C optimizes global (population) fitness
  – RAND compromises
Expected Scores

 ⇓ playing ⇒   ALL-C   RAND   ALL-D   Average
 ALL-C          3.0     1.5    0.0     1.5
 RAND           4.0     2.25   0.5     2.25
 ALL-D          5.0     3.0    1.0     3.0

Result of Axelrod’s Experiments
• Winner is Rapoport’s TFT (Tit-for-Tat)
  – cooperate on the first encounter
  – reply in kind on succeeding encounters
• Second experiment:
  – 62 programs
  – all knew TFT was the previous winner
  – TFT wins again
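The expected-scores table can be reproduced by treating each strategy as a fixed probability of cooperating. A sketch (my own helper names, not the lecture’s code):

```python
# Expected per-round scores for ALL-C, RAND, ALL-D against each other,
# using Axelrod's payoffs (R, S, T, P) = (3, 0, 5, 1).

R, S, T, P = 3, 0, 5, 1
COOP_PROB = {"ALL-C": 1.0, "RAND": 0.5, "ALL-D": 0.0}

def expected_score(p, q):
    """My expected payoff when I cooperate with prob. p and opponent with q."""
    return p*q*R + p*(1 - q)*S + (1 - p)*q*T + (1 - p)*(1 - q)*P

table = {a: {b: expected_score(COOP_PROB[a], COOP_PROB[b]) for b in COOP_PROB}
         for a in COOP_PROB}

# Matches the table on the slide:
assert table["ALL-C"] == {"ALL-C": 3.0, "RAND": 1.5,  "ALL-D": 0.0}
assert table["RAND"]  == {"ALL-C": 4.0, "RAND": 2.25, "ALL-D": 0.5}
assert table["ALL-D"] == {"ALL-C": 5.0, "RAND": 3.0,  "ALL-D": 1.0}
```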
Characteristics of Successful Strategies
• Don’t be envious
  – at best, TFT ties other strategies
• Be nice
  – i.e., don’t be the first to defect
• Reciprocate
  – reward cooperation, punish defection
• Don’t be too clever
  – sophisticated strategies may be unpredictable and look random; be clear

Demonstration of Iterated Prisoners’ Dilemma
Run NetLogo demonstration PD N-Person Iterated.nlogo
Effects of Many Kinds of Noise Have Been Studied
• Misimplementation noise
• Misperception noise
  – noisy channels
• Stochastic effects on payoffs
• General conclusions:
  – sufficiently little noise ⇒ generosity is best
  – greater noise ⇒ generosity avoids unnecessary conflict but invites exploitation

Tit-for-Two-Tats
• More forgiving than TFT
• Waits for two successive defections before punishing
• Beats TFT in a noisy environment
  – e.g., an unintentional defection will lead TFTs into an endless cycle of retaliation
• May be exploited by feigning accidental defection
More Characteristics of Successful Strategies
• Should be a generalist (robust)
  – i.e., do sufficiently well in a wide variety of environments
• Should do well with its own kind
  – since successful strategies will propagate
• Should be cognitively simple
• Should be an evolutionarily stable strategy
  – i.e., resistant to invasion by other strategies

Kant’s Categorical Imperative
“Act on maxims that can at the same time have for their object themselves as universal laws of nature.”
Reading
• CS 420/594: Flake, ch. 18 (Natural & Analog Computation)

Ecological & Spatial Models
Ecological Model
• What if more successful strategies spread in the population at the expense of less successful ones?
• Models the success of programs as a fraction of the total population
• Fraction of a strategy = probability a random program obeys this strategy

Variables
• Pi(t) = probability = proportional population of strategy i at time t
• Si(t) = score achieved by strategy i
• Rij(t) = relative score achieved by strategy i playing against strategy j over many rounds
  – fixed (not time-varying) for now
Computing Score of a Strategy
• Let n = number of strategies in the ecosystem
• Compute the score achieved by strategy i:
  Si(t) = Σk=1..n Rik(t) Pk(t)
• In matrix form: S(t) = R(t) P(t)

Updating Proportional Population
• Each strategy’s proportion grows in proportion to its score:
  Pi(t+1) = Pi(t) Si(t) / Σj=1..n Pj(t) Sj(t)
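The two update equations can be implemented directly. This is a sketch, not the lecture’s NetLogo model; the R matrix reuses the expected scores of ALL-C, RAND, and ALL-D from the earlier table:

```python
# Ecological update: S(t) = R P(t), then
# P_i(t+1) = P_i(t) S_i(t) / sum_j P_j(t) S_j(t).

R = [[3.0, 1.5,  0.0],   # ALL-C's score vs ALL-C, RAND, ALL-D
     [4.0, 2.25, 0.5],   # RAND
     [5.0, 3.0,  1.0]]   # ALL-D

def step(P):
    n = len(P)
    S = [sum(R[i][k] * P[k] for k in range(n)) for i in range(n)]
    total = sum(P[i] * S[i] for i in range(n))
    return [P[i] * S[i] / total for i in range(n)]

P = [1/3, 1/3, 1/3]          # start with equal proportions
for _ in range(100):
    P = step(P)

assert abs(sum(P) - 1) < 1e-9    # proportions stay normalized
assert P[2] > 0.99               # ALL-D takes over this (TFT-free) ecosystem
```

With no reciprocating strategy in the mix, ALL-D’s row dominates and it fixates; this is the motivation for adding TFT and PAV in the simulations below.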
Some Simulations

Demonstration Simulation
• 60% ALL-C
• 20% RAND
• 10% ALL-D, 10% TFT
• Usual Axelrod payoff matrix
• 200 rounds per step
Collectively Stable Strategy
• Let w = probability of future interactions
• Suppose cooperation based on reciprocity has been established
• Then no one can do better than TFT provided:
  w ≥ max( (T – R) / (R – S), (T – R) / (T – P) )
• The TFT users are in a Nash equilibrium

NetLogo Demonstration of Ecological IPD
Run EIPD.nlogo
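The collective-stability condition is easy to evaluate for Axelrod’s payoffs; a minimal check:

```python
# TFT is collectively stable when w >= max((T - R)/(R - S), (T - R)/(T - P)).
# With Axelrod's payoffs, future interaction must be likely enough: w >= 2/3.

T, R, P, S = 5, 3, 1, 0

w_min = max((T - R) / (R - S), (T - R) / (T - P))
assert w_min == 2/3   # (5-3)/(3-0) = 2/3 dominates (5-3)/(5-1) = 1/2
```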
“Win-Stay, Lose-Shift” Strategy
• Win-stay, lose-shift:
  – begin cooperating
  – if the other cooperates, continue current behavior
  – if the other defects, switch to the opposite behavior
• Called PAV (because it suggests Pavlovian learning)

Simulation without Noise
• 20% each strategy
• no noise
Effects of Noise
• Consider the effects of noise or other sources of error in responses
• TFT:
  – cycle of alternating defections (CD, DC)
  – broken only by another error
• PAV:
  – eventually self-corrects (CD, DC, DD, CC)
  – can exploit ALL-C in a noisy environment

Simulation with Noise
• 20% each strategy
• 0.5% noise
• Noise added into the computation of Rij(t)
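The contrast between TFT and PAV after a single error can be replayed deterministically. A sketch under my own naming (`tft`, `pav`, `play`), injecting one accidental defection into self-play:

```python
# After one accidental defection, two TFTs lock into alternating retaliation,
# while two PAV (win-stay, lose-shift) players pass through DC, DD and then
# return to mutual cooperation.

def tft(my_prev, opp_prev):
    return opp_prev                        # reply in kind

def pav(my_prev, opp_prev):
    if opp_prev == "C":
        return my_prev                     # win-stay
    return "D" if my_prev == "C" else "C"  # lose-shift

def play(strategy, rounds, error_round=1):
    """Self-play; player 0's move is flipped to D at error_round."""
    hist = [("C", "C")]
    for r in range(1, rounds):
        a = strategy(hist[-1][0], hist[-1][1])
        b = strategy(hist[-1][1], hist[-1][0])
        if r == error_round:
            a = "D"                        # the unintended defection
        hist.append((a, b))
    return hist

assert play(tft, 6) == [("C","C"), ("D","C"), ("C","D"), ("D","C"),
                        ("C","D"), ("D","C")]   # endless echo of the error
assert play(pav, 6) == [("C","C"), ("D","C"), ("D","D"), ("C","C"),
                        ("C","C"), ("C","C")]   # self-corrects within 2 rounds
```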
Spatial Effects
• The previous simulation assumes that each agent is equally likely to interact with each other agent
• So strategy interactions are proportional to fractions in the population
• More realistically, interactions with “neighbors” are more likely
  – “neighbor” can be defined in many ways
• Neighbors are more likely to use the same strategy

Spatial Simulation
• Toroidal grid
• Each agent interacts only with its eight neighbors
• Each agent adopts the strategy of its most successful neighbor
• Ties favor the current strategy
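One update step of this rule can be sketched on a small torus. This is an illustration under simplifying assumptions (one-shot ALL-C/ALL-D payoffs, a 5×5 grid), not the sipd.nlogo model:

```python
# One step on a toroidal grid: every agent plays its eight neighbors
# (payoffs R, S, T, P = 3, 0, 5, 1), then adopts the strategy of its most
# successful neighbor; ties favor the current strategy.

PAY = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}
N = 5
OFFSETS = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)
           if (di, dj) != (0, 0)]

def neighbors(i, j):
    return [((i + di) % N, (j + dj) % N) for di, dj in OFFSETS]

def step(grid):
    score = {(i, j): sum(PAY[(grid[i][j], grid[ni][nj])]
                         for ni, nj in neighbors(i, j))
             for i in range(N) for j in range(N)}
    new = [row[:] for row in grid]
    for i in range(N):
        for j in range(N):
            best = max(neighbors(i, j), key=lambda c: score[c])
            if score[best] > score[(i, j)]:   # ties keep the current strategy
                new[i][j] = grid[best[0]][best[1]]
    return new

grid = [["C"] * N for _ in range(N)]
grid[2][2] = "D"                 # a lone defector in a field of cooperators
grid = step(grid)

# The defector's high score (8 * 5 = 40) spreads defection to its neighbors:
assert sum(row.count("D") for row in grid) == 9
```

The lone defector exploits all eight neighbors, outscores them, and is imitated; whether the resulting defector block keeps growing depends on the payoffs and the mix of strategies, which is what the figures below explore.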
Typical Simulation (t = 1)
(figure; colors: ALL-C, TFT, RAND, PAV, ALL-D)

Typical Simulation (t = 5)
(figure; same color legend)
Typical Simulation (t = 10)
(figure; colors: ALL-C, TFT, RAND, PAV, ALL-D)

Typical Simulation (t = 10), Zooming In
(figure)
Typical Simulation (t = 20)
(figure; colors: ALL-C, TFT, RAND, PAV, ALL-D)

Typical Simulation (t = 50)
(figure; same color legend)
Typical Simulation (t = 50), Zoom In
(figure)

Simulation of Spatial Iterated Prisoners’ Dilemma
Run sipd.nlogo
SIPD Without Noise
(figure)

Conclusions: Spatial IPD
• Small clusters of cooperators can exist in a hostile environment
• Parasitic agents can exist only in limited numbers
• Stability of cooperation depends on the expectation of future interaction
• Adaptive cooperation/defection beats unilateral cooperation or defection
Additional Bibliography
1. von Neumann, J., & Morgenstern, O. Theory of Games and Economic Behavior. Princeton, 1944.
2. Morgenstern, O. “Game Theory,” in Dictionary of the History of Ideas, Charles Scribner’s Sons, 1973, vol. 2, pp. 263–75.
3. Axelrod, R. The Evolution of Cooperation. Basic Books, 1984.
4. Axelrod, R., & Dion, D. “The Further Evolution of Cooperation,” Science 242 (1988): 1385–90.
5. Poundstone, W. Prisoner’s Dilemma. Doubleday, 1992.