2011 IEEE International Conference on Privacy, Security, Risk, and Trust, and IEEE International Conference on Social Computing

The Life Game: Cognitive Strategies for Repeated Stochastic Games

Kan-Leung Cheng, Dept. of Computer Science, University of Maryland, College Park, MD, USA
Inon Zuckerman, Dept. of Industrial Engineering and Management, Ariel University Center of Samaria, Ariel, 44837, Israel
Dana Nau, Dept. of Computer Science, University of Maryland, College Park, MD, USA
Jennifer Golbeck, College of Information Studies, University of Maryland, College Park, MD, USA

Abstract—Standard models in bio-evolutionary game theory involve repetitions of a single stage game (e.g., the Prisoner's Dilemma or the Stag Hunt); but it is clear that repeatedly playing the same stage game is not an accurate model of most individuals' lives. Rather, individuals' interactions with others correspond to many different kinds of stage games. In this work, we concentrate on discovering behavioral strategies that are successful for the life game, in which the stage game is chosen stochastically at each iteration. We present a cognitive agent model based on Social Value Orientation (SVO) theory. We provide extensive evaluations of our model's performance, both against standard agents from the game theory literature and against a large set of life-game agents written by students in two different countries. Our empirical results suggest that for life-game strategies to be successful in environments with such agents, it is important (i) to be unforgiving with respect to trust behavior and (ii) to use adaptive, fine-grained opponent models of the other agents.

Keywords—repeated games, non-zero-sum games, stochastic games, social value orientation

Fig. 1. Prisoner's Dilemma payoff matrix: both agents choose A1 (Cooperate) or A2 (Defect); the payoffs (to Player 1, Player 2) are (3, 3) for mutual cooperation, (0, 5) when Player 1 cooperates and Player 2 defects, (5, 0) in the reverse case, and (1, 1) for mutual defection.

I. INTRODUCTION

An interesting puzzle in the evolution of human societies is the dissonance between Darwin's principle of natural selection and the cooperative actions commonly observed in human (and animal) societies. A prominent way to study this problem has been to use repeated games, and to discover their equilibrium properties under different environmental conditions, starting conditions, and reproduction mechanisms.

In the standard repeated-game model, a set of agents repeatedly play a game called the stage game. Many different games can be used as the stage game. For example, Axelrod's famous Iterated Prisoner's Dilemma competitions showed the emergence of cooperation, even though the rational dominant equilibrium in a one-shot Prisoner's Dilemma is to defect [3]. Maynard Smith studied the two-player Chicken game with a population of Hawks and Doves [14], and Skyrms studied the evolved population when individuals were playing the Stag Hunt game [17].

Each of the above studies used a simple game model in which the same stage game was used at every iteration. However, as pointed out by Bacharach [5, p. 100], repeatedly playing the same game is unlikely to be an accurate model of any individual's life. As a more accurate model, Bacharach proposed the life game, in which an individual plays a mixture of games drawn sequentially according to some stochastic process from a large set of stage games. Bacharach referred to the size and variety of this set as the game's ludic diversity (thus an ordinary non-stochastic repeated game has minimal ludic diversity).

In this paper, we concentrate on discovering behavioral strategies that are successful in life games of high ludic diversity. These games pose difficulties when one tries to describe a successful strategy. For example, well-known strategies such as the famous tit-for-tat strategy cannot be used verbatim, because not all iterations will have actions that correspond to "cooperate" and "defect." The complexity of the game dictates a large, complex strategy space, but our objective is to discover important general properties that characterize successful strategies.

This paper makes the following contributions. We formally describe the life game (which Bacharach only described informally), and discuss its inherent challenges. We then propose a cognitive behavioral model for agents in the life game, based upon a prominent social-preference theory called Social Value Orientation (SVO) theory. We also refine and extensively evaluate our model using a large set of peer designed agents written by students in two different countries. Our empirical results suggest that an unforgiving strategy performs better than a tit-for-tat-like strategy. That is, in stage games where there are analogs of "cooperate" and "defect" (as in the Prisoner's Dilemma), if another agent chooses the "defect" action rather than the "cooperate" action, then we should expect them to behave similarly in future iterations, and choose our actions accordingly. The empirical work also demonstrates the importance of an adaptive, fine-grained set of opponent models in successful strategies.

II. BACKGROUND

There have been many studies on iterated games in the game theory literature. The most famous example is the Iterated Prisoner's Dilemma (IPD) (see Fig. 1), an iterated variant of the Prisoner's Dilemma that is played repeatedly for an unknown number of iterations. The Prisoner's Dilemma is a widely used model for social dilemmas between two agents and has often been used to study theories of human cooperation and trust. The intriguing characteristic of the IPD is that while game-theoretic analysis of a single iteration suggests that rational agents should "defect," cooperation often emerges when the number of iterations is unknown.

One of the first interesting questions with respect to the IPD was the discovery and description of successful IPD strategies. These strategies and their properties were meant to help enrich the theoretical biology/evolutionary discussion on various mechanisms that complement the Darwinian process (for instance: reciprocity, kin selection, group selection). An important milestone was the completion of two publicly open IPD tournaments that were run by Axelrod in the early 80s [3]. In his tournament, each strategy was paired with each other strategy for 200 iterations of a Prisoner's Dilemma, and scored on the total payoffs accumulated through the tournament. The winner of the first tournament was Anatol Rapoport's tit-for-tat strategy, which simply cooperates on the first iteration of the game, and then repeats the other agent's action from the previous iteration. Surprisingly, the same tit-for-tat strategy was also the winner in the second tournament.

Axelrod, in his post-tournament analysis, discovered that greedy strategies tended to do very poorly in the long run while cooperative strategies did better. Furthermore, by analyzing the top-scoring strategies in the tournament, Axelrod presented several properties that describe successful strategies: nice (cooperate, never be the first to defect), provocable to both retaliation and forgiveness (return defection for defection, cooperation for cooperation), non-envious (be fair with your partner), and clarity (don't try to be tricky). Since Axelrod's IPD tournaments, there has been extensive research on finding and describing successful strategies [6], [15].

The most relevant piece of literature to our study is a recent paper [7] in which the authors presented an equilibrium analysis for the emergence of cultures when playing multiple games. Nevertheless, they were not concerned with the success of individual strategies, and assumed a predefined set of 6 games with explicitly labeled actions to avoid the semantic problem.

As the main focus of this paper is the life game, in which a different stage game can be played at each iteration, there is one crucial assumption behind the above strategies: they assume that the semantics of the actions are common knowledge to all agents. For example, in order to reciprocate or retaliate, a tit-for-tat agent needs to know the semantics of the previous action of the other agent from the perspective of the other agent. This assumption is valid for the IPD, because the semantics (Cooperate and Defect) are clearly defined for all agents. However, in the life game, which is the main focus of this paper, this assumption is no longer valid. As we will see below (in Section IV), most known strategies simply cannot be generalized to the complex world of the life game, and consequently, new ones must be defined.

III. THE LIFE GAME MODEL

Our life game model is defined as a series of iterated and symmetric 2x2 normal-form games, where the two agents are denoted as P1 and P2. In every iteration, a symmetric random game is generated, and each agent's strategy space is composed of two actions, namely A1 and A2. The game is a complete-information game, as each agent knows the complete payoff matrix and can compute the payoffs for each combination of actions. As in any normal-form game, both agents must simultaneously choose one action in the random game. After deciding on the actions, each agent is notified of the attained payoffs and the action chosen by the other agent. The two agents play the games in succession, without knowing when the series of games will end. Each game is randomly and independently generated. That is, the randomized matrix in round n−1 does not have any impact on the matrix generated for round n. We do not place any restrictions on the agents' memory: they may record past matrices and the actions taken by both agents and utilize them in their strategy.

In this paper, we used random matrices in which the payoff values were chosen uniformly from [0, 9]. This kind of random game is general enough to represent most of the interesting games found in the game theory literature. Among the well-known examples are the Prisoner's Dilemma, the Chicken Game, and the Stag Hunt [16]. Fig. 2 shows a generalized form of the payoff matrix for such games, where various constraints on the payoff values can be used to define different classes of social dilemmas. Note that the semantics of the actions depend on the values of a, b, c and d. For example, if a = 3, b = 0, c = 5 and d = 1 (a Prisoner's Dilemma), then A1 and A2 can be considered as "Cooperate" and "Defect" respectively.

Fig. 2. Generalized form of 2x2 symmetric games: if both agents choose A1 the payoffs are (a, a); if Player 1 chooses A1 and Player 2 chooses A2 the payoffs are (b, c); if Player 1 chooses A2 and Player 2 chooses A1 the payoffs are (c, b); and if both choose A2 the payoffs are (d, d).

Similarly to the IPD, in the life game competition each agent competes against all the agents in the population, once against each agent in a round-robin fashion. To eliminate random favorable payoff variations, we randomized the series of games and used the same series between all agents in the population. The performance of an agent is the accumulated sum of its payoffs with each of the other agents.
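To make the setting concrete, the following Python sketch (our own illustration, not code from the paper; all function and variable names are assumptions) generates a random symmetric 2x2 game of the form shown in Fig. 2 and plays one life game between two strategy functions:

```python
import random

def random_symmetric_game(rng, lo=0, hi=9):
    """Draw a symmetric 2x2 game of the form in Fig. 2.

    Returns (a, b, c, d): if both agents play A1 the payoffs are (a, a);
    (A1, A2) gives (b, c); (A2, A1) gives (c, b); (A2, A2) gives (d, d).
    """
    return tuple(rng.uniform(lo, hi) for _ in range(4))

def payoffs(game, action1, action2):
    """Payoffs (to Player 1, Player 2) for actions in {0, 1}, where 0 = A1, 1 = A2."""
    a, b, c, d = game
    table = {(0, 0): (a, a), (0, 1): (b, c), (1, 0): (c, b), (1, 1): (d, d)}
    return table[(action1, action2)]

def play_life_game(strategy1, strategy2, iterations, seed=0):
    """Play one life game: a sequence of independently drawn random stage games.

    A strategy is a callable (game, history, player) -> action in {0, 1}.
    Both agents see the full history of past matrices and both chosen actions.
    """
    rng = random.Random(seed)
    total1 = total2 = 0.0
    history = []  # each entry: (game, action1, action2)
    for _ in range(iterations):
        game = random_symmetric_game(rng)
        a1 = strategy1(game, history, player=1)
        a2 = strategy2(game, history, player=2)
        p1, p2 = payoffs(game, a1, a2)
        total1, total2 = total1 + p1, total2 + p2
        history.append((game, a1, a2))
    return total1, total2
```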

IV. STRATEGIES FOR THE LIFE GAME

The IPD competition was an important cornerstone for studying the evolution of cooperation and led to some interesting game strategies. However, extending the model to the life game, which is a more realistic description of the interactions in a society, raises the following difficulties. First, from the semantic point of view, unlike the Prisoner's Dilemma in which actions are labeled "cooperate" or "defect," in the life game the actions are not labeled in advance. The agents need to define for themselves the semantics of each of the actions in each round of the game. Consequently, the intentions behind the actions might be misinterpreted due to semantic differences, which also complicates the playing strategies. For example, what might look like a "cooperate" action to one agent might be interpreted differently by another. Secondly, the semantic problems might also result in ambiguity with respect to the intentions behind the actions, as an agent cannot be sure whether an action is the result of an intentional strategic decision or of semantic differences. As such, successful strategies might require holding some form of opponent model that can be reasoned upon for issues such as mutual trust, cooperation and counter-strategies.

Fig. 3. The Stag Hunt game models two individuals who go out on a hunt. If an individual hunts a stag, he must have the cooperation of his partner in order to succeed. An individual can get a hare by himself, but a hare is worth less than a stag. Payoffs: both choose A1 (Stag) gives (2, 2); Stag against Hare gives (0, 1); Hare against Stag gives (1, 0); both choose A2 (Hare) gives (1, 1).

To illustrate the problem, consider two tit-for-tat-like agents (P1 and P2) playing a repeated game of Stag Hunt (Fig. 3). Suppose both of them want to cooperate (hunt stag together) in the Stag Hunt game, but P1 does not want to cooperate in the Prisoner's Dilemma (while P2 still wants to cooperate). If P1 and P2 play a repeated sequence of only Stag Hunt games, they will cooperate with each other forever. However, cooperation in the Stag Hunt may not emerge if we have a mix of Prisoner's Dilemma and Stag Hunt games. For example, if the first game is a Prisoner's Dilemma and the second game is a Stag Hunt, then when P1 defects against P2 in the first game, P2 retaliates by "defecting" in the Stag Hunt game (i.e., hunting hare). Therefore, P1 will also retaliate in the next game, which may lead to a chain of unnecessary retaliation.

The aforementioned difficulties, as well as others, bring about the need for strategies that are far more complex than the ones that came out of research on the traditional IPD. Intuitively, simple strategies such as tit-for-tat cannot be directly applied to the life game due to the labeling problem mentioned above. Our first step in developing a strategy was to look into the social and behavioral sciences literature and examine the behavioral theories that guide human behavior in similar situations.

V. SOCIAL VALUE ORIENTATION AGENT MODELS

In the social and behavioral sciences it is widely accepted that agents explicitly take into account the outcomes of the other agents when considering their course of action. Moreover, the choices people make depend, among other things, on stable personality differences in the manner in which they approach interdependent others. This observation can be traced back to the seminal work by Messick and McClintock [9], in which they presented a motivational theory of choice behavior that considers both agents' payoffs in game situations. This theory was later denoted as Social Value Orientation (SVO) theory, and has since developed into a class of theories (see [8] for an excellent review).

SVO regards social values as distinct sets of motivational or strategic preferences, with the weighting rule depending on the weights w1 and w2 of the agents' payoffs:

Utility of P1 = w1 × (P1's payoff) + w2 × (P2's payoff)

∙ A cooperative agent maximizes the joint outcome. (w1 = 1, w2 = 1)
∙ An individualistic agent maximizes its own outcome. (w1 = 1, w2 = 0)
∙ A competitive agent maximizes its own outcome relative to the other's. (w1 = 1, w2 = −1)

In order to promote cooperation, both agents need to be prosocial. As mentioned in [4], "An excellent way to promote cooperation in a society is to teach people to care about the welfare of others." While various techniques for measuring SVO values in human subjects have been suggested over the years, a recent meta-analysis suggests that most people are classified as cooperators (46%), followed by individualists (38%), followed by competitors (12%) [2].

However, due to possible differences in the semantics of the games, both agents should have some way to assess mutual trust in order to deal with cases in which different semantic interpretations were the cause of a cooperation breakdown (as opposed to an intentional breakdown). In other words, both agents need to believe that the other agent is prosocial. From the social and behavioral literature we learn that social value orientation significantly accounts for variation in trust and reciprocity. Specifically, prosocial individuals reciprocate more as trust increases, while proself individuals reciprocate less as trust increases [10]. People with a natural inclination to cooperate are at the same time vulnerable to being exploited.

The cognitive model developed in our suggested agent is based on the above insights from the social and behavioral sciences. The following sections describe the agent model for the three most common social orientations in the real world: cooperative, individualistic, and competitive.
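As a concrete illustration of the weighting rule (our own sketch, not code from the paper; the weights follow the SVO definitions above, and the action rules anticipate those derived in Sections V-A and V-C below; all names are illustrative), the SVO-weighted utility can be computed over joint outcomes of the game in Fig. 2:

```python
def svo_utility(game, my_action, other_action, w1, w2):
    """SVO-weighted utility of a joint action pair, from player 1's perspective."""
    a, b, c, d = game
    table = {(0, 0): (a, a), (0, 1): (b, c), (1, 0): (c, b), (1, 1): (d, d)}
    my_payoff, other_payoff = table[(my_action, other_action)]
    return w1 * my_payoff + w2 * other_payoff

def cooperative_action(game):
    """Fully cooperative (w1 = w2 = 1): compare the two symmetric outcomes,
    which amounts to choosing A1 if a > d and A2 if a < d."""
    return 0 if svo_utility(game, 0, 0, 1, 1) > svo_utility(game, 1, 1, 1, 1) else 1

def competitive_action(game):
    """Competitive (w1 = 1, w2 = -1): maximize the payoff difference,
    which amounts to choosing A1 if b > c and A2 if b < c."""
    return 0 if svo_utility(game, 0, 1, 1, -1) > svo_utility(game, 1, 0, 1, -1) else 1
```

The individualistic case (w1 = 1, w2 = 0) is not a single rule of this kind: as discussed in Section V-B, it additionally requires an assumption about the other agent's strategy.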

A. Cooperative Model

A cooperative agent is one whose goal is to maximize (to some degree) the joint outcome of both agents. In the context of 2x2 symmetric games, a fully cooperative agent will choose A1 if a > d and A2 if a < d. In the IPD, the ALL C strategy, which always cooperates with others, can be regarded as a fully cooperative strategy. To account for the varying degrees of prosocial tendencies and to cope with the aforementioned semantic problem, we need to be able to differentiate between different types of cooperative behavior. To do so we define the class of mutual-benefit games:

Definition 1 (Mutual-benefit game): a mutual-benefit game is a 2x2 symmetric game in which there exists a unique action Ai such that the joint outcome is maximized when both agents choose Ai. Action Ai is denoted as the cooperative action.

The varying degrees of prosocial tendencies suggest that different agents may want to restrict their cooperation to specific classes of mutual-benefit games. In general, agents with higher prosocial orientations will tend to cooperate on a larger subset of mutual-benefit games, as long as they believe that the other agent is also cooperative. We now present one possible classification of mutual-benefit games:
1) a ≠ d and max(a, d) > max(b, c)
2) a ≠ d and max(a, d) ≥ max(b, c)
3) a ≠ d and max(a, d) × 2 > b + c
4) a ≠ d and max(a, d) × 2 ≥ b + c

(Note that the presented classification is one possible example of coping with the semantic problem. Naturally, a finer classification might allow the agent to distinguish between finer behavioral differences.)

In this classification, type i is a subset of type i + 1. For example, the Stag Hunt game (Fig. 3) is a member of all of the above types, while the Prisoner's Dilemma (Fig. 1) is a member of types 3 and 4 only.

In many types of symmetric games cooperation is beneficial for both agents in the long run. However, there are two major problems. First, a cooperative agent may be subject to exploitation by the other agents. Second, the trustworthiness of the other agent is unknown at the beginning. These problems are addressed by the following trust mechanism.

We define the trustworthiness of an agent as follows: the trustworthiness of an agent is i if and only if the agent cooperates in all mutual-benefit games of type i. It is easy to see that it is riskier to cooperate in type i + 1 games than in type i games. Accordingly, the type number of a mutual-benefit game can be considered as the trustworthiness requirement of the game, i.e., the trust level needed in order to cooperate with the other agent. An agent needs higher trust levels to cooperate in type i + 1 games, while a trustworthiness of zero reflects an agent that does not cooperate at all.

Recall that according to Axelrod's analysis of the IPD competition, a "nice" strategy helps to promote cooperation. Accordingly, our trust model assumes that the other agent is trustworthy at the beginning, and will remain so as long as it cooperates in all mutual-benefit games. Specifically, with the mutual-benefit game classification presented above, we initialize the trustworthiness level of the other agent to 4.

To minimize exploitation, the trustworthiness of the other agent should be decreased whenever a defect-like action is observed. Suppose the current trustworthiness of the other agent is t. Whenever the other agent defects in a mutual-benefit game of type i, we update t by t = min(t, i − 1). For example, if the trustworthiness of an agent is updated to 3, then our agent will cooperate only in mutual-benefit games of types 1 to 3, but not type 4. This allows the agent to maximize the amount of cooperation while minimizing exploitation.

When an untrusted agent (with low trustworthiness) tries to establish cooperation in some mutual-benefit games, one may forgive it (increase its trustworthiness) or not forgive it (its trustworthiness remains unchanged). We parameterize these behaviors by a forgiving threshold f: the trustworthiness of an agent can be restored back to t when f cooperative actions in games of type t have been observed. In the IPD, an SVO agent with f = 1 will behave like tit-for-tat. If f = ∞, an untrusted agent can never be trusted again; in other words, the trustworthiness of the other agent is monotonically decreasing. This replicates the grim-trigger strategy in the IPD, which upon defection responds with defection for the remainder of the iterated game.
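The classification and trust rules above translate directly into code. The following sketch (ours, not from the paper; the game encoding follows Fig. 2 and all identifiers are illustrative assumptions) computes the trustworthiness requirement of a game and applies the trust-update and forgiveness rules:

```python
def cooperative_action_and_type(game):
    """Return (cooperative action, mutual-benefit type 1-4), or (None, 0)
    if the game is not classified as a mutual-benefit game (Definition 1
    and the classification in Section V-A)."""
    a, b, c, d = game
    if a == d:
        return None, 0               # no unique joint-outcome-maximizing action
    coop = 0 if a > d else 1         # A1 if a > d, else A2
    m = max(a, d)
    if m > max(b, c):
        return coop, 1
    if m >= max(b, c):
        return coop, 2
    if 2 * m > b + c:
        return coop, 3
    if 2 * m >= b + c:
        return coop, 4
    return None, 0                   # joint outcome not maximized by mutual cooperation

def update_trust(t, game_type, other_cooperated, coop_count, f):
    """Trust update after a mutual-benefit game of type `game_type`.

    t: current trustworthiness (initialized to 4).
    coop_count: cooperative actions observed so far at levels above t.
    f: forgiving threshold (float('inf') gives the unforgiving variant).
    Returns the new (t, coop_count).
    """
    if not other_cooperated:
        return min(t, game_type - 1), 0      # defection: t = min(t, i - 1)
    if game_type > t:                        # cooperation beyond the trusted level
        coop_count += 1
        if coop_count >= f:                  # forgive: restore trust to this level
            return game_type, 0
    return t, coop_count
```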

B. Individualistic Model

According to SVO theory, an individualistic agent will try to maximize its own outcome. However, the information that an agent Pi is a self-maximizing agent is insufficient to model and predict its behavior, as its behavior will depend on its belief about the strategy of the other agent Pj. For instance, its actions might be different if it assumes that Pj picks its actions randomly, or that Pj tries to intentionally decrease Pi's payoff.

To cope with this problem, we suggest using two-level agent modeling. In this model, when an individualistic agent Pi is playing with another agent Pj, Pi's behavior depends on the second-level model – the model of Pj from Pi's perspective. With that assumption, Pi can construct a strategy.

These behavior models are given to the algorithm beforehand and depend on the underlying game. We hypothesize that a larger and more diverse set of predefined models will allow the SVO agent to better adapt its behavior (this will be explicitly tested in Section VI). For the life game, we suggest the following types of second-level models, which represent the simplest forms of opponent reasoning in this domain: adversary, altruistic, random, and recursive. We also present the best-response strategy to each of them. We illustrate them with the following example: an individualistic agent P1 is playing the Chicken game (Fig. 4) with P2.

Fig. 4. The Chicken game models two drivers, both headed for a single-lane bridge from opposite directions. The first to swerve away yields the bridge to the other. If neither agent swerves, the result is a potentially fatal head-on collision. Payoffs: both choose A1 (Swerve) gives (4, 4); Swerve against Straight gives (3, 5); Straight against Swerve gives (5, 3); both choose A2 (Straight) gives (0, 0).

∙ Adversary model – P1 assumes that P2 wants to minimize P1's outcome. Then, it reasons that (1) P2 will choose A2 if P1 chooses A1; (2) P2 will still choose A2 if P1 chooses A2. The payoffs are (3, 5) and (0, 0) respectively, and P1 will choose A1. In other words, P1's best response is to play a maximin strategy.

∙ Altruistic model – P1 assumes that P2 wants to maximize P1's outcome. Then, it reasons that (1) P2 will choose A1 if P1 chooses A1; (2) P2 will still choose A1 if P1 chooses A2. The payoffs are (4, 4) and (5, 3) respectively, and P1 will choose A2. In this case P1's best-response strategy is the maximax strategy.

∙ Random model – P1 assumes P2 is purely random, with a 50% chance for each of A1 and A2. This can happen, for example, in cases where it does not have enough information. The expected payoff of choosing A1 is (a + b)/2 = 3.5, and of choosing A2 is (c + d)/2 = 2.5. P1 will choose A1 only if (a + b)/2 > (c + d)/2, and choose A2 otherwise. Therefore, P1 will choose A1 in the Chicken game. We say that P1 is playing a maxi-random strategy.

∙ Recursive model – Finally, P1 can assume that P2 is any kind of agent described above. P1 will first predict P2's action using that assumption, and then choose an action to maximize its own payoff. In other words, in terms of traditional game theory, given a game, P1's strategy is the best response to the assumed strategy of P2. For example, P1 can assume that P2 is an individualistic agent with a random-opponent assumption (i.e., P2 uses the maxi-random strategy). From the previous paragraph, we know that P2 will then choose A1 in the Chicken game. Therefore, P1 will also choose A1 in order to maximize its own payoff. We say that P1 is playing a maxi-maxi-random strategy.

C. Competitive Model

According to SVO theory, a competitive agent will try to maximize (to some degree) its own outcome relative to the other agent's. In the context of 2x2 symmetric games, this amounts to maximizing the payoff difference between the two agents, and the agent will choose A1 if b > c and A2 if b < c.

When we sum up the total payoffs of each agent in a tournament among a group of agents, a competitive strategy is not necessarily the best one. For example, in the IPD competition, a competitive agent acts like an ALL D agent, which always defects. If there are only two agents, ALL D always performs at least as well as the other agent. However, ALL D performs poorly in a group of tit-for-tat agents, because the tit-for-tat agents will cooperate with each other and obtain a large amount of payoff from the cooperation [4].

D. The Combined SVO Agent Modeling Strategy

Based on the SVO agent models presented above, we propose an SVO agent modeling strategy for playing with other agents in the life game. The complete procedure for our SVO agent is shown in Fig. 5. Since we assume that agents do not have any prior knowledge about each other, the SVO agent does not know the social orientation of the other agent. The agent starts with some default models, and estimates the orientation of the other agent from the history of interactions.

Procedure SvoAgentPlayingLifeGame
Input:
  i = current iteration number
  g_i = current game matrix
  g_{i-1} = previous game matrix
  B_{i-1} = previous opponent's action
Output:
  A_i = my action for the current game g_i
State:
  M = current set of candidate opponent models
  t = current trustworthiness of the opponent
Begin procedure
(1) Update the opponent's trustworthiness and models:
  if g_{i-1} ≠ ∅ then
    if g_{i-1} is a mutual-benefit game then
      C_{i-1} ← cooperative action of g_{i-1}
      t_{i-1} ← trustworthiness requirement of g_{i-1}
      if t ≥ t_{i-1} then
        if B_{i-1} ≠ C_{i-1} then
          t ← t_{i-1} − 1
    if t was not updated above then
      increase the counters of all models in M that correctly predicted B_{i-1} for g_{i-1}
(2) Choose an action for the current game g_i:
  if g_i is a mutual-benefit game then
    C_i ← cooperative action of g_i
    t_i ← trustworthiness requirement of g_i
    if t ≥ t_i then
      return A_i = C_i
  if i ≤ 5 then
    return A_i = maxi-random action in g_i
  m ← the most accurate model in M
  if accuracy of m < 70% then
    return A_i = maxi-random action in g_i
  B'_i ← the opponent's predicted action in g_i according to m
  return A_i = the best response to B'_i in g_i
End procedure

Fig. 5. Procedure for an unforgiving SVO agent playing a life game.

As we mentioned before, the agent starts by assuming that the other agent is cooperative for all types of mutual-benefit games. For non-mutual-benefit games, the cooperative agent model is not applicable. For those games, the SVO agent initially assumes the other agent is random (i.e., has no social orientation at all) and uses the maxi-random strategy for the first few games. After accumulating some interaction history, the agent learns the true trustworthiness (i.e., t in Fig. 5) and social orientation of the other agent, and adapts to and utilizes it to the best of its capacity.

Similarly to humans, as long as there is some degree of cooperation, our agent will cooperate with others as much as they cooperate with it. However, when the trust model suggests that the other agent is not cooperative in some mutual-benefit games, or the game itself is a non-mutual-benefit game, the agent should switch to a different state of mind to achieve its goal while avoiding exploitation. To better estimate whether the other agent is an individualistic agent (under the different predefined models) or a competitive one, we incorporated opponent modeling techniques. Specifically, the SVO agent uses a model-and-counter strategy, which first approximates what strategy the other agent uses and then counters that strategy. First, it creates and maintains a pool of possible individualistic or competitive models (i.e., M in Fig. 5).
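To illustrate the model-and-counter idea (our own sketch; the paper's Fig. 5 remains the authoritative procedure, and the identifiers below are assumptions), each second-level model predicts the opponent's action in the current game, predictions are scored against what the opponent actually played, and the agent best-responds to the prediction of the most accurate model:

```python
def p1_payoff(game, a1, a2):
    """Player 1's payoff in the symmetric game of Fig. 2 (0 = A1, 1 = A2)."""
    a, b, c, d = game
    return {(0, 0): a, (0, 1): b, (1, 0): c, (1, 1): d}[(a1, a2)]

def best_response(game, predicted_other):
    """Action maximizing our own payoff given the opponent's predicted action."""
    return max((0, 1), key=lambda act: p1_payoff(game, act, predicted_other))

# Second-level opponent models: each maps a game to a predicted opponent action.
# Because the games are symmetric, the opponent's view of (a, b, c, d) is the same.
def predict_maximin(game):           # opponent guards against an adversarial partner
    a, b, c, d = game
    return 0 if min(a, b) >= min(c, d) else 1

def predict_maximax(game):           # opponent assumes an altruistic partner
    a, b, c, d = game
    return 0 if max(a, b) >= max(c, d) else 1

def predict_maxi_random(game):       # opponent assumes a uniformly random partner
    a, b, c, d = game
    return 0 if (a + b) / 2 > (c + d) / 2 else 1

def predict_maxi_maxi_random(game):  # opponent best-responds to a maxi-random partner
    return best_response(game, predict_maxi_random(game))

def predict_competitive(game):       # opponent maximizes the payoff difference
    a, b, c, d = game
    return 0 if b > c else 1

MODELS = {
    "competitive": predict_competitive,
    "individualistic/maximin": predict_maximin,
    "individualistic/maximax": predict_maximax,
    "individualistic/maxi-random": predict_maxi_random,
    "individualistic/maxi-maxi-random": predict_maxi_maxi_random,
}

def update_counters(counters, game, observed_opponent_action):
    """Model-and-counter bookkeeping: credit every model whose prediction matched."""
    for name, predict in MODELS.items():
        if predict(game) == observed_opponent_action:
            counters[name] = counters.get(name, 0) + 1
```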

In this paper, we consider the following five non-cooperative opponent models described before:
1) Competitive
2) Individualistic with the maximin assumption
3) Individualistic with the maximax assumption
4) Individualistic with the maxi-random assumption
5) Individualistic with the maxi-maxi-random assumption

Each model has a counter variable for counting the number of correct predictions. If the previous action of the other agent matches the prediction of one of the models, our agent increases the counter of that model by one. The model with the highest counter is considered the most accurate model (i.e., m in Fig. 5). However, if the top counter is small relative to the total number of counted games (e.g., less than 70% of the total), our agent assumes the opponent is a random agent instead of using the model with the highest count, and uses the maxi-random strategy. After identifying the most accurate model of the opponent, our agent tries to counter that strategy by maximizing its own payoff using that opponent model: it first predicts the opponent's action using the model, and then chooses the action that maximizes its own payoff assuming the other agent will choose the predicted action (i.e., B'_i in Fig. 5).

VI. EXPERIMENTS AND RESULTS

In this section our goal is to evaluate the performance of our SVO agent and to investigate the properties of successful strategies in the life game. We implemented an automated SVO-based agent and, in order to evaluate its performance, we implemented the following agents, which represent well-known strategies in the game theory literature:
1) Nash agent – plays the pure-strategy Nash equilibrium if it is unique; otherwise plays a mixed Nash equilibrium strategy.
2) Maximin agent – maximizes its minimum possible payoff.
3) Minimax agent – minimizes the other agent's maximum payoff.
4) Minimax-regret agent – minimizes its worst-case regret (the difference between the actual payoff and the payoff had a different action been chosen).
5) Random agent – chooses either action with probability 1/2.

To the best of our knowledge, the above standard strategies represent the best available strategies from the literature on repeated games that are applicable to the life game. As discussed earlier, other strategies such as the successful tit-for-tat cannot be generalized and used in the life game.

Due to the novelty of the life game, and in order to provide a richer set of strategies with which to evaluate the SVO agent, we collected a large set of Peer Designed Agents (PDAs). PDAs have recently been used with great success in AI to evolve and evaluate state-of-the-art cognitive agents for various tasks such as negotiation and collaboration [12], [13], [1]. Lin et al. provided empirical evidence that PDAs can alleviate the evaluation process of automatic negotiators and facilitate their design [12].

To obtain a large collection of PDAs, we asked students in several advanced-level AI and game theory classes to contribute agents. To attain a richer set of agents, we used two different universities in two different countries: the University of Maryland in the USA, and Bar-Ilan University in Israel. The students were told that their agent would compete against all the agents of the other students in the class (once against each agent, in a round-robin fashion). The instructions stated that at each iteration, the agents would be given a symmetric game with a random payoff matrix of the form shown in Fig. 2. Following Axelrod's methodology, we did not tell the students the exact number of iterations in each life game. The total payoff of an agent is the accumulated sum of its payoffs with each of the other agents. For motivational purposes, the project grade was positively correlated with the agent's overall ranking based on its total payoff in the competition. Overall, we collected 48 agents (24 from the USA and 24 from Israel).

A. Evaluating the SVO agent

The first experiment was meant to assess the competence of the suggested SVO-based agent. The version used in this experiment was the one with f = ∞ (the unforgiving trust method), in which, following a (perceived) defection and the consequent loss of trust, trust cannot be recovered.

We ran tournaments with the unforgiving SVO agent and all the other agents in the test set. Since the test set is composed of 53 agents (48 PDAs + 5 standard strategies), the total number of participants in each run of the competition is 54. The tournament is similar to Axelrod's IPD tournaments [3] and the 2005 IPD tournament [11]. Each participant played against every participant including itself (thus a tournament among n agents consists of n² iterated games). The number of iterations in one life game was set to 200. In each experiment, we calculated the average payoff per game for each agent. Since the values in the payoff matrix are chosen uniformly from [0, 9], the expected average payoff of a random agent playing against another random agent is 4.5. In order to have a fair comparison, we used the same sequence of random games for each of the pairs in the experiment. We repeated the experiment 100 times using different random seeds, so each average payoff is an average of 100 × k × n payoffs, where k is the number of iterations in each life game and n is the number of participating agents. Hence, each average payoff is computed from averaging the payoffs of 540,000 games.
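The tournament structure just described can be summarized in a short sketch (ours, not from the paper; the agent interface and all names are assumptions, and the 100 repetitions with different seeds are omitted for brevity):

```python
import itertools
import random

def run_tournament(agent_factories, iterations=200, seed=0):
    """Round-robin life-game tournament.

    agent_factories: dict name -> zero-argument callable returning a fresh agent;
    an agent is a callable (game, history, player) -> action in {0, 1}.
    Every pair (including self-play) uses the same pre-drawn series of games.
    Returns the average payoff per stage game for each agent.
    """
    rng = random.Random(seed)
    series = [tuple(rng.uniform(0, 9) for _ in range(4)) for _ in range(iterations)]
    totals = {name: 0.0 for name in agent_factories}
    counts = {name: 0 for name in agent_factories}
    for n1, n2 in itertools.product(agent_factories, repeat=2):
        ag1, ag2 = agent_factories[n1](), agent_factories[n2]()
        history = []
        for game in series:
            a, b, c, d = game
            act1 = ag1(game, history, player=1)
            act2 = ag2(game, history, player=2)
            p1, p2 = {(0, 0): (a, a), (0, 1): (b, c),
                      (1, 0): (c, b), (1, 1): (d, d)}[(act1, act2)]
            totals[n1] += p1       # the reverse ordered pair credits the other agent
            counts[n1] += 1
            history.append((game, act1, act2))
    return {name: totals[name] / counts[name] for name in totals}
```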

Table I shows the average payoffs and rankings of the SVO agent, the standard agents, and the top three PDAs. The SVO agent has the highest average payoff, and so it is ranked first. Because the standard agents are not adaptive and cannot learn from the history of interactions, their performance is poor in general, with the exception of the minimax-regret agent. The minimax-regret agent performed unexpectedly well in the tournament. One possible reason is that it does not make any assumption about its opponent, and focuses on minimizing its own possible regret.

The performance of the top PDAs is very close to that of our SVO agent. In our post-experiment analysis, we found that most of them do some sort of opponent modeling by counting (i.e., similar to our counting method), but none of them model the other agent using trust or SVO. Moreover, in contrast to the SVO algorithm, which is relatively short and simple, their algorithms are much longer and more complicated.

TABLE I. AVERAGE PAYOFFS AND RANKINGS OF THE SVO AGENT, STANDARD AGENTS AND TOP THREE PDAS

Agent                   Rank (Avg Payoff)
SVO agent               1 (5.836)
The best PDA            2 (5.831)
The 2nd best PDA        3 (5.792)
The 3rd best PDA        4 (5.789)
Minimax-regret agent    6 (5.695)
Maximin agent           35 (5.453)
Nash agent              43 (5.271)
Random agent            52 (4.351)
Minimax agent           54 (3.954)

TABLE II. EVALUATING TRUST ADAPTATION – RESULTS

Agent                           Rank (Avg Payoff) in Each Tournament
Unforgiving SVO agent (f = ∞)   1 (5.836)
Forgiving SVO agent (f = 1)     6 (5.689)
Forgiving SVO agent (f = 2)     5 (5.722)
Forgiving SVO agent (f = 3)     5 (5.744)
Forgiving SVO agent (f = 4)     5 (5.757)

TABLE III. EVALUATING THE INDIVIDUALISTIC OPPONENT MODELS – RESULTS

Agent                       Rank (Avg Payoff) in Each Tournament
SVO agent                   1 (5.836)
Maxi-maxi-rand-only agent   2 (5.800)
Maxi-rand-only agent        5 (5.721)
Maximin-only agent          5 (5.700)
Maximax-only agent          9 (5.681)

B. Evaluating Trust Adaptation: to forgive or not to forgive?

As mentioned in Section V-A, our agent's trust adaptation approach can be set using the f parameter. We would like to study whether a forgiving approach is better suited to repeated stochastic 2x2 symmetric games. As such, we varied the f parameter from the unforgiving approach (f = ∞) to SVO agents with different forgiveness thresholds, and ran four additional tournaments, one for each forgiving SVO agent (f = 1, 2, 3, 4). The methodology for evaluating an agent P was to run a tournament with P and all the other agents in the test set. In other words, for each SVO agent P, we reran the previous tournament with the original SVO agent replaced by P.

As we can see in Table II, the average payoffs of all of the forgiving agents are lower than that of the unforgiving agent. This result is interesting as it may contradict, to some extent, the "forgiving" property of successful strategies in the IPD as described by Axelrod. On the other hand, there is a possible confounding factor in our experiments. In particular, we have some preliminary results suggesting that the PDAs (against which we tested our agents) behaved in ways that were correlated with some of the personality characteristics of the students who wrote those agents. As those students were primarily young males, it is possible that the PDAs constituted a biased sample. We observed that if a PDA defects on another agent at the beginning of the life game, it is very likely that it will defect again later. Therefore, the risk and cost of a forgiving approach are high, which probably explains the decrease in performance.

C. Evaluating the Individualistic Opponent Models

One of our hypotheses during the model's construction was that a larger set of opponent models would provide a more refined playground in which to differentiate and classify different models, which in turn would allow the agent to provide better responses to their strategies. To investigate the significance of each component of the individualistic model, we implemented four simplified versions of the SVO agent, each containing a single, predefined opponent model:
1) Maximin-only agent – uses only the maximin model for individualistic agent modeling.
2) Maximax-only agent – uses only the maximax model for individualistic agent modeling.
3) Maxi-rand-only agent – uses only the maxi-random model for individualistic agent modeling.
4) Maxi-maxi-rand-only agent – uses only the maxi-maxi-random model for individualistic agent modeling.

We tested the above four agents by running four additional tournaments, one for each of them. Table III shows the average payoffs and rankings of the four agents in their respective tournaments, as well as of the complete SVO agent. We can see that the average payoffs of all four simplified agents are lower than that of the complete SVO agent. These results support our hypothesis that a single individualistic opponent model is not refined enough for successful opponent modeling.

D. Evaluating Robustness to the Number of Iterations

To investigate the performance of the SVO agent for different numbers of iterations, we recorded the average payoffs the agent accumulated at each iteration of the tournament. Fig. 6 shows the trends of the average payoffs of the SVO agent, the best PDA, and the best standard agent (i.e., the minimax-regret agent). With an increasing number of iterations, both the SVO agent and the best PDA obtain higher payoffs and level off after the 200th iteration, while the payoff of the minimax-regret agent remains roughly the same most of the time. The improvement is probably due to the fact that both the SVO agent and the best PDA are doing opponent modeling. With an increasing number of interactions, the modeling becomes more accurate, and so they can better coordinate with their opponents to get higher payoffs. In contrast, the minimax-regret agent does not change its strategy, so its performance remains unchanged most of the time. The payoff of the SVO agent is low at the beginning of the life games, because it begins by applying the "nice" strategy towards all other agents. If the other agent is non-cooperative, the SVO agent may be exploited during the first few mutual-benefit games, and lose some payoff at the beginning. However, its performance catches up quickly and outperforms the others after the 100th iteration, because it stops cooperating with the defectors and keeps cooperating with the cooperators. The best PDAs do not have trust modeling and cannot fully cooperate with others, so they cannot get the full benefit of mutual cooperation. Therefore, the SVO agent can obtain a higher payoff in the long run, while the other agents cannot.

Fig. 6. Average payoffs at each iteration.

E. Analyzing the Trustworthiness of the PDAs

With this analysis we seek to explore the trustworthiness of the PDAs that were written by students, and to investigate the significance of each mutual-benefit game type. We did as follows: each of the PDAs played against our SVO agent, and at each iteration of the game the distribution of the PDAs over the five trustworthiness types was recorded. Fig. 7 shows the proportion of each of the five cooperation types at each iteration. At the beginning, as we mentioned in Section V-D, the SVO agent assumed that all the other agents are trustworthy, so 100% of them are of type 4 (most trustworthy). However, the population of type 4 cooperation drops quickly as most of the agents start defecting in mutual-benefit games. Therefore, the population of the other, less trustworthy types increases quickly. The fastest population growth is of type 0 agents, which are not cooperative at all. After the 100th iteration, the distribution starts to stabilize. At the end, the whole population consists of 74.8% type 0 agents, 13.8% type 1 agents, 7.3% type 2 agents, 0% type 3 agents, and 4.1% type 4 agents. This also shows that our classification of cooperative agents is effective: without that classification, we would expect our agent to fail to cooperate with those 25.2% of agents that are cooperative to some degree.

Fig. 7. Distribution of trustworthiness of the PDAs at each iteration.

VII. CONCLUSIONS AND FUTURE WORK

In this paper we have described the life game, a stochastic repeated game for studying social interactions. The life game poses several new challenges; for example, strategies that work well in conventional iterated games cannot be used directly. In order to develop a successful cognitive strategy for the life game, we utilized SVO theory, a motivational theory of human choice behavior. Our method of agent modeling can be used to learn strategies and respond to others' strategies over time, in order to play the game well. Our experiments demonstrated that our SVO-based agent outperformed both standard repeated-game strategies and a large set of peer designed agents. Furthermore, our experimental work illustrates the importance of adaptive and fine-grained opponent modeling, as well as the impact that different trust adaptation strategies have on the performance of the SVO agent.

In the future, we intend to investigate other versions of the life game (e.g., a noisy version). We also plan to incorporate the notion of nondeterministic trust models, as well as to extend our algorithm to continuous SVO estimation and modeling. Another interesting direction would be to account for power relations as a way to describe different types of intentional biases (e.g., cultural, personal or relational).

REFERENCES

[1] T.-C. Au, S. Kraus, and D. Nau. Synthesis of strategies from interaction traces. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS '08), Volume 2, pages 855–862, 2008.
[2] W. Au and J. Kwong. Measurements and effects of social-value orientation in social dilemmas. In Contemporary Psychological Research on Social Dilemmas, pages 71–98, 2004.
[3] R. Axelrod. The Evolution of Cooperation. Basic Books, 1984.
[4] R. Axelrod and W. Hamilton. The evolution of cooperation. Science, 211(4489):1390, 1981.
[5] M. Bacharach, N. Gold, and R. Sugden. Beyond Individual Choice: Teams and Frames in Game Theory. Princeton University Press, 2006.
[6] B. Beaufils, J.-P. Delahaye, P. Mathieu, C. G. Langton, and T. Shimohara. Our meeting with Gradual: A good strategy for the iterated prisoner's dilemma. Pages 202–209. MIT Press, 1996.
[7] J. Bednar and S. Page. Can game(s) theory explain culture? Rationality and Society, 19(1):65, 2007.
[8] S. Bogaert, C. Boone, and C. Declerck. Social value orientation and cooperation in social dilemmas: A review and conceptual model. British Journal of Social Psychology, 47(3):453–480, Sept. 2008.
[9] D. M. Messick and C. G. McClintock. Motivational bases of choice in experimental games. Journal of Experimental Social Psychology, 4(1):1–25, 1968.
[10] K. Kanagaretnam, S. Mestelman, K. Nainar, and M. Shehata. The impact of social value orientation and risk attitudes on trust and reciprocity. Journal of Economic Psychology, 30(3):368–380, 2009.
[11] G. Kendall, X. Yao, and S. Chong. The Iterated Prisoners' Dilemma: 20 Years On. World Scientific, 2007.
[12] R. Lin, S. Kraus, Y. Oshrat, and Y. K. Gal. Facilitating the evaluation of automated negotiators using peer designed agents. In AAAI, 2010.
[13] E. Manisterski, R. Lin, and S. Kraus. Understanding how people design trading agents over time. In AAMAS (3), pages 1593–1596, 2008.
[14] J. Maynard Smith. Evolution and the Theory of Games. 1982.
[15] M. Nowak and K. Sigmund. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 364(6432):56–58, 1993.
[16] A. Rapoport. Two-Person Game Theory: The Essential Ideas. The University of Michigan Press, Ann Arbor, 1966.
[17] B. Skyrms. The Stag Hunt and the Evolution of Social Structure. Cambridge University Press, 2004.
