Journal of Economic Literature Vol. XXXVI (September 1998), pp. 1347–1374

Do People Play Nash Equilibrium? Lessons From Evolutionary Game Theory

GEORGE J. MAILATH1

1. Introduction

AT THE SAME TIME that noncooperative game theory has become a standard tool in economics, it has also come under increasingly critical scrutiny from theorists and experimentalists. Noncooperative game theory, like neoclassical economics, is built on two heroic assumptions: Maximization—every economic agent is a rational decision maker with a clear understanding of the world; and consistency—the agent's understanding, in particular, expectations, of other agents' behavior, is correct (i.e., the overall pattern of individual optimizing behavior forms a Nash equilibrium). These assumptions are no less controversial in the context of noncooperative game theory than they are in neoclassical economics.

A major challenge facing noncooperative game theorists today is that of providing a compelling justification for these two assumptions. As I will argue here, many of the traditional justifications are not compelling. But without such justification, the use of game theory in applications is problematic. The appropriate use of game theory requires understanding when its assumptions make sense and when they do not.

In some ways, the challenge of providing a compelling justification is not a new one. A major complaint other social scientists (and some economists) have about economic methodology is the central role of the maximization hypothesis. A common informal argument is that any agent not optimizing—in particular, any firm not maximizing profits—will be driven out by market forces. This is an evolutionary argument, and as is well known, Charles Darwin was led to the idea of natural selection from reading Thomas Malthus.2 But does such a justification work? Is Nash equilibrium (or some related concept) a good predictor of behavior?

1 University of Pennsylvania. Acknowledgments: I thank Steven Matthews, Loretta Mester, John Pencavel, three referees, and especially Larry Samuelson for their comments. Email: [email protected].

2 "In October 1838, that is, fifteen months after I had begun my systematic enquiry, I happened to read for amusement 'Malthus on Population,' and being well prepared to appreciate the struggle for existence which everywhere goes on from long-continued observation of the habits of animals and plants, it at once struck me that under these circumstances favorable variations would tend to be preserved, and unfavorable ones to be destroyed. The results of this would be the formation of new species. Here, then I had at last got a theory by which to work;" Charles Darwin (1887, p. 83).

While the parallel between noncooperative game theory and neoclassical economics is close, it is not perfect. Certainly, the question of whether agents maximize is essentially the same in both. Moreover, the consistency assumption also appears in neoclassical economics as the assumption that prices clear markets. However, a fundamental distinction between neoclassical economics and noncooperative game theory is that, while the many equilibria of a competitive economy almost always share many of the same properties (such as efficiency or its lack),3 the many equilibria of games often have dramatically different properties. While neoclassical economics does not address the question of equilibrium selection, game theory must.

3 Or perhaps economists have chosen to study only those properties that are shared by all equilibria. For example, different competitive equilibria have different income distributions.

Much of the work in evolutionary game theory is motivated by two basic questions:

1. Do agents play Nash equilibrium?
2. Given that agents play Nash equilibrium, which equilibrium do they play?

Evolutionary game theory formalizes and generalizes the evolutionary argument given above by assuming that more successful behavior tends to be more prevalent. The canonical model has a population of players interacting over time, with their behavior adjusting over time in response to the payoffs (utilities, profits) that various choices have historically received. These players could be workers, consumers, firms, etc. The focus of study is the dynamic behavior of the system. The crucial assumptions are that there is a population of players, these players are interacting, and the behavior is naive (in two senses: players do not believe—understand—that their own behavior potentially affects future play of their opponents, and players typically do not take into account the possibility that their opponents are similarly engaged in adjusting their own behavior). It is important to note that successful behavior becomes more prevalent not just because market forces select against unsuccessful behavior, but also because agents imitate successful behavior.

Since evolutionary game theory studies populations playing games, it is also useful for studying social norms and conventions. Indeed, many of the motivating ideas are the same.4 The evolution of conventions and social norms is an instance of players learning to play an equilibrium. A convention can be thought of as a symmetric equilibrium of a coordination game. Examples include a population of consumers who must decide which type of good to purchase (in a world of competing standards); a population of workers who must decide how much effort to exert; a population of traders at a fair (market) who must decide how aggressively to bargain; and a population of drivers randomly meeting at intersections who must decide who gives way to whom.

4 See, for example, Jon Elster (1989), Brian Skyrms (1996), Robert Sugden (1989), and H. Peyton Young (1996).

Evolutionary game theory has provided a qualified affirmative answer to the first question: In a range of settings, agents do (eventually) play Nash. There is thus support for equilibrium analysis in environments where evolutionary arguments make sense. Equilibrium is best viewed as the steady state of a community whose members are myopically groping toward maximizing behavior.
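The canonical model just described, in which more successful behavior becomes more prevalent, can be made concrete in a few lines of code. The sketch below is illustrative only: the 2x2 payoff matrix and the function names are my own choices, not taken from the article. It iterates a discrete-time replicator-style dynamic in which a strategy's population share grows in proportion to its payoff relative to the population average.

```python
# Discrete-time selection dynamic: a strategy whose payoff exceeds the
# population average increases its share (a replicator-style dynamic).
# The 2x2 payoff matrix below is an illustrative coordination game.

A = [[2.0, 0.0],   # payoff to strategy 0 against strategies 0, 1
     [1.0, 1.0]]   # payoff to strategy 1 against strategies 0, 1

def step(x):
    """One period of the dynamic; x is the share playing strategy 0."""
    shares = [x, 1.0 - x]
    payoffs = [sum(A[i][j] * shares[j] for j in range(2)) for i in range(2)]
    avg = sum(shares[i] * payoffs[i] for i in range(2))
    return x * payoffs[0] / avg

x = 0.6                    # 60 percent initially play strategy 0
for _ in range(100):
    x = step(x)
print(round(x, 3))         # converges to 1.0: strategy 0 takes over
```

Starting from a state where strategy 0 already earns more than the population average, its share grows period by period until the population settles on one of the game's symmetric equilibria; no agent in this model calculates anything.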
This is in marked contrast to the earlier view (which, as I said, lacks satisfactory foundation), according to which game theory and equilibrium analysis are the study of the interaction of (ultra-)rational agents with a large amount of (common) knowledge.5

5 Even in environments where an evolutionary analysis would not be appropriate, equilibrium analysis is valuable in illuminating the strategic structure of the game.

The question of which equilibrium is played has received a lot of attention, most explicitly in the refinements literature. The two most influential ideas in that literature are backward and forward induction. Backward induction and its extensions—subgame perfection and sequentiality—capture notions of credibility and sequential rationality. Forward induction captures the idea that a player's choice of current action can be informative about his future play. The concerns about adequate foundations extend to these refinement ideas. While evolutionary game theory does discriminate between equilibria, backward induction receives little support from evolutionary game theory. Forward induction receives more support. One new important principle for selecting an equilibrium, based on stochastic stability, does emerge from evolutionary game theory, and this principle discriminates between strict equilibria (something backward and forward induction cannot do).

The next section outlines the major justifications for Nash equilibrium, and the difficulties with them. In that section, I identify learning as the best available justification for Nash equilibrium. Section 3 introduces evolutionary game theory from a learning perspective. The idea that Nash equilibrium can be usefully thought of as an evolutionary stable state is described in Section 4. The question of which Nash equilibrium is played is then discussed in Section 5. As much as possible, I have used simple examples. Very few theorems are stated (and then only informally). Recent surveys of evolutionary game theory include Eric van Damme (1987, ch. 9), Michihiro Kandori (1997), Mailath (1992), and Jörgen Weibull (1995).

2. The Question

Economics and game theory typically assume that agents are "rational" in the sense of pursuing their own welfare, as they see it. This hypothesis is tautological without further structure, and it is usually further assumed that agents understand the world as well as (if not better than) the researcher studying the world inhabited by these agents. This often requires an implausible degree of computational and conceptual ability on the part of the agents. For example, while chess is strategically trivial,6 it is computationally impossible to solve (at least in the foreseeable future).

6 Since chess is a finite game of perfect information, it has been known since 1912 that either White can force a win, Black can force a win, or either player can force a draw (Ernst Zermelo 1912).

Computational limitations, however, are in many ways less important than conceptual limitations of agents. The typical agent is not like Gary Kasparov, the world champion chess player who knows the rules of chess, but also knows that he doesn't know the winning strategy. In most situations, people do not know they are playing a game. Rather, people have some (perhaps imprecise) notion of the environment they are in, their possible opponents, the actions they and their opponents have available, and the possible payoff implications of different actions. These people use heuristics and rules of thumb (generated from experience) to guide behavior; sometimes these heuristics work well and sometimes they don't.7 These heuristics can generate behavior that is inconsistent with straightforward maximization. In some settings, the behavior can appear as if it was generated by concerns of equity, fairness, or revenge.

7 Both Andrew Postlewaite and Larry Samuelson have made the observation that in life there are no one-shot games and no "last and final offers." Thus, if an experimental subject is placed in an artificial environment with these properties, the subject's heuristic will not work well until it has adjusted to this environment. I return to this in my discussion of the ultimatum game.

I turn now to the question of consistency. It is useful to first consider situations that have aspects of coordination, i.e., where an agent (firm, consumer, worker, etc.) maximizes his welfare by choosing the same action as the majority. For example, in choosing between computers, consumers have a choice between PCs based on Microsoft's operating systems and Apple-compatible computers. There is significantly more software available for Microsoft-compatible computers, due to the market share of the Microsoft computers, and this increases the value of the Microsoft-compatible computers. Firms must often choose between different possible standards.

The first example concerns a team of workers in a modern version of Jean-Jacques Rousseau's (1950) Stag Hunt.8 In the example, each worker can put in low or high effort, the team's total output (and so each worker's compensation) is determined by the minimum effort of all the workers, and effort is privately costly. Suppose that if all workers put in low effort, the team produces a per capita output of 3, while if all workers put in high effort, per capita output is 7. Suppose, moreover, the disutility of high effort is 2 (valued in the same units as output). We can thus represent the possibilities, as in Figure 1.9 It is worth emphasizing at this point that the characteristics that make the stag hunt game interesting are pervasive. In most organizations, the value of a worker's effort is increasing in the effort levels of the other workers.

8 Rousseau's (1950) stag hunt describes several hunters in the wilderness. Individually, each hunter can catch rabbits and survive. Acting together, the hunters can catch a stag and have a feast. However, in order to catch a stag, every hunter must cooperate in the stag hunt. If even one hunter does not cooperate (by catching rabbits), the stag escapes.

9 The stag hunt game is an example of a coordination game. A pure coordination game differs from the game in Figure 1 by having zeroes in the off-diagonal elements. The game in Figure 13 is a pure coordination game.

                      minimum of other workers' efforts
                           high      low
    worker's    high        5         0
    effort      low         3         3

    Figure 1. A "stag-hunt" played by workers in a team.

What should we predict to be the outcome? Consider a typical worker, whom I call Bruce for definiteness. If all the other workers are only putting in low effort, then the best choice for Bruce is also low effort: high effort is costly, and choosing high effort cannot increase output (since output is determined by the minimum effort of all the workers). Thus, if Bruce expects the other workers to put in low effort, then Bruce will also put in low effort. Since all workers find themselves in the same situation, we see that all workers choosing to put in low effort is a Nash equilibrium: each worker is behaving in his own best interest, given the behavior of the others.

Now suppose the workers (other than Bruce) are putting in high effort. In this case, the best choice for Bruce is now high effort. While high effort is costly, Bruce's choice of high rather than low effort now does affect output (since Bruce's choice is the minimum) and so the increase in output (+4) is more than enough to justify the increase in effort. Thus, if Bruce expects all the other workers to be putting in high effort, then he will also put in high effort. As for the low-effort case, a description of behavior in which all workers choose high effort constitutes a Nash equilibrium.

These two descriptions of worker behavior (all choose low effort and all choose high effort) are internally consistent; they are also strict: Bruce strictly prefers to choose the same effort level as the other workers.10 This implies that even if Bruce is somewhat unsure about the minimum effort choice (in particular, as long as Bruce assigns a probability of no more than 0.4 to some worker choosing the other effort choice), this does not affect his behavior.

10 A strict Nash equilibrium is a Nash equilibrium in which, given the play of the opponents, each player has a unique best reply.

But are these two descriptions good predictions of behavior? Should we, as outside observers, be confident in a prediction that all the workers in Bruce's team will play one of the two Nash equilibria? And if so, why and which one? Note that this is not the same as asking if Bruce will choose an effort that is consistent with equilibrium. After all, both choices (low and high effort) are consistent with equilibrium, and so Bruce necessarily chooses an equilibrium effort. Rather, the question concerns the behavior of the group as a whole. How do we rule out Bruce choosing high effort because he believes everyone else will, while Sheila (another worker on Bruce's team) chooses low effort because she believes everyone else will?11 This is, of course, ruled out by equilibrium considerations. But what does that mean? The critical feature of the scenario just described is that the expectations of Bruce or Sheila about the behavior of the other members of the team are incorrect, something that Nash equilibrium by definition does not allow.

11 While the term "coordination failure" would seem to be an apt one to describe this scenario, that term is commonly understood (particularly in macroeconomics) to refer to coordination on an inefficient equilibrium (such as low effort chosen by all workers).

As I said earlier, providing a compelling argument for Nash equilibrium is a major challenge facing noncooperative game theory today.12 The consistency in Nash equilibrium seems to require that players know what the other players are doing. But where does this knowledge come from? When or why is this a plausible assumption?

12 The best introspective (i.e., knowledge or epistemic) foundations for Nash equilibrium assume that each player's conjectures about the behavior of other players is known by all the players; see Robert Aumann and Adam Brandenburger (1995). This assumption does not appear to be a significant improvement over the original assumption of Nash behavior. More generally, there is no compelling introspective argument for any useful equilibrium notion. The least controversial are those that do not require the imposition of a consistency condition, such as rationalizability (introduced by Douglas Bernheim 1984 and David Pearce 1984; for two players it is equivalent to the iterated deletion of strictly dominated strategies), the iterated deletion of weakly dominated strategies, and backward induction. However, in most games, rationalizability does little to constrain players' behavior, and as we will see below, the iterated deletion of weakly dominated strategies and backward induction are both controversial.
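The equilibrium claims made for the stag hunt of Figure 1 can be checked mechanically. The sketch below takes the payoffs 5, 0, 3, 3 from the figure; the helper names are my own. It verifies that each symmetric effort profile is a best reply to itself, and that high effort remains optimal so long as the probability assigned to the other effort choice is at most 0.4.

```python
# Worker's payoff in the stag hunt of Figure 1: first key is the worker's
# own effort, second key is the minimum of the other workers' efforts.
U = {('high', 'high'): 5, ('high', 'low'): 0,
     ('low', 'high'): 3, ('low', 'low'): 3}

def best_reply(min_others):
    """Effort maximizing the worker's payoff against the others' minimum."""
    return max(['high', 'low'], key=lambda e: U[(e, min_others)])

# Both symmetric profiles are equilibria: each effort is its own best reply.
assert best_reply('high') == 'high'
assert best_reply('low') == 'low'

def expected(effort, p):
    """Expected payoff when the others' minimum is 'high' with probability p."""
    return p * U[(effort, 'high')] + (1 - p) * U[(effort, 'low')]

# High effort stays optimal whenever 5p >= 3, i.e. p >= 0.6, so Bruce can
# assign probability up to 0.4 to the other effort choice without switching.
assert expected('high', 0.6) >= expected('low', 0.6)
assert expected('high', 0.59) < expected('low', 0.59)
```

The 0.4 threshold in the text falls out of the indifference condition 5p = 3: below a 0.6 probability on the others' high effort, low effort's guaranteed 3 becomes the better choice.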
Several justifications are typically given for Nash equilibria: preplay communication, self-fulfilling prophecies (consistent predictions), focal points, and learning.13

13 What if there is only one equilibrium? Does this by itself give us a reason to believe that the unique equilibrium will be played? The answer is no. It is possible that the unique Nash equilibrium yields each player their maximin values, while at the same time being riskier (in the sense that the Nash equilibrium strategy does not guarantee the maximin value). This is discussed by, for example, John Harsanyi (1977, p. 125) and Aumann (1985). David Kreps (1990a, p. 135) describes a complicated game with a unique equilibrium that is also unlikely to be played.

The idea underlying preplay communication is straightforward. Suppose the workers in Bruce's team meet before they must choose their effort levels and discuss how much effort they will each exert. If the workers reach an agreement that they all believe will be followed, it must be a Nash equilibrium (otherwise at least one worker has an incentive to deviate). This justification certainly has appeal and some range of applicability. However, it does not cover all possible applications. It also assumes that an agreement is reached, and, once reached, is kept. While it seems clear that an agreement will be reached and kept (and which one) in our stag hunt example (at least if the team is small!), this is not true in general.

The first difficulty, discussed by Aumann (1990), is that an agreement may not be kept. Suppose we change the payoff in the bottom left cell of Figure 1 from 3 to 4 (so that a worker choosing low effort, when the minimum of the other workers' effort is high, receives a payoff of 4 rather than 3). In this version of the stag hunt, Bruce benefits from high-effort choices by the other workers, irrespective of his own choice. Bruce now has an incentive to agree to high effort, no matter what he actually intends to do (since this increases the likelihood of high effort by the other workers). But then reaching the agreement provides no information about the intended play of workers and so may not be kept.

The second difficulty is that no agreement may be reached. Suppose, for example, the interaction has the characteristics of a battle-of-the-sexes game (Figure 2). Such a game, which may describe a bargaining interaction, has several Pareto noncomparable Nash equilibria: there are several profitable agreements that can be reached, but the bargainers have opposed preferences over which agreement is reached. In this case, it is not clear that an agreement will be reached. Moreover, if the game does have multiple Pareto noncomparable Nash equilibria, then the preplay communication stage is itself a bargaining game and so perhaps should be explicitly modelled (at which point, the equilibrium problem resurfaces).14 Finally, there may be no possibility of preplay communication.

                              employer
                       high wage    low wage
    employee  high wage    3,2         0,0
              low wage     0,0         2,3

    Figure 2. A "Battle-of-the-sexes" between an employer and a potential employee bargaining over wages. Each simultaneously makes a wage demand or offer. The worker is only hired if they agree.

14 The preplay communication stage might also involve a correlating device, like a coin. For example, the players might agree to flip a coin: if heads, then the worker receives a high wage, while if tails, the worker receives a low wage.

The second justification of self-fulfilling prophecy runs as follows: If a theory uniquely predicting players' behaviors is known by the players in the game, then it must predict Nash equilibria (see Drew Fudenberg and Jean Tirole 1991, pp. 105-108, for an extended discussion of this argument). The difficulty, of course, is that the justification requires a theory that uniquely predicts player behavior, and that is precisely what is at issue.

The focal point justification, due to Thomas Schelling (1960), can be phrased as "if there is an obvious way to play in a game (derived from either the structure of the game itself or from the setting), then players will know what other players are doing." There are many different aspects of a game that can single out an "obvious way to play." For example, considerations of fairness may make equal divisions of a surplus particularly salient in a bargaining game. Previous experience suggests that stopping at red lights and going through green is a good strategy, while another possible strategy (go at red and stop at green) is not (even though it is part of another equilibrium). It is sometimes argued that efficiency is such an aspect: if an equilibrium gives a higher payoff to every player than any other equilibrium, then players "should not have any trouble coordinating their expectations at the commonly preferred equilibrium point" (John Harsanyi and Reinhard Selten 1988, p. 81). In our earlier stag hunt example, this principle (called payoff dominance by Harsanyi and Selten 1988) suggests that the high-effort equilibrium is the obvious way to play. On the other hand, the low-effort equilibrium is less risky, with Bruce receiving a payoff of 3, no matter what the other members of his team do. In contrast, it is possible that a choice of high effort yields a payoff of only 0. As we will see, evolutionary game theory has been particularly important in addressing this issue of riskiness and payoff dominance. See Kreps (1990a) for an excellent extended discussion and further examples of focal points.15

15 Kreps (1990b, ch. 12) is a formal version of Kreps (1990a).

Finally, agents may be able to learn to play an equilibrium. In order to learn to play an equilibrium, players must be playing the same game repeatedly, or at least, similar games that can provide valuable experience. Once all players have learned how their opponents are playing, and if all players are maximizing, then we must be at a Nash equilibrium. There are two elements to this learning story. The first is that, given maximizing behavior of players, players can learn the behavior of their opponents.16 The second is that players are maximizing. This involves, as I discussed earlier, additional considerations of learning. Even if a player knows how their opponents have played (for example, the player may be the "last mover"), they may not know what the best action is. A player will use their past experience, as well as the experience of other players (if that is available and relevant), to make forecasts as to the current or future behavior of opponents, as well as the payoff implications of different actions. Since learning itself changes the environment that the agents are attempting to learn (as other agents change their behavior in response to their own learning), the process of learning is quite subtle. Note that theories of learning have a focal point aspect, since histories (observed patterns of play of other agents) can serve as coordinating devices that make certain patterns of play "the obvious way to play," as in the traffic example from the previous paragraph.

16 Non-evolutionary game theory work on learning has focused on the question of when maximizing players can in fact learn the behavior of their opponents. Examples of this work include Drew Fudenberg and David Kreps (1989), Drew Fudenberg and David Levine (1993), Faruk Gul (1996), and Ehud Kalai and Ehud Lehrer (1993).

The discussion so far points out some of the problems with many of the standard justifications for Nash equilibrium and its two assumptions (maximization and consistency). Besides learning, another approach is to concede the lack of a sound foundation for consistency, but maintain the hypothesis that agents maximize. The question then is whether rationality and knowledge of the game (including the rationality of opponents) is enough, at least in some interesting cases, to yield usable predictions. The two (related) principles that people have typically used in applications are backward induction and the iterated deletion of weakly dominated strategies.17 In many games, these procedures identify a unique outcome. The difficulty is that they require an implausible degree of rationality and knowledge of other players' rationality.

17 Backward induction is also the basic principle underlying the elimination of equilibria relying on "incredible" threats.

      I          II         I
      o----C1----o----C2----o----C3---->  999
      |          |          |            10000
      E1         E2         E3
      |          |          |
      10         9          1000
      0          100        99

    Figure 3. A short centipede game. (At each terminal node, player I's payoff is listed above player II's.)
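Backward induction is easy to mechanize on a game of perfect information such as the short centipede of Figure 3. The sketch below folds the game back from the last decision node; the payoff numbers are one reading of the figure consistent with the text (each mover prefers ending now to being ended at the next move, and prefers play to continue two further moves), and the function and variable names are my own.

```python
# Backward induction in a short three-move centipede game.
# Node n is a decision of player I (n = 1, 3) or player II (n = 2);
# E_n ends the game, C_n continues.  Payoffs are (player I, player II).
end_payoffs = {1: (10, 0), 2: (9, 100), 3: (1000, 99)}
pass_payoff = (999, 10000)          # reached if player I chooses C_3

def solve(n=1):
    """Return (action at node n, resulting payoffs) under backward induction."""
    mover = 0 if n % 2 == 1 else 1  # player I moves at odd-numbered nodes
    continuation = pass_payoff if n == 3 else solve(n + 1)[1]
    if end_payoffs[n][mover] >= continuation[mover]:
        return ('E%d' % n, end_payoffs[n])
    return ('C%d' % n, continuation)

print(solve())   # ('E1', (10, 0)): player I ends the game at the first move
```

Folding back node by node reproduces the argument in the text: player I would take E3 over C3, so player II takes E2, so player I ends the game immediately with E1, even though both players would gain from two further moves of play.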
The two key exam- longer versions, many researchers are ples here are Rosenthal’s centipede no longer comfortable with the same game (so called because of the appear- prediction.19 It only requires one ance of its extensive form—Figure 3 is player to think that there is some a short centipede) and the finitely re- chance that the other player is willing peated Prisoner’s dilemma. The centi- to play C initially to support playing C pede is conceptually simpler, since it is in early moves. Similarly, if we consider a game of perfect information. The cru- the repeated prisoner’s dilemma, the cial feature of the game is that each logic of backward induction (together player, when it is their turn to move, with the property that the unique one- strictly prefers to end the game imme- period dominant-strategy equilibrium diately (i.e., choose En), rather than yields each player their maximin payoff) have the opponent end the game on the implies that even in early periods coop- next move by choosing En+1. More- eration is not possible. over, at each move, each player strictly prefers to have play proceed for a fur- 3. Evolutionary Game Theory ther two moves, rather than end imme- and Learning diately. For the game in Figure 3, if The previous section argued that, of play reaches the last possible move, the various justifications that have been player I surely ends the game by choos- advanced for equilibrium analysis, ing E rather than C . Knowing this, 3 3 learning is the least problematic. Evolu- player II should choose E . The induc- 2 tionary game theory is a particularly at- tion argument then leads to the conclu- tractive approach to learning. 
In the sion that player I necessarily ends the typical evolutionary game-theoretic game on the first move.18 I suspect model, there is a population of agents, everyone is comfortable with the pre- each of whose payoff is a function of diction that in a two-move game, player not only how they behave, but also how I stops the game immediately. How- the agents they interact with behave. At any point in time, behavior within the 17 Backward induction is also the basic principle underlying the elimination of equilibria relying on population is distributed over the dif- “incredible” threats. ferent possible strategies, or behaviors. 18 This argument is not special to the extensive If the population is finite, a state (of the form. If the centipede is represented as a normal form game, this backward induction is mimicked by the iterated deletion of weakly dominated 19 Indeed, more general knowledge-based con- strategies. While this game has many Nash equi- siderations have led some researchers to focus on libria, they all involve the same behavior on the the procedure of one round of weak domination equilibrium path: player I chooses E1. The equi- followed by iterated rounds of strict domination. libria only differ in the behavior of players off-the- A nice (although technical) discussion is Eddie equilibrium-path. Dekel and Faruk Gul (1997). Mailath: Do People Play Nash Equilibrium? 1355 population) is a description of which tion playing each strategy as a function agents are choosing which behavior. If of the current state. Evolutionary (also the population is infinite, a state is a known as selection or learning) dynam- description of the fractions of the popu- ics specify that behavior that is success- lation that are playing each strategy. If ful this period will be played by a larger a player can maximize, and knows the fraction in the immediate future. Static state, then he can choose a best reply. 
models study concepts that are in- If he does not know the state of the tended to capture stability ideas moti- population, then he must draw infer- vated by dynamic stories without explic- ences about the state from whatever in- itly analyzing dynamics. formation he has. In addition, even An important point is that evolution- given knowledge of the state, the player ary dynamics do not build in any as- may not be able to calculate a best re- sumptions on behavior or knowledge, ply. Calculating a best reply requires other than the basic principle of differ- that a player know all the strategies ential selection—apparently successful available and the payoff implications of behavior increases its representation in all these strategies. The observed his- the population, while unsuccessful be- tory of play is now valuable for two rea- havior does not. sons. First, the history conveys informa- Evolutionary models are not struc- tion about how the opponents are tural models of learning or bounded ra- expected to play. Second, the observed tionality. While the motivation for the success or failure of various choices basic principle of differential selection helps players determine what might be involves an appeal to learning and good strategies in the future. Imitation bounded rationality, individuals are not is often an important part of learning; explicitly modeled. The feature that successful behavior tends to be imi- successful behavior last period is an at- tated. In addition, successful behavior tractive choice this period does seem to will be taught. To the extent that play- require that agents are naive learners. 
ers are imitating successful behavior They do not believe, or understand, and not explicitly calculating best re- that their own behavior potentially af- plies, it is not necessary for players to fects future play of their opponents, distinguish between knowledge of the and they do not take into account the game being played and knowledge of possibility that their opponents are how opponents are playing. Players similarly engaged in adjusting their own need know only what was successful, behavior. Agents do not look for pat- not why it was successful. terns in historical data. They behave as Evolutionary game-theoretic game if the world is stationary, even though theoretic models are either static or dy- their own behavior should suggest to namic, and the dynamics are either in them it is not.20 Moreover, agents be- discrete or continuous time. A discrete have as if they believe that other agents’ time dynamic is a function that specifies experience is relevant for them. Imita- the state of the population in period t + tion then seems reasonable. Note that 1 as a function of the state in period t, the context here is important. This style i.e., given a distribution of behavior in the population, the dynamic specifies 20 It is difficult to build models of boundedly next period’s distribution. A continuous rational agents who look for patterns. Masaki Aoyagi (1996) and Doron Sonsino (1997) are rare time dynamic specifies the relative rates examples of models of boundedly rational agents of change of the fractions of the popula- who can detect cycles. 1356 Journal of Economic Literature, Vol. XXXVI (September 1998) of modeling does not lend itself to small models is to improve our intuition and numbers of agents. If there is only a to deepen our understanding of how small population, is it plausible to be- particular economic or strategic forces lieve that the agents are not aware of interact. For this literature to progress, this? 
And if agents are aware, then imitation is not a good strategy. As we will see, evolutionary dynamics have the property that, in large populations, if they converge, then they converge to a Nash equilibrium.21 This property is a necessary condition for any reasonable model of social learning. Suppose we had a model in which behavior converged to something that is not a Nash equilibrium. Since the environment is eventually stationary and there is a behavior (strategy) available to some agent that yields a higher payoff, then that agent should eventually figure this out and so deviate.22

One concern sometimes raised about evolutionary game theory is that its agents are implausibly naive. This concern is misplaced. If an agent is boundedly rational, then he does not understand the model as written. Typically, this model is very simple (so that complex dynamic issues can be studied) and so the bounds on the rationality of the agents are often quite extreme. For example, agents are usually not able to detect any cycles generated by the dynamics. Why then are the agents not able to figure out what the modeler can? As in most of economic theory, the role of models is to improve our intuition and to deepen our understanding of how particular economic or strategic forces interact. For this literature to progress, we must analyze (certainly now, and perhaps forever) simple and tractable games. The games are intended as examples, experiments, and allegories. Modelers do not make assumptions of bounded rationality because they believe players are stupid, but rather that players are not as sophisticated as our models generally assume. In an ideal world, modelers would study very complicated games and understand how agents who are boundedly rational in some way behave and interact. But the world is not ideal and these models are intractable. In order to better understand these issues, we need to study models that can be solved. Put differently, the bounds on rationality are to be understood relative to the complexity of the environment.

21 More accurately, they converge to a Nash equilibrium of the game determined by the strategies that are played along the dynamic path. It is possible that the limit point fails to be a Nash equilibrium because a strategy that is not played along the path has a higher payoff than any strategy played along the path. Even if there is "drift" (see the ultimatum game below), the limit point will be close to a Nash equilibrium.

22 Of course, this assumes that this superior strategy is something the agent could have thought of. If the strategy is never played, then the agent might never think of it.
4. Nash Equilibrium as an Evolutionary Stable State

Consider a population of traders that engage in randomly determined pairwise meetings. As is usual, I will treat the large population as being infinite. Suppose that when two traders meet, each can choose one of two strategies, "bold" and "cautious." If a trader has chosen to be "bold," then he will bargain aggressively, even to the point of losing a profitable trade; on the other hand, if a trader has chosen to be "cautious," then he will never lose a profitable trade. If a bold trader bargains with a cautious trader, a bargain will be struck that leaves the majority of the gains from trade with the bold trader. If two cautious traders bargain, they equally divide the gains from trade. If two bold traders bargain, no agreement is reached. One meeting between two traders is depicted as the symmetric game in Figure 4.23

              Bold   Cautious
  Bold          0        3
  Cautious      1        2

Figure 4. Payoff to a trader following the row strategy against a trader following the column strategy. The gains from trade are 4.

23 This is the Hawk-Dove game traditionally used to introduce the concept of an evolutionary stable strategy.

Behaviors with higher payoffs are more likely to be followed in the future. Suppose that the population originally consists only of cautious traders. If no trader changes his behavior, the population will remain completely cautious. Now suppose there is a perturbation that results in the introduction of some bold traders into the population. This perturbation may be the result of entry: perhaps traveling traders from another community have arrived; or experimentation: perhaps some of the traders are not sure that they are behaving optimally and try something different.24 In a population of cautious traders, bold traders also consummate their deals and receive a higher payoff than the cautious traders. So over time, the fraction of cautious traders in the population will decrease and the fraction of bold traders will increase. However, once there are enough bold traders in the population, bold traders no longer have an advantage (on average) over cautious traders (since two bold traders cannot reach an agreement), and so the fraction of bold traders will always be strictly less than one. Moreover, if the population consists entirely of bold traders, a cautious trader can successfully invade the population. The only stable population is divided between bold and cautious traders, with the precise fraction determined by payoffs. In our example, the stable population is equally divided between bold and cautious traders. This is the distribution with the property that bold and cautious traders have equal payoffs (and so also describes a mixed-strategy Nash equilibrium).

24 In the biological context, this perturbation is referred to as a mutation or an invasion.
At that distribution, if the population is perturbed, so that, for example, slightly more than half the population is now bold while slightly less than half is cautious, cautious traders have a higher payoff, and so learning will lead to an increase in the number of cautious traders at the expense of bold traders, until balance is once again restored.

It is worth emphasizing that the final state is independent of the original distribution of behavior in the population, and that this state corresponds to the symmetric Nash equilibrium. Moreover, this Nash equilibrium is dynamically stable: any perturbation from this state is always eliminated.

This parable illustrates the basics of an evolutionary game theory model, in particular, the interest in the dynamic behavior of the population. The next section describes the well-known notion of an evolutionary stable strategy, a static notion that attempts to capture dynamic stability.
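The payoff comparisons driving the parable are easy to verify directly. A small check in Python, using the Figure 4 payoffs and written to mirror the text rather than any particular dynamic:

```python
# Expected payoffs in the Figure 4 trader game when a fraction x of the
# population is bold: pi(B,B)=0, pi(B,C)=3, pi(C,B)=1, pi(C,C)=2.
def payoff_bold(x):
    return 0.0 * x + 3.0 * (1.0 - x)

def payoff_cautious(x):
    return 1.0 * x + 2.0 * (1.0 - x)

# Bold invades an all-cautious population ...
assert payoff_bold(0.0) > payoff_cautious(0.0)
# ... cautious invades an all-bold population ...
assert payoff_cautious(1.0) > payoff_bold(1.0)
# ... and payoffs are equal at the stable half/half division.
assert payoff_bold(0.5) == payoff_cautious(0.5)
```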
Section 4.2 then describes explicit dynamics, while Section 4.3 discusses asymmetric games.

4.1. Evolutionary Stable Strategies

In the biological setting, the idea that a stable pattern of behavior in a population should be able to eliminate any invasion by a "mutant" motivated John Maynard Smith and G. R. Price (1973) to define an evolutionary stable strategy (ESS).25 If a population pattern of behavior is to eliminate invading mutations, it must have a higher fitness than the mutant in the population that results from the invasion. In biology, animals are programmed (perhaps genetically) to play particular strategies and the payoff is interpreted as "fitness," with fitter strategies having higher reproductive rates (reproduction is asexual).

25 Good references on biological dynamics and evolution are Maynard Smith (1982) and Josef Hofbauer and Karl Sigmund (1988).

         a    b
  a      2    1
  b      2    0

Figure 5. The numbers in the matrix are the row player's payoff. The strategy a is an ESS in this game.

It will be helpful to use some notation at this point. The collection of available behaviors (strategies) is S and the payoff to the agent choosing i when his opponent chooses j is π(i,j). We will follow most of the literature in assuming that there is only a finite number of available strategies. Any behavior in S is called pure. A mixed strategy is a probability distribution over pure strategies. While any pure strategy can be viewed as the mixed strategy that places probability one on that pure strategy, it will be useful to follow the convention that the term "mixed strategy" always refers to a mixed strategy that places strictly positive probability on at least two strategies. Mixed strategies have two leading interpretations as a description of behavior in a population: either the population is monomorphic, in which every member of the population plays the same mixed strategy, or the population is polymorphic, in which each member plays a pure strategy and the fraction of the population playing any particular pure strategy equals the probability assigned to that pure strategy by the mixed strategy.26 As will be clear, the notion of an evolutionary stable strategy is best understood by assuming that each agent can choose a mixed strategy and the population is originally monomorphic.

26 The crucial distinction is whether agents can play and inherit (learn) mixed strategies. If not, then any mixed strategy state is necessarily the result of a polymorphic population. On the other hand, even if agents can play mixed strategies, the population may be polymorphic with different agents playing different strategies.

Definition 1. A (potentially mixed) strategy p is an Evolutionary Stable Strategy (ESS) if:

1. the payoff from playing p against p is at least as large as the payoff from playing any other strategy against p; and

2. for any other strategy q that has the same payoff as p against p, the payoff from playing p against q is larger than the payoff from playing q against q.27

27 The payoff to p against q is π(p,q) = Σ_ij π(i,j) p_i q_j. Formally, the strategy p is an ESS if, for all q, π(p,p) ≥ π(q,p), and if there exists q ≠ p such that π(p,p) = π(q,p), then π(p,q) > π(q,q).

Thus, p is an evolutionary stable strategy if it is a symmetric Nash equilibrium, and if, in addition, when q is also a best reply to p, then p does better against q than q does. For example, a is an ESS in the game in Figure 5.

ESS is a static notion that attempts to capture dynamic stability. There are two cases to consider: The first is that p is a strict Nash equilibrium (see footnote 10). Then, p is the only best reply to itself (so p must be pure), and any agent playing q against a population whose members mostly play p receives a lower payoff (on average) than a p player. As a result, the fraction of the population playing q shrinks. The other possibility is that p is not the only best reply to itself (if p is a mixed strategy, then any other mixture with the same or smaller support is also a best reply to p). Suppose q is a best reply to p. Then both an agent playing p and an agent playing q earn the same payoff against a population of p players. After the perturbation of the population by the entry of q players, however, the population is not simply a population of p players. There is also a small fraction of q players, and their presence will determine whether the q players are eliminated. The second condition in the definition of ESS guarantees that in the perturbed population, it is the p players who do better than the q players when they play against a q player.
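Definition 1 can be checked mechanically for the pure strategies of a finite symmetric game. The sketch below tests a pure strategy against the other pure strategies only; the definition itself quantifies over all (mixed) mutants, so this is an illustration for the 2×2 game of Figure 5, not a general ESS test:

```python
# PI[i][j]: row player's payoff in the Figure 5 game (strategies a=0, b=1).
PI = [[2.0, 1.0],
      [2.0, 0.0]]

def is_ess_against_pure(p):
    """Check Definition 1 for pure strategy p against pure mutants q."""
    for q in range(len(PI)):
        if q == p:
            continue
        if PI[q][p] > PI[p][p]:
            return False   # condition 1 fails: q earns more against p
        if PI[q][p] == PI[p][p] and PI[p][q] <= PI[q][q]:
            return False   # condition 2 fails: q survives in the mixture
    return True

assert is_ess_against_pure(0)        # a is an ESS: b ties against a,
                                     # but a beats b against b (1 > 0)
assert not is_ess_against_pure(1)    # b is not even a best reply to itself
```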
4.2. The Replicator and Other More General Dynamics

While plausible, the story underlying ESS suffers from its reliance on the assumption that agents can learn (in the biological context, inherit) mixed strategies. ESS is only useful to the extent that it appropriately captures some notion of dynamic stability. Suppose individuals now choose only pure strategies. Define p_i^t as the proportion of the population choosing strategy i at time t. The state of the population at time t is then p^t ≡ (p_1^t, ..., p_n^t), where n is the number of strategies (of course, p^t is in the n − 1 dimensional simplex). The simplest evolutionary dynamic one could use to investigate the dynamic properties of ESS is the replicator dynamic. In its simplest form, this dynamic specifies that the proportional rate of growth in a strategy's representation in the population, p_i^t, is given by the extent to which that strategy does better than the population average.28 The payoff to strategy i when the state of the population is p^t is π(i, p^t) = Σ_j π(i,j) p_j^t, while the population average payoff is π(p^t, p^t) = Σ_ij π(i,j) p_i^t p_j^t. The continuous time replicator dynamic is then:

  dp_i^t/dt = p_i^t × (π(i, p^t) − π(p^t, p^t)).    (1)

Thus, if strategy i does better than average, its representation in the population grows (dp_i^t/dt > 0), and if another strategy i′ is even better, then its growth rate is also higher than that of strategy i. Equation (1) is a differential equation that, together with an initial condition, uniquely determines a path for the population that describes, for any time t, the state of the population.

28 The replicator dynamic can also be derived from more basic biological arguments.

A state is a rest point of the dynamics if the dynamics leave the state unchanged (i.e., dp_i^t/dt = 0 for all i). A rest point is Liapunov stable if the dynamics do not take states close to the rest point far away. A rest point is asymptotically stable if, in addition, any path (implied by the dynamics) that starts sufficiently close to the rest point converges to that rest point.

There are several features of the replicator dynamic to note. First, if a pure strategy is extinct (i.e., no fraction of the population plays that pure strategy) at any point of time, then it is never played. In particular, any state in which the same pure strategy is played by every agent (and so every other strategy is extinct) is a rest point of the replicator dynamic. So, being a rest point is not a sufficient condition for Nash equilibrium. This is a natural feature that we already saw in our discussion of the game in Figure 4—if everyone is bargaining cautiously and if traders are not aware of the possibilities of bold play, then there is no reason for traders to change their behavior (even though a rational agent who understood the payoff implications of the different strategies available would choose bold behavior rather than cautious).

Second, the replicator dynamic is not a best reply dynamic: strategies that are not best replies to the current population will still increase their representation in the population if they do better than average (this feature only becomes apparent when there are at least three available strategies). This again is consistent with the view that this is a model of boundedly rational learning, where agents do not understand the full payoff implications of the different strategies.

Finally, the dynamics can have multiple asymptotically stable rest points. The asymptotic distribution of behavior in the population can depend upon the starting point.
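These features can be seen by integrating equation (1) numerically for the trader game of Figure 4. The Euler step size and horizon below are arbitrary choices for the sketch:

```python
# Euler integration of the replicator dynamic (1) for the Figure 4 game.
PI = [[0.0, 3.0],   # 0 = Bold, 1 = Cautious; PI[i][j] = payoff to i against j
      [1.0, 2.0]]

def replicator_step(p, dt=0.01):
    f = [sum(PI[i][j] * p[j] for j in range(2)) for i in range(2)]
    avg = sum(p[i] * f[i] for i in range(2))
    return [p[i] + dt * p[i] * (f[i] - avg) for i in range(2)]

# All-cautious is a rest point (an extinct strategy never reappears),
# even though it is not a Nash equilibrium.
state = [0.0, 1.0]
for _ in range(1000):
    state = replicator_step(state)
assert state == [0.0, 1.0]

# From an interior state, the population converges to the half/half division.
state = [0.9, 0.1]
for _ in range(20000):
    state = replicator_step(state)
assert abs(state[0] - 0.5) < 1e-3
```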
Returning to the stag hunt game of Figure 1, if a high fraction of workers has chosen high effort historically, then those workers who had previously chosen low effort would be expected to switch to high effort, and so the fraction playing high effort would increase. On the other hand, if workers have observed low effort, perhaps low effort will continue to be observed. Under the replicator dynamic (or any deterministic learning dynamic), if p_high^0 > 3/5, then p_high^t → 1, while if p_high^0 < 3/5, then p_high^t → 0. The equilibrium that players eventually learn is determined by the original distribution of players across high and low effort. If the original distribution is random (e.g., p_high^0 is determined as a realization of a uniform random variable), then the low effort equilibrium is 3/2 times as likely to arise as the high effort equilibrium. This notion of path dependence—that history matters—is important and attractive.

E. Zeeman (1980) and Peter Taylor and Leo Jonker (1978) have shown that if p is an ESS, it is asymptotically stable under the continuous time replicator dynamic, and that there are examples of asymptotically stable rest points of the replicator dynamic that are not ESS. If dynamics are specified that allow for mixed strategy inheritance, then p is an ESS if and only if it is asymptotically stable (see W. G. S. Hines 1980, Arthur Robson 1992, and Zeeman 1981). A point I will come back to is that both asymptotic stability and ESS are concerned with the stability of the system after a once and for all perturbation. They do not address the consequences of continual perturbations. As we will see, depending upon how they are modeled, continual perturbations can profoundly change the nature of learning.

While the results on the replicator dynamic are suggestive, the dynamics are somewhat restrictive, and there has been some interest in extending the analysis to more general dynamics.29 Interest has focused on two classes of dynamics. The first, monotone dynamics, roughly requires that on average, players switch from worse to better (not necessarily the best) pure strategies. The second, more restrictive, class, aggregate monotone dynamics, requires that, in addition, the switching of strategies has the property that the induced distribution over strategies in the population has a higher average payoff. It is worth noting that the extension to aggregate monotone dynamics is not that substantial: aggregate monotone dynamics are essentially multiples of the replicator dynamic (Samuelson and Jianbo Zhang 1992).

29 There has also been recent work exploring the link between the replicator dynamic and explicit models of learning. Tilman Börgers and Rajiv Sarin (1997) consider a single boundedly rational decision maker using a version of the Robert Bush and Frederick Mosteller (1951, 1955) model of positive reinforcement learning, and show that the equation describing individual behavior looks like the replicator dynamic. John Gale, Kenneth Binmore, and Larry Samuelson (1995) derive the replicator dynamic from a behavioral model of aspirations. Karl Schlag (1998) derives the replicator dynamic in a bandit setting, where agents learn from others.
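The basins of attraction described above for the stag hunt game can be reproduced with the same Euler sketch. Figure 1's payoffs do not appear in this excerpt, so the matrix below is hypothetical, chosen only so that the interior rest point sits at p_high = 3/5 as in the text:

```python
# Hypothetical stag-hunt payoffs with the mixed rest point at 3/5:
# high effort against (high, low) earns (9, 0); low effort earns (7, 3).
PI = [[9.0, 0.0],
      [7.0, 3.0]]

def limit_share(p_high, steps=20000, dt=0.01):
    """Long-run fraction playing high effort under the replicator dynamic."""
    p = [p_high, 1.0 - p_high]
    for _ in range(steps):
        f = [sum(PI[i][j] * p[j] for j in range(2)) for i in range(2)]
        avg = p[0] * f[0] + p[1] * f[1]
        p = [p[i] + dt * p[i] * (f[i] - avg) for i in range(2)]
    return p[0]

assert limit_share(0.7) > 0.99   # start above 3/5: high effort takes over
assert limit_share(0.5) < 0.01   # start below 3/5: low effort takes over
```

With these payoffs, high effort earns 9·p_high while low effort earns 3 + 4·p_high, so the two are equal exactly at p_high = 3/5 and the dynamic pushes away from that point on either side.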

Since, by definition, a Nash equilibrium is a strategy profile with the property that every player is playing a best reply to the behavior of the other players, every Nash equilibrium is a rest point of any monotone dynamic. However, since the dynamics may not introduce behavior that is not already present in the population, not all rest points are Nash equilibria. If a rest point is asymptotically stable, then the learning dynamics, starting from a point close to the rest point but with all strategies being played by a positive fraction of the population, converge to the rest point. Thus, if the rest point is not a Nash equilibrium, some agents are not optimizing and the dynamics will take the system away from that rest point. This is the first major message from evolutionary game theory: if the state is asymptotically stable, then it describes a Nash equilibrium.

4.3. Asymmetric Games and Nonexistence of ESS

The assumption that the traders are drawn from a single population and that the two traders in any bargaining encounter are symmetric is important. In order for two traders in a trading encounter to be symmetric in this way, their encounter must be on "neutral" ground, and not in one of their establishments.30 Another possibility is that each encounter occurs in one of the traders' establishments, and the behavior of the traders depends upon whether he is the visitor or the owner of the establishment. This is illustrated in Figure 6.

                        visiting trader
                        Bold     Cautious
  owner   Bold          0,0        3,1
          Cautious      1,3        2,2

Figure 6. Payoff to two traders bargaining in the row trader's establishment. The first number is the owner's payoff, the second is the visitor's. The gains from trade are 4.

30 More accurately, the traders' behavior cannot depend on the location.

This game has three Nash equilibria: The profile of a 50/50 mixture between cautious and bold behavior is still a mixed strategy equilibrium (in fact, it is the only symmetric equilibrium). There are also two pure strategy asymmetric equilibria (the owner is bold, while the visitor is cautious; and the owner is cautious, while the visitor is bold). Moreover, these two pure strategy equilibria are strict.

Symmetric games, like that in Figure 6, have the property that the strategies available to the players do not depend upon their role, i.e., row (owner) or column (visitor). The assumption that, in a symmetric game, agents cannot condition on their role is called no role identification. In most games of interest, the strategies available to a player also depend upon his role. Even if not, as in the example just discussed, there is often some distinguishing feature that allows players to identify their role (such as the row player being the incumbent and the column player being an entrant in a contest for a location). Such games are called asymmetric.

The notion of an ESS can be applied to such games by either changing the definition to allow for asymmetric mutants (as in Jeroen Swinkels 1992), or, equivalently, by symmetrizing the game. The symmetrized game is the game obtained by assuming that, ex ante, players do not know which role they will have, so that players' strategies specify behavior conditional on different roles, and first having a move of nature that allocates each player to a role. (In the trader example, a coin flip for each trader determines whether the trader stays "home" this period, or visits another establishment.) However, every ESS in such a symmetrized game is a strict equilibrium (Selten 1980). This is an important negative result, since most games (in particular, non-trivial extensive form games) do not have strict equilibria, and so ESSs do not exist for most asymmetric games.

The intuition for the nonexistence is helpful in what comes later, and is most easily conveyed if we take the monomorphic interpretation of mixed strategies. Fix a non-strict Nash equilibrium of an asymmetric game, and suppose it is the row player that has available other best replies. Recall that a strategy specifies behavior for the agent in his role as the row player and also in his role as the column player. The mutant of interest is given by the strategy that specifies one of these other best replies for the row role and the existing behavior for the column role. This mutant is not selected against, since its expected payoff is the same as that of the remainder of the population. First note that all agents in their column role still behave the same as before the invasion. Then, in his row role, the mutant is (by assumption) playing an alternative best reply, and in his column role he receives the same payoff as every other column player (since he is playing the same as them). See Figure 7. The idea that evolutionary (or selection) pressures may not be effective against alternative best replies plays an important role in subsequent work on cheap talk and forward and backward induction.

          α      β
  a      2,2    1,2
  b      2,1    0,0

Figure 7. The asymmetric version of the game in Figure 5. The strategy (a,α) is not an ESS of the symmetrized game, since the payoff earned by the strategy (a,α) when playing against (b,α) equals the payoff earned by the strategy (b,α) when playing against (b,α) (both equal 1/2 π(a,α) + 1/2 π(α,b) = 3/2).

It is also helpful to consider the behavior of dynamics in a two-population world playing the trader game. There are two populations, owners and visitors. A state now consists of the pair (p,q), where p is the fraction of owners who are bold, while q is the fraction of visitors who are bold.31 There is a separate replicator dynamic for each population, with the payoff to a strategy followed by an owner depending only on q, and not p (this is the observation from the previous paragraph that row players do not interact with row players). While (p*,q*), where p* = q* = 1/2, is still a rest point of the two dimensional dynamical system describing the evolution of trader behavior, it is no longer asymptotically stable. The phase diagram is illustrated in Figure 8. If there is a perturbation, then owners and visitors move toward one of the two strict pure equilibria. Both of the asymmetric equilibria are asymptotically stable.

[Figure 8. The phase diagram for the two population trader game. (Axes: proportion of bold owners against proportion of bold visitors.)]

31 This is equivalent to considering dynamics in the one-population model where the game is symmetrized and p is the fraction of the population who are bold in the owner role.
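The two-population dynamics just described can be sketched in the same style, with a separate replicator equation for owners and for visitors and payoffs from Figure 6 (the step size and the starting perturbation are arbitrary choices):

```python
# Two-population replicator sketch for the Figure 6 owner/visitor game.
# Row payoffs (the game is symmetric across roles): Bold = 0, Cautious = 1.
PI = [[0.0, 3.0],
      [1.0, 2.0]]

def step(p, q, dt=0.01):
    """p: fraction of bold owners; q: fraction of bold visitors."""
    fo = [PI[i][0] * q + PI[i][1] * (1.0 - q) for i in range(2)]  # owners
    fv = [PI[i][0] * p + PI[i][1] * (1.0 - p) for i in range(2)]  # visitors
    p_new = p + dt * p * (fo[0] - (p * fo[0] + (1.0 - p) * fo[1]))
    q_new = q + dt * q * (fv[0] - (q * fv[0] + (1.0 - q) * fv[1]))
    return p_new, q_new

# Perturb the (1/2, 1/2) rest point slightly toward more bold owners:
p, q = 0.55, 0.45
for _ in range(20000):
    p, q = step(p, q)
assert p > 0.99 and q < 0.01   # the strict (Bold, Cautious) equilibrium
```

The perturbation is never corrected: once owners are slightly more often bold than visitors, bold pays better for owners and worse for visitors, and the system runs to an asymmetric strict equilibrium.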

If the game is asymmetric, then we have already seen that the only ESS are strict equilibria. There is a similar lack of power in considering asymptotic stability for general dynamics in asymmetric games. In particular, asymptotic stability in asymmetric games implies "almost" strict Nash equilibria, and if the profile is pure, it is strict.32 For the special case of replicator dynamics, a Nash equilibrium is asymptotically stable if and only if it is strict (Klaus Ritzberger and Jörgen Weibull 1995).

32 Samuelson and Zhang (1992, p. 377) has the precise statement. See also Daniel Friedman (1991) and Weibull (1995) for general results on continuous time dynamics.

5. Which Nash Equilibrium?

Beyond excluding non-Nash behavior as stable outcomes, evolutionary game theory has provided substantial insight into the types of behavior that are consistent with evolutionary models, and the types of behavior that are not.

5.1. Domination

The strongest positive results concern the behavior of the dynamics with respect to strict domination: If a strategy is strictly dominated by another (that is, it yields a lower payoff than the other strategy for all possible choices of the opponents), then over time that strictly dominated strategy will disappear (a smaller fraction of the population will play that strategy). Once that strategy has (effectively) disappeared, any strategy that is now strictly dominated (given the deletion of the original dominated strategy) will also now disappear.

There is an important distinction here between strict and weak domination. It is not true that weakly dominated strategies are similarly eliminated. Consider the game taken from Samuelson and Zhang (1992, p. 382) in Figure 9. It is worth noting that this is the normal form of the extensive form game with perfect information given in Figure 10.

                   Player 2
                   L        R
  Player 1   T    1,1      1,0
             B    1,1      0,0

Figure 9. Any state in which all agents in population 2 play L is a stable rest point.

[Figure 10. An extensive form game with the normal form in Figure 9. (Player 2 first chooses L, ending the game with payoffs (1,1), or R, after which player 1 chooses T, giving (1,0), or B, giving (0,0).)]

There are two populations, with agents in population 1 playing the role of player 1 and agents in population 2 playing the role of player 2. Any state in which all agents in population 2 play L is Liapunov stable: Suppose the state starts with almost all agents in population 2 playing L. There is then very little incentive for agents in the first population playing B to change behavior, since T is only marginally better (moreover, if the game played is in fact the extensive form of Figure 10, then agents in population 1 only have a choice to make if they are matched with one of the few agents choosing R). So, dynamics will not move the state far from its starting point, if the starting point has mostly L-playing agents in population 2.

If we model the dynamics for this game as a two-dimensional continuous time replicator dynamic, we have33

  dp^t/dt = p^t × (1 − p^t)(1 − q^t)

and

  dq^t/dt = q^t × (1 − q^t),

where p^t is the fraction of population 1 playing T and q^t is the fraction of population 2 playing L. The adjustment of the fraction of population 2 playing L reflects the strict dominance of L over R: Since L always does better than R, if q^t is interior (that is, there are both agents playing L and agents playing R in the population), the fraction playing L increases (dq^t/dt > 0), independently of the fraction in population 1 playing T, with the adjustment only disappearing as q^t approaches 1. The adjustment of the fraction of population 1 playing T, on the other hand, depends on the fraction of population 2 playing R: if almost all of population 2 is playing L (q^t is close to 1), then the adjustment in p^t is small (in fact, arbitrarily small for q^t arbitrarily close to 1), no matter what the value of p^t. The phase diagram is illustrated in Figure 11.

33 The two-population version of (1) is:
  dp_i^t/dt = p_i^t × (π_1(i, q^t) − π_1(p^t, q^t))
and
  dq_j^t/dt = q_j^t × (π_2(p^t, j) − π_2(p^t, q^t)),
where π_k(i,j) is player k's payoff from the strategy pair (i,j) and π_k(p,q) = Σ_ij π_k(i,j) p_i q_j.

[Figure 11. The phase diagram for the game with weak dominance. (Axes: fraction of population 1 playing T, p^t, against fraction of population 2 playing L, q^t.)]
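This system is simple enough to integrate directly. The Euler sketch below starts population 2 mostly at L and shows the point of the example: R is eliminated, but the pressure against the weakly dominated B vanishes as q^t approaches 1, so population 1 barely moves (step size and starting point are arbitrary choices):

```python
# Euler sketch of dp/dt = p(1-p)(1-q), dq/dt = q(1-q):
# p = fraction of population 1 playing T, q = fraction of population 2 playing L.
def run(p, q, steps=200000, dt=0.01):
    for _ in range(steps):
        p, q = p + dt * p * (1 - p) * (1 - q), q + dt * q * (1 - q)
    return p, q

p, q = run(0.5, 0.99)
assert q > 0.999   # the strictly dominated R disappears
assert p < 0.51    # but the weakly dominated B keeps almost half of population 1
```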
It is also important to note that no rest point is asymptotically stable. Even the state with all agents in population 1 playing T and all agents in population 2 playing L is not asymptotically stable, because a perturbation that increases the fraction of population 1 playing B while leaving the entire population 2 playing L will not disappear: the system has been perturbed toward another rest point. Nothing prevents the system from "drifting" along the heavy lined horizontal segment in Figure 11. Moreover, each time the system is perturbed from a rest point toward the interior of the state space (so that there is a positive fraction of population 1 playing B and of population 2 playing R), it typically returns to a different rest point (and for many perturbations, to a rest point with a higher fraction of population 2 playing L). This logic is reminiscent of the intuition given earlier for the nonexistence of ESS in asymmetric games. It also suggests that sets of states will have better stability properties than individual states.

Recall that if a single strategy profile is "evolutionarily stable," then behavior within the population, once near that profile, converges to it and never leaves it. A single strategy profile describes the aggregate pattern of behavior within the population. A set of strategy profiles is a collection of such descriptions. Loosely, we can think of a set of strategy profiles as being "evolutionarily stable" if behavior within the population, once near any profile in the set, converges to some profile within it and never leaves the set. The important feature is that behavior within the population need not settle down to a steady state; rather, it can "drift" between the different patterns within the "evolutionarily stable" set.

5.2. The Ultimatum Game and Backward Induction

The ultimatum game is a simple game with multiple Nash equilibria and a unique backward induction outcome. There is $1 to divide. The proposer proposes a division to the responder. The responder either accepts the division, in which case it is implemented, or rejects it, in which case both players receive nothing. If the proposer can make any proposal, the only backward induction solution has the receiver accepting the proposal in which the proposer receives the entire dollar. If the dollar must be divided into whole pennies, there is another solution in which the responder rejects the proposal in which he receives nothing and accepts the proposal of 99 cents to the proposer and 1 cent to the responder. This prediction is uniformly rejected in experiments!

How do Gale, Binmore, and Samuelson (1995) explain this? The critical issue is the relative speeds of convergence. Both proposers and responders are "learning." The proposers are learning which offers will be rejected (this is learning as we have been discussing it). In principle, if the environment (terms of the proposal) is sufficiently complicated, responders may also have difficulty evaluating offers. In experiments, responders do reject as much as 30 cents. However, it is difficult to imagine that responders do not understand that 30 cents is better than zero. There are at least two possibilities that still allow the behavior of the responders to be viewed as learning. The first is that responders do not believe the rules of the game as described and take time to learn that the proposer really was making a take-it-or-leave-it offer. As mentioned in footnote 7, most people may not be used to take-it-or-leave-it offers, and it may take time for the responders to properly appreciate what this means. The second is that the monetary reward is only one ingredient in responders' utility functions, and that responders must learn what the "fair" offer is.

If proposers learn sufficiently fast relative to responders, then there can be convergence to a Nash equilibrium that is not the backward induction solution. In the backward induction solution, the responder gets almost nothing, so that the cost of making an error is low, while the proposer loses significantly if he misjudges the acceptance threshold of the responder. In fact, in a simplified version of the ultimatum game, Nash equilibria in which the responder gets a substantial share are stable (although not asymptotically stable). In a non-backward induction outcome, the proposer never learns that he could have offered less to the responder (since he never observes such behavior). If he is sufficiently pessimistic about the responder's acceptance threshold, then he will not offer less, since a large share is better than nothing. Consider a simplified ultimatum game that gives the proposer and responder two choices: the proposer either offers an even division or a small positive payment, and the responder only responds to the small positive payment (he must accept the equal division). Figure 12 is the phase diagram for this simplified game. While this game has some similarities to that in Figure 9, there is also an important difference. All the rest points inconsistent with the backward induction solution (with the crucial exception of the point labelled A) are Liapunov stable (but not asymptotically so).

A dynamic properties in a fundamental way. With no drift, the only asymptoti- cally stable rest point is the backward induction solution. With drift, there can be another asymptotically stable rest point near the non-backward-induction Nash equilibria. The ultimatum game is special in its simplicity. Ideas of backward induction and sequential rationality have been in-

Fraction of responders rejecting fluential in more complicated games Fraction of proposers (like the centipede game, the repeated offering even split prisoner’s dilemma, and alternating of- Figure 12. A suggestive phase diagram for a fer bargaining games). In general, back- simplified version of the ultimatum game. ward induction has received little sup- port from evolutionary game theory for more complicated games (see, for exam- sponse to some infrequent perturba- ple, Ross Cressman and Karl Schlag tions, the system will effectively “move (1995 ), Georg Nöldeke and Samuelson along” these rest points toward A. But (1993 ), and Giovanni Ponti (1996)). once at A (unlike the corresponding sta- 5.3. Forward Induction and Efficiency ble rest point in Figure 11), the system will move far away. The rest point la- In addition to backward induction, beled A is not stable: Perturbations the other major refinement idea is for- near A can move the system onto a tra- ward induction. In general, forward in- jectory that converges to the backward duction receives more support from induction solution. evolutionary arguments than does back- While this seems to suggest that non- ward induction. The best examples of backward-induction solutions are frag- this are in the context of cheap talk ile, such a conclusion is premature. (starting with Akihiko Matsui 1991). Since any model is necessarily an ap- Some recent papers are V. Bhaskar proximation, it is important to allow for (1995), Andreas Blume, Yong-Gwan drift. Binmore and Samuelson (1996) Kim and Joel Sobel (1993), Kim and use the term drift to refer to un- Sobel (1995), and Karl Wärneryd modeled small changes in behavior. (1993)—see Sobel (1993) and Samuel- One way of allowing for drift would be son (1993) for a discussion. 
to add to the learning dynamic an addi- Forward induction is the idea that ac- tional term reflecting a deterministic tions can convey information about the flow from one strategy to another that future intentions of players even off the was independent of payoffs. In general, equilibrium path. Cheap talk games this drift would be small, and in the (signaling games in which the messages presence of strong incentives to learn, are costless) are ideal to illustrate these irrelevant. However, if players (such as ideas. Cheap talk games have both re- the responders above) have little incen- vealing equilibria (cheap talk can con- tive to learn, then the drift term be- vey information) and the so-called bab- comes more important. In fact, adding bling equilibria (messages do not an arbitrarily small uniform drift term convey any information because neither to the replicator dynamic changes the the sender nor the receiver expects Mailath: Do People Play Nash Equilibrium? 1367 them to), and forward induction has population (both receive the same pay- been used to eliminate some nonreveal- off when matched with the babbling ing equilibria. strategy, but the truthful strategy does Consider the following cheap talk strictly better when matched with the game: There are two states of the truthful strategy). Moreover, the sepa- world, rain and sun. The sender knows rating equilibria are strict equilibria, the state and announces a message, rain and so, ESSs. Suppose, for example, all or sun. On the basis of the message, the players are playing the truthful strategy. receiver chooses an action, picnic or Then any other strategy must yield movie. Both players receive 1 if the re- strictly lower payoffs: Either, as a ceiver’s action agrees with the state of sender, the strategy specifies an action the world (i.e., picnic if sun, and movie conditional on a state that does not cor- if rain), and 0 otherwise. 
Thus, the respond to that state, leading to an in- sender’s message is payoff irrelevant correct action choice by the truthful re- and so is “cheap talk.” The obvious pat- ceiver, or, as a responder, the strategy tern of behavior is for the sender to sig- specifies an incorrect action after a nal the state by making a truthful an- truthful announcement). nouncement in each state and for the This simple example is driven by the receiver to then choose the action that crucial assumption that the number of agrees with the state. In fact, since the messages equals the number of states. receiver can infer the state from the an- If there are more messages than states, nouncement (if the announcement dif- then there are no strict equilibria, and fers across states), there are two sepa- so no ESSs. To obtain similar efficiency rating equilibrium profiles (truthful results for a larger class of games, we announcing, where the sender an- need to use set-valued solution con- nounces rain if rain and sun if sun; and cepts, such as cyclical stability (Itzhak false announcing, where sender an- Gilboa and Matsui 1991) used by Mat- nounces sun if rain and rain if sun). sui (1991) , and equilibrium evolution- A challenge for traditional non-coop- ary stability (Swinkels 1992) used by erative theory, however, is that bab- Blume, Kim, and Sobel (1993). Matsui bling is also an equilibrium: The sender (1991) and Kim and Sobel (1995) study places equal probability on rain and coordination games with a preplay sun, independent of the state. The re- round of communication. In such ceiver, learning nothing from the mes- games, communication can allow play- sage, places equal probability on rain ers to coordinate their actions. How- and sun. Consider ESS in the sym- ever, as in the above example, there are metrized game (where each player has also babbling equilibria, so that commu- equal probability of being the sender or nication appears not to guarantee co- receiver). 
It turns out that only separat- ordination. Evolutionary pressures, on ing equilibria are ESS. The intuition is the other hand, destabilize the babbling in two parts. First, the babbling equilib- equilibria. Blume, Kim, and Sobel rium is not an ESS: Consider the truth- (1993) study cheap talk signaling games ful entrant (who announces truthfully like that of the example. Bhaskar (1995) and responds to announcements by obtains efficiency with noisy pre-play choosing the same action as the an- communication, and shows that, in his nouncement). The payoff to this entrant context at least, the relative importance is strictly greater than that of the bab- of noise and mutations is irrelevant. bling strategy against the perturbed The results in this area strongly sug- 1368 Journal of Economic Literature, Vol. XXXVI (September 1998)

column player agree in this case, since the game is so HL simple),34 and observe that the basin of H 100,100 0,0 row player attraction of (H,H) is 100-times the L 0,0 1,1 size of (L,L). If we imagine that the in- itial condition is chosen randomly, then Figure 13. (H,H) seems more likely than (L,L). the H pattern of behavior is 100 times as likely to arise. I now describe a more recent perspective that makes the last gest that evolutionary pressures can idea more precise by eliminating the destabilize inefficient outcomes. The need to specify an initial condition. key intuition is that suggested by the The motivation for ESS and (asymp- example above. If an outcome is ineffi- totic) stability was a desire for robust- cient, then there is an entrant that is ness to a single episode of perturbation. equally successful against the current It might seem that if learning operates population, but that can achieve the ef- at a sufficiently higher rate than the ficient outcome when playing against a rate at which new behavior is intro- suitable opponent. A crucial aspect is duced into the population, focusing on that the model allow the entrant to have the dynamic implications of a single sufficient flexibility to achieve this, and perturbation is reasonable. Dean Foster that is the role of cheap talk above. and Young (1990) have argued that the notions of an ESS and attractor of the 5.4. Multiple Strict Equilibria replicator dynamic do not adequately Multiple best replies for a player capture long-run stability when there raise the question of determining which are continual small stochastic shocks. of these best replies are “plausible” or Young and Foster (1991) describe simu- “sensible.” The refinements literature lations and discuss this issue in the con- (of which backward and forward induc- text of Robert Axelrod’s (1984) com- tion are a part) attempted to answer puter tournaments. 
this question, and by so doing eliminate There is a difficulty that must be con- some equilibria as being uninteresting. fronted when explicitly modeling ran- Multiple strict equilibria raise a com- domness. As I mentioned above, the pletely new set of issues. It is also worth standard replicator dynamic story is an recalling that any strict equilibrium is idealization for large populations (spe- asymptotically stable under any mono- cifically, a continuum). If the mutation- tonic dynamic. I argued earlier that this experimentation occurs at the individ- led to the desirable feature of history ual level, there should be no aggregate dependence. However, even at an intui- impact; the resulting evolution of the tive level, some strict equilibria are system is deterministic and there are no more likely. For example, in the game “invasion events.” There are two ways described by Figure 13, (H,H) seems to approach this. One is to consider more likely than (L,L). There are sev- aggregate shocks (this is the approach eral ways this can be phrased. Certainly, of Foster and Young 1990 and Fuden- (H,H) seems more “focal,” and if asked berg and Christopher Harris 1992). The to play this game, I would play H, as would (I suspect) most people. Another 34 A state is in the basin of attraction of an equi- way of phrasing this is to compare the librium if, starting at that state and applying the dynamic, eventually the system is taken to the basins of attraction of the two equilibria state in which all players play the equilibrium under monotone dynamics (they will all strategy. Mailath: Do People Play Nash Equilibrium? 1369 other is to consider a finite population and the experimentation phase. Note and analyze the impact of individual ex- that after the experimentation phase (in perimentation. 
Michihiro Kandori, contrast to the learning phase), with Mailath, and Rafael Rob (1993) study positive probability, fewer workers may the implications of randomness on the be choosing a best reply. individual level; in addition to empha- Attention now focuses on the behav- sizing individual decision making, the ior of the Markov chain with the per- paper analyzes the simplest model that petual randomness. Because of the ex- illustrates the role of perpetual random- perimentation, every state is reached ness. with positive probability from any other Consider again the stag hunt game, state (including the states in which all and suppose each work-team consists of workers choose high effort and all work- two workers. Suppose, moreover, that ers choose low effort). Thus, the the firm has N workers. The relevant Markov chain is irreducible and aperi- state variable is z, the number of work- odic. It is a standard result that such a ers who choose high effort. Learning Markov chain has a unique stationary implies that if high effort is a better distribution. Let µ(ε) denote the station- strategy than low effort, then at least ary distribution. The goal has been to some workers currently choosing low ef- characterize the limit of µ(ε) as ε be- fort will switch to high effort (i.e., comes small. This, if it exists, is called zt+1 > zt if zt < N). A similar property the stochastically stable distribution holds if low effort is a better strategy (Foster and Young 1990) or the limit than high. The learning or selection dy- distribution. States in the support of namics describe, as before, a dynamic the limit distribution are sometimes on the set of population states, which is called long-run equilibria (Kandori, now the number of workers choosing Mailath, and Rob 1993). The limit dis- high effort. 
Since we are dealing with a tribution is informative about the be- finite set of workers, this can also be havior of the system for positive but thought of as a Markov process on a fi- very small ε . Thus, for small degrees of nite state space. The process is Markov, experimentation, the system will spend because, by assumption workers only almost all of its time with all players learn from last period’s experience. choosing the same action. However, Moreover, both all workers choosing every so often (but infrequently) high effort and all workers choosing low enough players will switch action, which effort are absorbing states of this will then switch play of the population Markov process.35 This is just a restate- to the other action, until again enough ment of the observation that both states players switch. correspond to Nash equilibria. It is straightforward to show that the Perpetual randomness is incorporated stochastically stable distribution exists; by assuming that, in each period, each characterizing it is more difficult. Kan- worker independently switches his ef- dori, Mailath, and Rob (1993) is mostly fort choice (i.e., experiments) with concerned with the case of 2×2 symmet- probability ε, where ε > 0 is to be ric games with two strict symmetric thought of as small. In each period, Nash equilibria. Any monotone dynamic there are two phases: the learning phase divides the state space into the same two basins of attraction of the equilib- 35 An absorbing state is a state that the process, ria. The risk dominant equilibrium once in, never leaves. (Harsanyi and Selten 1988) is the equi- 1370 Journal of Economic Literature, Vol. XXXVI (September 1998)

minimum of other quires n workers. While this is no worker’s efforts longer a two - player game, it is still high low true that (for large populations) the size worker’s high V 0 of the basin of attraction is the deter- effort low 33 mining feature of the example. For ex- ample, if V = 8, high effort has a smaller Figure 14. A new “stag-hunt” played by workers in a team. basin of attraction than low effort for all n ≥ 3. As V increases, the size of the team for which high effort becomes too librium with the larger basin of attrac- risky increases.36 This attractively cap- tion. The risk dominant equilibrium is tures the idea that cooperation in large “less risky” and may be Pareto domi- groups can be harder to achieve than nated by the other equilibrium (the risk cooperation in small groups, just due to dominant equilibrium results when the uncertainty that everyone will coop- players choose best replies to beliefs erate. While a small possibility of non- that assign equal probability to the two cooperation by any one agent is not de- possible actions of the opponent). In stabilizing in small groups (since the stag hunt example, the risk domi- cooperation is a strict equilibrium), it is nant equilibrium is low effort and it is in large ones. Pareto dominated by high effort. In In contrast to both ESS and replica- Figure 13, the risk dominant equilib- tor dynamics, there is a unique out- rium is H and it Pareto dominates L. come. History does not matter. This is Kandori, Mailath, and Rob (1993) show the result of taking two limits: first, that the limit distribution puts prob- time is taken to infinity (which justifies ability one on the risk dominant equilib- looking at the stationary distribution), rium. The non-risk dominant equilib- and then the probability of mutation is rium is upset because the probability of taken to zero (looking at small rates). 
a sufficiently large number of simulta- The rate at which the stationary distri- neous mutations that leave society in bution is approached from an arbitrary the basin of attraction of the risk domi- starting point is decreasing in popula- nant equilibrium is of higher order than tion size (since the driving force is the that of a sufficiently large number of si- probability of a simultaneous mutation multaneous mutations that cause society by a fraction of the population). Moti- to leave that basin of attraction. vated by this observation, Glenn Ellison This style of analysis allows us to for- (1993) studied a model with local inter- mally describe strategic uncertainty. To actions that has substantially faster make this point, consider again the rates of convergence. The key idea is workers involved in team production as that, rather than playing against a ran- a stag hunt game. For the payoffs as in domly drawn opponent from the entire Figure 1, leads to the population, each player plays only low effort outcome even if each work against a small number of neighbors. team has only two workers. Suppose, The neighborhoods are overlapping, though, the payoffs are as in Figure 14, however, so that a change of behavior in with V being the value of high effort (if one neighborhood can (eventually) in- reciprocated). For V > 6 and two-worker fluence the entire population. teams, the principles of risk dominance and payoff dominance agree: high ef- 36 High effort has the larger basin of attraction 1 3 fort. But now suppose each team re- if ( ⁄2)n > ⁄V. Mailath: Do People Play Nash Equilibrium? 1371
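The simplified ultimatum game of Section 5.2 can be explored numerically with the two-population replicator dynamic of footnote 33. The sketch below uses illustrative payoffs of my own choosing (an even 50–50 split versus a greedy 90–10 split of 100 cents), not the specification of Gale, Binmore, and Samuelson (1995); it shows only that the dynamic can settle at a Nash equilibrium that is not the backward induction solution, because the pressure on responders to accept the greedy offer vanishes as proposers learn to offer the even split.

```python
# Two-population replicator dynamic (footnote 33) for a simplified
# ultimatum game. Illustrative payoffs (my assumption, not those of Gale,
# Binmore, and Samuelson 1995): proposers offer an even split (50,50) or a
# greedy split (90,10); responders accept everything or reject the greedy
# offer (the even split must be accepted).
# p = fraction of proposers offering the even split,
# q = fraction of responders willing to accept the greedy offer.

def replicator_path(p0, q0, dt=0.01, steps=20_000):
    p, q = p0, q0
    for _ in range(steps):
        # Proposer payoffs: the even split always earns 50; the greedy
        # split earns 90 only when it meets an accepting responder.
        payoff_even = 50.0
        payoff_greedy = 90.0 * q
        # Responder payoffs: accepting earns the extra 10 cents only
        # against greedy proposers; both strategies earn 50 against fair ones.
        payoff_accept = 50.0 * p + 10.0 * (1.0 - p)
        payoff_reject = 50.0 * p
        dp = p * (1.0 - p) * (payoff_even - payoff_greedy)
        dq = q * (1.0 - q) * (payoff_accept - payoff_reject)
        p = min(max(p + dt * dp, 0.0), 1.0)
        q = min(max(q + dt * dq, 0.0), 1.0)
    return p, q

# Most proposers start fair; most responders start by rejecting greed.
p, q = replicator_path(p0=0.9, q0=0.2)
# Proposers converge to the even split while acceptance stays below the
# level (5/9) at which the greedy offer would become profitable -- a
# non-backward-induction Nash equilibrium.
```

Starting instead from a population in which most responders accept the greedy offer (for example, q0 = 0.9), the same dynamic converges to the backward induction outcome: which rest point is reached is history dependent.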

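The finite-population analysis can be reproduced exactly for a small firm, since the perturbed process is just a Markov chain on {0, 1, ..., N}. The sketch below rests on assumptions of my own made for brevity—two-worker teams with the Figure 14 payoffs and V = 5 (so that low effort is risk dominant but high effort is Pareto dominant), a crude learning phase in which every worker adopts the myopic best reply to last period's state, and independent experimentation—and is not the exact specification of Kandori, Mailath, and Rob (1993). Letting the experimentation probability depend on a worker's current strategy also illustrates the Bergin and Lipman (1996) point that state-dependent mutations can reverse the selection.

```python
import numpy as np
from math import comb

# State z = number of the N workers choosing high effort. Two-worker teams
# with the Figure 14 payoffs: high effort earns V when the partner also
# works high (0 otherwise); low effort earns 3 regardless. With V = 5 < 6,
# low effort is risk dominant even though high effort is Pareto dominant.
N, V = 10, 5.0

def stationary(eps_high, eps_low):
    """Stationary distribution when a high-effort worker experiments with
    probability eps_high and a low-effort worker with probability eps_low."""
    P = np.zeros((N + 1, N + 1))
    for z in range(N + 1):
        # Learning phase (crude best-reply dynamic, assumed for brevity):
        # against a randomly drawn partner, high effort earns V*z/N on
        # average, so every worker adopts high iff that beats the safe 3.
        all_high = V * z / N > 3.0
        eps = eps_high if all_high else eps_low
        for k in range(N + 1):  # experimentation phase: k workers switch
            prob = comb(N, k) * eps**k * (1.0 - eps) ** (N - k)
            z_next = N - k if all_high else k
            P[z, z_next] += prob
    # Solve pi @ P = pi together with the normalization sum(pi) = 1.
    A = np.vstack([(P.T - np.eye(N + 1))[:-1], np.ones(N + 1)])
    b = np.zeros(N + 1); b[-1] = 1.0
    return np.linalg.solve(A, b)

# Uniform experimentation: the chain spends almost all of its time at the
# risk dominant (all-low-effort) states, as in Kandori-Mailath-Rob.
pi_uniform = stationary(eps_high=0.01, eps_low=0.01)
# State-dependent experimentation: if low effort is much easier to abandon
# than high effort, the selection is reversed.
pi_state_dep = stationary(eps_high=0.001, eps_low=0.05)
```

With uniform experimentation, escaping the larger (low-effort) basin here requires seven simultaneous mutations while escaping the smaller one requires only four, so the stationary distribution piles up on low effort as ε shrinks; the state-dependent rates overturn this, which is why the specification of mutation probabilities cannot be treated as an innocent modeling detail.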
Since it is only for the case of 2×2 symmetric games that the precise modeling of the learning dynamic is irrelevant, extensions to larger games require specific assumptions about the dynamics. Kandori and Rob (1995), Nöldeke and Samuelson (1993), and Young (1993) generalize the best reply dynamic in various directions.

Kandori, Mailath, and Rob (1993), Young (1993), and Kandori and Rob (1995) study games with strict equilibria, and (as the example above illustrates) the relative magnitudes of the probabilities of simultaneous mutations are important. In contrast, Samuelson (1994) studies normal form games with nonstrict equilibria, and Nöldeke and Samuelson (1993) study extensive form games. In these cases, since the equilibria are not strict, states that correspond to equilibria can be upset by a single mutation. This leads to the limit distribution having nonsingleton support. This is the reflection, in the context of stochastic dynamics, of the issues illustrated by the discussion of Figures 9, 10, and 12. In general, the support will contain "connected" components, in the sense that there is a sequence of single mutations from one state to another state that will not leave the support. Moreover, each such state will be a rest point of the selection dynamic. The results on extensive forms are particularly suggestive, since different points in a connected component of the support correspond to different specifications of off-the-equilibrium-path behavior.

The introduction of stochastic dynamics does not, by itself, provide a general theory of equilibrium selection. James Bergin and Bart Lipman (1996) show that allowing the limiting behavior of the mutation probabilities to depend on the state gives a general possibility theorem: any strict equilibrium of any game can be selected by an appropriate choice of state-dependent mutation probabilities. In particular, in 2×2 games, if the risk dominant strategy is "harder" to learn than the other (in the sense that the limiting behavior of the mutation probabilities favors the non-risk dominant strategy), then the risk dominant equilibrium will not be selected. On the other hand, if the state dependence of the mutation probabilities arises only because the probabilities depend on the difference in payoffs from the two strategies, the risk dominant equilibrium is selected (Lawrence Blume 1994). This latter state dependence can be thought of as strategy neutral, in that it depends only on the payoffs generated by the strategies, and not on the strategies themselves. The state dependence that Bergin and Lipman (1996) need for their general possibility result is perhaps best thought of as strategy dependence, since the selection of some strict equilibria occurs only if players find it easier to switch to (learn) certain strategies (perhaps for complexity reasons). Binmore, Samuelson, and Richard Vaughan (1995), who study the consequences of the selection process itself being the source of the randomness, also obtain different selection results. Finally, the matching process itself can be an important source of randomness; see Young and Foster (1991) and Robson and Fernando Vega-Redondo (1996).

6. Conclusion

The result that any asymptotically stable rest point of an evolutionary dynamic is a Nash equilibrium is an important one. It shows that there are primitive foundations for equilibrium analysis. However, for asymmetric games, asymptotic stability is effectively equivalent to strict equilibria (which do not exist for many games of interest).

To a large extent, this is due to the focus on individual states. If we instead consider sets of states (strategy profiles), as I discussed at the end of Section 5.1, there is hope for more positive results.37

The lack of support for the standard refinement of backward induction is in some ways a success. Backward induction has always been a problematic principle, with some examples (like the centipede game) casting doubt on its universal applicability. The reasons for the lack of support improve our understanding of when backward induction is an appropriate principle to apply.

The ability to discriminate between different strict equilibria and to provide a formalization of the intuition of strategic uncertainty is also a major contribution of the area.

I suspect that the current evolutionary modeling is still too stylized to be used directly in applications. Rather, applied researchers need to be aware of what they are implicitly assuming when they do equilibrium analysis.

In many ways, there is an important parallel with the refinements literature. Originally, this literature was driven by the hope that theorists could identify the unique "right" equilibrium. If that original hope had been met, applied researchers need never worry about a multiplicity problem. Of course, that hope was not met, and we now understand that that hope, in principle, could never be met. The refinements literature still serves the useful role of providing a language to describe the properties of different equilibria. Applied researchers find the refinements literature of value for this reason, even though they cannot rely on it mechanically to eliminate "uninteresting" equilibria. The refinements literature is currently out of fashion because there were too many papers in which one example suggested a minor modification of an existing refinement, and no persuasive general refinement theory emerged.

There is a danger that evolutionary game theory could end up like refinements. It is similar in that there was a lot of early hope and enthusiasm. And, again, there have been many perturbations of models and dynamic processes, not always well motivated. As yet, the overall picture is still somewhat unclear. However, on the positive side, important insights are still emerging from evolutionary game theory (for example, the improving understanding of when backward induction is appropriate and the formalization of strategic uncertainty). Interesting games have many equilibria, and evolutionary game theory is an important tool in understanding which equilibria are particularly relevant in different environments.

   37 Sets of strategy profiles that are asymptotically stable under plausible deterministic dynamics turn out also to have strong Elon Kohlberg and Jean-Francois Mertens (1986) type stability properties (Swinkels 1993), in particular, the property of robustness to deletion of never weak best replies. This latter property implies many of the refinements that have played an important role in the refinements literature and signaling games, such as the intuitive criterion, the test of equilibrium domination, and D1 (In-Koo Cho and Kreps 1987). A similar result under different conditions was subsequently proved by Ritzberger and Weibull (1995), who also characterize the sets of profiles that can be asymptotically stable under certain conditions.

REFERENCES

Aoyagi, Masaki. 1996. "Evolution of Beliefs and the Nash Equilibrium of Normal Form Games," J. Econ. Theory, 70, pp. 444–69.
Aumann, Robert J. 1985. "On the Non-Transferable Utility Value: A Comment on the Roth–Shafer Examples," Econometrica, 53, pp. 667–77.
———. 1990. "Nash Equilibria are Not Self-Enforcing," in Jean Jaskold Gabszewicz, Jean François Richard, and Laurence A. Wolsey, pp. 201–206.
Aumann, Robert J. and Adam Brandenburger. 1995. "Epistemic Conditions for Nash Equilibrium," Econometrica, 63, pp. 1161–80.
Axelrod, Robert. 1984. The Evolution of Cooperation. New York: Basic Books.
Bergin, James and Barton L. Lipman. 1996. "Evolution with State-Dependent Mutations," Econometrica, 64, pp. 943–56.
Bernheim, B. Douglas. 1984. "Rationalizable Strategic Behavior," Econometrica, 52, pp. 1007–28.
Bhaskar, V. 1995. "Noisy Communication and the Evolution of Cooperation," U. St. Andrews.
Binmore, Kenneth G. and Larry Samuelson. 1996. "Evolutionary Drift and Equilibrium Selection," mimeo, U. Wisconsin.
Binmore, Kenneth G., Larry Samuelson, and Richard Vaughan. 1995. "Musical Chairs: Modeling Noisy Evolution," Games & Econ. Behavior, 11, pp. 1–35.
Blume, Andreas, Yong-Gwan Kim, and Joel Sobel. 1993. "Evolutionary Stability in Games of Communication," Games & Econ. Behavior, 5, pp. 547–75.
Blume, Lawrence. 1994. "How Noise Matters," mimeo, Cornell U.
Börgers, Tilman and Rajiv Sarin. 1997. "Learning Through Reinforcement and Replicator Dynamics," J. Econ. Theory, 77, pp. 1–14.
Bush, Robert R. and Frederick Mosteller. 1951. "A Mathematical Model for Simple Learning," Psych. Rev., 58, pp. 313–23.
———. 1955. Stochastic Models of Learning. New York: Wiley.
Cho, In-Koo and David Kreps. 1987. "Signaling Games and Stable Equilibria," Quart. J. Econ., 102, pp. 179–221.
Creedy, John, Jeff Borland, and Jürgen Eichberger, eds. 1992. Recent Developments in Game Theory. Hants, England: Edward Elgar Publishing Limited.
Cressman, Ross and Karl H. Schlag. 1995. "The Dynamic (In)Stability of Backwards Induction." Technical report, Wilfred Laurier U. and Bonn U.
Darwin, Charles. 1887. The Life and Letters of Charles Darwin, Including an Autobiographical Chapter. Francis Darwin, ed., second ed., Vol. 1. London: John Murray.
Dekel, Eddie and Faruk Gul. 1997. "Rationality and Knowledge in Game Theory," in David M. Kreps and Kenneth F. Wallis, pp. 87–172.
Ellison, Glenn. 1993. "Learning, Local Interaction, and Coordination," Econometrica, 61, pp. 1047–71.
Elster, Jon. 1989. "Social Norms and Economic Theory," J. Econ. Perspectives, 3, pp. 99–117.
Foster, Dean and H. Peyton Young. 1990. "Stochastic Evolutionary Game Dynamics," Theor. Population Bio., 38, pp. 219–32.
Friedman, Daniel. 1991. "Evolutionary Games in Economics," Econometrica, 59, pp. 637–66.
Fudenberg, Drew and Christopher Harris. 1992. "Evolutionary Dynamics with Aggregate Shocks," J. Econ. Theory, 57, pp. 420–41.
Fudenberg, Drew and David Kreps. 1989. "A Theory of Learning, Experimentation, and Equilibrium," mimeo, MIT and Stanford.
Fudenberg, Drew and David Levine. 1993. "Steady State Learning and Nash Equilibrium," Econometrica, 61, pp. 547–73.
Gabszewicz, Jean Jaskold, Jean François Richard, and Laurence A. Wolsey, eds. 1990. Economic Decision-Making: Games, Econometrics, and Optimisation. Contributions in Honour of Jacques H. Drèze. New York: North-Holland.
Gale, John, Kenneth G. Binmore, and Larry Samuelson. 1995. "Learning to be Imperfect: The Ultimatum Game," Games & Econ. Behavior, 8, pp. 56–90.
Gilboa, Itzhak and Akihiko Matsui. 1991. "Social Stability and Equilibrium," Econometrica, 59, pp. 859–67.
Gul, Faruk. 1996. "Rationality and Coherent Theories of Strategic Behavior," J. Econ. Theory, 70, pp. 1–31.
Harsanyi, John C. 1977. Rational Behavior and Bargaining Equilibrium in Games and Social Situations. Cambridge, UK: Cambridge U. Press.
Harsanyi, John C. and Reinhard Selten. 1988. A General Theory of Equilibrium Selection in Games. Cambridge: MIT Press.
Hines, W. G. S. 1980. "Three Characterizations of Population Strategy Stability," J. Appl. Probability, 17, pp. 333–40.
Hofbauer, Josef and Karl Sigmund. 1988. The Theory of Evolution and Dynamical Systems. Cambridge: Cambridge U. Press.
Kalai, Ehud and Ehud Lehrer. 1993. "Rational Learning Leads to Nash Equilibrium," Econometrica, 61, pp. 1019–45.
Kandori, Michihiro. 1997. "Evolutionary Game Theory in Economics," in David M. Kreps and Kenneth F. Wallis, pp. 243–77.
Kandori, Michihiro and Rafael Rob. 1995. "Evolution of Equilibria in the Long Run: A General Theory and Applications," J. Econ. Theory, 65, pp. 383–414.
Kandori, Michihiro, George J. Mailath, and Rafael Rob. 1993. "Learning, Mutation, and Long Run Equilibria in Games," Econometrica, 61, pp. 29–56.
Kim, Yong-Gwan and Joel Sobel. 1995. "An Evolutionary Approach to Pre-Play Communication," Econometrica, 63, pp. 1181–93.
Kohlberg, Elon and Jean-Francois Mertens. 1986. "On the Strategic Stability of Equilibria," Econometrica, 54, pp. 1003–37.
Kreps, David M. 1990a. Game Theory and Economic Modelling. Oxford: Clarendon Press.
———. 1990b. A Course in Microeconomic Theory. Princeton, NJ: Princeton U. Press.
Kreps, David M. and Kenneth F. Wallis, eds. 1997. Advances in Economics and Econometrics: Theory and Applications—Seventh World Congress of the Econometric Society, Vol. 1. Cambridge: Cambridge U. Press.
Mailath, George J. 1992. "Introduction: Symposium on Evolutionary Game Theory," J. Econ. Theory, 57, pp. 259–77.
Matsui, Akihiko. 1991. "Cheap-Talk and Cooperation in Society," J. Econ. Theory, 54, pp. 245–58.
Maynard Smith, John. 1982. Evolution and the Theory of Games. Cambridge: Cambridge U. Press.
Maynard Smith, John and G. R. Price. 1973. "The Logic of Animal Conflict," Nature, 246, pp. 15–18.
Myerson, Roger B. 1991. Game Theory: Analysis of Conflict. Cambridge, MA: Harvard U. Press.
Nitecki, Z. and C. Robinson, eds. 1980. Global Theory of Dynamical Systems. Vol. 819 of Lecture Notes in Mathematics. Berlin: Springer-Verlag.
Nöldeke, Georg and Larry Samuelson. 1993. "An Evolutionary Analysis of Backward and Forward Induction," Games & Econ. Behavior, 5, pp. 425–54.
Pearce, David. 1984. "Rationalizable Strategic Behavior and the Problem of Perfection," Econometrica, 52, pp. 1029–50.
Ponti, Giovanni. 1996. "Cycles of Learning in the Centipede Game," Technical report, University College London.
Ritzberger, Klaus and Jörgen W. Weibull. 1995. "Evolutionary Selection in Normal-Form Games," Econometrica, 63, pp. 1371–99.
Robson, Arthur J. 1992. "Evolutionary Game Theory," in John Creedy, Jeff Borland, and Jürgen Eichberger, pp. 165–78.
Robson, Arthur J. and Fernando Vega-Redondo. 1996. "Efficient Equilibrium Selection in Evolutionary Games with Random Matching," J. Econ. Theory, 70, pp. 65–92.
Rousseau, Jean-Jacques. 1950. "A Discourse on the Origin of Inequality," in The Social Contract and Discourses. Translated by G. D. H. Cole. New York: Dutton.
Samuelson, Larry. 1993. "Recent Advances in Evolutionary Economics: Comments," Econ. Letters, 42, pp. 313–19.
———. 1994. "Stochastic Stability in Games with Alternative Best Replies," J. Econ. Theory, 64, pp. 35–65.
Samuelson, Larry and Jianbo Zhang. 1992. "Evolutionary Stability in Asymmetric Games," J. Econ. Theory, 57, pp. 363–91.
Schelling, Thomas. 1960. The Strategy of Conflict. Cambridge, MA: Harvard U. Press.
Schlag, Karl H. 1998. "Why Imitate, and If So, How?" J. Econ. Theory, 78, pp. 130–56.
Selten, Reinhard. 1980. "A Note on Evolutionary Stable Strategies in Asymmetric Animal Conflicts," J. Theor. Bio., 84, pp. 93–101.
Skyrms, Brian. 1996. Evolution of the Social Contract. Cambridge: Cambridge U. Press.
Sobel, Joel. 1993. "Evolutionary Stability and Efficiency," Econ. Letters, 42, pp. 301–12.
Sonsino, Doron. 1997. "Learning to Learn, Pattern Recognition, and Nash Equilibrium," Games & Econ. Behavior, 18, pp. 286–331.
Sugden, Robert. 1989. "Spontaneous Order," J. Econ. Perspectives, 3, pp. 85–97.
Swinkels, Jeroen M. 1992. "Evolutionary Stability with Equilibrium Entrants," J. Econ. Theory, 57, pp. 306–32.
———. 1993. "Adjustment Dynamics and Rational Play in Games," Games & Econ. Behavior, 5, pp. 455–84.
Taylor, Peter D. and Leo B. Jonker. 1978. "Evolutionary Stable Strategies and Game Dynamics," Math. Biosciences, 40, pp. 145–56.
van Damme, Eric. 1987. Stability and Perfection of Nash Equilibria. Berlin: Springer-Verlag.
Wärneryd, Karl. 1993. "Cheap Talk, Coordination, and Evolutionary Stability," Games & Econ. Behavior, 5, pp. 532–46.
Weibull, Jörgen W. 1995. Evolutionary Game Theory. Cambridge: MIT Press.
Young, H. Peyton. 1993. "The Evolution of Conventions," Econometrica, 61, pp. 57–84.
———. 1996. "The Economics of Conventions," J. Econ. Perspectives, 10, pp. 105–22.
Young, H. Peyton and Dean Foster. 1991. "Cooperation in the Short and in the Long Run," Games & Econ. Behavior, 3, pp. 145–56.
Zeeman, E. 1980. "Population Dynamics from Game Theory," in Z. Nitecki and C. Robinson, pp. 471–97.
———. 1981. "Dynamics of the Evolution of Animal Conflicts," J. Theor. Bio., 89, pp. 249–70.
Zermelo, Ernst. 1912. "Über eine Anwendung der Mengenlehre auf die Theorie des Schachspiels," Proceedings of the Fifth International Congress of Mathematicians, II, pp. 501–504.