Proc. Natl. Acad. Sci. USA Vol. 96, pp. 10564–10567, September 1999

Perspective

Stochastic game theory: For playing games, not just for doing theory

Jacob K. Goeree† and Charles A. Holt

Department of Economics, Rouss Hall, University of Virginia, Charlottesville, VA 22903

Recent theoretical advances have dramatically increased the relevance of game theory for predicting human behavior in interactive situations. By relaxing the classical assumptions of perfect rationality and perfect foresight, we obtain much improved explanations of initial decisions, dynamic patterns of learning and adjustment, and equilibrium steady-state distributions.

About 50 years ago, John Nash walked into the office of the Chair of the Princeton Mathematics Department with an existence proof for N-person games that was soon published in these Proceedings (1). von Neumann dismissively remarked "that's trivial, you know, that's just a fixed point theorem" (ref. 2, p. 94). But word of Nash's work spread quickly, and with the Nash equilibrium as its centerpiece, game theory has now gained the central role first envisioned by von Neumann and Morgenstern (3). Game theory is rapidly becoming a general theory of social science, with extensive applications in economics, psychology, political science, law, and biology. There is, however, widespread criticism of theories based on the "rational choice" assumptions of perfect decision making (no errors) and perfect foresight (no surprises). This skepticism is reinforced by evidence from laboratory experiments with financially motivated subjects. Nash participated in such experiments as a subject and later designed similar experiments of his own but lost whatever confidence he had in game theory when he saw how poorly it predicted human behavior (2). And Reinhard Selten, who shared the 1994 Nobel Prize in economics with Nash and Harsanyi, remarked that "game theory is for proving theorems, not for playing games" (personal communication). This paper describes new developments in game theory that relax the classical assumptions of perfect rationality and foresight. These approaches to introspection (before play), learning (from previous plays), and equilibrium (after a large number of plays) provide complementary perspectives for explaining actual behavior in a wide variety of games.

Coordination and Social Dilemma Games

The models summarized here have been influenced strongly by data from experiments that show disturbing differences between game-theoretic predictions and the behavior of human subjects (4–7). Here, we consider a social dilemma game for which Nash's theory predicts a unique equilibrium that is "bad" for all concerned and a coordination game in which any common effort level is an equilibrium. That is, the Nash equilibrium makes no clear prediction.

The social dilemma is based on a story in which two travelers lose luggage with identical contents. The airline promises to pay any claim in an acceptable range as long as the claims are equal. If not, the higher claimant is assumed to have lied, and both are reimbursed at the lower claim, with a penalty, R, being taken from the high claimant and given to the low claimant. A Nash equilibrium in this context is a pair of claims that survives an "announcement test": if each person writes down a claim and then announces it, neither should want to reconsider. Note that, when R > 1, each traveler has an incentive to "undercut" any common claim. For example, suppose the range of acceptable claims is from 80 to 200; then, a common claim of 200 yields 200 for both, but a deviation by one person to 199 raises that person's payoff to 199 + R. The paradoxical Nash equilibrium of this "traveler's dilemma" is for both travelers to claim 80, the lowest possible amount (8).

Simple intuition suggests that claims are likely to be much higher when the penalty parameter, R, is low, but that claims may approach the Nash prediction when R is high. Fig. 1 shows the frequency of actual decisions in the final five rounds of an experiment with randomly matched student subjects for R = 10 (red bars), R = 25 (yellow bars), and R = 50 (blue bars) (9). The task for theory is to explain these intuitive treatment effects, which contradict the Nash prediction of 80, independent of R.

FIG. 1. Experiment based on the traveler's dilemma game, using randomly matched student subjects who made claim decisions independently in a sequence of 10 periods. Earnings ranged from $24 to $44 and were paid in private, immediately after the experiment. The frequency of actual decisions for the final five periods is indicated by the blue bars for R = 50, by the yellow bars for R = 25, and by the red bars for R = 10. With R = 50, the average claim was quite close to the Nash prediction of 80, but, with R = 10, the average claim started high (at ≈180) and moved away from the Nash prediction, ending up at 186 in the final five periods. The observed average claim varies with the penalty/reward parameter R in an intuitive manner (a higher R results in lower claims), in sharp contrast with the Nash prediction of 80, independent of R.
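To make the undercutting logic concrete, here is a minimal sketch of the claim-reimbursement rule behind the experiment in Fig. 1 (our illustration, not the authors' code; the function name td_payoff and the printed checks are ours):

```python
def td_payoff(own, other, R):
    """Traveler's dilemma: both are reimbursed at the minimum claim,
    and R is transferred from the high claimant to the low claimant."""
    if own < other:
        return own + R      # low claimant collects the reward
    if own > other:
        return other - R    # high claimant pays the penalty
    return own              # equal claims are paid in full

# Announcement test at a common claim of 200 with R = 10:
print(td_payoff(200, 200, R=10))   # 200
print(td_payoff(199, 200, R=10))   # 209: undercutting to 199 pays 199 + R
# Only the minimum claim of 80 survives; claiming above it backfires:
print(td_payoff(81, 80, R=10))     # 70
```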
The second game is similar, with payoffs again determined by the minimum decision. Players choose "effort levels," and both have to perform a costly task to raise the joint production level. A player's payoff is the minimum of the two efforts minus the cost of the player's own effort: π_i = min{x_1, x_2} − c·x_i, where x_i is player i's effort level and c < 1 is a cost parameter. Consider any common effort level: a unilateral increase is costly but does not affect the minimum, and a unilateral decrease will reduce the minimum by more than the cost reduction because c < 1. Hence, any common effort survives the Nash announcement test.
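A companion sketch for the minimum-effort coordination game (again our own illustration; the effort and cost values mirror the high-cost treatment described below) shows why every common effort passes the announcement test when c < 1:

```python
def coord_payoff(own, other, c):
    """Minimum-effort coordination game: the minimum of the two efforts
    minus the cost of one's own effort, with cost parameter c < 1."""
    return min(own, other) - c * own

c = 0.75                          # high-cost treatment
print(coord_payoff(150, 150, c))  # 37.5 at a common effort of 150
print(coord_payoff(160, 150, c))  # 30.0: extra effort is costly, minimum unchanged
print(coord_payoff(140, 150, c))  # 35.0: the minimum falls by more than the cost saved
```

Both unilateral deviations lower the deviator's payoff, so any common effort is a Nash equilibrium.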

†To whom reprint requests should be addressed. E-mail: jg2n@virginia.edu.


A high effort choice is riskier when the effort cost is high, which suggests that actual behavior might be sensitive to the effort cost. Fig. 2 shows the time sequences of average effort choices for three groups of 10 randomly matched subjects who made effort choices in the range from 110 to 170 cents (J.K.G. and C.A.H., unpublished work). There is an upward pattern for the low-cost treatment (thin green lines for c = 0.25) and an essentially symmetric downward adjustment for the high-cost treatment (thin blue lines for c = 0.75). The thick color-coded lines show average efforts for each treatment. The Nash equilibrium, or any standard variant of it, does not predict the strong treatment effect, which is consistent with simple intuition about the effects of effort costs. Another interesting result is that subjects may "get stuck" in a low-effort equilibrium (10, 11).

FIG. 2. Experiment based on the coordination game, using randomly matched student subjects who made effort decisions independently in a sequence of 10 periods. Average effort choices are given by the thin green lines for the low-cost sessions (with c = 0.25) and by the thin blue lines for the high-cost sessions (with c = 0.75). The thick lines show average efforts for both treatments. As simple economic intuition would suggest, higher effort costs result in lower efforts, an effect that is not predicted by the Nash equilibrium. Note that, in one of the high-cost sessions, effort choices gravitate toward the lowest possible effort choice of 110, the equilibrium that is worst for all concerned. The thin red lines give the average effort choices predicted by an evolutionary adjustment model in which players tend to move in the direction of higher payoffs but may make mistakes in doing so.

For both games, the most salient feature of observed behavior is not predicted by classical game theory. To understand these results, note that players are usually unsure about what others will do and that there may be randomness in the way individuals process and act on information. Even relatively small payoff asymmetries and small amounts of noise can have large "snowball" effects in an interactive, strategic game, and this intuition is a key element of the models presented below.

Evolutionary Dynamics

A natural approach to explaining divergent adjustment patterns like those in Fig. 2 is to develop models of evolution and learning (12–18). The idea behind evolution in this context is based not on the notion of reproductive fitness but, rather, on the observation that people tend to adopt successful strategies. Evolutionary models typically add stochastic elements that are reminiscent of biological mutation. Consider a population of players who are characterized by their decision, x(t), at time t, which earns an expected payoff of π^e(x(t), t), where the expectation is taken with respect to the current population distribution of decisions, F(x, t). The evolutionary assumption is that players' decisions tend to move in the direction of higher expected payoffs. Hence, a player's decision will increase if the expected payoff function is increasing at x(t), and vice versa. The decisions thus evolve over time, with the rate of change proportional to the slope of expected payoff, π^e′, plus a stochastic Brownian motion term: dx = π^e′(x(t), t) dt + σ dw(t), where σ determines the relative importance of the random shock dw(t). It can be shown that this individual adjustment process translates into a differential equation for the population distribution:

    ∂F(x, t)/∂t = −π^e′(x, t) f(x, t) + μ f′(x, t),    [1]

where μ = σ²/2 is a noise or error parameter and f and f′ represent the population density and its slope. This is the famous Fokker-Planck equation from statistical physics, which has been derived in this interactive context (S. P. Anderson, J.K.G., and C.A.H., unpublished work). The evolutionary process in Eq. 1 is a nonlinear partial differential equation that can be solved by using numerical techniques. The red lines in Fig. 2 show the resulting trajectories of average efforts for the two treatments of the coordination game experiment. This model explains both the strong treatment effect and the general qualitative features of the time path of the average effort choice.
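Eq. 1 can be integrated with standard finite differences. The sketch below is our illustration rather than the authors' code: it assumes the minimum-effort game, where the expected-payoff slope is π^e′(x) = 1 − F(x) − c (the derivative of E[min(x, X)] − cx is 1 − F(x) − c when the opponent's effort X is drawn from F), a uniform initial distribution over the 110–170 range, and illustrative values for the noise parameter and time step:

```python
import numpy as np

# Grid over the effort range [110, 170] used in the experiment.
x = np.linspace(110.0, 170.0, 121)
dx = x[1] - x[0]
c, mu, dt = 0.75, 8.0, 0.005        # cost, noise parameter, time step (illustrative)

# Start from a uniform distribution of efforts.
F = (x - x[0]) / (x[-1] - x[0])

for _ in range(20000):
    f = np.gradient(F, dx)          # density f = F'
    fprime = np.gradient(f, dx)     # its slope f'
    slope = (1.0 - F) - c           # expected payoff slope pi_e'(x) for this game
    F = F + dt * (-slope * f + mu * fprime)   # explicit Euler step on Eq. 1
    F = np.clip(F, 0.0, 1.0)        # keep F a valid distribution function
    F[0], F[-1] = 0.0, 1.0          # boundary conditions

mean_effort = x[0] + np.trapz(1.0 - F, x)     # E[X] = x_min + integral of (1 - F)
print(f"steady-state average effort: {mean_effort:.1f}")
```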
Learning Dynamics

Evolutionary models are sometimes criticized on the grounds that they ignore the cognitive abilities of human subjects to learn and adapt. The learning model that is closest to the evolutionary approach is "reinforcement learning," based on the psychological insight that successful strategies will be reinforced and used more frequently. Whereas the previous section describes the evolution of decisions in a "large" population, reinforcement learning pertains to changes in the decision probabilities of a single decision maker. Reinforcement models are formalized by giving each decision an initial weight and then adding the payoff actually obtained for a chosen decision to the weight for that decision (7, 19). The model is stochastic in the sense that the probability that a decision is taken is the ratio of its own weight to the sum of all weights. Simulations show that reinforcement learning models can explain key features of data from many economics experiments (19).

Alternatively, learning can be modeled in terms of beliefs about others' decisions. For example, suppose beliefs are characterized by a weight for each possible value of the other player's decision and that the subjective probability associated with each decision is its weight divided by the sum of all weights. Weights are updated over time by adding 1 to the weight of the decision that is actually observed, while the other weights remain unaltered. The resulting subjective probabilities determine expected payoffs, which in turn determine choices. To allow for some randomness in responses to payoff differences, psychologists have proposed probabilistic choice rules that have the desirable property that decisions with higher expected payoffs are more likely to be chosen, although not necessarily with probability 1. In particular, the "logit" probabilistic choice rule specifies that the probability of selecting a particular decision, i, is proportional to an exponential function of its expected payoff, π^e(i):

    Pr(i) = exp(π^e(i)/μ) / Σ_{j=1}^{N} exp(π^e(j)/μ),    i = 1, ..., N.    [2]

As μ goes to infinity, all choice probabilities are equal, regardless of payoff differences, but small payoff differences will have large effects when μ goes to 0. Fig. 3 shows simulated decisions with noisy learning, together with the observed choices of human subjects, for the three treatments of the traveler's dilemma game. Simulations of this type also have been used to explain adjustment patterns in other experiments (20, 21).
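Both rules translate directly into code. In this sketch (our illustration; the three-decision payoffs and unit initial weights are placeholders), reinforce implements the weight-proportional reinforcement rule and logit_choice implements Eq. 2:

```python
import numpy as np

rng = np.random.default_rng(0)

def logit_choice(expected_payoffs, mu):
    """Eq. 2: choice probabilities proportional to exp(pi_e / mu)."""
    z = np.asarray(expected_payoffs) / mu
    z -= z.max()                      # guard against overflow
    p = np.exp(z)
    return p / p.sum()

def reinforce(weights, payoff_of, periods=100):
    """Reinforcement learning: choose in proportion to the weights,
    then add the realized payoff to the chosen decision's weight."""
    for _ in range(periods):
        p = weights / weights.sum()
        i = rng.choice(len(weights), p=p)
        weights[i] += payoff_of(i)
    return weights

# Illustrative three-decision example with fixed expected payoffs.
payoffs = np.array([1.0, 2.0, 3.0])
print(logit_choice(payoffs, mu=0.5))    # sharp: strongly favors decision 3
print(logit_choice(payoffs, mu=50.0))   # nearly uniform for large mu
print(reinforce(np.ones(3), lambda i: payoffs[i]))
```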

FIG. 3. Patterns of adjustment for the traveler's dilemma game. The thick lines show the average claims of human subjects in sessions with a penalty/reward parameter of R = 50 (blue line), R = 25 (yellow line), and R = 10 (red line). The color-coded thin lines represent simulated average claims for the different treatments. These simulations are based on a Bayesian learning model, in which players use observations of their opponents' past play to update their beliefs about what will happen next. Adding logit errors to such a naive learning model results in predictions that conform nicely with the actual adjustment patterns observed in the laboratory.

Logit Equilibrium

In a steady state, the distribution function in Eq. 1 stops evolving; that is, ∂F/∂t = 0. Setting the right side of Eq. 1 to 0, dropping the time arguments, and rearranging yields a differential equation for the equilibrium density: f′(x)/f(x) = π^e′(x)/μ. This equation can be integrated to yield f(x) = k exp(π^e(x)/μ), which is the continuous analogue of Eq. 2. Thus, a gradient-based evolutionary model with Brownian motion produces the logit model in a steady state. Likewise, when simulated behavior in learning models with logit errors settles down, the steady-state distributions of decisions are consistent with the logit rule in Eq. 2.

In equilibrium, Eq. 2 must be interpreted carefully because the choice density determines the expected payoffs, which in turn determine the choice density. In other words, the logit equilibrium is a fixed point: the "belief" density that goes into the expected payoff function on the right side of Eq. 2 must match the "choice" density that comes out of the logit rule on the left side. It has been proven that such an equilibrium exists for all games with a finite number of decisions (22, 23). This elegant proof is based on a fixed-point theorem, like Nash's half-page proof in these Proceedings (1).

The equilibrium conditions in Eq. 2 can be solved numerically by using an estimated error parameter. Fig. 4 shows the logit equilibrium densities for the three treatments of the traveler's dilemma experiment. The theoretical densities pick up the general location of the data frequencies in Fig. 1, with the colors used to match predictions and data for each treatment. There are discrepancies, but the logit equilibrium predictions are superior to the Nash prediction of 80 for all treatments. The logit equilibrium has been successfully applied to explain behavior in a variety of other economic environments (refs. 22–25; R. D. McKelvey and T. R. Palfrey, personal communication).

FIG. 4. Logit equilibrium densities for the traveler's dilemma game with a penalty/reward parameter of R = 50 (blue line), R = 25 (yellow line), and R = 10 (red line). The logit equilibrium is capable of reproducing the most salient feature of the human data: that is, the intuitive effect of the penalty/reward parameter on average claims. Note that the logit equilibrium predictions are roughly consistent with the actual human data shown in Fig. 1 and are far superior to the Nash prediction of 80 (green bar), which is independent of R.
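One way to solve the fixed-point condition numerically is damped iteration on the logit response map. The sketch below is our own construction, not the authors' estimation code: the discretized claim grid (steps of 10), the value μ = 10, and the damping factor are illustrative assumptions:

```python
import numpy as np

def logit_equilibrium(A, mu, iters=5000, damp=0.2):
    """Find p* = phi_mu(p*) for a symmetric game, where A[i, j] is the
    payoff to playing decision i against an opponent playing j."""
    p = np.full(A.shape[0], 1.0 / A.shape[0])   # start from uniform beliefs
    for _ in range(iters):
        pi_e = A @ p                            # expected payoff of each decision
        z = pi_e / mu - (pi_e / mu).max()
        q = np.exp(z); q /= q.sum()             # logit response (Eq. 2)
        p = (1 - damp) * p + damp * q           # damped step toward the fixed point
    return p

# Traveler's dilemma on claims 80..200 in steps of 10, with R = 10.
claims = np.arange(80, 201, 10)
R = 10
A = np.where(claims[:, None] < claims[None, :], claims[:, None] + R,
    np.where(claims[:, None] > claims[None, :], claims[None, :] - R,
             claims[:, None])).astype(float)

p_star = logit_equilibrium(A, mu=10.0)
print(claims[np.argmax(p_star)])   # modal claim: well above 80 when R is small
```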
A Model of Iterated Noisy Introspection

Many political contests, legal disputes, and special auctions are played only once, in which case there is no opportunity to learn, adjust strategies, and reach an equilibrium. One way to proceed in these situations is to use general intuition and introspection about what the other player(s) might do, what they think others might do, and so on. Edgar Allan Poe mentions this type of iterated introspection in an account of a police inspector trying to decide where to search for the "purloined letter." Economists have long thought about such iterated expectations and have noted that the resulting "infinite regression" often leads to a Nash equilibrium and hence does not provide a new approach to behavior in one-shot games.

Naturally, our approach is to introduce noise into the introspective process, with the intuition that players' own decisions involve less noise than their perceptions of others' decisions, which in turn involve less noise than other players' perceptions of others' decisions, and so on. Our model is based on the probabilistic choice mapping in Eq. 2, which we will express compactly as p = φ_μ(q), where q represents the vector of belief probabilities that determine expected payoffs and p is the vector of choice probabilities that is determined by the probabilistic choice rule φ_μ. We assume that the error parameter associated with each higher level of iterated introspection is τ > 1 times the error parameter associated with the lower level. For instance, p = φ_μ(φ_τμ(q)) represents a player's noisy (μ) response to the other player's noisy (τμ) response to beliefs q. The "telescope" parameter τ determines how fast the error rate blows up with further iterations. We are interested in the choice probabilities as the number of iterations goes to infinity:

    p = lim_{n→∞} φ_μ(φ_τμ(··· φ_{τ^n μ}(q))).    [3]

This limit is well defined for τ > 1, and because φ_∞ maps the whole probability simplex to a single point, the limit is independent of the initial belief vector q. It has been shown (J.K.G. and C.A.H., unpublished work) that Eq. 3 provides a good explanation of (nonequilibrium) play in many types of one-shot games; see ref. 26 for alternative approaches (C. M. Capra, personal communication).

The logit equilibrium arises as a limit case of this two-parameter introspective model. Recall that a logit equilibrium is a fixed point of φ_μ: that is, a vector p* that satisfies p* = φ_μ(p*). For τ = 1, a fixed point of φ_μ is also a fixed point of Eq. 3. So, if the introspective model converges for τ = 1, the result is a logit equilibrium. If τ > 1, Eq. 3 determines beliefs p that will generally not match the iterated layers of introspective beliefs on the right side. In this sense, the introspective model generalizes the logit equilibrium by relaxing the assumption of perfect consistency between actions and beliefs, just as the logit equilibrium generalizes Nash by relaxing the assumption of perfectly rational decision making.
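Eq. 3 can be approximated by truncating the composition at a finite depth and applying the noisiest (innermost) response first. This is again a sketch under stated assumptions: the same illustrative traveler's dilemma payoff matrix as above, and arbitrary values for μ and τ:

```python
import numpy as np

def logit_response(pi_e, mu):
    """Eq. 2 applied to a vector of expected payoffs."""
    z = pi_e / mu - (pi_e / mu).max()
    p = np.exp(z)
    return p / p.sum()

def noisy_introspection(A, mu, tau, depth=30):
    """Approximate Eq. 3: the k-th level of iterated beliefs gets error
    rate tau^k * mu, composed from the deepest (noisiest) level outward."""
    p = np.full(A.shape[0], 1.0 / A.shape[0])   # initial beliefs q (irrelevant in the limit)
    for k in reversed(range(depth)):
        p = logit_response(A @ p, mu * tau ** k)
    return p

# Same traveler's dilemma payoff matrix (claims 80..200, R = 10).
claims = np.arange(80, 201, 10)
R = 10
A = np.where(claims[:, None] < claims[None, :], claims[:, None] + R,
    np.where(claims[:, None] > claims[None, :], claims[None, :] - R,
             claims[:, None])).astype(float)

p = noisy_introspection(A, mu=10.0, tau=2.0)
print(claims[np.argmax(p)])   # predicted modal claim in one-shot play
```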

Conclusion

This paper describes three complementary modifications of classical game theory. The models of introspection, learning/evolution, and equilibrium contain common stochastic elements that represent errors or unobserved shocks. Like the "three friends" of classical Chinese gardening (pine, prunus, and bamboo), these approaches fit together nicely, each with a different purpose. Models of iterated noisy introspection are used to explain beliefs and choices in games played only once, where surprises are to be expected and beliefs are not likely to be consistent with choices. With repetition, beliefs and decisions can be revised through learning or evolution. Choice distributions will tend to stabilize when there are no more surprises in the aggregate, and the resulting steady state constitutes a noisy equilibrium.

These theoretical perspectives have allowed us to predict initial play, adjustment patterns, and final tendencies in laboratory experiments. There are discrepancies, but the overall pattern of results is surprisingly coherent, especially considering that we are using human subjects in interactive situations. The resulting models have the empirical content that makes them relevant for playing games, not just for doing theory.

We thank Melayne McInnes for a helpful suggestion. This research was funded in part by the National Science Foundation (Grants SBR-9617784 and SBR-9818683).

1. Nash, J. (1950) Proc. Natl. Acad. Sci. USA 36, 48–49.
2. Nasar, S. (1998) A Beautiful Mind (Simon and Schuster, New York).
3. von Neumann, J. & Morgenstern, O. (1944) Theory of Games and Economic Behavior (Princeton Univ. Press, Princeton, NJ).
4. Ochs, J. (1994) Games Econ. Behav. 10, 202–217.
5. McKelvey, R. D. & Palfrey, T. R. (1992) Econometrica 60, 803–836.
6. Beard, T. R. & Beil, R. O. (1994) Management Sci. 40, 252–262.
7. Roth, A. E. & Erev, I. (1995) Games Econ. Behav. 8, 164–212.
8. Basu, K. (1994) Am. Econ. Rev. 84, 391–395.
9. Capra, C. M., Goeree, J. K., Gomez, R. & Holt, C. A. (1999) Am. Econ. Rev., in press.
10. Van Huyck, J. B., Battalio, R. C. & Beil, R. O. (1990) Am. Econ. Rev. 80, 234–248.
11. Cooper, R., DeJong, D. V., Forsythe, R. & Ross, T. W. (1992) Q. J. Econ. 107, 739–771.
12. Foster, D. & Young, P. (1990) Theor. Popul. Biol. 38, 219–232.
13. Fudenberg, D. & Harris, C. (1992) J. Econ. Theor. 57, 420–441.
14. Kandori, M., Mailath, G. J. & Rob, R. (1993) Econometrica 61, 29–56.
15. Binmore, K., Samuelson, L. & Vaughan, R. (1995) Games Econ. Behav. 11, 1–35.
16. Crawford, V. P. (1991) Games Econ. Behav. 3, 25–59.
17. Crawford, V. P. (1995) Econometrica 63, 103–144.
18. Young, P. (1993) Econometrica 61, 57–84.
19. Erev, I. & Roth, A. E. (1998) Am. Econ. Rev. 88, 848–881.
20. Cooper, D. J., Garvin, S. & Kagel, J. H. (1997) Rand J. Econ. 28, 662–683.
21. Camerer, C. & Ho, T. H. (1999) Econometrica, in press.
22. McKelvey, R. D. & Palfrey, T. R. (1995) Games Econ. Behav. 10, 6–38.
23. McKelvey, R. D. & Palfrey, T. R. (1998) Exp. Econ. 1, 9–41.
24. Anderson, S. P., Goeree, J. K. & Holt, C. A. (1998) J. Political Econ. 106, 828–853.
25. Anderson, S. P., Goeree, J. K. & Holt, C. A. (1998) J. Public Econ. 70, 297–323.
26. Harsanyi, J. C. & Selten, R. (1988) A General Theory of Equilibrium Selection in Games (MIT Press, Cambridge, MA).