American Economic Association

Testing in the Field: Swedish LUPI Lottery Games Author(s): Robert Östling, Joseph Tao-yi Wang, Eileen Y. Chou and Colin F. Camerer Source: American Economic Journal: Microeconomics, Vol. 3, No. 3 (August 2011), pp. 1-33 Published by: American Economic Association Stable URL: http://www.jstor.org/stable/41237195 . Accessed: 21/07/2014 17:24

Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp

. JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected].

.

American Economic Association is collaborating with JSTOR to digitize, preserve and extend access to American Economic Journal: Microeconomics.

http://www.jstor.org

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions AmericanEconomie Journal: Microeconomics 3 (August 2011): 1-33 http://www. aeaweb. org/ articles. php ?doi= 10.1257 /mie. 3. 3. 1

TestingGame Theoryin the Field: Swedish LUPI LotteryGames1^

By Robert Östling, JosephTao-yi Wang, Eileen Y. Chou, and Colin F. Camerer*

Game theoryis usuallydifficult to testin thefield because predic- tionstypically depend sensitively on features that are notcontrolled or observed.We conductone such testusing both laboratory and fielddata fromthe Swedish lowest unique positive integer (LUPI) game. In this game,players pick positiveintegers and whoever choosesthe lowest unique number wins. Equilibrium predictions are derivedassuming Poisson distributedpopulation uncertainty. The fieldand lab data showsimilar patterns. Despite various deviations fromequilibrium, there is a surprisingdegree of convergence toward equilibrium.Some deviations can be rationalizedby a cognitivehier- archymodel. (JEL C70, C93, D44, H27)

theorypredictions are challengingto testwith field data because those predictionsare usuallysensitive to detailsabout strategies, information and payoffswhich are difficultto observein thefield. As RobertAumann pointed out: "In applications,when you wantto do somethingon thestrategic level, you must havevery precise rules [. . .] An auctionis a beautifulexample of this,but it is very special.It rarelyhappens that you have rules like that (Eric van Damme 1998)." In thispaper we exploitsuch a happening,using field data from a Swedishlottery game.In thislottery, players simultaneously choose positiveintegers from 1 to K. Thewinner is theplayer who chooses the lowest number that nobody else picked.We call thisthe LUPI game,because the /owest wnique positive integer wins.1 Because strategiesand payoffsare known,the fieldsetting is unusuallywell-structured

* Östling:Institute for International Economic Studies, Stockholm University, SE- 106 91 Stockholm,Sweden (e-mail:[email protected]); Wang: Department of Economics,National Taiwan University, 21 Hsu-Chow Road,Taipei 100, Taiwan (e-mail: [email protected]); Chou: Managementand Organization, Kellogg School of Management,Northwestern University, Evanston, IL 60201 (e-mail:e-chou@ kellogg.northwestern.edu); Camerer: Division forthe Humanitiesand Social Sciences, MC 228-77, CaliforniaInstitute of Technology,Pasadena, CA 91125 (e-mail:[email protected]). The firsttwo authors,Wang and Östling,contributed equally to thispaper. A previousversion of thispaper was includedin Östling'sdoctoral thesis. We are gratefulfor help- fulcomments from Vincent P. Crawford,Tore Ellingsen, Ido Erev,Magnus Johannesson, Botond Köszegi, David Laibson,Erik Lindqvist, Stefan Molin, Noah Myung,Rosemarie Nagel, Charles Noussair, Carsten Schmidt, Dylan Thurston,Dmitri Vinogradov, Mark Voorneveld,Jörgen Weibull, seminar participants and severalanonymous referees.Östling acknowledges financial support from the Jan Wallander and Tom HedeliusFoundation. Wang acknowledgessupport from the NSC ofTaiwan (NSC 98-2410-H-002-069-MY2, NSC 99-2410-H-002-060-MY3). Camereracknowledges support from the NSF HSD program,HSFP, and theBetty and GordonMoore Foundation. Ť To commenton thisarticle in theonline discussion forum, or to viewadditional materials, visit the article page at http://www.aeaweb.org/articles.php?doi= 1 0. 1257/mic.3.3. 1. The Swedishcompany called the game Limbo, but we thinkLUPI is moremnemonic, and more apt because in thetypical game of limbo, two players who tie in howlow theycan slideunderneath a bar do notlose. 1

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 2 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1 comparedto otherstrategic field data on contracting,pricing, entry, information dis- closure,or auctions.The priceone paysfor clear structure is thatthe game does not veryclosely resemble any other familiar economic game. Gaining structure at the expenseof generality is similarto thetradeoff faced in usingdata from game shows andsports to understandgeneral strategic principles. This paperanalyzes LUPI theoreticallyand reportsdata from the Swedish field lotteryand fromparallel lab experiments.The paperhas severaltheoretical and empiricalparts. The partshave a coherentnarrative flow because each partraises somenew question which is answeredby thenext part. The overarchingquestion, whichis centralto all empiricalgame theory, is thisone: What strategic models best explainbehavior in games? The firstspecific question is Ql: Whatdoes an equilibriummodel of behavior predictin thesegames? To answerthis question, we firstnote that subjects do not knowexactly how manyother players are participatingin thegame and thatthe actualnumber of players varies from day to day.We thereforeapproximate the equi- libriumby applying the theory of Poisson games.2 In Poissongames, the number of playersis Poisson-distributed(Roger B. Myerson1998).3 Remarkably, assuming a variablenumber of playersrather than a fixednumber makes computation of equi- libriumsimpler if the number of players is Poisson-distributed. The numberof players in theSwedish LUPI gameactually varies too much from day-to-dayto matchthe cross-dayvariance implicit in the Poisson assumption. However,the Poisson- is (probably)the only computable equilib- riumbenchmark. Field tests of theory always violate some of the assumptions of the theory,to somedegree; it is an empiricalquestion whether the equilibrium bench- markfits reasonably well despiteresting on incorrectassumptions. (We revisitthis importantissue in theconclusion after all thedata are presented.) Afterderiving the Poisson equilibrium in orderto answerQl, we comparethe Poissonequilibrium to thefield data. In our view,the equilibrium is surprisingly close (givenits complexityand counterintuitiveproperties). However, there are clearlylarge deviationsfrom the equilibriumprediction and some behaviorally interestingfine-grained deviations. These empirical results raises question Q2: Can non-equilibriumbehavioral models explain the deviations when the game is first played? The simpleLUPI structureallows us to providetentative answers to Q2 by comparingPoisson-Nash equilibrium predictions with predictions of a particu- lar parametricmodel of boundedlyrational play: the level-/:or cognitivehierar- chy (CH) approach.CH predictstoo manylow-number choices (compared to the Poisson-Nash),capturing some deviation of the field data. Because theLUPI game is simple,it is easy to go a stepfurther and runa lab experimentthat matches many of thekey features of thegame played in thefield.

As MiltonFriedman (1953) famouslynoted, theories with false assumptions could often predict well (and,in economics,often do). Thisalso distinguishesour paper from contemporaneous research on uniquebid auctions by Jürgen Eichberger and DmitriVinogradov (2008); AndreaGallice (2009); YaronRaviv and GaborVirag (2009); AmnonRapoport et al. (2009), andHarold Houba, Dinard van der Laan, andDirk Veldhuizen (forthcoming), which all assumethat the numberof players is fixedand commonly known.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTLINGETAL: TESTINGGAME THEORY IN THE FIELD 3

Thelab dataenable us to addressone more question: Q3: How welldoes behaviorin a lab experimentdesigned to closelymatch features of a fieldenvironment parallel behaviorin thefield? Q3 is importantbecause of an ongoingdebate about lab-field parallelismin economics,rekindled with some skepticism by StevenD. Levittand JohnA. List (2007) (see ArminFalk and JamesJ. Heckman2009 and Camerer forthcomingfor replies). We concludethat the basic empiricalfeatures of thelab andfield behavior are comparable.This close matchadds to a smallamount of evi- denceof howwell experimentallab datacan generalizeto a particularfield setting whenthe experiment was specificallyintended to do so. The abilityto trackdecisions by each playerin thelab also enablesus to answer someminor questions that cannot be answeredby field data. For example, it appears thatplayers tend to play recent winning numbers more, sociodemographic variables do notcorrelate strongly with performance, and there are not strong identifiable dif- ferencesin skillacross players (measured by winningfrequency). Beforeproceeding, we mustmention an importantcaveat. LUPI was notdesigned bythe lottery creators to be an exactmodel of a particulareconomic game. However, it combinessome strategicfeatures of interestingnaturally occurring games. For example,in gameswith congestion, a player'spayoffs are lowerif otherschoose thesame .Examples include choices of trafficroutes and researchtopics, or buyersand sellerschoosing among multiple markets. LUPI has theproperty of an extremecongestion game, in whichhaving even one otherplayer choose the same numberreduces one's payoffto zero.4Indeed, LUPI is similarto a game in whichbeing first matters (e.g., in a patentrace), but if players are tiedfor first they do notwin. One close marketanalogue to LUPI is thelowest unique bid auction (LUBA; see Eichbergerand Vinogradov 2008; Gallice2009; Ravivand Virag 2009; Rapoportet al. 2009; and Houba, van derLaan, and Veldhuizenforthcoming). In theseauctions, an objectis sold to the lowestbidder whose bid is unique (or in someversions, to thehighest unique bidder). LUPI is simplerthan LUBA because winnersdo nothave to paythe amount they bid, and there are no privatevaluations and beliefsabout valuations of others.However, LUPI containsthe same essential strategicconflict: between wanting to choose low numbersand wantingto choose uniquenumbers. Finally,the scientificvalue of LUPI games is like the scientificvalue of data fromgame showsand professionalsports, such as Deal or No Deal (e.g., Steffen Andersenet al. 2008, and ThierryPost et al. 2008). Like theLUPI lottery,game showsand sportsare not designedto be replicasof typicaleconomic decisions. Nonetheless,game shows and sportsare widelystudied in economicsbecause they providevery clear field data about actual economic choices (often for high stakes), andthey have simple structures that can be analyzedtheoretically. The sameis true forLUPI. The nextsection provides a theoreticalanalysis of a simpleform of theLUPI game,the Poisson-Nashequilibrium. Section II reportsthe basic fielddata and

4Note,however, that LUPI is nota congestiongame as definedby Robert W. Rosenthal(1973) sincethe payoff fromchoosing a particularnumber does notonly depend on howmany other players picked that number, but also on howmany picked lower numbers.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 4 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011 comparethem to thePoisson-Nash approximate benchmark. It also introducesthe cognitivehierarchy model and asks whether it can explainthe field data. Section III describesthe lab replication.Section IV concludesthe paper. Supporting material is availablein a separateonline Appendix.

I. Theory

In thesimplest form of LUPI, thenumber of players, N, has a knowndistribution, theplayers choose integers from 1 to К simultaneously,and the lowest unique num- berwins. The winnerearns a payoffof 1, whileall othersearn 0.5 We firstanalyze the game whenplayers are assumedto be fullyrational, best responding,and have equilibrium beliefs. We assumethat the number of players N is a randomvariable that has a Poissondistribution.6 The Poissonassumption proves tobe easierto workwith than a fixedN, boththeoretically and computationally. The actualvariance of N in thefield data is muchlarger than in thePoisson distribution so thePoisson-Nash equilibrium is onlya computableapproximation to thecorrect equilibrium.Whether it is a good approximationwill partly be answeredby looking at how well thetheory fits the field data.7 In addition,we implementthe Poisson distributionofN exactlyin lab experiments.

A. Propertiesof Poisson Games

In thissection, we brieflysummarize the theory of Poissongames developed by Myerson(1998, 2000). The theoryis thenused in SectionIB to characterizethe Poisson-Nashequilibrium in theLUPI game. Gameswith population uncertainty relax the assumption that the exact number of playersis commonknowledge. In particular,in a Poissongame the number of play- ersN is a randomvariable that follows a Poissondistribution with mean n. We have -n к N ~ Poisson^): N = к withprobability , and,in thecase of a Bayesiangame (or thecognitive hierarchy model developed below),players' types are independentlydetermined according to theprobability distributionr - (r(t))teTon sometype space T. Let a typeprofile be a vectorof non-negativeintegers listing the number of players of each typet in Г, andlet Z(T) be theset of all suchtype profiles in thegame. Combining N and r we can describe

5 In thisstylized case, we assumethat if there is no lowestunique number then there is no winner.This simplifies theanalysis because it means that only the probability of being unique must be computed.In theSwedish game, if thereis no uniquenumber then the players who picked the smallest and least-frequently-chosennumber share the top6prize. Playersdid notknow the number of totalbets in boththe field and lab versionsof theLUPI game.Although playersin thefield could get information about the current number of bets that had been made so farduring the day, playershad to place theirbets before the game closed for the day and thereforecould not know with certainty the totalnumber of players that would participate in thatday. ' For smallN, we showin onlineAppendix A thatthe equilibrium probabilities for fixed-TV Nash and Poisson- Nashequilibrium are practically indistinguishable (Figure Al).

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 5 thepopulation uncertainty with the distribution y ~ Q(y) wherey G Z(T) andy(t) is thenumber of players of type t G T. Playershave a commonfinite action space С withat leasttwo alternatives, which generatesan actionprofile Z(C) containingthe number of players that choose each action.Utility is a boundedfunction U: Z(C) x С х Г- » M, wheref/(jc, b, t) is thepayoff of a playerwith type ¿, choosing action ¿?, and facingan opponentaction profileof x. Let x(c) denotethe number of otherplayers playing action с G С. Myerson(1998) showsthat the Poisson distribution has twoimportant properties thatare relevantfor Poisson games and simplifycomputations dramatically. The firstis thedecomposition property, which in thecase of Poissongames imply that thedistribution of typeprofiles for any y G Z(T) is givenby m U- m- • Hence,Y(t), the random number of playersof typet G Г, is Poissonwith mean nr(t), and is independentof Y(t') forany otherť G T. Moreover,suppose each playerindependently plays the mixed strategy o' choosingaction с e С withprob- abilitycr(c 1 1) givenhis typet. Then,by thedecomposition property, the number of playersof type t thatchooses action c, Y(c, f),is Poissonwith mean nr(t)a(c 11) and is independentof Y(c' ť) forany other c', ť . The secondproperty of Poissondistributions is the aggregation property, which statesthat any sum of independentPoisson randomvariables is Poisson distrib- uted.This property implies that the number of players (across all types)who choose actionc, X(c), is Poissonwith mean ]CřGrwr(ř)a(c 11), independent of X(c') forany otherc' G C. We referto thisproperty of Poisson games as theindependent actions (IA) property. Myerson(1998) also showsthat the Poisson game has anotheruseful property: environmentalequivalence (ЕЕ). Environmentalequivalence means that conditional on beingin thegame, a typet playerwould perceivethe population uncertainty as an outsiderwould, i.e., Q(y). If thestrategy and typespaces are finite,Poisson gamesare theonly games with population uncertainty that satisfy both IA and ЕЕ (Myerson1998). ЕЕ is a surprisingproperty. Take a PoissonLUPI gamewith 27 playerson average. In ourlab implementation,a large number of players are recruited and are told that the numberof players who will be active(i.e., play for real money) in eachperiod varies. Considera playerwho is toldshe is active.On theone hand,she might then act as if sheis playingagainst the number of opponent players that is Poisson-distributedwith a meanof 26 (sinceher active status has lowered the mean of the number of remaining players).On theother hand, the fact that she is activeis a cluethat the number of play- ersin thatperiod is large,not small. If TV is Poisson-distributedthetwo effects exactly cancelout so all activeplayers in all periodsact as ifthey face a Poisson-distributed numberof opponents. ЕЕ, combinedwith IA, makesthe analysis rather simple. An equilibriumfor the Poisson game is definedas a strategyfunction a suchthat everytype assigns positive probability only to actionsthat maximize the expected utilityfor players of this type; that is, forevery action с e С andevery type t G T,

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 6 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011

ifсг (с Iř) > 0 thenU(c 't,o) = maxU(b ' t,a) forthe expected utility

^|.,^1п('У')%>,).xez(c)cec ' xyc)! / where

t(c) = Y,r{t)a{c't) teT is themarginal probability that a randomsampled player will choose action с under a. Note thatthis equilibrium is by definitionsymmetric; asymmetric equilibria whereplayers of the same type could play differently are not defined in gameswith populationuncertainty since ex antewe do notknow the list of participating players. Myerson(1998) provesexistence of equilibriumunder all gamesof population uncertaintywith finite action and type spaces, which includes Poisson games.8 This existenceresult provides the basis for the following characterization ofthe Poisson- Nashequilibrium.

B. PoissonEquilibrium for the LUPI Game

In the(symmetric) Poisson equilibrium, all playersemploy the same mixed strat- egyp = (pup2, . . . ,Pk) whereY^i=' Pi - 1-Let therandom variable X(k) be the numberof players who pick к in equilibrium.Then, Pr (X(fc) = i) is theprobability thatthe number of playerswho pickк in equilibriumis exactly/. By environmen- tal equivalence(ЕЕ), Pr(X(ifc)= /) is also theprobability that / opponents pick к. Hence,the expected payoffs for choosing different numbers are:

тг(1) = Pr(X(l) - 0) - e~np> - • тг(2) - Pr(X(l) + 1) • Pr(X(2) = 0) = (1 npxe-™) e~np2

тг(З) = Pr(X(l) ф 1) - Pr(X(2) ф 1) • Pr(X(3) = 0)

тг(Л)= Щ Рг(*(0 Ф 4 * МВД = 0)

= - e-™ [Г! [l-nPie-^]j

8 Forinfinite types, Myerson (2000) provesexistence of equilibrium for Poisson games alone.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 7 forall /:> 1. If bothк and к + 1 are chosenwith positive probability in equilib- rium,then n(k) = n(k + 1). Rearrangingthis equilibrium condition implies

(1) enpM= enpk- npk.

Alternatively,this condition can be writtenas

(2) Pk- Рш = -тНп(1- W~n

In additionto condition1 (or 2), the probabilitiesmust sum up to one and the expectedpayoff from playing numbers not in thesupport of the equilibrium strategy cannotbe higherthan the numbers played with positive probability. The threeequilibrium conditions allow us to characterizethe equilibriumand showthat it is unique.

PROPOSITION 1: Thereis a uniquemixed equilibrium p = (рь ръ • • • , pK) ofthe PoissonLUPI gamethat satisfies the following properties:

1) Full support:pk > 0 forall k.

2) Decreasingprobabilities: pk+' < pkforallk.

3) Convexity/concavity: (pM - pk+2) > (Pk ~ Рш) far pk+{ > '/n, and (Pk+i - Pk+i) < (pk- Pk+i)forl/n > pk.

4) Convergenceto uniformplay withmany players: for anyfixed K, n - » oo impliesрш -> pk.

5) Probabilityasymptotes to zerowith more numbers to guess:for anyfixed n, К - > oc impliespK - ►0.

The proofis givenin theAppendix. The intuitionfor the results in Proposition1 are as follows.For the first property, first note that if к is chosen,so is к + 1, since deviatingfrom к to к + 1 wouldotherwise be profitable.Nothing matters if there is a smallernumber than к uniquelychosen by an opponent,but if not,picking к winsonly if nobodyelse choosesк, whilepicking к + 1 winsif eithernobody choosesк or ifmore than two opponents choose к. Togetherwith the fact that 1 has to be chosenguarantees full support. Secondly, lower numbers should be chosen moreoften because the LUPI rulefavors lower numbers. For example,if everyone is choosinguniformly, you shouldpick 1. However,as morepeople participate in thegame, this advantage disappears which implies convergence to uniform(prop- erty4).9 Thirdly,condition 2 showsthat the difference between pk and pk+x solely

9 For example,when К = 100 and n = 500, the mixtureprobabilities start at p] - 0.0124 and end with p91= 0.0043,/?98 = 0.0038,p99 = 0.0031,pm = 0.0023; so theratio of highestto lowestprobabilities is about

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 8 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1

x10"4

sо °- Above 1/53783: Concave

1 "

Below 1/53783: Convex, asymptotes to zero I 1 1 1 1 1 1 1 1 1 1 1 ol 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 10,000 Numberschosen (truncatedat 10,000)

Figure 1. Poisson-NashEquilibrium for the Field LUPI Game(n = 53,783,К = 99,999)

x dependson thefunction /(*) = xe wherex = npk.Since/ (x) > 0 ifx < 1, and f'(x) < 0 ifx > 1, thecritical cutoff for concavity /convexity happens dXpk = l/n. Lastly,since pK is thesmallest among all probabilities,if pK does notconverge to zeroas К becomeslarge, the probabilities will not sum up to one. In theSwedish game the average number of playerswas n - 53,783 and num- berchoices were positive integers up to К = 99,999.As Figure1 shows,the equi- libriuminvolves mixing with substantial probability between 1 and 5,500,starting from/?!= 0.0002025.The predictedprobabilities drop off very sharply at around 5,500.This is due to theconcavity /convexity switch around l/n, which happens at T = 5,513 (pT = 0.00001857); thedifference (рк - рш) increasesas you move toward'/n fromeither side. Figure1 showsonly the predicted probabilities for 1 to 10,000,since probabilities for numbers above 5,518 are positivebut minuscule. Notethat n = 53,783 < К = 99,999implies that K/n > 1, andhence, the concav- ity/convexityswitch (and the"sharp drop") has to occurat T < K.10 Thecentral empirical question that will be answeredlater is howwell actual behav- iorin thefield matches the equilibrium prediction in Figure1. Keep in mindthat the simplifiedgame analyzed in thissection differs in somepotentially important ways fromthe actual Swedish game. Computing the equilibrium is complicatedand its

six-to-one.When К = 100 and n = 5,000, all mixtureprobabilities for numbers 1 to 100 are 0.01 (up to two- decimalprecision). 10 pTis closeto 1/n by the concavity /convexity switch. So, Tis positivelyrelated to n. SincepK converges to zero forlarge К due to Property5 of Proposition1, T does notdepend on К as longas К is large(and "non-binding").

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 9 propertiesare notparticularly intuitive. It mighttherefore be surprisingif theactual datamatched the equilibrium closely. Because there are 49 daysof data, we can also see whetherchoices move in thedirection of the Poisson-Nash benchmark over time.

II. The Field LUPI Game

The fieldversion of LUPI, called Limbo,was introducedby the government- ownedSwedish gambling monopoly Svenska Spel on January29, 2007. This sec- tiondescribes its essential elements; additional description is in onlineAppendix C. In Limbo,players chose an integerbetween 1 and 99,999. Each numberbet costsSEK 10 (approximatelyEURO 1). The game was playeddaily and thewin- ningnumber was presentedon TV in theevening and on theInternet. The winner received18 percentof the total sum of bets, with the prize guaranteed to be at least 100,000SEK (approximatelyEURO 10,000). If no numberwas uniquethe prize was sharedevenly among those who chose the smallest and least-frequently chosen number.There were also smallersecond and thirdprizes (SEK 1,000and SEK 20) forbeing close to thewinning number. Duringthe first three to fourweeks, it was onlypossible to play the game at physicalbranches of SvenskaSpel by fillingout a form(Figure All in theonline * Appendix).The formallowed players to bet on up to six numbers,1to play the same numbersfor up to 7 daysin a row,or to leta computerchoose random numbers for them(a "HuxFlux"option). During the following weeks it was also possibleto play online,see onlineAppendix С fora descriptionof the online interface. Daily data were downloadedfor the firstseven weeks, ending on the 18thof March2007. The game was stoppedon March24th, one day aftera newspaper articleclaimed that some players had colluded in thegame, but it is unclearwhether collusionactually occurred. Unfortunately,we have only gained access to aggregatedaily frequencies, not to individuallevel data. We also do notknow how many players used therandomiza- tionHuxFlux option. However, because the operators told us howHuxFlux worked, we can estimatethat approximately 19 percentof playerswere randomizing in the firstweek.12 Notethat the theoretical analysis of the LUPI gamein theprevious section differs fromthe field LUPI game in threeways. First, the theory used a tie-breakingrule in whichnobody wins if there is no uniquelychosen number (to simplifyexpected payoffcalculations enormously). In the fieldgame, however, players who tie by choosingthe smallest and least-frequentlychosen number share the prize. This is a minordifference because theprobability that there is no uniquenumber is very smalland it neverhappened during the 49 daysfor which we havedata. A second, moreimportant, difference is thatwe assumethat each playercan onlypick one

1' The rulethat players could only pick up to six numbersa day was enforcedby therequirement that players had to use a "gambler'scard" linked to theirpersonal identification number when they played. Colluding in LUPI can conceivablyincrease the probability of winningbut would require a remarkabledegree of coordinationacross a largesyndicate, and is also ifothers collude in a similar 12 risky way. In thefirst week, the randomizer chose numbers from 1 to 15,000with equal probability.The dropin numbers justbelow and above 15,000suggests the 19 percentfigure.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 10 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011

Table 1- Descriptive Statistics and Poisson-NashEquilibrium Predictions for Field LUPI Game

All days Weekl Week7 Eq Avg. SD Min. Max. Avg. Avg. Avg. Numberof bets 53,783 7J82 38,933 69,830 57,017 47,907 53,783 Averagenumber played 2,835 813 2,168 6,752 4,512 2,484 2,595 Mediannumber played 1,675 348 436 2,273 1,204 1,936 2,542 Winningnumber 2,095 1,266 162 4,465 1,159 1,982 2,595 Lowestnumber not played 3,103 929 480 4,597 1,745 3,462 4,938 AboveT= 5,513 (percent) 6.65 6.24 2.56 30.32 20.11 2.80 0.00 First(25 percent)quartile 780 227 66 1,138 425 914 1,251 Third(75 percent)quartile 2,898 783 2,130 7,610 3,779 3,137 3,901 Below 100 (percent) 6.08 4.84 2.72 2.97 15.16 3.19 2.02 Below 1,000(percent) 32.31 8.14 21.68 63.32 44.91 27.52 20.03 Below 5,000 (percent) 92.52 6.44 68.26 96.74 78.75 96.44 93.32 Below 10,000(percent) 96.63 3.80 80.70 98.94 88.07 98.81 100.00 Evennumbers (percent) 46.75 0.58 45.05 48.06 45.91 47.45 50.00 Divisibleby 10 (percent) 8.54 0.466 7.61 9.81 8.43 9.01 9.99 Proportion1900-2010 (percent) 71.61 4.28 67.33 87.01 79.39 68.79 49.78 11,22, ...,99 (percent) 12.2 0.82 10.8 14.4 12.39 11.44 9.09 111,222, . . . , 999 (percent) 3.48 0.65 2.48 4.70 4.27 2.78 0.90 1111,2222,...,9999(1/1,000) 4.52 0.73 2.81 5.80 4.74 3.95 0.74 11111,22222, ..., 99999 (1/1,000) 0.76 0.84 0.15 5.45 2.26 0.21 0

Notes:Proportion of numbersbetween 1900 and 2010 refersto theproportion relative to numbersbetween 1844 and 2066. "11, 22, ..., 99" refersto theproportion relative to numbersbelow 100, "111, 222, ..., 999" refersto numbersbelow 1,000,and so on. The "Eq Avg"predictions refers to thepredictions of thePoisson-Nash equilib- riumwith n = 53, 783, andК = 99,999. number.In thefield game, players are allowed to bet on up to sixnumbers. This does playa rolefor the theoretical predictions, since it allows players to coordinateone's guessesto avoidchoosing the same numbermore than once (as couldbe thecase wheneach bid is placedby a differentplayer). Finally, we do nottake the second andthird prizes present in thefield version into account, but this is unlikelyto make a big differencegiven the strategic nature of the game. Nevertheless,these three differences between the payoff structures of thegame analyzedtheoretically, and the field game as itwas played,are a motivationfor run- ninglaboratory experiments with single bets, no opportunityfor direct , andonly a firstprize, which match the game analyzed theoretically more closely.

A. DescriptiveStatistics

Table 1 reportssummary statistics for the first 49 days of thegame. Two addi- tionalcolumns display the corresponding daily averages for the first and last weeks to see howmuch learning takes place. The lastcolumn displays the corresponding statisticsthat would result from play according to thePoisson equilibrium. Overall,the average number of betsN was 53,783,but there was considerable variationover time. There is no apparenttime trend in thenumber of participating players,but there is less participationon Sundaysand Mondays(see FigureA2 in theonline Appendix).13 The variationof the number of bets across days is therefore

13TheSunday-Monday average N (standarddeviation) is 44,886 (4,001) andthe Tuesday-Saturday average is 57,341 (5,810). Dividingthe sample in thisway does reducethe variance in N by almosthalf. However, the sum-

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 11 muchhigher than what the Poisson distribution predicts (its standarddeviation is 232). However,note that larger variance in N meanssometimes there are many fewer players(so chosennumbers should be smaller)and sometimesthere are many more players(so chosennumbers should be larger).Fixing the mean of N and increasing thevariance might therefore have littleoverall impact on theequilibrium number distribution(and has littleeffect in thelab datareported later). Despitesome differences between the simplified theory and the way the field lot- terygame was implemented,the average number chosen overall was 2,835,which is close to theequilibrium prediction of 2,595.The meannumber in thelast week 14 is 2,484,compared to theprediction of 2,595. The medianconverged toward the equilibriumprediction of 2,542,from 1,204 in thefirst week to 1,936 in thelast week.Winning numbers, and the lowest numbers not chosen by anyone, also varied a lotover time. In equilibrium,the first and foremost prediction is thatessentially nobody (fewer than0.01 percent)should choose a numberabove T = 5,513. In thefield lottery game,20 percentchose thesehigh numbers in thefirst week, but in thelast week only2.8 percentdid. For numbersabove 10,000,12 percentchose theseextremely highnumbers in the first week, but in the last week only 1 percentdid. This indicates bothcompelling convergence, as well as initialdeviation. In fact,the third quartile (75 percent)was 7,610 in day 1, butquickly dropped below 3,200,resulting in an averageof 3,779 forweek 1 and 2,443 forweek 2. Then,the third quartile con- vergedback toward the equilibrium prediction (3,901), endingup at 3,137 in week 7. All otheraggregate statistics in Table 1 are closerto theequilibrium predictions in thelast week than in thefirst week. Many of the statistics converge rather swiftly and closely.For example,although 20 percentchose numbersabove T - 5,513 in week 1, less than5 percentdid each dayfrom week 3 to 7.15 Aninteresting feature of the data is a tendencyto avoid round or focal numbers and choosequirky numbers that are perceived as "anti-focal"(as in hide-and-seekgames, see VincentP. Crawfordand Nagore Iriberri 2007a). Evennumbers were chosen less oftenthan odd ones (46.75 percent versus 53.25 percent). Numbers divisible by 10 are chosena littleless often than predicted. Strings of repeating digits (e.g., 1 1 1 1) arecho- sen too often.Players also overchoosenumbers that represent years in moderntime (perhapstheir birth years). If players had played according to equilibrium, the fraction ofnumbers between 1900 and2010 dividedby all numbersbetween 1844 and 2066 shouldbe 49.78 percent,but the actual fraction was 70 percent.16

тагу statisticsin thetwo groups are very close together(the means are 2,792 and 2,941). To judge thesignificance of thedifference between theory and datawe simulated1000 weekly average num- bersfrom the Poisson-Nash equilibrium. That is, 7 x 53,783i.i.d. draws were drawn from the equilibrium distribu- tionand theaverage number was computed.This yieldsone simulatedaverage. The procedurewas thenrepeated a totalof 10,000times to create10,000 simulated averages. The low and highrange of 9,500 of thesesimulated averages- a 95 percentconfidence interval - is 2,590to 2,600.Since the weekly averages in the data lie outsidethis extremelytight interval, we can concludethat the data are significantly different than those predicted by theory. But notethat this is an extremelydemanding test because the very large sample sizes meanthat the data mustlie very close to the to not the 15 theory reject theory. FigureA3 (in onlineAppendix) provides weekly boxplots of thedata and FigureA4 plotsaverage daily fre- quenciesfor week 1,3, 5, and 7 forthose who are interestedin weeklychanges in thedistribution and percentiles. õWe comparethe number of choicesbetween 1900 and 2010 to the numberof choices between1844 and 2066 sincethere are twice as manystrategies to choosefrom in thelatter range compared to thefirst. If all players

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 12 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011

Figure 2. NumbersChosen between 1900and 2010,and between 1844and 2066, duringAll Days in the Field

Figure2 showsthis focality in a histogramof numbersbetween 1900 and 2010 (aggregatingall 49 days). Note thatalthough the numbers around 1950 are most popular,there are noticeabledips at focalyears that are divisibleby ten.17Figure 2 also showsthe aggregate distribution of numbersbetween 1844 and 2066, which clearlyshows the popularity of numbers around 1950 and 2007. There are also spikes in thedata forspecial numbers like 2121, 2222, and 2345. Explainingthese focal numberswith the cognitive hierarchy model presented below is noteasy (unlessthe 0-stepplayer distribution is defined to includefocality), so we willnot comment on themfurther (see Crawfordand Iriberri 2007a fora successfulapplication in simpler hide-and-seekgames).

B. Results

Do subjectsin thefield LUPI gameplay according to thePoisson-Nash equilib- riumbenchmark? In orderto investigatethis, we assumethat the number of players is Poissondistributed with mean equal to theempirical daily average number of numberschosen (53,783). As notedpreviously, this assumption is wrongbecause thevariation in numberof betsacross days is muchhigher than what the Poisson distributionpredicts. Figure3 showsthe average daily frequencies from the first week together with theequilibrium prediction (the dashed line), for all numbersup to 99,999 and for therestricted interval up to 10,000.Recall thatin thePoisson-Nash equilibrium, probabilitiesof choosinghigher numbers first decrease slowly, drop quite sharply randomizeduniformly (an approximationtothe equilibrium strategy with large n andK), theproportion of numbers between1900 and 2010 wouldbe 50 percent. 17 Notethat it would be unlikelyto observethese dips reliably with typical experimental sample sizes. It is only withthe large amount of data available from the field (2.5 millionobservations) that these dips are visually obvious anddifferent in frequencythan neighboring unround numbers.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTLINGETAL: TESTINGGAME THEORY IN THE FIELD 13

Figure 3. Average Daily Frequenciesand Poisson-NashEquilibrium Prediction for the FirstWeek in the Field (n = 53,783,К = 99,999) at around5,500, and asymptotesto zeroafter p55l3 « '/n (recallProposition 1 and Figure1). Comparedto equilibrium,there is overshootingat numbersbelow 1,000, undershootingat numbers between 2,000 and 5,500, and again overshooting beyond 18 5,500. It is also noteworthyhow spiky the data is comparedto the equilibrium pre- diction,which is a reflectionof clustering on specialnumbers, as describedabove. Nonetheless,the abilityof the very complicatedPoisson-Nash equilibrium approximationto capturesome of the basic featuresof the data is surprisinglygood. Forexample, most of the guesses (79.89 percentin thefirst week) areconcentrated at orbelow T = 5,513.As a refereenicely expressed this central result: "To me,the truncationof thedistribution (i.e., theset [Г,К] has negligiblemass) is thefirst- ordereffect of equilibriumreasoning. Furthermore, the relationship between k, K, and T is notobvious so thefinding that, by the seventh week, almost all ofthe mass ofthe empirical distribution is concentrated in [0, T] is quitestriking." Figure4 showsaverage daily frequencies of choices from the second through the last(7th) week. Behavior in this period is closerto equilibrium than in the first week. In particular,the overshooting of high numbers almost vanishes - only4.41 percent ofthe choices are above 5,513. However, when only numbers below 10,000are plot- ted,the overshooting of low numbersand undershooting of intermediatenumbers is stillclear (although the undershooting region shrinks to numbers between 4,000 and 5,500) andthere are stillmany spikes of clustered choices. The firstthree columns of Table 2 providethe frequencies for the last week of fielddata (FigureA4 in theonline Appendix) with a bin size of 500 up to number 5,500 and compareit withthe Poisson-Nash equilibrium prediction. The '2 test statisticis 53,864.6,strongly rejecting the equilibrium model. This suggeststhat althoughthere is onlysubstantial undershooting in thelast three bins, the data from thefinal week is stillfar from the Poisson-Nash equilibrium. Moreover, the drop fromaround 43,000 (in thefirst four bins) to around30,000 (in thenext four bins) is also difficultto accountfor.

18 This overshooting-undershooting-overshootingpatterncould explainwhy in Table 1, the averagenumber playedis close to theequilibrium prediction, while the first quartile and medianare alwaystoo low (thoughcon- vergingup) andthe percentage above 5,513 is alwaystoo high.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 14 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011

Figure4. AverageDaily Frequenciesand Poisson-NashEquilibrium Prediction forWeek 2-7 in the Field (n = 53,783,К = 99,999)

Table 2- Frequency Table for the Last Week of Field Data, the Poisson-NashEquilibrium and Cognitive Hierarchy Model

Week7 versusequilibrium Week7 versuscognitive hierarchy Totalfrequency Averagefrequency Totalfrequency Averagefrequency (forall numbers) (foreach number) (forall numbers) (foreach number) Bin range Week7 Eq Week7 Ëq Week7 CH Week7 CH 1-500 47,047 33,796.5 94 бгб 47,047 43,538.8 94 87Л 501-1,000 45,052 33,448.3 90 66.9 45,052 42,641.2 90 85.3 1,001-1,500 41,489 33,060.8 83 66.1 41,489 42,343.9 83 84.7 1,501-2,000 43,815 32,624.0 88 65.2 43,815 41,257.8 88 82.5 2,001-2,500 33,827 32,123.5 68 64.2 33,827 39,631.4 68 79.3 2,501-3,000 29,850 31,537.8 60 63.1 29,850 36,794.6 60 73.6 3,001-3,500 33,115 30,832.0 66 61.7 33,115 32,437.4 66 64.9 3,501-4,000 25,765 29,943.8 52 59.9 25,765 25,532.5 52 51.1 4,001^,500 16,810 28,745.2 34 57.5 16,810 16,006.9 34 32.0 4,501-5,000 6,614 26,891.5 13 53.8 6,614 6,401.8 13 12.8 5,001-5,500 2,463 22,130.7 5 44.3 2,463 2,075.7 5 4.2 X2 53,864.6*** 107.6*** 2,891.4*** 5.66

***Significant at the1 percentlevel. **Significant at the5 percentlevel. * Significantat the10 percentlevel.

Nevertheless,given there are so muchdata, it is not surprisingthat the equi- libriummodel is rejected.One possibleremedy is to use theaverage number of guessesfor each number(in thebin range), instead of thetotal number of guesses. Thisis doneby simplydividing the bin totals by 500. We thenround these numbers intointegers so we can performa x2 test(like for the total number of guesses). The resultsare shownin column4 and 5 of Table 2. The x2 teststatistic is 107.6,still rejectingthe equilibrium model, but much smaller than that of the total. The toppanel of Table 3 providesadditional weekly goodness-of-fit measures for thePoisson-Nash equilibrium. Weekly x2 testresults are shown in the first two rows. In particular,the x2 teststatistics drop sharply from more than 640 in thefirst week to less than1 10 in thelast week. Nevertheless, the equilibrium prediction is rejected at the0.1 percentlevel for all weeks.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 15

Table 3- Goodness-of-Fitfor Poisson-NashEquilibrium and Cognitive Hierarchy for Field Data

Week (1) (2) (3) (4) (5) (6) (7) Poisson-Nashequilibrium X2(for average frequency) 640.45 323.76 257.42 259.27 261.19 121.29 107.60 (Degreeof freedom) (10)*** (10)*** (10)*** (10)*** (10)*** (10)*** (10)*** Proportionbelow (percent) 48.95 61.29 67.14 67.44 69.93 76.25 76.23 ENO 2,176.4 4,964.4 6,178.4 7,032.4 8,995.0 14,056.8 13,879.3 Cognitivehierarchy model Log-likelihood -53,740 -31,881 -22,085 -19,672 -19,496 -19,266 -17,594 r 1.80 3.17 4.17 4.64 5.02 6.76 6.12 Л 0.0034 0.0042 0.0058 0.0068 0.0069 0.0070 0.0064 X2(for average frequency) 77.92 52.21 7.64 3.90 4.60 4.64 5.48 (Degreeof freedom) (9)*** (9)*** (8) (8) (8) (9) (9) Proportionbelow (percent) 62.58 72.57 78.65 80.17 82.09 82.43 82.24 ENO 3,188.8 7,502.5 9,956.0 12,916.1 17,873.0 21,469.6 21,303.0

Notes:The degreeof freedomfor a '2 testis thenumber of binsminus one. The proportionbelow the theoretical predictionrefers to thefraction of the empirical density that lies belowthe theoretical prediction. ***Significant at the 1 percentlevel. **Significant at the5 percentlevel. * Significantat the10 percentlevel.

Anotherpossibility is to calculatethe proportion of the empirical density that lies belowthe predicted density. This measureis one minusthe summed "miss rates", thedifferences between actual and predictedfrequencies, for numbers which are chosenmore often than predicted. In Table 3, thepercentage of empiricaldata that lies below equilibriumdensity is reportedin the thirdrow, increasing from just below50 percentin thefirst week to morethan 75 percentin thelast week. Finally,when all modelsare not literally true, one can comparemodels using the "equivalentnumber of observations"(ENO) of therelevant model computed from rawmean squared errors. ENO was firstproposed by Ido Erevet al. (2007) to com- parerelative performance of differentlearning models (that were all rejected)pre- dictingsubject behavior in gameswith mixed strategy equilibria, a similarsituation to whatwe have here.Roughly speaking, ENO representsthe weight one should puton themodel relative to theexisting data when predicting new observations. As statedin Erevet al. (2007): "The ENO of themodel is thevalue of [N] (thesize of theexperiment) that is expectedto lead to a predictionthat is as accurateas the model'sprediction." As shownin thefourth row of Table 3, theENO ofthe Poisson- Nashequilibrium increases from about 2,200 in week 1 to almost14,000 in week7, demonstratingthe improvement of equilibriumfrom week 1 to 7.19 The nextquestion is whetheran alternativetheory can explainboth the degree to whichthe equilibrium prediction is surprisinglyaccurate and thedegree to which thereis systematicdeviation.

19However,since there are on average47,907(^¿ 335,347/7)guesses every day in week 7, even an ENO of 14,000can stillbe easilyoutweighed by one dayof data.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 16 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011

Figure5. Probabilityof ChoosingNumbers 1-20 in SymmetricQRE (n= 26.9,K= 99, Л = 0.001,...,10) and in the Poisson-NashEquilibrium (n = 26.9,К = 99)

С. RationalizingNon-Equilibrium Play

A naturalway to modellimits on strategicthinking is by assumingthat different playerscarry out different numbers of stepsof iterated strategic thinking in a cogni- tivehierarchy (CH). Thisidea has beendeveloped in behavioral game theory by sev- eralauthors (e.g., Rosemarie Nagel 1995; Dale О. Stahland Paul W. Wilson1995; MiguelCosta-Gomes, Crawford, and Bruno Broseta 2001; Camerer,Teck-Hua Ho, andJuin-Kuan Chong 2004; and Costa-Gomesand Crawford2006) and appliedto manygames of differentstructures (e.g., Crawford2003; Camerer,Ho, and Chong 2004; Crawfordand Iriberri 2007b; andTore Ellingsen and Östling 2010).20 One alternativecandidate to cognitivehierarchy would be thequantal response equilibrium(QRE). QRE andCH havebeen compared to Nashpredictions in many experimentalstudies, and theyoften explain deviations from Nash equilibrium in similarways (e.g., BrianW. Rogers,Thomas R. Palfrey,and Camerer2009). However,QRE and CH can be clearlydistinguished in LUPI games since QRE seemsto predict too few low-number choices. For example, for n = 26.9 playersand numberchoices from 1 toК = 99 (as implementedin ourlab experiment),Figure 5 showsa 3-dimensionalplot of theQRE probabilitydistributions for many values of Л, alongwith the Poisson-Nash equilibrium.21 When A is low,the distribution is approximatelyuniform. As Л increasesmore probability is placed on lowernum- bers1-12. WhenЛ is highenough the QRE closelyapproximates the Poisson-Nash equilibrium,which puts roughly linear declining weight on numbers1 to 15 and

20 A precursorto thesemodels was theinsight, developed much earlier in the 1980's by researchersstudying negotiation,that people often "ignore the cognitions of others" in asymmetric-informationbidding and negotiation games(Max H. Bazermanet al. 2000). The plotshows the QRE basedon a powerfunction, but the picture looks identical with a logitfunction.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTL1NGETAL : TESTINGGAME THEORY IN THE FIELD 17 infinitesimalweight on highernumbers. We thereforefocus on a cognitivehierarchy modelin thispaper. Level-/:and cognitivehierarchy models require a specificationof how &-step playersbehave and theproportions of playersfor various k. We followCamerer, Ho, andChong (2004) and assumethat the proportion of players that do к thinking stepsis Poissondistributed with mean r, i.e., theproportion of players that think in к stepsis givenby

/(*) = e'Trk/kl.

We assumethat £-step thinkers incorrectly believe that all otherplayers can only thinkfewer steps than themselves, but correctly guess theproportions of players doing0 to к - 1 steps(as a truncatedPoisson distribution). In otherwords, level- 1 thinkersbelieve all otherplayers are level-0 types.Level-2 thinkers believe there arelevel-0 types and level- 1 types.Level-3 thinkers believe there are level-0, level- 1 and level-2types, and so on.22Then the conditional density function for the belief of a &-stepthinker about the proportion of / < к stepthinkers is

Х^оДА) The IA andЕЕ propertiesof Poisson games (together with the general type speci- ficationdescribed earlier) imply that the numberof playersthat a k-sttpthinker believeswill play strategy i is Poissondistributed with mean

k-' щ' = "E 8kU)pjr

Hence,the expected payoff for a /¿-stepthinker of choosing number / is

7=1

To fitthe data well, it is necessaryto assumethat players respond stochastically (as in QRE) ratherthan always choose best responses (see also Rogers,Palfrey, and 23 Camerer2009). We assumethat level-0 players randomize uniformly across all

22 An alternativeapproach which often has advantagesis thatlevel-/: types think all otherplayers are exactly level(k - 1). However,this does notwork in LUPI games:If we startout with LO typesplaying random, LI types shouldall play 1. If L2 typesbelieve there are only LI types,they should never play 1. If L3 typesbest respond to onlyL2 types,then they should all play 1 (sincethey believe nobody is playing1), and thislogic will continueto cycle. The CH modelwith best-response piles up mostpredicted responses at a verysmall range of thelowest inte- gers( 1-step thinkers choose 1, 2-stepthinkers choose 2, and ¿-stepthinkers will neverpick a numberhigher than k). Assumingquantal response smoothes out the predicted choices over a widernumber range.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 18 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1

x 1Ö4 6 -

•' ' <- Cognitivehierarchy (r = 1.80, Л = 0.0034)

- 2 3 ' Cognitivehierarchy (r = 6.12, Л = 0.0064)

*- ' ' ^N. Equilibrium

0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 10,000 Numberschosen (truncatedat 10,000)

Figure6. Probabilityof ChoosingNumbers 1-10,000 in the Poisson-NashEquilibrium and the CognitiveHierarchy Model (n = 53,783,К = 99,999) numbers1 to K, andhigher-step players best respond with probabilities determined by a normalizedpower function of expected payoffs.24 The probabilitythat а к stepplayer plays number / is givenby

k_ (nj-îli-f^g-ф-«*)* ' - EL (n¿! [l née-^}e-»«)X - forA > 0. Since q) is definedrecursively it onlydepends of whatlower step thinkersdo - it is straightforwardto compute the predictedchoice probabilities numericallyfor each typeof &-stepthinker (for given values of т and A) usinga loop, thenaggregating the estimated p' acrosssteps к. Apartfrom the number of playersand the number of strategies,there are twoparameters: the average number ofthinking steps, r, andthe precision parameter, A. Figure6 showsthe prediction of the cognitive hierarchy model for the parameters of thefield LUPI game,i.e., whenn = 53,783 and К = 99,999.The dashedline

24 In manyprevious studies logit choice functions are typically used and they fit comparably to power functions (e.g., Camererand Ho 1998 forlearning models). Some QRE applicationshave used powerfunctions and found betterfits (e.g., in auctions,Jacob K. Goeree,Charles A. Holt,and Palfrey2002). However,in thiscase a logit choicefunction fits substantially worse for the field data (with 99,999 numbers to choosefrom). The reasonis that logitchoice probabilities are convex in expectedpayoff. This impliesnumerically that probabilities are either sub- stantialfor only a smallnumber of the 99,999 possible numbers, or areclose to uniformacross numbers. The logit CH modelsimply cannot fit the intermediate case in whichthousands of numbers are chosen with high probability andmany other numbers have very low probability(as in thedata).

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL: TESTINGGAME THEORY IN THE FIELD 19

Figure7. AverageDaily Frequencies,Cognitive Hierarchy (solid line) and Poisson-NashEquilibrium Prediction (dashed line) for the FirstWeek in the Field {n = 53,783,К = 99,999,т= 1.80,Л = 0.0034)

correspondsto thecase whenplayers do relativelyfew steps of reasoning and their responsesare verynoisy (r = 1.80 and Л = 0.0034). The dottedline corresponds to thecase whenplayers do moresteps of reasoningand respondmore precisely (r = 6.12 and Л = 0.0064). Increasingr and Л createsa closerapproximation to thePoisson-Nash equilibrium, although even witha highr thereare too many choicesof low numbers. Can thecognitive hierarchy model account for the main deviations from equi- libriumdescribed in theprevious section? The bottompanel of Table 3 reportsthe resultsfrom the maximum likelihood estimation of the data using the cognitive hier- archymodel.25 The best-fittingestimates week-by-week suggest that both param- etersincrease over time. The averagenumber of thinkingsteps that people carry out,r, increasesfrom about 1.8 in thefirst week - an estimatein linewith estimates from1.0 to 2.5 thattypical fit experimental datasets well (Camerer,Ho, andChong 2004)- to 6 in thelast week. Figure7 showsthe average daily frequencies from the first week togetherwith theCH estimationand the equilibrium prediction. The CH modeldoes a reasonable job of accountingfor the over- and undershootingtendencies at low and interme- diatenumbers (with the estimated f = 1.80). In laterweeks, the week-by-week

25 It is difficultto guaranteethat these estimates are globalmaxima since the likelihood function is notsmooth andconcave. We also useda relativelycoarse grid search, so theremay be otherparameter values that yield slightly higherlikelihoods and differentparameter values.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 20 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1

Figure 8. AverageDaily Frequencies,Cognitive Hierarchy (solid line) and Poisson-NashEquilibrium Prediction (dashed line) for the Last Weekin the Field (n = 53,783,K= 99,999,r= 6.12,Л = 0.0064) estimatesof т driftupward a little(and Л increasesslightly), which is a reduced- formmodel of learningas an increasein themean number of thinkingsteps. In the lastweek the cognitive hierarchy prediction is muchcloser to equilibrium(because r is around6) butis stillconsistent with the smaller amounts of over-and under- shootingof low andintermediate numbers (see Figure8). To getsome notion of howclose to thedata the fitted cognitive hierarchy model is, thebottom panel of Table 3 displaysseveral goodness-of-fit statistics. First, the log-likelihoodsreveal that the cognitivehierarchy model does betterin explain- ingthe data towardthe last week and is alwaysmuch better than Poisson-Nash.26 However,as shownin theright panel of Table 2, thoughpredictions of thecogni- tivehierarchy model are muchcloser to thedata than that of equilibrium,the large numberof observationssimply forces the x2 testto rejectthe model, even when we bin 500 numbersinto one category.In particular,the last week x2 teststatistic forthe cognitive hierarchy model is 2,891.4,much smaller than that of equilibrium (53,864.6),but still at an extremelyhigh level of significance. As discussedabove, one remedyis to considerrounded averages for each bin. This is reportedin thelast twocolumns of Table 2. The last week x2 teststatistic

26 Since thecomputed Poisson-Nash equilibrium probabilities are e fork > 5,518,the likelihood is always essentiallyzero for the equilibrium prediction. In onlineAppendix B, however,we computethe log-likelihood for thelow numbersonly. Based on GideonSchwarz' (1978) informationcriterion, the cognitive hierarchy model still performsbetter in all weeks.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTLINGETAL: TESTINGGAME THEORY IN THE FIELD 21 is 5.66, implyingthat the prediction is notstatistically different from the cognitive hierarchyprediction. Since there is one cell thathas a predictedvalue less thanfive, it mightnot be appropriateto conducta x2 testincluding that cell. Hence,the bot- tompanel of Table 3 reportsweekly x2 testresults that focus on onlythe bins that havea predictedvalue greater than 5. The fifthrow of the bottom panel provides the numberof bins used, and the fourth row reports x2 teststatistics. As can be seen,the cognitivehierarchy model is stillstrongly rejected in thefirst two weeks, though not as stronglyas equilibrium,but not in thefollowing weeks. Two othermeasures of goodness-of-fitpreviously discussed are also computed in orderto comparethe CH modelwith the equilibrium prediction. In particular,the proportionof theempirical density that lies belowthe predicted density is reported in sixthrow of thebottom panel (in Table 3). The cognitivehierarchy model does betterthan the equilibrium prediction in all sevenweeks based on thisstatistic. For example,in thefirst week, 63 percentof players'choices were consistent with the cognitivehierarchy model, whereas 49 percentwere consistent with equilibrium. However,both models improve substantially across the weeks. On theother hand, weeklyENO is calculatedand reportedin thefourth row of thebottom panel (in Table3). Again,the cognitive hierarchy model does betterin all sevenweeks, start- ingfrom around 3,200 in thefirst week and end up at 21,000in thefinal two.27 In conclusion,the cognitive hierarchy model performs better than the Poisson- Nash equilibriumin all sevenweeks of data regardlessof whatmeasure is used, explainingthe systematicdeviation from equilibrium. In particular,the cognitive hierarchymodel can rationalizethe tendencies that some numbers are played more, as well as theundershooting below the equilibrium cutoff. The value-addedof the cognitivehierarchy model is notprimarily that it gives a slightlybetter fit, but that it providesa plausiblestory for how players manage to playso close to equilibrium.28

III. The LaboratoryLUPI Game

We conducteda parallellab experimentfor two reasons. First,the rules of the fieldLUPI game do not exactlymatch the theoretical assumptionsused to generatethe Poisson-Nash equilibrium prediction. In thefield data somechoices were made by a randomnumber generator, some players might havechosen multiple numbers or colluded,there were multiple prizes, and the vari- ance inTV is largerthan the Poisson distribution variance. In thelab, we can moreclosely implement the assumptions of thetheory. If the theoryfits poorly in thefield and closelyin thelab, thenthat suggests the theory is on theright track when its underlying assumptions are mostcarefully controlled. If

27 Again,since the total number of guesses in week7 is 335,347,even though the CH modelhas a muchhigher ENO thanPoisson-Nash equilibrium, it also can be easilyoutweighed by merelyone dayof data. 28 Nonetheless,one mightwonder whether the parameter-free Poisson-Nash equilibrium does worsethan cogni- tivehierarchy merely because the latter has twoparameters. We addressthis issue in OnlineAppendix В byestimat- ingthe Poisson-Nash equilibrium model week-by-week to obtainthe best n (meanof the Poisson distribution) that minimizesmean squared error instead of maximizingempirical likelihood. As shownin Table Al, theestimated Poisson-Nashmodel still performs worse than the cognitive hierarchy model in week 1-5, butcatches up and is comparableto cognitive hierarchy in week6 and7. Nevertheless,to makethis prediction, the estimated n haveto be muchlower than the actual n and itis notclear how such incorrect beliefs could be sustained.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 22 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1 thetheory fits closely in bothcases, thissuggests that the additional factors in the fieldthat are excluded from the theory do notmatter. If thetheory fits both well, but slightlybetter in thelab, thisis also reassuringsince it indicatesthe additional fac- torsin thefield contributed merely noise. Second,because thefield game is rathersimple, it is possibleto designa lab experimentwhich closely matches the field in itskey features. How closelythe lab andfield data match provides some evidence in ongoingdebate about how well lab resultsgeneralize to comparablefield settings (e.g., Levittand List 2007, and Falk andHeckman 2009). In designingthe laboratory game, we compromisebetween two goals: to createa simpleenvironment in whichtheory should apply (theoretical validity), and to rec- reatethe features of thefield LUPI gamein thelab (specializedexternal validity). Because we use thisopportunity to createan experimentalprotocol that is closely matchedto a particularfield setting, we oftensacrificed theoretical validity in favor ofclose fieldreplication. The firstchoice is thescale of thegame: The numberof players(N), possible numberchoices (K), and stakes.We choose to scale downthe number of players andthe largest payoff by a factorof 2,000.This implies that there were on average 26.9 playersand theprize to thewinner in each roundwas $7. We scaled down К by a factorof 1,000since К - 99 allowsfor focal numbers such as 66, 77, 88, and 99 to be chosen,and theshape of theequilibrium distribution has someof the basicfeatures of the equilibrium distribution for the field data parameters (e.g., most numbersshould be below 10 percentof K). Since thefield data span49 days,the experimentalso has 49 roundsin each session.(We typicallyrefer to experimental roundsas "days"and seven-"day"intervals as "weeks"for semantic comparability betweenthe lab andfield descriptions.) The numberof playersin each roundwas drawnfrom a distributionwith mean 26.9. In threeof thefour sessions, subjects were told the mean number of players, and thatthe number varied from round to round,but did notknow the distribution (in orderto match the field situation in whichplayers were very unlikely to know the totalnumber playing each day).Due to a technicalerror, in thesethree sessions, the variancewas lowerthan the Poisson variance (7.2 to 8.6 ratherthan 26.9). However, thismistake is likelyto have little effect on behaviorbecause subjects only know the winningnumber in each roundand can drawlittle inference about the underlying distributionof thenumber of players.In thelast session,the number of playersin each roundwas drawnfrom a Poissondistribution with mean 26.9 andthe subjects wereinformed about this (Figure A5 in theonline Appendix). Furthermore, the data fromthe true Poisson session and the lower- variance sessions look statistically simi- larso we pool themfor all analysis(see below). Some designchoices made the lab settingdifferent from the fieldsetting but closerto theassumptions of the theory. In contrastto thefield game, in thelab each playerwas allowedto choose only one number, they could not use a randomnumber generator,there was onlyone prize per round, $7, andif there was no uniquenumber nobodywon. In thefield data, we do notknow how muchSwedish players learned each day aboutthe full distribution ofnumbers that were chosen. The numberswere available

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTLINGETAL. : TESTINGGAME THEORY IN THE FIELD 23 onlineand partiallyreported on a TV show.To maintainparallelism with the field, onlythe winning number was announcedin thelab. Four laboratorysessions were conductedat the CaliforniaSocial Science ExperimentalLaboratory (CASSEL) at theUniversity of CaliforniaLos Angeles on March22nd and March25th of 2007, and on March3, 2009. The experiments wereconducted using the Zürich Toolbox for Ready-made Economic Experiments (zTree) developedby Urs Fischbacher,as describedin Fischbacher(2007). Within each session,38 graduateand undergraduatestudents were recruited,through CASSEL's web-basedrecruiting system. All subjectsknew that their payoff will be determinedby theirperformance. We madeno attemptto replicatethe demograph- ics of thefield data, which we unfortunatelyknow very little about. However, the playersin thelaboratory are likelyto differin termsof gender,age and ethnicity comparedto theSwedish players. In thefour sessions, we had slightlymore male thanfemale subjects, with the great majority clustered in theage bracketof 18 to 22, andthe majority spoke a secondlanguage. Half of the subjects had never partici- patedin anyform of lottery before. Subjects had variouslevels of exposure to game theory,but very few had seenor heardof a similargame prior to thisexperiment.

A. ExperimentalProcedure

At thebeginning of each session,the experimenter first explained the rules of theLUPI game.The instructionswere based on a versionof the lottery form for the fieldgame translated from Swedish to English(see onlineAppendix D). Subjects werethen given the option of leaving the experiment, in orderto see howmuch self- selectioninfluences experimental generalizability. None of therecruited subjects chose to leave,which indicates a limitedrole forself-selection (after recruitment andinstruction). In threeof the four sessions, subjects were told that the experiment would end at a predetermined,but non-disclosed time to avoidan end-gameeffect (also matching thefield setting, which ended abruptly and unexpectedly).Subjects were also told thatparticipation was randomlydetermined at thebeginning of each round,with 26.9 subjectsparticipating on average.Subjects in thefourth session were explicitly toldthere were 49 rounds,and thenumber of playerswas drawnfrom a Poisson distribution.They were also shownin the instructions a graph showing a distribution functionfor a Poissondistribution with mean 26.9. In the beginningof each round,subjects were informedwhether they would activelyparticipate in thecurrent round (i.e., if theyhad a chanceto win). They wererequired to submita numberin each round,even if theywere not selected to participate.The differencebetween behavior of selectedand non-selectedplayers givesus someinformation about the effect of marginalincentives on performance (see Camererand Robin M. Hogarth1999). Whenall subjectshad submitted their chosen numbers, the lowest unique positive integerwas determined.If therewas a lowestunique positive integer, the winner earned$7; if no numberwas unique,no subjectwon. Each subjectwas privately informed,immediately after each round,what the winning number was, whether theyhad won thatparticular round, and theirpayoff so farduring the experiment.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 24 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011

Figure9. LaboratoryTotal Frequenciesand Poisson-NashEquilibrium Prediction {all sessions,participating players only, n = 26.9,К = 99)

Thisprocedure was repeated49 times,with no practicerounds (as is thecase ofthe field).After the last round,subjects were asked to completea shortquestionnaire whichallowed us to buildthe demographics of our subjectsand a classificationof strategiesused. In twoof the sessions, we includedthe cognitive reflection test as a wayto measure cognitive ability (to be describedbelow). All sessionslasted for less thanan hour,and subjectsreceived a show-upfee of $8 or $13 in additionto earn- ingsfrom the experiment (which averaged $8.60). Screenshotsfrom the experiment areshown in onlineAppendix D.

B. Lab DescriptiveStatistics

In theremainder of thepaper, we focusonly on thechoices from incentivized subjectsthat were selectedto activelyparticipate. It is noteworthy,however, that thechoices of participating and non-participating subjects did notsignificantly dif- fer(/7-value 0.66, Mann-Whitney). The choicesof participatingsubjects from the sessionwith the announcedPoisson distributionand thepooled otherthree ses- sionsdo notsignificantly differ at the five percent level (p = 0.058,Mann- Whitney, p = 0.59 based on Mest withclustered standard errors). In theremainder of the paperwe thereforepool all foursessions. Figure9 showsthe data for the choices of participating players (together with the Poisson-Nashequilibrium prediction). There are very few numbers above 20 so the numbers1 to 20 arethe focus in subsequentgraphs. In linewith the field data, play- ershave a slightpredilection for certain numbers, while others are avoided. Judging fromFigure 9, subjectsavoid some even numbers, especially 10, while they endorse theodd (and prime)numbers 11, 13, and 17. Interestingly,only one subjectplayed 20, while19 was playedten times and 21 was playedseven times. Table4 showssome descriptive statistics for the participating subjects in thelab experiment.As in thefield, some players in thefirst week tendto pick veryhigh numbers(above 20) butthe percentage shrinks by theseventh week. The average numberchosen in thelast weekcorresponds closely to theequilibrium prediction (5.8 versus5.2) and the mediansare identical(5.0). Both the averagewinning numbersand thelowest unchosen numbers are relativelyclose to theequilibrium

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 25

Table 4- Descriptive Statistics for Laboratory Data

Allrounds R.l-7 R.43^9 Eq Avg. SD Min. Max. Avg. Avg. Avg. Averagenumber played 1% L43 432 1Z55 8^56 Jm Jj2 Mediannumber played 4.65 1.03 3 10 6.14 5.00 5 First(25 percent)quartile 2.50 0.54 2 4 3.00 2.43 3 Third(75 percent)quartile 7.24 1.72 5 17 10.21 7.14 8 Below 20 (percent) 98.02 2.77 81.98 100.00 93.94 98.42 100.00 AboveT = 11 (percent) 5.60 6.52 0 42.34 16.52 4.64 2.44 Evennumbers (percent) 45.19 4.47 35.16 53.47 42.11 49.15 46.86 Session 1 Winningnumber 6.02 9.38 1 67 13.00 2.50 5.22 Lowestnumber not played 8.08 2.57 1 12 4.86 8.14 8.44 Session2 Winningnumber 5.07 2.59 1 10 5.83 5.14 5.22 Lowestnumber not played 7.47 2.96 1 12 6.29 8.43 8.44 Session3 Winningnumber 5.61 3.26 1 14 6.14 5.67 5.22 Lowestnumber not played 7.53 2.68 2 13 7.43 10.00 8.44 Session4 (Poisson) Winningnumber 5.81 3.62 1 17 6.71 3.14 5.22 Lowestnumber not played 7.61 3.30 1 13 5.14 8.14 8.44

Notes: Summarystatistics are based onlyon choicesof subjectswho are selectedto participate.The equilibrium - columnrefers to whatwould result if all playersplayed according to equilibrium(n = 26.9 and К 99). prediction.The tendencyto pickodd numbersdecreases over time - 42 percentof all numbersare even in thefirst week, whereas 49 percentare even in thelast week. As in thefield data, the overwhelming impression from Table 4 is thatconvergence to equilibriumis quiterapid over the 49 periods(despite receiving feedback only aboutthe winning number).

С AggregateResults

In thePoisson equilibrium with 26.9 averagenumber of players,strictly posi- tiveprobability is puton numbers1 to 16, whileother numbers have probabilities numericallyindistinguishable from zero. Figure 10 showsthe average frequencies playedin week 1 to 7 togetherwith the equilibrium prediction (dashed line) andthe estimatedweek-by-week results using the cognitive hierarchy model (solid line). These graphsclearly indicate that learning is quickerin thelaboratory than in the field.Despite that the only feedback given to playersin each roundis thewinning number,behavior is remarkablyclose to equilibriumalready in thesecond week. However,we can also observethe same discrepancies between the equilibrium pre- dictionand observedbehavior as in thefield. The distributionof numbersis too spikyand thereis overshootingof low numbersand undershootingat numbersjust belowthe equilibrium cutoff (at number16). Figure10 also displaysthe estimates from a maximumlikelihood estimation of thecognitive hierarchy model presented in theprevious section (solid line).29 In this

29 To illustratehow the CH modelbehaves, consider N = 26.9 andК = 99, withr = 1.5 and Л = 2. FigureA6 (in theonline Appendix) shows how 0 to 5 stepthinkers play LUPI andthe predicted aggregate frequency, summing

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 26 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST2011

8 ^Rr-Ì' r-, ■ В FIRN - 8 X ПШп'П в К' . б 'Г'

Week1 Week2 Week3

14« Д П и16 ' - 16- ■ » - 12 ъ'V "-ьП « ^'Яп^ 12ип0л vr4j'i А

llkjWeek 4 ШЩWeek 5 .. j WkJ.Week 6

I rrln П ' lìn'

Week7 Numberschosen (truncatedat 20)

Figure 10.Average Daily Frequenciesin the Laboratory,Poisson-Nash Equilibrium Prediction (dashed lines) and EstimatedCognitive Hierarchy (solid lines), Week 1-7 (n = 26.9,К = 99)

estimation,we use theestimated weekly rfrom the field data and estimate Л only.30 We do notestimate both parameters since we aremost likely over-fitting byallowing twofree parameters to estimaterelatively few choice probabilities.31 The cognitive hierarchymodel can accountboth for the spikes and theover- and undershooting. Table5 showsthe estimated Л parameter.There is no cleartime trend in thisparam- eterand the A parametermoves around quite a lotover the weeks. We also estimate theprecision parameter A whilekeeping the average number of thinking steps fixed at 1.5, whichhas been shownto be a value of r thatpredicts experimental data well in a largenumber of games (Camerer,Ho, and Chong2004). The estimated precisionparameter is in thiscase considerablylower in thefirst week, but is then relativelyconstant.32 acrossall thinkingsteps. In thisexample, 1-step thinkers put most probability on number1, 2-step thinkers put most probabilityon number5, and 3-stepthinkers put most probability on numbers3 and7. The alternativewould be to fixЛ andestimate r, butthere is no wayto tellwhat a "reasonable"value of Л is. The precisionparameter Л dependson scalingof payoffs, the number of strategies,etc., and can notbe interpreted acrossgames. 31 Whentrying to estimateboth parameters simultaneously, we founddifferent estimates for different grid sizes andinitial values. Most estimates of r werebetween 6 and 12 and Л weremost often between 10 and20 (apartfrom thefirst week which always resulted in а Л of 1.32 ). The log-likelihoodis neithersmooth nor concave with fixed r either,but with only one parameterto estimatewe coulduse a veryfine grid to searchfor the best-fitting parameter. 32 FigureA7 (in theonline Appendix) shows the fitted cognitive hierarchy model when r is restrictedto 1.5. It is clearthat the model with r = 1.5 can accountfor the undershooting also whenthe number of thinkingsteps is fixed,but it has difficultiesin explainingthe overshooting of low numbers.The mainproblem is thatwith r = 1.5,

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 27

Table 5- Goodness-of-Fitfor Poisson-NashEquilibrium and Cognitive Hierarchy Model for Laboratory Data

Week (1) (2) (3) (4) (5) (6) (7) Poisson-Nashequilibrium X2(for average frequency) 24.69 24.14 18.86 21.81 20.17 11.58 21.39 (Degreeof freedom) (5)*** (5)*** (5)*** (5)*** (5)*** (5)** (5)*** Proportionbelow (percent) 82.25 88.55 87.61 88.64 88.64 92.86 87.06 ENO 158.5 202.5 173.2 239.5 244.1 844.4 200.3 Cognitivehierarchy model Log-likelihood -210.4 -104.3 -88.6 -88.7 -87.5 -80.2 -99.4 r (fromfield) 1.80 3.17 4.17 4.64 5.02 6.76 6.12 Л 1.26 5.97 16.89 5.59 5.28 22.69 4.52 X2(for average frequency) 24.31 18.80 8.49 4.57 6.83 2.74 10.06 (Degreeof freedom) (5)*** (5)*** (5) (5) (5) (5) (5)* Proportionbelow (percent) 84.62 87.44 90.52 92.54 92.42 91.11 91.07 ENO 296.0 263.3 466.8 4,909.9 3,475.8 894.7 502.9 Restrictedcognitive hierarchy model A(r=1.5) 1.09 2.52 2.57 2.63 2.60 2.31 2.08

Notes:The degreeof freedomfor a x2 testis thenumber of binsminus one. The proportionbelow the theoretical predictionrefers to thefraction of the empirical density that lies belowthe theoretical prediction. ***Significant at the1 percentlevel. **Significant at the5 percentlevel. * Significantat the10 percentlevel.

Table5 providessome goodness-of-fit statistics for the cognitive hierarchy model andthe Poisson-Nash equilibrium prediction. Consistent with results in thefield, the cognitivehierarchy model fits data slightly better than the (parameter free) Poisson- Nashequilibrium in mostweeks. In particular,the x2 testrejects equilibrium for all 7 weeks,but cannot reject the cognitive hierarchy model starting from week 3, even whenwe onlybin 2 numbersinto one category.33Similarly, based theproportion belowthe predicted density, the equilibrium prediction does remarkablywell, while thecognitive hierarchy model does even better in all butthe second and sixth weeks.34 The ENO resultsalso confirmthat equilibrium does prettywell startingfrom the secondweek, while cognitive hierarchy always does betterthan equilibrium.

D. IndividualResults

An advantageof thelab overthe field, in thiscase, is thatthe behavior of indi- vidualsubjects can be trackedover time and we can gathermore information about themto linkto choices.Online Appendix D discussessome details of these analyses butwe summarizethem here only briefly.

thereare too manyzero-step thinkers that play all numbersbetween 1 and 99 withuniform probability. The log- likelihoodsfor the CH modelwith r = 1.5 rangefrom -241 in week 1 to -212 in week2, whichare muchworse thanwhen using the field values of r. 33 We onlyuse 6 bins(up to number12) hereto preventthe predicted number of observationsto dropbelow 5. Even ifwe do notbin numbers at all, thex2 test(up to number12) yieldssimilar results, rejecting the equilibrium predictionfor all weeks,and rejecting the cognitive hierarchy model for week 1,2,3, 6, and7 (at the5 percentlevel) andmarginally for week 4 and 5 (at the10 percentlevel). In onlineAppendix В we calculatethe log-likelihoods using data from numbers 1 to 16, whichallows us to comparethe equilibrium prediction with cognitive hierarchy. Based on theSchwarz (1978) informationcriterion, thecognitive hierarchy model outperforms equilibrium in all weeks.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 28 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1

Table 6- Panel Data RegressionsExplaining Individual NumberChoices in the Laboratory

All periods Weekl Week2 Week3-7 Round(1-49) 0.001 -0.109 -0.065 0.023 (0.13) (-0.42) (-0.62) (1.58) t - 1 winner 0.178*** 0.148** 0.304*** 0.059* (4.89) (2.38) (2.98) (1.89) ř-2winner 0.133*** 0.096 0.242** 0.038* (2.98) (1.18) (2.40) (1.68) t - 3 winner 0.083* 0.052 -0.050 0.030 (1.94) (0.65) (-0.63) (1.18) Fixedeffects Yes Yes Yes Yes Observations 4,360 421 585 3,354 R2 0.03 0.09 0.01 0.00

Notes:The tablereport results from a linearsubject fixed effects panel regression. Only actively participatingsubjects are included,ř-statistics based on clusteredstandard errors are within parentheses. ***Significant at the1 percentlevel. **Significant at the5 percentlevel. * Significantat the10 percentlevel.

In a post-experimentalquestionnaire, we askedpeople to statewhy they played as theydid. We codedtheir responses into four categories (sometimes with multiple categories):"Random," "stick" (with one number),"lucky," and "strategic" (explic- itlymentioning response to strategiesof others).The fourcategories were coded 35, 30, 11,and 38 percentof the time. These categories had somerelation to actual choicesbecause "stick"players chose fewerdistinct numbers and "lucky"players hadnumber choices with a highermean and higher variance. The onlydemographic variablewith a significanteffect on choices and payoffswas "exposureto game theory";those subjects chose numbers with less variationacross rounds. A measure of "cognitivereflection" (Shane Frederick 2005), a short-formIQ test,did notcor- relatewith choice measures or withpayoffs. As is oftenseen in gameswith mixed equilibria, there is some mildevidence of "purification"since subjects chose only9.65 differentnumbers on average(see onlineAppendix D), comparedto 10.9 expectedin Poisson-Nashequilibrium. In thepost-experimental questionnaire, several subjects said thatthey responded to previouswinning numbers. To measurethe strengthof thislearning effect we regressedplayers' choices on thewinning number in the three previous periods. Table 6 showsthat the winning numbers in previousrounds do affectplayers' choices early on,but this tendency to respond to previous winning numbers is considerablyweaker in laterweeks (3 to7). The smallround-specific coefficients inTable 6 also showthat theredoes not appear to be anygeneral trend in players' choices over the 49 rounds.

IV. Conclusion

Itis oftendifficult to testgame theory using field data because equilibrium predic- tionsdepend so sensitivelyon strategies,information and payoffs, which are usually notobservable in thefield. This paper exploits an empiricalopportunity totest game

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 29 theoryin a fieldsetting which is simpleenough that clear predictions apply (when somesimplifying assumptions are made).The gameis a LUPI lottery,in whichthe lowestunique positive integer wins a fixedprize. LUPI is a close relativeof auctions in whichthe lowest unique bid wins. One contributionof ourpaper is to characterizethe Poisson-Nash equilibrium of theLUPI gameand analyze behavior in this game using both a fielddataset, including morethan two million choices, and parallel laboratory experiments which are designed to firstpermit a cleartest of thetheory while also matchingthe field setting. In both thefield and lab, players quickly learn to play close to equilibrium, but there are some diagnosticdiscrepancies between players' behavior and equilibrium predictions. As notedearlier, the variance in thenumber of playersin thefield data is much largerthan the variance assumed in thePoisson-Nash equilibrium. So thefield data is notan ideal testof thistheory, strictly speaking. Therefore, the key issues are how muchthe theory's predictions vary with changes in var(TV), and how much behaviorchanges in responseto var(TV). If eithertheory or behavioris insensitive to var(TV), then the Poisson-Nash equilibrium could be a usefulapproximation to thefield data. As fortheory: For thesimple examples in whichfixed-TV and Poissonequilibria can be computed,zero variance (fixed-TV) and Poisson variance equilibria are almost exactlythe same (see onlineAppendix A) . In fact,as shownin Figure Al (inthe online Appendix),the equilibrium probabilities for the fixed-TV and Poisson-Nashequilib- riumfor the LUPI gamewith n = 27 and К = 99 (similarto thevalues used in the lab) are practicallyindistinguishable. Keep in mindthat increasing var (TV) (holding n constant)implies that sometimes there are a lotof extra players so numberchoices shouldbe higher,and sometimesthere are fewerplayers so numberchoices should be lower.These two opposing effects could minimize the effect of variance on mean choices(as thelow-K cases in onlineAppendix A suggestthey do). As forbehavior: There are two sources of evidence that actual behavior is nottoo sensitiveto var (TV) . First, in the field data the Sunday and Monday sessions have lower n andlower standard deviation than all days,but choices are very comparable to data fromall days(in whichvar (TV) about twice as large).Second, in the lab datadifferent sessionswith var (TV) « 8 andvar (TV) « 27 lead to indistinguishablebehavior. Thesetheoretical and behavioral considerations suggest why the "wrong" theory (Poisson-Nash)might approximate actual behavior surprisingly well in the field (despitethe field var (TV) being empirically far from what the theory assumes). A differentway to describeour contribution is this: A LUPI gamewas actually playedin thefield, with specific rules. Can we produceany kindof theorywhich fitsthe data from this game? In thisview, it does notmatter whether the field setting matchesthe predictions of a theoryexactly. Instead, all thatmatters is whetherthe theoryfits well, even if its assumptions are wrong. Here theanswer is ratherclear: The empiricaldistribution of choicesclearly is movingin thedirection of the Poisson-Nash equilibrium over the 49 days(as judged by everynumber choice statistic)and is numericallyclose. As a bonus,the CH modelimproves a littleon thePoisson-Nash equilibrium, when optimally param- eterized,in the sense that it can explainthe key ways in whichbehavior departs from Poisson-Nash(too many low andvery high numbers) in theshort run. The estimated

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 30 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1 numberof thinkingsteps is in thefirst week 1.80,which is in line withestimates frommany lab experiments. Notethat the point of thecognitive hierarchy model is notsimply to fitthe data betterthan Poisson-Nash, but also to showhow people with limited computational powermight start near, and converge to, such a complexequilibrium. However, the cognitivehierarchy model provides merely suggestive evidence regarding this con- vergence,and hence, should be viewedas a potentialstepping stone (instead of the finalword) to an investigationusing a formallearning model. Finally,because the LUPI fieldgame is simple,it is possibleto do a lab experi- mentthat closely replicates the essential features of thefield setting (which most experimentsare notdesigned to do). Thisclose lab-fieldparallelism in designadds evidenceto theongoing debate about when lab findingsgeneralize to parallelfield settings(e.g., Levitt and List 2007, and Falk and Heckman 2009). The lab gamewas describedvery much like theSwedish lottery (controlling context), experimental subjectswere allowed to selectout of theexperiment after it was described(allow- ingself-selection), and lab stakeswere made equal to thefield stakes (in expected terms).Basic lab and fieldfindings are fairlyclose: In bothsettings, choices are close to equilibrium,but thereare too manylarge numbers and too few agents chooseintermediate numbers at thehigh end of the equilibrium range. We interpret thisas a good exampleof close lab-fieldgeneralization, when the lab environment is designedto be close to a particularfield environment.35

Appendix:Proof of Proposition 1

We firstprove the five properties and then prove that the equilibrium is unique.

1) We provethis property by induction.For к - 1, we musthave px > 0. Otherwise,deviating from the proposed equilibrium by choosing1 would guaranteewinning for sure. Now supposethat there is somenumber к + 1 thatis notplayed in equilibrium,but that к is playedwith positive probability. We showthat n(k + 1) > тг(&),implying that this cannot be an equilibrium. To see this,note that the expressions for the expected payoffs allows us to writethe ratio тг[к + 1)/тг(&)as = ■ к(к + 1) = Pr(X(k+ 1) Q) nti РгДО) Ф 1) чк) рг(вд = o)-nř:;pr(x(i)#i) ■ _ Рт(Х(к + 1) = 0) Рг(ВД ф 1) РгДО) = 0)

If к + 1 is notused in equilibrium,Pr(X(fc + 1) = 0) = 1, implyingthat theratio is aboveone. This shows that all integersbetween 1 andК areplayed withpositive probability in equilibrium.

35 Of course,it is also conceivablethat there is a genuinelab-field behavioral difference but it is approximately canceledby differences in thedesign details which have opposite effects.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL 3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 31

2) Rewritecondition (1) as

_ enPM enPk = _npk

By thefirst property, both/?¿ and/?¿_H are positive, so thatthe right hand side is negative.Since the exponential is an increasingfunction, we concludethat Pk > Pk+'-

3) Condition(2) can be re-writtenas

= Pk-Pk+i -jfH1 -f(nPk))

- wheref(x) - xe~x,f'x) - (1 x)e~x and x = npk.Hence, f'x) > 0 if* < 1, and/'(jc) < Oifjc > 1. Ifpk+i > 1/w,by thesecond property, npk> npk+' > 1. So,f(npk+i) > f(npk).It followsthat

- - {Pk+i Pk+i) ~7f ta(l -ЛпРш))

= > -¿Ц1 -f(npk)) (Pk-Pk+i)-

lfpk< i/n,by the second property, npk+x < npk< l.So, f(npk+i)

- = (Pk+i Pk+2) -jïH1 -finPk+i))

- < -jfHl -f{npk)) (Рк-рш).

4) Takingthe limit of (2) as n -> oo impliesthat^+1 = pk. • 5) Since 1 = Y,k=l Рк > К pK by the second property,we have pK < 1/K-* Oas/T^ oc.

In orderto show thatthe equilibriump = {p',p2,-• -,Рк) is unique,suppose by contradictionthat there is anotherequilibrium p' = (p[9p'2, . . . , pfK).By theequi- libriumcondition (1), Pi uniquelydetermines all probabilities/?2,...,р^, while p' uniquelydetermines Ръ-.-^Рк- Without loss of generality,we assumep[ > px. Since in anyequilibrium, pk+x is strictlyincreasing in pkby condition(1), it must be thecase thatall positiveprobabilities in p' are higherthan in p. However,since p is an equilibrium,^2k=i Pk= 1-This meansthat J2 k=l p¿ > 1, contradictingthe assumptionthat pr is an equilibrium.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions 32 AMERICANECONOMIC JOURNAL:MICROECONOMICS AUGUST201 1

REFERENCES

Andersen,Steffen, Glenn W. Harrison,Morten I. Lau, and E. ElisabetRutstròm. 2008. "RiskAver- sionin Game Shows."In RiskAversion in Experiments,ed. JamesC. Cox and GlennW. Harrison, 359-404. Bingley:Emerald. Bazerman,Max H., JaredR. Curhan,Don A. Moore,and KathleenL. Valley.2000. "Negotiation." AnnualReview of Psychology, 51: 279-314. Camerer,Colin F. Forthcoming."The Promiseof and Success of Lab Generalizabilityin Experimen- tal Economics:A Replyto Levittand List (2007)." In Methodsin ExperimentalEconomics, ed. AndrewCaplin and Andrew Schotter, Oxford University Press.. Camerer,Colin, and Teck-HuaHo. 1998. "Experience-WeightedAttraction Learning in Coordination Games:Probability Rules, Heterogeneity, and Time- Variation." Journal of Mathematical Psychol- ogy,42(2-3): 305-26. Camerer,Colin R, and Robin M. Hogarth.1999. "The Effectsof FinancialIncentives in Experi- ments:A Reviewand Capital-LaborProduction Framework." Journal of Risk and Uncertainty, 19(1-3): 7-42. Camerer,Colin R, Teck-HuaHo, and Juin-KuanChong. 2004. "A CognitiveHierarchy Model of Games."Quarterly Journal of Economics, 119(3): 861-98. Costa-Gomes,Miguel A., and VincentP. Crawford.2006. "Cognitionand Behaviorin Two-Person GuessingGames: An ExperimentalStudy." American Economic Review, 96(5): 1737-68. Costa-Gomes,Miguel, Vincent P. Crawford,and BrunoBroseta. 2001. "Cognitionand Behaviorin Normal-FormGames: An ExperimentalStudy." Econometrica, 69(5): 1193-1235. Crawford,Vincent P. 2003. "Lyingfor Strategic Advantage: Rational and Boundedly Rational Misrep- resentationof Intentions." American Economic Review, 93(1): 133-49. Crawford,Vincent P., and Nagore Iriberri. 2007a. "Fatal Attraction:Salience, Naïveté, and Sophisticationin Experimental'Hide-and-Seek' Games." American Economic Review, 97(5): 1731-50. Crawford,Vincent P., and NagoreIriberri. 2001b. "Level-kAuctions: Can a NonequilibriumModel of StrategicThinking Explain the Winner's Curse and Overbiddingin Private-ValueAuctions?" Econometrica,15(6): 1721-70. Eichberger,Jürgen, and DmitriVinogradov. 2008. "Least UnmatchedPrice Auctions:A First Approach."University of Heidelberg Department of Economics Discussion Paper 471. Ellingsen,Tore, and RobertÖstling. 2010. "WhenDoes CommunicationImprove Coordination?" AmericanEconomic Review, 100(4): 1695-1724. Erev,Ido, AlvinE. Roth,Robert L. Slonim,and GregBarron. 2007. "Learningand Equilibriumas UsefulApproximations: Accuracy of Predictionon RandomlySelected Constant Sum Games." EconomicTheory, 33(1): 29-5 1. Falk,Armin, and JamesJ. Heckman.2009. "Lab ExperimentsAre a Major Sourceof Knowledgein theSocial Sciences."Science, 326(5952): 535-38. Fischbacher,Urs. 2007. "z-Tree:Zurich Toolbox for Ready-Made Economic Experiments." Experi- mentalEconomics, 10(2): 171-78. Frederick,Shane. 2005. "CognitiveReflection and DecisionMaking." Journal of Economic Perspec- tives,19(4): 25-42. Friedman,Milton. 1953. Essays in PositiveEconomics. Chicago: University of Chicago Press. Gallice,Andrea. 2009. "LowestUnique Bid Auctionswith Signals." Collegio Carlo Alberto Working Paper112. Goeree,Jacob K., CharlesA. Holt,and ThomasR. Palfrey.2002. "QuantalResponse Equilibrium and Overbiddingin Private-Value Auctions." Journal of Economic Theory, 104(1): 247-72. Houba,Harold, Dinard van der Laan, and DirkVeldhuizen. Forthcoming. "Endogenous Entry in Low- est-UniqueSealed-Bid Auctions." Theory and Decision, Levitt,Steven D., and JohnA. List.2007. "WhatDo LaboratoryExperiments Measuring Social Prefer- encesReveal about the Real World?"Journal of Economic Perspectives, 21(2): 153-74. Myerson,Roger B. 1998. "PopulationUncertainty and Poisson Games."International Journal of GameTheory, 27(3): 375-92. Myerson,Roger B. 2000. "LargePoisson Games." Journal of Economic Theory, 94(1): 7^5. Nagel,Rosemarie. 1995. "Unravelingin GuessingGames: An ExperimentalStudy. AmericanEco- nomicReview, 85( 5): 1313-26. Östling,Robert, Joseph Tao-yi Wang, Eileen Y. Chou,and Colin F. Camerer.2011. "TestingGame Theoryin theField: Swedish LUPI LotteryGames: Dataset." American Economic Journal: Micro- economics.http://www.aeaweb.org/articles.php7doi::: 10.1257/mic.3.3. 1.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions VOL.3 NO. 3 ÖSTLINGETAL : TESTINGGAME THEORY IN THE FIELD 33

Post,Thierry, Martijn J. van denAssem, Guido Baltussen, and RichardH. Thaler.2008. "Deal or No Deal? DecisionMaking under Risk in a Large-PayoffGame Show."American Economic Review, 98(1): 38-71. Rapoport,Amnon, Hironori Otsubo, Bora Kim,and WilliamE. Stein.2009. "Unique Bid Auction Games."Jena Economic Research Paper 2009-005. Raviv,Yaron, and Gabor Virag.2009. "Gamblingby Auctions."International Journal of Industrial Organization,27(3): 369-78. Rogers,Brian W., Thomas R. Palfrey,and ColinF. Camerer.2009. "HeterogeneousQuantal Response Equilibriumand CognitiveHierarchies." Journal of Economic Theory, 144(4): 1440-67. Rosenthal,Robert W. 1973. "A Class of Games PossessingPure-Strategy Nash Equilibria."Interna- tionalJournal of Game Theory, 2(1): 65-67. Schwarz,Gideon. 1978. "Estimatingthe Dimension of a Model."Annals of Statistics, 6(2): 461-64. Stahl,Dale O., and Paul W. Wilson.1995. "On Players'Models of OtherPlayers: Theory and Experi- mentalEvidence." Games and EconomicBehavior, 10(1): 218-54. vanDamme, Eric. 1998."On theState of the Art in GameTheory: An Interviewwith ." Gamesand EconomicBehavior, 24(1-2): 181-210.

This content downloaded from 128.32.135.128 on Mon, 21 Jul 2014 17:24:34 PM All use subject to JSTOR Terms and Conditions