
Risk Aversion in Game Shows

by

Steffen Andersen, Glenn W. Harrison, Morten Igel Lau and Elisabet E. Rutström †

Forthcoming, J.C. Cox and G.W. Harrison (eds.), Risk Aversion in Experiments (Greenwich, CT: JAI Press, Research in Experimental Economics, Volume 12, 2007)

Abstract. We review the use of behavior from television game shows to infer risk attitudes. In many cases these shows provide evidence when contestants are making decisions over very large stakes, and in a replicated, structured way. Inferences are generally confounded by the subjective assessment of skill in some games, as well as the dynamic nature of the task in most games. We consider the game shows Card Sharks, Jeopardy!, Lingo, and finally Deal Or No Deal. We provide a detailed case study of the analyses of Deal Or No Deal, since it is one of the cleanest games for inference and has attracted considerable attention. We describe the manner in which the analyses have been undertaken, and propose a general method to overcome the curse of dimensionality that one encounters when estimating risk attitudes in the context of a dynamic, stochastic programming environment.

† Centre for Economic and Business Research, Copenhagen Business School, Copenhagen, Denmark (Andersen); Department of Economics, College of Business Administration, University of Central Florida, USA (Harrison and Rutström); and Department of Economics and Finance, Durham Business School, Durham University, UK (Lau). E-mail: [email protected], [email protected], [email protected], and [email protected]. Harrison and Rutström thank the U.S. National Science Foundation for research support under grants NSF/IIS 9817518, NSF/HSD 0527675 and NSF/SES 0616746. We are grateful to Pavlo Blavatskyy, Daniel Mulino, Ganna Pogrebna, Thierry Post and Martijn van den Assem for discussions and comments.

Table of Contents

1. Previous Literature
   A. Card Sharks
      Estimates of Risk Attitudes
      EUT Anomalies
   B. Jeopardy!
   C. Lingo
2. Deal Or No Deal
3. Previous Deal Or No Deal Analyses
   A. Bombardini and Trebbi [2005]
   B. Mulino, Scheelings, Brooks and Faff [2006]
   C. Post, van den Assem, Baltussen and Thaler [2006a][2006b]
      Interval Responses
      Prior Outcomes
      Stake Effects
      Main Inferences
      Prospect Theory
      Maximum Likelihood Estimation
   D. De Roos and Sarafidis [2006]
   E. Other Studies
4. A General Estimation Strategy
   A. Basic Intuition
   B. Formal Specification
   C. Estimates
5. Conclusions
References

Observed behavior on television game shows constitutes a controlled natural experiment that has been used to estimate risk attitudes. Contestants are presented with well-defined choices where the stakes are real and sizeable, and the tasks are repeated in the same manner from contestant to contestant. We review behavior in these games, with an eye to inferring risk attitudes. We describe the types of assumptions needed to evaluate behavior, and propose a general method for estimating the parameters of structural models of choice behavior for these games. We illustrate with a detailed case study of behavior in the U.S. version of Deal Or No Deal. In section 1 we review the existing literature in this area that is focused on risk attitudes, starting with Gertner [1993] and the Card Sharks program. We then review the analysis of behavior

on Jeopardy! by Metrick [1995] and on Lingo by Beetsma and Schotman [2001].1 In section 2 we turn to a detailed case study of the Deal Or No Deal program, which has generated an explosion of analyses trying to estimate large-stakes risk aversion. We first explain the basic rules of the game, which is shown with some variations in many countries. In section 3 we review in some detail previous analyses of behavior in Deal Or No Deal versions shown in Australia, Italy, the Netherlands, and the United Kingdom. Section 4 proposes a general method for estimating choice models in the stochastic dynamic programming environment that most of these game shows employ. We resolve the “curse of dimensionality” in this setting by using randomization methods. We illustrate the application of the method using U.S. behavior, estimating a simple structural model of expected utility theory choice behavior. The manner in which our inferences can be extended to other models is also discussed. Finally, in section 5 we identify several weaknesses of the data, and discuss how they might be addressed. We stress the complementary use of natural experiments, such as game shows, and laboratory experiments.

1 Behavior on Who Wants To Be A Millionaire has been carefully evaluated by Hartley, Lanot and Walker [2005], but this game involves a large number of options and alternatives that necessitate some strong assumptions before one can pin down risk attitudes rigorously. We focus on games in which risk attitudes are relatively easier to identify.

1. Previous Literature

A. Card Sharks

The game show Card Sharks provided an opportunity for Gertner [1993] to examine dynamic choice under uncertainty involving substantial gains and losses. Two key features of the show allowed him to examine the hypothesis of asset integration: each contestant’s stake accumulates from round to round within a game, and some contestants come back for repeat plays after winning substantial amounts. The game involves each contestant deciding in a given round whether to bet that the next card drawn from a deck will be higher or lower than some “face card” on display. Figure 1 provides

a rough idea of the layout of the “Money Cards” board before any face cards are shown. Figure 2 provides a clearer representation of the board from a computerized laboratory implementation of Card Sharks in Andersen et al. [2006c]. In Figure 2 the contestant-subject has a face card with a 3, and is about to enter their first bet. Cards were drawn without replacement from a standard 52-card deck, with no Jokers and with Aces high. The contestant decides on the direction of the next card, and then on an amount to bet that their choice is correct. If they are correct their stake increments by the amount bet, but if they are incorrect their stake is reduced by the amount bet.2 Every contestant started off with an initial stake of $200, and bets could be made in $50 increments up to the available stake. After three rounds in the first, bottom “row” of cards, the contestant moved to the second, middle “row” and received an additional $200 (or $400 in some versions). If the contestant’s stake had gone to zero in the first row, they went straight to the second row and received the new stake; otherwise, the additional stake was theirs to bet with. The second row included three choices, just as in the first row. After these three choices, and if the subject’s stake had not dropped to zero, they could play the final bet. In this case they had to bet at least one-half of their stake, but otherwise the bets were the same. One feature of the game is that contestants sometimes had the option to switch face cards in the hope of

2 If the new card is the same as the face card, there is no change in the stake. There are minor variations in the versions of the game show.

getting one that was easier to win against.3 The show aired in the United States in two major versions. The first, between April 1978 and October 1981, was on NBC and had Jim Perry as the host. The second, between January 1986 and March 1989, was on CBS and had Bob Eubanks as the host.4 The maximum prize was $28,800 on the NBC version and $32,000 on the CBS version, and would be won if the contestant correctly bet the maximum amount in every round. This only occurred once. Using official inflation calculators,5 the maximum prize converts into 2006 dollars of between $89,138 and $63,936 for the NBC version, and between $58,920 and $52,077 for the CBS version, depending on the air date. These stakes are actually quite modest in relation to contemporary game shows in the United States, such

as Deal Or No Deal described below, which typically has a maximal stake of $1,000,000. Of course, maximal stakes can be misleading, since Card Sharks and Deal Or No Deal are both “long shot” lotteries. Average earnings in the CBS version used by Gertner [1993] were $4,677, which converts to between $8,611 and $7,611 in 2006, whereas average earnings in Deal Or No Deal have been $131,943 for the sample we report later (excluding a handful of special shows with significantly higher prizes).

Estimates of Risk Attitudes

The analysis of Gertner [1993] assumes a Constant Absolute Risk Aversion (CARA) utility function, since he did not have information on household wealth and viewed that as necessary to estimate a Constant Relative Risk Aversion (CRRA) utility function. We return to the issue of household wealth later.

3 In the earliest versions of the show this option only applied to the first card in the first row. Then it applied to the first card in each row in later versions. Finally, in the last major version it applied to any card in any row, but only one card per row could be switched.

4 Two further American versions were broadcast. One was a syndicated version in the 1986/1987 season, with Bill Rafferty as host. Another was a brief syndicated version in 2001. A British version, called Play Your Cards Right, aired in the 1980's and again in the 1990's. A German version called Bube Dame Hörig, and a Swedish version called Lagt Kort Ligger, have also been broadcast. Card Sharks re-runs remain relatively popular on the American Game Show Network, a cable station.

5 Available at http://data.bls.gov/cgi-bin/cpicalc.pl.

Gertner [1993] presents several empirical analyses. He initially (p.511) focused on the last round, and used the optimal “investment” formula

b = [ln(p_win) − ln(p_lose)] / (2α)     (0)

where the probabilities of winning and losing the bet b are defined by p_win and p_lose, and the utility function is U(W) = −exp(−αW) for wealth W.6 From observed bets he inferred α. There are several potential problems with this approach. First, there is an obvious sample selection problem from only looking at the last round, although this is not a major issue since relatively few contestants go bankrupt (less than 3%).

Second, there is the serious problem of censoring at bets of 50% or 100% of the stake. The former are assumed to be optimal bets, when in fact the contestant might wish to bet less; thus inferences will be biased towards showing less risk aversion than there might actually be. The latter were assumed to be risk neutral, when in fact they might be risk lovers; thus inferences will be biased towards showing more risk aversion than there might actually be. Two wrongs do not make a right, although one does encounter such claims in empirical work. Gertner [1993; p.510] was well aware of the issue, and indeed motivated this approach by a desire to avoid it:

Regression estimates of absolute risk aversion are sensitive to the distribution assumptions one makes to handle the censoring created by the constraints that a contestant must bet no more than her stake and at least half of her stake in the final round. Therefore, I develop two methods to estimate a lower bound on the level of risk aversion that do not rely on assumptions about the error distribution.

Of course, the first method relies on exactly the same sort of assumptions, although not formalized in terms of an error distribution. And it is not clear that the estimates will be lower bounds, since this censoring issue biases inferences in either direction. The average estimate of ARA to emerge is 0.000310, with a standard error of 0.000017, but it is not clear how one should interpret this estimate since it could be an overestimate or an underestimate.
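To make the first method concrete, the first order condition behind formula (0) and the inversion used to recover α from an observed bet can be written out explicitly; the numbers in the final line are purely illustrative and are not taken from Gertner's data.

```latex
% CARA utility U(W) = -exp(-\alpha W): baseline wealth cancels out of the first order condition.
\begin{align*}
EU(b) &= p_{win}\,U(b) + p_{lose}\,U(-b) \\
\frac{dEU}{db} &= p_{win}\,\alpha e^{-\alpha b} - p_{lose}\,\alpha e^{\alpha b} = 0
  \;\Rightarrow\; e^{-2\alpha b} = \frac{p_{lose}}{p_{win}} \\
b^{*} &= \frac{\ln(p_{win}) - \ln(p_{lose})}{2\alpha} \qquad \text{(formula (0))} \\
\hat{\alpha} &= \frac{\ln(p_{win}) - \ln(p_{lose})}{2b}
  \qquad \text{e.g. } p_{win}=0.75,\; p_{lose}=0.25,\; b=\$1{,}000
  \;\Rightarrow\; \hat{\alpha} = \frac{\ln 3}{2{,}000} \approx 0.00055 .
\end{align*}
```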

6 Let the expected utility of the bet b be p_win×U(b) + p_lose×U(−b). The first order condition for a maximum over b is then p_win×U′(b) − p_lose×U′(−b) = 0. Since U′(b) = α×exp(−αb) and U′(−b) = α×exp(αb), substitution and simple manipulation yield the formula.

The second approach is a novel and early application of simulation methods, which we will develop in greater detail below. A computer simulates optimal play by a risk-neutral agent playing the entire game 10 million times, recognizing that the cards are drawn without replacement. The computer does not appear to recognize the possibility of switching cards, but that is not central to the methodological point. The average return from this virtual lottery (VL), in which one had a risk-neutral agent play for you, is $6,987 with a standard deviation of $10,843. It is not apparent that the lottery would have a Gaussian distribution of returns, but that can be allowed for in a more complete numerical analysis as we show later, and is again not central to the main methodological point.
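A minimal sketch of the kind of simulation just described is given below, in Python. It is not Gertner's code and will not reproduce his exact figures: the betting rule for the risk-neutral agent (bet the whole stake when the win probability exceeds one half, the minimum otherwise), the $50 minimum bet, the treatment of the drawn card as the next face card, and the omission of card switching and of the $400 second-row variant are our own simplifying assumptions.

```python
import random

RANKS = list(range(2, 15)) * 4          # ranks 2..14, Aces high, 52 cards, no Jokers

def play_money_cards(rng):
    """One play of a stylized Money Cards board by a risk-neutral agent."""
    deck = RANKS[:]
    rng.shuffle(deck)
    face = deck.pop()                    # initial face card
    stake = 200
    for bet_number in range(7):          # 3 bets per row x 2 rows, then the final bet
        if bet_number == 3:              # moving to the second row earns an extra $200
            stake += 200
        if stake == 0:
            if bet_number < 3:
                continue                 # bankrupt in row 1: wait for the second-row stake
            return 0                     # bankrupt later: nothing left to bet
        higher = sum(1 for c in deck if c > face)
        lower = sum(1 for c in deck if c < face)
        # Risk-neutral rule (our assumption): bet everything on the more likely direction;
        # bet the minimum ($50, or half the stake on the final bet) when directions are tied.
        direction_up = higher >= lower
        favorable = higher != lower
        minimum = stake // 2 if bet_number == 6 else 50
        bet = stake if favorable else min(minimum, stake)
        card = deck.pop()
        if card == face:
            pass                         # tie: the bet is simply returned
        elif (card > face) == direction_up:
            stake += bet
        else:
            stake -= bet
        face = card                      # the drawn card becomes the next face card
    return stake

rng = random.Random(123)
outcomes = [play_money_cards(rng) for _ in range(100_000)]
mean = sum(outcomes) / len(outcomes)
var = sum((x - mean) ** 2 for x in outcomes) / len(outcomes)
print(f"virtual lottery mean ${mean:,.0f}, std ${var ** 0.5:,.0f}")
```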

The next step is to compare this distribution with the observed distribution of earnings, which was an average of $4,677 with a standard deviation of $4,258, and use a revealed preference argument to infer what risk attitudes must have been in play for this to have been the outcome instead of the virtual lottery:

A second approach is to compare the sample distribution of outcomes with the distribution of outcomes if a contestant plays the optimal strategy for a risk-neutral contestant. One can solve for the coefficient of absolute risk aversion that would make an individual indifferent between the two distributions. By revealed preference, an “average” contestant prefers the actual distribution to the expected-value maximizing strategy, so this is an estimate of the lower bound of constant absolute risk aversion. (p.511/512)

This approach is worth considering in more depth, because it suggests estimation strategies for a wide class of stochastic dynamic programming problems which we develop in Section 4. This exact method will not work once one moves beyond special cases such as risk neutrality, where outcomes and behavior in later rounds have no effect on optimal behavior in earlier rounds. But we will see that an extension of the method does generalize. The comparison proposed here generates a lower bound on the ARA, rather than a precise estimate, since we know that an agent with an even higher ARA would also implicitly choose the observed distribution over the virtual RN distribution. Obviously, if one could generate VL distributions for a wide range of ARA values, it would be possible to refine this estimation step and select the ARA that maximizes the likelihood of the data. This is, in fact, exactly what we propose

later as a general method for estimating risk attitudes in such settings. The ARA bound derived from this approach is 0.0000711, less than one-fourth of the estimate from the first method. Gertner [1993; p.512] concludes that

The “Card Sharks” data indicate a level of risk aversion higher than most existing estimates. Contestants do not seem to behave in a risk-loving and enthusiastic way because they are on television, because anything they win is gravy, or because the producers of the show encourage excessive risk-taking. I think this helps lend credence to the potential importance and wider applicability of the anomalous results I document below.

His first method does not provide any basis for these claims, since risk-loving is explicitly assumed away. His second method does indicate that the average player behaves as if risk averse, but there are no standard errors on that bound. Thus one simply cannot say that it is statistically significant evidence of risk aversion.
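The lower-bound step itself can be mimicked with a few lines of code: find the CARA coefficient at which an agent is indifferent between two empirical distributions of earnings. The distributions below are hypothetical stand-ins for the observed and simulated outcomes, not Gertner's data; with his actual distributions in their place, the root would correspond to the bound he reports.

```python
import numpy as np
from scipy.optimize import brentq

def cara_eu(outcomes, alpha):
    """Expected CARA utility of a sample of monetary outcomes."""
    outcomes = np.asarray(outcomes, dtype=float)
    if abs(alpha) < 1e-12:
        return outcomes.mean()                 # risk-neutral limit: expected value
    return np.mean(-np.exp(-alpha * outcomes))

def indifference_alpha(observed, simulated, hi=0.01):
    """CARA coefficient at which the two distributions give equal expected utility."""
    gap = lambda a: cara_eu(observed, a) - cara_eu(simulated, a)
    return brentq(gap, 1e-9, hi)

# Hypothetical stand-ins: the simulated (risk-neutral) lottery has a higher mean but a
# much fatter spread than the observed earnings, as in the Card Sharks comparison.
rng = np.random.default_rng(1)
simulated = rng.lognormal(mean=8.0, sigma=1.2, size=100_000)
observed = rng.lognormal(mean=8.1, sigma=0.7, size=1_000)
print(indifference_alpha(observed, simulated))
```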

EUT Anomalies

The second broad set of empirical analyses by Gertner [1993] considers a regression model of bets in the final round, and shows some alleged violations of EUT. The model is a two-limit tobit specification, recognizing that bets at 50% and 100% may be censored. However, most of the settings in which contestants might rationally bet 50% or 100% are dropped. Bets with a face card of 2 or an Ace are dropped since they are sure things in the sense that the optimal bet cannot result in a loss (the bet is simply returned if the same card is then turned up). Similarly, bets with a face card of an 8 are dropped, since contestants almost always bet the minimum. These deletions amount to 258 of the 844 observations, which is not a trivial sub-sample. The regression model includes several explanatory variables. The central ones are cash and stake. Variable cash is the accumulated earnings by the contestant to that point over all repetitions of the game. So this includes previous plays of the game for “champions,” as well as earnings accumulated in rounds 1-6 of the current game. Variable stake is the accumulated earnings in the current game, so it excludes earnings from previous games. One might expect the correlation of stake and cash to be positive and high, since the average number of times the game is played in these

data is 1.85 (= 844/457). Additional explanatory variables include a dummy for new players that are in their first game; the ratio of cash to the number of times the contestant has played the whole game (the ratio is 0 for new players); the value of any cars that have been won, given by the stated sticker price of the car; and dummy variables for each of the possible face card pairs (in this game a 3 is essentially the same as a King, a 4 the same as a Queen, etc). The stake variable is included as an interaction with these face dummies, which are also included by themselves.7 The model is estimated with or without a multiplicative heteroskedasticity correction, and the latter estimates are preferred. Card-counters are ignored when inferring probabilities of a win, and this seems reasonable as a first approximation.
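For readers unfamiliar with the specification, a two-limit tobit log-likelihood of the general kind estimated here can be sketched as follows. The regressors and data are schematic placeholders, not Gertner's design matrix; the only feature carried over from the text is that the censoring limits are observation-specific (50% and 100% of each contestant's stake).

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def two_limit_tobit_negll(params, y, X, lower, upper):
    """Negative log-likelihood for a tobit with observation-specific censoring limits."""
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    mu = X @ beta
    ll = np.where(
        y <= lower, norm.logcdf((lower - mu) / sigma),          # censored at the minimum bet
        np.where(
            y >= upper, norm.logsf((upper - mu) / sigma),       # censored at the maximum bet
            norm.logpdf((y - mu) / sigma) - np.log(sigma),      # interior bets
        ),
    )
    return -ll.sum()

# Schematic example: latent desired bets as a linear function of the stake.
rng = np.random.default_rng(0)
n = 500
stake = rng.uniform(200, 3000, n)
X = np.column_stack([np.ones(n), stake])
latent = 0.7 * stake + rng.normal(0, 300, n)
lower, upper = 0.5 * stake, stake                               # minimum and maximum feasible bets
y = np.clip(latent, lower, upper)

start = np.concatenate([np.linalg.lstsq(X, y, rcond=None)[0], [np.log(y.std())]])
fit = minimize(two_limit_tobit_negll, start, args=(y, X, lower, upper), method="BFGS")
print(fit.x[:-1], np.exp(fit.x[-1]))
```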

Gertner [1993; §VI] draws two striking conclusions from this model. The first is that stake is statistically significant in its interactions with the face cards. The second is that the cash variable is not significant. The first result is said to be inconsistent with EUT since earnings in this show are small in relation to wealth, and

The desired dollar bet should depend upon the stakes only to the extent that the stakes impact final wealth. Thus, risky decisions on “Card Sharks” are inconsistent with individuals maximizing a utility function over just final wealth. If one assumes that utility depends only on wealth, estimates of zero on card intercepts and significant coefficients on the stake variable imply that outside wealth is close to zero. Since this does not hold, one must reject utility depending only on final wealth. (p. 517)

This conclusion bears close examination. First, there is a substantial debate as to whether EUT has to be defined over final wealth, whatever that is, or can be defined just over outcomes in the choice task before the contestant (e.g., see Cox and Sadiraj [2006] and Harrison, Lau and Rutström [2007] for references to the historical literature). So even if one concludes that the stake matters, this is not fatal for specifications of EUT defined over prizes, as clearly recognized by Gertner [1993; p.519] in his reference to Markowitz [1952]. Second, the deletion of all extreme bets likely leads to a significant understatement of uncertainty about coefficient estimates. Third, the regression does not correct for panel effects, and these could be significant since the variables cash and stake are

7 In addition, a variable given by stake²/2000 is included by itself to account for possible non-linearities.

correlated with the individual.8 Hence their coefficient estimates might be picking up other, unobservable effects that are individual-specific. The second result is also said to be inconsistent with EUT, in conjunction with the first result. The logic is that stake and cash should have an equal effect on terminal wealth, if one assumes perfect asset integration and that utility is defined over terminal wealth. But one has a significant effect on bets, and the other does not. Since the assumptions that utility is defined over terminal wealth and that asset integration is perfect are implicitly maintained by Gertner [1993; p.517ff.], he concludes that EUT is falsified. However, one can include terminal wealth as an argument of utility without also assuming perfect asset integration (e.g., Cox and Sadiraj [2006]). This is also recognized

explicitly by Gertner [1993; p.519], who considers the possibility that “contestants have multi-attribute utility functions, so that they care about something in addition to wealth.”9 Thus, if one accepts the statistical caveats about samples and specifications for now, these results point to the rejection of a particular, prominent version of EUT, but they do not imply that all popular versions of EUT are invalid.

B. Jeopardy!

In the game show Jeopardy! there is a sub-game referred to as Final Jeopardy. At this point three contestants have cash earnings from the initial rounds. The skill component of the game consists of hearing some text read out by the host, at which point the contestants jump in to state the question to which that text is the answer.10 In Final Jeopardy the contestants are told the general subject matter for the task, and then have to privately and simultaneously state a wager amount from their accumulated points. They can wager any amount up to their earned endowment at that point, and are rewarded with even odds: if they are correct they get that wager amount added, but if they

8 Gertner [1993; p. 512]: “I treat each bet as a single observation, ignoring any contestant-specific effects.”

9 He rejects this hypothesis, for reasons not important here.

10 For example, in a game aired on 9/16/2004, the category was “Speaking in Tongues.” The $800 text was “A 1996 Oakland School Board decision made many aware of this term for African-American English.” Uber-champion Ken Jennings correctly responded, “What be Ebonics?”

are incorrect they have that amount deducted. The winner of the show is the contestant with the most cash after this final stage. The winner gets to keep the earnings and come back the following day to try and continue as champion. In general, these wagers are affected by the risk attitudes of contestants. But they are also affected by their subjective beliefs about their own skill level relative to the other two contestants, and by what they think the other contestants will do. So this game cannot be fully analyzed without making some game-theoretic assumptions. Jeopardy! was first aired in the United States in 1964, and continued until 1975. A brief season returned between 1978 and 1979, and then the modern era began in 1984 and continues today. The format changes have been relatively small, particularly in the modern era. The data used by Metrick [1995] comes from shows broadcast between October 1989 and January 1992, and reflects over 1,150 decisions. Metrick [1995] examines behavior in Final Jeopardy in two stages.11 The first stage considers the subset of shows in which one contestant is so far ahead in cash that the bet only reveals risk attitudes and beliefs about own skill. In such “runaway games” there exist wagers that will ensure victory, although there might be some rationale prior to September 2003 for someone to bet an amount that could lead to a loss. Until then, the champion had to retire after 5 wins, so if one had enough confidence in one’s skill at answering such questions, one might rationally bet more than was needed to ensure victory. After September 2003 the champion stayed on until defeated. In the runaway games Metrick [1995; p.244] uses the same formula (0) that Gertner [1993] used for CARA utility functions. The only major difference is that the probability of winning in

Jeopardy! is not known objectively to the observer.12 His solution is to substitute the observed fraction of correct answers, akin to a rational expectations assumption, and then solve for the CARA

11 Nalebuff [1990; p.182] proposed the idea of the analysis, and the use of empirical responses to avoid formal analysis of the strategic aspects of the game.

12 One formal difference is that the first order condition underlying that formula assumes an interior solution, and the decision-maker in runaway games has to ensure that he does not bet so much that a loss would drop him below the highest possible score of his rival. Since this constraint did not bind in the 110 data points available, it can be glossed over.

parameter α that accounts for the observed bets. The result is an estimate of α equal to 0.000066 with a standard error of 0.000056. Thus there is slight evidence of risk aversion, but it is not statistically significant, leading Metrick [1995; p.245] to conclude that these contestants behaved in a risk-neutral manner. The second stage of the analysis considers sub-samples in which two players have accumulated scores that are sufficiently close that they have to take the other into account, but where there is a distant third contestant who can be effectively ignored. Metrick [1995] cuts this Gordian Knot of strategic considerations by assuming that contestants view themselves as betting against contestants whose behavior can be characterized by their observed empirical frequencies.

He does not use these data to make inferences about risk attitudes.
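To connect the runaway-game analysis back to formula (0): with an empirical estimate p̂ of the probability of answering correctly in place of an objective probability, the wager w of a contestant who faces even odds reveals the CARA coefficient directly. The numbers below are hypothetical, chosen only to show the order of magnitude involved.

```latex
% Inverting formula (0) for a Final Jeopardy wager w at even odds,
% with the empirical frequency of correct answers \hat{p} in place of p_win.
\[
\hat{\alpha} \;=\; \frac{\ln(\hat{p}) - \ln(1-\hat{p})}{2w},
\qquad \text{e.g. } \hat{p}=0.6,\; w=\$3{,}000
\;\Rightarrow\; \hat{\alpha} = \frac{\ln(1.5)}{6{,}000} \approx 0.000068 .
\]
```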

C. Lingo

The underlying game in Lingo involves a team of two people guessing a hidden 5-letter word. Figure 3 illustrates one such game from the U.S. version. They are told the first letter of the word, and can then just state words. If a guess is incorrect, it is used to reveal any letters it shares with the correct word. To take the example in Figure 3, the true word was STALL. So the initial S was shown. The team suggested SAINT and were informed (by yellow coloring) that the A and T are in the correct word. They are not told the positions of the letters A and T in the correct word. The team then suggested STAKE, and were informed that the T and A were in the right place (by red coloring) and that no other letters were in the correct word. They then tried STAIR, SEATS, and finally STALL. Most teams are able to guess the correct word within 5 rounds. The game occurs in two stages. In the first stage one team of two plays against another team in several of these Lingo word-guessing games. The couple with the most money then goes on to the second stage, which is the one of interest for measuring risk attitudes. So the winning couple comes into the main task with a certain earned endowment (which could be augmented by an unrelated game called “jackpot”). They also come in with some knowledge of their own ability to solve these word-guessing puzzles.

In the Dutch data used by Beetsma and Schotman [2001], spanning 979 games, the frequency distribution of the number of guesses needed to solve the word (1 through 5, or failure to solve) in the final stage was 0.14, 0.32, 0.23, 0.13, 0.081 and 0.089, respectively. Every guess that the couple requires to solve the word means that they have to pick one ball from an urn affecting their payoffs, as described below. If they do not solve the word puzzle they have to pick 6 balls. These balls determine if the team goes “bust” or “survives” something called the Lingo Board in that round. An example of the Lingo Board is shown in Figure 4, from Beetsma and Schotman [2001; Figure 3].13 There are 35 balls in the urn numbered from 1 to 35, plus one “golden ball.” If the golden ball is picked then the team wins the cash prize for that round and gets a free pass to the next round. If one of the numbered

balls is picked, then the fate of the team depends on the current state of the Lingo Board. The team goes “bust” if they get a row, column or diagonal of V’s, akin to the parlor game noughts and crosses. So solving the word puzzle in fewer guesses is good, since it means that fewer balls have to be drawn from the urn, and hence that the survival probability is higher. In the example from Figure 4, drawing a 5 would be fatal, drawing an 11 would not be, and drawing a 1 would not be if a 2 or 8 had not been previously drawn. If the team survives a round it gets a cash prize, and is asked if they want to keep going or stop. This lasts for five rounds. So apart from the skill part of the game, guessing the words, this is the only choice the team makes. This is therefore a “stop-go” problem, in which the team balances their current earnings with the lottery of continuing and either earning more cash or going bust. If they choose to continue their stake doubles; if the golden ball had been drawn it is replaced in the urn. If they go bust they take home nothing. Teams can play the game up to three times, then they retire from the show. Risk attitudes are involved when the team has to balance the current earnings with the lottery of continuing. That lottery depends on their subjective beliefs about their skill level, the state of the Lingo Board at that point, and their perception of the probabilities of drawing a “fatal” number or

13 The Lingo Board in the U.S. version is larger, and there are more balls in the urn, with implications for the probabilities needed to infer risk attitudes.

the golden ball. In many respects, apart from the skill factor and the relative symmetry of prizes, this game is remarkably like Deal Or No Deal, as we see later. Beetsma and Schotman [2001] evaluate data from 979 finals. Each final lasts several rounds, so the sample of binary stop/continue decisions is larger, and constitutes a panel. Average earnings in the final in their sample are 4,106 Dutch Guilders (ƒ), with potential earnings, given the initial stakes brought into the final, of around ƒ15,136. The average exchange rate in 1997, which is around when these data were from, was $0.514 per guilder, so these stakes are around $2,110 on average, and up to roughly $7,780. These are not life-changing prizes, like the top prizes in Deal Or No Deal, but are clearly substantial in relation to most lab experiments.
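The survival probability that enters this stop/continue choice can be approximated by brute force. The sketch below assumes a 5×5 Lingo Board whose cells hold distinct ball numbers, with some cells already crossed off, and treats a drawn golden ball as guaranteeing survival for the round; the particular board layout and urn below are hypothetical, not the board in Figure 4.

```python
import random

LINES = (
    [[(r, c) for c in range(5)] for r in range(5)] +              # rows
    [[(r, c) for r in range(5)] for c in range(5)] +              # columns
    [[(i, i) for i in range(5)], [(i, 4 - i) for i in range(5)]]  # diagonals
)

def busts(marked):
    """True if any row, column or diagonal of the 5x5 board is fully crossed off."""
    return any(all(cell in marked for cell in line) for line in LINES)

def survival_probability(board, marked, urn, n_draws, n_sims=100_000, seed=7):
    """Monte Carlo estimate of surviving n_draws balls from the urn.

    board  : dict mapping ball number -> (row, col) cell on the Lingo Board
    marked : set of cells already crossed off
    urn    : list of numbered balls still in the urn; None stands for the golden ball
    """
    rng = random.Random(seed)
    survived = 0
    for _ in range(n_sims):
        cells = set(marked)
        alive = True
        for ball in rng.sample(urn, n_draws):
            if ball is None:              # golden ball: survive the round immediately
                break
            if ball in board:
                cells.add(board[ball])
                if busts(cells):
                    alive = False
                    break
        survived += alive
    return survived / n_sims

# Hypothetical example: balls 1..25 laid out row by row, the first four cells of the top
# row already crossed off (so drawing ball 5 is fatal), and an urn holding the remaining
# 31 numbered balls plus the golden ball.
board = {n + 1: (n // 5, n % 5) for n in range(25)}
marked = {(0, 0), (0, 1), (0, 2), (0, 3)}
urn = [b for b in range(1, 36) if b not in (1, 2, 3, 4)] + [None]
print(survival_probability(board, marked, urn, n_draws=3))
```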

Beetsma and Schotman [2001; §4] show that the stop/continue decisions have a simple monotonic structure if one assumes CRRA or CARA utility. Since the odds of surviving never get better with more rounds, if it is optimal to stop in one round it will always be optimal to stop in any later round. This property does not necessarily hold for other utility functions. But for these utility

functions, which are still an important class, one can calculate a threshold survival probability p*_i for any round i such that the team should stop if the actual survival probability falls below it. This threshold probability does depend on the utility function and parameter values for it, but in a closed-form fashion that can be easily evaluated within a maximum-likelihood routine.14 Each team can play the game three times before they have to retire as champions. The specification of the problem clearly recognizes the option value in the first game of coming back to play the game a second or third time, and then the option value in the second game of coming back to play a third time. The certainty-equivalent of these option values depends, of course, on the risk attitudes of the team. But the estimation procedure “black boxes” these option values, to collapse the estimation problem down to a static one: they are free parameters to be estimated along with the parameter of the utility function. Thus they are not constrained by the expected returns and risk of

14 Their equation (12) shows the formula for the general case, and equations (5) and (8) for the special final-round cases assuming CRRA or CARA. There is no statement that this is actually evaluated within the maximum likelihood evaluator, but p*_i is not listed as a parameter to be estimated separately from the utility function parameter, so this is presumably what was done.

future games, the functional form of utility, and the specific parameter values being evaluated in the maximum likelihood routine. Beetsma and Schotman [2001; p.839] do clearly check that the option value in the first game exceeds the option value in the second game, but (a) they only examine point estimates, and make no claim that this difference is statistically significant,15 and (b) there is no check that the absolute values of these option values are consistent with the utility function and parameter values. There is no mention of any corrections for the fact that each team makes several decisions, and that errors for that team are likely correlated. With these qualifications, the estimates of the CRRA parameter are 0.42, with a standard

error of 0.05, if one assumes that utility is only defined over the monetary prizes. It rises to 6.99, with a standard error of 0.72, if one assumes a baseline wealth level of ƒ50,000, which is the preferred estimate. Each of these estimates is significantly different from 0, implying rejection of risk neutrality in favor of risk aversion. The CARA specification generates comparable estimates.
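To see the threshold logic in its simplest form, consider the last round of the final with no continuation value and utility defined over the prize money alone; this is our own illustrative special case of the logic behind their equations (5) and (12), which also handle baseline wealth and the option values discussed above. Under CRRA with zero baseline wealth the threshold depends only on the curvature parameter, and at the prize-only estimate of 0.42 it is roughly two-thirds.

```latex
% Stop/continue in the last round, CRRA utility u(x) = x^{1-r}/(1-r), zero baseline wealth.
% Continuing doubles the stake W with survival probability p and yields nothing otherwise.
\[
p\,u(2W) \;=\; u(W)
\;\Longrightarrow\;
p^{*} \;=\; \frac{W^{1-r}}{(2W)^{1-r}} \;=\; 2^{\,r-1},
\qquad r = 0.42 \;\Rightarrow\; p^{*} = 2^{-0.58} \approx 0.67 .
\]
```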

One extension is to allow for probability weighting on the actual survival probability p_i in round i. The weighting occurs in the manner of original Prospect Theory, due to Kahneman and Tversky [1979], and not in the rank-dependent manner of Quiggin [1982][1993] and Cumulative Prospect Theory. One apparent inconsistency is that the actual survival probabilities are assumed to be

weighted subjectively, but the threshold survival probabilities p*_i are not, which seems odd (see their equation (18) on page 843). The results show that estimates of the degree of concavity of the utility function increase substantially, and that contestants systematically over-weight the actual survival probability. We return to some of the issues of structural estimation of models assuming decision weights, in a rank-dependent manner, in the discussion of Deal Or No Deal and Andersen et al. [2006a][2006b].

15 The point estimates for the CRRA function (their Table 6, p.837) are generally around ƒ1,800 and ƒ1,500, with standard errors of roughly ƒ200 on each. Similar results obtain for the CARA function (their Table 7, p.839). So these differences are not obviously significant at standard critical levels.

2. Deal Or No Deal

The basic structure of Deal Or No Deal is the same across all national versions. We explain these general rules by focusing on the English-language version shown in the United States, and then consider variants found in other countries. The show confronts the contestant with a sequential series of choices over lotteries, and asks a simple binary decision: whether to play the (implicit) lottery or take some deterministic cash offer. A contestant is picked from the studio audience. They are told that a known list of monetary prizes, ranging from $0.01 up to $1,000,000, has been placed in 26 suitcases.16 Each suitcase is carried on-stage by attractive female models, and has a number from 1 to 26 associated with it. The contestant

is informed that the money has been put in the suitcase by an independent third party, and in fact it is common that any unopened cases at the end of play are opened so that the audience can see that all prizes were in play. Figure 5 shows how the prizes are displayed to the subject at the beginning of the game. The contestant starts by picking one suitcase that will be “his” case. In round 1 the contestant must pick 6 of the remaining 25 cases to be opened, so that their prizes can be displayed. Figure 6 shows how the display changes after the contestant picks the first case: in this case the contestant unfortunately picked the case containing the $300,000 prize. A good round for a contestant occurs if the opened prizes are low, and hence the odds increase that his case holds the higher prizes. At the end of each round the host is phoned by a “banker” who makes a deterministic cash offer to the contestant. In one of the first American shows (12/21/2005) the host made a point of saying clearly that “I don’t know what’s in the suitcases, the banker doesn’t, and the models don’t.” The initial offer in early rounds is typically low in comparison to expected offers in later rounds. We use an empirical offer function later, but the qualitative trend is quite clear: the bank offer starts out at roughly 10% of the expected value of the unopened cases, and increments by

16 A handful of special shows, such as season finales and season openers, have higher stakes up to $6 million. Our later statistical analysis includes these data, and adjusts the stakes accordingly.

about 10% of that expected value for each round. This trend is significant, and serves to keep all but extremely risk averse contestants in the game for several rounds. For this reason it is clear that the case that the contestant “owns” has an option value in future rounds. In round 2 the contestant must pick 5 cases to open, and then there is another bank offer to consider. In succeeding rounds 3 through 9 the contestant must open 4, 3, 2, 1, 1, 1 and 1 cases, respectively. At the end of round 9 there are only 2 unopened cases, one of which is the contestant’s case. In round 9 the decision is a relatively simple one from an analyst’s perspective: either take the non-stochastic cash offer or take the lottery with a 50% chance of either of the two remaining unopened prizes. We could assume some latent utility function, and estimate parameters for that function that best explain observed binary choices. Unfortunately, relatively few contestants get to this stage, having accepted offers in earlier rounds. In our data, only 9% of contestants reach that point. More serious than the smaller sample size, one naturally expects that risk attitudes would affect those surviving to this round. Thus there would be a serious sample attrition bias if one just studied choices in later rounds. If the bank offer were random this sequence could be evaluated as a series of static choices, at least under standard EUT. The cognitive complexity of evaluating the compound lottery might be a factor in behavior, but it would conceptually be simple to analyze. However, the bank offer gets richer and richer over time, ceteris paribus the random realizations of opened cases. In other words, if each unopened case truly has the same subjective probability of having any remaining prize, there is a positive expected return to staying in the game for more and more rounds. A risk averse subject that might be just willing to accept the bank offer, if the offer were not expected to get better and better, would choose to continue to another round since the expected improvement in the bank offer provides some compensation for the additional risk of going into another round. Thus, to evaluate the parameters of some latent utility function given observed choices in

earlier rounds, we have to mentally play out all possible future paths that the contestant faces.17 Specifically, we have to play out those paths assuming the values for the parameters of the likelihood function, since they affect when the contestant will decide to “deal” with the banker, and hence the expected utility of the compound lottery. This corresponds to procedures developed in the finance literature to price path-dependent derivative securities using Monte Carlo simulation (e.g., Campbell, Lo and MacKinlay [1997; §9.4]). We discuss general numerical methods for this type of analysis later. Saying “no deal” in early rounds provides one with the option of being offered a better deal in the future, ceteris paribus the expected value of the unopened prizes in future rounds. Since the process of opening cases is a martingale process, even if the contestant gets to pick the cases to be

opened, it has a constant future expected value in any given round equal to the current expected value. This implies, given the exogenous bank offers (as a function of expected value), that the dollar value of the offer will get richer and richer as time progresses. Thus bank offers themselves will be a sub-martingale process. In the United States version the contestants are joined after the first round by several family members or friends, who offer suggestions and generally add to the entertainment value. But the contestant makes the decisions. For example, in the very first show a lady was offered $138,000, and her hyper-active husband repeatedly screamed out “no deal!” She calmly responded, “At home, you do make the decisions. But .... we’re not at home!” She turned the deal down, as it happens, and went on to take an offer of only $25,000 two rounds later. Our sample consists of 127 contestants recorded between December 19, 2005 and March 19, 2007. This sample includes 6 contestants that participated in special versions, for ratings purposes, in which the top prize was increased from $1 million to $2 million, $3 million, $4 million, $5 million or $6 million.18 The biggest winner on the show so far has been Michelle Falco, who was

17 Or make some a priori judgement about the bounded rationality of contestants. For example, one could assume that contestants only look forward one or two rounds, or that they completely ignore bank offers.

18 Other top prizes were increased as well. For example, in the final show of the first season, the top 5 prizes were changed from $200k, $300k, $400k, $500k and $1m to $300k, $400k, $500k, $2.5m and $5m, respectively.

lucky enough to be on the September 22, 2006 show with a top prize of $6 million. Her penultimate offer was $502,000 when the 3 unopened prizes were $10, $750,000 and $1 million, which has an expected value of $583,337. She declined the offer, and opened the $10 case, resulting in an offer of $808,000 when the expected value of the two remaining prizes was $875,000. She declined the offer, and ended up with $750,000 in her case. In other countries there are several variations. In some cases there are fewer prizes, and fewer rounds. In the United Kingdom there are only 22 monetary prizes, ranging from 1p up to £250,000, and only 7 rounds. In round 1 the contestant must pick 5 boxes, and then in each round until round 6 the contestant has to open 3 boxes per round. So there can be a considerable swing from round to round in the expected value of unopened boxes, compared to the last few rounds of the U.S. version. At the end of round 6 there are only 2 unopened boxes, one of which is the contestant’s box. Some versions substitute the option of switching the contestant’s box for an unopened box, instead of a bank offer. This is particularly common in the French and Italian versions, and relatively rare in other versions. Things become much more complex in those versions in which the bank offer in any round is statistically informative about the prize in the contestant’s case. In that case the contestant has to make some correction for this possibility, and also consider the strategic behavior of the banker’s offer. Bombardini and Trebbi [2005] offer clear evidence that this occurs in the Italian version of the show, but there is no evidence that it occurs in the UK version. The Australian version offers several additional options at the end of the normal game, called Chance, SuperCase, and Double Or Nothing. In many cases they are used as “entertainment filler,” for games that otherwise would finish before the allotted 30 minutes. It has been argued, most notably by Mulino et al. [2006], that these options should rationally change behavior in earlier rounds, since they provide some uncertain “insurance” against saying “deal” earlier rather than later. We discuss their arguments in detail in section 3.B below.
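As a concrete illustration of why bank offers drift upward in the U.S. game described above, the sketch below simulates random case openings with a stylized offer rule in which the offer is a fraction of the expected value of the unopened cases, starting near 10% and rising by roughly 10 percentage points per round. The offer rule and the prize list are our approximations for illustration only, not the empirical offer function used in our estimation. Averaged over many simulated games, the expected value of the unopened cases stays roughly constant while the offer rises, which is the sub-martingale pattern noted above.

```python
import random
import statistics

# Approximate U.S. prize board (26 cases).
PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
          1000, 5000, 10000, 25000, 50000, 75000, 100000, 200000,
          300000, 400000, 500000, 750000, 1000000]
OPEN_PER_ROUND = [6, 5, 4, 3, 2, 1, 1, 1, 1]              # rounds 1 through 9

def offer_path(rng):
    """Stylized bank offer after each round of one randomly played game."""
    cases = PRIZES[:]
    rng.shuffle(cases)
    own, pool = cases[0], cases[1:]                       # contestant's case and the rest
    offers = []
    for round_number, n_open in enumerate(OPEN_PER_ROUND, start=1):
        pool = pool[n_open:]                              # open n_open cases at random
        unopened = [own] + pool
        ev = sum(unopened) / len(unopened)
        fraction = min(0.10 * round_number, 1.0)          # ~10% of EV, rising ~10 points a round
        offers.append(fraction * ev)
    return offers

rng = random.Random(42)
paths = [offer_path(rng) for _ in range(20_000)]
for r in range(9):
    print(f"round {r + 1}: mean offer ${statistics.mean(p[r] for p in paths):,.0f}")
```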

3. Previous Deal Or No Deal Analyses

The previous literature19 has employed three types of empirical strategies with observed DOND behavior. The first empirical strategy is the calculation of CRRA bounds at which a given subject is indifferent between one choice and another. These bounds can be calculated for each subject and each choice, so they have the advantage of not assuming that each subject has the same risk preferences, just that they use the same functional form. The studies differ in terms of how they use these bounds, as discussed briefly below. The use of bounds such as these is familiar from the laboratory experimental literature on risk aversion: see Holt and Laury [2002], Harrison, Johnson,

McInnes and Rutström [2005] and Harrison, Lau, Rutström and Sullivan [2005] for discussion of how one can then use interval regression methods to analyse them. The limitation of this approach is that it is difficult to go beyond the CRRA or other one-parameter families, and in particular to examine other components of choice under uncertainty (such as more flexible utility functions,

preference weighting or loss aversion).20 Post, van den Assem, Baltussen and Thaler [2006a] use CRRA bounds in their analysis, and it has been employed in various forms by others as noted below. The second empirical strategy is the examination of specific choices that provide “trip wire” tests of certain propositions of EUT, or provide qualitative indicators of preferences. For example, decisions made in the very last rounds often confront the contestant with the expected value of the unopened prizes, and allow one to identify risk lovers or risk averters directly. The limitation of this approach is that these choices are subject to sample selection bias, since risk attitudes and other preferences presumably played some role in whether the contestant reached these critical junctures.

19 The literature consists so far of working papers, and revisions might be expected as the research being reported continues. However, it has already generated a lengthy lead article in the Wall Street Journal (January 12, 2006, p.A1) and National Public Radio interviews in the U.S. with researchers Thaler and Post on the programs Day to Day (http://www.npr.org/templates/story/story.php?storyId=5243893) and All Things Considered (http://www.npr.org/templates/story/story.php?storyId=5244516) on March 3, 2006. In each case there were extensive references to detailed results from the working papers discussed below.

20 Abdellaoui, Barrios and Wakker [2007; p.363] offer a one-parameter version of the EP function which exhibits non-constant RRA for empirically plausible parameter values. It does impose some restrictions on the variations in RRA compared to the two-parameter EP function, but is valuable as a parsimonious way to estimate non-CRRA specifications, and could be used for “bounds analyses” such as these.

Moreover, they provide limited information at best, and do not allow one to define a metric for errors. If we posit some stochastic error specification for choices, as is now common, then one has no way of knowing if these specific choices are the result of such errors or a manifestation of latent preferences. Blavatskyy and Pogrebna [2006b] illustrate the sustained use of this type of empirical strategy, which is also used by other studies in some respects. The third empirical strategy is to propose a latent decision process and estimate the structural parameters of that process using maximum likelihood. This is the approach we favor, since it allows one to examine structural issues rather than rely on ad hoc proxies for underlying preferences. Our discussion of the literature therefore focuses on those that have used similar methods, and the technical similarities and differences to our approach.
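Interval regression of the kind referenced under the first strategy is easy to state: each choice implies that the latent CRRA coefficient lies in some interval, possibly open-ended, and the likelihood of an observation is the probability mass a normal latent distribution places on that interval. A minimal sketch follows, with made-up intervals and without the panel (clustering) correction discussed below.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def interval_negll(params, lo, hi):
    """Negative log-likelihood of interval responses [lo, hi] for a latent N(mu, sigma)."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    upper = norm.cdf((hi - mu) / sigma)       # hi = +inf gives 1 for right-open intervals
    lower = norm.cdf((lo - mu) / sigma)       # a very negative lo approximates a left-open interval
    return -np.sum(np.log(np.clip(upper - lower, 1e-300, None)))

# Made-up CRRA intervals in the spirit of the bounds discussed below,
# e.g. (-4, 6.94], (-4, 2.76], [0.97, 1.51], a right-open response, and so on.
lo = np.array([-4.0, -4.0, 0.97, 1.60, -4.0])
hi = np.array([6.94, 2.76, 1.51, np.inf, 1.10])

fit = minimize(interval_negll, x0=np.array([0.5, 0.0]), args=(lo, hi), method="Nelder-Mead")
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)
```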

A. Bombardini and Trebbi [2005]

The statistical analysis of Bombardini and Trebbi [2005; §3.1] (BT) is exemplary: it sets out a formal likelihood model built on a latent utility-maximizing choice process and solves for the entire path of choices.21 In fact, their numerical methods involve solving the game by backward induction for each candidate preference parameter. So they only estimate the likelihood for the last 3 rounds of the game (p.14), to avoid the computational burden of solving for the larger game; our approach avoids this numerical complication, and allows us to estimate over the entire game. BT assume a CRRA functional form for utility. They use this functional form to find the CRRA values at which the subject is indifferent between accepting the bank offer or not, and correctly use these to infer an interval response. They then specify the likelihood for an interval-

21 BT also consider the implications of allowing for the apparent fact that in the Italian version the banker knows what prizes are in the remaining, unopened boxes. The latter possibility raises some thorny estimation issues, since the contestant should view themselves in a strategic game under incomplete information with the banker. Thus an offer might affect their belief about the value of the prizes remaining in some complicated manner. Faced with relatively intractable strategic games in the analysis of the venerable double-auction institution, theorists have often added simplifying assumptions to bypass such complications. For example, Friedman [1991] uses a Bayesian Game Against Nature assumption: that the agent behaves as if there is no strategic game with another sentient player, simply to avoid the cognitive burden of doing so. We focus on the version of their estimation that assumes that offers are not informative. BT conclude that it makes little difference to their estimates, which is consistent with that assumption. There is no evidence that the banker in the non-Italian versions knows the prizes.

censored model, recognizing that some subjects have closed CRRA intervals (those that accepted a bank offer) and others do not (e.g., those that do not accept any bank offer).22 BT do not allow for any stochastic error in the decision-making process, although they do allow for sampling errors in the estimation stage. They consider three different, exogenous levels of wealth as arguments of the utility function. One is zero, one is a proxy for income, and one is a tenfold increase in the proxy for income and intended as a proxy for wealth. They also, like us, use a non-parametric bank offer function based on empirical data. One limitation of their estimation method is that it does not correct for the possible correlation of errors by the same individual. In their model each contestant can have 1, 2 or 3 observations, depending on their choices. They note (p.16) three methods of allowing for taste heterogeneity, but none of these correct the estimates of standard errors for the unobserved individual heterogeneity common in panel models of this kind.23 BT consider differences in RRA when stakes differ (§4.4). They do so by comparing contestants that get into the last few rounds with low stakes remaining against contestants that get into the last few rounds with higher stakes remaining. They conclude that the former have much lower RRA than the latter, consistent with risk neutrality for low stakes and risk aversion for higher stakes. There are some obvious concerns that sub-sampling from later rounds might bias inferences, since subjects with higher aversion to risk would, ceteris paribus, have dropped out earlier. And variations in luck in earlier rounds would affect choices and sample selection into later rounds, as noted below in our discussion of the effect of prior outcomes in Post, van den Assem, Baltussen and Thaler [2006a]. On the other hand, their conclusion is consistent with our conclusion using the entire game and a utility function that allows for varying RRA. One concern with these results of BT

22 This is the interval-response method that would be appropriate for the CRRA-consistent bounds inferred by Post, van den Assem, Baltussen and Thaler [2006a].

23 Their first method is to add a standard deviation around the estimate of the CRRA parameter r. Their second method is to allow for a small set of observable individual characteristics to affect the mean estimate of r. The set of characteristics they collect is larger than most in the DOND literature (see their Appendix C), but still small. Their third method is to allow for these observable characteristics to affect the estimated standard deviation of r, which is a heteroskedasticity correction.

is that they would seem to invalidate the CRRA utility function they use throughout, raising specification issues. A particularly interesting extension by BT is to estimate a non-EUT model (§5). They in fact estimate an RDEU model, or a version of CPT in the gain domain. Like us, they use a CRRA value function and a probability weighting function. Unfortunately they use an extremely restrictive functional form for the probability weighting specification, the power function ω(p) = p^γ. For values of the exponent γ < 1 (γ > 1), this implies overweighting (underweighting) for all p. Thus if the subjects exhibit probability weighting of the form we identify, consistent with the bulk of the RDEU and CPT literature, there would be a concave portion for small p and then a convex portion for larger p.

In fact, if one estimates the conventional form (8) popularized by Tversky and Kahneman [1992] with the UK data used by Andersen et al. [2006a], the function is concave up to p=0.37 and convex thereafter (the switch point has to be either 0.37 or 0.63). Hence, if probabilities for prizes in the last three rounds, the sample used by BT, are between 0.25 and 0.5, they would be forcing the power function to fit “data that wants to be concave and then convex.” It is therefore not a surprise that they estimate γ=0.92 with a standard error of 0.06, and cannot reject the EUT-consistent null that γ=1. This is an artefact of assuming the restrictive power function, not an inference solely from the data.
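For reference, the two weighting specifications being contrasted can be written as follows; the second is the standard Tversky and Kahneman [1992] form, which we take to be the “conventional form (8)” referred to in the text.

```latex
% Power weighting (BT) versus the inverse-S weighting of Tversky and Kahneman [1992].
\[
\omega(p) = p^{\gamma}
\qquad\text{versus}\qquad
\omega(p) = \frac{p^{\gamma}}{\left(p^{\gamma} + (1-p)^{\gamma}\right)^{1/\gamma}} .
\]
% With \gamma < 1 the power form overweights every p, whereas the inverse-S form is
% concave (overweighting) for small p and convex (underweighting) for large p.
```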

B. Mulino, Scheelings, Brooks and Faff [2006]

The Australian version of DOND has several unique features. Two, in particular, “kick in” when a contestant is down to two or one unopened boxes after accepting an earlier bank offer. After the offer is accepted it is almost universal that the host counter-factually opens the remaining boxes, for various entertainment reasons: “did you make a good deal or not?” Depending on the history of the game, the host might play what is called a Chance round or a SuperCase round. In Chance the contestant can swap the accepted bank offer for a 50:50 chance at the remaining prizes. In SuperCase, which occurs when there is only one unopened case and all prizes are known, the contestant can swap the accepted bank offer for a lottery of 8 prizes in the low to middle range of the original prize

set. These options are also used by the producers to “fill” an episode, so that there is no need to start a new contestant if the previous game finishes quickly. Mulino, Scheelings, Brooks and Faff [2006] (MSBF) show that these options provide a form of insurance to contestants in earlier rounds, encouraging them to accept bank offers earlier than they might otherwise. They focus on how these three different ways of framing the choice task affect elicited risk attitudes: the standard game, the choices in Chance, and the choices in SuperCase. Their main statistical analysis consists of two parts. The first is a calculation of CRRA-consistent bounds for each subject in each round, in the same manner as Bombardini and Trebbi

[2005] and Post, van den Assem, Baltussen and Thaler [2006a]. These bounds are calculated allowing for a stochastic bank offer (as we do) as well as a deterministic bank offer, to show the effect of allowing for uncertainty about the bank offer. They also show the effect of allowing for the extra games, and that they indeed affect elicited risk attitudes. The second part of their statistical analysis appears to be closer to our specification. They assume a latent CRRA utility function and a stochastic noise process in decision-making following the Luce specification used by Holt and Laury [2002]. MSBF do not state (p.18) how they numerically evaluated the log-likelihood, and in particular the possible paths leading to the continuation values each subject faced. This is a non-trivial numerical issue: the methods we propose in §2 are intuitive but numerically intensive, and Bombardini and Trebbi [2005; p.14] find that they have to limit their analysis to just three final rounds for numerical reasons. In fact, MSBF used a grid search over two parameters, with the grid refined in the neighborhood of the global maximum (Daniel Mulino, private correspondence; 12/18/2006). This is also consistent with the explanation of the use of bootstrapping methods to identify standard errors for the MLE results (fn. 27). These are both computationally intensive choices, which are adequate for small-dimensional parameter searches but would likely become prohibitive for more than 2 or 3 parameters.
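A two-parameter grid search with local refinement of the general sort described can be sketched generically as follows; log_likelihood here is a placeholder for whatever (numerically expensive) evaluation of the structural model is being used, not MSBF's actual code. Each refinement multiplies the effective resolution, but the number of evaluations grows exponentially with the number of parameters, which is the dimensionality problem noted above.

```python
import itertools
import numpy as np

def grid_search(log_likelihood, bounds, n_points=21, n_refinements=3, shrink=0.2):
    """Coarse grid over two parameters, repeatedly refined around the current maximum."""
    (lo1, hi1), (lo2, hi2) = bounds
    best = None
    for _ in range(n_refinements):
        grid1 = np.linspace(lo1, hi1, n_points)
        grid2 = np.linspace(lo2, hi2, n_points)
        values = [(log_likelihood(a, b), a, b) for a, b in itertools.product(grid1, grid2)]
        best = max(values)                                  # tuple (loglik, param1, param2)
        _, a_star, b_star = best
        span1, span2 = shrink * (hi1 - lo1), shrink * (hi2 - lo2)
        lo1, hi1 = a_star - span1 / 2, a_star + span1 / 2   # shrink the grid around the maximum
        lo2, hi2 = b_star - span2 / 2, b_star + span2 / 2
    return best

# Placeholder objective: a smooth surface with a maximum at (0.4, 0.15).
demo = lambda r, mu: -((r - 0.4) ** 2 + (mu - 0.15) ** 2)
print(grid_search(demo, bounds=[(-1.0, 2.0), (0.01, 1.0)]))
```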

C. Post, van den Assem, Baltussen and Thaler [2006a][2006b]
Post, van den Assem, Baltussen and Thaler [2006a] (PABT) has been widely cited in the press, and was one of the first academic papers to evaluate behavior in DOND. Their results and methods provide a striking contrast to those in most of the DOND literature, and deserve careful scrutiny. A major revision in Post, van den Assem, Baltussen and Thaler [2006b] (PABTN) employs empirical methods much closer to the prevailing literature, and is evaluated at the end of this section.

Interval Responses
The primary manner in which PABT infer risk attitudes is by calculating CRRA-consistent bounds for each subject in each round. They do correct for the increasing bank offers in this calculation, taking into account the continuation value of turning down a bank offer in a given round. Consider the example in their Table III, described in PABT (p. 15). In round 1 this contestant has an upper bound CRRA of 6.94, since he would have to have a CRRA of this or higher to accept the deal given the options that he faced in future rounds (in particular, the improving bank offer). In round 2 his CRRA would have to have been 2.76 or higher to accept this deal. One should then take these as interval responses, following the tabulation of CRRA intervals implied by observed lottery choices in Holt and Laury [2002; Table 3, p.1649] and the statistical analyses of Harrison, Johnson, McInnes and Rutström [2005] and Harrison, Lau, Rutström and Sullivan [2005]. Thus the first round for this subject implied an interval response of (-4, 6.94], the second round implied an interval response of (-4, 2.76], and so on. We do not a priori truncate the second round responses as (2.76, 6.94], since we want to remain open to finding shifts in risk attitudes over time, and collapsing the information from two rounds into one response would lose the original information on the two interval responses in different rounds. This is one of the hypotheses that PABT are interested in. One might respond that the first round response is “uninformative” given the second round response. First, this misses the point about using interval responses directly since that is what the

subject revealed: we could have used a later-round example to make that point. Second, while it is true that the first round response is not statistically informative with respect to the mean estimate of this subject, given the second round response, it is informative with respect to the standard error of that estimate. In other words, it is informative to know that the same subject has made two choices in which the latent value of a CRRA is 6.94 or lower instead of just one choice in which the latent value of a CRRA is 6.94 or lower.24 This is particularly true for later rounds, but there is simply no rationale to discard the data from early rounds if one wants to estimate the standard error of the CRRA parameter in a full-information manner. PABT do not want to calculate that standard error, at least from this stage of their statistical

characterization. They want to construct bounds, based on the last two “active rounds” in the game for this subject. Thus, for the example in their Table III, the contestant accepted in round 5 where his CRRA switching value is 0.97, and his CRRA switching value in round 4 was 1.51. So this provides the bounds for their analysis: [0.97, 1.51]. In turn, they simply take an average of these bounds, on the assertion that “By construction, the upper and lower bounds are biased estimates. By averaging the positive and negative errors can be expected to cancel out, leading to a better estimate.” (p.12). This is not quite correct. Their bounds are biased if one views them as the mean of the latent process generating the observed choices, and this seems to be what the intent is here. But they are not biased if one views them as the “revealed preference” interval of values of some parameter of a latent process generating the observed choices. In this case the latent process, of course, is some individual behaving as if using a CRRA utility function and facing these opportunities. But if one has this latent process in mind, as PABT do, they should use the interval responses from each round and use standard interval-regression methods to infer the latent process, allowing for the panel nature of the data (since there are multiple observations per subject).25 Quite apart from the interval response being the correct unit of analysis given their

24 One would have to correct for panel effects in this case, but such corrections are standard.
25 The estimation of these models has been a standard feature of popular statistical packages such as Stata or LIMDEP for many years. It is also relatively simple to code up directly as a specialized maximum likelihood routine.
assumptions about the latent decision process, there is a concern that the intervals they calculate are large. The example they give is quite large: [0.97, 1.51]. The problem is that even if one accepts their bounds, the imprecision here is 1.51 - 0.97 = 0.54 in CRRA units, and collapsing this to a mid-point of 1.24 with an imprecision of 0 clearly biases the analysis by introducing an error in the dependent variable. To be literal, [1.24, 1.24] is not the same as [0.97, 1.51]. Thus one does not know how many of their tests would survive when one recognizes that it is actually an interval between 0.97 and 1.51, rather than some precise point estimate (the average). In their equation (9) in PABT (p.20), used to test EUT, they simply put the inferred average on the left hand side as the dependent variable, ignoring the known noise around it from the interval.26
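As footnote 25 notes, the interval-censored estimation we have in mind is straightforward to code directly as a maximum likelihood routine. A minimal sketch, assuming a Gaussian latent CRRA with mean Xβ and standard deviation σ, and leaving aside the clustering correction for repeated observations on the same contestant; the example intervals are hypothetical, in the spirit of the PABT example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def interval_negll(theta, X, lower, upper):
    """Negative log-likelihood for interval-censored ('interval regression') data.
    theta stacks the regression coefficients and log(sigma); lower/upper are the
    CRRA interval bounds revealed by each accept/reject decision (use -inf/+inf
    for open-ended intervals)."""
    beta, sigma = theta[:-1], np.exp(theta[-1])
    mu = X @ beta
    prob = norm.cdf((upper - mu) / sigma) - norm.cdf((lower - mu) / sigma)
    return -np.sum(np.log(np.clip(prob, 1e-300, None)))

# Hypothetical contestant-round intervals.
lower = np.array([-4.00, -4.00, 0.97, 0.50, -4.00, 1.20])
upper = np.array([6.94, 2.76, 1.51, 1.10, 0.80, 2.00])
X = np.ones((len(lower), 1))                      # intercept-only model
fit = minimize(interval_negll, x0=np.array([0.5, 0.0]),
               args=(X, lower, upper), method="BFGS")
print(fit.x[0], np.exp(fit.x[1]))                 # mean latent CRRA and sigma
```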

Prior Outcomes
PABT (p.21) devise a measure of the “fortune” of the player up to the last two “active rounds.” This is just the ratio of the EV of the remaining prizes in that round to the EV of all prizes at the beginning of the game, and is a nice measure of the “joss,” or good luck or bad luck, of the player in terms of opening prizes.27 This measure, which we call joss, is then boiled down to a dummy Loss for losses (if the ratio is less than 1) or Gain for gains (if the ratio is greater than 1), and then used to generate measures of PriorLosses = (joss-1)×Loss and PriorGains = (joss-1)×Gain. We discuss the use of these measures below. These measures play an important role in the main statistical inference of PABT, so it is

26 Footnote 4 (p.44) of PABT has some “corrections” to these steps when calculating the CRRA bounds, none of which are said to affect the main conclusions. First, the subject must have an EV of at least 1000, or else they substitute the average bounds for subjects that stop in those rounds. Such subjects should probably just be dropped, since they are effectively out of the game. Second, subjects that do not accept a deal only have one end-point of the interval, so they assume that the other end-point is given by the round 9 end-point for the subject in question minus the average spread in end-points for all other subjects that stopped in the previous round. This issue can be more cleanly addressed if one moves to a statistical analysis directly defined over the interval-censored responses. In this case one just substitutes a clopen interval such as (-4, 0] if the subject turned down a fair bank offer in the last round of choice (since a CRRA value of 0 indicates risk-neutrality).
27 They take an average of this measure for the two rounds they consider, which probably does not make that much of a difference since EV is a martingale. If one were to keep the CRRA interval response from each round separate as a data point then it would be possible to use the value of this measure of luck from each round, which would be more precise.
worthwhile seeing what values they take. Figures 7 and 8 illustrate the density of PriorLosses (to the left of the vertical line at 0 on the horizontal axis, in a solid line) and PriorGains (to the right of that line, in a dashed line) in the US and UK shows. These distributions use kernel densities for all subjects active in each round.28 Consider the evolution of the measure in Figure 8 as the game proceeds, recognizing that the sample changes as some contestants drop out. In round 1 the asymmetry in the two distributions is the direct result of an asymmetry in the value of the bottom half of the prizes and the top half of the prizes. In each, as shown in Figures 5 and 6, there are just as many “low” prizes on the left of the board as “good” prizes on the right of the board, but the value of the two sets is sharply asymmetric.29 Hence it is easier to suffer bad luck in EV terms than it is to enjoy good luck, and the distribution shows that. This asymmetry disappears as one proceeds into rounds 2 and 3, and the measures are much more symmetric. The fascinating evolution really starts in round 5 of the UK data. Consider the PriorGains measure first. It just collapses as one moves from round 4 into rounds 5 and 6, reflecting the attrition of the sample of subjects that had good luck up to round 4. Their good luck translated into better offers, which induced them to accept the deal, ceteris paribus their risk attitudes. The other feature of the PriorGains measure is that it is not bounded at 1, whereas the PriorLosses measure is bounded at 1. So we start to see some subjects with very high values of this measure compared to the range of values of the PriorLosses measure. If all else were equal in terms of sample composition, and it is not, this would lead to larger standard errors on the PriorGains measure than the PriorLosses measure. This would make it appear as if bad luck is more important for behavior than good luck, but that is just an artefact of these being normalized differently.
The implication is that the measures should be normalized, at the very least, if one wants to make inferences about their relative importance (and has some way of handling the sample attrition).
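The construction of these measures is simple enough to state in a few lines; the sketch below computes the per-round version of joss and the PriorLosses/PriorGains split (the board and remaining prizes shown are hypothetical, and recall from footnote 27 that PABT average the measure over the last two active rounds).

```python
import numpy as np

def fortune_measures(remaining_prizes, opening_board):
    """PABT-style 'fortune' measures: joss is the ratio of the EV of the prizes
    still in play to the EV of the full opening board; (joss - 1) is then split
    into PriorLosses (negative part, bounded below by -1) and PriorGains
    (positive part, unbounded above)."""
    joss = np.mean(remaining_prizes) / np.mean(opening_board)
    return {"joss": joss,
            "PriorLosses": (joss - 1.0) * (joss < 1.0),
            "PriorGains": (joss - 1.0) * (joss > 1.0)}

# Hypothetical 10-prize board and a contestant who has opened mostly high prizes.
board = [0.01, 1, 5, 50, 500, 5_000, 50_000, 250_000, 750_000, 1_000_000]
left = [0.01, 5, 500, 5_000, 50_000]
print(fortune_measures(left, board))   # joss < 1, so PriorLosses is negative
```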

28 There are insufficient contestants with negative PriorGains in round 9 of the US show, so there is no display for that round. The same general pictures can be generated for the UK and Australian versions.
29 The same asymmetry in prizes is found in all versions of Deal Or No Deal.
Turning to the PriorLosses measure, it changes from a symmetric, one-mode density in round 3 to become a severely positively-skewed density. This is again due to sample attrition, with the majority of “bad luck subjects” remaining in round 6 being those that had severely bad luck. In effect, their offers are so low that they reason that they may as well hope for the very best luck in the remaining stochastic realizations. Without knowing how such subjects change their reference point with the trail of bad luck they experience it is not possible to say if they are behaving in a loss averse or loss seeking manner. The main concern is that the changes in these measures reflect sample attrition, which in turn reflects the core parameters one would like to estimate. This makes it very difficult to make simple claims about the effect of this measure on behavior, without a model of the choice process leading to the measure taking on certain values. That is, the measure is endogenous with respect to the parameters being measured, generating inconsistent estimates unless one accounts for the correlations implied by the endogeneity (in the standard manner of sample selection corrections).30

Stake Effects
To capture stake effects, or the possibility that RRA might not be constant, PABT introduce a variable equal to the log of the EV at the outset of each game. This will pick up cross-country effects, since the stakes in the games are very different. Of course, it will also pick up differences in risk attitudes across countries, but they dismiss this as likely to be small compared to the pure stake effect: “Apart from the initial prizes, the editions used for our study are very similar and the contestants from the three European countries are comparable in terms of their cultural and economic background.” (p.4) Another concern is that there are in fact format differences across countries, as a further confound to the comparison of stakes. The Dutch DOND has 26 prizes and 9 rounds. However, the

30 Our estimation approach avoids this problem by examining the entire history of each subject’s choices in a game, and being explicit about the reference point. Of course, we assume that the reference point is exogenous, and that assumption deserves examination in future work.
German DOND consists of two versions. Only 6 of the German contestants in the PABT sample come from the version with 26 prizes and 9 rounds; the remaining 20 come from the version with 20 prizes and 8 rounds. These format differences may not be trivial in terms of the effect on the uncertainty of the lotteries implied by the game. Finally, the sample sizes for the Belgian and German samples are very small: only 19 contestants and 26 contestants, respectively. This issue is important because it is the way in which PABT examine the possibility of varying RRA with stakes. They find (Table V, p.37) that a variable for the size of the stakes in the cross-country editions is not statistically significant as a determinant of behavior, and

use this to infer that there is no evidence to reject CRRA. Of course, this measure suffers from the confounds noted above, as well as the small samples (which bias hypothesis tests toward not finding effects that would be significant in larger samples). Even without the confounds and small samples, it is a between-subjects test. Andersen et al. [2006a] examine this issue using an Expo-Power specification that looks at the wide range of prizes within a series in a given country. These differences are within-sample, since each subject faces a wide range of prizes in each round. This would appear to be a more reliable way to detect such differences. We also illustrate the same point later with an estimation of the Hyperbolic Absolute Risk Aversion function of Merton [1971].

Main Inferences
The main PABT results are presented in their Table V (p.37). We focus on the first column of results, assuming that the subjects do not integrate their prizes in DOND with their outside income or wealth, and that they rationally look forward in the game to improved bank offers. The main effect, robust over the other variations, is that prior losses increase RRA. From a constant of 0.24, RRA is increased by 1.68, and this is statistically significant. The first concern is whether this survives recognition of the interval nature of the dependent variable. The second concern is how this interacts with non-CRRA specifications, which our analysis

finds to be important for the UK contestants. If someone has had bad luck, then by definition their lottery (whether myopic or forward-looking) is going to have more low-value prizes than someone else's. So if there is IRRA with prizes this IRRA will simply be picked up by the loss term. Indeed, Holt and Laury [2002] provided evidence of IRRA in a laboratory setting, and while the prizes in the lab are tiny in comparison to DOND, in that setting at least they used top prizes of up to $345. PABT would argue that this is controlled for by the “stake” variable, but that variable is confounded with other things noted above. In other words, PABT might be right that this is a pure prior loss effect, but there are confounds and the inference is not at all obvious from the analysis presented.31

Prospect Theory
Section VI is an innovative attempt to calibrate and estimate CPT for these data. PABT recognize (p.26) that CPT cannot be used to infer parameters for each subject as they did for CRRA and EUT, since there are too many parameters. But they can pool across subjects, and test CPT in that manner. They also recognize some gaps in applying CPT here:
• CPT says nothing about multi-stage decisions. So they assume subjects are “myopic” in their value functions, although they define myopia as looking forward one period. This seems overly restrictive. The key point of CPT in this respect is not to integrate current income with past income, but that is a backwards-looking myopia rather than a forward-looking myopia.
• CPT does not specify how the reference point should change in dynamic games. But Thaler & Johnson [1990] do have a good discussion of some alternatives, and DOND suggests

31 PABT make an odd claim (p. 24) about the comparison of apples and oranges. Apparently they find that losers in large-stake countries are less risk averse than winners in small-stake countries. This is due to the difference in the size of the coefficients on the stake variable and the loss variable; but the stake variable is in logs, and measures EV across countries, and one does not know what size of average loss is captured by the loss variable. They then go on to compare winners in one country with losers in another country (Table VI), which potentially confounds several things and has very small samples. Again, the claims about these winners and losers might be correct, but these results are not convincing.

some obvious ones: zero, the last bank offer, or the highest bank offer so far. One additional possibility is to use the average earnings on the shows broadcast up to that point as the reference point in the opening round, until a bank offer exceeds that level. Given the popularity of the show, it can be expected after several broadcasts that contestants have a good sense of the rough level of expected earnings in these games, and of what is a good outcome and a bad outcome historically. The estimation procedure used by PABT is then to find parameters that allow the “hit rate” for CPT to exceed the hit rate for EUT. They find some parameters (p.28) that take the hit rate to 65% or 69%, and declare victory over EUT, which had hit rates of 60% or 61%. Of course, this comparison

of hit rates contains no basis for statistical comparison, since we have no measure of the imprecision of the estimates underlying the hit rates. This is precisely what likelihoods do in a complete statistical analysis. Moreover, one would ideally want to characterize two latent processes here, EUT and CPT, and allow the data to identify the relative weights on each. Of course, there are several EUT variants (e.g., CRRA, expo-power) and several CPT variants (e.g., reference points, different probability weighting functions). But one can presumably boil these down to some interesting alternatives to compare, since the data requirements for more than two processes can be daunting.
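The kind of two-process characterization we have in mind can be sketched as a finite-mixture likelihood. In the stylized fragment below the per-observation choice probabilities under each latent process are placeholders; in a full model they would themselves depend on the structural EUT and CPT parameters being estimated jointly with the mixing weight.

```python
import numpy as np
from scipy.optimize import minimize

def mixture_negll(params, prob_eut, prob_cpt):
    """Grand likelihood for a two-process mixture: each observed choice is
    generated by the EUT process with probability pi and by the CPT process
    with probability 1 - pi; prob_eut and prob_cpt hold the model-implied
    probabilities of the observed choice under each process."""
    pi = 1.0 / (1.0 + np.exp(-params[0]))    # logit transform keeps pi in (0, 1)
    like = pi * prob_eut + (1.0 - pi) * prob_cpt
    return -np.sum(np.log(np.clip(like, 1e-300, None)))

# Hypothetical choice probabilities for six observed decisions.
p_eut = np.array([0.8, 0.6, 0.7, 0.4, 0.9, 0.5])
p_cpt = np.array([0.5, 0.7, 0.6, 0.8, 0.4, 0.9])
fit = minimize(mixture_negll, x0=np.array([0.0]), args=(p_eut, p_cpt))
print(1.0 / (1.0 + np.exp(-fit.x[0])))       # estimated mixture weight on EUT
```

Unlike a raw hit-rate comparison, the likelihood provides a metric of imprecision, so the estimated mixing weight comes with a standard error.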

Maximum Likelihood Estimation
The primary manner in which the PABTN revision differs from PABT is in the extensive use of structural ML estimation of EUT and CPT models. In this respect it follows the rest of the DOND literature. The EUT specification employs the Expo-Power (EP) function, following Andersen et al. [2006a]. This has the advantage that varying RRA is not confounded with stakes, as noted above. They also follow Andersen et al. [2006a] and estimate an initial “wealth” argument of the utility function. We hesitate to call this wealth, since it need not refer to lifetime wealth, but could just be some measure of the

consumption over which the contestant pools the prize.32 To simplify the analysis they assume that contestants only look forward one round when deciding whether to accept the current offer. Since the qualitative path of expected bank offers is monotonic in the versions of DOND considered, when viewed from a given round, this assumption should capture the essential aspects of forward-looking behavior, even if it is an ad hoc restriction on the rationality of contestants. We provide evidence below that it biases inferences in the case of the U.S. show, implying greater risk aversion than the unconstrained case. The bias is statistically significant, but not large in economic terms. Of course, the evaluation horizon could vary from version to version, and across nations, making it dangerous to impose some restriction a priori, and interesting to evaluate and compare across

different versions. Their EUT specification includes a heteroskedastic error story, in which greater errors are likely to occur with “more difficult” decisions. This seems plausible, but the choice of a metric for what is a difficult decision is far from obvious. They use the standard deviation of the utility of the prizes remaining as their metric. So a contestant facing a 50/50 choice between two similar prizes would have a lower error than a contestant facing a 50/50 choice between two dissimilar prizes. This does not seem a priori obvious. The likely effect is, as PABTN note, to place greater weight in the (deterministic part of the) likelihood on decisions in later rounds, since they tend to have significantly larger standard deviations than decisions in earlier rounds. This weighting is claimed to have no significant effect on the core parameters of the utility function, but those alternative estimates are not provided. The EUT estimates (Table VI) have some significant differences across versions and

countries. The estimated wealth parameter for the Dutch version is relatively large, at 92,172, as is the estimate for the US version, at $94,115. But the counterpart estimate for the German edition is tiny and insignificant, at 544. The Dutch estimates exhibit evidence of IRRA, starting from a

32 This conceptual issue is discussed at greater length in Andersen et al. [2005][2006c]. It is not the case that this wealth variable has to proxy lifetime wealth for the analysis to be consistent with EUT, since one can define the argument of utility over any number of baseline consumption levels, including zero or lifetime wealth as (popular) extremes.
modest level of risk aversion. The US estimates imply CRRA, which is consistent with the evidence from Andersen et al. [2006a] for the UK version when an endogenous wealth parameter is included. But the German estimates appear to indicate CARA, and are consistent with contestants behaving in a risk-neutral manner. These are large qualitative differences, although subjective risk preferences could vary across countries and versions. PABTN claim that their EUT estimates are implausible, in the sense that they cannot explain the choices of “winners” and “losers.” These categories are similar in spirit to the categories of “good fortune” and “bad fortune” discussed above, but are now defined differently. A contestant is said to be a winner if the expected value of remaining prizes remains greater than some fraction of the opening-round expected value even if he opens the highest (remaining) prize. Similarly, a contestant is said to be a loser if the expected value of remaining prizes remains below some fraction of the opening-round expected value even if she opens the lowest (remaining) prize. The fraction in this metric, call it α, is set a priori. Again, winners defined in this manner would tend to have higher value prizes remaining, and losers defined in this manner would tend to have lower value prizes remaining, by construction. The main stylized fact that PABTN are concerned with is that winners and losers tend to behave in a more risk-loving manner, by turning down deals in later rounds, compared to others. The evidence for this claim comes primarily from the Dutch and German versions, and is quite unclear in the US version (their Table III). None of the data underlying this oft-repeated stylized fact are subjected to any statistical tests, and the tiny sample sizes would likely preclude those even if they were conceptually well-defined. The EUT model is re-estimated for each sub-sample (Table VII), but the samples are very small and the reported estimates are “erratic,” to put it diplomatically, suggesting serious numerical convergence problems. The instability of such estimates cannot be taken as reliable evidence for or against EUT. Thus, essentially the same concerns raised earlier about the measures of “fortune” apply here: the raw data can be confounded by stakes, sample selection biases into these categories of winners and losers, and the historical structure of the prizes remaining (e.g., how skewed they are).
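The winner/loser classification just described is easy to state in code; the sketch below treats the threshold fraction α and the example prizes as hypothetical placeholders, since the values used by PABTN are not reproduced here.

```python
import numpy as np

def classify(remaining_prizes, opening_ev, alpha):
    """PABTN-style classification: a 'winner' stays above alpha times the
    opening-round EV even after opening the highest remaining prize, and a
    'loser' stays below that threshold even after opening the lowest."""
    rem = np.asarray(remaining_prizes, dtype=float)
    ev_if_highest_opened = (rem.sum() - rem.max()) / (len(rem) - 1)
    ev_if_lowest_opened = (rem.sum() - rem.min()) / (len(rem) - 1)
    if ev_if_highest_opened > alpha * opening_ev:
        return "winner"
    if ev_if_lowest_opened < alpha * opening_ev:
        return "loser"
    return "neither"

# Hypothetical example: four prizes left, threshold set at a quarter of the
# (approximate) opening EV of the US $1 million board.
print(classify([1, 10, 100_000, 500_000], opening_ev=131_478, alpha=0.25))
```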

In addition, the fractions of choices in later rounds involve very small samples, given the limited samples that PABTN employ. They correctly note (p.14) that
The analysis of deal percentages is rather crude. It does not specify an explicit model of risky choice and it does not account for the precise choices (bank offers and remaining prizes) faced by the contestants. Furthermore, there is no attempt at statistical inference or controlling for confounding effects at this stage of our analysis.
The goal of the ML analysis is to undertake such controls. But one cannot then come back and expect the ML analysis to account for anecdotes that do not condition on any of the acknowledged confounds, and that appears to be the logic employed by PABTN to dismiss the EUT estimates. Those anecdotal outliers could simply be a reflection of heterogeneous risk preferences: this ML

specification is assuming that every contestant has the same utility function, and that is a strong assumption. The deeper issue is whether this specification accounts for average behavior, weighted by the number of “winners,” “losers” and others, not whether it accounts for every individual contestant. In other words, this econometric specification is not built to explain the anecdotal outliers. Thus, even if one accepted that the model should account for these outliers, the implication is that one should consider plausible specifications of individual heterogeneity, as illustrated most clearly by De Roos and Sarafidis [2006] (discussed below). Thus it is not the case, as PABTN claim (p.3), that their findings are “difficult to reconcile with expected utility theory without using too many epicycles.”33 One simply allows for less restrictive specifications, consistent with the economic theory that allows people to have different risk attitudes. The difficulty with this modeling option in the case of DOND is that we have very few observations for each subject, so one cannot allow for individual heterogeneity easily or without parametric assumptions. But one cannot therefore logically claim that EUT is a poor model for this behavior, when one can only apply restrictive forms of EUT. Hey and Orme [1994] remains a landmark in this respect, simply because they collected enough data in a laboratory experiment to test EUT without assuming a representative agent (and overall they find strong evidence in favor of

33 An epicycle is a small circle or geocentric planetary path. Each will return one to the same place. We suspect that epicycles are not a smart way to try to extend any theory.
EUT compared to the alternatives considered). We stress that we do not seek to simply defend EUT in an unqualified manner. In fact, our own estimates have shown evidence of probability weighting and “aspiration levels” in DOND, and we consider at some length the subtleties of testing for path-dependence in these settings (Andersen et al. [2006a][2006b]). The estimates of PABTN provide no evidence for or against EUT that we regard as convincing, since they seem interested in telling only one “story” from these data. The CPT specification of PABTN focuses on the role of the reference point, which is where this “story” enters explicitly. The most radical departure from the usual specification is that probability weighting and decision weights are assumed to play no role in behavior. Virtually every

other DOND analysis finds a significant role for decision weights, so this is likely to be a serious mis-specification of CPT. PABTN argue that this is a simplification in the interests of parsimony, which seems to suggest that it would not affect results. But no evidence for that inference is provided, except in the contemporary DOND literature, which PABTN do not mention. The main innovation of the PABTN analysis is the estimation of an endogenous reference point, building on the approach developed by Andersen et al. [2006d] in a dynamic laboratory experiment. The basic idea is that contestants start with some initial, “homegrown” reference point in round 1, and then slowly update this towards the expected bank offer in the next round. The initial reference point is included in the ML estimation, as is a parameter measuring the speed with which the initial value is updated in each round. This extension to the literature is valuable, although it would be useful to compare it to several natural alternative specifications of the reference point, such as the current offer or the best offer. Those specifications have been shown by Andersen et al. [2006a] to be associated with the complete absence of loss aversion in the UK version of DOND.

So there is a reasonable concern that the evidence of loss aversion reported by PABTN might not be robust to their preferred reference-point specification. Of course, one cannot draw comparisons without extending the PABTN analysis to allow for decision weights, and it is not obvious that loss aversion would survive that extension alone: since the decision weight parameters differ for gains and losses, they could influence the evidence for loss aversion.
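The updating mechanism described above can be illustrated with a stylized partial-adjustment rule; this is our paraphrase of the idea, not PABTN's exact specification, and the initial reference point r0 and adjustment speed lam are illustrative parameters that would be estimated in the ML stage.

```python
def reference_point_path(r0, expected_offers, lam):
    """Stylized endogenous reference point: start from a homegrown value r0 and
    each round move a fraction lam of the way toward the expected bank offer
    in the next round."""
    path, ref = [], r0
    for offer in expected_offers:
        path.append(ref)
        ref = ref + lam * (offer - ref)   # partial adjustment toward the offer
    return path

# Hypothetical expected offers rising over the game, half-speed adjustment.
print(reference_point_path(20_000, [10_000, 25_000, 60_000, 110_000], 0.5))
```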

This issue is a fundamental problem for many behavioral models in economics: some free parameters are offered to account for an apparent anomaly, but these are unobservable. This is not an intrinsic problem, since we have latent parameters in standard EUT models as well that need to be inferred. But the invitation to undertake a specification search, guided by priors that have not been systematically examined, is then a particularly dangerous one. The reference point is one such danger point for specifications of prospect theory. As stressed by Pesendorfer [2006; p.714-5]: When prospect theory is applied to economic setting, it is often impossible to identify the reference point. [...] Behavioral economists deal with this ambiguity by treating the reference point as a free variable chosen to match the observed behavior of an application. Depending on an application, the reference point may be the endowment, the value of the endowment at a fixed date, the value at which the agent previously bought an asset, or the expected earnings at the end of the day. [...] In all these applications, we cannot observe variations in the reference point in the same way that experimenters can fix and manipulate the reference point. Therefore, the reference point becomes a parameter that is calibrated to match the observed data. But unlike risk aversion and the discount factor, the reference point need not be consistent across applications or even consistent across periods for the same application. Essentially, it captures a (subjective and unobservable) state of the decisionmaker. [...] The unobservability of the subjective state makes it more difficult to falsify the theory. [...] Ultimately, the theory allows too many degrees of freedom. (Of course, defenders of prospect theory might say that there can’t be a puzzle because prospect theory is right!). The point is to be aware of the fragility of inferences to the structural specification assumed, and to honestly report the mapping from alternative specifications with non-trivial a priori weight to posterior-based inferences (Leamer and Leonard [1983]). Quite apart from the specification of the reference point, picking which parts of a structural CPT model to include or not (e.g., the exclusion of decision weights by PABTN) only adds to the need to report the effects of all priors. The “story” of path-dependence in choices in DOND may well be a correct hypothesis, and this is a wonderful domain to look for evidence of it, but it is important to provide a sense of how fragile the inferences are.

D. De Roos and Sarafidis [2006]
The Australian version of DOND provides the data for De Roos and Sarafidis [2006] (DS), who focus on the estimation of risk attitudes assuming EUT as well as a RDEU specification. They

employ two statistical methods: an exploratory CARA-consistent bounds analysis, and then a full-blown maximum likelihood model initially using CRRA and CARA. There are two important methodological features of the latter model: it deals with panel effects rigorously, and is then extended to a RDEU specification. The bounds analysis is viewed as a relatively non-parametric way of motivating the need to worry about individual (observed and unobserved) heterogeneity. The ML model proposed by DS allows for noise in the bank offer, but their empirical implementation does not include it. However, they do allow for a stochastic noise component in the latent decision-making process. This specification is nicely motivated (p.12) by noting that a number of subjects appear to have behaved inconsistently with EUT, at least if one assumes CARA. In each

case the inconsistency involved a reversal of risk attitudes as rounds progressed: someone accepting an offer that had been revealed as worse than a rejected offer in a previous round, for example. Although one would want to account for stochastic errors consistently with estimates of the sampling error of the risk preferences before declaring such choices EUT-inconsistent, these motivating examples are correct for this pedagogic purpose. The methodological highlight of the base EUT specification of DS is the definition of three ways of allowing for individual heterogeneity (p.17). One is to ignore it. The other is to add a random effect ε to the latent index defining the strength of preference for one outcome over the other. The final method is to add a random effect υ to the coefficients defining the core parameters of the structural model. We explain each precisely below, in the context of our formal specification. In each case the random effect terms ε and υ vary across individuals but are the same for each choice by the same individual. These are valid statistical ways to allow for individual heterogeneity, but tend to be very computationally intensive. The second methodological highlight of DS is their extension to consider a RDEU specification. They do so by assuming that decisions are made in the gain frame, as we do, and as do Bombardini and Trebbi [2005; §5] discussed above. Unfortunately they also employ the restrictive power probability weighting function ω(p) = p^γ, which restricts one to estimate either (weakly) concave or (weakly) convex decision weights for the entire range of probabilities. They estimate γ ≈ ½

using CRRA and the dynamic (forward-looking) model with corrections for individual heterogeneity. This implies severe over-weighting of all probabilities. With the rank-dependent specification, this in turn implies decision weights that reflect over-weighting of low prizes and under-weighting of high prizes, which is counter-intuitive. One “extension” of their RDEU specification is to estimate the early RDEU model of Yaari [1987]. This amounts to assuming that there is no diminishing marginal utility of income, and that contestants are risk neutral. Hence all of the explanatory power rests on the probability weighting function. In this case they estimate γ ≈ 1½, implying under-weighting for all p. This implies decision weights that reflect over-weighting of large prizes, which is intuitively plausible (although here the decision weights have to also do the work of the utility function in terms of allowing for diminishing marginal utility).
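To see the pattern being described, the sketch below computes rank-dependent decision weights from the power weighting function ω(p) = p^γ, applying ω to the cumulative probability of prizes ranked from lowest to highest (the convention that matches the over- and under-weighting pattern discussed above); the four-outcome lottery is purely illustrative.

```python
import numpy as np

def decision_weights(probs_low_to_high, gamma):
    """Rank-dependent decision weights from omega(p) = p**gamma, applied to the
    cumulative probability of prizes ranked from lowest to highest; the weight
    on each prize is the difference in transformed cumulative probabilities."""
    cum = np.cumsum(probs_low_to_high)
    return np.diff(np.concatenate(([0.0], cum**gamma)))

p = np.full(4, 0.25)               # four equally likely prizes, low to high
print(decision_weights(p, 0.5))    # gamma near 1/2: low prizes over-weighted
print(decision_weights(p, 1.5))    # gamma near 1.5: high prizes over-weighted
```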

E. Other Studies
Deck, Lee and Reyes [2007] estimate CRRA-consistent and CARA-consistent bounds from observed choices in the Mexican version of DOND. They use the last few rounds, since the computational burden of evaluating every possible continuation path became too large for earlier rounds (p.6). Our numerical approach, on the other hand, samples from these paths in a Monte Carlo manner, so does not become numerically crippled when evaluating the whole game. Of course, this means that there is some approximation error from the Monte Carlo sampling. Blavatskyy and Pogrebna [2006a][2006b] do not undertake estimation of a model of the preferences of any latent decision-making process. Instead, they seek to identify “trip wire” tests of certain theoretical predictions. The problem with this approach is that it provides no metric for stochastic error: a “small deviation” from the prediction of a deterministic model counts as much as a “large deviation.” The lack of any consideration of stochastic error is odd, given the important contribution of Blavatskyy [2005] on exactly this general issue. Botti, Conte, DiCagno and D’Ippoliti [2006] also focus on a static representation of the game. They follow De Roos and Sarafidis [2006] and consider the role of unobserved individual

heterogeneity in behavior, using techniques from maximum simulated likelihood. Their restriction to a static representation, however, is a severe modeling limitation.

4. A General Estimation Strategy
The Deal Or No Deal game is a dynamic stochastic task, in which the contestant has to make choices in one round that generally entail consideration of future consequences. The same is true of the other game shows used for estimation of risk attitudes. In Card Sharks the level of bets in one round generally affects the scale of bets available in future rounds, including bankruptcy, so for plausible preference structures one should take this effect into account when deciding on current

bets. Indeed, as explained earlier, one of the empirical strategies employed by Gertner [1993] can be viewed as a crude precursor to our general method. In Lingo the stop/continue structure, with a certain amount being compared to a virtual lottery, is evident. We propose a general estimation strategy for such environments, and apply it to Deal Or No Deal. The strategy uses randomization to break the general “curse of dimensionality” that arises in this general class of dynamic programming problems (Rust [1997]).

A. Basic Intuition
The basic logic of our approach can be explained from the data and simulations shown in Table 1. We restrict attention here to the first 75 contestants that participated in the standard version of the television game with a top prize of $1 million, to facilitate comparison of dollar amounts. There are 9 rounds in which the banker makes an offer, and in round 10 the surviving contestant simply opens his case. Only 7, or 9%, of the sample of 75 made it to round 10, with most accepting the banker’s offer in rounds 6, 7, 8 and 9. The average offer is shown in column 4. We stress that this offer is stochastic from the perspective of the sample as a whole, even if it is non-stochastic to the specific contestant in that round. Thus, to see the logic of our approach from the perspective of the individual decision-maker, think of the offer as a non-stochastic number, using the average values shown as a proximate indicator of the value of that number in a particular instance.

In round 1 the contestant might consider up to 9 virtual lotteries. He might look ahead one round and contemplate the outcomes he would get if he turned down the offer in round 1 and accepted the offer in round 2. This virtual lottery, realized in virtual round 2 in the contestant’s thought experiment, would generate an average payoff of $31,141 with a standard deviation of $23,655. The top panel of Figure 9 shows the simulated distribution of this particular lottery. The distributions of payoffs to these virtual lotteries are highly skewed, so the standard deviation may be slightly misleading if one thinks of these as Gaussian distributions. However, we just use the standard deviation as one pedagogic indicator of the uncertainty of the payoff in the virtual lottery: in our formal analysis we consider the complete distribution of the virtual lottery in a non-parametric

manner. In round 1 the contestant can also consider what would happen if he turned down offers in rounds 1 and 2, and accepted the offer in round 3. This virtual lottery would generate, from the perspective of round 1, an average payoff of $53,757 with a standard deviation of $45,996. The bottom panel of Figure 9 shows the simulated distribution of this particular virtual lottery. Compared to the virtual lottery in which the contestant said “No Deal” in round 1 and “Deal” in round 2, shown above it in Figure 9, it gives less weight to the smallest prizes and greater weight to higher prizes. Similarly for each of the other virtual lotteries shown. The forward-looking contestant in round 1 is assumed to behave as if he maximizes the expected utility of accepting the current offer or continuing. The expected utility of continuing, in turn, is given by simply evaluating each of the 9 virtual lotteries shown in the first row of Table 1. The average payoff increases steadily, but so does the standard deviation of payoffs, so this evaluation requires knowledge of the utility function of the contestant. Given that utility function, the contestant is assumed to behave as if they evaluate the expected utility of each of the 9 virtual lotteries. Thus we calculate 9 expected utility numbers, conditional on the specification of the parameters of the assumed utility function and the virtual lotteries that each subject faces in their round 1 choices. In round 1 the subject then simply compares the maximum of these 9 expected utility numbers to the utility of the non-stochastic offer in round 1. If that maximum exceeds the

utility of the offer, he turns down the offer; otherwise he accepts it. In round 2 a similar process occurs. One critical feature of our virtual lottery simulations is that they are conditioned on the actual outcomes that each contestant has faced in prior rounds. Thus, if a (real) contestant has tragically opened up the 5 top prizes in round 1, that contestant would not see virtual lotteries such as the ones in Table 1 for round 2. They would be conditioned on that player’s history in round 1. We report here averages over all players and all simulations. We undertake 50,000 simulations for each player in each round, so as to condition on their history.34 This example can also be used to illustrate how our maximum likelihood estimation procedure works. Assume some specific utility function and some parameter values for that utility

function. The utility of the non-stochastic bank offer in round R is then directly evaluated. Similarly, the virtual lotteries in each round R can then be evaluated.35 They are represented numerically as 100-point discrete approximations, with 100 prizes and 100 probabilities associated with those prizes. Thus, by implicitly picking a virtual lottery over an offer, it is as if the subject is taking a draw from this 100-point distribution of prizes. In fact, they are playing out the DOND game, but this representation as a virtual lottery draw is formally identical. The evaluation of these virtual lotteries generates v(R) expected utilities, where v(1)=9, v(2)=8,...,v(9)=1 as shown in Table 1. The maximum expected utility of these v(R) in a given round R is then compared to the utility of the offer, and the likelihood evaluated in the usual manner. We present a formal statement of the latent EUT process leading to a likelihood defined over parameters and the observed choices, and then discuss how this intuition changes when we assume alternative, non-EUT processes.
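A minimal sketch of the virtual lottery simulation just described, assuming a stylized bank-offer rule (the offer is a stochastic fraction of the expected value of the unopened prizes); the board, the offer-fraction process and the look-ahead structure shown here are illustrative placeholders rather than the actual DOND parameters, and a real implementation would condition on each contestant's actual opening history as described above.

```python
import numpy as np

def virtual_lottery(unopened, opens_per_round, offer_fraction, sims=50_000, seed=7):
    """Simulate the bank offer a contestant would face in a future round if he
    said 'no deal' until then, conditional on the currently unopened prizes.
    opens_per_round lists how many cases are opened before each future offer;
    offer_fraction(round, rng) returns a (possibly stochastic) fraction of the
    expected value of the prizes still unopened. Returns the simulated offers."""
    rng = np.random.default_rng(seed)
    unopened = np.asarray(unopened, dtype=float)
    draws = np.empty(sims)
    for s in range(sims):
        remaining = unopened.copy()
        for rnd, n_open in enumerate(opens_per_round, start=1):
            keep = rng.choice(len(remaining), size=len(remaining) - n_open, replace=False)
            remaining = remaining[keep]
        draws[s] = offer_fraction(rnd, rng) * remaining.mean()
    return draws

# Hypothetical example: 20 prizes still unopened after round 1; look ahead to
# the offer two rounds out, with 5 and then 4 cases opened, and a noisy fraction.
board = [0.01, 1, 5, 10, 25, 50, 75, 100, 300, 500, 1_000, 5_000, 10_000,
         25_000, 75_000, 100_000, 300_000, 500_000, 750_000, 1_000_000]
frac = lambda rnd, rng: np.clip(rng.normal(0.40 + 0.05 * rnd, 0.05), 0.1, 1.0)
offers = virtual_lottery(board, opens_per_round=[5, 4], offer_fraction=frac, sims=10_000)
print(offers.mean(), offers.std())
```

The array of simulated offers is exactly the virtual lottery: discretizing it to 100 points gives the object whose expected utility is compared to the utility of the current bank offer.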

34 If bank offers were a deterministic and known function of the expected value of unopened prizes, we would not need anything like 50,000 simulations for later rounds. For the last few rounds of a full game, in which the bank offer is relatively predictable, the use of this many simulations is a numerically costless redundancy.
35 There is no need to know risk attitudes, or other preferences, when the distributions of the virtual lotteries are generated by simulation. But there is definitely a need to know these preferences when the virtual lotteries are evaluated. Keeping these computational steps separate is essential for computational efficiency, and is the same procedurally as pre-generating “smart” Halton sequences of uniform deviates for later, repeated use within a maximum simulated likelihood evaluator (e.g., Train [2003; p. 224ff.]).
B. Formal Specification
We assume that utility is defined over money m using the popular Constant Relative Risk Aversion (CRRA) function

u(m) = m^(1-r)/(1-r)   (1)

where r is the utility function parameter to be estimated. In this case r is the RRA coefficient for r ≠ 1, and u(m) = ln(m) for r = 1. With this parameterization r = 0 denotes risk neutral behavior, r > 0 denotes risk aversion, and r < 0 denotes risk loving. We review one extension to this simple CRRA model later, but for immediate purposes it is desirable to have a simple specification of the utility function in order to focus on the estimation methodology.36

Probabilities for each outcome k, p_k, are those that are induced by the task, so expected utility is simply the probability-weighted utility of each outcome in each lottery. There were 100 outcomes in each virtual lottery i, so

EU_i = Σ_{k=1,...,100} [ p_k × u_k ].   (2)

∇EU = (EU_BO − EU_L)/μ   (3)

calculated, where EU_L is the expected utility of the lottery in the task, EU_BO is the expected utility of the degenerate lottery given by the bank offer, and μ is a Fechner noise parameter following Hey and Orme [1994].37 The index ∇EU is then

36 It is possible to extend the analysis by allowing the core parameter r to be a function of observable characteristics. Or one could view the CRRA coefficient as a random coefficient reflecting a subject-specific random effect υ, so that one would estimate r̂ = r̂0 + υ instead. This is what De Roos and Sarafidis [2006] do for their core parameters, implicitly assuming that the mean of υ is zero and estimating the standard deviation of υ. Our approach is just to estimate r̂0.
37 Harless and Camerer [1994], Hey and Orme [1994] and Loomes and Sugden [1995] provided the first wave of empirical studies including some formal stochastic specification in the version of EUT tested. There are several species of “errors” in use, reviewed by Hey [1995][2002], Loomes and Sugden [1995], Ballinger and Wilcox [1997], and Loomes, Moffatt and Sugden [2002]. Some place the error at the final choice between one lottery or the other after the subject has decided deterministically which one has the higher expected utility; some place the error earlier, on the comparison of preferences leading to the choice; and some place the error even earlier, on the determination of the expected utility of each lottery.
used to define the cumulative probability of the observed choice to “deal” using the cumulative standard normal distribution function:

G(∇EU) = Φ(∇EU).   (4)

This provides a simple stochastic link between the latent economic model and observed choices.38 The likelihood, conditional on the EUT model being true and the use of the CRRA utility function, depends on the estimate of r and μ given the above specification and the observed choices. The conditional log-likelihood is

ln L^EUT (r, μ; y) = Σ_i [ (ln G(∇EU) | y_i = 1) + (ln (1 − G(∇EU)) | y_i = 0) ]   (5)

where y_i = 1(0) denotes the choice of “deal” (“no deal”) in task i. We extend this standard formulation to include forward-looking behavior by redefining the lottery that the contestant faces. One such virtual lottery reflects the possible outcomes if the subject always says “no deal” until the end of the game and receives his prize. We call this a virtual lottery since it need not happen; it does happen in some fraction of cases, and it could happen for any subject. Similarly, we can substitute other virtual lotteries reflecting other possible choices by the contestant. Just before deciding whether to accept the bank offer in round 1, what if the contestant behaves as if the following simulation were repeated Λ times: { Play out the remaining 8 rounds and pick cases at random until all but 2 cases are opened. Since this is the last round in which one would receive a bank offer, calculate the expected value of the remaining 2 cases. Then multiply that expected value by the fraction that the bank is expected to use in round 9 to calculate the offer. Pick that fraction from a prior as to the average offer fraction, recognizing that the offer fraction is stochastic. } The end result of this simulation is a sequence of Λ virtual bank offers in round 9, viewed from the perspective of round 1. This sequence then defines the virtual lottery to be used for a contestant in round 1 whose horizon is the last round in which the bank will make an offer. Each of the Λ bank offers in this virtual simulation occurs with probability 1/Λ, by construction. To keep things

38 De Roos and Sarafidis [2006] assume a random effects term ε for each individual and add it to the latent index defining the probability of choosing deal. This is the same thing as changing our specification (4) to G(∇EU) = Φ(∇EU) + ε, and adding the standard deviation of ε as a parameter to be estimated (the mean of ε is assumed to be 0).
numerically manageable, we can then take a 100-point discrete approximation of this lottery, which will typically consist of Λ distinct real values, where one would like Λ to be relatively large (we use Λ = 100,000). This simulation is conditional on the 6 cases that the subject has already opened at the end of round 1. Thus the lottery reflects the historical fact of the 6 specific cases that this contestant has already opened. The same process can be repeated for a virtual lottery that only involves looking forward to the expected offer in round 8. And for a virtual lottery that only involves looking forward to rounds 7, 6, 5, 4, 3 and 2, respectively. Table 1 illustrates the outcome of such calculations. The contestant can be viewed as having a set of 9 virtual lotteries to compare, each of which entails saying “no deal”

in round 1. The different virtual lotteries imply different choices in future rounds, but the same response in round 1. To decide whether to accept the deal in round 1, we assume that the subject simply compares the maximum EU over these 9 virtual lotteries with the utility of the deterministic offer in round 1. To calculate EU and utility of the offer one needs to know the parameters of the utility function, but these are just 9 EU evaluations and 1 utility evaluation. These evaluations can be undertaken within a likelihood function evaluator, given candidate values of the parameters of the utility function. The same process can be repeated in round 2, generating another set of 8 virtual lotteries to be compared to the actual bank offer in round 2. This simulation would not involve opening as many cases, but the logic is the same. Similarly for rounds 3 through 9. Thus for each of round 1 through 9, we can compare the utility of the actual bank offer with the maximum EU of the virtual lotteries for that round, which in turn reflects the EU of receiving a bank offer in future rounds in the underlying game. In addition, there exists a virtual lottery in which the subject says “no deal” in every round. This is the virtual lottery that we view as being realized in round 10 in Table 1. There are several significant advantages of this virtual lottery approach. First, we can directly see that the contestant that has a short horizon behaves in essentially the same manner as the

contestant that has a longer horizon, and just substitutes different virtual lotteries into their latent EUT calculus. This makes it easy to test hypotheses about the horizon that contestants use, although here we assume that contestants evaluate the full horizon of options available. Second, one can specify mixture models of different horizons, and let the data determine what fraction of the sample employs which horizon. Third, the approach generalizes for any known offer function, not just the ones assumed here and in Table 1. Thus it is not as specific to the DOND task as it might initially appear. This is important if one views DOND as a canonical task for examining fundamental methodological aspects of dynamic choice behavior. Those methods should not exploit the specific structure of DOND, unless there is no loss in generality. In fact, other versions of DOND can be used to illustrate the flexibility of this approach, since they sometimes employ “follow on” games that can simply be folded into the virtual lottery simulation. Finally, and not least, this approach imposes virtually no numerical burden on the maximum likelihood optimization part of the numerical estimation stage: all that the likelihood function evaluator sees in a given round is a non-stochastic bank offer, a handful of (virtual) lotteries to compare it to given certain proposed parameter values for the latent choice model, and the actual decision of the contestant to accept the offer or not. This parsimony makes it easy to examine non-CRRA and non-EUT specifications of the latent dynamic choice process, illustrated in Andersen et al. [2006a][2006b]. All estimates allow for the possibility of correlation between responses by the same subject, so the standard errors on estimates are corrected for the possibility that the responses are clustered for the same subject. The use of clustering to allow for “panel effects” from unobserved individual effects is common in the statistical survey literature.39 In addition, we consider allowances for

39 Clustering commonly arises in national field surveys from the fact that physically proximate households are often sampled to save time and money, but it can also arise from more homely sampling procedures. For example, Williams [2000; p.645] notes that it could arise from dental studies that “collect data on each tooth surface for each of several teeth from a set of patients” or “repeated measurements or recurrent events observed on the same person.” The procedures for allowing for clustering allow heteroskedasticity between and within clusters, as well as autocorrelation within clusters. They are closely related to the “generalized estimating equations” approach to panel estimation in epidemiology (see Liang and Zeger [1986]), and generalize the “robust standard errors” approach popular in econometrics (see Rogers [1993]). Wooldridge [2003] reviews some issues in the use of clustering for panel effects, noting that significant inferential problems may arise with small numbers of panels.
random effects from unobserved individual heterogeneity40 after estimating the initial model that assumes that all subjects have the same preferences for risk.
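Pulling the pieces of this specification together, here is a minimal sketch of the per-round likelihood contribution under EUT: CRRA utility (1), the expected utility of each 100-point virtual lottery approximation (2), the Fechner index and probit link (3)-(4), and the contribution to the log-likelihood (5). The pre-simulated virtual lotteries are passed in as data, as footnote 35 suggests, and the clustering and random-effects corrections are omitted here.

```python
import numpy as np
from scipy.stats import norm

def crra_u(m, r):
    """CRRA utility (1): u(m) = m**(1 - r)/(1 - r), with log utility at r = 1."""
    return np.log(m) if np.isclose(r, 1.0) else m**(1.0 - r) / (1.0 - r)

def round_loglik(r, mu, bank_offer, virtual_lotteries, deal):
    """Log-likelihood contribution of one contestant-round choice.
    virtual_lotteries is a list of (prizes, probs) 100-point approximations, one
    per feasible future stopping round; deal = 1 if the offer was accepted."""
    eu_continue = max(np.sum(probs * crra_u(prizes, r))
                      for prizes, probs in virtual_lotteries)        # best of (2)
    index = (crra_u(bank_offer, r) - eu_continue) / mu               # index (3)
    p_deal = np.clip(norm.cdf(index), 1e-12, 1 - 1e-12)              # link (4)
    return np.log(p_deal) if deal == 1 else np.log(1.0 - p_deal)     # term of (5)

def total_loglik(params, rounds_data):
    """Sum of (5) over all observed contestant-rounds; rounds_data is a list of
    (bank_offer, virtual_lotteries, deal) tuples built by prior simulation."""
    r, mu = params
    return sum(round_loglik(r, mu, *obs) for obs in rounds_data)
```

The negative of total_loglik would then be handed to a standard numerical optimizer to recover r and μ; because the virtual lotteries are pre-simulated, the optimizer itself only ever sees a handful of discrete lotteries per round.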

C. Estimates
We estimate the CRRA coefficient to be 0.18 with a standard error of 0.030, implying a 95% confidence interval between 0.12 and 0.24. So this provides evidence of moderate risk aversion over this large domain. The noise parameter μ is estimated to be 0.077, with a standard error of 0.015. We can extract from our estimation information on the horizon actually used to evaluate the

alternatives to accepting the bank offer. Figure 10 displays histograms of these evaluation horizons in each round. The results point to these contestants putting most weight on the last two rounds in which they will receive an offer. Thus in rounds 1 through 3 almost every contestant actually used the expected utility of the round 8 virtual lottery to compare to the bank offer. The expected utilities for other virtual lotteries may well have generated the same binary decision, but the virtual lottery for round 8 was the one actually used since it was greater than the others in terms of expected utility. It is a simple matter to examine the effects of constraining the horizon over which the contestant is assumed to evaluate options. If one assumes that choices in each round were based on a comparison of the bank offer and the expected outcome from the terminal round, when the case was opened, then the CRRA estimate becomes 0.023, with a 95% confidence interval between -0.060 and 0.11. So one cannot reject the hypothesis that subjects behave as if risk-neutral if they are only assumed to look to the terminal round, and ignore the intervening bank offers (the p-value for this hypothesis is 0.59). If one instead assumes that choices in each round were based on a myopic horizon, in which the contestant just considers the distribution of likely offers in the very next

40 In the DOND literature, de Roos and Sarafidis [2006] demonstrate that alternative ways of correcting for unobserved individual heterogeneity (random effects or random coefficients) generally provide similar estimates, but that they are quite different from estimates that ignore that heterogeneity. Botti et al. [2006] also consider unobserved individual heterogeneity, and show that it is statistically significant in their models (which ignore dynamic features of the game).

Our specification of alternative evaluation horizons does not lead to a nested hypothesis test of parameter restrictions, so a formal test of the differences in these estimates requires a non-nested hypothesis test. We use the popular Vuong [1989] procedure, even though it rests on some strong assumptions, discussed in Harrison and Rutström [2005]. We find that we can reject the hypothesis that the evaluation horizon is only the terminal horizon with a p-value of 0.026, and reject the hypothesis that the evaluation horizon is myopic with a p-value of less than 0.0001.
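A minimal sketch of the unadjusted Vuong statistic for strictly non-nested models, computed from per-observation log-likelihood contributions under the two competing specifications (the inputs here are hypothetical, and a serious implementation would also respect the clustering of choices within contestants):

    import numpy as np
    from scipy.stats import norm

    def vuong_test(ll_a, ll_b):
        """Vuong [1989] statistic from per-observation log-likelihoods ll_a and ll_b."""
        d = np.asarray(ll_a) - np.asarray(ll_b)
        z = np.sqrt(len(d)) * d.mean() / d.std(ddof=1)
        return z, 2 * norm.sf(abs(z))   # statistic and two-sided p-value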

Finally, we can consider the validity of the CRRA assumption in this setting by allowing RRA to vary with prizes. One natural candidate utility function to replace (1) is the Hyperbolic Absolute Risk Aversion (HARA) function of Merton [1971]. We use a specification of HARA41 given in Gollier [2001]:

U(y) = ζ(η + y/γ)^(1−γ), γ ≠ 0,   (1′)

where the parameter ζ can be set to 1 for estimation purposes without loss of generality. This function is defined over the domain of y such that η + y/γ > 0. The first-order derivative with respect to income is U′(y) = (ζ(1−γ)/γ)(η + y/γ)^(−γ), which is positive if and only if ζ(1−γ)/γ > 0 for the given domain of y. The second-order derivative is U″(y) = −(ζ(1−γ)/γ)(η + y/γ)^(−γ−1), which is negative for the given domain of y whenever the first derivative is positive. Hence it is not possible to specify risk-loving behavior with this specification when non-satiation is assumed; this is not a particularly serious restriction for a model of aggregate behavior in Deal Or No Deal.
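Collecting these relations in one place (with ζ retained), the ARA and RRA expressions used in the next paragraph follow directly from the two derivatives:

    \begin{align*}
    U(y)   &= \zeta\left(\eta + \frac{y}{\gamma}\right)^{1-\gamma}, \qquad \gamma \neq 0,\\
    U'(y)  &= \frac{\zeta(1-\gamma)}{\gamma}\left(\eta + \frac{y}{\gamma}\right)^{-\gamma}, \qquad
    U''(y)  = -\frac{\zeta(1-\gamma)}{\gamma}\left(\eta + \frac{y}{\gamma}\right)^{-\gamma-1},\\
    \mathit{ARA}(y) &= -\frac{U''(y)}{U'(y)} = \frac{1}{\eta + y/\gamma}, \qquad
    \mathit{RRA}(y) = y \cdot \mathit{ARA}(y) = \frac{y}{\eta + y/\gamma}.
    \end{align*}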

41 Gollier [2001; p.25] refers to this as a Harmonic Absolute Risk Aversion function, rather than the Hyperbolic Absolute Risk Aversion function of Merton [1971; p.389].

With this specification ARA is 1/(η + y/γ), so the inverse of ARA is linear in income; RRA is y/(η + y/γ), which can both increase and decrease with income. Relative risk aversion is independent of income and equal to γ when η = 0.

Using the HARA utility function, we estimate η to be 0.30, with a standard error of 0.070 and a 95% confidence interval between 0.15 and 0.43. Thus we can easily reject the assumption of CRRA over this domain. We estimate γ to be 0.992, with a standard error of 0.001. Evaluating RRA over various prize levels reveals an interesting pattern: RRA is virtually 0 for all prize levels up to around $10,000, where it reaches 0.03, indicating very slight risk aversion. It then increases sharply as prize levels increase.

At $100,000 RRA is 0.24, at $250,000 it is 0.44, at $500,000 it is 0.61, at $750,000 it is 0.70, and at $1 million it is 0.75. Thus we observe striking evidence of risk neutrality for small stakes, at least within the context of this task, and risk aversion for large stakes.

If contestants are constrained to consider only the options available to them in the next round, roughly the same estimates of risk attitudes obtain, even though one can again statistically reject this implicit restriction. RRA is again overestimated, reaching 0.39 for prizes of $100,000, 0.61 for prizes of $250,000, and 0.86 for prizes of $1 million. On the other hand, assuming that contestants evaluate only the terminal option leads to much lower estimates of risk aversion, consistent with the findings assuming CRRA: in this case there is virtually no evidence of risk aversion at any prize level up to $1 million, which is clearly implausible a priori.
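The quoted RRA values can be reproduced from the HARA estimates themselves. The sketch below assumes that prizes enter the estimation in millions of dollars (our reading, since that scaling makes the reported figures line up); small discrepancies reflect rounding of the point estimates.

    # RRA(y) = y / (eta + y/gamma) at the HARA point estimates, prizes in $ millions.
    eta, gamma = 0.30, 0.992

    def rra(y):
        return y / (eta + y / gamma)

    for prize in [0.01, 0.10, 0.25, 0.50, 0.75, 1.00]:      # $10,000 up to $1 million
        print(f"${prize * 1e6:>9,.0f}: RRA = {rra(prize):.2f}")
    # Prints roughly 0.03, 0.25, 0.45, 0.62, 0.71 and 0.76, close to the values in the text.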

5. Conclusions

Game shows offer obvious advantages for the estimation of risk attitudes, not the least being the use of large stakes. Our review of analyses of these data reveals a steady progression of sophistication in the structural estimation of models of choice under uncertainty. Most of these shows, however, put the contestant into a dynamic decision-making environment, so one cannot simply (and reliably) use static models of choice. Using Deal Or No Deal as a detailed case study, we considered a general estimation methodology for such shows in which randomization of the potential outcomes allows us to break the curse of dimensionality that comes from recognizing these dynamic elements of the task environment.

The Deal Or No Deal paradigm is important for several reasons, and it is more general than it might at first seem. It incorporates many of the dynamic, forward-looking decision processes that strike one as a natural counterpart to a wide range of fundamental economic decisions in the field. The “option value” of saying “No Deal” has clear parallels to the financial literature on stock market pricing, as well as to many investment decisions that have future consequences (so-called “real options”). There is no frictionless market ready to price these options, so familiar arbitrage conditions for equilibrium valuation play no immediate role, and one must worry about how the individual makes these decisions. The game show offers a natural experiment, with virtually all of the major components replicated carefully from show to show, and even from country to country.

The only sense in which Deal Or No Deal is restrictive is that it requires the contestant to make a binary “stop/go” decision. This is already a rich domain, as illustrated by several prominent examples: the evaluation of replacement strategies for capital equipment (Rust [1987]) and the closure of nuclear power plants (Rothwell and Rust [1997]). But it would be valuable to extend the choice variable to be non-binary, as in Card Sharks, where the contestant has a bet level to decide in each round as well as a binary decision (whether to switch the face card). Although some progress has been made on this problem, reviewed in Rust [1994], the range of applications has not been wide (e.g., Rust and Rothwell [1995]). Moreover, none of these applications have considered risk attitudes, let alone associated concepts such as loss aversion or probability weighting. Thus the detailed analysis of choice behavior in environments such as Card Sharks should provide a rich test-case for many broader applications.

These game shows provide a particularly fertile environment in which to test extensions to standard EUT models, as well as alternatives to EUT models of risk attitudes. We have discussed earlier the applications that considered rank-dependent models such as RDU, and then sign-dependent models such as CPT. Our own applications, using the virtual lottery approach and UK data, have demonstrated the sensitivity of inferences to the manner in which key concepts are operationalized.

Andersen et al. [2006a] find striking evidence of probability weighting, which is interesting since the Deal Or No Deal game has symmetric probabilities on each case. Using natural reference points to define contestant-specific gains or losses, we find no evidence of loss aversion. Of course, that inference depends on having identified the right reference point, but CPT is generally silent on that specification issue when it is not obvious from the frame. Andersen et al. [2006b] illustrate the application of alternative “dual-criteria” models of choice from psychology, built to account for lab behavior with long-shot, asymmetric lotteries such as one finds in Deal Or No Deal. No doubt many other specifications will be considered. Within the EUT framework, Andersen et al. [2006a] demonstrated the importance of allowing for asset integration: when utility was assumed to be defined over prizes plus some outside wealth measure42, behavior was characterized well by a CRRA specification; but when it was assumed to be defined over prizes only, behavior was better characterized by a non-CRRA specification with increasing RRA over prizes.

There are three major weaknesses of game shows. The first is that one cannot change the rules of the game or the information that contestants receive, much as one can in a laboratory experiment. Thus the experimenter only gets to watch and learn, since natural experiments are, as described by Harrison and List [2004], serendipity observed. However, it is a simple matter to design laboratory experiments that match the qualitative task domains of the game show, even if one cannot hope to have stakes to match the game show (e.g., Tenorio and Cason [2002], Healy and Noussair [2004], Andersen et al. [2006b] and Post et al. [2006b]). Once this has been done, exogenous treatments can be imposed and studied. If behavior in the default version of the game can be calibrated to behavior in a lab environment, then one has some basis for being interested in the behavioral effects of treatments in the lab.

The second major weakness of game shows is the concern that the sample might have been selected by some latent process correlated with the behavior of interest to the analyst: the classic sample selection problem. Most analyses of game shows are aware of this, and discuss the procedures by which contestants get to participate.

42 This estimated measure might be interpreted as wealth, or as some function of wealth in the spirit of Cox and Sadiraj [2006].

At the very least, it is clear that the demographic diversity is wider than found in the convenience samples of the lab. We believe that controlled lab experiments can provide guidance on the extent of sample selection into these tasks, and that the issue is a much more general one.

The third major weakness of game shows is the lack of information on observable characteristics, and hence the inability to use that information to examine heterogeneity of behavior. It is possible to observe some information about the contestant, since there is normally some pre-game banter that can be used to identify sex, approximate age, marital status, and ethnicity. But the general solution here is to employ econometric methods that allow one to correct for possible heterogeneity at the level of the individual, even if one cannot condition on observable characteristics of the individual. Until then, one either pools over subjects under the assumption that they have the same preferences, as we have done; makes restrictive assumptions that allow one to identify bounds for a given contestant, but that then provide contestant-specific estimates (e.g., Post, van den Assem, Baltussen and Thaler [2006a]); or pays more attention to statistical methods that allow for unobserved heterogeneity. One such method is to allow for random coefficients in each structural model, representing underlying variation in preferences across the sample (e.g., Train [2003; ch.6], De Roos and Sarafidis [2006] and Botti, Conte, DiCagno and D’Ippoliti [2006]); this is quite different from allowing for standard errors on the pooled coefficient, as we have done. Another method is to allow for finite mixtures of alternative structural models, recognizing that some choices or subjects may be better characterized in this domain by one latent decision-making process and that others may be better characterized by some other process (e.g., Harrison and Rutström [2005]). These methods are not necessarily alternatives, but they each demand relatively large data sets and considerable attention to statistical detail.
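To make the finite-mixture idea concrete, here is a minimal sketch (ours, not any of the cited implementations) of the grand log-likelihood for a two-component mixture. The per-choice likelihood contributions under the two latent decision processes, and the mixing probability pi, are hypothetical inputs that would be estimated jointly with the structural parameters.

    import numpy as np

    def mixture_loglik(lik_a, lik_b, pi):
        """Two-component finite-mixture log-likelihood.

        lik_a, lik_b -- per-choice likelihood contributions (levels, not logs) under
                        latent decision process A and process B
        pi           -- probability that a choice is generated by process A
        """
        lik_a, lik_b = np.asarray(lik_a), np.asarray(lik_b)
        return np.sum(np.log(pi * lik_a + (1.0 - pi) * lik_b))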

Figure 1: Money Cards Board in Card Sharks

Figure 2: Money Cards Board from Lab Version of Card Sharks

Figure 3: The Word Puzzle in Lingo

Figure 4: Example of a Lingo Board

Figure 5: Opening Display of Prizes in TV Game Show

Figure 6: Prizes Available After One Case Has Been Opened

Figure 7: Metrics of Good Luck and Bad Luck in US Show
Luck measured by ratio of EV of remaining cases to EV at start of game, minus 1; N=119.
[Histograms of this luck measure for rounds 1 through 8, one panel per round, plotted on a common horizontal scale running from -1 to 2.]

Figure 8: Metrics of Good Luck and Bad Luck in UK Show
Luck measured by ratio of EV of remaining cases to EV at start of game, minus 1; N=211.
[Histograms of this luck measure for rounds 1 through 6, one panel per round, plotted on a common horizontal scale running from -1 to 5.]

Table 1: Virtual Lotteries for US Deal or No Deal Game Show
Average (standard deviation) of payoff from virtual lottery using 100,000 simulations. Sample restricted to the first 75 contestants in the “standard” versions with $1 million top prize. Each row reports, for the round in question, the number of active contestants (with their share of the sample in parentheses), the number saying “Deal!”, the average bank offer, and the average (standard deviation) payoff of the virtual lottery realized in each later round.

Round 1: 75 active (100%), 0 said “Deal!”; average offer $16,180. Virtual lottery realized in round 2: $31,141 ($23,655); round 3: $53,757 ($45,996); round 4: $73,043 ($66,387); round 5: $97,275 ($107,877); round 6: $104,793 ($102,246); round 7: $120,176 ($121,655); round 8: $131,165 ($154,443); round 9: $136,325 ($176,425); round 10: $136,281 ($258,856).

Round 2: 75 active (100%), 0 said “Deal!”; average offer $33,453. Round 3: $53,535 ($46,177); round 4: $72,588 ($66,399); round 5: $96,887 ($108,086); round 6: $104,369 ($102,222); round 7: $119,890 ($121,492); round 8: $130,408 ($133,239); round 9: $135,877 ($175,278); round 10: $135,721 ($257,049).

Round 3: 75 active (100%), 0 said “Deal!”; average offer $54,376. Round 4: $73,274 ($65,697); round 5: $97,683 ($107,302); round 6: $105,117 ($101,271); round 7: $120,767 ($120,430); round 8: $131,563 ($153,058); round 9: $136,867 ($173,810); round 10: $136,636 ($255,660).

Round 4: 75 active (100%), 1 said “Deal!”; average offer $75,841. Round 5: $99,895 ($108,629); round 6: $107,290 ($101,954); round 7: $123,050 ($120,900); round 8: $134,307 ($154,091); round 9: $139,511 ($174,702); round 10: $139,504 ($257,219).

Round 5: 74 active (99%), 5 said “Deal!”; average offer $103,188. Round 6: $111,964 ($106,137); round 7: $128,613 ($126,097); round 8: $140,275 ($160,553); round 9: $145,710 ($180,783); round 10: $145,757 ($266,303).

Round 6: 69 active (92%), 16 said “Deal!”; average offer $112,818. Round 7: $128,266 ($124,945); round 8: $139,774 ($159,324); round 9: $145,348 ($180,593); round 10: $145,301 ($266,781).

Round 7: 53 active (71%), 20 said “Deal!”; average offer $119,746. Round 8: $136,720 ($154,973); round 9: $142,020 ($170,118); round 10: $142,323 ($246,044).

Round 8: 33 active (44%), 16 said “Deal!”; average offer $107,779. Round 9: $116,249 ($157,005); round 10: $116,020 ($223,979).

Round 9: 17 active (23%), 10 said “Deal!”; average offer $79,363. Round 10: $53,929 ($113,721).

Round 10: 7 active (9%).

Note: Data drawn from observations of contestants on the U.S. game show, plus author’s simulations of virtual lotteries as explained in the text.
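For concreteness, here is a stylized sketch of how a virtual lottery of the kind tabulated above could be simulated. Everything below is an illustrative assumption: in particular, the bank-offer rule (a fraction of the expected value of the unopened cases that rises with the round) is a placeholder, not the offer function used in the estimation, and the observed offer is always used for the current round.

    import random

    # Standard US prize board ($1 million top prize).
    PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750,
              1000, 5000, 10000, 25000, 50000, 75000, 100000, 200000,
              300000, 400000, 500000, 750000, 1000000]

    CASES_TO_OPEN = [6, 5, 4, 3, 2, 1, 1, 1, 1]    # cases opened in rounds 1-9

    def bank_offer(remaining, round_no):
        # Placeholder offer rule: a round-dependent fraction of expected value.
        ev = sum(remaining) / len(remaining)
        return ev * min(0.1 + 0.1 * round_no, 1.0)

    def virtual_lottery(remaining, current_round, deal_round, sims=100_000):
        """Simulated payoff from saying 'No Deal' now and 'Deal' in deal_round.

        remaining  -- unopened prize values (including the contestant's own case)
                      at the time of the current round's offer
        deal_round -- future round whose offer is accepted; 10 means never dealing
                      and opening one's own case
        """
        draws = []
        for _ in range(sims):
            cases = random.sample(remaining, len(remaining))   # shuffled copy
            own, others = cases[0], cases[1:]                  # reserve the own case
            for r in range(current_round + 1, min(deal_round, 9) + 1):
                del others[:CASES_TO_OPEN[r - 1]]              # open that round's cases
            draws.append(own if deal_round == 10 else bank_offer(others + [own], deal_round))
        return draws

For example, virtual_lottery(remaining, 1, 8) gives the distribution whose mean and standard deviation would sit in the round 1 row, round 8 column of Table 1 under the assumed offer rule.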

Figure 9: Two Virtual Lottery Distributions in Round 1
[Two density plots over prize values from $0 to $200,000: the virtual lottery if No Deal in round 1 and then Deal in round 2, and the virtual lottery if No Deal in rounds 1 and 2 and then Deal in round 3.]

Figure 10: Evaluation Horizon by Round
[Nine histograms, one for each of rounds 1 through 9, showing the frequency with which each future round (1 through 10) was the evaluation horizon actually used.]

References

Abdellaoui, Mohammed; Barrios, Carolina, and Wakker, Peter P., “Reconciling Introspective Utility With Revealed Preference: Experimental Arguments Based on Prospect Theory,” Journal of Econometrics, 138, 2007, 356-378.
Andersen, Steffen; Harrison, Glenn W.; Lau, Morten Igel, and Rutström, E. Elisabet, “Eliciting Risk and Time Preferences,” Working Paper 05-24, Department of Economics, College of Business Administration, University of Central Florida, 2005.
Andersen, Steffen; Harrison, Glenn W.; Lau, Morten, and Rutström, E. Elisabet, “Dynamic Choice Behavior in a Natural Experiment,” Working Paper 06-10, Department of Economics, College of Business Administration, University of Central Florida, 2006a.
Andersen, Steffen; Harrison, Glenn W.; Lau, Morten I., and Rutström, E. Elisabet, “Dual Criteria Decisions,” Working Paper 06-11, Department of Economics, College of Business Administration, University of Central Florida, 2006b.
Andersen, Steffen; Harrison, Glenn W.; Lau, Morten I., and Rutström, E. Elisabet, “Risk Aversion in the Large and the Small: Evidence from Natural and Laboratory Experiments,” Working Paper, Department of Economics, College of Business Administration, University of Central Florida, 2006c.
Andersen, Steffen; Harrison, Glenn W., and Rutström, E. Elisabet, “Choice Behavior, Asset Integration and Natural Reference Points,” Working Paper 06-04, Department of Economics, College of Business Administration, University of Central Florida, 2006d.
Ballinger, T. Parker, and Wilcox, Nathaniel T., “Decisions, Error and Heterogeneity,” Economic Journal, 107, July 1997, 1090-1105.
Beetsma, R.M.W.J., and Schotman, P.C., “Measuring Risk Attitudes in a Natural Experiment: Data from the Television Game Show Lingo,” Economic Journal, 111, October 2001, 821-848.
Blavatskyy, Pavlo, “A Stochastic Expected Utility Theory,” IEW Working Paper No. 231, Institute for Empirical Research in Economics, University of Zurich, February 2005.
Blavatskyy, Pavlo, and Pogrebna, Ganna, “Loss Aversion? Not with Half-a-Million on the Table!” Working Paper 274, Institute for Empirical Research in Economics, University of Zurich, July 2006a.
Blavatskyy, Pavlo, and Pogrebna, Ganna, “Testing the Predictions of Decision Theories in a Natural Experiment When Half a Million Is At Stake,” Working Paper 291, Institute for Empirical Research in Economics, University of Zurich, June 2006b.
Bombardini, Matilde, and Trebbi, Francesco, “Risk Aversion and Expected Utility Theory: A Field Experiment with Large and Small Stakes,” Working Paper 05-20, Department of Economics, University of British Columbia, November 2005.
Botti, Fabrizio; Conte, Anna; DiCagno, Daniela, and D’Ippoliti, Carlo, “Risk Attitude in Real Decision Problems,” Unpublished Manuscript, LUISS Guido Carli, Rome, September 2006.
Campbell, John Y.; Lo, Andrew W., and MacKinlay, A. Craig, The Econometrics of Financial Markets (Princeton: Princeton University Press, 1997).
Cox, James C., and Sadiraj, Vjollca, “Small- and Large-Stakes Risk Aversion: Implications of Concavity Calibration for Decision Theory,” Games & Economic Behavior, 56(1), July 2006, 45-60.
Deck, Cary; Lee, Jungmin, and Reyes, Javier, “Risk Attitudes in Large Stake Gambles: Evidence from a Game Show,” Working Paper, Department of Economics, University of Arkansas, January 2007; forthcoming, Applied Economics.
De Roos, Nicolas, and Sarafidis, Yianis, “Decision Making Under Risk in Deal or No Deal,” Working Paper, School of Economics and Political Science, University of Sydney, April 2006.
Friedman, Daniel, “A Simple Testable Model of Double Auction Markets,” Journal of Economic Behavior & Organization, 15, 1991, 47-70.
Gertner, R., “Game Shows and Economic Behavior: Risk-Taking on Card Sharks,” Quarterly Journal of Economics, 108(2), 1993, 507-521.
Gollier, Christian, The Economics of Risk and Time (Cambridge, MA: MIT Press, 2001).
Harless, David W., and Camerer, Colin F., “The Predictive Utility of Generalized Expected Utility Theories,” Econometrica, 62(6), November 1994, 1251-1289.
Harrison, Glenn W.; Johnson, Eric; McInnes, Melayne M., and Rutström, E. Elisabet, “Risk Aversion and Incentive Effects: Comment,” American Economic Review, 95(3), June 2005, 897-901.
Harrison, Glenn W.; Lau, Morten I., and Rutström, E. Elisabet, “Estimating Risk Attitudes in Denmark: A Field Experiment,” Scandinavian Journal of Economics, 109(2), June 2007, forthcoming.
Harrison, Glenn W.; Lau, Morten Igel; Rutström, E. Elisabet, and Sullivan, Melonie B., “Eliciting Risk and Time Preferences Using Field Experiments: Some Methodological Issues,” in J. Carpenter, G.W. Harrison and J.A. List (eds.), Field Experiments in Economics (Greenwich, CT: JAI Press, Research in Experimental Economics, Volume 10, 2005).
Harrison, Glenn W., and List, John A., “Field Experiments,” Journal of Economic Literature, 42(4), December 2004, 1013-1059.
Harrison, Glenn W., and Rutström, E. Elisabet, “Expected Utility Theory and Prospect Theory: One Wedding and A Decent Funeral,” Working Paper 05-18, Department of Economics, College of Business Administration, University of Central Florida, 2005.
Hartley, Roger; Lanot, Gauthier, and Walker, Ian, “Who Really Wants to be a Millionaire? Estimates of Risk Aversion from Gameshow Data,” Working Paper, Department of Economics, University of Warwick, February 2005.
Healy, Paul, and Noussair, Charles, “Bidding Behavior in the Price Is Right Game: An Experimental Study,” Journal of Economic Behavior & Organization, 54, 2004, 231-247.
Hey, John, “Experimental Investigations of Errors in Decision Making Under Risk,” European Economic Review, 39, 1995, 633-640.
Hey, John D., “Experimental Economics and the Theory of Decision Making Under Uncertainty,” Geneva Papers on Risk and Insurance Theory, 27(1), June 2002, 5-21.
Hey, John D., and Orme, Chris, “Investigating Generalizations of Expected Utility Theory Using Experimental Data,” Econometrica, 62(6), November 1994, 1291-1326.
Holt, Charles A., and Laury, Susan K., “Risk Aversion and Incentive Effects,” American Economic Review, 92(5), December 2002, 1644-1655.
Kahneman, Daniel, and Tversky, Amos, “Prospect Theory: An Analysis of Decision Under Risk,” Econometrica, 47, 1979, 263-291.
Leamer, Edward, and Leonard, Herman, “Reporting the Fragility of Regression Estimates,” Review of Economics & Statistics, 65(2), May 1983, 306-317.
Liang, K-Y., and Zeger, S.L., “Longitudinal Data Analysis Using Generalized Linear Models,” Biometrika, 73, 1986, 13-22.
Loomes, Graham; Moffatt, Peter G., and Sugden, Robert, “A Microeconometric Test of Alternative Stochastic Theories of Risky Choice,” Journal of Risk and Uncertainty, 24(2), 2002, 103-130.
Loomes, Graham, and Sugden, Robert, “Incorporating a Stochastic Element Into Decision Theories,” European Economic Review, 39, 1995, 641-648.
Markowitz, Harry, “The Utility of Wealth,” Journal of Political Economy, 60, April 1952, 151-158.
Merton, Robert C., “Optimum Consumption and Portfolio Rules in a Continuous-Time Model,” Journal of Economic Theory, 3, 1971, 373-413.
Metrick, Andrew, “A Natural Experiment in ‘Jeopardy!’,” American Economic Review, 85(1), March 1995, 240-253.
Mulino, Daniel; Scheelings, Richard; Brooks, Robert, and Faff, Robert, “An Empirical Investigation of Risk Aversion and Framing Effects in the Australian Version of Deal Or No Deal,” Working Paper, Department of Economics, Monash University, June 2006.
Nalebuff, Barry, “Puzzles: Slot Machines, Zomepirac, Squash, and More,” Journal of Economic Perspectives, 4(1), Winter 1990, 179-187.
Pesendorfer, Wolfgang, “Behavioral Economics Comes of Age: A Review Essay on Advances in Behavioral Economics,” Journal of Economic Literature, XLIV, September 2006, 712-721.
Post, Thierry; van den Assem, Martijn; Baltussen, Guido, and Thaler, Richard, “Deal or No Deal? Decision Making under Risk in a Large-Payoff Game Show,” Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University, April 2006a.
Post, Thierry; van den Assem, Martijn; Baltussen, Guido, and Thaler, Richard, “Deal or No Deal? Decision Making under Risk in a Large-Payoff Game Show,” Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University, December 2006b.
Quiggin, John, “A Theory of Anticipated Utility,” Journal of Economic Behavior & Organization, 3(4), 1982, 323-343.
Quiggin, John, Generalized Expected Utility Theory: The Rank-Dependent Model (Norwell, MA: Kluwer Academic, 1993).
Rogers, W. H., “Regression Standard Errors in Clustered Samples,” Stata Technical Bulletin, 13, 1993, 19-23.
Rothwell, Geoffrey, and Rust, John, “On the Optimal Lifetime of Nuclear Power Plants,” Journal of Business & Economic Statistics, 15(2), April 1997, 195-208.
Rust, John, “Optimal Replacement of GMC Bus Engines: An Empirical Model of Harold Zurcher,” Econometrica, 55, 1987, 999-1033.
Rust, John, “Structural Estimation of Markov Decision Processes,” in D. McFadden and R. Engle (eds.), Handbook of Econometrics (Amsterdam: North-Holland, Volume 4, 1994).
Rust, John, “Using Randomization to Break the Curse of Dimensionality,” Econometrica, 65(3), May 1997, 487-516.
Rust, John, and Rothwell, Geoffrey, “Optimal Response to a Shift in Regulatory Regime: The Case of the US Nuclear Power Industry,” Journal of Applied Econometrics, 10, 1995, S75-S118.
Tenorio, Rafael, and Cason, Timothy, “To Spin or Not To Spin? Natural and Laboratory Experiments from The Price is Right,” Economic Journal, 112, 2002, 170-195.
Thaler, Richard H., and Johnson, Eric J., “Gambling With the House Money and Trying to Break Even: The Effects of Prior Choice Outcomes on Risky Choice,” Management Science, 36(6), June 1990, 643-660.
Train, Kenneth E., Discrete Choice Methods with Simulation (New York: Cambridge University Press, 2003).
Tversky, Amos, and Kahneman, Daniel, “Advances in Prospect Theory: Cumulative Representations of Uncertainty,” Journal of Risk & Uncertainty, 5, 1992, 297-323; references to reprint in D. Kahneman and A. Tversky (eds.), Choices, Values, and Frames (New York: Cambridge University Press, 2000).
Vuong, Quang H., “Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses,” Econometrica, 57(2), March 1989, 307-333.
Williams, Rick L., “A Note on Robust Variance Estimation for Cluster-Correlated Data,” Biometrics, 56, June 2000, 645-646.
Wooldridge, Jeffrey, “Cluster-Sample Methods in Applied Econometrics,” American Economic Review (Papers & Proceedings), 93, May 2003, 133-138.
Yaari, Menahem E., “The Dual Theory of Choice under Risk,” Econometrica, 55(1), 1987, 95-115.
