
HETEROGENEITY IN RISK PREFERENCES:

EVIDENCE FROM A REAL-WORLD BETTING MARKET∗

Angie Andrikogiannopoulou
King's College London

Filippos Papakonstantinou
King's College London

Abstract

We develop a structural model of behavior that accounts for rich heterogeneity in individuals' risk preferences, and we estimate it using a panel dataset of individual activity in a wagering market. The number and variety of choices we observe enable us to distinguish the key features of prospect theory — value-function curvature, loss aversion, and probability weighting. Our Bayesian hierarchical mixture model enables us to estimate the proportion of distinct utility 'types' characterized by the different prospect theory features, and thus to evaluate their relative prevalence in the population. We find that utility curvature alone does not explain the choices we observe and that, while loss aversion is important, probability weighting is the most prevalent behavioral feature of risk attitudes: Two-thirds of individuals exhibit loss aversion, but all exhibit probability weighting.

Keywords: Risk Preferences, Prospect Theory, Loss Aversion, Probability Weighting, Discrete Choice, Mixture Model, Hierarchical Model, Bayesian Estimation

∗For helpful comments, we thank participants at the 2014 European Economic Association Annual Congress and the 2018 Center for Economic Analysis of Research Workshop, and seminar participants at Cambridge University, Einaudi Institute for Economics and Finance, Imperial College London Business School, McGill University, Oxford University, Princeton University, the Stockholm School of Economics, the University of Amsterdam, the University of Geneva, and the Wharton School. Send correspondence to Filippos Papakonstantinou, King's Business School, Bush House, 30 Aldwych, London WC2B 4BG, UK; telephone: +44 (0) 20 7848 3770. E-mail: [email protected].

Since Kahneman and Tversky (1979) proposed prospect theory, a plethora of experimental and field studies have shown that individual behavior deviates from the predictions of expected utility theory and exhibits patterns consistent with loss aversion and probability weighting. But even though prospect theory has become the most widely accepted behavioral theory of choice, the limited availability of rich data, particularly in the field, and modeling limitations have made it difficult to distinguish between the different prospect theory features and assess their relative importance and prevalence in the population. Hence, an important question remains unresolved: which of prospect theory's features are most useful in explaining behavior? In this paper, we aim to address this question in the field by combining (1) a panel dataset that tracks individuals' choices in a sports wagering market with (2) a hierarchical structural model that allows for an extensive representation of individual heterogeneity within and across utility types characterized by the different prospect theory features, and (3) a Bayesian algorithm designed to estimate this model.

The sports wagering market is an advantageous natural laboratory for risk preference estimation. First, it features choices whose outcomes (i) are determined exogenously by match outcomes, and (ii) are associated with probabilities that are predicted quite accurately by the odds quoted in the market. These features allow for a lottery representation of choices and hence facilitate preference estimation. Second, this market features a variety of lotteries consisting of a wide range of probabilities and prizes, including gains and losses as well as small and large stakes, which enables us to identify and comprehensively study all features of prospect theory. Third, in this setting, individuals typically make a large number of choices over time, which enables us to estimate preference heterogeneity across individuals with improved accuracy.

To estimate preference heterogeneity across individuals, extant studies have followed two main approaches. Most commonly, each individual's preferences are estimated in isolation; but this may lead to inaccurate estimates, especially when only a few choices per individual are observed. As a result, some recent studies have combined individual information with information from the population distribution, which is estimated under specific distributional assumptions (e.g., discrete or normal). This approach allows for information sharing across individuals, but in a restricted manner. For example, it is inappropriate to assume a normal population distribution if individuals cluster into distinct sub-populations. In this paper, we propose a hierarchical mixture model that generalizes previous methodologies. First, we model the population as a flexible mixture of four sub-populations: one type that exhibits neither probability weighting nor loss aversion, two types that each turn on one of these features, and an unrestricted type that exhibits both features.

This approach introduces the proportions of each type as model parameters, which are jointly estimated

with the entire distribution of preference parameters, and allows us to assess the relative importance of the

prospect-theory features. Second, we allow for preference heterogeneity within each sub-population. Thus,

we forgo the restrictive assumption that there is a representative agent for each type, which would not only

understate the heterogeneity but would also bias the preference estimates (and the type allocations). Overall,

our methodology enables us to share information across individuals in a flexible manner dictated by the data

and improves inference at both the individual and population level, as what we learn about each individual’s

preferences and type allocations feeds into the shape of the population distribution, and vice versa. To

estimate our model, we design and implement an appropriate Bayesian Markov chain Monte Carlo algorithm.1

Our model of behavior is based on the cumulative prospect theory (CPT) of Tversky and Kahneman

(1992), according to which individuals use a value function defined over gains and losses relative to a reference

point, have different sensitivity to losses and gains, and systematically distort event probabilities by weighting

the gain/loss cumulative distribution. Our average preference parameter estimates are within the range of

those reported in the experimental literature (see Wakker, 2010): the value function curvature is moderate,

loss aversion is mild, and individuals significantly overweight extreme positive and negative outcomes.

Furthermore, we find that utility curvature alone does not adequately explain individual choices, and that

the distinctive features of prospect theory — loss aversion and probability weighting — offer a significant

improvement in explaining observed behavior. But, importantly, our study reveals that these two features

are not equally prevalent. We find that individuals can essentially be partitioned into two groups: about

two-thirds exhibit both behavioral features, and one-third exhibit probability weighting but not loss aversion.

These findings indicate that prospect theory should not be viewed as a monolithic theory with a set of features

— loss aversion and probability weighting — that all individuals exhibit, but rather reality is more nuanced.

This more nuanced view could facilitate and guide the use of prospect theory in applied models. For example,

our finding that probability weighting plays a more important role than loss aversion could be useful in

finance, where loss aversion has sometimes led to counterfactual predictions for individual investor behavior

and for the aggregate stock market (see Barberis, 2013 for a review). Indeed, in an application we show

that our preference estimates make realistic predictions when applied to a standard portfolio choice problem,

as they yield optimal portfolios similar to those observed in surveys on household portfolio allocations.

We conduct a number of robustness checks, necessitated by the fact that we study a field setting hence we

1This estimation solves for an equilibrium density — the joint density of the model parameters — which is roughly a stochastic analogue of a fixed point. Our estimation uses a Gibbs sampler, which is the stochastic analogue of an algorithm that solves for a fixed point.

do not have the benefit of experimental control. Regarding individuals' beliefs, our baseline is the standard assumption in the literature that they are rational which, in our setting, can be approximated quite well by the probabilities implied by the market prices. In a sensitivity analysis, we show that our results are robust to alternative approximations of the rational beliefs, as well as to small but significant systematic deviations of the subjective from the rational beliefs. We also show that observed bets are unlikely to be driven by (real or perceived) superior information or affinity toward specific teams (e.g., home-area teams), since almost all individuals place wagers on a large variety of leagues and teams rather than any specific one, and virtually no individuals generate significantly positive returns from betting. Regarding the choice sets individuals face, our baseline assumption is that individuals consider all lotteries available in the sportsbook, except for those that have rarely been chosen by any individual in our sample. In a robustness check, we consider an alternative choice set containing only lotteries that respect a tight, but reasonable, budget constraint of a few hundred euros.

The majority of evidence on estimating risk preferences comes from lab experiments, surveys, and semi-experimental data from TV game shows.2 A small number of studies have estimated preferences in the

field using real decisions. Most of these have used aggregate prices to infer representative-agent preferences: for example, Golec and Tamarkin (1998), Jullien and Salanie (2000), and Snowberg and Wolfers (2010) use horse-racetrack betting prices; Chetty (2006) uses labor supply data; and Kliger and Levy (2009) and

Polkovnichenko and Zhao (2013) use financial market prices.3 Only a handful of studies have used individual-level data, hence are able to relax the representative-agent assumption and estimate heterogeneous preferences:

Cicchetti and Dubin (1994), Cohen and Einav (2007), and Barseghyan et al. (2013) study deductible choices in insurance markets, and Paravisini, Rappoport and Ravina (2017) study portfolio choices in a peer-to-peer lending market. Due to the nature of the data they analyze, these studies estimate only a subset of the CPT features. An exception is Andrikogiannopoulou and Papakonstantinou (2019) who, using the same data as we do, estimate the full CPT specification, but focus on history dependence rather than heterogeneity. Our paper is different in that it accommodates rich heterogeneity both within and across utility types which allows us not only to estimate all CPT features but also to assess for the first time their relative importance and prevalence.

Our findings are more directly related to the small number of field studies that emphasize the importance

2For lab experiments, see, e.g., Tversky and Kahneman (1992), Hey and Orme (1994), Holt and Laury (2002), Choi et al. (2007), and von Gaudecker, van Soest and Wengstrom (2011). For surveys, see, e.g., Barsky et al. (1997) and Donkers, Melenberg and van Soest (2001). For TV game show studies, see, e.g., Gertner (1993), Metrick (1995), Beetsma and Schotman (2001), Post et al. (2008), and Bombardini and Trebbi (2012). 3An exception is the recent study by Chiappori et al. (2019), who use aggregate betting data to back out heterogeneity in a generic one-dimensional utility.

of probability weighting. Snowberg and Wolfers (2010) compare a model with only probability weighting to a model with only risk aversion and find that the former better explains aggregate prices in the horse-racetrack betting market. Barseghyan et al. (2013) find that, to explain individual insurance choices, it is necessary to use a model with a form of probability distortions that could be interpreted as probability weighting and/or loss aversion. These studies do not juxtapose probability weighting with loss aversion because, with the data available to them, the two cannot be distinguished. Thus, an important contribution of our study is that we exploit two distinct characteristics of our data — the large number of observations per individual and the great variety of possible choices — to separately identify loss aversion and probability weighting, and we develop a rich model of heterogeneity to assess their relative prevalence in the population.

Methodologically, most of the literature has followed a parametric estimation approach, as we do. Some studies have taken a nonparametric approach, but usually at the cost of ignoring individual heterogeneity which is one of the main focuses of our analysis (e.g., Abdellaoui, 2000; Bleichrodt and Pinto, 2000).4

Furthermore, our model builds on and extends earlier work in the experimental literature that characterizes risk preference heterogeneity using mixture or hierarchical models, and generalizes it to incorporate both features simultaneously. Using mixtures, Harrison and Rutstrom (2009) propose that each subject probabilistically behaves according to expected utility theory for some choices and according to prospect theory for others; instead, we classify individuals, rather than choices, into utility types. Considering a mixture of utility types, Conte, Hey and Moffatt (2011) estimate population-level parameters assuming all individuals belong to both types, and then allocate individuals to types. Instead, we estimate parameters at the population and the individual level in a consistent manner, as whatever we learn about the population feeds back into the estimation of individuals' preferences and type allocations, and vice versa. Finally, Bruhin, Fehr-Duda and Epper (2010) consider a mixture of CPT types, each with different parameters but with no variation across individuals within types; we accommodate a broader variation in preferences by allowing for both across- and within-type heterogeneity. On the other hand, Nilsson, Rieskamp and Wagenmakers (2011),

Murphy and ten Brincke (2017), and Baillon, Bleichrodt and Spinu (2019) use hierarchical models to estimate prospect theory preferences by assuming that parameters are drawn from a normal distribution.

We generalize this approach by allowing parameters to be drawn from a mixture distribution, with each component corresponding to a different utility type. Thus, we are able to estimate the proportions of different

4See Stott (2006) and Barseghyan et al. (2018) for a discussion of the trade-offs between parametric and nonparametric preference estimation. Succinctly, while a parametric approach risks being misspecified, a nonparametric one is less efficient and more demanding of the data in terms of identification.

utility types simultaneously with the entire distribution of risk preferences, and to share information across

individuals in a manner that is not restricted by the normal parametric assumption.

Our paper, like other field studies on risk preference estimation, should be read with the following

caveat in mind. It focuses on a subset of the population — people participating in a sports wagering market

— which may not be representative of the whole population. But, even though the group of people who

bet on sports may be self-selected, it is in fact quite large: Betting is a very common and economically

important activity, with “half of America’s population and over two-thirds of Britain’s [placing a] bet on

something” per year (The Economist, 8 July 2010) and with $1 trillion wagered on sports, globally, per year

(H2 Capital report of 2013). Furthermore, the fact that the risk preferences we estimate (i) are

very heterogeneous, (ii) are within the range of extant estimates from lab experiments, and (iii) yield sensible

predictions when applied to different choice settings as demonstrated in our portfolio choice application,

further indicates that a potential sample selection bias is likely to be limited. Regardless, keeping in mind that

extrapolating preference parameters from any group or context to another necessitates additional assumptions,

our estimates from this setting are complementary to those from other settings (e.g., insurance choice), and

may even be particularly relevant for sub-populations or contexts that share similar characteristics with

those in wagering markets (e.g., traditional financial markets).

The rest of the paper is organized as follows. In Section 1, we describe the data. In Section 2 we present

our model of individual behavior, in Section 3 we describe our estimation methodology, and in Section 4

we discuss our results. In Section 5 we use our estimates in a portfolio choice application, and in Section

6 we present robustness checks. In Section 7, we conclude. An Appendix contains additional details on the

estimation procedure, and an Internet Appendix contains additional tables and figures supporting our analysis.

1 Data

Here, we explain how the online sports-betting market works, and we describe our data.

1.1 Background information and data description

A sportsbook is a quote-driven dealer market that offers betting opportunities on a variety of sports (e.g.,

soccer, tennis, and auto races), a variety of fixtures within each (e.g., soccer matches from the English

Premier League and the European Champions League), and a variety of events associated with each fixture

(e.g., for a given soccer match, one can wager on the winning team, the exact score, the total number of

corners, etc.). The bookmaker acts as a dealer and, at any point in time, he sets the odds for each of the

possible outcomes of an event, i.e., the inverse of the price for receiving a unit monetary payoff in case the

outcome is realized. Customers can place wagers at these odds and the bookmaker takes the opposite side

of each wager, with the intention of making a profit through the bid-ask spread, often called the commission.

For example, an individual betting on the winner of a soccer match can select one of three possible outcomes

— home win, draw, or away win — each of which has a price determined by the bookmaker. If the outcomes’

prices are, respectively, 0.55, 0.30, and 0.20 (quoted as 1/0.55 ≈ 1.82, 1/0.30 ≈ 3.33, and 1/0.20 = 5.00 in decimal odds5) then an individual backing the home team to win will make €82 for each €100 staked if

the home team wins; similarly, he will make €233 or €400 for each €100 staked, if he backs the draw

or away outcome and his bet wins. The sum of the prices on all outcomes of the event is the price of a

portfolio which pays €1 for sure at the end of the event; the amount by which this sum exceeds 1 (in this

example, 0.55 + 0.30 + 0.20 − 1 = 0.05) represents the bookmaker's commission for each euro wagered. The quoted prices are associated with the outcomes' implied probabilities, calculated in this example as

0.55/(0.55 + 0.30 + 0.20) ≈ 0.52, 0.30/(0.55 + 0.30 + 0.20) ≈ 0.29, and 0.20/(0.55 + 0.30 + 0.20) ≈ 0.19, respectively. According to the traditional sportsbook pricing model, the bookmaker aims to keep a 'balanced' book;

for example, if the home team is bet heavily, the bookmaker could lower its odds to attract money on the

other team. This way, the amount of money paid out to winners, and therefore his commission, is the same

regardless of which team wins. But, even though the bookmaker can change the odds over time, each bet’s

payoff is determined by the odds at which it was placed.6 This means that the book is rarely completely

balanced, and that balancing the book can be a risky strategy that could be exploited by arbitrageurs if

prices deviate substantially from the efficient ones. As a result, it has been suggested that

bookmakers may find it optimal to set the prices close to the efficient ones, even if this implies that the book becomes

slightly unbalanced sometimes (see Paul and Weinbach, 2008, 2009 for empirical evidence supporting this

view). This approach reduces the costs and risks of balancing the book, while the bookmaker still earns

the commission in expectation.7 For example, if the home team is expected to win with probability 0.52

and the bookmaker wants to make a commission of 5%, he can set the home team's price at 0.52/(1 − 0.05) ≈ 0.55

5Odds are quoted to two decimal places, with a tick size that depends on the odds, becoming smaller as the odds approach 1, i.e., as an outcome becomes a heavy favorite. 6This applies to fixed-odds betting markets, as the one we study. In parimutuel markets, commonly used in horse-racetrack betting, bettors place wagers in a common pool, and the payout of winning bets is not known in advance but rather is determined at the end of the event by splitting the pool between the winners. 7Levitt (2004) suggests that bookmakers may allow unbalanced betting to exploit bettor biases, but the bookmaker under study has indicated that he does not employ this strategy.

(quoted as 1/0.55 ≈ 1.82 in decimal odds). So, in expectation, individuals backing this outcome will make 0.52 · (+€82) + 0.48 · (−€100) ≈ −€5 and the bookmaker will make €5 for every €100 wagered. Our market efficiency tests below and the conversations we had with our data provider suggest that the efficient pricing model better describes the behavior of the bookmaker under study. Regardless, in Section 6.2 we show that our preference estimates are quite robust to both price-setting behaviors.
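For readers who want to reproduce the arithmetic in this example, the short Python sketch below (ours, purely illustrative; the prices are the ones used in the text) recovers decimal odds, implied probabilities, the commission, and a bettor's expected profit from a vector of quoted prices.

```python
import numpy as np

# Quoted prices for home win, draw, away win (the example used in the text).
prices = np.array([0.55, 0.30, 0.20])

decimal_odds = 1.0 / prices               # e.g., 1/0.55 ~= 1.82
commission = prices.sum() - 1.0           # overround: 0.05 per euro wagered
implied_probs = prices / prices.sum()     # ~ [0.52, 0.29, 0.19]

# Expected net profit of a 100-euro bet on the home team, if the implied
# probability equals the true win probability.
stake = 100.0
net_win = stake * (decimal_odds[0] - 1.0)  # ~ 82 euros
expected_profit = implied_probs[0] * net_win - (1 - implied_probs[0]) * stake
print(decimal_odds.round(2), round(commission, 2), implied_probs.round(2), round(expected_profit, 1))
```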

An individual can place a single wager as a single bet, or he can combine multiple wagers into various bet types to produce a wide array of payoff profiles and risk levels. Specifically, N wagers can be combined in a type-N accumulator bet, which yields a gross return equal to the product of the gross returns of all N wagers if all of them win, and zero otherwise. Or they can be combined into permutation or full-cover bets, which themselves combine various accumulators. Specifically, one can form a permutation bet that includes

the $\binom{N}{k} = \frac{N!}{k!(N-k)!}$ type-$k$ accumulators for some $k \le N$, or a full-cover bet that includes the $\sum_{1 \le k \le N} \binom{N}{k}$ accumulators of type 1 through $N$ or the $\sum_{2 \le k \le N} \binom{N}{k}$ accumulators of type 2 through $N$. The individual then chooses how much money he wants to stake on his bet, and finally he submits one or more bets in the form of a betting slip, an example of which is shown in Figure 1.

Event                                                             Sport       Selection  Bet Type  Odds  Total Stake
USA vs Germany (FIBA Worlds), Match Winner                        Basketball  Germany    Single    1.33  50
Arsenal vs Chelsea (EPL), Total Goals Over/Under 2.5              Soccer      Over 2.5   Doubles   2.25
NY Yankees vs Toronto Blue Jays (MLB), Total Match Runs Odd/Even  Baseball    Odd        Doubles   1.67
Taylor vs Baxter (World GP, 1st round), Match Winner              Darts       Baxter     Doubles   1.10  30

Figure 1: Sample betting slip. It contains two bet types: a ‘single’ bet on one event and a ‘doubles’ bet on three events, i.e., a permutation bet on the three possible type-2 accumulators.
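To make the mechanics of accumulator and permutation bets concrete, here is a small illustrative Python sketch (our own; the selections and stakes are hypothetical, echoing Figure 1) that builds the two-prize lottery of a type-N accumulator and enumerates the accumulators inside a permutation bet.

```python
from itertools import combinations
from math import prod

# Each selection: (decimal odds, implied win probability) for one wager.
selections = [(1.33, 0.75), (2.25, 0.44), (1.67, 0.60)]

def accumulator(sels, stake):
    """Two-prize lottery of a type-N accumulator: win all selections, or lose the stake."""
    gross = prod(odds for odds, _ in sels)
    p_win = prod(p for _, p in sels)
    return [(stake * (gross - 1.0), p_win), (-stake, 1.0 - p_win)]

def permutation_bet(sels, k, total_stake):
    """A permutation bet stakes equal amounts on every type-k accumulator."""
    combos = list(combinations(sels, k))
    stake_each = total_stake / len(combos)
    return [accumulator(c, stake_each) for c in combos]

# A 'doubles' bet on three events = three type-2 accumulators (as in Figure 1).
for lottery in permutation_bet(selections, k=2, total_stake=30.0):
    print(lottery)
```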

The data we use in this study is from a European online sports-betting agency. It contains the complete betting histories of 336 randomly selected customers of the company over a period of 5 years, from October

2005 to November 2010.8 We observe the following information for each bet placed: (i) date, (ii) event bet and outcome chosen (e.g., Nadal will defeat Federer in the US Open final), (iii) amount wagered, (iv) bet type (e.g., single or double), (v) quoted prices for all the outcomes of the chosen event, and (vi) bet result. We also know the gender, age, country of residence, and zip code of the individuals, and we create individual-level proxies for income and education by matching these with the relevant census data.

8The original data contained 400 individuals. We dropped 36 who never placed a bet and 28 who placed fewer than 5 bets.

Table 1: Summary Statistics for Individual Characteristics and Betting Activity

Panel A: Individual Characteristics

             Mean    Median  Std. Dev.  Min     Max
Female       0.07    0       0.25       0       1
Age          32.85   31      9.63       18      67
Income       23.89   24.15   5.28       10.38   52.54
Education    1.99    2.00    0.27       1.13    2.80

Panel B: Bet Characteristics

                                     Mean     Median  Std. Dev.  Min   Max
Event types per individual           23.11    15      25.49      1     235
Leagues per individual               34.51    32      20.64      1     109
Teams per individual                 182.61   122     174.81     4     976
Bet days per individual              35.02    21      42.03      5     380
Bet days per year, per individual    64.76    42      63.93      1     327
Events per bet day                   5.44     4       5.16       1     66

In Panel A, Female is a dummy indicating gender. Age is in years. Income is in thousands of euros. Education values 1, 2, and 3 correspond to middle-, high-, and post-high-school education. In Panel B, Event types / Leagues / Teams per individual is the number of different event types / leagues / teams on which each individual has bet. A bet day for an individual is a day on which he has placed a bet. Events per bet day is the number of events on which an individual bets in a bet day.

In Table 1, we provide summary statistics for the characteristics of the individuals we observe and their selected bets. The mean age is 33 years, and most individuals (93%) are male. Each individual, on average, places wagers on 35 days, at a frequency of about once a week, and spreads his bets across 35 different leagues, 183 teams, and 23 event types (final outcome, exact score, etc.).

In Panel a (Panel b) of Figure 2 we plot a histogram of the percentage of bets by each individual on the league (team) that he has bet most frequently. We see that most individuals place a small proportion — on aver- age, less than 20% (3%) — of their bets on any specific league (team). This suggests it is unlikely that choices are driven by a systematic preference and/or by information (real or perceived) about any specific league/team.

Finally, in Figure 3 we examine whether the prices quoted by our bookmaker are efficient. We collect from the Web the closing prices and the final outcomes of virtually all the soccer matches available in our sportsbook during the period covered by our data; then we divide the match outcomes into 100 percentile groups based on their prices, and for each group we plot the average probability implied by the prices against the average actual win frequency.9 Even though the implied probabilities of favorite outcomes are

9In this analysis we use only soccer matches, because data on their outcomes are more readily available than for other sports. Since soccer betting is the most popular market segment in the sportsbook we study, this analysis should be quite representative. Furthermore, repeating the analysis including only matches that were selected by the individuals in our sample does not change the results.

(a) Bets on the most-frequently-bet league. (b) Bets backing the most-frequently-bet team.

Figure 2: Histograms, across individuals, of the percentage of betting activity involving specific leagues (Panel a) and specific teams (Panel b).

slightly below their win frequencies, indicating a small favorite-longshot inefficiency, we generally observe that deviations of the implied from the objective win probabilities in this market are small. Indeed, the favorite-longshot inefficiency in our data is much weaker than that found in studies of parimutuel horse-racetrack betting (e.g., Ali, 1977; Snowberg and Wolfers, 2010).10 Consistent with this, we find that no individual in our data has bet-picking skill — positive or negative — as no one has realized betting returns that are statistically different from minus the commission (see Section 6.3).


Figure 3: Comparison of the win probabilities implied by the quoted prices with the historical win frequencies of outcomes with similar quoted prices, for the 100 percentile price groups.
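The efficiency check summarized in Figure 3 amounts to a simple calibration exercise. The sketch below (a schematic re-implementation under our own assumptions, not the authors' code) bins outcomes by implied probability and compares each bin's average implied probability with its realized win frequency.

```python
import numpy as np

def calibration_table(implied_probs, won, n_bins=100):
    """Group outcomes into probability percentile bins and compare implied
    probabilities with realized win frequencies within each bin."""
    implied_probs = np.asarray(implied_probs, dtype=float)
    won = np.asarray(won, dtype=float)
    order = np.argsort(implied_probs)
    bins = np.array_split(order, n_bins)
    return [(implied_probs[b].mean(), won[b].mean()) for b in bins if len(b) > 0]

# Toy example with simulated, perfectly priced outcomes.
rng = np.random.default_rng(0)
p = rng.uniform(0.05, 0.95, size=10_000)
outcome = rng.random(10_000) < p
for implied, freq in calibration_table(p, outcome, n_bins=10):
    print(f"implied {implied:.2f}  realized {freq:.2f}")
```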

10While parimutuel-betting prices have consistently been shown to be inefficient, for fixed-odds sports-betting markets the evidence is mixed: some studies find inefficiencies that could be exploited to yield positive returns (e.g., Cain, Law and Peel, 2000; Kuypers, 2000), while others (e.g., Pope and Peel, 1989; Woodland and Woodland, 1994) find that the odds are quite efficient, as we do. Wolfers and Zitzewitz (2004) also find evidence of efficiency in other prediction markets.

1.2 Lottery representation

To estimate preferences using field data, it is necessary to map the complexity of the real world to a man- ageable framework. As Barseghyan et al. (2018) point out in their review, this is challenging because, contrary to experiments, researchers in the field have no control over individuals’ subjective beliefs about the probabilities of possible outcomes and over the choice sets individuals face. These challenges are typically addressed by imposing a number of reasonable assumptions, which are partly motivated by psychological evidence on behavior (e.g., mental accounting) and partly by computational convenience. Here, we discuss how we represent individuals’ choices in the sports-betting market as choices from a set of lotteries.

The first step of our analysis is to represent the observed bet choices as lotteries. First, we create a lottery representation of the simplest bet an individual can place, which is to select an outcome in one event.

Such a bet has two potential prizes: if the selected outcome occurs, the prize is the stake times the net return associated with the selection, else the stake is lost. But, as in any field setting, individuals’ subjective beliefs with respect to the probabilities of these prizes are unobservable, giving rise to an identification problem which is commonly resolved by assuming beliefs are rational;11 see Manski (2004) for a general discussion of this issue, and Barseghyan et al. (2018) for a discussion specifically in the context of preference estimation.

We employ a similar assumption here. In our main analysis, we approximate the true probabilities of match outcomes by the probabilities implied by the quoted odds, and in a robustness check we repeat our analysis using the win frequencies of outcomes with similar prices, which — as discussed above — are close to the implied probabilities.12 Importantly, two features of our data indicate that subjective probabilities are likely to be close to the implied. First, several online tools (which can be accessed through the betting platform) help individuals calculate the implied probabilities from the quoted prices, so they are observable by the individuals themselves when they place their bets. As a result, it is likely that the implied probabilities serve as a focal point that anchors individuals’ beliefs as well as discourages significant deviations from them. Second, as discussed above, individuals place wagers on a large variety of leagues and teams. So, given how infrequently individuals place bets on specific leagues or teams, it is unlikely that they have superior information about

11In the preference estimation literature, an exception is Gandhi and Serrano-Padial (2014), who assume risk neutrality and estimate heterogeneity around the rational beliefs using aggregate betting prices. 12Similar assumptions have been employed by all field studies that estimate preferences. For example, studies using horse-racetrack betting data also approximate each outcome’s true probability with the win frequency of outcomes with similar prices; studies using insurance choices assume that each individual holds rational beliefs regarding his probability of incurring a claim during the policy period, and estimate these beliefs by regressing claim rates on demographics; and game show studies assume contestants’ beliefs are consistent with a rule that explains the majority of the variance of the presenter’s strategic behavior.

any of them. Nevertheless, in a sensitivity analysis in Section 6.2, we show that our results are robust to

the presence of small but significant systematic deviations of the subjective beliefs from the rational beliefs.

Intuitively, our results are robust because, even if an individual believes that an outcome is mispriced (e.g., he

believes that Nadal’s edge over Federer is undervalued), he still needs to choose a specific event (e.g., to back

Nadal to win, or to win by a large margin) as well as whether and how to combine this wager with others

into a combination bet. These choices correspond to lotteries with very different risk profiles, hence they

should be informative about risk preferences even if subjective probabilities deviate from the implied ones.

Subsequently, we create a lottery representation of combination bets. As already mentioned, the building

block of more complex bets is an accumulator bet, which selects an outcome for each event among a set of

events. Such a bet also has two potential prizes: if all selected outcomes occur, the prize is the stake times the

product of the gross returns associated with each selection minus one, else the stake is lost. The probabilities

of these prizes can be calculated from the quoted prices: the probability of the positive prize is the product

of the implied probabilities associated with each selection, and the probability of the negative prize is one

minus that. Since all other bets an individual can place in the sportsbook are combinations of accumulators,

we can construct their lottery representations by combining the lotteries of the constituent accumulator bets.

Once we have a lottery representation for each bet, we create a lottery representation for each play session

— defined as a single day — for each individual;13 we represent this day lottery by constructing all possible

combinations of payoffs from all the bets placed on a given day. We drop 1,043 day lotteries which contain

bets on related events hence cannot easily be constructed, and our final sample contains 11,490 lotteries.14
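As an illustration of the construction just described, the following sketch (ours; the bets are hypothetical and assumed independent) combines the two-prize lotteries of the bets placed on a day into a single day lottery over all payoff combinations.

```python
from itertools import product

def combine_bets(bet_lotteries):
    """Day lottery: every combination of prizes across the day's (independent) bets.
    Each bet lottery is a list of (prize, probability) pairs."""
    day = {}
    for outcome in product(*bet_lotteries):
        prize = round(sum(x for x, _ in outcome), 2)
        prob = 1.0
        for _, p in outcome:
            prob *= p
        day[prize] = day.get(prize, 0.0) + prob
    return sorted(day.items())

# Two independent single bets placed on the same day (hypothetical numbers).
bet1 = [(8.2, 0.52), (-10.0, 0.48)]    # 10 euros at decimal odds 1.82
bet2 = [(23.3, 0.29), (-10.0, 0.71)]   # 10 euros at decimal odds 3.33
print(combine_bets([bet1, bet2]))      # four payoff combinations
```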

In Table 2, we report summary statistics for the characteristics of these chosen day lotteries, pooled

across all individuals. About half of the lotteries contain 2 prizes, but 25% of lotteries contain at least

6 prizes. The bet amount ranges from 1 cent to thousands of euros (€5,500), while the maximum prize

ranges from 1 cent to hundreds of thousands of euros. Wagers have negative mean return, because of the

commission incorporated in the prices, and most lotteries have positive skewness. Overall, the summary

statistics demonstrate that there is substantial variation in the lotteries we observe.

13Alternatively, a play session could be defined, at one extreme, as an individual bet, and at the other extreme as the portfolio of all outstanding bets. The former is too narrow since individuals often place multiple bets at the same time, while the latter is in fact very close to how we have defined the play session as a single day, as most bets in our sample have been placed on the same day as the event. 14When multiple bets are placed on related events (e.g., the winner of a match and its exact score), we cannot construct the lottery without further information about the outcomes' joint distributions, so we drop days that involve related bets. These days are relatively few, and it is unlikely that choices on these days are systematically different than on others, so dropping them should not affect our results.

Table 2: Day Lottery Characteristics

                      Percentiles
                      1st      5th      25th     50th     75th     95th      99th
Number of Prizes      2        2        2        2        6        61        354
Mean                  -43.39   -17.54   -4.96    -2.05    -0.77    -0.13     -0.03
Standard Deviation    0.31     1.43     8.18     20.17    45.03    153.91    411.11
Skewness              -1.18    -0.32    0.72     2.49     5.94     33.23     180.94
Bet Amount            0.10     0.55     3.90     10.00    25.00    100.00    311.68
Maximum Prize         0.33     2.02     21.19    79.08    276.08   3·10³     2·10⁴

Summary statistics for chosen day lotteries, pooled across individuals. Monetary amounts are in euros.

The second step of our analysis is to specify the set of lotteries from which individuals make their choices. Similar to other field studies that use individual (insurance or betting) choices (e.g., Barseghyan et al., 2013; Andrikogiannopoulou and Papakonstantinou, 2019), we assume that the choice set contains all available choices (e.g., all lotteries in the sportsbook), except those that have rarely been chosen by any individual in the sample (e.g., lotteries involving many complex bets). So, to construct the choice set, we generate a large number of lotteries by randomly drawing their risk characteristics from their empirical distributions in the data. Specifically, the procedure we use to draw lotteries resembles the one individuals use to place wagers: we select the number of bets per session, the type of each bet, the amount staked in each bet, and the prices associated with each wager.15 Then, we augment this choice set with the chosen lottery and the safe lottery which corresponds to the option of not placing a bet. In a sensitivity analysis in Section

6.4, we vary the maximum stake we draw to account for the fact that individuals may face unobserved budget constraints. In Section A of the Appendix, we present more details on the choice set generation procedure.
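A minimal sketch of this kind of random choice-set generation might look as follows (our own toy version, not the procedure in Appendix A; the distributions, bounds, and the 5% commission are placeholders).

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_random_lottery(max_stake=500.0):
    """Draw one candidate day lottery by mimicking how bets are placed:
    number of wagers, stake, and implied probability of each selection."""
    n_wagers = rng.integers(1, 4)                  # placeholder distribution
    stake = float(rng.uniform(1.0, max_stake))
    probs = rng.uniform(0.05, 0.9, size=n_wagers)  # implied win probabilities
    gross = float(np.prod(0.95 / probs))           # decimal odds net of an assumed 5% margin
    p_win = float(np.prod(probs))
    return [(stake * (gross - 1.0), p_win), (-stake, 1.0 - p_win)]

choice_set = [draw_random_lottery() for _ in range(100)]
choice_set.append([(0.0, 1.0)])                    # the safe lottery (no bet)
```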

As mentioned above, each individual places wagers on a great variety of leagues, teams, and events, indicating that his choices are not driven by a systematic preference for specific teams or events. As a result, even though different matches are offered in the sportsbook at different times, the choice set should not change over time as the same lotteries can be constructed at any point in time from the available matches. The reason is that at any point in time there exists a multitude of simple bets spanning the range of win probabilities, and these can be combined into various bet types to produce the same set of payoff risk profiles. To demonstrate this, we show in Figure 4 that, for two arbitrarily chosen days, it is possible to use simple bets from just a single sport to form binary lotteries whose win probabilities cover the entire range of possible values.16

15This choice set generation is a reduced-form analogue of a structural model in which the more complex and costly a lottery is, the less likely it is to be in the choice set (for similar reduced-form approaches, see Ben-Akiva et al., 1984; Brownstone, Bunch and Train, 2000). The two approaches should yield similar choice sets, so we use the reduced-form approach for computational convenience. 16Bookmakers do not offer wagers with win probability very close to 1, because they need to make a profit.

(a) Single wagers on winner. (b) Double wagers on two winners. (c) Single wagers on double chance.

Figure 4: Illustration of the wide range of win probabilities available on the sportsbook on two arbitrary days. We plot — in red x-marks for one day and in blue dots for the other day — the implied win probabilities for simple binary bets on soccer matches in the major leagues: for single bets on the match winner (Panel a), for bets that combine two single bets on the winner of two matches (Panel b), and for ‘double chance’ bets that back the home or draw, home or away, or draw or away outcome (Panel c).

2 Model

Here, we present the model of individual behavior and the model of heterogeneity.

2.1 Individual behavior

Each day, an individual has the opportunity to place a bet with some individual-specific probability which could depend on, e.g., whether he is generally busy. If he has this opportunity then, depending on his risk preferences, he either rejects it by choosing the safe lottery or he accepts it and chooses one of the risky lotteries available in the sportsbook.

We adopt the random-utility model (Marschak, 1960; McFadden, 1974) and decompose the utility that an individual with preference parameters ρ obtains from choosing lottery L into a deterministic component

U and a stochastic component ε, as U(ρ, L) + ε.

The stochastic component ε accounts for all unmodeled factors that could affect each lottery’s utility, and it is necessary to reconcile observed behavior with a(ny) deterministic theory of choice. For example, it could capture time variation in choice behavior, preferences over specific match characteristics like the specific teams involved, and/or some form of utility from gambling. As is common in discrete-choice analysis, the stochastic components are assumed to be i.i.d. across choices and drawn from the double exponential distribution that has location 0 and inverse scale τ > 0.17 17The assumption that ε follows the double exponential implies that differences across alternatives in ε follow the logistic distribution with mean 0, which is a symmetric bell-shaped distribution with heavy tails, hence yields robust analysis. Since differences in ε — not levels — affect choice, the location of the double exponential is irrelevant and is normalized to 0. Furthermore, since utility has no units, it is necessary to normalize the scale of utility: we normalize to 0 and 1, respectively, the utility of the degenerate lotteries that pay 0 and 1 with certainty, and interpret our estimates for τ accordingly (see Section 4.3).

For the deterministic utility function U over lotteries, we employ the cumulative prospect theory (CPT) of

Tversky and Kahneman (1992), which has become generally accepted as a better descriptive theory of choice

than expected utility theory (EUT), and is often used to explain individuals’ actual behavior. According to

CPT, individuals use a value function defined over gains and losses relative to a reference point, have different

sensitivity to losses and gains, and systematically distort event probabilities by weighting the gain/loss cumu-

lative distribution. Based on empirical data, Tversky and Kahneman (1992) suggest that the value function is

concave/convex over gains/losses and is steeper for losses than for gains, corresponding to loss aversion, and

that the probability weighting function is inverse-S shaped. Our model accommodates the key features of CPT

— value function curvature, differential sensitivity to gains and losses, and probability weighting — but we use

flexible functional forms that allow for a variety of shapes for the value and probability-weighting functions.

As is common in the literature, we also assume individuals frame choices narrowly, evaluating lotteries in iso-

lation, separately from other unobserved (gambling, investing, etc.) risk-taking activities they may engage in.

To be specific, according to the Tversky and Kahneman (1992) CPT specification, an individual with

preference parameters $\rho := (\alpha, \lambda, \gamma, \kappa)'$ evaluates lottery $L$ with monetary prizes $x_{-m} \le \ldots \le x_0 = 0 \le \ldots \le x_M$ and corresponding probabilities $\{p_i\}$, by comparing final wealth to a reference point, commonly assumed to be current wealth, and assigning to the lottery the value $U(\rho, L) := \sum_i w_i\, u(x_i)$, where $u$ is the value function and $\{w_i\}$ are the decision weights.

The value function takes the form
$$
u(x; \alpha, \lambda) := \begin{cases} x^{\alpha} & \text{for } x \ge 0 \\ -\lambda (-x)^{\alpha} & \text{for } x < 0 \end{cases}. \tag{1}
$$
Parameter α > 0 measures value function curvature, where 0 < α < 1 implies the value function is concave

in gains and convex in losses while α >1 implies it is convex in gains and concave in losses. The relative

sensitivity to gains and losses is measured by parameter λ>0, where λ>1 implies greater sensitivity to

losses (loss aversion) and 0<λ<1 implies greater affinity to gains (gain seekingness). Though many early

studies have found that, on average, α ≤ 1 and λ ≥ 1, we do not impose these commonly used restrictions, because recent experiments have found that some individuals exhibit risk and/or gain seekingness (e.g.,

Abdellaoui, Bleichrodt and Paraschiv, 2007; von Gaudecker, van Soest and Wengstrom, 2011).18

18In our estimation, we find that few individuals have α > 1 and/or λ < 1. Imposing the restrictions α ≤ 1 and λ ≥ 1 yields very similar results to the ones from the unrestricted estimation.

The decision weight $w_i$ takes the general CPT form
$$
w_i = \begin{cases} w(p_i + \ldots + p_M) - w(p_{i+1} + \ldots + p_M) & \text{for } 0 \le i \le M \\ w(p_{-m} + \ldots + p_i) - w(p_{-m} + \ldots + p_{i-1}) & \text{for } -m \le i < 0 \end{cases}, \tag{2}
$$
where the probability weighting function $w$ has the flexible parametric form proposed by Goldstein and

Einhorn (1987) and Lattimore, Baker and Witte (1992):
$$
w(p; \gamma, \kappa) := \frac{\kappa p^{\gamma}}{\kappa p^{\gamma} + (1 - p)^{\gamma}}. \tag{3}
$$
The curvature of the probability weighting function is measured by γ > 0, where 0 < γ < 1 corresponds to an inverse-S shape hence an overweighting of extreme outcomes, and γ > 1 corresponds to an S shape hence an underweighting of extreme outcomes. The elevation of the probability weighting function is measured by κ > 0; for example, with γ = 1, κ > 1 corresponds to a globally concave function hence an overweighting of outcomes in the right (left) tail of the distribution of positive (negative) outcomes, and

0 < κ < 1 corresponds to a globally convex function.19
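For concreteness, Equations (1)-(3) translate directly into code. The sketch below (ours; the two-prize lottery at the end is hypothetical) sorts prizes, computes the cumulative decision weights of Equation (2) separately over gains and losses, and evaluates the CPT value of a lottery.

```python
import numpy as np

def value(x, alpha, lam):
    """Value function of Equation (1)."""
    return x**alpha if x >= 0 else -lam * (-x)**alpha

def weight(p, gamma, kappa):
    """Probability weighting function of Equation (3); works elementwise on arrays."""
    return kappa * p**gamma / (kappa * p**gamma + (1.0 - p)**gamma)

def cpt_utility(prizes, probs, alpha, lam, gamma, kappa):
    """CPT value of a lottery, U = sum_i w_i u(x_i), with decision weights as in Equation (2)."""
    order = np.argsort(prizes)                      # sort prizes: x_{-m} <= ... <= x_M
    x, p = np.asarray(prizes, float)[order], np.asarray(probs, float)[order]
    w = np.zeros_like(p)
    gains, losses = x >= 0, x < 0
    if gains.any():
        # Weight the probability of getting prize x_i or anything better.
        tail = np.cumsum(p[gains][::-1])[::-1]
        w[gains] = weight(tail, gamma, kappa) - np.append(weight(tail[1:], gamma, kappa), 0.0)
    if losses.any():
        # Weight the probability of getting prize x_i or anything worse.
        cum = np.cumsum(p[losses])
        w[losses] = weight(cum, gamma, kappa) - np.append(0.0, weight(cum[:-1], gamma, kappa))
    return float(sum(wi * value(xi, alpha, lam) for xi, wi in zip(x, w)))

# Hypothetical two-prize bet: win 82 with probability 0.52, lose 100 otherwise.
print(cpt_utility([82.0, -100.0], [0.52, 0.48], alpha=0.9, lam=1.5, gamma=0.7, kappa=1.0))
```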

To reduce model complexity and facilitate identification, we assume the following. First, α and γ , hence the value function and the probability weighting function, are the same for gains and losses. Indeed, empirical studies usually estimate similar values for these parameters over gains and losses (e.g., Tversky and Kahneman, 1992; Abdellaoui, 2000), so it is common in the literature to make this assumption (e.g.,

Post et al., 2008; Baillon, Bleichrodt and Spinu, 2019). Furthermore, to assume different value function curvatures for gains and losses is problematic both theoretically, as it implies loss aversion is not well defined (Kobberling and Wakker, 2005; Wakker, 2010), and empirically, as it hinders the identification of loss aversion (Nilsson, Rieskamp and Wagenmakers, 2011). Second, the reference point individuals use to separate gains from losses equals current wealth. While Kahneman and Tversky (1979) suggest that the status quo is a natural choice for most choice situations, other possible reference points such as an expectation have been proposed in the literature. But, as the empirical evidence on the characterization of the reference point is mixed (see O’Donoghue and Sprenger, 2018 for a review), the status quo remains the most common assumption in the literature. Indeed, in a recent experiment, Baillon, Bleichrodt and Spinu

(2019) find that the status quo is one of the most commonly used reference points.20 19For an illustration of the effects of γ and κ on w, see Figure IA.3 in the Internet Appendix. 20Furthermore, our assumption of a static reference point instead of a backward-looking one as in Andrikogiannopoulou and Papakonstantinou (2019) has little effect on the estimated population means of the preference parameters, so it does not seem to be particularly restrictive.

2.2 Heterogeneity

Experimental and survey studies have shown that there is considerable heterogeneity across individuals in risk taking (e.g., Tversky and Kahneman, 1992; Hey and Orme, 1994; and Chiappori and Paiella, 2011).

We introduce heterogeneity in preferences by proposing that the parameters ρn for individual n are drawn from a population distribution. Since the elements of ρn are bounded, it is convenient to define transformed parameters $\tilde{\rho}_n := g(\rho_n)$, where g represents Johnson transformations that map elements of ρn to (−∞, +∞), and to model heterogeneity in terms of these parameters.21 A simple way to model heterogeneity would be to assume the transformed parameters are drawn from a multivariate normal with population mean $\mu_{\tilde{\rho}}$ and variance $V_{\tilde{\rho}}$. That is, $p(\tilde{\rho}_n \mid \mu_{\tilde{\rho}}, V_{\tilde{\rho}}) = f_N(\tilde{\rho}_n \mid \mu_{\tilde{\rho}}, V_{\tilde{\rho}})$, or equivalently
$$
p(\rho_n \mid \mu_{\tilde{\rho}}, V_{\tilde{\rho}}) = f_{\rho}(\rho_n \mid \mu_{\tilde{\rho}}, V_{\tilde{\rho}}), \tag{4}
$$
where p(· | ·) denotes a conditional density, $f_N$ is the normal density, and $f_{\rho}$ can be derived from the Jacobian.

The above specification allows for individual heterogeneity within a utility type that exhibits all the CPT features. However, here we go a step further and allow for heterogeneity not only within a utility type but also across different types. This allows us to estimate the population proportions of individuals exhibiting different behavioral features, and therefore to evaluate the relative importance of each feature. To this end, we propose that each individual belongs to one of several utility types, and allow for heterogeneity across individuals of the same type. That is, we relax the assumption that preferences are drawn from a single distribution, and instead propose that they are drawn from a mixture. Thus, Equation 4 becomes
$$
p\left(\rho_n \mid \left\{\mu^{q}_{\tilde{\rho}}, V^{q}_{\tilde{\rho}}\right\}, \left\{h^{q}\right\}\right) = \sum_{q \in Q} h^{q}\, f^{q}_{\rho}\left(\rho_n \mid \mu^{q}_{\tilde{\rho}}, V^{q}_{\tilde{\rho}}\right), \tag{5}
$$
where Q is the set of utility types; $h^q \in [0, 1]$ is the proportion of type q in the population, with $\sum_{q \in Q} h^q = 1$; and $f^{q}_{\rho}(\cdot \mid \mu^{q}_{\tilde{\rho}}, V^{q}_{\tilde{\rho}})$ is the density of the preference parameters for type q, with parameters $\mu^{q}_{\tilde{\rho}}, V^{q}_{\tilde{\rho}}$.22 That is, each individual belongs to type q with probability $h^q$ and, conditional on belonging to type q, his preference parameters are drawn from the density $f^{q}_{\rho}$. It is also useful to consider that there exist unobserved 'allocation' variables $e^{q}_{n} \in \{0, 1\}$ that sum to 1 and indicate the type of individual n. Letting $h := \{h^q\}$ be the probability vector, the allocation vector $e_n := \{e^{q}_{n}\}$ follows the multinomial M(1, h) distribution.

To assess the importance of the loss aversion and probability weighting features of CPT in explaining

21For example, if y is bounded from below at m and from above at M, its transformation is $\tilde{y} := \ln\frac{y - m}{M - y}$. 22We retain the convenient assumption that transformed parameters are normally distributed across individuals within a type, unless they are restricted (e.g., λ = 1 for all individuals in the type with no loss aversion).

individual behavior, we estimate a mixture of four utility types. One type exhibits neither loss aversion nor probability weighting, one type only turns off loss aversion and one only turns off probability weighting, and one type is unrestricted and exhibits all CPT features. Thus, we have four cases. If individual n belongs to the type with neither loss aversion nor probability weighting, then $\rho_n := (\alpha_n, 1)'$. If he belongs to the type with no loss aversion, then $\rho_n := (\alpha_n, 1, \gamma_n, \kappa_n)'$. If he belongs to the type with no probability weighting, then $\rho_n := (\alpha_n, \lambda_n, 1)'$. And if he belongs to the unrestricted type, $\rho_n$ is unconstrained.

As with the preferences, we allow the inverse scale τ of the stochastic utility component and the probability π of having the opportunity to bet to be individual-specific. To keep things simple and to ensure that allocation of individuals to utility types is made on the basis of their preferences, we assume π and

τ are independent from the preference parameters.
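A toy sketch of the heterogeneity specification may help fix ideas (ours; the proportions, bounds, and normal parameters below are made up). Each individual is assigned a type with probabilities h, the type's free parameters are drawn on the transformed scale and mapped back through a Johnson-type transformation, and restricted parameters are held at 1.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical type proportions: EUT-like, no loss aversion, no weighting, full CPT.
h = np.array([0.0, 0.3, 0.05, 0.65])
types = ["none", "no_loss_aversion", "no_prob_weighting", "full_cpt"]
free = {"none": ["alpha"],
        "no_loss_aversion": ["alpha", "gamma", "kappa"],
        "no_prob_weighting": ["alpha", "lambda"],
        "full_cpt": ["alpha", "lambda", "gamma", "kappa"]}

def johnson_inverse(y, lo, hi):
    """Map an unbounded draw back to (lo, hi): inverse of ln((x - lo)/(hi - x))."""
    return lo + (hi - lo) / (1.0 + np.exp(-y))

def draw_individual(mu=0.0, sd=0.5):
    """Draw a type, then that type's free parameters; restricted ones are set to 1."""
    q = types[rng.choice(len(types), p=h)]
    params = {"alpha": 1.0, "lambda": 1.0, "gamma": 1.0, "kappa": 1.0}
    for name in free[q]:
        params[name] = johnson_inverse(rng.normal(mu, sd), lo=0.0, hi=3.0)  # toy bounds
    return q, params

print(draw_individual())
```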

3 Estimation

In this section, we present the likelihood of observing the data as a function of our model parameters, we describe our Bayesian estimation technique, and we discuss identification.

3.1 The likelihood

According to our model of behavior, on each day t, individual n has the opportunity to log on the sportsbook with probability πn. Given this opportunity, he either rejects it by choosing the safe lottery δ0 that pays 0, hence we observe no play on day t, or he accepts it and chooses a risky lottery ynt ≠ δ0 from the sportsbook. Letting yn := {ynt} be the sequence of daily lottery choices we observe and xn := {xnt} be a sequence of dummies indicating whether we observe play on a given day, the likelihood of observing (xn, yn) is therefore:
$$
p(x_n, y_n \mid \pi_n, \tau_n, \rho_n) = \prod_t \Big[ x_{nt}\, \pi_n\, p(y_{nt} \mid \tau_n, \rho_n) + (1 - x_{nt}) \big( \pi_n\, p(y_{nt} = \delta_0 \mid \tau_n, \rho_n) + (1 - \pi_n) \big) \Big], \tag{6}
$$
where p(ynt | τn, ρn) is the probability that individual n with parameters τn, ρn chooses lottery ynt on day t.

An individual chooses ynt = Lj from the choice set C with probability
$$
p(y_{nt} = L_j \mid \tau_n, \rho_n) = p\Big( U(\rho_n, L_j) + \varepsilon_{njt} > \max_{i \ne j,\, i \in C} \{ U(\rho_n, L_i) + \varepsilon_{nit} \} \Big) = \frac{\exp\big(\tau_n U(\rho_n, L_j)\big)}{\sum_{i \in C} \exp\big(\tau_n U(\rho_n, L_i)\big)}, \tag{7}
$$
where the latter expression follows from our distributional assumption for ε. Because our choice set (defined in Section 1.2 above) includes many similar lotteries, we can reduce the computational complexity owing

to the utility calculations in the denominator of Equation 7 by grouping similar lotteries into clusters and keeping from each cluster its most representative lottery. We use 100 representative lotteries, but increasing this number to, e.g., 200 has a small effect on our estimates, which indicates that these lotteries capture a large part of the variation in the set of available lotteries. See the Appendix for the clustering algorithm and the measure of lottery similarity.
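The choice probabilities of Equation (7) and the per-day likelihood terms of Equation (6) are straightforward to code. The sketch below (ours; it takes the deterministic CPT values of the lotteries in the choice set as given, with hypothetical numbers) uses the usual max-shift for numerical stability.

```python
import numpy as np

def choice_probabilities(utilities, tau):
    """Softmax of Equation (7): P(L_j) = exp(tau*U_j) / sum_i exp(tau*U_i)."""
    z = tau * np.asarray(utilities, dtype=float)
    z -= z.max()                     # log-sum-exp shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def day_likelihood(played, chosen_index, utilities, pi, tau, safe_index=0):
    """One day's contribution to the likelihood of Equation (6)."""
    probs = choice_probabilities(utilities, tau)
    if played:
        return pi * probs[chosen_index]
    return pi * probs[safe_index] + (1.0 - pi)

# Toy choice set: the safe lottery plus two risky ones (hypothetical CPT values).
utilities = [0.0, -0.8, 0.3]
print(day_likelihood(played=True, chosen_index=2, utilities=utilities, pi=0.4, tau=2.0))
```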

3.2 Bayesian estimation

Given the extensive unobserved heterogeneity hence the large number of parameters in our model, for tractability we use Bayesian estimation methods, specifically the Markov chain Monte Carlo (MCMC) technique. The Bayesian approach combines information in the data — individuals' lottery choices and play frequencies — with priors, to produce posterior inference. Since we have no concrete prior information about the model parameters, we use priors that are weak, to let the data determine the posteriors. Specifically, we utilize a two-layer hierarchical prior, which models the lack of prior information through a distribution over priors.

The first layer is essentially the model of heterogeneity presented above, i.e., it proposes that individual-level parameters are drawn from a population distribution with unknown parameters that are estimated from the data. In the second layer, we propose a weak 'hyper-prior' distribution for these population parameters, which contains very little information. Thus, our approach contains more data-driven information, hence allows for a more "objective approach to inference" (Gelman, 2006) and improves the robustness of the resulting posteriors (Robert, 2007). Specifically, for the population proportions h, we adopt the symmetric Dirichlet prior, i.e., $h \sim D(\bar{h})$ with $\bar{h} = 1$, which implies an uninformative uniform distribution over all possible values of h. For the population means and variances, we use the independent Normal-inverse-Wishart conjugate prior, i.e., $(\mu_{\tilde{\pi}\tilde{\tau}}, V_{\tilde{\pi}\tilde{\tau}}) \sim NIW(\kappa, K, \lambda_{\tilde{\pi}\tilde{\tau}}, \Lambda)$ and $(\mu^{q}_{\tilde{\rho}}, V^{q}_{\tilde{\rho}}) \sim NIW(\kappa, K, \lambda_{\tilde{\rho}}, \Lambda)$, with parameters that render the priors weak. In Section 6.1, we show that our estimates are robust to alternative prior specifications.

To estimate our model, we derive the joint posterior of the model parameters conditional on the data. The joint posterior density is proportional to the likelihood times the joint prior, but we cannot calculate it analytically. Thus, we obtain information about it using an MCMC algorithm to generate a chain of draws whose distribution converges to the joint posterior. Next, we discuss our algorithm; for more details, see the Appendix.

We use the Gibbs sampler (Geman and Geman, 1984), which partitions the parameters into blocks such that it is possible to draw sequentially from the posterior of each block conditional on the other blocks and the data. Intuitively, the Gibbs sampler works as follows. Given two blocks of parameters, A and B, we can use Bayes' rule to draw from their joint posterior by first drawing from the marginal posterior of A and then from the conditional posterior of B given A. So, given an initial draw for A, we can generate a draw for B from the joint posterior, which we can then use to generate a draw for A from the joint posterior. Thus, we can sequentially make draws that form a Markov chain which, under weak conditions, converges to the joint posterior regardless of the initial draw (see Geweke, 1999). For example, in our model, individuals' allocations $\{e_n\}$ to types constitute a sample from the multinomial M(1, h) distribution, hence they can be used to learn about the population proportions h. Furthermore, for individuals belonging to type q, their transformed preference parameters are a sample from $N(\mu^{q}_{\tilde{\rho}}, V^{q}_{\tilde{\rho}})$, so we can use them to learn about $\mu^{q}_{\tilde{\rho}}$ and $V^{q}_{\tilde{\rho}}$.

A non-trivial complication arises from the fact that the four type-specific distributions from which individuals' preference parameters ρn are drawn overlap (pairwise) at sets of measure zero. As a result, we cannot draw sequentially from the conditional posteriors of the type allocations en and of the preference parameters ρn, because it would be impossible to explore the entire posterior of ρn. We circumvent this issue by designing a Metropolis-Hastings algorithm (Metropolis et al., 1953; Hastings, 1970) that enables us to draw in one step from the joint conditional posterior of $\{e_n\}$ and $\{\rho_n\}$. In this step, we make candidate draws using a generating density that first draws the allocation to one of the utility types and then, conditional on the allocation, draws the preference parameters.23 These candidates are accepted or rejected based on an acceptance probability that generally moves the Markov chain toward areas of the parameter space where posterior probability is high. Intuitively, in this step, we combine information in each individual's likelihood with population-level information (i.e., the proportions and the distribution of the preference parameters for each type), to learn (i) about the probability that the individual belongs to each type and (ii) about the distribution of his preference parameters conditional on belonging to each type.
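Purely to illustrate the flavor of the sampler, here is a heavily simplified Gibbs-style sweep (our own sketch, not the authors' algorithm; the proposal and the acceptance ratio are deliberately stripped down and omit the population-density and proposal-correction terms a full Metropolis-Hastings step would include).

```python
import numpy as np

rng = np.random.default_rng(3)

def gibbs_sweep(state, log_lik, n_types=4, dirichlet_prior=1.0):
    """One illustrative sweep: update type proportions h given current allocations,
    then propose a new (type, parameters) pair for each individual."""
    alloc = state["alloc"]                              # current type of each individual
    counts = np.bincount(alloc, minlength=n_types)
    state["h"] = rng.dirichlet(counts + dirichlet_prior)

    for n, params in enumerate(state["params"]):
        q_new = rng.choice(n_types, p=state["h"])       # candidate type from proportions
        cand = params + rng.normal(0.0, 0.05, size=params.shape)  # candidate parameters
        # Simplified acceptance: a full MH ratio would also include population
        # densities for each type and proposal-correction terms.
        log_accept = log_lik(n, q_new, cand) - log_lik(n, alloc[n], params)
        if np.log(rng.random()) < log_accept:
            state["alloc"][n], state["params"][n] = q_new, cand
    return state

# Dummy usage with a made-up likelihood and two individuals.
state = {"h": np.full(4, 0.25), "alloc": np.array([0, 3]),
         "params": [np.zeros(4), np.zeros(4)]}
dummy_log_lik = lambda n, q, p: -0.5 * float(np.sum(p**2))
state = gibbs_sweep(state, dummy_log_lik)
```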

3.3 Identification

In the context of Bayesian estimation, identification differs from the classical definition that distinct parameter values give rise to distinct distributions for the data. Rather, identifiability and informativeness are used somewhat loosely (and interchangeably) to refer to the situation in which the likelihood is not flat so the data contains information about the model parameters (Dawid, 1979; Poirier, 1998; Gelfand and Sahu,

1999). One way to assess this is by arguments concerning the features of the data that provide information about the model parameters, and another is by checking that the posterior is far from the prior. Here, we

23See Gottardo and Raftery (2008) for a similar approach for drawing from mixtures of mutually singular distributions.

provide intuition, and in Section 6.1 we show that indeed the posteriors are far from the priors.

First, to separate between risk preferences and the probability of having the opportunity to bet, we use information in both the betting frequency and the lottery choices conditional on placing a bet. The reason is that, while both busyness and risk preferences affect the former, only risk preferences affect the latter.

For example, two individuals who have different probabilities of having the opportunity to bet but the same risk preferences will exhibit different betting frequencies but similar choices conditional on placing a bet.

Second, we can separate between risk aversion and loss aversion by observing individual attitude toward lotteries with small versus large stakes. Specifically, risk aversion affects the utility function’s global concavity while loss aversion primarily affects its concavity around the reference point. As a result, an individual who is only mildly averse to both small-stakes and large-stakes lotteries is likely to be loss averse rather than risk averse, while an individual who is mildly averse to small-stakes lotteries and very averse to large-stakes lotteries is likely to be risk averse rather than loss averse (see Rabin, 2000).

Third, we separate probability weighting from risk aversion by observing individual attitude toward lotteries that have different levels of variance versus skewness. Specifically, a risk-seeking individual will choose lotteries that have high variance, while one who overweights small probabilities will choose lotteries with positive skewness. Furthermore, we obtain information about the probability weighting function curvature and elevation by observing behavior toward lotteries that contain a wide range of gains and losses. An individual whose probability weighting function has an inverse-S shape overweights outcomes at both extremes of the distribution, while an individual whose probability weighting function is elevated primarily overweights outcomes at one extreme (the left in the case of losses and the right in the case of gains) of the distribution.

Finally, all this information about individuals’ parameters is pooled together so we can learn about the population parameters and vice versa. That is, information on individuals’ parameters helps us learn about the population (the utility type proportions and the preference parameters’ distribution), and information about the population helps us share information across individuals hence learn about individuals’ parameters.

For example, consider how an individual is allocated to a type, e.g., the λ = 1 type with no loss aversion. If (i) the proportion of the λ = 1 type in the population is high, (ii) the individual’s choices are better explained by λ = 1 hence his likelihood (as a function of λ) at λ = 1 is high, and (iii) the overlap between this likelihood and the population distribution at values λ ≠ 1 is low, then the individual likely belongs to the λ = 1 type. Conversely, the higher is the probability that an individual belongs to the λ = 1 type, the higher is the posterior estimate for this type’s proportion in the population.
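One stylized way to write this pooling is the posterior type-allocation probability below; this is an illustration rather than the paper's exact expression, as it suppresses the parameters π and τ as well as the transformation of the preference parameters.

```latex
% Stylized posterior probability that individual n belongs to type q:
\Pr\!\left(e_n = q \mid x_n, y_n, h, \{\mu^q_{\rho}, V^q_{\rho}\}\right)
  \;\propto\;
  h_q \int p\!\left(x_n, y_n \mid \rho\right)\,
           p\!\left(\rho \mid \mu^q_{\rho}, V^q_{\rho}\right)\, d\rho .
```

That is, a type attracts an individual when its population share is large and when that type's parameter distribution overlaps heavily with the individual's likelihood, which are exactly the two forces described above.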

4 Results

In this section, we present our estimation results. In Section 4.1, we present results on the population proportions of the various behavioral types. In Section 4.2, we present results on the distribution of the preference parameters over the entire population. In Section 4.3, we examine how well our model fits the data.

4.1 Proportions of utility types

In Table 3, we present posterior estimates for the proportions of the four utility types in the population. Our results show that utility curvature alone cannot adequately explain individuals’ choices, and that incorpo- rating the distinctive features of prospect theory — i.e., loss aversion and probability weighting — offers a significant improvement in explaining observed behavior. To assess the relative importance of these two features, we observe that individuals can be partitioned into two groups: 63% belong to the unrestricted CPT type and therefore exhibit all CPT features, and 36% belong to the type that exhibits probability weighting but not loss aversion. We estimate that only 1% of the population belongs to the type that does not exhibit probability weighting. We conclude that, while loss aversion is an important determinant of risk attitudes, probability weighting is even more so.

Table 3: Utility Type Proportions

                                Mean   Median   Std.Dev.   95% HPDI
Type with λ = γ = κ = 1         0.00    0.00      0.00     [0.00, 0.02]
Type with λ = 1                 0.36    0.36      0.04     [0.28, 0.44]
Type with γ = κ = 1             0.01    0.01      0.01     [0.00, 0.03]
Unrestricted Type               0.63    0.63      0.04     [0.54, 0.71]

Summary statistics of the posterior distribution for the population proportions of the four behavioral types: one exhibits neither loss aversion nor probability weighting (λ = γ = κ = 1), one does not exhibit loss aversion (λ = 1), one does not exhibit probability weighting (γ = κ = 1), and one exhibits all CPT features. The 95% Highest Posterior Density Interval (HPDI) for a parameter is the smallest interval such that the posterior probability that the parameter is in this interval is 0.95.

This result adds to findings regarding the importance of probability weighting. In experiments, Bruhin,

Fehr-Duda and Epper (2010) and Conte, Hey and Moffatt (2011) find that roughly 80% of their student subjects exhibit significant deviations from linear probability weighting, and Fehr-Duda and Epper (2012) find that all individuals exhibit some degree of probability distortion. In the field, Barseghyan et al. (2013) find that some kind of “generic” probability distortions are important in explaining insurance deductible choices, and Snowberg and Wolfers (2010) show that, with a representative agent, a model with probability weighting alone better fits racetrack betting prices than a model with risk aversion alone. Using a dataset which

contains both gains and losses, we estimate the fully specified form of CPT that includes both probability weighting and loss aversion, therefore we are able to refine and extend these results: Probability weighting is not only an important component of behavior, but furthermore it is substantially more prevalent than loss aversion. As we noted in the introduction, this finding implies that prospect theory is not a monolithic theory with a set of features — loss aversion and probability weighting — that all individuals exhibit, which in turn can have far-reaching implications in applied models that use prospect theory.

In Figure 5, we plot a histogram of individuals’ posterior mean of the probability of belonging to the unrestricted type which exhibits both loss aversion and probability weighting.24 This histogram shows that about one-third of the individuals are clearly allocated to the unrestricted type, as their posterior probability of belonging to it is very close to 1, while the remaining two-thirds are less clearly allocated, as their posterior allocation probabilities are farther from 1. It is worth noting that, in contrast to Bruhin, Fehr-Duda and Epper (2010), we do not find as clear a classification of individuals to utility types, which is due to the fact that we allow for preference heterogeneity both within and across utility types. As we noted previously, the posterior probability that an individual belongs to a type is increasing in the amount of overlap between his likelihood and the distribution of the preference parameters within that type’s sub-population. Thus, the more (less) distinct are the sub-population distributions of the preference parameters, which is the case in the absence (presence) of within-type heterogeneity, the more (less) clear is the classification of individuals to the different utility types.

Figure 5: Histogram of individuals’ posterior probability of belonging to the unrestricted full-featured type.

24Since the estimated proportions of the type that exhibits neither probability weighting nor loss aversion (λ = γ = κ = 1) and the type that exhibits loss aversion but not probability weighting (γ = κ = 1) are almost zero, the histograms for these types are concentrated at zero probability and hence the histogram for the λ = 1 type is simply the mirror image of the histogram for the unrestricted type, which we show here.

4.2 Population distribution

Here, we present estimates for the means and variances (see Table 4) and the percentiles (see Table 5) of the model parameters over the entire population. The value function we estimate is moderately concave (convex) over gains (losses), with a mean α of 0.66, and loss aversion is mild, with a mean λ of 1.45. The average values for the curvature and elevation of the probability weighting function indicate it is mostly concave, implying significant overweighting of extreme outcomes (the mean of γ is 0.84 and of κ is 2.69). These estimates deviate from the commonly used estimates from Tversky and Kahneman (1992), which suggest that the value function has milder curvature (α = 0.88) and more pronounced loss aversion (λ = 2.25), and that probabilities are distorted not only for extreme outcomes but also for outcomes near the reference point.

This could be because laboratory results are not applicable in the field (see, e.g., Harrison, List and Towe,

2007), or due to the nature of our setting, e.g., very loss-averse people might not participate in a market with negative expected returns. But it is important to note that our preference estimates fall well within the range of estimates from lab experiments (see Wakker, 2010 for a review), which should temper sample selection concerns. Indeed, in a meta-analysis of lab experiments Booij, Van Praag and Van De Kuilen (2010) find that most studies estimate an average α below 0.7, as we do, while more recently using a random-effects methodology in a meta-analysis study Walasek, Mullett and Stewart (2018) estimate a loss aversion λ of

1.31, which is not far from (and indeed slightly lower than) our mean estimate of 1.45.

Table 4: Population Means and Variances

          Panel A: Means                           Panel B: Variances
      Mean   Median   Std.Dev.   95% HPDI      Mean   Median   Std.Dev.   95% HPDI
π     0.25    0.25      0.01    [0.23, 0.28]   0.05    0.05      0.00    [0.04, 0.06]
τ     1.93    1.93      0.10    [1.75, 2.14]   2.13    2.11      0.26    [1.66, 2.66]
α     0.66    0.66      0.01    [0.64, 0.68]   0.02    0.02      0.00    [0.01, 0.03]
λ     1.45    1.45      0.05    [1.37, 1.55]   0.47    0.45      0.14    [0.32, 0.79]
γ     0.84    0.84      0.01    [0.82, 0.86]   0.02    0.02      0.00    [0.01, 0.02]
κ     2.69    2.68      0.12    [2.46, 2.94]   2.72    2.69      0.43    [1.94, 3.59]

Summary statistics of the posteriors for the population mean and variance of the probability π of having the opportunity to place a bet, the inverse scale τ of the stochastic utility component, and the preference parameters α, λ, γ, and κ.

Importantly, our estimates suggest that individuals are averse to the risk in the typical lottery in the sportsbook, while they exhibit substantial probability weighting, which implies a preference for skewness.

This is consistent with past studies that have found that a preference for skewness, rather than risk, explains individuals’ behavior in betting markets (e.g., Golec and Tamarkin, 1998; Snowberg and Wolfers, 2010). Our results for the population variances (Panel B of Table 4) and for the percentiles of the population distribution (Table 5) also reveal considerable heterogeneity in all parameters. Specifically, we find that

the 25th–75th percentiles of the population distribution are 0.57–0.74 for α, 1.00–1.77 for λ, 0.76–0.92 for γ, and 1.61–3.38 for κ. Allowing for such heterogeneity in theoretical and empirical models may be crucial to understanding observed patterns in behavior at the individual and aggregate levels (e.g., portfolio allocation and asset prices). In Figure 6, we also plot the estimated population density for each parameter.

Table 5: Estimated Percentiles of Population Distributions of Model Parameters

       1st    5th    25th   50th   75th   95th   99th
π     0.01   0.02   0.07   0.18   0.38   0.72   0.88
τ     0.16   0.33   0.84   1.53   2.65   4.94   6.64
α     0.33   0.45   0.57   0.65   0.74   0.90   1.03
λ     0.61   0.85   1.00   1.13   1.77   2.81   3.77
γ     0.55   0.64   0.76   0.84   0.92   1.04   1.13
κ     0.48   0.89   1.61   2.14   3.38   6.28   7.89

Estimates (the posterior mean) of the percentiles of the population distributions for the probability π of having the opportunity to place a bet on any given day, the inverse scale τ of the stochastic component of utility, and the preference parameters α, λ, γ, and κ.

(a) Histograms simulated from estimated population distributions. (b) Density plots of estimated population distributions.

Figure 6: Estimated population distributions of the probability π of having the opportunity to place a bet, the inverse scale τ of the stochastic utility component, and the preference parameters α, λ, γ, and κ. In the histograms in Panel (a), bar heights are normalized so the area of each bar represents the probability of the corresponding interval. In the density plots in Panel (b), the circles at λ = 1 and at γ = 1, κ = 1 represent point masses that correspond to the proportions of individuals with no loss aversion and with no probability weighting, respectively; the proportion with no loss aversion is 36% and the proportion with no probability weighting is 1%.

To gain a better understanding of what this heterogeneity in preference parameters implies for the hetero- geneity in the shapes of the value and the probability weighting functions, in Figure 7 we plot these functions for a range of parameter values drawn from their estimated joint distribution. These plots demonstrate significant heterogeneity for the value function and for the probability weighting function.
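For readers who want to evaluate these functions themselves, the sketch below computes the value and probability weighting functions at the estimated population means. Since Equations 1 to 3 are not reproduced in this section, the functional forms used here are assumptions: a power value function with loss-aversion coefficient λ, and the two-parameter (curvature γ, elevation κ) linear-in-log-odds weighting form of Goldstein and Einhorn (1987).

```python
# Illustrative evaluation of the value and weighting functions at the posterior
# mean estimates; the specific functional forms below are assumptions (the paper's
# exact forms are its Equations 1-3).
import numpy as np

alpha, lam, gamma, kappa = 0.66, 1.45, 0.84, 2.69   # posterior mean estimates

def value(x):
    """Reference-dependent power value function with loss aversion lam."""
    x = np.asarray(x, dtype=float)
    gains = np.clip(x, 0, None) ** alpha
    losses = -lam * np.clip(-x, 0, None) ** alpha
    return np.where(x >= 0, gains, losses)

def weight(p):
    """Two-parameter weighting function: gamma governs curvature, kappa elevation."""
    p = np.asarray(p, dtype=float)
    return kappa * p ** gamma / (kappa * p ** gamma + (1 - p) ** gamma)

print(value(np.array([-10.0, 10.0])))        # losses weigh about 1.45 times as much as gains
print(weight(np.array([0.01, 0.5, 0.99])))   # small probabilities are overweighted
```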

We note that, motivated by prior evidence which suggests that individual preference heterogeneity is almost entirely unobserved (e.g., von Gaudecker, van Soest and Wengstrom, 2011; Barseghyan et al., 2013), our model focuses on unobserved heterogeneity. In the Internet Appendix, we report results from regressions that examine the relationship between individuals’ characteristics (gender, age, income, and education) and their

estimated preference parameters. Like previous studies, we find that little of the heterogeneity in preferences is explained by demographic characteristics, as the R2s of our regressions are small, ranging from 1% to 5%.

Figure 7: Panel (a) plots the value function for values α and λ drawn from their estimated joint distribution. Panel (b) plots the probability weighting function for values γ and κ drawn from their estimated joint distribution; the 45° line is plotted in red. In both panels, darker lines correspond to parameter values from areas with higher estimated density.

In our estimation, it is important to allow for correlation among preference parameters, because an individual’s behavior is determined by the combination of his parameters. In Table 6, we see that the estimated correlation between α and γ (κ) is positive (negative), indicating a positive relationship between the curvature of the value function and the curvature and elevation of the probability weighting function.

Table 6: Population Correlations

          λ         γ         κ
α      -0.10      0.33     -0.40
      (0.05)    (0.07)    (0.05)
λ                -0.36      0.63
                (0.06)    (0.06)
γ                          -0.15
                          (0.07)

Posterior means and standard deviations (in parentheses) of the correlations between the preference parameters.

This is consistent with the insight in Hsee and Rottenstreich (2004) that common psychological processes drive the sensitivity to deviations from the reference point (captured by α) and from impossibility/certainty

(captured by γ and κ). Interestingly, restating an insight from Qiu and Steiger (2011) in terms of our specification, the stronger this relationship is, the less of the variation in risk attitudes in a population can

EUT alone explain.25 So our finding further emphasizes the importance of using CPT to explain heterogeneous risk attitudes. Furthermore, we find a negative correlation between λ and γ and a positive correlation between λ and κ; the remaining pairwise correlations are negative but quite weak.

25The intuition is as follows. A more curved value function works somewhat like a less curved/elevated probability weighting function: both imply diminished sensitivity to extreme prizes. So if, contrary to our finding, a more curved value function were correlated with less probability weighting, EUT could capture a lot of the variation in risk attitudes despite ignoring probability weighting.

4.3 Model fit

Here, we discuss how well our model fits the observed data. First, we show that the stochastic utility component needed to explain the observed choices is small. Normalizing to one the utility difference between the two degenerate lotteries with certain payoff of €0 and €1, respectively, we estimate the population median of the inverse scale τ of the random component to be 1.53 (see Table 5). This is comparable to the value estimated in some lab experiments, and implies, e.g., that an individual making a choice between two lotteries whose utility differs by 1, chooses the less-preferred lottery with probability 0.19.26 Even though the stochastic component of utility needed to fit the model to the data is small, it is important because it helps explain both why individuals make different risky choices at different times and why they sometimes place bets and sometimes do not.

As we see in Figure 8a, for more than half of the individuals, their ‘best’ chosen lottery has a slightly negative certainty equivalent, most often a few cents below zero. This means that these individuals are — according to the deterministic component of utility — almost indifferent between the safe and the chosen lottery, and that a very small stochastic component of utility (e.g., from utility of gambling) is sufficient to explain their choice to place a bet. Furthermore, as we discuss next, while the chosen risky lotteries are among the most desirable alternatives according to the deterministic component of utility, they are rarely the most desirable alternative.

Next, we assess model fit by examining if, for the preferences we estimate for each individual, the deterministic component of utility ranks his chosen lotteries among the most desirable alternatives. First, for each observed choice of each individual, we use his estimated preferences to calculate the chosen lottery’s rank among the alternatives, and we divide this rank by the number of alternatives to produce a normalized rank that is close to 0 for desirable and close to 1 for undesirable lotteries. For each individual, we average this normalized rank across his choices, and in Figure 8b we plot the histogram of this mean ranking across individuals.

For the median individual, the chosen lotteries are on average more preferred than 95% of the alternatives.
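A minimal sketch of this ranking calculation is given below; it assumes a hypothetical helper ce(lottery, params) that returns a lottery's certainty equivalent under an individual's estimated preferences.

```python
# Sketch of the normalized-rank fit measure for one individual, assuming a
# hypothetical certainty-equivalent function ce(lottery, params).
import numpy as np

def mean_normalized_rank(choices, params, ce):
    """choices: list of (chosen_lottery, alternatives) pairs for one individual."""
    ranks = []
    for chosen, alternatives in choices:
        ces = np.array([ce(l, params) for l in alternatives])
        rank = np.sum(ces > ce(chosen, params))     # number of strictly better alternatives
        ranks.append(rank / len(alternatives))      # close to 0 if the choice is among the best
    return np.mean(ranks)
```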

Then in Figure 8c we plot a histogram, across all choices of all individuals, of the certainty equivalent (CE) difference between the chosen lottery and each of the alternatives (i.e., the individual-specific most-preferred lottery, second-most-preferred lottery, etc.). We see that the CE difference between the chosen and the most-preferred alternative is close to zero, meaning that, in utility terms, the chosen lottery is close to the most desirable alternative. Furthermore, the CE difference between the chosen lottery and most of the alternative lotteries is quite high, indicating that the model does a good job of separating the desirable from the undesirable alternatives.

26For example, von Gaudecker, van Soest and Wengstrom (2011) find that an individual choosing between two lotteries whose utility differs by 1 would choose the less-preferred one with probability 0.45.

(a) Certainty equivalent difference, best chosen minus safe lottery. (b) Mean ratio ranking of chosen lottery. (c) Mean certainty equivalent difference, chosen minus alternatives.

Figure 8: Illustrations of goodness of fit of our model. Panel (a) plots the histogram, across individuals, of the certainty equivalent difference between the best chosen lottery and the safe lottery. Panel (b) plots the histogram, across individuals, of the mean ratio ranking of the chosen lotteries; for choices that are more (less) preferred, the ranking is close to 0 (1). Panel (c) plots the histogram, across alternatives, of the certainty equivalent difference between the chosen lotteries and the alternatives, averaged across choices and individuals.

Finally, we examine whether our model fits the data better than (i) a model in which lotteries are chosen randomly; (ii) a model in which all individuals have the Tversky and Kahneman (1992) parameters; and (iii) a model in which all individuals have the population mean parameters, so essentially the no-heterogeneity case.

In Table 7, we present the average (across individuals) log likelihood from each model as well as the Deviance

Information Criterion, which includes a penalty for model complexity. The results show that our model fits the data significantly better than all three alternatives. We also note that comparisons between models with only a subset of the CPT features are implicitly part of the estimation of our mixture model, which can be viewed as a tool for model selection analysis. As we show in Section 4.1, the model with no loss aversion (the model with all CPT features) best explains observed behavior for one-third (two-thirds) of our individuals.

Table 7: Comparison of Our Model with Alternatives

                                       log L
Model                            Mean       Median       DIC
Random Choice                  -251.76     -150.51     503.52
Tversky-Kahneman Parameters    -227.85     -133.15     455.70
No Heterogeneity               -210.94     -121.19     421.88
Our Model                      -179.27     -106.98     362.10

Mean and median log likelihood, across individuals, and Deviance Information Criterion (DIC) for our model and three alternatives: (i) one in which lotteries are chosen randomly; (ii) one in which all individuals have the Tversky and Kahneman (1992) parameters; and (iii) one in which all individuals have the population mean parameters. $DIC = \bar{D}(\beta) + p_D$, where $D(\beta) = -2 \log p(x, y \mid \beta)$, $\bar{D}(\beta) = E\left[D(\beta) \mid x, y\right]$, and $p_D = \bar{D}(\beta) - D(\bar{\beta})$, with $\beta$ the model parameters and $\bar{\beta} = E\left[\beta \mid x, y\right]$, i.e., the posterior mean.
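The DIC above can be computed directly from MCMC output. The sketch below assumes a hypothetical log_lik(data, beta) that returns log p(x, y | beta) and a matrix of posterior draws of the model parameters.

```python
# Sketch of the DIC computation from posterior draws; log_lik and the draws are
# hypothetical stand-ins (beta_draws: array of shape (n_draws, n_params)).
import numpy as np

def dic(log_lik, data, beta_draws):
    deviances = np.array([-2.0 * log_lik(data, b) for b in beta_draws])
    d_bar = deviances.mean()                               # posterior mean deviance
    d_at_mean = -2.0 * log_lik(data, beta_draws.mean(axis=0))
    p_d = d_bar - d_at_mean                                # effective number of parameters
    return d_bar + p_d                                     # smaller is better
```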

5 Application to portfolio choice

Here, we examine the implications of our estimated utility parameters for a portfolio choice problem. We study the optimal decision of an individual who allocates his wealth between the risk-free asset, a single-stock undiversified portfolio, and a well-diversified fund. This analysis enables us to examine how the optimal decisions that we calculate relate to evidence on household portfolio allocations, hence provides a sense of how realistic our estimates are when applied to a choice situation different from the one in which we estimate them.

In the asset-allocation problem we consider, an individual solves

$$\max_{\omega_S, \omega_F} \; U\left(r_p; \alpha, \lambda, \gamma, \kappa\right) \quad \text{s.t.} \quad r_p = r_f + \omega_S r_S + \omega_F r_F, \qquad (8)$$

where $r_f$ is the rate of return of the risk-free asset; $r_S$ and $r_F$ are the rates of return (in excess of $r_f$) of the stock and the fund, respectively; $r_p$ is the rate of return of portfolio $p$ with portfolio weights $\omega_S$ and $\omega_F$ on the stock and the fund, respectively; and $U$ is the CPT utility functional.27 Since the distribution of portfolio returns is essentially continuous, the utility functional is an integral rather than a sum, i.e.,

$$U\left(r_p; \cdot\right) := \int_{-\infty}^{0} u(x; \cdot)\, \frac{d}{dx}\, w\left(F_p(x); \cdot\right) dx \;+\; \int_{0}^{+\infty} u(x; \cdot)\, \frac{d}{dx}\left[-\, w\left(1 - F_p(x); \cdot\right)\right] dx,$$

where $u$ is the value function and $w$ the probability weighting function defined in Equations 1 to 3, and $F_p$ is the distribution function for the returns of portfolio $p$. We solve this problem for each of our individuals, using the posterior mean estimates of his preference parameters ($\alpha_n$, $\lambda_n$, $\gamma_n$, and $\kappa_n$) and nominal annual returns for $r_f$, $r_S$, and $r_F$.28 To solve this problem, we construct a two-dimensional grid spaced at 1% increments over the range of values for the weights $\omega_S$ and $\omega_F$, and we pick the weights that maximize the objective $U$ for each individual. We approximate the objective by generating 1 million draws for the returns $r_f$, $r_S$, and $r_F$ from their empirical joint distribution, and numerically calculating the integrals.29

27Since retail investors are unlikely to engage in shorting and/or borrowing, we impose $\omega_S \geq 0$, $\omega_F \geq 0$, and $\omega_S + \omega_F \leq 1$. Also, since $U$ is homogeneous in wealth, we can evaluate it on portfolio rates of return rather than on wealth changes as in the more common CPT formulation. See Polkovnichenko (2005) for a similar setup.
28Our use of nominal annual returns follows common practice in the literature. Benartzi and Thaler (1995) argue that households consider nominal annual returns in portfolio allocation because they file taxes and receive reports from brokers and mutual funds annually, and nominal returns are more prominent in annual reports.
29Specifically, for the period 1975–2011, we obtain data on the risk-free rate (the one-month T-Bill) and on equity returns from the Center for Research in Securities Prices (CRSP), and on U.S. equity mutual fund returns (net of fees, expenses, and transaction costs) from the CRSP U.S. Mutual Fund Database. Then, for each draw, we randomly pick a month t in this period, we randomly pick a stock among all stocks and a fund among all funds active in month t, and we calculate the returns for the twelve months starting with month t. The mean/median/standard deviation of our draws are 5.51%/5.23%/3.26% for $r_f$, 12.07%/4.69%/79.60% for $r_S$, and 11.49%/12.54%/24.52% for $r_F$. For details, see Section IA.7 of the Internet Appendix.
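A stylized sketch of this grid search is given below; it assumes a hypothetical cpt_utility function that evaluates the CPT objective on a sample of simulated portfolio returns, with rf, rs, and rff standing in for draws of the risk-free rate and the excess returns of the stock and the fund.

```python
# Sketch of the portfolio grid search under the no-shorting, no-borrowing constraints.
import numpy as np

def optimal_weights(rf, rs, rff, cpt_utility, params, step=0.01):
    """Return (omega_S, omega_F) maximizing the CPT objective on a 1% grid."""
    best, best_u = (0.0, 0.0), -np.inf
    grid = np.arange(0.0, 1.0 + 1e-9, step)
    for w_s in grid:
        for w_f in grid:
            if w_s + w_f > 1.0 + 1e-9:          # omega_S + omega_F <= 1
                continue
            r_p = rf + w_s * rs + w_f * rff      # simulated portfolio returns
            u = cpt_utility(r_p, *params)        # Monte Carlo approximation of U
            if u > best_u:
                best, best_u = (w_s, w_f), u
    return best
```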


Figure 9: Histogram of the proportion of undiversified to total equity investment across individuals, conditional on investing in equity, as implied by the individual-level preferences we estimate.

We find that 48% of our individuals would optimally invest in equity and, conditional on investing in equity, the proportion of undiversified to total equity investment varies widely across individuals (see Figure

9) with a mean/median of 49%/46% and a standard deviation of 24%. These results are broadly in line with corresponding household data from the U.S. Survey of Consumer Finances (SCF). First, according to the

SCF, since 2001 about 50% of all households in the U.S. participate in the stock market each year, which is similar to the 48% participation rate we calculate using our CPT estimates. This finding — that half of our individuals would not participate in a market that offers risky assets with positive expected returns — is not surprising given that we find that individuals exhibit a preference for skewness, not risk. Second, both in the SCF data and in our analysis, a significant proportion invests simultaneously in both undiversified and well-diversified equity portfolios: In the 2001 SCF, the proportion of direct to total equity holdings has a median value of 40% across households, which is close to the median value of 46% that we calculate. This behavior — which is puzzling with expected utility preferences — is optimal for a significant proportion of our individuals due to the interaction of probability weighting which makes the undiversified portfolios

(with their low probability of very large gains) attractive, and loss aversion which makes the diversified portfolios (with their low probability of big losses coupled with decent average gains) attractive. In Section

IA.7 of the Internet Appendix, we analyze how each preference parameter affects the optimal weights on the stock and the fund in order to see how our estimated distribution of preference parameters translates to the distribution of optimal portfolios we present in Figure 9.

Overall, we conclude that our preference parameter estimates yield portfolios similar to those observed in the data. This indicates that our estimates are sensible and could be relevant more generally.

6 Robustness checks

Here, we present robustness checks: i) we conduct a prior sensitivity analysis, ii) we let beliefs deviate from those implied by the odds, iii) we test for the prevalence of betting skill, and iv) we impose a budget constraint.

6.1 Prior sensitivity analysis

Here, we focus on the sensitivity of the posterior predictive densities of the parameters to their prior predictive densities, where the prior (posterior) predictive densities are the distributions of an individual’s parameters before (after) observing our data for all individuals. Figure 10 shows that a drastic change in the prior predictive density — from the baseline that is flat almost everywhere, except for values close to the endpoints for bounded distributions, to an alternative that is concentrated over a smaller range of values away from the boundaries — has almost no effect on the posterior predictive density, for all parameters.


Figure 10: Prior sensitivity analysis. We plot the prior and posterior predictive densities for the baseline prior and for an alternative. The prior predictive densities for the baseline (alternative) prior are plotted in dotted blue (red) lines. The proportion with no loss aversion (no probability weighting) is 50% (50%) for both priors. The posterior predictive densities corresponding to the baseline (alternative) prior specification are plotted in solid blue (dashed red) lines, and they almost coincide. The proportion with no loss aversion (no probability weighting) is 36% (1%) for both posteriors.

In the Internet Appendix, we present additional results in which we vary the marginal priors of the population parameters rather than the prior predictive densities, which compound all the marginals. Specifically, we show that our results are robust to varying by orders of magnitude the scale $\tilde{\Lambda}$ of the prior distribution for $\tilde{V}^q_{\rho}$, which shows that the heterogeneity we estimate is not driven by the priors. We also show that varying the priors for the utility type proportions $h$ does not affect their posteriors, indicating that the priors do not drive our finding that a very substantial 36% of the individuals do not exhibit loss aversion.

6.2 Sensitivity to subjective beliefs

In our main analysis we assume beliefs are rational, and we approximate the outcomes’ true probabilities with those implied by their quoted odds. Here, we discuss two alternative assumptions about subjective beliefs.

First, we approximate the outcomes’ true probabilities using the win frequencies of outcomes with similar odds. In Figure 11a, we compare the posteriors from this analysis with those from the baseline, and see that they are quite similar. This is not surprising, given that the two sets of probabilities are similar (see Figure 3).

Second, we allow for the possibility that observed choices are at least partially motivated by heterogeneous optimistic subjective beliefs. Letting $1 - p$ be the implied win probability for a selected outcome, we let $1 - \zeta p$ for $\zeta \in [\underline{\zeta}, 1]$ be its perceived win probability. In Figure 11b, we plot the posteriors for two values of $\underline{\zeta}$, hence for two deviation magnitudes: across all lotteries, the small/large deviations increase the perceived probability of winning the maximum prize by 1.4%/3.3% on average, and increase the lottery’s expected value by 2.2%/4.5%. We find that these deviations have little effect on our results. As already discussed, this is because, even if an individual bets on a match because he (believes he) has superior information on it, he still needs to choose among the multitude of wagers that involve this match, and he also has the option to combine this wager with wagers on other matches. These choices lead to lotteries involving very different risk profiles, so the observed choice is still informed by (and so informs us about) the individual’s preferences.30

(a) Estimated densities, baseline beliefs vs. subjective beliefs that correspond to past win frequencies. (b) Estimated densities, baseline beliefs vs. subjective beliefs that deviate from implied probabilities.

Figure 11: Sensitivity analysis to alternative subjective beliefs. In Panel (a), we plot in red dash-dotted lines the estimated densities assuming subjective probabilities of outcomes equal the win frequencies of outcomes with similar prices. In Panel (b), we plot in red dotted (black dash-dotted) lines the estimated densities assuming subjective beliefs exhibit small (large) deviations from the odds-implied probabilities. In both panels, we plot in blue solid lines the estimated densities from our baseline analysis. The proportion with no loss aversion is 36% for the baseline beliefs, 48% for the beliefs in Panel (a), and 35% for both beliefs in Panel (b). The proportion with no probability weighting is 1% in all cases.

30The robustness of our results to the use of slightly different probabilities in the lottery representation of individuals’ choices should also alleviate concerns that our results are sensitive to possible inaccuracies in the calculations of these probabilities, as e.g., due to the tick size of the quoted odds.

6.3 Testing for skill in picking bets

Here, we examine how many individuals exhibit ‘skill’ in picking bets, hence their choices might not (solely) reflect their risk preferences. Skilled individuals should be able to exploit mispriced bets to systematically earn returns in excess of the expected return implied by the prices. To detect skill, we carry out individual- level tests to examine whether any individual’s average excess return significantly exceeds 0; since we perform multiple hypothesis tests at the individual level, we calculate bootstrap p-values and use the Sidak

(1967) correction for significance levels. We find no skilled individuals in our sample.
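A minimal sketch of this testing procedure is given below; it assumes an array of one individual's per-bet returns in excess of the odds-implied expectation, and the bootstrap details are stand-ins for the exact implementation.

```python
# Sketch of the individual-level skill test with a Sidak-corrected threshold.
import numpy as np

def skill_pvalue(x, n_boot=10_000, rng=None):
    """One-sided bootstrap p-value for H0: mean excess return <= 0."""
    rng = rng or np.random.default_rng(0)
    obs = x.mean()
    centered = x - obs                               # impose the null of zero mean
    boot = rng.choice(centered, size=(n_boot, len(x)), replace=True).mean(axis=1)
    return np.mean(boot >= obs)

def sidak_threshold(alpha, n_tests):
    """Per-test significance level under the Sidak (1967) correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)
```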

6.4 Budget-constrained choice set

In our baseline analysis, we construct the choice set by drawing the stake and other lottery characteristics from their empirical distributions in our data. Here, we conduct a sensitivity analysis to account for the fact that individuals may face a budget constraint. Specifically, we reconstruct the choice set by constraining the maximum possible stake to be €500, which is a reasonable bound given our proxy for the income of the individuals under study.31 In Figure 12, we compare the posteriors from this analysis with those from the baseline analysis, and we see that they are very similar. The intuition for this is that individuals rarely choose lotteries with intermediate stakes, e.g., between €100 and €500, therefore the inclusion of lotteries with even higher stakes does not reveal substantially different information regarding their preferences.


Figure 12: Estimated population densities from our baseline model (blue solid line) and a model in which the maximum possible stake is constrained to be €500 (red dash-dotted line). The proportion with no loss aversion is 36% for the baseline and 37% for the alternative model, and the proportion with no probability weighting is 1% for both models.

31That this is a reasonable lower bound for the budget constraint is corroborated by Olafsson and Pagel (2018) who use individual-level data from Iceland to show that even the least liquid households hold considerable liquidity (cash plus credit capacity) of approximately 60% of their monthly income, which for our sample translates to an average of more than €1,000.

7 Concluding discussion

We have developed a rich structural model that accounts for individual heterogeneity in risk preferences both within and across utility types, and we have estimated it using a panel dataset of consumer betting activity in an online sports wagering market. The combination of our model and data allows us to estimate the different behavioral features in individual risk attitudes as well as to assess their relative prevalence in the population. We find that the distinctive features of prospect theory, i.e., loss aversion and probability weighting, offer a significant improvement in explaining observed behavior relative to utility curvature alone.

Comparing the two features, we find that two-thirds of the individuals exhibit loss aversion, while almost all exhibit probability weighting, suggesting that probability weighting is a more prevalent determinant of risk attitudes. Our findings indicate that prospect theory should not be viewed as a monolithic theory with a set of features that all individuals exhibit; rather, the reality is more nuanced. Our estimates on the prevalence of the various behavioral features can provide an empirical justification for specific modeling assumptions and can be used to evaluate and guide applied research.

Besides emphasizing the relative importance of probability weighting, our paper also makes a methodological contribution to the literature on decision making. Specifically, our mixture model and estimation technique for classifying individuals into types while allowing for within-type heterogeneity could be applied in a variety of experimental and field studies. For example, our approach could be used to study the prevalence of various types of behavior in terms of, e.g., risk preferences, time preferences, trust, or pro-social behavior.

Appendix

A Random generation of lotteries

Here we describe how we randomly generate lotteries. We follow a procedure that closely resembles the one people employ when they place bets, i.e., we draw the number of bets that comprise the lottery, the type of each bet, the amount of money staked in each bet, the number of wagers in each bet, and the odds associated with each wager. We draw these characteristics from their empirical distributions, i.e., from parametric distributions fitted by maximum likelihood to the data. In detail, we use the following steps:

1. For the number of bets that comprise a lottery, we make draws from a negative binomial that we fit to the number of bets per day lottery across all day lotteries we observe.

2. We draw the general type of each bet (permutation or full-cover) from a Bernoulli distribution with success probability equal to the proportion of permutation bets in our data. For permutation bets, we observe that the specific bet type (i.e., the type of accumulators it involves) depends on the number of bets in the lottery to which it belongs, so we draw the former conditional on the latter. Specifically, for each value of the number of bets in a lottery, we fit a negative binomial to all bet types (indexed by k, the type of accumulator bets they involve) that correspond to lotteries with this number of bets in our data, and then draw from the fitted distributions conditional on the number of bets in the lottery. For full-cover bets, the same kind of dependence is expected, but these types of bets occur rarely in our data, so we simply draw them from an unconditional multinomial whose event probabilities equal their proportions calculated from all lotteries.

3. We also observe in the data that the number of wagers in a bet is related to the bet type. Sometimes, the bet type imposes the number of wagers (for example, in a type-3 accumulator the number of wagers is exactly three), while other times it imposes the minimum number of wagers (e.g., in a doubles there must be at least two wagers). Furthermore, the distribution of the number of wagers in a bet depends on the bet type; e.g., it is more likely to have four wagers in a doubles than in a singles bet. Thus, for each bet type, we fit a negative binomial distribution with the appropriate support to the number of wagers in all bets of that type in our data, and then we make draws from the fitted distributions conditional on bet type.

4. In our data, the amount of money staked in each bet also depends on the bet type. For permutation bets, we fit to the amounts staked in all permutation bets a log-normal distribution whose mean is linear in the bet type (indexed by k, the type of accumulator bets that the bet type involves). For each of the full-cover bet types, we fit a log-normal to the amounts that were staked in bets of this type in our data. Then, we draw from these fitted distributions the amount of money staked in a bet conditional on its bet type.

5. For the odds associated with each wager, we draw the commission and the implied win probability, and calculate the corresponding odds. The commission is drawn from a Johnson distribution that we fit to the commissions we observe in all wagers in our data. The implied win probability is drawn from a uniform distribution ranging from the minimum we observe in our data to the maximum possible given the commission.

We randomly make 20,000 draws, and we group them into 100 clusters using a hierarchical agglomeration algorithm (the generalized Ward method; Batagelj, 1988) that initially places each lottery in its own cluster, and in each successive step combines two clusters such that the mean distance between each lottery and the corresponding cluster center is minimized, where the distance metric is the Wasserstein distance.32 Then, we choose the most representative lottery from each cluster, i.e., the lottery with the smallest mean distance to the other cluster members. This collection of lotteries reasonably ‘spans’ the randomly drawn lotteries.
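The skeleton below sketches the generative part of this procedure (steps 1 to 5). It is not the authors' code: every distribution parameter is a made-up placeholder standing in for the maximum-likelihood fits described above, and the mapping from commission and implied probability to quoted odds is omitted.

```python
# Stylized sketch of the lottery-generation pipeline with placeholder parameters.
import numpy as np

rng = np.random.default_rng(0)

def draw_lottery():
    n_bets = 1 + rng.negative_binomial(n=1.0, p=0.5)             # step 1: bets per lottery
    bets = []
    for _ in range(n_bets):
        is_permutation = rng.random() < 0.9                       # step 2: general bet type
        n_wagers = 1 + rng.negative_binomial(n=1.5, p=0.5)        # step 3: wagers in the bet
        stake = rng.lognormal(mean=1.5, sigma=1.0)                # step 4: amount staked
        wagers = []
        for _ in range(n_wagers):
            commission = rng.uniform(0.02, 0.15)                  # step 5: commission ...
            p_win = rng.uniform(0.05, 1.0 - commission)           # ... and implied win probability
            wagers.append((commission, p_win))                    # odds follow from these two
        bets.append({"permutation": is_permutation, "stake": stake, "wagers": wagers})
    return bets

lotteries = [draw_lottery() for _ in range(20_000)]               # then clustered into 100 groups
```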

B Markov chain Monte Carlo algorithm

Here, we present in detail the Markov chain Monte Carlo (MCMC) algorithm we use to estimate our model. We present the algorithm for the general case in which the elements of the parameter vector $\beta := (\pi, \tau, \rho')'$ have type-specific distributions with full covariance matrices.

We want to make draws from the joint density

$$p\left(h, \{e_n\}, \{\beta_n\}, \{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\} \,\middle|\, \{x_n, y_n\}\right),$$

which is proportional to the product of the likelihood in Equation 6 and the joint prior

$$\left[\prod_n p\left(\beta_n \,\middle|\, e_n, \{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}\right) \cdot p\left(e_n \mid h\right)\right] \cdot \left[\prod_q p\left(\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\right)\right] \cdot p\left(h\right).$$

Since it is not possible to calculate analytically the joint posterior density, we make draws from it using a Gibbs sampler. Our Gibbs sampler consists of one block each for $h$, for $\{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}$, and for $\{\beta_n, e_n\}$. In the first iteration, $k = 0$, we make initial draws for all parameters. These starting values should not matter if the whole posterior is explored (for details, see Geweke, 1999). We have used both fixed starting values and randomly drawn starting values (using the priors), and results are always very similar. Given the values of $h$, $\{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}$, $\{\beta_n\}$, and $\{e_n\}$ in iteration $k$, in iteration $k+1$ the Gibbs sampler produces sequential draws from the conditional densities $p\left(h \mid \{e_n\}^{(k)}\right)$, $p\left(\{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\} \mid \{\beta_n\}^{(k)}, \{e_n\}^{(k)}\right)$, and $p\left(\{\beta_n, e_n\} \mid \{x_n, y_n\}, \{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}^{(k+1)}, h^{(k+1)}\right)$, which follow from the assumptions in Section 2 of the paper. Specifically, suppressing iteration superscripts, the conditional densities for each block are:

• Letting $\bar{h} = \tilde{h} \cdot \mathbf{1} + \sum_{n=1}^N e_n$, the conditional posterior for $h$ has the Dirichlet distribution $h \mid \{e_n\} \sim D(\bar{h})$.

• The conditional posterior for $(\tilde{\mu}^q_\beta, \tilde{V}^q_\beta)$ is

$$\tilde{\mu}^q_\beta \mid \{e_n, \beta_n\}, \tilde{V}^q_\beta \sim N\left(\bar{\kappa}^q_\beta, \bar{K}^q_\beta\right), \qquad \left(\tilde{V}^q_\beta\right)^{-1} \mid \{e_n, \beta_n\}, \tilde{\mu}^q_\beta \sim W\left(\bar{\lambda}^q_\beta, \left(\bar{\Lambda}^q_\beta\right)^{-1}\right),$$

where the posterior hyperparameters are

$$\bar{K}^q_\beta = \left(\tilde{K}_\beta^{-1} + \Big(\textstyle\sum_{n=1}^N e^q_n\Big)\left(\tilde{V}^q_\beta\right)^{-1}\right)^{-1}, \qquad \bar{\Lambda}^q_\beta = \tilde{\Lambda}_\beta + \sum_{n=1}^N e^q_n \left(\beta_n - \tilde{\mu}^q_\beta\right)\left(\beta_n - \tilde{\mu}^q_\beta\right)',$$

$$\bar{\kappa}^q_\beta = \bar{K}^q_\beta \left(\tilde{K}_\beta^{-1}\tilde{\kappa}_\beta + \left(\tilde{V}^q_\beta\right)^{-1}\sum_{n=1}^N e^q_n \beta_n\right), \qquad \bar{\lambda}^q_\beta = \tilde{\lambda}_\beta + \sum_{n=1}^N e^q_n,$$

with $\tilde{\kappa}_\beta$, $\tilde{K}_\beta$, $\tilde{\lambda}_\beta$, $\tilde{\Lambda}_\beta$ the prior hyperparameters. We set $\tilde{\kappa}_\beta = 0$, $\tilde{K}_\beta = 100 I$, $\tilde{\lambda}_\beta = \dim(\beta)$, and $\tilde{\Lambda}_\beta = I$; this makes our priors weak and lets the data determine the posterior.

32The Wasserstein distance is a metric defined on the space of probability distributions. The Wasserstein distance between densities $f_X$, $f_Y$ is $\inf_{f_{XY}} E\left[\lVert X - Y \rVert\right]$, where $f_{XY}$ is any joint density for $X$, $Y$ with marginals $f_X$, $f_Y$. It can be thought of as the minimum work needed to turn $f_X$ into $f_Y$, where work is the amount of probability mass moved times the distance it is moved.

• The conditional posterior for $(\beta_n, e_n)$ is

$$p\left(\beta_n, e_n \,\middle|\, x_n, y_n, \{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}, h\right) \;\propto\; p\left(x_n, y_n \mid \beta_n\right) \cdot p\left(\beta_n \,\middle|\, e_n, \{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}\right) \cdot p\left(e_n \mid h\right).$$

This conditional posterior does not have a convenient form, so we make draws from it using the Metropolis-Hastings algorithm. That is, we need to use a candidate generating density $q(\cdot)$ to make candidate draws, which are then accepted or rejected, based on an acceptance probability that generally moves the Markov chain toward areas of the parameter space where posterior probability is high. We first generate a candidate draw $e'_n$ by using a random walk density that allocates the individual to one of the utility types, and then we generate a candidate draw $\beta'_n$, taking care that $\beta'_n$ satisfies the parameter constraints for the type to which individual $n$ has been allocated. Candidate draws are accepted or rejected according to the Chib and Greenberg (1995) acceptance probability, calculated as

$$\min\left(\frac{p\left(\beta'_n, e'_n \,\middle|\, x_n, y_n, \{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}, h\right) \cdot q\left(\beta_n, e_n\right)}{p\left(\beta_n, e_n \,\middle|\, x_n, y_n, \{\tilde{\mu}^q_\beta, \tilde{V}^q_\beta\}, h\right) \cdot q\left(\beta'_n, e'_n\right)},\; 1\right).$$

If the candidate draw is not accepted, the algorithm retains the current draw.
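Putting the three blocks together, a schematic of one sweep of the sampler is sketched below; the block updates are hypothetical functions supplied by the caller, so this is a skeleton of the structure only.

```python
# Skeleton of one Gibbs sweep: Dirichlet block, Normal/Wishart block, joint MH block.
def gibbs_sweep(state, data, rng, draw_h, draw_pop_params, draw_types_and_params):
    state["h"] = draw_h(state["e"], rng)
    state["pop"] = draw_pop_params(state["beta"], state["e"], rng)
    state["beta"], state["e"] = draw_types_and_params(data, state["pop"], state["h"], rng)
    return state
```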

We make 5 million draws from which we discard the first 10% as burn-in and retain every 50th after that to mitigate serial correlation. The convergence of the MCMC draws to the joint posterior distribution is monitored by visually inspecting the trace plots and density plots of the posterior estimates as well as using standard diagnostic tests. In Figure IA.6 of the Internet Appendix, we present the trace plots of the population model parameters, which indicate no problems with convergence. The algorithm was coded in Octave and

C++, and its run time was 40 hours on the c4.8xlarge instance on Amazon’s EC2 cloud computing service.

References

Abdellaoui, M. 2000. “Parameter-free elicitation of utility and probability weighting functions.” Management Science, 46(11): 1497– 1512. Abdellaoui, M, H Bleichrodt, and C Paraschiv. 2007. “Loss aversion under prospect theory: A parameter-free measurement.” Management Science, 53(10): 1659–1674. Ali, M. M. 1977. “Probability and utility estimates for racetrack bettors.” Journal of Political Economy, 85(4): 803–815. Andrikogiannopoulou, A, and F Papakonstantinou. 2019. “History-dependent risk preferences: Evidence from individual choices and implications for the disposition effect.” Review of Financial Studies, forthcoming. Baillon, A, H Bleichrodt, and V Spinu. 2019. “Searching for the reference point.” Management Science, forthcoming. Barberis, N. 2013. “Thirty years of prospect theory in economics: A review and assessment.” Journal of Economic Perspectives, 27(1): 173–96. Barseghyan, L, F Molinari, T O’Donoghue, and J. C Teitelbaum. 2013. “The nature of risk preferences: Evidence from insurance choices.” American Economic Review, 103(6): 2499–2529. Barseghyan, L, F Molinari, T O’Donoghue, and J. C Teitelbaum. 2018. “Estimating risk preferences in the field.” Journal of Economic Literature, 56(2): 501–564. Barsky, R, T Juster, M Kimball, and M Shapiro. 1997. “Preference parameters and behavioral heterogeneity: An experimental approach in the health and retirement survey.” Quarterly Journal of Economics, 112(2): 537–80. Batagelj, V. 1988. “Generalized Ward and related clustering problems.” In Classification and related methods of data analysis, ed. H. H. Bock, 67–74. Amsterdam, the Netherlands:North-Holland. Beetsma, R. M. W. J, and P. C Schotman. 2001. “Measuring risk attitudes in a natural experiment: Data from the television game show Lingo.” Economic Journal, 111(474): 821–848. Ben-Akiva, M, M. J Bergman, A. J Daly, and R Ramaswamy. 1984. “Modeling inter-urban route choice behaviour.” In Proceed- ings of the 9th International Symposium on Transportation and Traffic Theory, ed. J. Volmuller and R. Hamerslag. Boca Raton, FL:CRC Press. Benartzi, S, and R. H Thaler. 1995. “Myopic loss aversion and the equity premium puzzle.” Quarterly Journal of Economics, 110(1): 73–92. Bleichrodt, H, and J. L Pinto. 2000. “A parameter-free elicitation of the probability weighting function in medical decision analysis.” Management Science, 46(11): 1485–1496. Bombardini, M, and F Trebbi. 2012. “Risk aversion and expected utility theory: An experiment with large and small stakes.” Journal of the European Economic Association, 10(6): 1348–1399. Booij, A. S, B. M. S Van Praag, and G Van De Kuilen. 2010. “A parametric analysis of prospect theory’s functionals for the general population.” Theory and Decision, 68(1): 115–148. Brownstone, D, D. S Bunch, and K Train. 2000. “Joint mixed logit models of stated and revealed preferences for alternative-fuel vehicles.” Transportation Research Part B: Methodological, 34(5): 315–338. Bruhin, A, H Fehr-Duda, and T Epper. 2010. “Risk and rationality: Uncovering heterogeneity in probability distortion.” Econo- metrica, 78(4): 1375–1412. Cain, M, D Law, and D Peel. 2000. “The favourite-longshot bias and market efficiency in U.K. football betting.” Scottish Journal of Political Economy, 47(1): 25–36. Chetty, R. 2006. “A new method of estimating risk aversion.” American Economic Review, 96(5): 1821–1834. Chiappori, P.-A, and M Paiella. 2011. “Relative risk aversion is constant: Evidence from panel data.” Journal of the European Economic Association, 9(6): 1021–1052. 
Chiappori, P.-A, B Salanié, F Salanié, and A Gandhi. 2019. “From aggregate betting data to individual risk preferences.” Econometrica, 87(1): 1–36. Chib, S, and E Greenberg. 1995. “Understanding the Metropolis-Hastings algorithm.” American Statistician, 49(4): 327–335. Choi, S, R Fisman, D Gale, and S Kariv. 2007. “Consistency and heterogeneity of individual behavior under uncertainty.” American Economic Review, 97(5): 1921–1938. Cicchetti, C. J, and J. A Dubin. 1994. “A microeconometric analysis of risk aversion and the decision to self-insure.” Journal of Political Economy, 102(1): 169–186. Cohen, A, and L Einav. 2007. “Estimating risk preferences from deductible choice.” American Economic Review, 97(3): 745–788.

37 Conte, A, J. D Hey, and P. G Moffatt. 2011. “Mixture models of choice under risk.” Journal of Econometrics, 162(1): 79–88. Dawid, A. P. 1979. “Conditional independence in statistical theory.” Journal of the Royal Statistical Society, Series B (Methodologi- cal), 41(1): 1–31. Donkers, B, B Melenberg, and A van Soest. 2001. “Estimating risk attitudes using lotteries: A large sample approach.” Journal of Risk and Uncertainty, 22(2): 165–95. Fasman, J. 2010. “Shuffle up and deal.” Economist, July 8. http://tinyurl.com/gstoqu2, Last Accessed: 1 Aug, 2016. Fehr-Duda, H, and T Epper. 2012. “Probability and risk: Foundations and economic implications of probability-dependent risk preferences.” Annual Review of Economics, 4(1): 567–593. Gandhi, A, and R Serrano-Padial. 2014. “Does belief heterogeneity explain asset prices: The case of the longshot bias.” The Review of Economic Studies, 82(1): 156–186. von Gaudecker, H, A van Soest, and E Wengstrom. 2011. “Heterogeneity in risky choice behavior in a broad population.” American Economic Review, 101(2): 664–694. Gelfand, A. E, and S. K Sahu. 1999. “Identifiability, improper priors, and Gibbs sampling for generalized linear models.” Journal of the American Statistical Association, 94(445): 247–253. Gelman, A. 2006. “Prior distributions for variance parameters in hierarchical models.” Bayesian Analysis, 1(3): 513–533. Geman, S, and D Geman. 1984. “Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images.” Pattern Analysis and Machine Intelligence, IEEE Transactions on Pattern Analysis and Machine Intelligence, 6: 721–741. Gertner, R. 1993. “Game shows and economic behavior: Risk-taking on ‘Card Sharks’.” Quarterly Journal of Economics, 108(2): 507–521. Geweke, J. 1999. “Using simulation methods for Bayesian econometric models: Inference, development, and communication.” Econometric Reviews, 18(1): 1–73. Goldstein, W. M, and H. J Einhorn. 1987. “Expression theory and the preference reversal phenomena.” Psychological Review, 94(2): 236–254. Golec, J, and M Tamarkin. 1998. “Bettors love skewness, not risk, at the horse track.” Journal of Political Economy, 106(1): 205– 225. Gottardo, R, and A Raftery. 2008. “Markov chain Monte Carlo with mixtures of mutually singular distributions.” Journal of Computational and Graphical Statistics, 17(4): 949–975. H2 Gambling Capital. 2013. “There’s nothing virtual about the opportunity in real-money gambling.” H2 Gambling Capital & Odobo, Gibraltar. Harrison, G. W, and E. E Rutstrom. 2009. “Expected utility theory and prospect theory: One wedding and a decent funeral.” Experimental Economics, 12(2): 133–158. Harrison, G. W, J. A List, and C Towe. 2007. “Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion.” Econometrica, 75(2): 433–458. Hastings, W. 1970. “Monte Carlo sampling methods using Markov chains and their applications.” Biometrika, 57(1): 97–109. Hey, J. D, and C Orme. 1994. “Investigating generalizations of expected utility theory using experimental data.” Econometrica, 62(6): 1291–1326. Holt, C. A, and S. K Laury. 2002. “Risk aversion and incentive effects.” American Economic Review, 92(5): 1644–1655. Hsee, C. K, and Y Rottenstreich. 2004. “Music, pandas, and muggers: On the affective psychology of value.” Journal of Experimental Psychology: General, 133(1): 23–30. Jullien, B, and B Salanie. 2000. “Estimating preferences under risk: The case of racetrack bettors.” Journal of Political Economy, 108(3): 503–530. 
Kahneman, D, and A Tversky. 1979. “Prospect theory: An analysis of decisions under risk.” Econometrica, 47(2): 263–292. Kliger, D, and O Levy. 2009. “Theories of choice under risk: Insights from financial markets.” Journal of Economic Behavior & Organization, 71(2): 330–346. Kobberling, V, and P. P Wakker. 2005. “An index of loss aversion.” Journal of Economic Theory, 122(1): 119–131. Kuypers, T. 2000. “Information and efficiency: An empirical study of a fixed odds betting market.” Applied Economics, 32(11): 1353– 63. Lattimore, P, J Baker, and A Witte. 1992. “The influence of probability on risky choice: A parametric examination.” Journal of Economic Behavior & Organization, 17(3): 377–400. Levitt, S. D. 2004. “Why are gambling markets organised so differently from financial markets?” Economic Journal, 114(495): 223–

38 246. Manski, C. F. 2004. “Measuring expectations.” Econometrica, 72(5): 1329–1376. Marschak, J. 1960. “Binary-choice constraints and random utility indicators.” In Stanford symposium on mathematical methods in the social sciences, Vol. 7, ed. K. J. Arrow, S. Karlin and P. Suppes, 19–38. Stanford, CA: Stanford University Press. McFadden, D. 1974. “Conditional logit analysis of qualitative choice behavior.” In Frontiers in Econometrics, ed. P. Zarembka, 105–142. :Academic Press. Metrick, A. 1995. “A natural experiment in Jeopardy!” American Economic Review, 85(1): 240–253. Metropolis, N, A Rosenbluth, M Rosenbluth, A Teller, and E Teller. 1953. “Equation of state calculations by fast computing machines.” Journal of Chemical Physics, 21(6): 1087–1092. Murphy, R. O, and R. H ten Brincke. 2017. “Hierarchical maximum likelihood parameter estimation for cumulative prospect theory: Improving the reliability of individual risk parameter estimates.” Management Science, 64(1): 308–326. Nilsson, H, J Rieskamp, and E.-J Wagenmakers. 2011. “Hierarchical Bayesian parameter estimation for cumulative prospect theory.” Journal of Mathematical Psychology, 55(1): 84–93. O’Donoghue, T, and C Sprenger. 2018. “Reference-dependent preferences.” In Handbook of behavioral economics, Vol. 1, ed. Douglas Bernheim, Stefano DellaVigna and David Laibson, Chapter 1, 1–77. Amsterdam, the Netherlands:Elsevier. Olafsson, A, and M Pagel. 2018. “The liquid hand-to-mouth: Evidence from personal finance management software.” The Review of Financial Studies, 31(11): 4398–4446. Paravisini, D, V Rappoport, and E Ravina. 2017. “Risk aversion and wealth: Evidence from person-to-person lending portfolios.” Management Science, 63(2): 279–297. Paul, R. J, and A. P Weinbach. 2008. “Price setting in the NBA gambling market: Tests of the Levitt model of sportsbook behavior.” International Journal of Sport Finance, 3: 137–145. Paul, R. J, and A. P Weinbach. 2009. “Sportsbook behavior in the NCAA football betting market: Tests of the traditional and Levitt models of sportsbook behavior.” Journal of Prediction Markets, 3(2): 21–37. Poirier, D. J. 1998. “Revising beliefs in nonidentified models.” Econometric Theory, 14(4): 483–509. Polkovnichenko, V. 2005. “Household portfolio diversification: A case for rank-dependent preferences.” Review of Financial Studies, 18(4): 1467–1502. Polkovnichenko, V, and F Zhao. 2013. “Probability weighting functions implied in options prices.” Journal of Financial Economics, 107(3): 580–609. Pope, P. F, and D. A Peel. 1989. “Information, prices and efficiency in a fixed-odds betting market.” Economica, 56(223): 323–41. Post, T, M. J van den Assem, G Baltussen, and R. H Thaler. 2008. “Deal or no deal? Decision making under risk in a large-payoff game show.” American Economic Review, 98(1): 38–71. Qiu, J, and E Steiger. 2011. “Understanding the two components of risk attitudes: An experimental analysis.” Management Science, 57(1): 193–199. Rabin, M. 2000. “Risk aversion and expected utility theory: A calibration theorem.” Econometrica, 68(5): 1281–1292. Robert, C. P. 2007. The Bayesian choice: From decision-theoretic foundations to computational implementation. New York:Springer Science & Business Media. Sidak, Z. 1967. “Rectangular confidence regions for the means of multivariate normal distributions.” Journal of the American Statistical Association, 62(318): 626–633. Snowberg, E, and J Wolfers. 2010. 
“Explaining the favorite-longshot bias: Is it risk-love or misperceptions?” Journal of Political Economy, 118(4): 723–746. Stott, H. P. 2006. “Cumulative prospect theory’s functional menagerie.” Journal of Risk and Uncertainty, 32(2): 101–130. Tversky, A, and D Kahneman. 1992. “Advances in prospect theory: Cumulative representation of uncertainty.” Journal of Risk and Uncertainty, 5(4): 297–323. Wakker, P. P. 2010. Prospect theory: For risk and ambiguity. Cambridge University Press. Walasek, L, T. L Mullett, and N Stewart. 2018. “A meta-analysis of loss aversion in risky contexts.” Unpublished Paper. Wolfers, J, and E Zitzewitz. 2004. “Prediction markets.” Journal of Economic Perspectives, 18(2): 107–126. Woodland, L. M, and B. M Woodland. 1994. “Market efficiency and the favorite-longshot bias: The baseball betting market.” Journal of Finance, 49(1): 269–79.
