PREFERENCE FOR SIMPLICITY

INDIRA PURI

MIT Economics

First Uploaded: September 8, 2018. This Version: May 21, 2020.

Abstract. This paper introduces and axiomatizes representations in which the agent assesses a lottery less favorably if it contains more outcomes. These representations, which we term simplicity theory, are motivated by experimental evidence. They capture as special cases expected utility, the certainty effect, and a range of laboratory and empirical phenomena. We compare simplicity theory to existing theories including expected utility theory and prospect theory. We provide parametric examples of simplicity theory, and relate the theory to applications including the frequency with which people choose dominated options, cognitive psychology research, and household financial behavior.

E-mail address: [email protected]. I am deeply grateful to Glenn Ellison, Muhamet Yildiz, and especially Drew Fudenberg, for guidance on this project. I also thank Abhijit Banerjee, Gabriel Carroll, Roberto Corrao, Jetlir Duraj, Sara Ellison, Sergiu Hart, Kevin He, Ben Hébert, Caroline Hoxby, Jon Levin, Paul Milgrom, Stephen Morris, Abraham Neyman, Muriel Niederle, Kirby Nielson, Ryan Oprea, Pietro Ortoleva, Nobuhiro Kiyotaki, Ilya Segal, Andrei Shleifer, Andy Skrzypacz, Tomasz Strzalecki, John Sturm, Takuo Sugaya, Larry Summers, Robert Townsend, Ivan Werning, Alex Wolitzky, Stanford Theory Workshop participants, and MIT Theory Workshop participants for valuable feedback. I am grateful for financial support from the Paul and Daisy Soros Foundation for New Americans and the National Science Foundation Graduate Research Fellowship Program under Grant No. 1122374. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation or other funding sources.

1. Introduction

This paper introduces simplicity theory, an alternative to expected utility theory, prospect theory, and other theories of decision making under risk. In simplicity theory, agents act as if they assign a utility premium to lotteries with fewer outcomes. We interpret this observed behavior as a bias towards simplicity, arising from cognitive constraints.

We introduce simplicity theory for four reasons. First, many of the experiments which motivated the introduction of prospect theory have an alternative explanation, namely, simplicity theory. Second, this alternative explanation is supported by experiments in other fields, in particular by the psychology and marketing literatures. Third, prospect theory fails rigorous tests, and the authors of these tests write that accounting for the support of a lottery helps explain their observed data. Fourth, in certain settings, people routinely violate dominance; simplicity theory predicts when and why such violations will occur. In the next four paragraphs, we explore each of these motivating reasons in depth.

The idea that the experiments which motivated prospect theory have an alternative explanation builds on earlier insights. Neilson (1992) pointed out that many of the usual violations of expected utility theory could be explained by individuals preferring lotteries with fewer outcomes. Decades later, the same reasoning appears valid. In particular, two classes of experiments motivating prospect theory (those documenting the certainty effect and those examining when violations of independence occur) can be explained by a preference for fewer outcomes. First, prominent experiments which find that certainty is special compare two-outcome lotteries and one-outcome lotteries only (Tversky and Kahneman (1981), Tversky and Kahneman (1986), Cohen and Jaffray (1988), Andreoni and Sprenger (2012), Callen, Isaqzadeh, Long, and Sprenger (2014)).
This leaves open the possibility that three-outcome lotteries are better than five-outcome lotteries, that five-outcome lotteries are better than ten-outcome lotteries, and so on. Second, an experimental finding that motivated the introduction of prospect theory is that violations of independence occur more often when comparing three-outcome lotteries to two- or one-outcome lotteries than when comparing three-outcome lotteries to each other (Starmer (2000), Harless and Camerer (1994)).¹ In their review of prospect theory, Fehr-Duda and Epper (2012) write that this finding is “the reason why these models [probability-weighting] were devised in the first place.” Probability weighting captures the phenomenon, but so does the idea that individuals prefer lotteries with fewer outcomes.

¹ This is referred to by Starmer (2000) as ‘behavior on the interior of the probability triangle tends to conform more closely to the implications of Expected Utility Theory than behavior at the borders;’ Harless and Camerer (1994) likewise write ‘in the triangle interior, however, EU manages a miraculous recovery.’

Other literatures have tested the idea that people penalize lotteries or menus with too many outcomes. In psychology, there is a literature on ‘complexity aversion.’ Complexity aversion encompasses a number of experimental studies, with the definition of complexity varying from study to study. Both Moffatt, Sitzia, and Zizzo (2015) and Sonsino, Benzion, and Mador (2002) define complexity as the number of different outcomes of a lottery. They find evidence that individuals are complexity averse. Huck and Weizsäcker (1999), Mador, Sonsino, and Benzion (2000), and Sonsino and Marvin (2001) define complexity in terms of the size of a lottery’s support in addition to other attributes, and likewise find evidence in favor of their version of complexity aversion. In marketing, there is a literature on choice overload, which states that if a consumer is asked to choose between too many options, she may not choose at all (Tversky and Shafir (1992), Iyengar and Lepper (2000), Iyengar and Kamenica (2010), Scheibehenne, Greifeneder, and Todd (2010), Chernev, Böckenholt, and Goodman (2015)). This is distinct from the idea of selecting between lotteries. However, the apparent existence of a choice overload effect suggests that too many possibilities is somehow ‘bad’; simplicity theory adapts this idea for the lottery domain.

Historically, expected utility theory has been tested rigorously and often; alternative decision-making theories less so. Recent research subjects prospect theory to rigorous tests. Bernheim and Sprenger (2019) test the rank-dependent predictions of cumulative prospect theory, and test prospect theory using event-splitting effects. Participants in their study do not, in aggregate, act in accordance with prospect or cumulative prospect theory predictions. Bernheim and Sprenger (2019) suggest that participants preferring lotteries with fewer outcomes helps explain their data: “A . . . more promising possibility is that the observed behavior reflects a combination of standard prospect theory and a form of complexity aversion: people may prefer lotteries with fewer outcomes because they are easier to understand.” Fehr-Duda et al. (2010) and Etchart-Vincent (2009) test whether the probability weighting function remains the same regardless of a lottery’s support, as predicted by prospect and cumulative prospect theory. They find that the probability weighting function changes with the support of the lottery considered.

Households routinely violate dominance in certain financial settings. Simplicity theory provides one possible explanation of when and why such violations will occur. Egan (2018) writes that several banks, for example, J.P. Morgan, regularly issue pairs of otherwise identical bonds in which one bond dominates the other, where domination takes the price of the bond into account. Consumers often purchase the dominated product. In some cases, consumers purchase the dominated product ten times as often as its dominating counterpart. He rationalizes this behavior as the result of consumers performing limited search. Anagol et al. (2016) similarly find that life insurance agents in India routinely advise customers to select strictly dominated products. We show conditions under which a simplicity agent will exhibit a higher willingness to pay for dominated products, and relate this preference to real-world financial behavior. We particularly examine certain financial derivatives (Section 5.3). We also sketch an application of the theory to limited search (Section 6).

The unique observable content corresponding to simplicity theory is captured in its axioms. There are four axioms characterizing simplicity theory, of which three are new. The three new ones are the following. First, Same-Support Independence states that, given an ordering on two lotteries with the same support, that ordering should not reverse when an identically weighted lottery is added to each. Second, Cross-Support Independence states that given four lotteries, if the two ≿-better lotteries have a larger support size than the two ≿-worse lotteries, then the mixture of all four lotteries which places more weight on the ≿-better lotteries should be ≿-better than the mixture of all four lotteries which places more weight on the ≿-worse lotteries. Third, Semi-Continuity has part of the content of the standard sequential continuity axiom: it says that, as a sequence of lotteries p_n converges to a lottery p, the lottery p cannot be worse than the lotteries p_n which led to it. The formal version of each axiom can be found in Section 3.3.

While we believe a stochastic model may be a more realistic depiction of how people behave, we start the paper by discussing a deterministic preference over lotteries. We do so in order to relate the theory to prospect and expected utility theory, for tractability, and because the stochastic representation uses the deterministic preference as an input to its weighting function. To understand the stochastic model, it is therefore helpful to first understand the deterministic model.

The theoretical properties of deterministic simplicity theory are as follows. It is well-defined for any preference ≿ over ∆X, where X is an arbitrary space and ∆X the set of lotteries over X which have finite support (Section 3). Section 3.4 shows that, under minor technical conditions, a simplicity representation is unique up to an affine transformation. This provides a means by which to identify a simplicity representation using observable data.
Section 3.4 also defines the notion of being more complexity averse, and shows that being more complexity averse translates to the representation in an intuitive way. Section 3.6 provides parametric examples and discusses the form of the complexity cost function. Section 3.5 extends simplicity theory to provide conditions under which each lottery has a certainty equivalent. We show how doing so increases the tractability of the representation, in that the representation has more properties than in the general case. Utilizing certainty equivalents, Section 3.6 axiomatizes the curvature of the complexity cost function.

Section 4 introduces stochastic simplicity theory. The stochastic choice model is well-defined for any choice rule ρ over M(∆X), the set of finite menus of lotteries on X. Section 4.2 shows that the axioms corresponding to stochastic simplicity theory are precisely those corresponding to deterministic simplicity theory, alongside the standard Luce axioms. Section 4.2.2 shows that one can uniquely identify the stochastic choice representation from choice frequency data. It also conducts comparative statics analysis with respect to the complexity cost function. Section 4.2.3 explores properties of the weighting function. It shows that curvature of the weighting function corresponds to whether violations of first-order stochastic dominance become more frequent as complexity increases.

We show in Section 5 that simplicity theory predicts phenomena which other theories of choice under risk cannot. Section 5.2 discusses laboratory evidence; in addition to the experiments discussed in the introduction, it shows that simplicity theory predicts violations of dominance as in the uncertainty effect (Section 5.2.1), event splitting effects (Section 5.2.2), and the tendency to make more mistakes as the number of outcomes increases (Section 5.2.3). Section 5.3 discusses retail investor behavior regarding options.
It also shows that simplicity agents may prefer dominated binary options to dominating option mixtures, and relates this behavior to the European Union and Israel banning the marketing of binary options to retail investors on the grounds that retail investors were losing too much money when buying these products (Weinglass (2017), European Securities and Markets Authority Press Release (2018)). Section 5.1 shows that methods of eliciting risk aversion which rely on certainty equivalents conflate risk aversion with complexity aversion. It uses this insight to predict puzzling experimental findings regarding risk aversion, including that higher IQ people are less risk averse, and that, when a person is under cognitive load, her risk aversion increases. Section 5.4 provides a working memory interpretation of simplicity theory. Under this interpretation, agents who are unfamiliar with the outcome domain X may be more likely to display an as-if preference for simplicity, because it may take more working memory to consider each outcome. The working memory interpretation of simplicity theory generates new predictions, which the section discusses.

While simplicity theory works well in the applications described, and in these cases is able to accommodate behavior that well-known decision theories cannot, there are other settings in which using simplicity theory may not be as appropriate. Section 6 discusses these settings and provides generalizations of simplicity theory which may better suit them. It also sketches further possible extensions of the theory, for example, to choice under time pressure. Section 7 concludes. Proofs for Sections 3.3, 3.4, and 4 are in Appendix A. All other proofs are in Appendix B.

2. Related Theoretical Literature

There are several related strands of literature. The first consists of papers which axiomatize representations in which a lottery’s support affects its attractiveness. These include Gilboa (1988) and Jaffray (1988), in which a lottery’s attractiveness depends on its expected utility value as well as the worst outcome in the lottery. Cohen (1992) axiomatizes a representation in which the evaluation of a lottery depends on its expected utility value, the best outcome in the lottery, and the worst outcome in the lottery. The proofs in these papers start by imposing independence on a subset of lotteries. In this way, these papers are similar to the present manuscript. However, in contrast to these papers, a simplicity agent cares about the number of outcomes in a lottery. Also in this category is the boundary effects model of Neilson (1992). In this model, each support size has its own utility function: a lottery with support size n is evaluated with Bernoulli utility function u_n, with u_1 > u_2 > ... > u_|X|. To the best of our knowledge, this representation lacks axiomatization. The deterministic simplicity representation in the current paper can be thought of as a special case of boundary effects. We axiomatize this special case, providing testable empirical content unique to simplicity theory. We also provide novel applications of simplicity theory, both in the experimental domain and in the non-experimental domain (Section 5). The stochastic choice simplicity representation, while related to the deterministic simplicity representation, is new and distinct from boundary effects. For example, the stochastic simplicity representation will predict probabilistic violations of dominance even when choosing between lotteries of the same support size, while boundary effects will not.

Second is a literature on items in menus.
The agent of Ortoleva (2013) chooses between lotteries of menus of objects, penalizing those lotteries with a larger number of menus. In Ergin (2003) and Ergin and Sarver (2010), the agent picks between menus, and menus with a larger number of items incur higher cost. Like the current paper, these papers incorporate the idea that a larger support for some object of interest may not be desirable. However, because the domains are different, the empirical content unique to a simplicity agent (i.e. the axioms in this paper) is distinct from the empirical content unique to agents in the aforementioned settings (i.e. the axioms in the preceding papers). The motivation and applications for each of the above papers, and the current paper, are also distinct. For example, a simplicity agent has an as-if preference over lotteries, and she makes more mistakes as the complexity of lotteries within a given menu increases.

Third, there are papers axiomatizing the certainty effect. Schmidt (1998) axiomatizes the u−v model of preferences; in this model, the agent evaluates degenerate lotteries with Bernoulli utility u and nondegenerate lotteries with Bernoulli utility v. In Cerreia-Vioglio, Dillenberger, and Ortoleva (2015), the agent evaluates a lottery according to its worst certainty equivalent over a set of Bernoulli utility functions. These papers relate to the current manuscript in that a special case of the simplicity representation accommodates the certainty effect (Section 5). However, a simplicity representation also accommodates behavior other than the certainty effect. For example, a simplicity agent may prefer lotteries with two outcomes to lotteries with five.

Fourth, the stochastic simplicity model contributes to a burgeoning literature on stochastic choice.
This literature has historically focused on static random expected utility variants (Luce (1959), Block and Marschak (1960), Gul and Pesendorfer (2006)), with more recent work exploring dynamic stochastic choice (Frick, Iijima, and Strzalecki (2019)) or preferences on menus given stochastic behavior (Ahn and Sarver (2013)). Relative to these models, a stochastic simplicity agent behaves differently in that she will pick lotteries with fewer outcomes more often. Stochastic simplicity theory also accommodates a new prediction, that violations of first-order stochastic dominance become more frequent as the complexity of the lotteries under consideration increases.

Figure 1. Simplicity Representation: Example. This figure considers three possible outcomes, with the utility of each outcome given by 2, 5, and 10, respectively. The cost function is taken to be C(1) = 0, C(2) = 4, and C(3) = 11.

3. Representation, Axiomatization, and Properties (Deterministic Case)

This section introduces the (deterministic) simplicity model, characterizes the model via axioms, and discusses its properties including observable comparative statics, uniqueness of the representation, and properties of the cost function. A simplicity representation has two components: an expected utility component and a complexity cost. The agent trades off goodness of a lottery, measured by its expected utility value, against the complexity of the lottery, proxied for by support size. We connect this proxy for complexity to cognitive psychology research, and show that complexity as measured in this way allows the simplicity model to uniquely and cleanly predict varied and widespread empirical phenomena. Understanding the deterministic representation is also useful in understanding our stochastic model, introduced later in the paper. We will hereafter distinguish the models by referring to the deterministic model as ‘simplicity theory’ and the stochastic model as ‘stochastic simplicity theory.’

We emphasize that while we follow the standard practice of referring to (deterministic) observed choices as a ‘preference relation’, such observed behavior may be the result of mistakes. These mistakes may arise from cognitive constraints, such as limited working memory. The provided theory allows for both the mistakes and preference interpretations, as it speaks only to observable behavior.

3.1. Environment. Let X be an arbitrary space, and ∆X the set of finite-support lotteries on X. The preference ≿ is defined on ∆X. Given outcome x ∈ X and lottery p ∈ ∆X, notation δ_x denotes the lottery that gives outcome x with probability 1 and notation p(x) denotes the probability that lottery p assigns to outcome x. The support of a lottery is the set of outcomes to which the lottery assigns positive probability.

Throughout this paper, we do not consider compound lotteries. We define a combination of lotteries as follows.

Definition 1. (Combination of Lotteries). Given α ∈ [0, 1], and p, q ∈ ∆X, define the lottery αp + (1 − α)q as (αp + (1 − α)q)(x) = αp(x) + (1 − α)q(x).
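As a minimal sketch of Definition 1, the combination can be computed outcome by outcome. The dictionary encoding of a lottery (outcome mapped to probability) and the function name mix are assumptions made for illustration; they do not appear in the paper.

```python
from fractions import Fraction


def mix(alpha, p, q):
    """Return the single lottery alpha*p + (1 - alpha)*q.

    Lotteries are encoded (hypothetically) as dicts: outcome -> probability.
    """
    if not 0 <= alpha <= 1:
        raise ValueError("alpha must lie in [0, 1]")
    combined = {}
    for lottery, weight in ((p, alpha), (q, 1 - alpha)):
        for outcome, prob in lottery.items():
            combined[outcome] = combined.get(outcome, 0) + weight * prob
    # Keep only outcomes with positive probability, so the keys are the support.
    return {x: pr for x, pr in combined.items() if pr > 0}


half = Fraction(1, 2)
p = {"a": half, "b": half}   # two-outcome lottery
q = {"b": Fraction(1)}       # degenerate lottery on "b"
r = mix(Fraction(1, 4), p, q)
print(r)  # probabilities: a -> 1/8, b -> 7/8
```

The result is itself a single lottery over the union of the two supports, which matches the reading that αp + (1 − α)q is shorthand rather than a compound lottery.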

The combination of two lotteries is written as a definition because compound lotteries are presumed not to exist. An interpretation is that the agent is never actually offered a compound lottery; the notation αp + (1 − α)q is shorthand for the single lottery written above.

We justify not considering compound lotteries on three grounds. First, there does not seem to be an intuitive way to think about a complexity trade-off between level of compoundness and support of a lottery. Given also a lack of experimental guidance, any formalization would have to be done blindly. Second, in many of the experiments mentioned in the introduction, subjects are offered single lotteries. They are not offered compound lotteries. Insofar as one goal of this paper is to formalize these experimental findings, it can perhaps be forgiven for adhering to the set-up in those experimental findings. Third, in many real-world cases of decision making under uncertainty (for example, the individual decision to participate in a national lottery, to purchase insurance, or to gamble, or the collective decision to adopt solar energy), the choice at hand is between single lotteries only.

3.2. Simplicity Representation.

Representation 1. A preference ≿ on ∆X is said to have a Simplicity Representation (u, C) if it can be represented by a function

\[ U(p) = \sum_{x \in \mathrm{support}(p)} u(x)\,p(x) - C(|\mathrm{support}(p)|) \]

where u : X → ℝ, termed the Bernoulli utility, is injective, and C : {|Z| : Z ⊆ X is finite} → ℝ satisfies:
[1] C(1) = 0.
[2] C is weakly increasing.

The first property says that degenerate lotteries incur no complexity cost; this is just a normalization. The second property says that C is monotone: lotteries with a larger support size incur a weakly higher complexity cost.

The preference allows for violations of first-order stochastic dominance when the amount of dominance is small relative to the lottery’s complexity cost. A simplicity agent trades off quality of the lottery, as measured by its expected utility value, and the cognitive effort of considering the lottery. If a lottery first-order stochastically dominates another lottery by some small amount, but it is much harder to think about, the agent will resolve the conflict in favor of the slightly worse, easier-to-process lottery. On the other hand, if a lottery is much better as measured by its expected utility value, then the agent will pick it even if the lottery is more complex.

Stochastic simplicity theory, introduced in Section 4, will add nuance relative to the deterministic model in several ways. First, the deterministic model predicts that, when choosing between lotteries which have the same number of outcomes, the agent always picks optimally. One may instead expect the frequency of first-order stochastic dominance violations to increase as complexity increases. Stochastic simplicity theory will make this prediction. Second, the deterministic model predicts zero-one violations of dominance. One may instead expect probabilistic violations of dominance, for example, due to noisy cognitive errors which induce the agent to think less about a large-support lottery, as discussed in Section 5.4.1. Stochastic simplicity theory will allow for probabilistic violations of dominance.

This representation is sensitive to perturbations: an ε probability of a new outcome will incur the same complexity cost as a non-trivial probability of a new outcome. This prediction maps well to experimental data: see Section 5. For readers who may not like this prediction, in Section 6, we provide a more general extension to the model which accommodates perturbations. This extension maintains many of the properties of simplicity theory, while allowing for robustness to perturbations.

One may also ask why support is the correct way to model simplicity. As analysts, we have to choose what is first-order when modeling decision-making. Because support captures so many experiments and empirical phenomena, including those involving dominance violations and mistakes (Section 5), we consider support first-order in thinking about simplicity.

We begin with analysis of the deterministic model for several reasons. First, it provides a starting point from which to understand how simplicity theory differs from expected utility theory, prospect theory, and other deterministic models of decision making under risk. Second, the tractability of the model lends itself to easily understandable axiomatization, comparative statics, and clean application results. Third, once we understand the deterministic model, it becomes easier to understand the properties of the stochastic model and more general extensions to simplicity theory.
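The trade-off between expected utility and complexity cost in Representation 1 can be illustrated numerically. The sketch below evaluates U(p) using the utilities and costs given in the caption of Figure 1; the outcome labels x1, x2, x3, the example lotteries, and all function names are hypothetical choices made for illustration.

```python
from fractions import Fraction as F

# Values from the Figure 1 caption: utilities 2, 5, 10; C(1)=0, C(2)=4, C(3)=11.
# The labels "x1", "x2", "x3" are hypothetical names for the three outcomes.
BERNOULLI = {"x1": 2, "x2": 5, "x3": 10}
COST = {1: 0, 2: 4, 3: 11}


def support(p):
    """Outcomes to which lottery p (dict: outcome -> probability) gives positive mass."""
    return {x for x, pr in p.items() if pr > 0}


def expected_utility(p):
    return sum(BERNOULLI[x] * pr for x, pr in p.items())


def simplicity_utility(p):
    """U(p) = expected utility minus the cost of the support size."""
    return expected_utility(p) - COST[len(support(p))]


sure_x2 = {"x2": F(1)}
# Moves 1/10 of the mass from x2 to the better outcome x3, so it
# first-order stochastically dominates sure_x2, but only slightly.
slightly_better = {"x2": F(9, 10), "x3": F(1, 10)}
# A large improvement in expected utility outweighs the cost C(2) = 4.
much_better = {"x2": F(1, 10), "x3": F(9, 10)}
# An epsilon chance of a new outcome still incurs the full cost C(2).
perturbed = {"x2": F(999, 1000), "x3": F(1, 1000)}

print(simplicity_utility(sure_x2))          # 5
print(simplicity_utility(slightly_better))  # 3/2
print(simplicity_utility(much_better))      # 11/2
print(simplicity_utility(perturbed))        # 201/200
```

Under these assumed numbers, the agent prefers the sure outcome to a lottery that dominates it by a small margin, yet chooses the two-outcome lottery once its expected utility gain exceeds C(2) = 4. The perturbed lottery shows the sensitivity to perturbations: a 1/1000 chance of a second outcome costs as much as a 9/10 chance.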

3.3. Axiomatization of Simplicity Representation. The axioms corresponding to this representation are described below. These axioms collectively describe empirical content unique to this theory.

Axiom 1 (Same-Support Independence). Given lotteries p, q ∈ ∆X with the same support, any lottery r, and any α ∈ (0, 1),

\[ p \succsim q \iff \alpha p + (1-\alpha) r \succsim \alpha q + (1-\alpha) r. \]

That is, given an ordering on two lotteries with the same support, that ordering should not reverse when an identically weighted lottery is combined with each.

We discuss the intuition behind the next axiom, Axiom 2 (Cross-Support Independence), before presenting it. Suppose there are lotteries p, p′, q, q′ with p ≿ p′ and q ≿ q′. Then the classical independence axiom would imply that

\[ \tfrac{1}{2} p + \tfrac{1}{2} q \succsim \tfrac{1}{2} p' + \tfrac{1}{2} q'. \tag{1} \]

This may not be true in our setting because (a) the lottery on the left may not have the same support as the lottery on the right, and (b) it could be that p is preferred to p′ because it has fewer outcomes than p′, and not because p would be preferred to p′ by an expected utility agent. To obtain an implication where lotteries on both sides have the same support, we may add a lottery, for example r = ¼p + ¼p′ + ¼q + ¼q′, to each side of Equation (1), and normalize weights to make each side an admissible lottery. More generally, the added lottery r could be any lottery whose support is the union of the supports of p, q, p′, and q′. To ensure that lotteries p and q are ‘legitimately’ better than p′ and q′, and not just better because they have fewer outcomes, we add the further stipulation that |support(p)| ≥ |support(q′)| and |support(q)| ≥ |support(p′)|. The reason we have |support(p)| ≥ |support(q′)| rather than |support(p)| ≥ |support(p′)| is that in the proof of Theorem 1, we will want p to be a lower-outcome lottery than p′, yet still be able to say something about consequences of mixing p and p′ with the same lottery.

The final statement of the axiom says that given two lotteries p and q where the better lottery has a weakly larger support size, the agent should prefer to put more weight on the better lottery when combining p and q. It is an analog of the variant of the traditional independence axiom which states: if p ≻ q, then α ≥ β ⇐⇒ αp + (1 − α)q ≿ βp + (1 − β)q.

Axiom 2 (Cross-Support Independence). Consider lotteries p, p′, q, q′ ∈ ∆X. Suppose |support(p)| ≥ |support(q′)| and |support(q)| ≥ |support(p′)|. Then, for any α ∈ (0, 1), and any lottery r whose support is the union of the supports of each of these lotteries,

\[ p \succsim p' \text{ and } q \succsim q' \implies \alpha\left(\tfrac{1}{2} p + \tfrac{1}{2} q\right) + (1-\alpha) r \succsim \alpha\left(\tfrac{1}{2} p' + \tfrac{1}{2} q'\right) + (1-\alpha) r, \]

with strict inequality if either p ≻ p′ or q ≻ q′. In addition, if p = p′, q = q′, and p and q have identical support size, then the reverse implication also holds.
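The following short check, offered as a sketch rather than as the paper's proof, indicates why any Simplicity Representation (u, C) satisfies Axiom 1 (Same-Support Independence). The symbols n_p (the common support size of p and q) and n (the common support size of the two mixtures) are introduced here only for illustration.

```latex
% Assumptions for this sketch: p and q share a support, so
% |support(p)| = |support(q)| = n_p, and for alpha in (0,1) both mixtures
% have support  support(p) \cup support(r),  of size n.
\begin{align*}
U(\alpha p + (1-\alpha) r) - U(\alpha q + (1-\alpha) r)
  &= \alpha \sum_x u(x)\,[p(x) - q(x)] - C(n) + C(n) \\
  &= \alpha \Big( \Big[\sum_x u(x)\,p(x) - C(n_p)\Big]
                - \Big[\sum_x u(x)\,q(x) - C(n_p)\Big] \Big) \\
  &= \alpha\,[\,U(p) - U(q)\,].
\end{align*}
```

Because α > 0, the two differences have the same sign, so the ordering of p and q is preserved under mixing with r. The cost terms cancel only because the supports coincide, which is why the axiom is restricted to same-support lotteries.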

This axiom includes conditions on support size which can be removed in the special case of money lotteries (Section 3.5). The next axiom, Axiom 3 (Semi-Continuity), says that as one moves from larger-support lotteries to a small-support lottery, the small-support lottery cannot be worse than the larger-support lotteries which led to it. The axiom is termed ‘Semi-Continuity’ because it is implied by the classical sequential continuity axiom.

Axiom 3 (Semi-Continuity). Consider a sequence of lotteries {p_n}. If p_n → p, then, for any lottery q, if p_n ≿ q for all n, then p ≿ q. If, in addition, the p_n and p have a common support size, then, for any lottery q, if p_n ≾ q for all n, then p ≾ q.

The final axiom says that the preference over objects X is strict; the preference ≿ on lotteries over X may still contain ties.

Axiom 4 (Strict Degenerate Preference). The preference ≿ induced on X is strict: for each x, x′ ∈ X, either δ_x ≻ δ_x′ or δ_x′ ≻ δ_x.

Theorem 1 (Simplicity Axiomatization). A weak order ≿ on ∆X admits a Simplicity Representation if and only if it satisfies Axioms 1 (Same-Support Independence), 2 (Cross-Support Independence), 3 (Semi-Continuity), and 4 (Strict Degenerate Preference).

The proof of the sufficiency of the axioms for the representation proceeds in three steps.

(1) Given a subset Z ⊆ X, show that the preference ≿ induced on lotteries with support Z can be represented by $U_Z(p) = \sum_{x \in Z} u_Z(x)\,p(x)$. This step uses Axioms 1 (Same-Support Independence) and 3 (Semi-Continuity), and results on mixture sets from Herstein and Milnor (1953).

(2) Show that there is a function u which has the property that, for any subset Z ⊆ X, $U_Z(p) = \sum_{x \in Z} u(x)\,p(x)$. This step uses Axioms 1 (Same-Support Independence), 2 (Cross-Support Independence), and 4 (Strict Degenerate Preference).

(3) Construct C using an iterative algorithm. It may be helpful to consult Figure 2. The cost of a lottery p is equal to its counterfactual utility in an expected utility world, $\sum_x u(x)\,p(x)$, minus its actual utility in a simplicity world. From step (2), we have the counterfactual utility in an expected utility world. To construct the actual utility in a simplicity world, we proceed by induction. As a first step, normalize C(1) to be zero. On step j > 1, either there exists a j-outcome lottery p and lower-outcome lottery q with p ∼ q, or there does not. If such lotteries do exist, then since we know the actual utility of q by an earlier step, we can calculate the unique value of C(j). If such lotteries do not exist, we show that all j-outcome lotteries must be strictly worse than all k-outcome lotteries for any k < j. We then set C(j) to be large enough to reflect this. This step uses Axioms 2 (Cross-Support Independence), 3 (Semi-Continuity), and 4 (Strict Degenerate Preference), as well as the Cantor-Birkhoff representation theorem.

Figure 2. Utility Range for Expected Utility v. Simplicity Theory. [Two panels, ‘Expected Utility Theory’ and ‘Simplicity Theory’, each showing the utility ranges of 1-, 2-, 3-, and 4-outcome lotteries along a utility axis.] In EU, for finite X, for any support size i ≥ 2, the utility range of i-outcome lotteries is the open interval (u(x_∗), u(x^∗)), where x_∗ is a worst and x^∗ a best outcome in X. In Simplicity Theory, the utility range of i-outcome lotteries is (u(x_∗) − C(i), u(x^∗) − C(i)).

That C is a function of the lottery’s support size comes from Axiom 2 (Cross-Support Independence). In particular, it uses the idea that whenever same-support-size lotteries p and q have lower-outcome equivalents p′ and q′ respectively, we may apply this axiom to show that the cost function calculated using p and p′ is identical to the cost function calculated using q and q′. The monotonicity of C comes from Axiom 3 (Semi-Continuity).
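The inductive step in the construction of C can be written out explicitly. The derivation below is a sketch of the case in which an indifference p ∼ q exists between a j-outcome lottery p and a k-outcome lottery q with k < j; the numerical illustration uses the utilities from the caption of Figure 1 together with a hypothetical indifference that is assumed, not observed.

```latex
% p has j outcomes, q has k < j outcomes, and p ~ q, so U(p) = U(q):
\begin{align*}
\sum_x u(x)\,p(x) - C(j) &= \sum_x u(x)\,q(x) - C(k) \\
\Longrightarrow\quad
C(j) &= C(k) + \sum_x u(x)\,p(x) - \sum_x u(x)\,q(x).
\end{align*}
% Illustration (assumed indifference): p puts probability 1/5 on the outcome
% with utility 5 and 4/5 on the outcome with utility 10; q is the degenerate
% lottery on the outcome with utility 5; C(1) = 0 by normalization.
\[
C(2) = C(1) + \Big(\tfrac{1}{5}\cdot 5 + \tfrac{4}{5}\cdot 10\Big) - 5
     = 0 + 9 - 5 = 4 .
\]
```

The value C(2) = 4 agrees with the cost function stated in the caption of Figure 1, which is the sense in which a single observed indifference across support sizes pins down the cost at that support size.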

3.4. Identification and Comparative Statics. When X is a finite set, the uniqueness of a simplicity representation is not guaranteed. To see this, consider the example in Figure 1. Suppose that the three outcomes described in Figure 1 are the only possible outcomes. Then C(3) can be made arbitrarily large while still accurately representing the preference. However, the least cost function in a simplicity representation will be unique up to an affine transformation. We formalize this idea next.

Proposition 1 (Existence of a Least Cost Function). Suppose that preference $\succsim$ on $\Delta X$ admits a simplicity representation with Bernoulli utility $u$ and cost function $C$. Then there exists a unique cost function $C^*$ with the following properties: (1) $(u, C^*)$ represents $\succsim$. (2) $C^* \le C'$ for any cost function $C'$ where $(u, C')$ represents $\succsim$. We call $C^*$ the least cost function corresponding to Bernoulli utility $u$ and preference $\succsim$, and the function $U(p) = \sum_x u(x) p(x) - C^*(|\mathrm{support}(p)|)$ a least-cost simplicity representation of $\succsim$.

Proposition 2 (Uniqueness of Least-Cost Simplicity Representations). Least-cost simplicity representations are unique up to an affine transformation; that is, suppose preference $\succsim$ on $\Delta X$ admits two least-cost simplicity representations $(u, C)$ and $(u', C')$. Then there exist $\alpha > 0$, $\beta \in \mathbb{R}$ such that $u' = \alpha u + \beta$ and $C' = \alpha C$.

This proposition says that least-cost simplicity representations are unique up to an affine transformation. One implication is that the least-cost complexity cost function and the Bernoulli utility can be identified from choice data, up to this common affine transformation. The constant $\beta$ attaches to $u$ but not to $C$ because it is the entire representation $U(p) = \sum_x u(x) p(x) - C(|\mathrm{support}(p)|)$ which is unique up to an affine transformation. Thus, when considering $\alpha U(p) + \beta$, we may add $\beta$ either to the Bernoulli utility $u$ or to the cost function $C$; without loss of generality, and to be consistent with the expected utility literature, we add $\beta$ to $u$. The uniqueness of the least-cost representation also allows us to conduct comparative static analysis. Just as one defines more and less risk averse in expected utility theory, we would like to define more and less complexity averse. Intuitively, more complexity averse should mean choosing smaller-outcome lotteries more often. We formalize this intuition in Definition 2. We show that this observable behavior is equivalent, in the representation, to larger jumps in the complexity cost function (Proposition 3).

Definition 2. Consider two preferences $\succsim_1$ and $\succsim_2$ on $\Delta X$, with the following properties: each admits a simplicity representation, and for any lotteries $p, q$ with the same number of outcomes, $p \succsim_1 q \iff p \succsim_2 q$. We say that $\succsim_1$ is more complexity averse than $\succsim_2$ if, for any lotteries $p$ and $q$ where $q$ has fewer outcomes than $p$,

$$q \succsim_2 p \implies q \succsim_1 p.$$

Proposition 3 (Comparative Statics). Consider two preference relations $\succsim_1$ and $\succsim_2$ on $\Delta X$, with the following properties: each admits a simplicity representation, and for any lotteries $p, q$ with the same number of outcomes, $p \succsim_1 q \iff p \succsim_2 q$. Then the following two statements are equivalent:

(1) $\succsim_1$ is more complexity averse than $\succsim_2$.

(2) For all least-cost simplicity representations $(u_1, C_1)$ of $\succsim_1$ and $(u_2, C_2)$ of $\succsim_2$, if $u_1 = u_2$, then $C_1(j) - C_1(k) \ge C_2(j) - C_2(k)$ for all $1 \le k \le j \le |X|$.
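Condition (2) of Proposition 3 is a pairwise comparison of cost increments, which is easy to check mechanically. The sketch below assumes both cost functions are given as dictionaries over the same support sizes and that the two agents share a Bernoulli utility; the function name `has_larger_increments` and the numerical values are hypothetical.

```python
from itertools import combinations
from typing import Dict


def has_larger_increments(c1: Dict[int, float], c2: Dict[int, float]) -> bool:
    """True if C1(j) - C1(k) >= C2(j) - C2(k) for every pair k < j."""
    sizes = sorted(c1)
    return all(
        c1[j] - c1[k] >= c2[j] - c2[k]
        for k, j in combinations(sizes, 2)
    )


# Hypothetical cost functions, both normalized so that C(1) = 0.
c_first = {1: 0.0, 2: 1.0, 3: 2.5}
c_second = {1: 0.0, 2: 0.5, 3: 1.0}

first_more_averse = has_larger_increments(c_first, c_second)
second_more_averse = has_larger_increments(c_second, c_first)

# Behavioral counterpart: a sure outcome worth 5.0 against a two-outcome
# lottery whose expected utility is 5.7 (illustrative numbers).
sure_utility = 5.0
lottery_eu = 5.7
first_prefers_sure = sure_utility >= lottery_eu - c_first[2]
second_prefers_sure = sure_utility >= lottery_eu - c_second[2]
```

With these numbers the first agent has weakly larger jumps everywhere and picks the sure outcome, while the second agent picks the two-outcome lottery, in line with the direction of the proposition.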

3.5. Adding Certainty Equivalents. When each lottery has a certainty equivalent, the representation has additional desirable properties. For example, certainty equivalents ensure that the representation is unique, up to an affine transformation (Section 3.5.1). To see this, consider Figure 1. When there are only three outcomes, the cost given to three-outcome lotteries can be made arbitrarily large while still accurately representing the preference. If instead each three-outcome lottery had a certainty equivalent, then the cost assigned to those three-outcome lotteries would have to respect those certainty equivalents. Certainty equivalents also allow us to prove an additional comparative static result (Section 3.5.1). We use certainty equivalents to axiomatize a Weak Simplicity Representation, introduced in Section 6.1, and to axiomatize the curvature of the cost function in Section 3.6.1.

To obtain certainty equivalents, let the preference $\succsim$ be defined over money lotteries: $X = \mathbb{R}$. To ensure that each lottery has a certainty equivalent, it suffices for the Bernoulli utility $u$ in the representation to be continuous and unbounded below. A natural additional condition is that ‘more money is better’: in the language of the representation, $u$ is strictly increasing. We amend the Simplicity Representation introduced in Section 3.2 to include these technicalities.

Definition 3. When $X = \mathbb{R}$, a preference $\succsim$ on $\Delta X$ is said to have a simplicity representation with certainty equivalents $(u, C)$ if it admits a simplicity representation whose Bernoulli utility $u$ is continuous, unbounded below, and strictly increasing.

The axioms corresponding to a simplicity representation with certainty equivalents are similar to those earlier described in Section 3.3. Axioms 1 (Same-Support Independence) and 3 (Semi-Continuity) are maintained. Axiom 4 (Strict Degenerate Preference) is replaced by the following.

Axiom 5 (More Money is Better). Given any $x, y \in X$,

$$\delta_x \succ \delta_y \iff x > y.$$

Axiom 6 (Unboundedness). For all $x, y \in X$ with $\delta_x \succ \delta_y$, there exists $z \in X$ such that, for all $\alpha \in (0, 1)$ and any lottery $r$ with support $\{x, y, z\}$,

$$\alpha \delta_y + (1 - \alpha) r \succ \alpha \left( \tfrac{1}{2} \delta_x + \tfrac{1}{2} \delta_z \right) + (1 - \alpha) r.$$

Axioms 5 (More Money is Better) and 6 (Unboundedness) are standard, with the latter adapted to our support-relevant setting.

Axiom 7 (Singleton Continuity). Given $x, y \in X$ with $\delta_x \succ \delta_y$, there exists $z \in X$ such that, for all $\alpha \in (0, 1)$ and any lottery $r$ with support $\{x, y, z\}$,

$$\alpha \delta_z + (1 - \alpha) r \sim \alpha \left( \tfrac{1}{2} \delta_x + \tfrac{1}{2} \delta_y \right) + (1 - \alpha) r.$$

This axiom says that, if the agent were to judge lotteries as an expected utility agent would, there would be an outcome which is just as good as a 50-50 shot at two other outcomes. Note that this axiom on its own is not enough to imply the existence of certainty equivalents for 50-50 lotteries for a simplicity agent, as a simplicity agent would assign a complexity cost to the two-outcome lottery, but not to the one-outcome lottery. In the money lottery case, we can weaken Axiom 2 (Cross-Support Independence) to remove the conditions on support size, as follows.

Axiom 8 (Cross-Support Independence (Money Lotteries)). Consider lotteries $p, p', q, q' \in \Delta X$, with $p'$ and $q'$ degenerate. Then, for any $\alpha \in (0, 1)$, and any lottery $r$ whose support is the union of the supports of each of these lotteries,

$$p \succsim p' \text{ and } q \succsim q' \implies \alpha \left( \tfrac{1}{2} p + \tfrac{1}{2} q \right) + (1 - \alpha) r \succsim \alpha \left( \tfrac{1}{2} p' + \tfrac{1}{2} q' \right) + (1 - \alpha) r,$$

with strict inequality if either $p \succ p'$ or $q \succ q'$. In addition, if $p = p'$, $q = q'$, and $p$ and $q$ have identical support size, then the reverse implication also holds.

Theorem 2 (Simplicity Axiomatization with Certainty Equivalents). Let $X = \mathbb{R}$. A weak order $\succsim$ on $\Delta X$ admits a Simplicity Representation with Certainty Equivalents (Definition 3) if and only if it satisfies Axioms 1 (Same-Support Independence), 3 (Semi-Continuity), 5 (More Money is Better), 6 (Unboundedness), 7 (Singleton Continuity), and 8 (Cross-Support Independence (Money Lotteries)).

This theorem characterizes the simplicity representation with certainty equivalents on money lotteries. Axioms 1 (Same-Support Independence), 3 (Semi-Continuity), and 8 (Cross-Support Independence (Money Lotteries)) play roles similar to those they played in the general case (Theorem 1). Axioms 5 (More Money is Better) through 7 (Singleton Continuity) guarantee the existence of certainty equivalents.

3.5.1. Uniqueness and Comparative Statics. With certainty equivalents, the simplicity representation is unique up to an affine transformation. Contrast Proposition 2, where only the least-cost representation was unique.

Proposition 4 (Uniqueness of Simplicity Representation with Certainty Equivalents). Let $X = \mathbb{R}$. A simplicity representation with certainty equivalents corresponding to a preference $\succsim$ on $\Delta X$ is unique up to an affine transformation; that is, suppose $\succsim$ admits two simplicity representations with certainty equivalents $(u, C)$ and $(u', C')$. Then there exist $\alpha > 0$, $\beta \in \mathbb{R}$ such that $u' = \alpha u + \beta$ and $C' = \alpha C$.

We now turn to comparative statics. In the general setting, Proposition 3 provided a comparative statics result with regard to complexity. It focused on the least cost function due to the possibility of non-uniqueness of the cost function. Now that the cost function is unique, that proposition continues to hold, replacing ‘least cost function’ in that proposition with ‘cost function.’ Moreover, Proposition 3 showed that being more complexity averse is equivalent to larger marginals on the cost function. With certainty equivalents, the levels of the cost function also have a meaning in terms of the underlying preference. We document this in the following proposition.

Proposition 5 (Comparative Statics with Certainty Equivalents). Consider two preference relations $\succsim_1$ and $\succsim_2$ on $\Delta X$, with the following properties: each admits a simplicity representation with certainty equivalents, and for any lotteries $p, q$ with the same number of outcomes, $p \succsim_1 q \iff p \succsim_2 q$. Then the following two statements are equivalent:

(1) For all $x \in X$, $p \in \Delta X$, $\delta_x \succsim_2 p \implies \delta_x \succsim_1 p$. (In words, whenever $\succsim_2$ prefers a sure outcome to a lottery, $\succsim_1$ does too.)

(2) For all simplicity representations with certainty equivalents $(u_1, C_1)$ of $\succsim_1$ and $(u_2, C_2)$ of $\succsim_2$, if $u_1 = u_2$, then $C_1 \ge C_2$.

Condition (1) of the proposition says that, whenever $\succsim_2$ prefers certainty to a lottery, $\succsim_1$ does as well. The proposition shows that this observed behavior is equivalent to $C_1 \ge C_2$. Note that Proposition 5 relies on the existence of certainty equivalents. Without certainty equivalents, (2) would still imply (1) but the reverse implication need not be true. For example, suppose that $X = \{x_1, x_2, x_3\}$ and $\succsim_1$ and $\succsim_2$ on $\Delta X$ admit least-cost simplicity representations with $u_1(x_1) = u_2(x_1) = 4$, $u_1(x_2) = u_2(x_2) = 4.5$, $u_1(x_3) = u_2(x_3) = 5$, $C_1(1) = 0$, $C_1(2) = C_1(3) = 1$ and $C_2(1) = 0$, $C_2(2) = 1$, $C_2(3) = 2$. Then condition (1) of Proposition 5 is satisfied: both $\succsim_1$ and $\succsim_2$ prefer sure outcomes to any non-degenerate lottery. But it is not the case that $C_1 \ge C_2$, since $C_1(3) < C_2(3)$.
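The counterexample above can be checked numerically. The sketch below uses the utilities and cost functions stated in the example and tests condition (1) of Proposition 5 on a finite grid of lotteries over the three outcomes; the grid resolution and the helper names (`utility`, `grid_lotteries`, `condition_one_holds`) are assumptions of this illustration, and a grid check is suggestive rather than a proof.

```python
from itertools import product

u = {"x1": 4.0, "x2": 4.5, "x3": 5.0}
C1 = {1: 0.0, 2: 1.0, 3: 1.0}
C2 = {1: 0.0, 2: 1.0, 3: 2.0}


def utility(p, cost):
    """Simplicity utility: expected utility minus the support-size cost."""
    support = [x for x, prob in p.items() if prob > 0]
    return sum(u[x] * p[x] for x in support) - cost[len(support)]


def grid_lotteries(steps=20):
    """All lotteries over the three outcomes with probabilities in 1/steps."""
    outcomes = list(u)
    for i, j in product(range(steps + 1), repeat=2):
        if i + j <= steps:
            k = steps - i - j
            yield dict(zip(outcomes, (i / steps, j / steps, k / steps)))


def condition_one_holds():
    """Check: whenever agent 2 weakly prefers a sure outcome, so does agent 1."""
    for p in grid_lotteries():
        for x in u:
            sure = {x: 1.0}
            agent2_prefers_sure = utility(sure, C2) >= utility(p, C2)
            agent1_prefers_sure = utility(sure, C1) >= utility(p, C1)
            if agent2_prefers_sure and not agent1_prefers_sure:
                return False
    return True


condition_one = condition_one_holds()
cost_dominance = all(C1[n] >= C2[n] for n in C1)
```

On this grid, condition (1) holds while pointwise dominance of the cost functions fails at support size three, which is the gap that certainty equivalents close.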

3.6. The Complexity Cost Function.

3.6.1. Curvature Axiomatization. Let $\succsim$ be defined on money lotteries (as in Section 3.5) and suppose $\succsim$ admits a simplicity representation with certainty equivalents (Theorem 2). The following proposition characterizes preference-based behavior corresponding to a convex (resp. concave) complexity cost function.

Proposition 6 (Cost Function Curvature - Axiom). Suppose $\succsim$, a weak order on money lotteries, admits a simplicity representation with certainty equivalents (Definition 3). Then the following are equivalent.

(1) The complexity cost function is convex (resp. concave).

(2) For all $n \in \mathbb{N}$, there exist lotteries $p, q, r$ such that $|\mathrm{support}(p)| = n+2$, $|\mathrm{support}(q)| = n+1$, and $|\mathrm{support}(r)| = n$; and if $p \sim \delta_{x_p}$, $q \sim \delta_{x_q}$, $r \sim \delta_{x_r}$ for some $x_p, x_q, x_r \in X$, then for any lottery $s$ whose support is the union of the supports of $p, q, r, \delta_{x_r}, \delta_{x_q}, \delta_{x_p}$ and any $\alpha \in (0, 1)$,

$$\alpha \left( \tfrac{1}{4} r + \tfrac{1}{2} \delta_{x_q} + \tfrac{1}{4} p \right) + (1 - \alpha) s \succsim \alpha \left( \tfrac{1}{4} \delta_{x_p} + \tfrac{1}{2} q + \tfrac{1}{4} \delta_{x_r} \right) + (1 - \alpha) s \quad (\text{resp. } \precsim).$$

Intuitively, since $p$ is a size $n+2$ lottery, and $q$ a size $n+1$ lottery, if $\tfrac{1}{3} p + \tfrac{2}{3} \delta_{x_q} \succsim \tfrac{2}{3} q + \tfrac{1}{3} \delta_{x_p}$, then we can derive inductively that $C(n) + C(n+2) \ge 2C(n+1)$ for all $n \in \mathbb{N}$ (the standard definition of convexity for discrete functions (Murota, 2018)), by utilizing the simplicity representation and adding and subtracting lotteries appropriately.
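The link between condition (2) of Proposition 6 and discrete convexity can be made explicit with a short calculation. This is a sketch under stated assumptions: write $E(\ell) = \sum_x u(x)\ell(x)$ for expected utility, write $s$ for the common mixing lottery whose support is the union of all supports (so both mixtures have the same support and the same complexity cost, which cancels), and use the certainty equivalents $u(x_p) = E(p) - C(n+2)$, $u(x_q) = E(q) - C(n+1)$, $u(x_r) = E(r) - C(n)$.

```latex
\begin{align*}
&\Big[\tfrac14 E(r) + \tfrac12 u(x_q) + \tfrac14 E(p)\Big]
 - \Big[\tfrac14 u(x_p) + \tfrac12 E(q) + \tfrac14 u(x_r)\Big] \\
&\quad= \tfrac14 E(r) + \tfrac12\big(E(q) - C(n+1)\big) + \tfrac14 E(p)
 - \tfrac14\big(E(p) - C(n+2)\big) - \tfrac12 E(q) - \tfrac14\big(E(r) - C(n)\big) \\
&\quad= \tfrac14\big[C(n) + C(n+2)\big] - \tfrac12 C(n+1).
\end{align*}
```

Under these assumptions, the left-hand mixture is weakly preferred exactly when $C(n) + C(n+2) \ge 2C(n+1)$, and the reverse weak preference corresponds to the concave case.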

3.6.2. Parametric Examples. An example of a concave complexity cost function is $C(|\mathrm{support}(p)|) = \alpha \log(|\mathrm{support}(p)|)$. In this example, $\alpha$ is a parameter governing the degree of complexity aversion. The concavity of the log function implies that the agent distinguishes more between smaller-support lotteries than larger-support lotteries. For example, $C(2) - C(1) > C(1001) - C(1000)$. A convex complexity cost function, for example $C(|\mathrm{support}(p)|) = \alpha (|\mathrm{support}(p)|)^2$, would have the opposite property: the agent cares more about 1000 vs. 1001 outcomes than he does about 1 vs. 2 outcomes.
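The two parametric forms can be compared directly by computing the marginal cost of one additional outcome. The snippet below is a small illustration with the complexity-aversion parameter set to one; the function names are hypothetical.

```python
import math


def log_cost(n: int, alpha: float = 1.0) -> float:
    """Concave example: C(n) = alpha * log(n), so C(1) = 0."""
    return alpha * math.log(n)


def quadratic_cost(n: int, alpha: float = 1.0) -> float:
    """Convex example as written in the text: C(n) = alpha * n^2."""
    return alpha * n ** 2


def increment(cost, n: int) -> float:
    """Marginal cost of moving from n to n + 1 outcomes."""
    return cost(n + 1) - cost(n)


log_small, log_large = increment(log_cost, 1), increment(log_cost, 1000)
quad_small, quad_large = increment(quadratic_cost, 1), increment(quadratic_cost, 1000)
```

For the log form the first increment (about 0.69) dwarfs the increment at 1000 outcomes (about 0.001), while for the quadratic form the ordering reverses (3 versus 2001).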

4. Stochastic Simplicity Theory

In what we have done so far, the agent deterministically chooses lotteries. An alternative formulation is that the agent chooses lotteries with noise: in this interpretation, the agent choosing lotteries with fewer outcomes more often may be more naturally interpreted as a mistake, rather than an as-if preference. In particular, a lesser probability of choosing large-outcome lotteries may be due to an unwillingness to spend time processing such lotteries. It may also be due to mistakes on the part of the agent: for example, she may routinely underestimate how good large-support lotteries are. In this section, we represent and axiomatize a stochastic choice version of simplicity theory, wherein the decision maker has a lower probability of selecting a lottery which has more outcomes. We explore properties of the representation, and show that it is able to capture additional behavior that the deterministic model cannot. In particular, stochastic simplicity theory will predict that, even when picking between lotteries which each have the same number of outcomes, the number of mistakes the agent makes grows in the number of outcomes. This implies, for example, that violations of first order stochastic dominance increase as complexity grows.

4.1. Space. Let X be an arbitrary space, ∆X the set of finite-support lotteries on X, and M(∆X) the set of finite menus on ∆X. The analyst observes a choice frequency ρ on M(∆X), with ρ(q, A) denoting the probability that the agent selects lottery q from menu A.

4.2. Representation, Axioms, and Properties.

Representation 3. Choice frequency $\rho$ on $M(\Delta X)$ has a stochastic simplicity representation $(\phi, u, C)$ if there exists a function $W : \Delta X \to \mathbb{R}_{++}$ such that

(1) $\rho(q, A) = \dfrac{W(q)}{\sum_{q' \in A} W(q')}$ for all $A \in M(\Delta X)$;

(2) $W(q) = \phi\left( \sum_{x \in X} u(x) q(x) - C(|\mathrm{support}(q)|) \right)$

for some strictly increasing function $\phi : \mathbb{R} \to \mathbb{R}$, Bernoulli utility $u : X \to \mathbb{R}$, and weakly increasing function $C : \mathbb{Z} \to \mathbb{R}$, normalized so that $C(1) = 0$.
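To make the representation concrete, the following sketch computes Luce choice probabilities from a simplicity utility. It is an illustration under assumptions that are not part of the text: money outcomes, linear Bernoulli utility, cost C(n) = log n, and transformation function exp (chosen only because it is positive and strictly increasing). The function names are hypothetical.

```python
import math
from typing import Callable, Dict, List

Lottery = Dict[float, float]  # money outcome -> probability


def simplicity_utility(p: Lottery, u: Callable, cost: Callable) -> float:
    """Expected utility minus the cost of the support size."""
    support = [x for x, prob in p.items() if prob > 0]
    return sum(u(x) * p[x] for x in support) - cost(len(support))


def choice_probabilities(
    menu: List[Lottery], u: Callable, cost: Callable, phi: Callable
) -> List[float]:
    """Luce rule: each lottery's weight divided by the menu's total weight."""
    weights = [phi(simplicity_utility(p, u, cost)) for p in menu]
    total = sum(weights)
    return [w / total for w in weights]


# Two lotteries with the same expected value (6) but different support sizes.
lottery_a = {4.0: 0.5, 8.0: 0.5}
lottery_b = {4.0: 0.25, 6.0: 0.5, 8.0: 0.25}

probs = choice_probabilities(
    [lottery_a, lottery_b],
    u=lambda x: x,   # assumed linear utility
    cost=math.log,   # assumed cost, satisfies C(1) = 0
    phi=math.exp,    # assumed transformation function
)
```

Under these assumptions the weights are proportional to 1/2 and 1/3, so the two-outcome lottery is chosen with probability 0.6 even though both lotteries have the same expected utility.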

This is simply a Luce model where the weighting function is a transformation of a simplicity representation. As earlier, we refer to $C$ as the complexity cost function and $u$ as the Bernoulli utility. We will call $\phi$, which preserves the ordering given by $(u, C)$ but maps it into a Luce-friendly setting, the transformation function. We are primarily concerned with understanding $u$ and $C$; the transformation function can be thought of as a normalization that fits the simplicity representation into a Luce framework.

The axioms corresponding to this stochastic choice model are as follows. First, there are the standard Luce axioms; these give us the Luce representation. To obtain a simplicity representation for the weighting function, we impose further restrictions on $\rho$.

Axiom 9 (Luce’s IIA). For any menus $A, B \in M(\Delta X)$ and any lotteries $q, r \in A \cap B$, whenever the probabilities are positive,

$$\frac{\rho(q, A)}{\rho(r, A)} = \frac{\rho(q, B)}{\rho(r, B)}.$$

Axiom 10 (Positivity). For any menu $A \in M(\Delta X)$ and any lottery $q \in A$, $\rho(q, A) > 0$.

Now define an auxiliary preference relation $\succsim^*$ on $\Delta X$ from the observable $\rho$ as follows: for any lotteries $q, r \in \Delta X$,

$$q \succsim^* r \iff \rho(q, \{q, r\}) \ge 0.5.$$

Now we impose the axioms corresponding to a Simplicity Representation (Axioms 1 (Same-Support Independence) through 4 (Strict Degenerate Preference)) on $\succsim^*$. Since $\succsim^*$ was defined from observables, the axioms imposed on $\succsim^*$ can be translated to axioms on the observable $\rho$. For example, $\succsim^*$ satisfies Axiom 1 (Same-Support Independence) if and only if $\rho$ satisfies: if lotteries $p, q$ have the same support, then for any $\alpha \in (0, 1)$ and any lottery $r$,

$$\rho(q, \{p, q\}) \ge 0.5 \iff \rho\big((1 - \alpha) q + \alpha r, \{(1 - \alpha) q + \alpha r, (1 - \alpha) p + \alpha r\}\big) \ge 0.5.$$

Theorem 3 (Stochastic Simplicity Model Axiomatization). Choice frequency $\rho$ on $M(\Delta X)$ has a stochastic simplicity representation if and only if $\rho$ satisfies Axioms 9 (Luce’s IIA) and 10 (Positivity) and $\succsim^*$ is a weak order satisfying Axioms 1 (Same-Support Independence), 2 (Cross-Support Independence), 3 (Semi-Continuity), and 4 (Strict Degenerate Preference).

This theorem states that the stochastic simplicity model is characterized by the standard Luce axioms alongside the axioms corresponding to deterministic simplicity theory (Section 3).

4.2.1. Certainty Equivalents. As with deterministic simplicity theory, working with $X = \mathbb{R}$, i.e. money lotteries, makes the model more tractable. Just as we characterized simplicity theory with certainty equivalents, we can characterize stochastic simplicity theory with certainty equivalents. The richness of $\mathbb{R}$ will allow for clean comparative statics results, demonstrated in Section 4.2.2. As in Section 4.2, given choice frequency $\rho$, define auxiliary preference relation $\succsim^*$ on $\Delta X$ as: for any lotteries $q, r \in \Delta X$, $q \succsim^* r \iff \rho(q, \{q, r\}) \ge 0.5$.

Definition 4. When X = R, we say that ρ admits a stochastic simplicity representation with certainty equivalents if ρ admits a stochastic simplicity representation (φ, u, C) with u unbounded below, continuous, and strictly increasing.

Theorem 4. Choice frequency $\rho$ on $M(\Delta \mathbb{R})$ has a stochastic simplicity representation with certainty equivalents if and only if $\rho$ satisfies Axioms 9 (Luce’s IIA) and 10 (Positivity) and $\succsim^*$ is a weak order satisfying the conditions of Theorem 2.

4.2.2. Identification and Comparative Statics. Our earlier analysis of simplicity theory (Section 3.4) tells us that the Bernoulli utility $u$ is unique up to an affine transformation, and that for a given Bernoulli utility $u$ there exists a unique least cost function $C$. We appeal to these earlier propositions alongside properties of the Luce model to obtain an identification result for the stochastic choice framework. In other words, we can infer the complexity cost, expected utility component, and transformation function uniquely from the stochastic choice rule.

Definition 5. A stochastic simplicity representation $(\phi, u, C)$ for $\rho$ is said to be least-cost if $C$ is the least cost function corresponding to $u$, as defined in Proposition 1.

Proposition 7 (Uniqueness of Stochastic Simplicity Representation). Suppose $\rho$ admits two least-cost stochastic simplicity representations $(\phi, u, C)$ and $(\tilde{\phi}, \tilde{u}, \tilde{C})$. Then there exist $\alpha > 0$, $\beta \in \mathbb{R}$ such that $\tilde{u} = \alpha u + \beta$ and $\tilde{C} = \alpha C$. Further, if $(u, C) = (\tilde{u}, \tilde{C})$, then there exists $\gamma > 0$ with $\tilde{\phi} = \gamma \phi$. In words, simplicity representation $(u, C)$ is an affine transformation of simplicity representation $(\tilde{u}, \tilde{C})$, and once $\tilde{u}$ has been normalized to match $u$, the $\phi$ functions are positive scalar multiples of each other.

We now turn to comparative static analysis. In deterministic simplicity theory (Section 3.4), we held $u$ fixed and varied $C$ to understand how a higher complexity cost function translates to observable behavior. Here, we need to hold both $\phi$ and $u$ fixed to understand how variations in $C$ alone affect observable behavior. Relative to the deterministic notions of more complexity averse (Definition 2, Proposition 5), the definition of more complexity averse in the stochastic context includes one additional condition. This additional condition, that the two choice frequencies being considered agree in value on degenerate lotteries, will ensure that $\phi$ is held fixed.

Definition 6. Let $\rho$ and $\rho'$ be two choice rules on $M(\Delta \mathbb{R})$ which admit stochastic simplicity representations with certainty equivalents. Say that $\rho$ is more complexity averse than $\rho'$ if:

(1) For all outcomes $x, y \in X$, $\rho(\delta_x, \{\delta_x, \delta_y\}) = \rho'(\delta_x, \{\delta_x, \delta_y\})$.

(2) For all lotteries $q, r$ with the same support, $\rho(q, \{q, r\}) > \rho(r, \{q, r\}) \iff \rho'(q, \{q, r\}) > \rho'(r, \{q, r\})$.

(3) For all lotteries $q$ and outcomes $x \in X$, $\rho(\delta_x, \{q, \delta_x\}) \ge \rho'(\delta_x, \{q, \delta_x\})$.

In words, $\rho$ and $\rho'$ agree in value on degenerate lotteries, and they also agree on ordering over lotteries of the same support. But, when choosing between any risky lottery and a safe choice, $\rho$ chooses the safe choice more frequently than $\rho'$.

Observe that the third condition speaks only to a type of local complexity aversion. We will use the continuity and unboundedness of the Bernoulli utility to move from local complexity aversion (that $\rho$ picks safe options at least as frequently as $\rho'$) to global complexity aversion (that the cost function $C$ corresponding to $\rho$ is everywhere at least as large as the cost function $C'$ corresponding to $\rho'$). Intuitively, one can think of an analog in expected utility theory, where local risk aversion combined with continuity implies global risk aversion.

The first two conditions translate to $\rho$ and $\rho'$ having the same transformation function $\phi$ if both admit stochastic simplicity representations with certainty equivalents. In particular, the continuity and unboundedness of the Bernoulli utility imply that if two transformation functions agree on degenerate lotteries, they must agree everywhere.

Proposition 8 (Comparative Statics for Stochastic Simplicity Representation). Let $\rho$ and $\rho'$ be two choice rules which admit stochastic simplicity representations with certainty equivalents. Then the following are equivalent:

(1) Choice rule $\rho$ is more complexity averse than choice rule $\rho'$.

(2) For all stochastic simplicity representations $(\phi, u, C)$ of $\rho$ and $(\phi', u', C')$ of $\rho'$, if $\phi = \phi'$ and $u = u'$, then $C \ge C'$. Further, at least one such pair of representations exists.

This proposition says that a larger complexity cost function translates to higher-support lotteries being chosen less frequently than safe options, all else equal.

4.2.3. Properties of the Transformation Function. In the working memory interpretation of simplicity theory (Section 5.4.1), the complexity cost is exogenous, a product of in-built and environment-driven working memory capacity. As such, one may wonder how choice frequency changes as working memory (i.e. the complexity cost function) increases or decreases. Relatedly, because lotteries with more outcomes have a higher complexity cost, one may wonder how choice between a set of lotteries which all have 5 outcomes differs from choice between a set of lotteries which all have 2 outcomes, if at all. The curvature of the transformation function $\phi$ is crucial in calculating this comparative static. When $\phi$ is log-convex, as complexity increases, the number of mistakes the agent makes increases. In particular, the agent chooses worse or dominated lotteries more often, even when choosing between lotteries of the same support size. We start this section by computing the comparative static of choice frequency with respect to complexity cost. We then use similar analysis to show that when $\phi$ is log-convex, the number of mistakes increases as complexity increases.

Definition 7. A complexity-contingent choice function $(\phi, u) : \mathbb{R}^{\mathbb{N}} \to R$ maps complexity cost $C$ to stochastic simplicity representation $(\phi, u, C)$.

Definition 8. Say that complexity-contingent choice function $(\phi, u)$ becomes more confused as complexity increases if for any two lotteries $q, r$ with $\sum_x u(x) q(x) < \sum_x u(x) r(x)$,

$$(\phi, u, C')(q, \{q, r\}) \ge (\phi, u, C)(q, \{q, r\})$$

whenever $C' - C = \varepsilon > 0$.

In words, as complexity increases, (φ, u) becomes more confused if, even with lotteries of the same support size, the frequency with which the worse lottery is picked increases. Worse is defined in expected utility terms. A special case is first-order stochastic dom- inance. Expected utility respects first-order stochastic dominance. Therefore if (φ, u) becomes more confused as complexity increases, then (φ, u) chooses dominated lotteries more often as complexity increases.

Proposition 9 (Curvature of φ corresponds to confusion comparative static). Let (φ, u) be a complexity-contingent choice function with φ at least once differentiable. The following are equivalent. (1) (φ, u) becomes more confused as complexity increases. (2) φ has increasing logarithmic derivative.

Observe that an increasing logarithmic derivative $\phi'(y) / \phi(y)$ is equivalent to $\phi$ being log-convex whenever $\phi$ is twice-differentiable.

Corollary 1. When φ is twice differentiable, (φ, u) becomes more confused as complexity increases if and only if φ is log-convex.

This analysis has an analog in standard expected utility theory and risk aversion. In theory, EU admits both risk aversion and risk loving, and risk attitude corresponds to the curvature of the Bernoulli utility. But we think that risk aversion is more realistic, and therefore in applications use concave Bernoulli utility. Similarly, stochastic simplicity theory admits both more and less confusion with higher complexity, and which occurs corresponds to the curvature of the transformation function. But we think that more confusion is more realistic, so when choosing functional forms in empirical work, one should choose a $\phi$ which has increasing logarithmic derivative. For example, the exponential family of functions $\phi(y) = \exp(|y|^{\alpha})$, $\alpha \ge 1$, will have this property. Since complexity increases in the number of outcomes, similar analysis shows that people should become more confused as the number of outcomes increases, meaning that when choosing between lotteries which each have 100 outcomes, people may pick dominated choices more often than when choosing between lotteries which each have 2 outcomes.² We state this result formally below.

Proposition 10 (As outcomes increase, people make more mistakes). Suppose choice frequency $\rho$ admits a stochastic simplicity representation $(\phi, u, C)$ with $\phi$ having increasing logarithmic derivative. Then for any lotteries $q, q', r, r'$ where $\sum_x u(x) q(x) = \sum_x u(x) q'(x) < \sum_x u(x) r(x) = \sum_x u(x) r'(x)$, and $|\mathrm{support}(q)| = |\mathrm{support}(r)| < |\mathrm{support}(q')| = |\mathrm{support}(r')|$,

$$\rho(q, \{q, r\}) \le \rho(q', \{q', r'\}).$$

Corollary 2 (As outcomes increase, violations of first-order stochastic dominance increase). Suppose choice frequency $\rho$ admits a stochastic simplicity representation $(\phi, u, C)$ with $\phi$ having increasing logarithmic derivative. Then

$$\rho(q, \{q, r\}) \le \rho(q', \{q', r'\})$$

for any lotteries $q, q', r, r'$ with the following properties:

(1) Lotteries $q$ and $q'$ have identical CDFs. Lotteries $r$ and $r'$ have identical CDFs.

(2) Lotteries $r$ and $r'$ first-order stochastically dominate lotteries $q$ and $q'$, respectively.

(3) The common support size of $q$ and $r$ is smaller than the common support size of $q'$ and $r'$: $|\mathrm{support}(q)| = |\mathrm{support}(r)| < |\mathrm{support}(q')| = |\mathrm{support}(r')|$.

This proposition says that, whenever φ is log-convex, people make more mistakes picking within lotteries of the same support size as that support size increases. For example, this theory predicts that people choose dominated lotteries more often when choosing between lotteries which each have fifty outcomes than when choosing between lotteries which each have two outcomes.
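A small numerical illustration of this comparative static follows. It assumes money outcomes with linear Bernoulli utility, cost C(n) = log n, and the transformation function exp(y²), whose logarithmic derivative 2y is increasing (it is strictly increasing only for positive arguments, which holds for the utilities used here). For contrast, the sketch also evaluates plain exp, whose logarithmic derivative is constant. All function names and numbers are hypothetical choices for this example.

```python
import math


def expected_value(p):
    return sum(x * prob for x, prob in p.items())


def prob_first(first, second, phi, cost):
    """Luce probability of choosing `first` from the menu {first, second}.

    Assumes every listed outcome has positive probability, so len(p)
    equals the support size.
    """
    def weight(p):
        return phi(expected_value(p) - cost(len(p)))

    w_first, w_second = weight(first), weight(second)
    return w_first / (w_first + w_second)


def phi_log_convex(y):
    return math.exp(y ** 2)  # increasing log-derivative 2y; assumes y > 0


# Worse (q) and better (r) lotteries with two outcomes each ...
q, r = {4.0: 0.5, 6.0: 0.5}, {5.0: 0.5, 7.0: 0.5}
# ... and with three outcomes each, keeping expected values at 5 and 6.
q_big = {4.0: 0.25, 5.0: 0.5, 6.0: 0.25}
r_big = {5.0: 0.25, 6.0: 0.5, 7.0: 0.25}

mistake_small = prob_first(q, r, phi_log_convex, math.log)
mistake_large = prob_first(q_big, r_big, phi_log_convex, math.log)

# With phi = exp the mistake rate does not depend on the support size.
flat_small = prob_first(q, r, math.exp, math.log)
flat_large = prob_first(q_big, r_big, math.exp, math.log)
```

In this example the probability of picking the worse lottery is higher in the three-outcome menu than in the two-outcome menu under exp(y²), and identical across menus under exp, consistent with the role the proposition assigns to the logarithmic derivative.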

5. Implications

We apply simplicity theory and stochastic simplicity theory (Sections 3.2, 4) to a series of applications below. For expositional brevity, we use the term ‘simplicity theory,’ but the reader should keep in mind that stochastic simplicity theory has identical implications except where otherwise specified. For example, one can think of the stochastic simplicity model as predicting behavior for a representative agent whose actions stand in for the percentage of the population which behaves in a certain way.

5.1. Risk Aversion.

² One could think about alternative comparative static formulations, for example, the agent making more mistakes as the number of lotteries within a menu grows. Our formulation considers behavior as complexity of lotteries within a menu increases, which maps into recent experimental evidence described in Section 5. Alternative formulations may be a topic of future research.

5.1.1. Measurement. Many ways of eliciting risk aversion, including the Barsky, Juster, Kimball, and Shapiro (1997) methodology often used in finance and the Becker, DeGroot, and Marschak (1964) mechanism of experimental economics, work by comparing one-outcome and two-outcome lotteries. Consider an individual whose preferences admit a simplicity representation. Any method of eliciting her risk aversion by examining lotteries with different numbers of outcomes may conflate risk aversion with complexity aversion. To show this, suppose one is trying to elicit a certainty equivalent for lottery $p$. The relevant equation is

$$u(x_p) = \sum_x u(x) p(x) - C(|\mathrm{support}(p)|),$$

where $x_p$ is the certainty equivalent of lottery $p$. A lower certainty equivalent $x_p$ could reflect the curvature of the Bernoulli utility, or it could reflect the degree to which the individual prefers one-outcome lotteries to $|\mathrm{support}(p)|$-outcome lotteries. Risk aversion generally pre-supposes a deterministic representation (e.g. expected utility theory), and therefore the risk aversion implications use the deterministic simplicity theory model only.
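The conflation can be seen in a two-agent example built on the certainty equivalent equation above. The sketch assumes one agent with linear utility and a positive cost on two-outcome lotteries, and another with CARA utility $u(x) = -e^{-ax}$ and no complexity cost; the parameter values and function names are hypothetical.

```python
import math


def certainty_equivalent(p, u, u_inverse, cost):
    """Solve u(x_p) = sum_x u(x) p(x) - C(|support(p)|) for x_p."""
    n = sum(1 for prob in p.values() if prob > 0)
    value = sum(u(x) * prob for x, prob in p.items()) - cost(n)
    return u_inverse(value)


lottery = {0.0: 0.5, 10.0: 0.5}  # expected value 5

# Agent A: risk neutral but complexity averse (assumed C(2) = 1).
ce_a = certainty_equivalent(
    lottery,
    u=lambda x: x,
    u_inverse=lambda v: v,
    cost=lambda n: 0.0 if n == 1 else 1.0,
)

# Agent B: risk averse (CARA, a = 0.05) with no complexity cost.
a = 0.05
ce_b = certainty_equivalent(
    lottery,
    u=lambda x: -math.exp(-a * x),
    u_inverse=lambda v: -math.log(-v) / a,
    cost=lambda n: 0.0,
)
```

Both certainty equivalents fall below the expected value of 5 (4.0 for agent A and roughly 4.38 for agent B), so an analyst who observes only the certainty equivalent of a two-outcome lottery cannot tell curvature of $u$ apart from a complexity cost.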

5.1.2. Risk Aversion and Intelligence. A surprising but repeated experimental finding is that people with higher cognitive ability are less risk averse (Dohmen, Falk, Huffman, and Sunde (2018), Dohmen, Falk, Huffman, and Sunde (2010), Rustichini, DeYoung, Anderson, and Burks (2016), Burks, Carpenter, Goette, and Rustichini (2009), Boyle, Yu, Buchman, Laibson, and Bennett (2011)). Simplicity theory offers a possible explanation: higher cognitive ability individuals may be less complexity averse. They may be more willing to consider larger outcome lotteries, for example because higher cognitive ability individuals have larger working memory capacity (discussed further in Section 5.4.1). As discussed in Section 5.1.1, lower complexity aversion would manifest as lower ‘risk aversion’ if certainty equivalent measures were used to elicit risk attitudes.

5.1.3. Risk Aversion and Cognitive Load. Another experimental finding is that, within the same person, risk aversion increases with cognitive load (Whitney, Rinehart, and Hinson (2008), Benjamin, Brown, and Shapiro (2013), Deck and Jahedi (2015), Schildberg-Hörisch (2018)). This is at odds with the popular conception of risk aversion as a person-specific characteristic. Simplicity theory reconciles the two as follows. It could be that higher cognitive load increases complexity aversion (i.e. the cost associated with larger outcome lotteries) without changing the curvature of the Bernoulli utility function.

5.2. Experiments. In addition to the psychology, marketing, and economics experiments described in the introduction, simplicity theory predicts further experimental phenomena.

5.2.1. Violations of Dominance as in the Uncertainty Effect. The uncertainty effect (Gneezy, List, and Wu (2006), Simonsohn (2009), Wang, Feng, and Keller (2013), Yang, Vosgerau, and Loewenstein (2013)) is a phenomenon in which agents prefer a dominated lottery. This dominated lottery has the feature that it contains only one outcome, whereas the dominating lottery contains two. Gneezy, List, and Wu (2006) argue that this phenomenon cannot be accounted for by prospect theory, cumulative prospect theory or expected utility. In contrast, a simplicity agent with C(2) much larger than C(1) would exhibit the uncertainty effect.

5.2.2. Event Splitting. By event splitting (Birnbaum (2008)), we refer to the following phenomenon: an agent prefers the lottery (outcome x with p1 probability, outcome y with 1 − p1 probability) to the lottery (outcome x with p1/2 probability, outcome x with p1/2 probability, outcome y with 1 − p1 probability). For example, an agent who prefers lottery A = ($4 with 50% probability, $8 with 50% probability) to the lottery B = ($4 with 25% probability, $4 with 25% probability, $8 with 50% probability) would be said to exhibit event splitting. As discussed by Bernheim and Sprenger (2019), event splitting is not accommodated by prospect theory, which predicts that agents would have the reverse preference: due to non-linear weighting of probabilities, a prospect theory agent would prefer lottery B to lottery A. Event splitting is also not accommodated by expected utility theory or cumulative prospect theory (Birnbaum, 2008). Simplicity theory can accommodate event splitting as follows. Let ε > 0 be arbitrarily small. Then the perturbed lottery B = ($4 with 25% probability, $4 + ε with 25% probability, $8 with 50% probability) has three outcomes. In contrast, lottery A has two outcomes. A simplicity agent with C(3) > C(2) would prefer lottery A to lottery B, for ε small enough. This would not be the case for an expected utility or cumulative prospect theory agent, because lottery B dominates lottery A. It would not be the case for a prospect theory agent for sufficiently small ε because of non-linearity of the probability weighting function. This approximate event splitting is tested in Bernheim and Sprenger (2019), who find, as simplicity theory predicts, that subjects tend to prefer two-outcome lotteries like A to dominating three-outcome lotteries like B.
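The approximate event-splitting comparison can be worked through with numbers. The sketch below assumes linear Bernoulli utility and a hypothetical cost function with C(2) = 0.1 and C(3) = 0.2; these values are illustrative and not taken from the paper.

```python
def simplicity_utility(p, cost):
    """Expected value (linear utility assumed) minus the support-size cost."""
    support = [x for x, prob in p.items() if prob > 0]
    return sum(x * p[x] for x in support) - cost[len(support)]


def expected_value(p):
    return sum(x * prob for x, prob in p.items())


cost = {1: 0.0, 2: 0.1, 3: 0.2}  # hypothetical complexity costs
epsilon = 0.01

lottery_a = {4.0: 0.5, 8.0: 0.5}
lottery_b = {4.0: 0.25, 4.0 + epsilon: 0.25, 8.0: 0.5}

utility_a = simplicity_utility(lottery_a, cost)
utility_b = simplicity_utility(lottery_b, cost)
b_has_higher_expected_value = expected_value(lottery_b) > expected_value(lottery_a)
```

Here lottery B gains only 0.25ε in expected value but pays the extra cost C(3) − C(2) = 0.1, so the agent prefers A whenever ε is below 0.4 under these assumed numbers, even though B dominates A.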

5.2.3. More mistakes as number of outcomes increases. As discussed in Section 4.2.3, the stochastic simplicity model with log-convex transformation function predicts that the agent makes more mistakes as the number of outcomes increases. This means, for example, that the agent chooses worse lotteries more often when picking between non-degenerate lotteries than when picking between certain outcomes. This prediction tallies with recent work by Martinez-Marquina, Niederle, and Vespa (2019), who find that subjects behave less optimally when put into an uncertain environment. Precisely, the experiment compares two scenarios. In one, the value of a firm is fixed and the agent decides how much she wants to buy by submitting her willingness to pay for the firm (‘deterministic treatment’). In the other scenario, the value of the firm has a 50% chance of being low and a 50% chance of being high, and the agent submits her willingness to pay for the firm, after which the value of the firm is realized and the agent receives her payoff (‘probabilistic treatment’). The finding is that the proportion of subjects submitting a payoff-maximizing willingness to pay (where the objective function is given to the subjects by the experimenters) is significantly lower in the probabilistic treatment than in the deterministic treatment. One maps this to stochastic simplicity theory by observing that any submitted willingness to pay induces a lottery over payoffs. In the probabilistic treatment, the payoff is stochastic, and in particular has twice as many possible outcomes as in the deterministic treatment. Observe that one does not need the full power of simplicity theory to accommodate this experimental result: it is enough for a theory to predict more mistakes on non-degenerate lotteries than on certain outcomes. Still, to our knowledge, existing stochastic choice theories such as random expected utility theory cannot accommodate this finding. Simplicity theory predicts it.

5.3. Complex Objects as Large-Support Lotteries. One interpretation of the above theory is that the agent is choosing between lotteries. This has been the interpretation used so far in the paper. Another interpretation is that the agent is choosing between objects. Objects are represented ‘as if’ they are lotteries, with complex objects mapping naturally to larger support lotteries.

Figure 3. Binary Option Application. The figure depicts the value of an underlying asset on the horizontal scale. Binary option B pays off $(S2 − S1) if the value of the underlying asset surpasses S2, and 0 otherwise. Buying a call option with strike price S1 and selling a call option with strike price S2 results in the payoff structure O: $0 if the value of the asset is below S1, $x if the value of the asset is S1 + x for x ∈ [0,S2 − S1], and $(S2 − S1) if the value of the asset is above S2. A simplicity agent may prefer the dominated binary option B to the dominating option mixture O.

5.3.1. Binary Options. A binary option is a financial product which works as follows. It gives the holder a specified amount of money if a specified event occurs on a specified date, and pays out no money if the event does not occur on that date. For example, a contract which gives its buyer $5 if the stock price of Apple is above $200 on March 1, 2020, is a binary option. A binary option can be thought of as a two-outcome lottery. Any binary option is dominated by an appropriate mixture of options. For example, suppose the agent buys a call option on Apple stock with a strike price of $195 and sells a call option with a strike price of $200, both with expiry March 1, 2020. This gives the agent $5 if the price of Apple exceeds $200, and a strictly positive amount of money if the price of Apple is between $195 and $200. This option mixture dominates the binary option; a more general example is presented in Figure 3. Simplicity theory predicts that some investors may have a higher certainty equivalent for the binary option lottery than for the option mixture lottery. In particular, this means they have a higher willingness to pay for the former. In the language of Figure 3, a simplicity agent may be willing to pay more for binary option B than option mixture O, although O gives weakly higher payoffs for every realization of the underlying asset price³.
That retail investors have an unusual attraction to binary options relative to other financial products is consistent with commentator analysis and with the history of binary options. When they became available to retail investors, binary options were exceedingly popular despite widespread fraud and predatory odds (Pape (2010), Murphy (2018), European Securities and Markets Authority Press Release (2018)).
To protect retail investors from buying binary options, the sale of binary options to retail investors was banned in Europe in 2018 (European Securities and Markets Authority Press Release, 2018) and in Israel in 2017 (Weinglass, 2017), with several countries including the UK and Denmark taking more stringent measures to permanently ban the products (Financial Conduct Authority Press Release (2019), Finanstilsynet Press Release (2019)). Commentators attributed the popularity of binary options among retail investors to several causes, including their simplicity. For example, the popular investment-education website Investopedia writes: “Binary options are deceptively simple to understand, making them a popular choice for low-skilled traders.” (Mitchell, 2019). Nadex, a US binary options exchange, writes: “If you think about it, binary options reflect the way we think about things in our daily life. Things either happen or they don’t. With a binary option, payouts reflect that and are always all or nothing at expiration. You’ll find we like to keep trading simple.” (Nadex, 2019).
In ongoing follow-up work (Goodman and Puri, 2020), we examine the retail market for binary options. Over the course of a year, traded dominated binary options are frequently more expensive than dominating option mixtures. We test for standard financial explanations including random noise, differential collateral requirements, investor knowledge, and different institutional details; our early results suggest that none of these can fully explain the price premium for binary options. If the binary option price premium arises from a higher willingness to pay for binary options on the part of retail investors, this higher willingness to pay contradicts prospect theory, cumulative prospect theory, expected utility theory, and any theory that respects first-order stochastic dominance. Simplicity theory can account for the results.

³ It may seem like O has a continuum of outcomes; in practice, its support is discrete because exchanges usually have discrete minimum price fluctuations for any given asset.
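The dominance relation between the option mixture O and the binary option B can be verified state by state. The following sketch uses the strikes from the Apple example and works in integer cents to avoid floating-point rounding; the one-cent tick size and the price grid are assumptions made only for illustration.

```python
def call_payoff(price: int, strike: int) -> int:
    """Payoff at expiry of a call option, in cents."""
    return max(price - strike, 0)


def mixture_payoff(price: int, s1: int, s2: int) -> int:
    """Option mixture O: long a call struck at s1, short a call struck at s2."""
    return call_payoff(price, s1) - call_payoff(price, s2)


def binary_payoff(price: int, s1: int, s2: int) -> int:
    """Binary option B: pays s2 - s1 if the price exceeds s2, and 0 otherwise."""
    return (s2 - s1) if price > s2 else 0


S1, S2 = 19_500, 20_000          # strikes of $195 and $200, in cents
prices = range(19_000, 20_501)   # hypothetical grid: $190.00 to $205.00, 1-cent ticks

# O pays at least as much as B at every price, and strictly more at some price.
weakly_dominates = all(mixture_payoff(x, S1, S2) >= binary_payoff(x, S1, S2) for x in prices)
strictly_somewhere = any(mixture_payoff(x, S1, S2) > binary_payoff(x, S1, S2) for x in prices)

# Support sizes of the induced payoff lotteries (ignoring probabilities).
n_binary = len({binary_payoff(x, S1, S2) for x in prices})
n_mixture = len({mixture_payoff(x, S1, S2) for x in prices})
print(weakly_dominates, strictly_somewhere, n_binary, n_mixture)
```

On this grid the binary option induces a two-outcome lottery while the mixture induces a lottery with hundreds of outcomes, which is the contrast a complexity cost on support size penalizes.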

5.4. Cognitive Psychology.

5.4.1. Predictions Arising from Relationship to Working Memory. Working memory refers to the system humans use to temporarily hold and process information (Baddeley, Eysenck, and Anderson, 2015). In normal circumstances, it is thought to have limited capacity (Baddeley, Eysenck, and Anderson, 2015). Capacity is measured in ‘chunks,’ which are defined in the cognitive psychology literature to be ‘meaningful items’ (Cowan, 2010). Early research put the capacity at seven chunks (Miller, 1956) while contemporary research puts capacity nearer to four (Cowan (2001), Gilchrist et al. (2008), Baddeley, Eysenck, and Anderson (2015)). If the task at hand requires storage of information above working memory capacity, recent research appears to suggest that additional cognitive resources are recruited, at weakly positive cognitive cost (Jaeggi et al., 2007).
Map this to simplicity theory as follows. Each (outcome, probability) pair in a lottery is a chunk. Storing more chunks is costly in that it crowds out storage of other information. Storage of chunks beyond the usual capacity of working memory also requires additional cognitive cost. The complexity cost function C(i) in Representation 1 then reflects the cumulative cost of storing i chunks.
This interpretation of complexity cost ties into puzzling findings on decision making under uncertainty. Working memory capacity tends to be higher in individuals with greater cognitive ability (Fukuda et al. (2010), Shipstead et al. (2016)). This is consistent with such individuals exhibiting lower risk aversion (Section 5.1.2). Working memory capacity decreases under cognitive load by definition. Simplicity theory therefore predicts that the same individual will appear more risk averse as cognitive load increases (Section 5.1.3).
This interpretation of simplicity theory also generates predictions about which individuals are likely to prefer simplicity. In particular, specialists choosing between familiar lotteries may be less likely to prefer simplicity. Individuals making unfamiliar decisions may be more likely to prefer simplicity. This is because a ‘meaningful unit’ is different for a layperson than for an expert performing tasks in her area of expertise (Chase and Ericsson (1982), Anderson (1995), Curby and Gauthier (2007)). For example, a hedge fund employee may be so familiar with options that, to her, an option is a single meaningful unit. In contrast, a retail investor less familiar with options may regard each (outcome, probability) pair as a chunk. The retail investor would then prefer simplicity in investing, while the hedge fund employee may not.

6. Extensions

As discussed earlier, a simplicity agent trades off quality of the lottery, as measured by its expected utility value, and the cognitive effort of considering the lottery. Of course, there may be settings in which trading off goodness and complexity may be less relevant; there may be applications we have not explored; and there may be more involved models we could have used. This section explores each of these in turn. In Section 6.1, we generalize the simplicity model in a way that may be more appropriate in certain settings, for example, settings in which one lottery is obviously better than another. Section 6.2 discusses further possible extensions which may be of interest, including: how to combine simplicity theory and prospect theory (Section 6.2.2), how to think about simplicity theory and time-based decisions (Section 6.2.1), and how to apply simplicity theory to search behavior (Section 6.2.3).

6.1. Weak Simplicity Representation. In this section we discuss two settings where a simplicity model may not be appropriate, and provide an extension which may be better able to accommodate them. This extension is termed a Weak Simplicity Representation (Representation 2). In it, an agent’s cognitive cost may depend on the support of a lottery in general, rather than on the support size only.

Representation 2. When $X = \mathbb{R}$, a preference $\succsim$ on $\Delta X$ is said to have a Weak Simplicity Representation if it can be represented by a function
\[ U(p) = \sum_{x \in \mathrm{support}(p)} u(x)\,p(x) - C(\mathrm{support}(p)), \]
where $u : X \to \mathbb{R}$, termed the Bernoulli utility, is continuous, unbounded below, and strictly increasing, and $C : \{Z\}_{Z \subseteq X \text{ is finite}} \to \mathbb{R}$ satisfies:
[1] $C(1) = 0$.
[2] $C$ is monotonic: if $\mathrm{support}(p) \subseteq \mathrm{support}(q)$, then $C(\mathrm{support}(p)) \le C(\mathrm{support}(q))$.

The axioms corresponding to a weak simplicity representation are identical to those corresponding to Definition 3 in Section 3.5, with Axiom 2 (Cross-Support Independence) replaced by Axiom 11 (Weak Cross-Support Independence) and Axiom 3 (Semi-Continuity) replaced by Axiom 12 (Weak Semi-Continuity). The only difference in Axiom 12 (Weak Semi-Continuity) relative to Axiom 3 (Semi-Continuity) is that it refers to support rather than support size. The difference between Axiom 11 (Weak Cross-Support Independence) and Axiom 2 (Cross-Support Independence) is (1) that it now refers only to certainty equivalents rather than to all lotteries, and (2) that it does not include the final statement of the axiom.

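Axiom 11 (Weak Cross-Support Independence). Consider lotteries $p, q \in \Delta X$ which have the same support. Then, for any $\alpha \in (0, 1)$, any lottery $r$ satisfying $\mathrm{support}(r) = \mathrm{support}(p) \cup \mathrm{support}(q) \cup \{x, y\}$, and any outcomes $x, y \in X$,
\[ p \sim \delta_x \ \text{and}\ \delta_y \sim q \implies \alpha\left(\tfrac{1}{2} p + \tfrac{1}{2} \delta_y\right) + (1 - \alpha) r \sim \alpha\left(\tfrac{1}{2} \delta_x + \tfrac{1}{2} q\right) + (1 - \alpha) r. \]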

Axiom 12 (Weak Semi-Continuity). Consider a sequence of lotteries $\{p_n\}$, where all the $p_n$ have the same support. If $p_n \to p$, then, for any lottery $q$, if $p_n \succsim q$ for all $n$, then $p \succsim q$.

If, in addition, $p$ has the same support as the $p_n$, then, for any lottery $q$, if $q \succsim p_n$ for all $n$, then $q \succsim p$.

Having weakened the independence axioms, we add another axiom which was earlier implied by Axiom 11 (Weak Cross-Support Independence).

Axiom 13 (Pairwise Consistency). Given $x, y \in X$,
\[ \delta_x \succsim \delta_y \iff \tfrac{2}{3}\delta_x + \tfrac{1}{3}\delta_y \succsim \tfrac{1}{3}\delta_x + \tfrac{2}{3}\delta_y. \]
In words: given a lottery with two outcomes, the agent prefers to put more weight on the outcome she values more.

Theorem 5 (Weak Simplicity Axiomatization on Money Lotteries). A weak order ≿ on ∆X admits a Weak Simplicity Representation if and only if it satisfies Axioms 1 (Same-Support Independence), 5 (More Money is Better), 6 (Unboundedness), 7 (Singleton Continuity), 11 (Weak Cross-Support Independence), 12 (Weak Semi-Continuity), and 13 (Pairwise Consistency).

After replacing |support(p)| by support(p), the uniqueness and comparative static properties are identical to those in Section 3.5.1, with identical proofs. Two settings which may be better accommodated by a Weak Simplicity model rather than a Simplicity model are as follows. The first is when one lottery is obviously better than another. For our definition of obvious, we use the concept of working memory, which can be thought of as one micro-foundation of the cost function (Section 5.4.1). We define a relationship between two lotteries as obvious if the comparison requires no working memory, i.e. no computation. Note that this notion is much more demanding than the notion of first-order stochastic dominance, the determination of which may take a good deal of computation and hence working memory.

Definition 9. Suppose X is endowed with an ordering ≥. We say lottery p ∈ ∆X is obviously better than lottery q ∈ ∆X if, for all xp ∈ support(p) and xq ∈ support(q), xp > xq. We denote this relationship between p and q by support(p) > support(q).

This definition says that if one lottery’s support is strictly higher than another lottery’s support, the first lottery is obviously better. The intuition behind this definition is that, if lotteries have overlapping support, then the agent has to consider the probabilities involved in each lottery, which may consume working memory. The Weak Simplicity Representation is able to respect the notion of obviously better. For example, it allows C(support(p)) = C(support(q)) if support(p) > support(q). The agent would then act like an expected utility agent over p and q and therefore will choose the obviously better lottery. For situations in which the agent is selecting between lotteries p and q where neither is obviously better than the other, the Weak Simplicity Representation allows C(support(q)) ≤ C(support(p)) if support(q) ⊆ support(p), respecting the spirit of the Simplicity Representation while still allowing the cost function to respect the notion of obviously better for cases in which that is relevant. The second setting in which a Weak Simplicity model may be preferred to a Simplicity model is when one wishes to treat many outcomes which are close to each other as a single outcome. For example, perhaps a lottery which gives $1 with 40% probability and $3.99, $4.00, and $4.01 each with 20% probability should be thought of as having two outcomes rather than four. The Weak Simplicity Representation allows such grouping, for example, $C(\mathrm{support}(p)) = \frac{\max \mathrm{support}(p) - \min \mathrm{support}(p)}{0.5}$, the smallest number of intervals of length 0.5 needed to cover the support of p. As another example, a degenerate lottery which pays off a year from now has a large support if one accounts for uncertainty over the inflation rate; by using a Weak Simplicity Representation rather than a Simplicity Representation, one can treat the lottery as having one outcome instead of many.
Thus, a Weak Simplicity Representation is able to accommodate small perturbations to payoffs.
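One way to make the grouping idea concrete is sketched below. It implements the verbal description (the smallest number of intervals of length 0.5 needed to cover the support) with a greedy left-to-right cover, which is one possible reading of the example and not necessarily the cost function intended in the text. Subtracting one group so that singletons cost zero, the unit cost, and the function names are all assumptions introduced for illustration.

```python
from typing import Iterable


def grouped_support_size(support: Iterable[float], width: float = 0.5) -> int:
    """Smallest number of closed intervals of length `width` covering the points.

    Greedy rule: open a new interval at the smallest point not yet covered.
    """
    points = sorted(support)
    if not points:
        raise ValueError("a lottery must have a non-empty support")
    count = 0
    right_edge = None
    for x in points:
        if right_edge is None or x > right_edge + 1e-12:
            count += 1
            right_edge = x + width
    return count


def weak_cost(support: Iterable[float], unit_cost: float = 1.0, width: float = 0.5) -> float:
    """Hypothetical set-based cost: proportional to (number of groups - 1)."""
    return unit_cost * (grouped_support_size(support, width) - 1)


lottery_support = [1.0, 3.99, 4.00, 4.01]
print(grouped_support_size(lottery_support))  # number of groups for the example lottery
print(weak_cost(lottery_support), weak_cost([4.00]))
```

Because removing outcomes can never increase the number of intervals needed, this cost is monotonic with respect to set inclusion, as condition [2] of Representation 2 requires.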

6.2. Directions for Future Research.

6.2.1. Simplicity Theory with Time Data. Suppose that, similar to Cerreia-Vioglio et al. (2018), one has data on the agent’s choice over lotteries $\succsim_t$ in which she is forced to pick between any two lotteries under exogenous time limit $t$. In this case, one could think about axiomatizing and exploring the properties of a model where $\succsim_t$ can be represented by
\[ U_t(p) = \sum_{x \in \mathrm{support}(p)} u(x)\,p(x) - C(|\mathrm{support}(p)|)\exp(-t), \]
where $C$ is non-negative and increasing. This representation says that, as $t \to \infty$, i.e. as the agent has infinite time to make her decision, she behaves similarly to an expected utility agent. But when she has to choose under time pressure, that is, when $t$ is small, she behaves like a simplicity agent. In this model, violations of dominance come both from possible cognitive mistakes, as sketched in Section 5.4.1, and from time pressure.
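A short sketch of this time-dependent representation shows how a dominance violation can disappear as the time limit grows. The linear utility, the cost table, and the lotteries are hypothetical choices made only to illustrate the comparative static.

```python
import math
from typing import Dict

Lottery = Dict[float, float]  # maps outcome -> probability

COST = {1: 0.0, 2: 60.0}  # hypothetical, non-negative and increasing


def time_utility(p: Lottery, t: float) -> float:
    """Expected payoff minus C(|support|) * exp(-t), with linear utility."""
    support = [x for x, prob in p.items() if prob > 0]
    expected = sum(x * p[x] for x in support)
    return expected - COST[len(support)] * math.exp(-t)


SURE = {50.0: 1.0}                # dominated one-outcome lottery
RISKY = {50.0: 0.5, 100.0: 0.5}   # dominating two-outcome lottery


def choice(t: float) -> str:
    """Which lottery the agent picks under time limit t."""
    return "risky" if time_utility(RISKY, t) > time_utility(SURE, t) else "sure"


for t in (0.0, 0.5, 2.0, 10.0):
    print(t, choice(t))
```

Under severe time pressure the agent picks the dominated sure amount, while with a generous time limit the complexity penalty shrinks toward zero and the choice coincides with that of an expected utility agent.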

6.2.2. Combining Simplicity Theory and Prospect Theory. Whereas prospect theory (Tversky and Kahneman (1979)) has a probability weighting function and loss aversion, simplicity theory as developed does not. Simplicity theory instead penalizes the number of outcomes in a lottery. One can distinguish between the two theories in a number of ways; one such way is event-splitting effects, as discussed in Section 5.2.2. Other ways are testing the simplicity axioms in the laboratory; testing the uncertainty effect (Section 5.2.1); or testing the prediction that households will make particular dominated financial decisions (Section 5.3.1).

If one wanted to combine prospect theory and the complexity cost function, one way of doing so is to have a probability weighting function, loss aversion, and a penalty for the support of the lottery. That is, letting $X = \{b_1, \dots, b_n, y_1, \dots, y_m\} \subseteq \mathbb{R}$ with $b_1 < \dots < b_n < 0 \le y_1 < \dots < y_m$, the utility of a lottery $p \in \Delta X$ evaluates to
\[ \sum_{x \in X : x > 0} u^{+}(x)\,w(p(x)) + \sum_{x \in X : x < 0} u^{-}(x)\,w(p(x)) - C(|\mathrm{support}(p)|), \]
where $u^{+}$ denotes the Bernoulli utility used for gains, $u^{-}$ the Bernoulli utility used for losses, $w$ the probability weighting function, and $C$ the complexity cost function. Similarly, for cumulative prospect theory (Tversky and Kahneman (1992)),
\[ U(p) = \sum_{i=1}^{m} w(\Pr(p \ge y_i))\,(u(y_i) - u(y_{i-1})) + \sum_{j=1}^{n} w(\Pr(p \le b_j))\,(u(b_j) - u(b_{j+1})) - C(|\mathrm{support}(p)|). \]
To combine prospect theory with weak simplicity theory (Section 6.1), replace $C(|\mathrm{support}(p)|)$ with $C(\mathrm{support}(p))$ in the above expressions. We leave detailed exploration of the theoretical properties of such a combination to future research.
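The first combined expression can be sketched in a few lines. The inverse-S weighting function, the power value function, and the parameter values below are illustrative choices from a commonly used parametric family; the text does not commit to any functional form, and the cost tables are hypothetical.

```python
from typing import Dict

Lottery = Dict[float, float]  # maps outcome -> probability

# Illustrative parameter values (assumptions, not taken from the text).
GAMMA, ALPHA, LAMBDA = 0.61, 0.88, 2.25


def w(prob: float) -> float:
    """Inverse-S probability weighting function (assumed form)."""
    num = prob ** GAMMA
    return num / (num + (1.0 - prob) ** GAMMA) ** (1.0 / GAMMA)


def value(x: float) -> float:
    """u+ for gains and u- for losses, with loss aversion LAMBDA (assumed form)."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** ALPHA


def pt_with_complexity(p: Lottery, cost: Dict[int, float]) -> float:
    """Sum of value(x) * w(p(x)) over the support, minus C(|support(p)|)."""
    support = [x for x, prob in p.items() if prob > 0]
    return sum(value(x) * w(p[x]) for x in support) - cost[len(support)]


A = {4.0: 0.5, 8.0: 0.5}
B = {4.0: 0.25, 4.01: 0.25, 8.0: 0.5}    # approximate split of the $4 branch

NO_COST = {1: 0.0, 2: 0.0, 3: 0.0}       # pure prospect theory
WITH_COST = {1: 0.0, 2: 0.5, 3: 1.5}     # hypothetical complexity costs

print(pt_with_complexity(A, NO_COST), pt_with_complexity(B, NO_COST))
print(pt_with_complexity(A, WITH_COST), pt_with_complexity(B, WITH_COST))
```

With no complexity cost the split lottery B is ranked above A because the weighting function is subadditive at these probabilities, matching the prospect theory prediction discussed in Section 5.2.2; a sufficiently large gap C(3) − C(2) reverses the ranking.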

6.2.3. Applying Simplicity Theory to Accommodate Satisficing Behavior. Simplicity theory may accommodate under-search behavior as follows. Suppose that the agent is performing search in which, should he continue searching, he loses his current best option. Then the action ‘do not search’ is a degenerate lottery whose payoff is the current best option. The option ‘search’ is a lottery with many possible outcomes. A simplicity agent will be biased towards ‘do not search,’ and in particular will always under-search relative to an expected utility agent. This behavior may have applications to the default bias, a tendency to choose the default option rather than searching for an alternative (Madrian and Shea (2001), Thaler and Sunstein (2003), Johnson and Goldstein (2003), Thaler and Benartzi (2004)). For example, the working-memory interpretation of simplicity theory (Section 5.4.1) predicts that when under higher cognitive load, agents choose the default option more often as their penalization for large-outcome lotteries increases. This association is borne out in the data: agents choose the default option more when under higher cognitive load (Huh, Vosgerau, and Morewedge (2014)).
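The under-search claim amounts to a shift in the stopping threshold, which the sketch below illustrates. The search lottery, the linear utility, and the linear cost C(n) = c(n − 1) are hypothetical; any cost with C(1) = 0 and C(n) ≥ 0 would give the same qualitative conclusion.

```python
from typing import Dict

Lottery = Dict[float, float]  # maps outcome -> probability

# Hypothetical lottery induced by searching: four equally likely offers.
SEARCH = {6.0: 0.25, 10.0: 0.25, 14.0: 0.25, 18.0: 0.25}
UNIT_COST = 1.0  # hypothetical cost per additional outcome


def expected_payoff(p: Lottery) -> float:
    return sum(x * prob for x, prob in p.items())


def complexity_cost(n_outcomes: int) -> float:
    """Assumed cost function with C(1) = 0."""
    return UNIT_COST * (n_outcomes - 1)


def eu_agent_searches(current_best: float) -> bool:
    # 'Do not search' is a degenerate lottery paying current_best.
    return expected_payoff(SEARCH) > current_best


def simplicity_agent_searches(current_best: float) -> bool:
    # The degenerate lottery has one outcome, so its complexity cost is zero.
    return expected_payoff(SEARCH) - complexity_cost(len(SEARCH)) > current_best


for best in (8.0, 10.0, 13.0):
    print(best, eu_agent_searches(best), simplicity_agent_searches(best))
```

Whenever the simplicity agent searches, the expected utility agent does too, but not conversely: for current-best values between the two thresholds only the expected utility agent continues searching.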

7. Conclusion

In response to experimental evidence across several literatures, this paper introduced simplicity theory. In simplicity theory, agents assign a complexity cost to lotteries with more outcomes. We introduced and axiomatized a representation capturing this idea (Sections 3.2, 3.3) and studied its theoretical properties including uniqueness (Section 3.4), comparative statics (Section 3.4), and curvature (Section 3.6), and provided parametric examples (Section 3.6). Section 4 introduced stochastic simplicity theory, including identification and comparative statics results. While the axioms of the stochastic and deterministic models are similar, the stochastic model allows more nuance in predictions. For example, the stochastic model allows for the frequency of mistakes to increase as the number of outcomes increases.

Section 5 further discussed the relationship between simplicity theory and laboratory and real-world behavior. We showed that simplicity theory predicts several classes of laboratory experiments (Section 5.2), as well as subpar decisions regarding financial products (Section 5.3.1). We argued that simplicity theory predicts puzzling stylized facts about risk aversion, including a relationship between risk aversion and intelligence, and risk aversion and cognitive load (Section 5.1). We mapped simplicity theory to research on working memory (Section 5.4).
In Section 3.5, we provided both axioms and conditions on the simplicity representation which would guarantee that every lottery has a certainty equivalent. We used these certainty equivalents to provide further uniqueness and comparative statics results (Section 3.5.1), and to study the cost function (Section 3.6).
Section 6 discussed extensions to more general models. For example, weak simplicity theory (Section 6.1), wherein agents prefer lotteries with fewer outcomes but may do so asymmetrically, accommodates settings in which one choice is obviously better than another.
Simplicity theory is not only consistent with much prior research; it also generates a large number of testable predictions. There are many possible extensions to this paper. First, one could test whether people behave according to simplicity theory, for example by testing its axioms (Section 3.3) or by testing the predictions described in Section 5. Second, one could explore the extent to which a preference for simplicity affects pricing in real-world markets. Third, simplicity theory can be extended to dynamic settings. Fourth, simplicity theory can be incorporated into existing models where expected utility is used, with an eye towards examining how predictions change. Further extensions are sketched in Section 6.2. These include combining prospect theory with simplicity theory; applying simplicity theory to satisficing behavior; and modeling an as-if preference for simplicity under time pressure.

References

Anagol, Santosh, Shawn Cole, and Shayak Sarkar. 2016. Understanding the Advice of Commissions-Motivated Agents: Evidence from the Indian Life Insurance Market, Review of Economics and Statistics 99.
Anderson, J. R. 1995. Cognitive Psychology and its Implications (4th Ed.), W. H. Freeman and Company.
Andreoni, James and Charles Sprenger. 2012. Uncertainty Equivalents: Testing the Limits of the Independence Axiom, Working Paper.
Anh, David and Todd Sarver. 2013. Preference for flexibility and random choice, Econometrica 81.
Baddeley, Alan, Michael W. Eysenck, and Michael C. Anderson. 2015. Memory, Psychology Press.
Barsky, Robert B., F. Thomas Juster, Miles S. Kimball, and Matthew D. Shapiro. 1997. Preference parameters and behavioral heterogeneity: an experimental approach in the health and retirement study, Quarterly Journal of Economics 112.
Becker, Gordon M, Morris H. DeGroot, and Jacob Marschak. 1964. Measuring utility by a single-response sequential method, Behavioral Science 9.
Benjamin, Daniel J., Sebastian A. Brown, and Jesse M. Shapiro. 2013. Who Is ‘Behavioral’? Cognitive Ability and Anomalous Preferences, Journal of the European Economic Association 11.
Bernheim, B. Douglas and Charles Sprenger. 2019. Direct tests of cumulative prospect theory, Working Paper.
Birnbaum, Michael H. 2008. New paradoxes of risky decision making, Psychological Review 115.
Block, H.D and J Marshak. 1960. Random orderings and stochastic theories of responses, In: Olkin, I. (ed.) Contributions to Probability and Statistics, Stanford, Stanford University Press, 97-132.
Bordalo, Pedro, Nicola Gennaioli, and Andrei Shleifer. 2012. Salience Theory of Choice Under Risk, Quarterly Journal of Economics 127.
Boyle, Patricia A, Lei Yu, Aron S Buchman, David I Laibson, and David A Bennett. 2011. Cognitive function is associated with risk aversion in community-based older persons, BMC Geriatrics 11.
Burks, Stephen V., Jeffrey P Carpenter, Lorenz Goette, and Aldo Rustichini. 2009. Cognitive skills affect economic preferences, strategic behavior, and job attachment, PNAS 106.
Callen, Michael, Mohammad Isaqzadeh, James D. Long, and Charles Sprenger. 2014. Violence and Risk Preference: Experimental Evidence from Afghanistan, American Economic Review 104, 123-148.
Cerreia-Vioglio, Simone, David Dillenberger, and Pietro Ortoleva. 2015. Cautious expected utility and the certainty effect, Econometrica 83, 693-728.
Cerreia-Vioglio, Simone, Fabio Maccheroni, Massimo Marinacci, and Aldo Rustichini. 2018. Multinomial logit processes and preference discovery: inside and outside the black box, Working Paper.
Chase, W. G. and K. A. Ericsson. 1982. Skill and working memory, In G. H. Bower (Ed.), The psychology of learning and motivation (Vol. 16).
Chernev, Alexander, Ulf Böckenhold, and Joseph Goodman. 2015. Choice overload: a conceptual review and meta-analysis, Journal of Consumer Psychology 25.
Cohen, Michéle. 1992. Security level, potential level, expected utility: A three-criteria decision model under risk, Theory and Decision 2.
Cohen, Michéle and Jean-Yves Jaffray. 1988. Certainty Effect Versus Probability Distortion: An Experimental Analysis of Decision Making Under Risk, Journal of Experimental Psychology 14, 554-560.

Cowan, N. 2001. The magical number 4 in short-term memory: A reconsideration of mental storage capacity, Behavioral and Brain Sciences 24.
Cowan, Nelson. 2010. The Magical Mystery Four: How is Working Memory Capacity Limited, and Why?, Current Directions in Psychological Science 19.
Curby, Kim M. and Isabel Gauthier. 2007. A visual short-term memory advantage for faces, Psychonomic Bulletin & Review 14.
Deck, Cary and Salar Jahedi. 2015. The effect of cognitive load on economic decision making: A survey and new experiments, European Economic Review 78, 97-119.
Dohmen, Thomas, , David Huffman, and Uwe Sunde. 2010. Are risk aversion and impatience related to cognitive ability?, American Economic Review 100.
---. 2018. On the relationship between cognitive ability and risk preference, Journal of Economic Perspectives 32.
Egan, Mark. 2018. Brokers vs. Retail Investors: Conflicting Interests and Dominated Products, Journal of Finance, forthcoming.
Ergin, Haluk. 2003. Costly contemplation, Mimeo.
Ergin, Haluk and Todd Sarver. 2010. A Unique Costly Contemplation Representation, Econometrica 28, 1285-1339.
Etchart-Vincent, Nathalie. 2009. Probability weighting and the ‘level’ and ‘spacing’ of outcomes: An experimental study over losses, Journal of Risk and Uncertainty 39, 45-63.
European Securities and Markets Authority Press Release. 2018. ESMA renews binary options prohibition for a further three months from 2 January 2019, https://www.esma.europa.eu/press-news/esma-news/esma-renews-binary-options-prohibition-further-three-months-2-january-2019.
Financial Conduct Authority Press Release. 2019. FCA confirms permanent ban on the sale of binary options to retail consumers, https://www.fca.org.uk/news/statements/fca-confirms-permanent-ban-sale-binary-options-retail-consumers.
Fehr-Duda, Helga, Adrian Bruhin, Thomas Epper, and Renate Schubert. 2010. Rationality on the rise: Why relative risk aversion increases with stake size, Journal of Risk and Uncertainty 40, 147-180.
Fehr-Duda, Helga and Thomas Epper. 2012. Probability and Risk: Foundations and Economic Implications of Probability-Dependent Risk Preferences, Annual Review of Economics 4, 567-593.
Finanstilsynet Press Release. 2019. Finanstilsynet planlægger forbud mod binære optioner for at beskytte detailkunder, https://www.finanstilsynet.dk/Nyheder-og-Presse/Pressemeddelelser/2019/Forbud-mod-binaere-optioner-for-at-beskytte-detailkunder.
Frick, Mira, Ryota Iijima, and Tomasz Strzalecki. 2019. Dynamic Random Utility, Econometrica 87.
Fudenberg, Drew and Tomasz Strzalecki. 2015. Dynamic Logit with Choice Aversion, Econometrica 83.
Fukuda, K, E Vogel, U Mayr, and E Awh. 2010. Quantity, not quality: the relationship between fluid intelligence and working memory capacity, Psychonomic Bulletin and Review 17.
Gilboa, Itzhak. 1988. A combination of expected utility and maxmin decision criteria, Journal of Mathematical Psychology 32.
Gilchrist, AL, N Cowan, and M Naveh-Benjamin. 2008. Working memory capacity for spoken sentences decreases with adult aging: Recall of fewer, but not smaller chunks in older adults, Memory 16.
Gneezy, Uri, John A. List, and George Wu. 2006. The uncertainty effect: when a risky prospect is valued less than its worst possible outcome, The Quarterly Journal of Economics 121.

Goodman, Aaron and Indira Puri. 2020. Overpaying for Binary Options: Preference for Simplicity in Retail Markets, Working Paper.
Gul, Faruk and Wolfgang Pesendorfer. 2006. Random Expected Utility, Econometrica 74.
Harless, David W. and Colin F. Camerer. 1994. The Predictive Utility of Generalized Expected Utility Theories, Econometrica 62.
Herstein, I.N. and John Milnor. 1953. An Axiomatic Approach to Measurable Utility, Econometrica 21, 291-297.
Huck, Steffan and Georg Weizsäcker. 1999. Risk, complexity, and deviations from expected-value maximization: results of a lottery choice experiment, Journal of Economic Psychology 20, 699-715.
Huh, Young Eun, Joachim Vosgerau, and Carey K. Morewedge. 2014. Social Defaults: Observed Choices Become Choice Defaults, Journal of Consumer Research 41, 746-760.
Iyengar, Sheena S. and Emir Kamenica. 2010. Choice proliferation, simplicity seeking, and asset allocation, Journal of Public Economics 94, 530-539.
Iyengar, Sheena S. and Mark R. Lepper. 2000. When Choice is Demotivating: Can One Desire Too Much of a Good Thing?, Journal of Personality and Social Psychology 79.
Jaffray, Jean-Yves. 1988. Choice under Risk and the Security Factor: An Axiomatic Model, Theory and Decision 24.
Jaeggi, Susanne M., Martin Buschkuehl, Alex Etienne, Christoph Ozdoba, Walter J. Perrig, and Arto C. Nirkko. 2007. On how high performers keep cool brains in situations of cognitive overload, Cognitive, Affective, & Behavioral Neuroscience 7.
Johnson, Eric J. and Daniel Goldstein. 2003. Do defaults save lives?, Science 302.
Luce, R. Duncan. 1959. Individual Choice Behavior, John Wiley.
Mador, Galit, Doron Sonsino, and Uri Benzion. 2000. On complexity and lotteries’ evaluation – three experimental observations, Journal of Economic Psychology 21, 625-637.
Madrian, Bridget and David Shea. 2001. The power of suggestion: inertia in 401(k) participation and savings behavior, Quarterly Journal of Economics 117.
Martinez-Marquina, Alejandro, Muriel Niederle, and Emanuel Vespa. 2019. Failures in Contingent Reasoning: The Role of Uncertainty, American Economic Review 109.
Miller, George A. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information, Psychological Review 63.
Mitchell, Cory. 2019. What You Need To Know About Binary Options Outside the U.S, https://www.investopedia.com/articles/optioninvestor/10/binary-options.asp.
Moffatt, Peter G., Stefania Sitzia, and Daniel John Zizzo. 2015. Heterogeneity in preferences towards complexity, Journal of Risk and Uncertainty 51, 147-170.
Murota, Kazuo. 2018. Main Features of Discrete Convex Analysis, COSS 2018.
Murphy, Hannah. 2018. Binary options trading is dead - though few will mourn it, Financial Times.
Nadex. 2019. What are Binary Options and How Do They Work?, https://www.nadex.com/products/binary-options/what-are-binary-options.
Neilson, William S. 1992. Some mixed results on boundary effects, Economics Letters 39, 275-278.
Ortoleva, Pietro. 2013. The price of flexibility: towards a theory of thinking aversion, Journal of Economic Theory 148, 903-934.
Pape, Gordon. 2010. Don’t Gamble On Binary Options, Forbes.

Rustichini, Aldo, Colin G. DeYoung, Jon E. Anderson, and Stephen V. Burks. 2016. Toward the integration of personality theory and decision theory in explaining economic behavior: an experimental investigation, Journal of Behavioral and Experimental Economics 64, 122-137.
Scheibehenne, Benjamin, Rainer Greifeneder, and Peter M. Todd. 2010. Can there ever be too many options? A meta-analytic review of choice overload, Journal of Consumer Research 37.
Schildberg-Hörisch, Hannah. 2018. Are risk preferences stable?, Journal of Economic Perspectives 32.
Schmidt, Ulrich. 1998. A Measurement of the Certainty Effect, Journal of Mathematical Psychology 42, 32-47.
Shipstead, Zach, Tyler L Harrison, and Randall W Engle. 2016. Working Memory Capacity and Fluid Intelligence: Maintenance and Disengagement, Perspectives on Psychological Science 11.
Simonsohn, Uri. 2009. Direct Risk Aversion: Evidence from Risky Prospects Valued Below Their Worst Outcome, Psychological Science 20.
Sonsino, Doron, Uri Benzion, and Galit Mador. 2002. The complexity effects on choice with uncertainty – experimental evidence, The Economic Journal 112, 936-965.
Sonsino, Doron and Mandelbaum Marvin. 2001. On preference for flexibility and complexity aversion: experimental evidence, Theory and Decision 51.
Starmer, Chris. 2000. Developments in Non-Expected Utility Theory: The Hunt for a Descriptive Theory of Choice Under Risk, Journal of Economic Literature 32.
Thaler, Richard H. 1985. Mental Accounting and Consumer Choice, Marketing Science 4.
Thaler, Richard H. and Shlomo Benartzi. 2004. Save More TomorrowTM: Using to Increase Employee Saving, Journal of 112.
Thaler, Richard H. and Cass Sunstein. 2003. Libertarian paternalism, American Economic Review 93.
Tversky, Amos and . 1979. Prospect Theory: An Analysis of Decision under Risk, Econometrica 47.
. 1981. The Framing of Decisions and the Psychology of Choice, Science 211.
. 1986. Rational Choice and the Framing of Decisions, The Journal of Business 59.
. 1992. Advances in Prospect Theory: Cumulative Representation of Uncertainty, Journal of Risk and Uncertainty 5, 297-323.
Tversky, Amos and Eldar Shafir. 1992. Choice under conflict: the dynamics of deferred decision, Psychological Science 3.
Wang, Yitong, Tianjun Feng, and L. Robin Keller. 2013. A further exploration of the uncertainty effect, Journal of Risk and Uncertainty 47, 291-310.
Weinglass, Simona. 2017. Israel bans binary options industry, finally closing vast, 10-year fraud, Times of Israel.
Whitney, Paul, Christa A Rinehart, and John M. Hinson. 2008. Framing effects under cognitive load: the role of memory in risky decisions, Psychonomic Bulletin & Review 15, 1179-1184.
Yang, Yang, Joachim Vosgerau, and . 2013. Framing influences willingness to pay but not willingness to accept, Journal of Marketing Research L, 725-738.

Appendix A. Proofs

Contents
A.1. Tools and Terminology
A.2. Proof of Theorem 1
A.3. Proof of Proposition 1
A.4. Proof of Proposition 2
A.5. Proof of Proposition 3
A.6. Proof of Theorem 3
A.7. Proof of Theorem 4
A.8. Proof of Proposition 7
A.9. Proof of Proposition 8
A.10. Proof of Proposition 9
A.11. Proof of Proposition 10 and Corollary 2

A.1. Tools and Terminology. Our proofs will utilize several classical results. The first is the mixture set representation result of Herstein and Milnor (1953).

Definition 10. A mixture set is a set $\Pi$ endowed with a family of operations $(h_\alpha)_{\alpha \in [0,1]}$ such that $h_\alpha : \Pi^2 \to \Pi$ and, for all $p, r \in \Pi$,
(1) $h_1(p, r) = p$.
(2) $h_\alpha(p, r) = h_{1-\alpha}(r, p)$.
(3) $h_\alpha(h_\beta(p, r), r) = h_{\alpha\beta}(p, r)$.
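As an illustration of Definition 10, the following minimal sketch checks the three mixture-set conditions for the usual convex combination of finite lotteries. The dictionary encoding of lotteries and the function name `mix` are assumptions made here for illustration, not objects from the paper; exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction as F

def mix(alpha, p, r):
    """h_alpha(p, r) = alpha*p + (1 - alpha)*r for lotteries stored as {outcome: probability}."""
    outcomes = set(p) | set(r)
    mixed = {x: alpha * p.get(x, 0) + (1 - alpha) * r.get(x, 0) for x in outcomes}
    # Drop zero-probability outcomes so that equal lotteries compare equal as dicts.
    return {x: w for x, w in mixed.items() if w != 0}

# Hypothetical lotteries and mixing weights.
p = {"a": F(1, 2), "b": F(1, 2)}
r = {"b": F(1, 4), "c": F(3, 4)}
a, b = F(1, 3), F(2, 5)

axiom1 = mix(F(1), p, r) == p                         # h_1(p, r) = p
axiom2 = mix(a, p, r) == mix(1 - a, r, p)             # h_a(p, r) = h_{1-a}(r, p)
axiom3 = mix(a, mix(b, p, r), r) == mix(a * b, p, r)  # h_a(h_b(p, r), r) = h_{ab}(p, r)
print(axiom1, axiom2, axiom3)
```

The check covers one pair of lotteries only; it is a sanity test of the definition, not a proof that every convex set of lotteries is a mixture set.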

Theorem 6 (Herstein and Milnor (1953)). Let $\Pi$ be a mixture set. A weak order $\succsim$ on $\Pi$ satisfies both

(1) For all $p, q, r \in \Pi$, $p \sim q$ implies $h_{1/2}(p, r) \sim h_{1/2}(q, r)$.
(2) For all $p, q, r \in \Pi$, if $p \succ q \succ r$, then the sets $\{\alpha \in [0,1] \mid h_\alpha(p, r) \succsim q\}$ and $\{\alpha \in [0,1] \mid q \succsim h_\alpha(p, r)\}$ are closed in $[0,1]$

if and only if there exists a linear function $U : \Pi \to \mathbb{R}$ such that $U$ represents $\succsim$. Moreover, if $U$ and $U'$ are linear functions from $\Pi$ to $\mathbb{R}$ that both represent $\succsim$, then $U' = \alpha U + \beta$ for some constants $\alpha > 0$ and $\beta \in \mathbb{R}$.

We will also use the Cantor-Birkhoff utility representation theorem.

Definition 11. Given a space $S$ and a preference $\succsim$ defined on $S$, the set $M \subseteq S$ is $\succsim$-order dense if and only if, for all $p, q \in S$ such that $p \succ q$, there exists $m \in M$ such that $p \succsim m \succsim q$.

Theorem 7 (Cantor-Birkhoff). The following are equivalent:
(1) $\succsim$ on $S$ is complete and transitive and there exists a countable and $\succsim$-order dense subset of $S$.
(2) There exists $f : S \to \mathbb{R}$ that represents $\succsim$.

Our final tool is the classic nested interval theorem.

Theorem 8 (Nested Interval Theorem). Suppose that the sequence $I_n = [a_n, b_n] \subset \mathbb{R}$ is such that $a_{n+1} \geq a_n$ and $b_{n+1} \leq b_n$. Then the intersection of the intervals, $\bigcap_{n=1}^{\infty} I_n$, is not empty. In addition, if the lengths of the intervals converge to zero, then the intersection of the intervals is a singleton.
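A familiar instance of Theorem 8 is bisection, where each interval sits inside the previous one and has half its length, so the intersection is a single point. The sketch below is only an illustration; the function `nested_intervals` and the test function $t^2 - 2$ are choices made here, not part of the paper.

```python
def nested_intervals(f, a, b, steps):
    """Bisection on [a, b]: returns the list of nested intervals [a_n, b_n]."""
    intervals = [(a, b)]
    for _ in range(steps):
        mid = (a + b) / 2
        if f(a) * f(mid) <= 0:
            b = mid   # a sign change lies in the left half
        else:
            a = mid   # otherwise it lies in the right half
        intervals.append((a, b))
    return intervals

# Hypothetical example: the intervals shrink towards the square root of 2.
ivs = nested_intervals(lambda t: t * t - 2, 1.0, 2.0, 40)
lower, upper = ivs[-1]
print(lower, upper, upper - lower)
```

The left endpoints are weakly increasing and the right endpoints weakly decreasing, which is exactly the hypothesis of the theorem; the halving lengths give the singleton conclusion.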

Lotteries with the same support will be a special topic of interest. Towards this end, given a subset Z ⊆ X, let int(∆Z) := {p ∈ ∆X : support(p) = Z} be the set of lotteries with support Z. The support size of a lottery p is |support(p)|. An i-outcome lottery is a lottery with support size i.

A.2. Proof of Theorem 1. We prove the sufficiency of the axioms for the representation. The proof starts by showing that the conditions in Theorem 6 are satisfied for lotteries of the same support.

Claim 1. For any finite set $Z \subseteq X$, for all $p, q, r \in \mathrm{int}\,\Delta Z$, if $p \succ q \succ r$, then the sets $\{\alpha \in [0,1] \mid \alpha p + (1-\alpha) r \succsim q\}$ and $\{\alpha \in [0,1] \mid q \succsim \alpha p + (1-\alpha) r\}$ are closed in $[0,1]$.

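Proof. Suppose for contradiction that either $\{\alpha \in [0,1] \mid q \succsim \alpha p + (1-\alpha) r\}$ or $\{\alpha \in [0,1] \mid \alpha p + (1-\alpha) r \succsim q\}$ is not closed in $[0,1]$. Without loss of generality, suppose that $\{\alpha \in [0,1] \mid \alpha p + (1-\alpha) r \succsim q\}$ is not closed in $[0,1]$. This means that its complement $\{\alpha \in [0,1] \mid q \succ \alpha p + (1-\alpha) r\}$ is not open in $[0,1]$. This means that there exists an $\alpha^* \in [0,1]$ such that $q \succ \alpha^* p + (1-\alpha^*) r$, but $\alpha p + (1-\alpha) r \succsim q$ for all $\alpha \in (\underline{\alpha}, \alpha^*)$ for some $\underline{\alpha} \in (0, \alpha^*)$, or for all $\alpha \in (\alpha^*, \bar{\alpha})$ for some $\bar{\alpha} \in (\alpha^*, 1]$.

Without loss of generality, suppose that the latter is true. Construct a sequence $p_\alpha = \alpha p + (1-\alpha) r$ for $\alpha \in (\alpha^*, \bar{\alpha})$. Note that $\lim_{\alpha \to \alpha^*} p_\alpha = \alpha^* p + (1-\alpha^*) r$. By assumption, $\alpha p + (1-\alpha) r \succsim q$ for all $\alpha$ in the sequence, and yet $q \succ \alpha^* p + (1-\alpha^*) r$. This contradicts Axiom 3 (Semi-Continuity). □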

Claim 2. Given any finite subset $Z \subseteq X$, there exists a linear function $U_Z : \Delta Z \to \mathbb{R}$ which represents $\succsim_Z$, and
$$U_Z(p) = \sum_{x \in Z} u_Z(x) p(x)$$
for some $u_Z : Z \to \mathbb{R}$. In addition, if $U_Z$, $U'_Z$ are both linear functions which represent $\succsim_Z$, then $U'_Z = \alpha U_Z + \beta$ for some $\alpha > 0$, $\beta \in \mathbb{R}$.

Proof. Given p, q ∈ int ∆Z, define

hα(p, q) := αp + (1 − α)q.

Since the set $\mathrm{int}\,\Delta Z$ is convex, $h_\alpha(p, q)$ exists for every $p, q \in \mathrm{int}\,\Delta Z$ and $\alpha \in [0, 1]$. Next, $\succsim_Z$ satisfies the necessary conditions in Theorem 6 by Axiom 1 (Same-Support Independence) and Claim 1 respectively. By Theorem 6, there exists a linear function $U_Z$ which represents $\succsim_Z$. That $U_Z$ may be written as $U_Z(p) = \sum_{x \in Z} u_Z(x) p(x)$ follows by induction on the size of $Z$. Finally, that $U_Z$ is unique up to an affine transformation follows from Theorem 6. □

Our next goal is to find a common Bernoulli utility function for all supports Z. Claim 3. Axiom 2 (Cross-Support Independence) implies Axiom 13 (Pairwise Consistency).

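Proof. In Axiom 2 (Cross-Support Independence), take $p = p' = \delta_x$, $q = q' = \delta_y$, $\alpha = \tfrac{1}{3}$, and $r = \tfrac{1}{2}\delta_x + \tfrac{1}{2}\delta_y$. □

Claim 4. If $Z \subseteq Z'$, then there is some $\alpha > 0$ and $\beta \in \mathbb{R}$ such that $u_Z(x) = \alpha u_{Z'}(x) + \beta$ for all $x \in Z$.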

Proof. We will show that
$$\tilde{U}_Z(p) := \sum_{x \in Z} u_{Z'}(x) p(x)$$
represents $\succsim_Z$. By Claim 2, this means that there is some $\alpha > 0$ and $\beta \in \mathbb{R}$ such that $u_Z(x) = \alpha u_{Z'}(x) + \beta$ for all $x \in Z$. If $Z = Z'$, the statement is immediate. Suppose now that $Z'$ differs from $Z$ by one element, that is, $\{x'\} := Z' \setminus Z$. Since $U_Z(p) = \sum_{x \in Z} u_Z(x) p(x)$ represents $\succsim_Z$, it follows that $p \succ q$ if and only if
$$\sum_{x \in Z} [p(x) - q(x)]\, u_Z(x) > 0.$$
By Axiom 1 (Same-Support Independence), $p \succ q$ if and only if $\tfrac{1}{2} p + \tfrac{1}{2} \delta_{x'} \succ \tfrac{1}{2} q + \tfrac{1}{2} \delta_{x'}$, that is, if and only if
$$\sum_{x \in Z} [p(x) - q(x)]\, u_{Z'}(x) > 0.$$
Hence, $\sum_{x \in Z} u_{Z'}(x) p(x)$ represents $\succsim_Z$, and the result follows.

If $Z'$ differs from $Z$ by $n > 1$ elements, call them $x_1, \ldots, x_n$, then let $Z_i = Z \cup \{x_1, \ldots, x_i\}$. Then $Z \subset Z_1 \subset Z_2 \subset \ldots \subset Z_{n-1} \subset Z'$. By repeating the above process $n$ times, the statement follows. □

Claim 5. There exists a function $u : X \to \mathbb{R}$ such that, for all finite $Z \subseteq X$,
$$\sum_{x \in Z} u(x) p(x)$$
represents $\succsim_Z$.

Proof. We will find u constructively.

Fix $x_1, x_2 \in X$ with $\delta_{x_1} \succ \delta_{x_2}$ (this is possible by Axiom 4 (Strict Degenerate Preference)). By Claim 2, there exists $u : \{x_1, x_2\} \to \mathbb{R}$ such that $\sum_{x \in \{x_1, x_2\}} u(x) p(x)$ represents $\succsim_{\{x_1, x_2\}}$. By Axiom 13 (Pairwise Consistency), $u(x_1) > u(x_2)$. Now, for each $y \in X$, $y \neq x_1$, $y \neq x_2$, consider the set $Y = \{x_1, x_2, y\}$. By Claim 4, $\succsim_Y$ can be represented by a function
$$U_Y = \sum_{x \in Y} p(x) u_Y(x),$$
where $u_Y : Y \to \mathbb{R}$ is a function satisfying $u_Y(x_1) = u(x_1)$ and $u_Y(x_2) = u(x_2)$. Extend $u$ to all of $X$ by setting, for each $y \in X$, $y \neq x_1$, $y \neq x_2$,
$$u(y) = u_Y(y).$$

We now check that $\sum_{x \in Z} u(x) p(x)$ represents $\succsim_Z$ on sets $Z$ of the form $Z = \{x_1, x_2, y_1, \ldots, y_n\}$, where $x_1, x_2, y_1, \ldots, y_n \in X$ are distinct. Proceed by induction on $n$. The base case is $n = 1$. For $n = 1$, $\sum_{x \in Z} u(x) p(x)$ represents $\succsim_Z$ on $Z = \{x_1, x_2, y\}$ by construction. Suppose now that $u$ represents $\succsim_Z$ for all sets $Z$ of the stated form with $n = k$. Consider an arbitrary set $Z = \{x_1, x_2, y_1, \ldots, y_{k+1}\}$. By the inductive hypothesis, $\sum_{x \in Z'} u(x) p(x)$ represents $\succsim_{Z'}$ on $Z' = \{x_1, x_2, y_1, \ldots, y_k\}$. Also by the inductive hypothesis, $\sum_{x \in Z''} u(x) p(x)$ represents $\succsim_{Z''}$ on $Z'' = \{x_1, x_2, y_2, \ldots, y_{k+1}\}$. By Claim 4, there are functions $u' : Z \to \mathbb{R}$ and $u'' : Z \to \mathbb{R}$ such that, first, both
$$\sum_{x \in Z} u'(x) p(x) \quad \text{and} \quad \sum_{x \in Z} u''(x) p(x)$$
represent $\succsim_Z$, and, second, $u'(x) = u(x)$ for all $x \in Z'$, and $u''(x) = u(x)$ for all $x \in Z''$. By Claim 2, there exist $\alpha > 0$, $\beta \in \mathbb{R}$ such that
$$u'(x) = \alpha u''(x) + \beta$$
for all $x \in Z$. In particular, we have
$$u'(x_1) = \alpha u''(x_1) + \beta, \qquad u'(x_2) = \alpha u''(x_2) + \beta.$$
Now, since $x_1, x_2 \in Z' \cap Z''$, we have $u'(x_1) = u''(x_1) = u(x_1)$ and $u'(x_2) = u''(x_2) = u(x_2)$.

Since $u(x_1) \neq u(x_2)$, conclude that $\alpha = 1$ and $\beta = 0$. Thus, $u'(y_1) = u''(y_1) = u(y_1)$ and $u''(y_{k+1}) = u'(y_{k+1}) = u(y_{k+1})$. So $u' = u'' = u$, and hence $\sum_{x \in Z} u(x) p(x)$ represents $\succsim_Z$, as desired.

Finally, we check that $\sum_{x \in Z} u(x) p(x)$ represents $\succsim_Z$ on sets $Z$ of the form $Z = \{y_1, \ldots, y_n\}$, where $y_1, \ldots, y_n \in X$ are distinct. Note that $Z = \{y_1, \ldots, y_n\}$ is a subset of $Z' = \{x_1, x_2, y_1, \ldots, y_n\}$. By the above analysis, $\sum_{x \in Z'} u(x) p(x)$ represents $\succsim_{Z'}$. By Claim 4, $\sum_{x \in Z} u(x) p(x)$ will represent $\succsim_Z$. □

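Claim 6. Given $x, y \in X$, $u(x) > u(y)$ if and only if $\delta_x \succ \delta_y$.

Proof. First suppose that $\delta_x \succ \delta_y$. Then, by Claim 3 and Axiom 13 (Pairwise Consistency),
$$\tfrac{2}{3}\delta_x + \tfrac{1}{3}\delta_y \succ \tfrac{1}{3}\delta_x + \tfrac{2}{3}\delta_y.$$
Since $\sum_{x' \in \{x, y\}} u(x') p(x')$ represents $\succsim_{\{x, y\}}$ by Claim 5, this means that $\tfrac{2}{3} u(x) + \tfrac{1}{3} u(y) > \tfrac{1}{3} u(x) + \tfrac{2}{3} u(y)$, implying that $u(x) > u(y)$.

In the other direction, suppose that $u(x) > u(y)$. Then $\tfrac{2}{3} u(x) + \tfrac{1}{3} u(y) > \tfrac{1}{3} u(x) + \tfrac{2}{3} u(y)$. Since $\sum_{x' \in \{x, y\}} u(x') p(x')$ represents $\succsim_{\{x, y\}}$ by Claim 5, this means that $\tfrac{2}{3}\delta_x + \tfrac{1}{3}\delta_y \succ \tfrac{1}{3}\delta_x + \tfrac{2}{3}\delta_y$. Hence, by Axiom 13 (Pairwise Consistency), $\delta_x \succ \delta_y$. □

Claim 7. Given a support size $i$, the function
$$U(p) = \sum_{x \in X} u(x) p(x)$$
represents $\succsim$ induced on lotteries with support size $i$.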

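Proof. Suppose for contradiction that $p \succsim q$ for some $p, q$ with support size $i$, but
$$\sum_{x \in X} u(x) p(x) < \sum_{x \in X} u(x) q(x).$$
Multiplying both sides by $3/8$, and adding $\tfrac{3}{8} \sum_{x \in X} u(x) p(x) + \tfrac{1}{8} \sum_{x \in X} u(x) p(x) + \tfrac{1}{8} \sum_{x \in X} u(x) q(x)$ to each side, we obtain
$$\tfrac{3}{8} \sum_{x \in X} u(x) p(x) + \tfrac{3}{8} \sum_{x \in X} u(x) p(x) + \tfrac{1}{8} \sum_{x \in X} u(x) p(x) + \tfrac{1}{8} \sum_{x \in X} u(x) q(x)$$
$$< \tfrac{3}{8} \sum_{x \in X} u(x) q(x) + \tfrac{3}{8} \sum_{x \in X} u(x) p(x) + \tfrac{1}{8} \sum_{x \in X} u(x) p(x) + \tfrac{1}{8} \sum_{x \in X} u(x) q(x).$$
Since, by Claim 5, the function $U(r) = \sum_{x \in X} r(x) u(x)$ represents $\succsim_{\{\mathrm{support}(q) \cup \mathrm{support}(p)\}}$, the above statement implies that
$$\tfrac{3}{8} p + \tfrac{3}{8} p + \tfrac{1}{8} p + \tfrac{1}{8} q \prec \tfrac{3}{8} q + \tfrac{3}{8} p + \tfrac{1}{8} p + \tfrac{1}{8} q.$$
But, because $|\mathrm{support}(p)| = |\mathrm{support}(q)|$; $p \succsim q$; and $p \sim p$, this statement contradicts Axiom 2 (Cross-Support Independence).

In the reverse direction, suppose $q \succ p$. Then the same argument shows that $\sum_x u(x) p(x) < \sum_x u(x) q(x)$. □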

Claim 8. Given a lottery $p \in \Delta X$ with $|\mathrm{support}(p)| = i$, suppose that there exist lotteries $q_1$ and $q_2$, each with support size $j > i$, such that $q_1 \succsim p \succsim q_2$. Then there exists a lottery $q$ with support size $j$ which has the property that $q \sim p$.

Proof. Suppose towards a contradiction that no such lottery $q$ exists. This means that the sets $A = \{q' \in \Delta X : q' \text{ has } j \text{ outcomes and } p \succ q'\}$ and $B = \{q' \in \Delta X : q' \text{ has } j \text{ outcomes and } q' \succ p\}$ are nonempty and their union is the set of all lotteries with $j$ outcomes. By Claim 7, the function $U(r) = \sum_x r(x) u(x)$ represents $\succsim$ on lotteries with support size $j$. Notice that since $U$ is continuous in $r$ and $j > 1$, the range of $U$ is an interval. Let $A'$ be the image of $A$ under $U$ and $B'$ be the image of $B$ under $U$. By transitivity, $A'$ and $B'$ must be non-overlapping intervals. We will show that regardless of the form that the intervals $A'$ and $B'$ take, we reach a contradiction.

Suppose that $A'$ contains its supremum value $a^*$ and $B'$ contains its infimum value $b_*$. Since $A'$ and $B'$ are non-overlapping intervals, it must be the case that $a^* < b_*$. But now, using continuity of $U$, the set $U^{-1}((a^*, b_*))$ is non-empty and is disjoint from $A$ and $B$, contradicting that $A \cup B$ covers all lotteries with $j$ outcomes.

Now suppose that $A'$ contains its supremum value $a^*$ and $B'$ does not contain its infimum value $b_*$. If $a^* < b_*$, then again there is a non-empty set of $j$-outcome lotteries $U^{-1}((a^*, b_*))$ not covered by $A$ or $B$, so suppose $a^* = b_*$. Let $q^*$ be that $j$-outcome lottery such that $U(q^*) = a^*$. Then this says that $p \succ q^*$. Let $x_*$ be a worst outcome in $q^*$ and $x^*$ a best outcome. By Axiom 4 (Strict Degenerate Preference), $\delta_{x^*} \succ \delta_{x_*}$. Let $q_\epsilon$ be the lottery that allocates probability $q_i$ to each outcome $x_i$ in $q^*$ which is not $x_*$ or $x^*$; and allocates probability $q_i + \epsilon$ to outcome $x^*$ and probability $q_i - \epsilon$ to outcome $x_*$. Then by Claim 6 and continuity of $U$, $U(q_\epsilon) \in B'$ and therefore $q_\epsilon \succ p$ for all $\epsilon$. But $\lim_{\epsilon \to 0} q_\epsilon = q^* \prec p$, contradicting Axiom 3 (Semi-Continuity).

The case where $A'$ does not contain its supremum value $a^*$ and $B'$ does contain its infimum value $b_*$ proceeds similarly to the case just studied.

Finally, suppose that $A'$ does not contain its supremum value $a^*$ and $B'$ does not contain its infimum value $b_*$. Then since $a^* \leq b_*$ by transitivity of $\succsim$, letting $q^*$ be that $j$-outcome lottery such that $U(q^*) = a^*$ (which exists since the range of $U$ forms an interval), $q^*$ is neither in $A$ nor in $B$, contradicting that $A \cup B$ covers the set of lotteries with $j$ outcomes. □
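The perturbation used in the proof of Claim 8 moves a small probability mass from a worst outcome to a best outcome. A minimal sketch of that step follows, under assumed values: the outcome labels, the utilities in `u`, and the lottery `q_star` are hypothetical, chosen only to show that expected utility rises by exactly the shifted mass times the utility gap and that the gain vanishes as the mass shrinks.

```python
from fractions import Fraction as F

def expected_utility(q, u):
    """Sum of u(x) * q(x) over the outcomes of lottery q."""
    return sum(u[x] * w for x, w in q.items())

def perturb(q, best, worst, eps):
    """Shift probability eps from the worst outcome to the best outcome."""
    shifted = dict(q)
    shifted[best] += eps
    shifted[worst] -= eps
    return shifted

# Hypothetical Bernoulli utilities and a three-outcome lottery.
u = {"lo": F(0), "mid": F(1), "hi": F(3)}
q_star = {"lo": F(1, 4), "mid": F(1, 4), "hi": F(1, 2)}

# Gain in expected utility for eps = 1/10, 1/100, 1/1000, 1/10000.
gains = [
    expected_utility(perturb(q_star, "hi", "lo", F(1, 10 ** k)), u) - expected_utility(q_star, u)
    for k in range(1, 5)
]
print(gains)
```

Each perturbed lottery keeps all three outcomes in its support because the shifted mass is smaller than the probability of the worst outcome, so the support size, and hence the cost term, is unchanged.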

Claim 9. There exists a function $C : \Delta X \to \mathbb{R}$ such that
$$U(p) = \sum_{x \in \mathrm{support}(p)} u(x) p(x) - C(|\mathrm{support}(p)|)$$
represents $\succsim$. In addition, $C$ is monotonic, i.e. if $|\mathrm{support}(p)| \leq |\mathrm{support}(q)|$, then $C(|\mathrm{support}(p)|) \leq C(|\mathrm{support}(q)|)$.
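To make the representation in Claim 9 concrete, here is a minimal sketch that evaluates $U(p) = \sum_x u(x)p(x) - C(|\mathrm{support}(p)|)$ for finite lotteries. The risk-neutral utility `u`, the linear cost `C`, and the two lotteries are assumptions chosen for illustration; the paper does not commit to these functional forms here.

```python
def support(p):
    """Outcomes that receive strictly positive probability."""
    return {x for x, w in p.items() if w > 0}

def simplicity_value(p, u, C):
    """Expected Bernoulli utility minus a cost that depends only on support size."""
    expected = sum(u(x) * w for x, w in p.items())
    return expected - C(len(support(p)))

u = lambda x: x               # hypothetical Bernoulli utility over money
C = lambda n: 0.5 * (n - 1)   # hypothetical cost: C(1) = 0, weakly increasing

sure = {10: 1.0}              # one-outcome lottery, expected value 10
risky = {5: 0.5, 15.5: 0.5}   # two-outcome lottery, expected value 10.25

v_sure = simplicity_value(sure, u, C)
v_risky = simplicity_value(risky, u, C)
print(v_sure, v_risky)
```

With these assumed numbers the sure payment is ranked above the two-outcome lottery even though its expected value is lower, which is the kind of pattern the complexity cost is meant to capture.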

Proof. The proof is constructive. We claim that the following process for constructing $C$ will create the desired representation.
• Step 1: Set $C(1) = 0$.
• Step $i > 1$: Consider all lotteries $p$ with $|\mathrm{support}(p)| = i$.
– Case 1: There exists an integer $k \in \{1, 2, \ldots, i-1\}$ such that there exists a lottery $q_k \in \Delta X$ with $|\mathrm{support}(q_k)| = k < i$ and a lottery $p \in \Delta X$ with $|\mathrm{support}(p)| = i$ satisfying $q_k \sim p$. Let $k'$ be the largest $k$ for which this is true. Set $C(i) = \sum_x u(x) p(x) - \sum_x u(x) q_{k'}(x) + C(k')$.
– Case 2: No such $k$ exists. Then set $C(i) = \left[\sup_{\{p : |\mathrm{support}(p)| = i\}} \sum_x u(x) p(x)\right] - \left[\inf_{\{q : |\mathrm{support}(q)| < i\}} \sum_x u(x) q(x)\right] + C(i-1)$.

A.3. Proof of Proposition 1. We claim that the cost function C construction in Claim 9 is minimal for Bernoulli utility u. By the proof of Theorem 1, P u(x)p(x)−C(|support(p)|) represents . Suppose for contradiction that there exists a cost function C0 where P u(x)p(x)− C0(|support(p)|) represents , and C0(i) < C(i) for some integer i. Let j be the least i for which C0(i) < C(i). Since C0(1) = C(1) = 0 by definition of a simplicity representation, it follows that j ≥ 2. Now, either there exists a j-outcome lottery p and a k < j outcome lottery q such that p ∼ q, or there does not. If there does, then since both cost functions represent the preference, P u(x)q(x) − C(k) = P u(x)p(x) − C(j). Likewise, P u(x)q(x) − C0(k) = P u(x)p(x) − C0(j). On the other hand, by choice of j, it follow that P u(x)q(x) − C(k) = P u(x)q(x) − C0(k). Thus, C(j) = C0(j). If such lotteries p and q do not exist, then by the analysis in Claim 9, it must be that q0 p0 for all j-outcome lotteries p0 and k-outcome lotteries q0, k ∈ {1, 2, .., j − 1}. h P i Recall that, by construction, in this case C(j) = sup{p:|support(p)|=i} x u(x)p(x) −  P  0 inf{q:|support(q)| 0. Since j ≥ 2, we may pick a lotteries p0, q0 with support size j and j − 1 respec- P P  P 0 tively, such that u(x)p(x) > sup{p:|support(p)|=i} x u(x)p(x) − 2 and u(x)q (x) ≤ P  P inf{q:|support(q)|=j−1} x u(x)q(x) + 2 . Since inf{q:|support(q)|=i} x u(x)q(x) = infx u(x) for PREFERENCE FOR SIMPLICITY 43

0 P 0 P any i, this choice of q also implies that u(x)q (x) ≤ inf{q:|support(q)|≤j−1} x u(x)q(x) +  2 . Then " # " # X X X X u(x)p0(x) − C0(j) = u(x)p0(x) − sup u(x)p(x) + inf u(x)q(x) {q:|supp(q)| inf u(x)q(x) − C(j − 1) + {q:|support(q)| u(x)q0(x) − C(j − 1) X = u(x)q0(x) − C0(j − 1), where the last equality follows by choice of j. The above derivation implies p0 q0, a contradiction. 

A.4. Proof of Proposition 2. That the two Bernoulli utilities are affine transformations 0 of each other (i.e. u = αu + β for some α > 0, β ∈ R) follows from Claims 2 and 5. We prove that C(i) = αC0(i) for all i by induction on i. The base case is i = 1; then C(1) = C0(1) = 0 by definition of a simplicity representation. Suppose that C(j) = αC0(j) for j ∈ {1, .., i − 1}. Either there exists a j-outcome lottery p and a k-outcome lottery q (k < j) such that p ∼ q, or there does not. If such lotteries do exist, then by the proof of Proposition 1, X X C(i) = u(x)p(x) − u(x)q(x) + C(k), and x∈X x∈X X X C0(i) = u0(x)p(x) − u(x)q(x) + C0(k). x∈X x∈X Using the inductive step, X X C(i) = u(x)p(x) − u(x)qk0 (x) + C(k) x∈X x∈X X 0 X 0 = [αu (x) − β]p(x) − [αu(x) − β]qk0 (x) + αC (k) x∈X x∈X " # X X = α u0(x)p(x) − u(x)q(x) + C0(k) x∈X x∈X = αC0(i), 44 INDIRA PURI as desired. Now suppose that such lotteries p and q do not exist. Then by the proof of Proposition 1, " # " # X X C(i) = sup u(x)p(x) − inf u(x)q(x) + C(i − 1), and {q:|support(q)|

A.5. Proof of Proposition 3. To prove that (1) implies (2), suppose that $\succsim_1$ is more complexity averse than $\succsim_2$. Consider least-cost simplicity representations
$$U_1 = \sum_x u(x) p(x) - C_1(|\mathrm{support}(p)|) \quad \text{and} \quad U_2 = \sum_x u(x) p(x) - C_2(|\mathrm{support}(p)|)$$
of $\succsim_1$ and $\succsim_2$, respectively. Notice that the Bernoulli utilities of each representation are identical; this is by construction of the representations $U_1$ and $U_2$ in Theorem 1, since, for lotteries $p$ and $q$ with the same support size, $p \succsim_1 q \iff p \succsim_2 q$ by assumption. Suppose for contradiction that $C_2(i) - C_2(i-1) > C_1(i) - C_1(i-1)$ for some $i$. Let $j$ be the least $i$ for which $C_2(i) - C_2(i-1) > C_1(i) - C_1(i-1)$. Either there exists a $j$-outcome lottery $p$ and a $(j-1)$-outcome lottery $q$ such that $p \sim_1 q$, or there does not. If such lotteries do exist, then
$$0 = \sum_x u(x)(p(x) - q(x)) - [C_1(j) - C_1(j-1)] > \sum_x u(x)(p(x) - q(x)) - [C_2(j) - C_2(j-1)].$$

Since U2 represents 2 by assumption, the above derivation implies q 2 p. But then q 2 p and q ∼1 p, contradicting that 1 prefers fewer outcomes more than 2. If such lotteries do not exist, then since U1 represents , by monotonicity of C1, it must be that there exist no j-outcome lottery p and k-outcome lottery q (k < j) with q ∼ p. In this case, recall from the proof of Proposition 1 that " # " # X X C1(j) = sup u(x)p(x) − inf u(x)q(x) + C1(j − 1), and {q:|support(q)|

Since the Bernoulli utilities are equal, this means $C_2(j) - C_2(j-1) = C_1(j) - C_1(j-1)$, contradicting $C_2(j) - C_2(j-1) > C_1(j) - C_1(j-1)$.

To prove that (2) implies (1), first note that since the Bernoulli utilities are equal, it follows that for any two $i$-outcome lotteries $p, p'$, $p \succsim_1 p' \iff p \succsim_2 p'$. Now suppose for contradiction that there exists a $k$-outcome lottery $q$ and a $j$-outcome lottery $p$, $1 \leq k < j$, such that $q \succ_2 p$ but $p \succsim_1 q$. Since least-cost simplicity representations
$$U_1 = \sum_x u(x) p(x) - C_1(|\mathrm{support}(p)|) \quad \text{and} \quad U_2 = \sum_x u(x) p(x) - C_2(|\mathrm{support}(p)|)$$
represent $\succsim_1$ and $\succsim_2$ by assumption, this means
$$\sum_x u(x) q(x) - C_2(k) > \sum_x u(x) p(x) - C_2(j)$$
and
$$\sum_x u(x) q(x) - C_1(k) \leq \sum_x u(x) p(x) - C_1(j).$$
Combining the two inequalities, $C_2(j) + C_1(k) > C_2(k) + C_1(j)$. But this means $C_2(j) - C_2(k) > C_1(j) - C_1(k)$, contradicting (2). □
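The cost condition that this proof works with compares marginal costs $C(i) - C(i-1)$ across the two agents. The sketch below checks that comparison for two hypothetical cost functions; the exact wording of condition (2) is inferred from the proof rather than quoted from the proposition, and the names `increments` and `more_complexity_averse` are introduced here for illustration.

```python
def increments(C, n):
    """Marginal costs C(i) - C(i-1) for i = 2, ..., n."""
    return [C(i) - C(i - 1) for i in range(2, n + 1)]

def more_complexity_averse(C1, C2, n):
    """True if every marginal cost of C1 is at least the matching one of C2, up to size n."""
    return all(d1 >= d2 for d1, d2 in zip(increments(C1, n), increments(C2, n)))

C1 = lambda i: 2 * (i - 1)   # hypothetical cost function of agent 1
C2 = lambda i: i - 1         # hypothetical cost function of agent 2

check_12 = more_complexity_averse(C1, C2, 6)
check_21 = more_complexity_averse(C2, C1, 6)
print(check_12, check_21)
```

Because the check runs over a finite range of support sizes, it can refute the ordering but only supports it up to the chosen bound `n`.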

A.6. Proof of Theorem 3. From Luce (1959), there exists a function $W : \Delta X \to \mathbb{R}_{++}$ such that
$$\rho(q, A) = \frac{W(q)}{\sum_{q' \in A} W(q')}$$
for all $A \in M(\Delta X)$ if and only if Axioms 9 (Luce’s IIA) and 10 (Positivity) are satisfied. Next, observe that $\rho(q, \{q, p\}) > 0.5 \iff W(q) > W(p) \iff q \succ^* p$. Since $\succsim^*$ satisfies Axioms 1 (Same-Support Independence), 2 (Cross-Support Independence), 3 (Semi-Continuity), and 4 (Strict Degenerate Preference), it admits a Simplicity Representation by Theorem 1. That is, for some Bernoulli utility $u$ and weakly increasing cost function $C$,
$$q \succ^* p \iff \sum_x u(x) q(x) - C(|\mathrm{support}(q)|) > \sum_x u(x) p(x) - C(|\mathrm{support}(p)|).$$
And therefore $W(q) > W(p) \iff \sum_x u(x) q(x) - C(|\mathrm{support}(q)|) > \sum_x u(x) p(x) - C(|\mathrm{support}(p)|)$. By cardinal uniqueness of the representation, it follows that
$$W(q) = \phi\left(\sum_x u(x) q(x) - C(|\mathrm{support}(q)|)\right)$$
for some strictly increasing $\phi : \mathbb{R} \to \mathbb{R}$.

Conversely, if $\rho$ admits a simplicity-based Luce representation, then it must satisfy Axioms 9 (Luce’s IIA) and 10 (Positivity) by Luce (1959). Next, for any lotteries $q, p$, $W(q) > W(p) \iff \rho(q, \{q, p\}) > 0.5 \iff q \succ^* p$. Since $W(q) = \phi\left(\sum_x u(x) q(x) - C(|\mathrm{support}(q)|)\right)$, $\succsim^*$ admits a simplicity representation, so by Theorem 1 it satisfies Axioms 1 (Same-Support Independence), 2 (Cross-Support Independence), 3 (Semi-Continuity), and 4 (Strict Degenerate Preference). □
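The simplicity-based Luce form in this proof can be sketched numerically: each lottery gets weight $W(q) = \phi(\sum_x u(x)q(x) - C(|\mathrm{support}(q)|))$ and is chosen with probability proportional to its weight. The choices $\phi = \exp$, the risk-neutral `u`, the linear `C`, and the three lotteries below are assumptions for illustration only.

```python
import math

def simplicity_value(p, u, C):
    """Expected Bernoulli utility minus the cost of the support size."""
    support_size = sum(1 for w in p.values() if w > 0)
    return sum(u(x) * w for x, w in p.items()) - C(support_size)

def luce_probabilities(menu, u, C, phi=math.exp):
    """Choice probabilities rho(q, menu) = W(q) / sum of W over the menu."""
    weights = [phi(simplicity_value(q, u, C)) for q in menu]
    total = sum(weights)
    return [w / total for w in weights]

u = lambda x: x               # hypothetical Bernoulli utility
C = lambda n: 0.5 * (n - 1)   # hypothetical cost function with C(1) = 0

sure = {10: 1.0}
risky = {5: 0.5, 15.5: 0.5}
other = {0: 0.5, 8: 0.25, 20: 0.25}

pair = luce_probabilities([sure, risky], u, C)
triple = luce_probabilities([sure, risky, other], u, C)
print(pair, triple)
```

Adding a third lottery to the menu rescales both probabilities but leaves their ratio unchanged, which is the IIA property the proof invokes; the lottery with the higher simplicity value is chosen with probability above one half in the binary menu.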

A.7. Proof of Theorem 4. This is identical to the proof of Theorem 3, replacing references to Theorem 1 by references to Theorem 2.

A.8. Proof of Proposition 7. That $(u, C)$ is an affine transformation of $(\tilde{u}, \tilde{C})$ follows from Proposition 2. To show that, when $(u, C) = (\tilde{u}, \tilde{C})$, then $\phi$ is a positive multiple of $\tilde{\phi}$, fix any lottery $q \in \Delta X$. Then for any lottery $r$,
$$\rho(q, \{r, q\}) = \frac{\phi((u, C)(q))}{\phi((u, C)(q)) + \phi((u, C)(r))} = \frac{\tilde{\phi}((u, C)(q))}{\tilde{\phi}((u, C)(q)) + \tilde{\phi}((u, C)(r))}.$$
Simplifying,
$$\phi((u, C)(r)) = \frac{\phi((u, C)(q))}{\tilde{\phi}((u, C)(q))}\, \tilde{\phi}((u, C)(r)).$$
In particular, set $\gamma = \frac{\phi((u, C)(q))}{\tilde{\phi}((u, C)(q))}$. Note that this is well-defined because, by definition of the stochastic simplicity representation, the weighting function is positive for all lotteries, i.e. for all $s \in \Delta X$, $\tilde{W}(s) = \tilde{\phi}((u, C)(s)) > 0$.

A.9. Proof of Proposition 8. Suppose that choice rule $\rho$ is more complexity averse than choice rule $\rho'$, and that both admit stochastic simplicity representations. Let $(\phi, u, C)$ be the representation corresponding to $\rho$ and $(\phi', u', C')$ the representation corresponding to $\rho'$. Then if $\phi = \phi'$ and $u = u'$, by $\rho$ being more complexity averse than $\rho'$, it follows that for any outcome $x \in X$ and lottery $q \in \Delta X$,
$$\frac{\phi(u(x))}{\phi(u(x)) + \phi\left(\sum_x u(x) q(x) - C(|\mathrm{support}(q)|)\right)} \geq \frac{\phi(u(x))}{\phi(u(x)) + \phi\left(\sum_x u(x) q(x) - C'(|\mathrm{support}(q)|)\right)}$$
which means, since $\phi$ is increasing, that $C(|\mathrm{support}(q)|) \geq C'(|\mathrm{support}(q)|)$.

To show that at least one such pair of representations exists, observe that because $\rho$ and $\rho'$ agree on ordering for lotteries of the same support, it follows that the induced lottery preference relations $\succsim^*$ and $\succsim^{*\prime}$ agree on lotteries of the same support. From this and from the fact that both $\rho$ and $\rho'$ admit stochastic simplicity representations, which means that $\succsim^*$ and $\succsim^{*\prime}$ admit simplicity representations, it follows from the proof of Theorem 1 (and in particular Claims 2 and 5) that the Bernoulli utilities corresponding to $\succsim^*$ and $\succsim^{*\prime}$ are identical up to an affine transformation. In particular, we can take $u = u'$. Next, since $\rho$ and $\rho'$ agree in value on degenerate lotteries, it follows that, for any $x, y \in X$,
$$\frac{\phi(u(x))}{\phi(u(x)) + \phi(u(y))} = \frac{\phi'(u(x))}{\phi'(u(x)) + \phi'(u(y))},$$
i.e. that $\phi'(u(y)) = \frac{\phi'(u(x))}{\phi(u(x))}\, \phi(u(y))$. In particular, fixing $x \in X$, $\phi'$ is a positive multiple of $\phi$ on the range of $u$. From Theorems 4 and 2, the range of the simplicity representation $(u, C)$ is a subset of the range of $u$. Therefore $\phi'$ is a positive multiple of $\phi$ everywhere on its domain; without loss of generality, we may take $\phi' = \phi$. This shows that at least one pair of representations $(\phi, u, C)$ and $(\phi, u, C')$ of $\rho$ and $\rho'$, respectively, exists.

To prove the converse direction, suppose that at least one pair of representations $(\phi, u, C)$ and $(\phi, u, C')$ of $\rho$ and $\rho'$, respectively, exists, and that $C \geq C'$. We need to check each of the conditions for $\rho$ being more complexity averse than $\rho'$. Since the representations have identical $\phi$ and $u$, it follows immediately that $\rho$ and $\rho'$ agree in value for degenerate lotteries and agree on ordering for lotteries of the same support. It follows by $C \geq C'$ and the stochastic simplicity representation that for all lotteries $q$ and outcomes $x \in X$, $\rho(\delta_x, \{q, \delta_x\}) \geq \rho'(\delta_x, \{q, \delta_x\})$.

A.10. Proof of Proposition 9. Suppose that lotteries $q, r \in \Delta X$ are such that $\sum_x u(x) q(x) < \sum_x u(x) r(x)$ and they have the same support size. By assumption, $C' = C + \epsilon$; we will take a derivative with respect to $\epsilon$. We have
$$(\phi, u, C')(q, \{q, r\}) = \frac{\phi\left(\sum_x u(x) q(x) - C(|\mathrm{support}(q)|) - \epsilon\right)}{\phi\left(\sum_x u(x) q(x) - C(|\mathrm{support}(q)|) - \epsilon\right) + \phi\left(\sum_x u(x) r(x) - C(|\mathrm{support}(r)|) - \epsilon\right)}.$$
Letting $U(q) = \sum_x u(x) q(x) - C(|\mathrm{support}(q)|) - \epsilon$ and $U(r) = \sum_x u(x) r(x) - C(|\mathrm{support}(r)|) - \epsilon$, taking a derivative with respect to $\epsilon$ yields
$$\frac{-\left(\phi(U(q)) + \phi(U(r))\right)\phi'(U(q)) + \phi(U(q))\left(\phi'(U(q)) + \phi'(U(r))\right)}{\left(\phi(U(q)) + \phi(U(r))\right)^2}.$$
This expression is weakly positive if and only if
$$-\phi'(U(q))\phi(U(r)) + \phi(U(q))\phi'(U(r)) \geq 0 \iff \frac{\phi(U(q))}{\phi'(U(q))} \geq \frac{\phi(U(r))}{\phi'(U(r))}.$$
Since $q$ and $r$ were arbitrary, this happens if and only if $\phi$ has increasing logarithmic derivative, i.e. if and only if $\frac{\phi(y)}{\phi'(y)}$ weakly decreases in $y$.

A.11. Proof of Proposition 10 and Corollary 2. Identical to the proof of Proposition 9, where $\epsilon$ is taken to represent $C(|\mathrm{support}(q')|) - C(|\mathrm{support}(q)|)$.
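The derivative computed in the proof of Proposition 9 can be checked against a finite difference. The sketch below does so for three assumed weighting functions $\phi$ and assumed values $U(q) = 1 < U(r) = 2$; none of these choices come from the paper. For $\phi(y) = e^y$ the ratio $\phi/\phi'$ is constant, for $\phi(y) = y^2$ it increases in $y$, and for $\phi(y) = e^{y^2}$ it decreases on $y > 0$.

```python
import math

def rho(eps, Uq, Ur, phi):
    """Probability of choosing q from {q, r} when both values fall by eps."""
    return phi(Uq - eps) / (phi(Uq - eps) + phi(Ur - eps))

def drho(eps, Uq, Ur, phi, dphi):
    """Closed-form derivative of rho with respect to eps, as in the proof."""
    a, b = Uq - eps, Ur - eps
    num = -(phi(a) + phi(b)) * dphi(a) + phi(a) * (dphi(a) + dphi(b))
    return num / (phi(a) + phi(b)) ** 2

def central_difference(f, x, h=1e-6):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

Uq, Ur = 1.0, 2.0  # hypothetical values with U(q) < U(r)
cases = {
    "exp(y)": (math.exp, math.exp),
    "y^2": (lambda y: y * y, lambda y: 2 * y),
    "exp(y^2)": (lambda y: math.exp(y * y), lambda y: 2 * y * math.exp(y * y)),
}

results = {}
for name, (phi, dphi) in cases.items():
    closed = drho(0.0, Uq, Ur, phi, dphi)
    numeric = central_difference(lambda e: rho(e, Uq, Ur, phi), 0.0)
    results[name] = (closed, numeric)
    print(name, closed, numeric)
```

The signs line up with the stated condition: the derivative is zero in the constant-ratio case, negative when $\phi/\phi'$ increases, and positive when it decreases.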