Toward a Formal Science of Heuristics

Ardavan S. Nobandegani^{1,3} & Thomas R. Shultz^{2,3}
{ardavan.salehinobandegani, william.campoli}@mail.mcgill.ca, {thomas.shultz}@mcgill.ca
^1 Department of Electrical & Computer Engineering, McGill University
^2 School of Computer Science, McGill University
^3 Department of Psychology, McGill University

Abstract

Heuristics are simple, effective cognitive processes that deliberately ignore parts of information relevant to decision-making. Ecological rationality, as an essential part of the Adaptive Toolbox research program on heuristics, investigates the environmental conditions under which simple heuristics would outperform complex models of decision-making, thereby providing support for the surprising less-is-more effect. In this work, we present a new research program, dubbed formal science of heuristics (FSH), that complements the ecological rationality research, developing it into a much richer research program. Concretely, FSH sets out to (i) mathematically delineate the broadest class of environmental conditions under which a heuristic is fully optimal, and (ii) formally investigate how deviations from those conditions would lead to degradation of performance, thereby allowing for a mathematically rigorous characterization of the heuristic's robustness. As an instantiation of the FSH research program, we present several analytical results aiming to delineate the mildest conditions granting the optimality of a well-known heuristic: Take The Best. We conclude by discussing the implications that pursuit of FSH could have on the science of heuristics.

Keywords: Ecological rationality; one-reason heuristics; formal science of heuristics; Take The Best heuristic

1 Introduction

Heuristics (simple, effective cognitive processes that deliberately ignore parts of information relevant to decision-making) are assumed to underpin much of human judgment and decision-making (e.g., Gigerenzer & Selten, 2001; Mousavi, Gigerenzer, & Kheirandish, 2016), and are widely considered to be sub-optimal, attaining higher speed at the expense of lower accuracy (e.g., Payne et al., 1993; Shah & Oppenheimer, 2008; Evans & Over, 2010).

Challenging the latter mindset, the influential ecological rationality research program (as part of the Adaptive Toolbox theory) maintains that heuristics are well-matched to the environment they are adopted in (Todd & Gigerenzer, 2007), and seeks to investigate the environmental conditions under which heuristics would outperform complex models of decision-making, giving rise to the surprising less-is-more effect: when less information or computation leads to more accurate judgments than more information or computation (Gigerenzer & Gaissmaier, 2011).

Despite great successes, ecological rationality work has predominantly focused on simulation-based demonstrations of simple heuristics outperforming complex strategies (e.g., Gigerenzer et al., 2008; Gigerenzer & Todd, 1999; Todd & Gigerenzer, 2000; Hoffrage & Reimer, 2004; Gigerenzer & Goldstein, 1996), directing comparatively little effort (but see, e.g., Martignon & Hoffrage, 2002; Hogarth & Karelaia, 2006) toward establishing a mathematically rigorous characterization of the environmental conditions underpinning the less-is-more effect. Todd and Gigerenzer (2007) explicitly call for developing such deep theoretical accounts.

In this work, we present a new research program, dubbed formal science of heuristics (FSH), that complements the ecological rationality research, developing it into a richer research program, and, additionally, permitting mathematicians and computer scientists to make important contributions to the science of heuristics.

Concretely, FSH pursues the following two objectives. (1) FSH seeks to mathematically delineate the broadest class of environmental conditions under which a heuristic is fully optimal (i.e., using the standard terminology of computer science, the environmental conditions under which a heuristic serves as a correct algorithm with respect to the objective of interest, or, equivalently, an approximation algorithm with an approximation ratio of one). (2) FSH aims to formally investigate how deviations from optimality conditions would lead to degradation of performance, thereby allowing for a mathematically rigorous characterization of a heuristic's robustness. As such, to provide the strongest theoretical support for the robustness of a heuristic, FSH aims to analytically provide the mildest technical conditions granting the optimality of a heuristic. According to the Adaptive Toolbox theory, robustness is predominantly responsible for the less-is-more effect, and plays a central role in the success of fast-and-frugal heuristics in everyday life decisions (e.g., Gigerenzer & Todd, 1999; Gigerenzer & Gaissmaier, 2011).

With regard to objective (2) mentioned above, one of the mildest technical conditions worth considering is distribution-free performance guarantees, widely studied in statistical learning theory and machine learning (e.g., Valiant, 1984; Kearns, Vazirani, & Vazirani, 1994). As is often the case, a decision-maker lacks (at least partially) the knowledge of the regularities of their environment, and, therefore, is not fully informed as to how information relevant to a decision-making task of interest is distributed. Distribution-free results, as the term suggests, establish performance guarantees that hold true regardless of the probability distribution governing a decision-making task (e.g., the distribution of attributes in a multi-alternative decision-making task). As such, distribution-free results provide strong robustness guarantees while demanding minimal environmental knowledge on the part of the decision-maker, thus playing an integral role in the FSH research program.

We should note that establishing distribution-free performance guarantees for a heuristic does not imply that (1) the decision-maker is inattentive to their environment, nor that (2) the decision-maker is not trying to select a heuristic well-matched to the environment; experimental evidence clearly suggests otherwise (e.g., Rieskamp & Otto, 2006; Hoffart, Rieskamp, & Dutilh, 2018; Payne, Bettman, & Johnson, 1988; Bröder, 2003; Pachur, Todd, Gigerenzer, Schooler, & Goldstein, 2011). On the contrary, establishing distribution-free performance guarantees on a heuristic ensures that the heuristic is well-matched to the environment, even when the decision-maker's knowledge of the environment is imperfect, which is a psychologically plausible assumption.

This work is organized as follows. We begin by presenting an overview of a well-known heuristic: Take The Best (TTB). As an instantiation of FSH, we then establish several analytical results, including strong distribution-free performance guarantees, for TTB. Finally, we conclude by discussing the implications that pursuit of FSH could have on the science of heuristics.

2 Take The Best: An Overview

Take The Best (TTB; Tversky, 1969; Gigerenzer, Hoffrage, & Kleinbölting, 1991) belongs to the class of one-reason decision-making heuristics, which base decisions on only one attribute value. In its classic form, TTB is concerned with the task of predicting which of two objects, each possessing several binary-valued attributes, has a higher value on a given criterion, e.g., which of two cities has a higher population, or which of two cookies would be more delicious.

The machinery of TTB is quite simple: Starting with the attribute having the highest validity, make pairwise comparisons between the attribute values of the two objects; as soon as the first discriminating attribute is encountered (i.e., the attribute on which the two objects differ), announce the object attaining the highest attribute value on the discriminating attribute to be the winning object. TTB visits attributes in a descending order of their validities. If no discriminating attribute is ever encountered, TTB selects the winning object uniformly at random.^1 In TTB, the validity $v_i$ of the $i$-th attribute is given by (Gigerenzer et al., 2008):

$$v_i := \frac{R_i}{R_i + W_i},$$

where $R_i$ and $W_i$ are the number of correct and incorrect inferences based on the $i$-th attribute alone, respectively.

^1 Without loss of generality, we assume that the decision-maker initially recognizes all the objects which s/he has to choose from. Accordingly, the use of the recognition heuristic (Gigerenzer et al., 2008), as the first step of TTB, is implicitly considered in our work.

The efficacy of TTB receives strong empirical support from a wide range of economic, demographic, environmental, and other prediction tasks (e.g., Gigerenzer et al., 2008; Czerlinski, Gigerenzer, & Goldstein, 1999; Chater, Oaksford, Nakisa, & Redington, 2003). For example, on the task of predicting which of two cities has a higher homeless rate, TTB achieves better prediction accuracy than several competitors, including a multiple regression model (Gigerenzer et al., 2008). More strikingly, Czerlinski, Gigerenzer, and Goldstein (1999) empirically showed that, across 20 real-world prediction problems, on average TTB obtains the best prediction accuracy when competing with several prominent alternatives, including multiple regression and the tallying heuristic. Relatedly, on the same 20 real-world prediction problems, Gigerenzer et al. (2008) empirically show that the predictive accuracy of TTB came, on average, within three percentage points of a complex Bayesian network model. More broadly, when environments are moderately unpredictable and learning samples are small, as with many social and economic situations, TTB tends to make inferences as accurately as or better than multiple regression and neural networks (Chater, Oaksford, Nakisa, & Redington, 2003).

To provide direct experimental evidence for TTB as a psychological model, Bröder and his colleagues (Bröder, 2000; Bröder & Schiffer, 2003) conducted 20 studies, concluding that TTB is used under a number of conditions such as when information is costly and the variability of the validity of the attributes is high. Furthermore, Bröder and Gaissmaier (2007) and Nosofsky and Bergert (2007) showed that TTB predicts response times better than weighted additive and exemplar models.

Previous work assessing the prediction accuracy of TTB has mainly focused on computer simulations, with some work establishing analytical results formally supporting the efficacy of TTB (e.g., Martignon & Hoffrage, 2002; Hogarth & Karelaia, 2005, 2006; Baucells, Carrasco, & Hogarth, 2008). Pursuing the research program proposed by FSH, and contrary to past analytical work, in this work we consider a much broader class of problems involving nonlinear objective functions (Definitions 1-4), with interactions between attributes also accounted for (Definition 2). For this broad class of problems, we formally establish conditions granting the optimality of TTB when dealing with both non-binary, discrete attribute values (Propositions 3, 5, and 6) and continuous attribute values (Proposition 4). We also analytically investigate a broad class of prediction problems, involving both structured noise (Definition 3) and unstructured noise (Definition 4), for which only probabilistic guarantees can be provided. Additionally, and in sharp contrast to past analytical work, we provide strong distribution-free guarantees on TTB for several classes of prediction problems (Propositions 5, 6, and 8).

3 Instantiating FSH: The Case of TTB

As an instantiation of FSH, in this section we establish several analytical results, including strong distribution-free performance guarantees, for TTB.

Before we proceed further, let us formally delineate an objective function which characterizes a broad class of decision-making problems.

Definition 1. (Objective function) Let $O_1, O_2, \ldots, O_N$ denote the set of $N$ objects a decision-maker should choose from. Let also $O_{ij} \in \{0,1\}$ denote the value of the $j$-th attribute of object $O_i$ where $1 \le j \le M$, and $w_k$ denote the weight corresponding to the $k$-th attribute, $A_k$. Finally, let $\psi(\cdot)$ be an arbitrary monotonically-increasing function (i.e., $\forall x: \frac{d}{dx}\psi(x) > 0$). Then, the winning object $O_{i^\star}$ is the one whose index $i^\star$ satisfies the following objective function:

$$i^\star := \arg\max_i \, \psi\Big(\sum_{j=1}^{M} w_j O_{ij}\Big). \tag{1}$$

Therefore, in the case of having only two objects $O_1, O_2$ to choose from, the optimal decision rule is given by:

$$\psi\Big(\sum_{j=1}^{M} w_j O_{1j}\Big) \;\underset{O_1}{\overset{O_2}{\lessgtr}}\; \psi\Big(\sum_{j=1}^{M} w_j O_{2j}\Big), \tag{2}$$

where $A \underset{O_1}{\overset{O_2}{\lessgtr}} B$ denotes the following: choose $O_1$ if $A > B$; choose $O_2$ if $A < B$; and choose uniformly at random between $O_1, O_2$ if $A = B$.

It is worth noting that in Eqs. (1-2), $\psi$ can be any monotonically-increasing function, e.g., $\psi(x) = e^x$ or $\psi(x) = 2x + \log(x)$.

Proposition 1. (Sufficient condition for optimality) If there exists a $k \in \mathbb{N}$ such that $\forall p < k$, $\forall i, j \in \{1, \ldots, N\}$: $O_{ip} = O_{jp}$, and $w_k > \sum_{i>k} w_i$, then the following holds true:

$$\exists j \in \{1, \ldots, N\} \;\; \forall i \neq j \;\; O_{jk} > O_{ik} \;\Rightarrow\; O_{i^\star} := O_j. \tag{3}$$

Importantly, Proposition 1 establishes a condition under which a decision can be based on only one attribute while preserving optimality with respect to the objective given in (1). As such, Proposition 1 provides a firm rational basis for the possibility of one-reason decision-making for the broad class of decision-making problems characterized in Definition 1.

Next, Proposition 2 establishes a condition granting the optimality of TTB (when choosing between an arbitrary number of objects) with respect to the objective given in (1).

Proposition 2. (Generalizing TTB to N-object prediction tasks) Let $O_1, O_2, \ldots, O_N$ denote the set of $N$ objects a decision-maker is to choose from. Let also $w_k$ denote the weight corresponding to the $k$-th attribute (see Definition 1), and $v_k$ denote the validity of the $k$-th attribute. If $\forall k \; w_k = v_k$ and $\exists r \in \mathbb{R}^{>2}$ s.t. $\forall k \; v_k \le (\frac{1}{r}) v_{k-1}$, then TTB is an optimal strategy for the class of decision-making problems characterized in Definition 1.

In the N-object setting (as in Proposition 2), TTB works as follows: Starting with the attribute having the highest validity, compare attribute values across the $N$ objects; as soon as the first discriminating attribute is encountered (i.e., the attribute on which at least two objects differ), exclude from consideration those objects faring worse on the discriminating attribute; announce the object surviving this elimination process to be the winning object. TTB visits attributes in a descending order of their validities. If no discriminating attribute is ever encountered, TTB selects the winning object uniformly at random.

Proposition 3. (Multi-level attribute values) Let $O_1, O_2, \ldots, O_N$ denote the set of $N$ objects a decision-maker is to choose from, with each object having $M$ attributes. Let also $O_{ij} \in \{0, 1, \cdots, \theta\}$ denote the value of the $j$-th attribute of object $O_i$ where $1 \le j \le M$. Finally, let $w_k$ denote the weight corresponding to the $k$-th attribute (see Definition 1), and $v_k$ denote the validity of the $k$-th attribute. If $\forall k \; w_k = v_k$, and $\exists r > 1 + \theta$ s.t. $\forall k \; v_k \le (\frac{1}{r}) v_{k-1}$, then TTB is an optimal strategy for the class of decision-making problems characterized in Definition 1.

In simple terms, Proposition 3 analytically establishes a condition granting the optimality of TTB (when generalized to the setting of $N$ objects, each with discrete, multi-level attribute values) with respect to the objective given in (1).

Proposition 4. (Continuous attribute values) Let $\forall i \in \{0,1\}$, $O_i$ denote the two objects a decision-maker should choose from, with each object having $M$ attributes. Let also $O_{ij} \in \mathbb{R}$ denote the value of the $j$-th attribute of object $O_i$. Finally, let $w_k$ denote the weight corresponding to the $k$-th attribute (see Definition 1), and $v_k$ denote the validity of the $k$-th attribute. Assuming that $k^*$ denotes the index of the discriminating attribute on which TTB halts, and $|O_{1k^*} - O_{2k^*}| \le \delta$, the following statement holds true: If $\forall k \; w_k = v_k$, and $\forall i, j \in \{1, \ldots, M\} \; O_{ij} \le U$, and $\exists r > 1 + \frac{U}{\delta}$ s.t. $\forall k \; v_k \le (\frac{1}{r}) v_{k-1}$, then TTB is an optimal strategy for the class of decision-making problems characterized in Definition 1.

Proposition 4 analytically establishes a condition granting the optimality of TTB (when choosing between two objects, each with continuous attribute values) with respect to the objective given in (1).

Following the line of research proposed by FSH, next we present our first distribution-free guarantee for TTB.

Proposition 5. (Distribution-free guarantee) Let $O_1, O_2, \ldots, O_N$ denote the set of $N$ objects a decision-maker is to choose from, with each object having $M$ attributes. Let also $O_{ij} \in \{0, 1, \cdots, \theta\}$ denote the value of the $j$-th attribute of object $O_i$, where $\{O_{ij}\}_{i,j} \overset{d}{\sim} P$ with $P$ denoting a joint probability distribution over the set of all attribute values $\{O_{ij}\}_{i,j}$. Finally, let $w_k$ denote the weight corresponding to the $k$-th attribute (see Definition 1), and $v_k$ denote the validity of the $k$-th attribute. Then, for any joint probability distribution $P$ the following statement holds true: If $\forall k \; w_k = v_k$, and $\exists r > 1 + \theta$ s.t. $\forall k \; v_k \le (\frac{1}{r}) v_{k-1}$, then TTB is an optimal strategy for the class of decision-making problems characterized in Definition 1.

Proposition 5 analytically establishes a condition ensuring the optimality of TTB (when generalized to the setting of $N$ objects, each with discrete, multi-level attribute values) with respect to the objective given in (1), in a strong distribution-free manner. It is crucial to note that the optimality guarantee given in Proposition 5 holds true for any joint distribution $P$ on the set of all attribute values.

Next, in Definition 2, we formally characterize a broad class of prediction problems wherein interactions between attribute values are also accounted for.

Definition 2. (Objective function) Let $O_1, O_2, \ldots, O_N$ denote the set of $N$ objects a decision-maker should choose from, and $O_{ij} \in \{0, 1, \cdots, \theta\}$ denote the value of the $j$-th attribute of object $O_i$ where $1 \le j \le M$. Additionally, let $w_k$ denote the weight corresponding to the $k$-th attribute, and $r_{pq}$ denote the weight quantifying the amount of interaction between the $p$-th and the $q$-th attributes. Finally, let $\chi(\cdot)$ be an arbitrary monotonically-increasing function (i.e., $\forall x: \frac{d}{dx}\chi(x) > 0$). Then, the winning object $O_{i^\star}$ is the one whose index $i^\star$ satisfies the following objective function:

$$i^\star := \arg\max_i \, \chi\Big(\sum_{j=1}^{M} w_j O_{ij} + \sum_{\substack{p,q \\ p \neq q}} r_{pq} O_{ip} O_{iq}\Big). \tag{4}$$

Proposition 6. (Distribution-free guarantee) Consider the class of prediction problems formally characterized in Definition 2. Let also $P$ denote a joint probability distribution over the set of all attribute values $\{O_{ij}\}_{i,j}$, i.e., $\{O_{ij}\}_{i,j} \overset{d}{\sim} P$. Then, for any joint probability distribution $P$ the following statement holds true: If $\forall k \; w_k = v_k$, and $\exists R \in \mathbb{R}$ s.t. $\forall p, q \in \{1, \cdots, M\} \; r_{pq} < R$, and $\exists r > 1 + \theta$ s.t. $\forall k \; \frac{r-1}{r-(\theta+1)} \binom{M}{2} \theta^2 R \le v_k \le (\frac{1}{r}) v_{k-1}$, then TTB is an optimal strategy for the class of decision-making problems characterized in Definition 2.

In simple terms, Proposition 6 formally establishes a distribution-free result granting the optimality of TTB (when generalized to the N-object setting, each with discrete, multi-level attribute values) with respect to the broad class of prediction problems characterized in Definition 2 (with interactions between attributes also accounted for).

Next, Definition 3 formally characterizes a broad class of prediction problems under the noisy-world setting wherein the noise component contaminating the prediction problem has a particular structured form: a Gaussian distribution.

Definition 3. (Objective function) Let $\forall i \in \{0,1\}$, $O_i$ denote the two objects a decision-maker should choose from, and $O_{ij} \in \{0,1\}$ denote the value of the $j$-th attribute of object $O_i$ where $1 \le j \le M$. Additionally, let $w_k$ denote the weight corresponding to the $k$-th attribute, and $C_i$ denote the score the object $O_i$ attains on the criterion of interest to the prediction task (e.g., the population of a city, if the prediction task is to predict which of two cities has a higher population). Finally, let $\chi(\cdot)$ be an arbitrary monotonically-increasing function (i.e., $\forall x: \frac{d}{dx}\chi(x) > 0$). Then, consider the class of prediction problems satisfying the following:

$$C_i := \chi\Big(\sum_{j=1}^{M} w_j O_{ij}\Big) + \varepsilon, \qquad \varepsilon \overset{d}{\sim} \mathcal{N}(0, \sigma^2). \tag{5}$$

Proposition 7. (Noise-level-independent probabilistic guarantee) Consider the class of prediction problems formally characterized in Definition 3. Then, for any noise variance $\sigma^2 > 0$, the following statement holds true: If $\forall k \; w_k = v_k$, and $\exists r \in \mathbb{R}^{>2}$ s.t. $\forall k \; v_k \le (\frac{1}{r}) v_{k-1}$, then the probability with which TTB correctly selects the superior object is $\ge 0.5$.

Proposition 7 formally establishes the following important result for the inherently noisy world characterized in Definition 3: For any noise level $\sigma^2$, TTB dominates the selection-purely-by-chance strategy, which selects the winning object uniformly at random. This result demonstrates that, independent of the noise level $\sigma^2$, the adoption of TTB (instead of the selection-purely-by-chance strategy) is rationally justified for the inherently noisy class of prediction problems formally characterized in Definition 3.

Definition 4 below formally characterizes a broad class of prediction problems, once again, under the noisy-world setting; this time, however, the noise component contaminating the prediction problem has an unstructured form.

Definition 4. (Probabilistic guarantee) Let $\forall i \in \{0,1\}$, $O_i$ denote the two objects a decision-maker should choose from. Let also $O_{ij} \in \{0,1\}$ denote the value of the $j$-th attribute of object $O_i$, $w_k$ denote the weight corresponding to the $k$-th attribute, and $C_i$ denote the score the object $O_i$ attains on the criterion of interest to the prediction task (e.g., the population of a city, if the prediction task is to predict which of two cities has a higher population). Finally, let $\varphi(\cdot)$ be a monotonically-increasing function (i.e., $\forall x: \frac{d}{dx}\varphi(x) > 0$). Then, consider the class of prediction problems satisfying the following:

$$P\Big(C_1 > C_2 \;\Big|\; \varphi\Big(\sum_{j=1}^{M} w_j O_{1j}\Big) > \varphi\Big(\sum_{j=1}^{M} w_j O_{2j}\Big)\Big) \ge 1 - \eta, \qquad 0 < \eta \ll 1.$$

Proposition 8. (Distribution-free guarantee) Consider the class of prediction problems formally characterized in Definition 4. Let also $P$ denote a joint probability distribution over the set of all attribute values $\{O_{ij}\}_{i,j}$, i.e., $\{O_{ij}\}_{i,j} \overset{d}{\sim} P$. Then, for any joint probability distribution $P$ the following statement holds true: If $\forall k \; w_k = v_k$, and $\exists r \in \mathbb{R}^{>2}$ s.t. $\forall k \; v_k \le (\frac{1}{r}) v_{k-1}$, then the probability with which TTB mistakenly selects the inferior object is less than $\eta$, where $0 < \eta \ll 1$.

Proposition 8 establishes a distribution-free condition sufficient to grant that the probability of TTB erring in a prediction task belonging to the class of problems characterized in Definition 4 is minuscule.

4 General Discussion

In this work, we presented a research program, dubbed formal science of heuristics (FSH), that complements the influential ecological rationality research program (Todd & Gigerenzer, 2007), developing it into an analytically much richer scientific endeavor. By pursuing its two stated goals (see the Introduction), FSH seeks to (i) mathematically delineate the key premise ecological rationality rests on, namely that heuristics are well-matched to the environments in which they are adopted (Todd & Gigerenzer, 2007), and (ii) establish the strongest analytical results supporting this premise. After all, to rigorously and thoroughly answer whether a heuristic is well-matched to its environment, we need to formally characterize the broadest class of environments for which that heuristic performs (near) optimally, and experimentally investigate how often people use that heuristic in such environments.

Instantiating FSH with the well-known Take The Best (TTB) heuristic, and contrary to past analytical work, in this work we considered a much broader class of prediction problems involving nonlinear objective functions (Definitions 1-4), with interactions between attributes also accounted for (Definition 2). For the classes discussed above, we formally established conditions granting the optimality of TTB when dealing with both non-binary, discrete attribute values (Propositions 3, 5, and 6) and continuous attribute values (Proposition 4). We also analytically investigated a broad class of prediction problems, involving both structured noise (Definition 3) and unstructured noise (Definition 4), for which only probabilistic guarantees can be provided. Additionally, and in sharp contrast to past analytical work, we provided distribution-free guarantees on TTB for several classes of prediction problems (Propositions 5, 6, and 8).

Our work also serves as a potential template for how FSH could be pursued: For a given heuristic, formally characterize the class of decision-making problems with respect to which the performance of the heuristic is to be analytically investigated, followed by analytical results rigorously delineating the extent to which that heuristic performs (near) optimally for that class. A generic approach would be to start with a narrow class (containing a set of restricted problems) for which a heuristic performs (near) optimally, and then gradually expand that class into a larger one and see if previously established performance guarantees still hold (or establish new performance guarantees, in case they fail to hold). A similar approach has been widely and productively used in theoretical computer science and computational complexity theory, e.g., through formally introducing many complexity classes, with one class serving as a relaxation of another.

Our particular focus on TTB in this work was only meant to showcase how the mindset advocated by FSH could be pursued in the case of a given heuristic. Ultimately, a serious investigation of FSH should lead to mathematically rigorous answers to the two stated goals of FSH for every experimentally well-documented heuristic that people use, e.g., the Tallying heuristic (Gigerenzer & Gaissmaier, 2011), the Priority heuristic (Katsikopoulos & Gigerenzer, 2008), the Recognition heuristic (Gigerenzer & Gaissmaier, 2011), and the Minimalist heuristic (Gigerenzer et al., 2008). By now, a large number of heuristics are documented in the literature, many of which still lack an adequate characterization of the environmental conditions under which they are (near) optimal and/or how deviations from those conditions would lead to performance degradation. Thus, future work following FSH should address this analytical shortcoming.

Rieskamp and Otto (2006) show that people are sensitive to the distribution of cues in an environment, appropriately applying either TTB or a weighted additive mechanism, depending on which will be more accurate. However, how people are able to determine which type of environment they are in has largely remained an open question. Establishing distribution-free guarantees, as advocated by FSH, sheds new light on this open question, by formally demonstrating that a heuristic may well yield adequate performance despite the decision-maker's possibly incomplete (or, in the worst case, erroneous) assumptions about her environmental conditions, thereby liberating her from having a thorough understanding of her environment, which is a more psychologically plausible assumption. For example, Proposition 5 establishes a condition granting the optimality of TTB (with respect to the class of problems characterized in Definition 1) that holds true for any joint probability distribution $P$ on the set of attribute values. This result has an important implication: Even if the decision-maker makes wrong assumptions about the true underlying distribution $P$ governing the set of attribute values, adopting TTB remains the optimal strategy (for the class of problems characterized in Definition 1). Crucially, the latter statement remains valid regardless of how wrong the decision-maker's assumptions about $P$ are.

We must note, however, that the present work (and pursuit of FSH, in general) does not address the recent conundrum raised by Otworowska et al. (2018) regarding the computational intractability of the Adaptive Toolbox theory. As Otworowska et al. (2018) analytically demonstrate, there exists no efficient (i.e., polynomial-time) process that can adapt toolboxes to be ecologically rational for all possible environments. A resolution of this complexity-theoretic conundrum might be attained by restricting the class of environments under consideration, based on the psychologically plausible assumption that the range of environments humans have to deal with is undoubtedly vast, but not arbitrary.

Pursuit of FSH would have important implications for experimental work on heuristics: Every analytical result (however general it may be) is established under a particular set of assumptions, the validity of which needs to be experimentally confirmed. Experimental work should therefore investigate the empirical validity of such assumptions. Likewise, an empirical disconfirmation of an assumption on which an analytical result rests should call for the development of new empirically grounded formal results. Accordingly, pursuit of FSH yields new experimental work, and, conversely, those experimental findings guide the development of new analytical results, making for a synergetic scientific endeavor.

Finally, pursuit of FSH allows mathematicians and theoretical computer scientists to make important contributions to the science of heuristics by developing a mathematically rigorous foundation for the effectiveness of heuristics in everyday life decisions. As such, we hope that FSH paves the way for a highly interdisciplinary research program on heuristics wherein analytical and experimental studies, hand in hand, deepen our understanding of the effectiveness of the heuristics we live by. We see our work as a step toward that.

Investigations into human judgment and decision-making have led to the discovery of a multitude of cognitive biases and fallacies, with new ones continually emerging, leading to a state of affairs which can be characterized as the cognitive fallacy zoo! Recently, we have formally presented a principled way to bring order to this zoo (Nobandegani, Campoli, & Shultz, 2019). The work presented here, together with recent formal advances on bringing systematic order to the cognitive fallacy zoo (Nobandegani, Campoli, & Shultz, 2019), suggests a fresh formal approach to pursuing the heuristics-and-biases research program: an approach which aims to lay the formal foundations of the "unreasonable" effectiveness of the heuristics we live by, and to bring mathematically rigorous systematic order to the cognitive biases that ensue from those heuristics.

References

Baucells, M., Carrasco, J. A., & Hogarth, R. M. (2008). Cumulative dominance and heuristic performance in binary multiattribute choice. Operations Research, 56(5), 1289–1304.

Bröder, A. (2000). Assessing the empirical validity of the "take-the-best" heuristic as a model of human probabilistic inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26(5), 1332.

Bröder, A. (2003). Decision making with the "adaptive toolbox": Influence of environmental structure, intelligence, and working memory load. Journal of Experimental Psychology: Learning, Memory, and Cognition, 29(4), 611.

Bröder, A., & Gaissmaier, W. (2007). Sequential processing of cues in memory-based multiattribute decisions. Psychonomic Bulletin & Review, 14(5), 895–900.

Bröder, A., & Schiffer, S. (2003). Take the best versus simultaneous feature matching: Probabilistic inferences from memory and effects of representation format. Journal of Experimental Psychology: General, 132(2), 277.

Chater, N., Oaksford, M., Nakisa, R., & Redington, M. (2003). Fast, frugal, and rational: How rational norms explain behavior. Organizational Behavior and Human Decision Processes, 90(1), 63–86.

Czerlinski, J., Gigerenzer, G., & Goldstein, D. G. (1999). How good are simple heuristics? In Simple Heuristics that Make us Smart (pp. 97–118). Oxford University Press.

Evans, J. S. B., & Over, D. E. (2010). Heuristic thinking and human intelligence: A commentary on Marewski, Gaissmaier and Gigerenzer. Cognitive Processing, 11(2), 171–175.

Gigerenzer, G., & Gaissmaier, W. (2011). Heuristic decision making. Annual Review of Psychology, 62, 451–482.

Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103(4), 650.

Gigerenzer, G., Hoffrage, U., & Kleinbölting, H. (1991). Probabilistic mental models: A Brunswikian theory of confidence. Psychological Review, 98(4), 506.

Gigerenzer, G., Martignon, L., Hoffrage, U., Rieskamp, J., Czerlinski, J., & Goldstein, D. G. (2008). One-reason decision making. In C. Plott & V. Smith (Eds.), Handbook of Experimental Economics Results (1st ed., Vol. 1, pp. 1004–1017). North Holland.

Gigerenzer, G., & Selten, R. (2002). Bounded rationality: The adaptive toolbox. MIT Press.

Gigerenzer, G., & Todd, P. M. (1999). Simple Heuristics that Make us Smart. New York: Oxford University Press.

Hoffart, J. C., Rieskamp, J., & Dutilh, G. (2018). How environmental regularities affect people's information search in probability judgments from experience. Journal of Experimental Psychology: Learning, Memory, and Cognition.

Hoffrage, U., & Reimer, T. (2004). Models of bounded rationality: The approach of fast and frugal heuristics. Management Revue, 437–459.

Hogarth, R. M., & Karelaia, N. (2005). Simple models for multi-attribute choice with many alternatives: When it does and does not pay to face trade-offs with binary attributes. Management Science, 51(12), 1860–1872.

Hogarth, R. M., & Karelaia, N. (2006). "Take-the-best" and other simple strategies: Why and when they work well with binary cues. Theory and Decision, 61(3), 205–249.

Katsikopoulos, K. V., & Gigerenzer, G. (2008). One-reason decision-making: Modeling violations of expected utility theory. Journal of Risk and Uncertainty, 37(1), 35.

Kearns, M. J., Vazirani, U. V., & Vazirani, U. (1994). An introduction to computational learning theory. MIT Press.

Martignon, L., & Hoffrage, U. (2002). Fast, frugal, and fit: Simple heuristics for paired comparison. Theory and Decision, 52(1), 29–71.

Mousavi, S., Gigerenzer, G., & Kheirandish, R. (2016). Rethinking behavioral economics through fast-and-frugal heuristics. Routledge Handbook of Behavioral Economics, 280–296.

Nobandegani, A. S., Campoli, W., & Shultz, T. R. (2019). Bringing order to the cognitive fallacy zoo. In Proceedings of the 17th International Conference on Cognitive Modeling. Montreal, QC.

Nosofsky, R. M., & Bergert, F. B. (2007). Limitations of exemplar models of multi-attribute probabilistic inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33(6), 999.

Otworowska, M., Blokpoel, M., Sweers, M., Wareham, T., & van Rooij, I. (2018). Demons of ecological rationality. Cognitive Science, 42(3), 1057–1066.

Pachur, T., Todd, P. M., Gigerenzer, G., Schooler, L., & Goldstein, D. G. (2011). The recognition heuristic: A review of theory and tests. Frontiers in Psychology, 2, 147.

Payne, J. W., Bettman, J. R., & Johnson, E. J. (1988). Adaptive strategy selection in decision making. Journal of Experimental Psychology: Learning, Memory, and Cognition, 14(3), 534.

Payne, J. W., Bettman, J. R., & Johnson, E. J. (1994). The adaptive decision maker. New York: Cambridge University Press.

Rieskamp, J., & Otto, P. E. (2006). SSL: A theory of how people learn to select strategies. Journal of Experimental Psychology: General, 135(2), 207.

Shah, A. K., & Oppenheimer, D. M. (2008). Heuristics made easy: An effort-reduction framework. Psychological Bulletin, 134(2), 207.

Todd, P. M., & Gigerenzer, G. (2000). Précis of Simple heuristics that make us smart. Behavioral and Brain Sciences, 23(5), 727–741.

Todd, P. M., & Gigerenzer, G. (2007). Environments that make us smart: Ecological rationality. Current Directions in Psychological Science, 16(3), 167–171.

Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76(1), 31.

Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.