Université Paris Descartes

École doctorale 261 « Cognition, Comportements, Conduites Humaines »
Laboratoire de Psychologie du Développement et de l’Éducation de l’enfant (LaPsyDÉ), UMR 8240, CNRS

Testing the corrective assumption of dual process theory in reasoning

Bence BAGO

Thèse de doctorat de Psychologie, mention Neurosciences Cognitives

Dirigée par Dr. Wim De Neys

Présentée et soutenue publiquement le 30 mai 2018.

Devant un jury composé de :
De Neys, Wim, Directeur de recherche, CNRS – Directeur de la thèse
Mercier, Hugo, Chargé de recherche, CNRS – Rapporteur
Prado, Jérôme, Chargé de recherche, Université de Lyon 1 – Rapporteur
Sander, Emmanuel, Professeur des Universités, Université de Genève – Examinateur
Borst, Grégoire, Professeur des Universités, Université Paris Descartes – Examinateur



Résumé (français) : Dans le champ du raisonnement, les théories du double processus sont largement reconnues comme expliquant différents phénomènes, tels que les biais décisionnels et le raisonnement moral ou coopératif. Ces théories conçoivent le mode de pensée de l’homme comme une interaction entre un système rapide, automatique et intuitif (Système 1) et un système plus lent et contrôlé (Système 2). Le point de vue dominant sur les doubles processus est le modèle default-interventionist, qui suppose l’existence d’une interaction sérielle entre ces systèmes. Ainsi, quand quelqu’un est confronté à un problème de raisonnement, la réponse du Système 1 se forme initialement ; puis, le Système 2 peut être impliqué dans le processus. Les théories du double processus dominantes postulent que les biais de raisonnement sont le résultat d’une réponse intuitive erronée du Système 1. Selon ces théories, le Système 1 est capable de générer des réponses basées sur les indices « heuristiques », tels que les stéréotypes, mais il ne peut pas rendre compte des principes logico-mathématiques. Malgré la grande reconnaissance qu’elle a reçue, cette théorie contient une présomption jamais testée, à savoir la présomption « corrective ». Celle-ci postule que, dans les situations où les indices heuristiques sont en conflit avec les principes logico-mathématiques, le Système 2 doit obligatoirement être impliqué afin de corriger la réponse erronée du Système 1 et d’arriver ainsi à une réponse fondée sur les principes logiques. Tester cette présomption semble donc crucial ; c’est la question centrale de cette thèse. Dans l’Etude 1, j’ai utilisé des versions modifiées du paradigme des deux réponses afin de tester la présomption corrective avec deux problèmes classiques du raisonnement (problèmes de taux de base et raisonnement syllogistique). Dans ce paradigme, les participants résolvent la même tâche deux fois. D’abord, ils doivent donner une réponse très rapidement ; ensuite, ils font face à la même tâche sans contrainte temporelle. Afin de vérifier que la première réponse est intuitive, j’ai employé quatre méthodes : des instructions, une charge concomitante, un temps limite de réponse, ainsi que la charge concomitante et le temps limite simultanément. La théorie du double processus prédit que les réponses logiquement correctes n’apparaissent qu’à l’étape finale. A contrario, j’ai trouvé que la plupart des participants ayant donné la bonne réponse à l’étape finale l’avaient déjà donnée lors de la phase initiale. Cet effet était présent dans toutes les procédures expérimentales et dans les deux problèmes de raisonnement. Dans l’Etude 2, j’ai testé la même présomption avec un problème de raisonnement plus difficile, le problème de la « batte-et-balle ». J’ai conduit 7 expériences avec le paradigme des deux réponses et j’ai trouvé que les personnes ayant donné la réponse correcte à la fin l’avaient déjà générée lors de la réponse initiale – il semblerait donc qu’elles l’aient générée intuitivement. Ces résultats m’ont amené à réviser le cadre default-interventionist et à proposer une théorie du double processus hybride, qui suppose que le Système 1 génère deux réponses intuitives différentes, dont une basée sur les principes logico-mathématiques. Ces réponses ne possèdent pas nécessairement la même force au départ – celle qui gagne le plus en force sera donnée comme réponse initiale. J’ai testé les prédictions dérivées de ce modèle dans l’Etude 3. Dans l’Etude 4, j’ai utilisé l’EEG afin de rechercher les corrélats neuronaux du traitement logique précoce au cours du raisonnement. Au cours de l’Etude 5, j’ai commencé à tester la possibilité de généraliser ce modèle hybride et j’ai étudié si les patterns de réponse étaient similaires lorsque les participants répondent à des dilemmes moraux. Grâce à l’Etude 6, j’ai affiné le modèle hybride en testant les changements de force des réponses intuitives au cours du temps. En résumé, cette thèse montre la nécessité de réviser la vue traditionnelle des théories du double processus du raisonnement chez l’homme.

Mots clés (français) : Raisonnement, Théorie du double processus, Supposition « corrective »


Abstract : Dual-process theories of reasoning have become widely recognized as an explanation for various phenomena, such as thinking biases and moral or cooperative reasoning. Dual-process theory conceives human thinking as the interaction of a fast, more automatic, intuitive system (System 1) and a slower, controlled, more deliberative one (System 2). Arguably, the most dominant view on dual processes is the default-interventionist model. This posits a serial interaction between the two systems: when someone is faced with a reasoning problem, a System 1 intuitive response is formed initially; afterwards, System 2 might get engaged in the process. Prominent dual-process theorists argue that reasoning bias occurs as a result of erroneous System 1 intuition. System 1 is thought to be able to generate responses based on "heuristic" cues, such as stereotypes, rather than on logico-mathematical principles. Despite its huge recognition, this theory comes with an untested assumption: the corrective (time-course) assumption. This posits that in cases in which heuristic cues are in conflict with logico-mathematical principles, System 2 needs to engage in order to correct the initially formed System 1 intuition and form a judgement based on logical principles. Testing this assumption is critically important and is the central question of this thesis. In Study 1, I used four modified versions of the two-response paradigm to test the corrective assumption with two different classic reasoning problems (base rate problems, syllogisms). In this paradigm, people are presented with the same problem twice. First, they are asked to give an initial, very quick response. After, they are presented with the same problem again and asked to give a final response without any constraints. To make sure that the initial response is really intuitive, I applied four different procedures: instructions, concurrent load, response deadline, and load plus deadline. Dual process theory predicts that logically correct responses will only appear at the final response stage. Surprisingly, I found that the majority of people who gave the logically correct response in the final response stage already gave it from the beginning. This effect was consistent across all experimental procedures and both reasoning problems. In Study 2, I tested the same assumption with a different, harder reasoning task, the bat-and-ball problem. I ran 7 experiments with the two-response paradigm and consistently found that correct reasoners are often able to generate the correct response from the beginning, so to say, intuitively. These results forced me to revise the default-interventionist framework and propose the hybrid dual process model. This model argues that System 1 generates two kinds of intuitive responses, one of which is based on logico-mathematical principles. These responses are not necessarily generated with equal strength – the one which gains more strength will be given as the initial response. In Study 3, I directly tested predictions derived from this model. In Study 4, I used EEG to search for the neural correlates of early logical processing in reasoning. In Study 5, I started to test the hybrid model's domain generality and examined whether similar patterns of responses arise when people are faced with moral dilemmas. In Study 6, I further developed the hybrid model by examining the changes in the strength of intuitive responses over time. Overall, this thesis found evidence that forces us to revise the traditional dual process view on human reasoning.

Keywords : Reasoning; Dual process theory; Corrective assumption


No one can give you back the time you spend reading this.


Acknowledgements

This work would not have been started, let alone completed, without my supervisor. He has a very good eye for detail and a probably unlimited source of patience. What I managed to learn about scientific writing over the last years comes directly from him. I would like to thank him for helping me cope with difficulties, for fighting the sometimes irrational reviewers with me, and for the enormous amount of time and energy he invested in me and my career; and when I say enormous, I am not exaggerating. I sometimes wonder how he kept up with his other duties. If I become half as good a researcher as he is, I will already be satisfied with myself. Wim, thank you very much. I do hope I will have the opportunity to continue working with you in the future.

Wim taught me half of what I know about research, but the other half comes from my first supervisor, later collaborator and friend, Balázs Aczél. Balázs was the first to introduce me to the world of reasoning and science in general. He taught me how to be critical of my own work. I thank him for his help all the way from the very beginning to this point. I thank him for all the endless debates we had about science over the years. I do hope they continue. I would also like to thank the members of ELTE Decisionlab in general, who remained my collaborators and friends even after I left Budapest: Aba Szöllösi, Andrei Földes, Barnabás Szászi, Bence Pálfi, Márton Kovács, Péter Szécsi.

I would like to say thank you to all the members of LaPsyDÉ. First of all, to Darren Frey, with whom I enjoyed working a lot, and who always had a couple of nice words to make me feel better about my work. Also to Matthieu Raoelison, whose programming skills helped develop this work as well. I would like to say special thanks to Julie Vidal, who taught me all I know about EEGLab and EEG analysis in general. I would like to thank all of you for your direct contribution to this work. Furthermore, a huge thank you to Gregoire Borst, who literally always helped me with whatever problem I had, regardless of its size and the time required to solve it.

I would like to thank all of the (used-to-be) Ph.D. students who made every day a bit brighter, a bit more fun: Emmanuel Ahr, Lison Bouhours, Annaelle Camarda, Adriano Linzarini, Cloelia Tissier. Special thanks to Margot Roell, who always provided me with a daily dose of gossip and French-English translations whenever necessary.


I was granted an internship at the University of Amsterdam during my Ph.D. There, Leendert van Maanen and Han van der Maas helped me understand dynamic decision models and further develop my ideas. I owe them a lot of thanks for the opportunity and their help.

Thanks to my parents, who encouraged me all the way from the beginning and who supported my decisions even though they did not always agree with me. My mother taught me how to fight the impossible, and my father taught me how to figure out solutions in very hard situations. I am eternally grateful to my two siblings, Bálint and Zsófi, who have filled my life with a lot of fun since they were born; and also to our cat, Ficzkó. There is only one last name I have not mentioned here: my beloved girlfriend, Anna. You are a grammar nazi. Way worse than Balázs. However, the fact that my life in Paris over the last few years was not simply good but great is entirely on you. Whenever a review took my will to live, you always gave it back. I am eternally grateful for your love, patience and constant support. This work would simply not be here if it was not for you.

I would like to dedicate the first paragraph of the Introduction to Bence Turbók who never did anything like that. “Never”. I would like to dedicate this work to all of my Hungarian best friends, my high school mates, who will also happen to be present when I defend this thesis.

Funding information: This Ph.D. was fully sponsored by the Ecole des Neurosciences de Paris. My gratitude will follow them ‘til the end.


Table of Contents

ACKNOWLEDGEMENTS ...... 6

TABLE OF CONTENTS ...... 8

RELATED SCIENTIFIC OUTPUT ...... 11

SYMBOLS AND ACRONYMS ...... 13

INTRODUCTION ...... 14

REFERENCES ...... 27

CHAPTER 1: FAST LOGIC? EXAMINING THE TIME-COURSE ASSUMPTION OF DUAL PROCESS THEORY ...... 31

INTRODUCTION ...... 32
METHOD ...... 38
EXPERIMENT 1 – INSTRUCTIONS ONLY ...... 39
EXPERIMENT 2 – RESPONSE DEADLINE ...... 44
EXPERIMENT 3 – LOAD ...... 48
EXPERIMENT 4 – RESPONSE DEADLINE AND LOAD ...... 50
RESULTS ...... 50
GENERAL DISCUSSION ...... 69
REFERENCES ...... 77

CHAPTER 2: THE SMART SYSTEM 1: EVIDENCE FOR THE INTUITIVE NATURE OF CORRECT RESPONDING IN THE BAT-AND-BALL PROBLEM ...... 81

INTRODUCTION ...... 82
STUDY 1 ...... 87
METHOD ...... 87
MATERIALS ...... 87
RESULTS ...... 94
DISCUSSION ...... 99
STUDY 2-5 ...... 100
METHOD – STUDY 2: 2-OPTION VS 4-OPTION FORMAT (CROWDSOURCE SAMPLE) ...... 100
METHOD – STUDY 3: 2-OPTION VS 4-OPTION FORMAT (HUNGARIAN STUDENT SAMPLE) ...... 101
METHOD – STUDY 4: FREE RESPONSE FORMAT (CROWDSOURCE SAMPLE) ...... 102
METHOD – STUDY 5: FREE RESPONSE FORMAT (HUNGARIAN CONVENIENCE SAMPLE) ...... 103
RESULTS ...... 103
DISCUSSION ...... 106


STUDY 6 ...... 107
METHODS ...... 107
RESULTS AND DISCUSSION ...... 110
STUDY 7 ...... 111
METHODS ...... 111
RESULTS AND DISCUSSION ...... 113
GENERAL DISCUSSION ...... 115
REFERENCES ...... 122

CHAPTER 3: ADVANCING THE SPECIFICATION OF DUAL PROCESS MODELS OF HIGHER COGNITION: A CRITICAL TEST OF THE HYBRID DUAL PROCESS MODEL ...... 126

INTRODUCTION ...... 127
METHOD ...... 137
RESULTS ...... 142
GENERAL DISCUSSION ...... 148
REFERENCES ...... 154

CHAPTER 4: FAST AND SLOW THINKING: ELECTROPHYSIOLOGICAL EVIDENCE FOR EARLY CONFLICT SENSITIVITY ...... 157

INTRODUCTION ...... 158
METHOD ...... 162
PARTICIPANTS ...... 162
MATERIAL AND PROCEDURE ...... 163
RESULTS ...... 166
BEHAVIORAL RESULTS ...... 166
GENERAL DISCUSSION ...... 171
REFERENCES ...... 174

CHAPTER 5: THE INTUITIVE GREATER GOOD: TESTING THE CORRECTIVE DUAL PROCESS MODEL OF MORAL COGNITION...... 178

INTRODUCTION ...... 179
STUDY 1 ...... 183
METHOD ...... 183
RESULTS ...... 190
STUDY 2 ...... 192
METHOD ...... 192
RESULTS ...... 194
STUDY 3 ...... 195
METHODS ...... 196
RESULTS AND DISCUSSION ...... 197
GENERAL DISCUSSION ...... 202
REFERENCES ...... 210

CHAPTER 6: RISE AND FALL OF CONFLICTING INTUITIONS DURING REASONING ...... 214

INTRODUCTION ...... 215


STUDY 1 ...... 217
METHOD ...... 217
RESULTS ...... 220
DISCUSSION ...... 222
STUDY 2 ...... 223
METHOD ...... 223
RESULTS AND DISCUSSION ...... 223
GENERAL DISCUSSION ...... 225
REFERENCES ...... 227

CONCLUSION ...... 228

REFERENCES ...... 239

APPENDIX A ...... 241

APPENDIX B ...... 245

APPENDIX C ...... 261



RELATED SCIENTIFIC OUTPUT

This thesis comprises the following peer-reviewed journal articles that have been accepted or submitted for publication:

Chapter 1. [1.] Bago, B., & De Neys, W. (2017). Fast logic?: Examining the time course assumption of dual process theory. Cognition, 158, 90-109

Chapter 2. [2.] Bago, B. & De Neys, W. (revision under review). The smart System 1: Evidence for the intuitive nature of correct responding in the bat-and-ball problem.

Chapter 3. [3.] Bago, B. & De Neys, W. (revision under review). Advancing the specification of dual process models of higher cognition: a critical test of a hybrid model view.

Chapter 4. [4.] Bago, B., Frey, D., Vidal, J., Houdé, O., Borst, G. & De Neys, W. (under review). Fast and Slow Thinking: Electrophysiological evidence for early conflict sensitivity.

Chapter 5. [5.] Bago, B., & De Neys, W. (under review). The intuitive greater good: Testing the corrective dual-process model of moral cognition.

Chapter 6. [6.] Bago, B., & De Neys, W. (2017). The rise and fall of conflicting intuitions during reasoning. Proceedings of the 39th Annual Meeting of the Cognitive Science Society. London, UK, 26th – 29th July, 2017., pp. 87-92

The Introduction and the final Conclusion build on all of these articles as well.

The following list includes additional scientific output achieved during this Ph.D. but not included in the dissertation.

[7.] Bago, B., Raoelison, M., & De Neys, W. (under revision). Second guess: Testing the specificity of error detection in the bat-and-ball problem.

[8.] Frey, D., Bago, B., & De Neys, W. (2017). Commentary: Seeing the conflict: an attentional account of reasoning errors. Frontiers in Psychology.

[9.] Frey, D., De Neys, W., & Bago, B. (2016). The Jury of Intuition: Conflict Detection and Intuitive Processing. Journal of Applied Research in Memory and Cognition, 5(3), 335-337.

[10.] Aczel, B., Szollosi, A. & Bago, B. (2018). The Effect of Transparency on Framing Effects in Within-subjects Designs. Journal of Behavioral Decision Making, 31(1), 25-39.

[11.] Szollosi, A., Bago, B., Szaszi, B. & Aczel, B. (2017). Determinants of Confidence in the bat & ball problem. Acta Psychologica, 180, 1-7.

[12.] Aczel, B., Szollosi, A., & Bago, B. (2015). Lax monitoring versus logical intuition: The determinants of confidence in conjunction fallacy. Thinking & Reasoning, 22(1), 99-117


The following list includes additional scientific output achieved before the start of the Ph.D., and which is not included in the dissertation.

[13.] Aczel, B., Bago, B., Szollosi, A., Foldes, A. & Lukacs, B. (2015). Is it time for studying real-life debiasing? Evaluation of the effectiveness of an analogical intervention technique. Frontiers in Psychology, 6:1120. doi: 10.3389/fpsyg.2015.01120

[14.] Aczel, B., Bago, B., Szollosi, A., Foldes, A., & Lukács, B. (2015). Measuring individual differences in decision biases: methodological considerations. Frontiers in Psychology, 6:1770. doi: 10.3389/fpsyg.2015.01770

[15.] Aczel, B., Bago, B., & Foldes, A. (2012). Is there evidence for automatic imitation in a strategic context? Proceedings of the Royal Society B: Biological Sciences, 279(1741), 3231-3233.

[16.] Aczel, B., Kekecs, Z., Bago, B., Szollosi, A., & Foldes, A. (2015). An Empirical Analysis of the Methodology of Automatic Imitation Research in Strategic Context. Journal of Experimental Psychology: Human Perception and Performance, 41(4), 1049-1062. doi:http://dx.doi.org/10.1037/xhp0000081


SYMBOLS AND ACRONYMS

The following list contains the meaning of commonly used symbols, acronyms, and abbreviations in this thesis.

Statistical parameters

df   Degrees of freedom
b    Regression coefficient
M    Mean
N    Sample size
p    "The p value is the sum [or, for continuous T, the integral] over values of the test statistic that are at least as extreme as the one that is actually observed." (Wagenmakers, 2007, p. 782)
r    Correlation coefficient (Pearson, except when otherwise noted)
SE   Standard error
SD   Standard deviation
t    t statistic
Z    Z statistic
χ2   Chi-squared statistic, with df degrees of freedom


Introduction

On the 21st of May in 2004, Michael from Texas wanted to get drunk. He was an alcoholic, so this is hardly surprising. The problem was that he had a painful throat illness, so he could not take the alcohol as he normally would. Thus, he decided to take the alcohol rectally this time. He went to the shop, asked for two bottles of sherry, then went home and administered them rectally. He absorbed 3 litres of sherry in total. The next morning, he was found dead. According to toxicology reports, he died of alcohol poisoning. For successfully removing himself from the gene pool, he received a Darwin Award for an "astounding misapplication of judgment" (http://darwinawards.com/darwin/darwin2007-13.html).

This is a "good" example of what a thinking bias is: a decision that deviates from a normative rule – which, in this case, is not to pour several litres of strong alcohol into your anus. Such thinking and decision-making biases have long been a central interest of psychologists; why do people make irrational decisions (i.e., decisions which deviate from normative rules) even when they face possibly serious consequences? The systematic investigation of these biases started with the seminal work of Daniel Kahneman and Amos Tversky. In their studies, Tversky and Kahneman usually gave people hypothetical situations in which, to arrive at the correct solution, participants had to apply abstract rules such as logic, probability, or algebra. Let me demonstrate this with an example based on the famous base rate neglect problem (Tversky & Kahneman, 1974):

There is an event with 1000 people. Jo is a randomly chosen participant who attended the event. We know that Jo is 23 years old and is finishing a degree in engineering. On Friday nights, Jo likes to go out and drink beer. We also know that 997 people attending the event were women. What is most likely: Is Jo a man or a woman?

In this problem, people are first presented with a description of the person 'Jo'. This stereotypic description cues the response that Jo is a man. People are also presented with base rate information, which cues the response that Jo is a woman. When giving a response, the majority of people tend to ignore the base rate information and go solely with the stereotypic cue. Over the years, many similar biases have been revealed and investigated (Kahneman, 2011).
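To make explicit why the base rates should dominate here, consider a rough Bayesian sketch. The prior odds come straight from the problem; the 10:1 likelihood ratio for the description is an assumed, purely illustrative value:

\[
\frac{P(\text{man} \mid D)}{P(\text{woman} \mid D)} = \underbrace{\frac{3}{997}}_{\text{prior odds}} \times \underbrace{\frac{P(D \mid \text{man})}{P(D \mid \text{woman})}}_{\text{likelihood ratio, assumed }10} \approx 0.03
\]

That is, even if the stereotypic description were ten times more likely for a man than for a woman, there would still be only about a 3% chance that Jo is a man. Going solely with the stereotype thus deviates sharply from the normative, probabilistic answer.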


There are two more examples I would like to highlight, as they will be an integral part of this thesis. One is the notorious bat-and-ball problem (Frederick, 2005), in which participants are asked the following question: "A bat and a ball together cost $1.10. The bat costs $1 more than the ball. How much does the ball cost?" Nearly 80% of respondents immediately provide the response '10 cents' to this question (including graduates from world-leading universities; Frederick, 2005). This answer seems intuitively very appealing, but it is nevertheless wrong. Clearly, if the ball costs 10 cents, the bat has to cost $1.10 (because it costs $1 more). Therefore, in this case, the bat and the ball together would cost $1.20 and not $1.10. The correct solution is 5 cents – only this price satisfies both constraints: ball + (ball + $1.00) = $1.10, so 2 × ball = $0.10 and the ball costs $0.05. The second example is the so-called 'belief bias' in syllogistic reasoning (Evans, Barston, & Pollard, 1983). In this problem, people are given two premises and a conclusion and asked whether the conclusion logically follows from the premises. Here is an example:

Premise 1: All dogs have four legs
Premise 2: Puppies are dogs
Conclusion: Puppies have four legs

The conclusion is very believable, and it also follows logically (in terms of formal, Aristotelian logic). Indeed, people can judge the logicality of this problem very easily and accurately. Consider this problem on the other hand:

Premise 1: All dogs have four legs
Premise 2: Puppies have four legs
Conclusion: Puppies are dogs

This conclusion still sounds very believable, but it does not actually follow logically from the premises: the premises leave open that something other than a dog (a cat, say) also has four legs. Nonetheless, in this problem people usually make judgements based on their beliefs and accept the invalid conclusion. People are biased by their beliefs. Arguably, the most widely recognized explanation for reasoning bias is offered by so-called dual-process theories of reasoning (Evans, 2010; Frederick, 2005; Kahneman, 2011; Stanovich, 2011). These theories distinguish two kinds of reasoning systems. System 1 is the intuitive system, which can produce a response very quickly, without putting a load on central cognitive capacity. This system makes decisions based on so-called heuristic cues: information such as stereotypes or common beliefs. The key trait of System 1 is automaticity – and automaticity is the defining difference between the two systems. System 2 requires cognitive control to operate and places a high load on central capacity (Stanovich & Toplak, 2012). It is this system that is believed to be able to produce responses based on probabilistic or logical information. The essence of dual-process theory lies in the interaction between the two systems. Two major models have been proposed: the default-interventionist and the parallel-competitive model. The default-interventionist model posits a strictly serial view, where System 1 produces a quick, initial response (Evans & Stanovich, 2013). Once it is completed, System 2 gets engaged and might override the initial response. On the other hand, according to the parallel-competitive view, Systems 1 and 2 are activated in parallel, right from the beginning of the reasoning process (Epstein, 1994; Sloman, 1996).

Figure 1. Number of articles making reference to "dual process" theory, by year (source: Web of Science).

However, proponents of both views agree that in situations such as the base rate problem, System 1 produces a response based on heuristic cues and does not take the probabilistic information into account. This very quick intuitive response, generated with great strength, is what makes people err in the first place. Monitoring and correcting this erroneous intuitive response is the task of System 2. Arguably, over the years, the default-interventionist view became the most popular and influential one. Therefore, I will focus on this model throughout my dissertation. Dual process theories have become increasingly popular in psychology and have influenced theorizing in a wide range of domains such as moral reasoning or cooperation (Greene, 2013; Rand, Greene, & Nowak, 2012). Figure 1 represents the number of articles referring to "dual process theory" in Web of Science. One cannot overlook the exponentially growing number of scientific articles on this topic. Hence, its increasing importance and wide application make empirical testing of the assumptions of dual process theory essential.

One of the most important assumptions this theory makes is the so-called "corrective" assumption. This posits that in situations in which there is a conflict between heuristic cues and logico-mathematical considerations, System 1 initially provides a response based on heuristic cues. Thus, giving the logic- or probability-based response requires the engagement of System 2 and the correction of the initial, intuitive response (hence the name "corrective assumption"). This assumption has never been directly tested before. This shortcoming is the main focus of this thesis. By addressing it, I hope to achieve a better understanding of why people are (and are not) biased in their thinking, and thereby, in the long term, to help people detect and correct their own mistakes. At this point, it is important to emphasize that there is not a single "default-interventionist" theory; several have been proposed over the years, for example the 'Tripartite model' by Keith Stanovich (Stanovich, 2009) or the 'Heuristic-Analytic theory' by Jonathan Evans (Evans, 2006). These theories differ on a number of key points, but they still build on the same logic. They all agree that the defining feature of System 1 is that it is automatic, and of System 2 that it is more controlled. Most importantly, all default-interventionist views posit that in the presence of logic-heuristic conflict, System 2 is needed to override the heuristic System 1 response in order to arrive at the logical response. Hence, in every specific theory, the corrective assumption stands. Testing it cannot help us differentiate between these theories, but it can nevertheless falsify all of them at once. We will gain a general picture and see if the never-tested general idea works at all. First, I will review key findings that are typically cited in support of dual process theory and evaluate what they can really tell us about the corrective assumption.

Reaction times and response deadline

One of the most important features of System 2 is that it works more slowly than System 1; it takes more time to produce a response. Unfortunately, dual-process theories are quite underspecified in this respect (Kruglanski, 2013); they cannot specify that a System 1 response takes exactly X seconds, only that it is generally faster than a System 2 response. Thus, in general, one can only argue that a System 2 response should take more time than a System 1 response. This translates into the operationalized hypothesis that giving the correct, probabilistic response on the base rate problem should take more time than giving the incorrect, heuristic response. In the same vein, if people are given a strict response deadline when responding to a reasoning problem, they will be less likely to engage in deliberative reasoning, and therefore more incorrect responses will be generated. Results do seem to support both of these hypotheses: De Neys (2006) found that logic- or probability-based responding generally takes more time than heuristic responding, and Johnson, Tubau, and De Neys (2016) found the same for the bat-and-ball problem. In another illustrative study, Evans and Curtis-Holmes (2005) imposed a 2 s response deadline while presenting people with syllogistic reasoning problems. They indeed found that this manipulation decreased accuracy on problems where the believability of the conclusion was in conflict with its logical validity.

This evidence, however, says nothing about the time course of processing. It simply does not follow that, in the case of correct responding, the heuristic response was generated before the correct response. It is still possible that people who gave the correct response never considered the heuristic response at all. Thus, we cannot draw conclusions with regard to the corrective assumption.

Working memory load

As discussed above, one of the defining features of System 2 is that it requires working memory to operate. Thus, if one restricts participants' working memory resources, System 2 is expected to be "knocked out" – in other words, unable to engage in the reasoning process. The same hypothesis can be derived as for the response deadline experiments: correct responding on reasoning problems should decrease due to the unavailability of System 2. In these experiments, participants typically respond to reasoning problems while keeping in mind a set of letters, numbers, or images. This secondary task keeps working memory occupied, and hence it is not available to the reasoning process. Indeed, various experiments have found the expected effect for the base rate problem (Franssens & De Neys, 2009) and the bat-and-ball problem (Johnson et al., 2016). I would like to note that what these experiments suggest is that for some participants, System 2 is required to get to the correct solution. However, the accuracy level never goes down to 0%; there are always some people who manage to solve the task even under very high load (Johnson et al., 2016). There is indeed a noticeable, significant performance decrease, but clearly, even under very high load, many respondents can produce the correct response. This observation in and of itself already raises doubts about the corrective assumption.

Individual differences

Keith Stanovich and colleagues ran a set of studies in which they tried to determine what makes a person respond correctly to reasoning problems. In these studies, they consistently observed that performance on reasoning problems was positively associated with general intelligence, working memory capacity, cognitive reflection abilities, numeracy, and various thinking dispositions such as actively open-minded thinking (Stanovich & West, 2001; Toplak, West, & Stanovich, 2011). As these measures are thought to be associated with the efficiency of System 2 deliberation, these results might suggest that correct responding occurs in those people who are capable of high-quality deliberation – those who have a lower level of working memory capacity or IQ might simply not be able to figure out the correct solution. This, however, does not directly validate the corrective assumption. The correlation does not mean that correct respondents first generated the incorrect, heuristic response; nor does it mean that people with higher working memory capacity or IQ are more likely to engage System 2. It might simply be that high-IQ reasoners are more likely to give a correct response because they have more accurate intuitions (Thompson & Johnson, 2014). Individual difference studies are therefore uninformative with respect to the corrective assumption.

The two-response paradigm

Probably the most direct evidence for evaluating the corrective assumption comes from the so-called two-response paradigm (Newman, Gibb, & Thompson, 2017; Pennycook & Thompson, 2012; Thompson & Johnson, 2014; Thompson, Turner, & Pennycook, 2011). In this paradigm, participants are presented with a reasoning problem twice. First, they are asked to give a very quick, intuitive initial response. After, they are asked to give a final response without any constraints. Under the corrective assumption, we would expect a negligible number of accurate responses to reasoning problems at the initial response stage. In a set of experiments, Thompson and colleagues found only a small amount of response change between the initial and the final responses. The small number of changes already suggests that there might be problems with the corrective assumption; it suggests that those people who gave a correct response at the end already gave it from the beginning. Arguably, this would not be in line with the corrective assumption. Clearly, the low amount of response change also suggests that few people changed their incorrect responses either. This is not a problem from the corrective assumption's point of view; in fact, not engaging in additional deliberative reasoning is one of the major causes of failing to respond logically to reasoning problems. However, Thompson and colleagues only used instructions to ensure that the initial response is intuitive. One cannot be sure that participants did not engage in some type of deliberative reasoning during that stage. This problem will be further addressed later in this thesis.
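To make the logic of the two-response analysis concrete, the following sketch illustrates the direction-of-change classification that the coming chapters rely on (a minimal illustration of my own; the function names and toy data are hypothetical, not taken from any reported experiment):

```python
from collections import Counter

def classify(initial_correct, final_correct):
    """Code a trial by its direction of change:
    '11' = correct initial and final response (correct intuition),
    '01' = incorrect initial, correct final (the corrective pattern),
    '10' = correct initial, incorrect final,
    '00' = incorrect initial and final response."""
    return f"{int(initial_correct)}{int(final_correct)}"

# Toy data: (initial_correct, final_correct) pairs for a handful of trials.
trials = [(True, True), (False, True), (True, True), (False, False), (True, True)]
counts = Counter(classify(i, f) for i, f in trials)

# The corrective assumption predicts that correct final responses are mostly
# '01' cases; a predominance of '11' cases indicates correct intuitive responding.
correct_final = counts["11"] + counts["01"]
if correct_final:
    share = counts["11"] / correct_final
    print(f"Correct final responses that were already correct initially: {share:.0%}")
```

On this toy data the critical statistic is 75%: a high share of '11' cases among correct final responses is precisely the non-corrective pattern that the studies reported below repeatedly observe.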

Conflict detection

Another line of studies focuses on the question of whether people detect the conflict between their biased, heuristic response and logical considerations. To this end, these studies usually contrast 'conflict' and 'no-conflict' problems. Conflict problems are similar to the examples above, where heuristic and logical considerations are not in line with each other. In contrast, in no-conflict versions, the heuristic response is also logically correct. One can achieve this by, for example, changing the base rate ratios (e.g., there are 997 men and 3 women at the event in the introductory example). Now, stereotypes and base rates cue the same response and point to the same conclusion: Jo must be a man. In short, conflict detection studies showed that even incorrect, biased reasoners show sensitivity to logical cues. For example, when solving conflict (vs. no-conflict) problems, even incorrect responders show elevated response times (e.g., Bonner & Newell, 2010; De Neys & Glumicic, 2008; Stupple, Ball, & Ellis, 2013; Villejoubert, 2009), decreased post-decision confidence (e.g., De Neys, Cromheeke, & Osman, 2011; De Neys, Rossi, & Houdé, 2013; Gangemi, Bourgeois-Gironde, & Mancini, 2015), and increased activation in brain regions believed to mediate conflict detection (De Neys, Vartanian, & Goel, 2008; Simon, Lubin, Houdé, & De Neys, 2015). But this is not the whole story. These findings make clear that participants do not have the bias blind spot suggested by dual-process theorists. But it was not known whether people detect conflict as a result of intuitive System 1 processing, or because System 2 becomes engaged.


To address this, Johnson et al. (2016) conducted a study in which they applied cognitive load. Recall that cognitive load makes System 2 engagement highly unlikely. Thus, if cognitive load has no effect on conflict detection measures, one can argue that conflict detection is a result of intuitive System 1 processing. Results indeed confirmed this thesis: Johnson et al. (2016) found evidence for successful conflict detection even when very high load was applied. There is another line of studies worth mentioning, conducted by Simon Handley and colleagues. They used belief bias problems and, in one condition, asked participants to evaluate the conclusion based on its believability (Handley, Newstead, & Trippas, 2011; Handley & Trippas, 2015). They made no reference to the logical validity of the problem. They found that participants made more errors and took more time to respond when answering problems where logical validity was in conflict with the heuristic answer. This suggests that participants automatically engaged in some kind of logical processing. In another illustrative study, Trippas, Handley, Verde, & Morsanyi (2016) asked participants to judge how bright the conclusion looked. They found that biased participants judged logically valid conclusions as brighter, even though all conclusions were presented at the same level of brightness. This suggests that participants had an intuitive, implicit sense of logic. It also contradicts the dual process prediction that portrays logical responding as the result of explicit, controlled processing. All these results led a number of authors to argue that some kind of elementary logical processing happens at the early stages of the reasoning process (De Neys, 2012, 2014; De Neys & Glumicic, 2008; Pennycook, Fugelsang, & Koehler, 2015). These are called "logical intuitions" (i.e., automatically generated responses based on the activation of basic principles of logic or probability). This gave rise to a new wave of dual process theorizing, the "hybrid" dual process model. Early versions of this model argued that System 1 generates two kinds of responses: a logical and a heuristic one (De Neys, 2012). Once people detect the conflict, System 2 becomes engaged to resolve it. Conflict detection studies question dual process theory, but they do not clearly test the corrective assumption. Bluntly put, they say nothing about the nature of the correct responses. Even the original hybrid model argued that System 2 engagement is necessary to give a final correct response. One also needs to note that conflict detection studies have been criticized (e.g., Aczel, Szollosi, & Bago, 2016; Klauer & Singmann, 2013; Mata, Schubert, & Ferreira, 2014; Pennycook, Fugelsang, & Koehler, 2012; Singmann, Klauer, & Kellen, 2014; Travers, Rolison, & Feeney, 2016) – with opponents usually arguing that the empirical effects are caused by shallow cues rather than logical processing.


Brain Imaging

Another line of evidence to consider comes from brain imaging studies. By understanding which brain areas react to a given task, we might gain better insight into which cognitive mechanisms are at play when solving reasoning problems. For example, in an illustrative study, De Neys, Vartanian, & Goel (2008) used fMRI to investigate the difference between correct and incorrect reasoners in the base rate task. One of the regions of interest was the right lateral prefrontal cortex (RLPFC). The RLPFC had previously been shown to be associated with response inhibition: whenever people had to inhibit a primary, automatic response, their RLPFC became engaged (Aron, Robbins, & Poldrack, 2004). Recall that if the corrective assumption is right, correct reasoners must inhibit their initial response before giving the correct response. Thus, we should see increased RLPFC activation for correct reasoners compared to incorrect reasoners. Indeed, this is what the authors found, and they concluded that inhibitory control is required to reach the correct solution. However, one might argue that this does not necessarily confirm the corrective assumption. Any brain region can be implicated in multiple processes; hence, it cannot be excluded that the observed RLPFC activation resulted from a different (non-inhibitory) process. In addition, the RLPFC activation for correct responses was not replicated in a later study (Vartanian et al., in press). Additionally, in a meta-analysis, Prado, Chadha, & Booth (2011) collected fMRI studies of deductive reasoning. They showed that there is not a single brain network underlying all types of deductive reasoning: each type of reasoning task shows a different pattern of activated brain areas. The authors concluded that there is not a single type of representation underlying all types of deductive reasoning. They did not investigate dual process theory per se, but the simultaneous activation of multiple brain areas already shows that there will probably not be a single brain area underlying System 1 or System 2 computations. Therefore, fMRI is probably not the best approach to evaluate the corrective assumption. In another study, Banks & Hope (2014) used EEG and the belief bias paradigm to investigate dual process theory. The EEG technique has a higher temporal resolution than fMRI, so it might be better suited to test the dual process time course assumption (i.e., the corrective assumption). If the corrective assumption is right, we should see logic and beliefs interact late in the reasoning process (note that what "late" means is relative: as discussed, dual-process theories do not make it clear after what time System 2 should be engaged). On the contrary, Banks & Hope found that logic and beliefs start interacting around 300 ms after stimulus onset. This finding, similarly to the conflict detection studies, already calls the corrective assumption into question; it rather suggests that people process logical information from very early on during reasoning.

Mouse tracking

Recently, researchers have started to use mouse cursor tracking to understand the dynamics of reasoning. In this paradigm, different response options are presented in the corners of the screen, and participants are instructed to move the cursor toward the response option they want to choose. This paradigm has been used to understand the dynamics of the processes behind decision making (Spivey, Grosjean, & Knoblich, 2005). Evidence for the corrective assumption would be that people start moving toward the heuristic response option before they end up giving the correct response. For example, Travers, Rolison & Feeney (2016) presented participants with the bat-and-ball problem. They found that participants started to move toward the heuristic response option around 5 s after stimulus onset, whereas movements toward the correct response were observed only 10 s after stimulus onset. This provides some evidence for the corrective assumption. However, one would expect an automatic, System 1 response to be generated right after reading the problem. The 5 s gap before the initial movements were initiated is thus quite long; the initial response might already have been generated by then. Hence, one might argue that the procedure is not picking up on intuitive processing (Travers et al., 2016). Additionally, with another reasoning task, Szaszi, Palfi, Szollosi, Kieslich, & Aczel (2018) observed initial movements toward the correct solution, contrary to the observations of Travers et al. They further argued that based on plain mouse tracking data, one cannot differentiate between different dual-process theories, as it simply is not clear which response (or mouse movement) is intuitive and which one is deliberative.

Dual process theory outside of logical reasoning: the case of morality

Dual process theory has become very influential and has affected theorizing in other domains besides logical reasoning. A key example concerns the domain of moral reasoning. In this domain, researchers present people with moral dilemmas in which two types of considerations usually conflict: so-called 'deontological' reasoning (following a norm, such as 'don't kill') and 'utilitarian' reasoning (doing whatever causes the greater good). Consider, for example, the following dilemma: Would you kill a terrorist in order to save a group of people he intends to blow up? In this example, deontological considerations suggest not killing the terrorist, as killing him would violate the norm of 'not killing'. On the other hand, a utilitarian reasoner would argue that you should kill the terrorist, as killing him would save more people, and you would thereby do what causes the greater good. The corrective assumption holds that in such a conflicted situation, System 1 will generate a response in accordance with deontological considerations; System 2 deliberation is required to produce a utilitarian response. In other words, you would only find it acceptable to kill the terrorist after you engaged in effortful deliberation and corrected the initial deontological intuition (Greene, Nystrom, Engell, Darley, & Cohen, 2004; Greene, Sommerville, Nystrom, Darley, & Cohen, 2001; Gürçay & Baron, 2017). As in the case of logico-mathematical reasoning, although scholars have run numerous latency and load studies (Greene, Morelli, Lowenberg, Nystrom, & Cohen, 2008; Suter & Hertwig, 2011), the core corrective assumption has not been directly tested.

Chapter Summary

In sum, the corrective assumption remains untested. That is precisely where I started my investigations. In Chapter 1, I present four experiments with a modified two-response paradigm. I used four kinds of experimental manipulations to ensure that the initial response is indeed intuitive: instructions, cognitive load, a response deadline, and a response deadline plus cognitive load. I also used base rate problems and syllogistic reasoning problems to test the generalizability of the findings (to check that whatever I found in one task would replicate in the other). Across tasks and manipulations, I consistently found that the majority of respondents who gave the correct response in the end had generated it from the very beginning, so to say, intuitively – contrary to the corrective assumption. To account for these surprising findings, I present an updated version of the hybrid model. In this model, two intuitive, initial responses are generated simultaneously; one of them is based on heuristic cues, while the other is based on logico-mathematical principles. These two intuitive responses differ in their strength; whichever is stronger becomes activated and is given as the initial response. The relative difference between the two competing intuitive responses defines the level of experienced conflict (see the sketch below).
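As a toy formalization of this hybrid view, the sketch below selects the stronger of two simultaneously generated intuitions and derives a simple conflict index from their relative strength (the strength values and the particular conflict formula are my own illustrative assumptions, not parameters estimated in the thesis):

```python
from dataclasses import dataclass

@dataclass
class Intuition:
    label: str       # e.g., "heuristic" or "logical"
    strength: float  # activation strength of this intuitive response

def initial_response(heuristic, logical):
    """Hybrid-model sketch: System 1 generates both intuitions; the stronger
    one is emitted as the initial response, and the smaller the strength
    difference, the more conflict the reasoner should experience."""
    winner = max((heuristic, logical), key=lambda i: i.strength)
    total = heuristic.strength + logical.strength
    conflict = 1 - abs(heuristic.strength - logical.strength) / total
    return winner.label, conflict

# Example: a strong stereotype-based intuition against a weaker logical one.
response, conflict = initial_response(Intuition("heuristic", 0.8),
                                      Intuition("logical", 0.6))
print(response, round(conflict, 2))  # -> heuristic 0.86 (high experienced conflict)
```

On this reading, biased initial responses and correct intuitive responses are two outcomes of the same competition; Chapter 3 tests this strength-based prediction directly.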


In Chapter 2, I present 7 studies in which I sought to replicate these findings with a harder reasoning task, the bat-and-ball problem. I consistently found that regardless of the answer format people are presented with (2 response options, 4 response options, free response), people still generate the correct response from the very beginning. Across my studies, I observe that correct final responses are often non-corrective in nature. Many reasoners who manage to answer the bat-and-ball problem correctly after deliberation already solved it correctly when they reasoned purely intuitively in the initial response phase. This implies that sound reasoners do not necessarily need to deliberate to correct their intuitions; their intuitions are often already correct. Pace the corrective view, the findings suggest that in these cases they deliberate to verify correct intuitive insights.

In Chapter 3, I empirically test predictions derived from the hybrid dual process model. I again adopted the two-response paradigm with base-rate neglect problems. By manipulating the extremity of the base rates, I aimed to affect the strength of the logical intuition that is hypothesized to cue selection of the correct base-rate response. Consistent with the hybrid model predictions, I observed that experimentally reducing the strength of the logical intuition decreased the number of correct intuitive (i.e., initial) responses when solving problems in which heuristic and logical intuitions conflicted. Critically, incorrect intuitive responders were less likely to register the intrinsic conflict (as reflected in decreased confidence) in this case, whereas correct intuitive responders registered more conflict. Implications and remaining challenges for dual-process theorizing are discussed.

In Chapter 4, I further tested the hybrid and default-interventionist models. I relied on the high temporal resolution of electroencephalographic (EEG) recordings to decide between the different models. I adopted base-rate problems in which an intuitively cued stereotypical response was either congruent or incongruent with the correct response cued by the base rates. Results showed that solving problems in which the base rates and the stereotypical description cued conflicting responses resulted in an increased centro-parietal N2 and frontal P3. Consistent with the hybrid model predictions, this early, split-second conflict sensitivity suggests that the critical base rates can be processed fast, without slow and deliberate System 2 reflection.

In Chapter 5, I tested the generality of my findings by focusing on the corrective assumption in the domain of moral reasoning. The influential dual process model of moral cognition posits that utilitarian responses to moral dilemmas (i.e., choosing the greater good) require deliberate correction of an initial, intuitive deontological response that primes us not to hurt others. In this chapter, I used the two-response paradigm to test this critical "corrective" dual process assumption. Participants had to give their first, initial response to moral dilemmas under time pressure and cognitive load. Next, they could take the time to reflect on the problem and give a final response. This allowed me to identify the intuitively generated response that preceded the final response given after deliberation. Results across three studies consistently show that in the vast majority of cases (over 70%) in which people opt for a utilitarian response after deliberation, the utilitarian response was already given in the initial phase. Hence, utilitarian responders do not need to deliberate to correct an initial deontological response; their intuitive response is already utilitarian in nature. I argue that these results support the hybrid dual process model over the default-interventionist view.

In Chapter 6, I further test the hybrid model. A key challenge at this point is to search for boundary conditions, that is, to identify cases in which the strength of the different intuitions is weaker or stronger. Therefore, I ran two studies with the two-response paradigm in which I adopted base-rate problems where base rate and stereotypic information can cue conflicting intuitions. By manipulating the information presentation order, I aimed to manipulate their saliency and, by that, indirectly the activation strength of the intuitions. Contrary to my expectation, I observed that the order manipulation had opposite effects in the initial and final response stages. I explain these results by taking into account that the strength of intuitions is not constant but changes over time: each intuition has a growth rate, a peak, and a decay rate (one possible formalization is sketched below).
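One simple way to formalize such rise-and-fall dynamics is a gamma-shaped activation curve (an illustrative assumption of my own; Chapter 6 does not commit to this specific functional form):

\[
s(t) = A\,t^{g}\,e^{-t/d}, \qquad t \geq 0,
\]

where A scales the overall strength of an intuition, g governs how fast it grows, and d how fast it decays; the strength peaks at t* = g·d. Two intuitions with different parameters can then dominate at different moments, which is one way an order manipulation could produce opposite effects at the initial and the final response stages.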


REFERENCES

Aczel, B., Szollosi, A., & Bago, B. (2016). Lax monitoring versus logical intuition: The determinants of confidence in conjunction fallacy. Thinking & Reasoning, 22(1), 99–117. https://doi.org/10.1080/13546783.2015.1062801
Aron, A. R., Robbins, T. W., & Poldrack, R. A. (2004). Inhibition and the right inferior frontal cortex. Trends in Cognitive Sciences, 8(4), 170–177.
Banks, A. P., & Hope, C. (2014). Heuristic and analytic processes in reasoning: An event-related potential study of belief bias. Psychophysiology, 51(3), 290–297.
Bonner, C., & Newell, B. R. (2010). In conflict with ourselves? An investigation of heuristic and analytic processes in decision making. Memory & Cognition, 38(2), 186–196.
De Neys, W. (2006). Automatic–heuristic and executive–analytic processing during reasoning: Chronometric and dual-task considerations. The Quarterly Journal of Experimental Psychology, 59(6), 1070–1100.
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7(1), 28–38.
De Neys, W. (2014). Conflict detection, dual processes, and logical intuitions: Some clarifications. Thinking & Reasoning, 20(2), 169–187.
De Neys, W., Cromheeke, S., & Osman, M. (2011). Biased but in doubt: Conflict and decision confidence. PloS One, 6(1), e15954.
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–1299.
De Neys, W., Rossi, S., & Houdé, O. (2013). Bats, balls, and substitution sensitivity: Cognitive misers are no happy fools. Psychonomic Bulletin & Review, 20(2), 269–273.
De Neys, W., Vartanian, O., & Goel, V. (2008). Smarter than we think: When our brains detect that we are biased. Psychological Science, 19(5), 483–489.
Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49(8), 709–724.
Evans, J. St. B. T. (2006). The heuristic-analytic theory of reasoning: Extension and evaluation. Psychonomic Bulletin & Review, 13(3), 378–395.
Evans, J. St. B. T. (2010). Thinking twice: Two minds in one brain. Oxford: Oxford University Press.
Evans, J. St. B. T., Barston, J. L., & Pollard, P. (1983). On the conflict between logic and belief in syllogistic reasoning. Memory & Cognition, 11(3), 295–306.
Evans, J. St. B. T., & Curtis-Holmes, J. (2005). Rapid responding increases belief bias: Evidence for the dual-process theory of reasoning. Thinking & Reasoning, 11(4), 382–389.
Evans, J. St. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15(2), 105–128.


Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives, 19(4), 25–42.
Gangemi, A., Bourgeois-Gironde, S., & Mancini, F. (2015). Feelings of error in reasoning—in search of a phenomenon. Thinking & Reasoning, 21(4), 383–396. https://doi.org/10.1080/13546783.2014.980755
Greene, J. D. (2013). Moral tribes: Emotion, reason and the gap between us and them. New York, NY: Penguin Press.
Greene, J. D., Morelli, S. A., Lowenberg, K., Nystrom, L. E., & Cohen, J. D. (2008). Cognitive load selectively interferes with utilitarian moral judgement. Cognition, 107(3), 1144–1154.
Greene, J. D., Nystrom, L. E., Engell, A. D., Darley, J. M., & Cohen, J. D. (2004). The neural bases of cognitive conflict and control in moral judgment. Neuron, 44(2), 389–400.
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293(5537), 2105–2108.
Gürçay, B., & Baron, J. (2017). Challenges for the sequential two-system model of moral judgement. Thinking & Reasoning, 23(1), 49–80.
Handley, S. J., Newstead, S. E., & Trippas, D. (2011). Logic, beliefs, and instruction: A test of the default interventionist account of belief bias. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(1), 28–43.
Handley, S. J., & Trippas, D. (2015). Dual processes and the interplay between knowledge and structure: A new parallel processing model. Psychology of Learning and Motivation, 62, 33–58.
Johnson, E. D., Tubau, E., & De Neys, W. (2016). The doubting System 1: Evidence for automatic substitution sensitivity. Acta Psychologica, 164, 56–64.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
Klauer, K. C., & Singmann, H. (2013). Does logic feel good? Testing for intuitive detection of logicality in syllogistic reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(4), 1265–1273.
Kruglanski, A. W. (2013). Only one? The default interventionist perspective as a unimodel—Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 242–247.
Mata, A., Schubert, A.-L., & Ferreira, M. B. (2014). The role of language comprehension in reasoning: How "good-enough" representations induce biases. Cognition, 133(2), 457–463.
Newman, I., Gibb, M., & Thompson, V. A. (2017). Rule-based reasoning is fast and belief-based reasoning can be slow: Challenging current explanations of belief bias and base-rate neglect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(7), 1154–1170.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2012). Are we good at detecting conflict during reasoning? Cognition, 124(1), 101–106.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.


Pennycook, G., & Thompson, V. A. (2012). Reasoning with base rates is routine, relatively effortless, and context dependent. Psychonomic Bulletin & Review, 19(3), 528–534.
Prado, J., Chadha, A., & Booth, J. R. (2011). The brain network for deductive reasoning: A quantitative meta-analysis of 28 neuroimaging studies. Journal of Cognitive Neuroscience, 23(11), 3483–3497.
Rand, D. G., Greene, J. D., & Nowak, M. A. (2012). Spontaneous giving and calculated greed. Nature, 489(7416), 427–430.
Simon, G., Lubin, A., Houdé, O., & De Neys, W. (2015). Anterior cingulate cortex and intuitive bias detection during number conservation. Cognitive Neuroscience, 6(4), 158–168. https://doi.org/10.1080/17588928.2015.1036847
Singmann, H., Klauer, K. C., & Kellen, D. (2014). Intuitive logic revisited: New data and a Bayesian mixed model meta-analysis. PloS One, 9(4), e94223.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.
Spivey, M. J., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward phonological competitors. Proceedings of the National Academy of Sciences of the United States of America, 102(29), 10393–10398.
Stanovich, K. E. (2009). Distinguishing the reflective, algorithmic, and autonomous minds: Is it time for a tri-process theory? In J. S. B. T. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 55–88). New York, NY: Oxford University Press.
Stanovich, K. E. (2011). Rationality and the reflective mind. Oxford: Oxford University Press.
Stanovich, K. E., & Toplak, M. E. (2012). Defining features versus incidental correlates of Type 1 and Type 2 processing. Mind & Society, 11(1), 3–13.
Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23(5), 645–665.
Stupple, E. J., Ball, L. J., & Ellis, D. (2013). Matching bias in syllogistic reasoning: Evidence for a dual-process account from response times and confidence ratings. Thinking & Reasoning, 19(1), 54–77.
Suter, R. S., & Hertwig, R. (2011). Time and moral judgment. Cognition, 119(3), 454–458.
Szaszi, B., Palfi, B., Szollosi, A., Kieslich, P. J., & Aczel, B. (2018). Thinking dynamics and individual differences: Mouse-tracking analysis of the denominator neglect task. Judgment and Decision Making, 13(1), 23–32.
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20(2), 215–244. https://doi.org/10.1080/13546783.2013.869763
Thompson, V. A., Turner, J. A. P., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140.
Tinghög, G., Andersson, D., Bonn, C., Johannesson, M., Kirchler, M., Koppel, L., & Västfjäll, D. (2016). Intuition and moral decision-making: The effect of time pressure and cognitive load on moral judgment and altruistic behavior. PloS One, 11(10), e0164012.


Toplak, M. E., West, R. F., & Stanovich, K. E. (2011). The Cognitive Reflection Test as a predictor of performance on heuristics-and-biases tasks. Memory & Cognition, 39(7), 1275–1289.
Travers, E., Rolison, J. J., & Feeney, A. (2016). The time course of conflict on the Cognitive Reflection Test. Cognition, 150, 109–118.
Trippas, D., Handley, S. J., Verde, M. F., & Morsanyi, K. (2016). Logic brightens my day: Evidence for implicit sensitivity to logical validity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 42(9), 1448–1457. http://dx.doi.org/10.1037/xlm0000248
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Vartanian, O., Beatty, E., Smith, I., Blackler, K., Lam, Q., Forbes, S., & De Neys, W. (in press). The reflective mind: Examining individual differences in susceptibility to base rate neglect with fMRI. Journal of Cognitive Neuroscience.
Villejoubert, G. (2009). Are representativeness judgments automatic and rapid? The effect of time pressure on the conjunction fallacy. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 30, pp. 2980–2985).
Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14(5), 779–804.


Chapter 1: Fast logic? Examining the time-course assumption of dual process theory

Influential dual process models of human thinking posit that reasoners typically produce a fast, intuitive heuristic (i.e., Type 1) response which might subsequently be overridden and corrected by slower, deliberative processing (i.e., Type 2). In this study we directly tested this time course assumption. We used a two-response paradigm in which participants have to give an immediate answer and are afterwards allowed extra time before giving a final response. In four experiments we used a range of procedures (e.g., a challenging response deadline, concurrent load) to knock out Type 2 processing and make sure that the initial response was intuitive in nature. Our key finding is that we frequently observe correct, logical responses as the first, immediate response. Response confidence and latency analyses indicate that these initial correct responses are given fast, with high confidence, and in the face of conflicting heuristic responses. Findings suggest that fast and automatic Type 1 processing also cues a correct logical response from the start. We sketch a revised dual process model in which the relative strength of different types of intuitions determines reasoning performance.

Based on Bago, B., & De Neys, W. (2017). Fast logic?: Examining the time course assumption of dual process theory. Cognition, 158, 90-109.

Supplementary material for this chapter can be found in Appendix A.


Introduction

Decades of research on reasoning and decision-making have indicated that educated adult reasoners often violate elementary logical or probabilistic rules. As an example, imagine an event with 1000 people; you are told that most people at the event are I.T. technicians, but that 5 attendees are professional boxers. Assume that you are searching for someone you do not know and you are only given one piece of information: the person is described to you as being ‘strong’. What do you think is more likely? Is this person a boxer or an I.T. technician? On the basis of the base rate probabilities, one might say that the person is an I.T. technician because there are many more I.T. technicians than boxers at the event. However, intuitively people will be tempted to conclude that the person is a boxer based on the stereotypical association (“I.T. technicians are weak”) that the description cues. Many studies have shown that people tend to neglect the base rates in these situations (e.g., Tversky & Kahneman, 1974; Pennycook, Trippas, Handley, & Thompson, 2014). Hence, participants typically base their choice on the stereotypical association and conclude that the person is a boxer. Such intuitive or “heuristic” associations have been shown to bias people’s judgment in a wide range of tasks and situations (Gilovich, Griffin, & Kahneman, 2002). One possible explanation for this phenomenon is presented by dual process theories of thinking. According to the classic dual process view, there are two different types of thinking: Type 1 and Type 2 processes. Type 1 processing is fast and autonomous, does not require working memory, operates unconsciously, and immediately triggers an answer. Type 2 processing puts a heavy load on working memory and operates consciously; it is controlled and relatively slow. The two types of processes are also often referred to as ‘intuitive’ or ‘heuristic’ vs ‘deliberate’ or ‘analytical’ (Stanovich & Toplak, 2012). It is important to note that dual process theory is an umbrella term; several types of dual process theories exist (Stanovich & West, 2000). In this study, we focus on the influential, default-interventionist view of dual processes that has been advocated in the seminal work of Evans and Stanovich (2013) and Kahneman (2011). The standard assumption in the default-interventionist dual process (DI) framework is that the automatic and fast Type 1 process first produces an intuitive heuristic answer. Generation of the heuristic answer might subsequently be followed by a deliberative, slow Type 2 process, which may result in a correction of the initial heuristic answer.

Note that in cases (such as the introductory reasoning problem) in which the initial heuristic response conflicts with the correct logical1 response, the corrective Type 2 thinking is believed to be critical to arrive at the correct logical answer. In cases where Type 2 processing fails, the heuristic response will not be corrected and the reasoner will end up giving the erroneous heuristic answer. Thus, the expected time course is that reasoners will first generate a heuristic answer and, if needed, will correct this after additional reflection to arrive at the correct logical response. To avoid confusion it is important to stress that the DI time-course prediction does not entail that Type 1 processing necessarily results in an incorrect response or that Type 2 processing necessarily results in a correct response. Normative correctness is not a defining feature of Type 1 or Type 2 processing (e.g., it is not because a response is correct that it resulted from Type 2 processing, and Type 2 processing does not necessarily result in a correct response; e.g., Evans, 2012; Evans & Stanovich, 2013; Stanovich & Toplak, 2012). For example, sometimes reasoners might err precisely because their cognitive resources are overburdened by too much deliberation (e.g., Evans, 2010; Stanovich, 2011). Likewise, it is not hard to see that a person who is guessing can end up giving a correct response without engaging in any deliberation. The DI time course prediction concerns the processing of the typical reasoner in the prototypical situation in which a cued heuristic response conflicts with the correct logical response, as has been studied in numerous classic tasks from the reasoning and decision-making field since the 1960s. In this case the DI model clearly entails that the typical reasoner will need to recruit Type 2 thinking to correct the initial heuristic Type 1 response in order to arrive at a correct response. Indeed, it is precisely the failure to engage in Type 2 processing that DI theorists have put forward as the primary cause of the massive “bias” in these tasks (Evans, 2012; Kahneman, 2011; Stanovich & West, 2000). Nevertheless, it is important to keep in mind that dual process theories do not claim that one can universally equate Type 2 processing with normative correctness. Unfortunately, and perhaps somewhat surprisingly, there is little evidence in the literature that allows us to directly validate the core DI time course assumption.

1 Note that we will be using the label “correct” or “logical” response as a handy shortcut to refer to “the response that has traditionally been considered as correct or normative according to standard logic or probability theory”. The appropriateness of these traditional norms has sometimes been questioned in the reasoning field (e.g., see Stanovich & West, 2000, for a review). Under this interpretation, the heuristic response should not be labeled as “incorrect” or “biased”. For the sake of simplicity we stick to the traditional labeling. In the same vein, we use the term “logical” as a general header to refer both to standard logic and probability theory.

For example, in one study De Neys (2006a) presented participants with a range of classic reasoning problems in which a cued heuristic response conflicted with the correct logical response and recorded response latencies. Results consistently showed that correct responses were given much more slowly than heuristic (i.e., incorrect) responses. One might argue that this finding is in agreement with the time course assumption: giving a (correct) response that is assumed to result from slow Type 2 processing takes more time than giving an (incorrect) response that is assumed to result from fast Type 1 processing. However, although this fits with the claim that Type 2 processing is slower than Type 1 processing, it does not imply that someone who engaged in Type 2 reasoning first engaged in Type 1 reasoning. The latency data do not imply that correct reasoners generated the incorrect answer first and then corrected it. Reasoners who engage in Type 2 thinking might give the correct response without ever having considered the incorrect, heuristic response. In another illustrative study, Evans and Curtis-Holmes (2005) used an experimental design in which people had to judge the logical validity of reasoning problems under time pressure; one group of reasoners was given only 2 seconds to answer, whereas a control group was allowed to take as much time as they wanted to give an answer. A higher percentage of incorrect answers was found in the time pressure group. Hence, this also indicates that giving the correct response requires time. However, it does not necessarily show that individuals who gave the correct response in the free time condition generated the heuristic response first and corrected it subsequently. As with the latency data of De Neys (2006a), it might be that reasoners engaged in Type 2 thinking right away, without any need to postulate an initial generation of a heuristic response. One might note that there is also some incidental evidence for the DI time course assumption. For example, Frederick (2005) notes that when participants solve his Cognitive Reflection Test (which was designed to cue a strong heuristic response), correct responders often considered the incorrect, heuristic answer first “as is apparent from introspection, verbal reports, and scribbles in the margin” (Frederick, 2005, p. 27). Unfortunately, he gives no further information about the protocol analysis or the precise prevalence of these observations. Frederick also mentions that incorrect responders rate the problems as easier than correct responders do, and suggests that this presumably indicates that correct responders are more likely to consider both responses. But even if this assumption holds, it clearly does not imply that correct responders considered the heuristic response before the correct response. Arguably, the most direct evidence to evaluate the dual process time course assumption comes from experiments using the two-response paradigm (Newman, Gibb, & Thompson, 2016; Pennycook & Thompson, 2012; Thompson & Johnson, 2014; Thompson, Prowse Turner, & Pennycook, 2011).


In this paradigm, participants are presented with a reasoning problem and are instructed to respond as quickly as possible with the first, intuitive response that comes to mind. Afterwards, they are presented with the problem again, and they are given as much time as they want to think about it and give a final answer. A key observation for our present purposes is that Thompson and colleagues noted that people spent little time rethinking their answer in the second stage and hardly ever changed their initial response. Note that the fact that people do not change an initial heuristic response is not problematic for the dual process framework, of course. It just implies that people failed to engage the optional Type 2 processing. Indeed, since such failures to engage Type 2 are considered a key cause of incorrect responding, a dominant tendency to stick to incorrect initial responses is not surprising from the classic dual process stance. However, the lack of answer change tentatively suggests that in those cases where a correct logical response was given as the final response, the very same response was generated from the start. Bluntly put, the logical response might have been generated fast and intuitively based on mere Type 1 processing (Pennycook & Thompson, 2012; Thompson & Johnson, 2014). This would pose a major challenge for standard dual process theory. However, it cannot be excluded that Thompson et al.’s participants engaged in Type 2 processing when they gave their first, initial response. Although Thompson et al. instructed participants to quickly give the first response that came to mind, participants might have simply failed to respect the instruction and ended up with a correct response precisely because they recruited Type 2 thinking2. Clearly, researchers have to make absolutely sure that only Type 1 processing is engaged at the initial response stage. In follow-up work Newman et al. (2016) started to address this problem by giving participants a challenging response deadline to enter their initial response. One critical observation was that even in the initial response stage, people showed some sensitivity to the logical status of the problems. For example, participants were slightly more likely to accept valid than invalid inferences even when responding under deadline.

2 Note that Thompson et al. obviously realized this and tried to control for it. For example, they always asked participants to verify that their first response was really the one that came to mind first, and they discarded the rare trials with negative verification answers. However, there is no way to be sure whether participants’ verification answers were true. The problem is not so much that people might be intentionally lying but simply that they might have little explicit insight into which thought was generated first. The point here is that a more stringent control is needed.

Although this logical discrimination ability was more pronounced after additional reflection in the final response stage, the initial sensitivity suggests that to some degree participants processed the logical status of the problems intuitively. Nevertheless, a critic can always argue that the deadline was not demanding enough to exclude all Type 2 processing. There is also some indirect evidence that could make one suspicious of the central time course assumption of dual process theory. One source of evidence comes from recent studies on conflict detection during thinking that try to determine whether biased reasoners notice that their heuristic answer violates logical principles (see De Neys, 2014, 2015, for review; Pennycook, Fugelsang, & Koehler, 2015). To this end, these studies typically contrast reasoners’ processing of conflict problems and control no-conflict problems. Conflict problems are constructed such that a heuristically cued intuitive response conflicts with the correct logical response. In the control no-conflict problems this conflict is not present and the cued heuristic response is also logically correct. For example, the introductory base rate neglect problem that we presented above was a conflict problem; the description will cue a heuristic response that conflicts with the response that is cued by considerations of the base rates. A no-conflict version of this problem can be constructed by simply reversing the base rates (i.e., 995 boxers / 5 I.T. technicians). In this case the answer cued by the base rates and the heuristic answer cued by the description point to the same conclusion: the person we are looking for is a boxer. The conflict detection studies have shown that biased reasoners who fail to give the correct response to the conflict problems typically do show sensitivity to the presence of conflict. For example, when solving conflict (vs no-conflict) problems even incorrect responders show elevated response times (e.g., Bonner & Newell, 2010; De Neys & Glumicic, 2008; Stupple, Ball, & Ellis, 2013; Villejoubert, 2009), decreased post-decision confidence (e.g., De Neys, Cromheeke, & Osman, 2011; De Neys, Rossi, & Houdé, 2013; Gangemi, Bourgeois-Gironde, & Mancini, 2015), and increased activation in brain regions believed to mediate conflict detection (De Neys, Vartanian, & Goel, 2008; Simon, Lubin, Houdé, & De Neys, 2015). The fact that heuristic responders are sensitive to the presence of conflict between their heuristic answer and logical considerations has led some authors to suggest that some elementary logical processing might be occurring from the start of the reasoning process (e.g., De Neys, 2012, 2014; see also Pennycook et al., 2015). Related indirect evidence comes from work by Handley and colleagues (e.g., Handley, Newstead, & Trippas, 2011; Handley & Trippas, 2015; Pennycook, Trippas, et al., 2014; Trippas et al., 2016). For example, in one of their studies participants were given syllogistic reasoning problems in which the believability of the conclusion could conflict with its logical validity (e.g., a believable but invalid conclusion such as “All flowers need water. Roses need water. Roses are flowers”).


Roses are flowers”). It has long been established that when asked to judge the logical validity of such problems, people will tend to be biased by intuitive, heuristic associations based on the believability of the conclusion. Hence, they will be tempted to accept the invalid conclusion simply because it fits with their prior beliefs. However, one of Handley et al.’s (2011) key manipulations was to explicitly instruct participants to give the heuristically cued response. That is, participants were asked to quickly judge whether the conclusion was believable or not without making any reference to logical reasoning. Interestingly, Handley et al. observed that the logical status of the conclusion nevertheless affected people’s believability judgments. People had more difficulty (i.e., took longer and made more errors) judging the believability of the conclusion when it conflicted with its logical validity. Bluntly put, although there was no reason to engage in logical reasoning, people couldn’t help to do so. These findings led Handley and colleagues. to suggest that the logical response might be generated simultaneously with the heuristic, belief-based response by fast and automatic Type 1 processing. Taken together, previous literature has not provided sufficient evidence for the critical time course assumption of the DI dual process model, and indirect evidence has led some authors to challenge this presumption as well. For completeness, one might note that the indirect evidence is not uncontroversial either (e.g., Aczel, Szollosi, & Bago, 2016; Klauer & Singmann, 2013; Mata, Schubert, & Ferreira, 2014; Pennycook, Fugelsang, & Koehler, 2012; Singmann, Klauer, & Kellen, 2014; Travers, Rolison, & Feeney, 2016). However, the point is that there is a strong need for the field to validate the time course assumption directly. As Newman et al. (2016), the present study focuses on this key issue. For this purpose, we adopted a two response paradigm. Participants were asked to give an immediate first answer, and then they were allowed to take as much time as they needed to give a final answer. We were specifically interested in the correctness of the initially generated answers and used a range of methodological procedures to make sure that the initial response was truly intuitive in nature. Default-interventionist (DI) dual process theory would predict that people typically give the heuristic answer for the first response, which is the incorrect answer in the case of conflict problems. Afterwards, when sufficient time is allotted for Type 2 processing to occur, they might be able to correct their initial response and arrive at the correct answer. In sum, in principle there should be only two main answer types according to standard DI theory: either incorrect for first response – incorrect for second response or incorrect for first response – correct for second response. Our key question is whether generation of a correct final

Our key question is whether generation of a correct final response is indeed preceded by generation of an initial incorrect response, or whether people can generate the correct logical answer as the first answer as well. This latter pattern would provide direct evidence for the existence of fast, logical Type 1 reasoning. Critically, we wanted to validate that the first response that participants gave only reflected the output of Type 1 processing. For this reason, in four experiments we used a combination of techniques that allowed us to minimize or control the impact of Type 2 processing. Experiment 1 (instructions only) served as a baseline condition in which we merely instructed participants to give their very first intuitive response and answer as quickly as possible. In Experiment 2 (response deadline), we ensured that participants could not take too long to give their first response by enforcing a strict and challenging response deadline. In Experiment 3 (cognitive load) we knocked out Type 2 processing experimentally by imposing a cognitive load task that burdened participants’ executive resources. In Experiment 4 (deadline + load) we combined both the response deadline and a cognitive load. Finally, to check the generality of the findings, two different reasoning tasks were used: a syllogistic reasoning task and a base-rate task. These were selected for two reasons. First, the tasks are highly popular in the research community and have inspired much of the theorizing in the field. Second, the tasks differ in the sense that different normative systems are required to solve them correctly (standard logic for the syllogistic reasoning task, and probability theory for the base-rate task). The differences or similarities between the tasks can start to give us an indication of the generality of the findings.

METHOD

As we outlined above, we ran four studies that used a range of methodological procedures to make sure that the initial response was truly intuitive in nature. The rationale is that if the initial response results from automatic, intuitive processing, results should not be affected by any of these manipulations and findings should be similar across the four studies. To preview the results: we indeed observed that the findings hardly varied across our studies. For ease of presentation we will therefore present a single results section in which the study factor is included as a between-subjects factor in the analyses. Here we present an overview of the method sections of the four studies. Every participant was allowed to take part in only one experiment.


Experiment 1 – Instructions only

Participants

A total of 101 participants were tested (61 female, M = 38.95 years, SD = 12.69 years). The participants were recruited via the Crowdflower platform and received $0.30 for their participation. Only native English speakers from the USA or Canada were allowed to participate in the study. A total of 48% of the participants reported high school as highest completed educational level, while 51% reported having a post-secondary education degree (1% did not answer).

Materials

Base rate task. Participants solved a total of eight base-rate problems. All problems were taken from Pennycook, Cheyne, Barr, Koehler, and Fugelsang (2014). Participants always received a description of the composition of a sample (e.g., “This study contained I.T. engineers and professional boxers”), base rate information (e.g., “There were 995 engineers and 5 professional boxers”), and a description that was designed to cue a stereotypical association (e.g., “This person is strong”). Participants’ task was to indicate to which group the person most likely belonged. The problem presentation format we used in this research was based on Pennycook et al.’s (2014) rapid-response paradigm. In this paradigm, the base rates and descriptive information are presented serially and the amount of text that is presented on screen is minimized. Pennycook et al. introduced the paradigm to minimize the influence of reading times and get a purer and less noisy measure of reasoning time per se. First, participants received the names of the two groups in the sample (e.g., “This study contains clowns and accountants”). Next, under the first sentence (which stayed on the screen) we presented the descriptive information (e.g., “Person ‘L’ is funny”). The descriptive information specified a neutral name (‘Person L’) and a single-word personality trait (e.g., “strong” or “funny”) that was designed to trigger the stereotypical association. Finally, participants received the base rate probabilities. The following illustrates the full problem format:


This study contains clowns and accountants.
Person 'L' is funny.
There are 995 clowns and 5 accountants.
Is Person 'L' more likely to be:
o A clown
o An accountant

Half of the presented problems were conflict items and the other half were no-conflict items. In no-conflict items the base rate probabilities and the stereotypic information cued the same response. In conflict items the stereotypic information and the base rate probabilities cued different responses. Three kinds of base rates were used: 997/3, 996/4, and 995/5. Note that the material that was selected for the present study was extensively pretested. Pennycook et al. (2014) made sure that words that were selected to cue a stereotypical association consistently did so, but avoided extremely diagnostic cues. The importance of such a non-extreme, moderate association is not trivial. Note that we label the response that is in line with the base rates as the correct response. Critics of the base rate task (e.g., Gigerenzer, Hell, & Blank, 1988; see also Barbey & Sloman, 2007) have long pointed out that if reasoners adopt a Bayesian approach and combine the base rate probabilities with the stereotypic description, this can lead to interpretational complications when the description is extremely diagnostic. For example, imagine that we have an item with males and females as the two groups and give the description that Person ‘A’ is ‘pregnant’. In this case, one would always need to conclude that Person ‘A’ is a woman, regardless of the base rates. The more moderate descriptions (such as ‘kind’ or ‘funny’) help to avoid this potential problem. In addition, the extreme base rates (997/3, 996/4, or 995/5) that were used in the current study further help to guarantee that even a very approximate Bayesian reasoner would need to pick the response cued by the base rates (see De Neys, 2014).
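To make this point concrete, consider a worked Bayesian computation for the introductory boxer problem. The likelihood values below are purely illustrative assumptions (the diagnosticity of the pretested trait words was not quantified this way); the point is that even a strongly diagnostic description cannot overturn the extreme base rates:

    P(\text{boxer} \mid \text{strong})
      = \frac{P(\text{strong} \mid \text{boxer}) \, P(\text{boxer})}
             {P(\text{strong} \mid \text{boxer}) \, P(\text{boxer}) + P(\text{strong} \mid \text{I.T.}) \, P(\text{I.T.})}
      = \frac{.95 \times .005}{.95 \times .005 + .05 \times .995} \approx .09

Hence, even under these generous assumptions the posterior probability that the person is a boxer is only about 9%, so the base-rate response remains the appropriate one.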

Each problem started with the presentation of a fixation cross for 1000 ms. After the fixation cross disappeared, the sentence which specified the two groups appeared for 2000 ms. Then the stereotypic information appeared for another 2000 ms, while the first sentence remained on the screen. Finally, the last sentence specifying the base rates appeared together with the question and the two response alternatives. Note that we presented the base rates and question together (rather than presenting the base rates for 2000 ms first) to minimize the possibility that some participants would start solving the problem during presentation of the base rate information. Once the base rates and question were presented, participants were able to select their answer by clicking on it. The position of the correct answer alternative (i.e., first or second response option) was randomly determined for each item. The eight items were presented in random order. An overview of the full item set can be found in the Supplementary material (Appendix A).

Syllogistic reasoning task. Participants were given eight syllogistic reasoning problems taken from De Neys, Moyens, and Vansteenwegen (2010). Each problem included a major premise (e.g., “All dogs have four legs”), a minor premise (e.g., “Puppies are dogs”), and a conclusion (e.g., “Puppies have four legs”). The task was to evaluate whether the conclusion follows logically from the premises. In four of the items the believability and the validity of the conclusion conflicted (conflict items, two problems with an unbelievable–valid conclusion, and two problems with a believable–invalid conclusion). For the other four items the logical validity of the conclusion was in accordance with its believability (no-conflict items, two problems with a believable–valid conclusion, and two problems with an unbelievable–invalid conclusion). We used the following format:

All dogs have four legs
Puppies are dogs
Puppies have four legs
Does the conclusion follow logically?
o Yes
o No

The premises and conclusion were presented serially. Each trial started with the presentation of a fixation cross for 1000 ms. After the fixation cross disappeared, the first sentence (i.e., the major premise) was presented for 2000 ms. Next, the second sentence (i.e., the minor premise) was presented under the first premise for 2000 ms. After this interval was over, the conclusion, together with the question “Does the conclusion follow logically?” and two response options (yes/no), was presented right under the premises. Once the conclusion and question were presented, participants could give their answer by clicking on the corresponding bullet point. The eight items were presented in a randomized order. An overview of the full item set can be found in the Supplementary material (Appendix A).
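For illustration, the serial presentation schedule can be rendered as a small console simulation. This is a minimal Python sketch of the timing described above, not the actual code of the online experiment (whose implementation is not detailed here):

    import time

    def present_syllogism(major, minor, conclusion):
        # Simulates the timing described above: fixation cross (1000 ms),
        # major premise (2000 ms), minor premise (2000 ms), then the
        # conclusion and question stay on screen until a response is given.
        print("+")
        time.sleep(1.0)
        print(major)
        time.sleep(2.0)
        print(minor)
        time.sleep(2.0)
        print(conclusion)
        print("Does the conclusion follow logically?  o Yes  o No")
        # The response window opens here; participants click 'Yes' or 'No'.

    present_syllogism("All dogs have four legs",
                      "Puppies are dogs",
                      "Puppies have four legs")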


Procedure

The experiment was run online. People were clearly instructed that we were interested in their first, initial response to the problem. The instructions stressed that it was important to give the initial response as fast as possible and that participants could afterwards take additional time to reflect on their answer. The literal instructions that were used stated the following: “Welcome to the experiment! Please read these instructions carefully! This experiment is composed of 16 questions and a couple of practice questions. It will take about 20 minutes to complete and it demands your full attention. You can only do this experiment once. In this task we'll present you with a set of reasoning problems. We want to know what your initial, intuitive response to these problems is and how you respond after you have thought about the problem for some more time. Hence, as soon as the problem is presented, we will ask you to enter your initial response. We want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you enter your final response. You will have as much time as you need to indicate your second response. After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response. In sum, keep in mind that it is really crucial that you give your first, initial response as fast as possible. Afterwards, you can take as much time as you want to reflect on the problem and select your final response. You will receive $0.30 for completing this experiment. Please confirm below that you read these instructions carefully and then press the "Next" button.”

All participants were presented with both the syllogistic reasoning and the base-rate task in a randomly determined order. After the general instructions, the specific instructions for the upcoming task (base rates or syllogisms) were presented. The following specific instructions were used for the syllogistic reasoning task: “In this part of this experiment you will need to solve a number of reasoning problems. At the beginning you are going to get two premises, which you have to assume to be true. Then a conclusion will be presented. You have to indicate whether the conclusion follows logically from the premises or not. You have to assume that the premises are all true. This is very important. Below you can see an example of the problems.

Premise 1: All dogs have four legs
Premise 2: Puppies are dogs
Conclusion: Puppies have four legs
Does the conclusion follow logically?
o Yes
o No

The two premises and the conclusion will be presented on the screen one by one. Once the conclusion is presented you can enter your response. As we told you we are interested in your initial, intuitive response. First, we want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you enter your final response. After you made your choice and clicked on it, you will be automatically taken to the next page.


After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response. Press "Next" if you are ready to start the practice session!”

For the base-rate task these instructions were presented:

“In a big research project a large number of studies were carried out where a psychologist made short personality descriptions of the participants. In every study there were participants from two population groups (e.g., carpenters and policemen). In each study one participant was drawn at random from the sample. You’ll get to see one personality trait of this randomly chosen participant. You’ll also get information about the composition of the population groups tested in the study in question. You'll be asked to indicate to which population group the participant most likely belongs. As we told you we are interested in your initial, intuitive response. First, we want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you enter your final response. After you made your choice and clicked on it, you will be automatically taken to the next page. After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response. Press "Next" if you are ready to start the practice session!”

After the task-specific instructions, participants solved two (no-conflict) practice problems to familiarize them with the task. Then they were able to start the experiment. For the first response people were instructed to give a quick, intuitive response. After they clicked on the answer, they were asked to rate their confidence in their answer on a scale from 0% to 100%, with the following question: “How confident are you in your answer? Please type a number from 0 (absolutely not confident) to 100 (absolutely confident)”. Next, they were presented with the problem again, and they were told that they could take as much time as they needed to give a final answer. As a last step, they were asked to give the confidence in their final answer (the same question format as for the first answer confidence was used). The colour of the question and answer options was green during the first response phase and blue during the second response phase, to visually remind participants which question they were answering at the moment. For this purpose, a reminder sentence was placed right under the question: “Please indicate your very first, intuitive answer!” and “Please give your final answer.”, respectively. The presentation order of the base rate and syllogistic reasoning tasks was randomized. After participants finished the first task they could briefly pause; they were then presented with the instructions and practice problems of the second task, and started the second task. For both the base-rate and syllogistic reasoning tasks two different problem sets were used. The conflict items in one set were the no-conflict items in the other, and vice versa. This was done by reversing the base rates (base-rate task) or by switching the conclusion and minor premise (syllogisms; see De Neys et al., 2010).

Each of the two sets was used for half of the participants. Sections A and B of the Supplementary material (Appendix A) give an overview of all problems in each of the sets. This counterbalancing minimized the possibility that mere content or wording differences between conflict and no-conflict items could influence the results. At the end of the study participants were asked to answer demographic questions.
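As an illustration of this counterbalancing logic, the sketch below shows how a base-rate item can be flipped between its no-conflict and conflict version by swapping the two group frequencies. This is a minimal Python sketch with a hypothetical dictionary representation of an item, not the materials-generation code itself:

    def reverse_base_rates(item):
        # Swap the two group frequencies; the wording of the item is
        # untouched, so the conflict and no-conflict versions differ
        # only in the base rates.
        flipped = dict(item)
        flipped["n_group1"], flipped["n_group2"] = item["n_group2"], item["n_group1"]
        return flipped

    # With 995 clowns, both the stereotype ("funny" -> clown) and the base
    # rates cue "clown": a no-conflict item. After swapping (5 clowns vs
    # 995 accountants), the two cues point to different answers: a conflict item.
    item = {"group1": "clowns", "group2": "accountants",
            "n_group1": 995, "n_group2": 5, "description": "funny"}
    print(reverse_base_rates(item))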

Experiment 2 – Response deadline

Participants

In the actual experiment (response deadline), 104 participants were recruited (63 female, M = 39.9 years, SD = 13.31 years). An additional 52 participants (31 female, M = 44.13 years, SD = 13.2 years) were recruited for a reading pretest (see further). Participants received $0.11 for their participation in the reading pretest and $0.50 for participation in the actual experiment. The same recruitment procedure was used as in Experiment 1. In the response deadline condition, 35% of the participants reported high school as highest educational level, 63% reported having a post-secondary education degree, while 2% reported less than high school educational level. In the reading pretest 40% of the participants reported high school as highest educational level, while 60% of them reported having a post-secondary education degree.

Materials

Reading pre-test. In the reading pretest participants were asked to simply read each one of the base-rate and syllogistic reasoning problems that were used in Experiment 1. The basic goal of this reading condition was to define the response deadline for the actual reasoning study. Our rationale was to base the deadline on the average reading times for the syllogistic reasoning and base-rate problems (see further). Note that, as many critics have argued, dual process theories are massively underspecified (Kruglanski, 2013). The theory only posits that Type 1 processes are relatively faster than Type 2 processes. However, no criterion is available that would allow us to a priori characterize a response as a Type 1 response in an absolute sense (i.e., faster than x seconds = Type 1). Our reading baseline provides a practical criterion to define a response deadline.

The rationale is simple: if we allot participants only as much time as it takes to read the problems, we can be reasonably sure that reasoning-related Type 2 processing will be minimal. Obviously, making a strict distinction between reading and reasoning is not possible, but the point here is that the reading pretest gives us a practical criterion that should serve as a reasonable and universally applicable proxy. As a side note, our reasoning task format was specifically selected with an optimization of the deadline in mind. As we clarified, we aimed to minimize the amount of text that was presented on screen. This is again not a trivial issue. In two-response studies with traditional base-rate problems Pennycook and Thompson (Pennycook & Thompson, 2012; Thompson et al., 2011; see also Newman et al., 2016) already tried to introduce a deadline for the first response. However, because of the lengthy nature of the problems, pilot testing indicated that the deadline needed to be set at 12 s. Arguably, such a lengthy response window leaves ample room for a possible impact of Type 2 processing. This is one of the complications that can be sidestepped by adopting Pennycook et al.’s (2014) fast-response base-rate format that we used in the present studies. Hence, by minimizing the to-be-read text we hoped to minimize reading time and set a much stricter deadline in absolute terms. Participants were instructed that the goal of the pretest was to determine how long people needed to read the item materials. They were instructed that there was no need for them to try to solve the problems and that they simply needed to read the items in the way they typically would. When they were finished reading, they were asked to randomly click on one of the presented response options to advance to the next problem. The presentation format was the same as in Experiment 1. The only difference was that the problem was not presented a second time and participants were not asked for a confidence rating. To make sure that participants would be motivated to actually read the material we told them that we would present them with two very easy verification questions per task (four in sum) at the end of the study to check whether they read the material. The literal instructions were as follows: “Welcome to the experiment! Please read these instructions carefully! This experiment is composed of 16 questions and 4 practice questions. It will take 5 minutes to complete and it demands your full attention. You can only do this experiment once. In this task we'll present you with a set of problems we are planning to use in future studies. Your task in the current study is pretty simple: you just need to read these problems. We want to know how long people need on average to read the material. In each problem you will be presented with two answer alternatives. You don’t need to try to solve the problems or start thinking about them. Just read the problem and the answer alternatives and when you are finished reading you randomly click on one of the answers to advance to the next problem. The only thing we ask of you is that you stay focused and read the problems in the way you typically would. Since we want to get an accurate reading time estimate please avoid wiping your nose, taking a phone call, sipping from your coffee, etc. before you have finished reading. At the end of the study we will present you with some easy verification questions to check whether you actually read the problems.
This is simply to make sure that participants are complying with the instructions and actually read the problems (instead of clicking through them without paying attention). No worries, when you simply read the problems, you will have no trouble at all answering the verification questions. You will receive $0.11 for completing this experiment. Please confirm below that you read these instructions carefully and then press the "Next" button.”

Specific instructions before the syllogistic items started: “In the first part of this experiment you will need to read a specific type of reasoning problem. At the beginning you are going to get two premises, which you have to assume to be true. Then a conclusion, question and answer alternatives will be presented. We want you to read this information and click on any one of the two answers when you are finished. Again, no need to try to solve the problem. Just read it. Below you can see an example of the problems.

Premise 1: All dogs have four legs
Premise 2: Puppies are dogs
Conclusion: Puppies have four legs
Does the conclusion follow logically?
o Yes
o No

The two premises and the conclusion will be presented on the screen one by one. Once the conclusion is presented, you simply click on one of the answer alternatives when you have finished reading and the next problem will be presented. Press "Next" if you are ready to start a brief practice session!”

Specific instructions before the base rate items started: “In a big research project a large number of studies were carried out where a psychologist made short personality descriptions of the participants. In every study there were participants from two population groups (e.g., carpenters and policemen). In each study one participant was drawn at random from the sample. You’ll get to see one personality trait of this randomly chosen participant. You’ll also get information about the composition of the population groups tested in the study in question. Then a question asking you to indicate to which population group the participant most likely belongs will appear. We simply want you to read this question and the two answer alternatives. Once you have finished reading this, you simply click on either one of the answer alternatives and then the next problem will be presented. Again, no need to try to solve the problem; just read the question and simply click on either one of the answers when you are finished. Press "Next" if you are ready to start a brief practice session!”

The verification questions were constructed such that a very coarse reading of the problems would suffice to recognize the correct answer. The following is an example of the verification questions for the syllogistic problems: “We asked you to read the conclusions of a number of problems. Which one of the following conclusions was NOT presented during the task:
o Whales can walk
o Boats have wheels
o Roses are flowers
o Waiters are tired”


And an example of the verification question for the base rate task: “We asked you to read problems about a number of population groups. Which one of the following combinations of two groups was NOT presented during the task:
o Nurses and artists
o Man and woman
o Scientists and assistants
o Cowboys and Indians”

The correct answer was blatantly unrelated to any of the presented material content. Note that 94% of the verification questions were solved correctly, which indicates that, by and large, participants were at least minimally engaged in the reading task. Only those participants who correctly solved both verification questions for each task were analysed. In sum, the reading condition gives us a baseline against which the reasoning response times for the initial response can be evaluated. Any Type 1 response during reasoning also minimally requires that a) the question and response alternatives are read, and b) participants move the mouse to select a response. The reading condition allows us to partial out the time needed for these two components. In other words, the reading condition gives us a raw indication of how much time a Type 1 response should (minimally) take. Results of the reading pretest indicated that participants needed on average 2.92 s (SD = 1.95) to read the problems and click on a response option for the base-rate problems, and 2.62 s (SD = 1.89) for the syllogistic reasoning problems3. We rounded these values to the nearest integer (3 s) to give participants some minimal leeway. Hence, we set a universal response deadline of 3 s for the reasoning experiment.

3 Note that reading time averages were calculated on the logarithmically transformed data, but they were transformed back to seconds.
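Concretely, the averaging procedure described in footnote 3 amounts to taking a geometric mean of the reading times. A minimal Python sketch, with made-up reading times rather than the real pretest data:

    import numpy as np

    # Hypothetical per-trial reading times in seconds.
    reading_times = np.array([2.4, 3.1, 2.7, 3.5, 2.3, 3.0])

    # Average on the log scale, then back-transform to seconds
    # (i.e., a geometric mean), as described in footnote 3.
    geometric_mean = np.exp(np.log(reading_times).mean())

    # Round to the nearest integer second to give some minimal leeway.
    deadline = round(geometric_mean)
    print(geometric_mean, deadline)  # ~2.8 -> 3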

Procedure

The same two reasoning tasks and problems were used as in Experiment 1. The only difference was that a response deadline was introduced to minimize the possibility that participants would engage in time-consuming Type 2 processing when giving their first response. Once the question was presented, participants had 3000 ms to click on one of the answer alternatives; after 2000 ms, the background colour turned yellow to remind them to pick an answer immediately.


If participants did not select an answer within 3000 ms, they received feedback reminding them that they had not answered within the deadline and they were told to make sure to respond faster on subsequent trials. Obviously, there was no response deadline for the second response. Participants were given three (no-conflict) practice problems before starting each task to familiarize them with the deadline procedure. During the actual reasoning task, participants failed to provide a first response within the deadline on 12% of the trials. These missed trials were discarded and were not included in the reported data.

Experiment 3 – Load

Participants

A total of 99 participants were recruited (44 female, M = 39.28 years, SD = 13.28 years). The same recruitment procedure was used as in Experiment 1. Participants received $0.50 for their participation. A total of 48% of the participants reported high school as highest educational level, while 43% of them reported having a post-secondary education degree. Six percent of the participants did not provide education level information.

Materials & Procedure

In Experiment 3 we used a concurrent load task, the dot memorization task (Miyake, Friedman, Rettinger, Shah, & Hegarty, 2001), to burden participants’ executive cognitive resources while they were solving the reasoning tasks. The idea behind the load manipulation is straightforward. One of the defining features of Type 2 processing is that it requires executive (working memory) resources (e.g., Evans & Stanovich, 2013; Kahneman, 2011). Hence, if we burden participants’ cognitive resources with a secondary load task while they are solving the reasoning problems, we reduce the possibility that they can engage in Type 2 thinking (De Neys, 2006a; De Neys & Schaeken, 2007; Franssens & De Neys, 2009). The same two reasoning tasks and problems were used as in Experiment 1. In every trial, after the fixation cross disappeared, participants were shown a matrix in which 4 dots were presented in a complex interspersed pattern in a 3 x 3 grid (see Figure 1) for 2000 ms. Participants were instructed to memorize the pattern. Previous studies established that this demanding secondary task successfully burdens executive resources during reasoning (De Neys, 2006b; Franssens & De Neys, 2009; Miyake et al., 2001). After the matrix disappeared, the reasoning problem was presented as in Experiment 1 and participants had to give their first response and their response confidence.

After this, they were shown four matrices with different dot patterns and they had to select the correct, to-be-memorized matrix (see Figure 1). Participants were given feedback as to whether they recalled the correct matrix or not. Subsequently, the problem was presented again and participants selected their final response and response confidence. Hence, no load was imposed during the second, final response stage. There was no time limit for either one of the responses. All trials on which an incorrect matrix was selected (11% of trials) were removed from the analysis. Before the actual experiment participants were familiarized with the task procedure. First, they received two reasoning problems which were identical to the practice questions used in Experiment 1. Next, participants were presented with a dot matrix practice question: they were simply shown a dot pattern for 2000 ms and after it disappeared they were asked to identify the pattern from four presented options. As a last step, they were presented with two more practice reasoning problems which they needed to solve under load following the procedure outlined above.

Figure 1. Example of a dot matrix (left panel, A) and a pattern recall question with the possible answer options (right panel, B).
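For illustration, generating this kind of load stimulus is straightforward. Below is a minimal Python sketch (with hypothetical helper names, not the original stimulus-generation code) that produces a random 4-dot pattern in a 3 x 3 grid and a recall question with distractor patterns:

    import random

    def make_dot_matrix(n_dots=4, size=3):
        # A pattern is a set of n_dots filled cells in a size x size grid,
        # encoded here as sorted cell indices (0 .. size*size - 1).
        return tuple(sorted(random.sample(range(size * size), n_dots)))

    def make_recall_options(target, n_options=4):
        # The recall screen shows the to-be-memorized pattern among
        # distractor patterns, in random order.
        options = {target}
        while len(options) < n_options:
            options.add(make_dot_matrix())
        options = list(options)
        random.shuffle(options)
        return options

    pattern = make_dot_matrix()
    print(pattern, make_recall_options(pattern))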


Experiment 4 – Response deadline and load

Participants

In Experiment 4 (deadline and load), 115 participants were recruited (53 female, M = 38.85 years, SD = 12.12 years). The same recruitment procedure was used as in Experiment 1. Participants received $0.50 for their participation. A total of 40% of the participants reported high school as highest educational level, 57% of them reported having a post-secondary education degree, and 3% reported less than high school educational level.

Materials & Procedure

In Experiment 4, the same load task was used as in Experiment 3; the only exception was that participants had to give both their first and final response under load4. Hence, participants had to memorize the dot pattern until they had given their final response and confidence rating. In addition, the same response deadline (3 s) procedure as in Experiment 2 was used to further limit the possibility of Type 2 engagement in the first response stage. Participants in Experiment 4 failed to provide a first response before the deadline on 6.3% of the trials. These missed trials were discarded from the analysis. Trials on which an incorrect matrix was selected (11% of trials) were also removed. Taken together, 16.7% of the trials were excluded from the analysis (slightly less than the sum of the two percentages, presumably because some trials failed both criteria).

4 The idea was to test whether the additional load during the final response stage would affect the likelihood that people changed from an initial incorrect to a final correct answer. However, because of the floored number of such change responses (see further) in the no-load conditions, this sub-hypothesis could not be verified.

RESULTS

Accuracy of final response

For consistency with previous work, we first present the response accuracies for the final response. Table 1 gives an overview of the results. As the table indicates, accuracies across the four experiments are generally in line with previous studies that adopted a single-response paradigm. On the majority of trials, people were typically biased when solving the conflict problems.

4 The idea was to test whether the additional load during the final response stage would affect the likelihood that people changed from an initial incorrect to a final correct answer. However, because the number of such change responses was at floor in the no-load conditions (see below), this sub-hypothesis could not be tested.


Average final accuracy across the four experiments is only 40% for the base-rate task and 48% for the syllogistic reasoning task (differences between experiments are considered in the next section). However, on the no-conflict problems, where heuristic Type 1 processing cues the correct response, participants perform significantly better, with average final accuracies reaching 94% for the base-rate task, b = -4.77, Z = -25.06, p < 0.001, and 77% for the syllogistic reasoning task, b = -1.51, Z = -17.07, p < 0.001. Hence, final response accuracies are generally consistent with what can be expected based on the literature that adopted a classic one-response paradigm with these tasks (e.g., De Neys et al., 2010; Pennycook et al., 2015).

Table 1. Percentage of correct final responses (SD) on conflict and no-conflict problems in the base-rate and syllogistic reasoning tasks across the four experiments.

                                   Base Rate                      Syllogisms
Experiment                         Conflict       No-conflict     Conflict       No-conflict
Experiment 1: Instructions only    36.3% (48.1)   95.4% (22.8)    41.8% (49.4)   77.4% (41.8)
Experiment 2: Response deadline    37% (48.3)     93.7% (24.2)    42.8% (49.6)   77.9% (41.5)
Experiment 3: Load                 45.2% (49.8)   95.3% (21.2)    53.6% (49.9)   76.3% (42.6)
Experiment 4: Deadline + Load      42.8% (49.5)   93% (25.4)      54.5% (49.9)   76% (42.7)
Average                            40.1% (49)     94.1% (23.5)    47.9% (50)     76.9% (42.1)

Direction of Change

Our primary interest in the present study (inspired by Pennycook and Thompson, 2012) is what we will refer to as the "Direction of Change" analysis for the conflict items. By direction of change, we mean the way or direction in which a given person on a specific trial changed (or didn't change) her initial answer during the rethinking phase. More specifically, people can give a correct or incorrect response in each of the two response stages. In theory, this can result in four different types of answer change patterns: 1) a person could give the incorrect (heuristic) answer as the first response, and then change to the correct (logical) answer as the final response (we will use the label "01" to refer to this type of change pattern), 2) one can give the incorrect answer as both the first and the final response (label "00"), 3) one can give the correct answer as the first response and change to the incorrect response as the final response (label "10"), and 4) one can give the correct answer as both the first and the final response (label "11"). To recap, we will use the following labels to refer to the four possible types of answer change patterns: "01" (i.e., response 1 incorrect, response 2 correct), "00" (i.e., response 1 incorrect, response 2 incorrect), "10" (i.e., response 1 correct, response 2 incorrect), and "11" (i.e., response 1 correct, response 2 correct).

Table 2 shows how frequent each of the four types of direction of change was for the critical conflict problems. The first thing to note is that for both the base-rate and syllogistic reasoning tasks there are two general trends that support the DI dual process view. First, there is a high prevalence of 00 responses. For both reasoning tasks this was clearly the dominant response category. The dominance of the 00 category on the conflict problems supports DI theory; people typically tend to stick to the heuristic response, which results in an erroneous first response that is subsequently not corrected. Second, we also observe a small number of trials in the 01 category. In line with standard DI theory, an initial erroneous response is sometimes corrected after additional reflection, but these cases are quite rare. By and large, these trends fit the standard DI predictions. However, a key challenge for the standard DI model is the high frequency of "11" responses (as Table 2 shows, 30% and 42% of responses for the base-rate and syllogistic reasoning tasks, respectively). Indeed, for both the base-rate and syllogistic reasoning tasks it was the case that for the majority of trials on which the final response was correct, this correct response was already given as the initial response (i.e., 74.8% and 87.6% of the final correct response trials in the base-rate and syllogistic reasoning tasks, respectively). Hence, in these cases the correct logical response was given immediately.
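For clarity, this labeling scheme is trivial to implement; the R sketch below (with made-up argument names) simply concatenates the two binary accuracies into the category labels used throughout the paper.

# Illustrative coding of the direction of change from two binary accuracies
direction_of_change <- function(initial_correct, final_correct) {
  paste0(as.integer(initial_correct), as.integer(final_correct))
}
direction_of_change(TRUE,  TRUE)   # "11": correct at both stages
direction_of_change(FALSE, TRUE)   # "01": initial error corrected at the final stage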


Table 2. Percentage of trials within each direction of change category for conflict items. The raw number of trials in each category is presented in brackets.

                                               Direction of Change
Task        Experiment                         11            00            10           01
Base rate   Experiment 1: Instructions only    27% (108)     61% (244)     2.75% (11)   9.25% (37)
            Experiment 2: Response deadline    25.1% (92)    59.9% (217)   4.4% (16)    11.2% (41)
            Experiment 3: Load                 38.8% (122)   52.5% (179)   2.3% (8)     9.4% (32)
            Experiment 4: Deadline + Load      32.7% (127)   53.4% (207)   3.1% (13)    10.7% (41)
            Average                            30%           56.7%         3.2%         10.1%
Syllogism   Experiment 1: Instructions only    35.1% (142)   54.7% (221)   3.5% (14)    6.7% (27)
            Experiment 2: Response deadline    38.9% (139)   50.4% (180)   5.3% (19)    5.3% (19)
            Experiment 3: Load                 46.2% (156)   43.5% (147)   3% (10)      7.4% (25)
            Experiment 4: Deadline + Load      49.7% (187)   43.1% (162)   2.7% (10)    4.5% (17)
            Average                            42.3%         48.1%         3.6%         6%

Table 3. Percentage of trials within each direction of change category for no-conflict items. The raw number of trials in each category is presented in brackets.

                                               Direction of Change
Task        Experiment                         11            00           10          01
Base rate   Experiment 1: Instructions only    90.3% (361)   3.5% (14)    2% (8)      4.3% (17)
            Experiment 2: Response deadline    90.6% (338)   3.2% (12)    2.1% (8)    4% (15)
            Experiment 3: Load                 91.3% (273)   3.3% (10)    1.3% (4)    4% (12)
            Experiment 4: Deadline + Load      88.2% (290)   4.3% (14)    2.1% (7)    5.5% (18)
            Average                            90.9%         3.6%         1.9%        4.5%
Syllogism   Experiment 1: Instructions only    73.8% (298)   18.3% (74)   4.2% (17)   3.7% (15)
            Experiment 2: Response deadline    74.7% (272)   15.4% (56)   4.1% (15)   5.8% (21)
            Experiment 3: Load                 72.7% (245)   17.8% (60)   5.9% (20)   3.6% (12)
            Experiment 4: Deadline + Load      70.6% (272)   20% (77)     3.1% (12)   6.2% (24)
            Average                            73%           17.9%        4.3%        4.8%


Moreover, the high prevalence of 11 responses was observed across experiments; by and large, all experiments showed similar results. Had a high proportion of 11 responses been observed only in Experiment 1 (instructions only), it could have been attributed to the fact that some participants might simply not respect the instructions. However, the proportion of 11 responses remained stable in Experiments 2-4, which minimized the possibility to engage in slow and demanding Type 2 thinking. Indeed, if anything, the proportion of 11 responses was slightly elevated in the two conditions in which cognitive load was applied (Experiment 3: load, and Experiment 4: load + deadline). Mixed-effect multilevel logistic regression models⁵ showed that this trend reached significance in the syllogistic reasoning task, χ2 (5) = 12.07, p = 0.007, but not in the base-rate task, χ2 (5) = 6.67, p = 0.083. However, the key point here is that none of the experimental procedures decreased the frequency of 11 responses. This indicates that the correct initial responses did not result from additional Type 2 processing.

Another potential explanation for the high prevalence of 11 responses is that they simply result from random guessing. Indeed, the experimental design is challenging for participants; they were asked to produce a very quick answer, and could even be faced with a strict deadline and/or secondary task load. One might argue that the task is simply too demanding and that participants consequently answer randomly when asked to enter an initial response. Obviously, if people guessed, they would give a correct initial response on about 50% of trials. Hence, in theory, a high prevalence of 11 responses might result from such random guessing. However, direct evidence against the guessing account comes from the no-conflict problems. As Table 3 shows, responses here are almost exclusively of the 11 type. Across experiments they accounted for 90% (base rates) and 73% (syllogisms) of responses. Mixed-effect multilevel logistic regression models showed that there were no significant differences in the frequency of 11 no-conflict responses across the four experiments, neither in the base-rate task, χ2 (5) = 1.07, p = 0.78, nor in the syllogistic reasoning task, χ2 (5) = 1.73, p = 0.63. However, both for the base-rate task, b = -4.49, Z = -25.64, p < 0.0001, and the syllogistic reasoning task, b = -1.49, Z = -16.97, p < 0.0001, the frequency of 11 responses was clearly higher for the no-conflict than for the conflict problems. Note that this dominance of 11 responses on no-conflict problems is as predicted by DI theory, given that heuristic Type 1 processing is expected to cue the correct response on no-conflict problems.

5 The rationale for this analysis can be found in the section "Confidence ratings and response time analysis". As the dependent variable here was categorical, we used multilevel logistic regression models instead of regular regression models. Note that one might also argue that the analysis should include 10 responses. However, the results are not affected: the trend towards more initial correct responses under load reached significance in the syllogistic reasoning task, χ2 (5) = 10.02, p = 0.02, but not in the base-rate task, χ2 (5) = 5.08, p = 0.17.

However, the point here is that the pattern argues directly against the guessing account: if our experimental demands were too challenging and participants were simply guessing when giving their initial response, the initial responses should not have differed between conflict and no-conflict problems. Our stability and confidence analyses below will provide further evidence against the guessing account.

Finally, we would like to note that the consistently low prevalence of 10 and 01 responses that we observed on the conflict trials across tasks and experiments supports Thompson et al.'s (2011) and Pennycook and Thompson's (2012) earlier observations that people mostly stick to their initial response and rarely change their answer, regardless of whether it was correct or not. On average, the 10 and 01 categories together accounted for less than 11.4% of responses on the conflict trials across tasks and experiments.

In sum, the key challenge that the present direction of change analysis poses for the time course assumption of DI dual process theory is the high prevalence of "11" responses. Although we observed the predicted dominance of 00 responses, we also found that in the majority of cases in which the correct response was given as the final answer, this response was already selected as the initial answer. This tentatively suggests that in those cases where people arrive at a correct final response, the correct response was already generated intuitively.
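As an aside, the frequency comparisons reported in this section relied on mixed-effect multilevel logistic regression. A minimal sketch of this type of analysis is given below; the package choice (lme4) and the data frame and variable names are illustrative assumptions, not our actual analysis script.

library(lme4)

# Does the frequency of 11 responses differ across experiments?
# `d` is a hypothetical trial-level data frame with a binary resp11 column.
m0 <- glmer(resp11 ~ 1 + (1 | subject), data = d, family = binomial)
m1 <- glmer(resp11 ~ experiment + (1 | subject), data = d, family = binomial)
anova(m0, m1)   # likelihood-ratio chi-square test for the experiment factor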

Stability index analysis

Our direction of change analysis was computed across items and participants (using a mixed model approach). One might wonder whether participants are stable in their preference for one or the other type of change category. That is, does an individual who produces a correct vs. incorrect response on one conflict problem do so consistently for the other items, or are people more variable in their response preferences across problems? To answer this question we calculated, for every participant, on how many of the conflict problems they answered they displayed the same direction of change category. We refer to this measure as the stability index. For example, if an individual shows the same type of direction of change on all four conflict problems, the stability index is 100%. If the same direction of change is observed on only two trials, the stability index is 50%, etc. Table 4 presents an overview of the findings. Note that due to our methodological restrictions (discarding of no-response trials under deadline, and of load trials on which memorization was not successful), only three responses were available for a small number of participants. In these cases the stability index was calculated over the available items.

The table shows the percentage of participants who displayed the same direction of change type on 100%, 75%, 66%, 50%, or <33% of trials. As the table shows, for both reasoning tasks, in all four experiments, the vast majority of participants displayed the exact same type of change on at least two out of three or three out of four conflict problems. This pattern held across experiments. The average stability index was 84.9% (SD = 20.5) for the base-rate task and 73.9% (SD = 21.2) for the syllogistic reasoning task. This indicates that the type of change is highly stable at the individual level: if people show a specific direction of change pattern on one problem, they tend to show it on all problems. Note that the stability index analysis also argues further against the guessing account. If people were guessing randomly, they should not tend to pick the same response consistently.
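The stability index itself is straightforward to compute: it is the share of a participant's available conflict trials that fall in his or her modal direction of change category. A minimal R sketch (with made-up example input):

# Stability index in % for one participant's direction of change labels
stability_index <- function(categories) {
  100 * max(table(categories)) / length(categories)
}
stability_index(c("11", "11", "11", "00"))   # 75
stability_index(c("00", "00", "00"))         # 100 (only three trials available)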

Table 4. Frequency of stability index values across participants. The raw number of participants in each category is presented in brackets.

                                               Stability
Task        Experiment                         <33%        50%           66%           75%           100%
Base rate   Experiment 1: Instructions only    2% (2)      13% (14)      0             21% (21)      63% (63)
            Experiment 2: Response deadline    2.97% (3)   15.84% (16)   8.91% (9)     14.85% (15)   57.43% (58)
            Experiment 3: Load                 4.21% (4)   10.52% (10)   5.26% (5)     22.11% (21)   57.89% (55)
            Experiment 4: Deadline + Load      3.51% (4)   12.28% (14)   10.52% (12)   13.16% (15)   60.53% (69)
            Average                            3.17%       12.91%        8.23%         17.78%        59.71%
Syllogism   Experiment 1: Instructions only    0           32.76% (33)   0             23.76% (24)   43.56% (44)
            Experiment 2: Response deadline    1.01% (1)   27.27% (27)   10.1% (10)    26.26% (26)   35.35% (35)
            Experiment 3: Load                 2.13% (2)   39.36% (37)   15.96% (15)   14.89% (14)   27.66% (26)
            Experiment 4: Deadline + Load      0.88% (1)   36.84% (42)   16.67% (19)   14.04% (16)   31.57% (36)
            Average                            1.34%       34.06%        14.24%        19.74%        34.54%


Confidence ratings and response time analysis

Examining response latencies and confidence for the four different types of direction of change categories can help to gain some further insight into the reasons behind people's answer change (or lack thereof). We focus our analysis here on the critical conflict items (the contrast with no-conflict items can be found in the next section). Results are presented in Figures 3 and 4. These figures present a rich data set: we are looking at results from two different reasoning tasks (base rate; syllogisms), four experiments (Experiment 1-4), two response stages (initial and final response), and two dependent measures (confidence and latencies). However, as with the direction of change analysis above, the findings are by and large fairly consistent across experiments and tasks. For ease of interpretation, Figure 5 presents the findings averaged across experiments and tasks. This will allow us to identify and discuss the main trends first. Subsequently, we will present more detailed statistical tests of the findings.

As Figure 5 (top panels) indicates, the key pattern with respect to the confidence ratings is that the 00 and 11 direction of change categories show very high confidence both for the initial and the final response. Confidence in the 01 and 10 cases, in which participants changed their initial response, is considerably lower at both response stages. Hence, cases in which participants change their initial response tend to be characterized by lower response confidence. The latency findings for the final response (Figure 5, bottom panel D) further indicate that the lower confidence in the 01 and 10 cases is accompanied by elevated final response times. Participants take more time to give their final answer in the 01 and 10 cases than in the 00 and 11 ones. In other words, the few cases in which people do change their initial response are characterized by longer rethinking in the final response stage. Note that this pattern is consistent with Thompson et al.'s (2011) original two-response findings and supports the DI dual process prediction that answer change results from time-consuming Type 2 thinking.

With respect to the initial response times, Figure 5 (bottom panel C) indicates that all initial responses are given fairly fast. The vertical line in Figure 5 (and in Figure 3) denotes the 3 s response deadline that was set based on our reading pretest. Average response times for each of the four direction of change categories are all below this threshold. Obviously, as Figure 3 (left-hand panels) indicates, the initial response times do show more variability across experiments. Not surprisingly, in the two experiments in which a deadline was set (and responses above 3 s were discarded), average initial response times are smaller than in the experiments without a deadline. However, even without a deadline, participants generally tended to give their initial response within reasonable limits.

The only apparent deviation is the 10 case, in which the initial response seems to take slightly longer than in the other direction of change categories.

To analyse the results statistically we used the nlme statistical package in R to create mixed-effect multilevel models (Pinheiro, Bates, DebRoy, & Sarkar, 2015). This allows us to analyse the data on a trial-by-trial basis, while accounting for the random effect of subjects (Baayen, Davidson, & Bates, 2008). Mixed-effect models have increased statistical power due to the inclusion of random effects, and the ability to handle data that violate the assumption of homoscedasticity (Baayen et al., 2008). The direction of change category (11, 00, 10, and 01) and the experimental condition (Experiment 1-4) were entered into the model as fixed-effect factors, and participants were entered as a random factor. We ran separate analyses for each of the two reasoning tasks. In the few cases in which we found a significant interaction between direction of change and experimental condition, we also analysed each experiment separately. For all response time analyses reported in the present paper, the response times were transformed logarithmically prior to analysis. Note that given the positively skewed nature of the logarithmically transformed reaction time data, we further excluded trials whose log-transformed value was over 1.5 (these trials exceeded one and a half times the interquartile range, amounting to a deviation of more than 3.5 SD from the mean) to obtain a more normal-like distribution. This implied that 3.3% of trials were excluded from the reaction time analysis (initial and final response combined). In the confidence analyses, trials on which participants entered a confidence rating higher than 100 were also excluded. This amounted to 2.7% of trials (initial and final response combined).
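To illustrate the model-comparison logic behind the chi-square statistics reported below, a minimal nlme sketch is shown; the data frame and variable names are illustrative assumptions rather than our actual analysis script.

library(nlme)

# Illustrative model comparison for the latency analyses. `d` holds one row
# per trial with made-up column names: log_rt, direction (11/00/10/01),
# experiment (1-4), and subject. method = "ML" allows likelihood-ratio tests.
m0 <- lme(log_rt ~ 1, random = ~ 1 | subject, data = d, method = "ML")
m1 <- update(m0, . ~ . + direction)              # add direction of change
m2 <- update(m1, . ~ . + experiment)             # add experiment
m3 <- update(m2, . ~ . + direction:experiment)   # add the interaction
anova(m0, m1, m2, m3)  # chi-square improvement of fit at each step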

Response times – initial response

Results of the initial response time analysis in the base-rate task showed that the main effect of direction of change category significantly improved model fit, χ2 (7) = 36.25, p < 0.0001, as did the main effect of experiment, χ2 (10) = 116.23, p < 0.0001. The interaction factor did not improve model fit significantly, χ2 (19) = 5.03, p = 0.83. Similarly, for the syllogistic reasoning task the main effect of direction of change significantly improved model fit, χ2 (7) = 31.04, p < 0.0001, as did the main effect of experiment, χ2 (10) = 161.13, p < 0.0001. The interaction factor did not improve model fit further, χ2 (19) = 8, p = 0.53.


With respect to the main effect of experiment, we ran a follow-up contrast test to verify whether the visually identified trend towards longer initial response latencies in the experiments without a response deadline was significant. The contrast test indicated that this was indeed the case for both reasoning tasks (base rate: b = 0.21, t (408) = 11.266, p < 0.0001, r = 0.49; syllogisms: b = 0.23, t (406) = 13.608, p < 0.0001, r = 0.56). The main effect of direction of change, on the other hand, seemed to be driven by overall longer initial response latencies in the 10 case. Follow-up contrast analyses established that this specific trend was significant for the syllogisms, b = -0.16, t (39) = -4.488, p = 0.0001, r = 0.58, but not for the base-rate problems, b = -0.06, t (41) = -1.966, p = 0.056, r = 0.29. It is important to stress here that there was no interaction between the direction of change and experiment factors; hence, the increased initial 10 latencies are observed across experiments. If the longer initial 10 responses had been observed only in the instructions-only experiment, for example, they could have been attributed to Type 2 thinking: participants would not respect the instruction to respond intuitively, take extra time to deliberate, and consequently manage to give the correct response. However, the fact that the longer initial 10 response times are also observed under time pressure and load argues against this explanation. This tentatively suggests that the Type 1 processing in the 10 case (i.e., selection of a correct initial response that is afterwards changed) might be genuinely slower than the Type 1 processing in the other direction of change categories.

Response times – final response

For the base-rate problems, results of the final response time analysis showed that there was a significant main effect of direction of change, χ2 (7) = 26.37, p < 0.0001, and experimental condition, χ2 (10) = 31.45, p < 0.0001. The interaction between both factors did not improve model fit, χ2 (19) = 8.71, p = 0.46. The same pattern was observed for the syllogistic reasoning task; there was a main effect of direction of change, χ2 (7) = 80.12, p < 0.0001, and experimental condition, χ2 (10) = 14.8, p = 0.002, and the interaction did not improve model fit, χ2 (19) = 11.88, p = 0.22.

Follow-up tests for the main effect of direction of change indicated that, as the visual inspection suggested, final response times were longer in the cases where participants changed their initial response (10 and 01) than in the cases (11 and 00) where the initial response was not changed (base-rate task, b = -0.1, t (115) = -4.91, p < 0.0001, r = 0.42; syllogistic reasoning, b = -0.24, t (98) = -8.059, p < 0.0001, r = 0.63). For completeness, we also note that an exploratory follow-up test for the main effect of experiment indicated that, somewhat surprisingly, the final response times in the two conditions that set an initial deadline (Experiment 2: deadline & Experiment 4: deadline + load) also tended to be slightly faster than in the conditions without an initial deadline (base-rate task, b = 0.11, t (408) = 4.659, p < 0.0001, r = 0.22; syllogistic reasoning, b = 0.08, t (405) = 2.953, p = 0.0033, r = 0.15). Hence, although there was never a deadline for the second response, the fact that participants had previously been faced with one tended to make them speed up for the final response too.

[Figure panels: A) Base rate: initial response; B) Base rate: final response; C) Syllogism: initial response; D) Syllogism: final response]
Figure 3. Mean conflict-problem response latencies for the initial and final responses in the base-rate and syllogistic reasoning tasks for each of the direction of change categories. Error bars are 95% confidence intervals. Note that averages and confidence intervals were calculated on log-transformed latencies. The figure shows the back-transformed (anti-logged) latencies.


Confidence – initial response

The confidence analysis for the initial response in the syllogistic reasoning task showed that the direction of change factor improved model fit significantly, χ2 (7) = 79.88, p < 0.0001, but the experiment factor, χ2 (10) = 6.69, p = 0.08, and the interaction, χ2 (19) = 16.33, p = 0.06, did not. Follow-up contrast tests for the main effect of direction of change indicated that the visually identified trend towards lower initial confidence ratings in the two change categories (10 and 01) was significant, b = -15.42, t (110) = -7.449, p < 0.0001, r = 0.58. For the base-rate problems we found a main effect of direction of change category, χ2 (7) = 41.7, p < 0.0001, as well as a main effect of experimental condition, χ2 (10) = 9.47, p = 0.024, and a significant interaction, χ2 (19) = 17.39, p = 0.043. With respect to the main effect of direction of change, follow-up tests established that, as in the syllogistic reasoning task, initial confidence in the two categories in which the initial response was changed was lower than in the no-change categories, b = -14.96, t (116) = -7.14, p < 0.0001, r = 0.55. Because the experiment and direction of change factors interacted, we also tested whether this effect was present in all experiments. Results showed that this was the case for Experiment 1 (instructions only), b = -10.94, t (23) = -2.787, p = 0.01, r = 0.5, Experiment 2 (response deadline), b = -13.14, t (32) = -3.158, p = 0.004, r = 0.49, and Experiment 4 (deadline and load), b = -26.79, t (30) = -6.224, p < 0.0001, r = 0.75. However, the trend towards lower initial response confidence in the change vs. no-change categories was less pronounced and failed to reach significance in Experiment 3 (load), b = -5, t (28) = -1.427, p = 0.16, r = 0.26.

Confidence – final response

The confidence analysis for the final response in the syllogistic reasoning task showed that the direction of change effect was significant, χ2 (7) = 100.58, p < 0.0001, but neither the main effect of experiment, χ2 (10) = 5.4, p = 0.145, nor the interaction, χ2 (19) = 15.95, p = 0.068, improved model fit significantly. A follow-up contrast test for the main effect of direction of change indicated that the visually identified trend towards lower final confidence ratings in the two change categories (10 and 01) was significant, b = -20.26, t (108) = -9.523, p < 0.0001, r = 0.68. In the base-rate task, both the main effect of direction of change, χ2 (7) = 49.79, p < 0.0001, and the main effect of experiment, χ2 (10) = 10.84, p = 0.013, were significant. The interaction did not improve model fit, χ2 (19) = 7.28, p = 0.608.


With respect to the main effect of direction of change, the follow-up test established that, as in the syllogistic reasoning task and the initial confidence analysis, final confidence in the 10 and 01 change categories was lower than final confidence in the 00 and 11 categories, b = -15.34, t (117) = -7.522, p < 0.0001, r = 0.57.

Taken together, these analyses support the major trends that were visually identified: both in the syllogistic reasoning and the base-rate task we consistently observed across our experiments that answer change is associated with lowered response confidence and longer final rethinking times.

[Figure panels: A) Base rate: initial response; B) Base rate: final response; C) Syllogism: initial response; D) Syllogism: final response]
Figure 4. Mean conflict problem confidence ratings for initial and final responses in the base-rate and syllogistic reasoning tasks for each of the direction of change categories. Error bars are 95% confidence intervals.


[Figure panels: A) Confidence ratings: initial response; B) Confidence ratings: final response; C) Reaction time: initial response; D) Reaction time: final response]
Figure 5. Mean initial and final conflict problem response latencies and confidence ratings averaged across reasoning tasks and experiments. Error bars are 95% confidence intervals. The figure shows the back-transformed (anti-logged) latencies.

Conflict detection analysis

Our key observation so far has been that in the cases where people end up giving a correct final response, they already selected this response as their initial, intuitive response. This suggests that correct logical responses can be generated by fast and automatic Type 1 processes. In this final section, we want to examine whether reasoners are faced with two competing intuitions at the first response stage.

That is, one reason why people in the 11 category manage to give a correct initial response might be that the problem simply does not generate an intuitive heuristic response for them. Hence, they would only generate a correct, logical intuition and would not be faced with an interfering heuristic one. Likewise, one might question whether Type 1 processes for reasoners in the 00 direction of change category also generate a logical intuition in addition to the heuristic intuition that led them to select the incorrect response. In other words, so far our findings indicate that there are conflict trials on which some people generate a correct, logical intuition and there are conflict trials on which some people generate an incorrect, heuristic intuition. What we want to test here is whether both intuitions are also generated concurrently within the same trial.

We can address this question by looking at the contrast between conflict and no-conflict problems. If conflict problems cue two conflicting initial intuitive responses, people should process these problems differently than the no-conflict problems (in which such conflict is absent) in the initial response stage. As we noted in the introduction, conflict detection studies that used a classic single-response paradigm have shown that processing conflict problems typically results in lower confidence and longer response latencies, for example. Interestingly, Thompson et al. (2011) found that the lowered confidence ratings for conflict vs. no-conflict problems were also observed for the initial response. Note that Thompson et al. did not find an impact on response latencies, but this might be accounted for by the design characteristics of the two-response paradigm (i.e., forcing people to give an explicit response as fast as possible might prevent the slowing effect from showing up). Nevertheless, Thompson et al.'s confidence findings suggest that, averaged over possible change types, there is some evidence for the hypothesis that reasoners are faced with conflicting intuitions when giving their initial responses. The critical question that we want to answer here is whether this is the case for each of the four direction of change categories.

Therefore, we contrasted the confidence ratings for the first response on the conflict problems for each direction of change category with the first-response confidence on the no-conflict problems. Note that we used only the dominant no-conflict 11 category for this contrast (which we will refer to as the "baseline"), as responses in the other no-conflict direction of change categories cannot be interpreted unequivocally. Figure 6 shows the results. Visual inspection of Figure 6 indicates that there is a general trend across tasks and experiments towards decreased initial confidence when solving conflict problems in all direction of change categories. However, this effect is much larger for the 01 and 10 cases in which reasoners subsequently changed their initial response. This suggests that although reasoners might be experiencing some conflict between competing intuitions in all cases, this conflict is much more pronounced in the 01 and 10 cases.

For completeness, Figure 7 also presents an overview of the conflict vs. no-conflict contrast for the response time findings. As the figure indicates, the data were noisier here and there is no clearly consistent pattern across tasks and experiments. To avoid spurious conclusions, we refrained from analysing these response time data further (Simmons, Nelson, & Simonsohn, 2011).
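For illustration, the difference scores plotted in Figures 6 and 7 can be computed along the following lines. This is a schematic R sketch with made-up column names (confidence1, conflict, direction, experiment), not the original analysis code.

# Mean initial confidence per direction of change category on conflict trials,
# expressed relative to the 11 no-conflict baseline of the same experiment.
base <- aggregate(confidence1 ~ experiment,
                  data = subset(d, conflict == "no-conflict" & direction == "11"),
                  FUN = mean)
names(base)[2] <- "baseline_conf"
conf <- aggregate(confidence1 ~ experiment + direction,
                  data = subset(d, conflict == "conflict"), FUN = mean)
scores <- merge(conf, base, by = "experiment")
scores$diff <- scores$baseline_conf - scores$confidence1   # positive = less confident than baseline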

[Bar graphs: confidence difference score (%) by direction of change, shown separately for Experiments 1-4; panel A: base rate task, panel B: syllogism]
Figure 6. Confidence rating differences between each direction of change category and the baseline (11 no-conflict) for the initial responses in the base rate (A) and syllogistic reasoning task (B). Positive values mean that people were less confident in a given direction of change category than in the baseline. Error bars are standard errors of the corresponding t-values plotted here for illustrative purposes.


[Bar graphs: RT difference score (s) by direction of change, shown separately for Experiments 1-4; panel A: base rate task, panel B: syllogistic reasoning]
Figure 7. Response latency differences between each direction of change category and the baseline (11 no-conflict) for the initial response in the base rate (A) and syllogistic reasoning (B) task. Negative values mean that responses took participants longer in a given direction of change category than in the baseline. Error bars are standard errors of the corresponding t-values plotted here for illustrative purposes. The figure shows the back-transformed (anti-logged) latencies.

To analyse the confidence results statistically we again created mixed-effect multilevel models (Pinheiro et al., 2015). We ran a separate analysis for each of the four direction of change conflict problem categories in each of the two reasoning tasks. In each analysis, the confidence for the first response in the direction of change category in question was contrasted with the first-response confidence for 11 no-conflict problems, which served as our baseline.

We will refer to this contrast as the Conflict factor. The Conflict factor was entered as a fixed factor, and participants were entered as a random factor. We also entered the experimental condition (Experiment 1-4) as a fixed factor in the model to see whether it interacted with the Conflict factor and whether the findings were stable across our experiments. As before, in the cases in which we found a significant interaction we also analysed each experiment separately.

11 category. Results for the 11 category indicated that for the base-rate problems, the main effect of conflict was significant, χ2 (5) = 41.94, p < 0.0001. There was no significant effect of experiment, χ2 (8) = 7.24, p = 0.065, nor a significant interaction, χ2 (11) = 1.97, p = 0.58. Hence, people were less confident in the 11 category, b = -6.567, t (179) = -6.789, p < 0.0001, r = 0.45, than in the baseline condition. Similar results were found for the syllogistic reasoning problems, where we observed a significant main effect of conflict, χ2 (5) = 32.43, p < 0.0001, whereas the main effect of condition, χ2 (8) = 3.96, p = 0.2663, and the interaction, χ2 (11) = 7.34, p = 0.062, were not significant. People were less confident in the 11 category, b = -4.804, t (285) = -5.825, p < 0.0001, r = 0.33, than in the baseline condition.

00 category. In the base-rate task, the effect of conflict was significant, χ2 (5) = 31.17, p < 0.0001, as was the effect of experiment, χ2 (8) = 11.74, p = 0.008, but not the interaction, χ2 (11) = 6.12, p = 0.11. People were less confident in the 00 responses, b = -5.598, t (279) = -5.657, p < 0.0001, r = 0.32, than in the baseline. For the syllogistic reasoning problems the effect of conflict did not reach significance, χ2 (5) = 3.35, p = 0.067, although there was a trend in the expected direction. The effect of condition, χ2 (8) = 2.53, p = 0.47, and the interaction, χ2 (11) = 2.67, p = 0.45, were not significant.

10 category. For the base-rate problems we found a significant main effect of conflict, χ2 (5) = 42.85, p < 0.0001. The experiment factor, χ2 (8) = 7.72, p = 0.052, and the interaction, χ2 (11) = 1.48, p = 0.686, did not reach significance. The 10 answers yielded a lower confidence level, b = -20.58, t (37) = -6.667, p < 0.0001, r = 0.74, than the baseline condition. Similar results were found for the syllogistic reasoning problems; there was a significant effect of conflict, χ2 (5) = 72.81, p < 0.0001, but no significant effect of condition, χ2 (8) = 2.3, p = 0.51, or interaction, χ2 (11) = 5.72, p = 0.13. The initial confidence in the 10 category was lower than in the baseline condition, b = -24.54, t (44) = -8.834, p < 0.0001, r = 0.8.

01 category. In the base-rate task, the effect of conflict was significant, χ2 (5) = 93.62, p < 0.0001, as were the effect of condition, χ2 (8) = 10.65, p = 0.01, and the interaction, χ2 (11) = 13.26, p = 0.0041.


For the syllogistic reasoning problems, we found that conflict, χ2 (5) = 46.82, p < 0.0001, and the interaction, χ2 (11) = 7.84, p = 0.049, improved model fit significantly, but the main effect of experimental condition, χ2 (8) = 4.36, p = 0.23, did not. Because of the interactions we also ran the conflict contrast for each of the experiments separately. As Figure 6 indicates, although the confidence decrease was especially pronounced in Experiment 4 (deadline + load) for the base-rate task, our analyses indicated that it reached significance in each of the four experiments (in every experiment r > 0.57 and p < 0.01). A similar pattern was observed in the syllogistic reasoning task. Although the conflict factor failed to reach significance in the instructions-only condition, b = -3.8, t (18) = -1.364, p = 0.19, r = 0.31, the effect was present in each of the other experiments (all r > 0.61 and p < 0.01).

Taken together, the conflict detection analysis of the confidence ratings for the first, intuitive answer indicates that by and large participants showed decreased response confidence (relative to the no-conflict baseline) after having given their first, intuitive response to the conflict problems in all direction of change categories. This supports the hypothesis that participants were always faced with two conflicting responses when solving the conflict problems. In other words, the results imply that 11 responders also activate a heuristic intuition in addition to the logical response they select. Likewise, 00 responders also activate a logical intuition despite their selection of the incorrect, heuristic response. But visual inspection also clearly shows that the decreased confidence effect is much larger for the 10 and 01 cases than for the 11 and 00 ones. A contrast analysis⁶ that tested this trend directly indicated that it was indeed significant, both for the base-rate problems, Z = -16.57, p < 0.0001 (r = 0.33 for the no-change group, r = 0.71 for the change group), and the syllogistic reasoning problems, Z = -17.49, p < 0.0001 (r = 0.23 for the no-change and r = 0.69 for the change group). This indicates that although reasoners might be generating two intuitive responses and are affected by the conflict between them in all cases, this conflict is much more pronounced in the cases where people subsequently change their answer. This tentatively suggests that it is this more pronounced conflict experience that makes them change their answer. As we will explain in the General Discussion, we believe that this more pronounced conflict in the change categories points to relative differences in the strength of the logical and heuristic intuitions in the different answer categories.

6 For this contrast analysis, we first calculated the r effect sizes in the same way as in the previous sections. As a next step we used the Fisher r-to-z transformation to assess the statistical difference between the two independent r-values. We used the following calculator for the transformation and p-value calculation: http://vassarstats.net/rdiff.html
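For transparency, the same computation can be done directly in R; the sketch below assumes n1 and n2 observations behind the two r-values (the specific numbers in the call are made up for illustration).

# Fisher r-to-z comparison of two independent correlations
compare_r <- function(r1, n1, r2, n2) {
  z <- (atanh(r1) - atanh(r2)) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
  c(z = z, p = 2 * pnorm(-abs(z)))   # two-sided p-value
}
compare_r(r1 = 0.33, n1 = 400, r2 = 0.71, n2 = 100)   # illustrative inputs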


GENERAL DISCUSSION

In this study we aimed to examine the time course assumption of classic default-interventionist dual process theory (DI). DI theory suggests that reasoners typically produce a fast, intuitive (Type 1) response by default, and that this response might subsequently be overridden by further, more deliberative (Type 2) processes. The quick, initial Type 1 answer is believed to be driven by heuristics based on stereotypes or common beliefs; thus, in "conflict" situations (where the stereotype or common belief cues a response that differs from the logical answer) this initial intuition produces an erroneous answer. Type 2 thinking is expected to override this response, but sometimes Type 2 processing fails too, which results in biased reasoning. Hence, according to DI theory, correct responding on conflict items must in principle originate from slow, deliberative Type 2 processing. In the present study we used the two-response paradigm to test this assumption. In this paradigm people are instructed to give a very quick initial response and are afterwards allotted as much time as they want to indicate their final response. We also used different research designs to minimize the possibility that participants engaged in Type 2 processing when giving their initial response, so as to make sure that the initial answer was really intuitive in nature.

Our analyses focused on four possible direction of change categories: initial response correct but final response incorrect (10), initial response incorrect and final response correct (01), both responses correct (11), and both incorrect (00). DI theory predicts that reasoners either give a 00 response, when they cannot override their first, erroneous heuristic answer, or a 01 response, when Type 2 processing overrides and corrects the initial incorrect response. In line with this hypothesis, we observed a high prevalence of 00 responses (about 50% across our studies and reasoning tasks), which means that people were typically biased and failed to answer the problems correctly. Less frequently, in about 10% of the cases, we also observed responses in the 01 category. This suggests that correction of an initial erroneous response by Type 2 processing is rare, which is also in line with DI theory. However, contrary to DI predictions, our key finding was a relatively high prevalence (over 30% throughout) of 11 answers, which suggests that people were giving the correct response intuitively.

Confidence and latency analyses indicated that both in the initial and the final response stage, answers in the 11 and 00 categories were given quickly and with high confidence. For the rare 01 and 10 responses in which reasoners changed their initial answer, we observed, in line with previous observations (e.g., Thompson & Johnson, 2014; Thompson et al., 2011), lower confidence ratings and longer final response times.

As a final step, we examined whether people were facing two competing intuitions at the initial response stage. To this end, we contrasted the initial confidence levels and latencies of conflict and no-conflict problems. Initial response latencies did not differ, but the confidence ratings did indicate that participants were generally experiencing some conflict in every direction of change category: reasoners were less confident in the correctness of their answer on conflict than on no-conflict trials. This suggests that people typically generated both a logical and a heuristic intuition when faced with the conflict problems; if reasoners generated only one type of intuition, there would be no intrinsic conflict to affect their processing. However, the size of the experienced conflict (i.e., the difference between the confidence levels on conflict and no-conflict problems) was quite different across categories. In the categories where people changed their answer (10 and 01), people experienced more conflict than in the 11 and 00 cases where they did not change their initial answer. Hence, although reasoners in all direction of change categories might be experiencing conflict between competing intuitions, this conflict seems much more pronounced in the cases in which an initial answer is changed. In line with some recent suggestions, this might indicate that one factor that determines whether or not a first intuitive answer is changed is the level of experienced conflict (e.g., Thompson & Johnson, 2014).

With few exceptions our findings were consistent across the two reasoning tasks and the four different studies that we ran. This stability indicates that the findings are quite robust and minimizes the possibility that they result from some idiosyncratic task or experimental feature. Nevertheless, the present study is only a first application of the current direction of change analysis and experimental controls. It will obviously be important to generalize and validate the findings further in future studies. With this consideration in mind, the findings do indicate that there is substantial ground to question the traditional DI time-course characterization.

But it is important to avoid confusion with respect to our precise claim. Our core finding is that people are able to give the logical answer to conflict problems intuitively. It is this phenomenon that we refer to as fast or Type 1 logic, to contrast it with slow and demanding logical reasoning based on Type 2 processing. However, it should be stressed that although the outcome of the two types of logical responding might be the same (i.e., selection of the correct response), this obviously does not entail that the underlying processing is also similar. That is, we do not claim that people go through the exact same complex calculations as they would in the Type 2 case and simply perform them faster in the Type 1 case.

Clearly, the point is that both types of logical responding are based on different types of processes (i.e., Type 1 and Type 2) and will consequently have different characteristics. For example, we believe it is quite unlikely that people will manage to justify their initial logical response and explain why it is correct without engaging in additional Type 2 processing (e.g., De Neys & Glumicic, 2008; Mercier & Sperber, 2011; Trouche, Sander, & Mercier, 2014). To further clarify this point, let us draw an analogy between our account and the recall/recognition distinction in memory research (e.g., Haist, Shimamura, & Squire, 1992). Imagine you are given a list of ten names to study and your memorization performance is being tested. Recall memory will allow you to explicitly retrieve (some of) the names on the list (e.g., you might manage to jot down that "Dave", "Tom", and "Lauren" were among the presented names). Recognition memory will allow you to merely decide whether or not a certain item was on the list (e.g., you might manage to say "yes" when asked whether "Dave" was among the presented names). Sometimes you might not be able to recall a name but could still recognize whether you saw it on the list. Recall and recognition can both allow us to retrieve a memory trace, but they differ (among other things) in the processes involved in memory retrieval (e.g., Anderson & Bower, 1972; Ben-Yakov, Dudai, & Mayford, 2015; Buratti & Allwood, 2012). This recall/recognition dichotomy is akin to what we are alluding to here. In our view, fast Type 1 logical responding can be conceived as a more superficial, recognition-memory-like process that activates a stored logical principle and allows us to recognize that a competing heuristic intuition is questionable, without us being able to explicitly label or justify the principle. Although the present study does not allow us to pinpoint how the Type 1 and Type 2 logical responses differ precisely, the key point we want to make is that our theorizing does not entail that fast, Type 1 logical responses are similar – let alone superior – to Type 2 logical responses.

Note that the differential nature of Type 1 and Type 2 logical responding that we are alluding to here might receive some support from the recent work of Trippas et al. (2016). Inspired by initial studies of Morsanyi and Handley (2012a, 2012b), Trippas et al. presented participants with logical arguments and simply asked them to indicate how much they liked the conclusion or how bright (i.e., "luminous") they judged the conclusion to be. Although these tasks made no reference whatsoever to logical reasoning, the authors observed that people gave higher liking and brightness ratings to logically valid than to logically invalid conclusions. As the authors noted, these findings (i.e., sensitivity to logical structure outside of an explicit reasoning context) lend credence to the idea that logical structure might be processed automatically and intuitively. But more critically, Trippas et al. explained (and predicted) the results within a fluency misattribution framework (e.g., Topolinski & Strack, 2009).

The rationale is that if logical structure is processed intuitively, valid conclusions will give rise to feelings of conceptual fluency. However, because of the implicit nature of the process, people will have no explicit insight into the nature of this feeling. As Trippas et al. argue, the enhanced processing fluency of logically valid conclusions will consequently be (mis)attributed to positive affect and will lead to the judgment that the conclusion is brighter and more likeable (see also Thompson & Morsanyi, 2012, for a related suggestion). This fluency account might help to characterize the precise origin of Type 1 logical sensitivity and underscores the point that fast (Type 1) and slow (Type 2) logical responses likely result from qualitatively different types of processing.

Before discussing the more theoretical implications of our findings, we would like to highlight some important methodological and practical considerations. One issue concerns the validity of the two-response paradigm. As we noted, a potential critique of previous studies that adopted a two-response paradigm is that we cannot be certain that participants respected the instructions to give the first intuitive response and did not engage in Type 2 processing during the initial response stage. In the present study we used a combination of methods (e.g., response deadline, cognitive load) to minimize the amount of Type 2 thinking at the initial answer stage. By and large, we found that none of these experimental manipulations critically affected the results. This implies that participants in the standard, instructions-only paradigm are in fact doing what they are instructed to do and refrain from engaging in Type 2 processing during the initial response stage. In this sense, the present study provides a methodological validation of the standard two-response paradigm that relies on instructions only. When adopting a two-response paradigm, it is reasonable for scholars to assume that participants will stick to purely intuitive responding in the initial response phase (see also Thompson et al., 2011, for a further discussion of the validity of the two-response paradigm).

Another methodological point is that from the outset our task design was optimized to identify potential Type 1 logical responding in case it existed. For example, we used a fast-response version of the base-rate task (Pennycook, Cheyne, et al., 2014) that did not require reading through a lengthy description and minimized response time variance. In addition, both of our reasoning tasks used a simplified binary-response format in which participants selected one of two presented response options by clicking on them. Consequently, participants did not have to generate the conclusion themselves when giving their initial response. One might wonder whether these adaptations or simplifications invalidate the results. For example, a critic might object that with a harder task version in which reasoners had to generate their own conclusions, there would be no evidence for fast logical responding.

There are a number of points to make here. First, although our tasks were modified, the dominant response category was still of the 00 type. In the majority of cases, participants failed to solve the problems correctly even after they were allowed additional processing time in the final response stage. If our tasks had been too easy, our sample of educated adults should obviously not have struggled to solve them correctly. Second, if we were to ask participants to type or verbalize a conclusion, the typing or verbalization itself might require controlled processing and prevent a proper measurement of pure, intuitive processing. By definition, if we want to study intuitive processing, we need to use the proper methodological tools to measure it. Third, it is neither necessary nor claimed that Type 1 and Type 2 logical responding have the same characteristics. We concur, for example, that explicit conclusion generation in syllogistic reasoning quite likely cannot be achieved by Type 1 processes. But the point that there are differences between a fast/intuitive and a slower/deliberative type of logical responding should not be held against the postulation of the existence of logical intuitions per se. This would be as nonsensical as arguing against the postulation of recognition memory because a recognized memory item cannot be explicitly recalled.

From a more theoretical point of view, it will be clear that it is hard for standard DI theory to explain the current findings. But if the DI model is not adequate, are there alternative conceptualisations that allow us to make theoretical sense of the results? One might be tempted here to consider so-called parallel dual process models (e.g., Epstein, 1994; Sloman, 1996). These parallel models are a classic alternative to the popular DI model. The DI model posits that Type 1 and Type 2 processes interact in a serial fashion: reasoners initially start with Type 1 processing, and Type 2 processing is only engaged at a later stage. In the parallel model, both Type 1 and Type 2 processing are engaged simultaneously from the start. Hence, Type 1 and Type 2 processing operate in parallel rather than serially. One might wonder whether such parallel Type 2 processing might explain correct immediate responses. Note, however, that, like all dual process models, the parallel model still defines Type 2 processing as slow and demanding of cognitive resources (Epstein, 1994; Sloman, 1996). Now, our key observation was not that people generated correct responses, but that they did so intuitively. Correct initial responding was observed even when Type 2 processing was knocked out by a challenging deadline and a concurrent load task. Hence, even if the parallel model's assumption of simultaneous activation of Type 1 and Type 2 processing were correct, it cannot explain the occurrence of fast and intuitive logical responding in the present study.

Moreover, ideally we do not only need to account for the occurrence of initial correct responses; we also need to explain the direction of change results. That is, how can we explain why one reasoner ends up in the 00 category and another in the 11 or 01 category, for example?

We believe that a more promising explanation is offered by recent hybrid dual process models of reasoning (De Neys, 2012; Handley & Trippas, 2015; Pennycook et al., 2015; see also Macchi & Bagassi, 2015, for a related view). Hybrid models assume that more than one Type 1 answer can be generated as a result of parallel intuitive processes, which might be followed by the more demanding Type 2 processing. Bluntly put, the hybrid model combines key features of the serial and parallel models: like the serial model, it assumes that Type 2 processing is optional and starts later than Type 1 processing; and like the parallel model, it assumes that there is parallel logical and heuristic processing. However, unlike the parallel model, it claims that this logical processing results from Type 1 processing. For example, the hybrid "logical intuition" model of De Neys (2012) suggests that people intuitively detect the conflict between heuristic responses and logical principles. The basic idea is that conflict is caused by two simultaneously activated Type 1 responses: one cueing the logical response based on stored knowledge of elementary logical principles, the other cueing the heuristic response based on belief-based semantic associations. Critically, De Neys (2012, 2014) indicated that this does not entail that the two Type 1 responses are similar in strength (see also Pennycook et al., 2015; Pennycook, Trippas, et al., 2014). More specifically, the idea is that most people are typically biased when solving traditional reasoning problems precisely because their heuristic intuition is more salient or stronger (i.e., has a higher activation level) than their logical intuition. Building on this differential strength suggestion can help us make sense of the direction of change findings and explain why one ends up in a specific change category.

Note that what we refer to here as the differential strength of different intuitions is also a key feature of Pennycook et al.'s (2015) three-stage dual process model. This model proposes that initially multiple Type 1 responses will be cued by a stimulus (Stage 1), leading to the potential for conflict detection between different Type 1 responses (Stage 2). If successful, conflict detection will lead to Type 2 processing (Stage 3). What is critical for our present purposes is that one central feature of the model is that the multiple, potentially competing Type 1 responses that are initially cued by a problem (e.g., a "logical" and a "heuristic" intuition) are envisaged to differ in the ease and speed with which they come to mind.

This idea nicely fits with our differential strength suggestion and allows us to make sense of the present findings. More specifically, what we propose is that we need to consider both absolute (which one of the two intuitions is strongest?) and relative (how pronounced is the activation difference between both intuitions?) strength differences between the logical and heuristic intuition. The initial response will be determined by the absolute strength level: whichever intuition is strongest will be selected as the initial response. Whether or not the initial response gets subsequently changed will be determined by the relative difference between both intuitions. The smaller the difference, the less confident one will be, and the more likely that the initial response will be changed.

Figure 8 illustrates this idea. In the figure we have plotted the strength of the logical and heuristic intuition for each of the four direction of change categories in (imaginary) activation strength “units” for illustrative purposes. For example, in the 11 case, the logical intuition might be 4 units strong whereas the heuristic intuition might be only 1 unit strong. In the 00 case, we would have the opposite situation with a 4 unit strong heuristic intuition and a much weaker, 1 unit logical intuition. In the two change categories, one of the two intuitions will also dominate the other, but the relative difference will be less pronounced. For example, in the 01 case the heuristic intuition might have strength level 3 whereas the logical intuition has strength level 2. Because the relative difference is less pronounced, there will be more doubt, and this will be associated with longer final rethinking and answer change. In other words, in each of the four direction of change categories there will be differences in which intuition is the dominant one and in how dominant that intuition is. The more dominant an intuition is, the more likely that it will be selected as the initial response, and the less likely that it will be changed. Obviously, this “activation strength” proposal will need to be further tested, but we believe it presents a coherent and parsimonious account to explain the direction of change findings and re-conceptualize the time-course assumptions in the classic DI model.

To avoid confusion, it should be stressed that the hybrid model we are advocating does not question that people rely by default on Type 1 processing and switch to Type 2 processing in a later stage. As we noted, the hybrid model still maintains the DI feature that default Type 1 processing precedes Type 2 processing. The key point is that the default Type 1 activation needs to include some elementary logical processing. If classic DI models allow for the postulation of logical intuitions as characterized here, they are of course fully coherent with the hybrid view (e.g., see De Neys, 2014). As one reviewer noted, at least at a high level of conceptualization classic DI theory might be open towards this possibility. If this is the case, the development or revision of DI theory we call for here would not be inconsistent with the spirit of classic DI theorists’ ideas.

To conclude, the present studies indicate that fast and automatic Type 1 processing can cue a correct logical response from the start of the reasoning process. This pattern of results lends credence to a model in which the relative strength of different types of intuitions determines reasoning performance.

Figure 8. Illustration of possible absolute (which one of the two intuitions is strongest?) and relative (how pronounced is the activation difference between both intuitions?) strength differences between the logical and heuristic intuition in the different direction of change categories. The figure shows the strength of the logical and heuristic intuition for each of the four direction of change categories in (imaginary) activation strength “units” for illustrative purposes.
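To make the activation strength proposal more tangible, the following toy sketch (our own illustrative Python code; the unit values and the revision rule are assumptions for illustration, not fitted parameters) implements the scheme shown in Figure 8:

# Toy sketch of the absolute/relative strength scheme: the stronger intuition
# is selected as the initial response; a small strength gap produces doubt
# and makes a final answer change more likely (deterministic here for clarity).

def predict_change_category(logical_strength, heuristic_strength, doubt_gap=2):
    initial_correct = logical_strength > heuristic_strength
    gap = abs(logical_strength - heuristic_strength)
    changed = gap < doubt_gap        # small relative difference -> answer change
    if initial_correct:
        return "10" if changed else "11"
    return "01" if changed else "00"

# The imaginary unit values used in the text:
assert predict_change_category(4, 1) == "11"   # dominant logical intuition
assert predict_change_category(1, 4) == "00"   # dominant heuristic intuition
assert predict_change_category(2, 3) == "01"   # close call: heuristic wins first, then changes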


REFERENCES

Aczel, B., Szollosi, A., & Bago, B. (2016). Lax monitoring versus logical intuition: The determinants of confidence in conjunction fallacy. Thinking & Reasoning, 22(1), 99–117. https://doi.org/10.1080/13546783.2015.1062801
Anderson, J. R., & Bower, G. H. (1972). Recognition and retrieval processes in free recall. Psychological Review, 79(2), 97–123.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Barbey, A. K., & Sloman, S. A. (2007). Base-rate respect: From ecological rationality to dual processes. Behavioral and Brain Sciences, 30(3), 241–254.
Ben-Yakov, A., Dudai, Y., & Mayford, M. R. (2015). Memory retrieval in mice and men. Cold Spring Harbor Perspectives in Biology, 7(12), a021790.
Bonner, C., & Newell, B. R. (2010). In conflict with ourselves? An investigation of heuristic and analytic processes in decision making. Memory & Cognition, 38(2), 186–196.
Buratti, S., & Allwood, C. M. (2012). The accuracy of meta-metacognitive judgments: Regulating the realism of confidence. Cognitive Processing, 13(3), 243–253.
De Neys, W. (2006a). Automatic–heuristic and executive–analytic processing during reasoning: Chronometric and dual-task considerations. The Quarterly Journal of Experimental Psychology, 59(6), 1070–1100.
De Neys, W. (2006b). Dual processing in reasoning: Two systems but one reasoner. Psychological Science, 17(5), 428–433.
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7(1), 28–38.
De Neys, W. (2014). Conflict detection, dual processes, and logical intuitions: Some clarifications. Thinking & Reasoning, 20(2), 169–187.
De Neys, W. (2015). Heuristic bias and conflict detection during thinking. Psychology of Learning and Motivation, 62, 1–32.
De Neys, W., Cromheeke, S., & Osman, M. (2011). Biased but in doubt: Conflict and decision confidence. PloS One, 6(1), e15954.
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–1299.
De Neys, W., Moyens, E., & Vansteenwegen, D. (2010). Feeling we’re biased: Autonomic arousal and reasoning conflict. Cognitive, Affective, & Behavioral Neuroscience, 10(2), 208–216.
De Neys, W., Rossi, S., & Houdé, O. (2013). Bats, balls, and substitution sensitivity: Cognitive misers are no happy fools. Psychonomic Bulletin & Review, 20(2), 269–273.
De Neys, W., & Schaeken, W. (2007). When people are more logical under cognitive load. Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie), 54(2), 128–133.


De Neys, W., Vartanian, O., & Goel, V. (2008). Smarter than we think: When our brains detect that we are biased. Psychological Science, 19(5), 483–489.
Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49(8), 709–724.
Evans, J. (2010). Thinking twice: Two minds in one brain. Oxford: Oxford University Press.
Evans, J. S. B. (2012). Spot the difference: Distinguishing between two kinds of processing. Mind & Society, 11(1), 121–131.
Evans, J. S. B., & Curtis-Holmes, J. (2005). Rapid responding increases belief bias: Evidence for the dual-process theory of reasoning. Thinking & Reasoning, 11(4), 382–389.
Evans, J. S. B., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15(2), 105–128.
Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives, 19(4), 25–42.
Gangemi, A., Bourgeois-Gironde, S., & Mancini, F. (2015). Feelings of error in reasoning—in search of a phenomenon. Thinking & Reasoning, 21(4), 383–396. https://doi.org/10.1080/13546783.2014.980755
Gigerenzer, G., Hell, W., & Blank, H. (1988). Presentation and content: The use of base rates as a continuous variable. Journal of Experimental Psychology: Human Perception and Performance, 14(3), 513–525.
Gilovich, T., Griffin, D. W., & Kahneman, D. (2002). Heuristics and biases: The psychology of intuitive judgment. Cambridge: Cambridge University Press.
Haist, F., Shimamura, A. P., & Squire, L. R. (1992). On the relationship between recall and recognition memory. Journal of Experimental Psychology: Learning, Memory, and Cognition, 18(4), 691–702.
Handley, S. J., Newstead, S. E., & Trippas, D. (2011). Logic, beliefs, and instruction: A test of the default interventionist account of belief bias. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(1), 28–43.
Handley, S. J., & Trippas, D. (2015). Dual processes and the interplay between knowledge and structure: A new parallel processing model. Psychology of Learning and Motivation, 62, 33–58.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
Klauer, K. C., & Singmann, H. (2013). Does logic feel good? Testing for intuitive detection of logicality in syllogistic reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 39(4), 1265–1273.
Kruglanski, A. W. (2013). Only one? The default interventionist perspective as a unimodel—Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 242–247.
Macchi, L., & Bagassi, M. (2015). When analytic thought is challenged by a misunderstanding. Thinking & Reasoning, 21(1), 147–164.
Mata, A., Schubert, A.-L., & Ferreira, M. B. (2014). The role of language comprehension in reasoning: How “good-enough” representations induce biases. Cognition, 133(2), 457–463.


Mercier, H., & Sperber, D. (2011). Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences, 34(2), 57–74.
Miyake, A., Friedman, N. P., Rettinger, D. A., Shah, P., & Hegarty, M. (2001). How are visuospatial working memory, executive functioning, and spatial abilities related? A latent-variable analysis. Journal of Experimental Psychology: General, 130(4), 621–640.
Morsanyi, K., & Handley, S. (2012a). Does thinking make you biased? The case of the engineers and lawyer problem. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 34, pp. 2049–2054).
Morsanyi, K., & Handley, S. J. (2012b). Logic feels so good—I like it! Evidence for intuitive detection of logicality in syllogistic reasoning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 38(3), 596–616.
Newman, I., Gibb, M., & Thompson, V. A. (2016). Rule-based reasoning is fast and belief-based reasoning can be slow: Challenging current explanations of belief bias and base-rate neglect. Manuscript submitted for publication.
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). Cognitive style and religiosity: The role of conflict detection. Memory & Cognition, 42(1), 1–10.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2012). Are we good at detecting conflict during reasoning? Cognition, 124(1), 101–106.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.
Pennycook, G., & Thompson, V. A. (2012). Reasoning with base rates is routine, relatively effortless, and context dependent. Psychonomic Bulletin & Review, 19(3), 528–534.
Pennycook, G., Trippas, D., Handley, S. J., & Thompson, V. A. (2014). Base rates: Both neglected and intuitive. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(2), 544–554.
Pinheiro, J., Bates, D., DebRoy, S., & Sarkar, D. (2015). nlme: Linear and nonlinear mixed effects models.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.
Simon, G., Lubin, A., Houdé, O., & De Neys, W. (2015). Anterior cingulate cortex and intuitive bias detection during number conservation. Cognitive Neuroscience, 6(4), 158–168. https://doi.org/10.1080/17588928.2015.1036847
Singmann, H., Klauer, K. C., & Kellen, D. (2014). Intuitive logic revisited: New data and a Bayesian mixed model meta-analysis. PloS One, 9(4), e94223.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.
Stanovich, K. (2011). Rationality and the reflective mind. Oxford: Oxford University Press.
Stanovich, K. E., & Toplak, M. E. (2012). Defining features versus incidental correlates of Type 1 and Type 2 processing. Mind & Society, 11(1), 3–13.
Stanovich, K. E., & West, R. F. (2000). Advancing the rationality debate. Behavioral and Brain Sciences, 23(5), 701–717.


Stupple, E. J., Ball, L. J., & Ellis, D. (2013). Matching bias in syllogistic reasoning: Evidence for a dual-process account from response times and confidence ratings. Thinking & Reasoning, 19(1), 54–77.
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20(2), 215–244.
Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140.
Thompson, V., & Morsanyi, K. (2012). Analytic thinking: Do you feel like it? Mind & Society, 11(1), 93–105.
Topolinski, S., & Strack, F. (2009). The architecture of intuition: Fluency and affect determine intuitive judgments of semantic and visual coherence and judgments of grammaticality in artificial grammar learning. Journal of Experimental Psychology: General, 138(1), 39–63.
Travers, E., Rolison, J. J., & Feeney, A. (2016). The time course of conflict on the Cognitive Reflection Test. Cognition, 150, 109–118.
Trippas, D., Handley, S. J., Verde, M. F., & Morsanyi, K. (2016). Logic brightens my day: Evidence for implicit sensitivity to logical validity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1448–1457. https://doi.org/10.1037/xlm0000248
Trouche, E., Sander, E., & Mercier, H. (2014). Arguments, more than confidence, explain the good performance of reasoning groups. Journal of Experimental Psychology: General, 143(5), 1958–1971.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Villejoubert, G. (2009). Are representativeness judgments automatic and rapid? The effect of time pressure on the conjunction fallacy. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 30, pp. 2980–2985).


Chapter 2: The smart System 1: Evidence for the intuitive nature of correct responding in the bat-and-ball problem

Abstract

Influential work on reasoning and decision making has popularized the idea that sound reasoning requires correction of fast, intuitive thought processes by slower and more demanding deliberation. We present seven studies that force us to question this corrective view of human thinking. We focused on the very problem that has been widely featured as the paradigmatic illustration of the corrective view, the well-known bat-and-ball problem. A two-response paradigm in which people were required to give an initial response under time pressure and cognitive load allowed us to identify the intuitively generated response that preceded the final response given after deliberation. Across our studies we observe that correct final responses are often non-corrective in nature. Many reasoners who manage to answer the bat-and-ball problem correctly after deliberation already solved it correctly when they reasoned purely intuitively in the initial response phase. This implies that sound reasoners do not necessarily need to deliberate to correct their intuitions; their intuitions are often already correct. Pace the corrective view, our findings suggest that in these cases reasoners deliberate to verify correct intuitive insights.

Based on Bago, B. & De Neys, W. (under review). The smart System 1: Evidence for the intuitive nature of correct responding in the bat-and-ball problem.

Supplementary material for this chapter can be found in Appendix B.


Introduction

"The intellect has little to do on the road to discovery. There comes a leap in , call it intuition or what you will, and the solution comes to you and you don't know why or how." - Albert Einstein (as quoted by Oesper, 1975)

"It is through logic that we prove, but through intuition that we discover." - Henri Poincaré (as quoted by Poincaré, 1914)

There are few problems in the reasoning and decision making field that have attracted as much interest as the bat-and-ball problem. In its original formulation, as first proposed by Frederick (2005), the problem states:

“A bat and a ball together cost $1.10. The bat costs $1 more than the ball. How much does the ball cost?”

Intuitively, the answer “10 cents” readily springs to mind. Indeed, this is the answer that most people tend to give (Bourgeois-Gironde & Van Der Henst, 2009; Frederick, 2005). However, although the answer seems obvious, it is also dead wrong. Clearly, if the ball costs 10 cents and the bat costs $1 more, then the bat would cost $1.10. In this case, the bat and ball together would cost $1.20 and not $1.10. After some reflection it is clear that the ball must cost 5 cents and the bat – at a dollar more – $1.05, which gives us a total of $1.10.

Many people who are presented with the bat-and-ball problem will attest that the “10 cents” answer seems to pop up in a split second whereas working out the “5 cents” solution seems to take more time and effort. As such, it is not surprising that the problem has been widely featured – from the scientific literature (Frederick, 2005; Kahneman, 2011; Mastrogiorgio & Petracca, 2014; Sinayev & Peters, 2015) to popular science best-sellers (Gladwell, 2005; Levitt & Dubner, 2010) to the pages of the Wall Street Journal (Lehrer, 2011) – as a textbook illustration of the dual process nature of human thinking. The dual process model conceives of thinking as an interaction between intuitive and deliberative processing, often referred to as System 1 and System 2 (e.g., Epstein, 1994; Evans & Stanovich, 2013; Kahneman, 2011; Sloman, 1996). Although there are many types of dual process models, they all postulate that the intuitive System 1 operates fast and effortlessly whereas the deliberate System 2 is slower and effortful (i.e., it heavily burdens our limited cognitive resources).
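Spelled out as a worked equation (our own notation, with b the price of the ball in dollars), the contrast between the two answers is:

\[
b + (b + 1.00) = 1.10 \;\Rightarrow\; 2b = 0.10 \;\Rightarrow\; b = 0.05,
\]

whereas the intuitive “10 cents” answer effectively computes b = 1.10 − 1.00 = 0.10, treating the $1 difference as if it were the bat’s full price.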

In the bat-and-ball problem it is System 1 that is assumed to be cuing the immediate “10 cents” response. Computation of the “5 cents” response is assumed to require the engagement of System 2. Because human reasoners have a strong tendency to minimize demanding computations, they will often refrain from engaging or completing the slow System 2 processing when the fast System 1 has already cued a response. Consequently, most reasoners will simply stick to the System 1 response that quickly came to mind and fail to consider the correct response. Reasoners who do manage to solve the problem correctly will need to correct the initially generated intuitive response after having completed their System 2 computations.

At the heart of the dual process model lies a “corrective” view on sound reasoning and deliberation: Correct responding is assumed to require correction of an intuitive System 1 response by slower and more demanding System 2 processing. This can easily lead to a somewhat grim characterization of System 1 as a source of error that needs supervision from the deliberate System 2 (Marewski & Hoffrage, 2015). Bluntly put, System 2 is portrayed as the good guy that cleans up the mess left behind by the fast but error prone System 1. To be clear, it should be stressed that the dual process model does not simply postulate that intuitions are always incorrect. It is not disputed that intuitive responses can be appropriate and helpful in some cases (Evans, 2010; Evans & Stanovich, 2013; Kahneman, 2011; Sloman, 1996). The point is simply that intuitive responses have the potential to be incorrect and therefore need to be monitored and sometimes corrected. It is this corrective aspect of the model that can lead to the rather negative view of System 1. And it is this corrective nature of System 2 processing for which the bat-and-ball problem seems to present a paradigmatic example.

Although the corrective dual process model and related “intuition-as-bias” view have been very influential, various scholars have called for a more positive view towards intuitive processing (e.g., Klein, 2004; Peters, 2012; Reyna, 2012). More generally, as our opening quotes illustrate, historically speaking, leading mathematicians and physicists have long favored a quite different look on intuition and deliberation. In scientific discoveries and mathematical breakthroughs intuition has often been conceived as guiding the intellect. Great minds such as Newton, Einstein, Poincaré, and Kekulé famously reported how their major breakthroughs – from the discovery of gravity to the benzene ring structure – came to them intuitively (Ellenberg, 2015; Marewski & Hoffrage, 2015). Although this “intuition-as-a-guide” view does not deny that more effortful, deliberate thinking is indispensable to validate and develop an initial intuitive idea, the critical origin of the insight is believed to rely on intuitive processes (Ellenberg, 2015; Marewski & Hoffrage, 2015).

In contrast with the corrective dual process view, here sound reasoning is not conceived as a process that corrects erroneous intuitions but as a process that builds on correct insights.

The present study focuses on an empirical test of the corrective nature of sound reasoning in the bat-and-ball problem. Our motivation was that although the problem is considered a paradigmatic example of the need to correct an initial intuitive response and has been widely used to promote this view, the critical corrective assumption has received surprisingly little direct empirical testing. Arguably, one of the reasons is that the characterization seems self-evident from an introspective point of view. As we already noted, few people (including the present authors) would contest that it feels as if the “10 cents” answer pops up immediately and arriving at the “5 cents” solution requires more time and effort. Indeed, many scholars have referred to this introspective experience to warrant the corrective assumption (Frederick, 2005; Kahneman, 2011). However, while introspection is not without its merits, it is also well established that introspective impressions can be deceptive (Mega & Volz, 2014; Schooler, Ohlsson, & Brooks, 1993).

In addition to introspection, Frederick (2005) also cited some indirect support for the corrective assumption in the paper that introduced the bat-and-ball problem. For example, Frederick observed that incorrect responders rate the problem as easier than correct responders do, which presumably indicates that correct responders are more likely to consider both responses (see also Mata & Almeida, 2014; but see Szaszi, Szollosi, Palfi, & Aczel, 2017). The problem is that although such assumptions are not unreasonable, they do not present conclusive evidence. Clearly, even if correct responders are more likely to consider both the incorrect and correct responses, this obviously does not imply that they considered the incorrect response before the correct response.

Other potential support comes from latency studies. A number of studies reported that correct “5 cents” responses take considerably longer than incorrect “10 cents” responses (e.g., Alós-Ferrer, Garagnani, & Hügelschäfer, 2016; Johnson, Tubau, & De Neys, 2016; Stupple, Pitchford, Ball, Hunt, & Steel, 2017; Travers, Rolison, & Feeney, 2016). For example, in one of our own studies we observed that correct responders needed on average about a minute and a half to enter their response whereas incorrect responders only took about 30 seconds (Johnson et al., 2016). Although this fits with the claim that System 2 processing is slower than System 1 processing, it does not imply that someone who engaged in System 2 reasoning first engaged in System 1. That is, the fact that a correct response takes more time does not imply that correct responders generated the incorrect response before they considered the correct response.

They might have needed more time to complete the System 2 computations without ever having considered the incorrect response.

Somewhat more convincing evidence for the corrective dual process assumption in the bat-and-ball problem comes from a recent paper by Travers et al. (2016), who adopted a mouse tracking paradigm. In this paradigm different response options are presented in each of the corners of the screen (e.g., “10 cents”, “5 cents”) and participants have to move the mouse pointer from the center of the screen towards the response option of their choice to indicate their decision. This procedure can be used to study the time-course of decisions on the basis of participants’ mouse cursor trajectories (e.g., Spivey, Grosjean, & Knoblich, 2005). For example, do reasoners who ultimately select the correct response tend to move the mouse first towards the incorrect response? Travers et al. found that this was indeed the case. After about 5 s participants started to make initial movements towards the incorrect “10 cents” option. However, movements towards the correct response were not observed until about 5 s later. These findings present some support, but they are not conclusive. Note that if a response is truly intuitive, one might expect it to be cued instantly upon reading the problem. In this sense, the 5 s time lag before participants started to make mouse movements in the Travers et al. study is still quite long. This leaves open the possibility that the procedure is not picking up on earlier intuitive processing (Travers et al., 2016).

In the present paper we report a research project involving a total of seven studies in which we aimed to test the corrective nature of correct responding in the bat-and-ball problem directly. We therefore adopted the so-called two-response paradigm (Thompson, Prowse Turner, & Pennycook, 2011). Thompson and colleagues developed this procedure to gain direct behavioral insight into the timing of intuitive and deliberative response generation. In the paradigm participants are presented with a reasoning problem and are instructed to respond as quickly as possible with the first, intuitive response that comes to mind. Subsequently, they are presented with the problem again, and they are given as much time as they want to think about it and give a final answer. Interestingly, a key observation for our present purposes was that Thompson and colleagues observed that people rarely change their initial response in the deliberation phase (Pennycook & Thompson, 2012; Thompson & Johnson, 2014; Thompson et al., 2011). This lack of answer change tentatively suggests that in those cases where a correct response was given as final response, the very same response was generated from the start. In other words, the correct response might have been generated fast and intuitively based on mere System 1 processing (Pennycook & Thompson, 2012; Thompson & Johnson, 2014; see also Bago & De Neys, 2017, and Newman, Gibb, & Thompson, 2017).


However, past two-response studies used problems which were considerably easier than the bat-and-ball problem (Travers et al., 2016). Note that the dual process model does not entail that correct responding requires System 2 thinking in all possible situations and conditions. In some elementary tasks (e.g., Modus Ponens inferences in conditional reasoning, see Evans, 2010) the processing needed to arrive at the correct response might be so basic that it has been fully automatized and incorporated as a System 1 response. Likewise, in some cases System 1 might not generate an (incorrect) response, and there will obviously be no need to correct it. Hence, findings pointing to correct intuitive responding might be attributed to the exceptional, non-representative nature of the task (Aczel, Szollosi, & Bago, 2016; Mata, Ferreira, Voss, & Kollei, 2017; Pennycook, Fugelsang, & Koehler, 2012; Singmann, Klauer, & Kellen, 2014; Travers et al., 2016). Proponents of the corrective dual process view can still argue that in prototypical cases – with the bat-and-ball problem as the paradigmatic example – correct responding can only occur after deliberation and correction of an intuitive response. In addition, one might argue that at least in the initial two-response studies participants were simply instructed – and not forced – to respond intuitively. Hence, participants might have failed to respect the instructions and ended up with a correct first response precisely because they engaged in System 2 processing. Clearly, one has to make absolutely sure that only System 1 is engaged at the initial response phase.

In the present study we adopt the two-response paradigm in a reasoning task with items that are directly modeled after the bat-and-ball problem. We also use the most stringent procedures to guarantee that the first response is truly intuitive in nature. Participants are forced to give their first response within a challenging deadline (e.g., 4 s in Study 1, the time needed to simply read the problem as indicated by pretesting). In addition, during the initial response phase participants’ cognitive resources are also burdened with a secondary load task. The rationale is simple: System 2 processing, in contrast with System 1, is defined as time and resource demanding. By depriving participants of these resources we attempt to “knock out” System 2 as much as possible during the initial response phase (Bago & De Neys, 2017). Finally, we also use a range of procedures to eliminate possible confounds resulting from task familiarity or guessing.

To give away the punchline, our key result is that although we replicate the biased responding on the bat-and-ball problem, we also find consistent evidence for correct intuitive responding. Whenever people manage to give the correct “5 cents” answer as their final response after deliberation, they often already selected this answer intuitively as their initial response without any deliberation.

In the different studies we use various manipulations (response format variations, Study 1-5; response justification elicitation, Study 6-7) that help to pinpoint the nature of the intuitive correct responses and the contrast with deliberate correct responses after reflection. Based on our empirical findings we will argue that the role of System 1 and System 2 in dual process theories needs to be re-conceptualized: In line with the “intuition-as-a-guide” view favored by Einstein and Poincaré, it seems that in addition to the correction of an incorrect intuition, deliberation is often also used to verify and justify a correct intuitive insight.

Study 1

Method

Participants

In Study 1, 101 Hungarian undergraduate students (87 female, mean age = 19.8 years, SD = 1.5 years) from the Eotvos Lorand University of Budapest were tested. Only freshmen were allowed to participate, so their highest completed educational level was high school (except for one participant who reported that she already had a Bachelor’s degree). Participants received course credit for taking part. Participants in Study 1 (and all other reported studies) completed the study online.

Materials

Reasoning problems. In total, eight content-modified versions of the bat-and-ball problem were presented. The original bat-and-ball problem is frequently featured in scientific studies and popular science writing, which implies that prospective participants might be familiar with it (Haigh, 2016). Previous studies have shown that more experienced participants (who took part in more studies and thus had a higher likelihood of having previously seen or solved the bat-and-ball problem) performed significantly better than naïve participants (Stieger & Reips, 2016). Furthermore, Chandler, Mueller, and Paolacci (2014) have also found a positive correlation between performance on the Cognitive Reflection Test (CRT, a short 3-item questionnaire that includes the bat-and-ball problem) and the number of experiments people participated in.


However, this correlation disappeared when the researchers used structurally identical, but content-modified versions of the problems. We used similar content-modified versions of the original bat-and-ball problem (e.g., problems stated that a cheese and a bread together cost 2.90 euro⁷ or that an apple and an orange together cost 1.80 euro) in the present study to minimize the effect of familiarity or prior exposure on task performance.

Furthermore, we presented multiple items to help us test for a possible guessing confound. Study 1 adopted a binary response format (see further). Hence, mere guessing at the initial response stage would already result in 50% correct “intuitive” responses. However, by presenting multiple items and computing a measure of “stability” (i.e., out of the presented items, on how many did the participant respond similarly?) we can control for the guessing account. If System 1 computes the correct response, one would expect that reasoners manage to select it consistently. If correct responding results from mere guessing, it should only be observed on about half of the presented items. Note that in theory presenting multiple items might also boost participants’ accuracy because of a learning or transfer effect: after having solved the problem once, participants might have learned to apply the same strategy on subsequent items. However, Chandler et al.’s (2014) work suggests that people’s knowledge about the problem is highly item and content specific. Hoover and Healy (2017) also failed to observe transfer effects with repeated presentation of (content-modified) problems. Hence, by changing the content of every item we can expect to minimize learning effects.

In Study 1 each problem was always presented with two answer options: the logically correct response (e.g., “5 cents” in the original bat-and-ball problem), which is assumed to require System 2 deliberation, and the “heuristic” response, which is assumed to be cued by System 1 (e.g., “10 cents” in the original problem). We will use the labels “logical” and “heuristic” response to refer to these answer options. Mathematically speaking, the correct equation to solve the bat-and-ball problem is “100 + 2x = 110”; instead, people are thought to be intuitively using the “100 + x = 110” equation to determine their response (Kahneman, 2011). We always used the latter equation to determine the “heuristic” answer option, and the former to determine the “logical” answer option for each problem (e.g., if the two objects were said to cost 2.30 in total, the presented heuristic response option was 30 cents and the presented correct response option was 15 cents).

7 Note that in all our studies with Hungarian subjects we used euro instead of dollar units since this currency is more familiar to them.
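As a minimal sketch (our own illustrative code, all amounts in cents), the two equations translate into the following answer-option rules:

# The two equations from the text, generalized to any total and difference
# (all amounts in cents; function names are ours, for illustration only).

def heuristic_option(total, difference):
    # Intuitive equation (e.g., "100 + x = 110"): x = total - difference.
    return total - difference

def logical_option(total, difference):
    # Correct equation (e.g., "100 + 2x = 110"): x = (total - difference) / 2.
    return (total - difference) // 2

# Original bat-and-ball problem: $1.10 total, the bat costs $1 more.
assert heuristic_option(110, 100) == 10   # the intuitive "10 cents"
assert logical_option(110, 100) == 5      # the correct "5 cents"

# The 2.30 example from the text (the stated options, 30 and 15 cents,
# imply a 2 euro price difference for this item).
assert heuristic_option(230, 200) == 30
assert logical_option(230, 200) == 15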


Participants had to indicate their answer by clicking on one of the options with the mouse. After providing an answer, they immediately advanced to the next problem. In order to minimize the influence of reading times and get a less noisy measure of reasoning time, problems were presented serially. First, the first sentence of the problem was presented, which always stated the two objects and their cost together (e.g., “An apple and an orange cost 1.80 euros in total”). Next, the rest of the problem was presented under the first sentence (which stayed on the screen), with the question and the possible answer options. The following illustrates the full problem format:

An apple and an orange cost 1.80 euros in total.
The apple costs 1 euro more than the orange.
How much does the orange cost?

o 40 cents
o 80 cents

To further assure that possible correct (or incorrect) responses did not originate from guessing, we also presented control problems (see De Neys, Rossi, & Houdé, 2013; Travers et al., 2016). In the regular bat-and-ball problem the alleged heuristic System 1 intuition cues an answer that conflicts with the correct answer. In the “no-conflict” control problems, the heuristic System 1 intuition was made to cue the correct response option. This was achieved by deleting the critical relational “more than” statement. With the above example, a control problem looked as follows:

An apple and an orange cost 1.80 euros in total.
The apple costs 1 euro.
How much does the orange cost?

o 40 cents
o 80 cents

In this case the intuitively cued “80 cents” answer was also correct. The second response option for the control problems was always the correct response divided by 2 (e.g., “40 cents” in the example). Half of the presented problems were regular “conflict” problems, and half were control problems. If participants are not guessing, performance on the control problems should be at ceiling (e.g., De Neys et al., 2013).

Two problem sets were used in order to counterbalance the item content; the conflict items in one set were the control items in the other, and vice-versa. Participants were randomly assigned to one of the sets. This counterbalancing minimized the possibility that item contents would affect the contrast between conflict and control items.

The presentation order of the items was randomized in both sets. All problems are presented in the Supplementary Material, Section A.
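Putting the conflict and no-conflict rules together, a short sketch (again our own illustrative code, amounts in cents) shows how both versions of the apple/orange item end up presenting the same two options:

# Answer-option construction for conflict and no-conflict (control) items,
# following the rules described above (illustrative code, amounts in cents).

def build_options(total, difference, conflict):
    if conflict:
        correct = (total - difference) // 2   # logically correct option
        foil = total - difference             # heuristically cued option
    else:
        correct = total - difference          # here the cued answer is correct
        foil = correct // 2                   # second option: correct divided by 2
    return correct, foil

# Apple/orange example (1.80 euro total, 1 euro difference):
assert build_options(180, 100, conflict=True) == (40, 80)   # 40 cents is correct
assert build_options(180, 100, conflict=False) == (80, 40)  # 80 cents is correct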

Load task. We wanted to make sure that participants’ initial response was truly intuitive (i.e., that System 2 engagement was ruled out). Therefore, we used a cognitive load task (i.e., the dot memorization task, see Miyake, Friedman, Rettinger, Shah, & Hegarty, 2001) to burden participants’ cognitive resources. The rationale behind the load manipulation is simple: by definition, System 2 processing requires executive cognitive resources, while System 1 processing does not (Evans & Stanovich, 2013). Consequently, if we burden someone’s executive resources while they are asked to solve reasoning problems, System 2 engagement is less likely. We opted for the dot memorization task because it is well-established that it successfully burdens participants’ executive resources (De Neys & Schaeken, 2007; De Neys & Verschueren, 2006; Franssens & De Neys, 2009; Miyake et al., 2001). Before each reasoning problem participants were presented with a 3 x 3 grid in which 4 dots were placed. Participants were instructed that it was critical to memorize the pattern, even though this might be hard to do while solving the reasoning problem. After answering the reasoning problem participants were shown four different matrices and had to choose the correct, to-be-memorized pattern. They received feedback as to whether they chose the correct or incorrect pattern. The load was only applied during the initial response stage and not during the subsequent final response stage in which participants were allowed to deliberate and recruit System 2 (see further).
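The structure of a single memorization trial can be sketched as follows (our own illustrative code, not the original experiment script):

# Illustrative sketch of a dot memorization trial: a random 4-dot pattern in
# a 3 x 3 grid, plus three distractor patterns for the recognition test.

import random

def random_pattern(n_dots=4, grid_size=3):
    # A pattern is a set of n_dots distinct cells of the grid.
    return frozenset(random.sample(range(grid_size * grid_size), n_dots))

def memorization_trial():
    target = random_pattern()
    distractors = set()
    while len(distractors) < 3:
        candidate = random_pattern()
        if candidate != target:
            distractors.add(candidate)
    options = [target, *distractors]
    random.shuffle(options)        # four response options in random order
    return target, options

target, options = memorization_trial()
assert target in options and len(options) == 4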

Procedure

Reading pre-test. Before we ran the main study we also recruited an independent sample of 64 participants for a pre-test in which participants were simply asked to read our reasoning task items and randomly click on a response option when they were ready. The idea was to base the response deadline in the main reasoning task on the average reading time in the reading test. Note that dual process theories are highly underspecified in many aspects (Kruglanski, 2013); they argue that System 1 is faster than System 2, but do not further specify how fast System 1 is exactly (e.g., System 1 < x seconds). Hence, the theory gives us no unequivocal criterion on which we can base our deadline. Our “average reading time” criterion provides a practical solution to define the response deadline.

If people are allotted the time they need to simply read the problem, we can assume that System 2 engagement is minimal. Full procedural details are presented in the Supplementary Material, Section D. The average reading time of the sample was 3.87 s (SD = 2.18 s). To give participants some minimal leeway, we rounded the average reading time up to the nearest whole second and set the response deadline to 4 seconds.

Reasoning task. The experiment was run online. Participants were specifically instructed at the beginning that we were interested in their very first, initial answer that came to mind. They were also told that they would have additional time afterwards to reflect on the problem and could take as much time as they needed to provide a final answer. The literal instructions that were used stated the following (translated from Hungarian):

Welcome to the experiment! Please read these instructions carefully!

This experiment is composed of 8 questions and a couple of practice questions. It will take about 10 minutes to complete and it demands your full attention. You can only do this experiment once.

In this task we'll present you with a set of reasoning problems. We want to know what your initial, intuitive response to these problems is and how you respond after you have thought about the problem for some more time. Hence, as soon as the problem is presented, we will ask you to enter your initial response. We want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you enter your final response. You will have as much time as you need to indicate your second response.

After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response.

In sum, keep in mind that it is really crucial that you give your first, initial response as fast as possible. Afterwards, you can take as much time as you want to reflect on the problem and select your final response.

You will receive 500 HUF for completing this experiment.

Please confirm below that you read these instructions carefully and then press the "Next" button.

After this general introduction, participants were presented with a task-specific introduction which explained the upcoming task and informed them about the response deadline. The literal instructions were as follows:


We are going to start with a couple of practice problems. First, a fixation cross will appear. Then, the first sentence of the problem is going to be presented for 2 seconds. Next, the rest of the problem will be presented. As we told you we are interested in your initial, intuitive response. First, we want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. To assure this, a time limit was set for the first response, which is going to be 4 seconds. When there is 1 second left, the background colour will turn to yellow to let you know that the deadline is approaching. Please make sure to answer before the deadline passes. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you enter your final response.

After you made your choice and clicked on it, you will be automatically taken to the next page. After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response. Press "Next" if you are ready to start the practice session!

After the specific instruction page participants solved two unrelated practice reasoning problems to familiarize them with the procedure. Next, they solved two dot matrix practice problems (without a concurrent reasoning problem). Finally, at the end of the practice, they had to solve the two earlier practice reasoning problems under cognitive load.

Each problem started with the presentation of a fixation cross for 1000 ms. After the fixation cross disappeared, the dot matrix appeared and stayed on the screen for 2000 ms. Then the first sentence appeared alone for 2000 ms. Finally, the remaining part of the problem appeared (while the first sentence stayed on screen). At this point participants had 4000 ms to give an answer; after 3000 ms the background of the screen turned yellow to warn participants about the upcoming deadline. If they did not provide an answer before the deadline, they were asked to pay attention to providing an answer within the deadline. The position of the correct answer alternative (i.e., first or second response option) was randomly determined for each item. After the initial response, participants were asked to enter their confidence in the correctness of their answer on a scale from 0% to 100%, with the following question: “How confident are you in your answer? Please type a number from 0 (absolutely not confident) to 100 (absolutely confident)”. After indicating their confidence, they were presented with four dot matrix options, from which they had to choose the correct, to-be-memorized pattern. Once they provided their memorization answer, they received feedback as to whether it was correct. If the answer was not correct, they were also asked to pay more attention to memorizing the correct dot pattern on subsequent trials.
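For clarity, the within-trial timing just described can be summarized as a simple schedule (a descriptive sketch in our own notation, not the actual experiment code):

# Initial-response trial timeline (durations in ms, taken from the text above).
TRIAL_TIMELINE = [
    ("fixation cross", 1000),
    ("dot matrix to memorize", 2000),
    ("first sentence alone", 2000),
    ("full problem, initial response window", 4000),
]
DEADLINE_WARNING_AT_MS = 3000  # background turns yellow 1 s before the deadline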


Finally, the same item was presented again, and participants were asked to provide a final response. Once they clicked on one of the answer options they were automatically advanced to the next page, where they had to provide their confidence level again. The colour of the answer options was green during the first response phase and blue during the final response phase, to visually remind participants which question they were answering. In addition, right under the question we presented a reminder sentence (“Please indicate your very first, intuitive answer!” or “Please give your final answer.”, respectively), which was coloured to match the answer options. At the very end of the experiment, participants were shown the standard bat-and-ball problem and were asked whether they had seen it before. We also asked them to enter the solution. Finally, participants completed a page with demographic questions.

Exclusion criteria. In total, 26.7% (n = 27) of participants reported they had seen the original bat-and-ball problem before. Thirteen participants in this group managed to give the correct “5 cents” response. Although we used content-modified problem versions in our study, we wanted to completely eliminate the possibility that these participants’ prior knowledge of the original correct solution would affect the findings. Therefore, we decided to discard all data from the participants who had seen the original bat-and-ball problem before and knew the correct solution (i.e., 13% of the total sample; the remaining 88 participants were further analyzed). The remaining participants failed to provide a first response before the deadline in 10.3% of the trials. In addition, in 11.3% of the trials participants responded incorrectly to the dot memorization load task. All these trials were removed from the analysis because it cannot be guaranteed that the initial response resulted from mere System 1 processing: If participants took longer than the deadline, they might have engaged in deliberation. If they failed the load task, we cannot be sure that they tried to memorize the dot pattern and that System 2 was successfully burdened. In these cases we cannot claim that possible correct responding at the initial response stage is intuitive in nature. Hence, removing trials that did not meet the inclusion criteria gives us the purest possible test of our hypothesis. In total, 18.3% of trials were excluded and 575 trials (out of 704) were further analyzed (initial and final response for the same item counted as 1 trial). For completeness, note that in Study 1 – and all other studies reported here – we also ran our analyses with all trials included. Key findings were never affected.

Statistical analyses. Throughout the article we used mixed-effect regression models (Baayen, Davidson, & Bates, 2008) in which participants and items were entered as random intercepts to analyse our results.

For the binary choice data we used logistic regression, while for the continuous confidence and reaction time data we used linear regression.
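As an illustration of this analysis setup, a crossed random intercept model for the continuous data could be specified as below (a sketch only: the data frame and column names are hypothetical, and the thesis analyses were not necessarily run with this library):

# Mixed-effects model with random intercepts for participants and items
# (hypothetical column names; a sketch of the analysis logic, not the original script).

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("study1_trials.csv")   # hypothetical file: one row per trial

model = smf.mixedlm(
    "confidence ~ conflict",            # e.g., conflict vs. no-conflict trials
    data=df,
    groups="participant",               # random intercept for participants
    re_formula="1",
    vc_formula={"item": "0 + C(item)"}, # crossed random intercept for items
)
print(model.fit().summary())

# For the binary choice data, a mixed logistic model is the analog (e.g.,
# statsmodels' BinomialBayesMixedGLM or a dedicated GLMM package).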

Results

Table 1 gives an overview of the results. For consistency with previous work we first focus on the response accuracies for the final response. In line with the literature we find that most participants fail to solve the conflict versions of the bat-and-ball problem correctly. On average, final accuracy on the conflict items reached 24.5% (SD = 43.1). As one might expect, on the no-conflict control problems where System 1 processing is hypothesized to cue the correct response, final accuracy is at ceiling with about 96% (SD = 19.6) correct responses. These final response accuracies are consistent with what can be expected based on previous work that adopted a classic one-response paradigm (e.g., 21.6% and 97.4% in Johnson et al., 2016; 21% and 98% in De Neys et al., 2013). This indicates that the use of a two-response paradigm does not distort reasoning performance per se (Thompson et al., 2011).

Table 1. Initial and final average (SD) response accuracy in Study 1-5.

                                   Conflict items                   No-conflict control items
Study     Response format    Initial response  Final response   Initial response  Final response
Study 1   2 response         20.1% (40.2)      24.5% (43.1)     93.4% (24.9)      96.0% (19.6)
Study 2a  2 response         7.7% (26.7)       7.3% (26.1)      94.6% (22.6)      96.5% (18.5)
Study 2b  4 response         9.0% (28.6)       10.8% (31.1)     94.9% (22)        95.1% (21.6)
Study 3a  2 response         23.9% (42.8)      28.3% (45.2)     96.3% (18.9)      97.1% (17)
Study 3b  4 response         19.0% (39.4)      29.5% (45.3)     95.6% (20.6)      97.1% (17)
Study 4   Free response      10.9% (31.3)      13.1% (33.9)     96.4% (18.8)      99.3% (8.5)
Study 5   Free response      30.5% (46.2)      41.4% (49.4)     93.8% (24.2)      98.1% (13.6)

Average   2 response         14.7% (34.4)      15.4% (36.1)     94.5% (22.9)      96.4% (18.6)
          4 response         11.1% (31.5)      14.6% (35.4)     95.0% (21.7)      95.5% (20.7)
          Free response      20.4% (40.4)      26.8% (44.4)     95.0% (21.9)      98.7% (11.5)

Overall average              13.8% (34.5)      16.8% (37.4)     94.7% (22.3)      96.5% (18.5)


The initial response accuracies are more surprising. In about 20.1% (SD = 40.2%) of the conflict trials people already gave the correct answer as their initial response. This indicates that participants can intuitively solve the bat-and-ball problem. However, the raw percentage of correct intuitive responses is not fully informative. We can obtain a deeper insight into the results by performing a Direction of Change analysis on the conflict trials (Bago & De Neys, 2017). This means that we look at the way a given person in a specific trial changed (or didn’t change) her initial answer after the deliberation phase. More specifically, people can give a correct or incorrect response in each of the two response stages. Hence, in theory this can result in four different types of answer change patterns (“00”, incorrect response in both stages; “11”, correct response in both stages; “01”, initial incorrect and final correct response; “10”, initial correct and final incorrect response). According to the standard dual process model, only two types of direction of change should be observed: 00 and 01. The “00” category implies that System 1 processing generated an initial incorrect response and System 2 thinking did not manage to override it. Consequently, the person is biased. The “01” category presents the standard correction case: System 1 processing generates an initial incorrect response, but System 2 processing later manages to correct it in the final response stage.
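The classification scheme and the non-correction rate are easy to express in code (a small sketch of our own, with responses coded as correct/incorrect booleans and the counts taken from Table 2 below):

# Direction of change coding and the non-correction rate, 11 / (11 + 01).

def change_category(initial_correct, final_correct):
    return f"{int(initial_correct)}{int(final_correct)}"   # '00', '01', '10', '11'

def non_correction_rate(trials):
    cats = [change_category(i, f) for i, f in trials]
    n11, n01 = cats.count("11"), cats.count("01")
    return n11 / (n11 + n01)

# Study 1 conflict trials: 45 '11' trials and 22 '01' trials (see Table 2).
trials = [(True, True)] * 45 + [(False, True)] * 22
print(round(100 * non_correction_rate(trials), 1))   # -> 67.2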

Table 2. Frequency of direction of change categories in Study 1-5 for conflict items. Raw numbers of trials are in brackets.

                             Direction of change category                            Non-correction
Study     Response format    11           00            10          01               (11/(11+01))
Study 1   2 response         16.5% (45)   71.8% (196)   3.7% (10)   8.1% (22)        67.2%
Study 2a  2 response         4.5% (24)    89.5% (475)   3.2% (17)   2.8% (15)        61.5%
Study 2b  4 response         8.6% (42)    88.8% (436)   0.4% (2)    2.2% (11)        79.2%
Study 3a  2 response         16.7% (23)   64.5% (89)    7.2% (10)   11.6% (16)       59%
Study 3b  4 response         19.0% (26)   71.5% (98)    -           9.5% (13)        66.7%
Study 4   Free response      10.9% (15)   86.7% (119)   -           2.2% (3)         83.3%
Study 5   Free response      29.7% (38)   57.8% (74)    0.8% (1)    11.7% (15)       71.7%

Average   2 response         9.8% (92)    80.7% (760)   3.9% (37)   5.6% (53)        63.4%
          4 response         10.8% (68)   85.0% (534)   0.3% (2)    3.8% (24)        73.9%
          Free response      20.0% (53)   72.8% (193)   0.4% (1)    7.0% (18)        74.6%

Overall average              10.9% (198)  81.7% (1487)  2.2% (40)   5.2% (95)        67.6%


Table 2 summarizes the frequencies of each direction of change category for the conflict problems. The most prevalent category is the 00 one; this pattern was generated in 71.8% of the trials. This means that both at the initial and final response stage participants were typically biased when solving our problems, which mirrors the overall accuracy pattern. One can also observe that there is a non-negligible amount of 01 responses (8.1% of trials). This is in accordance with the dual process predictions: some reasoners initially generated the incorrect response, but managed to correct it after deliberation. However, the problem is that we observe about twice as many 11 cases (16.5% of trials). This means that in many cases (i.e., 67.2%) in which reasoners managed to give the correct response as their final answer, they already gave it intuitively at the initial response stage. We refer to this critical number [i.e., the 11/(11+01) ratio] as the percentage of non-corrective correct responses, or non-correction rate in short. Overall, this result implies that, contrary to the core dual process assumption, correct responding in the bat-and-ball problem does not necessarily require System 2 correction. System 2 correction does exist (i.e., we observe some 01 cases), but this correction seems to be far less common than assumed.

Table 3. Frequency of stability index values on conflict items in Study 1-5. The raw number of participants for each value is presented in brackets.

                             Stability index value                                              Average
Study     Response format    <33%       50%         66%         75%        100%                stability
Study 1   2 response         3.6% (3)   10.7% (9)   8.3% (7)    9.5% (8)   67.9% (57)          87.1%
Study 2a  2 response         2.4% (4)   3.0% (5)    4.8% (8)    6.6% (11)  83.2% (139)         93.7%
Study 2b  4 response         3.5% (5)   1.4% (2)    2.8% (4)    4.2% (6)   88.1% (126)         95.0%
Study 3a  2 response         4.9% (2)   19.5% (8)   7.3% (3)    12.2% (5)  56.1% (23)          81.5%
Study 3b  4 response         2.3% (1)   7.0% (3)    14.0% (6)   2.3% (1)   74.4% (32)          89.7%
Study 4   Free response      -          2.6% (1)    5.1% (2)    -          92.3% (36)          97.0%
Study 5   Free response      -          19.5% (8)   9.8% (4)    9.8% (4)   61.0% (25)          84.6%

Average   2 response         3.1% (9)   7.5% (22)   6.2% (18)   8.2% (24)  75.0% (219)         90.0%
          4 response         3.2% (6)   2.7% (5)    5.4% (10)   3.8% (7)   84.9% (158)         93.7%
          Free response      -          11.3% (9)   7.5% (6)    5.0% (4)   76.5% (61)          90.6%

Overall average              2.7% (15)  6.5% (36)   6.1% (34)   6.3% (35)  78.5% (438)         91.3%


For completeness, Table S3 in the Supplementary Material also gives an overview of the direction of change findings for the no-conflict control items. Not surprisingly, here responses predominantly (93%) fell in the 11 category.

Clearly, a critic might argue that, given our binary response format, our correct initial responses on the conflict items result from mere guessing. Indeed, our task is quite challenging – people have to respond within a very strict deadline and under cognitive load. In theory, it is possible that participants found the task too hard and just randomly clicked on one of the presented solutions. However, the ceiled initial performance on the no-conflict control problems argues against a general guessing confound (i.e., 93.4% vs 20.1% correct initial responses on the no-conflict and conflict problems respectively, χ2 (1) = 56.64, p < 0.0001, b = -4.77). If our task was so challenging that participants had to guess because they were not even able to read the problem information, their performance on the conflict and no-conflict problems should not have differed and should have hovered around 50% in both cases. Further evidence against a guessing account is also provided by our stability analysis (see below).

Our direction of change analysis was computed across items and participants. One might wonder whether participants are stable in their preference for one or the other type of change category. That is, does an individual who produces a correct (incorrect) initial response on one conflict problem do so consistently for the other items? To answer this question we calculated for every participant on how many conflict problems they displayed the same direction of change category. We refer to this measure as the stability index. For example, if an individual shows the same type of direction of change on all four conflict problems, the stability index would be 100%. If the same direction of change is only observed on two trials, the stability index would be 50%, etc. Table 3 presents an overview of the findings. Note that due to our methodological restrictions (discarding of answers given after the deadline and of trials on which the load memorization was not successful) for a small number of participants less than four responses were available. Here the stability index is calculated over the available items. As Table 3 shows, the dominant category is the 100% stability one. The average stability index on conflict items was 87.1% (SD = 20.5).⁸ This indicates that the type of change is highly stable at the individual level. If people show a specific direction of change pattern on one conflict problem, they tend to show it on all conflict problems.

8 Table S4 in the Supplementary Material shows that the stability was also high on the control problems with an average of 93% (SD = 12.8).

97 Bago Bence – Thèse de doctorat - 2018 stability index directly argues against the guessing account. If people were guessing when giving their initial answer, they should not tend to pick the same response consistently. Likewise, the high stability also argues against a possible practice or learning account. Although we used content-modified items (i.e., each item mentioned a unique set of products and total dollar amount), one might argue that the repeated presentation of multiple problems helped participants to learn and automatize the calculation of the correct response. Correct intuitive responding would arise near the end of the study, but would not be observed at the start. However, pace the learning account, the high stability indicates that participants’ performance did not change across the study. If participants managed to give a correct intuitive answer at the end of the study, they already did so at the start. If they were biased at the start, they were so at the end. This directly argues against a learning account and fits with recent empirical findings suggesting that mere repeated exposure has no substantial impact on bat-and-ball performance (e.g., Hoover & Healy, 2017).
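To make the stability index concrete, here is a minimal computational sketch. The data layout and function name are illustrative assumptions (the thesis does not include analysis code); the logic simply takes, per participant, the share of valid conflict trials that fall in that participant's modal direction of change category.

```python
from collections import Counter

def stability_index(categories):
    """Share of a participant's valid conflict trials that fall in their
    modal direction of change category ('11', '00', '10', or '01').

    Trials discarded for missed deadlines or failed load memorization are
    simply absent, so the index is computed over the available items."""
    if not categories:
        return float("nan")
    modal_count = Counter(categories).most_common(1)[0][1]
    return modal_count / len(categories)

# Hypothetical participants (not real data):
print(stability_index(["11", "11", "11", "11"]))  # 1.0  -> 100% stability
print(stability_index(["00", "00", "01", "11"]))  # 0.5  -> 50% stability
print(stability_index(["00", "00", "01"]))        # ~0.67 -> only 3 valid trials
```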

Table 3. Frequency of stability index values on conflict items in Study 1-5. The raw number of participants for each value is presented between brackets.

Study             Response format   <33%        50%         66%         75%         100%          Average stability
Study 1           2 response        3.6% (3)    10.7% (9)   8.3% (7)    9.5% (8)    67.9% (57)    87.1%
Study 2a          2 response        2.4% (4)    3.0% (5)    4.8% (8)    6.6% (11)   83.2% (139)   93.7%
Study 2b          4 response        3.5% (5)    1.4% (2)    2.8% (4)    4.2% (6)    88.1% (126)   95.0%
Study 3a          2 response        4.9% (2)    19.5% (8)   7.3% (3)    12.2% (5)   56.1% (23)    81.5%
Study 3b          4 response        2.3% (1)    7.0% (3)    14.0% (6)   2.3% (1)    74.4% (32)    89.7%
Study 4           Free response     -           2.6% (1)    5.1% (2)    -           92.3% (36)    97.0%
Study 5           Free response     -           19.5% (8)   9.8% (4)    9.8% (4)    61.0% (25)    84.6%

Average           2 response        3.1% (9)    7.5% (22)   6.2% (18)   8.2% (24)   75.0% (219)   90.0%
                  4 response        3.2% (6)    2.7% (5)    5.4% (10)   3.8% (7)    84.9% (158)   93.7%
                  Free response     -           11.3% (9)   7.5% (6)    5.0% (4)    76.5% (61)    90.6%

Overall average                     2.7% (15)   6.5% (36)   6.1% (34)   6.3% (35)   78.5% (438)   91.3%


Discussion

Consistent with the prior literature, Study 1 showed that people are typically biased and fail to solve the bat-and-ball problem. At first sight, this might seem to fit with the standard dual process story that solving the problem is hard and requires deliberate System 2 processing to override an incorrect intuition. However, the key finding is that in those cases where people do manage to give a correct final response after deliberation, they often already selected this answer as their first, intuitive response. Even though we experimentally reduced the use of System 2 deliberation and forced people to respond in a mere 4 s, correct responders often also gave the correct answer as their initial response. In many cases correct responses were non-corrective in nature. In other words, correct responders do not necessarily need to correct their intuition; their intuition is often already correct. The high non-correction rate directly argues against the core dual process theory assumption concerning the corrective nature of System 2 processing.

But obviously, the results are based on one single study. In Study 2-5 we therefore tested the robustness of the findings. We examined whether we can replicate the pattern and whether it is robust to variations in response format. In Study 1 we used a binary forced-choice response format. In Study 2 and 3 we also introduced a condition with a 4-option response format. In Study 4 and 5 we used a free-response format. Each response format has some methodological advantages and disadvantages. The simple binary format allows us to set the most stringent deadline for the initial response stage: if people are presented with and have to read more response options, or have to type their own response, they will need more time to enter a response. On the other hand, the multiple response and free response formats allow better control against guessing and provide insight into the nature or specificity of the intuitive response. Clearly, if an individual intuitively selects the correct response in a binary choice format, this does not necessarily imply that she has calculated that the correct response is “5 cents”. She might have intuitively detected that “10 cents” cannot be right (e.g., because the sum would be larger than $1.10) without knowing the correct solution. If participants select the “5 cents” response from a wider range of options – or generate it themselves – we can conclude that they also computed the correct response. In other words, we can test how precise their intuitive knowledge is.

In Study 1 we tested Hungarian undergraduates. In Study 2-5, we also recruited participants from a wider range of populations (e.g., in terms of nationality, age, and education level).


Study 2-5

For ease of presentation we will present a single results section in which response format is included as a factor in the analyses. Here we first present an overview of the method sections of Study 2-5.

Method – Study 2: 2-option vs 4-option format (crowdsource sample)

Participants

A total of 372 participants were tested (196 female, Mean age = 39.6 years, SD = 13.5 years). Participants were recruited online via the Crowdflower platform and received $0.20 for their participation. Only native English speakers from the USA or Canada were allowed to take part in the experiment. A total of 36% of the participants reported high school as their highest completed educational level, while 62% reported having a post-secondary educational degree (2% reported less than high school, and 1 participant did not provide this information).

Materials and procedure

Reading pre-test. Half of the participants in Study 2 were presented with four response options. Since reading through more options will in itself take more time, we decided to run a new reading pre-test with the 4-option format (see Supplementary Material, section D, for full details). The mean reading time in the pre-test sample was 4.3 s (SD = 2 s). As in Study 1, we rounded the deadline up to the nearest natural number. Hence, the time limit in the 4-option format was set to 5 s (vs 4 s in the 2-option format).

Reasoning task. Participants were randomly allocated to the 2-option or 4-option treatment. The 2-option condition was completely identical to Study 1 except for the fact that the material was presented in English and not in Hungarian. In the 4-option condition, two foil response options were presented in addition to the heuristic and correct response options. We used a “high” and a “low” foil option and used the following rules to determine them: the “high” foil was always the sum of the heuristic and logical options, whereas the “low” foil was always the greatest common divisor of the correct and heuristic options that was smaller than the correct answer. For example, in the original bat-and-ball problem, these would be the four response options (in cents): 1 (low foil), 5 (correct), 10 (heuristic), 15 (high foil). The presentation order of the response options was always the same in the initial and final response stages, but was counterbalanced across trials. All problems with their respective answer options are presented in the Supplementary Material, section A.
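To illustrate the two foil rules, the sketch below derives the 4-option answer set from the correct and heuristic answers. It is a reconstruction from the verbal description above (names and units are our own), not the script that was used to build the materials.

```python
from math import gcd

def four_options(correct, heuristic):
    """Build the 4-option answer set (in problem units, e.g., cents).

    High foil: the sum of the heuristic and correct options.
    Low foil: the greatest common divisor of the correct and heuristic
    options that is smaller than the correct answer."""
    high_foil = correct + heuristic
    # The common divisors of two numbers are exactly the divisors of
    # their gcd, so we take the largest such divisor below `correct`.
    g = gcd(correct, heuristic)
    low_foil = max(d for d in range(1, correct) if g % d == 0)
    return sorted([low_foil, correct, heuristic, high_foil])

# Original bat-and-ball problem (in cents): correct = 5, heuristic = 10.
print(four_options(5, 10))  # [1, 5, 10, 15]
```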

Exclusion criteria. The same exclusion criteria were applied as in Study 1. In total, 15.3% of participants were excluded because they had seen the bat-and-ball problem before and knew the correct response. We further excluded all trials where participants failed to provide a response within the deadline (7.1% of trials; 7% in the 2-option and 7.3% in the 4-option condition) or did not provide the correct response to the load memorization task (13.3% of trials; 14.9% in the 2-option and 11.5% in the 4-option condition). Altogether, 18.7% of trials were excluded and 2049 trials (out of 2520) were further analyzed.
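The exclusion pipeline that recurs across Study 1-5 can be summarized in a few lines. This is a sketch under assumed column names (familiar, missed_deadline, load_correct); the thesis does not specify its actual processing scripts.

```python
import pandas as pd

def apply_exclusions(trials: pd.DataFrame) -> pd.DataFrame:
    """Apply the recurring exclusion criteria to trial-level data."""
    kept = trials[~trials["familiar"]]     # drop participants who knew the problem
    kept = kept[~kept["missed_deadline"]]  # drop trials answered after the deadline
    kept = kept[kept["load_correct"]]      # drop trials with failed load memorization
    return kept

# In Study 2 this procedure left 2049 of 2520 trials (18.7% excluded).
```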

Method – Study 3: 2-option vs 4-option format (Hungarian student sample)

Participants

In total, 121 Hungarian university students from the Eotvos Lorand University of Budapest were tested (92 female, Mean age = 22.2 years, SD = 1.4 years). Participants received 500 HUF (~$1.7) for taking part. In total, 83% of participants reported high school as their highest completed educational level, and 17% reported that they had already obtained a post-secondary educational degree.

Materials and procedure

We used Hungarian translations of the material but otherwise the Study 3 design was identical to Study 2. The same exclusion criteria were applied. In total, 29.8% of participants were excluded because they had seen the original bat-and-ball problem before and knew the correct answer (85 participants were further analyzed). We further excluded trials where participants failed to provide a response within the deadline (9.4% of trials; 8.3% in the 2-option and 10.5% in the 4-option condition) or did not provide the correct response to the memorization load task (11.9% of trials; 11.9% in both the 2-option and 4-option conditions). Altogether, 19.6% of trials were excluded and 547 trials (out of 680) were further analyzed.

Method – Study 4: free response format (crowdsource sample)

Participants

A total of 47 participants took part in this study (30 female, Mean age = 43.8 years, SD = 15.2 years). Participants were recruited via Crowdflower and received $0.20 for completing the study. 32% of participants reported high school as their highest completed educational level, and 66% reported having a post-secondary educational degree (2% reported less than high school).

Materials and procedure

Study 4 used a free response format. Both in the initial and final response stage participants had to click on a blank field, type in their answer, and click on a button labelled “Next” to advance. Obviously, this procedure takes longer than simple response selection and is affected by variance in typing skill. To avoid unwarranted response rejection we decided to set a liberal deadline of 10 s in the initial response stage. The problem background turned yellow 2 seconds before the deadline. Note that previous studies that adopted a free response format without time restrictions reported response latencies of over 82 s for correct answers (Johnson et al., 2016). Hence, by all means the 10 s deadline remains challenging. In addition, participants still had to give their initial response under secondary task load. Consequently, even with the longer deadline we still minimized System 2 engagement during the initial response stage. Otherwise, the study design was identical to Study 1.

The same exclusion criteria were applied as in Study 1. In total, 14.9% of participants were excluded because they had seen the original bat-and-ball problem before and knew the correct response. We further excluded trials where participants failed to provide a response within the deadline (2.5% of trials) or did not provide the correct response to the memorization load task (12.8% of trials). Altogether, 14.4% of trials were excluded and 274 trials (out of 320) were further analyzed.

Method – Study 5: free response format (Hungarian convenience sample)

Participants

A total of 55 Hungarian volunteers participated in this study (50 female, Mean age = 33.8 years, SD = 9.3 years). Participants were recruited online through social media and completed the experiment online. 33% of participants reported that their highest completed educational degree was high school, while 64% reported having a post-secondary educational degree (4% reported less than high school).

Materials and procedure

The same procedure was used as in Study 4 but with a stricter initial response deadline of 8 s. This was based on our observation that Study 4 participants took on average only 5.1 s (SD = 1.46 s) to enter their initial response. The problem background again turned yellow 2 seconds before the deadline. The same exclusion criteria were applied as in Study 1. In total, 18.2% of participants were excluded because they had seen the original bat-and-ball problem before and knew the correct response. We further excluded trials where participants failed to provide a response within the deadline (11.1% of trials) or did not provide the correct response to the memorization load task (11.7% of trials). Altogether, 19.7% of trials were excluded and 289 trials (out of 360) were further analyzed.

Results

Tables 1 and 2 give an overview of the accuracy and direction of change findings in each study. The bottom rows of the tables show the overall average and the average as a function of the three response formats we used across Study 1-5. The overall average sketches a pattern that is consistent with the Study 1 findings. Most people are biased when solving the conflict bat-and-ball problems, with initial and final accuracies of 13.8% and 16.8%, whereas performance on the control no-conflict versions is at ceiling. Direction of change results for the conflict problems show that there is some correction going on, with 5.2% of “01” cases in which an initial incorrect response is corrected after deliberation in the final response stage. However, the key finding is that the “11” cases, in which a correct final response is preceded by a correct initial response, are twice as likely (10.9%). Indeed, the overall non-correction rate reached 67.6%, which is virtually identical to the 67.2% rate observed in Study 1. When eyeballing the averages for the three response formats separately, it is clear that by and large this overall pattern is observed with each format. There is no evidence for a systematic decrease in correct intuitive responding in the 4-option and free response studies. Statistical analyses showed that the initial accuracy, χ2 (2) = 1.11, p = 0.57, final accuracy, χ2 (2) = 0.67, p = 0.72, as well as the rate of non-corrective correct responding, χ2 (2) = 4.2, p = 0.12, were not significantly affected by the response format. This implies that our core finding is robust: across multiple studies with different response formats (and a range of populations) we consistently observe that when reasoners solve the bat-and-ball problem correctly, they typically give the same answer as their first, intuitive response. This questions the assumption that System 2 is needed to correct our intuitions.

As a side note, an intriguing observation is that in the two studies with participants recruited on the Crowdflower platform (Study 2: 2-option and 4-option formats; Study 4: free response format) we observed overall lower accuracies than in the other studies, both at the initial and final response stage. This trend reached significance at the initial, χ2 (1) = 7.1, p = 0.008, b = 1.44, and at the final response stage, χ2 (1) = 4.35, p = 0.04, b = 1.5, as well. Nevertheless, despite the overall lower accuracy we observe the same key pattern. The non-correction rate did not differ significantly between our Crowdflower studies and the other studies, χ2 (1) = 1.63, p = 0.20.

Table 3 gives an overview of the stability index on the conflict items in Study 2-5. As in Study 1 we observe that the index is consistently high, with an overall average value of 91.3%. This indicates that the direction of change pattern is highly stable at the individual level. If people show a specific direction of change pattern on one conflict problem, they show this same pattern on all conflict problems. This argues against a guessing and practice account. If participants gave correct intuitive responses because they guessed or because the repeated presentation allowed them to automatize the calculations after the first trial(s), their performance should have shown more variability. Nevertheless, there was some variability, and especially with respect to the practice account one might argue that it can be informative to focus exclusively on the very first problem that reasoners solved. In an additional analysis we therefore included only the first conflict problem that reasoners solved and excluded all later trials. Obviously, given that we restrict the analysis to a single trial with only a small number of critical correct (initial and final) responses per study, it should not be surprising that the data are noisier. Nevertheless, the key finding is that even in this single trial analysis, the overall non-correction rate across our studies still reached 42% (individual experiments range from 11% to 75%; average 2-response format = 30.1%, average 4-response = 61.5%, average free response = 40%; see supplementary Table S7 for a full overview). Although the effect is less robust than in the full analysis, this confirms that the critical intuitive correct responding is present from the start.

Having established that the core finding concerning the generation of a correct System 1 intuition is robust, we can dig somewhat deeper into the findings. One interesting open question is whether correct initial responders are faced with two competing intuitions at the first response stage. That is, a possible reason why people in the 11 category manage to give a correct initial response might be that the problem simply does not generate an intuitive heuristic “10 cents” response for them. Hence, they would only generate a correct, logical intuition and would not be faced with an interfering heuristic one. Alternatively, they might generate two competing intuitions, but the correct intuition might be stronger and therefore dominate (Bago & De Neys, 2017).

We can address this question by looking at the contrast between conflict and no-conflict control problems. If conflict problems cue two conflicting initial intuitive responses, people should process the problems differently than the no-conflict problems (in which such conflict is absent) in the initial response stage. Studies on conflict detection during reasoning that used a classic single response paradigm have shown that processing conflict problems typically results in lower confidence and longer response latencies (e.g., Botvinick, 2007; De Neys, 2012; Pennycook, Fugelsang, & Koehler, 2015). The question that we want to answer here is whether this is also the case at the initial response stage. Therefore, we contrasted the confidence ratings and response times for the initial response on the conflict problems with those for the initial response on the no-conflict problems. Our central interest here concerns the 11 cases, but a full analysis and discussion for each direction of change category is presented in the Supplementary Material, section B. In sum, results across our five studies indeed show that 11 responders showed both longer latencies (average increase = 720 ms, SD = 113.8 ms, χ2 (1) = 21.7, p < 0.0001, b = 0.05) and decreased confidence (average decrease = 12.3 points, SD = 1.9 points, χ2 (1) = 43.6, p < 0.0001, b = -12.2) on the conflict vs no-conflict problems. This supports the hypothesis that in addition to the dominant correct logical intuition the opposing heuristic “10 cents” response is also being cued. In other words, System 1 seems to generate two conflicting intuitions – a correct logical one and an incorrect heuristic one – of which one is stronger than the other and gets automatically selected as the initial response without System 2 deliberation.

Finally, one might also want to contrast the confidence ratings for the different direction of change categories. Previous two-response studies (e.g., Bago & De Neys, 2017; Thompson et al., 2011; Thompson & Johnson, 2014) established that initial response confidence was lower for responses that were subsequently changed after deliberation (i.e., the “01” and “10” types) than for responses that were not changed (i.e., the “00” and “11” types). It has been suggested that this lower initial confidence (or “Feeling of Rightness”, as Thompson et al. refer to it) would be one factor that determines whether reasoners will engage in System 2 deliberation (e.g., Thompson et al., 2011). We therefore looked at the average confidence ratings across Study 1-5. To test the confidence trends, we entered direction of change category and/or response stage (initial or final) as fixed factors in our model (with, as in all our analyses, participants and items as random intercepts). Figure S1 in the supplementary material shows that the pattern reported by Thompson et al. (2011) is replicated in the current study: the initial response confidence for the “01” and “10” categories in which people change their initial response is lower than for responses that are not changed, χ2 (1) = 193.9, p < 0.0001, b = -27.2. A similar but less pronounced pattern is observed in the final response stage, χ2 (1) = 39, p < 0.0001, b = -8.9. When contrasting the initial and final confidence we also observe that after deliberation there is an overall trend towards increased confidence in the final response stage, χ2 (1) = 91.6, p < 0.0001, b = 5.3.
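The conflict vs no-conflict contrasts above were tested with mixed models that include participants and items as random intercepts. As a rough, self-contained illustration, the sketch below fits a simplified version on synthetic stand-in data (random intercepts for participants only; all variable names and numbers are invented for the example).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_participants, n_trials = 40, 8

# One row per initial response: conflict = 1 for conflict problems.
df = pd.DataFrame({
    "participant": np.repeat(np.arange(n_participants), n_trials),
    "conflict": np.tile([0, 1], n_participants * n_trials // 2),
})
# Simulate lower initial confidence on conflict trials plus
# participant-level noise (the random intercept).
participant_noise = rng.normal(0, 5, n_participants)
df["confidence"] = (75 - 12 * df["conflict"]
                    + participant_noise[df["participant"]]
                    + rng.normal(0, 10, len(df)))

# The thesis models additionally include items as a second random
# intercept; this sketch simplifies to participants only.
model = smf.mixedlm("confidence ~ conflict", data=df, groups=df["participant"])
print(model.fit().summary())  # the 'conflict' coefficient plays the role of b
```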

Discussion

Study 2-5 replicated the key finding of our first study. Reasoners who manage to solve the bat-and-ball problem after deliberation often already solved it correctly on the basis of mere intuitive processing. This argues against the corrective nature of System 2 deliberation.

But this obviously raises a new question. If we don’t necessarily need System 2 deliberation to correct our intuition, then what do we use or need it for? Why would correct responders ever deliberate, if their intuition already provides them with the correct answer? Here it is important to stress that the fact that people can intuitively generate the correct response does not imply that the intuitive response will have the exact same characteristics as correct responses that are given after proper deliberation. Even if we accept that System 1 and System 2 might both generate a correct response, the processing characteristics will presumably differ. In other words, there should be some boundary conditions as to what reasoners can do on the basis of mere System 1 processing. Study 6 and 7 focus on this issue.

One of the features that is often associated with System 2 deliberation is that it is cognitively transparent (Bonnefon, 2016; Evans & Stanovich, 2013). That is, the output comes “with an awareness of how it was derived” (Bonnefon, 2013). Intuitive processing lacks this explanatory property. Indeed, it is precisely the absence of such processing insight or justification that is often conceived as a defining property of intuitive processing – and one of the reasons to label intuitions as “gut-feelings” (Marewski & Hoffrage, 2015; Mega & Volz, 2014). Bluntly put, this suggests that people might intuitively know and generate a correct answer, but they will not know why it is correct.

In Study 6 and 7 we tested this hypothesis by looking at people’s response justifications after the initial and final response stage. We hypothesized that although intuitive processes might suffice to estimate the correct answer and produce a correct initial response, people should have little insight into this process and fail to justify why their answer is correct. However, after deliberation in the second response stage, such proper response justification should become much more likely.

Study 6

Methods

Participants

We recruited 63 Hungarian university students from the Eotvos Lorand University of Budapest (48 female, Mean age = 22.7 years, SD = 1.9 years). These participants received course credit for taking part. In total, 79% of participants reported high school as their highest completed educational level, while 21% reported that they already had a post-secondary educational degree.


Materials and procedure

Since the primary goal of Study 6 (and 7) was to study participants’ response justifications, we made a number of procedural changes to optimize the justification elicitation. Given that explicit justification might be hard (and/or frustrating) we opted to present only half of the Study 1 problems (i.e., two conflict and two no-conflict versions). These items were chosen randomly from the Study 1 problems. Problem content was counterbalanced as in Study 1 and we also used the binary 2-option response format. The procedure followed the same basic two-response paradigm as in Study 1, with the exception that cognitive load was not applied and participants were not requested to enter their response confidence, so as to further simplify the task design. The same response deadline as in Study 1 (4 s) was maintained. Note that previous work from our team that contrasted deadline and load treatments indicated that a challenging response deadline may suffice to minimize System 2 engagement in a two-response paradigm (see Bago & De Neys, 2017). After both the initial and final response people were asked the following justification question: “Could you please try to explain why you selected this answer? Can you briefly justify why you believe it is correct? Please type down your justification below.” There was no time restriction to enter the justification. Whenever participants missed the response deadline for the reasoning problem, they were not presented with the justification question, but rather with a message which urged them to make sure to enter their response before the deadline on the next item.

Justification analysis. To analyze participants’ justifications we defined 8 main justification categories on the basis of an initial screening. Two independent raters categorized the justification responses into these categories. They were in agreement in 86.1% (378 out of 439) of the cases (a computational sketch of this agreement measure follows the category overview below). Cases in which the individual raters did not agree were afterwards discussed among them with the goal of reaching an agreement (which was reached in all cases). Although our key interest lies in the rate of correct justification, the categorization gives us some insight into the variety of justifications participants spontaneously produce. The justification categories along with illustrative examples are presented below. Note that for illustrative purposes we have rephrased the examples into the original bat-and-ball problem units:

Correct math. People referred to the correct mathematical solution (e.g., “because they cost 1.10 together, and the bat costs 1 more than the ball, the ball will be 5 cents and the bat 1.05”, “5 cents + 1.05 = 1.10”, “110 = x + 100 + x, 10 = 2x, x = 5”, “if the bat is 105 cents and the ball is 5 cents then the bat will be 100 more”).

Incorrect math. Participants referred to some sort of mathematical solution but it was not correct (e.g., “1.10 total, so it’s got to be 10 cents”, “1.10 minus 1 is 10 cents”, “because I subtract the given price from the total price, then the rest will be the price of the good in question”, “1.10 – 1 = 10”)⁹.

Unspecified math. Participants referred to mathematical calculation but did not specify the calculation (e.g., “I just did the math”, “mental calculation”, “this is how the solution comes out with mathematical calculations”, “result of my calculation”).

Hunch. People referred to their gut feeling or intuition (e.g., “this is what seemed best, in the moment of the decision”, “this was more sympathetic”, “I saw the numbers and based my decision on my intuition”, “this automatically came to me as soon as I saw the problem”).

Guess. Participants referred to guessing (e.g., “I guessed”, “I could not read it because it was so fast, just clicked on something”, “I couldn’t really think about the correct solution so I guessed”, “was my best guess”).

Previous. Participants referred to a previous answer without specifying it (e.g., “the justification is the same as before”, “my way of thinking is similar to the one I used in the previous task”, “I applied same logic as before”, “see before”).

Other. Any justification that did not fit in the other categories (e.g., “I was not sure because I do not have enough time to think it through”, “I cannot think and read at the same time”, “Hard to tell”, “Cannot justify it because I had to answer so quickly that I already forgot”).

Exclusion criteria. The same exclusion criteria as in Study 1 were applied: 17.5% of participants were excluded because they had seen the original bat-and-ball problem before and knew the correct response. We further excluded trials where participants failed to provide a response within the deadline (13% of trials). Altogether, 181 trials (out of 208) were further analyzed.
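For transparency, the inter-rater agreement reported above is the simple proportion of justifications that both raters placed in the same category (378/439 = 86.1%). A minimal sketch with hypothetical labels:

```python
def percent_agreement(rater1, rater2):
    """Share of justifications placed in the same category by both raters."""
    assert len(rater1) == len(rater2)
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

# Hypothetical labels for three justifications (the real study had 439):
r1 = ["correct_math", "hunch", "guess"]
r2 = ["correct_math", "guess", "guess"]
print(percent_agreement(r1, r2))  # 0.667; the thesis reports 378/439 = 86.1%
```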

9 To avoid confusion, note that for the no-conflict control trials, references to the subtraction solution “1.10 – 1 = 10” (i.e., an incorrect math justification on the conflict problems) were obviously scored as correct justifications.


Results and discussion

The accuracy and direction of change pattern in Study 6 was consistent with the pattern we observed in Study 1-5: people typically fail to solve the bat-and-ball problem (37.8% final conflict accuracy), but among correct responders the correct response is frequently generated as the initial response (non-correction rate of 59%, see Tables 4 and 5 for full details). But clearly, the focus in Study 6 concerns the justifications. We are primarily interested in the proportion of correct justifications for correct conflict responses: in those cases where participants managed to respond correctly, could they also properly justify why their answer was correct? Correct justifications were defined as any minimal reference to the correct mathematical solution (e.g., “because they cost 1.10 together, and the bat costs 1 more, the ball must be 5 cents and the bat 1.05”, “5 cents + 1.05 = 1.10”, “110 = bat + ball, bat = ball + 100, so ball = 10/2”). In these cases we can be sure that participants have some minimal insight into the nature of the correct solution.

Table 6 gives an overview of our justification classification for the critical conflict problems (see Supplementary Material Table S5 for the no-conflict problems and Table S9 for the conflict justifications for the individual direction of change categories). The key finding is that correct justifications were indeed much more likely after deliberation than after intuitive responding. Correct justifications on the conflict problems tripled from 20.7% for initial correct responses to 60.6% for final correct responses, χ2 (1) = 12.4, p < 0.001, b = -2.9. This presents some initial support for a boundary condition of correct System 1 responding. Although correct responders often generate the correct solution intuitively, they typically only manage to justify it after deliberation.

Table 4. Initial and final accuracies (SD) in the justification studies (Study 6-7).

                                      Conflict items                     No-conflict control items
Study      Response format    Initial          Final            Initial          Final
Study 6    2 response         32.2% (32.2)     37.8% (37.8)     86.8% (86.8)     94.5% (94.5)
Study 7a   2 response         28.9% (45.7)     34.2% (47.8)     90.8% (29.1)     100% (0)
Study 7b   Free response      17% (37.8)       48.9% (50.3)     92.5% (26.4)     100% (0)

However, one might note that our open justification format was quite noisy. There were certain types of justifications that were hard to interpret. For example, in the “Unspecified math” category we grouped answers in which participants indicated they “calculated” the response but did not explain how (e.g., “I did the math”). These were most common when participants gave an incorrect response (29% of incorrect cases), but were also observed for correct responses (3% of correct cases). Clearly, it is possible that people knew the correct justification, but simply felt there was no need to specify or clarify it. Similarly, participants sometimes also wrote that they did “what they did before” (i.e., the “Previous” category, 13% of correct responses). Here too it is possible that people could justify the correct solution but did not bother to specify it. To sidestep such interpretational complications, we used a more structured justification elicitation in Study 7. A number of additional methodological improvements also allowed us to further validate the findings.

Table 5. Frequency of each direction of change category for the conflict items in each justification study (Study 6-7). Raw numbers of trials are in brackets.

Study      Response format    11           00           10          01           Non-correction (11/(11+01))
Study 6    2 response         22.2% (20)   52.2% (47)   10% (9)     15.5% (14)   58.8%
Study 7a   2 response         21.1% (16)   57.9% (44)   7.9% (6)    13.2% (10)   61.5%
Study 7b   Free response      14.8% (13)   48.9% (43)   2.3% (2)    34.1% (30)   30.2%
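To make the bookkeeping behind Table 5 explicit, the sketch below codes each trial by the correctness of its initial and final response and computes the non-correction rate, i.e., the share of correct final responses that were already correct initially, 11/(11+01). Field names are illustrative.

```python
from collections import Counter

def direction_of_change(initial_correct, final_correct):
    """Code a trial as '11', '10', '01', or '00' (initial, final)."""
    return f"{int(initial_correct)}{int(final_correct)}"

def non_correction_rate(trials):
    """Share of correct final responses already correct initially."""
    counts = Counter(direction_of_change(i, f) for i, f in trials)
    return counts["11"] / (counts["11"] + counts["01"])

# Study 6 conflict counts from Table 5: 20 '11' trials and 14 '01' trials.
trials = [(True, True)] * 20 + [(False, True)] * 14
print(round(non_correction_rate(trials), 3))  # 0.588, i.e., 58.8%
```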

Study 7

Methods

Participants

A total of 128 Hungarian undergraduates from the Eotvos Lorand University of Budapest were tested (103 female, Mean age = 20.3 years, SD = 1.9 years). Participants received course credit for taking part. 87.5% of participants reported that their highest completed educational level was high school, and 12.5% reported that they already had a post-secondary educational degree.


Materials and procedure

The procedure was based on Study 6 with several methodological changes. The main difference concerned the justification elicitation. To reduce interpretation noise we adopted a semi-structured justification format with four pre-defined answer options that were based on the most frequent justification responses in Study 6. The following illustrates the layout:

Could you please justify why you think that this is the correct response to the question? Please choose from the presented options below:

° “I did the math. Please specify how: ____”
° “I guessed”
° “I decided based on intuition/gut feeling”
° “Other, please specify: ____”

For the first and fourth answer options participants were also asked to specify their answer. Our rationale was that this format should clarify that we expected them to enter a specification and thereby minimize mere unspecified references to “math/calculations” or “same as before” type responses. As in Study 6, participants were presented with 2 conflict and 2 no-conflict problems. In Study 7 we used further modified content adapted from Trouche (2016; see also Mata et al., 2017). Problems had the same structure as the original bat-and-ball problem and our Study 6 items, but instead of listing the price of two goods they referred to a different unit (e.g., weight). This should further reduce any possible familiarity effect. Here is an example:

An apple and an orange weigh 160 grams together. The apple weighs 100 grams more than the orange. How much does the orange weigh?
° 60 grams
° 30 grams

As in our other studies, two problem sets were used in order to counterbalance item content; the conflict items in one set were the control items in the other, and vice-versa. Participants were randomly assigned to one of the sets. A full overview of all problems can be found in the Supplementary Material (section A). In Study 6 we used the binary, 2-option response format for the reasoning problems. In Study 7 we used both a 2-option and a free-response format. Participants were randomly assigned to one of the treatments. The two-response design was identical to Study 6. The response deadline in the 2-option condition was again 4 s. In the free response condition the deadline was set at 7 s.


In Study 7 we also recorded the time it took participants to enter their justification. Note that although correct justifications were much more likely after the final response in Study 6, there were still about 20% correct justifications for the initial response on our conflict bat-and-ball problems. One possibility is that some participants used the initial justification stage to start deliberating about their answer. If this is the case, one might expect that the justification response times for correct initial justifications will be affected. The justification latency results in Study 7 lend some support to this hypothesis. Although there were only a handful of correct initial justifications (n = 6), these did tend to take considerably longer (mean = 33.4 s, SD = 2.9 s) than when participants entered an incorrect initial math justification (mean = 17.7 s, SD = 2.04 s, χ2 (1) = 3.01, p = 0.08) or a correct initial math justification on no-conflict problems (mean = 17.87 s, SD = 2.9 s, χ2 (1) = 4.6, p = 0.032, b = 0.25). This suggests that some caution might be needed when interpreting the few correct initial response justifications.

As an additional manipulation check we also included a bogus test question at the end of the survey in Study 7 (Meade & Craig, 2012). Participants might not answer truthfully to our familiarity check (“Have you seen this problem before?”) because they feel it is undesirable to answer affirmatively (e.g., because of fears of being screened out of the study, Wyse, 2013). We included the bogus question “Have you ever lied in your life?” to identify such a possible tendency. However, all participants passed this check question and answered it affirmatively.

Exclusion criteria. The same exclusion criteria as in our other studies were applied. 17.2% of participants were excluded because they had seen the original bat-and-ball problem before and knew the correct response. We further excluded trials where participants failed to provide a response within the deadline (18.5% of trials in the 2-option condition; 12.9% of trials in the free response condition). Altogether, 15.6% of trials were excluded and 358 trials (out of 424) were further analyzed.

Results and discussion

Accuracy and direction of change findings for the 2-option and free response conditions in Study 7 can be found in Tables 4 and 5. As the tables show, results for the 2-option condition are perfectly in line with the 2-option findings in Study 6 and our other studies: in the vast majority of cases people fail to solve the conflict versions of the bat-and-ball problem, but those who do solve it correctly often already do so intuitively. The non-correction rate in the 2-option condition reached 62%.

However, the pattern in the free response format clearly diverged. Final accuracy reached 48.9% here; the highest rate we observed in any of our studies. The direction of change analysis indicates that this was driven by an extremely high rate of “01” responses (34%) in which the correct response was generated after deliberation. This differed significantly from the 2-option condition rate, χ2 (1) = 9.1, p = 0.002, b = 1.48. However, the “11” response rate (i.e., correct final responses that are preceded by a correct initial response) did not differ from the 2-option condition, χ2 (1) = 0.12, p = 0.72, b = -0.48¹⁰, and was in line with what we observed previously. What this suggests is that participants in the free response justification condition showed higher accuracy, not because it was easier to solve the problem intuitively but because it was easier to arrive at the correct response after deliberation. It seems that the combination of being asked to generate your own response and having to justify it boosted deliberation. This boosted deliberation resulted in a non-correction rate of 30%, which is the lowest we observed in any of our studies¹¹.

Although the positive impact on deliberation is interesting in its own right, it does not impact our main justification goal. The key question remains to what extent people can justify their correct initial and final responses: in those cases where participants managed to respond correctly, could they also properly justify why their answer was correct? Table 6 shows the justification results for the conflict problems. We replicate the main finding of Study 6. Both in the 2-option, χ2 (1) = 54.5, p < 0.0001, b = -64.6, and the free response format condition, χ2 (1) = 22.3, p < 0.0001, b = -3.3, correct justifications are much more likely for correct final than for correct initial responses. With the structured justification elicitation in Study 7 we obtained correct justifications in over 90% of final correct responses (2-option: 96.2%; free response: 90.7%). This directly establishes that whenever people give a correct response after deliberation, they have little trouble justifying it. However, such justification is much rarer for correct initial responses (2-option: 9.1%; free response: 26.7%). A closer look at Table 6 shows that the dominant justifications for correct initial responses were references to intuition and guessing. These types of justifications were completely absent for correct final responses.

Taken together, the results of our justification studies provide clear evidence for a boundary condition of correct System 1 intuitions. We can estimate the correct answer intuitively, but we don’t know how we do it. Our System 1 knowledge is not cognitively transparent (Bonnefon, 2016).

10 The random effect of items was left out of the model because of convergence problems.

11 For completeness, we also looked at response accuracy on the first conflict problem in our justification studies. Across Study 6 and 7 the non-correction rate was 33.3% (individual studies range from 23.3% to 60% non-correction; average 2-response format = 43.3%, free response = 23.3%, see supplementary Table S8). Although these data concern a limited number of observations, they further indicate that correct intuitive responding is observed from the start of the experiment, as we observed in Study 1-5.

Table 6. Frequency of different types of justifications for conflict items in Study 6 and 7 (raw number of justifications in brackets).

                                         Initial response              Final response
Study          Justification             Correct       Incorrect       Correct       Incorrect
Study 6        Correct math              20.7% (6)     1.6% (1)        60.6% (20)    5.6% (3)
(2 response)   Incorrect math            -             24.6% (15)      -             33.3% (18)
               Unspecified math          -             14.8% (9)       6.1% (2)      44.4% (24)
               Hunch                     3.4% (1)      6.6% (4)        6.1% (2)      -
               Guess                     34.5% (10)    9.8% (6)        3% (1)        3.7% (2)
               Previous                  20.7% (6)     6.6% (4)        6.1% (2)      1.9% (1)
               Other                     20.7% (6)     36.1% (22)      18.2% (6)     11.1% (6)

Study 7        Correct math              9.1% (2)      1.9% (1)        96.2% (25)    2% (1)
(2 response)   Incorrect math            -             37% (20)        -             69.4% (34)
               Unspecified math          9.1% (2)      9.3% (5)        3.8% (1)      16.3% (8)
               Hunch                     45.5% (10)    27.8% (15)      -             2% (1)
               Guess                     36.4% (8)     24.1% (13)      -             10.2% (5)
               Other                     -             -               -             -

Study 7        Correct math              26.7% (4)     -               90.7% (39)    2.2% (1)
(free response) Incorrect math           -             38.6% (28)      -             68.9% (31)
               Unspecified math          6.7% (1)      4.1% (3)        4.7% (2)      17.8% (8)
               Hunch                     46.7% (7)     30.1% (22)      -             4.4% (2)
               Guess                     13.3% (2)     23.3% (17)      -             6.7% (3)
               Other                     6.7% (1)      4.1% (3)        4.7% (2)      -

General Discussion

Influential work in the reasoning and decision making field since the 1960s has popularized a corrective view of human reasoning (Evans & Stanovich, 2013; Kahneman, 2011). This view entails that sound reasoning often requires correction of fast, intuitive thought processes by slower and more demanding deliberation. The present study questions this idea. We focused on the very problem that has been widely featured as the paradigmatic illustration of the corrective view, the bat-and-ball problem (e.g., Frederick, 2005; Kahneman, 2011). By adopting a two-response paradigm in which people were required to give an initial response under time-pressure and cognitive load we were able to empirically identify the intuitively generated response that preceded the final response given after deliberation. Across our studies we consistently observed that correct final responses are often non-corrective in nature. In a substantial number of cases, reasoners who manage to answer the bat-and-ball problem correctly after deliberation already solved it correctly when they reasoned purely intuitively in the initial response phase. In other words, people who solve the bat-and-ball problem do not necessarily need to deliberate to correct their intuitions; their intuitions are often already correct.

These findings point to at least two fundamental implications for the way we conceive of intuitive and deliberate thinking, or System 1 and 2. On the one hand, they require us to upgrade our view of the intuitive System 1. Although System 1 can frequently cue incorrect intuitions, it also generates correct intuitions. Among correct responders it is these correct intuitions that will often dominate. Consequently, even when we’re faced with the notorious bat-and-ball problem, intuitive thinking is less ignorant, and “smarter”, than traditionally assumed. On the other hand, the upgrading of System 1 also implies that we need to revise the role of System 2. When the correct response can be generated intuitively, the central role of System 2 deliberation cannot exclusively lie in a correction process. The results of our justification studies suggest that instead of correction, the contribution of deliberate processing in these cases might rather lie in its cognitive transparency (Bonnefon, 2016). We observed that whereas people don’t manage to explain why their intuitive response is correct, they seem to have little difficulty providing such correct justifications after deliberation.

Clearly, being able to produce a proper justification for one’s insights is quite crucial. This was well understood by the leading scientists we cited to illustrate the “intuition-as-a-guide” view: although Einstein and Poincaré wanted to highlight the key role of intuitive processes, they also stressed the importance of subsequent deliberation. Bluntly put, Kekulé and Newton would not have managed to convince their peers if they had simply claimed their ideas were correct because they “felt it”. Hence, even among the historical proponents of the key role of intuitive thinking for sound reasoning there was never any question that intuitive insight will need further reflection and validation to be fully developed. In other words, the initial intuitive insight is important but does not suffice. What the present study suggests is that this view on intuitive reasoning, in which deliberation helps to validate an initial intuitive insight, might be a more appropriate model of human reasoning than a view in which the key function of deliberation merely lies in the correction of erroneous intuitions.

In recent years there have been a number of popular accounts that have celebrated the advantages of intuitive thinking over deliberate thinking (Dijksterhuis, 2011; Gigerenzer, 2007; Gladwell, 2005). Against this backdrop it should be stressed that our call to upgrade the role of System 1 and our arguments against the corrective view of System 2 should not be conceived as a claim to downgrade the importance of System 2 deliberation. First, across all our studies there were always instances in which System 2 correction did occur (i.e., “01” cases). Hence, the prototypical corrective pattern in which an initially faulty intuition is corrected after deliberation is also observed. Second, as we alluded to above, the fact that deliberation does not necessarily play a role in correction does not imply it is not important for other reasons. Our findings suggest that one such reason might be its cognitive transparency and the fact that after deliberation people are able to come up with a proper justification. Hence, deliberation can help to produce a good explanation or argument for why a response is correct. Such arguments are critical for communicative purposes (e.g., Mercier & Sperber, 2011). What is true for scientific discussions is also true for daily life: we will not be very successful in convincing others that our answer to a problem is correct if we can only tell them that we feel it is right. If we come up with a good explanation, however, people will be much more likely to change their mind (Trouche, Sander, & Mercier, 2014). Such argumentative persuasion has been argued to be the evolutionary driving force behind the development of the human capacity to reason (Mercier & Sperber, 2011). Indeed, human success as a social and cultural species is hard to imagine without an ability to communicate and transmit good problem solutions¹². Hence, it would be foolish to interpret our findings and arguments against the corrective view of deliberation or System 2 as evidence against the role of deliberation in thinking per se.

In addition, one needs to bear in mind that although our findings present evidence for the possible non-corrective nature of correct responding, most people are still biased and fail to give the correct answer when solving the bat-and-ball problem. In absolute numbers, incorrect “10 cents” responses are still much more common than correct “5 cents” responses.

12 Clearly, deliberation might have further additional benefits beyond communication per se. For example, another value of deliberate explanation might lie in the improvement of one’s own understanding, which can facilitate knowledge transfer to other relevant problems (e.g., Wertheimer, 1945).


Solving the bat-and-ball problem correctly is still exceptional. The key issue is that in those cases it does occur, the correct response is often already generated intuitively. But in absolute terms such correct intuitive response generation remains rare. Obviously, the point is not that System 1 is always correct, the point is that it can be correct and is often already so for reasoners who respond correctly after having deliberated. Likewise, our findings should not be taken to imply that System 2 deliberation cannot be used to correct faulty intuitions or that such correction is never required. The key assumption we test in the present study is whether correct responding results from deliberate correction of a faulty intuition. We examined whether sound reasoners respond correctly precisely because they manage to complete this correction process. Our results show that this is not necessarily the case. Often, correct responders have nothing to correct. However, this does not imply that correction is redundant for everyone. Our results do not imply that everyone manages to generate the correct answer intuitively. As we noted, the vast majority of reasoners gives the faulty intuitive “10 cents” response both at the initial response stage and after deliberation. Hence, not everyone will generate a correct (intuitive) response. For most reasoners, the incorrect intuition will dominate. Consequently, our empirical results directly argue against the idea that correction of System 1 is never needed. And it might very well be the case that additional deliberation could be helpful in these cases. Imagine that we devise an intervention procedure that allows us to train biased reasoners to deliberately correct. This might very well reduce bias and boost correct responses. Our results do not speak to this issue. Hence, the present findings do not imply that interventions are pointless or that deliberate correction is impossible or redundant. Our point here is that spontaneous sound reasoning does not necessarily require such correction. It is this central assumption of the corrective view that our results question. As we clarified in the introduction, literally hundreds of studies have focused on the bat-and-ball problem in the last ten years. One common objection to our study might be that if the non-correction phenomenon and correct “5 cents” intuitions are really so ubiquitous, then why has this phenomenon not been documented previously? We believe that the simple answer is that scholars haven’t really looked for it. Note that we were only able to identify the correct intuitions by carving up the reasoning process with our two-response paradigm. In a traditional “one-response” experiment System 1 and 2 processing will go hand in hand. That is, the correct intuitive responders will typically back up their System 1 processing with System 2 deliberation to validate their answer. Reasoners will not end their reasoning process after they have come up with a correct intuitive response. This implies, for example, that the

118 Bago Bence – Thèse de doctorat - 2018 final response generation for those who give the correct response will still take longer than for those who give an incorrect response and do not engage (or engage less profoundly) in System 2 deliberation13. It is only by experimentally isolating the initial reasoning stage that we were able to demonstrate the correct nature of the initially generated response in these cases. Bluntly put, it is unlikely that a pure correct intuitive response will be observed “in the wild”. Just as with other non-naturalistically perceivable scientific phenomena (Niaz, 2009) we suspect that this helps to explain why the non-corrective nature of System 2 deliberation has gone largely unnoticed in empirical studies. To be very clear, we are not the first to point towards the potential of intuitive processing. We referred to the intuition-as-guide view to illustrate how more than a century ago leading scientists already argued that the origin of their key insights relied on intuitive processing. Furthermore, within the cognitive sciences various scholars have developed related ideas in a range of frameworks (e.g., Dijksterhuis’ Unconscious Thinking Theory, 2011; Gigerenzer’s Fast and Frugal Heuristics, 2007; Klein’s Naturalistic Decision Making, 2004; Reyna’s Fuzzy-Trace Theory, 2012). More specifically, Peters (2012) has explicitly raised the possibility that good reasoners might manage to arrive at the correct response in the bat-and-ball problem precisely because they have correct intuitions. The critical contribution of our study lies in the empirical demonstration of this phenomenon. More generally, one might argue that even the traditional dual process framework can accommodate the present findings with some additional qualification. One general feature of dual process models is that with practice and experience processes that initially need System 2 deliberation can be automatized and handled by System 1 (Epstein, 1994; Evans & Stanovich, 2013; Kahneman, 2011; Sloman, 1996). In a way, such automatization is precisely what we hope to achieve in many teaching or learning contexts (e.g., Schneider & Shiffrin, 1977). Solving the bat-and- ball problem boils down to solving the algebraic equation “X + Y = 1.10, Y = 1 +X, Solve for X”. This is something that all educated adults have done at length in their high school math classes. One can account for the present findings by assuming that years of exposure to this type of problem solving helped sound reasoners to automatize the solution process. Consequently, there would no longer be a need for deliberate correction. We would not object to the idea that such automatization may lie at the heart of the currently observed correct

13 This also implies that one’s performance on the bat-and-ball or related problems (e.g., items from the Cognitive Reflection test, Frederick, 2005) are still a valid measure of one’s tendency to reflect or deliberate. The present data indicate that correct responders are still more likely to deliberate (or are better at deliberation) than incorrect responders (i.e., after deliberation they manage to justify their response). The point is simply that the nature of this deliberation process does not necessarily lie in a correction process but rather in a justification process.

119 Bago Bence – Thèse de doctorat - 2018 intuitive responding. Hence, although the current findings argue against the traditional corrective dual process view they do not necessarily invalidate the wider framework itself. But it underscores the need for any viable dual process model to fully recognize and embrace the potential of System 1. Proponents of the corrective view can try to point to a fundamental methodological limitation of our study (or any empirical study that aim to disentangle intuitive and deliberate processing). We used a two-response paradigm in which we tried to make sure that the initial responses were intuitive in nature by combing an instruction, time-pressure, and load manipulation. All these manipulations have been previously shown to limit System 2 deliberation. By combining them we believe we created one the most stringent and purest operationalisations of System 1 processing that have been adopted in the dual process literature to date. However, critics might argue that we can never be completely sure that we eliminated all System 2 processing. In theory, this is correct. The general problem is that dual process theories are underspecified (Kruglanski, 2013). The framework entails that System 2 is slower and more demanding than System 1 but gives us no unequivocal a priori criterion that allows us to classify a process as System 1 or 2 (e.g., takes at least x time, or x amount of load). Consequently, as long as we keep on observing correct initial responses, one can always argue that these will disappear “with just a little bit more load/time pressure”. Note, however, that the corrective assumption becomes unfalsifiable at this point. Any negative evidence can always be explained away by arguing that the procedure did not fully rule out deliberation. In sum, we believe that by all currently adopted practical standards the processing in the initial response phase in our experiments should be considered to be intuitive in nature. Nevertheless, we readily acknowledge that we can never claim with absolute certainty that all deliberate processing was ruled out14. Another objection against the current work might be that it focused exclusively on the bat-and-ball problem. One easy way out for proponents of the corrective view would be to argue that our findings simply imply that the field has mistakenly characterized the bat-and-

14 One related argument might be that our "intuitive" correct responders' System 2 is so efficient that it works much faster and with less executive burden than that of others. Hence, rather than showing that correct responding is intuitive, we might have identified a type of very fast and undemanding System 2 processing among some highly gifted reasoners. In our opinion, this line of reasoning defies the whole purpose of trying to distinguish between two types of processing. In theory, there will always be a point at which it is impossible to distinguish very fast, efficient deliberation from intuition. We can only think of two systems within certain operational boundaries. Hence, there would be no principled way of determining which system is at play here.

Hence, the corrective view could be maintained but it would simply need to change its poster boy. This is problematic for several reasons. First, if we post hoc dismiss every finding that does not fit the predictions as the exceptional case of an atypical task, we end up with a framework that has hardly any explanatory power. But more critically, the current findings have also been observed with other classic reasoning tasks such as belief-bias syllogisms and base-rate neglect tasks (Bago & De Neys, 2017; Newman et al., 2017). Hence, it is not the case that the observed non-correction is some idiosyncratic peculiarity of the bat-and-ball problem that would fail to generalize to other tasks. Nevertheless, belief-bias and base-rate neglect problems are easier (i.e., show lower bias rates) than the bat-and-ball problem and there is less a priori agreement on how representative they are as tests of the corrective view (Aczel et al., 2016; Evans, 2018; Mata et al., 2017; Singmann et al., 2014; Travers et al., 2016; Pennycook et al., 2012). By showing that the corrective prediction does not hold up in the specific case that is considered to be one of its strongholds, we believe we provide a critical test that forces us to question a strict corrective dual process view of deliberation. Deliberation is undoubtedly critical for human thinking, but sound reasoners do not necessarily need it to correct faulty intuitions.


REFERENCES

Aczel, B., Szollosi, A., & Bago, B. (2016). Lax monitoring versus logical intuition: The determinants of confidence in conjunction fallacy. Thinking & Reasoning, 22(1), 99–117.
Alós-Ferrer, C., Garagnani, M., & Hügelschäfer, S. (2016). Cognitive reflection, decision biases, and response times. Frontiers in Psychology, 7.
Bago, B., & De Neys, W. (2017). Fast logic?: Examining the time course assumption of dual process theory. Cognition, 158, 90–109.
Bonnefon, J.-F. (2013). New ambitions for a new paradigm: Putting the psychology of reasoning at the service of humanity. Thinking & Reasoning, 19(3–4), 381–398.
Bonnefon, J.-F. (2016). The pros and cons of identifying critical thinking with System 2 processing. Topoi, 1–7.
Botvinick, M. M. (2007). Conflict monitoring and decision making: Reconciling two perspectives on anterior cingulate function. Cognitive, Affective, & Behavioral Neuroscience, 7(4), 356–366.
Bourgeois-Gironde, S., & Van Der Henst, J.-B. (2009). How to open the door to System 2: Debiasing the bat-and-ball problem. In S. Watanabe, A. P. Blaisdell, L. Huber, & A. Young (Eds.), Rational animals, irrational humans (pp. 235–252).
Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaïveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46(1), 112–130.
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7(1), 28–38.
De Neys, W., Rossi, S., & Houdé, O. (2013). Bats, balls, and substitution sensitivity: Cognitive misers are no happy fools. Psychonomic Bulletin & Review, 20(2), 269–273.
De Neys, W., & Schaeken, W. (2007). When people are more logical under cognitive load. Experimental Psychology, 54(2), 128–133.
De Neys, W., & Verschueren, N. (2006). Working memory capacity and a notorious brain teaser: The case of the Monty Hall Dilemma. Experimental Psychology, 53(2), 123–131.
Dijksterhuis, A. P. (2011). Het slimme onbewuste [The smart unconscious]. Prometheus.
Ellenberg, J. (2015). How not to be wrong: The power of mathematical thinking. Penguin.
Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49(8), 709–724.
Evans, J. St. B. T. (2010). Thinking twice: Two minds in one brain. Oxford: Oxford University Press.
Evans, J. St. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Evans, J. St. B. T. (2018). Dual process theories: Perspectives and problems. In W. De Neys (Ed.), Dual Process Theory 2.0 (pp. 137–155). Oxon, UK: Routledge.


Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15(2), 105–128.
Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives, 19(4), 25–42.
Gigerenzer, G. (2007). Gut feelings: The intelligence of the unconscious. London: Penguin Books.
Gladwell, M. (2005). Blink: The power of thinking without thinking. New York: Little, Brown and Company.
Haigh, M. (2016). Has the standard Cognitive Reflection Test become a victim of its own success? Advances in Cognitive Psychology, 12(3), 145–149.
Hoover, J. D., & Healy, A. F. (2017). Algebraic reasoning and bat-and-ball problem variants: Solving isomorphic algebra first facilitates problem solving later. Psychonomic Bulletin & Review, 1–7.
Johnson, E. D., Tubau, E., & De Neys, W. (2016). The doubting System 1: Evidence for automatic substitution sensitivity. Acta Psychologica, 164, 56–64.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
Klein, G. A. (2004). The power of intuition: How to use your gut feelings to make better decisions at work. Crown Business.
Kruglanski, A. W. (2013). Only one? The default interventionist perspective as a unimodel—Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 242–247.
Lehrer, J. (2011). The science of irrationality. Retrieved from https://www.wsj.com/articles/SB10001424052970203633104576625071820638808
Levitt, S. D., & Dubner, S. J. (2010). Freakonomics (Vol. 61). Sperling & Kupfer editori.
Marewski, J. N., & Hoffrage, U. (2015). Modeling and aiding intuition in organizational decision making. Journal of Applied Research in Memory and Cognition, 4, 145–311.
Mastrogiorgio, A., & Petracca, E. (2014). Numerals as triggers of System 1 and System 2 in the 'bat and ball' problem. Mind & Society, 13(1), 135–148.
Mata, A., & Almeida, T. (2014). Using metacognitive cues to infer others' thinking. Judgment and Decision Making, 9(4), 349–359.
Mata, A., Ferreira, M. B., Voss, A., & Kollei, T. (2017). Seeing the conflict: An attentional account of reasoning errors. Psychonomic Bulletin & Review, 1–7.
Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17(3), 437–455.
Mega, L. F., & Volz, K. G. (2014). Thinking about thinking: Implications of the introspective error for default-interventionist type models of dual processes. Frontiers in Psychology, 5.
Mercier, H., & Sperber, D. (2011). Why do humans reason? Arguments for an argumentative theory. Behavioral and Brain Sciences, 34(2), 57–74.
Miyake, A., Friedman, N. P., Rettinger, D. A., Shah, P., & Hegarty, M. (2001). How are visuospatial working memory, executive functioning, and spatial abilities related? A latent-variable analysis. Journal of Experimental Psychology: General, 130(4), 621–640.


Newman, I., Gibb, M., & Thompson, V. A. (2017). Rule-based reasoning is fast and belief-based reasoning can be slow: Challenging current explanations of belief-bias and base-rate neglect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(7), 1154–1170.
Niaz, M. (2009). Critical appraisal of physical science as a human enterprise: Dynamics of scientific progress (Vol. 36). Springer Science & Business Media.
Oesper, R. E. (1975). The human side of scientists. University Publications, University of Cincinnati.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2012). Are we good at detecting conflict during reasoning? Cognition, 124(1), 101–106.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.
Pennycook, G., & Thompson, V. A. (2012). Reasoning with base rates is routine, relatively effortless, and context dependent. Psychonomic Bulletin & Review, 19(3), 528–534.
Peters, E. (2012). Beyond comprehension: The role of numeracy in judgments and decisions. Current Directions in Psychological Science, 21(1), 31–35.
Poincaré, H. (1914). Science and method (F. Maitland, Trans.; preface by B. Russell). London: Thomas Nelson and Sons.
Reyna, V. F. (2012). A new intuitionism: Meaning, memory, and development in fuzzy-trace theory. Judgment and Decision Making, 7(3), 332–359.
Schooler, J. W., Ohlsson, S., & Brooks, K. (1993). Thoughts beyond words: When language overshadows insight. Journal of Experimental Psychology: General, 122(2), 166–183.
Sinayev, A., & Peters, E. (2015). Cognitive reflection vs. calculation in decision making. Frontiers in Psychology, 6, 532.
Singmann, H., Klauer, K. C., & Kellen, D. (2014). Intuitive logic revisited: New data and a Bayesian mixed model meta-analysis. PloS One, 9(4), e94223.
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.
Spivey, M. J., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward phonological competitors. Proceedings of the National Academy of Sciences of the United States of America, 102(29), 10393–10398.
Stieger, S., & Reips, U.-D. (2016). A limitation of the Cognitive Reflection Test: Familiarity. PeerJ, 4, e2395.
Stupple, E. J., Pitchford, M., Ball, L. J., Hunt, T. E., & Steel, R. (2017). Slower is not always better: Response-time evidence clarifies the limited role of miserly information processing in the Cognitive Reflection Test. PloS One, 12(11), e0186404.
Szaszi, B., Szollosi, A., Palfi, B., & Aczel, B. (2017). The cognitive reflection test revisited: Exploring the ways individuals solve the test. Thinking & Reasoning, 1–28.
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20(2), 215–244.
Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140.
Travers, E., Rolison, J. J., & Feeney, A. (2016). The time course of conflict on the Cognitive Reflection Test. Cognition, 150, 109–118.


Trouche, E. (2016). Le raisonnement comme compétence sociale: Une comparaison expérimentale avec les théories intellectualistes [Reasoning as a social competence: An experimental comparison with intellectualist theories]. Unpublished PhD thesis.
Trouche, E., Sander, E., & Mercier, H. (2014). Arguments, more than confidence, explain the good performance of reasoning groups. Journal of Experimental Psychology: General, 143(5), 1958–1971.
Wyse, S. E. (2013). Don't disappoint your online survey respondents. Retrieved from https://www.snapsurveys.com/blog/disappoint-online-survey-respondents/


Chapter 3: Advancing the specification of dual process models of higher cognition: a critical test of the hybrid dual process model

Abstract

Dual process models of higher cognition have become very influential in the cognitive sciences. The popular Default-Interventionist model has long favored a serial view on the interaction between intuitive and deliberative processing (or System 1 and System 2). Recent work has led to an alternative hybrid model view in which people’s intuitive reasoning performance is assumed to be determined by the absolute and relative strength of competing intuitions. In the present study, we tested unique new predictions to validate the hybrid model. We adopted a two-response paradigm with popular base-rate neglect problems. By manipulating the extremity of the base-rates in our problems we aimed to affect the strength of the logical intuition that is hypothesized to cue selection of the correct base-rate response. Consistent with the hybrid model predictions, we observed that experimentally reducing the strength of the logical intuition decreased the number of correct initial (i.e., “intuitive”) responses when solving problems in which heuristic and logical intuitions conflicted. Critically, incorrect intuitive responders were less likely to register the intrinsic conflict (as reflected in decreased confidence) in this case, whereas correct intuitive responders registered more conflict. Implications and remaining challenges for dual process theorizing are discussed.

Based on Bago, B., & De Neys, W. (under review). Advancing the specification of dual process models of higher cognition: a critical test of the hybrid dual process model


Introduction

For centuries, human thinking has been portrayed as an interplay between intuitive and deliberate thought processes. This classic dichotomy is captured by so-called dual process models of higher cognition that have become very influential in modern day research on reasoning, judgment, and decision-making (Evans, 2010; Kahneman, 2011; Sloman, 1996; Stanovich, 2011). By now, dual process models have been used to explain numerous phenomena, ranging from probabilistic or deductive reasoning biases (Kahneman, 2011) and economic behavior (Alós-Ferrer & Strack, 2014) to moral reasoning (Greene, 2013), cooperative behavior (Rand, Greene, & Nowak, 2012), and creativity (Barr, Pennycook, Stolz, & Fugelsang, 2015; Cassotti, Agogué, Camarda, Houdé, & Borst, 2016). At the most basic level, a dual process model posits that there are two different types of thinking, often referred to as System 1 and System 2 processing. System 1 (also referred to as intuitive, heuristic, or Type 1 processing) operates quickly and effortlessly, whereas System 2 (also referred to as deliberate, analytic, or Type 2 processing) is believed to be slower and effortful. There are different types of dual process models, but arguably the dominant framework has been the serial or default-interventionist model that has been put forward by prominent scholars such as Daniel Kahneman (Kahneman, 2011) or Jonathan Evans and Keith Stanovich (Evans & Stanovich, 2013). At the core of the default-interventionist (DI) model lies a serial view on the interaction between System 1 and 2. The key idea is that when people are faced with a reasoning problem, they will typically rely on the fast System 1 to generate an answer. This is the default system. If needed, people can activate System 2 in a later phase to intervene and correct the System 1 output. But this System 2 engagement only occurs after System 1 has been engaged, and it is also optional. That is, activation of System 2 is not guaranteed. More generally, in the serial DI model, reasoners are conceived as cognitive misers who try to minimize cognitive effort (Kahneman, 2011). Since System 2 thinking is hard, people will often refrain from it and stick to the default System 1 response. The serial DI model offers an appealing explanation for the widespread "bias" that has been observed in the reasoning and decision-making literature. To illustrate, consider the following problem (an adaptation of the famous base-rate problem, Tversky & Kahneman, 1974):


“There is an event with 1000 people. Jo is a randomly chosen participant who attended the event. We know that Jo is 23 years old and is finishing a degree in engineering. On Friday nights, Jo likes to go out and drink beer. We also know that 997 people attending the event were women. What is most likely: Is Jo a man or a woman?”

In the problem, people get information about the composition of a sample and a short description of one participant (e.g., "Jo"). The problem is specifically constructed such that it cues a prepotent intuitive response based on heuristic, stereotypical associations prompted by the description (e.g., "Jo is an engineer and likes beer, so Jo is a man"). This is the intuitive "heuristic" default response that is believed to be generated by System 1. However, this response conflicts with the response that is cued by the base-rate information. Indeed, given that there are hardly any males in the sample (3 out of 1000), logically speaking it is much more likely that a randomly drawn individual will be female. Although it might be more likely that Jo is an engineer on the basis of the description alone (e.g., in general, there might be more male than female beer-loving engineers), the extreme base-rates should push the scale to the "female" side.
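This base-rate logic can be made concrete with a worked Bayesian reading of the problem. The likelihood ratio of 20 below is a purely illustrative assumption (the problem itself only fixes the base-rates):

P(man | description) / P(woman | description)
   = [P(description | man) / P(description | woman)] × [P(man) / P(woman)]
   = 20 × (3 / 997) ≈ 0.06.

That is, even if the stereotypical description were 20 times more likely to apply to a man than to a woman, "woman" would remain roughly 17 times more likely. With more moderate base-rates such as the 300/700 split used in Chapter 3, the same illustrative likelihood ratio would instead favor "man" (20 × 300/700 ≈ 8.6), which is why base-rate extremity matters normatively.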

However, decades of empirical studies have established that the vast majority of participants opt for the heuristically cued intuitive response and seem to neglect elementary logico-mathematical principles in this and a range of related tasks (e.g., Kahneman, 2011). Why do intelligent adults so often violate basic logico-mathematical principles? Default-interventionist theorists have highlighted that the key problem is that taking these principles into account typically requires demanding System 2 computations. When the fast System 1 has provided us with a response, most reasoners will refrain from engaging the effortful System 2. Consequently, they will not detect that their answer conflicts with more logical considerations. Put differently, these biased System 1 reasoners do not detect that they are being biased. The few people who do engage System 2 will override the initially cued intuitive System 1 response after their System 2 deliberation is completed and manage to give the correct response. The above illustrates how the serial interaction view of the default-interventionist dual process model makes at least two critical assumptions or hypotheses about people's reasoning performance. First, biased reasoners who give the heuristic System 1 response that conflicts with logico-mathematical principles will not detect that their response conflicts with these principles. Second, deliberate System 2 processing is assumed to be essentially corrective in nature: sound reasoning in the case of conflict implies correction of the default intuitive response. We can refer to these hypotheses as the "bias blind spot" and "corrective" assumption, respectively (De Neys, 2018). To avoid confusion, we will clarify a number of points before advancing. First, we use the labels "correct" or "logical" response as a handy shortcut to refer to "the response that has traditionally been considered correct or normative according to standard logic or probability theory". The appropriateness of these traditional norms can be questioned (e.g., see Stanovich & West, 2000, for a discussion). Under this interpretation, the heuristic response should not be labeled "incorrect" or "biased". With regard to the base-rate problem, one could opt to label the responses the "base-rate" and "stereotype"-based response, respectively. However, the theoretical models referred to in this article are models of reasoning in the broad sense, not of a specific reasoning problem. Therefore, we prefer to stick to the traditional labels: "correct" for the probability-based responses and "incorrect" for the stereotype-based responses. Nevertheless, it will be clear that readers should refrain from a blind literal reading of the labels. In the same vein, one should refrain from equating System 2 processing with normative correctness. No dual process theorist has ever claimed that System 1 is always wrong and System 2 is always right. For example, it is crystal clear that adults can readily compute the answer to the problem "How much is 5 + 5?" without any deliberation. At the same time, System 2 does not universally produce the normative response. There can be situations in which too much deliberation will lead people astray (e.g., Reyna, 2004). Hence, the normative correctness of a response cannot be a defining feature of System 1 and 2. It is simply the case that the two features (i.e., whether a response has been generated by System 1 or 2 and whether it is normatively correct or not according to traditional standards) are often correlated in the type of problems we typically study in the reasoning and heuristics-and-biases field (Evans, 2012). For example, it has been the demanding System 2 nature of correct responding in problems such as the base-rate neglect task, conjunction fallacy, belief-bias syllogisms, and many others that has been used to account for the established correlation between "correct" responding and individual differences in cognitive capacity (e.g., Stanovich & West, 2000; Evans & Stanovich, 2013). When we talk about the "corrective assumption" or "bias blind spot" we talk about these reasoning problems in which logic/probability-based responding has traditionally been considered to result from System 2 deliberation. But clearly, this does not imply that one can or should generally equate System 2 processing and normative correctness.


Finally, readers need to keep in mind that the claim about the serial nature of the System 1 and 2 interaction in the default-interventionist model concerns the postulated processing architecture during the core reasoning process. As De Neys (2018) put it, literally speaking, one might argue that a response to a reasoning problem can never be purely intuitive. That is, before System 1 can cue an intuitive response one will need to read or listen to the problem premises, for example. Such reading and comprehension processes may require deliberation and draw on the very same resources that System 2 requires. Consequently, one can argue that every reasoning process starts with initial System 2 activation. Likewise, one might argue that every reasoning process also ends with System 2 activation. That is, once reasoners have computed a response to a problem, they will need to verbalize or type their answer. This answer production may also require System 2. In this sense, it can be said that even the serial default-interventionist model assumes that System 2 is always "on". But the idea here is that System 2 is in a "low-effort" mode in which it simply accepts the suggestions made by System 1 without checking them (Kahneman, 2011). Hence, it does not engage in any proper deliberation, so its core function is not activated. In sum, it is useful to bear in mind that the serial default-interventionist claim concerns the processing during the actual "reasoning" stage and not the initial encoding of the preambles or the ultimate overt response production (De Neys, 2018). The default-interventionist model and the corresponding bias blind spot and corrective assumptions have had far-reaching impact on theorizing in the various fields that have adopted dual process models and, more generally, on our view of human rationality (e.g., Gürçay & Baron, 2017; Stanovich & West, 2000). However, in recent years direct experimental testing of the core assumptions has pointed to fundamental issues. Pace the "bias blind spot" hypothesis, a range of studies have established that biased reasoners often do show bias sensitivity (e.g., Bonner & Newell, 2010; De Neys & Glumicic, 2008; Gangemi, Bourgeois-Gironde, & Mancini, 2015; Pennycook, Trippas, Handley, & Thompson, 2014; Stupple, Ball, Evans, & Kamal-Smith, 2011; but see also Aczel, Szollosi, & Bago, 2016; Mata, Ferreira, Voss, & Kollei, 2017; Travers, Rolison, & Feeney, 2016). In these studies, participants are presented both with traditional reasoning problems in which a cued heuristic response conflicts with a logical principle and with control no-conflict problems. Small content transformations in the control versions ensure that the intrinsic conflict in the traditional version is removed. For example, a no-conflict version of the above base-rate problem would simply switch the base-rates around (e.g., "There are 997 males and 3 females in the sample"). Everything else stays the same. Hence, in the control case both the description and the base-rates cue the same response (i.e., "Jo is a man").

We can test people's bias or conflict sensitivity by measuring how they process these different versions. If biased reasoners are blind heuristic thinkers who do not take logical principles into account, then whether or not these principles conflict with the cued heuristic response should not impact their reasoning. However, the available evidence indicates that biased reasoners often do register conflict. For example, biased reasoners show increased response doubt – as reflected in lower confidence and slightly longer decision latencies – when they give a biased answer on the conflict problems (e.g., De Neys, Rossi, & Houdé, 2013; Gangemi et al., 2015; Pennycook, Trippas, et al., 2014; Stupple et al., 2011). They also show increased activation of brain areas that are supposed to mediate conflict and error detection (i.e., the anterior cingulate cortex; e.g., De Neys, Vartanian, & Goel, 2008; Simon, Lubin, Houdé, & De Neys, 2015). Critically, this bias or conflict sensitivity is also observed under severe time-pressure and cognitive load (Bago & De Neys, 2017a; Franssens & De Neys, 2009; Johnson, Tubau, & De Neys, 2016; Pennycook, Cheyne, Barr, Koehler, & Fugelsang, 2014; Thompson & Johnson, 2014). These time-pressure and load manipulations are used to experimentally "knock out" System 2 deliberation. Since System 2 processing is time and cognitive resource demanding, we can minimize its impact by restricting participants' response time or burdening their cognitive resources with a demanding concurrent task. This allows us to determine whether a certain effect is driven by System 1 or System 2. In sum, in direct contrast with the bias blind spot hypothesis, available evidence indicates that biased reasoners not only show sensitivity to logic/heuristic conflict, they do so intuitively on the basis of mere System 1 processing. In addition, the corrective DI assumption is also being questioned. Recall that in the DI framework correct responses in case of conflict are assumed to result from a correction of the heuristic System 1 response after System 2 deliberation. However, evidence is amassing that correct responses in these cases are also generated intuitively (e.g., Bago & De Neys, 2017a; Newman, Gibb, & Thompson, 2017). The most direct evidence for this claim comes from studies that adopt a two-response paradigm (Thompson, Turner, & Pennycook, 2011). In this paradigm, participants are asked to immediately respond with the first intuitive answer that comes to mind. Afterwards, they are allowed to take all the time they want to reflect on the problem and generate a final response. To make sure that the initial response is generated intuitively on the basis of System 1 processing, it has to be generated under stringent time-pressure and/or cognitive load (Bago & De Neys, 2017a; Newman et al., 2017). This procedure allows us to examine the time-course of response generation and establish empirically which response is generated by System 1.

Studies that adopted this approach clearly indicate that many reasoners who give a correct final response (i.e., after System 2 deliberation was allowed) already managed to give this response in the initial response stage in which they had to reason intuitively. Hence, pace the corrective DI assumption, correct responders do not necessarily need to deliberate to correct a faulty intuition; their intuitive System 1 response is already correct. In sum, we believe that evidence is amassing against the core predictions of the serial DI model (but see also Evans, 2018, for an alternative view). Note that traditional competitors of the serial model do not fare any better in this respect. For example, the parallel model (Epstein, 1994; Sloman, 1996) posits that System 1 and System 2 are always engaged simultaneously from the start of the reasoning process. In theory, this model can account for biased reasoners' conflict sensitivity. However, just like the serial model it still assumes that cueing of the logical answer relies on System 2 deliberation. As we mentioned, evidence suggests that this can be done on the basis of mere System 1 processing. Therefore, a number of scholars (e.g., Bago & De Neys, 2017a; Ball, Thompson, & Stupple, 2017; Banks, 2018; Białek & De Neys, 2017; De Neys, 2012; Pennycook, 2017; Pennycook, Fugelsang, & Koehler, 2015; Thompson & Newman, 2017; Trippas & Handley, 2018) have recently called for a new dual process view which we can refer to as a "hybrid" model15 (De Neys, 2018). At the most general level, what sets the hybrid model view apart is that it entails that the response that is traditionally considered to be computed by System 2 can also be cued by System 1. Hence, System 1 is assumed to generate (at least) two different types of intuitive responses. For example, in the case of a classic reasoning task one of these is the traditional "heuristic" intuitive response that is based on semantic and other associations (e.g., the response cued by the stereotypical description in the base-rate problem). This is the exact same intuitive response that is also assumed to be cued by the serial (and parallel) model. The critical second response is what we can refer to as a "logical" intuitive response, which is based on elementary knowledge of basic logical and probabilistic principles (e.g., the role of base-rates). The underlying idea here is that even biased reasoners implicitly grasp elementary logical and probabilistic principles and activate this knowledge automatically when faced with a reasoning task. This intuitive logical knowledge allows one to detect that the heuristic intuition is questionable in case of conflict without a need to engage in demanding System 2 computations.

15 We use the “hybrid” model label to refer to core features that seem to be shared – under our interpretation – by the recent theoretical proposals of these various authors. It should be clear that this does not imply that these proposals are completely similar. We are talking about a general family resemblance rather than full correspondence and focus on commonalities rather than the differences.

Clearly, if people indeed have logical intuitions, as the hybrid model entails, one might wonder why they are still massively biased and predominantly opt for the heuristic response. A key point is that the different intuitions can vary in strength or activation level (De Neys, 2012; Pennycook et al., 2015; Trippas & Handley, 2017). Typically, the heuristic intuition will be stronger (i.e., have a higher activation level) than the logical one. The presence of a logical intuitive response allows reasoners to detect conflict, but it does not suffice for the logical response to be selected as the overt answer. In most cases, the heuristic intuition will dominate, and the modal reasoner will still be biased. But critically, there can be individual variance in this respect. For some reasoners, the logical intuition might be so weak that they even fail to detect conflict (Pennycook et al., 2015). For others, the logical intuition can be stronger than the heuristic one (Bago & De Neys, 2017a). Consequently, these latter individuals will also manage to give the correct answer as their initial response without any further System 2 engagement. Taken together, these ideas result in a model in which one's intuitive reasoning performance is determined by the absolute and relative strength of different intuitions (Bago & De Neys, 2017a; Pennycook et al., 2015; Pennycook, 2018). Whatever intuition has the highest absolute strength level gets selected as the initial response. The relative difference determines the level or likelihood of experienced conflict. The more equal the activation strengths (i.e., the smaller the relative difference), the more pronounced the conflict experience will be. For example, an individual with a very strong heuristic intuition and a weak logical intuition should be less likely to detect conflict than an individual with a logical and a heuristic intuition that are equally strong. There is little doubt that the hybrid model captures the recent empirical conflict detection and correct intuitive response generation findings that the standard DI (or parallel) model struggles to account for. However, in and by itself this is not surprising. The hybrid model is a post hoc postulation. It was specifically designed to account for the observed empirical findings. It did not predict these findings a priori. This is an important difference with the DI model. The DI model made clear and testable predictions (e.g., the bias blind spot and corrective assumptions) that allowed us to test and validate or falsify the model. In order to advance the development of the hybrid model, we need to derive such a priori hybrid model predictions and test them empirically. In the present paper, we present a study that focuses on this issue.


One way to test and validate the hybrid model is by experimentally manipulating the strength of the logical intuition. One can achieve this, for example, by manipulating the extremity of the base-rates (e.g., Pennycook et al., 2012, 2015). Extreme base-rates (e.g., 997 women and 3 men) present a stronger cue with respect to the importance of taking the base-rates into account than more moderate base-rates (e.g., 700 women and 300 men). Logically speaking, the more the larger group dominates in size, the more likely it is that a randomly drawn individual will belong to it. Hence, by manipulating the extremity of the base-rates, we should affect the strength of the logical intuition; it will become weaker as the base-rate probabilities become more moderate (Pennycook et al., 2015). This leads to at least three testable predictions. First, if the logical intuition is made weaker, we should observe fewer correct intuitive answers. This prediction is based on the postulation that the absolute strength level determines the initial response selection. Whatever intuition dominates gets selected. Hence, all other things being equal, if we make the logical intuition less strong it will be even more likely that the heuristic response will dominate. Consequently, correct intuitive responses should be less likely. Second, biased reasoners should be less likely to detect conflict when the logical intuition is less strong. This prediction is based on the assumption that the relative strength difference determines the conflict detection likelihood. In case of a dominant heuristic intuition, making the logical intuition less strong will increase the relative difference between the two (i.e., the heuristic will dominate even more), which should decrease the likelihood and/or the level of experienced conflict. Interestingly, at first sight, the initial studies in which Pennycook et al. (2012, 2015) introduced the base-rate extremity manipulation might seem to support these predictions. In contrast with the extreme base-rate condition – in which the logical intuition strength should be maximal – the moderate base-rate condition gave rise to fewer correct responses and less or no conflict detection effects. However, note that Pennycook et al. used a traditional "one-response" paradigm in which participants were allotted all the time they needed to deliberate and reflect on the problem. This implies that the results can be driven by System 2 processing. However, if the hybrid model is correct, we should observe similar effects in the absence of any System 2 processing. That is, the claim is that people's intuitive reasoning performance is solely based on the absolute and relative intuitive strength differences within System 1. Hence, the reduced accuracy and conflict detection effects should be observed in the absence of System 2 intervention. To test this hypothesis we adopted Pennycook et al.'s base-rate extremity manipulation in the present study but combined it with a two-response design in which participants gave both an initial intuitive and a final response after deliberation.

In the initial response stage we imposed a challenging response deadline and a concurrent load task to guarantee that the findings could not be affected by System 2 processing. The key question is whether we will observe reduced accuracy and conflict detection at the initial, intuitive response stage. Critically, the hybrid model makes a counter-intuitive but clear third prediction. In contrast with biased, incorrect responders, the few reasoners who still manage to give a correct intuitive response with moderate base-rates (i.e., when the logical intuition is made weaker) should show stronger conflict effects than with extreme base-rates (i.e., when the logical intuition is stronger). Why should this be the case? Figure 1 gives a pictorial illustration of the hybrid model assumptions. In the figure, we have plotted the strength of the different intuitions in imaginary activation "units". The bottom panel (1B) shows the modal case of biased (i.e., incorrect) responders. This is the case we have focused on so far. The model assumes that incorrect responders are biased precisely because their heuristic intuition is stronger (e.g., 4 units) than their logical intuition (e.g., 2 units). Now, imagine that our base-rate manipulation decreases the strength of the logical intuition by, say, 1 unit. This is illustrated at the right hand side of the figure. With moderate base-rates, an incorrect responder's logical intuition strength will decrease (e.g., it will go from 2 units to 1 unit). Because of the logical strength reduction, the relative strength difference between the logical and heuristic intuition increases (e.g., it goes from a 2 to a 3 unit difference). Consequently, conflict detection becomes less likely for the incorrect responders. But as the top panel of Figure 1 illustrates, we should expect the exact opposite effect for correct intuitive responders. The model assumes they respond correctly precisely because their logical intuition is stronger (e.g., 4 units) than their heuristic intuition (e.g., 2 units). With moderate base-rates, a correct responder's logical intuition strength will also decrease (e.g., it will go from 4 to 3 units). In this case, the experimental logical strength reduction will decrease the relative strength difference between the logical and heuristic intuition (e.g., it goes from a 2 to a 1 unit difference). Consequently, since a smaller relative difference implies more conflict, reasoners who still respond correctly with moderate base-rates should show a more pronounced conflict effect. These opposite effects of the logical strength manipulation on incorrect and correct intuitive conflict detection should provide us with a strong test of the hybrid model's assumptions.
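The strength logic behind these predictions can be summarized in a short simulation sketch. This is a minimal illustration, not part of the reported study: the strength values are the imaginary units of Figure 1, the one-unit penalty is the assumed effect of moderate base-rates, and the inverse-difference conflict function is simply one convenient way to express "smaller relative difference = more conflict".

import itertools

def initial_response(heuristic, logical):
    # The intuition with the highest absolute strength is selected
    # as the initial ("intuitive") response.
    return "correct" if logical > heuristic else "incorrect"

def conflict_level(heuristic, logical):
    # The smaller the relative strength difference, the stronger the
    # experienced conflict (a simple inverse serves as illustration).
    return round(1 / (1 + abs(logical - heuristic)), 2)

MODERATE_PENALTY = 1  # assumed one-unit weakening of the logical intuition

reasoners = [("correct responder", 2, 4),       # logical dominates (Fig. 1A)
             ("incorrect responder", 4, 2),     # heuristic dominates (Fig. 1B)
             ("borderline responder", 3, 3.5)]  # near the tipping point

conditions = [("extreme", 0), ("moderate", MODERATE_PENALTY)]
for (label, heuristic, logical), (condition, penalty) in \
        itertools.product(reasoners, conditions):
    l = logical - penalty
    print(f"{label:21s} {condition:8s} "
          f"{initial_response(heuristic, l):9s} "
          f"conflict={conflict_level(heuristic, l)}")

Running the sketch reproduces all three predictions: the borderline responder flips from a correct to an incorrect initial response under moderate base-rates (fewer correct intuitive answers), the incorrect responder's conflict level drops (0.33 to 0.25), and the correct responder's conflict level rises (0.33 to 0.5).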


A. Initial correct response case

B. Initial incorrect response case

Figure 1. Illustration of the hybrid model predictions at the initial response stage for correct (A.) and incorrect (B.) responders. The y-axis represents the strength of the heuristic and logical intuition in imaginary strength units. Moderate base-rates are assumed to decrease the strength of the logical intuition by one unit.


Method

Participants

In total, 145 participants were tested (81 females, M = 40.7 years, SD = 14.1 years). Participants were recruited online via the Crowdflower platform. Only North-American English speakers were allowed to participate. Participants were paid $0.25. A total of 40.6% of participants reported high school as their highest completed educational level, while 58% reported that they had a post-secondary educational degree (1.4% reported less than high school).

Material

Reasoning task. Participants solved eight base-rate problems. All problems were taken from Pennycook, Cheyne, Barr, Koehler, and Fugelsang (2014). Participants always received a description of the composition of a sample (e.g., "This study contained I.T. engineers and professional boxers"), base-rate information (e.g., "There were 995 engineers and 5 professional boxers"), and a description that was designed to cue a stereotypical association (e.g., "This person is strong"). Participants' task was to indicate to which group the person most likely belonged. The problem presentation format we used in this research was based on Pennycook et al.'s (2014) rapid-response paradigm. In this format, the base-rates and descriptive information are presented serially and the amount of text that is presented on screen is kept to a minimum in order to reduce the influence of reading processes. Participants received three pieces of information in a given trial. First, the names of the two groups in the sample (e.g., "This study contains clowns and accountants") were presented. Second, participants were presented with the stereotypical descriptive information (e.g., "Person 'L' is funny"). The descriptive information specified a neutral name ("Person L") and a single-word personality trait (e.g., "strong" or "funny") that was designed to trigger the stereotypical association (based on extensive pretesting, see Pennycook et al., 2015). Finally, after the presentation of the stereotype, participants were presented with the base-rate probabilities. The following illustrates the full problem format:


This study contains clowns and accountants.
Person 'L' is funny.
There are 995 clowns and 5 accountants.
Is Person 'L' more likely to be:
o A clown
o An accountant

Half of the presented problems were conflict items and the other half were no-conflict items. In no-conflict items, the base-rate probabilities and the stereotypic descriptive information cued the same response. In conflict items, the stereotypic information and the base-rate probabilities cued different responses. Two different item sets were used. The conflict items in one set were the no-conflict items in the other, and vice versa. This was done by reversing the base-rates. Each of the two sets was used for half of the participants. This counterbalancing minimized the possibility that mere content or wording differences between conflict and no-conflict items could influence the results. As in Pennycook et al. (2015), we used two kinds of base-rates (manipulated between subjects): a moderate and an extreme condition. Like Pennycook et al., we also used three base-rate pairs within each condition: in the moderate condition these were 700/300, 710/290, and 720/280, and in the extreme condition they were 997/3, 996/4, and 995/5. These slight variations of the base-rate pairs within each condition help to make the task less repetitive (De Neys & Glumicic, 2008). Only the base-rates were changed between the two conditions; everything else (stereotypes, names of the groups) remained constant. Participants were randomly allocated to the moderate or extreme base-rate treatment. Each problem started with the presentation of a fixation cross for 1000 ms. After the fixation cross disappeared, the sentence which specified the two groups appeared for 2000 ms. Then the stereotypic information appeared for another 2000 ms, while the first sentence remained on the screen. Finally, the base-rates appeared together with the question and the two response alternatives. Note that we presented the base-rates and question together (rather than first presenting the last piece of information for 2000 ms) to minimize the possibility that some participants would start solving the problem during the presentation of the last part of the problem. Once all the parts were presented, participants were able to select their answer by clicking on it. The position of the correct answer alternative (i.e., first or second response option) was randomly determined for each item. The eight items were presented in random order.
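The item construction and counterbalancing logic can be sketched as follows. This is a minimal illustration with hypothetical variable names, showing only two of the eight item contents; it is not the stimulus code actually used in the study.

import random

MODERATE_PAIRS = [(700, 300), (710, 290), (720, 280)]
EXTREME_PAIRS = [(997, 3), (996, 4), (995, 5)]

def make_item(groups, trait, cued_group, pair, conflict):
    # In a conflict item the large group is the one NOT cued by the
    # stereotype; reversing the base-rates yields the no-conflict twin.
    large, small = pair
    other = next(g for g in groups if g != cued_group)
    base_rates = ({other: large, cued_group: small} if conflict
                  else {cued_group: large, other: small})
    return {"groups": groups, "trait": trait, "base_rates": base_rates}

content = [(("clowns", "accountants"), "funny", "clowns"),
           (("I.T. engineers", "boxers"), "strong", "boxers")]

# Set A and set B use identical contents; an item that is a conflict
# item in one set is a no-conflict item in the other.
pairs = EXTREME_PAIRS  # or MODERATE_PAIRS, manipulated between subjects
set_a = [make_item(g, t, c, random.choice(pairs), conflict=(i % 2 == 0))
         for i, (g, t, c) in enumerate(content)]
set_b = [make_item(g, t, c, random.choice(pairs), conflict=(i % 2 == 1))
         for i, (g, t, c) in enumerate(content)]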


Two-response format. The two-response task format was similar to the one introduced by Bago and De Neys (2017a). People were clearly instructed that we were interested in their first, initial response to the problem. Instructions stressed that it was important to give the initial response as fast as possible and that participants could afterwards take additional time to reflect on their answer. The literal instructions that were used stated the following:

“Welcome to the experiment! Please read these instructions carefully! This experiment is composed of 8 questions and a couple of practice questions. It will take about 10 minutes to complete and it demands your full attention. You can only do this experiment once. In this task we'll present you with a set of reasoning problems. We want to know what your initial, intuitive response to these problems is and how you respond after you have thought about the problem for some more time. Hence, as soon as the problem is presented, we will ask you to enter your initial response. We want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you enter your final response. You will have as much time as you need to indicate your second response. After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response. In sum, keep in mind that it is really crucial that you give your first, initial response as fast as possible. Afterwards, you can take as much time as you want to reflect on the problem and select your final response. You will receive $0.25 for completing this experiment. Please confirm below that you read these instructions carefully and then press the "Next" button.”

After the general instructions, the specific instructions for the base-rate task were presented:

“In a big research project a large number of studies were carried out where a psychologist made short personality descriptions of the participants. In every study there were participants from two population groups (e.g., carpenters and policemen). In each study one participant was drawn at random from the sample. You’ll get to see one personality trait of this randomly chosen participant. You’ll also get information about the composition of the population groups tested in the study in question. You'll be asked to indicate to which population group the participant most likely belongs. As we told you we are interested in your initial, intuitive response. First, we want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you enter your final response. After you made your choice and clicked on it, you will be automatically taken to the next page. After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response. Press "Next" if you are ready to start the practice session!”

After the task-specific instructions, participants solved practice problems (specified under "Procedure") to familiarize them with the task. Then they were able to start the experiment. For the first response, people were instructed to give a quick, intuitive response. After they clicked on the answer, they were asked to enter their confidence in this answer on a scale from 0% to 100%, with the following question: "How confident are you in your answer? Please type a number from 0 (absolutely not confident) to 100 (absolutely confident)".

Next, they were presented with the problem again, and they were told that they could take as much time as they needed to give a final answer. As a last step, they were asked to give their confidence in their final answer. The colour of the question and answer options was green during the first response phase and blue during the second response phase, to visually remind participants which question they were answering at the moment. For this purpose, a reminder sentence was also placed right under the question: "Please indicate your very first, intuitive answer!" and "Please give your final answer.", respectively.

Response deadline. In order to minimize the possibility of System 2 engagement during the initial response, we used a strict response deadline (3000 milliseconds), based on a previous reading pre-test (see Bago & De Neys, 2017a). The deadline cut-off was based on the average time the pre-test participants needed to simply read the problems. 1000 ms before the deadline, the background turned yellow to alert the participants to the approaching deadline. If participants did not select an answer within 3000 ms they got feedback to remind them that they had not answered within the deadline and they were told to make sure to respond faster on subsequent trials. Obviously, there was no response deadline for the final response.

Cognitive load task. To further minimize the possibility of System 2 engagement during the initial response phase we also imposed a concurrent load task to burden participants' cognitive resources (i.e., the dot memorization task; see Miyake, Friedman, Rettinger, Shah, & Hegarty, 2001). Here we also followed the procedure adopted by Bago and De Neys (2017a). The rationale behind the load manipulation is simple: by definition, System 2 processing requires executive cognitive resources, while System 1 processing does not (Evans & Stanovich, 2013). Consequently, if we burden participants' executive resources while they are asked to solve reasoning problems, System 2 engagement is less likely. We opted for the dot memorization task because it is well established that it successfully burdens participants' executive resources in a reasoning context (De Neys & Schaeken, 2007; De Neys & Verschueren, 2006; Franssens & De Neys, 2009; Johnson et al., 2016). Before each reasoning problem (and after the presentation of the fixation cross) participants were presented with a 3 x 3 grid in which 4 dots were placed. Participants were instructed that it was critical to memorize the pattern, even though this might be hard while solving the reasoning problem.

After providing the initial response and the initial confidence rating, participants were shown four different matrices and they had to choose the correct, to-be-memorized pattern. They received feedback as to whether they had chosen the correct or incorrect pattern. The load was only applied during the initial response stage and not during the subsequent final response stage, in which participants were allowed to deliberate and recruit System 2.
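The load procedure can be sketched as follows. This is a minimal illustration: the grid size and number of dots follow the description above, but the way distractor patterns are constructed here is our own assumption rather than a detail specified in the text.

import random

def make_pattern(n_dots=4, grid_cells=9):
    # A to-be-memorized pattern: 4 dots placed in a 3 x 3 grid.
    return frozenset(random.sample(range(grid_cells), n_dots))

def make_probe(target, n_options=4):
    # The target plus randomly generated distractor patterns, shuffled.
    options = {target}
    while len(options) < n_options:
        options.add(make_pattern())
    options = list(options)
    random.shuffle(options)
    return options

target = make_pattern()            # shown before the reasoning problem
options = make_probe(target)       # shown after the initial response
chosen = random.choice(options)    # stand-in for the participant's pick
recall_correct = chosen == target  # trials failing this were excluded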

Conflict detection measure. Previous research in the reasoning and cognitive control fields has established that the effect of conflict can be measured by the post-decision confidence difference between conflict and no-conflict items (e.g., Botvinick, 2007; Johnson et al., 2016; Mevel et al., 2015; Pennycook, Trippas, et al., 2014; Stupple, Ball, & Ellis, 2013; Yeung & Summerfield, 2012). If people are faced with two competing responses, this should decrease their response confidence on the conflict items. Therefore, we will use this confidence difference as our primary index to measure the level of experienced conflict at the initial response stage; a higher confidence decrease is assumed to reflect a higher level of experienced conflict. Note that we refrained from using response latencies to measure conflict detection. Although this is a popular conflict measure in one-response paradigms, previous two-response studies established that it does not reliably track the conflict detection effects reflected in confidence ratings at the initial response stage (Bago & De Neys, 2017a; Thompson & Johnson, 2014)16. For completeness, one may also note that our assumption that the confidence decrease reflects the level of experienced conflict is questionable at the final response stage. After deliberate reflection, the initially experienced doubt can be mitigated (e.g., De Neys et al., 2013). However, our key interest lies in the initial, intuitive response stage in which System 2 deliberation is experimentally minimized. Thus, confidence in the correctness of the response was recorded after both the initial and the final response stage, but our primary interest concerns the initial response stage. Note that participants were still under concurrent load while providing the initial confidence rating. This helps to guarantee that the confidence rating is not affected by post-decision System 2 processing.

16 As argued by Bago and De Neys (2017a), this might result from the specific design characteristics of the two-response paradigm. Forcing people to respond as fast as possible might prevent the slowing effect from showing up.
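In computational terms the conflict detection index is simply a mean confidence contrast. A minimal sketch with hypothetical column names and illustrative values:

import pandas as pd

# One row per initial response; values are for illustration only.
trials = pd.DataFrame({
    "conflict":   [False, False, True, True],
    "confidence": [95, 90, 70, 60],  # initial confidence, 0-100 scale
})

baseline = trials.loc[~trials["conflict"], "confidence"].mean()
conflict = trials.loc[trials["conflict"], "confidence"].mean()
detection_index = baseline - conflict  # larger = more experienced conflict
print(detection_index)  # 27.5 with these illustrative values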


Procedure

The experiment was run online. After the instructions, participants were presented with practice problems to familiarize them with the procedure. They first solved two practice (no-conflict) reasoning problems. Afterwards, they were presented with a practice dot matrix recall item (i.e., they were simply shown a dot pattern and after it disappeared they were asked to identify the pattern from four presented options). As a last practice step, they were given two reasoning problems (the first of which was the initial practice problem), which they now had to solve under load. At the end of the experiment, standard demographic questions were recorded.

Exclusion criteria. All trials where participants did not manage to provide an initial response within the deadline were excluded from the analysis (9.1% of trials). We also excluded those trials where participants did not give the correct response to the dot memorization task (17.4% of trials). These exclusion criteria help to guarantee that System 2 processing was maximally ruled out during the initial response stage. Altogether, we excluded 24.1% of trials and analyzed 881 trials (out of 1160).
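In pipeline terms, these criteria amount to a simple trial-level filter. A minimal sketch with hypothetical column names and toy values (not the actual analysis code):

import pandas as pd

trials = pd.DataFrame({
    "rt_initial":  [2400, 3350, 1900, 2800],   # initial response times (ms)
    "dot_correct": [True, True, False, True],  # dot memorization recall
})

DEADLINE_MS = 3000
kept = trials[(trials["rt_initial"] <= DEADLINE_MS) & trials["dot_correct"]]
exclusion_rate = 1 - len(kept) / len(trials)  # 0.5 with these toy values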

Results

Accuracy. Table 1 gives an overview of the response accuracies (i.e., the percentage of trials on which the response cued by the base-rates was selected). Visual inspection of the table points to a number of expected trends. First, as in previous studies (e.g., De Neys & Glumicic, 2008; Pennycook et al., 2012, 2015), accuracy is uniformly high on the no-conflict control trials (90% in the moderate and 93% in the extreme condition). This is not surprising. All dual process models predict that mere System 1 processing suffices to solve these problems correctly. Further in line with general expectations, we find that participants have a harder time on the traditional conflict problems: reasoners are typically biased here, with a maximum accuracy that does not exceed 42%. These effects replicate classic findings with the one-response paradigm. In line with previous two-response studies (e.g., Bago & De Neys, 2017a; Newman et al., 2017; Thompson et al., 2011) we also observe a general trend towards slightly higher accuracy at the final than at the initial response stage17.


Turning to the effect of the base-rate extremity manipulation, one can see that at the final response stage we replicate the findings of Pennycook et al. (2015): when reasoners had the time to deliberate, there were fewer correct responses with moderate (23%) than with extreme (41.6%) base-rates. However, the key finding is that we observe a similar trend at the initial response stage: there were also fewer correct initial responses with moderate (16.4%) than with extreme (29.7%) base-rates.

Table 1. Frequency of correct initial and final responses for conflict and no-conflict items with extreme and moderate base-rates.

                               Base-rate extremity
                               Moderate    Extreme
Conflict       Initial         16.4%       29.7%
               Final           23%         41.6%
No-conflict    Initial         90.9%       93.4%
               Final           90%         93.7%

To test these results statistically, we used mixed effect logistic regression models in which subjects were entered as a random effect and response stage (initial or final) and base-rate extremity (moderate or extreme) as fixed factors. For the conflict problems, we found that response stage, χ2 (3) = 20.49, p < 0.0001, b = 1.09, and base-rate extremity, χ2 (4) = 9.27, p < 0.0001, b = -2.32, increased the model fit significantly, but their interaction, χ2 (5) = 0.998, p = 0.32, did not. These results confirm that the moderate (vs extreme) base-rate manipulation decreases response accuracy at both response stages. As one may expect, none of these effects are observed on the no-conflict problems: initial and final no-conflict accuracies are uniformly high. Neither the effect of response stage, χ2 (3) = 0.08, p = 0.78, nor base-rate extremity, χ2 (4) = 1.06, p = 0.30, nor their interaction, χ2 (5) = 0.05, p = 0.82, was significant.

In sum, the general pattern of accuracy results is completely in line with previous studies. The key new finding is that we observe an effect of the base-rate extremity manipulation at the initial response stage: moderate base-rates make intuitive correct responding less likely. Hence, a manipulation that is assumed to decrease the strength of the logical intuition indeed leads to fewer correct intuitive logical responses. This supports the accuracy prediction of the hybrid model.

17 As a side note, the current findings also replicate the two-response findings with respect to the non-corrective nature of System 2 processing (e.g., Bago & De Neys, 2017a). We observe that on the majority of trials on which a reasoner gives the correct response as their final answer, they already generated the correct response at the initial response stage (both with moderate and extreme base-rates, 61.5% and 64% of correct final trials, respectively). This provides further evidence against the corrective assumption: correct responders do not necessarily need to engage System 2 to correct their intuitions; their intuitions are already correct. However, this does not imply that correction does not occur. Although most correct reasoners do not need to correct their intuitive response, some do. This is reflected, for example, in the higher overall accuracy that we observed at the final vs initial response stage.
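To make the model comparison approach concrete, the accuracy analyses reported above can be run along the following lines in R with the lme4 package. This is a minimal sketch under assumed (hypothetical) column names, not the exact analysis script:

    library(lme4)
    # Random intercept for subjects; response stage and base-rate extremity
    # as fixed factors. Model fit is compared with likelihood ratio tests,
    # mirroring the reported chi-square statistics.
    m0 <- glmer(correct ~ 1 + (1 | subject), data = d_conflict, family = binomial)
    m1 <- update(m0, . ~ . + stage)            # add response stage
    m2 <- update(m1, . ~ . + extremity)        # add base-rate extremity
    m3 <- update(m2, . ~ . + stage:extremity)  # add the interaction
    anova(m0, m1, m2, m3)                      # chi-square model comparisons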

Conflict detection findings. The hybrid model predicted that incorrect initial responders would be less likely to experience conflict in the moderate base-rate condition, whereas correct initial responders should show the opposite effect and experience more conflict when base-rates are moderate vs extreme. To test this hypothesis we contrasted the initial confidence levels for correct initial no-conflict trials (which we will refer to as our "baseline") and correct/incorrect conflict trials. We discarded the rare incorrectly solved no-conflict trials because these cannot be interpreted unequivocally (see Bago & De Neys, 2017a; De Neys et al., 2011; Pennycook et al., 2015).

Figure 2 shows the conflict detection effect findings (i.e., the confidence contrast: no-conflict minus conflict trial confidence) at the initial response stage (see also Table 2 for a complete overview). A higher difference value implies a larger confidence decrease when solving conflict items, which should reflect a more pronounced conflict experience. Visual inspection indeed confirms the predictions of the hybrid model. Incorrect responders show a smaller confidence decrease or conflict effect with moderate than with extreme base-rates, whereas correct responders show the opposite trend. Hence, consistent with the hybrid model predictions, an experimental manipulation that decreases the strength of the logical intuition tends to make incorrect responders feel less conflicted and correct responders more conflicted.

We used mixed effect linear regression models to test the visual trends in the initial confidence data statistically. We entered the random intercept of subjects in the models. As fixed factors, we entered a variable which we will refer to as "response group", base-rate extremity (moderate or extreme), and their interaction. The "response group" variable coded whether a given data point was a correctly solved no-conflict trial, a correctly solved conflict trial, or an incorrectly solved conflict trial. If base-rate extremity has opposite effects on correct and incorrect reasoners' conflict experience, we would expect a significant interaction between the response group and extremity factors. The analysis showed that the main effect of response group, χ2 (5) = 27.12, p < 0.0001, improved model fit significantly, while the main effect of base-rate extremity did not, χ2 (6) = 3.83, p = 0.0502. Critically, the interaction also improved fit further, χ2 (8) = 11.43, p = 0.003. To follow up on this interaction, we ran separate analyses for correctly and incorrectly solved conflict trials. Here we tested whether the simple interaction between the conflict factor (conflict or no-conflict) and base-rate extremity (moderate or extreme) was significant. This allows us to test whether the observed conflict effect decrease with moderate base-rates for incorrect conflict responses and the observed conflict effect increase with moderate base-rates for correct responses were statistically significant. Results showed that this was indeed the case. Both for correct, b = 5.26, t (719.7) = 2.3, p = 0.021, and incorrect responses, b = -8.06, t (748.2) = -2.1, p = 0.036, the interaction significantly improved model fit.
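For illustration, the confidence analyses can be specified along the following lines with the lmerTest package in R. The data frame and column names below are hypothetical placeholders:

    library(lmerTest)
    # Initial confidence with random subject intercepts. 'group' codes correct
    # no-conflict, correct conflict, and incorrect conflict trials.
    m_main <- lmer(init_conf ~ group + extremity + (1 | subject), data = d)
    m_full <- lmer(init_conf ~ group * extremity + (1 | subject), data = d)
    anova(m_main, m_full)  # likelihood ratio test of the critical interaction
    summary(m_full)        # Satterthwaite-based t-tests, as in the follow-up analyses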

Figure 2. Initial confidence difference between the no-conflict baseline and correctly and incorrectly solved conflict problems with extreme and moderate base-rates. A higher difference value reflects a more pronounced conflict detection. Error bars are standard errors of the difference between the means of the baseline and conflict cases.

Additional data. Our primary interest concerned the confidence conflict findings for the initial, intuitive responses. However, we also recorded confidence ratings for the final response. For completeness, Table 2 (bottom panel) presents an overview of these findings. As the table indicates, the pattern is similar to what we observed at the initial response stage: the differential impact of the extremity manipulation for correct and incorrect responders is also present at the final response stage. However, the final response confidence findings should be interpreted with caution. First, the hybrid model we set out to test here made no clear prediction about what would happen at the final response stage. In addition, deliberate System 2 processing might mitigate the intuitively detected initial conflict. Indeed, especially for correct responses, final confidence cannot be considered a pure index of conflict detection per se (De Neys et al., 2011; Pennycook et al., 2015).

Table 2. Overall confidence ratings (A) and confidence contrast difference between the no-conflict baseline and conflict problems (B) as a function of response stage, response accuracy, and base-rate extremity. Standard deviations of the mean are in brackets.

                                               Base-rate extremity
Response stage   Accuracy                      Moderate        Extreme
A. Overall
Initial          Correct no-conflict           80.7% (21.7)    88% (18.7)
                 Incorrect no-conflict         53.2% (41.1)    53.8% (35.2)
                 Correct conflict              65.2% (31.9)    84.4% (16.3)
                 Incorrect conflict            77.6% (25.2)    79.8% (28.9)
Final            Correct no-conflict           83.7% (20.3)    91.6% (15.4)
                 Incorrect no-conflict         54.2% (38.4)    41.6% (31.9)
                 Correct conflict              65.4% (32.1)    83.3% (24.5)
                 Incorrect conflict            79.8% (24.7)    83.9% (20.7)
B. Difference contrast (correct no-conflict baseline – conflict)
Initial          Correct conflict              15.5% (5.5)     3.6% (2.5)
                 Incorrect conflict            3.1% (2.4)      8.2% (2.8)
Final            Correct conflict              18.3% (4.7)     8.3% (3.1)
                 Incorrect conflict            3.9% (2.3)      7.7% (2.6)

Nevertheless, for consistency, we used the exact same mixed effect regression model approach as with the initial confidence findings to test the final confidence trends statistically. As with the initial confidence data, there was a significant interaction between the response group factor (correctly solved no-conflict trial, correctly solved conflict trial, or incorrectly solved conflict trial) and the extremity factor (moderate vs extreme), χ2 (8) = 22.75, p < 0.0001. In follow-up tests, we again tested the interaction between the conflict factor (conflict vs no-conflict) and base-rate extremity (extreme vs moderate) separately for correct and incorrect conflict responses. Results showed that the interaction was significant both for the correct, b = -12, t (754.7) = -3.6, p = 0.0003, and incorrect, b = 5.86, t (725.2) = 2.5, p = 0.012, responses. Hence, the final confidence trends are consistent with the initial confidence ones but should be interpreted with caution.

Table 3. Overall response times (A) and response time differences between the no-conflict baseline and conflict problems (B) as a function of response stage, accuracy, and base-rate extremity. Standard deviations of the mean are in brackets. Means were calculated on log-transformed data and were back-transformed prior to the subtraction.

                                               Base-rate extremity
Response stage   Accuracy                      Moderate        Extreme
A. Overall
Initial          Correct no-conflict           1.43 s (1.4)    1.63 s (1.4)
                 Incorrect no-conflict         1.14 s (1.3)    1.61 s (1.5)
                 Correct conflict              1.58 s (1.5)    1.96 s (1.3)
                 Incorrect conflict            1.41 s (1.4)    1.46 s (1.4)
Final            Correct no-conflict           2.67 s (1.6)    2.86 s (1.7)
                 Incorrect no-conflict         2.3 s (1.7)     2 s (1.5)
                 Correct conflict              2.89 s (1.8)    2.36 s (1.9)
                 Incorrect conflict            2.65 s (1.7)    2.91 s (1.7)
B. Difference contrast (correct no-conflict baseline – conflict)
Initial          Correct conflict              -0.15 s (0.26)  -0.33 s (0.2)
                 Incorrect conflict            0.02 s (0.14)   0.17 s (0.15)
Final            Correct conflict              -0.22 s (0.27)  -0.5 s (0.24)
                 Incorrect conflict            0.02 s (0.17)   -0.05 s (0.2)
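As the table note indicates, the latency means are geometric means: latencies were averaged on the log scale and back-transformed before computing the contrast. In R this amounts to something like the following (the vector names are hypothetical):

    # Geometric mean: average the log response times, then back-transform.
    geo_mean <- function(rt) exp(mean(log(rt)))
    # Baseline minus conflict contrast, computed after back-transformation.
    latency_contrast <- geo_mean(rt_baseline) - geo_mean(rt_conflict)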

With the same caveat in mind, we also present the descriptive conflict latency contrast data for initial and final responses. As we explained, response latencies are a popular conflict detection measure in one-response paradigms, but previous two-response studies indicated that they do not reliably track the conflict detection effects reflected in confidence ratings at the initial response stage (Bago & De Neys, 2017a; Thompson & Johnson, 2014). As Table 3 indicates, this trend is also observed in the present study. At the initial response stage, where people are forced to give a response as fast as possible, the descriptive data generally do not point to longer processing times for conflict problems and do not track the confidence results.


At the final response stage, the latency pattern also diverges from the confidence pattern. The same mixed model regression approach as with the confidence ratings indeed indicated that the critical interaction term between response group and base-rate extremity did not improve model fit for either the initial, χ2 (8) = 4.5, p = 0.08, or the final response times, χ2 (8) = 0.3, p = 0.86. However, one might note that at the descriptive level both the final latency and confidence contrast indexes do point to a less pronounced conflict detection effect for incorrect responders at the final response stage (i.e., less pronounced slowing and response doubt) with moderate vs extreme base-rates. Although the exploratory ad hoc nature of these additional data analyses needs to be kept in mind, we do note that this final response pattern is consistent with Pennycook et al.'s (2015) original base-rate manipulation findings (i.e., less conflict detection with moderate base-rates).

General Discussion

In this study, we tested three predictions of a hybrid dual process model in which people's intuitive reasoning performance is assumed to be determined by the absolute and relative strength of different intuitions. By manipulating the extremity of the base-rates in our reasoning problems we manipulated the strength of the logical intuition that is hypothesized to cue selection of the correct base-rate response. Consistent with the hybrid model predictions, we observed that experimentally reducing the strength of the logical intuition decreased the number of correct intuitive responses when solving problems in which heuristic and logical intuitions conflicted. Second, incorrect intuitive responders were less likely to register the intrinsic conflict (as reflected in decreased confidence) in this case, whereas correct intuitive responders experienced more conflict. These findings are hard to account for in the traditional serial default-interventionist (or parallel) model but support the postulations of the hybrid dual process model. The present study highlights how the hybrid dual process model generates new predictions that allow us to validate the model.

Although we believe that the results illustrate the potential of the hybrid model view, we also want to stress that the model is a "work in progress" and that there remain important challenges ahead. One issue concerns the specification of the role of System 2 processing in the framework. The general hybrid model that we put forward here focuses on the initial response stage in which System 2 is not activated. Although it postulates that detection of conflict will serve as a cue for the recruitment of System 2 (De Neys, 2012), it currently makes no further predictions about the nature of this System 2 processing. Interestingly, Pennycook et al. (2015), one of the author teams that favored a hybrid model view, have attempted to provide such a further characterization of the System 2 processing stage. In their three-stage model, the third processing stage specifies two different types of System 2 engagement that can follow intuitive conflict detection: cognitive decoupling and rationalization. Pennycook et al. explained the effects of their base-rate extremity manipulation primarily on the basis of these System 2 processes (e.g., less detection will lead to less rationalization and hence a less pronounced response latency increase). The present study highlights that similar effects can be observed on the basis of mere System 1 processing. During the critical initial response stage, System 2 processing was experimentally "knocked out" in the current study. Hence, in and by itself we do not need System 2 modulation to account for the effects of the base-rate extremity manipulation. To avoid confusion, it is important to stress that our findings do not argue against Pennycook et al.'s findings or model. The hybrid model that we put forward here focuses on the System 1 interaction between conflicting intuitions. It makes no further claims about the System 2 deliberation that might follow this conflict. In other words, it is possible that System 2 processes will result in similar effects; the model does not speak to this issue. The point is simply that the effects of a base-rate extremity manipulation can be observed in the absence of System 2 processing. System 2 processing might modulate these effects but it is not required to account for them. This implies that the precise role and possible unique contribution of System 2 remain to be specified in future work.

A second issue is that even the nature of the intuitive System 1 processing in the hybrid model (or any other dual process model, for that matter) is in need of a more detailed specification. To illustrate, consider the recent findings of Bago and De Neys (2017b). In this study, we attempted to manipulate the strength of a heuristic intuition by changing the presentation order of the base-rates and descriptive information. Building on the work of Pennycook et al. (2015), we hypothesized that whatever information is presented last would be more salient and increase in intuitive strength. Pennycook et al. found some evidence for this assumption with a classic single-response paradigm (e.g., presenting the description after (vs before) the base-rates decreased the number of correct responses). However, when we used the order manipulation with a two-response paradigm we observed the exact opposite effects at the initial response stage (Bago & De Neys, 2017b). This led Bago and De Neys to hypothesize that the last cued intuitive response had not reached its peak level at the enforced initial answer stage. As Bago and De Neys argued, although System 1 processing is assumed to be "fast", it is perhaps naïve to assume that intuitions are generated instantly at full strength. We need to factor in that they need some time to reach their peak strength (and will subsequently also decay in strength with the passing of time). The findings of Bago and De Neys (2017b) help us to start sketching a more fine-grained specification of the intuitive response generation mechanism. But the point we want to highlight here is that none of these features (i.e., rise and decay time) were a priori predicted by the hybrid model. Hence, arriving at a fully specified model of the postulated logical and heuristic intuition generation in the hybrid model will undoubtedly need further explorative work in the coming years. A related question is what exactly constitutes the "strength" of an intuition. The hybrid model we proposed here uses "strength" as a general functional label to refer to the hypothesized activation level of an intuitive response. But "strength" and "activation level" can be operationalized in various ways (e.g., processing "fluency" or "speed"). At present, the specific underlying processing specification and physical implementation remain to be characterized. Although we believe it is reasonable to rely on functional descriptions in theory development, we readily acknowledge that pinpointing the precise implementation remains an important challenge.

We noted that the "hybrid" model we presented here was inspired by the work of various scholars (e.g., Bago & De Neys, 2017a; Ball et al., 2017; Banks, 2017; Banks & Hope, 2014; Białek & De Neys, 2017; De Neys, 2012; Pennycook, 2017; Pennycook et al., 2015; Thompson & Newman, 2017; Trippas & Handley, 2017). Although we focused on the commonalities, one might wonder about the precise relation between the various models we subsumed under the "hybrid" view. To avoid confusion, it might be worthwhile to explicitly point to some key developments. One early starting point for the hybrid framework was the "logical intuition" model (De Neys, 2012). This model introduced the idea that System 1 cues both a heuristic and a logical intuition, which made it possible to account for the conflict detection findings and the evidence against the "bias blind spot". Pennycook et al.'s (2015) "three-stage" model presented a more advanced hybrid model view that could explicitly account for possible conflict detection failures while it also specified different types of System 2 engagement. Critically, Pennycook et al. postulated that differences in activation strength (i.e., generation speed in Pennycook et al.'s conceptualization) might underlie detection failures: for some reasoners the logical intuition can be so weak that they will not register conflict with the stronger heuristic intuition. Bago and De Neys (2017) further built on this strength variability idea to account for the observation that some reasoners generated the correct responses intuitively in their two-response studies. Hence, Bago and De Neys specified that logical intuitions can also dominate heuristic ones. In sum, although these different models are constructed around a shared hybrid core, it should be clear that the latest version (i.e., Bago & De Neys, 2017) specifies features that were not specified in the initial De Neys (2012) version. For example, although De Neys' (2012) proposal entailed that different intuitions can have different strengths, it did not explicitly predict that there would be cases in which the logical intuition would dominate (or cases in which it would be absent, e.g., Pennycook et al., for that matter). But these observations are readily accounted for in the Bago and De Neys (2017) model. This again illustrates the point that the hybrid view is a work in progress. It has been further developed and specified over the last couple of years and will need further specification and development in the coming years. As we noted, it is precisely because of the post hoc nature of the specifications in the Bago and De Neys (2017) version that it is critical to derive new predictions from the model and test these. It is here that the key contribution of the present paper lies.

Finally, one might also wonder how one needs to conceive the relation between the hybrid and the traditional DI dual process model. Clearly, the hybrid model combines key features of the traditional serial DI model and the parallel model (hence its "hybrid" name): just like the serial model, it assumes that System 2 processing is optional and starts later than System 1 processing. And just like the parallel model, it assumes that there is parallel logical and heuristic processing. However, unlike the parallel model, it claims that this logical processing results from System 1 activation. Nevertheless, the hybrid model maintains the core DI assumption that people rely by default on System 1 processing and only switch to System 2 processing in a later stage of the reasoning process. That is, the hybrid model still maintains the DI feature that some initial System 1 processing always precedes System 2 processing. We have been very clear about this in our own past writings. To illustrate, this is what De Neys (2014, p. 180) wrote:

“I believe that the conflict detection findings and Logical Intuition suggestion call for a hybrid view in which there is parallel activation of two different types of System-1 processes and an optional stage of System-2 processing (see De Neys, 2012, Figure 1, p. 34). To recap, my claim is that rather than parallel activation of two systems there would be parallel activation of two different types of intuitive System-1 processes: An intuitive heuristic process based on mere semantic and stereotypical associations, and what I refer to as an intuitive logical process based on the activation of traditional logical and probabilistic principles. Hence, it’s the “internal” System-1 conflict that triggers System-2. I agree with proponents of the default-interventionist view (e.g., Evans & Stanovich, 2013; Kahneman, 2010; Thompson, 2009) that it makes no sense to postulate that people simultaneously engage both full-blown System-2 and System-1 processing from the start. However, to put it in default-interventionist terms, my point is that part of our default processing is the activation of stored knowledge about class inclusion, proportionality, simple logical rules, etc. It is the automatic activation of this implicit knowledge that I refer to as “Logical Intuitions”. To be clear, once the default System-1 activation includes these logical intuitions, my view fits with the default-interventionist characterization.”

Bago and De Neys (2017, p. 108) stated:

“To avoid confusion, it should be stressed that the hybrid model we are advocating does not question that people rely by default on Type 1 processing and switch to Type 2 processing in a later stage. As we noted, the hybrid model still maintains the DI feature that default Type 1 processing precedes Type 2 processing. The key point is that the default Type 1 activation needs to include some elementary logical processing. If classic DI models allow for the postulation of logical intuitions as characterized here, they are of course fully coherent with the hybrid view (e.g., see De Neys, 2014).”

Note that Evans (2018) recently indicated that the traditional DI model indeed allows for the incorporation of logical intuitions. Evans' point is that the popular reasoning and decision making tasks that have been used to test the hybrid view entail fairly simple logical rules and principles. Therefore, these rules or principles might have become automatized through, for example, schooling and/or repeated exposure in daily life. Interestingly, such an automatization is precisely what De Neys (2012) sketched as a potential origin of the logical intuitions. In addition, task complexity has been delineated as a boundary condition of logical intuitions (De Neys, 2012, 2014). The logical intuition proposal never entailed that System 1 generates correct solutions to every possible problem that people are faced with. As we have put it somewhat bluntly before, the idea is not that people can solve nuclear physics or rocket science equations intuitively (De Neys, 2012, 2014). De Neys (2012) explicitly indicated that logical intuitions are observed in many classic reasoning tasks precisely because these tasks entail simple logical principles. Recent evidence also suggests that intuitive logical responding is less likely with increasing problem complexity (Trippas, Thompson, & Handley, 2017). For clarity, we note that the tasks that have been used to test the hybrid model (e.g., the bat-and-ball problem, conjunction fallacy, ratio bias, base-rate neglect, belief bias syllogisms) are the same tasks that have frequently been used to validate the DI model. We agree with the claim that these tasks are simple (De Neys, 2012). We disagree with a possibly implied suggestion that the tasks would therefore not be valid to test the DI predictions. The very same tasks have been used to argue in favor of the bias blind spot and corrective DI assumptions in the past (e.g., Kahneman, 2011; Stanovich & West, 2000).


To cut a long story short, because of the link between the hybrid and DI models, some might argue that the hybrid model can be conceived as an extension rather than a revision of the traditional DI model. We believe this is ultimately a semantic discussion in which we prefer to keep a neutral stance. But as we stressed before and again here, we readily acknowledge that the hybrid model builds on and maintains key features of the traditional DI framework.

In sum, the critical contribution of the present paper is that it demonstrates how the hybrid dual process model view, just like the traditional Default-Interventionist model in the past, allows us to derive new predictions that we can verify empirically. We believe that the current set of findings would be hard to account for in the traditional DI (or parallel) model and thereby lends credence to the hybrid model view. However, there is little reason to be triumphant. Even the hybrid model is still in its infancy. A key challenge will be to provide a more fine-grained specification of the nature of the different System 1 intuitions and the role of System 2 deliberation. At the same time, the present findings further underline the potential of a hybrid model view. In our opinion, it presents the most promising way forward for the dual process field.


REFERENCES

Aczel, B., Szollosi, A., & Bago, B. (2016). Lax monitoring versus logical intuition: The determinants of confidence in conjunction fallacy. Thinking & Reasoning, 22(1), 99–117. https://doi.org/10.1080/13546783.2015.1062801
Alós-Ferrer, C., & Strack, F. (2014). From dual processes to multiple selves: Implications for economic behavior. Journal of Economic Psychology, 41, 1–11.
Bago, B., & De Neys, W. (2017a). Fast logic?: Examining the time course assumption of dual process theory. Cognition, 158, 90–109.
Bago, B., & De Neys, W. (2017b). The rise and fall of conflicting intuitions during reasoning. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society (pp. 87–92). Austin, TX: Cognitive Science Society. Retrieved from https://mindmodeling.org/cogsci2017/papers/0028/index.html
Ball, L., Thompson, V., & Stupple, E. (2018). Conflict and dual process theory: The case of belief bias. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Banks, A. (2018). Comparing dual process theories: Evidence from event-related potentials. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Banks, A. P., & Hope, C. (2014). Heuristic and analytic processes in reasoning: An event-related potential study of belief bias. Psychophysiology, 51(3), 290–297.
Barr, N., Pennycook, G., Stolz, J. A., & Fugelsang, J. A. (2015). Reasoned connections: A dual-process perspective on creative thought. Thinking & Reasoning, 21(1), 61–75.
Białek, M., & De Neys, W. (2017). Dual processes and moral conflict: Evidence for deontological reasoners’ intuitive utilitarian sensitivity. Judgment and Decision Making, 12(2), 148–167.
Bonner, C., & Newell, B. R. (2010). In conflict with ourselves? An investigation of heuristic and analytic processes in decision making. Memory & Cognition, 38(2), 186–196.
Botvinick, M. M. (2007). Conflict monitoring and decision making: Reconciling two perspectives on anterior cingulate function. Cognitive, Affective, & Behavioral Neuroscience, 7(4), 356–366.
Cassotti, M., Agogué, M., Camarda, A., Houdé, O., & Borst, G. (2016). Inhibitory control as a core process of creative problem solving and idea generation from childhood to adulthood. New Directions for Child and Adolescent Development, 2016(151), 61–72.
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7(1), 28–38.
De Neys, W. (2018). Bias, conflict, and fast logic: Towards a hybrid dual process future? In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
De Neys, W., Cromheeke, S., & Osman, M. (2011). Biased but in doubt: Conflict and decision confidence. PloS One, 6(1), e15954.
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–1299.
De Neys, W., Rossi, S., & Houdé, O. (2013). Bats, balls, and substitution sensitivity: Cognitive misers are no happy fools. Psychonomic Bulletin & Review, 20(2), 269–273.
De Neys, W., & Schaeken, W. (2007). When people are more logical under cognitive load. Experimental Psychology, 54(2), 128–133.
De Neys, W., Vartanian, O., & Goel, V. (2008). Smarter than we think: When our brains detect that we are biased. Psychological Science, 19(5), 483–489.
De Neys, W., & Verschueren, N. (2006). Working memory capacity and a notorious brain teaser: The case of the Monty Hall Dilemma. Experimental Psychology, 53(2), 123–131.
Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49(8), 709–724.
Evans, J. (2010). Thinking Twice: Two Minds in One Brain. Oxford: Oxford University Press.
Evans, J. S. B., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Evans, J. S. B. T. (2018). Dual process theories: Perspectives and problems. In W. De Neys (Ed.), Dual Process Theory 2.0 (pp. 137–155). Oxon, UK: Routledge.
Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15(2), 105–128.
Gangemi, A., Bourgeois-Gironde, S., & Mancini, F. (2015). Feelings of error in reasoning—in search of a phenomenon. Thinking & Reasoning, 21(4), 383–396. https://doi.org/10.1080/13546783.2014.980755
Greene, J. (2013). Moral tribes: Emotion, reason and the gap between us and them. New York, NY: Penguin Press.
Gürçay, B., & Baron, J. (2017). Challenges for the sequential two-system model of moral judgement. Thinking & Reasoning, 23(1), 49–80.
Johnson, E. D., Tubau, E., & De Neys, W. (2016). The Doubting System 1: Evidence for automatic substitution sensitivity. Acta Psychologica, 164, 56–64.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
Mata, A., Ferreira, M. B., Voss, A., & Kollei, T. (2017). Seeing the conflict: An attentional account of reasoning errors. Psychonomic Bulletin & Review. http://dx.doi.org/10.3758/s13423-017-1234-7
Mevel, K., Poirel, N., Rossi, S., Cassotti, M., Simon, G., Houdé, O., & De Neys, W. (2015). Bias detection: Response confidence evidence for conflict sensitivity in the ratio bias task. Journal of Cognitive Psychology, 27(2), 227–237. https://doi.org/10.1080/20445911.2014.986487
Miyake, A., Friedman, N. P., Rettinger, D. A., Shah, P., & Hegarty, M. (2001). How are visuospatial working memory, executive functioning, and spatial abilities related? A latent-variable analysis. Journal of Experimental Psychology: General, 130(4), 621–640.
Newman, I., Gibb, M., & Thompson, V. A. (2017). Rule-based reasoning is fast and belief-based reasoning can be slow: Challenging current explanations of belief-bias and base-rate neglect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(7), 1154–1170.
Pennycook, G. (2018). A perspective on the theoretical foundation of dual process models. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). Cognitive style and religiosity: The role of conflict detection. Memory & Cognition, 42(1), 1–10.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.
Pennycook, G., Trippas, D., Handley, S. J., & Thompson, V. A. (2014). Base rates: Both neglected and intuitive. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(2), 544–554.
Rand, D. G., Greene, J. D., & Nowak, M. A. (2012). Spontaneous giving and calculated greed. Nature, 489(7416), 427–430.
Reyna, V. F. (2004). How people make decisions that involve risk: A dual-processes approach. Current Directions in Psychological Science, 13(2), 60–66.
Simon, G., Lubin, A., Houdé, O., & De Neys, W. (2015). Anterior cingulate cortex and intuitive bias detection during number conservation. Cognitive Neuroscience, 6(4), 158–168. https://doi.org/10.1080/17588928.2015.1036847
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.
Stanovich, K. (2011). Rationality and the reflective mind. Oxford: Oxford University Press.
Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate. Behavioral and Brain Sciences, 23(5), 645–665.
Stupple, E. J., Ball, L. J., & Ellis, D. (2013). Matching bias in syllogistic reasoning: Evidence for a dual-process account from response times and confidence ratings. Thinking & Reasoning, 19(1), 54–77.
Stupple, E. J., Ball, L. J., Evans, J. S. B., & Kamal-Smith, E. (2011). When logic and belief collide: Individual differences in reasoning times support a selective processing model. Journal of Cognitive Psychology, 23(8), 931–941.
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20(2), 215–244.
Thompson, V. A., Turner, J. A. P., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140.
Thompson, V., & Newman, I. (2018). Logical intuitions and other conundra for dual process theories. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Travers, E., Rolison, J. J., & Feeney, A. (2016). The time course of conflict on the Cognitive Reflection Test. Cognition, 150, 109–118.
Trippas, D., & Handley, S. (2018). The parallel processing model of belief bias: Review and extensions. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Trippas, D., Thompson, V. A., & Handley, S. J. (2017). When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias. Memory & Cognition, 45(4), 539–552.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Yeung, N., & Summerfield, C. (2012). Metacognition in human decision-making: Confidence and error monitoring. Philosophical Transactions of the Royal Society B, 367(1594), 1310–1321.


Chapter 4: Fast and Slow Thinking: Electrophysiological Evidence for Early Conflict Sensitivity

Popular dual process models have characterized reasoning as an interplay between fast, intuitive (System 1) and slow, deliberate (System 2) processes, but the precise nature of the interaction between the two systems is much debated. Here we relied on the temporal resolution of electroencephalogram (EEG) recordings to decide between different models. We adopted base-rate problems in which an intuitively cued stereotypical response was either congruent or incongruent with the correct response that was cued by the base-rates. Results showed that solving problems in which the base-rates and the stereotypical description cued conflicting responses resulted in an increased centro-parietal N2 and frontal P3. This early conflict sensitivity suggests that the critical base-rates can be processed fast, without slow and deliberate System 2 reflection. The findings validate prior EEG work and support recent hybrid dual process models in which the fast System 1 processes both heuristic belief-based responses (e.g., stereotypes) and elementary logico-mathematical principles (e.g., base-rates).

Based on Bago, B., Frey, D., Vidal, J., Houdé, O., Borst, G., & De Neys, W. (submitted). Fast and slow thinking: Electrophysiological evidence for early conflict sensitivity.


Introduction

For centuries, human thinking has been conceived as an interplay between more intuitive and more deliberate processes. In the last decades, dual process models that are inspired by this classic dichotomy have moved to center stage in the cognitive and economic sciences (Evans & Stanovich, 2013; Greene, 2013; Kahneman, 2011; Rand, Greene, & Nowak, 2012). At the heart of these dual process models lies the idea that human reasoning relies on two different types of thinking, often referred to as System 1 and System 2 processing (Stanovich, 1999). System 1 is assumed to operate quickly and effortlessly whereas System 2 is assumed to be slower and more effortful. It is System 1 (often also called the intuitive or heuristic system) that is supposed to mediate intuitive thinking, whereas System 2 (often also called the deliberate or analytic system) is supposed to mediate more deliberate thinking.

Despite the popularity of dual process models, the approach has also been criticized (e.g., De Neys & Glumicic, 2008; Gigerenzer & Regier, 1996; Keren & Schul, 2009; Osman, 2013). One key concern is that the framework lacks a precise processing specification of the two systems. A critical issue is that the nature of the interaction between the two systems is not clear. Traditionally, there has been some debate between proponents of a serial and a parallel view. The parallel view entails that both systems are always activated simultaneously from the start of the reasoning process (Epstein, 1994; Sloman, 1996). The serial model entails that people initially only activate System 1 and that optional System 2 activation occurs later in the reasoning process (Evans & Stanovich, 2013; Kahneman, 2011). More recently, so-called hybrid models have been put forward (e.g., Bago & De Neys, 2017; Banks, 2018; De Neys, 2012; Handley & Trippas, 2015; Pennycook, Fugelsang, & Koehler, 2015; Thompson & Newman, 2018; Trippas & Handley, 2018). Simply put, these hybrid models posit that the response that is traditionally expected to be calculated by System 2 can also be cued by System 1. System 1 would generate different types of intuitions such that possible conflict between them can be detected early in the reasoning process, without slow System 2 computations.

To illustrate these different views, consider the following reasoning problem: You are told that there is a sample of 995 females and 5 males. Next, you are told that one person ("Person X") was drawn randomly from the sample and you are informed that this person is a doctor. You are then asked whether it is more likely that Person X is male or female. This example is based on Tversky and Kahneman's (1974) famous base-rate neglect problems. Intuitively, many people will tend to say that Person X is male, based on stored stereotypical associations cued by the descriptive information ("Doctors are male"). If the job description were your only piece of information, that might be a fair guess: in general, there are more male than female doctors. However, there are also female doctors, and in the problem premises you were explicitly told that there were far more females than males in the sample from which Person X was drawn. If you take this extreme base-rate information into account, this should push the scale to the "female" side. However, decades of studies have shown that people often fail to respect elementary logical considerations such as the base-rate principle and give the intuitive or so-called "heuristic" response that is cued by their stereotypical prior beliefs (e.g., Kahneman, 2011).

Traditional serial and parallel dual process models have typically assumed that taking logico-mathematical principles into account, and giving the response favored by the base-rates, for example, requires System 2 deliberation. The key idea is that because System 2 operations are demanding and slow, most people will not wait for the slow process to complete or will simply refrain from engaging in it altogether. Consequently, they end up being biased and give the heuristic System 1 response. The hybrid model entails that people can also process the logical response intuitively. Hence, System 1 will cue at least two intuitive responses: a "heuristic" response based on stereotypical associations and a "logical" intuitive response based on automatically activated elementary knowledge of logico-mathematical principles. Both the hybrid and the traditional models can explain that the heuristic response will typically dominate: the traditional models because the logical response will not (yet) be computed at the time of decision; the hybrid model because the heuristic response can have a higher activation level (Bago & De Neys, 2017; Pennycook et al., 2015). However, the key difference is that the intuitive processing of logical features in the hybrid model implies that reasoners can detect instantly that there are conflicting responses at play early on in the reasoning process, without any engagement of the slow System 2.

Recent behavioral studies that aimed to test these different models have provided some initial support for the hybrid view (e.g., Franssens & De Neys, 2009; Johnson, Tubau, & De Neys, 2016; Nakamura & Kawaguchi, 2016; Pennycook, Trippas, Handley, & Thompson, 2014; Thompson & Johnson, 2014; Trippas, Handley, Verde, & Morsanyi, 2016; Trippas, Thompson, & Handley, 2017). For example, conflict detection studies have contrasted how people process classic reasoning problems in which an intuitively cued heuristic response conflicts with elementary logical considerations (i.e., conflict problems) and control no-conflict problems. In the control versions, small content transformations guarantee that the intuitively cued heuristic response is also logically correct. For example, one can easily create a no-conflict control version of the introductory base-rate problem by switching the base-rates around (e.g., you are told that Person X is a doctor but is drawn from a sample with 995 males and 5 females). In this case, both base-rate considerations and the stereotypical associations triggered by the job description cue the exact same response. Results show that people are sensitive to the presence of conflict, as evidenced by increased response times (e.g., De Neys & Glumicic, 2008), decreased confidence (e.g., De Neys, Cromheeke, & Osman, 2011), or activation of brain regions that have long been assumed to mediate conflict detection (e.g., the Anterior Cingulate Cortex; De Neys, Vartanian, & Goel, 2008; Simon, Lubin, Houdé, & De Neys, 2015; Vartanian et al., in press). Critically, these effects are observed even when people are put under time pressure or cognitive load so that possible System 2 processing is experimentally minimized (e.g., Bago & De Neys, 2017; Franssens & De Neys, 2009; Howarth, Handley, & Walsh, 2016; Johnson, Tubau, & De Neys, 2016; Newman, Gibb, & Thompson, 2017; Pennycook et al., 2015; Thompson & Johnson, 2014). In sum, these conflict sensitivity findings suggest that base-rates and other logico-mathematical aspects of the reasoning problem are processed even when people have to rely on mere System 1 processing. This conclusion has been validated with a range of behavioral paradigms (e.g., Handley & Trippas, 2015; Trippas, Handley, Verde, & Morsanyi, 2016; Trippas, Thompson, & Handley, 2017; but see also Mata, Ferreira, Voss, & Kollei, 2017; Pennycook, Fugelsang, & Koehler, 2012; Travers, Rolison, & Feeney, 2016).

However, all these behavioral studies face an intrinsic limitation: by definition, they are all response dependent. For example, confidence measures are typically collected post response. Likewise, response time measurements require overt response generation. Consequently, even when applying time pressure manipulations or minimal "rapid-response" task versions designed to allow for fast response generation (e.g., Pennycook, Cheyne, Barr, Koehler, & Fugelsang, 2014), it still takes at the very least several seconds before an overt response has been selected in a reasoning task. However, if the fast System 1 is indeed processing base-rates and other logical task features intuitively, it should be possible to find signs of early conflict sensitivity much earlier in the reasoning process, before the actual response has been given.


Banks and Hope (2014) were the first to realize the potential of electroencephalogram (EEG) recordings and their unique temporal resolution in this respect. Banks and Hope presented participants with syllogisms in which the logical validity of the conclusion could conflict with a heuristic response cued by the believability of the conclusion. For example, an illustration of a conflict problem would be a valid syllogism with an unbelievable conclusion (e.g., "All mammals can walk. Whales are mammals. Therefore, whales can walk."). An illustration of a no-conflict problem would be a valid syllogism with a believable conclusion (e.g., "All flowers need light. Roses are flowers. Therefore, roses need light."). By time-locking an event-related potentials (ERP) analysis to the presentation of the last word of the conclusion (i.e., the exact point at which belief-logic conflict could occur), Banks and Hope could test whether early electrophysiological activation differed as a function of the conflict status of the problem. Such early conflict sensitivity would be expected if fast System 1 operations process the logical status of the problem. If slow System 2 processing is required, then detection of logic/belief conflict should occur much later in the reasoning process. Results pointed to very early conflict sensitivity after a mere 200 ms had elapsed: in contrast with no-conflict problems, the conflict trials gave rise to a reduced N2 and an enhanced P3 component. The N2 and P3 are well-known negative and positive deflections that occur between 200-350 ms and 300-500 ms after the event, respectively, and have been associated with information monitoring, control, and updating processes (e.g., Borst, Simon, Vidal, & Houdé, 2013; Folstein & Van Petten, 2008; Polich, 2007; Ullsperger, Fischer, Nigbur, & Endrass, 2014; Yeung & Summerfield, 2012).

The Banks and Hope (2014) early conflict sensitivity findings indicate that logical reasoning, a process that is traditionally believed to require slow System 2 computations, can literally be accomplished in a split second. This fits with the hybrid dual process model's postulation of intuitive logical processing (Banks, 2018). However, to draw strong theoretical conclusions it is important to establish whether the results are robust. To avoid confusion, Banks and Hope (2014) were obviously not the first to study reasoning processes with EEG per se (e.g., Bonnefond, Kaliuzhna, Van der Henst, & De Neys, 2014; Bonnefond & Van der Henst, 2009, 2013; Luo et al., 2008, 2013; Luo, Yang, Du, & Zhang, 2011; Malaia, Tommerdahl, & McKee, 2015). However, these prior studies were not specifically designed to test between different dual process models. For example, many studies used a design that was time-locked to the response generation (e.g., Luo et al., 2013) or to the initial presentation of the problem premises (e.g., Luo et al., 2011; Luo et al., 2008). This complicates testing for early conflict sensitivity (i.e., participants are still reading the premises or have already responded). In addition, many studies did not manipulate belief-logic conflict experimentally (e.g., Bonnefond, Kaliuzhna, Van der Henst, & De Neys, 2014; Bonnefond & Van der Henst, 2009, 2013; Malaia, Tommerdahl, & McKee, 2015). In sum, to draw clear conclusions it is important to test the generalizability and robustness of the initial Banks and Hope (2014) findings.

The present paper addresses this issue. We focused on the popular base-rate task and tested whether the N2 and P3 showed early sensitivity to the intrinsic conflict between the response cued by the base-rates and the stereotypical description. Our rationale for choosing the base-rate task was that the task has been extensively used in behavioral conflict detection studies. These studies presented abundant behavioral evidence for the intuitive nature of the base-rate processing in the task (e.g., Bago & De Neys, 2017; Franssens & De Neys, 2009; Pennycook, Trippas, Handley, & Thompson, 2014; Thompson & Johnson, 2014). Hence, if Banks and Hope are correct in that the N2 and P3 reflect early conflict sensitivity, we should a fortiori observe it in the base-rate task. A second objective of the study was to test whether correct and incorrect responders show differential conflict sensitivity. Like most reasoning and EEG studies, Banks and Hope's (2014) work only focused on correctly solved trials. However, behavioral studies have indicated that intuitive conflict sensitivity is not only observed for correct but also for incorrect conflict responses (e.g., Bago & De Neys, 2017; Franssens & De Neys, 2009; Pennycook et al., 2015; Thompson & Johnson, 2014; but see also Aczel, Szollosi, & Bago, 2016; Mata et al., 2017; Travers et al., 2016). Hence, by including both correctly and incorrectly solved conflict trials in our analysis we wanted to test whether the N2 and P3 effects were observed irrespective of response accuracy.

METHOD

Participants

In total, 31 participants took part in this experiment (27 female, M = 23.4 years, SD = 4.2). All participants were right-handed and had normal or corrected-to-normal vision. None of them reported having had neurological surgery or any known neurological or psychiatric problems. All participants were native English-speaking North American current or former university students. All participants provided written informed consent and were tested in accordance with national and international norms governing the use of human research participants.

Material and Procedure

Reasoning task. Participants solved a total of 66 base-rate problems. All problems were taken from Pennycook et al. (2015). In each problem participants received a description of the composition of a sample (e.g., "This study contained I.T. engineers and professional boxers"), base-rate information (e.g., "There were 995 engineers and 5 professional boxers"), and a description that was designed to cue a stereotypical association (e.g., "This person is strong"). Participants' task was to indicate to which group the person most likely belonged. The problem presentation format was based on Pennycook et al.'s (2014) rapid-response paradigm. In this paradigm, the descriptive information consists of a neutral name ("Person L") and a single-word personality trait (e.g., "strong" or "funny") that was designed to trigger the stereotypical association. The following illustrates the full problem format:

This study contains clowns and accountants.
Person 'L' is funny.
There are 995 clowns and 5 accountants.
Is Person 'L' more likely to be:
1) A clown
2) An accountant

Each trial started with the presentation of a fixation cross for 1000 ms. After the fixation cross disappeared, the sentence which specified the two groups appeared for 2000 ms. Then the descriptive information appeared for another 2000 ms while the first sentence remained on the screen. Finally, the last sentence specifying the base-rates appeared together with the question and the two response alternatives. Note that we presented the base-rates and the question together (rather than presenting the base-rates for 2000 ms first) to minimize the possibility that some participants would start solving the problem during the presentation of the base-rate information. Once the base-rates and question were presented, participants were able to select their answer by pushing a button corresponding to the selected response. There was a 7000 ms response deadline on each problem. Note that in 0.6% of the trials participants missed the deadline. These trials were discarded from further analysis.

Half of the presented problems were conflict items and the other half were no-conflict items. In no-conflict items the base-rate probabilities and the descriptive information cued the same response. In conflict items the descriptive information and the base-rate probabilities cued different responses. Following Pennycook et al. (2014), we used three slightly altered base-rate levels (i.e., 997/3, 996/4, 995/5) to make the task less repetitive. Each ratio was used with equal frequency. Problems were presented in random order.

All material was extensively pretested (see Pennycook et al., 2015). Pennycook et al. made sure that the words that were selected to cue a stereotypical association consistently did so while avoiding extremely diagnostic cues. Such a non-extreme, moderate association is important. For convenience and consistency with prior work, we label the response that is in line with the base-rates as the correct response. Critics of the base-rate task (e.g., Gigerenzer, Hell, & Blank, 1988) have long pointed out that if reasoners adopt a Bayesian approach and combine the base-rate probabilities with the stereotypical description, this can lead to interpretational complications when the description is extremely diagnostic. For example, imagine that we have an item with males and females as the two groups and give the description that Person 'A' is 'pregnant'. In this case, one would always need to conclude that Person 'A' is a woman, regardless of the base-rates. The more moderate descriptions (such as 'kind' or 'funny') help to avoid this potential problem. In addition, the extreme base-rates (997/3, 996/4, or 995/5) that were used in the current study further help to guarantee that even a very approximate Bayesian reasoner would need to pick the response cued by the base-rates (see De Neys, 2014).
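A simple worked example makes this last point concrete. Take a conflict item with 5 clowns and 995 accountants in which Person 'L' is described as funny, and assume, purely for illustration, that being "funny" is ten times more likely for a clown than for an accountant (a likelihood ratio of 10, a value picked here only for the sake of the example). In the odds form of Bayes' rule:

posterior odds (clown : accountant) = prior odds × likelihood ratio = (5/995) × 10 ≈ 0.05,

so the probability that Person 'L' is an accountant still exceeds 95%. Only an implausibly extreme likelihood ratio (larger than 199) would make the stereotypical response the more probable one, which is precisely why the moderate descriptions avoid such interpretational complications.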

EEG recording and preprocessing. The electroencephalogram (EEG) was recorded from a 256-channel HydroCel Geodesic Sensor Net (Electrical Geodesics Inc., Eugene, Oregon, USA) containing electrodes embedded in small sponges soaked in a potassium chloride saline solution. Continuous EEG was acquired through a DC amplifier (Net Amps 300 1.0.1, EGI) and digitized at a sampling rate of 500 Hz. A common reference at the vertex was used during acquisition and electrode impedances were kept below 100 kΩ. Eye-blinks and eye-movements were monitored via pairs of channels (included in the net) covering the face area.


All processing stages described below were performed using EEGLab (Delorme & Makeig, 2004). Activity from all electrodes was re-referenced to the average. The raw EEG data were passed through a high-pass filter (0.5 Hz) and a low-pass filter (30 Hz). Muscular and ocular artefacts were removed from the continuous EEG data using Artefact Subspace Reconstruction (ASR, implemented in the EEGLab plugin “clean_rawdata”; see Mullen et al., 2015). The continuous EEG was then segmented from -200 ms to 700 ms relative to the onset of the presentation of the base-rates. The epochs were baseline corrected using the mean voltage in the 200 ms prestimulus period. N2 and P3 amplitudes were defined as the average voltage in pre-specified time windows: the N2 as the average in the 175-250 ms interval, and the P3 as the average in the 300-500 ms interval following stimulus onset (see Rietdijk, Franken, & Thurik, 2014). As in previous work (e.g., Banks & Hope, 2014), we calculated the mean amplitudes at frontal, central, and parietal electrode sites (roughly corresponding to Fz, Cz, and Pz) where the N2 and P3 are typically maximal. As we had 256 electrodes, when calculating the amplitude averages for each electrode site we took into account all electrodes located directly next to the electrode in question. Hence, we used the following electrodes (numbers correspond to the electrode map of the HydroCel Geodesic Sensor Net): Parietal (Pz, 89, 100, 110, 119, 128, 129, 130); Central (Cz, 9, 45, 81, 132, 186); Frontal (Fz, 13, 14, 20, 22, 27, 28). We performed a trial-based analysis, using mixed effect models and the lmerTest package in R (Kuznetsova et al., 2015). Note that this trial-based analysis does not change the ERP averages compared to a more traditional ERP analysis in which the averages are first calculated at the individual level. However, it increases the probability of detecting real effects as it takes individual trials into account and thus increases statistical power (Baayen, Davidson, & Bates, 2008; Quené & Van den Bergh, 2008; for applications with ERP data, see Tremblay & Newman, 2015). We entered the random effect of participants and the random effect of electrodes into the model to filter out noise introduced by individual electrodes or participants.
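As a minimal sketch of this trial-level mixed-model approach, assume a hypothetical data frame `erp` with one row per trial and electrode and columns amplitude (mean voltage in the N2 or P3 window), conflict, site, subject, and electrode (these variable names are ours for illustration, not from the original analysis scripts):

```r
library(lmerTest)  # mixed models with significance tests (Kuznetsova et al., 2015)

# Full model: conflict x electrode-site interaction, with random intercepts
# for participants and electrodes to filter out participant/electrode noise.
m_full <- lmer(amplitude ~ conflict * site + (1 | subject) + (1 | electrode),
               data = erp)

# Reduced model without the interaction; a likelihood-ratio (chi-square) test
# between the two mirrors the interaction tests reported in the Results.
m_red <- lmer(amplitude ~ conflict + site + (1 | subject) + (1 | electrode),
              data = erp)
anova(m_red, m_full)
```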


RESULTS

Behavioral results

Accuracy

In line with previous studies, participants were frequently biased by the stereotypical description on the conflict problems: overall conflict problem accuracy reached 66.4% (SD = 47.2). As expected, accuracy on the no-conflict problems, in which the stereotypical and base-rate response agreed, was at ceiling with an overall accuracy rate of 97.6% (SD = 15.3), χ2 (1) = 1008.4, p < 0.0001, b = 4.06. Note that in all remaining behavioral and ERP analyses we discarded the few (i.e., 2.4%) incorrectly solved no-conflict trials. In no-conflict trials, the base-rate and stereotypical information point to the same correct response. Therefore, incorrect responses cannot be interpreted unequivocally and are typically discarded in conflict detection studies (De Neys & Glumicic, 2008; Pennycook et al., 2015). We will refer to the correct no-conflict problems as the baseline problems.
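A sketch of how such an accuracy contrast can be computed, assuming a hypothetical trial-level data frame `beh` with a binary accuracy column, a conflict factor, and a subject identifier (the names are illustrative):

```r
library(lme4)

# Mixed logistic regression of accuracy on item type (conflict vs no-conflict),
# with a random intercept per participant.
m_acc <- glmer(accuracy ~ conflict + (1 | subject),
               data = beh, family = binomial)
m_null <- glmer(accuracy ~ 1 + (1 | subject),
                data = beh, family = binomial)

# Likelihood-ratio chi-square test of the conflict effect; the fixed-effect
# estimate for conflict corresponds to the reported b coefficient.
anova(m_null, m_acc)
```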

Table 1. Overview of the latency results. The table shows the geometric means (with SD; SE for the difference scores) as well as the difference between the no-conflict correct (baseline) trials and the correctly and incorrectly solved conflict trials.

                    Correct               Incorrect
Conflict            1929.2 ms (1.8)       1706.7 ms (2.1)
No-conflict         1556.1 ms (1.8)       2124.5 ms (2.2)
Difference score    -361.1 ms (0.06)      -150.6 ms (0.09)

Latencies

All latencies were logarithmically (log10) transformed prior to analysis. Table 1 shows the results. We found that participants took overall more time to solve conflict than baseline no-conflict trials, χ2 (1) = 116.44, p < 0.0001, b = -0.07. More critically, we found this increase both for correctly, χ2 (1) = 113.51, p < 0.0001, b = -0.08, and incorrectly, χ2 (1) = 69.35, p < 0.0001, b = -0.08, solved conflict problems. Increased latencies for conflict vs baseline no-conflict problems are typically taken as evidence for conflict sensitivity (De Neys & Glumicic, 2008; Pennycook et al., 2015). Hence, at the behavioral level our latency results replicate previously established evidence for conflict sensitivity with this task (De Neys, 2012; Pennycook et al., 2014, 2015).
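A sketch of the latency analysis under the same assumptions (hypothetical data frame `beh` with raw response times rt in ms):

```r
# Latencies are log10-transformed before modeling; back-transforming the mean
# of the logs yields the geometric means reported in Table 1.
beh$log_rt <- log10(beh$rt)

library(lmerTest)
m_rt <- lmer(log_rt ~ conflict + (1 | subject), data = beh)
m_0  <- lmer(log_rt ~ 1 + (1 | subject), data = beh)
anova(m_0, m_rt)   # likelihood-ratio chi-square test of the conflict effect

# Geometric means per trial type, back-transformed to milliseconds.
10^tapply(beh$log_rt, beh$conflict, mean)
```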

ERP Amplitudes

Figure 1 gives a general overview of the grand average ERP waveforms for conflict and no-conflict baseline trials at our three electrode locations.

N2 amplitude

We first analyzed the overall contrast between no-conflict baseline and conflict problems (irrespective of conflict accuracy). For statistical testing, we added electrode location (Frontal, Central, Parietal), conflict (conflict or no-conflict problem), and their interaction to the model. We found a significant effect of electrode location, χ2 (2) = 30.76, p < 0.001, a significant effect of conflict, χ2 (3) = 13.71, p < 0.001, and also a significant interaction effect, χ2 (5) = 6.34, p = 0.04. As we had a significant interaction, we analyzed each electrode site separately. We found a significant difference between conflict and no-conflict trials at the Central, χ2 (1) = 14.7, p < 0.001, b = -0.37, and Parietal groups, χ2 (1) = 13.46, p < 0.001, b = -0.48, but not at the Frontal group, χ2 (1) = 0.44, p = 0.51, b = -0.08. In both the Central and Parietal groups, the N2 amplitude was more negative on conflict trials than on no-conflict trials. Hence, the presence of conflict between base-rates and description resulted in a more pronounced centro-parietal N2, which supports the idea that reasoners show early conflict sensitivity.


Figure 1. Grand average ERP waveforms for conflict and no-conflict baseline trials at each of the three electrode locations of interest (Frontal, Central, Parietal).

Next, we also ran an exploratory analysis in which we separated the conflict trials by their accuracy to test whether the N2 findings differed for correct and incorrect responses. Thus, instead of conflict, we entered a variable “response category” with 3 levels (no-conflict correct responses, conflict correct responses, and conflict incorrect responses) into the model. Results showed that both the main effect of electrode site, χ2 (2) = 30.76, p < 0.001, and of response category, χ2 (4) = 13.72, p < 0.001, but not their interaction, χ2 (8) = 13.71, p = 0.07, significantly improved model fit. Follow-up tests for the response category effect showed that, in comparison with the no-conflict baseline trials, the N2 across the 3 electrode sites was more negative for both correct conflict responses, b = -0.25, t = -3.26, p = 0.001, and incorrect conflict responses, b = -0.25, t = -2.36, p = 0.018. However, visual inspection of Figure 2 suggests that this effect might not be equally strong at all electrode locations; while there is a clear effect at the centro-parietal Central and Parietal groups, the effect at the Frontal group seems weaker. Given that the model interaction was marginally significant and the overall analysis also pointed to stronger conflict effects at centro-parietal electrodes, we analysed the effect of response category at each electrode location separately. Consistent with the visual trend, results showed that there were significant effects at the Parietal, χ2 (2) = 14.89, p < 0.001, and Central, χ2 (2) = 15.33, p < 0.001, sites but not at the Frontal one, χ2 (2) = 0.46, p = 0.79.
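A sketch of this exploratory 3-level analysis (same hypothetical `erp` data frame, now with a factor resp_cat coding the three response categories):

```r
library(lmerTest)

# Setting the no-conflict baseline as the reference level makes each
# fixed-effect coefficient a direct baseline-vs-conflict contrast,
# matching the follow-up b and t values reported above.
erp$resp_cat <- relevel(erp$resp_cat, ref = "no-conflict correct")
m_cat <- lmer(amplitude ~ resp_cat + (1 | subject) + (1 | electrode),
              data = erp)
summary(m_cat)
```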

P3 amplitude

We used the same analysis approach as for the N2 amplitude. Hence, we first analyzed the overall contrast between no-conflict baseline and conflict problems (irrespective of accuracy). We added electrode location (Frontal, Central, Parietal), conflict (conflict or no-conflict problem), and their interaction to the model. Results pointed to a significant effect of electrode location, χ2 (2) = 34.68, p < 0.001, no effect of conflict, χ2 (3) = 3.57, p = 0.06, but a significant interaction, χ2 (5) = 16.6, p = 0.04. Given the interaction, we analyzed each electrode site separately. We found a significant difference between conflict and no-conflict trials at the Frontal group, χ2 (1) = 13.47, p = 0.001, b = 0.38, but not at the Central, χ2 (1) = 0.92, p = 0.34, b = -0.08, or Parietal groups, χ2 (1) = 3.4, p = 0.07, b = -0.21. Hence, the P3 findings point to a more frontal conflict sensitivity following the centro-parietal N2. Next, we also ran an analysis in which we separated the conflict trials by their accuracy. As with the N2, we entered a variable “response category” with 3 levels (no-conflict correct responses, conflict correct responses, and conflict incorrect responses) into the models. We found a significant effect of electrode location, χ2 (2) = 34.7, p < 0.001, an effect of response category, χ2 (4) = 11.2, p = 0.004, and a significant interaction, χ2 (8) = 20.3, p < 0.001. Analysis of the individual electrode sites indicated that there was no effect of response category at the Central, χ2 (2) = 2.2, p = 0.33, or Parietal sites, χ2 (2) = 4.37, p = 0.11. Consistent with the overall analysis, response category did have a significant effect at the Frontal site, χ2 (2) = 24.6, p < 0.001. Follow-up tests showed that the P3 was significantly more positive for correct conflict trials than in the no-conflict baseline, b = 0.59, t = 4.86, p < 0.001. The P3 amplitude for incorrectly solved conflict trials, however, did not differ significantly from the no-conflict baseline, b = -0.01, t = -0.09, p = 0.93. Hence, the frontal P3 conflict effect was specifically driven by correctly solved conflict trials.


Figure 2. Average ERP amplitudes for the N2 (A) and P3 (B) for the three response categories (no-conflict baseline, correct conflict, incorrect conflict) at each electrode location of interest (Frontal, Central, Parietal). Error bars are 95% confidence intervals.


GENERAL DISCUSSION

In the present paper we used EEG to test for early conflict sensitivity during reasoning. We adopted base-rate problems in which a cued stereotypical response was either congruent or incongruent with the correct response that was cued by the base-rates. Results showed that solving problems in which the base-rates and stereotypical description cued conflicting responses resulted in an increased centro-parietal N2 and frontal P3. This early conflict sensitivity suggests that the critical base-rates can be processed fast, without slow and deliberate System 2 reflection. Consistent with previous EEG work (Banks & Hope, 2014), these results lend credence to recent hybrid dual process models which posit that the fast System 1 processes both heuristic belief-based responses (e.g., stereotypes) and elementary logical principles (e.g., base-rates). Results also suggest that the early conflict sensitivity is observed for both correct and incorrect conflict responses. Although the P3 results did not reach significance for incorrect responders, the earlier N2 was observed regardless of response accuracy. Hence, this tentatively suggests that even incorrect responders manage to readily process the base-rate information. This supports earlier behavioral findings on the base-rate and other tasks suggesting that incorrect responding does not necessarily result from a failure to detect conflict (De Neys, 2012; Pennycook et al., 2015). With respect to the theoretical implications of our findings, one might suggest a possible escape hatch for the traditional dual process models. In theory, it could be argued that the deliberate System 2 reflection on the base-rates was already completed after 200 ms. However, reflective System 2 processes have traditionally been characterized as taking several seconds or even minutes to complete (Banks, 2018; Kahneman, 2011). We contend that if reflective processes during high-level reasoning are argued to be completed in a split-second, this defies the central dual process purpose of contrasting “Fast and Slow” thinking (Kahneman, 2011) and the use of operation speed as a defining characteristic of intuitive and deliberate processing. We believe that the present findings underscore that it is more fruitful for dual process theorists to incorporate the processing of elementary logical principles, such as the importance of base-rate information, into the fast and intuitive System 1 (e.g., Evans, 2018). This is precisely the core tenet of the hybrid dual process model (De Neys, 2018). Overall, our EEG results corroborate the findings of Banks and Hope (2014). Both studies indicate that the N2 and P3 show early sensitivity to heuristic/logic conflict during reasoning at the neural level.

This lends general credence to the robustness of the initial Banks and Hope findings. However, for completeness we should point out that there were also some differences between the two studies. One example concerns the directionality of the N2 findings. Although Banks and Hope found that the N2 was affected by conflict, it was not in the direction they initially expected: the N2 was larger (more negative) for no-conflict than for conflict problems. The monitoring and control processes that the N2 is often believed to index typically result in a more negative N2 amplitude in conditions where one is faced with cognitive conflict (e.g., Folstein & Van Petten, 2008; Ullsperger et al., 2014; Yeung & Summerfield, 2012). Banks and Hope suggested that their finding might result from the peculiarities of the specific paradigm they adopted and might not be reliable. We simply note in this respect that in the present study the N2 did show the expected pattern and was more negative on the conflict than on the no-conflict problems. One interesting question for further research concerns the precise nature of the cognitive processes that gave rise to the N2 and P3 potentials. We noted that the N2 and P3 have frequently been linked to control processes such as monitoring, updating, and response inhibition (e.g., Borst et al., 2013; Folstein & Van Petten, 2008; Polich, 2007; Ullsperger et al., 2014; Yeung & Summerfield, 2012). All of these processes can be conceived to be implicated in the detection of conflict between a cued stereotypical and base-rate response: in order to detect such conflict, monitoring processes must be engaged, detection of conflict might require updating of one’s problem representation, and generation of a single answer can imply inhibition of one of the conflicting responses. The present study was not designed to, and does not allow us to, make more specific claims about the precise contribution of each component or to disentangle the different hypotheses. For example, we observed that in contrast to correct conflict problem responders, incorrect responders’ initial N2 was not followed by a (significant) P3. Based on the assumption that the N2 primarily signals the presence of conflict and the P3 response inhibition, one very tentative explanation is that incorrect responders detect conflict but are subsequently less efficient at recruiting inhibitory processing (e.g., Houdé, 2000; Houdé & Borst, 2015; Simon et al., 2015). However, Banks (2018; Banks & Hope, 2014) reasoned that the P3 would rather reflect an updating process (e.g., Polich, 2007). Hence, the non-significant P3 for incorrect responders might as well indicate that incorrect responders are struggling with this updating process. A related point concerns the precise interpretation of our N2 potential. On the basis of the grand average waveforms (Figure 1), one could argue that this potential might be conceived as a reduced positivity (P2) rather than an increased negativity (N2) for the conflict problems.

Interestingly, the P2 has been associated with anticipation and selective attention processes (Luck & Hillyard, 1994; Ma, Li, Shen, & Qiu, 2015; Philips & Takeda, 2009). Studies suggest that an increase in selective attention results in a decreased P2; task conditions that require more selective attention typically give rise to a reduced P2 (Luck & Hillyard, 1994; Philips & Takeda, 2009). Hence, in this light one might note that rather than conflict detection per se, our early “N2/P2” might reflect the selective attention increase that enables or accompanies successful detection. Although these are interesting questions for further research, the key point that was tested in the present study was whether during high-level reasoning we see an early divergence in the ERP signal when reasoners are confronted with conflict between cued heuristics and base-rate considerations. It is such early sensitivity, in combination with convergent behavioral findings, that presents a strong case against the traditional dual process assumption that taking the base-rates into account and detecting conflict with the cued heuristic response requires slow and demanding System 2 reflection. In closing, we believe that the present study nicely demonstrates how the temporal resolution of EEG can be used to inform fundamental theoretical debates in the reasoning field. Different dual process models make in essence different predictions about the time-course of the intuitive and deliberate interaction (Bago & De Neys, 2017; De Neys, 2018; Evans, 2007; Travers et al., 2016). We hope that the current study further illustrates the potential of EEG and will lead to a more general adoption of the methodology in dual process research (Banks, 2018). We conclude that the presently available EEG evidence provides good support for a hybrid dual process model in which the fast System 1 cues responses that were traditionally believed to require slow System 2 reflection. In general, this requires us to upgrade the role of System 1 and questions the long-held belief that taking basic logico-mathematical principles into account necessarily requires us to engage in slow deliberative thinking.


REFERENCES

Aczel, B., Szollosi, A., & Bago, B. (2016). Lax monitoring versus logical intuition: The determinants of confidence in conjunction fallacy. Thinking & Reasoning, 22(1), 99–117. https://doi.org/10.1080/13546783.2015.1062801
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Bago, B., & De Neys, W. (2017). Fast logic?: Examining the time course assumption of dual process theory. Cognition, 158, 90–109.
Banks, A. (2018). Comparing dual process theories: Evidence from event-related potentials. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Banks, A. P., & Hope, C. (2014). Heuristic and analytic processes in reasoning: An event-related potential study of belief bias. Psychophysiology, 51(3), 290–297.
Bonnefond, M., Kaliuzhna, M., Van der Henst, J.-B., & De Neys, W. (2014). Disabling conditional inferences: An EEG study. Neuropsychologia, 56, 255–262.
Bonnefond, M., & Van der Henst, J.-B. (2009). What’s behind an inference? An EEG study with conditional arguments. Neuropsychologia, 47(14), 3125–3133.
Bonnefond, M., & Van der Henst, J.-B. (2013). Deduction electrified: ERPs elicited by the processing of words in conditional arguments. Brain and Language, 124(3), 244–256.
Borst, G., Simon, G., Vidal, J., & Houdé, O. (2013). Inhibitory control and visuo-spatial reversibility in Piaget’s seminal number conservation task: A high-density ERP study. Frontiers in Human Neuroscience, 7, 920.
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7(1), 28–38.
De Neys, W. (2018). Bias, conflict, and fast logic: Towards a hybrid dual process future? In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
De Neys, W., Cromheeke, S., & Osman, M. (2011). Biased but in doubt: Conflict and decision confidence. PloS One, 6(1), e15954.
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–1299.
De Neys, W., Vartanian, O., & Goel, V. (2008). Smarter than we think: When our brains detect that we are biased. Psychological Science, 19(5), 483–489.
Delorme, A., & Makeig, S. (2004). EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. Journal of Neuroscience Methods, 134(1), 9–21.
Epstein, S. (1994). Integration of the cognitive and the psychodynamic unconscious. American Psychologist, 49(8), 709–724.
Evans, J. S. B. (2007). On the resolution of conflict in dual process theories of reasoning. Thinking & Reasoning, 13(4), 321–339.
Evans, J. S. B., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.


Evans, J. S. B. (2018). Dual process theories: Perspectives and problems. In W. De Neys (Ed.), Dual Process Theory 2.0 (pp. 137–155). Oxon, UK: Routledge.
Folstein, J. R., & Van Petten, C. (2008). Influence of cognitive control and mismatch on the N2 component of the ERP: A review. Psychophysiology, 45(1), 152–170.
Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15(2), 105–128.
Gigerenzer, G., Hell, W., & Blank, H. (1988). Presentation and content: The use of base rates as a continuous variable. Journal of Experimental Psychology: Human Perception and Performance, 14(3), 513–525.
Gigerenzer, G., & Regier, T. (1996). How do we tell an association from a rule? Comment on Sloman (1996). Psychological Bulletin, 119(1), 23–26.
Greene, J. (2013). Moral tribes: Emotion, reason and the gap between us and them. New York, NY: Penguin Press.
Handley, S. J., & Trippas, D. (2015). Dual processes and the interplay between knowledge and structure: A new parallel processing model. Psychology of Learning and Motivation, 62, 33–58.
Houdé, O. (2000). Inhibition and cognitive development: Object, number, categorization, and reasoning. Cognitive Development, 15(1), 63–73.
Houdé, O., & Borst, G. (2015). Evidence for an inhibitory-control theory of the reasoning brain. Frontiers in Human Neuroscience, 9, 148.
Howarth, S., Handley, S. J., & Walsh, C. (2016). The logic-bias effect: The role of effortful processing in the resolution of belief–logic conflict. Memory & Cognition, 44(2), 330–349.
Johnson, E. D., Tubau, E., & De Neys, W. (2016). The doubting System 1: Evidence for automatic substitution sensitivity. Acta Psychologica, 164, 56–64.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.
Keren, G., & Schul, Y. (2009). Two is not always better than one: A critical evaluation of two-system theories. Perspectives on Psychological Science, 4(6), 533–550.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2015). Package ‘lmerTest’. R package version 2.0.
Luck, S. J., & Hillyard, S. A. (1994). Electrophysiological correlates of feature analysis during visual search. Psychophysiology, 31(3), 291–308.
Luo, J., Liu, X., Stupple, E. J., Zhang, E., Xiao, X., Jia, L., … Zhang, Q. (2013). Cognitive control in belief-laden reasoning during conclusion processing: An ERP study. International Journal of Psychology, 48(3), 224–231.
Luo, J., Yang, Q., Du, X., & Zhang, Q. (2011). Neural correlates of belief-laden reasoning during premise processing: An event-related potential study. Neuropsychobiology, 63(2), 112–118.
Luo, J., Yuan, J., Qiu, J., Zhang, Q., Zhong, J., & Huai, Z. (2008). Neural correlates of the belief-bias effect in syllogistic reasoning: An event-related potential study. Neuroreport, 19(10), 1073–1078.
Ma, Q., Li, D., Shen, Q., & Qiu, W. (2015). Anchors as semantic primes in value construction: An EEG study of the anchoring effect. PloS One, 10(10), e0139954.


Malaia, E., Tommerdahl, J., & McKee, F. (2015). Deductive versus probabilistic reasoning in healthy adults: An EEG analysis of neural differences. Journal of Psycholinguistic Research, 44(5), 533–544.
Mata, A., Ferreira, M. B., Voss, A., & Kollei, T. (2017). Seeing the conflict: An attentional account of reasoning errors. Psychonomic Bulletin & Review. https://doi.org/10.3758/s13423-017-1234-7
Mullen, T. R., Kothe, C. A., Chi, Y. M., Ojeda, A., Kerth, T., Makeig, S., … Cauwenberghs, G. (2015). Real-time neuroimaging and cognitive monitoring using wearable dry EEG. IEEE Transactions on Biomedical Engineering, 62(11), 2553–2567.
Nakamura, H., & Kawaguchi, J. (2016). People like logical truth: Testing the intuitive detection of logical value in basic propositions. PloS One, 11(12), e0169166.
Newman, I., Gibb, M., & Thompson, V. A. (2017). Rule-based reasoning is fast and belief-based reasoning can be slow: Challenging current explanations of belief-bias and base-rate neglect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(7), 1154–1170.
Osman, M. (2013). A case study: Dual-process theories of higher cognition—Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 248–252.
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). Cognitive style and religiosity: The role of conflict detection. Memory & Cognition, 42(1), 1–10.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2012). Are we good at detecting conflict during reasoning? Cognition, 124(1), 101–106.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.
Pennycook, G., Trippas, D., Handley, S. J., & Thompson, V. A. (2014). Base-rates: Both neglected and intuitive. Journal of Experimental Psychology: Learning, Memory, and Cognition, 40(2), 544–554.
Philips, S., & Takeda, Y. (2009). An EEG/ERP study of efficient versus inefficient visual search. In Proceedings of the Annual Meeting of the Cognitive Science Society (Vol. 31).
Polich, J. (2007). Updating P300: An integrative theory of P3a and P3b. Clinical Neurophysiology, 118(10), 2128–2148.
Quené, H., & Van den Bergh, H. (2008). Examples of mixed-effects modeling with crossed random effects and with binomial data. Journal of Memory and Language, 59(4), 413–425.
Rand, D. G., Greene, J. D., & Nowak, M. A. (2012). Spontaneous giving and calculated greed. Nature, 489(7416), 427–430.
Rietdijk, W. J., Franken, I. H., & Thurik, A. R. (2014). Internal consistency of event-related potentials associated with cognitive control: N2/P3 and ERN/Pe. PloS One, 9(7), e102672.
Simon, G., Lubin, A., Houdé, O., & De Neys, W. (2015). Anterior cingulate cortex and intuitive bias detection during number conservation. Cognitive Neuroscience, 6(4), 158–168. https://doi.org/10.1080/17588928.2015.1036847
Sloman, S. A. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.


Stanovich, K. E. (1999). Who is rational? Studies of individual differences in reasoning. Lawrence Erlbaum.
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20(2), 215–244.
Thompson, V., & Newman, I. (2018). Logical intuitions and other conundra for dual process theories. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Travers, E., Rolison, J. J., & Feeney, A. (2016). The time course of conflict on the Cognitive Reflection Test. Cognition, 150, 109–118.
Tremblay, A., & Newman, A. J. (2015). Modeling nonlinear relationships in ERP data using mixed-effects regression with R examples. Psychophysiology, 52(1), 124–139.
Trippas, D., & Handley, S. (2018). The parallel processing model of belief bias: Review and extensions. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Trippas, D., Handley, S. J., Verde, M. F., & Morsanyi, K. (2016). Logic brightens my day: Evidence for implicit sensitivity to logical validity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 1448–1457. http://dx.doi.org/10.1037/xlm0000248
Trippas, D., Thompson, V. A., & Handley, S. J. (2017). When fast logic meets slow belief: Evidence for a parallel-processing model of belief bias. Memory & Cognition, 45(4), 539–552.
Tversky, A., & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185(4157), 1124–1131.
Ullsperger, M., Fischer, A. G., Nigbur, R., & Endrass, T. (2014). Neural mechanisms and temporal dynamics of performance monitoring. Trends in Cognitive Sciences, 18(5), 259–267.
Vartanian, O., Beatty, E., Smith, I., Blackler, K., Lam, Q., Forbes, S., & De Neys, W. (in press). The reflective mind: Examining individual differences in susceptibility to base-rate neglect with fMRI. Journal of Cognitive Neuroscience.
Yeung, N., & Summerfield, C. (2012). Metacognition in human decision-making: Confidence and error monitoring. Philosophical Transactions of the Royal Society B, 367(1594), 1310–1321.


Chapter 5: The intuitive greater good: Testing the corrective dual process model of moral cognition

Building on the old adage that the deliberate mind corrects the emotional heart, the influential dual process model of moral cognition has posited that utilitarian responding to moral dilemmas (i.e., choosing the greater good) requires deliberate correction of an intuitive deontological response. In the present paper we present three studies that force us to revise this longstanding “corrective” dual process assumption. We used a two-response paradigm in which participants had to give their first, initial response to moral dilemmas under time pressure and cognitive load. Next, participants could take all the time they needed to reflect on the problem and give a final response. This allowed us to identify the intuitively generated response that preceded the final response given after deliberation. Results consistently show that in the vast majority of cases (over 70%) in which people opt for a utilitarian response after deliberation, the utilitarian response is already given in the initial phase. Hence, utilitarian responders do not need to deliberate to correct an initial deontological response. Their intuitive response is already utilitarian in nature. We show how this leads to a revised model in which moral judgments depend on the absolute and relative strength differences between competing deontological and utilitarian intuitions.

Based on Bago, B., & De Neys, W. (submitted). The intuitive greater good: Testing the corrective dual process model of moral cognition.

Supplementary material to this chapter can be found in Appendix C.


INTRODUCTION

In the spring of 2013 the Belgian federal health minister, Laurette Onkelinx, faced a tough decision. In a highly mediatized case, the seven-year-old Viktor Ameys, who suffered from a very rare immune system disorder, begged her to approve reimbursement of the drug Soliris, a life-saving but extremely expensive treatment costing up to $400 000 a year (Dolgin, 2011). By not approving reimbursement, the health minister was basically condemning an innocent seven-year-old to death. On the other hand, the federal health care budget is limited. Money that is spent on covering Viktor’s drugs cannot be spent on the reimbursement of drugs for more common, less expensive disorders that threaten the lives of far more patients. Hence, saving Viktor implied not saving many others (London, 2012). Eventually, the health minister, herself a mother of three, felt she could not bring herself to let Viktor die and the Soliris reimbursement was approved (Schellens, 2015). The Viktor case illustrates a classic moral dilemma in which utilitarian and deontological considerations are in conflict. The moral principle of utilitarianism (e.g., Mill & Bentham, 1987) implies that the morality of an action is determined by its consequences. Therefore, harming an individual can be judged acceptable if it prevents comparable harm to a greater number of people. One performs a cost-benefit analysis and chooses the greater good. Hence, from a utilitarian point of view it is morally acceptable to deny Viktor’s request and let him die because more people will be saved by reimbursing other drugs. Alternatively, the moral perspective of deontology (e.g., Kant, 1785/2002) implies that the morality of an action depends on the intrinsic nature of the action. Here, harming someone is considered wrong regardless of its potential benefits. Hence, from a deontological point of view, not saving Viktor would always be judged unacceptable. In recent years, cognitive scientists in the fields of psychology, philosophy, and neuroscience have started to focus on the cognitive mechanisms underlying utilitarian and deontological reasoning (e.g., Conway & Gawronski, 2013; Greene, 2015; Kahane, 2015; Moore, Stevens, & Conway, 2011; Nichols, 2004; Valdesolo & DeSteno, 2006). A lot of this work has been influenced by the popular dual-process model of thinking (Evans & Stanovich, 2013; Evans, 2008; Kahneman, 2011; Sloman, 1996), which describes cognition as an interplay of fast, effortless, and intuitive (i.e., so-called “System 1”) processing on the one hand, and slow, cognitively demanding, deliberate (i.e., so-called “System 2”) processing on the other. Inspired by this dichotomy, the dual process model of moral reasoning (Greene, 2013; Greene & Haidt, 2002) has associated utilitarian judgments with deliberate System 2 processing and deontological judgments with intuitive System 1 processing.

The core idea is that giving a utilitarian response to moral dilemmas requires that one engages in System 2 thinking and allocates cognitive resources to override an intuitively cued System 1 response that primes us not to harm others (Białek & De Neys, 2017). There is little doubt that the dual process model of moral cognition presents an appealing account, and it has proved to be highly influential (Sloman, 2015). However, the framework has also been criticized (e.g., Baron, 2017; Baron, Scott, Fincher, & Metz, 2015; Białek & De Neys, 2017; Kahane, 2015; Tinghög et al., 2016; Trémolière & Bonnefon, 2014). A key problem is that the processing specifications of the alleged System 1 and 2 operations are not clear. A critical issue concerns the time-course of utilitarian responding. In a typical moral dilemma, giving a utilitarian response is assumed to require the correction of the fast, initial System 1 response. The idea is that our immediate System 1 gut-response is deontological in nature but that after some further System 2 deliberation we can replace it with a utilitarian response. Hence, the final utilitarian response is believed to be preceded by an initial deontological response. From an introspective point of view, this core “corrective” dual process assumption (De Neys, 2018) seems reasonable. When faced with a dilemma such as the Viktor case, it surely feels as if the “don’t kill Viktor” sentiment pops up instantly. We are readily repulsed by the very act of sacrificing a young boy, and correcting that judgment by taking the greater good into account seems to require more time and effort. Unfortunately, the available empirical evidence is less conclusive than our introspective impressions. Consider, for example, evidence from latency studies and time-pressure manipulations. Some earlier studies found that utilitarian responses take more time than deontological ones (Greene, Sommerville, Nystrom, Darley, & Cohen, 2001). Likewise, experimentally limiting the time allowed to make a decision was also shown to reduce the number of utilitarian responses (Suter & Hertwig, 2011). However, in recent years conflicting findings have been reported (Baron & Gürçay, 2017; Gürçay & Baron, 2017; Tinghög et al., 2016). More critically, even if we were to unequivocally establish that utilitarian responses take more time than deontological responses, this would not imply that utilitarian responders generated the deontological response before arriving at the utilitarian one. They might have needed more time to complete the System 2 deliberations without ever having considered the deontological response. Neuroimaging studies have also explored the neural correlates of deontological and utilitarian reasoning (e.g., Greene, Nystrom, Engell, Darley, & Cohen, 2004; Shenhav & Greene, 2014).


In a nutshell, results typically indicate that deontological judgments are associated with activation in brain areas that are known to be involved in emotional processing (e.g., the amygdala), whereas utilitarian decisions seem to recruit brain areas associated with controlled processing (e.g., the dorsolateral prefrontal cortex). This imaging work suggests that deontological and utilitarian responses might rely on a different type of processing. However, it does not allow us to make strong inferences concerning their precise time course. A more direct test of the corrective dual process assumption comes from mouse-tracking studies (Gürçay & Baron, 2017; Koop, 2013). After having read a moral dilemma, participants in these studies are asked whether they favor a deontological or utilitarian decision. To indicate their answer, they have to move the mouse pointer from the center of the screen towards the utilitarian or deontological response option, presented in opposite corners of the screen. In the mouse-tracking paradigm researchers typically examine the curvature of the mouse movement to test whether participants show “preference reversals” (Spivey et al., 2005). For example, if utilitarian responders initially generate a deontological response, they can be expected to move first towards the deontological response and only afterwards to the utilitarian one. This will result in a more curved mouse trajectory. Deontological responders, on the other hand, are expected to go straight towards the deontological option from the start. However, contrary to the dual process assumption, the mouse trajectories have been found to be equally curved for both types of responses (Gürçay & Baron, 2017; Koop, 2013). There is also some converging evidence for the mouse-tracking findings. Białek and De Neys (2016, 2017) studied deontological responders’ conflict sensitivity. They presented participants with classic moral dilemmas and control versions in which deontological and utilitarian considerations cued the same non-conflicting response. For example, a no-conflict control version of the introductory drug example might be a scenario in which reimbursing the drug would save many more patients than not reimbursing it. Białek and De Neys reasoned that if deontological responders were only considering the deontological response option, they should not be affected by the presence or absence of conflict. Results indicated that the intrinsic conflict in the classic dilemmas also affected deontological responders, as reflected in higher response doubt and longer decision times for the conflict vs no-conflict versions. Critically, this increased doubt was still observed when System 2 deliberation was experimentally minimized with a concurrent load task (Białek & De Neys, 2017). This suggests that our intuitive System 1 is also cueing utilitarian considerations.

However, it should be noted that although deontological responders showed conflict sensitivity, they still selected the deontological response option. Consequently, proponents of the corrective dual process view can still claim that people who actually make the utilitarian decision will only do so after deliberate correction of their initial deontological answer. In the present studies we adopt a two-response paradigm (Thompson, Turner, & Pennycook, 2011) to obtain a conclusive test of the corrective dual process assumption. The two-response paradigm has been developed in logical and probabilistic reasoning studies to gain direct behavioral insight into the time-course of intuitive and deliberate response generation (Bago & De Neys, 2017; Newman, Gibb, & Thompson, 2017; Pennycook & Thompson, 2012; Thompson & Johnson, 2014). In this paradigm participants are presented with a reasoning problem and have to respond as quickly as possible with the first response that comes to mind. Immediately afterwards they are presented with the problem again and can take as much time as they want to reflect on it and give a final answer. To make maximally sure that the first response is truly intuitive in nature, participants are forced to give their first response within a strict deadline while their cognitive resources are also burdened with a concurrent load task (Bago & De Neys, 2017). The rationale is that System 2 processing, in contrast to System 1, is defined as time- and resource-demanding. By depriving participants of these resources one aims to “knock out” System 2 during the initial response phase (Bago & De Neys, 2017). The prediction in the moral reasoning case is straightforward. If the corrective assumption holds, the initial response to moral dilemmas should typically be deontological in nature and utilitarian responses should only appear in the final response stage. Put differently, individuals who manage to give a utilitarian response after deliberation in the final response stage should initially give a deontological response when they are forced to rely on mere intuitive processing in the first response stage. We present three studies in which we tested the robustness of our two-response findings. To foreshadow our key result, across all our studies we consistently observe that in the majority of cases in which people select a utilitarian response after deliberation, the utilitarian response is already given in the initial phase. Hence, utilitarian responders do not need to deliberate to correct an initial deontological response. Their intuitive response is already utilitarian in nature. We will present a revised dual process model to account for the findings.


STUDY 1

Method

Participants

In Study 1, 107 Hungarian students (77 female, Mean age = 21.6 years, SD = 1.4 years) from the Eotvos Lorand University of Budapest were tested. A total of 94% of the participants reported high school as their highest completed educational level, while 6% reported having a post-secondary education degree. Participants received course credit for taking part. Participants in Study 1 (and all other reported studies) completed the study online.

Materials

Moral reasoning problems. In total, nine moral reasoning problems were presented. Problem content was based on popular scenarios from the literature (e.g., Cushman, Young, & Hauser, 2006; Foot, 1967; Royzman & Baron, 2002). All problems had the same underlying structure and required participants to decide whether or not to sacrifice the lives of one of two groups of scenario characters. To minimize inter-item noise and possible content confounds (e.g., Trémolière & De Neys, 2013) we stuck to the following content rules for all problems: a) the difference between the possible number of characters in the two groups was kept constant at 8 lives, b) all characters were adults, c) the to-be-made sacrifice concerned the death of the characters, d) there was no established hierarchy among the to-be-sacrificed characters, and e) the scenario protagonist’s own life was never at stake. All problems are presented in the Supplementary Material, section A. All problems were translated into Hungarian (i.e., the participants’ mother tongue) for the actual experiment. The problems were presented in two parts. First, the general background information was presented (non-bold text in the example below) and participants clicked on a confirmation button when they finished reading it. Subsequently, participants were shown the second part of the problem, which contained the critical conflicting (or non-conflicting, see further) dilemma information and asked about their personal willingness to act and make the described sacrifice themselves (“Would you do X?”). Participants entered their answer by clicking on a corresponding bullet point (“Yes” or “No”). The first part of the problem remained on the screen while the second part was presented.

The following example illustrates the full problem format:

Due to an accident there are 11 miners stuck in one of the shafts of a copper mine. They are almost out of oxygen and will die if nothing is done. You are the leader of the rescue team.

The only way for you to save the miners is to activate an emergency circuit that will transfer oxygen from a nearby shaft into the shaft where the 11 miners are stuck.

However, your team notices that there are 3 other miners trapped in the nearby shaft. If you activate the emergency circuit to transfer the oxygen, these 3 miners will be killed, but the 11 miners will be saved.

Would you activate the emergency circuit?
o Yes
o No

Four of the presented problems were traditional “conflict” versions in which participants were asked whether they were willing to sacrifice a small number of persons in order to save several more. Four other problems were control “no-conflict” versions in which participants were asked whether they were willing to sacrifice more people to save fewer (e.g., Białek & De Neys, 2017). The following is an example of a no-conflict problem:

You are a radar operator overseeing vessel movement near Greenland. Due to sudden ice movement a boat carrying 3 passengers is about to crash into an iceberg. If nothing is done, all passengers will die.

The only way to save the 3 passengers is for you to order the captain to execute an emergency manoeuvre that will sharply alter the course of the boat.

However, the manoeuvre will cause the boat to overrun a life raft carrying 11 people. The life raft is floating next to the iceberg and out of sight of the captain. The 11 people will be killed if you order to execute the manoeuvre, but the 3 people on the boat will be saved.

Would you order to execute the manoeuvre?

Hence, on the conflict versions the utilitarian response is to answer “yes” and the deontological response is to answer “no”. On the no-conflict problems both utilitarian and deontological considerations cue a “no” answer (for simplicity, we will refer to these non-sacrificial greater good answers as “utilitarian responses”). We included the no-conflict versions to make the problems less predictable and to prevent participants from starting to reason about the possible dilemma choice before presentation of the second problem part. For the same reason we also included a filler item in the middle of the experiment (i.e., after 4 test problems). In this filler problem saving more people did not involve any sacrifice (i.e., doing the action implied saving 6 and killing 0 characters). Two problem sets were used to counterbalance the scenario content; scenario content that was used for the conflict problems in one set was used for the no-conflict problems in the other set, and vice versa.

Participants were randomly assigned to one of the sets. The presentation order of the problems was randomized in both sets.

Load task. We wanted to make maximally sure that participants’ initial response was truly intuitive (i.e., that System 2 engagement was ruled out). Therefore, we used a cognitive load task (i.e., the dot memorization task, see Miyake, Friedman, Rettinger, Shah, & Hegarty, 2001) to burden participants’ cognitive resources. The rationale behind the load manipulation is simple; by definition, System 2 processing requires executive cognitive resources, while System 1 processing does not (Evans & Stanovich, 2013). Consequently, if we burden someone’s executive resources while they are asked to solve a moral reasoning problem, System 2 engagement is less likely. We opted for the dot memorization task because it is specifically assumed to burden participants’ executive resources (De Neys & Schaeken, 2007; De Neys & Verschueren, 2006; Franssens & De Neys, 2009; Miyake et al., 2001). Before each reasoning problem participants were presented with a 3 x 3 grid in which 4 dots were placed. Participants were instructed that it was critical to memorize the location of the dots, even though this might be hard while solving the reasoning problem. After answering the reasoning problem participants were shown four different matrices and had to choose the correct, to-be-memorized pattern. They received feedback as to whether they chose the correct or incorrect pattern. Note that the load was only applied during the initial response stage and not during the subsequent final response stage in which participants were allowed to deliberate and recruit System 2 (see further).
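For illustration, a minimal sketch (not the original experiment code) of how such a dot-memorization trial could be generated:

```r
# Generate a 3 x 3 dot-memorization trial: a target pattern with 4 dots,
# plus three distractor patterns for the later recognition test.
set.seed(42)
new_pattern <- function() sort(sample(1:9, 4))  # 4 of the 9 grid cells hold a dot

target <- new_pattern()
distractors <- list()
while (length(distractors) < 3) {
  cand <- new_pattern()
  if (!identical(cand, target)) distractors <- c(distractors, list(cand))
}
options <- sample(c(list(target), distractors))  # shuffled answer options
```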

Procedure

Reading pre-test. Before we ran the main study, we recruited an independent sample of 33 participants for a reading pre-test (28 female, Mean age = 19.5 years, SD = 1.03 years). Participants were also recruited from the Eotvos Lorand University of Budapest and received course credit in exchange. All participants reported high school as their highest completed educational level. The basic goal of the reading pre-test was to determine the response deadline to be applied in the main moral reasoning study. The idea was to base the response deadline on the average reading time in the reading test. Note that dual process theories are highly underspecified in many respects (Kruglanski, 2013); they argue that System 1 is faster than System 2, but do not further specify exactly how fast System 1 is (e.g., System 1 < x seconds). Hence, the theory gives us no unequivocal criterion on which we can base our deadline.

Our “average reading time” criterion provides a practical solution to define the response deadline (Bago & De Neys, 2017). The rationale here was very simple: if people are allotted just the time they need to simply read the problem, we can be reasonably sure that System 2 engagement is minimal. Thus, in the reading pre-test, participants were presented with the same items as in the main moral reasoning study. They were instructed to read the problems and randomly click on one of the answer options. The general instructions were as follows:

Welcome to the experiment! Please read these instructions carefully! This experiment is composed of 9 questions and 1 practice question. It will take 5 minutes to complete and it demands your full attention. You can only do this experiment once.

In this task we'll present you with a set of problems we are planning to use in future studies. Your task in the current study is pretty simple: you just need to read these problems. We want to know how long people need on average to read the material. In each problem you will be presented with two answer alternatives. You don’t need to try to solve the problems or start thinking about them. Just read the problem and the answer alternatives and when you are finished reading you randomly click on one of the answers to advance to the next problem.

The only thing we ask of you is that you stay focused and read the problems in the way you typically would. Since we want to get an accurate reading time estimate, please avoid wiping your nose, taking a phone call, sipping from your coffee, etc. before you have finished reading.

At the end of the study we will present you with some easy verification questions to check whether you actually read the problems. This is simply to make sure that participants are complying with the instructions and actually read the problems (instead of clicking through them without paying attention). No worries, when you simply read the problems, you will have no trouble at all to answer the verification questions.

Please confirm below that you read these instructions carefully and then press the "Next" button.

Problems were presented in two parts (background information and critical dilemma information), as in the main study. Our interest concerned the reading time for the critical second problem part. To make sure that participants would actually read the problems, we informed them that they would be asked to answer two simple verification questions at the end of the experiment to check whether they had read the material. The verification questions could be easily answered even after a very rough reading. The following illustrates one of the verification questions:

We asked you to read a number of problems. Which one of the following situations was not part of the experiment?
o You were a soccer player
o You were the leader of a rescue team
o You were a railway controller
o You were a late-night watchman


The correct answers were clearly different from the situations that were presented during the task. Only one of the participants did not manage to solve both verification questions correctly (97% solved both correctly). This participant was excluded from the reading-time analysis. The average reading time for the critical dilemma part in the resulting sample was M = 11.3 s (SD = 1.5 s). Note that raw reaction time data were first logarithmically transformed prior to analysis. Mean and standard deviation were calculated on the transformed data and then back-transformed into seconds. We wanted to give the participants some minimal leeway; we therefore rounded the average reading time up to the nearest whole number, setting the response deadline to 12 seconds.
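A minimal sketch of this deadline computation, assuming a vector `rt_ms` holding the pre-test reading times (in ms) for the critical second problem part:

```r
log_rt <- log10(rt_ms)              # transform before averaging
mean_s <- 10^mean(log_rt) / 1000    # back-transformed mean reading time, ~11.3 s
deadline_s <- ceiling(mean_s)       # round up for minimal leeway -> 12 s
```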
Moral reasoning task. The experiment was run online. Participants were specifically instructed at the beginning that we were interested in their very first, initial answer that came to mind. They were also told that they would have additional time afterwards to reflect on the problem and could take as much time as they needed to provide a final answer. The literal instructions that were used stated the following (translated from Hungarian):

Welcome to the experiment! Please read these instructions carefully!

This experiment is composed of 9 questions and a couple of practice questions. It will take about 12 minutes to complete and it demands your full attention. You can only do this experiment once.

In this task we'll present you with a set of moral reasoning problems. We would like you to read every problem carefully and enter your response by clicking on it. There are no correct or incorrect decisions, we are interested in the response you personally feel is correct. We want to know what your initial, intuitive response to these problems is and how you respond after you have thought about the problem for some more time. Hence, as soon as the problem is presented, we will ask you to enter your initial response. We want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you can enter your final response. You will have as much time as you need to indicate your second response.

After you have entered your first and final answer we will also ask you to indicate your confidence in your response.

In sum, keep in mind that it is really crucial that you give your first, initial response as fast as possible. Afterwards, you can take as much time as you want to reflect on the problem and select your final response.

Please confirm below that you read these instructions carefully and then press the "Next" button.


After this general introduction, participants were presented with a task-specific introduction which explained the upcoming task and informed them about the response deadline. The literal instructions were as follows:

We are going to start with a couple of practice problems. First, a fixation cross will appear. Then, the first part of the problem will appear. When you finished reading this click on the “Next” button and the rest of the problem will be presented to you.

As we told you, we are interested in your initial, intuitive response. First, we want you to respond with the very first answer that comes to mind. You don't need to think about it. Just give the first answer that intuitively comes to mind as quickly as possible. To ensure this, a time limit of 12 seconds is set for the first response. When there are 2 seconds left, the background colour will turn to yellow to let you know that the deadline is approaching. Please make sure to answer before the deadline passes. Next, the problem will be presented again and you can take all the time you want to actively reflect on it. Once you have made up your mind you can enter your final response.

After you have made your choice and clicked on it, you will be automatically taken to the next page. After you have entered your first and final answer we will also ask you to indicate your confidence in the correctness of your response.

Press "Next" if you are ready to start the practice session!

After the specific instruction page participants solved two unrelated practice reasoning problems to familiarize them with the procedure. Next, they solved two practice dot matrix problems (without a concurrent reasoning problem). Finally, at the end of the practice, they had to solve the two earlier practice reasoning problems under cognitive load. Each problem started with the presentation of a fixation cross for 1000 ms. Next, the first part of the moral reasoning problem with the background information appeared. Participants could take all the time they wanted to read the background information and clicked on the "Next" button when they were ready. Subsequently, the dot matrix appeared and stayed on the screen for 2000 ms. Next, the remaining part of the problem appeared (while the first part stayed on screen). Participants had 12 s to give an answer; after 10 s the background of the screen turned yellow to warn participants about the upcoming deadline. If they did not provide an answer before the deadline, they were asked to pay attention to provide an answer within the deadline on subsequent trials. After the initial response, participants were asked to enter their confidence in the correctness of their answer on a scale from 0% to 100%, with the following question: "How confident are you in your answer? Please type a number from 0 (absolutely not confident) to 100 (absolutely confident)". After indicating their confidence, they were presented with four dot matrix options, from which they had to choose the correct, to-be-memorized pattern. Once they provided their memorization answer, they received feedback as to whether it was correct. If the answer was not correct, they were also asked to pay more attention to memorizing the correct dot pattern on subsequent trials. Finally, the full problem was presented again, and participants were asked to provide a final response. Once they clicked on one of the answer options they were automatically advanced to the next page where they had to provide their confidence level again. The colour of the answer options was green during the first response phase and blue during the final response phase, to visually remind participants which question they were answering. In addition, right under the question we presented a reminder sentence: "Please indicate your very first, intuitive answer!" or "Please give your final answer", respectively, coloured to match the answer options. At the end of the study participants completed a page with standard demographic questions.

Exclusion criteria. Participants failed to provide a first response before the deadline in 7% of the trials. In addition, in 8.3% of the trials participants responded incorrectly to the dot memorization load task. All these trials were removed from the analysis because it cannot be guaranteed that the initial response resulted from mere System 1 processing: if participants took longer than the deadline, they might have engaged in deliberation; if they failed the load task, we cannot be sure that they tried to memorize the dot pattern and that System 2 was successfully burdened. In these cases we cannot claim that possible utilitarian responding at the initial response stage is intuitive in nature. Hence, removing trials that did not meet the inclusion criteria gives us the purest possible test of our hypothesis. In total, 14.8% of trials were excluded and 821 trials (out of 963) were further analysed (initial and final response for the same item counted as 1 trial).
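In code, this trial-level filtering could look roughly as follows (a pandas sketch; the data frame and its column names, such as rt_initial and load_correct, are our own illustrative assumptions, not the actual data file):

```python
import pandas as pd

# Illustrative trial-level records; column names are assumptions.
trials = pd.DataFrame({
    "rt_initial":   [8.1, 13.2, 9.5, 11.0],    # first-response latency (s)
    "load_correct": [True, True, False, True], # dot memorization answer
})

DEADLINE = 12  # seconds

# Keep only trials answered in time AND with a correct load response.
kept = trials[(trials["rt_initial"] <= DEADLINE) & trials["load_correct"]]
excluded_pct = 100 * (1 - len(kept) / len(trials))
print(f"{excluded_pct:.1f}% of trials excluded, {len(kept)} analysed")
```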

Note that throughout the article we used mixed-effects regression models to analyse our results (Baayen et al., 2008; Kuznetsova, Brockhoff, & Christensen, 2015), accounting for random intercepts for participants and items. For the binary choice data we used logistic regression, while for the continuous confidence and reaction time data we used linear regression.
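The reported analyses were run with R's lmerTest package; as a rough Python analogue, the linear case with crossed random intercepts can be approximated with statsmodels' variance-component formulation (a sketch on simulated data; all variable names are assumptions, and the binary choice data would instead need a mixed logistic model, e.g., statsmodels' BinomialBayesMixedGLM):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "confidence": rng.uniform(0, 100, n),  # continuous outcome (simulated)
    "conflict":   rng.integers(0, 2, n),   # fixed effect of interest
    "subject":    rng.integers(0, 30, n),  # participant identifier
    "item":       rng.integers(0, 8, n),   # item identifier
})

# Crossed random intercepts for participants and items: use one grouping
# 'unit' spanning all rows and a variance component per random factor.
df["unit"] = 1
model = smf.mixedlm(
    "confidence ~ conflict", df, groups="unit", re_formula="0",
    vc_formula={"subject": "0 + C(subject)", "item": "0 + C(item)"},
)
print(model.fit().summary())
```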


Results

Table 1 gives a general overview of the results. We first focus on the response distributions for the final response. As one might expect, on the no-conflict problems in which utilitarian and deontological considerations cued the same response and choosing the greater good did not involve a sacrifice, the rate of utilitarian responses was near ceiling (95.4%). Not surprisingly, the utilitarian response rate was lower (84.5%) on the conflict problems in which choosing the greater good did require sacrificing lives, χ2 (1) = 11.1, p = 0.0009, b = -0.99. The key finding, however, was that the utilitarian response was also frequently given as the initial, intuitive response on the critical conflict problems (79.7% of initial responses). This indicates that participants can give intuitive utilitarian responses to classic moral dilemmas. However, the raw percentage of intuitive utilitarian conflict problem responses is not fully informative. We can obtain a deeper insight into the results by performing a Direction of Change analysis on the conflict trials (Bago & De Neys, 2017). This means that we look at the way a given person in a specific trial changed (or did not change) her initial answer after the deliberation phase. More specifically, people can give a utilitarian or a deontological response in each of the two response stages. Hence, in theory this can result in four different types of answer change patterns ("DD", deontological response in both stages; "UU", utilitarian response in both stages; "DU", initial deontological and final utilitarian response; "UD", initial utilitarian and final deontological response). Based on the corrective dual process assumption, one can expect that people will either give "DD" responses, meaning that they had the deontological intuition in the beginning and did not correct it in the final stage, or "DU" responses, meaning that they initially generated a deontological response but then, after deliberation, changed it to a utilitarian response. Table 2 shows the direction of change category frequencies for the conflict problems. First of all, we observed a non-negligible amount of DD (9.7%) and DU (10.6%) responses. In and of themselves, these patterns are in accordance with the corrective predictions; reasoners generated the deontological response initially, and in the final response stage they either managed to override it (DU) or they did not (DD). However, what is surprising and problematic from the corrective perspective is the high percentage of UU responses (73.9% of the trials). Indeed, in the vast majority of the cases in which participants managed to give a utilitarian answer as final response, they already gave it as their first, intuitive response (i.e., 87.5% of cases). We refer to this critical number [i.e., the UU/(UU+DU) ratio] as the percentage of non-corrective utilitarian responses, or non-correction rate in short. Overall, this means that utilitarian reasoners did not necessarily need their deliberate System 2 to correct their initial deontological intuition; their intuitive System 1 response was already utilitarian in nature.
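For concreteness, the direction of change classification and the non-correction rate reduce to a few lines (a Python sketch; the trials below are invented for illustration):

```python
from collections import Counter

# Each conflict trial is a pair (initial, final) with 'U' = utilitarian
# and 'D' = deontological. Invented trials for illustration only.
trials = [("U", "U"), ("U", "U"), ("D", "D"), ("D", "U"),
          ("U", "U"), ("U", "D"), ("D", "U"), ("U", "U")]

categories = Counter(initial + final for initial, final in trials)

# Non-correction rate: share of final utilitarian responses that were
# already utilitarian at the initial stage, i.e. UU / (UU + DU).
non_correction = categories["UU"] / (categories["UU"] + categories["DU"])
print(categories)                                # e.g. Counter({'UU': 4, ...})
print(f"non-correction rate = {non_correction:.1%}")
```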

Table 1. Initial and final average percentage (SD) of utilitarian responses in Study 1-3.

                                       Conflict                      No-conflict
                                       Initial        Final          Initial        Final
Study 1  No family - Moderate ratio   79.7% (40.3)   84.5% (36.2)   90.3% (29.6)   95.4% (20.9)
Study 2  Family - Moderate ratio      17.5% (38.1)   21.2% (40.9)   94.5% (22.8)   96.4% (18.6)
Study 3  No family - Extreme ratio    81.7% (38.7)   89.8% (30.3)   94.2% (23.4)   93.8% (24.2)
         Family - Extreme ratio       28.4% (45.2)   31.4% (46.5)   95.6% (20.1)   97.5% (15.7)
Overall average                       54.2% (49.8)   57.5% (49.4)   93.7% (24.3)   95.7% (20.4)

Table 2. Frequency of direction of change categories in Study 1-3 for conflict problems. Raw numbers of trials are in brackets.

                                       UU            DD            UD          DU           Non-correction rate UU/(UU+DU)
Study 1  No family - Moderate ratio   73.9% (258)   9.7% (34)     5.7% (20)   10.6% (37)   87.5%
Study 2  Family - Moderate ratio      12.7% (45)    74% (262)     8.5% (30)   4.8% (17)    72.6%
Study 3  No family - Extreme ratio    78.9% (332)   7.4% (31)     2.9% (12)   10.9% (46)   87.8%
         Family - Extreme ratio       22.8% (77)    63% (213)     5.6% (19)   8.6% (29)    72.6%
Overall average                       48.7% (712)   36.9% (540)   5.5% (81)   8.8% (129)   84.7%

Note. U = utilitarian, D = deontological.


STUDY 2

Our Study 1 results are challenging for the corrective dual process assumption: utilitarian responses to moral dilemmas were typically generated intuitively. However, one potential issue is that although the rate of utilitarian responding on the critical conflict items was lower than on the no-conflict problems, it was still fairly high. A critic might argue that the dual process model does not necessarily entail that all moral decisions require a deliberate correction process. The prototypical case on which the model has primarily focused concerns "high-conflict"18 situations in which a dilemma cues a strong conflicting emotional response which renders the utilitarian override particularly difficult (Greene, 2009; Greene, Nystrom, Engell, Darley, & Cohen, 2004; Shenhav & Greene, 2014). The high rate of final utilitarian responding on our Study 1 conflict problems might be argued to indicate that the problems did not evoke a particularly strong emotional response. Hence, the corrective assumption might be maintained in cases in which utilitarian responding is rarer (i.e., more demanding). Study 2 was run to address this issue. We used a manipulation in which one of the to-be-sacrificed persons in the dilemma was a close family member (e.g., Hao, Liu, & Li, 2015; Tassy, Oullier, Mancini, & Wicker, 2013), which has been shown to increase the emotional averseness of the sacrificial option. We expected the manipulation to decrease the rate of utilitarian responding on the conflict problems. The critical question concerned the non-correction rate. If the Study 1 critique is right, final utilitarian decisions in Study 2 should typically be preceded by initial deontological responses, leading to a floored non-correction rate.

Method

Participants

A total of 107 participants (68 female, mean age = 20.6 years, SD = 1.9 years) from the Eotvos Lorand University of Budapest were tested. A total of 87.9% of the participants reported high school as their highest completed educational level, while 12.1% reported having a post-secondary education degree. Participants received course credit for taking part.

18 The a priori operationalisation of what constitutes a “high-conflict” situation has been shown to be somewhat controversial (Gürcay & Baron, 2017; Greene, 2009). The simple point here is that we wanted to make sure that our non-correction rate results are robust and are not driven by specific idiosyncratic features of our dilemmas.


Materials and Procedure

Moral reasoning task. The same scenario topics as in Study 1 were used. The only modification was that one of the to-be-sacrificed persons in the dilemma was always a close family member (father, mother, brother, or sister). This "family member" manipulation has been shown to increase the emotional averseness of the sacrificial option and decrease the rate of utilitarian conflict responses (Hao et al., 2015; Tassy et al., 2013). Here is an example of a conflict problem:

Due to an accident there are 11 miners stuck in one of the shafts of a copper mine. They are almost out of oxygen and will die if nothing is done. You are the leader of the rescue team.

The only way for you to save the miners is to activate an emergency circuit that will transfer oxygen from a nearby shaft into the shaft where the 11 miners are stuck.

However, your team notices that your own father and two other miners are trapped in the nearby shaft. If you activate the emergency circuit to transfer the oxygen, your father and the two other miners will be killed, but the 11 miners will be saved.

Would you activate the emergency circuit?
o Yes
o No

The following is an example of a no-conflict problem:

You are a radar operator overseeing vessel movement near Greenland. Due to sudden ice movement a boat carrying 3 passengers is about to crash into an iceberg. If nothing is done, all passengers will die.

The only way to save the 3 passengers is for you to order the captain to execute an emergency manoeuvre that will sharply alter the course of the boat.

However, the manoeuvre will cause the boat to overrun a life raft carrying your own father and 10 other people. The life raft is floating next to the iceberg and out of sight of the captain. Your father and the other 10 people will be killed if you order to execute the manoeuvre, but the 3 people on the boat will be saved.

Would you order to execute the manoeuvre?
o Yes
o No

As in Study 1, participants evaluated four conflict, one filler, and four no-conflict problems in a randomized order. Scenario content of the conflict and no-conflict problems was counterbalanced. We also adopted the exact same two-response procedure as in Study 1. Hence, except for the modified scenario content, the procedure was completely identical to Study 1. All Study 2 problems are presented in the Supplementary Material, section A.


Exclusion criteria. The same exclusion criteria were applied as in Study 1. Participants failed to provide a first response before the deadline in 8.4% of the trials. In addition, in 7.1% of the trials participants responded incorrectly to the dot memorization load task. All these trials (15.1% of trials in total) were excluded and 818 trials (out of 963) were further analyzed (initial and final response for the same item counted as 1 trial).

Results

Table 1 gives an overview of the results. In line with expectations, we see that the percentage of utilitarian responses on the conflict items is much lower in Study 2 than in Study 1, both at the initial (17.5%) and final response (21.2%) stage. On the no-conflict items—in which choosing the greater good and saving the family member did not entail a sacrifice—the utilitarian response rate remained at ceiling. We tested this trend statistically by testing the interaction of conflict status and family member condition (data from Study 1 as no-family condition, data from Study 2 as family condition). This interaction was indeed significant both at the initial, χ2 (3) = 108.45, p < 0.0001, b = -3.81, and final, χ2 (3) = 76.6, p < 0.0001, b = -4.37, response stage. This supports the claim that the family member manipulation increases the emotional averseness of a utilitarian sacrifice (Hao et al., 2015; Tassy et al., 2013). Reflecting the lower overall rate of initial and final utilitarian responses, the direction of change results in Table 2 indicate that there were fewer "UU" and "DU" responses in Study 2. However, the critical finding is that despite the overall decrease, the "UU" responses (12.7%) are still more than twice as frequent as the "DU" (4.8%) responses. Hence, far from being floored, the non-correction rate remained high at 72.6%. In sum, in those cases in which utilitarian responses are generated, they are still predominantly generated at the initial, intuitive response stage. This confirms the Study 1 finding and further argues against the corrective dual process assumption: even in "high-conflict" situations, utilitarian responding does not necessarily require reasoners to deliberately correct their initial deontological response.

The low level of initial utilitarian conflict responses in Study 2 might give rise to the objection that these rare responses result from mere guessing. After all, our task is quite challenging; people had to respond within a strict response deadline and under secondary task load. In theory, it might be that participants found it too hard and just randomly clicked on one of the answer options. However, the near-ceiling performance on the no-conflict problems (94.5% utilitarian responses) argues against such a guessing account. If participants were guessing, their performance on the conflict and no-conflict problems should be closer to 50% in both cases. We also conducted a so-called stability analysis (Bago & De Neys, 2017) to further test for a guessing account. We calculated for every participant on how many conflict problems they displayed the same direction of change category. We refer to this measure as the stability index. For example, if an individual shows the same type of direction of change on all four conflict problems, the stability index would be 100%. If the same direction of change is only observed on two trials, the stability index would be 50%19, etc. Results showed that the average stability index in Study 2 reached 83.8% (similarly high stability rates were observed in all our studies, see Table S1 in the Supplementary Material). This indicates that the direction of change pattern is highly stable at the individual level and argues against a guessing account; if people were guessing, they should not tend to show the same response pattern consistently.
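Computing the stability index amounts to taking, for each participant, the share of conflict trials that fall in his or her modal direction of change category (a Python sketch; the participant data are invented):

```python
from collections import Counter

def stability_index(categories):
    """Share of trials in the participant's most frequent category."""
    modal_count = Counter(categories).most_common(1)[0][1]
    return modal_count / len(categories)

# Invented per-participant category lists; after the exclusions some
# participants have fewer than four conflict trials available.
participants = {
    "p1": ["UU", "UU", "UU", "UU"],  # 100%
    "p2": ["DD", "DD", "DU"],        # 2/3, about 67%
    "p3": ["UU", "DU", "UU", "UU"],  # 75%
}

for pid, cats in participants.items():
    print(pid, f"{stability_index(cats):.0%}")
```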

STUDY 3

Study 3 was run to further test the robustness of our findings. One might argue that Studies 1 and 2 focused on two rather extreme cases: utilitarian responding was either very rare or very prevalent. In Study 3 we looked at a more "intermediate" case. We therefore combined the family member manipulation with a manipulation that has been shown to facilitate utilitarian responding. Trémolière and Bonnefon (2014) previously showed that increasing the kill-save ratio of a sacrifice (i.e., more people are saved) promoted utilitarian responding. Hence, by making the kill-save ratio more extreme we expected to increase the rate of utilitarian responding in comparison with Study 2. We were again interested in the non-correction rate. If the high non-correction rate is consistently observed with different scenario characteristics, this indicates that the findings are robust.

19 Note that due to methodological restrictions (we excluded items with incorrect load questions and items where the response was not given within the deadline) some participants had fewer than four responses available. For these participants, stability was calculated based on the available items.


Methods

Participants

In Study 3, 230 Hungarian students (171 female, mean age = 22.6 years, SD = 21.7 years) from the Eotvos Lorand University of Budapest were tested. A total of 83% of the participants reported high school as their highest completed educational level, while 17% reported having a post-secondary education degree. Participants received course credit for taking part.

Materials and procedure

Moral reasoning task. The same scenario topics as in the previous studies were used. The only modification was the kill-save ratio: we multiplied the number of lives at stake in the largest group by a factor of 5. Hence, whereas in Studies 1 and 2 the ratio was moderate (e.g., kill 3 to save 11; all ratios between 20% and 30%), in Study 3 the ratio was more extreme (e.g., kill 3 to save 55; all ratios between 4% and 8%). Note that we made the ratio as extreme as the scenario content would allow (a life raft/plane carrying 5000 passengers would not be realistic; e.g., Trémolière & Bonnefon, 2014). For half of the sample the extreme ratios were combined with the same "family member" scenario content that was used in Study 2. For completeness, for the other half of the sample we combined the extreme ratios with the original "no family" scenario content that was used in Study 1. Participants were randomly allocated to one of the two conditions. Hence, over the three studies the kill-save ratio and family member manipulations were fully crossed. As in Studies 1 and 2, participants evaluated four conflict, one filler, and four no-conflict problems in a randomized order. Scenario content of the conflict and no-conflict problems was counterbalanced. We also adopted the exact same two-response procedure as in Studies 1 and 2. Hence, except for the modified kill-save ratio scenario content, the procedure was identical to Studies 1 and 2.

Exclusion criteria. The same exclusion criteria were applied as in Studies 1 and 2. Participants failed to provide a first response before the deadline in 8.1% of the trials. In addition, in 6.4% of the trials participants responded incorrectly to the dot memorization load task. All these trials (13.8% of trials in total) were excluded and 1784 trials (out of 2070) were further analyzed (initial and final response for the same item counted as 1 trial).

Results and discussion

Table 1 gives an overview of the results. As before, the no-conflict items remained at ceiling throughout. As expected, the more extreme kill-save ratios in Study 3 resulted in a slightly higher initial and final utilitarian conflict problem response rate in comparison with the moderate kill-save results in Studies 1 and 2. This trend was most pronounced in the "family" condition, in which the utilitarian response rate with moderate ratios was lowest. Statistical testing showed that the main effect of the extremity manipulation was significant at the final response stage (after accounting for the effect of the "family" condition), χ2 (2) = 11.97, p = 0.0005, b = -1.06, but not at the initial response stage, χ2 (2) = 3.28, p = 0.07, b = -0.43. The interaction trend with the family member manipulation failed to reach significance both at the initial, χ2 (3) = 0.56, p = 0.45, and final response stage, χ2 (3) = 1.85, p = 0.17. Note that the more limited impact of the extremity manipulation might be due to the fact that our ratios were less extreme than in previous work (e.g., Trémolière & Bonnefon, 2014). Nevertheless, the key point is that we observed a higher absolute descriptive number of utilitarian responses in Study 3, especially in the family condition (31.4% final utilitarian vs. 17.5% in Study 2), which allows us to test the generalizability of our non-correction findings across various levels of ultimate utilitarian responding. Table 2 shows the direction of change results. The critical finding is that we again observe very high non-correction rates in Study 3, both when the life of a family member was at stake (72.6%) and when it was not (87.8%). Hence, across our three studies with varying dilemma characteristics and absolute levels of utilitarian responding, we consistently observe that although correction is sometimes observed, it is far less likely than non-correction. In more than 70% of the cases, utilitarian responders do not need to correct an initial deontological response; their initial intuitive response is already utilitarian in nature.

Additional analyses. After having established the robustness of the non-correction findings in our three studies, we can explore a number of additional two-response data questions. For example, one can contrast response latencies and confidence ratings for the different direction of change categories. Previous two-response studies on logical reasoning (e.g., Bago & De Neys, 2017; Thompson et al., 2011; Thompson & Johnson, 2014) established that initial response confidence is typically lower for responses that get subsequently changed after deliberation ("DU" and "UD" in the present case) than for responses that are not changed ("UU" and "DD" in the present case). It has been suggested that this lower initial confidence (or "Feeling of Rightness", as Thompson et al. refer to it) would be one factor that determines whether reasoners will engage in System 2 deliberation (e.g., Thompson et al., 2011). Changed responses have also been shown to be associated with longer "re-thinking times" (i.e., response latencies) in the final response stage. To explore these trends in the moral reasoning case, Figures 1 and 2 plot the average confidence rating and response latency findings across our three studies. As the figures indicate, our moral reasoning findings are consistent with the logical reasoning trends. Initial response confidence (Figure 1, top panel) for the "UD" and "DU" categories, in which the initial response is changed after deliberation, is lower than for the "UU" and "DD" categories, in which the initial response is not changed. Final response times (Figure 2, bottom panel) are also longer for the change categories (i.e., "DU" and "UD") than for the no-change ones. To test these trends statistically we entered direction of change category (change vs no-change) as a fixed factor in the models. All latency data were log-transformed prior to analysis. Both the confidence, χ2 (1) = 104.7, p < 0.0001, b = -15.8, and latency, χ2 (1) = 49.03, p < 0.0001, b = 0.17, trends were significant. One additional trend that visually pops out is that for the "DU" category, in which an initial deontological response is corrected, there is a sharp confidence increase when contrasting initial and final confidence, χ2 (1) = 49.1, p < 0.0001, b = 21.4. After deliberation, the response confidence attains the level of intuitively generated utilitarian responses in the "UU" case. Hence, perhaps not surprisingly, in those cases where deliberate correction occurs it also seems to alleviate one's initial doubt. We also note that with respect to the initial response latencies (Figure 2, top panel), the rare UD trials seemed to be generated slightly faster than the others, χ2 (1) = 7.3, p = 0.007, b = -0.05. For completeness, the interested reader can find a full overview of the confidence and latency data in the Supplementary Material (Tables S2 and S3).


[Figure 1: panel A, Initial confidence; panel B, Final confidence]

Figure 1. Mean initial (A.) and final (B.) conflict problem response confidence ratings as a function of direction of change category, averaged across Study 1-3. Error bars are 95% confidence intervals.

A related issue we can explore with our confidence data is whether intuitive utilitarian responders are actually faced with two competing intuitions at the first response stage. That is, one possible reason why people in the "UU" category manage to give a utilitarian initial response might be that the problem simply does not generate an intuitive deontological response for them. Hence, they would only generate a utilitarian response and would not be faced with an interfering deontological one. Alternatively, they might generate two competing intuitions, but the utilitarian intuition might be stronger and therefore dominate (Bago & De Neys, 2017). We can address this question by looking at the confidence contrast between conflict and no-conflict control problems. If conflict problems cue two conflicting initial intuitive responses, people should process the problems differently than the no-conflict problems (in which such conflict is absent) in the initial response stage. Studies on conflict detection during moral reasoning that used a classic single response paradigm have shown that processing conflict problems typically results in lower response confidence (e.g., Bialek & De Neys, 2016, 2017). The question that we want to answer here is whether this is also the case at the initial response stage. Therefore, we contrasted the confidence ratings for the initial response on the conflict problems with those for the initial response on the no-conflict problems20. Our central interest here concerns the "UU" cases, but a full analysis and discussion for each direction of change category is presented in the Supplementary Material (section C). In sum, results across our studies indeed indicate that "UU" responders showed a decreased confidence (average decrease = 6.1%, SE = 1.1, χ2 (1) = 21.4, p < 0.0001, b = -6.76) on the conflict vs no-conflict problems. This supports the hypothesis that in addition to their dominant utilitarian intuition the alternative deontological response is also being cued.

20 In general, response latencies can also be used to study conflict detection (e.g., Botvinick, 2007; De Neys & Glumicic, 2008; Pennycook, Fugelsang, & Koehler, 2015). However, we refrained from analyzing response latencies in the current context given that in the moral reasoning domain they have been found to be a less reliable conflict indicator (Bialek & De Neys, 2017).


[Figure 2: panel A, Initial response time; panel B, Final response time]

Figure 2. Mean initial (A.) and final (B.) conflict problem response times as a function of direction of change category, averaged across Study 1-3. Error bars are 95% confidence intervals.


GENERAL DISCUSSION

Our studies tested the assumption that utilitarian responses to moral dilemmas require deliberate System 2 correction of an initial, intuitive deontological System 1 response. By adopting a two-response paradigm in which participants were required to give an initial response under time pressure and cognitive load, we were able to empirically identify the intuitively generated response that preceded the final response given after deliberation. We ran three studies in which we tested a range of conflict dilemmas that gave rise to various absolute levels of utilitarian responding, including "high-conflict" cases in which there was a strong emotional averseness towards the sacrificial option. Our critical finding is that although there were some instances in which deliberate correction occurred, these were the exception rather than the rule. Across the studies, results consistently showed that in the vast majority of cases in which people opt for a utilitarian response after deliberation, the utilitarian response is already given in the initial phase. Hence, pace the corrective dual process assumption, utilitarian responders do not necessarily need to deliberate to correct an initial deontological response. Their intuitive response is typically already utilitarian in nature. Our two-response findings point to the pervasiveness of an intuitive utilitarianism in which people intuitively prefer the greater good without any deliberation.

One might note that the idea that utilitarian reasoning can be intuitive is not new. As Bialek and De Neys (2017) noted, at least since J. S. Mill various philosophers have characterized utilitarianism as a heuristic intuition or rule of thumb. At the empirical level, Kahane and colleagues (Kahane, 2012, 2015; Wiech et al., 2013) demonstrated this by simply changing the severity of the deontological transgression. They showed that in cases where the deontological duty is trivial and the consequence is large (e.g., when one needs to decide whether it is acceptable to tell a lie in order to save someone's life) the utilitarian decision can be made intuitively. Likewise, Trémolière and Bonnefon (2014) showed that when the kill-save ratios (e.g., kill 1 to save 5000) were exceptionally inflated, people effortlessly made the utilitarian decision even when they were put under cognitive load. Hence, one could argue that these earlier empirical studies established that at least in some exceptional or extreme scenarios utilitarian responding can be intuitive. What the present findings indicate is that there is nothing exceptional about intuitive utilitarianism. The established high non-correction rate in the present studies implies that the intuitive generation of a utilitarian response is the rule rather than the exception. Moreover, the non-correction pattern was observed in standard dilemmas with moderate, conventional kill-save ratios and severe deontological transgressions (i.e., killing) that were used to validate the standard dual process model of moral cognition. This indicates that utilitarian intuitions are not a curiosity resulting from extreme or trivial scenario content but lie at the very core of the moral reasoning process (Bialek & De Neys, 2017).

Towards a new dual process model of moral cognition

The evidence in favor of intuitive utilitarianism and against the corrective assumption forces us to revise the dual process model of moral cognition. So what type of model or architecture do we need to account for the present findings? Previous critiques of the dual process model of moral cognition already suggested moving from a serial to a parallel processing conceptualization (e.g., Koop, 2013). The serial and parallel processing views concern specific assumptions about the time course of the System 1 and 2 interaction in dual process models. The serial view entails that at the start of the reasoning process only System 1 is activated by default. System 2 can be activated, but this activation is optional and occurs later in the reasoning process (e.g., Evans & Stanovich, 2013; Kahneman, 2011). The parallel view (e.g., Sloman, 1996) entails that both System 1 and System 2 are activated simultaneously from the start. Note, however, that although the serial and parallel views differ on when System 2 processing is assumed to start, both make the corrective assumption and hypothesize that computing the alleged (e.g., utilitarian) System 2 response will take time and effort. Given that we observed that the utilitarian response is generated as initial response when System 2 processing is experimentally "knocked out", the parallel model does not allow us to account for the findings either.

A recent alternative to the more traditional serial and parallel models is the so-called "hybrid21" model view (Bago & De Neys, 2017; Ball, Thompson, & Stupple, 2018; Banks, 2018; Białek & De Neys, 2017; De Neys, 2012, 2018; Handley & Trippas, 2015; Pennycook, Fugelsang, & Koehler, 2015; Thompson & Newman, 2018; Trippas & Handley, 2018). At the most general level this model simply entails that the response that is traditionally assumed to be cued by System 2 can also be cued by System 1. Hence, in the case of moral reasoning the idea is that System 1 simultaneously generates both a deontological and a utilitarian intuition (e.g., Bialek & De Neys, 2016, 2017; see also Gürçay & Baron, 2017; Rosas, 2017). This allows us to account for the fact that utilitarian responses can be intuitive and non-corrective in nature. However, this does not suffice. The key challenge for the dual process model is to account for the direction of change results. Indeed, although we observed that final utilitarian responders predominantly generate the utilitarian response intuitively, many reasoners did not generate utilitarian responses and stuck to a deontological response throughout. Likewise, there were also cases in which correction occurred and the utilitarian response was only generated after deliberate correction. How can we explain these different response patterns?

Here it is important to underline that the hybrid model—such as it has been presented in the logical/probabilistic reasoning field—posits that although System 1 will generate different types of intuitions, this does not entail that all these intuitions are equally strong (Bago & De Neys, 2017; Pennycook, 2018; Pennycook et al., 2015). They can vary in their strength or activation level. More specifically, the model proposes that we need to consider both absolute (which one of the two intuitions is strongest?) and relative (how pronounced is the activation difference between both intuitions?) strength differences between competing intuitions (Bago & De Neys, 2017). The initial response will be determined by the absolute strength level: whichever intuition is strongest will be selected as initial response. Whether or not the initial response gets subsequently deliberately changed will be determined by the relative strength difference between both intuitions. The smaller the difference, the less confident one will be, and the more likely that the initial response will be changed after deliberation. Bago and De Neys (2017) already showed that such a model accounted for the two-response findings in logical/probabilistic reasoning. Here we propose to apply the same principles to the moral reasoning case.

Figure 3 illustrates the idea. In the figure we have plotted the hypothetical strength of the utilitarian and deontological intuition for each of the four direction of change categories in imaginary activation strength "units". For example, in the UU case, the utilitarian intuition might be 4 units strong whereas the deontological intuition might be only 1 unit strong. In the DD case, we would have the opposite situation with a 4 unit strong deontological intuition and a much weaker, 1 unit utilitarian intuition. In the two change categories, one of the two intuitions will also dominate the other, but the relative difference will be less pronounced. For example, in the DU case the deontological intuition might have strength level 3 whereas the utilitarian intuition has strength level 2. Because the relative difference is less pronounced, there will be more doubt, and this will be associated with longer final rethinking and answer change. In other words, in each of the four direction of change categories there will be differences in which intuition is the dominant one and how dominant that intuition is. The more dominant an intuition is, the more likely that it will be selected as initial response, and the less likely that it will be corrected by deliberate System 2 processing.

21 We use the "hybrid" model label to refer to core features that seem to be shared – under our interpretation – by the recent theoretical proposals of various authors. It should be clear that this does not imply that these proposals are completely similar. We are talking about a general family resemblance rather than full correspondence and focus on commonalities rather than the differences.

Figure 3. Illustration of a hybrid dual process model of moral cognition. Possible absolute (which one of the two intuitions is strongest?) and relative (how pronounced is the activation difference between both intuitions?) strength differences between the utilitarian and deontological intuition in the different direction of change categories. The figure shows the strength of the utilitarian and deontological intuition for each direction of change category in (imaginary) activation strength “units”. Note: U = Utilitarian, D = Deontological.
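To make the selection principles concrete, the two rules of the hybrid model can be expressed in a few lines (a toy Python sketch using the imaginary strength units from Figure 3; the linear change rule is our own simplification for illustration, not a fitted model):

```python
def initial_response(strengths):
    # Absolute strength: the strongest intuition is given as initial response.
    return max(strengths, key=strengths.get)

def change_likelihood(strengths):
    # Relative strength: the smaller the gap between the two intuitions,
    # the lower the confidence and the more likely a change after
    # deliberation. Toy rule: likelihood shrinks linearly with the gap.
    gap = abs(strengths["U"] - strengths["D"])
    total = strengths["U"] + strengths["D"]
    return 1 - gap / total

# Imaginary activation strengths for the four direction of change categories.
categories = {"UU": {"U": 4, "D": 1}, "DD": {"U": 1, "D": 4},
              "DU": {"U": 2, "D": 3}, "UD": {"U": 3, "D": 2}}

for label, s in categories.items():
    print(label, initial_response(s),
          f"change likelihood = {change_likelihood(s):.2f}")
```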

It should be clear that Figure 3 presents a hypothetical model of the strength levels. The strength levels were set to illustrate the core hybrid model principles. However, the principles themselves are general and were independently established in the logical/probabilistic reasoning field. The key point is that by allowing utilitarian intuitions within System 1 and considering strength differences between competing System 1 intuitions, we can readily explain why utilitarian responses can be generated intuitively and why people will sometimes correct their initial responses after deliberation.

The hybrid model illustrates how one can make theoretical sense of the observed findings. Interestingly, it also makes new predictions. That is, given the core principles one can expect that changes in the strength levels of competing intuitions should lead to predictable consequences. For example, our studies showed that the family member manipulation had a profound impact on the rate of utilitarian responding. It is not unreasonable to assume that putting the life of a close family member at stake will increase the strength of the deontological intuition. It follows from the hybrid model principles that the prospect of sacrificing a family member should not only decrease the utilitarian response rate (which we observed, but which is fairly trivial) but also affect the associated response confidence. Consider two reasoners, one in the family and one in the no-family condition, who both give an intuitive deontological response. The fact that they give an initial deontological response implies that the absolute strength of their deontological intuition dominates their utilitarian intuition. Putting the life of a family member at stake in the family condition will further increase the strength of the deontological intuition. Hence, for a deontological responder in the family condition the strength difference with the competing utilitarian intuition will increase. Therefore, a deontological responder should be less likely to experience conflict in the family vs no-family condition, and their response should be doubted less. Furthermore, based on the same principles, one can expect that the strength manipulation should have the exact opposite impact for intuitive utilitarian responders. The fact that someone gives the utilitarian response implies that the absolute strength of their utilitarian intuition dominates their deontological intuition. However, since putting the life of a family member at stake will increase the strength of the deontological intuition, the relative difference between the two intuitions will be smaller for the utilitarian responder in the family condition. That is, the utilitarian responder in the family condition will now face a deontological intuition whose strength is closer to the strength of their utilitarian intuition. Consequently, given the smaller difference, they should experience more conflict and show more response doubt. Figure 4 plots the average confidence data for initial deontological and utilitarian responses in the family and no-family conditions across our studies. The expected trend is indeed observed22. Making the deontological intuition stronger makes utilitarian responders less confident about their decision (i.e., a 19.6% decrease) whereas deontological responders grow more confident (i.e., a 15.6% increase). Statistical testing indicated that this interaction was significant, χ2 (3) = 76.63, p < 0.0001, b = -25.7. In sum, we hope to have demonstrated how an application of the hybrid dual process principles can account for the observed findings – and makes testable predictions. We believe this underlines the potential of the hybrid model view as an alternative to the traditional dual process model. At the very least, the present studies should make it clear that the traditional corrective model is untenable. Although the hybrid model will need to be further validated and developed, the present studies indicate that its core principle stands: any viable dual process model of moral cognition will need to allow for the generation of both utilitarian and deontological intuitions within System 1 and consider the competition between these intuitions.

22 A related, albeit less pronounced, trend can be observed for the kill-save manipulation (see Supplementary Material D).

Why do we need System 2 deliberation?

The hybrid model and the evidence for intuitive utilitarianism imply that we need to upgrade the role of System 1: utilitarian judgments do not necessarily require System 2 deliberation but can be generated by System 1. Here it is important to stress that upgrading the role of System 1 does not imply a downgrade of System 2. First, in all our studies we observed that correction does sometimes occur. Hence, although it is more exceptional, System 2 can be used to deliberately correct one's intuitive response. Second, and more critically, the fact that deliberation is not typically used for correction does not imply it cannot be important for other functions. For example, one of the features that is often associated with deliberation is its cognitive transparency (Bonnefon, 2016). Deliberate decisions can typically be justified; we can explain why we opt for a certain response after we have reflected on it. Intuitive processes often lack this explanatory property: people have little insight into their intuitive processes and typically do not manage to justify their "gut feelings" (Marewski & Hoffrage, 2015; Mega & Volz, 2014). Hence, one suggestion is that people might be using deliberation to look for an explicit justification or validation of their intuitive insight (Bago & De Neys, 2018). For example, Bago and De Neys (2018) observed that although reasoners could often intuitively generate the correct solution to logical reasoning problems, they struggled to properly explain why their answer was correct. Such justifications were more likely after people were given the time to deliberate. A similar process might be at play during moral reasoning. In the Supplementary Material (section E) we present the results of an exploratory pilot study in which people were given moral dilemmas and were asked to give a justification after both their initial and final response. We were specifically interested in proper utilitarian justifications that explicitly mentioned the greater good (e.g., "I opted for this decision because more people will be saved"). The study replicated the finding that final utilitarian decisions were typically preceded by initial utilitarian responses (i.e., a high non-correction rate). Critically, however, proper utilitarian justifications for a utilitarian response were more likely in the final response stage (i.e., up to a +20% increase when the life of a family member was at stake23). Hence, although utilitarian responses can be generated intuitively, additional deliberation might make it more likely that we will manage to properly justify them.

In general, being able to justify one's response and producing explicit arguments to support it might be more crucial for reasoning than was often believed in the past (Mercier & Sperber, 2011, 2017). The work of Mercier and Sperber, for example, underscores that arguments are critical for communicative purposes. We will not be very successful in convincing others that our decision is acceptable if we can only tell them that we "feel it is right". If we come up with a good explanation, people will be more likely to change their mind and accept our view (Trouche et al., 2014). If System 2 deliberation plays a role in this process, it should obviously not be downplayed. Interestingly, at least one tradition within moral reasoning research has characterized deliberate justifications as post hoc constructions or "rationalizations" (Haidt, 2001). This "social intuitionist" approach has stressed the primacy of intuitive processes for moral reasoning. By and large, moral reasoning would be driven by mere intuitive processes. Interestingly, the traditional dual process model of moral cognition reacted against this "intuitionist" view by arguing that corrective deliberate processes were also central to moral reasoning (Greene & Haidt, 2002). By presenting evidence against the corrective assumption, the current paper might seem to support the social intuitionist framework. We simply want to highlight here that although the hybrid model shares the upgraded view of intuitive processes, it does not conceive of deliberation as epiphenomenal or extrinsic to the reasoning process. Whatever one's position in this debate might be, our point here is that the case against the corrective dual process assumption should not be taken as an argument against the role or importance of deliberation in human cognition. Our goal is not to contest that deliberation might be important for human reasoning. The point is simply that this importance does not necessarily lie in a correction process.

23 One limitation of the study is that participants can use the justification to deliberate about their initial response. This might inflate proper utilitarian justifications at the initial response phase. However, the point is that despite this limitation we still observed an increase in utilitarian justifications in the final response phase.


In closing

Finally, we want to highlight the close link between the current work on moral reasoning and related dual process work in the logical reasoning field. As we noted, our two-response paradigm and the theoretical hybrid dual process model we proposed were inspired by recent dual process advances on logical reasoning. In the past, dual process research in the moral and logical reasoning fields has occurred in relative isolation (Bonnefon & Trémolière, 2017; Gürçay & Baron, 2017), and we hope that the present study can stimulate a closer interaction (Białek & De Neys, 2017; Gürçay & Baron, 2017; Trémolière, De Neys, & Bonnefon, 2018). In our view, such an interaction is the critical stepping stone to arrive at a unified domain-general model of the interplay between intuitive and deliberate processes in human cognition. Our evidence against the corrective dual process view suggests that such a model will need to be built on a hybrid processing architecture in which absolute and relative strength differences between competing intuitions determine our reasoning performance.


REFERENCES

Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Bago, B., & De Neys, W. (2017). Fast logic?: Examining the time course assumption of dual process theory. Cognition, 158, 90–109.
Ball, L., Thompson, V., & Stupple, E. (2018). Conflict and dual process theory: the case of belief bias. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Banks, A. (2018). Comparing dual process theories: evidence from event-related potentials. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Baron, J. (2017). Utilitarian vs. deontological reasoning: method, results, and theory. In J.-F. Bonnefon & B. Trémolière (Eds.), Moral inferences (pp. 137–151). Hove, UK: Psychology Press.
Baron, J., & Gürçay, B. (2017). A meta-analysis of response-time tests of the sequential two-systems model of moral judgment. Memory & Cognition, 45(4), 566–575.
Baron, J., Fincher, K., & Metz, S. E. (2015). Why does the Cognitive Reflection Test (sometimes) predict utilitarian moral judgment (and other things)? Journal of Applied Research in Memory and Cognition, 4(3), 265–284.
Białek, M., & De Neys, W. (2016). Conflict detection during moral decision-making: evidence for deontic reasoners' utilitarian sensitivity. Journal of Cognitive Psychology, 28(5), 631–639.
Białek, M., & De Neys, W. (2017). Dual processes and moral conflict: Evidence for deontological reasoners' intuitive utilitarian sensitivity. Judgment and Decision Making, 12(2), 148–167.
Bonnefon, J.-F. (2016). The pros and cons of identifying critical thinking with System 2 processing. Topoi, 1–7.
Bonnefon, J.-F., & Trémolière, B. (Eds.). (2017). Moral Inferences. Oxon, UK: Routledge.
Botvinick, M. M. (2007). Conflict monitoring and decision making: reconciling two perspectives on anterior cingulate function. Cognitive, Affective, & Behavioral Neuroscience, 7(4), 356–366.
Conway, P., & Gawronski, B. (2013). Deontological and utilitarian inclinations in moral decision making: a process dissociation approach. Journal of Personality and Social Psychology, 104(2), 216.
Cushman, F., Young, L., & Hauser, M. (2006). The role of conscious reasoning and intuition in moral judgment: Testing three principles of harm. Psychological Science, 17(12), 1082–1089.
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7(1), 28–38.
De Neys, W. (2018). Bias, conflict, and fast logic: Towards a hybrid dual process future? In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
De Neys, W., & Glumicic, T. (2008). Conflict monitoring in dual process theories of thinking. Cognition, 106(3), 1248–1299.


De Neys, W., & Schaeken, W. (2007). When people are more logical under cognitive load. Experimental Psychology, 54(2), 128–133.
De Neys, W., & Verschueren, N. (2006). Working memory capacity and a notorious brain teaser: The case of the Monty Hall Dilemma. Experimental Psychology, 53(2), 123–131.
Dolgin, E. (2011). World's most expensive drug receives second approval for deadly blood disease. Nature Medicine. Retrieved from http://blogs.nature.com/spoonful/2011/09/soliris.html
Evans, J. S. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Evans, J. S. B. T. (2008). Dual-processing accounts of reasoning, judgment, and social cognition. Annual Review of Psychology, 59, 255–278.
Foot, P. (1967). The problem of abortion and the doctrine of double effect. Oxford Review, 5, 5–15.
Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15(2), 105–128.
Greene, J. (2013). Moral tribes: emotion, reason and the gap between us and them. New York, NY: Penguin Press.
Greene, J. D. (2009). The cognitive neuroscience of moral judgment. In M. Gazzaniga (Ed.), The cognitive neurosciences (4th ed., pp. 987–999). Cambridge, MA: MIT Press.
Greene, J. D. (2015). Beyond point-and-shoot morality: Why cognitive (neuro)science matters for ethics. The Law & Ethics of Human Rights, 9(2), 141–172.
Greene, J. D., Nystrom, L. E., Engell, A. D., Darley, J. M., & Cohen, J. D. (2004). The neural bases of cognitive conflict and control in moral judgment. Neuron, 44(2), 389–400.
Greene, J. D., Sommerville, R. B., Nystrom, L. E., Darley, J. M., & Cohen, J. D. (2001). An fMRI investigation of emotional engagement in moral judgment. Science, 293(5537), 2105–2108.
Greene, J., & Haidt, J. (2002). How (and where) does moral judgment work? Trends in Cognitive Sciences, 6(12), 517–523.
Gürçay, B., & Baron, J. (2017). Challenges for the sequential two-system model of moral judgement. Thinking & Reasoning, 23(1), 49–80.
Haidt, J. (2001). The emotional dog and its rational tail: a social intuitionist approach to moral judgment. Psychological Review, 108(4), 814.
Handley, S. J., & Trippas, D. (2015). Dual processes and the interplay between knowledge and structure: A new parallel processing model. Psychology of Learning and Motivation, 62, 33–58.
Hao, J., Liu, Y., & Li, J. (2015). Latent fairness in adults' relationship-based moral judgments. Frontiers in Psychology, 6, 1871.
Kahane, G. (2012). On the wrong track: Process and content in moral psychology. Mind & Language, 27(5), 519–545.
Kahane, G. (2015). Sidetracked by trolleys: Why sacrificial moral dilemmas tell us little (or nothing) about utilitarian judgment. Social Neuroscience, 10(5), 551–560.
Kahneman, D. (2011). Thinking, fast and slow. New York, NY: Farrar, Straus and Giroux.


Kant, I. (1785). Groundwork for the metaphysics of morals. New Haven, CT: Yale University Press.
Koop, G. J. (2013). An assessment of the temporal dynamics of moral decisions. Judgment and Decision Making, 8(5), 527.
Kruglanski, A. W. (2013). Only one? The default interventionist perspective as a unimodel—Commentary on Evans & Stanovich (2013). Perspectives on Psychological Science, 8(3), 242–247.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2015). Package 'lmerTest'.
London, J. A. (2012). How should we model rare disease allocation decisions? Hastings Center Report, 42(1), 3.
Marewski, J. N., & Hoffrage, U. (2015). Modeling and aiding intuition in organizational decision making. Journal of Applied Research in Memory and Cognition, 4, 145–311.
Mega, L. F., & Volz, K. G. (2014). Thinking about thinking: implications of the introspective error for default-interventionist type models of dual processes. Frontiers in Psychology, 5.
Mill, J. S., & Bentham, J. (1987). Utilitarianism and other essays. Harmondsworth, UK: Penguin.
Miyake, A., Friedman, N. P., Rettinger, D. A., Shah, P., & Hegarty, M. (2001). How are visuospatial working memory, executive functioning, and spatial abilities related? A latent-variable analysis. Journal of Experimental Psychology: General, 130(4), 621–640.
Moore, A. B., Stevens, J., & Conway, A. R. (2011). Individual differences in sensitivity to reward and punishment predict moral judgment. Personality and Individual Differences, 50(5), 621–625.
Newman, I., Gibb, M., & Thompson, V. A. (2017). Rule-based reasoning is fast and belief-based reasoning can be slow: Challenging current explanations of belief bias and base-rate neglect. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(7), 1154–1170.
Nichols, S. (2004). Folk concepts and intuitions: From philosophy to cognitive science. Trends in Cognitive Sciences, 8(11), 514–518.
Pennycook, G. (2018). A perspective on the theoretical foundation of dual process models. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.
Pennycook, G., & Thompson, V. A. (2012). Reasoning with base rates is routine, relatively effortless, and context dependent. Psychonomic Bulletin & Review, 19(3), 528–534.
Rosas, A. (2017). On the cognitive (neuro)science of moral cognition: Utilitarianism, deontology, and the "fragmentation of value." In A. Ibáñez, L. Sedeño, & A. García (Eds.), Neuroscience and Social Science (pp. 199–215). Springer.
Royzman, E. B., & Baron, J. (2002). The preference for indirect harm. Social Justice Research, 15(2), 165–184.
Schellens, G. (2015). Alexion deal with Belgian government got public. Retrieved from http://bbibber.blogspot.be/2015/03/alexion-deal-with-belgian-government.html


Shenhav, A., & Greene, J. D. (2014). Integrative moral judgment: Dissociating the roles of the amygdala and ventromedial prefrontal cortex. Journal of Neuroscience, 34(13), 4741–4749.
Sloman, S. (1996). The empirical case for two systems of reasoning. Psychological Bulletin, 119(1), 3–22.
Sloman, S. (2015). Opening editorial: The changing face of cognition. Cognition, 135, 1–3.
Spivey, M. J., Grosjean, M., & Knoblich, G. (2005). Continuous attraction toward phonological competitors. Proceedings of the National Academy of Sciences of the United States of America, 102(29), 10393–10398.
Suter, R. S., & Hertwig, R. (2011). Time and moral judgment. Cognition, 119(3), 454–458.
Tassy, S., Oullier, O., Mancini, J., & Wicker, B. (2013). Discrepancies between judgment and choice of action in moral dilemmas. Frontiers in Psychology, 4, 250.
Thompson, V. A., & Johnson, S. C. (2014). Conflict, metacognition, and analytic thinking. Thinking & Reasoning, 20(2), 215–244.
Thompson, V. A., & Newman, I. (2018). Logical intuitions and other conundra for dual process theories. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Thompson, V. A., Turner, J. A. P., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140.
Tinghög, G., Andersson, D., Bonn, C., Johannesson, M., Kirchler, M., Koppel, L., & Västfjäll, D. (2016). Intuition and moral decision-making: The effect of time pressure and cognitive load on moral judgment and altruistic behavior. PloS One, 11(10), e0164012.
Trémolière, B., & Bonnefon, J.-F. (2014). Efficient kill–save ratios ease up the cognitive demands on counterintuitive moral utilitarianism. Personality and Social Psychology Bulletin, 40(7), 923–930.
Trémolière, B., & De Neys, W. (2013). Methodological concerns in moral judgement research: Severity of harm shapes moral decisions. Journal of Cognitive Psychology, 25(8), 989–993.
Trémolière, B., De Neys, W., & Bonnefon, J.-F. (2018). Reasoning and moral judgment: A common experimental toolbox. In L. J. Ball & V. A. Thompson (Eds.), The Routledge International Handbook of Thinking and Reasoning. Oxon, UK: Routledge.
Trippas, D., & Handley, S. (2018). The parallel processing model of belief bias: Review and extensions. In W. De Neys (Ed.), Dual Process Theory 2.0. Oxon, UK: Routledge.
Trouche, E., Sander, E., & Mercier, H. (2014). Arguments, more than confidence, explain the good performance of reasoning groups. Journal of Experimental Psychology: General, 143(5), 1958–1971.
Valdesolo, P., & DeSteno, D. (2006). Manipulations of emotional context shape moral judgment. Psychological Science, 17(6), 476.
Wiech, K., Kahane, G., Shackel, N., Farias, M., Savulescu, J., & Tracey, I. (2013). Cold or calculating? Reduced activity in the subgenual cingulate cortex reflects decreased emotional aversion to harming in counterintuitive utilitarian judgment. Cognition, 126(3), 364–372.


Chapter 6: Rise and fall of conflicting intuitions during reasoning

Abstract

Recent dual process models propose that the strength of competing intuitions determines reasoning performance. A key challenge at this point is to identify boundary conditions: cases in which the strength of the different intuitions will be weaker or stronger. We therefore ran two studies with the two-response paradigm, in which people are asked to give two consecutive answers to a reasoning problem. We adopted base-rate problems, in which base rate and stereotypic information can cue conflicting intuitions. By manipulating the presentation order of the information, we aimed to manipulate its saliency and, thereby, indirectly, the activation strength of the intuitions. Contrary to our expectation, we observed that the order manipulation had opposite effects at the initial and final response stages. We explain these results by taking into account that the strength of intuitions is not constant but changes over time; each intuition has a peak level, a growth rate, and a decay rate.

Based on Bago, B., & De Neys, W. (2017). The rise and fall of conflicting intuitions during reasoning. In Proceedings of the 39th Annual Meeting of the Cognitive Science Society (pp. 87–92). Austin, TX: Cognitive Science Society. Retrieved from https://mindmodeling.org/cogsci2017/papers/0028/index.html


Introduction

Decades of research on thinking and reasoning have revealed that people are often subject to systematic errors. Consider, for example, the following situation:

“There is a party with 1000 people. Jo is a randomly chosen participant from the party. We know that Jo is 23 years old and is finishing a degree in engineering. On Friday nights, Jo likes to go out cruising with friends while listening to loud music and drinking beer. We also know that 900 people attending the party are women. What is most likely: Is Jo a man or a woman?”

This is a so-called base rate problem. Based on the “normative”24 principle that a randomly drawn individual will more likely come from the largest group, one should favor the conclusion that Jo is a woman. However, the majority of people err on this problem by going with the presented stereotype (which cues that Jo is a man).

Dual process theories provide an explanation for thinking bias on problems such as the base rate task. They distinguish two types of processing, Type 1 and Type 2. Note that there are many dual process theories, but in this study we focus on the most influential one, the default-interventionist theory. Type 1 processes (also referred to as intuitive processes) are thought to be completely autonomous, while Type 2 processes (also referred to as analytic processes) are more controlled. Type 1 processing generates responses cued by stereotypes or common beliefs; relying on this intuitive, initial response is what makes people biased in such situations. After Type 1 processing has produced a response, Type 2 processing in some cases gets engaged; this type of processing has the ability to override and correct the response generated by Type 1 processing. In general, it is assumed that Type 2 processing can generate responses based on logic or probabilities, while Type 1 processing has not been considered able to handle information such as the logical properties of the task or probabilities (Evans & Stanovich, 2013; Kahneman, 2011).

However, conflict detection studies (De Neys, 2012, 2014) have recently indicated that the assumption that Type 1 processing cannot handle probabilistic or logical information might not hold.

24 Note that we use the labels “normative”, “correct”, or “logical” response as a handy shortcut to refer to “the response that has traditionally been considered correct or normative according to standard logic or probability theory”. The appropriateness of these traditional norms has sometimes been questioned in the reasoning field (e.g., see Stanovich & West, 2000, for a review). Under this interpretation, the heuristic response should not be labeled “incorrect” or “biased”. For the sake of simplicity, we stick to the traditional labeling. In the same vein, we use the term “logical” as a general header to refer to both standard logic and probability theory.
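To make the normative benchmark concrete, consider the Bayesian calculation for the party problem above, under a purely illustrative assumption about how diagnostic the description is (the problem itself specifies no likelihoods). Suppose, for instance, that the description is four times as likely to fit a man as a woman. Then

\[ \frac{P(\text{man} \mid D)}{P(\text{woman} \mid D)} = \frac{P(D \mid \text{man})}{P(D \mid \text{woman})} \times \frac{P(\text{man})}{P(\text{woman})} = 4 \times \frac{100}{900} = \frac{4}{9}, \]

so \( P(\text{woman} \mid D) = 9/13 \approx 0.69 \): “woman” remains the more probable answer despite the stereotype, because the extreme base rates dominate the posterior.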

These studies showed that even biased reasoners are able to detect the conflict between intuitive “heuristic” cues (e.g., stereotypes) and “normative” logical or probabilistic principles (e.g., base rate probabilities). Such studies typically contrast conflict and no-conflict reasoning problems. In conflict problems, heuristic processing and normative principles cue different responses, as in the base rate problem above. In no-conflict problems, normative principles and heuristic processing cue the same response; for example, imagine that the base rate problem above stated that there were 900 men and 100 women at the party. In that case, both the stereotype and the base rate probabilities would cue the same response (that Jo is a man). On conflict problems, studies showed that even incorrect reasoners (compared to correct reasoners on no-conflict problems) exhibited elevated response times, decreased post-decision confidence, and higher activation in brain areas mediating conflict detection, across a range of tasks (for a review, see De Neys, 2012). These results led some authors to suggest that some kind of elementary processing of logical/probabilistic information occurs even during Type 1 processing. De Neys (2012) argues that conflict detection results from two conflicting Type 1 outputs, generated by two kinds of intuitions: one based on stereotypes or common beliefs (the heuristic intuition), the other based on logico-mathematical principles (the logical intuition).

Recently, Bago and De Neys (2017b) went a step further and argued that people are not just able to detect the conflict intuitively; some of them are able to give the logically correct response intuitively. Our so-called hybrid dual process model argues that the two different intuitions differ in activation strength (or “salience”), and that the intuitive response the person actually provides will be the one that has gained more strength. The relative difference between the strengths of the heuristic and logical intuitions defines how pronounced the conflict is: the smaller the relative difference, the more pronounced the conflict; the larger the relative difference, the less pronounced it will be.

A key question at this point is to search for boundary conditions: to identify cases in which the strength of the different intuitions will be more or less pronounced. One way to do so is to manipulate the presentation order of the base rate information and the stereotypes. Let us explain why. In a previous study, Pennycook, Fugelsang, and Koehler (2015) argued that a “given piece of information is at its most salient just prior to judgement” (Pennycook et al., 2015, p. 57). They further argued that this would mean that base rate information is most salient when presented right before the decision is made (i.e., after the stereotypical description has been presented). The authors indeed observed that presenting the base rate information at the end of the problem boosted participants’ accuracy compared to the condition in which it was presented first.

To explain these results, one could operationalize saliency as the strength of a given intuitive response: whatever information is presented later will be the more salient, and therefore the intuition cued by that piece of information will be the stronger one. In this study, we wanted to test the robustness of these findings: will we get the same effects after purely intuitive Type 1 processing? To test this question, one needs a research design that can measure intuitive Type 1 responses separately from analytic Type 2 responses. For this reason, we used the two-response paradigm (Thompson, Prowse Turner, & Pennycook, 2011). In the two-response paradigm, participants are presented with the same item twice. First, they are asked to give a very quick, intuitive, initial response. Then the same task is presented again, and they can take as much time as they want before providing their final response. One also needs to be sure that the initial response is truly intuitive; we achieved this by applying a strict response deadline (3 seconds) and a secondary task that burdens reasoners’ (executive) cognitive capacity during the initial response. With these manipulations we can experimentally knock out Type 2 processing during initial responding (Bago & De Neys, 2017a). Our hypothesis was that if presentation order indeed affects the strength of an intuition, we should observe the same effect after purely intuitive processing as has been observed previously after deliberative thinking. That is, if the base rates are presented last, the strength of the base rate intuition should be higher, and therefore more correct responses should be observed at both the initial and final response stages.

Study 1

Method

Participants

In total, 149 participants took part in the experiment (86 female; M = 39.3 years, SD = 12.7 years). Participants were recruited online via Crowdflower and received $0.25 for their participation. Subjects were randomly assigned to one of the two conditions. Note that the data in the S-BR condition were taken from Bago and De Neys (2017a). A total of 44.5% of participants reported high school as their highest completed educational level, while 52.1% reported having a post-secondary educational degree (3.4% reported less than high school).

Materials

Reasoning task. Participants solved a total of eight base-rate problems, all taken from Pennycook, Cheyne, Barr, Koehler, and Fugelsang (2014). Participants always received a description of the composition of a sample (e.g., “This study contained IT engineers and professional boxers”), base rate information (e.g., “There were 995 engineers and 5 professional boxers”), and a description designed to cue a stereotypical association (e.g., “This person is strong”). Participants’ task was to indicate to which group the person most likely belonged.

The problem presentation format used in this research was based on Pennycook et al.’s (2014) rapid-response paradigm. In this paradigm, the base rates and descriptive information are presented serially, and the amount of text presented on screen is minimized. Pennycook et al. introduced the paradigm to minimize the influence of reading times and to obtain a purer, less noisy measure of reasoning time per se. Participants received three pieces of information on a given trial. First, the names of the two groups in the sample (e.g., “This study contains clowns and accountants”); this sentence was always presented first and stayed on the screen. Participants were also presented with stereotypical descriptive information (e.g., “Person ‘L’ is funny”); the descriptive information specified a neutral name (“Person L”) and a single-word personality trait (e.g., “strong” or “funny”) designed to trigger the stereotypical association. Finally, participants received the base rate probabilities. In this experiment, we manipulated the presentation order of the base rate probabilities and the stereotypes: for one group the base rates were presented first (BR-S); for the other group the base rates were presented last, after the stereotype (S-BR). Presentation order was manipulated between subjects. The following illustrates the full problem format in the S-BR condition:

This study contains clowns and accountants.
Person 'L' is funny.
There are 995 clowns and 5 accountants.
Is Person 'L' more likely to be:
o A clown
o An accountant


Half of the presented problems were conflict items and the other half were no-conflict items. In no-conflict items, the base rate probabilities and the stereotypic information cued the same response. In conflict items, the stereotypic information and the base rate probabilities cued different responses. Three kinds of base rates were used: 997/3, 996/4, and 995/5.

Each problem started with the presentation of a fixation cross for 1000 ms. After the fixation cross disappeared, the sentence specifying the two groups appeared for 2000 ms. Then the first of the two critical pieces of information (the stereotype or the base rates, depending on the condition) appeared for another 2000 ms, while the first sentence remained on the screen. Finally, the last piece of information appeared together with the question and the two response alternatives. Note that we presented the last piece of information and the question together (rather than presenting the last piece of information alone for 2000 ms first) to minimize the possibility that some participants would start solving the problem during the presentation of the last part of the problem. Once all the parts were presented, participants were able to select their answer by clicking on it. The position of the correct answer alternative (i.e., first or second response option) was randomly determined for each item. The eight items were presented in random order. Confidence in the correctness of the response was recorded after both the initial and final response stages by asking participants to indicate their confidence level on a scale ranging from 0% to 100%.
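For concreteness, the presentation schedule of a single S-BR trial can be summarized as follows (an illustrative Python sketch, not the software actually used to run the experiment):

import sys

# Durations in milliseconds; None means "displayed until the participant responds".
S_BR_TRIAL = [
    ("fixation cross",                  1000),
    ("group composition sentence",      2000),  # stays on screen afterwards
    ("stereotypic description",         2000),  # S-BR: stereotype before base rates
    ("base rates + question + options", None),  # participant's click ends the trial
]

t = 0
for event, duration in S_BR_TRIAL:
    print(f"{t:>5} ms  {event}", file=sys.stdout)
    t += duration or 0

In the BR-S condition, only the order of the two middle events is swapped.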

Cognitive load task. We used a concurrent load task, the dot memorization task, to burden participants’ executive cognitive resources while they were solving the reasoning tasks. The idea behind the load manipulation is straightforward: one of the defining features of Type 2 processing is that it requires executive (working memory) resources (e.g., Evans & Stanovich, 2013; Kahneman, 2011). Hence, if we burden participants’ cognitive resources with a secondary load task while they are solving the reasoning problems, we reduce the possibility that they can engage in Type 2 thinking (De Neys, 2006). On every trial, after the fixation cross disappeared, participants were shown a matrix in which four dots were presented in a complex interspersed pattern in a 3 x 3 grid for 2000 ms, and they were instructed to memorize the pattern. Previous studies established that this demanding secondary task successfully burdens executive resources during reasoning (De Neys, 2006). After the matrix disappeared, the reasoning problem was presented as described above and participants gave their first response. Participants were then shown four matrices with different dot patterns and had to select the correct, to-be-memorized matrix; they were given feedback as to whether they had recalled the correct matrix. Subsequently, the problem was presented again and participants selected their final response and indicated their response confidence. Hence, no load was imposed during the second, final response stage. All trials on which an incorrect matrix was selected (9.5% of trials) were removed from the analysis.
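As an illustration of the structure of the load task, the following sketch samples a four-dot target pattern and three lures, so the "correct matrix" can later be picked out of four options. This is an assumption-laden mock-up: the actual experiment used pre-constructed complex interspersed patterns rather than uniformly random ones.

import random

GRID = [(row, col) for row in range(3) for col in range(3)]  # the 3 x 3 grid

def sample_pattern(n_dots=4):
    """Place n_dots dots at random cells of the grid."""
    return frozenset(random.sample(GRID, n_dots))

def recall_options(target, n_lures=3):
    """Return the target plus distinct lure patterns, in shuffled order."""
    options = {target}
    while len(options) < n_lures + 1:
        options.add(sample_pattern(len(target)))
    options = list(options)
    random.shuffle(options)
    return options, options.index(target)

target = sample_pattern()                       # shown for 2000 ms before the problem
options, correct_index = recall_options(target)  # recall test after the initial response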

Response deadline. To minimize the possibility of Type 2 engagement during the initial response, we used a strict response deadline (3000 ms), based on a reading pre-test (see Bago & De Neys, 2017a). One second before the deadline, the background turned yellow to alert participants to the approaching deadline. If participants did not select an answer within 3000 ms, they received feedback reminding them that they had not answered within the deadline, and they were told to make sure to respond faster on subsequent trials. The deadline applied only to the initial response; there was no deadline on the final response. All trials on which participants did not manage to provide an initial response in time were excluded from the analysis (8.7% of trials).

Procedure. The experiment was run online. People were clearly instructed that we were interested in their first, initial response to the problem. The instructions stressed that it was important to give the initial response as fast as possible and that participants could afterwards take additional time to reflect on their answer. After the instructions, participants were presented with practice problems to familiarize them with the procedure. At the end of the experiment, demographic information was collected.

Results

Our main interest concerns the response accuracy analysis. Table 1 gives an overview of the findings. As one can see, we replicated the findings of Pennycook et al. (2015) at the final response stage for the conflict problems: final accuracies on conflict problems were higher (41.6%) when the base rates were presented last rather than first (24.3%). However, contrary to our expectations, we did not observe the same effect at the initial response stage; there was even a trend towards fewer correct responses in the “base rates last” S-BR condition (29.7%) than in the BR-S condition (31.8%). Indeed, the final conflict response accuracies in the S-BR condition were higher than the initial conflict response accuracies, whereas the reverse trend was observed in the BR-S condition. In other words, the condition with the highest final accuracy (S-BR) was the one with the lowest initial accuracy, while the condition with the lowest final accuracy (BR-S) was the one with the highest initial accuracy. Finally, as expected, accuracies on the no-conflict problems were always very high. Not surprisingly, in the absence of conflict, both the stereotype and the base rates can cue the correct response regardless of the order in which the information is presented.

Table 1. Percentage of correct initial and final responses for conflict and no-conflict items in both order conditions.

                        Order
Response            S-BR       BR-S
Conflict
  Initial           29.7%      31.8%
  Final             41.6%      24.3%
No-conflict
  Initial           93.4%      90.1%
  Final             93.7%      91.4%
Note. S-BR = base rates last; BR-S = base rates first.

We used mixed-effects logistic regression (logit) models to analyze the data, with accuracy as the dependent variable. The order manipulation (S-BR/BR-S), response number (initial/final response), and their interaction were entered as predictors into the model. We also accounted for the random effect (random intercept) of subjects. We concentrated our analysis on the critical conflict problems. Only the interaction improved model fit significantly, χ2(5) = 20.18, p < 0.0001, b = 1.94; neither the main effect of order, χ2(3) = 0.19, p = 0.66, nor that of response number, χ2(4) = 0.38, p = 0.54, did. These results confirm our visual inspection that order affects initial and final accuracies differently.
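Spelled out, the model we fitted corresponds to the following (a restatement of the analysis just described, with Order and Response dummy coded, i indexing trials and j participants):

\[ \operatorname{logit} P(\text{correct}_{ij} = 1) = \beta_0 + \beta_1 \text{Order}_j + \beta_2 \text{Response}_{ij} + \beta_3 \left( \text{Order}_j \times \text{Response}_{ij} \right) + u_j, \qquad u_j \sim N(0, \sigma_u^2), \]

where \(u_j\) is the by-subject random intercept.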

Table 2. Frequency of each direction of change category (number of trials) for conflict items in both conditions.

                            Order
Direction of change     S-BR           BR-S
11                      26.7% (54)     19.7% (47)
00                      55.4% (112)    63.6% (152)
10                      3.0% (6)       12.1% (29)
01                      14.9% (30)     4.6% (11)
Note. S-BR = base rates last; BR-S = base rates first.


For completeness, one can also examine the direction of change on every trial (Bago & De Neys, 2017a). Specifically, people can give correct or incorrect responses at both response stages; hence, a trial can show two correct responses (“11”), two incorrect responses (“00”), an initial correct but final incorrect response (“10”), or an initial incorrect but final correct response (“01”). The results of the direction of change analysis are summarized in Table 2. In both order conditions, the most frequent categories were the “00” and “11” cases. In line with previous observations (Bago & De Neys, 2017b; Thompson, Prowse Turner, & Pennycook, 2011), people rarely changed their initial response (taken together, the “10” and “01” cases account for 16%-18% of the trials). Interestingly, the direction in which people changed also tended to be reversed: in the S-BR condition, most people who did change went from an incorrect to a correct response (“01” category, 14.9% vs. “10” category, 3%), whereas in the BR-S condition, most people who changed their initial response changed it to an incorrect response (the “10” category dominates with 12.1% vs. 4.6% for the “01” category). Hence, this fits with the overall trend towards a higher likelihood of an initial incorrect and final correct response when the base rates are presented last. A chi-square test of independence revealed that the distributions of the direction of change categories in the two order conditions differed significantly from each other, χ2(3) = 27.56, p < 0.0001.
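This test can be reproduced directly from the cell counts in Table 2; a minimal sketch in Python (the original analyses were run in R, so this is a re-implementation, not the authors’ code):

from scipy.stats import chi2_contingency

# Cell counts from Table 2 (rows: "11", "00", "10", "01"; columns: S-BR, BR-S).
table2 = [
    [54, 47],    # "11": correct at both response stages
    [112, 152],  # "00": incorrect at both response stages
    [6, 29],     # "10": initial correct, final incorrect
    [30, 11],    # "01": initial incorrect, final correct
]

chi2, p, dof, expected = chi2_contingency(table2)
print(f"chi2({dof}) = {chi2:.2f}, p = {p:.2g}")  # chi2(3) = 27.56, p < 0.0001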

Discussion

Contrary to our expectations, we did not observe the expected accuracy effect at the initial response stage; we only observed it at the final response stage. However, we wanted to make sure the findings were robust before drawing any conclusions. Note that Pennycook et al. (2015) already observed that their order findings were robust against manipulations of the extremity of the base rates. That is, they found the same order effect on (final) accuracies when they used so-called “moderate” base rates (e.g., base rate probabilities of 700 men and 300 women) instead of the “extreme” base rates (e.g., base rate probabilities of 995 men and 5 women) that were adopted in our (and their) Study 1. In Study 2 we therefore adopted the moderate base rates and examined whether the unexpected reversal of the order effect on initial, intuitive responses would still be observed.


Study 2

Method

Participants

In total, 162 participants took part in the experiment (98 female; M = 40.2 years, SD = 14.6 years). Participants were recruited online via Crowdflower and received $0.25 for their participation. Subjects were randomly assigned to one of the two conditions. Note that the data in the S-BR condition were taken from Bago and De Neys (2017b). A total of 46.3% of participants reported high school as their highest completed educational level, while 52.5% reported having a post-secondary educational degree (1.3% reported less than high school).

Materials

Reasoning task. The experimental design was identical to that of Study 1, except that we used moderate base rates instead of extreme ones, namely 700/300, 710/290, and 720/280. In 16.7% of the trials participants did not provide the correct response on the dot matrix task, and in 10.5% of the trials participants did not manage to produce an initial response within the deadline. These trials were excluded from further analysis. Overall, 24.6% of the trials were excluded and 977 trials were analyzed.
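As a consistency check on these figures (assuming, as an approximation, that all 162 participants contributed all eight items):

\[ 162 \times 8 = 1296 \text{ trials}, \qquad 1296 \times (1 - 0.246) \approx 977 \text{ trials retained}. \]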

Results and discussion

Table 3 summarizes the accuracy results. As the table indicates, no-conflict response accuracies were again very high overall, and we also replicated the conflict problem pattern observed in Study 1: as Pennycook et al. (2015) found, presenting the base rates last led to increased accuracy on the final response. However, as in Study 1, the opposite trend was observed for the initial response. We also observed again that there were more initial than final correct responses in the BR-S condition, whereas the opposite trend was observed in the S-BR condition. Statistical analysis on the conflict problems confirmed our visual inspection: neither presentation order, χ2(3) = 0.04, p = 0.84, nor response number, χ2(4) = 0.05, p = 0.83, improved model fit significantly; only their interaction did, χ2(5) = 9.73, p = 0.0018, b = 1.4.

Table 3. Percentage of correct initial and final responses for conflict and no-conflict items in both order conditions.

                        Order
Response            S-BR       BR-S
Conflict
  Initial           16.4%      18.3%
  Final             23%        13.2%
No-conflict
  Initial           90.9%      90.9%
  Final             90%        92.5%
Note. S-BR = base rates last; BR-S = base rates first.

Table 4 summarizes the direction of change results for the conflict items. Here too we observe the same trend as in Study 1. Among the few people who changed their response, the direction of change reversed as a function of presentation order: in the S-BR condition, most people who did change went from an incorrect to a correct response, whereas in the BR-S condition, more people changed to an incorrect response. A chi-square test of independence revealed that the distributions of the direction of change categories in the two order conditions differed significantly from each other, χ2(3) = 18.22, p = 0.0004.

Table 4. Frequency of each direction of change category (number of trials) for conflict items in both order conditions.

                            Order
Direction of change     S-BR           BR-S
11                      14.2% (32)     8.2% (21)
00                      74.8% (169)    76.7% (197)
10                      2.2% (5)       10.1% (26)
01                      8.8% (20)      5.5% (13)
Note. S-BR = base rates last; BR-S = base rates first.


General Discussion

In this paper, we tested whether manipulating the presentation order of the base rates and stereotypes has the same effect after purely intuitive processing (i.e., on the initial response) as had been observed previously after deliberative thinking (i.e., on the final response). In two studies, we replicated the findings of Pennycook et al. (2015) at the final response stage: final accuracies on conflict problems were higher when the base rates were presented last. However, contrary to our expectations, in both studies this effect consistently reversed at the initial response stage.

Why is this the case? We believe that these results draw attention to a simple but somewhat neglected issue in reasoning models, namely that intuitive responses are not generated instantly at full strength. The hybrid dual process model that we presented in the introduction (e.g., Bago & De Neys, 2017a) argues that reasoning performance in the initial response stage is determined by the strength of the different intuitions. The implicit assumption here is that the strength of these intuitions is “instant” and “constant”. That is, the idea is that an intuition is readily generated at full force and maintains this strength level. Upon further reflection, however, this assumption might be quite naive. It is reasonable to assume that even a quickly generated intuition needs some time to reach its peak. Keeping this feature in mind might suffice to explain the current findings. Have a look at Figure 1. In this illustration, the strength of two intuitions (I1, I2) changes over time: each has a peak level, a growth rate, and a decay rate. The y-axis represents strength and the x-axis represents time, while T1 and T2 represent the times of the initial and final response, respectively.

I1 and I2 start gaining strength when the relevant cue is presented (in the S-BR condition, I1 is the heuristic intuition cued by the presentation of the stereotype, and I2 is the logical intuition cued by the base rate information). So, in the S-BR condition, the stereotype is presented first. When the stereotype is presented, the intuition (I1) cued by it starts gaining strength. Subsequently, the presentation of the base rate information cues the logical intuition (I2), and its strength also starts rising. Both intuitions grow until they reach their peak. At T1, I1 has already reached its peak and is stronger than I2 (which has not reached its peak yet); as a result, I1 will be given as the initial response. But after T1, the strength of I1 starts decaying, while the strength of I2 is still increasing; I2 reaches its peak at T2. At T2, I2 will be the stronger intuition, so people will more likely pick I2 as their final response. Hence, the mere growth and decay of an intuition (its “rise and fall”, as we labelled it in the title) implies that, ceteris paribus, the most recently cued intuition will be weaker earlier on in the reasoning process (e.g., at the initial response stage) and will dominate later in the reasoning process (e.g., at the final response).

Clearly, we have presented and illustrated the most generic and general case, in which the two intuitions have the same peak level, growth rate, and decay rate. Obviously, these features might vary: one intuition might have a higher peak than the other, or a faster or slower growth or decay. In addition, we believe that deliberation might also modulate the strength level. For example, one can imagine that one functional consequence of deliberation might be to boost or sustain the peak activation level of one intuition and decrease the activation of the other. These more specific features have to be tested and validated in future studies. For example, one could probe the role of deliberation by examining the impact of cognitive load on the presentation order findings in the second response stage. However, in all these more specific cases the general principle holds: we have to keep in mind that intuitions are not necessarily generated instantly but “rise and fall”; we need to consider their growth and decay. We believe this should motivate further research in the area aimed at determining what the growth and decay functions look like exactly.

Figure 1. Illustration of how the strength of intuitions might change over time. The y-axis represents the activation strength while the x-axis represents time. I1 and I2 represent the two cued intuitions. Note that in the BR-S condition I1 is the logical intuition cued by the base rate probabilities, while I2 is the heuristic intuition cued by the stereotypes. Consequently, in the S-BR condition, I1 is the heuristic and I2 is the logical intuition. T1 and T2 represent the time of initial and final response, respectively.
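The following toy simulation illustrates the dynamics sketched in Figure 1. The gamma-like activation function and all parameter values are illustrative assumptions only; the chapter does not commit to any particular functional form.

import numpy as np

def strength(t, onset, peak=1.0, tau=1.5):
    """Gamma-like activation: zero before the cue, then a rise and a decay.

    The curve reaches `peak` exactly `tau` time units after `onset`.
    """
    x = np.maximum(t - onset, 0.0) / tau
    return peak * x * np.exp(1.0 - x)

T1, T2 = 2.5, 8.0  # illustrative times of the initial and final response
for label, t in (("T1", T1), ("T2", T2)):
    s1 = strength(t, onset=0.0)  # I1: cued by the first piece of information
    s2 = strength(t, onset=2.0)  # I2: cued by the second piece of information
    print(f"{label}: I1 = {s1:.2f}, I2 = {s2:.2f} -> {'I1' if s1 > s2 else 'I2'} wins")

With these placeholder values, the earlier-cued intuition dominates at T1 and the later-cued intuition dominates at T2, reproducing the reversal pattern observed in the two studies.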


REFERENCES

Bago, B., & De Neys, W. (2017a). Fast logic?: Examining the time course assumption of dual process theory. Cognition, 158, 90–109.
Bago, B., & De Neys, W. (2017b). Examining the hybrid dual process model. Manuscript in preparation.
De Neys, W. (2006). Automatic–heuristic and executive–analytic processing during reasoning: Chronometric and dual-task considerations. The Quarterly Journal of Experimental Psychology, 59(6), 1070–1100.
De Neys, W. (2012). Bias and conflict: A case for logical intuitions. Perspectives on Psychological Science, 7(1), 28–38.
De Neys, W. (2014). Conflict detection, dual processes, and logical intuitions: Some clarifications. Thinking & Reasoning, 20(2), 169–187.
De Neys, W., & Schaeken, W. (2007). When people are more logical under cognitive load. Experimental Psychology (formerly Zeitschrift für Experimentelle Psychologie), 54(2), 128–133.
Evans, J. S. B. T., & Stanovich, K. E. (2013). Dual-process theories of higher cognition: Advancing the debate. Perspectives on Psychological Science, 8(3), 223–241.
Franssens, S., & De Neys, W. (2009). The effortless nature of conflict detection during thinking. Thinking & Reasoning, 15(2), 105–128.
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). Cognitive style and religiosity: The role of conflict detection. Memory & Cognition, 42(1), 1–10.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.
Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate. Behavioral and Brain Sciences, 23(5), 645–665.
Thompson, V. A., Prowse Turner, J. A., & Pennycook, G. (2011). Intuition, reason, and metacognition. Cognitive Psychology, 63(3), 107–140.


Conclusion

From default-interventionism to hybrid model: Summary

I started this thesis with the goal of testing the validity of one of dual process theory’s central hypotheses, the corrective assumption. This assumption states that in conflict situations, System 1 generates a response based on stereotypical associations or common beliefs; when this response conflicts with normative principles, it needs to be corrected through System 2 engagement. I tested this assumption with the two-response paradigm, in which people are presented with the same reasoning problem twice. First, they are asked to give an initial response under a response deadline and concurrent load (to ensure the response is intuitive). Afterwards, they are presented with the same problem again and asked to provide a final response without any constraints.

We can confidently conclude that I found little evidence for the corrective assumption. People who managed to give the correct response in the end had typically already given the correct response from the very beginning. This “non-correction” ratio was stable across different manipulations and different reasoning tasks (syllogistic reasoning, base rate neglect, the bat-and-ball problem, moral reasoning). In additional experiments I also asked people to try to justify their response. I found that they could not justify their logically correct intuitive response; they needed further time to deliberate before they were able to do so.

This evidence forced me to revise dual process theory so that it no longer relies on the corrective assumption. A new dual process model was needed that can accommodate the finding that logical responding can occur from the very beginning of the reasoning process. This new model is the hybrid dual process model. It is default-interventionist in nature: it distinguishes analytic and intuitive processes and proposes a serial interaction between the two systems. However, in this model, System 1 generates two (or more) intuitive responses: one based on heuristic cues, the other based on elementary logico-mathematical principles. These responses are not generated with equal strength; whichever response gains more strength will be selected and given as the initial response. The model defines the feeling of conflict as the relative difference between the strengths of the two activated intuitions.

Some clarification is needed here. This model poses a more positive view of intuitive processing than the default-interventionist theory: it does not assume that System 1 is totally blind to logical principles. However, there are two things I must stress. First, the majority of participants are still biased by the System 1 heuristic intuition. I do not argue that participants always generate a correct intuitive response, but that the majority of correct responses are intuitive in nature; this, in itself, is enough to make us revise the default-interventionist view. Second, System 1 logic has its own limits. I do not claim that a logical intuition will be generated in every situation. Complexity may play a role here; one can imagine reasoning problems whose solution requires deliberation (just think of any hard math problem).

This model (specified at length in the preceding chapters) was created to fit a dataset; thus, further empirical validation was needed. I used an experimental manipulation in which we varied the strength of the logical intuition (moderate vs. extreme base rates) and expected that this would differentially affect the feeling of conflict for biased and non-biased reasoners. I found evidence supporting this claim: with a stronger logical intuition, correct reasoners felt less conflicted than with a weaker one, and the pattern was exactly reversed for biased reasoners. In a further experiment, using electroencephalography, I found evidence for the early detection (between 200 and 500 ms after stimulus onset) of heuristic-logic conflict. This provides good evidence for the hybrid model over the default-interventionist view. In a final experiment, I also challenged the traditional view of intuitive responses: it is usually assumed that intuitive responses are “instant” and “constant” because of their automatic nature. I showed that this is not the case; competing intuitive responses change their strength level over time, and this change in strength can lead people to change their initial intuitive response without deliberation.

In this dissertation, I also started to address the question of whether the hybrid model can be generalized to other domains of thinking. Dual process theory has been very influential: different domains of reasoning research adopted it and used it to explain various phenomena. One of these domains is moral reasoning (Chapter 5). Using the two-response paradigm, we found evidence that utilitarian responses (i.e., deciding based on the greater good) are usually given intuitively, contrary to default-interventionist predictions.

So where exactly do we stand? I believe that over the last years we have managed to move beyond the default-interventionist model; we now have strong evidence for the hybrid model view. However, I must stress that the hybrid model is a work in progress. It has its shortcomings and underspecifications, and there are many challenges ahead. In the following sections I will attempt to discuss some of these.


What are intuitions?

Throughout this thesis, I used the concept “intuition” to refer to System 1 responses, but I never provided a clear definition of what intuition really is. The reason is that there is no clear definition on which everyone agrees. Classically, intuition is conceived of as something fast and inflexible but, most importantly, as a response given without the ability to produce a justification (Glöckner & Witteman, 2010). In this thesis, I use “intuition” to refer to the output of System 1: an automatically generated response. Lately, various authors have argued that intuition (and System 1) is not a homogeneous process but rather a generic name for several types of automatic processes; that is the reason it is sometimes referred to as The Autonomous Set of Systems (TASS; Stanovich, 2009). These automatic processes differ in their nature. Glöckner and Witteman (2010) provide a possible classification of intuitive responses; they argue that the main difference between intuitive responses lies in the way they were acquired and the way the information is retrieved or integrated into the decision. They name four different processes thought to underlie intuitive responding: associative intuition, matching intuition, accumulative intuition, and constructive intuition. The first three have special importance for this argument. Associative intuitions are argued to be acquired through associative learning, reinforcement learning, or social learning; when cued, they are assumed to be perceived or retrieved as feelings or general affective arousal. Matching intuition involves the acquisition of exemplars or prototypes, while the retrieval process is the automatic comparison of the target stimulus with stored exemplars. Accumulative intuition refers to an evidence accumulation process; its source is memory traces or just-perceived information, which is integrated as “random sampling proportional to the importance of the information” (Glöckner & Witteman, 2010, p. 8).

This is just one possible classification, and it has its shortcomings. The authors make it very clear that there are possible overlaps between these categories, and these overlaps make it very hard to infer which process generated which intuition in a specific context. For example, stereotype-based responding in the base rate problem could have been caused by any of the three. The message I want to deliver, however, is that the hybrid model does not yet specify what types of intuitions are at play or what their exact nature is. Even if we know what information cues the “logical intuition” in one context, we might not know what the exact mechanics of the process are. It is also not clear whether an intuition’s nature has anything to do with its strength. Does strength somehow covary with the nature of the intuition (do different types of intuitive responses increase or decrease strength levels)? How does the nature of the intuition accommodate the fact that strength changes over time (as suggested in Chapter 4)?

The rise and fall of conflicting intuitions?

Chapter 6 provides evidence that intuitive responses are not constant, as generally assumed; they change their strength over time. I ended the chapter with a challenge: we should find out exactly how the strength of intuitive responses changes over time. To do so, we need a mathematically well-specified model that tries to capture and describe these dynamics. One possibility is to use sequential sampling models (Bogacz, Brown, Moehlis, Holmes, & Cohen, 2006). There are several kinds of sequential sampling models, but I will focus on the one I believe is the most general: the mutual inhibition model. This model assumes that the possible responses are represented by different decision units. These decision units compete through mutual inhibition; they are not independent, so changes in the strength level of one unit directly affect the strength of the competing response option (i.e., the other decision unit).

The model describes the activation level of the decision units (D1 and D2):

\[ dD_1 = \left( -kD_1 - wD_2 + I_{1a} + I_{1b} \right) dt + c\,dW \]
\[ dD_2 = \left( -kD_2 - wD_1 + I_{2a} + I_{2b} \right) dt + c\,dW \]

Let me explain the parameters in more detail. The letter “k” represents leakage: the intuitive response naturally loses its strength at a given rate. The parameter “w” represents inhibition; this is the strength of the inhibitory connection between the two decision units: the higher it is, the more strongly the decision units (negatively) affect each other’s activation level. “I1” and “I2” are the strengths of the information, in other words, how strongly the stimuli cue the respective responses. If there were no noise, leakage, or inhibition, “I” would essentially equal the strength of the intuition. The subscripts “a” and “b” index the two pieces of information and account for the fact that they are not necessarily presented at the same time (as in Chapter 6, for example). The remaining term, c dW, represents noise. This is probably the most general and neutral mathematical description of intuitive processing. It is important to note that this model does not assume that there is leakage or inhibition per se; it merely allows for the possibility. It is possible that when this model is fitted to data, some parameters turn out to be 0 and are therefore non-existent.
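A minimal Euler-Maruyama simulation of these two equations might look as follows; all parameter values are arbitrary placeholders for illustration, not estimates:

import numpy as np

rng = np.random.default_rng(0)

k, w, c = 0.3, 0.2, 0.1        # leakage, inhibition, noise strength (placeholders)
dt, T = 0.01, 10.0             # integration step and total time (arbitrary units)
onsets = np.array([0.0, 2.0])  # presentation times of information "a" and "b"
I = np.array([[1.0, 0.4],      # I_1a, I_1b: inputs to decision unit 1
              [0.3, 1.1]])     # I_2a, I_2b: inputs to decision unit 2

n_steps = int(T / dt)
D = np.zeros(2)                # activation of the two decision units
for step in range(n_steps):
    t = step * dt
    on = (t >= onsets).astype(float)       # each input is active only after its onset
    drift = -k * D - w * D[::-1] + I @ on  # -k*D_i - w*D_j + I_ia + I_ib
    D = D + drift * dt + c * np.sqrt(dt) * rng.standard_normal(2)

print("final activations:", D)  # the stronger unit would be selected as the response

Fitting such a model to two-response data would amount to estimating k, w, c, and the inputs, and then checking whether the leakage and inhibition parameters are reliably different from zero.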

What needs to be investigated is whether “k” and “w” differ from 0 or not. This information has direct consequences for the theory: are the decision units connected through mutual inhibition or not? Is there any leakage or not? Their exact values, if they turn out to be higher than 0, are less important. There is at least one important conclusion that follows from this model. Previous literature using the two-response paradigm assumed that people who change their initial response engaged in analytic reasoning. If this model is right, response change, or a “change of mind”, is not bound to System 2 engagement: the strengths of the intuitions change because they are leaky and because there might be an inhibitory connection between them. This leads to a further observation: if this is true, then, in theory, an increased level of conflict is associated with response change (as previously shown in Chapters 1, 2, and 5) but does not cause it. This is a question that can be answered by careful testing of this model. Note that this is not the first formal model trying to capture changes of mind. Previous models captured change-of-mind dynamics in a simpler task (the dot motion task) and also used a simpler model (the drift diffusion model; Resulaj, Kiani, Wolpert, & Shadlen, 2009; van den Berg et al., 2016). That model can also be used as an alternative against which the outlined mutual inhibition model can be tested.

The problem of “specificity”

The hybrid model specifies conflict as the relative difference between the strengths of two competing intuitions. But what determines the feeling of conflict when there are more than two competing intuitions? On the one hand, all of the reasoning problems used in the past to support the hybrid model cued only two kinds of intuitive responses. On the other hand, Bhatia (2017) manipulated conflict by adding more heuristic response options to the task. He expected that participants would experience more conflict and, therefore, be more likely to engage in deliberative reasoning when more heuristic response options were added. He indeed found increased accuracy and more experienced conflict (measured by confidence and response time) when more heuristic response options were presented to the participant. However, the exact cognitive mechanism causing the increased conflict is not yet clear.

Consider the way the hybrid model defines conflict. Simple subtraction fails when there are more than two competing intuitions. Imagine the following case: I1 = 4, I2 = 3, I3 = 1 (intuitions 1, 2, and 3 have imaginary strength levels of 4, 3, and 1, respectively). If we go with simple subtraction (as an estimate of conflict) we get: 4 − 3 − 1 = 0. This value is lower than the value we would get if only two (any two) of the responses were cued. This does not sound right: according to the results of Bhatia (2017), conflict should be higher as more intuitive responses get activated. Therefore, simple subtraction as an estimate of conflict only works in the special two-intuition scenario. One also has to bear in mind that we do not know whether conflict always increases with the number of cued intuitive responses. One can imagine that, beyond a certain point, an additional response option no longer increases conflict. Consider, for example, a scenario in which six possible response options are cued; people might not experience more conflict than in the three-option case. The point is that we do not know how the feeling of conflict is determined as a function of the number of response options/cued intuitions. These are interesting challenges for the future.
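A two-line computation makes the anomaly explicit (the strength values are the imaginary ones used above):

def conflict_by_subtraction(strengths):
    """Naive estimate: the strongest intuition minus all of its competitors."""
    ordered = sorted(strengths, reverse=True)
    return ordered[0] - sum(ordered[1:])

print(conflict_by_subtraction([4, 3]))     # two intuitions:   4 - 3     = 1
print(conflict_by_subtraction([4, 3, 1]))  # three intuitions: 4 - 3 - 1 = 0 (lower!)

Under this generalization of the subtraction rule, adding a third cued intuition lowers the estimated conflict, which is the opposite of what Bhatia's data suggest.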

What is the purpose of System 2 and how does it work?

I believe that another current weakness of the hybrid model is the role and specification of System 2 in reasoning. In Chapters 2 and 5 we argued that System 2 serves to justify intuitive responses. We showed this by asking people to justify their response: right after the initial response, people rarely came up with a mathematically correct justification and needed further rethinking time to produce clear justifications. Thus, I argued that System 2 is needed to create justifications for the intuitive response. However, this is only one aspect of System 2 reasoning that we tested, and the full picture is probably more complex. We also observed that some participants change their initial response after deliberation; therefore, System 2 might do something else besides creating justifications. In a more detailed model of System 2 processing, Pennycook, Fugelsang, and Koehler (2015) argued that there are at least two types of System 2 processes: rationalization and cognitive decoupling. Cognitive decoupling refers to the creation of a secondary representation of the problem on which people can execute manipulations and perform hypothetical thinking (Stanovich, 2009). For Pennycook et al. (2015), cognitive decoupling is an attempt to falsify the System 1 output, while rationalization is a process that tries to verify it. What the difference is between them in terms of processing is less clear.

Let me give an example to show how problematic this distinction is. Szollosi, Bago, Szaszi, and Aczel (2017) gave participants the standard bat-and-ball problem. After participants solved the problem, they were asked whether they had tried to verify their answer. The authors define verification as follows: “Conceptually, we defined verification as people's self-reported attempt to make certain that the given answer is correct. […] On the bat-and-ball problem one can verify their initial answer by substituting the data from the task into a formula (e.g., into the correct formula: x + [x + $1.00] = $1.10).” (Szollosi et al., 2017, p. 2). The authors found that the majority of participants did try to verify their response. So what exactly is verification? Is it cognitive decoupling or rationalization? Pennycook et al. (2015) could argue that it depends on whether the output of verification confirms or falsifies the intuitive response. But in terms of processing there might be no difference at all: after attempting to verify your response, you could realize that you made a mistake (and thus change your response), or you could become even more confident that your initial response was correct. Indeed, people who tried to verify the correctness of their response but failed to do so ended up with a decreased level of confidence. It is furthermore not clear whether people can engage in only one of the proposed processes (rationalization/decoupling); once we have made an attempt to rationalize our initial response, we might still be able to engage in decoupling, or any other type of System 2 processing, afterwards. A more comprehensive theory of System 2 processing also has to be clearer about when a given process is cued: under what conditions do people engage in rationalization or decoupling, for example? This remains a critical issue for the future.

The problem of domain generality

Default-interventionist dual process theory has been very influential and has affected theorizing in several domains of thinking research. In this dissertation, I presented evidence against the corrective assumption and argued that, in the logical reasoning field, a hybrid model should be adopted instead. This evidence could give rise to doubts in the other domains that adopted the default-interventionist view. One of these domains is moral reasoning: in Chapter 5, we tested the corrective assumption in moral reasoning and found support for the hybrid model over the default-interventionist view.

In the domain of cooperative thinking, the corrective assumption has also emerged. Cooperation researchers use situations in which altruistic and selfish (or cooperative and defecting) considerations conflict with each other. For example, imagine you found 10 euros on the street and you see a homeless person sitting nearby. How much would you give him? Selfish considerations (as well as standard economic theory) would suggest that you keep all the money for yourself, while altruistic considerations would make you give the homeless person some money. In practice, cooperation and altruism are usually studied and operationalized in the context of economic games (e.g., the prisoner’s dilemma, the public goods game, the dictator game, the ultimatum game; see Rand, Greene, & Nowak, 2012). Again, the critical corrective assumption is that in conflict situations, the System 1 and System 2 responses each have their own, different characteristics. For example, in the above situation (an analogue of the dictator game), altruism is thought to be the intuitive, System 1 response, while being selfish is a more deliberative, System 2 response, according to the influential work of Rand and colleagues (Rand, 2016; Rand et al., 2014, 2012).

Surprisingly, there is little evidence in the literature that allows for a direct validation of the corrective assumption. In several experiments, researchers used time pressure or delayed responding to test the idea. For example, when presented with such a dilemma, people either had to respond within a deadline (time pressure) or were told not to respond before some time had passed (delay) (Rand et al., 2012). The rationale here is that System 1 operates fast, while System 2 needs time to generate a response. Thus, given a limited amount of time, people rely on their System 1, intuitive response, while they are more likely to deliberate when allotted more time in the delayed condition. In line with the expectation of the corrective assumption, people were less cooperative in the delayed than in the time-pressured condition (but the effects are usually quite small and weak; see Tinghög et al., 2016). However, this does not necessarily imply that cooperative responders generated the cooperative response before the selfish one. People might simply have needed more time to reach a selfish decision without ever having considered the cooperative option (and the evidence is also debated; see Bouwmeester et al., 2017; Tinghög et al., 2013).

From a theoretical point of view, it is important to test whether the same cognitive architecture underlies reasoning in different thinking domains. Moreover, if we have a unified picture of how human thinking is structured, this will also allow us to design policies that aid rational decision making and promote cooperation or utilitarianism. Without this knowledge, policies can easily backfire. For example, should we promote analytic thinking over intuition? If the corrective assumption is right, analytic thinking, in general, makes people more utilitarian or logical, but less cooperative. Thus, it might be (or have been) our goal to make people more rational by inducing deliberation, but at the same time such a policy of promoting analytic thinking would make people less cooperative. Therefore, the question of domain generality has to be addressed before we even start thinking about changing any policies or designing interventions.


The problem of generalizability in different populations

All of my experiments were run online or in the laboratory. The majority of our respondents were well-educated individuals from Western cultures, a so-called WEIRD population (Western, Educated, Industrialized, Rich, Democratic). This is indeed a problem: it might have biased our results and thus prevents us from generalizing to all populations. I am going to reflect on two of the five characteristics of this population, which I believe should be addressed in the upcoming years.

Education. In Chapter 2, I argued that people with correct intuitions have probably automatized the algebra required to solve the bat-and-ball problem. Even though this is a very basic algebraic equation, it does require some elementary education, during which participants had ample opportunity to practice solving similar problems. If they lack this education, however, they might find it hard to solve the problem at all, let alone develop correct intuitions. We do not know at this point, and we cannot be sure that the "correct intuitive responding" effect is robust without testing it first. These results might also be specific to each reasoning problem. For example, it has been shown that infants are sensitive to frequencies and basic logic (Cesana-Arlotti et al., 2018; Téglás et al., 2011). Thus, the rate of correct intuitive responses might not depend on education for reasoning problems such as base-rate neglect and syllogistic reasoning, simply because the origin of the logical intuition here is not the automatization of a learnt rule, but something else.

Cultural specificity. Culture is a major factor in what type of intuitive responses people have. In moral reasoning, even within Western cultures (and here I refer to the USA, to be more specific), people with different political attitudes produce different moral intuitions (Greene, 2013). It has also been shown, for example, that there are large cultural differences in utilitarian response rates between Western societies and China: Chinese people tend to be less utilitarian than Western people (Ahlenius & Tännsjö, 2012; Gold, Colman, & Pulford, 2014). However, we do not know which part of the reasoning process is affected by culture. Do people in China produce weaker utilitarian intuitions on average? Do they differ in their analytic thinking style (which might make them more deontological after deliberation)? To understand the possible limits of the hybrid model, we have to understand how culture affects the way we think.


Moral philosophical implications of the hybrid model

Greene (2013) argues that understanding the cognitive processes behind moral reasoning is important in order to select which moral philosophy can help mankind succeed in the long term. He argues that deontological responses are supported by intuitive, System 1 processing, while utilitarian responses are supported by deliberative, System 2 processes. Thus, when people make up arguments to support their deontological intuition, these have to be rationalizations. Utilitarian responding is superior because it is preceded by conscious, analytic reasoning. He argues that intuitive responses cannot be trusted on issues such as morality, and that we cannot build laws, justice systems, or even societies on plain, automatic, intuitive responses; these automatically generated responses usually reflect an emotion and are sometimes biased. We should follow reason instead. However, we now know that the majority of utilitarian responses are generated by intuitive processing too (at least in sacrificial moral dilemmas). In fact, when people faced a stronger emotional stimulus, they were more likely to choose the deontological response after additional deliberation than after the quick intuitive response alone. This also means that utilitarian responses are subject to rationalization. Therefore, we cannot really conclude that utilitarianism is superior to deontological responding (not for the reasons Greene gives, at least): Greene's reasoning does not stand if utilitarianism is intuitive. However, the picture is more complicated than that. I believe that intuitive, System 1 utilitarian responding has its own limits. System 1 cannot really perform difficult computations; it simply cues an automatically available response. Therefore, in situations where the greater good is harder to compute or even to figure out, System 2 engagement is needed to find the answer. Consider, for example, any situation in which probabilities are assigned to each option (a high level of uncertainty), or in which the long-term consequences of actions are not clear (or conflict with immediate losses/rewards). This requires time and careful deliberation. The System 1 utilitarian response is rather a state of mind, a will to find the utilitarian response. In other words, in these situations some type of System 2 hypothetical thinking is required to find the true utilitarian response. Deontological responses probably never need this kind of analytic engagement. The point is that the utilitarian response might still be superior, but to find this out, one needs to test the hybrid model with problems where the utilitarian option is harder to find – as in the scenarios just mentioned.


In conclusion - Toward a unified theory of thinking

Understanding how intuitive and deliberative thought interact has been one of the central aims of research on human thinking for decades. The default-interventionist theory was first proposed to capture this interaction, but after falsifying its central assumption – the corrective assumption – I believe it is time to move forward toward a hybrid dual-process future. Indeed, there is a long way to go. The list of problems and questions I outlined here regarding the hybrid model is not even close to complete, and it is already a long list. However, answering these questions will be necessary before we can arrive at a truly unified, general model of human thinking.


REFERENCES

Ahlenius, H., & Tännsjö, T. (2012). Chinese and Westerners respond differently to the trolley dilemmas. Journal of Cognition and Culture, 12(3–4), 195–201.

Bhatia, S. (2017). Conflict and bias in heuristic judgment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 43(2), 319–325.

Bogacz, R., Brown, E., Moehlis, J., Holmes, P., & Cohen, J. D. (2006). The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychological Review, 113(4), 700–765.

Bouwmeester, S., Verkoeijen, P. P., Aczel, B., Barbosa, F., Bègue, L., Brañas-Garza, P., … Espín, A. M. (2017). Registered Replication Report: Rand, Greene, and Nowak (2012). Perspectives on Psychological Science, 12(3), 527–542.

Cesana-Arlotti, N., Martín, A., Téglás, E., Vorobyova, L., Cetnarski, R., & Bonatti, L. L. (2018). Precursors of logical reasoning in preverbal human infants. Science, 359(6381), 1263–1266.

Glöckner, A., & Witteman, C. (2010). Beyond dual-process models: A categorisation of processes underlying intuitive judgement and decision making. Thinking & Reasoning, 16(1), 1–25.

Gold, N., Colman, A. M., & Pulford, B. D. (2014). Cultural differences in responses to real-life and hypothetical trolley problems. Judgment and Decision Making, 9(1), 65–76.

Greene, J. D. (2013). Moral tribes: Emotion, reason and the gap between us and them. New York, NY: Penguin Press.

Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). What makes us think? A three-stage dual-process model of analytic engagement. Cognitive Psychology, 80, 34–72.

Rand, D. G. (2016). Cooperation, fast and slow: Meta-analytic evidence for a theory of social heuristics and self-interested deliberation. Psychological Science, 27(9), 1192–1206.

Rand, D. G., Greene, J. D., & Nowak, M. A. (2012). Spontaneous giving and calculated greed. Nature, 489(7416), 427–430.

Rand, D. G., Peysakhovich, A., Kraft-Todd, G. T., Newman, G. E., Wurzbacher, O., Nowak, M. A., & Greene, J. D. (2014). Social heuristics shape intuitive cooperation. Nature Communications, 5.

Resulaj, A., Kiani, R., Wolpert, D. M., & Shadlen, M. N. (2009). Changes of mind in decision-making. Nature, 461(7261), 263–266.

Stanovich, K. E. (2009). Distinguishing the reflective, algorithmic, and autonomous minds: Is it time for a tri-process theory? In J. S. B. T. Evans & K. Frankish (Eds.), In two minds: Dual processes and beyond (pp. 55–88). New York, NY: Oxford University Press.

Szollosi, A., Bago, B., Szaszi, B., & Aczel, B. (2017). Exploring the determinants of confidence in the bat-and-ball problem. Acta Psychologica, 180, 1–7.

Téglás, E., Vul, E., Girotto, V., Gonzalez, M., Tenenbaum, J. B., & Bonatti, L. L. (2011). Pure reasoning in 12-month-old infants as probabilistic inference. Science, 332(6033), 1054–1059.


Tinghög, G., Andersson, D., Bonn, C., Böttiger, H., Josephson, C., Lundgren, G., … Johannesson, M. (2013). Intuition and cooperation reconsidered. Nature, 498(7452), E1–E2.

Tinghög, G., Andersson, D., Bonn, C., Johannesson, M., Kirchler, M., Koppel, L., & Västfjäll, D. (2016). Intuition and moral decision-making – the effect of time pressure and cognitive load on moral judgment and altruistic behavior. PloS One, 11(10), e0164012.

Van den Berg, R., Anandalingam, K., Zylberberg, A., Kiani, R., Shadlen, M. N., & Wolpert, D. M. (2016). A common mechanism underlies changes of mind about decisions and confidence. Elife, 5, e12192.


Appendix A

SUPPLEMENTARY MATERIAL FOR CHAPTER 1

Section A. Base rate problems used in the study

No-conflict: This study contains scientists and assistants. Person 'C' is intelligent. There are 4 scientists and 996 assistants.
Conflict: This study contains lawyers and gardeners. Person 'W' is argumentative. There are 3 lawyers and 997 gardeners.

No-conflict: This study contains clowns and accountants. Person 'L' is funny. There are 995 clowns and 5 accountants.
Conflict: This study contains high school students and librarians. Person 'M' is loud. There are 995 high school students and 5 librarians.

Conflict: This study contains I.T. technicians and boxers. Person 'F' is strong. There are 997 I.T. technicians and 3 boxers.
No-conflict: This study contains lab technicians and aerobics instructors. Person 'D' is active. There are 5 lab technicians and 995 aerobics instructors.

No-conflict: This study contains nurses and artists. Person 'S' is creative. There are 3 nurses and 997 artists.
Conflict: This study contains businessmen and firemen. Person 'K' is brave. There are 996 businessmen and 4 firemen.

Section B. Syllogistic reasoning problems used in the study

Each problem is shown in its Type „A” and Type „B” questionnaire version.

Type „A”: All flowers need light / Roses are flowers / Roses need light (No-conflict: Valid/Believable)
Type „B”: All flowers need light / Roses need light / Roses are flowers (Conflict: Invalid/Believable)

Type „A”: All things made of wood can be used as fuel / Trees can be used as fuel / Trees are made of wood (Conflict: Invalid/Believable)
Type „B”: All things made of wood can be used as fuel / Trees are made of wood / Trees can be used as fuel (No-conflict: Valid/Believable)


Type „A”: All mammals can walk / Spiders can walk / Spiders are mammals (No-conflict: Invalid/Unbelievable)
Type „B”: All mammals can walk / Whales are mammals / Whales can walk (Conflict: Valid/Unbelievable)

Type „A”: All vehicles have wheels / Boats are vehicles / Boats have wheels (Conflict: Valid/Unbelievable)
Type „B”: All vehicles have wheels / Trolley suitcases have wheels / Trolley suitcases are vehicles (No-conflict: Invalid/Unbelievable)

Type „A”: All birds have wings / Crows are birds / Crows have wings (No-conflict: Valid/Believable)
Type „B”: All birds have wings / Crows have wings / Crows are birds (Conflict: Invalid/Believable)

Type „A”: All cannons fire bullets / Water cannons are cannons / Water cannons fire bullets (Conflict: Valid/Unbelievable)
Type „B”: All cannons fire bullets / Guns fire bullets / Guns are cannons (No-conflict: Invalid/Unbelievable)

Type „A”: All flowering plants have leafs / Fern has leafs / Fern is a flowering plant (No-conflict: Invalid/Unbelievable)
Type „B”: All flowering plants have leafs / Cacti are flowering plants / Cacti have leafs (Conflict: Valid/Unbelievable)

Type „A”: All dogs have snouts / Labradors have snouts / Labradors are dogs (Conflict: Invalid/Believable)
Type „B”: All dogs have snouts / Labradors are dogs / Labradors have snouts (No-conflict: Valid/Believable)


[Figure S1 – four panels: A) Base rate: initial response; B) Base rate: final response; C) Syllogism: initial response; D) Syllogism: final response.]

Figure S1. Mean no-conflict problem response latencies of the initial and final responses in the base-rate and syllogistic reasoning tasks for each of the direction of change categories. Error bars are 95% confidence intervals. Note that averages and confidence intervals were calculated on log-transformed latencies; the figure shows the back-transformed (anti-logged) latencies.


[Figure S2 – four panels: A) Base rate: initial response; B) Base rate: final response; C) Syllogism: initial response; D) Syllogism: final response.]

Figure S2. Mean no-conflict problem confidence ratings for initial and final responses in the base-rate and syllogistic reasoning tasks for each of the direction of change categories. Error bars are 95% confidence intervals.


Appendix B

SUPPLEMENTARY MATERIAL FOR CHAPTER 2

Items used in Study 1-6:

1. Conflict: A pencil and an eraser cost $1.10 in total. The pencil costs $1 more than the eraser. How much does the eraser cost?
   Control: A pencil and an eraser cost $1.10 in total. The pencil costs $1. How much does the eraser cost?
2. Conflict: A magazine and a banana cost $2.60 in total. The magazine costs $2 more than the banana. How much does the banana cost?
   Control: A magazine and a banana cost $2.60 in total. The magazine costs $2. How much does the banana cost?
3. Conflict: A cheese and a bread cost $2.90 in total. The cheese costs $2 more than the bread. How much does the bread cost?
   Control: A cheese and a bread cost $2.90 in total. The cheese costs $2. How much does the bread cost?
4. Conflict: An apple and an orange cost $1.80 in total. The apple costs $1 more than the orange. How much does the orange cost?
   Control: An apple and an orange cost $1.80 in total. The apple costs $1. How much does the orange cost?
5. Conflict: A sandwich and a soda cost $2.50 in total. The sandwich costs $2 more than the soda. How much does the soda cost?
   Control: A sandwich and a soda cost $2.50 in total. The sandwich costs $2. How much does the soda cost?
6. Conflict: A hat and a ribbon cost $4.20 in total. The hat costs $4 more than the ribbon. How much does the ribbon cost?
   Control: A hat and a ribbon cost $4.20 in total. The hat costs $4. How much does the ribbon cost?
7. Conflict: A coffee and a cookie cost $2.40 in total. The coffee costs $2 more than the cookie. How much does the cookie cost?
   Control: A coffee and a cookie cost $2.40 in total. The coffee costs $2. How much does the cookie cost?
8. Conflict: A book and a bookmark cost $3.30 in total. The book costs $3 more than the bookmark. How much does the bookmark cost?
   Control: A book and a bookmark cost $3.30 in total. The book costs $3. How much does the bookmark cost?

Response options (in cents) for each of the problems in Study 1-6:

Item   2-response options   4-response options
1      5, 10                1, 5, 10, 15
2      30, 60               15, 30, 60, 90
3      45, 90               15, 45, 90, 135
4      40, 80               20, 40, 80, 120
5      25, 50               5, 25, 50, 75
6      10, 20               5, 10, 20, 30
7      20, 40               10, 20, 40, 60
8      15, 30               5, 15, 30, 45
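As a worked illustration (not part of the original materials), the correct response to the conflict version of item 1 follows from the elementary algebra that, as argued in Chapter 2, correct intuitive responders have presumably automatized. Writing e for the eraser's price and p for the pencil's price (in dollars):

\begin{aligned}
p + e &= 1.10 \\
p &= e + 1.00 \\
(e + 1.00) + e &= 1.10 \\
2e &= 0.10 \\
e &= 0.05
\end{aligned}

The eraser thus costs 5 cents (the correct option above), whereas the heuristically cued answer is 10 cents; in the control version, 10 cents ($1.10 - $1.00) is actually correct.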


Items used in Study 7:

1. Conflict: An apple and an orange weigh 160 grams altogether. The apple weighs 100 grams more than the orange. How much does the orange weigh?
   Control: An apple and an orange weigh 160 grams altogether. The apple weighs 100 grams. How much does the orange weigh?
2. Conflict: In a shop there are 250 PCs and MACs altogether. There are 200 more PCs than MACs. How many MACs are there in the shop?
   Control: In a shop there are 250 PCs and MACs altogether. There are 200 PCs. How many MACs are there in the shop?
3. Conflict: Altogether, a book and a magazine have 330 pages. The book has 300 pages more than the magazine. How many pages does the magazine have?
   Control: Altogether, a book and a magazine have 330 pages. The book has 300 pages. How many pages does the magazine have?
4. Conflict: In total, a plumber and an electrician work 240 days. The electrician works 200 days more than the plumber. How many days does the plumber work?
   Control: In total, a plumber and an electrician work 240 days. The electrician works 200 days. How many days does the plumber work?


B. Conflict detection analysis, Study 1-5

For each direction of change category one may ask whether reasoners are faced with two competing intuitions at the first response stage. We can address this question by looking at the contrast between conflict and control problems. If conflict problems cue two conflicting initial intuitive responses, people should process these problems differently than the no-conflict problems (in which such conflict is absent) in the initial response stage, and should show lower confidence and longer response latencies when solving the conflict problems (e.g., Botvinick, 2007; De Neys, 2012; Johnson et al., 2016; Pennycook et al., 2015). Therefore, we contrasted the confidence ratings and response times for the initial response on the conflict problems with those for the initial response on the no-conflict problems for each of the four direction of change categories. Note that we used only the dominant "11" control category for this contrast (which we will refer to as the "baseline"), as responses in the other control direction of change categories cannot be interpreted unequivocally. Table S1 (confidence) and Table S2 (latencies) show the results. Visual inspection of Table S1 indicates a general trend towards decreased initial confidence when solving conflict problems in all direction of change categories. However, this effect is much larger for the 01 and 10 cases, in which reasoners subsequently changed their initial response. This suggests that although reasoners might be experiencing some conflict between competing intuitions in all cases, this conflict is much more pronounced in the 10 and 01 cases. The latency data in Table S2 mirror this pattern: in all change categories it took more time to give a response on conflict items, but this latency increase is most pronounced when people ended up changing their initial response.

To analyse the data statistically, we fitted mixed-effect multilevel models (Baayen, Davidson, & Bates, 2008; Kuznetsova, Brockhoff, & Christensen, 2017). We ran a separate analysis for each of the four direction of change conflict problem categories, and we analysed both confidence and reaction times. In each analysis, the confidence or reaction time for the initial response in the direction of change category in question was contrasted with the initial response confidence or reaction time for "11" control problems, which served as our baseline. We will refer to this contrast as the conflict factor. The conflict factor was entered as a fixed factor, and participants and items were entered as random factors (random intercepts). We also entered the response format (2-response vs 4-response vs free response) as a fixed factor, mainly to test for an interaction with the conflict factor. In the cases in which we found a significant interaction, we also analysed each response format condition separately. We were not interested in main effects of the response format (e.g., simply because of the different deadlines, responses will be faster in some studies than in others) and did not analyse them further, to avoid spurious findings. Note that prior to analysis, reaction times were log-transformed to normalize the distribution, and analyses were performed on the log-transformed data.

11 category. In terms of confidence, we found that conflict improved model fit significantly, χ2(1) = 43.6, p < 0.0001, as did the main effect of response format, χ2(3) = 7.63, p = 0.022, but not their interaction, χ2(5) = 1.3, p = 0.52. Hence, people were less confident in the 11 conflict category than in the baseline, b = -12.2, t(193.2) = -10.83, p < 0.0001. Similar results were found for reaction times: conflict improved model fit significantly, χ2(1) = 21.73, p < 0.0001, as did the main effect of response format, χ2(3) = 554.9, p < 0.0001, but not their interaction, χ2(5) = 2.23, p = 0.33. Thus, it took more time to give a response in the 11 conflict category than in the baseline, b = 0.05, t(127.8) = 5.4, p < 0.0001.

00 category. With regard to confidence, the main effect of conflict improved model fit significantly, χ2(1) = 12.6, p = 0.0004, but neither response format, χ2(3) = 5.5, p = 0.06, nor their interaction did, χ2(5) = 4.6, p = 0.1. Hence, people were less confident in their response in the 00 conflict category than in the baseline, b = -3.1, t(12.9) = -4.3, p = 0.0008. For reaction times, we found that conflict improved model fit, χ2(1) = 7.04, p = 0.008, along with the main effect of response format, χ2(3) = 559.3, p < 0.0001, but not their interaction, χ2(5) = 1.84, p = 0.4. Hence, it took people more time to give a response in the 00 conflict category than in the baseline, b = 0.02, t(13.7) = 2.9, p = 0.01.

10 category. For confidence, we found that model fit was improved by conflict, χ2(1) = 50.1, p < 0.0001, and by the main effect of response format, χ2(3) = 6.6, p = 0.036. Their interaction did not improve model fit significantly, χ2(5) = 3.3, p = 0.19. Therefore, people were less confident in the 10 conflict category than in the baseline, b = -49.5, t(834.9) = -21.9, p < 0.0001. For reaction times, we found that only response format improved model fit significantly, χ2(3) = 557.3, p < 0.0001, but neither conflict, χ2(1) = 1.25, p = 0.26, nor their interaction did, χ2(5) = 1.9, p = 0.39.

01 category. Regarding confidence, we found that conflict improved model fit significantly, χ2(1) = 50.8, p < 0.0001. There was no main effect of response format, χ2(3) = 3.4, p = 0.18, but format and conflict did interact, χ2(5) = 60.25, p < 0.0001. We analysed each of the three response format conditions separately and found that in every condition people were less confident in the 01 conflict category than in the baseline, b < -21.4, t < -9.7, p < 0.0001. With respect to reaction times, we found that conflict improved model fit significantly, χ2(1) = 28.6, p < 0.0001, as did the main effect of condition, χ2(3) = 566.4, p < 0.0001, but not their interaction, χ2(5) = 2.1, p = 0.34. It took participants longer to give a response in the 01 conflict category than in the baseline, b = 0.09, t(324.5) = 6.8, p < 0.0001.

Taken together, the conflict detection analysis on the confidence and latency data indicates that, by and large, participants showed decreased response confidence and increased response times (in contrast with the no-conflict baseline) after having given their first, intuitive response on the conflict problems in all direction of change categories. This supports the hypothesis that participants were always faced with two conflicting intuitive responses when solving the conflict bat-and-ball problems. In other words, the results imply that 11 responders also activate a heuristic "10 cents" intuition in addition to the logically correct "5 cents" response they selected. Likewise, 00 responders also seem to detect that there is an alternative to the incorrect "10 cents" response. Although this points to some minimal error sensitivity among incorrect responders (De Neys et al., 2013; Johnson et al., 2016), it does not imply that incorrect responders also realize that the correct response is "5 cents" (Travers et al., 2016). The error sensitivity or increased doubt of incorrect responders might result from a less specific intuition (e.g., incorrect responders may have doubted that "10 cents" was correct without knowing that the correct response was "5 cents"). More generally, it is possible that the correct intuition differs in strength and/or specificity between correct and incorrect responders. Clearly, the present study was designed and optimized to draw conclusions about the nature of correct responders' intuitions; claims with respect to the nature of incorrect responders' intuitions remain speculative and will need further validation in future studies.

Finally, visual inspection also clearly shows that the conflict effects were much larger for the 10 and 01 cases than for the 11 and 00 ones. A contrast analysis25 that tested this trend directly indicated that it was significant for the confidence data, Z = -15.4, p < 0.0001 (r = 0.11 for the no-change group and r = 0.5 for the change group), and for the reaction times too, Z = -2.35, p (one-tailed) = 0.009 (r = 0.07 for the no-change and r = 0.13 for the change group). This pattern suggests that although reasoners might be generating two intuitive responses and be affected by the conflict between them in all cases, this conflict is much more pronounced in the cases where people subsequently change their answer. This tentatively suggests that it is this more pronounced conflict experience that makes them subsequently change their answer (Bago & De Neys, 2017; Thompson et al., 2012).
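For readers who want to reproduce this type of contrast, note that the original models were fitted as mixed-effect multilevel models in R (cf. Baayen et al., 2008; Kuznetsova et al., 2017). A minimal Python sketch of the same model structure – conflict and response format as fixed effects, participants and items as crossed random intercepts – might look as follows; the data frame, file name, and column names are illustrative assumptions, not the actual data layout used in the thesis:

import pandas as pd
import statsmodels.formula.api as smf

# Illustrative layout: one row per initial-response trial with columns
#   confidence   - initial confidence rating (0-100)
#   conflict     - 1 = conflict trial, 0 = "11" no-conflict baseline trial
#   fmt          - response format (2-response, 4-response, free response)
#   participant, item - identifiers
df = pd.read_csv("initial_responses.csv")  # hypothetical file name

# statsmodels fits crossed random intercepts via variance components,
# using a single dummy grouping variable.
df["all"] = 1
model = smf.mixedlm(
    "confidence ~ C(conflict) * C(fmt)",   # fixed effects and their interaction
    data=df,
    groups="all",
    vc_formula={"participant": "0 + C(participant)",  # random intercept per participant
                "item": "0 + C(item)"},               # random intercept per item
)
print(model.fit(reml=False).summary())  # ML fit allows likelihood-ratio (chi2) comparisons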

25 For this contrast analysis, we first calculated the r effect sizes out of t-values (Rosnow & Rosenthal, 2003). As a next step we used Fisher r-to-z transformation to assess the statistical difference between the two independent r-values. We used the following calculator for the z-transformation and p-value calculation: http://vassarstats.net/rdiff.html
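The computation described in this footnote is also straightforward to script instead of using the online calculator. A small sketch of the procedure, where the group sizes are placeholders rather than the thesis's actual Ns:

import math

def r_from_t(t, df):
    # Effect size r computed from a t-value (Rosnow & Rosenthal, 2003).
    return math.sqrt(t * t / (t * t + df))

def z_difference(r1, n1, r2, n2):
    # Fisher r-to-z transformation and Z test for two independent r values.
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (z1 - z2) / se

# Placeholder example: no-change group r = 0.11, change group r = 0.50.
Z = z_difference(0.11, 600, 0.50, 600)
p_one_tailed = 0.5 * math.erfc(abs(Z) / math.sqrt(2))  # one-tailed normal p-value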

Table S1. Average confidence differences (SD) at the initial response stage between the baseline (11 responses on no-conflict problems) and conflict problems for each direction of change category.

Study             Response format   11           00          10           01
Study1            2 response        8.4 (3.4)    2.5 (1.9)   52 (9.97)    25.7 (7.6)
Study2a           2 response        14.7 (5.6)   1.7 (1.2)   50.1 (8.1)   19.4 (8.7)
Study2b           4 response        9.6 (4.9)    4.8 (1.5)   42.4 (48.5)  41.8 (10.6)
Study3a           2 response        27.8 (7.7)   7.9 (3.5)   65.7 (7.8)   29.1 (7.6)
Study3b           4 response        12.5 (5.4)   9.8 (3.4)   -            32.6 (10.2)
Study4            Free response     10.3 (6.2)   1.2 (0.9)   -            34.3 (32.5)
Study5            Free response     8.6 (3.5)    9.2 (3.6)   95.7 (-)     55 (10.2)

Average           2 response        15.7 (2.8)   2.4 (1)     55.0 (5.1)   25.7 (4.5)
                  4 response        10.8 (3.7)   5.7 (1.4)   42.3 (48.5)  36.9 (7.3)
                  Free response     9.8 (3)      3.7 (1.6)   97.4 (-)     52.7 (9.8)

Overall average                     12.3 (1.9)   3.8 (0.7)   56.1 (5.1)   33.5 (3.7)

Table S2. Average reaction time differences in ms (SD) at the initial response stage between the baseline (11 responses on no-conflict problems) and conflict problems for each direction of change category. Note that averages are based on geometric means.

Study             Response format   11             00            10             01
Study1            2 response        -300 (196.4)   -200 (124.8)  -350 (521.5)   -770 (290.7)
Study2a           2 response        -230 (291.6)   -40 (90.3)    180 (388.7)    -60 (373.8)
Study2b           4 response        -550 (239)     -170 (93.6)   200 (1944)     -230 (472.3)
Study3a           2 response        -510 (302.4)   -60 (191.6)   -470 (473)     -390 (334.7)
Study3b           4 response        -470 (276.9)   50 (199.3)    -              -900 (386.9)
Study4            Free response     -230 (389.3)   -220 (183.8)  -              -3230 (658.1)
Study5            Free response     -300 (235.4)   -210 (187.2)  -2340 (-)      -1640 (326)

Average           2 response        -400 (143.5)   -60 (68.7)    -120 (267.4)   -490 (195.1)
                  4 response        -580 (181.1)   -120 (84.8)   270 (1943.8)   -670 (307.7)
                  Free response     -440 (202.5)   -100 (131.2)  -2750 (-)      -2170 (291.6)

Overall average                     -720 (113.8)   -30 (55.7)    270 (269.2)    -760 (175.4)


C. Data for no-conflict control problems

The tables in this section give an overview of the direction of change (Table S3), stability (Table S4), and justification data (Tables S5 and S6) for the no-conflict control problems.

Table S3. Frequency of direction of change categories for no-conflict control problems in Study 1-5. The raw number of trials in each category is presented in brackets.

Study             Response format   11             00          10          01
Study1            2 response        93.0% (281)    3.6% (11)   0.3% (1)    3.0% (9)
Study2a           2 response        93.9% (504)    2.8% (15)   0.7% (4)    2.6% (14)
Study2b           4 response        93.3% (457)    3.3% (16)   1.6% (8)    1.8% (9)
Study3a           2 response        94.9% (129)    1.5% (2)    1.5% (2)    2.2% (3)
Study3b           4 response        94.1% (128)    1.5% (2)    1.5% (2)    2.9% (4)
Study4            Free response     96.4% (132)    0.77% (1)   -           2.9% (4)
Study5            Free response     92.5% (149)    0.6% (1)    1.2% (2)    5.6% (9)

Average           2 response        93.7% (914)    2.9% (28)   0.7% (7)    2.7% (26)
                  4 response        93.5% (585)    2.9% (18)   1.6% (10)   2.1% (13)
                  Free response     94.5% (281)    0.7% (2)    0.7% (2)    4.4% (13)

Overall average                     93.8% (1780)   2.5% (48)   1.0% (19)   2.7% (52)


Table S4. Frequency of stability index values on no-conflict control problems in Study 1-5. The raw number of participants in each category is presented in brackets.

Study             Response format   <33%       50%         66%         75%          100%          Average stability
Study1            2 response        1.2% (1)   2.3% (2)    4.6% (4)    5.8% (5)     86.2% (75)    95.1%
Study2a           2 response        -          1.3% (2)    5.0% (8)    5.7% (9)     88.1% (140)   96.3%
Study2b           4 response        -          2.8% (4)    4.9% (7)    4.2% (6)     88.2% (127)   96.0%
Study3a           2 response        2.4% (1)   4.9% (2)    4.9% (2)    2.4% (1)     85.4% (35)    94.7%
Study3b           4 response        -          4.9% (2)    4.9% (2)    4.9% (2)     85.4% (35)    96.4%
Study4            Free response     -          -           5.1% (2)    7.7% (3)     87.2% (34)    93.0%
Study5            Free response     -          2.3% (1)    2.3% (1)    20.5% (9)    75.0% (33)    95.0%

Average           2 response        0.7% (2)   2.1% (6)    4.9% (14)   5.2% (15)    87.2% (251)   95.5%
                  4 response        -          3.2% (6)    4.9% (9)    4.3% (8)     87.6% (162)   95.6%
                  Free response     -          1.2% (1)    3.6% (3)    14.5% (12)   80.7% (67)    94.6%

Overall average                     0.4% (2)   2.3% (13)   4.7% (26)   6.3% (35)    86.3% (480)   95.4%


Table S5. Frequency of different justification categories for no-conflict control problems in Study 6. The raw number of justifications in each category is presented in brackets.

                   Initial response           Final response
Justification      Correct      Incorrect     Correct      Incorrect
Correct math       39.6% (19)   -             56.5% (48)   -
Incorrect math     -            -             -            20% (1)
Unspecified math   10.4% (5)    -             23.5% (20)   -
Hunch              6.3% (3)     -             1.2% (1)     -
Guess              6.3% (3)     25% (3)       1.2% (1)     40% (2)
Previous           6.3% (3)     25% (3)       3.5% (3)     -
Other              31.3% (15)   50% (6)       14.1% (12)   40% (2)


Table S6. Frequency of different justification categories for no-conflict control problems in Study 7. The raw number of justifications in each category is presented in brackets.

                                     Initial response          Final response
Response format   Justification      Correct      Incorrect    Correct       Incorrect
2 response        Correct math       51.9% (41)   -            82.8% (72)    -
                  Incorrect math     -            -            -             -
                  Unspecified math   17.7% (12)   12.5% (1)    12.6% (11)    -
                  Hunch              19% (15)     25% (2)      3.4% (3)      -
                  Guess              10.2% (8)    37.5% (3)    -             -
                  Other              1.2% (1)     25% (2)      1.1% (1)      -

Free response     Correct math       68.7% (68)   12.5% (1)    95.3% (102)   -
                  Incorrect math     -            -            -             -
                  Unspecified math   2% (2)       -            0.9% (1)      -
                  Hunch              15.2% (15)   12.5% (1)    0.9% (1)      -
                  Guess              14.1% (14)   62.5% (5)    1.9% (2)      -
                  Other              -            12.5% (1)    0.9% (1)      -


D. Full procedure of the reading pre-tests, Study 1 and Study 2

Reading pre-test Study 1. Before we ran the main study, we also recruited an independent sample of 64 participants for a reading pre-test (31 female, mean age = 38.9 years, SD = 13.1 years). Participants were recruited via the Crowdflower platform, and they received $0.10. A total of 41% of the subjects reported high school as their highest completed educational level, and 58% reported having a post-secondary education degree (1% reported less than high school). The basic goal of the reading pre-test was to determine the response deadline that could be applied in the main reasoning study. The idea was to base the response deadline on the average reading time in the reading test. Note that dual process theories are highly underspecified in many aspects (Kruglanski, 2013); they argue that System 1 is faster than System 2, but do not further specify how fast System 1 is exactly (e.g., System 1 < x seconds). Hence, the theory gives us no unequivocal criterion on which we could base our deadline. Our "average reading time" criterion provides a practical solution for defining the response deadline. The rationale here was very simple: if people are allotted just the time they need to simply read the problem, we can be reasonably sure that System 2 engagement is minimal. Thus, in the reading pre-test, participants were presented with the same items as in the reasoning study. They were instructed to read the problems and randomly click on one of the answer options. Of course, we wanted to avoid that participants would spontaneously engage in any type of reasoning in the pre-test. Therefore, the answer options were randomly selected numbers (to which we drew participants' attention in the instructions) to make it less likely that reading participants would try to solve the problems. The general instructions were as follows:

Welcome to the experiment! Please read these instructions carefully! This experiment is composed of 8 questions and 1 practice question. It will take 3 minutes to complete and it demands your full attention. You can only do this experiment once. In this task we'll present you with a set of problems we are planning to use in future studies. Your task in the current study is pretty simple: you just need to read these problems. We want to know how long people need on average to read the material. In each problem you will be presented with two answer alternatives. You don’t need to try to solve the problems or start thinking about them. Just read the problem and the answer alternatives and when you are finished reading you randomly click on one of the answers to advance to the next problem. In each problem you will be presented with two answer alternatives. These answer alternatives are simply randomly generated numbers. You don’t need to try to solve the problems or start thinking about them. The only thing we ask of you is that you stay focused and read the problems in the way you typically would. Since we want to get an accurate reading time estimate please avoid wiping your nose, taking a phone call, sipping from your coffee, etc. before you finished reading. At the end of the study we will present you with some easy verification questions to check whether you actually read the problems. This is simply to make sure that participants are complying with the instructions and actually read the problems (instead of clicking through them


without paying attention). No worries, when you simply read the problems, you will have no trouble at all answering the verification questions.

You will receive $0.10 for completing this experiment. Please confirm below that you read these instructions carefully and then press the "Next" button.

To make sure that participants would actually read the problems, we informed subjects that they would be asked to answer two – very easy – verification questions at the end of the experiment to check whether they had read the material. The verification questions could be easily answered even after a very rough reading. The following illustrates the verification questions:

We asked you to read a number of problems. Which one of the following pairs of goods was NOT presented during the task? o A laptop and a mouse o A pencil and an eraser o An apple and an orange o A banana and a magazine

The correct answers were clearly different from the goods that were presented during the task. A total of 84.4% of the participants solved both verification questions correctly, and only the data from these participants were analysed. As in the main experiment, items were presented serially. First, the first sentence of the problem was presented for 2000 ms. Next, the full problem appeared on the screen. Reading times were measured from the presentation of the full problem. The average reading time of the sample was M = 3.87 s, SD = 2.18 s. Note that raw reaction time data were first logarithmically transformed prior to analysis; means and standard deviations were calculated on the transformed data and then back-transformed into seconds. We wanted to give the participants some minimal leeway26, so we rounded the average reading time up to the nearest higher natural number; the response deadline was therefore set to 4 seconds.

Reading pre-test Study 2. Half of the participants were presented with four response options in Study 2. Since reading through more options will in itself take more time, we decided to run a new reading pre-test with the 4-option format. To this end, an additional 23 participants were recruited (16 female, mean age = 40.8 years, SD = 15.2 years) via Crowdflower. They received $0.10 for participation. A total of 48% of the participants reported high school as their highest completed educational level, and 52% reported having a post-secondary education degree. As in Study 1, the four response options in the pre-test were randomly generated numbers.

26 This also helped to account for minor language differences since participants in the main study would solve Hungarian translations of the English problems.

Except for the number of response options, the pre-test was identical to the 2-option Study 1 pre-test. Participants were also presented with the same verification questions, which were correctly solved by 74.9% of the participants. Only data from participants who responded correctly to both verification questions were analysed. Prior to the reaction time analysis, raw reaction times were log-transformed; means and standard deviations were calculated on the log-transformed data and back-transformed afterwards. The mean reading time in the pre-test sample was 4.3 s (SD = 2 s). As in Study 1, we rounded the deadline up to the nearest higher natural number. Hence, the time limit in the 4-option format was set to 5 s (vs 4 s in the 2-option format).
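The deadline computation itself is simple to express. A sketch of the procedure described above, using illustrative reading times rather than the actual pre-test data:

import math
import numpy as np

def response_deadline(reading_times_s):
    # Average the log-transformed reading times and back-transform the mean
    # (i.e., take the geometric mean), then round up to the next whole
    # second to give participants some minimal leeway.
    geometric_mean = math.exp(np.mean(np.log(reading_times_s)))
    return math.ceil(geometric_mean)

# A geometric mean of 3.87 s (Study 1 pre-test) yields a 4 s deadline;
# 4.3 s (Study 2, 4-option format) yields 5 s.
print(response_deadline([3.1, 4.5, 3.9, 4.2]))  # illustrative values -> 4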


E. Additional analyses and data

Table S7. Frequency of direction of change categories in Study 1-5 for the first conflict problem that participants solved only. Raw numbers of trials are in brackets.

Study             Response format   11          00            10          01           Non-correction (11/(11+01))
Study1            2 response        5.7% (5)    71.6% (63)    6.8% (6)    15.9% (14)   26%
Study2a           2 response        4.2% (7)    89.2% (148)   2.4% (4)    4.2% (7)     50%
Study2b           4 response        8.1% (12)   89.2% (132)   -           2.7% (4)     75%
Study3a           2 response        2.4% (1)    69.1% (29)    9.5% (4)    19.1% (8)    11%
Study3b           4 response        9.3% (4)    76.7% (33)    -           14% (6)      40%
Study4            Free response     10% (4)     85% (34)      -           5% (2)       66.7%
Study5            Free response     9.1% (4)    68.2% (30)    -           22.7% (10)   29%

Average           2 response        4.4% (13)   81.1% (240)   4.7% (14)   9.8% (29)    30.1%
                  4 response        8.4% (16)   86.4% (165)   -           5.2% (10)    61.5%
                  Free response     9.5% (8)    76.2% (64)    -           14.3% (12)   40%

Overall average                     6.5% (37)   82.1% (469)   2.5% (14)   8.9% (51)    42%

Table S8. Frequency of each direction of change category for the first conflict item only in the justification studies (Study 6-7). Raw numbers of trials are in brackets.

Study             Response format   11          00           10          01           Non-correction (11/(11+01))
Study 6           2 response        8.2% (4)    53.1% (26)   16.3% (8)   22.5% (11)   26.7%
Study 7a          2 response        18.8% (9)   58.3% (28)   10.4% (5)   12.5% (6)    60%
Study 7b          Free response     12.5% (7)   44.6% (25)   1.8% (1)    41.1% (23)   23.3%


Table S9. Frequency of conflict problem justifications for the different direction of change categories in Study 6 and 7 (raw numbers of justifications in brackets). The first four data columns refer to the initial response, the last four to the final response.

Study 6 (2 response)
                   Initial response                                  Final response
Justification      11          00           10          01           11           00           10          01
Correct math       30% (6)     -            -           7.1% (1)     55% (11)     6.7% (3)     -           69.2% (9)
Incorrect math     -           25.5% (12)   -           21.4% (3)    -            37.8% (17)   11.1% (1)   -
Unspecified math   -           19.1% (9)    -           -            5% (1)       37.8% (17)   77.8% (7)   7.7% (1)
Hunch              5% (1)      4.3% (2)     -           14.3% (2)    5% (1)       -            -           7.7% (1)
Guess              20% (4)     10.6% (5)    66.7% (6)   7.1% (1)     5% (1)       4.4% (2)     -           -
Previous           30% (6)     6.4% (3)     -           7.1% (1)     10% (2)      2.2% (1)     -           -
Other              15% (3)     34% (16)     33.3% (3)   42.9% (6)    20% (4)      11.1% (5)    11.1% (1)   15.4% (2)

Study 7 (2 response)
Justification      11          00           10          01           11           00           10          01
Correct math       12.5% (2)   2.3% (1)     -           -            93.8% (15)   2.3% (1)     -           100% (10)
Incorrect math     -           40.1% (18)   -           20% (2)      -            65.9% (29)   83.3% (5)   -
Unspecified math   12.5% (2)   11.4% (5)    -           -            6.3% (1)     18.2% (8)    -           -
Hunch              56.3% (9)   22.7% (10)   16.7% (1)   50% (5)      -            2.3% (1)     -           -
Guess              18.8% (3)   22.7% (10)   83.3% (5)   30% (3)      -            11.4% (5)    -           -
Other              -           -            -           -            -            -            16.7% (1)   -

Study 7 (Free response)
Justification      11          00           10          01           11           00           10          01
Correct math       30.8% (4)   -            -           -            84.6% (11)   2.3% (1)     -           93.3% (28)
Incorrect math     -           46.5% (20)   -           26.7% (8)    -            69.8% (30)   50% (1)     -
Unspecified math   7.7% (1)    7% (3)       -           -            7.7% (1)     16.3% (7)    50% (1)     3.3% (1)
Hunch              38.5% (5)   23.3% (10)   100% (2)    40% (12)     -            4.7% (2)     -           -
Guess              15.4% (2)   21% (9)      -           26.7% (8)    -            7% (3)       -           -
Other              7.7% (1)    2.3% (1)     -           6.7% (2)     7.7% (1)     -            -           3.3% (1)


[Figure S1 – two panels: A) Initial response confidence; B) Final response confidence.]

Figure S1. Mean initial (A) and final (B) conflict problem response confidence ratings averaged across Study 1-5. Error bars are 95% confidence intervals.


Appendix C

SUPPLEMENTARY MATERIAL FOR CHAPTER 5

A. Scenario contents used in Study 1-3

We first present the “No family member” versions (Study 1, Study 3), followed by the “Family member” versions (Study 2, Study 3). Numbers used in the “Extreme” kill-save ratio condition (Study 3) are shown in brackets next to the scenario name.

No family member versions (Study 1, Study 3)

Iceberg (3/55) You are a radar operator overseeing vessel movement near Greenland. Due to sudden ice movement, a boat carrying 11 passengers is about to crash into an iceberg. If nothing is done, all passengers will die.

The only way to save the 11 passengers is for you to order the captain to execute an emergency maneuver that will sharply alter the course of the boat.

However, the maneuver will cause the boat to overrun a life raft carrying 3 people that is floating next to the iceberg and out of sight of the captain. The 3 people on the life raft will be killed if you order the captain to execute the maneuver, but the 11 people on the boat will be saved. Would you order to execute the manoeuvre?

Gas explosion (4/60) You are the inspector of gas lines in a city. For some reason, the gas in a pipeline running under a building has caught fire. If the fire reaches the building, it will explode and kill the 12 people in it.

You realize that the only way to stop the explosion of the building is to close the pipeline by pushing on a button, and divert the fire to a side-pipeline. If you do so, the 12 people will be saved.

However, above the side-line is another building with 4 people in it. If you push the button and divert the fire into the sideline, this building will explode and kill the 4 people in it, but the 12 in the building above the main line will be saved. Would you push the button?

Fumes (5/65) You are the late-night watchman in a hospital. Due to an accident in the building next door, there are deadly fumes rising up through the hospital’s ventilation system. The fumes are

directly and quickly heading towards a room with 13 patients in it. If you do nothing the fumes will rise up into this room and cause their deaths.

The only way to avoid the deaths of these patients is to hit a certain switch, which will cause the fumes to bypass the room and enter a second room instead.

However, you realize that 5 patients are in the second room. These 5 patients will be killed if you hit the switch and let the fumes bypass the first room, but the 13 patients in the first room will be saved. Would you hit the switch?

Airplane (5/65) You are a military base commander. A missile has just been mistakenly fired at a commercial airliner. If you do nothing, the missile will reach the airliner and 13 people on the airliner will die.

You realize that the only way to save these people, is to alter the course of the commercial airliner. In this case, the missile will pass by the airliner and the 13 people inside will be saved.

However, if you alter the course of the commercial airliner, the missile will hit another airliner with 5 people inside which is flying right behind it. These 5 people who are travelling on this airliner will be killed if you alter the other’s course, but the 13 people in the commercial airliner will be saved. Would you alter the commercial airliner’s course?

Submarine (4/60) You are responsible for the mission of a submarine. You are leading this operation from a control center on the beach. An onboard explosion has damaged the ship and collapsed the only access corridor between the upper and lower levels of the ship. As a result of the explosion, water is quickly approaching the upper level of the ship. If nothing is done, the 12 people in the upper level will be killed.

You realize that the only way to save these people is to hit a switch in which case the path of the water to the upper level will be blocked and it will enter the lower level of the submarine instead.

However, you realize that 4 people are trapped in the lower level. If you hit the switch, the 4 people in the lower level (who otherwise would survive) will die, but the 12 people in the upper level will be saved. Would you hit the switch?

Mine (3/55) Due to an accident there are 11 miners stuck in one of the shafts of a copper mine. They are almost out of oxygen and will die if nothing is done. You are the leader of the rescue team.

The only way for you to save them is to activate an emergency circuit that will transfer oxygen from a nearby shaft into the shaft where the 11 miners are stuck.


However, your team notices that 3 miners are trapped in the nearby shaft. If you activate the emergency circuit to transfer the oxygen, these 3 miners will be killed, but the 11 miners will be saved. Would you activate the emergency circuit?

Metro (2/50) You are a late-night metro controller in a station. A metro with automated driver is heading towards the station. Right before the station, the track gets blocked by a huge rock. If nothing is done, the train crashes into the rock and 10 passengers will die.

You realize that the only way to save these people is to pull a lever and divert the metro to a short emergency side-track. However, there is another metro coming right after it. If you divert the first metro to the side-track, the second one will crash into the rock, as there is only room on the side-track for one metro. If you do nothing, the first metro crashes, and the second one will be automatically directed into the side-track.

However, you notice that 2 passengers are travelling in the second metro. If you pull the lever and let the first metro divert to the side-track, these 2 passengers will die, but the 10 people in the first metro will be saved. Would you pull the lever?

Volcano (2/50) After a volcano eruption deadly hot lava is heading towards a nearby village. You are directing the rescue operations. There are 10 people standing on the roof of a house. If nothing is done, these 10 people will inevitably be killed by the lava stream.

The only way to save these people is to order the construction of an emergency barrier that will divert the lava stream into an old river bed.

However, you suddenly receive the information that right along the old river bed 2 people are standing on the roof of a barn. If you order the construction of the emergency barrier and divert the lava to save the 10 people on the roof of the house, the 2 people on the roof of the barn will inevitably be killed. Would you order the construction of the emergency barrier?

FILLER ITEM (same in both Family and No-family conditions)

Trolley (2/50)

You are a railway controller. There is a runaway trolley barrelling down the railway tracks. Ahead, on the tracks, there are 6 workmen. The trolley is headed straight for them and they will be killed if nothing is done.

You are standing some distance off in the train yard, next to a lever. If you pull this lever, the trolley will switch to a side track and you can save the 6 workmen on the main track.


You notice that there are no workmen on the side track. So there will be no workmen who will be killed if you pull the lever and change the tracks but the 6 workmen on the main track will be saved.

Would you pull the lever?

Family member versions (Study 2, Study 3)

Iceberg (3/55) You are a radar operator overseeing vessel movement near Greenland. Due to sudden ice movement, a boat carrying 11 passengers is about to crash into an iceberg. If nothing is done, all passengers will die.

The only way to save the 11 passengers is for you to order the captain to execute an emergency maneuver that will sharply alter the course of the boat.

However, the maneuver will cause the boat to overrun a life raft carrying your father and 2 other people that is floating next to the iceberg and out of sight of the captain. Your father along with the 2 other people will be killed if you order the captain to execute the maneuver, but the 11 people on the boat will be saved. Would you order to execute the manoeuvre?

Gas explosion (4/60) You are the inspector of gas lines in a city. For some reason, the gas in a pipeline running under a building has caught fire. If the fire reaches the building, it will explode and kill the 12 people in it.

You realize that the only way to stop the explosion of the building is to close the pipeline by pushing on a button, and divert the fire to a side-pipeline. If you do so, the 12 people will be saved.

However, above the side-line is another building with your mother and 3 other people in it. If you push the button and divert the fire into the sideline, this building will explode and kill your mother and the 3 other people, but the 12 in the building above the main line will be saved. Would you push the button?

Fumes (5/65) You are the late-night watchman in a hospital. Due to an accident in the building next door, there are deadly fumes rising up through the hospital’s ventilation system. The fumes are directly and quickly heading towards a room with 13 patients in it. If you do nothing the fumes will rise up into this room and cause their deaths.

The only way to avoid the deaths of these patients is to hit a certain switch, which will cause the fumes to bypass the room and enter a second room instead.


However, you realize that your sister is in the second room along with 4 other patients. Your sister and the 4 other patients will be killed if you hit the switch and let the fumes bypass the first room, but the 13 patients in the first room will be saved. Would you hit the switch?

Airplane (5/65) You are a military base commander. A missile has just been mistakenly fired at a commercial airliner. If you do nothing, the missile will reach the airliner and 13 people on the airliner will die.

You realize that the only way to save these people, is to alter the course of the commercial airliner. In this case, the missile will pass by the airliner and the 13 people inside will be saved.

However, if you alter the course of the commercial airliner, the missile will hit another airliner with your sister and 4 other people inside which is flying right behind it. Your sister and the 4 other people who are travelling on this airliner will be killed if you alter the course, but the 13 people in the commercial airliner will be saved. Would you alter the commercial airliner’s course?

Submarine (4/60) You are responsible for the mission of a submarine. You are leading this operation from a control center on the beach. An onboard explosion has damaged the ship and collapsed the only access corridor between the upper and lower levels of the ship. As a result of the explosion, water is quickly approaching the upper level of the ship. If nothing is done, the 12 people in the upper level will be killed.

You realize that the only way to save these people is to hit a switch in which case the path of the water to the upper level will be blocked and it will enter the lower level of the submarine instead.

However, you realize that your brother and 3 other people are trapped in the lower level. If you hit the switch, your brother along with the 3 other people in the lower level (who otherwise would survive) will die, but the 12 people in the upper level will be saved. Would you hit the switch?

Mine (3/55) Due to an accident there are 11 miners stuck in one of the shafts of a copper mine. They are almost out of oxygen and will die if nothing is done. You are the leader of the rescue team.

The only way for you to save them is to activate an emergency circuit that will transfer oxygen from a nearby shaft into the shaft where the 11 miners are stuck.

However, your team notices that your father and 2 other miners are trapped in the nearby shaft. If you activate the emergency circuit to transfer the oxygen, your father and the 2 other miners will be killed, but the 11 miners will be saved. Would you activate the emergency circuit?


Metro (2/50) You are a late-night metro controller in a station. A metro with automated driver is heading towards the station. Right before the station, the track gets blocked by a huge rock. If nothing is done, the train crashes into the rock and 10 passengers will die.

You realize that the only way to save these people is to pull a lever and divert the metro to a short emergency side-track. However, there is another metro coming right after it. If you divert the first metro to the side-track, the second one will crash into the rock, as there is only room on the side-track for one metro. If you do nothing, the first metro crashes, and the second one will be automatically directed into the side-track.

However, you notice that your brother and 1 other passenger are travelling in the second metro. If you pull the lever and let the first metro divert to the side-track, your brother and the 1 other passenger will die, but the 10 people in the first metro will be saved. Would you pull the lever?

Volcano (2/50) After a volcano eruption deadly hot lava is heading towards a nearby village. You are directing the rescue operations. There are 10 people standing on the roof of a house. If nothing is done, these 10 people will inevitably be killed by the lava stream.

The only way to save these people is to order the construction of an emergency barrier that will divert the lava stream into an old river bed.

However, you suddenly receive the information that right along the old river bed your mother and 1 other person are standing on the roof of a barn. If you order the construction of the emergency barrier and divert the lava to save the 10 people on the roof of the house, your mother and the 1 other person on the roof of the barn will inevitably be killed. Would you order the construction of the emergency barrier?


B. Stability index

Table S1. Frequency of stability index values on conflict items in Study 1-3. The raw number of participants for each value is presented in brackets.

Study     Condition                     <33%        50%          66.7%        75%          100%          Average stability
Study 1   No family - Moderate ratio    2.9% (3)    14.4% (15)   17.3% (18)   23.1% (24)   42.3% (73)    83.8%
Study 2   Family - Moderate ratio       5.7% (6)    10.5% (11)   11.4% (12)   8.6% (9)     63.8% (67)    85%
Study 3   No family - Extreme ratio     3.2% (4)    11.3% (14)   12.1% (15)   14.5% (18)   58.9% (73)    84.5%
Study 3   Family - Extreme ratio        5.8% (6)    17.5% (18)   11.7% (12)   12.6% (13)   52.4% (54)    80.3%

Overall average                         4.3% (19)   13.3% (58)   13.1% (57)   14.7% (64)   54.6% (238)   82.4%


C. Conflict detection analysis on the combined Study 1-3 data

For each direction of change category one may ask whether reasoners are faced with two competing intuitions at the first response stage. We can address this question by looking at the contrast between conflict and control problems. If conflict problems cue two conflicting initial intuitive responses, people should process the problems differently than the no-conflict problems (in which such conflict is absent) in the initial response stage and show lower confidence when solving the conflict problems (Bialek & De Neys, 2017, see also footnote 3). Therefore, we contrasted the confidence ratings for the initial response on the conflict problems with those for the initial response on the no-conflict problems for each of the four direction of change categories. Note that we used only the dominant no-conflict “UU” category in which participants refused to sacrifice more people to save less. We refer to this category as “baseline”. The rare responses in the other no-conflict direction of change categories were not cued by utilitarian or deontological considerations and cannot be interpreted unequivocally. To avoid spurious conclusion in this exploratory analysis we combined the data from our three studies to get the most general and robust test. Table S2 shows the results. Visual inspection of Table S2 (bottom) indicates that overall there is a general trend towards a decreased initial confidence when solving conflict problems for all direction of change categories. However, this effect is larger for the “UD” and “DU” cases in which reasoners subsequently changed their initial response. This suggests that although reasoners might be experiencing some conflict between competing intuitions in all cases, this conflict is more pronounced in the “UD” and “DU” case. We ran a separate analysis for each of the four direction of change conflict problem categories on the combined data from Study 1-3. In the analysis, the confidence for the initial response in a given direction of change category in question was contrasted with the initial response confidence for no-conflict “UU” responses which served as our baseline. We will refer to this contrast as the conflict factor. The conflict factor was entered as fixed factor, and participants were entered as random factor. Results showed that conflict improved model fit significantly for each of the four direction of change categories (UU, χ2 (1) = 21.4, p < 0.0001, b = -6.76; DD, χ2 (1) = 25.3, p < 0.0001, b = -9.1; DU, χ2 (1) =17.96, p < 0.0001, b = - 17.04; UD, χ2 (1) = 43.4, p < 0.0001, b = -26.9). Hence, the conflict detection analysis on the confidence data indicates that by and large participants showed decreased response confidence (in contrast with the no-conflict baseline) after having given their first, intuitive response on the conflict problems in all direction of change categories. This supports the

These results support the hypothesis that, just like utilitarian responders, deontological responders were faced with two conflicting intuitive responses when solving the conflict dilemmas (Bialek & De Neys, 2016, 2017). A contrast analysis²⁷ that compared the conflict effects for the change (i.e., “UD” and “DU”) and no-change (“UU” and “DD”) categories indicated that the trend towards larger effects for the change categories did not reach significance, Z = -0.98, p (one-tailed) = 0.16 (r = 0.14 for the no-change and r = 0.18 for the change group). Nevertheless, the trend suggests that although reasoners might be generating two intuitive responses and are being affected by the conflict between them in all cases, this conflict is more pronounced in cases where people subsequently change their answer. In line with our absolute confidence level findings on the conflict problems (see Figure 1), this tentatively suggests that it is the more pronounced conflict experience that makes them subsequently change their answer (Bago & De Neys, 2017; Thompson et al., 2012).

As we noted in footnote 3, our conflict detection analysis focused on the confidence data because these have been shown to be more reliable than latency data in the moral reasoning case (Bialek & De Neys, 2017). Nevertheless, for completeness, the interested reader can find an overview of the latency data in Table S3. Visual inspection of the table indicates that there were few consistent initial conflict detection effects (i.e., longer initial response times on conflict than no-conflict problems) in the latency data.

27 For this contrast analysis, we first calculated the r effect sizes from the t-values (Rosnow & Rosenthal, 2003). As a next step, we used the Fisher r-to-z transformation to assess the statistical difference between the two independent r values. We used the following calculator for the z-transformation and p-value calculation: http://vassarstats.net/rdiff.html
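The two steps described in this footnote amount to a short computation. A minimal sketch is given below; the sample sizes passed to compare_r are hypothetical, since the group Ns behind the reported r values are not restated here.

```python
import math

def r_from_t(t, df):
    # Effect size r from a t-value (Rosnow & Rosenthal, 2003).
    return math.sqrt(t**2 / (t**2 + df))

def compare_r(r1, n1, r2, n2):
    # Fisher r-to-z transformation and z-test for two independent r values
    # (the computation performed by the vassarstats.net calculator).
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# Hypothetical Ns; the thesis reports r = .14 (no-change) and r = .18 (change).
print(compare_r(0.14, 500, 0.18, 500))
```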


Table S2. Average confidence ratings and confidence contrast difference between the no-conflict baseline and conflict problems as a function of response stage and direction of change category. Numbers in brackets are standard deviations of the means for initial and final responses, and standard errors for the initial and final conflict contrast.

Study / Condition    Direction    Initial         Final           Initial conflict   Final conflict
                     of change    response        response        contrast           contrast
Study 1              Baseline     78.5% (21.3)    85.7% (17.7)    -                  -
No family -          UU           76.2% (20.9)    80.7% (19.7)    2.3% (1.8)         5% (1.6)
Moderate ratio       DD           57.7% (25.9)    68.5% (21.8)    20.8% (4.6)        17.2% (3.9)
                     UD           71% (27.1)      62.9% (35)      7.5% (6.2)         22.8% (7.9)
                     DU           47.2% (30.2)    77.4% (25.8)    31.3% (5.1)        8.3% (4.4)

Study 2              Baseline     79.5% (24.7)    88.3% (19.5)    -                  -
Family -             UU           55.3% (26.9)    61.4% (24.7)    24.2% (4.2)        26.9% (3.8)
Moderate ratio       DD           74.1% (25.4)    80.7% (23)      5.4% (2.1)         7.6% (1.8)
                     UD           57.1% (27)      66.5% (28.7)    22.4% (5.1)        21.8% (5.3)
                     DU           51.3% (26.1)    50.9% (24.2)    28.2% (6.5)        37.4% (6)

Study 3              Baseline     80.9% (23.1)    87.6% (18.7)    -                  -
No family -          UU           78.8% (24.1)    85.6% (19.9)    2.1% (1.7)         2% (1.4)
Extreme ratio        DD           76.3% (24.1)    80% (22.9)      4.6% (4.5)         7.6% (4.2)
                     UD           61.2% (39.6)    54.3% (39.2)    19.7% (11.5)       33.3% (11.3)
                     DU           48.5% (30.5)    78.5% (25.9)    32.4% (4.6)        9.1% (3.9)

Family -             Baseline     84.4% (21.3)    93.1% (15)      -                  -
Extreme ratio        UU           61.7% (27.9)    70.3% (27.8)    22.7% (3.6)        22.8% (3.3)
                     DD           73% (28.6)      79.3% (25.8)    11.4% (2.4)        13.8% (1.9)
                     UD           46.5% (33.7)    63.2% (36.1)    37.9% (8.8)        29.9% (8.3)
                     DU           53.1% (20.5)    61.9% (25.3)    31.3% (4.0)        31.2% (4.8)

Overall average      Baseline     80.8% (22.7)    88.6% (18)      -                  -
                     UU           74.7% (24.5)    80.6% (22.2)    6.1% (1.1)         8% (1)
                     DD           72.7% (26.8)    79.4% (24.2)    8.1% (1.3)         9.2% (1.1)
                     UD           59.3% (31.1)    63% (33.3)      21.5% (3.6)        25.6% (3.7)
                     DU           49.5% (27.7)    70.8% (27.2)    31.3% (2.5)        17.8% (2.4)

Note. U = utilitarian. D = deontological.


Table S3. Average response times and response time contrast difference between the no-conflict baseline and conflict problems as a function of response stage and direction of change category. Means were calculated on log-transformed data and were back-transformed prior to the subtraction. Numbers in brackets are (geometric) standard deviations of the means for initial and final responses, and standard errors for the initial and final conflict contrast.

Study / Condition    Direction    Initial         Final           Initial conflict   Final conflict
                     of change    response        response        contrast           contrast
Study 1              Baseline     7.72s (1.49)    6.94s (2.33)    -                  -
No family -          UU           8s (1.5)        7.11s (2.35)    -0.28s (0.12)      -0.17s (0.19)
Moderate ratio       DD           8.02s (1.6)     7.77s (2.48)    -0.3s (0.29)       -0.83s (0.44)
                     UD           7.48s (1.43)    7.69s (3)       0.24s (0.33)       -0.75s (0.68)
                     DU           7.3s (1.82)     12.7s (2.17)    0.42s (0.31)       -5.76s (0.38)

Study 2              Baseline     8.02s (1.43)    7.1s (2.5)      -                  -
Family -             UU           8.7s (1.43)     7.83s (2.24)    -0.68s (0.23)      -0.73s (0.36)
Moderate ratio       DD           8.23s (1.45)    6.84s (2.51)    -0.21s (0.12)      0.26s (0.21)
                     UD           5.42s (2.29)    16.71s (2.1)    2.6s (0.43)        -9.61s (0.41)
                     DU           9.47s (1.2)     16.82s (1.94)   -1.45s (0.3)       -9.72s (0.49)

Study 3              Baseline     7.32s (1.58)    7.87s (2.43)    -                  -
No family -          UU           7.62s (1.53)    7.33s (2.3)     -0.3s (0.12)       0.54s (0.17)
Extreme ratio        DD           4.98s (2.24)    4.62s (2.4)     2.34s (0.41)       3.25s (0.45)
                     UD           6.2s (2.62)     4.79s (2.37)    1.12s (0.76)       3.08s (0.7)
                     DU           8.09s (1.73)    11.87s (2.49)   -0.77s (0.27)      -4s (0.39)

Family -             Baseline     7.63s (1.42)    7.51s (2.62)    -                  -
Extreme ratio        UU           7.95s (1.61)    6.3s (2.52)     -0.32s (0.2)       1.21s (0.32)
                     DD           8.03s (1.48)    7.1s (2.73)     -0.4s (0.13)       0.41s (0.24)
                     UD           7.87s (1.67)    9.77s (2.76)    -0.24s (0.39)      -2.26s (0.65)
                     DU           7.86s (1.67)    10.61s (2.81)   -0.23s (0.32)      -3.1s (0.54)

Overall average      Baseline     7.65s (1.49)    7.37s (2.47)    -                  -
                     UU           7.86s (1.52)    7.16s (2.34)    -0.21s (0.07)      0.22s (0.11)
                     DD           7.9s (1.54)     6.84s (2.6)     -0.25s (0.08)      0.53s (0.13)
                     UD           6.54s (2.02)    10.11s (2.74)   1.11s (0.23)       -2.74s (0.31)
                     DU           7.97s (1.69)    12.36s (2.4)    -0.32s (0.15)      -4.99s (0.22)

Note. U = utilitarian. D = deontological.
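As the Table S3 caption notes, the latency means were computed on log-transformed response times and back-transformed before subtraction, which amounts to working with geometric means. A minimal Python sketch with illustrative latencies:

```python
import numpy as np

def geometric_mean(latencies_s):
    # Mean computed on log-transformed response times, then
    # back-transformed, as described in the Table S3 caption.
    return np.exp(np.mean(np.log(latencies_s)))

baseline = [7.1, 8.4, 6.9, 8.0]   # illustrative latencies in seconds
conflict = [7.8, 8.9, 7.2, 8.5]
# Conflict contrast: back-transform first, subtract afterwards.
print(geometric_mean(baseline) - geometric_mean(conflict))
```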


D. Supplementary confidence analysis

Given the core hybrid model principles, one can expect that changes in the strength levels of competing intuitions should lead to predictable consequences. Just as the family member manipulation can be assumed to affect the strength of the postulated deontological intuition, the kill-save ratio manipulation can, in theory, be assumed to affect the strength of the postulated utilitarian intuition (i.e., a stronger utilitarian intuition with a more extreme kill-save ratio). However, our overall utilitarian response rate (Table 1) already indicated that the impact of the kill-save manipulation in the current studies was less marked than that of the family member manipulation. More extreme kill-save ratios did not lead to a significantly higher initial utilitarian response rate. This questions whether the kill-save ratio manipulation successfully affected the strength of the postulated utilitarian intuition. Nevertheless, for completeness and consistency, we also tested the impact of kill-save ratio extremity on response confidence. If more extreme kill-save ratios increase the strength of the utilitarian intuition, the key prediction is again that utilitarian and deontological responders’ response confidence should show opposite effects.

Figure S1 plots the average initial response confidence as a function of kill-save extremity across our studies. As the figure shows, there was a slight trend in the expected direction: making the utilitarian intuition “stronger” (extreme vs moderate kill-save condition) increased initial confidence for utilitarian responders but decreased it for deontological responders (i.e., deontological responders are more likely to doubt their deontological decision when the utilitarian intuition is stronger). However, statistical testing showed that the interaction trend was not significant, χ2 (1) = 0.03, p = 0.86. Obviously, it is possible that adopting more extreme kill-save ratios (e.g., 1/5000 vs 1/5, see Trémolière & Bonnefon, 2014) might result in stronger effects of the kill-save ratio manipulation on the utilitarian response rate and response confidence.
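The interaction test reported above can be set up as another mixed-model comparison, analogous to the sketch in section C: a model including the response type by kill-save ratio interaction is tested against a main-effects-only model. A minimal sketch, again with illustrative file and column names rather than the actual analysis scripts:

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Illustrative file: one row per initial conflict response, with columns
# confidence (0-100), response_type (utilitarian/deontological),
# ratio (moderate/extreme), and participant (id).
df = pd.read_csv("initial_conflict_confidence.csv")

full = smf.mixedlm("confidence ~ response_type * ratio", df,
                   groups=df["participant"]).fit(reml=False)
main = smf.mixedlm("confidence ~ response_type + ratio", df,
                   groups=df["participant"]).fit(reml=False)

# Likelihood-ratio test of the interaction term (1 df for 2x2 factors).
lr = 2 * (full.llf - main.llf)
print(f"chi2(1) = {lr:.2f}, p = {stats.chi2.sf(lr, df=1):.2f}")
```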


Figure S1. Mean initial conflict problem response confidence ratings for initial utilitarian and deontological responses as a function of the kill-save ratio manipulation across Study 1-3. Error bars are 95% confidence intervals.


E. Justification study

Here we report an exploratory study in which people were given moral dilemmas and were asked to give a justification after both their initial and final response. We were specifically interested in the rate of proper utilitarian justifications that explicitly mentioned the greater good (e.g., “I opted for this decision because more people will be saved”).

Method

Participants

A total of 120 Hungarian students (95 female, mean age = 20.3 years, SD = 1.4 years) from Eötvös Loránd University of Budapest were tested. 93.3% of the participants reported high school as their highest completed educational level, while 6.7% reported having a post-secondary education degree. Participants received course credit for taking part.

Material

We adopted the same material and design as in our main studies. Half of the participants received the “Family” versions and the other half the “No family” versions. We used the moderate kill-save ratios in all versions. Since the primary goal was to study participants’ response justifications, we made a number of procedural changes to optimize the justification elicitation. Given that explicit justification might be hard (and/or frustrating), we opted to present only half of the main study problems (i.e., two conflict and two no-conflict versions). These items were chosen randomly from the main study problems. The procedure followed the same basic two-response paradigm as in the main studies, with the exception that cognitive load was not applied and participants were not asked to enter their response confidence, so as to further simplify the task design. As in the main studies, the initial response deadline was set to 12 s. Note that previous work from our team that contrasted deadline and load treatments indicated that a challenging response deadline may suffice to minimize System 2 engagement in a two-response paradigm (see Bago & De Neys, 2017). After both the initial and final response, people were asked the following justification question: “Why did you choose this response option? Please try to justify why you opted for the answer you selected.” There was no time restriction to enter the justification.

Whenever participants missed the response deadline for the reasoning problem, they were not presented with the justification question but with a message urging them to make sure to enter their response before the deadline on the next item.

Justification analysis

To analyse participants’ justifications, we defined three main justification categories on the basis of an initial screening. Although our key interest lies in the rate of utilitarian justifications, the categorization gives us some insight into the variety of justifications participants spontaneously produce. The three justification categories, along with illustrative examples, are presented below.

Utilitarian. People made reference to the greater good or, in some cases, to the less negative consequences (e.g., “People are all equal, the least people should die”, “If I do this, fewer people will die”, “Because more people will be saved”).

Feeling/Intuition. People referred to a gut feeling, an intuition, or their sentiments towards the family member in question (e.g., “Because I would feel guilty for the death of those people”, “I just can’t kill my brother”, “I don’t know, this is what my heart would say”).

Other. All responses that could not be readily categorized as Utilitarian or Feeling/Intuition (e.g., “There must be a possibility to divert both airplanes”, “For the same reason as before”, “I don’t risk the life of humans”).

Exclusion criteria

Trials on which the response deadline was missed (24.3% of all trials) were discarded. Therefore, in total, 454 trials (out of 600) were analysed.

Results and discussion

By and large, people’s dilemma choices were consistent with the results of our main studies. The overall non-correction rate was again high and reached 82.4%. Table S4 gives a detailed overview. But the central question of this study concerned the response justifications. Table S5 presents an overview of the justification results on the critical conflict items. Our primary interest lies in the utilitarian responders: could they justify their initial utilitarian conflict response by referring to the greater good, or did that require further deliberation? As Table S5 shows, there is an overall increase in utilitarian justifications in the final response stage compared to the initial response stage (7.7% increase).

This difference was especially clear in the family condition (23.2% increase), in which the emotional averseness of the utilitarian option was highest. But it is also clear that the data are noisy. This is evidenced by the relatively high number of “Other” responses, and by the fact that participants sometimes gave “Utilitarian” justifications even when giving a deontological response, for example. As we already noted (see footnote 5), we cannot exclude that participants used the justification phase to deliberate about their initial response, which would inflate utilitarian justifications overall. Furthermore, the percentage of discarded trials on which the initial response deadline was missed was quite high (i.e., 24.3%, about 3 times higher than what we observed in the main studies). This might indicate that the mere fact that people were asked to justify their answer triggered additional reflection throughout the study. In line with this hypothesis, we also found that average initial response latencies were about 1 s longer in the justification study than in the main studies (8.8 s vs 7.8 s). Taken together, this indicates that the findings should be interpreted with some caution. The study might overestimate the overall likelihood of utilitarian justifications. Nevertheless, the results present some preliminary evidence for the idea that such justifications are more likely after deliberate reasoning.


Table S4. Initial and final average percentage (SD) of utilitarian responses in the Justification study.

             Conflict                       No-conflict
             Initial        Final           Initial        Final
No family    72.4% (45)     77.6% (41.9)    90.2% (29.9)   91.5% (28.1)
Family       26.4% (44.4)   17.2% (38)      94.7% (22.6)   93.6% (24.6)
Average      47.9% (50.1)   45.4% (49.9)    92.6% (26.2)   92.6% (26.2)

Table S5. Frequency of different types of justifications for conflict items (raw number of justifications in brackets).

                                       Initial response                 Final response
Condition   Justification              Utilitarian    Deontological     Utilitarian    Deontological
No family   Utilitarian                84.6% (44)     35% (7)           83.1% (49)     46.7% (7)
            Feeling/Intuition          3.8% (2)       5% (1)            -              6.7% (1)
            Other                      11.5% (6)      60% (12)          16.9% (10)     46.7% (7)

Family      Utilitarian                43.5% (10)     1.7% (1)          66.7% (10)     -
            Feeling/Intuition          26.1% (6)      85% (51)          13.3% (2)      70.3% (45)
            Other                      30.4% (7)      13.3% (8)         20% (3)        29.7% (19)

Overall     Utilitarian                72% (54)       10% (8)           79.7% (59)     8.9% (7)
            Feeling/Intuition          10.7% (8)      65% (52)          2.7% (2)       58.2% (46)
            Other                      17.3% (13)     25% (20)          17.6% (13)     32.9% (26)
