Evil, or Weird? An investigation of the moral stereotype of scientists

Master Thesis

Alessandro Santoro Student Number 10865135

University of Amsterdam 2015/16

Supervisor: dhr. dr. Bastiaan Rutjens

Second Assessor: dhr. dr. Michiel van Elk

University of Amsterdam, Department of Social Psychology

A recent study by Rutjens and Heine (2016) investigated the moral stereotype of scientists and found them to be associated with immoral behaviors, especially purity violations. We developed novel hypotheses that were tested across two studies, integrating the original findings with two recent lines of research: one suggesting that the intuitive associations observed in the original research might have been influenced by the weirdness of the scenarios used (Gray & Keeney, 2015), and another using the dual-process theory of morality (Greene, Nystrom, Engell, Darley, & Cohen, 2004) to investigate how cognitive reflection, as opposed to intuition, influences moral judgment. In Study 1, we did not replicate the original results, and we found scientists to be associated more with weird than with immoral behavior. In Study 2, we did not find any effect of reflection on the moral stereotype of scientists, but we did replicate the original results. Together, our studies formed an image of a scientist who is not necessarily evil, but rather perceived as weird and possibly amoral. In our discussion, we acknowledge our studies’ limitations, which in turn helped us to meaningfully interpret our results and suggest directions for future research.

Keywords: Stereotyping, Moral Foundations Theory, Cognitive Reflection

How far would a scientist go to prove a theory? In 1802, a trainee doctor named Stubbins Ffirth hypothesized that yellow fever was not an infectious disease, contrary to popular belief. To prove his theory, he poured infected vomit into his open wounds. When the wounds healed without problems, he continued to experiment on himself: he dropped additional ‘fresh black vomit’ into his eye, swallowed pills made from it, and even drank it in a solution with water. Ffirth never got sick from his disturbing series of experiments, but it was not because the disease was not infectious: it just requires direct transmission into the bloodstream, usually through the bite of a mosquito (Herzig, 2005).

The stereotype of the ‘evil scientist’ is quite pervasive in popular culture. Examples of such evil or immoral scientists can be found in contemporary series, such as Dexter from Dexter’s Laboratory or Rick Sanchez from Rick and Morty, and are probably inspired by real cases of unscrupulous scientists such as Stubbins Ffirth. Such a negative stereotype can have serious consequences, as people might decide to distance themselves from scientists (Cuddy, Fiske, & Glick, 2008). This is even more important when considering that a recent report from the European Commission (Hazelkorn et al., 2015) has shown a lack of interest among people in pursuing a science-related career. Additionally, negative perceptions of scientists can influence the extent to which people adhere to their recommendations. This influence is perfectly illustrated by the discrepancy between popular opinion and scientific evidence regarding genetically modified organisms (GMOs). Regardless of the available scientific evidence in favor of GMOs, public opposition remains strong due to several factors related to the way GMOs are perceived as dangerous and immoral (Blancke, Van Breusegem, De Jaeger, Braeckman, & Van Montagu, 2015). In turn, these negative representations have a large impact on both national and international development of regulatory frameworks concerning the import and cultivation of GM crops. It therefore seems important to directly address the issue of whether scientists are in fact perceived to be as immoral as they are depicted in popular culture.

The stereotype of the immoral scientist was the main focus of a recent study by Rutjens and Heine (2016), which is central to the current research. They investigated this moral stereotype by looking at the intuitive associations that people hold towards the morality of scientists. Indeed, they found that scientists were intuitively associated with a variety of moral violations (especially purity violations), as compared to various control targets. The current project aimed to replicate and extend their findings in order to understand the true nature of such a negative stereotype, which in our research was operationalized as intuitive associations (Study 1) and as explicit judgments (Study 2) of scientists’ morality.

We developed novel hypotheses that were tested across two studies, integrating the original findings with recent lines of research, one suggesting that the intuitive associations observed in the original research might have been influenced by the weirdness of the scenarios used (Gray & Keeney, 2015), and another using the dual-process theory of morality (Greene et al., 2004) to investigate how cognitive reflection, as opposed to intuition, influences moral judgment. In the first study, we looked at the nature of the intuitive associations observed by Rutjens and Heine (2016), investigating whether scientists are perceived as immoral or rather as weird. In the second study, we tried to replicate the explicit associations observed by Rutjens and Heine (2016), while exploring whether these can be influenced by an induced and/or dispositional reflective state.

Study 1 – The moral stereotype of scientists: intuitive associations

Rutjens and Heine (2016) based their investigation of the perception of scientists’ morality on the Moral Foundations Theory (MFT), which is a central theoretical framework in morality research. MFT is rooted in anthropological research and aims to understand why, even though morality differs across cultures, recurrent themes and similarities can also be found. It was shaped into its present form by Haidt and Joseph (2004), who argued that morality is composed of universal and innate moral foundations.

A metaphor used to explain this concept is that morality is like a human tongue with its taste receptors (Haidt, 2012). In the same way we all have the same receptors but different tastes in food, MFT argues that we also have the same cognitive modules, or foundations, but different ‘tastes’ in morality. The extent to which these foundations are cultivated across cultures makes them more or less sensitive, which then results in different patterns of morality. Since its formulation, MFT has been used to account for a variety of phenomena, such as differences in moral judgments among various cultures or political ideologies (for a review of the existing empirical findings, see Graham et al., 2012). MFT research has identified (at least) five moral foundations – Care, Fairness, Loyalty, Authority, and Purity – and has used the Moral Foundations Questionnaire (MFQ; Graham, Haidt, & Nosek, 2009) to assess the extent to which they are endorsed by individuals. Additionally, research in this field uses scenarios that describe violations specific to each moral foundation (Davies, Sibley, & Liu, 2014). In their research, Rutjens and Heine (2016) investigated how scientists’ morality is perceived using both scenarios taken from the MFT literature and the MFQ.

Our first study draws on Studies 1–7 from Rutjens and Heine’s research (2016), in which they examined the stereotype of scientists’ morality by looking at the intuitive associations people hold towards scientists. More specifically, they used a design that combined moral scenarios with the conjunction fallacy (Tversky & Kahneman, 1983), a reasoning error that occurs when it is assumed that specific conditions are more likely than more general ones (described below). Rutjens and Heine (2016) initially presented participants with a scenario describing a particular moral violation, such as the following:

“On the way home from work, Jack decided to stop at the butcher shop to pick up something for dinner. He decided to roast a whole chicken. He got home, unwrapped the chicken carcass, and decided to make love to it. He used a condom, and fully sterilized the carcass when he was finished. He then roasted the chicken and ate it for dinner alongside a nice glass of Chardonnay.” (Supplements, p. 1)

After reading the scenario, participants had to indicate which option was more probable: A) Jack is a sports fan or B) Jack is a sports fan and a [condition target]. Depending on the condition the participant was in, the target of option B would be either a scientist or one of several control targets (e.g., an atheist, a Muslim). Since it is impossible for a subcategory (option B) to be more likely than the whole category (option A), selecting option B would be a reasoning error (i.e., the conjunction fallacy). The likelihood of making such an

error is based on the participant’s intuitive associations between the description of the person in the scenario and the target selected. Therefore, these fallacies can be adopted as a measure of people’s moral stereotype towards the target.

To distinguish which moral foundations are associated with scientists the most (or the least), Rutjens and Heine (2016) used different moral scenarios taken from the MFT literature, with each scenario depicting a violation of a particular moral foundation. They found that, except for fairness and care violations, scientists were consistently associated with immoral behavior, in particular with violations of purity (e.g., a person making love to a dead chicken). However, it has recently been argued that such scenarios are not just immoral, but also very weird. Research has shown that impurity scenarios are considered both weirder and less severe than other types of scenarios, raising doubts about a possible confound in MFT research (Gray & Keeney, 2015). Accordingly, it is possible that scientists are not necessarily associated with immorality, but rather with unusual behavior that may or may not be (im-)moral. Given the discussed importance of negative public perceptions of scientists’ morality, it is important to determine whether they are truly perceived as immoral or only as capable of odd behavior.

Study 1 used the same design as Rutjens and Heine (studies 1–7; 2016), but extended their findings with a different set of moral scenarios with the goal of isolating impurity, weirdness, and severity. In doing so, we aimed to shed more light on the nature of the intuitive associations that people hold towards scientists.

Study 2 – The moral stereotype of scientists: explicit judgments

After establishing the role of morality in the intuitive associations with scientists in Study 1, Study 2 investigated the role of cognitive reflection in making explicit morality judgments about scientists. To this end, we draw on a dual-process theory of morality which has been used to test how cognitive reflection (as opposed to intuition) influences moral judgment. According to the dual-process theory, people generating a moral judgment experience a conflict between two distinct psychological/neural systems: an intuitive, automatic and emotionally-driven system, and a more reflective, controlled and reason-driven system (Greene et al., 2004). In accordance with this model, research has shown that deontological judgments (i.e., judgments concerned with rights and duties) are associated with intuitive responses, whereas utilitarian judgments (i.e., judgments concerned with maximizing utility) are associated with more pondered responses (for an overview, see Paxton, Bruni, & Greene, 2014). To test this theory, researchers have often employed the Cognitive Reflection Test (CRT; Frederick, 2005), a test designed to assess participants’ ability to suppress intuitive and incorrect answers in favor of a deliberative and correct answer. An example item is the following: “A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. How much does the ball cost?”. Even though one would intuitively think the ball costs $0.10, people who consider the problem more thoughtfully reach the correct answer, $0.05 (as 0.05 + 1.05 = 1.10).

Interesting results have been obtained in research combining the CRT with moral scenarios. First, a robust positive correlation was found between the number of correct answers and utilitarian judgments (Hardman, 2008; Santoro, 2014), as well as a high negative correlation between CRT scores and judgments of moral wrongness (Pennycook, Cheyne, Barr, Koehler, & Fugelsang, 2014). Second, the CRT has been used to induce a reflective state and investigate its effects on moral judgment. For instance, when participants completed the CRT either before or after rating the wrongness of moral dilemmas, those in the CRT-first condition judged the scenarios as more acceptable than those in the dilemmas-first condition, hence offering further support for the dual-process theory of morality (Paxton, Ungar, & Greene, 2012).

Study 2 combined this paradigm with the design of study 8 from Rutjens and Heine (2016), which directly explored what foundations are most strongly associated with scientists. We investigated whether participants who complete the CRT before making their judgments (consequently entering a reflective state) rate scientists’ morality differently from participants who complete it afterwards, and we also investigated whether the scores on the CRT (a measure of dispositional reflectiveness) correlate with a specific pattern of judgments. Although the CRT has been used in the past to examine the effects of reflection on moral judgments (e.g., Paxton et al., 2012), it has never been employed to study the effects of reflection on a moral stereotype; this study was thus a first step for this type of investigation. Practically, it is important to know whether reflection can influence the perception of scientists’ morality, in case it might help people be more thoughtful about scientists’ suggestions. More generally, it is also important to see whether reflection can influence a moral stereotype similarly to the way it affects moral judgments.

Key Research Questions

Summarizing, our research draws on the research by Rutjens and Heine (2016) and aims to integrate it with the two other lines of research that have been discussed: the first investigating how weirdness and severity of a scenario affect moral judgment, and the second looking at how (dispositional or induced) reflection influences moral judgment. To do so, two research questions were tested in two studies:

Study 1: Are the intuitive associations between scientists and immorality due to the immorality depicted in impurity scenarios, or rather due to the weirdness and/or severity of the scenarios?

Study 2: Does cognitive reflection influence the perceptions of scientists’ morality?

Considering that the research questions combine novel lines of research, making specific predictions was complex. Pertaining to the first research question, it is possible that for the associations to occur, perceived impurity is essential (i.e., scientists being considered immoral), that perceived weirdness is essential (i.e., scientists being considered odd people), or that both are necessary. However, Rutjens and Heine (2016) found that scientists are associated not only with purity violations, but also with other moral violations which might be perceived as less weird. Additionally, taking into account the results of the original research, we also reasoned that scientists should not be associated strongly with severely immoral behaviors (such as rape), even though this prediction was more exploratory. For these reasons, we predicted that in Study 1 scientists would be associated with impurity as well as with weirdness, expecting the strongest associations for scenarios that are both, and the weakest associations for scenarios that are neither of those and/or highly severe.

The second research question is more exploratory in nature, since the effects of reflection on moral stereotypes have not been investigated yet. However, reflection has been associated with utilitarian judgments, and this might lead to a less harsh perception of scientists’ morality in case they are perceived to be more concerned with maximizing utility (e.g., GMOs can be good) than with what is right (e.g., GMOs are unnatural). If this is the case, it could be argued that both induced and dispositional reflection would result in less harsh judgments of scientists’ morality. We tested this idea in Study 2.

Study 1

Pilot

Initially, 25 moral scenarios were piloted (refer to Appendix A for a full description of the pilot’s methods and results) and categorized according to the following criteria: general immorality, impurity, severity and weirdness. The differing ratings obtained in the pilot were then used to select the five scenarios to use in Study 1, one for each of the following categories:

1. Impure + weird (+ not severe);
2. Impure + severe (+ not weird);
3. Not impure + weird (+ not severe);
4. Not impure + severe (+ not weird);
5. Impure + not weird and not severe.

Except for some of the type 3 scenarios (which are not inherently immoral), the scenarios used were based on moral vignettes previously proposed and validated (e.g., Clifford, Iyengar, Cabeza, & Sinnott-Armstrong, 2015; Graham & Haidt, 2012; Gray & Keeney, 2015).

The ratings’ means and standard deviations for each of the moral scenarios piloted were used to determine which were the best fit for our five categories. Except for the type 3 scenario (i.e., only weird), we could not find ones that perfectly matched our categories, due to the limited number of scenarios that we were able to pilot. Therefore, we chose the scenarios by looking at the ones in which the relations between the ratings of interest (e.g., weird and impure but not severe) were best

represented, even though it meant using scenarios that were suboptimal for the respective category. For example, for the type 5 scenario (i.e., only impure), we could not find a scenario that was high in impurity but low on severity and weirdness, and thus we chose one that was rated lower on severity and weirdness than it was on impurity, despite its impurity rating being quite low. All the means and standard deviations for the scenarios used in Study 1 are shown in Table 1 below, whereas Appendix A reports all of the pilot’s results.

The ratings we obtained in our pilot study were consistent with those available in the literature, which supports their reliability:

1. The first scenario (necrobestiality) had an average rating of 3.24 (SD = 1.67) in immorality and of 3.88 (SD = 1.48) in impurity, which is similar to the results of Clifford and colleagues (2015), who found an average rating of 3 (out of 5) in immorality and where 88% of participants rated the scenario as impure.

2. The second scenario (rape) had an average rating of 4.76 (SD = 0.52) in severity and of 3.82 (SD = 1.37) in weirdness, which is similar to the ratings observed by Gray and Keeney (2015), who found the scenario to be rated 7 (out of 7) in severity and 4.3 (out of 7) in weirdness.

3. The third scenario (man with hamster) was created by us and thus we cannot compare it with previous ratings in the literature.

4. The fourth scenario (kicking a dog) had an average rating of 4.40 (SD = 0.81) in severity and of 3.62 (SD = 1.34) in weirdness, which is similar to the ratings observed by Gray and Keeney (2015), who found the scenario to be rated 6 (out of 7) in severity and 4.5 (out of 7) in weirdness.

5. The fifth scenario (sex for drink) had an average rating of 2.82 (SD = 1.52) in immorality and of 3.10 (SD = 1.46) in impurity, which is similar to the results of Clifford and colleagues (2015), who had a comparable scenario – “A homosexual in a gay bar offering sex to anyone who buys him a drink” – that received an average rating of 2.6 (out of 5) in immorality, while 73% of participants rated the scenario as impure.

Even though these scenarios were the best fit for our categories out of the ones we piloted, they were not a perfect match. The first category was [impure, weird, not severe], but the scenario was rated relatively high also in severity (M = 3.70; SD = 1.40). The second category was [impure, severe, not weird], but the scenario was rated relatively high also in weirdness (M = 3.82; SD = 1.37). The third category was [weird, not impure, not severe] and the scenario used fit the category well. The fourth category was [severe, not impure, not weird],

but the scenario was rated relatively high also in impurity (M = 3.90; SD = 1.36) and weirdness (M = 3.62; SD = 1.34). The fifth category was [impure, not weird, not severe], but the ratings of the scenario used were quite similar: 3.10 (SD = 1.46) for impurity, 2.48 (SD = 1.34) for severity, and 2.90 (SD = 1.39) for weirdness. However, the limited amount of time and resources available for this project did not give us a chance to pilot additional scenarios in order to find better fits for our categories.

Yet, bearing in mind that the scenarios were not perfectly representative of the respective categories, our results can still offer fruitful insights regarding our first research question.

Table 1
Pilot ratings for the scenarios used in Study 1. Ratings are M (SD).

| Scenario | Immorality | Impurity | Severity | Weirdness |
|---|---|---|---|---|
| 1. Jack has sex with a frozen dead chicken before cooking it for dinner. | 3.24 (1.67) | 3.88 (1.48) | 3.70 (1.40) | 4.84 (0.37) |
| 2. Jack forces another person to have sexual intercourse with him, without that person’s consent. | 4.76 (0.69) | 4.54 (0.95) | 4.76 (0.52) | 3.82 (1.37) |
| 3. Jack carries around his hamster in his pocket daily, regularly asking the hamster for advice. | 1.32 (0.82) | 1.40 (0.86) | 2.00 (1.16) | 4.32 (1.02) |
| 4. Jack kicks a dog in the head, hard. | 4.46 (0.81) | 3.90 (1.36) | 4.40 (0.81) | 3.62 (1.34) |
| 5. Jack is in a bar and offers to sleep with anyone who buys him a drink. | 2.82 (1.52) | 3.10 (1.46) | 2.48 (1.34) | 2.90 (1.39) |

Methods

Experimental design. In Study 1, each participant had to read a single scenario and then indicate whether the person portrayed in the scenario is more likely to be A) a sports fan or B) a sports fan and a scientist/atheist/Muslim. As previously discussed, choosing option B would indicate a reasoning error (i.e., conjunction fallacy) due to the associations that the person intuitively holds towards the target depicted. Since the conjunction fallacy is a very brief measure, it allowed us to keep the study as short as possible (and consequently to test a high number of participants).

Even though Rutjens and Heine (2016) had three scientist targets (a scientist, a cell biologist, an experimental psychologist), we decided to only use a general scientist target, both because they did not find differences between the three scientist targets and because it allowed us to increase our statistical power. The two control groups (atheist/Muslim) are the same as those used in Rutjens and Heine (2016) and were included both in order to keep the design as similar as possible to the original study and because they offer a good comparison, since one (atheists) is consistently associated with moral violations while the other (Muslims) is not.

Hence, Study 1 had a between-subjects design with scenario type (1–5) and option B target (scientist/atheist/Muslim) as independent variables, and number of fallacies in each condition as dependent variable. This study was thus used to answer the first research question, testing whether people attribute impure behavior to scientists (compared to the other targets), or rather just associate them with strange behavior (comparing scientists across scenarios), and whether this attribution is affected by the severity of the scenario.

Participants. G*Power 3 (Faul, Erdfelder, Lang, & Buchner, 2007) was used to determine the number of participants needed. For each scenario condition in Study 1, at least 150 subjects were needed to detect medium effects (w = .3) with 95% power using chi-squared. Participants were recruited on Amazon’s Mechanical Turk (MTurk; Buhrmester, Kwang, & Gosling, 2011), which offers a diverse pool of subjects taken from the US population, thus avoiding culture effects. They were excluded if they failed an attention check or did not answer all the questions; these were the only exclusion criteria. A total of 764 adults (i.e., over 18; age and gender were not recorded) took part in Study 1 in exchange for a monetary reward. Eight participants were excluded because they did not answer all the questions and two because they failed the attention check. This resulted in a total of 754 participants that were randomly assigned to one of the conditions; Table 2 below shows the specific number of participants in each condition of Study 1, which was relatively evenly distributed.

Table 2
Number of participants per condition (by target)

| Scenario | Scientist | Atheist | Muslim | Total |
|---|---|---|---|---|
| 1 | 56 | 46 | 48 | 150 |
| 2 | 31 | 64 | 57 | 152 |
| 3 | 46 | 55 | 50 | 151 |
| 4 | 51 | 47 | 51 | 149 |
| 5 | 49 | 55 | 48 | 152 |

Materials. In accordance with the results of the pilot, the following scenarios were used:

1. Impure + weird (+ not severe):
   • “Jack has sex with a frozen dead chicken before cooking it for dinner”;

2. Impure + severe (+ not weird):
   • “Jack forces another person to have sexual intercourse with him, without that person’s consent”;

3. Not impure + weird (+ not severe):
   • “Jack carries around his hamster in his pocket daily, regularly asking the hamster for advice”;

4. Not impure + severe (+ not weird):
   • “Jack kicks a dog in the head, hard”;

5. Impure + not weird and not severe:
   • “Jack is in a bar and offers to sleep with anyone who buys him a drink”.

Besides the aforementioned focal materials of the study, both Study 1 and 2 included demographic questions regarding religious beliefs (i.e., “Do you believe in God or a higher power?”; 0 = not at all, 100 = very much), political orientation (i.e., “What is your political orientation?”; 0 = very liberal, 100 = very conservative), and nationality. Additionally, both studies contained the question “Are you a scientist, or working in academia?” (yes / no) in order to control for familiarity with science as a possible confounder.

Procedure. Upon accessing the survey from MTurk, participants were presented with a welcome page containing a short briefing. After reading the briefing, participants had to confirm they were 18 or older and that they agreed to take part in the study, and then click on “Next” to start the experiment. At this point, the website randomly assigned the participant to one of the conditions, and presented the corresponding moral scenario and target option. The participant then had to read the scenario and indicate whether it is more probable that the person in the scenario is A) a sports fan or B) a sports fan and a scientist/atheist/Muslim. After completing the main task of the study, participants had to complete an attention check to determine if they were paying attention (i.e., they were asked to select 5 on a scale 1–7), after which they were presented with the demographic questions and the control question about familiarity with science. After these last questions, a final screen thanked the participants and gave them the chance to give feedback.

Results

First of all, the participants who failed the attention check or did not complete all the questions were excluded from the analyses. Then, a dummy “Fallacies” variable was created: this variable contained the value 1 for participants who committed a fallacy and the value 0 for participants who did not, and served as the dependent variable in our analyses. To check for familiarity with science as a possible confounder, we conducted a chi-squared analysis only for the scientist condition across all scenarios, using familiarity with science and number of fallacies as variables. This analysis did not reveal any significant effect, possibly because of the small number of people who were familiar with science (17 out of 233), and we thus excluded familiarity with science from the following analyses. Next, we conducted five chi-squared analyses, one for each scenario type (1–5), using number of fallacies and target type (scientist, atheist, Muslim) as variables. The analyses revealed a significant overall difference in the number of conjunction fallacies between target conditions in all the scenario types: scenario 1 (χ²(2) = 18.34, p < .001, Cramer’s V = .35), scenario 2 (χ²(2) = 28.39, p < .001, V = .43), scenario 3 (χ²(2) = 15.02, p < .01, V = .32), scenario 4 (χ²(2) = 33.99, p < .001, V = .48), and scenario 5 (χ²(2) = 14.44, p < .01, V = .31). Subsequently, post-hoc comparisons were conducted between targets for all the scenario types. Table 3 below shows the results of these comparisons (with a Bonferroni-adjusted significance level of .01/15 = .0006), together with those of the overall chi-squared tests and the percentage of fallacies in each condition. The percentages of conjunction fallacies per condition are also illustrated in Figure 1 below.

Figure 1. Percentage of fallacies per condition.

Table 3
Percentage of fallacies per condition (by target) with chi-squared test results

| Scenario | Scientist | Atheist | Muslim | χ² (df = 2) | Cramer’s V |
|---|---|---|---|---|---|
| 1 | 21.4%ᵃ | 56.5%ᵇ | 20.8%ᵃ | 18.34* | .35 |
| 2 | 6.5%ᵃ | 46.9%ᵇ | 10.5%ᵃ | 28.39* | .43 |
| 3 | 39.1%ᵃ | 25.5%ᵇ | 6.0%ᵇ | 15.02* | .32 |
| 4 | 7.8%ᵃ | 57.4%ᵇ | 17.6%ᵃ | 33.99* | .48 |
| 5 | 2.0%ᵃ | 20.0%ᵃ | 2.1%ᵃ | 14.44* | .31 |

Note. * p < .01. The superscripts indicate the results of the post-hoc chi-squared tests between target conditions. Same superscripts indicate no significant differences (p > .05), whereas different ones indicate significant differences (p < .01).
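As a rough check on the overall tests in Table 3, the chi-squared statistic and Cramer’s V for scenario 1 can be recomputed by hand. The sketch below is not the analysis script used for this thesis: the fallacy counts (12, 26, and 10) are an assumption, reconstructed by applying the percentages in Table 3 to the group sizes in Table 2, and the function names are purely illustrative. The p-value uses the closed form exp(−χ²/2), which holds only for 2 degrees of freedom.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, n


def cramers_v(stat, n, n_rows, n_cols):
    """Cramer's V effect size for an n_rows x n_cols table."""
    return math.sqrt(stat / (n * (min(n_rows, n_cols) - 1)))


# Scenario 1, columns: scientist, atheist, Muslim.
group_sizes = [56, 46, 48]   # from Table 2
fallacies = [12, 26, 10]     # ASSUMED counts: 21.4%, 56.5%, 20.8% of the group sizes
no_fallacy = [size - f for size, f in zip(group_sizes, fallacies)]
table = [fallacies, no_fallacy]

stat, n = chi_squared(table)
v = cramers_v(stat, n, len(table), len(table[0]))
p_value = math.exp(-stat / 2)  # exact chi-squared tail probability for df = 2 only

print(f"chi2(2) = {stat:.2f}, p = {p_value:.5f}, Cramer's V = {v:.2f}")
```

Under these assumed counts, the sketch gives χ²(2) ≈ 18.34 and V ≈ .35, in line with the values reported for scenario 1.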

As shown in Table 3, the number of conjunction fallacies in the atheist condition was significantly higher than in the scientist and Muslim conditions for scenario types 1, 2, and 4. For scenario type 3, participants in the scientist condition made significantly more fallacies than those in the atheist and Muslim conditions. For scenario type 5, no significant differences were found between targets.

Finally, a chi-squared analysis was conducted to look at differences in the number of conjunction fallacies between scenario types in the scientist condition. The analysis showed a significant overall difference (χ²(4) = 31.46, p < .001, Cramer’s V = .37), and subsequent post-hoc comparisons were conducted to look at specific differences between scenarios. These comparisons revealed that participants committed significantly more fallacies in scenario type 1 than in type 5 (χ²(1) = 9.06, p < .01, V = .29), and more fallacies in scenario type 3 than in type 2 (χ²(1) = 10.29, p < .005, V = .37), type 4 (χ²(1) = 13.50, p < .001, V = .37), and type 5 (χ²(1) = 20.40, p < .001, V = .46). While all these comparisons were significant with a Bonferroni-adjusted significance level of .005 (.05/10), none of the other comparisons were significant.

Besides the main analyses, we also checked for correlations between fallacies, religiosity, and political orientation. The analyses showed significant correlations between all of them (across conditions), and the results are presented in Table 4 below. A significant weak positive correlation was found between the number of fallacies and both religiosity (r(752) = .12, p < .01) and political orientation (r(752) = .09, p = .01), indicating that more fallacies were committed by people who were more religious or more conservative. Finally, a significant moderate positive correlation was found between religiosity and political orientation (r(752) = .38, p < .01), indicating that the more religious people were also the more conservative ones.

Table 4
Correlations between fallacies, religiosity and political orientation

| Variables | 1 | 2 | 3 |
|---|---|---|---|
| 1. Fallacies | - | | |
| 2. Religiosity | .12* | - | |
| 3. Political Orientation | .09** | .38* | - |

Note. * p < .01, ** p = .01.

Discussion

In Study 1, we expected that scientists would be associated the most with scenarios that were either weird, impure, or both. Moreover, we expected them to be associated the least with scenarios that were severe. The first of these predictions was confirmed by our results, as scientists were associated the most with the weird-only scenario. When comparing this scenario with the others, we saw that scientists were significantly more associated with scenario type 3 (weird only) than with type 2 (impure + severe), 4 (severe only), and 5 (impure only). The results of the first two comparisons are in line with our prediction, since we expected the lowest associations for scenarios that were severe. One might wonder why, then, scientists were significantly more associated with the type 3 than the type 5 scenario. However, this is not surprising if we consider the actual ratings of the type 5 scenario, which was practically only slightly impure and also had slightly higher severity than type 3.

Bearing this in mind also allows us to explain why we found that scientists were significantly more associated with scenario type 1 (impure + weird) than with type 5: type 1 was higher in both weirdness and impurity than type 5, so this result is also in line with our predictions.

Yet, our results differ from those of the original research by Rutjens and Heine (2016) in a number of ways. First, a much higher percentage of people in their research associated necrobestiality with scientists: in our study, 21.4% of participants in that condition committed a conjunction fallacy, whereas in the original research, up to 65.8% did. Second, we found atheists to be significantly more associated with immoral behavior than both scientists and Muslims (as in previous research; Gervais, 2014), whereas in the original research the number of fallacies for scientists (in the necrobestiality condition) was either similar to that for atheists, or even significantly higher. Third, while in our research the fallacies for scientists and Muslims did not differ significantly across scenarios (except for type 3, which was not immoral), in the original research scientists were significantly more associated than Muslims with a number of moral violations.

Taken together, the results of Study 1 cast doubt on the findings by Rutjens and Heine (2016), and rather suggest that scientists are perceived more as weird than as immoral. Further support for this idea comes from the fact that the two highest numbers of fallacies we observed for scientists were committed in scenarios type 3 (only weird) and type 1 (also highly weird), while the lowest number of fallacies was committed in the type 5 scenario (the least weird). Therefore, the results from Study 1 not only support our initial predictions, but also suggest that the original results of Rutjens and Heine (2016) might have been confounded by the weirdness of the scenarios, offering support to Gray and Keeney’s (2015) hypothesis about a sampling bias in MFT research. However, our findings need to be interpreted with caution due to the limitations of our study, which are discussed below.

The main limitation of our research is that due to the restricted time and resources available we could only pilot 25 scenarios, and had to choose among those the ones that would best fit our scenario types. Our results are thus based on the scenarios we used, which were not as representative of the respective categories as we initially expected. To overcome this limitation, we took into account the actual ratings of the scenarios rather than their category, which allowed us to meaningfully interpret our results. Still, since our results rely on the scenarios used, it is possible that scientists are not necessarily associated with weird behavior, but rather only with the specific behaviors we depicted in the one-sentence scenarios. This is something future research could look into, perhaps by examining how different types of strange behavior are associated with scientists. Further research could also look at how scientists are associated with weird behavior compared to groups which are notoriously known as strange and eccentric, such as rock stars.

Additionally, our study used the target ‘scientists’ in a general sense, with no label specifying which type of scientist. This should also be taken into account when interpreting our results, as it is possible that our results cannot be extended to all the different types of scientists. However, we think that using a general ‘scientist’ target was the best way to approach our investigation for three reasons: it gave us more statistical power (compared to having more scientist targets), Rutjens and Heine (2016) did not find differences between scientist conditions, and that is how they are usually referred to in the media (i.e., scientists rather than chemists or physicists). Therefore, we find the use of the general ‘scientist’ target not necessarily a limitation, but rather something to keep in mind when generalizing the results.

Finally, one might wonder why the results of Study 1 did not replicate those of the original research. This discrepancy might have been caused by the materials used, since even though we used the same scenario as the original research (i.e., necrobestiality) our scenario was worded differently. Our scenario descriptions were only one sentence long, whereas scenarios used in the original research (as well as in the MFT literature; e.g., Graham & Haidt, 2012) are usually longer. However, since we had to keep the study as short as possible and since we could not find (longer) moral scenarios in the literature that would fit all our categories, we opted to use short moral vignettes.

The difference between the vignettes we used and the scenarios used in the original research can be clearly seen by looking back at the necrobestiality scenario mentioned in the introduction: in that scenario, the moral violation is meticulously described, with the person unwrapping the chicken carcass, using a condom, and sterilizing the carcass afterwards; on the other hand, our scenario excluded all these details and just mentioned the necrobestiality act. Perhaps the fastidiousness of the scenario used in the original research makes it seem very methodical and analytical, which is then consequently associated with the mentality of a scientist. Support for this idea comes from an independent replication of the original research that successfully replicated the original results using the same (i.e., longer) scenarios (Soetekouw, 2016). This suggests that Gray and Keeney’s (2015) concerns regarding a possible scenario sampling bias in the MFT literature are legitimate, and also that we should investigate certain aspects of these scenarios (e.g., weirdness, severity, wording) in order to avoid confounded results. This idea is further discussed in the general discussion.

Study 2

Methods

Experimental design. In Study 2, participants were asked to complete the Cognitive Reflection Test (CRT) either before or after completing the moral judgment section of the Moral Foundations Questionnaire (MFQ) from the perspective of a scientist. The study thus only had one between participants condition (CRT-first / MFQ-first).

To answer the second research question, we looked at differences in the MFQ scores (i.e., explicit moral judgments of scientists) between the participants in the CRT-first condition and those in the MFQ-first condition, as well as at the correlation between the number of correct responses in the CRT and the MFQ scores (across conditions). This allowed us to explore whether induced (i.e., participants in the CRT-first condition) or dispositional (i.e., participants with a high CRT score) reflection affects the participants’ moral stereotype of scientists.

Participants. G*Power 3 (Faul et al., 2007) was used to determine the number of participants needed. For each condition in Study 2, 50 participants were needed to detect medium effects (f² = .135) with 95% power using a regression analysis. As in Study 1, participants were recruited on Amazon’s Mechanical Turk (MTurk; Buhrmester et al., 2011), and were excluded if they failed an attention check or did not answer all the questions. A sample of 107 adults (i.e., over 18; age and gender were not recorded) took part in Study 2 in exchange for a monetary reward. Nine participants were excluded because they did not answer all the questions and fourteen because they failed the attention check. This resulted in a total of 84 participants who were randomly assigned to either the MFQ-First condition (N = 45) or the CRT-First condition (N = 39).

Materials. The moral judgment section of the Moral Foundations Questionnaire (MFQ30-part2; Graham et al., 2009) was used to assess the explicit judgments of the scientists’ morality; an example item of the MFQ is “Justice is the most important requirement for a society” (1 = strongly disagree; 5 = strongly agree). The Cognitive Reflection Test (CRT; Frederick, 2005) was used to induce and assess reflection. An example item of the CRT is “If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?” (5 minutes). The MFQ and the CRT are reported in full in Appendix B and C respectively. Additionally, Study 2 used the same demographic and control questions as Study 1.

Procedure. Except for the main tasks, the overarching procedure of Study 2 was the same as the one in Study 1. After the previously described introduction, the website randomly assigned the participants to either the CRT-First or the MFQ-First condition, and presented either the CRT or the MFQ. Participants then had to read the instructions for the part presented (whether it was the CRT or the MFQ) and complete all the items before moving on to the next section. Within each section, the items were randomized to prevent order effects from confounding the study. Each item of the CRT was presented individually, in a screen containing both the question and an empty cell to enter the answer; participants had to answer the question to move on to the next one. All the MFQ items were presented on the same page, together with the instructions asking participants to respond as John, who is a scientist, in order to make sure they answered from the perspective of a scientist; participants answered each item on a five-point Likert scale (1 = John strongly disagrees; 5 = John strongly agrees). After completing the MFQ (and independently of their condition), participants were asked to describe what they knew about John (i.e., he is a scientist), in order to check that the manipulation was successful. Finally, the aforementioned control and demographic questions were presented, after which a final screen thanked the participants and gave them the chance to give feedback.

Results

First, participants who failed the attention check, those who did not complete all the questions and those who failed the manipulation check were excluded from the analyses (1). Second, a “CRT score” variable was created, which contained the number of correct answers to the CRT for each participant, ranging from 0 to 3. Third, the answers to the three items of the MFQ corresponding to each moral foundation (as illustrated in Appendix B) were averaged and computed into five new variables, one for each moral foundation, ranging from 0 to 5. Fourth, to control for familiarity with science as a possible confounder, we ran a MANOVA with the five moral foundations as dependent variables, and with familiarity with science as independent variable. Since this analysis was not significant, we excluded familiarity with science from the subsequent analyses. Then, we conducted a MANOVA with the moral foundations as dependent variables, and order condition (MFQ-first/CRT-first) as independent variable. No significant differences were found between conditions, and the results of the analysis are shown in Table 5 below together with means and standard deviations for each foundation in each order condition.

(1) Three additional participants respectively answered only 1s, 3s, and 4s to all items, and were later excluded from the analyses; this did not alter the results.

Table 5
Analyses of Variance between Conditions and Moral Foundations

                          Condition
Moral Foundation   MFQ-First     CRT-First     F-value*
                   M (SD)        M (SD)
Fairness           3.47 (.62)    3.67 (.64)    2.14
Loyalty            3.16 (.63)    3.19 (.58)    0.06
Authority          3.28 (.56)    3.27 (.76)    0
Purity             2.79 (.91)    2.84 (1.00)   0.05
Care               3.47 (.70)    3.38 (.72)    0.40

Note. * Degrees of freedom of the test statistics were the same for each test: df = (1,82). None of the tests were significant (p > .05).

Subsequently, a multivariate regression analysis was conducted with CRT score as a predictor and the five moral foundations as dependent variables, but no significant differences were found between different CRT scores. In order to look for a possible interaction between CRT scores and order condition, we used these two as predictors in a multiple regression with the five moral foundations as dependent variables, but no significant differences were found.

Finally, a one-way ANOVA was conducted to look at differences between moral foundations; this analysis included participants from both order conditions, since they did not differ in our previous analyses. The ANOVA showed a significant overall difference (F(4,83) = 17.34, p < .001), and subsequent paired-samples t-tests were conducted to look at specific differences between foundations. These comparisons revealed that participants rated scientists significantly lower in Purity (M = 2.81, SD = .95) than in Fairness (M = 3.56, SD = .63; t(83) = -6.57, p < .001), Loyalty (M = 3.17, SD = .60; t(83) = -3.63, p < .001), Authority (M = 3.28, SD = .66; t(83) = -5.86, p < .001), and Care (M = 3.43, SD = .71; t(83) = -4.92, p < .001). Additionally, participants rated scientists significantly higher in Fairness (M = 3.56, SD = .63) than in Loyalty (M = 3.17, SD = .60; t(83) = 4.51, p < .001). While all these comparisons were significant with a Bonferroni-adjusted significance level of .001 (.01/10), none of the other comparisons were significant. The means for each moral foundation are shown in Figure 2, together with the outcome of the comparisons. Same superscripts indicate no significant differences (p > .05), whereas different ones indicate significant differences (p < .001).

Besides the main analyses, we checked for correlations between CRT scores, moral foundations, religiosity, political orientation, and familiarity with science. The results of these analyses are shown in Table 6 below, together with means and standard deviations for each of the variables. As in Study 1, we found a significant positive weak correlation between religiosity and political orientation (r(82) = .26, p < .05), indicating that the more religious people were also the more conservative ones. Finally, we found a significant weak to moderate negative correlation between religiosity and
the number of correct responses on the CRT, r(82) = -.30, p < .01, indicating that more religious people obtained lower scores on the CRT.

Figure 2. Averages for each moral foundation across conditions.

Table 6
Means, standard deviations and correlations between the variables in Study 2

Variable                     M       SD      1      2      3      4      5      6      7      8      9
1 CRT                        1.90    1.22    –
2 Fairness                   3.56    .63     -.16   –
3 Loyalty                    3.17    .60     -.11   .17    –
4 Authority                  3.28    .66     -.04   -.06   .35*   –
5 Purity                     2.81    .95     -.10   .18    .39*   .64*   –
6 Care                       3.43    .71     .03    .50*   .08    .07    .06    –
7 Religiosity                34.63   40.64   -.30*  .15    .17    .20    .27**  .03    –
8 Political Orientation      37.00   26.31   -.02   .08    .09    .23**  .27**  .02    .26**  –
9 Familiarity With Science   1.89    0.31    .07    -.10   -.03   .23**  0.00   -.10   .01    .05    –

Note. * p < .01, ** p < .05.

Discussion

Our second study was more exploratory in nature, as to our knowledge it was the first one investigating the possible effects of cognitive reflection on a moral stereotype. We aimed to integrate the dual-process theory of morality (Greene et al., 2004) with our investigation of the (im)moral stereotype of scientists and to see whether such a stereotype could be affected by induced or dispositional reflection. To this end, we reasoned that reflection, which has been associated with utilitarian judgments (Paxton et al., 2014), could improve the overall perception of a scientist’s morality if scientists are perceived to be associated more with maximizing utility (i.e., utilitarian) than with what is right (i.e., deontological). However, this was not the case, since the ratings on the five moral foundations did not differ between participants in the CRT-first condition and those in the MFQ-first condition, nor between participants with different CRT scores (across conditions). Our results thus suggest that (induced or dispositional) reflection does not improve the moral stereotype of scientists. Yet, it must be noted that our study had a smaller sample than we expected (especially the CRT-first condition, which had 39 participants instead of 50), due to the number of participants that had to be excluded from the analyses; hence, it is advisable to replicate our study with a bigger sample to further validate our results. Additionally, it is possible that we did not observe an effect because our manipulation failed to elicit a reflective state, and this might have happened for two reasons. First, due to our limited time and budget, we could only use the three items of the CRT, whereas other research has used a wider battery of items to elicit and measure reflection (e.g., Pennycook et al., 2014). This should not be a problem since the CRT alone has also been used to elicit cognitive reflection (Paxton et al., 2012), but a future replication attempt should use the full battery of items and see whether that would change the outcome of the study. Second, our research relied on online data collection and this could have led to impersonal participation (Evans & Mathur, 2005), with participants taking part in the study while doing other things or responding superficially, thus failing to actually engage in reflection. This possibility could be investigated by future research, for instance using the design of previous studies involving the CRT (e.g., Paxton et al., 2012) and trying to replicate their results using a pen and paper version and an online version, to see whether the results are similar (as well as in line with the literature). Therefore, even though the results of our second study were not significant, we obtained a number of valuable insights that can be used to improve this type of investigation in the future.

The other aim of this study was to replicate the original results of Rutjens and Heine (2016), and we successfully did so. In fact, the average ratings for each moral foundation in our study were very similar to those observed in the original research, as shown in Table 7 below.

Table 7
Means for each Moral Foundation in the current and original research

                          Research
Moral Foundation   Current*      Original**
                   M (SD)        M (SD)
Fairness           3.57 (.63)    3.66 (.68)
Loyalty            3.17 (.60)    3.04 (.57)
Authority          3.27 (.66)    3.33 (.79)
Purity             2.81 (.95)    2.76 (.90)
Care               3.42 (.71)    3.48 (.90)

Note. * Study 2, Santoro (2016); ** Study 8, Rutjens & Heine (2016).

Our results thus offer further support to those of Rutjens and Heine (2016), and provide a clear picture of the explicit moral stereotype of scientists: although our ratings suggest that they are not considered to be particularly evil, they do seem to be perceived as lacking in the purity foundation, which was significantly less associated with scientists than the other foundations. This suggests that rather than necessarily immoral, scientists might be perceived to be amoral, in that they do not mind ‘getting their hands dirty’ (i.e., do not mind impurity) for the sake of science; however, these are only speculations and should be investigated in the future, as discussed below.

General Discussion

Summary of the Studies

Taken together, our results yielded important insights. In our first study, we saw that scientists were associated the most with weird behavior and the least with severely immoral behavior, in line with our predictions. In the second study, we found no effect of cognitive reflection on the moral stereotype of scientists, but we found that scientists were not considered to be evil, although they somewhat lacked in purity. The two studies had some limitations that were discussed, and that proved to be useful in informing future research. Due to these limitations, our results should be replicated before drawing firm conclusions: for Study 1, future research should try to replicate and extend our results using different scenarios; for Study 2, a replication should be conducted with a bigger sample, with the full battery of items used to induce reflection in previous research (e.g., Pennycook et al., 2014), and using a pen and paper version of our design.

Additional Findings

In addition to the main results already discussed, our two studies also offered interesting correlational evidence on the relationship between CRT scores, religion, and politics. In both studies, we found a significant weak to moderate positive correlation between political orientation and religiosity: in Study 1, r(752) = .38, p < .01; in Study 2, r(82) = .26, p < .05. These results show that more conservative people tend to be more religious, which is in line with previous research in the field (e.g., Pennycook, Cheyne, Seli, Koehler, & Fugelsang, 2012). Moreover, we found a significant weak to moderate negative correlation between religiosity and the number of correct responses on the CRT, r(82) = -.30, p < .01. This result is also in line with previous research on the link between analytical thinking and religiosity, suggesting that being inclined to apply analytical thinking could increase people’s willingness to question and be skeptical about religious beliefs (Pennycook, Fugelsang, & Koehler, 2015). Therefore, even though these results were not the main focus of our research, they provided significant evidence in support of relevant lines of research and further validated the quality of our data.
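The correlations reported above can be sanity-checked from r and the degrees of freedom alone. The sketch below uses the Fisher z-transformation, which is a normal approximation and not necessarily the test used in the original analysis (a t-test on r is the more common choice). The sample size is taken as df + 2, and the helper name is hypothetical.

```python
import math


def fisher_z_p_value(r: float, df: int) -> float:
    """Approximate two-tailed p-value for a Pearson correlation.

    Uses the Fisher z-transformation: atanh(r) * sqrt(N - 3) is treated as
    a standard normal deviate, with N = df + 2 (an assumption of this sketch).
    """
    n = df + 2
    z = math.atanh(r) * math.sqrt(n - 3)
    # Two-tailed normal tail probability via the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Correlations reported in the text: (label, r, df).
REPORTED = [
    ("Study 1: religiosity x political orientation", 0.38, 752),
    ("Study 2: religiosity x political orientation", 0.26, 82),
    ("Study 2: religiosity x CRT score", -0.30, 82),
]

for label, r, df in REPORTED:
    p = fisher_z_p_value(r, df)
    print(f"{label}: r({df}) = {r:+.2f}, approx. p = {p:.4f}")
```

With these inputs the approximation agrees with the reported thresholds: p < .01 for r(752) = .38 and r(82) = -.30, and .01 < p < .05 for r(82) = .26.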

The Moral Stereotype of the Scientist

Summing up, our investigation of the intuitive and explicit perceptions of the scientists’ morality formed an image of a scientist that is not necessarily evil, but rather perceived as weird and possibly amoral, for example as someone who can disregard morality for the sake of science; this is in line with what was suggested by Rutjens and Heine (2016). Considering that the results of Study 2 successfully replicated those of the original research, it is important to understand why the results of Study 1 did not. As discussed, a possibility is that our results were confounded by the materials we used, since even though we used the same scenario as the original research (i.e., necrobestiality) our scenario had different wording.

To tackle this issue, future research should try to validate the scenarios used in MFT research on several scales, including those suggested in our research (i.e., weirdness, severity). These ratings can then be taken into account, together with the wording, to avoid confounded results when using the scenarios in a specific study. Furthermore, to investigate the moral stereotype of scientists more clearly, future research should use our Study 1 design with more scenarios and see which ones are more strongly associated with scientists. In particular, since our study contained only one example of weird behavior, research should use a variety of weird scenarios to investigate whether scientists are truly perceived as weird, and check which odd behaviors they are the most associated with; notoriously weird (e.g., a rock star) or normal (e.g., an average Joe) targets could be used as control groups, to look at how they compare with scientists. Finally, considering that in Study 2 we found scientists to be rated low on purity as in the original research, future research should investigate this perception in more detail. For instance, to further understand the nature of this moral stereotype, different types of impurity scenarios could be used to see which are associated the most with scientists. A possibility could be to investigate our hypothesis that scientists might be considered impure to the extent to which they are unscrupulous for the sake of science, as illustrated in our example of the ‘vomit-eating’ Stubbins Ffirth.

It is necessary to discern whether the public perceives scientists as merely odd or as capable of immoral behavior, since a negative stereotype can have serious consequences. For instance, the general public’s opinion of GMOs is affected by how these are perceived as unnatural and immoral, regardless of the scientific evidence offered in their support (Blancke et al., 2015). A negative stereotype could then affect the public’s adherence to new practices suggested by scientists, and before planning interventions to increase trust in scientists and their recommendations, it is crucial to truly understand the nature of this moral stereotype.

We now have an answer to our original question: “Evil, or Weird”? Our results formed a more positive image of scientists than the ‘evil scientist’ we started with. We found that they can be perceived as weird but also as somewhat lacking in purity, which could suggest a stereotype of scientists as amoral, perhaps in the sense that they could also set aside morality if needed. However, further research is needed to confirm our results and explain what (immoral and/or strange) behaviors are the most associated with scientists. Until then, we can take the stereotype of the immoral scientist with irony, and conclude our investigation with a relevant joke:

“I’m a scientist who’s researching bestiality between humans and chickens...

...I’ll be in my lab.”

References

Blancke, S., Van Breusegem, F., De Jaeger, G., Braeckman, J., & Van Montagu, M. (2015). Fatal attraction: The intuitive appeal of GMO opposition. Trends in Plant Science, 20(7), 414–418.
Buhrmester, M., Kwang, T., & Gosling, S. D. (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3–5.
Clifford, S., Iyengar, V., Cabeza, R., & Sinnott-Armstrong, W. (2015). Moral foundations vignettes: A standardized stimulus database of scenarios based on moral foundations theory. Behavior Research Methods, 47(4), 1178–1198.
Cuddy, A. J., Fiske, S. T., & Glick, P. (2008). Warmth and competence as universal dimensions of social perception: The stereotype content model and the BIAS map. Advances in Experimental Social Psychology, 40, 61–149.
Davies, C. L., Sibley, C. G., & Liu, J. H. (2014). Confirmatory factor analysis of the moral foundations questionnaire. Social Psychology, 45(6), 431–436.
Evans, J. R., & Mathur, A. (2005). The value of online surveys. Internet Research, 15(2), 195–219.
Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191.
Frederick, S. (2005). Cognitive reflection and decision making. The Journal of Economic Perspectives, 19(4), 25–42.
Gervais, W. M. (2014). Everything is permitted? People intuitively judge immorality as representative of atheists. PLoS ONE, 9(4), e92302.
Graham, J., & Haidt, J. (2012). Sacred values and evil adversaries: A moral foundations approach. The social psychology of morality: Exploring the causes of good and evil, 11–31.
Graham, J., Haidt, J., Koleva, S., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2012). Moral foundations theory: The pragmatic validity of moral pluralism. Advances in Experimental Social Psychology, Forthcoming.
Graham, J., Haidt, J., & Nosek, B. A. (2009). Liberals and conservatives rely on different sets of moral foundations. Journal of Personality and Social Psychology, 96(5), 1029–1046.
Gray, K., & Keeney, J. E. (2015). Impure or just weird? Scenario sampling bias raises questions about the foundation of morality. Social Psychological and Personality Science, 1–10.
Greene, J. D., Nystrom, L. E., Engell, A. D., Darley, J. M., & Cohen, J. D. (2004). The neural bases of cognitive conflict and control in moral judgment. Neuron, 44(2), 389–400.
Haidt, J. (2012). The righteous mind: Why good people are divided by politics and religion. Vintage.
Haidt, J., & Joseph, C. (2004). Intuitive ethics: How innately prepared intuitions generate culturally variable virtues. Daedalus, 133(4), 55–66.
Hardman, D. (2008). Moral dilemmas: Who makes utilitarian choices. Unpublished manuscript.
Hazelkorn, E., Ryan, C., Beernaert, Y., Constantinou, C. P., Deca, L., Grangeat, M., . . . Welzel-Breuer, M. (2015). Science education for responsible citizenship. Report to the European Commission of the Expert Group on Science Education.
Herzig, R. (2005). Suffering for science: Reason and sacrifice in modern America. Rutgers University Press.
Paxton, J. M., Bruni, T., & Greene, J. D. (2014). Are “counter-intuitive” deontological judgments really counter-intuitive? An empirical reply to. Social Cognitive and Affective Neuroscience, 9(9), 1368–1371.
Paxton, J. M., Ungar, L., & Greene, J. D. (2012). Reflection and reasoning in moral judgment. Cognitive Science, 36(1), 163–177.
Pennycook, G., Cheyne, J. A., Barr, N., Koehler, D. J., & Fugelsang, J. A. (2014). The role of analytic thinking in moral judgements and values. Thinking & Reasoning, 20(2), 188–214.
Pennycook, G., Cheyne, J. A., Seli, P., Koehler, D. J., & Fugelsang, J. A. (2012). Analytic cognitive style predicts religious and paranormal belief. Cognition, 123(3), 335–346.
Pennycook, G., Fugelsang, J. A., & Koehler, D. J. (2015). Everyday consequences of analytic thinking. Current Directions in Psychological Science, 24(6), 425–432.
Rutjens, B. T., & Heine, S. J. (2016). The immoral landscape? Scientists are associated with violations of morality. PLoS ONE, 11(4), e0152798.
Santoro, A. (2014). Effects of reflection and time on moral judgment.
Soetekouw, R. (2016). Controlling the cliché: De invloed van subjectieve controle op stereotypering.
Tversky, A., & Kahneman, D. (1983). Extensional versus intuitive reasoning: The conjunction fallacy in probability judgment. Psychological Review, 90(4), 1–61.