University of Nevada, Reno

Replication in Psychology: Approaches, Perceived Threat, Attributions, and Attitudes Regarding Replication Research and Outcomes Among Social Psychologists

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

by

Ryan S. Erhart

Dr. Clayton Peoples/Dissertation Advisor

May, 2019

UNIVERSITY OF NEVADA, RENO
THE GRADUATE SCHOOL

We recommend that the dissertation prepared under our supervision by

RYAN S. ERHART

entitled

Replication in Psychology: Approaches, Perceived Threat, Attributions, and Attitudes Regarding Replication Research and Outcomes Among Social Psychologists

Be accepted in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Clayton Peoples, Ph.D., Advisor

William Evans, Ph.D., Committee Member

Jennifer Lowman, Ph.D., Committee Member

Shawn Marsh, Ph.D., Committee Member

Dana Edberg, Ph.D., Graduate School Representative

David W. Zeh, Ph.D., Dean, Graduate School

May, 2019

Abstract

Psychological science is in a state of self-assessment, wherein an expansion of meta-research examining the reproducibility of psychological findings has emerged. The primary method of assessing the reproducibility of psychological science has been large-scale direct replication attempts. Initial estimates of replicability have been low, leading some to declare that psychological science is in a crisis of replication. The purpose of this dissertation is to (1) establish the normative research practices of a sample of psychological scientists, (2) examine how threat is associated with different approaches to replication, (3) compare the causes that are attributed to partial replication outcomes for oneself versus others, and (4) explore how various factors (e.g., normative practices, attributions) are related to attitudes toward replication research in psychology. A sample of primarily American social psychologists was surveyed (144 complete cases).

Prevailing approaches to replication were, on average, viewed favorably, and, contrary to expectations, respondents were not threatened by two replication-request scenarios.

Respondents assigned dispositional attributions (e.g., flexible research practices) as contributing to a partial replication outcome more strongly for a peer’s work than for their own work. A consistent finding emerged wherein more favorable attitudes toward, and support for, replication research were associated with greater assignment of flexible research practices to others and greater assumed selective reporting by others of studies that “work.” Combined, these findings suggest that, at the individual level, threat is not a formidable barrier thwarting replication, and that associating a partial replication outcome with flexible practices on the part of a peer fosters support for, and favorable attitudes toward, replication.

Keywords: replication crisis, attributions, attitudes, normative research practices, culture-of-science


TABLE OF CONTENTS Page

LIST OF TABLES ...... x

LIST OF FIGURES...... xii

CHAPTER

1 INTRODUCTION…………………………………………………………...... 1

2 LITERATURE REVIEW/BACKGROUND……………………………….... 6

Science, Epistemology, and Notable Assumptions about Science…...…...6

Post-Positivism and Social Construction Approaches to

Research ...... 8

Evidence-based practice approach ………………………...9

Grounded theory approach……………………………..…..9

Culture...... 10

Psychological Conceptualization of Culture……………………..10

Organizational Culture...... 11

Culture of Science...... 12

Culture of Social Science...... 12

The “Replication Crisis” in Psychology: Events/Issues that Led to It…...13

Data Fabrication………………………………………………….14

P-Hacking (i.e., Researcher Degrees of Freedom)………………15

Other Flexible Researcher Practices……………………………..17

Open Science Framework………………………………………..17

Replication: What it is, its Benefits……………………………………...18

Types of Replication……………………………………………..19

Attempts at Replication in Psychological Science………………………22

Open Science Collaboration (OSC)……………………………...22

“Many Labs”……………………………………………………..25

Registered Replication Report…………………………………...26

Data Colada………………………………………………………27

P-curve…………………………………………………...29

Barriers to Replication in Psychological Science………………………..30

Reluctance to Publish Replication Studies………………………31

The “Weaponization” of Replication: The Case of Amy Cuddy...32

Possible Reasons for Resistance to Replication in Psychology………….33

Normative Research Practices…………………………………...33

Threat Associated with Replication……………………………...33

Method-oriented versus effect-oriented approaches to

replication…………………………………………………..33

The OSC’s approaches to systematic replication………...35

Attributions Connected to Successful Versus Failed

Replications………………………………………………………36

3 RESEARCH QUESTIONS, HYPOTHESES, METHODS, AND

DATA/VARIABLES…………………………………………………..…….39

Research Aims and Hypotheses………………………………………….39

Research Questions (RQs) and Associated Hypotheses…………40

Preregistration ……………………………………………………...……42

Present Analyses…………………………………………………………42

Piloting of Materials……………………………………………………..43

Pilot recruitment……………………………………………...….44

Cognitive and follow-up interviews……………………………...44

Method…………………………………………………………………...45

Design & Procedures…………………………………………….45

Sampling…………………………………………………………46

Exploratory Data Analysis: Missing Cases, Exclusion, and

Recoding………………………………………………………....46

Survey Materials…………………………………………………………47

Replication Approaches……………………………………….…47

Partial Replication Outcome Conditions………………………...49

Role Identity……………………………………………………..49

Role Conflict……………………………………………………..50

Normative Research Practices…………………………...………50

Familiarity with Replication……………………………………..51

Structural Barriers to Replication………………………………..51

Target/Dependent Measures……………………………………..52

Attitudes toward replication……………………………..52

Health of the field………………………………………..52

Sample Characteristics…………………………………………………..52


4 ANALYSIS OF NORMATIVE RESEARCH PRACTICES (ANALYSIS 1)

………………………………………..……………………………..……….54

The State of Psychological Science………..…………………………….54

Normative Research Practices with the Present Sample………………....55

Emerging practices……………………………………………….56

Self-reported and perceived falsification of data………………...57

Flexible research practices………………...... 58

The practice of self-citing one’s published work……………..….58

5 ANALYSIS OF THREAT TOWARD DIFFERING APPROACHES TO

REPLICATION (ANALYSIS 2) …………………………………………....60

Results: Predictions…………………..…………………………………..61

Method-Oriented versus Effect-Oriented Approaches to

Replication……………………………………………………….61

Post hoc power analysis………………………………….62

Descriptive statistics. …………………………………....63

Positively-valenced measures..………………………..…63

Negatively-valenced measures..……………………….…63

Exploratory Results: Postdiction……………………….…………...……64

Examining Researchers’ Impression of Replication by Group:

Gender and Professional Status………………………………….64

Researchers’ impressions toward replication by gender…65

Researchers’ impressions toward replication by

professional status………………………………………..66

Discussion………………….…………...……………………….……….66

6 ANALYSIS OF ATTRIBUTIONS (ANALYSIS 3)……………….……..…71

Research Goals…………………………………………….……..71

Results: Prediction…………………………………………………….…72

Attributional Causes of a Partial Replication Outcome of a Peer’s

Versus One’s Own Work………………………………….……..72

Post hoc power analysis………………………………….73

Descriptive statistics. …………………………………....73

Dispositional causes assigned to a partial replication

outcome……………………………………………...…...74

Situational causes assigned to a partial replication

outcome.…………………………………………....….....75

Discussion……………………………………………………………..…76

7 ANALYSIS OF ATTITUDES TOWARD REPLICATION (ANALYSIS 4)

………………………………………..……………………………………...79

Exploratory Results: Postdiction………………..…………….……….…79

Examining Associations between Attributional Causes and Attitudes

Toward Replication……………………………..………….….…79

Factor analysis. ………..………………..………….…....80

Assumptions of multiple regression and post-hoc power

analysis…………………………………………………...81

Selection of variables to include across the regression

analyses………...... 82

Attributes and the Norm of Selective Reporting Regressed on

Attitudes Toward Replication……………………………………84

Association between attributes, a norm of selective

reporting, and attitudes toward replication research in

psychology……………………………………………….84

Association between attributes, a norm of selective

reporting, and support for replication research in

psychology..…………………………………………..….85

Association between attributes, a norm of selective

reporting, and attitude toward the discussion of replication

in psychology……………………………………..…..….85

Association between attributes, a norm of selective

reporting, and the impact replication research has on one’s

future research practices………………………………....86

Discussion…………..…………………………..…………….…...……..86

8 GENERAL DISCUSSION, CONCLUSIONS……………………………....93

General Discussion………………………………………………………93

Normative Practices……………………………………………...94

Perceived Threat toward Replication…………………………….94

Perceived Attributional Causes Assigned to a Partial

Replication……………………………………………………….96

Attitudes toward Replication…………………………………….98

Limitations to the Present Research……………………………..99

Future Directions and Recommendations………………………100

Conclusion…………………………………………………………...…101

REFERENCES…………………..…………………………..…………103

APPENDIX A: SURVEY ……………..……………………………….131


LIST OF TABLES

Page

TABLES

1 The Frequencies of Various Research Practices: Comparing Self-Reported and Perceived Frequency of Others’ Conduct (Analysis 1)………………..115

2 Descriptive Statistics for the Replication Measures by Approach (Analysis 2, n = 189) …………………………………………………………………...…116

3 Summary Statistics for Positively and Negatively Valanced Measures of Approaches to Replication (Analysis 2) ……………………………………117

4 Summary Statistics Comparing Impressions Toward Replication Among Men (n = 57) and Women (n = 93) on Positively and Negatively Valenced Measures (Analysis 2) ……………………………………………...………117

5 Summary Statistics Comparing Impressions Toward Replication Across Professional Status on Positively and Negatively Valenced Measures (Analysis 2) ……………………………………………...…………………118

6 Descriptive Statistics for Attributional Causes Researchers Assign to a Partial Replication Outcome of a Peer’s Work Relative to Their Own Work (Analysis 3) ……………………………………………...……………………………119

7 Summary Statistics for Attributional Causes Assigned to a Partial Replication Outcome of a Peer’s Work Relative to One’s Own Work (Analysis 3)…….120

8 Summary of Factor Loadings for 18 Attribution Items: Three Factors Emerged (Analysis 4)…………………………………………………………121

9 Summary Statistics of Variables Included in Four Regression Models Examining Attitudes Toward Replication Research in Psychology (Analysis 4, n = 144) ……………………………………………………………………122

10 Model Examining Attitude Toward Replication Research in Psychology (Analysis 4, Adj-R2 = 0.21) ……………………………………………...…123

11 Model Examining Support for Replication Research in Psychology (Analysis 4, Adj-R2 = 0.16) ……………………………………………………………...124

12 Model Examining Attitude Toward Discussion of Replication in Psychology (Analysis 4, Adj-R2 = 0.12) ………………………………………………...124

13 Model Examining Impact Replication Research has on Your Future Research Practices (Analysis 4, Adj-R2 = 0.03)………………………………………125


LIST OF FIGURES

Page

FIGURES

1 Means of Replication Measures by Approaches (Analysis 2, n = 189)...126

2 Perceived Utility of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2)………………………………..127

3 Perceived Reasonability of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2)………………………………..127

4 Perceived Encouragement to Participate in Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2)……………..…128

5 Perceived Threat of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2)………………………………..128

6 Perceived Antagonism of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2)………………………………..129

7 Perceived Sense of Distress Toward Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2)………………..129

8 Attributional Causes Contributing to a Replication Outcome of a Peer’s versus One’s Published Research (Analysis 3)……………………...….130


REPLICATION IN PSYCHOLOGY: APPROACHES, PERCEIVED THREAT, ATTRIBUTIONS, AND ATTITUDES REGARDING REPLICATION RESEARCH AND OUTCOMES AMONG SOCIAL PSYCHOLOGISTS

CHAPTER 1: INTRODUCTION

Science is traditionally understood as an accumulative process in which previous works are utilized and built upon to produce something new (i.e., scientific discovery).

Like other domains of society, science is composed of collections of institutions, groups, and individuals who share knowledge, beliefs, actions, and understanding. Thus, science is a culture. One facet of the culture-of-science deals with the revisiting of previous ideas and findings (i.e., replication). Scientific discovery and replication represent two distinct types of scientific processes—one focused on novel contributions and another focused on validating those contributions. Novelty and replication represent ends of a continuum within which scientific knowledge exists. The focus of this study will be replication in psychological science.

Psychological science conceptualizes replication as a tool to assess the validity, integrity, and ultimate reproducibility of research findings and effects (Makel, Plucker,

& Hegarty, 2012). In a typical replication, an independent team of researchers attempts to duplicate the findings or effects of an original piece of work using methodologies that are as close to identical as is feasible (Open Sci et al., 2015). Although attempts at direct replication in psychology have become more frequent, replication is still not widely accepted (Makel et al., 2012; Martin & Clarke, 2017). Moreover, replication has at times been “weaponized” in psychology such that high-profile cases of failed replication have resulted in the ending of professional careers (Ranehill et al., 2015; Simmons & Simonsohn, 2015) and the questioning of entire programs-of-research (Kahneman, 2012). This is unfortunate, because replication can potentially carry many benefits for a field, if used appropriately.

When used appropriately, replication is a healthy and necessary facet of science

(Shrout & Rodgers, 2018). Direct replication provides the means to establish the boundaries of a phenomenon. Furthermore, conceptual replication complements direct replication by examining the target effect through different procedures, materials, design, sample, or any combination of these methods. An effect is deemed “established” in part through replication. In short, replication has great utility, and could even be viewed as a necessary process to drive psychological science forward (Klein, 2014; Open Sci et al.,

2015). As such, it is critical that we work toward reducing the threat associated with replication and improve perceptions of it.

The purposes of this dissertation are fourfold: (1) establish the normative research practices of the sample in light of broader findings on normative practices among psychological scientists, (2) better understand how threat is associated with different approaches to replication, (3) compare the causes that are attributed to partial replication outcomes for oneself versus others, and (4) bring it all together by exploring how various factors (e.g. normative practices, attributions) are related to attitudes toward replication research in psychology.

To fulfill the four purposes outlined above, data were collected from a sample of social psychologists via a survey instrument (details to follow in Chapter 3). With these data, four sets of analyses were conducted. The first analysis uses descriptive statistics to explain the normative research practices of respondents. The second analysis examines whether two different approaches to replication—one that is method-oriented versus one that is focused on larger relationships/effects in the literature—are perceived differently

(e.g., threatening, antagonistic, distressful, useful, reasonable, and encouraging).1 The third analysis assesses the causes respondents attribute to a partial replication outcome of their own work versus a peer’s, representing a test of attribution theory. The fourth and final analysis ties various pieces together and explores how research practices and attributions are related to respondents’ attitudes about replication research.

More details on the above sets of analyses—as well as their results—can be found in Chapters 4 through 7, respectively. In summary, the results from the first analysis suggest that these respondents are quite similar to those surveyed in other studies in both their reporting of various research practices and their assumptions about their peers’ practices (Agnoli, Wicherts, Veldkamp, Albiero, & Cubelli, 2017; Fiedler & Schwarz,

2016; John, Loewenstein, & Prelec, 2012). The findings from the second analysis show, surprisingly, that scholars do not feel extremely threatened by replication, nor do they differ significantly in their views of two different approaches to replication.

The results from the third analysis show that respondents attribute dispositional factors (e.g., flexible practices on the part of the original researcher) more strongly to a partial replication of a peer’s work than to their own work. No differences emerged regarding situational causes attributed to a partial replication outcome of a peer’s versus one’s own work. Even so, respondents rated situational causes such as publication bias and procedural differences between the original and replicating studies as major contributors to a partial replication outcome.

1 These two approaches also diverge in that the method-oriented approach solicits feedback from the original authors, whereas the effect-oriented approach seeks the original authors’ approval of a replication protocol.

Lastly, findings from the fourth and final analysis suggest a number of interesting outcomes. Respondents hold favorable attitudes toward and support replication research in psychology. Correspondingly, respondents believe that many areas of psychological science (e.g., journal editors and publishers, reviewers, the field generally) ought to be more accepting of replication research. Significant associations emerged between the norm of selectively reporting studies and attitudes toward and support for replication research, wherein greater perceived prevalence of others’ selective reporting is associated with more favorable attitudes toward and greater support for replication research. A consistent association emerged between the attributed flexible actions of peers and attitudes toward and support for replication research. Specifically, attributing a peer’s partial replication outcome to that peer’s flexible practices was associated with more favorable attitudes toward, and greater support for, replication research.

An inverse association emerged wherein the self-reported norm of selectively reporting studies was associated with less favorable attitudes toward and support for replication research.

Although implications of the findings will be discussed in greater depth within each results chapter (again, Chapters 4 through 7) as well as in the concluding chapter

(Chapter 8), it is beneficial to briefly summarize here: The results provide some hope for the future viability of replication research in the psychological sciences. While it might be easy to assume that researchers feel threatened by replication research, this dissertation suggests that threat is not as prevalent as previously assumed. Moreover, the findings show that the two main approaches currently pursued by entities that actively engage in replication do not differ in the extent to which they are perceived as threatening. Consequently, threat may not be a significant barrier to replication going into the future. When looking at attributions, the findings of this dissertation suggest that attributions could actually further replication efforts in the psychological sciences, especially in the case of dispositional explanations for others’ studies that do not fully replicate. In a similar vein, skepticism about others’ (flexible) research practices is another factor that has the potential to advance replication going into the future, given that this skepticism is related to greater support for replication. Conversely, those who admit to engaging in flexible practices themselves are less supportive of replication. In sum, the future for replication research in psychology appears to be bright, and now is a prudent time to capitalize on these favorable views to push replication forward.


CHAPTER 2: LITERATURE REVIEW/BACKGROUND

Science, Epistemology, and Notable Assumptions about Science

Science as an entity (i.e., a domain of society) has been discussed and conceptualized since the 17th century (Kitcher, 1995). Science is a culture guided by assumptions, principles, and practices, which is composed of groups and individuals each working from different perspectives and experiences that interact to produce complementary and conflicting pieces of knowledge. In other words, the business of science is that of epistemology and methodology. Epistemology is any pursuit toward understanding the actual state of the world, constantly asking what do we know and how do we know what we know? Methodology refers to the actions (i.e., practices) used to better understand the state of the world. Epistemology and methodology are inherently connected (Trochim, 2006).

As one facet of science is to validate what is known through collective objectivity

(Charmaz, 2014), scientific knowledge has often been the center of epistemological debates in philosophy (Kuhn, 1963; Popper, 1959). Objectivity and rationality are ordering principles of science, and science is based on the pursuit of the “true state” of the world through an accumulative process (Kitcher, 1995). Thus, science is about making inferences and trying to predict the unknown using what is known (King, 2017).

Falsification and Scientific Revolution

Both Karl Popper (1959, 1962) and Thomas Kuhn (1963, 1970) were concerned with the acquisition of scientific knowledge. Working toward deductive reasoning,

Popper (1959) argued that a scientific theory’s worth was determined by 1) being testable and 2) surviving increasingly stringent attempts at disproof (Popper, 1959). The upper and

(Popper, 1959).

According to Kuhn (1963), science is based on agreed-upon achievements (i.e., paradigms) that are acknowledged by entities and groups of scientists. These notable achievements provide the foundation that signal future practices. A paradigm is any set of unprecedented achievements (i.e., accumulation of knowledge) that rally groups of researchers toward specific disciplinary practices (e.g., methodologies, theoretical approaches, guiding principles) and grant the capacity and flexibility to work unresolved problems (Kuhn, 1963). Scientific knowledge is entrenched in sociologically and psychologically bounded paradigms (Kuhn, 1970). Science and all scientific endeavors, entities, and groups are socially constructed, and are embedded in social contexts of shared practices, beliefs, and approaches (Berger & Luckmann, 1966; Kuhn, 1963).

Popper’s notion of falsification was widely adopted across science and psychological science alike. The null hypothesis significance test (NHST), a conditional probability test, inherently applies Popper’s notion of falsification by testing a nil relationship (i.e., disproving-hypothesis testing; Wilkinson, 2013). Kuhn (1970) makes two critical points in a comparative discussion of his and Popper’s philosophical arguments

Post-Positivism and Social Construction Approaches to Research

Objectivity and the notion of truth or truths are not standard across science. There are many blanket philosophical positions in science, but two are particularly relevant: post-positivism and social constructionism. Post-positivism was preceded by positivism, which assumed that there was a singular truth about the natural world waiting to be discovered (Charmaz, 2014) and that scientists could see the ‘true’ state of the world

(Trochim, 2006). This approach relies on empirical evidence and advocates for a universal scientific method that assumes a passive researcher collects facts (i.e., evidence, data), but does not have a hand in creating the facts (Charmaz, 2014). Post-positivism rejects these notions and instead acknowledges that all observations and methodologies are fallible, may be erroneous, and that scientific theory is amendable (Trochim, 2006).

Social constructionism, and, to a degree, post-positivism, assumes there are multiple truths created through the shared experience of individuals and groups

(Charmaz, 2014; Trochim, 2006). Science is the social creation and recreation of incoming knowledge, all of which is bound by context (i.e., time, historical climate, and prevailing worldviews) (Kuhn, 1963). Social constructionism assumes that a researcher 9 can never be completely removed from these contexts and that researchers often, when appropriate, do have a hand in creating the facts (Charmaz, 2014). A theory is said to be a scientific fact once collective-objectivity is reached through iterative studies indicating similar findings from a multitude of approaches and methodologies. This is referred to as triangulation.

Evidence-based practice approach. Mainstream science is largely based on a post-positivist approach (Charmaz, 2014; Kline, 2008). A manifestation of a post- positivist approach to science can be found within the evidence-based practice (EBP) approach. The overall goal of the EBP approach is to improve patient outcomes (i.e., physical and mental concerns and ailments) through informed research that is based on the “best available evidence/knowledge” (APA, 2006). In short, EBP is defined as an approach that summarizes scientific evidence based on a hierarchy of research methodologies that are then used to guide the decisions to conduct research. This approach places quantifiable evidence at the top of the methodological hierarchy, assuming the practices that lend to the best quantifiable evidence lead to the creation of the most valuable forms of scientific evidence.

Grounded theory approach. Comparatively, a manifestation of a socially constructed approach to science can be found with the grounded theory approach.

Generally, grounded theory refers to a set of rigorous research practices, iterative interaction with relevant data (memo writing and analyses), and reflexive adaptation leading to the emergence of conceptual categories from data themselves (Charmaz,

2014). This approach is largely based on a classic social psychological approach, symbolic interactionism, which assumes: 1) society is a manifestation within the minds of 10 its members (Reynolds, 2005), 2) people act toward and upon things that have meaning,

3) meaning stems from social interactions among individuals and groups, 4) meaning is processed in an interpretative fashion, and 5) meaning provides a backdrop to beliefs and actions (Charmaz, 2014; Stryker & Vryan, 2003; West & Zimmerman, 1987). Grounded theory coupled with symbolic interactionism assumes that scientific evidence accumulates iteratively within the same project and is a product of a social interaction between the researcher(s) and participants/respondents. From this coupled approach, truth is plural and relative, knowledge is socially constructed, and evidence is validated by obtaining collective-objectivity.

Culture

There are essentially three main conceptualizations of culture: psychological, organizational, and culture-of-science. Each concept has its merits and limitations, but when combined, the three offer a more complete conceptual paradigm of culture than alone. Psychological conceptualization of culture emphasizes the passing and creation of shared knowledge (Heine, 2010). The organizational conceptualization of culture assumes that the elemental units of a culture are patterns of assumptions that serve as the backdrop to normative beliefs and behaviors (Schein, 1990, 1996). Lastly, the culture-of- science emphasizes the accumulation of a validated knowledge base through standard practices/behaviors (Merton, 1968).

Psychological Conceptualization of Culture

Psychology defines culture as any collection of people who share in common institutions, identities, and groups; meet regularly; have identifiable roles and statuses; share common behaviors and practices; are directed by common ideological systems; and 11 pass along and create knowledge among members, groups, and institutions within and across social and physical boundaries (Heine, 2010).

Organizational Culture

From an organizational perspective, culture is composed of patterns of assumptions that are socially constructed by a given group, which are used when learning

“to cope with the problems of external adaptation and internal integrations” (Schein,

1990, p. 111). Utilized assumptions “worked well enough to be considered valid and, therefore, [are] to be taught to new members as the correct way to perceive, think, and feel in relation to” the problems faced by the collective (Schein, 1990, p. 111). Internal- consistency (i.e., stability) of a culture is a product of the group’s length-in-existence, learning experience, and the extent that assumptions are clear and agreed upon among the leaders of the group (Schein, 1990).

Once a group has reached an agreed-upon set of assumptions, corresponding patterns of perceiving, thinking, feeling, and acting offer meaning and promote stability

(i.e., comfort; Schein, 1990). Thus, patterns of agreed-upon and common assumptions function as an anxiety-reduction mechanism (Schein, 1990). In other words, culture is a set of taken-for-granted implicit assumptions that are the backdrop to normative feelings, beliefs, and practices (Schein, 1996). A culture’s members rarely question these assumptions, but when sets of taken-for-granted assumptions are questioned such action may be perceived as counter-normative, instilling a sense of cultural instability (i.e., cultural change; Schein, 1990, 1996). Note, with established cultures, instability often occurs among sub-factions within a particular culture (Schein, 1990, 1996).


Culture of Science

The culture of science often refers to 1) sets of methods used to verify knowledge,

2) a collection of shared knowledge (i.e., evidence) from methodological practices, 3) a set of beliefs and norms that indicate what is deemed scientific behavior, and 4) any combination of these areas (Merton, 1968). These areas represent the essence or ethos of science (Merton, 1968). A distinguishing feature of the culture of science is the standardization of practices that strive to validate accumulating knowledge (Kuhn, 1963,

1970; Popper, 1959, 1962). This feature inherently requires the shifting and revising of common assumptions; however, scientific knowledge and practices that are deemed non- controversial are the more implicit/taken-for-granted assumptions, which are often unquestioned.

Culture of Social Science

A culture of social science is any collection of people and groups who accumulate and enhance social-scientific knowledge through standard practices based on patterns of agreed upon assumptions that serve as the backdrop of what is deemed scientific belief and behavior (Merton, 1968; Schein, 1996). Some sub-factions of social science (e.g., social psychology) have begun to question prevailing research practices, challenging fundamental assumptions and beliefs of social science (John et al., 2012; Simmons,

Nelson, & Simonsohn, 2011). This cultural instability has been referred to as the

Replication Crisis/Discussion/Movement/Wars and Research Dependability Movement in psychological science (Engber, 2017; Klein, 2014; Lishner, 2015 Trafimow & Earp,

2016). The next section addresses events that are pivotal to understanding the Replication

Crisis in psychology. 13

The “Replication Crisis” in Psychology: Events/Issues that Led to It

As the debate around replication unfolds, common research practices (e.g., failing to report all of a study’s dependent variables) have been reframed as questionable practices, bordering on scientific misconduct (John et al., 2012; Simmons et al., 2011).

The notion of p-hacking (i.e., researcher degrees of freedom) has become widespread.

These developments in meta-research, along with other relevant events, have shifted the normative beliefs and practices of mainstream psychology, particularly within social psychology, and have prompted cultural instability and change.

Cultural instability is not the only avenue for change. Revised cultural norms that are perceived as trending can drive change if the normative trend is perceived as increasing into the future (Mortensen et al., 2018). For instance, if the emerging norm of constructing preregistration reports (i.e., formal declaration of a researchers’ hypothesis, methodology, intended analyses, and desired sample size a priori) is perceived as increasing in prevalence, then this may prompt increased use and endorsement to construct such reports. Furthermore, emerging prevalent norms guide behavior when they are consistent with both the cultural context (i.e., requiring the use of preregistration reports) and culturally approved behaviors (e.g., one ought to construct preregistration reports) (Cialdini, Reno, & Kallgren, 1990).

The field of social psychology has become the center of scientific change with significant events marking these moments of change. Many discussions and methodological changes have emerged from this cultural instability (e.g., data fabrication) and trending practices (e.g., incorporation of preregistration reports), some of which are epistemologically healthy, such as increasing use of data repositories and 14 support for transparency, reproducibility, and replication (Open Sci, 2017). Other changes have had corrosive effects, including shaming prominent researchers out of the field (e.g., Amy Cuddy; Engber, 2017).

Evidence suggests that statistically significant (Kühberger, Fritz, & Scherndl,

2014; U. Simonsohn, Nelson, & Simmons, 2014) and novel (Wang, Veugelers, &

Stephan, 2017) findings are disproportionally selected for peer-review publication over other scientific contributions. Other evidence suggests that replication studies make up a small percentage (1-2%) of the psychological literature (Makel, Plucker, & Hegarty,

2012). Furthermore, cases of scientific fabrication and falsification have increased (Steen,

Casadevall, & Fang, 2013), pressures for researchers to publish are high, and available research funds are scarce (Smaldino & McElreath, 2016). An analysis indicates that publish-or-perish cultural characteristics (i.e., early-career-researchers, researchers working in countries that incentivize publication with cash rewards) and data fabrication

(i.e., prevalence of published cases of intentional fabrication) are correlated (Fanelli,

Costas, Fang, Casadevall, & Bik, 2017). This suggests that competitive research environments may increase fabrication and falsification of data.

Data Fabrication

One of the most notable instances of instability was the investigation of prominent social psychologist Diederik Stapel for multiple allegations of data fabrication and scientific misconduct (Verfaellie & McGwin, 2011). Investigations led to 58 retractions of fraudulent research and Stapel returning his doctoral degree (Oransky, 2015;

“Retraction Watch Leaderboard,” 2018). This case evoked various indirect reactions, one of which suggested that the occurrence of false-positive rates in psychological science is 15 widespread (Simmons et al., 2011). This conclusion was based on an umbrella concept referred to as researcher degrees of freedom (i.e., p-hacking).

P-Hacking (i.e., Researcher Degrees of Freedom)

According to Simmons et al. (2011), common research practices in psychology amplified false-positive (i.e., incorrect rejection of the null hypothesis) rates in the psychological literature. These common research practices were often decision-points made by scientists when conducting research. These decisions were thought to have minimal down-stream impacts on analysis and findings; however, some contemporary evidence suggests the opposite (Simmons et al., 2011).

These decision-point practices have been referred to as researcher degrees of freedom. This term stems from the use of degrees-of-freedom in inferential statistics, which refers to the number of observations (i.e., sample size) minus the amount of active independent variables one has in their statistical model. With each additional independent variable, one loses the capacity to predict variability in the data. Thus, the more independent variables one adds to a statistical model the less wiggle-room there is to account for error in the model, which in turn can inflate the chance of false discoveries

(i.e., erroneous results; Type I error). According to Simmons et al. (2011), researcher degrees of freedom can stem from four common typologies: “flexibility in (a) choosing among dependent variables, (b) choosing sample size, (c) using covariates, and (d) reporting subsets of experimental conditions” (p. 1360). In short, the argument is that ambiguous research practices like the ones mentioned above introduce unchecked-error into statistical analyses, which capitalize on the chance to produce false-findings, or as

Simmons puts it: 16

“Everyone knew it was wrong, but they thought it was wrong the way it’s wrong to jaywalk…simulations revealed it was wrong the way it’s wrong to rob a bank.”…Simmons called those questionable research practices p-hacking, because researchers used them to lower a crucial measure of statistical significance known as the p-value. (Dominus, 2017; Simmons et al., 2011)
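The mechanics behind this argument can be illustrated with a small simulation. The sketch below is not part of the dissertation or of Simmons et al.'s materials; it is a minimal, hypothetical example (all parameter choices are illustrative) of how two researcher degrees of freedom—choosing among correlated dependent variables and optional stopping—push the false-positive rate well above the nominal 5% even when no true effect exists.

```python
# Minimal sketch (illustrative only): how flexible analysis choices inflate
# false-positive rates under a true null, in the spirit of Simmons, Nelson,
# & Simonsohn (2011). Parameters are hypothetical.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 5_000, 0.05
strict_hits = flexible_hits = 0

for _ in range(n_sims):
    # Two correlated dependent variables, no true group difference.
    cov = [[1.0, 0.5], [0.5, 1.0]]
    a = rng.multivariate_normal([0, 0], cov, size=20)   # group A, n = 20
    b = rng.multivariate_normal([0, 0], cov, size=20)   # group B, n = 20

    def any_significant(a, b):
        # "Flexible" analyst: tries DV1, DV2, and their average.
        for ya, yb in [(a[:, 0], b[:, 0]), (a[:, 1], b[:, 1]),
                       (a.mean(1), b.mean(1))]:
            if stats.ttest_ind(ya, yb).pvalue < alpha:
                return True
        return False

    # Strict analyst: one pre-specified DV, fixed sample size.
    strict_hits += stats.ttest_ind(a[:, 0], b[:, 0]).pvalue < alpha

    # Flexible analyst: multiple DVs plus optional stopping (add 10 per cell once).
    hit = any_significant(a, b)
    if not hit:
        a = np.vstack([a, rng.multivariate_normal([0, 0], cov, size=10)])
        b = np.vstack([b, rng.multivariate_normal([0, 0], cov, size=10)])
        hit = any_significant(a, b)
    flexible_hits += hit

print(f"false-positive rate, strict  : {strict_hits / n_sims:.3f}")   # near .05
print(f"false-positive rate, flexible: {flexible_hits / n_sims:.3f}")  # well above .05
```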

Simmons et al. (2011) developed a way to infer the presence of p-hacking and false- positives via a method they call p-curve analysis (Simonsohn et al., 2014). Such practices are possible contributors to instances of failed direct replication attempts (Open Sci et al.,

2015). The concept of p-hacking called into question the dominant methodological practices of mainstream psychology (Dominus, 2017) and created cultural instability and debate within psychology and social psychology alike. Note that p-curve analyses are not definitive and have limitations (Bruns & Ioannidis, 2016), and despite the widespread use of p-hacking behaviors, some evidence suggests it minimally impacts scientific consensuses (Head, Holman, Lanfear, Kahn, & Jennions, 2015).

Although the primary emphasis is on limiting false discoveries, the use of justifiable flexible practices may help balance against false negatives (i.e., wrongful retention of the null; Lishner, 2015). Thus, it is important to keep in mind that, at times, justifiable analytic decisions require some flexibility and that flexible and ambiguous research practices are not synonymous. It is assumed that the flexible practices discussed often confuse the production of scientific knowledge, but this does not mean all flexibility in analytic decisions should be abandoned. Instead, increased openness and transparency

Other Flexible Researcher Practices

At the same time that the p-hacking publication was released, John and colleagues

(2012) published self-report survey findings of questionable research practices (QRPs) from 2,000 U.S. psychologists. According to this publication, the self-report base-rates were as follows: 1) data fabrication, 0.6%; 2) claiming results are unaffected by demographic factors when unsure, 3%; 3) reporting an unexpected finding as having been predicted a priori, 27%; 4) excluding data upon seeing its impact on results, 38%; 5)

“selectively reporting studies that “worked,”” 45%; 6) rounding off a p-value, 22%; 7) stopping data collection prematurely, 15%; 8) neglecting to report all of a study’s conditions, 27%; 9) deciding to collect additional data after seeing whether results reach statistical significance, 55%; and 10) “failing to report all of a study’s dependent measures,” 62%. These data are limited (Fiedler & Schwarz, 2016); however, these findings have been replicated with a sample of Italian psychologists (Agnoli et al., 2017).

Despite limited data, this publication indirectly called into question the day-to-day practices of psychologists.

Open Science Framework

In 2012, Brian Nosek and Jeffrey Spies developed and launched an online platform, Open Science Framework (OSF; https://osf.io/ ). The OSF subsequently led to the creation and launch of the Center for Open Science (COS; https://cos.io/ ) whose mission “is to increase openness, integrity, and reproducibility of research.” This entity has been very influential in reshaping the day-to-day practices of psychological 18 researchers (Nelson, Simmons, & Simonsohn, 2018; Shrout & Rodgers, 2018) and spearheading systematic attempts that assess the reproducibility of psychological research

(Open Sci, 2015, 2017).

Replication: What it is, its Benefits

Replication functions as a cross-validating tool that, when done frequently, can add empirical strength to any body of scientific knowledge. The accumulation of scientific knowledge, in part, deals with the adjustment and enhancement of newly produced findings and establishing consistent reliable patterns of knowledge (i.e., establishing robust patterns; Martin & Clarke, 2017). This is the business of science. In essence, a piece of scientific knowledge, once validated through collaborative efforts and complementary programs-of-research, becomes less controversial and serves as foundational knowledge (Earp & Trafimow, 2015; Lakatos, 1974; Popper, 1959). Thus, the process of replication can strengthen emerging scientific discoveries (Makel &

Plucker, 2014) and correct previously established pieces of knowledge (Martin & Clarke,

2017). It is important to keep in mind that well-established theory is not a prerequisite for replication and that when there is an established theory, prediction depends more on auxiliary hypotheses than the theory in question (Earp & Trafimow, 2015; Trafimow &

Earp, 2016).

Replication can also reduce the risks posed by data fabrication, p-hacking, and other questionable research practices. Cases of data fabrication could be revealed through failed direct replication attempts (e.g., ad hoc analysis) (Simonsohn, 2013). Overly flexible and messy research practices could be revealed upon a replication inquiry

(Simmons et al., 2011). A replication attempt can lend support for a previously 19 established finding (e.g., OSC, 2015) and help identify spurious findings. Consider the published empirical claim for the existence of precognition (Bem, 2011), and the corresponding preregistered replication attempt which failed to lend support for the original claim (Ritchie, Wiseman, & French, 2012).

This series of cultural events mark discrepancies in how psychologists actually behave versus how they ought to behave. At the center of these events are sets of direct replication failures (i.e., replications that do not duplicate original findings) fueling the debate (Open Sci et al., 2015; Ranehill et al., 2015). Psychological science is in the midst of a paradigm shift (Kuhn, 1963). This paradigm shift has been referred to as the

Replication Crisis (Trafimow & Earp, 2016), Replication Movement (Engber, 2017),

Replication Wars (Klein, 2014), and Research Dependability Movement (Lishner, 2015).

Crisis implies that something, structurally or behaviorally, needs to change; while movement refers to collections of people enacting change on social-cultural systems and normative practices; and war indicates conflict and contention among culturally similar groups. Lastly, a Research Dependability Movement emphasizes the focus on the day-to- day research practices and the capacity of such practices to reproduce findings.

Combining these blanket-labels loosely tracks the trajectory of this paradigm shift. Next, it is important to conceptualize replication and its many facets.

Types of Replication

Many types of replication have been presented (for review, see Earp & Trafimow,

2015), but only those types widely used in psychology will be discussed. Lykken (1968) proposed three ideal-types (i.e., stereotypic anchors) of replication. The first is literal replication, which tries to duplicate the exact procedures used by the original authors, 20 mirroring the sampling procedures, experimental conditions, and measurement techniques. As Lykken points out, the closest one could come to a literal replication is to have the original authors run more participants through a recently established study.

The second type of replication is operational/procedural replication, which is a replication attempt that only duplicates the sampling procedures and experimental conditions of the original study. The purpose of this type of replication is to cross- validate the experimental conditions and procedures the original authors deem important enough to report in the methods section of the initial publication.

The third type of replication is constructive replication, where replicators strategically avoid reproducing the original authors’ practices (i.e., methods), but instead work only from a clear statement of the empirical facts. For instance, a replicating team working from a constructive approach would use the empirical factor of the original work

(e.g., group separation contributes to increased levels of inter-group prejudices) and test whether one can reproduce a similar finding by whatever means deemed scientifically appropriate.

Lykken’s (1968) replication types were later revised as either direct or conceptual replication (Makel et al., 2012). A direct replication is an experiment that is as close to the original work as possible, which attempts to cross-validate a particular finding (Makel et al., 2012). The replicators should attempt to duplicate procedure, conditions, manipulations, etc. of the original piece, avoiding any unnecessary alterations (Earp &

Trafimow, 2015). A conceptual replication deliberately deviates from the original work with the intent of cross-validating a theoretical finding, assuming it has been reliably established (Earp & Trafimow, 2015). In short, a direct replication is utilized to validate a 21 specific finding, while a conceptual replication is utilized to validate an underlying theory, phenomenon, or construct (Earp & Trafimow, 2015).

It is important to keep in mind that a single successful literal/direct, operational, or constructive/conceptual replication is not necessarily an indication of a well- established relationship that generalizes to a target population (Earp & Trafimow, 2015;

Lykken, 1968). Similar to the relationship between reliability and validity, reliability among a set of measures could be strong, but this is not necessarily an indication that the instruments capture what they are expected to measure (Lykken, 1968; Shadish, Cook, &

Campbell, 2002). A relationship (e.g., prominent foster prejudicial attitudes toward out-group members) is said to be robust (strongly valid) upon the cross-validation of phenomena from multiple research teams through multiple data sources, methodologies, and approaches across time (i.e., triangulation). Furthermore, it is important to keep in mind that a single replication attempt is just that, one attempt in a collection of other evidence. It is also important to frame replication on a continuum acknowledging that variations to direct and conceptual replications exist (Lishner, 2015).

Lykken’s (1968) conceptualizations of replication map on to contemporary definitions (Appelbaum et al., 2018). Specifically, the APA Working Group on

Quantitative Research Reporting Standards recommends that all replication studies be framed as a particular type: “direct (exact, literal) replication, approximate replication, or conceptual (construct) replication.” Similarly, a distinction is made between internal and external replication (Appelbaum et al., 2018). Internal replication is defined as an attempt using the same sample or some resampling or randomization method (e.g., training- test/hold-out data approach) to cross-validate empirical findings. External replication is 22 defined as an attempt that repeats a previously published or archived study to validate its empirical findings (Appelbaum et al., 2018).

Attempts at Replication in Psychological Science

The culmination of high-profile cases of data fabrication and alarming base-rates of QRPs (John et al., 2012) led to numerous systematic attempts to test whether psychological science is replicable (Maxwell, Lau, & Howard, 2015; Open Sci et al.,

2015). In order to produce isolating evidence to assess the replicability of psychological science, one often conducts a direct replication (Maxwell et al., 2015). Conceptual replications are beneficial, but any difference between original and replicated results can be attributed to procedural differences (Gilbert, King, Pettigrew, & Wilson, 2016;

Maxwell et al., 2015). Therefore, systematic direct replications are the way forward, as direct replications, relative to conceptual, are efficient in that they inherently control for more variation which may influence findings (Lishner, 2015). Thus, conceptual replication should be done when a series of direct replications have been established

(Lishner, 2015). A number of groups/initiatives that have become actively involved in direct replication in psychology.

Open Science Collaboration (OSC)

Open Science Collaboration (OSC), co-founded by the social psychologist Brian

Nosek, initiated the Reproducibility Project: Psychology (RPP). The RPP was composed of 270 individual researchers tasked with performing a direct replication of 100 published psychological studies released in 2008 (Open Sci et al., 2015). The sample frame was all published articles from 3 top-tier journals in psychology: Psychological Science (PSCI),

Journal of Personality and Social Psychology (JPSP), and Journal of Experimental 23

Psychology: Learning, Memory, and Cognition (JEP: LMC). These 3 journals were selected for their agreed-upon breadth in the psychological literature. Once selected, separate teams of researchers were tasked with attempting a direct replication of the last study in each published article. The last study was selected on the assumption that the first study was usually a preliminary study and selection of the last study standardized the process across all replication attempts.

Each team of replicators was encouraged to pre-register their intended course of action (i.e., construct a pre-data report), evaluate the replication attempt based on a priori criterion of statistical significance and effect size, and subjective criteria of answering the question “Did your results replicate the original effect?” (Open Sci et al., 2015; for supplement materials see https://osf.io/ezcuj/). From this massive effort, the OSC concluded that roughly half (47%) of the original findings replicated. This finding then led some to declare that psychological science was in a replication crisis.

A direct replication attempt of the OSC’s RPP (i.e., a replication-of-the- replications) was conducted and published (Gilbert et al., 2016). This direct replication attempt successfully reproduced the original authors’ findings that 47% of the replicated findings were within the original 95% confidence interval effect sizes. Gilbert and colleagues argued that the OSC replicators frequently could not or did not conduct direct replications and instead changed features of the experimental conditions or procedures, which included additional variability into each replication attempt. When this extra variability is accounted for, the rate of expected replication failure decreases. This led

Gilbert and colleagues to conclude that the replication rate reported by the OSC was lower than the true rate. Regardless, this debate about the RPP’s original findings highlights that each

An inherent limitation of any direct replication attempt is that the original study can never be exactly replicated (Gilbert et al., 2016; Nelson et al., 2018). At best, the replicating team can re-run the procedures on the same sample at a different time point.

Hence, it is important to base one’s trust in a body of knowledge on a series of replication attempts as opposed to just one attempt. Although the OSC’s RPP replicated 100 studies, each study was a single replication of the last study, from all selected 2008 publications.

OSC’s selection process for their inaugural project was appropriate for the primary research question, which was to assess how reproducible psychological science is. This approach, however, is limited to gauging the trustworthiness (i.e., replicability and reproducibility) of distinct psychological phenomena. The OSC should be praised for the past and present efforts; however, the RPP was most helpful in initiating the necessary momentum regarding replication and reproducibility, and criticism of the project led to healthy scientific discussion and future directions. This criticism also pointed out two alternative approaches to systematic direct replication: “Many Labs”

(Klein et al., 2014, 2018) and Registered Replication Report (Simons, Holcombe, &

Spellman, 2014).


“Many Labs”

Richard Klein (2014), in collaboration with over 30 independent labs from various nations, conducted direct replications of 13 psychological effects across 36 independent samples resulting in 6,344 participants. The initiating group chose 13 classic and contemporary psychological effects that varied in replicability; some were known to be replicable, while others were unknown. From this, all studies were bundled together in an “easy-to-administer experiment that was delivered to each participating sample through a single infrastructure (http://projectimplicit.net/)” (Klein et al., 2014, p. 143).

Collaborators were recruited via OSC and SPSP (Society of Personality and Social

Psychology) list-serves. Each replicating team had to administer the agreed upon protocol, collect data from at least 80 participants, post a video documenting the setting and procedure simulation, and document any change to the protocol.

The Many Labs project differs in two notable ways from the RPP: First, a standard protocol of procedure was registered and used. This standardizes the methodology across replication attempts so that laboratory and country differences could be assessed. Second, the Many Labs researchers set out to replicate psychological effects as opposed to specific studies within a specific selection criterion. This allowed for conditional comparison of lab versus online conditions and US versus international samples. Findings for this systematic replication (i.e., Many Labs) indicated that replicable findings may be more dependent on the effect than the sample or research setting. A similar approach to replication is the Registered Replication Report approach.


Registered Replication Report

The Registered Replication Report (RRR) approach is very similar to the Many

Labs approach with the added purpose of subjecting a series of replicated studies to a meta-analysis. RRRs compile a set of studies from a variety of laboratories that all follow an identical, vetted protocol designed to reproduce the original method and finding as closely as possible. "By combining the resources of multiple labs, RRRs provide the ingredients for a meta-analysis that can authoritatively establish the size and reliability of an effect" (Simons et al., 2014, p. 552).

The steps of an RRR are as follows: first, a proposal is submitted to an agreeable editor. This proposal needs to argue the value of conducting the replication, specifically its scientific interest, its practical implications, and whether there is uncertainty in the size of the effect due to a lack of direct replications or controversy in the literature. Additionally, this approach requires that Registered Replication Reports be published regardless of outcome. Thus, the targeted effect detailed in the replication proposal needs to be vetted by the agreeing editor and original authors as an appropriate candidate for an RRR.

The second step is to develop an accurate protocol, in which the editor asks the original authors to evaluate the proposal and to develop "manipulation checks or boundary conditions necessary to measure the effect accurately" (Simons et al., 2014, p. 553). This approach is intended to have the replicating team and original authors work collaboratively toward the common goal of creating an accurate protocol. Third, after the protocol is finalized, the journal solicits other laboratories to join the replication effort. Interested labs then submit a proposal detailing how they plan to follow the protocol and the relevant expertise they bring to the replication effort. Effect size is the primary measure reported for all effects, both from individual labs and from the meta-analysis.

The potential for statistical bias from both p-hacking and publication bias is eliminated by establishing preregistered protocols and the guarantee of publication a priori (Simons et al., 2014).
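To make the meta-analytic logic that RRRs rely on concrete, below is a minimal sketch of inverse-variance (fixed-effect) pooling of per-lab effect sizes. It is illustrative only: the lab estimates are invented, and published RRRs typically also report random-effects models, so this is not the actual procedure of Simons et al. (2014).

```python
import numpy as np

def fixed_effect_meta(effects, variances):
    """Inverse-variance weighted (fixed-effect) meta-analysis.

    effects   : per-lab effect-size estimates (e.g., Cohen's d)
    variances : squared standard errors of those estimates
    Returns the pooled effect, its standard error, and a 95% CI.
    """
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(weights * effects) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, se, ci

# Hypothetical results from five replicating labs
labs_d = [0.21, 0.05, 0.18, -0.02, 0.11]   # effect-size estimates
labs_se = [0.10, 0.12, 0.09, 0.11, 0.10]   # standard errors
pooled, se, ci = fixed_effect_meta(labs_d, np.square(labs_se))
print(f"pooled d = {pooled:.3f}, SE = {se:.3f}, 95% CI = [{ci[0]:.3f}, {ci[1]:.3f}]")
```

Each lab contributes in proportion to the precision of its estimate, which is what allows an RRR to "authoritatively establish the size and reliability of an effect."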

The three approaches discussed above account for the majority of systematic replication attempts in psychology and represent the prevailing ways of conducting systematic direct replication (for a review, see Shrout & Rodgers, 2018). However, a small group of researchers, known for identifying other researchers whose results seem to replicate with low frequency, have developed what is, in essence, a fourth approach (Shrout & Rodgers, 2018).

Data Colada

In 2013, Leif Nelson, Joseph Simmons, and Uri Simonsohn launched a blog,

Data Colada, on which "[p]osts involve quantitative analyses, replications, and/or discussions of interest to at least three behavioral scientists" (Simmons, Nelson, & Simonsohn, 2017).

Although this group is composed of only three researchers, they have been influential in shaping research practices within psychology. For instance, they are responsible for the seminal publication that conceptualized the term p-hacking (Simmons et al., 2011). To comprehend Data Colada, it is important to understand their perspective regarding the

Replication Crisis in psychology and a related technique they developed to detect selective reporting called p-curve analysis.

Nelson, Simmons, and Simonsohn (2018) label the Replication Crisis a renaissance rather than a crisis. They focus their discussion of replication in psychology on detecting false-positives in the literature (i.e., Type I errors) and how to prevent them.

They contend that, because of p-hacking, the published literature is flooded with studies that would otherwise have failed. In particular, they argue that the file-drawer explanation is incorrect. The file-drawer explanation holds that statistically significant (i.e., successful) studies are submitted for publication while non-significant (i.e., failed) studies are hypothetically placed in a file drawer (Nelson et al., 2018). This explanation assumes that researchers conduct one predetermined statistical analysis, which is often not the case, as p-hacking suggests. According to Nelson et al. (2018), a study was filed only when p-hacking failed. In addition, psychologists have known for the past 50 years that the literature is littered with underpowered studies (Cohen, 1962), yet underpowered studies were not failing to reach significance, precisely because of p-hacking (Nelson et al., 2018). "P-hacking is the only honest and practical way to consistently get underpowered studies to be statistically significant"

(Nelson et al., 2018, p. 515, emphasis in the original).

The outcome of recent systematic replication attempts suggests that false-positives and underpowered studies are common in the literature (Nelson et al., 2018). False-positives stem from three types of behaviors: p-hacking, human error, and fraud. The last of these is a deliberate action, while the first two are often unintentional. Each is difficult to detect, and false-positives contaminate the literature and may bias meta-analyses (Nelson et al., 2018). "The end result of a meta-analysis is as strong as the weakest link; if there is some garbage in, then there is only garbage out" (Nelson et al., 2018, p. 528, emphasis in original). Meta-analytic thinking may thus amplify the problems created by the prevalence of published false-positives.

P-hacking, and as a consequence false-positives, can be prevented with disclosure and preregistration (Nelson et al., 2018). Disclosures are statements from authors accompanying each publication that mention all measures, conditions, and data exclusions, and how the sample size was determined. Preregistration is an a priori proposed plan of action, registered before any data cleaning or analyses take place.

According to Data Colada, the pervasiveness of false-positive findings in the literature has contributed to the Replication Crisis, but disclosure wards off error and fraud, and preregistration protects against p-hacking tendencies. In light of this, they argue that studies should be subject to audits and that researchers should design studies and conduct research assuming they could be audited:

For example, journals could require authors to provide information on exactly when (i.e., specific dates and times), exactly where, and by whom the data were collected. Journals could then do the routine fact checking that newspapers do (Nelson et al., 2018, p. 527).

They further argue that those not convinced by a failure to replicate need to make testable claims about the conditions under which the effect is expected to replicate and the conditions under which a failure to replicate would be informative (Nelson et al., 2018).

P-curve. Simonsohn et al. (2014) offer p-curve analysis as a means to detect whether a body of literature will produce replicable results. A "p-curve is the distribution of statistically significant p-values from a set of studies" (Nelson et al., 2018, p. 524, emphasis in the original). By convention, psychological science treats p < .05 as statistically significant. Using this convention, one can construct a frequency distribution of p-values within a selected body of literature (Simonsohn et al., 2014). P-curves should produce three notable types of distributions. First, if there is no effect, the p-curve should appear uniform/flat. Second, if there is a true effect, the distribution will be right-skewed, meaning there should be more values between .01 and .02 than between .04 and .05. Conversely, if the distribution of p-values is left-skewed, with many values between .04 and .05, it suggests that p-hacking, selective publishing, or both are prevalent in the selected literature. Taken together, a p-curve can be applied to a body of literature to detail its distribution of p-values. If the p-values cluster between .01 and .02, then the literature is replicable and should be trusted; if the p-values cluster between .04 and .05, however, then it is likely that the literature is not replicable and should not be trusted.
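To illustrate this logic concretely, the following sketch (not part of the dissertation; the p-values are invented) bins a set of significant p-values into the intervals conventionally plotted in a p-curve.

```python
import numpy as np

def p_curve(p_values, bins=(0.0, 0.01, 0.02, 0.03, 0.04, 0.05)):
    """Bin statistically significant p-values (p < .05) into the five
    intervals conventionally displayed in a p-curve."""
    sig = np.asarray([p for p in p_values if p < 0.05])
    counts, _ = np.histogram(sig, bins=bins)
    return dict(zip(["<.01", ".01-.02", ".02-.03", ".03-.04", ".04-.05"], counts))

# Hypothetical p-values extracted from a set of published studies
published_ps = [0.003, 0.011, 0.049, 0.042, 0.008, 0.038, 0.021, 0.047, 0.002, 0.044]
print(p_curve(published_ps))
# Mass piling up near .05 (a left-skewed p-curve) would suggest p-hacking or
# selective reporting; mass concentrated near .01 suggests a true effect.
```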

Barriers to Replication in Psychological Science

Although some data indicate that 62% of researchers who received a request to conduct a replication replied and willingly shared research materials (Abernethy & Keel, 2016), this rate has changed minimally over the past 30 years (Reid, Rotfeld, & Wimmer, 1982). Although efforts to replicate work in psychology have increased in recent years (see discussion of organizations/initiatives above), replication is still not a widely accepted practice (Makel & Plucker, 2014; Makel et al., 2012; Martin & Clarke, 2017). This can be seen in both a) the reluctance to publish replication studies and b) the fallout that occurs when replication is used inappropriately (e.g., 'weaponized'). Each of these themes will be discussed briefly below, followed by a discussion of specific possible reasons for the resistance to replication. This dissertation will ultimately seek to better understand these barriers so we can break them down and open the door to more replication in psychology.


Reluctance to Publish Replication Studies

An analysis of the top 100 psychology journals (those with the highest average 5-year impact factor) from 1900 to 2009 indicates that the term replication (replicat*) was used in 1.57% (5,051 of 321,411) of analyzed articles (Makel et al., 2012). Additional analysis of 500 randomly selected articles containing "replicat*" indicated that 68.4% were actual replication studies, establishing a 1.07% base-rate for replication in the psychological literature (Makel et al., 2012). Despite this low percentage, articles published after 2000 show a 2-fold increase in the use of the term "replicat*", suggesting that the concept of replication in psychological science has received increased attention but is still minimally represented.

Similar to what has been found in the published literature, psychology journals are only somewhat open to replication at present. In a study of more than 1,000 journals in psychological science, Martin and Clarke (2017) find that just 3% express openness to replication pieces in their submission guidelines. Focusing on the specific wording used in journal submission guidelines, four types of journals emerged: journals that accepted replication; journals that neither stated acceptance of nor discouraged replication; journals that implicitly discouraged replication by favoring original/novel work; and journals that explicitly discouraged replication (Martin & Clarke, 2017). Furthermore, only about 4% of journals in General (4 of 103), Cognitive (5 of 123), and Social (4 of 93) psychology were accepting of replications, while zero journals in Clinical psychology (0 of 78) were accepting of replication. It appears that replication is not widely published or accepted by journals in psychology. Many researchers argue this is changing (Nelson et al., 2018; Shrout & Rodgers, 2018), but the norm is still disfavor toward replication—perhaps because of how it is sometimes (mis)used.

The “Weaponization” of Replication: The Case of Amy Cuddy

Amy Cuddy is a famous social psychologist whose work on the "power pose" was widely publicized via a TED talk (Cuddy, 2012) and a peer-reviewed publication (Carney, Cuddy, & Yap, 2010). Cuddy's research was not subjected to a direct replication attempt by the OSC; instead, an independent team of researchers conducted a replication (Ranehill et al., 2015), and two prominent social psychologists known for occasionally showcasing direct replication attempts on their blog, Data Colada (Simmons & Simonsohn, 2017), reworked the literature via p-curve analysis and subsequently published their findings as a commentary in Psychological Science (Simmons & Simonsohn, 2015). They concluded that, together, the p-curve analysis of the 33 studies cited in the review of the literature (Carney, Cuddy, & Yap, 2015) and the Ranehill et al. (2015) replication suggest that the behavioral and physiological effects of power-posing lack empirical support.

The fallout was swift and punishing for Cuddy. She was criticized in online commentary, and her entire program of research came under scrutiny. She went from being a rising star in the field with a prestigious Ivy League post to an exile. As a result of the controversy surrounding her research, Cuddy eventually stepped down from her tenure-track position at Harvard. It is worth noting that there is no evidence indicating Cuddy behaved in an unethical way at any point in her career; and, interestingly, the first and third authors of the original (Carney et al., 2010) and review (Carney et al., 2015) studies did not experience similar treatment.


Possible Reasons for Resistance to Replication in Psychology

There are many possible reasons why some strong resistance to replication remains in psychology. A few possibilities of great interest to the author are: normative research practices (and associated QRPs), threat associated with replication, and attributions connected to successful versus failed replication attempts.

Normative Research Practices

As noted earlier, research by John and colleagues (2012) has led us to question the day-to-day research practices of psychologists. They find that many psychologists report engaging in what could be deemed QRPs. This finding alone justifies a greater emphasis on replication as a buffer against such practices biasing the literature; but it also raises the possibility that individuals’ openness toward replication may vary according to their own practices. Such a possibility is worth exploring, and this dissertation includes appropriate analyses to begin examining this potential relationship.

Threat Associated with Replication

The aforementioned groups (OSC, Many Labs, RRR, and Data Colada) responsible for the recent systematic replications in psychology are similar in that each is concerned with assessing the reproducibility of psychological science; however, some approaches are relatively more antagonistic or threatening, while other approaches are more constructive. Approaches to replication can be grouped into two different categories that I term "method-oriented" versus "effect-oriented."

Method-oriented versus effect-oriented approaches to replication. Method-oriented approaches target specific findings for replication; importantly, in the process they cross-validate the measures, materials, and/or procedures original researchers used to capture the original results. An effect-oriented approach targets known relationships or

“effects” in the literature and does not necessarily test specific measures, materials, and/or procedures. Inherent in a method-oriented replication is the testing of the exact measures used by original authors. Conversely, an effect-oriented replication is focused on devising standard methods necessary to test and detect the targeted effect. The distinction to keep in mind between method-oriented and effect-oriented approaches is the replication of a specific methodology (method-oriented) versus the replication of a general association/relationship between variables (effect-oriented). The former is a direct replication attempt that uses identical procedures to detect the original finding, whereas the latter uses a standard methodology to establish the effect in a generalizable sense.

This author would contend that a method-oriented approach is more threatening than an effect-oriented approach. With a method-oriented approach, the focus is on specific studies and the methods the original authors used. As such, if a study fails to replicate or partially replicates, it becomes easier to blame the original researchers (e.g., sloppy data, questionable practices, fraud). With an effect-oriented approach, there are more alternative explanations available should the study partially replicate—explanations that could allow the original authors to avoid some of the responsibility/fault (e.g., different methods, older versus newer statistical techniques). In essence, a method-oriented approach implicitly places more of the focus—and criticism, should the replication fail or partially replicate—on the original authors; an effect-oriented approach is more about the general relationship/effect rather than anything amoral or inept on the authors' part.

The OSC’s approaches to systematic replication. The OSC has been influential in shifting the field of psychology toward increased transparency by advocating and practicing preregistration and the sharing of data and other supplemental materials with the scientific community. This sentiment and practices are constructive for science; however, consider OSC’s Reproducibility Project: Psychology (RPP), which systematically targeted specific studies from 2008 for replication. The OSC’s intention were never malicious; however, inherent in their process for selecting studies for replication, they 1) targeted a specific study, and consequently those researchers responsible for it, and 2) minimally included original authors in the replication efforts.

This is not to say that the OSC excludes original authors; in fact, all original authors were contacted and asked to provide their input. However, the methodology used by the RPP replicating team inherently separates the original authors from the project, which could imply antagonistic sentiments and foster a sense of threat associated with replication research.

Alternatively, the Registered Replication Report (RRR) approach 1) selects a known effect, 2) submits a proposal arguing the value of the RRR to an agreeable editor, 3) if approved, has the proposal vetted by the original and replicating authors with the purpose of creating a sound replication protocol, 4) uses the finalized protocol to solicit other labs to participate, requiring that they follow the agreed-upon protocol, and 5) commits the editor to publishing the RRR regardless of outcome, thus controlling for publication and selection biases (Simons et al., 2014). Similarly, the Many Labs approach targets psychological effects as opposed to specific studies and makes efforts to standardize the methodologies across labs.

Attributions Connected to Successful Versus Failed Replications

A failed replication can potentially lead to negative attributions toward the original authors. They might be viewed as untrustworthy, and their research could be labeled as careless or invalid. Such attributions—especially those targeted toward the researchers themselves—could shape attitudes toward replication. As such, attributions are important to examine.

By one definition, an attribution is the outcome of a process that attempts to understand and explain the reason and cause for an observed, imagined, or thought-about behavior (Moskowitz, 2005). Intent is important in establishing the reason behind an action. We seek meaning from our environment by observing our actions and the actions of others, and from this we infer what our and others' intentions are or were. For instance, the cause of a failed replication may be attributed to lax procedures or flexible research practices on the part of the original authors (i.e., dispositional causes). It could also be attributed to differences between the original and replicating studies or to sampling error (i.e., situational causes).

Divergence between self and other perceived attributions is well established

(Pronin, Ross, & Gilovich, 2004). In particular, when people hold impressions different from their peers', there is a tendency to ground their understanding of others' behaviors and beliefs as the result of negative (Ross, 2018) or less normatively accepted actions and thoughts (Pronin et al., 2004). In daily life one may notice that people perceive others' actions as more extreme or less normative than their own actions.

Personal reality is subjective but perceived as objective by the beholder. Thus, people are convinced they perceive the world 'as it is' and that the differing actions and beliefs of others stem from those persons' dispositions (Pronin et al., 2004). This phenomenon also extends to group dynamics, where people tend to be biased toward their immediate in-group (e.g., family, ethnic community, area of research) relative to opposing out-groups (Vivian & Berkowitz, 1992). Keep in mind these phenomena are grounded in culture, specifically perceived discrepancies between what others ought to do versus what one actually does within similar situations.

It is important to keep in mind that the latest direct replication attempts have tested specific tasks with the purpose of validating specific findings. Replicable findings are not a direct indication of a healthy area of scientific knowledge per se; direct replication in and of itself does not validate a construct, but only the tasks (i.e., methodologies) used to capture the original findings (Klein, 2014; Trafimow & Earp,

2016). Thus, a series of replications may lend support for an effect, and a single replication attempt may lend support for a specific finding. With the burgeoning discussion of these varying issues it appears that the psychology community does not widely consider these limits (Klein, 2014; Nelson et al., 2018; Shrout & Rodgers, 2018;

Trafimow & Earp, 2016). As a consequence, targets of replication are not being treated accordingly, meaning that in light of a partial replication some researchers are prematurely discredited (e.g., the case of Amy Cuddy). Therefore, perceived causal attributions and attitudes are crucial to understanding the dynamic issues surrounding replication research in psychology.

The well-established self-other difference in perceived attributions (Ross, 2018) may manifest when one is attributing the causes of a partial replication outcome of a peer's work versus one's own work. A partial replication outcome is neither a success nor a failure; however, it is often framed as a complete success or failure. Thus, researchers may be more apt to attribute negative or less normatively respected causes to a partial replication outcome of another's work relative to their own work. In other words, the causal attributions a researcher assigns to a partial replication outcome may differ when the attempt is a duplication of their own published work versus a peer's published work.

After all, when one’s research is targeted for replication, the outcome, if unfavorable, can lead to the questioning of one’s integrity and credibility. Upon the outcome of a replication, others may begin to question and attribute explanations and reasons for the outcome (attribution). This climate may prime a sense of threat that is associated with replication generally. Shrout and Rodgers (2018) note, when cases of data fabrication are prevalent, it primes the scientific community to be suspicious of one another.

Consequently, attribution is worth examining given its possible links with attitudes toward replication, more generally.


CHAPTER 3: RESEARCH QUESTIONS, HYPOTHESES, METHODS, AND DATA/VARIABLES

Research Aims and Hypotheses

The present research has four primary aims: (1) To establish the normative research practices of the sample in comparison with others' findings concerning normative/questionable practices among psychological scientists (see Agnoli et al., 2017; Fiedler & Schwarz, 2016; John et al., 2012; and Washburn et al., 2018 for comparison percentages).

(2) Determine if some approaches to replication (e.g. method-oriented) are viewed as more threatening or antagonistic relative to other approaches (e.g. effect-oriented). (3)

Identify the causes social psychology researchers believe contribute to a partial replication outcome, and how replication outcomes are perceived more generally— specifically, how do people attribute the intent of the original authors (self or other) when a replication outcome results in a partial duplication of the original work? (4) Link all three of the above together in an exploratory analysis of how normative practices and attributions are connected to attitudes toward replication research.

The overarching purpose of the research is to help break down barriers to replication in psychology. Given that psychological science is currently faced with a "replication crisis," it is critical that the field move forward in constructive ways. One way forward is to make replication more of an accepted norm in the field. Psychology journals still shy away from publishing replication studies, and recent high-profile cases of failed replication have involved the "weaponization" of replication to ruin people's careers (e.g., the case of Amy Cuddy). If psychological science is to move forward, it is important to understand the attitudinal and social consequences of replication research.

Furthermore, replication—used appropriately—will be key to this effort. This research seeks to contribute to the conversation on how this can be accomplished by helping us better understand three possible barriers that presently hold back replication efforts: normative practices, threat, and attribution.

Research Questions (RQs) and Associated Hypotheses

RQ 1: What are the research practices of respondents? How do they compare with respondents in other studies (e.g., Washburn, 2018)? What do they believe are the research practices of others?

Hypotheses: There are no hypotheses connected to the first set of research questions given that they are geared toward simply establishing the patterns/views of this sample of respondents.

RQ 2: Is there a sense of perceived threat, animosity, and distress associated with replication? If so, does this perception toward replication vary by approach?

Hypotheses: There are four hypotheses related to the second set of research questions, as below.

Hypothesis 1: The method-oriented approach will appear more threatening and antagonistic, and will evoke more distress among researchers, relative to the effect-oriented approach to replication.

Hypothesis 2: The method-oriented approach will encourage the researcher’s participation in the process less relative to the effect-oriented approach to replication.

Hypothesis 3: The method-oriented approach will appear less useful than the effect-oriented approach to replication.

Hypothesis 4: The method-oriented approach will appear less reasonable, as a request, than the effect-oriented approach to replication.

The purpose of the above four hypotheses is to test the assumed threat associated with replication. Many discussions have alluded to an assumed level of perceived threat that researchers may hold toward replication, but little to no empirical evidence has examined this assumption.

RQ 3: Are dispositional attributions assigned as contributing to a partial replication outcome of a peer's work more than to one's own work?

Hypotheses: There are two hypotheses connected to the third set of research questions, as below.

Hypothesis 5: Relative to a partial replication outcome of one’s own work, researchers will attribute more bias (i.e., less normative researcher practices) and negative causes as contributing to the partial replication outcome of a peer’s work.

Hypothesis 6: Researchers who believe that false-positives (i.e., false findings/discoveries) are widespread in the psychological literature will disproportionately attribute the outcome of a replication attempt to poor intentions (e.g., p-hacking or other QRPs) on the part of the original authors.

The rationale for the above hypothesis stems from the fact that, although false-positive rates in the literature are difficult to calculate, many have estimated possible rates through simulated data (Smaldino & McElreath, 2016) and other means such as p-curve analysis (Simmons et al., 2011; Simmons & Simonsohn, 2017), and it is often argued that the presence of false-positives in the published literature undermines replication efforts (Nelson et al., 2018).

RQ 4: What is the association between the aforementioned variables (e.g., normative practices, attributions) and attitudes toward replication?

Hypotheses: There are no hypotheses associated with the fourth research question given that it is exploratory in nature and there is little/no guidance in the literature on how things like normative practices and attributions might impact researchers’ views of replication.

In addition to the aims, research questions, and hypotheses stated above, it is important to note that some role identity questions were asked in the data collection process. It is unclear how researchers who experience greater role strain will vary in their attitudes toward replication research and normative research practices, relative to researchers who experience less role strain. Thus, exploratory analysis will examine these relationships; however, this is not a central concern of the dissertation study itself.

Preregistration

All survey materials and the preregistration can be found in an Open Science Framework (OSF) repository (https://osf.io/ank9g/). The preregistration was completed during data collection, but prior to any data cleaning or analyses. Preregistration is an emerging best practice that, when utilized, helps protect against researcher degrees of freedom and promotes the practice of open science (Nosek, Ebersole, DeHaven, & Mellor, 2018).

Present Analyses

To meet the stated aims of the dissertation—and answer the above four sets of research questions (and, by extension, test the associated hypotheses)—data were collected from a sample of social psychologists (more detail on the sample to follow) via a survey. With those data, four sets of analyses were conducted. As noted already in

Chapter 1, the first analysis uses descriptive statistics to elucidate the normative research practices of respondents; the second analysis examines whether two different approaches to replication are perceived differently; the third analysis assesses the causes respondents attribute to a partial replication outcome; and the fourth analysis explores how research practices and attributions are related to respondents' attitudes about replication research.

A copy of the survey can be found in Appendix A. In summary, to answer the first set of research questions (Analysis 1), contextual measures of normative practices, involvement with replication and research generally, and demographic variables were gathered. These measures provide contextual details such as typical research practices and perceived prevalence, as well as respondents' general involvement with research (e.g., "What percentage of your job is committed to research?"). To answer the second set of research questions (Analysis 2), items were included that captured respondents' feelings toward a method-oriented versus effect-oriented replication request. To answer the third set of research questions (Analysis 3), items were included that asked respondents to assign causal attributions to a partial replication outcome of a peer's and of their own published work, respectively. Finally, to answer the fourth set of research questions (Analysis 4), various measures capturing attitudes toward replication research were included. (Again, see Appendix A for a copy of the survey instrument.)

Piloting of Materials

It is important to assess untested measures; thus, materials were piloted. Before detailing the method of the current research project, this author will detail the pilot testing and measurement assessment that was conducted prior to the administration of the final survey. Institutional Review Board (IRB) approval was obtained prior to pilot testing.

Pilot recruitment. Word-of-mouth/snowball sampling was utilized to recruit 16 participants for the purpose of pilot testing the survey instrument. I distributed the pilot recruitment email on a local list-serve at the University of Nevada, Reno. Subsequently, two colleagues were approached and agreed to distribute the survey through two local list-serves within the Arizona State University psychology community. In addition, two peers, upon seeing the recruitment email, offered to distribute the survey to other researchers, which I agreed to.

Combined, these list-serves reached approximately a couple hundred social psychology and psychology researchers. Unfortunately, response was low, amounting to only 16 respondents. This is a very small but useful dataset, so response patterns were examined with caution. Variability appeared to emerge in response choices. Additionally, colleagues who reviewed the survey raised concerns about potential order effects, and after further discussion and thought, counter-balancing some measures was deemed appropriate. The two replication approach scenarios, the self and peer attribution conditions, and the self and other normative research practices questions were all counter-balanced. In addition, four interviews were conducted with the purpose of refining the survey measures.

Cognitive and follow-up interviews. I conducted one cognitive interview, in which the respondent took the survey aloud, and three follow-up interviews. The four interviewees were of varying professional statuses (graduate student, post-doc, and professor). The professor was an expert in replication research. Apart from minor adjustments (e.g., dropping 'approach' from the replication request narrative) to clarify survey questions, changes made based on the four interviews were minimal, but the following changes are notable. First, the majority of interviewees reported that the term 'replication controversy' was ambiguous. This same issue was raised by a colleague who reviewed the instrument; thus, use of this term in the survey was dropped. Second, one respondent indicated that the feedback replication request was vague in that the replicating researcher would continue despite the original author's feedback. Thus, a few key words were emphasized, and components of each scenario were slightly refined to clarify the feedback-approval distinction across the replication approaches.

Additionally, it became clear that framing the recruitment email and survey toward only social psychology researchers was too restrictive for the social psychology community. Social psychology is an eclectic field attracting researchers from various content areas. To help appeal to the whole of the social psychology community, while still targeting those involved with research, 'social psychology' was dropped from both the recruitment email and survey.

Method

Design & Procedures

Using Qualtrics software, a survey (again, see Appendix A) was administered soliciting social psychology researchers to report on their attitudes and beliefs toward, as well as their experience with, replication in psychology. Respondents were recruited from members of two social psychology organizations (see Sampling). If, upon reading the recruitment email, respondents agreed to participate, they were first presented with the two replication approaches (counter-balanced) and the self-peer partial replication scenarios (counter-balanced), followed by questions about their and others' research practices (counter-balanced), some role identity questions, measures of respondents' familiarity with replication research, structural barriers to replication, and demographics. On average, respondents took about 17 minutes to complete the survey.

Sampling

Two social psychology organizations, the Society for Personality and Social Psychology (SPSP) and the European Association of Social Psychology (EASP), were contacted. The hope was to utilize the same sampling method other researchers have employed, in which a recruitment email is sent directly to all members of an organization (Washburn et al., 2018); however, only EASP was accommodating. Thus, a direct recruitment email was sent to over 1,000 EASP members, and the survey was distributed and advertised by an EASP point of contact.

SPSP members were solicited via an ‘Open Forum’ discussion board with the title

“Seeking input on Replication Research!” The initial post went out at the end of October

2018. Two re-posts occurred, each approximately two weeks apart. Data collection concluded on December 17th, 2018. EASP members were sent an initial recruitment email at the beginning of November; however, this population was not solicited further.

Exploratory Data Analysis: Missing Cases, Exclusion, and Recoding

Upon piloting and feedback, it became evident that all responses were, in part, contingent on one's familiarity and experience with conducting research. Thus, it was decided to add a filter question asking respondents whether they were involved with research. In addition, respondents were asked whether their research has been published and how many peer-reviewed publications, including research under review, they have. A total of 273 respondents began the survey; however, 63 (23%) did not participate beyond these three initial questions. These respondents were excluded from the final sample. The final sample was 210 respondents. Of the 210, two indicated they were not involved with research; however, upon investigation, one of these respondents had skipped the question and subsequently indicated they had 20 peer-reviewed publications, and the other reported being a student of social psychology. Therefore, both cases were assumed to be involved with enough research to form attitudes toward replication research, and neither was excluded from analysis.

Of the 210 respondents, approximately 25% dropped out of the survey at some point, leaving complete data for approximately 150 respondents. Given the small sample size, a stringent rule excluding respondents who failed to complete 80% or more of the survey was not appropriate. Instead, Little's test of whether the data are missing completely at random (MCAR) was performed. This test was nonsignificant (p = 0.482), meaning we fail to reject the null hypothesis that there is no pattern in the missing data; thus, the data are treated as missing completely at random. In addition, all data were assessed for consistency and recoded when appropriate.
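Little's test is typically run through a statistical package. As a rough illustration of the underlying idea only—and not Little's omnibus test itself or the dissertation's actual procedure—the sketch below checks whether respondents who skipped a (hypothetical) item differ on another observed variable, which is the kind of dependence that would argue against MCAR. All variable names and values are invented.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical survey data: an attitude item with missing values, plus publication count
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "attitude": np.where(rng.random(150) < 0.25, np.nan, rng.integers(1, 6, 150)),
    "n_pubs": rng.poisson(15, 150),
})

# Compare publication counts for respondents missing vs. not missing the attitude item
missing = df["attitude"].isna()
t, p = stats.ttest_ind(df.loc[missing, "n_pubs"],
                       df.loc[~missing, "n_pubs"],
                       equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.3f}")
# A small p here would suggest missingness depends on n_pubs (i.e., not MCAR);
# Little's test generalizes this comparison across all variables and
# missingness patterns simultaneously.
```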

Survey Materials

Again, for complete copies of materials, see Appendix A.

Replication Approaches

Two replication approaches were adapted from an email exchange from a systematic replication project conducted by the OSC (Camerer et al., 2018). In an effort to gauge the replicability of social science research, the OSC conducted 21 replications of social science findings published in Nature and Science between 2010 and 2015. To facilitate this replication project, original authors were contacted and asked to provide feedback and necessary research materials. From this exchange, a method-oriented replication instrument was adapted. This replication approach states, "The best way to gain a better understanding of reproducibility is to study it by trying to replicate published results."

Additionally, the approach seeks feedback regarding the replication design. At its essence, this scenario represents the way in which systematic replication attempts are often communicated to original authors; ideally, it represents how one might receive notice that one's research has become the target of a direct replication.

Alternatively, an effect-oriented approach to replication was also created. This approach is identical to the method-oriented approach with the exception of two components: 1) this approach states, "The best way to gain a better understanding of reproducibility is to replicate robust hypotheses and findings," and 2) the effect-oriented approach seeks approval of the replication design.

The dimensions of feedback and the replication of published results are thought to represent the underlying construct of a direct replication. This is the prevailing approach to replication in the social sciences and may be the predominant form of replication research in psychology. Similarly, the dimensions of approval and the replication of robust findings are thought to represent the underlying construct of a conceptual replication.

Each replication approach is followed by three positive measures (useful, reasonable, encouraged) and three negative measures (antagonistic, threatened, distressed). Each measure is on a 6-point scale anchored from 1 (Not at all) to 6 (Extreme), so higher values indicate greater endorsement that the approach is useful, reasonable, encourages one's participation, antagonistic, threatening, or distressing, respectively.


Partial Replication Outcome Conditions

To capture the perceived contributing causes of a partial replication outcome, two conditions, a self condition and an other condition, were constructed. Each condition provides a standard definition of a direct replication and asks respondents to imagine a partial replication outcome of their own/a peer's published work. Respondents were then asked to attribute the cause of the partial replication to 5 dispositional causes (four flexible research practices and selective reporting) and 4 situational causes (procedural differences between the original and replication studies, publication bias, sampling error, and an underpowered original study). Each causal attribution is on a slider scale from 0 to 100; higher values indicate greater endorsement of the attribute as a contributing cause of the partial replication outcome.

Role Identity

Role identity questions for both one's role as a researcher and one's role as an academic were adapted from Elliott and Doane (2015). These questions were originally designed to measure people's personal and social identity regarding their mental illness; presently, the same underlying assumptions about self and identity can be made. Thus, three personal identity measures (e.g., "My role [as a member of the scientific community/within an academic institution] is an important aspect of who I am") and three social identity measures (e.g., "I identify with other people who are within my [area of research/academic institution]") were designed; each was measured on a 6-point agree-disagree scale.

Based on factor loadings, one of the social identity measures, "I feel out of place being around other people who are members of my [research area/academic institution]," correlated poorly with the other items. Thus, it was decided to drop this item from both identity measures.

The remaining five items show good reliability for both the researcher (α = 0.84) and academic (α = 0.85) identities. Higher values indicate greater identification as a researcher and academic, respectively. Note that these two identity measures are significantly correlated (r = 0.524, p < 0.001); thus, to avoid a multicollinearity issue, only one of the two will be utilized in a given analysis.
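For reference, coefficient alpha can be computed directly from the item responses; the sketch below is a generic illustration with invented data, not the dissertation's actual items or analysis.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents x n_items) array.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)
    """
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five hypothetical 6-point identity items for ten respondents
rng = np.random.default_rng(0)
base = rng.integers(2, 6, size=(10, 1))                      # respondent-level baseline
responses = np.clip(base + rng.integers(-1, 2, size=(10, 5)), 1, 6)
print(round(cronbach_alpha(responses), 2))
```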

Role Conflict

Role strain (i.e., perceived work conflict and ambiguity) has been examined in organizational and workplace settings (Rizzo, House, & Lirtzman, 1970). Role conflict survey measures were adapted from a recent contribution to this area of research (Wilson & Baumann, 2015). Five questions, on 6-point agree-disagree scales, were adapted to capture one's role responsibilities as a researcher relative to one's role responsibilities as an academic (e.g., "I often neglect research responsibilities because of the demands of my academic institution"). The five items show good reliability (α = 0.94). Higher values indicate greater conflict between one's role as a researcher and one's role as an academic.

Normative Research Practices

In response to the replication crisis, a survey was recently administered to 1,000+ social and personality psychology researchers in an effort to gauge the prevalence of key open science practices (i.e., data and materials sharing and preregistration), the conducting of formal power analyses, and the reporting of effect sizes (Washburn et al., 2018). These questions were re-utilized for the present research with the addition of other measures. Using a 5-point scale anchored from 1 (Never) to 5 (Always), respondents were asked how frequently they and others preregister hypotheses, make data publicly available, conduct formal power analyses, report effect sizes, falsify data, claim results are unaffected by demographics when one does not know, report unexpected findings as predicted, selectively report only studies that "work," stop data collection early because desired results were found, round off p-values, collect data on multiple DVs and report only a subset, collect more data after looking to see whether results were significant, exclude data after looking at the impact on results, and self-cite one's published research.

Familiarity with Replication

It is important to capture researchers' familiarity with replication research. Thus, three questions were asked about the amount of formal training in, research conducted on, and participation in replication one has had. Each was measured on a 6-point scale anchored from 1 (None) to 6 (A lot). If respondents had participated in replication research, follow-up questions were asked about the outcome and publishing status of the most recent replication in which they participated. Respondents were also asked whether their work has been the target of a replication.

Structural Barriers to Replication

Another feature related to attitudes toward replication research is the perceived acceptance of replication research. Thus, agree-disagree statements were asked regarding the degree to which academic journals, publishers, peer reviewers, journal editors, academic institutions, and the field of psychology need to be more accepting of replication research. Each measure was anchored from 1 (Strongly disagree) to 6 (Strongly agree). Higher values indicate a belief that greater acceptance is needed.


Target/Dependent Measures

Attitudes toward replication. Three questions, each on a 5-point scale, asked respondents about their general opinion toward replication research in psychology, their support of replication research in psychology, and their confidence that the majority of findings in psychology will replicate. As an attention check, two of these items were reverse coded (general opinion and support), and these items were recoded accordingly. Higher values indicate more favor toward and support for replication research, and greater confidence that the majority of findings in psychology will replicate.
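Recoding a reverse-coded item on a 5-point scale simply reflects each response around the scale midpoint. The sketch below is a generic illustration; the column names, responses, and the composite score are hypothetical, not the survey's actual variables or the dissertation's scoring procedure.

```python
import pandas as pd

# Hypothetical responses to the three 5-point attitude items
attitudes = pd.DataFrame({
    "general_opinion_rev": [1, 2, 5, 4],   # reverse-coded item
    "support_rev":         [2, 1, 4, 5],   # reverse-coded item
    "confidence":          [4, 5, 2, 1],
})

# On a 1-5 scale, reverse coding is (scale max + 1) - response, i.e., 6 - x.
for col in ["general_opinion_rev", "support_rev"]:
    attitudes[col.replace("_rev", "")] = 6 - attitudes[col]

# Illustrative composite: higher values = more favorable attitudes toward replication
attitudes["attitude_composite"] = attitudes[["general_opinion", "support", "confidence"]].mean(axis=1)
print(attitudes[["general_opinion", "support", "confidence", "attitude_composite"]])
```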

Additionally, respondents were asked two questions about the discussion of replication and the extent to which replication research impacts their future research practices.

Both of these measures were on 6-point scales with higher values indicating greater favor or impact, respectively.

Health of the field. Using a 0-100 slider scale, two measures asked respondents for the perceived percentage of published research that is false and true, respectively. Specifically, these captured the percentage of published research believed to be false discoveries/false-positives or true discoveries/true-positives.

Sample Characteristics

Women composed 60% of the sample and men 37%. The majority of the sample

(71%) were American. Of the American respondents, 87% identified as White

American/Caucasian. Respondents ranged from 20 to 76 years old, with an average age of 37 and median of 33. Professional status is as follows: 17% Full Professor, 15%

Associate Professor, 18% Assistant Professor, 15% Post-doctoral Fellow, 2% Masters,

27% Graduate Student, 1% Adjunct Professor/Lecturer.

On average 57% of respondents’ jobs were dedicated to research. The majority of respondents had at least one peer-review publication and one respondent holds 140 peer- review publications. On average, respondents had 24 peer-review publications, with a median of 13 publications. On average, respondents had received ‘a little’ or ‘some’ formal training, and had conducted and participated in ‘a little’ or ‘some’ replication research, with a small majority having little to no familiarity with replication. A noticeable portion (48% or 101 respondents) had participated in replication research, while 17% of respondents’ work had been selected for replication. On average, respondents agreed that academic journals, Publishers, Peer Reviewers, Journal Editors, academic institutions, and field of psychology all need to be more accepting of replication research.

Descriptive statistics for the sample—particularly for each of the variables used in analyses—can be found in the analysis chapters (Chapters 4 through 7, respectively).


CHAPTER 4: ANALYSIS OF NORMATIVE RESEARCH PRACTICES

(ANALYSIS 1)2

The State of Psychological Science

As noted in Chapter 2, between 2010 and 2015 a series of events led many to claim that science is in a replication crisis, specifically psychological science. The first of these events was the high-profile case of data fabrication by a prominent social psychologist, Diederik Stapel (Verfaellie & McGwin, 2011). The second was a collection of studies that indicated alarming base rates of various questionable research practices (John et al., 2012), practices that inflate rates of false discovery (Simmons et al., 2011). Questions of reproducibility and replication quickly emerged as a central topic. As the evidence unfolded, once common research practices (e.g., failing to report all of a study's dependent variables) were reframed as questionable practices bordering on scientific misconduct (John et al., 2012; Simmons et al., 2011). Notions of p-hacking (i.e., researcher degrees of freedom) and flexible research practices have become widespread. The third event was the publication of the Open Science Collaboration's first systematic replication attempt of 100 psychology studies, wherein 47% of original effect sizes fell within the 95% confidence interval of the replication effect size (Open Science Collaboration, 2015). This finding placed serious doubt on the reproducibility of psychological science and led some to declare that the field was lacking in replication research (i.e., in a replication crisis). These developments, along with other relevant events, have shifted the normative beliefs and practices of mainstream psychology, particularly within social psychology (Nelson et al., 2018; Shrout & Rodgers, 2018).

2 Originally, it was proposed to conduct in-depth interviews to help gauge the 'lay-of-the-land' and the attitudes social psychologists hold toward replication; however, two review pieces (Nelson et al., 2018; Shrout & Rodgers, 2018) and an empirical publication (Washburn et al., 2018) emerged that helped establish the current state of psychological science. In addition, materials detailing actual replication request exchanges were made available from the OSC; these materials were used to create the replication approach instruments. Thus, it was determined that in-depth interviews were unnecessary.

Understanding of researchers’ attitudes and emerging practices in response to the state of social psychology have begun to emerge. Out of the reproducibility and replication debate many have argued for increased transparency through the practice of preregistration (Nosek, et al., 2018), but adoption of this reform is not widespread.

Similarly, survey data from 1,000+ social and personality researchers indicate that preregistration and data sharing are practiced infrequently (Washburn et al., 2018).

Resistance toward preregistration and data sharing stemmed from a lack of access, perceived necessity, and know-how. Although this is not the focus of the present studies, it is important to detail some sample characteristics and the frequencies of various normative practices. All in all, this will contextualize the day-to-day research conduct of the respondents in the present sample, and this summary, combined with the aforementioned research, contextualizes the state of social psychological science. Originally, it was proposed that I would conduct in-depth interviews to capture in detail the state of psychological science and attitudes toward replication; however, the state of psychological science is well captured by prior research, and an appropriate survey instrument was created from adapted real-world materials and expert feedback.

Normative Research Practices with the Present Sample

The research questions and hypotheses relevant to this chapter are concisely repeated below as a reminder:

RQ 1: What are the research practices of respondents? How do they compare with respondents in other studies (e.g., Washburn, 2018)? What do they believe are the research practices of others?

Hypotheses: There are no hypotheses connected to the first set of research questions given that they are geared toward simply establishing the patterns/views of this sample of respondents.

The analysis in this chapter sought to examine the frequencies of the same aforementioned research practices. Unlike previous research, the present study attempts to counter self-report bias by asking respondents to report on the perceived frequency of fifteen research practices for others as well as for themselves. For a summary of reported frequencies, see Table 1. The fifteen norms can be categorized into four typologies: emerging practices, a fraudulent practice, flexible practices, and the frequency with which one and others self-cite published work. The fifteen research practices will be discussed in turn. The design appears suited to Chi-square analysis, but the distribution of the data indicates that such analysis is not appropriate. In particular, the sampling distribution of the Chi-square test statistic is approximate; therefore, using a Chi-square test requires expected frequencies greater than 5 within each category (Field, 2012), and the data do not meet this requirement. Thus, no statistical comparisons between self and others' frequencies were performed.
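For context, the expected-frequency condition can be checked directly with a standard chi-square test of independence; the sketch below uses invented counts and is only an illustration of that check, not an analysis performed in the dissertation.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical self vs. other frequency counts for one practice
# (rows: self, other; columns: Never, Rarely, Sometimes, Often, Always)
table = np.array([
    [60, 45, 30, 12, 3],
    [10, 70, 55, 14, 1],
])

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")
if (expected < 5).any():
    print("Expected counts below 5 detected; the chi-square approximation is unreliable here.")
```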

Emerging practices. The frequencies of four prevailing research practices were measured: preregistration of hypotheses, public sharing of data, conducting formal power analyses, and reporting effect sizes. Consistent with previous estimates (Washburn et al., 2018), preregistration and the open sharing of data are rarely or never practiced, with 43% of respondents reporting that they never or rarely preregister their hypotheses. This practice appears to be emerging, however, in that 38% of respondents reported they often or always preregister their hypotheses. Likewise, respondents believed that others preregister at least some of the time (47%).

Approximately 32% of respondents reported rarely or never practicing open data (i.e., public sharing of data), while 47% reported that they share data often or always. Respondents believed that others share their data (51%) more than they reported doing so themselves (26%). The self-reported estimates are consistent with previous findings

(Washburn et al., 2018).

The two remaining practices, conducting formal power analyses and reporting effect sizes, were by far the most widely endorsed emerging practices, with the majority of respondents reporting they often or always conduct power analyses and report effect sizes. Generally, respondents believed they practiced both norms more frequently than their peers.

Self-reported and perceived falsification of data. Two of 155 respondents (1%) self-reported that they often falsify data. These two respondents, one a full professor (with 70 publications) and the other a graduate student (with 8 publications), reported that approximately 80% of their jobs consisted of research. In addition, four respondents (3%) reported they sometimes falsify data. These four respondents, two post-doctoral fellows and two graduate students (with between 3 and 12 publications), reported that approximately 41%, 81%, 90%, and 100% of their jobs consisted of research. Two full professors, with 20 and 55 publications respectively and with jobs consisting of 47% and 51% research, reported that they rarely falsify data. Although alarming, these frequencies are consistent with other established base-rates of falsification (Agnoli et al., 2017; John et al., 2012). Furthermore, it is clear, at least with these data, that the fabricating researchers are very involved with research and some have produced a sizable number of publications. Lastly, only 15% of respondents believed others never falsify data, while 78% believed others rarely falsify data.

Flexible research practices. The occurrence of various flexible research practices was assumed; therefore, the purpose was to establish crude frequencies of having done each practice. Findings are comparable to those of John and colleagues (2012). (For comparison purposes, the self-reported rates established by John et al., 2012, for each practice are reported alongside, when available.) Not surprisingly, the majority of respondents reported they never: claim results are unaffected by demographics when they are not sure (79%; John et al., 2012: 97%), stop data collection early after desired results are reached (89%; John et al., 2012: 84%), round off p-values just over .05 (86%; John et al., 2012: 78%), collect more data after seeing whether results were significant (59%; John et al., 2012: 44%), or exclude data after looking at the impact of doing so on results (62%; John et al., 2012: 61%). Conversely, the majority of respondents believe that others often selectively report studies that "work" (56%; John et al., 2012: 45%). The flexible practice of selectively reporting studies that work appears to have the most variability, with 5% of respondents reporting they always practice this. Consistent, however, is the belief that others (61% often) and respondents themselves (75% always) report all of a study's conditions.

The practice of self-citing one's published work. Frequencies for this measure were included as an exploratory measure with the intent of detecting any differences that might emerge. The majority of respondents (54%) believe that others often self-cite their published work, while only 20% of respondents reported always doing so themselves. Self-citing has not emerged as a primary issue, but arguably, when done in excess, it can inflate journal impact factor calculations, which subsequently 'muddies the waters.'

An a priori prediction was made (Hypothesis 6): researchers who believe that false positives (i.e., false findings/discoveries) are widespread in the psychological literature will disproportionately attribute the cause of a partial replication outcome to the poor intentions (e.g., p-hacking or other QRPs) of the original authors. Although the estimated percentage of false positives in the literature is positively correlated with assigning flexible practices as causal in explaining a partial replication outcome of a peer's work, the association is weak (no correlation greater than 0.289). Thus, support for this hypothesis was not obtained.


CHAPTER 5: ANALYSIS OF THREAT TOWARD DIFFERING APPROACHES TO REPLICATION (ANALYSIS 2)

As noted in Chapter 2, the recent systematic replication attempts in psychological science share the primary objective of assessing the reproducibility of published findings and effects. In particular, two prevailing typologies of replication have materialized: a method-oriented and an effect-oriented approach. A method-oriented approach to replication is a direct replication attempt that uses the original methods and procedures to detect a specific statistical finding. Conversely, an effect-oriented approach to replication is more akin to a conceptual replication, incorporating original materials to construct a replication protocol that is submitted for approval. It is assumed that method-oriented approaches toward replication are more antagonistic and threatening relative to effect-oriented approaches.

The purpose of Analysis 2 is to test this difference.

The research questions and hypotheses relevant to this chapter are concisely repeated below as a reminder:

RQ 2: Is there a sense of perceived threat, animosity, and distress associated with replication? If so, does this perception toward replication vary by approach?

Hypotheses: There are four hypotheses related to the second set of research questions, as below.

Hypothesis 1: The method-oriented approach will appear more threatening and antagonistic, and will evoke more distress among researchers, relative to the effect-oriented approach to replication.

Hypothesis 2: The method-oriented approach will encourage the researcher's participation in the process less than the effect-oriented approach to replication.

Hypothesis 3: The method-oriented approach will appear less useful than the effect-oriented approach to replication.

Hypothesis 4: The method-oriented approach will appear less reasonable, as a request, than the effect-oriented approach to replication.

Results: Predictions

Method-Oriented versus Effect-Oriented Approaches to Replication

To investigate differences between researchers' impressions of two types of systematic replication, a series of repeated measures analyses of variance were performed. The assumptions of repeated measures analysis of variance are: 1) independent observations, 2) normal distribution of variables, 3) linearity, and 4) homogeneity of variance across conditions/groups. Some analyses of repeated measures designs are sensitive to unequal groups (Keselman, Algina, & Kowalchuk, 2001). Utilizing listwise deletion in the present analysis equalizes (i.e., balances) the groups (method- and effect-oriented). Furthermore, no respondent consistently selected extreme values (i.e., 1-Not at all or 6-Extreme) across the six measures. Some extreme cases were noted, but no respondents were determined to be unduly influencing the linear relationship (i.e., outlying). All complete responses using listwise deletion were included in the analysis. Independence of observations is assumed. Normality, linearity, and homogeneity were assessed.

The data are moderately skewed, with skew values between -0.730 and 1.062. Both natural-log and square-root transformations were applied to the data, but neither helped to approximate a normal distribution. Normality requires that the sampling distribution of the target variable is normal. According to the Central Limit Theorem, in large samples (n ≥ 30) the sampling distribution will approximate a normal, bell-curve shape regardless of the population distribution (Field, 2012). In heavily skewed distributions, a larger sample size of 100 to 160 is necessary (Wilcox, 2010). Therefore, with the present data one can apply the Central Limit Theorem when the sample size is large enough (n ≥ 100) and the distribution is only moderately skewed. Thus, despite the skewed distributions in the present data, one can assume that the sampling distributions are normally distributed. The data are linearly related.
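As a sketch of the skewness assessment and the transformations mentioned above (the ratings below are simulated to mimic a bounded 1-6 scale and are not the survey responses), skew can be computed before and after each candidate transformation:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(42)
# Simulated, positively skewed ratings clipped to a bounded 1-6 scale.
ratings = np.clip(np.round(rng.gamma(shape=2.0, scale=1.2, size=200)) + 1, 1, 6)

print("raw skew:        ", round(skew(ratings), 3))
print("log skew:        ", round(skew(np.log(ratings)), 3))    # natural-log transform
print("square-root skew:", round(skew(np.sqrt(ratings)), 3))   # square-root transform
# Bounded rating scales often remain skewed after such transformations, which is
# why the text falls back on the Central Limit Theorem for large samples.
```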

Violations of homogeneity (i.e., sphericity, or assumed equal variance across conditions) are common with repeated/within-subjects designs, and a robust solution, the Greenhouse-Geisser correction, is available (Keselman et al., 2001; Keselman, Carriere, & Lix, 1993). Given the heterogeneity in the data, all reported statistics use the robust Greenhouse-Geisser estimates. Two-tailed hypotheses and a statistical significance criterion of 0.05 are assumed across all analyses.
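A generic sketch of a repeated-measures ANOVA reporting the Greenhouse-Geisser correction follows, assuming the pingouin package is available; the three conditions, variable names, and simulated ratings are illustrative only and do not reproduce the analyses reported here:

```python
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(0)
n = 190
conditions = ["a", "b", "c"]
long = pd.DataFrame({
    "respondent": np.repeat(np.arange(n), len(conditions)),
    "condition": np.tile(conditions, n),
    "rating": rng.normal(loc=np.tile([2.0, 2.2, 2.5], n), scale=1.0),
})

# correction=True reports sphericity-corrected (Greenhouse-Geisser) statistics
# alongside the uncorrected within-subjects test.
aov = pg.rm_anova(data=long, dv="rating", within="condition",
                  subject="respondent", correction=True)
print(aov)
```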

Post hoc power analysis. To determine whether the present analyses were adequately powered, a post hoc power analysis for a repeated measures, within-factors ANOVA was conducted. Taking the most conservative epsilon (ε) correction (i.e., correction for non-sphericity) of 0.34 (Faul, Erdfelder, Lang, & Buchner, 2007), with two groups, six measures, a sample size of 189, an alpha of 0.05, and a correlation of 0.5, the analysis was adequately powered (power = 0.959) to detect a ηp² = 0.02 (comparable to Cohen's small effect size f of 0.143). The present analyses were not adequately powered to detect any effect smaller than 0.143. Conversely, based on rule-of-thumb recommendations to protect against an increased probability of making a Type I error (Keselman et al., 1993), the sample size is adequate. Likewise, the sample size was adequate (minimum total sample size of 110 respondents) to detect an effect size of 0.25 assuming the same conservative non-sphericity correction of 0.34.

Descriptive statistics. For descriptive statistics see Table 1. For a visual display of means see Figure 1. On average, both replication approaches are deemed very useful, reasonable, and encouraging. Conversely, on average, both replication approaches are deemed minimally threatening, antagonistic, and distressful.

Positively-valenced measures. Regarding the three positively-valenced measures, researchers' impressions of the utility of the two replication requests did not significantly differ, F(1, 189) = 0.220, p = 0.639, ηp² = 0.001. Similarly, impressions of the reasonability of the request did not significantly differ, F(1, 189) = 0.129, p = 0.554, ηp² = 0.002. Lastly, the extent to which one felt encouraged by either approach did not significantly differ, F(1, 189) = 2.213, p = 0.083, ηp² = 0.016. The means of both perceived reasonability and encouragement are slightly greater for the effect-oriented than for the method-oriented approach. This is consistent with expectations, but the differences are neither statistically nor meaningfully significant. Occasionally, it is meaningful to distinguish between practical and statistical significance; with the present findings, the mean differences are neither statistically nor practically significant, despite being slightly different in the expected direction (see Figures 1-3). For summary statistics see Table 2.

Negatively-valenced measures. Regarding the three negatively-valenced measures, the perceived threat associated with the two replication approaches did not significantly differ, F(1, 189) = 2.122, p = 0.147, ηp² = 0.011. Similarly, differences regarding the antagonism associated with each request did not emerge, F(1, 189) = 1.201, p = 0.247, ηp² = 0.006. Lastly, the extent to which one felt distressed by either approach did not significantly differ, F(1, 189) = 3.483, p = 0.064, ηp² = 0.018. On average, researchers perceived the method-oriented approach as more threatening, antagonistic, and distressful than the effect-oriented approach. Although this is consistent with expectations, the mean differences are neither statistically nor meaningfully significant (see Figures 4-6). For summary statistics see Table 2.

Exploratory Results: Postdiction

Examining Researchers' Impressions of Replication by Group: Gender and Professional Status

Motivated by truth-seeking (Nosek et al., 2012; Lishner, 2015), additional postdiction analyses were conducted to examine the perceived threat associated with replication. Upon examining previous findings, it was thought that differences in impressions toward replication might emerge between those with less and those with more assumed power, privilege, and prestige. In particular, perceived threat differences associated with replication were examined among men and women, as well as tenured and non-tenured professionals.

To investigate gender and professional status differences between researchers' impressions of two types of systematic replication, a series of multivariate analyses of variance (MANOVAs) were conducted. The assumptions of MANOVA are as follows: 1) independence (residuals should be statistically independent), 2) data are randomly sampled from the population of interest, 3) multivariate normality, and 4) homogeneity of covariance matrices. The assumption of independence is assumed, in that no unaccounted-for statistical relations are known to exist in the data analyzed. Likewise, random sampling is assumed, in that all members of the target population (e.g., SPSP members) had an equal opportunity to participate. The data violate the assumption of multivariate normality; however, given the sample size, the Central Limit Theorem can be applied and normality assumed (Wilcox, 2010). Apart from one set of dependent measures (reasonability of the replication request), Levene's test of homogeneity of variance is met. Thus, equal variance across groups (gender, professional status) is assumed in most cases, which adds strength to the multivariate test statistics. The assumption of homogeneity of covariance matrices is violated in all analyses. Thus, robust estimates (Pillai's Trace) will be reported.

Post hoc power analysis indicates that detecting a small effect size (0.20) across two groups, with two measures and a 0.50 correlation between repeated measures, requires a minimum total sample size of 246. The same specifications assuming five groups require a minimum sample size of 355 respondents. Thus, the present analyses were not adequately powered, despite the use of robust test statistics.

Researchers' impressions toward replication by gender. Regarding the positively valenced measures, no statistical differences between men and women emerged, meaning the perceived utility, reasonability of the request, and encouragement of each replication approach impressed upon men and women equally. Similarly, means and standard deviations by group were comparable to the overall means and standard deviations. Likewise, regarding the negatively valenced measures, no statistical differences between men and women emerged, meaning the perceived threat, animosity, and distress each replication approach evokes is equal across men and women. Similarly, means and standard deviations across men and women were comparable to the overall means and standard deviations. For a summary of test statistics see Table 4.

Researchers' impressions toward replication by professional status. Regarding the positively valenced measures, no statistical differences across professional status emerged, meaning the perceived utility, reasonability of the request, and encouragement of each replication approach impressed upon all professionals (tenured or non-tenured) equally. Similarly, regarding the negatively valenced measures, no statistical differences across professional status emerged, meaning the perceived threat, animosity, and distress each replication approach evokes is equal among all professionals. Additional analyses dichotomizing professional status as tenured (Full and Associate Professor) or non-tenured (Assistant Professor, Post-doctoral fellow, and Graduate student) were performed. No differences between tenured and non-tenured professionals emerged. For a summary of test statistics see Table 5.

Discussion

The results of Analysis 2 indicate that there are no statistical differences across various dimensions between the method- and effect-oriented replication scenarios. In other words, the two approaches are statistically equal, and any actual differences are negligible. This interpretation is based on the assumption that null differences do not stem from patterns in missing data or measurement issues. There is no indication that results were influenced by patterns in missing data or by the instrument. On the contrary, there is indication that 1) the data are missing completely at random and 2) the instrument was adequately vetted and revised. Furthermore, in a follow-up interview, a pilot respondent who was an expert in replication commented that the two replication scenarios evoked what was intended, meaning that in their opinion the instrument simulated the intended conditions (i.e., maintained face validity). That is, the method-oriented approach was soliciting feedback and attempting to replicate the original finding, whilst the effect-oriented approach was seeking one's approval and attempting to replicate evidence consistent with the original finding. Thus, this author is confident in accepting the findings at face value.

Taking the findings at face value suggests that there are no differences between the two replication scenarios, meaning perceived threat, at the individual level, is not a barrier associated with replication. This research question was thoroughly examined and further explored with the present data. A priori, it was predicted that method-oriented replication would evoke more threat, animosity, and distress, as well as appear less useful and less reasonable as a request, and evoke less encouragement of one's participation in the process, than the effect-oriented scenario. No evidence to support this collection of predictions was discovered. Instead, the evidence indicates that these two approaches to replication impress upon researchers equally.

These findings are telling, in that threat and distress are not widespread when one's research is the target of a systematic replication, at least among the population of interest (American social psychologists). Furthermore, differences by gender and professional status failed to emerge. This suggests that women, who arguably hold a minority status in the sense of power, privilege, and prestige, do not differ in perceived threat regarding replication. Likewise, members of the professional community with or without tenure do not differ in their impressions of the two replication approaches. These findings are hopeful, suggesting that social psychology is open to replication at the individual-researcher level regardless of approach. This may indicate a population of researchers who are welcoming and open to systematic replications of their own and others' work. Of course, a larger and more representative sample is needed to validate this, but preliminary findings suggest a non-threatening disposition surrounding the prevailing approaches to replication.

The author can think of two plausible explanations for the present null findings.

First, the majority of the sample has received only 'some' training, about a quarter of the sample has participated in replication research of some kind, and nearly half (48%) of the sample had little to no familiarity with replication research. It is clear that most researchers are, at best, only generally familiar with replication. Lack of familiarity may make subtle differences between the approaches difficult to evoke; that is, differences between the method- and effect-oriented approaches may have been too subtle to evoke diverging impressions among those less familiar with replication research. If this is the case, then the intended effect would not manifest for the majority of respondents. In support of this, a few respondents contacted the author wishing that questions about conceptual replication had been asked. These few respondents often referred to the measures generally; however, the effect-oriented approach to replication was in part indicative of a conceptual replication, keeping in mind that direct and conceptual replications are ideal types. These few, non-representative comments in part reflect those respondents less familiar with the subtleties of replication.

Contrary to this notion is the possibility that respondents are aware of the factors that may impact a replication of published work. These causal attributes are the focus of Analyses 3 and 4; but, in short, awareness of such factors inherently requires some familiarity with replication. Furthermore, although use of the term 'replication' has increased in prevalence, in essence much of what researchers do is replication, at least conceptually. There is ample evidence suggesting researchers are constantly validating and cross-examining their findings through various means, such as manipulation checks and triangulation. From a broad conceptual position, a literature review is in part an acknowledgment of previous work and of how one might replicate said work. Only recently have discussions (e.g., Shrout & Rodgers, 2018) and empirical evidence (e.g., Simmons et al., 2011) emerged that focus on examining specific research practices and how such actions impact the dependability of psychological research. At the center of this discussion is replication (Lishner, 2015). Thus, one can assume that those involved with research are familiar with replication and the various factors that contribute to the process of replication.

Another possible explanation for the lack of significant differences centers on one's knowledge of the replication outcome. The outcome of the two competing replication approaches was unknown to respondents. In particular, respondents were asked to evaluate each replication approach without knowing the outcome of the attempt before them. Although the focus of Analysis 2 was to examine one's reaction once one's work had been selected for replication, in hindsight, knowledge of the outcome of the replication attempt may impact impressions toward the approach used.

The findings from Analysis 2 contribute to the understanding of the impressions social psychology researchers hold toward two replication approaches. Despite failing to find statistical significance, the lack of meaningful differences between the two approaches is revealing. Keep in mind that the present results are limited in scope. The present sample is composed mostly of White American social psychologists with some familiarity with replication. Future research should attempt to gather a larger, more representative sample, both demographically and in terms of familiarity with replication. Researchers' impressions toward replication may vary among minorities and among those more familiar with the various intricacies surrounding replication research. Therefore, researchers who hold expertise in replication, or who have been targets of replication, would be ideal respondents for future studies. It would also be beneficial for future research to indirectly examine the attitudes and impressions researchers hold toward replication. The current research is based on self-report information; an empirical analysis of other types of data, such as blog commentaries, would supplement the present research.


CHAPTER 6: ANALYSIS OF ATTRIBUTIONS (ANALYSIS 3)

The latest direct replication attempts have focused on the vetting of specific findings. A partial replication outcome is neither a success nor a failure, and a direct replication is not a test of the validity of a construct, but only of the original methods (Klein, 2014; Trafimow & Earp, 2016). It is important to keep the limits of replication in mind so that the targeted literature and corresponding researchers are treated accordingly, meaning that in light of a "failed replication" one is not quick to discredit the published literature and original researchers. Thus, examining the association between perceived causal attributions and attitudes is a crucial initial step toward understanding the ways in which replication outcomes are interpreted.

Research Goals

As evidence that speaks to the reproducibility of psychological science mounts (Camerer et al., 2018; Open Science Collaboration, 2015), various contributors to the lack of reproducibility have been discussed. As many have discussed at length (Shrout & Rodgers, 2018), there are numerous factors that inevitably contribute to a partial replication outcome. The purpose of the current study is to collect complementary quantitative data regarding researchers' attributions and attitudes toward replication research and outcomes; specifically, it examines what one causally attributes the outcome of a partial replication to. A failed outcome is defined as a direct replication attempt that did not reproduce evidence for the original effect. A successful outcome is defined as a direct replication attempt that did reproduce evidence for the original effect. Thus, a partial replication is neither a complete failure nor a complete success. Regardless, resulting outcomes would appear critical in evaluating replication generally. Arguably, one would ponder a partial or failed replication outcome more than a full replication outcome.

Furthermore, a partial or failed replication outcome casts doubt on the original findings, underlying constructs, and assumed auxiliary assumptions. Examining this relationship is the focus of Analysis 3.

The research questions and hypotheses relevant to this chapter are concisely repeated below as a reminder:

RQ 3: Are dispositional attributions assigned as contributing to a partial replication outcome of a peer's work more than to one's own work?

Hypotheses: There are two hypotheses connected to the third set of research questions, as below.

Hypothesis 5: Relative to a partial replication outcome of one’s own work, researchers will attribute more bias (i.e., less normative researcher practices) and negative causes as contributing to the partial replication outcome of a peer’s work.

Hypothesis 6: Researchers who believe that false positives (i.e., false findings/discoveries) are widespread in the psychological literature will disproportionately attribute the cause of a partial replication outcome to the poor intentions (e.g., p-hacking or other QRPs) of the original authors.

Results: Prediction

Attributional Causes of a Partial Replication Outcome of a Peer's Versus One's Own Work

To investigate the causal attributions researchers assign to a partial replication outcome of a peer's work relative to their own work, a series of repeated measures analyses of variance were performed. No outlying cases were identified. Listwise deletion was utilized in the present analyses; doing so balanced the groups (partial replication of a peer's versus one's own research). The same assumptions of normality, linearity, and sphericity were assessed. The data were moderately skewed (skew values between -0.013 and 1.319). Neither a natural-log nor a square-root transformation helped to approximate normality. Regardless, the sampling distributions of the target variables are assumed normal because the skew is small to moderate (only modestly exceeding |1|). The data are linearly related. Sphericity is violated. Therefore, all reported statistics use robust Greenhouse-Geisser estimations. Two-tailed hypotheses and a statistical significance criterion of 0.05 are assumed across all analyses.

Post hoc power analysis. Similar to previous analyses, a post hoc power analysis was conducted to determine whether adequate power was reached. In particular, a repeated measures, within-factors ANOVA power analysis was conducted. Taking the most conservative epsilon (ε) correction (i.e., correction for non-sphericity) of 0.126 (Faul, Erdfelder, Lang, & Buchner, 2007), with two groups, nine measures, a sample size of 117, and a correlation of 0.5, the analysis was not adequately powered (power = 0.631) to detect a ηp² = 0.02 (comparable to Cohen's small effect size f of 0.143). Conversely, assuming identical specifications to detect the smallest effect size from the present analysis (ηp² = 0.08, or an effect size of 0.295), the sample size was adequately powered (power = 0.997).

Descriptive statistics. For descriptive statistics see Table 6. Note that the five dispositional attributes are selective reporting, flexibility in choosing dependent variables, flexibility in choosing sample size, flexibility in choosing covariates, and flexibility in reporting experimental conditions. The four situational attributes are procedural differences between the original and replicating studies, publication bias, sampling error, and a statistically underpowered original finding. For a visual comparison of the average ratings of each causal attribution associated with a partial replication of a peer's versus one's own work, see Figure 7.

Dispositional causes assigned to a partial replication outcome. All five dispositional attributes were assigned, to a greater extent, as causes of a partial replication of a peer's work than of one's own work. Specifically, the outcome of a partial replication attempt is attributed significantly more to flexibility in choosing among dependent variables by a peer (M = 37.73, SD = 25.22) than by oneself (M = 29.44, SD = 26.56), F(1, 141) = 27.582, p < 0.001, ηp² = 0.164. In other words, researchers attribute the lack of a full replication (i.e., a partial replication) of a published finding more to an assumed flexible practice of choosing among dependent variables on the part of a peer, and less so as a causal explanation of a partial replication of their own work. Similarly, the outcome of a partial replication attempt is attributed significantly more to flexibility in choosing a sample size by a peer (M = 35.89, SD = 27.38) than by oneself (M = 28.80, SD = 28.48), F(1, 137) = 12.236, p = 0.001, ηp² = 0.082. The outcome of a partial replication attempt is also attributed significantly more to flexibility in choosing covariates by a peer (M = 38.19, SD = 26.38) than by oneself (M = 26.36, SD = 26.11), F(1, 133) = 40.806, p < 0.001, ηp² = 0.235, and significantly more to flexibility in reporting experimental conditions by a peer (M = 34.77, SD = 28.05) than in one's own work (M = 20.85, SD = 28.85), F(1, 116) = 46.066, p < 0.001, ηp² = 0.284. Lastly, the outcome of a partial replication attempt is attributed significantly more to selective reporting of findings by a peer (M = 50.27, SD = 30.03) than by oneself (M = 29.19, SD = 30.61), F(1, 134) = 88.097, p < 0.001, ηp² = 0.397. That is, researchers attribute the lack of a full replication of a published finding more to flexibility in choosing which findings to report on the part of a peer, and less so as a causal explanation of a partial replication of their own work. Combined, these findings support Hypothesis 5. For summary statistics see Table 6.

Situational causes assigned to a partial replication outcome. Results of the comparative analysis of the four situational attributes are mixed. The outcome of a partial replication attempt is attributed significantly more to publication bias regarding a peer's work (M = 52.27, SD = 29.07) than one's own work (M = 40.63, SD = 32.45), F(1, 142) = 32.812, p < 0.001, ηp² = 0.188. That is, researchers attribute the lack of a full replication (i.e., a partial replication) of a published finding more to assumed publication bias (i.e., the favoring of statistically significant findings) regarding a peer's work, and less so as a causal explanation of a partial replication of their own work. In addition, the outcome of a partial replication attempt is attributed significantly more to a statistically underpowered original finding regarding a peer's work (M = 47.47, SD = 28.22) than one's own work (M = 36.84, SD = 31.73), F(1, 145) = 24.695, p < 0.001, ηp² = 0.146. Compared to one's own work, researchers attribute the lack of a full replication of a published finding more to the assumption that the original sample size of a peer's work was too small to detect a true effect. The two other situational attributions did not reach statistical significance. In particular, researchers attributed procedural differences between the original and replication studies as an explanatory cause of a peer's (M = 50.67, SD = 27.30) and their own work (M = 49.94, SD = 28.48) equally, F(1, 156) = 0.287, p = 0.593, ηp² = 0.002. Likewise, researchers attributed sampling error (i.e., a lack of representativeness between the samples used for the original and replicating studies) as an explanatory cause of a peer's (M = 49.90, SD = 27.66) and their own work (M = 49.26, SD = 28.99) equally, F(1, 155) = 0.208, p = 0.649, ηp² = 0.001. Combined, these findings fail to support Hypothesis 6. For summary statistics see Table 7.

Discussion

Dispositional attributions were deemed, to a greater extent, as causes of a partial replication of a peer's work than of one's own work. Interestingly, publication bias (a situational attribute) was also assigned, to a greater extent, as a cause of a partial replication of a peer's work than of one's own work. The five dispositional attributes, on average, were nonetheless perceived as less causal than the situational attributes. The attributes thought to most influence a partial replication outcome are procedural differences between the original and replication studies, publication bias, sampling error, and a statistically underpowered original finding. In other words, self-other statistical differences aside, respondents identified more systematic causes as contributors to a partial replication outcome (see Figure 8).

The dispositional causes were assigned in accordance with attribution theory; however, the situational attributions were null or in the opposite direction from what attribution theory would predict. The findings regarding the situational attributes are perplexing.

Arguably, the majority of the nine attributional causes are negatively valenced. With this in mind, respondents appear to be attributing negatively valenced causes more to others' work and less to their own work. In other words, the perception is that flexible practices/arbitrary analytic decisions (flexibility in choosing DVs, sample size, covariates, and experimental conditions), selective reporting of findings, publication bias, and statistically underpowered original studies are more causal in others' work relative to one's own work. These seven attributes are often viewed as negative actions and features of research. The two remaining attributes, procedural differences between studies and sampling error, are neutral features of the research process generally. Coincidentally, these two neutrally valenced features were the only attributes for which no statistical self-other differences emerged. Regardless of the dispositional-situational distinction, the perceived valence of each attribute appears to be the diverging factor, in that negative causes are attributed, to a greater extent, to others than to oneself.

The results of Analysis 3 shed light on the causes social psychology researchers attribute to a finding that partially replicates. It is imperative to understand how replication outcomes are utilized and perceived. In particular, social psychologists attribute individual-level choices (i.e., dispositional attributes) more to others and less to themselves as likely contributors to a partial replication outcome. These individual-level actions are associated with more favorable attitudes toward and support of replication research in psychology.


CHAPTER 7: ANALYSIS OF ATTITUDES TOWARD REPLICATION (ANALYSIS 4)

Exploratory Results: Postdiction

Examining the Association between Attributional Causes and Attitudes Toward Replication

Prior findings indicate that dispositional causes are attributed more to a partial replication outcome of a peer's published work than to one's own work. A primary research question, and the focus of the present research, is to examine the association between attributions and attitudes toward replication. Thus, an additional exploratory analysis (Analysis 4) was conducted. Attitudes are inherently connected to values and norms, each serving, in part, as a backdrop for behavior. Therefore, understanding researchers' attitudes toward replication is a crucial component of understanding the state of replication in psychological science. In particular, an examination of individual-level attitudes toward replication will shed light on the prevailing endorsement and impact of replication research. Furthermore, associations between normative practices and attitudes will lend support for future assessments of the prevalence and practice of replication research generally. A priori, it was unclear how normative practices were associated with attitudes toward replication. Upon examination (to be discussed at length shortly), the norm of selectively reporting studies that "work" emerged as a normative practice of interest.

Presently, five dependent measures that capture attitudes toward replication were gathered: general attitudes toward replication, support of replication research in psychology, attitudes toward the discussion of replication in psychology, confidence that the majority of findings will replicate, and how replication research has impacted one's future research practices. Through a factor analysis, three potential independent measures emerged that approximate attributions. These three factors, along with the norm of selectively reporting studies that "work," were then used to predict researchers' attitudes toward replication. The selection of variables to model was rigorous and is discussed in detail later. Prior to detailing the selection of variables, a discussion of the factor analysis is provided. Subsequently, a discussion of assumptions, post hoc power analysis, selection of variables, and results follows.

The research questions and hypotheses relevant to this chapter are concisely repeated below as a reminder:

RQ 4: What is the association between normative practices, attributions and attitudes toward replication?

Hypotheses: There are no hypotheses associated with the fourth research question given that it is exploratory in nature and there is little/no guidance in the literature on how things like normative practices and attributions might impact researchers’ views of replication.

Factor analysis. A principal axis factor analysis with an oblique rotation (direct oblimin) was conducted on the 18 attribution items to determine patterns in the data worthy of a scale. According to the Kaiser-Meyer-Olkin measure, the sample size is adequate for the analysis (KMO = .848). Similarly, the KMO values for each item were all greater than 0.738, above the recommended minimum of 0.5 (Field, 2012). Initial analysis indicates four factors with eigenvalues over Kaiser's criterion of 1. In combination, these four factors explain 70.33% of the variance. Based on the inflections of the scree plot, three or four factors could justifiably be retained. The items that load on factor 1 suggest it represents dispositional attributes of others, factor 2 represents dispositional attributes of oneself, and factor 3 represents situational attributes (see Table 8). Among the items, ambiguity arose as to whether a statistically underpowered original finding was a dispositional or a situational attribute. It was assumed to be situational, in that it is a feature of the literature; however, respondents may have treated it as a dispositional attribute. This is evident in the cross-loadings of these items on the dispositional and situational factors. In light of this conceptual ambiguity, the two items were not included in any of the three factors, despite a strong loading on factor 1 (0.637). Additionally, both publication bias items were dropped, because a factor composed of fewer than three items is unstable (Costello & Osborne, 2005). In total, three factors emerged.
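A sketch of the KMO check and a principal axis factoring with a direct oblimin rotation follows, assuming the factor_analyzer package; the simulated item responses and the three-factor structure are placeholders, not the 18 attribution items analyzed here:

```python
import numpy as np
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_kmo

rng = np.random.default_rng(2)
n_items, n = 18, 200
latent = rng.normal(size=(n, 3))                           # three hypothetical factors
loadings = rng.uniform(0.4, 0.8, size=(3, n_items)) * (rng.random((3, n_items)) < 0.4)
items = pd.DataFrame(latent @ loadings + rng.normal(scale=0.7, size=(n, n_items)),
                     columns=[f"item_{i + 1}" for i in range(n_items)])

kmo_per_item, kmo_overall = calculate_kmo(items)
print("Overall KMO:", round(kmo_overall, 3))               # sampling adequacy; > 0.5 recommended

# Principal (axis) factoring with an oblique (direct oblimin) rotation.
fa = FactorAnalyzer(n_factors=3, method="principal", rotation="oblimin")
fa.fit(items)
eigenvalues, _ = fa.get_eigenvalues()
print("Factors with eigenvalues over Kaiser's criterion of 1:", int((eigenvalues > 1).sum()))
print(pd.DataFrame(fa.loadings_, index=items.columns).round(2))
```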

Assumptions of multiple regression and post hoc power analysis. To investigate the association between attitudes toward replication and attributions, four regression models were conducted. The assumptions of linear regression are as follows: 1) data are linearly related, 2) variance is homogeneous across all values of the predictors, 3) errors are independent (a check of autocorrelation), 4) residuals are normally distributed, 5) predictors are uncorrelated (a check of multicollinearity), 6) variable types are quantitative or categorical, with continuous and unbounded outcome variables, and 7) non-zero variance (predictors have some variability).

All variables are continuous or categorical. Of the five target variables, one (confidence that the majority of psychological findings will replicate) was bounded, and therefore analysis with this variable was not appropriate. Across the four regression models, the data are linearly related. According to the Durbin-Watson statistic, no autocorrelation was present across the models. Similarly, predictors were uncorrelated, with collinearity diagnostics well in range (VIF < 5 and tolerance > 0.2). Note that multicollinearity among some variables did arise; this is discussed at length in the next section (Selection of variables to include across the regression analyses). The assumption of homogeneity was violated. Thus, using the statistical software Stata, robust estimates and standard errors were generated for each model. For a summary of the model estimates see Tables 10-13. For descriptive statistics see Table 9.

According to a post hoc power analysis for multiple linear regression, to detect an effect size of 0.15 with an alpha of 0.05, a sample size of 144, one dependent measure, and four independent variables, the sample was adequately powered (power = 0.996), with a minimum required sample size of 139 respondents.
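The reported power figure can be approximated from the noncentral F distribution. The sketch below treats the 0.15 effect size as Cohen's f² (the usual convention for linear regression power analysis) and uses Cohen's noncentrality parameter λ = f²(u + v + 1); because the value in the text comes from G*Power, small numerical differences are expected:

```python
from scipy.stats import f, ncf

f2 = 0.15          # assumed effect size f-squared
u = 4              # number of predictors (numerator df)
N = 144            # sample size
v = N - u - 1      # denominator df
alpha = 0.05

lam = f2 * (u + v + 1)                 # noncentrality parameter (Cohen)
crit = f.ppf(1 - alpha, u, v)          # critical F under the null hypothesis
power = 1 - ncf.cdf(crit, u, v, lam)
print(f"lambda = {lam:.1f}, critical F = {crit:.3f}, power = {power:.3f}")
```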

Selection of variables to include across the regression analyses. Findings from Analysis 2 suggest that gender and professional status differences do not emerge regarding researchers' impressions toward approaches to replication. In a similar vein, these same demographics may be negligible in explaining researchers' attitudes toward replication. Two regression models, one without gender, professional status, and age, and one with these demographics, were therefore compared with a focus on variance explained. Likewise, the association between role conflict and attitudes toward replication was examined. Upon comparing changes in the explained variance between the models, it was indeed the case that these four variables 1) added little to no explained variance and 2) changed the estimates negligibly, if at all. Following the principle of parsimony, these four variables were dropped from the final regression models.

Similarly, the attributions regarding replication outcomes appeared to be key explanatory factors for attitudes toward replication; the flexible research practices of others and of oneself are likewise conceptually relevant. These measures are related, however, and may hang together. Thus, collinearity issues were examined across the 22 variables, which include three attribution measures, one identity measure, and 18 normative research practice measures. Specifically, proportions of variance along with small eigenvalues were examined. Through two iterations, variables with proportions of variance greater than 5% were flagged. Using this stringent criterion, and based on the first iteration, 11 of the 22 variables were kept (three attribution factors and eight norm variables). Specifically, from the first iteration, six of the normative variables capture the flexible practices of oneself and of others, respectively. The measures that were maintained are: selectively reporting studies that work, stopping data collection early after finding the desired result, and reporting a subset of DVs that work, each for both self and others. Two measures that capture the extent to which one reports all of a study's conditions, and the same perceived practice in others, were also kept. The research identity measure and the other ten normative measures were dropped because of issues with multicollinearity.

The second iteration applied the same criterion to assess the collinearity of the remaining 11 variables. All but two normative measures were dropped; only selective reporting of studies that work, for both self and others, was retained. These two measures emerged as the most independent of the remaining eight normative items. Similarly, collinearity among two of the attribution factors also presented an issue. In particular, the flexibility-of-others and flexibility-of-oneself model estimates were not independent, correlating at 0.623. In fact, the estimate for the flexibility-of-oneself factor, when included in the model without the flexibility-of-others factor, switched direction and became nonsignificant (p = 0.073). Therefore, this factor was dropped from all final analyses. The flexibility-of-others factor was negligibly impacted by the inclusion or exclusion of the flexibility-of-oneself factor. Following this thorough assessment of multicollinearity, the final regression analyses include the following four independent variables: the flexibility-of-others factor, the situational-attributes factor, and the normative practice of selectively reporting studies that work, by others and by oneself.
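The eigenvalue and variance-proportion screening described above resembles standard collinearity diagnostics in the style of Belsley, Kuh, and Welsch (condition indices plus variance-decomposition proportions). The following generic sketch illustrates the idea with simulated, deliberately correlated predictors; it is not the exact procedure, criterion, or variable set used in the dissertation:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n = 144
x1 = rng.normal(size=n)
X = pd.DataFrame({
    "flex_others": x1,
    "flex_self": 0.6 * x1 + 0.8 * rng.normal(size=n),   # deliberately correlated pair
    "situational": rng.normal(size=n),
})

# Scale the design matrix (including an intercept column) to unit column length,
# then decompose it to obtain condition indices and variance-decomposition proportions.
Z = np.column_stack([np.ones(n), X.values])
Z = Z / np.linalg.norm(Z, axis=0)
_, d, Vt = np.linalg.svd(Z, full_matrices=False)

cond_index = d.max() / d                              # large values flag near-dependencies
phi = (Vt.T ** 2) / d ** 2                            # rows: variables, columns: dimensions
var_prop = phi / phi.sum(axis=1, keepdims=True)       # variance-decomposition proportions

cols = ["const"] + list(X.columns)
print("Condition indices:", cond_index.round(1))
# A high condition index combined with large proportions for two or more variables
# on the same dimension indicates a problematic near-dependency among predictors.
print(pd.DataFrame(var_prop, index=cols).round(2))
```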

Attributes and the Norm of Selective Reporting Regressed on Attitudes Toward Replication

To examine the factors that are associated with attitudes toward replication four regression models were conducted. For descriptive statistics see Table 9.

Association between attributes, a norm of selective reporting, and attitudes toward replication research in psychology. A positive association emerged between the flexible practices attributed to a peer's work and favorable attitudes toward replication research in psychology (b = 0.017, p < 0.001). Similarly, the perceived practice of selectively reporting a subset of studies that "work" by others was also positively associated with favorable attitudes toward replication research in psychology (b = 0.290, p = 0.034). Combined, these findings indicate that the perceived arbitrary decisions (e.g., flexibility in choosing DVs, covariates, sample size, and experimental conditions) and selective reporting of one's peers are significantly associated with more favorable attitudes toward replication research in psychology. Contrary to these two findings, selectively reporting a subset of studies that "work" by oneself (b = -0.185, p = 0.042) and situational attributes (b = -0.009, p = 0.003) are both negatively associated with attitudes toward replication research in psychology. See Table 10 for a summary of model estimates.

Association between attributes, a norm of selective reporting, and support for replication research in psychology. The flexible practices attributed to a peer's work were positively associated with support for replication research in psychology (b = 0.013, p < 0.001). Also consistent is the positive association between the selective reporting of studies that "work" by others (b = 0.244, p = 0.037) and support for replication research. A contrary finding is the negative association between selectively reporting studies that "work" oneself (b = -0.190, p = 0.023) and support for replication research. Consistent with the previous findings regarding attitudes toward replication research, these results indicate that the perceived arbitrary decisions and selective reporting of others are significantly associated with more support for replication research in psychology, while self-reported selective reporting of studies is significantly associated with less support for replication research. See Table 11 for a summary of model estimates.

Association between attributes, a norm of selective reporting, and attitudes toward the discussion of replication in psychology. The flexible practices attributed to a peer's work were positively associated with attitudes toward the discussion of replication (b = 0.020, p < 0.001). This finding indicates that the perceived arbitrary decisions (e.g., flexibility in choosing DVs, covariates, sample size, and experimental conditions) of one's peers are significantly associated with more favorable attitudes toward the discussion of replication in psychology. Neither self-reported selective reporting of studies that "work" nor the perceived selective reporting of others was significantly associated with attitudes toward the discussion of replication. Conceptually, this makes sense, in that selective reporting is only one indirect component of the discussion of replication in psychology. Similarly, situational attributions were not significantly associated with attitudes toward the discussion of replication. See Table 12 for a summary of model estimates.

Association between attributes, a norm of selective reporting, and the impact replication research has on one's future research practices. The flexible practices attributed to a peer's work were not significantly associated with the perceived impact replication research has on one's future research practices. Self-reported and perceived selective reporting of studies that "work" by others, as well as situational attributions, were also not significantly associated with the perceived impact replication has on one's future practices. These null findings indicate that the perceived flexible practices of others, selective reporting of studies, and situational attributions are not associated with the perceived impact replication research has on one's future practice. See Table 13 for a summary of model estimates.

Discussion

Consistently, the flexible practices attributed to a peer's work were positively associated with attitudes toward, support of, and the discussion of replication research in psychology. The norm of selectively reporting studies that "work" by others was positively associated with both more favorable attitudes toward and more support for replication research. Combined with the findings from Analysis 3, these findings suggest that replication research is viewed favorably and supported when others are suspected of flexible practices, which are often framed as negative actions. Conversely, a personal admission of selective reporting is negatively associated with one's favorability toward and support for replication research. This finding suggests that researchers hold favorable views toward and support replication research as long as their personal research actions are not involved in the replication process. Likewise, a significant positive association emerged between the flexible practices attributed to a peer's work and one's attitude toward the discussion of replication in psychology. In other words, social psychology researchers who attributed flexible practices to peers viewed the discussion of replication significantly more favorably. This supports the rationale that researchers favor and support replication when it concerns the flexible practices of others; however, this same desire is reversed or absent regarding one's own actions, in particular the selective reporting of a subset of studies.

In the era of systematic replication of social science findings, partial replications are frequent outcomes (e.g., Open Science Collaboration, 2015). Furthermore, there are numerous reasons why replication attempts result in partial findings. Various decisions are made when conducting research. Some of these decisions have little to no downstream impact on quantitative analysis or reproducibility. Other decisions, however, may unduly impact the chance of obtaining statistical findings or the likelihood that others can reproduce one's original results. These more impactful practices are referred to as flexible research practices, specifically researchers' flexibility in choosing critical criteria prior to analysis, such as the dependent variables of interest, the necessary sample size, relevant covariates, and experimental conditions (Simmons et al., 2011). In short, these research practices refer to assumed flexibility in analytic strategies. The simulated implications of flexible research practices were influential (Simmons et al., 2011), in part leading the field of psychology toward an internal examination of its scientific knowledge (i.e., the reproducibility of psychological science). Many recommended revisions and changes to the research endeavor have been put forth, but the most consistent recommendations call for more direct replication research, increased transparency and open science, preregistration, and a motivation toward truth seeking (Lishner, 2015; Nosek et al., 2012).

This revelation also led to proactive efforts to develop strategies that address the ambiguous analytic decisions researchers sometimes face. One such practice is preregistration. Preregistration entails the detailing and public acknowledgment of one's research questions and analytic plans prior to observing the research outcome (Nosek et al., 2018). Preregistration helps protect against ambiguous choices that may impact the research outcome. Furthermore, it provides structure and transparency to the research process and further clarifies between critical types of findings, such as prediction and postdiction (Nosek et al., 2018). This clarity, specifically between prediction and postdiction, allows for exploration and promotes innovation. All in all, preregistration ought to promote reproducibility as well as clarify the conceptual limits of the knowledge produced.

In light of the discussion surrounding replication and reproducibility, one can argue that knowledge about the impact of flexible research practices is widespread (John et al., 2012; Nelson et al., 2018; Simmons et al., 2011; Shrout & Rodgers, 2018). Flexible practices are perceived negatively and are not endorsed as justifiable courses of action. Recall that these flexible actions are assumed to be made in honest, trivial ways, in that researchers fail to notice ambiguous choices at key decision points. Likewise, it should be noted that the evidence for the actual impact of flexible research practices on replication is mixed (Bruns & Ioannidis, 2016; Lishner, 2015). Furthermore, tactics to mitigate the negative sources of influence stemming from flexible practices have been developed, the primary tactic being increased transparency and openness via practices such as preregistration. In a normative sense, flexible practices are becoming less prevalent, but the present analysis reveals a self-other distinction in the way flexible research practices are attributed to a partial replication outcome.

For instance, it is well known that the current peer-review process is biased against the null, in favor of statistical significance (Kühberger et al., 2014). This is referred to as publication bias. Efforts and discussions addressing this issue have been published; however, because this bias has been present for some time and never seriously addressed, the literature is skewed toward statistical significance. This presents a tricky issue for replication, in that the present knowledge base may display inflated findings, where seemingly robust effects are weaker than the literature suggests. Therefore, researchers conducting direct replications are arguably more likely to reach a partial as opposed to a full replication, based solely on the assumption that the present literature is overrepresented with positive (i.e., statistically significant) findings. Knowledge of publication bias and its potential impact on replication is evident in the present findings. Interestingly, in regard to a partial replication outcome, respondents think others' work was more impacted by publication bias than their own. This finding may stem from the negativity that surrounds publication bias; respondents may be reporting in self-serving ways, attributing negative causes more to others' work and less to their own work.

Similarly, it is well known that the majority of research is statistically underpowered (Cohen, 1962), meaning the number of respondents is inadequate to detect a particular effect. Like publication bias, awareness of this issue and means to address it are well known; however, the field of psychology has not, until recently, widely adopted viable strategies to account for it. Furthermore, some argue that the prevalence of flexible practices helps explain why underpowered studies in the literature continued to reach statistical significance and inevitably led to publication (Nelson et al., 2018). Conceptually, underpowered studies may display findings as more robust than they actually are. Of course, from a post-positivist position we can never know the absolute truth, but instead develop the best approximation of what the current knowledge suggests is true. Regardless, a statistically underpowered literature puts replication at a disadvantage, in that a full replication may be difficult to obtain because effects are weaker than the original findings suggest. In theory, a series of both direct and conceptual replications will inevitably identify a robust pattern (Lakatos, 1974). Likewise, awareness of this issue is evident in the present findings, in that, on average, statistical power was assigned as a frequent contributor to a partial replication outcome. Interestingly, in regard to a partial replication outcome, respondents think others' work was more impacted by statistically underpowered studies than their own. As with the previous argument, this may be explained by the negativity surrounding statistically underpowered studies, in that respondents had a tendency to report in a self-serving way.

Generally, researchers are aware of how procedural differences can impact findings. As a collective, science shares various principles that speak to this, such as the Hawthorne and Heisenberg effects and social desirability. The Hawthorne effect posits that respondents are influenced by an observer's presence, meaning that if respondents are aware they are being observed, they may behave differently, thus impacting results. The Heisenberg effect assumes that the act of measuring something inherently changes that which is measured. Social desirability refers to respondents behaving according to the assumed expectations of others, such as the research team or society. These widely known principles speak generally to the limits of science and of what can be captured, especially when research involves people. Specific to the current discussion are the inherent limits of quantitative research and the unwanted variability introduced into research by flexible practices. Furthermore, a true direct replication of a study's methodology is not possible; only an approximate duplication of a study can occur. Thus, procedural differences, slight or dramatic, may and do impact scientific findings. Awareness of this general consideration is evident in the present findings, wherein procedural differences between the original and replication studies are perceived as a large contributor to a partial replication outcome (see Figure 8 in Chapter 6).

Another systematic cause, sampling error, was also frequently cited as a cause of a partial replication outcome. Sampling error is conceptualized, in the context of the present study, as a lack of representativeness between the samples used in the original and replicating studies. Generalizability is a constant struggle across all research. Social science researchers are often limited in resources and forced to approximate generalizability through clever recruitment tactics or arguments of assumed equivalence across samples and measures that speak to a single target population (e.g., sampling college students and generalizing to adults). Of course, limits are inherent in all samples of respondents and all measures. Therefore, using a mix of methods and re-sampling is important; the use of both often lends credence to generalizability. From study to study, differences in sample characteristics can impact findings. Indirect evidence of this notion appears in the present findings, in that respondents assigned sampling error as a large contributor to a partial replication outcome of both their own and others' work.


CHAPTER 8: GENERAL DISCUSSION, CONCLUSIONS

General Discussion

The reproducibility of psychological science is approximately 30-40% (Open Science Collaboration, 2015). General reproducibility estimates across the social sciences are somewhat higher (50-60%; Camerer et al., 2018), and estimates in other fields, such as oncology, are lower (20-25% reproducible; Prinz, Schlange, & Asadullah, 2011). In short, science is dealing with a credibility crisis in research (i.e., a crisis of replication; Gall, Ioannidis, & Maniadis, 2017). Some may disagree with the choice of words to describe the current state of science, but few would dispute that the culture of science is in a state of change. In the midst of this change, increased attention has been given to the reproducibility and replicability of science, in particular psychological science (Open Science Collaboration, 2015). The assessment of published psychological findings is often done through large, systematic direct replication of top-tier research (Camerer et al., 2018). Variations of systematic replication have emerged (Simons et al., 2014), but the prevailing approach is direct replication. In agreement with the field, the most immediate way to assess the reproducibility of our science is through large-scale direct replications; thus, the way forward is direct replication.

Therefore, it is important to assess the potential barriers and attitudes surrounding the prevailing approach to replication research in psychology. Accordingly, the focus of the present research was on the perceived barriers and attitudes held by social psychologists. Social psychologists hold a special place in the credibility crisis in research. Unfortunately, a recent high-profile case of data fabrication (i.e., the case of Diederik Stapel) was committed by a social psychologist. Conversely, social psychology research has helped drive the assessment of the reproducibility of science, serving as a model for other areas of science (Begley & Ioannidis, 2015; Camerer et al., 2016). Social psychologists have also helped generate recommendations to improve the credibility of science (Munafò et al., 2017). Therefore, social psychology is an area of science that is intimately involved with the credibility crisis in research. To increase the support for and presence of replication, it is important to understand the perceived barriers and prevailing attitudes social psychologists hold toward replication research in psychology. Through four sets of analyses, the present research provides preliminary evidence that speaks to the attitudes and impressions social psychologists hold toward replication.

Normative Practices

The first analysis provides descriptive frequencies on various research practices.

Comparable to previous research (e.g., Fiedler & Schwarz, 2016; John et al., 2012), respondents believed that others engage in flexible research practices to a greater extent than they themselves reported. Likewise, self-reported rates of data fabrication and other questionable research practices were comparable to established base rates. These frequencies ground the state of psychological science as experienced by the present sample. Although the present sample is small, it is clear that the prevailing normative practices are comparable to those of mainstream psychological and social psychological science.

Perceived Threat toward Replication

Contrary to a preregistered hypothesis, no perceived threat toward replication emerged. Using an actual email request from the OSC, two replication request scenarios were created: a method-oriented and an effect-oriented scenario. The method-oriented scenario closely approximates the prevailing way in which replication is conducted in psychology. In particular, the OSC is responsible for some of the most high-profile systematic direct replications of psychological science. Often, these efforts begin by (1) contacting the original authors to request the materials necessary to conduct the replication, (2) soliciting feedback on a preregistration of the replication, and (3) informing the original authors that a replication of their published work will take place. Although this may not seem antagonistic on its face, if the replication outcome is a partial or failed attempt, the consequences to one's reputation can be damaging.

A slight alternative to the method-oriented approach is what is labeled, in the present research, an effect-oriented approach to replication. An effect-oriented approach (1) contacts the original authors to request the materials necessary to conduct the replication, (2) asks the original researchers to approve the replication design, and (3) seeks their approval of the final replication report. Thus, this approach approximates a more inclusive and flexible type of replication, which in practice is more akin to conceptual replication. The a priori hypothesis posed that researchers would perceive the method-oriented approach as more threatening than the effect-oriented approach; however, patterns in the data do not support this. In fact, on average, respondents' impressions of either approach were favorable.

This finding is hopeful. A substantial amount of discussion has been dedicated to the topic of reproducibility and replication. At times, some, myself included, have argued that the prevailing approach to replication is antagonistic and that receiving a replication request is akin to receiving an audit notification from the Internal Revenue Service (IRS).

The present data do not support this notion. This does not imply that receiving a notification of replication will never provoke a sense of threat or distress, but rather that the levels of threat and distress associated with a replication request are negligible. This finding may indicate a community that generally welcomes replication, regardless of whether its own work is being specifically targeted. Furthermore, the overall favorability toward replication research in psychology indicates that respondents may be aware of its general scientific value and purpose.

Additionally, the credibility crisis is perhaps so widespread that the issue is perceived as a collective burden, wherein individual researchers are not threatened when their work is selected for replication. This is consistent with most respondents believing that academic journals, publishers, peer reviewers, journal editors, academic institutions, and the field of psychology ought to be more accepting of replication research. In other words, the hearts and minds of researchers have shifted, but the perceived expectations stemming from the institutions and stakeholders of science have not shifted equally toward greater acceptance of replication research. This is consistent with various recommendations to incentivize the research enterprise toward truth-seeking over publishing (Lishner, 2015; Munafò et al., 2017; Nosek et al., 2012).

Perceived Attributional Causes Assigned to a Partial Replication

Respondents disproportionately attributed flexible research practices as causes of a partial replication of a peer's work relative to their own work. In other words, factors that are more under a researcher's personal control were assigned with greater frequency as potential causes of a partial replication outcome when the work belonged to a peer. This is consistent with respondents assuming a greater prevalence of flexible practices in others than in themselves. This is also consistent with the self-other distinction that attribution theory would predict. Inconsistent with attribution theory's self-other distinction, however, is the assignment of situational causes. Specifically, publication bias, sampling error, and procedural differences between original and replicating studies were assigned comparably as causes of a partial replication of both a peer's work and one's own.

These findings are not surprising; the greater assignment of situational causes impacting a partial replication is indicative of broader issues that have been thoroughly discussed elsewhere. For instance, publication bias has emerged as a frequent issue that complicates and undermines the reproducibility of science, so given that replication has become the primary metric for assessing reproducibility, this is not a revelation.

Furthermore, procedural differences between original and replicating studies were a main criticism of a well-known systematic replication attempt of psychological science (Gilbert et al., 2016). A direct replication is inherently limited in resources and feasibility, as are all research endeavors, and such limits may introduce "noise" into the attempt, which in part may undermine replicable findings. This is why it is important to keep in mind that no single replication attempt has the final say; instead, the outcome of a replication attempt should be considered within the broader context of science. Specifically, science is an accumulative and iterative process involving 20 million smart people (Ioannidis, 2018), and it is the integrity of this vast web that matters. All in all, respondents in the present sample appear aware of the broad contributing factors that impact replication outcomes; however, future research ought to examine researchers' familiarity, knowledge, and level of mastery regarding replication research. Ultimately, it is valuable to know whether scientists' understanding of the benefits and limits of replication research is rich, as opposed to merely knowing about the utility replication brings to science.

Attitudes toward Replication

As noted previously, respondents were, overall, very supportive of replication research and believed that psychological science ought to be more accepting of it. Attitudes toward replication, in the present data, diverge along some self-other distinctions. In particular, self-reported selective reporting of a subset of studies that "work" was associated with less support for and favorability toward replication research in psychology. Conversely, assumed selective reporting of studies by others was associated with more support and favorability toward replication research in psychology. Likewise, greater attribution of flexible practices as causes of a partial replication of a peer's work was associated with more support for and favorability toward replication research in psychology.

These findings can be interpreted relative to open science practices, in that less assumed transparency among other researchers (i.e., selective reporting) fosters more support for and favorable attitudes toward replication research. Perhaps when others are flexible in their research practices it is perceived as misleading, whereas when one personally selectively reports studies it is perceived as a justifiable and defensible action. Both may be true, but it is important to keep in mind that transparency is a vital avenue for addressing this disparity. When one is transparent and open in one's research practices, one also allows the scientific community to provide input. Transparency also fosters trust; some may argue that increased transparency is being promoted at the expense of trust among scientists, but this author would argue that transparency goes hand in hand with trust.

Furthermore, the heart of the credibility crisis in research is not falsification or fabrication, but inherent biases among researchers (e.g., flexible practices) and systemic cultural practices that emphasize publishing and novelty over replication. In short, what is at stake is the trustworthiness of published science and, as a consequence, the trustworthiness of the actions and work of fellow researchers. This is evident in the finding that researchers are willing to hold others accountable for selective reporting and flexible practices via replication, whereas those who personally selectively report a subset of studies hold less supportive and less favorable attitudes toward replication.

Limitations to the Present Research

The present research has obvious limitations in that the sample is small and composed mostly of White American social psychology professionals. Members of minority groups and researchers from around the world were not well represented in the present data. Furthermore, the data are based on self-report measures, which are susceptible to social desirability effects, wherein respondents may under-report normatively unacceptable actions and over-report normatively acceptable actions. Nevertheless, reports of fairly egregious actions (e.g., falsification of data) were obtained at rates comparable to established base rates, suggesting that social desirability effects were minimal.

Similarly, self-selection effects may limit the generalizability of the present findings. It is plausible that researchers most interested in replication research disproportionately chose to participate in the present study. Therefore, the present findings may reflect the attitudes of those researchers who self-selected into the survey rather than the attitudes of social psychologists generally. Despite this limitation, the present research captured a range of attitudes researchers hold toward replication research in psychology, so self-selection effects are perhaps of minimal consequence.

Another limitation concerns the generalizability and verisimilitude of the measures and findings. In particular, the replication measures were vetted and maintain face validity, but despite these efforts, the replication scenarios may not reflect the actual conditions of a replication request. Likewise, limitations of survey design generally may force respondents to form attitudes and feelings toward hypothetical situations they have never imagined or experienced, which may not generalize to actual situations. Thus, some researchers' experiences and opinions may not have been fully captured by the present design, and it is important to consider these findings as an initial step to be weighed in light of future research and replication of the present findings.

Future Directions and Recommendations

Future research ought to cross-validate (i.e., replicate) the present findings, in particular the relationships between attributed flexible practices, selective reporting, and attitudes toward replication. Complementary research could examine the various topics of the credibility crisis in research (i.e., the replication crisis) indirectly through text-mining analyses in which blog posts from researchers and media articles written for general consumption are examined for topical patterns in language and meaning. Such a study has the potential to indirectly examine researchers' attitudes toward replication. Additional research examining the trustworthiness of published science is also needed. Preliminary findings indicate that an emphasis on positive findings in top journals drives the trustworthiness of published science down while selecting for false positives and fraudulent results (Grimes, Bauch, & Ioannidis, 2018). Future research could examine the trustworthiness of science from the level of the individual researcher up through institutions and society to corroborate and complement these findings. The present study focused on attitudes and impressions toward replication at the individual level. Future research could begin examining the specific attitudes and perceived barriers stemming from the various stakeholders (e.g., journal editors and publishers) and institutions (e.g., universities) of psychological science. Although the incentive structure of science has been heavily criticized, no known evidence exists that captures actual change at the institutional level.

Conclusion

No meaningful differences in threat between the two approaches to replication emerged. Furthermore, the threat associated with either replication approach was negligible. Despite assumptions of threat toward replication, the present data suggest that scholars do not feel threatened by replication. Greater perceived flexible practices on the part of others and selective reporting of studies by others were associated with more favorable attitudes toward and support for replication research in psychology. Self-reported selective reporting of studies was negatively associated with attitudes toward and support for replication. This is consistent with a climate of increased transparency and open practices in science. In light of these findings and the broader discussion of replication in science, it is recommended that psychological science continue and expand its replication efforts with greater attention to understanding the role of stakeholders and institutions. The present findings suggest that individual-level threat is of little consequence regarding replication and that increased support for replication is associated with perceived flexible practices on the part of others. Thus, an empirical examination of attitudes toward replication research among stakeholders and institutions would add another layer to the discussion and refine our understanding of where to focus efforts to promote the normalization of replicable thinking and acting in psychological science.


References

Abernethy, A. M., & Keel, A. L. (2016). How researchers respond to replication requests revisited. Journal of Advertising, 45(1), 13-18.
Agnoli, F., Wicherts, J. M., Veldkamp, C. L. S., Albiero, P., & Cubelli, R. (2017). Questionable research practices among Italian research psychologists. PLoS ONE, 12(3). doi:10.1371/journal.pone.0172792
APA. (2006). Evidence-based practice in psychology. American Psychologist, 61(4), 271-285.
Appelbaum, M., Cooper, H., Kline, R. B., Mayo-Wilson, E., Nezu, A. M., & Rao, S. M. (2018). Journal article reporting standards for quantitative research in psychology: The APA Publications and Communications Board task force report. American Psychologist, 73(1), 3-25. doi:10.1037/amp0000389
Begley, C. G., & Ioannidis, J. P. (2015). Reproducibility in science: Improving the standard for basic and preclinical research. Circulation Research, 116(1), 116-126.
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-425.
Berger, P. L., & Luckmann, T. (1966). The social construction of reality: A treatise in the sociology of knowledge. Penguin UK.
Bruns, S. B., & Ioannidis, J. P. (2016). P-curve and p-hacking in observational research. PLoS ONE, 11(2), e0149144.
Camerer, C. F., Dreber, A., Forsell, E., Ho, T. H., Huber, J., Johannesson, M., ... & Wu, H. (2016). Evaluating replicability of laboratory experiments in economics. Science, 351(6280), 1433-1436.
Camerer, C. F., Dreber, A., Holzmeister, F., Ho, T. H., Huber, J., Johannesson, M., ... & Wu, H. (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637-644. doi:10.1038/s41562-018-0399-z
Carney, D. R., Cuddy, A. J., & Yap, A. J. (2010). Power posing: Brief nonverbal displays affect neuroendocrine levels and risk tolerance. Psychological Science, 21(10), 1363-1368. doi:10.1177/0956797610383437
Carney, D. R., Cuddy, A. J., & Yap, A. J. (2015). Review and summary of research on the embodied effects of expansive (vs. contractive) nonverbal displays. Psychological Science, 26(5), 657-663. doi:10.1177/0956797614566855
Charmaz, K. (2014). Constructing grounded theory. Sage.
Cialdini, R. B., Reno, R. R., & Kallgren, C. A. (1990). A focus theory of normative conduct: Recycling the concept of norms to reduce littering in public places. Journal of Personality and Social Psychology, 58(6), 1015-1026. doi:10.1037//0022-3514.58.6.1015
Cohen, J. (1962). The statistical power of abnormal-social psychological research: A review. Journal of Abnormal and Social Psychology, 65(3), 145-153.
Costello, A. B., & Osborne, J. W. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research & Evaluation, 10(7).
Cuddy, A. (2012). Your body language may shape who you are. Retrieved from https://www.ted.com/talks/amy_cuddy_your_body_language_shapes_who_you_are
Dominus, S. (2017). When the revolution came for Amy Cuddy. Retrieved from https://www.nytimes.com/2017/10/18/magazine/when-the-revolution-came-for-amy-cuddy.html
Earp, B. D., & Trafimow, D. (2015). Replication, falsification, and the crisis of confidence in social psychology. Frontiers in Psychology, 6, 1-11.
Elliott, M., & Doane, M. J. (2015). Stigma management of mental illness: Effects of concealment, discrimination, and identification on well-being. Self and Identity, 14(6), 654-674. doi:10.1080/15298868.2015.1053518
Engber, D. (2017). The trials of Amy Cuddy: A feminist psychologist was dragged through the mud for her mistakes. Did she deserve it? Retrieved from http://www.slate.com/articles/health_and_science/science/2017/10/did_power_posing_guru_amy_cuddy_deserve_her_public_shaming.html
Fanelli, D., Costas, R., Fang, F. C., Casadevall, A., & Bik, E. M. (2017). Why do scientists fabricate and falsify data? A matched-control analysis of papers containing problematic image duplications. BioRxiv, 126805.
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191. doi:10.3758/bf03193146
Fiedler, K., & Schwarz, N. (2016). Questionable research practices revisited. Social Psychological and Personality Science, 7(1). doi:10.1177/1948550615612150
Field, A. (2012). Discovering statistics using IBM SPSS Statistics (4th ed.). London: Sage.
Gall, T., Ioannidis, J. P., & Maniadis, Z. (2017). The credibility crisis in research: Can economics tools help? PLoS Biology, 15(4), e2001846.
Gilbert, D. T., King, G., Pettigrew, S., & Wilson, T. D. (2016). Comment on "Estimating the reproducibility of psychological science". Science, 351(6277). doi:10.1126/science.aad7243
Grimes, D. R., Bauch, C. T., & Ioannidis, J. P. (2018). Modelling science trustworthiness under publish or perish pressure. Royal Society Open Science, 5(1), 171511.
Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). The extent and consequences of p-hacking in science. PLoS Biology, 13(3), e1002106.
Heine, S. J. (2010). Cultural psychology. In S. T. Fiske, D. T. Gilbert, & G. Lindzey (Eds.), Handbook of social psychology (5th ed., pp. 1423-1464). Hoboken, NJ: Wiley.
Ioannidis, J. P. (2018). All science should inform policy and regulation. PLoS Medicine, 15(5), e1002575.
John, L. K., Loewenstein, G., & Prelec, D. (2012). Measuring the prevalence of questionable research practices with incentives for truth telling. Psychological Science, 23(5), 524-532.
Kahneman, D. (2012). Kahneman on the storm of doubts surrounding social priming research: Open letter urges labs to replicate results to avoid a looming "train wreck" [Blog post]. Retrieved from http://www.decisionsciencenews.com/2012/10/05/kahneman-on-the-storm-of-doubts-surrounding-social-priming-research/
Keselman, H. J., Algina, J., & Kowalchuk, R. K. (2001). The analysis of repeated measures designs: A review. British Journal of Mathematical and Statistical Psychology, 54(1), 1-20.
Keselman, H. J., Carriere, K. C., & Lix, L. M. (1993). Testing repeated measures hypotheses when covariance matrices are heterogeneous. Journal of Educational Statistics, 18(4), 305-319.
King, G. (2017). Replication in social science [Video]. Retrieved from http://methods.sagepub.com/video/gary-king-discusses-replication-in-the-social-sciences
Kitcher, P. (1995). The advancement of science: Science without legend, objectivity without illusions. Philosophy and Phenomenological Research, 55(3), 611-617. doi:10.2307/2108442
Klein, R., Ratliff, K., Vianello, M., Adams, R., Jr., Bahník, S., Bernstein, M., ... & Nosek, B. A. (2014). Data from investigating variation in replicability: A "many labs" replication project. Journal of Open Psychology Data, 2(1).
Klein, R. A., Vianello, M., Hasselman, F., Adams, B. G., Adams, R. B., Jr., Alper, S., ... & Nosek, B. A. (2018). Many Labs 2: Investigating variation in replicability across samples and settings. Advances in Methods and Practices in Psychological Science, 1(4), 443-490. doi:10.1177/2515245918810225
Klein, S. B. (2014). What can recent replication failures tell us about the theoretical commitments of psychology? Theory & Psychology, 24(3), 326-338. doi:10.1177/0959354314529616
Kline, R. B. (2008). Becoming a behavioral science researcher: A guide to producing research that matters. Guilford Press.
Kuhn, T. S. (1963). The structure of scientific revolutions (Vol. 2). Chicago: University of Chicago Press.
Kuhn, T. S. (1970). Logic of discovery or psychology of research. Criticism and the Growth of Knowledge, 1-23.
Kühberger, A., Fritz, A., & Scherndl, T. (2014). Publication bias in psychology: A diagnosis based on the correlation between effect size and sample size. PLoS ONE, 9(9). doi:10.1371/journal.pone.0105825
Lakatos, I. (1974). Falsification and the methodology of scientific research programmes. In I. Lakatos & A. Musgrave (Eds.), Criticism and the growth of knowledge (pp. 91-196). Cambridge: Cambridge University Press.
Lishner, D. A. (2015). A concise set of core recommendations to improve the dependability of psychological research. Review of General Psychology, 19(1), 52-68.
Lykken, D. T. (1968). Statistical significance in psychological research. Psychological Bulletin, 70(3), 151-159. doi:10.1037/h0026141
Makel, M. C., & Plucker, J. A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43(6), 304-316. doi:10.3102/0013189x14545513
Makel, M. C., Plucker, J. A., & Hegarty, B. (2012). Replications in psychology research: How often do they really occur? Perspectives on Psychological Science, 7(6), 537-542. doi:10.1177/1745691612460688
Martin, G. N., & Clarke, R. M. (2017). Are psychology journals anti-replication? A snapshot of editorial practices. Frontiers in Psychology, 8, 1-6.
Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does "failure to replicate" really mean? American Psychologist, 70(6), 487-498. doi:10.1037/a0039400
Merton, R. K. (1968). Science and democratic social structure. In Social theory and social structure (pp. 604-615). New York: Free Press.
Mortensen, C. R., Neel, R., Cialdini, R. B., Jaeger, C. M., Jacobson, R. P., & Ringel, M. M. (2018). Trending norms: A lever for encouraging behaviors performed by the minority. Social Psychological and Personality Science, 1948550617734615.
Moskowitz, G. B. (2005). Attributions. In G. B. Moskowitz (Ed.), Social cognition: Understanding self and others (pp. 233-266). New York: Guilford Press.
Munafò, M. R., Nosek, B. A., Bishop, D. V., Button, K. S., Chambers, C. D., Du Sert, N. P., ... & Ioannidis, J. P. (2017). A manifesto for reproducible science. Nature Human Behaviour, 1(1), 0021.
Nelson, L. D., Simmons, J., & Simonsohn, U. (2018). Psychology's renaissance. Annual Review of Psychology, 69, 511-534. doi:10.1146/annurev-psych-122216-011836
Nosek, B. A., Ebersole, C. R., DeHaven, A. C., & Mellor, D. T. (2018). The preregistration revolution. Proceedings of the National Academy of Sciences of the United States of America, 115(11), 2600-2606. doi:10.1073/pnas.1708274114
Nosek, B. A., Spies, J. R., & Motyl, M. (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615-631.
Open Science Collaboration. (2016). Response to comment on "Estimating the reproducibility of psychological science". Science, 351(6277), 1037. doi:10.1126/science.aad7243
Open Science Collaboration. (2017). Open Science Framework: A scholarly commons to connect the entire research cycle.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251). doi:10.1126/science.aac4716
Oransky, I. (2015). Retraction Watch. Retrieved from http://retractionwatch.com/the-retraction-watch-leaderboard/
Popper, K. R. (1959). The logic of scientific discovery. London: Routledge.
Popper, K. R. (1962). Conjectures and refutations: The growth of scientific knowledge. New York: Basic Books.
Prinz, F., Schlange, T., & Asadullah, K. (2011). Believe it or not: How much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery, 10(9), 712-713.
Pronin, E., Ross, L., & Gilovich, T. (2004). Objectivity in the eye of the beholder: Divergent perceptions of bias in self versus others. Psychological Review, 111(3), 781-799. doi:10.1037/0033-295x.111.3.781
Ranehill, E., Dreber, A., Johannesson, M., Leiberg, S., Sul, S., & Weber, R. A. (2015). Assessing the robustness of power posing: No effect on hormones and risk tolerance in a large sample of men and women. Psychological Science, 26(5), 653-656. doi:10.1177/0956797614553946
Reid, L. N., Rotfeld, H. J., & Wimmer, R. D. (1982). How researchers respond to replication requests. Journal of Consumer Research, 9(2), 216-218.
Retraction Watch Leaderboard, The. (2018, November 1). Retrieved from https://retractionwatch.com/the-retraction-watch-leaderboard/
Reynolds, L. T. (2005). Cooley and Mead on human nature and society. In B. Berberoglu (Ed.), An introduction to classical and contemporary social theory (3rd ed., pp. 37-45). Oxford: Rowman & Littlefield.
Ritchie, S. J., Wiseman, R., & French, C. C. (2012). Failing the future: Three unsuccessful attempts to replicate Bem's 'retroactive facilitation of recall' effect. PLoS ONE, 7(3), e33423.
Rizzo, J. R., House, R. J., & Lirtzman, S. I. (1970). Role conflict and ambiguity in complex organizations. Administrative Science Quarterly, 15(2), 150-162. doi:10.2307/2391486
Ross, L. (2018). From the fundamental attribution error to the truly fundamental attribution error and beyond: My research journey. Perspectives on Psychological Science, 13(6), 750-769. doi:10.1177/1745691618769855
Schein, E. H. (1990). Organizational culture. American Psychologist, 45(2), 109-119. doi:10.1037//0003-066x.45.2.109
Schein, E. H. (1996). Culture: The missing concept in organization studies. Administrative Science Quarterly, 41(2), 229-240. doi:10.2307/2393715
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). Experimental and quasi-experimental designs for generalized causal inference. Belmont: Wadsworth.
Shrout, P. E., & Rodgers, J. L. (2018). Psychology, science, and knowledge construction: Broadening perspectives from the replication crisis. Annual Review of Psychology, 69, 487-510. doi:10.1146/annurev-psych-122216-011845
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2017). Data Colada. Retrieved from http://datacolada.org/
Simmons, J. P., & Simonsohn, U. (2015). Power posing: Reassessing the evidence behind the most popular TED talk.
Simmons, J. P., & Simonsohn, U. (2017). Power posing: P-curving the evidence. Psychological Science, 28(5), 687-693. doi:10.1177/0956797616658563
Simons, D. J., Holcombe, A. O., & Spellman, B. A. (2014). An introduction to registered replication reports at Perspectives on Psychological Science. Perspectives on Psychological Science, 9(5), 552-555. doi:10.1177/1745691614543974
Simonsohn, U. (2013). Just post it: The lessons from two cases of fabricated data detected by statistics alone. Paper presented at the Society for Personality and Social Psychology (SPSP), New Orleans.
Simonsohn, U., Nelson, L. D., & Simmons, J. P. (2014). P-curve: A key to the file-drawer. Journal of Experimental Psychology: General, 143(2). doi:10.1037/a0033242
Smaldino, P. E., & McElreath, R. (2016). The natural selection of bad science. Royal Society Open Science, 3(9), 1-17. doi:10.1098/rsos.160384
Steen, R. G., Casadevall, A., & Fang, F. C. (2013). Why has the number of scientific retractions increased? PLoS ONE, 8(7), 1-9. doi:10.1371/journal.pone.0068397
Stryker, S., & Vryan, K. D. (2003). The symbolic interactionist frame. In J. Delamater (Ed.), Handbook of social psychology (pp. 3-25). New York: Kluwer.
Trafimow, D., & Earp, B. D. (2016). Badly specified theories are not responsible for the replication crisis in social psychology: Comment on Klein. Theory and Psychology, 26(4). doi:10.1177/0959354316637136
Trochim, W. M. K. (2006). Positivism & post-positivism. Retrieved from https://www.socialresearchmethods.net/kb/positvsm.php
Verfaellie, M., & McGwin, J. (2011). The case of Diederik Stapel: Allegations of scientific fraud by prominent Dutch social psychologist are investigated by multiple universities.
Vivian, J. E., & Berkowitz, N. H. (1992). Anticipated bias from an outgroup: An attributional analysis. European Journal of Social Psychology, 22(4), 415-424. doi:10.1002/ejsp.2420220410
Wang, J., Veugelers, R., & Stephan, P. (2017). Bias against novelty in science: A cautionary tale for users of bibliometric indicators. Research Policy, 46(8), 1416-1436. doi:10.1016/j.respol.2017.06.006
Washburn, A. N., Hanson, B. E., Motyl, M., Skitka, L. J., Yantis, C., Wong, K. M., ... & Melton, Z. J. (2018). Why do some psychology researchers resist adopting proposed reforms to research practices? A description of researchers' rationales. Advances in Methods and Practices in Psychological Science, 2515245918757427.
West, C., & Zimmerman, D. H. (1987). Doing gender. Gender & Society, 1(2), 125-151. doi:10.1177/0891243287001002002
Wilkinson, M. (2013). Testing the null hypothesis: The forgotten legacy of Karl Popper? Journal of Sports Sciences, 31(9), 919-920. doi:10.1080/02640414.2012.753636
Wilcox, R. R. (2010). Fundamentals of modern statistical methods: Substantially improving power and accuracy (2nd ed.). Los Angeles: Springer.
Wilson, K. S., & Baumann, H. M. (2015). Capturing a more complete view of employees' lives outside of work: The introduction and development of new interrole conflict constructs. Personnel Psychology, 68(2), 235-282. doi:10.1111/peps.12080


TABLES

Table 1. The Frequencies of Various Research Practices: Comparing Self-Reported and Perceived Frequency of Others' Conduct (Analysis 1)

Each practice is reported as "you / others" percentages for Never, Rarely, Sometimes, Often, and Always.

Emerging Practices
  Preregister hypothesis: Never 29%/2%; Rarely 14%/44%; Sometimes 20%/47%; Often 25%/6%; Always 13%/1%
  Make data publicly available: Never 12%/1%; Rarely 20%/35%; Sometimes 26%/51%; Often 24%/13%; Always 19%/1%
  Conduct formal power analysis: Never 5%/0%; Rarely 6%/24%; Sometimes 26%/44%; Often 34%/30%; Always 29%/2%
  Report effect sizes: Never 0%/0%; Rarely 2%/4%; Sometimes 5%/25%; Often 21%/57%; Always 72%/13%

Fraudulent Practice
  Falsify data: Never 95%/15%; Rarely 1%/78%; Sometimes 3%/7%; Often 1%/1%; Always 0%/0%

Flexible Practices
  Claim results unaffected by demographics when not sure: Never 79%/8%; Rarely 17%/50%; Sometimes 3%/37%; Often 1%/4%; Always 0%/1%
  Report unexpected findings as predicted: Never 47%/1%; Rarely 35%/11%; Sometimes 14%/40%; Often 3%/46%; Always 1%/3%
  Selectively report studies that "work": Never 28%/2%; Rarely 29%/8%; Sometimes 27%/26%; Often 11%/56%; Always 5%/9%
  Stop data collection early after found desired result: Never 89%/4%; Rarely 8%/42%; Sometimes 3%/43%; Often 1%/11%; Always 0%/0%
  "Round off" p-value just over .05: Never 86%/11%; Rarely 11%/50%; Sometimes 2%/30%; Often 1%/8%; Always 0%/0%
  Report all study's conditions: Never 1%/0%; Rarely 2%/10%; Sometimes 3%/20%; Often 20%/61%; Always 75%/9%
  Collect data on multiple DVs and only report analyses for a subset that "work": Never 41%/3%; Rarely 29%/15%; Sometimes 21%/42%; Often 8%/37%; Always 2%/4%
  Collect more data after seeing if results were significant: Never 59%/0%; Rarely 24%/23%; Sometimes 14%/56%; Often 3%/20%; Always 0%/2%
  Exclude data after looking at the impact of doing so on the results: Never 62%/2%; Rarely 27%/33%; Sometimes 10%/45%; Often 1%/20%; Always 0%/1%

Self-cite
  Self-cite published work: Never 7%/0%; Rarely 9%/2%; Sometimes 28%/9%; Often 36%/54%; Always 20%/35%

Note. All reported frequencies utilized all available data despite missing data; thus, the sample size ranges from 155 to 162 respondents.
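As a point of reference for how a table like this can be produced, the following is a minimal pandas sketch (the data frame, column names, and values are hypothetical, not the dissertation's data): each self/other percentage is simply a row-normalized cross-tabulation of responses.

```python
import pandas as pd

# Hypothetical long-format responses: one row per respondent x practice x target.
df = pd.DataFrame({
    "practice": ["Preregister hypothesis"] * 6,
    "target":   ["you", "you", "you", "others", "others", "others"],
    "response": ["Never", "Sometimes", "Often", "Rarely", "Sometimes", "Sometimes"],
})

# Row-normalized percentages, analogous to the you/others columns of Table 1.
freq = pd.crosstab([df["practice"], df["target"]], df["response"], normalize="index") * 100
print(freq.round(0))
```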

Table 2. Descriptive Statistics for the Replication Measures by Approach (Analysis 2, n = 189)

Mean (SD), Method-Oriented vs. Effect-Oriented:
  Useful: 4.77 (0.97) vs. 4.74 (1.00)
  Reasonable: 4.95 (0.91) vs. 4.99 (0.94)
  Encourage: 4.69 (1.12) vs. 4.84 (1.12)
  Threatening: 2.05 (1.12) vs. 1.96 (1.08)
  Antagonistic: 2.01 (1.11) vs. 1.94 (0.99)
  Distress: 2.28 (1.20) vs. 2.18 (1.17)

Note. Each item is on a 6-point scale anchored at 6 = Extremely to 1 = Not at all. Higher values indicate greater endorsement that the approach is useful, reasonable, encourages one's participation, antagonistic, threatening, and distressful.


Table 3. Summary Statistics for Positively and Negatively Valenced Measures of Approaches to Replication (Analysis 2)

Variable: partial eta squared (ηp²), F, p, df
  Useful: 0.001, F = 0.220, p = 0.639, df = 189
  Reasonable: 0.002, F = 0.129, p = 0.554, df = 189
  Encourage: 0.016, F = 2.213, p = 0.083, df = 189
  Threatening: 0.011, F = 2.122, p = 0.147, df = 189
  Antagonistic: 0.006, F = 1.201, p = 0.247, df = 189
  Distress: 0.018, F = 3.483, p = 0.064, df = 189

Note. Data violate the assumption of equal variance (i.e., sphericity). All reported statistics are robust estimations (Greenhouse-Geisser).
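For readers unfamiliar with the Greenhouse-Geisser correction mentioned in the note, the sketch below shows one way such a within-subject comparison could be run in Python with the pingouin package; this is an assumption for illustration (the dissertation does not state which software produced Table 3), and the respondents, ratings, and column names are hypothetical.

```python
import pandas as pd
import pingouin as pg

# Hypothetical long-format data: one "useful" rating per respondent per approach.
df = pd.DataFrame({
    "respondent": [1, 1, 2, 2, 3, 3, 4, 4],
    "approach":   ["method", "effect"] * 4,
    "useful":     [5, 5, 4, 5, 6, 5, 3, 4],
})

# Repeated-measures ANOVA; with correction='auto', pingouin applies a
# Greenhouse-Geisser correction whenever the sphericity assumption is violated.
aov = pg.rm_anova(data=df, dv="useful", within="approach",
                  subject="respondent", correction="auto", effsize="np2")
print(aov)
```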

Table 4. Summary Statistics Comparing Impressions Toward Replication Among Men (n = 57) and Women (n = 93) on Positively and Negatively Valenced Measures (Analysis 2)

Variable: partial eta squared (ηp²), F, p, df
  Useful: 0.007, F = 0.552, p = 0.577, df = 147
  Reasonable: 0.002, F = 0.138, p = 0.871, df = 148
  Encourage: 0.004, F = 0.333, p = 0.717, df = 148
  Threatening: 0.004, F = 0.300, p = 0.741, df = 148
  Antagonistic: 0.009, F = 0.707, p = 0.495, df = 148
  Distress: 0.004, F = 0.311, p = 0.733, df = 148

Note. Data violate the assumption of homogeneity of covariance. All reported statistics are robust estimations (Pillai's Trace).


Table 5. Summary Statistics Comparing Impressions Toward Replication Across Professional Status on Positively and Negatively Valenced Measures (Analysis 2)

Variable: partial eta squared (ηp²), F, p, df
  Useful: 0.024, F = 0.857, p = 0.554, df = 137
  Reasonable: 0.038, F = 1.384, p = 0.203, df = 138
  Encourage: 0.052, F = 1.894, p = 0.061, df = 138
  Threatening: 0.022, F = 0.781, p = 0.620, df = 138
  Antagonistic: 0.005, F = 0.163, p = 0.995, df = 138
  Distress: 0.015, F = 0.544, p = 0.823, df = 138

Note. Data violate the assumption of homogeneity of covariance. All reported statistics are robust estimations (Pillai's Trace). Professional status includes Full Professor (n = 27), Associate Professor (n = 24), Assistant Professor (n = 27), Post-doctoral fellow (n = 23), and Graduate student (n = 42). Cruder analyses dichotomizing professional status as either tenured (Full and Associate Professor) or non-tenured (Assistant Professor, Post-doctoral fellow, and Graduate student) were also performed; no differences between tenured and non-tenured professionals emerged. Reported statistics reflect the more fine-grained analyses.


Table 6. Descriptive Statistics for Attributional Causes Researchers Assign to a Partial Replication Outcome of a Peer's Work Relative to Their Own Work (Analysis 3)

Causal attribution: n, Mean (SD)

Peer's research:
  Flexibility in choosing dependent variables: n = 142, 37.73 (25.22)
  Flexibility in choosing sample size: n = 138, 35.89 (27.38)
  Flexibility in choosing covariates: n = 134, 38.19 (26.38)
  Flexibility in reporting experimental conditions: n = 117, 34.77 (28.05)
  Selective reporting: n = 135, 50.27 (30.03)
  Procedural difference between original & replication studies: n = 157, 50.67 (27.30)
  Publication bias: n = 143, 52.27 (29.07)
  Sampling error: n = 156, 49.90 (27.66)
  Statistically underpowered original finding: n = 146, 47.47 (28.22)

Your research:
  Flexibility in choosing dependent variables: n = 142, 29.44 (26.56)
  Flexibility in choosing sample size: n = 138, 28.80 (28.48)
  Flexibility in choosing covariates: n = 134, 26.36 (26.11)
  Flexibility in reporting experimental conditions: n = 117, 20.85 (28.85)
  Selective reporting: n = 135, 29.19 (30.61)
  Procedural difference between original & replication studies: n = 157, 49.94 (28.48)
  Publication bias: n = 143, 40.63 (32.45)
  Sampling error: n = 156, 49.26 (28.99)
  Statistically underpowered original finding: n = 146, 36.84 (31.73)

Note. Each item was on a slider scale anchored at 0 to 100. Higher values indicate greater endorsement that the attribution was a contributing cause leading to partial replication of the original finding.


Table 7. Summary Statistics for Attributional Causes Assigned to a Partial Replication Outcome of a Peer's Work Relative to One's Own Work (Analysis 3)

Causal attribution: partial eta squared (ηp²), F, p, df
  Flexibility in choosing dependent variables: 0.164, F = 27.582, p < 0.001, df = 141
  Flexibility in choosing sample size: 0.082, F = 12.236, p = 0.001, df = 137
  Flexibility in choosing covariates: 0.235, F = 40.806, p < 0.001, df = 133
  Flexibility in reporting experimental conditions: 0.284, F = 46.066, p < 0.001, df = 116
  Selective reporting: 0.397, F = 88.097, p < 0.001, df = 134
  Procedural difference between original & replication studies: 0.002, F = 0.287, p = 0.593, df = 156
  Publication bias: 0.188, F = 32.812, p < 0.001, df = 142
  Sampling error: 0.001, F = 0.208, p = 0.649, df = 155
  Statistically underpowered original finding: 0.146, F = 24.695, p < 0.001, df = 145

Note. Data violate the assumption of sphericity. All reported statistics are robust estimations (Greenhouse-Geisser).


Table 8. Summary of Factor Loadings for 18 Attribution Items; Three Factors Emerged (Analysis 4)

Rotated factor loadings are listed in the order (Flexible Practices of Others, Flexible Practices of Oneself, Situational Attributes, Publication Bias):
  Original authors' flexibility in choosing among covariates: 0.806, 0.387, 0.076, 0.084
  Original authors' flexibility in choosing among dependent variables: 0.783, 0.338, 0.066, 0.11
  Original authors' flexibility in choosing sample size: 0.69, 0.308, 0.2, 0.188
  Statistically Underpowered Original finding/effect: 0.637, 0.135, 0.199, 0.251
  Original authors' flexibility in reporting subsets of experimental conditions: 0.631, 0.462, 0.295, 0.136
  Selective Reporting (regarding a peer's work): 0.628, 0.279, 0.111, 0.507
  Your flexibility in choosing among covariates: 0.375, 0.778, 0.077, 0.156
  Your flexibility in choosing among dependent variables: 0.387, 0.749, 0.194, 0.052
  Your flexibility in reporting subsets of experimental conditions: 0.261, 0.717, 0.338, 0.153
  Your flexibility in choosing sample size: 0.381, 0.715, 0.301, 0.077
  Selective Reporting (regarding your work): 0.211, 0.632, 0.156, 0.422
  Statistically Underpowered Original finding/effect: 0.282, 0.448, 0.384, 0.303
  Procedural Differences between Original and Replication studies (regarding a peer's work): 0.187, 0.144, 0.835, 0.056
  Procedural Differences between Original and Replication studies (regarding your work): 0.154, 0.193, 0.772, 0.013
  Sampling Error (regarding your work): 0.056, 0.17, 0.756, 0.249
  Sampling Error (regarding a peer's work): 0.111, 0.155, 0.737, 0.219
  Publication Bias: 0.392, 0.075, 0.187, 0.726
  Publication Bias: 0.121, 0.428, 0.439, 0.694

  Eigenvalues: 9.173, 2.253, 1.292, 1.077
  % of variance: 49.36, 10.872, 5.586, 4.51
  α: 0.916, 0.914, 0.892
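As an illustration of how a rotated loading matrix of this kind can be obtained, the following sketch uses the factor_analyzer package with a varimax rotation; both the package and the rotation method are assumptions for illustration (the dissertation does not state them here), and the random matrix merely stands in for the 144 x 18 respondent-by-item data.

```python
import numpy as np
from factor_analyzer import FactorAnalyzer

# Stand-in for the respondent-by-item matrix (rows = respondents,
# columns = the 18 attribution items rated 0-100).
rng = np.random.default_rng(7)
X = rng.uniform(0, 100, size=(144, 18))

# Extract four factors and apply an orthogonal (varimax) rotation.
fa = FactorAnalyzer(n_factors=4, rotation="varimax")
fa.fit(X)

loadings = fa.loadings_                # 18 x 4 rotated loadings, as in Table 8
eigenvalues, _ = fa.get_eigenvalues()  # eigenvalues of the correlation matrix
print(loadings.shape, eigenvalues[:4])
```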

Table 9. Summary Statistics of Variables Included in Four Regression Models Examining Attitudes Toward Replication Research in Psychology (Analysis 4, n = 144)

Variable: Mean (SD)

Target/dependent measures:
  Attitude toward replication research in psychology: 4.08 (0.96)
  Support for replication research in psychology: 4.17 (0.86)
  Attitude toward discussion of replication in psychology: 4.38 (1.30)
  Impact replication research has on your future research practices: 3.79 (1.40)

Independent variables:
  Attributed flexible practices of others: 37.89 (22.86)
  Situational attributions: 50.70 (23.92)
  Perceived selective reporting of studies by others: 3.65 (0.81)
  Selective reporting of studies with one's own published work: 2.42 (1.13)

Note. Regarding the dependent variables, attitudes toward and support for replication research are measured on 5-point scales, with higher values indicating more favorable attitudes and support. Attitude toward the discussion of replication and how replication research will impact one's future research practices are measured on 6-point scales, with higher values indicating more favorability and impact, respectively. Regarding the independent variables: attributed flexible practices of others is composed of 5 items, each measured on a slider scale from 0-100, with higher values indicating greater endorsement of these practices in a peer's work; situational attributions is composed of 4 items, each measured on a slider scale from 0-100, with higher values indicating greater endorsement of these factors as impacting replicability. Perceived selective reporting of studies by others and by oneself are both measured on 5-point scales, with higher values indicating more selective reporting of studies.


Table 10. Model Examining Attitude Toward Replication Research in Psychology (Analysis 4, Adj-R² = 0.21)

Variable: b (unstd.), SE, β (std.), p, 95% CI [lower, upper]
  Flexible attributed practices of others**: b = 0.017, SE = 0.003, β = 0.394, p = 0.000, 95% CI [0.010, 0.023]
  Situational attributions*: b = -0.009, SE = 0.003, β = -0.221, p = 0.006, 95% CI [-0.015, -0.003]
  Perceived selective reporting of studies by others*: b = 0.290, SE = 0.135, β = 0.245, p = 0.034, 95% CI [0.022, 0.558]
  Selective reporting of studies with one's own published work*: b = -0.185, SE = 0.091, β = -0.217, p = 0.042, 95% CI [-0.364, -0.006]
  Constant: b = 3.286, SE = 0.465, 95% CI [2.366, 4.207]

Note. The assumption of homogeneity was violated; thus, robust estimates are reported. All other assumptions of linear regression were met. ** p < 0.001, * p < 0.05.
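The note's "robust estimates" can be illustrated with an ordinary least squares model using heteroskedasticity-consistent standard errors. The sketch below uses statsmodels with an HC3 covariance estimator; the specific robust estimator used in the dissertation is not stated, so HC3 is an assumption, and the variable names and simulated data merely mirror the structure of Table 10.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Simulated stand-ins for the survey variables (names mirror Table 10).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "flex_others": rng.uniform(0, 100, 144),
    "situational": rng.uniform(0, 100, 144),
    "select_others": rng.integers(1, 6, 144),
    "select_self": rng.integers(1, 6, 144),
})
df["attitude"] = 3 + 0.015 * df["flex_others"] + rng.normal(0, 0.8, 144)

X = sm.add_constant(df[["flex_others", "situational", "select_others", "select_self"]])
# OLS with heteroskedasticity-consistent (HC3) standard errors, one common
# form of "robust" estimation when the homogeneity assumption is violated.
model = sm.OLS(df["attitude"], X).fit(cov_type="HC3")
print(model.summary())
```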


Table 11. Model Examining Support for Replication Research in Psychology (Analysis 4, Adj-R² = 0.16)

Variable: b (unstd.), SE, β (std.), p, 95% CI [lower, upper]
  Flexible attributed practices of others**: b = 0.013, SE = 0.003, β = 0.349, p = 0.000, 95% CI [0.007, 0.019]
  Situational attributions: b = -0.005, SE = 0.003, β = -0.140, p = 0.075, 95% CI [-0.010, 0.001]
  Perceived selective reporting of studies by others*: b = 0.244, SE = 0.116, β = 0.232, p = 0.037, 95% CI [0.015, 0.473]
  Selective reporting of studies with one's own published work*: b = -0.190, SE = 0.083, β = -0.250, p = 0.023, 95% CI [-0.354, -0.026]
  Constant: b = 3.499, SE = 0.400, 95% CI [2.709, 4.289]

Note. The assumption of homogeneity was violated; thus, robust estimates are reported. All other assumptions of linear regression were met. ** p < 0.001, * p < 0.05.

Table 12. Model Examining Attitude Toward Discussion of Replication in Psychology (Analysis 4, Adj-R² = 0.12)

Variable: b (unstd.), SE, β (std.), p, 95% CI [lower, upper]
  Flexible attributed practices of others**: b = 0.020, SE = 0.004, β = 0.351, p = 0.000, 95% CI [0.011, 0.029]
  Situational attributions: b = -0.007, SE = 0.004, β = -0.133, p = 0.099, 95% CI [-0.016, 0.001]
  Perceived selective reporting of studies by others: b = 0.244, SE = 0.163, β = 0.153, p = 0.135, 95% CI [-0.077, 0.566]
  Selective reporting of studies with one's own published work: b = -0.204, SE = 0.112, β = -0.178, p = 0.071, 95% CI [-0.427, 0.018]
  Constant: b = 3.595, SE = 0.541, 95% CI [2.526, 4.664]

Note. The assumption of homogeneity was violated; thus, robust estimates are reported. All other assumptions of linear regression were met. ** p < 0.001.


Table 13. Model Examining the Impact Replication Research Has on Your Future Research Practices (Analysis 4, Adj-R² = 0.03)

Variable: b (unstd.), SE, β (std.), p, 95% CI [lower, upper]
  Flexible attributed practices of others: b = 0.009, SE = 0.005, β = 0.154, p = 0.063, 95% CI [-0.001, 0.019]
  Situational attributions: b = -0.004, SE = 0.005, β = -0.063, p = 0.464, 95% CI [-0.014, 0.006]
  Perceived selective reporting of studies by others: b = 0.327, SE = 0.206, β = 0.190, p = 0.114, 95% CI [-0.080, 0.734]
  Selective reporting of studies with one's own published work: b = -0.167, SE = 0.137, β = -0.134, p = 0.226, 95% CI [-0.437, 0.104]
  Constant: b = 2.829, SE = 0.680, 95% CI [1.484, 4.174]

Note. The assumption of homogeneity was violated; thus, robust estimates are reported. All other assumptions of linear regression were met.


FIGURES

Figure 1. Means of Replication Measures by Approach (Analysis 2, n = 189). [Bar chart: mean ratings for the method-oriented and effect-oriented approaches across the measures Useful, Reasonable, Encourage, Threatening, Antagonistic, and Distress; the y-axis shows the 1-6 response option range.]

Note. Each item is on a 6-point scale anchored at 6 = Extremely to 1 = Not at all. Higher values indicate greater endorsement that the approach is useful, reasonable, encourages one's participation, antagonistic, threatening, and distressful.


Figure 2. Perceived Utility of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2). [Bar chart: percentage of respondents selecting each response option (Extremely, Very, Moderately, Slightly, Minimally, Not at all) for the method-oriented and effect-oriented approaches.]

Figure 3. Perceived Reasonability of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2). [Bar chart: percentage of respondents per response option for the method-oriented and effect-oriented approaches.]


Figure 4. Perceived Encouragement to Participate in Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2). [Bar chart: percentage of respondents per response option for the method-oriented and effect-oriented approaches.]

Figure 5. Perceived Threat of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2). [Bar chart: percentage of respondents per response option for the method-oriented and effect-oriented approaches.]


Figure 6. Perceived Antagonism of Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2). [Bar chart: percentage of respondents per response option for the method-oriented and effect-oriented approaches.]

Figure 7. Perceived Sense of Distress Toward Two Approaches to Systematic Replication by Percentage of Respondents (Analysis 2). [Bar chart: percentage of respondents per response option for the method-oriented and effect-oriented approaches.]


Figure 8. Attributional Causes Contributing to a Replication Outcome of a Peer's Versus One's Own Published Research (Analysis 3). [Bar chart: mean attribution ratings (0-100 scale) for a peer's work versus one's own work, grouped into dispositional attributions (flexibility in choosing dependent variables, sample size, and covariates; flexibility in reporting experimental conditions; selective reporting) and situational attributions (statistically underpowered original finding, publication bias, procedural differences between original and replication studies, and sampling error).]

Note. ** denotes a significant self-other difference at p <= 0.001.


APPENDIX A: SURVEY

INTRODUCTORY PAGE:

Introduction You are being invited to participate in a research study. Before you agree to be in the study, read the following. It explains why we are doing the study, and foreseeable risks and benefits.

Your participation is voluntary. All responses will be kept confidential.

Why are we doing this study? There has been an increased interest in the replicability and reproducibility of research. We think an important component is to understand the experiences and opinions researchers have toward replication.

Study length and foreseeable risks and benefits: The survey will take approximately 15-20 minutes. There are no foreseeable risks to you. Your participation, however, will enlighten the discussion of replication in psychology.

Contact Information. You may ask questions of the researcher at any time by sending an email to Ryan S. Erhart ______or by calling Clayton Peoples at ______.

You may ask about your rights as a research participant. If you have questions, concerns, or complaints about this research, you may report them (anonymously if you so choose) by calling the University of Nevada, Reno Research Integrity Office at ______.

Thank you for taking the time to participate. ______

INVOLVEMENT WITH RESEARCH QUESTIONS:

Are you involved in research? Yes/No

Have you published your research? Yes/No

How many peer-reviewed publications do you hold, including any manuscripts under review (please enter a numeric value)?

METHOD-ORIENTED SCENARIO TO REPLICATION:

Assume you receive the following request to replicate your recently published study. Please read this request scenario and then answer the questions that follow it.

Scenario: Imagine that your most recent published finding has been selected as part of a replication study by another lab. The procedures the lab will follow are communicated via email as follows:

We are writing you because we plan to replicate your recently published experimental results.

Why are we doing this? There has been an increasing interest in the predictors of reproducibility of research results, and how low reproducibility may inhibit efficient accumulation of knowledge. The best way to gain a better understanding of reproducibility is to study it by trying to replicate published results.

What do we expect from you? We seek original materials from your published research, feedback on our replication design, and any additional information that will help us conduct the most effective replication possible.

We have attached a pre-replication plan/report that states the hypothesis that we have chosen from your paper, and how we will proceed with the replication study.

After we have received your initial feedback and the materials from the study we will prepare a more complete version of the replication report that we will send to you for review. ______

FOLLOW-UP QUESTIONS:

In thinking about this request, how [USEFUL, REASONABLE, ANTAGONISTIC, THREATENING, DISTRESSED] do you feel it is?

Do you feel this request encourages your participation in the process?

1 = Extremely [USEFUL, REASONABLE, ENCOURAGED, ANTAGONISTIC, THREATENING, DISTRESSED], 2 = Very, 3 = Moderately, 4 = Slightly, 5 = Minimally, & 6 = Not at all [USEFUL, REASONABLE, ENCOURAGED, ANTAGONISTIC, THREATENING, DISTRESSED]

EFFECT-ORIENTED SCENARIO TO REPLICATION:

Assume you receive the following request to replicate your recently published study. Please read this request scenario and then answer the questions that follow it.

Scenario: Imagine that the general relationship/effect that your most recent published study speaks to has been selected as part of a replication study by another lab. The procedures the lab will follow are communicated via email as follows:

We are writing you because we plan to replicate your recently published experimental results.

Why are we doing this? There has been an increasing interest in reproducibility of research, and how low reproducibility may inhibit efficient accumulation of knowledge. The best way to gain a better understanding of reproducibility is to replicate robust hypotheses and findings.

What do we expect from you? We seek original materials from your published research, your approval of our replication design, and any additional information that will help us construct the most effective replication protocol.

We have attached a pre-replication plan/report that states the hypothesis that we have chosen from your paper, and how we plan to proceed with the replication study.

After we have received your approval and the materials from the study we will prepare a more complete version of the replication report that we will send to you for approval. ______

FOLLOW-UP QUESTIONS:

In thinking about this request, how [USEFUL, REASONABLE, ANTAGONISTIC, THREATENING, DISTRESSED] do you feel it is?

Do you feel this request encourages your participation in the process?

1 = Extremely [USEFUL, REASONABLE, ENCOURAGED, ANTAGONISTIC, THREATENING, DISTRESSED], 2 = Very, 3 = Moderately, 4 = Slightly, 5 = Minimally, & 6 = Not at all [USEFUL, REASONABLE, ENCOURAGED, ANTAGONISTIC, THREATENING, DISTRESSED]

ATTRIBUTIONAL MEASURES OF A PARTIAL REPLICATION OUTCOME (PEER)

Partial Direct Replication of a peer's research

Direct replication is the attempt to recreate the conditions believed sufficient for obtaining a previously observed finding (i.e., finding in the same direction as the original study and a p-value < 0.05) and is the means of establishing the reproducibility of an effect with new data.

Imagine a peer's most recently published work has been selected for replication. With this definition in mind, indicate the degree to which you believe the following causes contributed to a replication attempt that partially reproduced evidence of the original effect of a peer's most recent work (e.g., finding in the same direction with a weaker effect size than the original finding/s).

To what degree do you think the outcome of a partial replication of a peer's most recently published finding can be attributed to the following? Please respond to each attribute by placing the slider on the desired value between 0-100.

ATTRIBUTIONS (all items were measured on a slider scale from 0-100):
• Original authors' flexibility in choosing among dependent variables
• Original authors' flexibility in choosing sample size
• Original authors' flexibility in choosing among covariates
• Original authors' flexibility in reporting subsets of experimental conditions
• Selective Reporting (i.e., indication of only those findings and effects original author(s) deem worth reporting)
• Procedural Differences between Original and Replication studies (e.g., administered online versus in-person)
• Publication Bias (i.e., publishing of statistically significant findings over non-significant findings)
• Sampling Error (i.e., lack of representativeness between the sample used for the original and replicating studies)
• Statistically Underpowered Original finding/effect (i.e., the original study's sample size was too small to detect a true effect)


ATTRIBUTIONAL MEASURES OF A PARTIAL REPLICATION OUTCOME (YOUR)

Partial Direct Replication of your research

Direct replication is the attempt to recreate the conditions believed sufficient for obtaining a previously observed finding (i.e., finding in the same direction as the original study and a p-value < 0.05) and is the means of establishing the reproducibility of an effect with new data.

Imagine your most recently published work has been selected for replication. With this definition in mind, indicate the degree to which you believe the following causes contributed to a replication attempt that partially reproduced evidence of the original effect of your most recent work (e.g., finding in the same direction with a weaker effect size than the original finding/s).

To what degree do you think the outcome of a partial replication of your most recently published finding can be attributed to the following? Please respond to each attribute by placing the slider on the desired value between 0-100.

ATTRIBUTIONS (all items were measured on a slider scale from 0-100):
• Your flexibility in choosing among dependent variables
• Your flexibility in choosing sample size
• Your flexibility in choosing among covariates
• Your flexibility in reporting subsets of experimental conditions
• Selective Reporting (i.e., indication of only those findings and effects you, as the original author, deem worth reporting)
• Procedural Differences between Original and Replication studies (e.g., administered online versus in-person)
• Publication Bias (i.e., publishing of statistically significant findings over non-significant findings)
• Sampling Error (i.e., lack of representativeness between the sample used for the original and replicating studies)
• Statistically Underpowered Original finding/effect (i.e., the original study's sample size was too small to detect a true effect)
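
For concreteness only, the sketch below shows one way the 0-100 attribution sliders could be averaged into self- and peer-directed composites and compared within respondents. The column names, item groupings, and example values are hypothetical assumptions made for illustration; they are not the study's variables, data, or analysis code.

# Minimal sketch (Python): composite the flexibility attributions and compare
# self- vs. peer-directed ratings with a paired test. All names and values are
# hypothetical illustrations, not the dissertation's data or code.
import pandas as pd
from scipy import stats

# Hypothetical 0-100 slider responses from three respondents.
df = pd.DataFrame({
    "peer_flex_dv": [70, 55, 80],   # peer's flexibility in choosing DVs
    "peer_flex_n":  [65, 50, 75],   # peer's flexibility in choosing sample size
    "self_flex_dv": [30, 25, 40],   # own flexibility in choosing DVs
    "self_flex_n":  [35, 20, 45],   # own flexibility in choosing sample size
})

# Average the flexibility items into one composite per target (peer vs. self).
df["peer_flexibility"] = df[["peer_flex_dv", "peer_flex_n"]].mean(axis=1)
df["self_flexibility"] = df[["self_flex_dv", "self_flex_n"]].mean(axis=1)

# Each respondent rates both targets, so a paired comparison is appropriate.
t, p = stats.ttest_rel(df["peer_flexibility"], df["self_flexibility"])
print(f"t = {t:.2f}, p = {p:.3f}")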


IDENTITY & ROLE CONFLICT MEASURES

SOCIAL AND PERSONAL IDENTITY INSTRUCTION:

The following questions ask about your identity as a [RESEARCHER, ACADEMIC] belonging to the scientific community. Please answer each question by selecting the response option that most closely matches your identity.

My role as a member of the [scientific community, academic institution] is an important aspect of who I am.

My role [as a researcher, within academia] has little to do with how I think about myself.

My role within [science, academia] is central to my identity.

I identify with other people who are within my [area of research, academic institution].

I do not feel a bond with people who are members of [my research area, academic institution].

I feel out of place being around other people who are members of [my research area, academic institution].

ROLE CONFLICT INSTRUCTION:

The following questions ask about your role as a researcher and how that interacts with your role as an academic. Please answer each question by selecting the response option that most closely matches your experience.

The amount of time my academic institution requires makes it difficult to fulfill my research responsibilities.

My academic institution keeps me from research activities more than I would like.

I often neglect research responsibilities because of the demands of my academic institution.

The expectations of my academic institution make it difficult to spend time on research responsibilities.

I put off pursuing research interests in order to fulfill my academic responsibilities.
______
1 = strongly disagree to 6 = strongly agree


NORMATIVE PRACTICE MEASURES

How frequently do you believe [OTHER, YOU] researchers practice the following?

• Pre-register your hypotheses
• Make your data publicly available
• Conduct formal power analyses
• Report effect sizes
• Falsify data
• Claim results are unaffected by demographic variables when you are actually not sure (or know that they are)
• Report an unexpected finding as having been predicted from the start
• Selectively report only studies that "work"
• Stop data collection early because you found the desired result
• "Round off" a p-value that is just over .05
• Report all of a study's conditions
• Collect data on multiple DVs and only report analyses for a subset that "work"
• Collect more data after looking to see whether the results were significant
• Exclude data after looking at the impact of doing so on the results
• Self-cite in your published research
______
1 = Never, 2 = Rarely, 3 = Sometimes, 4 = Often, 5 = Always
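
As a purely illustrative aside, the 1-5 frequency ratings above could be summarized separately for openly endorsed practices and for questionable research practices; the item keys, groupings, and example values in the sketch below are assumptions made for illustration, not the dissertation's coding scheme.

# Minimal sketch (Python): summarize hypothetical 1-5 frequency ratings by
# grouping a few items into "open practices" and "questionable research
# practices (QRPs)". Keys, groupings, and values are illustrative only.
from statistics import mean

ratings = {
    "preregister_hypotheses": 2,       # 1 = Never ... 5 = Always
    "share_data_publicly": 3,
    "report_effect_sizes": 5,
    "selectively_report_studies": 2,
    "round_off_p_value": 1,
    "stop_collection_early": 2,
}

open_practices = {"preregister_hypotheses", "share_data_publicly", "report_effect_sizes"}
qrps = {"selectively_report_studies", "round_off_p_value", "stop_collection_early"}

open_mean = mean(ratings[k] for k in open_practices)
qrp_mean = mean(ratings[k] for k in qrps)
print(f"Open-practice frequency: {open_mean:.2f}; QRP frequency: {qrp_mean:.2f}")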


FAMILIARITY AND EXPERIENCE WITH, & ACCEPTANCE OF REPLICATION

Please answer each question below using the scale provided.

How much formal training have you received on replication research?

How much replication research have you conducted?

How much replication research have you participated in?
______
1 = None, 2 = A little, 3 = Some, 4 = An average amount, 5 = More than average, 6 = A lot

Regarding the most recent replication attempt you've participated in, was the outcome able to reproduce evidence for the original finding and/or effect? [Fully, Partially, Not at all]

Has any of your published work been selected/targeted for replication by another researcher? [Yes/No]

What was the outcome of the most recent replication of your work? [Full replication, Partial replication, Failed replication]

Was the most recent replication attempt of your work published? [Yes/No]

ACCEPTANCE INSTRUCTIONS:

The following questions ask about your perception of the acceptance of replication research. Please answer each question by selecting the response option that most closely matches your perception.

[Academic journals, Publishers, Peer Reviewers, Journal editors, Academic institutions, field of psychology] in psychology need to be more accepting of replication research. (1 = strongly disagree to 6 = strongly agree)


ATTITUDES TOWARD REPLICATION

What is your general opinion toward replication research in psychology?

How supportive are you of replication research in psychology? (reverse coded)
______
1 = Very [Favorable, Supportive] to 5 = Very Un[Favorable, Supportive]

How confident are you that the majority of findings in psychology will replicate? (reverse coded)
______
1 = Not at all Confident, 2 = Slightly Confident, 3 = Moderately Confident, 4 = Very Confident, 5 = Completely Confident

What is your opinion toward the discussion of replication in psychology?
______
1 = Very little favor, 2 = Unfavorable, 3 = Slight, 4 = Moderate, 5 = Favorable, 6 = Very Favorable

How much does replication research in psychology impact your future research practices?
______
1 = Not at all, 2 = Some, 3 = An average amount, 4 = More than an average amount, 5 = A lot, 6 = Completely

What percentage of published research in psychology do you believe is false (i.e., false-discovery/false-positive)? (slider scale 0-100)

What percentage of published research in psychology do you believe is true (i.e., accurate-discovery/actual effect)? (slider scale 0-100)
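
Two of the attitude items above are noted as reverse coded. Purely as a worked illustration of that rescoring (the function and example values below are hypothetical, not taken from the study's materials), a rating on a 1-5 item is reversed by subtracting it from the sum of the scale endpoints:

# Minimal sketch (Python): reverse-code a Likert-type rating so that higher
# scores consistently indicate more favorable/supportive attitudes.
def reverse_code(score, low=1, high=5):
    """Return the reversed rating: (low + high) - score."""
    return (low + high) - score

# A "Very Supportive" response of 1 on the 1-5 support item becomes 5 after
# reverse coding, aligning it with the direction of the favorability item.
print(reverse_code(1))        # -> 5
print(reverse_code(4, 1, 5))  # -> 2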


DEMOGRAPHIC QUESTIONS

What is your gender identity?

What is your age?

What percentage of your job is committed to research? (slider scale 0-100)

What is your professional status? [Full Professor, Associate Professor, Assistant Professor, Non-academic Ph.D., Adjunct Professor/Lecturer/Research Associate, Post-doctoral fellow, Graduate student, Undergraduate research assistant]

What are your areas of research (select all that apply)? (33 areas were listed with an 'other, please specify' option)

What is your country of origin?

How do you identify yourself? [African American/Black American, Asian American, Hispanic/Latin American, Middle Eastern American, Native American/American Indian, Pacific Islander, White American/Caucasian, Mixed/Multiple Ethnic Groups, Other (please specify):, Prefer not to answer]