Reproducibility in Clinical Psychology
Christopher J. Hopwood and Simine Vazire
University of California, Davis
For: A.G.C. Wright and M.N. Hallquist, Handbook of Research Methods in Clinical Psychology

There is a long history of invalid ideas in clinical psychology, many of which have had profound negative effects on public health and individual lives. “Refrigerator mothers” were once blamed for autism (Greydanus & Toledo-Pereyra, 2012). Leeching (De Young, 2015), animal magnetism (Ellenberger, 1970), and the orgone box (Isaacs, 1999) were once proposed to treat mental illness. In each of these cases and many others, invalid ideas were corrected by scientific research, to the benefit of the public interest. The ability to identify and correct bad ideas about etiology and intervention is the primary virtue of a scientific approach to clinical psychology. Although the causes of psychopathology and the best ways to prevent and treat problems in living remain poorly understood, there has been clear progress in the field that can be attributed directly to the scientific method.

Invalid and harmful theories are not confined to the distant past. In 1998, Wakefield and colleagues published a paper in a prestigious journal ostensibly linking the measles, mumps, and rubella (MMR) vaccine to autism. Although it was based on a small sample of only 12 children, the finding was highly publicized. Even though a number of epidemiological studies found no association between the MMR vaccine and autism, rates of MMR vaccinations decreased and rates of measles, mumps, and rubella increased in the United Kingdom following the publication of the study (McIntyre & Leask, 2008). After journalists uncovered multiple conflicts of interest, questionable research practices, and several ethical violations, the journal retracted the paper.
Wakefield had apparently reported fraudulent data after having been paid to “find” a link between the MMR vaccine and autism (Godlee, 2011). He was found guilty of malpractice and lost his medical license in the United Kingdom.

Since the time of the Wakefield et al. study there has been a crescendo of high-profile false positives in the psychology literature (e.g., Klein et al., 2014). The issue has been widely discussed in the scientific literature (Baker, 2016; Nosek, 2012), the blogosphere (e.g., Gelman, 2016; Srivastava, 2016; Vazire, 2016), and the popular press (Aschwanden, 2015; Belluz, 2015; Engber, 2016; Yong, 2016). This is a serious problem. Faulty science in clinical psychology negatively affects patients and the public because unhelpful or harmful practices are disseminated to ill effect and because persistent reports of invalid findings in the popular press can erode public trust in the scientific method. But it is also a fortuitous opportunity to improve clinical psychological research (Munafo et al., 2017; Tackett et al., in press).

An upshot of recent discoveries of invalid findings has been a movement dedicated to teaching and disseminating more rigorous scientific methods (e.g., Open Science Collaboration, 2015). A central focus of this movement has involved investigating the reproducibility of reported effects. Genuine effects should be reproducible, and effects that cannot be reproduced should generally not be taken to be true. The purposes of this chapter are to describe the importance of reproducibility in the scientific method, review recent issues in the social sciences that contributed to the recognition of reproducibility problems, and describe best practices for conducting reproducible research in clinical psychology.

Foundations of the Scientific Method

The scientific method is one way of explaining phenomena.
It can be contrasted with other methods, such as explanation via tradition or metaphysics, by a few foundational principles. In this section, we briefly review the distinguishing principles of the scientific method, with a focus on reproducibility.

Observations and Explanations

For an explanation of a phenomenon to be convincing in a scientific sense, it must explain and predict observations (Hempel & Oppenheim, 1948). Thus, science is fundamentally about the link between observations in nature and explanations about why those observations occurred. When explanations predict the same observations more than once they are increasingly convincing, and when they can predict similar kinds of observations across different contexts and situations, they become increasingly general.

Unlike most other approaches to knowing, science rests on the principle of falsification (Popper, 1950), or the idea that scientists try to prove their explanations wrong rather than trying to prove them right. It follows from the idea of falsification that observations that are consistent with an explanation add confidence for that explanation but do not prove that it is true, whereas observations that are inconsistent with an explanation indicate that the explanation is at least partly inaccurate. This makes science difficult, because our interest generally lies in proving something to be true rather than incrementally increasing our confidence that our explanation probably isn’t wrong. And as a general rule, it is easy to fool ourselves, particularly when we are motivated to see things a certain way (Feynman & Leighton, 1988).

Replication

This is why replication is so important in science. Replication means making the same explanation-relevant observations more than once in order to test the validity of the explanation. There are different levels of replication, ranging from completely direct to completely conceptual or constructive (Lykken, 1968).
In the most direct replication, an observation would be sampled twice by the same person under maximally similar conditions. A replication becomes less direct when the same observations are made under highly similar conditions but by different people. This increases confidence in the observation. For instance, if you told your friend that you had blown up a plastic bottle by pouring vinegar and baking soda into it, she might be skeptical. If she saw it on video, her confidence would increase. If she did it herself, she would probably be convinced.

Conceptual replications push the boundaries of the explanation by changing the conditions of the experiment. If you knew that mixing vinegar (an acid) and baking soda (a base) together creates a reaction that produces carbon dioxide, and you knew that the gas builds up pressure when it has nowhere to escape, it would follow that mixing vinegar and baking soda together in a sealed plastic bottle would expand the bottle until it burst. Other things would follow, too. For instance, this should also work in closed spaces other than a plastic bottle, and it should also work with other acid-base combinations that release a gas. The underlying explanation for why the bottle blew up when vinegar was added to baking soda provides a basis for setting up conceptual replications that could test that explanation further and evaluate the boundary conditions of the effect.

To use a more clinical example, it is probable that an astute physician observed many years ago that psychotic patients had an unusually high incidence of family members with mental health problems. After he shared this observation with his colleagues, others may have made similar observations. Eventually, early psychiatrists tested the concordance rates of family members in a controlled manner and confirmed an association, even across multiple ways of assessing psychotic conditions and evaluating family concordance (e.g., Jelliffe, 1911).
This important set of observations could be used to support multiple possible explanations, including those related to heritability, the problematic “refrigerator mother” hypothesis, and others. These explanations were eventually tested as well, leading to the contemporary understanding that the increased incidence of psychotic symptoms among family members is most likely genetically mediated (Gottesman & Shields, 1973). While the etiology of psychotic phenomena remains poorly understood, replicated research on familial concordance that can be explained by nature to a greater degree than nurture advances our understanding significantly. From a scientific perspective, this kind of replicated evidence is always more convincing than any well-crafted argument untethered to replicated observations.

Both direct and conceptual replication are important, for different reasons. Conceptual replications are a riskier test of a theory, extrapolating from past findings to make new predictions (e.g., predicting the effect in a new setting, in a new population, or with a different measure). However, if a conceptual replication fails, this result is ambiguous. It could mean that the original theory was wrong, and even the early successful studies should be discounted as evidence for the theory, or it could mean that the researcher extrapolated one step too far, but the earlier successful studies should still be taken as strong evidence in support of the theory. Direct replications are useful because they eliminate this ambiguity. If all attempts to directly replicate a finding fail, this suggests that the original finding should not be taken as evidence for the theory, and that the theory is likely wrong. Thus, from a falsification perspective, direct replications are stronger tests because they expose the researcher to the risk of being forced to abandon their theory.