How Often Random Assignment Fails

How often does random assignment fail? Estimates and recommendations

Matthew H. Goldberg
Yale Program on Climate Change Communication
Yale University

This article is now published in the Journal of Environmental Psychology. Please cite as: Goldberg, M. H. (2019). How often does random assignment fail? Estimates and recommendations. Journal of Environmental Psychology, doi:10.1016/j.jenvp.2019.101351

Abstract

A fundamental goal of the scientific process is to make causal inferences. Random assignment to experimental conditions is widely regarded as a gold-standard technique for establishing causality. Despite this, it is unclear how often random assignment fails to eliminate non-trivial differences between experimental conditions. Further, it is unknown to what extent larger sample sizes mitigate this issue. Chance differences between experimental conditions may be especially important when investigating topics that are highly sample-dependent, such as climate change and other politicized issues. Three studies examine simulated data (Study 1), three real datasets from original environmental psychology experiments (Study 2), and one nationally representative dataset (Study 3), and find that differences between conditions that remain after random assignment are surprisingly common for sample sizes typical of social psychological experiments. Methods and practices for identifying and mitigating such differences are discussed, along with implications that are especially relevant to experiments in social and environmental psychology.

Keywords: random assignment; randomization; confounding; validity

How do we best communicate the threat of climate change? Does this education program improve science literacy? Answering questions like these requires causal inference.
The most effective method that enables causal inference is random assignment to conditions (Bloom, 2006; Fisher, 1925; Fisher, 1937; Gerber & Green, 2008; Rubin, 1974; Shadish, Cook, & Campbell, 2002). It is well known that random assignment lends greater confidence to causal inferences as sample size increases (e.g., Bloom, 2006). However, at the sample sizes commonly used in psychological science, it is unclear how often random assignment fails to eliminate differences between conditions that might explain study results. Additionally, even for larger samples, it is unknown how large is large enough (Deaton & Cartwright, 2018). The aim of this article is to answer these questions using both simulated and real participant data.

Causality

Before answering these questions, it is necessary first to define causality and articulate a theoretical framework for it. A cause is “that which gives rise to any action, phenomenon, or condition” (Oxford English Dictionary, 2019). Or, in more statistical terms, “causal effects are defined as comparisons of potential outcomes under different treatments on a common set of units” (Rubin, 2005, p. 322). There are several frameworks through which scholars understand causality in scientific research, but one of the most prominent is the Rubin Causal Model (Rubin, 1974). The model emphasizes what some scholars call the Fundamental Problem of Causal Inference (e.g., Holland, 1986): it is impossible to observe the outcomes of two different treatments on the same participant. Thus, a causal effect is conceptualized as the difference between potential outcomes, where each participant could have been assigned to either the treatment or the control condition. In this sense, the average causal effect indicates how much the outcome would have changed had the sample been treated (versus not treated).
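To make the potential-outcomes idea concrete, the following Python sketch simulates a hypothetical sample in which every participant has two potential outcomes, `y0` (control) and `y1` (treatment). The true average causal effect is the mean of `y1 - y0`, which can be computed here only because the data are simulated; in a real experiment each participant reveals just one of the two. Randomly splitting the sample into two conditions and taking the difference in observed means yields an estimate of that effect. The effect size of 0.5, the normal outcome distribution, and the function names are illustrative assumptions, not values or procedures from the studies reported here.

```python
import random
import statistics


def simulate_potential_outcomes(n, true_effect=0.5, seed=None):
    """Return (y0, y1) potential-outcome pairs for n hypothetical participants."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)  # outcome if assigned to control
        # Outcome if treated: individual effects vary around true_effect.
        y1 = y0 + true_effect + rng.gauss(0.0, 0.2)
        people.append((y0, y1))
    return people


def true_average_effect(people):
    """Mean of y1 - y0; computable only because the simulation knows both outcomes."""
    return statistics.fmean(y1 - y0 for y0, y1 in people)


def estimate_by_random_assignment(people, seed=None):
    """Randomly split people into two equal conditions and compare observed means."""
    rng = random.Random(seed)
    order = list(range(len(people)))
    rng.shuffle(order)
    half = len(order) // 2
    # Each participant reveals only the potential outcome for the assigned condition.
    treated = [people[i][1] for i in order[:half]]
    control = [people[i][0] for i in order[half:]]
    return statistics.fmean(treated) - statistics.fmean(control)


people = simulate_potential_outcomes(10_000, true_effect=0.5, seed=1)
print("true average effect:", round(true_average_effect(people), 3))
print("estimate from one randomization:", round(estimate_by_random_assignment(people, seed=2), 3))
```

With 10,000 simulated participants both numbers should be close to 0.5. Rerunning `estimate_by_random_assignment` with different seeds shows how the estimate scatters around the true effect from one randomization to the next, which is the chance variation this article is concerned with.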
Put simply, although we cannot observe treatment effects for individuals, we can estimate the average treatment effect across a sample (Deaton & Cartwright, 2018). This framework makes two core assumptions: excludability and non-interference (see Gerber & Green, 2012, pp. 39-45). Excludability is the assumption that the outcome is affected only by the treatment itself, not by any other feature of how participants were assigned. Non-interference is the assumption that each participant’s potential outcomes are not affected by the treatment or control status of any other participant. More generally, “a causal relationship exists if (1) the cause preceded the effect, (2) the cause was related to the effect, and (3) we can find no plausible alternative explanation for the effect other than the cause” (Shadish et al., 2002, p. 6). The first criterion is met by design in an experiment. The second is readily established through data analysis. However, the third criterion is more challenging to meet, because there is an essentially infinite number of potential alternative explanations (i.e., confounds) for any given study’s results, any of which could jeopardize the excludability assumption (Gerber & Green, 2012). To address confounding, researchers aim to ensure that experimental groups are equal in all respects except for the independent variable (Fisher, 1937; Gerber & Green, 2008; Holland, 1986; Pearl, 2009; Rubin, 1974; Shadish et al., 2002). If experimental conditions are equal on all characteristics except the independent variable, then only the independent variable can be responsible for differences observed between conditions (Gerber & Green, 2008; Holland, 1986; Shadish et al., 2002).

Fisher (1937) noted the difficulty of creating equal groups: “it would be impossible to present an exhaustive list of such possible differences appropriate to any one kind of experiment, because the uncontrolled causes which may influence the result are always strictly innumerable” (p. 21).
To address this issue, Fisher and his contemporaries developed random assignment, which ensures that pre-treatment differences are independent of the treatment condition assigned.

Random Assignment and Causality

R. A. Fisher (1925, 1937) developed the foundational concepts of random assignment as a means to aid causal inference. In the context of agricultural research, he defined random assignment as “using a means which shall ensure that each variety has an equal chance of being tested on any particular plot of ground” (Fisher, 1937, p. 56). In the language of social science research, random assignment to conditions occurs when a random process (e.g., a random number generator, the flip of a coin, drawing from a shuffled deck of cards) is used to assign participants to experimental conditions, giving every participant an equal chance of being assigned to either condition. Fisher (1937, p. 23) advocated random assignment as a method for mitigating threats to an experiment’s internal validity: “…with satisfactory randomisation, its validity is, indeed, wholly unimpaired” (for a historical account of Fisher’s advocacy for randomization, see Hall, 2007). Since Fisher’s writing, random assignment has come to be regarded as best practice in experimental design and causal inference (e.g., Shadish et al., 2002). For example, in one of the most widely cited texts on causal inference, Shadish and colleagues (2002, p. 248) explain that random assignment is effective because it “ensures that alternative causes are not confounded with a unit’s treatment condition” and “it reduces the plausibility of threats to validity by distributing them randomly over conditions.” In other words, because alternative causes are randomly distributed across conditions, they become perfectly balanced as sample size approaches infinity (Gerber & Green, 2008; Shadish et al., 2002).
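The claim that randomly distributed alternative causes become balanced only as sample size grows can be checked by simulation. The Python sketch below is a minimal illustration under assumed conditions, not a reproduction of Study 1: each hypothetical participant has one standard-normal pre-treatment covariate, participants are shuffled (the software analogue of a shuffled deck of cards) and dealt into two equal conditions, and the code records how often the standardized mean difference (Cohen’s d) between conditions exceeds 0.2 in absolute value. The 0.2 cutoff, the normal covariate, the replication count, and the function names are all assumptions chosen for illustration.

```python
import math
import random


def standardized_difference(a, b):
    """Cohen's d between two groups, using the pooled standard deviation."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_sd = math.sqrt((ss_a + ss_b) / (len(a) + len(b) - 2))
    return (mean_a - mean_b) / pooled_sd


def imbalance_rate(n_total, threshold=0.2, reps=1000, seed=0):
    """Hypothetical helper: share of simulated experiments in which random
    assignment leaves |d| > threshold on a pre-treatment covariate."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(reps):
        # Draw a fresh sample: one standard-normal covariate per participant.
        participants = [rng.gauss(0.0, 1.0) for _ in range(n_total)]
        # Random assignment: shuffle like a deck of cards, then deal half to each condition.
        rng.shuffle(participants)
        half = n_total // 2
        treatment, control = participants[:half], participants[half:]
        if abs(standardized_difference(treatment, control)) > threshold:
            failures += 1
    return failures / reps


for n in (50, 100, 200, 500, 1000):
    print(f"N = {n:4d}: {imbalance_rate(n):.1%} of assignments have |d| > 0.2")
```

Under a normal approximation, the standard error of d with two equal groups is about 2/√N, so the share of assignments with |d| > 0.2 should fall from roughly half at N = 50 to about 2 to 3 percent at N = 500 and well under 1 percent at N = 1,000. Exact percentages depend on the seed and number of replications. A real study measures many covariates at once, which raises the chance that at least one of them is imbalanced.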
Compared with other methods of equating experimental conditions (e.g., matching), a crucial strength of random assignment is that it balances conditions on both known and unknown variables (Gerber & Green, 2008; Shadish et al., 2002). Other methods, such as matching, can equate groups on measured variables that may be related to the independent and dependent variables, but threats to validity remain because the groups may still differ systematically on unmeasured variables. This is not a problem for random assignment, because it renders the assignment of experimental conditions independent of all other variables, whether measured or unmeasured.

Random Assignment and Sample Size

It is well known that larger sample sizes reduce the probability that random assignment will result in unequal conditions (e.g., Bloom, 2006; Shadish et al., 2002). That is, as sample size increases, differences within groups increase, but differences between groups decrease (Rose, 2001), making it less likely that a variable other than the experimental manipulation will explain the results. Beyond the fact that larger samples are less likely to produce chance differences between conditions, it is unclear how large is large enough. As Deaton and Cartwright (2018) aptly noted, “Statements about large samples guaranteeing balance are not useful without guidelines about how large is large enough, and such statements cannot be made without knowledge of other causes and how they affect outcomes” (p. 6). In the present study, instead of comparing other methods to the standard of random assignment (e.g., Shadish, Clark, & Steiner, 2008), the performance of random assignment itself is put to the test, asking how often random assignment fails to eliminate key differences
