How Often Random Assignment Fails

How often does random assignment fail? Estimates and recommendations

Matthew H. Goldberg
Yale Program on Climate Change Communication
Yale University

This article is now published in the Journal of Environmental Psychology. Please cite as: Goldberg, M. H. (2019). How often does random assignment fail? Estimates and recommendations. Journal of Environmental Psychology, doi:10.1016/j.jenvp.2019.101351

Abstract

A fundamental goal of the scientific process is to make causal inferences. Random assignment to experimental conditions has been taken to be a gold-standard technique for establishing causality. Despite this, it is unclear how often random assignment fails to eliminate non-trivial differences between experimental conditions. Further, it is unknown to what extent larger sample sizes mitigate this issue. Chance differences between experimental conditions may be especially important when investigating topics that are highly sample-dependent, such as climate change and other politicized issues. Three studies examine simulated data (Study 1), three real datasets from original environmental psychology experiments (Study 2), and one nationally representative dataset (Study 3) and find that differences between conditions that remain after random assignment are surprisingly common for sample sizes typical of social psychological experiments. Methods and practices for identifying and mitigating such differences are discussed, along with implications that are especially relevant to experiments in social and environmental psychology.

Keywords: random assignment; randomization; confounding; validity

How often does random assignment fail? Estimates and recommendations

How do we best communicate the threat of climate change? Does this education program improve science literacy?
Answering questions like these requires causal inference. The most effective method for enabling causal inference is random assignment to conditions (Bloom, 2006; Fisher, 1925; Fisher, 1937; Gerber & Green, 2008; Rubin, 1974; Shadish, Cook, & Campbell, 2002). It is well known that random assignment lends greater confidence to causal inferences as sample size gets larger (e.g., Bloom, 2006). However, at sample sizes commonly used in psychological science, it is unclear how often random assignment fails to mitigate differences between conditions that might explain study results. Additionally, even given larger sample sizes, it is unknown how much larger is large enough (Deaton & Cartwright, 2018). The aim of this article is to answer these questions using both simulated and real participant data.

Causality

Before answering these questions, it is first necessary to define causality and articulate a theoretical framework for it. A cause is “that which gives rise to any action, phenomenon, or condition” (Oxford English Dictionary, 2019). Or, in more statistical terms, “causal effects are defined as comparisons of potential outcomes under different treatments on a common set of units” (Rubin, 2005, p. 322).

There are several frameworks through which scholars understand causality in scientific research, but one of the most prominent is the Rubin Causal Model (Rubin, 1974). The model emphasizes what some scholars call the Fundamental Problem of Causal Inference (e.g., Holland, 1986): it is impossible to observe the effect of two different treatments on the same participant. Thus, a causal effect is conceptualized as the difference between potential outcomes, where individual participants could have been assigned to either the treatment or control condition. In this sense, the average causal effect indicates how much the outcome would have changed had the sample been treated (versus not treated).
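The potential-outcomes definition above can be written compactly. The notation here is assumed for illustration and does not appear in the article: $Y_i(1)$ and $Y_i(0)$ are participant $i$'s outcomes under treatment and control, $D_i \in \{0, 1\}$ is the assigned condition, and $\tau_i$ is the individual causal effect.

```latex
\begin{align}
  \tau_i &= Y_i(1) - Y_i(0)
    && \text{(individual effect, never observed)} \\
  Y_i^{\mathrm{obs}} &= D_i\, Y_i(1) + (1 - D_i)\, Y_i(0)
    && \text{(only one potential outcome is seen)} \\
  \mathrm{ATE} &= \mathbb{E}\left[ Y_i(1) - Y_i(0) \right]
    = \mathbb{E}\left[ Y_i^{\mathrm{obs}} \mid D_i = 1 \right]
    - \mathbb{E}\left[ Y_i^{\mathrm{obs}} \mid D_i = 0 \right]
    && \text{(if } D_i \text{ is independent of } Y_i(1), Y_i(0) \text{)}
\end{align}
```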
Put simply, although we cannot observe treatment effects for individuals, we can observe the average treatment effect across a sample (Deaton & Cartwright, 2018).

This framework makes two core assumptions: excludability and non-interference (see Gerber & Green, 2012, pp. 39-45). Excludability is the assumption that the treatment is the sole causal influence on the outcome. Non-interference is the assumption that the outcome of any individual participant depends only on that participant's own treatment or control status and is not affected by the status of another participant.

In short, “a causal relationship exists if (1) the cause preceded the effect, (2) the cause was related to the effect, and (3) we can find no plausible alternative explanation for the effect other than the cause” (Shadish et al., 2002, p. 6). The first criterion is easily achieved in an experiment by design. The second criterion is easily achieved via data analysis. However, the third criterion is more challenging to meet, as there are essentially infinite potential alternative explanations (i.e., confounds) for any given study’s results, thereby potentially jeopardizing the excludability assumption (Gerber & Green, 2012).

To address the issue of confounding, researchers aim to ensure experimental groups are equal in all respects except for the independent variable (Fisher, 1937; Gerber & Green, 2008; Holland, 1986; Pearl, 2009; Rubin, 1974; Shadish et al., 2002). If experimental conditions are equal on all characteristics except for the independent variable, then only the independent variable can be responsible for differences observed between conditions (Gerber & Green, 2008; Holland, 1986; Shadish et al., 2002).

Fisher (1937) noted the difficulty of creating equal groups: “it would be impossible to present an exhaustive list of such possible differences appropriate to any one kind of experiment, because the uncontrolled causes which may influence the result are always strictly innumerable” (p. 21).
To address this issue, Fisher and his contemporaries developed random assignment, which ensures that pre-treatment differences are independent of the treatment condition assigned.

Random Assignment and Causality

R. A. Fisher (1925, 1937) developed the foundational concepts of random assignment as a means to aid causal inference. In the context of agricultural research, he developed random assignment and defined it as “using a means which shall ensure that each variety has an equal chance of being tested on any particular plot of ground” (Fisher, 1937, p. 56). In the language of social science research, random assignment to conditions is when a random process (e.g., a random number generator, the flip of a coin, choosing from a shuffled deck of cards) is used to assign participants to experimental conditions, giving all participants an equal chance of being assigned to either condition.

Fisher (1937, p. 23) advocated for the use of random assignment to experimental conditions as a method for mitigating the threat to an experiment’s internal validity: “…with satisfactory randomisation, its validity is, indeed, wholly unimpaired” (for a historical account of Fisher’s advocacy for randomization, see Hall, 2007). Since Fisher’s writing, random assignment has been shown to be a best practice in experimental design and causal inference (e.g., Shadish et al., 2002). For example, in one of the most well-cited texts on causal inference, Shadish and colleagues (2002, p. 248) explain that random assignment is effective because it “ensures that alternative causes are not confounded with a unit’s treatment condition” and “it reduces the plausibility of threats to validity by distributing them randomly over conditions.” In other words, because alternative causes are randomly distributed across conditions, they become perfectly balanced as sample size approaches infinity (Gerber & Green, 2008; Shadish et al., 2002).
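The idea that chance imbalance shrinks as samples grow can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the article: a single standard-normal pre-treatment covariate, two equal-size conditions, and an arbitrary cutoff of 0.2 pooled standard deviations for calling a difference non-trivial. The function name `imbalance_rate` and its parameters are hypothetical, and the sketch is not the procedure used in Study 1.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sample_variance(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def imbalance_rate(n_per_group, threshold=0.2, n_sims=1000, seed=0):
    """Share of simulated experiments in which two randomly assigned
    conditions differ on a pre-treatment covariate by more than
    `threshold` pooled standard deviations (illustrative cutoff)."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(n_sims):
        # Covariate measured before treatment (standard normal by assumption).
        covariate = [rng.gauss(0.0, 1.0) for _ in range(2 * n_per_group)]
        # Random assignment: shuffle participants, then split into two conditions.
        rng.shuffle(covariate)
        treatment = covariate[:n_per_group]
        control = covariate[n_per_group:]
        # Standardized mean difference between the two conditions.
        pooled_sd = ((_sample_variance(treatment) + _sample_variance(control)) / 2) ** 0.5
        smd = (_mean(treatment) - _mean(control)) / pooled_sd
        if abs(smd) > threshold:
            failures += 1
    return failures / n_sims


if __name__ == "__main__":
    for n in (20, 50, 100, 200):
        print(f"n per condition = {n}: imbalance rate = {imbalance_rate(n):.3f}")
```

Running the script prints one rate per sample size; under these assumptions the rate should fall as the number of participants per condition rises, although the exact values depend on the cutoff and the covariate distribution chosen.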
Compared to other methods of equating experimental conditions (e.g., matching), a crucial strength of random assignment is that it balances conditions on known and unknown variables (Gerber & Green, 2008; Shadish et al., 2002). Other methods, such as matching, may equate groups on variables that may be related to the independent and dependent variables, but threats to validity still remain because experimental groups may still systematically differ on unmeasured variables. This is not a problem for random assignment because it renders the assignment of experimental conditions independent of all other variables in the study.

Random Assignment and Sample Size

It is well known that larger sample sizes reduce the probability that random assignment will result in conditions that are unequal (e.g., Bloom, 2006; Shadish et al., 2002). That is, as sample size increases, differences within groups increase, but differences between groups decrease (Rose, 2001), making it less likely that a variable other than the experimental manipulation will explain the results.

Beyond the fact that larger samples are less likely to result in chance differences between conditions, it is unclear how large is large enough. As Deaton and Cartwright (2018) aptly noted, “Statements about large samples guaranteeing balance are not useful without guidelines about how large is large enough, and such statements cannot be made without knowledge of other causes and how they affect outcomes” (p. 6).

In the present study, instead of comparing other methods to the standard of random assignment (e.g., Shadish, Clark, & Steiner, 2008), the performance of random assignment itself is put to the test, asking how often random assignment fails to eliminate key differences