The Econometrics of Randomized Experiments∗


Susan Athey† and Guido W. Imbens‡

Current version June 2016

Keywords: Regression Analyses, Random Assignment, Randomized Experiments, Potential Outcomes, Causality

∗We are grateful for comments by Esther Duflo.
†Professor of Economics, Graduate School of Business, Stanford University, and NBER, [email protected].
‡Professor of Economics, Graduate School of Business, Stanford University, and NBER, [email protected].

1 Introduction

Randomized experiments have a long tradition in agricultural and biomedical settings. In economics they have a much shorter history. Although there have been notable experiments over the years, such as the RAND health care experiment (Manning, Newhouse, Duan, Keeler and Leibowitz, 1987; see the general discussion in Rothstein and von Wachter, 2016) and the Negative Income Tax experiments (e.g., Robins, 1985), it is only recently that randomized experiments have been conducted in large numbers in economics, and in development economics in particular. See Duflo, Glennerster, and Kremer (2006) for a survey. As digitization lowers the cost of conducting experiments, we may expect their use to increase further in the near future.

In this chapter we discuss some of the statistical methods that are important for the analysis and design of randomized experiments. Although randomized experiments avoid many of the challenges of observational studies for causal inference, a number of statistical issues remain to be addressed in the design and analysis of experiments. Even in the simplest case with observably homogeneous, independent subjects, where the experiment is evaluated by comparing sample means for the treatment and control groups, there are questions of how to conduct inference about the treatment effect.
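As a concrete illustration (not drawn from the chapter itself), the comparison of sample means and the standard conservative variance estimator can be sketched in a few lines. All data here are simulated; the sample sizes, seed, and true effect of 0.5 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated completely randomized experiment: N units, Nt of them treated.
N, Nt = 100, 50
w = np.zeros(N, dtype=bool)
w[rng.choice(N, size=Nt, replace=False)] = True
y = 1.0 + 0.5 * w + rng.normal(size=N)  # outcomes; true effect is 0.5

# Difference-in-means estimate of the average treatment effect.
tau_hat = y[w].mean() - y[~w].mean()

# Conservative (Neyman) variance estimate: s_t^2 / Nt + s_c^2 / Nc.
Nc = N - Nt
v_hat = y[w].var(ddof=1) / Nt + y[~w].var(ddof=1) / Nc
se = np.sqrt(v_hat)
print(f"tau_hat = {tau_hat:.3f}, se = {se:.3f}")
```

The variance formula is conservative in the randomization framework: it is valid without modeling assumptions, at the cost of overstating the variance unless treatment effects are constant across units.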
When there are observable differences in characteristics among units, questions arise about how best to design the experiment and how to account for imbalances in characteristics between the treatment and control groups in the analysis. In addition, it may be desirable to understand how the results of an experiment would generalize to different settings. One approach is to estimate heterogeneity in treatment effects; another is to reweight units according to a target distribution of characteristics. Finally, statistical issues arise when units are not independent, as when they are connected in a network. In this chapter, we discuss a variety of methods for addressing these and other issues.

A major theme of the chapter is that we recommend using statistical methods that are directly justified by randomization, in contrast to the more traditional sampling-based approach that is commonly used in econometrics. In essence, the sampling-based approach considers the treatment assignments to be fixed while the outcomes are random; inference is based on the idea that the subjects are a random sample from a much larger population. In contrast, the randomization-based approach takes the subjects' potential outcomes (that is, the outcomes they would have had in each possible treatment regime) as fixed, and considers the assignment of subjects to treatments as random. Our focus on randomization follows the spirit of Freedman (2006, p. 691), who wrote: "Experiments should be analyzed as experiments, not as observational studies. A simple comparison of rates might be just the right tool, with little value added by 'sophisticated' models." Young (2016) has recently applied randomization-based methods in development economics.
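The randomization-based logic can be made concrete with a simple randomization (permutation) test: under the sharp null of no effect for any unit, every potential outcome is known, so the null distribution of a test statistic is generated by re-drawing the assignment itself. The following sketch uses simulated data; the sample sizes, seed, and number of draws are illustrative assumptions, not choices made in the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: binary treatment w and outcomes y for N units.
N, Nt = 40, 20
w = np.zeros(N, dtype=bool)
w[rng.choice(N, size=Nt, replace=False)] = True
y = 0.8 * w + rng.normal(size=N)

t_obs = abs(y[w].mean() - y[~w].mean())

# Under the sharp null, y is fixed; only the assignment is random.
# Recompute the statistic over re-drawn assignments to get the p-value.
draws = 10_000
count = 0
for _ in range(draws):
    w_sim = np.zeros(N, dtype=bool)
    w_sim[rng.choice(N, size=Nt, replace=False)] = True
    count += abs(y[w_sim].mean() - y[~w_sim].mean()) >= t_obs
p_value = count / draws
print(f"randomization p-value: {p_value:.4f}")
```

No sampling model or regression specification appears anywhere: the only source of randomness invoked is the physical randomization of the treatment.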
As an example of how the randomization-based approach matters in practice, we show that methods that might seem natural to economists in the conventional sampling paradigm (such as controlling for observable heterogeneity using a regression model) require additional assumptions in order to be justified. The randomization-based approach suggests alternative methods, such as placing the data into strata according to covariates, analyzing the within-group experiments, and averaging the results. This approach is directly justified by randomization of the treatment assignment, and does not require any additional assumptions.

Our overall goal in this chapter is to collect in one place some of the most important statistical methods for analyzing and designing randomized experiments. We start by discussing some general aspects of randomized experiments, and why they are widely viewed as providing the most credible evidence on causal effects. We then present a brief introduction to causal inference based on the potential outcome perspective. Next we discuss the analysis of the most basic of randomized experiments, what we call completely randomized experiments, where, out of a population of size N, a set of Nt units is selected randomly to receive one treatment and the remaining Nc = N − Nt are assigned to the control group. We discuss estimation of, and inference for, average as well as quantile treatment effects. Throughout we stress randomization-based rather than model-based inference as the basis for understanding inference in randomized experiments. We discuss how randomization-based methods relate to more commonly used regression analyses, and why we think the emphasis on randomization-based inference is important. We then discuss the design of experiments, first considering power analyses and then turning to the benefits and costs of stratification and pairwise randomization, as well as the complications from re-randomization.
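The stratify-then-average alternative to regression adjustment can be sketched directly: estimate a difference in means within each stratum and combine the estimates with stratum-size weights. The data below are simulated, and the number of strata, the seed, and the covariate effect are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Units grouped into 4 strata by a covariate; treatment randomized within each.
strata = np.repeat([0, 1, 2, 3], 25)
N = strata.size
w = np.zeros(N, dtype=bool)
for g in range(4):
    idx = np.flatnonzero(strata == g)
    w[rng.choice(idx, size=idx.size // 2, replace=False)] = True
y = 0.5 * w + 0.3 * strata + rng.normal(size=N)  # outcomes vary by stratum

# Within-stratum differences in means, averaged with weights N_g / N.
tau_hat = 0.0
for g in range(4):
    m = strata == g
    tau_g = y[m & w].mean() - y[m & ~w].mean()
    tau_hat += (m.sum() / N) * tau_g
print(f"stratified estimate: {tau_hat:.3f}")
```

Because the treatment is randomized within each stratum, each within-stratum contrast is itself a small randomized experiment, and the weighted average inherits that justification without any modeling assumption.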
We recommend using experimental design rather than analysis to adjust for covariate differences in experiments. Specifically, we recommend that researchers stratify the population into small strata, randomize within the strata, and adjust the standard errors to capture the gains from the stratification. We argue that this approach is preferable to model-based analyses applied after the randomization to adjust for differences in covariates. However, there are limits on how small the strata should be: we do not recommend going as far as pairing the units, because doing so complicates the analysis; variances cannot be estimated within pairs, whereas they can within strata with at least two treated and two control units.

This chapter draws from a variety of literatures, including the statistical literature on the analysis and design of experiments, e.g., Wu and Hamada (2009), Cox and Reid (2000), Altman (1991), Cook and DeMets (2008), Kempthorne (1952, 1955), Cochran and Cox (1957), Davies (1954), and Hinkelmann and Kempthorne (2005, 2008). We also draw on the literature on causal inference, in both experimental and observational settings: Rosenbaum (1995, 2002, 2009), Rubin (2006), Cox (1992), Morgan and Winship (2007), Morton and Williams (2010), Lee (2005), and Imbens and Rubin (2015). In the economics literature we build on recent guides to practice in randomized experiments in development economics, e.g., Duflo, Glennerster, and Kremer (2006), Glennerster (2016), and Glennerster and Takavarasha (2013), as well as the general empirical micro literature (Angrist and Pischke, 2008).

There have been a variety of excellent surveys of experimental methodology in recent years. Compared to Duflo, Glennerster and Kremer (2006), Glennerster and Takavarasha (2013), and Glennerster (2016), this chapter focuses more on formal statistical methods and less on issues of implementation in the field.
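The design recommendation above can be sketched as follows: randomize within many small strata of four units (two treated, two controls, so within-stratum variances remain estimable, unlike matched pairs), then form the stratified variance as a weighted sum of within-stratum variances. The data, strata sizes, and seed are illustrative assumptions, not specifications from the chapter.

```python
import numpy as np

rng = np.random.default_rng(3)

def stratified_assignment(strata, rng):
    """Randomize treatment within each stratum, half treated per stratum."""
    w = np.zeros(strata.size, dtype=bool)
    for g in np.unique(strata):
        idx = np.flatnonzero(strata == g)
        w[rng.choice(idx, size=idx.size // 2, replace=False)] = True
    return w

# 25 strata of four units each: two treated and two controls per stratum.
strata = np.repeat(np.arange(25), 4)
w = stratified_assignment(strata, rng)
y = 0.5 * w + 0.2 * (strata % 5) + rng.normal(size=strata.size)

# Stratified variance: weighted sum of within-stratum Neyman variances,
# with weights (N_g / N)^2; estimable because each cell has two units.
N = strata.size
v_hat = 0.0
for g in np.unique(strata):
    m = strata == g
    v_g = y[m & w].var(ddof=1) / 2 + y[m & ~w].var(ddof=1) / 2
    v_hat += (m.sum() / N) ** 2 * v_g
print(f"stratified se: {np.sqrt(v_hat):.3f}")
```

With only one treated and one control unit per pair, the within-cell sample variances above would be undefined, which is the computational face of the pairing problem noted in the text.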
Compared to the statistics literature, we restrict our discussion largely to the case with a single binary treatment. We also pay more attention to the complications arising from non-compliance, clustered randomization, and the presence of interactions and spillovers. Relative to the general causal literature, e.g., Rosenbaum (1995, 2009) and Imbens and Rubin (2015), we do not discuss observational studies with unconfoundedness or selection-on-observables in depth, and focus more on complications in experimental settings.

This chapter is organized as follows. In Section 8 we discuss the complications arising from cluster-level randomization. We discuss how the use of clustering requires the researcher to make choices regarding the estimands, and we also focus on the choice of the unit of analysis: clusters or lower-level units. We recommend in general focusing on cluster-level analyses as the primary analyses. Section 9 contains a discussion of non-compliance with treatment assignment and its relation to instrumental variables methods. In Section 10 we present some recent results for analyzing heterogeneity in treatment effects. Finally, in Section 11 we discuss violations of the no-interaction assumption, allowing outcomes for one unit to be affected by treatment assignments for other units. These interactions can take many forms, some through clusters and some through general networks. We show that it is possible to calculate exact p-values for tests of null hypotheses of no interactions while allowing for direct effects of the treatments. Section 12 concludes.

2 Randomized Experiments and Validity

In this section we discuss some general issues related to the interpretation of analyses of randomized experiments and their validity.
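The cluster-level analysis recommended above has a simple mechanical form: average outcomes within each cluster, then compare the means of treated and control clusters, with the clusters as the effective observations. The sketch below uses simulated data; the numbers of clusters and units, the within-cluster correlation, and the seed are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# 20 clusters of 30 units; treatment is assigned at the cluster level.
G, n = 20, 30
cluster_treated = np.zeros(G, dtype=bool)
cluster_treated[rng.choice(G, size=G // 2, replace=False)] = True
cluster_effect = rng.normal(scale=0.5, size=G)  # induces within-cluster correlation
y = (0.4 * cluster_treated[:, None] + cluster_effect[:, None]
     + rng.normal(size=(G, n)))

# Cluster-level analysis: average within clusters, then compare cluster means.
ybar = y.mean(axis=1)
tau_hat = ybar[cluster_treated].mean() - ybar[~cluster_treated].mean()
se = np.sqrt(ybar[cluster_treated].var(ddof=1) / (G // 2)
             + ybar[~cluster_treated].var(ddof=1) / (G // 2))
print(f"cluster-level estimate: {tau_hat:.3f} (se {se:.3f})")
```

Treating the 600 individual units as independent observations would understate the standard error here, since the randomization operated on only 20 clusters; the cluster-level computation keeps the inference aligned with the actual assignment mechanism.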
Following Cochran (1972, 2015), we define randomized experiments as settings where the assignment mechanism does not depend on characteristics of the units, either observed or unobserved, and the researcher has control over the assignments. In contrast, in observational studies (Rosenbaum, 1995; Imbens and Rubin, 2015), the researcher does not have control over the assignment