Statistical Methods for Research Workers (BIOLOGICAL MONOGRAPHS AND MANUALS)


BIOLOGICAL MONOGRAPHS AND MANUALS
General Editors: F. A. E. Crew, Edinburgh; D. Ward Cutler, Rothamsted
No. V: Statistical Methods for Research Workers

Volumes in the series:
THE PIGMENTARY EFFECTOR SYSTEM. By L. T. Hogben, London School of Economics.
THE ERYTHROCYTE AND THE ACTION OF SIMPLE HAEMOLYSINS. By E. Ponder, New York University.
ANIMAL GENETICS: AN INTRODUCTION TO THE SCIENCE OF ANIMAL BREEDING. By F. A. E. Crew, Edinburgh University.
REPRODUCTION IN THE RABBIT. By J. Hammond, School of Agriculture, Cambridge.
STATISTICAL METHODS FOR RESEARCH WORKERS. By R. A. Fisher, University of London.
THE COMPARATIVE ANATOMY, HISTOLOGY, AND DEVELOPMENT OF THE PITUITARY BODY. By G. R. de Beer, Oxford University.
THE COMPOSITION AND DISTRIBUTION OF THE PROTOZOAN FAUNA OF THE SOIL. By H. Sandon, Egyptian University, Cairo.
THE SPECIES PROBLEM. By G. C. Robson, British Museum (Natural History).
THE MIGRATION OF BUTTERFLIES. By C. B. Williams, Rothamsted Experimental Station.
GROWTH AND THE DEVELOPMENT OF MUTTON QUALITIES IN THE SHEEP. By John Hammond, Cambridge.
MATERNAL BEHAVIOUR IN THE RAT. By Bertold P. Wiesner and Norah M. Sheard.
And other volumes are in course of publication.

Statistical Methods for Research Workers
BY R. A. FISHER, Sc.D., F.R.S.
Formerly Fellow of Gonville and Caius College, Cambridge
Honorary Member, American Statistical Association
Galton Professor, University of London

FIFTH EDITION, REVISED AND ENLARGED

OLIVER AND BOYD
EDINBURGH: TWEEDDALE COURT
LONDON: 33 PATERNOSTER ROW, E.C.
1934
MADE IN GREAT BRITAIN BY OLIVER AND BOYD LTD., EDINBURGH

EDITORS' PREFACE

THE increasing specialisation in biological inquiry has made it impossible for any one author to deal adequately with current advances in knowledge.
It has become a matter of considerable difficulty for a research student to gain a correct idea of the present state of knowledge of a subject in which he himself is interested. To meet this situation the text-book is being supplemented by the monograph. The aim of the present series is to provide authoritative accounts of what has been done in some of the diverse branches of biological investigation, and at the same time to give to those who have contributed notably to the development of a particular field of inquiry an opportunity of presenting the results of their researches, scattered throughout the scientific journals, in a more extended form, showing their relation to what has already been done and to problems that remain to be solved.

The present generation is witnessing "a return to the practice of older days when animal physiology was not yet divorced from morphology." Conspicuous progress is now being seen in the field of general physiology, of experimental biology, and in the application of biological principles to economic problems. Often the analysis of large masses of data by statistical methods is necessary, and the biological worker is continually encountering advanced statistical problems the adequate solutions of which are not found in current statistical text-books. To meet these needs the present monograph was prepared, and the early call for the second and later editions indicates the success attained by the author in this project.

F. A. E. C.
D. W. C.

PREFACE TO FIRST EDITION

FOR several years the author has been working in somewhat intimate co-operation with a number of biological research departments; the present book is in every sense the product of this circumstance. Daily contact with the statistical problems which present themselves to the laboratory worker has stimulated the purely mathematical researches upon which are based the methods here presented.
Little experience is sufficient to show that the traditional machinery of statistical processes is wholly unsuited to the needs of practical research. Not only does it take a cannon to shoot a sparrow, but it misses the sparrow! The elaborate mechanism built on the theory of infinitely large samples is not accurate enough for simple laboratory data. Only by systematically tackling small sample problems on their merits does it seem possible to apply accurate tests to practical data. Such at least has been the aim of this book.

I owe more than I can say to Mr W. S. Gosset, Mr E. Somerfield, and Miss W. A. Mackenzie, who have read the proofs and made many valuable suggestions. Many small but none the less troublesome errors have been removed; I shall be grateful to readers who will notify me of any further errors and ambiguities they may detect.

ROTHAMSTED EXPERIMENTAL STATION, February 1925.

PREFACE TO FIFTH EDITION

THE rapid demand for a fifth edition has more than justified the author's hope that use could be made of a book which, without entering into the mathematical theory of statistical methods, should embody the latest results of that theory, presenting them in the form of practical procedures appropriate to those types of data with which research workers are actually concerned. Those critics who would like to have seen the inclusion of mathematical proofs of the more important propositions of the underlying theory must still be referred to the technical publications given in the list of sources. There they will encounter exactly those difficulties which it would be undesirable to import into the present work; and will perceive that modern statistics could not have been developed without the elaboration of a system of ideas, logical and mathematical, which, however fascinating in themselves, cannot be regarded as a necessary part of the equipment of every research worker.
To present "elementary proofs," of the kind which do not involve these ideas, would be really to justify the censure of a second school of critics, who, rightly feeling that a fallacious proof is worse than none, are eager to decry any attempt to "teach people to run before they can walk." The actual scope of the present volume really exempts it from this criticism, which, besides, in an age of technical co-operation, has seldom much force. The practical application of general theorems is a different art from their establishment by mathematical proof, and one useful to many to whom the other is unnecessary.

In the second edition the importance of providing a striking and detailed illustration of the principles of statistical estimation led to the addition of a ninth chapter. The subject had received only general discussion in the first edition, and, in spite of its practical importance, had not yet won sufficient attention from teachers to drive out of practice the demonstrably defective procedures which were still unfortunately taught to students. The new chapter superseded Section 6 and Example 1 of the first edition; in the third edition it was enlarged by two new sections (57.1 and 57.2) illustrating further the applicability of the method of maximum likelihood, and of the quantitative evaluation of information. In Section 27 a generalised form for the series of orthogonal polynomials was added to the third edition, in response to the need which is felt, with respect to some important classes of data, to use polynomials of higher degree than the fifth. Simple and direct algebraic proofs of the methods of Sections 28 and 28.1 have been published by Miss F. E. Allan. In the fourth edition the Appendix to Chapter III, on technical notation, was entirely rewritten, since the inconveniences of the moment notation seemed by that time definitely to outweigh the advantages formerly conferred by its familiarity.
The principal new matter in that edition was added in response to the increasing use of the analysis of covariance, which is explained in Section 49.1. Since several writers have found difficulty in applying the appropriate tests of significance to deviations from regression formulae, this section has been further enlarged in the fifth edition. Other new sections in this edition are 21.01, giving a correction for continuity recently introduced by F. Yates, and 21.02, giving the exact test of significance for 2 x 2 tables. Workers who are accustomed to handle regression equations with a large number of variates will be interested in Section 29.1, which provides the relatively simple adjustments to be made when, at a late stage, it is decided that one or more of the variates used may with advantage be omitted. The possibility of doing this without laborious recalculations should encourage workers to make the list of independent variates included more comprehensive than has, in the past, been thought advisable. Section 5, formerly occupied by an account of the tables available for testing significance, has now been given to a historical note on the principal contributors to the development of statistical reasoning.

With respect to the folding copies of tables bound with the book, it may be mentioned that many laboratory workers, who have occasion to use them constantly, have found it convenient to mount these on the faces of triangular or square prisms, which may be kept near at hand for immediate reference. The practical advantages of this plan have made it seem worth while to bring it to the notice of all readers. It should be noted that numbers of sections, tables and examples have been unaltered by the insertion of fresh material, so that references to them, though not to pages, will be valid irrespective of the edition used.

GALTON LABORATORY, October 1934.
CONTENTS

EDITORS' PREFACE
PREFACE TO FIRST EDITION
PREFACE TO FIFTH EDITION
Recommended publications
  • Volume Quantification by Contrast-Enhanced Ultrasound
    Herold et al. Cardiovascular Ultrasound 2013, 11:36 http://www.cardiovascularultrasound.com/content/11/1/36 CARDIOVASCULAR ULTRASOUND RESEARCH Open Access. Volume quantification by contrast-enhanced ultrasound: an in-vitro comparison with true volumes and thermodilution. Ingeborg HF Herold1*, Gianna Russo2, Massimo Mischi2, Patrick Houthuizen3, Tamerlan Saidov2, Marcel van het Veer3, Hans C van Assen2 and Hendrikus HM Korsten1,2

    Abstract. Background: Contrast-enhanced ultrasound (CEUS) has recently been proposed as a minimally-invasive, alternative method for blood volume measurement. This study aims at comparing the accuracy of CEUS and the classical thermodilution techniques for volume assessment in an in-vitro set-up. Methods: The in-vitro set-up consisted of a variable network between an inflow and outflow tube and a roller pump. The inflow and outflow tubes were insonified with an ultrasound array transducer and a thermistor was placed in each tube. Indicator dilution curves were made by injecting indicator which consisted of an ultrasound-contrast-agent diluted in ice-cold saline. Both acoustic intensity- and thermo-dilution curves were used to calculate the indicator mean transit time between the inflow and outflow tube. The volumes were derived by multiplying the estimated mean transit time by the flow rate. We compared the volumes measured by CEUS with the true volumes of the variable network and those measured by thermodilution by Bland-Altman and intraclass-correlation analysis. Results: The measurements by CEUS and thermodilution showed a very strong correlation (rs=0.94) with a modest volume underestimation by CEUS of −40 ± 28 mL and an overestimation of 84 ± 62 mL by thermodilution compared with the true volumes.
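The volume computation described in the abstract above can be sketched briefly: the mean transit time (MTT) of each indicator dilution curve is its intensity-weighted average time, and the volume between the two measurement sites is the MTT difference multiplied by the flow rate. The curves and flow rate below are invented for illustration, not taken from the study.

```python
def mean_transit_time(times, intensities):
    """First moment of a dilution curve: sum(t * c) / sum(c)."""
    total = sum(intensities)
    return sum(t * c for t, c in zip(times, intensities)) / total

# Hypothetical sampled dilution curves (time in seconds, arbitrary units).
times = [0, 1, 2, 3, 4, 5, 6]
inflow = [0, 5, 9, 5, 2, 1, 0]
outflow = [0, 0, 2, 6, 8, 4, 1]

mtt_difference = mean_transit_time(times, outflow) - mean_transit_time(times, inflow)
flow_rate_ml_per_s = 20.0               # assumed constant pump flow rate
volume_ml = mtt_difference * flow_rate_ml_per_s
```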
  • On the Sampling Variance of Intraclass Correlations and Genetic Correlations
    Copyright 1998 by the Genetics Society of America. On the Sampling Variance of Intraclass Correlations and Genetic Correlations. Peter M. Visscher, University of Edinburgh, Institute of Ecology and Resource Management, Edinburgh EH9 3JG, Scotland. Manuscript received February 3, 1997. Accepted for publication March 23, 1998.

    ABSTRACT. Widely used standard expressions for the sampling variance of intraclass correlations and genetic correlation coefficients were reviewed for small and large sample sizes. For the sampling variance of the intraclass correlation, it was shown by simulation that the commonly used expression, derived using a first-order Taylor series, performs better than alternative expressions found in the literature, when the between-sire degrees of freedom were small. The expressions for the sampling variance of the genetic correlation are significantly biased for small sample sizes, in particular when the population values, or their estimates, are close to zero. It was shown, both analytically and by simulation, that this is because the estimate of the sampling variance becomes very large in these cases due to very small values of the denominator of the expressions. It was concluded, therefore, that for small samples, estimates of the heritabilities and genetic correlations should not be used in the expressions for the sampling variance of the genetic correlation. It was shown analytically that in cases where the population values of the heritabilities are known, using the estimated heritabilities rather than their true values to estimate the genetic correlation results in a lower sampling variance for the genetic correlation. Therefore, for large samples, estimates of heritabilities, and not their true values, should be used.
  • Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors
    Statistical Significance Testing in Information Retrieval: An Empirical Analysis of Type I, Type II and Type III Errors. Julián Urbano, Harlley Lima, Alan Hanjalic, Delft University of Technology, The Netherlands. [email protected] [email protected] [email protected]

    ABSTRACT. Statistical significance testing is widely accepted as a means to assess how well a difference in effectiveness reflects an actual difference between systems, as opposed to random noise because of the selection of topics. According to recent surveys on SIGIR, CIKM, ECIR and TOIS papers, the t-test is the most popular choice among IR researchers. However, previous work has suggested computer intensive tests like the bootstrap or the permutation test, based mainly on theoretical arguments. On empirical grounds, others have suggested non-parametric alternatives such as the Wilcoxon test. Indeed, the question of which tests we should use has accompanied IR and related fields for decades now.

    1 INTRODUCTION. In the traditional test collection based evaluation of Information Retrieval (IR) systems, statistical significance tests are the most popular tool to assess how much noise there is in a set of evaluation results. Random noise in our experiments comes from sampling various sources like document sets [18, 24, 30] or assessors [1, 2, 41], but mainly because of topics [6, 28, 36, 38, 43]. Given two systems evaluated on the same collection, the question that naturally arises is "how well does the observed difference reflect the real difference between the systems and not just noise due to sampling of topics"? Our field can only advance if the published retrieval methods truly
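The abstract above contrasts the t-test with computer-intensive alternatives such as the permutation test. A minimal sketch of the paired (sign-flipping) variant: under the null hypothesis the sign of each per-topic score difference is arbitrary, so we flip signs at random and count how often the mean difference is at least as extreme as the one observed. The per-topic differences below are invented for illustration.

```python
import random

def permutation_test(diffs, n_resamples=10000, seed=0):
    """Two-sided p-value for the mean of paired differences by random sign flips."""
    rng = random.Random(seed)
    observed = abs(sum(diffs)) / len(diffs)
    hits = 0
    for _ in range(n_resamples):
        resampled = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(resampled) / len(diffs) >= observed:
            hits += 1
    return hits / n_resamples

# Hypothetical per-topic effectiveness differences between two IR systems.
diffs = [0.05, 0.02, -0.01, 0.07, 0.03, 0.04, -0.02, 0.06, 0.01, 0.05]
p = permutation_test(diffs)
```

With ten topics the exact null distribution has only 2^10 sign patterns, so the Monte Carlo estimate is close to the exact permutation p-value.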
  • Tests of Hypotheses Using Statistics
    Tests of Hypotheses Using Statistics. Adam Massey and Steven J. Miller, Mathematics Department, Brown University, Providence, RI 02912. E-mail: [email protected]; E-mail: [email protected]

    Abstract. We present the various methods of hypothesis testing that one typically encounters in a mathematical statistics course. The focus will be on conditions for using each test, the hypothesis tested by each test, and the appropriate (and inappropriate) ways of using each test. We conclude by summarizing the different tests (what conditions must be met to use them, what the test statistic is, and what the critical region is).

    Contents: 1 Types of Hypotheses and Test Statistics (1.1 Introduction; 1.2 Types of Hypotheses; 1.3 Types of Statistics); 2 z-Tests and t-Tests (2.1 Testing Means I: Large Sample Size or Known Variance; 2.2 Testing Means II: Small Sample Size and Unknown Variance); 3 Testing the Variance; 4 Testing Proportions (4.1 Testing Proportions I: One Proportion; 4.2 Testing Proportions II: K Proportions; 4.3 Testing r × c Contingency Tables; 4.4 Incomplete r × c Contingency Tables); 5 Normal Regression Analysis; 6 Non-parametric Tests (6.1 Tests of Signs; 6.2 Tests of Ranked Signs; 6.3 Tests Based on Runs); 7 Summary (7.1 z-tests; 7.2 t-tests; 7.3 Tests comparing means; 7.4 Variance Test; 7.5 Proportions; 7.6 Contingency Tables)
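The survey above covers, among others, the one-sample t test for a mean with unknown variance. A minimal hand-calculation sketch of that test statistic, t = (x̄ − μ₀) / (s / √n) on n − 1 degrees of freedom, with data invented for illustration:

```python
import math

def one_sample_t(data, mu0):
    """Return the one-sample t statistic and its degrees of freedom."""
    n = len(data)
    xbar = sum(data) / n
    s2 = sum((x - xbar) ** 2 for x in data) / (n - 1)   # unbiased sample variance
    t = (xbar - mu0) / math.sqrt(s2 / n)
    return t, n - 1

data = [5.1, 4.9, 5.6, 5.2, 4.8, 5.4, 5.0, 5.3]
t, df = one_sample_t(data, mu0=5.0)
# |t| would then be compared with the critical value of Student's t
# distribution on df degrees of freedom at the chosen significance level.
```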
  • Statistical Significance
    Statistical significance

    In statistical hypothesis testing,[1][2] statistical significance (or a statistically significant result) is attained whenever the observed p-value of a test statistic is less than the significance level defined for the study.[3][4][5][6][7][8][9] The p-value is the probability of obtaining results at least as extreme as those observed, given that the null hypothesis is true. The significance level, α, is the probability of rejecting the null hypothesis, given that it is true.[10] This statistical technique for testing the significance of results was developed in the early 20th century. In any experiment or observation that involves drawing a sample from a population, there is always the possibility that an observed effect would have occurred due to sampling error alone.[11][12] But if the p-value of an observed effect is less than the significance level, an investigator may conclude that that effect reflects the characteristics of the

    1.1 Related concepts

    The significance level α is the threshold for p below which the experimenter assumes the null hypothesis is false, and something else is going on. This means α is also the probability of mistakenly rejecting the null hypothesis, if the null hypothesis is true.[22] Sometimes researchers talk about the confidence level γ = (1 − α) instead. This is the probability of not rejecting the null hypothesis given that it is true.[23][24] Confidence levels and confidence intervals were introduced by Neyman in 1937.[25]

    2 Role in statistical hypothesis testing
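The entry above defines the p-value and the significance level α. A minimal sketch of the resulting decision rule for a normally distributed (z) test statistic, using the standard normal tail probability P(|Z| ≥ z) = erfc(z/√2); the observed statistic below is invented for illustration:

```python
import math

def two_sided_p_from_z(z):
    """P(|Z| >= |z|) for a standard normal test statistic Z."""
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05                 # significance level chosen before the experiment
z = 2.3                      # hypothetical observed test statistic
p = two_sided_p_from_z(z)
reject_null = p < alpha      # "statistically significant" at level alpha
```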
  • Understanding Statistical Hypothesis Testing: the Logic of Statistical Inference
    Review Understanding Statistical Hypothesis Testing: The Logic of Statistical Inference Frank Emmert-Streib 1,2,* and Matthias Dehmer 3,4,5 1 Predictive Society and Data Analytics Lab, Faculty of Information Technology and Communication Sciences, Tampere University, 33100 Tampere, Finland 2 Institute of Biosciences and Medical Technology, Tampere University, 33520 Tampere, Finland 3 Institute for Intelligent Production, Faculty for Management, University of Applied Sciences Upper Austria, Steyr Campus, 4040 Steyr, Austria 4 Department of Mechatronics and Biomedical Computer Science, University for Health Sciences, Medical Informatics and Technology (UMIT), 6060 Hall, Tyrol, Austria 5 College of Computer and Control Engineering, Nankai University, Tianjin 300000, China * Correspondence: [email protected]; Tel.: +358-50-301-5353 Received: 27 July 2019; Accepted: 9 August 2019; Published: 12 August 2019 Abstract: Statistical hypothesis testing is among the most misunderstood quantitative analysis methods from data science. Despite its seeming simplicity, it has complex interdependencies between its procedural components. In this paper, we discuss the underlying logic behind statistical hypothesis testing, the formal meaning of its components and their connections. Our presentation is applicable to all statistical hypothesis tests as generic backbone and, hence, useful across all application domains in data science and artificial intelligence. Keywords: hypothesis testing; machine learning; statistics; data science; statistical inference 1. Introduction We are living in an era that is characterized by the availability of big data. In order to emphasize the importance of this, data have been called the ‘oil of the 21st Century’ [1]. However, for dealing with the challenges posed by such data, advanced analysis methods are needed.
    [Show full text]
  • HERITABILITY ESTIMATES from TWIN STUDIES I. Formulae of Heritability Estimates
    HERITABILITY ESTIMATES FROM TWIN STUDIES. I. Formulae of Heritability Estimates. K.W. Kang, J.C. Christian, J.A. Norton, Jr. Departments of Medical Genetics and Psychiatry, Indiana University School of Medicine, Indianapolis, Indiana, USA.

    Over the past 50 years a large number of methods have been proposed for estimating heritability from twin studies. The present paper describes the most commonly cited of these estimates as a first step in evaluating their usefulness. A critical review will then follow. Family studies of human quantitative traits have, in general, three goals: first, to determine whether a given phenotypic trait is genetically influenced; second, to determine if discrete segregating genetic factors influence the trait; and third, to determine whether the trait is linked with some other genetic trait. Twin data are often used to attain the first goal by measuring the relative importance of heredity and environment on the development of quantitative traits. The concept of heritability originated in an attempt to describe the degree to which the differences actually observed between individuals arose from differences in genetic makeup between individuals as contrasted to the effects of different environmental forces. For a review of the origin of the concept of heritability, see "Heritability in retrospect" by Bell (1977). The term heritability is used in both a broad and narrow sense. For the broad sense, the genotype is considered as a unit in relation to the environment. Genes come together in new combinations exhibiting intraallelic interaction (dominance) and interallelic interaction (epistasis). Heritability in the narrow sense considers only the additive portion of the genetic variability in relation to the phenotypic variability.
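One of the most commonly cited twin-study formulae of the kind the paper reviews is Falconer's estimate, h² = 2(r_MZ − r_DZ), where r_MZ and r_DZ are the intraclass correlations of monozygotic and dizygotic twin pairs. Whether this particular formula is among those treated is an assumption here, and the correlations below are invented for illustration:

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's heritability estimate from twin intraclass correlations."""
    return 2.0 * (r_mz - r_dz)

# Hypothetical intraclass correlations for MZ and DZ twin pairs.
h2 = falconer_h2(r_mz=0.70, r_dz=0.45)   # -> 0.50
```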
  • What Are Confidence Intervals and P-Values?
    What is...? series, Second edition, Statistics. Supported by sanofi-aventis. What are confidence intervals and p-values? Huw TO Davies PhD, Professor of Health Care Policy and Management, University of St Andrews; Iain K Crombie PhD FFPHM, Professor of Public Health, University of Dundee.
    • A confidence interval calculated for a measure of treatment effect shows the range within which the true treatment effect is likely to lie (subject to a number of assumptions).
    • A p-value is calculated to assess whether trial results are likely to have occurred simply through chance (assuming that there is no real difference between new treatment and old, and assuming, of course, that the study was well conducted).
    • Confidence intervals are preferable to p-values, as they tell us the range of possible effect sizes compatible with the data.
    • p-values simply provide a cut-off beyond which we assert that the findings are 'statistically significant' (by convention, this is p<0.05).
    • A confidence interval that embraces the value of no difference between treatments indicates that the treatment under investigation is not significantly different from the control.
    • Confidence intervals aid interpretation of clinical trial data by putting upper and lower bounds on the likely size of any true effect.
    • Bias must be assessed before confidence intervals can be interpreted. Even very large samples and very narrow confidence intervals can mislead if they come from biased studies.
    • Non-significance does not mean 'no effect'. Small studies will often report non-significance even when there are important, real effects which a large study would have detected.
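The guide above prefers confidence intervals to bare p-values. A minimal large-sample sketch: a 95% confidence interval for a difference in means is (difference) ± 1.96 × SE(difference), and if it embraces 0 the difference is not significant at the 5% level. The means and standard errors below are invented for illustration:

```python
import math

def ci_for_difference(mean_a, se_a, mean_b, se_b, z=1.96):
    """Large-sample CI for a difference of two independent means."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)   # SEs add in quadrature
    return diff - z * se_diff, diff + z * se_diff

low, high = ci_for_difference(mean_a=12.0, se_a=0.8, mean_b=10.0, se_b=0.6)
includes_no_difference = low <= 0.0 <= high      # True would mean "not significant"
```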
  • Understanding Statistical Significance: a Short Guide
    UNDERSTANDING STATISTICAL SIGNIFICANCE: A SHORT GUIDE. Farooq Sabri and Tracey Gyateng, September 2015. Using a control or comparison group is a powerful way to measure the impact of an intervention, but doing this in a robust way requires statistical expertise. NPC's Data Labs project, funded by the Oak Foundation, aims to help charities by opening up government data that will allow organisations to compare the longer-term outcomes of their service to non-users. This guide explains the terminology used in comparison group analysis and how to interpret the results. Introduction. With budgets tightening across the charity sector, it is helpful to test whether services are actually helping beneficiaries by measuring their impact. Robust evaluation means assessing whether programmes have made a difference over and above what would have happened without them [1]. This is known as the 'counterfactual'. Obviously we can only estimate this difference. The best way to do so is to find a control or comparison group [2] of people who have similar characteristics to the service users, the only difference being that they did not receive the intervention in question. Assessment is made by comparing the outcomes for service users with the comparison group to see if there is any statistically significant difference. Creating comparison groups and testing for statistical significance can involve complex calculations, and interpreting the results can be difficult, especially when the result is not clear cut. That's why NPC launched the Data Labs project to respond to this need for robust quantitative evaluations. This paper is designed as an introduction to the field, explaining the key terminology to non-specialists.
  • Title: Assessing Test-Retest Reliability of Psychological Measures: Persistent Methodological Problems
    Title: Assessing test-retest reliability of psychological measures: persistent methodological problems Running header: Reliability of psychological measures Victoria K. Aldridge1, Terence M. Dovey2, & Angie Wade1 1Clinical Epidemiology, Nutrition and Biostatistics Section UCL Great Ormond Street Institute of Child Health, 30 Guilford Street London WC1N 1EH, UK 2Department of Psychology Marie Jahoda Building Brunel University Kingston Lane Uxbridge Middlesex UB8 3PH, UK *Author for Correspondence ([email protected]) Health and Life Sciences De Montfort University Leicester LE1 9BH, UK +44 116 2078158 This version of the article may not completely replicate the final version published in European Psychologist. It is not the version of record and is therefore not suitable for citation. Final published version of this manuscript: https://doi.org/10.1027/1016-9040/a000298 1 Abstract Psychological research and clinical practice relies heavily on psychometric testing for measuring psychological constructs that represent symptoms of psychopathology, individual difference characteristics, or cognitive profiles. Test-retest reliability assessment is crucial in the development of psychometric tools, helping to ensure that measurement variation is due to replicable differences between people regardless of time, target behaviour, or user profile. While psychological studies testing the reliability of measurement tools are pervasive in the literature, many still discuss and assess this form of reliability inappropriately with regard to the specified
  • The Independent Samples t Test
    CHAPTER 7. Comparing Two Group Means: The Independent Samples t Test

    After reading this chapter, you will be able to:
    • Differentiate between the one-sample t test and the independent samples t test
    • Summarize the relationship among an independent variable, a dependent variable, and random assignment
    • Interpret the conceptual ingredients of the independent samples t test
    • Interpret an APA style presentation of an independent samples t test
    • Hand-calculate all ingredients for an independent samples t test
    • Conduct and interpret an independent samples t test using SPSS

    In the previous chapter, we discussed the basic principles of statistically testing a null hypothesis. We highlighted these principles by introducing two parametric inferential statistical tools, the z test and the one-sample t test. Recall that we use the z test when we want to compare a sample to the population and we know the population parameters, specifically the population mean and standard deviation. We use the one-sample t test when we do not have access to the population standard deviation. In this chapter, we will add another inferential statistical tool to our toolbox. Specifically, we will learn how to compare the difference between means from two groups drawn from the same population to learn whether that mean difference might exist in the population.

    Copyright ©2017 by SAGE Publications, Inc. This work may not be reproduced or distributed in any form or by any means without express written permission of the publisher.

    CONCEPTUAL UNDERSTANDING OF THE STATISTICAL TOOL. The Study. As a little kid, I was afraid of the dark.
  • Inference from Complex Samples
    Inference from Complex Samples. By LESLIE KISH and MARTIN RICHARD FRANKEL, The University of Michigan and The University of Chicago. [Read before the ROYAL STATISTICAL SOCIETY at a meeting organized by the RESEARCH SECTION on Wednesday, October 17th, 1973, Professor J. GANI in the Chair]

    SUMMARY. The design of complex samples induces correlations between element values. In stratification negative correlation reduces the variance; but that gain is less for subclass means, and even less for their differences and for complex statistics. Clustering induces larger and positive correlations between element values. The resulting increase in variance is measured by the ratio deff, and is often severe. This is reduced but persists for subclass means, their differences, and for analytical statistics. Three methods for computing variances are compared in a large empirical study. The results are encouraging and useful.

    Keywords: CLUSTERS; COMPLEX SAMPLE; SAMPLING ERROR; DESIGN EFFECT; BRR; JACKKNIFE; INFERENCE; STANDARD ERROR; REPLICATION; INTRACLASS CORRELATION; SAMPLE DESIGN; SAMPLING VARIANCE; SUBCLASS ANALYSIS; STRATIFICATION

    1. INTRODUCTION. Standard statistical methods have been developed on the assumption of simple random sampling. The assumption of the independent selection of elements (hence independence of observations) greatly facilitates obtaining theoretical results of interest. It is essential for most measures of reliability used in probability statements, such as σ/√n, chi-squared contingency tests, analysis of variance, the nonparametric literature and standard errors for regression coefficients. Assumptions of independence yield the mathematical simplicity that becomes more desirable, and at present necessary, as we move from simple statistics such as means, to the complex statistics typified by regression analysis. Independence is often assumed automatically and needlessly, even when its relaxation would permit broader conclusions.
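The summary above measures the variance increase from clustering by the ratio deff. For equal-sized clusters a standard approximation, usually attributed to Kish, is deff = 1 + (m − 1)ρ, where m is the cluster size and ρ the intraclass correlation; the effective sample size shrinks by the same factor. The numbers below are invented for illustration:

```python
def design_effect(cluster_size, intraclass_corr):
    """Kish's approximation for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1) * intraclass_corr

def effective_sample_size(n, cluster_size, intraclass_corr):
    """Sample size of a simple random sample with the same variance."""
    return n / design_effect(cluster_size, intraclass_corr)

deff = design_effect(cluster_size=20, intraclass_corr=0.05)   # 1.95
n_eff = effective_sample_size(1000, 20, 0.05)                 # ~513
```

Even a modest ρ of 0.05 nearly halves the information in 20-element clusters, which is why ignoring the design and assuming independence understates standard errors.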