Size and Power of Some Tests Under Experimental Randomization Thomas Eugene Doerfler Iowa State University

Iowa State University Capstones, Theses and Dissertations
Retrospective Theses and Dissertations
1965

Part of the Mathematics Commons, and the Statistics and Probability Commons

Recommended Citation
Doerfler, Thomas Eugene, "Size and power of some tests under experimental randomization" (1965). Retrospective Theses and Dissertations. 4081. https://lib.dr.iastate.edu/rtd/4081

This dissertation has been microfilmed exactly as received
2956
DOERFLER, Thomas Eugene, 1937-
SIZE AND POWER OF SOME TESTS UNDER EXPERIMENTAL RANDOMIZATION.
Iowa State University of Science and Technology, Ph.D., 1965
Mathematics
University Microfilms, Inc., Ann Arbor, Michigan

SIZE AND POWER OF SOME TESTS UNDER EXPERIMENTAL RANDOMIZATION

by

Thomas Eugene Doerfler

A Dissertation Submitted to the Graduate Faculty in Partial Fulfillment of The Requirements for the Degree of DOCTOR OF PHILOSOPHY

Major Subject: Statistics

Approved:
Signature was redacted for privacy. In Charge of Major Work
Signature was redacted for privacy. Head of Major Department
Signature was redacted for privacy. Dean of Graduate College

Iowa State University Of Science and Technology
Ames, Iowa
1965

TABLE OF CONTENTS

I. INTRODUCTION
II. REVIEW OF LITERATURE
III. RANDOMIZATION THEORY IN DESIGN AND ANALYSIS FOR PAIRED DATA
    A. Experiment Randomization
    B. Randomization Tests
        1. Description of underlying principles
        2. Competitive tests considered and their sizes
            a. The Fisher randomization test
            b. The Wilcoxon paired test
            c. The Sign test
            d. The normal scores test
            e. The F test
IV. COMPARISON OF TESTS
    A. Size and Power of Competitive Tests
    B. Generalized Power Expressions
        1. The Sign test
        2. The Wilcoxon paired test
        3. The Fisher randomization test
V. RESULTS OF NUMERICAL STUDY
    A. Power of Competitive Tests Under Experiment Randomization
    B. Size of the F Test Under Experiment Randomization
    C. Variability of Power in Individual Experiments
VI. SUMMARY AND CONCLUSIONS
VII. BIBLIOGRAPHY
VIII. ACKNOWLEDGMENTS

I. INTRODUCTION

While the concept and use of experimental randomization introduced by R. A. Fisher in 1926 have been universally accepted by experimenters, an extensive evaluation of this technique is far from complete. The term "experiment randomization" refers to the fact that the pattern of experimental results obtained is selected at random according to a standard design. In an effort to more fully understand the consequences of randomization as used in comparative experiments, one is motivated to investigate tests of significance. The actual evaluation of significance, the frequency with which significance at specific levels is reached under the null hypothesis of no treatment differences, and the frequency of significance under alternative hypotheses of treatment differences are of particular interest in a study of this nature.

The present research is limited to the paired design, and the study of methods which attempt to detect whether or not a treatment has had some measurable effect on the yield of an experimental unit. Thus, with $N$ pairs or observed treatment differences, there are $2^N$ possible plans for the experiment, one of which is randomly selected by the experimenter.
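The count of possible plans for the paired design can be made concrete with a minimal Python sketch. It assumes each plan is coded as one sign per pair (+1 if the treatment order within the pair is kept, -1 if it is reversed); this coding and the names `all_plans` and `draw_plan` are hypothetical and do not come from the dissertation.

```python
import random
from itertools import product


def all_plans(n_pairs):
    """Return every sign pattern for n_pairs pairs: 2**n_pairs plans in total."""
    return list(product((1, -1), repeat=n_pairs))


def draw_plan(n_pairs, rng):
    """Select one plan at random by an independent fair choice within each pair."""
    return tuple(rng.choice((1, -1)) for _ in range(n_pairs))


plans = all_plans(4)
print(len(plans))                        # number of plans for N = 4 pairs
print(draw_plan(4, random.Random(0)))    # one randomly selected plan
```

With N = 4 pairs the enumeration contains 2^4 = 16 plans, and each is selected with probability 1/16 under this scheme.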
Because it possesses various optimal properties if the underlying distribution of the observations is normal, the F test is generally employed for this type of experiment. However, in recent years, other techniques have been recommended which require less restrictive assumptions and are easily applied. Because these methods require no assumptions concerning the actual parametric form of the population from which the samples are drawn, they are denoted as non-parametric methods. It is common to assume that the observations are independent and that the underlying distribution is a continuous function. Consequently the non-parametric techniques are applicable to a broad class of problems.

We have limited this study to four non-parametric competitors to the F test, as each of the selected tests exemplifies a different logical basis of statistical inference. Fisher's randomization test, in addition to being a valuable research tool, exposes the underlying principle of non-parametric tests in general. This test was modified by Wilcoxon (1945), who replaced the magnitudes of the observations by their ranks and, in a sense, discarded information. However, this test has proved to be a highly efficient and simple non-parametric test. A further simplification is exhibited in the Sign test, in which the test criterion depends only on the algebraic sign of each observation. We also mention the normal scores test, which is quite similar to Wilcoxon's test in the regions of interest pertaining to this study.

The choice of which test to use is of paramount importance to the researcher and experimenter alike. A comparative study depends on the competing test criteria and the means with which they are evaluated. Thus research workers have relied on measures such as power and efficiency as logical and meaningful rationale.
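The three permutation criteria just described differ only in the score attached to each pair: the magnitude of the difference (Fisher), its rank (Wilcoxon), or a constant (Sign). The following is a minimal Python sketch of that idea, assuming a one-sided alternative, no zero or tied differences, and integer-valued data so that the comparisons are exact. The function names and the six differences are hypothetical and are not taken from the dissertation.

```python
from itertools import product


def ranks_of_abs(diffs):
    """Ranks 1..N of the absolute differences (assumes no ties)."""
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def exact_p_value(scores, signs):
    """One-sided p-value: the share of the 2**N sign patterns whose
    signed-score sum is at least as large as the observed sum."""
    observed = sum(s * a for s, a in zip(signs, scores))
    n = len(scores)
    extreme = sum(
        1
        for plan in product((1, -1), repeat=n)
        if sum(s * a for s, a in zip(plan, scores)) >= observed
    )
    return extreme / 2 ** n


def paired_tests(diffs):
    """Exact randomization p-values for three choices of score."""
    signs = [1 if d > 0 else -1 for d in diffs]
    return {
        "fisher": exact_p_value([abs(d) for d in diffs], signs),  # magnitudes
        "wilcoxon": exact_p_value(ranks_of_abs(diffs), signs),    # ranks
        "sign": exact_p_value([1] * len(diffs), signs),           # signs only
    }


# Hypothetical paired differences, for illustration only.
print(paired_tests([3, 5, -1, 8, 2, 7]))
```

For these illustrative differences the sketch gives 2/64 for the Fisher and Wilcoxon criteria and 7/64 for the Sign criterion. With only 64 equally likely plans the attainable significance levels are multiples of 1/64, which is one reason the sizes of the tests need attention when samples are small.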
All analytical and empirical research consequently is directly related to the size of the competing tests, the underlying distribution from which the samples are drawn, the size of the sample, and the alternative hypothesis considered. Most of the work in this area involves asymptotic theory and large sample comparisons.

Because of the great variety of alternatives it is an imposing task to present a comprehensive report on the performance of non-parametric tests when small samples are involved. It is the primary purpose of this study to exhibit the behavior of the competing tests for small samples, for this is what the experimenter frequently encounters in practice.

An essentially complete review of pertinent research done in this area is presented. Particular attention is devoted to asymptotic theory, power computations and previous comparative studies. A description of experiment randomization and the logic of a randomization test follows. Permutation tests, primarily Fisher's randomization test, the Sign test, Wilcoxon's test and the normal scores test, along with the normal theory F test are next described in some detail. The comparison of these tests by the logical criteria of size and power is explained, with reference to theoretical as well as empirical results obtained by Monte Carlo techniques. A discussion of these results concludes this work.

II. REVIEW OF LITERATURE

Although non-parametric tests based upon the permutation of observations have been employed in statistical studies for many years, only quite recently has an attempt been made to develop a statistical theory concerning these techniques. In recent years many methods have been devised which require less elaborate assumptions than the classical techniques. Though the literature abounds with descriptive and critical treatment of non-parametric competitors to the parametric tests, we will limit this section to a review of the work that is relevant to comparative experiments.
It is well-known that the validity of using normal law theory for tests of significance in the randomized experiment is based on the randomization theory of R. A. Fisher (1960), first introduced in 1926. The essence of Fisher's method of inference is as follows. It is supposed that each experimental unit has a response which may depend on the treatment it receives. If, however, the treatments being compared have no effects, the observations on any treatment are a random sample of the totality of possible observations. So under the assumption of no treatment effects each of the possible random assignments of the observed values to the treatments is equally likely to occur. If, however, the treatments have effects, the observed partitioning of observations according to treatments will not be a random partitioning. By choosing a test criterion which is sensitive to the alternative hypothesis, one can calculate a set of equally weighted values of the test statistic which specifies its distribution under the null hypothesis. The rejection region is determined by extreme values which become more probable when the alternative hypothesis is true. Conclusions are then based on the test statistic calculated from the observed sample.

Wilcoxon (1945) modified the procedure somewhat by considering the rank and the algebraic sign rather than the actual magnitude of the observations. Closely related to this test is the normal scores test, implicit in a description of the use of tables presented by Fisher and Yates (1938). Instead of ranks, they suggested a function of the expected value of order statistics from a sample of absolute normal variables. A further simplification is provided by the Sign test studied by Cochran (1937). For its application only the signs of the sample need be recorded, but this feature necessarily limits its scope to a certain class of problems.

Before reviewing the literature we will enumerate
