Julius-Maximilians-Universität Würzburg
Institut für Mathematik und Informatik
Lehrstuhl für Mathematik VIII (Statistik)

Comparison of Common Tests for Normality

\[
F(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt
\qquad\qquad
f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}
\]

[Title-page figure: probability density function and cumulative distribution function of N(0,1)]

Diplomarbeit vorgelegt von Johannes Hain

Betreuer: Prof. Dr. Michael Falk

August 2010

Contents

1 Introduction

2 The Shapiro-Wilk Test for Normality
2.1 Derivation of the Test Statistic
2.2 Properties and Interpretation of the W Statistic
2.2.1 Some Analytical Features
2.2.2 Interpretation of the W Statistic
2.3 The Coefficients associated with W
2.3.1 The indirect approach
2.3.2 The direct approach
2.4 Modifications of the Shapiro-Wilk Test
2.4.1 Probability Plot Correlation Tests
2.4.2 Extensions of the Procedure to Large Sample Sizes

3 Tests for Normality
3.1 Motivation of Using Moment Tests
3.2 Shape Tests
3.2.1 The Skewness Test
3.2.2 The Kurtosis Test
3.3 Omnibus Tests Using Skewness and Kurtosis
3.3.1 The R Test
3.3.2 The $K^2$ Test
3.4 The Jarque-Bera Test
3.4.1 The Original Procedure
3.4.2 The Adjusted Jarque-Bera Test as a Modification of the Jarque-Bera Test

4 Power Studies
4.1 Type I Error Rate
4.1.1 Theoretical Background
4.1.2 General Settings
4.1.3 Results
4.2 Power Studies
4.2.1 Theoretical Background
4.2.2 General Settings of the Power Study
4.2.3 Results
4.3 Recommendations

5 Conclusion

A Definitions

Chapter 1

Introduction

The normal distribution is one of the most widely used distributions in statistical analysis, if not the most widely used. The topic of this diploma thesis is the problem of testing whether a given sample of random observations comes from a normal distribution. According to Thode (2002, p. 1), "normality is one of the most common assumptions made in the development and use of statistical procedures." Some of these procedures are

• the t-test,
• the analysis of variance (ANOVA),
• tests for the regression coefficients in a linear regression model, and
• the F-test for homogeneity of variances.

More details on these and other procedures are described in almost all introductory statistics textbooks, for example in Falk et al. (2002). From the author's point of view, starting a work like this without a short historical summary of the development of the normal distribution would leave the thesis incomplete. Hence, we will now give some historical facts; more details can be found in Patel and Read (1996) and the references therein.

The normal distribution first appeared in one of the works of Abraham De Moivre (1667–1754) in 1733, when he investigated large-sample properties of the binomial distribution. He discovered that the probability for sums of binomially distributed random variables to lie between two distinct values follows approximately a certain distribution, the distribution we today call the normal distribution. It was mainly Carl Friedrich Gauss (1777–1855) who revisited the idea of the normal distribution in his theory of errors of observations in 1809. In conjunction with the treatment of measurement errors in astronomy, the normal distribution became popular. Sir Francis Galton (1822–1911) made the following comment on the normal distribution:


I know of scarcely anything so apt to impress the imagination as the wonderful form of cosmic order expressed by the "Law of Frequency of Error". The law would have been personified by the Greeks and deified, if they had known of it. It reigns with serenity and in complete self-effacement, amidst the wildest confusion.

After the discovery of the central limit theorem by Pierre-Simon de Laplace (1749–1827), it was a common opinion in research for most of the nineteenth century that almost all measurable quantities were normally distributed, provided a sufficiently large number of observations was considered. But as more and more phenomena appeared for which normality could not be sustained as a reasonable distributional assumption, this attitude was cast into doubt. As noted by Ord (1972, p. 1), "Towards the end of the nineteenth century [...] it became apparent that samples from many sources could show distinctly non-normal characteristics". By the end of the nineteenth century, most statisticians had accepted that distributions of populations might be non-normal. Consequently, the development of tests for departures from normality became an important subject of statistical research. The following citation of Pearson (1930b, p. 239) reflects rather accurately the ideas behind the theory of testing for normality:

"[...] it is not enough to know that the sample could come from a normal population; we must be clear that it is at the same time improbable that it has come from a population differing so much from the normal as to invalidate the use of 'normal theory' tests in further handling of the material."

Before coming to the actual topic, we first have to make some comments about basic assumptions and definitions.

Settings of this work

The basis of all considerations in this text is a random sample $y_1,\dots,y_n$ of $n$ independent and identically distributed (iid) observations, realizations of a random variable $Y$. Denote the probability density function (pdf) of $Y$ by $p_Y(y)$. Since we focus our considerations on the question whether a sample comes from a normal population or not, the null hypothesis for a test of normality can be formulated as
\[
H_0: \; p_Y(y) = p_{N(\mu,\sigma^2)}(y) := \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(y-\mu)^2}{2\sigma^2}\right), \tag{1.1}
\]
where $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ are unknown. Note that instead of $p_{N(0,1)}(y)$ we write
\[
\varphi(y) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}y^2\right). \tag{1.2}
\]
$H_0$ is called a simple hypothesis if we are testing $H_0$ with $\mu = \mu_0$ and $\sigma^2 = \sigma_0^2$, where $\mu_0$ and $\sigma_0^2$ are both known. If at least one of the parameters is unknown, we call $H_0$ a composite hypothesis. For the remainder of this work, we suppose that $\mu$ as well as $\sigma^2$ are unknown, since this is the most realistic case of interest in practice.

For the alternative hypothesis we choose the completely general hypothesis that the distribution of the underlying population is not normal, viz.,

\[
H_1: \; p_Y(y) \neq p_{N(\mu,\sigma^2)}(y) \quad \text{for all } \mu \in \mathbb{R} \text{ and } \sigma^2 > 0. \tag{1.3}
\]
Historically, non-normal alternative distributions are classified into three groups based on their third and fourth standardized moments $\sqrt{\beta_1}$ and $\beta_2$, respectively. These two measures will be discussed in more detail in section 3.1. An alternative distribution with $\sqrt{\beta_1} \neq 0$ is called skewed. Alternative distributions which are symmetric are grouped into alternatives with $\beta_2$ greater or smaller than 3, which is the value of the kurtosis for the normal distribution. Tests that can only detect deviations in either the skewness or the kurtosis are called shape tests. Tests that are able to cover both alternatives are called omnibus tests.

Contents

Picking up the statement of Pearson (1930b) above, the following text discusses some tests designed to test formally the appropriateness or adequacy of the normal distribution as a model for the underlying phenomenon from which the data were generated. The number of different tests for normality seems boundless when studying the corresponding literature. Therefore, we have to limit the topics presented here and exclude some testing families. The chi-square type tests (cf. Moore (1986)) are excluded, as are tests based on the empirical distribution function (EDF) like the Kolmogorov-Smirnov test (cf. Stephens (1986)). To justify this decision, we cite D'Agostino (1986a, p. 406), who states that "[...] when a complete sample is available, the chi-square test should not be used. It does not have good power [...]". Additionally, he states that "For testing for normality, the Kolmogorov-Smirnov test is only a historical curiosity. It should never be used. It has poor power [...]" (see D'Agostino (1986a, p. 406)). We are also not going to consider informal graphical techniques, like boxplots or Q-Q plots, to examine the question whether a sample is normally distributed. For an overview of this topic, we refer to D'Agostino (1986b).

In this diploma thesis two different approaches to testing a random sample of iid observations for normality are investigated. The first approach consists in using regression-type tests in order to summarize the information that is contained in a normal probability plot. The most important representative of this type of tests is the Shapiro-Wilk test. In the first main chapter of the thesis, the derivation of this test is presented as well as analytical properties of the corresponding test statistic. Further, some modifications and extensions for larger sample sizes will be introduced and discussed. For each of the mentioned tests, empirical significance points are calculated based on Monte Carlo simulations with m = 1,000,000 repetitions for each test and each sample size.

In the next chapter, the second approach is discussed, which consists of testing for normality using the third and fourth standardized sample moments of the observations $y_1,\dots,y_n$, also known as the sample skewness, $\sqrt{b_1}$, and the sample kurtosis, $b_2$. After giving a motivation for employing this approach, single shape tests based only on either $\sqrt{b_1}$ or $b_2$ are discussed. The next step is using both tests together with the objective of obtaining an omnibus test. Probably the most popular omnibus test, the Jarque-Bera test, is introduced, as well as a modification of this test. For all tests of the Jarque-Bera type, critical points are determined based on empirical studies.

In the third chapter all introduced tests are compared in the framework of a power study. In this study, many samples with an underlying distribution differing from the normal distribution are generated and tested for normality with all the mentioned tests. The empirical power of each test, for each alternative distribution and for each sample size, is calculated as the rate of rejection for each single combination of alternative distribution and sample size. At the end of the chapter, the results will be discussed.

Chapter 2

The Shapiro-Wilk Test for Normality

An outstanding advance in the theory of testing for normality is the work of Shapiro and Wilk (1965). As noted by D'Agostino (1982, p. 200), the work "represents the first true innovation in the field since the 1930s". The main idea of the proposed test procedure consists of combining the information that is contained in the normal probability plot with the information obtained from an estimator of the variance of the sample.

To understand this main idea, we will start with a closer look at this approach to become more familiar with the theoretical derivation of the test statistic. Before two attempts to interpret the test statistic are given, the properties of the test statistic of the Shapiro-Wilk test will be investigated in more detail. In the next subsection, different ways to compute the coefficients of the W statistic will be presented. In the last subsection, some modifications and extensions of the Shapiro-Wilk test are discussed briefly.

2.1 Derivation of the Test Statistic

Consider a random sample $x_1,\dots,x_n$ of $n$ independently and identically distributed (iid) observations coming from a standard normally distributed random variable $X$, i.e.,

\[
P\{x_i \le t\} = \Phi(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} \exp\left(-\frac{u^2}{2}\right) du, \qquad t \in \mathbb{R}, \; i = 1,\dots,n.
\]
Denote by $x_{(1)} \le \cdots \le x_{(n)}$ the ordered values of the random sample $x_i$, $i = 1,\dots,n$, and by $x' = (x_{(1)},\dots,x_{(n)})$ the random vector of the ordered random variables. In this chapter, for the sake of simplicity, we will omit the brackets around the indices in the following, so that $x_1,\dots,x_n$ represents an already ordered sample, unless otherwise stated.

Further let $m' = (m_1,\dots,m_n)$ denote the vector of the expected values of the $x_i$, $i = 1,\dots,n$,

and $V = (v_{ij})$ the corresponding $n \times n$ covariance matrix, viz.,
\[
E(x_i) = m_i, \qquad i = 1,\dots,n,
\]
\[
\operatorname{Cov}(x_i, x_j) = v_{ij}, \qquad i, j = 1,\dots,n. \tag{2.1}
\]

We will retain this notation in the remainder of this work. Hence, $m_i$, $i = 1,\dots,n$ will always stand for the expectation of the $i$-th order statistic of a standard normally distributed random variable, and $v_{ij}$, $i, j = 1,\dots,n$ will always stand for the covariance of the $i$-th and the $j$-th order statistic of the same random variable. An overview of methods for the calculation of $m_i$ and $v_{ij}$, as well as of techniques to find approximations of them, is given in section 2.3.

Of course, the main objective is to check whether a random variable $Y$ is normally distributed with unknown mean $\mu$ and unknown variance $\sigma^2$. Suppose that for the investigation of this question we have a random sample $y_1,\dots,y_n$ of $n$ iid observations of $Y$. In order to decide whether the $y_i$ are realizations of an underlying normally distributed random variable, the random sample has to be ordered in ascending order. Let $y' = (y_{(1)},\dots,y_{(n)})$ be the vector of the ordered random observations. Analogously to the notation for the $x_i$, we use the same simplification for the $y_i$, $i = 1,\dots,n$, and write $y_1,\dots,y_n$ in this chapter for the sample of ordered observations, unless otherwise stated. To examine the null hypothesis in (1.1), a simple regression model is constructed based on the $n$ observations in the vector $y$. Under the assumption of a normally distributed sample, we can express the $y_i$ as

\[
y_i = \mu + \sigma x_i, \qquad i = 1,\dots,n. \tag{2.2}
\]
If we interpret the $\sigma x_i$ as the error terms, we have to define new error terms $\varepsilon_i = \sigma x_i - \sigma m_i$ in order to get error terms with expectation zero. Consequently, we have to add $\sigma m_i$ on the right hand side of equation (2.2) and get

\[
y_i = \mu + \sigma m_i + \varepsilon_i, \qquad i = 1,\dots,n. \tag{2.3}
\]
Because of equation (2.1), the error terms satisfy
\[
E(\varepsilon_i) = E(\sigma x_i - \sigma m_i) = \sigma m_i - \sigma m_i = 0, \qquad i = 1,\dots,n,
\]
\[
\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = \operatorname{Cov}(\sigma x_i - \sigma m_i,\, \sigma x_j - \sigma m_j) = \sigma^2 \operatorname{Cov}(x_i, x_j) = \sigma^2 v_{ij}, \qquad i, j = 1,\dots,n. \tag{2.4}
\]
In order to write equation (2.3) in matrix form, we introduce the following vector notation:

\[
\varepsilon := (\varepsilon_1,\dots,\varepsilon_n)' \quad \text{and} \quad \mathbf{1} := (1,\dots,1)' \in \mathbb{R}^n.
\]
Now we get

\[
y = \mu \mathbf{1} + \sigma m + \varepsilon =: PT + \varepsilon, \tag{2.5}
\]
where $P = (\mathbf{1}\; m) \in \mathbb{R}^{n \times 2}$ and $T = (\mu, \sigma)' \in \mathbb{R}^{2 \times 1}$. The covariance matrix in (2.4) can be presented as
\[
\operatorname{Cov}(y) = \bigl(E\bigl((y_i - E(y_i))(y_j - E(y_j))\bigr)\bigr)_{i,j} = \bigl(E(\varepsilon_i \varepsilon_j)\bigr)_{i,j} = \bigl(\operatorname{Cov}(\varepsilon_i, \varepsilon_j)\bigr)_{i,j} = \sigma^2 \bigl(\operatorname{Cov}(x_i, x_j)\bigr)_{i,j} = \sigma^2 V,
\]
with $V := (\operatorname{Cov}(x_i, x_j))_{i,j=1,\dots,n} \in \mathbb{R}^{n \times n}$ being the covariance matrix of the standard normal order statistics.

To construct the test statistic of the Shapiro-Wilk test, we need the best linear unbiased estimators of the parameters in the generalized regression model defined in (2.5).

2.1 Theorem. The best linear unbiased estimators (BLUEs) of the two parameters $\mu$ and $\sigma$ in the generalized regression model (2.5) are, respectively,
\[
\hat{\mu} = \frac{m'V^{-1}(m\mathbf{1}' - \mathbf{1}m')V^{-1}y}{\mathbf{1}'V^{-1}\mathbf{1}\; m'V^{-1}m - (\mathbf{1}'V^{-1}m)^2} \tag{2.6}
\]
and
\[
\hat{\sigma} = \frac{\mathbf{1}'V^{-1}(\mathbf{1}m' - m\mathbf{1}')V^{-1}y}{\mathbf{1}'V^{-1}\mathbf{1}\; m'V^{-1}m - (\mathbf{1}'V^{-1}m)^2}. \tag{2.7}
\]
Proof: Since the BLUE is the one that minimizes the weighted sum of squares $(y - PT)'V^{-1}(y - PT)$ (cf. Balakrishnan and Cohen (1991, p. 80)), we get for the least squares estimator

\[
\hat{T} = \begin{pmatrix} \hat{\mu} \\ \hat{\sigma} \end{pmatrix} = (P'V^{-1}P)^{-1} P'V^{-1} y. \tag{2.8}
\]
For the computation of the inverse of the matrix $P'V^{-1}P \in \mathbb{R}^{2 \times 2}$ we employ the well-known formula for inverting a $2 \times 2$ matrix, see for example Kwak and Hong (1997, p. 25):
\[
(P'V^{-1}P)^{-1} = \left( \begin{pmatrix} \mathbf{1}' \\ m' \end{pmatrix} V^{-1} (\mathbf{1}\; m) \right)^{-1}
= \begin{pmatrix} \mathbf{1}'V^{-1}\mathbf{1} & \mathbf{1}'V^{-1}m \\ \mathbf{1}'V^{-1}m & m'V^{-1}m \end{pmatrix}^{-1}
= \frac{1}{\Delta} \begin{pmatrix} m'V^{-1}m & -\mathbf{1}'V^{-1}m \\ -\mathbf{1}'V^{-1}m & \mathbf{1}'V^{-1}\mathbf{1} \end{pmatrix}, \tag{2.9}
\]
where
\[
\Delta = \det(P'V^{-1}P) = \det \begin{pmatrix} \mathbf{1}'V^{-1}\mathbf{1} & \mathbf{1}'V^{-1}m \\ \mathbf{1}'V^{-1}m & m'V^{-1}m \end{pmatrix} = \mathbf{1}'V^{-1}\mathbf{1}\; m'V^{-1}m - (\mathbf{1}'V^{-1}m)^2. \tag{2.10}
\]
Note that we can write $m'V^{-1}\mathbf{1} = \mathbf{1}'V^{-1}m$ due to the fact that $V^{-1}$ is symmetric and positive definite. Using the results in (2.9) and (2.10) we can now convert equation (2.8) to

\[
\hat{T} = \begin{pmatrix} \hat{\mu} \\ \hat{\sigma} \end{pmatrix}
= \frac{1}{\Delta} \begin{pmatrix} m'V^{-1}m & -\mathbf{1}'V^{-1}m \\ -\mathbf{1}'V^{-1}m & \mathbf{1}'V^{-1}\mathbf{1} \end{pmatrix} \begin{pmatrix} \mathbf{1}' \\ m' \end{pmatrix} V^{-1} y
= \frac{1}{\Delta} \begin{pmatrix} m'V^{-1}m\, \mathbf{1}' - \mathbf{1}'V^{-1}m\, m' \\ -\mathbf{1}'V^{-1}m\, \mathbf{1}' + \mathbf{1}'V^{-1}\mathbf{1}\, m' \end{pmatrix} V^{-1} y
= \frac{1}{\Delta} \begin{pmatrix} m'V^{-1}(m\mathbf{1}' - \mathbf{1}m') \\ \mathbf{1}'V^{-1}(\mathbf{1}m' - m\mathbf{1}') \end{pmatrix} V^{-1} y,
\]
which completes the proof.

The formulas for the two estimators in (2.6) and (2.7) can be written in a more convenient form. In order to do so, we first have to show a technical result.

2.2 Lemma. For symmetric distributions, and for the normal distribution in particular, we have
\[
\mathbf{1}'V^{-1}m = m'V^{-1}\mathbf{1} = 0.
\]

Proof: Define the $n \times n$ permutation matrix
\[
J = \begin{pmatrix}
0 & \cdots & 0 & 1 \\
0 & \cdots & 1 & 0 \\
\vdots & & & \vdots \\
1 & 0 & \cdots & 0
\end{pmatrix}, \tag{2.11}
\]
whose entries are 0 except on the diagonal from bottom left to top right, where the entry is 1. $J$ is symmetric and orthogonal, i.e.,

\[
J = J' = J^{-1}, \qquad J^2 = JJ = I_n, \qquad J\mathbf{1} = \mathbf{1}, \qquad \mathbf{1}'J = \mathbf{1}', \tag{2.12}
\]
where $I_n$ is the $n \times n$ identity matrix. Used as a multiplier from the left on a vector of suitable dimension, $J$ has the effect of reversing the order of the vector entries:

\[
-Jx = -J \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} -x_n \\ \vdots \\ -x_1 \end{pmatrix}.
\]
From the equivalence

\[
x_1 \le \cdots \le x_n \iff -x_n \le \cdots \le -x_1 \tag{2.13}
\]
it follows, since the $x_i$ are symmetrically distributed, that the joint distribution of $(x_1,\dots,x_n)$ is the same as the joint distribution of $(-x_n,\dots,-x_1)$ (see Balakrishnan and Cohen (1991, p. 29) and definition A.1 in the appendix). Consequently, using equation (2.13), it follows that the random vectors $x$ and $-Jx$ have the same distribution. This implies in particular $E(x) = E(-Jx) = -Jm$ and thus
\[
m = -Jm. \tag{2.14}
\]
Moreover, it follows for the covariance matrix of $-Jx$ (cf. Falk et al. (2002, p. 115)):

\[
V = \operatorname{Cov}(x) = \operatorname{Cov}(-Jx) = (-J)V(-J)' = JVJ, \tag{2.15}
\]
where equation (2.12) is used in the last step. Inverting both sides of the last equation and again using (2.12) yields

\[
V^{-1} = J^{-1}V^{-1}J^{-1} = JV^{-1}J. \tag{2.16}
\]

The assertion now follows by applying equations (2.16), (2.14) and (2.12), since we have

\[
\mathbf{1}'V^{-1}m = \mathbf{1}'(JV^{-1}J)(-Jm) = -\mathbf{1}'V^{-1}I_n m = -\mathbf{1}'V^{-1}m.
\]
Since $\mathbf{1}'V^{-1}m$ equals its own negative, it must be zero.

One can easily see that the following result is a direct consequence of the above lemma and theorem 2.1.

2.3 Corollary. The BLUEs in (2.6) and (2.7) can be reduced to

\[
\hat{\mu} = \frac{\mathbf{1}'V^{-1}y}{\mathbf{1}'V^{-1}\mathbf{1}} \qquad \text{and} \qquad \hat{\sigma} = \frac{m'V^{-1}y}{m'V^{-1}m}. \tag{2.17}
\]

In the case that the underlying random variable $Y$ is normally distributed, the estimator for the mean can be expressed even more simply.

2.4 Remark. If the $x_1,\dots,x_n$ are the ordered observations coming from a standard normally distributed random variable, we have $\mathbf{1}'V^{-1} = \mathbf{1}'$. Consequently, the estimate $\hat{\mu}$ in equation (2.6) for the mean reduces to the arithmetic mean $\bar{y} = n^{-1} \sum_{i=1}^n y_i$. A proof of this can be found in Balakrishnan and Cohen (1991, p. 59).

At the end of this section, after having determined the BLUEs for the regression model (2.5), we will finally define the test statistic for the Shapiro-Wilk test.

2.5 Definition. Let $y_1,\dots,y_n$ be a sample of $n$ iid ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Further let
\[
S_n^2 = \sum_{i=1}^n (y_i - \bar{y})^2, \tag{2.18}
\]
which is the unbiased estimate of $(n-1)\sigma^2$. The test statistic for the composite hypothesis of normality, denoted by
\[
W = \frac{(a'y)^2}{S_n^2} = \frac{\left(\sum_{i=1}^n a_i y_i\right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}, \tag{2.19}
\]
where
\[
a' = (a_1, \dots, a_n) = \frac{m'V^{-1}}{(m'V^{-1}V^{-1}m)^{1/2}},
\]
is called the W statistic; for the testing procedure we will use the abbreviation Shapiro-Wilk test or just W test.

The test is named after Shapiro and Wilk, who developed this test for normality in 1965. The abbreviation W for the test statistic was proposed by the authors and has become widely accepted in the literature.
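As a practical aside, the W statistic is readily available in R: the built-in function shapiro.test() implements the test with Royston's approximation of the coefficients $a_i$, an approach discussed in sections 2.3 and 2.4. A minimal sketch:

\begin{verbatim}
# W is close to 1 for a sample satisfying H0 and noticeably
# smaller for a sample from a skewed alternative
set.seed(1)
shapiro.test(rnorm(30))$statistic
shapiro.test(rexp(30, rate = 2))$statistic
\end{verbatim}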

At first sight, the definition of the test statistic for the Shapiro-Wilk test seems somewhat unexpected; especially the reasons for the use of the vector $a$ are not intuitive. A justification for definition 2.5, together with an attempt to give a mathematical interpretation of the W statistic, is discussed in the next section, in particular in subsection 2.2.2.

2.2 Properties and Interpretation of the W Statistic

In this section, we will try to interpret the W statistic in order to understand its functionality and to become more familiar with it. For this purpose, a few analytical properties of W will be presented first.

2.2.1 Some Analytical Features

2.6 Lemma. Let $y_1,\dots,y_n$ be a sample of $n$ iid ordered random observations coming from a normally distributed random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$, and let the coefficients $a_i$, $i = 1,\dots,n$ be defined as in definition 2.5. Then we have for $1 \le i \le n$:

\[
a_i = -a_{n-i+1}, \tag{2.20}
\]
which means, in particular, that $\sum_{i=1}^n a_i = 0$.

Proof: The proof of this lemma is similar to the proof of lemma 2.2. Let $J \in \mathbb{R}^{n \times n}$ be the permutation matrix defined in (2.11). Then

\[
a'J = (a_1, \dots, a_n) \begin{pmatrix}
0 & \cdots & 0 & 1 \\
0 & \cdots & 1 & 0 \\
\vdots & & & \vdots \\
1 & 0 & \cdots & 0
\end{pmatrix} = (a_n, \dots, a_1).
\]

a ′ = a ′J − 1 1 m ′V − m ′V − 1 = 1 J ⇐⇒ 1 1 2 − 1 1 2 (m ′V − V − m) (m ′V − V − m) 1 1 m ′V − = m ′V − J. ⇐⇒ − By using equation (2.16) for the right hand side of the last equation, we get

\[
-m'V^{-1}J = -m'JV^{-1}JJ = -m'JV^{-1}I_n = m'V^{-1},
\]
where equation (2.12) is used together with the fact that normal distributions are symmetric, hence $m' = -m'J$ as already explained in the proof of lemma 2.2.

2.7 Remark. (i) Note that in the case of an odd sample size, i.e., $n = 2r+1$, $r \in \mathbb{N}$, the "middle" coefficient $a_{r+1}$ is necessarily zero.

(ii) The assertion of lemma 2.6 does not only hold for normally distributed samples; it can be extended to all symmetric distributions.

With the result of lemma 2.6, we can now show a crucial property of W.

2.8 Theorem. W is scale and translation invariant.

Proof: Suppose that, besides the $y_i$, $i = 1,\dots,n$, we have another random sample $z_i := \alpha y_i + \beta$, $i = 1,\dots,n$, with arbitrary $\alpha, \beta \in \mathbb{R}$, $\alpha \neq 0$. Then we have
\[
\bar{z} := \frac{1}{n} \sum_{i=1}^n z_i = \frac{1}{n} \sum_{i=1}^n (\alpha y_i + \beta) = \frac{\alpha}{n} \sum_{i=1}^n y_i + \beta = \alpha \bar{y} + \beta.
\]
Since, as shown in lemma 2.6, $\sum_{i=1}^n a_i = 0$, the W statistic for the $z_i$ leads to
\[
W_z := \frac{\left(\sum_{i=1}^n a_i z_i\right)^2}{\sum_{i=1}^n (z_i - \bar{z})^2}
= \frac{\left(\sum_{i=1}^n a_i (\alpha y_i + \beta)\right)^2}{\sum_{i=1}^n (\alpha y_i + \beta - \alpha\bar{y} - \beta)^2}
= \frac{\left(\alpha \sum_{i=1}^n a_i y_i + \beta \sum_{i=1}^n a_i\right)^2}{\alpha^2 \sum_{i=1}^n (y_i - \bar{y})^2}
= \frac{\alpha^2 \left(\sum_{i=1}^n a_i y_i\right)^2}{\alpha^2 \sum_{i=1}^n (y_i - \bar{y})^2} = W.
\]
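The invariance can also be verified numerically in R; a small sketch (the affine transformation $z = 3y - 5$ and the seed are arbitrary choices):

\begin{verbatim}
set.seed(2)
y <- rnorm(25)
shapiro.test(y)$statistic          # W for the original sample
shapiro.test(3 * y - 5)$statistic  # the same value up to floating-point error
\end{verbatim}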

Note that the coefficients of the vector a are nonstochastic for an arbitrary but fixed value of the sample size n. Thus, the following corollary is an obvious consequence of theorem 2.8.

2.9 Corollary. Let $y_1,\dots,y_n$ be a sample of $n$ iid ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Then the distribution of the W statistic depends only on the sample size $n$ and does not depend on the unknown location and scale parameters $\mu$ and $\sigma$. Thus, W is an appropriate statistic for testing a composite hypothesis of normality.

This corollary is the justification for the approach in the power studies in section 4.2, where under the null hypothesis only standard normal samples are analyzed. We will now use the fact that W is invariant under location and scale transformations to show that W and $S_n^2$ are stochastically independent of each other under the null hypothesis, a result which becomes crucial when calculating the expectation of W in lemma 2.13.

2.10 Corollary. Let $y_1,\dots,y_n$ denote a sample of $n$ iid ordered random observations coming from a normally distributed random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Then the W statistic is stochastically independent of $S_n^2$.

Proof: Let
\[
T := T(y_1,\dots,y_n) = (\bar{y}, S_n^2) = \left( \frac{1}{n} \sum_{i=1}^n y_i, \; \sum_{i=1}^n (y_i - \bar{y})^2 \right)
\]
be a statistic for the parameter $(\mu, \sigma^2) \in \mathbb{R} \times (0, \infty)$. According to Witting (1985, p. 357), the statistic
\[
\tilde{T} := \left( \frac{1}{n} \sum_{i=1}^n y_i, \; \frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2 \right)
\]
is a sufficient statistic for $(\mu, \sigma^2)$. Hence, $T$ is a sufficient statistic for $(\mu, \sigma^2)$, since to get from $(n-1)^{-1} \sum_{i=1}^n (y_i - \bar{y})^2$ to $\sum_{i=1}^n (y_i - \bar{y})^2$ one just has to apply a one-to-one transformation (cf. Hogg and Craig (1978, p. 367)). Consequently, the assumptions of theorem A.6 in the appendix are all fulfilled and we can conclude that W and $T$ are independent.

It remains to show that W and $S_n^2$, the second component of $T$, are independent as well. To this end, consider the probability

\[
P(W \in A,\, S_n^2 \in B) = P(W \in A,\, T \in \mathbb{R} \times B),
\]
where $A$ is an arbitrary interval in $\left[\frac{n a_1^2}{n-1},\, 1\right]$ (cf. lemma 2.11 and remark 2.12) and $B$ is an arbitrary interval in $(0, \infty)$. Using the fact that W and $T$ are independent, we find
\[
P(W \in A,\, T \in \mathbb{R} \times B) = P(W \in A)\, P(T \in \mathbb{R} \times B) = P(W \in A)\, P(S_n^2 \in B),
\]
which completes the proof.

2.11 Lemma. The upper bound of the W statistic is 1.

Proof: First, note that for all $n \in \mathbb{N}$,
\[
W = \frac{(a'y)^2}{S_n^2} \ge 0.
\]
Using the translation invariance of the W statistic as shown in theorem 2.8, we can assume, without loss of generality, $\bar{y} = 0$ (otherwise pass to $\tilde{y}_i := y_i - \bar{y}$). Hence, the W statistic reduces to
\[
W = \frac{\left(\sum_{i=1}^n a_i y_i\right)^2}{\sum_{i=1}^n y_i^2}. \tag{2.21}
\]
Since, by definition,

\[
\sum_{i=1}^n a_i^2 = a'a = \frac{m'V^{-1}V^{-1}m}{(m'V^{-1}V^{-1}m)^{1/2}\,(m'V^{-1}V^{-1}m)^{1/2}} = \frac{m'V^{-1}V^{-1}m}{m'V^{-1}V^{-1}m} = 1, \tag{2.22}
\]
we get for the numerator of (2.21)

\[
\left( \sum_{i=1}^n a_i y_i \right)^2 \le \sum_{i=1}^n a_i^2 \sum_{i=1}^n y_i^2 = \sum_{i=1}^n y_i^2,
\]
where the inequality follows from the Cauchy-Schwarz inequality (see for example Kwak and Hong (1997, p. 164)). Thus the upper bound of W is 1, and W = 1 holds if and only if $y_i = \lambda a_i$, $i = 1,\dots,n$, for arbitrary $\lambda \in \mathbb{R}$.

2.12 Remark. It can be shown that the lower bound of W is even larger than zero, namely $\frac{n a_1^2}{n-1}$. For a proof we refer to the original work of Shapiro and Wilk (1965).

The fact that W cannot take a value bigger than 1 is an important property of the W statistic. We will discuss later that a value of W near one is an indication that the underlying sample is actually normally distributed, while for a small value of W we tend to reject the null hypothesis of a normal distribution. Moreover, in the proof of lemma 2.11 the reader may get a first idea of why the vector of coefficients $a$ was chosen for the W statistic: the "normalizing" property of $a$ in equation (2.22) is one of the main reasons for this choice.

2.13 Lemma. The first moment of W is given by

\[
E(W) = E\!\left( \frac{b^2}{S_n^2} \right) = \frac{\kappa(\kappa+1)}{\psi(n-1)},
\]
where $b^2 = (a'y)^2$, $\kappa = m'V^{-1}m$, and $\psi = m'V^{-1}V^{-1}m$.

Proof: In the first part of the proof we show that $E(W) = E(b^2/S_n^2) = E(b^2)/E(S_n^2)$, that is, we have to show that the expectation of the ratio, $E(b^2/S_n^2)$, is the ratio of the expectations, $E(b^2)/E(S_n^2)$. To this end, note that

\[
\operatorname{Cov}(W, S_n^2) = \operatorname{Cov}\!\left( \frac{b^2}{S_n^2},\, S_n^2 \right) = E(b^2) - E\!\left( \frac{b^2}{S_n^2} \right) E(S_n^2). \tag{2.23}
\]
From equation (2.23) it follows that

\[
E(W) = E\!\left( \frac{b^2}{S_n^2} \right) = \frac{E(b^2)}{E(S_n^2)} \iff \operatorname{Cov}\!\left( \frac{b^2}{S_n^2},\, S_n^2 \right) = \operatorname{Cov}(W, S_n^2) = 0.
\]
But in corollary 2.10 we have already shown that W and $S_n^2$ are stochastically independent and hence uncorrelated.

In the second part of the proof we can now determine the individual expectations of $b^2$ and $S_n^2$. Clearly, we have $E(S_n^2) = (n-1)\sigma^2$. Since $\operatorname{Var}(\hat{\sigma}) = E(\hat{\sigma}^2) - (E(\hat{\sigma}))^2 = \sigma^2/\kappa$ (see for example Balakrishnan and Cohen (1991, p. 82)) and using equation (2.17), we can write

\[
E(b^2) = \frac{\kappa^2}{\psi}\, E(\hat{\sigma}^2) = \frac{\kappa^2}{\psi} \left( \operatorname{Var}(\hat{\sigma}) + (E(\hat{\sigma}))^2 \right) = \frac{\sigma^2 \kappa(\kappa+1)}{\psi},
\]
which leads to the desired result.

In subsection 2.2.2 we will see that W can be interpreted as the squared ratio of two different estimates of the standard deviation, at least under the null hypothesis of a normally distributed sample. Under $H_0$, both $\hat{\sigma}^2$ and $(n-1)^{-1} S_n^2$ estimate the variance of the underlying distribution (see Shapiro (1998, p. 480)), so their ratio is approximately 1. Hence we can write informally

\[
\frac{\hat{\sigma}^2}{(n-1)^{-1} S_n^2} \approx 1 \;\Longrightarrow\; E\!\left( \frac{\hat{\sigma}^2}{(n-1)^{-1} S_n^2} \right) \approx 1. \tag{2.24}
\]
Note that equation (2.24) does not follow strictly mathematical notation but rather rests on intuitive considerations. As we will show below, W can be rewritten as this ratio of $\hat{\sigma}^2$ and $(n-1)^{-1} S_n^2$ up to the constant $\zeta$ given in equation (2.26). Consequently, this yields
\[
E(W) = \frac{\kappa(\kappa+1)}{\psi(n-1)} = \frac{\kappa^2 + \kappa}{\psi(n-1)}
\qquad \text{and} \qquad
E(W) = \frac{\kappa^2}{\psi(n-1)}\, E\!\left( \frac{\hat{\sigma}^2}{(n-1)^{-1} S_n^2} \right) \approx \frac{\kappa^2}{\psi(n-1)},
\]
where the last approximation is based once again on intuitive considerations. Cancelling common factors on both sides leads to $\kappa^2 + \kappa \approx \kappa^2$. Hence, it seems that there is a little bias in estimating the ratio of $\hat{\sigma}^2$ and $(n-1)^{-1} S_n^2$ by the W statistic. However, to the best knowledge of the author, this bias has never been the subject of extensive mathematical investigation in the literature. The main reason for this is probably that the first moment of W, $E(W)$, does not play an important role in the actual testing procedure for normality.

The last topic we want to investigate in this subsection is the null distribution of W, i.e., the distribution of W under the null hypothesis. Unfortunately, according to Shapiro and Wilk (1965), there is no possibility of giving an explicit form of the null distribution of W for sample sizes $n \ge 4$. Shapiro and Wilk showed that there exists an implicit form for the distribution of W; for more details of this proof we refer to the original work. This fact is one of the main deficits of the Shapiro-Wilk test (and also of all the other probability plot correlation tests in section 2.4). Also after 1965, no attempts to obtain explicit forms of the distribution of W under $H_0$ seem to have appeared in the literature. Thus, to get more information about the distribution and the percentage points needed for testing for normality, empirical simulation methods have to be considered, as described in subsection 2.2.2.

Since the fact that the distribution of W is mathematically difficult to handle is very unsatisfying, at this point we will try to get a better understanding of the distribution of W with the help of some graphics. In figure 2.1 some histograms are shown for the distribution of W for several sample sizes. For each graphic, m = 1,000,000 standard normally distributed random samples of the corresponding sample size n were generated and the resulting W statistics plotted as a histogram. Since the number m is quite large, the shape of these histograms comes very near to the real unknown pdf of W. By taking a closer look at the histograms, it becomes clear that the probability density function of W is strongly left-skewed for every sample size (see section 3.1 for a definition of skewness).


Figure 2.1: Histograms of the W statistic for several sample sizes. For each sample size, 1,000,000 standard normal samples were generated independently from each other and the W statistic was calculated each time. The sample sizes are (a) n = 5, (b) n = 10, (c) n = 20, (d) n = 30, (e) n = 40, (f) n = 50.

For relatively small sample sizes the skewness is less pronounced; for example, for n = 5 the empirical skewness is $-1.06$. However, the bigger the sample size gets, the more left-skewed the pdf of W becomes. For n = 20 the value of the skewness is even $-1.58$, and for n = 40 the value is $-1.6$. Another thing to notice is that for larger sample sizes, the distribution of W tends more and more towards its maximum value 1. While for n = 5 the median of the m realizations of W is 0.91, it becomes 0.96 for n = 20 and 0.98 for n = 50.
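The simulation behind figure 2.1 can be sketched in a few lines of R, with m drastically reduced here for speed (the thesis uses m = 1,000,000) and shapiro.test() standing in for the exact coefficients:

\begin{verbatim}
m <- 10000; n <- 20
w <- replicate(m, shapiro.test(rnorm(n))$statistic)
hist(w, breaks = 60, main = "Null distribution of W for n = 20")
median(w)  # about 0.96 for n = 20, cf. the medians reported above
\end{verbatim}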

2.2.2 Interpretation of the W Statistic

Due to the fact that there is no way to express the distribution of W under the null hypothesis explicitly, the interpretation of the definition of the W statistic in (2.19) is not an easy task. Nevertheless, we will try to describe the motivation behind the derivation of W, in order to become more acquainted with the functionality of the Shapiro-Wilk test. There are two different approaches to this problem, since W can be interpreted both as a measure of linearity and as the ratio of two different estimates of the standard deviation of a sample. We will present both points of view in the following.

The W Statistic as a Measure of Linearity

To comprehend this approach, recall that under the null hypothesis

\[
E(y_i) = \mu + \sigma m_i, \qquad i = 1,\dots,n.
\]

That is, a plot of the ordered observations $y_1,\dots,y_n$ against the expectations $m_1,\dots,m_n$ of the order statistics from a standard normal distribution should be approximately linear if the $y_i$ are actually coming from a normally distributed population. This is demonstrated in figure 2.2, where the probability plots of two different samples, each of size n = 50, are shown. In figure (a), the underlying data were generated from a standard normal distribution, while in figure (b) the data were randomly generated following an exp2 distribution (see subsection 4.2.2 for a definition of the exponential distribution). It is easy to see that the data points in figure (a) look quite linear due to the underlying normal distribution. It is equally obvious that one immediately doubts that the data from probability plot (b) come from a normal population, since the plot is not really linear.

Another feature to notice in figure 2.2 is that the data points in (a) show a very close agreement with the regression line, while the data points in (b) are far away from it. That is, there is a strong indication that the points in (a) originally come from a normal population, whereas this assumption would be cast into serious doubt for the points in (b). For the sake of completeness it should be mentioned that the plotting positions on the y-axis are the exact values $m_i$, $i = 1,\dots,n$ taken from Parrish (1992a). The slope and the intercept of the regression line are the BLUEs given in (2.17), where the entries of the covariance matrix, $v_{ij}$, $i, j = 1,\dots,n$, were taken from Parrish (1992b).


Figure 2.2: Probability plot of (a) a standard normally generated sample (n = 50) and (b) an exponentially generated sample (n = 50) with $\lambda = 2$. The slope of the regression line was calculated as $\hat{\sigma} = (m'V^{-1}y)/(m'V^{-1}m)$, the intercept as $\bar{y} = n^{-1} \sum_{i=1}^n y_i$.

Note that there are many other proposals for obtaining suitable plotting positions without extensive calculation of the $m_i$; see Sarkara and Wang (1998) or Thode (2002, table 2.1) for an overview. In section 2.3 some of these approximations are presented as well.
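In R, plots of this kind can be produced with qqnorm(), which uses approximate plotting positions instead of the exact $m_i$; note that qqline() draws a line through the quartiles rather than the BLUE regression line used in figure 2.2. A sketch:

\begin{verbatim}
set.seed(3)
y1 <- rnorm(50)           # (a) standard normal sample
y2 <- rexp(50, rate = 2)  # (b) exp2 sample
par(mfrow = c(1, 2))
qqnorm(y1, main = "(a)"); qqline(y1)
qqnorm(y2, main = "(b)"); qqline(y2)
\end{verbatim}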

Looking at figure 2.2 we can summarize that the less linear the probability plot looks, the more we doubt the null hypothesis of a normally distributed sample. The most straightforward and intuitive measure of linearity is given by the empirical correlation coefficient of the $y_i$ and the plotting positions. Indeed, the W statistic is such a correlation coefficient, as we will show in the following lemma.

2.14 Lemma. The W statistic is the squared empirical Pearson correlation coefficient between the ordered sample values yi and the coefficients ai.

Proof: For this proof we anticipate the definition of the empirical Pearson correlation coefficient, which is actually given in definition 2.18. There, the correlation coefficient is given by

\[
r_{y,a} = \frac{\sum_{i=1}^n (y_i - \bar{y})(a_i - \bar{a})}{\left( \sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n (a_i - \bar{a})^2 \right)^{1/2}},
\]
where $\bar{a} = n^{-1} \sum_{i=1}^n a_i$. From lemma 2.6 we have that $\sum_{i=1}^n a_i = 0$ and thus $\bar{a} = 0$. Remembering that, by definition, $\sum_{i=1}^n a_i^2 = 1$ yields
\[
W = \frac{\left( \sum_{i=1}^n a_i y_i \right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}
= \frac{\left( \sum_{i=1}^n y_i a_i - \bar{y} \sum_{i=1}^n a_i \right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n a_i^2}
= \frac{\left( \sum_{i=1}^n (y_i - \bar{y}) a_i \right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n a_i^2}
= \frac{\left( \sum_{i=1}^n (y_i - \bar{y})(a_i - \bar{a}) \right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n (a_i - \bar{a})^2}
= r_{y,a}^2.
\]

With the result of lemma 2.14 in mind it becomes clear why Shapiro and Wilk chose the normalized vector $a$ for the vector of coefficients in the W statistic: otherwise, the W statistic could not have been interpreted as such a correlation coefficient. The establishment of the Shapiro-Wilk test has led to many other related probability plot correlation tests using the same idea of measuring the linearity of the probability plot of the observed against some expected values. Some of these tests will be presented in section 2.4.1.

Like all other tests proposed in that section, the Shapiro-Wilk test is a single-tailed test, that is, the null hypothesis is rejected if $W \le c_{n,\alpha}$, where $c_{n,\alpha}$ is the critical value of the test statistic for a given sample size $n$ and a significance level $\alpha$. This is consistent with the interpretation of the correlation coefficient $r_{y,a}$: values bigger than 1 cannot be reached (see lemma 2.11), and values near 1 indicate a good fit and hence are evidence for normality.

However, according to Shapiro and Wilk the distribution of W cannot be given explicitly for $n \ge 4$ under the null hypothesis, hence there is no possibility of determining critical values $c_{n,\alpha}$ for W in an analytic way. To overcome this problem, Shapiro and Wilk (1965) gave empirical percentage points for the distribution of W based on Monte Carlo simulations. The principle of such a simulation is based on the following considerations: since we do not know the explicit form of the distribution of W, we cannot calculate the quantile function to get critical values of a test. Thus, we have to compute an estimate $\hat{c}_{n,\alpha}$. The following definition describes the way to arrive at such an estimate.

2.15 Definition. Let $\alpha \in (0,1)$ and let $m$ be the number of randomly generated independent samples of size $n$, where all $m$ samples follow a standard normal distribution. For a test of normality with test statistic $T_n$ we can thus calculate the $m$ test statistics and rearrange them in descending order, so that we have the ordered test statistics $T_{n,1} \ge \cdots \ge T_{n,m}$. For a test of the

Shapiro-Wilk type and a given sample size $n$, the empirical $100\alpha\,\%$ significance point $\hat{c}_{n,\alpha,m}$ of a given test for normality is defined as the value of the test statistic $T_{n, \lfloor (1-\alpha) m \rfloor}$, where the floor function $\lfloor x \rfloor$ rounds $x$ down to the largest integer not exceeding $x$, i.e.,
\[
\lfloor x \rfloor = \max\{ k \in \mathbb{Z} : k \le x \}.
\]
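A sketch of this procedure in R, with m reduced for speed (the thesis uses m = 1,000,000) and shapiro.test() again standing in for the exact coefficients. Since the W test rejects for small values, the point $T_{n,\lfloor(1-\alpha)m\rfloor}$ of the descending order corresponds to a lower quantile of the simulated statistics:

\begin{verbatim}
emp_crit <- function(n, alpha, m = 20000) {
  w <- replicate(m, shapiro.test(rnorm(n))$statistic)
  sort(w, decreasing = TRUE)[floor((1 - alpha) * m)]
}
emp_crit(10, 0.05)  # roughly 0.845, cf. table 2.1
\end{verbatim}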

Note that in most cases $m$ is so large that for the common values of $\alpha$ (e.g. $\alpha = 0.01, 0.05, 0.1$) the floor function is unnecessary, since $(1-\alpha)m$ is an integer anyway. Shapiro and Wilk chose the following values of $m$ in their simulation:

\[
m = \begin{cases} 5{,}000 & \text{for } n = 3(1)20, \\ 100{,}000 & \text{for } n = 21(1)50, \end{cases}
\]
where, for example, $n = 10(5)25$ means that the values for $n$ go from 10 to 25 with an increment of 5, i.e., $n = 10, 15, 20, 25$. Since these values of $m$ seem very small compared with today's processor capabilities, and since after 1965 more exact values for the coefficients $a_i$ became available, these empirical percentage points have been recalculated by Parrish (1992c), where the number of simulated W statistics was m = 100,000 for each sample size $n$, $n = 1,\dots,50$. Even the extension of $m$ to 100,000 appears to be a little too small in order to make exact statements about the percentage points of the distribution of W. That is why we recalculated the empirical percentage points of the W statistic for m = 1,000,000, in particular because of the need for better values for the empirical power studies in chapter 4. The results together with the corresponding values of Parrish are presented in table 2.1.

Taking a look at the empirical significance points from the two sources, one notices that there is only a very slight difference between the critical points of Parrish and the critical points obtained from the present simulation study. Most of the time, the values of Parrish are a little bit higher, but the difference is so marginal that it might also come from rounding, since Parrish only gives three decimal places in his work. To summarize the information in table 2.1, we can state that the empirical significance points given by Parrish can be regarded as very accurate.

Table 2.1: Empirical significance points for the W statistic for several sample sizes based on m = 1,000,000 repetitions, compared with the critical values given by Parrish (1992c). As in all other Monte Carlo simulations in the remainder of this work, the generation of the random samples was conducted in R with the function rnorm() (see R Development Core Team (2009) for more details). The calculated test statistics are the same that were used to plot the histograms in figure 2.1.

                own simulation                    Parrish (1992c)
  n      α = 0.01   α = 0.05   α = 0.1     α = 0.01   α = 0.05   α = 0.1
 10       0.7856     0.8448     0.8706       0.786      0.845      0.871
 15       0.8356     0.8820     0.9013       0.836      0.882      0.902
 20       0.8668     0.9044     0.9200       0.867      0.905      0.929
 25       0.8879     0.9195     0.9325       0.888      0.919      0.923
 30       0.9032     0.9301     0.9414       0.904      0.930      0.942
 35       0.9145     0.9383     0.9481       0.915      0.938      0.948
 50       0.9366     0.9541     0.9613       0.937      0.954      0.961

The W Statistic as a Ratio of Two Different Estimates

As already stated in section 2.1, the sample y1,...,yn of n iid ordered random observations can be expressed under the null hypothesis as

\[
y_i = \mu + \sigma x_i, \qquad i = 1,\dots,n. \tag{2.25}
\]

Therefore, the slope of the regression line is an estimate of the scale parameter $\sigma$, the standard deviation. According to Shapiro (1998), if the model is incorrect, i.e., if the $y_i$ do not follow the normal distribution, the slope of this regression line is not an estimate of $\sigma$. A natural consequence when testing whether the sample of the $y_i$ is normally distributed or not consists of comparing the estimate $\hat{\sigma}$ for the slope of the regression line with the standard deviation of the sample $y_1,\dots,y_n$ estimated by $(n-1)^{-1/2} S_n$. Note that the estimate $(n-1)^{-1/2} S_n$ does not depend on the hypothesized model (2.25), hence it is an estimate for the standard deviation no matter what underlying distribution the $y_i$ follow. Under the null hypothesis, both quantities are estimates for the standard deviation and thus should be approximately equal, that is,

\[
\frac{\hat{\sigma}^2}{(n-1)^{-1} S_n^2} = \frac{\left( \frac{m'V^{-1}y}{m'V^{-1}m} \right)^2}{\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2} \approx 1.
\]
In fact, up to a constant, the W statistic is exactly the ratio of the squared values of the latter two estimates, as we will see in the following lemma.

2.16 Lemma. Let
\[
\zeta = \frac{\kappa^2}{(n-1)\psi}, \tag{2.26}
\]
where $\psi = m'V^{-1}V^{-1}m$ and $\kappa^2 = (m'V^{-1}m)^2$. Then we have
\[
W = \frac{(a'y)^2}{S_n^2} = \zeta\, \frac{\hat{\sigma}^2}{(n-1)^{-1} S_n^2}.
\]
That is, up to a constant, W is the squared ratio of the best linear unbiased estimator (BLUE) $\hat{\sigma}$ of the slope of a regression line of the ordered $y_i$ on the $m_i$ and of the estimator $(n-1)^{-1/2} S_n$ of the standard deviation.

Proof: Using the definitions in (2.17) and (2.18) yields

\[
\frac{\hat{\sigma}^2}{(n-1)^{-1} S_n^2}
= \frac{\left( \frac{m'V^{-1}y}{m'V^{-1}m} \right)^2}{\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2}
= (n-1)\, \frac{m'V^{-1}V^{-1}m}{(m'V^{-1}m)^2}\, \frac{(m'V^{-1}y)^2}{m'V^{-1}V^{-1}m \sum_{i=1}^n (y_i - \bar{y})^2}
= (n-1)\, \frac{\psi}{\kappa^2}\, W.
\]
Since $\psi$, $\kappa^2$ and $n$ are nonstochastic, they can be regarded as constant factors and the assertion follows.

Considering this interpretation of the testing procedure of Shapiro and Wilk, one can comprehend why the Shapiro-Wilk test is attributed to the so-called regression tests (cf. Thode (2002, p. 26)): the slope of the regression line is included in the test statistic W.

2.3 The Coefficients associated with W

For the computation of the W statistic in (2.19), it is necessary to know the vector of coefficients

\[
a' = (a_1, \dots, a_n) = \frac{m'V^{-1}}{(m'V^{-1}V^{-1}m)^{1/2}}, \tag{2.27}
\]
or equivalently, knowing the values

\[
a_i = \frac{\sum_{j=1}^n m_j v_{ji}^*}{\psi^{1/2}}, \qquad i = 1,\dots,n,
\]
where $\psi$ is defined as in lemma 2.13 and $v_{ij}^*$ denotes the $(i,j)$-th element of the matrix $V^{-1}$. A serious drawback of the W statistic is that the values for $m$ and $V^{-1}$ are unknown for large sample sizes. For example, in the year 1965, when the paper of Shapiro and Wilk appeared, these values were exactly known only up to samples of size 20.

Since the knowledge of how to compute the coefficients is an important issue, especially for practitioners, we will give a summary of the most important developments in this field. In the literature, there are two different approaches to the problem of finding approximations for the $a_i$ that are as exact as possible. The first approach chooses an indirect way and consists of trying to find approximations for the $m_i$ and the $v_{ij}$. The second, straightforward way attempts to solve the problem by approximating the $a_i$ directly. In the sequel of this section, methods from the literature for these two approaches will be introduced briefly.

2.3.1 The indirect approach

Approximations of the mi

Let x1,...,xn be a sample of ordered random observations coming from a standard normally distributed random variable X. The of the i-th observation is given by (see for example Balakrishnan and Cohen (1991, p. 22)):

\[
E(x_i) = m_i = \frac{n!}{(n-i)!\,(i-1)!} \int_{-\infty}^{+\infty} x\, \Phi(x)^{i-1} \bigl(1 - \Phi(x)\bigr)^{n-i} \varphi(x)\, dx, \qquad i = 1,\dots,n, \tag{2.28}
\]
where $\varphi(x)$ denotes, as defined in equation (1.2), the value of the probability density function of the standard normal distribution evaluated at the point $x \in \mathbb{R}$. Using the latter formula and the fact that $\Phi(x) = 1 - \Phi(-x)$, it is easy to verify by insertion that for $i = 1,\dots,n$:

\[
m_i = -m_{n-i+1} \tag{2.29}
\]

(cf. Balakrishnan and Cohen (1991, p. 29)). Hence, for the calculation of the $m_i$ one only needs to compute the values for $i = 1,\dots,\lfloor n/2 \rfloor$, where
\[
\lfloor n/2 \rfloor = \min\left\{ N \in \mathbb{N} : \tfrac{1}{2}(n-1) \le N \right\} = \begin{cases} \frac{n}{2} & \text{if } n \text{ is even,} \\[1ex] \frac{n-1}{2} & \text{if } n \text{ is odd.} \end{cases}
\]

The most common way to determine the values of $m_i$, $i = 1,\dots,n$ for different sample sizes $n$ is to calculate the integral in (2.28) by numerical integration. An often-cited paper in this framework is the work of Harter (1961), who obtained values of $m_i$ for $n = 2(1)100(25)250(50)400$. For the mathematical details we refer to the original work and the references therein. For more accurate values of $m_i$ with more decimal places, we refer to Parrish (1992a).
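For illustration, the integral in (2.28) can be evaluated with R's one-dimensional quadrature routine integrate(); a minimal sketch (numerical tolerances are left at their defaults):

\begin{verbatim}
m_i <- function(i, n) {
  f <- function(x) x * pnorm(x)^(i - 1) * (1 - pnorm(x))^(n - i) * dnorm(x)
  factorial(n) / (factorial(n - i) * factorial(i - 1)) *
    integrate(f, -Inf, Inf)$value
}
m_i(4, 5)  # about 0.495; m_i(2, 5) gives the negative value, cf. (2.29)
\end{verbatim}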

A possibility to calculate values for $m_i$ without extensive numerical integration is to approximate them by an appropriate formula. The most popular approximation formula is the one given by Blom (1958):

\[
E(x_i) = m_i \approx \Phi^{-1}\!\left( \frac{i - \eta_i}{n - 2\eta_i + 1} \right), \qquad i = 1,\dots,n, \tag{2.30}
\]
where

\[
\Phi^{-1}(t) := \inf\{ x \in \mathbb{R} : \Phi(x) \ge t \}, \qquad t \in (0,1),
\]
is the quantile function of the standard normal distribution. We will retain this definition of $\Phi^{-1}$ in the remainder of this work. For given $n$, the value of $\eta_i$ can be obtained by
\[
\eta_i \approx \frac{(n+1)\Phi(m_i) - i}{2\Phi(m_i) - 1}, \qquad i = 1,\dots,\lfloor n/2 \rfloor,
\]
where the $m_i$ are the exact values of the expected order statistics. Based on his results for the $\eta_i$, Blom suggested choosing $\eta = 3/8 = 0.375$ as a compromise between the different values of $\eta_i$ that is adequately exact. The values of $\eta_i$, $i = 1,\dots,\lfloor n/2 \rfloor$ for $n = 2(2)10(5)20$ are given in Blom (1958, p. 70). Harter (1961) extends the table for values of $\eta_i$ up to samples of size $n = 25, 50, 100, 200, 400$.

Finally it may be mentioned that Royston (1982b) presents an algorithm to compute the expected values of normal order statistics which is based on the works of Harter (1961) and Blom (1958).
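With the compromise value $\eta = 3/8$, the argument of $\Phi^{-1}$ in Blom's formula (2.30) becomes $(i - 0.375)/(n + 0.25)$, which makes the approximation a one-liner in R; a sketch:

\begin{verbatim}
n <- 10
m_blom <- qnorm((1:n - 0.375) / (n + 0.25))  # approximate m_1, ..., m_n
\end{verbatim}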

Approximations of the vij

For a sample x1,...,xn of n ordered random observations coming from a standard normally distributed random variable X, the covariance of the i-th and the j-th observation is defined by

\[
\operatorname{Cov}(x_i, x_j) = v_{ij} = E(x_i x_j) - E(x_i)E(x_j) = E(x_i x_j) - m_i m_j, \qquad i, j = 1,\dots,n.
\]

Since the method to calculate the expectations $m_i$ and $m_j$, $i, j = 1,\dots,n$ is already known (see above), we only need to determine the product moments $E(x_i x_j)$. The formula for the product moment of two ordered observations of a standard normal sample is given by (see for example Balakrishnan and Cohen (1991, p. 22))
\[
E(x_i x_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \int_{-\infty}^{+\infty} \int_{-\infty}^{y} x y\, \varphi(x)\varphi(y)\, \Phi(x)^{i-1} \bigl(\Phi(y) - \Phi(x)\bigr)^{j-i-1} \bigl(1 - \Phi(y)\bigr)^{n-j}\, dx\, dy
\]
for $i, j = 1,\dots,n$ with $i < j$.

Of course, we have the relationship vij = vji, i, j = 1,...,n and

\[
v_{ij} = v_{n-i+1,\, n-j+1},
\]
which can easily be verified by insertion and elementary arithmetic operations (cf. Balakrishnan and Cohen (1991, p. 30)). That means, in particular, that not all the $n^2$ entries of the covariance matrix $V$ have to be calculated, but only

\[
\sum_{i=1}^{n/2} 2i = \frac{n}{2}\left( \frac{n}{2} + 1 \right) \quad \text{if } n \text{ is even}, \qquad
\sum_{i=1}^{\lceil n/2 \rceil} (2i-1) = \frac{(n+1)^2}{4} \quad \text{if } n \text{ is odd},
\]
which eases the computational effort considerably.

The computation of the exact values of the $v_{ij}$ is more complicated than the computation of the expectations of the order statistics, $m_i$. This is mostly due to the fact that to obtain the values for $E(x_i x_j)$ a large number of double integrals is needed, which is numerically difficult to handle. As for the determination of the expectations of the normal order statistics, the calculation of the $v_{ij}$ was done by numerical integration. For a detailed description of the procedural method, the reader may have a look at Godwin (1948). Subsequent works on the same topic also used basically the same method.

In 1965, when the work of Shapiro and Wilk appeared, exact values for the variances and covariances of normal order statistics were known up to samples of size 20. The tabulated values can for example be seen in Sarhan and Greenberg (1956). The values in this work were used by Shapiro and Wilk (1965) for the development of their testing procedure. An extension for the variances and covariances of normal order statistics for samples up to size 50 is presented by Parrish (1992b).

Davies and Stephens (1978) presented an algorithm for the approximate calculation of the covariance matrix of normal order statistics in the programming language FORTRAN. Due to the remark of Shea and Scallan (1988), who gave an improvement of the accuracy of the algorithm, the entries of the matrix $V$ can be computed even more exactly.

Computation of $V^{-1}$

When studying the various works in the literature concerning the Shapiro-Wilk test, extensions and modifications of this test, or testing procedures that are based on the same regression idea, one notices that almost none of these works uses the values of the covariance matrix $V$ for the computation of the test statistic. Since it is not the matrix $V$ that is needed for the computation of the W statistic in equation (2.19), but the inverse matrix $V^{-1}$, the reason for this fact is obvious. Inverting a large matrix is numerically very difficult to handle, even with today's processor capabilities; the problem was all the greater in the 1960s and 1970s, when most of the testing procedures mentioned in this work were developed.

A straightforward approach to this drawback is of course the direct calculation of $V^{-1}$. To this end, recall that if the $x_i$, $i = 1,\dots,n$ are realizations of a standard normally distributed random variable $X$, the random variable $\Phi(X)$ is uniformly distributed on $(0,1)$ (cf. Falk et al. (2002, corollary 1.6.4)). Let $U_i$, $i = 1,\dots,n$ denote the $i$-th order statistic of $\Phi(X)$. Then it follows (see for example Falk et al. (2002, lemma 1.6.1)) that
\[
E(U_i) = \frac{i}{n+1} =: p_i, \qquad i = 1,\dots,n.
\]
According to Hammersley and Morton (1954), under the null hypothesis the values of the covariances between $x_i$ and $x_j$ can be approximated by

\[
\operatorname{Cov}(x_i, x_j) = v_{ij} = \frac{E(U_i)\bigl(1 - E(U_j)\bigr)}{(n+2)\,\varphi(E(x_i))\,\varphi(E(x_j))} = \frac{p_i (1 - p_j)}{(n+2)\,\varphi(m_i)\,\varphi(m_j)}, \qquad i, j = 1,\dots,n.
\]

Further, Hammersley and Morton stated that the matrix $V^{-1}$ can approximately be expressed as

\[
V^{-1} = (n+2) H,
\]
where the entries $h_{ij}$, $i, j = 1,\dots,n$ of the $n \times n$ matrix $H$ under $H_0$ are given by

\[
h_{ii} = \varphi^2(m_i) \left( \frac{1}{p_{i+1} - p_i} + \frac{1}{p_i - p_{i-1}} \right) = 2(n+1)\,\varphi^2(m_i), \qquad i = 1,\dots,n,
\]
and
\[
h_{i,i+1} = h_{i+1,i} = -\varphi(m_i)\varphi(m_{i+1})\, \frac{1}{p_{i+1} - p_i} = -(n+1)\,\varphi(m_i)\varphi(m_{i+1}), \qquad i = 1,\dots,n-1.
\]
The other entries of $H$ are zero, i.e.,

\[
h_{ij} = h_{ji} = 0, \qquad j = i+2, i+3, \dots, n, \quad i = 1,\dots,n.
\]

Summing up the results of the last equations, we find for the inverse covariance matrix of the normal order statistics:

\[
V^{-1} = (n+1)(n+2)
\begin{pmatrix}
2\varphi^2(m_1) & -\varphi(m_1)\varphi(m_2) & 0 & \cdots & 0 \\
-\varphi(m_1)\varphi(m_2) & 2\varphi^2(m_2) & -\varphi(m_2)\varphi(m_3) & \cdots & 0 \\
0 & -\varphi(m_2)\varphi(m_3) & 2\varphi^2(m_3) & \cdots & 0 \\
\vdots & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & -\varphi(m_{n-1})\varphi(m_n) & 2\varphi^2(m_n)
\end{pmatrix}. \tag{2.31}
\]
However, the only work concerning tests for normality based on the idea of Shapiro and Wilk and using the approach of approximating $V^{-1}$ is the work of Rahman and Govindarajulu (1997). More details of this test are given in subsection 2.4.2.
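The tridiagonal structure of (2.31) makes the approximation easy to construct; a sketch in R (here the $m_i$ are themselves approximated by Blom's formula (2.30) rather than computed exactly by the numerical integration of (2.28)):

\begin{verbatim}
approx_Vinv <- function(n) {
  m  <- qnorm((1:n - 0.375) / (n + 0.25))  # Blom approximation of the m_i
  ph <- dnorm(m)
  H  <- diag(2 * ph^2)
  for (i in 1:(n - 1)) H[i, i + 1] <- H[i + 1, i] <- -ph[i] * ph[i + 1]
  (n + 1) * (n + 2) * H
}
\end{verbatim}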

2.3.2 The direct approach

When avoiding the calculation of $V^{-1}$, methods to approximate the coefficients $a_i$, $i = 1,\dots,n$ of the W statistic in equation (2.27) in a direct way become an important matter. The first direct approximation of the entries of the coefficient vector $a$ was given by Shapiro and Wilk (1965) when they presented the original form of the Shapiro-Wilk test. We omit this way of calculation and refer to the original work instead. The reason for this is that their approximation of the $a_i$ for $n > 20$ is of less importance, since the values for the $a_i$ given by Parrish (1992c) are based on the exact values of $m$ and $V^{-1}$ even for sample sizes larger than 20.

Another method to approximate $a$ is given by Royston (1992). The main idea consists of conducting a polynomial regression analysis for the purpose of estimating the value of $a_n$. If the sample size is larger than 5, the value of the coefficient $a_{n-1}$ is also estimated by another regression analysis. To be more detailed, we have a quintic regression analysis of $a_n$ on $x = n^{-1/2}$ with $c_n$ being the constant term of the regression equation, where the $c_i$ are the coefficients of the test statistic $\tilde{W}'$ of Weisberg and Bingham (1975) defined in equation (2.40) in definition

2.22. Based on the regression equation, the predicted value $\tilde{a}_n$ can be used as an approximation for $a_n$. Royston (1992) performed the analysis for $4 \le n \le 1000$ and presented the following regression equation for an approximation $\tilde{a}_n$ of $a_n$:

\[
\tilde{a}_n = c_n + 0.221157x - 0.147981x^2 - 2.071190x^3 + 4.434685x^4 - 2.706056x^5. \tag{2.32}
\]
For a sample size larger than 5, Royston suggested as an approximation for $a_{n-1}$ the equation
\[
\tilde{a}_{n-1} = c_{n-1} + 0.042981x - 0.293762x^2 - 1.752461x^3 + 5.682633x^4 - 3.582663x^5. \tag{2.33}
\]

Note that with the estimates $\tilde{a}_n$ and $\tilde{a}_{n-1}$ we always have the approximations $\tilde{a}_1 = -\tilde{a}_n$ and $\tilde{a}_2 = -\tilde{a}_{n-1}$, respectively. For the approximation of the remaining coefficients of $a$, Royston proposed

\[
\tilde{a}_i = \Theta_n^{1/2}\, \tilde{m}_i \quad \text{for} \quad
\begin{cases} i = 2,\dots,n-1 & \text{if } n \le 5, \\ i = 3,\dots,n-2 & \text{if } n > 5, \end{cases} \tag{2.34}
\]
where
\[
\Theta_n = \begin{cases}
\dfrac{1 - 2\tilde{a}_n^2}{\tilde{m}'\tilde{m} - 2\tilde{m}_n^2} & \text{for } n \le 5, \\[2ex]
\dfrac{1 - 2\tilde{a}_n^2 - 2\tilde{a}_{n-1}^2}{\tilde{m}'\tilde{m} - 2\tilde{m}_n^2 - 2\tilde{m}_{n-1}^2} & \text{for } n > 5,
\end{cases} \tag{2.35}
\]
and $\tilde{m} = (\tilde{m}_1,\dots,\tilde{m}_n)'$ is defined as in equation (2.38) in subsection 2.4.1.

2.17 Remark. The coefficients $\tilde{a}_i$, $i = 1,\dots,n$ have the same normalizing property as the coefficients $a_i$ of the Shapiro-Wilk statistic given in definition 2.5, viz.,
\[
\sum_{i=1}^n \tilde{a}_i^2 = 1.
\]

Proof: Note that for $n \le 5$ we get from equation (2.39) that $\tilde{m}_1^2 = \tilde{m}_n^2$, and thus
\[
\sum_{i=1}^n \tilde{a}_i^2 = \tilde{a}_1^2 + \sum_{i=2}^{n-1} \tilde{a}_i^2 + \tilde{a}_n^2
= 2\tilde{a}_n^2 + \Theta_n \sum_{i=2}^{n-1} \tilde{m}_i^2
= 2\tilde{a}_n^2 + \frac{1 - 2\tilde{a}_n^2}{\tilde{m}'\tilde{m} - 2\tilde{m}_n^2} \sum_{i=2}^{n-1} \tilde{m}_i^2
= 2\tilde{a}_n^2 + \frac{1 - 2\tilde{a}_n^2}{\sum_{i=1}^{n} \tilde{m}_i^2 - 2\tilde{m}_n^2} \sum_{i=2}^{n-1} \tilde{m}_i^2
= 2\tilde{a}_n^2 + \frac{1 - 2\tilde{a}_n^2}{\sum_{i=2}^{n-1} \tilde{m}_i^2} \sum_{i=2}^{n-1} \tilde{m}_i^2
= 2\tilde{a}_n^2 + 1 - 2\tilde{a}_n^2 = 1.
\]
The proof for $n > 5$ is analogous.

In his work from 1992, Royston also presented a comparison of his approximation $\tilde{a}_n$, the exact values $a_n$, and the approximation of Shapiro and Wilk (1965). The results showed that the new approximation is very accurate and can be used as an adequate estimation rule. However, no statements about the goodness of fit of his two regression analyses in (2.32) and (2.33) are made by the author.
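Royston's scheme (2.32)-(2.35) is straightforward to implement. The sketch below assumes that the $\tilde{m}_i$ of equation (2.38) are the Blom-type values $\Phi^{-1}((i - 3/8)/(n + 1/4))$ used by Weisberg and Bingham (1975) and that $c_n$, $c_{n-1}$ are the corresponding entries of the normalized vector $\tilde{m}/(\tilde{m}'\tilde{m})^{1/2}$; both are assumptions about notation defined outside this excerpt:

\begin{verbatim}
royston_a <- function(n) {
  stopifnot(n > 5)
  mt <- qnorm((1:n - 0.375) / (n + 0.25))  # assumed form of (2.38)
  ct <- mt / sqrt(sum(mt^2))               # assumed c_i (Weisberg-Bingham)
  x  <- 1 / sqrt(n)
  an  <- ct[n]   + 0.221157*x - 0.147981*x^2 - 2.071190*x^3 +
                   4.434685*x^4 - 2.706056*x^5             # (2.32)
  an1 <- ct[n-1] + 0.042981*x - 0.293762*x^2 - 1.752461*x^3 +
                   5.682633*x^4 - 3.582663*x^5             # (2.33)
  theta <- (1 - 2*an^2 - 2*an1^2) /
           (sum(mt^2) - 2*mt[n]^2 - 2*mt[n-1]^2)           # (2.35)
  a <- sqrt(theta) * mt                                    # (2.34)
  a[n] <- an; a[1] <- -an; a[n-1] <- an1; a[2] <- -an1
  a
}
\end{verbatim}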

2.4 Modifications of the Shapiro-Wilk Test

The innovative work of Shapiro and Wilk (1965) can be regarded as the starting point for a revival of the topic of testing for normality in research; for example, Royston (1982a, p. 115) stated that "research into tests of non-normality was given new impetus". As a consequence, in the following years many works appeared with the objective of modifying the procedure of the W test using simpler computation methods and of extending the procedure to larger sample sizes. Some of these works will be presented briefly in this section. There is a plethora of different tests with different approaches to the problem of testing for normality; unfortunately, we cannot present all of them here in detail. The interested reader may have a look at Thode (2002) to get an excellent overview of the different ways to tackle the problem of testing for normality and of the latest state of research.

2.4.1 Probability Plot Correlation Tests

Like the Shapiro-Wilk test, the following tests have one thing in common: they all try to summarize and quantify the distributional information that is contained in a normal probability plot. This is why they are also sometimes called probability plot correlation tests (see also subsection 2.2.2). Like the Shapiro-Wilk test, these tests are omnibus tests, since their test statistic is scale and translation invariant. The difference to the Shapiro-Wilk test is that their approach is more straightforward and intuitive, because they directly measure the linearity of the probability plot and hence the linearity of two variables. The first of the two variables is always the vector of the ordered observations $y$. The second variable varies from test to test and is always an exact or approximate measure of location for the ordered observations. To formalize the common ground of the probability plot correlation tests, we start with the definition of the empirical Pearson correlation coefficient (cf. Falk et al. (2002, p. 92)).

2.18 Definition. Let $y_1,\dots,y_n$ be a sample of ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$, and let $u_i = \mathrm{loc}(y_i)$, $i = 1,\dots,n$ be a measure of location for the $i$-th order statistic $y_i$. Then the empirical Pearson correlation coefficient for $y$ and $u = (u_1,\dots,u_n)'$ is defined by

$$r_{y,u} = \frac{\sum_{i=1}^n (y_i - \bar{y})(u_i - \bar{u})}{\left(\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n (u_i - \bar{u})^2\right)^{1/2}}, \qquad (2.36)$$

where $\bar{u} = n^{-1}\sum_{i=1}^n u_i$. The test statistics of the testing procedures in this subsection will all be based on this correlation coefficient. The idea of these testing procedures is that under the hypothesis of a normally distributed population, the plot of the $y_i$ against the $u_i$ will be approximately linear and thus $r_{y,u}$ will be near 1. The more the observations $y_i$ deviate from a normal distribution, the less linear the plot of the $y_i$ and the $u_i$ will be, since the values of the $u_i$ are always computed under the assumption of the null hypothesis. In this case, $r_{y,u}$ will become smaller than 1.
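As a minimal illustration, the coefficient $r_{y,u}$ of (2.36) can be computed directly; ppcc is a hypothetical name for this sketch, and every test of this subsection amounts to a particular choice of the location vector $u$.

```python
import numpy as np

def ppcc(y, u):
    """Empirical Pearson correlation r_{y,u} of equation (2.36): y is the
    sample (sorted inside), u a location measure for each order statistic."""
    y = np.sort(np.asarray(y, dtype=float))
    u = np.asarray(u, dtype=float)
    yc, uc = y - y.mean(), u - u.mean()
    return (yc @ uc) / np.sqrt((yc @ yc) * (uc @ uc))
```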

2.19 Remark. Note that by setting

$$u = a = \frac{V^{-1}m}{(m'V^{-1}V^{-1}m)^{1/2}},$$

the squared correlation coefficient $r_{y,u}^2$ is equivalent to the $W$ statistic, as shown in lemma 2.14. Thus, the $W$ test can also be embedded in equation (2.36), which goes along with the interpretation of the $W$ statistic as a correlation coefficient in subsection 2.2.2. However, since $a$ is not a measure of location in the proper sense, the $W$ statistic is not numbered among the probability plot correlation tests.

The Shapiro-Francia Test

For the computation of the $W$ statistic in (2.19), the vector of coefficients

$$a' = (a_1,\dots,a_n) = \frac{m'V^{-1}}{(m'V^{-1}V^{-1}m)^{1/2}}$$

has to be determined. As already explained in section 2.3, the calculation of the exact values of the $a_i$, $i = 1,\dots,n$ is a serious problem. Recall that in 1965, when the Shapiro-Wilk test appeared, the exact values for the entries of the covariance matrix of normal order statistics, $V$, were only known for samples of size 20 or smaller. Hence, the attempts to develop an alternative testing procedure with the same regression idea but without the described restriction can easily be comprehended. In addition, Shapiro and Wilk (1965) approximated and tabulated the values of the $a_i$ only for sample sizes $n \le 50$. Thus, for samples of size $n > 50$ the Shapiro-Wilk testing procedure described in the preceding subsections cannot be conducted.

For the purpose of overcoming these problems with the coefficients $a_i$ and of extending the testing procedure to larger sample sizes, Shapiro and Francia (1972) chose a somewhat different approach than Shapiro and Wilk (1965). While in the latter work the correlation of the ordered observations plays a crucial role in the test statistic, Shapiro and Francia (1972) made the assumption that the $n$ ordered observations $y_i$ may be regarded as uncorrelated, which is a justifiable assumption in particular for large sample sizes. Using this approach, the matrix $V^{-1}$ can be replaced by the $n \times n$ identity matrix $I_n$. This is equivalent to estimating the slope of the regression line by simple least squares instead of generalized least squares. We summarize these results in the following definition.

2.20 Definition. Let $y_1,\dots,y_n$ be a sample of $n$ ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Further let $S^2 = \sum_{i=1}^n (y_i - \bar{y})^2$. The test statistic denoted by

$$SF = \frac{(b'y)^2}{S^2} = \frac{\left(\sum_{i=1}^n b_i y_i\right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}, \qquad (2.37)$$

where

$$b' = (b_1,\dots,b_n) = \frac{m'}{(m'm)^{1/2}},$$

is called SF statistic. The related testing procedure is called the Shapiro-Francia test or briefly SF test.

The coefficients bi, i = 1,...,n for the SF statistic are much less complicated to calculate than the entries of the vector a for the W statistic. Furthermore, since the values of the expected order statistics m are given for even relatively large sample sizes (cf. section 2.3.1), it is easy to see that SF can be computed for samples of n> 50.
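If a table of the $m_i$ is not at hand, the expectations of standard normal order statistics can also be obtained by numerical integration of the order-statistic density, after which the SF statistic of (2.37) is immediate. The sketch below assumes that scipy's quadrature is accurate enough for moderate $n$; both function names are ours.

```python
import numpy as np
from math import lgamma, exp
from scipy.stats import norm
from scipy.integrate import quad

def expected_normal_os(n):
    """m_i = E(X_(i)) for a standard normal sample of size n, computed by
    integrating x times the density of the i-th order statistic."""
    m = np.empty(n)
    for i in range(1, n + 1):
        const = exp(lgamma(n + 1) - lgamma(i) - lgamma(n - i + 1))
        integrand = lambda x, i=i, c=const: (x * c * norm.cdf(x)**(i - 1)
                                             * norm.sf(x)**(n - i) * norm.pdf(x))
        m[i - 1], _ = quad(integrand, -np.inf, np.inf)
    return m

def shapiro_francia(y):
    """SF statistic of equation (2.37)."""
    y = np.sort(np.asarray(y, dtype=float))
    m = expected_normal_os(len(y))
    b = m / np.sqrt(m @ m)                     # coefficients b_i
    return (b @ y)**2 / np.sum((y - y.mean())**2)
```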

2.21 Remark. The SF test can be embedded in definition 2.18 by setting $u_i = m_i$, $i = 1,\dots,n$.

Then, $SF$ is the squared correlation coefficient of the observations $y_i$ and the expected order statistics $m_i$, i.e., $SF = r_{y,m}^2$.

Proof: Since $\bar{m} = 0$ (cf. section 2.3.1) and $m'm = \sum_{i=1}^n m_i^2$, we can transform the SF statistic to

$$SF = \frac{\left(\sum_{i=1}^n b_i y_i\right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2} = \frac{\left(\sum_{i=1}^n m_i y_i\right)^2}{\sum_{i=1}^n m_i^2 \sum_{i=1}^n (y_i - \bar{y})^2} = \frac{\left(\sum_{i=1}^n (m_i - \bar{m})(y_i - \bar{y})\right)^2}{\sum_{i=1}^n (m_i - \bar{m})^2 \sum_{i=1}^n (y_i - \bar{y})^2} = r_{y,m}^2.$$

Based on a Monte Carlo study of 1,000 randomly generated values for each sample of size $n = 35, 50, 51(2)99$, Shapiro and Francia (1972) tabulated some empirical percentage points of $SF$. Considering the number of only 1,000 generated values of $SF$, the accuracy of the percentage points for the SF test may be cast into doubt. To underline this fact, we refer to Pearson et al. (1977), who found differing percentage points for the empirical distribution of $SF$ in their studies using at least $m = 50{,}000$ repetitions for $n = 99, 100, 125$. In our own simulation study we recalculated the empirical significance points based on $m = 1{,}000{,}000$ repetitions for each sample size. In addition, as values for the expectations of the standard normal order statistics, $m_i$, we used those given in Parrish (1992a) for sample sizes $n < 100$, which are more exact than the ones used by Shapiro and Francia. Furthermore, we extended the results to smaller samples of size $n = 10(5)35$ and to larger samples of sizes $n = 100, 200$ using the values of $m_i$ given in Harter and Balakrishnan (1996). The approach for this empirical simulation is analogous to the one described in subsection 2.2.2, where the empirical percentage points of the $W$ test are calculated. The results are given in table 2.2.
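The simulation scheme just described is easy to reproduce in principle. The sketch below draws repeated standard normal samples, evaluates a given statistic, and returns the lower $\alpha$-quantiles; for speed it uses far fewer repetitions than the $m = 1{,}000{,}000$ of our study and, as a fast stand-in statistic, the Blom-score correlation of the next subsection rather than $SF$ with exact $m_i$. All names are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def blom_corr_statistic(y):
    """Squared correlation of the ordered sample with Blom scores (2.38);
    this is the WB statistic of the next subsection, used here as a fast
    surrogate for SF."""
    y = np.sort(y)
    n = len(y)
    m = norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))
    c = m / np.sqrt(m @ m)
    return (c @ y)**2 / np.sum((y - y.mean())**2)

def empirical_points(statistic, n, reps=50_000, alphas=(0.01, 0.05, 0.1), seed=1):
    """Lower empirical percentage points of `statistic` under H0."""
    rng = np.random.default_rng(seed)
    vals = np.array([statistic(rng.standard_normal(n)) for _ in range(reps)])
    return {a: np.quantile(vals, a) for a in alphas}

print(empirical_points(blom_corr_statistic, n=35))
```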

The critical points from our empirical simulations are all a little higher (at least one unit in the second decimal place). Since we reject $H_0$ if the test statistic is smaller than the critical point, the SF test rejects the null hypothesis slightly more often when the new critical points are used. As a conclusion we note that, due to the smaller percentage points, the SF test is more conservative when using the critical points of Shapiro and Francia and of Pearson et al. and might therefore lead to wrong results. For that reason we recommend the use of the new critical values, which is what we did in the empirical studies in chapter 4.

The Weisberg-Bingham Test

Another test statistic based on the idea of the Shapiro-Francia test is presented by Weisberg and Bingham (1975). With the objective of modifying the SF statistic to make it even more suitable for machine calculation, the authors replaced the expected values of the normal order statistics,

Table 2.2: Empirical significance points for the SF statistic for several sample sizes based on $m = 1{,}000{,}000$ repetitions compared with the critical values given by Shapiro and Francia (1972). Note that the critical points for $n = 100$ are actually not taken from Shapiro and Francia (1972) but from the work of Pearson et al. (1977).

                own simulation                 Shapiro and Francia (1972)
   n      α = 0.01  α = 0.05  α = 0.1      α = 0.01  α = 0.05  α = 0.1
   10      0.8781    0.9177    0.9339          —         —         —
   15      0.9088    0.9382    0.9502          —         —         —
   20      0.9266    0.9503    0.9598          —         —         —
   25      0.9389    0.9583    0.9662          —         —         —
   30      0.9478    0.9641    0.9708          —         —         —
   35      0.9540    0.9685    0.9742        0.919     0.943     0.952
   50      0.9662    0.9766    0.9808        0.953     0.954     0.964
  100      0.9819    0.9872    0.9894        0.9637    0.9741    0.9786
  200      0.9904    0.9931    0.9943          —         —         —

$m_i$, $i = 1,\dots,n$, by the following approximation:

$$\tilde{m}_i = \Phi^{-1}\left(\frac{i - 0.375}{n + 0.25}\right), \qquad i = 1,\dots,n. \qquad (2.38)$$

The formula for the $\tilde{m}_i$ suggested by Blom (1958) has already been mentioned in section 2.3.1. Note that from the fact that $\Phi(-x) = 1 - \Phi(x)$, $x \in \mathbb{R}$, it follows for the quantile function $\Phi^{-1}(z)$ that $\Phi^{-1}(z) = -\Phi^{-1}(1-z)$, $z \in (0,1)$. Then, for arbitrary $i = 1,\dots,n$ we get

$$\tilde{m}_i = \Phi^{-1}\left(\frac{i - 0.375}{n + 0.25}\right) = -\Phi^{-1}\left(1 - \frac{i - 0.375}{n + 0.25}\right) = -\Phi^{-1}\left(\frac{n + 0.25 - i + 0.375}{n + 0.25}\right) = -\Phi^{-1}\left(\frac{n - i + 1 - 0.375}{n + 0.25}\right) = -\tilde{m}_{n-i+1}.$$

That is, we have for the $\tilde{m}_i$ the same property as for the $m_i$, viz.,

$$\tilde{m}_i = -\tilde{m}_{n-i+1}, \qquad i = 1,\dots,n. \qquad (2.39)$$

2.22 Definition. Let $y_1,\dots,y_n$ be a sample of ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Further, let $S^2$ be defined as in definition 2.20 and let $\tilde{m}' = (\tilde{m}_1,\dots,\tilde{m}_n)$, whose entries are defined in equation (2.38). The test statistic for testing the composite hypothesis of normality denoted by

$$WB = \frac{(c'y)^2}{S^2} = \frac{\left(\sum_{i=1}^n c_i y_i\right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2}, \qquad (2.40)$$

where

$$c' = (c_1,\dots,c_n) = \frac{\tilde{m}'}{(\tilde{m}'\tilde{m})^{1/2}},$$

is called WB statistic; the testing procedure itself we will call the Weisberg-Bingham test or briefly WB test.

2.23 Remark. For $u_i = \tilde{m}_i$, $i = 1,\dots,n$, the WB statistic is equivalent to the squared correlation coefficient of the ordered observations $y_i$ and the values of $\tilde{m}_i$, i.e., $WB = r_{y,\tilde{m}}^2$.

Proof: The proof is analogous to the proof of remark 2.21.

The advantage of $WB$ over $SF$ is that for the computation of the test statistic, no computer storage of any constants like the values of $m$ is needed. Weisberg and Bingham (1975) conducted a Monte Carlo study with 1,000 randomly generated normal samples for $n = 5, 20, 35$ and computed the values of $SF$ and $WB$ for each sample. The empirical percentage points of $SF$ and $WB$ showed only very slight differences, which leads to the conclusion that the Weisberg-Bingham statistic is an adequate approximation of the Shapiro-Francia statistic and thus of the Shapiro-Wilk statistic. Once again, we recalculated these values, since the number of repetitions did not seem satisfyingly large to us. Instead, we used $m = 1{,}000{,}000$ repetitions for the Monte Carlo study. In addition, we extended our results to samples of size $n \le 500$, as can be seen in table 2.3. In contrast to the results for the critical points of the Shapiro-Francia test, we observe that the given critical values are slightly higher than the values from our own simulations. However, the differences are again very small. Nevertheless, we regard our own simulations as more accurate and used them in the power studies in chapter 4.

The Filliben Test

The main disadvantage of using the expected means $m_i$, $i = 1,\dots,n$ of the order statistics is probably their difficult and time-consuming calculation. Filliben (1975) tried to overcome this problem by using not the expected mean but the median of the $i$-th order statistic $y_i$ as a measure of location. In the following, let $M_i = \mathrm{med}(y_i)$, $i = 1,\dots,n$ be the median of the $i$-th order statistic.

2.24 Definition. Let $y_1,\dots,y_n$ be a sample of ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Further let

Table 2.3: Empirical significance points for the WB statistic for several sample sizes based on $m = 1{,}000{,}000$ repetitions compared with the critical values given by Weisberg and Bingham (1975)

                own simulation               Weisberg and Bingham (1975)
   n      α = 0.01  α = 0.05  α = 0.1      α = 0.01  α = 0.05  α = 0.1
   10      0.7715    0.8427    0.8724          —         —         —
   15      0.8256    0.8806    0.9031          —         —         —
   20      0.8595    0.9034    0.9214        0.866     0.903     0.920
   25      0.8817    0.9185    0.9336          —         —         —
   30      0.8978    0.9295    0.9424        0.915     0.941     0.950
   35      0.9098    0.9377    0.9491          —         —         —
   50      0.9331    0.9537    0.9620          —         —         —
  100      0.9639    0.9745    0.9789          —         —         —
  200      0.9808    0.9863    0.9885          —         —         —
  500      0.9919    0.9941    0.9950          —         —         —

$M = (M_1,\dots,M_n)'$ be the vector of the medians of the observation vector $y$. The test statistic denoted by

$$FB = \frac{d'y}{S} = \frac{\sum_{i=1}^n d_i y_i}{\left(\sum_{i=1}^n (y_i - \bar{y})^2\right)^{1/2}}, \qquad (2.41)$$

where

$$d' = (d_1,\dots,d_n) = \frac{M'}{(M'M)^{1/2}},$$

is called FB statistic. The related testing procedure is called Filliben test or briefly FB test.

2.25 Remark. By setting $u_i = M_i$, $i = 1,\dots,n$, the test statistic of the FB test turns out to be the empirical correlation coefficient of the ordered observations $y_i$ and the medians $M_i$, i.e., $FB = r_{y,M}$.

Proof: According to Filliben (1975), $M_i = -M_{n-i+1}$, $i = 1,\dots,n$, hence $\sum_{i=1}^n M_i = 0$ and $\bar{M} = 0$. The rest of the proof is straightforward:

$$\frac{\sum_{i=1}^n d_i y_i}{\left(\sum_{i=1}^n (y_i - \bar{y})^2\right)^{1/2}} = \frac{\sum_{i=1}^n y_i M_i}{\left(\sum_{i=1}^n M_i^2\right)^{1/2}\left(\sum_{i=1}^n (y_i - \bar{y})^2\right)^{1/2}} = \frac{\sum_{i=1}^n y_i M_i - \bar{y}\sum_{i=1}^n M_i}{\left(\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n M_i^2\right)^{1/2}} = \frac{\sum_{i=1}^n (y_i - \bar{y}) M_i}{\left(\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n M_i^2\right)^{1/2}} = \frac{\sum_{i=1}^n (y_i - \bar{y})(M_i - \bar{M})}{\left(\sum_{i=1}^n (y_i - \bar{y})^2 \sum_{i=1}^n (M_i - \bar{M})^2\right)^{1/2}} = r_{y,M}.$$

To conduct the FB test, knowledge of the entries of the coefficient vector $d$ is required. To this end, Filliben did not calculate the $M_i$ directly. Instead, he approximated the medians

$\tilde{M}_i$ of the order statistics from a uniform distribution on $(0,1)$. The relation between the uniform and the standard normal distribution has already been mentioned in subsection 2.3.1: if $X$ is standard normally distributed, then $\Phi(X)$ is uniformly distributed on $(0,1)$. Conversely, for a uniformly distributed random variable $U$ we have that $\Phi^{-1}(U)$ is standard normally distributed (see Falk et al. (2002, p. 32)). Hence, if the medians $\tilde{M}_i$, $i = 1,\dots,n$ of the order statistics from a uniformly distributed sample are given, one immediately obtains the corresponding medians for the $N(0,1)$ distribution by calculating $M_i = \Phi^{-1}(\tilde{M}_i)$, $i = 1,\dots,n$. For the approximation of the order-statistic medians $\tilde{M}_i$, Filliben suggested the following formula:

$$\tilde{M}_i = \begin{cases} 1 - 0.5^{1/n} & \text{for } i = 1, \\ (i - 0.3175)/(n + 0.365) & \text{for } i = 2,\dots,n-1, \\ 0.5^{1/n} & \text{for } i = n. \end{cases} \qquad (2.42)$$

Note that for $i = 2,\dots,n-1$ the recommended approximation in (2.42) is the general formula in (2.30) proposed by Blom (1958) for the value $\eta = 0.3175$.
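The construction of the FB statistic (medians from (2.42), mapped through the standard normal quantile function) can be coded compactly; filliben_statistic is a hypothetical name for this sketch.

```python
import numpy as np
from scipy.stats import norm

def filliben_statistic(y):
    """FB statistic of (2.41): correlation of the ordered sample with the
    normal order-statistic medians M_i = Phi^{-1}(M~_i), M~_i from (2.42)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    u = (np.arange(1, n + 1) - 0.3175) / (n + 0.365)   # middle case of (2.42)
    u[0] = 1 - 0.5**(1 / n)                            # i = 1
    u[-1] = 0.5**(1 / n)                               # i = n
    d = norm.ppf(u)                                    # medians M_i
    d = d / np.sqrt(d @ d)                             # coefficients d_i
    return (d @ y) / np.sqrt(np.sum((y - y.mean())**2))

rng = np.random.default_rng(0)
print(filliben_statistic(rng.standard_normal(50)))     # close to 1 under H0
```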

The obligatory Monte Carlo simulation study to obtain empirical significance points for the FB test was performed by Filliben for several sample sizes, where the number of replications $m$ was 10,000 each time. Since this $m$ seems too small for a satisfying accuracy of the percentage points, we repeated the Monte Carlo study with $m = 1{,}000{,}000$. However, Filliben did not give critical values for $n > 100$. We extended the simulation to larger sample sizes up to $n = 500$ for the power study in chapter 4. The results of our study are given in table 2.4.

Comparing the critical points of the two studies, there is no clear trend: sometimes the values of Filliben are slightly higher, sometimes the values of our simulation study are higher. What the differences have in common is that they are almost negligibly small. This argues for the accuracy of the simulation of Filliben.

2.4.2 Extensions of the Procedure to Large Sample Sizes

Many power studies with different testing procedures for normality and various sets of alternative distributions showed that the Shapiro-Wilk test is probably one of the most powerful omnibus tests (cf. section 4.2). Obviously, the greatest disadvantage of the $W$ statistic is that it is only applicable for sample sizes of $n \le 50$. The tests presented in section 2.4 tried to simplify the calculation of the test statistic and to extend the Shapiro-Wilk test to larger sample sizes. However, there was no testing procedure that performed as well as the Shapiro-Wilk test or even better for all kinds of alternative distributions in the power studies.

Table 2.4: Empirical significance points for the FB statistic for several sample sizes based on $m = 1{,}000{,}000$ repetitions compared with the critical values given by Filliben (1975)

                own simulation                     Filliben (1975)
   n      α = 0.01  α = 0.05  α = 0.1      α = 0.01  α = 0.05  α = 0.1
   10      0.8771    0.9171    0.9336        0.876     0.917     0.934
   15      0.9075    0.9377    0.9499        0.907     0.937     0.950
   20      0.9261    0.9499    0.9596        0.925     0.950     0.960
   25      0.9381    0.9580    0.9660        0.937     0.958     0.966
   30      0.9468    0.9637    0.9706        0.947     0.964     0.970
   35      0.9533    0.9680    0.9740        0.952     0.968     0.974
   50      0.9656    0.9763    0.9806        0.965     0.977     0.981
  100      0.9815    0.9870    0.9893        0.981     0.987     0.989
  200      0.9902    0.9930    0.9942          —         —         —
  500      0.9959    0.9970    0.9975          —         —         —

In addition, the extensions of the SF test, the WB test and the FB test concerned only samples up to a size of 100. However, we extended the use of these testing procedures to samples of size $n \ge 100$ by presenting empirical percentage points for the tests in section 2.4. To the best knowledge of the author, such extensions have not been considered before.

The work of Royston (1992) is probably the most popular attempt to overcome this essential drawback. The underlying idea of the author consists of using a normalizing transformation for the $W$ statistic to extend the use of this testing procedure. Another extension, developed by Rahman and Govindarajulu (1997), will also be presented in this section.

The Extension of Royston

The normalizing transformation is based on a new approximation of the entries of the vector $a$ in equation (2.27). This approximation has already been presented in section 2.3.2, where first the coefficients $a_n$ and $a_{n-1}$ were approximated by regression analyses in equations (2.32) and (2.33). The formulas for the rest of the coefficients of $a$ are given in (2.34) and (2.35). In the following, let $\tilde{a} = (\tilde{a}_1,\dots,\tilde{a}_n)'$ be the vector of approximations for $a$ based on this calculation. With the new vector $\tilde{a}$ we can now define the test statistic

$$W_R = \frac{(\tilde{a}'y)^2}{S^2} = \frac{\left(\sum_{i=1}^n \tilde{a}_i y_i\right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2} \qquad (2.43)$$

that is used in a normalizing transformation to get approximate significance levels of $W_R$. By the expression normalizing transformation or normalization of a random variable $T$, we mean a function $g(\cdot)$ with the property that $g(T)$ is standard normally distributed.

According to Royston, for samples of size $4 \le n \le 11$, the distribution of $\log(1 - W_R)$ can be approximated by a three-parameter lognormal distribution $\Lambda(\gamma, \mu, \sigma^2)$ with probability density function

$$f_{\log(1-W_R)}(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma(y - \gamma)} \exp\left(-\dfrac{(\mu - \log(y - \gamma))^2}{2\sigma^2}\right), & y > \gamma, \\ 0, & \text{else}. \end{cases}$$

The parameters $\gamma$, $\mu$ and $\sigma$ of the distribution were all obtained by estimation based on empirical simulation. Equations resulting from a smoothing of these parameters are given in definition 2.26. For samples of size $12 \le n \le 2{,}000$, Royston found that the distribution of $1 - W_R$ can be fitted by a two-parameter lognormal distribution $\Lambda(\mu, \sigma^2)$ with the pdf

$$f_{1-W_R}(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi}\,\sigma y} \exp\left(-\dfrac{(\log(y) - \mu)^2}{2\sigma^2}\right), & y > 0, \\ 0, & \text{else}, \end{cases}$$

where again empirical simulation was used to obtain the two parameters $\mu$ and $\sigma^2$. The smoothed values are presented in the next definition.

2.26 Definition. Let $y_1,\dots,y_n$ be a sample of ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. For $4 \le n \le 11$ define

$$\gamma = -2.273 + 0.459n,$$
$$\mu = 0.544 - 0.39978n + 0.025054n^2 - 0.0006714n^3,$$
$$\sigma = \exp\left(1.3822 - 0.77875n + 0.062767n^2 - 0.0020322n^3\right),$$

and for $12 \le n \le 2{,}000$ let

$$\mu = -1.5861 - 0.31082\log(n) - 0.083751(\log(n))^2 + 0.0038915(\log(n))^3,$$
$$\sigma = \exp\left(-0.4803 - 0.082676\log(n) + 0.0030302(\log(n))^2\right).$$

The normalization of the test statistic $W_R$ of Royston (1992) given by

$$z_{W_R} = \begin{cases} \dfrac{-\log\left[\gamma - \log(1 - W_R)\right] - \mu}{\sigma}, & 4 \le n \le 11, \\[2ex] \dfrac{\log(1 - W_R) - \mu}{\sigma}, & 12 \le n \le 2{,}000, \end{cases} \qquad (2.44)$$

is called extension of Royston, and the test statistic we will call $z_{W_R}$ statistic.

Following Royston, since $z_{W_R}$ is approximately standard normally distributed, the acceptance region $I$ of the test at significance level $\alpha$ is (see Falk et al. (2002))

$$I = \left[\Phi^{-1}(\alpha/2),\ \Phi^{-1}(1 - \alpha/2)\right].$$
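For illustration, the normalization (2.44) together with the smoothed parameters of definition 2.26 can be coded in a few lines; royston_z is our hypothetical name, and the two-sided p-value follows the acceptance region above (software implementations often use only the upper tail).

```python
import numpy as np
from scipy.stats import norm

def royston_z(w, n):
    """z_{W_R} of equation (2.44), with the smoothed lognormal parameters
    of definition 2.26; w is the (approximated) W statistic."""
    if 4 <= n <= 11:
        gamma = -2.273 + 0.459*n
        mu = 0.544 - 0.39978*n + 0.025054*n**2 - 0.0006714*n**3
        sigma = np.exp(1.3822 - 0.77875*n + 0.062767*n**2 - 0.0020322*n**3)
        return (-np.log(gamma - np.log(1 - w)) - mu) / sigma
    if 12 <= n <= 2000:
        x = np.log(n)
        mu = -1.5861 - 0.31082*x - 0.083751*x**2 + 0.0038915*x**3
        sigma = np.exp(-0.4803 - 0.082676*x + 0.0030302*x**2)
        return (np.log(1 - w) - mu) / sigma
    raise ValueError("n outside the range covered by definition 2.26")

z = royston_z(0.95, 100)
p = 2 * norm.sf(abs(z))   # two-sided p-value, matching the region I above
```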

Despite the many other attempts to normalize the $W$ statistic in order to extend the testing procedure and to obtain easily computable p-values, the normalizing transformation of Royston given in definition 2.26 has become widely accepted in the literature and in applied statistics, where testing for normality for large sample sizes is an essential issue. The main reason for this is that the testing procedure of Royston has very high power and performs very well for many different alternative distributions (see section 4.2 for more details). Almost all popular statistical software packages like R (see R Development Core Team (2009)) and SAS (see SAS Institute Inc. (2003)) have today implemented the procedure of Royston as a test for normality based on the idea of Shapiro and Wilk. Thus it is interesting to know that when performing the Shapiro-Wilk test, for example in SAS, one is actually using the extension of Royston, even if the sample size is smaller than 50 and the original Shapiro-Wilk test could be performed!

The Extension of Rahman and Govindarajulu

As already mentioned in subsection 2.3.1, the approach of Rahman and Govindarajulu (1997) was based on the matrix $V^{-1}$ given in equation (2.31). Recall that the determination of the coefficients $a_i$ of the $W$ statistic can be done by calculating

$$a_i = \frac{1}{\psi}\sum_{j=1}^n m_j v_{ji}^*, \qquad i = 1,\dots,n, \qquad (2.45)$$

where $v_{ij}^*$, $i,j = 1,\dots,n$ are the entries of $V^{-1}$ and $\psi$ is defined in lemma 2.13. Using the entries of the matrix in equation (2.31) for the $v_{ij}^*$, equation (2.45) can be reduced to

$$a_i = -\frac{(n+1)(n+2)}{\psi}\,\varphi(m_i)\left(m_{i-1}\varphi(m_{i-1}) - 2m_i\varphi(m_i) + m_{i+1}\varphi(m_{i+1})\right), \qquad i = 1,\dots,n,$$

by setting

$$m_0\varphi(m_0) = m_{n+1}\varphi(m_{n+1}) = 0.$$

As an appropriate approximation for the expected order statistics $m_i$, $i = 1,\dots,n$, the authors used the approximation formula of Blom (1958) given in equation (2.38) for their test statistic.

2.27 Definition. Let $y_1,\dots,y_n$ be a sample of ordered random observations coming from a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Further, let $S^2$ be defined as in definition 2.20 and

$$g_i = -\frac{(n+1)(n+2)}{\tilde{\psi}}\,\varphi(\tilde{m}_i)\left(\tilde{m}_{i-1}\varphi(\tilde{m}_{i-1}) - 2\tilde{m}_i\varphi(\tilde{m}_i) + \tilde{m}_{i+1}\varphi(\tilde{m}_{i+1})\right), \qquad i = 1,\dots,n,$$

where

$$\tilde{\psi} = \tilde{m}'V^{-1}V^{-1}\tilde{m} = (\tilde{m}_1,\dots,\tilde{m}_n)\,V^{-1}V^{-1}\,(\tilde{m}_1,\dots,\tilde{m}_n)',$$

the matrix $V^{-1}$ is given in equation (2.31), $\tilde{m}_0\varphi(\tilde{m}_0) = \tilde{m}_{n+1}\varphi(\tilde{m}_{n+1}) = 0$, and the $\tilde{m}_i$, $i = 1,\dots,n$ are given in equation (2.38). The test statistic denoted by

$$W_{RG} = \frac{(g'y)^2}{S^2} = \frac{\left(\sum_{i=1}^n g_i y_i\right)^2}{\sum_{i=1}^n (y_i - \bar{y})^2},$$

where

$$g' = (g_1,\dots,g_n),$$

is called extension of Rahman and Govindarajulu; the test statistic will be briefly called $W_{RG}$ statistic.
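A sketch of the resulting statistic is given below. Note that we normalize the coefficient vector to unit Euclidean length, which amounts to dividing by $\tilde{\psi}^{1/2}$ and avoids forming $V^{-1}$ explicitly; the function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def w_rg_statistic(y):
    """W_RG of definition 2.27: coefficients g_i built from Blom scores
    (2.38) and the tridiagonal structure of V^{-1} in (2.31)."""
    y = np.sort(np.asarray(y, dtype=float))
    n = len(y)
    m = norm.ppf((np.arange(1, n + 1) - 0.375) / (n + 0.25))  # m~_i
    t = m * norm.pdf(m)                                       # m~_i * phi(m~_i)
    t = np.concatenate(([0.0], t, [0.0]))      # boundary terms for i = 0, n+1
    g = -(n + 1)*(n + 2) * norm.pdf(m) * (t[:-2] - 2*t[1:-1] + t[2:])
    g = g / np.sqrt(g @ g)                     # divide by psi~^{1/2}
    return (g @ y)**2 / np.sum((y - y.mean())**2)
```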

For the purpose of comparing their approximations $g_i$, $i = 1,\dots,n$ with the vector of coefficients $a$ of the $W$ statistic, the authors computed the differences between their approximations and the exact values of the $a_i$ for samples of size 10, 20 and 30. Doing so, they found that the error is very small, from which they concluded an appropriate accuracy of their values $g_i$.

In order to get percentage points of the distribution of $W_{RG}$ under the null hypothesis, Rahman and Govindarajulu conducted a Monte Carlo study based on 20,000 empirical simulations for several sample sizes up to 5,000. We recalculated these critical points in order to make them even more exact, based on an empirical simulation study with $m = 1{,}000{,}000$ repetitions. Our results are given in table 2.5.

Table 2.5: Empirical significance points for the $W_{RG}$ statistic for several sample sizes based on $m = 1{,}000{,}000$ repetitions compared with the critical values given by Rahman and Govindarajulu (1997)

                own simulation            Rahman and Govindarajulu (1997)
   n      α = 0.01  α = 0.05  α = 0.1      α = 0.01  α = 0.05  α = 0.1
   10      0.7812    0.8298    0.8531        0.779     0.828     0.852
   15      0.8260    0.8635    0.8813        0.828     0.863     0.881
   20      0.8542    0.8849    0.8993        0.853     0.884     0.899
   25      0.8736    0.8998    0.9120        0.873     0.899     0.911
   30      0.8878    0.9108    0.9215        0.887     0.911     0.921
   35      0.8991    0.9196    0.9290        0.899     0.919     0.928
   50      0.9216    0.9369    0.9440        0.921     0.937     0.944
  100      0.9533    0.9619    0.9659        0.953     0.961     0.965
  200      0.9731    0.9777    0.9799        0.973     0.977     0.979
  500      0.9875    0.9895    0.9904        0.987     0.989     0.990

We see that the critical points from our simulations are mostly slightly higher. However, the differences to the empirical percentage points of Rahman and Govindarajulu appear only in the third decimal place, which underlines the accuracy of their results. Nevertheless, for the power studies in chapter 4 we preferred our own simulation results. Chapter 3

Moment Tests for Normality

A completely different approach to the question whether a sample is normally distributed or not consists in the use of moments of a distribution or, more precisely, the use of sample moments. Mostly the third and the fourth standardized moments are considered to make a decision about the null hypothesis of normally distributed observations.

In this chapter, we give a short introduction to this field of testing for normality, beginning with some theoretical aspects of moments of random variables together with a justification for their use in testing for normality. After that, we present the two simplest tests for normality, namely the skewness and the kurtosis test. In the third section, approaches of combining these two tests in the form of an omnibus test are mentioned. Since we focus on the Jarque-Bera test as one of the most popular omnibus tests in applied statistics, in the fourth section we give the test statistic as well as the underlying idea of the test. At the end we present an intuitive modification of the Jarque-Bera test. Additionally, for the Jarque-Bera test and its modification, new percentage points based on extensive empirical simulation studies are tabulated in this section.

In contrast to the mathematical notation in chapter 2, if we have a sample $y_1,\dots,y_n$ of iid random observations coming from a random variable $Y$, the $n$ observations do not necessarily have to be ordered.

3.1 Motivation of Using Moment Tests

The first one who realized that deviations from normality can be characterized by the standardized third and fourth moments was Karl Pearson (cf. Thode (2002, p. 41)). Hence, the theory of testing for normality can be regarded as having been initialized by Pearson, who had a major impact on the development of this theory and its mathematical methods. To become more detailed, we start with the definition of moments:

3.1 Definition. Consider a random variable $Y$ with mean $\mu \in \mathbb{R}$ and variance $\sigma^2 > 0$. Further,


let $p_Y(y)$ be the pdf of $Y$ and $k$ be an integer greater than 1. The $k$-th standardized moment of $Y$ is defined as

$$\psi_k(Y) = \frac{\mu_k(Y)}{(\mu_2(Y))^{k/2}}, \qquad (3.1)$$

where

$$\mu_k(Y) = E\left((Y - \mu)^k\right) = \int_{-\infty}^{+\infty} (y - \mu)^k p_Y(y)\,dy$$

is called the $k$-th central moment of $Y$. In particular, $\mu_2 = E((Y - \mu)^2) = \sigma^2$.

3.2 Remark. The $k$-th standardized moment is scale and translation invariant for $k$ being an integer greater than 1.

Proof: Let $k \ge 2$ be an arbitrary integer and let $\alpha, \beta \in \mathbb{R}$ be two non-stochastic constants. Further, let $Y$ be a random variable with mean $\mu \in \mathbb{R}$ and variance $\sigma^2 > 0$. Consider the random variable $Z = \alpha Y + \beta$ that emerges from a scale and location transformation. Obviously, the mean of $Z$ is $E(\alpha Y + \beta) = \alpha E(Y) + \beta = \alpha\mu + \beta$, due to the linearity of the expectation operator. The $k$-th standardized moment of $Z$, $\psi_k(Z)$, is then

$$\psi_k(Z) = \frac{E\left((\alpha Y + \beta - (\alpha\mu + \beta))^k\right)}{\left(E\left((\alpha Y + \beta - (\alpha\mu + \beta))^2\right)\right)^{k/2}} = \frac{E\left((\alpha(Y - \mu))^k\right)}{\left(E\left((\alpha(Y - \mu))^2\right)\right)^{k/2}} = \frac{\alpha^k E\left((Y - \mu)^k\right)}{\left(\alpha^2 E\left((Y - \mu)^2\right)\right)^{k/2}} = \frac{\alpha^k E\left((Y - \mu)^k\right)}{\alpha^k \left(E\left((Y - \mu)^2\right)\right)^{k/2}} = \psi_k(Y),$$

where again the linearity of the expectation operator is used to move $\alpha^k$ and $\alpha^2$, respectively, outside the expectation.

The property of invariance against scale and location transformations is a very important fact, since we want to compare the shape of an empirical distribution with that of a theoretical distribution, namely the normal distribution. Hence, when using standardized moments of the normal distribution to decide whether a random sample is normally distributed or not, we can restrict our considerations to the standard normal distribution, as in the power studies in section 4.2.

In this chapter, our main interest will focus on the third and the fourth standardized moment for which we will introduce a special notation.

3.3 Definition. The third standardized moment of a random variable $Y$ is called skewness and is denoted as

$$\sqrt{\beta_1} = \psi_3(Y) = \frac{E\left((Y - \mu)^3\right)}{\left(E\left((Y - \mu)^2\right)\right)^{3/2}}.$$

The fourth standardized moment of a random variable $Y$ is called kurtosis and is denoted as

$$\beta_2 = \psi_4(Y) = \frac{E\left((Y - \mu)^4\right)}{\left(E\left((Y - \mu)^2\right)\right)^{2}}.$$


Figure 3.1: Nonnormal probability density functions due to deviations in either the skewness or the kurtosis value. The dashed line is the pdf of a standard normal distribution. The solid lines are (a) a left skewed distribution ($\sqrt{\beta_1} < 0$), (b) a right skewed distribution ($\sqrt{\beta_1} > 0$), (c) a heavy tailed distribution ($\beta_2 > 3$) and (d) a light tailed distribution ($\beta_2 < 3$).

The notation $\sqrt{\beta_1}$ and $\beta_2$ might seem a little curious; however, it is the classical notation generally used in the literature. Note that by writing $\sqrt{\beta_1}$ and $\beta_2$, the dependence of skewness and kurtosis on their associated random variable is omitted for the sake of simplicity. To which random variable $\sqrt{\beta_1}$ and $\beta_2$ belong will become clear from the context. A basic result for testing for normality using moments is the following:

3.4 Lemma. Let X be a standard normally distributed random variable. Then for the skewness and kurtosis we have, respectively,

$$\sqrt{\beta_1} = 0 \quad \text{and} \quad \beta_2 = 3.$$

Proof: Since $X$ is standard normally distributed, it follows that its mean $\mu$ is zero and for the variance $\sigma^2$ we have

$$\sigma^2 = E\left((X - \mu)^2\right) = E(X^2) = \int_{-\infty}^{+\infty} x^2 \varphi(x)\,dx = 1. \qquad (3.2)$$

Thus,

$$\sqrt{\beta_1} = \frac{E\left((X - \mu)^3\right)}{\left(E\left((X - \mu)^2\right)\right)^{3/2}} = E(X^3) = \int_{-\infty}^{+\infty} x^3 \varphi(x)\,dx.$$

Since $(-x)^3\varphi(-x) = -x^3\varphi(x)$ for arbitrary $x$, the integrand of the integral is odd. It follows that the value of the integral is 0, which is the first part of the proof. To show the second result, we use partial integration. Since $\partial\varphi(x)/\partial x = -x\varphi(x)$, we find

$$\beta_2 = \int_{-\infty}^{+\infty} x^4 \varphi(x)\,dx = \underbrace{\left[-x^3\varphi(x)\right]_{x=-\infty}^{x=+\infty}}_{=0} + 3\underbrace{\int_{-\infty}^{+\infty} x^2 \varphi(x)\,dx}_{=1}.$$

The second summand is 3 because of equation (3.2). The first summand is 0 because the exponential function in $\varphi$ decays to zero faster than any polynomial grows, in particular faster than the cubic function.

In addition to the mean and the standard deviation, the skewness and the kurtosis are important quantities for characterizing the shape of the distribution function of a random variable. The skewness is a measure of asymmetry. Random variables with $\sqrt{\beta_1} < 0$ are called skewed to the left, i.e., the long tail of their pdf goes off to the left, whereas random variables with $\sqrt{\beta_1} > 0$ are called skewed to the right, that is, the long tail of the pdf is to the right. The kurtosis is an indicator of tail thickness. If $\beta_2 > 3$ for a random variable, the pdf has ”thicker” tails than normal tails, which is called heavy tailed (or long tailed). Else, for $\beta_2 < 3$ the pdf of the associated random variable is called light tailed (or short tailed), since it has ”thinner” tails than those of the normal distribution. See Falk et al. (2002) for more details. Figure 3.1 visualizes the different types of shape for nonnormal probability density functions compared with the shape of the standard normal distribution.

In order to investigate the null hypothesis for a given random sample $y_1,\dots,y_n$ of $n$ observations, we have to complement the definitions of the standardized moments, the skewness, and the kurtosis with the definition of their empirical counterparts.

3.5 Definition. Let $y_1,\dots,y_n$ be a sample of iid random observations coming from a random variable $Y$ with pdf $p_Y(y)$. For an integer $k > 1$, the $k$-th standardized sample moment of $Y$ is defined as

$$g_k = \frac{m_k}{m_2^{k/2}},$$

where

$$m_k = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^k$$

is the $k$-th central sample moment of $Y$.

3.6 Remark. Like the $k$-th standardized moment in equation (3.1), the $k$-th standardized sample moment is scale and translation invariant for integers $k > 1$. The proof of this fact is simple calculus, which is why we leave the details to the reader.

As already mentioned above, the third and the fourth standardized moments play a crucial role in the theory of testing for normality. Therefore we also introduce a special notation for their empirical counterparts.

3.7 Definition. Let $y_1,\dots,y_n$ be a sample of iid random observations coming from a random variable $Y$. The third standardized sample moment of $y_1,\dots,y_n$ is called sample skewness and is denoted by

$$\sqrt{b_1} = g_3 = \frac{\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^3}{\left(\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2\right)^{3/2}}.$$

The fourth standardized sample moment of $y_1,\dots,y_n$ is called sample kurtosis and is denoted by

$$b_2 = g_4 = \frac{\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^4}{\left(\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2\right)^{2}}.$$
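The two statistics are one-liners once the central sample moments are available; a minimal sketch with a hypothetical function name:

```python
import numpy as np

def sample_skew_kurt(y):
    """Sample skewness sqrt(b1) and sample kurtosis b2 of definition 3.7,
    built from the central sample moments m_k of definition 3.5."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()
    m2, m3, m4 = np.mean(d**2), np.mean(d**3), np.mean(d**4)
    return m3 / m2**1.5, m4 / m2**2

rng = np.random.default_rng(0)
print(sample_skew_kurt(rng.standard_normal(10_000)))   # roughly (0, 3)
```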

3.8 Remark. The sample skewness $\sqrt{b_1}$ and the sample kurtosis $b_2$ are consistent estimators of the theoretical skewness $\sqrt{\beta_1}$ and the kurtosis $\beta_2$, respectively, i.e., they converge in probability to their corresponding theoretical values:

$$\sqrt{b_1} \xrightarrow{P} \sqrt{\beta_1} \quad \text{and} \quad b_2 \xrightarrow{P} \beta_2.$$

For a definition of convergence in probability, see definition A.8 in the appendix. For a proof of this assertion we refer to Lehmann (1999, p. 51).

The notation $\sqrt{b_1}$ and $b_2$ is, as for the theoretical skewness and kurtosis, the classical notation. Under the null hypothesis that the random observations $y_1,\dots,y_n$ come from a normal population, following the results of lemma 3.4, we expect the value of $\sqrt{b_1}$ to be approximately near 0 and the value of $b_2$ to be approximately near 3. Deviations of $\sqrt{b_1}$ and $b_2$ from their expected values under the hypothesis of normality are of course an indication that the observations $y_i$, $i = 1,\dots,n$ do not come from a normally distributed population. This is why, in order to judge departures from normality with moment tests, the empirical counterparts of the standardized third and fourth moments have to be considered.

3.2 Shape Tests

3.2.1 The Skewness Test

As before, consider a random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Further, let $\sqrt{\beta_1}$ denote the skewness of the probability density function of $Y$ as in definition 3.3. The skewness test investigates the following null hypothesis:

$$H_0: \sqrt{\beta_1} = 0 \quad \text{vs.} \quad H_1: \sqrt{\beta_1} \ne 0. \qquad (3.3)$$

Note that not rejecting this null hypothesis is not equivalent to not rejecting the null hypothesis that the sample is normally distributed. The reason is that there are distributions that are symmetric, i.e., the skewness of their pdf is zero, but have higher or lower kurtosis than the normal distribution. For example, for the t-distribution with 5 degrees of freedom, $t_5$, we have

β2 = 9 (see subsection 4.2.2 for a definition of the t distribution). Hence, the skewness test can only protect against asymmetry in the data but not against heavy and light tails. Two other examples of symmetric distributions with a kurtosis β2 different from 3 are presented in figure 3.2.


Figure 3.2: Distributions with a skewness $\sqrt{\beta_1} = 0$ that are not normally distributed, together with the pdf of the standard normal distribution. The kurtosis of the Cauchy distribution ($= t_1$ distribution) is $\infty$ and the kurtosis of the $U(-2,2)$ distribution is 1.8.

Carrying forward the arguments of the preceding subsection, the reader may expect intuitively that for examining a random sample for normality with the sample skewness, one just has to look at the value of $\sqrt{b_1}$. Nevertheless, we start this subsection with the formal definition of the test.

3.9 Definition. Let $y_1,\dots,y_n$ be a sample of iid random observations coming from a random variable $Y$. The test statistic of the skewness test is denoted by

$$\sqrt{b_1} = \frac{m_3}{m_2^{3/2}} = \frac{\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^3}{\left(\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2\right)^{3/2}}.$$

The test statistic $\sqrt{b_1}$ is mostly used as a two-sided statistic, since nonnormality can arise from both left and right skewed density functions. Because under the null hypothesis $\sqrt{b_1} \approx 0$, we need a critical value $c_{n,\alpha/2}$ for a given significance level $\alpha \in (0,1)$, with the consequence of rejecting $H_0$ in (3.3) if $|\sqrt{b_1}| \ge c_{n,\alpha/2}$. As we shall see later, the distribution of $\sqrt{b_1}$ is symmetric, and therefore taking the absolute value of $\sqrt{b_1}$ and $\alpha/2$ as a parameter for the determination of the critical value $c_{n,\alpha/2}$ is justifiable. There is also the possibility of using $\sqrt{b_1}$ as a one-sided statistic for testing against either left or right skewed data. Note that in these two cases the null hypothesis in (3.3) has to be modified appropriately. For testing against left skewed data at significance level $\alpha$, $H_0$ is rejected if $\sqrt{b_1} \le -c_{n,\alpha}$. Otherwise, for testing against right skewness in the data, $H_0$ is rejected if $\sqrt{b_1} \ge c_{n,\alpha}$. Based on large-sample considerations, it is possible to determine the asymptotic distribution of $\sqrt{b_1}$:

3.10 Lemma. Let $\sqrt{b_1}$ be the test statistic of the skewness test from definition 3.9 based on a sample $y_1,\dots,y_n$ of iid random observations coming from a normally distributed random variable $Y$ with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$. Then, $\sqrt{n}\sqrt{b_1}$ is asymptotically normally distributed with mean 0 and variance 6, i.e.,

$$\sqrt{n}\sqrt{b_1} \xrightarrow{D} N(0, 6) \quad \text{as } n \to \infty. \qquad (3.4)$$

See definition A.7 in the appendix for a definition of convergence in distribution.

Proof: Recall that the skewness is scale and translation invariant (see remark 3.6).

Therefore, it is enough to show the assertion for a sample $x_1,\dots,x_n$ of iid random observations coming from a standard normally distributed random variable $X$. To this end, consider the function $g: \mathbb{R}^3 \to \mathbb{R}$ with

$$g(a, b, c) = \frac{c - 3ab + 2a^3}{(b - a^2)^{3/2}}.$$

Next, we define the following quantities:

$$\tilde{m}_2 := \frac{1}{n}\sum_{i=1}^n x_i^2, \qquad \tilde{m}_3 := \frac{1}{n}\sum_{i=1}^n x_i^3. \qquad (3.5)$$

Together with the sample mean $\bar{x} := \tilde{m}_1$ we then get

$$g(\tilde{m}_1, \tilde{m}_2, \tilde{m}_3) = \frac{\tilde{m}_3 - 3\tilde{m}_1\tilde{m}_2 + 2\tilde{m}_1^3}{(\tilde{m}_2 - \tilde{m}_1^2)^{3/2}} = \frac{n^{-1}\sum_{i=1}^n \left(x_i^3 - 3\tilde{m}_1 x_i^2 + 3\tilde{m}_1^2 x_i - \tilde{m}_1^3\right)}{\left(n^{-1}\sum_{i=1}^n \left(x_i^2 - 2\tilde{m}_1 x_i + \tilde{m}_1^2\right)\right)^{3/2}} = \frac{n^{-1}\sum_{i=1}^n (x_i - \bar{x})^3}{\left(n^{-1}\sum_{i=1}^n (x_i - \bar{x})^2\right)^{3/2}} = \sqrt{b_1},$$

that is, the function $g$ provides exactly the sample skewness at the point $(\tilde{m}_1, \tilde{m}_2, \tilde{m}_3)'$. If we collect the random variables $x_i, x_i^2, x_i^3$ into a three-dimensional random vector $(x_i, x_i^2, x_i^3)'$, $i = 1,\dots,n$, we have for the mean vector $E(x_i, x_i^2, x_i^3)' = (0, 1, 0)'$, since the $x_i$ are iid standard normally distributed. We can now apply the multivariate central limit theorem (cf. theorem A.9 in the appendix) and obtain that

$$\sqrt{n}\,(\tilde{m}_1,\ \tilde{m}_2 - 1,\ \tilde{m}_3)' \xrightarrow{D} N(0, \Sigma),$$

where

$$\Sigma = E\begin{pmatrix} x_i \\ x_i^2 - 1 \\ x_i^3 \end{pmatrix}\left(x_i,\ x_i^2 - 1,\ x_i^3\right) = E\begin{pmatrix} x_i^2 & x_i^3 - x_i & x_i^4 \\ x_i^3 - x_i & x_i^4 - 2x_i^2 + 1 & x_i^5 - x_i^3 \\ x_i^4 & x_i^5 - x_i^3 & x_i^6 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 3 \\ 0 & 2 & 0 \\ 3 & 0 & 15 \end{pmatrix},$$

since $E(x_i) = E(x_i^3) = E(x_i^5) = 0$ and $E(x_i^2) = 1$, $E(x_i^4) = 3$, $E(x_i^6) = 15$ (see Stuart and Ord (1987, p. 78)). Moreover, the value $g(0, 1, 0) = 0$ is exactly the skewness of the standard normally distributed variable $X$. In order to use the delta method, the gradient $\nabla g$ (vector of the first partial derivatives) of $g$ has to be determined. The partial derivatives of $g$ are:

$$\frac{\partial g(a,b,c)}{\partial a} = \frac{(-3b + 6a^2)(b - a^2)^{3/2} - \frac{3}{2}(c - 3ab + 2a^3)(b - a^2)^{1/2}(-2a)}{(b - a^2)^3} = \frac{3(ac - b^2)}{(b - a^2)^{5/2}},$$

$$\frac{\partial g(a,b,c)}{\partial b} = \frac{-3a(b - a^2)^{3/2} - \frac{3}{2}(c - 3ab + 2a^3)(b - a^2)^{1/2}}{(b - a^2)^3} = \frac{\frac{3}{2}(ab - c)}{(b - a^2)^{5/2}},$$

$$\frac{\partial g(a,b,c)}{\partial c} = (b - a^2)^{-3/2}.$$

Consequently, the gradient is given by

$$\nabla g(a,b,c) = (b - a^2)^{-5/2}\begin{pmatrix} 3(ac - b^2) \\ \frac{3}{2}(ab - c) \\ b - a^2 \end{pmatrix}.$$

Thus, the gradient $\nabla g$ at the point $(0, 1, 0)'$ is exactly $(-3, 0, 1)'$. Because $g$ is obviously differentiable at the point $(0, 1, 0)'$, all assumptions of theorem A.10 in the appendix are fulfilled. Hence, it follows that

$$\sqrt{n}\left(g(\tilde{m}_1, \tilde{m}_2, \tilde{m}_3) - g(0, 1, 0)\right) = \sqrt{n}\sqrt{b_1} \xrightarrow{D} N\left(0,\ (\nabla g(0,1,0))'\,\Sigma\,\nabla g(0,1,0)\right).$$

For the variance of this normal distribution we find

$$(\nabla g(0,1,0))'\,\Sigma\,\nabla g(0,1,0) = (-3, 0, 1)\begin{pmatrix} 1 & 0 & 3 \\ 0 & 2 & 0 \\ 3 & 0 & 15 \end{pmatrix}\begin{pmatrix} -3 \\ 0 \\ 1 \end{pmatrix} = (-3, 0, 1)\begin{pmatrix} 0 \\ 0 \\ 6 \end{pmatrix} = 6.$$

This finally yields

$$\sqrt{n}\sqrt{b_1} \xrightarrow{D} N(0, 6) \quad \text{as } n \to \infty.$$

The task of giving a critical value $c_{n,\alpha}$ for a significance level $\alpha \in (0,1)$ now becomes apparently simple. According to lemma 3.10,

$$\sqrt{\frac{n}{6}}\sqrt{b_1} \xrightarrow{D} N(0, 1),$$

hence, after having calculated the value of $\sqrt{n/6}\sqrt{b_1}$, the quantile function of the standard normal distribution can be employed to get critical values for testing $H_0$ in (3.3). Therefore, the critical value $c_{n,\alpha/2}$ for the two-sided test at significance level $\alpha$ is $\Phi^{-1}(1 - \alpha/2)$. For the one-sided test, the critical value is $\Phi^{-1}(\alpha)$ when testing against left skewness and $\Phi^{-1}(1 - \alpha)$ when testing against right skewness; by symmetry, both have the same absolute value. Note that these relations only hold in the asymptotic case. For finite sample sizes, these statements can only be used if the error due to the asymptotic approximation is adequately small, so that the accuracy of the results is appropriately high. We will see in the following that the error of the asymptotic statements is remarkably high for small sample sizes, which is an essential disadvantage of the skewness test.
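For completeness, here is what the purely asymptotic version of the test looks like in code; as the following discussion makes clear, it should only be trusted for large $n$. The function name is hypothetical.

```python
import numpy as np
from scipy.stats import norm

def skewness_test_asymptotic(y, alpha=0.05):
    """Two-sided skewness test based on (3.4): compare sqrt(n/6)*sqrt(b1)
    with standard normal quantiles."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    sqrt_b1 = np.mean(d**3) / np.mean(d**2)**1.5
    z = np.sqrt(n / 6) * sqrt_b1
    reject = abs(z) >= norm.ppf(1 - alpha / 2)
    return z, reject
```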

In his works, Pearson (1930a,b) showed that for finite sample sizes $n$ one is able to give exact expressions for the first four moments of the sample skewness $\sqrt{b_1}$. The odd moments have the value zero, i.e., we have for the expectation $E(\sqrt{b_1}) = 0$ and for the skewness $\sqrt{\beta_1}(\sqrt{b_1}) = 0$.

Hence, the probability density function of $\sqrt{b_1}$ is symmetric. For the variance $\mathrm{Var}(\sqrt{b_1})$, Pearson obtained

$$\mathrm{Var}(\sqrt{b_1}) = \frac{6(n-2)}{(n+1)(n+3)}. \qquad (3.6)$$

The fourth moment of the distribution of $\sqrt{b_1}$ is given by

$$\beta_2(\sqrt{b_1}) = 3 + \frac{36(n-7)(n^2 + 2n - 5)}{(n-2)(n+5)(n+7)(n+9)}.$$

A serious drawback of the asymptotic behaviour of $\sqrt{b_1}$ is the fact that the convergence is very slow, which was actually one of the main motivations for Pearson to find exact expressions for the moments of the distribution of $\sqrt{b_1}$. Table 3.1 underlines the slow convergence of the variance $\mathrm{Var}(\sqrt{b_1})$ to its asymptotic value $6n^{-1}$. For small samples like $n = 25$ or $n = 50$ it is obvious that using the normal approximation in equation (3.4) can lead to incorrect results, because the difference $\Delta$ between the finite sample and the asymptotic variance is quite big. Even for $n = 100$ the error of three units in the third decimal place does not seem negligible. However, for very large samples $\Delta$ becomes very small, a result which is very unsatisfying for practitioners, who mostly do not have such large sample sizes.

Table 3.1: Exact and asymptotic variance of the distribution of $\sqrt{b_1}$ and their difference for several sample sizes.

    n     Var(√b₁)    6n⁻¹      Δ := 6n⁻¹ − Var(√b₁)
    25    0.18956     0.24      0.05044
    50    0.10655     0.12      0.01345
   100    0.05652     0.06      0.00348
   150    0.03844     0.04      0.00156
   250    0.02343     0.024     0.00057
   500    0.01186     0.012     0.00014
  1000    0.00596     0.006     0.00004
  2500    0.00239     0.0024    0.00001
  5000    0.001199    0.0012    0.000001

To overcome this problem, D'Agostino (1970) found a transformation to approximate the distribution of $\sqrt{b_1}$ under the null hypothesis. Consider the following random variable:

$$X(\sqrt{b_1}) = \delta \log\left(\frac{Y}{\nu} + \sqrt{\left(\frac{Y}{\nu}\right)^2 + 1}\right), \qquad (3.7)$$

where

$$Y = \sqrt{b_1}\left(\frac{(n+1)(n+3)}{6(n-2)}\right)^{1/2},$$
$$\tau = \frac{3(n^2 + 27n - 70)(n+1)(n+3)}{(n-2)(n+5)(n+7)(n+9)},$$
$$\omega^2 = \sqrt{2(\tau - 1)} - 1,$$
$$\delta = \frac{1}{\sqrt{\log(\omega)}},$$
$$\nu = \sqrt{\frac{2}{\omega^2 - 1}}.$$

According to D'Agostino, $X(\sqrt{b_1})$ is approximately standard normally distributed. Thus, critical values can be obtained via the quantile function $\Phi^{-1}$ as described above. The choice of $X(\sqrt{b_1})$ in (3.7) seems reasonable since under the null hypothesis we can write informally

$$\sqrt{b_1} \approx 0 \iff Y \approx 0 \iff X(\sqrt{b_1}) \approx \delta\log(1) = 0.$$

The accuracy of this transformation was investigated by D'Agostino (1970), who found that the approximation (3.7) works satisfyingly for sample sizes $n \ge 8$. For samples with $n \le 7$, D'Agostino and Tietjen (1973) presented critical values for the distribution of $\sqrt{b_1}$ under $H_0$ based on Monte Carlo simulations. The number of simulated values was at least 5,000, which is not very large considering today's computing power. Additionally, they also presented another approximation for the distribution of $\sqrt{b_1}$. In contrast to the approximation to the standard normal distribution in equation (3.7), this transformation is approximately t-distributed. For details we refer to the original work. For larger sample sizes, like $n \ge 150$, the normal approximation

$$\sqrt{b_1}\left(\frac{(n+1)(n+3)}{6(n-2)}\right)^{1/2}$$

based on the finite sample variance in (3.6) is a valid choice for a test statistic, as claimed by D'Agostino (1986a). This suggestion is underlined by a look at table 3.1, where it can be seen that for $n = 150$ the difference between the asymptotic and the finite sample variance is only one unit in the third decimal place. This error is without essential impact in most practical situations.
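A direct transcription of (3.7) and its auxiliary quantities is given below, under the assumption that $n \ge 8$ so that the approximation is reliable; the function name is ours.

```python
import numpy as np

def dagostino_skewness_z(y):
    """D'Agostino's (1970) normalizing transformation X(sqrt(b1)) of (3.7);
    approximately N(0,1) under H0 for n >= 8."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    sqrt_b1 = np.mean(d**3) / np.mean(d**2)**1.5
    Y = sqrt_b1 * np.sqrt((n + 1)*(n + 3) / (6*(n - 2)))
    tau = (3*(n**2 + 27*n - 70)*(n + 1)*(n + 3)
           / ((n - 2)*(n + 5)*(n + 7)*(n + 9)))
    w2 = -1 + np.sqrt(2*(tau - 1))              # omega^2
    delta = 1 / np.sqrt(np.log(np.sqrt(w2)))    # 1/sqrt(log(omega))
    nu = np.sqrt(2 / (w2 - 1))
    return delta * np.log(Y/nu + np.sqrt((Y/nu)**2 + 1))
```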

To give a short summary at the end of this subsection: the results of the skewness test should be regarded only with caution. If one already knows, for example from previous studies, that only asymmetry can affect the shape of the data, the skewness test is an appropriate choice for testing the data for normality. In this case, the next question is whether to take the one-sided or the two-sided test. If it is possible to anticipate the direction of the skewness, the one-sided test statistic may be the better choice (see D'Agostino, 1986a). Otherwise, the two-sided test statistic is the appropriate choice.

In cases where there is no prior information, one should rather avoid the use of the skewness test and trust the results of an omnibus test that considers both skewness and kurtosis deviations (cf. section 3.3 and section 3.4 for more details).

3.2.2 The Kurtosis Test

Analogous to the skewness test in the preceding subsection, let $Y$ denote a random variable with unknown mean $\mu \in \mathbb{R}$ and unknown variance $\sigma^2 > 0$, and let $\beta_2$ be the kurtosis of the pdf of $Y$ as given in definition 3.3. When using the kurtosis to decide whether the population is normally distributed or not, the null hypothesis is

$$H_0: \beta_2 = 3 \quad \text{vs.} \quad H_1: \beta_2 \ne 3.$$

Like for the skewness test, there are also for the kurtosis test examples of non-normal distributions whose probability density function nevertheless has a kurtosis of $\beta_2 = 3$ (see for example Thode (2002, p. 43)). Consequently, one has to keep in mind that not rejecting $H_0$ of the kurtosis test does not necessarily mean that the sample comes from a normal distribution. However, such cases of non-normal distributions with

$\beta_2 = 3$ are much rarer, which does not change the fact that looking only at the kurtosis test when testing for normality can lead to erroneous conclusions about the data.

The test statistic for the kurtosis test is as simple as the one for the skewness test.

3.11 Definition. Let y1,...,yn be a sample of iid random observations coming from a random variable Y . The test statistic of the kurtosis test is denoted by

$$b_2 = \frac{m_4}{m_2^2} = \frac{\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^4}{\left(\frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2\right)^2}.$$

An examination of the asymptotic behaviour of $b_2$ is also possible.

3.12 Lemma. Let $b_2$ be the test statistic of the kurtosis test from definition 3.11 based on a sample $y_1,\dots,y_n$ of iid random observations coming from a standard normally distributed random variable $Y$. Then, $\sqrt{n}(b_2 - 3)$ is asymptotically normally distributed with mean 0 and variance 24, i.e.,

$$\sqrt{n}(b_2 - 3) \xrightarrow{D} N(0, 24) \quad \text{as } n \to \infty.$$

Proof: The proof is similar to the proof of lemma 3.10. This time, consider the function $h: \mathbb{R}^4 \to \mathbb{R}$ with

$$h(a, b, c, d) = \frac{d - 4ac + 6a^2 b - 3a^4}{(b - a^2)^2}.$$

Besides the quantities $\tilde{m}_i$, $i = 1, 2, 3$ given in equation (3.5), we additionally define

$$\tilde{m}_4 = \frac{1}{n}\sum_{i=1}^n x_i^4.$$

After a little algebra it follows that

$$h(\tilde{m}_1, \tilde{m}_2, \tilde{m}_3, \tilde{m}_4) = \frac{\tilde{m}_4 - 4\tilde{m}_1\tilde{m}_3 + 6\tilde{m}_1^2\tilde{m}_2 - 3\tilde{m}_1^4}{(\tilde{m}_2 - \tilde{m}_1^2)^2} = b_2.$$

Using the same arguments as in the proof above, we have

$$\sqrt{n}\,(\tilde{m}_1,\ \tilde{m}_2 - 1,\ \tilde{m}_3,\ \tilde{m}_4 - 3)' \xrightarrow{D} N(0, \Sigma),$$

where

xi x2 1  Σ= E i − (x ,x2 1,x3,x4 3) 3 i i − i i −  xi    4   x 3   i −    2  3 4 5 3 xi xi xi xi xi xi 3 4 − 2 5 3 6 4 − 2 xi xi xi 2xi + 1 xi xi xi xi 3xi + 3 = E −4 −5 3 −6 − 7 − 3  xi xi xi xi xi 3xi   5 6 4 − 2 7 3 8 − 4   x x x 3x + 3 x 3x x 6x + 9   i i − i − i i − i i − i  10 3 0  0 2 0 12 = , 3 0 15 0    0 12 0 96   7 6  8 since E(xi ) = 0, E(xi ) = 15 and (xi ) = 105 (see Stuart and Ord (1987, p. 78)). For the determination of the gradient h we omit the calculus and give only the final results which is ∇ of the form 4(3ab2 bc+ad 3a2c) −b a2 − 2a(3ab 3a2 +2d−8ac+12a2 b 6a4) 2 2  − − 2 −  h(a, b, c, d) = (b a )− b a ∇ − −  4a   −   1    Hence we have h(0, 1, 0, 3) = (0, 0, 0, 1) and we can again use theorem A.10 in the appendix. ∇ ′ Before, check that h(0, 1, 0, 3) =3 and 10 3 0 0 0 2 0 12 1 ( h(0, 1, 0, 3)) ′Σ h(0, 1, 0, 3) = (0, 0, 0, 1) = 24 ∇ ∇ 3 0 15 0  0     0 12 0 96 3         CHAPTER 3. MOMENT TESTS FOR NORMALITY 55

Then the following holds:

$$\sqrt{n}\left(h(\tilde{m}_1, \tilde{m}_2, \tilde{m}_3, \tilde{m}_4) - h(0, 1, 0, 3)\right) = \sqrt{n}(b_2 - 3) \xrightarrow{D} N\left(0,\ (\nabla h(0,1,0,3))'\,\Sigma\,\nabla h(0,1,0,3)\right) = N(0, 24),$$

which is exactly the assertion.

Besides the first four moments of the distribution of $\sqrt{b_1}$, Pearson (1930a,b) also gave exact values for the first four moments of the distribution of $b_2$. For the expectation $E(b_2)$ and the variance $\mathrm{Var}(b_2)$ Pearson showed that

$$E(b_2) = \frac{3(n-1)}{n+1} \qquad (3.8)$$

and

$$\mathrm{Var}(b_2) = \frac{24n(n-2)(n-3)}{(n+1)^2(n+3)(n+5)}. \qquad (3.9)$$

For the third moment he obtained

$$\sqrt{\beta_1}(b_2) = \left(\frac{216}{n}\cdot\frac{(n+3)(n+5)}{(n-3)(n-2)}\right)^{1/2}\frac{n^2 - 5n + 2}{(n+7)(n+9)}.$$

Thus, the pdf of $b_2$ is not symmetric, unlike the pdf of $\sqrt{b_1}$. This complicates the use of critical values for a two-sided test, together with the more important fact that the convergence of $b_2$ to its asymptotic distribution is very slow. The slow convergence becomes clear by looking at table 3.2, where the difference between the exact and the asymptotic mean and variance of the distribution of $b_2$ is presented. One can see that even for sample sizes of $n = 500$ the difference between exact and asymptotic mean is one unit in the second decimal place, which is relatively high. For $n = 1{,}000$, the difference shrinks to five units in the third decimal place for the mean and to three units in the fourth decimal place for the variance of $b_2$. Hence, using the normal approximation

$$x = \frac{b_2 - E(b_2)}{\sqrt{\mathrm{Var}(b_2)}} = \left(b_2 - \frac{3(n-1)}{n+1}\right)\left(\frac{24n(n-2)(n-3)}{(n+1)^2(n+3)(n+5)}\right)^{-1/2} \qquad (3.10)$$

yields an acceptably small error only for $n \ge 1{,}000$. However, samples of that size practically do not exist. Hence, for practitioners, the normal approximation in (3.10) is not an alternative and should not be used (see D'Agostino (1986b, p. 389)).

Similar to the skewness test in the preceding subsection, or to the distribution of the $W$ statistic described in subsection 2.2.2, a convenient way to get percentage points for the kurtosis test for sample sizes smaller than 1,000 is to perform Monte Carlo studies. In this way one is able to obtain empirical percentage points which are adequately exact for a sufficiently large number $m$ of repetitions. D'Agostino and Tietjen (1971) presented in their simulation study percentage points for sample sizes $7 \le n \le 50$ based on a number $m$ of simulations of at least 20,000.

Table 3.2: Exact and asymptotic mean and variance of the distribution of $b_2$ and their differences for several sample sizes.

    n     E(b₂)     3 − E(b₂)   Var(b₂)   24n⁻¹    24n⁻¹ − Var(b₂)
    25    2.76923   0.23077     0.53466   0.96     0.42534
    50    2.88235   0.11765     0.35706   0.48     0.12294
   100    2.94059   0.05941     0.20679   0.24     0.03321
   150    2.96026   0.03974     0.14485   0.16     0.01515
   250    2.9761    0.0239      0.09043   0.096    0.00557
   500    2.98802   0.01198     0.04658   0.048    0.00142
  1000    2.99401   0.00599     0.02364   0.024    0.00036
  2500    2.9976    0.0024      0.00954   0.0096   0.00006
  5000    2.9988    0.0012      0.00479   0.0048   0.00001

Later, D'Agostino and Pearson (1973) extended the results to sample sizes up to $n = 200$ based on a value of $m$ of at least 10,000.

Another possibility to overcome the disadvantage of the slow convergence of the normal approximation in (3.10) consists in finding an appropriate transformation of $b_2$ to a known distribution function, mostly the standard normal distribution. The transformation most cited in the literature is probably the one given in Anscombe and Glynn (1983). Based on the normal approximation $x$ given in equation (3.10), they state that for $n \ge 20$,

$$X(b_2) = \frac{1 - \dfrac{2}{9A} - \left(\dfrac{1 - 2/A}{1 + x\sqrt{2/(A-4)}}\right)^{1/3}}{\sqrt{2/(9A)}}, \qquad (3.11)$$

where

$$A = 6 + \frac{8}{\sqrt{\beta_1}(b_2)}\left(\frac{2}{\sqrt{\beta_1}(b_2)} + \sqrt{1 + \frac{4}{\beta_1(b_2)}}\right),$$

is approximately a standard normal variable with mean zero and variance unity. As explained in D'Agostino (1986b), the approximation in (3.11) can be used to test both one-sided and two-sided alternatives. For the one-sided test against heavy tailed distributions, the alternative is $H_1: \beta_2 > 3$. In this case, one rejects $H_0$ at significance level $\alpha$ if $X(b_2)$ in equation (3.11) exceeds the value $\Phi^{-1}(1 - \alpha)$. On the contrary, for the one-sided test against light tailed distributions, we have $H_1: \beta_2 < 3$. Here, $H_0$ is rejected at significance level $\alpha$ if $X(b_2) < -\Phi^{-1}(1 - \alpha)$. Note that in the case of a one-sided alternative, the null hypothesis has to be adapted corresponding to the choice of $H_1$.
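Putting (3.8)–(3.11) together gives a compact implementation; anscombe_glynn_z is our hypothetical name, and np.cbrt is used so that a negative base does not break the cube root.

```python
import numpy as np

def anscombe_glynn_z(y):
    """Anscombe-Glynn (1983) transformation X(b2) of equation (3.11);
    approximately N(0,1) under H0 for n >= 20."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    d = y - y.mean()
    b2 = np.mean(d**4) / np.mean(d**2)**2
    e = 3*(n - 1) / (n + 1)                                    # E(b2), (3.8)
    v = 24*n*(n - 2)*(n - 3) / ((n + 1)**2 * (n + 3)*(n + 5))  # Var(b2), (3.9)
    x = (b2 - e) / np.sqrt(v)                                  # (3.10)
    # skewness of b2 from Pearson's exact third moment
    sb1 = (np.sqrt(216/n * (n + 3)*(n + 5) / ((n - 3)*(n - 2)))
           * (n**2 - 5*n + 2) / ((n + 7)*(n + 9)))
    A = 6 + 8/sb1 * (2/sb1 + np.sqrt(1 + 4/sb1**2))
    return ((1 - 2/(9*A)
             - np.cbrt((1 - 2/A) / (1 + x*np.sqrt(2/(A - 4)))))
            / np.sqrt(2/(9*A)))
```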

Summing up the results concerning the kurtosis test, it is clear that, even though $b_2$ is asymptotically normally distributed, the normal approximation in equation (3.10) is practically unusable due to the very slow convergence of $\sqrt{n}(b_2 - 3)$ to the $N(0, 24)$ distribution. However, for $7 \le n \le 20$, the way out of this problem consists in using the empirical percentage points resulting from Monte Carlo studies. According to D'Agostino (1986b), for $20 < n \le 200$, both the empirical percentage points and the normal approximation in (3.11) given by Anscombe and Glynn are recommended. For $n > 200$, only the approximation of Anscombe and Glynn is an adequate choice.

3.3 Omnibus Tests Using Skewness and Kurtosis

The disadvantages of performing only the skewness or the kurtosis test have been mentioned several times in the last two subsections. Combining these two moment tests seems a natural step in order to obtain an omnibus test for normality. Two attempts at such an omnibus test, namely the R test and the $K^2$ test, will be presented briefly in this section.

3.3.1 The R Test

The simplest and most intuitive way to perform an omnibus test is of course to perform both the skewness and the kurtosis test simultaneously. Each of the two tests can be conducted in a suitable manner, as described in the two subsections 3.2.1 and 3.2.2 above. Thus, given a sample $y_1,\dots,y_n$ of iid random observations coming from a random variable $Y$, the two test statistics $\sqrt{b_1}$ and $b_2$ have to be calculated. For the R test we combine the two statistics into a two-dimensional vector $(\sqrt{b_1}, b_2)$ and examine the realizations of this vector in the two-dimensional plane. The null hypothesis is

$$H_0: \left(\sqrt{\beta_1}, \beta_2\right) = (0, 3) \quad \text{vs.} \quad H_1: \left(\sqrt{\beta_1}, \beta_2\right) \ne (0, 3).$$

The difference compared to the testing procedures explained above consists in correcting the significance level due to multiple comparisons.

Considering a significance level of $100\alpha'\,\%$, $\alpha' \in (0,1)$, let $\pm c_{\sqrt{b_1}}(\alpha')$ be the upper and lower percentage points of the skewness test, i.e.,

$$P\left(\sqrt{b_1} \le -c_{\sqrt{b_1}}(\alpha')\right) = P\left(\sqrt{b_1} \ge +c_{\sqrt{b_1}}(\alpha')\right) = \frac{\alpha'}{2}. \qquad (3.12)$$

Further, let $\pm c_{b_2}(\alpha')$ be the upper and lower percentage points of the kurtosis test for the same significance level, defined analogously to those of $\sqrt{b_1}$. Consider the rectangle $R$ defined by the four points

\left(-c_{\sqrt{b_1}}(\alpha'), -c_{b_2}(\alpha')\right), \quad \left(+c_{\sqrt{b_1}}(\alpha'), -c_{b_2}(\alpha')\right), \quad \left(+c_{\sqrt{b_1}}(\alpha'), +c_{b_2}(\alpha')\right), \quad \left(-c_{\sqrt{b_1}}(\alpha'), +c_{b_2}(\alpha')\right). \qquad (3.13)

The motivation for the definition of R becomes clear in the next lemma. It shows, under the crucial assumption of the independence of √b1 and b2, that a significance level for an omnibus test can easily be determined.

Figure 3.3: The solid lines define the rectangle R given by (3.13). The dashed lines outline the eight areas outside of R that are summed over in the proof of lemma 3.13.

3.13 Lemma. Assume that √b1 and b2 are stochastically independent. When performing both the skewness and the kurtosis test, the probability α of the point (√b1, b2) falling outside the rectangle R given in (3.13) is

P\left((\sqrt{b_1}, b_2) \notin R\right) = \alpha = 2\alpha' - (\alpha')^2. \qquad (3.14)

Proof: The probability of (√b1, b2) falling outside R is the sum of the following eight probabilities:

\begin{aligned}
P\left((\sqrt{b_1}, b_2) \notin R\right)
&= P\left(\sqrt{b_1} \le -c_{\sqrt{b_1}}(\alpha'),\; b_2 \le -c_{b_2}(\alpha')\right) \\
&\quad + P\left(-c_{\sqrt{b_1}}(\alpha') < \sqrt{b_1} < +c_{\sqrt{b_1}}(\alpha'),\; b_2 \le -c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \ge +c_{\sqrt{b_1}}(\alpha'),\; b_2 \le -c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \ge +c_{\sqrt{b_1}}(\alpha'),\; -c_{b_2}(\alpha') < b_2 < +c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \ge +c_{\sqrt{b_1}}(\alpha'),\; b_2 \ge +c_{b_2}(\alpha')\right) \\
&\quad + P\left(-c_{\sqrt{b_1}}(\alpha') < \sqrt{b_1} < +c_{\sqrt{b_1}}(\alpha'),\; b_2 \ge +c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \le -c_{\sqrt{b_1}}(\alpha'),\; b_2 \ge +c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \le -c_{\sqrt{b_1}}(\alpha'),\; -c_{b_2}(\alpha') < b_2 < +c_{b_2}(\alpha')\right).
\end{aligned}

Figure 3.3 attempts to give a better understanding of the construction of the probability P((√b1, b2) ∉ R). Under the assumption of independence, using equation (3.12) the probability becomes

\begin{aligned}
P\left((\sqrt{b_1}, b_2) \notin R\right)
&= P\left(\sqrt{b_1} \le -c_{\sqrt{b_1}}(\alpha')\right) P\left(b_2 \le -c_{b_2}(\alpha')\right) \\
&\quad + P\left(-c_{\sqrt{b_1}}(\alpha') < \sqrt{b_1} < +c_{\sqrt{b_1}}(\alpha')\right) P\left(b_2 \le -c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \ge +c_{\sqrt{b_1}}(\alpha')\right) P\left(b_2 \le -c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \ge +c_{\sqrt{b_1}}(\alpha')\right) P\left(-c_{b_2}(\alpha') < b_2 < +c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \ge +c_{\sqrt{b_1}}(\alpha')\right) P\left(b_2 \ge +c_{b_2}(\alpha')\right) \\
&\quad + P\left(-c_{\sqrt{b_1}}(\alpha') < \sqrt{b_1} < +c_{\sqrt{b_1}}(\alpha')\right) P\left(b_2 \ge +c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \le -c_{\sqrt{b_1}}(\alpha')\right) P\left(b_2 \ge +c_{b_2}(\alpha')\right) \\
&\quad + P\left(\sqrt{b_1} \le -c_{\sqrt{b_1}}(\alpha')\right) P\left(-c_{b_2}(\alpha') < b_2 < +c_{b_2}(\alpha')\right) \\
&= 4 \left(\frac{\alpha'}{2}\right)^2 + 4 (1 - \alpha') \frac{\alpha'}{2} \\
&= (\alpha')^2 + 2\alpha'(1 - \alpha') \\
&= 2\alpha' - (\alpha')^2.
\end{aligned}

Using the result of the last lemma, we can easily find a multiple significance level for the omnibus test by solving equation (3.14) for α′. Then we find

\alpha' = 1 - (1 - \alpha)^{1/2}.

The other solution of the quadratic equation, α′ = 1 + (1 − α)^{1/2} > 1, can of course be excluded. We will now summarize the obtained results.

3.14 Definition. Let y1,...,yn be a sample of iid random observations coming from a random variable Y . The test statistic of the R test is given by

r = \left(\sqrt{b_1}, b_2\right).

If r lies outside the rectangle R given in equation (3.13), the null hypothesis of a normal sample has to be rejected.

3.15 Remark. Note that the significance level α′ = 1 − (1 − α)^{1/2} is actually the Šidák correction for two tests on one sample (see for example Westfall (1999)).
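As a small numerical illustration of this correction, the following R lines (a sketch, not part of the original simulations) recover the overall level α of lemma 3.13 from α′:

alpha <- 0.05
alpha.prime <- 1 - sqrt(1 - alpha)  # Sidak-corrected single-test level, approx. 0.0253
2 * alpha.prime - alpha.prime^2     # recovers alpha = 0.05, cf. equation (3.14)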

In the proof of lemma 3.13, we make use of the assumption that √b1 and b2 are statistically independent of each other. However, for small sample sizes this is not the case. Figure 3.4 attempts to show the correlation of √b1 and b2 for small, moderate and relatively large sample sizes. The single plots were all drawn based on 10,000 randomly generated standard normal samples of the corresponding sample size n. Each random sample is independent of all the other samples. For a sample size of n = 5, there is definitely a very high correlation between the two quantities. The larger the sample size becomes, the less structure remains in the scatterplot. For n = 500 there is almost no structure left to recognize in the plot, and at the latest for n = 5,000 the structure has vanished completely. This is in accord with the slow asymptotic behaviour, especially that of the distribution of b2.

Consequently, the assumption of the independence of √b1 and b2 is violated, so that the results of lemma 3.13 cannot actually be used directly. Pearson et al. (1977), who were the first to present the R test in the literature, claimed that because of the strong correlation of √b1 and b2 for small and moderate sample sizes, the R test is a conservative test; that is, the test rejects normal samples slightly less often than desired. They base this statement on the results of an empirical simulation for several sample sizes. Although Pearson et al. (1977) present some corrected significance levels that are less conservative, the use of this test is not recommended, since it is still not as powerful as, for example, the K2 test in the next subsection (cf. D'Agostino (1986b)).

3.3.2 The K2 Test

Another approach for combining the skewness and the kurtosis test into an omnibus test is based on the following idea. Since √b1 and b2 are both asymptotically normal, an obvious choice would be to take the sum of the two standardized test statistics

X^2 = \left( \frac{\sqrt{b_1} - E(\sqrt{b_1})}{\sqrt{\operatorname{Var}(\sqrt{b_1})}} \right)^2 + \left( \frac{b_2 - E(b_2)}{\sqrt{\operatorname{Var}(b_2)}} \right)^2.

Under H0, assuming that the two summands were independent, X^2 would be chi-squared distributed with two degrees of freedom, denoted by χ²₂ (see Falk et al. (2002, p. 54) for more details on the chi-square distribution). Recall that we already presented transformations of √b1 and b2 into approximately standard normal random variables in the subsections 3.2.1 and 3.2.2. This leads us to the K2 test.

3.16 Definition. Let y1,...,yn be a sample of iid random observations coming from a random variable Y . The test statistic denoted by

K^2 = X^2(\sqrt{b_1}) + X^2(b_2),

where X(√b1) and X(b2) are given in the equations (3.7) and (3.11), respectively, is called the K2 statistic; the corresponding test will be called the K2 test.
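A minimal R sketch of this construction is given below. It assumes an implementation X.skew() of the skewness transformation (3.7), which is not reproduced here, and reuses the X.kurt() sketch of (3.11) from above; both function names are ours.

# Sketch of the K^2 test; X.skew() implementing (3.7) is assumed to exist.
K2.test <- function(y, alpha = 0.05) {
  K2 <- X.skew(y)^2 + X.kurt(y)^2
  # under H0, and assuming independence of the two summands, K2 is
  # approximately chi-squared distributed with 2 degrees of freedom
  list(statistic = K2, reject = K2 > qchisq(1 - alpha, df = 2))
}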

As already mentioned before in this subsection, the assumption of independence cannot be upheld for small and moderate sample sizes. Thus, the fact that K² is χ²₂-distributed under the null hypothesis does not hold for the most common sample sizes. To overcome this drawback,

Figure 3.4: Scatterplots of √b1 and b2 for 10,000 randomly generated normal samples for each sample size, respectively. The sample sizes are (a) n = 5, (b) n = 10, (c) n = 50, (d) n = 500, (e) n = 1,000 and (f) n = 5,000.

Bowman and Shenton (1975) constructed contour plots of some critical values for the joint distribution of √b1 and b2 based on empirical simulation. They did this for sample sizes of 20 ≤ n ≤ 1,000. For more details, we refer to the original work. The approximation of Bowman and Shenton seems to be quite accurate, so that the use of the K2 statistic is to be preferred to the use of the R test, as recommended by D'Agostino (1986b).

3.4 The Jarque-Bera Test

In this section another omnibus test based on the third and fourth sample moments is presented, the so-called Jarque-Bera test. Since this test is probably the most popular and most widely used moment test for normality, and not only in economics (cf. Gel and Gastwirth (2008), Urzua (1996), Thadewald and Büning (2007), Jahanshahi et al. (2008)), we dedicate a section of its own to it. Besides the original testing procedure, a modification of the test is briefly introduced. For each newly introduced test, we also present empirical significance points based on sampling studies, for use in the later power study.

3.4.1 The Original Procedure

The underlying idea of this omnibus test is once again very simple. Like the K2 test in subsection 3.3.2, the Jarque-Bera test picks up the idea of taking the sum of the two test statistics of the skewness and the kurtosis test. In contrast to the K2 test, however, the original values of √b1 and b2 enter the test statistic instead of transformations of them. We start with the formal definition of the test.

3.17 Definition. Let y1, ..., yn be a sample of iid random observations coming from a random variable Y. Further let √b1 and b2 be defined as in definition 3.3. The test statistic denoted by

JB = n \left( \frac{(\sqrt{b_1})^2}{6} + \frac{(b_2 - 3)^2}{24} \right) = \frac{n}{6} \left( (\sqrt{b_1})^2 + \frac{(b_2 - 3)^2}{4} \right) \qquad (3.15)

is called the Jarque-Bera statistic or just JB statistic; the corresponding test for normality is called the Jarque-Bera test or shortly the JB test.
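For illustration, a minimal R sketch of the JB statistic, with √b1 and b2 computed from the biased sample moments as in definition 3.3:

# Sketch of the JB statistic from equation (3.15)
jb.stat <- function(y) {
  n  <- length(y)
  d  <- y - mean(y)
  m2 <- mean(d^2)
  b1 <- mean(d^3)^2 / m2^3   # (sqrt(b1))^2
  b2 <- mean(d^4) / m2^2
  n * (b1 / 6 + (b2 - 3)^2 / 24)
}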

The interpretation of the test statistic in equation (3.15) is a simple task. As we have already shown, √n√b1 is asymptotically N(0, 6) distributed and √n b2 is asymptotically N(3, 24) distributed, so we can use the same argumentation as for the K2 test in subsection 3.3.2 and regard JB as the sum of the squared values of two N(0, 1) distributed random variables. Thus, the JB statistic is χ²₂-distributed if √b1 and b2 are independent. However, the independence of √b1 and b2 is only valid in the asymptotic case, which we have not shown yet.

In fact, Lehmann (1999, p. 345) states that under the assumption of a normally distributed population, (√n√b1, √n(b2 − 3)) is asymptotically bivariate normally distributed and uncorrelated. A direct consequence is that √b1 and b2 are asymptotically independent, since this is always the case when bivariate normal distributions have a correlation of zero. Following this argumentation, it is seen directly that under the null distribution JB has a limiting χ²-distribution with 2 degrees of freedom.

The test is named after Jarque and Bera, who developed the test and showed that JB is asymptotically χ²₂-distributed (cf. Jarque and Bera (1980, 1987)). Jarque and Bera used another approach to prove the asymptotic behaviour of the JB statistic, by uncovering that the JB test can actually be derived as a Lagrange multiplier test. For more details on the Lagrange multiplier test see for example Lehmann (1999, p. 529).

It is interesting that the test statistic had already been mentioned by Bowman and Shenton (1975). In their paper, they briefly present the test statistic in equation (3.15) and also state that it is asymptotically χ²₂-distributed. Because of the slow convergence of the JB statistic to the χ²₂-distribution, Bowman and Shenton did not attach much importance to this test. In their work, Jarque and Bera (1987) point to this fact but proclaim the test in (3.15) as their "invention" anyway. They rely on being the first to have shown that the JB statistic is essentially a Lagrange multiplier test. However, the first ones to note that JB is asymptotically χ²₂-distributed were Bowman and Shenton. Thus, the test for normality in definition 3.17 should actually be named the Bowman-Shenton test. But since the name Jarque-Bera test has become widespread in the literature, we will also use this name for the test in the remainder of this work.

As already mentioned several times, the asymptotic convergence of the JB statistic is very slow; hence, decisions for testing normality based on the quantile function of the χ²₂-distribution can lead to seriously incorrect decisions. Jarque and Bera also called attention to this disadvantage of the test. Figure 3.5 attempts to visualize the convergence of the JB statistic to its asymptotic distribution. For each histogram in this figure, the Jarque-Bera statistic was calculated for m = 1,000,000 realizations of standard normally generated random samples of the corresponding sample size n. Additionally, the theoretical probability density function of the χ²-distribution with 2 degrees of freedom is plotted in each histogram, so that one is able to compare the goodness-of-fit of the empirical distribution with the theoretical distribution. It becomes obvious that for small sample sizes the deviation of the histogram from the theoretical curve is very large, in particular for n = 5. The bigger the sample size, the better the fit of the histogram to the curve becomes. For n = 500 there is almost no difference any more between histogram and curve.

Since for finite sample sizes the joint distribution of √b1 and b2 is unknown, Monte Carlo simulation once again becomes the favoured way to obtain empirical percentage points. In their work, Jarque and Bera (1987) presented empirical significance points based on a simulation with m = 10,000 replications. This small number of replications seems to provide percentage points of only unsatisfactory accuracy, which has also been pointed out by Deb and Sefton (1996). Consequently, they performed their own simulation studies with m = 600,000 replications and found that the empirical significance points of Jarque and Bera were too low, with the result that the test tends to reject too frequently when using the empirical percentage points of Jarque and Bera. In table 3.3, new percentage points from our own simulations based on m = 1,000,000 replications are presented, with the objective of checking the accuracy of the empirical significance points of Deb and Sefton and for the power studies in chapter 4.

Figure 3.5: Histograms of the JB statistic for several sample sizes together with the pdf of the χ²₂ distribution. For each sample size, 1,000,000 standard normal samples were generated independently of each other and the JB statistic was calculated each time. The sample sizes are (a) n = 5, (b) n = 20, (c) n = 50, (d) n = 100, (e) n = 200 and (f) n = 500.

Table 3.3: Empirical significance points for the JB statistic for several sample sizes based on m = 1,000,000 repetitions, compared with the critical values given by Deb and Sefton (1996). For n = ∞ the significance point is the 1 − α quantile of the χ²₂ distribution, as an orientation for the asymptotic convergence of the JB statistic. The calculated test statistics are the same that were used to plot the histograms in figure 3.5.

          own simulation                     Deb and Sefton (1996)
   n      α = 0.01   α = 0.05   α = 0.1     α = 0.05   α = 0.1
   10      5.6885     2.5235     1.6214        —          —
   15      8.2431     3.2973     2.0581        —          —
   20      9.6986     3.7947     2.3489      3.7737     2.3327
   25     10.6469     4.1517     2.5723        —          —
   30     11.2599     4.4055     2.7453      4.3852     2.7430
   35     11.7900     4.6044     2.8866        —          —
   50     12.4479     4.9671     3.1858      5.0002     3.1880
  100     12.4955     5.4237     3.6649      5.4365     3.6703
  200     11.9265     5.6954     4.0443      5.7113     4.0463
  500     10.8050     5.8587     4.3306      5.8869     4.3522
   ∞       9.2103     5.9915     4.6052      5.9915     4.6052

The generation of these numbers was done in the same way as in chapter 2 for the regression type tests. The differences between the values of our own simulation and the values of Deb and Sefton (1996) are often larger than one unit in the second decimal place. Considering the number of 600,000 replications in the study of Deb and Sefton, these differences are surprising. However, we trust our own significance points, which is why we used them for the empirical studies in chapter 4.

3.4.2 The Adjusted Jarque-Bera Test as a Modification of the Jarque-Bera Test

The original form of the Jarque-Bera test given in definition 3.17 has become very popular and is widely used in practice. The justification for this is the ease of computation of the JB statistic and, above all, the very good power properties of the test compared to other tests for normality; see section 4.2 for more details. A disadvantage of the JB test is the slow convergence of the third and the fourth moment to their corresponding asymptotic limit distributions, which has by now been mentioned multiple times in this chapter. Hence, the convergence of the JB statistic to the χ²₂-distribution is also very slow, as we have already seen in figure 3.5. In his work, Urzua (1996) provides a natural modification to mitigate this problem.

To comprehend the approach of Urzua, recall that we already presented exact values of the mean and the variance of √b1 and b2 for finite values of n in the subsections 3.2.1 and 3.2.2, respectively. Urzua replaced the asymptotic values of the mean and the variance in the JB statistic with the exact values, resulting in a new test statistic.

3.18 Definition. Let y1, ..., yn be a sample of iid random observations coming from a random variable Y. Further let √b1 and b2 be defined as in definition 3.3. The test statistic denoted by

AJB = \frac{(\sqrt{b_1})^2}{\operatorname{Var}(\sqrt{b_1})} + \frac{(b_2 - E(b_2))^2}{\operatorname{Var}(b_2)},

where Var(√b1), E(b2) and Var(b2) are given in the equations (3.6), (3.8) and (3.9), respectively, is called the adjusted Jarque-Bera statistic or just AJB statistic; the corresponding test for normality is called the adjusted Jarque-Bera test or shortly the AJB test.
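A minimal R sketch of the AJB statistic follows, under the assumption that (3.6), (3.8) and (3.9) are the standard exact-moment expressions:

# Sketch of the AJB statistic; the three moment formulas below are assumed
# readings of equations (3.6), (3.8) and (3.9).
ajb.stat <- function(y) {
  n   <- length(y)
  d   <- y - mean(y)
  m2  <- mean(d^2)
  sb1 <- mean(d^3) / m2^(3/2)                       # sqrt(b1)
  b2  <- mean(d^4) / m2^2
  var.sb1 <- 6 * (n - 2) / ((n + 1) * (n + 3))      # Var(sqrt(b1)), cf. (3.6)
  e.b2    <- 3 * (n - 1) / (n + 1)                  # E(b2), cf. (3.8)
  var.b2  <- 24 * n * (n - 2) * (n - 3) /
             ((n + 1)^2 * (n + 3) * (n + 5))        # Var(b2), cf. (3.9)
  sb1^2 / var.sb1 + (b2 - e.b2)^2 / var.b2
}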

Note that since JB and AJB are asymptotically equivalent, the AJB statistic is still asymptotically χ²₂-distributed, so that for sufficiently large sample sizes the quantile function of the χ²₂-distribution provides critical values for a significance test. Since the sample sizes for which the asymptotic behaviour of AJB becomes sufficiently accurate are still too large for practical purposes, Urzua conducted a simulation study to obtain empirical significance points for the AJB statistic for several sample sizes. The number m of replications was, at m = 10,000, relatively small, so that the accuracy of the critical points may be cast into doubt. For that reason, Thadewald and Büning (2007) performed their own simulation study in the framework of their power studies, based on m = 100,000 replications. Since this number still seems to be too low, we give in table 3.4 the results of our own empirical significance points. The number of repetitions here was m = 1,000,000 for each sample size n.

As for the empirical significance points in table 3.3, we note here that the differences between our simulation results and the values of Thadewald and Büning are again remarkably high; for n = 20 the difference is even more than one unit in the first decimal place. Considering the number of repetitions for our simulation study compared with the number of repetitions of Thadewald and Büning, their results may be cast into doubt. For that reason, we used our empirical significance points in the power studies in the next chapter.

Table 3.4: Empirical significance points for the AJB statistic for several sample sizes based on m = 1,000,000 repetitions, compared with the critical values given by Thadewald and Büning (2007), which are the entries in the fifth column. For n = ∞ the significance point is the 1 − α quantile of the χ²₂ distribution, as an orientation for the asymptotic convergence of the AJB statistic.

          own simulation                     Thadewald and Büning (2007)
   n      α = 0.01   α = 0.05   α = 0.1     α = 0.05
   10     18.3055     7.3840     4.1596      7.4374
   15     18.8163     7.0659     4.0093        —
   20     18.7801     6.9696     3.9835      6.8559
   25     18.2933     6.8624     3.9691        —
   30     17.8847     6.7310     3.9421        —
   35     17.5237     6.6814     3.9627        —
   50     16.6950     6.5436     3.9927      6.5940
  100     14.8257     6.3385     4.1249      6.3381
  200     13.0798     6.1868     4.2855      6.1678
  500     11.2621     6.0700     4.4407      6.0378
   ∞       9.2103     5.9915     4.6052      5.9915

Chapter 4

Power Studies

After having introduced a wide number of different tests for normality, the objective in this chapter is to compare all the tests directly with each other. The measure by which we will judge the quality of the tests is their power, or rather their empirical power. To determine the empirical power of a test, Monte Carlo simulation studies have to be conducted. This is the main content of this chapter.

However, in the first part of this chapter the different tests will not be compared with each other; instead we will have a look at their behaviour in the situation of a given normally distributed sample. In the second part we calculate the empirical power of the investigated tests for several sample sizes and several alternative distributions. The results are tabulated, discussed and compared with the results of other simulation studies from the past.

4.1 Type I Error Rate

4.1.1 Theoretical Background

Before comparing the different tests for normality with each other, we will first take a closer look at the properties of each single test. Perhaps the most important property of a test is that it guarantees that the rate of erroneously rejecting the null hypothesis will not be too high. Otherwise, we would too often reject H0 when in fact the sample comes from a normal population, a property that would make the test a poor choice. We start with the formal definition of what we just mentioned.

4.1 Definition. The test decision of rejecting the null hypothesis when in fact H0 is actually true is called a type I error. The probability of making a type I error is denoted by α. This probability is also called the significance level of a test.


As already mentioned, it is obvious that for a given sample size n the value of αn should not exceed a certain level. The standard value is 0.05. With the objective of checking for a given test of normality whether αn is higher or lower than 0.05, we can run empirical studies based on Monte Carlo simulations. The principle of these studies is easily described.

4.2 Definition. Let m be the number of randomly generated independent samples of size n, where all m samples follow a standard normal distribution. The empirical type I error rate α̂_{n,m} of a given test for normality for a given sample size n is given by

\hat\alpha_{n,m} = \frac{M}{m}, \qquad (4.1)

where M ≤ m is the number of the m tests that reject the null hypothesis of a normally distributed sample at the significance level α.

Perhaps the most important reason for examining not the theoretical significance level but the empirical version of the type I error rate is that for all tests of the regression type the null distribution is unknown, and hence no quantile function of the distribution can be determined. For the Jarque-Bera type tests the asymptotic distribution is well known in most cases. However, for finite sample sizes it is not possible to express the null distribution, for which reason we use empirical sampling in this case, too. Considering the form of α̂_{n,m} in equation (4.1), one can intuitively comprehend that it is an estimator for the theoretical significance level. To express definition 4.2 in mathematical terminology, consider that we have T_{1,n}, ..., T_{m,n} as m independent realizations of the test statistic T_n. Recall that for the regression type tests we reject H0 if T_n is smaller than or equal to the critical value. Hence, we find

\hat\alpha_{n,m} = \frac{\left|\{i \in \{1,\dots,m\} : T_{i,n} \le \hat c_{n,\alpha}\}\right|}{m}

for the empirical type I error, where |A| denotes the cardinality of the set A and ĉ_{n,α} is the empirical percentage point of the corresponding test for the sample size n and the significance level α, obtained by empirical sampling studies. For tests of the Jarque-Bera type we have

\hat\alpha_{n,m} = \frac{\left|\{i \in \{1,\dots,m\} : T_{i,n} \ge \hat c_{n,\alpha}\}\right|}{m},

since for these types of test we reject H0 when T_n is greater than or equal to the critical value. Finally, for m realizations z_{1,W_R}, ..., z_{m,W_R} of the extension of Royston we get

\hat\alpha_{n,m} = \frac{\left|\{i \in \{1,\dots,m\} : z_{i,W_R} \notin I^c\}\right|}{m},

where

I^c = \left[ \Phi^{-1}\!\left(\frac{\alpha}{2}\right),\; \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) \right] \qquad (4.2)

is the confidence interval of an N(0, 1) distributed test statistic (cf. Falk et al. (2002)). For the extension of Royston, the distribution of the test statistic z_{W_R} under H0 is known, so that

an empirical analysis of the type I error rate is actually not really necessary. But since z_{W_R} is only approximately N(0, 1) distributed under the null hypothesis, and since the parameters needed to calculate the statistic z_{W_R} are themselves based on empirical simulation, one can argue that an empirical investigation is justified. To make the approach more comprehensible, we give a short example that should shed more light on the calculation of the empirical type I error.

4.3 Example. Assume that we have generated m = 10,000 independent N(0, 1) distributed random samples of size n = 35. After running the Shapiro-Wilk test and the Jarque-Bera test for each of the 10,000 samples at the 0.05 significance level, we obtain M_W = 508 rejections of

H0 for the Shapiro-Wilk test and M_{JB} = 494 rejections for the Jarque-Bera test. Hence, the empirical type I errors of the two tests are, respectively,

\frac{M_W}{m} = \frac{508}{10{,}000} = 5.08\% \quad\text{and}\quad \frac{M_{JB}}{m} = \frac{494}{10{,}000} = 4.94\%.

The Shapiro-Wilk test tends to exceed the significance level of 0.05, which can almost be regarded as negligible. On the contrary, the type I error rate of the Jarque-Bera test is below 0.05, which makes the test very slightly conservative.
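The following R lines sketch how such an empirical type I error rate can be obtained, reusing the built-in shapiro.test() for the W test; the seed and the small number of replications are illustrative choices only, so the resulting rate will differ slightly from the numbers above.

# Monte Carlo sketch of the empirical type I error rate (4.1)
set.seed(1)
m <- 10000; n <- 35; alpha <- 0.05
M <- sum(replicate(m, shapiro.test(rnorm(n))$p.value < alpha))
M / m   # should be close to alpha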

4.1.2 General Settings

Investigated Tests for Normality

For our empirical studies we included almost all tests for normality presented in this work. We omitted an investigation of the skewness and the kurtosis test, since these tests are not omnibus tests and therefore not a recommended choice for practitioners. A short summary is given in table 4.1, where the notation for the test statistics is repeated. This notation is retained in the tables where the empirical results of the type I error investigations and the results of the power studies are presented.

Table 4.1: Used abbreviations for the tests of normality in the empirical power studies

Symbol   Test Name                                Reference
W        Shapiro-Wilk test                        definition 2.5 on page 12
SF       Shapiro-Francia test                     definition 2.20 on page 32
WB       Weisberg-Bingham test                    definition 2.22 on page 34
FB       Filliben test                            definition 2.24 on page 35
RG       extension of Rahman and Govindarajulu    definition 2.27 on page 40
zWR      extension of Royston                     definition 2.26 on page 39
JB       Jarque-Bera test                         definition 3.17 on page 62
AJB      adjusted Jarque-Bera test                definition 3.18 on page 66

Settings for n, α and m

As will be described in more detail in subsection 4.2.2, no standard setting has been established in research for empirical simulation studies for tests of normality. In the studies conducted in this work we tried to choose our parameters as conventionally as possible. The definitions of the testing parameters in the following convention should always be kept in mind when interpreting the results of the study in the next subsection.

4.4 Convention. For the calculation of the empirical type I error α̂_{n,m} of the tests for normality in table 4.1, we chose the following parameters for our empirical investigations:

n = 10, 15, 20, 25, 30, 35, 50, 100, 200, 500,
m = 1,000,000.   (4.3)

The critical values ĉ_{n,α} for the tests for normality were the points from our own simulation studies. We took the value α = 0.05 for the empirical significance points. The exact values of ĉ_{n,0.05} are tabulated in the sections and subsections above where the corresponding test is presented.

The justification for the choice of the sample sizes n is that we wanted to investigate small sample sizes (n = 10, 15, 20, 25, 30, 35), moderate sample sizes (n = 50, 100) as well as large sample sizes (n = 200, 500). Since our own experience has shown that the sample sizes in applied statistics are mostly rather too small than too large, we put the emphasis here on small sample sizes and calculated the empirical type I error for a broader number of different values of n than for moderate and large sample sizes. This is also consistent with the common practice in research, where mostly small sample sizes are used in empirical studies.

The value α = 0.05 is the significance level used in almost all studies. For that reason, we also used this value for our simulation studies.

The number of repetitions in our study is m = 1,000,000. Note that this value is substantially higher than in most other studies performed in the past. Zhang (1999) used only 10,000 replications for their empirical studies, while Seier (2002) and Keskin (2006) considered a number of repetitions of m = 100,000. To the best knowledge of the author, the most extensive empirical study in the past is the work of Bonett and Seier (2002), with a Monte Carlo simulation based on 300,000 repetitions. However, this work does not contain any of the tests for normality considered in our simulation study in table 4.1.

Computational Issues

Before we present the results of the simulation studies, some computational facts should be mentioned for the sake of completeness.

All simulations were done in the software R (R Development Core Team (2009)); the source code of the programs is not listed here, but can be obtained from the author on request.

The normal samples were generated in R with the function rnorm(), and all random samples were generated independently of each other. For the calculation of the test statistic of the zWR test we used the function shapiro.test() already implemented in R. All other test statistics from table 4.1 were calculated by ourselves. For details of that calculation and the use of the needed parameters and coefficients, we refer to the corresponding sections and subsections above, where the computation is described in detail for each simulation of the empirical significance points. The decision whether to reject the null hypothesis or not was based on the critical values ĉ_{n,α} obtained from our own simulation studies, also described above.

4.1.3 Results

For the tests of the regression type, the empirical type I error rates in the tables 4.2 and 4.3 show a uniform picture. For all sample sizes there are no critical results in the sense that the value of α̂_{n,m} differs substantially from 0.05. The biggest discrepancy among these tests is shown by the extension of Royston for a sample size of n = 15. The empirical type I error in this case is 0.0490. But this departure from 0.05 is only one unit in the third decimal place. Therefore we can ignore this fact, since it is without meaning for practical purposes, and state that for the regression tests for normality the type I error is well controlled.

Table 4.2: Empirical type I error rate for small sample sizes

Test    n = 10    n = 15    n = 20    n = 25    n = 30    n = 35
W       0.0502    0.0499    0.0503    0.0503    0.0499    0.0499
SF      0.0501    0.0499    0.0500    0.0492    0.0498    0.0505
WB      0.0503    0.0501    0.0503    0.0495    0.0496    0.0500
FB      0.0500    0.0498    0.0501    0.0499    0.0496    0.0499
RG      0.0498    0.0501    0.0499    0.0498    0.0497    0.0505
zWR     0.0501    0.0490    0.0503    0.0501    0.0505    0.0503
JB      0.0497    0.0497    0.0504    0.0498    0.0495    0.0500
AJB     0.0501    0.0505    0.0498    0.0499    0.0500    0.0500

The results for the SF test, the WB test and the FB test for n ≥ 100 are also remarkable, since no simulations had been conducted for such large sample sizes before. We see that these tests also hold the significance level for larger sample sizes, which is an indication that they can be recommended for large sample sizes as well.

To underline the accuracy of our results due to the large number of m = 1,000,000 repetitions, we cite Seier (2002), who states that for n = 50 the W test has an empirical significance level of 0.0432. According to Zhang (1999), the empirical type I error rate for the W test for the same sample size is 0.064. Considering the very precise value of 0.0506 in table 4.3, the large difference between those two results is probably mostly due to the small number of repetitions compared to our own simulation study.

The tests of the Jarque-Bera type yield similar results. The presented tests control the type I error very precisely. The biggest difference from 0.05 is the value 0.0494, which appears three times. Since this difference is only six units in the fourth decimal place, it is even more negligible than for the regression type tests. Comparing these results with the results of Gel and Gastwirth (2008), we obtained even more accurate values for the empirical significance level. Once again this underlines the value of a large number of replications, since Gel and Gastwirth conducted their simulation study with only m = 10,000 repetitions.

Table 4.3: Empirical type I error rate for moderate and large sample sizes

Test    n = 50    n = 100    n = 200    n = 500
W       0.0506      —          —          —
SF      0.0503    0.0502     0.0495       —
WB      0.0501    0.0500     0.0494     0.0502
FB      0.0501    0.0499     0.0495     0.0502
RG      0.0498    0.0497     0.0500     0.0501
zWR     0.0499    0.0495     0.0491     0.0509
JB      0.0501    0.0506     0.0494     0.0501
AJB     0.0500    0.0494     0.0494     0.0499

4.2 Power Studies

4.2.1 Theoretical Background

Until now, in chapter 2 and chapter 3, we have introduced different tests for normality based on completely different approaches to the problem. One of the first natural questions that comes to mind is how to compare the tests in order to state that one test is "better" than another, whatever "better" may stand for. According to Thode (2002), "the most frequent measure of the value of a test for normality is its power". To make this precise, we start with the definition of power.

4.5 Definition. The test decision of not rejecting the null hypothesis H0 when in fact the alternative hypothesis H1 is true is called a type II error. Let the probability of making such an error be denoted by β. The power 1 − β of a statistical test is then the probability that it will not make a type II error. This is equivalent to the probability of rejecting a false null hypothesis.

Hence, in the framework of testing for normality, the power is the ability to detect that a sample comes from a non-normal distribution. Consequently, given a sample of which we already know that it is not normally distributed but follows an alternative distribution (e.g. a χ² distribution with 5 degrees of freedom, denoted by χ²₅), tests with higher power should tend to reject H0 more often than tests with lower power. This is exactly the underlying idea when determining the power with an empirical simulation study. In this work, we conduct empirical power studies by performing Monte Carlo simulations. The principle of an empirical power analysis is simple to describe. The general approach is given in the following definition.

4.6 Definition. Let m be the number of randomly generated independent samples of size n, where all m samples follow the same non-normal distribution. The empirical power 1 − β̂_{n,α,m} of a given test for normality for a given significance level α is given by

1 - \hat\beta_{n,\alpha,m} = \frac{M}{m},

where M ≤ m is the number of the m tests that reject the null hypothesis of a normally distributed sample at the significance level α.

Definition 4.6 is very intuitive. The ratio M/m is an estimator for the theoretical power of the test. The higher the number of rejected tests, the higher the ratio M/m and, hence, the higher the power. To come to a mathematical notation, consider that for a given sample size n we have T_{1,n}, ..., T_{m,n} as realizations of m test statistics for a test of normality, where the underlying samples of the m test statistics do not follow a normal distribution. Recall that for a test of the regression type, H0 is rejected if T_{i,n} ≤ c_{n,α}, i = 1, ..., m, where c_{n,α} is the critical value of the test for a given sample size n and a significance level α. Consequently, for a test of the regression type, M is given by

M = \left|\{i \in \{1,\dots,m\} : T_{i,n} \le c_{n,\alpha}\}\right|.

The empirical power is

1 - \hat\beta_{n,\alpha,m} = \frac{M}{m} = \frac{\left|\{i \in \{1,\dots,m\} : T_{i,n} \le c_{n,\alpha}\}\right|}{m}.

Hence, for a test of the Jarque-Bera type, M is given by

M = \left|\{i \in \{1,\dots,m\} : T_{i,n} \ge c_{n,\alpha}\}\right|

and for the empirical power we have

1 - \hat\beta_{n,\alpha,m} = \frac{M}{m} = \frac{\left|\{i \in \{1,\dots,m\} : T_{i,n} \ge c_{n,\alpha}\}\right|}{m}.

For the special case of the extension of Royston we have m realizations z_{1,W_R}, ..., z_{m,W_R} and thus we find

M = \left|\{i \in \{1,\dots,m\} : z_{i,W_R} \notin I^c\}\right|,

with I^c given in equation (4.2). This yields an empirical power of

1 - \hat\beta_{n,\alpha,m} = \frac{M}{m} = \frac{\left|\{i \in \{1,\dots,m\} : z_{i,W_R} \notin I^c\}\right|}{m}.

Again, an example attempts to demonstrate the approach of a power analysis for different tests of normality and make it easier to understand.

4.7 Example. Considering a χ²₅ distribution as the alternative distribution, a sample size of n = 40 and a significance level of α = 0.05, we want to compare the Shapiro-Wilk test and the Jarque-Bera test concerning their power against this distribution. To this end we have generated m = 100,000 independent samples of size 40 that follow a χ²₅ distribution. When running the Shapiro-Wilk test at a significance level of 0.05 for each sample, we get

M_W = 87,654 rejections, while performing the Jarque-Bera test at the same significance level yields M_{JB} = 76,543 rejections. We can now compare the two empirical powers of the tests and find that

\frac{M_W}{m} = \frac{87{,}654}{100{,}000} = 0.87654 \approx 87.65\% > 76.54\% \approx 0.76543 = \frac{76{,}543}{100{,}000} = \frac{M_{JB}}{m}.

Thus, for this testing situation, the Shapiro-Wilk test has a higher power and therefore a "better" performance than the Jarque-Bera test.
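The corresponding computation can be sketched in R as follows (with a smaller m so that the sketch runs quickly; the resulting power estimate will therefore differ from the illustrative numbers above):

# Monte Carlo sketch of the empirical power from definition 4.6, here for
# the W test against the chi^2_5 alternative of example 4.7
set.seed(1)
m <- 10000; n <- 40; alpha <- 0.05
M <- sum(replicate(m, shapiro.test(rchisq(n, df = 5))$p.value < alpha))
M / m   # empirical power 1 - beta.hat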

4.2.2 General Settings of the Power Study

In this subsection we give a short overview of the settings made for the power studies performed in this work concerning alternative distributions, sample size, significance level and number of repetitions. We start with some interesting historical facts.

Development of the Design of Power Studies

The two seminal papers that essentially set the standards for power studies and the development of tests for normality in general were those of Shapiro and Wilk (1965) and Shapiro et al. (1968). The examination of the power was done as explained in the preceding subsection, with 45 alternative distributions. The number of tests for normality at that time was relatively small, so that in the publication of 1968 the authors were able to conduct a very broad study to assess the value of the Shapiro-Wilk test.

Empirical power studies had not been performed before that time. The establishment of this procedure was not least due to the advances in calculation speed and memory capacity of the new processors. In the following years, when publishing a new test for normality or a development of an already established testing procedure, power studies following the design of the work of Shapiro and Wilk (1965) became almost obligatory.

With the increasing number of tests for normality that were developed, especially from the middle of the 1970s to the middle of the 1980s, it became more and more laborious to compare a new test with all other already existing tests for normality. Besides, the space limitations of the journals made complete studies impossible. Therefore, it became practice to compare a new test only with a subset of tests for normality.

A shortcoming in the development of power comparisons is the fact that no common standard design has emerged. The reasons for this are the following:

• The value α of the significance level of the tests for normality varies between the different studies. The most popular values are 0.05 and 0.10. Thus, a comparison between two studies that used different α's is nearly infeasible.

• Different studies used different sample sizes n when generating the samples of the alternative distributions. Additionally, the number m of repetitions changes from study to study.

• Another problem is the use of the critical points for the tests for normality. Some studies calculated their own critical points for the tests, although critical points were already available from other works.

Settings for n, α and m

For the calculation of the empirical power 1 − β̂_{n,α,m}, the settings for the simulation parameters n, α and m were the same as for the empirical type I error investigations in convention 4.4, i.e.,

n = 10, 15, 20, 25, 30, 35, 50, 100, 200, 500,
α = 0.05,
m = 1,000,000.

The justification for the choice of the sample sizes n is given in subsection 4.1.2, as is the one for the choice of α. The number of repetitions in our study is, at 1,000,000, again substantially higher than in most other studies performed in the past. While in the first works on empirical power analysis the number of repetitions was very small (e.g. m = 200 in Shapiro and Wilk (1965)), even recent works used values for m that are substantially smaller than one million. For example, in the power study in Deb and Sefton (1996) the value was m = 100,000, which seems to be one of the highest numbers of repetitions for such power studies. But even very recent works do not use values of m that are comparably high to the number of repetitions in our simulation (e.g. m = 10,000 in Thadewald and Büning (2007)).

Settings for the Alternative Distributions

Here the alternative distributions are briefly presented. We only give the probability density function of each distribution and, as an orientation, its theoretical skewness and kurtosis values, where possible. All the information on the following distributions was taken from Johnson et al. (1994, 1970) and Johnson and Kotz (1969). Following Thode (2002), we group the alternative distributions into three subsets based on their different shapes.

For the generation of the random samples following the alternative distributions, we used the corresponding functions in R; e.g., to generate a χ²₅ distributed sample of size n = 25 we used the command rchisq(25, 5).
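For the distributions presented below, the generator calls might look as follows; only rchisq() is documented explicitly in this work, the remaining calls are the standard base-R generators, and the Laplace sample is built from two exponential samples, since base R has no Laplace generator.

# Sketch of generator calls for the alternative distributions
n <- 25
rt(n, df = 4)                       # t_4
rlogis(n, 0, 1)                     # l(0, 1)
rexp(n, 1) - rexp(n, 1)             # Laplace(0, 1), difference of exponentials
rbeta(n, 2, 2); runif(n, 0, 1)      # Beta(2, 2), U(0, 1)
rbinom(n, size = 10, prob = 0.5)    # B(10, 0.5)
rlnorm(n, 0, 1); rpois(n, 1)        # LogN(0, 1), P_1
rweibull(n, shape = 2)              # Weibull; note that R parametrizes by
                                    # shape and scale, not by (alpha, beta)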

Long-Tailed Symmetric Distributions

Consider a continuous random variable Y. Unless stated otherwise, the skewness of the pdf of Y is √β1 = 0.

• Y is Laplace distributed with parameters µ and σ > 0, denoted by Laplace(µ, σ), if its pdf is

p_{\mathrm{Laplace}(\mu,\sigma)}(y) = \frac{1}{2\sigma} \exp\left( -\frac{|y - \mu|}{\sigma} \right), \quad y \in \mathbb{R}.

The kurtosis is β2 = 6.

• Y is logistic distributed with parameters α and β > 0, denoted by l(α, β), if its pdf is

p_{l(\alpha,\beta)}(y) = \frac{\exp\left(-\frac{y-\alpha}{\beta}\right)}{\beta \left(1 + \exp\left(-\frac{y-\alpha}{\beta}\right)\right)^2}, \quad y \in \mathbb{R}.

The kurtosis is β2 = 4.2.

• Y is t distributed with ν degrees of freedom, denoted by t_ν, if its pdf is

p_{t_\nu}(y) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{y^2}{\nu}\right)^{-(\nu+1)/2}, \quad y \in \mathbb{R},

where

\Gamma(x) = \int_0^\infty u^{x-1} \exp(-u)\, du, \quad x > 0,

is the Gamma function. The kurtosis is β2 = (3ν − 6)/(ν − 4) for ν ≥ 5. For ν ≤ 4 we have β2 = ∞. The t1 distribution is also known as the Cauchy distribution.

Short-Tailed Symmetric Distributions

Consider a continuous random variable Y. Unless stated otherwise, the skewness of the pdf of Y is √β1 = 0.

• Y is beta distributed with parameters p > 0 and q > 0, denoted by Beta(p, q), if its pdf is

p_{\mathrm{Beta}(p,q)}(y) = \begin{cases} \frac{1}{B(p,q)}\, y^{p-1} (1-y)^{q-1}, & 0 \le y \le 1 \\ 0, & \text{else,} \end{cases}

where

B(p,q) = \frac{\Gamma(p)\Gamma(q)}{\Gamma(p+q)} = \int_0^1 u^{p-1}(1-u)^{q-1}\, du

is the beta function. The kurtosis is

\beta_2 = \frac{3(p+q+1)\left(2(p+q)^2 + pq(p+q-6)\right)}{pq(2+p+q)(3+p+q)}

and the skewness is √β1 = 2(q − p)√(p + q + 1) / ((p + q + 2)√(pq)). Hence, for p = q the beta distribution is symmetric, but for p ≠ q the beta distribution is asymmetric.

• Y is uniformly distributed on (a, b) ⊂ ℝ, denoted by U(a, b), if its pdf is

p_{U(a,b)}(y) = \frac{1}{b-a}\, \mathbf{1}_{(a,b)}(y), \quad y \in \mathbb{R},

where

\mathbf{1}_{(a,b)}(y) = \begin{cases} 1, & y \in (a, b) \\ 0, & \text{else} \end{cases}

is the indicator function on (a, b). The kurtosis is β2 = 1.8.

Now consider a random variable Y that is discretely distributed. Y is binomially distributed with parameters n and p ∈ (0, 1), denoted by B(n, p), if we have

P(Y = y) = \binom{n}{y}\, p^y (1-p)^{n-y}, \quad y = 0, 1, \dots, n.

The kurtosis is β2 = 3 − 6/n + 1/(np(1 − p)) and the skewness is √β1 = (1 − 2p)/√(np(1 − p)). Hence, for p = 1 − p = 0.5 the binomial distribution is symmetric.

Asymmetric Distributions

Consider a continuous random variable Y.

• Y is χ² distributed with ν degrees of freedom, denoted by χ²_ν, if its pdf is

p_{\chi^2_\nu}(y) = \begin{cases} \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, y^{(\nu/2)-1} \exp\left(-\frac{y}{2}\right), & y > 0 \\ 0, & y \le 0. \end{cases}

The skewness is √β1 = 2√2/√ν, the kurtosis is β2 = 3 + 12/ν.

• Y is exponentially distributed with parameter λ > 0, denoted by exp_λ, if its pdf is

p_{\exp_\lambda}(y) = \begin{cases} \lambda \exp(-\lambda y), & y \ge 0 \\ 0, & y < 0. \end{cases}

The skewness is √β1 = 2, the kurtosis is β2 = 9

• Y is lognormally distributed with parameters µ ∈ ℝ and σ > 0, denoted by LogN(µ, σ), if its pdf is

p_{\mathrm{LogN}(\mu,\sigma)}(y) = \begin{cases} \frac{1}{\sqrt{2\pi}\,\sigma y} \exp\left(-\frac{(\log y - \mu)^2}{2\sigma^2}\right), & y > 0 \\ 0, & y \le 0. \end{cases}

The skewness is √β1 = (exp(σ²) + 2)√(exp(σ²) − 1), the kurtosis is β2 = exp(4σ²) + 2 exp(3σ²) + 3 exp(2σ²) − 3.

• Y is Weibull distributed with parameters α > 0 and β > 0, denoted by Wei(α, β), if its pdf is

p_{\mathrm{Wei}(\alpha,\beta)}(y) = \begin{cases} \alpha\beta\, y^{\beta-1} \exp(-\alpha y^\beta), & y > 0 \\ 0, & y \le 0. \end{cases}

The skewness is

\sqrt{\beta_1} = \frac{\Gamma_3(\beta) - 3\Gamma_2(\beta)\Gamma_1(\beta) + 2(\Gamma_1(\beta))^3}{\left(\Gamma_2(\beta) - (\Gamma_1(\beta))^2\right)^{3/2}}

and the kurtosis is

\beta_2 = \frac{\Gamma_4(\beta) - 4\Gamma_3(\beta)\Gamma_1(\beta) + 6\Gamma_2(\beta)(\Gamma_1(\beta))^2 - 3(\Gamma_1(\beta))^4}{\left(\Gamma_2(\beta) - (\Gamma_1(\beta))^2\right)^2},

where Γ_i(β) = Γ(1 + iβ^{-1}).
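The Gamma-function expressions above depend only on the shape parameter β and can be checked numerically, e.g. with the following R sketch (the function name is ours):

# Skewness and kurtosis of Wei(alpha, beta) from the formulas above
weibull.shape.moments <- function(beta) {
  G <- function(i) gamma(1 + i / beta)   # Gamma_i(beta)
  v <- G(2) - G(1)^2
  c(skewness = (G(3) - 3 * G(2) * G(1) + 2 * G(1)^3) / v^(3/2),
    kurtosis = (G(4) - 4 * G(3) * G(1) + 6 * G(2) * G(1)^2 - 3 * G(1)^4) / v^2)
}
weibull.shape.moments(1)   # beta = 1 is the exponential case: skewness 2, kurtosis 9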

Now consider a random variable Y that is discretely distributed. Y is Poisson distributed with parameter λ > 0, denoted by P_λ, if we have

P(Y = y) = \frac{\lambda^y}{y!} \exp(-\lambda), \quad y = 0, 1, 2, \dots.

The skewness is √β1 = 1/√λ, the kurtosis is β2 = 3 + 1/λ.

4.2.3 Results

Now we finally present the results of the power studies, grouped by the classification of the alternative distributions described before. For each table we will give a short comment about interesting facts and compare the results with the results of power studies from other papers in the past. The tests for normality used in the empirical power studies were the same as for the investigation of the empirical significance level in section 4.1. For an overview take a look at table 4.1.

Long-Tailed Symmetric Distributions

The results for the long-tailed symmetric alternative distributions are given in tables 4.4, 4.5 and 4.6. The first thing we notice is that in most situations the tests based on the sample moments, JB and AJB, have a substantially higher power than the regression type tests. For the Laplace distribution, the modifications of the Shapiro-Wilk test, SF, WB and FB, have a slightly higher power for sample sizes n ≥ 25 than the JB and the AJB test. But for the logistic distribution and for the t distribution, in particular for the t distribution with high degrees of freedom, both Jarque-Bera type tests have the highest power of the considered tests. It is also notable that the AJB test always performs better than the JB test. The difference is, however, slight, which agrees with the statement of Thadewald and Büning (2007) that for this type of alternative distributions it makes almost no difference which of the two tests is taken.

The power of the W test is for this type of alternative distribution in all situations less than the power of the Jarque-Bera type tests, which has also been stated by Thadewald and Büning (2007) and Deb and Sefton (1996). It is remarkable that, in accordance with the results of Filliben (1975), the FB test performs better than the W test in almost all cases; in particular for the Laplace distribution the difference is quite large. But not only the FB test shows better results than the Shapiro-Wilk test: its modifications SF and WB also have essentially higher power than the W test in all situations. The zWR test shows very similar results to the W test; there are only slight differences between the original procedure W and its extension to large sample sizes. These results do not completely agree with the results of Seier (2002), who states that for n = 50 the zWR test has higher power than the W test. The RG test shows the worst performance of all considered tests in this study. This is in accordance with Farrell and Rogers-Stewart (2006), who state that the W test performs better than the RG test for symmetric long-tailed alternatives.

Table 4.4: Empirical power for symmetric long-tailed alternative distributions for n = 10, 15, 20, 25.

Table 4.5: Empirical power for symmetric long-tailed alternative distributions for n = 30, 35, 50, 100.

Table 4.6: Empirical power for symmetric long-tailed alternative distributions for n = 200, 500.

Short-Tailed Symmetric Distributions

The results for the short-tailed symmetric alternative distributions, given in tables 4.7, 4.8 and 4.9, do not give a unified picture. Due to the different power behaviours of the considered tests for the uniform distribution, we have to consider these results separately.
7 39.9 46 2 — 83 9 97.7 98 6 — 100 100 100 100 100 100 100 1 74.3 78 7 36.5 41 6 40.0 47 2 — 83 0 22.2 26 0 — 99 1 — 76 7 12.9 15 5 15.8 19 6 — 36 6 15.9 19 6 — 36 2 — 51 4 16.1 19 5 9.3 11 5 — 36 5 — 17 1 — 27 6 — 13 5 7.9 9 ...... JB AJB W SF WB FB RG z JB AJB W SF WB FB RG z R R 8 40.1 41 1 55.5 57 9 95.5 95 6 99.5 99 4 71.8 73 7 38.7 39 7 40.0 41 0 55.5 57 2 25.4 26 3 88.2 89 8 54.0 55 0 15.4 15 6 19.0 19 6 25.8 26 6 19.0 19 5 25.8 26 3 35.4 36 7 18.9 19 9 11.2 11 8 25.9 26 6 12.1 14 5 20.6 21 6 9.4 9 7 11.3 11 ...... W W = 30 = 50 9 18.9 35 9 26.7 52 0 90.9 95 8 98.6 99 4 53.2 68 0 20.0 32 8 18.9 35 9 26.7 52 4 7.0 12 2 11.4 20 4 72.1 86 0 7.8 14 6 8.5 19 3 27.5 46 0 7.9 14 6 8.5 19 5 5.8 8 8 8.3 14 0 14.1 28 4 9.4 19 1 5.7 10 8 5.3 7 5 5.2 8 3 7.4 15 n n ...... 2 42.1 42 0 59.1 59 9 96.9 97 7 99.7 99 9 72.9 73 5 37.5 38 2 42.1 42 9 59.0 59 3 14.2 14 9 23.8 24 1 89.1 89 7 17.7 18 1 24.1 24 6 52.6 53 7 17.7 18 0 24.0 24 4 10.3 10 5 17.4 17 3 33.4 34 9 24.0 24 8 12.8 13 7 8.7 8 3 10.3 10 9 18.9 19 ...... W SF WB FB RG z W SF WB FB RG z Empirical power for symmetric long-tailed alternative distributions f 95.9 96 99.6 99 68.4 72 32.6 37 86.3 89 46.9 52 55 11.9 14 5 20.0 23 2 14.5 17 2 19.7 24 2 14.6 17 2 19.6 24 38 8.9 10 5 19.9 23 38 7.5 8 55 10.7 12 38 8.8 10 ...... ∞ ∞ ∞ ∞ ∞ ∞ 2 2 β β Table 4.5: 1 1 0 0 0 0 0 3 0 4 0 0 4 14.6 17 0 0 6 28.4 33 0 3 0 4 0 0 0 3 0 4 15.6 18 0 3 β β √ √ 1 1 2 4 6 2 8 4 6 8 t t t t t t t t t t 10 15 20 15 10 20 5) 0 6 35.6 42 5) 0 6 52.1 59 1) 0 6 35.6 42 1) 0 6 52.0 58 1) 0 4 1) 0 4 5) 0 4 5) 0 4 . . . . t t t t t t , , , , 0 0 0 0 , , , , (0 (0 (0 (0 l l (0 (0 (0 (0 l l Laplace Laplace Alternative Laplace Alternative Laplace CHAPTER 4. POWER STUDIES 83 7 4 4 6 8 9 8 ...... JB AJB R 8 97.6 97 7 89.0 89 9 100 100 7 89.0 89 0 88.2 88 4 75.1 75 2 33.2 33 1 48.1 48 8 100 100 ...... W 500. , = 500 n 1 75.9 95 6 45.5 83 6 45.5 83 3 44.4 82 9 25.4 65 2 5.3 24 3 9.0 37 ...... = 200 n or 9 — — 100 100 100 100 100 100 9 — — 100 100 100 100 100 100 6 — — 96.9 97 0 — — 87.1 87 1 — — 100 100 98.7 99 2 — — 87.1 87 8 — — 85.7 86 9 — — 71.1 71 9 — — 29.4 30 0 — — 43.4 44 ...... JB AJB W SF WB FB RG z R 5 96.6 96 5 96.6 96 9 76.8 77 1 59.1 60 5 94.8 95 2 59.2 60 8 59.0 59 8 46.0 46 5 20.4 20 5 28.3 29 ...... W = 200 3 83.9 97 n 3 83.8 97 1 35.6 68 9 17.2 49 6 72.9 92 1 17.2 49 3 18.7 48 1 11.5 35 6 4.5 14 0 5.8 20 ...... 1 98.2 98 2 98.2 98 0 74.4 75 4 56.0 56 3 94.4 94 6 56.1 57 0 55.5 56 8 42.3 43 9 18.1 18 0 25.3 26 ...... Empirical power for symmetric long-tailed alternative distributions f — 94 —— 100 100 100 100 100 100 100 100 100 100 100 100 100 100 — — — — 100 100 100 100 100 100 100 99 100 100 W SF WB FB RG z 2 — 55 2 — 55 5 — 55 38 — 17 55 — 25 . . . . . ∞ ∞ ∞ 2 β Table 4.6: 1 0 6 —74 0 0 4 0 4 —41 0 0 0 3 0 3 β √ 6 4 8 1 2 t t t t t 10 20 15 5) 0 6 —98 1) 0 6 —98 5) 0 4 1) 0 4 . . t t t , , 0 0 , , (0 (0 l (0 (0 l Laplace Alternative Laplace CHAPTER 4. POWER STUDIES 84 the uniform distribution, we have to consider these results separately.

First, it is obviously that the power of the Jarque-Bera type tests is surprisingly poor for the beta and the binomial distribution. In particular, the results for the beta distribution for sample sizes n< 100 are very poor, for the AJB test the power is sometimes even zero. Jahanshahi et al. (2008) obtained comparably results for this type of distribution. However, in the power study of Yazici and Yolacan (2007), the authors report about very good results for the beta distribtion with a power near 1 using the Jarque-Bera test—their results my be casted on doubt. For the binomial distribution the power for the JB and the AJB test are also very poor, even for large samples sizes the power does not increase. These results are in contradiction to the ones found by Jahanshahi et al. (2008), who report about higher values for the binomial distribution. Better results for the beta and the binomial distribution are achieved by the regression type tests, for all tests the power is in every situation higher than for the Jarque-Bera type tests. For the binomial and the beta distribution W and zWR show similar results. The RG test has mostly a lower power than the W test for the binomial distribution which has also been found out by Rahman and Govindarajulu (1997). For the beta distribution, however, the power of the RG test is the best of all considered tests, which confirms the findings of Farrell and Rogers-Stewart (2006). In contradiction to the work of Yazici and Yolacan (2007) the power values for the W test in our power study do not decrease for larger sample sizes. We also note, that the original W test performes most of the times better than the modifications SF , WB and FB.

On the contrary, for the uniform distribution, we find a completely opposite power behaviour. The JB and the AJB test have permanently a higher power than the regression type tests. Jahanshahi et al. (2008) found power values of the JB and the AJB test that decrease for larger samples sizes, so we cannot agree with their results. However, it may be mentioned that they only considered the U(2, 4) distribution which is not a popular chioce for an alternative distribution compared to other power studies. Hence, our results are not exactly comparable to their findings. The three modifications of the W test show for some sample sizes (n = 30, 35, 50) all substantially higher power than the original W test and the RG test has the worst performance of all tests. This is in contradiction to the study of Rahman and Govindarajulu (1997) who found that the RG test has higher power than the Shapiro-Wilk test. We also have to disagree to the statement of Rahman and Govindarajulu that the SF shows lower power results than the W test.

Asymmetric Distributions

The power results for asymmetric alternative distributions are presented in tables 4.10, 4.11 and 4.12. The power behaviour in these cases is much more in accord with the results of other power studies than the results for the symmetric alternative distributions.

The highest power is achieved by the W test and its two extensions, the RG test and the zWR test. As for the symmetric beta distributions, the RG test has the highest power for the asymmetric beta distributions and a slightly higher power than W and zWR for some $\chi^2$ distributions with small degrees of freedom. These results agree with Rahman and Govindarajulu (1997) and Farrell and Rogers-Stewart (2006), who found similar results. The power of W and zWR is almost identical for all distributions and sample sizes, which has also been stated by Seier (2002). The modifications of the W test show slightly lower power but are still substantially better than the Jarque-Bera type tests. The worst performance for asymmetric distributions is shown by the JB and the AJB test. Their power is in all situations distinctly smaller than for the regression type tests. Thus, our findings confirm the results given by Jarque and Bera (1987) and Deb and Sefton (1996). Also note that in some cases the power of the AJB test is substantially smaller than that of the JB test, whereas for the symmetric alternatives there were only slight differences between the two tests.

[Table 4.7: Empirical power for symmetric short-tailed alternative distributions for n = 10, 15, 20, 25.]

[Table 4.8: Empirical power for symmetric short-tailed alternative distributions for n = 30, 35, 50, 100.]

[Table 4.9: Empirical power for symmetric short-tailed alternative distributions for n = 200, 500.]

4.3 Recommendations

In order to summarize the results of our power study and to give some recommendations for using a test for normality, we have to reflect on the results in subsection 4.2.3 above. First we want to mention that one objective of the study was to extend the modifications of the Shapiro-Wilk test, the SF test, the WB test and the FB test, to larger sample sizes. Until now they were only used for relatively small sample sizes (n ≤ 100) and, to the best of the author's knowledge, they have not been extended to large values of n. Such an extension, including the calculation of empirical significance points as it was done in section 2.4, seems justifiable, since the extension does not essentially change the already known power properties of the tests discussed in preceding power studies. In some cases the modifications performed even better than the original W test, e.g. for long-tailed symmetric distributions. However, for symmetric short-tailed and for asymmetric alternative distributions, the power of the original procedure was higher most of the time.
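To sketch how such empirical significance points can be computed, the following R snippet (illustrative settings, not the original study code) simulates the null distribution of the Shapiro-Francia statistic, written here in its squared-correlation form with Blom scores, and extracts the lower 5% quantile for a given n.

## Empirical 5% significance point of the Shapiro-Francia statistic under
## normality, obtained by Monte Carlo simulation (illustrative settings).
set.seed(1)
B <- 20000; n <- 300

sf.stat <- function(x) {
  m <- qnorm(ppoints(length(x), a = 3/8))  # Blom scores (i - 3/8)/(n + 1/4)
  cor(sort(x), m)^2                        # squared-correlation form of W'
}

w0 <- replicate(B, sf.stat(rnorm(n)))
quantile(w0, 0.05)  # reject normality if the observed statistic lies below

Tabulating this quantile over a grid of sample sizes gives exactly the kind of empirical significance points discussed above.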

Another objective was to assess the power of the two extensions RG and zWR simultaneously in one power study, since it seems that this has not been done before. Seier (2002) only compared the W test and the zWR test, whereas Rahman and Govindarajulu (1997) and Farrell and Rogers-Stewart (2006) both only compared the W test and the RG test with each other. In this study, a comparison of all three tests was done, with the result that these three tests have the highest power in most situations. The agreement of power of the W test and the zWR test for almost all alternative distributions is impressive, which shows that there is no loss of power in extending the W test using the procedure of Royston. The RG test often shows similar results, and it is even much better for the beta distributed alternatives. Nevertheless, the zWR test seems to be the better choice, because its power is much better for symmetric long-tailed distributions than the power of the RG test, and because for the symmetric short-tailed alternatives, where the RG test achieves the best results, the power of the zWR test was only slightly smaller.

[Table 4.10: Empirical power for asymmetric alternative distributions for n = 10, 15, 20, 25.]

[Table 4.11: Empirical power for asymmetric alternative distributions for n = 30, 35, 50, 100.]

[Table 4.12: Empirical power for asymmetric alternative distributions for n = 200, 500.]

Even though the JB test is very often used in practice, its power results are sometimes surprisingly bad, which makes this test actually not a good choice in the everyday work of a statistician. For symmetric long-tailed distributions and for the uniform distribution the Jarque-Bera type tests JB and AJB achieved the best power, but for all other types of alternative distributions their power behaviour was clearly inferior to the regression type tests. An often cited argument for the JB test is its easily computable test statistic. However, there is still the need for empirical critical significance points due to the slow asymptotic convergence, as we have seen in section 3.4, so this argument is not convincing. Additionally, all these procedures can nowadays easily be implemented in statistical software, so the argument of easy computation no longer holds.
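The slow convergence can be reproduced with a few lines of R (illustrative settings, not the study's own code): under the null hypothesis the JB statistic is asymptotically $\chi^2(2)$-distributed, yet for moderate n its empirical quantiles still deviate from the limiting ones.

## Finite-sample vs. asymptotic 95% point of the JB statistic under H0.
set.seed(1)
jb.stat <- function(x) {
  m <- mean(x); s2 <- mean((x - m)^2)
  g1 <- mean((x - m)^3) / s2^1.5
  g2 <- mean((x - m)^4) / s2^2
  length(x) / 6 * (g1^2 + (g2 - 3)^2 / 4)
}
jb0 <- replicate(20000, jb.stat(rnorm(50)))
quantile(jb0, 0.95)   # empirical 95% point for n = 50
qchisq(0.95, df = 2)  # asymptotic value, approximately 5.99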

For these reasons we recommend the use of the zWR test as a test for normality, since its overall power behaviour is better than that of any other test considered in our study and because it can be used even for large sample sizes. Besides, the zWR test is implemented in almost all statistical software packages like R, SPSS, SAS or STATISTICA. However, the results of our empirical power studies underline again what has been stated many times in preceding studies on the same subject: there appears to be no omnibus test that is superior against all departures from the normal distribution.
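In R, for instance, the recommended procedure is directly available: the standard shapiro.test function is based on Royston's extension of the W test (algorithm AS R94) and accepts sample sizes from 3 up to 5000. A minimal usage example:

## Royston's extension of the Shapiro-Wilk test as implemented in base R;
## shapiro.test() allows sample sizes between 3 and 5000.
set.seed(1)
shapiro.test(rnorm(200))         # normal data: a large p-value is expected
shapiro.test(rexp(200))$p.value  # exponential data: normality is rejected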

Chapter 5

Conclusion

Besides the use of formal tests for normality, no matter which test is taken, there is one thing that we have not mentioned yet. Testing for normality also always includes a detailed graphical analysis involving, for example, a normal probability plot or a boxplot. Making a decision whether a sample is normally distributed or not without looking at a graphic leaves the investigation incomplete. A famous and often cited quote of John W. Tukey (taken from Thode (2002, p. 15)) puts it in a nutshell:

There is no excuse for failing to plot and look.

As a practitioner one should always keep these words in mind, because not infrequently the results of the formal tests contradict the impressions of the graphical analysis. In particular for large sample sizes, blind trust in the formal tests for normality can lead to erroneous results.
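A minimal R sketch of such a graphical check, using a deliberately non-normal sample for illustration:

## Graphical companion to any formal test: normal Q-Q plot and boxplot.
set.seed(1)
x <- rexp(100)                 # a clearly skewed, non-normal sample
op <- par(mfrow = c(1, 2))     # two plots side by side
qqnorm(x); qqline(x)           # under normality the points follow the line
boxplot(x, main = "Boxplot")   # skewness and outliers are directly visible
par(op)                        # restore the previous graphics settings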

In our work, in particular in the power study, it was of course impossible to consider all known tests for normality; we had to limit our investigations to some very famous ones, and we could also only examine regression type tests and tests based on the third and fourth sample moment. Anyway, by studying the broad literature concerning the theory of testing for normality one finds that most of the tests were developed in the 1960s, 1970s and 1980s. After the development of the Jarque-Bera test in the late 1980s, only a few new tests seem to have appeared (cf. Thode (2002) for an overview). Moreover, none of these new tests could establish itself as a real alternative for testing for normality with an essential improvement of empirical power behaviour. This leads us to the conclusion that the theory of testing for normality, in particular the search for tests for normality with better power behaviour than the already existing ones, seems to be mostly closed. Thus, the validity of our power study should be rated as relatively acceptable, since all of the most important tests for normality known today were included.

The normal distribution is definitely one of the most important distributions in applied statistics, but its importance is, according to the author's experience, overestimated, at least in some areas of applied statistics. The reason for this opinion is that for almost every parametric statistical procedure that demands a fulfilled normality assumption, there is a nonparametric alternative that can be conducted without the need of an underlying normal distribution. Additionally, in another research area (robustness) it was found that the validity of many parametric tests is not or almost not affected when the sample is not normally distributed. The analysis of variance is such an example when the design is balanced, i.e., when for every factor level the sample sizes are equal. In the author's experience, another fact that can often be observed in applied science is some kind of unreflected use of the normal distribution as common practice, which is nicely summarized in the following citation of Daw (1966, p. 2):

[. . .] everybody believes in the Law of Errors (i.e. the Normal distribution), the experimenters because they think it is a mathematical theorem and the mathematicians because they think it is an experimental fact.

Many researchers forget that they should not check whether the data comes from a normal distribution, but should rather realize that testing for normality means checking whether the data could be described by a model that has the normal distribution as its underlying distribution.

To give a final statement: in conjunction with the literature research for this work, the author found a very interesting citation of Geary (1947, p. 241), which became a favourite statement during the completion of this diploma thesis. It puts all the efforts in developing new tests and improving established tests for normality in a nutshell:

Normality is a myth; there never was, and never will be, a normal distribution. This is an overstatement from the practical point of view, but it represents a safer initial mental attitude than any in fashion in the past two decades.

Appendix A

General Definitions

A.1 Definition. (Joint probability density function) Let $X_1,\ldots,X_n$ be $n$ random variables in the probability space $(\Omega, \mathcal{A}, P)$ and $f$ a random variable such that $\int_\Omega f \, dP = 1$. If for $i = 1,\ldots,n$,
$$P(X_1 \in A_1,\ldots,X_n \in A_n) = \int_{A_n} \cdots \int_{A_1} f(x_1,\ldots,x_n)\, dx_1 \cdots dx_n, \qquad A_i \in \mathcal{A},$$
then $f(x_1,\ldots,x_n)$ is called the joint probability density function (pdf) of $X_1,\ldots,X_n$. ♦

A.2 Definition. (Joint cumulative density function) Let $X_1,\ldots,X_n$ be $n$ random variables in the probability space $(\Omega, \mathcal{A}, P)$ and $F: \mathbb{R}^n \to [0,1]$. If
$$F(x_1,\ldots,x_n) = P(X_1 \le x_1,\ldots,X_n \le x_n),$$
then $F(x_1,\ldots,x_n)$ is called the joint cumulative density function (cdf) of $X_1,\ldots,X_n$. ♦

Sufficiency and Completeness

A.3 Definition. (Sufficient statistic) Let $y_1,\ldots,y_n$ denote a sample of iid random observations coming from a random variable $Y$ with pdf $f(Y;\vartheta)$, $\vartheta \in \Theta$. Let $Z = u(y_1,\ldots,y_n)$ be a statistic whose pdf is $g(Z;\vartheta)$. Then $Z$ is called a sufficient statistic for $\vartheta \in \Theta$ if and only if
$$\frac{f(y_1;\vartheta) \cdots f(y_n;\vartheta)}{g(u(y_1,\ldots,y_n);\vartheta)} = H(y_1,\ldots,y_n),$$
where $H(y_1,\ldots,y_n)$ is a function that does not depend upon $\vartheta \in \Theta$ for every fixed value of $Z$. ♦
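As a standard illustration of this criterion (an example added for clarity, not taken from the original text): for an iid sample $y_1,\ldots,y_n$ from $N(\vartheta,1)$ the statistic $Z = \bar y$ has the $N(\vartheta, 1/n)$ density $g$, and
$$\frac{f(y_1;\vartheta)\cdots f(y_n;\vartheta)}{g(\bar y;\vartheta)} = \frac{(2\pi)^{-n/2}\exp\big(-\frac{1}{2}\sum_{i=1}^n(y_i-\bar y)^2 - \frac{n}{2}(\bar y-\vartheta)^2\big)}{\sqrt{n/(2\pi)}\,\exp\big(-\frac{n}{2}(\bar y-\vartheta)^2\big)} = (2\pi)^{-n/2}\sqrt{2\pi/n}\,\exp\Big(-\frac{1}{2}\sum_{i=1}^n(y_i-\bar y)^2\Big)$$
does not depend on $\vartheta$, so $\bar y$ is a sufficient statistic for $\vartheta$.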


A.4 Definition. (Complete statistic) Let the random variable $Y$ have a pdf that is a member of the family $\{f(y;\vartheta) : \vartheta \in \Theta\}$ and let $u(Y)$ be a statistic. If the condition $E(u(Y)) = 0$ for every $\vartheta \in \Theta$ requires that $u(Y)$ be zero except on a set of points that has probability zero for each pdf $f(y;\vartheta)$, $\vartheta \in \Theta$, then the family $\{f(y;\vartheta) : \vartheta \in \Theta\}$ is called a complete family of pdfs. ♦

A.5 Definition. (Complete sufficient statistic) Let $T$ be a sufficient statistic for a parameter $\vartheta \in \Theta$ and let the family $\{f(T;\vartheta) : \vartheta \in \Theta\}$ of pdfs of $T$ be complete. Then $T$ is also called a complete sufficient statistic. ♦

A.6 Theorem. Let $X_1,\ldots,X_n$ denote a random sample from a distribution having a pdf $f(x;\vartheta)$, $\vartheta \in \Theta$, where $\Theta$ is an interval set. Let $T = t(X_1,\ldots,X_n)$ be a complete sufficient statistic for $\vartheta \in \Theta$. If the distribution of any other statistic $U = u(X_1,\ldots,X_n)$ does not depend upon $\vartheta$, then $U$ is stochastically independent of the sufficient statistic $T$.

Proof: See Hogg and Craig (1978, pp. 389–390).

Large-Sample Theory

A.7 Definition. (Convergence in Distribution) A sequence of random vectors $\{X_n\}$ with cdfs $\{F_{X_n}(\cdot)\}$ is said to converge in distribution to a random vector $X$ with cdf $F_X(\cdot)$ if
$$F_{X_n}(x_1,\ldots,x_n) \to F_X(x_1,\ldots,x_n) \quad \text{as } n \to \infty,$$
for every $x = (x_1,\ldots,x_n)$ at which $F_X(x_1,\ldots,x_n) = P\{X_1 \le x_1,\ldots,X_n \le x_n\}$ is continuous. Convergence in distribution is denoted by $X_n \xrightarrow{D} X$. Alternative names for this convergence are weak convergence or convergence in law. ♦

A.8 Definition. (Convergence in Probability) A sequence of random vectors $\{X_n\}$ converges in probability to a random vector $X$, written $X_n \xrightarrow{P} X$, if, for all $\varepsilon > 0$,
$$P\{|X_n - X| > \varepsilon\} \to 0 \quad \text{as } n \to \infty.$$
♦

A.9 Theorem. (Multivariate Central Limit Theorem) Let $X_n' = (X_{n,1},\ldots,X_{n,k})$ be a sequence of iid random vectors with mean vector $\mu' = (\mu_1,\ldots,\mu_k)$ and covariance matrix $\Sigma = E((X_1-\mu)(X_1-\mu)')$. Let $\bar{X}_{n,j} = n^{-1}\sum_{i=1}^n X_{i,j}$, $j = 1,\ldots,k$. Then
$$\frac{1}{\sqrt{n}} \sum_{i=1}^n (X_i - \mu) = \sqrt{n}\,\big((\bar{X}_{n,1}-\mu_1),\ldots,(\bar{X}_{n,k}-\mu_k)\big)' \xrightarrow{D} N(0,\Sigma) \quad \text{as } n \to \infty.$$
Proof: See Lehmann (1999, p. 313).

A.10 Theorem. (Delta Method) Suppose $X_1, X_2,\ldots$ and $X$ are random vectors in $\mathbb{R}^k$. Assume $\tau_n(X_n - \varphi) \xrightarrow{D} X$, where $\varphi$ is a constant vector and $\{\tau_n\}$ is a sequence of constants with $\tau_n \to \infty$. Suppose $g$ is a function from $\mathbb{R}^k$ to $\mathbb{R}$ which is differentiable at $\varphi$ with gradient (vector of first partial derivatives) of dimension $k$ at $\varphi$ equal to $\nabla g(\varphi)$. Then,
$$\tau_n(g(X_n) - g(\varphi)) \xrightarrow{D} (\nabla g(\varphi))'X \quad \text{as } n \to \infty.$$
In particular, if $\sqrt{n}(X_n - \varphi)$ converges in law to a multivariate normal distribution $N(0,\Sigma)$, the conclusion of the theorem is that the sequence $\sqrt{n}(g(X_n) - g(\varphi))$ converges in law to the $N(0, (\nabla g(\varphi))' \Sigma \nabla g(\varphi))$ distribution.
Proof: See Lehmann (1999, p. 86).
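A standard one-dimensional illustration (an example added for clarity, not taken from the original text): if $\sqrt{n}(\bar X_n - \mu) \xrightarrow{D} N(0,\sigma^2)$ and $g(x) = x^2$ with $\mu \neq 0$, then $\nabla g(\mu) = 2\mu$ and
$$\sqrt{n}\,(\bar X_n^2 - \mu^2) \xrightarrow{D} N(0,\,4\mu^2\sigma^2).$$
The asymptotic distributions of smooth functions of sample moments, such as the shape statistics in chapter 3, are obtained in exactly this way from Theorems A.9 and A.10.

Bibliography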

F. J. Anscombe and W. J. Glynn. Distribution of the kurtosis statistic b2 for normal samples. Biometrika, 70:227–234, 1983.

N. Balakrishnan and A. C. Cohen. Order Statistics and Inference. Academic Press, Boston, 1991.

G. Blom. Statistical Estimates and Transformed Beta-Variables. John Wiley & Sons, New York, 1958.

D. G. Bonett and E. Seier. A test of normality with high uniform power. Computational Statistics & Data Analysis, 40:435–445, 2002.

K. O. Bowman and L. R. Shenton. Omnibus Test Contours for Departures from Normality based on √b1 and b2. Biometrika, 62:243–250, 1975.

R. B. D'Agostino. Transformation to Normality of the Null Distribution of g1. Biometrika, 57:679–681, 1970.

R. B. D'Agostino. Departures from Normality, Testing for. In S. Kotz, N. L. Johnson, and C. B. Read, editors, Encyclopedia of Statistical Sciences, volume 2. North Holland, Amsterdam, 1982.

R. B. D’Agostino. Tests for the Normal Distribution. In R. B. D’Agostino and M. A. Stephens, editors, Goodness-of-fit Techniques, chapter 9. Marcel Dekker, Inc., New York, 1986a.

R. B. D’Agostino. Graphical Analysis. In R. B. D’Agostino and M. A. Stephens, editors, Goodness-of-fit Techniques, chapter 2. Marcel Dekker, Inc., New York, 1986b.

R. B. D'Agostino and E. S. Pearson. Tests for departure from normality. Empirical results for the distributions of b2 and √b1. Biometrika, 60:613–622, 1973.

R. B. D’Agostino and G. L. Tietjen. Simulation probability points of b2 for small samples. Biometrika, 58:669–672, 1971.

R. B. D’Agostino and G. L. Tietjen. Approaches to the Null Distribution of √b1. Biometrika, 60:169–173, 1973.


C. S. Davis and M. A. Stephens. Approximating the Covariance Matrix of Normal Order Statistics. Applied Statistics, 27:206–212, 1978.

R. H. Daw. Why the normal distribution. Journal of Statistical Simulation, 18:2–15, 1966.

P. Deb and M. Sefton. The distribution of a Lagrange multiplier test of normality. Economics Letters, 51:123–130, 1996.

M. Falk, F. Marohn, and B. Tewes. Foundations of Statistical Analysis and Applications with SAS. Birkhäuser, Basel, 2002.

P. J. Farrell and K. Rogers-Stewart. Comprehensive Study of Tests for Normality and Symmetry: Extending the Spiegelhalter Test. Journal of Statistical Computation and Simulation, 76:803–816, 2006.

J. J. Filliben. The Probability Plot Correlation Coefficient Test for Normality. Technometrics, 17:111–117, 1975.

R. C. Geary. Testing for Normality. Biometrika, 34:209–242, 1947.

Y. R. Gel and J. L. Gastwirth. A robust modification of the Jarque-Bera test of normality. Economics Letters, 99:30–32, 2008.

H. J. Godwin. Some Low Moments of Order Statistics. The Annals of Mathematical Statistics, 20:279–285, 1948.

J. M. Hammersley and K. W. Morton. The estimation of location and scale parameters from grouped data. Biometrika, 41:296–301, 1954.

H. L. Harter. Expected Values of Normal Order Statistics. Biometrika, 48:151–165, 1961.

H. L. Harter and N. Balakrishnan. Tables for the Use of Order Statistics in Estimation. CRC Press, Boca Raton, 1996.

R. V. Hogg and A. T. Craig. Introduction to Mathematical Statistics. Macmillan, New York, 1978.

S. M. A. Jahanshahi, H. Naderi, and R. Moayeripour. Jarque Bera and Its Modification. In Iranian Statistical Conference, pages 38–55, 2008.

C. M. Jarque and A. K. Bera. Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals. Economics Letters, 6:255–259, 1980.

C. M. Jarque and A. K. Bera. A Test for Normality of Observations and Regression Residuals. International Statistical Review, 55:163–172, 1987.

N. L. Johnson and S. Kotz. Discrete distributions. John Wiley & Sons, New York, 1969.

N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous univariate distributions 2. Mifflin, Boston, 1970.

N. L. Johnson, S. Kotz, and N. Balakrishnan. Continuous univariate distributions 1. Mifflin, Boston, 1994.

S. Keskin. Comparison of Several Univariate Normality Tests Regarding Type I Error Rate and Power of the Test in Simulation based Small Samples. Journal of Applied Science Research, 2:296–300, 2006.

J. H. Kwak and S. Hong. Linear Algebra. Birkhäuser, Boston, 1997.

E. L. Lehmann. Elements of Large Sample Theory. Springer, New York, 1999.

D. S. Moore. Tests of Chi-Squared Type. In R. B. D’Agostino and M. A. Stephens, editors, Goodness-of-fit Techniques, chapter 3. Marcel Dekker, Inc., New York, 1986.

K. J. Ord. Families of Frequency Distributions. Number 30 in Griffin’s statistical monographs & courses. Griffin, London, 1972.

R. S. Parrish. Computing Expected Values of Normal Order Statistics. Communications in Statistics – Simulation and Computation, 21:57–70, 1992a.

R. S. Parrish. Computing Variances and Covariances of Normal Order Statistics. Communica- tions in Statistics – Simulation and Computation, 21:71–101, 1992b.

R. S. Parrish. New tables of coefficients and percentage points for the W test for normality. Journal of Statistical Computation and Simulation, 41:169–185, 1992c.

J. K. Patel and C. B. Read. Handbook of the Normal Distribution. Dekker, New York, 1996.

E. S. Pearson. A Further Development of Tests for Normality. Biometrika, 22:239–249, 1930a.

E. S. Pearson. Note on Tests for Normality. Biometrika, 22:423–424, 1930b.

E. S. Pearson, R. B. D’Agostino, and K. O. Bowman. Tests for Departure from Normality: Comparison of Powers. Biometrika, 64:231–246, 1977.

R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria, 2009. URL http://www.R-project.org. ISBN 3-900051-07-0.

M. M. Rahman and Z. Govindarajulu. A Modification of the Test of Shapiro and Wilk for Normality. Journal of Applied Statistics, 24:219–235, 1997.

J. P. Royston. An Extension of Shapiro and Wilk's W Test for Normality to Large Samples. Applied Statistics, 31:115–124, 1982a.

J. P. Royston. Algorithm AS 177. Expected Normal Order Statistics (exact and approximate). Applied Statistics, 31:161–165, 1982b.

J. P. Royston. Approximating the Shapiro-Wilk W-test for non-normality. Statistics and Computing, 2:117–119, 1992.

A. E. Sarhan and B. G. Greenberg. Estimation of Location and Scale Parameters by Order Statistics from Singly and Doubly Censored Samples Part I. Annals of Mathematical Statistics, 27:427–451, 1956.

S. K. Sarkar and W. Wang. Estimation of Scale Parameters Based on a Fixed Set of Order Statistics. In N. Balakrishnan and C. R. Rao, editors, Handbook of Statistics, volume 17, chapter 6. Elsevier, Amsterdam, 1998.

SAS Institute Inc. SAS Statistical Procedures, Version 9. Cary, NC, 2003.

E. Seier. Comparison of Tests for Univariate Normality. Interstat, 1:1–17, 2002.

S. S. Shapiro. Distribution Assessment. In N. Balakrishnan and C. R. Rao, editors, Handbook of Statistics, volume 17, chapter 17. Elsevier, Amsterdam, 1998.

S. S. Shapiro and R. S. Francia. An Approximate Analysis of Variance Test for Normality. Journal of the American Statistical Association, 67:215–216, 1972.

S. S. Shapiro and M. B. Wilk. An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 52:591–611, 1965.

S. S. Shapiro, M. B. Wilk, and H. J. Chen. A Comparative Study of Various Tests for Normality. Journal of the American Statistical Association, 63:1343–1372, 1968.

B. L. Shea and A. J. Scallan. AS R72. A Remark on Algorithm AS128: Approximating the Covariance Matrix of Normal Order Statistics. Applied Statistics, 37:151–155, 1988.

M. A. Stephens. Tests Based on EDF Statistics. In R. B. D’Agostino and M. A. Stephens, editors, Goodness-of-fit Techniques, chapter 4. Marcel Dekker, Inc., New York, 1986.

A. Stuart and J. K. Ord. Kendall’s Advanced Theory of Statistics, volume 1. Oxford University Press, London, 1987.

T. Thadewald and H. Büning. Jarque-Bera Test and its Competitors for Testing Normality – A Power Comparison. Journal of Applied Statistics, 37:87–105, 2007.

H. C. Thode. Testing for Normality. Marcel Dekker, New York, 2002.

C. M. Urzua. On the correct use of omnibus tests for normality. Economics Letters, 53:247–251, 1996.

S. Weisberg and C. Bingham. An Approximate Analysis of Variance Test for Non-Normality Suitable for Machine Calculation. Technometrics, 17:133–134, 1975.

P. H. Westfall. Multiple Comparisons and Multiple Tests. SAS Institute Inc., Cary, NC, 1999.

H. Witting. Mathematische Statistik I. Teubner, Stuttgart, 1985.

B. Yazici and S. Yolacan. A comparison of various tests of normality. Journal of Statistical Computation and Simulation, 77:175–183, 2007.

P. Zhang. Omnibus Test for Normality Using the Q Statistic. Journal of Applied Statistics, 26:519–528, 1999.

Declaration

I affirm that I, Johannes Hain, have prepared the present diploma thesis independently and using only the literature indicated.

Würzburg, 4 August 2010