pISSN: 2093-940X, eISSN: 2233-4718
Journal of Rheumatic Diseases Vol. 26, No. 1, January, 2019
https://doi.org/10.4078/jrd.2019.26.1.5
Review Article

Normality Test in Clinical Research

Sang Gyu Kwak1, Sung-Hoon Park2
1Department of Medical Statistics and 2Division of Rheumatology, Department of Internal Medicine, Catholic University of Daegu School of Medicine, Daegu, Korea

In data analysis, because many statistical methods assume that the population data follow a normal distribution, it is essential to check and test whether the data satisfy the normality requirement. Because the appropriate analytical method depends on whether normality is satisfied, inconsistent results might be obtained depending on the analysis method used. In many clinical research papers, the results are presented and interpreted without checking or testing normality. According to the central limit theorem, the distribution of the sample mean is approximately normal when the number of samples is above 30. However, in many clinical studies, due to cost and time restrictions during data collection, the number of samples is frequently lower than 30. In this case, a normality test should be performed to determine whether normality is satisfied, so that a proper statistical analysis method can be chosen. In this regard, this paper discusses the normality check, several methods of normality test, and several statistical analysis methods with or without normality checks. (J Rheum Dis 2019;26:5-11)

Key Words. Normality check, Normal distribution, Normality test, Statistical analysis method

Received: August 22, 2018, Revised: September 14, 2018, Accepted: September 14, 2018
Corresponding to: Sung-Hoon Park http://orcid.org/0000-0002-3218-5420 Division of Rheumatology, Department of Internal Medicine, Catholic University of Daegu School of Medicine, 33 Duryugongwon-ro 17-gil, Nam-gu, Daegu 42472, Korea. E-mail: [email protected]
Copyright ⓒ 2019 by The Korean College of Rheumatology. All rights reserved. This is an Open Access article, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION

In data analysis, because many statistical methods assume that the population data follow a normal distribution, it is essential to check and test whether the data satisfy the normality requirement. For example, when the distributions of two independent groups are compared, the two-sample t-test, which is a parametric method, is used if both populations satisfy the normality requirement, and the Mann-Whitney U-test, which is a nonparametric method, is used if they do not [1]. The two-sample t-test assumes normality, whereas the Mann-Whitney U-test does not. If the data satisfy normality, the distributions of the two groups can be compared with a two-sample t-test based on the means and standard deviations. However, if normality is not satisfied, the Mann-Whitney U-test is used, which does not use the mean and standard deviation and concludes that the two groups are similar if their rankings are similar.

Because the appropriate analytical method depends on whether the normality requirement is satisfied, inconsistent results might be obtained depending on the analysis method used. Said differently, it can be concluded that two independent groups have the same distribution although they are in fact different. Conversely, it can be concluded that two independent groups differ although their distributions are in fact the same. To avoid these problems, it is necessary to check and test whether the normality requirement is satisfied.

In many clinical research papers, results are presented and interpreted without checking or testing normality. When a reviewer requests a normality check or test during the review process, the normality test is carried out and the submitted paper is corrected accordingly. However, when the lack of a normality check or test goes unnoticed, the results are frequently presented without a normality test. If the statistical analysis method assumes normality, a normality test should be performed to check whether the normality requirement is satisfied. One of the reasons why normality tests are not performed is that researchers' understanding of the statistical analysis methods is low.

Furthermore, the average of the collected data is the sample mean. According to the central limit theorem, the distribution of the sample mean is approximately normal when the number of samples is larger than 30 [2]. Therefore, if the number of samples is larger than 30, the analysis can be performed on the assumption that normality is satisfied. In clinical studies, however, the number of samples is frequently lower than 30. The reasons for this scarcity of samples include cost and time restrictions during data collection. In this case, a normality test should be performed to determine whether the normality requirement is satisfied, so that a proper statistical analysis method can be chosen.

The remainder of this paper is structured as follows. First, we introduce the normality check, which is followed by the introduction of several methods of normality test. In addition, some statistical analysis methods that should be used when the normality requirement is or is not satisfied are described for data analysis in clinical studies.

MAIN SUBJECTS

Normality check

There are four methods to check whether the collected data satisfy the normality requirement. These methods check normality using a plot and several statistics, such as the mean, median, skewness, and kurtosis.

1) Distribution plot
A distribution plot of the collected data is useful for checking the normality of the data. The distribution of the data should be checked to confirm that it does not deviate too much from the normal distribution.

2) Difference value between mean and median
The mean is the simple arithmetic average of a given set of values or quantities. The median is a positional average and is defined as the middle number in an ordered list of values. In a normal distribution, the graph appears as a classical, symmetrical “bell-shaped curve.” The mean, or average, and the mode, or maximum point on the curve, are equal, and by symmetry the median coincides with them. Hence, the difference value between the mean and the median is close to zero in a normal distribution. However, when the difference value between the mean and the median is big, the distribution is skewed to the right or to the left.

3) Skewness and kurtosis
Skewness is a measure of the “asymmetry” of the probability distribution, in which the curve appears distorted or skewed either to the left or to the right. In a perfect normal distribution, the tails on either side of the curve are exact mirror images of each other. When a distribution is skewed to the left, the tail on the curve's left-hand side is longer than that on the right-hand side, and the mean is less than the mode. This situation is also referred to as negative skewness. When a distribution is skewed to the right, the tail on the curve's right-hand side is longer than the tail on the left-hand side, and the mean is greater than the mode. This situation is also referred to as positive skewness.

Kurtosis is a measure of the “tailedness” of the probability distribution, that is, of how heavy its tails are. Distributions with zero excess kurtosis are called mesokurtic or mesokurtotic. The most prominent example of a mesokurtic distribution is the normal distribution. A distribution with a positive excess kurtosis is called leptokurtic or leptokurtotic. In terms of shape, a leptokurtic distribution has fatter tails. Examples of leptokurtic distributions include the Student's t-distribution, exponential distribution, Poisson distribution, and the logistic distribution. A distribution with a negative excess kurtosis is called platykurtic or platykurtotic. Examples of platykurtic distributions include the continuous or discrete uniform distributions and the raised cosine distribution. The most platykurtic distribution is the Bernoulli distribution with p = 1/2.

4) Q–Q plot
A Q–Q plot is a plot of the quantiles of two distributions against each other, or a plot based on the estimates of the quantiles. The pattern of points in the plot is used to compare the two distributions. The main step in constructing a Q–Q plot is calculating or estimating the quantiles to be plotted. If one or both of the axes in a Q–Q plot are based on a theoretical distribution with a continuous cumulative distribution function (CDF), all quantiles are uniquely defined and can be obtained by inverting the CDF. If a theoretical probability distribution with a discontinuous CDF is one of the two compared distributions, some quantiles may not be defined, so an interpolated quantile may be plotted. If the Q–Q plot is based on the data, there are multiple quantile estimators in use. The rules for forming Q–Q plots when quantiles must be estimated or interpolated are called plotting positions.

A simple case is when there are two data sets of the same size.

to draw conclusions about whether the collected data satisfy the normality requirement. Only a rough guess can be made in this respect. Therefore, to obtain a definite answer, we have to consider a statistical test for normality. There are several methods to perform a normality test. The Kolmogorov-Smirnov test, the Shapiro-Wilk test, and the
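
The descriptive checks in 2) to 4) can be illustrated with a short Python sketch. It is not the authors' procedure: the function names `describe_normality` and `qq_points`, the sample values, the use of population (divide-by-n) moments, and the plotting positions (i - 0.5)/n are all assumptions made for this example, and statistical packages often apply different small-sample corrections.

```python
from statistics import NormalDist, mean, median


def describe_normality(data):
    """Return the mean-median difference, skewness, and excess kurtosis.

    Assumption: population (divide-by-n) central moments are used, with no
    small-sample bias correction.
    """
    n = len(data)
    m = mean(data)
    # Second, third, and fourth central moments.
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    if m2 == 0:
        raise ValueError("all values are identical; moments are undefined")
    return {
        "mean_minus_median": m - median(data),
        "skewness": m3 / m2 ** 1.5,             # 0 for a symmetric sample
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


def qq_points(data):
    """Pair standard-normal quantiles with the ordered sample values.

    Assumption: plotting positions (i - 0.5) / n, one of several conventions.
    """
    n = len(data)
    ordered = sorted(data)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


if __name__ == "__main__":
    # Hypothetical samples, chosen only to show the direction of each statistic.
    symmetric = [1, 2, 3, 4, 5]
    right_skewed = [1, 1, 2, 2, 3, 10]
    print(describe_normality(symmetric))
    print(describe_normality(right_skewed))
    print(qq_points(symmetric))
```

For a symmetric sample such as the hypothetical `symmetric` list, the mean-median difference and the skewness are zero, whereas a long right tail makes both positive. Plotting the pairs returned by `qq_points` and judging whether they lie near a straight line gives the visual Q–Q check; as the text notes, such checks give only a rough guess and do not replace a formal normality test.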