When Does the Pooled Variance T-Test Fail?
Total Page:16
File Type:pdf, Size:1020Kb
African Journal of Mathematics and Computer Science Research Vol. 2(4), pp. 056-062, May, 2009 Available online at http://www.academicjournals.org/AJMCSR © 2009 Academic Journals Full Length Research Paper When does the pooled variance t-test fail? Teh Sin Yin* and Abdul Rahman Othman School of Distance Education, Universiti Sains Malaysia, 11800 USM, Penang, Malaysia. E-mail: [email protected] or [email protected]. Accepted 30 April, 2009 The pooled variance t-tests used prominently for comparing means between two groups is usually restricted with the assumptions of normality and homogeneity of variances. However, the violation of the assumptions happens in many real world data. In this study, the conditions where the pooled variance t-test would fail were investigated. The performance of the t-test was evaluated under different conditions. They were sample sizes, type of distributions (normal or non-normal), and unequal group variances. The Type I error rates and power of the pooled variance t-tests for different designs were obtained and compared. The results showed that the test failed dramatically when the group sample sizes were small and unequal with slight departure from homogeneity of variances. Key words: Pooled variance, power, t-test, type 1 error. INTRODUCTION The t-test first proposed in 1908 by William Sealy Gosset, mance of the t-test was evaluated under different condi- a statistician working for the Guinness brewery in Dublin, tions. They were sample sizes, type of distributions (nor- Ireland ("Student" was his pen name) (Mankiewicz, 1975; mal or non-normal), and equal/unequal group variances. O'Connor and Robertson, 1999-2004; Raju, 2005). Gos- The Type I error rates and power of the pooled variance set had been hired due to Claude Guinness's innovative t-tests for different designs were obtained and compared. policy of recruiting the best graduates from Oxford and This paper is organized as follows. In the second part, Cambridge to apply biochemistry and statistics to Guin- the pooled variance t-test, statistical power and effect ness' industrial processes (O'Connor and Robertson, sizes are reviewed. In the third part, the design specifi- 1999-2004). Gosset devised the t-test as a method to cations are discussed. The algorithms to obtain the Type cheaply monitor the quality of beer and the t-test was I error rates and the power rates of traditional pooled published in Biometrika in 1908 (Student, 1908). If the variance t-test are given in the fourth part. The fifth part two population distributions assumed to have homoge- describes the results and discussion. The conclusion and nous variances, the pooled variance t-test was used for some recommendation for future studies are in the final comparison of means. Comparing means is often part of section. an analysis, for data arising in both experimental and observational studies. In fact, many of the applications of t-test have appeared in the text book and literature for the METHODS physical sciences and engineering (Dougherty, 1990; Pooled variance t-test Kinney, 2002; Mendenhall and Sincich, 2007; Miller, 1995; Vardernan, 1994). Let X = X , X ,…, X andY = Y ,Y ,…,Y , where X and Although the pooled variance t-test statistical method 1 2 n 1 2 n Y are independent and identically distributed with was widely used in physical sciences and engineering, it standard normal distribution. The difference between the is usually restricted with the assumptions of normality and homogeneity of variances. No doubt, the violation of the two samples means, X and Y can be tested using the t- assumptions happens in many real world mdata. Hence, test. The hypothesis test is stated as: the purpose of this study is to investigate the conditions H : - = 0 when the pooled variance t-test would fail. The perfor- 0 x y Ha : x - y ≠ 0 *Corresponding author. E-mail: [email protected]. If the two population distributions were unknown but Yin and Othman 057 assumed to have the same variance, then their standard i.i.d Z ,Z ,…,Z ~ N(0,1). deviations say sx and sy can be pooled together. The 1 2 n pooled variance estimator of 2 is given by taking a Type I error ( ) is defined as probability of rejecting H weighted average of the sample variances, that is: 0 when null hypothesis is true. In this study, the nominal 2 2 level is = 0.05. The empirical Type I error rate which 2 sx (nx -1)+ sy (ny -1) sp = were close to nominal level considered as robust. The p- (n -1)+(n -1) x y value of two tailed t-test considered in this study is given (1) by Where; 2 p= 2P t > t0 | t ~ t,n +n -2 (5) sp = pooled sample variance from population X ( x y ) and Y, s2 = variance of sample X, x Statistical power 2 sy = variance of sample Y, The definition of power is1- , where is the probability nx = size of sample X, of Type II error. Based on this definition, the power of a n = size of sample Y. y test is given by Therefore, the test statistic for pooled variance t-test is Power = 1 – P(Accept H0 | H1 is true) defined as: = P(Reject H0 | H1 is true). (6) X -Y - ( x - y ) Essentially, the power of a test depends on three factors, t = which are the criterion of significance ( ), sample sizes 1 1 ’ 2 ∆ ÷ (n) and the effect sizes. Sp ∆ + ÷ «nx ny ◊ (2) Effect size Where; x = population mean for X, The effect sizes of the statistic under the design specifi- cations were examined to find the possible choices for y = population mean for Y. x and y . The effect size index (d) for the two group’s Two sample means need to be estimated before estima- samples is defined as: ting the variances, thus, two degrees of freedom are lost, and the number of degree of freedom is given by x - y d = (7) df = nx + ny - 2 . (3) Where The p-value of one tail t-test is then defined as = standard deviation of either population (if they are p = P t > t0 | t ~ t,n +n -2 ( x y ) equal) or the smallest standard deviation (if they are (4) unequal). Where; According to Cohen (1988), the effect size is considered t0 is the calculated t value. The null hypothesis is rejected when p-value ≤ = 0.05. Small when d = 0.2, The traditional pooled variance t-test is used to assess Medium when d = 0.5, the fidelity of Type I error rates in experimental designs. Large when d = 0.8. In this study, other than the standard normal data, the skewed data was considered by using the property of The effect size index is used to choose values of population means for the true alternative hypothesis. n 2 2 Assume the x > y so that x - y would be positive, the Y = ƒZi ~ n i =1 possible choices of x by assuming y = 0, out of the Where; infinite values of Ha : x ≠ y were given in Table 1. 058 Afr. J. Math. Comput. Sci. Res. Table 1. Possible choices of x and y for all of the conditions. Group Group Pooled Effect Range for sample sizes variances variances sizes, d x - y {5, 15} {1, 1} 1.000 - 0.5, 1, 1.5, … x y 0.5164 {5, 15} {1, 2} 1.778 x - y 1.0, 1.5, 2.0, … 0.6886 {25, 35} {4, 1} 2.241 - 0.5, 1, 1.5, … x y 0.3920 {25, 35} {3, 1} 1.828 - 0.5, 1, 1.5, … x y 0.3540 {25, 35} {1, 1} 1.000 - 0.5, 1, 1.5, … x y 0.2619 {25, 35} {1, 43} 25.621 - 1.0, 1.5, 2.0, … x y 1.3255 {25, 35} {1, 44} 26.207 - 1.0, 1.5, 2.0, … x y 1.3405 {20, 20} {1, 1} 1.000 - 0.5, 1, 1.5, … x y 0.3162 {20, 20} {1, 11} 6.000 - 1.0, 1.5, 2.0, … x y 0.7746 {20, 20} {1, 18} 9.500 x - y 1.0, 1.5, 2.0, … 0.9747 Design specification This study was based on simulated data. In terms of data generation, the SAS generator RANDGEN (SAS In evaluating the performance of the test procedures, Institute Inc., 2004) was used to obtain pseudo-random three variables were manipulated. They were: (i) group standard normal variates (RANDGEN(Y, ‘NORMAL’)) sizes (ii) type of distribution – normal or non-normal and and to generate the chi-squared variates with three de- (iii) pairing of equal/unequal variances. We used small grees of freedom (RANDGEN(Y, ‘CHISQUARE’, 3)). unequal group sizes (5, 15), large unequal group sizes The performances of the pooled variance t-test for the (25, 35), and small equal group sizes (20, 20). For non- specific designs were collected. Here, the group means normal distributions, we chose the chi-square distribu- are set to {0, 0} as to reflect the null hypothesis of equal 2 tions with three degrees of freedom ( 3 ) to represent a means (H0: x = y ). The study conditions that were mild skewed distribution. In terms of variance heteroge- considered: neity, different ratios of positive pairing and negative pairing were examined until a break down condition was Group sample sizes: {5, 15}, {25, 35}, {20, 20}. found for each design. Distributions: Normal, Chi-square 3 degrees of freedom Unequal group sizes, when paired with unequal 2 ( 3 ) variances, can affect Type II error control for tests that Group variances: {1, 1} and various pairings.