SOME PROPERTIES of the COEFFICIENT of VARIATION and F STATISTICS with RESPECT to TRANSFORMATIONS of the FORM Xk*
Total Page:16
File Type:pdf, Size:1020Kb
SOME PROPERTIES OF THE COEFFICIENT OF VARIATION AND F STATISTICS WITH RESPECT TO TRANSFORMATIONS OF THE FORM Xk* B. M. Rao a...nd W. T. Federer I. INTRODUCTION The consequences of non-normality, non-additivity, and heterogeneity of error variances on F tests in the analysis of variance have been studied in a n~~ber of papers (e.g., Bartlett [1,2], Box and Anderson [3], Cochran [4,5], Cu.:ctiss [6], Harter and lJ..l!a [7], Nissen and Ottestad [8), Pearson [9], Tuk.ey [10]). It appears that some of the conclusions drawn need fill~ther investiga.- tion. Also, a statistical ~rocedure designed "t9 determine the scale of measure- ment required to obtain any of these conditions is not available. Tukey [11] has n~e some progress in this direction; ~he statistic designed to determine the appropriate scale of measurement for a set of data to obtain normality is u.navaila.ble • In an attempt to firrl such a statistic, and in vievT of the folloWing comment made by Pearson [9] "• • • But in the more extrem~e cases of non- no~mal variation, where the standard deviation becomes an unsatisfactory measure of variability, there will always be a danger of accepting the hypo· to pick out a real difference,", it was decided to investigate the minimum coefficient of variation (C.V), a minimum F ratio for Tukey 1s one degree of freedom for non-additivity, and a maximum F ratios for row and column mean squares. Some ~nalytic results were obtained for the coefficient of variation in relation to positive integer values of k in the transforma- tion xt. In addition, a Monte Carlo investigation of a series of transforma• tiona Of the form Xk for k=%lrt2,••• 1 ±10 and for fractional values of k=~.8,±.6,±.5,±.4, and ±.2 were performed on random normal deviates for various population coefficients of variation (i.e., 5%, 100/15%, lo%, and * Department of Plant Breeding No. 426, Biometrics Unit No. BU-72. This work was supported in p:~.rt by a National Intitutes of Health Research Grant RG-5900. -2- 207~). The effect of these various transformations on the c.v. and various F tests was studied. II. ANALYTICAL RESULTS Notation: Let X be a 1·andom variable normally distributed with the mean i.J. and variance i? [N(Il, cf)] where 11> 0 and cr > 0. Let k be an integer greater than or equal to 1 (i.e., k=1,2,3,•••). Let ak be the kth moment of the random variable X, co - ~ (x-llY3 i.e., e 2a- dx for k=1,2,3, • • •. Clo is defined as equal to 1 and Ct..tc is defined as equal to zero. Lennna: If X is a random variable distributed N(!l,i? ), cr>O then This can easily be proved directly from the definition of ak• [13] II> - t(x1 -x)2 2 1 =1 The sample coefficient of variation is defined as s 1x- where s = ~-~~~-- n-1 n L.xt is the variance and t=l is the mean of the observed sample. The popula x= n tion coefficient of variation can be written as o/!l (i.e., standard deviation divided by mean for any f(x)). Theorem 1: Let X be a random variable N(!l,ifl ), ll >0, cr>O and let Xk be the transformation of X for k=l,2,3,···· Then the coefficient of variation of xk increases as k increases provided 3114 > cr4 • -3- -4- Proof: We have to prove 1_ ]112 L0zk+2~-H ---~----- > -------- ak+l 0:1: Because of the 1JOsitiveness of tbe moments and after some simplication it is sufficient to prove that O:.,alc+2 oC;~-o.',ek 0 ~+1 >O (1) \•Then }{. is even the truth of (1) is easily established from the :repeated ap- plication of the Lemma and it should be noted that in tbis case the stipulated condition that 31l4 > cr4 in -.theorem 1 is not necessary, \tJhen k is odd the proof is a little difficult 'bec1.use t:he ter:ms in (l) are to be expressed in !l and cro After some me..nipulation it can be sbo'm that the terms containing 1-1. and cr of (1) are positive and the term containing cr only is negativeo Here o~ly the set condition of the theorem is used to prove (1). Here also it should be noted that for sufficie;:1.tl;:l large k the condition becOines essentially ~l >Oc ll3] Since only the absolute value of the coefficient of variation has meaning in the appl::.cation the sign can be ignored. !n such a case e'ren if !l < 0 (but !l ~ 0) the theorem is true as the ma;sYJ.i tnd.e of a momer ..:t does not change except the sign and that, too, for odd mom.;rrts Oi.•.Jy, If iJ. ;::; 0 the invalidity of the theorem can easily be seen~ f.heore~ ~; Let log X be no::-m'J.lly distributed w:i.t:1 msan J.L &.::d variance CS 2 (CS > 0) v Le'.:. XK be a transformation for all real k, Then the coefficient of -va:dation of :x:k increases as Iki incrf.:lases and coefficient of variation of xk = coefficient of variation of x-kv 'rhe truth of the theorem can easily be seeno[lJ]-l(- * Theorem 2 is given on page ll~ III. m~IRICAL RESULTS (1) The data and the sampling design: From the book "A Million Random Digits with 1001 000 Normal Deviates," by The Rand Corporation [12], pages 211 55, 107, 159 and 185 of Normal Deviates were randomly selected. On each page there are ten groups of 50 diviates each. The selected pages combined give 50 such groups which we call samples, and each sample represents a two-way classification of five rows and ten columns. To each observation in the sample numbers 5, 10, 15 and 20 were added. This addition results in population means of 5, 10, 15 and 20 with the variance equal to unity in each case. These samples are designated as Category 1. A second set of 50 samples of two rovrs and ten colunms was obtained by taking the first two rows and ten columns of the 50 category 1 samples. This second set of samples was treated in the same manner as the category 1 samples, and was designated as Category 2. On these two categories a number of transformations x- 10 , X- 9 , X- 8 , x- 7 , X-s , x-4 , x-3 , x-z , ,r-1A , 1 og x, x, x2 , x3 , x4 , xs , xs , x7 , xe , xs, X1o have been performed and the analyses of variance have been computed. The ~ "5 , fJ "6 , fJ •a has also been conducted on the 50 samples of the tv1o categories each with 5 added to each observation of the sample. Log X is said to fit into the above transformations because (n+l)Jxn•dx=xn+l for all values of n except when n=-1 \vhich is log X, i.e., J~ = log X • The different statistics obtained are shown below. -5- -6- Source of Variation d.f. F c.v.% Rmrs r-1 FR[(r-l),(r-l)(c-1}] Columns c-1 Fc[(c-l),(r-l)(c-1)] /Error Mean Squa.re:XlOO En1 o:c (r-l)(c-1) x• ./cr Tukey's 1 Fy[l,(r-l}(c-1)-1] Non-Additivity Residual (r-1) (c-1)-1 The integer trru~formations and the averages of the statistics over fifty samples of categories 1 and 2 are given in Tables I and II respectively. Table III contains the fractional transformations and the averages of the statistics of both the categories. (The statistics of individual samples are not given here but may be obtained from the Biometrics Unit, Cornell University, Ithaca, Ne'\-r York. ) In order to accommodate the fractional and log transformations, all observations must be positive. Therefore, the minimum number that can be added to each observation in order to have all observations positive is 5· This automatically satisfies the condition of theorem 1. (2) Coefficient of variation: The coefficient of variation is expressed in percentage.* Figures 1 and 2 represent the graphs of the categories 1 and 2 respectively. The increase in the average coefficient of variation with positive k in the transformation xt agrees with the theorem 1 as it must. The relatively greater increase in the average c.v. with negative k has not been proved analytically. In fact, no theoretical results were obtained for nega- tive k and log X. * From here onwards the discussion is on the average values unless otherwise mentioned. e e e Table I. Averages of FR, Fe, Fr and c.v.~ over 50 samples of category 1. [five rows and ten columns] Population ~ means L 5 10 15 20 Tr~ j formati~~ FR Fe Fr c.v.% FR Fe Fr c.v.~ FR Fe Fr c.v.% FR Fe Fr c.v.% ~o ! 1.018 1.081 20.206 190.07 1 1.175 1.051 5.500 99.42 1.205 1.038 3.027 66.o4 1.214 1.032 2.192 49.39 X9 i 1.099 1.075 15.196 170.51 1.187 l.o46 4.481 88.78 1.210 1.035 2.610 59.16 1.216 1.030 1.963 44.33 X8 I 1.121 1.069 11.230 150.78 1.198 l.o42 3.635 78.36 1.214 1.032 2.252 52.38 1.216 1.028 1.763 39.32 x7 i 1.145 1.061 8.123 131.13 1.207 1.037 2.938 68.18 1.216 1.030 1.948 45.71 1.216 1.027 1.591 34.36 X6 ! 1.170 1.053 5.757 111.74! 1.213 1.033 2.372 58.21 1.217 1.028 1.694 39.11 1.215 1.025 1.441' 29.43 x5 ! 1.193 1.045 4.005 ~· 7311.217 1.030 1.921 48.41 11.215 1.026 1.487 32.58 1.212 1.024 1.328 24.~3 x4 ! 1.210 1.037 2.739 72 ..• 30, 1.217 1.027 1.573 38.76,1.212 1.024 1.326 26.09 1.209 1.023 1.236 19.o4 x3 i 1.219 1.030 1.860 56.22 1.213 1.024 1.322 29.19 1.207 1.023 1.210 19.62 1.204 1.022 1.170 14.76 x2 I 1.214 1.025 1.314 38.28 1.2o4 1.022 1.168 19.60 1.200 1.021 1.14o 13.13 1.198 1.021 1.130 9.86 X : 1.191 1.020 1.117 19.89 1.191 1.020 1.117 9-91 1.191 1.020 1.117 6.60 1.191 1.020 1.11'{ 4.95 I -...;J log] 1.152 1.016 1.548 13.47 1.175 1.018 1.183 4.39 1.182 1.020 1.145 2.46 1.186 1.024 1.134 1.66 I x-l 1.102 1.011 4.2)4 24.02 1.154 1.015 1.390 10.29 1.169 1.017 1.227 6.72 1.176 1.018 1.177 5.00 x-a 1.059 1.001 18.518 56 .• 85 ' 1.131 1.012 1.778 21.13 1.155 1.015 1.371 13.60 1.165 1.017 1.253 10.08 x-3 1.035 0.987 lo4.446 96.55 I 1.1oa 1.oo8 2.4o6 32.11 1.14o 1.013 1.584 20.69 1.155 1.015 1.363 15.24 x-4 1.025 0.973 674.072 140.68 1~085 1.004 3-363 45.20 1.124 1.011 1.880 28.02 1.144 1.013 1.510 20.51 x-s 1.022 0.961 4698.988 188.24 1.o64 0.999 4.820 58.77 1.109 1.008 2.270 35.65 1.132 1.012 1.698 25.90 x-s 1.023 0.953 34784.568 235-97 1.046 0.993 7.020 73.49 1.093 1.005 2.774 43.61' 1.~1 1.010 1.933 31.42 x-7 1.024 0.947 269589o840 280.95 1.032 0.987 10.268 89.35 1.078 1.002 3.422 51.93 1.109 1.008 2.218 37.10 x-a 1.024 0.944 1582034.800 321.62 1.021 0.982 14.852 106.25 1.065 0.999 4.262 60.65 1.098 1.006 2.561 42.95 x-9 1.024 0.942 *42421.774 357.50 1.013 0.976 21.131 124.02 1.052 0.995 5-350 69.79 1.086 1.004 2.970 48.98 x-lo 1.023 o.942 *189815.547 388.81 1.009 0.971 29.731 142.48 1.041 0.991 6.742 79.34 1.075 1.001 3.462 55.20 ----~------ ' *Averages based on 49 samples only.