Modified t Tests and Confidence Intervals for Asymmetrical Populations Author(s): Norman J. Johnson Source: Journal of the American Statistical Association, Vol. 73, No. 363 (Sep., 1978), pp. 536- 544 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2286597 . Accessed: 19/12/2013 01:17


Modified t Tests and Confidence Intervals for Asymmetrical Populations

NORMAN J. JOHNSON*

This article considers a procedure that reduces the effect of population skewness on the distribution of the t variable so that tests about the mean can be more correctly computed. A modification of the t variable is obtained that is useful for distributions with skewness as severe as that of the exponential distribution. The procedure is generalized and applied to the jackknife t variable for a class of statistics with moments similar to those of the sample mean. Tests of the correlation coefficient obtained using this procedure are compared empirically with corresponding tests determined using Fisher's z transformation and the usual jackknife estimate.

KEY WORDS: t variable; Population skewness; Hypothesis tests; Confidence intervals; Jackknife; Cornish-Fisher expansions.

* Norman J. Johnson is Assistant Professor, Department of Biostatistics, University of North Carolina, Chapel Hill, NC 27514. The work was done while the author was Visiting Assistant Professor, Department of Statistics, University of Kentucky, whose support is gratefully acknowledged. The author wishes to thank J.A. Hartigan for many helpful discussions, and L.J. Gleser, M.H. DeGroot, two referees, and an associate editor for helpful suggestions.

© Journal of the American Statistical Association, September 1978, Volume 73, Number 363, Theory and Methods Section

1. INTRODUCTION

Let $\bar{x}$ and $s^2$ be the sample mean and variance, respectively, of a sample of $N$ independent, identically distributed observations taken from a population having mean $\mu$ and finite variance $\sigma^2$. The variable $t = \sqrt{N}(\bar{x} - \mu)/s$ is frequently used to make hypothesis tests about $\mu$ and form confidence intervals for $\mu$. The distribution of the variable t was first obtained by Student (1908), assuming a normally distributed population. Since most populations are not normally distributed, the question of the robustness of the distribution of t (and the robustness of associated tests and confidence intervals) to the nonnormality of the population has been of considerable interest to statisticians. This article deals with a modification of the t variable using properties of the data. The modified variable is less biased than the t variable and its distribution is less subject to effects of population asymmetry when the population is nonnormal. Thus, the modification results in a more robust procedure for making tests and determining confidence intervals for $\mu$.

Much of the previous research on the robustness of the t variable has treated the variable as given, and concentrated on indicating changes in the distribution function of t resulting from the nonnormality of the population. Early studies were empirical. Sophister (1928), Neyman and Pearson (1928), and Nair (1941) showed that skewness in a population affects the distribution of t more than kurtosis does. Their results also showed that positive skewness in the population results in negative skewness in the distribution of t and vice versa. These authors studied the use of $|t|$ in forming confidence intervals for $\mu$ in order to reduce the skewness effect. Their results showed that the use of $|t|$ was valid only in the case of slight population skewness.

Because the mathematics becomes complicated very quickly for sample sizes greater than two or three, few exact theoretical results are available (see Rider 1929, Geary 1936, Laderman 1939, and Perlo 1933). A different approach is to transform t (Anscombe 1950); another is to avoid entirely the question of a distribution by using various nonparametric techniques (Box and Andersen 1955; Hartigan 1969). Most of the results obtained by these procedures are established assuming symmetrical distributions. When sampling from long-tailed symmetric distributions, Winsorizing or trimming procedures have been considered by Andrews et al. (1972), Tukey (1964), and Yuen (1974).

The most useful results have been derived by approximation techniques. Hotelling (1961) obtained an expression giving the ratio of the tail area of the t distribution computed for samples from a known but arbitrary distribution to the tail area of the usual Student t distribution. His findings, established for symmetrical distributions, support the early empirical results, which showed that larger tails in the population result in smaller tails in the t distribution. Chung (1946), Bartlett (1935), Geary (1936), and Gayen (1949) determined the distribution of t by means of an Edgeworth or a Gram-Charlier expansion. The approximation to the distribution obtained by Bartlett and Geary is similar to that of Gayen. The first few terms of Gayen's expression are given by

$$P(t) = P_0(t, N) + \beta_3 P_1(t, N) - \beta_2 P_2(t, N) + \cdots, \tag{1.1}$$

where $\beta_3 = \mu_3/\sigma^3$, $\beta_2 = \mu_4/\sigma^4$, $\sigma^2$ and $\mu_3$ are the second and third central moments, respectively, of the population, $P_0(t, N)$ is the usual Student t distribution, and $P_i(t, N)$ are corrective terms for $\beta_i$ obtained from incomplete B functions. For convenience, tables of $P_i(t, N)$ were given in Gayen's article for selected values of t and several different sample sizes $N$.

Difficulties may arise when using (1.1) in a sampling situation. The series given by (1.1) is exact only for an infinite number of terms. When the series is truncated,


probabilities smaller than zero or larger than one, or distribution functions which are not monotonically increasing, may result. Gayen assumed in his work that population parameters were known. This is not generally the case in sampling situations, so that the $\beta_i$ must be estimated from the sample. Empirical results show that when appropriate estimates are used, errors in estimation affect the accuracy of the approximation. Finally, the Gayen method is cumbersome since the adjustments to parameters, $P_i(t, N)$, depend on the sample size and many tables are required.

This article seeks to correct the t variable for the nonnormality of the population distribution, not by abandoning the t distribution as a standard, but by adjusting the t variable using properties of the data so that the adjusted version has Student's t distribution to a sufficiently good approximation. The form of the corrected t variable is derived by using a Cornish-Fisher expansion. This form for t, given in equation (2.5), differs from the usual variable in that the numerator is adjusted by a term involving $(\bar{x} - \mu)^2$ and a constant. These adjustments correct bias and skewness effects due to the skewness of the nonnormal population distribution.

This technique avoids many of the annoying features of previously proposed methods. Empirical studies show that hypothesis tests determined by this procedure compare favorably with tests determined by other methods for samples as small as 13 drawn from distributions as asymmetrical as $\chi^2$ with two degrees of freedom. This method also has a natural extension to other problems of parametric estimation involving the t distribution and appropriate tests and confidence intervals by making use of the jackknife procedure (e.g., estimation of the correlation coefficient). This is discussed further in Section 3.

2. MODIFICATION OF THE t VARIABLE

The modification of the t variable obtained in this section is derived using the Cornish-Fisher expansion. The general form of such an expansion (Cornish and Fisher 1937) for a variable $X$ is

$$CF(X) = \mu + \sigma\zeta + (\mu_3/6\sigma^2)(\zeta^2 - 1) + \cdots, \tag{2.1}$$

where $\zeta$ is a standard normal random variable, $\mu$ is the mean of $X$, and $\sigma^2, \mu_3, \ldots$ are the second, third, ... central moments of $X$, respectively. Wallace (1958) is relevant to the study of Cornish-Fisher expansions as used in this article. Wallace discusses conditions for a valid representation of a distribution by a series approximation.

As an important example, assume that all moments of a population exist; then using (2.1),

$$CF(\bar{x}) = \mu + \frac{\sigma}{\sqrt{N}}\zeta + \frac{\mu_3}{6\sigma^2 N}(\zeta^2 - 1) + O(N^{-1}) \tag{2.2}$$

is the valid representation of $\bar{x}$ by a Cornish-Fisher expansion to two terms. The basic results derived in this article will be established for variables having moments which are of the same order as the moments of $\bar{x}$. Thus, the form of the Cornish-Fisher expansion for these variables will be analogous to the one given in (2.2) for $\bar{x}$. Note in this expansion that the skewness of the population, $\beta_3$, is the coefficient of the term $(\zeta^2 - 1)$. (It is also the coefficient of other terms but these are of smaller order.) The method of this section for deriving a modification of the t variable is to eliminate the $(\zeta^2 - 1)$ term in the expansion of the modified t variable. In so doing, the largest-order term involving skewness is eliminated from the expansion and the effect of all other terms involving skewness is of small order. The appropriateness of modifications made by this method is established for a variety of different distributions by simulation.

Let the modified t variable be

$$t_1 = [(\bar{x} - \mu) + \lambda + \gamma\{(\bar{x} - \mu)^2 - (\sigma^2/N)\}][s^2/N]^{-1/2}. \tag{2.3}$$

This form is suggested by replacing $\bar{x} - \mu$ in the numerator of the t variable by the first few terms of the inverse Cornish-Fisher expansion for $\bar{x} - \mu$; i.e., in the notation of (2.1), an expansion for $\zeta$ in terms of $\bar{x} - \mu$. The resulting expansion for $t_1$ is similar to that given for $\bar{x}$ in (2.2). The constant $\lambda$, a function of $N$, is chosen so that constant terms in the Cornish-Fisher expansion of $t_1$ sum to zero, thus eliminating the low-order bias; $\gamma$ is chosen so that the coefficient of the $\zeta^2$ term in the Cornish-Fisher expansion of $t_1$ is zero, thereby eliminating the low-order effects of skewness. The derivation of the $\gamma$ and $\lambda$ parameters is given in Appendix A.

The solutions are

$$\gamma = \mu_3/3\sigma^4, \qquad \lambda = \mu_3/2N\sigma^2, \tag{2.4}$$

so that

$$t_1 = \left[(\bar{x} - \mu) + \frac{\mu_3}{6\sigma^2 N} + \frac{\mu_3}{3\sigma^4}(\bar{x} - \mu)^2\right][s^2/N]^{-1/2}. \tag{2.5}$$

Note that the bias of the numerator of $t_1$ is $\mu_3/2\sigma^2 N$ and the third central moment of its numerator is $4\mu_3/N$. The covariance between numerator and denominator is $4\mu_3/N^2$ plus smaller-order terms. Thus, the corrections $\gamma$ and $\lambda$ eliminate the high-order bias and skewness effects for the modified t variable, $t_1$, but do not eliminate the bias or skewness effects of its numerator or the correlation between the numerator and denominator.

In practice, the parameters $\mu_3$ and $\sigma^2$ must be estimated by the sample quantities $\hat{\mu}_3$ and $s^2$. To demonstrate the use of the variable $t_1$ in a testing situation, assume that we wish to test the hypothesis $H_0\colon \mu = \mu_0$ against the alternative $H_a\colon \mu > \mu_0$. The hypothesis would be rejected if

$$(\bar{x} - \mu_0) + \frac{\hat{\mu}_3}{6s^2 N} + \frac{\hat{\mu}_3}{3s^4}(\bar{x} - \mu_0)^2 > t_{\alpha,\nu}\,\frac{s}{\sqrt{N}},$$

where the value of $t_{\alpha,\nu}$ is obtained from the Student t distribution. For the appropriate degrees of freedom, $\nu = N - 1$, $\alpha$ is the probability that a t variable is greater than $t_{\alpha,\nu}$.
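As a rough illustration of the test just described, the following Python sketch computes the modified statistic of (2.5) with $\mu_3$ and $\sigma^2$ replaced by sample quantities. The article does not say which estimator of $\mu_3$ is used, so the plain third central sample moment (divisor $N$) is an assumption here, as are the function names `student_t`, `johnson_t1`, and `reject_upper`. The critical value is supplied by the caller (for example, from a table of Student's t) rather than computed.

```python
import math


def student_t(x, mu0):
    """Usual one-sample t statistic sqrt(N) * (xbar - mu0) / s."""
    n = len(x)
    xbar = sum(x) / n
    s2 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    return (xbar - mu0) / math.sqrt(s2 / n)


def johnson_t1(x, mu0):
    """Modified t statistic of (2.5), with sample estimates plugged in.

    Assumption (not stated in the article): mu_3 is estimated by the
    third central sample moment with divisor N.
    """
    n = len(x)
    xbar = sum(x) / n
    dev = [xi - xbar for xi in x]
    s2 = sum(d * d for d in dev) / (n - 1)   # sample variance s^2
    mu3_hat = sum(d ** 3 for d in dev) / n   # assumed estimator of mu_3
    diff = xbar - mu0
    numerator = (
        diff
        + mu3_hat / (6.0 * s2 * n)               # constant (bias) correction
        + mu3_hat / (3.0 * s2 ** 2) * diff ** 2  # skewness correction
    )
    return numerator / math.sqrt(s2 / n)


def reject_upper(x, mu0, t_crit):
    """One-sided test of H0: mu = mu0 against Ha: mu > mu0."""
    return johnson_t1(x, mu0) > t_crit


if __name__ == "__main__":
    # Hypothetical right-skewed sample of size 13; 1.782 is the tabled
    # upper 5 percent point of Student's t with 12 degrees of freedom.
    sample = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1, 1.3, 1.6, 2.0, 2.6, 3.4, 4.8, 7.5]
    print(student_t(sample, 1.0), johnson_t1(sample, 1.0))
    print(reject_upper(sample, 1.0, 1.782))
```

For a perfectly symmetric sample the estimated third moment is zero and the two statistics coincide, which gives a quick sanity check on any implementation of (2.5).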


Use of the variable $t_1$ does not lead directly to a simple expression for confidence intervals for $\mu$ since the numerator of $t_1$ is nonlinear in $\mu$. However, the terms $\gamma$ and $\lambda$ have been determined so that large-order effects of bias and population skewness are reduced. Thus, the effect of the term involving $(\bar{x} - \mu)^2$ is small-order, and a simple expression for a modified confidence interval for $\mu$ can be obtained by neglecting the term involving $(\bar{x} - \mu)^2$ in (2.5). The variable $t_1$ then reduces to the variable

$$t_1' = [(\bar{x} - \mu) + (\mu_3/6\sigma^2 N)][s^2/N]^{-1/2}, \tag{2.6}$$

and the endpoints of the resulting $(1 - \alpha)$ percent confidence interval for $\mu$ would be

$$[\bar{x} + (\hat{\mu}_3/6s^2 N)] \pm t_{\alpha/2,\nu}\, s/\sqrt{N}. \tag{2.7}$$

The modified interval is appealing for several reasons. The width of the interval is not increased, as it would be if a modification were based on a scheme of weighting the observations. Also, according to a Cornish-Fisher expansion, this modification corrects for the difference between the median and the mean due to an asymmetrical population. Thus, confidence intervals are based on a modification which makes the median unbiased for the mean (Johnson 1974).

Table 1 contains the results of an empirical study made to compare the validity of tests for $\mu$ determined from the correction to the t variable given in (2.5) with tests determined by the usual unadjusted Student t procedure and tests determined by using Gayen's procedure (1.1). The results are reassuring in that they show that the assumptions and derivations we have made lead to a variable which has more nearly a t distribution than the other procedures considered.

The entries in Table 1 are the frequencies for which the true mean $\mu$ was found in one of six regions determined for each test over 2,500 samples. For each sample, five ordered values were determined using each test statistic. For example, the five ordered values for the t statistic are:

$$(\bar{x} + t_{.99,12}\,s/(13)^{1/2},\ \ \bar{x} + t_{.95,12}\,s/(13)^{1/2},\ \ \bar{x},\ \ \bar{x} + t_{.05,12}\,s/(13)^{1/2},\ \ \bar{x} + t_{.01,12}\,s/(13)^{1/2}).$$

It was then noted in which interval between the five ordered values the true mean $\mu$ would be located. Thus, for each test the first and sixth intervals would be the critical region of a two-sided hypothesis test of $\mu$. In the example for the t statistic, the two end intervals have a Type I error of two percent. By presenting the results in this way, not only can the ability of each procedure to make tests at the correct $\alpha$ level be determined, but also the ability of each procedure to reduce the effect of population skewness can be assessed. The $\chi^2$ goodness-of-fit statistic ($\sum (O - E)^2/E$) is given for each procedure as a convenient measure for comparison purposes. The populations sampled are various $\chi^2$ distributions.

Each procedure compared in Table 1 is fully described as follows. A mnemonic label is given to aid identification of the procedures:

GYEN-th: A t statistic is calculated from the sample. Intervals are determined using Gayen's procedure, assuming theoretical parameter values.

GYEN-est: A t statistic is calculated from the sample. Intervals are determined using Gayen's procedure with parameter values estimated from each sample.

TSTAT: The usual unadjusted Student t procedure of making tests.

TADJ-th: An adjustment to the t statistic is made using (2.5). The parameter values needed for the adjustment are the theoretical values.

TADJ-est: An adjustment to the t statistic is made using (2.5). The parameter values needed for the adjustment are estimated from each sample.

Table 1. Empirical Results of Sample Mean, 2,500 Samples

                                            Intervals
                                  1      2       3       4      5      6
Probability                     .01    .04     .45     .45    .04    .01
Expected frequencies             25    100   1,125   1,125    100     25

Procedure     Goodness of fit     1      2       3       4      5      6

Population $\chi^2_2$, N = 13
GYEN-th            172.0         67     88   1,094   1,060    118     73
GYEN-est           241.4         18     73   1,134   1,036    142     97
TSTAT              653.2          2     31   1,117   1,040    170    140
TADJ-th             12.3         29    116   1,108   1,101    107     39
TADJ-est            41.3         11     59   1,089   1,182    119     40

Population $\chi^2_2$, N = 25
GYEN-th             60.2         51     95   1,111   1,087    103     53
GYEN-est            74.2         29     82   1,132   1,080    111     66
TSTAT              377.5          2     46   1,151   1,053    135    113
TADJ-th              5.4         31    106   1,122   1,097    115     29
TADJ-est            21.1          9     80   1,112   1,146    123     30

Population $\chi^2_{10}$, N = 13
GYEN-th             22.1         46    100   1,107   1,113     99     35
GYEN-est            35.5         34     80   1,122   1,104    104     51
TSTAT               68.8         14     70   1,120   1,108    129     59
TADJ-th             13.7         43     96   1,113   1,119    101     28
TADJ-est            15.1         26     76   1,106   1,144    109     39

Population $\chi^2_{10}$, N = 25
GYEN-th             15.9         43     98   1,146   1,097     88     28
GYEN-est             8.2         23    108   1,150   1,097    104     36
TSTAT               47.4         12     82   1,165   1,073    115     53
TADJ-th              4.7         32    101   1,156   1,097     93     21
TADJ-est             2.4         20    103   1,139   1,104    106     28

NOTE: For each procedure, six intervals were determined from the sample. The frequency with which the true mean was observed in each interval is recorded in the table. The column headings are the probability that the true mean would be in each interval, and the expected frequencies give the number of times this should happen over 2,500 samples according to the assumptions of the hypothesis. The $\chi^2$ goodness-of-fit statistic (column "Goodness of fit") is given as a convenient measure for comparison purposes. The procedures are fully described in Section 2.

The generation of the $\chi^2$ random numbers used in the empirical studies summarized in Table 1 was made by generating uniform random numbers, $u_i$, by the multiplicative congruential method described in Abramowitz and Stegun (1970). The initial seed of the generator was 156277 and the multiplier was $2^{14} + 3$. Chi-squared random numbers, $u_i'$, were created from the uniform numbers by forming $u_i' = \sum_{j=1}^{k} (-2 \log u_j)$. This quantity is distributed as chi-squared with $2k$ degrees of freedom. The simulation was conducted on an IBM 360/91 computer at the Health Science Computing Facility of the University of California at Los Angeles.

Notice in Table 1 that the method TADJ of adjusting the t variable for population skewness determines tests with critical regions which are closer to theoretical critical regions, and $\chi^2$ statistics which are smaller in size, than either the Student t procedure or the Gayen adjustment to the t distribution. Gayen's adjustment is also an improvement over the Student t procedure according to these two criteria. For large sample sizes and/or large degrees of freedom in the parent $\chi^2$ distribution, the results of this table and other results not included here show that Gayen's procedure and the TADJ procedure give similar results. In all cases of $\chi^2$ and $N$, the results when parameters are estimated from the sample are better for the TADJ procedure than the Gayen procedure.

Other empirical studies show that results almost identical to those of the ordinary t statistic are obtained for TADJ when the parent population is normal. Thus, the t statistic can be adjusted as in (2.5) for any parent distribution in the range from normal to a distribution as asymmetric as chi-squared with two degrees of freedom for sample sizes as small as 13.

3. A GENERALIZATION USING THE JACKKNIFE PROCEDURE

In this section the procedure for adjusting the t variable, derived in the previous section, is extended and applied to correcting the effect of population skewness on the jackknife t variable. Using the jackknife procedure, tests and confidence intervals for a wide variety of parameters can be made. (Some notation is given below, but for a more complete discussion and additional references of the jackknife procedure, see Quenouille 1949 and Miller 1974.) The correction of the jackknife procedure to be derived is for cases in which the parameter of interest is estimated by a statistic having moments which are of the same order as those of the sample mean.

Consider now the jackknife procedure. Let $g_N = g(X_1, X_2, \ldots, X_N)$, a function of the $N$ observations $X_1, X_2, \ldots, X_N$, be an estimate of an unknown parameter $\theta$. Then the jackknife procedure is defined as follows: Group the observations into $n$ groups of size $k$; thus $N = kn$ and

$$(X_1, \ldots, X_N) = (X_1, \ldots, X_k;\ X_{k+1}, \ldots, X_{2k};\ \ldots;\ X_{(n-1)k+1}, \ldots, X_N).$$

Compute the statistic

$$g_{Nkj} = g(X_1, \ldots, X_{(j-1)k}, X_{jk+1}, \ldots, X_N),$$

the estimate of $\theta$ based on $g_N$ determined from the sample after deleting the $j$th group of observations. Let $g_{kj}$ denote the calculation of the statistic on the $j$th group only, $j = 1, \ldots, n$. (This statistic is not considered in most discussions of the jackknife procedure but it is defined here so that it can be used as part of an estimation procedure to be discussed later in this section.) The $j$th pseudovalue, $Jg_{Nkj}$, is determined from these estimates by

$$Jg_{Nkj} = n g_N - (n - 1) g_{Nkj}, \quad \text{for } j = 1, \ldots, n.$$

The jackknife estimate of $\theta$ is the average of the pseudovalue estimates:

$$Jg_{Nk} = \sum_{j=1}^{n} Jg_{Nkj}/n.$$

The jackknife procedure was originally proposed as a means of reducing bias (Quenouille 1949). Later, Tukey (1969) argued that in many cases the pseudovalues could be considered as independent observations to be used to calculate an estimate of variance. Tests and confidence intervals would then be determined for the unknown parameter using a Student t distribution (Miller 1974).

A broad class of variables will now be defined by the order of their moments and cross moments on subsamples of the sample.

Definition 1: A variable is mean-like if the order of its mean and covariance over subsamples of the sample are of the same order as the corresponding mean and covariance of $\bar{x}$, the sample average.

Examples of mean-like variables are the statistics $\bar{x}$ and $s^2$. It can be shown (Johnson 1974) that for mean-like variables the jackknife procedure eliminates bias of order $O(N^{-1})$ in a Cornish-Fisher expansion but does not eliminate a second term in the expansion, also of order $O(N^{-1})$, involving the skewness of the distribution. Thus, in order to obtain properties of variables in terms of Cornish-Fisher expansions accounting for population asymmetry, the definition of a variable must include terms of order $O(N^{-1})$. The following definition, first discussed in Johnson (1974), characterizes a class of variables according to the order of magnitude of their moments including terms of $O(N^{-1})$. This characterization is then used to determine the ability of the jackknife procedure to reduce the effect of population skewness.

Definition 2: Let $P_i$ denote the $i$th subset of the data, $p_i$ denote the number of observations in the subset $P_i$, $g_{p_i}$ the variable calculated on the subset $P_i$, and $[p_i p_j]$ and $[p_i p_j p_k]$ the number of observations in the subsets $P_i \cap P_j$ and $P_i \cap P_j \cap P_k$, respectively. Then the variable $g_N$, a symmetric, measurable function of independent, identically distributed random variables $X_1, X_2, \ldots, X_N$, used to estimate $\theta$, is third-order mean-like

if it satisfies

(1) $E(g_N) = \theta + (a/N) + O(N^{-2})$;

(2) $\operatorname{cov}(g_{p_i}, g_{p_j}) = \sigma^2 \dfrac{[p_i p_j]}{p_i p_j} + O\!\left(\dfrac{1}{p_i p_j}\right)$ as $p_i, p_j \to \infty$; and

(3) $$E[(g_{p_i} - \theta)(g_{p_j} - \theta)(g_{p_k} - \theta)] = \mu_3 \frac{[p_i p_j p_k]}{p_i p_j p_k} + \frac{\tau}{p_i p_j p_k}\Big[\frac{1}{p_i}\big([p_i p_j p_k]^2 + [p_i p_j p_k]([p_i p_j] + [p_i p_k]) + [p_i p_j][p_i p_k]\big)$$
$$+ \frac{1}{p_j}\big([p_i p_j p_k]^2 + [p_i p_j p_k]([p_i p_j] + [p_j p_k]) + [p_i p_j][p_j p_k]\big) + \frac{1}{p_k}\big([p_i p_j p_k]^2 + [p_i p_j p_k]([p_i p_k] + [p_j p_k]) + [p_i p_k][p_j p_k]\big)\Big] + O\!\left(\frac{1}{p_i p_j p_k}\right)$$
as $p_i$, $p_j$, and $p_k \to \infty$.

(4) $E(g_N - \theta)^4 = O(N^{-2})$.

Under the assumptions, a third-order mean-like variable has mean, variance, and third central moment:

$$E(g_N) = \theta + (a/N) + O(N^{-2}),$$
$$\operatorname{var}(g_N) = (\sigma^2/N) + O(N^{-2}), \tag{3.1}$$

and

$$\mu_3(g_N) = [(\mu_3 + 3\tau)/N^2] + O(N^{-3}),$$

respectively. (Recall that $\tau$ is a nonlinear effect arising in the definition of third-order mean-like variables, Definition 2.)

An example of a third-order mean-like variable is $g_N = \bar{x} + \lambda + \gamma[(\bar{x} - \mu)^2 - (\sigma^2/N)]$, where $E(\bar{x}) = \mu$. This variable is similar to the variable in the numerator of the adjusted t variable given in (2.3). The largest order statistic, $X_{(N)}$, used to estimate $\theta$ from a sample of $N$ observations taken from a $U[0, \theta]$ population, is not third-order mean-like (its variance is $O(N^{-2})$). A convenient method of determining whether or not a variable is third-order mean-like has not yet been found. Some third-order mean-like statistics, such as $\bar{x}$ and $s^2$, are U statistics (Hoeffding 1948); other third-order mean-like variables, such as the correlation coefficient estimate $r$ and $g_N = \bar{x} + \lambda + \gamma[(\bar{x} - \mu)^2 - (\sigma^2/N)]$, mentioned above, are smooth functions of U statistics, and the jackknife estimate is asymptotically normal (Arvesen 1969). Asymptotic normality of third-order mean-like statistics is shown in Hartigan (1975). That U statistics or functions of U statistics are third-order mean-like has not been determined.

The jackknife of a third-order mean-like variable has mean, variance, and third central moment:

$$E(Jg_{Nk}) = \theta + O(nN^{-2}),$$
$$\operatorname{var}(Jg_{Nk}) = (\sigma^2/N) + O(k^{-2}), \tag{3.2}$$

and

$$\mu_3(Jg_{Nk}) = [(\mu_3 + 3\tau)/N^2] + O(k^{-3}),$$

respectively (as $k \to \infty$, $n$ may or may not $\to \infty$). Comparing these moments with the moments given for third-order mean-like variables (3.1), it is seen that although the jackknife procedure has reduced the bias, the third central moment, involving skewness, has not changed. The comparison also shows that there is an effect on the third central moment due to $\tau$. By examining the cross moments between the pseudovalues, it can also be shown (Johnson 1974) that there are some second-order dependencies between the pseudovalues involving $\tau$.

In order to make probability statements and determine confidence intervals for $\theta$, the jackknife procedure uses the pseudovalues as new observations from which a t variable, $JT$, is calculated. The jackknife t variable is

$$JT = (Jg_{Nk} - \theta)\left[\frac{1}{n(n-1)}\sum_{i=1}^{n}(Jg_{Nki} - Jg_{Nk})^2\right]^{-1/2}. \tag{3.3}$$

Moments of the distribution of the jackknife t variable will now be studied for third-order mean-like variables.

Theorem 1: If $g_N$ is a third-order mean-like variable with $E(g_N) = \theta = 0$, $Jg_{Nki}$ the $i$th pseudovalue, and $Jg_{Nk}$ the jackknife estimate of $g_N$, then the mean, variance, and third central moment of the jackknife t variable $JT$, as $k \to \infty$ and $n \to \infty$, are:

$$E(JT) = (-1/2\sqrt{N})[(\mu_3 + 2\tau)/\sigma^3] + O(n^{-1}),$$
$$\operatorname{var}(JT) = 1 + O(n^{-1}),$$

and

$$\mu_3(JT) = (-1/\sqrt{N})[(2\mu_3 + 3\tau)/\sigma^3] + O(n^{-1}),$$

where $\mu_3$ and $\sigma$ are the population third central moment and standard deviation, respectively. The proof of this theorem is given in Appendix B.

Theorem 1 shows that the jackknife t variable is biased and has nonzero skewness; both the bias and skewness are affected by the population skewness and the nonlinearity effect, $\tau$.

An adjustment can be made to the jackknife t variable to reduce the effect of population skewness as was done for the t variable in Section 2. The inverse Cornish-Fisher expansion of the jackknife t variable is

$$JT' = JT + \lambda + \gamma\left(JT^2 - \frac{n}{n-2}\right) + O(n^{-1}). \tag{3.4}$$

Values of $\lambda$ and $\gamma$ will be determined so that the bias and third central moment of $JT'$ to low-order terms are zero. Using the results of Theorem 1 and the moments of $JT'$ obtained from (3.4),

$$\lambda = (1/2\sqrt{N})[(\mu_3 + 2\tau)/\sigma^3],$$

and

$$\gamma = (2\mu_3 + 3\tau)/[6\sigma^3\sqrt{N}\,(n/(n-2))^2].$$

The adjusted jackknife t variable is

$$JT' = JT + \frac{1}{2\sqrt{N}}\,\frac{(\mu_3 + 2\tau)}{\sigma^3} + \frac{1}{6\sqrt{N}\,(n/(n-2))^2}\,\frac{(2\mu_3 + 3\tau)}{\sigma^3}\left(JT^2 - \frac{n}{n-2}\right). \tag{3.5}$$

In the case of the statistic $\bar{x}$, the adjustment (3.5) reduces to adjustment (2.4) of Section 2. Estimates of the parameters $\mu_3$, $\tau$, and $\sigma^2$ are determined from the pseudovalues $Jg_{Nki}$ and the calculations on groups $g_{ki}$:

$$\hat{\mu}_3 = k^2 \hat{\mu}_3(JT') = \frac{k^2}{n}\sum_{i=1}^{n}(Jg_{Nki} - Jg_{Nk})^3,$$

since $E(\hat{\mu}_3(JT')) = (\mu_3/k^2) + O(k^{-3}) + O(1/nk^2)$;

$$\hat{\tau} = k^2 \hat{\tau}(JT') = \frac{k^2}{n-1}\sum_{i=1}^{n}\left[(g_{ki} - Jg_{Nki})(Jg_{Nki} - Jg_{Nk})^2\right],$$

since $E(\hat{\tau}(JT')) = (\tau/k^2) + O(1/nk) + O(k^{-3})$;

$$\hat{\sigma}^2 = k\,\hat{\sigma}^2(JT') = \frac{k}{n-1}\sum_{i=1}^{n}(Jg_{Nki} - Jg_{Nk})^2, \tag{3.6}$$

since $E(\hat{\sigma}^2(JT')) = (\sigma^2/k) + O(N^{-1}) + O(k^{-2})$.

The test statistic to test the hypothesis $H_0\colon \theta = \theta_0$ against the alternative $H_a\colon \theta > \theta_0$ is obtained using (3.3) and (3.5) by replacing parameters with appropriate sample estimates given in (3.6). The resulting test statistic is

$$\frac{\sqrt{N}(Jg_{Nk} - \theta_0)}{\hat{\sigma}} + \frac{(\hat{\mu}_3 + 2\hat{\tau})}{2\sqrt{N}\,\hat{\sigma}^3} + \left(\frac{n-2}{n}\right)^2 \frac{(2\hat{\mu}_3 + 3\hat{\tau})}{6\sqrt{N}\,\hat{\sigma}^3}\left[\frac{N(Jg_{Nk} - \theta_0)^2}{\hat{\sigma}^2} - \frac{n}{n-2}\right]. \tag{3.7}$$

The hypothesis $H_0$ is rejected at the $\alpha$th level if the test statistic (3.7) is larger than $t_{\alpha,\nu}$, $\nu = n - 1$. As was the case for the ordinary t statistic discussed in Section 2, the derivation of a simple confidence interval for $\theta$ is difficult because of the nonlinearity of $\theta$ in (3.5). However, an analogous assumption to the one in Section 2 for confidence intervals for $\mu$ can be made. That is, the modification $\gamma$ in (3.4) makes the effect of the term $(Jg_{Nk} - \theta)^2$ small-order, so $\theta$ may be replaced by its estimate, $Jg_{Nk}$, thereby removing the quadratic term in $\theta$ from the expression. The resulting modified confidence interval for $\theta$ is

$$\left(Jg_{Nk} + \frac{\hat{\mu}_3 + 2\hat{\tau}}{2N\hat{\sigma}^2} - \frac{n-2}{n}\,\frac{2\hat{\mu}_3 + 3\hat{\tau}}{6N\hat{\sigma}^2}\right) \pm t_{\alpha/2,\nu}\,\frac{\hat{\sigma}}{\sqrt{N}}. \tag{3.8}$$

An empirical study was made to assess the improvement to the jackknife t statistic due to the adjustment given in (3.5) when making tests on the correlation coefficient $\rho$. The study also included results for tests derived from the jackknife procedure and the familiar Fisher z transformation.

The sample correlation coefficient, $r$, is a third-order mean-like statistic, and in the case of sampling from a bivariate standard normal population with correlation $\rho$, the mean, variance, and third and fourth central moments of $r$ are (Ghosh 1966):

$$E(r) = \rho - [\rho(1 - \rho^2)/2N] + O(N^{-2}),$$
$$\operatorname{var}(r) = [(1 - \rho^2)^2/N] + O(N^{-2}),$$
$$\mu_3(r) = [-6\rho(1 - \rho^2)^3/N^2] + O(N^{-3}),$$
$$\mu_4(r) = [3(1 - \rho^2)^4/N^2] + O(N^{-3}).$$

Observe that the sample correlation coefficient, $r$, is negatively skewed when $\rho$ is positive, and vice versa for $\rho$ negative. This is similar to the behavior of the t statistic when sampling from asymmetric populations, as already noted.

The results of the empirical study are given in Table 2. The format of Table 2 is similar to that of Table 1. Intervals are determined by various procedures. Table 2 reports the frequencies over 1,000 samples with which the true correlation $\rho$ (or a transformation of $\rho$) fell in each of six intervals. The closeness of these observed frequencies to expected frequencies is measured by the $\chi^2$ goodness-of-fit statistic. Bivariate samples of sizes 25 and 50 were generated using the normal random number generator package RSTART (created by Professor George Marsaglia of McGill University). Correlated bivariate normal variables $X$ and $Y$ were obtained by generating variables $X \sim N(0, \sigma^2)$ and $Z \sim N(0, \sigma^2(1 - \rho^2))$, then forming $Y = \rho X + Z$. A mnemonic label and a description of the various procedures compared in the study are given as follows:

JACK: The jackknife procedure of making tests on $\rho$ using the pseudovalues. The degrees of freedom for the t distribution are $\nu = n - 1$.

ADJ JACK: The jackknife procedure adjusted as described in (3.5). All parameters needed in the adjustment are estimated from the pseudovalues using estimates described in (3.6).

FISH Z: Tests for $\zeta = \tfrac{1}{2}\log[(1 + \rho)/(1 - \rho)]$, Fisher's z transformation; $z = \tfrac{1}{2}\log[(1 + r)/(1 - r)]$ is an estimate of $\zeta$, and $z$ has approximately a normal distribution.


JACK FISH Z: The jackknife procedure applied to Fisher's z transformation to make tests for $\zeta = \tfrac{1}{2}\log[(1 + \rho)/(1 - \rho)]$.

ADJ JACK FISH Z: The jackknife procedure applied to Fisher's z transformation modified by the adjustment given in (3.5). All parameters needed in the adjustment are estimated from the pseudovalues using estimates described in (3.6).

Table 2. Empirical Results for Sample Correlation Coefficient, 1,000 Samples, $\rho$ = .95

                                            Intervals
                                     1      2      3      4      5      6
Probability                        .01    .04    .45    .45    .04    .01
Expected frequencies                10     40    450    450     40     10

  N    n    k   Goodness of fit      1      2      3      4      5      6

JACK
 25   25    1        397.2          67     74    474    373     10      2
 25    5    5        148.0          37     81    488    372     20      2
 50   50    1        175.8          48     63    451    415     21      2
 50   10    5        134.6          42     68    446    416     24      4
 50    5   10         86.5          34     65    456    416     27      2

ADJ JACK
 25    5    5         72.5          29     72    459    407     24      9
 50   10    5         49.2          30     53    428    447     29     13
 50    5   10         38.3          25     60    435    444     32      4

FISH Z
 25                   23.2          14     52    503    387     31     13
 50                   16.0          20     45    471    409     43     12

JACK FISH Z
 25   25    1          6.8          12     41    460    424     47     16
 25    5    5          2.0          13     36    459    443     41      8
 50   50    1         13.7          15     40    450    431     44     20
 50   10    5          8.5          14     44    447    432     46     17
 50    5   10          1.8          10     44    444    445     44     13

ADJ JACK FISH Z
 25    5    5          3.3           6     42    467    439     35     11
 50   10    5          9.0          11     47    450    427     48     17
 50    5   10          1.8          11     43    440    448     47     11

NOTE: For each procedure, intervals are determined from the sample (of size $N$). The frequency with which the true correlation $\rho$, or the transformation $\zeta = \tfrac{1}{2}\log[(1 + \rho)/(1 - \rho)]$ in the case of Fisher's transformation, was observed in each interval, is recorded in the table. The column headings are the probabilities that the true correlation $\rho$ would be in each interval, and the expected frequencies give the number of times this should happen over 1,000 samples according to the assumptions of the hypothesis. The $\chi^2$ goodness-of-fit statistic (column "Goodness of fit") is given as a convenient measure for comparison purposes. The procedures are fully described in Section 3.

The empirical results in Table 2 suggest that the adjustment to the jackknife procedure given in (3.5) is an improvement over the jackknife procedure in the sense of giving lower $\chi^2$ values and a more symmetrical distribution of counts. The adjustment does not appear to be a useful improvement for adjusting Fisher's z transformation. In both cases, jackknife of correlation coefficient and jackknife of Fisher's z transformation, better results occur when $k$ is larger than $n$. This is in agreement with the definitions and derivations of moments of jackknifed third-order mean-like variables as given in (3.2). Observe also that Fisher's z transformation is better than the jackknife procedure. This is not surprising. The jackknife procedure is a general procedure not necessarily intended to be the best procedure in a specific case. The results of the empirical study suggest that the best method of obtaining confidence intervals for $\rho$ is to jackknife Fisher's z transformation with $k$ as large as possible for a given sample size $N$.

4. CONCLUSION

The adjustment to the t variable given in Section 2 and the generalized version of adjusting the jackknife t variable given in Section 3 improve the accuracy of the t variable and are useful in practice since all necessary parameters can be estimated from the sample. Gayen's method requires different tables for different sample sizes to make corrections, and is essentially applicable only to means. In the case of estimating the mean, empirical studies show that the correction suggested in this paper improves the distribution of the t variable even when the parent population is as asymmetrical as $\chi^2$ with two degrees of freedom. Other empirical results show that this procedure is equivalent to the Student t variable for samples drawn from a normal population.

The generalized form of the correction to the t variable was derived for the jackknife procedure and compared empirically to a well-known but specific technique in the case of the correlation coefficient. For a correlation of .95, the proposed adjustment to the jackknife procedure improved the results of the unadjusted jackknife.

The corrections to the t variable derived in this article are based on approximation by Cornish-Fisher expansions. For such approximations to be valid, the number of groups, $n$, and the number of observations per group, $k$, cannot be too small. In the case of the mean, a sample of size 13, and in the case of the correlation coefficient, a sample of 25, the empirical studies indicate that the Cornish-Fisher approximations are valid and useful corrections to the t variable result.

APPENDIX A: DERIVATION OF $\gamma$ AND $\lambda$ IN (2.3)

The Cornish-Fisher expansions of $\bar{x}$ and $s^2$, ignoring higher-order terms, are

$$CF(\bar{x}) = \mu + (\sigma/\sqrt{N})\zeta + (\mu_3/6N\sigma^2)(\zeta^2 - 1),$$
$$CF(s^2) = \sigma^2 + [(\mu_4 - \sigma^4)/N]^{1/2}\eta,$$

where $\zeta$ and $\eta$ are standard normal random variables. Replacing the values of $\bar{x}$ and $s^2$ in (2.3) by their

respective expansions, then reducing and ignoring terms of order $O(N^{-1})$, the Cornish-Fisher expansion of $t_1$ is

$$\mathrm{CF}(t_1) = \xi + \left[\frac{\mu_3}{6\sigma^3\sqrt{N}} + \frac{\gamma\sigma}{\sqrt{N}}\right]\xi^2 + \left[\frac{\lambda\sqrt{N}}{\sigma} - \frac{\mu_3}{6\sigma^3\sqrt{N}}\right] - \left[\frac{(\mu_4 - \sigma^4)^{1/2}}{2\sigma^2\sqrt{N}}\right]\xi\eta. \tag{A.1}$$

Rewriting $\eta = \rho\xi + \eta^*$, where $\rho$ is the correlation between $\bar{x}$ and $s^2$, and $\eta^*$ is a normal random variable independent of $\xi$, and noting that $\rho = \mu_3[\sigma^2(\mu_4 - \sigma^4)]^{-1/2}$, (A.1) reduces to

$$\mathrm{CF}(t_1) = \xi + \left[\frac{\gamma\sigma}{\sqrt{N}} - \frac{\mu_3}{3\sigma^3\sqrt{N}}\right]\xi^2 + \left[\frac{\lambda\sqrt{N}}{\sigma} - \frac{\mu_3}{6\sigma^3\sqrt{N}}\right] - \left[\frac{(\mu_4 - \sigma^4)^{1/2}}{2\sigma^2\sqrt{N}}\right]\xi\eta^*. \tag{A.2}$$

By selecting $\gamma$ and $\lambda$ so that the coefficient of $\xi^2$ is zero and constant terms sum to zero, the resulting expression will have reduced bias, its distribution will be more symmetric, and the contribution of low-order terms due to the correlation between numerator and denominator will be eliminated. The resulting values of $\gamma$ and $\lambda$ are given in (2.4).

APPENDIX B: PROOF OF THEOREM 1

The moments of the numerator of $JT$ in (3.3) are given by (3.2), and the Cornish-Fisher expansion of $Jt_{Nk}$ using these moments is

[Equation (B.1) is illegible in the source.]

where $\xi$ is a unit normal random variable. Let $D$ be the expression in square brackets in the denominator of $JT$ in (3.3). Then

$$E(D) = (\sigma^2/N) + O(1/nk^2)$$

and

$$\mathrm{var}(D) = \frac{2n\sigma^4}{(n-1)^2N^2} + O(N^{-3}).$$

The Cornish-Fisher expansion for $D$ is then

$$\mathrm{CF}(D) = \frac{\sigma^2}{N} + \frac{\sqrt{2}\sqrt{n}\,\sigma^2}{(n-1)N}\,\eta + O\!\left(\frac{1}{nN}\right), \tag{B.2}$$

where $\eta$ is a standard normal random variable. Replacing the numerator and denominator of (3.3) by their respective Cornish-Fisher expansions gives, for $\theta = 0$,

[Equation (B.3) is illegible in the source.]

Since the numerator and denominator of $JT$ are not independent, let $E(\xi\eta) = \rho$. Then from (B.3),

[Equations (B.4), giving $E(T)$, $\mathrm{var}(T)$, and $E(T^3)$ in terms of $\rho$, $E(\xi^2\eta)$, and $E(\xi^3\eta)$, are illegible in the source.]

The expressions in (B.4), $E(\xi^2\eta)$, and $E(\xi^3\eta)$ have solutions as functions of $\rho$: Let $\xi = \rho\eta + \eta^*$, where $\eta$ and $\eta^*$ are independent, $\eta^*$ is standard normal, and $E(\xi\eta) = \rho$. Then, for example,

$$E(\xi^3\eta) = E[(\rho\eta + \eta^*)^3\eta] = E[\rho^3\eta^4 + 3\rho^2\eta^3\eta^* + 3\rho\eta^2\eta^{*2} + \eta\eta^{*3}] = 3\rho^3 + 3\rho.$$

Similarly, $E(\xi^2\eta) = 0$. The expressions (B.4) reduce to

[Equations (B.5), giving $E(T)$, $\mathrm{var}(T) = 1 + O(n^{-1})$, and the third moment of $T$, are otherwise illegible in the source.]

The covariance between numerator and denominator of $JT$ can be obtained in terms of moments and cross moments of the pseudovalues. From these we have

[Equation for $\rho$ is illegible in the source.]

The moments as given in Theorem 1 follow directly.

[Received June 1975. Revised February 1978.]

REFERENCES

Abramowitz, Milton, and Stegun, Irene A. (eds.) (1970), Handbook of Mathematical Functions, New York: Dover Publications.

Andrews, David F., Bickel, Peter J., Hampel, F.R., Huber, Peter J., Rogers, William H., and Tukey, John W. (1972), Robust Estimates of Location: Survey and Advances, Princeton, N.J.: Princeton University Press.

Anscombe, Francis J. (1950), "Tables of the Hyperbolic Transformation Sinh$^{-1}\sqrt{x}$," Journal of the Royal Statistical Society, Ser. A, 113, 228-229.

Arvesen, James N. (1969), "Jackknifing U-Statistics," Annals of Mathematical Statistics, 40, 2076-2100.

Bartlett, M.S. (1935), "The Effect of Non-Normality on the t Distribution," Proceedings of the Cambridge Philosophical Society, 31, 223-231.

Box, George E.P., and Andersen, Sigurd L. (1955), "Permutation Theory in the Derivation of Robust Criteria and the Study of Departures from Assumption," Journal of the Royal Statistical Society, Ser. B, 17, 1-26.

Chung, Kai-Lai (1946), "The Approximate Distribution of Student's Statistic," Annals of Mathematical Statistics, 17, 447-465.

Cornish, E.A., and Fisher, R.A. (1937), "Moments and Cumulants in the Specification of Distributions," Revue of the International Statistics Institute, 5, 307-327.

Gayen, A.K. (1949), "The Distribution of 'Student's' t in Random Samples of Any Size Drawn from Non-Normal Universes," Biometrika, 36, 353-369.


Geary, R.C. (1936), "The Distribution of 'Student's' Ratio for Non-Normal Samples," Journal of the Royal Statistical Society, Supplement 3, No. 2, 178-184.

Ghosh, B.K. (1966), "Asymptotic Expansions for the Moments of the Distribution of Correlation Coefficient," Biometrika, 53, 258-262.

Hartigan, John A. (1969), "Using Subsample Values as Typical Values," Journal of the American Statistical Association, 64, 1303-1317.

Hartigan, John A. (1975), "Necessary and Sufficient Conditions for Asymptotic Joint Normality of a Statistic and its Subsample Values," Annals of Statistics, 3, 573-580.

Hoeffding, Wassily (1948), "A Class of Statistics with Asymptotically Normal Distribution," Annals of Mathematical Statistics, 19, 293-325.

Hotelling, Harold (1961), "The Behavior of Some Standard Statistical Tests Under Non-Standard Conditions," Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 1, University of California Press, 319-359.

Johnson, Norman J. (1974), The Effects of Population Skewness on Confidence Intervals Determined from Mean-Like Statistics, unpublished Ph.D. dissertation, Department of Statistics, Yale University.

Laderman, Jack (1939), "The Distribution of 'Student's' Ratio for Samples of Two Items Drawn from Non-Normal Universes," Annals of Mathematical Statistics, 10, 376-379.

Marsaglia, G., Ananthanarayanan, K., and Paul, N. (1973), "How to Use the McGill Random Number Package 'SUPER-DUPER'," unpublished manuscript, School of Computer Sciences, McGill University, Montreal.

Miller, Rupert G. (1974), "The Jackknife - A Review," Biometrika, 61, 1-15.

Nair, A.K.N. (1941), "Distribution of Student's t in the Correlation Coefficient in Samples from Non-Normal Populations," Sankhyā, 5, 383-400.

Neyman, Jerzy, and Pearson, Egon S. (1928), "On the Use and Interpretation of Certain Test Criteria for Purposes of Statistical Inference Part I," Biometrika, 20A, 175-240.

Perlo, Victor (1933), "On the Distribution of Student's Ratio for Samples of Three Drawn from a Rectangular Distribution," Biometrika, 25, 203-204.

Quenouille, Maurice H. (1949), "Approximate Tests of Correlation in Time-Series," Journal of the Royal Statistical Society, Ser. B, 11, 68-84.

Quenouille, Maurice H. (1956), "Notes on Bias in Estimation," Biometrika, 43, 353-360.

Rider, Paul R. (1929), "On the Distribution of the Ratio of Mean to Standard Deviation in Small Samples from Non-Normal Universes," Biometrika, 21, 124-143.

"Sophister" (1928), "Discussion of Small Samples Drawn from an Infinite Skew Population," Biometrika, 20A, 389-423.

"Student" (1908), "The Probable Error of a Mean," Biometrika, 6, 1-25.

Tukey, John W. (1964), Data Analysis and Behavioral Sciences, unpublished manuscript.

Tukey, John W. (1977), Exploratory Data Analysis, Reading, Mass.: Addison-Wesley Publishing Co.

Wallace, David L. (1958), "Asymptotic Approximations to Distributions," Annals of Mathematical Statistics, 29, 635-654.

Yuen, Karen K. (1974), "The Two-Sample Trimmed t for Unequal Population Variances," Biometrika, 61, 165-170.
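As a rough illustration of the adjustment derived in Appendix A, the following Python sketch evaluates a modified t statistic of the form $t_1 = [(\bar{x} - \mu) + \lambda + \gamma(\bar{x} - \mu)^2](s^2/N)^{-1/2}$. It assumes that (2.3) has this form and that (2.4) gives $\lambda = \mu_3/(6\sigma^2 N)$ and $\gamma = \mu_3/(3\sigma^4)$, the values that zero the constant term and the squared-normal term of the expansion. The function names and the simple plug-in estimator of $\mu_3$ are illustrative choices, not taken from the article.

```python
import math


def student_t(sample, mu0):
    """Ordinary one-sample t statistic for the hypothesis mean = mu0."""
    n = len(sample)
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return (mean - mu0) / math.sqrt(s2 / n)


def johnson_t1(sample, mu0):
    """Skewness-adjusted t statistic (sketch; plug-in moment estimates assumed)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(sample) / n
    s2 = sum((x - mean) ** 2 for x in sample) / (n - 1)  # estimate of sigma^2
    m3 = sum((x - mean) ** 3 for x in sample) / n        # estimate of mu_3 (assumed estimator)
    d = mean - mu0
    lam = m3 / (6.0 * s2 * n)    # lambda = mu_3 / (6 sigma^2 N)
    gam = m3 / (3.0 * s2 ** 2)   # gamma  = mu_3 / (3 sigma^4)
    return (d + lam + gam * d * d) / math.sqrt(s2 / n)


# For a symmetric sample the estimated third moment is zero,
# so the adjusted statistic coincides with the ordinary t.
symmetric = [1, 2, 3, 4, 5]
print(round(student_t(symmetric, 2.0), 4))   # 1.4142
print(round(johnson_t1(symmetric, 2.0), 4))  # 1.4142
```

With a right-skewed sample the estimated $\mu_3$ is positive, so both correction terms shift the statistic upward relative to the ordinary t.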

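The $\chi^2$ column of Table 2 can be checked directly from the tabulated frequencies. The sketch below assumes the statistic is the usual Pearson goodness-of-fit sum of (observed - expected)^2 / expected over the six intervals, with the expected frequencies 10, 40, 450, 450, 40, 10 from the table header; the function and variable names are illustrative only.

```python
# Expected interval frequencies over 1,000 samples, from the table header.
EXPECTED = [10, 40, 450, 450, 40, 10]


def chi_square(observed, expected=EXPECTED):
    """Pearson goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Two rows transcribed from Table 2, both for N = 25.
jack_n25_k1 = [67, 74, 474, 373, 10, 2]   # JACK, n = 25, k = 1
fish_z_n25 = [14, 52, 503, 387, 31, 13]   # FISH Z

print(round(chi_square(jack_n25_k1), 1))  # 397.2
print(round(chi_square(fish_z_n25), 1))   # 23.2
```

Both values agree with the tabulated entries, which supports reading the six numeric columns as observed counts per 1,000 samples.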