Second-Order Accurate Inference on Simple, Partial, and Multiple Correlations
Robert J. Boik and Ben Haaland
Journal of Modern Applied Statistical Methods, Volume 5, Issue 2, Article 2, 11-1-2005

Second-Order Accurate Inference on Simple, Partial, and Multiple Correlations

Robert J. Boik, Montana State University–Bozeman, [email protected]
Ben Haaland, University of Wisconsin–Madison, [email protected]

Follow this and additional works at: http://digitalcommons.wayne.edu/jmasm
Part of the Applied Statistics Commons, Social and Behavioral Sciences Commons, and the Statistical Theory Commons

Recommended Citation
Boik, Robert J. and Haaland, Ben (2005) "Second-Order Accurate Inference on Simple, Partial, and Multiple Correlations," Journal of Modern Applied Statistical Methods: Vol. 5, Iss. 2, Article 2. DOI: 10.22237/jmasm/1162353660. Available at: http://digitalcommons.wayne.edu/jmasm/vol5/iss2/2

This Invited Article is brought to you for free and open access by the Open Access Journals at DigitalCommons@WayneState. It has been accepted for inclusion in Journal of Modern Applied Statistical Methods by an authorized editor of DigitalCommons@WayneState.

Journal of Modern Applied Statistical Methods, November 2006, Vol. 5, No. 2, 283–308. Copyright © 2006 JMASM, Inc. 1538–9472/06/$95.00

Invited Articles

Second-Order Accurate Inference on Simple, Partial, and Multiple Correlations

Robert J. Boik, Mathematical Sciences, Montana State University–Bozeman
Ben Haaland, Statistics, University of Wisconsin–Madison

This article develops confidence interval procedures for functions of simple, partial, and squared multiple correlation coefficients. It is assumed that the observed multivariate data represent a random sample from a distribution that possesses finite moments, but there is no requirement that the distribution be normal. The coverage error of conventional one-sided large sample intervals decreases at rate 1/√n as n increases, where n is an index of sample size. The coverage error of the proposed intervals decreases at rate 1/n as n increases.
The results of a simulation study that evaluates the performance of the proposed intervals are reported, and the intervals are illustrated on a real data set.

Key words: bootstrap, confidence intervals, Cornish-Fisher expansion, Edgeworth expansion, second-order accuracy

Author note: Robert Boik ([email protected]) is Professor of Statistics. His research interests include linear models, multivariate analysis, and large sample methods. Ben Haaland ([email protected]) is a Ph.D. student. His research interests include multivariate analysis, rates of convergence, and nonparametric techniques.

Introduction

Accurate inference procedures for simple, partial, and multiple correlation coefficients depend on accurate approximations to the distributions of estimators of these coefficients. In this article, Edgeworth expansions for these distributions are derived. The Edgeworth expansions, in turn, are used to construct confidence intervals that are more accurate than conventional large sample intervals.

Denote the p × p sample correlation matrix based on a random sample of size N from a p-variate distribution by R, and denote the corresponding population correlation matrix by ∆. The p² × 1 vectors obtained by stacking the columns of R and ∆ are denoted by r and ρ, and are obtained by applying the vec operator. That is, r = vec R and ρ = vec ∆ by definition. The exact joint distribution of the components of r, when sampling from a multivariate normal distribution, was derived by Fisher (1962), but the distribution is very difficult to use in practice because it is expressed in integral form unless p = 2. If sample size is sufficiently large, however, then one may substitute the asymptotic distribution of r, but with some loss of accuracy.

In this article, the big O (pronounced oh) notation (Bishop, Fienberg, & Holland, 1975, §14.2) is used to index the magnitude of a quantity. Let u_n be a quantity that depends on n = N − r_x, where r_x is a fixed constant and N is sample size. Then u_n = O(n^(−k)) if n^k |u_n| is bounded as n → ∞. Note that if u_n = O(n^(−k)) and k > 0, then u_n converges to zero as n → ∞. Also, the rate of convergence to zero is faster for large k than for small k. An approximation to the distribution of a random variable is said to be jth-order accurate if the difference between the approximating cumulative distribution function (cdf) and the exact cdf has magnitude O(n^(−j/2)). For example, the cdf of a sample correlation coefficient, r_ij, is F_{r_ij}(t) = P(r_ij ≤ t). Suppose that F̂_{r_ij} is an estimator of F_{r_ij}. Then F̂_{r_ij} is first-order accurate if |F_{r_ij}(t) − F̂_{r_ij}(t)| = O(n^(−1/2)) for all t.

Pearson and Filon (1898) showed that the first-order accurate asymptotic joint distribution of the components of √n(r − ρ), when sampling from a multivariate normal distribution, is itself multivariate normal with mean zero and finite covariance matrix. Pearson and Filon also derived expressions for the components of the asymptotic covariance matrix. A matrix expression for the asymptotic covariance matrix was derived by Nell (1985). Niki and Konishi (1984) derived the Edgeworth and Cornish-Fisher expansions for √n[Z(r) − Z(ρ)] with error only O(n^(−9/2)) when sampling from a bivariate normal distribution, where Z is Fisher's (1921) Z transformation. If the requirement of multivariate normality is relaxed, then the distributions of r and Z(r) are substantially more complicated.

Fortunately, the asymptotic distribution of √n(r − ρ) still is multivariate normal with mean zero and finite covariance matrix whenever the parent distribution has finite fourth-order moments. Expressions for the scalar components of the asymptotic covariance matrix of √n(r − ρ) when sampling from non-normal distributions were derived by Hsu (1949) and Steiger and Hakstian (1982, 1983). Corresponding matrix expressions were derived by Browne and Shapiro (1986) and Neudecker and Wesselman (1990). The asymptotic bias of r when sampling from non-normal distributions was obtained by Boik (1998).

In the bivariate case, Cook (1951) obtained scalar expressions for the moments of r with error O(n^(−5/2)). These moments could be used to compute the first three cumulants of √n(r − ρ). The first four cumulants and the corresponding Edgeworth expansion for the distribution of √n(r − ρ) were obtained by Nakagawa and Niki (1992).

Some distributional results for √n(r − ρ) have been obtained under special non-normal conditions. Neudecker (1996), for example, obtained a matrix expression for the asymptotic covariance matrix in the special case of elliptical parent distributions. Also, Yuan and Bentler (1999) gave the asymptotic distribution of correlation and multiple correlation coefficients when the p-vector of random variables can be written as a linear function of zv, where z is a p-vector of independent components and v is a scalar that is independent of z.

This article focuses on confidence intervals for functions of ∆. Specifically, second-order accurate interval estimators for simple, partial, and squared multiple correlation coefficients, as well as for differences among simple, partial, and squared multiple correlation coefficients, are constructed. A confidence interval is said to be jth-order accurate if the difference between the actual coverage and the stated nominal coverage has magnitude O(n^(−j/2)). In general, large sample confidence intervals are first-order accurate under mild validity conditions.

First-order accurate confidence intervals for correlation functions are not new. Olkin and Siotani (1976) and Hedges and Olkin (1983) used the delta method (Rao, 1973, §6a.2) together with the asymptotic distribution of √n(r − ρ) when sampling from a multivariate normal distribution to derive the asymptotic distribution of partial and multiple correlation coefficient estimators. Olkin and Finn (1995) used these results to obtain explicit expressions for asymptotic standard errors of estimators of simple, partial, and squared multiple correlation coefficients, as well as expressions for standard errors of differences among these coefficients. The major contribution of Olkin and Finn (1995) was that they demonstrated how to use existing moment expressions to compute first-order accurate confidence intervals for correlation functions when sampling from multivariate normal distributions.

To avoid complicated expressions for derivatives, Olkin and Finn (1995) gave confidence intervals for partial correlations and their differences only in the special case in which the effects of a single variable are partialed out. Graf and Alf (1999) extended the results of Olkin and Finn to allow conditioning on any set of variables rather than a single variable, and to allow squared multiple correlations to be based on an arbitrary number of explanatory variables. Alf and Graf (1999) gave scalar equations for the standard error of the difference between two squared sample multiple correlations.

This article also extends the results of Olkin and Finn (1995), but does so using relatively simple and easily computed (with a computer) matrix expressions for the required derivatives. As in the previous articles, this article relies on derivatives of correlation functions. Unlike Olkin and Finn, however, simple, partial, and multiple correlations are treated as functions of the covariance matrix, Σ, instead of the correlation matrix, ∆. The advantage of the current approach is the simplicity of the resulting expansions.

This article also extends the results of Olkin and Finn (1995), Graf and Alf (1999), and Alf and Graf (1999) to be applicable to non-normal as well as normal distributions. This extension is straightforward. Denote the sample covariance matrix based on a random sample of size N by S. Then, one needs only to replace the asymptotic covariance matrix of vec S derived under normality with the asymptotic covariance matrix derived under general conditions.

In summary, the contributions of this article are as follows: (a) easily computed matrix expressions for the required derivatives are given (see Theorems 2 and 3); (b) the proposed intervals are asymptotically distribution free (ADF); and (c) the accu-
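The quantities introduced above are straightforward to compute. The following sketch is an editorial illustration, not code from the article; the function and variable names are invented here. It forms the sample correlation matrix R and r = vec R from a sample, and computes the conventional first-order accurate interval for one simple correlation via Fisher's (1921) Z transformation, Z(r) = arctanh(r).

```python
import numpy as np

def vec(A):
    """Stack the columns of A into a column vector (the vec operator)."""
    return A.reshape(-1, 1, order="F")

def fisher_z_interval(r_ij, N, z_crit=1.959964):
    """Conventional large-sample 95% interval for a simple correlation
    based on Fisher's Z transformation; first-order accurate, i.e. its
    one-sided coverage error shrinks at rate 1/sqrt(n)."""
    z = np.arctanh(r_ij)            # Z(r) = (1/2) log((1 + r) / (1 - r))
    se = 1.0 / np.sqrt(N - 3)       # usual normal-theory standard error
    return np.tanh(z - z_crit * se), np.tanh(z + z_crit * se)

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200)
R = np.corrcoef(X, rowvar=False)    # p x p sample correlation matrix
r = vec(R)                          # p^2 x 1 vector of correlations
lo, hi = fisher_z_interval(R[0, 1], N=200)
```

The interval is back-transformed with tanh, so its endpoints always lie in (−1, 1).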
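Treating simple, partial, and squared multiple correlations as functions of the covariance matrix Σ, as the article advocates, can be made concrete with standard textbook formulas. This is an added illustration under those standard definitions, not the authors' derivative machinery: the partial correlation is read off the conditional covariance Σ_aa − Σ_aK Σ_KK⁻¹ Σ_Ka, and the squared multiple correlation of y on a set x is σ_yx Σ_xx⁻¹ σ_xy / σ_yy.

```python
import numpy as np

def simple_corr(S, i, j):
    """rho_ij as a function of the covariance matrix S."""
    return S[i, j] / np.sqrt(S[i, i] * S[j, j])

def partial_corr(S, i, j, given):
    """Partial correlation of variables i and j, conditioning on the
    (possibly multi-element) index set `given`."""
    k, a = list(given), [i, j]
    Skk_inv = np.linalg.inv(S[np.ix_(k, k)])
    C = S[np.ix_(a, a)] - S[np.ix_(a, k)] @ Skk_inv @ S[np.ix_(k, a)]
    return C[0, 1] / np.sqrt(C[0, 0] * C[1, 1])

def squared_multiple_corr(S, y, xs):
    """Squared multiple correlation of variable y on explanatory set xs."""
    x = list(xs)
    s_yx = S[np.ix_([y], x)]
    return (s_yx @ np.linalg.inv(S[np.ix_(x, x)]) @ s_yx.T).item() / S[y, y]

# Exchangeable covariance matrix with all correlations 0.5:
S = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
# Here partial_corr(S, 0, 1, [2]) and squared_multiple_corr(S, 0, [1, 2])
# both equal 1/3.
```

Because every quantity is a smooth function of S, the delta method described above applies directly once the derivatives of these maps are available.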
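The extension to non-normal parents described above amounts to replacing the normal-theory asymptotic covariance matrix of vec S with one that is valid under general conditions. One standard asymptotically distribution-free (ADF) estimator, sketched here as an added illustration rather than the paper's own notation, is the empirical covariance of the vectors d_i = vec((x_i − x̄)(x_i − x̄)'), which requires only finite fourth-order moments.

```python
import numpy as np

def adf_cov_vecS(X):
    """ADF estimate of the asymptotic covariance matrix of
    sqrt(n) * vec(S), built from empirical fourth-order moments;
    no normality assumption on the parent distribution."""
    N, p = X.shape
    Xc = X - X.mean(axis=0)
    # Each row of D is the outer product (x_i - xbar)(x_i - xbar)',
    # flattened; the outer product is symmetric, so row- and
    # column-major stacking agree with vec.
    D = np.einsum("ij,ik->ijk", Xc, Xc).reshape(N, p * p)
    return np.cov(D, rowvar=False)  # sample covariance of the d_i

rng = np.random.default_rng(1)
X = rng.standard_exponential((500, 3))  # skewed, non-normal parent
Omega = adf_cov_vecS(X)                 # p^2 x p^2 matrix
```

Under multivariate normality this estimate converges to the familiar normal-theory expression, so substituting it costs nothing when the data happen to be normal.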