On the Asymptotic Distribution of U-Statistics
by Alan J. Lee*
University of Auckland and University of North Carolina at Chapel Hill

Abstract

The asymptotic distribution of a U-statistic is found in the case when the corresponding Von Mises functional is stationary of order 1. Practical methods for the tabulation of the limit distributions are discussed, and the results are extended to certain incomplete U-statistics.

Key Words and Phrases: asymptotic distribution, stationary statistical functional, incomplete U-statistic

*The work of this author was partially supported by the Air Force Office of Scientific Research under Contract AFOSR-75-2796.

§1. Introduction.

Let $X_1,\dots,X_n$ be independent random $k$-vectors with common distribution function $F$. Let $\phi(x_1,\dots,x_m)$ be a function on $\mathbb{R}^{km}$ symmetric in the $k$-vectors $x_1,\dots,x_m$, and let

$$\theta(F) = \int\cdots\int \phi(x_1,\dots,x_m)\prod_{i=1}^{m} F(dx_i) \qquad (1)$$

be a functional defined on the space of d.f.s on $\mathbb{R}^k$, assuming that the integral exists. An unbiased estimate of $\theta = \theta(F)$ is furnished by the U-statistic

$$U_n = \binom{n}{m}^{-1}\sum \phi(X_{i_1},\dots,X_{i_m}) \qquad (2)$$

where the sum extends over all $\binom{n}{m}$ subsets of size $m$ of the $X_i$. Following Hoeffding [10], define

$$\phi_c(x_1,\dots,x_c) = E[\phi(x_1,\dots,x_c,X_{c+1},\dots,X_m)], \qquad c = 1,2,\dots,m,$$

and $\Psi_c(x_1,\dots,x_c) = \phi_c(x_1,\dots,x_c) - \theta$, assuming all expectations exist. Also define $\zeta_0 = 0$ and

$$\zeta_c = E[\Psi_c^2(X_1,\dots,X_c)], \qquad c = 1,2,\dots,m.$$

If for some nonnegative integer $d < m$, $\zeta_{d+1} \neq 0$ but $\zeta_c = 0$ for $c = 0,\dots,d$, the functional (1) is said to be stationary of order $d$ at $F$. If $d = 0$, then (2) has an asymptotically normal distribution after suitable normalization [10]. If $d > 0$, the distribution is no longer normal and may be obtained by applying the theory of differentiable statistical functionals due to Von Mises and Filippova ([13], [7]).

Let $F_n$ be the empirical distribution function of $X_1,\dots,X_n$. The distribution of

$$\theta(F_n) = \frac{1}{n^m}\sum_{i_1=1}^{n}\cdots\sum_{i_m=1}^{n} \phi(X_{i_1},\dots,X_{i_m}) \qquad (3)$$

is closely related to the distribution of $U_n$ as described in §2, and the asymptotic distribution of $\theta(F_n)$ is given by the following theorem.

Theorem (Von Mises, Filippova). Let $\theta$ be a stationary functional of order $d$. Then the asymptotic distribution of $n^{(d+1)/2}(\theta(F_n)-\theta)$ is identical to that of

$$n^{(d+1)/2}\binom{m}{d+1}\int\cdots\int \phi_{d+1}(x_1,\dots,x_{d+1})\prod_{i=1}^{d+1}\bigl(F_n(dx_i)-F(dx_i)\bigr). \qquad (4)$$

In the case $d = 1$, the asymptotic distribution of (4) can be found using integral equation techniques. Filippova gives an expression for the characteristic function of the asymptotic distribution of (4) in terms of Fredholm determinants. Gregory [8] gives a more concrete representation of the asymptotic distribution of (4) as an infinite series of random variables, in the case $m = 2$. Below we combine these results to give explicit expressions for the c.f.s of the limit distributions and discuss practical methods for the tabulation of the limit distributions. We also make some remarks about incomplete U-statistics.

§2. Relationship between $\theta(F_n)$ and $U_n$.

Let $I$ be the set of indices $\{(i_1,\dots,i_m): 1 \le i_\ell \le n,\ \ell = 1,\dots,m\}$ and let $I_j$ be the subset of $I$ consisting of those $m$-tuples with exactly $j$ distinct integers. Define for each $j = 1,\dots,m$ a symmetric kernel $\phi_{(j)}(x_1,\dots,x_j)$ by

$$j!\,\phi_{(j)}(x_1,\dots,x_j) = \sum\nolimits_{(j)} \phi(x_{i_1},\dots,x_{i_m})$$

where the sum $\sum_{(j)}$ is taken over all $m$-tuples of indices $(i_1,\dots,i_m)$ with $1 \le i_\ell \le j$ such that exactly $j$ of the $i_\ell$ are distinct. Then

$$\theta(F_n) = \frac{1}{n^m}\sum_{I} \phi(X_{i_1},\dots,X_{i_m}) = \frac{1}{n^m}\sum_{j=1}^{m}\sum_{I_j} \phi(X_{i_1},\dots,X_{i_m}) = \frac{1}{n^m}\sum_{j=1}^{m} j!\binom{n}{j}U_n^{(j)}$$

where $U_n^{(j)}$ is the U-statistic corresponding to the kernel $\phi_{(j)}$. Note that $U_n^{(m)} = U_n$. Thus

$$n(\theta(F_n)-\theta) = n(U_n-\theta) + Z_n \qquad (5)$$

where

$$Z_n = n\left[\left(\frac{m!}{n^m}\binom{n}{m}-1\right)U_n + \frac{1}{n^m}\sum_{j=1}^{m-1} j!\binom{n}{j}U_n^{(j)}\right]. \qquad (6)$$

If $\theta$ is stationary of order 1, then $n(\theta(F_n)-\theta)$ converges to a nondegenerate distribution as seen in §3, and $\mathrm{Var}(nU_n)$ is bounded as $n \to \infty$ ([10]). Moreover, $E(Z_n)$ converges to the quantity $-\frac{m(m-1)}{2}\theta + \theta^{(m-1)}$, where $\theta^{(m-1)} = E[\phi_{(m-1)}(X_1,\dots,X_{m-1})]$, and since $\mathrm{Var}\,U_n = O(n^{-2})$ and $\mathrm{Var}\,U_n^{(j)} = O(n^{-1})$ for $j < m$, it follows from (6) that $Z_n$ converges in probability to $-\frac{m(m-1)}{2}\theta + \theta^{(m-1)}$, and so the asymptotic distributions of $n(\theta(F_n)-\theta)$ and $n(U_n-\theta)$ differ only by a shift. A more explicit expression for $\theta^{(m-1)}$ is

$$\theta^{(m-1)} = \frac{1}{(m-1)!}\sum\nolimits_{(m-1)} E(\phi(X_{i_1},\dots,X_{i_m})) = \frac{1}{(m-1)!}\,\frac{m!\,(m-1)}{2}\,E(\phi(X_1,X_1,X_2,\dots,X_{m-1})) = \frac{m(m-1)}{2}E(\phi_2(X_1,X_1)) = \binom{m}{2}E(\phi_2(X_1,X_1)),$$

and so we may write

$$\lim_n E(Z_n) = \lim_n E\,n(\theta(F_n)-\theta) = \binom{m}{2}E(\Psi_2(X_1,X_1)).$$

§3. Asymptotic distribution of $n(\theta(F_n)-\theta)$ and $n(U_n-\theta)$.

By the Von Mises-Filippova theorem, the asymptotic distribution of $n(\theta(F_n)-\theta)$ coincides with that of

$$\binom{m}{2}\int\!\!\int \Psi_2(x_1,x_2)\,G_n(dx_1)\,G_n(dx_2) \qquad (7)$$

where $G_n(x) = \sqrt{n}\,(F_n(x)-F(x))$; replacing $\phi_2$ by $\Psi_2 = \phi_2 - \theta$ does not change the integral, since $G_n$ has total mass zero. Consider the kernel $\tilde\Psi_2(x_1,x_2)$ where (all integrals are over $\mathbb{R}^k$)

$$\tilde\Psi_2(x_1,x_2) = \Psi_2(x_1,x_2) - \int \Psi_2(x_1,v)F(dv) - \int \Psi_2(u,x_2)F(du) + \int\!\!\int \Psi_2(u,v)F(du)F(dv).$$

Then it is easily seen that

$$\int\!\!\int \tilde\Psi_2(x_1,x_2)\,G_n(dx_1)\,G_n(dx_2) = \int\!\!\int \Psi_2(x_1,x_2)\,G_n(dx_1)\,G_n(dx_2) = \frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \tilde\Psi_2(X_i,X_j). \qquad (8)$$

Now let $\lambda_j$, $j \ge 1$, be the (real) eigenvalues of the linear operator $L_2(\mathbb{R}^k,dF) \to L_2(\mathbb{R}^k,dF)$ with (symmetric) kernel $\tilde\Psi_2$. We note that by the assumption that $\zeta_2$ exists, $\int\!\!\int |\tilde\Psi_2(u,v)|^2 F(du)F(dv) < \infty$. By the results of [8], we can see that

(i) the asymptotic distribution of (7), that is, of $\binom{m}{2}$ times (8), is the same as that of

$$\binom{m}{2}\Bigl\{\sum_{j=1}^{\infty} \lambda_j(Y_j-1) + E(\tilde\Psi_2(X_1,X_1))\Bigr\} \qquad (9)$$

where the $Y_j$ are independent $\chi^2_1$ random variables; and

(ii) if $\sum_{j=1}^{\infty} |\lambda_j| < \infty$ then the asymptotic distribution of (8) is that of $\sum_{j=1}^{\infty} \lambda_j Y_j$.

To prove these, set the functions $h_n$ in Theorem 2.1 of [8] to be 0; then $\frac{1}{n}\sum_{i \neq j} \tilde\Psi_2(X_i,X_j)$ converges in distribution to $\sum_{j=1}^{\infty} \lambda_j(Y_j-1)$. Since

$$\frac{1}{n}\sum_{i=1}^{n}\sum_{j=1}^{n} \tilde\Psi_2(X_i,X_j) = \frac{1}{n}\sum_{i \neq j} \tilde\Psi_2(X_i,X_j) + \frac{1}{n}\sum_{i=1}^{n} \tilde\Psi_2(X_i,X_i)$$

and $\frac{1}{n}\sum_{i=1}^{n} \tilde\Psi_2(X_i,X_i)$ converges in probability to $E[\tilde\Psi_2(X_1,X_1)]$ by the WLLN, (9) follows upon noting that

$$E[\tilde\Psi_2(X_1,X_1)] = \int \Psi_2(x_1,x_1)F(dx_1) - E(\Psi_2(X_1,X_2)) = \int \Psi_2(x_1,x_1)F(dx_1)$$

since $E[\Psi_2(X_1,X_2)] = 0$ by [10]. (ii) follows as in 1.2.3 of [8].
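The relationship between $\theta(F_n)$ and $U_n$ in §2 can be checked numerically in a simple case. The sketch below assumes a hypothetical kernel $\phi(x,y) = xy$ with $m = 2$ and standard normal observations; this example is not taken from the text. For this kernel $\theta = 0$ and $\phi_1 \equiv 0$, so the functional is stationary of order 1, and the associated operator has the single nonzero eigenvalue $\lambda_1 = E X_1^2 = 1$. All function and variable names are illustrative.

```python
import random
from itertools import combinations

def kernel(x, y):
    # Hypothetical degenerate kernel phi(x, y) = x*y (m = 2); not from the text.
    return x * y

def u_statistic(xs):
    # Average of the kernel over all subsets {i, j} with i < j, as in (2).
    pairs = list(combinations(xs, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)

def v_statistic(xs):
    # theta(F_n): average of the kernel over all n^2 ordered pairs, as in (3).
    n = len(xs)
    return sum(kernel(x, y) for x in xs for y in xs) / n ** 2

rng = random.Random(0)           # fixed seed, for reproducibility only
n = 200
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]

s1 = sum(xs)                     # sum of the X_i
s2 = sum(x * x for x in xs)      # sum of the X_i^2 (the diagonal terms)

u_n = u_statistic(xs)
v_n = v_statistic(xs)

# Decomposition of section 2 for m = 2: n^2 * theta(F_n) = n(n-1) * U_n + s2.
print(n * v_n, (n - 1) * u_n + s2 / n)

# Closed forms available for this particular kernel.
print(n * v_n, s1 ** 2 / n)
print(n * u_n, (s1 ** 2 - s2) / (n - 1))
```

In this example $n\theta(F_n) = (\sum X_i)^2/n$ behaves like a $\chi^2_1$ variable $Y_1$ and $nU_n$ like $Y_1 - 1$, so the two differ asymptotically by the shift $E X_1^2 = 1$.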
Using (i) and (ii) above and (5) we see that the asymptotic distribution of $n(U_n-\theta)$ is that of $\binom{m}{2}\sum_{j=1}^{\infty} \lambda_j(Y_j-1)$, and if $\sum_{j=1}^{\infty} |\lambda_j| < \infty$, the distribution is identical to that of $\binom{m}{2}\bigl\{\sum_{j=1}^{\infty} \lambda_j Y_j - E[\tilde\Psi_2(X_1,X_1)]\bigr\}$. The characteristic functions of the limit distributions are thus seen to be

$$\phi(t) = \prod_{j=1}^{\infty} e^{-it\lambda_j'}(1-2i\lambda_j' t)^{-1/2}$$

in the former case and

$$\phi(t) = e^{-it\mu}\prod_{j=1}^{\infty} (1-2i\lambda_j' t)^{-1/2}$$

in the latter, where $\mu = \binom{m}{2}E[\tilde\Psi_2(X_1,X_1)]$ and $\lambda_j' = \binom{m}{2}\lambda_j$, $j \ge 1$.

§4. Tabulation of the limit distribution.

In this section we discuss practical means of computing the limit distribution of $n(U_n-\theta)$. We first consider the case when the eigenvalues $\lambda_j$ are summable. The method employed to tabulate the limit distribution will depend on the amount of information we possess about the eigenvalues. There are three subcases.

(i) The eigenvalues must be determined numerically, and we have accurate determinations of $\lambda_j$ for $1 \le j \le N$.

(ii) The eigenvalues can be explicitly determined for all $j$.

(iii) The characteristic function of $\sum_{j=1}^{\infty} \lambda_j Y_j$, namely $\phi(t) = \prod_{j=1}^{\infty} (1-2it\lambda_j)^{-1/2}$, can be expressed in closed form.

In subcase (iii), the limit distribution may most conveniently be found by numerical inversion of $\phi(t)$. Various methods are available (e.g. Davies [5], Bohman [3], Martynov [12]). In subcase (ii), computation of the infinite product is necessary. A reasonable way to accomplish this is described in [2] and consists of expressing

$$\phi(t) = \prod_{j=1}^{N-1} (1-2it\lambda_j)^{-1/2}\,\phi_2(t), \qquad \text{where} \quad \log\phi_2(t) = -\tfrac{1}{2}\sum_{j=N}^{\infty} \log(1-2it\lambda_j).$$

Formally, we have

$$\log\phi_2(t) = \tfrac{1}{2}\sum_{j=N}^{\infty}\sum_{k=1}^{\infty} (2it\lambda_j)^k/k = \tfrac{1}{2}\sum_{k=1}^{\infty} c_{N,k}(2it)^k/k, \qquad \text{where} \quad c_{N,k} = \sum_{j=N}^{\infty} \lambda_j^k.$$

The formal manipulations above are valid if the last power series converges. Since $|c_{N,k}| \le \bigl[\sum_{j=N}^{\infty} |\lambda_j|\bigr]^k = C_N^k$, say, this convergence will take place if $|t| < 1/(2C_N)$. The numerical inversion methods above will require the computation of $\phi_2$ at a finite number $M$ of ordinates $t_m$, so choose $N$ so large that $C_N < (2\max_m |t_m|)^{-1}$ and use the power series to compute $\phi_2$.

In special cases other methods may be used. If the eigenvalues $\lambda_j$ are all positive, the asymptotic results of Zolotarev [14] and Hoeffding [11] furnish good approximations for the tail probabilities.
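The splitting of $\phi(t)$ into a finite product and a power-series tail, as described for subcase (ii), might be sketched as follows. The code assumes a hypothetical geometric eigenvalue sequence $\lambda_j = r^j$ with $0 < r < 1$, chosen only because $c_{N,k} = r^{Nk}/(1-r^k)$ and $C_N = r^N/(1-r)$ are then available in closed form; it does not come from the text, and the function names are illustrative. The result is compared with a long direct product as a rough check.

```python
import cmath

def tail_bound(N, r):
    # C_N = sum_{j >= N} |lambda_j| for the assumed sequence lambda_j = r**j.
    return r ** N / (1.0 - r)

def choose_N(t_max, r):
    # Smallest N with C_N < 1 / (2 * t_max), so that the power series
    # converges at every ordinate with |t| <= t_max.
    N = 1
    while tail_bound(N, r) >= 1.0 / (2.0 * t_max):
        N += 1
    return N

def log_phi2(t, N, r, n_terms=80):
    # Truncated series (1/2) * sum_k c_{N,k} (2it)^k / k,
    # with c_{N,k} = sum_{j >= N} r**(j*k) = r**(N*k) / (1 - r**k).
    z = 2j * t
    total = 0j
    for k in range(1, n_terms + 1):
        c_Nk = r ** (N * k) / (1.0 - r ** k)
        total += c_Nk * z ** k / k
    return 0.5 * total

def cf_split(t, N, r):
    # Finite product over j = 1, ..., N-1 times exp(log phi_2(t)).
    head = 1 + 0j
    for idx in range(1, N):
        head *= (1 - 2j * t * r ** idx) ** -0.5
    return head * cmath.exp(log_phi2(t, N, r))

def cf_direct(t, r, n_factors=400):
    # Brute-force product, used only as a reference value.
    prod = 1 + 0j
    for idx in range(1, n_factors + 1):
        prod *= (1 - 2j * t * r ** idx) ** -0.5
    return prod

r, t_max = 0.5, 10.0
N = choose_N(t_max, r)
print(N, abs(cf_split(t_max, N, r) - cf_direct(t_max, r)))
```

Each factor $1 - 2it\lambda_j$ has real part 1, so the principal branch of the power $-1/2$ is the continuous one and no branch tracking is needed in this sketch.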