Stat 709: Mathematical Statistics Lecture 30

Lecture 18: U- and V-statistics

U-statistics

Let $X_1,\ldots,X_n$ be i.i.d. from an unknown population $P$ in a nonparametric family $\mathscr{P}$. If the vector of order statistics is sufficient and complete for $P \in \mathscr{P}$, then a symmetric unbiased estimator of an estimable $\vartheta$ is the UMVUE of $\vartheta$.

In many problems, parameters to be estimated are of the form $\vartheta = E[h(X_1,\ldots,X_m)]$ with a positive integer $m$ and a Borel function $h$ that is symmetric and satisfies $E|h(X_1,\ldots,X_m)| < \infty$ for any $P \in \mathscr{P}$. An effective way of obtaining an unbiased estimator of $\vartheta$ (which is a UMVUE in some nonparametric problems) is to use
$$U_n = \binom{n}{m}^{-1} \sum_c h(X_{i_1},\ldots,X_{i_m}), \qquad (1)$$
where $\sum_c$ denotes the summation over the $\binom{n}{m}$ combinations of $m$ distinct elements $\{i_1,\ldots,i_m\}$ from $\{1,\ldots,n\}$.

UW-Madison (Statistics), Stat 709 Lecture 18, 2018

Definition 3.2
The statistic in (1) is called a U-statistic with kernel $h$ of order $m$.

Examples
Consider the estimation of $\mu^m$, where $\mu = EX_1$ and $m$ is an integer $> 0$. Using $h(x_1,\ldots,x_m) = x_1 \cdots x_m$, we obtain the following U-statistic for $\mu^m$:
$$U_n = \binom{n}{m}^{-1} \sum_c X_{i_1} \cdots X_{i_m}.$$
Consider next the estimation of $\sigma^2 = [\mathrm{Var}(X_1) + \mathrm{Var}(X_2)]/2 = E[(X_1 - X_2)^2/2]$; with kernel $h(x_1,x_2) = (x_1 - x_2)^2/2$ we obtain the following U-statistic:
$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \frac{(X_i - X_j)^2}{2} = \frac{1}{n-1}\left(\sum_{i=1}^n X_i^2 - n\bar{X}^2\right) = S^2,$$
which is the sample variance.

Examples
In some cases, we would like to estimate $\vartheta = E|X_1 - X_2|$, a measure of concentration. Using kernel $h(x_1,x_2) = |x_1 - x_2|$, we obtain the following U-statistic unbiased for $\vartheta = E|X_1 - X_2|$:
$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} |X_i - X_j|,$$
which is known as Gini's mean difference.

Let $\vartheta = P(X_1 + X_2 \le 0)$. Using kernel $h(x_1,x_2) = I_{(-\infty,0]}(x_1 + x_2)$, we obtain the following U-statistic unbiased for $\vartheta$:
$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} I_{(-\infty,0]}(X_i + X_j),$$
which is known as the one-sample Wilcoxon statistic.
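Definition (1) can be computed directly by averaging the kernel over all $\binom{n}{m}$ subsets of the data. A minimal sketch in Python (the data vector and kernel functions are illustrative, not from the notes): it checks numerically that the kernel $(x_1-x_2)^2/2$ reproduces the sample variance $S^2$, and evaluates Gini's mean difference and the one-sample Wilcoxon statistic on the same data.

```python
from itertools import combinations
from math import comb
from statistics import variance

def u_statistic(x, kernel, m):
    """U-statistic (1): average the symmetric kernel over all
    C(n, m) subsets of m distinct observations."""
    n = len(x)
    total = sum(kernel(*(x[i] for i in idx))
                for idx in combinations(range(n), m))
    return total / comb(n, m)

x = [2.0, -1.0, 0.5, 3.0, 1.5, -0.5]  # illustrative sample

# Kernel h(x1, x2) = (x1 - x2)^2 / 2 reproduces the sample variance S^2.
s2 = u_statistic(x, lambda a, b: (a - b) ** 2 / 2, m=2)
assert abs(s2 - variance(x)) < 1e-12

# Gini's mean difference and the one-sample Wilcoxon statistic.
gini = u_statistic(x, lambda a, b: abs(a - b), m=2)
wilcoxon = u_statistic(x, lambda a, b: 1.0 if a + b <= 0 else 0.0, m=2)
print(s2, gini, wilcoxon)
```

The brute-force enumeration costs $\binom{n}{m}$ kernel evaluations, which is fine for illustration but grows quickly with $n$ and $m$.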
Variance of a U-statistic

The variance of a U-statistic $U_n$ with kernel $h$ has an explicit form. For $k = 1,\ldots,m$, let
$$h_k(x_1,\ldots,x_k) = E[h(X_1,\ldots,X_m) \mid X_1 = x_1,\ldots,X_k = x_k] = E[h(x_1,\ldots,x_k,X_{k+1},\ldots,X_m)]$$
and $\tilde{h}_k = h_k - E[h(X_1,\ldots,X_m)]$. For any U-statistic with kernel $h$,
$$U_n - E(U_n) = \binom{n}{m}^{-1} \sum_c \tilde{h}(X_{i_1},\ldots,X_{i_m}). \qquad (2)$$

Theorem 3.4 (Hoeffding's theorem)
For a U-statistic $U_n$ with $E[h(X_1,\ldots,X_m)]^2 < \infty$,
$$\mathrm{Var}(U_n) = \binom{n}{m}^{-1} \sum_{k=1}^m \binom{m}{k}\binom{n-m}{m-k}\zeta_k,$$
where $\zeta_k = \mathrm{Var}(h_k(X_1,\ldots,X_k))$.

Proof
Consider two sets $\{i_1,\ldots,i_m\}$ and $\{j_1,\ldots,j_m\}$ of $m$ distinct integers from $\{1,\ldots,n\}$ with exactly $k$ integers in common. The number of distinct choices of two such sets is $\binom{n}{m}\binom{m}{k}\binom{n-m}{m-k}$. By the symmetry of $\tilde{h}$ and the independence of $X_1,\ldots,X_n$,
$$E[\tilde{h}(X_{i_1},\ldots,X_{i_m})\tilde{h}(X_{j_1},\ldots,X_{j_m})] = \zeta_k \quad \text{for } k = 1,\ldots,m.$$
Then, by (2),
$$\mathrm{Var}(U_n) = \binom{n}{m}^{-2} \sum_c \sum_c E[\tilde{h}(X_{i_1},\ldots,X_{i_m})\tilde{h}(X_{j_1},\ldots,X_{j_m})] = \binom{n}{m}^{-2} \sum_{k=1}^m \binom{n}{m}\binom{m}{k}\binom{n-m}{m-k}\zeta_k.$$
This proves the result.

Corollary 3.2
Under the condition of Theorem 3.4,
(i) $\frac{m^2}{n}\zeta_1 \le \mathrm{Var}(U_n) \le \frac{m}{n}\zeta_m$;
(ii) $(n+1)\,\mathrm{Var}(U_{n+1}) \le n\,\mathrm{Var}(U_n)$ for any $n > m$;
(iii) for any fixed $m$ and $k = 1,\ldots,m$, if $\zeta_j = 0$ for $j < k$ and $\zeta_k > 0$, then
$$\mathrm{Var}(U_n) = \frac{k!\binom{m}{k}^2 \zeta_k}{n^k} + O\!\left(\frac{1}{n^{k+1}}\right).$$
For any fixed $m$, if $\zeta_j = 0$ for $j < k$ and $\zeta_k > 0$, then the mse of $U_n$ is of the order $n^{-k}$ and, therefore, $U_n$ is $n^{k/2}$-consistent.

Example 3.11
Consider $h(x_1,x_2) = x_1 x_2$, the U-statistic unbiased for $\mu^2$, $\mu = EX_1$. Note that $h_1(x_1) = \mu x_1$, $\tilde{h}_1(x_1) = \mu(x_1 - \mu)$, $\zeta_1 = E[\tilde{h}_1(X_1)]^2 = \mu^2\,\mathrm{Var}(X_1) = \mu^2\sigma^2$, $\tilde{h}(x_1,x_2) = x_1 x_2 - \mu^2$, and $\zeta_2 = \mathrm{Var}(X_1 X_2) = E(X_1 X_2)^2 - \mu^4 = (\mu^2 + \sigma^2)^2 - \mu^4$.
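Hoeffding's formula is easy to sanity-check numerically. In the sketch below (with hypothetical values of $\mu$, $\sigma$, and $n$), plugging $\zeta_1 = \mu^2\sigma^2$ and $\zeta_2 = (\mu^2+\sigma^2)^2 - \mu^4$ from Example 3.11 into Theorem 3.4 must agree with the closed form $4\mu^2\sigma^2/n + 2\sigma^4/(n(n-1))$ that the notes derive for this kernel.

```python
from math import comb

def hoeffding_variance(n, m, zetas):
    """Var(U_n) from Theorem 3.4, given zetas = [zeta_1, ..., zeta_m]."""
    return sum(comb(m, k) * comb(n - m, m - k) * zetas[k - 1]
               for k in range(1, m + 1)) / comb(n, m)

# Hypothetical moments for the kernel h(x1, x2) = x1 * x2 of Example 3.11.
mu, sigma, n = 1.3, 0.7, 25
z1 = mu**2 * sigma**2
z2 = (mu**2 + sigma**2) ** 2 - mu**4

lhs = hoeffding_variance(n, 2, [z1, z2])
rhs = 4 * mu**2 * sigma**2 / n + 2 * sigma**4 / (n * (n - 1))
assert abs(lhs - rhs) < 1e-12  # theorem matches the closed form
```

The same helper works for any order $m$ once the $\zeta_k$'s are known (or estimated).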
By Theorem 3.4, for $U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} X_i X_j$,
$$\mathrm{Var}(U_n) = \binom{n}{2}^{-1}\left[\binom{2}{1}\binom{n-2}{1}\zeta_1 + \binom{2}{2}\binom{n-2}{0}\zeta_2\right] = \frac{2}{n(n-1)}\left[2(n-2)\mu^2\sigma^2 + (\mu^2+\sigma^2)^2 - \mu^4\right] = \frac{4\mu^2\sigma^2}{n} + \frac{2\sigma^4}{n(n-1)}.$$

Next, consider $h(x_1,x_2) = I_{(-\infty,0]}(x_1 + x_2)$, which leads to the one-sample Wilcoxon statistic. Note that $h_1(x_1) = P(x_1 + X_2 \le 0) = F(-x_1)$, where $F$ is the c.d.f. of $P$. Then $\zeta_1 = \mathrm{Var}(F(-X_1))$. Let $\vartheta = E[h(X_1,X_2)]$. Then $\zeta_2 = \mathrm{Var}(h(X_1,X_2)) = \vartheta(1-\vartheta)$. Hence, for $U_n$ being the one-sample Wilcoxon statistic,
$$\mathrm{Var}(U_n) = \frac{2}{n(n-1)}\left[2(n-2)\zeta_1 + \vartheta(1-\vartheta)\right].$$
If $F$ is continuous and symmetric about 0, then, since $F(X_1)$ is uniform on $[0,1]$, $\zeta_1$ can be simplified as
$$\zeta_1 = \mathrm{Var}(F(-X_1)) = \mathrm{Var}(1 - F(X_1)) = \mathrm{Var}(F(X_1)) = \tfrac{1}{12}.$$

Asymptotic distributions of U-statistics

For nonparametric $\mathscr{P}$, the exact distribution of $U_n$ is hard to derive. We study the method of projection, which is particularly effective for studying asymptotic distributions of U-statistics.

Definition 3.3
Let $T_n$ be a given statistic based on $X_1,\ldots,X_n$. The projection of $T_n$ on $k_n$ random elements $Y_1,\ldots,Y_{k_n}$ is defined to be
$$\check{T}_n = E(T_n) + \sum_{i=1}^{k_n}\left[E(T_n \mid Y_i) - E(T_n)\right].$$
Let $\check{T}_n$ be the projection of $T_n$ on $X_1,\ldots,X_n$, and $\psi_n(X_i) = E(T_n \mid X_i)$. If $T_n$ is symmetric (as a function of $X_1,\ldots,X_n$), then $\psi_n(X_1),\ldots,\psi_n(X_n)$ are i.i.d. with mean $E[\psi_n(X_i)] = E[E(T_n \mid X_i)] = E(T_n)$. If $E(T_n^2) < \infty$ and $\mathrm{Var}(\psi_n(X_i)) > 0$, then, by the CLT,
$$\frac{1}{\sqrt{n\,\mathrm{Var}(\psi_n(X_1))}} \sum_{i=1}^n \left[\psi_n(X_i) - E(T_n)\right] \to_d N(0,1). \qquad (3)$$
If we can show that $T_n - \check{T}_n$ has a negligible order, then we can derive the asymptotic distribution of $T_n$ by using (3) and Slutsky's theorem.

Lemma 3.1
Let $T_n$ be a symmetric statistic with $\mathrm{Var}(T_n) < \infty$ for every $n$ and $\check{T}_n$ be the projection of $T_n$ on $X_1,\ldots,X_n$.
Then $E(\check{T}_n) = E(T_n)$ and
$$E(T_n - \check{T}_n)^2 = \mathrm{Var}(T_n) - \mathrm{Var}(\check{T}_n).$$

Proof
Since $E(\check{T}_n) = E(T_n)$,
$$E(T_n - \check{T}_n)^2 = \mathrm{Var}(T_n) + \mathrm{Var}(\check{T}_n) - 2\,\mathrm{Cov}(T_n, \check{T}_n)$$
and
$$\mathrm{Cov}(T_n, \check{T}_n) = E(T_n \check{T}_n) - [E(T_n)]^2 = nE[T_n E(T_n \mid X_i)] - n[E(T_n)]^2 = nE\{E[T_n E(T_n \mid X_i) \mid X_i]\} - n[E(T_n)]^2 = nE\{[E(T_n \mid X_i)]^2\} - n[E(T_n)]^2 = n\,\mathrm{Var}(E(T_n \mid X_i)) = \mathrm{Var}(\check{T}_n).$$

For a U-statistic $U_n$, one can show (exercise) that
$$\check{U}_n = E(U_n) + \frac{m}{n} \sum_{i=1}^n \tilde{h}_1(X_i),$$
where $\check{U}_n$ is the projection of $U_n$ on $X_1,\ldots,X_n$ and
$$\tilde{h}_1(x) = h_1(x) - E[h(X_1,\ldots,X_m)], \qquad h_1(x) = E[h(x,X_2,\ldots,X_m)].$$
Hence, if $\zeta_1 = \mathrm{Var}(\tilde{h}_1(X_i)) > 0$, then $\mathrm{Var}(\check{U}_n) = m^2\zeta_1/n$ and, by Corollary 3.2 and Lemma 3.1,
$$E(U_n - \check{U}_n)^2 = O(n^{-2}).$$
This is enough for establishing the asymptotic distribution of $U_n$. If $\zeta_1 = 0$ but $\zeta_2 > 0$, then we can show that
$$E(U_n - \check{U}_n)^2 = O(n^{-3}).$$
One may derive results for the cases where $\zeta_2 = 0$, but the case of either $\zeta_1 > 0$ or $\zeta_2 > 0$ is the most interesting one in applications.

Theorem 3.5
Let $U_n$ be a U-statistic with $E[h(X_1,\ldots,X_m)]^2 < \infty$.
(i) If $\zeta_1 > 0$, then
$$\sqrt{n}\,[U_n - E(U_n)] \to_d N(0, m^2\zeta_1).$$
(ii) If $\zeta_1 = 0$ but $\zeta_2 > 0$, then
$$n[U_n - E(U_n)] \to_d \frac{m(m-1)}{2} \sum_{j=1}^{\infty} \lambda_j (\chi^2_{1j} - 1), \qquad (4)$$
where the $\chi^2_{1j}$'s are i.i.d. random variables having the chi-square distribution $\chi^2_1$ and the $\lambda_j$'s are some constants (which may depend on $P$) satisfying $\sum_{j=1}^{\infty} \lambda_j^2 = \zeta_2$.

Lemma 3.2
Let $Y$ be the random variable on the right-hand side of (4). Then $EY^2 = \frac{m^2(m-1)^2}{2}\zeta_2$.

It follows from Corollary 3.2(iii) and Lemma 3.2 that if $\zeta_1 = 0$, then
$$\mathrm{amse}_{U_n}(P) = \frac{m^2(m-1)^2}{2}\zeta_2/n^2 = \mathrm{Var}(U_n) + O(n^{-3}).$$

Proof of Lemma 3.2
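Both Lemma 3.1 and the projection formula for $\check{U}_n$ can be verified exactly on a toy model by replacing expectations with exhaustive enumeration. A sketch under assumed conditions not in the notes: $X_i$ i.i.d. uniform on a hypothetical three-point support, $n = 3$, $m = 2$, and kernel $h(x_1,x_2) = x_1 x_2$, so that $h_1(x) = \mu x$ and $E(U_n) = \mu^2$.

```python
from itertools import combinations, product
from statistics import fmean

support = [-1.0, 0.0, 2.0]   # hypothetical three-point uniform distribution
n, m = 3, 2

def u_stat(x):
    """U-statistic with kernel h(a, b) = a * b."""
    return fmean(x[i] * x[j] for i, j in combinations(range(n), 2))

outcomes = list(product(support, repeat=n))   # 27 equally likely samples
p = 1.0 / len(outcomes)

ET = sum(u_stat(x) for x in outcomes) * p
var_T = sum((u_stat(x) - ET) ** 2 for x in outcomes) * p

def cond_exp(i, v):
    """E(U_n | X_i = v): average over outcomes with coordinate i fixed at v."""
    return fmean(u_stat(x) for x in outcomes if x[i] == v)

def projection(x):
    """Definition 3.3: projection of U_n on X_1, ..., X_n."""
    return ET + sum(cond_exp(i, x[i]) - ET for i in range(n))

var_proj = sum((projection(x) - ET) ** 2 for x in outcomes) * p
mse = sum((u_stat(x) - projection(x)) ** 2 for x in outcomes) * p

# Lemma 3.1: E(U_n - proj)^2 = Var(U_n) - Var(proj), exactly.
assert abs(mse - (var_T - var_proj)) < 1e-12

# Projection formula: proj = E(U_n) + (m/n) * sum_i mu * (X_i - mu).
mu = fmean(support)
for x in outcomes:
    explicit = mu**2 + (m / n) * sum(mu * (xi - mu) for xi in x)
    assert abs(projection(x) - explicit) < 1e-12
```

Because the sample space is finite, every expectation is an exact finite sum, so the identities hold up to floating-point rounding rather than Monte Carlo error.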
