Stat 709: Mathematical Statistics (UW-Madison, 2018)
Lecture 18: U- and V-statistics

U-statistics

Let $X_1,\dots,X_n$ be i.i.d. from an unknown population $P$ in a nonparametric family $\mathcal{P}$. If the vector of order statistics is sufficient and complete for $P \in \mathcal{P}$, then a symmetric unbiased estimator of an estimable $\theta$ is the UMVUE of $\theta$.

In many problems, the parameters to be estimated are of the form $\theta = E[h(X_1,\dots,X_m)]$ with a positive integer $m$ and a Borel function $h$ that is symmetric and satisfies $E|h(X_1,\dots,X_m)| < \infty$ for any $P \in \mathcal{P}$.

An effective way of obtaining an unbiased estimator of $\theta$ (which is a UMVUE in some nonparametric problems) is to use

$$U_n = \binom{n}{m}^{-1} \sum_c h(X_{i_1},\dots,X_{i_m}), \qquad (1)$$

where $\sum_c$ denotes summation over the $\binom{n}{m}$ combinations of $m$ distinct elements $\{i_1,\dots,i_m\}$ from $\{1,\dots,n\}$.

Definition 3.2
The statistic in (1) is called a U-statistic with kernel $h$ of order $m$.

Examples
Consider the estimation of $\mu^m$, where $\mu = EX_1$ and $m$ is an integer $> 0$. Using $h(x_1,\dots,x_m) = x_1 \cdots x_m$, we obtain the following U-statistic for $\mu^m$:

$$U_n = \binom{n}{m}^{-1} \sum_c X_{i_1} \cdots X_{i_m}.$$

Consider next the estimation of $\sigma^2 = [\operatorname{Var}(X_1) + \operatorname{Var}(X_2)]/2 = E[(X_1 - X_2)^2/2]$; with the kernel $h(x_1,x_2) = (x_1 - x_2)^2/2$ we obtain the U-statistic

$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \frac{(X_i - X_j)^2}{2} = \frac{1}{n-1}\left(\sum_{i=1}^n X_i^2 - n\bar{X}^2\right) = S^2,$$

which is the sample variance.

In some cases, we would like to estimate $\theta = E|X_1 - X_2|$, a measure of concentration. Using the kernel $h(x_1,x_2) = |x_1 - x_2|$, we obtain the following U-statistic unbiased for $\theta$:

$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} |X_i - X_j|,$$

which is known as Gini's mean difference.

Let $\theta = P(X_1 + X_2 \le 0)$. Using the kernel $h(x_1,x_2) = I_{(-\infty,0]}(x_1 + x_2)$, we obtain the following U-statistic unbiased for $\theta$:

$$U_n = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} I_{(-\infty,0]}(X_i + X_j),$$

which is known as the one-sample Wilcoxon statistic.

Variance of a U-statistic

The variance of a U-statistic $U_n$ with kernel $h$ has an explicit form. For $k = 1,\dots,m$, let

$$h_k(x_1,\dots,x_k) = E[h(X_1,\dots,X_m) \mid X_1 = x_1,\dots,X_k = x_k] = E[h(x_1,\dots,x_k,X_{k+1},\dots,X_m)]$$

and $\tilde{h}_k = h_k - E[h(X_1,\dots,X_m)]$. For any U-statistic with kernel $h$, writing $\tilde{h} = \tilde{h}_m$,

$$U_n - E(U_n) = \binom{n}{m}^{-1} \sum_c \tilde{h}(X_{i_1},\dots,X_{i_m}). \qquad (2)$$

Theorem 3.4 (Hoeffding's theorem)
For a U-statistic $U_n$ with $E[h(X_1,\dots,X_m)]^2 < \infty$,

$$\operatorname{Var}(U_n) = \binom{n}{m}^{-1} \sum_{k=1}^m \binom{m}{k}\binom{n-m}{m-k} \zeta_k,$$

where $\zeta_k = \operatorname{Var}(h_k(X_1,\dots,X_k))$.

Proof
Consider two sets $\{i_1,\dots,i_m\}$ and $\{j_1,\dots,j_m\}$ of $m$ distinct integers from $\{1,\dots,n\}$ with exactly $k$ integers in common. The number of distinct choices of two such sets is $\binom{n}{m}\binom{m}{k}\binom{n-m}{m-k}$. By the symmetry of $\tilde{h}_m$ and the independence of $X_1,\dots,X_n$,

$$E[\tilde{h}(X_{i_1},\dots,X_{i_m})\,\tilde{h}(X_{j_1},\dots,X_{j_m})] = \zeta_k \quad \text{for } k = 1,\dots,m.$$

Then, by (2),

$$\operatorname{Var}(U_n) = \binom{n}{m}^{-2} \sum_c \sum_c E[\tilde{h}(X_{i_1},\dots,X_{i_m})\,\tilde{h}(X_{j_1},\dots,X_{j_m})] = \binom{n}{m}^{-2} \sum_{k=1}^m \binom{n}{m}\binom{m}{k}\binom{n-m}{m-k} \zeta_k.$$

This proves the result.

Corollary 3.2
Under the condition of Theorem 3.4,
(i) $\frac{m^2}{n}\zeta_1 \le \operatorname{Var}(U_n) \le \frac{m}{n}\zeta_m$;
(ii) $(n+1)\operatorname{Var}(U_{n+1}) \le n\operatorname{Var}(U_n)$ for any $n > m$;
(iii) for any fixed $m$ and $k = 1,\dots,m$, if $\zeta_j = 0$ for $j < k$ and $\zeta_k > 0$, then

$$\operatorname{Var}(U_n) = \frac{k!\binom{m}{k}^2 \zeta_k}{n^k} + O\!\left(\frac{1}{n^{k+1}}\right).$$

For any fixed $m$, if $\zeta_j = 0$ for $j < k$ and $\zeta_k > 0$, then the mse of $U_n$ is of the order $n^{-k}$ and, therefore, $U_n$ is $n^{k/2}$-consistent.
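To make (1) concrete, here is a minimal Python sketch (an added illustration; the helper `u_statistic` and the simulated normal data are hypothetical choices, and brute-force enumeration is only practical when $\binom{n}{m}$ is small). It checks that the kernel $h(x_1,x_2) = (x_1 - x_2)^2/2$ reproduces the sample variance $S^2$ and evaluates the other two kernels above with the same machinery.

```python
import itertools
import numpy as np

def u_statistic(x, h, m):
    # Formula (1): average the kernel h over all C(n, m) subsets
    # of m distinct observations.
    return np.mean([h(*sub) for sub in itertools.combinations(x, m)])

rng = np.random.default_rng(0)
x = rng.normal(size=20)

# Kernel h(x1, x2) = (x1 - x2)^2 / 2 yields the sample variance S^2.
u = u_statistic(x, lambda a, b: (a - b) ** 2 / 2, m=2)
print(np.isclose(u, np.var(x, ddof=1)))  # True

# Gini's mean difference and the one-sample Wilcoxon statistic.
gini = u_statistic(x, lambda a, b: abs(a - b), m=2)
wilcoxon = u_statistic(x, lambda a, b: float(a + b <= 0), m=2)
print(gini, wilcoxon)
```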
Example 3.11
Consider $h(x_1,x_2) = x_1 x_2$, the kernel of the U-statistic unbiased for $\mu^2$, $\mu = EX_1$. Note that $h_1(x_1) = \mu x_1$, $\tilde{h}_1(x_1) = \mu(x_1 - \mu)$, $\zeta_1 = E[\tilde{h}_1(X_1)]^2 = \mu^2 \operatorname{Var}(X_1) = \mu^2\sigma^2$, $\tilde{h}(x_1,x_2) = x_1x_2 - \mu^2$, and $\zeta_2 = \operatorname{Var}(X_1X_2) = E(X_1X_2)^2 - \mu^4 = (\mu^2 + \sigma^2)^2 - \mu^4$.

By Theorem 3.4, for $U_n = \binom{n}{2}^{-1} \sum_{1 \le i < j \le n} X_i X_j$,

$$\operatorname{Var}(U_n) = \binom{n}{2}^{-1}\left[\binom{2}{1}\binom{n-2}{1}\zeta_1 + \binom{2}{2}\binom{n-2}{0}\zeta_2\right] = \frac{2}{n(n-1)}\left[2(n-2)\mu^2\sigma^2 + (\mu^2+\sigma^2)^2 - \mu^4\right] = \frac{4\mu^2\sigma^2}{n} + \frac{2\sigma^4}{n(n-1)}.$$

Next, consider $h(x_1,x_2) = I_{(-\infty,0]}(x_1 + x_2)$, which leads to the one-sample Wilcoxon statistic. Note that $h_1(x_1) = P(x_1 + X_2 \le 0) = F(-x_1)$, where $F$ is the c.d.f. of $P$. Then $\zeta_1 = \operatorname{Var}(F(-X_1))$. Let $\theta = E[h(X_1,X_2)]$. Then $\zeta_2 = \operatorname{Var}(h(X_1,X_2)) = \theta(1-\theta)$. Hence, for $U_n$ being the one-sample Wilcoxon statistic,

$$\operatorname{Var}(U_n) = \frac{2}{n(n-1)}\left[2(n-2)\zeta_1 + \theta(1-\theta)\right].$$

If $F$ is continuous and symmetric about 0, then $F(-X_1) = 1 - F(X_1)$ a.s. and $F(X_1)$ is uniform on $(0,1)$, so $\zeta_1$ simplifies to

$$\zeta_1 = \operatorname{Var}(F(-X_1)) = \operatorname{Var}(1 - F(X_1)) = \operatorname{Var}(F(X_1)) = \tfrac{1}{12}.$$

Asymptotic distributions of U-statistics

For nonparametric $\mathcal{P}$, the exact distribution of $U_n$ is hard to derive. We study the method of projection, which is particularly effective for studying asymptotic distributions of U-statistics.

Definition 3.3
Let $T_n$ be a given statistic based on $X_1,\dots,X_n$. The projection of $T_n$ on $k_n$ random elements $Y_1,\dots,Y_{k_n}$ is defined to be

$$\check{T}_n = E(T_n) + \sum_{i=1}^{k_n}\left[E(T_n \mid Y_i) - E(T_n)\right].$$

Let $\check{T}_n$ be the projection of $T_n$ on $X_1,\dots,X_n$, and let $\psi_n(X_i) = E(T_n \mid X_i)$. If $T_n$ is symmetric (as a function of $X_1,\dots,X_n$), then $\psi_n(X_1),\dots,\psi_n(X_n)$ are i.i.d. with mean $E[\psi_n(X_i)] = E[E(T_n \mid X_i)] = E(T_n)$. If $E(T_n^2) < \infty$ and $\operatorname{Var}(\psi_n(X_i)) > 0$, then, by the CLT,

$$\frac{1}{\sqrt{n \operatorname{Var}(\psi_n(X_1))}} \sum_{i=1}^n \left[\psi_n(X_i) - E(T_n)\right] \to_d N(0,1). \qquad (3)$$

If we can show that $T_n - \check{T}_n$ has a negligible order, then we can derive the asymptotic distribution of $T_n$ by using (3) and Slutsky's theorem.

Lemma 3.1
Let $T_n$ be a symmetric statistic with $\operatorname{Var}(T_n) < \infty$ for every $n$, and let $\check{T}_n$ be the projection of $T_n$ on $X_1,\dots,X_n$. Then $E(\check{T}_n) = E(T_n)$ and

$$E(T_n - \check{T}_n)^2 = \operatorname{Var}(T_n) - \operatorname{Var}(\check{T}_n).$$

Proof
Since $E(\check{T}_n) = E(T_n)$,

$$E(T_n - \check{T}_n)^2 = \operatorname{Var}(T_n) + \operatorname{Var}(\check{T}_n) - 2\operatorname{Cov}(T_n, \check{T}_n).$$

By the symmetry of $T_n$,

$$\operatorname{Cov}(T_n, \check{T}_n) = E(T_n\check{T}_n) - [E(T_n)]^2 = nE[T_n E(T_n \mid X_i)] - n[E(T_n)]^2 = nE\{E[T_n E(T_n \mid X_i) \mid X_i]\} - n[E(T_n)]^2 = nE\{[E(T_n \mid X_i)]^2\} - n[E(T_n)]^2 = n\operatorname{Var}(E(T_n \mid X_i)) = \operatorname{Var}(\check{T}_n).$$

Substituting this into the previous display yields the result.

For a U-statistic $U_n$, one can show (exercise) that

$$\check{U}_n = E(U_n) + \frac{m}{n}\sum_{i=1}^n \tilde{h}_1(X_i),$$

where $\check{U}_n$ is the projection of $U_n$ on $X_1,\dots,X_n$ and

$$\tilde{h}_1(x) = h_1(x) - E[h(X_1,\dots,X_m)], \qquad h_1(x) = E[h(x, X_2,\dots,X_m)].$$

Hence, if $\zeta_1 = \operatorname{Var}(\tilde{h}_1(X_i)) > 0$, then $\operatorname{Var}(\check{U}_n) = m^2\zeta_1/n$ and, by Corollary 3.2 and Lemma 3.1,

$$E(U_n - \check{U}_n)^2 = O(n^{-2}).$$

This is enough for establishing the asymptotic distribution of $U_n$. If $\zeta_1 = 0$ but $\zeta_2 > 0$, then we can show that

$$E(U_n - \check{U}_n)^2 = O(n^{-3}).$$

One may derive results for the cases where $\zeta_2 = 0$ as well, but the case of either $\zeta_1 > 0$ or $\zeta_2 > 0$ is the most interesting in applications.

Theorem 3.5
Let $U_n$ be a U-statistic with $E[h(X_1,\dots,X_m)]^2 < \infty$.
(i) If $\zeta_1 > 0$, then
$$\sqrt{n}\,[U_n - E(U_n)] \to_d N(0, m^2\zeta_1).$$
(ii) If $\zeta_1 = 0$ but $\zeta_2 > 0$, then
$$n[U_n - E(U_n)] \to_d \frac{m(m-1)}{2}\sum_{j=1}^{\infty} \lambda_j(\chi^2_{1j} - 1), \qquad (4)$$
where the $\chi^2_{1j}$'s are i.i.d. random variables having the chi-square distribution $\chi^2_1$ and the $\lambda_j$'s are some constants (which may depend on $P$) satisfying $\sum_{j=1}^{\infty} \lambda_j^2 = \zeta_2$.
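The exact variance derived in Example 3.11 is easy to verify by simulation, and part (i) of Theorem 3.5 can be previewed the same way. The sketch below (an added illustration; the $N(\mu,\sigma^2)$ population and the settings $n = 10$, $\mu = 2$, $\sigma = 1$ are arbitrary choices) uses the closed form $\sum_{i<j} X_iX_j = (S^2 - Q)/2$ with $S = \sum_i X_i$ and $Q = \sum_i X_i^2$, and compares the Monte Carlo variance of $U_n$ with $4\mu^2\sigma^2/n + 2\sigma^4/(n(n-1))$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, mu, sigma, reps = 10, 2.0, 1.0, 100_000

# U_n with kernel h(x1, x2) = x1 * x2, unbiased for mu^2:
# U_n = (S^2 - Q) / (n(n-1)) with S = sum(X), Q = sum(X^2).
x = rng.normal(mu, sigma, size=(reps, n))
s, q = x.sum(axis=1), (x ** 2).sum(axis=1)
u = (s ** 2 - q) / (n * (n - 1))

print(np.var(u))  # Monte Carlo Var(U_n)
print(4 * mu ** 2 * sigma ** 2 / n + 2 * sigma ** 4 / (n * (n - 1)))
# By Theorem 3.5(i), sqrt(n) * (U_n - mu^2) is approximately
# N(0, m^2 * zeta_1) = N(0, 4 * mu^2 * sigma^2) for large n.
```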
Lemma 3.2
Let $Y$ be the random variable on the right-hand side of (4). Then

$$EY^2 = \frac{m^2(m-1)^2}{2}\zeta_2.$$

It follows from Corollary 3.2(iii) and Lemma 3.2 that if $\zeta_1 = 0$, then

$$\operatorname{amse}_{U_n}(P) = \frac{m^2(m-1)^2}{2n^2}\zeta_2 = \operatorname{Var}(U_n) + O(n^{-3}).$$
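The degenerate case can also be seen numerically. In the following minimal sketch (an added illustration assuming i.i.d. $N(0,\sigma^2)$ data; $n$, $\sigma$, and the number of replications are arbitrary choices), take $m = 2$ and $h(x_1,x_2) = x_1x_2$ with $\mu = EX_1 = 0$, so that $\zeta_1 = \mu^2\sigma^2 = 0$ and $\zeta_2 = \sigma^4$. For this kernel the sum in (4) reduces to the single term $\lambda_1 = \sigma^2$, i.e., $nU_n \to_d \sigma^2(\chi^2_{11} - 1)$, and Lemma 3.2 gives $EY^2 = \frac{m^2(m-1)^2}{2}\zeta_2 = 2\sigma^4$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma, reps = 200, 1.0, 20_000

# n * U_n for the kernel h(x1, x2) = x1 * x2, in closed form:
# sum_{i<j} Xi*Xj = (S^2 - Q)/2 with S = sum(X), Q = sum(X^2),
# so U_n = (S^2 - Q)/(n(n-1)) and n * U_n = (S^2 - Q)/(n - 1).
x = rng.normal(0.0, sigma, size=(reps, n))
s, q = x.sum(axis=1), (x ** 2).sum(axis=1)
n_un = (s ** 2 - q) / (n - 1)

# The limit in (4): Y = sigma^2 * (chi-square(1) - 1).
y = sigma ** 2 * (rng.chisquare(1, size=reps) - 1)

# Both second moments should be close to 2 * sigma^4 (Lemma 3.2).
print(np.mean(n_un ** 2), np.mean(y ** 2), 2 * sigma ** 4)
```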