34. Statistical functionals and V-statistics    Lehmann §6.1, 6.2

Let $\mathcal{S}$ be a set of cumulative distribution functions and let $T \colon \mathcal{S} \to \mathbb{R}$ be a functional defined on that set. Then $T$ is called a statistical functional. In general, we may have an iid sample from an unknown distribution $F$, and we want to learn the value of $\theta = T(F)$ for a (known) functional $T$. Some particular instances of statistical functionals are as follows:

• If $T(F) = F(c)$ for some constant $c$, then $T$ is a statistical functional mapping each $F$ to $P_F(X \le c)$.

• If $T(F) = F^{-1}(p)$ for some constant $p$, then $T$ maps $F$ to its $p$th quantile. Note that it is possible to define $F^{-1}(p)$ in such a way that it is always defined, even for distribution functions $F$ that aren't invertible in the usual sense: for example, take $F^{-1}(p) \stackrel{\mathrm{def}}{=} \inf\{x : F(x) \ge p\}$.

• If $T(F) = \mathrm{E}_F(X)$, then of course $T$ maps a distribution function to its mean.

Suppose $X_1, \ldots, X_n$ is an iid sequence with distribution function $F(x)$. We define the empirical distribution function $\hat{F}_n$ to be the distribution function for a discrete uniform distribution on $\{X_1, \ldots, X_n\}$. In other words,
$$\hat{F}_n(x) = \frac{1}{n} \#\{i : X_i \le x\} = \frac{1}{n} \sum_{i=1}^n I\{X_i \le x\}.$$

Note that $\hat{F}_n(x)$ is a legitimate cumulative distribution function. Thus, if $T$ is a statistical functional, we might consider $T(\hat{F}_n)$ as an estimator of $T(F)$. For obvious reasons, this estimator is called a plug-in estimator; we will discuss it in more detail when we consider V-estimators a bit later. For example, if $T(F) = \mathrm{E}_F(X)$, then the plug-in estimator given an iid sample $X_1, X_2, \ldots$ from $F$ is
$$T(\hat{F}_n) = \mathrm{E}_{\hat{F}_n}(X) = \frac{1}{n} \sum_{i=1}^n X_i = \bar{X}_n.$$
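The plug-in idea is easy to compute with. The following minimal sketch (ours, not from Lehmann; the helper name `ecdf` is our own) builds $\hat{F}_n$ from a simulated sample and evaluates the plug-in estimators of the three functionals listed above.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function F_hat_n as a callable.

    F_hat_n(x) = (1/n) * #{i : X_i <= x}; this step function is a
    legitimate CDF (nondecreasing, right-continuous, rising from 0 to 1).
    """
    xs = np.sort(np.asarray(sample))
    n = len(xs)
    # searchsorted with side="right" counts how many X_i are <= x
    return lambda x: np.searchsorted(xs, x, side="right") / n

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)   # iid sample from F = N(1, 4)
F_hat = ecdf(x)

print(F_hat(0.0))   # plug-in estimate of T(F) = F(0) = P_F(X <= 0), about 0.31 here
print(x.mean())     # plug-in estimate of T(F) = E_F(X) = 1 is just the sample mean
# Plug-in pth quantile via the inf{x : F_hat_n(x) >= p} definition, which
# picks out the ceil(np)-th order statistic:
print(np.sort(x)[int(np.ceil(0.5 * len(x))) - 1])   # estimates F^{-1}(1/2) = 1
```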
Suppose that for some real-valued function $\phi(x)$, we define $T(F) = \mathrm{E}_F\,\phi(X)$. Note in this case that
$$T\{\alpha F_1 + (1-\alpha)F_2\} = \alpha\,\mathrm{E}_{F_1}\phi(X) + (1-\alpha)\,\mathrm{E}_{F_2}\phi(X) = \alpha T(F_1) + (1-\alpha)T(F_2).$$
For this reason, such a functional is sometimes called a linear functional.

To generalize this idea, we may take a real-valued function of more than one real argument, say $\phi(x_1, \ldots, x_a)$ for some $a > 1$, and define
$$T(F) = \mathrm{E}_F\,\phi(X_1, \ldots, X_a), \qquad (122)$$
which we take to mean the expectation of $\phi(X_1, \ldots, X_a)$ where $X_1, \ldots, X_a$ is an iid sample from the distribution function $F$. Since $X_1, \ldots, X_a$ are iid, it is clear that $\mathrm{E}_F\,\phi(X_1, \ldots, X_a) = \mathrm{E}_F\,\phi(X_{\pi(1)}, \ldots, X_{\pi(a)})$ for any permutation $\pi$ mapping $\{1, \ldots, a\}$ onto itself. Since there are $a!$ such permutations, consider the function
$$\phi^*(x_1, \ldots, x_a) = \frac{1}{a!} \sum_{\text{all } \pi} \phi(x_{\pi(1)}, \ldots, x_{\pi(a)}).$$
Since $\mathrm{E}_F\,\phi(X_1, \ldots, X_a) = \mathrm{E}_F\,\phi^*(X_1, \ldots, X_a)$ and $\phi^*$ is symmetric in its arguments (i.e., permuting its $a$ arguments does not change its value), we see that in Equation (122) we may assume without loss of generality that $\phi$ is symmetric in its arguments. When $a > 1$, it may be shown that $T$ is not a linear functional (this is beyond the scope of this course); nonetheless, we still consider it an expectation functional.

Definition 34.1  For some integer $a \ge 1$, suppose $X_1, \ldots, X_a$ are a random sample from the distribution $F$. Then given a function $\phi \colon \mathbb{R}^a \to \mathbb{R}$ that is symmetric in its $a$ arguments, the map $T \colon F \mapsto \mathrm{E}\,\phi(X_1, \ldots, X_a)$ is called an expectation functional. We write $T(F) = \mathrm{E}_F\,\phi(X_1, \ldots, X_a)$. If $a = 1$, then $T$ is also called a linear functional. The function $\phi$ is called the kernel function.

Suppose $T(F)$ is an expectation functional defined according to Equation (122) for some $a \ge 1$. If we have an iid sample of size $n$ from $F$, then as noted earlier, a natural way to estimate $T(F)$ is by the use of the plug-in estimator $T(\hat{F}_n)$. This estimator is called a V-estimator or a V-statistic. It is possible to write down a V-statistic explicitly: since $\hat{F}_n$ assigns probability $\frac{1}{n}$ to each $X_i$, we have
$$V_n = T(\hat{F}_n) = \mathrm{E}_{\hat{F}_n}\,\phi(X_1, \ldots, X_a) = \frac{1}{n^a} \sum_{i_1=1}^n \cdots \sum_{i_a=1}^n \phi(X_{i_1}, \ldots, X_{i_a}). \qquad (123)$$
In the case $a = 1$, Equation (123) becomes
$$V_n = \frac{1}{n} \sum_{i=1}^n \phi(X_i).$$
It is clear in this case that $\mathrm{E}\,V_n = T(F)$, which we denote by $\theta$. Furthermore, if $\sigma^2 = \mathrm{Var}_F\,\phi(X) < \infty$, then the central limit theorem implies that
$$\sqrt{n}(V_n - \theta) \xrightarrow{L} N(0, \sigma^2).$$

For $a > 1$, however, the situation becomes slightly more complicated. First of all, $V_n$ is no longer unbiased, since the sum in Equation (123) contains some terms in which $i_1, \ldots, i_a$ are not all distinct. The expectation of such terms is not necessarily equal to $\theta = T(F)$.

Example 34.1  Let $a = 2$ and $\phi(x_1, x_2) = |x_1 - x_2|$. It may be shown (Problem 34.2) that the functional $T(F) = \mathrm{E}_F|X_1 - X_2|$ is not linear in $F$. Furthermore, since $|X_{i_1} - X_{i_2}|$ is identically zero whenever $i_1 = i_2$, it may also be shown that the V-estimator of $T(F)$ is biased.

Since the bias in $V_n$ is due to the duplication among the subscripts $i_1, \ldots, i_a$, the obvious way to correct this bias is to restrict the summation in Equation (123) to sets of subscripts $i_1, \ldots, i_a$ that contain no duplication. For example, we might sum instead over all $i_1 < \cdots < i_a$, an idea that leads in the next topic to U-statistics.

Problems

Problem 34.1  Let $X_1, \ldots, X_n$ be an iid sample from $F$. For a fixed $x$ for which $0 < F(x) < 1$, find the asymptotic distribution of $\hat{F}_n(x)$.

Problem 34.2  Let $T(F) = \mathrm{E}_F|X_1 - X_2|$.

(a) Show that $T(F)$ is not a linear functional by exhibiting distributions $F_1$ and $F_2$ and a constant $\alpha \in (0, 1)$ such that $T\{\alpha F_1 + (1-\alpha)F_2\} \ne \alpha T(F_1) + (1-\alpha)T(F_2)$.

(b) For $n > 1$, demonstrate that the V-statistic $V_n$ is biased in this case by finding $c_n \ne 1$ such that $\mathrm{E}_F\,V_n = c_n T(F)$.

Problem 34.3  Let $X_1, \ldots, X_n$ be a random sample from a distribution $F$ with finite third absolute moment.

(a) Find $\phi(x_1, \ldots, x_a)$ for some $a$ such that $\mathrm{E}_F\,\phi(X_1, \ldots, X_a) = \mathrm{Var}_F\,X$. Note that $a$ is not allowed to depend on $n$, and you should try to find the smallest possible $a$. Furthermore, $\phi$ should be symmetric in its arguments.

(b) Let $\mu = \mathrm{E}_F(X)$. As in part (a), find $\phi(x_1, \ldots, x_a)$ such that $\mathrm{E}_F\,\phi(X_1, \ldots, X_a) = \mathrm{E}_F(X - \mu)^3$.

35. U-statistics    Lehmann §6.1

Because the V-statistic
$$V_n = \frac{1}{n^a} \sum_{i_1=1}^n \cdots \sum_{i_a=1}^n \phi(X_{i_1}, \ldots, X_{i_a})$$
is in general a biased estimator of $T(F)$ due to the presence of summands in which there are duplicated indices on the $X_{i_k}$, the obvious way to produce an unbiased estimator is to sum only over those $(i_1, \ldots, i_a)$ in which no duplicates occur. Because $\phi$ is assumed to be symmetric in its arguments, we may without loss of generality restrict attention to the cases in which $1 \le i_1 < \cdots < i_a \le n$. Doing this, we obtain the U-statistic
$$U_n = \frac{1}{\binom{n}{a}} \sum_{1 \le i_1 < \cdots < i_a \le n} \phi(X_{i_1}, \ldots, X_{i_a}). \qquad (124)$$
Note that the sample size $n$ must be at least as large as $a$ for a U-statistic to be defined. Naturally, the "U" in "U-statistic" stands for unbiased (the "V" in "V-statistic" stands for von Mises, who was one of the originators of this theory in the late 1940s). The fact that $U_n$ is unbiased is obvious, since it is the average of $\binom{n}{a}$ terms, each with expectation $T(F) = \mathrm{E}_F\,\phi(X_1, \ldots, X_a)$. Because the subscripted $F$ will always be understood, we omit it in what follows.
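The two estimators are easy to compare numerically. Below is a minimal brute-force sketch (ours; the helper names `v_statistic` and `u_statistic` are our own) that evaluates Equations (123) and (124) for the kernel of Example 34.1. As noted in that example, $\phi(x, x) = 0$ for this kernel, so the diagonal terms of $V_n$ contribute nothing, which is exactly what biases $V_n$ downward.

```python
import itertools
import numpy as np

def v_statistic(phi, x, a):
    """Equation (123): average phi over all n^a index tuples (repeats allowed)."""
    n = len(x)
    return sum(phi(*(x[i] for i in idx))
               for idx in itertools.product(range(n), repeat=a)) / n**a

def u_statistic(phi, x, a):
    """Equation (124): average phi over the (n choose a) increasing index tuples."""
    n = len(x)
    combos = list(itertools.combinations(range(n), a))
    return sum(phi(*(x[i] for i in idx)) for idx in combos) / len(combos)

phi = lambda x1, x2: abs(x1 - x2)   # kernel of Example 34.1 (a = 2, symmetric)

rng = np.random.default_rng(1)
x = rng.normal(size=10)
print(v_statistic(phi, x, 2))   # biased low: the n diagonal terms phi(x_i, x_i) are 0
print(u_statistic(phi, x, 2))   # unbiased for T(F) = E_F |X_1 - X_2|
```

Brute-force enumeration over all $n^a$ tuples is fine for this small illustration but grows quickly; it is meant only to mirror the definitions, not to be an efficient implementation.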
Example 35.1  Consider a random sample $X_1, \ldots, X_n$ from $F$, and let
$$R_n = \sum_{j=1}^n j\, I\{W_j > 0\}$$
be the Wilcoxon signed rank statistic, where $W_1, \ldots, W_n$ are simply $X_1, \ldots, X_n$ reordered in increasing absolute value. We showed in Example 24.1 that
$$R_n = \sum_{i=1}^n \sum_{j=1}^{i} I\{X_i + X_j > 0\}.$$
Letting $\phi(a, b) = I\{a + b > 0\}$, we see that $\phi$ is symmetric in its arguments and thus
$$\frac{1}{\binom{n}{2}} R_n = U_n + \frac{1}{\binom{n}{2}} \sum_{i=1}^n I\{X_i > 0\} = U_n + O_P\!\left(\frac{1}{n}\right),$$
where $U_n$ is the U-statistic corresponding to the expectation functional $T(F) = P(X_1 + X_2 > 0)$. Therefore, many of the properties of the signed rank test that we have already derived elsewhere can also be obtained using the theory of U-statistics developed later in this topic.

In the special case $a = 1$, the V-statistic and the U-statistic coincide. In this case, we have already seen that both $U_n$ and $V_n$ are asymptotically normal by the central limit theorem. However, for $a > 1$ clearly the two statistics do not coincide in general. Furthermore, we may no longer use the central limit theorem to obtain asymptotic normality, because the summands are not independent (each $X_i$ appears in more than one summand).
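Returning to Example 35.1, both the identity from Example 24.1 and the $O_P(1/n)$ relation between $R_n$ and $U_n$ can be checked numerically. A short sketch (ours; it assumes no ties or zeros among the observations, which holds almost surely when $F$ is continuous):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=0.3, size=15)
n = len(x)

# Signed rank statistic: R_n = sum_j j * I{W_j > 0}, with W_1, ..., W_n
# the observations reordered in increasing absolute value.
w = x[np.argsort(np.abs(x))]
r_n = sum(j * int(w[j - 1] > 0) for j in range(1, n + 1))

# Identity from Example 24.1: R_n = sum over pairs j <= i of I{X_i + X_j > 0}.
r_n_alt = sum(int(x[i] + x[j] > 0) for i in range(n) for j in range(i + 1))
assert r_n == r_n_alt

# U-statistic with kernel phi(a, b) = I{a + b > 0}, averaging over i < j:
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
u_n = sum(int(x[i] + x[j] > 0) for i, j in pairs) / len(pairs)

# R_n / (n choose 2) and U_n differ only by the O_P(1/n) diagonal term:
print(r_n / len(pairs), u_n)
```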