Chapter 10 U-Statistics
When one is willing to assume the existence of a simple random sample $X_1, \ldots, X_n$, U-statistics generalize common notions of unbiased estimation such as the sample mean and the unbiased sample variance (in fact, the "U" in "U-statistics" stands for "unbiased"). Even though U-statistics may be considered a bit of a special topic, their study in a large-sample theory course has side benefits that make them valuable pedagogically. The theory of U-statistics nicely demonstrates the application of some of the large-sample topics presented thus far. Furthermore, the study of U-statistics even enables a theoretical discussion of statistical functionals, which gives insight into the common modern practice of bootstrapping.

10.1 Statistical Functionals and V-Statistics

Let $\mathcal{S}$ be a set of cumulative distribution functions and let $T$ denote a mapping from $\mathcal{S}$ into the real numbers $\mathbb{R}$. Then $T$ is called a statistical functional. We may think of statistical functionals as parameters of interest. If, say, we are given a simple random sample from a distribution with unknown distribution function $F$, we may want to learn the value of $\theta = T(F)$ for a (known) functional $T$. Some particular instances of statistical functionals are as follows:

• If $T(F) = F(c)$ for some constant $c$, then $T$ is a statistical functional mapping each $F$ to $P_F(X \le c)$.

• If $T(F) = F^{-1}(p)$ for some constant $p$, where $F^{-1}(p)$ is defined in Equation (3.13), then $T$ maps $F$ to its $p$th quantile.

• If $T(F) = \mathrm{E}_F X$, then $T$ maps $F$ to its mean.

Suppose $X_1, \ldots, X_n$ is an independent and identically distributed sequence with distribution function $F(x)$. We define the empirical distribution function $\hat F_n$ to be the distribution function for a discrete uniform distribution on $\{X_1, \ldots, X_n\}$. In other words,
$$\hat F_n(x) = \frac{1}{n}\,\#\{i : X_i \le x\} = \frac{1}{n} \sum_{i=1}^n I\{X_i \le x\}.$$
Since $\hat F_n(x)$ is a distribution function, a reasonable estimator of $T(F)$ is the so-called plug-in estimator $T(\hat F_n)$. For example, if $T(F) = \mathrm{E}_F X$, then the plug-in estimator given a simple random sample $X_1, X_2, \ldots$ from $F$ is
$$T(\hat F_n) = \mathrm{E}_{\hat F_n} X = \frac{1}{n} \sum_{i=1}^n X_i = \bar X_n.$$
As we will see later, a plug-in estimator is also known as a V-statistic or a V-estimator.
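To make the plug-in idea concrete, here is a minimal Python sketch, not taken from the text, that builds the empirical distribution function from a simulated sample and evaluates two plug-in estimators. The names `ecdf` and `F_hat` are illustrative choices, and the standard normal $F$ is an assumption made only for the demonstration.

```python
import numpy as np

def ecdf(sample):
    """Return the empirical distribution function for a 1-d sample.

    F_n(x) = (1/n) * #{i : X_i <= x}: a step function putting
    probability 1/n on each observation.
    """
    sample = np.asarray(sample, dtype=float)
    n = len(sample)

    def F_hat(x):
        return np.count_nonzero(sample <= x) / n

    return F_hat

# Simulate a simple random sample from a standard normal F.
rng = np.random.default_rng(seed=10)
x = rng.normal(size=500)
F_hat = ecdf(x)

# Plug-in estimate of T(F) = F(c): evaluate the empirical c.d.f. at c.
print("estimate of F(0):", F_hat(0.0))   # true value: 0.5

# Plug-in estimate of T(F) = E_F(X): the mean of F_hat is the sample mean.
print("estimate of E(X):", x.mean())     # true value: 0.0
```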
Suppose that for some real-valued function $\phi(x)$, we define $T(F) = \mathrm{E}_F\, \phi(X)$. Note in this case that
$$T\{\alpha F_1 + (1-\alpha)F_2\} = \alpha\, \mathrm{E}_{F_1} \phi(X) + (1-\alpha)\, \mathrm{E}_{F_2} \phi(X) = \alpha T(F_1) + (1-\alpha) T(F_2).$$
For this reason, such a functional is sometimes called a linear functional (see Definition 10.1). To generalize this idea, we consider a real-valued function taking more than one real argument, say $\phi(x_1, \ldots, x_a)$ for some $a > 1$, and define
$$T(F) = \mathrm{E}_F\, \phi(X_1, \ldots, X_a), \tag{10.1}$$
which we take to mean the expectation of $\phi(X_1, \ldots, X_a)$ where $X_1, \ldots, X_a$ is a simple random sample from the distribution function $F$. We see that
$$\mathrm{E}_F\, \phi(X_1, \ldots, X_a) = \mathrm{E}_F\, \phi(X_{\pi(1)}, \ldots, X_{\pi(a)})$$
for any permutation $\pi$ mapping $\{1, \ldots, a\}$ onto itself. Since there are $a!$ such permutations, consider the function
$$\phi^*(x_1, \ldots, x_a) \stackrel{\mathrm{def}}{=} \frac{1}{a!} \sum_{\text{all } \pi} \phi(x_{\pi(1)}, \ldots, x_{\pi(a)}).$$
Since $\mathrm{E}_F\, \phi(X_1, \ldots, X_a) = \mathrm{E}_F\, \phi^*(X_1, \ldots, X_a)$ and $\phi^*$ is symmetric in its arguments (i.e., permuting its $a$ arguments does not change its value), we see that in Equation (10.1) we may assume without loss of generality that $\phi$ is symmetric in its arguments. A function defined as in Equation (10.1) is called an expectation functional, as summarized in the following definition:

Definition 10.1 For some integer $a \ge 1$, let $\phi\colon \mathbb{R}^a \to \mathbb{R}$ be a function symmetric in its $a$ arguments. The expectation of $\phi(X_1, \ldots, X_a)$ under the assumption that $X_1, \ldots, X_a$ are independent and identically distributed from some distribution $F$ will be denoted by $\mathrm{E}_F\, \phi(X_1, \ldots, X_a)$. Then the functional $T(F) = \mathrm{E}_F\, \phi(X_1, \ldots, X_a)$ is called an expectation functional. If $a = 1$, then $T$ is also called a linear functional.

Expectation functionals are important in this chapter because they are precisely the functionals that give rise to V-statistics and U-statistics. The function $\phi(x_1, \ldots, x_a)$ in Definition 10.1 is used so frequently that we give it a special name:

Definition 10.2 Let $T(F) = \mathrm{E}_F\, \phi(X_1, \ldots, X_a)$ be an expectation functional, where $\phi\colon \mathbb{R}^a \to \mathbb{R}$ is a function that is symmetric in its arguments. In other words, $\phi(x_1, \ldots, x_a) = \phi(x_{\pi(1)}, \ldots, x_{\pi(a)})$ for any permutation $\pi$ of the integers 1 through $a$. Then $\phi$ is called the kernel function associated with $T(F)$.

Suppose $T(F)$ is an expectation functional defined according to Equation (10.1). If we have a simple random sample of size $n$ from $F$, then as noted earlier, a natural way to estimate $T(F)$ is by the use of the plug-in estimator $T(\hat F_n)$. This estimator is called a V-statistic or a V-estimator. It is possible to write down a V-statistic explicitly: since $\hat F_n$ assigns probability $\frac{1}{n}$ to each $X_i$, we have
$$V_n = T(\hat F_n) = \mathrm{E}_{\hat F_n}\, \phi(X_1, \ldots, X_a) = \frac{1}{n^a} \sum_{i_1=1}^n \cdots \sum_{i_a=1}^n \phi(X_{i_1}, \ldots, X_{i_a}). \tag{10.2}$$
In the case $a = 1$, Equation (10.2) becomes
$$V_n = \frac{1}{n} \sum_{i=1}^n \phi(X_i).$$
It is clear in this case that $\mathrm{E}\, V_n = T(F)$, which we denote by $\theta$. Furthermore, if $\sigma^2 = \mathrm{Var}_F\, \phi(X) < \infty$, then the central limit theorem implies that
$$\sqrt{n}(V_n - \theta) \stackrel{d}{\to} N(0, \sigma^2).$$
For $a > 1$, however, the sum in Equation (10.2) contains some terms in which $i_1, \ldots, i_a$ are not all distinct. The expectation of such terms is not necessarily equal to $\theta = T(F)$ because in Definition 10.1, $\theta$ requires $a$ independent random variables from $F$. Thus, $V_n$ is not necessarily unbiased for $a > 1$.

Example 10.3 Let $a = 2$ and $\phi(x_1, x_2) = |x_1 - x_2|$. It may be shown (Problem 10.2) that the functional $T(F) = \mathrm{E}_F\, |X_1 - X_2|$ is not linear in $F$. Furthermore, since $|X_{i_1} - X_{i_2}|$ is identically zero whenever $i_1 = i_2$, it may also be shown that the V-estimator of $T(F)$ is biased. (A small numerical illustration of this bias appears after the exercises below.)

Since the bias in $V_n$ is due to the duplication among the subscripts $i_1, \ldots, i_a$, one way to correct this bias is to restrict the summation in Equation (10.2) to sets of subscripts $i_1, \ldots, i_a$ that contain no duplication. For example, we might sum instead over all possible subscripts satisfying $i_1 < \cdots < i_a$. The result is the U-statistic, which is the topic of Section 10.2.

Exercises for Section 10.1

Exercise 10.1 Let $X_1, \ldots, X_n$ be a simple random sample from $F$. For a fixed $x$ for which $0 < F(x) < 1$, find the asymptotic distribution of $\hat F_n(x)$.

Exercise 10.2 Let $T(F) = \mathrm{E}_F\, |X_1 - X_2|$.

(a) Show that $T(F)$ is not a linear functional by exhibiting distributions $F_1$ and $F_2$ and a constant $\alpha \in (0,1)$ such that $T\{\alpha F_1 + (1-\alpha)F_2\} \ne \alpha T(F_1) + (1-\alpha) T(F_2)$.

(b) For $n > 1$, demonstrate that the V-statistic $V_n$ is biased in this case by finding $c_n \ne 1$ such that $\mathrm{E}_F\, V_n = c_n T(F)$.

Exercise 10.3 Let $X_1, \ldots, X_n$ be a random sample from a distribution $F$ with finite third absolute moment.

(a) For $a = 2$, find $\phi(x_1, x_2)$ such that $\mathrm{E}_F\, \phi(X_1, X_2) = \mathrm{Var}_F\, X$. Your $\phi$ function should be symmetric in its arguments.

Hint: The fact that $\theta = \mathrm{E}\, X_1^2 - \mathrm{E}\, X_1 X_2$ leads immediately to a non-symmetric $\phi$ function. Symmetrize it.

(b) For $a = 3$, find $\phi(x_1, x_2, x_3)$ such that $\mathrm{E}_F\, \phi(X_1, X_2, X_3) = \mathrm{E}_F (X - \mathrm{E}_F\, X)^3$. As in part (a), $\phi$ should be symmetric in its arguments.
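The bias described in Example 10.3 is easy to see numerically. The following Python sketch, which is not part of the original text, computes $V_n$ by brute force over all $n^a$ index tuples exactly as in Equation (10.2), using the kernel $\phi(x_1, x_2) = |x_1 - x_2|$. When $F$ is standard normal, $X_1 - X_2 \sim N(0, 2)$, so the true value is $\theta = \mathrm{E}_F\,|X_1 - X_2| = 2/\sqrt{\pi} \approx 1.128$. (The exact bias factor $c_n$ is left to Exercise 10.2.)

```python
import numpy as np
from itertools import product

def v_statistic(sample, kernel, a=2):
    """Brute-force V-statistic of Equation (10.2): average the kernel
    over all n**a index tuples, duplicates included. The cost grows
    like n**a, so this is for small a and n only."""
    n = len(sample)
    total = sum(kernel(*(sample[i] for i in idx))
                for idx in product(range(n), repeat=a))
    return total / n**a

kernel = lambda x1, x2: abs(x1 - x2)
theta = 2.0 / np.sqrt(np.pi)   # E_F|X1 - X2| when F is standard normal

rng = np.random.default_rng(seed=10)
n, reps = 10, 2000
v_values = [v_statistic(rng.normal(size=n), kernel) for _ in range(reps)]

# The diagonal terms (i1 == i2) are identically zero, so the average of
# V_n over many replications falls systematically below theta.
print("mean of V_n over replications:", np.mean(v_values))
print("theta =", theta)
```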
10.2 Hoeffding's Decomposition and Asymptotic Normality

Because the V-statistic
$$V_n = \frac{1}{n^a} \sum_{i_1=1}^n \cdots \sum_{i_a=1}^n \phi(X_{i_1}, \ldots, X_{i_a})$$
is in general a biased estimator of the expectation functional $T(F) = \mathrm{E}_F\, \phi(X_1, \ldots, X_a)$ due to the presence of summands in which there are duplicated indices on the $X_{i_k}$, one way to produce an unbiased estimator is to sum only over those $(i_1, \ldots, i_a)$ in which no duplicates occur. Because $\phi$ is assumed to be symmetric in its arguments, we may without loss of generality restrict attention to the cases in which $1 \le i_1 < \cdots < i_a \le n$. Doing this, we obtain the U-statistic $U_n$:

Definition 10.4 Let $a$ be a positive integer and let $\phi(x_1, \ldots, x_a)$ be the kernel function associated with an expectation functional $T(F)$ (see Definitions 10.1 and 10.2). Then the U-statistic corresponding to this functional equals
$$U_n = \frac{1}{\binom{n}{a}} \sum_{1 \le i_1 < \cdots < i_a \le n} \phi(X_{i_1}, \ldots, X_{i_a}), \tag{10.3}$$
where $X_1, \ldots, X_n$ is a simple random sample of size $n \ge a$.

The "U" in "U-statistic" stands for unbiased (the "V" in "V-statistic" stands for von Mises, who was one of the originators of this theory in the late 1940s).
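As a concrete check on Definition 10.4, here is a Python sketch, not part of the original text, that computes $U_n$ from Equation (10.3) by averaging the kernel over all $\binom{n}{a}$ strictly increasing index tuples. It reuses the kernel $\phi(x_1, x_2) = |x_1 - x_2|$ from the V-statistic sketch above, under the same assumed standard normal $F$ with $\theta = 2/\sqrt{\pi}$; in contrast to $V_n$, the simulated average of $U_n$ should match $\theta$ up to simulation error.

```python
import numpy as np
from itertools import combinations

def u_statistic(sample, kernel, a=2):
    """U-statistic of Equation (10.3): average the kernel over all
    C(n, a) index tuples with i_1 < ... < i_a (no duplicated indices)."""
    vals = [kernel(*(sample[i] for i in idx))
            for idx in combinations(range(len(sample)), a)]
    return sum(vals) / len(vals)

kernel = lambda x1, x2: abs(x1 - x2)
theta = 2.0 / np.sqrt(np.pi)   # E_F|X1 - X2| when F is standard normal

rng = np.random.default_rng(seed=10)
n, reps = 10, 2000
u_values = [u_statistic(rng.normal(size=n), kernel) for _ in range(reps)]

# Excluding duplicated indices removes the source of the bias in V_n:
# the average of U_n over many replications is close to theta.
print("mean of U_n over replications:", np.mean(u_values))
print("theta =", theta)
```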