34. Statistical functionals and V-statistics Lehmann §6.1, 6.2
Let S be a set of cumulative distribution functions and let T : S → R be a functional defined on that set. Then T is called a statistical functional. In general, we may have an iid sample from an unknown distribution F , and we want to learn the value of θ = T (F ) for a (known) functional T . Some particular instances of statistical functionals are as follows:
• If T(F) = F(c) for some constant c, then T is a statistical functional mapping each F to P_F(X ≤ c).
• If T(F) = F⁻¹(p) for some constant p, then T maps F to its pth quantile. Note that F⁻¹(p) may be defined so that it exists even for distribution functions F that are not invertible in the usual sense: for example, take F⁻¹(p) := inf{x : F(x) ≥ p}.
• If T (F ) = E F (X), then of course T maps a distribution function to its mean.
Suppose X1,...,Xn is an iid sequence with distribution function F (x). We define the empirical distribution function Fˆn to be the distribution function for a discrete uniform distribution on {X1,...,Xn}. In other words,
$$\hat F_n(x) = \frac{1}{n}\,\#\{i : X_i \le x\} = \frac{1}{n}\sum_{i=1}^n I\{X_i \le x\}.$$
Note that Fˆn(x) is a legitimate cumulative distribution function. Thus, if T is a statistical functional, we might consider T (Fˆn) as an estimator of T (F ). For obvious reasons, this estimator is called a plug-in estimator; we will discuss it in more detail when we consider V-estimators a bit later. For example, if T (F ) = E F (X), then the plug-in estimator given an iid sample X1,X2,... from F is
$$T(\hat F_n) = E_{\hat F_n}(X) = \frac{1}{n}\sum_{i=1}^n X_i = \bar X_n.$$
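As a quick illustration, the empirical distribution function and the plug-in estimate of the mean can be computed directly. This is a minimal sketch; the sample values and the helper name `ecdf` are illustrative, not from the text:

```python
def ecdf(sample):
    """Return the empirical distribution function F-hat_n as a callable."""
    n = len(sample)
    return lambda t: sum(x <= t for x in sample) / n

# Illustrative sample; the plug-in estimator of E_F(X) is the sample mean.
sample = [2.0, 5.0, 1.0, 4.0]
Fn = ecdf(sample)
print(Fn(3.0))                   # F-hat_n(3) = #{i : X_i <= 3}/n = 2/4 = 0.5
print(sum(sample) / len(sample)) # plug-in estimate of the mean: 3.0
```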
Suppose that for some real-valued function φ(x), we define T (F ) = E F φ(X). Note in this case that
$$T\{\alpha F_1 + (1-\alpha)F_2\} = \alpha\,E_{F_1}\phi(X) + (1-\alpha)\,E_{F_2}\phi(X) = \alpha T(F_1) + (1-\alpha)T(F_2).$$
For this reason, such a functional is sometimes called a linear functional.
To generalize this idea, we may take a real-valued function taking more than one real argument, say φ(x1, . . . , xa) for some a > 1, and define
$$T(F) = E_F\,\phi(X_1,\dots,X_a), \tag{122}$$
which we take to mean the expectation of φ(X1, ..., Xa) where X1, ..., Xa is an iid sample from the distribution function F. Since X1, ..., Xa are iid, it is clear that
$$E_F\,\phi(X_1,\dots,X_a) = E_F\,\phi(X_{\pi(1)},\dots,X_{\pi(a)})$$
for any permutation π mapping {1, ..., a} onto itself. Since there are a! such permutations, consider the function
$$\phi^*(x_1,\dots,x_a) = \frac{1}{a!}\sum_{\text{all }\pi} \phi(x_{\pi(1)},\dots,x_{\pi(a)}).$$
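This symmetrization is easy to carry out numerically. The sketch below averages a kernel over all a! permutations of its arguments; the kernel chosen here is an illustrative, deliberately asymmetric one:

```python
import itertools
import math

def symmetrize(phi, a):
    """Return phi*, the average of phi over all a! permutations of its arguments."""
    def phi_star(*xs):
        total = sum(phi(*p) for p in itertools.permutations(xs))
        return total / math.factorial(a)
    return phi_star

# An asymmetric kernel: phi(x, y) != phi(y, x) in general.
phi = lambda x1, x2: x1 - x2
phi_star = symmetrize(phi, 2)
print(phi_star(3.0, 1.0))  # (phi(3,1) + phi(1,3)) / 2 = 0.0
```

By construction phi_star is symmetric: swapping its arguments leaves its value unchanged.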
Since E_F φ(X1,...,Xa) = E_F φ*(X1,...,Xa) and φ* is symmetric in its arguments (i.e., permuting its a arguments does not change its value), we see that in Equation (122) we may assume without loss of generality that φ is symmetric in its arguments. When a > 1, it may be shown that T is not a linear functional (this is beyond the scope of this course); nonetheless, we still consider it an expectation functional.
Definition 34.1 For some integer a ≥ 1, suppose X1,...,Xa are a random sample from the distribution F . Then given a function φ: Ra → R that is symmetric in its a arguments, the map T : F 7→ E φ(X1,...,Xa) is called an expectation functional. We write T (F ) = E F φ(X1,...,Xa). If a = 1, then T is also called a linear functional. The function φ is called the kernel function.
Suppose T(F) is an expectation functional defined according to Equation (122) for some a ≥ 1. If we have an iid sample of size n from F, then as noted earlier, a natural way to estimate T(F) is by the use of the plug-in estimator T(F̂n). This estimator is called a V-estimator or a V-statistic. It is possible to write down a V-statistic explicitly: since F̂n assigns probability 1/n to each Xi, we have
$$V_n = T(\hat F_n) = E_{\hat F_n}\,\phi(X_1,\dots,X_a) = \frac{1}{n^a}\sum_{i_1=1}^n \cdots \sum_{i_a=1}^n \phi(X_{i_1},\dots,X_{i_a}). \tag{123}$$
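Equation (123) can be evaluated by brute force, summing over all n^a index tuples. A minimal sketch; the kernel and sample below are illustrative choices, not from the text:

```python
import itertools

def v_statistic(phi, sample, a):
    """V_n = n^(-a) * sum of phi over ALL index tuples (i_1,...,i_a), duplicates included."""
    x = list(sample)
    n = len(x)
    total = sum(phi(*(x[i] for i in idx))
                for idx in itertools.product(range(n), repeat=a))
    return total / n**a

# With a = 2 and phi(x1, x2) = x1 * x2, we have E_F phi = (E_F X)^2,
# and the double sum factors so that V_n is the squared sample mean.
sample = [1.0, 2.0, 3.0]
print(v_statistic(lambda x1, x2: x1 * x2, sample, 2))  # (6/3)^2 = 4.0
```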
In the case a = 1, Equation (123) becomes
$$V_n = \frac{1}{n}\sum_{i=1}^n \phi(X_i).$$
It is clear in this case that E V_n = T(F), which we denote by θ. Furthermore, if σ² = Var_F φ(X) < ∞, then the central limit theorem implies that
$$\sqrt{n}\,(V_n - \theta) \overset{\mathcal{L}}{\to} N(0, \sigma^2).$$
For a > 1, however, the situation becomes slightly more complicated. First of all, Vn is no longer unbiased, since the sum in equation (123) contains some terms in which i1, . . . , ia are not all distinct. The expectation of such terms is not necessarily equal to θ = T (F ).
Example 34.1 Let a = 2 and φ(x1, x2) = |x1 − x2|. It may be shown (Problem 34.2) that the functional T(F) = E_F|X1 − X2| is not linear in F. Furthermore, since |X_{i1} − X_{i2}| is identically zero whenever i1 = i2, it may also be shown that the V-estimator of T(F) is biased.
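For this kernel the bias can be seen numerically: the n diagonal terms |X_i − X_i| = 0 contribute nothing, so V_n equals (n − 1)/n times the average over the distinct pairs. A sketch with illustrative sample values:

```python
import itertools

def vn_abs_diff(sample):
    """V_n for the kernel |x1 - x2|: average over all n^2 index pairs, diagonal included."""
    n = len(sample)
    return sum(abs(x - y) for x in sample for y in sample) / n**2

def un_abs_diff(sample):
    """Average over only the C(n, 2) pairs with distinct indices."""
    n = len(sample)
    total = sum(abs(x - y) for x, y in itertools.combinations(sample, 2))
    return total / (n * (n - 1) / 2)

# Illustrative sample; the n zero diagonal terms shrink V_n by the factor (n-1)/n,
# so E_F V_n = ((n-1)/n) * T(F) < T(F): V_n is biased downward.
sample = [0.2, 0.9, 0.4, 0.7]
print(vn_abs_diff(sample))  # equals (3/4) * un_abs_diff(sample)
print(un_abs_diff(sample))
```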
Since the bias in Vn is due to the duplication among the subscripts i1, . . . , ia, the obvious way to correct this bias is to restrict the summation in Equation (123) to sets of subscripts i1, . . . , ia that contain no duplication. For example, we might sum instead over all i1 < ··· < ia, an idea that leads in the next topic to U-statistics.
Problems
Problem 34.1 Let X1,...,Xn be an iid sample from F . For a fixed x for which 0 < F (x) < 1, find the asymptotic distribution of Fˆn(x).
Problem 34.2 Let T (F ) = E F |X1 − X2|.
(a) Show that T(F) is not a linear functional by exhibiting distributions F1 and F2 and a constant α ∈ (0, 1) such that
$$T\{\alpha F_1 + (1-\alpha)F_2\} \ne \alpha T(F_1) + (1-\alpha)T(F_2).$$
(b) For n > 1, demonstrate that the V-statistic V_n is biased in this case by finding c_n ≠ 1 such that E_F V_n = c_n T(F).
Problem 34.3 Let X1,...,Xn be a random sample from a distribution F with finite third absolute moment.
(a) Find φ(x1, . . . , xa) for some a such that E F φ(X1,...,Xa) = Var F X. Note that a is not allowed to depend on n, and you should try to find the smallest possible a. Furthermore, φ should be symmetric in its arguments.
(b) Let µ = E F (X). As in part (a), find φ(x1, . . . , xa) such that E F φ(X1,...,Xa) = E F (X − µ)3.
35. U-statistics Lehmann §6.1
Because the V-statistic
$$V_n = \frac{1}{n^a}\sum_{i_1=1}^n \cdots \sum_{i_a=1}^n \phi(X_{i_1},\dots,X_{i_a})$$
is in general a biased estimator of T(F) due to the presence of summands in which there are duplicated indices among the X_{i_k}, the obvious way to produce an unbiased estimator is to sum only over those (i1, ..., ia) in which no duplicates occur. Because φ is assumed to be symmetric in its arguments, we may without loss of generality restrict attention to the cases in which 1 ≤ i1 < ··· < ia ≤ n. Doing this, we obtain the U-statistic
$$U_n = \binom{n}{a}^{-1} \sum_{1 \le i_1 < \cdots < i_a \le n} \phi(X_{i_1},\dots,X_{i_a}). \tag{124}$$
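Equation (124) is straightforward to compute with a combinations iterator. A sketch under illustrative choices: the kernel φ(x1, x2) = (x1 − x2)²/2 is a classical example with E_F φ = Var_F X, and its U-statistic turns out to be exactly the usual unbiased sample variance.

```python
import itertools
import math
import statistics

def u_statistic(phi, sample, a):
    """U_n: average of phi over all C(n, a) index subsets i_1 < ... < i_a."""
    n = len(sample)
    total = sum(phi(*c) for c in itertools.combinations(sample, a))
    return total / math.comb(n, a)

# Kernel whose expectation is Var_F X; its U-statistic is the sample variance.
sample = [1.0, 4.0, 2.0, 8.0]
Un = u_statistic(lambda x1, x2: (x1 - x2) ** 2 / 2, sample, 2)
print(Un)
print(statistics.variance(sample))  # matches Un
```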