Chapter 5: The Delta Method and Applications

5.1 Linear approximations of functions

In the simplest form of the central limit theorem, Theorem 4.18, we consider a sequence $X_1, X_2, \ldots$ of independent and identically distributed (univariate) random variables with finite variance $\sigma^2$. In this case, the central limit theorem states that

$$\sqrt{n}\left(\overline{X}_n - \mu\right) \xrightarrow{d} \sigma Z, \qquad (5.1)$$

where $\mu = E\,X_1$ and $Z$ is a standard normal random variable.

In this chapter, we wish to consider the asymptotic distribution of, say, some function of $\overline{X}_n$. In the simplest case, the answer depends on results already known: consider a linear function $g(t) = at + b$ for some known constants $a$ and $b$. Since $E\,\overline{X}_n = \mu$, clearly $E\,g(\overline{X}_n) = a\mu + b = g(\mu)$ by the linearity of the expectation operator. Therefore, it is reasonable to ask whether $\sqrt{n}\left[g(\overline{X}_n) - g(\mu)\right]$ tends to some distribution as $n \to \infty$. But the linearity of $g(t)$ allows one to write

$$\sqrt{n}\left[g(\overline{X}_n) - g(\mu)\right] = a\sqrt{n}\left(\overline{X}_n - \mu\right).$$

We conclude by Theorem 2.24 that

$$\sqrt{n}\left[g(\overline{X}_n) - g(\mu)\right] \xrightarrow{d} a\sigma Z.$$

Of course, the distribution on the right-hand side above is $N(0, a^2\sigma^2)$.

None of the preceding development is especially deep; one might even say it is obvious that a linear transformation of the random variable $\overline{X}_n$ alters its asymptotic distribution by a constant multiple. Yet what if the function $g(t)$ is nonlinear? It is in this nonlinear case that a strong understanding of the argument above, simple as it may be, pays real dividends. For if $\overline{X}_n$ is consistent for $\mu$ (say), then we know that, roughly speaking, $\overline{X}_n$ will be very close to $\mu$ for large $n$. Therefore, the only meaningful aspect of the behavior of $g(t)$ is its behavior in a small neighborhood of $\mu$. And in a small neighborhood of $\mu$, $g(t)$ may be considered to be roughly a linear function if we use a first-order Taylor expansion. In particular, we may approximate

$$g(t) \approx g(\mu) + g'(\mu)(t - \mu)$$

for $t$ in a small neighborhood of $\mu$.
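The quality of this first-order Taylor approximation near $\mu$ can be illustrated numerically. The sketch below, with $g(t) = t^2$, a Uniform$(0,1)$ sample, and a sample size of $10{,}000$ all chosen purely for illustration, compares $g(\overline{X}_n)$ with its linearization around $\mu$; for this particular $g$, the linearization error is exactly $(\overline{X}_n - \mu)^2$, i.e., second order in the small quantity $\overline{X}_n - \mu$.

```python
import random

random.seed(0)

# Illustrative choices: g(t) = t^2 with derivative g'(t) = 2t.
def g(t):
    return t * t

def g_prime(t):
    return 2 * t

mu = 0.5      # mean of Uniform(0, 1)
n = 10_000    # sample size (arbitrary, chosen large)

# One sample mean; it should be close to mu for large n.
xbar = sum(random.random() for _ in range(n)) / n

exact = g(xbar)
linear = g(mu) + g_prime(mu) * (xbar - mu)

# For g(t) = t^2 the linearization error equals (xbar - mu)^2 exactly,
# so it is second order in (xbar - mu).
print(abs(exact - linear) <= (xbar - mu) ** 2 + 1e-12)  # prints True
```

Because the error is quadratic in $\overline{X}_n - \mu$ while the leading term is linear, the linear term dominates once $\overline{X}_n$ is close to $\mu$, which is the heart of the delta method.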
We see that $g'(\mu)$ is the multiple of $t$, and so the logic of the linear case above suggests

$$\sqrt{n}\left[g(\overline{X}_n) - g(\mu)\right] \xrightarrow{d} g'(\mu)\sigma Z. \qquad (5.2)$$

Indeed, expression (5.2) is a special case of the powerful theorem known as the delta method.

Theorem 5.1 (Delta method): If $g'(a)$ exists and $n^b(X_n - a) \xrightarrow{d} X$ for $b > 0$, then

$$n^b\left\{g(X_n) - g(a)\right\} \xrightarrow{d} g'(a)X.$$

The proof of the delta method uses Taylor's theorem, Theorem 1.18: since $X_n - a \xrightarrow{P} 0$,

$$n^b\left\{g(X_n) - g(a)\right\} = n^b(X_n - a)\left\{g'(a) + o_P(1)\right\},$$

and thus Slutsky's theorem together with the fact that $n^b(X_n - a) \xrightarrow{d} X$ proves the result.

Expression (5.2) may be reexpressed as a corollary of Theorem 5.1:

Corollary 5.2: The often-used special case of Theorem 5.1 in which $X$ is normally distributed states that if $g'(\mu)$ exists and $\sqrt{n}(\overline{X}_n - \mu) \xrightarrow{d} N(0, \sigma^2)$, then

$$\sqrt{n}\left[g(\overline{X}_n) - g(\mu)\right] \xrightarrow{d} N\left(0, \sigma^2\left[g'(\mu)\right]^2\right).$$

Ultimately, we will extend Theorem 5.1 in two directions: Theorem 5.5 deals with the special case in which $g'(a) = 0$, and Theorem 5.6 is the multivariate version of the delta method. But we first apply the delta method to a couple of simple examples that illustrate a frequently understood but seldom stated principle: when we speak of the "asymptotic distribution" of a sequence of random variables, we generally refer to a nontrivial (i.e., nonconstant) distribution. Thus, in the case of an independent and identically distributed sequence $X_1, X_2, \ldots$ of random variables with finite variance, the phrase "asymptotic distribution of $\overline{X}_n$" generally refers to the fact that

$$\sqrt{n}\left(\overline{X}_n - E\,X_1\right) \xrightarrow{d} N(0, \operatorname{Var} X_1),$$

not the fact that $\overline{X}_n \xrightarrow{P} E\,X_1$.

Example 5.3 (Asymptotic distribution of $\overline{X}_n^2$): Suppose $X_1, X_2, \ldots$ are iid with mean $\mu$ and finite variance $\sigma^2$. Then by the central limit theorem,

$$\sqrt{n}\left(\overline{X}_n - \mu\right) \xrightarrow{d} N(0, \sigma^2).$$

Therefore, the delta method gives

$$\sqrt{n}\left(\overline{X}_n^2 - \mu^2\right) \xrightarrow{d} N(0, 4\mu^2\sigma^2). \qquad (5.3)$$

However, this is not necessarily the end of the story.
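When $\mu \neq 0$, the limit in (5.3) can be checked by Monte Carlo simulation. The sketch below uses Uniform$(0,1)$ data, so $\mu = 1/2$ and $\sigma^2 = 1/12$; these choices, along with the sample size and replication count, are arbitrary illustrations. The empirical variance of $\sqrt{n}(\overline{X}_n^2 - \mu^2)$ should be close to the delta-method prediction $4\mu^2\sigma^2$.

```python
import random
import statistics

random.seed(1)

# Monte Carlo check of (5.3) for g(t) = t^2 with mu != 0.
# Illustrative choices: X_i ~ Uniform(0, 1), so mu = 1/2, sigma^2 = 1/12.
mu, sigma2 = 0.5, 1 / 12
n, reps = 400, 5000

draws = []
for _ in range(reps):
    xbar = sum(random.random() for _ in range(n)) / n
    draws.append(n ** 0.5 * (xbar ** 2 - mu ** 2))

# Delta method predicts limiting variance 4 * mu^2 * sigma^2.
predicted = 4 * mu ** 2 * sigma2
observed = statistics.variance(draws)
print(round(predicted, 3), round(observed, 3))
```

The two printed values should roughly agree; the agreement improves as both $n$ and the number of replications grow.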
If $\mu = 0$, then the normal limit in (5.3) is degenerate; that is, expression (5.3) merely states that $\sqrt{n}\,\overline{X}_n^2$ converges in probability to the constant 0. This is not what we mean by the asymptotic distribution! Thus, we must treat the case $\mu = 0$ separately, noting in that case that $\sqrt{n}\,\overline{X}_n \xrightarrow{d} N(0, \sigma^2)$ by the central limit theorem, which implies that

$$n\overline{X}_n^2 \xrightarrow{d} \sigma^2\chi_1^2.$$

Example 5.4 (Estimating binomial variance): Suppose $X_n \sim \text{binomial}(n, p)$. Because $X_n/n$ is the maximum likelihood estimator for $p$, the maximum likelihood estimator for $p(1-p)$ is $\delta_n = X_n(n - X_n)/n^2$. The central limit theorem tells us that

$$\sqrt{n}\left(X_n/n - p\right) \xrightarrow{d} N\{0, p(1-p)\},$$

so the delta method gives

$$\sqrt{n}\left\{\delta_n - p(1-p)\right\} \xrightarrow{d} N\left\{0, p(1-p)(1-2p)^2\right\}.$$

Note that in the case $p = 1/2$, this does not give the asymptotic distribution of $\delta_n$. Exercise 5.1 gives a hint about how to find the asymptotic distribution of $\delta_n$ in this case.

We have seen in the preceding examples that if $g'(a) = 0$, then the delta method gives something other than the asymptotic distribution we seek. However, by using more terms in the Taylor expansion, we obtain the following generalization of Theorem 5.1:

Theorem 5.5: If $g(t)$ has $r$ derivatives at the point $a$ and $g'(a) = g''(a) = \cdots = g^{(r-1)}(a) = 0$, then $n^b(X_n - a) \xrightarrow{d} X$ for $b > 0$ implies that

$$n^{rb}\left\{g(X_n) - g(a)\right\} \xrightarrow{d} \frac{1}{r!}\,g^{(r)}(a)X^r.$$

It is straightforward using the multivariate notion of differentiability discussed in Definition 1.34 to prove the following theorem:

Theorem 5.6 (Multivariate delta method): If $g: \mathbb{R}^k \to \mathbb{R}^\ell$ has a derivative $\nabla g(a)$ at $a \in \mathbb{R}^k$ and

$$n^b(X_n - a) \xrightarrow{d} Y$$

for some $k$-vector $Y$ and some sequence $X_1, X_2, \ldots$ of $k$-vectors, where $b > 0$, then

$$n^b\left\{g(X_n) - g(a)\right\} \xrightarrow{d} \left[\nabla g(a)\right]^T Y.$$

The proof of Theorem 5.6 involves a simple application of the multivariate Taylor expansion of Equation (1.18).

Exercises for Section 5.1

Exercise 5.1: Let $\delta_n$ be defined as in Example 5.4. Find the asymptotic distribution of $\delta_n$ in the case $p = 1/2$.
That is, find constant sequences $a_n$ and $b_n$ and a nontrivial random variable $X$ such that $a_n(\delta_n - b_n) \xrightarrow{d} X$.

Hint: Let $Y_n = X_n - (n/2)$. Apply the central limit theorem to $Y_n$, then transform both sides of the resulting limit statement so that a statement involving $\delta_n$ results.

Exercise 5.2: Prove Theorem 5.5.

5.2 Variance stabilizing transformations

Often, if $E(X_i) = \mu$ is the parameter of interest, the central limit theorem gives

$$\sqrt{n}\left(\overline{X}_n - \mu\right) \xrightarrow{d} N\{0, \sigma^2(\mu)\}.$$

In other words, the variance of the limiting distribution is a function of $\mu$. This is a problem if we wish to do inference for $\mu$, because ideally the limiting distribution should not depend on the unknown $\mu$. The delta method gives a possible solution: since

$$\sqrt{n}\left[g(\overline{X}_n) - g(\mu)\right] \xrightarrow{d} N\left\{0, \sigma^2(\mu)\left[g'(\mu)\right]^2\right\},$$

we may search for a transformation $g(x)$ such that $g'(\mu)\sigma(\mu)$ is a constant. Such a transformation is called a variance stabilizing transformation.

Example 5.7: Suppose that $X_1, X_2, \ldots$ are independent normal random variables with mean 0 and variance $\sigma^2$. Let us define $\tau^2 = \operatorname{Var} X_i^2$, which for the normal distribution may be seen to be $2\sigma^4$. (To verify this, try showing that $E\,X_i^4 = 3\sigma^4$ by differentiating the normal characteristic function four times and evaluating at zero.) Thus, Example 4.11 shows that

$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n X_i^2 - \sigma^2\right) \xrightarrow{d} N(0, 2\sigma^4).$$

To do inference for $\sigma^2$ when we believe that our data are truly independent and identically normally distributed, it would be helpful if the limiting distribution did not depend on the unknown $\sigma^2$. Therefore, it is sensible in light of Corollary 5.2 to search for a function $g(t)$ such that $2\left[g'(\sigma^2)\right]^2\sigma^4$ is not a function of $\sigma^2$. In other words, we want $g'(t)$ to be proportional to $\sqrt{t^{-2}} = |t|^{-1}$. Clearly $g(t) = \log t$ is such a function. Therefore, we call the logarithm function a variance-stabilizing function in this example, and Corollary 5.2 shows that

$$\sqrt{n}\left\{\log\left(\frac{1}{n}\sum_{i=1}^n X_i^2\right) - \log\sigma^2\right\} \xrightarrow{d} N(0, 2).$$

Exercises for Section 5.2

Exercise 5.3: Suppose $X_n \sim \text{binomial}(n, p)$, where $0 < p < 1$.

(a) Find the asymptotic distribution of $g(X_n/n) - g(p)$, where $g(x) = \min\{x, 1-x\}$.

(b) Show that $h(x) = \sin^{-1}(\sqrt{x})$ is a variance-stabilizing transformation.

Hint: $(d/du)\sin^{-1}(u) = 1/\sqrt{1-u^2}$.

Exercise 5.4: Let $X_1, X_2, \ldots$ be independent from $N(\mu, \sigma^2)$ where $\mu \neq 0$. Let

$$S_n^2 = \frac{1}{n}\sum_{i=1}^n \left(X_i - \overline{X}_n\right)^2.$$
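As a numerical illustration of the variance-stabilizing idea of Section 5.2, the sketch below simulates the log-transformed statistic of Example 5.7 for two different values of $\sigma$ (the particular values, sample size, and replication count are all arbitrary choices). The empirical variance of $\sqrt{n}\{\log(\tfrac{1}{n}\sum X_i^2) - \log\sigma^2\}$ should be near 2 in both cases, demonstrating that the limit no longer depends on $\sigma$.

```python
import math
import random
import statistics

random.seed(2)

# Variance-stabilizing transformation of Example 5.7: for iid N(0, sigma^2)
# data, sqrt(n) * (log(mean of X_i^2) - log sigma^2) -> N(0, 2), a limit
# that does not depend on sigma. The sigma values below are illustrative.
n, reps = 400, 3000

results = {}
for sigma in (0.5, 3.0):
    draws = []
    for _ in range(reps):
        mean_sq = sum(random.gauss(0, sigma) ** 2 for _ in range(n)) / n
        draws.append(n ** 0.5 * (math.log(mean_sq) - math.log(sigma ** 2)))
    results[sigma] = statistics.variance(draws)
    # Both empirical variances should be near 2, whatever sigma is.
    print(sigma, round(results[sigma], 2))
```

Without the log transformation, the corresponding empirical variances would be $2\sigma^4$, differing by a factor of over a thousand between these two values of $\sigma$; after the transformation both are approximately 2.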