Asymptotic Variance of an Estimator
Chapter 10 Asymptotic Evaluations

10.1 Point Estimation

10.1.1 Consistency

The property of consistency is a fundamental one, requiring that the estimator converges to the correct value as the sample size becomes infinite.

Definition 10.1.1 A sequence of estimators $W_n = W_n(X_1, \ldots, X_n)$ is a consistent sequence of estimators of the parameter $\theta$ if, for every $\epsilon > 0$ and every $\theta \in \Theta$,
$$\lim_{n \to \infty} P_\theta(|W_n - \theta| < \epsilon) = 1,$$
or equivalently,
$$\lim_{n \to \infty} P_\theta(|W_n - \theta| \geq \epsilon) = 0.$$

Note that in this definition we are dealing with a family of probability structures: for every $\theta$, the corresponding estimator sequence must converge to $\theta$ in probability.

Recall that, for an estimator $W_n$, Chebychev's Inequality states
$$P_\theta(|W_n - \theta| \geq \epsilon) \leq \frac{E_\theta[(W_n - \theta)^2]}{\epsilon^2},$$
so if, for every $\theta \in \Theta$,
$$\lim_{n \to \infty} E_\theta[(W_n - \theta)^2] = 0,$$
then the sequence of estimators is consistent. Furthermore,
$$E_\theta[(W_n - \theta)^2] = \mathrm{Var}_\theta W_n + [\mathrm{Bias}_\theta W_n]^2.$$
Putting this all together, we can state the following theorem.

Theorem 10.1.1 If $W_n$ is a sequence of estimators of a parameter $\theta$ satisfying
i. $\lim_{n \to \infty} \mathrm{Var}_\theta W_n = 0$,
ii. $\lim_{n \to \infty} \mathrm{Bias}_\theta W_n = 0$,
for every $\theta \in \Theta$, then $W_n$ is a consistent sequence of estimators of $\theta$.

Example 10.1.1 (Consistency of $\bar{X}$) Let $X_1, \ldots$ be iid $N(\theta, 1)$, and consider the sequence
$$\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i.$$
Since
$$E_\theta \bar{X}_n = \theta \quad \text{and} \quad \mathrm{Var}_\theta \bar{X}_n = \frac{1}{n},$$
the sequence $\bar{X}_n$ is consistent.

Theorem 10.1.2 Let $W_n$ be a consistent sequence of estimators of a parameter $\theta$. Let $a_1, a_2, \ldots$ and $b_1, b_2, \ldots$ be sequences of constants satisfying
i. $\lim_{n \to \infty} a_n = 1$,
ii. $\lim_{n \to \infty} b_n = 0$.
Then the sequence $U_n = a_n W_n + b_n$ is a consistent sequence of estimators of $\theta$.

Theorem 10.1.3 (Consistency of MLEs) Let $X_1, X_2, \ldots$ be iid $f(x|\theta)$, and let $L(\theta|\mathbf{x}) = \prod_{i=1}^n f(x_i|\theta)$ be the likelihood function. Let $\hat{\theta}$ denote the MLE of $\theta$, and let $\tau(\theta)$ be a continuous function of $\theta$. Under regularity conditions on $f(x|\theta)$ and, hence, $L(\theta|\mathbf{x})$, for every $\epsilon > 0$ and every $\theta \in \Theta$,
$$\lim_{n \to \infty} P_\theta(|\tau(\hat{\theta}) - \tau(\theta)| \geq \epsilon) = 0.$$
That is, $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$. For a proof of the theorem, see Stuart, Ord and Arnold (1999).

10.1.2 Efficiency

The property of consistency is concerned with the asymptotic accuracy of an estimator: does it converge to the parameter that it is estimating? In this section we look at a related property, efficiency, which is concerned with the asymptotic variance of an estimator.

In calculating an asymptotic variance, we are, perhaps, tempted to proceed as follows. Given an estimator $T_n$ based on a sample of size $n$, we calculate the finite-sample variance $\mathrm{Var}\, T_n$ and then evaluate $\lim_{n \to \infty} k_n \mathrm{Var}\, T_n$, where $k_n$ is some normalizing constant. Note that, in many cases, $\mathrm{Var}\, T_n \to 0$ as $n \to \infty$, so we need the factor $k_n$ to force convergence to a nonzero limit.

Definition 10.1.2 For an estimator $T_n$, if $\lim_{n \to \infty} k_n \mathrm{Var}\, T_n = \tau^2 < \infty$, where $\{k_n\}$ is a sequence of constants, then $\tau^2$ is called the limiting variance or limit of the variances.

Example 10.1.2 (Limiting variances) For the mean $\bar{X}_n$ of $n$ iid normal observations with $EX = \mu$ and $\mathrm{Var}\, X = \sigma^2$, if we take $T_n = \bar{X}_n$, then $\lim_{n \to \infty} n \mathrm{Var}\, \bar{X}_n = \sigma^2$ is the limiting variance of $T_n$.

Definition 10.1.3 For an estimator $T_n$, suppose that $k_n(T_n - \tau(\theta)) \to N(0, \sigma^2)$ in distribution. The parameter $\sigma^2$ is called the asymptotic variance or variance of the limit distribution of $T_n$.
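To see Example 10.1.1 and Definitions 10.1.2 and 10.1.3 in action, here is a minimal Monte Carlo sketch. It is an added illustration, not part of the text: it assumes NumPy, and the true mean, tolerance, and replication counts are arbitrary choices. It checks both the consistency of $\bar{X}_n$ and the scaling $n \mathrm{Var}_\theta \bar{X}_n \to \sigma^2 = 1$:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, eps, reps = 2.0, 0.1, 2000   # true mean, tolerance, Monte Carlo replications

for n in (10, 100, 1000):
    # reps independent samples of size n from N(theta, 1), reduced to their means
    xbar = rng.normal(theta, 1.0, size=(reps, n)).mean(axis=1)
    p_far = np.mean(np.abs(xbar - theta) >= eps)  # estimates P_theta(|Xbar_n - theta| >= eps)
    scaled_var = n * xbar.var()                   # should stay near sigma^2 = 1
    print(f"n={n:5d}   P(|Xbar-theta| >= eps) ~ {p_far:.3f}   n*Var(Xbar) ~ {scaled_var:.3f}")
```

As $n$ grows, the empirical tail probability drops toward 0 (consistency) while $n \mathrm{Var}\, \bar{X}_n$ stays near $\sigma^2$ (the limiting variance).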
For calculations of the variances of sample means and other types of averages, the limiting variance and the asymptotic variance typically have the same value. But in more complicated cases, the limiting variance will sometimes fail us. It is also interesting to note that the asymptotic variance is never larger than the limiting variance.

Example 10.1.3 (Large-sample mixture variances) Consider a mixture model, where we observe $Y_n \sim N(0, 1)$ with probability $p_n$ and $Y_n \sim N(0, \sigma_n^2)$ with probability $1 - p_n$. First, with the formula $\mathrm{Var}(X) = E(\mathrm{Var}(X|Y)) + \mathrm{Var}(E(X|Y))$ we have
$$\mathrm{Var}(Y_n) = p_n + (1 - p_n)\sigma_n^2.$$
It then follows that the limiting variance of $Y_n$ is finite only if $\lim_{n \to \infty} (1 - p_n)\sigma_n^2 < \infty$. On the other hand, the asymptotic distribution of $Y_n$ can be calculated directly using
$$P(Y_n < a) = p_n P(Z < a) + (1 - p_n) P(Z < a/\sigma_n).$$
Suppose now we let $p_n \to 1$ and $\sigma_n \to \infty$ in such a way that $(1 - p_n)\sigma_n^2 \to \infty$. It then follows that $P(Y_n < a) \to P(Z < a)$, that is, $Y_n \to N(0, 1)$ in distribution, and we have
$$\text{limiting variance} = \lim_{n \to \infty} \left[ p_n + (1 - p_n)\sigma_n^2 \right] = \infty,$$
$$\text{asymptotic variance} = 1.$$

Definition 10.1.4 A sequence of estimators $W_n$ is asymptotically efficient for a parameter $\tau(\theta)$ if $\sqrt{n}[W_n - \tau(\theta)] \to N(0, \nu(\theta))$ in distribution and
$$\nu(\theta) = \frac{[\tau'(\theta)]^2}{E_\theta\left[\left(\frac{\partial}{\partial\theta} \log f(X|\theta)\right)^2\right]};$$
that is, the asymptotic variance of $W_n$ achieves the Cramér-Rao lower bound.

Theorem 10.1.4 (Asymptotic efficiency of MLEs) Let $X_1, \ldots$ be iid $f(x|\theta)$, let $\hat{\theta}$ denote the MLE of $\theta$, and let $\tau(\theta)$ be a continuous function of $\theta$. Under the regularity conditions on $f(x|\theta)$ and, hence, $L(\theta|\mathbf{x})$,
$$\sqrt{n}[\tau(\hat{\theta}) - \tau(\theta)] \to N(0, \nu(\theta)),$$
where $\nu(\theta)$ is the Cramér-Rao lower bound. That is, $\tau(\hat{\theta})$ is a consistent and asymptotically efficient estimator of $\tau(\theta)$.

Proof: Recall that $l(\theta|\mathbf{x}) = \sum_i \log f(x_i|\theta)$ is the log likelihood function. Denote derivatives (with respect to $\theta$) by $l', l'', \ldots$. Now expand the first derivative of the log likelihood around the true value $\theta_0$,
$$l'(\theta|\mathbf{x}) = l'(\theta_0|\mathbf{x}) + (\theta - \theta_0)\, l''(\theta_0|\mathbf{x}) + \cdots.$$
Now substitute the MLE $\hat{\theta}$ for $\theta$ and note that $l'(\hat{\theta}|\mathbf{x}) = 0$. Rearranging and multiplying through by $\sqrt{n}$ gives
$$\sqrt{n}(\hat{\theta} - \theta_0) = \frac{-\frac{1}{\sqrt{n}}\, l'(\theta_0|\mathbf{x})}{\frac{1}{n}\, l''(\theta_0|\mathbf{x})} = \frac{\frac{1}{\sqrt{n}}\, l'(\theta_0|\mathbf{x})}{-\frac{1}{n}\, l''(\theta_0|\mathbf{x})}.$$
Let $I(\theta_0) = E\left[\left(\frac{\partial}{\partial\theta} \log f(X|\theta_0)\right)^2\right]$ denote the information number for one observation. We can write
$$\frac{1}{\sqrt{n}}\, l'(\theta_0|\mathbf{x}) = \sqrt{n}\left[\frac{1}{n} \sum_i W_i\right],$$
where $W_i = \frac{\frac{d}{d\theta} f(X_i|\theta_0)}{f(X_i|\theta_0)}$ has mean 0 and variance $I(\theta_0)$. By the central limit theorem,
$$\frac{1}{\sqrt{n}}\, l'(\theta_0|\mathbf{x}) \to N(0, I(\theta_0)) \quad \text{(in distribution)}.$$
Next write
$$-\frac{1}{n}\, l''(\theta_0|\mathbf{x}) = \frac{1}{n} \sum_i W_i^2 - \frac{1}{n} \sum_i \frac{\frac{d^2}{d\theta^2} f(X_i|\theta_0)}{f(X_i|\theta_0)},$$
where the mean of $W_i^2$ is $I(\theta_0)$ and the mean of the second term is 0. Applying the WLLN, we have
$$-\frac{1}{n}\, l''(\theta_0|\mathbf{x}) \to I(\theta_0) \quad \text{(in probability)}.$$
By Slutsky's theorem,
$$\sqrt{n}(\hat{\theta} - \theta_0) \to N\!\left(0, \frac{1}{I(\theta_0)}\right).$$
Now assume that $\tau(\theta)$ is differentiable at $\theta = \theta_0$. By the delta method,
$$\sqrt{n}[\tau(\hat{\theta}) - \tau(\theta_0)] \to N(0, \nu(\theta_0)).$$
Since
$$\sqrt{n}\, \frac{\tau(\hat{\theta}) - \tau(\theta)}{\sqrt{\nu(\theta)}} \to Z \quad \text{in distribution},$$
where $Z \sim N(0, 1)$, applying Slutsky's theorem again gives
$$\tau(\hat{\theta}) - \tau(\theta) = \left(\frac{\sqrt{\nu(\theta)}}{\sqrt{n}}\right)\left(\sqrt{n}\, \frac{\tau(\hat{\theta}) - \tau(\theta)}{\sqrt{\nu(\theta)}}\right) \to \lim_{n \to \infty}\left(\frac{\sqrt{\nu(\theta)}}{\sqrt{n}}\right) Z = 0,$$
so $\tau(\hat{\theta}) - \tau(\theta) \to 0$ in distribution. From Theorem 5.5.13 we know that convergence in distribution to a point is equivalent to convergence in probability, so $\tau(\hat{\theta})$ is a consistent estimator of $\tau(\theta)$. $\Box$
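The conclusion of Theorem 10.1.4 can be checked numerically. The sketch below is an added illustration under stated assumptions (NumPy; an Exponential model and parameter values of my choosing): for iid Exponential data with density $f(x|\theta) = \theta e^{-\theta x}$, the MLE is $\hat{\theta} = 1/\bar{X}$ and $I(\theta) = 1/\theta^2$, so the empirical variance of $\sqrt{n}(\hat{\theta} - \theta)$ should approach the Cramér-Rao bound $1/I(\theta) = \theta^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, reps = 2.0, 1000, 4000   # true rate, sample size, Monte Carlo replications

# For f(x|theta) = theta * exp(-theta * x), the MLE is theta_hat = 1 / Xbar
# and the information number per observation is I(theta) = 1 / theta^2.
x = rng.exponential(scale=1.0 / theta, size=(reps, n))
theta_hat = 1.0 / x.mean(axis=1)

z = np.sqrt(n) * (theta_hat - theta)   # approximately N(0, theta^2) by Theorem 10.1.4
print("empirical Var of sqrt(n)(theta_hat - theta):", z.var())
print("Cramer-Rao bound theta^2:                   ", theta**2)
```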
10.1.3 Calculations and Comparisons

If an MLE is asymptotically efficient, the asymptotic variance in Theorem 10.1.4 is the delta method variance of Theorem 5.5.24 (without the $1/n$ term). Thus, we can use the Cramér-Rao lower bound as an approximation to the true variance of the MLE. Suppose that $X_1, \ldots, X_n$ are iid $f(x|\theta)$, $\hat{\theta}$ is the MLE of $\theta$, and
$$I_n(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta} \log L(\theta|\mathbf{X})\right)^2\right]$$
is the information number of the sample. From the delta method and the asymptotic efficiency of MLEs, the variance of $h(\hat{\theta})$ can be approximated by
$$\mathrm{Var}(h(\hat{\theta})|\theta) \approx \frac{[h'(\theta)]^2}{I_n(\theta)} = \frac{[h'(\theta)]^2}{E_\theta\left(-\frac{\partial^2}{\partial\theta^2} \log L(\theta|\mathbf{X})\right)} \approx \frac{[h'(\theta)]^2\big|_{\theta=\hat{\theta}}}{-\frac{\partial^2}{\partial\theta^2} \log L(\theta|\mathbf{X})\big|_{\theta=\hat{\theta}}}, \tag{10.1}$$
where $-\frac{\partial^2}{\partial\theta^2} \log L(\theta|\mathbf{X})\big|_{\theta=\hat{\theta}}$ is called the observed information number. Efron and Hinkley (1978) have shown that use of the observed information number is superior to the expected information number in this case.

Note that the variance estimation process is a two-step procedure. First we approximate $\mathrm{Var}(h(\hat{\theta})|\theta)$; then we estimate the resulting approximation, usually by substituting $\hat{\theta}$ for $\theta$. The resulting estimate can be denoted by $\mathrm{Var}_{\hat{\theta}}\, h(\hat{\theta})$ or $\widehat{\mathrm{Var}}_\theta\, h(\hat{\theta})$.
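As a concrete instance of the two-step procedure behind (10.1), consider estimating $h(\lambda) = e^{-\lambda} = P(X = 0)$ from an iid Poisson($\lambda$) sample. For the Poisson log likelihood, $-\frac{\partial^2}{\partial\lambda^2} \log L(\lambda|\mathbf{x}) = \sum_i x_i/\lambda^2$, which at $\hat{\lambda} = \bar{x}$ gives observed information $n/\bar{x}$. The sketch below is an added illustration, not from the text (it assumes NumPy, and the model, the choice of $h$, and the parameter values are my own):

```python
import numpy as np

rng = np.random.default_rng(2)
lam_true, n = 3.0, 500
x = rng.poisson(lam_true, size=n)

lam_hat = x.mean()                # MLE of lambda for an iid Poisson sample
obs_info = x.sum() / lam_hat**2   # observed information -d^2/dlam^2 log L at lam_hat (= n / xbar)

# Step 1 of (10.1): Var(h(lam_hat)) ~ h'(lam)^2 / I_n(lam); step 2: plug in lam_hat.
h_hat = np.exp(-lam_hat)          # point estimate of P(X = 0)
h_prime = -np.exp(-lam_hat)       # h'(lambda) evaluated at lam_hat
var_hat = h_prime**2 / obs_info
print(f"h(lam_hat) = {h_hat:.4f}, approximate standard error = {np.sqrt(var_hat):.4f}")
```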