9 Properties of point estimators and finding them

9.1 Introduction

We consider several properties of estimators in this chapter, in particular efficiency, consistency and sufficiency. An estimator θ̂_n is consistent if it converges to θ in a suitable sense as n → ∞. An estimator θ̂ for θ is sufficient if it contains all the information that we can extract from the random sample to estimate θ. If we have a sufficient statistic, then the Rao-Blackwell theorem gives a procedure for finding the unbiased estimator with the smallest variance. Until now we have been pulling our estimators out of the air. At the end of this chapter we consider two methods for finding estimators - the method of moments and maximum likelihood estimation.

9.2 Relative Efficiency

If we want to compare two unbiased estimators for θ, then we should compare their variances. The estimator with the smaller variance is better: since both are unbiased, the one with smaller variance has a sampling distribution that is more concentrated around θ. If we want to quantify this we can look at the ratio of the variances. The relative efficiency of θ̂_1 with respect to θ̂_2 is defined to be

$$\mathrm{eff}(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{var}(\hat\theta_2)}{\mathrm{var}(\hat\theta_1)} \qquad (1)$$

If this is greater than 1, the estimator θ̂_1 is better than the estimator θ̂_2.

Example: Suppose that Y is uniformly distributed on [0, θ]. (This is the population distribution.) The parameter θ is unknown. Let Y_1, Y_2, ···, Y_n be a random sample. (So the Y_i are independent and each is uniform on [0, θ].) We compare two possible unbiased estimators for θ. The mean of Y is just θ/2. So let

$$\hat\theta_1 = 2\bar{Y} \qquad (2)$$

Then θ̂_1 is an unbiased estimator for θ. The variance of Y is σ² = θ²/12. So the variance of θ̂_1 is

$$\mathrm{var}(\hat\theta_1) = \frac{\theta^2}{3n} \qquad (3)$$

Now define

$$Y^{(n)} = \max\{Y_1, Y_2, \cdots, Y_n\} \qquad (4)$$

and

$$\hat\theta_2 = \frac{n+1}{n}\, Y^{(n)} \qquad (5)$$

In the homework we saw that θ̂_2 is an unbiased estimator. We computed the variance of Y^(n) in the homework and found

$$\mathrm{var}(Y^{(n)}) = \frac{n}{(n+2)(n+1)^2}\,\theta^2 \qquad (6)$$

So

$$\mathrm{var}(\hat\theta_2) = \frac{1}{(n+2)\,n}\,\theta^2 \qquad (7)$$

So the relative efficiency of θ̂_1 with respect to θ̂_2 is

$$\mathrm{eff}(\hat\theta_1, \hat\theta_2) = \frac{3}{n+2} \qquad (8)$$

We see that θ̂_2 is a much better estimator than θ̂_1: its variance is smaller by a factor of (n + 2)/3, which grows with the sample size.
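This gap is easy to see numerically. Below is a small Monte Carlo sketch in Python (my own illustration, not part of the original notes); the values θ = 5, n = 20 and the number of replications are arbitrary choices. For n = 20 the simulated relative efficiency should land near 3/22 ≈ 0.14.

# Monte Carlo comparison of the two unbiased estimators for theta when Y ~ Uniform(0, theta):
#   theta1_hat = 2 * Ybar          (based on the sample mean)
#   theta2_hat = (n+1)/n * Y_(n)   (based on the sample maximum)
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 20, 200_000

samples = rng.uniform(0.0, theta, size=(reps, n))
theta1 = 2.0 * samples.mean(axis=1)           # 2 * sample mean
theta2 = (n + 1) / n * samples.max(axis=1)    # rescaled sample maximum

print("var(theta1) ~", theta1.var(), " theory:", theta**2 / (3 * n))
print("var(theta2) ~", theta2.var(), " theory:", theta**2 / (n * (n + 2)))
print("relative efficiency ~", theta2.var() / theta1.var(), " theory:", 3 / (n + 2))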

9.3 Consistency

We want to consider the behavior of estimators as the sample size goes to infinity. So we will denote the estimator by θ̂_n to make the dependence on the sample size explicit. We have seen before that for our estimators of the population mean, population proportion, or population variance, the variance of the estimator goes to zero as the sample size goes to infinity. Since these are unbiased estimators, the mean of θ̂_n is θ. So as n → ∞, the distribution of θ̂_n becomes more and more concentrated around θ. Does this mean that θ̂_n is converging to θ in some sense? There are various senses in which a sequence of random variables can converge. In this section we will define one notion of convergence and show that if the variance of an unbiased estimator goes to zero as n → ∞, then θ̂_n converges to θ in this sense. When this happens we say the estimator is consistent.

Definition 1. A sequence of random variables X_n is said to converge to the random variable X in probability if for all ε > 0, we have

$$\lim_{n\to\infty} P(|X_n - X| \ge \varepsilon) = 0 \qquad (9)$$

Definition 2. A point estimator θ̂_n for θ is consistent if θ̂_n converges to θ in probability.

Remark: Note that a constant is a RV. In the last definition we are thinking of the constant θ as a random variable. The definition is a bit abstract, so we look at what it says in the context of an example. We flip a coin n times and let p̂_n be the number of heads we get divided by n. This is a sample proportion which estimates p, the probability that the coin comes up heads. Suppose p = 1/2. If p̂_n converges in probability to 1/2, then P(0.49 ≤ p̂_n ≤ 0.51) converges to 1 as the number of flips goes to infinity, P(0.499 ≤ p̂_n ≤ 0.501) converges to 1 as the number of flips goes to infinity, and so on.
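Here is a quick simulation sketch of this coin-flip example (my own illustration; the fair coin, the window 0.49 ≤ p̂_n ≤ 0.51, and the number of repetitions are assumptions made for the demo). It estimates P(0.49 ≤ p̂_n ≤ 0.51) for several values of n.

# Estimate P(0.49 <= p_hat_n <= 0.51) by repeating the n-flip experiment many times.
import numpy as np

rng = np.random.default_rng(1)
p, reps = 0.5, 50_000
for n in (100, 1_000, 10_000, 100_000):
    p_hat = rng.binomial(n, p, size=reps) / n            # reps independent sample proportions
    inside = np.mean((p_hat >= 0.49) & (p_hat <= 0.51))  # fraction landing in the window
    print(n, inside)                                     # approaches 1 as n grows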

Theorem 1. If θ̂_n is an unbiased estimator for θ and σ_{θ̂_n} → 0 as n → ∞, then θ̂_n is consistent.

Proof: This follows from Chebyshev's inequality: for any ε > 0, P(|θ̂_n − θ| ≥ ε) ≤ σ²_{θ̂_n}/ε², which goes to 0 as n → ∞.

Corollary 1. The following estimators are consistent

• The sample mean Ȳ as an estimator for the population mean µ.

• The difference of two sample means Ȳ_1 − Ȳ_2 drawn independently from two different populations as an estimator for the difference of the population means µ_1 − µ_2, if both sample sizes go to infinity.

• The sample proportion p̂ as an estimator for the population proportion p.

• The difference of two sample proportions p̂_1 − p̂_2 drawn independently from two different populations as an estimator for the difference of the population proportions p_1 − p_2, if both sample sizes go to infinity.

• The sample variance S² as an estimator for the population variance σ².

The corollary says that S_n² converges in probability to σ². Does this imply that S_n converges in probability to σ? If so, S_n would be a consistent estimator for σ. The following theorem says it does.

Theorem 2. Suppose that the estimator θ̂_n converges in probability to the parameter θ and the estimator β̂_n converges in probability to the parameter β. Then

1. θ̂_n + β̂_n converges in probability to θ + β.

2. θ̂_n × β̂_n converges in probability to θ × β.

3. If β ≠ 0, θ̂_n/β̂_n converges in probability to θ/β.

4. If g(x) is a continuous function, then g(θ̂_n) converges in probability to g(θ).

Proof: We just prove the first statement. Fix an ε > 0. We must show

$$\lim_{n\to\infty} P(|\hat\theta_n + \hat\beta_n - \theta - \beta| \le \varepsilon) = 1$$

Define events by

$$E_n = \{|\hat\theta_n - \theta| \le \varepsilon/2\}, \qquad F_n = \{|\hat\beta_n - \beta| \le \varepsilon/2\}, \qquad G_n = \{|\hat\theta_n + \hat\beta_n - \theta - \beta| \le \varepsilon\}$$

We know that as n → ∞, P(E_n) and P(F_n) both converge to 1. The triangle inequality shows that E_n ∩ F_n ⊂ G_n. So if we can show P(E_n ∩ F_n) converges to 1, then P(G_n) converges to 1. We have

$$P(E_n \cap F_n) = 1 - P(E_n^c \cup F_n^c) \ge 1 - P(E_n^c) - P(F_n^c) \to 1$$

Note that since g(x) = √x is a continuous function and S² is a consistent estimator for σ², the last statement in the theorem implies S is a consistent estimator for σ.

End of lecture on Tues, 2/13
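A quick simulation sketch of the preceding point (my own illustration, in Python): since S_n² is consistent for σ², S_n should settle down near σ as n grows. The exponential population with σ = 2 is an arbitrary choice.

# Watch the sample standard deviation S_n approach sigma as the sample size grows.
import numpy as np

rng = np.random.default_rng(2)
sigma = 2.0
for n in (10, 100, 1_000, 10_000, 100_000):
    y = rng.exponential(scale=sigma, size=n)  # exponential with mean 2 has sd = 2
    print(n, round(y.std(ddof=1), 4))         # S_n, drifting toward sigma = 2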

Our first application of this theorem is to show that for a possibly biased estimator, if the variance goes to zero and the bias goes to zero, then the estimator is consistent. Recall that the bias is defined to be B(θ̂_n) = E[θ̂_n] − θ.

Theorem 3. If θ̂_n is an estimator for θ (possibly biased) such that σ_{θ̂_n} → 0 as n → ∞ and B(θ̂_n) → 0 as n → ∞, then θ̂_n is consistent.

Proof: We write the estimator as

$$\hat\theta_n = (\hat\theta_n - B(\hat\theta_n)) + B(\hat\theta_n) \qquad (10)$$

Note that the bias B(θ̂_n) is a constant, i.e., not random. We can think of θ̂_n − B(θ̂_n) as another estimator for θ. It has the same variance as θ̂_n and so its variance goes to zero. Also, it is unbiased. So by our previous theorem θ̂_n − B(θ̂_n) converges to θ in probability. The sequence of constants B(θ̂_n) converges to 0, so if we think of them as RV's they converge to 0 in probability. So by part (1) of the previous theorem θ̂_n converges to θ in probability.

Application: We have already seen that S_n² is a consistent estimator for σ². This followed from the fact that the variance of S_n² goes to zero. Note that we did not actually compute the variance of S_n². We illustrate the application of the previous proposition by giving another proof that S_n² is a consistent estimator. Recall

$$S_n^2 = \frac{n}{n-1}\left[\frac{1}{n}\sum_{i=1}^n Y_i^2 - \bar{Y}_n^2\right] \qquad (11)$$

The proof that this is a consistent estimator using the previous proposition was done in class, but is not included here. The notion of convergence in probability is one of several senses in which a sequence of RV's can converge. Another important one is the following.

Definition 3. Let X_n be a sequence of RV's and X a RV. Let F_{X_n}(t) and F_X(t) be their CDF's. We say X_n converges to X in distribution if F_{X_n}(t) converges to F_X(t) for all t at which F_X is continuous.

Proposition 1. If X_n converges to X in distribution and F_X is continuous at a and b, then P(a ≤ X_n ≤ b) converges to P(a ≤ X ≤ b) as n → ∞.

Partial proof:

$$P(a < X_n \le b) = F_{X_n}(b) - F_{X_n}(a) \to F_X(b) - F_X(a) = P(a < X \le b) \qquad (12)$$

This is not quite the statement in the proposition since we have a < X_n rather than a ≤ X_n. For X this does not matter since the hypothesis that F_X is continuous at a implies P(X = a) = 0. But P(X_n = a) can be nonzero. To finish the proof we would need to prove that this probability goes to 0 as n → ∞. Note that the convergence in the central limit theorem is convergence in distribution.

Theorem 4. If Xn converges in probability to X, then Xn converges to X in distribution.

Proof: Let t be a point where F_X is continuous. We must show lim_{n→∞} F_{X_n}(t) = F_X(t). Let ε > 0. Since F_X is continuous at t, there is a δ > 0 such that

$$F_X(t + \delta) < F_X(t) + \varepsilon/2, \qquad F_X(t - \delta) > F_X(t) - \varepsilon/2 \qquad (13)$$

Since Xn converges in probability to X,

$$\lim_{n\to\infty} P(|X_n - X| \le \delta) = 1 \qquad (14)$$

So there is an N such that

$$n \ge N \;\Rightarrow\; P(|X_n - X| \le \delta) \ge 1 - \varepsilon/2 \qquad (15)$$

Now let n ≥ N. We will prove F_{X_n}(t) ≤ F_X(t) + ε and F_{X_n}(t) ≥ F_X(t) − ε. (This will finish the proof.) For the first inequality:

$$F_{X_n}(t) = P(X_n \le t) = P(X_n \le t,\, |X_n - X| \le \delta) + P(X_n \le t,\, |X_n - X| > \delta)$$

$$\le P(X_n \le t,\, |X_n - X| \le \delta) + P(|X_n - X| > \delta)$$

For the first term we note that if X_n ≤ t and |X_n − X| ≤ δ then X ≤ t + δ. The second term above is bounded by ε/2. So

$$F_{X_n}(t) \le P(X \le t + \delta) + \varepsilon/2 = F_X(t + \delta) + \varepsilon/2$$

$$\le F_X(t) + \varepsilon/2 + \varepsilon/2 = F_X(t) + \varepsilon$$

For the second inequality:

$$P(X_n > t) = P(X_n > t,\, |X_n - X| \le \delta) + P(X_n > t,\, |X_n - X| > \delta)$$

$$\le P(X_n > t,\, |X_n - X| \le \delta) + P(|X_n - X| > \delta)$$

The first term is bounded by P(X > t − δ). The second term is bounded by ε/2. So

$$P(X_n > t) \le P(X > t - \delta) + \varepsilon/2$$

So taking 1 minus the above,

$$F_{X_n}(t) \ge F_X(t - \delta) - \varepsilon/2 \ge F_X(t) - \varepsilon$$

An obvious question is whether the converse is true. It is not. In fact, the converse doesn't really make sense in general: X_n can converge to X in distribution even if all the RV's are defined on different probability spaces, while the definition of convergence in probability requires that they all be defined on the same probability space. We do have the following partial converse.

Theorem 5. If Xn is a sequence of RV’s that converges in distribution to a constant c, then Xn converges to c in probability.

Proof: The CDF F_c(t) of the "RV" c is 0 for t < c and 1 for t ≥ c. Let δ > 0. Then

$$F_{X_n}(c + \delta) - F_{X_n}(c - \delta) = P(c - \delta < X_n \le c + \delta) \qquad (16)$$

As n → ∞, the left side converges to 1 − 0 = 1, since F_c is continuous at c + δ and at c − δ. So the probability on the right converges to 1. Since δ > 0 was arbitrary, this shows X_n converges to c in probability.

Theorem 6. (Slutsky's theorem) Suppose that the sequence of random variables X_n converges to the random variable X in distribution, and the sequence of random variables Y_n converges in probability to the constant c. Then

1. Xn + Yn converges in distribution to X + c.

2. Xn × Yn converges in distribution to cX.

3. If c 6= 0, Xn/Yn converges in distribution to X/c.

The proof of this theorem is much harder than the previous theorem and we do not give it. We now turn to an important application of Slutsky's theorem. Recall that if the population is normal, then the statistic

$$T_n = \frac{\bar{Y} - \mu}{\sqrt{S_n^2/n}} \qquad (17)$$

has a t-distribution with n − 1 d.f. for any sample size. What if the population is not normal, but the sample size is large? Write T_n as a product:

$$T_n = \frac{\bar{Y} - \mu}{\sqrt{\sigma^2/n}} \cdot \sqrt{\frac{\sigma^2}{S_n^2}} = X_n Y_n, \qquad (18)$$

$$X_n = \frac{\bar{Y} - \mu}{\sqrt{\sigma^2/n}}, \qquad Y_n = \sqrt{\frac{\sigma^2}{S_n^2}} \qquad (19)$$

The central limit theorem says that Xn converges in distribution to a stan- dard normal. And we know that Yn converges in probability to 1. So by Slutsky’s theorem, Tn converges in distribution to a standard normal. This implies that

$$P(-z_{\alpha/2} \le T_n \le z_{\alpha/2}) \to 1 - \alpha \qquad (20)$$

So for large n,

$$P(-z_{\alpha/2} \le T_n \le z_{\alpha/2}) \approx 1 - \alpha \qquad (21)$$

which we can rewrite as

$$P\!\left(\mu \in \left[\bar{Y}_n - z_{\alpha/2}\frac{S}{\sqrt{n}},\; \bar{Y}_n + z_{\alpha/2}\frac{S}{\sqrt{n}}\right]\right) \approx 1 - \alpha \qquad (22)$$

This is the large sample confidence interval we derived before for µ. We can do a similar thing for the confidence interval for a population proportion for large samples. Let

$$U_n = \frac{\hat{p}_n - p}{\sqrt{\hat{p}_n(1 - \hat{p}_n)/n}} \qquad (23)$$

Rewrite this as

$$U_n = \frac{\hat{p}_n - p}{\sqrt{p(1 - p)/n}} \cdot \sqrt{\frac{p(1 - p)}{\hat{p}_n(1 - \hat{p}_n)}} \qquad (24)$$

The first term converges in distribution to a standard normal. We know that p̂_n converges in probability to p, so the factor √(p(1 − p)/(p̂_n(1 − p̂_n))) converges in probability to 1. So by Slutsky's theorem, U_n converges in distribution to a standard normal. So for large n,

$$P(-z_{\alpha/2} \le U_n \le z_{\alpha/2}) \approx 1 - \alpha \qquad (25)$$

We can rewrite this as

$$P\!\left(p \in \left[\hat{p}_n - z_{\alpha/2}\sqrt{\frac{\hat{p}_n(1 - \hat{p}_n)}{n}},\; \hat{p}_n + z_{\alpha/2}\sqrt{\frac{\hat{p}_n(1 - \hat{p}_n)}{n}}\right]\right) \approx 1 - \alpha \qquad (26)$$

This is the large sample confidence interval we derived before for p.
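As a sketch of how the interval (26) is used in practice, here is a short Python computation on simulated data (my own illustration; n, the true p, and α are arbitrary assumptions).

# Large-sample confidence interval for a population proportion p.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, p_true, alpha = 500, 0.3, 0.05
y = rng.binomial(1, p_true, size=n)          # simulated 0/1 sample

p_hat = y.mean()
z = norm.ppf(1 - alpha / 2)                  # z_{alpha/2}
half_width = z * np.sqrt(p_hat * (1 - p_hat) / n)
print("p_hat = %.4f, 95%% CI = (%.4f, %.4f)" % (p_hat, p_hat - half_width, p_hat + half_width))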

9.4 Sufficiency

Probability review - conditional probability

Recall that for two events A and B, the conditional probability of A given B is

$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad (27)$$

For discrete RV's this leads immediately to

Definition 4. If X and Y are discrete RV's, then the conditional pmf of X given Y is

$$f_{X|Y}(x|y) = P(X = x|Y = y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} \qquad (28)$$

For each value of Y this gives a conditional pmf for X. Note that if X and Y are independent, then f_{X|Y}(x|y) does not depend on y and is just f_X(x). We can also define conditional joint pmf's:

Definition 5. If X1, ··· ,Xn and Y are discrete RV’s, then the conditional joint pmf of X1, ··· ,Xn given Y is

$$f_{X_1,\cdots,X_n|Y}(x_1, \cdots, x_n|y) = P(X_1 = x_1, \cdots, X_n = x_n|Y = y) \qquad (29)$$

$$= \frac{f_{X_1,\cdots,X_n,Y}(x_1, \cdots, x_n, y)}{f_Y(y)} \qquad (30)$$

If X and Y are continuous RV's, then conditioning on Y = y is problematic since this event has probability zero. You have to define it by a limiting process. We do not go through this argument here. The result is the following definitions.

Definition 6. If X and Y are continuous RV’s, then the conditional pdf of X given Y is

$$f_{X|Y}(x|y) = \frac{f_{X,Y}(x, y)}{f_Y(y)} \qquad (31)$$

Definition 7. If X1, ··· ,Xn and Y are continuous RV’s, then the conditional joint pdf of X1, ··· ,Xn given Y is

$$f_{X_1,\cdots,X_n|Y}(x_1, \cdots, x_n|y) = \frac{f_{X_1,\cdots,X_n,Y}(x_1, \cdots, x_n, y)}{f_Y(y)} \qquad (32)$$

The definitions in the continuous case look exactly the same as in the discrete case, but the f's are now probability densities, so the definitions are really quite different. We now end the probability review and introduce the idea of a sufficient statistic. Informally, a statistic U is sufficient for the population parameter θ if the statistic contains all the information that the random sample has to tell us about the parameter. In other words, if statistician A is given all the values in the random sample and statistician B is only given the value of U for the random sample, then statistician A cannot do a better job of estimating θ than statistician B. The formal definition is as follows.

Definition 8. Let Y1,Y2, ··· ,Yn be a random sample from a population with an unknown parameter θ. Let U be a statistic, i.e., some function U = g(Y1,Y2, ··· ,Yn) of the random sample. We say that U is a sufficient statistic for θ if the conditional distribution of Y1,Y2, ··· ,Yn given U does not depend on θ.

Example: We look at an example to see how this definition works. Suppose we are interested in a population proportion and the unknown parameter is the usual p. The estimator we have used before is p̂, the sample proportion. Recall that this is just the number of successes in the sample divided by n.

If we let Y_i be 1 if the i-th member of the sample is a success, and Y_i = 0 if it is a failure, then p̂ is equal to Ȳ. We will show that p̂ is a sufficient statistic for p. It is a little easier to work with

$$U = \sum_{i=1}^n Y_i \qquad (34)$$

We will show that U is a sufficient statistic. Since p̂ = U/n, this will show that p̂ is a sufficient statistic. The distribution of U is just the binomial distribution:

$$f_U(u) = \binom{n}{u} p^u (1 - p)^{n-u} \qquad (35)$$

The distribution of Y1,Y2, ··· ,Yn is relatively simple. We get a factor of p for every Yi that is 1 and a factor of 1 − p for every Yi = 0. So

$$f_{Y_1,\cdots,Y_n}(y_1, \cdots, y_n) = p^{\sum_i y_i} (1 - p)^{\sum_i (1 - y_i)} \qquad (36)$$

where the sums on i run from 1 to n. Finally, we need the joint distribution of Y_1, ···, Y_n, U:

$$f_{Y_1,\cdots,Y_n,U}(y_1, \cdots, y_n, u) = \begin{cases} p^u (1 - p)^{n-u} & \text{if } \sum_i y_i = u \\ 0 & \text{otherwise} \end{cases} \qquad (37)$$

So the conditional joint pmf of Y_1, ···, Y_n given U is

$$f_{Y_1,\cdots,Y_n|U}(y_1, \cdots, y_n|u) = \begin{cases} \dfrac{1}{\binom{n}{u}} & \text{if } \sum_i y_i = u \\ 0 & \text{otherwise} \end{cases} \qquad (38)$$

This does not depend on p, so U is a sufficient statistic for p.
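The claim that (38) does not depend on p can be checked by brute-force enumeration when n is small. The following sketch (my own check; n = 4 and u = 2 are arbitrary choices) computes the conditional pmf for several values of p and compares it with 1/C(n, u).

# Enumerate all 2^n outcomes, condition on U = sum(Y_i) = u, and check that the
# conditional probabilities equal 1 / C(n, u) no matter what p is.
from itertools import product
from math import comb

n, u = 4, 2
for p in (0.2, 0.5, 0.8):
    joint = {y: p**sum(y) * (1 - p)**(n - sum(y)) for y in product((0, 1), repeat=n)}
    p_u = sum(prob for y, prob in joint.items() if sum(y) == u)          # P(U = u)
    cond = [prob / p_u for y, prob in joint.items() if sum(y) == u]      # conditional pmf
    print(p, sorted(set(round(c, 12) for c in cond)), "vs", round(1 / comb(n, u), 12))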

We continue the example by giving an example of a statistic that is not sufficient. We still consider a population proportion, but now let

$$U = \sum_{i=1}^{n-1} Y_i \qquad (39)$$

Note that the sum only runs up to n − 1. Intuitively we expect this is not a sufficient statistic since it does not incorporate Y_n. The distribution of U

is binomial, but with n − 1 trials. Note that Y_n and U are independent. So the joint distribution of Y_1, ···, Y_n, U is

$$f_{Y_1,\cdots,Y_n,U}(y_1, \cdots, y_n, u) = f_{Y_1,\cdots,Y_{n-1},U}(y_1, \cdots, y_{n-1}, u)\, f_{Y_n}(y_n) \qquad (40)$$

$$= \begin{cases} p^u (1 - p)^{n-1-u}\, p^{y_n} (1 - p)^{1-y_n} & \text{if } \sum_{i=1}^{n-1} y_i = u \\ 0 & \text{otherwise} \end{cases} \qquad (41)$$

So the conditional joint pmf of Y_1, ···, Y_n given U is

$$f_{Y_1,\cdots,Y_n|U}(y_1, \cdots, y_n|u) = \begin{cases} \dfrac{1}{\binom{n-1}{u}}\, p^{y_n} (1 - p)^{1-y_n} & \text{if } \sum_{i=1}^{n-1} y_i = u \\ 0 & \text{otherwise} \end{cases} \qquad (42)$$

This does depend on p, so this U is not a sufficient statistic for p.

The definition of sufficient statistic is not so easy to check, and it does not give us a method for finding sufficient statistics. Luckily there is a theorem which gives a much easier criterion for sufficiency and tells you how to find sufficient statistics. First we introduce some notation and terminology. We are usually interested in a random variable Y which we think of as the population. It has a pmf or pdf f(y). This pmf or pdf depends on one or more unknown parameters. Let θ or θ_1, ···, θ_k be these unknown parameters. We make the dependence of f(y) on them explicit by writing it as f(y|θ) or f(y|θ_1, ···, θ_k). To simplify the notation we will often just write f(y|θ) with the understanding that θ may be a single unknown parameter, or may be shorthand for several unknown parameters θ_1, ···, θ_k. If we draw a random sample Y_1, Y_2, ···, Y_n from our population, then the joint pmf or pdf of the random sample is ∏_{i=1}^n f(y_i|θ). We give this function a name.

Definition 9. The likelihood function is

$$L(y_1, \cdots, y_n|\theta) = \prod_{i=1}^n f(y_i|\theta) \qquad (43)$$

Theorem 7. (Factorization criterion) A statistic U = U(Y_1, Y_2, ···, Y_n) is a sufficient statistic for the population parameter θ if and only if the likelihood function factors into a product of two non-negative functions:

$$L(y_1, \cdots, y_n|\theta) = g(u, \theta)\, h(y_1, y_2, \cdots, y_n) \qquad (44)$$

where g only depends on u and θ and h does not depend on θ.

Later we will give a proof of this theorem for the case that the random variables are discrete. This theorem does more than help you check that a given statistic is sufficient: it helps you find sufficient statistics. We illustrate this with some examples.

Example: We start with the binomial example we have already looked at. So

$$L(y_1, \cdots, y_n|p) = \prod_{i=1}^n p^{y_i} (1 - p)^{1-y_i} \qquad (45)$$

$$= \left(\frac{p}{1 - p}\right)^{\sum_i y_i} (1 - p)^n \qquad (46)$$

where the sum on i is from 1 to n. So if we define U = ∑_{i=1}^n Y_i, then the above is of the form in the theorem with

$$g(u, p) = \left(\frac{p}{1 - p}\right)^u (1 - p)^n, \qquad (47)$$

$$h(y_1, \cdots, y_n) = 1 \qquad (48)$$

This U is a sufficient statistic. This of course implies U/n = p̂ is also a sufficient statistic.

End of lecture on Tues, 2/27

Remark: If U is a statistic and V = f(U) is a function of U that is a sufficient statistic, then U is a sufficient statistic. This follows immediately from the factorization theorem. Since V is sufficient we can factor the likelihood function as g(v, θ)h(y_1, ···, y_n). But then we can express v as a function of u, so this becomes g(v(u), θ)h(y_1, ···, y_n), and we can think of g as a function of u and θ only.

Caution: If U is sufficient and V = f(U), then V need not be sufficient. (It will be if f is 1-1.)

Example: Suppose the population has an exponential distribution with mean β. So

$$f(y|\beta) = \frac{1}{\beta}\, e^{-y/\beta}\, 1(y \ge 0) \qquad (49)$$

Then

$$L(y_1, \cdots, y_n|\beta) = \beta^{-n} \exp\!\left(-\frac{1}{\beta}\sum_{i=1}^n y_i\right) \prod_{i=1}^n 1(y_i \ge 0) \qquad (50)$$

We can take h(y_1, ···, y_n) = ∏_{i=1}^n 1(y_i ≥ 0), and let g(u, β) be the rest of the likelihood function, where U = ∑_{i=1}^n Y_i. So U is a sufficient statistic. This implies that Ȳ is also a sufficient statistic.

The last example might make you wonder if the sample mean Ȳ is always a sufficient statistic for the population mean µ. The next example shows it is not, and in general it is not.

Example: Suppose the population has a uniform distribution on [0, θ] with θ unknown. Pay careful attention to the range here. The pdf of Y is not just 1/θ: it is 1/θ when 0 ≤ y ≤ θ and 0 otherwise. We let 1(···) denote the function that is 1 when ··· is true and 0 when ··· is false. So

$$f(y|\theta) = \frac{1}{\theta}\, 1(0 \le y \le \theta) = \frac{1}{\theta}\, 1(0 \le y)\, 1(y \le \theta) \qquad (51)$$

So the likelihood function is

$$L(y_1, \cdots, y_n|\theta) = \theta^{-n} \prod_i 1(0 \le y_i) \prod_i 1(y_i \le \theta) \qquad (52)$$

We can put the factor θ^{-n} into g(θ, u) whatever U is, and we can put the factors 1(0 ≤ y_i) into h(y_1, ···, y_n). The function ∏_i 1(y_i ≤ θ) depends on θ, but there is no way to write it as a function of θ and ȳ alone. So Ȳ is not a sufficient statistic: if we only know ∑_i y_i, there is no way we can determine if all the y_i are ≤ θ. The function ∏_i 1(y_i ≤ θ) is 1 if and only if all the y_i ≤ θ. So

$$\prod_i 1(y_i \le \theta) = 1(\max\{y_1, \cdots, y_n\} \le \theta) \qquad (53)$$

But the right-hand side is a function of U = max{Y_1, ···, Y_n} and θ, so we see that this max is a sufficient statistic.

Example: Suppose the population pdf is gamma with parameters α and β. So

$$f(y|\alpha, \beta) = \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha \Gamma(\alpha)} \qquad (54)$$

for y ≥ 0. The likelihood function is

$$L(y_1, \cdots, y_n|\alpha, \beta) = \left[\prod_i y_i\right]^{\alpha-1} \exp\!\left(-\sum_i y_i/\beta\right) \left[\beta^\alpha \Gamma(\alpha)\right]^{-n} \qquad (55)$$

If α is known and β is unknown, then ∑_i Y_i is a sufficient statistic for β. If β is known and α is unknown, then ∏_i Y_i is a sufficient statistic for α. But if both α and β are unknown, then there is no single sufficient statistic. When the population has more than one unknown parameter, it is typically not possible to find a single sufficient statistic.

Definition 10. Let Y_1, Y_2, ···, Y_n be a random sample from a population with unknown parameters θ_1 and θ_2. Let U_1 and U_2 be two statistics, i.e., functions U_i = g_i(Y_1, Y_2, ···, Y_n) of the random sample. We say that U_1, U_2 are sufficient statistics for θ_1, θ_2 if the conditional distribution of Y_1, Y_2, ···, Y_n given U_1, U_2 does not depend on θ_1 or θ_2.

The factorization theorem for two parameters and two statistics is

Theorem 8. (Factorization criterion) Statistics U_1 and U_2 are sufficient statistics for the population parameters θ_1, θ_2 if and only if the likelihood function factors into a product of two non-negative functions:

$$L(y_1, \cdots, y_n|\theta_1, \theta_2) = g(u_1, u_2, \theta_1, \theta_2)\, h(y_1, y_2, \cdots, y_n) \qquad (56)$$

where g only depends on u_1, u_2 and θ_1, θ_2 and h does not depend on θ_1 or θ_2.

Example: Suppose the population pdf is gamma with parameters α and β. So

$$f(y|\alpha, \beta) = \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha \Gamma(\alpha)} \qquad (57)$$

for y ≥ 0. The likelihood function is

$$L(y_1, \cdots, y_n|\alpha, \beta) = \left[\prod_i y_i\right]^{\alpha-1} \exp\!\left(-\sum_i y_i/\beta\right) \left[\beta^\alpha \Gamma(\alpha)\right]^{-n} \qquad (58)$$

So

$$U_1 = \prod_{i=1}^n Y_i, \qquad U_2 = \sum_{i=1}^n Y_i \qquad (59)$$

are sufficient statistics for α, β.

Remark: The number of sufficient statistics can be greater than the number of unknown parameters. For example, consider the examples above where

the population had a gamma distribution. Suppose α is known and β is unknown. We saw that ∑_i Y_i is a sufficient statistic. But the two statistics ∑_i Y_i and ∏_i Y_i are also sufficient statistics. As an extreme example, we can think of the entire random sample as n statistics. This n-tuple of statistics will be sufficient no matter what the population distribution is. Of course, we would like to have as few sufficient statistics as possible.

Recall that a sufficient statistic is a statistic that contains all the information that the random sample has to offer about θ. Intuitively, a minimal sufficient statistic is a sufficient statistic that does not contain any superfluous information.

Definition 11. A sufficient statistic U = U(Y_1, ···, Y_n) is a minimal sufficient statistic if for any sufficient statistic T = T(Y_1, ···, Y_n), U is a function of T. (This means that T(x_1, ···, x_n) = T(y_1, ···, y_n) implies U(x_1, ···, x_n) = U(y_1, ···, y_n).)

In all the examples we have considered we have been able to find a sufficient statistic. This is misleading. If we have one unknown parameter it is not always possible to find a single sufficient statistic. An example is the Weibull distribution:

$$f(y|\theta) = \theta y^{\theta-1} \exp(-y^\theta), \qquad y \ge 0 \qquad (60)$$

The likelihood function is

$$L(y_1, \cdots, y_n|\theta) = \prod_{i=1}^n \left[\theta y_i^{\theta-1} \exp(-y_i^\theta)\right] \qquad (61)$$

$$= \theta^n \left[\prod_{i=1}^n y_i\right]^{\theta-1} \exp\!\left(-\sum_i y_i^\theta\right) \qquad (62)$$

You cannot find a u(y_1, ···, y_n) that lets you write this in the form called for in the factorization theorem.

9.5 Minimum variance unbiased estimators (MVUE)

Suppose we restrict consideration to unbiased estimators for θ. In this case the mean square error (MSE) is equal to the variance of the estimator. So ideally we would like to find the unbiased estimator with the smallest variance. Note that the variance will usually depend on the population parameters. Ideally we would like to find an unbiased estimator of θ that has the

smallest variance for all θ. Such an estimator is called a minimum variance unbiased estimator (MVUE). We start with a definition. The next definition is not intuitive at all. Luckily, it holds for almost all the statistics and population distributions we typically consider.

Definition 12. Let U = U(Y_1, ···, Y_n) be a statistic for θ. U is complete if for every function g such that E_θ[g(U)] = 0 for all θ, we have g(U) = 0. Note completeness is not just a property of the statistic. It also depends on the family of population distributions that θ parameterizes.

Theorem 9. Let U be a complete, sufficient statistic for θ. If E[U] = θ, then U is the unique MVUE of θ. More generally, let T = g(U) be a function of U. Let τ(θ) = Eθ[T ] = Eθ[g(U)]. Then T is the unique MVUE of τ(θ).

Remark: The function g in the theorem can depend on n. For example, if U = ∑_{i=1}^n Y_i, then Ȳ = U/n is a legitimate function of U.

The proof of this theorem is way beyond the scope of this course. It is a very useful theorem since it gives a method for possibly finding the MVUE for θ. First use the factorization theorem to find a sufficient statistic U. Then find a function of U whose expected value is θ. You may not be able to do all this, but if you succeed you will have the MVUE. We illustrate this approach with some examples.

Example: Consider an example we have looked at extensively - a population proportion. So the likelihood function is

$$L(y_1, \cdots, y_n|p) = \prod_{i=1}^n p^{y_i} (1 - p)^{1-y_i} \qquad (63)$$

$$= \left(\frac{p}{1 - p}\right)^{\sum_i y_i} (1 - p)^n \qquad (64)$$

where the sum on i is from 1 to n. We saw before that U = ∑_{i=1}^n Y_i is a sufficient statistic. We have E[U] = np. Let p̂ denote U/n. So p̂ is the usual sample proportion. It is an unbiased estimator of p and it is a function of the sufficient statistic U. Assuming U is complete, the theorem says p̂ is the MVUE for p.

Example: Suppose the population has an exponential distribution with mean β. So

$$L(y_1, \cdots, y_n|\beta) = \beta^{-n} \exp\!\left(-\frac{1}{\beta}\sum_{i=1}^n y_i\right) \qquad (65)$$

We saw before that U = ∑_{i=1}^n Y_i is a sufficient statistic. Just as in the previous example, Ȳ = U/n is an unbiased estimator which is a function of this sufficient statistic, so Ȳ is the MVUE of β. Recall that the variance of the exponential is σ² = β². Suppose we want a MVUE for σ². Note that (Ȳ)² is a "reasonable" estimator for σ², but it is biased. In fact, we can compute its expected value. We know var(Ȳ) = β²/n and E[Ȳ] = β. So

$$E[(\bar{Y})^2] = \mathrm{var}(\bar{Y}) + (E[\bar{Y}])^2 = \beta^2/n + \beta^2 = \frac{n+1}{n}\,\beta^2 = \frac{n+1}{n}\,\sigma^2 \qquad (66)$$

So the MVUE for σ² is

$$\hat\sigma^2 = \frac{n}{n+1}\,(\bar{Y})^2 \qquad (67)$$

Example: Consider a population with a normal distribution with mean µ and variance σ². You will show in the homework that Ȳ and S² are sufficient statistics for µ and σ². Since they are unbiased estimators for µ and σ², they are MVUEs for them.

Example: Consider a population with distribution

$$f(y|\theta) = \frac{3y^2}{\theta^3}\, 1(0 \le y \le \theta) \qquad (68)$$

Show that U = max{Y_1, ···, Y_n} is a sufficient statistic. So we need to find a function of U whose expected value is θ. We start by finding the distribution of U:

$$f_U(u|\theta) = \frac{3n\, u^{3n-1}}{\theta^{3n}}\, 1(0 \le u \le \theta) \qquad (69)$$

From this we compute that

$$E[U] = \frac{3n}{3n+1}\,\theta \qquad (70)$$

So if we let

$$\hat\theta = \frac{3n+1}{3n}\, U \qquad (71)$$

then θ̂ is a function of the sufficient statistic U which is an unbiased estimator of θ. So θ̂ is the MVUE of θ.
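A quick simulation sketch (my own check, not part of the notes) that θ̂ in (71) is unbiased. The values θ = 2, n = 15 and the number of replications are arbitrary; the sampling uses the inverse of the CDF F(y) = y³/θ³.

# Check E[theta_hat] = theta for theta_hat = (3n+1)/(3n) * max(Y_1,...,Y_n),
# where Y has density 3y^2/theta^3 on [0, theta].
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.0, 15, 200_000

y = theta * rng.uniform(size=(reps, n)) ** (1.0 / 3.0)   # inverse-CDF sampling: Y = theta * U^(1/3)
theta_hat = (3 * n + 1) / (3 * n) * y.max(axis=1)
print("E[theta_hat] ~", theta_hat.mean(), " target:", theta)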

We end this section with part of the theory behind the theorem we stated above, which is of interest in its own right.

Theorem 10. (Rao-Blackwell theorem) Let U be a sufficient statistic for θ. Suppose T is an unbiased estimator of τ(θ). Let φ(U) = E[T|U]. Then φ(U) is an unbiased estimator of τ(θ) and

$$\mathrm{Var}_\theta(\phi(U)) \le \mathrm{Var}_\theta(T) \qquad (72)$$

for all θ.

If we have a sufficient statistic U and an unbiased estimator T, then we can get an unbiased estimator at least as good as T by taking the conditional expectation E[T|U]. However, actually computing this conditional expectation may be very difficult. The Rao-Blackwell theorem involves the conditional expectation E[T|U]. We need to review some probability to understand what this is.

GAP GAP GAP GAP

End of lecture on Tues, 3/13

9.6 Method of moments

Let Y be the RV associated with the population. Recall that the k-th moment of Y is E[Y^k]. We denote it by µ_k. It will be a function of the population parameters. For a random sample Y_1, Y_2, ···, Y_n the sample moments are defined by

$$m_k = \frac{1}{n}\sum_{i=1}^n Y_i^k \qquad (73)$$

Note that m_1 is the sample mean Ȳ. The sample moments are of course functions of the random sample. Suppose we want to estimate k population parameters θ_1, θ_2, ···, θ_k. The method of moments does this by setting the first k sample moments equal to the first k population moments, i.e., µ_i = m_i for i = 1, 2, ···, k. This gives a system of k equations in which we think of the random sample Y_1, Y_2, ···, Y_n as known and θ_1, ···, θ_k as the unknowns. We solve for θ_1, ···, θ_k as functions of Y_1, ···, Y_n.

If we just have one parameter to estimate, then the method becomes the following. Find the population mean as a function of θ; call it µ(θ). We set µ(θ) = Ȳ and then solve for θ as a function of Ȳ. Note that in the case of one unknown parameter, the method of moments will always give an estimator for θ that is a function of Ȳ. The method of moments typically produces consistent estimators, but they need not be unbiased, and when they are unbiased they need not have minimum variance.

Example: Suppose the population is uniformly distributed on [0, θ] with θ unknown. The population mean is µ = θ/2. So we set θ/2 = Ȳ and solve for θ. So our estimator is

$$\hat\theta = 2\bar{Y} \qquad (74)$$

Example: Suppose the population is uniformly distributed on [θ_1, θ_2]. We want to use the method of moments to find estimators θ̂_1 and θ̂_2 for the two unknown parameters. The mean of the population RV is

$$\mu_1 = \frac{1}{2}(\theta_1 + \theta_2) \qquad (75)$$

Its second moment is

$$\mu_2 = \int_{\theta_1}^{\theta_2} \frac{y^2}{\theta_2 - \theta_1}\, dy = \frac{1}{3}\,\frac{\theta_2^3 - \theta_1^3}{\theta_2 - \theta_1} = \frac{1}{3}\left(\theta_2^2 + \theta_1\theta_2 + \theta_1^2\right) \qquad (76)$$

So we have the two equations

$$m_1 = \frac{1}{2}(\hat\theta_2 + \hat\theta_1), \qquad (77)$$

$$m_2 = \frac{1}{3}\left(\hat\theta_2^2 + \hat\theta_1\hat\theta_2 + \hat\theta_1^2\right) \qquad (78)$$

We then solve them for θ̂_1 and θ̂_2 in terms of m_1 and m_2.

Example: Suppose the population pdf is gamma with parameters α and β. So

$$f(y|\alpha, \beta) = \frac{y^{\alpha-1} e^{-y/\beta}}{\beta^\alpha \Gamma(\alpha)} \qquad (79)$$

for y ≥ 0. The mean is µ = αβ and the variance is σ² = αβ². So

$$\mu_1 = \alpha\beta, \qquad \mu_2 = \alpha\beta^2 + \alpha^2\beta^2 \qquad (80)$$

So to find the method of moments estimators for α and β we must solve

$$m_1 = \hat\alpha\hat\beta, \qquad m_2 = \hat\alpha\hat\beta^2 + \hat\alpha^2\hat\beta^2 \qquad (81)$$

for α̂ and β̂. Using the first equation we can rewrite the second equation as

$$m_2 = m_1\hat\beta + m_1^2 \qquad (82)$$

So

$$\hat\beta = \frac{m_2 - m_1^2}{m_1} \qquad (83)$$

Then we get

$$\hat\alpha = \frac{m_1}{\hat\beta} = \frac{m_1^2}{m_2 - m_1^2} \qquad (84)$$

These are consistent estimators of α and β: the sample moments m_1 and m_2 are consistent estimators of µ_1 and µ_2, and α̂ and β̂ are continuous functions of (m_1, m_2) away from m_2 = m_1², so consistency follows from Theorem 2.

Consider the case where there is only one unknown parameter θ. The mean will be a function of θ, but we may not be able to compute this function (or its inverse) explicitly.

Example: Consider again the Weibull distribution:

$$f(y|\theta) = \theta y^{\theta-1} \exp(-y^\theta), \qquad y \ge 0 \qquad (85)$$

To apply the method of moments we need the mean,

$$\mu(\theta) = E[Y] = \int_0^\infty y\, \theta y^{\theta-1} \exp(-y^\theta)\, dy \qquad (86)$$

I put this in Wolfram Alpha, and it thought for a bit and then said "standard computation time exceeded." You can make the substitution u = y^θ, and the new integral can be expressed in terms of the gamma function: µ(θ) = Γ(1 + 1/θ). But then you need to invert this function of θ, which I don't know how to do explicitly. But you could still find the estimator numerically.
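Here is a sketch of that numerical route in Python, using µ(θ) = Γ(1 + 1/θ) from the substitution above. The simulated data (true θ = 0.8) and the bracketing interval handed to the root finder are illustrative assumptions.

# Method of moments for the Weibull f(y|theta) = theta * y^(theta-1) * exp(-y^theta):
# solve Gamma(1 + 1/theta) = ybar numerically for theta.
import numpy as np
from scipy.optimize import brentq
from scipy.special import gamma

rng = np.random.default_rng(5)
theta_true, n = 0.8, 2_000
y = rng.weibull(theta_true, size=n)     # numpy's standard Weibull has exactly this pdf

ybar = y.mean()
theta_hat = brentq(lambda t: gamma(1.0 + 1.0 / t) - ybar, 0.1, 1.5)   # root in the bracket (0.1, 1.5)
print("ybar =", round(ybar, 4), " theta_hat =", round(theta_hat, 4))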

9.7 Maximum likelihood estimation (MLE)

Recall that the likelihood function L(y_1, ···, y_n|θ) is the joint distribution of the random sample, which depends on the population parameter θ. (Here θ can be a single parameter or shorthand for θ_1, ···, θ_k.) We think of y_1, ···, y_n as fixed and maximize L(y_1, ···, y_n|θ) as a function of θ. This defines (implicitly) the maximizing value θ̂ as a function of y_1, ···, y_n; this is the maximum likelihood estimator.

Computational trick: Since the log function is monotonic, finding the θ that maximizes L(y_1, ···, y_n|θ) is equivalent to finding the θ that maximizes ln L(y_1, ···, y_n|θ). We typically find where the max occurs by differentiating with respect to the θ's and setting the derivative(s) to zero. Doing this for ln L instead of L is simpler since we have a sum to differentiate rather than a product.

Example: Suppose that the population has a Poisson distribution with unknown parameter λ. The likelihood function is

$$L(y_1, \cdots, y_n|\lambda) = \prod_{i=1}^n \left[e^{-\lambda}\, \frac{\lambda^{y_i}}{y_i!}\right] \qquad (88)$$

So

$$\ln L(y_1, \cdots, y_n|\lambda) = \sum_{i=1}^n \left[-\lambda + y_i \ln\lambda - \ln(y_i!)\right] \qquad (89)$$

Taking the derivative with respect to λ and setting it to 0 gives

$$-n + \sum_{i=1}^n \frac{y_i}{\lambda} = 0 \qquad (90)$$

Solving for λ we get the MLE

$$\hat\lambda = \frac{1}{n}\sum_{i=1}^n y_i = \bar{Y} \qquad (91)$$

Example: Suppose the population is uniform on [0, θ]. The likelihood function is

$$L(y_1, \cdots, y_n|\theta) = \theta^{-n} \prod_{i=1}^n 1(0 \le y_i \le \theta) \qquad (92)$$

We rewrite this as

$$L(y_1, \cdots, y_n|\theta) = \theta^{-n}\, 1(0 \le \max\{y_1, \cdots, y_n\} \le \theta) \qquad (93)$$

The y_i are fixed, so the max is fixed. We want to maximize this as a function of θ. Note that the likelihood function is zero when θ < max{y_1, ···, y_n}. So the maximum will occur in the range [max{y_1, ···, y_n}, ∞). Since θ^{-n} is a decreasing function of θ, the maximum occurs at the endpoint θ = max{y_1, ···, y_n}. So the MLE is

$$\hat\theta = \max\{Y_1, \cdots, Y_n\} \qquad (94)$$

End of lecture on Thurs, 3/15

Example: Consider again the Weibull distribution:

$$f(y|\theta) = \theta y^{\theta-1} \exp(-y^\theta), \qquad y \ge 0 \qquad (95)$$

The likelihood function is

$$L(y_1, \cdots, y_n|\theta) = \prod_{i=1}^n \left[\theta y_i^{\theta-1} \exp(-y_i^\theta)\right] \qquad (96)$$

$$= \theta^n \left[\prod_{i=1}^n y_i\right]^{\theta-1} \exp\!\left(-\sum_i y_i^\theta\right) \qquad (97)$$

So

$$\ln L(y_1, \cdots, y_n|\theta) = n\ln(\theta) + (\theta - 1)\sum_{i=1}^n \ln(y_i) - \sum_i y_i^\theta \qquad (98)$$

Taking the derivative with respect to θ and setting it to zero:

$$\frac{n}{\theta} + \sum_{i=1}^n \ln(y_i) - \sum_i y_i^\theta \ln(y_i) = 0 \qquad (99)$$

Now all we need to do is solve for θ. Good luck. But we can do it numerically given specific values for the sample y_1, ···, y_n.
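A sketch of that numerical solution of (99) in Python (my own illustration; the simulated data with true θ = 1.5 and the search bracket are assumptions). The left side of (99) is strictly decreasing in θ, so the root in the bracket is the MLE.

# Solve the score equation n/theta + sum(ln y_i) - sum(y_i^theta * ln y_i) = 0 numerically.
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(6)
theta_true, n = 1.5, 2_000
y = rng.weibull(theta_true, size=n)

def score(t):
    return n / t + np.log(y).sum() - (y**t * np.log(y)).sum()

theta_hat = brentq(score, 0.1, 10.0)
print("MLE theta_hat =", round(theta_hat, 4), " (true theta was", theta_true, ")")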

MLE and sufficient statistics

Suppose we have a sufficient statistic U for θ and we find the MLE. By the factorization theorem

$$L(y_1, \cdots, y_n|\theta) = g(u, \theta)\, h(y_1, \cdots, y_n) \qquad (100)$$

where h does not depend on θ. So finding the θ that maximizes L is the same as finding the θ that maximizes g(u, θ). But this will obviously give θ as a function of u. So we conclude that if there is a sufficient statistic, then the MLE will be a function of this statistic.

Invariance principle

Suppose that θ̂ is the MLE for the parameter θ. Let τ = t(θ) be a function of θ. Suppose we want to find the MLE for τ. Then we have to think of the likelihood function as a function of τ: L(y_1, ···, y_n|τ). We can do this if t(θ) is invertible, i.e., if t() is 1-1. Now we need to maximize L(y_1, ···, y_n|τ) as a function of τ. But we already know that as a function of θ the maximum occurs at θ̂. So as a function of τ it will occur at t(θ̂). This is called the invariance principle. It can be stated as

$$\widehat{t(\theta)} = t(\hat\theta) \qquad (101)$$

where both hats mean the MLE.

Now suppose we have two unknown parameters for the population distribution. So the likelihood function is L(y_1, ···, y_n|θ_1, θ_2). We find the MLE estimators θ̂_1 and θ̂_2 by maximizing L as a function of θ_1 and θ_2.

Example: Suppose that the population has a normal distribution with unknown mean µ and unknown variance σ². We seek maximum likelihood estimators for µ and σ². The likelihood function is

$$L(y_1, \cdots, y_n|\mu, \sigma^2) = \prod_{i=1}^n \left[\frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(y_i - \mu)^2}{2\sigma^2}\right)\right] \qquad (102)$$

So

$$\ln L(y_1, \cdots, y_n|\mu, \sigma^2) = -\frac{n}{2}\ln\sigma^2 - \frac{n}{2}\ln(2\pi) - \frac{1}{2\sigma^2}\sum_i (y_i - \mu)^2 \qquad (103)$$

Take the derivative with respect to µ and we find

$$\hat\mu = \frac{1}{n}\sum_{i=1}^n y_i = \bar{y} \qquad (104)$$

Take the derivative with respect to σ² and we find

$$-\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^n (y_i - \mu)^2 = 0 \qquad (105)$$

Use the value of µ̂ from above and we get

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^n (y_i - \bar{y})^2 \qquad (106)$$
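As a quick numerical check (a sketch on simulated data, not part of the notes), maximizing the normal log likelihood numerically reproduces the closed forms (104) and (106).

# Maximize the normal log likelihood over (mu, sigma^2) and compare with ybar and
# the average squared deviation.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)
y = rng.normal(loc=3.0, scale=2.0, size=1_000)

def neg_log_lik(params):
    mu, log_sigma2 = params                 # optimize log(sigma^2) so sigma^2 stays positive
    sigma2 = np.exp(log_sigma2)
    return 0.5 * len(y) * np.log(2 * np.pi * sigma2) + ((y - mu) ** 2).sum() / (2 * sigma2)

res = minimize(neg_log_lik, x0=[0.0, 0.0])
print("numerical  :", res.x[0], np.exp(res.x[1]))
print("closed form:", y.mean(), ((y - y.mean()) ** 2).mean())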

9.8 MLE estimators for large samples

We end our discussion of MLE estimators with some material beyond the scope of the course. To keep things simple we restrict our attention to one unknown population parameter.

Theorem 11. Suppose the population has one unknown parameter θ and θ̂_n is the MLE for θ. Under some regularity conditions on the family of distributions, θ̂_n is a consistent estimator of θ, i.e., θ̂_n converges to θ in probability.

Of course this result does not tell us anything about how fast the MLE converges to θ. And it does not say anything about the distribution of the MLE estimator.

Definition 13. An estimator θ̂_n for θ is asymptotically normal if there is a constant σ²_θ such that √n(θ̂_n − θ) converges in distribution to a normal distribution with mean 0 and variance σ²_θ.

Theorem 12. Suppose the population has one unknown parameter θ and θ̂_n is the MLE for θ. Under some regularity conditions on the family of distributions, θ̂_n is asymptotically normal.

There is a formula for σ²_θ:

$$\sigma^2_{\theta_0} = \frac{-1}{E_{\theta_0}\!\left[\frac{\partial^2}{\partial\theta^2}\ln f(Y|\theta)\Big|_{\theta=\theta_0}\right]} \qquad (107)$$

Suppose that we want to estimate some function of θ, say t(θ), rather than θ itself. The invariance principle says the MLE is t(θ̂). Furthermore the estimator t(θ̂) is consistent and asymptotically normal for t(θ). The variance now is

$$\sigma^2_{t(\theta_0)} = \frac{-\left(\frac{\partial t(\theta)}{\partial\theta}\Big|_{\theta=\theta_0}\right)^2}{E_{\theta_0}\!\left[\frac{\partial^2}{\partial\theta^2}\ln f(Y|\theta)\Big|_{\theta=\theta_0}\right]} \qquad (108)$$

When the estimator is asymptotically normal, we know (approximately) the distribution of the estimator and we can use this to construct confidence intervals. The confidence interval for t(θ) is

$$t(\hat\theta) \pm z_{\alpha/2}\, \frac{\sigma_{t(\hat\theta)}}{\sqrt{n}} \qquad (109)$$

Note that our formula for σ_{t(θ)} depends on the unknown parameter θ. So we replace θ by θ̂ in the confidence interval.

Example: Consider a population proportion. So the unknown parameter is p and

$$f(y|p) = p^y (1 - p)^{1-y} \qquad (110)$$

for y = 0, 1. So

$$\frac{\partial}{\partial p}\ln f(Y|p) = \frac{Y}{p} - \frac{1 - Y}{1 - p} \qquad (111)$$

$$\frac{\partial^2}{\partial p^2}\ln f(Y|p) = -\frac{Y}{p^2} - \frac{1 - Y}{(1 - p)^2} \qquad (112)$$

So

$$-E\!\left[\frac{\partial^2}{\partial p^2}\ln f(Y|p)\right] = \frac{1}{p} + \frac{1}{1 - p} = \frac{1}{p(1 - p)} \qquad (113)$$

So σ²_p = p(1 − p). Thus √n(p̂_n − p) converges in distribution to a normal with mean 0 and variance p(1 − p), which we knew already from the CLT.

Now look at the population variance p(1 − p). By the invariance principle for MLE, the MLE for this variance is p̂(1 − p̂). Here t(p) = p(1 − p), so ∂t/∂p = 1 − 2p and the asymptotic variance is p(1 − p)(1 − 2p)². The confidence interval is

$$\hat{p}(1 - \hat{p}) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})(1 - 2\hat{p})^2}{n}} \qquad (114)$$
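A sketch of computing the interval (114) on simulated data (my own illustration; n, the true p, and α are arbitrary assumptions).

# Large-sample confidence interval for p(1-p) based on the MLE p_hat(1-p_hat)
# and the asymptotic variance p(1-p)(1-2p)^2 / n.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(8)
n, p_true, alpha = 800, 0.3, 0.05
y = rng.binomial(1, p_true, size=n)

p_hat = y.mean()
est = p_hat * (1 - p_hat)
half_width = norm.ppf(1 - alpha / 2) * np.sqrt(p_hat * (1 - p_hat) * (1 - 2 * p_hat) ** 2 / n)
print("estimate of p(1-p): %.4f, CI: (%.4f, %.4f)" % (est, est - half_width, est + half_width))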
