
Point Estimation-II: Sections 10.4-10.6 - Consistency, Sufficiency and Robustness

Section 10.4 Consistency Property

Outline:
• Idea of consistency: As the sample size n of a RS from some population with distribution f(x | θ) increases to infinity, does the estimator θ̂ₙ = θ̂(X₁, X₂, ..., Xₙ) converge to θ, i.e., does the error go to zero in some sense?
• P(|θ̂ₙ − θ| > ε) → 0 as n → ∞, for every ε > 0. [Convergence in Probability.]

• Why do we need to worry about it?
• Even though we mentioned that MSE (the variance, in the case of an unbiased estimator) is a reasonable measure of the closeness of an estimator to the true value, we do have to make sure that this quantity is finite.
• In the case of standard distributions this is usually not a problem. But there are always pathological cases in which chance variation may produce surprise outcomes, so we must be watchful about the effect of these pathologies.
• An example in the book discusses the consequence of the underlying distribution being off from a Normal with an incredibly small probability:
o X ~ (1 − ε) N(θ, 1) + ε Cauchy(θ, 1), where ε is small (of the order of 10⁻¹⁰⁰, much less than the chance of a person getting hit by a lightning strike).
o One may not discover this in a small number of samples from an infinite population.

o The Cauchy distribution does not have any finite moments. [θ is NOT THE MEAN of the distribution, but it is the median of the distribution.]
o Thus for any small ε > 0, E(X) does not exist, and Var(X) is not defined. In other words, the integrals ∫ |x| f(x) dx and ∫ x² f(x) dx are not finite.

o Therefore, in this case the Law of Large Numbers does not apply to X̄ₙ. Note that the Chebyshev's-inequality proof of the LLN required that E(X) = μ exists and Var(X) = σ² is finite.

• Then X̄ₙ → μ in probability: the sample mean is a consistent estimator of the population mean, provided the population mean exists and the variance is finite.
• Definition: The estimator θ̂ₙ = θ̂(X₁, X₂, ..., Xₙ) is said to be a consistent estimator of the parameter θ if P(|θ̂ₙ − θ| > ε) → 0 as n → ∞, for every ε > 0.
o This definition is, in general, about convergence in probability for an arbitrarily defined parameter of the population.
o To verify this property, we need to know the variance of the estimator's exact or asymptotic distribution. E.g.:
o Sample variance of a sample from a Normal population.

• For a RS of size n from a Normal population, X ~ N(μ, σ²):
o Y = Σᵢ₌₁ⁿ (Xᵢ − X̄)² / σ² has a χ² distribution with ν = n − 1 degrees of freedom. Since E(Y) = ν and Var(Y) = 2ν, it follows that E(Y/ν) = 1 and Var(Y/ν) = 2/ν. Therefore, as n goes to infinity:
o S² = (1/(n − 1)) Σᵢ₌₁ⁿ (Xᵢ − X̄)² is a consistent estimator of σ².
o So is (1/n) Σᵢ₌₁ⁿ (Xᵢ − X̄)², even though it is not unbiased. However, its bias goes to 0 as n goes to infinity, i.e., it is asymptotically unbiased.
• A consistent estimator does not have to be unbiased, because consistency evaluates the difference between the estimator and the parameter in a probabilistic manner, not the difference between the estimator and its expected value.
o See Example 10.9, page 330.
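The consistency of S² can be checked empirically (a simulation sketch; σ² = 1, the tolerance ε = 0.3, and the sample sizes are illustrative choices, not from the text). The Monte Carlo estimate of P(|S² − σ²| > ε) should shrink toward 0 as n grows:

```python
import numpy as np

rng = np.random.default_rng(1)
reps, sigma2, eps = 2000, 1.0, 0.3

def exceed_prob(n):
    """Monte Carlo estimate of P(|S^2 - sigma^2| > eps) for N(0, 1) samples."""
    s2 = np.array([rng.normal(size=n).var(ddof=1) for _ in range(reps)])
    return np.mean(np.abs(s2 - sigma2) > eps)

probs = [exceed_prob(n) for n in (10, 100, 1000)]
print(probs)  # should be decreasing toward 0
```

This matches the variance calculation above: Var(S²) = 2σ⁴/(n − 1), so the exceedance probability drops rapidly with n.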

o If the estimator does not use all the data, e.g., θ̂ = θ̂(X₁, X₂, ..., Xₙ₀) based on only a fixed number n₀ of the observations, the estimator is not consistent.

Section 10.5 Sufficiency

• Key Idea: If we collect a lot of data, what function of the data should be saved, and the rest discarded, so that the discarded part does not contain any information about the parameter θ, in the sense that the distribution of the part to be discarded doesn't depend on θ?

o We say that the statistic U = U(X₁, X₂, ..., Xₙ) is sufficient for learning about the parameter θ if the conditional probability distribution (density) of the full data X₁, X₂, ..., Xₙ, given U = u, doesn't change with different values of θ. In that case, all observed data values x₁, x₂, ..., xₙ that lead to the given value of u will have the same conditional distribution for all θ. In this sense, these different data sets do not provide any extra information about the parameter. Formally,

(*)   h(x₁, x₂, ..., xₙ | u, θ) = f(x₁, x₂, ..., xₙ | θ) / g(u | θ).

o If this ratio does not depend on θ, it is said that the conditional distribution of X₁, X₂, ..., Xₙ, given U = u, is independent of θ. If the ratio does depend on θ, some of the observations leading to U = u will be more likely for some values of θ than for others, and this knowledge will help in learning about θ.

• Example: n Bernoulli trials. The distribution of their sum is Binomial. One can find the ratio (*) as follows:
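The Bernoulli computation can be sketched numerically (illustrative values n = 5, u = 2; not from the text). Any 0/1 sequence x with Σxᵢ = u has joint probability θ^u (1 − θ)^(n−u), and dividing by the Binomial probability of U = u leaves 1/C(n, u), which is free of θ:

```python
from itertools import product
from math import comb

def conditional_prob(x, theta):
    """P(X = x | sum(X) = u) for iid Bernoulli(theta) trials via ratio (*)."""
    n, u = len(x), sum(x)
    joint = theta**u * (1 - theta)**(n - u)               # f(x | theta)
    binom = comb(n, u) * theta**u * (1 - theta)**(n - u)  # g(u | theta)
    return joint / binom

n, u = 5, 2
sequences = [x for x in product([0, 1], repeat=n) if sum(x) == u]
# the conditional probability is 1 / C(n, u) for every theta and every sequence
vals = {round(conditional_prob(x, th), 12) for x in sequences for th in (0.1, 0.5, 0.9)}
print(vals)
```

All 10 sequences and all three values of θ give the same conditional probability 1/C(5, 2) = 0.1, so the sum is sufficient.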

• If we know the distribution g(u | θ), it is easy to check whether or not this ratio depends on the parameter value. E.g.:
o Distribution of the sum of a RS of size n from a Normal population with unknown mean and known variance (say 1).

o Distribution of the sum of a RS of size n from a Normal population with unknown mean and unknown variance.

• Does this mean that one has to first obtain the distribution g(u | θ) to check whether this ratio depends on the parameter value? Thankfully NOT.
• One can use the Factorization Theorem to verify the sufficiency of a statistic for a given parameter of a population.

o Factorization Theorem: The statistic U(X₁, X₂, ..., Xₙ) is a sufficient statistic for the parameter θ if and only if the joint distribution (density) of the data X₁, X₂, ..., Xₙ can be factorized into a function g(u, θ) of u and θ, and another function h(x₁, x₂, ..., xₙ) of x₁, x₂, ..., xₙ only. The functions g and h do not have to be distributions (densities), since each of them can always be normalized to integrate to one to satisfy (*).
o Examples: Exponential, Gamma, Poisson, Normal with unknown mean and variance.


• Any one-to-one function of a sufficient statistic is also sufficient.
• One-to-one functions are algebraically equivalent.
• Examples:

Section 10.6 Robustness

• Key Idea: All good properties of statistical inference procedures are based on certain underlying distributional assumptions. In order to control risk, it is crucial that these properties not be sensitive to minor violations of these assumptions. Given access to modern computing resources, one can easily examine the sensitivity of the risk to these minor violations; starting in the 1970s, these questions were investigated analytically and via simulations.
• The example about the mixture of Normal and Cauchy distributions is a case in point, since its mean doesn't exist and its variance is not finite. With a small probability, one can get a few outliers in the sample, so the sample means do not converge. However, the population median exists, and the sample median is consistent for the population median.
• In the Olympics, why do we remove the highest and the lowest scores and take the mean of the remaining scores?
o The weighted mean of the observations after removing a specified number of the smallest and largest observations (the Trimmed Mean) is a robust estimator of the mean.
• These concepts will be discussed in Chapter 16.
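The robustness point can be sketched in a simulation (illustrative choices throughout: a 10% contamination rate for visibility, far larger than the text's ε; n = 200; a 10% trim on each side). Under the Normal-Cauchy mixture, the median and the trimmed mean stay close to θ across replications, while the ordinary mean is destabilized by occasional Cauchy outliers:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, eps_mix, n, reps = 0.0, 0.10, 200, 1000

def contaminated_sample(n):
    """Draw from (1 - eps_mix) N(theta, 1) + eps_mix Cauchy(theta, 1)."""
    from_cauchy = rng.random(n) < eps_mix
    x = rng.normal(loc=theta, size=n)
    x[from_cauchy] = theta + rng.standard_cauchy(from_cauchy.sum())
    return x

def trimmed_mean(x, frac=0.10):
    """Mean after removing the smallest and largest 10% of observations."""
    k = int(len(x) * frac)
    return np.sort(x)[k:len(x) - k].mean()

means, medians, trimmed = [], [], []
for _ in range(reps):
    x = contaminated_sample(n)
    means.append(x.mean())
    medians.append(np.median(x))
    trimmed.append(trimmed_mean(x))

# worst-case absolute error across replications:
# the robust estimators stay near theta, the raw mean does not
worst = {name: np.max(np.abs(v)) for name, v in
         [("mean", means), ("median", medians), ("trimmed", trimmed)]}
print(worst)
```

The trimming step is exactly the Olympic-scoring idea: the extreme order statistics, where the Cauchy outliers land, are discarded before averaging.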