Stat 421-SP 2012 Point Estimation-I Section 10.1-10.3

Outline:

• A Random Sample (RS) of size n: observed values of n independent and identically distributed (iid) random variables $X_1, X_2, \ldots, X_n$ from an infinite population, with common distribution $f$.

o Purpose is to learn about the underlying distribution f.
o However, if the distribution f is known, we do not need to sample in order to make probabilistic statements, because the probability of any event A can be calculated from this known distribution.

• In the Parametric framework, it is assumed that the functional form of the population distribution of X is known, except for the value(s) of a finite number of unknown parameter(s) that fully characterize this distribution, e.g.,

o XN( ,2 ), where one or both the parameters  and are unknown.

o X Poisson( ),where the value of  is unknown.  Parameter: Unknown Value for the population distribution’s summary characteristic (e.g., , , etc.) that will define the distribution, once its values are learned.  We say that f belongs to a family of distributions defined by possible values of the parameter(s)  in a defined set.  : Using the values of the observations in the sample, we want to learn about the underlying parameter(s) value(s).

• Two types of Inference Problems:
o Estimation: must determine the value(s) of the parameter(s) from among a list of possible values (usually a continuum).
o Test of Hypotheses: must decide to accept or reject a subset of values of the parameter as possible values of the parameter(s), e.g.,

$\theta \le 2$, or $\theta > 2$. The idea is that rejecting this set of values disproves a certain theory.

• Point Estimation: use the value of a statistic for a specific observed sample as an estimate of the population parameter (the Point Estimate).

• The statistic (a function $g(X_1, X_2, \ldots, X_n)$) whose value changes from sample to sample is called a Point Estimator, e.g.,

o The mean function $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ or the sample variance $S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$ could be considered as Point Estimators of the population mean or the population variance, respectively.

o The observed values $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ or $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ for a specific sample are called point estimates.
o [Note: lower case for estimates and upper case for Estimators.]

• Interval Estimators or Interval Estimates (Chapter 11): the parameter could be defined as any of the values in an interval (which accounts for the uncertainty, or variability from sample to sample, in our estimator).

• Key Question: How good is the Estimator we are considering?
o There is NO perfect estimator that always gives the "CORRECT" answer.
o Various statistical properties of estimators can be evaluated to find the best one from among many, based on some perspective (optimization criteria to minimize our risk for the use at hand).
▪ Different criteria can lead to different optimal solutions!
▪ Most information at lowest cost!

• Generic concepts of "goodness" in the context of Point Estimation:
o Unbiasedness, Minimum Variance, Efficiency, Consistency, Sufficiency, and Robustness.

• Unbiased Estimators
o The estimator varies from sample to sample, but on average over all possible samples, one may expect the estimator to be equal to the parameter it is trying to estimate.
o An estimator $\hat{\Theta} = g(X_1, X_2, \ldots, X_n)$ is an unbiased estimator of the parameter $\theta$ if and only if $E[\hat{\Theta}] = \theta$. Otherwise, it is called biased.

▪ Note that the error, $\hat{\Theta} - \theta$, in the estimator $\hat{\Theta} = g(X_1, X_2, \ldots, X_n)$ is also a random variable (it varies from sample to sample).
▪ Bias: $b(\theta) = E[\hat{\Theta}] - \theta$. If it is an unbiased estimator, the average error over all possible samples is ______.
o It should be noted that "biased estimator" is a loaded term, in that sometimes biased estimators may be better from some perspective.
o Any idea on why that may be the case?
o Biased estimators can be de-biased if the bias does not depend on the parameter (i.e., is a constant).
o Asymptotically Unbiased: the bias converges to 0 for all $\theta$ as the sample size goes to $\infty$.
o Examples (see the simulation sketch after this list):
▪ the Sample Mean, and the Sample Variance in general;
▪ Binomial($n$, $\theta$); the Shifted Exponential with parameter 1.
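A minimal Monte Carlo sketch in Python (an illustration beyond the notes, assuming hypothetical N(0, 1) data) of the sample-variance example: the divisor-n variance estimator has expectation $\frac{n-1}{n}\sigma^2$, so it is biased but asymptotically unbiased, while $S^2$ with divisor n-1 is unbiased.

    import numpy as np

    rng = np.random.default_rng(0)
    sigma2, n, n_reps = 1.0, 5, 200_000

    # n_reps samples, each of size n, from N(0, sigma2)
    x = rng.normal(0.0, np.sqrt(sigma2), size=(n_reps, n))

    s2_div_n = x.var(axis=1, ddof=0)    # divisor n: biased
    s2_div_n1 = x.var(axis=1, ddof=1)   # divisor n-1: the S^2 above, unbiased

    print(s2_div_n.mean())    # ~0.8 = ((n-1)/n) * sigma2
    print(s2_div_n1.mean())   # ~1.0 = sigma2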

• Efficiency
o Key Question: Given several unbiased estimators of a given parameter, which one should one choose?
o One "measure of goodness" might be to consider the expected squared error, $E[(\hat{\Theta} - \theta)^2] = \operatorname{Var}(\hat{\Theta})$ (the two are equal here because $\hat{\Theta}$ is unbiased).
o An unbiased estimator with the smallest possible variance is called a Minimum Variance Unbiased Estimator (MVUE).
o How does one figure out that an estimator is an MVUE?
o It turns out that one can show that, under very general regularity conditions, there is a threshold on the value of the variance of an unbiased estimator.
▪ We will not worry about its proof at this level. But, for those who are curious, it involves a Taylor Series expansion.
o Cramer-Rao Inequality: For any unbiased estimator $\hat{\Theta} = g(X_1, X_2, \ldots, X_n)$ based on a random sample of size n,

$$\operatorname{Var}(\hat{\Theta}) \ge \frac{1}{n \, E\!\left[\left(\frac{\partial \ln f(X \mid \theta)}{\partial \theta}\right)^{2}\right]},$$

where, given $\theta$, $f(x \mid \theta)$ is the value of the population density of X at x.

o The denominator on the right-hand side of the C-R inequality is called the Fisher information about $\theta$ supplied by a sample of size n.
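As an illustration not spelled out in the notes, the C-R bound can be computed in closed form for a Poisson sample, and the sample mean attains it (anticipating the theorem below). For $X \sim \text{Poisson}(\lambda)$:

$$\ln f(x \mid \lambda) = x \ln \lambda - \lambda - \ln x!, \qquad \frac{\partial \ln f(X \mid \lambda)}{\partial \lambda} = \frac{X}{\lambda} - 1,$$

$$E\!\left[\left(\frac{X}{\lambda} - 1\right)^{2}\right] = \frac{\operatorname{Var}(X)}{\lambda^{2}} = \frac{1}{\lambda}, \qquad \text{so} \qquad \operatorname{Var}(\hat{\Theta}) \ge \frac{\lambda}{n}.$$

Since $\operatorname{Var}(\bar{X}) = \lambda/n$ exactly matches this bound, $\bar{X}$ is an MVUE of $\lambda$.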

o Theorem (obvious from the C-R inequality): An unbiased estimator of $\theta$ whose variance is equal to the right-hand side of the C-R inequality is a Minimum Variance Unbiased Estimator.
o The smaller the variance, the larger the information!
o What if I have an estimator that ignores the data (i.e., it is a constant)? Of course, its variance is equal to 0. But there must be something wrong with this estimator!
o Given two estimators $\hat{\Theta}_1$ and $\hat{\Theta}_2$ of the same parameter $\theta$, the relative efficiency of $\hat{\Theta}_2$ with respect to $\hat{\Theta}_1$ is defined by the ratio

Var()ˆ 1 . ˆ Var()2 ˆ o If this ratio is < 1, is less efficient with respect to , or 1 is ˆ relatively more efficient than 2 .

o Combining several Unbiased Estimators: consider a linear combination $\sum_{j=1}^{k} w_j \hat{\Theta}_j$ of several independent, unbiased estimators $\hat{\Theta}_1, \ldots, \hat{\Theta}_k$. Its variance is given by $\operatorname{Var}\!\left(\sum_{j=1}^{k} w_j \hat{\Theta}_j\right) = \sum_{j=1}^{k} w_j^2 \operatorname{Var}(\hat{\Theta}_j)$. For example:
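A minimal numerical sketch in Python, assuming two hypothetical independent unbiased estimators of the same $\theta$ with variances 1 and 4:

    import numpy as np

    rng = np.random.default_rng(2)
    theta, n_reps = 5.0, 100_000

    # Two hypothetical independent unbiased estimators of theta,
    # simulated directly as draws centered at theta with variances 1 and 4.
    t1 = rng.normal(theta, 1.0, n_reps)
    t2 = rng.normal(theta, 2.0, n_reps)

    w1, w2 = 0.5, 0.5            # weights summing to 1
    combo = w1 * t1 + w2 * t2

    print(combo.mean())   # ~5.0: unbiased, since w1 + w2 = 1
    print(combo.var())    # ~ w1^2 * 1 + w2^2 * 4 = 1.25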

o How does one choose the weights to find the Minimum Variance Unbiased Estimator within this class of linear combinations of these estimators?
o Find the constraint needed for unbiasedness, and minimize the variance subject to this constraint. [Lagrange multiplier; a sketch of the solution follows below.]
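Solving that constrained minimization (a standard result, sketched here with hypothetical variances) gives inverse-variance weights: unbiasedness forces $\sum_{j=1}^{k} w_j = 1$, and the Lagrange-multiplier solution is $w_j \propto 1/\operatorname{Var}(\hat{\Theta}_j)$.

    import numpy as np

    # Hypothetical variances of k independent unbiased estimators.
    v = np.array([1.0, 4.0, 9.0])

    # Minimize sum(w_j^2 * v_j) subject to sum(w_j) = 1:
    # the Lagrange condition 2 * w_j * v_j = constant gives w_j ~ 1/v_j.
    w_opt = (1 / v) / np.sum(1 / v)
    print(w_opt)                        # [0.735, 0.184, 0.082] (approx.)

    var_opt = np.sum(w_opt**2 * v)      # = 1 / sum(1/v_j) ~ 0.735
    var_equal = np.sum(v) / len(v)**2   # equal weights give ~ 1.556
    print(var_opt, var_equal)           # optimal weights do strictly better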
