
Point estimators
Math 218, Mathematical Statistics
D Joyce, Spring 2016

The main job of statistics is to make inferences, specifically inferences about parameters based on data from a sample.

We assume that a sample X1, ..., Xn comes from a particular distribution, called the population distribution, and although that particular distribution is not known, it is assumed to be one of a family of distributions parametrized by one or more parameters.

For example, if there are only two possible outcomes, then the distribution is a Bernoulli distribution parametrized by one parameter p, the probability of success.

For another example, many measurements are assumed to be normally distributed, and for a normal distribution there are two parameters, µ and σ².

Point estimation. The first kind of inference that we'll look at is estimating the values of parameters. A point estimator θ̂ of a parameter θ is some function of the sample X1, ..., Xn. Any function of a sample is called a sample statistic.

For example, for the Bernoulli distribution, a typical estimator for p is the sample mean X̄; that is, p̂ is often taken to be X̄.

For another example, for the normal distribution, µ̂ is often taken to be X̄, and σ̂² is often taken to be the sample variance S² = (1/(n−1)) Σ (Xi − X̄)², although sometimes σ̂² is taken to be (1/n) Σ (Xi − X̄)² instead.

Sometimes there are many different choices for estimators. To estimate the population mean µ, besides (1) the sample mean X̄, you might instead take (2) the midrange value (Xmin + Xmax)/2, or (3) the median, or (4) just about any other statistic that has “central tendencies.” Which of these is best, and why?

Desirable criteria for point estimators. The MSE. Well, of course, we want our point estimator θ̂ of the parameter θ to be close to θ. We can't expect them to be equal, of course, because of sampling error. How should we measure how far off the random θ̂ is from the unknown constant θ?

One standard measure is what is called the mean squared error, abbreviated MSE and defined by

    MSE(θ̂) = E((θ̂ − θ)²),

the expected square of the error. If we have two different estimators of θ, the one that has the smaller MSE is closer to the actual parameter (in some sense of “closer”).

Variance and bias. The MSE of an estimator θ̂ can be split into two parts, the estimator's variance and a “bias” term. We're familiar with the variance of a random variable X; it's

    Var(X) = σ_X² = E((X − µ_X)²) = E(X²) − µ_X².

Right now, though, our random variable is θ̂, so its variance is

    Var(θ̂) = E((θ̂ − E(θ̂))²) = E(θ̂²) − (E(θ̂))².

The MSE(θ̂) is the sum of this variance and one other component. Let's see what that other component is.

    MSE = E((θ̂ − θ)²)
        = E(θ̂² − 2θθ̂ + θ²)
        = E(θ̂²) − (E(θ̂))² + (E(θ̂))² − 2E(θ̂)θ + θ²
        = Var(θ̂) + (E(θ̂))² − 2E(θ̂)θ + θ²
        = Var(θ̂) + (E(θ̂) − θ)²

The expression E(θ̂) − θ is called the bias of the estimator θ̂:

    Bias(θ̂) = E(θ̂) − θ.
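Here's a quick way to see these definitions in action. The sketch below (in Python with numpy; the normal population, the sample size, and the number of trials are made-up choices for illustration, not anything from the text) simulates many samples, applies the three candidate estimators of µ mentioned above, and estimates the bias, variance, and MSE of each, checking along the way that MSE = Var + (Bias)² numerically.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n, trials = 10.0, 2.0, 15, 100_000

    # Draw many samples of size n from a normal population N(mu, sigma^2).
    samples = rng.normal(mu, sigma, size=(trials, n))

    # Three candidate point estimators of mu, applied to every sample.
    estimators = {
        "sample mean": samples.mean(axis=1),
        "median":      np.median(samples, axis=1),
        "midrange":    (samples.min(axis=1) + samples.max(axis=1)) / 2,
    }

    for name, theta_hat in estimators.items():
        bias = theta_hat.mean() - mu              # estimated Bias = E(theta_hat) - theta
        var  = theta_hat.var()                    # estimated Var(theta_hat)
        mse  = np.mean((theta_hat - mu) ** 2)     # estimated MSE = E((theta_hat - theta)^2)
        print(f"{name:12s}  bias={bias:+.4f}  var={var:.4f}  "
              f"mse={mse:.4f}  var+bias^2={var + bias**2:.4f}")

For a normal population all three of these estimators are unbiased, and the sample mean comes out with the smallest variance, hence the smallest MSE, which is one reason it's the usual choice.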

If Bias(θ̂) is positive, that means that you expect the estimator θ̂ to be too large; if negative, then too small. From the computation above, we see that the MSE is the sum of the variance of θ̂ and the square of the bias of θ̂:

    MSE(θ̂) = Var(θ̂) + (Bias(θ̂))².

So, what makes a good estimator? We've only seen a couple of measures of the goodness of an estimator, and people disagree based on just these. The authors say the best estimator is the unbiased one with the smallest variance. Others say the smallest MSE is most important since it's even closer to the parameter. Others say that there are other criteria that are more important than the ones we've just discussed. Example 6.4 discusses the sample variance S² as an estimator for the variance σ² of a normal distribution. Dividing by n − 1 leads to an unbiased estimator, but dividing by n leads to an estimator with a much smaller MSE, at least for small n; as n increases, the difference between the MSEs of the two estimators (the one involving n − 1 and the one involving n) approaches 0.
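That claim about the two versions of the sample variance can be checked by simulation. Here's a rough sketch along the same lines as before (again Python with numpy; the particular σ², sample size, and number of trials are arbitrary choices for illustration).

    import numpy as np

    rng = np.random.default_rng(1)
    sigma2, n, trials = 4.0, 5, 200_000

    # Many small samples from a normal distribution with variance sigma2.
    x = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))

    s2_unbiased = x.var(axis=1, ddof=1)   # divide by n - 1
    s2_biased   = x.var(axis=1, ddof=0)   # divide by n

    for name, est in [("divide by n-1", s2_unbiased), ("divide by n", s2_biased)]:
        bias = est.mean() - sigma2                # estimated bias of the estimator of sigma^2
        mse  = np.mean((est - sigma2) ** 2)       # estimated MSE
        print(f"{name:13s}  bias={bias:+.4f}  MSE={mse:.4f}")

For a small sample size the divide-by-n estimator shows a clearly smaller MSE despite its negative bias; rerunning the sketch with a larger n makes the two results nearly identical.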
Standard error and estimated standard error of an estimator. The variance of θ̂ is one measure of its error, and it's often reported as its square root, the standard deviation of θ̂, called the standard error of the estimator θ̂, abbreviated SE.

Unfortunately, the actual value of this standard error is not known because the parameters of the distribution are unknown. But they can be estimated, and so the standard error can be estimated, too. So what's usually reported is the estimated standard error.

An example. Suppose the population has two unknown parameters µ and σ², as is the case for normal populations. The sample mean X̄ is often used to estimate µ. The standard deviation of X̄, that is, the SE of the estimator X̄, is σ/√n. But σ is estimated by the sample standard deviation S. Thus, the estimated standard error of X̄ is S/√n.

This particular estimated standard error is so commonly used that it's got its own name, the standard error of the mean, abbreviated SEM.
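Computing the SEM takes only a couple of lines. A minimal sketch, with a made-up sample standing in for observed data:

    import numpy as np

    # A made-up sample; in practice this would be the observed data x1, ..., xn.
    x = np.array([9.8, 10.4, 10.1, 9.6, 10.9, 10.2, 9.9, 10.5])
    n = len(x)

    x_bar = x.mean()          # point estimate of mu
    s = x.std(ddof=1)         # sample standard deviation S (n - 1 divisor)
    sem = s / np.sqrt(n)      # estimated standard error of X-bar, i.e. the SEM

    print(f"x-bar = {x_bar:.3f}, S = {s:.3f}, SEM = {sem:.3f}")

The same quantity is what scipy.stats.sem computes.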
The method of moments. There are other ways of coming up with estimators. One is called the method of moments. The idea is that to estimate a moment µk of the population distribution, just use the corresponding moment µ̂k of the sample. They're analogous, anyway:

    µk = E(X^k)  while  µ̂k = (1/n) Σ_{i=1}^{n} Xi^k.

The method of moments goes further than that, though. If what you want is to estimate k parameters θ1, ..., θk, and those parameters aren't moments themselves, then find those parameters in terms of the moments.

Example 6.6, page 202, shows how this is done when a random sample is taken from a uniform distribution on an unknown interval [θ1, θ2]. Using the method of moments, it turns out that

    θ̂1, θ̂2 = µ̂1 ∓ √(3(µ̂2 − µ̂1²)),

where µ̂1 = X̄ = (1/n) Σ Xi and µ̂2 = (1/n) Σ Xi².
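Numerically, the Example 6.6 recipe looks like this. A short sketch, where the “unknown” interval and the sample size are invented for illustration:

    import numpy as np

    rng = np.random.default_rng(2)
    theta1, theta2 = 3.0, 8.0                     # the "unknown" interval, chosen for illustration
    x = rng.uniform(theta1, theta2, size=200)     # random sample from Uniform[theta1, theta2]

    m1 = x.mean()             # first sample moment, mu-hat_1 = X-bar
    m2 = np.mean(x ** 2)      # second sample moment, mu-hat_2 = (1/n) * sum(Xi^2)

    half_width = np.sqrt(3 * (m2 - m1 ** 2))
    theta1_hat = m1 - half_width                  # method-of-moments estimate of theta1
    theta2_hat = m1 + half_width                  # method-of-moments estimate of theta2

    print(f"theta1_hat = {theta1_hat:.3f}, theta2_hat = {theta2_hat:.3f}")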

Math 218 Home Page at http://math.clarku.edu/~djoyce/ma218/