
STAT 285: Point Estimation

Richard Lockhart

Simon Fraser University

Fall 2014 — Surrey

Purposes of These Notes

- Define an estimator of a parameter. Define an estimate.
- Unbiased estimators and the bias of an estimator.
- The Mean Squared Error of an estimator.
- The Standard Error of an estimator.
- The Estimated Standard Error of an estimator.
- Minimum variance unbiased estimation.
- Method of moments estimation.
- Maximum likelihood estimation.

Models

Basic statistical strategy for analysis of data:

- Give names (like X1, X2 or U or whatever) to the numbers measured in an experiment, survey, or the like. Treat these as random variables.
- Make assumptions about the joint distribution of the data. Like: independent normal, mean µ, SD σ.
- Notice unknown parameters. Must use the data to learn about those parameters.

List of Statistical Problems

- Name most likely value of parameters: point estimation.
- Name a range of likely values: confidence interval.
- Assess evidence against a hypothesis about parameters: hypothesis testing.
- Make forecasts, do interpolation. And more.

Point Estimation

- Estimate: number which is our best guess for the parameter value.
- Estimator: rule for computing the estimate from data. An estimator is a statistic, which is a function of the data.
- Example: Newcomb and Michelson measured the speed of light in the 1880s. Made 66 measurements of the time taken by light to travel 7.44373 km.

- Measured values are X1, X2, ..., Xn with n = 66. Use lower case letters for observed values.
- First measurement was 24.828 millionths of a second. Convert measurement to speed of light:

$$x_1 = 10^9 \cdot 7.44373/24.828 = 2.998119 \times 10^8 \text{ m/s}.$$

- Similarly $x_2 = 2.998361 \times 10^8$ m/s.
- Point estimate of the speed of light is $2.998336 \times 10^8$ m/s.
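To make the conversion concrete, here is a minimal sketch in Python. Only the first time value (24.828) appears in these notes; the other two are made-up placeholders, since the full set of 66 measurements is not reproduced here.

```python
# Convert travel times (millionths of a second) over 7.44373 km into speeds,
# then average them -- the estimator introduced on the next slide.
times_microsec = [24.828, 24.826, 24.827]  # only 24.828 is real; rest hypothetical
distance_m = 7443.73                       # 7.44373 km in metres

speeds = [distance_m / (t * 1e-6) for t in times_microsec]  # m/s
estimate = sum(speeds) / len(speeds)
print(f"point estimate: {estimate:.6e} m/s")
```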

Estimators

We were using the rule: average the data. So our estimator was

$$\bar X = \frac{X_1 + \cdots + X_n}{n}.$$

Model for measurement error.

Several parts:
- X1, ..., Xn independent and identically distributed.
- Let µ = E(Xi) be the population mean: the long run average measurement.
- Population SD is σ.
- Speed of light is c — standard notation.
- Relate µ to c: µ = c + bias. Often assume bias is 0.

Newcomb data

[Figure: plot of the 66 Newcomb measurements.]

Point Estimation

- Have data and model for population. Model describes the population in terms of some parameters.
- Binomial(n, p) model: p is a parameter.
- Sample from a N(µ, σ²) model: parameters are µ and σ.
- Sample from the Gamma density

$$f(x; \alpha, \beta) = \frac{1}{\beta\,\Gamma(\alpha)} \left(\frac{x}{\beta}\right)^{\alpha-1} \exp(-x/\beta), \qquad x > 0.$$

Parameters are α and β. Generic notation for a parameter: θ.

An estimator

- Estimator of θ: function of the data used to compute a best guess of θ.
- Often denoted by θ̂: the parameter name with a 'hat' on top.
- If X has a Binomial(n, p) distribution and θ = p then the usual estimator is

$$\hat p = \hat p(X) = \frac{X}{n}.$$

If X1, ..., Xn are a sample from a Normal(µ, σ²) population then the usual estimators of µ and σ are

$$\hat\mu = \bar X = \frac{\sum_{i=1}^n X_i}{n} \qquad\text{and}\qquad S = \sqrt{\frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}}.$$
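A short sketch of computing these two estimators with numpy; the data here are simulated, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(285)
x = rng.normal(loc=5.0, scale=2.0, size=66)  # simulated sample, mu=5, sigma=2

mu_hat = x.mean()   # muhat = Xbar
s = x.std(ddof=1)   # S: squared deviations summed, divided by n-1, square-rooted
print(mu_hat, s)
```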

More estimators

If X1,..., Xn are a sample from a Gamma(α, β) population then the usual estimators for α and β are the unique solutions of the equations

$$\beta = \frac{\bar X}{\alpha}$$

$$\frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - \log(\alpha) = \frac{\sum_{i=1}^n \log(X_i)}{n} - \log\bar X \equiv \overline{\log X} - \log\bar X.$$
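These equations have no closed-form solution, so in practice they are solved numerically. A sketch, assuming scipy is available (digamma is Γ′/Γ); the data are simulated for illustration:

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import digamma  # digamma(a) = Gamma'(a)/Gamma(a)

rng = np.random.default_rng(285)
x = rng.gamma(shape=3.0, scale=2.0, size=500)  # simulated Gamma(alpha=3, beta=2) data

# Right-hand side: log-bar(X) - log(Xbar); strictly negative by Jensen's inequality.
rhs = np.log(x).mean() - np.log(x.mean())

# digamma(a) - log(a) increases from -infinity up towards 0, so the root is unique.
alpha_hat = brentq(lambda a: digamma(a) - np.log(a) - rhs, 1e-8, 1e8)
beta_hat = x.mean() / alpha_hat
print(alpha_hat, beta_hat)
```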

Other estimators are possible.
- In the Normal example: the median(X1, ..., Xn).
- In the Binomial problem: (X + 1)/(n + 2).
- In the gamma problem the method of moments estimator: β̂ = S²/X̄ and α̂ = X̄/β̂.
- Or, for a coin, the estimator p̂ = 1/2.

Evaluating Estimators

- Compare estimators in terms of usual, average performance.
- Want the estimator to usually be close to θ.
- So: fix the value of the parameters. Compute a summary of the average distance from θ̂ to θ. Choose θ̂ to make that small.
- Specific measures: Mean Squared Error, Bias, Standard Error.

Mean Squared Error (MSE)

Define the Mean Squared Error of θ̂:

$$\mathrm{MSE}(\hat\theta) \equiv E\left[(\hat\theta - \theta)^2\right].$$

RMSE or Root Mean Squared Error is the square root of MSE.

Example calculation for the Binomial(n, p) problem. I suggested

$$\tilde p = \frac{X+1}{n+2}.$$

To compute the MSE I need

$$E\left[(\tilde p - p)^2\right] = E(\tilde p^2) - 2pE(\tilde p) + p^2.$$

First term is Var(p̃) + E²(p̃). So we need the mean and variance of p̃.

MSE Example

Mean:

$$E(\tilde p) = E\left(\frac{X+1}{n+2}\right) = \frac{E(X)+1}{n+2} = \frac{np+1}{n+2}.$$

Variance:

$$\mathrm{Var}(\tilde p) = \mathrm{Var}\left(\frac{X+1}{n+2}\right) = \frac{\mathrm{Var}(X)}{(n+2)^2} = \frac{np(1-p)}{(n+2)^2}.$$
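A quick Monte Carlo check of these two formulas (the simulation settings are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(285)
n, p, reps = 10, 0.3, 500_000

x = rng.binomial(n, p, size=reps)
p_tilde = (x + 1) / (n + 2)

print(p_tilde.mean(), (n * p + 1) / (n + 2))          # both close to 0.3333
print(p_tilde.var(), n * p * (1 - p) / (n + 2) ** 2)  # both close to 0.01458
```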

Comparing estimators

Do algebra (you do it, not me):

$$\mathrm{MSE}(\tilde p) = \frac{np(1-p) + (1-2p)^2}{(n+2)^2}.$$

Now compare p̃ to the usual estimator:

$$\hat p = \frac{X}{n}.$$

- Mean of p̂ is E(p̂) = E(X)/n = np/n = p.
- Definition: if E(θ̂) = θ for all θ then we call θ̂ unbiased. So this p̂ is unbiased.
- So MSE(p̂) = E[(p̂ − p)²] = Var(p̂). And the variance is just

$$\mathrm{Var}(\hat p) = \mathrm{Var}(X)/n^2 = np(1-p)/n^2 = p(1-p)/n.$$

Comparison: plot ratio of MSEs, n = 10

[Figure: the MSE ratio mserat(x, 10) plotted against x from 0 to 1.]



Comparison: plot ratio of MSEs, n = 100

[Figure: the MSE ratio mserat(x, 100) plotted against x from 0 to 1.]



Take away messages

- The strange looking estimator is better in problems where p is not too far from 0.5.
- But for rare things (or extremely common things), that is, for p near 0 or 1, the usual estimator is better.
- But you don't usually know where p is. So we often make the choice in other ways.
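The plots above were drawn with a function called mserat. Here is a sketch of what such a function plausibly computes, assuming it is the ratio MSE(p̃)/MSE(p̂) built from the formulas on the previous slides:

```python
import numpy as np

def mserat(p, n):
    """Ratio MSE(ptilde) / MSE(phat) for the Binomial(n, p) problem."""
    mse_tilde = (n * p * (1 - p) + (1 - 2 * p) ** 2) / (n + 2) ** 2
    mse_hat = p * (1 - p) / n
    return mse_tilde / mse_hat

p = np.linspace(0.01, 0.99, 99)  # avoid the endpoints, where MSE(phat) = 0
for n in (10, 100):
    r = mserat(p, n)
    # ratio below 1: ptilde wins (p near 0.5); above 1: phat wins (p near 0 or 1)
    print(n, r.min(), r.max())
```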

Bias, unbiased estimator

The bias of an estimator θ̂ of θ is

$$E(\hat\theta) - \theta.$$

An estimator is unbiased if its bias is 0:

$$E(\hat\theta) \equiv \theta.$$

- We often prefer unbiased estimators.
- The estimator p̂ = X/n in the Binomial(n, p) model is unbiased.
- The bias of p̃ = (X + 1)/(n + 2) is

$$\frac{1-2p}{n+2}.$$

In a sample from a population with mean µ and variance σ², the sample mean and sample variance are unbiased. The sample SD is biased.

Estimation of Variances and SDs

Already: the sample mean is unbiased and has variance σ²/n. I compute the expected value of the sample variance

$$E\left[\frac{\sum_{i=1}^n (X_i - \bar X)^2}{n-1}\right].$$

Start by expanding the numerator:

$$\sum_{i=1}^n (X_i - \bar X)^2 = \sum_{i=1}^n \left(X_i^2 - 2 X_i \bar X + \bar X^2\right)$$
$$= \sum_{i=1}^n X_i^2 - 2\bar X \sum_{i=1}^n X_i + n\bar X^2$$
$$= \sum_{i=1}^n X_i^2 - 2n\bar X^2 + n\bar X^2$$
$$= \sum_{i=1}^n X_i^2 - n\bar X^2.$$

The Sample Variance is Unbiased

Remember Var(X) = E(X²) − E²(X), so

$$E\left(\sum_{i=1}^n X_i^2\right) = n\left(\mathrm{Var}(X_1) + E^2(X_1)\right) = n(\sigma^2 + \mu^2).$$

Using the same formula:

$$E(\bar X^2) = \mathrm{Var}(\bar X) + E^2(\bar X) = \sigma^2/n + \mu^2.$$

Put it together to see

$$E\left[\sum_{i=1}^n (X_i - \bar X)^2\right] = n(\sigma^2 + \mu^2) - n(\sigma^2/n + \mu^2) = (n-1)\sigma^2.$$

Finally we see

$$E(S^2) = \frac{(n-1)\sigma^2}{n-1} = \sigma^2,$$

so the sample variance is an unbiased estimator of the population variance.
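A quick Monte Carlo sanity check of this unbiasedness (the simulation settings are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(285)
mu, sigma, n, reps = 0.0, 3.0, 10, 200_000

# Sample variance (n-1 divisor) of each of many small samples.
s2 = rng.normal(mu, sigma, size=(reps, n)).var(axis=1, ddof=1)
print(s2.mean(), sigma**2)  # the average of S^2 is close to sigma^2 = 9
```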

Standard Errors

- Estimates should always be accompanied by some assessment of their likely accuracy.
- For unbiased estimators with approximately normal distributions we use the Standard Error.
- The SE of an estimator θ̂ of θ is

$$SE = \sqrt{\mathrm{Var}(\hat\theta)}.$$

- That is: the Standard Error of an estimator is another name for its SD.
- The standard error of p̂ in the Binomial(n, p) problem is $\sqrt{p(1-p)/n}$.
- The SE of X̄ is $\sigma/\sqrt{n}$.
- Sometimes we only know an approximate SE (problem 18 in Chapter 6, e.g.).

Estimated Standard Errors

- What accompanies our point estimate is a number, not a formula. The SE is usually a formula with unknown parameters in it.
- We estimate the SE by plugging in estimates of the parameters.
- The SE for p̂ is $\sqrt{p(1-p)/n}$, so the estimated SE is

$$\frac{\sqrt{\hat p(1-\hat p)}}{\sqrt{n}}.$$

- And you plug in data to get a number to put in your report.
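A minimal sketch of that plug-in step; the counts 37 out of 200 are hypothetical:

```python
import math

n, x = 200, 37                                # hypothetical: 37 successes in 200 trials
p_hat = x / n
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)   # plug-in version of sqrt(p(1-p)/n)
print(f"phat = {p_hat:.3f}, estimated SE = {se_hat:.4f}")
```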

Sample SD is biased

Both the sample SD and the estimated SE are biased estimators. I know because

$$\mathrm{Var}(S) = E(S^2) - E^2(S).$$

Since S is variable, its variance is more than 0 and so

$$E^2(S) < E(S^2).$$

But S² is unbiased so $E^2(S) < \sigma^2$. Take square roots to learn

$$E(S) < \sigma.$$

That is, S has a negative bias. So does S/√n, which estimates the SE σ/√n.
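A simulation illustrating the negative bias of S (settings arbitrary; the effect is most visible for small n):

```python
import numpy as np

rng = np.random.default_rng(285)
sigma, n, reps = 2.0, 5, 200_000

# Sample SD of each of many samples of size 5.
s = rng.normal(0.0, sigma, size=(reps, n)).std(axis=1, ddof=1)
print(s.mean(), sigma)  # the average S falls noticeably below sigma = 2
```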

Minimum Variance Unbiased Estimation

Theorem: Suppose X1, ..., Xn are sampled from a Normal(µ, σ²) population. Suppose µ̃ is an unbiased estimator of µ which is not the unbiased estimator µ̂ = X̄. Then no matter what µ and σ are:

$$\mathrm{Var}(\hat\mu) < \mathrm{Var}(\tilde\mu).$$

In Normal populations you cannot do better than X¯ without having bias.

Where do estimators come from

- Ad hoc, method of moments, method of maximum likelihood.
- Method of moments – 2 parameter example:
  ◮ Work out formulas for the expected value µ and variance σ² in terms of the parameters.
  ◮ Set X̄ = µ formula and S² = σ² formula.
  ◮ Solve for the parameters.
- Method of maximum likelihood:
  ◮ Write out the log of the joint density of the data as a function of the parameters.
  ◮ Maximize the result over all parameter values.

Method of moments example

- Sample mean and sample SD are method of moments estimates of µ and σ.
- p̂ is the method of moments estimate of p in Binomial(n, p).
- In a Gamma(α, β) population the mean is µ = αβ and the variance is σ² = αβ². So we solve

$$\bar X = \alpha\beta \qquad S^2 = \alpha\beta^2.$$

Divide 2nd by first to find

$$\hat\beta = S^2/\bar X.$$

Then use first equation to get

$$\hat\alpha = \bar X/\hat\beta = \bar X^2/S^2.$$
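A sketch of these method of moments estimates on simulated Gamma data:

```python
import numpy as np

rng = np.random.default_rng(285)
x = rng.gamma(shape=3.0, scale=2.0, size=500)  # simulated Gamma(alpha=3, beta=2)

xbar = x.mean()
s2 = x.var(ddof=1)            # S^2

beta_hat = s2 / xbar          # betahat = S^2 / Xbar
alpha_hat = xbar / beta_hat   # alphahat = Xbar / betahat = Xbar^2 / S^2
print(alpha_hat, beta_hat)
```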

Maximum Likelihood

Suppose X1, ..., Xn are a sample from the Poisson(λ) distribution. The joint pmf of X1, ..., Xn is

$$P(X_1 = x_1, \ldots, X_n = x_n) = \prod_{i=1}^n P(X_i = x_i) = \prod_{i=1}^n \lambda^{x_i} e^{-\lambda}/x_i!$$

I want to maximize this function of λ. It is easier to maximize the logarithm of the function because the log of a product is a sum of logarithms. The log likelihood is

$$\log P(X_1 = x_1, \ldots, X_n = x_n) = \sum_{i=1}^n \left(x_i \log(\lambda) - \lambda - \log(x_i!)\right).$$

This simplifies to

$$\log(\lambda)\sum_{i=1}^n x_i - n\lambda - \sum_{i=1}^n \log(x_i!).$$

MLE – maximize log likelihood

Find the max by setting the λ derivative equal to 0:

$$\frac{\sum_{i=1}^n x_i}{\lambda} - n = 0.$$

Solve to find

$$\hat\lambda = \frac{\sum_{i=1}^n x_i}{n} = \bar x.$$

So the Maximum Likelihood Estimator (MLE) is λ̂ = X̄.

At a maximum the 2nd derivative is negative. The second derivative is $-n\bar x/\lambda^2$; evaluate at λ = λ̂ to get $-n/\bar x$, which is negative, so this is a maximum.
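A numerical check that maximizing the Poisson log likelihood really lands on x̄; the data are simulated and scipy is assumed to be available:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln  # log(x!) = gammaln(x + 1)

rng = np.random.default_rng(285)
x = rng.poisson(lam=4.0, size=100)

def neg_loglik(lam):
    # Negative of the log likelihood derived above.
    return -(np.log(lam) * x.sum() - len(x) * lam - gammaln(x + 1).sum())

res = minimize_scalar(neg_loglik, bounds=(1e-8, 100.0), method="bounded")
print(res.x, x.mean())  # numerical maximizer matches the closed form lambdahat = xbar
```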

MLE in normal samples

Have X1, ..., Xn sampled from Normal(µ, σ²). The likelihood is a product of densities: one for each Xi.

$$\prod_{i=1}^n \frac{\exp\left\{-\left((x_i - \mu)/\sigma\right)^2/2\right\}}{\sqrt{2\pi}\,\sigma}.$$

The log likelihood is

$$-\frac{1}{2}\sum_{i=1}^n \frac{(X_i - \mu)^2}{\sigma^2} - n\log(\sigma) - n\log(2\pi)/2.$$

Maximize by setting both partial derivatives equal to 0 and solving. Find

$$\hat\mu = \bar X \qquad\text{and}\qquad \hat\sigma^2 = \frac{\sum_{i=1}^n (X_i - \bar X)^2}{n} = \frac{n-1}{n}\,S^2.$$
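A one-line check on simulated data that the normal MLE of σ² is the n-divisor variance, i.e. (n − 1)S²/n:

```python
import numpy as np

rng = np.random.default_rng(285)
x = rng.normal(5.0, 2.0, size=66)
n = len(x)

sigma2_mle = x.var(ddof=0)           # divide by n: the MLE
s2 = x.var(ddof=1)                   # divide by n-1: the unbiased S^2
print(sigma2_mle, (n - 1) / n * s2)  # equal up to floating point
```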
