Point Estimation

San José State University Math 161A: Applied Probability & Statistics I


Prof. Guangliang Chen

Section 6.1 Some general concepts of point estimation

Scenario change

We have completed the probability portion of the course:

• distributions of discrete random variables (Chapter 3)
• distributions of continuous random variables (Chapter 4)
• joint distributions of two (discrete) random variables (Section 5.1)
• sampling distributions of statistics (Sections 5.3, 5.4)

Prof. Guangliang Chen | Mathematics & Statistics, San Jose State University3/27 Point Estimation

In the previous settings (which were very mathematical), we assumed that we had full knowledge of the distribution in terms of both the distribution type and the values of the associated parameters (e.g., Bernoulli(0.5), Pois(2.2), $N(65, 2^2)$, Exp(1/45)).

In practical settings we usually only know the type of the distribution for the population (or can make a reasonable assumption about the distribution type), but not the values of its parameters.

In most cases, it is too difficult or expensive to access the whole population to determine the exact value of a distribution parameter.


A more efficient way is to use a sample from the population to draw inferences about the population parameters. This is called statistical inference.

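[Figure: a population with distribution f(x; θ), where the type f is known and θ is unknown. Sampling produces the sample X1, ..., Xn; inference goes from the sample back to θ.]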


For example, in the brown egg problem, we only know (or can assume) that the weights of all the brown eggs produced at the farm (population) follow a normal distribution (this is our model).

We will need to determine the values of its parameters $\mu$ (mean weight) and $\sigma^2$ (variance).

Inference about the population mean $\mu$ and the variance $\sigma^2$ can be made based on a random sample $X_1, \dots, X_n$ from the distribution (e.g., weights of a carton of eggs selected from the population).


We may consider three different kinds of inference tasks:

• Point estimation: What is the single (best) guess of the population mean µ?

• Interval estimation: In what interval (range) does µ lie “with high probability”?

• Hypothesis testing: The label says µ = 65 g, but the average weight of the eggs in a randomly selected carton is only 63.9 g. Is this a contradiction?


For each task, inference will be performed through a statistic:

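[Figure: sampling from a population with distribution f(x; θ), with f known and θ unknown, produces the sample X1, ..., Xn. A statistic U computed from the sample is used for inference about θ.]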


Point estimation

Consider the brown egg example again.

Example 0.1. Suppose the weights of the 12 eggs in a selected carton are

x1 = 63.3, x2 = 63.4, x3 = 64.0, x4 = 63.0, x5 = 70.4, x6 = 65.7,

x7 = 63.7, x8 = 65.8, x9 = 67.5, x10 = 66.4, x11 = 66.8, x12 = 66.0

Obviously, one can use the sample mean $\bar{x} = 65.5$ g as a reasonable guess of the population mean $\mu$.

• We say that $\bar{x} = 65.5$ g is a point estimate of $\mu$.

• However, point estimates will likely vary from sample to sample.

• In order to study such randomness, we need to consider a random sample $X_1, \dots, X_{12}$ from the population and examine the associated statistic:

$$\bar{X} = \frac{1}{12} \sum_{i=1}^{12} X_i.$$

  The statistic $\bar{X}$ is called a point estimator of $\mu$.

• Note that a point estimator is a random variable (also a statistic), while a point estimate is an observed value of the point estimator (obtained through a realization of the sampling process).
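The distinction between estimator and estimate can be illustrated with a small simulation. The sketch below is not part of the original example: it assumes, purely for illustration, that egg weights follow N(65, 2^2), and the function name `sample_mean_estimates` is hypothetical. Each simulated carton of 12 eggs yields a different point estimate from the same estimator.

```python
import random
import statistics

def sample_mean_estimates(mu, sigma, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from N(mu, sigma^2)
    and return the sample mean (point estimate) of each one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    estimates = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
    return estimates

# Hypothetical "true" parameters, assumed only for this illustration.
estimates = sample_mean_estimates(mu=65.0, sigma=2.0, n=12, num_samples=5)
for k, est in enumerate(estimates, start=1):
    print(f"carton {k}: x-bar = {est:.2f} g")
```

The five printed values differ from one another, which is the sample-to-sample variability that the estimator, viewed as a random variable, describes.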


Question. Are there other estimators for µ in the brown egg example and what are the corresponding point estimates (based on the same sample)?

• Sample median $\tilde{X}$. Point estimate is $\tilde{x} = \frac{65.7 + 65.8}{2} = 65.75$.

• Midpoint of the range $M$. Point estimate is $m = \frac{63.0 + 70.4}{2} = 66.7$.

[Figure: dot plot of the sample, with the median and the midpoint marked.]

Conclusion: Point estimators of µ are not unique.

Follow-up question: Which one is the best?
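As a quick check, the three point estimates for the carton in Example 0.1 can be computed directly. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
import statistics

# Weights (in grams) of the 12 eggs from Example 0.1.
weights = [63.3, 63.4, 64.0, 63.0, 70.4, 65.7,
           63.7, 65.8, 67.5, 66.4, 66.8, 66.0]

mean_est = statistics.fmean(weights)               # sample mean
median_est = statistics.median(weights)            # sample median
midrange_est = (min(weights) + max(weights)) / 2   # midpoint of the range

print(f"sample mean = {mean_est:.2f}")       # 65.50
print(f"median      = {median_est:.2f}")     # 65.75
print(f"midpoint    = {midrange_est:.2f}")   # 66.70
```

All three are computed from the same sample, yet they give different guesses for the population mean.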


General definition

More generally, consider a distribution f(x; θ) with known type f but unknown parameter value θ. For example,

• f is the normal pdf and $\theta$ represents $\mu$ (assuming $\sigma^2$ known);
• f is the Poisson pmf and $\theta$ is the associated parameter $\lambda$.

Def 0.1. A point estimator $\hat{\theta}$ of $\theta$ is any (reasonable) statistic that is used to estimate $\theta$.

For any specific realization of the random sample, the corresponding value of $\hat{\theta}$ is called a point estimate of $\theta$.


Example 0.2. Suppose we draw a random sample $X_1, \dots, X_n$ from the uniform distribution Unif(0, b). Then the sample maximum

$$X_{\max} = \max_{1 \le i \le n} X_i$$

can be used as a point estimator for $b$.

[Figure: sample points on the interval from 0 to b.]


Follow-up question. Is there another statistic that may be used to estimate the unknown parameter b in the preceding example?


New question. Given a random sample $X_1, \dots, X_n$ from a population with unknown variance $\sigma^2$, what estimators can we use for $\sigma^2$?

• The sample variance is the most common point estimator:

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2$$

• Another possibility is to use

$$S'^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{n-1}{n} S^2$$


Example 0.3. Given the same sample from before,

x1 = 63.3, x2 = 63.4, x3 = 64.0, x4 = 63.0, x5 = 70.4, x6 = 65.7,

x7 = 63.7, x8 = 65.8, x9 = 67.5, x10 = 66.4, x11 = 66.8, x12 = 66.0

a point estimate of $\sigma^2$ based on $S^2$ is $s^2 = 4.72$. In contrast, $s'^2 = 4.32$.
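Both variance estimates can be reproduced with the standard library. In this sketch, `statistics.variance` divides by n - 1 and `statistics.pvariance` divides by n; the variable names are illustrative.

```python
import statistics

# Weights (in grams) of the 12 eggs from the example.
weights = [63.3, 63.4, 64.0, 63.0, 70.4, 65.7,
           63.7, 65.8, 67.5, 66.4, 66.8, 66.0]

s2 = statistics.variance(weights)         # divides by n - 1
s2_prime = statistics.pvariance(weights)  # divides by n

print(f"estimate with divisor n - 1: {s2:.2f}")        # 4.72
print(f"estimate with divisor n:     {s2_prime:.2f}")  # 4.32
```

The second estimate equals (n - 1)/n times the first, here 11/12 of it.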


Evaluation of estimators

The best estimators are unbiased and have the least possible variance.

[Figure: scatter diagrams contrasting unbiased estimators with biased estimators.]


Def 0.2. A point estimator $\hat{\theta}$ of $\theta$ is said to be unbiased if

$$E(\hat{\theta}) = \theta.$$

Otherwise, it is biased, and the bias of $\hat{\theta}$ is defined as

$$B(\hat{\theta}) = E(\hat{\theta}) - \theta.$$


Theorem 0.1. Let $X_1, \dots, X_n \overset{\text{iid}}{\sim} f(x)$ with $E(X_i) = \mu$ and $\operatorname{Var}(X_i) = \sigma^2$ for all $1 \le i \le n$. The statistics $\bar{X}$ and $S^2$ are always unbiased estimators of $\mu$ and $\sigma^2$, respectively.

Proof. The $\bar{X}$ part directly follows from a previous sampling result:

$$E(\bar{X}) = \mu.$$

The variance part can be proved based on the following identity:

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right]$$


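That is,

$$
\begin{aligned}
E(S^2) &= \frac{1}{n-1} \left[ \sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2) \right] \\
&= \frac{1}{n-1} \left[ \sum_{i=1}^{n} (\mu^2 + \sigma^2) - n\left(\mu^2 + \frac{\sigma^2}{n}\right) \right] \\
&= \frac{1}{n-1} \left[ n(\mu^2 + \sigma^2) - (n\mu^2 + \sigma^2) \right] \\
&= \sigma^2.
\end{aligned}
$$

(In the above we have used the formula $E(Y^2) = E(Y)^2 + \operatorname{Var}(Y)$ for any random variable $Y$.)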
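The identity that the variance part of the proof starts from is stated without derivation. It follows from expanding the square and using $\sum_{i=1}^{n} X_i = n\bar{X}$:

```latex
\begin{aligned}
\sum_{i=1}^{n} (X_i - \bar{X})^2
  &= \sum_{i=1}^{n} X_i^2 - 2\bar{X} \sum_{i=1}^{n} X_i + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - 2n\bar{X}^2 + n\bar{X}^2 \\
  &= \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 .
\end{aligned}
```

Dividing both sides by $n - 1$ gives the two expressions for $S^2$.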


Remark. The theorem implies that $S'^2$ is a biased estimator of $\sigma^2$:

$$E(S'^2) = E\left(\frac{n-1}{n} S^2\right) = \frac{n-1}{n} \sigma^2 \ne \sigma^2$$

and the bias is

$$B(S'^2) = E(S'^2) - \sigma^2 = -\frac{1}{n} \sigma^2.$$

That is, $S'^2$ tends to underestimate $\sigma^2$.
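The bias can also be seen empirically. The following sketch is an illustration only: it assumes a normal population with σ = 2 and a small sample size n = 5 (both chosen arbitrarily), and the function name is hypothetical. It averages the divisor-(n - 1) and divisor-n variance estimates over many simulated samples.

```python
import random

def average_variance_estimates(sigma, n, num_samples, seed=0):
    """Average the two variance estimators (divisor n - 1 and divisor n)
    over num_samples simulated samples of size n from N(0, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total_unbiased = 0.0
    total_biased = 0.0
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        ss = sum((x - xbar) ** 2 for x in sample)  # sum of squared deviations
        total_unbiased += ss / (n - 1)
        total_biased += ss / n
    return total_unbiased / num_samples, total_biased / num_samples

# Assumed values for illustration: sigma^2 = 4, so (n - 1)/n * sigma^2 = 3.2.
sigma, n = 2.0, 5
avg_s2, avg_s2_prime = average_variance_estimates(sigma, n, num_samples=20000)
print(f"average with divisor n - 1: {avg_s2:.3f}  (theory: {sigma**2:.3f})")
print(f"average with divisor n:     {avg_s2_prime:.3f}  (theory: {(n - 1) / n * sigma**2:.3f})")
```

Under these assumptions the first average should land close to 4 and the second close to 3.2, in line with a bias of $-\sigma^2/n = -0.8$.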


Remark. Note that $\mu$ may represent different parameters for different populations:

• Normal: $\bar{X}$ is an unbiased estimator of $\mu$;
• Bernoulli: $\bar{X}$ is an unbiased estimator of $p$;
• Poisson: $\bar{X}$ is an unbiased estimator of $\lambda$;
• Uniform(0, b): $\bar{X}$ is an unbiased estimator of $b/2$, which implies that $2\bar{X}$ is an unbiased estimator of $b$.


Example 0.4. For a random sample of size $n$ from the Unif(0, b) distribution (where $b$ is unknown), it can be shown that the sample maximum is a biased estimator of $b$:

$$E(X_{\max}) = \frac{n}{n+1} b$$

with negative bias

$$B(X_{\max}) = E(X_{\max}) - b = \frac{n}{n+1} b - b = -\frac{1}{n+1} b.$$

However, $\frac{n+1}{n} X_{\max}$ is an unbiased estimator of $b$:

$$E\left(\frac{n+1}{n} X_{\max}\right) = \frac{n+1}{n} E(X_{\max}) = \frac{n+1}{n} \cdot \frac{n}{n+1} b = b.$$

(Recall that $2\bar{X}$ is another unbiased estimator of $b$.)
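A simulation gives a rough check of these expectations. The sketch below assumes b = 10 and n = 5 purely for illustration (neither value comes from the example), and the function name and dictionary keys are hypothetical. It averages three estimators of b over many simulated samples: the sample maximum, the rescaled maximum, and twice the sample mean.

```python
import random

def average_estimates_of_b(b, n, num_samples, seed=0):
    """Average three estimators of b over num_samples simulated
    samples of size n from Unif(0, b)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    sums = {"max": 0.0, "scaled_max": 0.0, "twice_mean": 0.0}
    for _ in range(num_samples):
        sample = [rng.uniform(0.0, b) for _ in range(n)]
        x_max = max(sample)
        sums["max"] += x_max                       # biased: mean is n/(n+1) * b
        sums["scaled_max"] += (n + 1) / n * x_max  # unbiased
        sums["twice_mean"] += 2 * sum(sample) / n  # unbiased
    return {name: total / num_samples for name, total in sums.items()}

# Assumed values for illustration: b = 10, n = 5, so n/(n+1) * b is about 8.33.
averages = average_estimates_of_b(b=10.0, n=5, num_samples=20000)
for name, value in averages.items():
    print(f"{name:>10}: {value:.3f}")
```

Under these assumptions the average of the raw maximum should sit near 8.33, below b, while the other two averages should be close to 10.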


Between two unbiased estimators (of some parameter), the one with smaller variance is better.

Def 0.3. The unbiased estimator $\hat{\theta}^*$ of $\theta$ that has the smallest variance is called a minimum variance unbiased estimator (MVUE).

[Figure: nested sets. The set of all point estimators of θ contains the set of all unbiased ones, which in turn contains the MVUE.]

Theorem 0.2. For normal populations, $\bar{X}$ is an MVUE for $\mu$.
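A simulation cannot prove this theorem, but it can compare the sample mean against one particular competitor. The sketch below assumes a N(65, 2^2) population and n = 12 (values chosen only for illustration; the function names are hypothetical) and estimates the variance of the sample mean and of the sample median, which is also unbiased for µ when the population is normal.

```python
import random
import statistics

def _variance(values):
    """Plain variance (divisor len(values)) of a list of numbers."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def compare_mean_and_median(mu, sigma, n, num_samples, seed=0):
    """Return (variance of sample means, variance of sample medians)
    over num_samples simulated samples of size n from N(mu, sigma^2)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    means, medians = [], []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    return _variance(means), _variance(medians)

# Assumed population for illustration; theory gives Var(X-bar) = sigma^2 / n = 1/3.
var_mean, var_median = compare_mean_and_median(65.0, 2.0, n=12, num_samples=10000)
print(f"variance of sample mean:   {var_mean:.3f}")
print(f"variance of sample median: {var_median:.3f}")
```

Under these assumptions the sample mean should show the smaller variance, which is consistent with the theorem but covers only this one alternative estimator.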


Summary

Assume a distribution f(x) with an unknown parameter θ and a random sample X1,...,Xn from this population.

• Basic concepts

  – Point estimator: a statistic used to estimate the parameter $\theta$, denoted as $\hat{\theta}$. The observed value of $\hat{\theta}$ corresponding to a specific sample is called a point estimate.

  – Unbiasedness: $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$. Otherwise, the bias is $B(\hat{\theta}) = E(\hat{\theta}) - \theta$. When two estimators $\hat{\theta}_1, \hat{\theta}_2$ are both unbiased, we prefer the one with smaller variance.

  – The unbiased estimator $\hat{\theta}^*$ with the smallest variance is called a minimum variance unbiased estimator (MVUE) for $\theta$.

• Important results

  – Sample mean $\bar{X} = \frac{1}{n} \sum X_i$ is always unbiased (as an estimator for population mean $\mu$). For example,

    ∗ For Normal populations $N(\mu, \sigma^2)$, $\bar{X}$ is unbiased for $\mu$;
    ∗ For Poisson populations Pois($\lambda$), $\bar{X}$ is unbiased for $\lambda$;
    ∗ For Uniform distributions Unif(0, $\theta$), $\bar{X}$ is unbiased for $\theta/2$.

    In the case of a normal population $N(\mu, \sigma^2)$, $\bar{X}$ also has the smallest variance (among all unbiased estimators) and thus is an MVUE for $\mu$.

  – Sample variance $S^2 = \frac{1}{n-1} \sum (X_i - \bar{X})^2$ is always unbiased (as an estimator for population variance $\sigma^2$). For example:

    ∗ For Normal populations $N(\mu, \sigma^2)$, $S^2$ is unbiased for $\sigma^2$;
    ∗ For Poisson populations Pois($\lambda$), $S^2$ is unbiased for $\lambda$.

    Note that $S'^2 = \frac{1}{n} \sum (X_i - \bar{X})^2$ is always a biased estimator for $\sigma^2$; the bias is $B(S'^2) = -\frac{1}{n} \sigma^2$.
