Parameter Estimation

"Numero, pondere et mensura Deus omnia condidit": God created everything by number, weight and measure.

Population, Random Variable, Sample

population: all male US army recruits
sample: the first 10 male conscripts entering the US army recruitment office in Concord, NH on 11-5-2004
random variable X: the BMI of a male US army recruit, chosen at a randomly selected time from a randomly selected recruitment office
x: a "realisation" of X (an observation)

Some Conventions

Random variables are denoted by capital letters; possible values or actual observations ('realisations') are denoted by small letters.

P(X = x): probability that the BMI of a randomly drawn recruit is equal to x
P(X > 18): probability that a randomly drawn recruit is not underweight

Population and Sample

[Diagram: sampling links the population, where the random variable X carries the parameter θ, to the data: data collection yields the realisations x1, ..., xn; inference turns the estimator θ̂(x1, ..., xn) into an estimate θ̂, from which conclusions about the population are drawn.]

Estimation from Samples: Basic Idea

Parameter             Observations             Estimator θ̂(x1, ..., xn)
π  (probability)      0, 0, 1, 1, 0, 1, ...    π̂ = k/n  (proportion)
µ  (expected value)   1.23, 4.81, 7.55, ...    µ̂ = x̄  (sample mean)
σ² (variance)         12.4, 19.6, 20.4, ...    σ̂² = s²  (sample variance)

Coin Tossing

π: probability of heads

$$\hat\pi = \frac{k}{n} = \frac{6}{10} = 0.6$$

X: number of heads in 10 tosses; X follows a Bin(π, 10) distribution.

$$P_\pi(X = 6) = 210 \cdot \pi^6 \cdot (1-\pi)^4$$

[Figure: P_π(X = 6) as a function of π on (0, 1); the curve peaks at π = 0.6.]

Likelihood and Probability

The likelihood of a parameter, given the data, equals the probability of the data, given the parameter:

$$L(\theta \mid x) = P_\theta(X = x)$$

Maximum Likelihood Principle

On the basis of data, a parameter is estimated by its most likely value, i.e. by the value that maximises the probability of the data. This means that θ̂ is chosen such that

$$L(\hat\theta \mid x) = \max_\theta L(\theta \mid x)$$
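As a numerical illustration of the principle, here is a minimal Python sketch (NumPy assumed; variable names are ours) that maximises the coin-tossing likelihood from above over a grid of candidate values for π:

```python
from math import comb
import numpy as np

# Coin-tossing example: k = 6 heads in n = 10 tosses
n, k = 10, 6

# Binomial likelihood L(pi | k) evaluated on a grid of candidate values
grid = np.linspace(0.001, 0.999, 999)
likelihood = comb(n, k) * grid**k * (1 - grid)**(n - k)

# The maximum-likelihood estimate is the maximiser of the likelihood
pi_hat = grid[np.argmax(likelihood)]
print(pi_hat)  # ~0.6, matching the closed-form result k/n
```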

AB0 Blood Groups

In a sample of 75 individuals from a particular population, 10 were found to have blood group B. What is the frequency, π, of blood group B in that population?

$$L(\pi \mid k = 10) = 8.3 \times 10^{11} \cdot \pi^{10} \cdot (1-\pi)^{65}$$

$$\hat\pi = \frac{10}{75} = 0.133$$

[Figure: the likelihood as a function of π on (0, 1); the curve peaks at π̂ = 0.133.]

n k n−k L(π | k) =   ⋅ π ⋅ (1 − π) k  log{L(π| k)} = const + k ⋅ log(π) + (n − k) ⋅ log(1 − π)

δlog{L(π | k)} k n − k k = − = 0 π = δπ π 1 − π n

ˆπ = k / n is the maximum likelihood estimator of π Female Body-Mass Index (BMI)

Female Body-Mass Index (BMI)

Year  Name                   BMI
1984  Suzette Charles        17.7
1985  Sharlene Wells         18.2
1986  Susan Akin             16.8
1987  Kellye Cash            17.6
1988  Kaye Lani Rae Rafko    18.8
1989  Gretchen Carlson       19.1
1990  Debbye Turner          17.9
1991  Marjorie Vincent       17.8
1998  Kate Shindle           20.2
1999  Nicole Johnson         19.6
2001  Angela Perez Baraquio  20.3
2002  Katie Harman           19.5

What is the expected value, µ, of the BMI of a US beauty queen? Is it somewhere around x̄ = 18.6?

Maximum Likelihood Principle: Normal Distribution

$$L(\mu \mid x_1, \ldots, x_n) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x_i-\mu)^2}{2\sigma^2}}$$

$$\log L(\mu \mid x_1, \ldots, x_n) = \text{const} - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2$$

$$\frac{\partial \log L(\mu \mid x_1, \ldots, x_n)}{\partial \mu} = -\frac{1}{\sigma^2}\left(n \cdot \mu - \sum_{i=1}^{n} x_i\right) = 0 \quad\Longrightarrow\quad \mu = \frac{1}{n} \sum_{i=1}^{n} x_i$$

µ̂ = x̄ is the maximum likelihood estimator of µ.
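For illustration, a minimal Python sketch (NumPy assumed; function and variable names are ours) that maximises this log-likelihood numerically over µ for the BMI data above and recovers the sample mean:

```python
import numpy as np

# Miss America BMI values from the table above
bmi = np.array([17.7, 18.2, 16.8, 17.6, 18.8, 19.1,
                17.9, 17.8, 20.2, 19.6, 20.3, 19.5])

# Normal log-likelihood in mu; the maximiser does not depend on sigma
def log_lik(mu, x, sigma=1.0):
    return -np.sum((x - mu) ** 2) / (2 * sigma ** 2)

# Maximise over a grid of candidate values for mu
grid = np.linspace(15, 22, 7001)
mu_hat = grid[np.argmax([log_lik(m, bmi) for m in grid])]

print(mu_hat, bmi.mean())  # both ~18.6: the MLE is the sample mean
```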

Estimators as Random Variables

Owing to the random nature of the sampling process, every estimator θ̂ is itself a random variable, with expected value E(θ̂) and variance Var(θ̂).

$$\sqrt{\operatorname{Var}(\hat\theta)}$$

is called the 'standard error' of θ̂.

Distribution of the Sample Mean

Let X₁, ..., Xₙ be independent and have the same distribution with E(Xᵢ) = µ and Var(Xᵢ) = σ².

$$E(\bar X) = E\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n \cdot \mu = \mu$$

$$\operatorname{Var}(\bar X) = \operatorname{Var}\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n^2} \sum_{i=1}^{n} \operatorname{Var}(X_i) = \frac{1}{n^2} \cdot n \cdot \sigma^2 = \frac{\sigma^2}{n}$$

standard error: $\sigma / \sqrt{n}$
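The σ/√n behaviour is easy to confirm by simulation; a Python sketch with arbitrarily chosen µ = 6 and σ = 2 (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma = 6.0, 2.0

# For each sample size, draw 10,000 replicate samples and compare the
# empirical standard deviation of the sample mean with sigma/sqrt(n)
for n in (10, 100, 500):
    means = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    print(n, round(means.std(), 3), round(sigma / np.sqrt(n), 3))
```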

Accuracy and Precision

Accuracy relates to the difference between the expected value of an estimator and the true value; precision relates to the variance of an estimator.

[Figure: four target diagrams illustrating the combinations accurate/precise, accurate/not precise, not accurate/precise, not accurate/not precise.]

"Good" Estimators

A good estimator should be

unbiased: E(θ̂) = θ
("100% accurate, i.e. gives the correct value on average")

consistent: P(|θ̂ − θ| > ε) → 0 as n → ∞
("gives more accurate and precise values, and approaches the correct one, as the sample size increases")

efficient: Var(θ̂) minimal
("no other unbiased, i.e. 100% accurate, estimator gives more precise estimates")

In short: unbiased (100% accurate), consistent (accuracy and precision increase to 100% with increasing sample size), efficient (the most precise among the 100% accurate).

"Good" Estimators: Maximum Likelihood

Maximum likelihood estimators are generally

- consistent
- asymptotically efficient

but NOT always unbiased.

Unbiased Estimators: Probability

Let X have a Bin(π, n) distribution and let π̂ = k/n.

$$E(\hat\pi) = E\!\left(\frac{X}{n}\right) = \frac{1}{n} \cdot E(X) = \frac{1}{n} \cdot \sum_{k=0}^{n} k \cdot \binom{n}{k} \cdot \pi^k \cdot (1-\pi)^{n-k} = \ldots = \frac{1}{n} \cdot n \cdot \pi = \pi$$

π̂ = k/n is an unbiased estimator of π.

Coin Tossing

π̂: proportion of heads in n tosses

[Figure: π̂ over 100 replicates for n = 50, 100 and 500 tosses; the spread of the estimates narrows as n increases.]
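The simulation behind this figure can be sketched in a few lines (NumPy assumed; the fair-coin value π = 0.5 is our assumption, since the slide does not state it):

```python
import numpy as np

rng = np.random.default_rng(7)
pi_true, replicates = 0.5, 100

# For each n, estimate pi as the proportion of heads in n tosses,
# repeated over 100 replicate experiments
for n in (50, 100, 500):
    pi_hat = rng.binomial(n, pi_true, size=replicates) / n
    print(n, round(pi_hat.mean(), 3), round(pi_hat.std(), 3))  # spread shrinks with n
```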

Unbiased Estimators: Expected Value

Let X₁, ..., Xₙ have the same distribution with E(Xᵢ) = µ, and let

$$\hat\mu = \bar x = \frac{1}{n} \sum_{i=1}^{n} x_i$$

$$E(\hat\mu) = E\!\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \cdot n \cdot \mu = \mu$$

µ̂ = x̄ is an unbiased estimator of µ.

Game of Dice

Xᵢ: count of a single throw (i = 1, ..., n); X̄: average count of n throws

[Figure: X̄ over 100 replicates for n = 10, 100 and 500 throws; the values scatter around the true expected value 3.5, and the scatter narrows as n increases.]

Unbiased Estimators: Variance

Let X₁, ..., Xₙ be independent and have the same distribution with Var(Xᵢ) = σ², and let

$$\hat\sigma^2 = s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar x)^2$$

$$E(\hat\sigma^2) = E\!\left(\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar X)^2\right) = \ldots = \frac{1}{n-1} \cdot (n-1) \cdot \left[E(X_1^2) - \mu^2\right] = \sigma^2$$

σ̂² = s² is an unbiased estimator of σ².
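The role of the n−1 denominator can be checked by simulation; a sketch with an arbitrary true σ² = 1 (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(3)
sigma2_true, n = 1.0, 10

# Compare the n-1 and n denominators over many replicate samples
samples = rng.normal(0.0, np.sqrt(sigma2_true), size=(100_000, n))
s2_unbiased = samples.var(axis=1, ddof=1)  # divide by n-1
s2_biased = samples.var(axis=1, ddof=0)    # divide by n

print(s2_unbiased.mean())  # ~1.0: unbiased
print(s2_biased.mean())    # ~0.9 = (n-1)/n: biased downwards
```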

(Continuous) Uniform Distribution

$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{for } a \le x \le b \\[4pt] 0 & \text{otherwise} \end{cases}$$

$$E(X) = \frac{b+a}{2} \qquad \operatorname{Var}(X) = \frac{(b-a)^2}{12}$$

[Figure: the density, a rectangle of height 1/(b−a) between a and b.]

Uniform Distribution

a = 4, b = 8, so µ = 6 and σ² = 1.33.

[Figure: s² over 100 replicates for sample sizes n = 5 to 25; the values scatter around the true variance 1.33, with decreasing spread as n increases.]

Consistent Estimators

An estimator is called 'consistent' if its accuracy and precision increase with sample size and converge towards 100% as the sample size approaches infinity:

$$P(|\hat\theta_n - \theta| > \varepsilon) \;\to\; 0 \quad \text{as } n \to \infty$$

π̂ = k/n is a consistent estimator of π.
µ̂ = x̄ is a consistent estimator of µ.
σ̂² = s² is a consistent estimator of σ².

Efficient Estimators

An unbiased estimator is called 'efficient' if any other unbiased estimator varies more, i.e. if Var(θ̂) is minimal.

π̂ = k/n is an efficient estimator of π.
µ̂ = x̄ is usually an efficient estimator of µ.
σ̂² = s² is usually an efficient estimator of σ².

Confidence Interval

For most continuous random variables,

$$P(\hat\theta = \theta) = 0$$

i.e. it is improbable for the estimator to "hit the correct parameter on the head". It is often more sensible to estimate θ by an interval that, with a certain "confidence", includes the true value of θ.

Confidence Interval: Definition

A confidence interval is a rule that assigns an interval I(x) to sample data x such that, for every possible value θ of the parameter under study,

$$P_\theta\big(x : \theta \in I(x)\big) \ge 1 - \alpha$$

holds. Once the sample data x have been obtained and the confidence interval I(x) calculated, the researcher has confidence 1−α that I(x) contains the true θ.

1−α is called the 'confidence level' (usually 0.95).

Confidence Interval: Expected Value

According to the Central Limit Theorem,

$$\frac{\bar X - \mu}{\sigma / \sqrt{n}}$$

follows an N(0, 1) distribution for sufficiently large n. Hence

$$P\!\left(-1.96 \le \frac{\bar X - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

$$P\!\left(\bar X - 1.96 \cdot \frac{\sigma}{\sqrt{n}} \le \mu \le \bar X + 1.96 \cdot \frac{\sigma}{\sqrt{n}}\right) = 0.95$$
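The coverage claim can be checked by simulation; a Python sketch with arbitrarily chosen µ, σ and n (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(11)
mu, sigma, n = 6.0, 2.0, 50

# Count how often the 95% interval contains the true mu
covered = 0
for _ in range(10_000):
    x = rng.normal(mu, sigma, n)
    half = 1.96 * sigma / np.sqrt(n)
    covered += (x.mean() - half <= mu <= x.mean() + half)

print(covered / 10_000)  # ~0.95
```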

Confidence Interval: Expected Value

$$\bar x \pm 1.96 \cdot \frac{\sigma}{\sqrt{n}}$$

demarcates an interval that will include the true expected value µ with probability 0.95 (i.e. in 95% of independent replicates of the experiment).

Female Body-Mass Index (BMI)

95% confidence interval for the expected value of the BMI data above (assuming σ = 1.2):

$$18.6 \pm 1.96 \cdot \frac{1.2}{\sqrt{12}}$$

or (17.9, 19.3).
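Reproducing this interval in Python (NumPy assumed; the data and σ = 1.2 are taken from the slide):

```python
import numpy as np

bmi = np.array([17.7, 18.2, 16.8, 17.6, 18.8, 19.1,
                17.9, 17.8, 20.2, 19.6, 20.3, 19.5])
sigma = 1.2  # assumed known, as on the slide

# 95% confidence interval for the expected value, sigma known
half_width = 1.96 * sigma / np.sqrt(len(bmi))
print(bmi.mean() - half_width, bmi.mean() + half_width)  # ~(17.9, 19.3)
```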

Confidence Interval: Expected Value

σ known:

$$\bar x \pm z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

σ unknown:

$$\bar x \pm t_{1-\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

where t_{1−α/2,n−1} is the 1−α/2 quantile of a t distribution with n−1 degrees of freedom.

[Figure: standard normal density with the quantiles z_{α/2} and z_{1−α/2} cutting off α/2 in each tail.]
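For comparison, a sketch of the σ-unknown interval for the same BMI data, using the t quantile (SciPy assumed):

```python
import numpy as np
from scipy import stats

bmi = np.array([17.7, 18.2, 16.8, 17.6, 18.8, 19.1,
                17.9, 17.8, 20.2, 19.6, 20.3, 19.5])

# sigma unknown: replace z by the t quantile and sigma by s
n, alpha = len(bmi), 0.05
t = stats.t.ppf(1 - alpha / 2, df=n - 1)
half_width = t * bmi.std(ddof=1) / np.sqrt(n)
print(bmi.mean() - half_width, bmi.mean() + half_width)  # ~(17.9, 19.3)
```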

Student's t-Distribution

[Figure: densities of the t distribution with 1, 2, 3 and 500 degrees of freedom; with increasing degrees of freedom the curve approaches the standard normal density.]

William S. Gosset (1876-1937)

How to Get Better Estimates: Expected Value

$$\operatorname{Var}(\bar X) = \frac{\sigma^2}{n}$$

i.e. the standard error of the sample mean decreases with increasing sample size.

$$\text{CI: } \bar x \pm z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \qquad\qquad \text{CI: } \bar x \pm t_{1-\alpha/2,\,n-1} \cdot \frac{s}{\sqrt{n}}$$

i.e. the width of a confidence interval decreases with increasing sample size.

How to Get Better Estimates: Expected Value

Width W of the CI:

$$W = 2 \cdot z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$$

As 1−α increases, z_{1−α/2} increases, and hence W increases: the width of a confidence interval increases with increasing confidence.

Confidence Interval: Sample Size

$$W = 2 \cdot z_{1-\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} \quad\Longrightarrow\quad n = \left(\frac{2 \cdot z_{1-\alpha/2} \cdot \sigma}{W}\right)^2$$

How many observations are required to estimate the expected male BMI with a 95% confidence interval spanning less than 2 kg/m² (assuming that the BMI follows a Normal distribution with σ = 2.5)?

Answer:

$$n = \left(\frac{2 \cdot 1.96 \cdot 2.5}{2}\right)^2 = 24.01$$

i.e. at least 25 observations are needed.
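The same calculation as a small helper function (SciPy assumed; required_n is an illustrative name), rounding up to the next whole observation:

```python
import numpy as np
from scipy import stats

# Required n for a CI of width W (sigma assumed known)
def required_n(sigma, width, alpha=0.05):
    z = stats.norm.ppf(1 - alpha / 2)
    return int(np.ceil((2 * z * sigma / width) ** 2))

print(required_n(sigma=2.5, width=2.0))  # 25
```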

Summary

- Estimation denotes the scientific process of inferring population parameters from sample data.
- An estimator is a mathematical rule that stipulates how to calculate a parameter estimate from observations.
- Estimators such as the sample mean are themselves random variables, with an expected value and a variance.
- Good estimators should be unbiased (accurate), efficient (most precise at a given sample size) and consistent (increasingly precise with increasing sample size).
- Instead of providing 'point estimates', confidence intervals represent regions that include a parameter with a certain confidence.