Lecture 5: Point Estimation, Bias, and the Method of Moments

MATH 667-01 University of Louisville

September 12, 2019 Last modified: 9/12/2019

Introduction

We start with definitions related to estimation.[1] Then we discuss loss functions, risk, bias, and variance.[2] Finally, we consider parameter estimation by the method of moments.[3]

[1] CB: Section 7.1, HMC: Section 4.1
[2] CB: Section 7.3, HMC: Section 4.1
[3] CB: Section 7.2

We consider point estimation of the unknown parameter θ (or a function of the unknown parameter) in a parametric model X ∼ f_X(x; θ).

Usually, we assume X1,...,Xn is a random sample from a population with pdf/pmf f(x; θ). Estimates θ̂ of θ based on observed x1, . . . , xn give us an estimated model from the parametric family.

Definition L5.1:[4] A point estimator is any function W(X1,...,Xn) of a sample; that is, any statistic is a point estimator.

Note that an estimator is a function of the sample X1,...,Xn, so it is random. In contrast, we refer to the observed value of a point estimator based on realized data values x1, . . . , xn as a point estimate. So, the point estimate W(x1, . . . , xn) is not random.

[4] CB: Definition 7.1.1 on p.311, HMC: page 226

We would like to know if an estimator is likely to give “good” estimates of the parameters based on observed data x1, . . . , xn. Or, given competing estimators, we want to know which one is likely to perform “best” at estimating the parameters. So, we need to define what we mean by “good” or “best” and describe a mathematical framework for evaluating estimators.

Loss and Risk Functions

Definition L5.2:[5] Suppose X is a random vector with pmf/pdf f(x; θ), where θ is in the parameter space Θ. Define δ to be a decision rule for deciding on a point estimate for θ. Then a loss function L(θ, δ) is a real-valued function that assigns a number in the interval [0, ∞) to each θ ∈ Θ and each decision rule δ. Some common loss functions for Θ ⊂ R are

squared error loss: L2(θ, δ) = (θ − δ)²
absolute error loss: L1(θ, δ) = |θ − δ|.

Definition L5.3:[6] The risk function of a decision rule δ(x) for estimating θ based on observed data x is

R(θ, δ) = Eθ [L(θ, δ(X))] .

[5] CB: page 348, HMC: page 414
[6] CB: page 349, HMC: page 414

Mean Squared Error and Bias

The risk function of an estimator under squared-error loss has another name.

Definition L5.4:[7] The mean squared error (MSE) of an estimator W of a parameter θ is the function of θ defined by Eθ[(W − θ)²].

Definition L5.5:[8] The bias of a point estimator W of a parameter θ is the difference between the expected value of W and θ. That is,

Biasθ[W ] = Eθ [W ] − θ.

An estimator is called unbiased if it satisfies

Eθ[W ] = θ for all θ.

[7] CB: Definition 7.3.1 on p.330, HMC: page 415
[8] CB: Definition 7.3.2 on p.330

The MSE can be split into two parts, one that measures the variability (precision) and one that measures the bias (accuracy). Theorem L5.1: If Var[W ] exists, then

Eθ[(W − θ)²] = Varθ[W] + (Biasθ[W])².

Proof of Theorem L5.1:

Eθ[(W − θ)²] = Eθ[((W − Eθ[W]) + (Eθ[W] − θ))²]
             = Eθ[(W − Eθ[W])²] + 2 Eθ[W − Eθ[W]] (Eθ[W] − θ) + (Eθ[W] − θ)²
             = Eθ[(W − Eθ[W])²] + (Eθ[W] − θ)²     (since Eθ[W − Eθ[W]] = 0)
             = Varθ[W] + (Biasθ[W])²
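Theorem L5.1 can be checked numerically. Below is a sketch in Python (not part of the lecture, which uses R), applying a deliberately biased estimator W = nX̄/(n + 1) of a normal mean; the estimator and parameter values here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, theta = 9, 100_000, 5.0

# W is a (deliberately) biased estimator of theta: a shrunken sample mean
samples = rng.normal(theta, 2.0, size=(reps, n))
W = samples.mean(axis=1) * n / (n + 1)

mse = np.mean((W - theta) ** 2)
bias = W.mean() - theta
var = np.var(W)  # ddof=0, i.e. the divide-by-reps empirical variance

# MSE = Var + Bias^2
print(mse, var + bias**2)
```

Because the empirical variance and bias are computed from the same draws, the identity holds exactly (up to floating-point error) for the simulated values, not merely approximately.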

Simulation: Comparison of the Sample Mean and Median

Suppose X1,...,X9 are iid Normal(0, 100) random variables and we want to estimate the population mean (equivalently, the population median). In R, the data can be simulated using the following code:

> set.seed(57487)
> x=rnorm(9,sd=10)
> x
[1]   2.497918   1.381941  11.905189   2.045325
[5] -10.494095  -1.985419  -6.860869  17.552160
[9]  -7.265646


The estimates (sample mean and sample median) can be computed as follows.

> mean(x)
[1] 0.9751672
> median(x)
[1] 1.381941

So for the sample x = (x1, . . . , x9), the squared error losses are L2(µ, x̄) = (0 − 0.9751672)² ≈ 0.95 and L2(µ, x(5)) = (0 − 1.381941)² ≈ 1.91, and the absolute error losses are L1(µ, x̄) = |0 − 0.9751672| ≈ 0.975 and L1(µ, x(5)) = |0 − 1.381941| ≈ 1.38.


To see what happens on average, we need to repeat this simulation many times. The following R code repeats this calculation 10,000 times.

> set.seed(93873)
> R=1e4
> sample.mean=rep(0,R)
> sample.median=rep(0,R)
> for (i in 1:R){
+   x=rnorm(9,sd=10)
+   sample.mean[i]=mean(x)
+   sample.median[i]=median(x)
+   cat(round(x)," mean=",round(mean(x),2)," median=",
+     round(median(x),2),"\n")
+ }


Here is some output from the 10,000 simulated data sets.

 -5  -7  13   9  15   3  -2  12  -2   mean= 3.8    median= 2.72
-15  -2  11 -13   2  -3   5   0   2   mean= -1.5   median= -0.4
 -2   1 -11  -2 -13  -9  -6  -3   7   mean= -4.08  median= -2.83
 24  -2  10   4  -4  -4  14  -8  11   mean= 4.88   median= 4.27
 -5   0  18  15   5  13   1   7   3   mean= 6.51   median= 5.43
...
  8  -8 -15 -12   7   6  22 -23  -9   mean= -2.62  median= -8.41
  1  11  -9  -7   1  12  13  -2  16   mean= 3.95   median= 0.94
 -4   3   2  15   6   5   4  13  16   mean= 6.76   median= 4.72
 20  -6  -2   8  -1 -12  17  -5 -14   mean= 0.52   median= -1.6
 -1  -4 -15  -4  24 -18 -10  17  -6   mean= -1.99  median= -3.67


Now, we can use the 10,000 simulations to estimate the risk under each of the loss functions.

> mean((0-sample.mean)^2)
[1] 11.37269
> mean((0-sample.median)^2)
[1] 16.65672
> mean(abs(0-sample.mean))
[1] 2.699447
> mean(abs(0-sample.median))
[1] 3.2542

So, in this simulation from the Normal(0, 100) distribution under squared error loss, the risk of the sample mean is estimated to be R̂(µ, x̄) = 11.37, which is smaller than the estimated risk of the sample median, R̂(µ, x(5)) = 16.66.
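For comparison, the same study can be sketched in Python (an illustrative translation assuming NumPy, not part of the lecture; the seed and vectorized layout differ from the R code, so the exact numbers will too).

```python
import numpy as np

rng = np.random.default_rng(93873)
reps, n = 10_000, 9

# n = 9 draws from Normal(0, 100) in each of 10,000 replications
samples = rng.normal(0.0, 10.0, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Estimated risks under squared error loss (true mean is 0)
print(np.mean(means**2), np.mean(medians**2))
```

As in the R output, the estimated risk of the sample mean should come out smaller than that of the sample median for normal data.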


Instead, suppose X1,...,X9 are iid Cauchy random variables having pdf

f(x) = 1/(π(1 + x²)),

and we want to estimate the population median. We perform the simulation study replacing the sample from the Normal(0, 100) distribution with one from a Cauchy distribution.

> set.seed(626913)
> R=1e4
> sample.mean=rep(0,R)
> sample.median=rep(0,R)
> for (i in 1:R){
+   x=rt(9,df=1)
+   sample.mean[i]=mean(x)
+   sample.median[i]=median(x)
+ }

> mean((0-sample.mean)^2)
[1] 228501.2
> mean((0-sample.median)^2)
[1] 0.4044862
> mean(abs(0-sample.mean))
[1] 10.4965
> mean(abs(0-sample.median))
[1] 0.4696931

So, in this simulation from the Cauchy distribution under squared error loss, the risk of the sample mean is estimated to be R̂(µ, x̄) ≈ 228501, which is much larger than the estimated risk of the sample median, R̂(µ, x(5)) ≈ 0.4045. (In fact, the Cauchy distribution has no finite mean or variance, so under squared error loss the true risk of the sample mean is infinite.)
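The Cauchy version of the Python sketch (again an illustrative translation, not from the lecture; exact numbers depend on the seed, but the ordering of the risks is the point).

```python
import numpy as np

rng = np.random.default_rng(626913)
reps, n = 10_000, 9

# Standard Cauchy = Student's t with 1 degree of freedom
samples = rng.standard_cauchy(size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# For Cauchy data the ordering reverses: the median is far more stable
print(np.mean(means**2), np.mean(medians**2))
```

This illustrates why "best estimator" depends on the model: the sample mean of Cauchy data is itself Cauchy-distributed, so averaging more observations does not stabilize it.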

Method of Moments

This is a simple approach based on the sample and population moments.[9]

Let X1,...,Xn be a sample from a population with pdf/pmf f(x; θ1, . . . , θk). The method of moments estimator of the parameters is denoted by (θ̃1, . . . , θ̃k) and is obtained by solving the equations

m1 = µ′1(θ1, . . . , θk)
⋮
mk = µ′k(θ1, . . . , θk)

for (θ1, . . . , θk), where mj = (1/n) Σ_{i=1}^n Xi^j and µ′j = E[X1^j] for j = 1, . . . , k.

[9] CB: Section 7.2, HMC: Exercise 3.1.18 on p.165

Example L5.1: Let X1,...,Xn be a random sample from a Bernoulli(p) distribution, which has probability mass function

P(X = x) = p^x (1 − p)^(1−x) I{0,1}(x)

where p ∈ [0, 1]. Find the method of moments estimator of p.

Answer to Example L5.1: Setting m1 = µ′1, where m1 = X̄ and µ′1 = E[X1] = p, the method of moments estimator is p̃ = X̄.

Example L5.2: Suppose 10 voters are randomly selected in an exit poll and 4 voters say that they voted for the incumbent. What is the method of moments estimate of p?

Answer to Example L5.2: The method of moments estimate of the proportion of all voters who voted for the incumbent is p̃ = (Σ_{i=1}^n xi)/n = 4/10 = 0.4.
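Example L5.2 amounts to a single sample mean; here is a minimal Python check (the 0/1 vector below is a hypothetical encoding of the exit poll, with 1 meaning the voter chose the incumbent).

```python
import numpy as np

# Hypothetical encoding of the exit poll: 4 of 10 voters chose the incumbent
x = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# Method of moments: set the first sample moment equal to E[X1] = p
p_tilde = x.mean()
print(p_tilde)  # 0.4
```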


Example L5.3: Suppose X1,...,Xn are iid Normal(µ, σ²) random variables. Find the method of moments estimator of (µ, σ²).

Answer to Example L5.3: Here µ′1 = E[X] = µ and µ′2 = E[X²] = Var[X] + (E[X])² = σ² + µ². So, we have µ̃ = m1 = X̄ and σ̃² + µ̃² = m2. Solving for σ̃², we obtain

σ̃² = m2 − µ̃²
   = (1/n) Σ_{i=1}^n Xi² − X̄²
   = (Σ_{i=1}^n Xi² − nX̄²)/n
   = (1/n) Σ_{i=1}^n (Xi − X̄)².

Thus, (µ̃, σ̃²) = (X̄, (1/n) Σ_{i=1}^n (Xi − X̄)²).
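The closed form for σ̃² can be sanity-checked in Python (a sketch assuming NumPy, not from the lecture; note that np.var divides by n by default, matching the method of moments estimator rather than the unbiased divide-by-(n−1) sample variance).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3.0, 2.0, size=50)

# Method of moments for Normal(mu, sigma^2): match the first two moments
mu_tilde = x.mean()                         # m1
sigma2_tilde = (x**2).mean() - mu_tilde**2  # m2 - m1^2

# Algebraically identical to the divide-by-n variance (np.var with ddof=0)
print(sigma2_tilde, np.var(x))
```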


Example L5.4: Suppose X1,...,Xn are iid Uniform(−θ, θ) random variables, which have probability density function

f(x; θ) = (1/(2θ)) I(−θ,θ)(x)

where θ > 0. Find a method of moments estimator of θ based on the second moment. Note that since E[X] = 0, the first moment cannot be used to estimate θ.


Answer to Example L5.4: Since

µ′2 = ∫_{−θ}^{θ} x² (1/(2θ)) dx = (1/(2θ)) [x³/3]_{−θ}^{θ} = (1/(2θ)) (θ³/3 − (−θ³/3)) = θ²/3,

we solve the equation

(1/n) Σ_{i=1}^n Xi² = θ²/3

for θ and, since θ > 0, take the positive root to obtain

θ̃ = √((3/n) Σ_{i=1}^n Xi²).
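A quick numerical check of this estimator in Python (a sketch, not from the lecture; the true value θ = 2 and the sample size are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 2.0
x = rng.uniform(-theta, theta, size=100_000)

# Method of moments via the second moment: m2 estimates theta^2 / 3
m2 = (x**2).mean()
theta_tilde = np.sqrt(3 * m2)
print(theta_tilde)  # should be close to 2
```

With n = 100,000 draws the estimate lands very close to the true θ, since m2 concentrates around θ²/3.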
