Lecture 5: Point Estimation, Bias, and the Method of Moments

MATH 667-01 University of Louisville

September 12, 2019 Last modified: 9/12/2019

Introduction

We start with definitions related to estimation.[1] Then we discuss loss functions, risk, bias, and variance.[2] Finally, we consider parameter estimation by the method of moments.[3]

[1] CB: Section 7.1, HMC: Section 4.1
[2] CB: Section 7.3, HMC: Section 4.1
[3] CB: Section 7.2

We consider point estimation of the unknown parameter θ (or a function of the unknown parameter) in a parametric model X ∼ f_X(x; θ).

Usually, we assume X1,...,Xn is a random sample from a population with pdf/pmf f(x; θ). Estimates θ̂ of θ based on observed x1, . . . , xn give us an estimated model from the parametric family.

Definition L5.1:[4] A point estimator is any function W(X1,...,Xn) of a sample; that is, any statistic is a point estimator.

Note that an estimator is a function of the sample X1,...,Xn, so it is random. In contrast, we refer to the observed value of a point estimator based on realized data values x1, . . . , xn as a point estimate. So, the point estimate W(x1, . . . , xn) is not random.

[4] CB: Definition 7.1.1 on p.311, HMC: page 226

We would like to know if an estimator is likely to give “good” estimates of the parameters based on observed data x1, . . . , xn. Or, given competing estimators, we want to know which one is likely to perform “best” at estimating the parameters. So, we need to define what we mean by “good” or “best” and describe a mathematical framework for evaluating estimators.

Loss and Risk Functions

Definition L5.2:[5] Suppose X is a random vector with pmf/pdf f(x; θ), where θ is in the parameter space Θ. Define δ to be a decision rule for deciding on a point estimate for θ. Then a loss function L(θ, δ) is a real-valued function that assigns a number in the interval [0, ∞) to each θ ∈ Θ and each decision rule δ. Some common loss functions for Θ ⊂ R are

squared error loss: L2(θ, δ) = (θ − δ)²
absolute error loss: L1(θ, δ) = |θ − δ|.

Definition L5.3:[6] The risk function of a decision rule δ(x) for estimating θ based on observed data x is

R(θ, δ) = Eθ [L(θ, δ(X))] .

[5] CB: page 348, HMC: page 414
[6] CB: page 349, HMC: page 414

Mean Squared Error and Bias

The risk function of an estimator under squared-error loss has another name.

Definition L5.4:[7] The mean squared error (MSE) of an estimator W of a parameter θ is the function of θ defined by Eθ[(W − θ)²].

Definition L5.5:[8] The bias of a point estimator W of a parameter θ is the difference between the expected value of W and θ. That is,

Biasθ[W ] = Eθ [W ] − θ.

An estimator is called unbiased if it satisfies

Eθ[W ] = θ for all θ.

[7] CB: Definition 7.3.1 on p.330, HMC: page 415
[8] CB: Definition 7.3.2 on p.330

The MSE can be split into two parts, one that measures the variability (precision) and one that measures the bias (accuracy). Theorem L5.1: If Var[W ] exists, then

Eθ[(W − θ)²] = Varθ[W] + (Biasθ[W])².

Proof of Theorem L5.1:

Eθ[(W − θ)²] = Eθ[((W − Eθ[W]) + (Eθ[W] − θ))²]
             = Eθ[(W − Eθ[W])²] + 2 Eθ[W − Eθ[W]] (Eθ[W] − θ) + (Eθ[W] − θ)²
             = Eθ[(W − Eθ[W])²] + (Eθ[W] − θ)²     (since Eθ[W − Eθ[W]] = 0)
             = Varθ[W] + (Biasθ[W])²
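Theorem L5.1 can be checked numerically. Below is a sketch in Python (not part of the lecture, which uses R), applying a deliberately biased estimator W = nX̄/(n + 1) of a normal mean; the estimator and parameter values here are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps, theta = 9, 100_000, 5.0

# W is a (deliberately) biased estimator of theta: a shrunken sample mean
samples = rng.normal(theta, 2.0, size=(reps, n))
W = samples.mean(axis=1) * n / (n + 1)

mse = np.mean((W - theta) ** 2)
bias = W.mean() - theta
var = np.var(W)  # ddof=0, i.e. the divide-by-reps empirical variance

# MSE = Var + Bias^2
print(mse, var + bias**2)
```

Because the empirical variance and bias are computed from the same draws, the identity holds exactly (up to floating-point error) for the simulated values, not merely approximately.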

Simulation: Comparison of the Sample Mean and Median

Suppose X1,...,X9 are iid Normal(0, 100) random variables and we want to estimate the population mean (equivalently, the population median). In R, the data can be simulated using the following code:

> set.seed(57487)
> x=rnorm(9,sd=10)
> x
[1]   2.497918   1.381941  11.905189   2.045325
[5] -10.494095  -1.985419  -6.860869  17.552160
[9]  -7.265646


The estimates (sample mean and sample median) can be computed as follows.

> mean(x)
[1] 0.9751672
> median(x)
[1] 1.381941

So for the sample x = (x1, . . . , x9), the squared error losses are L2(µ, x̄) = (0 − 0.9751672)² ≈ 0.95 and L2(µ, x(5)) = (0 − 1.381941)² ≈ 1.91, and the absolute error losses are L1(µ, x̄) = |0 − 0.9751672| ≈ 0.975 and L1(µ, x(5)) = |0 − 1.381941| ≈ 1.38.


To see what happens on average, we need to repeat this simulation many times. The following R code repeats this calculation 10,000 times.

> set.seed(93873)
> R=1e4
> sample.mean=rep(0,R)
> sample.median=rep(0,R)
> for (i in 1:R){
+   x=rnorm(9,sd=10)
+   sample.mean[i]=mean(x)
+   sample.median[i]=median(x)
+   cat(round(x)," mean=",round(mean(x),2)," median=",
+     round(median(x),2),"\n")
+ }


Here is some output from the 10,000 simulated data sets.

 -5  -7  13   9  15   3  -2  12  -2   mean= 3.8    median= 2.72
-15  -2  11 -13   2  -3   5   0   2   mean= -1.5   median= -0.4
 -2   1 -11  -2 -13  -9  -6  -3   7   mean= -4.08  median= -2.83
 24  -2  10   4  -4  -4  14  -8  11   mean= 4.88   median= 4.27
 -5   0  18  15   5  13   1   7   3   mean= 6.51   median= 5.43
...
  8  -8 -15 -12   7   6  22 -23  -9   mean= -2.62  median= -8.41
  1  11  -9  -7   1  12  13  -2  16   mean= 3.95   median= 0.94
 -4   3   2  15   6   5   4  13  16   mean= 6.76   median= 4.72
 20  -6  -2   8  -1 -12  17  -5 -14   mean= 0.52   median= -1.6
 -1  -4 -15  -4  24 -18 -10  17  -6   mean= -1.99  median= -3.67


Now, we can use the 10,000 simulations to estimate the risk under each of the loss functions.

> mean((0-sample.mean)^2)
[1] 11.37269
> mean((0-sample.median)^2)
[1] 16.65672
> mean(abs(0-sample.mean))
[1] 2.699447
> mean(abs(0-sample.median))
[1] 3.2542

So, in this simulation from the Normal(0, 100) distribution under squared error loss, the risk of the sample mean is estimated to be R̂(µ, x̄) = 11.37, which is smaller than the estimated risk of the sample median, R̂(µ, x(5)) = 16.66.
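For comparison, the same study can be sketched in Python (an illustrative translation assuming NumPy, not part of the lecture; the seed and vectorized layout differ from the R code, so the exact numbers will too).

```python
import numpy as np

rng = np.random.default_rng(93873)
reps, n = 10_000, 9

# n = 9 draws from Normal(0, 100) in each of 10,000 replications
samples = rng.normal(0.0, 10.0, size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# Estimated risks under squared error loss (true mean is 0)
print(np.mean(means**2), np.mean(medians**2))
```

As in the R output, the estimated risk of the sample mean should come out smaller than that of the sample median for normal data.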


Instead, suppose X1,...,X9 are iid Cauchy random variables having pdf

f(x) = 1/(π(1 + x²)),

and we want to estimate the population median. We perform the simulation study replacing the sample from the Normal(0, 100) distribution with one from a Cauchy distribution.

> set.seed(626913)
> R=1e4
> sample.mean=rep(0,R)
> sample.median=rep(0,R)
> for (i in 1:R){
+   x=rt(9,df=1)
+   sample.mean[i]=mean(x)
+   sample.median[i]=median(x)
+ }

> mean((0-sample.mean)^2)
[1] 228501.2
> mean((0-sample.median)^2)
[1] 0.4044862
> mean(abs(0-sample.mean))
[1] 10.4965
> mean(abs(0-sample.median))
[1] 0.4696931

So, in this simulation from the Cauchy distribution under squared error loss, the risk of the sample mean is estimated to be R̂(µ, x̄) ≈ 228501, which is much larger than the estimated risk of the sample median, R̂(µ, x(5)) ≈ 0.4045. (In fact, the Cauchy distribution has no finite mean or variance, so under squared error loss the true risk of the sample mean is infinite.)
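The Cauchy version of the Python sketch (again an illustrative translation, not from the lecture; exact numbers depend on the seed, but the ordering of the risks is the point).

```python
import numpy as np

rng = np.random.default_rng(626913)
reps, n = 10_000, 9

# Standard Cauchy = Student's t with 1 degree of freedom
samples = rng.standard_cauchy(size=(reps, n))
means = samples.mean(axis=1)
medians = np.median(samples, axis=1)

# For Cauchy data the ordering reverses: the median is far more stable
print(np.mean(means**2), np.mean(medians**2))
```

This illustrates why "best estimator" depends on the model: the sample mean of Cauchy data is itself Cauchy-distributed, so averaging more observations does not stabilize it.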

Method of Moments

This is a simple approach based on the sample and population moments.[9]

Let X1,...,Xn be a sample from a population with pdf/pmf f(x; θ1, . . . , θk). The method of moments estimator of the parameters is denoted by (θ̃1, . . . , θ̃k) and is obtained by solving the equations

m1 = µ′1(θ1, . . . , θk)
⋮
mk = µ′k(θ1, . . . , θk)

for (θ1, . . . , θk), where mj = (1/n) Σ_{i=1}^n Xi^j and µ′j = E[X1^j] for j = 1, . . . , k.

[9] CB: Section 7.2, HMC: Exercise 3.1.18 on p.165

Example L5.1: Let X1,...,Xn be a random sample from a Bernoulli(p) distribution, which has probability mass function

P(X = x) = p^x (1 − p)^(1−x) I{0,1}(x)

where p ∈ [0, 1]. Find the method of moments estimator of p.

Answer to Example L5.1: Setting m1 = µ′1, where m1 = X̄ and µ′1 = E[X1] = p, the method of moments estimator is p̃ = X̄.

Example L5.2: Suppose 10 voters are randomly selected in an exit poll and 4 voters say that they voted for the incumbent. What is the method of moments estimate of p?

Answer to Example L5.2: The method of moments estimate of the proportion of all voters who voted for the incumbent is p̃ = (Σ_{i=1}^n xi)/n = 4/10 = 0.4.
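Example L5.2 amounts to a single sample mean; here is a minimal Python check (the 0/1 vector below is a hypothetical encoding of the exit poll, with 1 meaning the voter chose the incumbent).

```python
import numpy as np

# Hypothetical encoding of the exit poll: 4 of 10 voters chose the incumbent
x = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])

# Method of moments: set the first sample moment equal to E[X1] = p
p_tilde = x.mean()
print(p_tilde)  # 0.4
```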


Example L5.3: Suppose X1,...,Xn are iid Normal(µ, σ²) random variables. Find the method of moments estimator of (µ, σ²).

Answer to Example L5.3: Here µ′1 = E[X] = µ and µ′2 = E[X²] = Var[X] + (E[X])² = σ² + µ². So, we have µ̃ = m1 = X̄ and σ̃² + µ̃² = m2. Solving for σ̃², we obtain

σ̃² = m2 − µ̃²
   = (1/n) Σ_{i=1}^n Xi² − X̄²
   = (Σ_{i=1}^n Xi² − nX̄²)/n
   = (1/n) Σ_{i=1}^n (Xi − X̄)².

Thus, (µ̃, σ̃²) = (X̄, (1/n) Σ_{i=1}^n (Xi − X̄)²).
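The closed form for σ̃² can be sanity-checked in Python (a sketch assuming NumPy, not from the lecture; note that np.var divides by n by default, matching the method of moments estimator rather than the unbiased divide-by-(n−1) sample variance).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(3.0, 2.0, size=50)

# Method of moments for Normal(mu, sigma^2): match the first two moments
mu_tilde = x.mean()                         # m1
sigma2_tilde = (x**2).mean() - mu_tilde**2  # m2 - m1^2

# Algebraically identical to the divide-by-n variance (np.var with ddof=0)
print(sigma2_tilde, np.var(x))
```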


Example L5.4: Suppose X1,...,Xn are iid Uniform(−θ, θ) random variables, which have probability density function

f(x; θ) = (1/(2θ)) I(−θ,θ)(x)

where θ > 0. Find a method of moments estimator of θ based on the second moment. Note that since E[X] = 0, the first moment cannot be used to estimate θ.


Answer to Example L5.4: Since

µ′2 = ∫_{−θ}^{θ} x² (1/(2θ)) dx = (1/(2θ)) [x³/3]_{−θ}^{θ} = (1/(2θ)) (θ³/3 − (−θ³/3)) = θ²/3,

we solve the equation

(1/n) Σ_{i=1}^n Xi² = θ²/3

for θ and, since θ > 0, take the positive root to obtain

θ̃ = √((3/n) Σ_{i=1}^n Xi²).
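A quick numerical check of this estimator in Python (a sketch, not from the lecture; the true value θ = 2 and the sample size are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 2.0
x = rng.uniform(-theta, theta, size=100_000)

# Method of moments via the second moment: m2 estimates theta^2 / 3
m2 = (x**2).mean()
theta_tilde = np.sqrt(3 * m2)
print(theta_tilde)  # should be close to 2
```

With n = 100,000 draws the estimate lands very close to the true θ, since m2 concentrates around θ²/3.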
