Estimation COMP 245 STATISTICS

Estimation COMP 245 STATISTICS Dr N A Heard Contents 1 Parameter Estimation 2 1.1 Introduction . .2 1.2 Estimators . .2 2 Point Estimates 3 2.1 Introduction . .3 2.2 Bias, Efficiency and Consistency . .4 2.3 Maximum Likelihood Estimation . .5 3 Confidence Intervals 10 3.1 Introduction . 10 3.2 Normal Distribution with Known Variance . 11 3.3 Normal Distribution with Unknown Variance . 11 1 1 Parameter Estimation 1.1 Introduction In statistics we typically analyse a set of data by considering it as a random sample from a larger, underlying population about which we wish to make inference. 1. The chapter on numerical summaries considered various summary sample statistics for describing a particular sample of data. We defined quantities such as the sample mean x¯, and sample variance s2. 2. The chapters on random variables, on the other hand, were concerned with character- ising the underlying population. We defined corresponding population parameters such as the population mean E(X), and population variance Var(X). We noticed a duality between the two sets of definitions of statistics and parameters. In particular, we saw that they were equivalent in the extreme circumstance that our sample exactly represented the entire population (so, for example, the cdf of a new randomly drawn member of the population is precisely the empirical cdf of our sample). Away from this extreme circumstance, the sample statistics can be seen to give approximate values for the corresponding population parameters. We can use them as estimates. For convenient modelling of populations (point 2), we met several simple parameterised probability distributions (e.g. Poi(l), Exp(l), U(a, b), N(m, s2)). There, population parameters such as mean and variance are functions of the distribution parameters. So more generally, we may wish to use the data, or just their sample statistics, to estimate distribution parameters. For a sample of data x = (x1,..., xn), we can consider these observed values as realisations of corresponding random variables X = (X1,..., Xn). If the underlying population (from which the sample has been drawn) is such that the distribution of a single random draw X has probability distribution PXjq(·jq), where q is a generic parameter or vector of parameters, we typically then assume that our n data point random variables X are i.i.d. PXjq(·jq). Note that PXjq(·jq) is the conditional distribution for draws from our model for the population given the true (but unknown) parameter values q. 1.2 Estimators Statistics, Estimators and Estimates Consider a sequence of random variables X = (X1,..., Xn) corresponding to n i.i.d. data samples to be drawn from a population with distribution PX. Let x = (x1,..., xn) be the corresponding realised values we observe for these r.v.s. A statistic is a function T : Rn ! Rp applied to the random variable X. Note that T(X) = ¯ n T(X1,..., Xn) is itself a random variable. For example, X = ∑i=1 Xi/n is a statistic. The corresponding realised value of a statistic T is written t = t(x) (e.g. t = x¯). If a statistic T(X) is to be used to approximate parameters of the distribution PXjq(·jq), we say T is an estimator for those parameters; we call the actual realised value of the estimator for a particular data sample, t(x), an estimate. 2 2 Point Estimates 2.1 Introduction Definition A point estimate is a statistic estimating a single parameter or characteristic of a distribution. For a running example which we will return to, consider a sample of data (x1,..., xn) from an Exponential(l) distribution with unknown l; we might construct a point estimate for either l itself, or perhaps for the mean of the distribution = l−1, or the variance = l−2. Concentrating on the mean of the distribution in this example, a natural estimator of this could be the sample mean, X¯ . But alternatively, we could propose simply the first data point we observe, X1 as our point estimator; or, if the data had been given to us already ordered we might (lazily) suggest the sample median, X(fn+1g/2). How do we quantify which estimator is better? Sampling Distribution Suppose for a moment we actually knew the parameter values q of our population distribution PXjq(·jq) (so suppose we know l in our exponential example). Then since our sampled data are considered to be i.i.d. realisations from this distribution (so each Xi ∼ Exp(l) in that example), it follows that any statistic T = T(X1,..., Xn) is also a random variable with some distribution which also only depends on these parameters. If we are able to (approximately) identify this sampling distribution of our statistic, call it PTjq, we can then find the conditional expectation, variance, etc of our statistic. Sometimes PTjq, will have a convenient closed-form expression which we can derive, but in other situations it will not. For those other situations, then for the special case where our statistic T is the sample mean, then provided that our sample size n is large, we use the CLT to give us an approximate distribution for PTjq. Whatever the form of the population distribution PXjq, we know from the CLT that approximately X¯ ∼ N(E[X], Var[X]/n). For our Xi ∼ Exp(l) example, it can be shown that the statistic T = X¯ is a continuous random variable with pdf (nl)ntn−1e−nlt f (tjl) = , t > 0. Tjl (n − 1)! This is actually the pdf of a Gamma(n, nl) random variable, a well known continuous distribution, so T ∼ Gamma(n, nl). a So using the fact that Gamma(a, b) has expectation , here we have b n 1 E(X¯ ) = E (Tjl) = = = E(X). Tjl nl l So the expected value of X¯ is the true population mean. 3 2.2 Bias, Efficiency and Consistency Bias The previous result suggests that X¯ is, at least in one respect, a good statistic for estimating the unknown mean of an exponential distribution. Formally, we define the bias of an estimator T for a parameter q, bias(T) = E[Tjq] − q. If an estimator has zero bias we say that estimator is unbiased. So in our example, X¯ gives an unbiased estimate of q = l−1, the mean of an exponential distribution. [In contrast, the sample median is a biased estimator of the mean of an exponential distri- 5 bution. For example if n = 3, it can be shown that E(X ) = .] (fn+1g/2) 6l In fact, the unbiasedness of X¯ is true for any distribution; the sample mean x¯ will always be an unbiased estimate for the population mean m: ∑n X ∑n E(X ) nm E(X¯ ) = E i=1 i = i=1 i = = m. n n n Similarly, there is an estimator for the population variance s2 which is unbiased, irrespec- tive of the population distribution. Disappointingly, this is not the sample variance n 2 1 ¯ 2 S = ∑(Xi − X) , n i=1 as this has one too many degrees of freedom. (Note that if we knew the population mean m, n 1 2 2 then ∑(Xi − m) would be unbiased for s .) n i=1 Bias-Corrected Sample Variance However, we can instead define the bias-corrected sample variance, 1 n S2 = (X − X¯ )2 n−1 − ∑ i n 1 i=1 which is then always an unbiased estimator of the population variance s2. Warning: Because of it’s usefulness as an unbiased estimate of s2, many statistical text books and 1 n software packages (and indeed your formula sheet for the exam!) refer to s2 = (x − x¯)2 as n−1 − ∑ i n 1 i=1 the sample variance. Efficiency Suppose we have two unbiased estimators for a parameter q, which we will call Qˆ (X) and Q˜ (X). And again suppose we have the corresponding sampling distributions for these estimators, PQˆ jq and PQ˜ jq, and so can calculate their means, variances, etc. Then we say Qˆ is more efficient than Q˜ if: ˆ ˜ 1. 8q, VarQˆ jq Q q ≤ VarQ˜ jq Q q ; 4 ˆ ˜ 2. 9q s.t. VarQˆ jq Q q < VarQ˜ jq Q q . That is, the variance of Qˆ is never higher than that of Q˜ , no matter what the true value of q is; and for some value of q, Qˆ has a strictly lower variance than Q˜ . If Qˆ is more efficient than any other possible estimator, we say Qˆ is efficient. Example Suppose we have a population with mean m and variance s2, from which we are to obtain a random sample X1,..., Xn. Consider two estimators for m, Mˆ = X¯ , the sample mean, and M˜ = X1, the first observation in the sample. Well we have seen E(X¯ ) = m always, and certainly E(X1) = m, so both estimators are unbiased. s2 We also know Var(X¯ ) = , and of course Var(X ) = s2, independent of m. So if n ≥ 2, Mˆ n 1 is more efficient than M˜ as an estimator of m. Consistency In the previous example, the worst aspect of the estimate M˜ = X1 is that it does not change, let alone improve, no matter how large a sample n of data is collected. In contrast, the variance of Mˆ = X¯ gets smaller and smaller as n increases. Technically, we say an estimator Qˆ is a consistent estimator for the parameter q if Qˆ con- verges in probability to q. That is, 8e > 0, P(jQˆ − qj > e) ! 0 as n ! ¥. This is hard to demonstrate, but if Qˆ is unbiased we do have: lim Var Qˆ = 0 ) Qˆ is consistent.

Estimation COMP 245 STATISTICS

Phase Transition Unbiased Estimation in High Dimensional Settings Arxiv

Bias, Mean-Square Error, Relative Efficiency

Weak Instruments and Finite-Sample Bias

STAT 830 the Basics of Nonparametric Models The

Bias in Parametric Estimation: Reduction and Useful Side-Effects

Nearly Weighted Risk Minimal Unbiased Estimation✩ ∗ Ulrich K

Chapter 4 Parameter Estimation

Ch. 2 Estimators

Estimators, Bias and Variance

Lecture 9 Estimators

A Simulation Study of the Robustness of the Least Median of Squares Estimator of Slope in a Regression Through the Origin Model

5 Decision Theory: Basic Concepts