Chapter 7 – continued
Chapter 7: Estimation
Sections

7.1 Statistical Inference
Bayesian Methods:
7.2 Prior and Posterior Distributions
7.3 Conjugate Prior Distributions
7.4 Bayes Estimators
Frequentist Methods:
7.5 Maximum Likelihood Estimators
7.6 Properties of Maximum Likelihood Estimators
Skip: p. 434-441 (EM algorithm and Sampling Plans)
Skip: 7.7 Sufficient Statistics
Skip: 7.8 Jointly Sufficient Statistics
Skip: 7.9 Improving an Estimator

7.5 Maximum Likelihood Estimators

Likelihood
When the joint pdf/pf f(x|θ) is regarded as a function of θ for given observations x1, ..., xn, it is called the likelihood function.

Maximum Likelihood Estimator
Maximum likelihood estimator (MLE): for any given observations x we pick the value θ ∈ Ω that maximizes f(x|θ).

Given X = x, the maximum likelihood estimate is a function of x; the corresponding maximum likelihood estimator is a function of X.
Notation: θ̂ = δ(X)
Potentially confusing notation: sometimes θ̂ is used for both the estimator and the estimate.
Note: the MLE is required to be in the parameter space Ω.
Often it is easier to maximize the log-likelihood L(θ) = log f(x|θ).

7.5 Maximum Likelihood Estimators
Examples

Let X ∼ Binomial(n, θ) where n is given. Find the maximum likelihood estimator of θ. Say we observe X = 3; what is the maximum likelihood estimate of θ?
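A sketch of the standard derivation (my own working, not spelled out on the slide), maximizing the log-likelihood in θ:

```latex
% Log-likelihood of Binomial(n, \theta) at the observed x, up to an additive constant:
%   L(\theta) = x \log \theta + (n - x) \log(1 - \theta)
\[
L'(\theta) = \frac{x}{\theta} - \frac{n - x}{1 - \theta} = 0
\quad\Longrightarrow\quad
\hat{\theta} = \frac{x}{n}.
\]
% With the observation x = 3, the maximum likelihood estimate is 3/n.
```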

Let X1, ..., Xn be i.i.d. N(µ, σ²).
Find the MLE of µ when σ² is known.
Find the MLE of µ and σ² (both unknown).
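For reference, a sketch of where the maximization leads (standard results, worked out here rather than taken from the slides):

```latex
\[
\hat{\mu} = \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \bigl(x_i - \bar{x}_n\bigr)^2 .
\]
% The same \hat{\mu} = \bar{x}_n maximizes the likelihood whether \sigma^2 is known or not;
% \hat{\sigma}^2 is the MLE when both parameters are unknown (note the divisor n, not n - 1).
```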

Let X1, ..., Xn be i.i.d. Uniform[0, θ], where θ > 0. Find θ̂.
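A brief solution sketch (standard result, not from the slide): the likelihood equals θ^(−n) on the set where θ ≥ max xi and is 0 otherwise, so it is maximized at the sample maximum.

```latex
\[
\hat{\theta} = \max\{x_1, \ldots, x_n\} = x_{(n)} .
\]
% The maximum is attained because the support is the closed interval [0, \theta];
% with an open interval (0, \theta) the MLE would fail to exist.
```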

Let X1, ..., Xn be i.i.d. Uniform[θ, θ + 1]. Find θ̂.
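Another solution sketch (standard result, my own working): here the likelihood equals 1 whenever every observation fits inside the interval, so the MLE is not unique.

```latex
% The likelihood is 1 when \theta \le x_{(1)} and x_{(n)} \le \theta + 1, and 0 otherwise,
% so every value in the following interval is an MLE:
\[
\hat{\theta} \in \bigl[\, x_{(n)} - 1,\; x_{(1)} \,\bigr].
\]
```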

7.5 Maximum Likelihood Estimators
MLE

Intuition: we pick the parameter that makes the observed data most likely.

But: the likelihood is not a pdf/pf. If the likelihood of θ2 is larger than the likelihood of θ1, i.e. f(x|θ2) > f(x|θ1), it does NOT mean that θ2 is more probable.
Recall: θ is not random here.
Limitations:
The MLE does not always exist.
It is not always appropriate – we cannot incorporate “external” (prior) knowledge.
It may not be unique.

Chapter 7: Estimation
Sections

7.1 Statistical Inference
Bayesian Methods:
7.2 Prior and Posterior Distributions
7.3 Conjugate Prior Distributions
7.4 Bayes Estimators
Frequentist Methods:
7.5 Maximum Likelihood Estimators
7.6 Properties of Maximum Likelihood Estimators
Skip: p. 434-441 (EM algorithm and Sampling Plans)
Skip: 7.7 Sufficient Statistics
Skip: 7.8 Jointly Sufficient Statistics
Skip: 7.9 Improving an Estimator

7.6 Properties of Maximum Likelihood Estimators
Properties of MLEs

Theorem 7.6.2: MLEs are invariant
If θ̂ is the MLE of θ and g(θ) is a function of θ, then g(θ̂) is the MLE of g(θ).

Example: let p̂ be the MLE of a probability parameter, e.g. the p in Binomial(n, p). Then the MLE of the odds p/(1 − p) is p̂/(1 − p̂).
In general this does not hold for Bayes estimators. E.g. for squared error loss, E(g(θ)|x) ≠ g(E(θ|x)).
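Another instance (a sketch combining the invariance theorem with the normal example from earlier): since σ̂² is the MLE of σ², invariance gives the MLE of the standard deviation directly.

```latex
% If \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 is the MLE of \sigma^2,
% then by invariance the MLE of g(\sigma^2) = \sqrt{\sigma^2} is
\[
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(x_i - \bar{x}_n\bigr)^2}.
\]
```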

7.6 Properties of Maximum Likelihood Estimators
Computation

For MLEs
In many practical situations the maximization we need is not available analytically, or is too cumbersome.
There exist many numerical optimization methods; Newton's Method (see Definition 7.6.2) is one example, sketched below.
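As a concrete illustration (a minimal sketch of my own, not code from the text; the function names are invented for this example): Newton's method applied to the score function, i.e. the derivative of the log-likelihood. The binomial case is used only as a check, since its MLE x/n is known in closed form.

```python
import numpy as np

def newton_mle(score, score_prime, theta0, tol=1e-10, max_iter=100):
    """Newton's method on the score function (first derivative of the log-likelihood)."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / score_prime(theta)  # Newton step: l'(theta) / l''(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Check on Binomial(n, theta) with x successes, where the closed-form MLE x/n is known:
# score l'(theta) = x/theta - (n - x)/(1 - theta), l''(theta) = -x/theta^2 - (n - x)/(1 - theta)^2.
n, x = 10, 3
score = lambda t: x / t - (n - x) / (1 - t)
score_prime = lambda t: -x / t**2 - (n - x) / (1 - t)**2
print(newton_mle(score, score_prime, theta0=0.5))  # converges to 0.3 = x/n
```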

7.6 Properties of Maximum Likelihood Estimators
Computation

For Bayes estimators
In many practical situations the posterior distribution is not available in closed form.
This happens if we cannot evaluate the integral for the marginal distribution.
Instead, people either approximate the posterior distribution or take random samples from it, e.g. using Markov chain Monte Carlo (MCMC) methods, as in the sketch below.
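A minimal random-walk Metropolis sketch (my own illustration, not from the text; the model, prior, and function names are assumptions chosen for the example). It draws approximate samples from the posterior of a binomial success probability under a Uniform(0, 1) prior; the exact posterior here is Beta(x + 1, n − x + 1), so the sample mean can be checked against the known posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta, x, n):
    # Unnormalized log posterior: Binomial(n, theta) likelihood times a Uniform(0, 1) prior.
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf
    return x * np.log(theta) + (n - x) * np.log(1.0 - theta)

def metropolis(log_post, theta0, n_samples=5000, step=0.1, **kwargs):
    # Random-walk Metropolis: propose theta + step * N(0, 1), accept with
    # probability min(1, posterior ratio).
    samples = np.empty(n_samples)
    theta = theta0
    lp = log_post(theta, **kwargs)
    for i in range(n_samples):
        proposal = theta + step * rng.normal()
        lp_prop = log_post(proposal, **kwargs)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples[i] = theta
    return samples

draws = metropolis(log_posterior, theta0=0.5, x=3, n=10)
print(draws[1000:].mean())  # should be close to the exact posterior mean 4/12
```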

7.6 Properties of Maximum Likelihood Estimators
Method of Moments (MOM)

Let X1, ..., Xn be i.i.d. from f(x|θ) where θ is k-dimensional.
The j-th sample moment is defined as m_j = (1/n) ∑_{i=1}^{n} X_i^j.
Method of moments (MOM) estimator: match the theoretical moments and the sample moments and solve for θ:

m_1 = E(X_1 | θ), m_2 = E(X_1^2 | θ), ..., m_k = E(X_1^k | θ)

Example:

Let X1, ..., Xn be i.i.d. Gamma(α, β). Then E(X) = α/β and E(X²) = α(α + 1)/β².

Find the MOM estimator of α and β
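A solution sketch (my own algebra, a standard result): equate the first two sample moments to the theoretical ones and solve.

```latex
% Equations: m_1 = \alpha/\beta and m_2 = \alpha(\alpha + 1)/\beta^2.
% Substituting \alpha = \beta m_1 into the second equation gives m_2 = m_1^2 + m_1/\beta, hence
\[
\hat{\beta} = \frac{m_1}{m_2 - m_1^2},
\qquad
\hat{\alpha} = \hat{\beta}\, m_1 = \frac{m_1^2}{m_2 - m_1^2}.
\]
```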

7.6 Properties of Maximum Likelihood Estimators
Consistency

Def: Consistent estimators

An estimator δn(X) = δ(X1, X2,..., Xn) is consistent if

δn(X) → θ in probability, as n → ∞.

Under fairly general conditions and for a wide variety of loss functions, the sequence of Bayes estimators is consistent.
Consider an estimation problem where there exists a unique MLE of the parameter of interest θ. Then, under certain conditions that are typically satisfied in practical problems, the sequence of MLEs is a consistent estimator of θ.
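A small simulation sketch of what consistency looks like in practice (my own illustration, not from the text): the MLE of a normal mean, the sample average, settles down around the true value as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 2.0

# The MLE of mu for i.i.d. N(mu, 1) data is the sample mean; as n grows it
# converges in probability to the true value (consistency).
for n in (10, 100, 10_000, 1_000_000):
    x = rng.normal(loc=mu_true, scale=1.0, size=n)
    print(n, x.mean())
```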

7.7 Sufficient Statistics

A statistic: T = r(X1, ..., Xn)

Def: Sufficient Statistics

Let X1, ..., Xn be a random sample from f(x|θ) and let T be a statistic. If the conditional distribution of

X1,..., Xn|T = t

does not depend on θ, then T is called a sufficient statistic.

The idea: it is just as good to have the observed sufficient statistic as it is to have the individual observations X1, ..., Xn.
We can limit our search for a good estimator to sufficient statistics.

7.7 Sufficient Statistics

Theorem 7.7.1: Factorization Criterion

Let X1, ..., Xn be a random sample from f(x|θ) where θ ∈ Ω is unknown. A statistic T = r(X1, ..., Xn) is a sufficient statistic for θ if and only if, for all x ∈ R^n and all θ ∈ Ω, the joint pdf/pf fn(x|θ) can be factored as fn(x|θ) = u(x) v(r(x), θ), where the functions u and v are nonnegative.

The function u may depend on x but not on θ.
The function v depends on θ, but depends on x only through the value of the statistic r(x).
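A standard example of how the criterion is used (my own sketch, not on the slide): for a Bernoulli(θ) sample the joint pf factors through T = ∑ Xi, so T is sufficient.

```latex
\[
f_n(\mathbf{x}\mid\theta)
= \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
= \theta^{\,r(\mathbf{x})}(1-\theta)^{\,n - r(\mathbf{x})},
\qquad r(\mathbf{x}) = \sum_{i=1}^{n} x_i ,
\]
% which has the form u(x) v(r(x), \theta) with u(x) = 1, so T = \sum_i X_i is sufficient for \theta.
```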

Both MLEs and Bayesian estimators depend on data only through sufficient statistics.


END OF CHAPTER 7
