Chapter 7 – continued
Chapter 7: Estimation
Sections

7.1 Statistical Inference
Bayesian Methods:
7.2 Prior and Posterior Distributions
7.3 Conjugate Prior Distributions
7.4 Bayes Estimators
Frequentist Methods:
7.5 Maximum Likelihood Estimators
7.6 Properties of Maximum Likelihood Estimators
Skip: p. 434-441 (EM algorithm and Sampling Plans)
Skip: 7.7 Sufficient Statistics
Skip: 7.8 Jointly Sufficient Statistics
Skip: 7.9 Improving an Estimator

7.5 Maximum Likelihood Estimators

Likelihood
When the joint pdf/pf f(x|θ) is regarded as a function of θ for given observations x1, ..., xn, it is called the likelihood function.

Maximum Likelihood Estimator
Maximum likelihood estimator (MLE): for any given observations x we pick the value θ ∈ Ω that maximizes f(x|θ).

Given X = x, the maximum likelihood estimate is a function of x; the corresponding maximum likelihood estimator is a function of X.
Notation: θ̂ = δ(X)
Potentially confusing notation: sometimes θ̂ is used for both the estimator and the estimate.
Note: the MLE is required to be in the parameter space Ω.
Often it is easier to maximize the log-likelihood L(θ) = log f(x|θ).

7.5 Maximum Likelihood Estimators
Examples

Let X ∼ Binomial(n, θ) where n is given. Find the maximum likelihood estimator of θ. Say we observe X = 3; what is the maximum likelihood estimate of θ?
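A sketch of the standard derivation (my own working, not spelled out on the slide), maximizing the log-likelihood in θ:

```latex
% Log-likelihood of Binomial(n, \theta) at the observed x, up to an additive constant:
%   L(\theta) = x \log \theta + (n - x) \log(1 - \theta)
\[
L'(\theta) = \frac{x}{\theta} - \frac{n - x}{1 - \theta} = 0
\quad\Longrightarrow\quad
\hat{\theta} = \frac{x}{n}.
\]
% With the observation x = 3, the maximum likelihood estimate is 3/n.
```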

Let X1, ..., Xn be i.i.d. N(µ, σ²).
Find the MLE of µ when σ² is known.
Find the MLE of µ and σ² (both unknown).
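For reference, a sketch of where the maximization leads (standard results, worked out here rather than taken from the slides):

```latex
\[
\hat{\mu} = \bar{x}_n = \frac{1}{n}\sum_{i=1}^{n} x_i ,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \bigl(x_i - \bar{x}_n\bigr)^2 .
\]
% The same \hat{\mu} = \bar{x}_n maximizes the likelihood whether \sigma^2 is known or not;
% \hat{\sigma}^2 is the MLE when both parameters are unknown (note the divisor n, not n - 1).
```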

Let X1, ..., Xn be i.i.d. Uniform[0, θ], where θ > 0. Find θ̂.
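A brief solution sketch (standard result, not from the slide): the likelihood equals θ^(−n) on the set where θ ≥ max xi and is 0 otherwise, so it is maximized at the sample maximum.

```latex
\[
\hat{\theta} = \max\{x_1, \ldots, x_n\} = x_{(n)} .
\]
% The maximum is attained because the support is the closed interval [0, \theta];
% with an open interval (0, \theta) the MLE would fail to exist.
```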

Let X1, ..., Xn be i.i.d. Uniform[θ, θ + 1]. Find θ̂.
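Another solution sketch (standard result, my own working): here the likelihood equals 1 whenever every observation fits inside the interval, so the MLE is not unique.

```latex
% The likelihood is 1 when \theta \le x_{(1)} and x_{(n)} \le \theta + 1, and 0 otherwise,
% so every value in the following interval is an MLE:
\[
\hat{\theta} \in \bigl[\, x_{(n)} - 1,\; x_{(1)} \,\bigr].
\]
```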

7.5 Maximum Likelihood Estimators
MLE

Intuition: we pick the parameter that makes the observed data most likely.

But: the likelihood is not a pdf/pf. If the likelihood of θ2 is larger than the likelihood of θ1, i.e. f(x|θ2) > f(x|θ1), it does NOT mean that θ2 is more probable.
Recall: θ is not random here.
Limitations:
The MLE does not always exist.
It is not always appropriate – we cannot incorporate “external” (prior) knowledge.
It may not be unique.

Chapter 7: Estimation
Sections

7.1 Statistical Inference
Bayesian Methods:
7.2 Prior and Posterior Distributions
7.3 Conjugate Prior Distributions
7.4 Bayes Estimators
Frequentist Methods:
7.5 Maximum Likelihood Estimators
7.6 Properties of Maximum Likelihood Estimators
Skip: p. 434-441 (EM algorithm and Sampling Plans)
Skip: 7.7 Sufficient Statistics
Skip: 7.8 Jointly Sufficient Statistics
Skip: 7.9 Improving an Estimator

7.6 Properties of Maximum Likelihood Estimators
Properties of MLEs

Theorem 7.6.2: MLEs are invariant
If θ̂ is the MLE of θ and g(θ) is a function of θ, then g(θ̂) is the MLE of g(θ).

Example: let p̂ be the MLE of a probability parameter, e.g. the p in Binomial(n, p). Then the MLE of the odds p/(1 − p) is p̂/(1 − p̂).
In general this does not hold for Bayes estimators. E.g. for squared error loss, E(g(θ)|x) ≠ g(E(θ|x)).
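Another instance (a sketch combining the invariance theorem with the normal example from earlier): since σ̂² is the MLE of σ², invariance gives the MLE of the standard deviation directly.

```latex
% If \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x}_n)^2 is the MLE of \sigma^2,
% then by invariance the MLE of g(\sigma^2) = \sqrt{\sigma^2} is
\[
\hat{\sigma} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl(x_i - \bar{x}_n\bigr)^2}.
\]
```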

7.6 Properties of Maximum Likelihood Estimators
Computation

For MLEs
In many practical situations the maximization we need is not available analytically, or is too cumbersome.
There exist many numerical optimization methods; Newton's Method (see Definition 7.6.2) is one example, sketched below.
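As a concrete illustration (a minimal sketch of my own, not code from the text; the function names are invented for this example): Newton's method applied to the score function, i.e. the derivative of the log-likelihood. The binomial case is used only as a check, since its MLE x/n is known in closed form.

```python
import numpy as np

def newton_mle(score, score_prime, theta0, tol=1e-10, max_iter=100):
    """Newton's method on the score function (first derivative of the log-likelihood)."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / score_prime(theta)  # Newton step: l'(theta) / l''(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Check on Binomial(n, theta) with x successes, where the closed-form MLE x/n is known:
# score l'(theta) = x/theta - (n - x)/(1 - theta), l''(theta) = -x/theta^2 - (n - x)/(1 - theta)^2.
n, x = 10, 3
score = lambda t: x / t - (n - x) / (1 - t)
score_prime = lambda t: -x / t**2 - (n - x) / (1 - t)**2
print(newton_mle(score, score_prime, theta0=0.5))  # converges to 0.3 = x/n
```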

7.6 Properties of Maximum Likelihood Estimators
Computation

For Bayes estimators
In many practical situations the posterior distribution is not available in closed form.
This happens if we cannot evaluate the integral for the marginal distribution.
Instead, people either approximate the posterior distribution or take random samples from it, e.g. using Markov chain Monte Carlo (MCMC) methods, as in the sketch below.
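A minimal random-walk Metropolis sketch (my own illustration, not from the text; the model, prior, and function names are assumptions chosen for the example). It draws approximate samples from the posterior of a binomial success probability under a Uniform(0, 1) prior; the exact posterior here is Beta(x + 1, n − x + 1), so the sample mean can be checked against the known posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_posterior(theta, x, n):
    # Unnormalized log posterior: Binomial(n, theta) likelihood times a Uniform(0, 1) prior.
    if theta <= 0.0 or theta >= 1.0:
        return -np.inf
    return x * np.log(theta) + (n - x) * np.log(1.0 - theta)

def metropolis(log_post, theta0, n_samples=5000, step=0.1, **kwargs):
    # Random-walk Metropolis: propose theta + step * N(0, 1), accept with
    # probability min(1, posterior ratio).
    samples = np.empty(n_samples)
    theta = theta0
    lp = log_post(theta, **kwargs)
    for i in range(n_samples):
        proposal = theta + step * rng.normal()
        lp_prop = log_post(proposal, **kwargs)
        if np.log(rng.uniform()) < lp_prop - lp:
            theta, lp = proposal, lp_prop
        samples[i] = theta
    return samples

draws = metropolis(log_posterior, theta0=0.5, x=3, n=10)
print(draws[1000:].mean())  # should be close to the exact posterior mean 4/12
```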

7.6 Properties of Maximum Likelihood Estimators
Method of Moments (MOM)

Let X1, ..., Xn be i.i.d. from f(x|θ) where θ is k-dimensional.
The j-th sample moment is defined as m_j = (1/n) ∑_{i=1}^{n} X_i^j.
Method of moments (MOM) estimator: match the theoretical moments and the sample moments and solve for θ:

m_1 = E(X_1 | θ), m_2 = E(X_1^2 | θ), ..., m_k = E(X_1^k | θ)

Example:

Let X1, ..., Xn be i.i.d. Gamma(α, β). Then E(X) = α/β and E(X²) = α(α + 1)/β².

Find the MOM estimator of α and β
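A solution sketch (my own algebra, a standard result): equate the first two sample moments to the theoretical ones and solve.

```latex
% Equations: m_1 = \alpha/\beta and m_2 = \alpha(\alpha + 1)/\beta^2.
% Substituting \alpha = \beta m_1 into the second equation gives m_2 = m_1^2 + m_1/\beta, hence
\[
\hat{\beta} = \frac{m_1}{m_2 - m_1^2},
\qquad
\hat{\alpha} = \hat{\beta}\, m_1 = \frac{m_1^2}{m_2 - m_1^2}.
\]
```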

7.6 Properties of Maximum Likelihood Estimators
Consistency

Def: Consistent estimators

An estimator δn(X) = δ(X1, X2,..., Xn) is consistent if

δn(X) → θ in probability, as n → ∞.

Under fairly general conditions and for a wide variety of loss functions, the sequence of Bayes estimators is consistent.
Consider an estimation problem where there exists a unique MLE of the parameter of interest θ. Then, under certain conditions that are typically satisfied in practical problems, the sequence of MLEs is a consistent estimator of θ.
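A small simulation sketch of what consistency looks like in practice (my own illustration, not from the text): the MLE of a normal mean, the sample average, settles down around the true value as n grows.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 2.0

# The MLE of mu for i.i.d. N(mu, 1) data is the sample mean; as n grows it
# converges in probability to the true value (consistency).
for n in (10, 100, 10_000, 1_000_000):
    x = rng.normal(loc=mu_true, scale=1.0, size=n)
    print(n, x.mean())
```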

7.7 Sufficient Statistics

A statistic: T = r(X1, ..., Xn)

Def: Sufficient Statistics

Let X1, ..., Xn be a random sample from f(x|θ) and let T be a statistic. If the conditional distribution of

X1,..., Xn|T = t

does not depend on θ, then T is called a sufficient statistic.

The idea: it is just as good to have the observed sufficient statistic as it is to have the individual observations X1, ..., Xn.
We can limit our search for a good estimator to sufficient statistics.

7.7 Sufficient Statistics

Theorem 7.7.1: Factorization Criterion

Let X1, ..., Xn be a random sample from f(x|θ) where θ ∈ Ω is unknown. A statistic T = r(X1, ..., Xn) is a sufficient statistic for θ if and only if, for all x ∈ R^n and all θ ∈ Ω, the joint pdf/pf fn(x|θ) can be factored as fn(x|θ) = u(x) v(r(x), θ), where the functions u and v are nonnegative.

The function u may depend on x but not on θ.
The function v depends on θ, but depends on x only through the value of the statistic r(x).
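A standard example of how the criterion is used (my own sketch, not on the slide): for a Bernoulli(θ) sample the joint pf factors through T = ∑ Xi, so T is sufficient.

```latex
\[
f_n(\mathbf{x}\mid\theta)
= \prod_{i=1}^{n} \theta^{x_i}(1-\theta)^{1-x_i}
= \theta^{\,r(\mathbf{x})}(1-\theta)^{\,n - r(\mathbf{x})},
\qquad r(\mathbf{x}) = \sum_{i=1}^{n} x_i ,
\]
% which has the form u(x) v(r(x), \theta) with u(x) = 1, so T = \sum_i X_i is sufficient for \theta.
```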

Both MLEs and Bayesian estimators depend on data only through sufficient statistics.


END OF CHAPTER 7
