STAT 830 Likelihood Methods
Richard Lockhart
Simon Fraser University
STAT 830 — Fall 2011
Purposes of These Notes
Define the likelihood, log-likelihood and score functions.
Summarize likelihood methods.
Describe maximum likelihood estimation.
Give a sequence of examples.
Likelihood Methods of Inference

Toss a thumb tack 6 times and imagine it lands point up twice. Let p be the probability of landing point up. The probability of landing point up exactly twice is
15p²(1 − p)⁴
This function of p is the likelihood function.

Def'n: The likelihood function is the map L with domain Θ and values given by

L(θ) = f_θ(X)

Key Point: think about how the density depends on θ, not about how it depends on X. Notice: X, the observed value of the data, has been plugged into the formula for the density. Notice: the coin tossing example uses the discrete density for f. We use the likelihood for most inference problems.
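To make the definition concrete, here is a small numerical check of the thumb-tack example (a sketch; the grid search below is just one simple way to locate the peak):

```python
import math

def binom_likelihood(p, n=6, x=2):
    """Likelihood of p after observing x 'point up' landings in n tosses."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# math.comb(6, 2) = 15, so this is the 15 p^2 (1-p)^4 of the example.
grid = [k / 1000 for k in range(1001)]
p_hat = max(grid, key=binom_likelihood)  # crude grid search for the maximizer
```

The grid maximizer lands at p ≈ 1/3, matching the calculus answer X/n = 2/6 derived later in these notes.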
List of likelihood techniques

Point estimation: we must compute an estimate θ̂ = θ̂(X) which lies in Θ. The maximum likelihood estimate (MLE) of θ is the value θ̂ which maximizes L(θ) over θ ∈ Θ, if such a θ̂ exists.

Point estimation of a function of θ: we must compute an estimate φ̂ = φ̂(X) of φ = g(θ). We use φ̂ = g(θ̂) where θ̂ is the MLE of θ.

Interval (or set) estimation: we must compute a set C = C(X) in Θ which we think will contain θ0. We will use
{θ ∈ Θ : L(θ) > c}
for a suitable c.
Hypothesis testing: decide whether or not θ0 ∈ Θ0 where Θ0 ⊂ Θ. We base our decision on the likelihood ratio

sup{L(θ); θ ∈ Θ \ Θ0} / sup{L(θ); θ ∈ Θ0}.
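As a tiny illustration with the thumb-tack numbers (note: this sketch uses the common variant of the statistic with the supremum over all of Θ in the numerator, and tests the hypothetical null p = 1/2):

```python
import math

def binom_loglik(p, n=6, x=2):
    """Binomial log-likelihood for x successes in n trials."""
    return math.log(math.comb(n, x)) + x * math.log(p) + (n - x) * math.log(1 - p)

# sup over the whole parameter space (0, 1) is attained at p = x/n = 1/3.
l_unrestricted = binom_loglik(1 / 3)
l_restricted = binom_loglik(1 / 2)  # sup over the null set {1/2}
lam = 2 * (l_unrestricted - l_restricted)  # -2 log likelihood ratio, always >= 0
```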
Maximum Likelihood Estimation
To find the MLE, maximize L. This is a typical function maximization problem:
Set the gradient of L equal to 0.
Check that the root is a maximum, not a minimum or saddle point.
We examine some likelihood plots in examples, focusing on the fact that each data set corresponds to its own function of θ, so the graph itself is a statistic.
Cauchy Data
IID sample X_1, ..., X_n from the Cauchy(θ) density

f(x; θ) = 1 / (π(1 + (x − θ)²))
The likelihood function is

L(θ) = ∏_{i=1}^n 1 / (π(1 + (X_i − θ)²))

Here are some likelihood plots.
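Plots of this kind can be reproduced along the following lines (a sketch: the seed, grid, and simulated sample are illustrative choices, not the ones used for the slides):

```python
import math
import random

def cauchy_likelihood(theta, xs):
    """L(theta) = prod_i 1 / (pi * (1 + (x_i - theta)^2))."""
    prod = 1.0
    for x in xs:
        prod *= 1.0 / (math.pi * (1.0 + (x - theta) ** 2))
    return prod

random.seed(830)  # arbitrary seed
# tan of a Uniform(-pi/2, pi/2) angle is a standard Cauchy draw, centred at theta = 0
data = [math.tan(math.pi * (random.random() - 0.5)) for _ in range(5)]

# Evaluate on a grid and rescale so the peak is 1, as in the plots.
grid = [-10 + 0.01 * k for k in range(2001)]
vals = [cauchy_likelihood(t, data) for t in grid]
peak = max(vals)
scaled = [v / peak for v in vals]
```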
Cauchy data n = 5
[Figure: six panels titled 'Likelihood Function: Cauchy, n=5', each showing the likelihood (scaled to [0, 1]) against θ for θ ∈ (−10, 10).]
Cauchy data n = 5 — close-up
[Figure: the same six panels zoomed to θ ∈ (−2, 2).]
Cauchy data n = 25
[Figure: six panels titled 'Likelihood Function: Cauchy, n=25', each showing the likelihood (scaled to [0, 1]) against θ for θ ∈ (−10, 10).]
Cauchy data n = 25 — close-up
[Figure: the same six panels zoomed to θ ∈ (−1, 1).]
Things to see in the plots
The likelihood functions have peaks near the true value of θ (which is 0 for the data sets I generated).
The peaks are narrower for the larger sample size.
The peaks have a more regular shape for the larger value of n.
I actually plotted L(θ)/L(θ̂), which has exactly the same shape as L but runs from 0 to 1 on the vertical scale.
The log-likelihood

To maximize this likelihood: differentiate L and set the result equal to 0. Notice L is a product of n terms; its derivative is

L′(θ) = Σ_{i=1}^n [ 2(X_i − θ) / (π(1 + (X_i − θ)²)²) ] ∏_{j≠i} 1 / (π(1 + (X_j − θ)²))

which is quite unpleasant. It is much easier to work with the logarithm of L: the log of a product is a sum, and the logarithm is monotone increasing.

Def'n: The log-likelihood function is
ℓ(θ) = log{L(θ)} .
For the Cauchy problem we have
ℓ(θ) = −Σ_i log(1 + (X_i − θ)²) − n log(π)
Now we examine log-likelihood plots.
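A quick numerical check that the sum form of ℓ matches the log of the product form of L (the three sample values below are made up for illustration):

```python
import math

def cauchy_loglik(theta, xs):
    """l(theta) = -sum_i log(1 + (x_i - theta)^2) - n log(pi)."""
    return -sum(math.log(1 + (x - theta) ** 2) for x in xs) - len(xs) * math.log(math.pi)

xs = [-1.2, 0.3, 0.8]  # hypothetical sample
L = 1.0
for x in xs:
    L *= 1 / (math.pi * (1 + (x - 0.1) ** 2))  # product form of the likelihood

# log(L) and cauchy_loglik(0.1, xs) agree up to floating-point rounding
```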
Cauchy log-likelihood n = 5
[Figure: six panels titled 'Likelihood Ratio Intervals: Cauchy, n=5', each showing the log-likelihood against θ for θ ∈ (−10, 10), with the data points marked along the axis.]
Cauchy log-likelihood n = 5, close-up
[Figure: the same six panels zoomed to θ ∈ (−2, 2).]
Cauchy log-likelihood n = 25
[Figure: six panels titled 'Likelihood Ratio Intervals: Cauchy, n=25', each showing the log-likelihood against θ for θ ∈ (−10, 10), with the data points marked along the axis.]
Cauchy log-likelihood n = 25, close-up
[Figure: the same six panels zoomed to θ ∈ (−1, 1).]
Things to notice

Plots of ℓ for n = 25 are quite smooth and rather parabolic. For n = 5 there are many local maxima and minima of ℓ. The likelihood tends to 0 as |θ| → ∞, so the maximum of ℓ occurs at a root of ℓ′, the derivative of ℓ with respect to θ.

Def'n: The score function is the gradient of ℓ:

U(θ) = ∂ℓ/∂θ

The MLE θ̂ is usually a root of the likelihood equations U(θ) = 0. In our Cauchy example we find

U(θ) = Σ_i 2(X_i − θ) / (1 + (X_i − θ)²)

Now we examine plots of score functions. Notice: there are often multiple roots of the likelihood equations.
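The multiple roots are easy to exhibit numerically. A sketch: the sample below is hypothetical, chosen to split into well-separated clusters, and the roots of U are bracketed by sign changes on a grid and refined by bisection:

```python
def cauchy_score(theta, xs):
    """U(theta) = sum_i 2 (x_i - theta) / (1 + (x_i - theta)^2)."""
    return sum(2 * (x - theta) / (1 + (x - theta) ** 2) for x in xs)

xs = [-5.0, -4.5, 0.0, 4.5, 5.0]  # hypothetical, widely separated sample

def roots_of_score(xs, lo=-10.0, hi=10.0, steps=4000):
    """Find roots of U by scanning for sign changes, then bisecting each bracket."""
    roots = []
    prev_t, prev_u = lo, cauchy_score(lo, xs)
    for k in range(1, steps + 1):
        t = lo + (hi - lo) * k / steps
        u = cauchy_score(t, xs)
        if prev_u == 0.0 or prev_u * u < 0.0:
            a, b = prev_t, t
            for _ in range(60):  # bisection on the bracket [a, b]
                mid = (a + b) / 2
                if cauchy_score(a, xs) * cauchy_score(mid, xs) <= 0.0:
                    b = mid
                else:
                    a = mid
            roots.append((a + b) / 2)
        prev_t, prev_u = t, u
    return roots

roots = roots_of_score(xs)  # several roots: alternating local maxima and minima of l
```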
Cauchy score n = 5

[Figure: paired log-likelihood and score plots for six Cauchy samples, n = 5, against θ ∈ (−10, 10).]
Cauchy score n = 25

[Figure: paired log-likelihood and score plots for six Cauchy samples, n = 25, against θ ∈ (−10, 10).]
Binomial example
Example: X ∼ Binomial(n,θ)
L(θ) = (n choose X) θ^X (1 − θ)^{n−X}

ℓ(θ) = log (n choose X) + X log(θ) + (n − X) log(1 − θ)

U(θ) = X/θ − (n − X)/(1 − θ)

The function L is 0 at θ = 0 and at θ = 1 unless X = 0 or X = n, so for 1 ≤ X ≤ n − 1 the MLE must be found by setting U = 0, which gives

θ̂ = X/n
Binomial Continued
For X = n the log-likelihood has derivative

U(θ) = n/θ > 0

for all θ, so the likelihood is an increasing function of θ, maximized at θ̂ = 1 = X/n. Similarly, when X = 0 the maximum is at θ̂ = 0 = X/n. In all cases

θ̂ = X/n.
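A grid search over (0, 1) confirms the closed form for the thumb-tack numbers (an illustrative check, not part of the original slides):

```python
import math

def binom_loglik(theta, n, x):
    """log C(n, x) + x log(theta) + (n - x) log(1 - theta)."""
    return math.log(math.comb(n, x)) + x * math.log(theta) + (n - x) * math.log(1 - theta)

def score(theta, n, x):
    """U(theta) = x/theta - (n - x)/(1 - theta)."""
    return x / theta - (n - x) / (1 - theta)

n, x = 6, 2
grid = [k / 10000 for k in range(1, 10000)]  # interior of (0, 1)
theta_hat = max(grid, key=lambda t: binom_loglik(t, n, x))
```

The score is positive to the left of X/n and negative to the right, so the unique root is the maximum.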
The Normal Distribution

Now we have X_1, ..., X_n iid N(µ, σ²). There are two parameters, θ = (µ, σ). We find

L(µ, σ) = exp{ −Σ_i (X_i − µ)² / (2σ²) } / ( (2π)^{n/2} σ^n )

ℓ(µ, σ) = −(n/2) log(2π) − Σ_i (X_i − µ)² / (2σ²) − n log(σ)

and that

U(µ, σ) = [ Σ_i (X_i − µ)/σ² , Σ_i (X_i − µ)²/σ³ − n/σ ]

Notice that U is a function with two components because θ has two components. Setting the score equal to 0 and solving gives
µ̂ = X̄ and σ̂ = √( Σ_i (X_i − X̄)² / n )

Normal example continued
Check this is a maximum by computing one more derivative. The matrix H of second derivatives of ℓ is

H(µ, σ) = [ −n/σ²                  −2 Σ_i (X_i − µ)/σ³
            −2 Σ_i (X_i − µ)/σ³    −3 Σ_i (X_i − µ)²/σ⁴ + n/σ² ]

Plugging in the MLE gives
H(θ̂) = [ −n/σ̂²    0
          0         −2n/σ̂² ]

which is negative definite: both its eigenvalues are negative. So θ̂ must be a local maximum. Examine contour and perspective plots of ℓ.
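A sketch verifying the closed forms on a made-up sample: the closed-form (µ̂, σ̂) beats nearby parameter values, and both diagonal Hessian entries at the MLE are negative:

```python
import math

xs = [4.1, 3.7, 5.2, 4.8, 4.4, 3.9, 5.0, 4.6]  # illustrative sample
n = len(xs)

mu_hat = sum(xs) / n
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)

def loglik(mu, sigma):
    """Normal log-likelihood l(mu, sigma)."""
    return (-(n / 2) * math.log(2 * math.pi) - n * math.log(sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

l_max = loglik(mu_hat, sigma_hat)
neighbours = [loglik(mu_hat + d1, sigma_hat + d2)
              for d1 in (-0.05, 0.0, 0.05) for d2 in (-0.05, 0.0, 0.05)
              if (d1, d2) != (0.0, 0.0)]

# Hessian at the mle is diagonal: diag(-n/sigma^2, -2n/sigma^2), both entries negative.
h11 = -n / sigma_hat ** 2
h22 = -2 * n / sigma_hat ** 2
```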
Normal likelihood perspective plot
[Figure: perspective plots of the likelihood surface over the parameter plane for n = 10 and n = 100.]
Normal likelihood contour plot
[Figure: contour plots of the likelihood in the (µ, σ) plane for n = 10 and n = 100.]
Observations
Notice that the contours are quite ellipsoidal for the larger sample size.
For X1,..., Xn iid log likelihood is
ℓ(θ) = Σ_i log(f(X_i, θ)).

The score function is

U(θ) = Σ_i ∂ log f(X_i, θ)/∂θ.

The MLE θ̂ maximizes ℓ. If the maximum occurs in the interior of the parameter space and the log-likelihood is continuously differentiable, then θ̂ solves the likelihood equations

U(θ) = 0.
Solving U(θ) = 0: Examples
N(µ, σ²): the unique root of the likelihood equations is a global maximum.

Remark: Suppose we called τ = σ² the parameter. The score function still has two components: the first component is the same as before, but the second component is
∂ℓ/∂τ = Σ_i (X_i − µ)² / (2τ²) − n/(2τ)

Setting the new likelihood equations equal to 0 still gives
τ̂ = σ̂²
This illustrates the general invariance (or equivariance) principle: if φ = g(θ) is some reparametrization of a model (a one-to-one relabelling of the parameter values) then φ̂ = g(θ̂). This does not apply to other estimators.
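A numerical illustration of invariance with the normal model (made-up data): maximizing over τ = σ² directly gives exactly the square of the σ MLE:

```python
import math

xs = [2.0, 2.5, 1.7, 2.2, 2.6]  # illustrative sample
n = len(xs)
mu_hat = sum(xs) / n

# MLE of sigma from the original parametrization...
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
# ...and the MLE of tau = sigma^2 from the new likelihood equations directly:
tau_hat = sum((x - mu_hat) ** 2 for x in xs) / n

# Cross-check tau_hat by maximizing the log-likelihood in tau (at mu = mu_hat) on a grid.
def loglik_tau(tau):
    return -(n / 2) * math.log(tau) - sum((x - mu_hat) ** 2 for x in xs) / (2 * tau)

grid = [k / 1000 for k in range(10, 1000)]
tau_grid = max(grid, key=loglik_tau)
```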
Examples
Cauchy, location θ: at least one root of the likelihood equations, but often several more. One root is a global maximum; the others, if they exist, may be local minima or maxima.

Binomial(n, θ): if X = 0 or X = n there is no root of the likelihood equations; the likelihood is monotone. For other values of X there is a unique root, a global maximum. The global maximum is at θ̂ = X/n even if X = 0 or n.
Examples: 2 parameter exponential

The density is

f(x; α, β) = (1/β) e^{−(x−α)/β} 1(x > α)
The log-likelihood is −∞ for α > min{X_1, ..., X_n} and otherwise is

ℓ(α, β) = −n log(β) − Σ_i (X_i − α)/β

This is an increasing function of α until α reaches

α̂ = X_(1) = min{X_1, ..., X_n}
which gives the MLE of α. Now plug in α̂ for α to get the profile likelihood for β:
ℓ_profile(β) = −n log(β) − Σ_i (X_i − X_(1))/β

2 parameter exponential continued
Set β derivative equal to 0 to get
β̂ = Σ_i (X_i − X_(1))/n

Notice the MLE θ̂ = (α̂, β̂) does not solve the likelihood equations; we had to look at the edge of the possible parameter space. α is called a support or truncation parameter. ML methods behave oddly in problems with such parameters.
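A check on a made-up sample: the closed forms dominate nearby admissible values, and any α above the sample minimum is inadmissible:

```python
import math

xs = [3.4, 2.9, 4.1, 3.1, 5.0, 3.6]  # illustrative sample
n = len(xs)

alpha_hat = min(xs)                            # support / truncation parameter
beta_hat = sum(x - alpha_hat for x in xs) / n  # from the profile likelihood

def loglik(alpha, beta):
    """Two-parameter exponential log-likelihood; -inf when some x_i < alpha."""
    if alpha > min(xs):
        return float("-inf")
    return -n * math.log(beta) - sum(x - alpha for x in xs) / beta

l_hat = loglik(alpha_hat, beta_hat)
```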
Three parameter Weibull

The density in question is
f(x; α, β, γ) = (γ/β) ((x − α)/β)^{γ−1} exp[ −{(x − α)/β}^γ ] 1(x > α)
Three likelihood equations. Setting the β derivative equal to 0 gives

β̂(α, γ) = [ Σ_i (X_i − α)^γ / n ]^{1/γ}

where β̂(α, γ) indicates that the MLE of β could be found by finding the MLEs of the other two parameters and then plugging them into the formula above. There is no explicit solution for the remaining parameter estimates; numerical methods are needed.
But putting γ < 1 and letting α → X_(1) makes the log-likelihood go to ∞, so the MLE is not uniquely defined: any γ < 1 and any β will do.
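The unbounded-likelihood phenomenon is easy to see numerically (made-up data; γ = 1/2 is one arbitrary choice below 1):

```python
import math

xs = [1.3, 2.1, 2.8, 3.5, 4.2]  # illustrative sample

def weibull_loglik(alpha, beta, gamma):
    """Three-parameter Weibull log-likelihood; -inf unless alpha < min(xs)."""
    if alpha >= min(xs):
        return float("-inf")
    total = 0.0
    for x in xs:
        z = (x - alpha) / beta
        total += math.log(gamma / beta) + (gamma - 1) * math.log(z) - z ** gamma
    return total

beta, gamma = 1.0, 0.5  # any gamma < 1 exhibits the problem
eps_values = (1e-1, 1e-3, 1e-6, 1e-9)
values = [weibull_loglik(min(xs) - e, beta, gamma) for e in eps_values]
# values grow without bound as alpha approaches the smallest observation
```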
Three parameter Weibull continued
Subscript 0 indicates true parameter values.
If γ0 > 1 then the probability that there is a root of the likelihood equations is high. In this case there must be two more roots: a local maximum and a saddle point!
For γ0 > 1 the theory to come applies to the local maximum and not to the global maximum of the likelihood.