Mathematical Statistics Likelihood Methods of Inference Toss Coin 6
Total Page:16
File Type:pdf, Size:1020Kb
STAT 801: Mathematical Statistics Likelihood Methods of Inference Toss coin 6 times and get Heads twice. p is probability of getting H. Probability of getting exactly 2 heads is 15p2(1 − p)4 This function of p, is likelihood function. Definition: The likelihood function is map L: domain Θ, values given by L(θ) = fθ(X) Key Point: think about how the density depends on θ not about how it depends on X. Notice: X, observed value of the data, has been plugged into the formula for density. Notice: coin tossing example uses the discrete density for f. We use likelihood for most inference problems: 1. Point estimation: we must compute an estimate θ^ = θ^(X) which lies in Θ. The maximum likelihood estimate (MLE) of θ is the value θ^ which maximizes L(θ) over θ 2 Θ if such a θ^ exists. 2. Point estimation of a function of θ: we must compute an estimate φ^ = φ^(X) of φ = g(θ). We use φ^ = g(θ^) where θ^ is the MLE of θ. 3. Interval (or set) estimation. We must compute a set C = C(X) in Θ which we think will contain θ0. We will use fθ 2 Θ : L(θ) > cg for a suitable c. 4. Hypothesis testing: decide whether or not θ0 2 Θ0 where Θ0 ⊂ Θ. We base our decision on the likelihood ratio supfL(θ); θ 2 Θ g 0 : supfL(θ); θ 2 Θ n Θ0g Maximum Likelihood Estimation To find MLE maximize L. Typical function maximization problem: Set gradient of L equal to 0 Check root is maximum, not minimum or saddle point. Examine some likelihood plots in examples: Cauchy Data 1 Iid sample X1; : : : ; Xn from Cauchy(θ) density 1 f(x; θ) = π(1 + (x − θ)2) The likelihood function is n 1 L(θ) = π(1 + (X − θ)2) i=1 i Y [Examine likelihood plots.] 2 Likelihood Function: Cauchy, n=5 Likelihood Function: Cauchy, n=5 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Function: Cauchy, n=5 Likelihood Function: Cauchy, n=5 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Function: Cauchy, n=5 Likelihood Function: Cauchy, n=5 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta 3 4 Likelihood Function: Cauchy, n=5 Likelihood Function: Cauchy, n=5 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 -2 -1 0 1 2 -2 -1 0 1 2 Theta Theta Likelihood Function: Cauchy, n=5 Likelihood Function: Cauchy, n=5 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -2 -1 0 1 2 -2 -1 0 1 2 Theta Theta Likelihood Function: Cauchy, n=5 Likelihood Function: Cauchy, n=5 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -2 -1 0 1 2 -2 -1 0 1 2 Theta Theta 5 6 Likelihood Function: Cauchy, n=25 Likelihood Function: Cauchy, n=25 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Function: Cauchy, n=25 Likelihood Function: Cauchy, n=25 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Function: Cauchy, n=25 Likelihood Function: Cauchy, n=25 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta 7 8 Likelihood Function: Cauchy, n=25 Likelihood Function: Cauchy, n=25 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 Theta Theta Likelihood Function: Cauchy, n=25 Likelihood Function: Cauchy, n=25 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 Theta Theta Likelihood Function: Cauchy, n=25 Likelihood Function: Cauchy, n=25 1.0 1.0 0.8 0.8 0.6 0.6 Likelihood Likelihood 0.4 0.4 0.2 0.2 0.0 0.0 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 Theta Theta 9 I want you to notice the following points: • The likelihood functions have peaks near the true value of θ (which is 0 for the data sets I generated). • The peaks are narrower for the larger sample size. • The peaks have a more regular shape for the larger value of n. • I actually plotted L(θ)=L(θ^) which has exactly the same shape as L but runs from 0 to 1 on the vertical scale. To maximize this likelihood: differentiate L, set result equal to 0. Notice L is product of n terms; derivative is n 1 2(Xi − θ) π(1 + (X − θ)2) π(1 + (X − θ)2)2 i=1 6 j i X Yj=i which is quite unpleasant. Much easier to work with logarithm of L: log of product is sum and logarithm is monotone increasing. Definition: The Log Likelihood function is `(θ) = logfL(θ)g : For the Cauchy problem we have 2 `(θ) = − log(1 + (Xi − θ) ) − n log(π) X [Examine log likelihood plots.] 10 Likelihood Ratio Intervals: Cauchy, n=5 Likelihood Ratio Intervals: Cauchy, n=5 -12 -10 -14 -15 -16 Log Likelihood Log Likelihood -18 -20 -20 -22 · · · · · ·· · · -25 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Ratio Intervals: Cauchy, n=5 Likelihood Ratio Intervals: Cauchy, n=5 -5 -10 -10 -15 -15 Log Likelihood Log Likelihood -20 -20 · · ·· · ·· · · · -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Ratio Intervals: Cauchy, n=5 Likelihood Ratio Intervals: Cauchy, n=5 -10 -12 -15 -14 -16 -18 -20 Log Likelihood Log Likelihood -20 -22 -25 · · · · · · · · ·· -24 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta 11 12 Likelihood Ratio Intervals: Cauchy, n=5 Likelihood Ratio Intervals: Cauchy, n=5 -11.0 -8 -11.5 -12.0 -10 -12.5 Log Likelihood Log Likelihood -12 -13.0 -13.5 -14 -2 -1 0 1 2 -2 -1 0 1 2 Theta Theta Likelihood Ratio Intervals: Cauchy, n=5 Likelihood Ratio Intervals: Cauchy, n=5 -2 -6 -7 -4 -8 -9 -6 Log Likelihood Log Likelihood -10 -8 -11 -12 -2 -1 0 1 2 -2 -1 0 1 2 Theta Theta Likelihood Ratio Intervals: Cauchy, n=5 Likelihood Ratio Intervals: Cauchy, n=5 -10 -12 -11 -13 -14 -12 Log Likelihood Log Likelihood -15 -13 -16 -14 -17 -2 -1 0 1 2 -2 -1 0 1 2 Theta Theta 13 14 Likelihood Ratio Intervals: Cauchy, n=25 Likelihood Ratio Intervals: Cauchy, n=25 -20 -40 -40 -60 -60 Log Likelihood Log Likelihood -80 -80 -100 -100 · · · · ······· ··· ··· · · · · ·· ·· · ········ ·· · · · -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Ratio Intervals: Cauchy, n=25 Likelihood Ratio Intervals: Cauchy, n=25 -20 -40 -40 -60 -60 -80 Log Likelihood Log Likelihood -80 -100 -100 · ·· ··· ··········· · · · · · · · · ········ · · · · · · · -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta Likelihood Ratio Intervals: Cauchy, n=25 Likelihood Ratio Intervals: Cauchy, n=25 -40 -60 -60 -80 -80 Log Likelihood Log Likelihood -100 -100 · · ··· · ······ ·· ··· ·· · · · · · · ··· · ······· ·· ·· · ·· ·· -120 -10 -5 0 5 10 -10 -5 0 5 10 Theta Theta 15 16 Likelihood Ratio Intervals: Cauchy, n=25 Likelihood Ratio Intervals: Cauchy, n=25 -22 -24 -24 -26 -26 -28 Log Likelihood Log Likelihood -28 -30 -30 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 Theta Theta Likelihood Ratio Intervals: Cauchy, n=25 Likelihood Ratio Intervals: Cauchy, n=25 -36 -22 -38 -24 -40 Log Likelihood Log Likelihood -26 -42 -44 -28 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 Theta Theta Likelihood Ratio Intervals: Cauchy, n=25 Likelihood Ratio Intervals: Cauchy, n=25 -43 -46 -44 -48 -45 -50 -46 -52 Log Likelihood Log Likelihood -47 -54 -48 -56 -49 -1.0 -0.5 0.0 0.5 1.0 -1.0 -0.5 0.0 0.5 1.0 Theta Theta 17 Notice the following points: • Plots of ` for n = 25 quite smooth, rather parabolic.