Fisher Information Example
Topic 15 Maximum Likelihood Estimation Multidimensional Estimation
Outline
Fisher Information
Example
  Distribution of Fitness Effects
  Gamma Distribution
Fisher Information
For a multidimensional parameter space $\theta = (\theta_1, \theta_2, \ldots, \theta_n)$, the Fisher information $I(\theta)$ is a matrix. As in the one-dimensional case, the $ij$-th entry has two alternative expressions, namely,
\[
I(\theta)_{ij} = E_\theta\left[ \frac{\partial}{\partial\theta_i} \ln L(\theta|X) \, \frac{\partial}{\partial\theta_j} \ln L(\theta|X) \right] = -E_\theta\left[ \frac{\partial^2}{\partial\theta_i \partial\theta_j} \ln L(\theta|X) \right].
\]
Rather than taking reciprocals to obtain an estimate of the variance, we find the matrix inverse $I(\theta)^{-1}$.
• The diagonal entries of $I(\theta)^{-1}$ give estimates of variances.
• The off-diagonal entries of $I(\theta)^{-1}$ give estimates of covariances.
Fisher Information
To be precise, for $n$ observations, let $\hat\theta_{i,n}(X)$ be the maximum likelihood estimator of the $i$-th parameter. Then
\[
\mathrm{Var}_\theta(\hat\theta_{i,n}(X)) \approx \frac{1}{n} I(\theta)^{-1}_{ii}, \qquad
\mathrm{Cov}_\theta(\hat\theta_{i,n}(X), \hat\theta_{j,n}(X)) \approx \frac{1}{n} I(\theta)^{-1}_{ij}.
\]
When the $i$-th parameter is $\theta_i$, the asymptotic normality and efficiency can be expressed by noting that the z-score
\[
Z_{i,n} = \frac{\hat\theta_{i,n}(X) - \theta_i}{\sqrt{I(\theta)^{-1}_{ii}/n}}
\]
is approximately a standard normal. As we saw in one dimension, we can replace the information matrix with the observed information matrix,
\[
J(\hat\theta)_{ij} = -\frac{\partial^2}{\partial\theta_i \partial\theta_j} \ln L(\hat\theta(X)|X).
\]
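One common use of the z-score is an approximate confidence interval. The sketch below assumes a 95% level (normal critical value 1.96) and assumes that the unknown $\theta$ inside the information matrix is replaced by the maximum likelihood estimate; neither choice is stated on the slides.

```latex
% Approximate 95% interval for theta_i (assumed level; plug-in information)
\[
P\left( -1.96 \le Z_{i,n} \le 1.96 \right) \approx 0.95
\quad\Longrightarrow\quad
\theta_i \in \hat\theta_{i,n}(X) \pm 1.96 \sqrt{ I(\hat\theta)^{-1}_{ii} / n } .
\]
```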
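Distribution of Fitness Effects
We return to the model of the gamma distribution for the distribution of fitness effects of deleterious mutations. To obtain the maximum likelihood estimate for the gamma family of random variables, write the likelihood
\[
L(\alpha, \beta|x) = \left( \frac{\beta^\alpha}{\Gamma(\alpha)} x_1^{\alpha-1} e^{-\beta x_1} \right) \cdots \left( \frac{\beta^\alpha}{\Gamma(\alpha)} x_n^{\alpha-1} e^{-\beta x_n} \right)
= \left( \frac{\beta^\alpha}{\Gamma(\alpha)} \right)^n (x_1 x_2 \cdots x_n)^{\alpha-1} e^{-\beta(x_1 + x_2 + \cdots + x_n)}
\]
and its logarithm
\[
\ln L(\alpha, \beta|x) = n(\alpha \ln\beta - \ln\Gamma(\alpha)) + (\alpha - 1) \sum_{i=1}^n \ln x_i - \beta \sum_{i=1}^n x_i .
\]
The score function is the vector
\[
\left( \frac{\partial}{\partial\alpha} \ln L(\alpha, \beta|x), \ \frac{\partial}{\partial\beta} \ln L(\alpha, \beta|x) \right).
\]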
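The log-likelihood formula can be transcribed directly into code. The following is a minimal Python sketch using only the standard library; the function name `gamma_loglik` and the toy data are hypothetical, introduced here for illustration only.

```python
import math

def gamma_loglik(alpha, beta, xs):
    """Log-likelihood of i.i.d. Gamma(shape=alpha, rate=beta) observations xs.

    Implements n*(alpha*ln(beta) - ln Gamma(alpha))
               + (alpha - 1)*sum(ln x_i) - beta*sum(x_i).
    """
    n = len(xs)
    sum_log_x = sum(math.log(x) for x in xs)
    sum_x = sum(xs)
    return (n * (alpha * math.log(beta) - math.lgamma(alpha))
            + (alpha - 1.0) * sum_log_x
            - beta * sum_x)

# Hypothetical toy data, chosen only to exercise the function.
toy_data = [0.01, 0.2, 0.05, 1.3]
loglik_value = gamma_loglik(0.19, 5.18, toy_data)
print(loglik_value)
```

Working with the two sums of ln x_i and x_i, rather than the product of densities, avoids underflow when n is large.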
Gamma Distribution
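\[
\ln L(\alpha, \beta|x) = n(\alpha \ln\beta - \ln\Gamma(\alpha)) + (\alpha - 1) \sum_{i=1}^n \ln x_i - \beta \sum_{i=1}^n x_i .
\]
The zeros of the components of the score function determine the maximum likelihood estimators. Thus, to determine these parameters, we solve the equations
\[
\frac{\partial}{\partial\alpha} \ln L(\hat\alpha, \hat\beta|x) = n\left( \ln\hat\beta - \frac{d}{d\alpha} \ln\Gamma(\hat\alpha) \right) + \sum_{i=1}^n \ln x_i = 0
\]
and
\[
\frac{\partial}{\partial\beta} \ln L(\hat\alpha, \hat\beta|x) = n \frac{\hat\alpha}{\hat\beta} - \sum_{i=1}^n x_i = 0, \quad \text{or} \quad \bar{x} = \frac{\hat\alpha}{\hat\beta}.
\]
Substituting $\hat\beta = \hat\alpha/\bar{x}$ into the first equation yields the following equation for $\hat\alpha$, where $\overline{\ln x}$ denotes the average of the $\ln x_i$:
\[
n\left( \ln\hat\alpha - \ln\bar{x} - \frac{d}{d\alpha} \ln\Gamma(\hat\alpha) + \overline{\ln x} \right) = 0 .
\]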
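The substitution step can be written out in two lines. This is a sketch of the algebra only; the symbol $\overline{\ln x}$ for the average of the $\ln x_i$ is notation assumed here.

```latex
% Step 1: take logarithms of beta-hat = alpha-hat / x-bar, and rewrite the sum.
\[
\ln\hat\beta = \ln\hat\alpha - \ln\bar{x},
\qquad
\sum_{i=1}^n \ln x_i = n \, \overline{\ln x} .
\]
% Step 2: insert both into the alpha-component of the score equation.
\[
n\left( \ln\hat\beta - \frac{d}{d\alpha}\ln\Gamma(\hat\alpha) \right) + \sum_{i=1}^n \ln x_i
= n\left( \ln\hat\alpha - \ln\bar{x} - \frac{d}{d\alpha}\ln\Gamma(\hat\alpha) + \overline{\ln x} \right) .
\]
```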
Gamma Distribution
This can be solved numerically. The derivative of the logarithm of the gamma function
\[
\psi(\alpha) = \frac{d}{d\alpha} \ln\Gamma(\alpha)
\]
is known as the digamma function and is called in R with digamma.
For the example for the distribution of fitness effects in humans, a simulated data set (rgamma(500,0.19,5.18)) yields $\hat\alpha = 0.2006$ and $\hat\beta = 5.806$ for maximum likelihood estimates.
Figure: $\ln\hat\alpha - \ln\bar{x} - \frac{d}{d\alpha} \ln\Gamma(\hat\alpha) + \overline{\ln x}$ (vertical axis: alpha-score), plotted against alpha from 0.14 to 0.24, crosses the horizontal axis at $\hat\alpha = 0.2006$.
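One way to carry out the numerical solution is bisection on the equation for the shape estimate. The sketch below is in Python with only the standard library, not the R code behind the slides. The helper `digamma` is a hand-written approximation (recurrence plus asymptotic series), the seed and the bisection bracket are arbitrary assumptions, and because the random number generator differs from R's `rgamma`, the estimates will not reproduce 0.2006 and 5.806 exactly.

```python
import math
import random

def digamma(x):
    """Approximate psi(x) = d/dx ln Gamma(x) for x > 0."""
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x moves the argument up to x >= 6.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6).
    inv2 = 1.0 / (x * x)
    result += (math.log(x) - 0.5 / x
               - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))
    return result

# Simulated data: 500 draws with shape 0.19 and rate 5.18.
# random.gammavariate takes (shape, scale), so scale = 1 / rate.
rng = random.Random(1)  # arbitrary seed (assumption)
n = 500
data = [rng.gammavariate(0.19, 1.0 / 5.18) for _ in range(n)]

x_bar = sum(data) / n
mean_log_x = sum(math.log(x) for x in data) / n

def alpha_score(alpha):
    """ln(alpha) - ln(x_bar) - psi(alpha) + mean(ln x); zero at the MLE."""
    return math.log(alpha) - math.log(x_bar) - digamma(alpha) + mean_log_x

# alpha_score is decreasing in alpha, so bisection on a wide bracket works.
lo, hi = 1e-3, 100.0  # assumed bracket: positive at lo, negative at hi
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if alpha_score(mid) > 0.0:
        lo = mid
    else:
        hi = mid

alpha_hat = 0.5 * (lo + hi)
beta_hat = alpha_hat / x_bar  # from x_bar = alpha_hat / beta_hat
print(alpha_hat, beta_hat)
```

Bisection is chosen here for robustness; a Newton iteration using the trigamma function would converge faster but needs a reasonable starting value.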
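Gamma Distribution
Exercise. To determine the variance of these estimators, compute the appropriate second derivatives.
\[
I(\alpha, \beta)_{11} = -\frac{\partial^2}{\partial\alpha^2} \ln L(\alpha, \beta|x) = n \frac{d^2}{d\alpha^2} \ln\Gamma(\alpha), \qquad
I(\alpha, \beta)_{22} = -\frac{\partial^2}{\partial\beta^2} \ln L(\alpha, \beta|x) = n \frac{\alpha}{\beta^2},
\]
\[
I(\alpha, \beta)_{12} = -\frac{\partial^2}{\partial\alpha \partial\beta} \ln L(\alpha, \beta|x) = -n \frac{1}{\beta}.
\]
This gives a Fisher information matrix
\[
I(\alpha, \beta) = n \begin{pmatrix} \frac{d^2}{d\alpha^2} \ln\Gamma(\alpha) & -\frac{1}{\beta} \\ -\frac{1}{\beta} & \frac{\alpha}{\beta^2} \end{pmatrix}, \qquad
I(0.19, 5.18) = 500 \begin{pmatrix} 28.983 & -0.193 \\ -0.193 & 0.007 \end{pmatrix}.
\]
NB. $\psi_1(\alpha) = d^2 \ln\Gamma(\alpha)/d\alpha^2$ is known as the trigamma function and is called in R with trigamma.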
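The matrix entries at (0.19, 5.18) can be checked numerically. This Python sketch uses a hand-written `trigamma` approximation (recurrence plus asymptotic series) in place of R's built-in; the function names `trigamma` and `fisher_information` are hypothetical, and the matrix returned is the one inside the parentheses, before multiplication by n = 500.

```python
def trigamma(x):
    """Approximate psi_1(x) = d^2/dx^2 ln Gamma(x) for x > 0."""
    result = 0.0
    # Recurrence psi_1(x) = psi_1(x + 1) + 1/x^2 moves the argument up to x >= 6.
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    # Asymptotic series:
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9).
    inv = 1.0 / x
    inv2 = inv * inv
    result += (inv + 0.5 * inv2
               + inv * inv2 * (1.0 / 6
                               - inv2 * (1.0 / 30
                                         - inv2 * (1.0 / 42 - inv2 / 30))))
    return result

def fisher_information(alpha, beta):
    """2x2 matrix [[psi_1(alpha), -1/beta], [-1/beta, alpha/beta^2]]."""
    off_diagonal = -1.0 / beta
    return [[trigamma(alpha), off_diagonal],
            [off_diagonal, alpha / (beta * beta)]]

info = fisher_information(0.19, 5.18)
for row in info:
    print([round(entry, 3) for entry in row])
```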
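Gamma Distribution
The inverse matrix
\[
I(\alpha, \beta)^{-1} = \frac{1}{500} \begin{pmatrix} 0.0422 & 1.1494 \\ 1.1494 & 172.5587 \end{pmatrix}.
\]
Thus,
\[
\mathrm{Var}(\hat\alpha) \approx 8.432 \times 10^{-5}, \qquad \sigma_{\hat\alpha} \approx 0.00918,
\]
\[
\mathrm{Var}(\hat\beta) \approx 0.3451, \qquad \sigma_{\hat\beta} \approx 0.5875.
\]
Compare these with the standard deviations of the method of moments estimators:
\[
\sigma_{\hat\alpha} \approx 0.02838, \qquad \sigma_{\hat\beta} \approx 0.9769.
\]
Exercise. Estimate the correlation $\rho(\hat\alpha, \hat\beta)$.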
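The inverse and the standard deviations can be reproduced with the closed-form inverse of a 2x2 matrix. In this Python sketch the trigamma value 28.983 is taken from the slides rather than recomputed, the other entries are evaluated from alpha = 0.19 and beta = 5.18, and all variable names are hypothetical. The correlation is left to the exercise.

```python
import math

n = 500
alpha, beta = 0.19, 5.18

# Matrix inside the parentheses of I(0.19, 5.18): [[a, b], [b, d]].
a = 28.983                # trigamma(0.19), value quoted on the slides
b = -1.0 / beta           # off-diagonal entry
d = alpha / (beta * beta)

# Closed-form inverse of a symmetric 2x2 matrix.
det = a * d - b * b
inverse = [[d / det, -b / det],
           [-b / det, a / det]]

# Variances of the estimators: diagonal of the inverse divided by n.
var_alpha = inverse[0][0] / n
var_beta = inverse[1][1] / n
sd_alpha = math.sqrt(var_alpha)
sd_beta = math.sqrt(var_beta)

print(inverse)
print(var_alpha, sd_alpha)
print(var_beta, sd_beta)
```

Using the rounded display value 0.007 for the lower-right entry instead of alpha/beta^2 would noticeably change the inverse, since the determinant is small; that is why the entries are recomputed here.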
Gamma Distribution
Figure: The log-likelihood surface (axes: alpha, beta, loglikeli). The domain is 0.14 ≤ α ≤ 0.24 and 5 ≤ β ≤ 7.
Figure: Graphs of vertical slices of the log-likelihood surface through the MLE. (top) log-likelihood against alpha at $\hat\beta = 5.806$; (bottom) log-likelihood against beta at $\hat\alpha = 0.20066$.