Fisher Information
Debdeep Pati
April 6, 2016

1 Fisher Information

Assume X ∼ f(x | θ) (pdf or pmf) with θ ∈ Θ ⊂ ℝ. Define

\[
I_X(\theta) = E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right)^2\right],
\]

where ∂/∂θ log f(X | θ) is the derivative of the log-likelihood function evaluated at the true value θ.

Fisher information is meaningful for families of distributions which are regular:

1. Fixed support: {x : f(x | θ) > 0} is the same for all θ.

2. ∂/∂θ log f(x | θ) must exist and be finite for all x and θ.

3. If E_θ|W(X)| < ∞ for all θ, then differentiation and integration can be interchanged:
\[
\frac{\partial^k}{\partial\theta^k} E_\theta W(X)
= \frac{\partial^k}{\partial\theta^k}\int W(x)\, f(x\mid\theta)\,dx
= \int W(x)\,\frac{\partial^k}{\partial\theta^k} f(x\mid\theta)\,dx.
\]

1.1 Regular families

One-parameter exponential families; the Cauchy location and scale families,
\[
f(x\mid\theta) = \frac{1}{\pi(1+(x-\theta)^2)}, \qquad
f(x\mid\theta) = \frac{1}{\pi\theta(1+(x/\theta)^2)};
\]
and lots more. (Most families of distributions used in applications are regular.)

1.2 Non-regular families

Uniform(0, θ); Uniform(θ, θ + 1).

1.3 Facts about Fisher Information

Assume a regular family.

1. E_θ[∂/∂θ log f(X | θ)] = 0. Here ∂/∂θ log f(X | θ) is called the "score" function S(θ).

Proof.
\[
E_\theta\left[\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right]
= \int \left(\frac{\partial}{\partial\theta}\log f(x\mid\theta)\right) f(x\mid\theta)\,dx
= \int \frac{\frac{\partial}{\partial\theta} f(x\mid\theta)}{f(x\mid\theta)}\, f(x\mid\theta)\,dx
= \int \frac{\partial}{\partial\theta} f(x\mid\theta)\,dx
= \frac{\partial}{\partial\theta}\int f(x\mid\theta)\,dx = 0,
\]
since ∫ f(x | θ) dx = 1 for all θ.

2. I_X(θ) = Var_θ(∂/∂θ log f(X | θ)).

Proof. Since E_θ[∂/∂θ log f(X | θ)] = 0,
\[
\mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right)
= E_\theta\left[\left(\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right)^2\right] = I_X(\theta).
\]

3. If X = (X₁, X₂, ..., Xₙ) and X₁, X₂, ..., Xₙ are independent random variables, then I_X(θ) = I_{X₁}(θ) + I_{X₂}(θ) + ··· + I_{Xₙ}(θ).

Proof. Note that
\[
f(x\mid\theta) = \prod_{i=1}^n f_i(x_i\mid\theta),
\]
where f_i(· | θ) is the pdf (pmf) of X_i. Observe that
\[
\frac{\partial}{\partial\theta}\log f(X\mid\theta) = \sum_{i=1}^n \frac{\partial}{\partial\theta}\log f_i(X_i\mid\theta)
\]
and the random variables in the sum are independent. Thus
\[
\mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right)
= \sum_{i=1}^n \mathrm{Var}_\theta\left(\frac{\partial}{\partial\theta}\log f_i(X_i\mid\theta)\right),
\]
so that I_X(θ) = Σᵢ I_{X_i}(θ) by fact 2.

4. If X₁, X₂, ..., Xₙ are i.i.d. and X = (X₁, X₂, ..., Xₙ), then I_{X_i}(θ) = I_{X₁}(θ) for all i, so that I_X(θ) = n I_{X₁}(θ).

5. An alternate formula for Fisher information is
\[
I_X(\theta) = E_\theta\left[-\frac{\partial^2}{\partial\theta^2}\log f(X\mid\theta)\right].
\]

Proof. Abbreviate ∫ f(x | θ) dx as ∫ f, etc. Since 1 = ∫ f, applying ∂/∂θ to both sides,
\[
0 = \frac{\partial}{\partial\theta}\int f
= \int \frac{\partial f}{\partial\theta}
= \int \frac{\partial f/\partial\theta}{f}\, f
= \int \left(\frac{\partial}{\partial\theta}\log f\right) f.
\]
Applying ∂/∂θ again,
\[
0 = \frac{\partial}{\partial\theta}\int \left(\frac{\partial}{\partial\theta}\log f\right) f
= \int \frac{\partial}{\partial\theta}\left[\left(\frac{\partial}{\partial\theta}\log f\right) f\right]
= \int \left(\frac{\partial^2}{\partial\theta^2}\log f\right) f
+ \int \left(\frac{\partial}{\partial\theta}\log f\right)\frac{\partial f}{\partial\theta}.
\]
Noting that
\[
\frac{\partial f}{\partial\theta} = \frac{\partial f/\partial\theta}{f}\, f = \left(\frac{\partial}{\partial\theta}\log f\right) f,
\]
this becomes
\[
0 = \int \left(\frac{\partial^2}{\partial\theta^2}\log f\right) f + \int \left(\frac{\partial}{\partial\theta}\log f\right)^2 f,
\]
or
\[
0 = E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f(X\mid\theta)\right] + I_X(\theta).
\]

Example: Fisher information for a Poisson sample. Observe X̃ = (X₁, ..., Xₙ) iid Poisson(λ). Find I_X̃(λ). We know I_X̃(λ) = n I_{X₁}(λ). We shall calculate I_{X₁}(λ) in three ways. Let X = X₁.

Preliminaries:
\[
f(x\mid\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \qquad
\log f(x\mid\lambda) = x\log\lambda - \lambda - \log x!,
\]
\[
\frac{\partial}{\partial\lambda}\log f(x\mid\lambda) = \frac{x}{\lambda} - 1, \qquad
-\frac{\partial^2}{\partial\lambda^2}\log f(x\mid\lambda) = \frac{x}{\lambda^2}.
\]

Method #1: Observe that
\[
I_X(\lambda) = E_\lambda\left[\left(\frac{\partial}{\partial\lambda}\log f(X\mid\lambda)\right)^2\right]
= E_\lambda\left[\left(\frac{X}{\lambda}-1\right)^2\right]
= \mathrm{Var}_\lambda\left(\frac{X}{\lambda}\right)
= \frac{\mathrm{Var}(X)}{\lambda^2} = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda},
\]
since E_λ(X/λ) = E_λX/λ = 1.

Method #2: Observe that
\[
I_X(\lambda) = \mathrm{Var}_\lambda\left(\frac{\partial}{\partial\lambda}\log f(X\mid\lambda)\right)
= \mathrm{Var}_\lambda\left(\frac{X}{\lambda}-1\right)
= \mathrm{Var}_\lambda\left(\frac{X}{\lambda}\right) = \frac{1}{\lambda}
\]
(as in Method #1).

Method #3: Observe that
\[
I_X(\lambda) = E_\lambda\left[-\frac{\partial^2}{\partial\lambda^2}\log f(X\mid\lambda)\right]
= E_\lambda\left[\frac{X}{\lambda^2}\right] = \frac{\lambda}{\lambda^2} = \frac{1}{\lambda}.
\]

Thus I_X̃(λ) = n I_{X₁}(λ) = n/λ.
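As a numerical companion to the Poisson example, the following minimal sketch (assuming NumPy is available; names such as `lam` and `score` are illustrative, not from the notes) simulates a Poisson sample, checks that the score has mean zero and variance close to 1/λ (facts 1 and 2), and compares the observed information at the MLE λ̂ = x̄ with the expected information n/λ.

```python
import numpy as np

# Minimal numerical check of the Poisson Fisher information calculation.
rng = np.random.default_rng(0)
lam, n = 3.0, 100_000
x = rng.poisson(lam, size=n)

# Fact 2: I_X1(lam) = Var_lam(d/dlam log f(X|lam)), with score X/lam - 1.
score = x / lam - 1.0
print(score.mean())           # ~ 0      (fact 1: the score has mean zero)
print(score.var(), 1 / lam)   # ~ 0.333  (Monte Carlo estimate vs. exact 1/lam)

# For the full sample, I(lam) = n * I_X1(lam) = n / lam.  At the MLE
# lam_hat = x.mean(), the observed information -d^2/dlam^2 log-likelihood
# equals sum(x) / lam_hat^2 = n / lam_hat, the same as the plug-in n / lam_hat.
lam_hat = x.mean()
print(x.sum() / lam_hat**2, n / lam_hat, n / lam)
```

For this model the observed and plug-in expected information coincide exactly, since −ℓ''(λ̂) = Σᵢ xᵢ/λ̂² = n x̄/x̄² = n/λ̂; both are close to the true value n/λ when n is large.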
Example: Fisher information for the Cauchy location family. Suppose X₁, X₂, ..., Xₙ are iid with pdf
\[
f(x\mid\theta) = \frac{1}{\pi(1+(x-\theta)^2)}.
\]
Let X̃ = (X₁, ..., Xₙ) and X ∼ f(x | θ). Find I_X̃(θ). Note that I_X̃(θ) = n I_{X₁}(θ) = n I_X(θ). Now
\[
\frac{\partial}{\partial\theta}\log f(x\mid\theta)
= \frac{\partial f/\partial\theta}{f}
= \frac{\dfrac{-1}{\pi(1+(x-\theta)^2)^2}\cdot 2(x-\theta)\cdot(-1)}{\dfrac{1}{\pi(1+(x-\theta)^2)}}
= \frac{2(x-\theta)}{1+(x-\theta)^2}.
\]
Now
\[
I_X(\theta) = E\left[\left(\frac{\partial}{\partial\theta}\log f(X\mid\theta)\right)^2\right]
= E\left[\left(\frac{2(X-\theta)}{1+(X-\theta)^2}\right)^2\right]
= \int_{-\infty}^{\infty}\left(\frac{2(x-\theta)}{1+(x-\theta)^2}\right)^2 \frac{1}{\pi(1+(x-\theta)^2)}\,dx
= \frac{4}{\pi}\int_{-\infty}^{\infty}\frac{(x-\theta)^2}{(1+(x-\theta)^2)^3}\,dx.
\]
Letting u = x − θ, du = dx,
\[
I_X(\theta) = \frac{4}{\pi}\int_{-\infty}^{\infty}\frac{u^2}{(1+u^2)^3}\,du
= \frac{8}{\pi}\int_{0}^{\infty}\frac{u^2}{(1+u^2)^3}\,du.
\]
Substituting x = 1/(1 + u²), so that u = (1/x − 1)^{1/2} and du = (1/2)(1/x − 1)^{−1/2}(−1/x²) dx,
\[
I_X(\theta) = \frac{8}{\pi}\int_{0}^{\infty}\frac{u^2}{(1+u^2)^3}\,du
= \frac{8}{\pi}\int_{0}^{\infty}\frac{u^2}{(1+u^2)^2}\cdot\frac{1}{1+u^2}\,du
= \frac{8}{\pi}\int_{0}^{1}(1-x)\,x^2\cdot\frac{1}{2}\left(\frac{1}{x}-1\right)^{-1/2}\frac{1}{x^2}\,dx
\]
\[
= \frac{4}{\pi}\int_{0}^{1} x^{1/2}(1-x)^{1/2}\,dx
= \frac{4}{\pi}\int_{0}^{1} x^{3/2-1}(1-x)^{3/2-1}\,dx \quad\text{(Beta integral)}
= \frac{4}{\pi}\,\frac{\Gamma(3/2)\,\Gamma(3/2)}{\Gamma(3/2+3/2)}
= \frac{4}{\pi}\,\frac{(\tfrac{1}{2}\sqrt{\pi})^2}{2!}
= \frac{1}{2}.
\]
Hence I_X̃(θ) = n/2.

2 Uses of Fisher Information

• Asymptotic distribution of MLEs
• Cramér–Rao inequality (information inequality)

2.1 Asymptotic distribution of MLEs

• i.i.d. case: If f(x | θ) is a regular one-parameter family of pdfs (or pmfs) and θ̂ₙ = θ̂ₙ(X̃ₙ) is the MLE based on X̃ₙ = (X₁, ..., Xₙ), where n is large and X₁, ..., Xₙ are iid from f(x | θ), then approximately
\[
\hat\theta_n \sim N\!\left(\theta,\ \frac{1}{nI(\theta)}\right),
\]
where I(θ) ≡ I_{X₁}(θ) and θ is the true value. Note that nI(θ) = I_{X̃ₙ}(θ). More formally,
\[
\frac{\hat\theta_n - \theta}{\sqrt{1/(nI(\theta))}}
= \sqrt{nI(\theta)}\,(\hat\theta_n - \theta) \xrightarrow{d} N(0,1)
\]
as n → ∞.

• More general case: (Assuming various regularity conditions.) If f(x̃ | θ) is a one-parameter family of joint pdfs (or joint pmfs) for data X̃ₙ = (X₁, ..., Xₙ), where n is large (think of a large dataset arising from a regression or time series model), and θ̂ₙ = θ̂ₙ(X̃ₙ) is the MLE, then
\[
\hat\theta_n \sim N\!\left(\theta,\ \frac{1}{I_{\tilde X_n}(\theta)}\right),
\]
where θ is the true value.

2.2 Estimation of the Fisher Information

If θ is unknown, then so is I_X̃(θ). Two estimates Î of the Fisher information I_X̃(θ) are
\[
\hat I_1 = I_{\tilde X}(\hat\theta), \qquad
\hat I_2 = -\frac{\partial^2}{\partial\theta^2}\log f(\tilde X\mid\theta)\Big|_{\theta=\hat\theta},
\]
where θ̂ is the MLE of θ based on the data X̃. Î₁ is the obvious plug-in estimator; it can be difficult to compute if I_X̃(θ) does not have a known closed form. The estimator Î₂ is suggested by the formula
\[
I_{\tilde X}(\theta) = E_\theta\left[-\frac{\partial^2}{\partial\theta^2}\log f(\tilde X\mid\theta)\right].
\]
It is often easy to compute, and it is required in many Newton–Raphson style algorithms for finding the MLE (so that it is already available without extra computation). The two estimates Î₁ and Î₂ are often referred to as the "expected" and "observed" Fisher information, respectively. As n → ∞, both estimators are consistent (after normalization) for I_{X̃ₙ}(θ) under various regularity conditions. For example, in the iid case, Î₁/n, Î₂/n, and I_{X̃ₙ}(θ)/n all converge to I(θ) ≡ I_{X₁}(θ).

2.3 Approximate Confidence Intervals for θ

Choose 0 < α < 1 (say, α = 0.05). Let z* be such that P(−z* < Z < z*) = 1 − α, where Z ∼ N(0, 1). When n is large, we have approximately
\[
\sqrt{I_{\tilde X}(\theta)}\,(\hat\theta - \theta) \sim N(0,1),
\]
so that
\[
P\left(-z^* < \sqrt{I_{\tilde X}(\theta)}\,(\hat\theta - \theta) < z^*\right) \approx 1 - \alpha,
\]
or equivalently,
\[
P\left(\hat\theta - z^*\sqrt{\frac{1}{I_{\tilde X}(\theta)}} < \theta < \hat\theta + z^*\sqrt{\frac{1}{I_{\tilde X}(\theta)}}\right) \approx 1 - \alpha.
\]
This approximation continues to hold when I_X̃(θ) is replaced by an estimate Î (either Î₁ or Î₂):
\[
P\left(\hat\theta - z^*\sqrt{\frac{1}{\hat I}} < \theta < \hat\theta + z^*\sqrt{\frac{1}{\hat I}}\right) \approx 1 - \alpha.
\]
Thus
\[
\left(\hat\theta - z^*\sqrt{\frac{1}{\hat I}},\ \ \hat\theta + z^*\sqrt{\frac{1}{\hat I}}\right)
\]
is an approximate 1 − α confidence interval for θ. (Here θ̂ is the MLE and Î is an estimate of the Fisher information.)

3 Cramér–Rao Inequality

Let X̃ ∼ P_θ, θ ∈ Θ ⊂ ℝ.

Theorem 1. If f(x̃ | θ) is a regular one-parameter family, E_θ W(X̃) = τ(θ) for all θ, and τ(θ) is differentiable, then
\[
\mathrm{Var}_\theta(W(\tilde X)) \ge \frac{\{\tau'(\theta)\}^2}{I_{\tilde X}(\theta)}.
\]

Proof. Write S(θ) = ∂/∂θ log f(X̃ | θ) for the score. By fact 1, E_θ S(θ) = 0, and by fact 2, Var_θ(S(θ)) = I_X̃(θ). Interchanging differentiation and integration (regularity condition 3),
\[
\mathrm{Cov}_\theta\big(W(\tilde X), S(\theta)\big) = E_\theta\big[W(\tilde X)\, S(\theta)\big]
= \int W(\tilde x)\,\frac{\partial}{\partial\theta} f(\tilde x\mid\theta)\,d\tilde x
= \frac{\partial}{\partial\theta}\int W(\tilde x)\, f(\tilde x\mid\theta)\,d\tilde x = \tau'(\theta).
\]
By the Cauchy–Schwarz inequality,
\[
\{\tau'(\theta)\}^2
= \big\{\mathrm{Cov}_\theta\big(W(\tilde X), S(\theta)\big)\big\}^2
\le \mathrm{Var}_\theta(W(\tilde X))\,\mathrm{Var}_\theta(S(\theta))
= \mathrm{Var}_\theta(W(\tilde X))\, I_{\tilde X}(\theta),
\]
and dividing by I_X̃(θ) gives the result.
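To connect Sections 2.1–2.3 with the Cauchy example above, here is a minimal sketch (assuming NumPy; the helper names `score` and `neg_hessian` are illustrative, not from the notes) that finds the Cauchy location MLE by Newton–Raphson, computes the expected information Î₁ = n·I(θ̂) = n/2 and the observed information Î₂ = −ℓ''(θ̂), and forms the approximate 95% confidence interval of Section 2.3.

```python
import numpy as np

# Wald-type confidence interval for the Cauchy location parameter.
rng = np.random.default_rng(1)
n, theta_true = 500, 2.0
x = theta_true + rng.standard_cauchy(n)      # X_i iid Cauchy(theta, 1)

def score(t):          # l'(t) = sum 2(x_i - t) / (1 + (x_i - t)^2)
    u = x - t
    return np.sum(2 * u / (1 + u**2))

def neg_hessian(t):    # -l''(t) = sum 2(1 - (x_i - t)^2) / (1 + (x_i - t)^2)^2
    u = x - t
    return np.sum(2 * (1 - u**2) / (1 + u**2) ** 2)

theta_hat = np.median(x)                     # starting value near theta
for _ in range(50):                          # fixed number of Newton-Raphson steps
    theta_hat += score(theta_hat) / neg_hessian(theta_hat)

I1 = n / 2                                   # expected information: I(theta) = n/2
I2 = neg_hessian(theta_hat)                  # observed information at the MLE

z_star = 1.96                                # z* for alpha = 0.05
for I_hat in (I1, I2):
    half = z_star / np.sqrt(I_hat)
    print(theta_hat - half, theta_hat + half)   # approximate 95% CI for theta
```

Both choices of Î give nearly the same interval here, consistent with the statement in Section 2.2 that Î₁/n and Î₂/n both converge to I(θ) = 1/2.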
