Properties of Maximum Likelihood Estimation (MLE) (Latex Prepared by Haiguang Wen) April 27, 2015

ECE 645: Estimation Theory Spring 2015 Instructor: Prof. Stanley H. Chan Lecture 8: Properties of Maximum Likelihood Estimation (MLE) (LaTeX prepared by Haiguang Wen) April 27, 2015 This lecture note is based on ECE 645(Spring 2015) by Prof. Stanley H. Chan in the School of Electrical and Computer Engineering at Purdue University. 1 Efficiency of MLE Maximum Likelihood Estimation (MLE) is a widely used statistical estimation method. In this lecture, we will study its properties: efficiency, consistency and asymptotic normality. MLE is a method for estimating parameters of a statistical model. Given the distribution of a statistical model f(y ; θ) with unkown deterministic parameter θ, MLE is to estimate the parameter θ by maximizing the probability f(y ; θ) with observations y. θ(y) = arg min f(y ; θ) (1) θ Please see the previous lecture note Lectureb 7 for details. 1.1 Cramér–Rao Lower Bound (CRLB) Cramér–Rao Lower Bound (CRLB) is introduced in Lecture 7. Briefly, CRLB describes a lower bound on the variance of estimators of the deterministic parameter θ. That is ( ∂ E[θ(Y )])2 Var θ(Y ) > ∂θ , (2) I(θ) b where I(θ) is the Fisher information that measuresb the information carried by the observable random variable Y about the unknown parameter θ. For unbiased estimator θ(Y ), Equation 2 can be simplified as 1 Var θ(Y ) > b , (3) I(θ) which means the variance of any unbiased estimatorb is as least as the inverse of the Fisher information. 1.2 Efficient Estimator From section 1.1, we know that the variance of estimator θ(y) cannot be lower than the CRLB. So any estimator whose variance is equal to the lower bound is considered as an efficient estimator. b Definition 1. Efficient Estimator An estimator θ(y) is efficient if it achieves equality in CRLB. b Example 1. 2 Question: Y = Y1, Y2, , Yn are i.i.d. Gaussian random variables with distribution N(θ, σ ). Determine the maximum likelihood{ ··· estimator} of θ. Is the estimator efficient? Solution: Let y = y ,y , ,yn be the observation, then { 1 2 ··· } n f(y ; θ)= f(yk ; θ) k Y=1 n 2 1 (yk θ) = exp − √2πσ2 − 2σ2 k=1 Y n 2 1 k=1(yk θ) = n exp − . (2πσ2) 2 − 2σ2 P Take the log of both sides of the above equation, we have n 2 n (yk θ) log f(y ; θ)= log(2πσ2) k=1 − . − 2 − 2σ2 P Since log f(y ; θ) is a quadratic concave function of θ, we can obtain the MLE by solving the following equation. n ∂ log f(y ; θ) 2 (yk θ) = k=1 − =0. ∂θ 2σ2 P 1 n Therefore, the MLE is θMLE(y)= n k=1 yk. P Now let us check whetherb the estimator is efficient or not. It is easy to check that the MLE is an unbiased estimator (E[θMLE (y)] = θ). To determine the CRLB, we need to calculate the Fisher information of the model. ∂2 n b I(θ)= E log f(y ; θ) = (4) − ∂θ2 σ2 According to Equation 3, we have 1 σ2 Var θ (Y ) > = . (5) MLE I(θ) n And the variance of the MLE is b n 1 σ2 Var θ (Y ) = Var Y = . (6) MLE n k n k ! X=1 b So CRLB equality is achieved, thus the MLE is efficient. 1.3 Minimum Variance Unbiased Estimator (MVUE) Recall that a Minimum Variance Unbiased Estimator (MVUE) is an unbiased estimator whose variance is lower than any other unbiased estimator for all possible values of parameter θ. That is Var(θMVUE (Y )) 6 Var(θ(Y )) (7) for any unbiased θ(Y ) of any θ. b b Proposition 1. Unbiased and Efficient Estimators b If an estimator θ(y) is unbiased and efficient, then it must be MVUE. b 2 Proof. Since θ(y) is efficient, according to CRLB, we have b Var θ(Y ) 6 Var θ(Y ) (8) for any θ(Y ). Therefore, θ(Y ) must be minimumb variance (MV).e Since θ(Y ) is also unbiased, it is a MVUE. ✷ e e e Remark: The converse of the proposition is not true in general. That is, MVUE does NOT need to be efficient. Here is a counter example. Example 2. Counter Example 1 Suppose that y = Y1, Y2, , Yn are i.i.d. exponential random variables with unknown mean θ . Find the MLE and MVUE of{ θ. Are··· these} estimators efficient? Solution: Let y = y ,y , ,yn be the observation, then { 1 2 ··· } n f(y ; θ)= f(yk ; θ) k=1 Yn = θ exp θyk {− } k=1 Y n n = θ exp θ yk . (9) − ( k ) X=1 Take the log of both sides of the above equation, we have n log f(y ; θ)= n log(θ) θ yk. − k X=1 Since log f(y ; θ) is a concave function of θ, we can obtain the MLE by solving the following equation. n ∂ log f(y ; θ) n = =0 ∂θ θ − k X=1 So the MLE is n θMLE(y)= n . (10) k=1 yk b P E n To calculate the CRLB, we need to calculate θMLE(Y ) and Var θMLE(Y ) . Let T (y)= k=1 yk, then by moment generating function, we can show thath the distributioni of T (y) is the Erlange distribution: b b P n n−1 θ t − f (t)= e θt. (11) T (n 1)! − So we have ∞ n n−1 n θ t − E θ (T (Y )) = e θtdt MLE t (n 1)! Z0 h i ∞− n−2 nθ (θt) − b = e θtdθt n 1 0 (n 2)! −n Z − = θ. (12) n 1 − 3 Therefore the MLE is a biased estimator of θ. Similarly, we can calculate the variance of MLE as follows. 2 2 Var θMLE (T (Y )) = E θ (T (Y )) E θMLE (T (Y )) MLE − h θ2n2 i h i b = b b (n 1)2(n 2) − − The Fisher information is ∂2 n I(θ)= log f(y θ)= . −∂θ2 | θ2 So the CRLB is 2 ∂ E ∂θ θMLE (T (Y )) Var θ (T (Y )) > MLE h I(θ) i b n2 n b = (n 1)2 θ2 −n . = θ2. (n 1)2 − The CRLB equality does NOT hold, so θMLE is not efficient. b n The distribution in Equation 9 belongs to exponential family and T (y) = k=1 yk is a complete sufficient statistic. So the MLE can be expressed as θ (T (y)) = n , which is a function of T (y). However, the MLE T (y) P MLE is a biased estimator (Equation 12). But we can construct an unbiased estimator based on the MLE. That is b n 1 θ(T (y)) = − θ (T (y)) n MLE n 1 e = − .b T (y) E E n−1 n−1 n It is easy to check θ (T (Y )) = n θMLE (T (Y )) = n n−1 θ = θ. Since θ(T (y)) is an unbiased estimator and it is a functionh ofi completeh sufficient statistic,i θ(T (y)) is MVUE. So e b e n 1 θMVUE (T (y)) = −e . (13) T (y) The variance of MVUE is b n 1 Var θ (T (Y )) = Var − θ (T (Y )) MVUE n MLE (n 1)2 n2θ2 b = − b n2 (n 1)2(n 2) − − θ2 = . n 2 − So the CRLB is 1 θ2 Var θ (T (Y )) > = . MVUE I(θ) n Therefore, the MVUE is NOT an efficientb estimator. 4 2 Consistency of MLE Definition 2. Consistency Let Y1, , Yn be a sequence of observations. Let θn be the estimator using Y1, , Yn . We say that { ··· } p { ··· } θn is consistent if θn θ, i.e., → b P θn θ >ε 0, as n (14) b b | − | → → ∞ Remark: A sufficient condition to have Equationb 14 is that 2 E θn θ 0, as n . (15) − → → ∞ Proof. b According to Chebyshev’s inequality, we have 2 E (θn θ) − P θn θ ε (16) | − |≥ ≤ h ε2 i b 2 b Since E θn θ 0, the we have − → b 2 E (θn θ) − 0 P θn θ ε 0. ≤ | − |≥ ≤ h ε2 i → b b Therefore, P θn θ >ε 0, as n . | − | → → ∞ ✷ b Example 3. 2 Y1, Y2, , Yn are i.i.d. Gaussian random variables with distribution N(θ, σ ). Is the MLE using Y1, Y2, , Yn consistent?{ ··· } { ··· } Solution: From Example 1., we know that the MLE is n 1 θ = Y . n n k k X=1 b Since 2 σ2 E θn θ = Var θn = , (see Equation 6), − n 2 b p b so E θn θ 0. Therefore θn θ, i.e. θn is consistent. − → → b b b In fact, the result of the example above it holds for any distribution. The following proposition states this result: Proposition 2. (MLE of i.i.d observation is consistent) Let Y , , Yn be a sequence of i.i.d. observations where { 1 ··· } iid Yk ∼ fθ(y). Then the MLE of θ is consistent. 5 Proof. (This proof is partially correct. See Levy Chapter 4.5 for complete discussion.) The MLE of θ is n θn = argmax fθ(yk) θ k Y=1 b n = argmax log fθ(yk) θ k=1 ! n Y 1 = argmax log fθ(yk) θ n k X=1 = argmax ϕn(θ), θ 1 n fθ (yk) where ϕn(θ)= log fθ(yk). Let ℓb(yk) = log , then we have n k=1 θ fθb(yk) P E def fθ(yk) θb ℓθb(Yk) = log fθb(yk)dyk fb(y ) Z θ k = D fθ fb .

Load more