10 Point Estimation Using Maximum Likelihood
10.1 The method of maximum likelihood

Definition
For each $x \in \mathcal{X}$, let $\hat\theta(x)$ be such that, with $x$ held fixed, the likelihood function $\{L(\theta; x) : \theta \in \Theta\}$ attains its maximum value as a function of $\theta$ at $\hat\theta(x)$. The estimator $\hat\theta(X)$ is then said to be a maximum likelihood estimator (MLE) of $\theta$.

• Given any observed value $x$, the value of a MLE, $\hat\theta(x)$, can be interpreted as the most likely value of $\theta$ to have given rise to the observed value $x$.

• To find a MLE, it is often more convenient to maximize the log likelihood function, $\ln L(\theta; x)$, which is equivalent to maximizing the likelihood function.

• It should be noted that a MLE may not exist: there may be an $x \in \mathcal{X}$ for which no $\theta$ maximizes the likelihood function $\{L(\theta; x) : \theta \in \Theta\}$. In practice, however, such cases have been found to be exceptional.

• The MLE, if it does exist, may not be unique: given $x \in \mathcal{X}$, there may be more than one value of $\theta$ that maximizes the likelihood function $\{L(\theta; x) : \theta \in \Theta\}$. Again, in practice, such cases are found only very exceptionally.

• There are simple cases in which the MLE may be found analytically, possibly just using the standard methods of calculus. In other cases, especially if the dimension $q$ of the vector of parameters is large, the maximization has to be carried out using numerical methods within computer packages.

• The maximum of the likelihood function may be in the interior of $\Theta$, but there are cases in which it lies on the boundary of $\Theta$. In the former case, routine analytical or numerical methods are generally available, but the possibility that the latter case may occur has to be borne in mind.

Assuming that the likelihood function $L(\theta; x)$ is a continuously differentiable function of $\theta$, given $x \in \mathcal{X}$, an interior stationary point of $\ln L(\theta; x)$, or equivalently of $L(\theta; x)$, is given by a solution of the likelihood equations,
\[
\frac{\partial \ln L(\theta; x)}{\partial \theta_j} = 0, \qquad j = 1, \dots, q. \tag{1}
\]
A solution of Equations (1) may or may not be unique, and may or may not give us a MLE. In many of the standard cases the solution of Equations (1) does give us the MLE, but we cannot take this for granted. We may wish to check that a solution of Equations (1) gives at least a local maximum of the likelihood function. If $L(\theta; x)$ is twice continuously differentiable, the criterion to check is that the Hessian matrix of second order partial derivatives,
\[
\frac{\partial^2 \ln L(\theta; x)}{\partial \theta_j \, \partial \theta_k}, \qquad j, k = 1, \dots, q,
\]
is negative definite at the solution point. However, even if this condition can be shown to be satisfied, it does not prove that the solution point is a global maximum. If possible, it is better to use alternative, direct methods of proving that a global maximum has been found.
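When no closed-form solution of the likelihood equations is available, the maximization is, as noted above, carried out numerically. The following sketch is not part of these notes: it assumes Python with numpy and scipy, simulated data, and a parameterization by $(\mu, \log\sigma)$ chosen purely so that the constraint $\sigma > 0$ is handled automatically. It maximizes the $N(\mu, \sigma^2)$ log likelihood numerically and compares the answer with the closed-form MLE derived in Example 1 below.

```python
# Illustrative sketch only: numerical maximisation of a log likelihood,
# checked against the closed-form normal MLE of Example 1.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=2.0, size=200)  # simulated data; true (mu, sigma) = (5, 2)

def neg_log_lik(theta, data):
    """Negative log likelihood for N(mu, sigma^2), parameterised as (mu, log sigma)."""
    mu, log_sigma = theta
    return -np.sum(norm.logpdf(data, loc=mu, scale=np.exp(log_sigma)))

# Maximising ln L is equivalent to minimising -ln L
res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]), args=(x,), method="BFGS")
mu_hat, sigma2_hat = res.x[0], np.exp(res.x[1]) ** 2

# Closed-form MLEs for comparison: mu_hat = xbar, sigma2_hat = s_xx / n
print(mu_hat, x.mean())
print(sigma2_hat, np.mean((x - x.mean()) ** 2))
```

The two pairs of printed values should agree to the accuracy of the optimizer, illustrating that the numerical and analytical routes lead to the same estimates in this standard case.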
Example 1
For a random sample $X_1, X_2, \dots, X_n$ of size $n$ from a $N(\mu, \sigma^2)$ distribution, we found in Chapter 9 that the log likelihood function is given by
\[
\ln L(\mu, \sigma^2; x) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\left(s_{xx} + n(\bar{x} - \mu)^2\right), \tag{2}
\]
where $s_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$. We could find the MLE by solving the likelihood equations, but the following argument provides a clear-cut demonstration that we really have found the MLE. For any given $\sigma^2 > 0$, the expression (2) is maximized with respect to $\mu$ by taking the value $\hat\mu = \bar{x}$ of $\mu$. Using this fact, we reduce the problem of finding a MLE to the univariate problem of maximizing
\[
\ln L(\hat\mu, \sigma^2; x) = -\frac{n}{2} \ln(2\pi\sigma^2) - \frac{s_{xx}}{2\sigma^2} \tag{3}
\]
with respect to $\sigma^2 > 0$. Differentiating the expression (3) with respect to $\sigma^2$, we find
\[
\frac{\partial \ln L(\hat\mu, \sigma^2; x)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{s_{xx}}{2\sigma^4} = \frac{n}{2\sigma^4}\left(\frac{s_{xx}}{n} - \sigma^2\right).
\]
Hence
\[
\frac{\partial \ln L(\hat\mu, \sigma^2; x)}{\partial \sigma^2}
\begin{cases}
> 0 & \text{for } \sigma^2 < s_{xx}/n, \\
= 0 & \text{for } \sigma^2 = s_{xx}/n, \\
< 0 & \text{for } \sigma^2 > s_{xx}/n.
\end{cases}
\]
Thus $\ln L(\hat\mu, \sigma^2; x)$ is a unimodal function of $\sigma^2$ which attains its maximum at the value $\hat\sigma^2 = s_{xx}/n$ of $\sigma^2$. It follows that the MLE of $(\mu, \sigma^2)$ is given by
\[
(\hat\mu, \hat\sigma^2) = \left(\bar{X}, \; \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2\right).
\]

Example 2
Obtain the maximum likelihood estimator of the parameter $p \in (0, 1)$ based on a random sample $X_1, X_2, \dots, X_n$ of size $n$ drawn from the $\mathrm{Bin}(m, p)$ distribution:
\[
f_X(x; p) = \binom{m}{x} p^x (1 - p)^{m - x}, \qquad x = 0, 1, \dots, m,
\]
where $m \in \mathbb{Z}^+$ is known.

Solution:
\[
L(p; x) = f_{\mathbf{X}}(\mathbf{x}; p) = \prod_{i=1}^{n} f_{X_i}(x_i; p) = \prod_{i=1}^{n} \binom{m}{x_i} p^{x_i} (1 - p)^{m - x_i}
= \left[\prod_{i=1}^{n} \binom{m}{x_i}\right] p^{\sum_{i=1}^{n} x_i} (1 - p)^{n(m - \bar{x})}.
\]
Therefore
\[
\log L(p; x) = \text{const} + n\bar{x} \log p + n(m - \bar{x}) \log(1 - p).
\]
Taking the derivative with respect to $p$ and setting it equal to zero yields the equation satisfied by the MLE:
\[
\left.\frac{d}{dp} \log L(p; x)\right|_{p = \hat p} = \frac{n\bar{x}}{\hat p} - \frac{n(m - \bar{x})}{1 - \hat p} = 0,
\]
i.e. $n\bar{x} = \hat p \, n m$, i.e. $\hat p = \dfrac{\bar{x}}{m}$.

Example 3
Let $X_1, X_2, \dots, X_n$ denote a random sample of size $n$ from the Poisson distribution with unknown parameter $\mu > 0$ such that, for each $i = 1, \dots, n$,
\[
f_{X_i}(x_i; \mu) = e^{-\mu} \frac{\mu^{x_i}}{x_i!}, \qquad x_i = 0, 1, 2, \dots
\]
Find the maximum likelihood estimator of $\mu$.

Solution: Since
\[
L(\mu; x) = f_{\mathbf{X}}(\mathbf{x}; \mu) = \frac{1}{\prod_{i=1}^{n} x_i!} \, e^{-n\mu} \mu^{\sum_{i=1}^{n} x_i},
\]
then
\[
\log L(\mu; x) = -n\mu + \left(\sum_{i=1}^{n} x_i\right) \log \mu - \sum_{i=1}^{n} \log(x_i!).
\]
Hence
\[
\frac{d}{d\mu} \log L(\mu; x) = -n + \frac{1}{\mu} \sum_{i=1}^{n} x_i = 0,
\]
which, upon solving for $\mu$, yields
\[
\hat\mu = \frac{1}{n} \sum_{i=1}^{n} x_i.
\]

Example 4
Suppose that the random variable $Y$ has a probability density function given by
\[
f_Y(y; \theta) = \theta y^{\theta - 1}, \qquad 0 < y < 1,
\]
where $\theta > 0$. Let $Y_1, Y_2, \dots, Y_n$ be a random sample, of size $n$, drawn from the same distribution as that of $Y$. Based on this random sample, show that the maximum likelihood estimator of $\theta$ is given by
\[
\frac{n}{-\sum_{i=1}^{n} \log Y_i}.
\]

Solution:
\[
L(\theta; y) = f_{\mathbf{Y}}(\mathbf{y}; \theta) = \prod_{i=1}^{n} f_{Y_i}(y_i; \theta) = \prod_{i=1}^{n} \theta y_i^{\theta - 1} = \theta^n \left(\prod_{i=1}^{n} y_i\right)^{\theta - 1}.
\]
\[
\log L(\theta; y) = n \log \theta + (\theta - 1) \sum_{i=1}^{n} \log y_i.
\]
Thus
\[
\frac{d}{d\theta} \log L(\theta; y) = \frac{n}{\theta} + \sum_{i=1}^{n} \log y_i,
\]
which, when equated to 0 and solved for $\theta$, yields
\[
\hat\theta(y) = \frac{n}{-\sum_{i=1}^{n} \log y_i}.
\]

The following two theorems, which we state without proof, give two general properties of MLEs that support their use. The first theorem formally demonstrates that the method of maximum likelihood is tied in with the concept of sufficiency, something that is not generally true for estimators derived under other procedures.

Theorem 1
Let $X$ be a vector of observations from a family $\mathcal{F}$ of distributions and suppose that a MLE $\hat\theta$ exists and is unique, i.e., for each $x \in \mathcal{X}$ there exists a unique $\hat\theta(x)$ that maximizes $\ln L(\theta; x)$. Then, for any sufficient statistic $T$, the estimator $\hat\theta$ is a function of $T$.

Rather than estimating the vector of parameters $\theta$, we may wish to estimate some function $\tau(\theta)$ of the parameters. Another useful result about MLEs is the following.

Theorem 2 (The Invariance Property of MLEs)
If $\hat\theta$ is a MLE of $\theta$ then, for any function $\tau(\theta)$ of $\theta$, $\tau(\hat\theta)$ is a MLE of $\tau(\theta)$.

Example 5
To give a simple illustration of the use of Theorem 2, for a random sample of size $n$ from a $N(\mu, \sigma^2)$ distribution, as we have seen, $\sum_{i=1}^{n}(X_i - \bar{X})^2/n$ is the MLE of $\sigma^2$. It follows that $\sqrt{\sum_{i=1}^{n}(X_i - \bar{X})^2/n}$ is the MLE of $\sigma$.
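As a quick numerical illustration, not part of the notes themselves, the closed-form estimators obtained in Examples 2, 3 and 4 can be evaluated on simulated samples. The sketch below assumes Python with numpy; the sample size, the true parameter values and the random seed are arbitrary choices for the illustration, and the Example 4 density is generated as a $\mathrm{Beta}(\theta, 1)$ distribution, which has exactly the density $\theta y^{\theta-1}$ on $(0,1)$.

```python
# Illustrative sketch only: the closed-form MLEs of Examples 2-4 on simulated data.
import numpy as np

rng = np.random.default_rng(0)
n = 500  # sample size (arbitrary)

# Example 2: Bin(m, p) with m known; MLE is p_hat = xbar / m
m, p = 10, 0.3
x_bin = rng.binomial(m, p, size=n)
p_hat = x_bin.mean() / m

# Example 3: Poisson(mu); MLE is mu_hat = xbar
mu = 4.0
x_pois = rng.poisson(mu, size=n)
mu_hat = x_pois.mean()

# Example 4: f(y; theta) = theta * y**(theta - 1) on (0, 1), i.e. Beta(theta, 1);
# MLE is theta_hat = -n / sum(log y_i)
theta = 2.5
y = rng.beta(theta, 1.0, size=n)
theta_hat = -n / np.sum(np.log(y))

print(p_hat, mu_hat, theta_hat)  # should be close to the true values 0.3, 4.0, 2.5
```

The estimates will not equal the true parameter values exactly, but for a sample of this size they should be close, which is consistent with the large-sample behaviour of MLEs discussed elsewhere in the course.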
10.2 The method of least squares for one-way models

Recall the linear statistical model for the completely randomized one-way design,
\[
Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \dots, a, \quad j = 1, \dots, n_i, \tag{4}
\]
for which the systematic part of $Y_{ij}$ is $E[Y_{ij}] \equiv \mu + \tau_i \equiv \mu_i$. Given estimates $\hat\mu$ and $\hat\tau_i$, $i = 1, \dots, a$, of $\mu$ and the $\tau_i$, respectively, the corresponding fitted value $\hat y_{ij}$ is obtained by substituting the estimates of the parameters into the expression for $E[Y_{ij}]$, i.e.,
\[
\hat y_{ij} = \hat\mu + \hat\tau_i.
\]
According to the method of least squares, given the observed values $y_{ij}$, $i = 1, \dots, a$, $j = 1, \dots, n_i$, we choose as our estimates $\hat\mu$ and $\hat\tau_i$, $i = 1, \dots, a$, those values of the parameters $\mu$ and $\tau_i$, $i = 1, \dots, a$, which jointly minimize
\[
L = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (y_{ij} - \mu - \tau_i)^2. \tag{5}
\]
Setting the partial derivative of $L$ with respect to $\mu$ equal to zero, and evaluating at the parameter estimates, we obtain
\[
N\hat\mu + n_1\hat\tau_1 + n_2\hat\tau_2 + \dots + n_a\hat\tau_a = y_{\cdot\cdot}, \tag{6}
\]
and, setting the partial derivative of $L$ with respect to each of the $\tau_i$ equal to zero, and evaluating at the parameter estimates, we obtain
\[
n_i\hat\mu + n_i\hat\tau_i = y_{i\cdot}, \qquad i = 1, \dots, a, \tag{7}
\]
where $N = \sum_{i=1}^{a} n_i$ is the total number of observations, $y_{\cdot\cdot} = \sum_{i=1}^{a}\sum_{j=1}^{n_i} y_{ij}$ is the grand total, and $y_{i\cdot} = \sum_{j=1}^{n_i} y_{ij}$ is the total for the $i$th treatment. The $a + 1$ equations (6) and (7) for the $a + 1$ least squares estimates $\hat\mu$ and $\hat\tau_i$, $i = 1, \dots, a$, are known as the (least squares) normal equations.
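As an illustration, not part of the notes, the normal equations (6) and (7) can be assembled and solved numerically for a small artificial data set; the group sizes, data values and use of Python with numpy below are invented for the sketch. Note that summing the equations (7) over $i$ reproduces equation (6), so the $(a+1) \times (a+1)$ coefficient matrix is singular and a least-squares solver is used to pick one particular solution; the fitted values $\hat\mu + \hat\tau_i$ are the same for every solution of the normal equations and equal the group means $y_{i\cdot}/n_i$.

```python
# Illustrative sketch only: forming and solving the normal equations (6)-(7)
# for an invented one-way layout with a = 3 treatment groups.
import numpy as np

y = [np.array([8.1, 7.9, 8.4, 8.0]),          # group 1, n_1 = 4
     np.array([6.5, 6.9, 6.2]),               # group 2, n_2 = 3
     np.array([9.0, 9.3, 8.8, 9.1, 9.4])]     # group 3, n_3 = 5
a = len(y)
n = np.array([len(g) for g in y])             # group sizes n_i
N = n.sum()                                   # total number of observations
y_i_dot = np.array([g.sum() for g in y])      # group totals y_i.
y_dot_dot = y_i_dot.sum()                     # grand total y..

# Coefficient matrix and right-hand side of the normal equations (6)-(7)
A = np.zeros((a + 1, a + 1))
A[0, 0] = N
A[0, 1:] = n                                  # equation (6)
A[1:, 0] = n
A[1:, 1:] = np.diag(n)                        # equations (7)
b = np.concatenate(([y_dot_dot], y_i_dot))

theta, *_ = np.linalg.lstsq(A, b, rcond=None) # one solution (mu_hat, tau_hat_1..a)
mu_hat, tau_hat = theta[0], theta[1:]

fitted = mu_hat + tau_hat                     # fitted value for each group
print(fitted)
print(y_i_dot / n)                            # group means, equal to the fitted values
```

The two printed vectors agree, confirming that any solution of the normal equations fits each observation by its own group mean, $\hat y_{ij} = \hat\mu + \hat\tau_i = \bar y_{i\cdot}$.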