
6 Point Estimation

6.2 Methods of Point Estimation


The definition of unbiasedness does not in general indicate how unbiased estimators can be derived.

We now discuss two “constructive” methods for obtaining point estimators: the method of moments and the method of maximum likelihood.

By constructive we mean that the general definition of each type of estimator suggests explicitly how to obtain the estimator in any specific problem.


Although maximum likelihood estimators are generally preferable to moment estimators because of certain efficiency properties, they often require significantly more computation than do moment estimators.

It is sometimes the case that these methods yield unbiased estimators.

The Method of Moments


The basic idea of this method is to equate certain sample characteristics, such as the mean, to the corresponding population expected values.

Then solving these equations for unknown parameter values yields the estimators.


Definition

Let X1, . . . , Xn be a random sample from a pmf or pdf f(x). For k = 1, 2, 3, . . . , the kth population moment, or kth moment of the distribution f(x), is E(X^k). The kth sample moment is (1/n) Σ Xi^k.

Thus the first population moment is E(X) = µ, and the first sample moment is ΣXi/n = X̄. The second population and sample moments are E(X^2) and ΣXi^2/n, respectively. The population moments will be functions of any unknown parameters θ1, θ2, . . . .
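As a quick numeric illustration (made-up data, assuming numpy is available), the sample moments are simply averages of powers of the observations:

import numpy as np

x = np.array([2.0, 3.5, 1.2, 4.8, 2.9])   # hypothetical observations

def sample_moment(x, k):
    # kth sample moment: (1/n) * sum of x_i**k
    return np.mean(x ** k)

print(sample_moment(x, 1))   # first sample moment = the sample mean x-bar
print(sample_moment(x, 2))   # second sample moment = (1/n) * sum of x_i**2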

Definition

Let X1, X2, . . . , Xn be a random sample from a distribution with pmf or pdf f(x; θ1, . . . , θm), where θ1, . . . , θm are parameters whose values are unknown.

Then the moment estimators θ̂1, . . . , θ̂m are obtained by equating the first m sample moments to the corresponding first m population moments and solving for θ1, . . . , θm.


If, for example, m = 2, E(X) and E(X^2) will be functions of θ1 and θ2.

Setting E(X) = (1/n) Σ Xi (= X̄) and E(X^2) = (1/n) Σ Xi^2 gives two equations in θ1 and θ2. The solution then defines the estimators.
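As a concrete m = 2 illustration (not from the text), consider a normal model, for which E(X) = µ and E(X^2) = µ^2 + σ^2; a minimal sketch with simulated data, assuming numpy is available:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)   # hypothetical normal sample

m1 = x.mean()            # first sample moment
m2 = np.mean(x ** 2)     # second sample moment

# Equate E(X) = m1 and E(X^2) = m2 and solve for the two parameters:
mu_hat = m1
sigma2_hat = m2 - m1 ** 2

print(mu_hat, sigma2_hat)   # should be near 5 and 4 for this simulated sample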

Example 12

Let X1, X2, . . . , Xn represent a random sample of service times of n customers at a certain facility, where the underlying distribution is assumed exponential with parameter λ.

Since there is only one parameter to be estimated, the estimator is obtained by equating E(X) to X̄.

Since E(X) = 1/λ for an exponential distribution, this gives 1/λ = X̄ or λ = 1/X̄. The moment estimator of λ is then λ̂ = 1/X̄.
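A short computation of this estimator from data (the service times below are made up purely for illustration, assuming numpy is available):

import numpy as np

# hypothetical service times, assumed exponential
times = np.array([1.2, 0.4, 2.7, 0.9, 1.8, 0.3, 2.1, 1.5])

xbar = times.mean()
lambda_hat = 1.0 / xbar   # moment estimator: lambda-hat = 1 / x-bar
print(lambda_hat)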

Maximum Likelihood Estimation


The method of maximum likelihood was first introduced by R. A. Fisher, a geneticist and statistician, in the 1920s.

Most statisticians recommend this method, at least when the sample size is large, since the resulting estimators have certain desirable efficiency properties.

Example 15

A sample of ten new bike helmets manufactured by a certain company is obtained. Upon testing, it is found that the first, third, and tenth helmets are flawed, whereas the others are not.

Let p = P(flawed helmet), i.e., p is the proportion of all such helmets that are flawed.

Define (Bernoulli) random variables X1, X2, . . . , X10 by Xi = 1 if the ith helmet is flawed and Xi = 0 otherwise (i = 1, . . . , 10).


Then for the obtained sample, X1 = X3 = X10 = 1 and the other seven Xi’s are all zero.

The probability mass function of any particular Xi is p^xi (1 – p)^(1 – xi), which becomes p if xi = 1 and 1 – p when xi = 0.

Now suppose that the conditions of various helmets are independent of one another.

This implies that the Xi’s are independent, so their joint probability mass function is the product of the individual pmf’s.

Thus the joint pmf evaluated at the observed Xi’s is

f(x1, . . . , x10; p) = p(1 – p)p ⋅ ⋅ ⋅ p = p^3(1 – p)^7    (6.4)

Suppose that p = .25. Then the probability of observing the sample that we actually obtained is (.25)^3(.75)^7 = .002086.

If instead p = .50, then this probability is (.50)^3(.50)^7 = .000977. For what value of p is the obtained sample most likely to have occurred? That is, for what value of p is the joint pmf (6.4) as large as it can be? What value of p maximizes (6.4)?
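These probabilities, and the maximizing value of p, can be checked numerically; a minimal sketch, assuming numpy is available:

import numpy as np

def likelihood(p):
    # joint pmf (6.4) of the observed sample: p^3 * (1 - p)^7
    return p ** 3 * (1 - p) ** 7

print(likelihood(0.25))   # approximately .002086
print(likelihood(0.50))   # approximately .000977

grid = np.linspace(0.001, 0.999, 999)
print(grid[np.argmax(likelihood(grid))])   # maximizing value, close to .30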


Figure 6.5(a) shows a graph of the likelihood (6.4) as a function of p. It appears that the graph reaches its peak above p = .3 = the proportion of flawed helmets in the sample.

Figure 6.5(a): Graph of the likelihood (joint pmf) (6.4) from Example 15

Figure 6.5(b) shows a graph of the natural logarithm of (6.4); since ln[g(u)] is a strictly increasing function of g(u), finding u to maximize the function g(u) is the same as finding u to maximize ln[g(u)].

Figure 6.5(b): Graph of the natural logarithm of the likelihood

We can verify our visual impression by using calculus to find the value of p that maximizes (6.4).

Working with the natural log of the joint pmf is often easier than working with the joint pmf itself, since the joint pmf is typically a product so its logarithm will be a sum.

Here

ln[f(x1, . . . , x10; p)] = ln[p^3(1 – p)^7] = 3 ln(p) + 7 ln(1 – p)    (6.5)


Thus

(d/dp) ln[f(x1, . . . , x10; p)] = (d/dp)[3 ln(p) + 7 ln(1 – p)] = 3/p + 7 ⋅ (–1)/(1 – p) = 3/p – 7/(1 – p)

[the (–1) comes from the chain rule in calculus].


Equating this derivative to 0 and solving for p gives 3(1 – p) = 7p, from which 3 = 10p and so p = 3/10 = .30 as conjectured.

That is, our point estimate is p̂ = .30. It is called the maximum likelihood estimate because it is the parameter value that maximizes the likelihood (joint pmf) of the observed sample.

In general, the second derivative should be examined to make sure a maximum has been obtained, but here this is obvious from Figure 6.5.
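The same derivative and second-derivative check can be reproduced symbolically; a minimal sketch, assuming sympy is available:

import sympy as sp

p = sp.symbols('p', positive=True)
loglik = 3 * sp.log(p) + 7 * sp.log(1 - p)   # equation (6.5)

crit = sp.solve(sp.diff(loglik, p), p)       # critical point(s)
print(crit)                                  # [3/10]

# second derivative at p = 3/10 is negative, confirming a maximum
print(sp.diff(loglik, p, 2).subs(p, sp.Rational(3, 10)))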


Suppose that rather than being told the condition of every helmet, we had only been informed that three of the ten were flawed.

Then we would have observed only the value x = 3 of a binomial random variable X = the number of flawed helmets.

The pmf of X is (10 choose x) p^x(1 – p)^(10 – x) for x = 0, 1, . . . , 10. For x = 3, this becomes (10 choose 3) p^3(1 – p)^7. The binomial coefficient is irrelevant to the maximization, so again p̂ = .30.
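A quick numerical check of this binomial version (assuming scipy is available) gives the same maximizing value:

import numpy as np
from scipy.stats import binom

grid = np.linspace(0.001, 0.999, 999)
pmf_vals = binom.pmf(3, 10, grid)    # P(X = 3) as a function of p
print(grid[np.argmax(pmf_vals)])     # again close to .30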

Maximum Likelihood Estimation

The likelihood function tells us how likely the observed sample is as a function of the possible parameter values.

Maximizing the likelihood gives the parameter values for which the observed sample is most likely to have been generated; that is, the parameter values that “agree most closely” with the observed data.

Example 16

Suppose X1, X2, . . . , Xn is a random sample from an exponential distribution with parameter λ. Because of independence, the likelihood function is a product of the individual pdf’s:

f(x1, . . . , xn; λ) = (λe^(–λx1)) ⋅ ⋅ ⋅ (λe^(–λxn)) = λ^n e^(–λΣxi)

The natural logarithm of the likelihood function is

ln[ f (x1, . . . , xn ; λ)] = n ln(λ) – λΣxi


Equating (d/dλ)[ln(likelihood)] to zero results in n/λ – Σxi = 0, or λ = n/Σxi = 1/X̄.

Thus the mle is λ̂ = 1/X̄; it is identical to the method of moments estimator [but it is not an unbiased estimator, since E(1/X̄) ≠ 1/E(X̄)].
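The closed-form mle can also be compared with a direct numerical maximization of the log likelihood; a sketch with simulated data, assuming numpy and scipy are available:

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=200)   # hypothetical sample; true lambda = 0.5

def neg_loglik(lam):
    # negative of n*ln(lambda) - lambda*sum(x_i)
    return -(len(x) * np.log(lam) - lam * x.sum())

res = minimize_scalar(neg_loglik, bounds=(1e-6, 10.0), method='bounded')
print(res.x, 1.0 / x.mean())   # numerical mle vs. the closed form 1/x-bar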

Example 17

Let X1, . . . , Xn be a random sample from a normal distribution. The likelihood function is

f(x1, . . . , xn; µ, σ^2) = ∏ (2πσ^2)^(–1/2) e^(–(xi – µ)^2/(2σ^2)) = (2πσ^2)^(–n/2) exp[–Σ(xi – µ)^2/(2σ^2)]

so

ln[f(x1, . . . , xn; µ, σ^2)] = –(n/2) ln(2πσ^2) – (1/(2σ^2)) Σ(xi – µ)^2


To find the maximizing values of µ and σ^2, we must take the partial derivatives of ln(f) with respect to µ and σ^2, equate them to zero, and solve the resulting two equations.

Omitting the details, the resulting mle’s are

µ̂ = X̄    σ̂^2 = Σ(Xi – X̄)^2 / n

The mle of σ^2 is not the unbiased estimator S^2, so two different principles of estimation (unbiasedness and maximum likelihood) yield two different estimators.
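A short numerical comparison of the two estimators of σ^2 on simulated (made-up) normal data, assuming numpy is available:

import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(loc=10.0, scale=3.0, size=50)   # hypothetical normal sample

mu_hat = x.mean()                                   # mle of mu
sigma2_mle = np.sum((x - mu_hat) ** 2) / len(x)     # mle of sigma^2 (divides by n)
s2 = np.sum((x - mu_hat) ** 2) / (len(x) - 1)       # unbiased estimator S^2 (divides by n - 1)

print(mu_hat, sigma2_mle, s2)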

Estimating Functions of Parameters


In Example 17, we obtained the mle of σ^2 when the underlying distribution is normal.

The mle of σ = √(σ^2), as well as many other mle’s, can be easily derived using the following proposition.

Proposition The Invariance Principle

Let θ̂1, θ̂2, . . . , θ̂m be the mle’s of the parameters θ1, θ2, . . . , θm.

Then the mle of any function h(θ1, θ2, . . . , θm) of these parameters is the function h(θ̂1, θ̂2, . . . , θ̂m) of the mle’s.

Example 20

Example 17 continued…

In the normal case, the mle’s of µ and σ^2 are µ̂ = X̄ and σ̂^2 = Σ(Xi – X̄)^2 / n.

To obtain the mle of the function h(µ, σ^2) = √(σ^2) = σ, substitute the mle’s into the function: σ̂ = √(σ̂^2) = [Σ(Xi – X̄)^2 / n]^(1/2).

The mle of σ is not the sample standard deviation S, though they are close unless n is quite small.
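A small numerical illustration of the invariance principle on simulated data (assuming numpy is available): the mle of σ is the square root of the mle of σ^2, which is close to, but not equal to, S:

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=2.0, size=15)   # small hypothetical sample

sigma2_mle = np.sum((x - x.mean()) ** 2) / len(x)
sigma_hat = np.sqrt(sigma2_mle)   # mle of sigma via the invariance principle
s = x.std(ddof=1)                 # sample standard deviation S

print(sigma_hat, s)   # close, but not equal, for moderate n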

Large Sample Behavior of the MLE

Although the principle of maximum likelihood estimation has considerable intuitive appeal, the following proposition provides additional rationale for the use of mle’s.

Proposition Under very general conditions on the joint distribution of the sample, when the sample size n is large, the maximum likelihood estimator of any parameter θ is approximately unbiased and has variance that is either as small as or nearly as small as can be achieved by any estimator. Stated another way, the mle is approximately the MVUE of θ.
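A simulation sketch (settings made up, assuming numpy is available) of this large-sample behavior for the exponential mle λ̂ = 1/X̄ from Example 16: the average of many mle’s moves toward the true λ as n grows.

import numpy as np

rng = np.random.default_rng(4)
for n in (5, 50, 500):
    # average of many mle's of lambda (true value 1) based on samples of size n
    mles = [1.0 / rng.exponential(scale=1.0, size=n).mean() for _ in range(10_000)]
    print(n, np.mean(mles))   # approaches 1 as n grows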


Because of this result and the fact that calculus-based techniques can usually be used to derive the mle’s (though often numerical methods, such as Newton’s method, are necessary), maximum likelihood estimation is the most widely used estimation technique among statisticians.

Many of the estimators used in the remainder of the book are mle’s. Obtaining an mle, however, does require that the underlying distribution be specified.
