IEEE TRANSACTIONS ON RELIABILITY, VOL. 54, NO. 3, SEPTEMBER 2005 389 Robust Weighted Likelihood Estimation of Exponential Parameters Ejaz S. Ahmed, Andrei I. Volodin, and Abdulkadir. A. Hussein

Abstract—The problem of estimating the parameter of an ex- • sample ponential distribution when a proportion of the observations are • -trimmed sample mean is quite important to reliability applications. The method • quadratic risk of a of weighted likelihood is applied to this problem, and a robust of the exponential parameter is proposed. Interestingly, • , Convergence as fast as a, convergence faster the proposed estimator is an -trimmed mean type estimator. than The large-sample robustness properties of the new estimator are examined. Further, a Monte Carlo simulation study is conducted I. INTRODUCTION showing that the proposed estimator is, under a wide of contaminated exponential models, more efficient than the usual HE EXPONENTIAL distribution is the most commonly maximum likelihood estimator in the sense of having a smaller T used model in reliability & life-testing analysis [1], [2]. risk, a measure combining bias & variability. An application of the We propose a method of weighted likelihood for robust estima- method to a set on the failure times of throttles is presented. tion of the exponential distribution parameter. We establish that Index Terms—Exponential distribution, maximum likelihood es- the suggested weighted likelihood method provides -trimmed timation, robust estimation, weighted likelihood method. mean type . Further, we investigate the robustness properties of the weighted likelihood estimator (WLE) in com- ACRONYMS1 parison with the usual maximum likelihood estimator (MLE). This can be viewed as an application of the classical likelihood • a.s. almost surely that was introduced by [3] as the relevance weighted likelihood • WLE weighted likelihood estimator to problems of robust estimation of parameters. The weighted • MLE maximum likelihood estimator likelihood method was introduced as a generalization of the local likelihood method. For further discussion of the local like- NOTATION lihood method in the context of , see • random sample size [4]–[6]. In contrast to the local likelihood, the weighted likeli- • parameter of interest hood method can be global, as demonstrated by one of the ap- • parametric space plications in [7] where the James-Stein estimator is found to • a parametric family of cumulative be a maximum weighted likelihood estimator with weights es- distribution functions timated from the data. • a parametric family of probability The theory of weighted likelihood enables a bias-precision density functions trade of to be made without relying on a Bayesian approach. • the usual maximum likelihood estimator The latter permits the bias- trade of to be made in a of parameter conceptually straightforward manner. However, the reliance on • the proposed estimator of the parameter empirical Bayes methods softens the demands for realistic prior • the natural logarithm function modeling in complex problems. The weighted likelihood theory • special weights that we use for the esti- offers a simple alternative to the empirical Bayesian approach mation method for many complex problems. At the same time, it links within • an obstructing parameter a single formal framework a diverse collection of statistical do- • mains, such as weighted , nonparametric regres- the set of obstructing distributions where , and sion, meta-analysis, and shrinkage estimation. The weighted likelihood principle comes with an underlying general theory • , , fixed positive constants that extends Wald’s theory for maximum likelihood estimators, as shown in [8]. Manuscript received August 28, 2003. This research was supported by the We propose applying the method of weighted likelihood to Natural Sciences and Engineering Research Council of Canada (NSERC). As- the problem of the robust estimation of the parameter of an sociate Editor: M. Xie. exponential distribution by assigning zero weights to observa- S. E. Ahmed and A. A. Hussein are with the Department of Mathematics and Statistics, University of Windsor, Windsor, Ontario, N9B 3P4 Canada. tions with small likelihood. Interestingly, the weighted likeli- A. I. Volodinis with the Department of Mathematics and Statistics, University hood method yields -trimmed mean type estimators of the of Regina, Regina, Saskatchewan, S4S 4G2 Canada. parameter of interest. The statistical asymptotic properties of Digital Object Identifier 10.1109/TR.2005.853276 the WLE is developed, and a simulation study is conducted to 1The singular and plural of an acronym are always spelled the same. appraise the behavior of the proposed estimators for moderate

0018-9529/$20.00 © 2005 IEEE 390 IEEE TRANSACTIONS ON RELIABILITY, VOL. 54, NO. 3, SEPTEMBER 2005 sample sizes in comparison with the usual maximum likelihood tion function . That is, is defined by the given small estimator. probability in the equation An analogous idea appears in [9]. However, these authors use a different type of likelihood equation in which they do not weigh the likelihood of each observation, but rather the loga- rithmic derivatives. Hence, the proposed estimation strategy is a completely different approach. From this equation, we obtain . This approximation of is reasonable because in our calculations we II. THE PROPOSED WEIGHTED LIKELIHOOD ESTIMATOR will replace by the limit in probability (even almost surely) Let in general be a parametric family of cu- of this estimator, i.e., by : So, we choose the value of as mulative distribution functions with corresponding probability density functions . Estimators of obtained by maximizing the weighted log- Hence, we reject an observation from the sample if . The probability is user-dependent as in [9], and can be considered as a measure of the level of desired robustness. Example: Here we consider a from [1] (also used where depends on the sample ; are called by [10]) on the failure times of throttles. The data reports the Weighted Likelihood Estimators (WLE) of . If all the weights failure times of 25 units where time is measured in Kilome- are equal to unity, then the resulting estimator is the usual ters driven prior to failure. According to exponential model fit, maximum likelihood estimator. the estimated rate of failure for this data is .We In this paper, we suggest a special choice of weights ; con- added 5% contamination from a different exponential with rate nected with the theory of robust estimation, and based on the .000 05. That is equivalent to augmenting the data by one ob- maximum likelihood method with rejection of spurious obser- servation. The estimate of after contamination is . vations. Let be the usual maximum likelihood At level of robustness, the rejection criteria is estimator of the parameter . The weight that corresponds to . Thus, only the one extra observation will be Re- the observation is assumed to be 1, if its estimated likelihood moved, and now . is suffciently large, and 0 elsewhere; that is Furthermore, to illustrate that the proposed procedure & the if -trimmed estimators do not necessarily coincide, we notice elsewhere. that a .01-trimming procedure would not reject the contaminant observation, whereas our robust procedure with would Consequently, we delete all improbable observations from the reject it. sample (the choice of is discussed in the next paragraph). Not In the following section, we study the robustness properties of surprisingly, it can be seen that in the case of a uni-modal proba- the & by using contaminated (obstructing) distributions. bility density function we reject only extreme order statistics. However, this may not be the case for multi-modal probability III. ROBUSTNESS OF THE PROPOSED ESTIMATOR density functions. The proposed estimator of the parameter is defined as In general, define a set of obstructing distributions as the solution of the equation (1) where denotes the cumulative distribution function with den- sity function , , and : As it is commonly done where are the remaining observations in the in the classical robustness studies (c.f., [11]), under the assump- sample after the rejection procedure. tion that the sample is taken from the distribution with fixed In the case of the exponential distribution, these estimators one would find the limits in probability of the estimates & are obtained by using in the above equations the appropriate for , and compare their biases. The values of for distribution, and density functions given by, respectively which is less than , indicate the regions in the parameter space where the estimate is to be favored over the estimate . Along the same lines, we shall also compare the quadratic risks of these estimates. The quadratic risk of an es- where , . timator is the expected (average) squared distance between the Regarding the issue of the choice of , we suggest ; estimator, and the parameter being estimated. In other words, where can be chosen as in the selection of the critical constant the quadratic risk is an omnibus measure of the performance of in the criterion of elimination of outliers. Therefore, in the ex- an estimator which takes into consideration the bias & the pre- ponential case, is chosen from the condition of a small proba- cision (variance) of the estimator. bility of rejection of an observation assuming that the observa- Let ; , and be any positive Numbers, tion has come from the above exponential cumulative distribu- and put . Although in practice depends on the AHMED et al.: ROBUST WEIGHTED LIKELIHOOD ESTIMATION OF EXPONENTIAL PARAMETERS 391 sample size, , here for the sake of brevity we assume that it is The probability of rejecting an observation in the obstructed a constant. model is asymptotically equal to In the exponential distribution case, and in virtu of (1), we assume that the sample is taken from the distri- bution

Hence, the asymptotic distribution of equals the distribution of the -trimmed sample mean Under this contaminated model, note that

of a random sample of size from the distribution con- and by the strong law of large numbers centrated on the interval (0, ). The probability density of this distribution is positive only on this interval, and has the form

On the other hand, the estimate converges in proba- bility (even almost surely) to some value which can be where . calculated as the limit of the expected values truncated at the Denote by , and the mathematical expecta- point of the distribution tion, and variance, respectively, of a relative , . That is to the distribution with probability density function . With the preliminaries accounted for, the first result can now be presented. Theorem 1: The maximum likelihood estimator under the obstructing model has the relative bias (2)

Indeed, these integrals can be evaluated in closed form. How- and the estimator has the relative bias ever the solution is cumbersome, thus rendering the comparison of the relative biases & quite dif- ficult. In fact The above relation reveals that the relative bias of the pro- posed estimator in comparison with relative bias of the MLE and the integral in the numerator of (2) equals decreases in the value by the order . There- fore, it is apparent that the value of must be chosen as a quantity of order ; otherwise the values of will be under-estimated (negative bias). A possibility is to choose Therefore, an exact calculation of the gain in bias, and reduction from the relation , if the & can be ap- in risk of the proposed estimator as compared to the usual MLE, propriately guessed. On the other hand, the difference between may not be possible. Here we shall give only approximate anal- the relative bias of , and that of is positive, meaning there ysis concerning these quantities under convenient assumptions, is always some gain in terms of bias reduction, as long as is such as not too small compared to . Next, we provide approximate expressions for the quadratic risks of the proposed estimator. Theorem 2: The maximum likelihood estimator has the quadratic risk for some , or equivalently, . This requirement is not restrictive at all, as can be seen from the simulation re- sults in Section IV. In fact, in practice, is a constant, and is (3) bounded for all reasonable sample sizes. and the estimator has the asymptotic representation of the Recall that we reject observations satisfying quadratic risk , and note that

almost surely (4) 392 IEEE TRANSACTIONS ON RELIABILITY, VOL. 54, NO. 3, SEPTEMBER 2005

TABLE I ” SIMULATED &ASYMPTOTIC BIAS FOR ESTIMATORS  &  , FOR n aPHH& aIXH

TABLE II ” SIMULATED &ASYMPTOTIC RISK FOR ESTIMATORS  &  , FOR n a PHH & aIXH

Fig. 1. The approximate relative risk i a ‚@” Aa‚@ A as function of . The line through y aIis the benchmark.

The relative efficiency of the estimator in comparison with TABLE III ” ” maximum likelihood estimator has the asymptotic represen- DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A FOR n aQH& aIXH tation

In passing, we note that more precise results can be obtained based on the joint distribution of the extreme terms of the sample, and the sample mean. TABLE IV ” ” By examining the formula for , one notes that most of the DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A risk reduction will be achieved for small samples, and for small FOR n aQH& aQXH as compared to the size of the sample. The following Fig. 1 shows such a pathology for , 60, and for small values. From the graph, apparently, the gain in risk will decrease as in- creases, or as increases, for fixed . However, the simulations reported in the next section show a richer set of the parameters , , over which the proposed estimator has lesser risk than the MLE. To achieve this, we generate a random number from the uni- IV. MONTE-CARLO SIMULATIONS form distribution on the interval [0, 1], and compare with . If , then take . Otherwise . Next, For the purposes of simulation, we fix , , generate a random number from the exponential distribution and . Recall that, if a random variable has the uniform with the parameter . The random number has the cumulative distribution on the interval [0, 1], then will have distribution function . All the following simulation results the exponential distribution with parameter . We need this re- are based on 5000 replicates. sult for generating random numbers from the exponential distri- Tables I and II show for the extent to which the bution. simulated & the asymptotic biases, and risks of the proposed We generate random numbers from the obstructed distribu- estimator & the MLE agree. The asymptotic biases & risks are tion calculated using the relations in Theorems 1 & 2. Note that the risk of the estimator does not depend on . From the tables, we see that the asymptotic results are accurate to the second decimal digit except for small & in the case of bias, and for This procedure is organized in the following way. The main idea large & in the case of risk. is that, with probability , we need to generate a random To examine the performance of the proposed estimator in number from the exponential distribution with parameter terms of bias & risk, we reported in Tables III–XI the differ- ; and with probability , generate a random number from the ences between the biases , and the risks exponential distribution with parameter . of the two estimators for different values of , , AHMED et al.: ROBUST WEIGHTED LIKELIHOOD ESTIMATION OF EXPONENTIAL PARAMETERS 393

TABLE V TABLE X ” ” ” ” DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A FOR n aQH& aSXH FOR n a PHH & aQXH

TABLE VI TABLE XI ” ” ” ” DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A FOR n a IHH & aIXH FOR n a PHH & aSXH

TABLE VII V. C ONCLUSION ” ” DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A FOR n a IHH & aQXH The estimation of the parameter of an exponential distribution under contamination (i.e., in the presence of outliers) is quite rel- evant to reliability, and various robust estimation strategies have been suggested in the recent past. However, there is no one best estimation strategy for all situations. In this paper we suggested a robust weighted likelihood estimator (WLE) for this problem. We examined the risk (a measure combining bias & variability) of the estimator as compared to the usual maximum likelihood TABLE VIII estimator (MLE), and illustrated through approximate formulae ” ” DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A & simulations that WLE performs better than the MLE under FOR n a IHH & aSXH various contamination scenario. The methodology of the paper, and the estimator proposed, can be extended to the case of censored reliability data. Suppose that the observed data is subject to independent Type I right-cen- soring. Therefore the observed data comes in the form of pairs , where , is the failure time, is the time, and if or zero other- wise. Here, the MLE of is , where , TABLE IX ” ” DIFFERENCES BETWEEN BIASES fi—s@A fi—s@ A &RISKS ‚@A ‚@ A are, respectively, the number of failed items, FOR n a PHH & aIXH and average of the observed times. Defining , and following the arguments in Sec- tion II, the criteria for rejection of extreme observations would be . Although the proof may be more in- volving, it is very plausible that the optimality & robustness properties noticed for the estimator of this paper will also be true for its counterpart for the censored failure data. Other types of censoring can be handled in a similar way. , and . These tables show the advantage of the estimator A further interesting generalization of the suggested method- in the majority of the cases considered. The pathology pointed ology can be made to the case of the two-parameter Weibull out in the previous section is clearly seen in these tables. For distribution. See [2] for detailed treatment of the Weibull distri- instance, the gain in terms of risk reduction in favor of de- bution & related issues. The criteria for a Weibull would creases as gets larger while fixing all the remaining param- be to reject any observation such that eters. Similar behavior can be observed for increasing while . One may expand this expression, and isolate or keeping all other parameters fixed. Also, as intuitively expected, neglect either the term or depending on the the gain in terms of bias & risk in favor of the proposed esti- values of , and then proceed as in the paper to obtain mator increases as the parameter of the contaminating observa- the robustness level, and finally replace & by their MLE. tions gets farther away from that of the main stream observations The robust estimators so obtained should then be compared to (i.e., as becomes larger). other estimators of the Weibull parameters such as , 394 IEEE TRANSACTIONS ON RELIABILITY, VOL. 54, NO. 3, SEPTEMBER 2005 least squares, probability weighted moment, and maximum like- Direct computations show that the second moment of the nor- lihood estimators (see for example [12]–[14]. Models that are malized random variable with distribution equals modifications of the Weibull distribution, and their censored . Hence versions, are also fertile areas to which the methods of the paper can be extended [15]–[17].

Finally, APPENDIX A PROOF OF THEOREM 1 For the asymptotic bias of the estimator ,wehave which proves (3). We have

where

Further, ,so with the remainder term The MLE has mathematical expectation , and the weighted likelihood estimator has mathematical expectation With some algebra, one can show that

Hence, the gain in bias has the order .

APPENDIX B PROOF OF THEOREM 2 The proof is based on the following two elementary integrals with a remainder term of the order . Hence we have established that

and which completes the proof of (4). The asymptotic relative efficiency of the estimator to the estimator is

We mainly use the case . The quadratic risk of is AHMED et al.: ROBUST WEIGHTED LIKELIHOOD ESTIMATION OF EXPONENTIAL PARAMETERS 395

ACKNOWLEDGMENT [15] C. D. Lai, M. Xie, and D. N. P. Murthy, “A modified weibull distribu- tion,” IEEE Trans. Reliab., vol. 52, no. 1, pp. 33–37, 2003. The authors would like to thank the associate editor and [16] M. Xie, Y. Tang, and T. N. Goh, “A modified weibull extension with bathtub-shaped function,” Reliab. Eng. Syst. Saf., vol. 76, anonymous referees for their constructive comments. no. 3, pp. 279–285, 2002. [17] U. Balasooriya and C. K. Low, “Competing causes of failure and reli- ability tests for weibull lifetimes under type i progressive censoring,” IEEE Trans. Reliab., vol. 53, no. 1, pp. 29–36, 2004. REFERENCES

[1] W. R. Blischke and D. N. P. Murthy, Reliability: Modeling, Prediction and Optimization : Wiley, 2000. Ejaz S. Ahmed is Professor and Head of the Department of Mathematics and [2] D. N. P. Murthy, M. Xie, and R. Jiang, Weibull Models: Wiley, 2004. Statistics and Director of the Statistical Consulting, Research and Services [3] F. Hu, “Symptotic properties of relevance weighted likelihood estima- Centre at the University of Windsor. Before joining the University of Windsor, Canad. J. Statist. tions,” , vol. 25, pp. 45–60, 1997. he was Professor and Head of the Department of Mathematics and Statistics at J. Amer. [4] R. Tibshirani and T. Hastie, “Local likelihood estimation,” University of Regina, and had a faculty position at the University of Western Statist. Assoc. , vol. 82, pp. 559–567, 1987. Ontario. His area of expertise includes , multivariate anal- [5] J. G. Staniswalis, “The kernel estimate of a regression function in likeli- ysis and asymptotic theory. He has published numerous articles in scientific J. Amer. Statist. Assoc. hood-based models,” , vol. 84, pp. 276–283, 1989. journals, both collaborative and methodological. Further, he co-edited two [6] J. Fan, N. E. Heckman, and W. P. Wand, “Local polynomial kernel re- volumes and organized many sessions and workshops. He made several invited J. gression for generalized linear models and quasilikelihood functions,” scholarly presentations in various countries and supervised graduate students Amer. Statist. Assoc., vol. 90, pp. 826–838, 1995. both at masters’ and Ph.D. level. Dr. Ahmed serves on the editorial board of The Relevance Weighted Likelihood With Appli- [7] F. Hu and J. V. Zidek, many statistical journals. cations, S. E. Ahmed and N. Reid, Eds: Springer Verlag, 2001, vol. 25, Empirical Bayes and Likelihood Inference. [8] , “The weighted likelihood,” Canad. J. Statist., vol. 30, pp. 347–372, 2002. [9] D. J. Dupuis and S. Morgenthaler, “Robust weighted likelihood estima- Andrei I. Volodin (B.S. in Mathematics, Kazan University, Russia, 1983; Kan- tors with an application to bivariate extreme value problems,” Canad. J. didant Nouk in Mathematics and Physics (Russian analog of Ph.D.)(1999) Vil- Statist., vol. 30, pp. 17–36, 2002. nius University (Lithuania); Ph.D. in Statistics, University of Regina, Canada, [10] R. Jiang and D. N. P. Murthy, “Graphical representation of two mixed 2002) is an assistant professor in Statistics at the University of Regina, Regina, Weibull distributions,” IEEE Trans. Reliab., vol. 44, pp. 477–488, 1995. Saskatchewan, Canada. His research topics are: Bootstrap estimation; Shrinkage [11] P. J. Huber, Robust Statistics. New York: Wiley, 1981. estimation; Limit theorems; Distributions on abstract spaces. [12] J. H. Gove, “Moment and maximum likelihood estimators for Weibull distributions under length- and area-biased ,” Environmental and Ecological Statistics, vol. 10, no. 4, pp. 455–467, 2003. [13] S. Mahdi and F. Ashkar, “Exploring generalized probability weighted Abdulkadir A. Hussein (B.S. in Applied Mathematics, University of Trieste, moments, generalized moments and maximum likelihood estimating Italy, 1996; M.Sc. (1999) and Ph.D. (2003) in Statistics, University of Alberta, methods in two-parameter Weibull model,” Journal of Hydrology, vol. Edmonton, Canada) is an assistant professor in Statistics at the University of 285, pp. 62–75, 2004. Windsor in Windsor, Ontario, Canada. His research topics are: Techniques for [14] K. R. Skinner, J. B. Keats, and W. J. Zimmer, “A comparison of three the analyzes of censored data and their applications in medical and industrial estimators of the Weibull parameters,” Qual. Reliab. Eng. Int., vol. 17, problems. Change-point and Sequential analyzes; Health outcomes and policy no. 4, pp. 249–256, 2001. research.