INFORMS JOURNAL ON COMPUTING
Articles in Advance, pp. 1–14
http://pubsonline.informs.org/journal/ijoc/
ISSN 1091-9856 (print), ISSN 1526-5528 (online)

Robust Maximum Likelihood Estimation

Dimitris Bertsimas,(a) Omid Nohadani(b)
(a) Operations Research Center and Sloan School of Management, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139; (b) Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, Illinois 60208
Contact: [email protected], http://orcid.org/0000-0002-1985-1003 (DB); [email protected], http://orcid.org/0000-0001-6332-3403 (ON)

Received: January 10, 2016
Revised: February 12, 2017; December 19, 2017
Accepted: April 30, 2018
Published Online in Articles in Advance: April 26, 2019
https://doi.org/10.1287/ijoc.2018.0834
Copyright: © 2019 INFORMS

Abstract. In many applications, statistical estimators serve to derive conclusions from data, for example, in finance, medical decision making, and clinical trials. However, the conclusions are typically dependent on uncertainties in the data. We use robust optimization principles to provide robust maximum likelihood estimators that are protected against data errors. Both types of input data errors are considered: (a) the adversarial type, modeled using the notion of uncertainty sets, and (b) the probabilistic type, modeled by distributions. We provide efficient local and global search algorithms to compute the robust estimators and discuss them in detail for the case of multivariate normally distributed data. The performance is demonstrated on two applications. First, using computer simulations, we demonstrate that the proposed estimators are robust against both types of data uncertainty and provide more accurate estimates compared with classical estimators, which degrade significantly when errors are encountered. We establish a range of uncertainty sizes for which robust estimators are superior. Second, we analyze deviations in cancer radiation therapy planning. Uncertainties among plans are caused by patients' individual anatomies and the trial-and-error nature of the planning process. When applied to a large set of past clinical treatment plans, robust estimators lead to more reliable decisions.

History: Accepted by Karen Aardal, Area Editor for Design and Analysis of Algorithms.
Funding: O. Nohadani received support from the National Science Foundation [Grant CMMI-1463489].
Supplemental Material: The online supplement is available at https://doi.org/10.1287/ijoc.2018.0834.

Keywords: optimization • robust optimization • robust statistics • maximum likelihood estimator • radiation therapy

1. Introduction

Maximum likelihood is widely used to successfully construct statistical estimators for parameters of a probability distribution. This method is prevalent in many applications, ranging from statistics and machine learning to many areas of science and engineering, where the nature of the data motivates a functional form of the underlying distribution. The parameters of this distribution remain to be estimated (Pfanzagl 1994). For uncertain distributions, the estimators can be located in a family of them by leveraging the minimax asymptotic variance (Huber 1964, Huber 1996). This is also possible in the common case of symmetric contamination (Jaeckel 1971). Maximum likelihood estimator (MLE) methods typically assume data to be complete, precise, and free of errors. In reality, however, data are often insufficient. Moreover, any input data can be subject to errors and perturbations. These can stem from (a) measurement errors, (b) input errors, (c) implementation errors, (d) numerical errors, or (e) model errors. These sources of uncertainty affect the quality of the estimators and can degrade outcomes significantly, so much so that we might lose the advantages of maximum likelihood estimators completely. Therefore, it is instrumental to the success of the application to construct estimators that are intrinsically robust against possible sources of uncertainty.

In this work, we seek to estimate the parameter θ for the probability density function f(θ, x) of an ensemble of n data points x_i ∈ R^m, which can be contaminated by errors ∆x_i ∈ R^m in some uncertainty set $\mathcal{U}$. The robust MLE maximizes the worst-case likelihood via

$$\max_{\theta}\; \min_{\Delta X \in \mathcal{U}}\; \prod_{i=1}^{n} f(\theta;\, x_i - \Delta x_i), \qquad (1)$$

following the robust optimization (RO) paradigm.

Robust optimization has increasingly been used as an effective way to immunize solutions against data uncertainty. In principle, if errors are not taken into account, an otherwise optimal solution may turn out to be suboptimal, or even in some cases infeasible. RO, however, considers errors to reside within an uncertainty set and aims to calculate solutions that are robust to such uncertainty. There is a sizable body of literature on various aspects of RO, and we refer to Ben-Tal et al. (2009) and Bertsimas et al. (2011).

In the context of simulation-based problems, that is, problems not given by a closed-form solution, a local search algorithm was proposed that provides robust solutions to unconstrained (Bertsimas et al. 2010b) and constrained (Bertsimas et al. 2010a) problems without exploiting the structure of the problem or the uncertainty set.

The effects of measurement errors on statistical estimators have been addressed extensively in the literature, for example, by Buonaccorsi (2010) and the references within. Measurement error models assume a distribution of the observed values given the true values of a certain quantity (Fuller 2009). This is the reverse of the Berkson error model, which assumes a distribution on the true values given the observed values (Berkson 1950). The rich literature for error correction provides a plethora of techniques for correcting additive errors in regression (Cheng et al. 1999). El Ghaoui and Lebret (1997) showed that robust least squares problems for erroneous but bounded data can be formulated as second-order cone or semidefinite optimization problems, and thus become efficiently solvable. In the context of maximum likelihood, Calafiore and El Ghaoui (2001) elaborated on estimators in linear models in the presence of Gaussian noise whose parameters are uncertain. The proposed estimators maximize a lower bound on the worst-case likelihood using semidefinite optimization.

In the context of robust statistics, Huber (1980) introduced estimators that are insensitive to perturbations. The robustness of estimators is measured in different ways: for instance, the breakdown point is defined as the minimum amount of contamination that causes the estimator to become unreliable. Another measure is the influence curve that describes the impact of contamination on an estimator (Hampel 1974).

In this paper, we introduce a robust MLE method to produce estimators that are robust to data uncertainty. Our approach differs from Huber's (1980) in multiple facets, as summarized in Table 1. Because the likelihood is not proportional to the error in the estimation, our proposed method considers the worst case directly in the likelihood. Correspondingly, we believe that our proposed approach is directly relevant to real-world data, where errors are observed on the data and a priori information on distributions is not available.
Table 1. Comparison of Huber's (1980) Approach to the Proposed RO Approach

Comparison    Huber's approach                               Our approach
Estimators    Functionals on the space of distributions      Functions on the observed data
Worst case    In the value of the estimator                  In the value of the likelihood
Observables   Observed distributions reside in a             True data reside in a neighborhood
              neighborhood around the true distribution      around the observed data

In principle, errors may be of an adversarial nature, where no probabilistic information about their source is known, or of a distributional nature, where the source is known to be probabilistic. Correspondingly, we discuss two kinds of robust maximum likelihood estimators:
1. Adversarially robust: The worst-case scenario is calculated among possible errors that reside in some uncertainty set. To compute these estimators, we propose two methods: a first-order gradient descent algorithm, which is highly efficient and warrants locally optimal robust estimators, and a global search method based on robust simulated annealing, which provides globally optimal robust estimators at higher computational expense.
2. Distributionally robust: The worst-case scenario is evaluated among errors that are independent and follow some distribution residing in some set of distributions. Such errors resemble persistent errors. Using distributionally robust optimization techniques, we show that their estimators are a particular case of adversarially robust estimators.

To demonstrate the performance of the methods, we apply them to two types of data sets. First, we conduct numerical experiments on simulated data to ensure a controlled setting and to be able to determine the deviation from true data. We show that for small-sized errors, both the local and the global RO methods yield comparable estimates. For larger errors, however, we observe that the robust simulated annealing method outperforms the local search method. Moreover, we show that the proposed estimators are also immune against the source of uncertainty; that is, even if the errors follow a different distribution than anticipated, the estimators remain robustly optimal. Furthermore, the proposed robust estimators turn out to be significantly more accurate compared with classical maximum likelihood estimators, which degrade sizably when errors are encountered. Finally, we establish the range within which the robust estimators are most effective. This range can inform practitioners about the appropriate size of the uncertainty set. The error size–dependent observation can be generalized to a broader range of RO approaches.

In the second application, past patient data for cancer radiation therapy serve to probe the method. In clinical practice, the quality of treatment plans is typically examined based on the spatial dose distribution, summarized in five specific observable dose points and compared with internally recommended criteria. However, the recording and evaluation of these criteria are subject to human uncertainty. To evaluate the overall performance of an individual clinician, a team, or an entire institution, statistical estimators are employed. The resulting decisions remain highly sensitive to the uncertainty in data. We analyze 491 treatment plans for various tumor sites that have already undergone radiation therapy and compute robust estimators to support physicians in arriving at more dependable conclusions. We show that robust estimators have a significantly reduced and stable spread over different samples, when compared with nominal estimators, offering more reliable and sample-independent decision making.

The structure of this paper is as follows: In Section 2, we define the robust estimators and introduce the RO problem for robust estimators. In Section 2.1, we discuss the robust maximum likelihood estimators, along with a corresponding local as well as a global search algorithm. Section 2.1 also details the method to compute the corresponding robust estimates when the data follow a specific distribution, namely, the multivariate normal distribution. In Section 3, we report on the performance of the proposed robust estimators on simulated data. In Section 4, we compute robust estimators for a large set of clinical cancer radiation therapy data. In Section 5, we conclude our findings.
2. Robust Maximum Likelihood Estimators

To introduce the robust estimators, we first differentiate between observed and true data. Consider the following setting: we can only observe samples x_i^obs, i = 1, 2, ..., n, which may include errors. This is expressed via

$$x_i^{\mathrm{obs}} = x_i^{\mathrm{true}} + \Delta x_i, \qquad i = 1, 2, \ldots, n,$$

where x_i^true is the error-free (but not observable) data, and ∆x_i is the error in the ith sample. The error-free data x_i^true are assumed to be distributed according to a distribution $\mathcal{P}(\theta)$, with probability density function f(θ; x), where θ is the parameter we wish to estimate. Maximum likelihood estimation seeks to find θ that maximizes the probability density function

$$\prod_{i=1}^{n} f\big(\theta;\, x_i^{\mathrm{true}}\big) \equiv \prod_{i=1}^{n} f\big(\theta;\, x_i^{\mathrm{obs}} - \Delta x_i\big), \qquad (2)$$

or equivalently maximizes the log-likelihood density function

$$\psi\big(\theta;\, X^{\mathrm{obs}} - \Delta X\big) \equiv \log\Bigg(\prod_{i=1}^{n} f\big(\theta;\, x_i^{\mathrm{obs}} - \Delta x_i\big)\Bigg), \qquad (3)$$

where X^obs denotes the ensemble of the observed data x_i^obs, and ∆X the ensemble of ∆x_i as

$$X^{\mathrm{obs}} = \big[x_1^{\mathrm{obs}}, x_2^{\mathrm{obs}}, \ldots, x_n^{\mathrm{obs}}\big]^{\top} \quad \text{and} \quad \Delta X = [\Delta x_1, \Delta x_2, \ldots, \Delta x_n]^{\top}.$$

In what will follow, the errors ∆x_i, i = 1, 2, ..., n, are modeled in two different ways:
(a) no further knowledge about the nature of errors is available, and we consider them to reside within an uncertainty set, which leads to adversarially robust estimators (AREs), introduced in Section 2.1;
(b) errors can be considered as random variables with known support, which leads to designing distributionally robust estimators (DREs), introduced in Section 2.3.

In both cases, we assume the magnitude of ∆X to be sizably smaller than that of X^true, making the estimators identifiable. When they are of comparable sizes, the distinction of their parameters fades. However, a discussion on identifiability for general cases is beyond the scope of this work. With respect to likelihood methods, our approach can be regarded as semiparametric because we specify only the distribution of X^true parametrically, and not that of ∆X.
become more conservative, that is, we attempt to im- (a) no further knowledge about the nature of errors munize against larger errors, potentially at the expense is available, and we consider them to reside within an of lower likelihood. Bertsimas and Nohadani: Robust Maximum Likelihood Estimation 4 INFORMS Journal on Computing, Articles in Advance, pp. 1–14, © 2019 INFORMS

2.2. Computation of AREs

To compute adversarial maximum likelihood estimates, we first discuss the gradient-based robust optimization method for nonconvex cost functions before extending it to compute local optimal estimators. We then introduce a global search method based on robust simulated annealing that warrants global robust optimality.

2.2.1. Robust Nonconvex Optimization. In general, for a continuously differentiable and possibly nonconvex cost function f(x), where x ∈ R^n is the decision or data vector, the robust optimization problem is given through

$$\min_{x} g(x) \equiv \min_{x}\, \max_{\Delta x \in \mathcal{U}} f(x + \Delta x). \qquad (6)$$

Here, the errors ∆x directly affect the decision variables and are bound in an uncertainty set $\mathcal{U}$. The robust problem can be solved by updating x along descent directions that reduce g by excluding worst errors. In other words, d is a descent direction for the robust optimization problem (6) at x, if the directional derivative in direction d satisfies the following condition:

$$g'(x;\, d) < 0. \qquad (7)$$

Note that for problem (6), it may not be possible to find ∆x* = arg max_{∆x∈U} f(x + ∆x), or the solution may not be unique. However, it has been shown that when it is possible to provide a collection $\mathcal{M} = \{\Delta x_1, \ldots, \Delta x_m\}$ with ∆x_i ∈ $\mathcal{U}$ such that every worst-case error satisfies ∆x* = λ_i ∆x_i for some ∆x_i ∈ $\mathcal{M}$ and λ_i ≥ 0, then d is a descent direction for g at x if dᵀ∆x_i < 0 for all ∆x_i ∈ $\mathcal{M}$ (Bertsimas et al. 2010b). Furthermore, such a descent direction points away from all the worst implementation errors in $\mathcal{U}$, as shown by the following theorem:

Theorem 1. Suppose that f(x) is continuously differentiable, the uncertainty set $\mathcal{U}$ is defined as in (4), and $\mathcal{U}^*(x) := \{\Delta x^* \mid \Delta x^* = \arg\max_{\Delta x \in \mathcal{U}} f(x + \Delta x)\}$. Then, d ∈ R^n is a descent direction for the worst-case cost function g(x) at x = x̂ if and only if for all ∆x* ∈ $\mathcal{U}^*(\hat{x})$,

$$d^{\top} \Delta x^* < 0 \quad \text{and} \quad \nabla_x f(x)\big|_{x = \hat{x} + \Delta x^*} \ne 0.$$

Note that all descent directions d reside in the strict interior of a cone, which is normal to the cone spanned by all the vectors ∆x* ∈ $\mathcal{U}^*(\hat{x})$. Consequently, the worst-case cost at x̂ can be strictly decreased, if a sufficiently small step is taken along any directions within this cone, leading to solutions that are more robust. All worst solutions, x̂ + ∆x*, would also lie outside the neighborhood of the updated solution. Therefore, x* can be considered a robust local minimum, if there exists no descent direction for the robust problem at x = x*. The proof of Theorem 1 as well as empirical evidence of the robust optimization method's behavior is discussed in Bertsimas et al. (2010b).

In summary, if we can compute the directional derivative of the inner function of a robust optimization problem, the above method can efficiently provide robust solutions. We now extend this approach to compute robust estimators.

2.2.2. Local Optimal Estimator. Let φ(θ) be the solution to the inner minimization problem of (5) as

$$\varphi(\theta) \equiv \min_{\Delta X \in \mathcal{U}} \psi\big(\theta;\, X^{\mathrm{obs}} - \Delta X\big) \qquad (8)$$

with

$$\Delta X^*(\theta) \equiv \arg\min_{\Delta X \in \mathcal{U}} \psi\big(\theta;\, X^{\mathrm{obs}} - \Delta X\big). \qquad (9)$$

By applying Danskin's (1966) theorem, we have

$$\nabla_{\theta}\, \varphi(\theta)\big|_{\theta=\theta_0} = \nabla_{\theta}\, \psi\big(\theta;\, X^{\mathrm{obs}} - \Delta X^*(\theta_0)\big)\big|_{\theta=\theta_0}. \qquad (10)$$

Note that we do not need to calculate the gradient of the minimizer ∆X*(θ) itself. Given the ability to calculate Equation (10), we can construct a gradient descent algorithm with diminishing step size, which has been shown to converge to a local minimum (Bertsimas et al. 2010b).
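As a quick numerical illustration of Equation (10) (ours, not from the paper), consider the isotropic normal case with Σ = I, where the inner minimizer has the closed form ∆x_i* = −Γ(x_i − µ)/‖x_i − µ‖₂: the finite-difference gradient of φ with respect to µ coincides with the gradient of ψ evaluated at the fixed worst-case errors, exactly as Danskin's theorem asserts.

```python
import numpy as np

def worst_errors(mu, X, Gamma):
    """Closed-form inner minimizer for Sigma = I: the log-density of
    x - dx is smallest when x - dx is pushed directly away from mu,
    i.e., dx* = -Gamma * (x - mu) / ||x - mu||_2 (assumed nonzero)."""
    D = X - mu
    return -Gamma * D / np.linalg.norm(D, axis=1, keepdims=True)

def psi(mu, X):
    """Log-likelihood (3) for a normal with identity covariance."""
    D = X - mu
    return -0.5 * np.sum(D * D) - 0.5 * X.size * np.log(2 * np.pi)

def phi(mu, X, Gamma):
    return psi(mu, X - worst_errors(mu, X, Gamma))       # Equation (8)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) + 1.0
mu, Gamma, eps = np.ones(3), 0.1, 1e-6

# Danskin / Equation (10): gradient of psi at the *fixed* worst case.
dX = worst_errors(mu, X, Gamma)
danskin = np.sum(X - dX - mu, axis=0)

# Finite-difference gradient of phi, for comparison.
fd = np.array([(phi(mu + eps*e, X, Gamma) - phi(mu - eps*e, X, Gamma))
               / (2 * eps) for e in np.eye(3)])
print(np.allclose(danskin, fd, atol=1e-4))               # True
```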
2.2.3. The Case of Multivariate Normal Distribution. To demonstrate the performance of this approach, some specifications on the underlying distribution of the data are necessary. We use the example of the multivariate normal distribution of the observed data. In particular, the framework for computing AREs is employed to estimate the mean µ ∈ R^m and the covariance matrix Σ ∈ S_m^+, where S_m^+ is the set of m × m symmetric and positive semidefinite matrices. The probability density function for some observed data x^obs is

$$f\big(\mu, \Sigma;\, x^{\mathrm{obs}}\big) = \frac{1}{(2\pi)^{m/2}\, |\Sigma|^{1/2}} \exp\Big(-\tfrac{1}{2}\big(x^{\mathrm{obs}} - \mu\big)^{\top} \Sigma^{-1} \big(x^{\mathrm{obs}} - \mu\big)\Big). \qquad (11)$$

Using the uncertainty set defined in (4), problem (8) for the estimators θ = (µ, Σ) becomes

$$\varphi\big(\mu, \Sigma;\, X^{\mathrm{obs}}\big) = \min_{\Delta X \in \mathcal{U}} \psi\big(\mu, \Sigma;\, X^{\mathrm{obs}} - \Delta X\big) = \min_{\|\Delta x_i\|_2 \le \Gamma} \Bigg[-\frac{nm}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \sum_{i=1}^{n} \frac{1}{2}\big(x_i^{\mathrm{obs}} - \Delta x_i - \mu\big)^{\top} \Sigma^{-1} \big(x_i^{\mathrm{obs}} - \Delta x_i - \mu\big)\Bigg]. \qquad (12)$$

Note, the objective function and constraints are separable in ∆x_i. Thus, it suffices to solve

$$\min_{\|\Delta x_i\|_2 \le \Gamma} -\frac{1}{2}\big(x_i^{\mathrm{obs}} - \Delta x_i - \mu\big)^{\top} \Sigma^{-1} \big(x_i^{\mathrm{obs}} - \Delta x_i - \mu\big) \qquad (13)$$

for each i = 1, 2, ..., n. The objective function of problem (13) can be written as

$$-\frac{1}{2}\big(x_i^{\mathrm{obs}} - \Delta x_i - \mu\big)^{\top} \Sigma^{-1} \big(x_i^{\mathrm{obs}} - \Delta x_i - \mu\big) = -\frac{1}{2}\Delta x_i^{\top} \Sigma^{-1} \Delta x_i + \big[\Sigma^{-1}\big(x_i^{\mathrm{obs}} - \mu\big)\big]^{\top} \Delta x_i - \frac{1}{2}\big(x_i^{\mathrm{obs}} - \mu\big)^{\top} \Sigma^{-1} \big(x_i^{\mathrm{obs}} - \mu\big).$$
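The following sketch (ours) solves problem (13) in the generic case, in which the minimizer lies on the sphere ‖∆x‖₂ = Γ: writing the stationarity condition in the eigenbasis of Σ⁻¹ reduces the problem to a one-dimensional bisection on the Lagrange multiplier. The degenerate "hard case" known from trust-region problems is ignored here for brevity.

```python
import numpy as np

def inner_worst_error(r, Sigma, Gamma, tol=1e-10):
    """Problem (13): over ||dx||_2 <= Gamma, minimize
    -(1/2) (r - dx)' Sigma^{-1} (r - dx), with r = x_i^obs - mu.
    Equivalently, maximize the Mahalanobis norm of r - dx, which puts
    dx on the boundary. With A = Sigma^{-1} and u = -dx, stationarity
    gives u(k) = (k I - A)^{-1} A r for a multiplier k > lambda_max(A);
    ||u(k)|| is monotone in k, so we bisect (generic case only)."""
    A = np.linalg.inv(Sigma)
    w, V = np.linalg.eigh(A)                 # A = V diag(w) V'
    b = V.T @ (A @ r)                        # A r in the eigenbasis
    norm_u = lambda k: np.linalg.norm(b / (k - w))

    lo, hi = w[-1] + 1e-12, w[-1] + 1.0
    while norm_u(hi) > Gamma:                # enlarge until ||u|| < Gamma
        hi *= 2.0
    while hi - lo > tol * hi:                # bisect on the multiplier
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if norm_u(mid) > Gamma else (lo, mid)
    u = V @ (b / (0.5 * (lo + hi) - w))
    return -u                                # dx* = -u
```

For Σ = I this recovers the closed form ∆x* = −Γ r/‖r‖₂ used in the sketch above.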

Problem (13) is a trust region problem that has been solved in the literature, for example, by Boyd and Vandenberghe (2004) and Rendl and Wolkowicz (1997). In another work, Ye (1992) demonstrated that for such problems, a hybrid algorithm that combines Newton's method and a binary search can solve the problem in O(log(log(1/ϵ))) iterations, with error ϵ. In the online supplement, Section 1, we briefly describe the steps for computing this inner problem. Overall, the algorithm to compute the robust and normally distributed estimators, as defined in problem (5), can be summarized as follows:
1. Initialize with some estimators θ = (µ, Σ).
2. Solve problem (13) to obtain ∆x_i*(θ) for each i = 1, 2, ..., n.
3. Use the worst-case errors ∆X*(θ) = [∆x_1*(θ), ∆x_2*(θ), ..., ∆x_n*(θ)]ᵀ to calculate φ(θ) = ψ(θ; X^obs − ∆X*(θ)) and Equation (10) to compute its derivative ∇φ.
4. Construct a matrix Q such that Q · θ = 0.
5. Compute ∇̃φ as the projection of ∇φ onto the subspace Q · θ = 0 using the kernel of Q.
6. Update θ using the descent direction given by ∇̃φ (preserves Σ ∈ S_m^+).
7. Stop when the norm of the derivative is smaller than some tolerance parameter ϵ; otherwise, iterate back to Step 2.

Following the result of Theorem 1, this algorithm provides the local robust maximum likelihood estimators. Furthermore, the convergence of the algorithm is linear, because it is a first-order method using gradient descent. Second-order methods are computationally inefficient, because the inner derivative ∇_θ ∆x_i*(θ) complicates the calculation of the second derivative of φ. We now discuss an alternative method to obtain the global optimal estimators in the presence of errors in the input data.

2.2.4. Global Optimal Estimator. The robust normal distribution estimators can also be calculated using a global search method. The global search method is based on the robust simulated annealing algorithm introduced by Bertsimas and Nohadani (2010). To follow the original robust simulated annealing methods, we recast the maximization problem max φ(θ; X^obs) as a minimization problem, min g, with g ≡ −φ. Starting from the nominal optimum, this iterative algorithm lowers the worst-case performance g successively. At each step, g is computed for the corresponding estimates within the uncertainty set, and the inverse temperature is determined. The Boltzmann weight assigned to the current estimates is then compared for a trial estimate. If the trial estimates lead to a lower g, they will become the estimates of the next step. Otherwise, they will most likely be rejected, and new trial estimates will be generated and compared. We refer interested readers to Bertsimas and Nohadani (2010) for further details. For completeness, however, we summarize the steps of the algorithm along with the notation in Section 2 of the online supplement.

2.3. Distributional Robust Maximum Likelihood Estimators

In some cases, the errors ∆x_i are independent of the samples and among each other, and they follow the same distribution P. For example, when there is a fixed error caused by misalignments of the equipment, we can consider them as persistent errors that lend themselves to a distributional description. For this, let f_P(∆x) be the probability density function for P. Then, x_i^obs follows a distribution whose density is the convolution

$$\int f\big(\theta;\, x_i^{\mathrm{obs}} - \Delta x\big)\, f_P(\Delta x)\, d\Delta x = \int f\big(\theta;\, x_i^{\mathrm{obs}} - \Delta x\big)\, dP(\Delta x) = \mathbb{E}_{\Delta x \sim P}\Big[f\big(\theta;\, x_i^{\mathrm{obs}} - \Delta x\big)\Big]. \qquad (14)$$

Following the paradigm of distributionally robust optimization (Delage and Ye 2010), we consider all possible distributions to have a bounded support $\mathcal{S}$ and the DRE to be the solution to

$$\max_{\theta}\; \min_{P:\, \mathrm{supp}(P) = \mathcal{S}}\; \sum_{i=1}^{n} \log \mathbb{E}_{\Delta x_i \sim P}\Big[f\big(\theta;\, x_i^{\mathrm{obs}} - \Delta x_i\big)\Big]. \qquad (15)$$

Because all distributions share $\mathcal{S}$, problem (15) can be reformulated as

$$\max_{\theta}\; \min_{\Delta x \in \mathcal{S}}\; \sum_{i=1}^{n} \log f\big(\theta;\, x_i^{\mathrm{obs}} - \Delta x\big), \qquad (16)$$

which has the same structure as problem (5), and thus can be solved in a similar fashion as that for the ARE, using a gradient descent algorithm.
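Before specializing the DRE further, the following simplified sketch (our reading of steps 1–7, not the authors' implementation) assembles the ARE loop: gradient ascent on µ with diminishing step size, re-solving the inner problem (13) per sample at every iteration via the `inner_worst_error` routine sketched above. For brevity, Σ is held at its nominal estimate rather than updated with the PSD-preserving projection of steps 4–6; the DRE case (16) would differ only in optimizing a single shared ∆x.

```python
import numpy as np

def robust_mle_mean(X_obs, Gamma, n_iter=500, tol=1e-8):
    """Simplified steps 1-7 of Section 2.2.3 (mu only, Sigma fixed)."""
    n, m = X_obs.shape
    mu = X_obs.mean(axis=0)                         # step 1: nominal start
    Sigma = np.cov(X_obs, rowvar=False)             # held fixed (see text)
    S_inv = np.linalg.inv(Sigma)
    for t in range(1, n_iter + 1):
        # Step 2: worst-case error for each sample at the current mu.
        dX = np.array([inner_worst_error(x - mu, Sigma, Gamma)
                       for x in X_obs])
        # Step 3 / Equation (10): gradient of psi at the fixed dX.
        grad = S_inv @ (X_obs - dX - mu).sum(axis=0)
        if np.linalg.norm(grad) / n < tol:          # step 7: stop
            break
        mu = mu + (0.5 / t) * grad / n              # diminishing step size
    return mu
```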
2.3.1. The Case of Multivariate Normal Distribution. Analogous to AREs, we demonstrate the performance of DREs by the specific assumption of a multivariate normal distribution. The inner minimization of problem (16) is

$$\min_{\|\Delta x\|_2 \le \Gamma} \psi\big(\mu, \Sigma;\, X^{\mathrm{obs}} - \mathbf{1}_n \Delta x^{\top}\big) = \min_{\|\Delta x\|_2 \le \Gamma} \Bigg[-\frac{nm}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| - \sum_{i=1}^{n} \frac{1}{2}\big(x_i^{\mathrm{obs}} - \Delta x - \mu\big)^{\top} \Sigma^{-1} \big(x_i^{\mathrm{obs}} - \Delta x - \mu\big)\Bigg] = -\frac{nm}{2}\log(2\pi) - \frac{n}{2}\log|\Sigma| + \min_{\|\Delta x\|_2 \le \Gamma}\Bigg[-\frac{n}{2}\Delta x^{\top} \Sigma^{-1} \Delta x + \Big[\Sigma^{-1}\Big(\sum_{i=1}^{n} x_i^{\mathrm{obs}} - n\mu\Big)\Big]^{\top} \Delta x\Bigg] - \sum_{i=1}^{n} \frac{1}{2}\big(x_i^{\mathrm{obs}} - \mu\big)^{\top} \Sigma^{-1} \big(x_i^{\mathrm{obs}} - \mu\big), \qquad (17)$$

where 1_n is a size-n vector of ones. Note that problem (17) is a trust region problem (online supplement, Section 1.2). In the maximization of problem (16), the descent direction is projected into the space of positive semidefinite matrices. Furthermore, problem (17) is a special case of problem (12) in that all ∆x_i are restricted to be the same. This implies that DREs are less conservative than AREs.

2.4. Discussion

When using data for statistical estimation, overfitting has been a central challenge. A variety of machine learning algorithms have been developed to overcome these issues, particularly via regularization techniques. The paradigm of robust optimization provides a unifying justification for the success of many of these methods. The robustness of a result can be associated with problem attributes such as consistency and sparsity (see Sra et al. 2012 and references within). In this vein, our presented robust maximum likelihood estimators are also robust against overfitting, as the inner minimization problem inherently reduces the complexity and improves the predictive power of the estimator. Both of the following numerical experiments support this observation, as the resulting estimators are broadly insensitive to error size and sample size.
3. Computational Results with Simulation Data

To evaluate the robust estimators, we conduct experiments using computer-generated random data. The purpose of using simulated data is that we have accurate information about the true data that can serve as a reference, allowing us to directly measure the quality of robust estimators on observed data. We generate samples following a multivariate normal distribution. As our estimators are designed to deal with errors in the samples, we generate errors following both a normal and a uniform distribution, and use them to contaminate our samples.

More specifically, the experiments are conducted in the following fashion. A number of n = 400 samples in R^4 are generated randomly, following the multivariate normal distribution with some random mean and some random covariance matrix. Let X^true be the 400 × 4 matrix containing the samples x_i^true, i = 1, 2, ..., 400, in its rows. The vectors x_i^true are the true samples and are not affected by errors.

Furthermore, we generate errors on the samples in the following way: ∆X_k, k = 1, 2, ..., 40, is a 400 × 4 matrix containing errors corresponding to the samples in the 400 × 4 matrix X^true. The errors in ∆X_k follow a normal distribution with mean 0 and covariance matrix I_4, where 0 is the zero vector in R^4, and I_4 is the 4 × 4 identity matrix. In the experiments, we will use the parameter ρ to scale the magnitude of contamination. Correspondingly, we also employ ρ to tune the uncertainty set size. In this context, Γ is used for a constant set size, as will be discussed in Section 3.2.

For the uncertainty set that contains the simulated errors, we evaluate the performance of the estimators using the worst-case and average values of the probability density, as well as their distance from the value of the nominal estimator on the true data. Initially, we use the normally distributed errors. Later, we compare the results with the case of uniformly distributed errors. In each case, we consider both AREs and DREs.

The experimental section is organized as follows. In Section 3.1, we evaluate the estimators based on the worst-case and average values of the probability density. In Section 3.2, we evaluate the estimators based on their distance from the nominal estimator on the true data. In Section 3.3, we compare the robust estimators to the nominal one, using the local and the global search methods for the ARE and DRE cases. In Section 3.4, we discuss the effects of different distributions for the errors, in particular, when we have uniformly distributed errors.

3.1. Worst-Case and Average Probability Density

To probe the efficiency of the proposed robust estimators, we will check the worst-case and average values of the probability density, as we add properly scaled errors from the set of errors ∆X_k to the true values of the data. In particular, we calculate the AREs for the true data X^true, for varying sizes of the assumed uncertainty set, as defined in (4), between ρ = 0 and 3 with a step size of 0.1. We denote them by $\hat{\mu}_{\mathrm{rob.a.}}(X^{\mathrm{true}}, \rho)$ and $\hat{\Sigma}_{\mathrm{rob.a.}}(X^{\mathrm{true}}, \rho)$. Moreover, we calculate the DREs in the same cases and denote them by $\hat{\mu}_{\mathrm{rob.d.}}(X^{\mathrm{true}}, \rho)$ and $\hat{\Sigma}_{\mathrm{rob.d.}}(X^{\mathrm{true}}, \rho)$. For ρ = 0, we have the nominal estimates, which coincide with the true estimators, denoted by $\hat{\mu}_{\mathrm{true}}(X^{\mathrm{true}})$ and $\hat{\Sigma}_{\mathrm{true}}(X^{\mathrm{true}})$. To calculate the robust estimators, we use a first-order gradient descent method. The initial point is the robust estimate for the previous value of ρ within the considered ρ sequence. For each computed estimate $(\hat{\mu}, \hat{\Sigma})$, we determine the log-likelihood density of the observed samples $\psi(\hat{\mu}, \hat{\Sigma}, X^{\mathrm{true}} + \alpha\rho\Delta X_k)$, k = 1, 2, ..., 40. We record the worst-case value as well as the average value over the set of errors indexed by k to rule out data-specific artifacts in our observations. We consider the cases α = 0.5, α = 1.0, and α = 1.5.
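The following sketch (ours) reproduces the structure of this experiment under the stated setup: it draws n = 400 samples in R^4 from a multivariate normal with a randomly generated mean and covariance, creates 40 standard-normal error matrices, and records the worst-case and average log-likelihood ψ over the error sets for a given estimate. The seed and the SPD construction of the covariance are our choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Section 3 setup: n = 400 samples in R^4 with random mean/covariance.
n, m, K = 400, 4, 40
mu_true = rng.normal(size=m)
G = rng.normal(size=(m, m))
Sigma_true = G @ G.T + m * np.eye(m)       # random SPD covariance
X_true = rng.multivariate_normal(mu_true, Sigma_true, size=n)

# 40 error matrices with standard-normal entries (mean 0, covariance I_4).
dX = rng.normal(size=(K, n, m))

def psi(mu, Sigma, X):
    """Log-likelihood density (3) for the multivariate normal (11)."""
    D = X - mu
    _, logdet = np.linalg.slogdet(Sigma)
    quad = np.einsum('ij,jk,ik->', D, np.linalg.inv(Sigma), D)
    return -0.5 * (n * m * np.log(2 * np.pi) + n * logdet + quad)

# Worst-case and average psi over the error sets for one estimate.
alpha, rho = 1.0, 1.5
mu_hat, Sigma_hat = X_true.mean(axis=0), np.cov(X_true, rowvar=False)
vals = [psi(mu_hat, Sigma_hat, X_true + alpha * rho * dX[k])
        for k in range(K)]
print(min(vals), np.mean(vals))            # worst case and average
```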
3.1.1. ARE Case. Figure 1 (top row) shows the results in the ARE case. The worst-case and average values of the log-likelihood density function are plotted over the size of the perturbation ρ. For better comparison, these values are normalized to the corresponding unperturbed values. We observe that for small values of ρ, the nominal and robust estimators depict the same performance. As ρ grows, however, the difference between them increases, with the robust always showing a better performance than the nominal. This observation holds for both the worst-case value and the average value of the probability density. This is because the robust estimator always protects against the worst-case scenario; thus, it does not degrade as fast as the nominal estimators for increasing error sizes.

3.1.2. DRE Case. Figure 1 (bottom row) shows the performance of the log-likelihood density function ψ as a function of the error size in the DRE case. We observe a similar behavior as in the ARE case. The difference between the DRE and the nominal grows at a higher rate than the difference between the ARE and the nominal. In all cases, the superiority of the robust estimator is detected for values of ρ greater than or equal to 1. The ARE is better than the nominal up to a factor of 10%, and the DRE is better than the nominal up to a factor of 15%. As α increases, both nominal and robust performances deteriorate at a higher rate. Note the smooth transition between the nominal (i.e., ρ = 0 in $\psi(\hat{\mu}, \hat{\Sigma}, X^{\mathrm{true}} + \alpha\rho\Delta X)$) and the robust log-likelihood (ρ > 0) for both the ARE and DRE cases, suggesting the identifiability of estimators.

Figure 1. (Color online) Comparison: (Left) Worst-Case and (Right) Average Log-Likelihood ψ for Changing Perturbation Size ρ for the (Top) ARE and (Bottom) DRE Cases

3.2. Distance From Nominal Estimators

To evaluate how robust estimators perform in the presence of errors, we compute both the nominal and the robust estimators on contaminated data with errors of different sizes and compare them to nominal estimators computed on the true data. More specifically, we compute the AREs $\hat{\mu}_{\mathrm{rob.a.}}(X^{\mathrm{true}} + \delta\Delta X_k, \rho)$ and $\hat{\Sigma}_{\mathrm{rob.a.}}(X^{\mathrm{true}} + \delta\Delta X_k, \rho)$ on the contaminated data for error sets k = 1, 2, ..., 40, for values of δ in [0, 1] with 0.05 steps, and for values of ρ in [0, 1] with 0.1 steps. We also compute the DREs in the same fashion. For ρ = 0, we have the nominal estimators. For each estimate, Figure 2 shows the calculated distances

$$\mathrm{Error}_{\mu} = \big\|\hat{\mu}_{\mathrm{rob}}\big(X^{\mathrm{true}} + \delta \Delta X_k,\, \rho\big) - \hat{\mu}_{\mathrm{true}}\big(X^{\mathrm{true}}\big)\big\|_2, \qquad (18)$$

$$\mathrm{Error}_{\Sigma} = \big\|\hat{\Sigma}_{\mathrm{rob}}\big(X^{\mathrm{true}} + \delta \Delta X_k,\, \rho\big) - \hat{\Sigma}_{\mathrm{true}}\big(X^{\mathrm{true}}\big)\big\|_{\mathrm{fro}}. \qquad (19)$$

Note that ‖A‖_fro is the Frobenius norm of an n × m matrix A, defined by

$$\|A\|_{\mathrm{fro}} = \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{m} A_{i,j}^2},$$

where A_{i,j} is the (i, j) element of matrix A. We average the calculated distances over the error sets k, k = 1, 2, ..., 40. We use the Frobenius norm because it takes into consideration the differences of the variances of the variables as well as their cross correlation terms.

The range of uncertainty size (ρ) where the robust estimator outperforms the nominal (true) depends on δ, the size of the errors, as illustrated in Figure 2. When encountered errors are small sized, the robust solutions coincide with their nominal counterparts; that is, the difference between the robust and true estimators is independent of ρ. For a range of error sizes, the observable dip demonstrates that robust estimators clearly improve in accuracy. As δ increases, the interval with improved performance moves to higher ρ. This is explained by the fact that the robust estimator is secured against errors with the norm up to some ρ, and thus it cannot deal with higher errors. In data-driven applications, the analysis of the dip can be used reversely to motivate the appropriate size of the uncertainty set Γ = ρ*.

Figure 2. (Color online) Range of Effectiveness of Robust Estimators as the Distance to True Estimators: (Left) Equation (18) for µ and (Right) Equation (19) for Σ for the (Top) ARE and (Bottom) DRE Cases
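In code, the two distances amount to a Euclidean and a Frobenius norm; a minimal sketch (ours), to be averaged over the error sets k:

```python
import numpy as np

def estimator_errors(mu_rob, Sigma_rob, mu_true_hat, Sigma_true_hat):
    """Distances (18) and (19) between robust estimates computed on
    contaminated data and nominal estimates on the true data."""
    err_mu = np.linalg.norm(mu_rob - mu_true_hat)            # Eq. (18)
    err_Sigma = np.linalg.norm(Sigma_rob - Sigma_true_hat,   # Eq. (19)
                               ord='fro')
    return err_mu, err_Sigma
```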
distribution and, thus, are less conservative. For small Using the same data set as before, we compare the size errors, both the local and the global as well as the corresponding robust estimators to the nominal (true) ARE and DRE cases perform similarly. On the other counterparts for various values of δ, analogous to the hand, the robust simulated annealing algorithm out- previous . We evaluate both estimators on performs the gradient-based local search algorithm for data contaminated with errors that are scaled with δ. δ > 0.3, despite the 100 initial points for the local search. For values of δ ranging between 0 and 1 with a step However, the global estimators come at a higher com- of 0.1, we compute the nominal (true) and the robust putational expense. Whereas the gradient ascent method true ∆ estimates on the contaminated data X δ Xk . From requires approximately 100 seconds on a Intel Xeon 3.4 the region of best performance of the( robust+ estimator,) GHz desktop machine, the simulated annealing method as illustrated in Figure 2, we extract the best parameter terminated after approximately 215 seconds for one value ρ∗ that yields the lowest robust estimate errors for a particular instance of the samples. Therefore, for small- particular δ,asdefined in Equations (18)and(19). This ρ∗ sized errors, it is more meaningful to employ the gradient is used in the experiments to evaluate the performance. ascent method to evaluate local robust estimators. On the To come close to a global optimum, we conduct the other hand, when errors are larger, using the more costly gradient-based local search from 100 different initial simulated annealing method leads to better-performing estimates. These are chosen as nodes of a uniform grid global robust estimators.

Figure 3. (Color online) Comparison of the Errors in (Left) µ and (Right) Σ in the ARE and DRE Cases Using the Local and Global Robust Algorithms for Different Levels of Contamination δ

Note. All errors follow a normal distribution.

3.4. Comparison Between the Error Distributions

In this section, we investigate whether the above observations and, thus, the quality of the proposed robust estimators depend on the source of errors. For this, we conduct the same experiments using a different error distribution, namely, when errors are uniformly distributed. Now, ∆X_k, k = 1, 2, ..., 40, is a 400 × 4 matrix, where each of its rows follows the uniform distribution in a ball with radius 1.

When measuring the distances to the true estimators, as in (18) and (19), we observe that for the same δ, the region where the robust estimator is superior is shifted to smaller values of ρ (not shown here, but comparable to Figure 2). This is because the samples from a uniform distribution are contained in the ball with radius δ, whereas normally distributed samples can reside outside this region. Overall, we observe similar trends for the ARE and DRE cases as in the case of the normally distributed errors, as exemplified in Figure 4. Therefore, we can conclude that robust estimators are also robust to the source of uncertainty. Because we do not expect any additional insight by a comparison with global estimates, we omit the discussion here.

3.5. Comparison with Factor Analysis

In this section, we compare the proposed method with an existing method. We select an extreme case, where the distribution of the data uncertainty is fully accounted for. Consider the observed data X^obs = (x_1, x_2) to be generated by the following factor analysis model:

$$x_1 = x^{\mathrm{true}} + \Delta x_1 \quad \text{and} \quad x_2 = b \cdot x^{\mathrm{true}} + \Delta x_2,$$

where x^true is normally distributed with mean 0 and variance 1. The uncertain ∆x_1 and ∆x_2 are independent of each other and of x^true and are also normally distributed, but with variances < 1, respectively, and means equal to zero. This X^obs follows a bivariate normal distribution with zero mean and variances of 1 and b². For this strictly parametrized setting, we estimate the parameters by the proposed robust MLE method and by conventional factor analysis. In this experiment, we scale the results in terms of b for a direct comparison.

We observe that both methods comparably estimate the variances to an accuracy of ±b/10. In fact, the estimator provided by robust maximum likelihood estimation improves slightly over factor analysis within an uncertainty size window, similar to the observations in Section 3.2 and displayed in Figure 2. On the other hand, factor analysis outperforms robust maximum likelihood estimation in estimating the covariances. The Frobenius norm of the difference between the estimated and generated covariance matrices is 0.6b smaller for factor analysis than robust maximum likelihood estimation, demonstrating the advantage of the parametric model. However, this advantage deteriorates when an incorrect parametric factor analysis model is used.
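For reference, a minimal sketch (ours) of the data-generating process in this comparison; the noise standard deviations sigma1 and sigma2 are illustrative values satisfying the paper's only requirement that the error variances be smaller than 1:

```python
import numpy as np

rng = np.random.default_rng(2)

# Factor analysis model of Section 3.5: one latent factor x_true and
# two noisy observables x1 = x_true + dx1, x2 = b*x_true + dx2.
n, b, sigma1, sigma2 = 1000, 2.0, 0.3, 0.5       # sigma1, sigma2 < 1
x_true = rng.normal(size=n)
x1 = x_true + sigma1 * rng.normal(size=n)
x2 = b * x_true + sigma2 * rng.normal(size=n)
X_obs = np.column_stack([x1, x2])
print(np.cov(X_obs, rowvar=False))               # sample covariance matrix
```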
3.6. Summary of Simulation Results

In our computer simulations, we applied the local and the global RO methods to randomly generated data sets that were contaminated in two different ways: the ARE case and the DRE case. First, we investigated the performance of the log-likelihood density function values ψ as the error size increases. Using the local gradient-based robust method to estimate the parameters, we observed that ψ degrades at a much lower rate than when using the nominal estimators, demonstrating that the robust method protects against worst-case scenarios. This behavior was confirmed by both the worst-case performance and the average performance of ψ, as well as for both cases, the ARE and DRE cases.

Next, we compared the local and global robust algorithms by computing the distance of the robust estimates to their "true" and unperturbed counterparts.

Figure 4. (Color online) Comparison of the Errors in (Left) µ and (Right) Σ for Uniformly Distributed Errors

Obviously, as the size of error increases, this distance grows for all methods. However, because this distance grows linearly for the nominal method, we observe a clear advantage using the robust methods. Because the DRE case is less conservative, it shows a slightly better performance than the ARE. As we encounter larger errors, the robust simulated annealing method finds the global robust optima and, thus, outperforms all other local methods. This comes at a higher computational cost. Independent of approach, we also studied the range of effectiveness for robust estimators, because for small-sized errors, the robust and nominal estimates coincide, and for too large errors, any method fails.

We also probed the sensitivity of the estimators with respect to the sources of errors. Using the local robust method, we compared two DRE cases: (a) when the errors follow a normal distribution and (b) when the errors follow a uniform distribution. We show that the proposed estimators remain robust even when the errors follow a different distribution than the one we prepared for.

Last, we compared the robust MLE method against a fully parametric model of factor analysis. We observe that the semiparametric robust MLE method compares well with factor analysis for estimating variances but underperforms for covariances. This demonstrates that although robust maximum likelihood estimation offers advantages to estimates of uncertain input data, it underperforms when the data-generating process is known and can be leveraged.

4. Cancer Radiation Therapy Plan Evaluation

Intensity-modulated radiation therapy has the advantage of delivering highly conformal dose distributions to tumors of geometrically complex shapes. The treatment-planning process involves iterative interactions between a commercial software product and a group of planners, dosimetrists, and medical physicists. Therefore, the final product is the unpredictable outcome of a series of trial-and-error attempts at meeting competing clinical objectives. To guide the decision making and to assure plan quality, institutional and international recommendations are followed (International Commission on Radiation Units and Measurements 2010). Even though these constraints are rigorously imposed, substantial deviations can occur (Das et al. 2008, Roy et al. 2013), often because some of the guidelines are competing or infeasible in certain cases and hence cannot all be followed. In practice, these deviations are statistically analyzed for both reporting and process control purposes. For this, conventional statistical estimators (sample mean and covariances) are used. However, these estimators are sensitive to sample quality and sample size, rendering the conclusions less dependable. Our goal is to demonstrate that robust estimators are more accurate in the presence of uncertainties. They can produce more reliable guidelines that can be followed in practical settings and prevent undesirable deviations.

In this section, we focus on radiation dose to tumor structures and defer the discussion on other organs to a more dosimetric study. In clinical practice, spatial dose distributions are evaluated with a cumulative dose–volume histogram (DVH). It measures the portion of the volume (e.g., of tumor) that receives a certain fraction of the prescribed dose. Figure 5 illustrates two acceptable tumor DVHs. The ideal distribution is to deliver 100% of the prescribed dose to the entire volume (dotted step function). Note that the DVH, by design, normalizes over anatomical sites and volume differences, enabling direct comparisons across patients. Guidelines are typically imposed as DVH control points, for example, for the dose Dx to a tumor. The quantity Dx measures the percentage of the prescribed dose that was received by x% of the volume and is typically implemented as a soft constraint during plan optimization. Figure 5 also shows three such control points. Despite international and institutional recommendations, the delivered values often differ from protocols (Nohadani et al. 2015).

As input data, we employ Dx control points of 491 treatment plans that have already been delivered. Specifically, the treated tumors are in the abdomen, brain, bladder, breast, eyelid, esophagus, head and neck, liver, lung, pancreas, parotid gland, pelvis, prostate, rectum, thyroid, tongue, tonsil, and vagina. This range serves to limit site-specific bias, and the use of the DVH enables comparability.
Note that all tage of delivering highly conformal dose distribu- treatments followed the same clinical protocols and tions to tumors of geometrically complex shapes. The were optimized with the same commercial software treatment-planning process involves iterative interac- tions between a commercial software product and a Figure 5. (Color online) Dose–Volume Histogram of Two group of planners, dosimetrists and medical physicists. Clinically Acceptable Treatment Plans for Tumor Therefore, the final product is the unpredictable outcome of a series of trial-and-error attempts at meeting com- peting clinical objectives.Toguidethedecisionmaking and to assure plan quality, institutional and interna- tional recommendations are followed (International Commission on Radiation Units and Measurements 2010). Even though these constraints are rigorously imposed, substantial deviations can occur (Das et al. 2008, Roy et al. 2013). These often occur because some of the guidelines cannot be followed because some of them are competing or infeasible in certain cases. In practice, these deviations are statistically analyzed for both reporting and process control purposes. For this, conventional statistical estimators (sample mean and covariances) are used. However, these estimators are Note. The dotted line is a guide for the eye for the ideal plan, and the sensitive to sample quality and sample size, rendering dashed lines mark three dose control points Dx. Bertsimas and Nohadani: Robust Maximum Likelihood Estimation 12 INFORMS Journal on Computing, Articles in Advance, pp. 1–14, © 2019 INFORMS

Therefore, the remaining sources of uncertainty are in the variations among the patients and the path of decision making that clinicians have taken to arrive at these final plans. We compute the corresponding AREs for these plans.

4.1. Summary of Results

We evaluate DVH dose points D100, D98, D95, D50, and D2 on the treated tumors. Specifically, we simulate a typical institutional analysis, where means and covariances of Dx are reported. A report on a protocol (not data) is considered reliable when the estimator values remain stable for differing sets of data. In other words, insensitivity to sample and sample size is given when estimators exhibit a narrow spread over the sampled data (and size). In the following, we describe two experiments, probing the dependence on samples and on sample size.

To simulate semiannual report scenarios, we randomly divide the 491 five-dimensional plan data into subsets of 50. For each set, we evaluate estimators for µ and Σ. First, we compute the corresponding AREs of 3,000 such samples. Next, the standard (nominal) estimators on the same samples are calculated for comparison. In the first experiment, an estimator is considered superior when its value is not sizably affected by the choice of the sample, that is, the estimator spread is narrower. In the second experiment, only 80 randomly chosen samples (each having 50 plans) are considered, and the results are denoted by superscript "s." Therefore, the difference from the subsampled estimator displays sample dependence. In both experiments, the independence among patients justifies the assumption of a normal distribution for the deviations from protocol. Therefore, the corresponding uncertainty set can be modeled as in (4).

Figure 6 (left) shows that for the three key clinical control points, the spread of µ_rob.a. is stable and at the same level as the nominal estimator (at ρ = 0) up to ρ ≤ 2 Gy. Beyond this point, the performance deteriorates (also observed in Section 3.2). However, with regard to sample-size dependence, µ_nom exhibits a sizable sensitivity (seen by the difference |std(µ_nom^s) − std(µ_nom)| > 0), as opposed to µ_rob.a., which remains unchanged (hence, not plotted). This observation applies to all DVH control points Dx. Furthermore, a deviation beyond 2 Gy is typically considered clinically unacceptable during planning. Therefore, it is justified to state that the stability of the robust estimator is superior over the clinically relevant range.

The true advantage of the robust method becomes apparent when comparing higher-order estimators. Figure 6 (right panels) illustrates the coefficient of variation (cv = σ/µ) that measures the normalized spread of the covariances. With regard to the sample dependence, the spread of Σ_rob.a.(D50, D2) and Σ_rob.a.(D95, D50) is comparable to Σ_nom over all samples (see Figure 6, (d) and (f)). However, Σ_rob.a.(D95, D2), which measures the relation across the full DVH range and is clinically most meaningful, exhibits a narrower spread over the entire region, as shown in Figure 6(e), demonstrating the superiority of Σ_rob.a. for this key metric.
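A sketch of this report-scenario experiment (ours; the clinical data themselves are not available): draw repeated subsets of 50 plans, estimate the five control-point means on each, and compare the spread across subsets. The nominal mean is used below; a robust estimator from Section 2 can be plugged in its place.

```python
import numpy as np

def subsample_spread(plans, n_sets=3000, set_size=50, seed=0):
    """Report-scenario experiment of Section 4.1: `plans` is an (N, 5)
    array of [D100, D98, D95, D50, D2] values. Draw subsets of 50
    plans, estimate the mean on each, and return the spread (std) of
    those estimates across subsets, per control point."""
    rng = np.random.default_rng(seed)
    means = np.array([
        plans[rng.choice(len(plans), set_size, replace=False)].mean(axis=0)
        for _ in range(n_sets)])
    return means.std(axis=0)       # replace .mean with a robust estimator
```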

Figure 6. (Color online) Stability of Estimators: Dependence on the Size of Uncertainty Set ρ

Note. The figure shows the standard deviation σ of the mean (left) and the coefficient of variation (cv = σ/µ) of the covariance matrix elements over the samples (right).

Note that only Σ_rob.a.(D95, D2) (Figure 6(e)) is inferior for ρ ≤ 0.8, which is expected, as the performance of any robust quantity sets in beyond a specific uncertainty size, as also demonstrated in Figure 2 with computer-simulated data. When probing the sample-size dependence, a sizable variation is observed for the nominal estimator. Furthermore, all matrix elements of Σ_rob.a. display a near constant spread over an extended range of uncertainty size ρ. Also in this application, we observe a smooth transition between the nominal and the robust estimators, supporting the identifiability of estimators.

4.2. Clinical Implications

The mean of dose points Dx is often used to track both planners and the institutional performance. It is also used to analyze outcome data (survival, toxicity, and pain levels), both for monitoring and for reporting. Here, we presented three dose points for the sake of exposition (the other two revealed qualitatively comparable results). The covariance between dose points is often recorded and analyzed to track the efficacy of the guidelines. In fact, in a recent study using nominal estimators on 100 head-and-neck cases, D95 was found to be negatively correlated to D50 (Nohadani et al. 2015, Roy et al. 2016). This means that these two criteria, although recommended, cannot be satisfied concurrently. Robust estimators are suited to providing more reliable recommendations.

In medical research and practice, statistical estimators are arguably among the most prevalent quantitative tools. These results show that taking errors into account with a robust approach can significantly advance the reliability of conclusions. Furthermore, in clinical trials, large samples are required to control errors. The presented robust approach can significantly mediate the sample size and cost while ensuring the same, if not a better, level of reliability of results.
5. Conclusions

In this work, we extend the method of maximum likelihood estimation to also account for errors in input data, using the paradigm of robust optimization. We introduce two types of robust estimators to cope with adversarial and distributional errors. We show that adversarially robust estimators can be efficiently computed using a directional gradient–based algorithm. In the distributional case, we show that the inner infinite-dimensional optimization problem can be solved via a finite-dimensional problem, constituting a special case of the adversarial estimators. For multivariate normally distributed errors, arising in many practical cases, we develop local and global search algorithms to efficiently calculate the robust estimators.

We demonstrate the performance of the robust estimators in two types of data sets. Computer-simulated data serve for a controlled experiment. We observe that the robust estimators are significantly more protected against errors than nominal ones. For small errors, the local and global robust estimators are comparable. However, for larger errors, the global estimators outperform the local ones. Moreover, we show that the proposed estimators remain stable even when errors follow a different distribution than assumed.

In the second application, a large data set of cancer radiation therapy plans that have been delivered serves to probe the estimators in clinical decision making. We show that whereas conventional estimators lead to heavily sample-dependent conclusions, the robust estimators exhibit a narrow spread across samples. This sample independence allows for reliable decisions. Given the independence of possible data structure and the generic error models, we believe that our proposed methods are directly applicable to a wide variety of real-world maximum likelihood settings.

Acknowledgments

The authors thank Apostolos Fertis for insightful discussions and Indra Das for generously providing the clinical data.

References

Ben-Tal A, El Ghaoui L, Nemirovski A (2009) Robust Optimization (Princeton University Press, Princeton, NJ).
Berkson J (1950) Are there two regressions? J. Amer. Statist. Assoc. 45(250):164–180.
Bertsimas D, Nohadani O (2010) Robust optimization with simulated annealing. J. Global Optim. 48(2):323–334.
Bertsimas D, Brown D, Caramanis C (2011) Theory and applications of robust optimization. SIAM Rev. 53(3):464–501.
Bertsimas D, Nohadani O, Teo K (2010a) Nonconvex robust optimization for problems with constraints. INFORMS J. Comput. 22(1):44–58.
Bertsimas D, Nohadani O, Teo K (2010b) Robust optimization for unconstrained simulation-based problems. Oper. Res. 58(1):161–178.
Boyd S, Vandenberghe L (2004) Convex Optimization (Cambridge University Press, Cambridge, UK).
Buonaccorsi JP (2010) Measurement Error: Models, Methods and Applications (CRC Press, Boca Raton, FL).
Calafiore G, El Ghaoui L (2001) Robust maximum likelihood estimation in the linear model. Automatica 37(4):573–580.
Cheng CL, Van Ness JW (1999) Statistical Regression with Measurement Error (Oxford University Press, New York).
Danskin JM (1966) The theory of max-min, with applications. SIAM J. Appl. Math. 14(4):641–664.
Das I, Desrosiers C, Srivastava S, Chopra K, Khadivi K, Taylor M, Zhao Q, Johnstone P (2008) Dosimetric variability in lung cancer IMRT/SBRT among institutions with identical treatment planning systems. Internat. J. Radiation Oncology Biol. Phys. 72(1):604–605.
Delage E, Ye Y (2010) Distributionally robust optimization under moment uncertainty with application to data-driven problems. Oper. Res. 58(3):595–612.
El Ghaoui L, Lebret H (1997) Robust solutions to least-squares problems with uncertain data. SIAM J. Matrix Anal. Appl. 18(4):1035–1064.
Fuller WA (2009) Measurement Error Models, vol. 305 (John Wiley & Sons, Hoboken, NJ).

Hampel FR (1974) The influence curve and its role in robust estimation. J. Amer. Statist. Assoc. 69(346):383–393.
Huber PJ (1964) Robust estimation of a location parameter. Ann. Math. Statist. 35(1):73–101.
Huber PJ (1980) Robust Statistics (John Wiley & Sons, Cambridge, MA).
Huber PJ (1996) Robust Statistical Procedures (SIAM).
International Commission on Radiation Units and Measurements (2010) Prescribing, recording and reporting photon-beam intensity-modulated radiation therapy (IMRT). ICRU Report 83, International Commission on Radiation Units and Measurements 10(1).
Jaeckel LA (1971) Robust estimates of location: Symmetry and asymmetric contamination. Ann. Math. Statist. 42(3):1020–1034.
Nohadani O, Roy A, Das I (2015) Large-scale DVH quality study: Correlated aims lead relaxations. Medical Phys. 42(6):3457–3457.
Pfanzagl J (1994) Parametric Statistical Theory (Walter de Gruyter, Berlin).
Rendl F, Wolkowicz H (1997) A semidefinite framework for trust region subproblems with applications to large scale minimization. Math. Programming 77(1):273–299.
Roy A, Das I, Srivastava S, Nohadani O (2013) Analysis of planner dependence in IMRT. Medical Phys. 40(6):260.
Roy A, Das IJ, Nohadani O (2016) On correlations in IMRT planning aims. J. Appl. Clinical Medical Phys. 17(6):44–59.
Sra S, Nowozin S, Wright S (2012) Optimization for Machine Learning (MIT Press, Cambridge, MA).
Ye Y (1992) A new complexity result on minimization of a quadratic function with a sphere constraint. Floudas C, Pardalos P, eds. Recent Advances in Global Optimization (Princeton University Press, Princeton, NJ), 19–31.