Distributionally Robust Optimization: A Review
Hamed Rahimian and Sanjay Mehrotra
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208
arXiv:1908.05659v1 [math.OC] 13 Aug 2019

Abstract. The concepts of risk-aversion, chance-constrained optimization, and robust optimization have developed significantly over the last decade. The statistical learning community has also witnessed rapid theoretical and applied growth by relying on these concepts. A modeling framework, called distributionally robust optimization (DRO), has recently received significant attention in both the operations research and statistical learning communities. This paper surveys the main concepts of and contributions to DRO, and its relationships with robust optimization, risk-aversion, chance-constrained optimization, and function regularization.

Key words. Distributionally robust optimization; Robust optimization; Stochastic optimization; Risk-averse optimization; Chance-constrained optimization; Statistical learning

AMS subject classifications. 90C15, 90C22, 90C25, 90C30, 90C34, 90C90, 68T37, 68T05

1. Introduction. Many real-world decision problems arising in engineering and management have uncertain parameters. This parameter uncertainty may be due to limited observability of data, noisy measurements, or implementation and prediction errors. Stochastic optimization (SO) and robust optimization are the frameworks that have classically been used to model this uncertainty within a decision-making framework. Stochastic optimization assumes that the decision maker has complete knowledge of the underlying uncertainty through a known probability distribution and minimizes a functional of the cost, see, e.g., Shapiro et al. [295], Birge and Louveaux [45]. The probability distribution of the random parameters is inferred from prior beliefs, expert opinions, errors in predictions based on the historical data (e.g., Kim and Mehrotra [171]), or a mixture of these. In robust optimization, on the other hand, it is assumed that the decision maker has no distributional knowledge about the underlying uncertainty, except for its support, and the model minimizes the worst-case cost over an uncertainty set, see, e.g., El Ghaoui and Lebret [99], El Ghaoui et al. [100], Ben-Tal and Nemirovski [19], Bertsimas and Sim [34], Ben-Tal and Nemirovski [20], Ben-Tal et al. [26]. Robust optimization is related to chance-constrained optimization: in certain cases there is a direct correspondence between a robust optimization model and a chance-constrained optimization model, see, e.g., Boyd and Vandenberghe [57, pp. 157–158].

We often have partial knowledge of the statistical properties of the model parameters. Specifically, the probability distribution quantifying the model parameter uncertainty is known only ambiguously. A typical approach to handle this ambiguity, from a statistical point of view, is to estimate the probability distribution using statistical tools, such as the maximum likelihood estimator, the minimum Hellinger distance estimator [311], or the maximum entropy principle [126]. The decision-making process can then be performed with respect to the estimated distribution; a minimal sketch of this estimate-then-optimize approach is given below.
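The following is a minimal sketch of the estimate-then-optimize (plug-in) idea: fit a parametric distribution to data by maximum likelihood and then minimize a Monte Carlo estimate of the expected cost under the fitted distribution. The normal model, the piecewise-linear cost with penalty rates c_over and c_under, and all numerical values are illustrative assumptions introduced here, not constructions taken from the survey.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(0)
data = rng.normal(loc=100.0, scale=20.0, size=50)  # observed samples of the uncertain parameter

# Step 1: estimate the distribution by maximum likelihood (a normal model is assumed here).
mu_hat, sigma_hat = stats.norm.fit(data)

# Step 2: optimize against the estimated distribution.
# Illustrative cost h(x, xi): overage/underage penalties around the decision x.
c_over, c_under = 1.0, 3.0

def expected_cost(x, n_mc=20000):
    # Monte Carlo estimate of E[h(x, xi)] under the fitted distribution (fixed seed for stability).
    xi = stats.norm.rvs(loc=mu_hat, scale=sigma_hat, size=n_mc, random_state=1)
    return np.mean(np.maximum(c_over * (x - xi), c_under * (xi - x)))

res = optimize.minimize_scalar(expected_cost, bounds=(0.0, 300.0), method="bounded")
print(f"MLE fit: mu={mu_hat:.1f}, sigma={sigma_hat:.1f}; plug-in decision x*={res.x:.1f}")
```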
Because such an estimation may be imprecise, the impact of inaccuracy in estimation, and of the subsequent ambiguity in the underlying distribution, is widely studied in the literature through (1) perturbation analysis of optimization problems, see, e.g., Bonnans and Shapiro [55], (2) stability analysis of an SO model with respect to a change in the probability distribution, see, e.g., Rachev [250], Römisch [263], or (3) input uncertainty analysis in stochastic simulation models, see, e.g., Lam [175] and references therein. The typical goals of these approaches are to quantify the sensitivity of the optimal value/solution(s) to the probability distribution and to provide continuity and/or large-deviation-type results, see, e.g., Dupačová [96], Schultz [272], Heitsch et al. [142], Rachev and Römisch [248], Pflug and Pichler [230]. While these approaches quantify the input uncertainty, they do not provide a systematic modeling framework to hedge against the ambiguity in the underlying probability distribution.

Ambiguous stochastic optimization is a systematic modeling approach that bridges the gap between data and decision-making, that is, between the statistics and optimization frameworks, to protect the decision-maker from the ambiguity in the underlying probability distribution. The ambiguous stochastic optimization approach assumes that the underlying probability distribution is unknown and lies in an ambiguity set of probability distributions. As in robust optimization, this approach hedges against the ambiguity in the probability distribution by taking a worst-case approach. Scarf [271] is arguably the first to consider such an approach, obtaining an order quantity for a newsvendor problem that maximizes the worst-case expected profit, where the worst case is taken with respect to all product demand probability distributions with a known mean and variance. Since the seminal work of Scarf, and particularly in the past few years, significant research has been done on ambiguous stochastic optimization problems. This paper provides a review of the theoretical, modeling, and computational developments in this area. Moreover, we review the applications of the ambiguous stochastic optimization model that have been developed in recent years. This paper also puts DRO in the context of risk-averse optimization, chance-constrained optimization, and robust optimization.

1.1. A General DRO Model. We now formally introduce the model formulation that we discuss in this paper. Let $x \in X \subseteq \mathbb{R}^n$ be the decision vector. On a measurable space $(\Xi, \mathcal{F})$, let us define a random vector $\tilde{\xi}: \Xi \mapsto \Omega \subseteq \mathbb{R}^d$, a random cost function $h(x, \tilde{\xi}): X \times \Xi \mapsto \mathbb{R}$, and a vector of random functions $g(x, \tilde{\xi}): X \times \Xi \mapsto \mathbb{R}^m$, i.e., $g(x, \cdot) := [g_1(x, \cdot), \ldots, g_m(x, \cdot)]^\top$. Given this setup, a general stochastic optimization problem has the form
\[
\text{(SO)} \qquad \inf_{x \in X} \left\{ \mathcal{R}_P\big[h(x, \tilde{\xi})\big] \;\middle|\; \mathcal{R}_P\big[g(x, \tilde{\xi})\big] \le 0 \right\},
\]
where $P$ denotes the (known) probability measure on $(\Xi, \mathcal{F})$ and $\mathcal{R}_P: \mathcal{Z} \mapsto \mathbb{R}$ denotes a (componentwise) real-valued functional under $P$, where $\mathcal{Z}$ is a linear space of measurable functions on $(\Xi, \mathcal{F})$. The functional $\mathcal{R}_P$ quantifies the uncertainty in the outcomes of the decision for a given fixed probability measure $P$; a minimal concrete instance of (SO) is sketched below.
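As a minimal concrete instance of (SO), the sketch below fixes a discrete probability measure $P$ and takes $\mathcal{R}_P$ to be the conditional value-at-risk at level 0.9, evaluated through the Rockafellar-Uryasev representation. The choice of CVaR as the functional, the piecewise-linear cost, and all numbers are illustrative assumptions rather than the survey's prescription.

```python
import numpy as np

# A known discrete distribution P for xi (support values and probabilities) -- illustrative numbers.
xi_vals = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
probs = np.array([0.1, 0.2, 0.4, 0.2, 0.1])

c_over, c_under = 1.0, 3.0
def h(x, xi):
    # Illustrative piecewise-linear cost in the decision x.
    return np.maximum(c_over * (x - xi), c_under * (xi - x))

def cvar(costs, probs, alpha=0.9):
    # Rockafellar-Uryasev representation: CVaR_alpha = min_t { t + E[(cost - t)^+] / (1 - alpha) },
    # evaluated here by a fine grid over t (the minimizer lies within the cost range).
    ts = np.linspace(costs.min(), costs.max(), 1001)
    vals = ts + (np.maximum(costs[None, :] - ts[:, None], 0.0) @ probs) / (1.0 - alpha)
    return vals.min()

def risk_of_decision(x, alpha=0.9):
    # R_P[h(x, xi)] with R_P taken to be CVaR_alpha under the fixed measure P above.
    return cvar(h(x, xi_vals), probs, alpha)

grid = np.linspace(60.0, 140.0, 801)
x_star = grid[np.argmin([risk_of_decision(x) for x in grid])]
print(f"(SO) with R_P = CVaR_0.9: x* = {x_star:.1f}")
```

Replacing CVaR with the expectation recovers the risk-neutral problem (1.1) discussed next.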
This setup represents a broad range of problems in statistics, optimization, and control, such as regression and classification models [106, 163], simulation-optimization [107, 227], stochastic optimal control [31], Markov decision processes [246], and stochastic programming [45, 295]. As special cases of (SO), we have the classical stochastic programming problems
\[
\text{(1.1)} \qquad \inf_{x \in X} \; \mathbb{E}_P\big[h(x, \tilde{\xi})\big]
\]
and
\[
\text{(1.2)} \qquad \inf_{x \in X} \left\{ h(x) \;\middle|\; \mathbb{E}_P\big[g(x, \tilde{\xi})\big] \le 0 \right\},
\]
where $\mathcal{R}_P[\cdot]$ is taken as the expected-value functional $\mathbb{E}_P[\cdot]$. Note that by taking $h(x, \cdot) := \mathbb{1}_{A(x)}(\cdot)$ in (1.1), where $\mathbb{1}_{A(x)}(\cdot)$ denotes the indicator function of an arbitrary set $A(x) \in \mathcal{B}(\mathbb{R}^d)$ (we define the indicator function and $\mathcal{B}(\mathbb{R}^d)$ precisely in Section 2), we obtain the class of problems with a probabilistic objective function of the form $P\{\tilde{\xi} \in A(x)\}$, see, e.g., Prékopa [244]. The set $A(x)$ is called a safe region and may be of the form $a(x)^\top \tilde{\xi} \le b(x)$ or $a(\tilde{\xi})^\top x \le b(\tilde{\xi})$¹. Similarly, by taking $h(x, \tilde{\xi}) := h(x)$ and $g(x, \cdot) := [\mathbb{1}_{A_1(x)}(\cdot), \ldots, \mathbb{1}_{A_m(x)}(\cdot)]^\top$, for suitable indicator functions $\mathbb{1}_{A_j(x)}(\cdot)$, $j = 1, \ldots, m$, problem (1.2) takes the form of probabilistic (i.e., chance) constraints $P\{\tilde{\xi} \in A_j(x)\} \le 0$, $j = 1, \ldots, m$, see, e.g., Charnes et al. [68], Charnes and Cooper [67], Prékopa [243, 245], Dentcheva [86]. Note that the case where the event $\{\tilde{\xi} \in A_j(x)\}$ is formed via several constraints is called a joint chance constraint, as compared to an individual chance constraint, where the event $\{\tilde{\xi} \in A_j(x)\}$ is formed via one constraint.

A robust optimization model is defined as
\[
\text{(RO)} \qquad \inf_{x \in X} \left\{ \sup_{\xi \in U} h(x, \xi) \;\middle|\; \sup_{\xi \in U} g(x, \xi) \le 0 \right\},
\]
where $U \subseteq \mathbb{R}^d$ denotes an uncertainty set for the parameters $\tilde{\xi}$. Similar to (SO),
\[
\text{(1.3)} \qquad \inf_{x \in X} \sup_{\xi \in U} h(x, \xi)
\]
and
\[
\text{(1.4)} \qquad \inf_{x \in X} \left\{ h(x) \;\middle|\; \sup_{\xi \in U} g(x, \xi) \le 0 \right\}
\]
are two special cases of (RO). Problem (SO), as well as (1.1) and (1.2), requires knowledge of the underlying measure $P$, whereas (RO), as well as (1.3) and (1.4), ignores all distributional knowledge of $\tilde{\xi}$, except for its support. An ambiguous version of (SO) is formulated as
\[
\text{(DRO)} \qquad \inf_{x \in X} \left\{ \sup_{P \in \mathcal{P}} \mathcal{R}_P\big[h(x, \tilde{\xi})\big] \;\middle|\; \sup_{P \in \mathcal{P}} \mathcal{R}_P\big[g(x, \tilde{\xi})\big] \le 0 \right\}.
\]
Here, $\mathcal{P}$ denotes the ambiguity set of probability measures, i.e., a family of measures consistent with the prior knowledge about the uncertainty. Note that if we consider the measurable space $(\Omega, \mathcal{B})$, where $\mathcal{B}$ denotes the Borel $\sigma$-field on $\Omega$, i.e., $\mathcal{B} = \Omega \cap \mathcal{B}(\mathbb{R}^d)$, then $\mathcal{P}$ can be viewed as an ambiguity set of probability distributions $P$ defined on $(\Omega, \mathcal{B})$ and induced by $\tilde{\xi}$². As discussed before, (DRO) finds a decision that minimizes the worst case of the functional $\mathcal{R}$ of the cost $h$ among all probability measures in the ambiguity set $\mathcal{P}$.

¹ We say a safe region of the form $a(x)^\top \tilde{\xi} \le b(x)$ is bi-affine in $x$ and $\xi$ if $a(x)$ and $b(x)$ are both affine in $x$. Similarly, we say a safe region of the form $a(\tilde{\xi})^\top x \le b(\tilde{\xi})$ is bi-affine in $x$ and $\xi$ if $a(\tilde{\xi})$ and $b(\tilde{\xi})$ are both affine in $\tilde{\xi}$.
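To make (DRO) concrete in the simplest setting, the sketch below takes $\mathcal{R}_P$ to be the expectation, reuses the discrete support and cost from the sketch above, and lets the ambiguity set $\mathcal{P}$ consist of three candidate probability vectors; the worst-case expected cost is then minimized by a brute-force search over a one-dimensional decision. The finite ambiguity set and all numerical values are illustrative assumptions made here, not a construction prescribed by the survey.

```python
import numpy as np

# Finite support for xi and an illustrative piecewise-linear cost h(x, xi).
xi_vals = np.array([80.0, 90.0, 100.0, 110.0, 120.0])
c_over, c_under = 1.0, 3.0
def h(x, xi):
    return np.maximum(c_over * (x - xi), c_under * (xi - x))

# A finite ambiguity set: a few candidate probability vectors over the support.
ambiguity_set = [
    np.array([0.2, 0.2, 0.2, 0.2, 0.2]),  # uniform
    np.array([0.4, 0.3, 0.2, 0.1, 0.0]),  # low-demand-heavy
    np.array([0.0, 0.1, 0.2, 0.3, 0.4]),  # high-demand-heavy
]

def worst_case_expected_cost(x):
    # sup over P in the ambiguity set of E_P[h(x, xi)]
    return max(float(p @ h(x, xi_vals)) for p in ambiguity_set)

# (DRO) with R_P = E_P and no constraint functions g: minimize the worst-case expectation over x.
grid = np.linspace(60.0, 140.0, 801)
values = np.array([worst_case_expected_cost(x) for x in grid])
x_star = grid[np.argmin(values)]
print(f"DRO decision x*={x_star:.1f}, worst-case expected cost={values.min():.2f}")
```

With a cost that is convex in $x$ and a finite ambiguity set, the same problem can be rewritten as a convex program with an epigraph variable bounding each candidate expectation; for structured ambiguity sets, such as Scarf's mean-and-variance set mentioned in the introduction, the inner supremum is instead handled analytically or through duality.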