Distributionally robust inventory control when demand is a martingale
Linwei Xin University of Chicago, [email protected], http://faculty.chicagobooth.edu/linwei.xin/ David A. Goldberg Cornell University, [email protected], https://people.orie.cornell.edu/dag369/
Demand forecasting plays an important role in many inventory control problems. To mitigate the potential harms of model misspecification in this context, various forms of distributionally robust optimization have been applied. Although many of these methodologies suffer from the problem of time-inconsistency, the work of Klabjan, Simchi-Levi and Song [85] established a general time-consistent framework for such problems by connecting to the literature on robust Markov decision processes. Motivated by the fact that many forecasting models exhibit very special structure, as well as a desire to understand the impact of positing different dependency structures, in this paper we formulate and solve a time-consistent distributionally robust multi-stage newsvendor model which naturally unifies and robustifies several inventory models with demand forecasting. In particular, many simple models of demand forecasting have the feature that demand evolves as a martingale (i.e. expected demand tomorrow equals realized demand today). We consider a robust variant of such models, in which the sequence of future demands may be any martingale with given mean and support. Under such a model, past realizations of demand are naturally incorporated into the structure of the uncertainty set going forwards. We explicitly compute the minimax optimal policy (and worst-case distribution) in closed form, by com- bining ideas from convex analysis, probability, and dynamic programming. We prove that at optimality the worst-case demand distribution corresponds to the setting in which inventory may become obsolete at a random time, a scenario of practical interest. To gain further insight, we prove weak convergence (as the time horizon grows large) to a simple and intuitive process. We also compare to the analogous setting in which demand is independent across periods (analyzed previously in Shapiro [119]), and identify interest- ing differences between these models, in the spirit of the price of correlations studied in Agrawal et al. [2]. Finally, we complement our theoretical results by providing a targeted and concise numerical experiment further demonstrating the benefits of our model. Key words : inventory control, distributionally robust optimization, martingale, dynamic programming, robust Markov decision process, demand forecasting
1. Introduction and literature review 1.1. Introduction The fundamental problem of managing an inventory over time in the presence of stochastic demand is one of the core problems of Operations Research, and related models have been studied since at least the seminal work of Edgeworth [48]. In many practical settings of interest, demands are correlated over time (cf. Scarf [114, 115], Iglehart and Karlin [77]). As a result, there is a vast lit- erature investigating inventory models with correlated demand, including: studies of the so-called bull-whip effect (cf. Ryan [112], Chen et al. [36], Lee et al. [87]); models with Markov-modulated demand (cf. Feldman [52], Iglehart and Karlin [77], Kalymon [83]); models with forecasting, includ- ing models in which demand follows an auto-regressive / moving average (ARMA) or exponentially smoothed process (cf. Johnson and Thompson [82], Lovejoy [89], Blinder [28], Pindyck [104], Miller [96], Badinelli [6], Gallego and Ozer [58], Levi et al. [88], Chao [35], Boute, Disney, and Van Mieghem [31]); and models obeying the Martingale Model of Forecast Evolution (MMFE) and its
1 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 2 many generalizations (cf. Heath and Jackson [71], Graves et al. [65], Toktay and Wein [127], Erkip et al. [50], Iida and Zipkin [78], Lu, Song and Regan [90], Milner and Kouvelis [97]). Although several of these works offer insights into the qualitative impact of correlations on the optimal policy (and associated costs) when managing an inventory over time, these results are typically proven under very particular distributional assumptions, which assume perfect knowledge of all relevant distributions. This is potentially a significant problem, since various authors have previously noted that model misspecification when demand is correlated can lead to very sub-optimal policies (cf. Badinelli [6]). Indeed, the use of such time series and forecasting models in Operations Research practice is well-documented, and concerns over the practical impact of model misspecification have been raised repeatedly in the forecasting and Operations Research literature (cf. Fildes [53], Fildes et al. [54]). One approach taken in the literature to correcting for such model uncertainty is so-called dis- tributionally robust optimization. In this framework, one assumes that the joint distribution (over time) of the sequence of future demands belongs to some set of joint distributions, and solves the minimax problem of computing the control policy which is optimal against a worst-case distri- bution belonging to this set. Such a distributionally robust approach is motivated by the reality that perfect knowledge of the exact distribution of a given random process is rarely available (cf. Dupacov´a[46, 47], Pr´ekopa [106], Z´aˇckov´a[ˇ 138]). A typical minimax formulation is as below:
min max Q [f(x, ξ)] , x∈X Q∈M E where X is a set of decisions and M is a set of probability measures. The objective is to pick a decision x that minimizes the average cost of f under a worst-case distribution. The application of such distributionally robust approaches to the class of inventory control prob- lems was pioneered in Scarf [113], where it was assumed that M contains all probability measures whose associated distributions satisfy certain moment constraints. Such an approach has been taken to many variants of the single-stage model since then (cf. Gallego and Moon [60], Gallego [57], Perakis and Roels [102], Gallego and Moon [61], Yue et al. [137], Hanasusanto et al. [68], Zhu et al. [140]), and the single-stage distributionally robust model is quite broadly understood. The analogous questions become more subtle in the multi-stage setting, due to questions regard- ing the specification of uncertainty in the underlying joint distribution. There have been two approaches taken in the literature, depending on whether the underlying optimization model is static or dynamic in nature. In a static formulation, one specifies a family of joint distributions for demand over time, typically by e.g. fixing various moments and supports, or positing that the distribution belongs to some appropriately defined ball, and then solves an associated global min- imax optimization problem (cf. Delage and Ye [43], See and Sim [117], Popescu [105], Bertsimas et al. [18], Dupacov´a[46, 47], Bertsimas, Gupta and Kallus [20]). Such static formulations gener- ally cannot be decomposed and solved by dynamic programming (DP), because the distributional constraints do not contain sufficient information about how the distribution behaves under con- ditioning. Put another way, such models generally do not allow for the incorporation of realized demand information into model uncertainty / robustness going forwards (i.e. re-optimization in real time), and are generally referred to as time-inconsistent in the literature (cf. Ahmed, Cakmak and Shapiro [4], Iancu, Petrik and Subramanian [74], Xin, Goldberg and Shapiro [136], Epstein and Schneider [49]). This inability to incorporate realized demand information may make such approaches undesirable for analyzing models which explicitly consider the forecasting of future demand (cf. Klabjan, Simchi-Levi and Song [85], Iancu, Petrik and Subramanian [74], Shapiro [118]). We note that although some of these static formulations have been able to model settings in which information is revealed over time and/or temporal dependencies manifest through time series (cf. See and Sim [117], Lam [86], Gupta [67]), the fundamental inability to incorporate real- ized demand (in the sense of time in-consistency) remains. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 3
Alternatively, in a dynamic formulation, the underlying family of potential joint distributions must admit a certain decomposition which ensures a desirable behavior under conditioning, and a resolution by DP. The existence of such a decomposition is generally referred to as the rect- angularity property (cf. Iyengar [80], Iancu, Petrik and Subramanian [74], Epstein and Schneider
[49], Shapiro [120]). For example, the set of all joint distributions for the vector of demands (D1,D2) such that (s.t.): E[D1] = 1, E[D2|D1] = D1, and (D1,D2) has support on the non-negative integers is rectangular, since the feasible set of joint distributions may be decomposed as follows. To each possible realized value d for D1 (i.e. each non-negative integer), we may associate a fixed collection S(d) of possible conditional distributions for D2 (i.e. those distributions with mean d and support on the non-negative integers). Furthermore, every feasible distribution for the vector (D1,D2) may be constructed by first selecting a feasible distribution D for D1 (i.e. any distribution with mean one and support on the non-negative integers), and for each element d in the support of D, setting the conditional distribution of D2 (given {D1 = d}) to be any fixed element of S(d). Moreover, the set of joint distributions constructible in this manner is precisely the set of feasible distribu- tions for the vector (D1,D2). Alternatively, if we had instead required that E[D1] = E[D2] = 1, and E[D1D2] = 2 (with the same support constraint), the corresponding set of feasible joint distributions would not be rectangular, as it may be verified that such a decomposition is no longer possible. For a formal definition and more complete / precise description of the rectangularity property, we refer the reader to cf. Iyengar [80], Iancu, Petrik and Subramanian [74], Epstein and Schneider [49], Xin, Goldberg and Shapiro [136], Nilim and El Ghaoui [99], Bielecki et al. [14], Shapiro [120], and note that since various communities have studied several closely related notions at different times, a complete consensus on a common rigorous definition has not yet been reached. As dis- cussed in Nilim and El Ghaoui [99], we note that such dynamic formulations are closely related to the theory of stochastic games (cf. Altman et al. [5]), and have also been studied within the mathematical finance community (cf. Bayraktar et al. [11], Bayraktar and Yao [10], Glasserman and Xu [63], Cheng and Riedel [38], F¨ollmerand Leukert [55], Nishimura and Ozaki [100], Riedel [109], Trojanowska and Kort [128]). There are several works which formulate DP approaches to distributionally robust inventory models (cf. Gallego [59], Ahmed, Cakmak and Shapiro [4], Choi and Ruszczynski [39], Shapiro [119], Xin, Goldberg and Shapiro [136]). More generally, such dynamic problems can typically be formulated as so-called robust Markov decision processes (MDP) (cf. Iyengar [80], Nilim and El Ghaoui [99], Wiesemann, Kuhn and Rustem [135]), akin to robust formulations considered previ- ously within the control community (cf. Hansen and Sargent [69]). To the best of our knowledge, the only such work which considers applications to correlated demands and forecasting models within Operations Research is that of Klabjan, Simchi-Levi and Song [85]. In that work, the authors make the connection between dynamic distributionally robust inventory models and robust MDP, in a very general sense, and prove e.g. the optimality of state-dependent (s, S) policies, where the state incorporates the entire history of demand. The authors also consider a more specialized model in which the demand in period t may come from any distribution within a certain ball of the empirical histogram of all past demands. We note that many simple forecasting models common in the Operations Research and statistics literature (e.g. linear time series) have considerable special structure, and to the best of our knowledge precisely how this structure would manifest within the general framework of Klabjan, Simchi-Levi and Song [85] (and associated general framework of robust MDP) remains an open question. On a related note, to the best of our knowledge, there seems to have been no systematic study of the impact of positing different joint dependency structures in such multi-stage distributionally robust inventory control problems, e.g. comparing the properties and cost of a minimax optimal policy. The quest to develop such an understanding in the broader context of stochastic optimiza- tion (not specifically inventory control) was initiated in Agrawal et al. [2], where the authors define Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 4 the so-called price of correlations as the ratio between the optimal minimax value when all asso- ciated random variables (r.v.) are independent, and the setting where they may take any joint distribution belonging to an allowed family. Although the authors do not look specifically at any inventory problems, they stress the general importance of understanding how positing different joint distribution uncertainty impacts the underlying stochastic optimization. Also, although sev- eral authors have demonstrated that the family of non-stationary history-dependent s-S policies is optimal for a broad class of minimax inventory problems (cf. Ahmed, Cakmak and Shapiro [4], Not- zon [101], Klabjan, Simchi-Levi and Song [85]), the impact of the posited dependency structure on the corresponding s-S policy remains poorly understood. We note that although these questions remain open, the general theory of sensitivity analysis for (distributionally robust) optimization problems is very well developed (cf. Bonans and Shapiro [29]), and that different yet related ques- tions are the subject of several recent investigations (cf. Gupta [67]). Combining the above, we are led to the following question.
Question 1. Can we construct effective dynamic distributionally robust variants of the struc- tured time series and forecasting models common in the Operations Research and statistics lit- erature? Furthermore, can we develop a theory of how positing different dependency structures impacts the optimal policy and cost for such models?
1.2. Additional literature review Highly relevant to any discussion of distributionally robust optimization (either static or dynamic) is the vast literature on classical (i.e. deterministic) robust optimization, in which the only con- straints made are on the supports of the associated r.v.s (cf. Ben-Tal, El Ghaoui and Nemirovski [15], Bertsimas, Brown and Caramanis [19]). In this setting, the worst-case distribution is always degenerate, namely a point-mass on a particular worst-case trajectory. Such models often lead to tractable global optimization problems for fairly complex models, and have been successfully applied to several inventory settings (cf. Kasugai and Kasegai [84], Aharon et al. [3], Ben-Tal et al. [17], Bertsimas, Iancu and Parrilo [22], Bertsimas and Thiele [25], Ben-Tal et al. [16], Carrizosaa et al. [33], Solyali et al. [122], Sun and Van Mieghem [126], Mamani, Nassiri and Wagner [92]). Indeed, these models tend to be more tractable than corresponding distributionally robust formu- lations, where the question of tractability (under both static and dynamic formulations) may be more sensitive to the particular modeling assumptions (cf. See and Sim [117], Bertsimas, Sim, and Zhang [24], Wiesemann, Kuhn and Rustem [135], Delage and Ye [43]). In spite of their potential computational advantages, one drawback of such classical robust approaches is that they may be overly conservative, and unable to capture the stochastic nature of many real-world problems (cf. See and Sim [117]). We note that the precise relationship between classical robust and distribu- tionally robust approaches remains an intriguing open question. On a related note, questions of robust time series models and their applications to inventory models, similar in spirit to those we will consider in the present work, were recently studied in Carrizosaa et al. [33], at least for single-stage models. There the authors formulated a general notion of robust time series, and prove that the corresponding single-stage minimax newsvendor problem can be formulated as a convex optimization problem by using a budget-of-uncertainty approach similar to that studied in [7]. Such an approach has also been used to study robust versions of multi-period inventory models in which demand is correlated (cf. [92]). As regards connections to our earlier discussion of static and dynamic formulations in the distributionally robust setting, we note that the potential modeling benefits of being able to dynamically update one’s uncertainty sets has also been considered in the deterministic robust setting (cf. Bertsimas and Vayanos [26], Wagner [132]). Indeed, an approach to robust multi-period inventory problems based on deterministic robust optimization / the budget- of-uncertainty approach and optimal control was recently presented in [132], where we note that Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 5 their approach and findings are in a sense incomparable to our own. Also relevant to questions involving distributionally robust optimization is the search for “opti- mal” probability bounds. In particular, some of the most celebrated results in probability theory are of the form “any random vector with property A has probability at most p of belonging to set S” for appropriate choices of A,p,S. That obtaining tight versions of these bounds can be phrased as an appropriate distributionally robust optimization problem is by now well-known within the Operations Research community, especially since this connection was clarified in the seminal work of Bertsimas and Popescu [23]. Such problems, e.g. those considered in Bertsimas and Popescu [23], typically involve formulating a static optimization problem in which one specifies knowledge or bounds on certain moments, covariances, supports, etc., and frames finding a “worst-case” distri- bution satisfying these conditions as a convex optimization problem. Recent work of Beigblock and Nutz [12] has brought to light the fact that certain probability inequalities, specifically well-known inequalities for martingales such as the celebrated Burkholder inequality, can be similarly formu- lated using distributionally robust optimization, but are more amenable to a dynamic formulation, due to the fundamentally conditional structure of the martingale property. Another related line of work is that on so-called optimal transport problems, especially that subset focusing on so-called optimal martingale transport, and we refer the interested reader to Dolinsky and Soner [44] and Biegblock et al. [13] for details and additional references. We close this overview by noting that there is a vast literature on time series in the statistics community (cf. Box, Jenkins, and Reinsel [30]), and we make no attempt to survey that litera- ture here. These investigations include considerable work on robust approaches (cf. Maronna et al. [94]), where different lines of research use different notions of robustness (cf. Stockinger and Dutter [125]). Most relevant to our own investigations are the many such works related to distributional misspecification, which has been investigated within the statistics community for over 50 years (cf. Tukey [129], Huber [73], Stockinger and Dutter [125], Medina and Ronchetti [95]), and includes the analysis of minimax notions of robustness (cf. Franke [56], Luz and Moklyachuk [91]) and so-called robust Kalman filters (cf. Ruckdeschel [111]). In spite of this vast literature, it is worth noting that even recently, there have been several calls for the development of more tools and methodologies for the robust analysis of time series data. For example, in De Gooijer and R. Hyndman [42], it is noted that a persistent problem in the application of time series models is “the frequent misuse of methods based on linear models with Gaussian i.i.d. distributed errors.” A similar commentary was given very recently in Guerrier et al. [66], in which the authors note that “the robust estimation of time series parameters is still a widely open topic in statistics for various reasons.” Such calls for the development of new methodologies further serve to under- score the potential impact of combining ideas from robust optimization with time series analysis, e.g. through Question1.
1.3. Our contributions In this paper we take a step towards answering Question1 by formulating and solving a dynamic distributionally robust multi-stage newsvendor model which naturally unifies the analysis of several inventory models with demand forecasting. In particular, we consider a dynamic distributionally robust multi-stage inventory control problem (with backlogged demand) in which the sequence of future demands may be any martingale with given mean and support (assumed the same in every period). Note that the martingale demand assumption has been widely used in many demand forecasting models in the literature, including the additive-MMFE in which the demand process is given by Dt − Dt−1 = t where {t}t≥1 is a sequence of i.i.d. r.v.s with mean zero; and the Dt multiplicative-MMFE in which the demand process is given by = exp (ηt) where {ηt}t≥1 is a Dt−1 sequence of i.i.d. r.v.s with E [exp (η1)] = 1. Here we refer the interested reader to Wang, Atasu and Kurtulus [133] for further discussion of such models and their study in the inventory literature. Our Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 6 contributions are four-fold. First, we explicitly compute the minimax optimal policy (and associated worst-case distribution) in closed form. Our main proof technique involves a non-trivial induction, combining ideas from convex analysis, probability, and DP. Second, we prove that at optimality the worst-case demand distribution corresponds to the setting in which the inventory may become obsolete at a random time, a scenario of practical interest which has been studied previously in the literature (cf. Pierskalla [103], Brown [32], Joglekar and Lee [81], Song and Zipkin [124], Cerag [34]). Third, to gain further insight into our explicit solution, we prove weak convergence (as the time horizon grows large) to a simple and intuitive process. Fourth, in the spirit of Agrawal et al. [2], we compare to the analogous setting in which demand is independent across periods (analyzed previously in Shapiro [119]), and identify interesting differences between these two models. In particular, we prove that when the initial inventory level is zero, the minimax optimal cost is maximized under the dependency structure of the independent-demand model. More precisely, the minimax optimal cost when nature is restricted to selecting a joint distribution (for demand) under which demand is independent over time (and satisfies the given support constraints) is equivalent to the minimax optimal cost when nature need only satisfy the support constraints (and may select any joint distribution whatsoever). Furthermore, we compute the limiting ratio (as the time horizon grows large) of the minimax optimal value under the martingale-demand model to that under the independent-demand model, which has a simple closed form and evaluates to exactly 1 2 in the perfectly symmetric case. Interestingly, we find that under different initial conditions, the situation may in fact be reversed, showing that such comparative statements may be quite subtle. Finally, we complement our theoretical results by providing a targeted and concise numerical experiment further demonstrating the benefits of our model. Indeed, our experiments find that when the true demand distribution has the dependency structure of an additive-MMFE model, the robust minimax-optimal policy we devise in this work significantly outperforms the previously studied robust policy which is minimax-optimal under an independence assumption.
1.4. Outline of paper Our paper proceeds as follows. We begin by reviewing the model with independent demand (ana- lyzed in Shapiro [119]) in Section 2.1. We then introduce our distributionally robust model with martingale dependency structure in Section 2.2, and prove the validity of a certain DP formulation (very similar to the robust MDP framework of Iyengar [80]). We present our main results in Sec- tion 2.3, which include the explicit solution to our distributionally robust martingale model, the weak convergence, and a comparison between our martingale-demand model and the independent- demand model. The proofs of our main results regarding the explicit solution of the associated DP and several related structural properties are given in Section3. In Section4 we prove our weak convergence results. In Section5, we conduct a targeted and concise numerical experiment. We provide a summary of our results, directions for future research, and concluding remarks in Section6. We also provide a technical appendix, which contains the proofs of several results used throughout the paper, as well as some additional supporting results, in Section7.
2. Main results Before stating our main results, we briefly review the inventory model analyzed in Shapiro [119], which we refer to as the independent-demand model.
2.1. Independent-demand model Consider the following distributionally robust inventory control problem with backlogging, finite time horizon T , strictly positive linear backorder per-unit cost b, and a per-unit holding cost of 1, where we note that assuming a holding cost equal to 1 is without loss of generality (w.l.o.g.) due to a simple scaling argument. Let Dt be the demand in period t, and xt be the inventory level Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 7 at period t after placing a non-negative order, t ∈ [1,T ], where we note that the order must be placed at the start of period t before Dt is known. For a t-dimensional vector x (potentially of real numbers, r.v.s, functions, etc.), let x[s] = (x1, . . . , xs), and x[t1,t2] = (xt1 , . . . , xt2 ). In that case, we require that xt is a measurable function of D[t−1], xt ≥ xt−1 − Dt−1 for t ∈ [2,T ], and that x1 is a real constant, where x0 ∈ [0,U] denotes the initial inventory level and x1 ≥ x0. For t ∈ [1,T ], the cost incurred in period t equals
∆ Ct = b[Dt − xt]+ + [xt − Dt]+,
∆ ∆ where z+ = max(z, 0), z− = max(−z, 0), z ∈ R. As a notational convenience, we also define ∆ C(x, d) = b(d − x)+ + (x − d)+. As in Iyengar [80], to avoid later possible concerns about measurability, throughout we make the minor technical assumption that all relevant demand distributions have rational support, and all √policies π considered are restricted to order rational quantities (i.e. not irrational quantities such as 2), and the relevant problem primitives x0, µ, U, b (corresponding to initial inventory level, mean demand, bound on support of demand, and backorder penalty respectively) also belong to Q+, with + x0, µ ∈ [0,U], where Q is the set of all non-negative rational numbers. Under this assumption, it may be easily verified (due to the finiteness and non-negativity of all relevant cost functions) that all relevant functions arising in the various DP which we shall later define are sufficiently regular, e.g. measurable, integrable, and finite-valued. For clarity of exposition, and in light of this assumption, we do not dwell further on related questions of measurability, and instead refer the interested reader to the excellent reference Puterman [107] for further discussion. We now formally and more precisely define the relevant notion of admissible policy in our setting. As our analysis will be restricted to the setting in which there is a uniform upper bound on the support of demand (denoted U), for simplicity we will incorporate U directly into several of our ∆ T definitions. Let [0,U]Q = [0,U] Q, where Q is the set of all rational numbers. Then we define an π π admissible policy π to be a T -dimensional vector of functions {xt , t ∈ [1,T ]}, s.t. x1 is a constant π π ∆ π ∆ π t−1 satisfying x1 ≥ x0 = y1 = x0; and for t ∈ [2,T ], xt is a deterministic map from [0,U]Q to Q, s.t. for t π π ∆ π t ∈ [1,T − 1] and d = (d1, . . . , dt) ∈ [0,U]Q, xt+1(d) ≥ yt+1(d) = xt (d[t−1]) − dt, which is equivalent to requiring that a non-negative amount of inventory is ordered in each period. Sometimes as a π π π notational convenience, we will denote x1 by x1 (d[0]), or more generally as x1 (q) for q ∈ [0,U]Q. Let Πˆ denote the family of all such admissible policies. Note that once a particular policy π ∈ Πˆ π is specified, the associated costs {Ct , t ∈ [1,T ]} are explicit functions of the vector of demands, π π and we sometimes make this dependence explicit with the notation Ct (D[t]), where Ct (D[t]) = π C(xt (D[t−1]),Dt). Since we will exclusively consider settings in which the demand in every period belongs to [0,U], and the initial inventory belongs to [0,U], it will be convenient to prove that in such a setting one may w.l.o.g. restrict to policies which always order up to a level in [0,U]. In particular, let ˆ π Π denote that subset of Π consisting of policies π s.t. xt (d[t−1]) ∈ [0,U] for all t ∈ [1,T ] and T d ∈ [0,U]Q whenever x0 ∈ [0,U]Q (which again is to be assumed throughout). Then the following lemma provides the desired result, where we defer the relevant proof to the Technical Appendix in Section7. ˆ T Lemma 1. For every πˆ ∈ Π, there exists π ∈ Π s.t. for all d = (d1, . . . , dT ) ∈ [0,U]Q and t ∈ [1,T ], π πˆ Ct (d[t]) ≤ Ct (d[t]). Let us say that a probability measure Q on an appropriate measurable space is rationally sup- ported if it places probability 1 on t-dimensional rational vectors for some finite t, where we note that all measures which we consider will have this property. For such a measure Q, let supp(Q) (i.e. the support of Q) denote the (possibly infinite) set of rational vectors which Q assigns strictly Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 8 positive probability, and DQ denote a r.v. (or vector if supp(Q) is not one-dimensional) distributed as Q. For µ ∈ [0,U]Q, let M(µ) be the collection of all rationally supported probability measures Q Q s.t. supp(Q) ⊆ [0,U]Q, and E[D ] = µ. Furthermore, let INDT denote the collection of all T - dimensional product measures s.t. all T marginal distributions belong to M(µ). When there is no ambiguity, we will suppress the dependence on T , simply writing IND. In words, the joint dis- tribution of demand belongs to IND iff the demand is independent across time periods, and the demand in each period has support a subset of [0,U]Q and mean µ. Then the particular optimiza- PT π tion problem considered in Shapiro [119] Example 5.1.2 is infπ∈Πˆ supQ∈IND EQ[ t=1 Ct ], which by Lemma1 is equivalent to
T X π inf sup EQ[ Ct ]. (2.1) π∈Π Q∈IND t=1
We note that although the exact problem considered in Shapiro [119] does not explicitly restrict the family of policies and demand distributions to Q, it may be easily verified that the same results go through in this setting (under our assumptions on the rationality of the relevant problem parame- ters). In Example 5.1.2 of Shapiro [119], the author proves that due to certain structural properties, Problem (2.1) can be reformulated as a DP, as we now describe. We note that although the family of measures IND does not possess the so-called rectangularity property, here the decomposition follows from the fact that convexity and independence imply that irregardless of the inventory level and past realizations of demand, a worst-case demand distribution always has support {0,U}, where the precise distribution is then dictated by the requirement of having expectation µ. To formally define the relevant DP, we now define a sequence of functions {V t, t ≥ 1}, {f t, t ≥ 1}, {gt, t ≥ 1}. In general, such a DP would be phrased in terms of the so-called “cost-to-go” functions, with the t-th such function representing the remaining cost incurred by an optimal policy during periods T − t + 1 through T (i.e., if there are t periods to go), subject to the given state at time t. Here and throughout, we use the fact that the backorder and holding costs are the same in every period to simplify the relevant concepts and notations. In particular, due to this symmetry and the associated self-reducibility which it induces, for all T ≥ 1, it will suffice to define the aforementioned cost-to-go function for the first time period only. Indeed, in the following function definitions, V T (y, µ) will coincide with the optimal value of Problem (2.1) when the initial inventory level is y and the associated mean is µ (we leave the dependence on U implicit), where f T , gT have analogous interpretations. We note that to prevent later potential confusion and ambiguity regarding the order of operations such as supremum and expectation, we explicitly write out certain expectations as relevant sums over appropriate domains. Then the relevant DP is as follows.
1 ∆ 1 ∆ X 1 f (x1, d1) = C(x1, d1) , g (x1, µ) = sup f (x1, q1)Q1(q1), Q1∈M(µ) q1∈[0,U]Q 1 ∆ 1 1 ∆ X 1 V (y1, µ) = inf g (z1, µ) ,Q (x1, µ) = arg max f (x1, q1)Q1(q1), z ≥y 1 1 Q1∈M(µ) (2.2) z1∈[0,U]Q q1∈[0,U]Q 1 ∆ 1 Φ (y1, µ) = arg min g (z1, µ); z1≥y1 z1∈[0,U]Q Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 9 and for T ≥ 2,
T ∆ T −1 T ∆ X T f (xT , dT ) = C(xT , dT ) + V (xT − dT , µ) , g (xT , µ) = sup f (xT , qT )QT (qT ), QT ∈M(µ) qT ∈[0,U]Q T ∆ T T ∆ X T V (yT , µ) = inf g (zT , µ) ,Q (xT , µ) = arg max f (xT , qT )QT (qT ), z ≥y T T QT ∈M(µ) zT ∈[0,U]Q qT ∈[0,U]Q T ∆ T Φ (yT , µ) = arg min g (zT , µ). zT ≥yT zT ∈[0,U]Q (2.3) We have given the argument of the relevant functions with superscript T the subscript T (e.g. T xT , dT are the arguments for f ) to prevent any possible ambiguities when expanding the relevant recursions, although this is of course not strictly necessary. We do note that conceptually, if the length of the time horizon is fixed to T , then the functions with superscript T (e.g. V T ) actually correspond to the cost-to-go in period 1 (not period T ), and the corresponding arguments (e.g. xT , dT ) correspond to the inventory level and demand in period 1. More generally, for s ∈ [0,T − 1], V T −s will correspond to the cost-to-go in period s + 1, and the corresponding arguments (e.g. xT −s, dT −s) correspond to the inventory level and demand in period s + 1. We have chosen our notational scheme to maximize clarity of exposition, and again note that several of the resulting simplifications do rely on the symmetries / time-homogeneities of the specific problem considered.
2 To illustrate how the relevant functions evolve, note that V (y2, µ) equals X 1 inf sup C(z2, q2) + V (z2 − q2, µ) Q2(q2), z ≥y 2 2 Q2∈M(µ) z2∈[0,U]Q q2∈[0,U]Q which itself equals X X inf sup C(z2, q2) + inf sup C(z1, q1)Q1(q1) Q2(q2). z ≥y z ≥z −q 2 2 Q2∈M(µ) 1 2 2 Q1∈M(µ) z2∈[0,U]Q q2∈[0,U]Q z1∈[0,U]Q q1∈[0,U]Q
We now review the results of Lemma 5.1 and Example 5.1.2 of Shapiro [119], which characterizes the optimal policy, value, and worst-case distribution (against an optimal policy) for Problem 2.1, and relates the solution to DP formulation (2.2)-(2.3). Recall that a policy π ∈ Π is said to be a ∗ π π ∗ base-stock policy if there exist constants {xt , t ∈ [1,T ]}, s.t. xt = max {yt , xt } for all t ∈ [1,T ]. Let us define ( U ∆ 0 if µ ≤ b+1 , χIND(µ, U, b) = U U if µ > b+1 ; ( U T ∆ T bµ if µ ≤ b+1 , OptIND(µ, U, b) = U T (U − µ) if µ > b+1 . µ µ Let Dµ ∈ M(µ) be the probability measure s.t. Dµ(0) = 1 − U , Dµ(U) = U . Then the following is proven in Shapiro [119]. + Theorem 1 ([119]). For all strictly positive U, b ∈ Q ,T ≥ 1, and µ, x0 ∈ [0,U]Q, Problem 2.1 ∗ ∗ always has an optimal base-stock policy π , in which the associated base-stock constants {xt , t ∈ ∗ [1,T ]} satisfy xt = χIND(µ, U, b) for all t ∈ [1,T ]. If x0 ≤ χIND(µ, U, b), the optimal value of Prob- T lem 2.1 equals OptIND(µ, U, b); and for all x0 ∈ [0,U], the product measure s.t. all marginals are PT π∗ distributed according to law Dµ belongs to arg maxQ∈IND EQ[ t=1 Ct ]. Furthermore, the DP for- mulation (2.2)-(2.3) can be used to compute these optimal policies, distributions, and values. In T T particular, for all x0, x, µ ∈ [0,U]Q, V (x0, µ) is the optimal value of Problem 2.1, Dµ ∈ Q (x, µ), T and max x, χIND(µ, U, b) ∈ Φ (x, µ). Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 10
2.2. Martingale-demand model In this subsection, we formally define our distributionally robust martingale-demand model, and π establish some preliminary results. Let T, b, Π,Ct , M(µ) be exactly as defined for the independent- demand model in Subsection 2.1, i.e. the time horizon, backorder and holding costs, set of admissible policies, cost incurred in period t under policy π, and collection of all probability measures with support [0,U]Q and mean µ, respectively. Furthermore, let MART denote the collection of all rationally supported probability measures corresponding to discrete-time martingale sequences
(D1,...,DT ) s.t. for all t, the marginal distribution of Dt belongs to M(µ). In words, the joint distribution of demand belongs to MART iff the sequence of demands is a martingale, and the demand in each period has support a subset of [0,U]Q and mean µ. When there is no ambiguity, we will suppress the dependence on T , simply writing MAR. Note that by the martingale property, the conditional distribution of Dt, conditioned on D[t−1], belongs to M(Dt−1). Then in analogy PT π with Problem 2.1, the optimization problem of interest is infπ∈Πˆ supQ∈MAR EQ[ t=1 Ct ], which by Lemma1 is equivalent to T X π inf sup EQ[ Ct ]. (2.4) π∈Π Q∈MAR t=1 In analogy with the DP formulation (2.2)-(2.3) given for the independent-demand model, we now define an analogous formulation for the martingale-demand setting, and will later prove that this DP formulation indeed corresponds (in an appropriate sense) to Problem 2.4.
ˆ1 ∆ 1 ∆ X ˆ1 f (x1, d1) = C(x1, d1) , gˆ (x1, µ1) = sup f (x1, q1)Q1(q1), Q1∈M(µ1) q1∈[0,U]Q ˆ 1 ∆ 1 ˆ1 ∆ X ˆ1 V (y1, µ1) = inf gˆ (z1, µ1) , Q (x1, µ1) = arg max f (x1, q1)Q1(q1), z ≥y 1 1 Q1∈M(µ1) (2.5) z1∈[0,U]Q q1∈[0,U]Q ˆ 1 ∆ 1 Φ (y1, µ1) = arg min gˆ (z1, µ1); z1≥y1 z1∈[0,U]Q and for T ≥ 2,
ˆT ∆ ˆ T −1 T ∆ X ˆT f (xT , dT ) = C(xT , dT ) + V (xT − dT , dT ) , gˆ (xT , µT ) = sup f (xT , qT )QT (qT ), QT ∈M(µT ) qT ∈[0,U]Q ˆ T ∆ T ˆT ∆ X ˆT V (yT , µT ) = inf gˆ (zT , µT ) , Q (xT , µT ) = arg max f (xT , qT )QT (qT ), z ≥y T T QT ∈M(µT ) zT ∈[0,U]Q qT ∈[0,U]Q ˆ T ∆ T Φ (yT , µT ) = arg min gˆ (zT , µT ). zT ≥yT zT ∈[0,U]Q (2.6) T T −1 Note that in the definition of f (xT , dT ) in (2.2)-(2.3), the second argument in the function V ˆT is µ, which is a constant. By contrast, in the definition of f (xT , dT ) above, the second argument ˆ T −1 in the function V is dT , which represents the demand just realized in the past period and is not a constant.
ˆ 2 To again illustrate how the relevant functions evolve, note that V (y2, µ2) equals
X ˆ 1 inf sup C(z2, q2) + V (z2 − q2, q2) Q2(q2), z ≥y 2 2 Q2∈M(µ2) z2∈[0,U]Q q2∈[0,U]Q Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 11 which itself equals
X X inf sup C(z2, q2) + inf sup C(z1, q1)Q1(q1) Q2(q2). z ≥y z ≥z −q 2 2 Q2∈M(µ2) 1 2 2 Q1∈M(q2) z2∈[0,U]Q q2∈[0,U]Q z1∈[0,U]Q q1∈[0,U]Q We now prove that the DP 2.5- 2.6 indeed corresponds to Problem 2.4, and begin with several Q Q Q additional definitions. Recall that given a probability measure Q ∈ MART , D = (D1 ,...,DT ) denotes a random vector distributed as Q. Given t1 ≤ t2 ∈ [1,T ], we let Q[t1,t2] denote the probabil- Q Q ity measure induced by Q on (Dt1 ,...,Dt2 ), letting Q[t2] denote Q[1,t2], and Qt2 denote Q[t2,t2]. As a notational convenience, we define Q0 to be the probability measure which assigns µ probability one. n m Thus supp(Q0) = {µ}. Also, we sometimes let µ0 denote µ. Given two finite vectors x ∈ R , y ∈ R , n+m we let x : y ∈ R denote the concatenation of x and y. Namely, (x : y)k = xk for k ∈ [1, n], and (x : y)k = yk−n for k ∈ [n + 1, n + m]. Given Q ∈ MART , t ∈ [1,T ] and q ∈ supp(Q[t−1]), let Q[t](q:d) Qt|q denote the unique probability measure s.t. for all d ∈ [0,U]Q, Qt|q(d) = . Equivalently, Q[t−1](q) Q Q Qt|q(d) = P Dt = d D[t−1] = q . Note that by the martingale property, Qt|q ∈ M(qt−1).
Before precisely stating the desired result, we will need some preliminary results regarding the DP 2.5- 2.6, which assert the non-emptiness of various sets, as well as a certain axiom-of-choice type result asserting the ability to appropriately select elements from various sets. ˆt ˆ t Lemma 2. For all t ∈ [1,T ], and x, d ∈ [0,U]Q, Q (x, d) =6 ∅, Φ (x, d) =6 ∅. Furthermore, it is t ˆt possible to assign to every t ∈ [1,T ], and x, d ∈ [0,U]Q, a measure Qx,d ∈ Q (x, d); and rational t ˆ t number Φx,d ∈ Φ (x, d). Although we temporarily defer the proof of Lemma2, as the lemma will follow from the proof t t of our main result in which we explicitly construct such Qx,d, Φx,d, suppose for now that such an t t assignment Qx,d, Φx,d has been made in an arbitrary yet fixed manner. We now use these quantities to define a corresponding policy and distribution, which will correspond to the optimal policy, and ∗ ∗ ∗ worst-case distribution at optimality, for Problem 2.4. Letπ ˆ = (ˆx1,..., xˆT ) ∈ Π denote the follow- T ing policy, which we define inductively.x ˆ∗ = Φ . Now, suppose we have completely specifiedx ˆ∗ 1 x0,µ s ∗ ∆ πˆ∗ for all s ∈ [1, t] for some t ∈ [1,T − 1], and thus have also specifiedy ˆs = ys for all s ∈ [1, t+ 1]. Then ∗ t ∗ T −t we definex ˆ as follows. For all d ∈ [0,U] ,x ˆ (d) = Φ ∗ . It may be easily verified from the t+1 Q t+1 yˆt+1(d),dt ˆ ˆ∗ definition of Φ that this defines a valid policy belonging to Π. Similarly, let Q ∈ MART denote the following probability measure, which we define inductively through its conditional distributions. T Qˆ∗ = Q . Now, suppose we have completely specified Qˆ∗ . Then we specify Qˆ∗ as follows. For 1 x0,µ [t] [t+1] ∗ ∗ T −t all d ∈ supp(Qˆ ), Qˆ = Q πˆ∗ . Due to its countable support, it may be easily verified that [t] t+1|d xt+1(d),dt ˆ∗ this uniquely and completely specifies a measure Q ∈ MART . As a notational convenience, we let ∗ ˆ ˆ ∗ ˆ ∗ ˆ∗ D = (D1 ,..., DT ) denote a vector distributed as Q .
Then we have the following theorem, in analogy with Theorem1, whose proof we defer to the Technical Appendix in Section7.
+ ∗ Theorem 2. For all strictly positive U, b ∈ Q ,T ≥ 1, and µ, x0 ∈ [0,U]Q, πˆ is an optimal policy ˆ∗ PT πˆ∗ ˆ T for Problem 2.4, and Q ∈ arg maxQ∈MAR EQ[ t=1 Ct ]. Furthermore, V (x0, µ) is the optimal value of Problem 2.4.
Note that in light of Lemma2, Theorem2 implies that computing an optimal policy (and associated worst-case distribution at optimality) for Problem 2.4 is equivalent to solving the DP 2.5- 2.6, and Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 12 computing the relevant minimizers and maximizers. Furthermore, in light of Theorem2, solving Problem 2.4 is thus reduced from a minimax problem to purely a maximization problem (of the expected performance of the policyπ ˆ∗), over a restricted family of martingales. This puts the problem within the general framework of Beigblock and Nutz [12], in which the authors develop a general methodology, based on concave envelopes, to compute optimal martingale inequalities. Indeed, our resolution of the DP 2.5- 2.6 can be viewed as an explicit execution of this methodology, which exploits the special structure ofπ ˆ∗. It is also interesting to note that Theorem2 asserts that one can w.l.o.g. restrict to policies and distributions which are Markovian when one considers the three-dimensional state-space consisting of both the post-ordering inventory level and demand in the previous period, as well as a dimension tracking the current time period, even though in principle both the policy and distribution could have exhibited a more complicated form of history dependence.
2.3. Main results We now state our main results. We begin by introducing some additional definitions and notations. Whenever possible, we will suppress dependence on the parameters µ, U, b for simplicity. As a notational convenience, let us define all empty products to equal unity, and all empty sums to equal zero. For an event A, let I(A) denote the corresponding indicator. For j ∈ [−1,T ], let
( QT −1 k T ∆ U k=j+1 b+k if j ≤ T − 1, Aj = b+T T U if j = T ; and j BT =∆ AT . j b + T j T T T T T T T Note that A−1 = B−1 = B0 = 0, AT −1 = BT = U, and both {Aj , j ∈ [−1,T ]} and {Bj , j ∈ [−1,T ]} T are increasing in j, and decreasing in T . For x, µ ∈ [0,U], let qx,µ be the unique (up to sets of measure zero) probability measure s.t.
AT −µ µ−AT qT (AT ) = j+1 , qT (AT ) = j if µ ∈ (AT ,AT ] , x ∈ [0,BT ); x,µ j AT −AT x,µ j+1 AT −AT j j+1 j+1 j+1 j j+1 j qT (0) = 1 − µ , qT (AT ) = µ if µ ∈ (AT ,AT ] , x ∈ [BT ,BT ) , k ≥ j + 1; x,µ AT x,µ k AT j j+1 k k+1 k k T µ T µ qx,µ(0) = 1 − U , qx,µ(U) = U if µ = 0 or x = U. (2.7)
Also, for j ∈ [0,T ], let T ∆ b + T Gj (x, µ) = (T − T µ)x + (T − j)bµ; Aj in which case we define ( T ∆ 0 if µ = 0, Γµ = T +1 T +1 j + 1 if µ ∈ (Aj ,Aj+1 ] , j ∈ [−1,T − 1];
T T T and note that µ ∈ (A T −1 ,A T −1 ] for all T ≥ 2 and µ ∈ (0,U], while A T −1 = 0. We also define Γµ −1 Γµ Γ0 −1
T ∆ T ∆ T T ∆ T T χ (µ, U, b) = β = B T , Opt (µ, U, b) = G T (β , µ). MAR µ Γµ MAR Γµ µ Then an explicit solution to the DP 2.5- 2.6, and thus to Problem 2.4, is given as follows. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 13