Distributionally Robust Inventory Control When Demand Is a Martingale

Distributionally robust inventory control when demand is a martingale

Linwei Xin University of Chicago, [email protected], http://faculty.chicagobooth.edu/linwei.xin/ David A. Goldberg Cornell University, [email protected], https://people.orie.cornell.edu/dag369/

Demand forecasting plays an important role in many inventory control problems. To mitigate the potential harms of model misspecification in this context, various forms of distributionally robust optimization have been applied. Although many of these methodologies suffer from the problem of time-inconsistency, the work of Klabjan, Simchi-Levi and Song [85] established a general time-consistent framework for such problems by connecting to the literature on robust Markov decision processes. Motivated by the fact that many forecasting models exhibit very special structure, as well as a desire to understand the impact of positing different dependency structures, in this paper we formulate and solve a time-consistent distributionally robust multi-stage newsvendor model which naturally unifies and robustifies several inventory models with demand forecasting. In particular, many simple models of demand forecasting have the feature that demand evolves as a martingale (i.e. expected demand tomorrow equals realized demand today). We consider a robust variant of such models, in which the sequence of future demands may be any martingale with given mean and support. Under such a model, past realizations of demand are naturally incorporated into the structure of the uncertainty set going forwards. We explicitly compute the minimax optimal policy (and worst-case distribution) in closed form, by combining ideas from convex analysis, probability, and dynamic programming. We prove that at optimality the worst-case demand distribution corresponds to the setting in which inventory may become obsolete at a random time, a scenario of practical interest. To gain further insight, we prove weak convergence (as the time horizon grows large) to a simple and intuitive process. We also compare to the analogous setting in which demand is independent across periods (analyzed previously in Shapiro [119]), and identify interesting differences between these models, in the spirit of the price of correlations studied in Agrawal et al. [2]. Finally, we complement our theoretical results by providing a targeted and concise numerical experiment further demonstrating the benefits of our model. Key words : inventory control, distributionally robust optimization, martingale, dynamic programming, robust Markov decision process, demand forecasting

1. Introduction and literature review 1.1. Introduction The fundamental problem of managing an inventory over time in the presence of stochastic demand is one of the core problems of Operations Research, and related models have been studied since at least the seminal work of Edgeworth [48]. In many practical settings of interest, demands are correlated over time (cf. Scarf [114, 115], Iglehart and Karlin [77]). As a result, there is a vast literature investigating inventory models with correlated demand, including: studies of the so-called bull-whip eﬀect (cf. Ryan [112], Chen et al. [36], Lee et al. [87]); models with Markov-modulated demand (cf. Feldman [52], Iglehart and Karlin [77], Kalymon [83]); models with forecasting, including models in which demand follows an auto-regressive / moving average (ARMA) or exponentially smoothed process (cf. Johnson and Thompson [82], Lovejoy [89], Blinder [28], Pindyck [104], Miller [96], Badinelli [6], Gallego and Ozer [58], Levi et al. [88], Chao [35], Boute, Disney, and Van Mieghem [31]); and models obeying the Martingale Model of Forecast Evolution (MMFE) and its

1 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 2 many generalizations (cf. Heath and Jackson [71], Graves et al. [65], Toktay and Wein [127], Erkip et al. [50], Iida and Zipkin [78], Lu, Song and Regan [90], Milner and Kouvelis [97]). Although several of these works offer insights into the qualitative impact of correlations on the optimal policy (and associated costs) when managing an inventory over time, these results are typically proven under very particular distributional assumptions, which assume perfect knowledge of all relevant distributions. This is potentially a significant problem, since various authors have previously noted that model misspecification when demand is correlated can lead to very sub-optimal policies (cf. Badinelli [6]). Indeed, the use of such time series and forecasting models in Operations Research practice is well-documented, and concerns over the practical impact of model misspecification have been raised repeatedly in the forecasting and Operations Research literature (cf. Fildes [53], Fildes et al. [54]). One approach taken in the literature to correcting for such model uncertainty is so-called distributionally robust optimization. In this framework, one assumes that the joint distribution (over time) of the sequence of future demands belongs to some set of joint distributions, and solves the minimax problem of computing the control policy which is optimal against a worst-case distribution belonging to this set. Such a distributionally robust approach is motivated by the reality that perfect knowledge of the exact distribution of a given random process is rarely available (cf. Dupacová[46, 47], Prékopa [106], Záˇcková[ˇ 138]). A typical minimax formulation is as below:

min max Q [f(x, ξ)] , x∈X Q∈M E where X is a set of decisions and M is a set of probability measures. The objective is to pick a decision x that minimizes the average cost of f under a worst-case distribution. The application of such distributionally robust approaches to the class of inventory control problems was pioneered in Scarf [113], where it was assumed that M contains all probability measures whose associated distributions satisfy certain moment constraints. Such an approach has been taken to many variants of the single-stage model since then (cf. Gallego and Moon [60], Gallego [57], Perakis and Roels [102], Gallego and Moon [61], Yue et al. [137], Hanasusanto et al. [68], Zhu et al. [140]), and the single-stage distributionally robust model is quite broadly understood. The analogous questions become more subtle in the multi-stage setting, due to questions regarding the specification of uncertainty in the underlying joint distribution. There have been two approaches taken in the literature, depending on whether the underlying optimization model is static or dynamic in nature. In a static formulation, one specifies a family of joint distributions for demand over time, typically by e.g. fixing various moments and supports, or positing that the distribution belongs to some appropriately defined ball, and then solves an associated global minimax optimization problem (cf. Delage and Ye [43], See and Sim [117], Popescu [105], Bertsimas et al. [18], Dupacová[46, 47], Bertsimas, Gupta and Kallus [20]). Such static formulations generally cannot be decomposed and solved by dynamic programming (DP), because the distributional constraints do not contain sufficient information about how the distribution behaves under conditioning. Put another way, such models generally do not allow for the incorporation of realized demand information into model uncertainty / robustness going forwards (i.e. re-optimization in real time), and are generally referred to as time-inconsistent in the literature (cf. Ahmed, Cakmak and Shapiro [4], Iancu, Petrik and Subramanian [74], Xin, Goldberg and Shapiro [136], Epstein and Schneider [49]). This inability to incorporate realized demand information may make such approaches undesirable for analyzing models which explicitly consider the forecasting of future demand (cf. Klabjan, Simchi-Levi and Song [85], Iancu, Petrik and Subramanian [74], Shapiro [118]). We note that although some of these static formulations have been able to model settings in which information is revealed over time and/or temporal dependencies manifest through time series (cf. See and Sim [117], Lam [86], Gupta [67]), the fundamental inability to incorporate realized demand (in the sense of time in-consistency) remains. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 3

Alternatively, in a dynamic formulation, the underlying family of potential joint distributions must admit a certain decomposition which ensures a desirable behavior under conditioning, and a resolution by DP. The existence of such a decomposition is generally referred to as the rectangularity property (cf. Iyengar [80], Iancu, Petrik and Subramanian [74], Epstein and Schneider

[49], Shapiro [120]). For example, the set of all joint distributions for the vector of demands (D1,D2) such that (s.t.): E[D1] = 1, E[D2|D1] = D1, and (D1,D2) has support on the non-negative integers is rectangular, since the feasible set of joint distributions may be decomposed as follows. To each possible realized value d for D1 (i.e. each non-negative integer), we may associate a fixed collection S(d) of possible conditional distributions for D2 (i.e. those distributions with mean d and support on the non-negative integers). Furthermore, every feasible distribution for the vector (D1,D2) may be constructed by first selecting a feasible distribution D for D1 (i.e. any distribution with mean one and support on the non-negative integers), and for each element d in the support of D, setting the conditional distribution of D2 (given {D1 = d}) to be any fixed element of S(d). Moreover, the set of joint distributions constructible in this manner is precisely the set of feasible distributions for the vector (D1,D2). Alternatively, if we had instead required that E[D1] = E[D2] = 1, and E[D1D2] = 2 (with the same support constraint), the corresponding set of feasible joint distributions would not be rectangular, as it may be verified that such a decomposition is no longer possible. For a formal definition and more complete / precise description of the rectangularity property, we refer the reader to cf. Iyengar [80], Iancu, Petrik and Subramanian [74], Epstein and Schneider [49], Xin, Goldberg and Shapiro [136], Nilim and El Ghaoui [99], Bielecki et al. [14], Shapiro [120], and note that since various communities have studied several closely related notions at different times, a complete consensus on a common rigorous definition has not yet been reached. As dis- cussed in Nilim and El Ghaoui [99], we note that such dynamic formulations are closely related to the theory of stochastic games (cf. Altman et al. [5]), and have also been studied within the mathematical finance community (cf. Bayraktar et al. [11], Bayraktar and Yao [10], Glasserman and Xu [63], Cheng and Riedel [38], Föllmerand Leukert [55], Nishimura and Ozaki [100], Riedel [109], Trojanowska and Kort [128]). There are several works which formulate DP approaches to distributionally robust inventory models (cf. Gallego [59], Ahmed, Cakmak and Shapiro [4], Choi and Ruszczynski [39], Shapiro [119], Xin, Goldberg and Shapiro [136]). More generally, such dynamic problems can typically be formulated as so-called robust Markov decision processes (MDP) (cf. Iyengar [80], Nilim and El Ghaoui [99], Wiesemann, Kuhn and Rustem [135]), akin to robust formulations considered previously within the control community (cf. Hansen and Sargent [69]). To the best of our knowledge, the only such work which considers applications to correlated demands and forecasting models within Operations Research is that of Klabjan, Simchi-Levi and Song [85]. In that work, the authors make the connection between dynamic distributionally robust inventory models and robust MDP, in a very general sense, and prove e.g. the optimality of state-dependent (s, S) policies, where the state incorporates the entire history of demand. The authors also consider a more specialized model in which the demand in period t may come from any distribution within a certain ball of the empirical histogram of all past demands. We note that many simple forecasting models common in the Operations Research and statistics literature (e.g. linear time series) have considerable special structure, and to the best of our knowledge precisely how this structure would manifest within the general framework of Klabjan, Simchi-Levi and Song [85] (and associated general framework of robust MDP) remains an open question. On a related note, to the best of our knowledge, there seems to have been no systematic study of the impact of positing different joint dependency structures in such multi-stage distributionally robust inventory control problems, e.g. comparing the properties and cost of a minimax optimal policy. The quest to develop such an understanding in the broader context of stochastic optimization (not specifically inventory control) was initiated in Agrawal et al. [2], where the authors define Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 4 the so-called price of correlations as the ratio between the optimal minimax value when all associated random variables (r.v.) are independent, and the setting where they may take any joint distribution belonging to an allowed family. Although the authors do not look specifically at any inventory problems, they stress the general importance of understanding how positing different joint distribution uncertainty impacts the underlying stochastic optimization. Also, although several authors have demonstrated that the family of non-stationary history-dependent s-S policies is optimal for a broad class of minimax inventory problems (cf. Ahmed, Cakmak and Shapiro [4], Not- zon [101], Klabjan, Simchi-Levi and Song [85]), the impact of the posited dependency structure on the corresponding s-S policy remains poorly understood. We note that although these questions remain open, the general theory of sensitivity analysis for (distributionally robust) optimization problems is very well developed (cf. Bonans and Shapiro [29]), and that different yet related questions are the subject of several recent investigations (cf. Gupta [67]). Combining the above, we are led to the following question.

Question 1. Can we construct eﬀective dynamic distributionally robust variants of the structured time series and forecasting models common in the Operations Research and statistics literature? Furthermore, can we develop a theory of how positing diﬀerent dependency structures impacts the optimal policy and cost for such models?

1.2. Additional literature review Highly relevant to any discussion of distributionally robust optimization (either static or dynamic) is the vast literature on classical (i.e. deterministic) robust optimization, in which the only constraints made are on the supports of the associated r.v.s (cf. Ben-Tal, El Ghaoui and Nemirovski [15], Bertsimas, Brown and Caramanis [19]). In this setting, the worst-case distribution is always degenerate, namely a point-mass on a particular worst-case trajectory. Such models often lead to tractable global optimization problems for fairly complex models, and have been successfully applied to several inventory settings (cf. Kasugai and Kasegai [84], Aharon et al. [3], Ben-Tal et al. [17], Bertsimas, Iancu and Parrilo [22], Bertsimas and Thiele [25], Ben-Tal et al. [16], Carrizosaa et al. [33], Solyali et al. [122], Sun and Van Mieghem [126], Mamani, Nassiri and Wagner [92]). Indeed, these models tend to be more tractable than corresponding distributionally robust formulations, where the question of tractability (under both static and dynamic formulations) may be more sensitive to the particular modeling assumptions (cf. See and Sim [117], Bertsimas, Sim, and Zhang [24], Wiesemann, Kuhn and Rustem [135], Delage and Ye [43]). In spite of their potential computational advantages, one drawback of such classical robust approaches is that they may be overly conservative, and unable to capture the stochastic nature of many real-world problems (cf. See and Sim [117]). We note that the precise relationship between classical robust and distributionally robust approaches remains an intriguing open question. On a related note, questions of robust time series models and their applications to inventory models, similar in spirit to those we will consider in the present work, were recently studied in Carrizosaa et al. [33], at least for single-stage models. There the authors formulated a general notion of robust time series, and prove that the corresponding single-stage minimax newsvendor problem can be formulated as a convex optimization problem by using a budget-of-uncertainty approach similar to that studied in [7]. Such an approach has also been used to study robust versions of multi-period inventory models in which demand is correlated (cf. [92]). As regards connections to our earlier discussion of static and dynamic formulations in the distributionally robust setting, we note that the potential modeling benefits of being able to dynamically update one’s uncertainty sets has also been considered in the deterministic robust setting (cf. Bertsimas and Vayanos [26], Wagner [132]). Indeed, an approach to robust multi-period inventory problems based on deterministic robust optimization / the budget- of-uncertainty approach and optimal control was recently presented in [132], where we note that Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 5 their approach and findings are in a sense incomparable to our own. Also relevant to questions involving distributionally robust optimization is the search for “optimal” probability bounds. In particular, some of the most celebrated results in probability theory are of the form “any random vector with property A has probability at most p of belonging to set S” for appropriate choices of A,p,S. That obtaining tight versions of these bounds can be phrased as an appropriate distributionally robust optimization problem is by now well-known within the Operations Research community, especially since this connection was clarified in the seminal work of Bertsimas and Popescu [23]. Such problems, e.g. those considered in Bertsimas and Popescu [23], typically involve formulating a static optimization problem in which one specifies knowledge or bounds on certain moments, covariances, supports, etc., and frames finding a “worst-case” distribution satisfying these conditions as a convex optimization problem. Recent work of Beigblock and Nutz [12] has brought to light the fact that certain probability inequalities, specifically well-known inequalities for martingales such as the celebrated Burkholder inequality, can be similarly formulated using distributionally robust optimization, but are more amenable to a dynamic formulation, due to the fundamentally conditional structure of the martingale property. Another related line of work is that on so-called optimal transport problems, especially that subset focusing on so-called optimal martingale transport, and we refer the interested reader to Dolinsky and Soner [44] and Biegblock et al. [13] for details and additional references. We close this overview by noting that there is a vast literature on time series in the statistics community (cf. Box, Jenkins, and Reinsel [30]), and we make no attempt to survey that literature here. These investigations include considerable work on robust approaches (cf. Maronna et al. [94]), where different lines of research use different notions of robustness (cf. Stockinger and Dutter [125]). Most relevant to our own investigations are the many such works related to distributional misspecification, which has been investigated within the statistics community for over 50 years (cf. Tukey [129], Huber [73], Stockinger and Dutter [125], Medina and Ronchetti [95]), and includes the analysis of minimax notions of robustness (cf. Franke [56], Luz and Moklyachuk [91]) and so-called robust Kalman filters (cf. Ruckdeschel [111]). In spite of this vast literature, it is worth noting that even recently, there have been several calls for the development of more tools and methodologies for the robust analysis of time series data. For example, in De Gooijer and R. Hyndman [42], it is noted that a persistent problem in the application of time series models is “the frequent misuse of methods based on linear models with Gaussian i.i.d. distributed errors.” A similar commentary was given very recently in Guerrier et al. [66], in which the authors note that “the robust estimation of time series parameters is still a widely open topic in statistics for various reasons.” Such calls for the development of new methodologies further serve to under- score the potential impact of combining ideas from robust optimization with time series analysis, e.g. through Question1.

1.3. Our contributions In this paper we take a step towards answering Question1 by formulating and solving a dynamic distributionally robust multi-stage newsvendor model which naturally unifies the analysis of several inventory models with demand forecasting. In particular, we consider a dynamic distributionally robust multi-stage inventory control problem (with backlogged demand) in which the sequence of future demands may be any martingale with given mean and support (assumed the same in every period). Note that the martingale demand assumption has been widely used in many demand forecasting models in the literature, including the additive-MMFE in which the demand process is given by Dt − Dt−1 = t where {t}t≥1 is a sequence of i.i.d. r.v.s with mean zero; and the Dt multiplicative-MMFE in which the demand process is given by = exp (ηt) where {ηt}t≥1 is a Dt−1 sequence of i.i.d. r.v.s with E [exp (η1)] = 1. Here we refer the interested reader to Wang, Atasu and Kurtulus [133] for further discussion of such models and their study in the inventory literature. Our Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 6 contributions are four-fold. First, we explicitly compute the minimax optimal policy (and associated worst-case distribution) in closed form. Our main proof technique involves a non-trivial induction, combining ideas from convex analysis, probability, and DP. Second, we prove that at optimality the worst-case demand distribution corresponds to the setting in which the inventory may become obsolete at a random time, a scenario of practical interest which has been studied previously in the literature (cf. Pierskalla [103], Brown [32], Joglekar and Lee [81], Song and Zipkin [124], Cerag [34]). Third, to gain further insight into our explicit solution, we prove weak convergence (as the time horizon grows large) to a simple and intuitive process. Fourth, in the spirit of Agrawal et al. [2], we compare to the analogous setting in which demand is independent across periods (analyzed previously in Shapiro [119]), and identify interesting differences between these two models. In particular, we prove that when the initial inventory level is zero, the minimax optimal cost is maximized under the dependency structure of the independent-demand model. More precisely, the minimax optimal cost when nature is restricted to selecting a joint distribution (for demand) under which demand is independent over time (and satisfies the given support constraints) is equivalent to the minimax optimal cost when nature need only satisfy the support constraints (and may select any joint distribution whatsoever). Furthermore, we compute the limiting ratio (as the time horizon grows large) of the minimax optimal value under the martingale-demand model to that under the independent-demand model, which has a simple closed form and evaluates to exactly 1 2 in the perfectly symmetric case. Interestingly, we find that under different initial conditions, the situation may in fact be reversed, showing that such comparative statements may be quite subtle. Finally, we complement our theoretical results by providing a targeted and concise numerical experiment further demonstrating the benefits of our model. Indeed, our experiments find that when the true demand distribution has the dependency structure of an additive-MMFE model, the robust minimax-optimal policy we devise in this work significantly outperforms the previously studied robust policy which is minimax-optimal under an independence assumption.

1.4. Outline of paper Our paper proceeds as follows. We begin by reviewing the model with independent demand (analyzed in Shapiro [119]) in Section 2.1. We then introduce our distributionally robust model with martingale dependency structure in Section 2.2, and prove the validity of a certain DP formulation (very similar to the robust MDP framework of Iyengar [80]). We present our main results in Sec- tion 2.3, which include the explicit solution to our distributionally robust martingale model, the weak convergence, and a comparison between our martingale-demand model and the independent- demand model. The proofs of our main results regarding the explicit solution of the associated DP and several related structural properties are given in Section3. In Section4 we prove our weak convergence results. In Section5, we conduct a targeted and concise numerical experiment. We provide a summary of our results, directions for future research, and concluding remarks in Section6. We also provide a technical appendix, which contains the proofs of several results used throughout the paper, as well as some additional supporting results, in Section7.

2. Main results Before stating our main results, we brieﬂy review the inventory model analyzed in Shapiro [119], which we refer to as the independent-demand model.

2.1. Independent-demand model Consider the following distributionally robust inventory control problem with backlogging, ﬁnite time horizon T , strictly positive linear backorder per-unit cost b, and a per-unit holding cost of 1, where we note that assuming a holding cost equal to 1 is without loss of generality (w.l.o.g.) due to a simple scaling argument. Let Dt be the demand in period t, and xt be the inventory level Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 7 at period t after placing a non-negative order, t ∈ [1,T ], where we note that the order must be placed at the start of period t before Dt is known. For a t-dimensional vector x (potentially of real numbers, r.v.s, functions, etc.), let x[s] = (x1, . . . , xs), and x[t1,t2] = (xt1 , . . . , xt2 ). In that case, we require that xt is a measurable function of D[t−1], xt ≥ xt−1 − Dt−1 for t ∈ [2,T ], and that x1 is a real constant, where x0 ∈ [0,U] denotes the initial inventory level and x1 ≥ x0. For t ∈ [1,T ], the cost incurred in period t equals

∆ Ct = b[Dt − xt]+ + [xt − Dt]+,

∆ ∆ where z+ = max(z, 0), z− = max(−z, 0), z ∈ R. As a notational convenience, we also define ∆ C(x, d) = b(d − x)+ + (x − d)+. As in Iyengar [80], to avoid later possible concerns about measurability, throughout we make the minor technical assumption that all relevant demand distributions have rational support, and all √policies π considered are restricted to order rational quantities (i.e. not irrational quantities such as 2), and the relevant problem primitives x0, µ, U, b (corresponding to initial inventory level, mean demand, bound on support of demand, and backorder penalty respectively) also belong to Q+, with + x0, µ ∈ [0,U], where Q is the set of all non-negative rational numbers. Under this assumption, it may be easily verified (due to the finiteness and non-negativity of all relevant cost functions) that all relevant functions arising in the various DP which we shall later define are sufficiently regular, e.g. measurable, integrable, and finite-valued. For clarity of exposition, and in light of this assumption, we do not dwell further on related questions of measurability, and instead refer the interested reader to the excellent reference Puterman [107] for further discussion. We now formally and more precisely define the relevant notion of admissible policy in our setting. As our analysis will be restricted to the setting in which there is a uniform upper bound on the support of demand (denoted U), for simplicity we will incorporate U directly into several of our ∆ T definitions. Let [0,U]Q = [0,U] Q, where Q is the set of all rational numbers. Then we define an π π admissible policy π to be a T -dimensional vector of functions {xt , t ∈ [1,T ]}, s.t. x1 is a constant π π ∆ π ∆ π t−1 satisfying x1 ≥ x0 = y1 = x0; and for t ∈ [2,T ], xt is a deterministic map from [0,U]Q to Q, s.t. for t π π ∆ π t ∈ [1,T − 1] and d = (d1, . . . , dt) ∈ [0,U]Q, xt+1(d) ≥ yt+1(d) = xt (d[t−1]) − dt, which is equivalent to requiring that a non-negative amount of inventory is ordered in each period. Sometimes as a π π π notational convenience, we will denote x1 by x1 (d[0]), or more generally as x1 (q) for q ∈ [0,U]Q. Let Πˆ denote the family of all such admissible policies. Note that once a particular policy π ∈ Πˆ π is specified, the associated costs {Ct , t ∈ [1,T ]} are explicit functions of the vector of demands, π π and we sometimes make this dependence explicit with the notation Ct (D[t]), where Ct (D[t]) = π C(xt (D[t−1]),Dt). Since we will exclusively consider settings in which the demand in every period belongs to [0,U], and the initial inventory belongs to [0,U], it will be convenient to prove that in such a setting one may w.l.o.g. restrict to policies which always order up to a level in [0,U]. In particular, let ˆ π Π denote that subset of Π consisting of policies π s.t. xt (d[t−1]) ∈ [0,U] for all t ∈ [1,T ] and T d ∈ [0,U]Q whenever x0 ∈ [0,U]Q (which again is to be assumed throughout). Then the following lemma provides the desired result, where we defer the relevant proof to the Technical Appendix in Section7. ˆ T Lemma 1. For every πˆ ∈ Π, there exists π ∈ Π s.t. for all d = (d1, . . . , dT ) ∈ [0,U]Q and t ∈ [1,T ], π πˆ Ct (d[t]) ≤ Ct (d[t]). Let us say that a probability measure Q on an appropriate measurable space is rationally supported if it places probability 1 on t-dimensional rational vectors for some finite t, where we note that all measures which we consider will have this property. For such a measure Q, let supp(Q) (i.e. the support of Q) denote the (possibly infinite) set of rational vectors which Q assigns strictly Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 8 positive probability, and DQ denote a r.v. (or vector if supp(Q) is not one-dimensional) distributed as Q. For µ ∈ [0,U]Q, let M(µ) be the collection of all rationally supported probability measures Q Q s.t. supp(Q) ⊆ [0,U]Q, and E[D ] = µ. Furthermore, let INDT denote the collection of all T - dimensional product measures s.t. all T marginal distributions belong to M(µ). When there is no ambiguity, we will suppress the dependence on T , simply writing IND. In words, the joint distribution of demand belongs to IND iff the demand is independent across time periods, and the demand in each period has support a subset of [0,U]Q and mean µ. Then the particular optimiza- PT π tion problem considered in Shapiro [119] Example 5.1.2 is infπ∈Πˆ supQ∈IND EQ[ t=1 Ct ], which by Lemma1 is equivalent to

T X π inf sup EQ[ Ct ]. (2.1) π∈Π Q∈IND t=1

We note that although the exact problem considered in Shapiro [119] does not explicitly restrict the family of policies and demand distributions to Q, it may be easily verified that the same results go through in this setting (under our assumptions on the rationality of the relevant problem parameters). In Example 5.1.2 of Shapiro [119], the author proves that due to certain structural properties, Problem (2.1) can be reformulated as a DP, as we now describe. We note that although the family of measures IND does not possess the so-called rectangularity property, here the decomposition follows from the fact that convexity and independence imply that irregardless of the inventory level and past realizations of demand, a worst-case demand distribution always has support {0,U}, where the precise distribution is then dictated by the requirement of having expectation µ. To formally define the relevant DP, we now define a sequence of functions {V t, t ≥ 1}, {f t, t ≥ 1}, {gt, t ≥ 1}. In general, such a DP would be phrased in terms of the so-called “cost-to-go” functions, with the t-th such function representing the remaining cost incurred by an optimal policy during periods T − t + 1 through T (i.e., if there are t periods to go), subject to the given state at time t. Here and throughout, we use the fact that the backorder and holding costs are the same in every period to simplify the relevant concepts and notations. In particular, due to this symmetry and the associated self-reducibility which it induces, for all T ≥ 1, it will suffice to define the aforementioned cost-to-go function for the first time period only. Indeed, in the following function definitions, V T (y, µ) will coincide with the optimal value of Problem (2.1) when the initial inventory level is y and the associated mean is µ (we leave the dependence on U implicit), where f T , gT have analogous interpretations. We note that to prevent later potential confusion and ambiguity regarding the order of operations such as supremum and expectation, we explicitly write out certain expectations as relevant sums over appropriate domains. Then the relevant DP is as follows.

1 ∆ 1 ∆ X 1 f (x1, d1) = C(x1, d1) , g (x1, µ) = sup f (x1, q1)Q1(q1), Q1∈M(µ) q1∈[0,U]Q 1 ∆ 1 1 ∆ X 1 V (y1, µ) = inf g (z1, µ) ,Q (x1, µ) = arg max f (x1, q1)Q1(q1), z ≥y 1 1 Q1∈M(µ) (2.2) z1∈[0,U]Q q1∈[0,U]Q 1 ∆ 1 Φ (y1, µ) = arg min g (z1, µ); z1≥y1 z1∈[0,U]Q Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 9 and for T ≥ 2,

T ∆ T −1 T ∆ X T f (xT , dT ) = C(xT , dT ) + V (xT − dT , µ) , g (xT , µ) = sup f (xT , qT )QT (qT ), QT ∈M(µ) qT ∈[0,U]Q T ∆ T T ∆ X T V (yT , µ) = inf g (zT , µ) ,Q (xT , µ) = arg max f (xT , qT )QT (qT ), z ≥y T T QT ∈M(µ) zT ∈[0,U]Q qT ∈[0,U]Q T ∆ T Φ (yT , µ) = arg min g (zT , µ). zT ≥yT zT ∈[0,U]Q (2.3) We have given the argument of the relevant functions with superscript T the subscript T (e.g. T xT , dT are the arguments for f ) to prevent any possible ambiguities when expanding the relevant recursions, although this is of course not strictly necessary. We do note that conceptually, if the length of the time horizon is fixed to T , then the functions with superscript T (e.g. V T ) actually correspond to the cost-to-go in period 1 (not period T ), and the corresponding arguments (e.g. xT , dT ) correspond to the inventory level and demand in period 1. More generally, for s ∈ [0,T − 1], V T −s will correspond to the cost-to-go in period s + 1, and the corresponding arguments (e.g. xT −s, dT −s) correspond to the inventory level and demand in period s + 1. We have chosen our notational scheme to maximize clarity of exposition, and again note that several of the resulting simplifications do rely on the symmetries / time-homogeneities of the specific problem considered.

2 To illustrate how the relevant functions evolve, note that V (y2, µ) equals X 1 inf sup C(z2, q2) + V (z2 − q2, µ) Q2(q2), z ≥y 2 2 Q2∈M(µ) z2∈[0,U]Q q2∈[0,U]Q which itself equals X X inf sup C(z2, q2) + inf sup C(z1, q1)Q1(q1) Q2(q2). z ≥y z ≥z −q 2 2 Q2∈M(µ) 1 2 2 Q1∈M(µ) z2∈[0,U]Q q2∈[0,U]Q z1∈[0,U]Q q1∈[0,U]Q

We now review the results of Lemma 5.1 and Example 5.1.2 of Shapiro [119], which characterizes the optimal policy, value, and worst-case distribution (against an optimal policy) for Problem 2.1, and relates the solution to DP formulation (2.2)-(2.3). Recall that a policy π ∈ Π is said to be a ∗ π π ∗ base-stock policy if there exist constants {xt , t ∈ [1,T ]}, s.t. xt = max {yt , xt } for all t ∈ [1,T ]. Let us deﬁne ( U ∆ 0 if µ ≤ b+1 , χIND(µ, U, b) = U U if µ > b+1 ; ( U T ∆ T bµ if µ ≤ b+1 , OptIND(µ, U, b) = U T (U − µ) if µ > b+1 . µ µ Let Dµ ∈ M(µ) be the probability measure s.t. Dµ(0) = 1 − U , Dµ(U) = U . Then the following is proven in Shapiro [119]. + Theorem 1 ([119]). For all strictly positive U, b ∈ Q ,T ≥ 1, and µ, x0 ∈ [0,U]Q, Problem 2.1 ∗ ∗ always has an optimal base-stock policy π , in which the associated base-stock constants {xt , t ∈ ∗ [1,T ]} satisfy xt = χIND(µ, U, b) for all t ∈ [1,T ]. If x0 ≤ χIND(µ, U, b), the optimal value of Prob- T lem 2.1 equals OptIND(µ, U, b); and for all x0 ∈ [0,U], the product measure s.t. all marginals are PT π∗ distributed according to law Dµ belongs to arg maxQ∈IND EQ[ t=1 Ct ]. Furthermore, the DP formulation (2.2)-(2.3) can be used to compute these optimal policies, distributions, and values. In T T particular, for all x0, x, µ ∈ [0,U]Q, V (x0, µ) is the optimal value of Problem 2.1, Dµ ∈ Q (x, µ), T and max x, χIND(µ, U, b) ∈ Φ (x, µ). Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 10

2.2. Martingale-demand model In this subsection, we formally deﬁne our distributionally robust martingale-demand model, and π establish some preliminary results. Let T, b, Π,Ct , M(µ) be exactly as deﬁned for the independent- demand model in Subsection 2.1, i.e. the time horizon, backorder and holding costs, set of admissible policies, cost incurred in period t under policy π, and collection of all probability measures with support [0,U]Q and mean µ, respectively. Furthermore, let MART denote the collection of all rationally supported probability measures corresponding to discrete-time martingale sequences

(D1,...,DT ) s.t. for all t, the marginal distribution of Dt belongs to M(µ). In words, the joint distribution of demand belongs to MART iﬀ the sequence of demands is a martingale, and the demand in each period has support a subset of [0,U]Q and mean µ. When there is no ambiguity, we will suppress the dependence on T , simply writing MAR. Note that by the martingale property, the conditional distribution of Dt, conditioned on D[t−1], belongs to M(Dt−1). Then in analogy PT π with Problem 2.1, the optimization problem of interest is infπ∈Πˆ supQ∈MAR EQ[ t=1 Ct ], which by Lemma1 is equivalent to T X π inf sup EQ[ Ct ]. (2.4) π∈Π Q∈MAR t=1 In analogy with the DP formulation (2.2)-(2.3) given for the independent-demand model, we now deﬁne an analogous formulation for the martingale-demand setting, and will later prove that this DP formulation indeed corresponds (in an appropriate sense) to Problem 2.4.

ˆ1 ∆ 1 ∆ X ˆ1 f (x1, d1) = C(x1, d1) , gˆ (x1, µ1) = sup f (x1, q1)Q1(q1), Q1∈M(µ1) q1∈[0,U]Q ˆ 1 ∆ 1 ˆ1 ∆ X ˆ1 V (y1, µ1) = inf gˆ (z1, µ1) , Q (x1, µ1) = arg max f (x1, q1)Q1(q1), z ≥y 1 1 Q1∈M(µ1) (2.5) z1∈[0,U]Q q1∈[0,U]Q ˆ 1 ∆ 1 Φ (y1, µ1) = arg min gˆ (z1, µ1); z1≥y1 z1∈[0,U]Q and for T ≥ 2,

ˆT ∆ ˆ T −1 T ∆ X ˆT f (xT , dT ) = C(xT , dT ) + V (xT − dT , dT ) , gˆ (xT , µT ) = sup f (xT , qT )QT (qT ), QT ∈M(µT ) qT ∈[0,U]Q ˆ T ∆ T ˆT ∆ X ˆT V (yT , µT ) = inf gˆ (zT , µT ) , Q (xT , µT ) = arg max f (xT , qT )QT (qT ), z ≥y T T QT ∈M(µT ) zT ∈[0,U]Q qT ∈[0,U]Q ˆ T ∆ T Φ (yT , µT ) = arg min gˆ (zT , µT ). zT ≥yT zT ∈[0,U]Q (2.6) T T −1 Note that in the deﬁnition of f (xT , dT ) in (2.2)-(2.3), the second argument in the function V ˆT is µ, which is a constant. By contrast, in the deﬁnition of f (xT , dT ) above, the second argument ˆ T −1 in the function V is dT , which represents the demand just realized in the past period and is not a constant.

ˆ 2 To again illustrate how the relevant functions evolve, note that V (y2, µ2) equals

X ˆ 1 inf sup C(z2, q2) + V (z2 − q2, q2) Q2(q2), z ≥y 2 2 Q2∈M(µ2) z2∈[0,U]Q q2∈[0,U]Q Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 11 which itself equals

X X inf sup C(z2, q2) + inf sup C(z1, q1)Q1(q1) Q2(q2). z ≥y z ≥z −q 2 2 Q2∈M(µ2) 1 2 2 Q1∈M(q2) z2∈[0,U]Q q2∈[0,U]Q z1∈[0,U]Q q1∈[0,U]Q We now prove that the DP 2.5- 2.6 indeed corresponds to Problem 2.4, and begin with several Q Q Q additional definitions. Recall that given a probability measure Q ∈ MART , D = (D1 ,...,DT ) denotes a random vector distributed as Q. Given t1 ≤ t2 ∈ [1,T ], we let Q[t1,t2] denote the probabil- Q Q ity measure induced by Q on (Dt1 ,...,Dt2 ), letting Q[t2] denote Q[1,t2], and Qt2 denote Q[t2,t2]. As a notational convenience, we define Q0 to be the probability measure which assigns µ probability one. n m Thus supp(Q0) = {µ}. Also, we sometimes let µ0 denote µ. Given two finite vectors x ∈ R , y ∈ R , n+m we let x : y ∈ R denote the concatenation of x and y. Namely, (x : y)k = xk for k ∈ [1, n], and (x : y)k = yk−n for k ∈ [n + 1, n + m]. Given Q ∈ MART , t ∈ [1,T ] and q ∈ supp(Q[t−1]), let Q[t](q:d) Qt|q denote the unique probability measure s.t. for all d ∈ [0,U]Q, Qt|q(d) = . Equivalently, Q[t−1](q) Q Q Qt|q(d) = P Dt = d D[t−1] = q . Note that by the martingale property, Qt|q ∈ M(qt−1).

Before precisely stating the desired result, we will need some preliminary results regarding the DP 2.5- 2.6, which assert the non-emptiness of various sets, as well as a certain axiom-of-choice type result asserting the ability to appropriately select elements from various sets. ˆt ˆ t Lemma 2. For all t ∈ [1,T ], and x, d ∈ [0,U]Q, Q (x, d) =6 ∅, Φ (x, d) =6 ∅. Furthermore, it is t ˆt possible to assign to every t ∈ [1,T ], and x, d ∈ [0,U]Q, a measure Qx,d ∈ Q (x, d); and rational t ˆ t number Φx,d ∈ Φ (x, d). Although we temporarily defer the proof of Lemma2, as the lemma will follow from the proof t t of our main result in which we explicitly construct such Qx,d, Φx,d, suppose for now that such an t t assignment Qx,d, Φx,d has been made in an arbitrary yet fixed manner. We now use these quantities to define a corresponding policy and distribution, which will correspond to the optimal policy, and ∗ ∗ ∗ worst-case distribution at optimality, for Problem 2.4. Letπ ˆ = (ˆx1,..., xˆT ) ∈ Π denote the follow- T ing policy, which we define inductively.x ˆ∗ = Φ . Now, suppose we have completely specifiedx ˆ∗ 1 x0,µ s ∗ ∆ πˆ∗ for all s ∈ [1, t] for some t ∈ [1,T − 1], and thus have also specifiedy ˆs = ys for all s ∈ [1, t+ 1]. Then ∗ t ∗ T −t we definex ˆ as follows. For all d ∈ [0,U] ,x ˆ (d) = Φ ∗ . It may be easily verified from the t+1 Q t+1 yˆt+1(d),dt ˆ ˆ∗ definition of Φ that this defines a valid policy belonging to Π. Similarly, let Q ∈ MART denote the following probability measure, which we define inductively through its conditional distributions. T Qˆ∗ = Q . Now, suppose we have completely specified Qˆ∗ . Then we specify Qˆ∗ as follows. For 1 x0,µ [t] [t+1] ∗ ∗ T −t all d ∈ supp(Qˆ ), Qˆ = Q πˆ∗ . Due to its countable support, it may be easily verified that [t] t+1|d xt+1(d),dt ˆ∗ this uniquely and completely specifies a measure Q ∈ MART . As a notational convenience, we let ∗ ˆ ˆ ∗ ˆ ∗ ˆ∗ D = (D1 ,..., DT ) denote a vector distributed as Q .

Then we have the following theorem, in analogy with Theorem1, whose proof we defer to the Technical Appendix in Section7.

+ ∗ Theorem 2. For all strictly positive U, b ∈ Q ,T ≥ 1, and µ, x0 ∈ [0,U]Q, πˆ is an optimal policy ˆ∗ PT πˆ∗ ˆ T for Problem 2.4, and Q ∈ arg maxQ∈MAR EQ[ t=1 Ct ]. Furthermore, V (x0, µ) is the optimal value of Problem 2.4.

Note that in light of Lemma2, Theorem2 implies that computing an optimal policy (and associated worst-case distribution at optimality) for Problem 2.4 is equivalent to solving the DP 2.5- 2.6, and Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 12 computing the relevant minimizers and maximizers. Furthermore, in light of Theorem2, solving Problem 2.4 is thus reduced from a minimax problem to purely a maximization problem (of the expected performance of the policyπ ˆ∗), over a restricted family of martingales. This puts the problem within the general framework of Beigblock and Nutz [12], in which the authors develop a general methodology, based on concave envelopes, to compute optimal martingale inequalities. Indeed, our resolution of the DP 2.5- 2.6 can be viewed as an explicit execution of this methodology, which exploits the special structure ofπ ˆ∗. It is also interesting to note that Theorem2 asserts that one can w.l.o.g. restrict to policies and distributions which are Markovian when one considers the three-dimensional state-space consisting of both the post-ordering inventory level and demand in the previous period, as well as a dimension tracking the current time period, even though in principle both the policy and distribution could have exhibited a more complicated form of history dependence.

2.3. Main results We now state our main results. We begin by introducing some additional deﬁnitions and notations. Whenever possible, we will suppress dependence on the parameters µ, U, b for simplicity. As a notational convenience, let us deﬁne all empty products to equal unity, and all empty sums to equal zero. For an event A, let I(A) denote the corresponding indicator. For j ∈ [−1,T ], let

( QT −1 k T ∆ U k=j+1 b+k if j ≤ T − 1, Aj = b+T T U if j = T ; and j BT =∆ AT . j b + T j T T T T T T T Note that A−1 = B−1 = B0 = 0, AT −1 = BT = U, and both {Aj , j ∈ [−1,T ]} and {Bj , j ∈ [−1,T ]} T are increasing in j, and decreasing in T . For x, µ ∈ [0,U], let qx,µ be the unique (up to sets of measure zero) probability measure s.t.

 AT −µ µ−AT qT (AT ) = j+1 , qT (AT ) = j if µ ∈ (AT ,AT ] , x ∈ [0,BT );  x,µ j AT −AT x,µ j+1 AT −AT j j+1 j+1  j+1 j j+1 j qT (0) = 1 − µ , qT (AT ) = µ if µ ∈ (AT ,AT ] , x ∈ [BT ,BT ) , k ≥ j + 1; x,µ AT x,µ k AT j j+1 k k+1  k k  T µ T µ qx,µ(0) = 1 − U , qx,µ(U) = U if µ = 0 or x = U. (2.7)

Also, for j ∈ [0,T ], let T ∆ b + T Gj (x, µ) = (T − T µ)x + (T − j)bµ; Aj in which case we deﬁne ( T ∆ 0 if µ = 0, Γµ = T +1 T +1 j + 1 if µ ∈ (Aj ,Aj+1 ] , j ∈ [−1,T − 1];

T T T and note that µ ∈ (A T −1 ,A T −1 ] for all T ≥ 2 and µ ∈ (0,U], while A T −1 = 0. We also deﬁne Γµ −1 Γµ Γ0 −1

T ∆ T ∆ T T ∆ T T χ (µ, U, b) = β = B T , Opt (µ, U, b) = G T (β , µ). MAR µ Γµ MAR Γµ µ Then an explicit solution to the DP 2.5- 2.6, and thus to Problem 2.4, is given as follows. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 13

t ˆ t Theorem 3. For all t ∈ [1,T ], and all x, d ∈ [0,U]Q, max x, χMAR(d, U, b) ∈ Φ (x, d), and we t t ∗ may set Φx,d = max x, χMAR(d, U, b) . Thus an optimal policy πˆ for Problem 2.4 is given as ∗ T follows. xˆ1 = max x0, χMAR(µ, U, b) . For t ∈ [1,T − 1],

∗ ∗ T −t xˆt+1(d[t]) = max xˆt (d[t−1]) − dt, χMAR(dt, U, b) .

t ˆt t t Similarly, qx,d ∈ Q (x, d), and we may set Qx,d = qx,d. Thus a worst-case (at optimality) distribution ˆ∗ PT πˆ∗ ˆ∗ T Q ∈ arg max Q[ C ] may be given as follows. Q = q ∗ . For t ∈ [1,T − 1] and d ∈ Q∈MAR E t=1 t 1 xˆ1,µ ˆ∗ supp(Q[t]), ˆ∗ T −t Qt+1|d = q ∗ . xˆt+1(d),dt

T T Furthermore, for all x0 ≤ βµ , the optimal value of Problem 2.4 equals OptMAR(µ, U, b). We note that this optimal solution is not necessarily unique, but do not attempt to characterize the family of all possible solutions here. Furthermore, as the optimal policy is essentially given in closed form, it may be computed very eﬃciently. It is possible to infer many interesting qualitative features of the optimal policyπ ˆ∗ and worst- case (at optimality) measure Qˆ∗ from Theorem3. First, we have the following observation, whose proof we defer to the Technical Appendix in Section7.

∗ ∗ ∗ Observation 1. For any initial inventory level x0 ∈ [0,U], the (optimal) policy πˆ = (ˆx1,..., xˆT ), and the random vector Dˆ ∗ (distributed as the worst-case measure Qˆ∗) satisfy the following relation. ˆ ∗ ∗ ˆ ∗ For all t ∈ [1,T ], w.p.1, either Dt ≥ xˆt (D[t−1]) (i.e. the demand in period t clears the inventory), ˆ ∗ or Dt = 0 (i.e. the demand in period t equals 0, all future demands also equal 0, and the policy- maker is left holding her inventory for the remainder of the horizon). Furthermore, for all T ≥ 1, T T µ ∈ (0,U), and x ∈ χMAR(µ, U, b),U , qx,µ has support on two points, one of which is zero, and the other of which is at least x. It also follows that under these assumptions, the first case in the T definition of qx,µ cannot occur. We next further formalize this observation under the additional assumption that one’s initial inven- T tory is sufficiently small, i.e. x0 ≤ χMAR(µ, U, b). In this case the stochastic inventory and demand process induced by the policyπ ˆ∗, and sequence of demands distributed as Qˆ∗, takes a simpler form which we now describe explicitly. We also relate this simpler form, which results from the fact that the (conditional) distribution of demand always has 2-point support with positive probability at 0 (as noted in Observation1), to the concept of inventory obsolescence (see Interpretation1). Let T T ∆ T T ∆ T ∆ Γµ T Λ = max(T − Γµ , 1),D0 = µ, X0 = b+T +1 µ, and for t ∈ [1, Λ ],

T ∆ T +1−t T ∆ T +1−t Dt = A T ,Xt = B T . (2.8) min(Γµ ,T −1) Γµ

T T Note that Λ represents the ﬁrst time that Dt reaches U. We now formally relate the corresponding inventory and demand dynamics to an appropriate T ∆ T T Markov chain. Consider the following discrete time Markov chain {M (t) = Xt , Dt , t ≥ 1}, with randomized initial conditions.  XT ,DT w.p. µ ,  1 1 DT X T , DT = 1 1 1 T µ  X1 , 0 w.p. 1 − T ; D1 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 14 for t ∈ [2, ΛT − 1],

X T , 0 w.p. 1, if DT = 0,  t−1 t−1  DT T T  T T t−1 T X , D = Xt ,Dt w.p. T , if Dt−1 =6 0, t t Dt T  T Dt−1 T  Xt , 0 w.p. 1 − T , if Dt−1 =6 0; Dt for t = ΛT (if ΛT =6 1),

 T T  Xt−1, 0 w.p. 1, if Dt−1 = 0,  T T T T Dt−1 T Xt , Dt = Xt ,U w.p. U , if Dt−1 =6 0,  DT  T t−1 T Xt , 0 w.p. 1 − U , if Dt−1 =6 0; for t ≥ min ΛT + 1,T , ( X T , 0 w.p. 1, if DT = 0; X T , DT = ΛT ΛT t t T U, U w.p. 1, if DΛT = U.

The state transitions of the above discrete time Markov chain {M T (t), t ≥ 1} can be described T as follows. In period t = 1, the inventory level is set to be X1 and the demand is realized as either T D1 or zero with certain probabilities (note that the probabilities depend on µ). Going forward, if the demand realized in the past period t − 1 is zero, then the demand in period t remains zero due to the martingale property, and the inventory level does not change from period t − 1 to period t. Between periods 2 and ΛT , if the demand has not yet hit zero, the post-ordering inventory level T T T increases (in period t) from Xt−1 to Xt , and the demand (in period t) is realized as either Dt or zero with certain probabilities (the probabilities depend on the demand realized in the past T T T T period). In particular, Dt reaches U when t = Λ , where {Dt , t ∈ [1, Λ ]} is an increasing sequence. Starting from t = ΛT + 1, the demand is either zero or U in all remaining periods. If the demand is zero, then the inventory level does not change, stuck at its value before the ﬁrst zero demand. If the demand is U, then the post-ordering inventory level is set to U in all remaining periods. T ˆ ∗ Corollary 1. If x0 ∈ [0, χMAR(µ, U, b)], then one can construct D on a common probability T ∗ ˆ ∗ ˆ ∗ T T space with M s.t. { xˆt (D[t−1]), Dt , t ∈ [1,T ]} equals { Xt , Dt , t ∈ [1,T ]} w.p.1. Furthermore, T T T T T T {Dt , t ∈ [1, Λ ]} and {Xt , t ∈ [1, Λ ]} are both monotone increasing, with Xt ≤ Dt for all t ∈ T T [1, Λ ] and DΛT = U. We now give an alternate description of the dynamics described in Corollary1, explicitly describ- ing the random amount of time until the demand is either 0 or U. Let ZT denote the r.v., with support on the integers belonging to [1, ΛT ], whose corresponding probability measure ZT satisﬁes

 T 1 − Dt−1 µ if t ∈ [1, ΛT − 1];  DT DT ZT (t) = t t−1 µ T  T if t = Λ . Dt−1

Also, let Y T denote a r.v., with support on {0,U}, independent of ZT , whose corresponding probability measure YT satisﬁes

 DT  ΛT −1 if x = U; YT (x) = U DT  ΛT −1 1 − U if x = 0. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 15

Corollary 2. Under the same assumptions, and on the same probability space, as described T T T T in Corollary1, one can also construct Z ,Y s.t. all of the following hold w.p.1. { Xt , Dt , t ∈ T T T T T T T T T [1,Z − 1]} equals { Xt ,Dt , t ∈ [1,Z − 1]}. If Z ≤ Λ − 1, then Xt , Dt = XZT , 0 for all T T T T T T T T T t ∈ [Z ,T ]. If Z = Λ , and Y = U, then XΛT , DΛT = XΛT ,U , and Xt , Dt = U, U for all T T T T T T T T t ∈ [Λ + 1,T ]. If Z = Λ , and Y = 0, then Xt , Dt = XΛT , 0 for all t ∈ [Λ ,T ]. Furthermore, ˆ ∗ T T T ˆ ∗ T T T T Dt = Dt I(Z > t), t ∈ [1, Λ − 1]; and Dt = UI(Z = Λ ,Y = U), t ∈ [Λ ,T ]. Note that Corollary2 implies that Dˆ ∗ increases along a sequence of pre-determined constants T (i.e. Dt ) until a random time at which it either drops to 0 or jumps to U, and stays there. Similar martingales have played an important role in studying extremal martingales in other contexts, e.g. certain worst-case distributions which arise in the study of the celebrated prophet inequalities (cf. Hill and Kertz [72]). In light of our results, the appearance of such martingales has an appealing interpretation.

Interpretation 1. Under our model of demand uncertainty, a robust inventory manager should be most concerned about the possibility that her product becomes obsolete, i.e. demand for her product drops to zero at a random time. We again note that the scenario of product obsolescence has been recognized as practically relevant and analyzed extensively in the inventory literature. In contrast to much previous literature on this topic, here our model did not assume that products would become obsolete - instead the risks of obsolescence arose naturally from our worst-case analysis. Another interesting feature of our analysis is that even in a worst-case setting, the system may enter a state s.t. for all subsequent time periods, demand becomes perfectly predictable, and can be perfectly matched (at the level U). This is again intuitively appealing, as it coincides with the notion that one’s product has reached such a level of popularity that demand predictably attains its maximum. Furthermore, it suggests a certain dichotomy - namely, that either one’s product eventually becomes obsolete, or eventually reaches a level of popularity for which demand predictably attains its maximum. We now present an intuitive description of these dynamics, explained as a game between the “inventory manager” (selector of the policy) and “nature” (selector of the worst-case martingale). Nature reasons that, if her demand in some period t is ever zero, the martingale property ensures that she will order zero in all subsequent periods. This will leave the inventory manager stuck holding all the inventory which she held at the start of period t for the entire remainder of the time horizon, which will incur a large cost. Such a situation is thus desirable from nature’s perspective, and one might think that to maximize the probability of this happening, nature will thus maximize the probability that the demand is 0, which would be achieved by putting all probability on 0 and U. However, as Corollaries1-2 show, this is not the case. In fact, the adversary does not put any probability mass at U until time ΛT , i.e. the last time period in which there are any meaningful dynamics. The reason is that in the martingale-demand model, there is an additional hidden cost for nature associated with putting probability on or near U. In particular, if a demand of U ever occurs, then the martingale property ensures that all future demands must be U as well. However, it is “easy” for an inventory manager to perform well against an adversary that always orders U - in particular, she can simply order up to U in every period, incurring zero cost. In other words, the aforementioned hidden cost to nature comes in the form of a “loss of randomness”, making nature perfectly predictable going forwards. Corollaries1-2 indicate that at optimality, this tradeoﬀ manifests by having nature always put some probability at 0, and some probability on a diﬀerent T T quantity Dt , which increases monotonically to U as t ↑ Λ . This “ramping up” can be explained by observing that although higher values for this second point of the support result in a greater “loss of randomness”, the cost of such a loss becomes smaller over time, as the associated window of time during which the adversary is perfectly predictable shrinks accordingly. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 16

Note that Corollary1 implies that (in a worst-case setting) the inventory manager continues to ramp up production until either the demand drops to zero, or time ΛT is reached, where the precise manner in which production is ramped up results from an attempt to balance the cost of potential obsoletion (i.e. being stuck holding one’s inventory) with the cost of under-ordering. To better understand this ramping up, and more generally the dynamics of an optimal policy against a worst-case distribution for Problem 2.4, we now prove that the Markov chain M T , properly rescaled, converges weakly (as T → ∞) to a simple and intuitive limiting process. For a review of the formal definition of weak convergence, and the relevant topological spaces and metrics, we refer the reader to the excellent texts Billingsley [27] and Whitt [134]. Let us define ∆ µ ∞ ∆ 1 γ = , Λ = 1 − γ b . U Let Z∞ denote the mixed (i.e. both continuous and discrete components) r.v., with continuous support on [0, Λ∞, and discrete support on the singleton Λ∞, whose corresponding probability measure Z∞ satisfies ( b(1 − α)b−1dα if α ∈ [0, Λ∞; Z∞(α) = γ if α = Λ∞.

For α ∈ [0, Λ∞], where α is again a continuous parameter, let us also deﬁne

∆ ∆ 1 ∞ −b ∞ b −(b+1) Dα = µ(1 − α) ,Xα = µγ (1 − α) .

∞ ∞ ∞ Note that DΛ∞ = XΛ∞ = U. We now deﬁne an appropriate limiting process. Let M (α)0≤α≤1 denote the following two-dimensional process, constructed on the same probability space as Z∞.  X∞,D∞ if α ∈ [0,Z∞);  α α ∞ ∞ ∞ ∞ ∞ M (α) = XZ∞ , 0 if α ≥ Z and Z < Λ ; U, U if α ≥ Z∞ and Z∞ = Λ∞.

Then we have the following weak convergence result. For α ∈ [0, 1], let MT (α) =∆ M T max(bαT c, 1). Theorem 4. For all U, b > 0 and µ ∈ (0,U), the sequence of stochastic processes T ∞ 2 {M (α)0≤α≤1,T ≥ 1} converges weakly to the process M (α)0≤α≤1 on the space D ([0, 1], R ) under the J1 topology. We note that Theorem4 reveals several qualitative insights into the corresponding dynamics. For example, it provides simple analytical formulas for how (in a worst-case setting) the production and demand both ramp up over time, and demonstrates that the limiting probability that obsolescence ever occurs (again in such a worst-case setting) equals 1 − γ. A diﬀerent lens through which one can gain further insight into Theorem3 follows by observing that for certain values of b and µ, several quantities appearing in our main results simplify considerably.

kU T U(j+1) Observation 2. When b = 1 and µ = T +1 for some k ∈ [1,T ], Aj = T for j ∈ [−1,T ], and T T T kU Λ = T + 1 − k. For t ∈ [1, Λ ], Dt = T +1−t ,

t t −1 k k − 1 k k XT = (1 − )(1 − ) U, OptT (µ, U, b) = (1 − )UT. t T + 2 T + 1 T + 2 T + 1 MAR T T + 1 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 17

Furthermore,

( 1 T T +1 if t ∈ [1,T − k]; Z (t) = k+1 T +1 if t = T + 1 − k; and

( 1 T 1 − k+1 if x = U; Y (x) = 1 k+1 if x = 0. Namely, ZT is equally likely to take any value in the interval [1,T − k]. Of course, another avenue for more concretely understanding Theorem3 is to explicitly evaluate all relevant quantities for small values of T , as we now do for the case T = 2.

Observation 3.

1 1 1 1 1 1 A−1 = 0,A0 = U, A1 = (b + 1)U, B−1 = 0,B0 = 0,B1 = U, 2 2 U 2 2 b 2 2 A−1 = 0,A0 = b+1 ,A1 = U, A2 = (1 + 2 )U, B−1 = 0,B0 = 0, 2 U 2 3 3 2U 3 2U 3 B1 = b+2 ,B2 = U, A−1 = 0,A0 = (b+1)(b+2) ,A1 = b+2 ,A2 = U, 3 b 3 3 3 2U 3 2U 3 A3 = (1 + 3 )U, B−1 = 0,B0 = 0,B1 = (b+2)(b+3) ,B2 = b+3 ,B3 = U.

 U 2 U U−µ 2 µ− b+1 U U qx,µ( b+1 ) = 1 , qx,µ(U) = 1 if µ ∈ ( b+1 ,U] , x ∈ [0, b+2 );  U(1− b+1 ) U(1− b+1 )  q2 (0) = 1 − (b + 1) µ , q2 ( U ) = (b + 1) µ if µ ∈ (0, U ] , x ∈ [0, U );  x,µ U x,µ b+1 U b+1 b+2 2 µ 2 µ U U qx,µ(0) = 1 − U , qx,µ(U) = U if µ ∈ (0, b+1 ] , x ∈ [ b+2 ,U);  2 µ 2 µ U U qx,µ(0) = 1 − , qx,µ(U) = if µ ∈ ( ,U] , x ∈ [ ,U);  U U b+1 b+2  2 µ 2 µ qx,µ(0) = 1 − U , qx,µ(U) = U if µ = 0 or x = U. µ µ µ G2(x, µ) = 2−(b+1)(b+2) x+2bµ , G2(x, µ) = 2−(b+2) x+bµ , G2(x, µ) = 2(1− )x. 0 U 1 U 2 U  0 if µ ∈ [0, 2U ], (  (b+1)(b+2) 2 if µ ∈ [0, 2U ], Γ2 = 1 if µ ∈ ( 2U , 2U ], Λ2 = (b+1)(b+2) µ (b+1)(b+2) b+2 1 if µ ∈ ( 2U ,U].  2U (b+1)(b+2) 2 if µ ∈ ( b+2 ,U]. 0 if µ ∈ [0, 2U ], 2bµ if µ ∈ [0, 2U ],  (b+1)(b+2)  (b+1)(b+2) 2 U 2U 2U 2 2U 2U 2U χMAR = b+2 if µ ∈ ( (b+1)(b+2) , b+2 ], OptMAR = b+2 + (b − 1)µ if µ ∈ ( (b+1)(b+2) , b+2 ],  2U  2U U if µ ∈ ( b+2 ,U]. 2(U − µ) if µ ∈ ( b+2 ,U]. ( U 2U 2 b+1 if µ ∈ [0, (b+1)(b+2) ], 2 n 2U D1 = 2U D2 = U if µ ∈ [0, (b+1)(b+2) ]. U if µ ∈ ( (b+1)(b+2) ,U]. 0 if µ ∈ [0, 2U ],  (b+1)(b+2) 2 U 2U 2U 2 n 2U X1 = b+2 if µ ∈ ( (b+1)(b+2) , b+2 ], X2 = 0 if µ ∈ [0, (b+1)(b+2) ].  2U U if µ ∈ ( b+2 ,U].

( 2 µ 2 µ 2U Z (1) = 1 − (b + 1) U , Z (2) = (b + 1) U if µ ∈ [0, (b+1)(b+2) ]; 2 2U Z (1) = 1 if µ ∈ ( (b+1)(b+2) ,U]. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 18

( 2 1 2 1 2U Y (U) = b+1 , Y (0) = 1 − b+1 if µ ∈ [0, (b+1)(b+2) ]; 2 µ 2 µ 2U Y (U) = U , Y (0) = 1 − U if µ ∈ ( (b+1)(b+2) ,U]. To further help interpret Observation3 and the explicit forms for the various quantities, we show (in Section 7.11 of the Technical Appendix) how (for the case T = 2) one may derive the same quantities from a certain “heuristic relaxation” of our original problem. In doing so, we see precisely where and why certain quantities arise, and we defer the relevant formulation and discussion to Section 7.11 of the Technical Appendix. We note that Observation3 also brings another interesting T feature of Theorem3 to light, namely the fact that χMAR(µ, U, b) is not monotone increasing in b. This is surprising, since one would expect that as the backlogging penalty increases, one T would wish to stock higher inventory levels. We also note that for ﬁxed T, µ, U, χIND(µ, U, b) is monotone increasing in b, i.e. such a non-monotonicity does not manifest in the independent- 1 ∞ b demand model. Furthermore, Theorem4, and the monotonicity (in b) of X0 = µγ , implies that this non-monotonicity does not manifest “in the limit”. We conclude by presenting several comparative results between the independent-demand and martingale-demand models. First, we prove that the expected cost incurred by a minimax optimal policy is highest under the independent-demand model, among all possible dependency structures, whenever x0 = 0. In particular, in this case the expected cost incurred under the independent-demand model is at least the expected cost incurred under the martingale-demand model. Let GENT denote the family of all T -dimensional measures Q s.t. Qt ∈ M(µ) for all t, and OptT (µ, U, b) =∆ inf sup [PT Cπ] under the initial condition x = 0. Then we GEN π∈Π Q∈GENT EQ t=1 t 0 prove the following comparative result, and defer the proof to the Technical Appendix in Section 7.

+ T Theorem 5. For all strictly positive U, b ∈ Q ,T ≥ 1, and µ ∈ [0,U]Q, OptGEN(µ, U, b) = T OptIND(µ, U, b). T OptMAR(µ,U,b) Although Theorem5 implies that T ≤ 1, it is interesting to further understand the OptIND(µ,U,b) behavior of this ratio, in the spirit of the so-called price of correlations introduced in Agrawal et al. [2]. Indeed, we prove the following, which provides a simple analytical form for this ratio as T → ∞, again deferring the proof to the Technical Appendix in Section7.

+ Theorem 6. For all strictly positive U, b ∈ Q , and µ ∈ (0,U)Q,

( 1 U T 1 − γ b if µ ≤ , OptMAR(µ, U, b) b+1 lim = 1 T →∞ OptT (µ, U, b) (1−γ b )bµ U IND U−µ if µ > b+1 . We also note the following corollary, which follows immediately from Theorem6.

+ U Corollary 3. Suppose U ∈ Q , µ = 2 , and b = 1. Then

OptT (µ, U, b) 1 lim MAR = . T →∞ T OptIND(µ, U, b) 2 One setting which arises in various inventory applications is that in which b >> h, which moti- vates considering what our results say about the regime in which b → ∞. In particular, the following may be easily veriﬁed by combining our main result Theorem3 with a straightforward calculation.

+ + U Corollary 4. For all strictly positive U ∈ Q , µ ∈ [0,U]Q, T ≥ 1, and b ∈ Q s.t. b > ( µ − 1)T , T T OptMAR(µ,U,b) it holds that χMAR(µ, U, b) = χIND(µ, U, b) = U, and T = 1. OptIND(µ,U,b) Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 19

Intuitively, Corollary4 indicates that with all other parameters fixed, if b is sufficiently large, then in both scenarios the inventory manager orders up to U in the first period out of fear of incurring large backlogging costs. Furthermore, although the demand dynamics are different in the two models in this case, they both result in the same minimax expected cost. Interestingly, when one considers the case of large b in the asymptotic setting of Theorem6 (in which T is also very large), a different behavior emerges. Namely, by letting b → ∞ in Theorem6, one may conclude the following after a straightforward calculation.

+ Corollary 5. For all strictly positive U ∈ Q , and µ ∈ (0,U)Q,

OptT (µ, U, b) log(γ−1) lim lim MAR = . b→∞ T →∞ T −1 OptIND(µ, U, b) γ − 1 Combining Corollaries4 and5, we conclude that in this case an interchange of limits does not hold. T OptMAR(µ,U,b) Namely, as Corollary4 implies that lim T →∞ limb→∞ T = 1, here the order in which one OptIND(µ,U,b) ∞ lets b, T get large makes a fundamental difference. Indeed, as it is easily verified that limb→∞(bΛ ) = −1 ∞ log(γ ), while D0 = µ irregardless of b, what happens in this second setting as b gets large ∞ is that Dα ramps up very quickly from µ to U, where the time required to ramp up behaves log(γ−1) (asymptotically) as b T (remember that as large as b may be, T is always much larger due to the order of limits here). However, over this relatively short time horizon (short as a fraction of T , still large in an absolute sense as T is very large), a non-trivial backlogging penalty may be incurred since b is so large. It can be shown that these effects result in a non-trivial limiting dynamics under a proper rescaling, although we do not present a complete analysis here for the sake of brevity. As a final observation, we note that the comparative results of Theorem5 are sensitive to the initial conditions, and in fact for different values of x0 the situation may be reversed. For example, we have the following result, whose proof we similarly defer to the Technical Appendix in Section 7.

+ µ 1 Observation 4. For all strictly positive U, b ∈ Q ,T ≥ 1, and µ ∈ [0,U]Q, if x0 = U and U < b+1 , PT π infπ∈Π supQ∈MAR EQ[ t=1 Ct ] U−µ then limT →∞ PT π = bµ > 1. infπ∈Π supQ∈IND EQ[ t=1 Ct ] We also would like to emphasize that many of the insights of our analysis, especially the “obsolescence phenomena”, i.e. the property that conditional on the past demand realizations and the current inventory level (under an optimal policy) there always exists a worst-case distribution that assigns a strictly positive probability to zero and some level which clears the inventory, is quite sensitive to the particular assumptions of our model. Indeed, although the fact that such a worst-case (conditional) distribution assigns probability to at most 2 points will hold under a broad range of modeling assumptions (and follows essentially from the sparsity of basic feasible solutions to linear programs, which would extend to some other small number of support points under additional constraints), the special properties of putting probability at 0 and clearing the inventory are quite fragile. In Section 7.12 of the Technical Appendix, we provide several examples showing that this feature need not hold if one relaxes our modeling assumptions. Although we defer an explicit description of those examples and findings to Section 7.12, we summarize those findings here. We consider three examples, which collectively demonstrate that : 1. if one allows for time-dependent costs, then a worst-case distribution may not put positive probability at zero; 2. if one removes the upper bound on the support, then a worst-case distribution may not even exist; and 3. if one imposes a lower bound on the support, then a worst-case distribution may not put positive probability on this lower bound, and furthermore may not clear the inventory. Collectively, these findings suggest that extending our framework to more complex models will require several Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 20 fundamentally new ideas, as much of the structure which allowed for our explicit analysis may no longer hold. Of course, the resulting dynamic programs (although more complex) still remain quite structured even in these more general settings, and we believe that searching for new (possibly more complex and subtle) phenomena may lead to the resolution of those settings as well. For example, in the setting that a lower bound is imposed, although our analysis in Section 7.12 (specifically the third example) shows that under an optimal policy a worst-case distribution need not clear the inventory, it does not rule out the possibility that in such a setting a worst-case distribution always takes the inventory below the lower bound. Also, although our analysis in Section 7.12 (specifically the first example) shows that if the holding costs are held constant and the backlogging costs are monotone increasing (over time) then the “obsolescence phenomena” need not hold, it may be that imposing other monotonicities and/or appropriate restrictions on the sequence of holding and backlogging costs allows one to recover such obsolescence properties even when costs vary over time. Indeed, in Section 7.12 (specifically the fourth example) we do show the following positive result: the obsolescence phenomena again manifests even if holding costs can vary arbitrarily over time, if one enforces (the admittedly strong assumption) that all backlogging costs are 0. In this case under a worst-case measure either all demands are 0 or all demands are U. Although computing the optimal policy is trivial in that setting (as it is to order nothing), the fact that the worst-case measure utilizes obsolescence (in the strongest sense possible) is more subtle, and further demonstrates the worst- case nature of the correlations (over time) induced by obsolescence (especially when holding costs are a primary concern). Formulating and analyzing such structural properties (which naturally generalize those considered in our analysis), and applying them to develop solution methodologies for more general modeling frameworks, remains an interesting direction for future research.

3. Proof of Theorem3 and Corollaries1-2 In this section, we complete the proof of our main result, Theorem3, which yields an explicit solution to Problem 2.4, as well as Lemma2, and Corollaries1-2. We proceed by induction, and note that our analysis combines ideas from convex analysis with the theory of martingales. We begin by making several additional deﬁnitions, and proving several auxiliary results. For x, µ ∈ [0,U], and j ∈ [−1,T − 1] , let

T ∆ T Fj (x, µ) = −bx + (b + T )Bj+1 + T b − (b + 1)(j + 1) µ. Also, let us deﬁne F T (x, µ) if µ ∈ (AT ,AT ] , x ∈ [0,BT );  j j j+1 j+1 GT (x, µ) if µ ∈ (AT ,AT ] , x ∈ [BT ,BT ) , k ≥ j + 1; gT (x, µ) =∆ k j j+1 k k+1 GT (x, 0) if µ = 0, x ∈ [BT ,BT );  k k k+1  T GT (U, µ) if x = U. For later proofs, it will be convenient to note that g may be equivalently expressed as follows. Let ( T ∆ T if x = U, Υx = T +1 T +1 j if x ∈ [Bj ,Bj+1 );

T T T and note that x ∈ [B T −1 ,B T −1 ) for all T ≥ 2 and x ∈ [0,U), while U = B T −1 . Noting that Υx Υx +1 ΥU +1 T T GT (U, µ) = GT −1(U, µ) implies

 T T F T −1 (x, µ) if x < B T −1 ; gT (x, µ) = Γµ −1 Γµ T T G T −1 (x, µ) if x ≥ B T −1 .  Υx Γµ Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 21

We now prove several properties of gT . It will first be useful to review a well-known sufficient condition for convexity of non-differentiable functions. Recall that for a one-dimensional function f(x), + f(x0+h)−f(x0) the right-derivative of f evaluated at x0, which we denote by ∂x f(x0), equals limh↓0 h . When this limit exists, we say that f is right-differentiable at x0. Then the following sufficient condition for convexity is stated in Royden and Fitzpatrick [110] Section 5, Proposition 18. Lemma 3 (Royden and Fitzpatrick [110]). A one-dimensional function f(x), which is continuous and right-differentiable on an open interval (a, b) with non-decreasing right-derivative on (a, b), is convex on (a, b). Then we may derive the following properties of gT and other relevant quantities, whose proofs we defer to the Technical Appendix in Section7. Lemma 4. gT (x, d) is a continuous and convex function of x on (0,U), and a right (left) continuous function of x at 0 (U).

T T +1 T Lemma 5. Aj < Aj+1 < Aj+1 for j ∈ [0,T − 2]. It follows that for all T ≥ 2 and d ∈ [0,U], T T −1 T −1 Γd ∈ {Γd , Γd + 1}. T T Lemma 6. βd ∈ arg minz∈[0,U] g (z, d). To simplify various notations and concepts before completing the proof of Theorem3, it will be useful to make several additional deﬁnitions, and prove a few more relevant preliminary results. For x, d ∈ [0,U], and j ∈ [0,T ], let

T ∆ b + T b + T 2 F j (x, d) = T x + (b − 1)T − bj − T x d + T d ; Aj Aj and for j ∈ [−1,T − 1], let

T ∆ T Gj (d) = TBj+1 + T b − (b + 1)(j + 1) d.

T T T Note that βd is monotone increasing in d, with β0 = 0, and βU = U. It follows that for all x ∈ [0,U],

T ∆ T zx = inf{d ≥ 0 s.t. βd ≥ x − d}, is well defined, T zx ≤ x for x ∈ [0,U], (3.9) T and zx = 0 iff x = 0. For x, d ∈ [0,U], let us define

 T F (x, d) if d ∈ [0, zT T(x − BT , x − BT ] , j ∈ [0,T − 1];  j x j+1 j  T T T T +1 T +1 T ∆ Gj (d) if d ∈ [zx ,U] (Aj ,Aj+1 ] , j ∈ [−1,T − 1]; g (x, d) = T G (0) if d = 0 , x = 0;  −1  T F T −1(U, 0) if d = 0 , x = U. For later proofs, it will be convenient to note that g may be equivalently expressed as follows.

( T T F ΥT −1 (x, d) if d < zx ; gT (x, d) = x−d T T G T (d) if d ≥ z . Γd −1 x In that case, we may derive the following properties of gT (x, d), whose proofs we defer to the Technical Appendix in Section7. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 22

T T T Lemma 7. For all T ≥ 1, x, d ∈ [0,U], g (x, d) = g max βd , x − d , d .

Lemma 8. For each ﬁxed x ∈ [0,U], gT (x, d) is a continuous function of d on (0,U), and a right T T (left) - continuous function of d at 0 (U). Also, g (x, d) is a convex function of d on (0, zx ), and T a concave function of d on (zx ,U). ∆ T ∆ T T ∆ T T ∆ T T ∆ T Also, for x, d ∈ [0,U], let us deﬁne ηd = x − d, αx = A T −1 , ζx = Γ T , Ax = A T −1 , ℵx = A T −1 , Γx zx ζx Υx

T ∆ T −1 f (x, d) = b(d − x)+ + (x − d)+ + g (x, d). Note that for all j ∈ [−1,T − 1], j + 1 BT = AT j+1 b + T j+1 j + 1 ≤ AT = AT . (3.10) b + j + 1 j+1 j Combining with (3.9) and a straightforward contradiction argument, we conclude that for all x ∈ [0,U] and T ≥ 2, T T T T T T Υx ≥ Γx ≥ ζx , and ℵx ≥ αx ≥ Ax . (3.11) In that case, we may derive the following structural properties of fT , whose proofs we defer to the Technical Appendix in Section7. Lemma 9. For each fixed x ∈ [0,U], fT (x, d) is a continuous function of d on (0,U), and a right (left) - continuous function of d at 0 (U). Also, for all j ∈ [−1,T − 2], fT (x, d) is a convex function T T T T of d on (Aj ,Aj+1). Furthermore, f (x, d) is a convex function of d on (0, Ax ), and a concave T function of d on (αx ,U). We now introduce one last preliminary result, which will provide a mechanism for certifying a distribution as the solution to a given distributionally robust optimization problem, whose proof we again defer to the Technical Appendix in Section7. Lemma 10. Suppose f : [0,U] → R is any bounded real-valued function with domain [0,U]. Sup- ∆ f(R)−f(L) Rf(L)−Lf(R) pose 0 ≤ L ≤ µ < R ≤ U, and the linear function η(d) = R−L d + R−L , i.e. the line intersecting f at the points L and R, satisfies η(d) ≥ f(d) for all d ∈ [0,U], i.e. lies above f on R−µ µ−L [0,U]. Then the measure q s.t. q(L) = R−L , q(R) = R−L , belongs to arg maxQ∈M(µ) EQ[f(D)]. With all of the above notations, definitions, and preliminary results in place, we now proceed with the proof of Theorem3, which will follow from the following result. T T T ˆT Theorem 7. For all x, d ∈ [0,U]Q and T ≥ 1, g (x, d) =g ˆ (x, d), and qx,d ∈ Q (x, d). Proof: We proceed by induction, beginning with the base case T = 1. It follows from Observation 0 0 1 1 U 1 U 1 3 that for all x, µ ∈ [0,U]Q,Γ = Υ = B 0 = 0, Γ = I(µ > ), β = U × I(µ > ), g (x, µ) = µ x Γµ µ b+1 µ b+1 1 b+1 1 µ 1 µ G0(x, µ) = (1 − U µ)x + bµ, and qx,µ(0) = 1 − U , qx,µ(U) = U . The desired result then follows directly from the results of Shapiro [119], and we omit the details.

Now, suppose that the result is true for all s ∈ [1,T − 1] for some T ≥ 2. We now prove that the results hold also for T . It follows from the induction hypothesis, and Lemmas4,6, and7 that for all x, d ∈ [0,U]Q, ˆ T −1 T −1 T −1 ˆ T −1 T −1 ˆT T V (x, d) = g max βd , x , d , V (x−d, d) = g (x, d) , and f (x, d) = f (x, d). (3.12) Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 23

From deﬁnitions, to prove the desired induction, it thus suﬃces to demonstrate that

T T T T g (x, d) = sup EQ[f (x, D)] , and qx,d ∈ arg max EQ[f (x, D)]. (3.13) Q∈M(d) Q∈M(d)

We now use Lemma9 to explicitly construct (for each ﬁxed value of x) a family F of lines {Li}, T T s.t. each line Li lies above f (x, d) for all d ∈ [0,U], each line Li intersects f (x, d) at exactly two 1 2 1 2 points pi , pi , and for each µ ∈ [0,U] there exists Li ∈ F s.t. µ ∈ [pi , pi ]. Our construction will ultimately allow us to apply Lemma 10, explicitly solve the distributionally robust optimization T problem supQ∈M(d) EQ[f (x, D)], and complete the proof. We begin by explicitly constructing the family of lines F. For d ∈ R, let us deﬁne fT (x, ℵT ) − T x KT (x, d) =∆ x d + T x; T ℵx and for j ∈ [−1,T − 2],

T T T T T T T T T T T ∆ f (x, Aj+1) − f (x, Aj ) Aj+1f (x, Aj ) − Aj f (x, Aj+1) Lj (x, d) = T T d + T T . Aj+1 − Aj Aj+1 − Aj

T −1 Noting that F j (x, 0) = (T − 1)x for all j, it follows from Lemma7 that

fT (x, 0) = T x. (3.14)

It may be easily verified, using (3.14), that KT defines the unique line passing through the (x, y) co- T T T T T ordinates 0, f (x, 0) and ℵx , f (x, ℵx ) ; and Lj defines the unique line passing through the (x, y) T T T T T T co-ordinates Aj , f (x, Aj ) and Aj+1, f (x, Aj+1) . Then we have the following result, showing T T T T that K lies above f , and that L` lies above f for certain values of `, whose proof we defer to the Technical Appendix in Section7. Lemma 11. For each fixed x ∈ [0,U], KT (x, d) ≥ fT (x, d) for all d ∈ [0,U]. Also, for all ` ∈ T −1 T T [Υx ,T − 2], L` (x, d) ≥ f (x, d) for all d ∈ [0,U]. T T That qx,d ∈ arg maxQ∈M(d) EQ[f (x, D)] follows from Lemmas 10 and 11, definitions, and a straightforward case analysis, the details of which we omit. We now prove that gT (x, d) = T T −1 T −1 supQ∈M(d) EQ[f (x, D)], and proceed by a case analysis. Let j = Γd and k = Υx . First, suppose T d ∈ [0,Ak ]. In light of Lemmas 10 and 11, in this case it suffices to demonstrate that

gT (x, d) = KT (x, d). (3.15)

In this case, the left-hand side of (3.15) equals

T b + T Gk (x, d) = (T − T d)x + (T − k)bd. (3.16) Ak Alternatively, from (7.62), the right-hand side of (3.15) equals

(b + T )x b(T − k) − T d + T x. (3.17) Ak Noting that (3.16) equals (3.17) completes the proof in this case. T Alternatively, suppose d > Ak . In this case it suﬃces to demonstrate that

T T g (x, d) = Lj−1(x, d). (3.18) Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 24

Note that the left-hand side of (3.18) equals

T T Fj−1(x, d) = −bx + (b + T )Bj + T b − (b + 1)j d. (3.19)

Alternatively, it follows from (3.9) and (3.10) that the right-hand side of (3.18) equals

T T T T T T T T T T f (x, Aj ) − f (x, Aj−1) Aj f (x, Aj−1) − Aj−1f (x, Aj ) T T d + T T Aj − Aj−1 Aj − Aj−1 T −1 T T −1 T T T −1 T T T −1 T Gj−1 (Aj ) − Gj−2 (Aj−1) Aj Gj−2 (Aj−1) − Aj−1Gj−1 (Aj ) = b(d − x) + T T d + T T . (3.20) Aj − Aj−1 Aj − Aj−1

T −1 T T −1 T We now simplify (3.20). Note that Gj−1 (Aj ) − Gj−2 (Aj−1) equals T −1 T −1 T T T (T − 1)(Bj − Bj−1 ) + (T − 1)b − (b + 1)j (Aj − Aj−1) − (b + 1)Aj−1 j j − 1 = (T − 1)b − (b + 1)j AT − AT + (T − 1) AT − AT − (b + 1)AT j j−1 T − 1 j T − 1 j−1 j−1 T T = (T − 1)b − (b + 1)j Aj − Aj−1 , (3.21)

T T −1 T T T −1 T and Aj Gj−2 (Aj−1) − Aj−1Gj−1 (Aj ) equals

T T −1 T T −1 T T (T − 1) Aj Bj−1 − Aj−1Bj + (b + 1)Aj−1Aj j − 1 j = (T − 1) AT AT − AT AT + (b + 1)AT AT j j−1 T − 1 j−1 j T − 1 j−1 j T T T T T = bAj−1Aj = jAj Aj − Aj−1 . (3.22)

Plugging (3.21) and (3.22) into (3.20), and comparing to (3.19), completes the proof of (3.13), and the desired induction. With Theorem7 in hand, we now complete the proof of our main result Theorem3, as well as Corollaries1 and2. Proof: [Proof of Theorem3] Lemma2 follows from Theorem7 and Lemmas4 and6, and it follows from definitions and a straightforward case analysis (the details of which we omit) that T T T OptMAR(µ, U, b) = g (βµ , µ). Theorem3 then follows by combining Lemmas4 and6 with Theorem 7,(3.12), Lemma2, and Theorem2. T T T Proof: [Proof of Corollaries1 and2] First, note that if Γ µ = T , then χMAR µ, U, b = BT = T T T T AT −1 = X1 = D1 = U, proving the desired result in this case. Next, suppose Γµ ≤ T − 1. In light T T T of the definitions of Xt and Dt , i.e. (2.8), the already established monotonicities of {Aj , j ∈ T [−1,T −1]} and {Bj , j ∈ [−1,T ]}, and the martingale property, it suffices to establish that for all t ∈ T T +1−t T T T T +1−t T +1−t T [1, Λ ]: 1. χMAR Dt−1, U, b = Xt , 2. ∃j ∈ [−1,T − 1], k ≥ j + 1 s.t. Dt−1 ∈ (Aj ,Aj+1 ],Xt ∈ T +1−t T +1−t T +1−t T [Bk ,Bk+1 ),Ak = Dt .

T T T First, suppose t = 1. In this case, Dt−1 = µ ∈ (A T −1 ,A T −1 ], Γµ −1 Γµ

T +1−t T T T T χ D , U, b = χ µ, U, b = B T = X , MAR t−1 MAR Γµ t

T T τ τ and D = A T . Note that since A is monotone decreasing in τ, it follows that Γ is monotone t Γµ j µ T T −1 T −1 increasing in τ, and thus Γµ ≥ Γµ . Thus we ﬁnd that 1. holds, and 2. holds with j = Γµ −1, k = T Γµ . Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 25

T T ∆ T +2−t Next, suppose t ∈ [2, Λ ]. In this case, Dt−1 = z = A T . Note that Γµ

T +2−t T +2−t z ∈ (A T ,A T ], (3.23) Γµ −1 Γµ

T +1−t T and thus Γz = Γµ . Furthermore, T +1−t T T +1−t T +1−t T χMAR Dt−1, U, b = B T +1−t = B T = Xt . Γz Γµ τ In addition, the fact that Aj is monotone decreasing in τ and monotone increasing in j, combined T +1−t T +1−t 0 T with (3.23), implies that z ∈ (Aj0−1 ,Aj0 ] for some j ≤ Γµ . Thus we ﬁnd that 1. holds, and 2. 0 T holds with j = j − 1, k = Γµ , completing the proof. 4. Asymptotic analysis and proof of Theorem4 In this section, we complete the proof of Theorem4, and begin with a lemma which asserts the (uniform) convergence of several one-dimensional functions to appropriate limits. As all of these convergences follow from repeated application of elementary bounds such as Taylor’s Inequality for the exponential function, and the continuous diﬀerentiability of all relevant functions over appropriate domains, we omit the details. For α ∈ [0, 1), let f(α) =∆ b (1 − α)b−1, where we note that f(α) = Z∞(α) for α ∈ [0, Λ∞). Lemma 12. For all µ ∈ (0,U),

1 −1 T b T +1 −1 T ∞ lim T Γµ = γ , lim A T = µ , lim T Λ = Λ ; T →∞ T →∞ Γµ T →∞

∆ T ∆ ∞ For all T ≥ 1, α ≥ 0, let mα(T ) = min(bαT c, Λ − 1), and mα = min(α, Λ ). For α1, α2 ≥ 0 s.t. T ∆ T T ∞ ∆ ∞ ∞ α < m , let us deﬁne X = X I bα T c + 1 ≤ Z ≤ m (T ) , and X = X ∞ I α ≤ Z < 1 α2 α1,α2 ZT 1 α2 α1,α2 Z 1 ∞ ∆ mα2 . Also, for µ ∈ (0,U), note that f is continuous on [0, Λ ], and let f = supα∈[0,Λ∞] f(α) < ∞. In light of Lemma 12, it follows that lim sup sup T ZT (t) = f. (4.24) T →∞ t∈[1,ΛT −1] Let us also recall that X∞ and D∞ are continuous and strictly increasing on [0, Λ∞]. Before embarking on the proof of Theorem4, it will be useful to ﬁrst establish the weak convergence of XT to X∞ . α1,α2 α1,α2 For µ ∈ (0,U), α , α ≥ 0 s.t. α < m , XT converges weakly to X∞ . Further- Lemma 13. 1 2 1 α2 α1,α2 α1,α2 more, for any bounded uniformly continuous function H : [0,U] → R,

mα2 (T ) Z mα2 X T T ∞ lim H Xk Z (k) = H (Xz ) f(z)dz. (4.25) T →∞ α1 k=bα1T c+1 Proof: By the Portmanteau Theorem (cf. Theorem 2.1 in Billingsley [27]), to prove weak convergence, it suﬃces to demonstrate (4.25). Let H denote an upper bound for |H|. It follows from Lemma 12,(4.24), and the fact that |ab − cd| ≤ |a + c||b − d| + |b + d||a − c| that for any > 0, there exists T < ∞ s.t. for all T ≥ T, and k ∈ [bα1T c + 1, mα2 (T )],

T T ∞ k T ∞ H Xk T Z (k) − H(X k )f( ) ≤ 2H + 4f sup |H Xk − H(X k ) . T T T k∈[bα1T c+1,mα2 (T )] Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 26

It follows from Lemma 12 that there similarly exists T1, < ∞ s.t. for all T ≥ T1,, sup |XT − X∞ < . Combining the above with the fact that H is uni- k∈[bα1T c+1,mα (T )] k k 2 T formly continuous implies that for any > 0, there exists T2, < ∞ s.t. for all T ≥ T2,, sup H (XT ) T ZT (k)−H(X∞) ( k ) < . Thus for any > 0, there exists T < ∞ k∈[bα1T c+1,mα (T )] k k f T 3, 2 T m (T ) m (T ) −1 P α2 T T −1 P α2 ∞ k s.t. for all T ≥ T3,, T k=bα T c+1 H (Xk ) T Z (k) − T k=bα T c+1 H(X k )f( T ) < . Not- 1 1 T m (T ) −1 P α2 ∞ k R mα2 ∞ ing that Riemann integrability implies limT →∞ T k=bα T c+1 H(X k )f( T ) = α H (Xz ) f(z)dz 1 T 1 completes the proof. We now complete the proof of Theorem4, and proceed by the standard path of demonstrating appropriate notions of tightness and convergence of ﬁnite-dimensional distributions. We begin by providing some additional background regarding the space D ([0, 1], Rk) and associated notions of weak convergence. Recall that the space D ([0, 1], Rk) contains all functions x : [0, 1] → Rk that are right-continuous in [0, 1) and have left-limits in (0, 1] (i.e. cadlag on [0, 1]). Given such a cadlag k ∆ ∆ Pk function x = (x1, . . . , xk) : [0, 1] → R , let ||xj|| = supt∈[0,1] |xj(t)|, and ||x|| = j=1 ||xj||. Given a m m set {αi}i=0 satisfying 0 = α0 < α1 < . . . < αm = 1, we say that {αi}i=0 is δ-sparse if min1≤i≤m(αi − αi−1) ≥ δ. Let Sδ,m denote the collection of all such δ-sparse sets consisting of exactly m points. Given x which is cadlag on [0, 1] and δ ∈ (0, 1), let

k ! ∆ X 0 wx(δ) = inf max sup |xj(α) − xj(α )| , m∈Z+,{α }∈S 1≤i≤m 0 i δ,m j=1 α,α ∈[αi−1,αi) where the infimum extends over all δ-sparse sets {αi} of all possible finite lengths (i.e. values of m), where we note that one must only consider values of m which are at most dδ−1e + 1. Let us recall the following necessary and sufficient conditions for tightness in the space D ([0, 1], Rk).

Lemma 14 (Theorem 13.2, Billingsley [27]). A sequence of probability measures {PT }T ≥1 k on the space D ([0, 1], R ) under the J1 topology is tight if and only if the following conditions hold:

lim lim sup PT (||x|| ≥ a) = 0; (4.26) a→∞ T →∞

For all > 0 , lim lim sup PT (wx(δ) ≥ ) = 0. (4.27) δ→0 T →∞ T We now use Lemma 14 to prove tightness of the sequence {M (α)0≤α≤1,T ≥ 1}, by verifying (4.26) and (4.27).

T 2 Lemma 15. For µ ∈ (0,U), {M (α)0≤α≤1,T ≥ 1} is tight in D ([0, 1], R ). Proof: (4.26) follows immediately from the fact that MT (α) is supported on [0,U]2 w.p.1 for all α ∈ [0, 1], T ≥ 1. Let us prove (4.27). Recall from Corollary2 that we may construct r.v.s ZT ,Y T on the same probability space as MT s.t. the properties outlined in Corollary2 hold. It follows from Lemma 12, and the Riemann integrability of f, that there exists δ0 > 0 s.t. for all δ ∈ (0, δ0), 0 and δ ∈ (0, δ), there exists Tδ0 s.t. for all T ≥ Tδ0 ,

∞ ∞ ∞ ∆ Note that X ,D , f are continuously diﬀerentiable on [0, Λ + 0] for some 0 > 0. Let D = d ∞ d ∞ d T 32 1+sup ∞ max(| D |, | X |, | |) . Also note that Lemma 12 implies that {Y ,T ≥ 0≤x≤Λ +0 dx x dx x dx f 1} converges weakly to the r.v. which equals U w.p.1. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 27

It thus follows from (4.28) and the triangle inequality that for all δ ∈ (0, δ0), there exists Tδ s.t. for all T ≥ Tδ, sup max XT − XT , DT − DT ≤ Dδ , |XT − U| ≤ Dδ, (4.29) bα2T c bα1T c bα2T c bα1T c ΛT 0≤α1≤α2≤1 T bα2T c≤Λ |α1−α2|≤2δ

ZT ZT [ min( , 1 − ) ≤ 4δ} {Y T = 0} ≤ Dδ. P T T As we will be referring to various sample-path constructions in the remainder of the proof, for a given ω ∈ Ω (on the corresponding probability space), we will denote the dependence of various r.v.s on ω by adding ω as a subscript or superscript where appropriate. Given δ ∈ (0, δ0) and T ≥ Tδ, ZT,ω ZT,ω T,ω let Ωδ,T ⊆ Ω denote that subset of Ω s.t. ω ∈ Ωδ,T iff min( T , 1 − T ) > 4δ and Y = U. Let us fix some > 0, δ ∈ (0, δ0),T ≥ Tδ, ω ∈ Ωδ,T . For the given parameters, we now define a particular ZT,ω δ-sparse mesh. Note that for the given ω, there exists kω s.t. kωδ ≤ T < (kω + 1)δ, where 0 < ∆ ZT,ω ∆ (kω −2)δ < (kω +2)δ < 1. Also, let n1,ω = sup{n : T +nδ < 1−δ}, and nω = kω +n1,ω +2. Then we ω ω ω let {αi , i ∈ [1, nω]} denote the following δ-sparse mesh. Let α1 = 0. For k ∈ [1, kω − 1], let αk = kδ. ω ZT,ω ω ZT,ω ω Let αkω = T . For k ∈ [kω + 1, kω + 1 + n1,ω], let αk = T + (k − kω)δ. Finally, we let αnω = 1. Note that ZT,ω = αω = inf{α ∈ [0, 1] : bαT c ≥ ZT,ω}, (4.30) T kω 0 ω ω 0 i ∈ [2, nω + 1], α, α ∈ [αi−1, αi ) implies |α − α | ≤ 2δ. We treat two cases. First, suppose ZT,ω ≤ ΛT − 1. In this case, Corollary2 implies that ( XT ,DT if bαT c ≤ ZT,ω − 1; MT,ω(α) = bαT c bαT c T T,ω XZT,ω , 0 if bαT c ≥ Z .

It thus follows from (4.29) and (4.30) that

2 X T,ω T,ω 0 max sup |Mj (α) − Mj (α )| ≤ 2Dδ, i∈[2,kω] 0 j=1 α,α ∈[αi−1,αi)

2 X T,ω T,ω 0 max sup |Mj (α) − Mj (α )| = 0, i∈[kω+1,nω+1] 0 j=1 α,α ∈[αi−1,αi)

ω and thus in this case wx (δ) ≤ 2Dδ. Alternatively, suppose ZT,ω = ΛT . In this case, Corollary2 implies that  XT ,DT if bαT c ≤ ΛT − 1;  bαT c bαT c MT (α) = T T XΛT ,U if bαT c = Λ ; U, U if bαT c ≥ ΛT + 1.

It again follows from (4.29) and (4.30) that

2 X T,ω T,ω 0 max sup |Mj (α) − Mj (α )| ≤ 2Dδ. i∈[2,kω] 0 j=1 α,α ∈[αi−1,αi) Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 28

0 T,ω T,ω 0 Now, note that for i ∈ [kω + 1, nω + 1] and α, α ∈ [αi−1, αi), |M2 (α) − M2 (α )| = 0, and T,ω T,ω 0 T |M1 (α) − M1 (α )| ≤ |XΛT − U| ≤ Dδ, with the final inequality following from (4.29). Thus in ω this case, it again holds that wx (δ) ≤ 2Dδ. To complete the proof of (4.27), we need only carefully combine all of the above. In particular, let ω us fix any > 0. It follows from (4.29) that for all δ ∈ (0, δ0) and T ≥ Tδ, P(Ωδ,T ) ≥ 1 − Dδ, wx (δ) ≤ 2Dδ for all ω ∈ Ω , and thus wω(δ) > 2Dδ ≤ Dδ. We conclude that for all δ < min(δ , ) and δ,T P x 0 3D ω T ≥ Tδ, P wx (δ) ≥ ≤ Dδ, from which (4.27) follows after taking limits in the appropriate order. Next, we prove the following weak convergence of finite-dimensional distributions. ∞ T T Lemma 16. For µ ∈ (0,U), and 0 = α1 < . . . < αk0 = Λ < . . . < αn = 1, (M (α1),..., M (αn)) ∞ ∞ converges weakly to (M (α1),..., M (αn)). Proof: By the Portmanteau theorem, it suffices to prove that for any bounded uniformly continuous function H : [0,U]2n → R, T T ∞ ∞ lim H M (α1),..., M (αn) = [H (M (α1),..., M (αn))] . (4.31) T →∞ E E As a notational convenience, we define for i ∈ [1, n],

H∞ (i, x, y) =∆ H X∞ ,D∞ ,...,X∞,D∞, x, y, x, y, . . . , x, y , α1 α1 αi αi HT (i, x, y) =∆ H XT ,DT ,...,XT ,DT , x, y, x, y, . . . , x, y , bα1T c bα1T c bαiT c bαiT c Hˆ T (i, x, y) =∆ H XT ,DT ,...,XT ,DT , x, y, y, y, . . . , y, y . bα1T c bα1T c bαiT c bαiT c Then it follows from the deﬁnition of M∞(α) that

k −1 0 Z αi+1 ∞ ∞ X ∞ ∞ ∞ E [H (M (α1),..., M (αn))] = H (i, Xz , 0) f(z)dz + γH (k0, U, U) . i=1 αi

T T It follows from Lemma 12 that there exists T0 s.t. for all T ≥ T0, bαk0−1T c < Λ − 1 < Λ < bαk0+1T c. From Corollary2, for all T ≥ T0,

T T T T T T E H M (α1),..., M (αn) = W1 + W2 + W3 + W4 , where

min{bα T c,ΛT −1} k0−2 bαi+1T c k0 T ∆ X X T T T X T T T W1 = H i, Xj , 0 P(Z = j) + H k0 − 1,Xj , 0 P(Z = j), i=1 j=bα T c+1 j=bα T c+1 i k0−1 ΛT −1 T ∆ T X T T T W2 = I bαk0 T c < Λ − 1 H k0,Xj , 0 P(Z = j), j=bαk T c+1 " 0 T ∆ T ˆ T T T T W3 = I bαk0 T c = Λ H k0 − 1,XΛT ,U + I bαk0 T c > Λ H (k0 − 1, U, U) # T T T T T + I bαk0 T c < Λ H (k0, U, U) P Z = Λ P Y = U , " T ∆ T T T W4 = I bαk0 T c ≥ Λ H k0 − 1,XΛT , 0 # T T T T T T + I bαk0 T c < Λ H k0,XΛT , 0 P Z = Λ P Y = 0 . Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 29

T Pk0−1 R αi+1 ∞ ∞ Lemma 13 implies that limT →∞ W = H (i, X , 0) f(z)dz, and it follows from Lemma 1 i=1 αi z T ∞ T 12 and the uniform continuity of H that limT →∞ W4 = γH (k0, U, U), and limT →∞ W2 = T limT →∞ W3 = 0. Combining the above completes the proof of (4.31). Proof: [Proof of Theorem4] Combining Lemmas 14, 15, and 16 completes the proof of Theorem 4.

5. Numerical experiments In this section, we conduct numerical experiments to further demonstrate the benefits of the pro- posed policy. More specifically, we aim to show the benefits of a robust policy which accounts for martingale structure over a robust policy which assumes that the demand is drawn independently over time. To that end, we will compare the performance of the minimax optimal policy for the independent-demand model to the performance of the minimax optimal policy for the martingale- demand model, when the demand itself comes from a simple additive-MMFE model (and has the martingale property). Indeed, in line with the numerical experiments conducted in Mamani, Nassiri and Wagner [92] (in which the joint distribution of demand was multivariate normal), we Pn assume that the true demand process is the following martingale: Dn = µ + t=1 t, where µ is a constant and {t}t≥1 is a sequence of i.i.d. r.v.s representing forecast adjustments, i.e. the demand comes from a simple additive-MMFE model. We further assume that {t, t ∈ [1,T ]} are i.i.d., normally distributed with mean 0 and variance σ2. We note that such an additive-MMFE model with normally distributed increments is common in the literature (e.g., Heath and Jackson [71], Lu, Song and Regan [90], Wang, Atasu and Kurtulus [133]), making it a good reference for numerical comparisons. As a base-line we assume that the demand distribution has a per-period average of 10, i.e., µ = 10, and the standard deviation σ equals either 0.1µ (relatively low) or 0.2µ (relatively high). We normalize the holding cost at h = 1 and vary backorder costs in the specified range, b namely, the service level b+h is varied from 10% (low) to 90% (high). The finite time horizon T is varied from relatively short (T = 3) to relatively long (T = 20). We note that although both policies we consider will be seeded by an upper-bound parameter U (as required in the definition of the policy),which we will vary from U = 15 to U = 25, the demand process itself in no cases depends on U, and may go above U (and may actually go below zero, in which case the demand actually adds to the inventory, although we note that such negative demand will be a relatively rare occurence under most of our parameter settings). As any particular model assumptions may not hold in any given practical setting, it is in some sense natural to consider the performance when such a support assumption may fail. Recall that the minimax optimal policy in the independent-demand setting (from Theorem1) is a base-stock policy with base-stock level

χIND(µ, U, b), which we note is independent of time or the realized demands, and in any case is either always 0 or always U (depending on the relation of µ, U, and b). Thus this policy can be implemented with no alterations even if demand leaves the interval [0,U]. Indeed, we let PolicyIND denote exactly this policy, which we shall use in our numerical experiments. Things are slightly more complicated in the martingale-demand setting. Indeed, recall that ∗ ∗ the minimax optimal policyπ ˆ in the martingale-demand setting (from Theorem3) sets ˆx1 = T ∗ ∗ T −t max x0, χMAR(µ, U, b) ; and For t ∈ [1,T −1], setsx ˆt+1(d[t]) = max xˆt (d[t−1])−dt, χMAR(dt, U, b) . ∗ T −t Formally,π ˆ is a state-and-time-dependent base-stock policy, ordering up to level χMAR(Dt, U, b) in period t + 1 for t ∈ [1,T − 1], where we note that this level depends both on time and the most recent demand value. Here, a difficulty arises if for some t it holds that Dt ∈/ [0,U], since in that case T −t χMAR(Dt, U, b) is not defined. We remedy this by defining a policy PolicyMAR as follows. In period T ∗ 1, the policy orders up to level χMAR(µ, U, b), consistent withπ ˆ . For t ∈ [1,T − 1], if Dt ∈ [0,U], T −t ∗ then at the start of period t + 1 the policy orders up to level χMAR(Dt, U, b), consistent withπ ˆ . However, if for some t ∈ [1,T − 1] it holds that Dt < 0, then at the start of period t + 1 the policy orders up to 0. Alternatively, if for some t ∈ [1,T − 1] it holds that Dt > U, then at the start of Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 30

period t + 1 the policy orders up to U. We note that this is in some sense the simplest and most natural way to reconcile this discrepancy, and it is this (slightly modiﬁed) policy PolicyMAR that we use in our numerical experiments.

In summary, the goal of our numerical experiment is to compare the performance of PolicyMAR and PolicyIND when the demand process is a Gaussian random walk (i.e. additive-MMFE). For each parameter setting, we perform 106 simulations, with our table displaying the average cost incurred under each of the two policies for each parameter setting. In all cases we set the initial inventory level to 0. Although (for clarity of exposition) we do not include a formal analysis of variance or conﬁdence intervals, we note that our experiments suggested that the outcomes were generally quite stable. The following table summarizes the diﬀerent parameter settings which we will consider.

Parameters h b T U {t, t ∈ [1,T ]} µ σ x0 1 1 2 Values 1 9 , 4 , 1, 4, 9 3, 10, 20 15, 20, 25 i.i.d. Normal(0, σ ) 10 0.1µ, 0.2µ 0 Table 1 Summary of parameters in the numerical experiments.

We report our numerical results in Table2. For each set of parameter values, we report the following three numbers: the average cost incurred by PolicyMAR (denoted by CMAR), the average cost incurred by PolicyIND (denoted by CIND), and the percentage of cost reduction deﬁned as CIND−CMAR CIND−CMAR . Each cell contains the corresponding triplet (CMAR,CIND, ). CIND CIND Let us summarize our ﬁndings. First, and most importantly, in all cases the cost incurred by

PolicyMAR is no greater than the cost incurred by PolicyIND, and the percentage of cost reduction can be as large as 64%. This dominance, and (for some parameter settings) substantial improvement, suggest the importance of accounting for possible dependencies when implementing robust inventory control policies. We note that in the deterministic robust optimization setting, related numerical results were presented in [92]. Next, we comment on how the parameter choices impact the improvement. First, we note that for several parameter settings, the two policies perform nearly identically. This stems from the fact that for certain parameters, both policies will with high probability either always order up to U or always order up to 0. Although PolicyIND always exhibits such a behavior, a careful analysis of Theorem3 and Policy MAR shows that for certain values of T, t, Dt,U, and b, the thresholds of PolicyMAR will exhibit the same behavior. For example, we proved this equivalence for sufficiently large b in our previous Corollary4, in which case all relevant thresholds will be U (with high probability). A similar behavior can be demonstrated as regards settings in which both policies set the relevant thresholds to 0 (again with high probability in the case of PolicyMAR). We note that such a phenomena occurs almost exclusively when T = 3 and/or b takes either very small or very large values, which is consistent with the parameter regimes under which such agreement would occur (as indicated in Corollary4). Indeed, Corollary4 indicates that such agreement would occur U ∆ (for example) if b > ( − 1) × (T − t) for all t ∈ [0,T − 1] (here letting D0 = µ), which is more likely Dt to occur when b is large and/or T is small. Again, we note that a similar phenomena can occur for alternative parameter ranges in which case both policies order up to 0 instead of ordering up to U. We note that when T = 3, this behavior is partially mitigated by the presence of a larger variance, i.e. when T = 3 the larger variance tends to yield a larger gap between the policies, as the larger variance can lead to realized demands which break the conditions needed for these degeneracies. Second, we note that our results seem to be fairly insensitive to U, σ, and T, so long as T ≥ 10. Intuitively, for any such parameters the conditional expectation of demand will with high probability fluctuate considerably, leading to a significant advantage for PolicyMAR, which utilizes this Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 31

U = 15, σ = 0.1µ T = 3 T = 10 T = 20 b = 1/9 (3.333, 3.333, 0.00%) (10.19, 11.11, 8.28%) (18.66, 22.52, 17.1%) b = 1/4 (5.991, 7.500, 20.1%) (18.13, 25.00, 27.5%) (33.03, 50.31, 34.3%) b = 1 (11.66, 15.00, 22.3%) (26.57, 50.49, 47.4%) (52.31, 104.6, 50.0%) b = 4 (15.01, 15.01, 0.00%) (40.17, 51.22, 21.6%) (66.47, 111.1, 40.2%) b = 9 (15.01, 15.01, 0.00%) (51.77, 52.44, 1.28%) (99.89, 122.0, 18.1%) U = 20, σ = 0.1µ T = 3 T = 10 T = 20 b = 1/9 (3.333, 3.333, 0.00%) (11.09, 11.11, 0.18%) (21.99, 22.52, 2.35%) b = 1/4 (7.500, 7.500, 0.00%) (23.67, 25.00, 5.32%) (44.32, 50.31, 11.9%) b = 1 (18.88, 30.00, 37.1%) (43.82, 100.0, 56.2%) (84.20, 200.4, 58.0%) b = 4 (30.00, 30.00, 0.00%) (53.39, 100.0, 46.6%) (84.69, 200.6, 57.8%) b = 9 (30.00, 30.00, 0.00%) (87.81, 100.0, 12.2%) (112.7, 201.0, 43.9%) U = 25, σ = 0.1µ T = 3 T = 10 T = 20 b = 1/9 (3.333, 3.333, 0.00%) (11.11, 11.11, 0.00%) (22.48, 22.52, 0.18%) b = 1/4 (7.500, 7.500, 0.00%) (24.92, 25.00, 0.32%) (48.60, 50.31, 3.40%) b = 1 (17.91, 30.00, 40.3%) (55.17, 100.0, 44.8%) (107.1, 200.4, 46.6%) b = 4 (41.34, 45.00, 8.13%) (64.92, 150.0, 56.7%) (107.4, 300.2, 64.2%) b = 9 (45.00, 45.00, 0.00%) (96.36, 150.0, 35.8%) (135.3, 300.2, 54.9%) U = 15, σ = 0.2µ T = 3 T = 10 T = 20 b = 1/9 (3.211, 3.336, 3.75%) (9.911, 12.35, 19.7%) (37.32, 44.60, 16.3%) b = 1/4 (5.476, 7.503, 27.0%) (16.76, 26.30, 36.3%) (49.90, 72.95, 31.6%) b = 1 (10.84, 15.33, 29.3%) (29.30, 57.90, 49.4%) (80.55, 150.1, 46.3%) b = 4 (15.77, 15.81, 0.25%) (53.94, 68.65, 21.4%) (151.1, 198.4, 23.8%) b = 9 (16.63, 16.63, 0.00%) (80.62, 86.58, 6.88%) (251.2, 278.9, 9.93%) U = 20, σ = 0.2µ T = 3 T = 10 T = 20 b = 1/9 (3.335, 3.336, 0.03%) (11.65, 12.35, 5.67%) (40.92, 44.60, 8.25%) b = 1/4 (7.445, 7.503, 0.77%) (22.23, 26.30, 15.5%) (58.62, 72.95, 19.6%) b = 1 (18.11, 30.00, 39.6%) (41.09, 101.6, 59.6%) (94.89, 226.0, 58.0%) b = 4 (29.63, 30.01, 1.27%) (56.73, 103.2, 45.0%) (130.3, 239.7, 45.6%) b = 9 (30.01, 30.02, 0.03%) (87.94, 105.6, 16.7%) (176.0, 261.5, 32.7%) U = 25, σ = 0.2µ T = 3 T = 10 T = 20 b = 1/9 (3.336, 3.336, 0.00%) (12.22, 12.35, 1.05%) (43.12, 44.60, 3.32%) b = 1/4 (7.503, 7.503, 0.00%) (24.89, 26.30, 5.36%) (65.22, 72.95, 10.6%) b = 1 (18.80, 30.00, 37.3%) (51.96, 101.6, 48.9%) (113.6, 226.0, 49.7%) b = 4 (40.48, 45.00, 10.0%) (68.18, 150.9, 54.8%) (139.8, 322.7, 56.7%) b = 9 (44.98, 45.00, 0.04%) (96.32, 151.2, 36.3%) (175.6, 327.5, 46.4%)

Table 2 Costs incurred by PolicyMAR and PolicyIND, and percentages of cost reduction. information. Indeed, for all such parameter settings, the gap fluctuates fairly consistently as a function of b, taking values near 50% for values of b closer to 1, and taking smaller values for other settings of b. We offer two possible explanations for this phenomena. First, we note that for the U parameters we consider, b being close to 1 tends to coincide with µ×(b+1) being close to 1. We note U that µ×(b+1) equals 1 exactly when PolicyIND is indifferent between always ordering up to 0, and U always ordering up to U. Our results suggest that it is exactly in this regime, namely when µ×(b+1) is close to 1, that the relative gap between the two policies tends to be largest. However, a more refined analysis reveals that when T and U are both large, this effect seems to Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 32

U be mitigated on one side, as the gap remains large even when µ×(b+1) is small, although the gap U shrinks when µ×(b+1) becomes large. To help explain this refinement, we reason alternatively as follows, building on our earlier discussion about how certain parameter values can lead to the two policies degenerating and behaving similarly. It follows from Theorem3 that in the first period µ QT k PolicyMAR orders up to 0 (i.e. does not order at all) if U ∈ [0, k=1 b+k ), and orders up to U if µ T U ∈ ( b+T , 1]. Of course, PolicyMAR may also order up to many intermediate values. However, the µ QT k 1 µ T above implies that if U ∈ 0, min( k=1 b+k , b+1 ) , or U ∈ b+T , 1 , then PolicyMAR and PolicyIND degenerate to a common policy (at least in the first period). Thus if b is relatively large, since QT k k=1 b+k will be extremely small, the primary manner for such a degeneracy/agreement to occur µ T U would be to have U ∈ b+T , 1 , equivalently b ≥ T × ( µ − 1). The only parameter settings we consider for which this condition holds either require T = 3, or T = 10,U = 15, b = 9. We note that indeed, the setting in which T = 10,U = 15, b = 9 is essentially the only setting in which T and b are moderately large, yet the gap between the policies remains very small. T Alternatively, if b is relatively small, since b+T is then close to 1, the primary manner for such µ QT k 1 a degeneracy/agreement to occur would be to have U ∈ [0, k=1 b+k ) (since b+1 is very close to 1 QT k for small values of b). As a Taylor-series approximation shows that k=1 b+k behaves roughly like U −b log( µ ) T for small b, this is (roughly) equivalent to requiring that b ≤ log(T ) . However, this inequal- 1 ity holds for many parameter settings involving large T, specifically : U = 15,T = 10, b = 9 ; U = 1 1 1 1 15,T = 20, b = 9 ; U = 20,T = 10, b = 9 ; U = 20,T = 10, b = 4 ; U = 20,T = 20, b = 9 ; U = 25,T = 1 1 1 1 10, b = 9 ; U = 25,T = 10, b = 4 ; U = 25,T = 20, b = 9 ; U = 25,T = 20, b = 4 . Thus we find that there is a substantial asymmetry between small and large values of b, with small values of b leading to more scenarios in which the two policies degenerate and behave identically, at least in the first period. Intuitively, this should lead to smaller gaps for smaller values of b, while the presence of a large gap should be more robust for larger values of b, exactly as reflected in our numerical experiments. Of course, the exact behavior of PolicyMAR (and associated thresholds) over time depends on the realized demand (not just µ), yet one might expect this intuition to (roughly) hold broadly over the entire time horizon.

In summary, we ﬁnd that PolicyMAR substantially outperforms PolicyIND for these settings, further demonstrating the utility of our model and the importance of taking dependencies into consideration when modeling robustness. Our experiments also suggest many interesting and subtle phenomena, whose full exploration we leave as an interesting direction for future research.

6. Conclusion In this paper, we formulated and solved a dynamic distributionally robust multi-stage newsvendor model which naturally uniﬁes the analysis of several inventory models with demand forecasting, and enables the optimizer to incorporate past realizations of demand into the structure of the uncertainty set going forwards. We explicitly computed the minimax optimal policy (and associated worst-case distribution) in closed form. Our main proof technique involved a non-trivial induction, combining ideas from convex analysis, probability, and DP. By analyzing in-depth the interplay between the minimax optimal policy and associated worst-case distribution, we proved that at optimality the worst-case demand distribution corresponds to the setting in which inventory may become obsolete at a random time, a scenario of practical interest. To gain further insight into our explicit solution, we computed the limiting dynamics as the time horizon grows large, by proving weak convergence to an appropriate limiting stochastic process, for which many quantities have a simple and intuitive closed form. We also compared to the analogous setting in which demand is independent across periods, and made several comparisons between these two models. Finally, we complemented our results by providing a targeted and concise numerical experiment further demonstrating the beneﬁts of our model. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 33

Our work leaves many interesting directions for future research. First, to make the model more practically applicable, it would be interesting to extend our results to more complicated dependency structures which robustify a larger class of forecasting models, e.g. that for which E[Dt+1|D[t]] = atDt + bt (i.e. general time-dependent linear conditional expectations). One could also incorporate higher order dependencies, non-linearities, as well as higher-dimensional covariate and feature spaces such as those arising in factor models. Similarly, one could consider inventory models with more realistic features, e.g. various notions of ordering costs, settings in which demand is lost, positive lead times, etc. Of course, our general approach could also be applied to many models used by the Operations Research community where forecasting and model uncertainty non-trivially interact. In all of these cases, it would be interesting to develop a more general theory of when the associated DP have a special structure which allows for a closed-form (or at least computationally tractable) solution. We note that such questions may also shed further light on the search for optimal probability inequalities in a variety of settings, e.g. bounds for functionals of martingale sequences and their generalizations. As suggested by our examples in Section 7.12, these generalizations will likely require several fundamentally new ideas and insights. Taking a broader view, perhaps the most interesting directions for future research involve explor- ing the relationship between different ways to model uncertainty in multi-stage optimization problems. What is the precise relationship between static and dynamic models of uncertainty, and how should we think about comparing and selecting between different models? This question involves notions of computational efficiency, model appropriateness in any given application, as well as questions related to the so-called price of correlations. Of course, such issues are intimately related to various relevant questions of a statistical nature. How precisely should our ability to fit and tune a model relate to our choice of uncertainty set? How does the notion of model over-fitting come into play here? Such questions become especially interesting in the context of multi-stage problems. Another intriguing line of questioning involves the relation between classical probability and (distributionally) robust optimization. For example, the obsolescence feature of the worst-case martingale in our setting can be interpreted as a “robust” manifestation of the celebrated martingale convergence theorem. A deeper understanding of the precise connection between the limit laws of classical probability and the solutions to distributionally robust optimization problems remains an intriguing open question, where we note that closely related questions have been the subject of several recent investigations (cf. Abernethy et al. [1], Bandi et al. [8,9]). Formalizing and analyzing these and related questions will likely require a combination of ideas and techniques from optimization, statistics, and probability, all of which bring to bear different perspectives with which to understand uncertainty.

Acknowledgments The authors would like to thank Erhan Bayraktar, Vishal Gupta, Garud Iyengar, Sandeep Juneja, Anton Kleywegt, Marcel Nutz, Alex Shapiro, and Alejandro Toriello for helpful discussions and insights. David A. Goldberg also gratefully acknowledges support from NSF grant no. 1757394.

References [1] Abernethy, J., P. Bartlett, R. Frongillo, A. Wibisono. 2013. How to hedge an option against an adversary: Black-scholes pricing is minimax optimal. Advances in Neural Information Processing Systems. 2346–2354. [2] Agrawal, S., Y. Ding, A. Saberi, Y. Ye. 2012. Price of correlations in stochastic optimization. Operations Research. 60(1) 150–162. [3] Aharon, B., G. Boaz, S. Shimrit. 2009. Robust multi-echelon multi-period inventory control. European Journal of Operational Research. 199(3) 922–935. [4] Ahmed, S., U. Cakmak, A. Shapiro. 2007. Coherent risk measures in inventory problems. European Journal of Operational Research. 182 226–238. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 34

[5] Altman, E., E. Feinberg, A. Shwartz. 2000. Weighted discounted stochastic games with perfect information. Ann. Internat. Soc. Dynamic Games. 5 303-323. [6] Badinelli, R. 1990. The inventory costs of common mis-specification of demand-forecasting models. The international journal of production research. 28(12) 2321–2340. [7] Bandi, C., D. Bertsimas. 2012. Tractable stochastic analysis in high dimensions via robust optimization. Mathematical programming. 134(1) 23-70. [8] Bandi, C., D. Bertsimas, N. Youssef. 2015. Robust queueing theory. Operations Research. 63(3) 676–700. [9] Bandi, C., D. Bertsimas, N. Youssef. 2018. Robust transient multi-server queues and feed-forward net- works. Queueing Systems. Forthcoming. [10] Bayraktar, E., S. Yao. 2014. On the Robust Optimal Stopping Problem. SIAM Journal on Control and Optimization. 52(5) 3135–3175. [11] Bayraktar, E., Y. Zhang, Z. Zhou. 2014. A note on the Fundamental Theorem of Asset Pricing under model uncertainty. Risks. 2(4) 425-433. [12] Beigblock, M., M. Nutz. 2014. Martingale Inequalities and Deterministic Counterparts. Electronic Jour- nal of Probability. 19(95) 1–15. [13] Beiglbock, M., M. Nutz, N. Touzi. 2017. Complete Duality for Martingale Optimal Transport on the Line. The Annals of Probability. 45(5) 3038-3074. [14] Bielecki, T., I. Cialenco, M. Pitera. 2014. On time consistency of dynamic risk and performance measures in discrete time. arXiv preprint arXiv:1409.7028. [15] Ben-Tal, A., L. El Ghaoui, A. Nemirovski. 2009. Robust Optimization. Princeton University Press, Princeton, NJ. [16] Ben-Tal, A., D. den Hertog, A. De Waegenaere, B. Melenberg, G. Rennen. 2013. Robust solutions of optimization problems affected by uncertain probabilities. Management Science. 59(2) 341-357. [17] Ben-Tal, A., B. Golany, A. Nemirovski, J.P. Vial. 2005. Retailer-supplier flexible commitments contracts: a robust optimization approach. Manufacturing & Service Operations Management. 7 248–271. [18] Bertsimas, D., X.V. Doan, K. Natarajan, C.P. Teo. 2010. Models for minimax stochastic linear optimization problems with risk aversion. Mathematics of Operations Research. 35(3) 580–602. [19] Bertsimas, D., D. Brown, C. Caramanis. 2011. Theory and applications of robust optimization. SIAM Review 53(3) 464–501. [20] Bertsimas, D., V. Gupta, N. Kallus. 2017. Robust sample average approximation. Mathematical Pro- gramming. Forthcoming. [21] Bertsimas, D., N. Kallus. 2014. From Predictive to Prescriptive Analytics. arXiv preprint arXiv:1402.5481. [22] Bertsimas, D., D. A. Iancu, P.A. Parrilo. 2010. Optimality of affie policies in multistage robust optimization. Mathematics of Operations Research. 35(2) 363-394. [23] Bertsimas, D., I. Popescu. 2005. Optimal inequalities in probability theory: A convex optimization approach. SIAM Journal on Optimization. 15(3) 780–804. [24] Bertsimas, D., M. Sim, M. Zhang. 2014. A practicable framework for distributionally robust linear optimization. Preprint. [25] Bertsimas, D., A. Thiele. 2006. A robust optimization approach to inventory theory. Operations Research. 54(1) 150-168. [26] Bertsimas, D. and Vayanos, P. 2015. Data-driven learning in dynamic pricing using adaptive optimization. Preprint. [27] Billingsley, P. 1999. Convergence of Probability Measures. Wiley, New York. [28] Blinder, A. 1983. Inventories and sticky prices: More on the microfoundations of macroeconomics. National Bureau of Economic Research. [29] Bonnans, J., A. Shapiro. 2013. Perturbation analysis of optimization problems. Springer Science and Business Media. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 35

[30] Box, G., G. Jenkins, G. Reinsel. 2011. Time series analysis: forecasting and control. Vol. 734. John Wiley and Sons. [31] Boute, R. N., S. M. Disney, J. A. Van Mieghem. 2018. Dual Sourcing and Smoothing under Non-Stationary Demand Time Series: Re-shoring with SpeedFactories. Available at SSRN: https://ssrn.com/abstract=3140083. [32] Brown, G., J. Lu, R. Wolfson. 1964. Dynamic modeling of inventories subject to obsolescence. Manage- ment Science. 11(1). 51–63. [33] Carrizosaa, E., A. Olivares-Nadal, P. Ramirez-Cobob. 2016. Robust newsvendor problem with autore- gressive demand. Computers & Operations Research. 68 123-133. [34] Cerag, P. 2010. Advances in Inventory Management: Dynamic Models. Erasmus Research Institute of Management (ERIM). No. EPS-2010-199-LIS. [35] Chao, X., X. Gong, C. Shi, H. Zhang. 2015. Approximation Algorithms for Perishable Inventory Systems. Operations Research. 63(3) 585–601. [36] Chen, F., Z. Drezner, J. Ryan, D. Simchi-Levi. 2000. Quantifying the bullwhip effect in a simple supply chain: The impact of forecasting, lead times, and information. Management Science. 46(3) 436–443. [37] Chen, Y., G. Iyengar. 2011. A new robust cycle-based inventory control policy. working paper. [38] Cheng, X., F. Riedel. 2013. Optimal stopping under ambiguity in continuous time. Mathematics and Financial Economics. 7(1) 29-68. [39] Choi, S., A. Ruszczynski. 2008. A risk-averse newsvendor with law invariant coherent measures of risk. Operations Research Letters. 36(1) 77–82. [40] Clark, A. J., H. Scarf. 1960. Optimal policies for a multi-echelon inventory problem. Management Sci- ence. 50(12) 475-490. [41] Cogger, K. 1988. Proposals for research in time series forecasting. International Journal of Forecasting. 4(3) 403–410. [42] De Gooijer, J., R. Hyndman. 2006. 25 years of time series forecasting. International journal of forecasting. 22(3) 443–473. [43] Delage, E., Y. Ye. 2010. Distributionally robust optimization under moment uncertainty with application to data-driven problems. Operations Research. 58(3) 596-612. [44] Dolinsky, Y., Soner, M. 2014. Martingale optimal transport and robust hedging in continuous time. Probability Theory and Related Fields. 160(1-2) 391-427. [45] Dominguez, M., I. Lobato. 2004. Consistent estimation of models defined by conditional moment restrictions. Econometrica. 72(5) 1601–1615. [46] Dupacová,J. 1987. The minimax approach to stochastic programming and an illustrative application. Stochastics. 20 73–88. [47] Dupacová,J. 2001. Stochastic programming: Minimax approach. In: A. Floudas and P.M. Pardalos (Eds.), Encyclopedia of Optimization. 5 327–330. Kluwer Academic Publishers. [48] Edgeworth, F. 1888. The mathematical theory of banking. Journal of the Royal Statistical Society. 51(1) 113–127. [49] Epstein, L. G., M. Schneider. 2003. Recursive multiple-priors. Journal of Economic Theory. 113(1) 1-31. [50] Erkip, N., W. Hausman, S. Nahmias. 1990. Optimal centralized ordering policies in multi-echelon inventory systems with correlated demands. Management Science. 36(3) 381–392. [51] Escanciano, J., I. Lobato. 2009. Testing the martingale hypothesis. Palgrave Hand-book of Econometrics, Palgrave, MacMillan. [52] Feldman, R. 1978. A continuous review (s, S) inventory system in a random environment. Journal of Applied Probability. 654–659. [53] Fildes, R., S. Makridakis. 1995. The impact of empirical accuracy studies on time series analysis and forecasting. International Statistical Review. 289-308. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 36

[54] Fildes, R., K. Nikolopoulos, S. F. Crone, A. A. Syntetos. 2008. Forecasting and Operational Research: A Review. The Journal of the Operational Research Society. 59(9) 1150–1172. [55] Föllmer,H., P. Leukert. 1999. Quantile hedging. Finance and Stochastics. 3(3) 251-273. [56] Franke, J. 1985. Minimax-robust prediction of discrete time series. Zeitschrift fr Wahrscheinlichkeits- theorie und Verwandte Gebiete. 68(3) 337-364. [57] Gallego, G. 1998. New bounds and heuristics for (Q, r) policies. Management Science. 44(2) 219–233. [58] Gallego, G., O. Ozer. 2001. Integrating replenishment decisions with advance demand information. Management Science. 47(10) 1344–1360. [59] Gallego, G. 2001. Minimax analysis for finite-horizon inventory models. IIE Transactions. 33(10) 861– 874. [60] Gallego, G., I. Moon. 1993. The distribution free newsboy problem: review and extensions. Journal of the Operational Research Society. 44(8) 825–834. [61] Gallego, G., I. Moon. 1994. Distribution free procedures for some inventory models. Journal of the Operational research Society. 651–658. [62] Gallego, G., J. Ryan, D. Simchi-Levi. 2001. Minimax analysis for finite horizon inventory models. IIE Transactions. 33 861–874. [63] Glasserman, P., X. Xu. 2014. Robust risk measurement and model risk. Quantitative Finance. 14(1) 29–58. [64] Graves, S.C. 1999. A single-item inventory model for a nonstationary demand process. Manufacturing & Service Operations Management. 1(1) 50–61. [65] Graves, S.C., M.C. Meal, S. Dasu, Y. Qiu. 1986. Two-stage production planning in a dynamic environment. Axsater S, Schneeweiss C, Silver E, eds. Multi-Stage Production Planning and Inventory Control. Lecture Notes in Economics and Mathematical Systems. Vol. 266. Springer-Verlag, Berlin. 9-43. [66] Guerrier, S., R. Molinari, M. Victoria-Feser. 2014. Estimation of Time Series Models via Robust Wavelet Variance. Austrian Journal of Statistics. 43(4) 267–277. [67] Gupta, Vishal. 2015. Near-Optimal Ambiguity Sets for Distributionally Robust Optimization. Preprint. [68] Hanasusanto, G., A. Grani, D. Kuhn, W. Wallace, S. Zymler. 2015. Distributionally robust multi-item newsvendor problems with multimodal demand distributions. Mathematical Programming. 152 1–32. [69] Hansen, L., T. Sargent. 2001. Robust control and model uncertainty. American Economic Review. 60–66. [70] Hausman, W.H. 1969. Sequential decision problems: A model to exploit existing forecasters. Management Science. 16(2) B-93-B-111. [71] Heath, D. C., P. L. Jackson. 1994. Modeling the evolution of demand forecasts with application. IIE Transactions. 26(3) 17-30. [72] Hill, T., R. Kertz. 1983. Stop rule inequalities for uniformly bounded sequences of random variables. Transactions of the American Mathematical Society. 278 197–207. [73] Huber, P. 1964. Robust estimation of a location parameter. The Annals of Mathematical Statistics. 35(1) 73–101. [74] Iancu, D.A., M. Petrik, D. Subramanian. 2015. Tight Approximations of Dynamic Risk Measures. Mathematics of Operations Research. 40(3) 655-682. [75] Iancu, D.A., M. Sharma, M. Sviridenko. 2013. Supermodularity and affine policies in dynamic robust optimization. Operations Research. 61(4) 941-956. [76] Iglehart, D. L. 1963. Optimality of (s, S) policies in the infinite horizon dynamic inventory problem. Management Science. 9(2) 259-267. [77] Iglehart, D., S. Karlin. 1962. Optimal policy for dynamic inventory process with nonstationary stochastic demands. Arrow K, Karlin S, Scarf H, eds. Studies in Applied Probability and Management Science, Chap. 8 (Stanford University Press, Redwood City, CA), 127-147. [78] Iida, T., P. Zipkin. 2006. Approximate solutions of a dynamic forecasting inventory model. Manufac- turing & Service Operations Management. 8(4) 407-425. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 37

[79] Ishihara, J., M. Terra, J. Campos. 2006. Robust Kalman filter for descriptor systems. IEEE Transactions on Automatic Control. 51(8) 1354. [80] Iyengar, G. 2005. Robust dynamic programming. Mathematics of Operations Research. 30(2) 257-280. [81] Joglekar, P., P. Lee. 1993. An exact formulation of inventory costs and optimal lot size in face of sudden obsolescence. Operations Research Letters. 14(5) 283–290. [82] Johnson, G. D., H. E. Thompson. 1975. Optimality of myopic inventory policies for certain dependent demand processes. Management Science. 21(11) 1303-1307. [83] Kalymon, B. 1971. Stochastic prices in a single-item inventory purchasing model. Operations Research. 19(6) 1434–1458. [84] Kasugai, H., T. Kasegai. 1961. Note on minimax regret ordering policy-static and dynamic solutions and a comparison to maximin policy. Journal of the Operations Research Society of Japan. 3(4) 155–169. [85] Klabjan, D., D. Simchi-Levi, M. Song. 2013. Robust stochastic lot-sizing by means of histograms. Pro- duction and Operations Management. 22 691-710. [86] Lam, H. 2017. Sensitivity to Serial Dependency of Input Processes: A Robust Approach. Management Science. Forthcoming. [87] Lee, H., V. Padmanabhan, S. Whang. 2004. Information distortion in a supply chain: the bullwhip effect. Management Science. 50(12) 1875–1886. [88] Levi, R., G. Janakiraman, M. Nagarajan. 2008. A 2-approximation algorithm for stochastic inventory control models with lost sales. Mathematics of Operations Research. 33(2) 351–374. [89] Lovejoy, W. 1992. Stopped myopic policies for some inventory models with uncertain demand distributions. Management Science. 38(5) 688-707. [90] Lu, X. W., J. S. Song, A. Regan. 2006. Inventory planning with forecast updates: Approximate solutions and cost error bounds. Operations Research. 54(6) 1079-1097. [91] Luz, M., M. Moklyachuk. 2014. Minimax-robust filtering problem for stochastic sequences with stationary increments. Theory of Probability and Mathematical Statistics. 89 127–142. [92] Mamani, H., S. Nassiri, M. R. Wagner. 2017. Closed-form solutions for robust inventory management. Management Science. 63(5) 1625-1643. [93] Marandi, A., D. Hertog. 2017. When are static and adjustable robust optimization with constraint-wise uncertainty equivalent?. Mathematical Programming. Forthcoming. [94] Maronna, R., M.Douglas, V. Yohai. 2006. Robust statistics. John Wiley and Sons, Chichester. [95] Medina, M., E. Ronchetti. 2015. Robust statistics: a selective overview and new directions. Wiley Inter- disciplinary Reviews: Computational Statistics. [96] Miller, B. 1986. Scarf’s state reduction method, flexibility, and a dependent demand inventory model. Operations Research. 34(1) 83-90. [97] Milner, J. M., P. Kouvelis. 2005. Order quantity and timing flexibility in supply chains: The role of demand characteristics. Management Science. 51(6) 970-985. [98] Moklyachuk, M. 2000. Robust procedures in time series analysis. Theory Stoch. Process. 6(22) 3-4. [99] Nilim, A., L. El Ghaoui. 2005. Robust control of Markov decision processes with uncertain transition matrices. Operations Research. 53(5) 780-798. [100] Nishimura, K.G., H. Ozaki. 2007. Irreversible investment and Knightian uncertainty. Journal of Eco- nomic Theory. 136 668-694. [101] Notzon, E. 1970. Minimax inventory and queueing models. Technical Report No. 7, Department of Operations Research, Stanford University. [102] Perakis, G., G. Roels. 2008. Regret in the newsvendor model with partial information. Operations Research. 56(1) 188-203. [103] Pierskalla, W. 1969. An inventory problem with obsolescence. Naval Research Logistics Quarterly. 16(2). 217–228. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 38

[104] Pindyck, R. 1982. Adjustment costs, uncertainty, and the behavior of the firm. The American Economic Review. 415–427. [105] Popescu, I. 2007. Robust mean-covariance solutions for stochastic optimization. Operations Research. 55(4) 98-112. [106] Prékopa, A. 1995. Stochastic Programming. Kluwer Academic Publishers. [107] Puterman, Martin L. 2014. Markov decision processes: discrete stochastic dynamic programming. John Wiley and Sons. [108] Reyman, G. 1989. State reduction in a dependent demand inventory model given by a time series. European Journal of Operational Research. 41(2) 174-180. [109] Riedel, F. 2009. Optimal stopping with multiple priors. Econometrica. 77(3) 857-908. [110] Royden, H., P. Fitzpatrick. 1988. Real analysis. New York: Prentice Hall. [111] Ruckdeschel, P. 2010. Optimally (distributional-) robust Kalman filtering. arXiv preprint arXiv:1004.3393. [112] Ryan, J. 1998. Analysis of inventory models with limited demand information. Ph.D. Thesis. [113] Scarf, H. 1958. A min-max solution of an inventory problem. In K. Arrow, S. Karlin and H. Scarf (Eds.) Studies in the Mathematical Theory of Inventory and Production. Stanford University Press, Stanford, CA, 201–209. [114] Scarf, H. 1959. Bayes solution of the statistical inventory problem. The annals of mathematical statistics. 490–508. [115] Scarf, H. 1960. Some remarks on Bayes solutions to the inventory problem. Naval research logistics quarterly. 7(4) 591–596. [116] Scarf, H. 1960. The Optimality of (s, S) Policies in the Dynamic Inventory Problem. Mathematical Methods in the Social Sciences. Stanford University Press. 196-202. [117] See, C. T., M. Sim. 2010. Robust approximation to multiperiod inventory management. Operations Research. 58(3) 583-594. [118] Shapiro, A. 2009. On a time consistency concept in risk averse multistage stochastic programming. Operations Research Letters. 37(3) 143–147. [119] Shapiro, A. 2012. Minimax and risk averse multistage stochastic programming. European Journal of Operational Research. 219 719–726. [120] Shapiro, A. 2016. Rectangular sets of probability measures. Operations Research. 64(2) 528-541. [121] Shapiro, A., D. Dentcheva, A. Ruszczyński.2009. Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia. [122] Solyali, Oguz, J. Cordeau, and G. Laporte. ”The impact of modeling on robust inventory management under demand uncertainty.” Management Science 62, no. 4 (2015): 1188-1201. [123] Song, J., P. Zipkin. 1993. Inventory control in a fluctuating demand environment. Operations Research. 41(2) 351-370. [124] Song, J., P. Zipkin. 1996. Managing inventory with the prospect of obsolescence. Operations Research. 44(1) 215–222. [125] Stockinger, N., R. Dutter. 1987. Robust time series analysis: A survey. Kybernetika. 23(7) 3–88. [126] Sun, J., J. A. Van Mieghem. 2017. Robust dual sourcing inventory management: Optimality of capped dual index policies. Available at SSRN: https://ssrn.com/abstract=2991250. [127] Toktay, L., L. Wein. 2001. Analysis of a forecasting-productioninventory system with stationary demand. Management Science. 47(9) 1268-1281. [128] Trojanowska, M., P.M. Kort. 2010. The Worst Case for Real Options. Journal of Optimization Theory and Applications. 146(3) 709-734. [129] Tukey, W. 1960. A survey of sampling from contaminated distributions. Contributions to probability and statistics. 2 448–485. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 39

[130] Veinott, Jr., A. F. 1965. Optimal policy for a multi-product, dynamic, nonstationary inventory problem. Management Science. 12(3) 206-222. [131] Veinott, Jr., A. F. 1966. On the optimality of (s, S) inventory policies: new conditions and a new proof. SIAM Journal on Applied Mathematics. 14 1067-1083. [132] Wagner, M. ”Robust Inventory Management: An Optimal Control Approach.” Operations Research 66, no. 2 (2017): 426-447. [133] Wang, T., A. Atasu, M. Kurtulus. 2012. A multi-ordering newsvendor model with dynamic forecast evolution. Manufacturing & Service Operations Management. 14(3) 472-484. [134] Whitt, W. 2002. Stochastic-process limits: an introduction to stochastic-process limits and their application to queues. Springer. [135] Wiesemann, W., D. Kuhn, B. Rustem. 2013. Robust Markov decision processes. Mathematics of Oper- ations Research. 38(1) 153-183. [136] Xin, L., D.A. Goldberg, and A. Shapiro. 2013. Time (in)consistency of multistage distributionally robust inventory models with moment constraints. arXiv preprint arXiv:1304.3074. [137] Yue, J., B. Chen, M. Wang. 2006. Expected value of distribution information for the newsvendor problem. Operations Research. 54(6) 1128–1136. [138] Záˇcková,J.ˇ 1966. On minimax solution of stochastic linear programming problems. Casopisˇ Pˇestován´ı Matematiky. 91 423–430. [139] Zipkin, P.H. 2000. Foundations of Inventory Management. McGraw-Hill, Boston. [140] Zhu, Z., J. Zhang, Y. Ye. 2013. Newsvendor optimization with limited distribution information. Opti- mization methods and software. 28(3) 640–667. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 40

7. Technical Appendix 7.1. Proof of Lemma1 Proof of Lemma1: We first prove that one can w.l.o.g. restrict to policies always ordering up to at least 0. For s ∈ [1,T ], let Πˆ s,0 denote that subset of Πˆ consisting of those policiesπ ˆ with T πˆ the following property: for all t ∈ [T − s + 1,T ] and d = (d1, . . . , dT ) ∈ [0,U]Q, xt (d[t−1]) ≥ 0. Also, let Πˆ 0,0 =∆ Π.ˆ We now prove that for all s ∈ [0,T − 1], and any policyπ ˆ ∈ Πˆ s,0, there exists a 0 ˆ s+1,0 T πˆ0 πˆ policyπ ˆ ∈ Π s.t. for all vectors d = (d1, . . . , dT ) ∈ [0,U]Q and t ∈ [1,T ], Ct (d[t]) ≤ Ct (d[t]), proving the desired result. Let us fix any s ∈ [0,T − 1], and letπ ˆ be any policy belonging to Πˆ s,0. 0 0 0 We now define a sequence of functions {xt, t ∈ [1,T ]}, where x1 is a real constant, and xt is a t−1 map from [0,U]Q to Q for t ≥ 2. We will then prove that these functions correspond to a policy belonging to Πˆ s+1,0, which incurs cost no greater thanπ ˆ in every time period for every sample 0 πˆ 0 πˆ path of demand. Let xt = xt for all t ∈ [1,T ] \{T − s}, and xT −s = max(0, xT −s). We now verify 0 ˆ s+1,0 that {xt, t ∈ [1,T ]} corresponds to some policy belonging to Π , and begin by demonstrating t 0 0 0 ∆ 0 that for all t ∈ [0,T − 1] and d = (d1, . . . , dt) ∈ [0,U]Q, xt+1(d) ≥ xt(d[t−1]) − dt, where x1(d[0]) = x1, 0 ∆ ∆ x0(d[−1]) = x0, and d0 = 0. The assertion is immediate for t ∈ [1,T ] \{T − s}, as in these cases the πˆ property is inherited from {xt , t ∈ [1,T ]}, and the monotonicty of our construction. We now verify 0 0 the assertion for t = T − s, by demonstrating that xT −s+1(d) ≥ xT −s(d[T −s−1]) − dT −s, and proceed πˆ by a case analysis. First, suppose xT −s(d[T −s−1]) ≥ 0. In this case, the desired property is trivially πˆ 0 0 inherited from xT −s. If not, by construction xT −s(d[T −s−1]) = 0, and thus xT −s(d[T −s−1])−dT −s ≤ 0. ˆ s,0 πˆ However,π ˆ ∈ Π implies that xT −s+1(d) ≥ 0, demonstrating the desired property. Combining the 0 0 πˆ above with the fact that xt ≥ 0 for all t ∈ [T −s+1,T ] by virtue of xt equaling xt for all t ≥ T −s+1 ˆ s,0 0 andπ ˆ belonging to Π , and the fact that xT −s ≥ 0 by construction, completes the proof that ˆ s,0 0 ˆ s+1,0 πˆ0 πˆ for all s ∈ [0,T − 1], and any policyπ ˆ ∈ Π , there exists a policyπ ˆ ∈ Π s.t. xt = xt for all πˆ0 πˆ t ∈ [1,T ] \{T − s}, and xT −s = max(0, xT −s). Combining with the fact that for all d ∈ [0,U] and x < 0, C(x, d) = b(d + |x|) > b[d − 0]+ + [0 − d]+ = bd, T πˆ0 πˆ further demonstrates that for all vectors d = (d1, . . . , dT ) ∈ [0,U]Q and t ∈ [1,T ], Ct (d[t]) ≤ Ct (d[t]), completing the proof of the desired statement. Applying the statement inductively, we conclude that for anyπ ˆ ∈ Π,ˆ there exists a policyπ ˆ0 ∈ Πˆ T,0 (i.e. a policy ordering up to at least 0 in every period for all sample paths) which incurs cost at most that incurred byπ ˆ.

We next prove that one can restrict to policies always ordering up to at most U. For s ∈ [1,T ], let Πˆ s,U denote that subset of Πˆ T,0 consisting of thoseπ ˆ with the following property: for all T πˆ πˆ t ∈ [T − s + 1,T ] and d = (d1, . . . , dT ) ∈ [0,U]Q, xt (d[t−1]) ≤ U whenever yt (d[t−1]) ≤ U. Also, let Πˆ 0,U =∆ Πˆ T,0. We now prove that for all s ∈ [0,T − 1], and any policyπ ˆ ∈ Πˆ s,U , there exists a policy 0 ˆ s+1,U T πˆ0 πˆ πˆ ∈ Π s.t. for all vectors d = (d1, . . . , dT ) ∈ [0,U]Q and t ∈ [1,T ], Ct (d[t]) ≤ Ct (d[t]). Let us ﬁx any s ∈ [0,T − 1], and letπ ˆ be any policy belonging to Πˆ s,U . We now deﬁne a sequence of 0 0 0 t−1 functions {xt, t ∈ [1,T ]}, where x1 is a real constant, and xt is a map from [0,U]Q to Q for t ≥ 2. We will then prove that these functions correspond to a policy belonging to Πˆ s+1,U , which incurs 0 πˆ cost no greater thanπ ˆ in every time period for every sample path of demand. Let xt = xt for all T −s−1 t ∈ [1,T ] \{T − s}; and for d ∈ [0,U]Q let ( min U, xπˆ (d) if yπˆ (d) ≤ U, x0 (d) =∆ T −s T −s T −s πˆ xT −s(d) else.

0 ˆ s+1,U We now verify that {xt, t ∈ [1,T ]} corresponds to some policy belonging to Π , and begin by t 0 0 demonstrating that for all t ∈ [0,T − 1] and d = (d1, . . . , dt) ∈ [0,U]Q, xt+1(d) ≥ xt(d[t−1]) − dt. The assertion is immediate for t ∈ [1,T ] \{T − s − 1}, as in these cases the property is inherited Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 41

πˆ from {xt , t ∈ [1,T ]}, and the monotonicty of our construction. We now verify the assertion for 0 0 t = T − s − 1, by demonstrating that xT −s(d) ≥ xT −s−1(d[T −s−1]) − dT −s−1, and proceed by a case πˆ πˆ πˆ analysis. First, suppose that either yT −s(d[T −s−1]) > U, or yT −s(d[T −s−1]) < U and xT −s(d[T −s]) ≤ πˆ U. In this case, the desired property is either trivially true, or inherited from xT −s. If not, it πˆ πˆ must hold that yT −s(d[T −s]) ≤ U, and xT −s(d[T −s]) > U. However, by construction this implies 0 0 that xT −s−1(dT −s−2) − dT −s−1 ≤ U, while xT −s(d[T −s−1]) = U, demonstrating the desired property. 0 0 t−1 Combining the above with the fact that xt(d) ≤ U whenever xt−1(d[t−2]) for all d ∈ [0,U]Q and 0 πˆ ˆ s,U t ∈ [T − s + 1,T ] by virtue of xt equaling xt for all t ≥ T − s + 1 andπ ˆ belonging to Π , and the 0 fact that xT −s ≤ U by construction, completes the proof that for all s ∈ [0,T − 1], and any policy ˆ s,U 0 ˆ s+1,Y πˆ0 πˆ πˆ0 πˆ ∈ Π , there exists a policyπ ˆ ∈ Π s.t. xt = xt for all t ∈ [1,T ] \{T − s}, and xT −s is as deﬁned above. Combining with the fact that for all d ∈ [0,U] and x > U,

C(x, d) = x − d > b[d − U]+ + [U − d]+ = U − d,

T πˆ0 πˆ further demonstrates that for all vectors d = (d1, . . . , dT ) ∈ [0,U]Q and t ∈ [1,T ], Ct (d[t]) ≤ Ct (d[t]), completing the proof of the desired statement. Applying the statement inductively, we conclude ˆ 0 ˆ T,U that for anyπ ˆ ∈ Π, so long as x0 ≤ U, there exists a policyπ ˆ ∈ Π (i.e. a policy ordering up to a level in [0,U] in every period for all sample paths) which incurs cost at most that incurred byπ ˆ, completing the proof. .

7.2. Proof of Theorem2 Our proof is quite similar, conceptually, to the proof in Iyengar [80] that a robust MDP satisfying the so-called rectangularity property can be solved by dynamic programming. However, as our problem does not seem to fit precisely into that framework (e.g. since the conditional distribution of demand in a given time period can in principle depend on the entire history of demand up to that time), and as it is well-known that the rectangularity property and related notions can be quite subtle (cf. Xin, Goldberg and Shapiro [136]), we include a self-contained proof for completeness. We will proceed by a backwards induction, reasoning inductively that w.l.o.g. in period T the conditional distribution of demand can be chosen to be consistent with Qˆ∗, and thus the policy in period T can be chosen to be consistent withπ ˆ∗, and thus the conditional distribution of demand in period T − 1 can be chosen to be consistent with Qˆ∗, etc. It is worth noting that one could also proceed by using e.g. minimax theorems, strong duality, saddle-point theory, and/or general results from the theory of stochastic games, and we refer the interested reader to Nilim and El Ghaoui [99] for closely related results proved using such alternative approaches. We begin by making several π π definitions. For π ∈ Π, let MAR0 = MART . For t ∈ [1,T − 1], let us define MARt as follows. π π t Q ∈ MAR iff Q ∈ MAR , and for all q ∈ supp(Q[T −t]),QT −t+1|q = Q π . Also, let t t−1 xT −t+1(q),qT −t π ˆ∗ MART denote the singleton {Q }. Similarly, let Π0 = Π, and for t ∈ [1,T − 1], let us define Πt T −t π t as follows. π ∈ Πt iff π ∈ Πt−1, and for all q ∈ [0,U] , x (q) = Φ π . Also, let ΠT Q T −t+1 yT −t+1(q),qT −t denote the singleton {πˆ∗}. We now state a lemma which follows immediately from our definitions, and formalizes the notion that for any probability measure Q ∈ MAR, and time t, there is a unique probability measure π belonging to MARt which agrees with Q in periods [1, t − 1], and can be constructed naturally through an appropriate composition of marginal distributions; as well as the analogous statement for constructing policies belonging to Πt which agree with a given policy in periods [1, t − 1]. The associated measures (and policies) can in all cases be constructed through the natural composition of marginals (measurable functions).

Lemma 17. Given π ∈ Π, Q ∈ MART , and t ∈ [1,T − 1], there exists a unique probability mea- π,t π π,t sure Q ∈ MARt satisfying Q[T −t] = Q[T −t]. Similarly, given π ∈ Π and t ∈ [1,T − 1], there exists t π πt a unique policy π ∈ Πt satisfying x[T −t] = x[T −t]. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 42

We now prove (by induction) a theorem which formalizes the intuitive backwards induction described above, and which will directly imply Theorem2. π Theorem 8. For all s ∈ [0,T − 1], π ∈ Πs, and Q ∈ MARs , T X π X X ˆs+1 π EQ[ Ct ] = f xT −s(q), z QT −s|q(z) Q[T −s−1](q). (7.32) t=T −s q∈supp(Q[T −s−1]) z∈[0,U]Q

π For all s ∈ [0,T − 1], π ∈ Πs, and Q ∈ MARs+1,

T X π X s+1 π EQ[ Ct ] = gˆ xT −s(q), qT −s−1 Q[T −s−1](q). (7.33) t=T −s q∈supp(Q[T −s−1])

For all s ∈ [0,T − 1] and π ∈ Πs,

T T X π X π sup EQ[ Ct ] = sup EQ[ Ct ]. (7.34) Q∈MAR Q∈MARπ t=1 s+1 t=1

π For all s ∈ [0,T − 1], π ∈ Πs+1, and Q ∈ MARs+1,

T X π X ˆ s+1 π EQ[ Ct ] = V yT −s(q), qT −s−1 Q[T −s−1](q). (7.35) t=T −s q∈supp(Q[T −s−1])

For all s ∈ [0,T − 1],

T T X π X π inf sup EQ[ Ct ] = inf sup EQ[ Ct ]. (7.36) π∈Π Q∈MAR π∈Πs+1 Q∈MAR t=1 t=1

Proof: We begin with the base case s = 0. (7.32) follows immediately from the fact that fˆ1(x, d) = π C(x, d). (7.33) then follows from the fact that for all π ∈ Π0, Q ∈ MAR1 , and q ∈ supp(Q[T −1]), 1 ˆ1 π 1 ˆ1 QT |q = Q π ∈ Q (x (q), qT −1), combined with the definition ofg ˆ and Q . xT (q),qT −1 T ˜ We now prove (7.34). Let us fix any π ∈ Π0. Note that for any > 0, there exists Q ∈ MAR ˜ PT π Q PT π s.t. E[ t=1 Ct D[t] )] > supQ∈MAR EQ[ t=1 Ct ] − . Let us fix any such > 0 and corresponding ˜ ˜ ˜ Q ∈ MAR, where we denote Q by Q for clarity of exposition. We now prove that

T T X π Q˜π,1 X π Q˜ E[ Ct D[t] )] ≥ E[ Ct D[t])]. (7.37) t=1 t=1 ˜π,1 ˜ ˜ As Q[T −1] = Q[T −1], it suﬃces by (7.32) to demonstrate that for every q ∈ supp(Q[T −1]),

X ˆ1 π ˜π,1 X ˆ1 π ˜ f xT (q), z QT |q(z) ≥ f xT (q), z QT |q(z). (7.38) z∈[0,U]Q z∈[0,U]Q

˜π,1 π ˜ ˜π,1 1 By construction, Q ∈ MAR , and thus for all q ∈ supp(Q[T −1]), Q = Q π ∈ 1 T |q xT (q),qT −1 ˆ1 π Q (xT (q), qT −1). Thus by deﬁnition, the left-hand side of (7.38) equals

X ˆ1 π 1 π sup f xT (q), z Q(z) =g ˆ xT (q), qT −1 . (7.39) Q∈M(qT −1) z∈[0,U]Q Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 43

˜ Noting that the martingale property ensures QT |q ∈ M(qT −1) then completes the proof of (7.38), and (7.37). Letting ↓ 0 completes the proof of (7.34).

We now prove (7.35). As Π1 ⊆ Π0, it follows from (7.33), combined with the fact that π ∈ Π1 π 1 implies x (q) = Φ π , that T yT (q),qT −1

π X 1 1 Q[C ] = gˆ Φ π , qT −1 Q[T −1](q). E T yT (q),qT −1 q∈supp(Q[T −1])

1 ˆ 1 π As Φ π ∈ Φ y (q), qT −1 ,(7.35) then follows from deﬁnitions. yT (q),qT −1 T PT π We now prove (7.36). For any > 0, there existsπ ˜ ∈ Π0 s.t. infπ∈Π supQ∈MAR EQ[ t=1 Ct ] > PT π˜ supQ∈MAR EQ[ t=1 Ct ] − . Let us ﬁx any such > 0 and correspondingπ ˜ ∈ Π0, where we denote π˜ byπ ˜ for clarity of exposition. We now prove that

T T X π˜ X π˜1 sup EQ[ Ct ] ≥ sup EQ[ Ct ]. (7.40) Q∈MAR Q∈MAR t=1 t=1

As Π1 ⊆ Π0, by (7.34) it suﬃces to demonstrate that

T T X π˜ X π˜1 sup EQ[ Ct ] ≥ sup EQ[ Ct ]. (7.41) Q∈MARπ˜ π˜1 1 t=1 Q∈MAR1 t=1

π˜ Note that Q ∈ MAR1 , combined with (7.33), implies that

T T −1 X π˜ X π˜ X 1 π˜ EQ[ Ct ] = EQ[ Ct ] + gˆ xT (q), qT −1 Q[T −1](q). t=1 t=1 q∈supp(Q[T −1])

π˜ T −1 π˜ π˜ π˜ π˜ ∈ Π0 implies that for all Q ∈ MAR1 and q ∈ [0,U]Q , xT (q) ≥ yT (q) and xT (q) ∈ [0,U]Q. By π˜ combining the above, we conclude from deﬁnitions that Q ∈ MAR1 implies

T T −1 X π˜ X π˜ X ˆ 1 π˜ EQ[ Ct ] ≥ EQ[ Ct ] + V yT (q), qT −1 Q[T −1](q) t=1 t=1 q∈supp(Q[T −1]) T −1 X X π˜ ˆ 1 π˜ = C xt (q[t−1]), qt + V yT (q), qT −1 Q[T −1](q). (7.42)

q∈supp(Q[T −1]) t=1

1 π˜ π˜ π˜1 Alternatively, if Q ∈ MAR1 , it follows from (7.35), and the fact that x[T −1] = x[T −1] and thus π˜1 π˜ yT = yT , that

T T −1 X π˜1 X X π˜ ˆ 1 π˜ EQ[ Ct ] = C xt (q[t−1]), qt + V yT (q), qT −1 Q[T −1](q). (7.43) t=1 q∈supp(Q[T −1]) t=1

˜ π˜1 ˜ For any 2 > 0, there exists Q2 ∈ MAR1 (which we denote simply as Q) s.t.

T T X π˜1 X π˜1 EQ˜ [ Ct ] > sup EQ[ Ct ] − 2. (7.44) π˜1 t=1 Q∈MAR1 t=1 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 44

˜π,˜ 1 π˜ ˜π,˜ 1 ˜ Lemma 17 implies the existence of Q ∈ MAR1 s.t. Q[T −1] = Q[T −1]. Trivially, PT π˜ PT π˜ sup π˜ Q[ C ] ≥ ˜π,˜ 1 [ C ], and it follows from (7.42), (7.43), and (7.44) that Q∈MAR1 E t=1 t EQ t=1 t T T −1 X π˜ X X π˜ ˆ 1 π˜ ˜ EQ˜π,˜ 1 [ Ct ] ≥ C xt (q[t−1]), qt + V yT (q), qT −1 Q[T −1](q) t=1 ˜ t=1 q∈supp(Q[T −1]) T T X π˜1 X π˜1 = EQ˜ [ Ct ] > sup EQ[ Ct ] − 2. π˜1 t=1 Q∈MAR1 t=1

Letting 2 ↓ 0 completes the proof of (7.41) and (7.40). Letting ↓ 0 completes the proof of (7.36).

We now proceed by induction. Suppose that (7.32)-(7.36) hold for all s0 ∈ [0, s] for some s ≤ T − 2. We now prove that (7.32)-(7.36) also hold for s + 1, and begin by verifying (7.32). Note π that for all π ∈ Πs+1 and Q ∈ MARs+1, π X X π EQ[CT −s−1] = C(xT −s−1(q), z)QT −s−1|q(z) Q[T −s−2](q). (7.45) q∈supp(Q[T −s−2]) z∈[0,U]Q

π π Combining with (7.35), applied to s, and the fact that yT −s(q : z) = xT −s−1(q) − z, we conclude T X π X X ˆ s+1 π EQ[ Ct ] = V xT −s−1(q) − z, z QT −s−1|q(z) Q[T −s−2](q). (7.46) t=T −s q∈supp(Q[T −s−2]) z∈[0,U]Q Combining (7.45)-(7.46) with the deﬁnition of fˆs+2 completes the proof of (7.32). π We now prove (7.33). It follows from (7.32), applied to s + 1, and the fact that MARs+2 ⊆ π π MARs+1, that for all π ∈ Πs+1 and Q ∈ MARs+2, T X π X X ˆs+2 π EQ[ Ct ] = f xT −s−1(q), z QT −s−1|q(z) Q[T −s−2](q). t=T −s−1 q∈supp(Q[T −s−2]) z∈[0,U]Q

π s+2 ˆs+2 π As Q ∈ MAR , q ∈ supp(Q[T −s−2]) implies QT −s−1|q = Q π ∈ Q (x (q), qT −s−2), s+2 xT −s−1(q),qT −s−2 T −s−1 (7.33) then follows from the deﬁnitions ofg ˆs+2 and Qˆs+2. We now prove (7.34). As Πs+1 ⊆ Πs, it follows from (7.34), applied to s, that it suﬃces to demonstrate that for all π ∈ Πs+1,

T T X π X π sup EQ[ Ct ] = sup EQ[ Ct ]. (7.47) Q∈MARπ Q∈MARπ s+1 t=1 s+2 t=1 ˜ ˜ π PT π Q Let us fix any π ∈ Πs+1. For any > 0, there exists Q ∈ MARs+1 s.t. E[ t=1 Ct D[t] )] > PT π ˜ π sup π Q[ C ] − . Let us fix any such > 0 and corresponding Q ∈ MAR , where Q∈MARs+1 E t=1 t s+1 ˜ ˜ we denote Q by Q for clarity of exposition. We now prove that T T X π Q˜π,s+2 X π Q˜ E[ Ct D[t] )] ≥ E[ Ct D[t])]. (7.48) t=1 t=1 ˜π,s+2 ˜ As Q[T −s−2] = Q[T −s−2], by (7.32), applied to s + 1, it suffices to demonstrate that for every q ∈ ˜ supp(Q[T −s−2]), X ˆs+2 π ˜π,s+2 X ˆs+2 π ˜ f xT −s−1(q), z QT −s−1|q(z) ≥ f xT −s−1(q), z QT −s−1|q(z). (7.49) z∈[0,U]Q z∈[0,U]Q Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 45

˜π,s+2 π ˜ ˜π,s+2 By construction, Q ∈ MARs+2, and thus for all q ∈ supp(Q[T −s−2]), QT −s−1|q = s+2 ˆs+2 π Q π ∈ Q (x (q), qT −s−2). Thus by deﬁnition, the left-hand side of (7.49) equals xT −s−1(q),qT −s−2 T −s−1

X ˆs+2 π s+2 π sup f xT −s−1(q), z Q(z) =g ˆ xT −s−1(q), qT −1 . (7.50) Q∈M(qT −s−2) z∈[0,U]Q ˜ Noting that the martingale property ensures QT −s−1|q ∈ M(qT −s−2) then completes the proof of (7.49), and (7.48). Letting ↓ 0 further completes the proof of (7.34). We now prove (7.35). As Πs+2 ⊆ Πs+1, it follows from (7.33), applied to s + 1, and the fact that π s+2 π ∈ Πs+2 implies x (q) = Φ π , that T −s−1 yT −s−1(q),qT −s−2

T X π X s+2 s+2 Q[ C ] = gˆ Φ π Q[T −s−2](q). E t yT −s−1(q),qT −s−2 t=T −s−1 q∈supp(Q[T −s−2])

s+2 ˆ s+2 π As Φ π ∈ Φ y (q), qT −s−1 ,(7.35) then follows from deﬁnitions. yT −s−1(q),qT −s−1 T −s−1 We now prove (7.36). First, observe that by (7.36), applied to s, it suﬃces to prove that

T T X π X π inf sup EQ[ Ct ] = inf sup EQ[ Ct ]. (7.51) π∈Πs+1 Q∈MAR π∈Πs+2 Q∈MAR t=1 t=1

PT π PT π˜ For any > 0, there existsπ ˜ ∈ Πs+1 s.t. infπ∈Πs+1 supQ∈MAR EQ[ t=1 Ct ] > supQ∈MAR EQ[ t=1 Ct ]− . Let us ﬁx any such > 0 and correspondingπ ˜ ∈ Πs+1, where we denoteπ ˜ byπ ˜ for clarity of exposition. We now prove that

T T X π˜ X π˜s+2 sup EQ[ Ct ] ≥ sup EQ[ Ct ]. Q∈MAR Q∈MAR t=1 t=1

As Πs+2 ⊆ Πs+1, by (7.34), applied to s + 1, it suﬃces to demonstrate that

T T X π˜ X π˜s+2 sup EQ[ Ct ] ≥ sup EQ[ Ct ]. (7.52) Q∈MARπ˜ π˜s+2 s+2 t=1 Q∈MARs+2 t=1

π˜ Note that Q ∈ MARs+2, combined with (7.33), applied to s + 1, implies that

T T −s−2 X π˜ X π˜ X s+2 π˜ EQ[ Ct ] = EQ[ Ct ] + gˆ xT −s−1(q), qT −s−2 Q[T −s−2](q). t=1 t=1 q∈supp(Q[T −s−2])

π˜ T −s−2 π˜ π˜ π˜ ∈ Πs+1 implies that for all Q ∈ MARs+2 and q ∈ [0,U]Q , xT −s−1(q) ≥ yT −s−1(q) and π˜ π˜ xT −s−1(q) ∈ [0,U]Q. By combining the above, we conclude from deﬁnitions that Q ∈ MARs+2 PT π˜ implies EQ[ t=1 Ct ] is at least

T −s−2 X π˜ X ˆ s+2 π˜ EQ[ Ct ] + V yT −s−1(q), qT −s−2 Q[T −s−2](q), t=1 q∈supp(Q[T −s−2]) which is itself equal to T −s−2 X X π˜ ˆ s+2 π C xt (q[t−1]), qt + V yT −s−1(q), qT −s−2 Q[T −s−2](q). (7.53)

q∈supp(Q[T −s−2]) t=1 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 46

s+2 π˜ π˜ Alternatively, if Q ∈ MARs+2 , it follows from (7.35), applied to s+1, and the fact that x[T −s−2] = π˜s+2 PT π˜s+2 x[T −s−2], that EQ[ t=1 Ct ] equals

T −s−2 X π˜ X ˆ s+2 π EQ[ Ct ] + V yT −s−1(q), qT −s−2 Q[T −s−2](q), t=1 q∈supp(Q[T −s−2]) which is itself equal to T −s−2 X X π˜ ˆ s+2 π C xt (q[t−1]), qt + V yT −s−1(q), qT −s−2 Q[T −s−2](q). (7.54)

q∈supp(Q[T −s−2]) t=1

˜ π˜s+2 ˜ For any 2 > 0, there exists Q2 ∈ MARs+2 (which we denote simply as Q) s.t.

T T X π˜s+2 X π˜s+2 EQ˜ [ Ct ] > sup EQ[ Ct ] − 2. (7.55) π˜s+2 t=1 Q∈MARs+2 t=1

˜π,s˜ +2 π˜ ˜π,s˜ +2 ˜ Lemma 17 implies the existence of Q ∈ MARs+2 s.t. Q[T −s−2] = Q[T −s−2]. Trivially, PT π˜ PT π˜ sup π˜ Q[ C ] ≥ ˜π,s˜ +2 [ C ], which by (7.53), (7.54), and (7.55) is itself at least Q∈MARs+2 E t=1 t EQ t=1 t T −s−2 X X π˜ ˆ s+2 π ˜π,s˜ +2 C xt (q[t−1]), qt + V yT −s−1(q), qT −s−2 Q[T −s−2](q) q∈supp(Q˜π,s˜ +2 ) t=1 [T −s−2] T −s−2 X X π˜ ˆ s+2 π ˜ = C xt (q[t−1]), qt + V yT −s−1(q), qT −s−2 Q[T −s−2](q) ˜ t=1 q∈supp(Q[T −s−2]) T T X π˜s+2 X π˜s+2 = EQ˜ [ Ct ] > sup EQ[ Ct ] − 2. π˜s+2 t=1 Q∈MARs+2 t=1

Letting 2 ↓ 0 completes the proof of (7.52) and (7.51). Letting ↓ 0 completes the proof of (7.36). Combining all of the above completes the desired induction and proof of the theorem. . 7.3. Proof of Observation1 T 0 Proof of Observation1: For a general T ≥ 1, µ ∈ (0,U), x ∈ χMAR(µ, U, b),U , let j be the T T 0 T T unique index s.t. µ ∈ (Aj0 ,Aj0+1], and k to be the unique index s.t. x ∈ [Bk0 ,Bk0+1), where existence and uniqueness follow from definitions and our assumptions. From definitions (especially T of qx,µ) and our assumptions (especially time-homogeneity of the relevant parameters and the accompanying self-reducibility of the DP equations, as well as the martingale property), to prove T the desired result it suffices to prove (for such general T, µ, x) that x ≥ χMAR(µ, U, b) imples : 1. 0 0 T T k ≥ j + 1 (i.e. that the first case in the definition of qx,µ cannot occur); and 2. Ak0 ≥ x. Thus sup- T T T T +1 T +1 pose x ≥ χMAR(µ, U, b). Recall from definitions that χMAR(µ, U, b) = B T , and µ ∈ A T ,A T . Γµ Γµ −1 Γµ T +1 T T It follows that µ ≤ A T ≤ A T by the monotonicity (in T) of Aj . Combining with the fact that Γµ Γµ T T T µ ∈ (Aj0 ,Aj0+1], and the monotonicity (in j) of Aj , it follows that

T 0 Γµ ≥ j + 1. (7.56)

T T T T T Since by deﬁnition χ (µ, U, b) = B T , the fact that x ≥ χ (µ, U, b) and x ∈ [B 0 ,B 0 ), MAR Γµ MAR k k +1 T combined with the monotonicity (in k) of Bk , together imply that

0 T k ≥ Γµ . (7.57) Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 47

Combining (7.56) and (7.57) we conclude that k0 ≥ j0 + 1, showing the ﬁrst desired statement. To T complete the proof, it thus suﬃces to prove that (under the same assumptions) Ak0 ≥ x. Since T T T T by construction x ∈ [Bk0 ,Bk0+1), and it follows from (3.10) that Ak0 ≥ Bk0+1, combining the above completes the proof. Q.E.D.

7.4. Proof of Lemma4 Proof: [Proof of Lemma4] We first prove continuity. That gT (x, d) is a continuous function ST −1 T of x on (0,U) \ T −1 {Bi }, and a right-continuous function of x on [0,U] \{U}, follows from i=Γd T T definitions, and the fact that Fi (x, d) and Gi (x, d) are continuous functions of x on [0,U] for T T −1 all i. It similarly follows that lim T g (x, d) exists for all i ∈ [Γ ,T ] \{0}. It thus suffices to x↑Bi d T T T T −1 demonstrate that lim T g (x, d) equals g (B , d) for all i ∈ [Γ ,T ] \{0}. We treat two cases: x↑Bi i d T −1 T −1 T −1 i = Γd , and i ∈ [Γd + 1,T ], and begin with the case i = Γd . By assumption we preclude the T −1 case i = 0. Thus suppose i = Γd ∈ [1,T ]. In this case,

T T T lim g (x, d) = Fi−1(Bi , d) T x↑Bi T T = −bBi + (b + T )Bi + T b − (b + 1)i d. T = TBi + T b − (b + 1)i d. Alternatively,

T T T T g (Bi , d) = Gi (Bi , d) b + T T = (T − T d)Bi + (T − i)bd Ai T = TBi + T b − (b + 1)i d.

T −1 Combining the above completes the proof for this case. Next, suppose i ∈ [Γd +1,T ]. In this case,

T T T lim g (x, d) = Gi−1(Bi , d) T x↑Bi b + T T = (T − T d)Bi + T − (i − 1) bd Ai−1 T = TBi + T b − (b + 1)i d. Alternatively,

T T T T g (Bi , d) = Gi (Bi , d) b + T T T = (T − T d)Bi + (T − i)bd = TBi + T b − (b + 1)i d, Ai completing the proof. We now prove convexity. As in our proof of continuity, that gT (x, d) is a right-differentiable ST T function of x on (0,U)\ T −1 {Bi }, with non-decreasing right-derivative on the same set, follows i=Γd T T from definitions and the fact that Fi (x, d) and Gi (x, d) are linear functions of x on [0,U] for all i. It similarly follows that gT (x, d) is a right-differentiable function of x on [0,U] \{U}, and + T T −1 that lim T ∂ g (x, d) exists for all i ∈ [Γ ,T − 1] \{0}. It thus suffices to demonstrate that x↑Bi x d + T + T T T −1 T −1 lim T ∂ g (x, d) ≤ ∂ g (B , d) for all i ∈ [Γ ,T − 1] \{0}. We treat two cases: i = Γ , and x↑Bi x x i d d T −1 T −1 i ∈ [Γd +1,T −1], and begin with the case i = Γd . By assumption we preclude the cases i = 0,T . T −1 Thus suppose i = Γd ∈ [1,T − 1]. Then

+ T + T lim ∂x g (x, d) = lim ∂x Fi−1(x, d) T T x↑Bi x↑Bi = −b. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 48

Alternatively,

+ T T + T T ∂x g (Bi , d) = ∂x Gi (Bi , d) b + T = T − T d. Ai b + T T ≥ T − T Ai = −b. Ai T −1 Combining the above completes the proof for this case. Next, suppose i ∈ [Γd + 1,T − 1]. Then

+ T + T lim ∂x g (x, d) = lim ∂x Gi−1(x, d) T T x↑Bi x↑Bi b + T = T − T d. Ai−1 Alternatively,

+ T T + T T ∂x g (Bi , d) = ∂x Gi (Bi , d) b + T = T − T d. Ai T The desired result then follows from the fact that Ai is increasing in i and d is non-negative. Combining the above completes the proof. 7.5. Proofs of Lemmas5 and6 T +1 Proof: [Proof of Lemma5] The desired result follows directly from the observation that Aj+1 is T j+1 T Aj b+j+1 trivially strictly less than Aj+1 for all j ∈ [0,T − 2], and T +1 = T < 1 for all j ∈ [0,T − 2]. Aj+1 b+T T T −1 Proof: [Proof of Lemma6] Let i = Γd , j = Γd . From Lemma4, it suﬃces to prove that + T T + T T + T ∂x g (x, d) ≤ 0 for all x < Bi , and ∂x g Bi , d ≥ 0; or that ∂x g (x, d) ≤ 0 for all x < U and T Bi = U. It follows from Lemma5 that i ∈ {j, j + 1}. We now proceed by a case analysis. First, T T T T suppose i = j. In that case, Bi = Bj , j ≤ T − 1, and Bi < U. We conclude that for all x < Bi ,

+ T + T ∂x g (x, d) = ∂x Fj−1(x, µ) = −b. Noting that

+ T T + T T ∂x g Bi , d = ∂x Gi (Bi , d) b + T b + T T +1 = T − T d ≥ T − T Ai = 0 Ai Ai completes the proof in this setting. T T Alternatively, suppose that i = j + 1. In this case, Bi > Bj , and

+ T + T lim ∂x g x, d = lim ∂x Gi−1(x, d) T T x↑Bi x↑Bi b + T b + T T +1 = T − T d ≤ T − T Ai−1 = 0. Ai−1 Ai−1

T If Bi = U, the result follows from Lemma4. Otherwise, + T T + T T ∂x g Bi , d = ∂x Gi (Bi , d) b + T b + T T +1 = T − T d ≥ T − T Ai = 0. Ai Ai Combining the above completes the proof. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 49

7.6. Proof of Lemma7 Proof: [Proof of Lemma7] Let us proceed by showing that for each ﬁxed x ∈ [0,U], gT (x, d) = T T g max βd , x − d , d for all d ∈ [0,U]. As the equivalence is easily veriﬁed for the case d = 0, in T T T which case g (x, d) = g max βd , x − d , d = T x, suppose d > 0. We proceed by a case analysis, T T beginning with the setting d ∈ (0, zx ). In this case, max βd , x − d = x − d, and

T T x > d + BΓT ≥ d + B T −1 , d Γd

T where the second inequality follows from Lemma5 and the monotonicity (in i) of Bi . Combining T T with the easily veriﬁed fact that F j (x, d) = Gj (x − d, d) for all j completes the proof in this case. T T T T Alternatively, suppose d ∈ [zx ,U], which implies that max βd , x − d = βd , and d ≥ x − βd . We T T +1 proceed by a case analysis. First, suppose d ∈ (Aj ,Aj+1 ] for some j ∈ [−1,T − 2]. In this case, T T T T +1 T +1 Lemma5 implies that d ∈ (Aj ,Aj+1] (Aj ,Aj+1 ], and

T T g (x, d) = Gj (d). T T T T = Gj+1(Bj+1, d) = g (βd , d).

T +1 T Alternatively, suppose d ∈ (Aj+1 ,Aj+1] for some j ∈ [−1,T − 2]. In this case, Lemma5 implies that T T T T +1 T +1 d ∈ (Aj ,Aj+1] (Aj+1 ,Aj+2 ], and

T T g (x, d) = Gj+1(d). T T T T = Gj+2(Bj+2, d) = g (βd , d).

Lemma5 implies that this treats all cases. Combining the above completes the proof. 7.7. Proof of Lemma8 T Proof: [Proof of Lemma8] First, let us treat the case d ∈ [0, zx ), and begin by proving continuity. T Right-continuity at 0 when x =6 0 follows from the fact that limd↓0 F j (x, d) = T x for all j, and right-continuity at 0 when x = 0 follows from deﬁnitions. That gT (x, d) is a continuous function T ST −1 T T of d on (0, zx ) \ j=1 {x − Bj }, and a left-continuous function of d on (0, zx ), follows from the T T continuity (in d) of F (x, d) for all j. It similarly follows that lim T g (x, d) exists for all j s.t. j d↓x−Bj T T T T T x − B ∈ (0, z ), and it thus suﬃces to demonstrate that lim T g (x, d) equals g (x, x − B ) j x d↓x−Bj j T T for all j ∈ [1,T − 1] s.t. x − Bj ∈ (0, zx ). Note that for all such j,

T T T lim g (x, d) = F j−1(x, x − Bj ) T d↓x−Bj b + T T b + T T 2 = T x + (b − 1)T − b(j − 1) − T x (x − Bj ) + T (x − Bj ) . Aj−1 Aj−1 Alternatively,

T T T T g (x, x − Bj ) = F j (x, x − Bj ) b + T T b + T T 2 = T x + (b − 1)T − bj − T x (x − Bj ) + T (x − Bj ) . Aj Aj

T T T It follows that g (x, x − B ) − lim T g (x, d) equals j d↓x−Bj

b + T b + T T b + T b + T T 2 − b + ( T − T )x (x − Bj ) − ( T − T )(x − Bj ) Aj−1 Aj Aj−1 Aj Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 50

T b + T b + T b + T b + T T = (x − Bj ) − b + ( T − T )x − ( T − T )(x − Bj ) Aj−1 Aj Aj−1 Aj T T 1 1 = (x − Bj ) − b + (b + T )Bj ( T − T ) = 0, Aj−1 Aj completing the proof of continuity. + T We now prove convexity. Again applying Lemma3, it suﬃces to demonstrate that ∂d g (x, d) T T exists and is non-decreasing on (0, zx . Since F j (x, d) is a convex quadratic function of d for all j, + T T + T T ST −1 we conclude that: ∂d g (x, d) exists on (0, zx ); ∂d g (x, d) is non-decreasing on (0, zx ) \ j=1 {x − T + T T T B }; and lim T ∂ g (x, d) exists for all j ∈ [1,T − 1] s.t. x − B ∈ (0, z ). It thus suﬃces to j d↑x−Bj d j x + T + T T T T demonstrate that lim T ∂ g (x, d) ≤ ∂ g (x, x − B ) for all j ∈ [1,T − 1] s.t. x − B ∈ (0, z ). d↑x−Bj d d j j x Note that for any such j,

+ T + T T lim ∂ g (x, d) = ∂ F j (x, x − Bj ) T d d d↑x−Bj b + T b + T T = (b − 1)T − bj − T x + 2 T (x − Bj ) Aj Aj b + T = (b − 1)T − (b + 2)j + x T . Aj Alternatively, it follows from continuity that

+ T T + T T ∂d g (x, x − Bj ) = ∂d F j−1(x, x − Bj ) b + T b + T T = (b − 1)T − b(j − 1) − T x + 2 T (x − Bj ) Aj−1 Aj−1 b + T = (b − 1)T − (b + 2)j − b + x T . Aj−1 It follows that + T T + T ∂ g (x, x − Bj ) − lim ∂ g (x, d) (7.58) d T d d↑x−Bj equals b + T b + T −b + x T − x T , Aj−1 Aj which will be the same sign as j b − bAT + x(b + T ) − x(b + T ) = −bAT + x(b + T ) . (7.59) j−1 b + j j−1 b + j

b+j T T b+j Noting that b+T Aj−1 = Bj , and multiplying through the right-hand side of (7.59) by b(b+T ) , we T T T further conclude that (7.58) will be the same sign as x − Bj . Since by assumption x − Bj ∈ (0, zx ), T T we conclude that x − Bj ≥ 0, completing the proof of continuity and convexity for d ∈ (0, zx ) and right-continuity at 0. T T Next, let us treat the case d ∈ (zx ,U], and begin by proving continuity. That g (x, d) is a T ST −1 T +1 T continuous function of d on (zx ,U) \ j=0 {Aj }, and a left-continuous function of d on (zx ,U], T T follows from the continuity (in d) of Gj (d) for all j. It similarly follows that lim T +1 g (x, d) d↓Aj T +1 T T exist for all j ∈ [0,T − 1] s.t. Aj > zx . It thus suﬃces to demonstrate that lim T +1 g (x, d) = d↓Aj T T +1 T +1 T g (x, Aj ) for all j ∈ [0,T − 1] s.t. Aj > zx . Note that for any such j,

T T T +1 lim g (x, d) = Gj (Aj ) T +1 d↓Aj T T +1 = TBj+1 + T b − (b + 1)(j + 1) Aj . Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 51

Alternatively,

T T +1 T T +1 g (x, Aj ) = Gj−1(Aj ) T T +1 = TBj + T b − (b + 1)j Aj .

Thus

T T +1 T T T T +1 g (x, Aj ) − lim g (x, d) = T (Bj − Bj+1) + (b + 1)Aj T +1 d↓Aj T T = (jAT − (j + 1)AT + (b + 1) AT b + T j j+1 b + T j T = (b + j + 1)AT − (j + 1)AT = 0, b + T j j+1 completing the proof of continuity. + T We now prove concavity. Again applying Lemma3, it suffices to demonstrate that ∂d g (x, d) T T exists and is non-increasing on (zx ,U). Since Gj (d) is a linear function of d for all j, it follows from the piece-wise definition of gT (x, d) that demonstrating the desired concavity is equivalent to + T + T showing that ∂d Gj (0) is non-increasing in j. Noting that ∂d Gj (0) = T b − (b + 1)(j + 1), which is trivially decreasing in j, completes the proof. T ∆ Finally, let us prove continuity at zx . We consider two cases, depending on how ηd = x − d comes T T to go from lying above βd to lying below βd . This “crossing” can occur in two ways. In particular, T T T either βd and ηd actually intersect, or zx occurs at a jump discontinuity of βd and the two functions T never truly intersect. We proceed by a case analysis. Let i = Γ T . zx T T T T T ST T +1 First, suppose that βd and ηd actually intersect at zx , namely Bi = x−zx . If zx ∈ j=−1{Aj }, T +1 the proof of right-continuity follows identically to our previous proof of right-continuity at Aj T +1 T T for all j ∈ [0,T − 1] s.t. Aj > zx , and we omit the details. Otherwise, right-continuity at zx T follows from definitions. Either way, we need only demonstrate left-continuity. Note that zx ∈ T T (x − Bi+1, x − Bi ], and

T T T lim g (x, d) = F i (x, zx ) T d↑zx b + T T b + T T 2 = T x + (b − 1)T − bi − T x (x − Bi ) + T (x − Bi ) Ai Ai T T = TBi + T b − (b + 1)i (x − Bi ).

Alternatively,

T T T T g (x, zx ) = Gi−1(x − Bi ) T T = TBi + T b − (b + 1)i (x − Bi ), completing the proof of continuity in this case. T T T T +1 Alternatively, suppose that βd and ηd do not truly intersect at zx . In this case, zx = Ai ∈ T T (x − Bi+1, x − Bi ], and

T T T +1 lim g (x, d) = F i (x, Ai ) T d↑zx b + T T +1 b + T T +1 2 = T x + (b − 1)T − bi − T x Ai + T (Ai ) Ai Ai T +1 = b(T − i)Ai , Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 52 where the ﬁnal equality follows from straightforward algebraic manipulations, the details of which we omit. Alternatively,

T T T T +1 g x, zx ) = Gi−1(Ai ) T T +1 = TBi + T b − (b + 1)i Ai T +1 = b(T − i)Ai , where the final equality again follows from straightforward algebraic manipulations. This completes T the proof of left-continuity at zx . The proof of right-continuity in this case follows identically to T +1 T +1 T our previous proof of right-continuity at Aj for all j ∈ [0,T − 1] s.t. Aj > zx , and we omit the details. Combining the above completes the proof of the lemma. 7.8. Proof of Lemmas9 and 10 Proof: [Proof of Lemma9] The statements regarding continuity follow from Lemma8. Noting T T T −1 that d > αx implies d > x, concavity on (αx ,U) follows from (3.9) and Lemma8. Let i = ζx . T −1 T −1 Convexity on (0, zx ) follows from (3.9) and Lemma8. Supposing zx ∈/ {0,U}, it follows from T −1 T T T T T −1 definitions that zx ∈ (Ai−1,Ai ], where Ax = Ai . Combining with the convexity of F j (x, d) T −1 T −1 ST −1 T and Gj (d) for all j, to prove the lemma, it suffices to demonstrate that: 1. zx ∈/ j=−1{Aj } implies that + T + T T −1 lim ∂ f (x, d) ≤ ∂ f (x, zx ); (7.60) T −1 d d d↑zx ST −1 T and 2. x∈ / j=−1{Aj } implies + T + T lim ∂d f (x, d) ≤ ∂d f (x, x). (7.61) d↑x T −1 ST −1 T T −1 We treat several cases, and assume throughout that zx , x∈ / j=−1{Aj }. First, suppose zx = x. T T In this case, it follows from a straightforward contradiction argument that Γx = i = 0, x ∈ (0,A0 ), and

+ T + T −1 lim ∂ f (x, d) = −1 + lim ∂ F 0 (x, d) T −1 d T −1 d d↑zx d↑zx b + T − 1 b + T − 1 T −1 = −1 + (b − 1)(T − 1) − T −1 x + 2 T −1 zx A0 A0 b + T − 1 = −1 + (b − 1)(T − 1) + T −1 x A0 b + T − 1 T ≤ −1 + (b − 1)(T − 1) + T −1 A0 = bT − (b + 1). A0 Alternatively,

+ T T −1 + T −1 ∂d f (x, zx ) = b + ∂d G−1 (x) = b + (T − 1)b = bT. Combining the above completes the proof of both (7.60) and (7.61) in this case. T −1 ∆ Next, suppose zx < x. We again consider two cases, depending on how ηd = x − d comes to go T −1 T −1 from lying above βd to lying below βd . In the case that the two actually intersect, it may be T −1 T −1 easily veriﬁed that zx = x − Bi , and

+ T + T −1 lim ∂ f (x, d) = −1 + lim ∂ F i (x, d) T −1 d T −1 d d↑zx d↑zx b + T − 1 T −1 T −1 b + T − 1 T −1 = −1 + (b − 1)(T − 1) − bi − T −1 (zx + Bi ) + 2 T −1 zx Ai Ai b + T − 1 T −1 = −1 + (b − 1)(T − 1) − (b + 1)i + T −1 zx . Ai Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 53

Alternatively,

+ T T −1 + T −1 T −1 ∂d f x, zx = −1 + ∂d Gi−1 (zx ) = −1 + (T − 1)b − (b + 1)i.

It follows that

+ T T −1 + T b + T − 1 T −1 ∂ f (x, zx ) − lim ∂ f (x, d) = T − 1 − zx , d T −1 d T −1 d↑zx Ai which will be the same sign as

T T −1 T T Ai − zx ≥ Ai − Ai = 0.

Combining the above completes the proof of (7.60) in this case. Furthermore, the proof of (7.61) ST −1 T follows from the fact that x∈ / j=−1{Aj }, from which it follows that

+ T + T ∂d f (x, x) − lim ∂d f (x, d) = b + 1. d↑x

T −1 T −1 ST −2 T −1 ST −1 T Finally, suppose that zx < x and zx ∈/ j=1 {x − Bj }. As x∈ / j=−1{Aj }, this ﬁnal case follows nearly identically to the proof of (7.61) in the previous case, and we omit the details. Combining all of the above cases completes the proof of the lemma. Proof: [Proof of Lemma 10] Note that for any measure Q ∈ M(µ), it is true that

f(R) − f(L) Rf(L) − Lf(R) [f(D)] ≤ [η(D)] = µ + . EQ EQ R − L R − L But since q has support only on points d ∈ [0,U] s.t. f(d) = η(d), it follows that

f(R) − f(L) Rf(L) − Lf(R) [f(D)] = [η(D)] = µ + . Eq Eq R − L R − L

Combining the above completes the proof.

7.9. Proof of Lemma 11 Proof: [Proof of Lemma 11] We ﬁrst prove that KT (x, d) ≥ fT (x, d) for all d ∈ [0,U]. Let i = T −1 T −1 T −1 T T ζx , j = Γx , and k = Υx . That K (x, 0) ≥ f (x, 0) follows from deﬁnitions. We now prove that T T T T K (x, Al ) ≥ f (x, Al ) for all l ∈ [i, j − 1]. First, it will be useful to rewrite K in a more convenient T T form. Noting that ℵx = Ak ,

T T T T −1 T f (x, Ak ) = b(Ak − x) + Gk−1 (Ak ) T T −1 T = b(Ak − x) + (T − 1)Bk + (T − 1)b − (b + 1)k Ak , and

T T T f (x, Ak ) − T x K (x, d) = T d + T x Ak T −1 (T − 1)Bk − (b + T )x = T b − (b + 1)k + T d + T x Ak (b + T )x = b(T − k) − T d + T x. (7.62) Ak Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 54

T T −1 Combining with the fact that for all l ∈ [i, j − 1] one has Al ∈ [zx , x], proving the desired statement is equivalent to proving that

(b + T )x T T T −1 T b(T − k) − T Al + T x ≥ x − Al + (T − 1)Bl + (T − 1)b − (b + 1)l Al , Ak which is itself equivalent to demonstrating that

(b + T )x T b(l + 1 − k) + 1 − T Al + (T − 1)x ≥ 0. (7.63) Ak First, it will be useful to prove that the left-hand side of (7.63),

∆ (b + T )x T η(l) = b(l + 1 − k) + 1 − T Al + (T − 1)x, Ak is decreasing in l, for l ∈ [0, j − 1]. Indeed, after simplifying, we ﬁnd that b (b + T )x T η(l + 1) − η(l) = b + b(l + 1 − k) + 1 − T Al+1, b + l + 1 Ak which will be the same sign as

T b(l + 2 − k) + l + 2 Ak − (b + T )x T T ≤ b(l + 2 − k) + l + 2 Ak − (b + T )Bk T = (b + 1)(l + 2 − k)Ak . Noting that ` + 1 ≤ j − 1 and (3.11) implies j ≤ k completes the proof of monotonicity, which we now use to complete the proof of (7.63). In particular, the above monotonicity implies that to prove (7.63), it suffices to prove that η(k − 1) ≥ 0. Note that b(T − 1 − k) − k η(k − 1) = AT + x. k−1 b + k If b(T − 1 − k) − k ≥ 0, then trivially η(k − 1) ≥ 0. Thus suppose b(T − 1 − k) − k < 0. In this case, η(k − 1) ≥ 0 iff (b + k)AT x ≤ k−1 . k − b(T − 1 − k) T As x ≤ Bk+1, it thus suffices to prove that b + k BT ≤ AT , k+1 k − b(T − 1 − k) k−1

T which, dividing both sides by Ak−1(b + k) and simplifying, is itself equivalent to proving that (b + T )k ≥ k − b(T − 1 − k)(b + k + 1).

Noting that k ≤ T − 1, and thus k − b(T − 1 − k) ≤ k, thus completes the desired proof that T T T T K (x, Al ) ≥ f (x, Al ) for all l ∈ [i, j − 1]. T T T T We now prove that K (x, Al ) ≥ f (x, Al ) for all l ∈ [j, k]. By construction,

T T T T K (x, Ak ) = f (x, Ak ), and thus it suﬃces to prove the desired claim for l ∈ [j, k − 1]. Note that the degenerate case for T T which x = Ak can be ignored, as in that case j = k. Thus suppose x < Ak . In this case, Lemma9 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 55

T T T T implies that f (x, d) is a continuous, concave, piecewise linear function of d on [Aj ,Ak ]. As K (x, d) is a linear function of d, it follows from the basic properties of concave functions that to prove the desired claim, it suﬃces to demonstrate that

+ T + T ∂ K(x, Ak ) ≤ lim ∂ f (x, d), (7.64) d T d d↑Ak which is equivalent to proving that

(b + T )x + T −1 b(T − k) − ≥ b + lim ∂ Gk−1 (d). (7.65) T T d Ak d↑Ak

It follows from definitions that b + T b + T AT = BT ≤ x. k k k k Combining with (7.65), we find that to prove the desired claim, it suffices to demonstrate that

b(T − k) − k ≥ b + (T − 1)b − (b + 1)k. (7.66)

Noting that both sides of (7.66) are equivalent completes the proof. T T T T Finally, let us prove that K (x, Al ) ≥ f (x, Al ) for all l ∈ [k + 1,T − 1], which will complete the proof. It again follows from Lemma9 and the basic properties of concave functions that in this case + T T + T T it suﬃces to demonstrate that ∂d K (x, Ak ) ≥ ∂d f (x, Ak ), which is equivalent to proving that

(b + T )x + T −1 T b(T − k) − T ≥ b + ∂d Gk (Ak ). (7.67) Ak It follows from deﬁnitions that k + 1 AT = AT k b + k + 1 k+1 b + T b + T = BT ≥ x. b + k + 1 k+1 b + k + 1

Combining with (7.67), we ﬁnd that in this case it suﬃces to demonstrate that

b(T − k) − (b + k + 1) ≥ b + (T − 1)b − (b + 1)(k + 1). (7.68)

Noting that both sides of (7.68) are equivalent completes the proof. Combining all of the above with the piece-wise convexity guaranteed by Lemma9 completes the proof that KT (x, d) ≥ fT (x, d) for all d ∈ [0,U]. T T T We now prove that for all l ∈ [k, T − 2], Ll (x, d) ≥ f (x, d) for all d ∈ [0,U]. Note that f (x, d) is T T T T a concave function on [Ak ,U], and by construction Ll (x, d) is a line tangent to f (x, d) at Al . It follows from the basic properties of concave functions that

T T T Ll (x, d) ≥ f (x, d) for all d ∈ [Ak ,U].

Combining those same properties with (7.64) and the basic properties of linear functions, it follows T T T that Ll (x, d) ≥ K (x, d) for all d ∈ [0,Ak ]. Combining all of the above completes the proof. Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 56

7.10. Proof of Theorems5 and6, and Observation4

Proof: [Proof of Theorem5] Let π1 denote the base-stock policy which always orders up to 0, and π2 the base-stock policy which always orders up to U. Note that if x0 = 0, then for any π1 π2 T Q ∈ MGEN, w.p.1 Ct = bdt, and Ct = U − dt, t ∈ [1,T ], which implies that OptGEN(µ, U, b) ≤ T min {T bµ, T (U − µ)} = OptIND(µ, U, b). Proof: [Proof of Theorem6] From Theorems1 and3,

T +1 T T T T T G T (βµ , µ) Γµ A T + T b − (b + 1)Γµ µ OptMAR(µ, U, b) Γµ Γµ T = = . OptIND(µ, U, b) min {T bµ, T (U − µ)} min {T bµ, T (U − µ)}

Combining the above with Lemma 12 completes the proof. Proof: [Proof of Observation4] The results of [119] (i.e. Theorem1) imply that if x0 = U and µ 1 U ≤ b+1 , then the dynamics of Problem 2.1 at optimality are as follows. The initial inventory level µ µ equals U. In each period, with probability U , the demand equals U, and with probability 1 − U equals 0. Up until the ﬁrst time that the demand equals U, the inventory level will equal U, and at the end of each period a holding cost equal to U is incurred. The ﬁrst time the demand equals U, the inventory level drops to 0, and in that period no cost is incurred. In all later periods, the inventory level is raised to 0, and if in that period the demand equals U, a cost of bµ is incurred - otherwise no cost is incurred. It follows that the value of Problem 2.1 equals

T X µ µ µ (1 − )k−1 (k − 1)U + (T − k)bµ + (1 − )T UT, U U U k=1 which is itself equal to U 2 µ U bµT + − (b + 1)U + (1 − )T (1 + b − )U. µ U µ

T µ T Alternatively, it follows from Theorems2 and3, combined with the fact that qU,µ(U) = U , qU,µ(0) = µ 1 − U , and the martingale property, that the dynamics of Problem 2.4 at optimality are as follows. µ The initial inventory level equals U. With probability U , the demand in every period is U, and the inventory level is raised to U at the start of each period, so no cost is incurred over the entire µ time horizon. With probability 1 − U , the demand in every period is 0, and the inventory level in every period is U, implying that a cost of TU is incurred over the time horizon. It follows that the µ value of Problem 2.4 equals (1 − U )TU = (U − µ)T . Combining the above with a straightforward µ 1 U−µ limiting argument, and the fact that U < b+1 implies bµ > 1, completes the proof.

7.11. Further interpretation of Observation3 To further help interpret Observation3 and the explicit forms for the various quantities, we now show how (for the case T = 2) one may derive the same quantities from a certain “heuristic relaxation” of our original problem. In doing so, we will be able to see precisely where and why certain quantities arise. In light of Observation1, let us consider the following simplified version of Problem 2.4. To set up this simplified problem, we reason as follows. Observation1 suggests that at optimality a worst-case distribution always has support on 2 points, one on 0 and one on some value which clears the inventory. Thus we create a simplified problem by “enforcing” that in the first period, the adversary selects such a distribution. Namely, if the post-ordering inventory level (in the first period) equals x, then in the first period the adversary must select (as a function of x) µ some value d ∈ max(x, µ),U and set the demand (in the first period) equal to d w.p. d , and equal µ to 0 w.p. 1− d . Note that if the adversary acts in this manner in the first period then the dynamics in the second period will (since the policy-maker can order up to any level as their inventory will Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 57 have been cleared in the first period) be equivalent to the dynamics of a single-period problem in which the policy-maker may select any starting level in [0,U] and the demand must have mean D1 (either d or 0 depending on the random realized demand in the first period), a minimax problem whose explicit solution is given in Theorem1. Combining the above, we are led to the following simplified version of Problem 2.4: µ µ inf sup (1 − ) × (2x) + × b × (d − x) + min(bd, U − d) . x∈[0,U] d d d∈ max(x,µ),U

Further relaxing the problem to allow d to take any value in [0,U] (i.e. relaxing the constraint that d ≥ max(x, µ), which will yield the same insights through a simpler analysis), we are led to the following further-simplified problem : µ µ inf sup (1 − ) × (2x) + × b × (d − x) + min(bd, U − d) . (7.69) x∈[0,U] d∈[0,U] d d For a fixed x ∈ [0,U], let us consider the inner maximization problem. Noting that min(bd, U −d) = U bd exactly when d ≤ b+1 , we find (after some simple algebra) that the inner problem has optimal value (b + 2)µx U − (b + 2)x max sup 2x + 2µb − , sup 2x + (b − 1)µ + µ × . (7.70) U d U d d∈[0, b+1 ] d∈[ b+1 ,U]

(b+2)µx U Noting that sup U 2x + 2µb − attains its maximum at d = , which is equivalent to d∈[0, b+1 ] d b+1 U−(b+2)x 2x + (b − 1)µ + µ U , we find that the first term within the maximum is superfluous in (7.70), b+1 and that (7.70) equals U − (b + 2)x sup 2x + (b − 1)µ + µ × . (7.71) U d d∈[ b+1 ,U] Now, we find that the optimal value for d in (7.71) is determined by the sign of U − (b + 2)x, i.e. U U whether x ≤ b+2 or not. We have thus so far identified two critical values : b+1 as determining U the value and behavior in the final period (in line with the single-period problem), and b+2 as determining the sign of a critical coefficient which drives whether a worst-case distribution selects the non-zero value for demand to be “small” or “large”. Combining the above with some simple U algebra, we conclude the following. For x ∈ [0, b+2 ), the inner maximization has a maximizer at U µ U d = b+1 , and has value (at this maximizer) 2µb + 2 − (b + 1)(b + 2) U x. For x ∈ [ b+2 ,U], the inner µ maximization has a maximizer at d = U, and has value (at this maximizer) µb + 2 − (b + 2) U x. µ µ Thus the optimal choice of x is driven by the sign of 2 − (b + 1)(b + 2) U and 2 − (b + 2) U , which coincides exactly with whether µ ∈ [0, 2U ], µ ∈ ( 2U , 2U ], or µ ∈ ( 2U ,U]. (b+1)(b+2) (b+1)(b+2) b+2 b+2 2U µ Indeed, first suppose µ ∈ [0, ]. Then inf U 2µb + 2 − (b + 1)(b + 2) x = 2µb (b+1)(b+2) x∈[0, b+2 ] U µ µ U (attained at x = 0), while inf U µb + 2 − (b + 2) x = µb + 2 − (b + 2) × (attained x∈[ b+2 ,U] U U b+2 U at x = b+2 ). As it follows from some straightforward algebra (the details of which we omit) that µ U 2U µb + 2 − (b + 2) U × b+2 ≥ 2µb for all µ ∈ [0, (b+1)(b+2) ], in this case we conclude that the optimal x equals 0, the optimal d (for that choice of x) equals U , and the optimal value equals 2µb. b+1 2U 2U µ Next, suppose µ ∈ ( , ]. Then inf U 2µb + 2 − (b + 1)(b + 2) x = 2µb + 2 − (b+1)(b+2) b+2 x∈[0, b+2 ] U µ U U µ (b + 1)(b + 2) × (attained at x = ), while inf U µb + 2 − (b + 2) x = µb + 2 − U b+2 b+2 x∈[ b+2 ,U] U Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 58

µ U U U (b + 2) U × b+2 (attained at x = b+2 ). In this case we conclude that the optimal x equals b+2 , the optimal d (for that choice of x) equals U, and the optimal value equals µ(b − 1) + 2U . b+2 2U µ Finally, suppose µ ∈ ( ,U]. Then inf U 2µb + 2 − (b + 1)(b + 2) x = 2µb + 2 − (b + b+2 x∈[0, b+2 ] U µ U U µ 1)(b + 2) × (attained at x = ), while inf U µb + 2 − (b + 2) x = µb + 2 − (b + U b+2 b+2 x∈[ b+2 ,U] U µ 2) U × U (attained at x = U). In this case, it again follows from some straightforward algebra (the details of which we omit) that the optimal x equals U, the optimal d (for that choice of x) equals U, and the optimal value equals 2(U − µ). We note that the above cases and values are perfectly consistent with the cases and values for 2 2 2 2 all quantities appearing in Observation3, including χMAR, OptMAR,D1, and X1 . In summary, the U break-point b+1 comes from the two possible behaviors in the single-period problem (corresponding U to the final period), the break-point b+2 (for x) determines whether a critical coefficient in nature’s inner maximization is positive or negative, hence dictating whether nature selects a “small” or 2U 2U “large” value for its non-zero support point. Finally, the break-points (b+1)(b+2) and b+2 (for µ) determine which coefficients in the policy-maker’s outer minimization are positive, also acting as important determinants for the optimal policy. We leave as an interesting direction for future research understanding more formally and generally the connection between such a simplified problem and our true problem.

7.12. Different behaviors when our assumptions do not hold As noted in Section 2.3, the “obsolescence phenomena”, i.e. the property that conditional on the past demand realizations and the current inventory level (under an optimal policy), there always exists a worst-case distribution that assigns a strictly positive probability to zero (and otherwise clears the inventory), is a feature of our particular modeling assumptions, and may not hold under different assumptions. Here we provide several examples showing that this feature need not hold if one relaxes our modeling assumptions. The first three examples collectively demonstrate that: 1. if one allows for time-dependent costs, then a worst-case distribution may not put positive probability at zero (first example); 2. if one removes the upper bound on the support, then a worst- case distribution may not even exist (second example); and 3. if one imposes a lower bound on the support, then a worst-case distribution may not put positive probability on this lower bound, and furthermore may not clear the inventory (third example). Collectively, these findings suggest that extending our framework to more complex models will require several fundamentally new ideas, as much of the structure which allowed for our explicit analysis may no longer hold. However, our fourth example shows there is indeed hope of extending our results to more general settings, by showing that the obsolescence phenomena again manifests even if holding costs can vary arbitrarily over time, if one enforces (the admittedly strong assumption) that all backlogging costs are 0.

Example 1. Non-stationary and possibly zero cost parameters. Here we show that if one relaxes the constraint that the cost parameters are stationary and strictly positive, diﬀerent behavior arises. Indeed, let us consider the modiﬁcation of Problem 2.4 in which x0 = 0,T = 2, h1 = U 1, b1 = 0, h2 = 1. We let b2 > 0, U > 0 be general, and set µ = . Namely, the problem is identical b2+1 to Problem 2.4, except that we allow for non-stationary and possibly zero costs. First, we claim ∗ ∗ ∗ ∗ that in this setting, there exists a minimax optimal policy π = (x1, x2) satisfying x1 = 0, and ∗ x2(d1) = χIND(d1, U, b2). Indeed, to see this, note that we can derive a lower bound on the value of the associated minimax problem by relaxing the problem s.t. at the start of the second period, the policy-maker may (at zero cost) select any inventory level in [0,U], i.e. the policy-maker may order or dispose of inventory at that time (in the actual problem inventory disposal is infeasible). It follows from Theorem1, non-negativity of all costs, and the fact that the dynamics in the Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 59

final period coincide with those of a corresponding single-period problem (with µ equal to the ∗ ∗ realized value of D1), that π is actually optimal for this modified problem. As π is also feasible for the original problem, optimality for the original problem immediately follows. Again applying ˆ∗ ∗ Theorem1, it follows that there exists a worst-case measure Q (against π ) s.t. for all d1 ∈ [0,U], Qˆ∗ (0) = 1 − d1 , Qˆ∗ (U) = d1 . Furthermore, again applying Theorem1, we find that Qˆ∗ can be 2|d1 U 2|d1 U 1 taken to be any measure belonging to arg max EQ min(b2 × D1,U − D1) . Q∈M(µ)

A simple application of Jensen’s inequality shows that the measure Q which assigns probability ˆ∗ 1 to the d maximizing min(b2 × d, U − d) is feasible and optimal, i.e. one can take Q1 to be the measure s.t. Qˆ∗( U ) = 1. Namely, here Qˆ∗ does not put strictly positive probability at 0, i.e. 1 b2+1 1 such an obsolescence phenomena does not occur (at least not in the ﬁrst period). Note that non- stationarity was critical to this conclusion, since if b2 also equalled zero then the problem would become degenerate with the policy which never orders optimal, and every feasible martingale inducing zero cost.

Example 2. No upper bound on support. Here we show that if one relaxes the constraint that the demand must be at most U, diﬀerent behavior arises. Let us consider the modiﬁcation of

Problem 2.4 in which x0 = 1,T = 1, h1 = 1, b1 = 1, µ = 1, but there is no upper bound on the support (equivalently U = ∞). Let M∞(µ) denote the set of all probability measures with non-negative support and mean µ. In this case, for any ﬁxed choice of x1 ≥ x0, the inner maximization (over measures) is equivalent to

sup EQ [[D1 − x1]+ + [x1 − D1]+] . (7.72) Q∈M∞(1)

Note that for any strictly positive real number y and non-negative r.v. Z s.t. P (Z = 0) < 1, it is easily verified that P [Z − y]+ + [y − Z]+ < Z + y > 0. Since trivially [Z − y]+ + [y − Z]+ ≤ Z + y ∞ w.p.1, it follows that for any fixed x1 ≥ x0 and Q ∈ M (1), EQ [[D1 − x1]+ + [x1 − D1]+] < 1 + x1. M M 1 1 M However, note that if (for M ≥ x1 +2) we let Q denote the measure s.t. Q ( M ) = 1− M ,Q (M − 1 1 M ∞ 1 1 1 + M ) = M , we find that Q ∈ M (1), and EQM [[D1 − x1]+ + [x1 − D1]+] = (1 − M ) × (x1 − M ) + 1 1 2(x1+1) M × (M − 1 + M − x1) ≥ 1 + x1 − M , and thus limM→∞ EQM [[D1 − x1]+ + [x1 − D1]+] = 1 + x1. Combining the above, we conclude that for any feasible policy, there does not exist any worst- case distribution. Furthermore, the unique optimal policy is that which orders nothing in the first period, and there exists a sequence of probability measures which (in the limit) yield a minimax value of 2, such that no distribution in the sequence puts strictly positive probability at 0.

Example 3. Positive lower bound on support. Here we show that if one imposes an additional lower bound constraint on the support of the demand, diﬀerent behavior arises. Let us consider the modiﬁcation of Problem 2.4 in which x0 = U, T = 2, h1 = b1 = h2 = b2 = 1. We let U+L U > 0,L ∈ (0,U), be general so long as they satisfy the constraints U < 2 × L, and set µ = 2 . Let ML(µ) denote the set of all probability measures with support a subset of [L, U] and mean µ, and MARL denote the collection of all probability measures corresponding to discrete-time martingale L L sequences (D1,D2) s.t. for all t, the marginal distribution of Dt belongs to M (µ). Let Π denote the family of (appropriately adapted) policies that both : 1. never order up to more than U, and 2. always order up to at least L. Restricting to policies in ΠL is without loss of generality, where the proof is analogous to that of Lemma1, and we omit the details. Then we consider the problem Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 60

PT π infπ∈ΠL supQ∈MARL EQ[ t=1 Ct ]. We begin by deﬁning some auxiliary functions, which correspond to the solution to the corresponding 1-period problem. It follows from a straightforward convexity argument (analogous to that used in the proof of Theorem1) and some algebra that

1,L ∆ (U − x) × (ˆµ − L) + (x − L) × (U − µˆ) g (x, µˆ) = sup EQ [|x − D1|] = , Q∈ML(ˆµ) U − L

µˆ−L where one can always take as a worst-case distribution the measure Q s.t. Q(U) = U−L ,Q(L) = µˆ−L 1,L 1 − U−L . Furthermore, infx∈[L,U] g (x, µˆ) = min(ˆµ − L, U − µˆ), and  U ifµ ˆ > U+L ;  2 ∗,L ∆ 1 U+L x (ˆµ) = arg min g (x, µˆ) = L ifµ ˆ < 2 ; x∈[L,U]  U+L [L, U] ifµ ˆ = 2 .

Furthermore, using the facts that: 1. x0 = U, 2. U < 2 × L implies x1 − D1 ≤ L w.p.1, and 3. the dynamics in the ﬁnal period coincide with those of a corresponding single-period problem (whose 1,L PT π solution is given by g ), we ﬁnd that infπ∈ΠL supQ∈MARL EQ[ t=1 Ct ] equals

U − µ + sup EQ [min(D1 − L, U − D1)] , (7.73) Q∈ML(µ)

∗ ∗,L where the policy π that orders nothing in period 1, and orders up to x (D1) in period 2 (where U+L any consistent choice may be selected from [L, U] if D1 = 2 ) will be optimal. A simple application of Jensen’s inequality shows that the measure Q which assigns probability 1 to the d maximizing U+L min(d − L, U − d) is feasible and optimal, i.e. one can take Q to be the measure s.t. Q( 2 ) = 1. Summarizing the above, we conclude that a worst-case measure (for π∗) is the martingale Qˆ∗ s.t. ˆ∗ U+L ˆ∗ ˆ∗ 1 Q1( 2 ) = 1, while Q2(L) = Q2(U) = 2 . Hence the worst-case measure not only puts no probability at 0 (which would be impossible due to the lower bound), but actually puts no probability even at the lower bound itself (at least in the ﬁrst round). The intuition here is that due to the lower bound, it is less appealing for the adversary to leave the policy-maker stuck holding inventory, as some of this inventory is always reduced in later rounds (due to the lower bound). Interestingly, we note that here, D1 is not large enough to clear the initial inventory, instead only bringing the inventory below the lower bound (in contrast to the behavior when no such lower bound is imposed, e.g. in Observation1).

Example 4. Non-stationary cost parameters revisited: a setting where the obsolescence phenomena again manifests. Here we show that there is a simple (albeit not completely trivial) setting in which costs can be non-stationary, yet our results (and general insights) regarding obsolescence continue to hold. Let us consider the modiﬁcation of Problem 2.4 in which x0 ∈ (0,U] is strictly positive (but otherwise general), T is general, {hi, i = 1,...,T } are strictly positive (but otherwise general), µ ∈ (0,U) is general, and the major restriction is that bi = 0 for all i. Namely, there is no back-logging cost whatsoever, and the only incurred costs are holding costs, which again may vary in a general manner over time. It follows from a straightforward contradiction argument (similar in spirit to the proof of Lemma1, the details of which we omit) that in this setting the policy π∗ which never orders is optimal. The associated inner maximization (over measures) is thus equivalent to T t X X sup htEQ[max(0, x0 − Di)]. Q∈MAR t=1 i=1 Xin and Goldberg: Distributionally robust inventory control when demand is a martingale 61

For t ∈ [1,T ], let MtU (tµ) denote the set of all probability measures Q with support on [0, tU] such Q Pt that E[D ] = tµ. As for all t ∈ [1,T ], i=1 Di has mean tµ and support a subset of [0, tU], we conclude (after applying convexity and Theorem1) that for all Q ∈ MAR and t ∈ [1,T ],

t X µ EQ[max(0, x0 − Di)] ≤ sup EQ max(0, x0 − D1) = (1 − ) × x0, tU U i=1 Q∈M (tµ)

∗ ∗ µt µ ∗ µ where the measure Q such that Q (0) = 1 − Ut = 1 − U ,Q (tU) = U belongs to ˆ∗ arg maxQ∈MtU (tµ) EQ max(0, x0 −D1) . Noting that under the martingale measure Q ∈ MAR such Qˆ∗ µ Qˆ∗ µ that P Di = U for all i ∈ [1,T ] = U ,P Di = 0 for all i ∈ [1,T ] = 1 − U it holds (for all t) Pt µ Pt µ ∗ that P ( i=1 Di = 0) = 1 − U ,P ( i=1 Di = tU) = U , we conclude that (under optimal policy π ) ˆ∗ PT Pt ˆ∗ Q ∈ arg maxQ∈MAR t=1 EQ[max(0, x0 − i=1 Di)]. We note that the measure Q corresponds to a strong manifestation of the obsolescence phenomena, as even in the ﬁrst period the demand is either 0 or U (with the demand stuck at its initial value in all subsequent periods). The intuition here is that when the only costs possible are holding costs, all costs are derived from being “stuck with inventory”, and the obsolescence setting (and the correlations over time in demand it induces) are (in an appropriate sense) worst-case along these lines. We note that in the symmetric setting in which all holding costs are 0 while the back-logging costs may be general, the inner maximization becomes degenerate, as the policy which always orders up to U will be optimal, and no cost will be incurred under any distribution for demand (under this policy).