Monte Carlo Simulation Approach to Stochastic Programming
Proceedings of the 2001 Winter Simulation Conference
B. A. Peters, J. S. Smith, D. J. Medeiros, and M. W. Rohrer, eds.

MONTE CARLO SIMULATION APPROACH TO STOCHASTIC PROGRAMMING

Alexander Shapiro
School of Industrial & Systems Engineering
Georgia Institute of Technology
Atlanta, Georgia 30332, U.S.A.

ABSTRACT

Various stochastic programming problems can be formulated as problems of optimization of an expected value function. Quite often the corresponding expectation function cannot be computed exactly and should be approximated, say by Monte Carlo sampling methods. In fact, in many practical applications, Monte Carlo simulation is the only reasonable way of estimating the expectation function. In this talk we discuss convergence properties of the sample average approximation (SAA) approach to stochastic programming. We argue that the SAA method is easily implementable and can be surprisingly efficient for some classes of stochastic programming problems.

1 INTRODUCTION

Consider the optimization problem

$$ \min_{x \in X} \; g(x) := \mathbb{E}_P[G(x,\omega)], \qquad (1) $$

where $G : \mathbb{R}^n \times \Omega \to \mathbb{R}$, the expectation is taken with respect to the probability measure $P$ defined on a sample space $(\Omega, \mathcal{F})$, and $X \subset \mathbb{R}^n$. We assume that for every $x \in X$ the expectation $g(x)$ is well defined.

The function $G(x,\omega)$ can itself be defined by an optimization problem. For example, in two-stage stochastic programming with recourse, $G(x,\omega)$ is given by the optimal value of the corresponding second stage problem. In particular, in two-stage linear stochastic programming with recourse, $G(x,\omega)$ is the optimal value of

$$ \min_{y} \; c^T x + q^T y \quad \text{s.t.} \; Tx + Wy = h, \; y \ge 0, \qquad (2) $$

where $(q(\omega), h(\omega), T(\omega), W(\omega))$ is the random data of the problem.

If the sample space $\Omega$ is finite, say $\Omega := \{\omega_1, \ldots, \omega_K\}$ with respective probabilities $p_k$, $k = 1, \ldots, K$, then

$$ g(x) = \sum_{k=1}^{K} p_k \, G(x, \omega_k). \qquad (3) $$

Consequently problem (1) can be viewed as a deterministic optimization problem. Note, however, that the number $K$ of possible realizations (scenarios) of the data can be very large. Suppose, for instance, that $\omega$ is a random vector with 100 stochastically independent components, each having 3 realizations. Then the total number of scenarios is $K = 3^{100}$. No computer in the foreseeable future will be able to handle that number of scenarios.

One can try to solve the optimization problem (1) by Monte Carlo simulation. That is, by generating a random (say iid) sample $\omega^1, \ldots, \omega^N \sim P$, one can estimate the expectation $g(x)$ by the corresponding sample average

$$ \hat{g}_N(x) := N^{-1} \sum_{j=1}^{N} G(x, \omega^j). \qquad (4) $$

There are two basic approaches to solving the optimization problem (1) by using such Monte Carlo sampling approximations. In one approach the sampling is performed inside a particular algorithm, with a new sample generated at each iteration of the corresponding numerical procedure. In the other approach $\hat{g}_N(x)$ is considered as a function of $x$, associated with a generated random sample, and the "true" (expected value) problem (1) is approximated by the optimization problem

$$ \min_{x \in X} \; \hat{g}_N(x). \qquad (5) $$

It is assumed that for a given $\omega \in \Omega$ the function $G(\cdot,\omega)$, and maybe its derivatives, is computable at any point $x \in X$. Therefore, once the random sample is generated, problem (5) becomes a deterministic optimization problem. Required values of the sample average function $\hat{g}_N(x)$ can be calculated either by keeping the generated random sample in the computer memory or by using common random number generation. We refer to this approach as the sample average approximation (SAA) method. Note that the SAA method is not an algorithm, since one still has to choose a particular numerical procedure in order to solve the SAA problem (5).

In this talk we discuss various properties of the SAA method. We argue that in many instances the SAA method can be very efficient and easily implementable. It is difficult to point at the origin of that approach. The idea is simple and natural, and variants of the SAA method were suggested by various authors over the years. In the context of simulation models, a variant of the SAA method based on Likelihood Ratio transformations was suggested in Rubinstein and Shapiro (1993). Independently, and more or less at the same time, similar ideas were employed in Statistics for computing Maximum Likelihood estimates by Monte Carlo techniques based on Gibbs sampling, Geyer and Thompson (1992). Let us also remark that the terminology "sample average approximation" is not uniform in the literature. For example, the term "sample-path optimization" was used in Plambeck, Fu, Robinson and Suri (1996).
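To make the constructions (2), (4), and (5) concrete, the following is a minimal Python sketch of the SAA method for a hypothetical one-dimensional two-stage instance. The cost data, the exponential distribution of the random demand, and the finite candidate set used in place of a general feasible set $X$ are illustrative assumptions, not taken from the paper; the second-stage value is computed with scipy.optimize.linprog.

```python
# Minimal SAA sketch for a toy two-stage linear problem (illustrative data only).
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

c = 2.0                      # first-stage cost per unit of x (hypothetical)
q = np.array([4.0, 1.0])     # second-stage costs: shortage, surplus
W = np.array([[1.0, -1.0]])  # recourse matrix acting on y = (y_short, y_surp)
T = np.array([[1.0]])        # technology matrix acting on x

def G(x, d):
    """G(x, omega): c^T x plus the optimal value of the second-stage LP (2),
    written as min q^T y subject to W y = h(omega) - T x, y >= 0, with h(omega) = d."""
    res = linprog(q, A_eq=W, b_eq=np.array([d]) - T @ np.array([x]))
    return c * x + res.fun

def g_hat(x, sample):
    """Sample average approximation (4) of g(x) = E[G(x, omega)]."""
    return np.mean([G(x, d) for d in sample])

# Generate one iid sample of the random demand and solve the SAA problem (5)
# by enumerating a finite set of candidate first-stage decisions.
N = 200
sample = rng.exponential(scale=3.0, size=N)   # hypothetical distribution of omega
X = np.linspace(0.0, 10.0, 21)                # finite candidate set standing in for X

values = [g_hat(x, sample) for x in X]
x_saa = X[int(np.argmin(values))]
print(f"SAA optimal value {min(values):.3f} attained at x = {x_saa:.2f}")
```

Enumerating a finite candidate set keeps the sketch short; for a genuine two-stage linear program one would instead solve problem (5) through its deterministic equivalent, which is a single linear program over the N sampled scenarios.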
2 RATES OF CONVERGENCE

Let us denote by $v_0$ and $S_0$ the optimal value and the set of optimal solutions, respectively, of the "true" problem (1), and similarly by $\hat{v}_N$ and $\hat{S}_N$ the optimal value and the set of optimal solutions, respectively, of the SAA problem (5). By the Law of Large Numbers we have that, for a given $x$, the sample average $\hat{g}_N(x)$ converges to the corresponding expectation $g(x)$ w.p.1. It is possible then to show that, under mild regularity conditions, $\hat{v}_N$ converges to $v_0$ and $\mathrm{dist}(\hat{x}_N, S_0)$ converges to zero w.p.1 as $N \to \infty$, for any $\hat{x}_N \in \hat{S}_N$. That is, the estimators $\hat{v}_N$ and $\hat{x}_N$ are consistent.

However, by the Central Limit Theorem, $\hat{g}_N(x)$ converges to $g(x)$ at a rate of $O_p(N^{-1/2})$, and therefore the convergence is slow. That is, in order to improve the accuracy of the estimator $\hat{g}_N(x)$ by one digit, one needs to increase the sample size 100 times. It appears that this slow convergence is inherited by the estimator $\hat{v}_N$ of the optimal value. It is possible to show (Shapiro, 1991) that

$$ \hat{v}_N = \min_{x \in S_0} \hat{g}_N(x) + o_p(N^{-1/2}). \qquad (6) $$

In particular, if $S_0 = \{x_0\}$ is a singleton, then $\hat{v}_N$ converges to $v_0$ at the same rate as $\hat{g}_N(x_0)$ converges to $g(x_0)$.

Let us consider an estimator $\hat{x}_N \in \hat{S}_N$. In the Statistics literature such estimators are called M-estimators, and it is well known that, under certain regularity conditions, and in particular if $S_0 = \{x_0\}$ is a singleton and the expectation function $g(x)$ is smooth near $x_0$, then $N^{1/2}(\hat{x}_N - x_0)$ converges in distribution to a normal random vector (Huber, 1967). That is, $\hat{x}_N$ converges to $x_0$ also at a rate of the square root of the sample size. Moreover, in such cases $\hat{x}_N$ is asymptotically equivalent to stochastic approximation estimators implemented with optimal step sizes (Shapiro, 1996) (see, e.g., Kushner and Clark (1978) for a discussion of the stochastic approximation method).
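The $O_p(N^{-1/2})$ rate is easy to check numerically. The sketch below uses a hypothetical integrand $G(x,\omega) = |x - \omega|$ with exponentially distributed $\omega$ (assumptions made only for illustration) and estimates the standard deviation of $\hat{g}_N(x)$ at a fixed $x$ over independent replications.

```python
# Numerical check of the O_p(N^{-1/2}) rate of the sample average (4) at a fixed x.
# Toy setup (not from the paper): G(x, omega) = |x - omega|, omega ~ Exp(mean 3), x = 3.
import numpy as np

rng = np.random.default_rng(1)
x = 3.0
reps = 1000  # independent replications of the estimator g_hat_N(x)

for N in (100, 10_000):
    samples = rng.exponential(scale=3.0, size=(reps, N))
    g_hat = np.abs(x - samples).mean(axis=1)   # one value of g_hat_N(x) per replication
    print(f"N = {N:>6}: std. dev. of g_hat_N(x) = {g_hat.std(ddof=1):.4f}")
```

Increasing the sample size from 100 to 10,000 shrinks the standard deviation roughly tenfold, in line with the statement that one extra digit of accuracy costs a hundredfold larger sample.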
It appears from the above discussion that the SAA approach inherits the slow convergence properties of the Monte Carlo sampling method. However, the situation changes drastically in cases where the expectation function $g(x)$ is not smooth (differentiable). It happens quite often in such cases that the true problem (1) has a sharp optimal solution $x_0$. That is, there exists a constant $\kappa > 0$ such that

$$ g(x) \ge g(x_0) + \kappa \|x - x_0\| \qquad (7) $$

for all $x \in X$. For example, if $G(x,\omega)$ is given by the optimal value of the linear program (2), then $G(\cdot,\omega)$ is a piecewise linear convex function. If, moreover, $\Omega$ is finite (i.e., there is a finite number of scenarios), then $g(\cdot)$ is also piecewise linear and convex. If, further, the set $X$ is defined by linear constraints and the set $S_0$ of optimal solutions is nonempty, then the following inequality always holds

$$ g(x) \ge v_0 + \kappa \, \mathrm{dist}(x, S_0) \qquad (8) $$

for all $x \in X$. Of course, property (7) is a particular case of (8) if $S_0 = \{x_0\}$ is a singleton.

The following results are derived in Shapiro and Homem-de-Mello (2001).

Theorem 2.1 Suppose that $\Omega$ is finite, for every $\omega \in \Omega$ the function $G(\cdot,\omega)$ is convex piecewise linear, the set $X$ is polyhedral and the set $S_0$ is nonempty and bounded. Then the following holds:

(i) w.p.1 for $N$ large enough, the set $\hat{S}_N$ of optimal solutions of the SAA problem is nonempty and $\hat{S}_N \subset S_0$,

(ii) there exists a constant $\gamma > 0$ such that

$$ \lim_{N \to \infty} \frac{1}{N} \log P\bigl(\hat{S}_N \not\subset S_0\bigr) = -\gamma. \qquad (9) $$

Property (i) of the above theorem asserts that for $N$ large enough, any optimal solution $\hat{x}_N$ of the SAA problem solves the "true" problem (1) exactly. Moreover, (9) implies that the probability of that event approaches one exponentially fast with increase of the sample size $N$. If, in addition to the assumptions of Theorem 2.1, the set $S_0 = \{x_0\}$ is a singleton, then for $N$ large enough the SAA problem has a unique optimal solution $\hat{x}_N$ and $\hat{x}_N = x_0$, and moreover the probability of that event approaches one exponentially fast. Under the condition (7) the same holds for a general distribution, provided that the corresponding moment generating function is finite valued near zero.

The exponential rate of convergence indicates that one may not need a large sample in order to solve polyhedral problems (as, for example, the linear two-stage problems with a finite number of scenarios) exactly by using the SAA method. The exponential constant $\gamma$ depends on how well the set of optimal solutions of the true problem is conditioned.

Note that since $X$ is finite, $\varepsilon^*$ is always greater than $\varepsilon$, although the difference $\varepsilon^* - \varepsilon$ can be small. For $\varepsilon > \delta \ge 0$ and a given significance level $\alpha \in (0, 1)$, the above analysis gives the following estimate of the sample size $N$ that guarantees that the probability $P(\hat{S}_N^\delta \subset S^\varepsilon)$ is greater than or equal to $1 - \alpha$:

$$ 3\sigma^2 $$
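The exponential convergence asserted in (9) can be observed in a small simulation. The instance below is hypothetical and chosen only so that the assumptions of Theorem 2.1 hold: $\Omega = \{1, 2, 6\}$ is finite with probabilities $(0.2, 0.5, 0.3)$, each $G(\cdot,\omega) = |\cdot - \omega|$ is convex piecewise linear, $X = [0, 10]$ is polyhedral, and the true problem is a weighted-median problem with the singleton solution set $S_0 = \{2\}$. The closed-form test for $\hat{S}_N = S_0$ used in the code is specific to this toy instance.

```python
# Empirical estimate of P(SAA solution set equals S0) for a toy weighted-median
# problem satisfying the assumptions of Theorem 2.1 (hypothetical instance).
import numpy as np

rng = np.random.default_rng(2)
scenarios = np.array([1.0, 2.0, 6.0])
probs = np.array([0.2, 0.5, 0.3])
reps = 20_000  # independent SAA replications per sample size

for N in (10, 25, 50, 100):
    draws = rng.choice(scenarios, size=(reps, N), p=probs)
    n_low = (draws == 1.0).sum(axis=1)    # number of draws equal to scenario 1
    n_high = (draws == 6.0).sum(axis=1)   # number of draws equal to scenario 6
    # For this instance the SAA solution set is exactly {2} iff fewer than half
    # of the draws equal 1 and fewer than half equal 6 (sign of the subgradient
    # of the piecewise linear sample average on either side of x = 2).
    exact = (n_low < N / 2) & (n_high < N / 2)
    print(f"N = {N:>4}: estimated P(SAA solves the true problem exactly) = {exact.mean():.4f}")
```

The estimated probability approaches one quickly as $N$ grows, consistent with the exponential rate in (9), even though the constant $\gamma$ itself is not computed here.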