
ARTICLE IN PRESS

European Journal of Operational Research xxx (2004) xxx–xxx
www.elsevier.com/locate/dsw

Invited Review

An overview of the design and analysis of simulation experiments for sensitivity analysis

Jack P.C. Kleijnen *

Department of Information Systems and Management, Center for Economic Research (CentER), Tilburg University, Postbox 90153, 5000 LE Tilburg, The Netherlands Received 1 August 2003; accepted 28 January 2004

Abstract

Sensitivity analysis may serve validation, optimization, and risk analysis of simulation models. This review surveys 'classic' and 'modern' designs for experiments with simulation models. Classic designs were developed for real, non-simulated systems in agriculture, engineering, etc. These designs assume 'a few' factors (no more than 10 factors) with only 'a few' values per factor (no more than five values). These designs are mostly incomplete factorials (e.g., fractionals). The resulting input/output (I/O) data are analyzed through polynomial metamodels, which are a type of linear regression model. Modern designs were developed for simulated systems in engineering, management science, etc. These designs allow 'many' factors (more than 100), each with either 'a few' or 'many' (more than 100) values. These designs include group screening, Latin hypercube sampling (LHS), and other 'space filling' designs. Their I/O data are analyzed through second-order polynomials for group screening, and through Kriging models for LHS.
© 2004 Published by Elsevier B.V.

Keywords: Simulation; Regression; Scenarios; Risk analysis; Metamodelling

1. Introduction

Once simulation analysts have programmed a simulation model, they may use it for sensitivity analysis, which in turn may serve validation, optimization, and risk (or uncertainty) analysis for finding robust solutions. In this paper, I discuss how these analyses can be guided by the statistical theory on design of experiments (DOE).

I assume that the reader is familiar with simulation––at the level of a textbook such as Law and Kelton (2000), including their chapter 12 on 'Experimental design, sensitivity analysis, and optimization'. This assumption implies that the reader's familiarity with DOE is restricted to elementary DOE for simulation. In this article, I try to summarize that elementary DOE, and extend it.

Traditionally, experts in statistics and stochastic systems have focused on tactical issues in simulation; i.e., issues concerning the runlength of a steady-state simulation, the number of runs of a terminating simulation, variance reduction techniques (VRT), etc. I find it noteworthy that in the related area of deterministic simulation––where these tactical issues vanish––statisticians have been attracted to DOE issues; see the standard publication by Koehler and Owen (1996). Few statisticians have studied random simulations. And only some simulation analysts have focused on strategic issues, namely which scenarios to simulate, and how to analyze the resulting I/O data.

Note the following terminology. Statisticians speak of 'factors' with 'levels', whereas simulation analysts speak of inputs or parameters with values. Statisticians talk about design points or runs, whereas simulationists talk about scenarios.

Two textbooks on classic DOE for simulation are Kleijnen (1975, 1987). An update is Kleijnen (1998). A bird's-eye view of DOE in simulation is Kleijnen et al. (2004a), which covers a wider area than this review––without using any equations, tables, or figures; this review covers a smaller area––in more detail. An article related to this one is Kleijnen (2004), focusing on Monte Carlo experiments in mathematical statistics instead of (dynamic) simulation experiments in Operations Research.

Classic articles on DOE in simulation are Schruben and Margolin (1978) and Donohue et al. (1993). Several tutorials have appeared in the Winter Simulation Conference Proceedings.

Classic DOE for real, non-simulated systems was developed for agricultural experiments in the 1930s, and––since the 1950s––for experiments in engineering, psychology, etc. In those real systems, it is impractical to experiment with 'many' factors; k = 10 factors seems a maximum. Moreover, it is then hard to experiment with factors that have more than 'a few' values; five values per factor seems a maximum.

The remainder of this article is organized as follows. Section 2 covers the black box approach to simulation, and corresponding metamodels (especially, polynomial and Kriging models); note that 'metamodels' are also called response surfaces, emulators, etc. Section 3 starts with simple metamodels with a single factor for the M/M/1 simulation; proceeds with designs for multiple factors, including Plackett–Burman designs for first-order polynomial metamodels; and concludes with screening designs for (say) hundreds of factors. Section 4 introduces Kriging metamodels, which provide exact interpolation in deterministic simulation. These metamodels often use space-filling designs, such as Latin hypercube sampling (LHS). Section 5 discusses cross-validation of the metamodel, to decide whether the metamodel is an adequate approximation of the underlying simulation model. Section 6 gives conclusions and further research topics.

* Tel.: +31-13-4662029; fax: +31-13-4663069.
E-mail address: [email protected] (J.P.C. Kleijnen).

0377-2217/$ - see front matter © 2004 Published by Elsevier B.V.
doi:10.1016/j.ejor.2004.02.005

2. Black boxes and metamodels

DOE treats the simulation model as a black box––not a white box. To explain the difference, I consider an example, namely an M/M/1 simulation. A white box representation is

\bar{w} = \sum_{i=1}^{I} w_i / I,                 (1a)
w_i = \max(w_{i-1} + s_{i-1} - a_i, 0),           (1b)
a_i = -\ln(r_{2i}) / \lambda,                     (1c)
s_{i-1} = -\ln(r_{2i-1}) / \mu,                   (1d)
w_1 = 0,                                          (1e)

with average waiting time as output in (1a); inter-arrival times a in (1c); service times s in (1d); pseudo-random numbers (PRN) r in (1c) and (1d); empty starting (or initial) conditions in (1e); and the well-known single-server waiting-time formula in (1b).

This white box representation may be analyzed through perturbation analysis and score function analysis in order to estimate the gradient (for local sensitivity analysis) and use that estimate for optimization; see Spall (2003). I, however, shall not follow that approach.

A black box representation of this M/M/1 example is

w = w(\lambda, \mu, r_0),                         (2)

where w(\cdot) denotes the mathematical function implicitly defined by the program implementing (1a)–(1e); r_0 denotes the seed of the PRN.

One possible metamodel of the black box model in (2) is a Taylor series approximation––cut off after the first-order effects of the two factors, \lambda and \mu:

y = \beta_0 + \beta_1 \lambda + \beta_2 \mu + e,  (3)

where y is the metamodel predictor of the simulation output w in (2); \beta = (\beta_0, \beta_1, \beta_2)' denotes the parameters of the metamodel in (3); and e is the noise––which includes both lack of fit of the metamodel and intrinsic noise caused by the PRN.

Besides (3), there are many alternative metamodels. For example, a simpler metamodel is

y = \beta_0 + \beta_1 x + e,                      (4)

where x is the traffic rate––in queueing theory usually denoted by \rho:

x = \rho = \lambda / \mu.                         (5)

This combination of the two original factors (\lambda, \mu) into a single factor (\rho)––inspired by queueing theory––illustrates the use of transformations. Another useful transformation may be a logarithmic one: replacing y, \lambda, and \mu by \log(y), \log(\lambda), and \log(\mu) in (3) makes the first-order polynomial approximate relative changes; i.e., the regression parameters \beta_1 and \beta_2 become elasticity coefficients.

There are many––more complex––types of metamodels. Examples are Kriging models, neural nets, radial basis functions, splines, support vector regression, and wavelets; see Clarke et al. (2003) and Antoniadis and Pham (1998). I, however, will focus on two types that have established a track record in random and deterministic simulation, respectively:

• linear regression models (see Section 3)
• Kriging (see Section 4).

To estimate the parameters of whatever metamodel, the analysts must experiment with the simulation model; i.e., they must change the inputs (or factors) of the simulation, run the simulation, and analyze the resulting I/O data. This experimentation is the topic of the next sections.

3. Linear regression metamodels and DOE

3.1. Simplest metamodels for M/M/1 simulations

I start with the simplest metamodel, namely a first-order polynomial with a single factor; see (4). Elementary mathematics proves that––to fit such a straight line––it suffices to have two I/O observations; also see 'local area 1' in Fig. 1. It can be proven that selecting those two values as far apart as 'possible' gives the 'best' estimator of the parameters in (4). In other words, if within the local area the fitted first-order polynomial gives an error––denoted by e in (4)––that has zero mean (so the polynomial is an 'adequate' or 'valid' approximation; i.e., it shows no 'lack of fit'), then the parameter estimator is unbiased with minimum variance.

In practice, the analysts do not know over which experimental area a first-order polynomial is a 'valid' metamodel. This validity depends on the goals of the simulation study; see Kleijnen and Sargent (2000). So the analysts may start with a local area, and simulate the two (locally) extreme input values. Let us denote these two extreme values by -1 and +1,

Fig. 1. M/M/1 example.
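The white-box recursion (1a)–(1e) is straightforward to program. The sketch below is my own illustration (the function name and arguments are hypothetical, not from the paper): it draws service and inter-arrival times by inverse transformation as in (1c)–(1d), applies the Lindley recursion (1b), and averages the waiting times as in (1a).

```python
import math
import random

def mm1_avg_wait(lam, mu, n_customers, seed=None):
    """Average waiting time of an M/M/1 simulation.

    lam: arrival rate (lambda), mu: service rate, seed: the PRN seed r0.
    A sketch of the white-box representation (1a)-(1e)."""
    rng = random.Random(seed)
    w_prev = 0.0                # (1e): customer 1 finds an empty system
    total = w_prev
    for _ in range(n_customers - 1):
        s_prev = -math.log(rng.random()) / mu   # (1d): service time
        a = -math.log(rng.random()) / lam       # (1c): inter-arrival time
        w = max(w_prev + s_prev - a, 0.0)       # (1b): Lindley recursion
        total += w
        w_prev = w
    return total / n_customers                  # (1a): average waiting time
```

With a common seed, raising the traffic rate lengthens every waiting time pathwise, which is the kind of known direction of effects that Section 3.4 exploits.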

which implies the following standardization (also called coding, or linear transformation):

x = (\rho - \bar{\rho}) / ((\rho_{max} - \rho_{min}) / 2),        (6)

where \bar{\rho} = (\rho_{max} + \rho_{min}) / 2 denotes the average traffic rate in the (local) experiment.

The Taylor series argument implies that––as the experimental area gets bigger (see 'local area 2' in Fig. 1)––a better metamodel may be a second-order polynomial

y = \beta_0 + \beta_1 x + \beta_2 x^2 + e.                        (7)

Obviously, estimation of the three parameters in (7) requires at least the simulation of three input values. Indeed, DOE provides designs with three values per factor (for example, 3^k designs). However, most publications on DOE in simulation discuss Central Composite Designs (CCD), which have five values per factor; see Kleijnen (1975).

I emphasize that the second-order polynomial in (7) is nonlinear in x (the regression variables), but linear in \beta (the parameters to be estimated). Consequently, such a polynomial metamodel is a type of linear regression model.

Finally, when the experimental area covers the whole area in which the simulation model is valid (0 < \rho < 1), then other global metamodels become relevant. For example, Kleijnen and Van Beers (2004a) find that a Kriging metamodel outperforms a second-order polynomial.

Note that Zeigler et al. (2000) call the experimental area the 'experimental frame'. I call it the domain of admissible scenarios, given the goals of the simulation study.

I conclude that the lessons learned from this simple M/M/1 model are:

(i) The analysts should decide whether they want to experiment locally or globally.
(ii) Given that decision, they should select a specific metamodel type; for example, a low-order polynomial or a Kriging model.

3.2. Metamodels with multiple factors

Let us now consider a metamodel with k factors; for example, (4) implies k = 1, whereas (3) implies k = 2. The following design is most popular, even though it is inferior: change one factor at a time; see Fig. 2 and the columns denoted by x1 and x2 in Table 1. In that design the analysts usually start with the 'base' scenario, denoted by the row (0, 0). Then the next two scenarios that they run are (1, 0) and (0, 1).

Table 1
One-factor-at-a-time design for two factors, and possible regression variables

Scenario   x0   x1   x2   x1*x2
1          1    0    0    0
2          1    1    0    0
3          1    0    1    0

Fig. 2. One-factor-at-a-time design for two factors.
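The coding (6) is a one-line computation. The helper below is my own illustration (the name `code_factor` is not from the paper); it maps the local extremes to -1 and +1 and the base value to 0.

```python
def code_factor(value, v_min, v_max):
    """Standardize ('code') a factor value into [-1, +1]; see (6).

    v_min and v_max are the extreme values of the factor in the
    (local) experimental area; their average plays the role of the
    base value (e.g., the average traffic rate rho-bar)."""
    v_bar = (v_max + v_min) / 2        # base (average) value
    half_range = (v_max - v_min) / 2
    return (value - v_bar) / half_range
```

This coding is what makes the estimated effects directly comparable in size, as discussed for importance ranking later in Section 3.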


In such a design, the analysts cannot estimate the interaction between the two factors. Indeed, Table 1 shows that the estimated interaction (say) \hat{\beta}_{1,2} is confounded with the estimated intercept \hat{\beta}_0; i.e., the columns for the corresponding regression variables are linearly dependent. (Confounding remains when the base values are denoted not by zero but by one; then these two columns become identical.)

In practice, analysts often study each factor at three levels (denoted by -1, 0, +1) in their one-at-a-time design. However, two levels suffice to estimate a first-order polynomial metamodel––as we saw in Section 3.1.

To enable the estimation of interactions, the analysts must change factors simultaneously. An interesting problem arises if k increases from two to three. Then Fig. 2 becomes Fig. 3––which does not show the output w, since it would require a fourth dimension; instead x3 replaces w. And Table 1 becomes Table 2. The latter table shows the 2^3 factorial design; i.e., each of the three factors has two values, and the analysts simulate all the combinations of these values. To simplify the notation, the table shows only the signs of the factor values, so - means -1 and + means +1. The table further shows possible regression variables, using the symbols '0' through '1.2.3'––to denote the indexes of the regression variables x0 (which remains 1 in all scenarios) through x1*x2*x3. Further, I point out that each column is balanced; i.e., each column has four plusses and four minuses––except for the dummy column.

The 2^3 design enables the estimation of all eight parameters of the following metamodel, which is an incomplete third-order polynomial; i.e., some parameters are assumed zero:

y = \beta_0 + \sum_{j=1}^{3} \beta_j x_j + \sum_{j=1}^{2} \sum_{j'>j}^{3} \beta_{j,j'} x_j x_{j'} + \beta_{1,2,3} x_1 x_2 x_3 + e.    (8)

Indeed, the 2^3 design implies a matrix of regression variables X that is orthogonal:

X'X = nI,                                         (9)

where n denotes the number of scenarios simulated; for example, Table 2 implies n = 8. Hence the ordinary least squares (OLS) estimator

\hat{\beta} = (X'X)^{-1} X'w                      (10)

simplifies for the 2^3 design––which implies (9)––to \hat{\beta} = X'w / 8.

The covariance matrix of the (linear) OLS estimator given by (10) is

cov(\hat{\beta}) = [(X'X)^{-1} X'] cov(w) [(X'X)^{-1} X']'.       (11)

In case of white noise, i.e.,

cov(w) = \sigma^2 I,                              (12)

Fig. 3. The 2^3 design.
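The 2^3 design of Fig. 3 and its regression variables can be generated mechanically. The sketch below uses my own function names; it also exploits the orthogonality (9), under which the OLS estimator (10) collapses to X'w / n.

```python
from itertools import product

def full_factorial_2k(k):
    """All 2^k scenarios, with factor values coded as -1 and +1."""
    return [list(p) for p in product((-1, +1), repeat=k)]

def regression_matrix_2_3(design):
    """Regression variables for k = 3, as in Table 2:
    x0, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3."""
    return [[1, x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3]
            for (x1, x2, x3) in design]

def ols_orthogonal(X, w):
    """OLS estimator (10); because of orthogonality (9), X'X = nI,
    so (X'X)^(-1) X'w collapses to X'w / n."""
    n, q = len(X), len(X[0])
    return [sum(X[i][j] * w[i] for i in range(n)) / n for j in range(q)]
```

A quick check of (9): every pair of distinct columns of X has zero inner product, and each column has inner product n = 8 with itself.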


Table 2
The 2^3 design and possible regression variables

Scenario   0   1   2   3   1.2   1.3   2.3   1.2.3
1          +   -   -   -   +     +     +     -
2          +   +   -   -   -     -     +     +
3          +   -   +   -   -     +     -     +
4          +   +   +   -   +     -     -     -
5          +   -   -   +   +     -     -     +
6          +   +   -   +   -     +     -     -
7          +   -   +   +   -     -     +     -
8          +   +   +   +   +     +     +     +
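The covariance formula (11) can be checked numerically. The plain-Python sketch below is my own; it uses a small orthogonal 2^2 design, so that (X'X)^{-1} = I/4. Under white noise (12) the result collapses to \sigma^2 (X'X)^{-1}, whereas a heterogeneous diagonal cov(w) produces nonzero off-diagonal terms.

```python
def matmul(A, B):
    """Plain-Python matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# A 2^2 design with regression variables x0, x1, x2, x1*x2;
# it is orthogonal, so (X'X)^(-1) = I/4 and (X'X)^(-1) X' = X'/4.
X = [[1, -1, -1, +1],
     [1, +1, -1, -1],
     [1, -1, +1, -1],
     [1, +1, +1, +1]]
A = [[x / 4 for x in row] for row in transpose(X)]   # (X'X)^(-1) X'

def cov_beta(cov_w):
    """Covariance of the OLS estimator, general formula (11):
    [(X'X)^(-1) X'] cov(w) [(X'X)^(-1) X']'."""
    return matmul(matmul(A, cov_w), transpose(A))
```

With cov(w) = 2I the diagonal of cov_beta is 2/4 = 0.5 and all off-diagonal terms vanish; with unequal output variances the off-diagonal terms no longer vanish, which is why the white-noise shortcut should not be trusted in practice.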

(11) reduces to the well-known formula

cov(\hat{\beta}) = \sigma^2 (X'X)^{-1}.           (13)

However, I claim that in practice this white noise assumption does not hold:

(i) The output variances change as the input (scenario) changes, so the assumed common variance \sigma^2 in (12) does not hold. This is called variance heterogeneity. (Well-known examples are queueing models, which have both the mean and the variance of the waiting time increase as the traffic rate increases; see Cheng and Kleijnen, 1999.)
(ii) Often the analysts use common random numbers (CRN), so the assumed diagonality of the matrix in (12) does not hold.

Therefore I conclude that the analysts should choose between the following two options:

(i) Continue to apply the OLS point estimator (10), but use the covariance formula (11) instead of (13).
(ii) Switch from OLS to generalized least squares (GLS) with estimated cov(w) based on m > n replications (using different PRN); for details see Kleijnen (1992, 1998).

The variances of the estimated regression parameters––which are on the main diagonal in (11)––can be used to test statistically whether some factors have zero effects. However, I emphasize that a significant factor may be unimportant––practically speaking. If the factors are scaled between -1 and +1 (see the transformation in (6)), then the estimated effects quantify the order of importance. For example, in a first-order polynomial metamodel the factor estimated to be the most important is the one with the highest absolute value for its estimated effect. Also see Bettonvil and Kleijnen (1990).

3.3. Fractional factorials and other incomplete designs

The incomplete third-order polynomial in (8) included a third-order effect, namely \beta_{1,2,3}. Standard DOE textbooks include the definition and estimation of such high-order interactions. However, the following claims may be made:

1. High-order effects are hard to interpret.
2. These effects often have negligible magnitudes.

Claim #1 seems obvious. If claim #2 holds, then the analysts may simulate fewer scenarios than specified by a full factorial (such as the 2^3 design). For example, if indeed \beta_{1,2,3} is zero, a 2^(3-1) fractional factorial design suffices. A possible 2^(3-1) design is shown in Table 2, after deleting the four rows (scenarios) that have a minus sign in the 1.2.3 column (rows 1, 4, 6, 7). In other words, only a fraction––namely 2^(-1)––of the 2^3 full factorial design is simulated. This design corresponds with the points denoted by the symbol * in Fig. 3. Note that this figure has the following geometric property: each scenario corresponds with a vertex that cannot be reached via a single edge of the cube.

This 2^(3-1) design has two identical columns, namely the 1.2.3 column (which has four plusses) and the dummy column 0 (which obviously has four plusses). Hence, the corresponding two effects are

confounded––but \beta_{1,2,3} is assumed zero, so this confounding can be ignored!

Sometimes a first-order polynomial metamodel suffices. For example, in the (sequential) optimization of black-box simulation models, the analysts may use a first-order polynomial to estimate the local gradient; see Angün et al. (2002). Then a 2^(k-p) design suffices, with the biggest p value such that 2^(k-p) > k. An example is k = 7 and p = 4, so only eight scenarios are simulated; see Table 3. This table shows that the first three factors (labeled 1, 2, and 3) form a full factorial 2^3 design; the symbol '4 = 1.2' means that the values for factor 4 are specified by multiplying the elements of the columns for the factors 1 and 2. Note that the design is still balanced and orthogonal. Because of this orthogonality, it can be proven that the estimators of the metamodel's parameters have smaller variances than one-factor-at-a-time designs give. How to select scenarios in 2^(k-p) designs is discussed in many DOE textbooks, including Kleijnen (1975, 1987).

Actually, these designs––i.e., fractional factorial designs of the 2^(k-p) type with the biggest p value still enabling the estimation of first-order metamodels––are a subset of Plackett–Burman designs. The latter designs consist of k + 1 scenarios, rounded upwards to a multiple of four. For example, if k = 11, then Table 4 applies. If k = 7, then the Plackett–Burman design is the 2^(7-4) fractional factorial design; see Kleijnen (1975, pp. 330–331). Plackett–Burman designs are tabulated in many DOE textbooks, including Kleijnen (1975). Note that designs for first-order polynomial metamodels are called resolution III designs.

Resolution IV designs enable unbiased estimators of first-order effects––even if two-factor interactions are important. These designs require double the number of scenarios required by resolution III designs; i.e., after simulating the scenarios of the resolution III design, the analysts simulate the mirror scenarios; i.e., they multiply by -1 the factor values in the original scenarios.

Resolution V designs enable unbiased estimators of first-order effects plus two-factor interactions. To this class belong certain 2^(k-p) designs with small enough p values, and saturated designs developed by Rechtschaffner (1967); saturated designs are designs with the minimum number of scenarios that still allow unbiased estimators of the metamodel's parameters. Saturated designs are attractive for expensive simulations; i.e., simulations that require relatively much computer time per scenario. CCD augment resolution V designs with the base scenario and 2k scenarios changing factors one at a time; this changing means increasing and decreasing each factor in turn. Saturated designs (smaller than CCD) are discussed in Kleijnen (1987, pp. 314–316).

3.4. Designs for screening

Most practical, non-academic simulation models have many factors; for example, Kleijnen et al. (2004b) experiment with a supply-chain simulation model with nearly 100 factors. Even a Plackett–Burman design would then take 102 scenarios. Because each scenario needs to be replicated several times, the total computer time may then be prohibitive. For that reason, many analysts keep a lot of factors fixed (at their base values), and experiment with only a few remaining factors. An example is a military (agent-based) simulation that was

Table 3
A 2^(7-4) design

Scenario   1   2   3   4 = 1.2   5 = 1.3   6 = 2.3   7 = 1.2.3
1          -   -   -   +         +         +         -
2          +   -   -   -         -         +         +
3          -   +   -   -         +         -         +
4          +   +   -   +         -         -         -
5          -   -   +   +         -         -         +
6          +   -   +   -         +         -         -
7          -   +   +   -         -         +         -
8          +   +   +   +         +         +         +
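Table 3 can be constructed directly from its generators. The sketch below (function name mine) builds the full 2^3 factorial for factors 1–3 and multiplies columns for the generated factors 4 = 1.2, 5 = 1.3, 6 = 2.3, and 7 = 1.2.3.

```python
from itertools import product

def design_2_7_minus_4():
    """A 2^(7-4) design as in Table 3: factors 1-3 form a full 2^3
    factorial; each remaining column is the elementwise product of
    the columns named by its generator (4 = 1.2, 5 = 1.3, etc.)."""
    rows = []
    for x1, x2, x3 in product((-1, +1), repeat=3):
        rows.append([x1, x2, x3, x1*x2, x1*x3, x2*x3, x1*x2*x3])
    return rows
```

Balance (each column sums to zero) and orthogonality (distinct columns have zero inner product) follow because the seven columns correspond to seven distinct nonempty products over a full factorial.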


Table 4
The Plackett–Burman design for 11 factors

Scenario   1   2   3   4   5   6   7   8   9   10   11
1          +   -   +   -   -   -   +   +   +   -    +
2          +   +   -   +   -   -   -   +   +   +    -
3          -   +   +   -   +   -   -   -   +   +    +
4          +   -   +   +   -   +   -   -   -   +    +
5          +   +   -   +   +   -   +   -   -   -    +
6          +   +   +   -   +   +   -   +   -   -    -
7          -   +   +   +   -   +   +   -   +   -    -
8          -   -   +   +   +   -   +   +   -   +    -
9          -   -   -   +   +   +   -   +   +   -    +
10         +   -   -   -   +   +   +   -   +   +    -
11         -   +   -   -   -   +   +   +   -   +    +
12         -   -   -   -   -   -   -   -   -   -    -

run millions of times for just a few scenarios––changing only a few factors; see Horne and Leonardi (2001).

However, statisticians have developed designs that require fewer than k scenarios––called supersaturated designs; see Yamada and Lin (2002). Some designs aggregate the k individual factors into groups of factors. It may then happen that the effects of individual factors cancel out, so the analysts would erroneously conclude that all factors within that group are unimportant. The solution is to define the -1 and +1 levels of the individual factors such that all first-order effects are non-negative. As an example, let us return to the metamodel for the M/M/1 simulation in (3), which treats the arrival and service rates as individual factors. Then the value -1 of the arrival rate denotes the lowest value in the experimental area, so waiting time tends to be low. The value -1 of the service rate is its high value, so again waiting time tends to be low. My experience is that in practice the users do know the direction of the first-order effects of individual factors––not only in queueing simulations but also in other types (e.g., an ecological simulation with nearly 400 factors discussed by Bettonvil and Kleijnen, 1996).

There are several types of group screening designs; for a recent survey including references, I refer to Kleijnen et al. (2004b). Here I focus on the most efficient type, namely sequential bifurcation (abbreviated to SB).

SB is so efficient because it is a sequential design. SB starts with only two scenarios, namely one scenario with all individual factors at -1, and a second scenario with all factors at +1. Comparing the outputs of these two extreme scenarios requires only two replications, because the aggregated effect of the group factor is huge compared with the intrinsic noise (caused by the PRN). In the next step, SB splits––bifurcates––the factors into two groups. There are several heuristic rules to decide on how to assign factors to groups (again see Kleijnen et al., 2004b). Comparing the outputs of the third scenario with the outputs of the preceding scenarios enables the estimation of the aggregated effect of the individual factors within a group. Groups––and all their individual factors––are eliminated from further experimentation as soon as the group effect is statistically unimportant. Obviously, the groups get smaller as SB proceeds sequentially. SB stops when the first-order effects of all important individual factors are estimated. In the supply-chain simulation, only 11 of the 92 factors are classified as important. This shortlist of important factors is further investigated to find a robust solution.

4. Kriging metamodels

Let us return to the M/M/1 example in Fig. 1. If the analysts are interested in the I/O behavior within 'local area 1', then a first-order polynomial such as (4) may be adequate. Maybe a second-order polynomial such as (7) is required to get a valid metamodel in 'local area 2', which is larger and covers a steeper part of the I/O function.
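Before going deeper into Kriging, the SB procedure of Section 3.4 can be sketched in code. The version below is my own deterministic, noise-free simplification: it assumes non-negative first-order effects, ignores replication, and always splits groups in half (the paper notes that other heuristic assignment rules exist). `simulate(j)` is an assumed interface returning the output with factors 1..j at +1 and factors j+1..k at -1.

```python
def sb_important_factors(k, simulate, delta=0.01):
    """Sketch of sequential bifurcation (SB) for a first-order model.

    simulate(j): output with factors 1..j at +1, factors j+1..k at -1,
    so (simulate(hi) - simulate(lo - 1)) / 2 estimates the aggregated
    first-order effect of the group of factors lo..hi.
    delta: threshold below which a group effect is deemed unimportant."""
    important = []

    def bifurcate(lo, hi):
        effect = (simulate(hi) - simulate(lo - 1)) / 2
        if effect <= delta:
            return                    # eliminate the whole group at once
        if lo == hi:
            important.append(lo)      # an important individual factor
            return
        mid = (lo + hi) // 2
        bifurcate(lo, mid)            # split: bifurcation
        bifurcate(mid + 1, hi)

    bifurcate(1, k)
    return important
```

On a first-order test model with 16 factors of which only two have positive effects, the sketch finds exactly those two, while whole unimportant groups are discarded after a single comparison.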


However, Kleijnen and Van Beers (2004a) show that the latter metamodel gives very poor predictions compared with a Kriging metamodel.

Kriging has often been applied in deterministic simulation models. Such simulations are used for computer aided engineering (CAE) in the development of airplanes, automobiles, computer chips, computer monitors, etc.; see Sacks et al. (1989)'s pioneering article, and––for an update––see Simpson et al. (2001). For random simulations (including discrete-event simulations) there are hardly any applications yet. First, I explain the basics of Kriging; then DOE aspects.

4.1. Kriging basics

Kriging is named after the South African mining engineer D.G. Krige. It is an interpolation method that predicts unknown values of a random process; see the classic Kriging textbook Cressie (1993). More precisely, a Kriging prediction is a weighted linear combination of all output values already observed. These weights depend on the distances between the input for which the output is to be predicted and the inputs already simulated. Kriging assumes that the closer the inputs are, the more positively correlated the outputs are. This assumption is modeled through the correlogram or the related variogram, discussed below.

Note that in deterministic simulation, Kriging has an important advantage over linear regression analysis: Kriging is an exact interpolator; that is, predicted values at observed input values are exactly equal to the simulated output values.

The simplest type of Kriging (to which I limit this review) assumes the following metamodel (also see (4) with \mu = \beta_0 and \beta_1 = 0):

y = \mu + e                                       (14a)

with

E(e) = 0,  var(e) = \sigma^2,                     (14b)

where \mu is the mean of the stochastic process y(\cdot), and e is the additive noise, which is assumed to have zero mean and constant finite variance \sigma^2 (furthermore, many authors assume normality). Kriging further assumes a stationary covariance process; i.e., the process y(\cdot) has constant mean and constant variance, and the covariances of y(x + h) and y(x) depend only on the distance between their inputs, namely the lag |h| = |(x + h) - (x)|.

The Kriging predictor for the unobserved input x_0––denoted by \hat{y}(x_0)––is a weighted linear combination of all the n simulation output data:

\hat{y}(x_0) = \sum_{i=1}^{n} \lambda_i y(x_i) = \lambda' y       (15a)

with

\sum_{i=1}^{n} \lambda_i = 1,                     (15b)

where \lambda' = (\lambda_1, ..., \lambda_n) and y' = (y_1, ..., y_n).

To quantify the weights \lambda in (15), Kriging derives the best linear unbiased estimator (BLUE), which minimizes the mean squared error (MSE) of the predictor

MSE(\hat{y}(x_0)) = E((y(x_0) - \hat{y}(x_0))^2)

with respect to \lambda. Obviously, these weights depend on the covariances mentioned below (14). Cressie (1993) characterizes these covariances through the variogram, defined as 2\gamma(h) = var(y(x + h) - y(x)). (I follow Cressie (1993), who uses variograms to express covariances, whereas Sacks et al. (1989) use correlation functions.) It can be proven that the optimal weights in (15) are

\lambda' = \left( \gamma + 1 \frac{1 - 1'\Gamma^{-1}\gamma}{1'\Gamma^{-1}1} \right)' \Gamma^{-1}        (16)

with the following symbols: \gamma, the vector of the n (co)variances between the output at the new input x_0 and the outputs at the n old inputs, so \gamma = (\gamma(x_0 - x_1), ..., \gamma(x_0 - x_n))'; \Gamma, the n x n matrix of the covariances between the outputs at the n old inputs––with element (i, j) equal to \gamma(x_i - x_j); and 1, the vector of n ones.

I point out that the optimal weights in (16) vary with the input value for which the output is to be predicted (see \gamma), whereas linear regression uses the same estimated metamodel (with \hat{\beta}) for all inputs to be predicted. (A forthcoming paper discusses the fact that the weights \lambda are estimated via the estimated covariances \gamma and \Gamma, so the Kriging predictor

is actually a non-linear random variable; see Den Hertog et al., 2004.)

4.2. Designs for Kriging

The most popular design type for Kriging is LHS. This design type was invented by McKay et al. (1979) for deterministic simulation models. Those authors did not analyze the I/O data by Kriging (but they did assume I/O functions more complicated than the polynomial models in classic DOE). Nevertheless, LHS is much applied in Kriging nowadays, because LHS is a simple technique (it is part of spreadsheet add-ons such as @Risk). LHS offers flexible design sizes n (number of scenarios simulated) for any number of simulation inputs, k. An example is shown for k = 2 and n = 4 in Table 5 and Fig. 4, which are constructed as follows.

1. The table illustrates that LHS divides each input range into n intervals of equal length, numbered from 1 to n (the example has n = 4; see the numbers in the last two columns); i.e., the number of values per input can be much larger than in Plackett–Burman designs or CCD.
2. Next, LHS places these integers 1, ..., n such that each integer appears exactly once in each row and each column of the design. (This explains the term 'Latin hypercube': it resembles Latin squares in classic DOE.)

Within each cell of the design in the table, the exact input value may be sampled uniformly; see Fig. 4. (Alternatively, these values may be placed systematically in the middle of each cell. In risk analysis, this uniform sampling may be replaced by sampling from some other distribution for the input values.)

Because LHS implies randomness, its result may happen to be an outlier. For example, it might happen––with small probability––that in Fig. 4 all scenarios lie on the main diagonal, so the values of the two inputs have a correlation coefficient of -1. Therefore, the LHS may be adjusted to become (nearly) orthogonal; see Ye (1998).

We may also compare classic designs and LHS geometrically. Fig. 3 illustrates that many classic designs consist of corners of k-dimensional cubes. These designs imply simulation of extreme scenarios. LHS, however, has better space filling properties. (In risk analysis, the scenarios fill the space according to a––possibly non-uniform––distribution.)

Table 5
A LHS design for two factors and four scenarios

Scenario   Interval factor 1   Interval factor 2
1          2                   1
2          1                   4
3          4                   3
4          3                   2

This space filling property has inspired many statisticians to develop related designs. One type

Fig. 4. A LHS design for two factors and four scenarios.
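The two-step LHS construction described above can be sketched as follows. This is my own illustration (names are hypothetical); the `midpoint` option mimics placing values systematically in the middle of each cell.

```python
import random

def latin_hypercube(k, n, rng=None, midpoint=False):
    """LHS sketch: divide each of the k input ranges into n equal
    intervals, and permute the interval numbers so that every
    interval of every input is used exactly once (cf. Table 5).
    Returns n scenarios with coordinates scaled into (0, 1)."""
    rng = rng or random.Random()
    cols = []
    for _ in range(k):
        perm = list(range(n))
        rng.shuffle(perm)                  # one permutation per input
        cols.append(perm)
    scenarios = []
    for i in range(n):
        point = []
        for j in range(k):
            cell = cols[j][i]
            u = 0.5 if midpoint else rng.random()   # position in the cell
            point.append((cell + u) / n)
        scenarios.append(point)
    return scenarios
```

Because each input's intervals come from an independent permutation, a degenerate outcome (e.g., all scenarios on a diagonal) is possible but has small probability, which is exactly the outlier issue noted above.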

maximizes the minimum Euclidean distance between any two points in the k-dimensional experimental area. Other designs minimize the maximum distance. See Koehler and Owen (1996), Santner et al. (2003), and also Kleijnen et al. (2004a).

5. Cross-validation of metamodels

Whatever metamodel is used (polynomial, Kriging, etc.), the analysts should validate that model once its parameters have been estimated. Kleijnen and Sargent (2000) discuss many criteria. In this review, I focus on the question: does the metamodel give adequate predictions? To answer this question, I discuss cross-validation for linear regression; after that discussion, it will be obvious how cross-validation also applies to other metamodel types. I explain a different validation procedure for linear regression models in Appendix A.

I assume that the analysts use OLS to estimate the regression parameters; see (10). This yields the n classic regression predictors for the n scenarios implied by X in (10):

Ŷ = X b̂.    (17)

However, the analysts can also compute regression predictors through cross-validation, as follows.

1. Delete I/O combination i from the complete set of n combinations. I suppose that this i ranges from 1 through n, which is called leave-one-out cross-validation. I assume that this procedure results in n non-singular matrixes, each with n − 1 rows, (say) X_{−i} (i = 1, 2, ..., n). To satisfy this assumption, the original matrix X (= X_{−0}) must satisfy n > q, where q denotes the number of regression parameters. Counter-examples are the saturated designs in Tables 3 and 4; the solution is to experiment with one factor less or to add one scenario (e.g., the scenario with all coded x-values set to zero, which is the base scenario).
2. Next the analysts recompute the OLS estimator of the regression parameters β; i.e., they use (10) with X_{−i} and (say) w_{−i} to get b̂_{−i}.
3. Substituting b̂_{−i} (which results from step 2) for b̂ in (17) gives ŷ_i, which denotes the regression predictor for the scenario deleted in step 1.
4. Executing the preceding three steps for all scenarios gives n predictions ŷ_i.
5. These ŷ_i can be compared with the corresponding simulation outputs w_i. This comparison may be done through a scatter plot; the analysts may eyeball that plot to decide whether they find the metamodel acceptable.

Case studies using this cross-validation procedure are Vonk Noordegraaf (2002) and Van Groenendaal (1998).

6. Conclusion and further research

Because simulation, treated as a black box, implies experimentation with a model, DOE is essential. In this review, I discussed both classic DOE for polynomial regression metamodels and modern DOE (including LHS) for other metamodels such as Kriging models. The simpler the metamodel is, the fewer scenarios need to be simulated.

I did not discuss so-called optimal designs, because these designs rest on statistical assumptions (such as white noise) that I find too unrealistic. A recent discussion of optimal DOE, including references, is Spall (2003).

Neither did I discuss the designs in Taguchi (1987), as I think that the classic and modern designs that I did discuss are superior. Nevertheless, I believe that Taguchi's concepts, as opposed to his statistical techniques, are important: in practice, the 'optimal' solution may break down because the environment turns out to differ from the environment that the analysts assumed when deriving the optimum. Therefore the analysts should look for a 'robust' solution. For further discussion I refer to Kleijnen et al. (2004a).

Because of space limitations, I did not discuss sequential DOE, except for SB and two-stage resolution IV designs, even though the sequential nature of simulation experiments (caused by the computer architecture) makes such designs very attractive. See Jin et al. (2002), Kleijnen et al. (2004a), and Kleijnen and Van Beers (2004b).

An interesting research question is: how much computer time should analysts spend on replication, and how much on exploring new scenarios?
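The leave-one-out cross-validation procedure of Section 5 can be sketched in code. The following is a minimal illustration (in Python with NumPy), not the paper's own implementation: X stands for the n × q matrix of explanatory variables from (10), and w for the vector of n simulation outputs.

```python
import numpy as np

def loo_cross_validation(X: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Leave-one-out predictions y_hat_i for an OLS metamodel w ~ X b.

    For each scenario i: delete row i (step 1), re-estimate the regression
    parameters by OLS on the remaining n - 1 rows (step 2), and predict the
    deleted scenario (step 3). Requires n > q so that every reduced matrix
    X_{-i} keeps full column rank.
    """
    n, q = X.shape
    assert n > q, "need more scenarios than regression parameters"
    y_hat = np.empty(n)
    for i in range(n):
        X_i = np.delete(X, i, axis=0)   # step 1: drop scenario i
        w_i = np.delete(w, i)
        b_i, *_ = np.linalg.lstsq(X_i, w_i, rcond=None)  # step 2: OLS
        y_hat[i] = X[i] @ b_i           # step 3: predict the deleted scenario
    return y_hat                        # step 4: all n predictions

# Step 5 would compare y_hat with w, e.g. in a scatter plot of (w_i, y_hat_i).
```

If the simulation output happened to be exactly linear in the explanatory variables, every leave-one-out prediction would reproduce the corresponding output; in practice the analysts eyeball how far the points scatter around that ideal.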


Another challenge is to develop designs that explicitly account for multiple outputs. This may be a challenge indeed in SB (depending on the output selected to guide the search, different paths lead to the individual factors identified as being important). In practice, multiple outputs are the rule in simulation; see Kleijnen and Smits (2003) and also Kleijnen et al. (2004a).

The application of Kriging to random simulation models seems a challenge. Moreover, corresponding software needs to be developed. Also see Lophaven et al. (2002).

Comparison of various metamodel types and their designs remains a major problem. For example, Meckesheimer et al. (2001) compare radial basis, neural net, and polynomial metamodels. Clarke et al. (2003) compare low-order polynomials, radial basis functions, Kriging, splines, and support vector regression. Alam et al. (2003) found that LHS gives the best neural-net metamodels. Comparison of screening designs has hardly been done; see Kleijnen et al. (2004a,b).

Acknowledgements

The following colleagues provided comments on a first draft of this paper: Russell Barton (Pennsylvania State University) and Wim van Beers (Tilburg University).

Appendix A. Alternative validation test of linear regression metamodels

Instead of the cross-validation procedure discussed in Section 5, I propose the following test, which applies only to linear regression metamodels (not to other types of metamodels); also see Kleijnen (1992).

The accuracy of the predictor for the new scenario x_{n+1} based on (17) may be quantified by its variance

var(ŷ_{n+1}) = x′_{n+1} cov(b̂) x_{n+1},    (A.1)

where cov(b̂) was given by (13) in case of white noise. For more realistic cases, I propose that the analysts replicate each scenario (say) m times with non-overlapping PRN and m > 1, and get m estimates (say) b̂_r (r = 1, ..., m) of the regression parameters. From these estimates they can estimate cov(b̂) in (A.1). (The non-overlapping PRN reduce the q × q matrix cov(b̂) to a diagonal matrix with the elements var(b̂_j), j = 1, ..., q, on the main diagonal; CRN is allowed.) Note that this validation approach requires replication, whereas cross-validation does not.

Next, the analysts simulate this new scenario with new non-overlapping PRN, and get w_{n+1}. To estimate the variance of this simulation output, the analysts may again use m replicates, resulting in the average w̄_{n+1} and the estimated variance var̂(w̄_{n+1}).

I recommend comparing the regression prediction and the simulation output through a Student t test:

t_{m−1} = (ŷ_{n+1} − w̄_{n+1}) / {var̂(ŷ_{n+1}) + var̂(w̄_{n+1})}^{1/2}.    (A.2)

The analysts should reject the metamodel if this test statistic exceeds the 1 − α quantile of the t_{m−1} distribution.

If the analysts simulate several new scenarios, then they can still apply the t test in (A.2), now combined with Bonferroni's inequality.

References

Alam, F.M., McNaught, K.R., Ringrose, T.J., 2003. A comparison of experimental designs in the development of a neural network simulation metamodel. Simulation Modelling: Practice and Theory (accepted conditionally).
Angün, E., Den Hertog, D., Gürkan, G., Kleijnen, J.P.C., 2002. Response surface methodology revisited. In: Yücesan, E., Chen, C.H., Snowdon, J.L., Charnes, J.M. (Eds.), Proceedings of the 2002 Winter Simulation Conference. Institute of Electrical and Electronics Engineers, Piscataway, NJ, pp. 377–383.
Antoniadis, A., Pham, D.T., 1998. Wavelet regression for random or irregular design. Computational Statistics and Data Analysis 28, 353–369.
Bettonvil, B., Kleijnen, J.P.C., 1996. Searching for important factors in simulation models with many factors: sequential bifurcation. European Journal of Operational Research 96 (1), 180–194.
Bettonvil, B., Kleijnen, J.P.C., 1990. Measurement scales and resolution IV designs. American Journal of Mathematical and Management Sciences 10 (3–4), 309–322.


Cheng, R.C.H., Kleijnen, J.P.C., 1999. Improved design of simulation experiments with highly heteroskedastic responses. Operations Research 47 (5), 762–777.
Clarke, S.M., Griebsch, J.H., Simpson, T.W., 2003. Analysis of support vector regression for approximation of complex engineering analyses. In: Proceedings of DETC '03, ASME 2003 Design Engineering Technical Conferences and Computers and Information in Engineering Conference, Chicago.
Cressie, N.A.C., 1993. Statistics for Spatial Data. Wiley, New York.
Den Hertog, D., Kleijnen, J.P.C., Siem, A.Y.D., 2004. The correct Kriging variance (preliminary title). Working Paper, CentER, Tilburg University, Tilburg, The Netherlands (preprint: http://center.kub.nl/staff/kleijnen/papers.html).
Donohue, J.M., Houck, E.C., Myers, R.H., 1993. Simulation designs and correlation induction for reducing second-order bias in first-order response surfaces. Operations Research 41 (5), 880–902.
Horne, G., Leonardi, M. (Eds.), 2001. Maneuver Warfare Science 2001. Defense Automatic Printing Service, Quantico, VA.
Jin, R., Chen, W., Sudjianto, A., 2002. On sequential sampling for global metamodeling in engineering design. In: Proceedings of DETC '02, ASME 2002 Design Engineering Technical Conferences and Computers and Information in Engineering Conference, DETC 2002/DAC-34092, September 29–October 2, Montreal, Canada.
Kleijnen, J.P.C., 2004. Design and analysis of Monte Carlo experiments. In: Gentle, J.E., Haerdle, W., Mori, Y. (Eds.), Handbook of Computational Statistics, vol. I: Concepts and Fundamentals. Springer, Heidelberg.
Kleijnen, J.P.C., 1998. Experimental design for sensitivity analysis, optimization, and validation of simulation models. In: Banks, J. (Ed.), Handbook of Simulation. Wiley, New York, pp. 173–223.
Kleijnen, J.P.C., 1992. Regression metamodels for simulation with common random numbers: comparison of validation tests and confidence intervals. Management Science 38 (8), 1164–1185.
Kleijnen, J.P.C., 1987. Statistical Tools for Simulation Practitioners. Marcel Dekker, New York.
Kleijnen, J.P.C., 1975. Statistical Techniques in Simulation, vol. II. Marcel Dekker, New York [Russian translation, Publishing House "Statistics", Moscow, 1978].
Kleijnen, J.P.C., Sanchez, S.M., Lucas, T.W., Cioppa, T.M., 2004a. A user's guide to the brave new world of designing simulation experiments. INFORMS Journal on Computing (accepted conditionally).
Kleijnen, J.P.C., Bettonvil, B., Person, F., 2004b. Finding the important factors in large discrete-event simulation: sequential bifurcation and its applications. In: Dean, A.M., Lewis, S.M. (Eds.), Screening. Springer, New York (forthcoming; preprint: http://center.kub.nl/staff/kleijnen/papers.html).
Kleijnen, J.P.C., Sargent, R.G., 2000. A methodology for the fitting and validation of metamodels in simulation. European Journal of Operational Research 120 (1), 14–29.
Kleijnen, J.P.C., Smits, M.T., 2003. Performance metrics in supply chain management. Journal of the Operational Research Society 54 (5), 507–514.
Kleijnen, J.P.C., Van Beers, W.C.M., 2004a. Robustness of Kriging when interpolating in random simulation with heterogeneous variances: some experiments. European Journal of Operational Research (accepted conditionally).
Kleijnen, J.P.C., Van Beers, W.C.M., 2004b. Application-driven sequential designs for simulation experiments: Kriging metamodeling. Journal of the Operational Research Society (in press).
Koehler, J.R., Owen, A.B., 1996. Computer experiments. In: Ghosh, S., Rao, C.R. (Eds.), Handbook of Statistics, vol. 13. Elsevier, Amsterdam, pp. 261–308.
Law, A.M., Kelton, W.D., 2000. Simulation Modeling and Analysis, third ed. McGraw-Hill, New York.
Lophaven, S.N., Nielsen, H.B., Sondergaard, J., 2002. DACE: A Matlab Kriging Toolbox, version 2.0. IMM, Technical University of Denmark, Lyngby, Denmark.
McKay, M.D., Beckman, R.J., Conover, W.J., 1979. A comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics 21 (2), 239–245 (reprinted in 2000: Technometrics 42 (1), 55–61).
Meckesheimer, M., Barton, R.R., Limayem, F., Yannou, B., 2001. Metamodeling of combined discrete/continuous responses. AIAA Journal 39, 1950–1959.
Rechtschaffner, R.L., 1967. Saturated fractions of 2^n and 3^n factorial designs. Technometrics 9, 569–575.
Sacks, J., Welch, W.J., Mitchell, T.J., Wynn, H.P., 1989. Design and analysis of computer experiments. Statistical Science 4 (4), 409–435.
Santner, T.J., Williams, B.J., Notz, W.I., 2003. The Design and Analysis of Computer Experiments. Springer, New York.
Schruben, L.W., Margolin, B.H., 1978. Pseudorandom number assignment in statistically designed simulation and distribution sampling experiments. Journal of the American Statistical Association 73 (363), 504–525.
Simpson, T.W., Mauery, T.M., Korte, J.J., Mistree, F., 2001. Kriging metamodels for global approximation in simulation-based multidisciplinary design optimization. AIAA Journal 39 (12), 2233–2241.
Spall, J.C., 2003. Introduction to Stochastic Search and Optimization: Estimation, Simulation, and Control. Wiley, Hoboken, NJ.
Taguchi, G., 1987. System of Experimental Designs, vols. 1 and 2. UNIPUB/Krauss International, White Plains, NY.
Van Groenendaal, W.J.H., 1998. The Economic Appraisal of Natural Gas Projects. Oxford University Press, Oxford.
Vonk Noordegraaf, A., 2002. Simulation modelling to support national policy making in the control of bovine herpes virus 1. Doctoral dissertation, Wageningen University, Wageningen, The Netherlands.


Yamada, S., Lin, D.K.J., 2002. Construction of mixed-level supersaturated design. Metrika 56, 205–214.
Ye, K.Q., 1998. Orthogonal column Latin hypercubes and their application in computer experiments. Journal of the American Statistical Association 93, 1430–1439.
Zeigler, B.P., Praehofer, H., Kim, T.G., 2000. Theory of Modeling and Simulation, second ed. Academic Press, New York.