A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code

M. D. McKay and R. J. Beckman
Los Alamos Scientific Laboratory
P.O. Box 1663
Los Alamos, NM 87545

W. J. Conover
Department of Mathematics
Texas Tech University
Lubbock, TX 79409

Two types of sampling plans are examined as alternatives to simple random sampling in Monte Carlo studies. These plans are shown to be improvements over simple random sampling with respect to variance for a class of estimators which includes the sample mean and the empirical distribution function.

KEY WORDS: Latin hypercube sampling; Sampling techniques; Simulation techniques; Variance reduction.

1. INTRODUCTION

Numerical methods have been used for years to provide approximate solutions to fluid flow problems that defy analytical solutions because of their complexity. A mathematical model is constructed to resemble the fluid flow problem, and a computer program (called a "code"), incorporating methods of obtaining a numerical solution, is written. Then for any selection of input variables X = (X_1, ..., X_K) an output variable Y = h(X) is produced by the computer code. If the code is accurate, the output Y resembles what the actual output would be if an experiment were performed under the conditions X. It is often impractical or impossible to perform such an experiment. Moreover, the computer codes are sometimes sufficiently complex that a single set of input variables may require several hours of time on the fastest computers presently in existence in order to produce one output. We should mention that a single output Y is usually a graph Y(t) of output as a function of time, calculated at discrete time points t, t_0 <= t <= t_1.

When modeling real-world phenomena with a computer code, one is often faced with the problem of what values to use for the inputs. This difficulty can arise from within the physical process itself when system parameters are not constant but vary in some manner about nominal values. We model our uncertainty about the values of the inputs by treating them as random variables. The information desired from the code can be obtained from a study of the probability distribution of the output Y(t). Consequently, we model the "numerical" experiment by treating Y(t) as an unknown transformation h(X) of the inputs X, which have a known probability distribution F(x) for x in S. Obviously, several values of X, say X_1, ..., X_N, must be selected as successive input sets in order to obtain the desired information concerning Y(t). When N must be small because of the running time of the code, the input variables should be selected with great care.

The next section describes three methods of selecting (sampling) input variables. Sections 3, 4, and 5 are devoted to comparing the three methods with respect to their performance in an actual computer code. The computer code used in this paper was developed in the Hydrodynamics Group of the Theoretical Division at the Los Alamos Scientific Laboratory to study reactor safety (Hirt and Romero 1975). The computer code is named SOLA-PLOOP and is a one-dimensional version of another code, SOLA (Hirt, Nichols, and Romero 1975). The code was used by us to model the blowdown depressurization of a straight pipe filled with water at fixed initial temperature and pressure. Input variables include: X_1, the phase change rate; X_2, the drag coefficient for drift velocity; X_3, the number of bubbles per unit volume; and X_4, the pipe roughness. The input variables are assumed to be uniformly distributed over given ranges. The output variable is pressure as a function of time, where the initial time t_0 is the time the pipe ruptures and depressurization initiates, and the final time t_1 is 20 milliseconds later. The pressure is recorded at 0.1-millisecond time intervals. The code was used repeatedly so that the accuracy and precision of the three sampling methods could be compared.

2. A DESCRIPTION OF THE THREE METHODS USED FOR SELECTING THE VALUES OF INPUT VARIABLES

From the many different methods of selecting the values of input variables, we have chosen three that have considerable intuitive appeal. These are called random sampling, stratified sampling, and Latin hypercube sampling.

Random Sampling. Let the input values X_1, ..., X_N be a random sample from F(x). This method of sampling is perhaps the most obvious, and an entire body of statistical literature may be used in making inferences regarding the distribution of Y(t).


Stratified Sampling. Using stratified sampling, all areas of the sample space of X are represented by input values. Let the sample space S of X be partitioned into I disjoint strata S_i. Let p_i = P(X in S_i) represent the size of S_i. Obtain a random sample X_ij, j = 1, ..., n_i, from S_i. Then the n_i sum to N. If I = 1, we have random sampling over the entire sample space.

Latin Hypercube Sampling. The same reasoning that led to stratified sampling, ensuring that all portions of S were sampled, could lead further. If we wish to ensure also that each of the input variables X_k has all portions of its distribution represented by input values, we can divide the range of each X_k into N strata of equal marginal probability 1/N, and sample once from each stratum. Let this sample be X_kj, j = 1, ..., N. These form the X_k component, k = 1, ..., K, in X_i, i = 1, ..., N. The components of the various X_k's are matched at random. This method of selecting input values is an extension of quota sampling (Steinberg 1963), and can be viewed as a K-dimensional extension of Latin square sampling (Raj 1968).

One advantage of the Latin hypercube sample appears when the output Y(t) is dominated by only a few of the components of X. This method ensures that each of those components is represented in a fully stratified manner, no matter which components might turn out to be important.

We mention here that the N intervals on the range of each component of X combine to form N^K cells which cover the sample space of X. These cells, which are labeled by coordinates corresponding to the intervals, are used when finding the properties of the sampling plan.
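As an illustration only (not part of the original study), the three plans can be sketched in a few lines of Python for K independent inputs taken here to be uniform on [0, 1]; a physical range [a, b] is recovered as a + (b - a)u. The values N = 16 and K = 4 mirror the SOLA-PLOOP example, and the stratified plan uses the 2^K equal-probability cells employed later in Section 2.2.

```python
# Sketch of the three input-selection plans for K independent inputs on [0, 1].
# Not the authors' code; N = 16 and K = 4 mirror the SOLA-PLOOP example.
import numpy as np

rng = np.random.default_rng(0)
N, K = 16, 4

def random_sample(N, K):
    # Simple random sampling: N independent draws from the joint distribution.
    return rng.random((N, K))

def stratified_sample(N, K):
    # Halve each input's range to get 2**K = N equal-probability cells,
    # then draw one point at random inside each cell (as in Section 2.2).
    assert N == 2 ** K
    corners = np.array(np.meshgrid(*[[0, 1]] * K)).reshape(K, -1).T
    return (corners + rng.random((N, K))) / 2.0

def latin_hypercube_sample(N, K):
    # Divide each input's range into N intervals of probability 1/N, sample once
    # from each interval, and match the K components at random.
    cut = (np.arange(N) + rng.random((K, N))) / N
    return np.column_stack([rng.permutation(cut[k]) for k in range(K)])

X_R = random_sample(N, K)
X_S = stratified_sample(N, K)
X_L = latin_hypercube_sample(N, K)
# Every column of X_L contains exactly one value in each interval [i/N, (i+1)/N).
```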
2.1 Estimators

In the Appendix (Section 8), stratified sampling and Latin hypercube sampling are examined and compared to random sampling with respect to the class of estimators of the form

T(Y_1, ..., Y_N) = (1/N) Sum_{i=1}^{N} g(Y_i),

where g(.) is an arbitrary function. If g(Y) = Y, then T represents the sample mean, which is used to estimate E(Y). If g(Y) = Y^r, we obtain the rth sample moment. By letting g(Y) = 1 for Y <= y and 0 otherwise, we obtain the usual empirical distribution function at the point y. Our interest is centered around these particular statistics.

Let tau denote the expected value of T when the Y_i's constitute a random sample from the distribution of Y = h(X). We show in the Appendix that both stratified sampling and Latin hypercube sampling yield unbiased estimators of tau.

If T_R is the T from a random sample of size N, and T_S is the estimate from a stratified sample of size N, then Var(T_S) <= Var(T_R) when the stratified plan uses equal-probability strata with one sample per stratum (all p_i = 1/N and n_i = 1). No direct way of comparing Var(T_S) to the variance of the corresponding estimator from Latin hypercube sampling, T_L, has been found. However, the following theorem, proved in the Appendix, relates the variances of T_L and T_R.

Theorem. If Y = h(X_1, ..., X_K) is monotonic in each of its arguments, and g(Y) is a monotonic function of Y, then Var(T_L) <= Var(T_R).

2.2 The SOLA-PLOOP Example

The three sampling plans were compared using the SOLA-PLOOP computer code with N = 16. First a random sample consisting of 16 values of X = (X_1, X_2, X_3, X_4) was selected, entered as inputs, and 16 graphs of Y(t) were observed as outputs. These output values were used in the estimators. For the stratified sampling method the range of each input variable was divided into two parts of equal probability. The combinations of ranges thus formed produced 2^4 = 16 strata S_i. One observation was obtained at random from each S_i as input, and the resulting outputs were used to obtain the estimates. To obtain the Latin hypercube sample, the range of each input variable was stratified into 16 intervals of equal probability, and one observation was drawn at random from each interval. These 16 values for the 4 input variables were matched at random to form 16 inputs, and thus 16 outputs from the code.

The entire process of sampling and estimating for the three selection methods was repeated 50 times in order to get some idea of the accuracies and precisions involved. The total computer time spent in running the SOLA-PLOOP code in this study was 7 hours on a CDC-6600. Some of the standard deviation plots appear to be inconsistent with the theoretical results. These occasional discrepancies are believed to arise from the non-independence of the estimators over time and the small sample sizes.

3. ESTIMATING THE MEAN

The goodness of an unbiased estimator of the mean can be measured by the size of its variance. For each sampling method, the estimator of E(Y(t)) is of the form

Ybar(t) = (1/N) Sum_{i=1}^{N} Y_i(t),   (3.1)

where Y_i(t) = h(X_i), i = 1, ..., N. In the case of the stratified sample, X_i comes from stratum S_i, with p_i = 1/N and n_i = 1. For the Latin hypercube sample, X_i is obtained in the manner described earlier.

Each of the three estimators Ybar_R, Ybar_S, and Ybar_L is an unbiased estimator of E(Y(t)). The variances of the estimators are given in (3.2):

Var(Ybar_R(t)) = (1/N) Var(Y(t))

Var(Ybar_S(t)) = Var(Ybar_R(t)) - (1/N^2) Sum_{i=1}^{N} (mu_i - mu)^2

Var(Ybar_L(t)) = Var(Ybar_R(t)) + ((N - 1)/N) (1/(N^K (N - 1)^K)) Sum_R (mu_i - mu)(mu_j - mu),   (3.2)

where mu = E(Y(t)), mu_i = E(Y(t) | X in S_i) in the stratified sample or mu_i = E(Y(t) | X in cell i) in the Latin hypercube sample, and R denotes the restricted space of all pairs (mu_i, mu_j) having no cell coordinates in common.
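The estimator class of Section 2.1, including the sample mean (3.1), can be sketched as follows. The function h and the threshold y0 = 2 used here are hypothetical stand-ins introduced only for illustration, not taken from the SOLA-PLOOP code.

```python
# A minimal sketch of the Section 2.1 estimators T = (1/N) * sum_i g(Y_i),
# with a hypothetical stand-in h for the computer code and a random-sample design.
import numpy as np

rng = np.random.default_rng(0)
N, K = 16, 4
h = lambda X: X.sum(axis=1)                          # placeholder for Y = h(X)

Y = h(rng.random((N, K)))                            # outputs from one random-sample design

def T(Y, g):
    # The Type (8.1) estimator: average of g over the N outputs.
    return np.mean(g(Y))

mean_hat = T(Y, lambda y: y)                         # g(y) = y: the sample mean, eq. (3.1)
m2_hat   = T(Y, lambda y: y ** 2)                    # g(y) = y**r: the rth sample moment (r = 2)
edf_hat  = T(Y, lambda y: (y <= 2.0).astype(float))  # g(y) = 1{y <= y0}: empirical CDF at y0 = 2
```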

Figure 1. Estimating the Mean: The Sample Mean of Ybar_R(t), Ybar_S(t), and Ybar_L(t).

Figure 3. Estimating the Variance: The Sample Mean of S^2_R(t), S^2_S(t), and S^2_L(t).

For the SOLA-PLOOP computer code the means and standard deviations, based on 50 observations, were computed for the estimators just described. Comparative plots of the means are given in Figure 1. All of the plots of the means are comparable, demonstrating the unbiasedness of the estimators. Comparative plots of the standard deviations of the estimators are given in Figure 2. The standard deviation of Ybar_S(t) is smaller than that of Ybar_R(t), as expected. However, Ybar_L(t) clearly demonstrates its superiority as an estimator in this example, with a standard deviation roughly one-fourth that of the random sampling estimator.

4. ESTIMATING THE VARIANCE

For each sampling method, the estimator of the variance is of the form

S^2(t) = (1/N) Sum_{i=1}^{N} (Y_i(t) - Ybar(t))^2,   (4.1)

and its expectation is

E(S^2(t)) = Var(Y(t)) - Var(Ybar(t)),   (4.2)

where Ybar(t) is one of Ybar_R(t), Ybar_S(t), or Ybar_L(t). In the case of the random sample, it is well known that N S^2(t)/(N - 1) is an unbiased estimator of the variance of Y(t). The bias in the case of the stratified sample is unknown. However, because Var(Ybar_S(t)) <= Var(Ybar_R(t)),

(1 - 1/N) Var(Y(t)) <= E(S^2(t)) <= Var(Y(t)).   (4.3)

The bias in the Latin hypercube plan is also unknown, but for the SOLA-PLOOP example it was small. Variances for these estimators were not found.
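A repetition study in the spirit of the 50-run comparison of Sections 3 and 4 can be sketched as follows. The monotone stand-in h and the sampler details are assumptions made for illustration, not the SOLA-PLOOP code; the stratified plan is omitted for brevity but is handled in exactly the same way.

```python
# A hypothetical repetition study: 50 designs per plan with N = 16, K = 4, and a
# cheap monotone stand-in h. It tracks the spread of the mean estimator (3.1)
# and of the variance estimator (4.1) over the repeated designs.
import numpy as np

rng = np.random.default_rng(0)
N, K, reps = 16, 4, 50
h = lambda X: X.sum(axis=1)                        # placeholder, monotone in each input

def random_design():
    return rng.random((N, K))

def lhs_design():                                  # N equal-probability intervals per input
    cut = (np.arange(N) + rng.random((K, N))) / N
    return np.column_stack([rng.permutation(cut[k]) for k in range(K)])

for name, design in [("random", random_design), ("latin hypercube", lhs_design)]:
    Y = np.array([h(design()) for _ in range(reps)])       # reps x N outputs
    ybar, s2 = Y.mean(axis=1), Y.var(axis=1)               # (3.1) and (4.1) for each design
    print(f"{name:16s}  sd(Ybar) = {ybar.std(ddof=1):.4f}  sd(S^2) = {s2.std(ddof=1):.4f}")
# Under the monotonicity theorem of Section 2.1, the Latin hypercube row should
# show the smaller spreads, as in Figures 2 and 4.
```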

Figure 2. Estimating the Mean: The Standard Deviation of Ybar_R(t), Ybar_S(t), and Ybar_L(t).

Figure 4. Estimating the Variance: The Standard Deviation of S^2_R(t), S^2_S(t), and S^2_L(t).



Again using the SOLA-PLOOP example, means and standard deviations (based on 50 observations) were computed. The mean plots are given in Figure 3. They indicate that all three estimators are in relative agreement concerning the quantities they are estimating. In terms of standard deviations of the estimators, Figure 4 shows that, although stratified sampling yields about the same precision as does random sampling, Latin hypercube sampling furnishes a clearly better estimator.

5. ESTIMATING THE DISTRIBUTION FUNCTION

The distribution function, D(y, t), of Y(t) = h(X) may be estimated by the empirical distribution function, which can be written as

G(y, t) = (1/N) Sum_{i=1}^{N} u(y - Y_i(t)),   (5.1)

where u(z) = 1 for z >= 0 and is zero otherwise. Since equation (5.1) is of the form of the estimators in Section 2.1, the expected value of G(y, t) is the same under the three sampling plans, and under random sampling the expected value of G(y, t) is D(y, t).

The variances of the three estimators are given in (5.2). D_i again refers to either stratum i or cell i, as appropriate, and R represents the same restricted space as it did in (3.2):

Var(G_R(y, t)) = (1/N) D(y, t)(1 - D(y, t))

Var(G_S(y, t)) = Var(G_R(y, t)) - (1/N^2) Sum_{i=1}^{N} (D_i(y, t) - D(y, t))^2

Var(G_L(y, t)) = Var(G_R(y, t)) + ((N - 1)/N) (1/(N^K (N - 1)^K)) Sum_R (D_i(y, t) - D(y, t))(D_j(y, t) - D(y, t)).   (5.2)

Figure 5. Estimating the CDF: The Sample Mean of G_R(y, t), G_S(y, t), and G_L(y, t) at t = 1.4.

Figure 6. Estimating the CDF: The Standard Deviation of G_R(y, t), G_S(y, t), and G_L(y, t) at t = 1.4.

As with the cases of the mean and variance estimators, the distribution function estimators were compared for the three sampling plans. Figures 5 and 6 give the means and standard deviations of the estimators at t = 1.4 ms. This time point was chosen to correspond to the time of maximum variance in the distribution of Y(t). Again the estimates obtained from a Latin hypercube sample appear to be more precise in general than the other two types of estimates.
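As a sketch of (5.1) in use, with a hypothetical stand-in for the code and an arbitrary threshold y0, the spread of the empirical distribution function estimator over repeated designs can be compared directly:

```python
# A minimal sketch for the estimator (5.1): the empirical distribution function at a
# fixed threshold y0, compared over repeated random and Latin hypercube designs.
# The stand-in h and the threshold are hypothetical, not taken from SOLA-PLOOP.
import numpy as np

rng = np.random.default_rng(0)
N, K, reps, y0 = 16, 4, 50, 2.0
h = lambda X: X.sum(axis=1)

def lhs_design():
    cut = (np.arange(N) + rng.random((K, N))) / N
    return np.column_stack([rng.permutation(cut[k]) for k in range(K)])

G_R = [np.mean(h(rng.random((N, K))) <= y0) for _ in range(reps)]   # eq. (5.1), random designs
G_L = [np.mean(h(lhs_design()) <= y0) for _ in range(reps)]         # eq. (5.1), LHS designs
print(np.std(G_R, ddof=1), np.std(G_L, ddof=1))  # the LHS spread should be the smaller one
```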

6. DISCUSSION AND CONCLUSIONS

We have presented three sampling plans and associated estimators of the mean, the variance, and the population distribution function of the output of a computer code when the inputs are treated as random variables. The first method is simple random sampling. The second method involves stratified sampling and improves upon the first method. The third method is called here Latin hypercube sampling. It is an extension of quota sampling (Steinberg 1963); it is a first cousin to the "random balance" designs discussed by Satterthwaite (1959), Budne (1959), Youden et al. (1959), and Anscombe (1959); and it is related to the highly fractionalized factorial designs discussed by Ehrenfeld and Zacks (1961, 1967), Dempster (1960, 1961), and Zacks (1963, 1964), and to lattice sampling as discussed by Jessen (1975). This third method improves upon simple random sampling when certain monotonicity conditions hold, and it appears to be a good method to use for selecting values of input variables.

7. ACKNOWLEDGMENTS

We extend a special thanks to Ronald K. Lohrding for his early suggestions related to this work and for his continuing support and encouragement. We also thank our colleagues Larry Bruckner, Ben Duran, C. Phive, and Tom Boullion for their discussions concerning various aspects of the problem, and Dave Whiteman for assistance with the computer.

This paper was prepared under the support of the Analysis Development Branch, Division of Reactor Safety Research, Nuclear Regulatory Commission.

8. APPENDIX

In the sections that follow we present some general results about stratified sampling and Latin hypercube sampling in order to make comparisons with simple random sampling. We move from the general case of stratified sampling to stratified sampling with proportional allocation, and then to proportional allocation with one observation per stratum. We examine Latin hypercube sampling for the equal-marginal-probability strata case only.

8.1 Type I Estimators

Let X denote a K-variate random variable with probability density function (pdf) f(x) for x in S. Let Y denote a univariate transformation of X given by Y = h(X). In the context of this paper we assume

X ~ f(x), x in S   (KNOWN pdf)
Y = h(X)   (UNKNOWN but observable transformation of X).

The class of estimators to be considered are those of the form

T(Y_1, ..., Y_N) = (1/N) Sum_{i=1}^{N} g(Y_i),   (8.1)

where g(.) is an arbitrary, known function. In particular we use g(u) = u^r to estimate moments, and g(u) = 1 for u >= 0 and g(u) = 0 elsewhere to estimate the distribution function.

The sampling schemes described in the following sections will be compared to random sampling with respect to T. The symbol T_R denotes T(Y_1, ..., Y_N) when the arguments Y_1, ..., Y_N constitute a random sample of Y. The mean and variance of T_R are denoted by tau and sigma^2/N. The statistic T given by (8.1) will be evaluated at arguments arising from stratified sampling to form T_S, and at arguments arising from Latin hypercube sampling to form T_L. The associated means and variances will be compared to those for random sampling.

8.2 Stratified Sampling

Let the range space, S, of X be partitioned into I disjoint subsets S_i of size p_i = P(X in S_i), with

Sum_{i=1}^{I} p_i = 1.

Let X_ij, j = 1, ..., n_i, be a random sample from stratum S_i. That is, let X_ij ~ iid f(x)/p_i, j = 1, ..., n_i, for x in S_i, with zero density elsewhere. The corresponding values of Y are denoted by Y_ij = h(X_ij), and the strata means and variances of g(Y) are denoted by

mu_i = E(g(Y_ij)) = Integral_{S_i} g(h(x)) (1/p_i) f(x) dx

sigma_i^2 = Var(g(Y_ij)) = Integral_{S_i} (g(h(x)) - mu_i)^2 (1/p_i) f(x) dx.

With these definitions, the stratified sampling estimator

T_S = Sum_{i=1}^{I} (p_i/n_i) Sum_{j=1}^{n_i} g(Y_ij)

is an unbiased estimator of tau with variance given by

Var(T_S) = Sum_{i=1}^{I} (p_i^2/n_i) sigma_i^2.   (8.2)

The following results can be found in Tocher (1963).

Stratified Sampling with Proportional Allocation. If the sizes, p_i, of the strata and the sample sizes, n_i, are chosen so that n_i = p_i N, proportional allocation is achieved. In this case (8.2) becomes

Var(T_S) = Var(T_R) - (1/N) Sum_{i=1}^{I} p_i (mu_i - tau)^2.   (8.3)

Thus, we see that stratified sampling with proportional allocation offers an improvement over random sampling, and that the variance reduction is a function of the differences between the strata means mu_i and the overall mean tau.

Proportional Allocation with One Sample per Stratum. Any stratified plan which employs subsampling, n_i > 1, can be improved by further stratification. When all n_i = 1, (8.3) becomes

Var(T_S) = Var(T_R) - (1/N^2) Sum_{i=1}^{N} (mu_i - tau)^2.   (8.4)
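Identity (8.4) can be checked numerically for a simple case. The example below is an assumption made for illustration, not drawn from the paper: X uniform on (0, 1), g(h(X)) = X^2, and N = 10 equal-probability strata with one observation each.

```python
# A hypothetical numerical check of (8.4): X ~ U(0,1), g(h(X)) = X**2,
# N equal-probability strata with one observation per stratum.
import numpy as np

rng = np.random.default_rng(1)
N, reps = 10, 200_000
tau, var_TR = 1.0 / 3.0, (4.0 / 45.0) / N        # tau = E[X^2]; Var(T_R) = sigma^2 / N

# Monte Carlo variance of T_S under proportional allocation with n_i = 1:
# one uniform draw from each interval [i/N, (i+1)/N).
U = (np.arange(N) + rng.random((reps, N))) / N
var_TS_mc = ((U ** 2).mean(axis=1)).var()

# Right-hand side of (8.4), with the stratum means mu_i in closed form.
edges = np.arange(N + 1) / N
mu = N * (edges[1:] ** 3 - edges[:-1] ** 3) / 3.0        # E[X^2 | i-th interval]
var_TS_formula = var_TR - ((mu - tau) ** 2).sum() / N ** 2

print(var_TS_mc, var_TS_formula)    # the two values should agree to Monte Carlo error
```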
8.3 Latin Hypercube Sampling

In stratified sampling the range space S of X can be arbitrarily partitioned to form strata. In Latin hypercube sampling the partitions are constructed in a specific manner using partitions of the ranges of each component of X. We will only consider the case where the components of X are independent.

Let the ranges of each of the K components of X be partitioned into N intervals of probability size 1/N. The Cartesian product of these intervals partitions S into N^K cells, each of probability size N^{-K}. Each cell can be labeled by a set of K cell coordinates m_i = (m_i1, m_i2, ..., m_iK), where m_ij is the interval number of component X_j represented in cell i. A Latin hypercube sample of size N is obtained from a random selection of N of the cells m_1, ..., m_N, with the condition that for each j the set {m_ij, i = 1, ..., N} is a permutation of the integers 1, ..., N. One random observation is made in each cell. The density function of X given X in cell i is N^K f(x) for x in cell i, and zero otherwise. The marginal (unconditional) distribution of each Y_i is easily seen to be the same as that for a randomly drawn X:

P(Y_i <= y) = Sum_{all cells q} P(Y_i <= y | X in cell q) P(X in cell q)
            = Sum_{all cells q} (1/N^K) Integral_{cell q, h(x) <= y} N^K f(x) dx
            = P(h(X) <= y).


From this we have that T_L is an unbiased estimator of tau. To arrive at a form for the variance of T_L we introduce indicator variables w_i, with

w_i = 1 if cell i is in the sample, and w_i = 0 if not.

The estimator can now be written as

T_L = (1/N) Sum_{i=1}^{N^K} w_i g(Y_i),   (8.5)

where Y_i = h(X_i) and X_i in cell i. The variance of T_L is given by

Var(T_L) = (1/N^2) Sum_{i=1}^{N^K} Var(w_i g(Y_i)) + (1/N^2) Sum_{i=1}^{N^K} Sum_{j != i} Cov(w_i g(Y_i), w_j g(Y_j)).   (8.6)

The following results about the w_i are immediate:

1. P(w_i = 1) = 1/N^{K-1} = E(w_i) = E(w_i^2), and Var(w_i) = (1/N^{K-1})(1 - 1/N^{K-1}).

2. If w_i and w_j correspond to cells having no cell coordinates in common, then

E(w_i w_j) = E(w_i w_j | w_j = 0) P(w_j = 0) + E(w_i w_j | w_j = 1) P(w_j = 1) = 1/(N(N - 1))^{K-1}.

3. If w_i and w_j correspond to cells having at least one common cell coordinate, then E(w_i w_j) = 0.

Now

Var(w_i g(Y_i)) = E(w_i^2) Var(g(Y_i)) + E^2(g(Y_i)) Var(w_i),   (8.7)

so that

Sum_{i=1}^{N^K} Var(w_i g(Y_i)) = N^{-K+1} Sum_{i=1}^{N^K} E(g(Y_i) - mu_i)^2 + (N^{-K+1} - N^{-2K+2}) Sum_i mu_i^2,   (8.8)

where mu_i = E{g(Y) | X in cell i}. Since

E(g(Y_i) - mu_i)^2 = N^K Integral_{cell i} (g(h(x)) - tau)^2 f(x) dx - (mu_i - tau)^2,   (8.9)

we have

Sum_i Var(w_i g(Y_i)) = N Var(g(Y)) - N^{-K+1} Sum_i (mu_i - tau)^2 + (N^{-K+1} - N^{-2K+2}) Sum_i mu_i^2.   (8.10)

Furthermore,

Sum_{i=1}^{N^K} Sum_{j != i} Cov(w_i g(Y_i), w_j g(Y_j)) = Sum_{i != j} mu_i mu_j E{w_i w_j} - N^{-2K+2} Sum_{i != j} mu_i mu_j,   (8.11)

which combines with (8.10) to give

Var(T_L) = (1/N) Var(g(Y)) - N^{-K-1} Sum_i (mu_i - tau)^2 + (N^{-K-1} - N^{-2K}) Sum_i mu_i^2 + N^{-K-1}(N - 1)^{-K+1} Sum_R mu_i mu_j - N^{-2K} Sum_{i != j} mu_i mu_j,   (8.12)

where R means the restricted space of the N^K (N - 1)^K pairs (mu_i, mu_j) corresponding to cells having no cell coordinates in common. After some algebra, and using Sum_i mu_i = N^K tau, the final form for Var(T_L) becomes

Var(T_L) = Var(T_R) + ((N - 1)/N) (1/(N^K (N - 1)^K)) Sum_R (mu_i - tau)(mu_j - tau).   (8.13)

Note that Var(T_L) <= Var(T_R) if and only if

N^{-K} (N - 1)^{-K} Sum_R (mu_i - tau)(mu_j - tau) <= 0,   (8.14)

which is equivalent to saying that the covariance between cells having no cell coordinates in common is negative. A sufficient condition for (8.14) to hold is given by the following theorem.

Theorem. If Y = h(X_1, ..., X_K) is monotonic in each of its arguments, and if g(Y) is a monotonic function of Y, then Var(T_L) <= Var(T_R).

Proof. The proof employs a theorem of Lehmann (1966). Two functions r(x_1, ..., x_K) and s(y_1, ..., y_K) are said to be concordant in each argument if r and s either increase or decrease together as a function of x_i and y_i, with all x_j, j != i, and y_j, j != i, held fixed, for each i. Also, a pair of random variables (X, Y) is said to be negatively quadrant dependent if P(X <= x, Y <= y) <= P(X <= x) P(Y <= y) for all x, y. Lehmann's theorem states that if (i) (X_1, Y_1), (X_2, Y_2), ..., (X_K, Y_K) are independent, (ii) (X_i, Y_i) is negatively quadrant dependent for all i, and (iii) X = r(X_1, ..., X_K) and Y = s(Y_1, ..., Y_K) are concordant in each argument, then (X, Y) is negatively quadrant dependent.


We earlier described a stage-wise process for selecting cells for a Latin hypercube sample, where a cell is labeled by cell coordinates m_i1, ..., m_iK. Two cells (l_1, ..., l_K) and (m_1, ..., m_K) with no coordinates in common may be selected as follows. Randomly select two integers (R_11, R_21) without replacement from the first N integers 1, ..., N. Let l_1 = R_11 and m_1 = R_21. Repeat the procedure to obtain (R_12, R_22), (R_13, R_23), ..., (R_1K, R_2K), and let l_k = R_1k and m_k = R_2k. Thus two cells are randomly selected and l_k != m_k for k = 1, ..., K.

Note that the pairs (R_1k, R_2k), k = 1, ..., K, are mutually independent. Also note that because

P(R_1k <= x, R_2k <= y) = [xy - min(x, y)]/(N(N - 1)) <= P(R_1k <= x) P(R_2k <= y),

where [.] represents the "greatest integer" function, each pair (R_1k, R_2k) is negatively quadrant dependent.

Let mu_1 be the expected value of g(Y) within the cell designated by (l_1, ..., l_K), and let mu_2 be similarly defined for (m_1, ..., m_K). Then mu_1 = mu_1(R_11, R_12, ..., R_1K) and mu_2 = mu_2(R_21, R_22, ..., R_2K) are concordant in each argument under the assumptions of the theorem. Lehmann's theorem then yields that mu_1 and mu_2 are negatively quadrant dependent. Therefore,

P(mu_1 <= x, mu_2 <= y) <= P(mu_1 <= x) P(mu_2 <= y).

Using Hoeffding's equation

Cov(X, Y) = Integral Integral [P(X <= x, Y <= y) - P(X <= x) P(Y <= y)] dx dy

(see Lehmann (1966) for a proof), we have Cov(mu_1, mu_2) <= 0. Since Var(T_L) = Var(T_R) + ((N - 1)/N) Cov(mu_1, mu_2), the theorem is proved.

Since the g(t) used in both Sections 3 and 5 is an increasing function of t, we can say that if Y = h(X) is a monotonic function of each of its arguments, Latin hypercube sampling is better than random sampling for estimating the mean and the population distribution function.

[Received January 1977. Revised May 1978.]

REFERENCES

Anscombe, F. J. (1959), "Quick Analysis Methods for Random Balance Screening Experiments," Technometrics, 1, 195-209.

Budne, T. A. (1959), "The Application of Random Balance Designs," Technometrics, 1, 139-155.

Dempster, A. P. (1960), "Random Allocation Designs I: On General Classes of Estimation Methods," The Annals of Mathematical Statistics, 31, 885-905.

Dempster, A. P. (1961), "Random Allocation Designs II: Approximate Theory for Simple Random Allocation," The Annals of Mathematical Statistics, 32, 387-405.

Ehrenfeld, S., and Zacks, S. (1961), "Randomization and Factorial Experiments," The Annals of Mathematical Statistics, 32, 270-297.

Ehrenfeld, S., and Zacks, S. (1967), "Testing Hypotheses in Randomized Factorial Experiments," The Annals of Mathematical Statistics, 38, 1494-1507.

Hirt, C. W., Nichols, B. D., and Romero, N. C. (1975), "SOLA-A Numerical Solution Algorithm for Transient Fluid Flows," Scientific Laboratory Report LA-5852, Los Alamos National Laboratory, NM.

Hirt, C. W., and Romero, N. C. (1975), "Application of a Drift-Flux Model to Flashing in Straight Pipes," Scientific Laboratory Report LA-6005-MS, Los Alamos National Laboratory, NM.

Jessen, Raymond J. (1975), "Square and Cubic Lattice Sampling," Biometrics, 31, 449-471.

Lehmann, E. L. (1966), "Some Concepts of Dependence," The Annals of Mathematical Statistics, 37, 1137-1153.

Raj, Des (1968), Sampling Theory, New York: McGraw-Hill.

Satterthwaite, F. E. (1959), "Random Balance Experimentation," Technometrics, 1, 111-137.

Steinberg, H. A. (1963), "Generalized Quota Sampling," Nuclear Science and Engineering, 15, 142-145.

Tocher, K. D. (1963), The Art of Simulation, Princeton, NJ: Van Nostrand.

Youden, W. J., Kempthorne, O., Tukey, J. W., Box, G. E. P., and Hunter, J. S. (1959), Discussion of the papers of Messrs. Satterthwaite and Budne, Technometrics, 1, 157-193.

Zacks, S. (1963), "On a Complete Class of Linear Unbiased Estimators for Randomized Factorial Experiments," The Annals of Mathematical Statistics, 34, 769-779.

Zacks, S. (1964), "Generalized Least Squares Estimators for Randomized Fractional Replication Designs," The Annals of Mathematical Statistics, 35, 696-704.
