Univariate Input Models for Stochastic Simulation ME Kuhl1,JSIvy2,EKLada3, NM Steiger4, MA Wagner5 and JR Wilson2 1Rochester Institute of Technology, Rochester, NY, USA, 2North Carolina State University, Raleigh, NC, USA, 3SAS Institute Inc., Cary, NC, USA, 4University of Maine, Orono, ME, USA, 5SAIC, Vienna, VA, USA

Techniques are presented for modeling and then randomly sampling many of the univariate probabilistic input processes that drive discrete-event simulation experiments. Emphasis is given to the generalized family, the Johnson translation system of distributions, and the Bézier distribution family be- cause of the flexibility of these families to model a wide range of distributional shapes that arise in practical applications. Methods are described for rapidly fitting these distributions to data or to subjective information (expert opinion) and for randomly sampling from the fitted distributions. Also discussed are applications ranging from pharmaceutical manufacturing and medical decision analysis to smart-materials research and healthcare systems analysis.

Keywords: simulation; univariate input models; generalized beta distributions; Johnson translation system of distributions; Bézier distributions

1. Introduction

One of the main problems in the design and construction of stochastic simulation experiments is the selection of valid input models—i.e., distributions that accurately mimic the behavior of the random input processes driving the system under study. Often the following interrelated difficulties arise in attempts to use standard distribution families for simulation input modeling:

1. Standard distribution families cannot adequately represent the probabilistic behavior of many real- world input processes, especially in the tails of the underlying distribution.

2. The parameters of the selected distribution family are troublesome to estimate from either sample data or subjective information (expert opinion).

3. Fine-tuning or editing the shape of the fitted distribution is difficult because (i) there are a limited number of parameters available to control the shape of the fitted distribution, and (ii) there is no effective mechanism for directly manipulating the shape of the fitted distribution while simultaneously updating the corresponding parameter estimates.

In modeling a simulation input process, the practitioner must identify an appropriate distribution family and then estimate the corresponding distribution parameters; and the problems enumerated above can hinder the progress of both of these model-building activities.

Correspondence: JR Wilson, Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State Uni- versity, Campus Box 7906, Raleigh, North Carolina 27695-7906, USA. E-mail: [email protected] uim11.tex 1 July 9, 2009 – 11:35 The conventional approach to identification of a stochastic simulation input model encompasses several procedures for using sample data to accept or reject each of the distribution families in a list of well-known alternatives. These procedures include (i) informal graphical techniques based on probability plots, fre- quency distributions, or box-plots; and (ii) statistical goodness-of-fit tests such as the Kolmogorov-Smirnov, chi-squared, and Anderson-Darling tests. For a detailed discussion of these procedures, see Sections 6.3–6.6 of Law (2007). Unfortunately, none of these procedures is guaranteed to yield a definitive conclusion. For example, identification of an input distribution can be based on visual comparison of superimposed graphs of a histogram of the available data set and the fitted probability density function (p.d.f.) for each of sev- eral alternative distribution families. In this situation, however, the final conclusion depends largely on the number of class intervals (also called bins or cells) in the histogram as well as the class boundaries; and a different layout for the histogram could lead the user to identify a different distribution family. Similar anomalies can occur in the use of statistical goodness-of-fit tests. In small samples, these tests can have very low power to detect lack of fit between the empirical distribution and each alternative theoretical dis- tribution, resulting in an inability to reject any of the alternative distributions. In large samples, moreover, practically insignificant discrepancies between the empirical and theoretical distributions often appear to be statistically significant, resulting in rejection of all the alternative distributions.

After somehow identifying an appropriate family of distributions to model an input process, the simula- tion user also faces problems in estimating the associated distribution parameters. The user often attempts to match the mean and standard deviation of the fitted distribution with the sample mean and standard de- viation of a data set, but shape characteristics such as the sample and are less frequently considered when estimating the parameters of an input distribution. Some estimation methods, such as max- imum likelihood and percentile matching, may simply fail to yield parameter estimates for some distribution families. Even if several distribution families are readily fitted to a set of sample data, the user generally lacks a definitive basis for selecting the appropriate “best-fitting” distribution.

The task of building a simulation input model is further complicated if sample data are not available. In this situation, identification of an appropriate distribution family is arbitrarily based on whatever information can be elicited from knowledgeable individuals (experts); and the corresponding distribution parameters are computed from subjective estimates of simple numerical characteristics of the underlying distribution such as the , selected percentiles, or low-order moments. In summary, simulation practitioners lack a clear- cut, definitive procedure for identifying and estimating high-fidelity stochastic input models (or even merely acceptable, “rough-cut” input models); consequently, simulation output analysis is often based on input processes of questionable validity.

In this article, techniques are presented for modeling and then randomly sampling many of the univari- ate probabilistic input processes that drive discrete-event simulation experiments, with the primary focus on methods designed to alleviate the difficulties encountered in using conventional approaches to simula- tion input modeling. Emphasis is given to the generalized beta distribution family (Section 2), the Johnson translation system of distributions (Section 3), and the Bézier distribution family (Section 4). For each dis- tribution family, we describe methods for fitting distributions to sample data or expert opinion and then for randomly sampling the fitted distributions. Much of the discussion concerns public-domain software and uim11.tex 2 July 9, 2009 – 11:35 fitting procedures that facilitate rapid univariate simulation input modeling. To illustrate these procedures, we also discuss applications ranging from pharmaceutical manufacturing and medical decision analysis to smart-materials research and healthcare systems analysis. Finally in Section 5 conclusions and recommen- dations are presented. Some of the material in this article has been presented in Kuhl et al (2008a, b), which was an invited tutorial on simulation input modeling presented at the 2008 Winter Simulation Conference. In a companion paper (Kuhl et al, 2009) we discuss some multivariate distributions and time-dependent arrival processes that frequently arise in probabilistic simulation input modeling; see also Sections 3–4 of Kuhl et al (2006).

2. Generalized beta distribution family

Suppose X is a continuous random variable with lower limit a and upper limit b whose distribution is to be approximated and then randomly sampled in a simulation experiment. In such a situation, it is often possible to model the probabilistic behavior of X using a generalized beta distribution, whose p.d.f. has the form

˛ 1 ˛ 1 .˛1 C ˛2/.x a/ 1 .b x/ 2 fX .x/ D ˛ C˛ 1 for a x b; (1) .˛1/.˛2/.b a/ 1 2 R 1 z1 t where .z/ D 0 t e dt (for z>0) denotes the gamma function. For graphs illustrating the wide range of distributional shapes achievable with generalized beta distributions, see one of the following ref- erences: pp. 92–93 of Hahn and Shapiro (1967); pp. 291–293 of Law (2007); or pp. 11–14 of Kuhl et al (2008b), which is available online. If X has the p.d.f. (1), then the cumulative distribution function (c.d.f.) of X, which is defined by R x FX .x/ D PrfX xgD 1 fX .w/ dw for all real x, unfortunately has no convenient analytical expres- sion; but the mean and of X are respectively given by

˛1b C ˛2a X D EŒX D (2) ˛1 C ˛2 and 2 2 2 .b a/ ˛1˛2 X D E X X D 2 : (3) .˛1 C ˛2/ .˛1 C ˛2 C 1/

Recall that for a continuous p.d.f. fX ./, a mode m is a local maximum of that function; and if there is a unique global maximum for fX ./, then the p.d.f. is said to be unimodal, and m is usually called the “most likely value” of the random variable X.If˛1;˛2 1 and either ˛1 >1or ˛2 >1, then the beta p.d.f. (1) is unimodal; and the mode is given by

.˛1 1/b C .˛2 1/a m D .˛1;˛2 1 and ˛1˛2 >1/: (4) ˛1 C ˛2 2 Equations (2)–(4) reveal that key distributional characteristics of the generalized beta distribution are simple functions of the parameters a, b, ˛1,and˛2; and this facilitates input modeling—especially in pilot studies in which rapid model development is critical. uim11.tex 3 July 9, 2009 – 11:35 2.1. Fitting beta distributions to data or subjective information

Given a random sample fXi W i D 1;:::;ng of size n from the distribution to be estimated, let X.1/ X.2/ ˚ X.n/ denote the order obtained˚ by sorting the fXi g in ascending order so that X.1/ D min Xi W i D 1;:::;n and X.n/ D max Xi W i D 1;:::;n . We can fit a generalized beta distribution to this data set using the following sample statistics: 9 ay D 2X.1/ X.2/ ; by D 2X.n/ X.n1/ ; = P P ; (5) x 1 n 2 1 n x 2 X D n iD1 Xi ;S D n1 iD1 Xi X : In particular the method of moment matching involves (i) setting the right-hand sides of (2)and(3) equal to the sample mean Xx and the sample variance S 2, respectively; and (ii) solving the resulting equations for the corresponding estimates ˛y1 and ˛y2 of the shape parameters. In terms of the auxiliary quantities

Xx ya S d1 D and d2 D ; by ya by ya the moment-matching estimates of ˛y1 and ˛y2 are given by

2 2 d1 .1 d1/ d1.1 d1/ ˛y1 D 2 d1 ; ˛y2 D 2 .1 d1/: (6) d2 d2

AbouRizk et al (1994) discuss BetaFit, a Windows-based software package for fitting the generalized beta distribution to sample data by computing estimators ay, by, ˛y1,and˛y2 using the following estimation methods:

moment matching with ay D X.1/ and by D X.n/;

feasibility-constrained moment matching, so that the feasibility conditions a

maximum likelihood (assuming a and b are known and thus are not estimated); and

ordinary least squares (OLS) and diagonally weighted least squares (DWLS) estimation of the c.d.f.

Figure 1 demonstrates the application of BetaFit to a sample of 9,980 observations of end-to-end chain lengths (in angströms) of the ionic polymer Nafion based on the method of moment matching. In Sec- tion 3.5 below, we provide further details on the origin of the Nafion data set and its relevance to the problem of predicting the stiffness properties of a certain class of smart materials. Like all the soft- ware packages mentioned in this article, BetaFit is in the public domain and is available on the Web via www.ise.ncsu.edu/jwilson/page3 .

For rapid development of preliminary simulation models, practitioners often base an initial input model for the random variable X on subjective estimates ay, my,andby of the minimum, mode, and maximum, re- spectively, of the distribution of X. Although the is often used in such circumstances, uim11.tex 4 July 9, 2009 – 11:35 Figure 1 Beta p.d.f. (left panel) and c.d.f. (right panel) fitted to 9,980 Nafion chain lengths. it can yield excessively heavy tails—and hence grossly unrealistic simulation results—when the distance by ym between the estimates of the upper limit and mode is much larger than the distance my ya between the estimates of the mode and lower limit, or vice versa. The generalized beta distribution is usually a better choice in such situations, but there is some difficulty in selecting the shape parameters to yield the desired value my for the mode. In many project-management and quality-control applications, it is convenient to assume that the stan- dard deviation of the random variable at hand is one-sixth of the corresponding range; and if we equate 2ı the right-hand sides of (3)and(4), respectively, with the subjective estimates by ya 36 and my of the variance and mode of X, then we must solve a cubic equation to obtain the corresponding shape parameters ofthebetap.d.f.(1). In terms of the auxiliary quantity my ya q D ; by ya we see that in the special cases in which q D 0 or q D 1, the required shape parameters are exactly given by ) ˛y D 1 and ˛y D 3:87227 if q =0; 1 2 (7) ˛y1 D 3:87227 and ˛y2 D 1 if q =1: (For a detailed justification of (7), see the Appendix of this article, which contains exact computing formu- las for the shape parameters of a beta distribution with user-specified values of the endpoints, mode, and variance.) For the more common case in which 0

by ym 1 q r D D my ya q so that the required shape parameters are given by

r2 C 3r C 4 4r2 C 3r C 1 ˛y1 D and ˛y2 D I (8) r2 C 1 r2 C 1 uim11.tex 5 July 9, 2009 – 11:35 see pp. 202–203 of Wilson et al (1982) and McBride and McClelland (1967). If 0:02 q 0:98, then the error in the approximation (8) is less than 3%; and if 0:1 q 0:9, then the error in this approximation is less than 1:2%. To handle situations in which the estimated mode my is very close to one of the estimated endpoints ay and by (that is, q < 0:02 or q > 0:98), see the Appendix. In the application of beta distributions to a problem in medical decision making that is detailed in Section 2.4 below, the error in using the approxi- mation (8) was essentially zero (that is, less than 108) on each of 50 different beta distributions used in the associated simulation study.

AbouRizk et al (1991) discuss the Visual Interactive Beta Estimation System (VIBES), a Windows- based software package that enables graphically-oriented fitting of generalized beta distributions to subjec- tive estimates of: (i) the endpoints a and b; and (ii) any of the following combinations of distributional characteristics—

2 the mean X and the variance X ,

the mean X and the mode m ,

2 the mode m and the variance X ,

1 the mode m and an arbitrary quantile xp D FX .p/ for p 2 .0; 1/ ,or

two quantiles xp and xq for p; q 2 .0; 1/ .

As a general-purpose tool for simulation input modeling, the generalized beta distribution family has the following advantages:

It is sufficiently flexible to represent with reasonable accuracy a wide diversity of distributional shapes.

Its parameters are easily estimated from either sample data or subjective information.

On the other hand, generating samples from the beta distribution is relatively slow; and in some applications, the time to generate beta random variables can be a substantial fraction of the overall simulation run time (Wilson et al, 1982).

2.2. Generating beta variates

Although most general-purpose simulation packages provide a generator of beta random variables, in our experience some care is required to verify the performance of a beta variate generator in cases where any shape parameter is less than one or is very large (say, greater than 30). Note that Equations (7)–(8)always yield 1 ˛1;˛2 4 while equations (A1)–(A5) in the Appendix always yield ˛1;˛2 1; and in these situations, we have obtained excellent results using two procedures available in Press et al (2007). To generate a generalized beta random variable X with minimum a,maximumb, and shape parameters ˛1 and ˛2, the first method uses Gammadev of Press et al (2007) to generate Y.˛1;˛2/, a standard beta random uim11.tex 6 July 9, 2009 – 11:35 variable on the unit interval Œ0; 1 with shape parameters ˛1 and ˛2 ; and then the desired random sample is given by

X D a C .b a/Y.˛1;˛2/: (9) In terms of the incomplete

Z x .˛1 C ˛2/ ˛11 ˛21 Ix.˛1;˛2/ D t .1 t/ dt for 0 x 1 (10) .˛1/.˛2/ 0

(which coincides with the c.d.f. FY.˛1;˛2/.x/ D PrfY.˛1;˛2/ xg of a standard beta random variable Y.˛1;˛2/ for 0 x 1), the second method for generating X is based on inversion of the c.d.f. of X,

1 1 1 X D FX .U / D a C .b a/F .U / D a C .b a/IU .˛1;˛2/; (11) Y.˛1;˛2/ where U UniformŒ0; 1 is a random number and we use the procedure invbetai of Press et al (2007) to 1 obtain a highly accurate approximation to Ix .˛1;˛2/ for all x in Œ0; 1. 1 REMARK 1. In the companion paper on multivariate input modeling (Kuhl et al, 2009), Ix .˛1;˛2/ and the associated approximation invbetai of Press et al (2007) are important tools in our approach to building multivariate beta distributions as well as stationary univariate time series whose marginals are generalized beta distributions.

2.3. Application of beta distributions to pharmaceutical manufacturing

Pearlswig (1995) provides a good example of a pharmaceutical manufacturing simulation whose credibility depended critically on the use of appropriate input models. In this study of the estimated production capacity of a plant that had been designed but not yet built, the usual three time estimates ay, my,andby were obtained from the process engineer for each of the operations in manufacturing a certain type of effervescent tablet. Unfortunately very conservative (i.e., large) estimates were provided for the upper limit by of each operation time; and when triangular distributions were used to represent batch-to-batch variation in actual processing times for each operation within each step of production, the resulting bottlenecks resulted in very low estimates of the probability of reaching a prespecified annual production level.

As in many simulation applications in which subjective estimates ay, my,andby are elicited from experts, the estimate my of the modal (most likely) time to perform a given operation was substantially more reliable than the estimates ay and by of the lower and upper limits on the same operation time. When all the triangular distributions in the simulation were replaced by generalized beta distributions using (8) to ensure confor- mance to the engineer’s estimate of the most likely processing time for each operation within each step, the resulting annual tablet production was in excellent agreement with the production of similar plants already in existence. This simple remedy restored the faith of management in the validity of the overall simulation model, which was subsequently used to finalize certain aspects of the design and operation of the new plant.

uim11.tex 7 July 9, 2009 – 11:35 2.4. Application of beta distributions to medical decision analysis

In the following application of simulation input modeling to medical decision analysis, we compare two alternative methods for estimating the parameters of a generalized beta distribution from limited sample data or subjective information about the minimum, mode, and maximum values of the target random variable. The discussion is also intended to illustrate the extent to which simulation-generated outputs may depend on the endpoints of the fitted beta distributions used in the simulation. This example provides insight into the issues surrounding the use of the generalized beta distribution to represent a simulation input that is subject to randomness or uncertainty when that distribution must be fitted to subjective information or some combination of limited sample data and subjective information.

Cost-effectiveness studies are frequently used in medical decision making for comparing various treat- ment or intervention alternatives. The Panel on Cost-Effectiveness in Health and Medicine (Gold et al, 1996) defines cost-effectiveness analysis (CEA) as “. . . a method designed to assess the comparative impacts of expendituresondifferenthealthinterventions...”that“...involvesestimatingthenet,orincremental,costs and effects of an intervention—its costs and health outcomes compared with some alternative.” Decision models for CEA involve a large number of input parameters, each subject to substantial uncertainty. In par- ticular, these studies involve uncertainty and random variability with respect to utility (i.e., effectiveness), probability, and cost estimates for disease states and interventions. There is variability between patients and parameter uncertainty, each reflected in the standard errors associated with simulation-based estimates of mean performance—for example, the expected values of the costs, quality-adjusted life years, and utilities resulting from alternative treatments. Therefore an accurate assessment of cost effectiveness must involve sensitivity analysis and must attempt to model the inherent variability and uncertainty in these parameter estimates. Probabilistic sensitivity analysis is one method for performing a multiway sensitivity analysis in which all parameters subject to uncertainty are varied simultaneously by Monte Carlo sampling from the distributions postulated for those parameters.

Xu et al (2009) develop a decision-tree model for determining the cost effectiveness of cesarean deliv- ery upon maternal request (CDMR) for women having a single childbirth without indications. Their model compares CDMR with trial of labor (TOL) considering all possible short- and long-term outcomes and the resulting consequences for the mother and neonate. This results in a decision tree containing over 100 chance events. For each parameter in their decision model, Xu et al use either literature-based or expert opinion–based estimates for the mode, minimum, and maximum values. Typically there is limited informa- tion available for parameter distribution estimation; moreover, there is significant variability in the parameter values because of substantial uncertainty regarding mode of delivery with respect to utility measures, the of outcomes, and outcome costs. Here we explore two examples from Xu et al in which we fit beta distributions for utility and probability parameter estimates by two different approaches:

Using the approximation based on Equations (7)and(8); and

Using the version of the so-called “Beta PERT” distribution that is implemented in the @RISK software (Palisade Corporation, 2009), which is usually termed the RiskPert distribution and is detailed in uim11.tex 8 July 9, 2009 – 11:35 Equations (12)and(13)below.

To illustrate each approach, we discuss in some detail how we formulated probabilistic input models of the following quantities:

(i) P.Vag/, the probability of a vaginal delivery given that the decision maker pursues a “trial of labor”; and

(ii) U.SpVag/, the utility associated with a spontaneous vaginal delivery given that the decision maker pursues a “trial of labor.”

A trial of labor is a decision to attempt a vaginal delivery; this will result in a spontaneous vaginal delivery, an instrumental vaginal delivery, or an emergency cesarean section. For the probability of a vaginal delivery P.Vag/, the most likely value of 0.90 was obtained from the published literature. Not only was 0.9 the most frequently cited value, it was also judged to be the highest-quality estimate in terms of sample size and its applicability to populations cited in the literature. The values 0.844 and 0.97 were taken to be the lower and upper bounds on P.Vag/, respectively, because they corresponded to the smallest and largest estimates found in the literature. The associated estimates of the utility U.SpVag/ resulting from a spontaneous vaginal delivery were obtained similarly; and the mode, minimum, and maximum values found in the literature were 0.92, 0.69, and 1.0, respectively.

While the “minimum” and “maximum” values were the smallest and largest values found in the available literature, we recognized that the true lower bound might be less than the estimated “minimum” and the true upper bound might be greater than the estimated “maximum” in many cases. In contrast to Xu et al, who assume that the minimum and maximum values from the literature correspond to the 0:025 and 0:975 percentiles, we explored the effect of assuming that the “true” lower and upper bounds could be obtained by taking an appropriate offset from the original estimated “minimum” and “maximum” values, where the offset is expressed as a fraction of the original estimate of the range,

0 0 a D a .b a/ and b D b C .b a/ for >0:

Based on the original estimate of the mode m as well as the new estimates a0 and b0 of the “true” minimum and maximum values, respectively, for each distribution used in the probabilistic sensitivity analysis, we fitted a beta distribution using the approximation for the associated shape parameters given by Equations (7) and (8). In addition, we fitted the RiskPert version of the beta distribution by assuming that the mean and variance of the random variable X satisfy the following equations, 0 0 0 0 a C 4m C b 2 b X a D and D (12) X 6 X 7 so that the corresponding shape parameters are given by ! ! a0 b0 ˛1 D 6 and ˛2 D 6 D 6 ˛1 : (13) b0 a0 b0 a0 uim11.tex 9 July 9, 2009 – 11:35 (Note that whereas Equations (2)and(3)arealways true for a beta random variable X, Equations (12)and (13) are only satisfied when X has a RiskPert distribution, which is special type of beta distribution.)

The value for was varied from 0 to 0.1. Varying resulted in small changes to the shape parameters for the beta distributions fitted by each method. However, we found that the value of had an effect on the cost-effectiveness decision; and the effect varied depending on the type of distribution used for all the probabilities and utilities in the decision tree. For 2 Œ0; 0:02/ , there was a significant difference in the effectiveness of CDMR and TOL (i.e., the 95% confidence interval for the mean difference in the utility between CDMR and TOL did not include zero) when using beta distributions fitted by each method. For 2 Œ0:02; 0:07 , there was a significant difference in the effectiveness of CDMR and TOL only when using beta distributions fitted via Equations (7)and(8). And for >0:07, the difference in effectiveness of CDMR and TOL was not significant for either method of fitting beta distributions.

The difference in the effect of as a function of the distributional assumptions can be explained by the shapes of the beta distributions fitted by each method. The p.d.f.’s of the fitted beta distributions for P.Vag/ and U.SpVag/ are shown in Figure 2, subfigures (a)–(f), for the cases in which D 0; 0:05,and0:1. For all the other beta distributions used in this application, similar behavior was seen in the superimposed plots of the beta p.d.f. fitted via Equations (7)and(8) versus the beta p.d.f. fitted via Equations (12)and (13). While each fitted distribution has the desired mode in each case, the RiskPert distribution based on (12)and(13) has fatter tails than those of the p.d.f. based on (7)and(8); moreover, we see that for the RiskPert distribution, the variance clearly depends on the mean. As indicated above, the assumptions about the variance that underlie Equations (7)and(8) differ substantially from the assumptions about the mean and variance that underlie the RiskPert distribution; and these differences lead to different conclusions about the cost-effectiveness of CDMR compared with TOL when 2 Œ0:02; 0:07 .

REMARK 2. Several general conclusions emerged from the foregoing applications to pharmaceutical man- ufacturing and medical decision analysis. When input modeling is based on estimates of the minimum, most likely, and maximum values of a target random variable, there is often substantial uncertainty in the estimates of the extreme values; and in such situations the fitted distribution should generally have most of its probability concentrated in the vicinity of the estimated mode, which is much more accurate than the other two estimates. The generalized beta distribution is usually a good choice for rapid input modeling in these situations; and often acceptable results can be obtained using either Equations (7)and(8) or Equations (12)and(13). In our view the primary disadvantage of Equations (12)and(13) is that the variance of the fitted distribution is a function of its mean. In general the analysis of a simulation-generated response is complicated by dependence of the variance of the response on its mean; and numerous variance-stabilizing transformations have been proposed to avoid such undesirable behavior (Irizarry et al 2003). In some types of applications, it may be necessary to study systematically the sensitivity of the simulation-generated re- sults to changes in the assumed values of the mode and variance of each input random variable; and in this case the development given in the Appendix can be used to investigate the impact of independently varying the postulated values of the mode and variance of the fitted beta distribution.

uim11.tex 10 July 9, 2009 – 11:35

2.5 2.5

2 2

1.5 1.5

1 1

0.5 0.5

0 0 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1 (a) P.Vag/, D 0:0 (b) P.Vag/, D 0:05

2.5 2.5

2 2

1.5 1.5

1 1

0.5 0.5

0 0 0.82 0.84 0.86 0.88 0.9 0.92 0.94 0.96 0.98 1 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 (c) P.Vag/, D 0:10 (d) U.SpVag/, D 0:0

2.5 2.5

2 2

1.5 1.5

1 1

0.5 0.5

0 0 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 (e) U.SpVag/, D 0:05 (f) U.SpVag/, D 0:10

Figure 2 Beta distributions fitted to P.Vag/, the probability of vaginal delivery (subfigures (a)–(c)) and to U.SpVag/, the utility of spontaneous vaginal delivery (subfigures (d)–(f)), where the solid red line is the fit using Equations (7)and(8) and the dashed blue line is the RiskPert fit using (12)and(13) uim11.tex 11 July 9, 2009 – 11:35 3. Johnson translation system of distributions

Starting from a continuous random variable X whose distribution is unknown and is to be approximated and subsequently sampled, Johnson (1949) proposes the idea of inferring an appropriate distribution by identifying a suitable “translation” (or transformation) of X to a standard normal random variable Z with mean 0 and variance 1 so that Z N.0; 1/. The translations have the form X Z D C ı g ; (14) where and ı are shape parameters, is a scale parameter, is a location parameter, and g./ is a function whose form defines the four distribution families in the Johnson translation system, 8 ˆ ln.y/; for SL (lognormal) family, ˆ p <ˆ 2 ln y C y C 1 ; for SU (unbounded) family, g.y/ D ˆ ˆ lnŒy=.1 y/; for SB (bounded) family, :ˆ y; for SN normal family. DeBrota et al (1989a) detail the advantages of the Johnson translation system of distributions for simulation input modeling, especially in comparison with the triangular, beta, and families.

3.1. Johnson distribution and density functions

If (14) is an exact normalizing translation of X to a standard normal random variable, then the c.d.f. of X is given by h i x FX .x/ D ˆ C ı g for all x 2 H, R 1=2 z 1 2 where: (i) ˆ.z/ D .2/ 1 exp 2 w dw denotes the c.d.f. of the N.0; 1/ distribution; and (ii) the space H of X is 8 ˆ ˆ Œ; C1/; for SL (lognormal) family, <ˆ H .1; C1/; for SU (unbounded) family, D ˆ ˆ Œ; C ,forSB (bounded) family, :ˆ .1; C1/; for SN normal family. The p.d.f. of X is given by ( ) 2 ı 0 x 1 x fX .x/ D g exp C ı g .2/1=2 2 H for all x 2 ,where 8 ˆ ˆ 1=y; for SL (lognormal) family, ˆ ıp < 2 1 y C 1; for SU (unbounded) family, g0.y/ D ˆ ˆ 1=Œy.1 y/; for SB (bounded) family, :ˆ 1; for SN normal family. For graphs illustrating the diversity of distributional shapes that can be achieved with the Johnson system of univariate distributions, see DeBrota et al (1989a) or pages 34–37 of Kuhl et al (2008b). uim11.tex 12 July 9, 2009 – 11:35 3.2. Fitting Johnson distributions to sample data

The process of fitting a Johnson distribution to sample data involves first selecting an estimation method and the desired translation function g./ and then obtaining estimates of the four parameters , ı, ,and. The Johnson translation system of distributions has the flexibility to match (i) any feasible combination of 2 values for the mean X , variance X , skewness ı 3 3 SkX D E X X X ; and kurtosis ı 4 4 KuX D E X X X I 2 or (ii) sample estimates of the moments X , X ,SkX ,andKuX . Moreover, in principle the skewness SkX and kurtosis KuX uniquely identify the appropriate translation function g./. Although there are no closed- form expressions for the parameter estimates based on the method of moment matching, these quantities can be accurately approximated using the iterative procedure of Hill et al (1976). Other estimation methods may also be used to fit Johnson distributions to sample data—for example, in the FITTR1 software package (Swain et al, 1988), the following methods are available:

OLS and DWLS estimation of the c.d.f.;

minimum L1 and L1 norm estimation of the c.d.f.;

moment matching; and

percentile matching.

3.3. Fitting SB distributions to subjective information

DeBrota et al (1989b) discuss VISIFIT, a public-domain software package for fitting Johnson SB distribu- tions to subjective information, possibly combined with sample data. The user must provide estimates of the endpoints a and b together with any two of the following characteristics:

the mode m;

the mean X ;

the x0:5;

arbitrary quantile(s) xp or xq for p; q 2 .0; 1/;

the width of the central 95% of the distribution; or

the standard deviation X .

uim11.tex 13 July 9, 2009 – 11:35 3.4. Generating Johnson variates by inversion

After a Johnson distribution has been fitted to a data set, generating samples from the fitted distribution is straightforward. First, a standard normal variate Z N.0; 1/ is generated. Then the corresponding realization of the Johnson random variable X is found by applying to Z the inverse translation 1 Z X D C g ; (15) ı where for all real z we define the inverse translation function 8 z ˆ e ; for SL (lognormal) family, ˆ ı <ˆ z z 1 e e 2; for SU (unbounded) family, g .z/ D ı (16) ˆ z ˆ 1 1 C e ; for SB (bounded) family, :ˆ z; for SN (normal) family.

REMARK 3. Although most popular general-purpose simulation packages provide an acceptable generator of standard normal random variables, we are particularly interested in generating Z by the method of inver- sion, Z D ˆ1.U /,whereU UniformŒ0; 1 is a random number and we use the approximation to ˆ1./ that is available via Normaldist of Press et al (2007). Also recommended is the approximation to ˆ1./ given in Section 26.2.22 of Abramowitz and Stegun (1972). As documented in the companion paper on multivariate input modeling (Kuhl et al, 2009), an accurate approximation to ˆ1./ will be a key element in our approach to building multivariate extensions of the Johnson translation system of distributions as well as stationary univariate time series whose marginals are Johnson distributions.

3.5. Application of Johnson distributions to smart materials research

Matthews et al (2006), Weiland et al (2005), and Gao and Weiland (2008) present a multiscale modeling approach for the prediction of material stiffness of a certain class of smart materials called ionic polymers. The material stiffness depends on multiple parameters, including the effective length of the polymer chains composing the material. In a case study of Nafion, a specific type of ionic polymer, Matthews et al (2006) develop a simulation model of the conformation of Nafion polymer chains on a nanoscopic level, from which a large number of end-to-end chain lengths are generated. The p.d.f. of end-to-end distances is then estimated and used as an input to a macroscopic-level mathematical model to quantify material stiffness.

Figure 3 shows the empirical distribution of 9,980 simulation-generated observations of end-to-end Nafion chain lengths (in angströms). Superimposed on the empirical distribution is the result of using the DWLS estimation method to fit an unbounded Johnson (SU ) distribution to the chain length data. Figure 3 reveals a remarkably accurate fit to the given data set. Furthermore, comparing the Johnson fit in Figure 3 with the beta fits for the same data set in Figure 1, we see that the Johnson distribution is able to capture certain key aspects of the Nafion data set that the beta distribution is unable to represent adequately.

Gao and Weiland (2008), Matthews et al (2006), and Weiland et al (2005) conclude that the estimates of the distribution of chain lengths obtained by fitting an appropriate Johnson distribution to the data are more uim11.tex 14 July 9, 2009 – 11:35

1 0.1

0.9 0.09

0.8 0.08

0.7 0.07

0.6 0.06

0.5 0.05

0.4 0.04

0.3 0.03

0.2 0.02

0.1 0.01

0 0 −10 0 10 20 30 40 50 60 70 80 90 −10 0 10 20 30 40 50 60 70 80 90

Figure 3 Johnson SU c.d.f. (left panel) and p.d.f. (right panel) fitted to 9,980 Nafion chain lengths. intuitive than those using other density estimation techniques for the following reasons. First, it is possible to write down an explicit functional form for the Johnson p.d.f. fX .x/ that is simple to differentiate. This is a 00 crucial property because the second derivative fX .x/ of the p.d.f. will be used as an input to a mathematical model to estimate material stiffness. Second, there is a relatively simple relationship between the Johnson parameters and the material stiffness. Weiland et al (2005) summarize the results of a sensitivity analysis for the Johnson parameters and the corresponding effect on material stiffness. In general, Weiland et al find that increasing the location parameter leads to an increase in predicted stiffness. Similarly, increasing the shape parameter ı or decreasing the scale parameter both lead to marginally higher predicted material stiffness. Establishing a consistent relationship between these parameters and stiffness would first serve to extend the current theory to stiffness predictions, and may ultimately also serve as a step toward the custom design of materials with specific stiffness properties.

3.6. Application of Johnson distributions to healthcare systems analysis

In a recent study of the arrival patterns of patients who have scheduled appointments at a community health- care clinic, Alexopoulos et al (2008) find that patient tardiness (i.e., the patient’s deviation from the sched- uled appointment time) is most accurately modeled using an SU distribution. Specifically they consider data on patient tardiness collected by the Partnership of Immunization Providers, a collaborative public-private project created by the University of California, San Diego School of Medicine, Division of Community Pedi- atrics, in association with community clinics and small, private provider practices. Alexopoulos et al (2008) perform an exhaustive analysis of 18 continuous distributions, and they conclude that the SU distribution provides superior fits to the available data.

uim11.tex 15 July 9, 2009 – 11:35 4. Bézier distribution family

4.1. Definition of Bézier curves

In computer graphics, a Bézier curve is often used to approximate a smooth (continuously differentiable) ˚function on a bounded interval by forcing the Bézier curve to pass in the vicinity of selected control points pi .xi ;zi /T W i D 0; 1; : : : ; n in two-dimensional Euclidean space. (Throughout this article, all vectors will be column vectors unless otherwise stated; and the roman superscript T will denote the transpose of a vector or matrix.) Formally, a Bézier curve of degree n with control points fp0; p1;:::;png is given parametrically by Xn P.t/ D Bn;i .t/ pi for t 2 Œ0; 1; (17) iD0 where the blending function Bn;i .t/ (for all t 2 Œ0; 1) is the Bernstein polynomial

nŠ i ni Bn;i .t/ t .1 t/ for i D 0; 1; : : : ; n: (18) iŠ.n i/Š

4.2. Bézier distribution and density functions

If X is a continuous random variable whose space is the bounded interval Œa; b and if X has c.d.f. FX ./, and p.d.f. fX ./, then in principle we can approximate FX ./ arbitrarily closely using a Bézier curve of the form (17) by taking a sufficient number .n C 1/ of control points with appropriate values for the coordinates .xi ;zi /T of the ith control point pi for i D 0;:::;n.IfX is a Bézier random variable, then the c.d.f. of X is given parametrically by ˚ T P.t/ D x.t/;FX Œx.t/ for t 2 Œ0; 1; (19) where 9 Xn > > x.t/ D Bn;i .t/xi ; => iD0 Xn > (20) > FX Œx.t/ D Bn;i .t/zi : ;> iD0

Equation (20) reveals that the control points p0; p1;:::;pn constitute the parameters regulating all the prop- erties of a Bézier distribution. Thus the control points must be arranged so as to ensure the basic require- ments of a c.d.f.: (i) FX .x/ is monotonically nondecreasing in the cutoff value x; (ii) FX .a/ D 0;and (iii) FX .b/ D 1. By utilizing the Bézier property that the curve described by (19)–(20) passes through the T control points p0 and pn exactly, we can ensure that FX .a/ D 0 if we take p0 .a; 0/ ; and we can ensure T that FX .b/ D 1 if we take pn .b; 1/ . See Wagner and Wilson (1996a) for a complete discussion of univariate Bézier distributions and their use in simulation input modeling.

uim11.tex 16 July 9, 2009 – 11:35 If X is a Bézier random variable with c.d.f. FX ./ given parametrically by (19), then it follows that the corresponding p.d.f. fX .x/ for all real x is given parametrically by ˚ T P .t/ D x.t/;fX Œx.t/ for t 2 Œ0; 1; where x.t/ isgivenby(20)and

Pn1 iD0 Bn1;i .t/zi fX Œx.t/ D Pn1 : iD0 Bn1;i .t/xi

In the last equation, xi xiC1 xi and zi ziC1 zi (i D 0; 1; : : : ; n1) represent the corresponding first differences of the x-andz-coordinates of the original control points fp0; p1;:::;png in the parametric representation (19)ofthec.d.f.

4.3. Generating Bézier variates by inversion

The method of inversion can be used to generate a Bézier random variable whose c.d.f. has the parametric representation displayed in equations (19)–(20). Given a random number U UniformŒ0; 1, we perform the following steps: (i) find tU 2 Œ0; 1 such that Xn Bn;i .tU /zi D U I (21) iD0 and (ii) deliver the variate Xn X D Bn;i .tU /xi : (22) iD0 The solution to (21) can be computed by any root-finding algorithm such as Müller’s method, Newton’s method, or the bisection method. Codes to implement this approach to generating Bézier variates are avail- able on Web site www.ise.ncsu.edu/jwilson/page3.

REMARK 4. As documented in the companion paper on multivariate input modeling (Kuhl et al, 2009), the inversion scheme specified in Equations (21)and(22) for generating Bézier random variables will be a key element in our approach to building multivariate extensions of the univariate Bézier distributions as well as stationary univariate time series whose marginals are Bézier distributions.

4.4. Using PRIME to model Bézier distributions

PRIME is a graphical, interactive software system that incorporates the methodology detailed in this section to help an analyst estimate the univariate input processes arising in simulation studies. PRIME is written entirely in the C programming language, and it has been developed to run under Microsoft Windows. A public-domain version of the software is available on the previously mentioned Web site. PRIME is designed to be easy and intuitive to use. The construction of a c.d.f. is performed through the actions of the mouse, and several options are conveniently available through menu selections. Control points are represented as uim11.tex 17 July 9, 2009 – 11:35 Figure 4 PRIME windows showing the Bézier c.d.f. (left panel) with its control points and the p.d.f. (right panel).

small black squares, and each control point is given a unique label corresponding to its index i in equation (17). Figure 4 shows a typical session in PRIME, where the c.d.f. and p.d.f. windows are both displayed.

In the absence of data, PRIME can be used to model an input process conceptualized from subjective in- formation or expertise. The representation of the conceptualized distribution is achieved by adding, deleting, and moving the control points via the mouse. Each control point acts like a “magnet” that pulls the curve in the direction of the control point, where the blending functions (i.e., the Bernstein polynomials defined by equation (18)) govern the strength of the “magnetic” attraction exerted on the curve by each control point. Pressing and dragging (i.e., moving) a control point causes the displayed c.d.f. to be updated (nearly) instan- taneously. If they are displayed, the corresponding p.d.f., the first four moments (that is, the mean, variance, skewness, and kurtosis), and selected percentile values of the Bézier distribution are updated (nearly) simul- taneously in adjacent windows so that the user gets immediate feedback on the effects of moving selected control points. Thus, the user has a variety of readily available indicators and measures, as well as visually appealing displays, to aid in the construction of the conceptualized distribution.

As detailed in Wagner and Wilson (1996a, b), PRIME includes several standard estimation procedures for fitting distributions to sample data sets:

OLS estimation of the c.d.f.;

minimum L1 and L1 norm estimation of the c.d.f.;

maximum likelihood estimation (assuming a and b are known);

moment matching; and uim11.tex 18 July 9, 2009 – 11:35 percentile matching.

Figure 5 shows a Bézier distribution that was fitted to the same data set consisting of Nafion polymer chain lengths as shown in Figure 3. In this application of PRIME, we obtained the fitted Bézier distribution automatically, where: (i) the number of control points (n+1) was determined by the likelihood ratio test detailed in Wagner and Wilson (1996b); and (ii) the components of the control points were estimated by the method of ordinary least squares. Figure 5 shows that a Bézier distribution yielded an excellent fit to the given data set.

Figure 5 Bézier distribution fitted to 9,980 Nafion chain lengths.

The Bézier distribution family, which is entirely specified by its control points fp0; p1;:::;png,hasthe following advantages:

It is extremely flexible and can represent a wide diversity of distributional shapes. For instance, Figure 4 depicts a that is easily constructed using PRIME, yet impossible to achieve with other distribution families.

If data are available, then the likelihood ratio test of Wagner and Wilson (1996a) can be used in conjunction with any of the estimation methods enumerated above to find automatically both the number and location of the control points. uim11.tex 19 July 9, 2009 – 11:35 In the absence of data, PRIME can be used to determine the conceptualized distribution based on known quantitative or qualitative information that the user perceives to be pertinent.

As the number .nC1/ of control points increases, so does the flexibility in fitting Bézier distributions. The interpretation and complexity of the control points, however, does not change with the number of control points.

5. Conclusions and recommendations

The common thread running through this article is the focus on robust input models that are computationally tractable and sufficiently flexible to represent adequately many of the probabilistic phenomena that arise in many applications of discrete-event stochastic simulation. For another approach to input modeling with no data, see Craney and White (2004).

Notably missing from this article is a discussion of Bayesian techniques for simulation input modeling, a topic that we think will receive increasing attention from practitioners and researchers alike in the future. In selecting the input models for a simulation, we must account for three main sources of uncertainty:

1. Stochastic uncertainty arises from dependence of the simulation output on the random numbers gen- erated and used on each run—for example, the random number U used in generate a generalized beta random variable X via Equation (11).

2. Model uncertainty arises when the correct input model is unknown, and we must choose between alternative input models with different functional forms that adequately fit available sample data or subjective information—for example, the generalized beta, Johnson SU , and Bézier distributions fitted to the Nafion data set as depicted in Figures 1, 3,and5, respectively.

3. Parameter uncertainty arises when the parameters of the selected input model(s) are unknown and must be estimated from sample data or subjective information.

Although stochastic uncertainty is much more widely recognized by simulation practitioners than the other two types of uncertainty, it is not always a major source of variation in simulation output as demon- strated by Zouaoui and Wilson (2004) using an M=G=1 queueing system simulation in which stochastic uncertainty accounts for only 2% of the posterior variance of the average waiting time in the queue, while model uncertainty regarding the exact functional form of the service-time distribution accounts for 18% of the posterior variance—and thus 80% of the posterior variance is due to uncertainty regarding the exact numerical values of the arrival rate and the parameters of the service-time distribution. In such a situation, conventional approaches to input modeling have the potential to yield a grossly misleading picture of the inherent accuracy of simulation-generated system performance measures such as the average queue waiting time. For an introduction to Bayesian input modeling, see Chick (1999, 2001) and Zouaoui and Wilson (2003, 2004). uim11.tex 20 July 9, 2009 – 11:35 Another topic not discussed in this paper is the use of heavy-tailed distributions in simulation input modeling. If the random variable X has a heavy-tailed distribution, then

˛ 1 FX .x/ D PrfX>xgcx as x !1; (23) where c>0is a location parameter, ˛ is a shape parameter with ˛ 2 .1; 2/,and means that the ratio of the left- and right-hand sides of (23) tends to 1 as x !1. Heavy-tailed distributions frequently arise in simulations of computer and communications systems (Crovella and Lipsky, 1997; Greiner et al, 1999; Heyde and Kou, 2004). Fishman and Adan (2005) discuss some situations in which the lognormal distri- bution (a member of the Johnson translation system) can provide a reasonable substitute for a heavy-tailed distribution.

Additional material on techniques for simulation input modeling will be posted to the Web site www.ise .ncsu.edu/jwilson/more_info .

Acknowledgments

Partial support for some of the research described in this paper was provided by National Science Foundation Grant DMI-9900164.

References

AbouRizk, S. M., D. W. Halpin, and J. R. Wilson. 1991. Visual interactive fitting of beta distributions. Journal of Construction Engineering and Management 117 (4): 589–605. AbouRizk, S. M., D. W. Halpin, and J. R. Wilson. 1994. Fitting beta distributions based on sample data. Journal of Construction Engineering and Management 120 (2): 288–305. Abramowitz, M., and I. A. Stegun. 1972. Handbook of mathematical functions with formulas, graphs, and mathematical tables, New York: Dover Publications Inc. Alexopoulos, C., D. Goldsman, J. Fontanesi, D. Kopald, and J. R. Wilson. 2008. Modeling patient arrival times in community clinics. Omega 36:33–43. Chick, S. E. 1999. Steps to implement Bayesian input distribution selection. In Proceedings of the 1999 Winter Simulation Conference, ed. P. A. Farrington, H. B. Nembhard, D. T. Sturrock, and G. W. Evans, 317–324. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. Available online as www.informs-sim.org/wsc99papers/044.PDF [accessed March 28, 2009]. Chick, S. E. 2001. Input distribution selection for simulation experiments: Accounting for input uncertainty. Operations Research 49 (5): 744–758. Craney, T. A., and N. White. 2004. Distribution selection with no data using VBA and Excel. Quality Engineering 16 (4): 643–656.

uim11.tex 21 July 9, 2009 – 11:35 Crovella, M. E., and L. Lipsky. 1997. Long-lasting transient conditions in simulations with heavy-tailed workloads. In Proceedings of the 1997 Winter Simulation Conference, ed. S. Andradóttir, K. J. Healy, D. H. Withers, and B. L. Nelson, 1005–1012. Piscataway, New Jersey: Institute of Electrical and Electron- ics Engineers. Available online as www.informs-sim.org/wsc97papers/1005.PDF [accessed July 8, 2009]. DeBrota, D. J., R. S. Dittus, S. D. Roberts, J. R. Wilson, J. J. Swain, and S. Venkatraman. 1989a. Modeling input processes with Johnson distributions. In Proceedings of the 1989 Winter Simulation Conference, ed. E. A. MacNair, K. J. Musselman, and P. Heidelberger, 308–318. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. Available online via www.ise.ncsu.edu/jwilson/files/ debrota89wsc.pdf [accessed March 28, 2009]. DeBrota, D. J., R. S. Dittus, S. D. Roberts, and J. R. Wilson. 1989b. Visual interactive fitting of bounded Johnson distributions. Simulation 52 (5): 199–205. Dickson, L. E. New first course in the theory of equations. New York: Wiley. Fishman, G. S., and I. J. B. Adan. 2005. How heavy-tailed distributions affect simulation-generated time averages. ACM Transactions on Modeling and Computer Simulation 16 (2): 152–173. Gao, F., and L. M. Weiland. 2008. A multiscale model applied to ionic polymer stiffness prediction. Journal of Materials Research 23 (3): 833–841. Gold, M. R., J. E. Siegel, L. B. Russell, and M. C. Weinstein. 1996. Cost-effectiveness in health and medicine. New York: Oxford University Press. Greiner, M., M. Jobmann, and L. Lipsky. 1999. The importance of power-tail distributions for modeling queueing systems. Operations Research 47 (2): 313–326. Hahn, G. J., and S. S. Shapiro. 1967. Statistical models in engineering. New York: Wiley. Heyde, C. C., and S. G. Kou. 2004. On the controversy over tailweight distributions. Operations Research Letters 32:399–408. Hill, I. D, R. Hill, and R. L. Holder. 1976. Algorithm AS99: Fitting Johnson curves by moments. Applied Statistics 25 (2): 180–189. Irizarry, M. de los A., M. E. Kuhl, E. K. Lada, S. Subramanian, and J. R. Wilson. 2003. Analyzing transformation-based simulation metamodels. IIE Transactions 35 (3): 271–283. Johnson, N. L. 1949. Systems of frequency curves generated by methods of translation. Biometrika 36:149– 176. Kuhl, M. E., E. K. Lada, N. M. Steiger, M. A. Wagner, and J. R. Wilson. 2008a. Introduction to mod- eling and generating probabilistic input processes for simulation. In Proceedings of the 2008 Winter Simulation Conference, ed. S. J. Mason, R. R. Hill, L. Mönch, O. Rose, T. Jefferson, and J. W. Fowler, 48–61. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. Available online as www.informs-sim.org/wsc08papers/008.pdf [accessed March 28, 2009]. Kuhl, M. E., E. K. Lada, N. M. Steiger, M. A. Wagner, and J. R. Wilson. 2008b. Introduction to modeling uim11.tex 22 July 9, 2009 – 11:35 and generating probabilistic input processes for simulation. Slides accompanying the oral presenta- tion of Kuhl et al. (2008a). Available online as www.ise.ncsu.edu/jwilson/files/wsc08imt.pdf [accessed March 28, 2009]. Kuhl, M. E., J. S. Ivy, E. K. Lada, N. M. Steiger, M. A. Wagner, and J. R. Wilson. 2009. Multivariate and time-dependent input processes for simulation. Journal of Simulation in preparation. Kuhl, M. E., E. K. Lada, N. M. Steiger, M. A. Wagner, and J. R. Wilson. 2006. Introduction to modeling and generating probabilistic input processes for simulation. In Proceedings of the 2006 Winter Simulation Conference, ed. L. F. Perrone, F. P. Wieland, J. Liu, B. G. Lawson, D. M. Nicol, and R. M. Fujimoto, 19–35. Piscataway, New Jersey: Institute of Electrical and Electronics Engineers. Available online via www.informs-sim.org/wsc06papers/003.pdf [accessed March 28, 2009]. Law, A. M. 20070. Simulation modeling and analysis. 4th ed. New York: McGraw-Hill. Matthews, J. L., E. K. Lada, L. M. Weiland, R. C. Smith, and D. J. Leo. 2006. Monte Carlo simulation of a solvated ionic polymer with cluster morphology. Smart Materials and Structures 15 (1): 187–199. McBride, W. J., and C. W. McClelland. 1967. PERT and the beta distribution. IEEE Transactions on Engineering Management EM-14 (4): 166-169. Palisade Corp. 2009. Getting started in @RISK. Ithaca, New York: Palisade Corp. Available online via www.palisade.com/risk/5/tips/EN/gs/ [accessed July 5, 2009]. Pearlswig, D. M. 1995. Simulation modeling applied to the single pot processing of effervescent tablets. Master’s thesis, Integrated Manufacturing Systems Engineering Institute, North Carolina State Univer- sity, Raleigh, North Carolina. Available online as www.ise.ncsu.edu/jwilson/files/pearlswig95 .pdf [accessed March 28, 2009]. Press, W. H., S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. 2007. Numerical recipes: The art of scientific computing. 3rd ed. Cambridge: Cambridge University Press. Swain, J. J., S. Venkatraman, and J. R. Wilson. 1988. Least-squares estimation of distribution functions in Johnson’s translation system. Journal of Statistical Computation and Simulation 29:271–297. Wagner, M. A. F., and J. R. Wilson. 1996a. Using univariate Bézier distributions to model simulation input processes. IIE Transactions 28 (9): 699–711. Wagner, M. A. F., and J. R. Wilson. 1996b. Recent developments in input modeling with Bézier distribu- tions. In Proceedings of the 1996 Winter Simulation Conference, ed. J. M. Charnes, D. J. Morrice, D. T. Brunner, and J. J. Swain, 1448–1456. Piscataway, New Jersey: Institute of Electrical and Electron- ics Engineers. Available online as www.ise.ncsu.edu/jwilson/files/wagner96wsc.pdf [accessed March 28, 2009]. Weiland, L. M., E. K. Lada, R. C. Smith, and D. J. Leo. 2005. Application of rotational isomeric state theory to ionic polymer stiffness predictions. Journal of Materials Research 20 (9): 2443–2455. Wilson, J. R., D. K. Vaughan, E. Naylor, and R. G. Voss. 1982. Analysis of Space Shuttle ground operations. Simulation 38 (6): 187–203.

uim11.tex 23 July 9, 2009 – 11:35 Xu, Xiao, J. S. Ivy, D. A. Patel, S. N. Patel, D. G. Smith, S. B. Ransom, D. Fenner, and J. O. L. Delancey. 2009. Lifelong pelvic floor consequences of cesarean delivery on maternal request for primigravid women with a single birth: A decision analysis. Journal of Women’s Health to appear. Zouaoui, F., and J. R. Wilson. 2003. Accounting for parameter uncertainty in simulation input modeling. IIE Transactions 35 (3): 781–792. Zouaoui, F., and J. R. Wilson. 2004. Accounting for input-model and input-parameter uncertainties in simulation. IIE Transactions 36 (11): 1135–1151.

Appendix: Exact computation of shape parameters for beta distribution fitted to user-specified mode and variance

To simplify the notation in this appendix, we let a, m,andb denote the user-specified minimum, mode, and maximum of the target distribution with a12(so that the desired beta distribution has a smaller variance than that of the uniform distribution on the interval Œa; b), then for any value of m 2 Œa; b, there is a unique generalized beta distribution on

Œa; b with a unique mode at m. (If ! D 12, then it can be shown that we must have ˛1 D ˛2 D 1 so that the beta distribution with the given mode and variance coincides with the uniform distribution on Œa; b.Since the mode is assumed to be unique, this uninteresting case is eliminated from further consideration.) If we set the right-hand side of (4) equal to m and the right-hand side of (3) equal to .b a/2=!, then we obtain the following equivalent system of equations in terms of the asymmetry ratio r D .b m/=.m a/, provided m>aso that r<1: ) ˛3 C B˛2 C C˛ C D D 0; 1 1 1 (A1) ˛2 D r˛1 C 1 r; where 9 3 2 3r 2r C .5 !/r C 4 > B D ; > .1 C r/3 > > 3 2 = C 3r 5r C .! 3/r C .5 !/ D 3 ; > (A2) .1 C r/ > > r3 C 4r2 5r C 2 > D D : ;> .1 C r/3 REMARK 5. In the case that m D a so that r D1, we solve the “mirror image” problem for which m D b and r D 0; and then we interchange the resulting shape parameters to obtain a generalized beta distribution whose mode coincides with its minimum. See also Remark 6 below.

It can be proved that if !>12, then for all r 2 Œ0; 1/ the cubic equation in ˛1 defined by (A1)–(A2) uim11.tex 24 July 9, 2009 – 11:35 has a nonnegative discriminant

18BCD 4B3D C B2C2 4C3 27D2 so that the cubic equation has three real roots fj W j D 1; 2; 3g such that: ) 1 >1; (A3) 2;3 <1:

As possible values of ˛1, the roots 2 and 3 are unacceptable for the following reasons:

(i) The assignment ˛1 2 .0; 1/ yields a generalized beta distribution with an asymptote at its lower limit a, which seems intuitively problematic and is clearly unacceptable when the user-specified mode m exceeds the lower limit.

(ii) The assignment ˛1 0 does not define a legitimate generalized beta distribution.

We are therefore left with the unique assignment ˛1 D 1; and a computing formula for ˛1 can be derived from the explicit solution to a cubic equation as follows (see Sections 33–38 of Dickson, 1939). In terms of the auxiliary quantities 9 1 2 = P D C 3 B ; (A4) 1 2 3 ; Q D D 3 BC C 27 B ; we have 8 ( " #) 3=2 ˆ 1=2 3 <ˆ 4 P 1 1 1 Q 1 B; >0; 3 cos 3 cos 2 P 3 if ˛1 D 1 D (A5) ˆ :ˆ B; if D 0:

Finally we take ˛2 D r˛1 C 1 r to complete the specification of the generalized beta distribution.

REMARK 6. In general to avoid numerical difficulties that can occur with large values of r (that is, when r 1), we recommend the following approach to the use of Equations (A1)–(A5). If .b m/=.m a/ > 1, then we solve the “mirror image” problem for which r D .m a/=.b m/ < 1; and finally we interchange the resulting shape parameters to obtain a generalized beta distribution with the user-specified mode m.

uim11.tex 25 July 9, 2009 – 11:35