Mean and Variance Bounds and Propagation for Ill-Specified Random Variables Andrew T
Total Page:16
File Type:pdf, Size:1020Kb
494 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 34, NO. 4, JULY 2004 Mean and Variance Bounds and Propagation for Ill-Specified Random Variables Andrew T. Langewisch and F. Fred Choobineh, Member, IEEE Abstract—Foundations, models, and algorithms are provided Furthermore, we show that nonexhaustive sensitivity analysis for identifying optimal mean and variance bounds of an ill-spec- does not always identify the optimal bounds for the mean and ified random variable. A random variable is ill-specified when at variance. least one of its possible realizations and/or its respective proba- bility mass is not restricted to a point but rather belongs to a set Ill-specificity is a consequence of imprecision or ambi- or an interval. We show that a nonexhaustive sensitivity-analysis guity concerning random variables, and in situations where approach does not always identify the optimal bounds. Also, a pro- arithmetic functions of several ill-specified variables are of cedure for determining the mean and variance bounds of an arith- interest, researchers have examined various techniques which metic function of ill-specified random variables is presented. Esti- model the resultant distributional uncertainty and imprecision. mates of pairwise correlation among the random variables can be incorporated into the function. The procedure is illustrated in the These techniques include robust Bayesian methods (e.g., [1]), context of a case study in which exposure to contaminants through Monte Carlo simulation with best judgment to choose input the inhalation pathway is modeled. distributions (e.g., [2], [3]), Monte Carlo simulation with the Index Terms—Fuzzy numbers, ill-specified random variables, input distributions selected by the maximum entropy criterion mean and variance bounds, risk analysis, uncertainty. (e.g., [4] and [5]), second-order Monte Carlo simulation (e.g., [6] and [7]), probability-bounds analysis (PBA; e.g., [8] and [9]), fuzzy arithmetic (e.g., [10] and [11]), and interval analysis I. INTRODUCTION (e.g., [12] and [13]). N projective modeling, the analyst must often consider the In this paper, an alternative approach, here labeled in- I behavior of random variables that are ill-specified in values terval-based mean and variance propagation analysis (IMVPA), and/or probabilities. By ill-specified, we mean the outcomes that is presented and illustrated. This approach provides a tool form the support for a random variable are not exactly known, for identifying optimal bounds for the mean and variance of but may be represented by intervals or sets, or that the prob- ill-specified variables, and then propagating those interval abilities associated with these outcomes are not exactly speci- bounds through arithmetic functions. Furthermore, IMVPA can fied, but can be constrained to intervals, or both. Furthermore, utilize estimates of correlation between pairs of ill-specified the sets or intervals may be overlapping. For such random vari- random variables. The approach is illustrated in the context ables, the mean and variance, two prime statistics used in anal- of a case study in which exposure to airborne emissions via ysis and estimation, cannot be precisely determined. We present the inhalation pathway is estimated for residents living near a procedures for optimally bounding the mean and variance of an food-processing facility [14]. In this example, upper and lower ill-specified random variable and demonstrate the use of these bounds on both the mean and standard deviation (or variance) procedures in risk analysis, where the risk is a function of one of the risk function are found to be significantly tighter than or more ill-specified random variables. those yielded by PBA. The use of IMVPA in risk analysis The impetus for considering ill-specified random variables can help the analyst avoid overestimating the health risks due is that, in most practical situations, no information about the to airborne emissions. Other applications should be readily random variable’s probability distribution is available. The ana- apparent in a variety of fields where quantitative modeling lyst, lacking adequate information to faithfully model a random of ill-specified data is useful (e.g., determining the mean and variable, often will assume additional knowledge, and hope to variance of electric power generation system production costs; safeguard this assumption with sensitivity analysis. Our pro- see [15] and [16]). posed definition of ill-specified random variables requires lim- ited amounts of information that often will be available, and, in Section 3, we will show that our definition also encompasses II. RELATED APPROACHES many of the established procedures for modeling ambiguity. Exact analytic methods for the propagation of uncertainty are typically unwieldy and often intractable, and require well-speci- Manuscript received July 25, 2002; revised January 3, 2004. This paper was fied probability distributions [17]. Focusing on approximations, recommended by Associate Editor S. Patek. the method of moments is applied widely to the analysis of com- Andrew T. Langewisch is with the Department of Business Admin- istration at Concordia University, Seward, NE 68434 USA (e-mail: plex models [18]. In this approach, a response variable is ana- [email protected]). lyzed as a function of precisely-described input distributions; F. Fred Choobineh is with the Department of Industrial and Management Sys- in contrast, we operate on and propagate uncertainty through a tems Engineering, University of Nebraska, Lincoln, NE 68588 USA (e-mail: [email protected]). series of two precisely-described and/or ill-specified variables Digital Object Identifier 10.1109/TSMCA.2004.826316 at a time, aggregating results. 1083-4427/04$20.00 © 2004 IEEE LANGEWISCH AND CHOOBINEH: MEAN AND VARIANCE BOUNDS AND PROPAGATION FOR ILL-SPECIFIED RANDOM VARIABLES 495 Fig. 1. PBA computes bounds on the CDF of a lognormal distribution having a mean somewhere in the interval (a, b) and standard deviation somewhere in (c, d). Rowe [19] outlined fundamental bounds on means and vari- for propagating the effects of imprecision through calculations. ances of transformations of random variables for which mo- In parametric cases, marginal distribution forms are known but ments and/or order statistics are given. Smith [20] elaborated the distributions’ parameters may only be specified by intervals. bounds on distribution functions with given moments and shape For example, suppose that evidence implies that a distribution constraints. His work generalizes that of Chebychev. is lognormal in form, with its and known only within in- Walley [21] has developed a mathematical theory of impreci- terval ranges. Fig. 1 illustrates probability bounds for the case sion for use in probabilistic reasoning, statistical inferences, and and . The leftmost bound, for example, decision making. In the elicitation procedure, an individual’s is comprised of two pieces: the CDF of lognormal for imprecise, indeterminate, and incomplete statements about , and the CDF of lognormal otherwise. Clearly, his/her beliefs and judgments are modeled (in a manner that the resulting bounds are not lognormal distributions. In non- is fundamentally different from our definition of ill-specified parametric cases, the marginal distribution forms are not known random variables) and presented as closed convex half-spaces, but some information such as the minimum, maximum, mode, and the intersection of these half-spaces identifies a class of and/or percentile values are known. RAMAS Risk Calc [26] is a linear previsions and a set of extreme points, one use of which commercially available software package for performing PBA is to identify upper and lower probabilities and upper and lower that uses modifications of Williamson and Downs’ [8] proce- variances. dure for constructing probability bounds. Saxena [22] suggests that if given upper and lower bounds on probabilities, one can maximize entropy to obtain a proba- III. MEAN AND VARIANCE BOUNDS bility distribution. Saxena has developed a simple algorithm to facilitate determining the maximum, and he shows how the tech- We call a real-valued variable an ill-specified random vari- nique may be applied to investment analysis. Kmietowicz and able when we do not know the precise probability measure on Pearman [23] suggest that although a decision maker may not be , but we have enough information to constrain the possible able to specify precise probability distributions, if he or she can realizations to a finite number of points, sets or bounded in- specify a ranking of the probabilities, i.e., , tervals, i.e., , and we can constrain the probability then maximum and minimum expected payoffs and variances mass assignments on to points or bounded intervals; i.e., can be derived and used to guide decision making. At least one of the sets or intervals Given uncertainty about a prior distribution and also about must be nondegenerate in order to have an ill-specified random the family of sampling distributions, Lavine [24] developed a variable. Furthermore, ’s may overlap. Thus, method for computing upper and lower bounds on posterior ex- . If there is a singleton set , then there is pectations over reasonable classes of sampling distributions that also a lower bound . Since in this ill-specified lie “close to” a given parametric family. space we cannot determine the precise values of the distribu- For ill-specified random variables, Langewisch