An Imprecise SHAP As a Tool for Explaining the Class Probability Distributions Under Limited Training Data

Total Page:16

File Type:pdf, Size:1020Kb

An Imprecise SHAP As a Tool for Explaining the Class Probability Distributions Under Limited Training Data An Imprecise SHAP as a Tool for Explaining the Class Probability Distributions under Limited Training Data Lev V. Utkin, Andrei V. Konstantinov, and Kirill A. Vishniakov Peter the Great St.Petersburg Polytechnic University St.Petersburg, Russia e-mail: [email protected], [email protected], [email protected] Abstract One of the most popular methods of the machine learning prediction explanation is the SHapley Additive exPlanations method (SHAP). An imprecise SHAP as a modification of the original SHAP is proposed for cases when the class probability distributions are imprecise and represented by sets of distributions. The first idea behind the imprecise SHAP is a new approach for computing the marginal contribution of a feature, which fulfils the important efficiency property of Shapley values. The second idea is an attempt to consider a general approach to calculating and reducing interval-valued Shapley values, which is similar to the idea of reachable probability intervals in the imprecise probability theory. A simple special implementation of the general approach in the form of linear optimization problems is proposed, which is based on using the Kolmogorov-Smirnov distance and imprecise contamination models. Numerical examples with synthetic and real data illustrate the imprecise SHAP. Keywords: interpretable model, XAI, Shapley values, SHAP, Kolmogorov-Smirnov distance, imprecise probability theory. arXiv:2106.09111v1 [cs.LG] 16 Jun 2021 1 Introduction An importance of the machine learning models in many applications and their success in solv- ing many applied problems lead to another problem which may be an obstacle for using the models in areas where their prediction accuracy as well as understanding is crucial, for ex- ample, in medicine, reliability analysis, security, etc. This obstacle takes place for complex models which can be viewed as black boxes because their users do not know how the models act and are functioning. Moreover, the training process is also often unknown. A natural way for overcoming the obstacle is to use a meta-model which could explain the provided predic- tions. The explanation means that we have to select features of an analyzed example which are responsible for the prediction of the obtained black-box model or significantly impact on the corresponding prediction. By considering the explanation of a single example, we say about the 1 so-called local explanation methods. They aim to explain the black-box model locally around the considered example. Another explanation methods try to explain predictions taking into account the whole dataset or its certain part. The need of explaining the black-box models us- ing the local or global explanations in many applications motivated developing a huge number of explanation methods which are described in detail in several comprehensive survey papers [10, 29, 41, 47, 49, 68, 79, 82, 85]. Among all explanation methods, we select two very popular methods: the Local Inter- pretable Model-Agnostic Explanation (LIME) [60] and SHapley Additive exPlanations (SHAP) [44, 72]. The basic idea behind LIME is to build an approximating linear model around the explained example. In order to implement the approximation, many synthetic examples are generated in the neighborhood of the explained example with weights depending on distances from the explained example. The linear regression model is constructed by using the generated examples such that its coefficients can be regarded as quantitative representation of impacts of the corresponding features on the prediction. SHAP is inspired by game-theoretic Shapley values [71] which can be interpreted as av- erage expected marginal contributions over all possible subsets (coalitions) of features to the black-box model prediction. In spite of two important shortcomings of SHAP, including its computational complexity depending on the number of features and some ambiguity of avail- able approaches for removing features from consideration [17], SHAP is widely used in practice and can be viewed as the most promising and theoretically justified explanation method which fulfils several nice properties [44]. Nevertheless, another difficulty of using SHAP is how to deal with predictions in the form of probability distributions which arise in multi-class classifica- tion and in machine learning survival analysis. The problem is that SHAP in each iteration calculates a difference between two predictions defined by different subsets of features in the explained example. Therefore, the question is how to define the difference between the prob- ability distributions. One of the simplest ways in classification is to consider a class with the largest probability. However, this way may lead to incorrect results when the probabilities are comparable. Moreover, the same approach cannot be applied to explanation of the survival model predictions which are in the form of survival functions. It should be noted that Covert et al. [18] justified that the well-known Kullback-Leibler (KL) divergence [40] can be applied to the global explanation. It is obvious that the KL divergence can be applied also to the local explanation. Moreover, the KL divergence can be replaced with different distances, for example, χ2-divergence [56], the relative J-divergence [23], Csiszar's f-divergence measure [19]. However, the use of all these measures leads to such modifications of SHAP that obtained Shapley values do not satisfy properties of original Shapley values. Therefore, one of the problems for solving is to define a meaningful distance between two predictions represented in the form of the class probability distributions, which fulfils the Shapley value properties. It should be noted that the probability distribution of classes may be imprecise due to a limited number of training data. On the one hand, it can be said that the class probabilities in many cases are not real probabilities as measures defined on events, i.e., they are not relative frequencies of occurrence of events. For example, the softmax function in neural networks pro- duce numbers (weights of classes), which have some properties of probabilities, but not original probabilities. On the other hand, probabilities of classes in random forests and in random sur- 2 vival forests can be viewed as relative frequencies (probabilities are computed by counting the percentage of different classes of examples at each leaf node), and they are true probabilities. In both the cases, it is obvious that accuracy of the class probability distributions depends on the machine learning model predicting the distributions and on the amount of training data. It is difficult to impact on improvement of the post-hoc machine learning model representing as a black box, but we can take into account the imprecision of probability distributions due to the lack of sufficient training data. The considered imprecision can be referred to the epistemic uncertainty [69] which represents our ignorance about a model caused by the lack of observation data. Epistemic uncertainty can be resolved by observing more data. There are several approaches to formalize the imprecision. One of them is to consider a set of probability distributions instead of the single one. Sets of distributions are produced by imprecise statistical models, for example, by the linear-vacuous mixture or imprecise "-contamination model [76], by the imprecise Dirichlet model [77], by the constant odds-ratio model [76], by Kolmogorov{Smirnov bounds [34]. It is obvious that the above imprecise statistical models produce interval-valued probabilities of events or classes, and the obtained interval can be regarded as a measure of observation data insufficiency. It turns out that the imprecision of the class probability distribution leads to imprecision of Shapley values when SHAP is used for explaining the machine learning model predictions. In other words, Shapley values become interval-valued. This allows us to construct a new framework of the imprecise SHAP or imprecise Shapley values, which provides a real accounting of the prediction uncertainty. Having the interval-valued Shapley values, we can compare intervals in order to select the most important features. It is interesting to point out that properties of the efficiency and the linearity of Shapley values play a crucial role in constructing basic rules of the framework of imprecise Shapley values. In summary, the following contributions are made in this paper: 1. A new approach for computing the marginal contribution of the i-th feature in SHAP for interpretation of the class probability distributions as predictions of a black-box machine learning model is proposed. It is based on considering a distance between two probability dis- tributions. Moreover, Shapley values using the proposed approach fulfils the efficiency property which is very important in correct explanation by means of SHAP. 2. An imprecise SHAP is proposed for cases when the class probability distributions are im- precise, i.e., they are represented by convex sets of distributions. The outcome of the imprecise SHAP is a set of interval-valued Shapley values. Basic tools for dealing with imprecise class probability distributions and for computing interval-valued Shapley values are introduced. 3. Implementation of the imprecise SHAP by using the Kolmogorov-Smirnov distance is proposed. Complex optimization problems for computing interval-valued Shapley values are reduced to finite sets of simple linear programming problems. 4. The imprecise SHAP is illustrated
Recommended publications
  • Imprecise Probability in Risk Analysis
    IMPRECISE PROBABILITY IN RISK ANALYSIS D. Dubois IRIT-CNRS, Université Paul Sabatier 31062 TOULOUSE FRANCE Outline 1. Variability vs incomplete information 2. Blending set-valued and probabilistic representations : uncertainty theories 3. Possibility theory in the landscape 4. A methodology for risk-informed decision- making 5. Some applications Origins of uncertainty • The variability of observed repeatable natural phenomena : « randomness ». – Coins, dice…: what about the outcome of the next throw? • The lack of information: incompleteness – because of information is often lacking, knowledge about issues of interest is generally not perfect. • Conflicting testimonies or reports:inconsistency – The more sources, the more likely the inconsistency Example • Variability: daily quantity of rain in Toulouse – May change every day – It can be estimated through statistical observed data. – Beliefs or prediction based on this data • Incomplete information : Birth date of Brazil President – It is not a variable: it is a constant! – You can get the correct info somewhere, but it is not available. – Most people may have a rough idea (an interval), a few know precisely, some have no idea: information is subjective. – Statistics on birth dates of other presidents do not help much. • Inconsistent information : several sources of information conflict concerning the birth date (a book, a friend, a website). The roles of probability Probability theory is generally used for representing two aspects: 1. Variability: capturing (beliefs induced by) variability through repeated observations. 2. Incompleteness (info gaps): directly modeling beliefs via betting behavior observation. These two situations are not mutually exclusive. Using a single probability distribution to represent incomplete information is not entirely satisfactory: The betting behavior setting of Bayesian subjective probability enforces a representation of partial ignorance based on single probability distributions.
    [Show full text]
  • Imprecise Probability for Multiparty Session Types in Process Algebra APREPRINT
    IMPRECISE PROBABILITY FOR MULTIPARTY SESSION TYPES IN PROCESS ALGEBRA A PREPRINT Bogdan Aman Gabriel Ciobanu Romanian Academy, Institute of Computer Science Romanian Academy, Institute of Computer Science Alexandru Ioan Cuza University, Ia¸si, Romania Alexandru Ioan Cuza University, Ia¸si, Romania [email protected] [email protected] February 18, 2020 ABSTRACT In this paper we introduce imprecise probability for session types. More exactly, we use a proba- bilistic process calculus in which both nondeterministic external choice and probabilistic internal choice are considered. We propose the probabilistic multiparty session types able to codify the structure of the communications by using some imprecise probabilities given in terms of lower and upper probabilities. We prove that this new probabilistic typing system is sound, as well as several other results dealing with both classical and probabilistic properties. The approach is illustrated by a simple example inspired by survey polls. Keywords Imprecise Probability · Nondeterministic and Probabilistic Choices · Multiparty Session Types. 1 Introduction Aiming to represent the available knowledge more accurately, imprecise probability generalizes probability theory to allow for partial probability specifications applicable when a unique probability distribution is difficult to be identified. The idea that probability judgments may be imprecise is not novel. It would be enough to mention the Dempster- Shafer theory for reasoning with uncertainty by using upper and lower probabilities [1]. The idea appeared even earlier in 1921, when J.M. Keynes used explicit intervals for approaching probabilities [2] (B. Russell called this book “undoubtedly the most important work on probability that has appeared for a very long time”).
    [Show full text]
  • Demystifying Dilation
    Forthcoming in Erkenntnis. Penultimate version. Demystifying Dilation Arthur Paul Pedersen Gregory Wheeler Abstract Dilation occurs when an interval probability estimate of some event E is properly included in the interval probability estimate of E conditional on every event F of some partition, which means that one’s initial estimate of E becomes less precise no matter how an experiment turns out. Critics main- tain that dilation is a pathological feature of imprecise probability models, while others have thought the problem is with Bayesian updating. How- ever, two points are often overlooked: (i) knowing that E is stochastically independent of F (for all F in a partition of the underlying state space) is sufficient to avoid dilation, but (ii) stochastic independence is not the only independence concept at play within imprecise probability models. In this paper we give a simple characterization of dilation formulated in terms of deviation from stochastic independence, propose a measure of dilation, and distinguish between proper and improper dilation. Through this we revisit the most sensational examples of dilation, which play up independence be- tween dilator and dilatee, and find the sensationalism undermined by either fallacious reasoning with imprecise probabilities or improperly constructed imprecise probability models. 1 Good Grief! Unlike free advice, which can be a real bore to endure, accepting free information when it is available seems like a Good idea. In fact, it is: I. J. Good (1967) showed that under certain assumptions it pays you, in expectation, to acquire new infor- mation when it is free. This Good result reveals why it is rational, in the sense of maximizing expected utility, to use all freely available evidence when estimating a probability.
    [Show full text]
  • Propagation of Imprecise Probabilities Through
    PROPAGATION OF IMPRECISE PROBABILITIES THROUGH BLACK-BOX MODELS A Thesis Presented to The Academic Faculty by Morgan Bruns In Partial Fulfillment of the Requirements for the Degree Master of Science in the School of Mechanical Engineering Georgia Institute of Technology May 2006 PROPAGATION OF IMPRECISE PROBABILITIES THROUGH BLACK-BOX MODELS Approved by: Dr. Christiaan J.J. Paredis, Advisor School of Mechanical Engineering Georgia Institute of Technology Dr. Bert Bras School of Mechanical Engineering Georgia Institute of Technology Dr. Leon F. McGinnis School of Industrial and Systems Engineering Georgia Institute of Technology Date Approved: April 7, 2006 ACKNOWLEDGEMENTS This thesis would not have been possible without the assistance of many of my friends and colleagues. I would like to thank my advisor, Dr. Chris Paredis, for his support, guidance, and insight. I would also like to thank the students of the Systems Realization Laboratory for the intellectually nurturing environment they have provided. In particular, I would like to thank Jason Aughenbaugh, Scott Duncan, Jay Ling, Rich Malak, and Steve Rekuc for their criticism and advice regarding my research. iii TABLE OF CONTENTS ACKNOWLEDGEMENTS iii LIST OF TABLES vi LIST OF FIGURES vii LIST OF ABBREVIATIONS viii LIST OF SYMBOLS ix SUMMARY xi CHAPTER 1 : INTRODUCTION 1 1.1 Design decision making 1 1.2 Imprecision in design 2 1.3 Challenges for decision making formalisms 4 1.4 Research questions, hypotheses, and thesis outline 5 CHAPTER 2 : DECISION MAKING UNDER IMPRECISE UNCERTAINTY
    [Show full text]
  • Demystifying Dilation
    CORE Metadata, citation and similar papers at core.ac.uk Provided by PhilSci Archive Forthcoming in Erkenntnis. Penultimate version. Demystifying Dilation Arthur Paul Pedersen Gregory Wheeler Abstract Dilation occurs when an interval probability estimate of some event E is properly included in the interval probability estimate of E conditional on every event F of some partition, which means that one’s initial estimate of E becomes less precise no matter how an experiment turns out. Critics main- tain that dilation is a pathological feature of imprecise probability models, while others have thought the problem is with Bayesian updating. How- ever, two points are often overlooked: (i) knowing that E is stochastically independent of F (for all F in a partition of the underlying state space) is sufficient to avoid dilation, but (ii) stochastic independence is not the only independence concept at play within imprecise probability models. In this paper we give a simple characterization of dilation formulated in terms of deviation from stochastic independence, propose a measure of dilation, and distinguish between proper and improper dilation. Through this we revisit the most sensational examples of dilation, which play up independence be- tween dilator and dilatee, and find the sensationalism undermined by either fallacious reasoning with imprecise probabilities or improperly constructed imprecise probability models. 1 Good Grief! Unlike free advice, which can be a real bore to endure, accepting free information when it is available seems like a Good idea. In fact, it is: I. J. Good (1967) showed that under certain assumptions it pays you, in expectation, to acquire new infor- mation when it is free.
    [Show full text]
  • For Peer Review 19 20 2009)
    Cambridge Journal of Economics De Finetti on Uncertainty Journal: Cambridge Journal of Economics For PeerManuscript ID:Review Draft Manuscript Type: Article CJE Keywords: risk, uncertainty, de Finetti, Keynes, Knight <a href=http://www.aeaweb.org/journal/elclasjn_hold.html B21, D81 target=new><b>JEL Classification</b></a>: http://mc.manuscriptcentral.com/cje Page 1 of 29 Cambridge Journal of Economics 1 2 3 4 DE FINETTI ON UNCERTAINTY 5 6 7 8 9 10 11 Abstract 12 13 The well-known Knightian distinction between quantifiable risk and unquantifiable uncertainty is at 14 odds with the dominant subjectivist conception of probability associated with de Finetti, Ramsey 15 and Savage. Risk and uncertainty are rendered indistinguishable on the subjectivist approach insofar 16 as an individual’s subjective estimate of the probability of any event can be elicited from the odds at 17 which she would be prepared to bet for or against that event. The risk/uncertainty distinction has 18 however never quite goneFor away andPeer is currently undeReviewr renewed theoretical scrutiny. The purpose of 19 20 this paper is to show that de Finetti’s understanding of the distinction is more nuanced than is 21 usually admitted. Relying on usually overlooked excerpts of de Finetti’s works commenting on 22 Keynes, Knight, and interval valued probabilities, we argue that de Finetti suggested a relevant 23 theoretical case for uncertainty to hold even when individuals are endowed with subjective 24 probabilities. Indeed, de Finetti admitted that the distinction between risk and uncertainty is relevant 25 when different people sensibly disagree about the probability of the occurrence of an event.
    [Show full text]
  • Design Optimization with Imprecise Random Variables
    2009-01-0201 Design Optimization with Imprecise Random Variables Jeffrey W. Herrmann University of Maryland, College Park Copyright © 2009 SAE International ABSTRACT using precise probability distributions or representing uncertain quantities as fuzzy sets. Haldar and Design optimization is an important engineering design Mahadevan [7] give a general introduction to reliability- activity. Performing design optimization in the presence based design optimization, and many different solution of uncertainty has been an active area of research. The techniques have been developed [8, 9, 10]. Other approaches used require modeling the random variables approaches include evidence-based design optimization using precise probability distributions or representing [11], possibility-based design optimization [12], and uncertain quantities as fuzzy sets. This work, however, approaches that combine possibilities and probabilities considers problems in which the random variables are [13]. Zhou and Mourelatos [11] discussed an evidence described with imprecise probability distributions, which theory-based design optimization (EBDO) problem. are highly relevant when there is limited information They used a hybrid approach that first solves a RBDO to about the distribution of a random variable. In particular, get close to the optimal solution and then generates this paper formulates the imprecise probability design response surfaces for the active constraints and uses a optimization problem and presents an approach for derivative-free optimizer to find a solution. solving it. We present examples for illustrating the approach. The amount of information available and the outlook of the decision-maker (design engineer) determines the INTRODUCTION appropriateness of different models of uncertainty. No single model should be considered universally valid. In Design optimization is an important engineering design this paper, we consider situations in which there is activity in automotive, aerospace, and other insufficient information about the random variables to development processes.
    [Show full text]
  • Blurring out Cosmic Puzzles
    Blurring Out Cosmic Puzzles Yann Ben´etreau-Dupin∗† September 12, 2018 Abstract The Doomsday argument and anthropic reasoning are two puz- zling examples of probabilistic confirmation. In both cases, a lack of knowledge apparently yields surprising conclusions. Since they are for- mulated within a Bayesian framework, they constitute a challenge to Bayesianism. Several attempts, some successful, have been made to avoid these conclusions, but some versions of these arguments cannot be dissolved within the framework of orthodox Bayesianism. I show that adopting an imprecise framework of probabilistic reasoning allows for a more adequate representation of ignorance in Bayesian reasoning and explains away these puzzles. 1 Introduction The Doomsday paradox and the appeal to anthropic bounds to solve the cosmological constant problem are two examples of puzzles of probabilistic confirmation. These arguments both make ‘cosmic’ predictions: the former gives us a probable end date for humanity, and the second a probable value of the vacuum energy density of the universe. They both seem to allow one to draw unwarranted conclusions from a lack of knowledge, and yet one way of formulating them makes them a straightforward application of Bayesianism. They call for a framework of inductive logic that allows one to represent arXiv:1412.4382v1 [physics.data-an] 14 Dec 2014 ∗Department of Philosophy & Rotman Institute of Philosophy, Western University, Canada, [email protected] †Forthcoming in Philosophy of Science (PSA 2014). This paper stems from discussions at the UCSC-Rutgers Institute for the Philosophy of Cosmology in the summer of 2013. I am grateful to Chris Smeenk and Wayne Myrvold for discussions and comments on earlier drafts.
    [Show full text]
  • On De Finetti's Instrumentalist Philosophy of Probability
    Forthcoming in the European Journal for Philosophy of Science On de Finetti’s instrumentalist philosophy of probability Joseph Berkovitz1 Abstract. De Finetti is one of the founding fathers of the subjective school of probability. He held that probabilities are subjective, coherent degrees of expectation, and he argued that none of the objective interpretations of probability make sense. While his theory has been influential in science and philosophy, it has encountered various objections. I argue that these objections overlook central aspects of de Finetti’s philosophy of probability and are largely unfounded. I propose a new interpretation of de Finetti’s theory that highlights these aspects and explains how they are an integral part of de Finetti’s instrumentalist philosophy of probability. I conclude by drawing an analogy between misconceptions about de Finetti’s philosophy of probability and common misconceptions about instrumentalism. 1. Introduction De Finetti is one of the founding fathers of the modern subjective school of probability. In this school of thought, probabilities are commonly conceived as coherent degrees of belief, and expectations are defined in terms of such probabilities. By contrast, de Finetti took expectation as the fundamental concept of his theory of probability and subjective probability as derivable from expectation. In his theory, probabilities are coherent degrees of expectation.2 De Finetti represents the radical wing of the subjective school of probability, denying the existence of objective probabilities. He held that probabilities and probabilistic reasoning are subjective and that none of the objective interpretations of probability make sense. Accordingly, he rejected the idea that subjective probabilities are guesses, predictions, or hypotheses about objective probabilities.
    [Show full text]
  • Reasoning with Imprecise Belief Structures
    Reasoning with imprecise belief structures Thierry Denœux U.M.R. CNRS 6599 Heudiasyc Universit´ede Technologie de Compi`egne BP 20529 - F-60205 Compi`egne cedex - France email: Thierry.Denoeux@@hds.utc.fr Abstract This paper extends the theory of belief functions by introducing new concepts and techniques, allowing to model the situation in which the be- liefs held by a rational agent may only be expressed (or are only known) with some imprecision. Central to our approach is the concept of interval- valued belief structure, defined as a set of belief structures verifying certain constraints. Starting from this definition, many other concepts of Evidence Theory (including belief and plausibility functions, pignistic probabilities, combination rules and uncertainty measures) are generalized to cope with imprecision in the belief numbers attached to each hypothesis. An appli- cation of this new framework to the classification of patterns with partially known feature values is demonstrated. Keywords: evidence theory, Dempster-Shafer theory, belief functions, Transferable Belief model, uncertainty modeling, imprecision, pattern clas- sification. 1 Introduction The representation of uncertainty is a major issue in many areas of Science and Engineering. Although Probability Theory has been regarded as a reference framework for about two centuries, it has become increasingly apparent that the concept of probability is not general enough to account for all kinds of uncertainty. More specifically, the requirement that precise numbers be assigned to every individual hypotheses is often regarded as too restrictive, particularly when the available information is poor. The use of completely monotone capacities, also called credibility or belief functions, as a general tool for representing someone's degrees of belief was proposed by Shafer [20] in the seventies (belief functions had already been introduced by Dempster ten years earlier as lower probabilities generated by a multivalued mapping [1]).
    [Show full text]
  • Mean and Variance Bounds and Propagation for Ill-Specified Random Variables Andrew T
    494 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 34, NO. 4, JULY 2004 Mean and Variance Bounds and Propagation for Ill-Specified Random Variables Andrew T. Langewisch and F. Fred Choobineh, Member, IEEE Abstract—Foundations, models, and algorithms are provided Furthermore, we show that nonexhaustive sensitivity analysis for identifying optimal mean and variance bounds of an ill-spec- does not always identify the optimal bounds for the mean and ified random variable. A random variable is ill-specified when at variance. least one of its possible realizations and/or its respective proba- bility mass is not restricted to a point but rather belongs to a set Ill-specificity is a consequence of imprecision or ambi- or an interval. We show that a nonexhaustive sensitivity-analysis guity concerning random variables, and in situations where approach does not always identify the optimal bounds. Also, a pro- arithmetic functions of several ill-specified variables are of cedure for determining the mean and variance bounds of an arith- interest, researchers have examined various techniques which metic function of ill-specified random variables is presented. Esti- model the resultant distributional uncertainty and imprecision. mates of pairwise correlation among the random variables can be incorporated into the function. The procedure is illustrated in the These techniques include robust Bayesian methods (e.g., [1]), context of a case study in which exposure to contaminants through Monte Carlo simulation with best judgment to choose input the inhalation pathway is modeled. distributions (e.g., [2], [3]), Monte Carlo simulation with the Index Terms—Fuzzy numbers, ill-specified random variables, input distributions selected by the maximum entropy criterion mean and variance bounds, risk analysis, uncertainty.
    [Show full text]
  • Probability Bounds Analysis Applied to the Sandia Verification and Validation Challenge Problem
    ASME Journal of Verification, Validation and Uncertainty Quantification , 2016 Probability Bounds Analysis Applied to the Sandia Verification and Validation Challenge Problem Aniruddha Choudhary1 National Energy Technology Laboratory 3610 Collins Ferry Road, Morgantown, WV 26505 [email protected] ASME Member Ian T. Voyles Aerospace and Ocean Engineering Department, Virginia Tech 460 Old Turner Street, Blacksburg, VA 24061 [email protected] ASME Member Christopher J. Roy Aerospace and Ocean Engineering Department, Virginia Tech 460 Old Turner Street, Blacksburg, VA 24061 [email protected] ASME Member William L. Oberkampf W. L. Oberkampf Consulting 5112 Hidden Springs Trail, Georgetown, TX 78633 [email protected] ASME Member Mayuresh Patil Aerospace and Ocean Engineering Department, Virginia Tech 460 Old Turner Street, Blacksburg, VA 24061 [email protected] 1 Corresponding author 1 ASME Journal of Verification, Validation and Uncertainty Quantification Abstract Our approach to the Sandia Verification and Validation Challenge Problem is to use probability bounds analysis based on probabilistic representation for aleatory uncertainties and interval representation for (most) epistemic uncertainties. The nondeterministic model predictions thus take the form of p-boxes, or bounding cumulative distribution functions (CDFs) that contain all possible families of CDFs that could exist within the uncertainty bounds. The scarcity of experimental data provides little support for treatment of all uncertain inputs as purely aleatory uncertainties and also precludes significant calibration of the models. We instead seek to estimate the model form uncertainty at conditions where experimental data are available, then extrapolate this uncertainty to conditions were no data exist. The Modified Area Validation Metric (MAVM) is employed to estimate the model form uncertainty which is important because the model involves significant simplifications (both geometric and physical nature) of the true system.
    [Show full text]