Belief functions: A gentle introduction

Professor Fabio Cuzzolin
School of Engineering, Computing and Mathematics, Oxford Brookes University, Oxford, UK
Seoul National University, Seoul, Korea, 30/05/18

Outline

1 Uncertainty: Second-order uncertainty; Classical probability
2 Beyond probability: Set-valued observations; Propositional evidence; Scarce data; Representing ignorance; Rare events; Uncertain data
3 Belief theory: A theory of evidence; Belief functions; Semantics; Dempster's rule; Multivariate analysis; Misunderstandings
4 Reasoning with belief functions: Statistical inference; Combination; Conditioning; Belief vs Bayesian reasoning; Generalised Bayes Theorem; The total belief theorem; Decision making
5 Theories of uncertainty: Imprecise probability; Monotone capacities; Probability intervals; Fuzzy and possibility theory; Probability boxes; Rough sets
6 Belief functions on reals: Continuous belief functions; Random sets
7 Conclusions

Orders of uncertainty

- the difference between predictable and unpredictable variation is one of the fundamental issues in the philosophy of probability
- second-order uncertainty: being uncertain about our very model of uncertainty
- it has consequences for human behaviour: people are averse to unpredictable variation (as in Ellsberg's paradox)
- how good are Kolmogorov's measure-theoretic probability, or the Bayesian and frequentist approaches, at modelling second-order uncertainty?

Probability measures

- the mainstream mathematical theory of (first-order) uncertainty is mathematical (measure-theoretic) probability, mainly due to the Russian mathematician Andrey Kolmogorov
- probability is an application of measure theory, the theory of assigning numbers to sets
- an additive probability measure is the mathematical representation of the notion of chance: it assigns a probability value to every subset of a collection of possible outcomes (of a random experiment, of a decision problem, etc.)
- the collection of outcomes Ω is called the sample space or universe; a subset A of the universe is called an event
- a probability measure µ is a real-valued function on a probability space that satisfies countable additivity
- a probability space is a triplet (Ω, F, µ) formed by a universe Ω, a σ-algebra F of its subsets, and a probability measure µ on F
  - not all subsets of Ω necessarily belong to F
- axioms of probability measures (a small numerical sanity check follows this slide):
  - µ(∅) = 0, µ(Ω) = 1
  - 0 ≤ µ(A) ≤ 1 for all events A ∈ F
  - additivity: for every countable collection of pairwise disjoint events A_i,
    $$\mu\Big(\bigcup_i A_i\Big) = \sum_i \mu(A_i)$$
- probabilities have different interpretations: we consider frequentist and Bayesian (subjective) probability
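To make the axioms concrete, here is a minimal Python sketch (not from the slides): a probability measure on a three-element universe, taking F to be the full power set so that the axioms can be checked exhaustively. The universe, point masses and event names are all illustrative.

```python
from itertools import chain, combinations

# Illustrative finite universe Omega and assumed point masses defining mu.
omega = {"a", "b", "c"}
p = {"a": 0.5, "b": 0.3, "c": 0.2}

def mu(event):
    """Probability measure: the sum of point masses over the event."""
    return sum(p[x] for x in event)

def powerset(s):
    """All subsets of s; here the power set plays the role of F."""
    s = list(s)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Axioms: mu(empty set) = 0, mu(Omega) = 1, and 0 <= mu(A) <= 1 for A in F.
assert mu(set()) == 0
assert abs(mu(omega) - 1) < 1e-12
assert all(0 <= mu(set(A)) <= 1 for A in powerset(omega))

# Additivity for disjoint events: mu(A union B) = mu(A) + mu(B).
A, B = {"a"}, {"b", "c"}
assert abs(mu(A | B) - (mu(A) + mu(B))) < 1e-12
```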
Frequentist inference

- in the frequentist interpretation, the (aleatory) probability of an event is its relative frequency over time
- the frequentist interpretation offers guidance in the design of practical 'random' experiments
- developed by Fisher, Pearson and Neyman
- three main tools:
  - statistical hypothesis testing
  - model selection
  - confidence interval analysis

Statistical hypothesis testing

1 state the research hypothesis
2 state the relevant null and alternative hypotheses
3 state the statistical assumptions being made about the sample, e.g. assumptions about statistical independence or about the form of the distributions of the observations
4 state the relevant test statistic T (a quantity derived from the sample)
5 derive the distribution of the test statistic under the null hypothesis from the assumptions
6 set a significance level (α), i.e. a probability threshold below which the null hypothesis will be rejected
7 compute from the observations the observed value t_obs of the test statistic T
8 calculate the p-value, the probability (under the null hypothesis) of sampling a test statistic at least as extreme as the observed value
9 reject the null hypothesis, in favour of the alternative hypothesis, if and only if the p-value is less than the significance level threshold

P-values

[Figure: a probability density over the set of possible results, marking the observed data point and showing the p-value as the probability of results at least as extreme as the one observed.]

- the p-value is not the probability that the null hypothesis is true, nor the probability that the alternative hypothesis is false: frequentist statistics does not and cannot attach probabilities to hypotheses

Maximum Likelihood Estimation (MLE)

- the term 'likelihood' was popularised in mathematical statistics by Ronald Fisher in 1922, in 'On the mathematical foundations of theoretical statistics'
- Fisher argued against 'inverse' (Bayesian) probability as a basis for statistical inference, and instead proposed inferences based on likelihood functions
- likelihood principle: all of the evidence in a sample relevant to the model parameters is contained in the likelihood function
  - this is still hotly debated [Mayo, Gandenberger]
- maximum likelihood estimation (a worked sketch follows this slide):
  $$\{\hat{\theta}_{\mathrm{mle}}\} \subseteq \Big\{ \arg\max_{\theta \in \Theta} L(\theta;\, x_1, \ldots, x_n) \Big\},$$
  where $L(\theta;\, x_1, \ldots, x_n) = f(x_1, x_2, \ldots, x_n \mid \theta)$ and $\{f(\cdot \mid \theta),\ \theta \in \Theta\}$ is a parametric model
- consistency: the sequence of MLEs converges in probability, for a sufficiently large number of observations, to the (actual) parameter value being estimated
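As an illustration of the MLE definition and its consistency property, here is a minimal Python sketch (not from the slides) for a Bernoulli parametric model; the true parameter, sample size and random seed are arbitrary choices. For the Bernoulli model the maximiser also has a closed form, the sample mean, which the numerical optimiser should reproduce.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Simulated observations from a Bernoulli model f(x | theta) with an
# assumed (illustrative) true parameter; in practice x would be the data.
rng = np.random.default_rng(0)
true_theta = 0.3
x = rng.binomial(1, true_theta, size=1000)

def neg_log_likelihood(theta):
    # L(theta; x_1, ..., x_n) = prod_i theta^{x_i} (1 - theta)^{1 - x_i};
    # we minimise -log L, which maximises the likelihood.
    return -np.sum(x * np.log(theta) + (1 - x) * np.log(1 - theta))

# Numerical MLE over the (open) parameter space Theta = (0, 1).
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                      method="bounded")
print("numerical MLE:", res.x)
print("closed-form MLE (sample mean):", x.mean())
# Consistency: as the sample size grows, both converge in probability
# to the actual value true_theta.
```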
Subjective probability

- (epistemic) probability = degrees of belief of an individual assessing the state of the world
- Ramsey and de Finetti: subjective beliefs must follow the laws of probability if they are to be coherent (if this 'proof' were watertight we would not be here in front of you!)
- also, empirical evidence casts doubt on whether humans hold coherent beliefs or behave rationally

Bayesian inference

- prior distribution: the distribution of the parameter(s) before any data is observed, i.e. p(θ | α), which depends on a vector of hyperparameters α
- likelihood: the distribution of the observed data conditional on its parameters, i.e. p(X | θ)
- marginal likelihood (sometimes also termed the evidence): the distribution of the observed data marginalised over the parameter(s),
  $$p(X \mid \alpha) = \int_\theta p(X \mid \theta)\, p(\theta \mid \alpha)\, d\theta$$
- posterior distribution: the distribution of the parameter(s) after taking into account the observed data, as determined by Bayes' rule,
  $$p(\theta \mid X, \alpha) = \frac{p(X \mid \theta)\, p(\theta \mid \alpha)}{p(X \mid \alpha)} \propto p(X \mid \theta)\, p(\theta \mid \alpha)$$
  (a small worked conjugate example is sketched at the end of this text)

Beyond probability

Something is wrong?

- measure-theoretic mathematical probability is not general enough:
  - it cannot (properly) model missing data
  - it cannot (properly) model propositional data
  - it cannot really model unusual data (second-order uncertainty)
- the frequentist approach to probability:
  - cannot really model pure data (without 'design')
  - in a way, cannot even properly model continuous data
  - models scarce data only asymptotically
- Bayesian reasoning has several limitations:
  - it cannot model the absence of data (ignorance)
  - it cannot model 'uncertain' data
  - it cannot model pure data (without a prior)
  - again, it cannot properly model scarce data (only asymptotically)

Fisher has not got it all right

- the setting of hypothesis testing is (arguably) arguable:
  - the scope is quite narrow: rejecting or not rejecting a hypothesis (although it can provide confidence intervals)
  - the criterion is arbitrary: who decides what an 'extreme' realisation is (the choice of α)? what is so special about 0.05 and 0.01?
  - the whole 'tail' idea comes from the fact that, under measure theory, the conditional probability (p-value) of a point outcome x is zero; it seems to patch an underlying problem with the way probability is defined
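Returning to the Bayesian inference slide above, here is a minimal Python sketch (not from the slides) of conjugate Beta-Bernoulli inference, in which prior, posterior and marginal likelihood all have closed forms. The data and hyperparameters are made up for illustration.

```python
import numpy as np
from scipy.stats import beta
from scipy.special import betaln, comb

# Hypothetical coin-flip data X; theta is the unknown probability of heads.
x = np.array([1, 0, 1, 1, 0, 1, 1, 1])   # made-up observations
n, k = len(x), int(x.sum())               # number of trials, number of heads

# Prior p(theta | alpha): a Beta(a, b) density; a and b are the
# hyperparameters (assumed values here).
a, b = 2.0, 2.0

# Posterior via Bayes' rule: the Beta prior is conjugate to the Bernoulli
# likelihood p(X | theta) = theta^k (1 - theta)^(n - k), so the posterior
# p(theta | X, alpha) is again a Beta, namely Beta(a + k, b + n - k).
posterior = beta(a + k, b + n - k)
print("posterior mean of theta:", posterior.mean())

# Marginal likelihood (evidence) p(X | alpha): the integral over theta of
# p(X | theta) p(theta | alpha); in this conjugate case it reduces to
# C(n, k) * B(a + k, b + n - k) / B(a, b), computed here in log space.
log_evidence = np.log(comb(n, k)) + betaln(a + k, b + n - k) - betaln(a, b)
print("log marginal likelihood:", log_evidence)
```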