Chapter 4: Probability Distributions

4.1 Random Variables

A random variable is a function X that assigns a numerical value x to each possible outcome in the sample space. An event can be associated with a single value of the random variable, or it can be associated with a range of values of the random variable. The probability of an event can then be described as

$$P(A) = P(X = x_i) \quad \text{or} \quad P(A) = P(x_l \le X \le x_u).$$

Other sets of values of the random variable could also be used to describe an event. If $x_i$, $i = 1, 2, \dots, N$ are all the possible values of the random variable associated with the sample space, then

$$\sum_{i=1}^{N} P(X = x_i) = 1.$$

e.g. Each (composite) outcome consists of 3 ratings (M, P, C). Let $M_1$, $P_1$ and $C_1$ be the preferred ratings. Let X be the function that assigns to each outcome the number x of preferred ratings the outcome possesses.

[Tree diagram: each branch $(M_i, P_j, C_k)$ lists the probability of that outcome and its value of X; e.g. the branch $(M_1, P_1, C_1)$ has probability 0.03 and X = 3.]

Since each outcome has a probability, we can compute the probability of getting each value x = 0, 1, 2, 3 of the function X:

x | P(X = x)
0 | 0.18
1 | 0.50
2 | 0.29
3 | 0.03

Random variables X can be classified by the number of values x they can assume. The two common types are
- discrete random variables, with a finite or countably infinite number of values;
- continuous random variables, having a continuum of values for x.

In summary:
1. A value of a random variable may correspond to several random events.
2. An event may correspond to a range of values (or ranges of values) of a random variable.
3. But a given value (in its legal range) of a random variable corresponds to a random event.
4. Different values of the random variable correspond to mutually exclusive random events.
5. Each value of a random variable has a corresponding probability.
6. All possible values of a random variable correspond to the entire sample space.
7. The summation of the probabilities corresponding to all values of a random variable must equal unity.

A fundamental problem is to find the probability of occurrence for each possible value x of the random variable X:

$$P(X = x) = \sum_{\text{all outcomes } A \text{ assigned value } x} P(A).$$

This is the problem of identifying the probability distribution for a random variable. The probability distribution of a discrete random variable X can be listed as a table of the possible values x together with the probability P(X = x) for each:

$x_1$ | $P(X = x_1)$
$x_2$ | $P(X = x_2)$
$x_3$ | $P(X = x_3)$
...

It is standard notation to refer to the values P(X = x) of the probability distribution by f(x):

$$f(x) \equiv P(X = x).$$

The probability distribution always satisfies the conditions

$$f(x) \ge 0 \quad \text{and} \quad \sum_{\text{all } x} f(x) = 1.$$

e.g. Check whether the following can serve as probability distributions: $f(x) = \frac{x-2}{2}$ for x = 1, 2, 3, 4, and $f(x) = \frac{x^2}{25}$ for x = 0, 1, 2, 3, 4.

Since the probability distribution for a discrete random variable is a tabular list, it can also be represented as a histogram, the probability histogram. For a discrete random variable, the height of the bin at value x is f(x); the width of the bin is meaningless. The probability histogram is commonly drawn either with touching bins or in Pareto style (also referred to as a bar chart).

[Figure: probability histogram of f(x) for the number of preferred ratings.]

Of course one can also compute the cumulative distribution function (or cumulative probability function)

$$F(x) = P(X \le x) \quad \text{for } -\infty < x < \infty$$

and plot it in the ways learned in Chapter 2 (with the consideration that the x-axis is not continuous but discrete).
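To make the tabulated distribution concrete, here is a minimal Python sketch (the values are taken from the preferred-ratings example above; the variable names are illustrative) that checks the two conditions $f(x) \ge 0$ and $\sum f(x) = 1$ and accumulates the cumulative distribution $F(x) = P(X \le x)$:

```python
# Probability distribution of X = number of preferred ratings (from the example above)
f = {0: 0.18, 1: 0.50, 2: 0.29, 3: 0.03}

# The two defining conditions of a discrete probability distribution
assert all(p >= 0 for p in f.values())       # f(x) >= 0 for every x
assert abs(sum(f.values()) - 1.0) < 1e-9     # sum over all x of f(x) = 1

# Cumulative distribution function F(x) = P(X <= x), built as a running sum
F = {}
total = 0.0
for x in sorted(f):
    total += f[x]
    F[x] = total

print(F)   # roughly {0: 0.18, 1: 0.68, 2: 0.97, 3: 1.0}, up to floating-point rounding
```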
[Figure: cumulative distribution F(x) for the number of preferred ratings.]

We now start to discuss the probability distributions for many discrete random variables that occur in nature.

4.2 Binomial Distribution

Bernoulli distribution: In probability theory and statistics, the Bernoulli distribution, named after the Swiss scientist Jacob Bernoulli, is a discrete probability distribution which takes the value 1 with success probability p and the value 0 with failure probability q = 1 − p. So if X is a random variable with this distribution, we have

$$P(X = 1) = p; \qquad P(X = 0) = q = 1 - p.$$

Mean and variance of a random variable X:

(1) Mean (mathematical expectation, expectation, average, etc.):
$$\mu = \bar{x} = E(X) = \sum_i x_i \, P(X = x_i)$$

(2) Variance:
$$\mathrm{Var}(X) = E\!\left[(X - \bar{x})^2\right] = \sigma^2 = \sum_i (x_i - \mu)^2 \, P(X = x_i)$$

$\sigma$ is called the standard deviation.

For a random variable with the Bernoulli distribution we have

$$\mu = E(X) = p$$
$$\mathrm{Var}(X) = \sigma^2 = (1 - p)^2 p + p^2 q = q^2 p + p^2 q = pq(p + q) = pq.$$

Binomial distribution: We can refer to an ordered sequence of length n as a series of n repeated trials, where each trial produces a result that is either "success" or "failure". We are interested in the random variable that reports the number x of successes in the n trials. Each trial is a Bernoulli trial, which satisfies:
a) there are only two outcomes for each trial;
b) the probability of success is the same for each trial;
c) the outcomes for different trials are independent.

We are talking about the events $A_i$ in the sample space S, where

$A_1$ = s _ _ _ _ … _;  $A_2$ = _ s _ _ _ … _;  $A_3$ = _ _ s _ _ … _;  … ;  $A_n$ = _ _ _ _ _ … s;

where by b) $P(A_1) = P(A_2) = \dots = P(A_n)$ and by c) $P(A_i \cap A_j) = P(A_i) \cdot P(A_j)$ for all distinct pairs i, j.

e.g. A police roadblock checking for drivers who are wearing seatbelts:
condition a): two outcomes, "y" or "n";
conditions b) & c): if the events $A_1$ to $A_n$ cover all cars stopped, then b) and c) will be satisfied.

If, however, event $A_1$ is broken into two mutually exclusive sub-events, $A_{1<}$ (all outcomes s _ _ _ … _ in which driver 1 is less than 21) and $A_{1>}$ (all outcomes s _ _ _ … _ in which driver 1 is 21 or older), it is entirely likely that $P(A_{1<}) \ne P(A_{1>})$, and we would not be dealing with Bernoulli trials. If someone caught not wearing a seatbelt began to warn oncoming cars approaching the roadblock, then the independence condition $P(A_i \cap A_j) = P(A_i) \cdot P(A_j)$ would fail, and we would also not be dealing with Bernoulli trials.

Note that in our definition of Bernoulli trials the number of trials n is fixed in advance.

All sequences of Bernoulli trials of length n have the same probability distribution (a consequence of the assumptions behind the definition of Bernoulli trials). This probability distribution is called the binomial probability distribution for n. (It is called this because each trial has a binomial outcome, "s" or "f", and the sequences generated (the composite outcomes) are binomial sequences.)

e.g. The binomial probability distribution for n = 3, p = ½. The sample space has $2^3 = 8$ outcomes:

x = 3: sss
x = 2: ssf, sfs, fss
x = 1: sff, fsf, ffs
x = 0: fff

P(sss) = ½ · ½ · ½ = 1/8;  P(ssf) = ½ · ½ · (1 − ½) = 1/8;  P(fsf) = (1 − ½) · ½ · (1 − ½) = 1/8; etc.

Probability distribution:

x    | 0   | 1   | 2   | 3
f(x) | 1/8 | 3/8 | 3/8 | 1/8

or equivalently

$$f(x) = \binom{3}{x} \left(\tfrac{1}{2}\right)^x \left(1 - \tfrac{1}{2}\right)^{3-x}, \quad x = 0, 1, 2, 3.$$

From this example we see that the binomial probability distribution, which governs Bernoulli trials of length n, is

$$f(x) \equiv b(x; n, p) = \binom{n}{x} p^x (1 - p)^{n - x} \qquad \text{(BPD)}$$

where p is the (common) probability of success in any trial, and x = 0, 1, 2, …, n.
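The following minimal Python sketch (standard library only; the function name binom_pmf is illustrative) evaluates (BPD) directly and reproduces the n = 3, p = ½ table above:

```python
from math import comb

def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x), the binomial probability (BPD)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Reproduce the n = 3, p = 1/2 example: f(0), ..., f(3) = 1/8, 3/8, 3/8, 1/8
for x in range(4):
    print(x, binom_pmf(x, 3, 0.5))

# The probabilities sum to 1 (binomial theorem with p + (1 - p) = 1)
assert abs(sum(binom_pmf(x, 3, 0.5) for x in range(4)) - 1.0) < 1e-12
```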
Note:

1. The term on the RHS of (BPD) is the x'th term of the binomial expansion of $[p + (1 - p)]^n$, i.e.

$$[p + (1 - p)]^n = \sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n - x},$$

which also proves that

$$\sum_{x=0}^{n} \binom{n}{x} p^x (1 - p)^{n - x} = 1^n = 1.$$

2. (BPD) is a 2-parameter family of distribution functions, characterized by the choice of n and p.

e.g. In 60% of all solar-heat installations, the utility bill is reduced by at least one third. What is the probability that the utility bill will be reduced by at least one third in a) 4 of 5 installations? b) at least 4 of 5 installations?

a) Take "s" = "reduced by at least 1/3" and "f" = "reduced by less than 1/3", so $P(A_i) = p = 0.6$. Assume condition c) of the Bernoulli trial assumptions holds. Then

$$f(4) = b(4; 5, 0.6) = \binom{5}{4} (0.6)^4 (0.4)^1 \approx 0.259.$$

b) We want

$$f(4) + f(5) = b(4; 5, 0.6) + b(5; 5, 0.6) = \binom{5}{4} (0.6)^4 (0.4)^1 + \binom{5}{5} (0.6)^5 (0.4)^0 \approx 0.337.$$

[Figure: examples of the binomial distribution for various n and p.]

Cumulative binomial probability distribution: the function

$$B(x; n, p) \equiv \sum_{k=0}^{x} b(k; n, p) \qquad \text{(CBPD)}$$

is the probability of x or fewer successes in n Bernoulli trials, where p is the probability of success on each trial. From (CBPD) we see

$$b(x; n, p) = B(x; n, p) - B(x - 1; n, p).$$

Values of $B(x; n, p)$ are tabulated for various n and p values in Table 1 of Appendix B.

[Figure: cumulative binomial distribution.]

e.g. The probability is 0.05 for flange failure under a given load L.
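A companion Python sketch for the cumulative distribution, assuming the binom_pmf helper from the previous sketch; it reproduces the solar-heat numbers and checks the relation $b(x; n, p) = B(x; n, p) - B(x - 1; n, p)$:

```python
def binom_cdf(x: int, n: int, p: float) -> float:
    """B(x; n, p) = sum_{k=0}^{x} b(k; n, p): probability of x or fewer successes."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# Solar-heat example: p = 0.6, n = 5 installations
print(binom_pmf(4, 5, 0.6))                           # a) exactly 4 of 5: ~0.2592
print(binom_pmf(4, 5, 0.6) + binom_pmf(5, 5, 0.6))    # b) at least 4 of 5: ~0.3370
print(1 - binom_cdf(3, 5, 0.6))                       # same as b), via the cumulative form

# b(x; n, p) = B(x; n, p) - B(x - 1; n, p)
assert abs(binom_pmf(4, 5, 0.6) - (binom_cdf(4, 5, 0.6) - binom_cdf(3, 5, 0.6))) < 1e-12
```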