Likelihood and Probability in Scientific Inference

Total Pages: 16

File Type: PDF, Size: 1020 KB

Likelihood and Probability in Scientific Inference
UCL Graduate School: Graduate Skills Course

What we will cover:
• The basis of inference in science.
• A comparison of the "frequentist" approach we usually learn with today's widely used alternatives in statistical inference, particularly the use of "likelihood" and Bayesian probability.
• We will hopefully empower you to develop your own analyses, using simple examples.

What we will not cover:
• Not suitable for people already well-versed in statistics. They'll already know most of this!
• Not suitable for people who've no idea about statistics. At least GCSE knowledge is required.
• We won't have time to teach you all you need to know to analyse your data.
• We won't have time to go into very complicated examples.

Your hosts for today:
James Mallet, Professor of Biological Diversity, http://abacus.gene.ucl.ac.uk/jim/
Ziheng Yang, Professor of Statistical Genetics, http://abacus.gene.ucl.ac.uk/
(Department of Biology, UCL)

Instead, we hope:
• You begin to develop a healthy disrespect for most "off-the-shelf" methods. (But you will probably still use them.)
• You start to form your own ideas of how statistics and scientific inference are related (a philosophy of science topic).
• That your interest in likelihood and Bayesian analysis is piqued, and you are motivated to do further reading.
• You become empowered to perform simple statistical analyses, using Excel and Excel's Solver add-in. Add a little programming, and you can analyse much more difficult problems.

My main source:
Anthony W. Edwards (1972; reprinted 1992): Likelihood. Cambridge UP.
See also, more in-depth: Yudi Pawitan (2001): In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford UP.

Overview
• What is scientific inference?
• Three philosophies of statistical inference:
  – Frequentist (probability in the long run)
  – Likelihood (likelihood)
  – Bayesian (posterior probability)
• Common ground: opposing philosophies agree (approximately) on many problems
• Discussion
• Exercises: the example of ABO blood groups
• Ziheng's talk: when philosophies conflict ...

The nature of scientific inference
Science is about trying to find "predictability" or "regularities" in nature, which we can use. For some reason, this usually seems to work ...
All inference about the world is likely to be based on probability; it's statistical. (Except divine revelation!)
Everyday inference comes in degrees of belief: "I'm sure this is true", "I'm pretty sure", "I'm not sure", "It is likely that ...", "This seems most probable to me".

Models and hypotheses
Models and hypotheses allow prediction. We test them by analysing something about their "likelihood" or "probability".
Models and hypotheses in statistical inference
Models are assumed to be true for the purposes of the particular test or problem, e.g. we assume height in humans to be normally distributed.
Hypotheses are "parameters" that are the focus of interest in estimation, e.g. the mean and variance of height in humans.

Data is typically discrete
For example: counts of things; measurements to the nearest mm or 0.1 ºC; milk-fat percentages. (Histogram example from Sokal & Rohlf 1981, Biometry, p. 47.)
Data is also finite. Models and hypotheses can be discrete too, or continuous; they may be finite or infinite in scope.
A good method of inference should take the discreteness of data into account when we analyse the data. Many analyses, particularly frequentist ones, don't!

Null hypotheses in statistics
We are often taught in biology a simplistic kind of "Popperian" approach to science: falsify simple hypotheses. We then try to test the null hypothesis! (Zero-dimensional statistics, if you like; only one hypothesis can be excluded.)
In this view, estimation (e.g. of a mean or variance) is like natural history, not good science. Physics-envy?

Estimation is primary
Edwards argues that we should turn this argument on its head. Estimation of a distribution or model can lead to testing of an infinitude of hypotheses, including the null hypothesis. It uses the full dimensionality of the problem: 1- to n-dimensional statistical analyses. More powerful!

The three philosophies
• What is scientific inference?
• Three philosophies of statistical inference:
  – Frequentist (probability in the long run)
  – Likelihood (likelihood)
  – Bayesian (posterior probability)
• Common ground: opposing philosophies agree (approximately), in many problems.

1. Frequentist significance testing and P-values
Perfected in the 1920s (Pearson, Fisher et al.); e.g. the χ² test, or the t-test. These methods often tend to assume the data come from a continuous distribution, e.g. χ² tests on count data, Σ(O−E)²/E. They encourage testing of a null hypothesis.

P-values
P-values are "tail probabilities". For example, for χ² = 5.28 with d.f. = 1, or t = 3.92 with d.f. = 10, we find P < 0.05, or P = 0.009834. This is the "tail probability", or "probability in the long run", of getting results at least as extreme as the data under the null hypothesis.
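As a quick aside (not from the original slides), tail probabilities of this kind can be reproduced in a few lines of Python; SciPy's distribution objects are assumed to be available. This is only a sketch of the arithmetic behind such numbers.

```python
# A minimal sketch: the "tail probabilities" for the two examples above.
from scipy import stats

# Chi-squared example: chi-sq = 5.28 with 1 degree of freedom.
# sf() is the survival function P(X >= x), i.e. the upper-tail probability.
p_chi2 = stats.chi2.sf(5.28, df=1)

# t-test example: t = 3.92 with 10 degrees of freedom, two-sided.
p_t = 2 * stats.t.sf(3.92, df=10)

print(f"P(chi2_1 >= 5.28)           = {p_chi2:.4f}")  # roughly 0.02
print(f"two-sided P(|t_10| >= 3.92) = {p_t:.4f}")     # well below 0.05
```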
Philosophical problems with the frequentist approach
We only have one set of data; it seems odd to imagine the experiment being done a very large number of times. "What the use of P implies, therefore, is that a hypothesis that may be true may be rejected because it has not predicted observable results that have not occurred." (Jeffreys 1961)

Alternatives to frequentism
• Frequentism: "probability in the long run".
• Two alternative measures of support:
  – Bayesian probability (Thomas Bayes 1763, Marquis de Laplace 1820): "the probability of a hypothesis given the data".
  – Likelihood (R.A. Fisher 1920s, Edwards 1972): "the probability of the data given a hypothesis" (can be viewed as a simplified form of Bayesian probability).

2. Likelihood
The likelihood of a hypothesis (H) after doing an experiment or gathering data (D) is the probability of the data given the hypothesis:

  L(H|D) = P(D|H)

Probabilities add to 1 for each hypothesis (by definition), but do not add to 1 across different hypotheses – hence "likelihood". The likelihood ratio is P(D|H₁) / P(D|H₂).

The Law of Likelihood
"Within the framework of a statistical model, a particular set of data supports one statistical hypothesis better than another if the likelihood of the first hypothesis on the data exceeds the likelihood of the second hypothesis."

Example: the binomial distribution
Suppose we are interested in estimating the allele frequency of a gene from a sample:

  A: 2 (i)    a: 8 (n−i)    Total alleles: 10 (n)

This is a problem that is well suited to the binomial distribution:

  P(D|H_j) = [n! / (i!(n−i)!)] p^i (1−p)^(n−i)

A common frequentist approach
Sample mean: p* = 2/10 = 0.2.
Sample variance of the mean: s_p² = p*q*/n = 0.2 × 0.8 / 10 = 0.016.
Standard deviation of the mean: s_p = √0.016 = 0.126.
95% confidence limits of the mean: p* ± t₉,₀.₀₅ s_p = 0.2 ± 2.262 × 0.126 = (−0.085, +0.485).
Note the NEGATIVE lower limit!

Support
Support is defined as the natural logarithm of the likelihood ratio:

  Support = log_e [P(D|H₁) / P(D|H₂)] = log_e P(D|H₁) − log_e P(D|H₂)

The likelihood approach using the binomial
To get the support for two hypotheses, we need to calculate this log likelihood ratio. The support curve gives a measure of belief in the continuously variable hypotheses.
Note: the binomial coefficient depends only on the data (D), not on the hypothesis (H), so it cancels in the likelihood ratio. No need to calculate the tedious constant; we just need the p^i(1−p)^(n−i) terms.

Spreadsheet calculation for binomial probability with sample size n = 10, "successes" i = 2:

  p      p^i(1−p)^(n−i)   ln likelihood     ln likelihood ratio
  0      0                −∞ (Excel #NUM!)  −∞ (impossible)
  0.001  1.002E-06        −13.81351         −8.36546
  0.01   9.22745E-05      −9.290743         −4.19635
  0.05   0.001658551      −6.401811         −1.39779
  0.1    0.004304672      −5.448054         −0.44403
  0.15   0.006131037      −5.094391         −0.09037
  0.2    0.006710886      −5.004024         0 (maximum: p = i/n)
  0.25   0.006257057      −5.074045         −0.07002
  0.3    0.005188321      −5.261345         −0.25732
  0.35   0.003903399      −5.545908         −0.54188
  0.4    0.002687386      −5.919186         −0.91516
  0.45   0.001695612      −6.379711         −1.37569
  0.5    0.000976563      −6.931472         −1.92745
  0.55   0.000508658      −7.583736         −2.57971
  0.6    0.00023593       −8.351977         −3.34795
  0.65   9.51417E-05      −9.260143         −4.25612
  0.7    3.21489E-05      −10.34513         −5.34111

(The likelihood and log-likelihood plots show these curves against binomial p, for n = 10 and for n = 40; the n = 40 curve is much narrower.)

Support limits
Edwards: the region within 2 units of the maximum log likelihood can be viewed as "support limits" (equivalent to approximately 2 standard deviations in the frequentist approach). log_e LR = 2 implies LR = e² ≈ 7.4, i.e. the best hypothesis is about 7.4× as well supported as one at the limit.
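The spreadsheet calculation above is easy to reproduce outside Excel. Below is a minimal sketch (assuming NumPy; my illustration, not part of the original exercise) that traces the support curve for the allele data, reads off Edwards' 2-unit support limits, and contrasts them with the frequentist interval from the earlier slide:

```python
# A sketch of the support curve for i = 2 "A" alleles out of n = 10,
# dropping the binomial coefficient because it cancels in every ratio.
import numpy as np

i, n = 2, 10
p = np.linspace(0.001, 0.999, 999)                 # grid of hypotheses for p

log_lik = i * np.log(p) + (n - i) * np.log(1 - p)  # ln of p^i (1-p)^(n-i)
support = log_lik - log_lik.max()                  # support relative to max

p_hat = p[np.argmax(support)]                      # maximum-likelihood estimate
inside = p[support >= -2.0]                        # Edwards' 2-unit region
print(f"MLE: p = {p_hat:.2f}")                     # 0.20, as in the table
print(f"2-unit support limits: ({inside.min():.3f}, {inside.max():.3f})")

# The frequentist interval from the earlier slide, for comparison:
s_p = np.sqrt(p_hat * (1 - p_hat) / n)             # about 0.126
print(f"Wald-style 95% CI: ({p_hat - 2.262*s_p:.3f}, {p_hat + 2.262*s_p:.3f})")
```

Because the support limits are read off the likelihood curve itself, they stay inside (0, 1), unlike the symmetric frequentist interval with its negative lower limit.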
3. Bayes' Theorem

  P(A|B) = P(B|A) P(A) / P(B)

Named after its inventor, Thomas Bayes, in 18th-century England.

Bayes' Theorem as a means of inference

  P(H₁|D) / P(H₂|D) = [k · P(D|H₁) P(H₁)] / [k · P(D|H₂) P(H₂)]

where k = 1/P(D) cancels: the posterior odds are the likelihood ratio multiplied by the prior odds.

Sum of support from different experiments
Because supports are log likelihoods, the support from independent experiments simply adds. (The slide plots the rescaled likelihood of binomial p over the range 0 to 1.)
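As a minimal sketch of how these last two ideas combine (my illustration with made-up counts, not the slides' worked example): supports from independent experiments add, and Bayes' theorem then turns the total likelihood ratio into posterior odds once priors are chosen.

```python
# Sum of support across experiments, then posterior odds via Bayes' theorem.
import math

def support(p, i, n):
    """Binomial log-likelihood of p, dropping the constant coefficient."""
    return i * math.log(p) + (n - i) * math.log(1 - p)

h1, h2 = 0.2, 0.5                  # two hypotheses about the allele frequency
experiments = [(2, 10), (8, 40)]   # (successes i, trials n); made-up data

# Support is additive over independent experiments.
total_support = sum(support(h1, i, n) - support(h2, i, n)
                    for i, n in experiments)
print(f"total support for H1 over H2: {total_support:.2f}")

# Posterior odds = likelihood ratio x prior odds; the constant k = 1/P(D)
# cancels. With equal priors the posterior odds equal the likelihood ratio.
prior_h1, prior_h2 = 0.5, 0.5
posterior_odds = math.exp(total_support) * (prior_h1 / prior_h2)
print(f"posterior odds for H1 over H2: {posterior_odds:.1f}")
```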
Recommended publications
  • Creating Modern Probability. Its Mathematics, Physics and Philosophy in Historical Perspective
    Creating Modern Probability. Its Mathematics, Physics and Philosophy in Historical Perspective. By Jan von Plato. Cambridge/New York/Melbourne (Cambridge Univ. Press). 1994. 323 pp. Reviewed by Thomas Hochkirchen, Fachbereich Mathematik, Bergische Universität Wuppertal, 42097 Wuppertal, Germany. Aside from the role probabilistic concepts play in modern science, the history of the axiomatic foundation of probability theory is interesting from at least two more points of view. Probability as it is understood nowadays, probability in the sense of Kolmogorov (see [3]), is not easy to grasp, since the definition of probability as a normalized measure on a σ-algebra of "events" is not a very obvious one. Furthermore, the discussion of different concepts of probability might help in understanding the philosophy and role of "applied mathematics." So the exploration of the creation of axiomatic probability should be interesting not only for historians of science but also for people concerned with didactics of mathematics and for those concerned with philosophical questions.
  • There Is No Pure Empirical Reasoning
    There Is No Pure Empirical Reasoning. 1. Empiricism and the Question of Empirical Reasons. Empiricism may be defined as the view that there is no a priori justification for any synthetic claim. Critics object that empiricism cannot account for all the kinds of knowledge we seem to possess, such as moral knowledge, metaphysical knowledge, mathematical knowledge, and modal knowledge. In some cases, empiricists try to account for these types of knowledge; in other cases, they shrug off the objections, happily concluding, for example, that there is no moral knowledge, or that there is no metaphysical knowledge. But empiricism cannot shrug off just any type of knowledge; to be minimally plausible, empiricism must, for example, at least be able to account for paradigm instances of empirical knowledge, including especially scientific knowledge. Empirical knowledge can be divided into three categories: (a) knowledge by direct observation; (b) knowledge that is deductively inferred from observations; and (c) knowledge that is non-deductively inferred from observations, including knowledge arrived at by induction and inference to the best explanation. Category (c) includes all scientific knowledge. This category is of particular import to empiricists, many of whom take scientific knowledge as a sort of paradigm for knowledge in general; indeed, this forms a central source of motivation for empiricism. Thus, if there is any kind of knowledge that empiricists need to be able to account for, it is knowledge of type (c). I use the term "empirical reasoning" to refer to the reasoning involved in acquiring this type of knowledge – that is, to any instance of reasoning in which (i) the premises are justified directly by observation, (ii) the reasoning is non-deductive, and (iii) the reasoning provides adequate justification for the conclusion.
  • The Interpretation of Probability: Still an Open Issue?
    Maria Carla Galavotti, Department of Philosophy and Communication, University of Bologna, Via Zamboni 38, 40126 Bologna, Italy; [email protected]. Received: 19 July 2017; Accepted: 19 August 2017; Published: 29 August 2017. Abstract: Probability as understood today, namely as a quantitative notion expressible by means of a function ranging in the interval between 0–1, took shape in the mid-17th century, and presents both a mathematical and a philosophical aspect. Of these two sides, the second is by far the most controversial, and fuels a heated debate, still ongoing. After a short historical sketch of the birth and developments of probability, its major interpretations are outlined, by referring to the work of their most prominent representatives. The final section addresses the question of whether any of such interpretations can presently be considered predominant, which is answered in the negative. Keywords: probability; classical theory; frequentism; logicism; subjectivism; propensity. 1. A Long Story Made Short. Probability, taken as a quantitative notion whose value ranges in the interval between 0 and 1, emerged around the middle of the 17th century thanks to the work of two leading French mathematicians: Blaise Pascal and Pierre Fermat. According to a well-known anecdote: “a problem about games of chance proposed to an austere Jansenist by a man of the world was the origin of the calculus of probabilities”. The ‘man of the world’ was the French gentleman Chevalier de Méré, a conspicuous figure at the court of Louis XIV, who asked Pascal—the ‘austere Jansenist’—the solution to some questions regarding gambling, such as how many dice tosses are needed to have a fair chance to obtain a double-six, or how the players should divide the stakes if a game is interrupted.
  • Bayesian Versus Frequentist Statistics for Uncertainty Analysis
    Disentangling Classical and Bayesian Approaches to Uncertainty Analysis. Robin Willink (email: [email protected]) and Rod White, Measurement Standards Laboratory, PO Box 31310, Lower Hutt 5040, New Zealand (corresponding author: [email protected]). Abstract: Since the 1980s, we have seen a gradual shift in the uncertainty analyses recommended in the metrological literature, principally Metrologia, and in the BIPM's guidance documents: the Guide to the Expression of Uncertainty in Measurement (GUM) and its two supplements. The shift has seen the BIPM's recommendations change from a purely classical or frequentist analysis to a purely Bayesian analysis. Despite this drift, most metrologists continue to use the predominantly frequentist approach of the GUM and wonder what the differences are, why there are such bitter disputes about the two approaches, and whether they should change. The primary purpose of this note is to inform metrologists of the differences between the frequentist and Bayesian approaches and the consequences of those differences. It is often claimed that a Bayesian approach is philosophically consistent and is able to tackle problems beyond the reach of classical statistics. However, while the philosophical consistency of Bayesian analyses may be more aesthetically pleasing, the value to science of any statistical analysis lies in the long-term success rates, and on this point classical methods perform well and Bayesian analyses can perform poorly. Thus an important secondary purpose of this note is to highlight some of the weaknesses of the Bayesian approach. We argue that moving away from well-established, easily-taught frequentist methods that perform well, to computationally expensive and numerically inferior Bayesian analyses recommended by the GUM supplements, is ill-advised.
  • Practical Statistics for Particle Physics Lecture 1 AEPS2018, Quy Nhon, Vietnam
    Practical Statistics for Particle Physics, Lecture 1. AEPS2018, Quy Nhon, Vietnam. Roger Barlow, The University of Huddersfield, August 2018. Lecture 1: The Basics. 1. Probability: What is it? Frequentist probability; conditional probability and Bayes' theorem; Bayesian probability. 2. Probability distributions and their properties: expectation values; binomial, Poisson and Gaussian. 3. Hypothesis testing. Question: What is probability? Typical exam question: Q1. Explain what is meant by the probability P_A of an event A. [1] Four possible answers: (1) P_A is a number obeying certain mathematical rules. (2) P_A is a property of A that determines how often A happens. (3) For N trials in which A occurs N_A times, P_A is the limit of N_A/N for large N. (4) P_A is my belief that A will happen, measurable by seeing what odds I will accept in a bet. Mathematical: Kolmogorov axioms: for all A ⊆ S, P_A ≥ 0; P_S = 1; P(A ∪ B) = P_A + P_B if A ∩ B = ∅ and A, B ⊆ S. From these simple axioms a complete and complicated structure can be erected, e.g. show P_Ā = 1 − P_A, and show P_A ≤ 1. But!!! This says nothing about what P_A actually means. Kolmogorov had frequentist probability in mind, but these axioms apply to any definition. Classical or real probability: evolved during the 18th–19th century; developed (by Pascal, Laplace and others) to serve the gambling industry.
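As a worked version of the exercise this excerpt mentions (my sketch, not Barlow's), both facts follow directly from the three axioms, since A and its complement Ā are disjoint and together make up S:

```latex
\[
  1 = P_S = P(A \cup \bar{A}) = P_A + P_{\bar{A}}
  \quad\Rightarrow\quad P_{\bar{A}} = 1 - P_A ,
\]
\[
  P_{\bar{A}} \ge 0
  \quad\Rightarrow\quad P_A = 1 - P_{\bar{A}} \le 1 .
\]
```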
  • This History of Modern Mathematical Statistics Retraces Their Development
    BOOK REVIEWS. GORROOCHURN Prakash, 2016, Classic Topics on the History of Modern Mathematical Statistics: From Laplace to More Recent Times, Hoboken, NJ, John Wiley & Sons, Inc., 754 p. This history of modern mathematical statistics retraces their development from the “Laplacean revolution,” as the author so rightly calls it (though the beginnings are to be found in Bayes’ 1763 essay(1)), through the mid-twentieth century and Fisher’s major contribution. Up to the nineteenth century the book covers the same ground as Stigler’s history of statistics(2), though with notable differences (see infra). It then discusses developments through the first half of the twentieth century: Fisher’s synthesis but also the renewal of Bayesian methods, which implied a return to Laplace. Part I offers an in-depth, chronological account of Laplace’s approach to probability, with all the mathematical detail and deductions he drew from it. It begins with his first innovative articles and concludes with his philosophical synthesis showing that all fields of human knowledge are connected to the theory of probabilities. Here Gorroochurn raises a problem that Stigler does not, that of induction (pp. 102-113), a notion that gives us a better understanding of probability according to Laplace. The term induction has two meanings, the first put forward by Bacon(3) in 1620, the second by Hume(4) in 1748. Gorroochurn discusses only the second. For Bacon, induction meant discovering the principles of a system by studying its properties through observation and experimentation. For Hume, induction was mere enumeration and could not lead to certainty. Laplace followed Bacon: “The surest method which can guide us in the search for truth, consists in rising by induction from phenomena to laws and from laws to forces”(5).
  • The Likelihood Principle
    01/28/99 © Marc Nerlove 1999. Chapter 1: The Likelihood Principle. "What has now appeared is that the mathematical concept of probability is ... inadequate to express our mental confidence or diffidence in making ... inferences, and that the mathematical quantity which usually appears to be appropriate for measuring our order of preference among different possible populations does not in fact obey the laws of probability. To distinguish it from probability, I have used the term 'Likelihood' to designate this quantity; since both the words 'likelihood' and 'probability' are loosely used in common speech to cover both kinds of relationship." R. A. Fisher, Statistical Methods for Research Workers, 1925. "What we can find from a sample is the likelihood of any particular value of ρ [a parameter], if we define the likelihood as a quantity proportional to the probability that, from a particular population having that particular value of ρ, a sample having the observed value r [a statistic] should be obtained. So defined, probability and likelihood are quantities of an entirely different nature." R. A. Fisher, "On the 'Probable Error' of a Coefficient of Correlation Deduced from a Small Sample," Metron, 1:3-32, 1921. Introduction. The likelihood principle as stated by Edwards (1972, p. 30) is that: Within the framework of a statistical model, all the information which the data provide concerning the relative merits of two hypotheses is contained in the likelihood ratio of those hypotheses on the data. ...For a continuum of hypotheses, this principle
  • People's Intuitions About Randomness and Probability
    PEOPLE’S INTUITIONS ABOUT RANDOMNESS AND PROBABILITY: AN EMPIRICAL STUDY. Marie-Paule Lecoutre, ERIS, Université de Rouen, [email protected]; Katia Rovira, Laboratoire Psy.Co, Université de Rouen, [email protected]; Bruno Lecoutre, ERIS, C.N.R.S. et Université de Rouen, [email protected]; Jacques Poitevineau, ERIS, Université de Paris 6 et Ministère de la Culture, [email protected]. Abstract: What people mean by randomness should be taken into account when teaching statistical inference. This experiment explored subjective beliefs about randomness and probability through two successive tasks. Subjects were asked to categorize 16 familiar items: 8 real items from everyday life experiences, and 8 stochastic items involving a repeatable process. Three groups of subjects differing according to their background knowledge of probability theory were compared. An important finding is that the arguments used to judge if an event is random and those to judge if it is not random appear to be of different natures. While the concept of probability has been introduced to formalize randomness, a majority of individuals appeared to consider probability as a primary concept. Keywords: Statistics education research; Probability; Randomness; Bayesian Inference. 1. INTRODUCTION. In recent years Bayesian statistical practice has considerably evolved. Nowadays, the frequentist approach is increasingly challenged among scientists by the Bayesian proponents (see e.g., D’Agostini, 2000; Lecoutre, Lecoutre & Poitevineau, 2001; Jaynes, 2003). In applied statistics, “objective Bayesian techniques” (Berger, 2004) are now a promising alternative to the traditional frequentist statistical inference procedures (significance tests and confidence intervals). Subjective Bayesian analysis also has a role to play in scientific investigations (see e.g., Kadane, 1996).
  • Stochastic Processes and Their Classification
    1 STOCHASTIC PROCESSES AND THEIR CLASSIFICATION. 1.1 DEFINITION AND EXAMPLES. Definition 1. A stochastic process or random process is a collection of random variables ordered by an index set. ☛ Example 1. Random variables X₀, X₁, X₂, … form a stochastic process ordered by the discrete index set {0, 1, 2, …}. Notation: {Xₙ : n = 0, 1, 2, …}. ☛ Example 2. Stochastic process {Yₜ : t ≥ 0} with continuous index set {t : t ≥ 0}. The indices n and t are often referred to as "time", so that Xₙ is a discrete-time process and Yₜ is a continuous-time process. Convention: the index set of a stochastic process is always infinite. The range (possible values) of the random variables in a stochastic process is called the state space of the process. We consider both discrete-state and continuous-state processes. Further examples: ☛ Example 3. {Xₙ : n = 0, 1, 2, …}, where the state space of Xₙ is {0, 1, 2, 3, 4}, representing which of four types of transactions a person submits to an on-line database service, and time n corresponds to the number of transactions submitted. ☛ Example 4. {Xₙ : n = 0, 1, 2, …}, where the state space of Xₙ is {1, 2}, representing whether an electronic component is acceptable or defective, and time n corresponds to the number of components produced. ☛ Example 5. {Yₜ : t ≥ 0}, where the state space of Yₜ is {0, 1, 2, …}, representing the number of accidents that have occurred at an intersection, and time t corresponds to weeks. ☛ Example 6. {Yₜ : t ≥ 0}, where the state space of Yₜ is {0, 1, 2, …, s}, representing the number of copies of a software product in inventory, and time t corresponds to days.
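As a toy illustration of Definition 1 (mine, not the excerpt's), here is a short simulation of a discrete-time, discrete-state process in the spirit of Example 4; the defect probability is an arbitrary assumption:

```python
# Toy sketch: X_n = 1 (acceptable) or 2 (defective) for the n-th component,
# a discrete-time stochastic process with state space {1, 2} (Example 4).
import random

def component_process(n_steps, p_defective=0.1, seed=42):
    """Return one sample path (X_0, ..., X_{n_steps-1})."""
    rng = random.Random(seed)
    return [2 if rng.random() < p_defective else 1 for _ in range(n_steps)]

print(component_process(20))  # one realisation, indexed by discrete time n
```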
  • The Interplay of Bayesian and Frequentist Analysis, M. J. Bayarri and J. O. Berger
    Statistical Science, 2004, Vol. 19, No. 1, 58–80. DOI 10.1214/088342304000000116. © Institute of Mathematical Statistics, 2004. The Interplay of Bayesian and Frequentist Analysis. M. J. Bayarri and J. O. Berger. Abstract: Statistics has struggled for nearly a century over the issue of whether the Bayesian or frequentist paradigm is superior. This debate is far from over and, indeed, should continue, since there are fundamental philosophical and pedagogical issues at stake. At the methodological level, however, the debate has become considerably muted, with the recognition that each approach has a great deal to contribute to statistical practice and each is actually essential for full development of the other approach. In this article, we embark upon a rather idiosyncratic walk through some of these issues. Key words and phrases: Admissibility, Bayesian model checking, conditional frequentist, confidence intervals, consistency, coverage, design, hierarchical models, nonparametric Bayes, objective Bayesian methods, p-values, reference priors, testing. Contents: 1. Introduction; 2. Inherently joint Bayesian–frequentist situations (2.1 Design or preposterior analysis; 2.2 The meaning of frequentism; 2.3 Empirical Bayes, gamma minimax, restricted risk Bayes); 3. Estimation and confidence intervals (3.1 Computation with hierarchical, multilevel or mixed model analysis; 3.2 Assessment of accuracy of estimation; 3.3 Foundations, minimaxity and exchangeability; 3.4 …); 5. Areas of current disagreement; 6. Conclusions; Acknowledgments; References. 1. INTRODUCTION. Statisticians should readily use both Bayesian and frequentist ideas. In Section 2 we discuss situations in which simultaneous frequentist and Bayesian thinking is essentially required. For the most part, however, the situations we discuss are situations in which
  • Frequentism-As-Model
    Frequentism-as-model. Christian Hennig, Dipartimento di Scienze Statistiche, Università di Bologna, Via delle Belle Arti 41, 40126 Bologna; [email protected]. July 14, 2020. arXiv:2007.05748v1 [stat.OT] 11 Jul 2020. Abstract: Most statisticians are aware that probability models interpreted in a frequentist manner are not really true in objective reality, but only idealisations. I argue that this is often ignored when actually applying frequentist methods and interpreting the results, and that keeping up the awareness for the essential difference between reality and models can lead to a more appropriate use and interpretation of frequentist models and methods, called frequentism-as-model. This is elaborated showing connections to existing work, appreciating the special role of i.i.d. models and subject matter knowledge, giving an account of how and under what conditions models that are not true can be useful, giving detailed interpretations of tests and confidence intervals, confronting their implicit compatibility logic with the inverse probability logic of Bayesian inference, re-interpreting the role of model assumptions, appreciating robustness, and the role of "interpretative equivalence" of models. Epistemic (often referred to as Bayesian) probability shares the issue that its models are only idealisations and not really true for modelling reasoning about uncertainty, meaning that it does not have an essential advantage over frequentism, as is often claimed. Bayesian statistics can be combined with frequentism-as-model, leading to what Gelman and Hennig (2017) call "falsificationist Bayes". Key words: foundations of statistics, Bayesian statistics, interpretational equivalence, compatibility logic, inverse probability logic, misspecification testing, stability, robustness. 1 Introduction. The frequentist interpretation of probability and frequentist inference such as hypothesis tests and confidence intervals have been strongly criticised recently (e.g., Hajek (2009); Diaconis and Skyrms (2018); Wasserstein et al.
  • The P–T Probability Framework for Semantic Communication, Falsification, Confirmation, and Bayesian Reasoning
    The P–T Probability Framework for Semantic Communication, Falsification, Confirmation, and Bayesian Reasoning. Chenguang Lu, School of Computer Engineering and Applied Mathematics, Changsha University, Changsha 410003, China; [email protected]. Received: 8 July 2020; Accepted: 16 September 2020; Published: 2 October 2020. Abstract: Many researchers want to unify probability and logic by defining logical probability or probabilistic logic reasonably. This paper tries to unify statistics and logic so that we can use both statistical probability and logical probability at the same time. For this purpose, this paper proposes the P–T probability framework, which is assembled with Shannon's statistical probability framework for communication, Kolmogorov's probability axioms for logical probability, and Zadeh's membership functions used as truth functions. Two kinds of probabilities are connected by an extended Bayes' theorem, with which we can convert a likelihood function and a truth function from one to another. Hence, we can train truth functions (in logic) by sampling distributions (in statistics). This probability framework was developed in the author's long-term studies on semantic information, statistical learning, and color vision. This paper first proposes the P–T probability framework and explains different probabilities in it by its applications to semantic information theory. Then, this framework and the semantic information methods are applied to statistical learning, statistical mechanics, hypothesis evaluation (including falsification), confirmation, and Bayesian reasoning. Theoretical applications illustrate the reasonability and practicability of this framework. This framework is helpful for interpretable AI. To interpret neural networks, we need further study. Keywords: statistical probability; logical probability; semantic information; rate-distortion; Boltzmann distribution; falsification; verisimilitude; confirmation measure; Raven Paradox; Bayesian reasoning