A Brief Overview of Probability Theory in Data Science by Geert Verdoolaege


A brief overview of probability theory in data science
Geert Verdoolaege
Department of Applied Physics, Ghent University, Ghent, Belgium
Laboratory for Plasma Physics, Royal Military Academy (LPP-ERM/KMS), Brussels, Belgium
Tutorial, 3rd IAEA Technical Meeting on Fusion Data Processing, Validation and Analysis, 27-05-2019

Overview
1. Origins of probability
2. Frequentist methods and statistics
3. Principles of Bayesian probability theory
4. Monte Carlo computational methods
5. Applications: classification, regression analysis
6. Conclusions and references

Early history of probability
Earliest traces in Western civilization: Jewish writings, Aristotle. Notion of probability in law, based on evidence. Usage in finance. Usage and demonstration in gambling.

Middle Ages
The world is knowable, but uncertainty remains due to human ignorance. William of Ockham: Ockham's razor. Probabilis: a supposedly 'provable' opinion; counting of authorities. Later: degree of truth, on a scale. Quantification: law and faith → the Bayesian notion; gaming → the frequentist notion.

Quantification
17th century: Pascal, Fermat, Huygens. Comparative testing of hypotheses. Population statistics. 1713: Ars Conjectandi by Jacob Bernoulli: weak law of large numbers, principle of indifference. De Moivre (1718): The Doctrine of Chances.

Bayes and Laplace
Paper by Thomas Bayes (1763): inversion of the binomial distribution. Pierre Simon Laplace: practical applications in the physical sciences; 1812: Théorie Analytique des Probabilités.

Frequentism and sampling theory
George Boole (1854), John Venn (1866): sampling from a 'population', notion of 'ensembles'. Andrey Kolmogorov (1930s): axioms of probability. Early 20th-century classical statistics: Ronald Fisher, Karl Pearson, Jerzy Neyman, Egon Pearson. Applications in the social sciences, medicine and the natural sciences.

Maximum entropy and Bayesian methods
Statistical mechanics with Ludwig Boltzmann (1868): maximum entropy energy distribution; Josiah Willard Gibbs (1874–1878). Claude Shannon (1948): maximum entropy in frequentist language. Edwin Thompson Jaynes: Bayesian re-interpretation. Harold Jeffreys (1939): standard work on (Bayesian) probability. Richard T. Cox (1946): derivation of the laws of probability. Computers: flowering of Bayesian methods.

Frequentist probability
Probability = frequency. Straightforward cases: the number of 1s in 60 dice throws is ≈ 10, so $p = 1/6$; the probability of plasma disruption is $p \approx N_\mathrm{disr}/N_\mathrm{tot}$. Less straightforward: the probability of fusion electricity by 2050? The probability that the mass of Saturn satisfies $90\,m_A \le m_S < 100\,m_A$?

Populations vs. sample
(Figure: sampling from a population.)

Statistics
E.g. the weight $w$ of Belgian men: unknown but fixed for every individual. What is the average weight $\mu_w$ in the population? Random variable $W$; sample $W_1, W_2, \dots, W_n$; average weight: the statistic (estimator) $\bar W$. Central limit theorem:
$$W \sim p(W \mid \mu_w, \sigma_w) \;\Rightarrow\; \bar W \sim \mathcal{N}\!\left(\mu_w, \sigma_w/\sqrt{n}\right)$$
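The central limit theorem above is easy to check numerically. The following is a minimal sketch, not part of the original slides; the population parameters ($\mu_w = 80$, $\sigma_w = 10$) and the sample size are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of weights: mu_w = 80 kg, sigma_w = 10 kg (assumed values).
mu_w, sigma_w = 80.0, 10.0
n = 25               # sample size
n_samples = 100_000  # number of repeated samples

# Draw many samples of size n and compute the sample mean of each.
samples = rng.normal(mu_w, sigma_w, size=(n_samples, n))
means = samples.mean(axis=1)

# The CLT predicts  W_bar ~ N(mu_w, sigma_w / sqrt(n)).
print("empirical mean of W_bar:", means.mean())          # ~ 80
print("empirical std of W_bar :", means.std(ddof=1))     # ~ 10 / 5 = 2
print("CLT prediction         :", sigma_w / np.sqrt(n))  # 2.0
```

The empirical standard deviation of the sample means should come out close to $\sigma_w/\sqrt{n} = 2$, in line with the statement above.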
Maximum likelihood parameter estimation
Maximum likelihood (ML) principle:
$$\hat\mu_w = \arg\max_{\mu_w \in \mathbb{R}} p(W_1, \dots, W_n \mid \mu_w, \sigma_w) \approx \arg\max_{\mu_w \in \mathbb{R}} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_w} \exp\left[-\frac{(W_i - \mu_w)^2}{2\sigma_w^2}\right] = \arg\max_{\mu_w \in \mathbb{R}} \left(\frac{1}{\sqrt{2\pi}\,\sigma_w}\right)^{\!n} \exp\left[-\sum_{i=1}^{n} \frac{(W_i - \mu_w)^2}{2\sigma_w^2}\right]$$
ML estimator (known $\sigma_w$):
$$\hat\mu_w = \bar W = \frac{1}{n}\sum_{i=1}^{n} W_i$$

Frequentist hypothesis testing
Weight of Dutch men compared to Belgian men (populations). Observed sample averages $\bar W_{NL}$, $\bar W_{BE}$. Null hypothesis $H_0$: $\mu_{w,NL} = \mu_{w,BE}$. Test statistic:
$$\frac{\bar W_{NL} - \bar W_{BE}}{\sigma_{\bar W_{NL} - \bar W_{BE}}} \sim \mathcal{N}(0, 1) \quad \text{(under } H_0\text{)}$$

Probability theory: quantifying uncertainty
Statistical (aleatoric) vs. epistemic uncertainty? Every piece of information carries uncertainty. Uncertainty = lack of information; observation may reduce uncertainty. A probability (distribution) quantifies uncertainty.

Example: physical sciences
Measurement of a physical quantity. Origins of stochasticity: the apparatus, microscopic fluctuations. Systematic uncertainty is assigned a probability distribution. E.g. coin tossing, voltage measurement, the probability of one hypothesis vs. another, ... Bayesian view: no random variables.

What is probability?
Objective Bayesian view: probability is a real number $\in [0, 1]$, always conditioned on known information. Notation: $p(A \mid B)$ or $p(A \mid I)$. Extension of logic: a measure of the degree to which $B$ implies $A$. Degree of plausibility, but subject to consistency: the same information ⇒ the same probabilities. Probability distribution: outcome → probability.

Joint, marginal and conditional distributions
$p(x, y)$, $p(x)$, $p(y)$, $p(x \mid y)$, $p(y \mid x)$

Example: normal distribution
Normal/Gaussian probability density function (PDF):
$$p(x \mid \mu, \sigma, I) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]$$
Probability that $x_1 \le x < x_1 + dx$. Inverse problem: $\mu$, $\sigma$ given $x$?

Updating information states
Bayes' theorem, with $\mathbf{x}$ the data vector, $\boldsymbol{\theta}$ the vector of model parameters and $I$ the implicit knowledge:
$$p(\boldsymbol{\theta} \mid \mathbf{x}, I) = \frac{p(\mathbf{x} \mid \boldsymbol{\theta}, I)\, p(\boldsymbol{\theta} \mid I)}{p(\mathbf{x} \mid I)}$$
Likelihood: misfit between model and data. Prior distribution: 'expert' or diffuse knowledge. Evidence:
$$p(\mathbf{x} \mid I) = \int p(\mathbf{x}, \boldsymbol{\theta} \mid I)\, d\boldsymbol{\theta} = \int p(\mathbf{x} \mid \boldsymbol{\theta}, I)\, p(\boldsymbol{\theta} \mid I)\, d\boldsymbol{\theta}$$
Posterior distribution.

Practical considerations
Updating a state of knowledge = learning. 'Uninformative' prior (distribution). Assigning uninformative priors: transformation invariance (Jaynes '68), e.g. a uniform prior for a location variable. Show me the prior! Principle of indifference. Testable information: maximum entropy. ...
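Bayes' theorem can also be applied numerically before any analytic work. The sketch below is not from the original slides: it evaluates the unnormalized posterior for $\mu$ on a grid, for Gaussian data with known $\sigma_e$ and a uniform prior (the data values and grid limits are arbitrary assumptions), and it previews the analytic result $\mu \sim \mathcal{N}(\bar x, \sigma_e^2/n)$ derived next.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: n measurements with true mean 5.0 and known sigma_e = 2.0.
sigma_e = 2.0
x = rng.normal(5.0, sigma_e, size=20)

# Grid over the parameter mu, with a flat (uniform) prior on the grid.
mu_grid = np.linspace(0.0, 10.0, 2001)
log_prior = np.zeros_like(mu_grid)   # uniform prior, up to an additive constant

# Log-likelihood: sum_i log N(x_i | mu, sigma_e), up to a constant.
log_like = -0.5 * ((x[:, None] - mu_grid[None, :]) ** 2).sum(axis=0) / sigma_e**2

# Posterior p(mu | x, I) ∝ p(x | mu, I) p(mu | I), normalized on the grid.
log_post = log_like + log_prior
post = np.exp(log_post - log_post.max())
post /= np.trapz(post, mu_grid)

mean = np.trapz(mu_grid * post, mu_grid)
std = np.sqrt(np.trapz((mu_grid - mean) ** 2 * post, mu_grid))
print("posterior mean:", mean, "(sample mean:", x.mean(), ")")
print("posterior std :", std, "(sigma_e/sqrt(n):", sigma_e / np.sqrt(len(x)), ")")
```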
Mean of a normal distribution: uniform prior
$n$ measurements $x_i \to \mathbf{x}$. Independent and identically distributed $x_i$:
$$p(\mathbf{x} \mid \mu, \sigma, I) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[-\frac{(x_i - \mu)^2}{2\sigma^2}\right]$$
Bayes' rule: $p(\mu, \sigma \mid \mathbf{x}, I) \propto p(\mathbf{x} \mid \mu, \sigma, I)\, p(\mu, \sigma \mid I)$. Suppose $\sigma \equiv \sigma_e$ → delta function. Assume $\mu \in [\mu_\mathrm{min}, \mu_\mathrm{max}]$ → uniform prior:
$$p(\mu \mid I) = \begin{cases} \dfrac{1}{\mu_\mathrm{max} - \mu_\mathrm{min}}, & \text{if } \mu \in [\mu_\mathrm{min}, \mu_\mathrm{max}] \\[1ex] 0, & \text{otherwise} \end{cases}$$
Let $\mu_\mathrm{min} \to -\infty$, $\mu_\mathrm{max} \to +\infty$ ⇒ improper prior. Ensure a proper posterior.

Posterior for $\mu$
Posterior:
$$p(\mu \mid \mathbf{x}, I) \propto \exp\left[-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma_e^2}\right]$$
Define
$$\bar x \equiv \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad (\Delta x)^2 \equiv \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar x)^2$$
Adding and subtracting $n\bar x^2$ ('completing the square'),
$$p(\mu \mid \mathbf{x}, I) \propto \exp\left\{-\frac{1}{2\sigma_e^2/n}\left[(\mu - \bar x)^2 + (\Delta x)^2\right]\right\}$$
Retaining the dependence on $\mu$,
$$p(\mu \mid \mathbf{x}, I) \propto \exp\left[-\frac{(\mu - \bar x)^2}{2\sigma_e^2/n}\right], \qquad \mu \sim \mathcal{N}(\bar x, \sigma_e^2/n)$$

Mean of a normal distribution: normal prior
Normal prior: $\mu \sim \mathcal{N}(\mu_0, \tau^2)$. Posterior:
$$p(\mu \mid \mathbf{x}, I) \propto \exp\left[-\frac{\sum_{i=1}^{n}(x_i - \mu)^2}{2\sigma_e^2}\right] \times \exp\left[-\frac{(\mu - \mu_0)^2}{2\tau^2}\right]$$
Expanding and completing the square, $\mu \sim \mathcal{N}(\mu_n, \sigma_n^2)$, where
$$\mu_n \equiv \sigma_n^2\left(\frac{n}{\sigma_e^2}\,\bar x + \frac{1}{\tau^2}\,\mu_0\right) \quad \text{and} \quad \frac{1}{\sigma_n^2} \equiv \frac{n}{\sigma_e^2} + \frac{1}{\tau^2}$$
$\mu_n$ is a weighted average of $\mu_0$ and $\bar x$.

Care for the prior
An intrinsic part of Bayesian theory, enabling coherent learning. High-quality data may be scarce. Regularization of ill-posed problems (e.g. tomography). The prior can retain influence: e.g. regression with errors in all variables.

Unknown mean and standard deviation
Repeated measurements → information on $\sigma$. Scale variable $\sigma$ → Jeffreys' scale prior:
$$p(\sigma \mid I) \propto \frac{1}{\sigma}, \qquad \sigma \in\ ]0, +\infty[$$
Posterior:
$$p(\mu, \sigma \mid \mathbf{x}, I) \propto \frac{1}{\sigma^n}\exp\left[-\frac{(\mu - \bar x)^2 + (\Delta x)^2}{2\sigma^2/n}\right] \times \frac{1}{\sigma}$$

Marginal posterior for $\mu$ (1)
Marginalization = integrating out a (nuisance) parameter:
$$p(\mu \mid \mathbf{x}, I) = \int_0^{+\infty} p(\mu, \sigma \mid \mathbf{x}, I)\, d\sigma \propto \left[\frac{(\mu - \bar x)^2 + (\Delta x)^2}{2/n}\right]^{-\frac{n}{2}} \int_0^{+\infty} s^{\frac{n}{2}-1} e^{-s}\, ds = \frac{1}{2}\,\Gamma\!\left(\frac{n}{2}\right)\left[\frac{(\mu - \bar x)^2 + (\Delta x)^2}{2/n}\right]^{-\frac{n}{2}}$$
where
$$s \equiv \frac{(\mu - \bar x)^2 + (\Delta x)^2}{2\sigma^2/n}$$

Marginal posterior for $\mu$ (2)
After normalization:
$$p(\mu \mid \mathbf{x}, I) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi(\Delta x)^2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\left[1 + \frac{(\mu - \bar x)^2}{(\Delta x)^2}\right]^{-\frac{n}{2}}$$
Changing variables,
$$t \equiv \frac{\mu - \bar x}{\sqrt{(\Delta x)^2/(n-1)}}, \quad \text{with } p(t \mid \mathbf{x}, I)\, dt \equiv p(\mu \mid \mathbf{x}, I)\, d\mu,$$
$$p(t \mid \mathbf{x}, I) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{(n-1)\pi}\,\Gamma\!\left(\frac{n-1}{2}\right)}\left(1 + \frac{t^2}{n-1}\right)^{-\frac{n}{2}}$$
Student's t-distribution with parameter $\nu = n - 1$. If $n \gg 1$,
$$p(\mu \mid \mathbf{x}, I) \longrightarrow \frac{1}{\sqrt{2\pi(\Delta x)^2/n}}\exp\left[-\frac{(\mu - \bar x)^2}{2(\Delta x)^2/n}\right]$$

Marginal posterior for $\sigma$
Marginalization of $\mu$:
$$p(\sigma \mid \mathbf{x}, I) = \int_{-\infty}^{+\infty} p(\mu, \sigma \mid \mathbf{x}, I)\, d\mu \propto \frac{1}{\sigma^n}\exp\left[-\frac{(\Delta x)^2}{2\sigma^2/n}\right]$$
Setting $X \equiv n(\Delta x)^2/\sigma^2$,
$$p(X \mid \mathbf{x}, I) = \frac{1}{2^{\frac{k}{2}}\,\Gamma\!\left(\frac{k}{2}\right)}\, X^{\frac{k}{2}-1} e^{-\frac{X}{2}}, \qquad k \equiv n - 1$$
$\chi^2$ distribution with parameter $k$.

The Laplace approximation (1)
Laplace (saddle-point) approximation of a distribution around its mode (= maximum). E.g. the marginal for $\mu$:
$$p(\mu \mid \mathbf{x}, I) = \frac{\Gamma\!\left(\frac{n}{2}\right)}{\sqrt{\pi(\Delta x)^2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\left[1 + \frac{(\mu - \bar x)^2}{(\Delta x)^2}\right]^{-\frac{n}{2}}$$
Taylor expansion around the mode:
$$\ln p(\mu \mid \mathbf{x}, I) \approx \ln p(\bar x \mid \mathbf{x}, I) + \frac{1}{2}(\mu - \bar x)^2 \left.\frac{d^2(\ln p)}{d\mu^2}\right|_{\mu = \bar x} = \ln\Gamma\!\left(\frac{n}{2}\right) - \ln\Gamma\!\left(\frac{n-1}{2}\right) - \frac{1}{2}\ln\left[\pi(\Delta x)^2\right] - \frac{n}{2(\Delta x)^2}(\mu - \bar x)^2$$
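As a quick numerical check of the Laplace approximation above, the sketch below (not part of the original slides; the data are synthetic and the sample size is an arbitrary assumption) compares the exact Student-t marginal posterior for $\mu$ with the Gaussian $\mathcal{N}(\bar x, (\Delta x)^2/n)$ whose variance follows from the second derivative at the mode, i.e. the large-$n$ limit quoted earlier.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)

# Synthetic data (assumed values, for illustration only).
x = rng.normal(5.0, 2.0, size=15)
n = len(x)
xbar = x.mean()
dx2 = ((x - xbar) ** 2).mean()   # (Delta x)^2 as defined on the slides

mu = np.linspace(xbar - 4 * np.sqrt(dx2 / n), xbar + 4 * np.sqrt(dx2 / n), 1001)

# Exact marginal posterior: Student's t with nu = n - 1 (normalized in mu).
log_norm = gammaln(n / 2) - gammaln((n - 1) / 2) - 0.5 * np.log(np.pi * dx2)
p_exact = np.exp(log_norm - (n / 2) * np.log1p((mu - xbar) ** 2 / dx2))

# Laplace / large-n approximation: Gaussian with mean xbar and variance dx2/n.
p_laplace = np.exp(-((mu - xbar) ** 2) / (2 * dx2 / n)) / np.sqrt(2 * np.pi * dx2 / n)

print("max |exact - Laplace| over the grid:", np.abs(p_exact - p_laplace).max())
```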