Enigma of Karl Pearson and Bayesian Inference

The Enigma of Karl Pearson and Bayesian Inference John Aldrich University of Southampton Karl Pearson sesquicentenary conference, the Royal Statistical Society Friday 23 March 2007 An enigmatic position in the history of the theory of probability is occupied by Karl Pearson.* Harold Jeffreys Theory of Probability (1939) * the theory of probability = Bayesian theory the enigma Pearson was a Bayesian in philosophy of science but not in statistics where he used the method of moments in estimation and P values in testing HJ felt the enigma because He first knew Pearson as a philosopher of science Pearson should have written the Theory of Probability – he should have been me! Historians of economics have an expression … In the 19th century German scholars wondered how Adam Smith could have written two very different books on ethics and economics. So das Adam Smith problem. So with reference to the integrity of KP’s work das Karl Pearson Problem An easy solution to the problem? The philosophy that impressed Jeffreys is in the Grammar of Science. The relevant parts were written in 1891-2 before KP did any statistical work. KP changed his mind or he never had a mind to do Bayesian statistics but … but … but KP believed his philosophical tenets were relevant to his statistical practice and never changed his mind about this or about the tenets KP did Bayesian statistical work at all stages of his career (HJ seems to have been unaware of this) KP even held that inverse probability is the basis of modern statistical theory The problem is deeper than HJ realised My goals to describe Pearson’s Bayesian philosophy of science to describe his Bayesian statistics to speculate on the link between this Bayesianism and the frequentism in most of his statistical writing Most of his statistical writing !!!!!!!!???????? KP wrote 400 papers over 40 years (1894- 1936) 400 observations on his doing the doing changed no Theory of Probability —no summa Identifying the big picture … No treatise or textbook by KP, only Tables for Statisticians and Biometricians 1914 & 1931 Lectures as reported by Yule 1894-6 Gosset 1907 and Isserlis 1913 KP’s own scrappy lecture notes Yule later reflected on his 1894-6 notes A straightforward, organized, logically developed course could hardly then exist when the very elements of the subject were being developed... but to modern eyes there was never a “straightforward, organized, logically developed course” Textbooks by the followers ? W. P. Elderton Frequency-Curves and Correlation (1906) faithful but limited G. U. Yule Introduction to the Theory of Statistics (1911) more an introduction to Yulean statistics Retrospectives on KP’s statistics Egon Pearson’s biography (1936/8) Churchill Eisenhart’s DSB article (1974) neither very detailed Egon notices the Bayesian project(s)—he was part of them from 1920 Eisenhart misses them—more or less Other impressions from the literature Bayes gone by 1900 or never there Stigler and Porter—general histories Bayes (or Laplace) always there Dale History of Inverse Probability from Thomas Bayes to Karl Pearson Hald A History of Mathematical Statistics from 1750 to 1930 The career 1870s undergraduate—Bayesian training 1880s otherwise occupied–elasticity & other matters 1890s establishing the permanent way 1900s the steady state—Bayes and non-Bayes coexist 1910s accommodating small sample theory 1920s Bayesian break-out 1930s the end When Karl was Carl The undergraduate Carl Pearson produced a nice set of notes on the Theory of Probability from lectures by Henry Martyn Taylor. Our statistical inference appears under the headings Probability of Future Events based on Experience Errors of Observation & Method of Least Squares “Of the probability of future events deduced from experience” This section covers our Bayes’ theorem the rule of succession the probability of a future success given m successes in n Bernoulli trials with a uniform prior for the probability of a success closer to their Bayes’ theorem Theory of Errors and Least squares To find the most probable value of the true measurement is a “problem in inverse chances” Normal errors and uniform prior for the regression coefficients Ultimately Gauss’s method (1809). Over the next decade Pearson does all sorts of things establishes himself as an applied mathematician specialising in elasticity. but no probability 1890s: laying the permanent way— the lines on which thoughts travel probability in the Grammar of Science 1892 statistical techniques needed for biometry in publications from 1894 onwards The probability line : probability and projectability That a certain sequence has occurred and recurred in the past is a matter of experience to which we give expression in the concept causation that it will continue to recur in the future is a matter of belief to which we give expression in the concept probability. Subjective or objective probability? KP after a "middle road" between De Morgan, Mill and Jevons who are "pushing the possibilities of the theory of probability in too wide and unguarded a manner" and Boole and Venn who are "taking a severely critical and in some respects too narrow view of them." We believe in recurrence because we have seen it in the past we can apply the rule of succession this is based on a uniform prior or equal distribution of ignorance but that is not unreasonable A uniform knowledge prior: against Venn & Boole and after Edgeworth We take our stand upon the fact that probability-constants occurring in nature present every variety of fractional value and that natural constants in general are found to show no preference for one number rather than another. this uniform knowledge prior is unquestioned until 1917/1920 It is in all KP’s lecture notes and in all his publications on prediction In 1917/20 KP challenges it in 2 opposite ways Specific experience may contradict it It is an unnecessary assumption The statistical lines Method of moments 1894 non-Bayes Pearson curves 1895 non-Bayes Correlation & regression 1896 Bayes ? Probable errors 1898 non-Bayes χ2 and P-values 1900 non-Bayes Estimation problems in biometry A mixture of normals by the method of moments (1894) Pearson curves by the method of moments (1895) Normal regression and correlation by the Gaussian method (1896) Doing what comes naturally I: Carl Pearson’s Gaussian method Finding the “best value of correlation” for a bivariate normal population using the Gaussian method is straightforward Estimating a mixture of normals using the Gaussian method is not What comes naturally II To the elastician Think of the distribution of the observations as a material lamina with mechanical properties— moments Imagine the visible lamina is formed from two invisible component laminae The moments of the visible lamina are determined by the moments of the invisibles and vice versa KP’s comments on methods The method adopted [method of moments] has only been chosen after many trials and errors Other methods produce exponential equations the solution of which seems more beyond the wit of man than that of a numerical equation even of the ninth order Horses for courses ? KP never applied or discussed the application of the Gaussian method to estimating the Pearson curves The method of moments took on a life of its own in curve fitting and perhaps KP judged the Gaussian method unsuited or impractical 1900s: an equilibrium ? Bayes for prediction non-Bayes for everything else Equilibrium I : repeated sampling If a number of random samples be taken any statistical constant will vary from sample to sample, and its variation is distributed according to some law round the actual value of the constant for the total population. This variation would be very properly measured by the standard deviation of the constant for an indefinitely great series of random samples. probable errors papers in 1903, -13, -20 Equilibrium II : “On the Influence of Past Experience on Future Expectation” 1907 One and all, we act on the principle that the statistical ratio determined from our past experience will hold, at any rate approximately, for the near future. I start as most mathematical writers have done with “the equal distribution of ignorance,” or Bayes’ Theorem. The frequency of future samples is given by a certain hypergeometrical series, which is not at all closely approximated by a Gaussian curve …. “Fundamental problems” 1907 The fundamental problem in physics is the permanence of repeated sequences, without change of conditions The fundamental problem in statistics is the permanence of statistical ratios, without change of conditions. Projectability not just for philosophers The medical statistician Major Greenwood did the calculations Pearson planned The tables appear in Tables for Statisticians and Biometricians 1914 Examples include In a batch of 79 recruits from a certain regiment four were found to be syphilitic. What number of syphilitics may be anticipated in a further batch of 40 recruits? The 1917 development—out of small sample theory Pearson 1896/8 had found the large sample distribution of the correlation coefficient—though if you blink it’s a Bayes posterior. Student (1908) investigated the form of the small sample distribution. Soper (1913) tried to improve on Student (“I am indebted to Professor Karl Pearson for drafting the lines of this investigation and for critical supervision.”) R. A. Fisher 1915 In 1915 Pearson published Fisher’s paper on the extract distribution of the correlation coefficient KP was most struck by Fisher’s adjustment of r to obtain the “most probable value” of ρ KP on Fisher’s “use” of a uniform prior Statistical workers cannot be too often reminded that there is no validity in a mathematical theory pure and simple. Bayes’ Theorem must be based on experience, the experience where we are á priori in ignorance all values are equally likely to occur.

Enigma of Karl Pearson and Bayesian Inference

F:\RSS\Me\Society's Mathemarica

Two Principles of Evidence and Their Implications for the Philosophy of Scientific Method

National Academy Elects IMS Fellows Have You Voted Yet?

The Likelihood Principle

Memorial to Sir Harold Jeffreys 1891-1989 JOHN A

A Parsimonious Tour of Bayesian Model Uncertainty

Orme) Wilberforce (Albert) Raymond Blackburn (Alexander Bell

School of Social Sciences Economics Division University of Southampton Southampton SO17 1BJ, UK

The Enigma of Karl Pearson and Bayesian Inference

Frequentist and Bayesian Statistics

Mathematical Statistics and the ISI (1885-1939)

Statistical Inference a Work in Progress