Bayes Theorem/Rule, a First Intro


Until the mid-1700s, the theory of probabilities (as distinct from theories of valuation like expected utility theory) was focussed almost entirely on estimating the likelihood of uncertain future events: lotteries, coin flips, or life expectancies. This class of probability estimate is often called aleatory probability, from the Latin aleator, meaning gambler. In law, aleatory contracts are those in which the signatories to the contract both risk loss or gain in the face of future uncertainty. A life insurance policy is an example of an aleatory contract.

Aleatory uncertainties are exactly the kind of probabilistic events that Pascal had envisioned as the subject of a calculus of probabilities. Regardless of whether or not the world is as truly deterministic as Descartes and Galileo hoped, we often do not know what will happen in the future. We do not know when a particular individual will die or whether a particular coin will land heads or tails up if it is flipped. Pascal’s probability theory was designed to model events of this type.

In the second half of the eighteenth century two men revolutionized the calculus of probability when they realized that one could apply this probability theory not just to assess the likelihood of future events, but also to assess the likelihood of past events. While this may seem a small thing, it changed the way Europeans thought about the mathematics of probability and opened the way to a more formal theory of decision making.

Consider an uncertain situation which was of tremendous interest to both the English reverend Thomas Bayes and the French mathematician Pierre-Simon Laplace. An astronomer measures the angular altitude of Jupiter six times in rapid succession and gets six slightly different numbers. Jupiter has a single altitude, but we have six imperfect observations of that altitude, all of which differ.
What, we might ask, was the most likely actual altitude of Jupiter at the time that we made our observations? It was Thomas Bayes’ insight, published posthumously in 1763, that probability theory could be extended to answer questions of this type as well. Bayes reasoned that if one knew the distribution of errors induced by the astronomer’s instruments, then one could mathematically infer the most likely true altitude of Jupiter when the observations were made. It is important to note that there is nothing aleatory about this kind of probability. At the time the measurement was made Jupiter certainly had an altitude. The only uncertainty derives from our own lack of knowledge. The limitation that we face in this example is entirely epistemological. Bayes was suggesting that probability theory could be used to describe epistemological uncertainty as well as aleatory uncertainty.

Unfortunately, little is known about the historical Thomas Bayes. We do know that he was a rural Protestant theologian and minister who was not a member of the Church of England: a Dissenter. He published only two works during his life: a theological work entitled Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government Is the Happiness of His Creatures, and a more mathematical work, An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of The Analyst, in which he defended Newton’s calculus against an attack by the philosopher Bishop George Berkeley. After his death, Bayes’ friend and executor Richard Price discovered amongst his papers a third manuscript, entitled Essay Towards Solving a Problem in the Doctrine of Chances. Price presented that paper at the Royal Society in London in 1763, and it is entirely upon that work that Bayes’ quite considerable fame rests. Today Bayes is such a towering name in mathematics that it seems astonishing that we know so little about him.
We do not, for example, know why he was elected a fellow of the Royal Society before his death. In fact, the only picture of Bayes that we have may not even be a portrait of him. The historical Bayes is almost a total mystery. To his contemporaries that may not have been terribly surprising; the posthumous publication of Bayes’ essay in the Philosophical Transactions of the Royal Society had almost no impact until Laplace rediscovered it about 10 years later.

Bayes’ insight was profound. He realized that there are many events about which we have only partial or inaccurate knowledge: events which truly happened but about which we are, because of our limited knowledge, uncertain. It was Bayes who first realized that a mathematically complete kind of inverse probability could be used to infer the most likely values or properties of those events.¹ Bayes’ theorem provides the basis for a fundamentally statistical approach to this kind of epistemological uncertainty. It does this by putting, on a rigorous mathematical footing, the process of predicting the likelihood of all possible previous states of the world given one’s available observations. Put in English, Bayes’ theorem allows us to ask the following question: given my knowledge of how often I have observed that the world appeared to be in state x, and my knowledge of how well correlated my current sensory data is with the actual world state x, precisely how likely is it that the world was actually in state x?

Bayes’ theorem is so important that I want to digress here to present a fairly complete example of how the mathematics of the theorem works. Imagine that you are a monkey trained to fixate a spot of light while two eccentric spots of light are also illuminated, just as in the example presented in chapter five. In this experiment, however, the central fixation light changes color to indicate which of the two eccentric target lights, the left one or the right one, will serve as your goal on this trial.
If you can decide which target is the goal, and look at it, you receive a raisin as a reward. However, the color of the central fixation light (or more precisely the wavelength of the light emitted by the central stimulus) can be any one of a hundred different hues (or wavelengths). We can begin our Bayesian description of this task by saying that there are two possible world states: one state in which a leftward eye movement will be rewarded and one state in which a rightward eye movement will be rewarded.

Figure 8.1: Bayesian graphs of the example.

In mathematical notation we designate these two world states as w1 and w2. State w1 is when a leftward eye movement, or saccade, will be rewarded and state w2 is when a rightward saccade will be rewarded. After observing 100 trials we discover that on 25% of trials a leftward movement was rewarded, irrespective of the color of the fixation light, and on 75% of trials the rightward movement was rewarded. Based upon this observation we can say that the prior probability that world state w1 will occur (known as P(w1)) is 0.25, and the prior probability of world state w2 (P(w2)) is 0.75.

1. As Stephen Stigler has pointed out, Thomas Simpson was really the first mathematician to propose the idea of inverse probabilities, but it was Bayes who developed the mathematical approach on which modern inverse probabilities are based (Stigler, 1989).

To make these prior probabilities more accurate estimates of the state of the world we next have to take into account the color of the central fixation stimulus and the correlation of that stimulus color with each of the world states. To do that we need to generate a graph which plots the probability that we will encounter a particular stimulus wavelength (which we will call λ) when the world is in state w1. Figure 8.5a plots an example of such a probability density function, showing the likelihood of each value of λ when the world is in state w1, and when it is in state w2.
We refer to this as the conditional probability density function for λ in world state w1, or P(λ|w1).

Next, in order to get the two graphs in Figure 8.5a to tell us how likely it is that we see a given λ and the world is in a given state, we have to correct these graphs for the overall likelihood that the world is in either state w1 or state w2. To do that we multiply each point on the graphs by the prior probability of that world state. The graph on the left thus becomes P(λ|w1)P(w1), where P(w1) is the prior probability for world state w1 as described above. Note in Figure 8.5b that this has the effect of re-scaling the graphs that appeared in Figure 8.5a.

Finally, we have to determine how likely it is that any given value of λ will occur regardless of world state. To do this we simply need to count up all the times that we have seen λ at a specific value and then plot the probability density function for all values of λ (irrespective of which movement was rewarded), as shown in Figure 8.5c.

Now we are ready to ask: when we see a given wavelength of light, what is the likelihood that on this trial a leftward movement will be rewarded (that we are in world state w1), and what is the likelihood that a rightward movement will be rewarded (world state w2)?
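
The three quantities described above combine in Bayes’ rule: P(w1|λ) = P(λ|w1)P(w1) / P(λ), where P(λ) = P(λ|w1)P(w1) + P(λ|w2)P(w2). What follows is only a minimal sketch of that calculation in Python, not a reconstruction of the figures. Only the priors (0.25 and 0.75) come from the example; the hue indexing from 0 to 99, the bell-shaped likelihood curves, and their centers and width are hypothetical values chosen purely for illustration.

```python
import math

N_HUES = 100  # the example's hundred possible fixation-light hues, indexed 0..99

# Priors from the example: leftward (w1) rewarded on 25% of trials,
# rightward (w2) rewarded on 75% of trials.
PRIOR = {"w1": 0.25, "w2": 0.75}

# ASSUMPTION: hypothetical likelihood shapes, not taken from the original figures.
CENTER = {"w1": 35.0, "w2": 65.0}
WIDTH = 12.0


def likelihood(state):
    """Return P(hue | state) as a list over hue indices that sums to 1."""
    raw = [math.exp(-0.5 * ((h - CENTER[state]) / WIDTH) ** 2) for h in range(N_HUES)]
    total = sum(raw)
    return [r / total for r in raw]


def posterior(hue):
    """Return {state: P(state | hue)} computed with Bayes' rule."""
    # Joint term P(hue | state) * P(state): the re-scaled curves.
    joint = {s: likelihood(s)[hue] * PRIOR[s] for s in PRIOR}
    # Evidence P(hue): how likely this hue is regardless of world state.
    evidence = sum(joint.values())
    return {s: joint[s] / evidence for s in joint}


if __name__ == "__main__":
    for hue in (20, 50, 80):
        post = posterior(hue)
        print(hue, round(post["w1"], 3), round(post["w2"], 3))
```

Under these assumed curves, a hue midway between the two centers carries almost no information, so the posterior stays close to the priors, while a hue far toward one center can outweigh the prior entirely.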