The Theory That Would Not Die Reviewed by Andrew I
Total Page:16
File Type:pdf, Size:1020Kb
Book Review The Theory That Would Not Die Reviewed by Andrew I. Dale The solution, given more geometrico as Proposi- The Theory That Would Not Die: How Bayes’ tion 10 in [2], can be written today in a somewhat Rule Cracked the Enigma Code, Hunted Down Russian Submarines, and Emerged Triumphant compressed form as from Two Centuries of Controversy posterior probability ∝ likelihood × prior probability. Sharon Bertsch McGrayne Yale University Press, April 2011 Price himself added an appendix in which he used US$27.50, 336 pages the proposition in a prospective sense to find the ISBN-13: 978-03001-696-90 probability of the sun’s rising tomorrow given that it has arisen daily a million times. Later use In the early 1730sThomas Bayes (1701?–1761)was was made by Laplace, to whom one perhaps really appointed minister at the Presbyterian Meeting owes modern Bayesian methods. House on Mount Sion, Tunbridge Wells, a town that The degree to which one uses, or even supports, had developed around the restorative chalybeate Bayes’s Theorem (in some form or other) depends spring discovered there by Dudley, Lord North, in to a large extent on one’s views on the nature 1606. Apparently not one who was a particularly of probability. Setting this point aside, one finds popular preacher, Bayes would be recalled today, that the Theorem is generally used to update if at all, merely as one of the minor clergy of (to justify the updating of?) information in the eighteenth-century England, who also dabbled in light of new evidence as the latter is received, mathematics. How is it, then, that Roger Farthing, resulting in a strengthening of one’s belief. Bayes’s author of an excellent history of Mount Sion, could Theorem provides a codification of “learning from describe Bayes as “to my mind, the greatest man experience”, and The Theory That Would Not Die to have lived in Tunbridge Wells” ([5, p. 167])? The is concerned with the investigation and exposition answer is fairly simple: Bayesian statistics. of situations in which such learning has been both In 1763 Richard Price forwarded to the Royal required and achieved. Society of London an essay by Bayes [2] in which A globally acceptable definition of Bayesianism, the following problem was addressed: or the Bayesian method, seems hardly possible, for Given the number of times in which there are perhaps almost as many definitions as an unknown event has happened there are practitioners of the art. A useful and and failed: Required the chance generally acceptable description is given as fol- that the probability of its happen- lows by Anthony O’Hagan: “The Bayesian method ing in a single trial lies between any briefly comprises the following principal steps. two degrees of probability that can Likelihood …Prior …Posterior …Inference” ([8, p. be named. 10]). It is also considered essential that the prior Andrew I. Dale is emeritus professor of statistics at the probability distributions be explicitly given (if University of KwaZulu-Natal in Durban, South Africa. His these distributions are estimated from the data, email address is [email protected]. the method, developed in the 1950s, is known as DOI: http://dx.doi.org/10.1090/noti839 Empirical Bayesianism). May 2012 Notices of the AMS 657 Laplace initially gave Bayes’s Theorem in a LX is p and that without is q. Then observe a simplified form as further number m with the two numbers being r n and s, respectively. Now repeat this experiment Pr[Ai|E] = Pr[E|Ai ] P1 Pr[E|Aj ], (with the same numbers n and m) in different where, for instance, Pr[E|Ai] is the probability of situations, and see whether “the distribution of the event E given the cause Ai . This formula was observed values of r could be compared with the developed in his later work to the continuous form theoretical distributions which on Bayes’ Theorem one knows today, and he immediately applied it in the knowledge of n and p should enable us to his scientific investigations. A large study of births predict” ([9, p. 396]). (more accurately baptisms) in France suggested to The damage done by German U-boats to Allied Laplace with strong probability that the chance of shipping during the Second World War made it the birth of a boy exceeded that of a girl in Paris imperative for the Allies to fathom the workings (and similarly in London). of the Enigma machines. These machines, used by Most statisticians these days are acquainted the Germans to send encoded messages, became with the theoretical aspects of Bayesianism. Less more and more sophisticated and complicated as well known, I suspect, are certain areas of practical the war progressed. Intensive research by a ded- application, and McGrayne’s book is a “find” in icated team under the leadership of Alan Turing this respect. So many, indeed, are the different at Bletchley Park in England eventually resulted applications considered that we cannot mention in the cracking of the codes (the difficulty was more than a few here. considerably eased when a code book of encrypt- Despite Laplace’s work, his successors saw little ing tables was obtained from a sinking German virtue in Bayesian methods and almost succeeded submarine off Egypt). McGrayne writes “Turing in killing them off. One who kept the flickering was developing a homegrown Bayesian system. flame alight was Joseph Louis François Bertrand Finding the Enigma settings that had encoded a with his work on projectiles for artillery field particular message was a classic problem in the officers, and later, in the famous Dreyfus affair, inverse probability of causes” (p. 68). the defence witness Henri Poincaré strongly sup- During the war Turing paid a visit to the United ported the updating of the probability, in the light States to discuss the work done at Bletchley Park of later evidence, of the truth of a hypothesis as with the U.S. Navy cryptographers. However “he to whether a certain letter was a forgery. tried in vain to explain the general principle that Come the Great War, statisticians in the United confirming inferences suggested by a hypothesis States found themselves using Bayesian methods would make the hypothesis itself more probable” in connection with the making of decisions about (pp. 75–6). Here Turing met Claude Shannon, who injured workers and telephone communications. was working intensively on information theory. Yet these methods did not meet with general “Roughly speaking, if the posterior in a Bayesian approval: “Early in the twentieth century” writes equation is quite different from the prior, some- McGrayne, “theoreticians would change their atti- thing has been learned; but when a posterior is tudes toward Bayes’ rule from tepid toleration to basically the same as the prior guess, the in- outright hostility” (p. 45)1. formation content is low” (p. 77). Turing’s work Sporadic work continued after the war, and in was not altogether dismissed, however, for Eisen- 1925 Egon Sharpe Pearson published a long paper hower later said that “Bletchley Park’s decoders [9] in which he sought to put the theoretical rule had shortened the war in Europe by at least two to the test—“the most extensive exploration of years” (p. 81). Bayesian methods conducted between Laplace in However, the success achieved by Bayesian the 1780s and the 1960s” (p. 49). Before the candle methods during the war was not generally appre- could be entirely guttered, the flame spluttered up in the work of Èmile Borel, Frank Plump- ciated, it having been decided by those in authority ton Ramsey, and Bruno de Finetti and in the that such results were to be kept classified. Bayes’s face of strong opposition from Ronald Aylmer Theorem was still seen as suspect: “‘Bayes’ still Fisher and Jerzy Neyman. Only the sterling work meant equal priors and did not yet mean making of Harold Jeffreys—largely and undeservedly ig- inferences, conclusions, or predictions based on nored by statisticians—at Cambridge kept the updating observational data” (p. 87). resuscitation on the go. Part III of McGrayne’s book records “the glorious Pearson’sidearanasfollows:letasampleofsize revival”. The first chapter here is devoted to the n (say of taxicabs in London streets) be observed work of Arthur Bailey, who made extensive and in which the number of taxis having registration important use of Bayes’s Theorem in his work on credibility in casualty insurance as chief actuary 1References to page numbers without a citation are to the at the New York State Insurance Department. His book being reviewed. reason for so doing was made quite clear: “It 658 Notices of the AMS Volume 59, Number 5 will be realized that all of the problems in which As the century progressed, Bayesian methods credibilities are used are problems in statistical became more widely used. In the business field estimation” ([1, p. 8]). the prime developers were Howard Raiffa and Chapter 8 details the entry of Bayesian methods Robert Osher Schlaifer. The emphasis was now into medical research, chiefly through the efforts, switching from (merely) analyzing data to mak- McGrayne claims, of Jerome Cornfield. The matter ing decisions, and McGrayne notes that “some treated here is the connection of smoking with business Bayesians even dropped the prior odds lung cancer and, later, heart disease. Small studies called for by Bayes’s rule” (p. 149). Whether such had been carried out before the Second World War, a move would be accepted as Bayesian might in but the retrospective large study [3] published by fact be denied by many: Jeffreys, for instance, Richard Doll and Austin Bradford Hill in 1950 “was interested in making inferences from sci- gave firm (conclusive?) evidence of the connection entific evidence, not in using statistics to guide between cigarette smoking and carcinoma of the future action.