A Short History of Markov Chain Monte Carlo: Subjective Recollections from Incomplete Data
Total Page:16
File Type:pdf, Size:1020Kb
STS stspdf v.2011/02/18 Prn:2011/02/22; 10:52 F:sts351.tex; (Svajune) p. 1 Statistical Science 2011, Vol. 0, No. 00, 1–14 DOI: 10.1214/10-STS351 © Institute of Mathematical Statistics, 2011 1 52 2 A Short History of Markov Chain Monte 53 3 54 4 Carlo: Subjective Recollections from 55 5 1 56 6 Incomplete Data 57 7 58 8 Christian Robert and George Casella 59 9 60 10 61 11 62 This paper is dedicated to the memory of our friend Julian Besag, 12 63 a giant in the field of MCMC. 13 64 14 65 15 66 16 Abstract. We attempt to trace the history and development of Markov chain 67 17 Monte Carlo (MCMC) from its early inception in the late 1940s through its 68 18 use today. We see how the earlier stages of Monte Carlo (MC, not MCMC) 69 19 research have led to the algorithms currently in use. More importantly, we see 70 20 how the development of this methodology has not only changed our solutions 71 21 to problems, but has changed the way we think about problems. 72 22 73 Key words and phrases: Gibbs sampling, Metropolis–Hasting algorithm, 23 74 hierarchical models, Bayesian methods. 24 75 25 76 26 1. INTRODUCTION We will distinguish between the introduction of 77 27 Metropolis–Hastings based algorithms and those re- 78 28 Markov chain Monte Carlo (MCMC) methods have lated to Gibbs sampling, since they each stem from 79 29 been around for almost as long as Monte Carlo tech- radically different origins, even though their mathe- 80 30 niques, even though their impact on Statistics has not matical justification via Markov chain theory is the 81 31 been truly felt until the very early 1990s, except in same. Tracing the development of Monte Carlo meth- 82 32 the specialized fields of Spatial Statistics and Image ods, we will also briefly mention what we might call 83 33 Analysis, where those methods appeared earlier. The the “second-generation MCMC revolution.” Starting in 84 34 emergence of Markov based techniques in Physics is the mid-to-late 1990s, this includes the development of 85 35 a story that remains untold within this survey (see Lan- particle filters, reversible jump and perfect sampling, 86 36 dau and Binder, 2005). Also, we will not enter into and concludes with more current work on population 87 37 a description of MCMC techniques. A comprehensive or sequential Monte Carlo and regeneration and the 88 38 treatment of MCMC techniques, with further refer- computing of “honest” standard errors. 89 39 ences, can be found in Robert and Casella (2004). As mentioned above, the realization that Markov 90 40 chains could be used in a wide variety of situations 91 41 only came (to mainstream statisticians) with Gelfand 92 42 Christian Robert is Professor of Statistics, CEREMADE, and Smith (1990), despite earlier publications in the 93 43 Université Paris Dauphine, 75785 Paris cedex 16, France statistical literature like Hastings (1970), Geman and 94 (e-mail: [email protected]). George Casella is 44 Geman (1984) and Tanner and Wong (1987). Several 95 Distinguished Professor, Department of Statistics, 45 reasons can be advanced: lack of computing machinery 96 University of Florida, Gainesville, Florida 32611, USA 46 (think of the computers of 1970!), or background on 97 (e-mail: casella@ufl.edu). 47 Markov chains, or hesitation to trust in the practicality 98 1A shorter version of this paper appears as Chapter 1 in Hand- 48 book of Markov Chain Monte Carlo (2011), edited by Steve of the method. It thus required visionary researchers 99 49 Brooks, Andrew Gelman, Galin Jones and Xiao-Li Meng. Chap- like Gelfand and Smith to convince the community, 100 50 man & Hall/CRC Handbooks of Modern Statistical Methods, Boca supported by papers that demonstrated, through a se- 101 51 Raton, Florida. ries of applications, that the method was easy to un- 102 1 STS stspdf v.2011/02/18 Prn:2011/02/22; 10:52 F:sts351.tex; (Svajune) p. 2 2 C. ROBERT AND G. CASELLA 1 derstand, easy to implement and practical (Gelfand Eckhardt, 1987) to simulate from nonuniform distrib- 52 2 et al., 1990; Gelfand, Smith and Lee, 1992; Smith and utions. Without computers, a rudimentary version in- 53 3 Gelfand, 1992; Wakefield et al., 1994). The rapid emer- vented by Fermi in the 1930s did not get any recog- 54 4 gence of the dedicated BUGS (Bayesian inference Us- nition (Metropolis, 1987). Note also that, as early as 55 5 ing Gibbs Sampling) software as early as 1991, when 1949, a symposium on Monte Carlo was supported 56 6 a paper on BUGS was presented at the Valencia meet- by Rand, NBS and the Oak Ridge laboratory and that 57 7 ing, was another compelling argument for adopting, at Metropolis and Ulam (1949) published the very first 58 8 large, MCMC algorithms.2 paper about the Monte Carlo method. 59 9 60 2.1 The Metropolis et al. (1953) Paper 10 2. BEFORE THE REVOLUTION 61 11 The first MCMC algorithm is associated with a sec- 62 12 Monte Carlo methods were born in Los Alamos, ond computer, called MANIAC, built3 in Los Alamos 63 13 New Mexico during World War II, eventually result- under the direction of Metropolis in early 1952. Both 64 14 ing in the Metropolis algorithm in the early 1950s. a physicist and a mathematician, Nicolas Metropolis, 65 15 While Monte Carlo methods were in use by that time, who died in Los Alamos in 1999, came to this place in 66 16 MCMC was brought closer to statistical practicality by April 1943. The other members of the team also came 67 17 the work of Hastings in the 1970s. to Los Alamos during those years, including the con- 68 18 What can be reasonably seen as the first MCMC al- troversial Teller. As early as 1942, he became obsessed 69 19 gorithm is what we now call the Metropolis algorithm, with the hydrogen (H) bomb, which he eventually man- 70 20 published by Metropolis et al. (1953). It emanates from aged to design with Stanislaw Ulam, using the better 71 21 the same group of scientists who produced the Monte computer facilities in the early 1950s. 72 22 Carlo method, namely, the research scientists of Los Published in June 1953 in the Journal of Chemical 73 23 Alamos, mostly physicists working on mathematical Physics, the primary focus of Metropolis et al. (1953) 74 24 physics and the atomic bomb. is the computation of integrals of the form 75 25 MCMC algorithms therefore date back to the same 76 26 time as the development of regular (MC only) Monte I = F(θ)exp{−E(θ)/kT } dθ 77 27 Carlo methods, which are usually traced to Ulam and 78 28 von Neumann in the late 1940s. Stanislaw Ulam asso- exp{−E(θ)/kT } dθ, 79 29 ciates the original idea with an intractable combinato- 80 30 rial computation he attempted in 1946 (calculating the on R2N , θ denoting a set of N particles on R2, with the 81 31 probability of winning at the card game “solitaire”). energy E being defined as 82 32 This idea was enthusiastically adopted by John von 83 N N 33 Neumann for implementation with direct applications 1 84 E(θ) = V(dij ), 34 to neutron diffusion, the name “Monte Carlo” being 2 85 i=1 j=1 35 suggested by Nicholas Metropolis. (Eckhardt, 1987, j=i 86 36 describes these early Monte Carlo developments, and 87 where V is a potential function and dij the Euclidean 37 Hitchcock, 2003, gives a brief history of the Metropo- distance between particles i and j in θ.TheBoltzmann 88 38 lis algorithm.) distribution exp{−E(θ)/kT } is parameterized by the 89 39 These occurrences very closely coincide with the ap- temperature T , k being the Boltzmann constant, with 90 40 pearance of the very first computer, the ENIAC, which a normalization factor 91 41 came to life in February 1946, after three years of 92 42 construction. The Monte Carlo method was set up by Z(T ) = exp{−E(θ)/kT } dθ, 93 43 von Neumann, who was using it on thermonuclear and 94 44 fission problems as early as 1947. At the same time, that is not available in closed form, except in trivial 95 45 that is, 1947, Ulam and von Neumann invented inver- cases. Since θ is a 2N-dimensional vector, numerical 96 46 sion and accept-reject techniques (also recounted in integration is impossible. Given the large dimension 97 47 of the problem, even standard Monte Carlo techniques 98 48 fail to correctly approximate I,sinceexp{−E(θ)/kT } 99 2Historically speaking, the development of BUGS initiated from 49 Geman and Geman (1984) and Pearl (1987), in accord with the de- 100 50 velopments in the artificial intelligence community, and it predates 3MANIAC stands for Mathematical Analyzer, Numerical Inte- 101 51 Gelfand and Smith (1990). grator and Computer. 102 STS stspdf v.2011/02/18 Prn:2011/02/22; 10:52 F:sts351.tex; (Svajune) p. 3 A SHORT HISTORY OF MARKOV CHAIN MONTE CARLO 3 1 is very small for most realizations of the random con- methods, a point already emphasized in Metropolis 52 2 figurations of the particle system (uniformly in the 2N et al. (1953).4 53 3 square). In order to improve the efficiency of the Monte In his Biometrika paper,5 Hastings (1970)alsode- 54 4 Carlo method, Metropolis et al.