
INTRODUCTION TO MONTE CARLO METHODS

D. J. C. MACKAY

Department of Physics, Cambridge University
Cavendish Laboratory, Madingley Road,
Cambridge CB3 0HE, United Kingdom

ABSTRACT

This chapter describes a sequence of Monte Carlo methods: importance sampling, rejection sampling, the Metropolis method, and Gibbs sampling. For each method, we discuss whether the method is expected to be useful for high-dimensional problems such as arise in inference with graphical models. After the methods have been described, the terminology of Markov chain Monte Carlo methods is presented. The chapter concludes with a discussion of advanced methods, including methods for reducing random walk behaviour.

For details of Monte Carlo methods, theorems and proofs and a full list of references, the reader is directed to Neal (1993), Gilks, Richardson and Spiegelhalter (1996), and Tanner (1996).

The problems to be solved

The aims of Monte Carlo methods are to solve one or both of the following problems.

Problem 1: to generate samples {x^(r)}, r = 1, ..., R, from a given probability distribution P(x).¹

Problem 2: to estimate expectations of functions under this distribution, for example

    Φ = ⟨φ(x)⟩ ≡ ∫ d^N x  P(x) φ(x).

¹ Please note that I will use the word "sample" in the following sense: a sample from a distribution P(x) is a single realization x whose probability distribution is P(x). This contrasts with the alternative usage in statistics, where "sample" refers to a collection of realizations {x}.

The probability distribution P(x), which we will call the target density, might be a distribution from statistical physics or a conditional distribution arising in data modelling, for example the posterior probability of a model's parameters given some observed data. We will generally assume that x is an N-dimensional vector with real components x_n, but we will sometimes consider discrete spaces also.

We will concentrate on the first problem (sampling), because if we have solved it, then we can solve the second problem by using the random samples {x^(r)}, r = 1, ..., R, to give the estimator

    Φ̂ ≡ (1/R) Σ_r φ(x^(r)).

Clearly if the vectors {x^(r)} are generated from P(x), then the expectation of Φ̂ is Φ. Also, as the number of samples R increases, the variance of Φ̂ will decrease as σ²/R, where σ² is the variance of φ,

    σ² = ∫ d^N x  P(x) (φ(x) − Φ)².

This is one of the important properties of Monte Carlo methods.

The accuracy of the Monte Carlo estimate above is independent of the dimensionality of the space sampled. To be precise, the variance of Φ̂ goes as σ²/R. So, regardless of the dimensionality of x, it may be that as few as a dozen independent samples {x^(r)} suffice to estimate Φ satisfactorily.
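For illustration, the following short Octave sketch checks this 1/R behaviour in a toy case where we can sample from P(x) directly; the choice of a unit Gaussian target, the function φ(x) = x² and the loop sizes are all arbitrary choices made for the demonstration, not part of the argument above.

    % Octave sketch (illustration): the variance of Phi_hat goes as sigma^2/R.
    % Target P(x) = Normal(0,1), phi(x) = x^2, so Phi = 1 and sigma^2 = 2.
    phi = @(x) x.^2;
    for R = [10 100 1000 10000]
      estimates = zeros(1, 100);
      for k = 1:100
        x = randn(1, R);               % R independent samples from P(x)
        estimates(k) = mean(phi(x));   % the simple estimator Phi_hat
      endfor
      printf("R = %5d   empirical std of Phi_hat = %.3f   sigma/sqrt(R) = %.3f\n", ...
             R, std(estimates), sqrt(2)/sqrt(R));
    endfor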

We will find later, however, that high dimensionality can cause other difficulties for Monte Carlo methods. Obtaining independent samples from a given distribution P(x) is often not easy.

WHY IS SAMPLING FROM P(x) HARD?

We will assume that the density from which we wish to draw samples, P(x), can be evaluated, at least to within a multiplicative constant; that is, we can evaluate a function P*(x) such that

    P(x) = P*(x)/Z.

If we can evaluate P*(x), why can we not easily solve problem 1? Why is it, in general, difficult to obtain samples from P(x)? There are two difficulties. The first is that we typically do not know the normalizing constant

    Z = ∫ d^N x  P*(x).

Figure 1. (a) The function P*(x) = exp[0.4(x − 0.4)² − 0.08 x⁴]. How to draw samples from this density? (b) The function P*(x) evaluated at a discrete set of uniformly spaced points {x_i}. How to draw samples from this discrete distribution?

The second is that, even if we did know Z, the problem of drawing samples from P(x) is still a challenging one, especially in high-dimensional spaces. There are only a few high-dimensional densities from which it is easy to draw samples, for example the Gaussian distribution.²

Let us start from a simple one-dimensional example. Imagine that we wish to draw samples from the density P(x) = P*(x)/Z, where

    P*(x) = exp[ 0.4(x − 0.4)² − 0.08 x⁴ ].

We can plot this function (figure 1a). But that does not mean we can draw samples from it. To give ourselves a simpler problem, we could discretize the variable x and ask for samples from the discrete probability distribution over a set of uniformly spaced points {x_i} (figure 1b). How could we solve this problem? If we evaluate p*_i = P*(x_i) at each point x_i, we can compute

    Z = Σ_i p*_i

and

    p_i = p*_i / Z,

and we can then sample from the probability distribution {p_i} using various methods based on a source of random bits. But what is the cost of this procedure, and how does it scale with the dimensionality of the space, N?

² A sample from a univariate Gaussian can be generated by computing cos(2πu₁) √(2 ln(1/u₂)), where u₁ and u₂ are uniformly distributed in (0, 1).
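For illustration, a minimal Octave sketch of the discretized procedure just described: it evaluates p*_i on a uniform grid, normalizes, and then draws samples by inverting the cumulative distribution with uniform random numbers; the grid range, spacing and sample count are arbitrary choices.

    % Octave sketch (illustration): sampling from the discretized density of figure 1b.
    x  = -4:0.1:4;                               % uniformly spaced points {x_i}
    ps = exp(0.4*(x - 0.4).^2 - 0.08*x.^4);      % p*_i = P*(x_i), unnormalized
    Z  = sum(ps);                                % normalizing constant over the grid
    p  = ps / Z;                                 % p_i
    F  = cumsum(p);                              % cumulative distribution
    R  = 1000;                                   % number of samples to draw
    samples = zeros(1, R);
    for r = 1:R
      u = rand();                                % a uniform random number
      samples(r) = x(find(F >= u, 1));           % smallest x_i with F_i >= u
    endfor
    hist(samples, x);                            % histogram should resemble figure 1b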

Let us concentrate on the initial cost of evaluating Z. To compute Z we have to visit every point in the space. In figure 1b there are 50 uniformly spaced points in one dimension. If our system had N dimensions, N = 1000 say, then the corresponding number of points would be 50^1000, an unimaginable number of evaluations of P*. Even if each component x_n only took two discrete values, the number of evaluations of P* would be 2^1000, a number that is still horribly huge, equal to the fourth power of the number of particles in the universe.

One system with 2^1000 states is a collection of 1000 spins, for example a fragment of an Ising model (or Boltzmann machine, or Markov field) (Yeomans 1992) whose probability distribution is proportional to

    P*(x) = exp[ −βE(x) ]

where x_n ∈ {−1, +1} and

    E(x) = −[ ½ Σ_{m,n} J_{mn} x_m x_n + Σ_n H_n x_n ].

The energy function E(x) is readily evaluated for any x. But if we wish to evaluate this function at all states x, the computer time required would be 2^1000 function evaluations.

The Ising model is a simple model which has been around for a long time, but the task of generating samples from the distribution P(x) = P*(x)/Z is still an active research area, as evidenced by the work of Propp and Wilson (1996).

UNIFORM SAMPLING

Having agreed that we cannot visit every location x in the state space, we might consider trying to solve the second problem (estimating the expectation of a function φ(x)) by drawing random samples {x^(r)}, r = 1, ..., R, uniformly from the state space and evaluating P*(x) at those points. Then we could introduce Z_R, defined by

    Z_R = Σ_{r=1}^{R} P*(x^(r)),

and estimate Φ = ∫ d^N x φ(x) P(x) by

    Φ̂ = Σ_{r=1}^{R} φ(x^(r)) P*(x^(r)) / Z_R.

Is anything wrong with this strategy? Well, it depends on the functions φ(x) and P*(x). Let us assume that φ(x) is a benign, smoothly varying function and concentrate on the nature of P*(x). A high-dimensional distribution is often concentrated in a small region of the state space known as its typical set T, whose volume is given by |T| ≃ 2^{H(X)}, where H(X) is the Shannon-Gibbs entropy of the probability distribution P(x),

    H(X) = Σ_x P(x) log₂ (1/P(x)).

If almost all the probability mass is located in the typical set and φ(x) is a benign function, the value of Φ = ∫ d^N x φ(x) P(x) will be principally determined by the values that φ(x) takes on in the typical set. So uniform sampling will only stand a chance of giving a good estimate of Φ if we make the number of samples R sufficiently large that we are likely to hit the typical set a number of times. So how many samples are required? Let us take the case of the Ising model again. The total size of the state space is 2^N states, and the typical set has size 2^H. So each sample has a chance of 2^{H−N} of falling in the typical set. The number of samples required to hit the typical set once is thus of order

    R_min ≃ 2^{N−H}.

So what is H? At high temperatures, the probability distribution of an Ising model tends to a uniform distribution and the entropy tends to H_max = N bits, so R_min is of order 1. Under these conditions, uniform sampling may well be a satisfactory technique for estimating Φ. But high temperatures are not of great interest. Considerably more interesting are intermediate temperatures such as the critical temperature at which the Ising model melts from an ordered phase to a disordered phase. At this temperature the entropy of an Ising model is roughly N/2 bits. For this probability distribution the number of samples required simply to hit the typical set once is of order

    R_min ≃ 2^{N − N/2} = 2^{N/2},

which for N = 1000 is about 10^150. This is roughly the square of the number of particles in the universe. Thus uniform sampling is utterly useless for the study of Ising models of modest size. And in most high-dimensional problems, if the distribution P(x) is not actually uniform, uniform sampling is unlikely to be useful.

OVERVIEW

Having established that drawing samples from a high-dimensional distribution P(x) = P*(x)/Z is difficult even if P*(x) is easy to evaluate, we will now study a sequence of Monte Carlo methods: importance sampling, rejection sampling, the Metropolis method, and Gibbs sampling.

Figure 2. Functions involved in importance sampling. We wish to estimate the expectation of φ(x) under P(x) ∝ P*(x). We can generate samples from the simpler distribution Q(x) ∝ Q*(x). We can evaluate Q*(x) and P*(x) at any point.

Importance sampling

Importance sampling is not a method for generating samples from P(x) (problem 1); it is just a method for estimating the expectation of a function φ(x) (problem 2). It can be viewed as a generalization of the uniform sampling method.

For illustrative purposes, let us imagine that the target distribution is a one-dimensional density P(x). It is assumed that we are able to evaluate this density, at least to within a multiplicative constant; thus we can evaluate a function P*(x) such that

    P(x) = P*(x)/Z.

But P(x) is too complicated a function for us to be able to sample from it directly. We now assume that we have a simpler density Q(x) which we can evaluate to within a multiplicative constant (that is, we can evaluate Q*(x), where Q(x) = Q*(x)/Z_Q), and from which we can generate samples. An example of the functions P*, Q* and φ is shown in figure 2. We call Q the sampler density.

In importance sampling, we generate R samples {x^(r)}, r = 1, ..., R, from Q(x). If these points were samples from P(x), then we could estimate Φ by the simple estimator Φ̂ given earlier.

Figure 3. Importance sampling in action: (a) using a Gaussian sampler density; (b) using a Cauchy sampler density. The horizontal axis shows the number of samples on a log scale; the vertical axis shows the estimate Φ̂. The horizontal line indicates the true value of Φ.

But when we generate samples from Q, values of x where Q(x) is greater than P(x) will be over-represented in this estimator, and points where Q(x) is less than P(x) will be under-represented. To take into account the fact that we have sampled from the wrong distribution, we introduce weights

    w_r ≡ P*(x^(r)) / Q*(x^(r))

which we use to adjust the importance of each point in our estimator thus:

    Φ̂ ≡ Σ_r w_r φ(x^(r)) / Σ_r w_r.

If Q(x) is non-zero for all x where P(x) is non-zero, it can be proved that the estimator Φ̂ converges to Φ, the mean value of φ(x), as R increases.
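For illustration, the following Octave sketch applies this weighted estimator to the one-dimensional density of figure 1; the unit Gaussian sampler, the function φ(x) = x and the sample size are arbitrary choices made for the demonstration.

    % Octave sketch (illustration): importance sampling for the density of figure 1.
    Pstar = @(x) exp(0.4*(x - 0.4).^2 - 0.08*x.^4);   % target, known up to a constant
    Qstar = @(x) exp(-x.^2/2);                        % unit Gaussian sampler, unnormalized
    phi   = @(x) x;                                   % function whose expectation we want
    R = 100000;
    x = randn(1, R);                                  % R samples from Q
    w = Pstar(x) ./ Qstar(x);                         % importance weights w_r = P*/Q*
    Phi_hat = sum(w .* phi(x)) / sum(w);              % weighted estimator
    printf("importance sampling estimate of <x>: %.3f\n", Phi_hat);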

A practical difficulty with importance sampling is that it is hard to estimate how reliable the estimator Φ̂ is. The variance of Φ̂ is hard to estimate, because the empirical variances of the quantities w_r and w_r φ(x^(r)) are not necessarily a good guide to the true variances of the numerator and denominator in the estimator above. If the proposal density Q(x) is small in a region where |φ(x)P*(x)| is large, then it is quite possible, even after many points x^(r) have been generated, that none of them will have fallen in that region. This leads to an estimate of Φ that is drastically wrong, and no indication in the empirical variance that the true variance of the estimator is large.

CAUTIONARY ILLUSTRATION OF IMPORTANCE SAMPLING

In a toy problem related to the modelling of amino acid probability distributions, with a one-dimensional variable x, I evaluated a quantity of interest using importance sampling. The results using a Gaussian sampler and a Cauchy sampler are shown in figure 3. The horizontal axis shows the number of samples on a log scale. In the case of the Gaussian sampler, after about 500 samples had been evaluated one might be tempted to call a halt; but evidently there are infrequent samples that make a huge contribution to Φ̂, and the value of the estimate at 500 samples is wrong. Even after a million samples have been taken, the estimate has still not settled down close to the true value. In contrast, the Cauchy sampler does not suffer from glitches and converges, on the scale shown here, after about 5000 samples.

This example illustrates the fact that an importance sampler should have heavy tails.

IMPORTANCE SAMPLING IN MANY DIMENSIONS

We have already observed that care is needed in one-dimensional importance sampling problems. Is importance sampling a useful technique in spaces of higher dimensionality, say N = 1000?

Consider a simple case-study where the target density P(x) is a uniform distribution inside a sphere,

    P*(x) = 1   for ρ(x) ≤ R_P,
    P*(x) = 0   for ρ(x) > R_P,

where ρ(x) ≡ (Σ_i x_i²)^{1/2}, and the proposal density is a Gaussian centred on the origin,

    Q(x) = Π_i Normal(x_i; 0, σ²).

An importance sampling method will be in trouble if the estimator Φ̂ is dominated by a few large weights w_r. What will be the typical range of values of the weights w_r? By the central limit theorem, if ρ is the distance from the origin of a sample from Q, the quantity ρ² has a roughly Gaussian distribution with mean and standard deviation

    ρ² ∼ N σ² ± √(2N) σ².

Thus almost all samples from Q lie in a typical set with distance from the origin very close to √N σ. Let us assume that σ is chosen such that the typical set of Q lies inside the sphere of radius R_P. (If it does not, then the law of large numbers implies that almost all the samples generated from Q will fall outside R_P and will have weight zero.) Then we know that most samples from Q will have a value of Q that lies in the range

    (2πσ²)^{−N/2} exp( −N/2 ± √(2N)/2 ).

Thus the weights w_r = P*/Q will typically have values in the range

    (2πσ²)^{N/2} exp( N/2 ± √(2N)/2 ).

So if we draw a hundred samples, what will the typical range of weights be? We can roughly estimate the ratio of the largest weight to the median weight by doubling the standard deviation in the expression above. The largest weight and the median weight will typically be in the ratio

    w_r^max / w_r^med = exp( √(2N) ).

In N = 1000 dimensions, therefore, the largest weight after one hundred samples is likely to be roughly 10^19 times greater than the median weight. Thus an importance sampling estimate for a high-dimensional problem will very likely be utterly dominated by a few samples with huge weights.
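For illustration, a short Octave sketch of this weight spread: it draws one hundred samples from the Gaussian Q in N = 1000 dimensions and reports the ratio of the largest to the median weight; the values of σ and R_P below are arbitrary choices that keep the typical set of Q inside the sphere.

    % Octave sketch (illustration): spread of importance weights in N = 1000 dimensions.
    N = 1000; sigma = 1; R_P = 2*sqrt(N)*sigma;   % sphere comfortably contains Q's typical set
    R = 100;                                      % draw one hundred samples
    logw = zeros(1, R);
    for r = 1:R
      x = sigma * randn(N, 1);                    % a sample from Q
      rho2 = sum(x.^2);
      if sqrt(rho2) <= R_P                        % inside the sphere: P* = 1, so w = 1/Q(x)
        logw(r) = rho2 / (2*sigma^2);             % log weight, up to an additive constant
      else
        logw(r) = -Inf;                           % outside the sphere: weight zero
      endif
    endfor
    printf("largest weight / median weight = exp(%.1f) = %.2e\n", ...
           max(logw) - median(logw), exp(max(logw) - median(logw)));
    printf("rough estimate from the text: exp(sqrt(2N)) = %.2e\n", exp(sqrt(2*N)));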

In conclusion, importance sampling in high dimensions often suffers from two difficulties. First, we clearly need to obtain samples that lie in the typical set of P, and this may take a long time unless Q is a good approximation to P. Second, even if we obtain samples in the typical set, the weights associated with those samples are likely to vary by large factors, because the probabilities of points in a typical set, although similar to each other, still differ by factors of order exp(√N).

Rejection sampling

We assume again a one-dimensional density P(x) = P*(x)/Z that is too complicated a function for us to be able to sample from it directly. We assume that we have a simpler proposal density Q(x) which we can evaluate (within a multiplicative factor Z_Q, as before), and from which we can generate samples. We further assume that we know the value of a constant c such that

    c Q*(x) > P*(x)   for all x.

A schematic picture of the two functions is shown in figure 4a.

Figure 4. Rejection sampling. (a) The functions involved in rejection sampling. We desire samples from P(x) ∝ P*(x). We are able to draw samples from Q(x) ∝ Q*(x), and we know a value c such that c Q*(x) > P*(x) for all x. (b) A point (x, u) is generated at random in the lightly shaded area under the curve c Q*(x). If this point also lies below P*(x), then it is accepted.

We generate two random numbers. The first, x, is generated from the proposal density Q(x). We then evaluate c Q*(x) and generate a uniformly distributed random variable u from the interval [0, c Q*(x)]. These two random numbers can be viewed as selecting a point in the two-dimensional plane, as shown in figure 4b.

We now evaluate P*(x) and accept or reject the sample x by comparing the value of u with the value of P*(x). If u > P*(x) then x is rejected; otherwise it is accepted, which means that we add x to our set of samples {x^(r)}. (The value of u is discarded.)

Why does this procedure generate samples from P(x)? The proposed point (x, u) comes with uniform probability from the lightly shaded area underneath the curve c Q*(x), as shown in figure 4b. The rejection rule rejects all the points that lie above the curve P*(x). So the points (x, u) that are accepted are uniformly distributed in the heavily shaded area under P*(x). This implies that the probability density of the x-coordinates of the accepted points must be proportional to P*(x), so the samples must be independent samples from P(x).

Rejection sampling will work best if Q is a good approximation to P. If Q is very different from P, then c will necessarily have to be large and the frequency of rejection will be large.
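For illustration, an Octave sketch of rejection sampling for the density of figure 1, using a standard Cauchy proposal; the constant c is estimated numerically on a grid and inflated by a safety factor, so this bound is an assumption of the sketch rather than a proof, and the sample count is an arbitrary choice.

    % Octave sketch (illustration): rejection sampling for the density of figure 1.
    Pstar = @(x) exp(0.4*(x - 0.4).^2 - 0.08*x.^4);   % target, known up to a constant
    Qstar = @(x) 1 ./ (pi*(1 + x.^2));                % standard Cauchy proposal density
    grid = -10:0.001:10;
    c = 1.5 * max(Pstar(grid) ./ Qstar(grid));        % numerical bound so that c Q*(x) > P*(x)
    samples = [];
    while numel(samples) < 1000
      x = tan(pi*(rand() - 0.5));                     % a sample from the Cauchy proposal
      u = c * Qstar(x) * rand();                      % uniform on [0, c Q*(x)]
      if u <= Pstar(x)                                % accept if the point lies below P*(x)
        samples(end+1) = x;
      endif
    endwhile
    printf("collected %d samples with c = %.1f\n", numel(samples), c);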

REJECTION SAMPLING IN MANY DIMENSIONS

In a high-dimensional problem it is very likely that the requirement that c Q* be an upper bound for P* will force c to be so huge that acceptances will be very rare indeed. Finding such a value of c may be difficult too, since in many problems we do not know beforehand where the modes of P* are located or how high they are.

Figure 5. A Gaussian P(x) and a slightly broader Gaussian Q(x) scaled up by a factor c such that c Q(x) ≥ P(x).

As a case study, consider a pair of N-dimensional Gaussian distributions with mean zero (figure 5). Imagine generating samples from one with standard deviation σ_Q and using rejection sampling to obtain samples from the other, whose standard deviation is σ_P. Let us assume that these two standard deviations are close in value, say σ_Q is one percent larger than σ_P. (σ_Q must be larger than σ_P, because if this is not the case there is no c such that cQ upper-bounds P for all x.) So, what is the value of c if the dimensionality is N = 1000? The density of Q(x) at the origin is 1/(2πσ_Q²)^{N/2}, so for cQ to upper-bound P we need to set

    c = (2πσ_Q²)^{N/2} / (2πσ_P²)^{N/2} = exp( N ln(σ_Q/σ_P) ).

With N = 1000 and σ_Q/σ_P = 1.01, we find c = exp(10) ≈ 20,000. What will the rejection rate be for this value of c? The answer is immediate: since the acceptance rate is the ratio of the volume under the curve P(x) to the volume under c Q(x), the fact that P and Q are both normalized implies that the acceptance rate will be 1/c; for our case study this is 1/20,000. In general, c grows exponentially with the dimensionality N.

Rejection sampling, therefore, whilst a useful method for one-dimensional problems, is not a practical technique for generating samples from high-dimensional distributions P(x).

The Metropolis method

Importance sampling and rejection sampling only work well if the proposal density Q(x) is similar to P(x). In large and complex problems it is difficult to create a single density Q(x) that has this property.

Figure 6. Metropolis method in one dimension. The proposal distribution Q(x'; x) is here shown as having a shape that changes as x changes, though this is not typical of the proposal densities used in practice.

The Metropolis algorithm instead makes use of a proposal density Q which depends on the current state x^(t). The density Q(x'; x^(t)) might in the simplest case be a simple distribution such as a Gaussian centred on the current x^(t). The proposal density Q(x'; x) can be any fixed density. It is not necessary for Q(x'; x^(t)) to look at all similar to P(x). An example of a proposal density is shown in figure 6; this figure shows the density Q(x'; x^(t)) for two different states x^(1) and x^(2).

As before, we assume that we can evaluate P*(x) for any x. A tentative new state x' is generated from the proposal density Q(x'; x^(t)). To decide whether to accept the new state, we compute the quantity

    a = [ P*(x') / P*(x^(t)) ] × [ Q(x^(t); x') / Q(x'; x^(t)) ].

If a ≥ 1 then the new state is accepted. Otherwise, the new state is accepted with probability a.

If the step is accepted, we set x^(t+1) = x'. If the step is rejected, then we set x^(t+1) = x^(t). Note the difference from rejection sampling: in rejection sampling, rejected points are discarded and have no influence on the list of samples {x^(r)} that we collect. Here, a rejection causes the current state to be written onto the list of points another time.

Notation: I have used the superscript r = 1, ..., R to label points that are independent samples from a distribution, and the superscript t = 1, ..., T to label the sequence of states in a Markov chain. It is important to note that a Metropolis simulation of T iterations does not produce T independent samples from the target distribution P. The samples are correlated.

To compute the acceptance probability, we need to be able to compute the probability ratios P*(x')/P*(x^(t)) and Q(x^(t); x')/Q(x'; x^(t)). If the proposal density is a simple symmetrical density, such as a Gaussian centred on the current point, then the latter factor is unity, and the Metropolis method simply involves comparing the value of the target density at the two points. The general algorithm for asymmetric Q, given above, is often called the Metropolis-Hastings algorithm.

Figure 7. Metropolis method in two dimensions, showing a traditional proposal density that has a sufficiently small step size ε that the acceptance frequency will be about 0.5.
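For illustration, an Octave sketch of this symmetric-proposal Metropolis method applied to the one-dimensional density of figure 1; the step size, starting state and run length are arbitrary choices.

    % Octave sketch (illustration): random-walk Metropolis for the density of figure 1.
    Pstar = @(x) exp(0.4*(x - 0.4).^2 - 0.08*x.^4);   % target, known up to a constant
    T = 10000; epsilon = 1.0;                         % iterations and proposal width
    x = zeros(1, T); x(1) = 0;                        % starting state
    for t = 1:T-1
      xprime = x(t) + epsilon*randn();                % symmetric Gaussian proposal
      a = Pstar(xprime) / Pstar(x(t));                % acceptance ratio (Q factor is unity)
      if rand() < a                                   % accept with probability min(1, a)
        x(t+1) = xprime;                              % accepted: move to the new state
      else
        x(t+1) = x(t);                                % rejected: record the current state again
      endif
    endfor
    hist(x, 50);                                      % occupancy should resemble P(x)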

It can be shown that, for any positive Q (that is, any Q such that Q(x'; x) > 0 for all x, x'), as t → ∞ the probability distribution of x^(t) tends to P(x) = P*(x)/Z. (This statement should not be seen as implying that Q has to assign positive probability to every point x'; we will discuss examples later where Q(x'; x) = 0 for some x, x'. Notice also that we have said nothing about how rapidly the convergence to P(x) takes place.)

The Metropolis method is an example of a Markov chain Monte Carlo method (abbreviated MCMC). In contrast to rejection sampling, where the accepted points {x^(r)} are independent samples from the desired distribution, Markov chain Monte Carlo methods involve a Markov process in which a sequence of states {x^(t)} is generated, each sample x^(t) having a probability distribution that depends on the previous value x^(t−1). Since successive samples are correlated with each other, the Markov chain may have to be run for a considerable time in order to generate samples that are effectively independent samples from P.

Just as it was difficult to estimate the variance of an importance sampling estimator, so it is difficult to assess whether a Markov chain Monte Carlo method has converged, and to quantify how long one has to wait to obtain samples that are effectively independent samples from P.

DEMONSTRATION OF THE METROPOLIS METHOD

The Metropolis method is widely used for high-dimensional problems. Many implementations of the Metropolis method employ a proposal distribution with a length scale ε that is short relative to the length scale L of the probable region (figure 7). A reason for choosing a small length scale is that, for most high-dimensional problems, a large random step from a typical point (that is, a sample from P(x)) is very likely to end in a state which has very low probability; such steps are unlikely to be accepted. If ε is large, movement around the state space will only occur when a transition to a state which has very low probability is actually accepted, or when a large random step chances to land in another probable state. So the rate of progress will be slow unless small steps are used.

The disadvantage of small steps, on the other hand, is that the Metropolis method will explore the probability distribution by a random walk, and random walks take a long time to get anywhere. Consider a one-dimensional random walk, for example, on each step of which the state moves randomly to the left or to the right with equal probability. After T steps of size ε, the state is only likely to have moved a distance about √T ε. Recall that the first aim of Monte Carlo sampling is to generate a number of independent samples from the given distribution (a dozen, say). If the largest length scale of the state space is L, then we have to simulate a random-walk Metropolis method for a time T ≃ (L/ε)² before we can expect to get a sample that is roughly independent of the initial condition, and that is assuming that every step is accepted; if only a fraction f of the steps are accepted on average, then this time is increased by a factor 1/f.

Rule of thumb: lower bound on number of iterations of a Metropolis method. If the largest length scale of the space of probable states is L, a Metropolis method whose proposal distribution generates a random walk with step size ε must be run for at least T ≃ (L/ε)² iterations to obtain an independent sample.

This rule of thumb only gives a lower bound; the situation may be much worse if, for example, the probability distribution consists of several islands of high probability separated by regions of low probability.

To illustrate how slow the exploration of a state space by random walk is, figure 8 shows a simulation of a Metropolis algorithm for generating samples from the distribution

    P(x) = 1/21   for x ∈ {0, 1, 2, ..., 20},
    P(x) = 0      otherwise.

Figure 8. Metropolis method for a toy problem. (a) The state sequence as a function of time (horizontal direction: the states from 0 to 20; vertical direction: time; the cross bars mark regular time intervals). (b) Histogram of occupancy of the states after 100, 400 and 1200 iterations. (c) For comparison, histograms resulting when successive points are drawn independently from the target distribution.

The proposal distribution is

    Q(x'; x) = 1/2   for x' = x ± 1,
    Q(x'; x) = 0     otherwise.

Because the target distribution P(x) is uniform, rejections will occur only when the proposal takes the state to x' = −1 or x' = 21.
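For illustration, a short Octave sketch of this toy simulation (the run length here is an arbitrary choice, taken to match the longest histogram of figure 8):

    % Octave sketch (illustration): the toy Metropolis simulation described above.
    % Target: uniform on {0, ..., 20}; proposal: x' = x +/- 1 with probability 1/2 each.
    T = 1200;
    x = zeros(1, T); x(1) = 10;                   % start in the middle of the state space
    for t = 1:T-1
      xprime = x(t) + 2*(rand() > 0.5) - 1;       % propose a step to the left or to the right
      if xprime >= 0 && xprime <= 20              % P(x') > 0: accept (P is uniform, so a = 1)
        x(t+1) = xprime;
      else                                        % P(x') = 0: reject, record the state again
        x(t+1) = x(t);
      endif
    endfor
    hist(x, 0:20);                                % occupancy histogram, cf. figure 8b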

The simulation was started in the state x^(0) = 10 and its evolution is shown in figure 8a. How long does it take to reach one of the end states x = 0 and x = 20? Since the distance is 10 steps, the rule of thumb above predicts that it will typically take a time T ≃ 100 iterations to reach an end state. This is confirmed in the present example: the first step into an end state occurs on the 178th iteration. How long does it take to visit both end states? The rule of thumb predicts that about 400 iterations are required to traverse the whole state space; and indeed the first encounter with the other end state takes place on the 540th iteration. Thus effectively independent samples are only generated by simulating for about four hundred iterations.

This simple example shows that it is important to try to abolish random walk behaviour in Monte Carlo methods. A systematic exploration of the toy state space {0, 1, 2, ..., 20} could get around it, using the same step sizes, in about twenty steps instead of four hundred.

METROPOLIS METHOD IN HIGH DIMENSIONS

The rule of thumb that we discussed above, giving a lower bound on the number of iterations of a random walk Metropolis method, also applies to higher-dimensional problems. Consider the simplest case of a target distribution that is an N-dimensional Gaussian, and a proposal distribution that is a spherical Gaussian of standard deviation ε in each direction. Without loss of generality, we can assume that the target distribution is a separable distribution aligned with the axes {x_n}, and that it has standard deviations {σ_n} in the different directions n. Let σ^max and σ^min be the largest and smallest of these standard deviations. Let us assume that ε is adjusted such that the acceptance probability is close to 1. Under this assumption, each variable x_n evolves independently of all the others, executing a random walk with step sizes about ε. The time taken to generate effectively independent samples from the target distribution will be controlled by the largest lengthscale σ^max: just as in the previous section, where we needed at least T ≃ (L/ε)² iterations to obtain an independent sample, here we need T ≃ (σ^max/ε)².

Now how big can ε be? The bigger it is, the smaller this number T becomes, but if ε is too big (bigger than σ^min), then the acceptance rate will fall sharply. It seems plausible that the optimal ε must be similar to σ^min. Strictly, this may not be true; in special cases where the second smallest σ_n is significantly greater than σ^min, the optimal ε may be closer to that second smallest σ_n. But our rough conclusion is this: where simple spherical proposal distributions are used, we will need at least T ≃ (σ^max/σ^min)² iterations to obtain an independent sample, where σ^max and σ^min are the longest and shortest lengthscales of the target distribution.

Figure 9. Gibbs sampling. (a) The joint density P(x) from which samples are required. (b) Starting from a state x^(t), x_1 is sampled from the conditional density P(x_1 | x_2^(t)). (c) A sample is then made from the conditional density P(x_2 | x_1). (d) A couple of iterations of Gibbs sampling.

This is good news and bad news. It is good news because, unlike the cases of rejection sampling and importance sampling, there is no catastrophic dependence on the dimensionality N. But it is bad news in that, all the same, this quadratic dependence on the lengthscale ratio may force us to make very lengthy simulations.

Fortunately, there are methods for suppressing random walks in Monte Carlo simulations, which we will discuss later.

Gibbs sampling

We introduced importance sampling, rejection sampling and the Metropolis method using one-dimensional examples. Gibbs sampling, also known as the heat bath method, is a method for sampling from distributions over at least two dimensions. It can be viewed as a Metropolis method in which the proposal distribution Q is defined in terms of the conditional distributions of the joint distribution P(x). It is assumed that, whilst P(x) is too complex to draw samples from directly, its conditional distributions P(x_i | {x_j}, j ≠ i) are tractable to work with. For many graphical models (but not all) these one-dimensional conditional distributions are straightforward to sample from. Conditional distributions that are not of standard form may still be sampled from by adaptive rejection sampling if the conditional distribution satisfies certain convexity properties (Gilks and Wild 1992).

Gibbs sampling is illustrated for a case with two variables x = (x_1, x_2) in figure 9. On each iteration, we start from the current state x^(t), and x_1 is sampled from the conditional density P(x_1 | x_2), with x_2 fixed to x_2^(t). A sample x_2 is then made from the conditional density P(x_2 | x_1), using the new value of x_1. This brings us to the new state x^(t+1), and completes the iteration.

In the general case of a system with K variables, a single iteration involves sampling one parameter at a time:

    x_1^(t+1) ∼ P(x_1 | x_2^(t), x_3^(t), ..., x_K^(t))
    x_2^(t+1) ∼ P(x_2 | x_1^(t+1), x_3^(t), ..., x_K^(t))
    x_3^(t+1) ∼ P(x_3 | x_1^(t+1), x_2^(t+1), x_4^(t), ..., x_K^(t)),   etc.

Gibbs sampling can be viewed as a Metropolis method which has the property that every proposal is always accepted. Because Gibbs sampling is a Metropolis method, the probability distribution of x^(t) tends to P(x) as t → ∞, as long as P(x) does not have pathological properties.
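For illustration, an Octave sketch of this two-variable scheme for a strongly correlated bivariate Gaussian, whose conditionals are themselves Gaussian; the correlation and run length are arbitrary choices.

    % Octave sketch (illustration): Gibbs sampling for a bivariate Gaussian.
    rho = 0.95; T = 1000;
    X = zeros(T, 2);                          % each row is a state (x1, x2)
    for t = 1:T-1
      % conditional P(x1 | x2) is Normal(rho*x2, 1 - rho^2)
      X(t+1, 1) = rho*X(t, 2)   + sqrt(1 - rho^2)*randn();
      % conditional P(x2 | x1) is Normal(rho*x1, 1 - rho^2), using the new x1
      X(t+1, 2) = rho*X(t+1, 1) + sqrt(1 - rho^2)*randn();
    endfor
    plot(X(:, 1), X(:, 2), '.');              % the slow random-walk exploration is visible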

GIBBS SAMPLING IN HIGH DIMENSIONS

Gibbs sampling suffers from the same defect as simple Metropolis algorithms: the state space is explored by a random walk, unless a fortuitous parameterization has been chosen which makes the probability distribution P(x) separable. If, say, two variables x_1 and x_2 are strongly correlated, having marginal densities of width L and conditional densities of width ε, then it will take at least about (L/ε)² iterations to generate an independent sample from the target density. However, Gibbs sampling involves no adjustable parameters, so it is an attractive strategy when one wants to get a model running quickly. An excellent software package, BUGS, is available, which makes it easy to set up almost arbitrary probabilistic models and simulate them by Gibbs sampling (Thomas, Spiegelhalter and Gilks 1992).

Terminology for Markov chain Monte Carlo methods

We now spend a few moments sketching the theory on which the Metropolis method and Gibbs sampling are based.

A Markov chain can be specified by an initial probability distribution p^(0)(x) and a transition probability T(x'; x). The probability distribution of the state at the (t+1)th iteration of the Markov chain is given by

    p^(t+1)(x') = ∫ d^N x  T(x'; x) p^(t)(x).

We construct the chain such that:

1. The desired distribution P(x) is the invariant distribution of the chain. (A distribution π(x) is an invariant distribution of T(x'; x) if

    π(x') = ∫ d^N x  T(x'; x) π(x).)

2. The chain must also be ergodic, that is,

    p^(t)(x) → π(x) as t → ∞, for any p^(0)(x).

It is often convenient to construct T by mixing or concatenating simple base transitions B, all of which satisfy

    P(x') = ∫ d^N x  B(x'; x) P(x)

for the desired density P(x). These base transitions need not be individually ergodic.

Many useful transition probabilities satisfy the detailed balance property,

    T(x'; x) P(x) = T(x; x') P(x')   for all x and x'.

This equation says that, if we pick a state from the target density P and make a transition under T to another state, it is just as likely that we will pick x and go from x to x' as it is that we will pick x' and go from x' to x. Markov chains that satisfy detailed balance are also called reversible Markov chains. The reason why the detailed balance property is of interest is that detailed balance implies invariance of the distribution P(x) under the Markov chain T (the proof of this is left as an exercise for the reader). Proving that detailed balance holds is often a key step when proving that a Markov chain Monte Carlo simulation will converge to the desired distribution. The Metropolis method and Gibbs sampling method both satisfy detailed balance, for example. Detailed balance is not an essential condition, however, and we will see later that irreversible Markov chains can be useful in practice.
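As an added check of these definitions, the following Octave sketch builds the 21-state transition matrix of the toy Metropolis chain of figure 8 and verifies numerically that the uniform distribution is invariant and that detailed balance holds; the explicit construction of T is specific to this toy example.

    % Octave sketch (illustration): invariance and detailed balance for the chain of figure 8.
    K = 21;                                   % states 0..20
    T = zeros(K, K);                          % T(i, j): probability of moving to state i from state j
    for j = 1:K
      if j > 1
        T(j-1, j) = T(j-1, j) + 0.5;          % proposal to the left is accepted
      else
        T(j, j) = T(j, j) + 0.5;              % proposal to x' = -1 is rejected; stay put
      endif
      if j < K
        T(j+1, j) = T(j+1, j) + 0.5;          % proposal to the right is accepted
      else
        T(j, j) = T(j, j) + 0.5;              % proposal to x' = 21 is rejected; stay put
      endif
    endfor
    P = ones(K, 1) / K;                       % the uniform target distribution
    printf("invariance:       max |T*P - P| = %.2e\n", max(abs(T*P - P)));
    D = T .* repmat(P', K, 1);                % D(i, j) = T(i, j) P(j)
    printf("detailed balance: max |T(i,j)P(j) - T(j,i)P(i)| = %.2e\n", max(max(abs(D - D'))));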

Practicalities

Can we predict how long a Markov chain Monte Carlo simulation will take to equilibrate? By considering the random walks involved in a Markov chain Monte Carlo simulation, we can obtain simple lower bounds on the time required for convergence. But predicting this time more precisely is a difficult problem, and most of the theoretical results are of little practical use.

Can we diagnose or detect convergence in a running simulation? This is also a difficult problem. There are a few practical tools available, but none of them is perfect (Cowles and Carlin 1996).

Can we speed up the convergence time and time between independent samples of a Markov chain Monte Carlo method? Here there is good news.

SPEEDING UP MONTE CARLO METHODS

Reducing random walk behaviour in Metropolis methods

The hybrid Monte Carlo method (reviewed in Neal 1993) is a Metropolis method, applicable to continuous state spaces, which makes use of gradient information to reduce random walk behaviour.

For many systems the probability P(x) can be written in the form

    P(x) = e^{−E(x)} / Z,

where not only E(x) but also its gradient with respect to x can be readily evaluated. It seems wasteful to use a simple random-walk Metropolis method when this gradient is available; the gradient indicates which direction one should go in to find states with higher probability.

In the hybrid Monte Carlo method, the state space x is augmented by momentum variables p, and there is an alternation of two types of proposal. The first proposal randomizes the momentum variable, leaving the state x unchanged. The second proposal changes both x and p using simulated Hamiltonian dynamics, as defined by the Hamiltonian

    H(x, p) = E(x) + K(p),

    g = gradE ( x ) ;              % set gradient using initial x
    E = findE ( x ) ;              % set objective function too
    for l = 1:L                    % loop L times
      p = randn ( size(x) ) ;      % initial momentum is Normal(0,1)
      H = p' * p / 2 + E ;         % evaluate H(x,p)
      xnew = x ;
      gnew = g ;
      for tau = 1:Tau              % make Tau `leapfrog' steps
        p = p - epsilon * gnew / 2 ;     % make half-step in p
        xnew = xnew + epsilon * p ;      % make step in x
        gnew = gradE ( xnew ) ;          % find new gradient
        p = p - epsilon * gnew / 2 ;     % make half-step in p
      endfor
      Enew = findE ( xnew ) ;      % find new value of H
      Hnew = p' * p / 2 + Enew ;
      dH = Hnew - H ;              % decide whether to accept
      if ( dH < 0 ) accept = 1 ;
      elseif ( rand() < exp(-dH) ) accept = 1 ;
      else accept = 0 ;
      endif
      if ( accept )
        g = gnew ; x = xnew ; E = Enew ;
      endif
    endfor

Figure 10. Octave source code for the hybrid Monte Carlo method.

where K(p) is a kinetic energy such as K(p) = pᵀp/2. These two proposals are used to create, asymptotically, samples from the joint density

    P_H(x, p) = (1/Z_H) exp[ −H(x, p) ] = (1/Z_H) exp[ −E(x) ] exp[ −K(p) ].

This density is separable, so it is clear that the marginal distribution of x is the desired distribution exp[−E(x)]/Z. So, simply discarding the momentum variables, we will obtain a sequence of samples {x^(t)} which asymptotically come from P(x).

Figure 11. (a, b) Hybrid Monte Carlo used to generate samples from a bivariate Gaussian with correlation ρ = 0.998; (c, d) a random-walk Metropolis method, for comparison. (a) Starting from the state indicated by the arrow, the continuous line represents two successive trajectories generated by the Hamiltonian dynamics. The squares show the endpoints of these two trajectories. Each trajectory consists of Tau leapfrog steps of size epsilon. After each trajectory the momentum is randomized. Here both trajectories are accepted; the errors in the Hamiltonian were small. (b) The second figure shows how a sequence of four trajectories converges from an initial condition, indicated by the arrow, that is not close to the typical set of the target distribution. The trajectory parameters Tau and epsilon were randomized for each trajectory. The first trajectory takes us to a new state similar in energy to the first state. The second trajectory happens to end in a state nearer the bottom of the energy landscape. Here, since the potential energy E is smaller, the kinetic energy K(p) is necessarily larger than it was at the start. When the momentum is randomized for the third trajectory, its magnitude becomes much smaller. After the fourth trajectory has been simulated, the state appears to have become typical of the target density. (c) A random-walk Metropolis method using a Gaussian proposal density, with the number of proposals chosen so that the total amount of computer time used was similar to that in (a). The distance moved is small because of random walk behaviour. (d) A random-walk Metropolis method given a similar amount of computer time to (b).

The first proposal draws a new momentum from the Gaussian density exp[−K(p)]/Z_K. During the second, dynamical, proposal the momentum variable determines where the state x goes, and the gradient of E(x) determines how the momentum p changes, in accordance with the equations

    ẋ = p
    ṗ = −∂E(x)/∂x.

Because of the persistent motion of x in the direction of the momentum p during each dynamical proposal, the state of the system tends to move a distance that goes linearly with the computer time, rather than as the square root.

If the simulation of the Hamiltonian dynamics is numerically perfect, then the proposals are accepted every time, because the total energy H(x, p) is a constant of the motion and so the acceptance ratio a is equal to one. If the simulation is imperfect, because of finite step sizes for example, then some of the dynamical proposals will be rejected. The rejection rule makes use of the change in H(x, p), which is zero if the simulation is perfect. The occasional rejections ensure that, asymptotically, we obtain samples (x^(t), p^(t)) from the required joint density P_H(x, p).

The source code in figure 10 describes a hybrid Monte Carlo method which uses the leapfrog algorithm to simulate the dynamics on the function findE(x), whose gradient is found by the function gradE(x). Figure 11 shows this algorithm generating samples from a bivariate Gaussian whose energy function is E(x) = ½ xᵀAx with

    A = [  250.25   −249.75
          −249.75    250.25 ].
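For completeness, a minimal Octave sketch of the definitions needed to drive the listing of figure 10 on this bivariate Gaussian; the step size, trajectory length, number of trajectories and starting state below are arbitrary choices for this sketch.

    % Octave sketch (illustration): definitions and parameters for the figure 10 listing.
    A = [ 250.25, -249.75 ; -249.75, 250.25 ];
    findE = @(x) 0.5 * x' * A * x;   % energy E(x) = (1/2) x' A x
    gradE = @(x) A * x;              % its gradient
    x = [1; -1];                     % starting state
    epsilon = 0.05; Tau = 20; L = 100;
    % The loop of figure 10 can now be run unchanged; it calls findE(x) and gradE(x) above.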

Overrelaxation

The method of overrelaxation is a similar method for reducing random walk behaviour in Gibbs sampling. Overrelaxation was originally introduced for systems in which all the conditional distributions are Gaussian. (There are joint distributions that are not Gaussian whose conditional distributions are all Gaussian, for example P(x, y) = exp(−x²y²)/Z.)

In ordinary Gibbs sampling, one draws the new value x_i^(t+1) of the current variable x_i from its conditional distribution, ignoring the old value x_i^(t). This leads to lengthy random walks in cases where the variables are strongly correlated, as illustrated in the left hand panel of figure 12.

Figure 12. Overrelaxation contrasted with Gibbs sampling for a bivariate Gaussian with correlation ρ = 0.998. (a) The state sequence; each iteration involves one update of both variables. The overrelaxation method had α = −0.98; this excessively large value is chosen to make it easy to see how the overrelaxation method reduces random walk behaviour. The dotted line shows the contour xᵀΣ⁻¹x = 1. (b) Detail of (a), showing the two steps making up each iteration. (After Neal 1995.)

In Adler's (1981) overrelaxation method, one instead samples x_i^(t+1) from a Gaussian that is biased to the opposite side of the conditional distribution. If the conditional distribution of x_i is Normal(μ, σ²) and the current value of x_i is x_i^(t), then Adler's method sets x_i^(t+1) to

    x_i^(t+1) = μ + α (x_i^(t) − μ) + (1 − α²)^{1/2} σ ν,

where ν ∼ Normal(0, 1) and α is a parameter between −1 and 1, commonly set to a negative value.



The transition matrix T(x'; x) defined by this procedure does not satisfy detailed balance. The individual transitions for the individual coordinates just described do satisfy detailed balance, but when we form a chain by applying them in a fixed sequence, the overall chain is not reversible. If, say, two variables are positively correlated, then they will, on a short timescale, evolve in a directed manner instead of by random walk, as shown in figure 12. This may significantly reduce the time required to obtain effectively independent samples. This method is still a valid sampling strategy (it converges to the target density P(x)) because it is made up of transitions that satisfy detailed balance.
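For illustration, an Octave sketch of Adler's update for the strongly correlated bivariate Gaussian used in the Gibbs sampling sketch earlier; the correlation, α and run length are arbitrary choices.

    % Octave sketch (illustration): Adler overrelaxation for a bivariate Gaussian.
    rho = 0.95; alpha = -0.98; T = 1000;
    X = zeros(T, 2);
    s = sqrt(1 - rho^2);                      % conditional standard deviation
    for t = 1:T-1
      % overrelaxed update of x1 given x2: conditional mean mu = rho*x2
      mu = rho*X(t, 2);
      X(t+1, 1) = mu + alpha*(X(t, 1) - mu) + sqrt(1 - alpha^2)*s*randn();
      % overrelaxed update of x2 given the new x1
      mu = rho*X(t+1, 1);
      X(t+1, 2) = mu + alpha*(X(t, 2) - mu) + sqrt(1 - alpha^2)*s*randn();
    endfor
    plot(X(:, 1), X(:, 2), '-');              % compare with the Gibbs sampling sketch above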

The overrelaxation method has been generalized by Neal (1995, and this volume), whose ordered overrelaxation method is applicable to any system where Gibbs sampling is used. For practical purposes this method may speed up a simulation by a factor of ten or twenty.

Simulated annealing

A third technique for speeding convergence is simulated annealing. In simulated annealing, a temperature parameter is introduced which, when large, allows the system to make transitions which would be improbable at temperature 1. The temperature may be initially set to a large value and reduced gradually to 1. It is hoped that this procedure reduces the chance of the simulation becoming stuck in an unrepresentative probability island.

We assume that we wish to sample from a distribution of the form

    P(x) = e^{−E(x)} / Z,

where E(x) can be evaluated. In the simplest simulated annealing method, we instead sample from the distribution

    P_T(x) = (1/Z(T)) e^{−E(x)/T}

and decrease T gradually to 1.
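For illustration, a minimal Octave sketch of this simplest scheme, wrapping the earlier random-walk Metropolis sampler in a temperature schedule; the energy function (chosen so that P(x) at T = 1 is the density of figure 1), the schedule, step size and run length are all arbitrary choices.

    % Octave sketch (illustration): Metropolis sampling of P_T(x) = exp(-E(x)/T)/Z(T),
    % with T lowered from a large value to 1 and then held there.
    E = @(x) -(0.4*(x - 0.4).^2 - 0.08*x.^4);    % so exp(-E(x)) is P*(x) of figure 1
    T_sched = [linspace(10, 1, 100), ones(1, 900)];
    x = zeros(1, numel(T_sched)); x(1) = 0; epsilon = 1.0;
    for t = 1:numel(T_sched)-1
      xprime = x(t) + epsilon*randn();
      dE = E(xprime) - E(x(t));
      if rand() < exp(-dE / T_sched(t))          % accept with probability min(1, e^{-dE/T})
        x(t+1) = xprime;
      else
        x(t+1) = x(t);
      endif
    endfor
    hist(x(101:end), 50);                        % samples collected after T has reached 1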

Often the energy function can be separated into two terms,

    E(x) = E_0(x) + E_1(x),

of which the first term is "nice" (for example, a separable function of x) and the second is "nasty". In these cases, a better simulated annealing method might make use of the distribution

    P_T(x) = (1/Z(T)) e^{ −E_0(x) − E_1(x)/T },

with T gradually decreasing to 1. In this way, the distribution at high temperatures reverts to a well-behaved distribution defined by E_0.

Simulated annealing is often used as an optimization method, where the aim is to find an x that minimizes E(x), in which case the temperature is decreased to zero rather than to 1. As a Monte Carlo method, simulated annealing as described above does not sample exactly from the right distribution; the closely related "simulated tempering" methods (Marinari and Parisi 1992) correct the biases introduced by the annealing process by making the temperature itself a random variable that is updated in Metropolis fashion during the simulation.

CAN THE NORMALIZING CONSTANT BE EVALUATED?

If the target density P(x) is given in the form of an unnormalized density P*(x) with P(x) = P*(x)/Z, the value of Z may well be of interest. Monte Carlo methods do not readily yield an estimate of this quantity, and it is an area of active research to find ways of evaluating it. Techniques for evaluating Z include:

- Importance sampling (reviewed by Neal 1993).
- Thermodynamic integration during simulated annealing, the acceptance ratio method, and umbrella sampling (reviewed by Neal 1993).
- Reversible jump Markov chain Monte Carlo (Green 1995).

Perhaps the best way of dealing with Z, however, is to find a solution to one's task that does not require that Z be evaluated. In Bayesian data modelling one can avoid the need to evaluate Z (which would be important for model comparison) by not having more than one model. Instead of using several models, differing in complexity for example, and evaluating their relative posterior probabilities, one can make a single hierarchical model having, for example, various continuous hyperparameters which play a role similar to that played by the distinct models (Neal 1996).

THE METROPOLIS METHOD FOR BIG MODELS

Our original description of the Metropolis method involved a joint updating of all the variables using a proposal density Q(x'; x). For big problems it may be more efficient to use several proposal distributions Q^(b)(x'; x), each of which updates only some of the components of x. Each proposal is individually accepted or rejected, and the proposal distributions are repeatedly run through in sequence.

In the Metropolis method, the proposal density Q(x'; x) typically has a number of parameters that control, for example, its "width". These parameters are usually set by trial and error, with the rule of thumb being that one aims for a rejection frequency of about 0.5. It is not valid to have the width parameters be dynamically updated during the simulation in a way that depends on the history of the simulation. Such a modification of the proposal density would violate the detailed balance condition which guarantees that the Markov chain has the correct invariant distribution.

GIBBS SAMPLING IN BIG MODELS

Our description of Gibbs sampling involved sampling one parameter at a time, as described above. For big problems it may be more efficient to sample groups of variables jointly, that is, to use several proposal distributions:

    x_1^(t+1), ..., x_a^(t+1) ∼ P(x_1, ..., x_a | x_{a+1}^(t), ..., x_K^(t))
    x_{a+1}^(t+1), ..., x_b^(t+1) ∼ P(x_{a+1}, ..., x_b | x_1^(t+1), ..., x_a^(t+1), x_{b+1}^(t), ..., x_K^(t)),   etc.

HOW MANY SAMPLES ARE NEEDED?

At the start of this chapter, we observed that the variance of an estimator Φ̂ depends only on the number of independent samples R and the value of

    σ² = ∫ d^N x  P(x) (φ(x) − Φ)².

We have now discussed a variety of methods for generating samples from P(x). How many independent samples R should we aim for?

In many problems, we really only need about twelve independent samples from P(x). Imagine that x is an unknown vector such as the amount of corrosion present in each of a large number of underground pipelines around Sicily, and φ(x) is the total cost of repairing those pipelines. The distribution P(x) describes the probability of a state x given the tests that have been carried out on some pipelines and the assumptions about the physics of corrosion. The quantity Φ is the expected cost of the repairs. The quantity σ² is the variance of the cost; σ measures by how much we should expect the actual cost to differ from the expectation Φ.

Now, how accurately would a manager like to know Φ? I would suggest there is little point in knowing Φ to a precision finer than about σ/3. After all, the true cost is likely to differ by ±σ from Φ. If we obtain R = 12 independent samples from P(x), we can estimate Φ to a precision of σ/√12, which is smaller than σ/3. So twelve samples suffice.

ALLOCATION OF RESOURCES

Assuming we have decided how many independent samples R are required, an important question is how one should make use of one's limited computer resources to obtain these samples.

A typical Markov chain Monte Carlo experiment involves an initial period in which control parameters of the simulation, such as step sizes, may be adjusted.

Figure 13. Three possible Monte Carlo strategies for obtaining twelve samples using a fixed amount of computer time. Computer time is represented by horizontal lines, samples by white circles. (1) A single run consisting of one long "burn in" period followed by a sampling period. (2) Four medium-length runs with different initial conditions and a medium-length "burn in" period. (3) Twelve short runs.

This is followed by a "burn in" period during which we hope the simulation converges to the desired distribution. Finally, as the simulation continues, we record the state vector occasionally so as to create a list of states {x^(r)}, r = 1, ..., R, that we hope are roughly independent samples from P(x).

There are several possible strategies (figure 13):

1. Make one long run, obtaining all R samples from it.
2. Make a few medium length runs with different initial conditions, obtaining some samples from each.
3. Make R short runs, each starting from a different random initial condition, with the only state that is recorded being the final state of each simulation.

The first strategy has the best chance of attaining convergence. The last strategy may have the advantage that the correlations between the recorded samples are smaller. The middle path appears to be popular with Markov chain Monte Carlo experts because it avoids the inefficiency of discarding burn-in iterations in many runs, while still allowing one to detect problems with lack of convergence that would not be apparent from a single run.

PHILOSOPHY

One curious defect of these Monte Carlo methods, which are widely used by Bayesian statisticians, is that they are all non-Bayesian. They involve computer experiments from which estimators of quantities of interest are derived. These estimators depend on the sampling distributions that were used to generate the samples. In contrast, an alternative Bayesian approach to the problem would use the results of our computer experiments to infer the properties of the target function P(x) and generate predictive distributions for quantities of interest such as Φ. This approach would give answers which would depend only on the computed values of P*(x^(r)) at the points {x^(r)}; the answers would not depend on how those points were chosen.

It remains an open problem to create a Bayesian version of Monte Carlo methods.

Summary

- Monte Carlo methods are a powerful tool that allow one to implement any probability distribution that can be expressed in the form P(x) = P*(x)/Z.

- Monte Carlo methods can answer virtually any query related to P(x) by putting the query in the form

    ∫ φ(x) P(x) dx ≃ (1/R) Σ_r φ(x^(r)).

- In high-dimensional problems the only satisfactory methods are those based on Markov chain Monte Carlo: the Metropolis method and Gibbs sampling.

- Simple Metropolis algorithms, although widely used, perform poorly because they explore the space by a slow random walk. More sophisticated Metropolis algorithms such as hybrid Monte Carlo make use of proposal densities that give faster movement through the state space. The efficiency of Gibbs sampling is also troubled by random walks. The method of ordered overrelaxation is a general purpose technique for suppressing them.

ACKNOWLEDGEMENTS

This presentation of Monte Carlo methods owes a great deal to Wally Gilks, and I thank Radford Neal for teaching me about Monte Carlo methods and for giving helpful comments on the manuscript.

References

Adler, S. L. (1981) Over-relaxation method for the Monte-Carlo evaluation of the partition function for multiquadratic actions. Physical Review D: Particles and Fields.

Cowles, M. K., and Carlin, B. P. (1996) Markov-chain Monte-Carlo convergence diagnostics: a comparative review. Journal of the American Statistical Association.

Gilks, W., and Wild, P. (1992) Adaptive rejection sampling for Gibbs sampling. Applied Statistics.

Gilks, W. R., Richardson, S., and Spiegelhalter, D. J. (1996) Markov Chain Monte Carlo in Practice. Chapman and Hall.

Green, P. J. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika.

Marinari, E., and Parisi, G. (1992) Simulated tempering: a new Monte-Carlo scheme. Europhysics Letters.

Neal, R. M. (1993) Probabilistic inference using Markov chain Monte Carlo methods. Technical Report CRG-TR-93-1, Dept. of Computer Science, University of Toronto.

Neal, R. M. (1995) Suppressing random walks in Markov chain Monte Carlo using ordered overrelaxation. Technical Report, Dept. of Statistics, University of Toronto.

Neal, R. M. (1996) Bayesian Learning for Neural Networks. Number 118 in Lecture Notes in Statistics. Springer, New York.

Propp, J. G., and Wilson, D. B. (1996) Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms.

Tanner, M. A. (1996) Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions. Springer Series in Statistics, 3rd edn. Springer Verlag.

Thomas, A., Spiegelhalter, D. J., and Gilks, W. R. (1992) BUGS: A program to perform Bayesian inference using Gibbs sampling. In Bayesian Statistics 4, ed. J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith. Clarendon Press, Oxford.

Yeomans, J. (1992) Statistical Mechanics of Phase Transitions. Clarendon Press, Oxford.

For a full bibliography and a more thorough review of Monte Carlo methods, the reader is encouraged to consult Neal (1993), Gilks et al. (1996) and Tanner (1996).