
Markov chain Monte Carlo in action: a tutorial

Peter J. Green

University of Bristol, Department of Mathematics, Bristol, BS8 1TW, UK.

[email protected]

1. Introduction

Markov chain Monte Carlo is probably about 50 years old, and has been both developed and extensively used in physics for the last four decades. However, the most spectacular increase in its impact and influence in statistics and probability has come since the late '80's.

It has now come to be an all-pervading technique in statistical computation, in particular for Bayesian inference, and especially in complex stochastic systems.

2. Cyclones example: point processes and change points

We will illustrate the ideas of MCMC with a running example: the observations are a point process of events at times y_1, y_2, ..., y_N in an interval [0, L). We suppose the events occur as a Poisson process, but at a possibly non-uniform rate: say x(t) per unit time, at time t; we wish to make inference about x(t). We consider a series of models, ultimately allowing an unknown number of change points, unknown hyperparameters, and a parametric periodic component.

The models and the respective inferences will be illustrated by an analysis of a data set of the times of cyclones hitting the Bay of Bengal; there were 141 cyclones over a period of 100 years.

Model 1: constant rate. First suppose that x(t) ≡ x for all t. Then the times of the events are immaterial: we observe N events in a time interval of length L; the obvious estimate of x is x̂ = N/L, the maximum likelihood estimate of x under the assumption that N has a Poisson distribution with mean xL.

Model 2: constant rate, the Bayesian way. For a Bayesian approach to this example, suppose that we have prior information about x, from previous studies for example. Suppose we can model this by x ~ Gamma(α, β). Then we find that a posteriori x has a Gamma distribution with mean (α + N)/(β + L), or approximately N/L if N and L are large compared with α and β. Thus with a lot of data, the Bayesian posterior mean is close to the maximum likelihood estimator.

There is no need for MCMC in this model: you can calculate the posterior exactly, and recognise it as a standard distribution; it only worked like this because we used a conjugate prior.
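To verify the arithmetic of the conjugate update, the posterior parameters can be computed directly; the following is a minimal Python sketch, assuming illustrative prior values α = β = 1 (the text does not fix them):

```python
# Conjugate Gamma-Poisson update for model 2.
# Prior: x ~ Gamma(alpha, beta); data: N events in time L.
# Posterior: x | N ~ Gamma(alpha + N, beta + L).

def posterior_params(alpha, beta, N, L):
    """Return the Gamma (shape, rate) parameters of the posterior for x."""
    return alpha + N, beta + L

# Cyclones data: 141 events in 100 years.
N, L = 141, 100.0
alpha, beta = 1.0, 1.0          # illustrative prior values (assumption)
shape, rate = posterior_params(alpha, beta, N, L)
post_mean = shape / rate        # (alpha + N) / (beta + L)
mle = N / L                     # maximum likelihood estimate from model 1

print(post_mean, mle)           # posterior mean is close to N/L = 1.41
```

With a lot of data the prior values barely matter, which is the closeness of the posterior mean to the MLE noted above.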

Model 3: constant rate, with hyperparameter. Suppose you are reluctant to specify your prior fully: you are happy to say x ~ Gamma(α, β) and can specify α but not β, and want to state only β ~ Gamma(e, f) for fixed e and f (a formulation that makes more sense in our next formulation, model 4).

Now p(x | N, α, e, f) no longer has an explicit form, but the full conditionals p(x | N, α, β, e, f) and p(β | x, N, α, e, f) are simple:

    x | N, α, β, e, f ~ Gamma(α + N, β + L)

as before, and

    β | x, N, α, e, f ~ Gamma(e + α, f + x).

What happens if we generate a sample of (x, β) pairs by alternately drawing x and β from these distributions?
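The alternating draws can be sketched in a few lines of Python; this is an illustrative implementation (the value α = 1 is an assumption, not from the text), using `random.gammavariate`, which is parameterised by shape and *scale*, so each rate must be inverted:

```python
import random

# Gibbs sampler for model 3: alternately draw
#   x    | beta ~ Gamma(alpha + N, rate = beta + L)
#   beta | x    ~ Gamma(e + alpha, rate = f + x)
# random.gammavariate(shape, scale) uses scale = 1/rate.

def gibbs_model3(N, L, alpha, e, f, n_sweeps, seed=1):
    random.seed(seed)
    x, beta = N / L, 1.0                 # arbitrary starting values
    samples = []
    for _ in range(n_sweeps):
        x = random.gammavariate(alpha + N, 1.0 / (beta + L))
        beta = random.gammavariate(e + alpha, 1.0 / (f + x))
        samples.append((x, beta))
    return samples

# Cyclones data, with e = 1 and f = N/L = 1.41 as in the text.
samples = gibbs_model3(N=141, L=100.0, alpha=1.0, e=1, f=1.41, n_sweeps=1000)
xs = [x for x, _ in samples]
print(sum(xs) / len(xs))   # posterior mean of x, near 1.4
```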

Figure 1 shows the first few moves of this process applied to model 3 on the cyclones data; we took e = 1 and f = N/L = 1.41. The marginal distribution for x, as accumulated from the first 1000 sweeps of this process, is also displayed in Figure 1.

Figure 1: First few moves of the Gibbs sampler, and marginal distribution for x from 1000 sweeps, for the cyclones data, model 3.

This is a simple example of a Gibbs sampler. The alternating updates of one variable conditioned on the other induce Markov dependence: the successively sampled pairs form a Markov chain on the uncountable state space R⁺ × R⁺, and it is readily shown that the joint posterior is the unique invariant distribution of the chain. Standard theorems imply that the chain converges to this invariant distribution in several useful senses, so that we can treat the realised values as a sample from the posterior.

Model 4: constant rate, with change points. Now let us suppose x(t) is a step function, a suitable model if we postulate one or more change points; the process is completely random, but the rate switches between levels. Let us take k change points at known times T_1, T_2, ..., T_k, with

    x(t) = x_0  if 0 ≤ t < T_1,
           x_1  if T_1 ≤ t < T_2,
           ...
           x_k  if T_k ≤ t < L.

Suppose that x_0, x_1, ..., x_k are a priori independently drawn from Gamma distributions, as before: x_j ~ Gamma(α, β). Then if N_0, N_1, ..., N_k are the numbers of events between adjacent {T_j} (with T_0 = 0 and T_{k+1} = L), the above method extends to sampling in turn from

    x_j | ·  ~  Gamma(α + N_j, β + T_{j+1} − T_j),  j = 0, 1, ..., k,  and

    β | ·  ~  Gamma(e + (k+1)α, f + Σ_{i=0}^{k} x_i),

forming a Markov chain with a (k+2)-dimensional state space {x_0, x_1, ..., x_k, β}. Note that we write '| ·' to mean 'given all other variables', including the data.

The hierarchical model using random β allows 'borrowing strength' in estimation from all the data together: the x_j are conditionally independent given β, but are unconditionally dependent. In inference their values will be shrunk together.
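The extended sampler is one Gibbs sweep over the segment heights and then β; a minimal sketch, in which the change-point times, event counts, and the values of α, e, f are illustrative assumptions rather than quantities from the text:

```python
import random

# One Gibbs sweep for model 4: known change points T_1..T_k.
# Segment j spans [T_j, T_{j+1}) with T_0 = 0 and T_{k+1} = L,
# contains counts[j] events, and has rate x[j].

def gibbs_sweep_model4(x, beta, counts, times, alpha, e, f):
    k = len(x) - 1
    for j in range(k + 1):
        length = times[j + 1] - times[j]
        x[j] = random.gammavariate(alpha + counts[j], 1.0 / (beta + length))
    beta = random.gammavariate(e + (k + 1) * alpha, 1.0 / (f + sum(x)))
    return x, beta

random.seed(2)
# Hypothetical two-change-point setup (illustrative numbers, not the data):
times = [0.0, 30.0, 70.0, 100.0]     # T_0 = 0, T_1, T_2, T_{k+1} = L
counts = [20, 80, 41]                # N_0, N_1, N_2
x, beta = [1.0, 1.0, 1.0], 1.0
for _ in range(500):
    x, beta = gibbs_sweep_model4(x, beta, counts, times, alpha=1.0, e=1.0, f=1.41)
print(x, beta)
```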

3. Beyond the Gibbs sampler: MCMC in general

Having motivated the idea of MCMC by use of the Gibbs sampler in a very basic problem, we are now in a position to discuss the subject from a rather more general perspective.

The objective is to construct a discrete time Markov chain whose state space is X, the parameter space in our setting, and whose limiting distribution is a specified target π, e.g. a Bayesian posterior. That is, we want a transition kernel P such that

    P{x^t ∈ A | x^0} → π(A)  as t → ∞,  for all x^0.

Having constructed such a Markov chain, in the sense of devising a transition kernel with this limiting property, we then construct it in another sense: we form a realisation of the chain {x^1, x^2, ..., x^N} and treat this as if it was a random sample from π.

A number of standard recipes have been developed (see, for example, Besag et al., 1995), including the Gibbs sampler, and the Metropolis and Hastings methods; the latter is very general, and can even be extended to cases where the parameter space is not of fixed dimension. We illustrate some of the extensions to our point process model that can also be handled.
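All of these recipes share the same accept/reject skeleton; as a generic illustration (deliberately not one of the cyclones models), here is a minimal random-walk Metropolis sampler for a one-dimensional target known only up to a normalising constant:

```python
import math, random

# Random-walk Metropolis for a generic 1-D target known up to a constant.
# Here the target is a standard normal density, purely for illustration.

def log_target(x):
    return -0.5 * x * x

def metropolis(n_steps, step=1.0, seed=4):
    random.seed(seed)
    x, chain = 0.0, []
    for _ in range(n_steps):
        y = x + random.uniform(-step, step)       # symmetric proposal
        # Accept with probability min(1, pi(y)/pi(x)); rejected moves
        # keep the current state, which is essential for correctness.
        if math.log(random.random()) < log_target(y) - log_target(x):
            x = y
        chain.append(x)
    return chain

chain = metropolis(20000)
print(sum(chain) / len(chain))   # near 0, the target mean
```

Because only the ratio π(y)/π(x) enters, the normalising constant of the target never needs to be known, which is what makes these methods so useful for Bayesian posteriors.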

Model 5: another hyperparameter. Let us now suppose α is also unknown, with, a priori, α ~ Gamma(c, d) for fixed constants c and d. For the cyclones data, we took c = d = 2.

This last change means that Gibbs sampling is not enough. In a Markov chain with states x = (x_0, x_1, ..., x_k, α, β), we can update α using a Metropolis move, on the log(α) scale: the acceptance ratio simplifies to

    min{ 1, (Γ(α)/Γ(α'))^{k+1} (α'/α)^{c} (β^{k+1} Π_{j=0}^{k} x_j)^{α'−α} e^{−d(α'−α)} },

where α' denotes the proposed new value.
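One way to realise this update is to work directly from the full conditional p(α | ·) ∝ α^(c−1) e^(−dα) Π_j β^α x_j^(α−1) / Γ(α), proposing multiplicatively on the log scale; the segment heights and β below are illustrative assumptions:

```python
import math, random

# Metropolis update for alpha on the log scale (model 5).
# Full conditional: p(alpha | rest) is proportional to
#   alpha^(c-1) * exp(-d*alpha) * prod_j beta^alpha * x_j^(alpha-1) / Gamma(alpha)

def log_cond_alpha(alpha, x, beta, c, d):
    k1 = len(x)   # k + 1 segment heights
    return ((c - 1) * math.log(alpha) - d * alpha
            + k1 * (alpha * math.log(beta) - math.lgamma(alpha))
            + (alpha - 1) * sum(math.log(xj) for xj in x))

def update_alpha(alpha, x, beta, c=2.0, d=2.0, step=0.5):
    new = alpha * math.exp(random.uniform(-step, step))  # log-scale proposal
    # The multiplicative move contributes a Jacobian factor new/alpha.
    log_ratio = (log_cond_alpha(new, x, beta, c, d)
                 - log_cond_alpha(alpha, x, beta, c, d)
                 + math.log(new) - math.log(alpha))
    if math.log(random.random()) < log_ratio:
        return new
    return alpha

random.seed(5)
alpha = 1.0
for _ in range(1000):
    alpha = update_alpha(alpha, x=[0.7, 2.0, 1.4], beta=1.0)
print(alpha)   # stays positive; c = d = 2 as in the text
```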

Model 6: unknown change points. If x_0, x_1, ..., x_k are unknown, so probably are the times of the change points T_1, T_2, ..., T_k. Let us assume a priori

    p(T_1, T_2, ..., T_k) ∝ T_1 (T_2 − T_1)(T_3 − T_2) ... (T_k − T_{k−1})(L − T_k).

The posterior marginal or joint conditional distributions are quite complex, for this or any prior, so Metropolis-Hastings is needed. The details are a little messy but straightforward. The acceptance probability for a proposal that T'_j be drawn uniformly from [T_{j−1}, T_{j+1}] is

    min{ 1, (likelihood ratio) × ((T'_j − T_{j−1})(T_{j+1} − T'_j)) / ((T_j − T_{j−1})(T_{j+1} − T_j)) }.
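A sketch of this move in Python, with hypothetical event times and rates; the bookkeeping of which events change segment when T_j moves is the 'messy but straightforward' part of the likelihood ratio:

```python
import math, random
from bisect import bisect_left

# Metropolis-Hastings move for one change point T_j (model 6).
# T_j separates the segment with rate x[j-1] from the one with rate x[j];
# events is a sorted list of event times, and T holds
# T_0 = 0, T_1, ..., T_k, T_{k+1} = L for convenience.

def count_in(events, a, b):
    """Number of events in [a, b)."""
    return bisect_left(events, b) - bisect_left(events, a)

def update_T(j, T, x, events):
    left, old, right = T[j - 1], T[j], T[j + 1]
    new = random.uniform(left, right)              # uniform proposal
    # Prior ratio from p(T_1..T_k) proportional to the product of gaps:
    log_ratio = (math.log(new - left) + math.log(right - new)
                 - math.log(old - left) - math.log(right - old))
    # Likelihood ratio: events between old and new switch segments,
    # and the exponential terms change with the segment lengths.
    moved = count_in(events, min(old, new), max(old, new))
    sign = 1 if new > old else -1                  # direction of the move
    log_ratio += sign * moved * (math.log(x[j - 1]) - math.log(x[j]))
    log_ratio += (x[j] - x[j - 1]) * (new - old)
    if math.log(random.random()) < log_ratio:
        T[j] = new
    return T

random.seed(7)
events = sorted(random.uniform(0, 100) for _ in range(141))  # hypothetical data
T = [0.0, 50.0, 100.0]          # one change point, k = 1
x = [2.0, 1.0]                  # hypothetical segment rates
for _ in range(200):
    T = update_T(1, T, x, events)
print(T[1])
```

Because the proposal is uniform on the fixed interval [T_{j−1}, T_{j+1}], it is symmetric, so only the prior and likelihood ratios appear in the acceptance probability.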

Model 7: unknown number of change points. What if the number of change points, k, is also unknown? We might place a prior on k, say Poisson:

    p(k) = e^{−λ} λ^k / k!

and then make joint inference about all unknowns: x = (k, α, β, T_1, ..., T_k, x_0, ..., x_k).

There are 2k + 4 parameters: the number of things you don't know is one of the things you don't know!

Figure 2: Posterior sample of step functions x(t) for model 6 with k = 2, and posterior for k in model 7, applied to cyclones data.

A variable-dimension Metropolis-Hastings sampler was applied to this problem, setting the hyperparameter λ = 3, and one aspect of the resulting analysis is displayed in Figure 2.

REFERENCES AND FURTHER READING

Besag, J., Green, P. J., Higdon, D. and Mengersen, K. (1995) Bayesian computation and stochastic systems (with discussion), Statistical Science, 10, 3-66.

Gilks, W. R., Richardson, S. and Spiegelhalter, D. J. (eds.) (1996) Markov chain Monte Carlo in practice, Chapman and Hall, London.

Green, P. J. (1995) Reversible jump Markov chain Monte Carlo computation and Bayesian model determination, Biometrika, 82, 711-732.

Green, P. J. (1999) A primer on Markov chain Monte Carlo. To appear; a preprint version can be found at http://www.stats.bris.ac.uk/peter/papers/semstat.ps.

RESUME

Markov chain Monte Carlo methods are about fifty years old, and have been developed and used in physics for the last forty years. However, their most spectacular impact and influence in statistics and probability date only from the late 1980s.

At present, these methods are ubiquitous in statistical computation, in particular in Bayesian inference, and especially in the analysis of complex stochastic systems.