
PERFECT SAMPLING USING BOUNDING CHAINS

A Dissertation

Presented to the Faculty of the Graduate School

of Cornell University

in Partial Fulfillment of the Requirements for the Degree of

Doctor of Philosophy

by

Mark Lawrence Huber

May

© Mark Lawrence Huber

ALL RIGHTS RESERVED

PERFECT SAMPLING USING BOUNDING CHAINS

Mark Lawrence Huber, Ph.D.

Cornell University

In Monte Carlo simulation, samples are drawn from a distribution to estimate properties of the distribution that are too difficult to compute analytically. This has applications in numerous fields, including optimization, statistical mechanics, genetics, and the design of approximation algorithms.

In the Monte Carlo Markov chain method, a Markov chain is constructed which has the target distribution as its stationary distribution. After running the Markov chain long enough, the distribution of the final state will be close to the stationary distribution of the chain. Unfortunately, for most Markov chains the time needed to converge to the stationary distribution is completely unknown.

Here we develop several new techniques for dealing with unknown mixing times. First we introduce the idea of a bounding chain, which delivers a wealth of information about the chain. Once a bounding chain is created for a particular chain, it is possible to empirically estimate the mixing time of the chain. Using ideas such as coupling from the past and the Fill-Murdoch-Rosenthal algorithm, bounding chains can also become the basis of perfect sampling algorithms. Unlike traditional Monte Carlo methods, these algorithms draw samples which are exactly distributed according to the stationary distribution.

We develop bounding chains for several Markov chains of practical interest, such as the Swendsen-Wang chain for the Ising model, the Dyer-Greenhill chain for the discrete hard core gas model, and the continuous Widom-Rowlinson mixture model with more than three components in the mixture. We also give techniques for sampling from weighted permutations, which have applications in database access and nonparametric statistical tests. In addition, we present here bounding chains for a variety of Markov chains of theoretical interest, such as the k coloring chain, the sink free orientation chain, and the antiferromagnetic Potts model with more than three colors. Finally, we develop new Markov chains and bounding chains for the continuous hard core gas model and the Widom-Rowlinson model which are provably faster in practice.

Biographical Sketch

Mark Lawrence Huber was born in the quaint town of Austin, Minnesota, home of the Hormel corporation. Sensing the eventual rise to power of Jesse "The Body" Ventura, Mark's family moved out of the state, living in Yukon, OK, Stillwater, OK, Ames, IA, Corvallis, OR, and Hays, KS, before settling in La Grande, OR, where he graduated from high school. Mark then moved south to sunny Claremont, California, and Harvey Mudd College, where he completed a Bachelor of Science in Mathematics. Missing the snow of his native land, Mark headed to Cornell and the School of Operations Research and Industrial Engineering. Upon completion of his Ph.D., Mark will spend the next two years at Stanford University as a National Science Foundation Postdoctoral Fellow in the Mathematical Sciences.

To my grandmother, whose love has always been an inspiration.

Acknowledgements

First of all, I would like to thank my advisor David Shmoys, who has lasted stoically through my research ups and downs, my eccentric grammar, and my complete lack of organizational skills. He has been a good friend and a great advisor, and I hope to work with him again in the future.

Other professors have also had a great impact on my Cornell experience. The remaining members of my thesis committee, Sid Resnick, Jon Kleinberg, and Rick Durrett, were always willing to lend a hand. I'd also like to thank Persi Diaconis, who first introduced me to the beauty and elegance of rapidly mixing Markov chain theory, a world of problems and techniques that I hadn't known existed.

This thesis would not be here without the tireless efforts of Nathan Edwards, whose computer expertise, LaTeX prowess, and willingness to listen to my litany of problems is second to none.

Financially, this work was supported through numerous sources. An Office of Naval Research Fellowship carried me through the early years, with ONR grant N, NSF grants CCR, CCR, and DMS, and ASSERT grant N finishing the job.

I would also like to thank my roommates during my stay at Cornell, Bas de Blank and Chris Papadopoulos, both of whom had to suffer with my policies (or lack thereof) regarding the proper storage of all manner of items.

The friends I have made here are too many to list; however, I would like to specifically thank my graduate class: Greta Pangborn, Nathan Edwards, Ed Chan, Paulo Zanjacomo, Semyon Kruglyak, and Fabian Chudak. A guy couldn't ask for a better group of people to commiserate with. Thanks go out to Stephen Gulyas, who always had a spot on his intramural softball teams for me, and who was always willing to try to turn my tennis swing back into a baseball swing come springtime.

Finally, I would like to thank all of my family, from the Midwest to the Pacific Northwest, whose support has always been unconditional. Some great events have happened while I've been at Cornell, and I'm glad to have been able to be there for them.

Table of Contents

The Need for Markov chains
Monte Carlo Markov chain methods
Markov chains
Going the distance
The Discrete Models
The Ising model
The antiferromagnetic Ising model and MAX CUT
The Potts Model
The hard core gas model
The Widom-Rowlinson mixture model
Q colorings of a graph
Sink Free Orientations of a graph
Hypercube slices
Applying the Monte Carlo method
Building Markov chains
Conditioning chains
The Heat Bath chain
Metropolis-Hastings
The acceptance rejection heat bath chain
The Swap Move
The Ising and Potts models
Antiferromagnetic Potts model at zero temperature
Swendsen-Wang
Sink Free Orientations
Widom-Rowlinson
The Antivoter model
The List Update Problem
Hypercube slices

What remains: mixing time
Bounding Chains
Monotonicity
Antimonotonicity
The bounding chain approach
Bounding the Dyer-Greenhill Hard Core chain
Martingales
Running time of bounding chain for Dyer-Greenhill
Bounding chains for other models
Q coloring chain
The Potts model
Swendsen-Wang
Sink free orientations
Hypercube slices
Widom-Rowlinson
Nonlocal conditioning chain
The single site heat bath chain
The birth death swapping chain
The antivoter model
The list update problem
Application to nonparametric testing
Other applications of bounding chains
Perfect sampling using coupling from the past
Reversing the chain
Coupling from the past
CFTP and bounding chains
Coupling from the future
Perfect sampling using strong stationary times
Upper bounds on the strong stationary time
Application to local update chains
The Hard Core Gas Model
Single site Widom-Rowlinson
Application to nonlocal chains
Continuous Models
Continuous state space Markov chains
Continuous time Markov chains
The Continuous Hard Core Gas Model
Continuous bounding chains
More on supermartingales
The Swapping Continuous Hard Core chain
Widom-Rowlinson
Continuous swapping chain
Final thoughts
Bibliography

List of Tables

Swendsen-Wang approach comparison

List of Figures

General Monte Carlo Markov chain method
The general heat bath Markov chain
Single site hard core heat bath chain
Nonlocal hard core heat bath chain
The general Metropolis-Hastings Markov chain
Single site hard core Metropolis-Hastings chain
The general acceptance rejection heat bath Markov chain
Acceptance rejection single site hard core heat bath chain
Dyer and Greenhill hard core chain step
Single site Potts heat bath chain
Single site Q coloring heat bath chain
Swendsen-Wang chain
Single edge heat bath sink free orientations chain
Single site heat bath discrete Widom-Rowlinson chain
Nonlocal conditioning discrete Widom-Rowlinson chain
Birth death discrete Widom-Rowlinson chain
Birth death swapping discrete Widom-Rowlinson chain
The Antivoter model chain step
MA list update chain
MTF list update chain
Adjacent transposition for MA distribution chain
Arbitrary transposition for MA distribution chain
Heat bath hypercube slice chain
Experimental Upper Bounds on the Mixing Time
Bounding chain step for Dyer-Greenhill chain
Single site acceptance rejection heat bath Q coloring chain
Bounding chain for single site Q coloring heat bath chain
Single site acceptance rejection heat bath Potts chain
Single site acceptance rejection heat bath Potts bounding chain
Swendsen-Wang bounding chain
Single edge heat bath sink free orientation bounding chain
Alternative heat bath hypercube slice chain
Heat bath hypercube slice bounding chain, Phase I
Heat bath hypercube slice bounding chain
Nonlocal conditioning chain for Widom-Rowlinson
Acceptance rejection single site heat bath Widom-Rowlinson bounding chain
Antivoter bounding chain
MA list update chain
Arbitrary transposition for MA bounding chain
Arbitrary transposition for MA bounding chain
Coupling from the past (CFTP)
Complete coupling strong stationary times
Bounding Chain for Lotwick-Silverman
Swapping continuous hard core chain
Swapping continuous hard core bounding chain
Birth death continuous Widom-Rowlinson chain
Birth death continuous Widom-Rowlinson chain
Swapping birth death continuous Widom-Rowlinson chain
Swapping birth death continuous Widom-Rowlinson chain
T_BC for list update chain

Chapter 1

The Need for Markov chains

How dare we speak of the laws of chance? Is not chance the antithesis of all law?

Bertrand Russell, Calcul des probabilités

Modeling a roulette wheel is quite a bit simpler with probability theory than with Newtonian mechanics. While it is theoretically possible to observe the initial angular momentum imparted to the wheel and ball, followed by an equally intense investigation into the frictional properties that cause the ball to land on red instead of black, a probabilistic approach captures the phenomenon in an elegant manner and yields results that are both accurate and insightful as regards long term predictions over whether the player will return home poorer than before.

Analysis of gambling systems provided an initial impetus to build a theory of probability, but today many systems whose interactions are too complex to model exactly are modeled probabilistically. Often this provides enormous gains, both in the simplicity of the model and in the ability to analyze various properties of the system.

Today's models have evolved into probability distributions that are quite easy to describe, but which require the use of sophisticated arguments to analyze directly. A way to avoid use of these complicated techniques is to use a simulation approach. When using this technique, no attempt is made to analyze properties of the distributions directly; instead, samples are drawn from the distributions, and then statistical estimates are formed for properties of interest. Once the ability to generate random samples is given, a host of statistical methods can be brought to bear on the problem. The accuracy and application of these statistical estimates has been extensively studied. The question that remains is how exactly does one draw these random samples in an efficient manner. The answer for many applications is to use Monte Carlo Markov chain methods. This work introduces a new method in this area called bounding chains, an idea that often can lead to theoretical and experimental insight into a problem.

In this first chapter we describe the Markov chain method, starting with basic definitions and facts concerning Markov chains and laying out the notation that will be used throughout this dissertation. The next chapter presents the discrete models and distributions to which we will be applying our methods, after which we describe the Markov chains developed previously for these problems. In the fourth chapter we introduce our primary tool, bounding chains. Bounding chains provide a basis for algorithms for experimentally determining the time needed to draw approximate random samples using a Markov chain, as well as forming the foundation for algorithms that draw perfectly random samples. After introducing bounding chains, we show how this approach may be utilized for each of the discrete models discussed earlier. The following chapter explores the difference between approximate and perfect sampling, with an exposition of one technique for perfect sampling, the coupling from the past method of Propp and Wilson. After that, we present another perfect sampling technique, an algorithm of Murdoch and Rosenthal that builds on earlier work of Fill, as well as the first analysis of its running time. Finally, we turn to continuous models and explore how the successful techniques from the discrete case may be used to construct and analyze Markov chains for infinite state space processes.

For many probability distributions of interest, we present the first algorithms for perfect sampling: the hard core gas model using the Dyer-Greenhill chain, the heat bath chain for sink-free orientations of a graph, the heat bath chain for k colorings of a graph, the antiferromagnetic Potts model (these last two were independently discovered, though not fully analyzed), the move ahead one chain for database access, the continuous Widom-Rowlinson mixture model with more than three components in the mixture, the antivoter model, and the Swendsen-Wang chain for the Ising model. For the continuous hard core gas model, we develop new bounds on the mixing time of the chain. In addition, we develop new Markov chains for the continuous hard core gas model, the repulsive area interaction model, and the Widom-Rowlinson model which have provably stronger bounds on the mixing time than were previously known. The work on perfect sampling and mixing times primarily emerges from an understanding of bounding chains, a simple but extraordinarily powerful tool. The improved chains come from a generalization and application of a simple swapping move of Broder to a wide variety of chains.

Monte Carlo Markov chain methods

Anyone who has ever shuffled a deck of cards has used a Monte Carlo Markov chain (MCMC) method. The goal of any MCMC algorithm is to draw a random sample from a specific probability distribution. In the case of shuffling cards, the distribution is the uniform distribution over all permutations of the cards.

A Markov chain is a stochastic process possessing the forgetfulness property. Informally, this means that the next state of the chain depends only on the value of the previous state and random choices made during that time step. It does not depend on the value of any prior state. This property makes simulation of a Markov chain very easy. The user need only compute a random function of the current state, without regard to the values of prior states.

For the first five chapters of this dissertation we will be dealing with Markov chains which have a finite state space, and so the definitions we give here will apply to this discrete case. In the chapter on continuous models we expand our scope to include continuous state spaces, and we present a more general treatment of Markov chains there.

Markov chains

Let Ω be a finite set, which we will call the sample space or state space. Let X_0, X_1, X_2, ... be a stochastic process with values chosen from Ω. Let F_i = σ(X_0, X_1, ..., X_i) be the σ-algebra generated by X_0, ..., X_i.

Definition: The stochastic process X_0, X_1, X_2, ... on Ω is said to be a Markov chain if

P(X_{i+1} = j | X_0, ..., X_i) = P(X_{i+1} = j | X_i).

We will use a matrix P to denote the probability of moving from state i to state j at a given time. More precisely, let P(i, j) = P(X_{t+1} = j | X_t = i).

Definition: An |Ω| × |Ω| matrix P is a transition matrix for a Markov chain if

P(i, j) = P(X_{t+1} = j | X_t = i)

for all i and j in Ω.

From the definition of P this fact follows immediately.

Fact: Suppose that the random variable X_t has distribution p. Then X_{t+1} has distribution pP.

An easy induction argument together with the above fact yields the following.

Fact: Let P^k(i, j) denote the (i, j) element of the matrix P^k. Then

P^k(i, j) = P(X_{t+k} = j | X_t = i).

Often it is desirable that the process X be able to move over every state of the Markov chain.

Definition: A Markov chain is connected (or irreducible) if for all i, j in Ω there exists a time t ≤ |Ω| such that

P^t(i, j) > 0.

Consider a random walk on a graph where the states are nodes of a bipartite graph, and at each time step the state changes to a random neighbor of the current node. Then at all even times the state will be in one bipartition, and at odd times it will be in the other. We say that such a Markov chain has period 2. More generally, we have the following definition.

Definition: Suppose that we have an irreducible Markov chain and that Ω is partitioned into k sets E = {E_1, ..., E_k}. If for all i and all x ∈ E_i we have Σ_{y ∈ E_j} P(x, y) = 1, where j = i + 1 mod k, then E forms a k-cycle in the Markov chain. The period of a Markov chain is the largest value of k for which a k-cycle exists. If k = 1, the Markov chain is said to be aperiodic.

Together, the properties of aperiodicity and irreducibility have important implications for a Markov chain.

Definition: A Markov chain is ergodic if it is both connected and aperiodic.

We have seen that if X_t has distribution p, then X_{t+1} will have distribution pP. If in fact pP = p, then X_{t+1} will have the same distribution as X_t. Induction can be used to show that X_{t'} will be identically distributed for all t' ≥ t.

Definition: If πP = π, then π is a stationary distribution of the Markov chain.

Throughout this work the symbol π will be used to denote the stationary distribution of some Markov chain, though often we will describe π before we describe the Markov chain for which it is the stationary distribution.

Ergodic Markov chains are useful for the following reason.

Theorem: An ergodic Markov chain has a unique stationary distribution.

Intuitively, a Markov chain step represents a moving of probability flow along edges (i, j) such that P(i, j) > 0. The stationary distribution is the probability distribution such that the probability flow into each node is exactly balanced by the probability flow out of each node. This is a general balance condition: flow in equals flow out. A more restrictive condition would be to require that flow across an edge in one direction is exactly balanced by the flow across the edge in the opposite direction.

Definition: A Markov chain is reversible, or is said to satisfy the detailed balance condition, if for all i, j in Ω,

π(i)P(i, j) = π(j)P(j, i).

If p(i)P(i, j) = p(j)P(j, i), then we say that the Markov chain is reversible with respect to the distribution p, and in fact p will be a stationary distribution of the chain.

A Markov chain satisfies the detailed balance condition if at stationarity the probability flow along each edge (i, j) is the same as the probability flow along each edge (j, i). The following shows that detailed balance is a stronger condition than general balance.

Fact: If a Markov chain is reversible with respect to the distribution p, then p is a stationary distribution for the chain. If the Markov chain is connected, this p is the unique stationary distribution for the chain.
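As a small numerical illustration of this fact, the following Python sketch (the helper name and the particular three state chain are chosen only for illustration) checks detailed balance for a transition matrix and a distribution, and then confirms that the distribution is unchanged by one step of the chain.

import numpy as np

def satisfies_detailed_balance(P, pi, tol=1e-12):
    """Check pi(i) P(i,j) == pi(j) P(j,i) for all pairs i, j."""
    flow = pi[:, None] * P          # flow[i, j] = pi(i) P(i, j)
    return np.allclose(flow, flow.T, atol=tol)

# A tiny reversible chain on three states (rows sum to one).
P = np.array([[0.5, 0.3, 0.2],
              [0.3, 0.4, 0.3],
              [0.2, 0.3, 0.5]])
pi = np.array([1/3, 1/3, 1/3])      # uniform, since P is symmetric

assert satisfies_detailed_balance(P, pi)
assert np.allclose(pi @ P, pi)      # detailed balance implies pi P = pi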

The names reversibility and detailed balance appear to have nothing whatsoever to do with one another. In a later chapter we go into more detail about why this condition is also called reversibility.

Going the distance

Our ultimate goal is to sample from a target distribution. Given an algorithm which generates from some probability distribution, we need a method for determining when our algorithmic output is close to our desired distribution. That is, we need a metric on distributions. We will use two common measures of distance: the total variation distance and the separation.

Definition: The total variation distance between a pair of distributions p and q is denoted ||p − q||_TV and defined as

||p − q||_TV = sup_A |p(A) − q(A)|.

The following fact about total variation distance will come in handy later.

Fact: For discrete state spaces,

||p − q||_TV = (1/2) Σ_{x ∈ Ω} |p(x) − q(x)|.

The separation between a distribution p and π is defined as follows.

Definition: The stationary distance between p and π is denoted ||p − π||_S and defined as

||p − π||_S = sup_A (π(A) − p(A)) / π(A) = sup_A (1 − p(A)/π(A)).

Since π(A) ≤ 1 for all A, clearly ||p − π||_S ≥ ||p − π||_TV, and this is a stronger way of measuring how far p is from stationarity.

These two means of quantifying how close the current state is to stationarity have some advantageous theoretical properties and are by far the most commonly seen in practice.

The heart of the Monte Carlo Markov chain method is the idea that if a chain is run for a long time from any starting distribution, then it will move towards a stationary distribution of the chain. Armed with our metrics, we may now make statements about limits of distributions and make this concept precise.

Theorem: Suppose we have an aperiodic Markov chain. Then lim_{t→∞} pP^t will be stationary, and if the Markov chain is ergodic, then

lim_{t→∞} pP^t = π,

where π is the unique stationary distribution of the chain.

Once we know that pP^t is converging to π, the next logical question is how fast is it converging. We measure this by the mixing time of the chain. A fact will make this definition easier.

Fact: Given an ergodic Markov chain with stationary distribution π, we have that ||pP^t − π||_TV is a monotonically decreasing function of t. Put another way, ||pP^{t+1} − π||_TV ≤ ||pP^t − π||_TV for all t.

Let δ_x denote the distribution that puts probability 1 on state x and 0 elsewhere, and set P^t_x = δ_x P^t. This is the distribution of a process that was begun in state x and run for t time steps.

Definition: Let τ_TV(x, ε) be the smallest time t such that ||P^t_x − π||_TV ≤ ε. Let

τ_TV(ε) = max_x τ_TV(x, ε).

Define τ_S(x, ε) and τ_S(ε) in the same fashion using the stationary distance.

Once we know the mixing time of a chain, describing the basic Monte Carlo Markov chain method is easy.

Monte Carlo Markov chain (MCMC) method

Input: P, ε, starting state x ∈ Ω
Set X_0 ← x
For i ← 1 to τ_x(ε)
    Take one step on the Markov chain P from X_{i−1}
    Set X_i to be the output of this step
Output X_{τ_x(ε)}

Figure: General Monte Carlo Markov chain method
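The figure above translates almost line for line into code. The following is a minimal Python sketch, where markov_step stands in for one application of the transition rule P and the argument t plays the role of the mixing time τ_x(ε); the function and parameter names are illustrative only.

import random

def mcmc_sample(initial_state, markov_step, t, rng=random):
    """Run a Markov chain for t steps from initial_state and return the final state.

    markov_step(state, rng) must return one step of the chain; t should be at
    least the mixing time for the output to be approximately stationary.
    """
    x = initial_state
    for _ in range(t):
        x = markov_step(x, rng)
    return x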

The only thing preventing this from being an actual algorithm is lack of knowledge about the value of τ_x(ε). While many heuristics exist for determining this value, we will concern ourselves in the following chapters with developing upper bounds for τ which are always accurate.

Definition: When τ(ε) is polynomial in ln |Ω| and ln(1/ε), we say that the chain is rapidly mixing.

In this work we will develop methods for showing when chains are rapidly mixing, and ways to deal with them when they are not.

Chapter 2

The Discrete Models

The laws of history are as absolute as the laws of physics, and if the probabilities of error are greater, it is only because history does not deal with as many humans as physics does atoms.

Isaac Asimov, Foundation and Empire

The Monte Carlo Markov chain method is applicable to an amazing range of problems. Anywhere one wishes to obtain estimates of statistics for a probabilistic model, MCMC is often the only reasonable approach for even approximating the answer to a problem.

One of the richest sources of problems in this area comes from statistical mechanics. In this area, substances are modeled as random samples drawn from probability distributions. These distributions have particular values for physical parameters such as energy. Any real substance contains an enormous number of particles, so central limit theorems definitely apply, and statistics for a model are often highly concentrated about a single value. Naturally, in evaluating the usefulness of a particular model, the question arises of what is the average of a particular statistic over a distribution. For special cases this question may be answered analytically, but often the distribution is too complex to allow for a direct approach.

What is perhaps surprising is that many of these models from statistical physics have counterparts of interest in theoretical computer science. Evaluation of most statistics of interest in these models are examples of #P-complete problems, and often simply being able to generate a random sample from these problems cannot be done efficiently unless RP = NP. These statistical models were introduced in many cases decades before the corresponding graph theoretical problem was shown to be NP-hard.

Recall that a problem is in NP if it is a decision problem where a certificate that the answer is true may be checked in polynomial time. Problems in this class include determining whether a boolean expression is satisfiable, or whether or not a graph has a proper coloring. Optimization versions of problems in NP include the traveling salesman problem and integer programming.

The class #P is the set of problems where the goal is to count the number of accepting certificates of a problem in NP. For example, consider once more the problem in NP of finding an assignment of variables which leads to a given Boolean expression being true. The corresponding problem in #P is to count the number of assignments which lead to the expression being true. Clearly a problem in #P is more difficult than one in NP, since if we know how many assignments are true, then we certainly know if at least one assignment is true.

Given the difficulty of solving a #P problem exactly, the logical question is when can we develop algorithms which approximate the true answer. For most of these problems, the ability to sample from the distribution of interest immediately yields such an approximation algorithm.

The statistical physics models we consider are interesting for their ability to model and predict real world substances, and also for their theoretical properties. Often these models exhibit a phenomenon known as a phase transition, where a small change in a parameter of the model leads to an enormous change in the macroscopic properties of the distribution. Phase transitions are often linked to the speed at which a Markov chain simulation runs. Roughly speaking, on one side of the phase transition a chain may be rapidly mixing, but on the other side it may converge very slowly.

This behavior is similar to that of the NP-complete problems related to these models. It is well known that for many problems, approximating an answer to an optimization problem to within a certain constant or higher may be possible in polynomial time, while any improvement in that constant leads to an NP-complete problem.

For all the models we consider, the sample space will be the colorings of a graph; that is, Ω = C^V, where V is the vertex set of a graph (V, E) and C = {1, ..., Q} is a set of Q colors. This contains a wide variety of problems, from generating a random permutation to mixture models of gases. Often the graphs considered in these models are simple lattices in two or three dimensions, although the models will be defined for arbitrary graphs. Throughout this work we will use n to refer to the number of nodes in the graph and m to refer to the number of edges.

Of course, not all of the models we will consider come from statistical physics. We also present problems arising from database searches and the numerical evaluation of statistical tests, as well as several models of interest for their theoretical properties.

The Ising model

Perhaps the most venerated model in statistical physics is the Ising model. This is sometimes called the Lenz-Ising model, since Lenz first proposed it to Ising, who was his student at the time. The model is quite simple, yet contains within it the phase transition behavior mentioned earlier.

The model was first introduced as a model of magnetism. More recently it has found use as a model of alloys and for Quantum Chromodynamics computations. The idea is simple. Our color set consists of two colors, C = {−1, 1}. Following the use of the Ising model as a model of magnetism, we shall refer to nodes colored 1 as spin up and nodes colored −1 as spin down.

A configuration x consists of an assignment of colors to each of the nodes of the graph (V, E). The Hamiltonian of a configuration, H(x), is set to be

H(x) = Σ_{{i,j} ∈ E} α_{ij} x(i)x(j).

The α_{ij} variables measure the strength of interaction across a particular edge. Although the methods we will discuss can deal with the case of arbitrary α_{ij}, for simplicity we will assume that every α_{ij} is 1.

The Ising model is a probability distribution on the set of configurations. The probability of selecting configuration x is

π(x) = exp{ J H(x)/(kT) + Σ_v B_v x(v) } / Z_T.

Here J is either 1 for ferromagnetism or −1 for antiferromagnetism, B is a parameter that measures the external magnetic field, k is Boltzmann's constant, T is the temperature of the model, and Z_T is the value which makes π a probability distribution (i.e., the normalizing constant). Often Z_T is referred to as the partition function.

It will be helpful to gain some intuition about how the parameters interact. Suppose we are dealing with the ferromagnetic Ising model (J = 1). Note that H(x) is large when the values of x(i) and x(j) for an edge {i, j} are the same. Hence a configuration where the endpoints of edges receive the same color is more likely than a configuration where they are different. Physically, this means that the spins tend to line up, making for a stronger magnet.

If J = −1 (antiferromagnetism), then the highest probability states are ones where the endpoints of edges are colored differently. Here the exponent J H(x) is largest when spins do not align.

The value B_v measures the presence of an external magnetic field that biases the configuration towards either spin up or spin down at node v. While the techniques that we use can be modified to incorporate a nonzero B, for simplicity we will always take B to be 0.

The temperature measures how free the spins are to fight their natural ferromagnetic or antiferromagnetic tendencies. When the temperature is very high, π(x) is roughly 1/Z_T regardless of the value of H(x). In this case all the configurations are equally likely, and each individual node is almost as likely to be spin up as spin down.

When T is very small, only states with very large J H(x) have significant probability of occurring. For instance, in the ferromagnetic case, with high probability most of the spins will be pointing in the same direction. This means that the state tends to exhibit long range behavior, where the spins of two nodes on opposite sides of the graph are highly correlated. The distribution changes smoothly with T, and so there is a point where these long range correlations start to appear. Roughly speaking, this is the idea behind phase transitions.

Technically, phase transitions involve discontinuities in properties of the graph, and so truly only occur in graphs with an infinite number of nodes. However, even for relatively modest size graphs, the presence of a phase transition will result in large changes in the properties of a graph with small changes in a parameter such as T. One of the reasons for the primacy of the Ising model in statistical simulations is the fact that this simple model exhibits a phase transition for graphs such as the two-dimensional lattice. Phase transitions are of course prevalent throughout the physical world; for instance, the process of ice turning to water is a phase transition.

The constant k is Boltzmann's constant and arises out of the statistical mechanical justification for the Hamiltonian in the Ising model. Note that x is a two coloring and therefore defines a cut of the graph. Let C(x) denote the number of edges which cross the cut (this is the unweighted value of the cut). Then, as we have defined it, H(x) = m − 2C(x), where m is the number of edges in our graph. The exp{Jm/(kT)} term in the exponential is a constant, so we may introduce a scaled temperature T such that

π(x) = exp{−J C(x)/T} / Z_T.
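For concreteness, the cut size C(x) and the unnormalized weight exp{−J C(x)/T} can be computed as in the following Python sketch, which assumes the graph is given as a list of edges; dividing by the (usually unknown) partition function Z_T would give π(x). The names are illustrative only.

import math

def cut_size(edges, x):
    """C(x): number of edges whose endpoints receive different colors."""
    return sum(1 for (i, j) in edges if x[i] != x[j])

def ising_weight(edges, x, J, T):
    """Unnormalized probability exp(-J * C(x) / T); dividing by Z_T gives pi(x)."""
    return math.exp(-J * cut_size(edges, x) / T)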

The antiferromagnetic Ising model and MAX CUT

There exists a very close relation between the antiferromagnetic Ising model and the problem MAX CUT, which is the problem of finding the largest cut in an arbitrary graph. This problem is known to be NP-complete, so if this problem could be solved in polynomial time, then every problem in NP could be solved in polynomial time. This makes it quite unlikely that an efficient solution to this problem will be found.

The probability of generating a particular cut from the antiferromagnetic Ising model will be exp{C(x)/T}/Z_T. Let the configuration x_max be the coloring associated with the maximum cut in the graph. There are only 2^n − 1 other colorings of the graph, and each of them that is not a maximum cut has weight at most exp{(C(x_max) − 1)/T}; hence the total weight of cuts which are of smaller size than the maximum is at most

2^n exp{(C(x_max) − 1)/T} = 2^n e^{−1/T} exp{C(x_max)/T} ≤ exp{C(x_max)/T} = Z_T π(x_max)

when T ≤ 1/(n ln 2). Hence with probability at least 1/2 a random sample drawn from this distribution will find the maximum cut in the graph. In fact, when the graph is regular (all degrees of the graph are the same), we may do much better.

Let Δ denote the degree of each node in the graph. We shall show that when T = O(Δ), an algorithm for sampling from the Ising model leads to a constant factor approximation algorithm for MAX CUT.

Definition: Given a maximization problem with optimal value OPT and a polynomial time algorithm which produces solutions with value ALG, we say that it is an α-approximation algorithm if ALG ≥ α · OPT. Similarly, if OPT is the optimal solution for a minimization problem, we have an α-approximation algorithm if ALG ≤ α · OPT.

Theorem: Suppose that we have a graph in which every node has degree Δ and an efficient means for sampling from the antiferromagnetic Ising model on arbitrary graphs of maximum degree Δ. Let

T ≤ εΔ / (4 ln 2)

for some ε > 0. Then we have a randomized (1 − ε)-approximation algorithm for MAX CUT.

Proof: Let A denote the set of configurations x such that C(x) ≤ (1 − ε)C(x_max). Then

π(A) = Σ_{x ∈ A} exp{C(x)/T} / Z_T
     ≤ Σ_{x ∈ A} exp{(1 − ε)C(x_max)/T} / Z_T
     = π(x_max) |A| e^{−εC(x_max)/T}
     ≤ π(x_max) e^{n ln 2 − εC(x_max)/T}.

In order to have π(A) ≤ π(x_max), so that with probability at least 1/2 a random sample has cut value greater than (1 − ε)C(x_max), we need n ln 2 − εC(x_max)/T ≤ 0, or

T ≤ εC(x_max) / (n ln 2).

Since there are Δn/2 edges, we know that a maximum cut contains at least Δn/4 edges. Therefore, if

T ≤ (εΔn/4) / (n ln 2) = εΔ / (4 ln 2),

then we have a randomized (1 − ε)-approximation to MAX CUT.

It turns out that even for bounded degree graphs, solving MAX CUT is not only NP-complete but APX-complete. Loosely speaking, APX is the set of optimization problems for which there exists a bound such that if a better-than-that-bound approximation algorithm exists, then P = NP. For us this means that, unless RP = NP, there exists a constant γ > 0 such that no polynomial time algorithm exists for sampling from the Ising model when T ≤ γ. It is not surprising, then, that in a later chapter we shall analyze an algorithm of Haggstrom and Nelander and show that it runs in polynomial time when T is sufficiently large.

This all indicates the difficulty of even sampling from the antiferromagnetic Ising model. The ferromagnetic Ising model is a different story, and in fact a polynomial time algorithm exists that generates samples drawn approximately from the distribution of the Ising model for arbitrary graphs.

The Potts Model

In the Ising model we had spin up and spin down, but we live in a three dimensional world. It is easy to consider a model with spin up, down, right, left, into the plane, and out of the plane. In general, instead of using two colors, we now use Q colors. This is the Potts model, and it is an important generalization of the Ising model.

Let x be a coloring of the nodes using colors from C = {1, 2, ..., Q}. Then C(x) now refers to the number of edges which cross the Q-way cut determined by x. As in the Ising model, we let

π(x) = exp{−J C(x)/T} / Z_T,

where J = 1 for ferromagnetism and J = −1 for antiferromagnetism.

As with the Ising model, the antiferromagnetic Potts model is NP-hard to sample from at arbitrary temperatures, with the reduction being to MAX Q-CUT. Unlike the Ising case, however, no method for sampling from the ferromagnetic Potts model is known to run quickly at all temperatures. We will give a partial answer to this question by analyzing two chains for the Potts model: single site update and Swendsen-Wang.

The hard core gas model

In the hard core gas model we again two color a graph. However, here coloring a node 1 means that a gas molecule occupies that node, while the color 0 means that it is empty. These molecules take up a fixed amount of space, the core of the molecule. These cores are hard, and so are not allowed to intersect. In our model this means that no two adjacent nodes contain gas molecules. One way to enforce this condition mathematically, given a coloring x, is to require that x(i)x(j) = 0 for all edges {i, j}. Let n(x) = Σ_v x(v). If we think of the configuration x as defining the locations of a set of gas molecules, then n(x) represents the number of molecules in the configuration. The distribution over these configurations is

π(x) = λ^{n(x)} / Z,

where λ is a parameter known as the activity or fugacity.

This type of set, where no two nodes which are adjacent are both in the set, is known as an independent set. Just as the Ising model is a close relative of MAX CUT, the hard core gas model is closely linked to the problem of finding the maximum independent set of a graph, which is an NP-complete problem. When λ is made sufficiently large (exponential in n), then with high probability the largest independent set in the graph is chosen. In fact, Dyer, Frieze, and Jerrum showed, by reduction to a different NP-complete problem, that there does not exist an efficient means for sampling from this distribution when λ is sufficiently large unless RP = NP, even when the graph is restricted to be bipartite.
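The weight λ^{n(x)} in this distribution is easy to compute. The Python sketch below, which assumes the graph is stored as a dictionary mapping each node to a list of its neighbors, returns the weight of a configuration and returns 0 when the configuration is not an independent set; the function name is illustrative only.

def hard_core_weight(neighbors, x, lam):
    """Return lam**n(x) if x encodes an independent set, and 0 otherwise.

    neighbors[v] lists the nodes adjacent to v; x[v] is 1 if a molecule
    occupies v and 0 if v is empty.
    """
    for v, color in x.items():
        if color == 1 and any(x[w] == 1 for w in neighbors[v]):
            return 0.0                      # two adjacent occupied nodes: weight zero
    n_x = sum(x.values())                   # number of molecules in the configuration
    return lam ** n_x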

The Widom-Rowlinson mixture model

Related to the hard core gas model is the Widom-Rowlinson mixture model. In this section we consider a discrete version of the model. Suppose we have Q different types of substances in a mixture. Particles of substance i are allowed to be close to one another. However, two adjacent sites cannot be occupied by two different substances. Mathematically, C = {0, 1, ..., Q}, with the color 0 indicating that a site is empty and color i indicating that a particle of substance i occupies the site. We require that for all edges {i, j} of the graph, either X(i) = X(j) or X(i)X(j) = 0.

Haggstrom and Nelander gave a perfect sampling algorithm for this discrete model. Later we will present an improved algorithm for which sharper running time bounds may be shown. In addition, in a later chapter we will construct a perfect sampling algorithm for the continuous case as well.

Q colorings of a graph

As the temperature T drops in the antiferromagnetic Potts model, more weight is given to those states where the cut contains every edge of the graph. In other words, the highest weight is given to colorings where the endpoints of each edge have different colors.

Definition: A Q coloring of the nodes of a graph is proper if the endpoints of each edge in the graph have different colors.

One way of defining the distribution when T = 0 is to give equal weight to each proper Q coloring of the graph and weight 0 if the endpoints of an edge are given the same color.

Jerrum constructed a Markov chain for sampling uniformly from the colorings of a graph when Q > 2Δ, where Δ is the maximum degree of the graph. It is NP-hard to determine if there is a coloring when Q is small relative to Δ, and for small Q the chain of Jerrum is not connected. However, for intermediate values of Q very little is known about the behavior of the chain. In a later chapter we give a perfect sampling algorithm for this problem, also independently given elsewhere.

This problem of sampling from the Q colorings of a graph also provides an easy illustration of how a random sampling algorithm can be turned into an approximate counting algorithm. The problem of counting the number of Q colorings of a graph is a #P-complete problem. Suppose that Q > 2Δ, so we know from Jerrum's work that we have an approximate sampling algorithm. Earlier work gives an algorithm for creating a perfect sampling algorithm that runs in O(m T_RS) time, where T_RS is the time needed to take a random sample. We present here an algorithm that runs in O(n T_RS) time.

Sink Free Orientations of a graph

Given an undirected graph, a sink free orientation is an assignment of directions to the edges such that no node has outdegree 0. The problem of computing the number of sink free orientations of a graph is #P-hard, and is also a special case of evaluating the Tutte polynomial of a graph, making this problem of theoretical interest. As with the other problems we consider, this one may be formulated as a coloring. Suppose that we have an edge e = {i, j} where i < j. Then if we have in our coloring x(e) = 0, the edge is oriented i → j, but if x(e) = 1 the edge is oriented j → i. To be sink free we require that

Σ_{j : j > i} (1 − x({i, j})) + Σ_{j : j < i} x({i, j}) ≥ 1

for all nodes i.
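In code, the sink free condition is simply the requirement that every node have outdegree at least one. A small Python sketch under the same 0/1 edge coloring convention (names illustrative only):

def is_sink_free(edges, x):
    """edges is a list of pairs (i, j) with i < j; x[(i, j)] = 0 orients i -> j,
    x[(i, j)] = 1 orients j -> i.  Returns True if every node has outdegree >= 1."""
    nodes = {v for e in edges for v in e}
    outdeg = {v: 0 for v in nodes}
    for (i, j) in edges:
        out = i if x[(i, j)] == 0 else j
        outdeg[out] += 1
    return all(d >= 1 for d in outdeg.values())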

Hypercube slices

A hypercube may be thought of as having state space {0, 1}^n, where n is the dimension of the cube. Edges exist between any two points in the state space which differ in at most one coordinate, so that changes only occur parallel to the coordinate axes. Each possible state is considered equally likely.

If the k colorings of a graph may be considered the zero temperature limit of the Potts model, then the hypercube chain may be thought of as the infinite temperature limit. Here each possible configuration is equally likely, and the coloring of a site (coordinate) is completely independent of the other coordinates.

A restriction of the hypercube is used to study instances where we wish the magnetization in the Ising model to remain constant. Recall that for the Ising model the magnetization is the number of nodes colored spin up. In the hypercube slice model we restrict the set of allowable configurations. Let L and U be integers with 0 ≤ L ≤ U ≤ n. Then the state space of the hypercube slice model is those points in {0, 1}^n satisfying

L ≤ Σ_i x(i) ≤ U.

If L is close to U, then the magnetization will be roughly constant.

Applying the Monte Carlo method

All the models discussed above are probability distributions, and the question of interest is how to sample efficiently from these distributions. For decades, construction of Markov chains has been possible for these problems and others by using some basic techniques. We next explore the most successful of these techniques.

Chapter 3

Building Markov chains

On two occasions I have been asked by members of Parliament, "Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?" I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

Charles Babbage

Perhaps the earliest Monte Carlo algorithm (outside of astrology) is a method, dating back centuries, for finding the value of the numerical constant π. This method involved flipping a toothpick onto ruled paper and measuring the number of times the toothpick crossed the lines, and was used by some to while away the time. Even at that time, far better methods existed for approximating π, as this method is agonizingly slow.

With a computer, pseudorandom numbers can be generated by the millions and used to drive a Markov chain towards its stationary distribution. As Babbage notes, however, the right figures must first be put into the machine. In our case, that means that we must be able to construct a Markov chain whose stationary distribution is the same as the distribution given by our probabilistic models.

Broadly, these chains fall into two classes: local update chains and nonlocal update chains. Recall that for all of our problems, Ω = C^V consists of the colorings of a graph. Local update algorithms take advantage of this structure by updating the colors on only one or two nodes at a time. Nonlocal update algorithms are often far less intuitive than their local counterparts, but many times can avoid regions where local chains are not rapidly mixing by changing the color of many nodes simultaneously.

Conditioning chains

The reason why we cannot sample directly from these distributions is that they are defined as π(x) = w(x)/Z, where w(x) is an easily computable weight function and Z is an unknown normalizing constant that is often very difficult to compute. Conditioning will allow us to eliminate the need to know Z when taking a step.

As an example, consider the hard core gas model, where the desired distribution is π(x) = λ^{n(x)}/Z. Here the weight function is trivial to compute, but finding Z is #P-hard. Suppose that someone else has generated a random sample and sent it to us; however, the color at node v is missing. All of the other color data came through just fine. We now seek to create a new random sample from π conditioned on the values of the colors at all nodes other than node v.

If one of the neighbors of v is colored 1 (that is, included in the independent set), then we did not need the missing data, since v must be colored 0. However, if all the neighbors of v are colored 0, then two configurations are possible. We wish to choose a random X(v) given X(V \ {v}). Note that n(X) when X(v) = 1 is exactly 1 more than n(X) when X(v) = 0. Let x_1 denote the configuration where v is colored 1 and x_0 denote the configuration where v is colored 0. Then

P(X = x_1 | X(V \ {v}) = x(V \ {v})) = P(X = x_1) / P(X(V \ {v}) = x(V \ {v}))
                                     = P(X = x_1) / (P(X = x_0) + P(X = x_1))
                                     = (λ^{n(x_1)}/Z) / (λ^{n(x_0)}/Z + λ^{n(x_1)}/Z)
                                     = λ / (1 + λ).

Similarly, it is easy to show that

P(X = x_0 | X(V \ {v}) = x(V \ {v})) = 1 / (1 + λ).

A wonderful thing has occurred in that Z was canceled out in the computation. This is not an accident, but a general feature of conditioning arguments.

More generally, suppose that we know the value of a configuration on all but a small set of nodes V_U (here the U in the subscript stands for unknown). Then x(V_U) is the coloring of V_U and x(V \ V_U) is the coloring on the remaining nodes, so that (x(V_U), x(V \ V_U)) describes a complete configuration on V. Given part of a configuration x(V \ V_U), we can randomly extend it to a complete configuration by using

P(X(V_U) = x(V_U) | X(V \ V_U) = x(V \ V_U))
    = [w(x(V_U), x(V \ V_U)) / Z] / P(X(V \ V_U) = x(V \ V_U))
    = [w(x(V_U), x(V \ V_U)) / Z] / [Σ_{x'(V_U)} w(x'(V_U), x(V \ V_U)) / Z]
    = w(x(V_U), x(V \ V_U)) / Σ_{x'(V_U)} w(x'(V_U), x(V \ V_U)).

Again the normalizing constant Z has dropped completely out of the picture. The general conditioning chain then picks some subset of nodes at random, discards their values, and then replaces the colors on the subset by randomly extending the remaining coloring.

The Heat Bath chain

The heat bath chain is a special case of conditioning chains where the probability distribution on V_U does not depend upon the current state of the chain. It works like this. A set V_U is chosen at random, and the colors on those nodes are discarded. The colors are then replaced randomly by choosing colors for V_U from π conditioned on the values of the colors at all of the other nodes. At every step we use the same distribution on subsets of V for choosing our random V_U. Often V_U is a randomly chosen dimension (node) of the state space.

To describe choosing random numbers, we use a ←_U A to denote the act of choosing an element of A uniformly from that set. We will use a ←_R A to denote choosing a from A at random according to an arbitrary distribution. Because we are choosing V_U and only updating that portion, often the value of x(V \ V_U) is clear from context. Therefore, for notational convenience, we will use w(x(V_U)) to denote w(x(V_U), x(V \ V_U)).

The general heat bath Markov chain

Input: X_t ∈ C^V,
       w, a weight function on C^V,
       p, a distribution on subsets of V
Set X ← X_t
Choose V_U ⊆ V according to distribution p
Choose X(V_U) ←_R C^{V_U} according to
    P(X(V_U) = x(V_U)) = w(x(V_U)) / Σ_{x'(V_U)} w(x'(V_U))
Set X_{t+1} ← X

Figure: The general heat bath Markov chain

Reversibility allows us to show that this chain has the desired stationary distribution π = w/Z. Two states x and x' are connected if their colors differ on a set v_U which has positive probability of being selected, p(v_U) > 0. There can be more than one set v_U for which this is true. Let V_U denote the set of subsets of V that contain all the nodes on which x and x' have different colors. We now show that the heat bath chain is reversible. Let x → x' denote the event that the configuration moves from state x to x' in one step of the Markov chain.

π(x)P(x → x') = π(x) Σ_{v_U ∈ V_U} p(v_U) P(x → x' | V_U = v_U)
              = Σ_{v_U ∈ V_U} p(v_U) (w(x)/Z) P(x → x' | V_U = v_U)

Single site hard core heat bath chain

Set X ← X_t
Choose a vertex v uniformly at random from V
Choose U uniformly from [0, 1]
If U ≤ 1/(1 + λ) or a neighbor of v is colored 1
    Set X(v) ← 0
Else
    Set X(v) ← 1
Set X_{t+1} ← X

Figure: Single site hard core heat bath chain

              = Σ_{v_U ∈ V_U} p(v_U) (w(x)/Z) · w(x') / Σ_{x'' : x''(V \ v_U) = x(V \ v_U)} w(x'')
              = Σ_{v_U ∈ V_U} p(v_U) (w(x')/Z) · w(x) / Σ_{x'' : x''(V \ v_U) = x(V \ v_U)} w(x'')
              = Σ_{v_U ∈ V_U} p(v_U) (w(x')/Z) P(x' → x | V_U = v_U)
              = π(x')P(x' → x).

In the single site update algorithm for the hard core gas model, V_U is chosen to be a single vertex v, with probability 1/n for each vertex. The probabilities of setting X(v) ← 1 and X(v) ← 0 are exactly those computed in the previous section, so we know that this chain has the correct stationary distribution.
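A direct Python transcription of the single site hard core heat bath step in the figure above might look as follows; the graph is assumed to be stored as a dictionary mapping each node to a list of its neighbors, and the names are illustrative only.

import random

def hard_core_heat_bath_step(neighbors, x, lam, rng=random):
    """One single site heat bath step for the hard core gas model.

    x maps each node to 0 or 1; the chosen node v is set to 1 with probability
    lam / (1 + lam), unless one of its neighbors is occupied.
    """
    x = dict(x)
    v = rng.choice(list(neighbors))
    u = rng.random()
    if u <= 1.0 / (1.0 + lam) or any(x[w] == 1 for w in neighbors[v]):
        x[v] = 0
    else:
        x[v] = 1
    return x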

Suppose that the graph is bipartite, with V = V_L ∪ V_R, where each edge has an endpoint in each of V_L and V_R. Then the neighbors of any node in V_L lie only in V_R, and the neighbors of V_R all lie in V_L. This allows us to switch all of the values of V_L or V_R simultaneously. This is definitely nonlocal, since the average number of

Nonlocal hard core heat bath chain

Set X ← X_t
Choose V_U to be V_L or V_R, each with probability 1/2
For each vertex v in V_U
    Choose U uniformly from [0, 1]
    If U ≤ 1/(1 + λ) or a neighbor of v is colored 1
        Set X(v) ← 0
    Else
        Set X(v) ← 1
Set X_{t+1} ← X

Figure: Nonlocal hard core heat bath chain

nodes which can be affected in a move is n/2.

Metropolis-Hastings

Metropolis-Hastings chains take an acceptance/rejection approach rather than a conditioning approach. This approach works best when all of the possible colorings for V_U are roughly equal in value.

In the heat bath chain, we chose V_U, threw away the colors on those nodes, and then replaced them according to the conditional probability. In a Metropolis-Hastings type algorithm, we attempt to change the values on V_U, and sometimes reject the change if it would lead to a state with lower weight.

Again, reversibility is used to show that this chain has the desired stationary distribution. As before, suppose that x and x' are two states with positive weight, where the set of subsets V_U containing all nodes with different colors has positive

The general Metropolis-Hastings Markov chain

Input: X_t ∈ C^V,
       w, a weight function on C^V,
       p, a distribution on subsets of V
Set X ← X_t
Choose V_U at random according to p
Choose x(V_U) uniformly at random from C^{V_U}
If w(x(V_U), X(V \ V_U)) ≥ w(X)
    Set X(V_U) ← x(V_U)
Else
    Choose U uniformly at random from [0, 1]
    If U ≤ w(x(V_U), X(V \ V_U)) / w(X)
        Set X(V_U) ← x(V_U)
Set X_{t+1} ← X

Figure: The general Metropolis-Hastings Markov chain

weight in p. Then, without loss of generality, let w(x') ≤ w(x).

π(x)P(x → x') = π(x) Σ_{v_U ∈ V_U} p(v_U) P(x → x' | V_U = v_U)
              = Σ_{v_U ∈ V_U} p(v_U) (w(x)/Z) (1/|C|^{|v_U|}) (w(x')/w(x))
              = Σ_{v_U ∈ V_U} p(v_U) w(x') / (Z |C|^{|v_U|}),

and

π(x')P(x' → x) = π(x') Σ_{v_U ∈ V_U} p(v_U) P(x' → x | V_U = v_U)
               = Σ_{v_U ∈ V_U} p(v_U) (w(x')/Z) (1/|C|^{|v_U|}) · 1
               = Σ_{v_U ∈ V_U} p(v_U) w(x') / (Z |C|^{|v_U|}),

so they are equal, and the Metropolis-Hastings chain is reversible with the correct distribution.

Applying this to the specific example of the hard core gas model, we have the following. When the color chosen for v is 1 and no neighbors have color 1 already,

Single site hard core Metropolis-Hastings chain

Set X ← X_t
Choose a vertex v uniformly at random from V
Choose c uniformly from {0, 1}
If c = 1 and v has no neighbors colored 1
    Set X(v) ← c
Else if c = 0
    Choose U uniformly at random from [0, 1]
    If U ≤ 1/λ
        Set X(v) ← c
Set X_{t+1} ← X

Figure: Single site hard core Metropolis-Hastings chain

then setting v to 1 raises the weight of the configuration, so we always proceed. If, however, the color chosen for v is 0, then the weight might drop from λ^{n(x)} to λ^{n(x)−1}. The smaller over the larger of these weights is 1/λ, so with this probability we switch a 1 to a 0.

As with the heat bath, when the graph is bipartite these ideas may be used to construct a nonlocal algorithm as well.
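For comparison with the heat bath version, here is a Python sketch of the single site hard core Metropolis-Hastings step from the figure above, under the same graph representation; as before, the names are only illustrative.

import random

def hard_core_metropolis_step(neighbors, x, lam, rng=random):
    """One single site Metropolis-Hastings step for the hard core gas model."""
    x = dict(x)
    v = rng.choice(list(neighbors))
    c = rng.choice([0, 1])                      # proposed color for v
    if c == 1 and all(x[w] == 0 for w in neighbors[v]):
        x[v] = 1                                # weight goes up: always accept
    elif c == 0:
        if rng.random() <= 1.0 / lam:           # weight may drop by a factor of lam
            x[v] = 0
    return x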

The acceptance rejection heat bath chain

An idea which will come in handy later is a particular implementation of the heat bath chain which we will call the acceptance rejection heat bath chain. The transition probabilities are exactly the same as for the heat bath chain; the difference lies in how we generate a Markov chain step. In words, what we do is, having selected the

The general acceptance rejection heat bath Markov chain

Input: X_t ∈ C^V,
       w, a weight function on C^V,
       p, a distribution on subsets of V
Set X ← X_t
Choose V_U ⊆ V according to distribution p
Set M ← max_{x(V_U)} w(x(V_U))
Repeat
    Choose X(V_U) ←_U C^{V_U}
    Choose W ←_U [0, 1]
Until W ≤ w(X(V_U)) / M
Set X_{t+1} ← X

Figure: The general acceptance rejection heat bath Markov chain

portion of the chain to change, we then select a coloring for that portion uniformly at random. We then test a uniform against the weight of that coloring normalized against M, an upper bound on the weight. If the test accepts, we accept the value, and if it rejects, we choose another coloring and try again.

Although the form is similar, this chain is not the Metropolis-Hastings chain. It is simply another formulation of the heat bath chain, as shown by the following theorem. This theorem is not new, and is a staple of courses in stochastic processes.

Theorem The probability that the node set V is given coloring xV by the

U U

acceptance rejection heat bath chain is

w xV

U

P X V xV

P

U U

w x V

U

x V

U

Proof: We do a first step analysis, computing the probability that X(V_U) = x(V_U) and we rejected the color chosen in the first step, plus the probability that X(V_U) = x(V_U) and we accepted the color in the first step. Let ACCEPT denote the event that we accept the first step and REJECT denote the event that we reject the first step, and let Q denote the number of possible colorings of V_U, so that each proposed coloring is chosen with probability 1/Q. Then

P(X(V_U) = x(V_U)) = P(X(V_U) = x(V_U), ACCEPT) + P(X(V_U) = x(V_U), REJECT)
                   = w(x(V_U))/(QM) + P(X(V_U) = x(V_U) | REJECT) P(REJECT).

When the first step rejects, we start over, so

P(X(V_U) = x(V_U) | REJECT) = P(X(V_U) = x(V_U)),

and

P(X(V_U) = x(V_U)) = w(x(V_U))/(QM) + P(X(V_U) = x(V_U)) P(REJECT)
P(X(V_U) = x(V_U)) (1 − P(REJECT)) = w(x(V_U))/(QM)
P(X(V_U) = x(V_U)) = w(x(V_U)) / (QM · P(ACCEPT))
                   = w(x(V_U)) / (QM · Σ_{x'(V_U)} w(x'(V_U))/(QM))
                   = w(x(V_U)) / Σ_{x'(V_U)} w(x'(V_U)),

which completes the proof.

As an example, we now present the single site heat bath chain for the hard core gas model as an acceptance rejection chain; the version here is for λ ≥ 1.

Acceptance rejection single site hard core heat bath chain

Set X ← X_t
Choose a vertex v uniformly at random from V
If a neighbor of v has color 1
    Set X(v) ← 0
Else
    Repeat
        Choose c ←_U {0, 1}
        Choose U ←_U [0, 1]
    Until c = 1 or (c = 0 and U ≤ 1/λ)
    Set X(v) ← c
Set X_{t+1} ← X

Figure: Acceptance rejection single site hard core heat bath chain
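A Python sketch of this acceptance rejection step (again assuming λ ≥ 1, so that the acceptance probabilities below lie in [0, 1], and using the same graph representation as the earlier sketches):

import random

def hard_core_ar_heat_bath_step(neighbors, x, lam, rng=random):
    """One acceptance rejection heat bath step for the hard core model, lam >= 1."""
    x = dict(x)
    v = rng.choice(list(neighbors))
    if any(x[w] == 1 for w in neighbors[v]):
        x[v] = 0                       # v is blocked: it must be empty
        return x
    while True:
        c = rng.choice([0, 1])         # propose a color uniformly
        u = rng.random()
        if c == 1 or (c == 0 and u <= 1.0 / lam):
            break                      # accept with probability weight(c) / M
    x[v] = c
    return x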

The Swap Move

Broder first introduced what we will call the swap move, for a chain for generating matchings of a graph. Basically, the idea is that if the color of exactly one neighbor prevents a chosen site from being colored according to the heat bath distribution, then change the color of the neighbor to accommodate the color of the chosen node. Broder applied this move to construct a chain for sampling from the set of perfect matchings of a graph, used in approximating the permanent of a matrix. This chain was later shown by Jerrum and Sinclair to be rapidly mixing.

Dyer and Greenhill applied this technique to the hard core gas model, thereby improving the ability to analyze the chain. We will apply this move to several chains, including the continuous hard core gas model and the discrete and continuous Widom-Rowlinson mixture models.

Suppose we are using the single site heat bath chain for the hard core gas model. When v has any neighbors colored 1, we are unable to turn v to 1. Suppose, however, that exactly one neighbor of v is colored 1, with the rest colored 0. Then a valid move (in that it stays in the state space) would be to swap the color of v with its neighbor, with some probability p_swap. This chain is presented in the figure below.

Proving that this chain preserves the stationary distribution may be accomplished via a direct application of reversibility. The new moves are symmetric, so that if x and y are two configurations reached from one another by a swap move, then

P(x → y) = P(y → x) = p_swap / n

and π(x) = π(y). Therefore clearly π(x)P(x → y) = π(y)P(y → x), and these moves are symmetric. Since in the old chain there was no probability of moving from x to y, adding these moves preserves the stationary distribution.

Later we will show how this swap move improves the performance of chains for the continuous hard core gas model and Widom-Rowlinson mixture model.

Dyer and Greenhill hard core chain step

Set X ← X_t
Choose a vertex v uniformly at random from V
Choose U uniformly from [0, 1]
Case: v has no neighbors colored 1 in X_t
    If U ≤ λ/(1 + λ)
        Set X(v) ← 1
    Else
        Set X(v) ← 0
Case: v has exactly one neighbor w colored 1
    If U ≤ p_swap
        Set X(v) ← 1, X(w) ← 0
    Else
        Set X(v) ← 0
Set X_{t+1} ← X

Figure: Dyer and Greenhill hard core chain step
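A Python sketch of the Dyer and Greenhill step, including the swap case; the representation and names are as in the earlier sketches.

import random

def dyer_greenhill_step(neighbors, x, lam, p_swap, rng=random):
    """One step of the Dyer-Greenhill hard core chain with swap probability p_swap."""
    x = dict(x)
    v = rng.choice(list(neighbors))
    occupied = [w for w in neighbors[v] if x[w] == 1]
    u = rng.random()
    if len(occupied) == 0:
        x[v] = 1 if u <= lam / (1.0 + lam) else 0
    elif len(occupied) == 1:
        if u <= p_swap:
            x[v], x[occupied[0]] = 1, 0     # swap the molecule onto v
        else:
            x[v] = 0
    # with two or more occupied neighbors, v must stay empty (its color remains 0)
    return x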

The Ising and Potts models

We present here the single site heat bath update chain for the Ising and Potts models. The Metropolis-Hastings chain is constructed in a similar fashion.

For a vertex v, let b_v(c) denote the number of neighbors of v which have color c. The b stands for blocking, as these colors tend to block v from receiving color c in antiferromagnetic models.

As with many local update chains, the fact that we are picking vertices at random indicates that we must run for at least n ln n steps before we have any reasonable chance of modifying all the nodes. A nice feature of this chain is that when the temperature T is large enough, then O(n ln n) steps suffice for this chain to mix.

Single site Potts heat bath chain

  Set x ← X_t
  Choose a vertex v uniformly at random from V
  Choose c ←_R C according to P(c) = exp(J b_v(c)/T) / Σ_{c' ∈ C} exp(J b_v(c')/T)
  Set x(v) ← c
  Set X_{t+1} ← x

Figure: Single site Potts heat bath chain
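The conditional draw above is easy to carry out directly. The sketch below is an
illustration only, assuming a coupling constant J, temperature T, color set range(Q),
and the adjacency dictionary nbrs used in the earlier sketches.

  import math
  import random

  def potts_heat_bath_step(x, nbrs, Q, J, T):
      """Single site heat bath update for the Potts model.

      The new color of a random vertex v is drawn with probability proportional
      to exp(J * b_v(c) / T), where b_v(c) counts the neighbors of v with color c.
      """
      v = random.choice(list(x))
      b = [0] * Q
      for w in nbrs[v]:
          b[x[w]] += 1
      weights = [math.exp(J * b[c] / T) for c in range(Q)]
      u, acc = random.random() * sum(weights), 0.0
      for c in range(Q):
          acc += weights[c]
          if u <= acc:
              x[v] = c
              break
      return x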

Antiferromagnetic Potts model at zero temperature

Recall that the problem of uniformly sampling from the Q colorings of a graph may
be thought of as the zero temperature limit of the antiferromagnetic Potts model.
The heat bath chain for this problem is straightforward, since all proper colorings
of the graph have the same weight. We begin by making our notion of blocking
more precise.

Definition: A color c blocks (or neighbors) a node v if a node adjacent to v has
color c. A color c which is not blocking for v is nonblocking.

Single site Q coloring heat bath chain

  Set x ← X_t
  Choose a vertex v uniformly at random from V
  Choose c uniformly from the set of nonblocking colors for v
  Set x(v) ← c
  Set X_{t+1} ← x

Figure: Single site Q coloring heat bath chain

Swendsen-Wang

Swendsen-Wang is a nonlocal chain for the ferromagnetic Ising and Potts models
which utilizes the random cluster viewpoint. It has the advantage of being prov-
ably faster than the single site update chain under low temperature conditions.
Although this chain is known not to be rapidly mixing for all temperatures, it
is in widespread use as a means for generating samples from the Potts model.

The Swendsen-Wang procedure has two phases. In phase I, the coloring of the
graph is used to divide the nodes: an edge is kept exactly when its two endpoints
receive the same color. The connected components of this subgraph are connected
sets of nodes of the original graph that have the same color. Each of the kept edges
is then randomly and independently removed with probability p, where
p = exp(-J/T) is high when the temperature is high and close to 0 when the
temperature is low. Once this has been accomplished, the number of connected
components will be even higher.

In phase II, the remaining connected components are each assigned a color uni-
formly and independently from C = {1, ..., Q}; all the nodes in a component
are assigned its color. For a subset of edges A, let C(A) denote the set of connected
components of A, and let v_C denote the lowest numbered node in a connected
component C. This is written algorithmically in the figure below.

Swendsen and Wang both introduced this chain and showed that it has the
correct stationary distribution. We will present an analysis of the mixing time of
this chain, a result similar to that proved recently by Cooper and Frieze.

Swendsen-Wang step

  Set x ← X_t
  Let A ← {{v, w} ∈ E : x(v) = x(w)}
  For each edge e ∈ E, set U(e) ←_U [0, 1]
  For each node v ∈ V, set c(v) ←_U {1, ..., Q}
  For each edge e ∈ A
    If U(e) ≤ p
      Set A ← A \ {e}
  For all C ∈ C(A)
    Set x(w) ← c(v_C) for all w ∈ C
  Set X_{t+1} ← x

Figure: Swendsen-Wang chain
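The two phases translate directly into code. The sketch below is illustrative only; it
assumes the graph is given as an edge list, uses a simple union-find to extract the
components, and takes p = exp(-J/T) as the edge removal probability, as above.

  import random

  def swendsen_wang_step(x, edges, Q, p):
      """One Swendsen-Wang step.

      x     : dict vertex -> color in {0, ..., Q-1}
      edges : list of (v, w) pairs
      p     : probability of removing a monochromatic edge
      """
      # Phase I: keep monochromatic edges, then thin them independently.
      kept = [(v, w) for (v, w) in edges
              if x[v] == x[w] and random.random() > p]
      parent = {v: v for v in x}
      def find(v):
          while parent[v] != v:
              parent[v] = parent[parent[v]]   # path halving
              v = parent[v]
          return v
      for v, w in kept:
          parent[find(v)] = find(w)
      # Phase II: recolor each surviving component uniformly at random.
      color_of_root = {}
      for v in x:
          r = find(v)
          if r not in color_of_root:
              color_of_root[r] = random.randrange(Q)
          x[v] = color_of_root[r]
      return x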

Sink Free Orientations

As noted in the previous chapter, the problem of generating a sink free orientation
of a graph can be seen as having state space {0, 1}^E, where the two colors for
each edge refer to the two possible orientations of that edge. A heat bath chain
for this problem picks an edge at random and then randomly picks an orientation
that does not create a sink.

Single edge heat bath sink free orientations chain

  Set x ← X_t
  Choose e ←_U E
  Choose c uniformly from the set of orientations of e that
    do not create a sink in the graph
  Set x(e) ← c
  Set X_{t+1} ← x

Figure: Single edge heat bath sink free orientations chain

Widom-Rowlinson

We present several chains for the Widom-Rowlinson model. First we consider the
local heat bath chain, and then a discrete version of a nonlocal chain for continuous
Widom-Rowlinson due to Haggstrom, van Lieshout, and Møller.

Recall that in the Widom-Rowlinson model, nodes are either assigned a color
from {1, ..., Q}, which indicates the type of particle occupying the site, or a 0,
indicating that the site is unoccupied. If c_i is the number of sites occupied by the
ith particle type, then the distribution is

  π(x) = λ_1^{c_1} λ_2^{c_2} ··· λ_Q^{c_Q} / Z.

The heat bath chain chooses a node uniformly and then changes the color of
that node conditioned on the remaining nodes.

Single site heat bath discrete Widom-Rowlinson chain

  Set x ← X_t
  Choose v ←_U V
  Case: all neighbors of v are colored 0
    Choose c ←_R {0, 1, ..., Q} so that P(c = 0) = 1 / (1 + Σ_i λ_i)
      and P(c = i) = λ_i / (1 + Σ_i λ_i) for all i ∈ {1, ..., Q}
  Case: all the neighbors of v are colored either i or 0
    Choose c ←_R {0, i} so that P(c = 0) = 1 / (1 + λ_i)
      and P(c = i) = λ_i / (1 + λ_i)
  Case: two neighbors of v have different positive colors
    Set c ← 0
  Set x(v) ← c
  Set X_{t+1} ← x

Figure: Single site heat bath discrete Widom-Rowlinson chain
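The case analysis above can be carried out directly in code. The following sketch is
illustrative (an assumed encoding, not a prescribed one): activities are given as a
dictionary lam indexed by particle type, and nbrs is the usual adjacency dictionary.

  import random

  def wr_heat_bath_step(x, nbrs, lam):
      """Single site heat bath update for the discrete Widom-Rowlinson model.

      x   : dict vertex -> color in {0, 1, ..., Q} (0 = unoccupied)
      lam : dict color i -> activity lambda_i, for i in {1, ..., Q}
      """
      v = random.choice(list(x))
      positive = {x[w] for w in nbrs[v] if x[w] != 0}
      if len(positive) >= 2:
          # two different particle types among the neighbors: v must stay empty
          x[v] = 0
          return x
      # colors currently allowed at v: 0 plus any non-conflicting particle type
      allowed = sorted(lam) if not positive else sorted(positive)
      colors = [0] + allowed
      weights = [1.0] + [lam[i] for i in allowed]   # weight 1 for color 0
      u, acc = random.random() * sum(weights), 0.0
      for c, w in zip(colors, weights):
          acc += w
          if u <= acc:
              x[v] = c
              break
      return x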

The nonlocal chain takes a more direct approach. At each stage, all the points
of a chosen color are removed from the chain. Then new points of that color are
put back in the chain according to the stationary distribution, conditioned on the
positions of the remaining colors. In the next chapter we will see that these two

Nonlocal conditioning discrete Widom-Rowlinson chain

  Set x ← X_t
  Choose i ←_U {1, ..., Q}
  For all v such that v and all neighbors of v are colored either 0 or i
    Choose c ←_R {0, i} so that P(c = 0) = 1 / (1 + λ_i)
      and P(c = i) = λ_i / (1 + λ_i)
    Set x(v) ← c
  Set X_{t+1} ← x

Figure: Nonlocal conditioning discrete Widom-Rowlinson chain

chains have roughly the same performance.

A birth death type process looks at Widom-Rowlinson from a different point
of view. Let x_i be the set of nodes colored i in a configuration x. Then the birth
death approach says that at each time step a point of color i is born at a specific
node with probability λ_i / (n (1 + Σ_j λ_j)). A particular point dies with probability
1 / (n (1 + Σ_j λ_j)), by which we mean the point is removed from x_i by setting the
color of the point to 0. Algorithmically the process is described as follows. Say
that a color i is blocked at node v if either v or a neighbor of v has a positive color
different from i. Reversibility is easily seen to be satisfied with this chain, which
has a slightly worse performance bound than the single site heat bath chain. We
introduce it here because it is easy to add a swap type move, which will increase
the range of λ_i over which we may prove that the chain is rapidly mixing.

In the vanilla birth death chain, if a color is blocked when it tries to be born, it

Birth death discrete Widom-Rowlinson chain

  Set x ← X_t
  Choose U ←_U [0, 1]
  Set p_0 ← 1 / (1 + Σ_j λ_j)
  For all i ∈ {1, ..., Q}
    Set p_i ← p_{i-1} + λ_i / (1 + Σ_j λ_j)
  Choose i ∈ {0, ..., Q} according to U and the cumulative probabilities p_i
  Choose v ←_U V
  If i = 0
    Set x(v) ← 0
  Else
    If color i is not blocked for node v
      Set x(v) ← i
  Set X_{t+1} ← x

Figure: Birth death discrete Widom-Rowlinson chain

simply fails to be born. If, however, a point is blocked by only a single point among v and
the neighbors of v, our swap move will remove the blocking point and introduce
our point in its place.

The Antivoter model

The antivoter model is one of two models we will consider where the chain itself
is part of the model. Throughout this work, the discussions and results for the
antivoter model are joint work with Gesine Reinert.

Naturally, the antivoter model is closely related to the voter model. Suppose
that we have C = {0, 1}. At each step of the voter model, a vertex is chosen at
random, and the color of the vertex is changed to be the same as a randomly chosen

Birth death swapping discrete Widom-Rowlinson chain

  Set x ← X_t
  Choose U ←_U [0, 1]
  Set p_0 ← 1 / (1 + Σ_j λ_j)
  For all i ∈ {1, ..., Q}
    Set p_i ← p_{i-1} + λ_i / (1 + Σ_j λ_j)
  Choose i ∈ {0, ..., Q} according to U and the cumulative probabilities p_i
  Choose v ←_U V
  If i = 0
    Set x(v) ← 0
  Else
    Case: color i is not blocked for node v
      Set x(v) ← i
    Case: color i is blocked by exactly one node w ∈ {v} ∪ {neighbors of v}
      If U ≤ p_swap
        Set x(v) ← i
        Set x(w) ← 0
  Set X_{t+1} ← x

Figure: Birth death swapping discrete Widom-Rowlinson chain

neighbor. This may be used to model forest fires and the spread of infectious
agents. This chain has two absorbing states, where the colors of all of the nodes
are the same. We will call such a state unanimous.

In the antivoter model, again a vertex is chosen uniformly at random, but now
the color of the vertex is changed to the opposite of a randomly chosen neighbor.
Unlike the voter model, the antivoter model on nonbipartite graphs has a state
space which is connected except for the unanimous states (which, once left, are never
reached again), and it has a unique stationary distribution on the remaining states. However,
this chain is not reversible. Given a state where v is surrounded by nodes colored 1,
then v can move to being colored 0, but cannot move back to being colored 1.

The Antivoter model chain step

  Set x ← X_t
  Choose a vertex v uniformly at random from V
  Choose a neighbor w of v uniformly at random
  Set x(v) ← 1 - x(w)
  Set X_{t+1} ← x

Figure: The Antivoter model chain step
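In code the step is only a few lines; the sketch below is illustrative, assuming colors
0/1 and the adjacency dictionary nbrs used in the earlier examples.

  import random

  def antivoter_step(x, nbrs):
      """One antivoter update: a random vertex takes the opposite color
      of a uniformly chosen neighbor."""
      v = random.choice(list(x))
      w = random.choice(nbrs[v])
      x[v] = 1 - x[w]
      return x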

Therefore P(x, y) > 0 but P(y, x) = 0 for some x and y, and the chain cannot be
reversible. This is the only chain we will consider in this dissertation which is not
reversible.

The List Update Problem

The list update problem is the second problem we will consider where the chain
itself is part of the problem. Throughout this work, the discussions and results for
the list update problem are joint work with James Fill.

Suppose that we have a list of n items arranged in order σ(1), σ(2), ..., σ(n); in
other words, σ is a permutation. Requests come in for items, which must be served
by starting at the beginning of the list and moving inwards until the item is found.
Therefore it takes σ^{-1}(i) time to locate item i.

When an item is found, we are allowed to bring it forward, that is, move it to
any position in the list prior to σ^{-1}(i), without cost. In addition, we may transpose
any two adjacent items in the list at cost 1.

Given this situation, there are several strategies one might consider for rear-
ranging the items based upon requests. For instance, in the move to front (MTF)
method, when an item is selected it is moved to the front of the list. Sleator and
Tarjan showed that no matter what the sequence of requests, this protocol is
worse than the optimal solution (where all requests are known ahead of time) by at
most a factor of 2.

This is a worst case analysis of the problem. Another approach is to use average
case analysis, where the set of requests is a stochastic process and the goal is to
find the procedure which minimizes the expected cost of serving the process.

Consider the move ahead one (MA) protocol. In this method, instead of the
selected item moving to the front, it is instead moved ahead one position. Unlike
the MTF method, this method has no nontrivial guarantee on the value of the
solution it delivers. Suppose the list is ordered 1, 2, ..., n, and the set of requests
is n, n-1, n, n-1, .... Then the MA chain will always require n time to service
a request, whereas the optimal offline solution is to move n to the first position
and n-1 to the second and hold them there, resulting in an average cost per query of
3/2. The worst case for MA is horribly bad, but does the average case do better?

First we must define what we mean by an average input. This is most often done
for the list update problem by considering the input as a stream of independently,
identically distributed requests: the probability of requesting i at each step is p_i.

With this random set of requests, the list becomes a Markov chain, with random
requests altering the chain based on whether we use MTF, MA, or some other rule.

MA list update chain

  Choose i ←_R {1, ..., n}, where the probability of choosing i is p_i
  If σ^{-1}(i) > 1
    Let a ← σ^{-1}(i)
    Let j be the item such that σ^{-1}(j) = a - 1
    Swap i and j: set σ^{-1}(j) ← a, σ^{-1}(i) ← a - 1

Figure: MA list update chain

MTF list update chain

  Request i ←_R {1, ..., n}, where the probability of choosing i is p_i
  If σ^{-1}(i) > 1
    Let a ← σ^{-1}(i)
    For all j such that σ^{-1}(j) < a
      Set σ^{-1}(j) ← σ^{-1}(j) + 1
    Set σ^{-1}(i) ← 1

Figure: MTF list update chain

To compare the asymptotic behavior of these chains, we compute the stationary
distribution of the lists given the input distribution. The asymptotic cost is the ex-
pected cost of accessing an item from a stationary permutation. Rivest showed
that under this scheme the MA chain has lower expected cost at stationarity than
the MTF method. Of course, the best ordering under this probability distribution
is 1, 2, ..., n if p_1 ≥ p_2 ≥ ··· ≥ p_n. However, we assume that we do not have
knowledge of the p_i, and we only see the set of random requests.

We will use reversibility to derive the stationary distribution of the MA chain.
For the MA list update chain, consider the distribution

  π_MA(σ) = p_{σ(1)}^{n-1} p_{σ(2)}^{n-2} ··· p_{σ(n)}^{n-n} / Z_MA.

Say that the edge connecting the permutations σ and σ' works by switching adjacent
items i and j, where in σ item j occupies position a - 1 and item i occupies position
a (so that in σ' the two are interchanged). The MA chain moves from σ to σ'
exactly when i is requested, and from σ' to σ exactly when j is requested, so

  π_MA(σ) P(σ, σ') = (1/Z_MA) [ Π_{k : σ(k) ∉ {i,j}} p_{σ(k)}^{n-k} ] p_j^{n-a+1} p_i^{n-a} · p_i
                   = (1/Z_MA) [ Π_{k : σ(k) ∉ {i,j}} p_{σ(k)}^{n-k} ] p_i^{n-a+1} p_j^{n-a} · p_j
                   = π_MA(σ') P(σ', σ).

Therefore π_MA is reversible with respect to the MA chain. Moreover, this chain is
easily shown to be ergodic, so, as our notation suggests, π_MA is the unique stationary
distribution of the MA chain.

The MTF chain is not reversible, since if an item moves from the back to the
front there is no corresponding reverse move which places it in the back again.
However, it is easy to see what the probability is that σ^{-1}(i) < σ^{-1}(j) when σ is
stationary. Consider the process at time 0, which is stationary. With probability 1,
sometime before time 0 either i or j was selected. Then σ^{-1}(i) < σ^{-1}(j) if, at the last
choice of i or j, i was selected. Similarly, if j was selected at the last choice of j or
i, then σ^{-1}(j) < σ^{-1}(i). Therefore, conditioned on the fact that the last choice was
either i or j, the probability that i was chosen is just

  P(σ^{-1}(i) < σ^{-1}(j)) = p_i / (p_i + p_j).

Now suppose that we wish to compute the expected time needed to access an
item of σ. The time needed to access an item is 1 plus the number of items which
precede it, so

  E[access i] = 1 + E[ Σ_{j ≠ i} 1(σ^{-1}(j) < σ^{-1}(i)) ]
              = 1 + Σ_{j ≠ i} P(σ^{-1}(j) < σ^{-1}(i))
              = 1 + Σ_{j ≠ i} p_j / (p_i + p_j).

Each item i is chosen to be accessed with probability p_i; therefore

  E[access] = Σ_i p_i [ 1 + Σ_{j ≠ i} p_j / (p_i + p_j) ]
            = 1 + Σ_{j < i ≤ n} 2 p_j p_i / (p_i + p_j),

a result which has appeared previously in the literature.
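As a quick numerical illustration of this formula (an added example, not part of the
original analysis), the following Python sketch evaluates the stationary expected
access cost of MTF for a given request distribution.

  def mtf_expected_access_cost(p):
      """Stationary expected access cost of move-to-front.

      p : list of request probabilities p_1, ..., p_n (summing to 1).
      Uses E[access] = 1 + sum_{i < j} 2 p_i p_j / (p_i + p_j).
      """
      n = len(p)
      cost = 1.0
      for i in range(n):
          for j in range(i + 1, n):
              cost += 2.0 * p[i] * p[j] / (p[i] + p[j])
      return cost

  # Example: a skewed request distribution
  print(mtf_expected_access_cost([0.4, 0.3, 0.2, 0.1]))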

Now let us bound P(σ^{-1}(i) < σ^{-1}(j)) for the MA chain, say with p_j ≤ p_i. Suppose that we
know the positions of all items other than i or j. Denote the smaller of the two
remaining positions a and the larger b. Then there are two possibilities left for i
and j: one with i at position a and j at position b, with relative weight p_i^{n-a} p_j^{n-b};
the other possibility is that j is at position a and i is at position b, with relative
weight p_j^{n-a} p_i^{n-b}. Therefore the probability that σ^{-1}(i) < σ^{-1}(j), given the position of all items
other than i or j, is

  p_i^{n-a} p_j^{n-b} / (p_i^{n-a} p_j^{n-b} + p_j^{n-a} p_i^{n-b}) = p_i^{b-a} / (p_i^{b-a} + p_j^{b-a}).

Now b - a ≥ 1, so the probability that j comes before i is at most p_j / (p_i + p_j). In fact, many states of
positive weight have b - a ≥ 2, in which case the probability that i comes before j
is higher: at least p_i^2 / (p_i^2 + p_j^2). Note this second expression may be rewritten as
p_i / (p_i + p_j (p_j / p_i)). The p_j / p_i term in the denominator is less than 1, and so this
second fraction is greater than p_i / (p_i + p_j).

Hence the sum over j not equal to i of E[1(σ^{-1}(j) < σ^{-1}(i))] is lower for the MA chain
than for the MTF chain, and asymptotically MA is guaranteed to perform faster.

Two questions remain. First, asymptotic efficiency is useless if the chain takes
too long to converge to its stationary distribution. Actually, we do not need to
measure mixing time here, but we do need some way of measuring how quickly the
average access time converges to the stationary average access time.

Adjacent transposition for MA distribution chain

  Set σ ← X_t
  Choose v ←_U {1, ..., n - 1}
  Set i ← σ(v)
  Set j ← σ(v + 1)
  Choose U ←_U [0, 1]
  If U ≤ p_j / (p_i + p_j)
    Set σ(v) ← j
    Set σ(v + 1) ← i
  Set X_{t+1} ← σ

Figure: Adjacent transposition for MA distribution chain
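A sketch of this auxiliary chain in Python (illustrative only; p is the request
distribution indexed by item, and the list sigma holds the items in list order):

  import random

  def ma_adjacent_transposition_step(sigma, p):
      """One step of a chain reversible with respect to the MA stationary
      distribution, which weights permutations by prod_k p[sigma(k)]^(n-k).

      sigma : list of items, front of the list first
      p     : dict item -> request probability
      """
      v = random.randrange(len(sigma) - 1)     # adjacent pair (v, v+1)
      i, j = sigma[v], sigma[v + 1]
      # put j in front of i with conditional probability p_j/(p_i + p_j)
      if random.random() <= p[j] / (p[i] + p[j]):
          sigma[v], sigma[v + 1] = j, i
      return sigma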

Second, this analysis shows that the average asymptotic access time of the MA
chain is lower than that of the MTF chain, but it does not tell what that asymptotic
efficiency is. Moreover, for either chain to mix, the second to lowest probability item
must be selected; otherwise it may lie past the lowest probability item, a situation
with low probability. But this second lowest weight can be made arbitrarily small,
making the mixing time arbitrarily large. To simulate from the asymptotic distri-
bution of the MA chain requires extra subsidiary chains, with different transitions
but the same stationary distribution as the MA chain.

Arbitrary transposition for MA distribution chain

  Set σ ← X_t
  Choose w_1 ←_U {1, ..., n}
  Choose w_2 ←_U {1, ..., n} \ {w_1}
  Set v_1 ← min{w_1, w_2}
  Set v_2 ← max{w_1, w_2}
  Set i ← σ(v_1)
  Set j ← σ(v_2)
  Set d ← |v_1 - v_2|
  Choose U ←_U [0, 1]
  If U ≤ p_j^d / (p_i^d + p_j^d)
    Set σ(v_1) ← j
    Set σ(v_2) ← i
  Set X_{t+1} ← σ

Figure: Arbitrary transposition for MA distribution chain

For example, the adjacent transposition chain does not request a specific item, but
randomly chooses a position, then swaps the items at that position and the position immediately
preceding it according to the stationary distribution. Of course, there is no reason
why we must use adjacent transpositions. Arbitrary transpositions work just as
well.

Note that the average value of d chosen in this fashion is O(n). For weights
which differ from one another by a factor of more than 1 + c/n, this means
that the chance that the lowest weight item is placed first is on the order of e^{-c}.

Hypercube slices

The heat bath chain for the hypercube is simple: choose a coordinate of the chain
uniformly at random, and then change that coordinate, again uniformly at random,
to either 0 or 1. When we are restricted to L ≤ Σ_i x(i) ≤ U, we must modify the
heat bath chain to take this into account. When a switch would have the value of
Σ_i x(i) moving below L, we simply hold at the current value rather than making the
switch. Similarly for when we hit U. This chain is of interest primarily because of

Heat bath hypercube slice chain

  Set x ← X_t
  Choose i ←_R {1, ..., n}
  Choose c ←_R {0, 1}
  If L ≤ c + Σ_{j ≠ i} x(j) ≤ U
    Set x(i) ← c
  Set X_{t+1} ← x

Figure: Heat bath hypercube slice chain

its relationship to algorithms for the Ising model when the magnetization is required
to be constant (see the references for details).

What remains: mixing time

Creating Markov chains is easy, but showing that the chains in this section mix in
a reasonable amount of time is not. In the next chapter we introduce the idea of
bounding chains for computing the mixing time, and we present bounding chains
for each of the chains considered here.

Chapter

Bounding Chains

The wolf Fenris broke the strongest fetters as if they were made of
cobwebs. Finally the mountain spirits made for them the chain
called Gleipnir. When the gods asked the wolf to suffer himself to be
bound with it, he suspected their design, fearing enchantment. He
therefore only consented to be bound with it upon condition that one of
the gods put his hand in his mouth. Tyr alone had courage enough to
do this. But when the wolf found that he could not break his fetters,
he bit off Tyr's hand, and he has ever since remained one-handed.

  Norse myth, Bulfinch's Mythology

Fortunately, the modern practitioner of bounding chain techniques does not need
to make quite as large a sacrifice as Tyr, but a constant factor loss of speed is
sometimes involved. In this chapter we describe the purpose and power of the
bounding chain technique.

To accurately describe why bounding chains are important, we first take a look
at how Markov chains are actually simulated. Usually random numbers are drawn
which determine a function. This function is then applied to the current state to
determine the next state of the chain.

Definition: Let f be a random function from Ω into itself. We say that f is
consistent with a Markov chain with transition matrix P if P(f(x) = y) = P(x, y)
for all x, y.

Recall the Dyer-Greenhill chain for the hard core gas model of the previous
chapter. Once the random node v is decided, along with the uniform
random variable U, we have completely chosen the function f for that time step.
No matter what the state, knowing v and U allows us to determine the next state of
the chain.

One method, then, of simulating a Markov chain is, for each time t, to choose a
random function f_t independent of all other f_{t'}, and set X_{t+1} ← f_t(X_t). Since
each f_t is independent, this clearly has the Markov property. In general we do not
require that each f_t be identically distributed; all that is needed is that they are
consistent with the Markov chain. However, in this work all of the chains which we
consider can be simulated using functions f_t which are independent and identically
distributed. If X_a = x, then X_t = f_{t-1}(f_{t-2}(··· f_a(x))). As shorthand, set
F_a^b = f_{b-1} ∘ ··· ∘ f_{a+1} ∘ f_a, so that X_b = F_a^b(X_a).

Suppose that we choose two starting values for the chain, X_0 and Y_0, and set
X_t = F_0^t(X_0) and Y_t = F_0^t(Y_0). Then if X_{t'} = Y_{t'} for some value of t', then X_t = Y_t
for all t ≥ t'. This motivates the following definition.

Definition: Suppose that a Markov chain is defined via a random sequence of
functions f_0, f_1, f_2, ... consistent with a transition matrix P. We refer to such
a chain as a complete coupling chain. If F_a^t(X_a) = F_a^t(Y_a), we say that the stochastic
processes X and Y have coupled at time t, and denote the first time this occurs by
T_C. If F_a^t is a constant function, then we say that the chain has completely coupled
at time t, and we denote the first time this occurs by T_CC.

The notion of complete coupling will be of paramount importance to us. It will
not only allow for computer experiments and analytical arguments that determine
upper bounds on the mixing time of a Markov chain, but also give perfect sampling
algorithms. The perfect sampling aspects of complete coupling will be explored
in later chapters. Here we concentrate on the experimental and analytical
determination of mixing times.

The following theorem is an important tool in determining the mixing time of a
Markov chain.

Theorem: Suppose that we have a completely coupling chain, X_0 has some
arbitrary distribution over Ω, and Y_0 has a stationary distribution π. Then

  ||F_0^t(X_0) - π||_TV ≤ P(X_t ≠ Y_t) ≤ P(T_C > t).

That is, the total variation distance between X_t and the stationary distribution is
bounded above by the probability that the X and Y processes have not coupled by time t.

This theorem was originally proved by Doeblin. Aldous is usually credited
with popularizing this theorem as a tool for bounding the mixing time of a Markov
chain in the context of MCMC methods. We shall use this theorem throughout this
work, and so here we present the proof, which is straightforward.

Proof: Given that Y_0 has stationary distribution π, P(Y_t ∈ A) = π(A) for all
A. Also, {T_C ≤ t} and {T_C > t} are disjoint events, so

  ||F_0^t(X_0) - π||_TV = max_A |P(X_t ∈ A) - π(A)|
    = max_A |P(X_t ∈ A, T_C ≤ t) + P(X_t ∈ A, T_C > t) - P(Y_t ∈ A)|
    = max_A |P(X_t ∈ A, T_C ≤ t) + P(X_t ∈ A, T_C > t)
              - P(Y_t ∈ A, T_C ≤ t) - P(Y_t ∈ A, T_C > t)|
    = max_A |P(X_t ∈ A, T_C > t) - P(Y_t ∈ A, T_C > t)|
          (since X_t = Y_t on the event {T_C ≤ t})
    ≤ max_A max{P(X_t ∈ A, T_C > t), P(Y_t ∈ A, T_C > t)}
    ≤ max_A P(T_C > t)
    = P(T_C > t).

Another way of stating this result is that the mixing time τ_TV(ε) is bounded
above by min{t : P(T_C > t) ≤ ε}. Note that T_C is itself bounded above by T_CC; that
is, if every process started at time 0 has coupled, then clearly a pair of processes X
and Y have coupled, no matter what distribution they have at time 0. Therefore we
have proved the following corollary.

Corollary: Let τ_TV(ε) be the mixing time of the chain with respect to total
variation distance. Then if at time t, P(T_CC > t) ≤ ε,

  τ_TV(ε) ≤ t.

Note that this statement does not depend at all upon the stationary distribution
of the chain; it is entirely dependent upon T_CC. Therefore, if we had a method for
efficiently determining whether or not T_CC ≤ t, then we would immediately have a
procedure for estimating the mixing time of the chain.

Experimental Upper Bounds on the Mixing Time

  Input: k
  For i ← 1 to k
    Set t ← 0
    Repeat
      Set t ← t + 1
      Compute F_0^t
    Until F_0^t is constant
    Set t_i ← t
  Set t_est to be the largest of the k values t_i

Figure: Experimental Upper Bounds on the Mixing Time
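For a chain with a small state space this experiment can be carried out literally, by
tracking F_0^t applied to every state. The sketch below is a toy illustration (not part of
the procedure above): it assumes a user-supplied update f_step(state, r) consistent with
the chain and a function draw_randomness() producing the shared random input of one step.

  import random

  def complete_coupling_time(states, f_step, draw_randomness, max_steps=10**6):
      """Run one grand coupling and return the first t at which F_0^t is constant.

      states          : list of all states (practical only for tiny state spaces)
      f_step          : deterministic update, f_step(state, r) -> next state
      draw_randomness : returns the shared randomness r_t used by every copy at step t
      """
      current = list(states)            # F_0^0 is the identity map
      for t in range(1, max_steps + 1):
          r = draw_randomness()
          current = [f_step(s, r) for s in current]
          if len(set(current)) == 1:    # F_0^t is a constant function
              return t
      return None

  # Example: the random walk on {0,...,5}, every copy sharing the same coin
  f = lambda s, r: min(s + 1, 5) if r < 0.5 else max(s - 1, 0)
  times = [complete_coupling_time(range(6), f, random.random) for _ in range(100)]
  print(max(times))   # experimental upper bound on T_CC as in the figure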

Johnson proposed the algorithm in the figure above for a specific case where it is
very easy to determine if t ≥ T_CC: the case of monotonicity.

Monotonicity

A partially ordered set, or poset, consists of a base set of elements Ω together with
a partial order ⪯ satisfying reflexivity (x ⪯ x), antisymmetry (x ⪯ y and y ⪯ x
implies x = y), and transitivity (x ⪯ y and y ⪯ z implies x ⪯ z). An example of
a poset is when Ω is all subsets of {1, ..., n} and x ⪯ y if and only if x ⊆ y. Note
that this particular poset has an element which is greater than all other elements,
as well as one which is smaller than all other elements.

Definition: The element x_max is maximal for Ω if x ⪯ x_max for all x ∈ Ω.
Similarly, x_min is a minimal element of the poset if x_min ⪯ x for all x ∈ Ω.

In our example of the poset of all subsets of a set, ∅ is minimal and
{1, ..., n} is maximal.

Often it is possible to place a partial order on the state space of the Markov
chain such that moves on the chain respect the partial order.

Definition: Suppose the functions f_0, f_1, f_2, ... determine a completely
coupling Markov chain. We will say that the Markov chain respects (or preserves)
the partial order if for all x, y in Ω and all times t,

  x ⪯ y  ⟹  f_t(x) ⪯ f_t(y).

Now a simple induction argument shows that if the Markov chain respects a partial
order, then X_0 ⪯ Y_0 implies that F_0^t(X_0) ⪯ F_0^t(Y_0). Johnson's approach was
to use a maximal element, a minimal element, and a Markov chain which preserves
the partial order to squeeze all the elements of Ω together.

Suppose that x_min and x_max are minimal and maximal elements of Ω. Then by definition,
x_min ⪯ x ⪯ x_max for all x. Suppose that at time t, F_0^t(x_min) = F_0^t(x_max). Then since for all
times t, F_0^t(x_min) ⪯ F_0^t(x) ⪯ F_0^t(x_max), we know that F_0^t(x) = F_0^t(x_min) for all x.
In other words, F_0^t is constant and T_CC ≤ t. Moreover, we know that the smallest
time x_min and x_max meet is in fact T_CC, since complete coupling cannot occur while two
processes have not coupled.

Example: The simplest example of a finite monotonic chain is the random walk
on {1, ..., n}. Suppose that we are at state i; then with probability 1/2 the next
state is max{i - 1, 1}, and with probability 1/2 the next state is min{i + 1, n}. Then
with the partial order i ⪯ j if i ≤ j, this Markov chain is monotonic. It has a
maximal element n and a minimal element 1.
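Because this walk is monotone, the experiment of the previous section only needs to
track the two extreme trajectories. The sketch below (an illustration, not part of the
original text) reports the first time the copies started at 1 and at n meet under the
shared randomness, which by the argument above equals T_CC.

  import random

  def monotone_complete_coupling_time(n):
      """First time the random walk on {1,...,n} couples from the two extremes.

      Both copies use the same coin flips, and monotonicity guarantees every
      other starting state is sandwiched between them.
      """
      lo, hi, t = 1, n, 0
      while lo != hi:
          t += 1
          if random.random() < 0.5:      # shared randomness for the grand coupling
              lo, hi = max(lo - 1, 1), max(hi - 1, 1)
          else:
              lo, hi = min(lo + 1, n), min(hi + 1, n)
      return t

  print(max(monotone_complete_coupling_time(20) for _ in range(100)))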

Other examples of chains which admit a monotonic partial order include the

ferromagnetic Ising mo del and the discrete WidomRowlinson mo del on a mixture

of tw o typ es However a wide variety of chains do not have a simple monotonic

structure and so a more general version of monotonicitywas develop ed

Antimonotonicity

Kendall app ears to be the earliest to take advantage of antimonotonicity in

designing an exact sampling algorithm although Haggstrom and Nelander were

the rst to formally dene the notion

V

Recall that for most cases of interest the sample space is just C The partial

orders for these state spaces are often constructed by putting an order on C

and then saying that X Y if X v Y v for all v V For chains on these

spaces with this typ e of partial order Haggstrom and Nelander dened the notion

of antimonotonicityas follows

V

Denition Consider a Markov chain on C and a partial order between

congurations on any subset of V The chain is antimonotonic if for al l congura

v xv we have that f xv f y v tions x y and v such that f x

t t t

Note that in monotonic cases with this typ e of partial order we have that x y

implies that f xv f y v That is why we refer to the prop erty with the

t t

inequality in the opp osite direction as antimonotonicity

This denition is usually only applicable when the numb er of no des that change

color from step to step is small such as in single no de or edge up date chains

v that is When a single no de changes then all we need check is that for the no de

altered x y implies that xv y v

We keep track of two states T for top and B for b ottom These states

t t

will have the prop erty that if B X T then B X T This

t t t t t t

lo oks similar to the monotonic case but B and T do not evolve according to the

Markov chain At time t let T f T and B f B For all v set

t t t t

T maxfT v B v g and B minfT v B v g Then wehave guaranteed

t t

that B X T B X T

t t t t t t

initial The algorithm pro ceeds just as in the monotonic case If we can nd

states such that B x T for all x then B T implies that F is constant

t t

t

and we are done

Note that the ab ove algorithm actually works for chains which are not anti

monotonic such as the LubyVigo da chain for the hard core gas mo del Since all

of these metho ds are sp ecic cases of b ounding chains we will not explore them in

detail here

The single site heat bath chain for the hard core mo del is an example of an

antimonotonic chain Here a single no de is chosen to b e changed If the no de turns

to then all the neighboring no des must have b een Thus to get high values

at a no de all the neighb ors must have small values This is the essential

notion of antimonotonicity that the particular no de value will b e higher if the rest

of the conguration is smaller where is the measure of whether avalue is higher

or smaller Unfortunatelymanychains of interest do not ob ey either monotonicity

nor antimonotonicity

The b ounding chain approach

The concept of b ounding chains generalizes the idea of monotonicity and antimono

tonicity and is applicable to a vast number of chains In its basic form the idea

for discrete chains was indep endently intro duced in by the author and by

Haggstr om and Nelander In and the author analyzed prop erties of

the b ounding chain more fully and the idea is applied to a wide variety of dierent

problems including continuous Markov chains

The original idea was develop ed in and in order to develop an exact

sampling algorithm for the prop er k colorings of a graph which is neither monotonic

nor antimonotonic when k

The concept is straightforward Basically instead of trying to show that com

plete coupling has o ccurred for every no de v of the graph wework on showing that

the complete coupling has o ccurred for a sp ecic no de v For some no des v com

plete coupling will have o ccurred and for some it will not Once complete coupling

has o ccurred for every no de v we know that it has o ccurred for the entire Markov

chain

V

Given a Markov chain M running on C with transition matrix P we

V

intro duce a new Markovchain M which runson P C where P C denotes

the set of nonempty subsets of C Given that is nite this new chain will be

nite as well and a conguration on M consists of giving each no de a set of colors

drawn from C In other words each no de of a conguration in the chain M has

asso ciated with it a color set which ranges from single colors up to the entire set C

Denition Let x and y Say that x y or x is in y if for al l nodes

v xv y v

In order to obtain our exp erimental b ounds on the mixing time we need to show

that F is a constant The size of is often exp onential in the input therefore

t

keeping track of F x is prohibitively exp ensive However keeping track of

x

t

xv ismuch easier That is wekeep track of the total numb er of p ossible F

x

t

colors of each individual no de When j F xv j for every no de v we

x

t

knowthatF x is constantover x The b ounding chain is the to ol that makes

t

this approach p ossible

V V

Denition Let M on C and M on P C becomplete coupling chains using

g g respectively Then we the random sequences f f f and g

say that M is a b ounding chain for M if for al l t

x y f x g y

t t

and

v jy v j g y v f y v

t t

The rst prop erty of b ounding chains says that after one step of the chain if X Y

t t

then X will b e in Y if the Y pro cess is b eing run on a b ounding chain for the

t t

X pro cess

The second prop erty says that if the b ounding chain has evolved to the p oint

where the color set at each point is a single color then the chain evolves exactly as

the chain that it b ounds Therefore the set of states where jy v j for all v is

absorbing for M

A simple induction argument extends the rst prop erty from single time steps

to m ultiple steps

Fact Suppose that M is a bounding chain for M Let X be a process run on

M and Y a process on M Then for al l t

X Y X Y

t t

Note that when jy v j for y we know that if x y then the value of

x must be the singleton elementiny v This leads to the following denition

Denition We say that the value of node v is determined by y if jy v j

If jy v j we say that the value of v is unknown

Another way of stating the second prop erty of bounding chains is that once all of

the no des of the graph are determined they stay determined Once there are no

unknown no des no no de will ever be unknown again

are so useful The following theorem is the reason b ounding chains

Theorem Suppose that M is a bounding chain for M and that Y v C for

al l nodes v Then if Y determines v for al l v V F x is constant

t t

Pro of Supp ose that Y v C and that at some time t jY v j Let x

t

Then x Y so F x Y by Fact However only one element of is in Y

t t

t

since jY v j for all no des v Therefore F x is that single element regardless

t

t

of x and so must b e constant

We claimed earlier that b ounding chains generalized the notion of monotonicity

and antimonotonicity In monotonic chains we trapp ed the states between two

t t

other states F and F Supp ose that the partial order is derived from a

t t

partial order on the color set Set Y v fc C F v c F v g

t

Each no de is given of a color set and by the monotonic behavior of the chain

X Y X Y Furthermore for jY v j to o ccur for all v F

t t t t t

t

F meaning that Y now evolves exactly as X F X do es Therefore Y is

t t t

t t

a b ounding chain

B v c T v g Antimonotonicityis similar Again we set Y v fc C

t t t

and once more it is easy to see that b ecause of antimonotonicity this is a valid

b ounding chain

One chain which is neither monotonic nor antimonotonic is the DyerGreenhill

chain of the previous chapter for generating samples from the hard core gas mo del

The metho d used by Dyer and Greenhill for proving that the mixing time of the

chain was p olynomial for restricted values of was path coupling Here we

develop a b ounding chain for this problem This b ounding chain will not only allow

us to prov e same theoretical mixing time result for the chain as in but it will

also give us a means for exp erimentally determining the mixing time when is

outside this restricted range and it will give an exact sampling algorithm for this

problem as we show in later chapters

Bounding the DyerGreenhill Hard Core chain

V

Recall that the state space for the DyerGreenhill chain is f g and so our state

V

space for the b ounding chain will b e ffg fg f gg For notational convenience

we will use to denote the set f g If anode is assigned a that indicates that

F at that no de might not b e constant

t

The DyerGreenhill chain cho oses a vertex v uniformly at random then

decides whether to attempt to turn that no de color to or If the attempt is

to turn v to the colors of the neighb ors of the no de do not matter v is colored

regardless of what the neighb ors are If the attempt is to turn v to and all

neighbors are then v is colored If at least neighbors of v are colored then

v is colored If exactly one neighbor of v is colored another roll is made which

if successful means that v is colored and this neighbor is switched from being

colored to b eing colored

Most of the b ounding chains we will develop are formed from the same pro cess

Supp ose that we wish to generate a b ounding function g for a Markov chain with

function f Then g y v ff xv jx y g is such a bounding function That is

for each p ossible x that could be in y compute f x Then for each no de v let

union over all x in y of f xv Then ensures that the new value of y v be the

both b ounding chain prop erties are true In constructing a bounding chain rst

instantiate all the random variables that are needed to determine f Then apply

f to all p ossible x in y Then for each no de write down all p ossible outcomes for

f x at that no de

Fortunately for most chains of interest the value of v at the next step is entirely

determined by the values of the neighb ors of v We do not have to examine all

x y rather weonlyhave to lo ok at all p ossible values for the neighb ors of v that

lie in y

For the DyerGreenhill chain we will rst describ e in words the behavior of

the b ounding chain This is given in algorithmic form in Figure We need to

examine all p ossible typ es of outcomes for the function f Supp ose that the random

variables comes out to be v and U Then f x says to color no de v

with Therefore f xv and f xw y w for all w v In this

xy xy

case we change y v to fg and leave all the rest of the no des unchanged

For all of the following cases supp ose that U If all the neighb ors

w of v satisfy y w fg the only congurations x in y also have all neighb ors

of v colored so xv always changes to Therefore f xv fg so

xy

g y v fg All the other no des remain unchanged

If at least two neighb ors of v have y v fg then so do all x in y and again v is

always colored so y v fg The tricky cases arise when some of the neighb ors

of v are colored f g

Supp ose that at least two neighbors of v are colored in y and the rest are

colored Let x be the conguration where all neighb ors of v are colored and

let x b e the conguration suchthaty v x v Then x y and x y

Sadlywe note that f x while f x so wemust color y v with f g

at the next step

These are all the functions f where we do not attempt a swap Supp ose now

that exactly neighbor w of v is colored fg and some other neighb or is colored

Then if we do not attempt a swap b ecause U we know v must be

colored b ecause of the neighb or colored fgso y v fg at the next step If we

do attempt a swap then matters turn ugly Again let x b e the conguration where

a is resolved to a and x be the conguration where a is resolved to a

Then f x v f x v and f x w and f x w Therefore

both v and w are colored in the next step

However the swap can be helpful for some congurations Supp ose that we

attempt a swap and exactly one neighbor w of v is colored while the rest are

Then again using x a resolves to a and x a resolves to a we see

that f x w f x w and f x v f x v so b oth v and w are

determined in the next time step

Theorem With probability the values of al l the nodes wil l be determined in

nite time

Pro of There is a small but p ositivechance that the next n moves will b e to change

n n

each of the no des to color where this chance is at least nn When

that happ ens all of the no des are determined and once they all are determined

they all stay determined Therefore the time needed for complete determination

is sto chastically bounded by a geometric random variable and so is nite with

probability

Of course we would like to make a much stronger statement than that and in

fact wecan Let refer to largest degree in the graph where the degree of a no de

is the number of undirected edges adjacent to that no de

Theorem Let T denote the time that al l the nodes arecompletely determined

BC

by the bounding chain and so T T If then

CC BC

E T min n log n n

BC

Therefore the algorithm runs in O n ln n steps when is bounded away from

and O n steps in general Moreover T is rarely very much larger than its

BC

expected value

k

P T keET e

BC BC

The pro of of this theorem when will b e straightforward but when

as in the case where we are attempting to sample indep endent sets

uniformly over graphs of b ounded degree will require abit of machinery and so

we take a brief lo ok at martingales

Martingales

Let D_t denote the set of unknown nodes at a particular time step t, and A_t denote
the set of determined nodes (D stands for "disagree" and A stands for "agree").
At time 0 all the nodes are colored {0, 1}, and so |D_0| = n. To prove the theorem, we
would like to be able to show that on average the size of D_t is decreasing at each
step. This is the motivation behind the introduction of martingales.

Definition: We say that a stochastic process X_0, X_1, X_2, ... is a super-
martingale if, with probability 1,

  E[X_{t+1} | X_0, X_1, ..., X_t] ≤ X_t.

In order to prove bounds on the behavior of supermartingales, we need the
concept of a stopping time.

Definition: Suppose that for all t, the event {τ ≤ t} is (X_0, ..., X_t)-measur-
able. Then τ is a stopping time of the process.

Roughly speaking, τ is a stopping time if at all times t we can determine whether τ has
occurred or not. An example of a stopping time is the first time that the Markov chain
enters a particular state. Given the past history of the chain, we may determine
whether or not that state has already been entered.

The supermartingale property says that for single time steps, the expectation of
the value of the stochastic process decreases on average. This is also true for longer
time intervals and for stopping times.

Fact: Let {X_i} be a supermartingale and let τ be a stopping time greater than
i; then

  E[X_τ] ≤ E[X_i].

The following theorem is well known we reprove it here as an illustration of a

pro of technique which we will use later

Theorem Suppose that we have a supermartingale on fng and that

is an absorbing state Furthermore suppose that P X X p Then the

t t

eaches is O n p expected time until the stochastic process r

The pro of technique works by b ounding the number of times that the pro cess

moves across the interval from i to i

Denition We say that a stochastic process X up crosses a b at time t

X a and X b Let U a b be the number of times X X upcrosses

t t

a b Similarly a downcrossing occurs when X b and X a

t t

Bounding the exp ected number of up crossings in a general sup ermartingale is

normally accomplished by means of the up crossing lemma However for the

sp ecial case for which we need it we can prove a stronger result directly

Lemma For a supermartingale on such that is an absorbing state

E U i i i

Pro of Supp ose that X i Let be the rst time that the pro cess moves to

t i

a state greater than or equal to i Let be the rst time that the that the

pro cess hits Then minf g is also a stopping time and

i

E X E X

t

i P E X P E X

n

n



X i P P E

i

i

i

P

i

i

Therefore the probability that another up crossing o ccurs is at most ii making

the number of up crossings sto chastically b ounded ab ove by a geometric random

variable with parameter i Therefore the exp ected numb er of up crossings is

b ounded ab ove by i

Armed with these lemmas we now pro ceed to prove Theorem

Pro of of Theorem We bound the exp ected number of times t that

X i This is a random variable whichwe will call N Note that X i for one of

t i t

three reasons First wecouldhavehadX i and the state stayed the same By

t

assumption the probability of staying at the same value is at most p Second it

mighthave b een that X ibut X i This implies that there was a i i

t t

up crossing Finally we could have had X i but X i which implies that

t t

there was a i i downcrossing Note that the number of i i downcrossings

is b ounded ab ove by one more than the number of i i up crossings

Taken together we have that

E N pE N E U i i E U i i

i i

E N E U i i E U i i

i

p

i i

p

i

p

Since weknow that X only takes on values from to nwe have that the exp ected

number of steps b efore hitting is

n

X

E E N

i

i

n

X

i

p

i

n n

p

which is O n p

A minor mo dication to the ab ove theorem will have a noticeable impact on

our ability to prove running time bounds The only dierence is that now the

probabilitythatwe move is dep endent up on the value of X

t

Theorem Suppose that we have a supermartingale on fng with an

Then the expected absorbing state Furthermore suppose that P X X p

t t X

t

time until the stochastic process reaches is

n

X

i

p

i

i

Pro of As with the pro of of Theorem we pro ceed by bounding the exp ected

value of N Here however the probability of moving from N at any given time

i i

step is at least p

i

E N p E N U i iU i i

i i i

i

p

i

P

i

n

as desired Therefore E

i

p

i

When we have a sto chastic pro cess which is b etter than a sup ermartingale we

use Walds Inequality see

Theorem Suppose that X X X is a nonnegative stochastic process

such that E X X jX q when X Then if is the rst time that

t t t t

X

t

E E X q

Running time of b ounding chain for DyerGreenhill

We need one more well known fact from elementary probability b efore proving

Theorem

Fact Markovs Inequality Let X be a nonnegative random variable with posi

tive expected value E X Then

E X

P X a

a

Pro of of Theorem For convenience let F be the algebra generated by

t

fX jt tg As b efore let D b e the set of unknown no des at time t and let A

t t t

V n D b e the set of determined no des We will showthatjD j is a sup ermartingale

t t

The second b ounding chain prop erty guarantees that when jD j itstays and

t

so isan absorbing state for the jD j pro cess

t

From the algorithm it is clear that the size of D changes by at most two at

t

each step jD j increases in size when no des move from A to D and decreases

t t t

when no des move from D to A Let V be the random vertex chosen at time t

t t t

by the algorithm Then P v V n for all v and

t

X

E jD j jD jjF v V E jD jj F jD j

t t t t t t t

n

v

From Figure we know thatateach step our choice of v places us in one of six

disjoint cases dep ending on the color sets of the neighb ors of v So for v satisfying

cases through we compute E jD jjD jjv V F

t t t t

Supp ose rst that v A Let cv denote the case that v falls into from

t

through Let C denote the value of E jD jjD j j v V F given that

i t t t t

cv i and v is in A Then in cases and the no de always stays in A so

t t

our exp ectation is and C C C If v is in case let w be the single

unknown neighbor With probability v attempts to swap with w and

so w moves to A With probability v attempts to turn to color

t

resulting in v b ecoming unknown Hence

C

In case switching v leads to b oth v and its neighb or colored to b e moved to

Finallyin case attempting to turn v to color results in v D so C

t

moving to D so C

t

Now supp ose that v was in D to start In taking a step we do not consider at

t

all the color of D Hence we can treat such an o ccurrence as always moving v out

t

of D and then treating it as though it was in A Hence for v D and case i

t t t

E jD j jD jjv V F C

t t t t i

Let R denote the number of no des that fall into case i Altogether we have

i

that

X X X

C C E jD jjD jjF

i i t t t

n

i

v A cv i v D cv i

t t

X X

jD j

t

C

i

n n

i

v cv i

jD j

t

R C R C R C

n

jD j

t

R R R maxfC C C g

n n

Now R R and R cannot b e arbitrarily large Every vertex counted in R and R

is adjacent to at least one vertex in D Every vertex counted by R is adjacent to

t

at least two vertices in D Taken as a whole the vertices in D are adjacent to at

t t

most jD j dierent vertices Hence

t

R R R jD j

t

C and so C maxfC C C g so that Wehaveshown that C C

jD j jD j

t t

E jD jj F jD j C

t t t

n n

jD j

t

n

jD j

t

where is set to be the factor in parenthesis in the last expression There are two

ways to pro ceed in the analysis at this point Since each gives an upp er b ound on

the exp ected running time each will give us one term in the minimum expression

of the theorem First since jD j is a sup ermartingale allowing us to use

t

jD j is b ounded below Theorem to continue The probability that jD j

t t

by the probability that a no de in D is chosen and turned to This o ccurs with

t

jD j

t

probability Therefore by Theorem

n

n

X

n

i E T

BC

i

i

n lnnn

which is smaller than the second term in the minimum expression of the theorem

When we can take another approach When we take a single step the

exp ectation of jD j decreases by a constant factor Wenow show by induction on t t

that

t

E jD j E jD j

t

The base case when t is simply an identity As our induction hyp othesis

supp ose that holds for time t The fact that jD jn tells us that E jD j

t t

is b ounded for all t so at time t

t t

E jD j E E jD jjjD j E jD j E jD j E jD j

t t t t

Note that jD j is integral so T tjD j Hence

t BC t

P T t P jD j

BC t

E jD j

t

t

E jD j

t

n

which shows that T has an exp onentially declining tail allowing us to upp er

BC

bound E T This upp er bound is exactly the rst expression in the minimum

BC

term

The nal portion of the theorem that P T ek E T exp k follows

BC BC

P T t s P T from the facts that P T eE T e and

BC BC BC BC

tP T s

BC

The same basic pro of outline will b e used again and again as we examine dierent

b ounding chains First come up with some integer measure of how far away the

b ounding chain is from detecting complete coupling in this case we used jD j

t

Second show that this measure is a sup ermartingale or even b etter than it shrinks

by a constant factor at each step Finally use the facts we have shown ab out

martingales or the fact that the measure is integral to upp er b ound the time until

our measure hits zero

Of course not every b ounding chain analysis will be quite as straightforward

and the next chapter basically consists of simple tricks to allow b ounding chains to

be used on each of the mo dels of Chapter

Bounding chain step for Dyer-Greenhill chain

  Set y ← Y_t
  Choose a vertex v uniformly at random from V
  Choose U uniformly from [0, 1]
  If U > λ/(1 + λ)
    Set y(v) ← {0}
  Else
    Case: all neighbors of v are colored {0}
      Set y(v) ← {1}
    Case: v has exactly 1 neighbor w colored {1}, the rest are {0}
      If U ≤ p_swap λ/(1 + λ)
        Set y(v) ← {1}, y(w) ← {0}
      Else
        Set y(v) ← {0}
    Case: v has more than one neighbor colored {1}
      Set y(v) ← {0}
    Case: one neighbor w colored {0, 1}, the rest colored {0}
      If U ≤ p_swap λ/(1 + λ)
        Set y(v) ← {1}, y(w) ← {0}
      Else
        Set y(v) ← {0, 1}
    Case: one neighbor w colored {1}, at least one colored {0, 1}
      If U ≤ p_swap λ/(1 + λ)
        Set y(v) ← {0, 1}, y(w) ← {0, 1}
      Else
        Set y(v) ← {0}
    Case: more than one neighbor colored {0, 1}, the rest are {0}
      Set y(v) ← {0, 1}
  Set Y_{t+1} ← y

Figure: Bounding chain step for Dyer-Greenhill chain
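The case analysis translates into code as follows. This is an illustrative sketch
consistent with the figure above (sets of possible colors stored per vertex, with lam
and p_swap as before), not a definitive implementation.

  import random

  def dg_bounding_chain_step(y, nbrs, lam, p_swap):
      """One bounding chain step for the Dyer-Greenhill hard core chain.

      y : dict vertex -> set of possible colors, each a subset of {0, 1}
      """
      v = random.choice(list(y))
      u = random.random()
      if u > lam / (1.0 + lam):
          y[v] = {0}                                   # emptying v always succeeds
          return y
      ones = [w for w in nbrs[v] if y[w] == {1}]       # neighbors known occupied
      unknown = [w for w in nbrs[v] if y[w] == {0, 1}] # neighbors of unknown value
      swap = (u <= p_swap * lam / (1.0 + lam))
      if len(ones) == 0 and len(unknown) == 0:
          y[v] = {1}
      elif len(ones) == 1 and len(unknown) == 0:
          w = ones[0]
          if swap:
              y[v], y[w] = {1}, {0}
          else:
              y[v] = {0}
      elif len(ones) >= 2:
          y[v] = {0}
      elif len(ones) == 0 and len(unknown) == 1:
          w = unknown[0]
          if swap:
              y[v], y[w] = {1}, {0}
          else:
              y[v] = {0, 1}
      elif len(ones) == 1 and len(unknown) >= 1:
          w = ones[0]
          if swap:
              y[v], y[w] = {0, 1}, {0, 1}
          else:
              y[v] = {0}
      else:                                            # no known 1s, two or more unknowns
          y[v] = {0, 1}
      return y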

Chapter

Bounding chains for other mo dels

A priest asked What is Fate Master And he was answered

It is that which gives abeast of burden its reason for existence

It is that which men in former times had to bear up on their backs

It is that which has caused nations to build byways from City to City

up on which carts and coaches pass and alongside which inns havecome

to be built to stave o Hunger Thirst and Weariness

And that is Fate said the priest

Fate I thought you said Freight resp onded the Master

Thats all right said the priest Iwanted to know what F reightwas to o

Kehlog Albran The Prot

Finding the fate of a state of the Markovchain is easy once you have a b ounding

chain b ecause no matter what the starting state the nal fate of the pro cess will

be the same

On the other hand it turns out that the hard core gas mo del is one of the

few examples where a Markov chain can be directly turned into a b ounding chain

Often in order for the b ounding chain to detect complete coupling in a reasonable

amount of time some tweaking of the original chain into a more suitable form is

necessary

Q coloring chain

A case in p oint is the chain for the Q colorings of a graph Figure Jerrum

showed that the single site Metrop olis Hastings chain for this problem is rapidly

mixing when the numb er of colors Q where is the maximum degree of the

graph Salas and Sokal extended this result to the heat bath chain In this

section we analyze a b ounding chain for this chain intro duced in and We

will show that this b ounding chain detects complete coupling in p olynomial time

when Q although computer exp eriments in indicate that it is

p olynomial even when Q on certain classes of graphs

Rather than working directly with the heat bath chain it will b e easier to work

with the acceptance rejection heat bath chain when describing the b ounding chain

Although it is not necessary to take this approach when writing and analyzing the

b ounding chain it usually do es lead to some simplication

After dierent colors are chosen we know that at least one of them had

to be nonblo cking Recall that in the b ounding chain at site we keep a subset of

colors Y v In this case Y v is at worst each of the colors that we tried

Single site acceptance rejection heat bath Q coloring chain

  Set x ← X_t
  Choose v ←_U V
  Repeat
    Choose c ←_U C
  Until c is nonblocking for v
  Set x(v) ← c
  Set X_{t+1} ← x

Figure: Single site acceptance rejection heat bath Q coloring chain

for no de v Therefore the size of each Y v in the b ounding chain is at most

Hence there are at most colors in the color sets of neighbors of v so if

Q there is some chance of cho osing a color which is known not to be

blo cked for v

The idea of the b ounding chain is to keep selecting colors for v adding them to

the color set until wehave chosen at least dierent colors or we have found

one not in the set Let w v denote that fv wg is an edge of the graph For each

Bounding chain for single site Q coloring heat bath chain

  Set y ← Y_t
  Choose v ←_U V
  Set y(v) ← ∅
  Repeat
    Choose c ←_U C
    Set y(v) ← y(v) ∪ {c}
  Until c ∉ ∪_{w : w ~ v} y(w)
  Set Y_{t+1} ← y

Figure: Bounding chain for single site Q coloring heat bath chain
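A short Python sketch of this bounding chain update (illustrative only; y maps each
vertex to its current set of possible colors, nbrs is the adjacency dictionary, and
colors lists the Q available colors):

  import random

  def coloring_bounding_chain_step(y, nbrs, colors):
      """One bounding chain update for the single site Q coloring heat bath chain."""
      v = random.choice(list(y))
      blocked = set().union(*(y[w] for w in nbrs[v])) if nbrs[v] else set()
      y[v] = set()
      while True:
          c = random.choice(colors)
          y[v].add(c)          # c could be the accepted color in some resolution
          if c not in blocked:
              break            # c is certainly nonblocking, so every resolution stops here
      return y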

of v we add to y v those colors which could p ossibly be chosen for xv and so

X Y X Y

t t t t

As long as Q we always have a chance of jY v j in the next

round It turns out that this condition is sucient for the b ounding chain to detect

complete coupling in p olynomial time

Theorem Let T be the rst time that the bounding chain detects complete

BC

coupling If Q

Qn ln n

E T

BC

Q

If Q

E T n Q

BC

Pro of As with the DyerGreenhill b ounding chain we will prove the result by

keeping track of the size of D the set of no des such that jY v j Let dv

t t

denote the number of neighb ors of v which lie in D The set of no des where

t

jY v j we denote A Let v A D denote the event where v moves from

t t t t

A at time t to D in the next time step Such a move makes D larger than

t t t

D

t

On the other hand when v D A this decreases D by compared to

t t t

D Each no de is selected to b e altered with probabilityn

t

X

P v A D jjD j E jD j jD jjjD j

t t t t t t

n

v A

t

X

P v D A jjD j

t t t

n

v D

t

X

P v A D jjD j

t t t

n

v A t

X

P v D D jjD j

t t t

n

v D

t

X

jD j

t

P v D jjD j

t t

n n

v

The probability that v is in D dep ends on the number of unknown neighb ors of

t

v whichwe shall denote dv Supp ose that each of these dv neighb ors has

colors in its color set Then we are uncertain ab out the status of at most dv

colors That is for these colors we are not sure whether or not they blo ck v and

so cho osing them for v leads to v entering D This o ccurs with probability

t

dv

P v D

t

Q

Combining this with the fact that each no de in D has at most neighb ors and

t

P

dv jD j yields so

t

v

X

dv

E jD j jD jj jD j jD j

t t t t

n Q

v

jD j

t

jD j

t

n Q

which is at most when Q making jD j a sup ermartingale When

t

Q an easy induction gives us

t

Q

E jD j n

t

Qn

Qn ln n

k

After k steps this makes E jD j e Given that jD j is a nonnega

t t

Q

k

tive integer Markovs inequality gives P jD j e which means that the

t

exp ected time needed for jD j to hit is as in the rst part of the theorem t

For the second half note that jD j jD j with probability at least jD jQn

t t t

which is a lower bound on the chance of picking an unknown no de and changing it

to known Therefore we may apply Theorem which gives us the second half of

the theorem

As with the b ounding chain case it is unlikely that complete coupling takes

much longer than the exp ected time to completely couple More formally

Theorem Let T be the time that jY v j for al l v given that Y v C

BC t

k

for al l v Let E T be its expected value Then P T kE T

BC BC BC

Pro of By Markovs inequality the probabilitythat T E T is at most

BC BC

The starting condition we are given Y v C isinsomesensetheworse p ossible

If Y v Y v for all v then jY v jjY v j

t

Therefore after E T steps either wehave complete coupling or we try again

BC

Since the next steps are indep endent of the last the next E T steps also have

BC

a chance of showing complete coupling After kE T steps the only way

BC

we could not have complete coupling is if all k sets of E T steps failed which

BC

k

happ ens with probabilityat most

hwas seen in the b ounding chain This chain exhibits the cuto phenomenon whic

for the Dyer Greenhill hard core chain When the numb er of steps is less than n ln n

thereisagoodchance that wehavenoteven selected all of the no des at least once

Therefore the b ounding chain could not have converged in this time However

when the number of colors is suciently high the b ounding chain always detects

complete coupling after O n ln n steps with an exp onentially declining probability

of failure This b ehavior will b e seen in the Potts mo del b ounding chain as well

The Potts model

As with the Q coloring chain, we begin by writing the heat bath chain for the Potts model as an acceptance rejection chain. For convenience, in this section we consider the ferromagnetic model; similar results may be shown for the antiferromagnetic model.

We continue using b_i(v) to denote the number of neighbors of v which have color i. After choosing which v to change, let w(i) = lambda^{b_i(v)} be the weight associated with color i, where lambda is again the temperature parameter exp(1/T). A natural upper bound on the weights is lambda^{Delta}. With this notation the general acceptance rejection chain becomes:

Single site acceptance rejection heat bath Potts chain
  Set x <- X_t
  Choose v uniformly from V
  Repeat
    Choose c uniformly from C
    Choose W uniformly from [0, 1]
  Until W <= lambda^{b_c(v)} / lambda^{Delta}
  Set x(v) <- c
  Set X_{t+1} <- x

Figure: Single site acceptance rejection heat bath Potts chain

Here blocking colors do not prevent v from becoming a particular color; they encourage it.
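The update above is short enough to write out as code. The following Python sketch only mirrors the pseudocode in the figure; the dictionary-based state x, the adjacency list nbrs, and the explicit computation of the maximum degree are illustrative assumptions, not part of the chain's definition.

import random

def potts_ar_step(x, nbrs, colors, lam):
    """One acceptance-rejection heat bath step for the ferromagnetic Potts model.
    x: node -> color; nbrs: node -> list of neighbors; colors: the Q colors;
    lam: the weight parameter lambda >= 1."""
    delta = max(len(nbrs[u]) for u in nbrs)       # maximum degree, used for the bound lam**delta
    v = random.choice(list(x))                    # choose a node uniformly at random
    while True:
        c = random.choice(colors)                 # propose a color uniformly
        w = random.random()
        b = sum(1 for u in nbrs[v] if x[u] == c)  # neighbors of v already colored c
        if w <= lam ** b / lam ** delta:          # accept with probability lam^{b_c(v)} / lam^{Delta}
            x[v] = c
            return x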

Now consider how to develop a bounding chain for this process. For each node v and color i, let d_i(v) be the number of neighbors w of v for which |Y(w)| > 1 and i is in Y(w). Let b_i(v) denote the number of neighbors w of v for which |Y(w)| = 1 and Y(w) = {i}. Then the chance of choosing color i for v may be as high as lambda^{b_i(v)+d_i(v)} / lambda^{Delta}, or as low as lambda^{b_i(v)} / lambda^{Delta}, for any X bounded by Y. Therefore, if we fall below the low value lambda^{b_i(v)} / lambda^{Delta} we always know to terminate the loop, but if we are in between this high and low value we must add this color to the set of possible colors and repeat the loop again.

Single site acceptance rejection heat bath Potts bounding chain
  Set y <- Y_t
  Choose v uniformly from V
  Set y(v) <- empty set
  Repeat
    Choose c uniformly from C
    Choose W uniformly from [0, 1]
    If W <= lambda^{b_c(v)+d_c(v)} / lambda^{Delta}
      Set y(v) <- y(v) union {c}
  Until W <= lambda^{b_c(v)} / lambda^{Delta}
  Set Y_{t+1} <- y

Figure: Single site acceptance rejection heat bath Potts bounding chain
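For concreteness, the loop above can also be written in a few lines of Python. The sketch assumes the color sets are stored as Python sets in a dictionary y, the graph as an adjacency list nbrs, and the maximum degree is passed in as delta; these names are illustrative.

import random

def potts_bounding_step(y, nbrs, colors, lam, delta):
    """One step of the acceptance-rejection Potts bounding chain (a sketch).
    b counts neighbors known to have color c; d counts unknown neighbors whose
    color set contains c, so the true acceptance weight lies between
    lam**b and lam**(b + d)."""
    v = random.choice(list(y))
    y[v] = set()
    while True:
        c = random.choice(colors)
        w = random.random()
        b = sum(1 for u in nbrs[v] if y[u] == {c})
        d = sum(1 for u in nbrs[v] if len(y[u]) > 1 and c in y[u])
        if w <= lam ** (b + d) / lam ** delta:   # c may be chosen by some bounded state
            y[v].add(c)
        if w <= lam ** b / lam ** delta:         # c is surely chosen: every bounded state stops here
            return y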

Again, this bounding chain exhibits the same kind of cutoff phenomenon as seen earlier, where we do not have a good bound until all the nodes have been hit with nonnegligible probability, and then the probability that we have not detected complete coupling declines exponentially.

Theorem. Using this bounding chain for the ferromagnetic Potts model, if Delta(1 - lambda^{-1}) < 1, then
\[
P\left(T_{BC} > \frac{k\,n\ln n}{1 - \Delta(1-\lambda^{-1})}\right) \le e^{-k}.
\]

Proof. Repeating the proof for the Q coloring chain, we are led to
\[
E[\,|D_{t+1}| - |D_t| \mid |D_t|\,] \le -\frac{|D_t|}{n} + \frac{1}{n}\sum_{v\in V} P(v \in D_{t+1}).
\]
A necessary condition for node v to end up in D_{t+1} is that the first color c for which W <= lambda^{b_c(v)+d_c(v)} / lambda^{Delta} also satisfies W > lambda^{b_c(v)} / lambda^{Delta}. This condition is not sufficient, since we may eventually end up choosing this first unknown color as the only color picked, an event which will come into play later when we consider the Ising model. Using the worst possible upper bound, that d_i(v) = d(v) for all i (meaning that Y(w) = C for all neighbors of v in D_t), we have that the probability that v ends up unknown is
\[
P(v \in D_{t+1}) \le \frac{\lambda^{b_c(v)+d_c(v)} - \lambda^{b_c(v)}}{\lambda^{b_c(v)+d_c(v)}} = 1 - \lambda^{-d_c(v)} \le 1 - \lambda^{-d(v)}.
\]
Therefore
\[
E[\,|D_{t+1}| - |D_t| \mid |D_t|\,] \le -\frac{|D_t|}{n} + \frac{1}{n}\sum_{v : d(v)\ge 1}\left(1 - \lambda^{-d(v)}\right).
\]
The interesting thing about the terms in the last summand is that they do not contain any information about the node v other than d(v). Therefore we just consider maximizing this sum subject to the constraint that the d(v) are positive integers that sum to at most Delta|D_t|.

Given this freedom, the maximum will occur when all of the d(v) = 1. Suppose d(v) >= 2. This contributes 1 - lambda^{-d(v)} to the sum. However, if we create a dummy node v' with d(v') = 1 and lower d(v) by 1, then the sum of d(w) over all w remains the same, but now the contribution to the sum is (1 - lambda^{-(d(v)-1)}) + (1 - lambda^{-1}). The difference between the old contribution and the new one is
\[
\bigl(1 - \lambda^{-d(v)}\bigr) - \bigl(1 - \lambda^{-(d(v)-1)}\bigr) - \bigl(1 - \lambda^{-1}\bigr)
= \frac{\lambda^{d(v)-1} + \lambda - \lambda^{d(v)} - 1}{\lambda^{d(v)}}.
\]
To see that the numerator of this fraction is at most 0, which would mean the split contributes more to the sum than when d(v) >= 2, we note that at lambda = 1 the numerator equals 0. Taking its derivative with respect to lambda gives
\[
(d(v)-1)\lambda^{d(v)-2} + 1 - d(v)\lambda^{d(v)-1} \le 1 - \lambda^{d(v)-1} \le 0,
\]
making the derivative nonpositive for all lambda >= 1. Therefore, by the mean value theorem, the numerator is at most 0 for all lambda >= 1, making this contribution more significant.

Hence
\[
E[\,|D_{t+1}| - |D_t| \mid |D_t|\,] \le -\frac{|D_t|}{n} + \frac{\Delta |D_t|}{n}\bigl(1 - \lambda^{-1}\bigr)
= \frac{|D_t|}{n}\bigl(\Delta(1-\lambda^{-1}) - 1\bigr).
\]
This will be negative when Delta(1 - lambda^{-1}) < 1, in which case an induction may be used to show that
\[
E[\,|D_t|\,] \le n\left(1 - \frac{1 - \Delta(1-\lambda^{-1})}{n}\right)^{t},
\]
and the proof is finished in the same manner as before.

Swendsen-Wang

Swendsen-Wang is unusual in that it switches back and forth between 0/1 colorings on the edges of the graph and Q colorings on the nodes of the graph. Our bounding chain must be equally quixotic, coloring the edges either {0}, {1}, or {0,1}, and coloring the nodes in a similar fashion.

The Swendsen-Wang bounding chain is given in Figure , where C_A is the set of connected components with respect to the edges in A, w ~ v if w and v are connected via edges in B, and v_C is the component in C_A that contains vertex v. Swendsen-Wang has two phases, and so our bounding chain does as well. It is the first phase that makes the bounding chain approach possible. Here edges are thrown out of the chain independently with probability 1 - p (recall that p is another means of measuring the temperature of the Potts model). This means that if an edge is colored {1} or {0,1} and is thrown out, then its color will always change to {0}. When computing components, however, edges that are still colored {0,1} could mean that we do not know which nodes belong to which components. Hence the colors of these nodes at the next stage of the algorithm will be uncertain.

Analyzing the change in unknown edges is easy during the removal stage. It

Swendsen-Wang bounding chain
  Set y <- Y_t
  Let A <- { {v,w} in E : y(v) = y(w) and |y(v)| = |y(w)| = 1 }
  Let B <- { {v,w} in E : y(v) intersect y(w) is nonempty }
  For each edge e, choose U(e) uniformly from [0, 1]
  For each edge e
    If U(e) > p
      Set A <- A \ {e}
      Set B <- B \ {e}
  For each component C in C_A, choose k(C) uniformly from {1, ..., Q}
  For all C in C_A
    Set z(w) <- k(C) for all w in C
  Choose a total order on C_A uniformly at random
  For all v
    Set y(v) <- {z(v)} union { z(w) : w ~ v via edges in B and w_C precedes or equals v_C }
  Set Y_{t+1} <- y

Figure: Swendsen-Wang bounding chain

is the growth of unknown components that makes proving the following theorem difficult.

Theorem. Set
\[
\theta = \frac{(Q-1)\,\Delta p}{2Q\,(1 - (\Delta - 1)p)}.
\]
If theta < 1, the Swendsen-Wang bounding chain will have detected complete coupling by time log_{1/theta}(n/epsilon) with probability at least 1 - epsilon.

Proof. As before, we shall show that |D_t| is a supermartingale. We use A and B as defined in the bounding chain step at time t.

The key point is that for a vertex to receive more than one color, it must be connected to another vertex using edges in B, with at least one edge in B \ A. But for an edge to be in B \ A, it had to have been adjacent to a node in D_t. Hence nodes in D_{t+1} must be connected through edges in B to a node in D_t. This is necessary but not sufficient: a node v may be connected to a node in D_t and still end up not being placed in D_{t+1}, depending on the color choices. Three possibilities that would preclude v adding w to D_{t+1} are: w is already in D_{t+1} because of another node; the color chosen for v and w is the same; and w_C precedes v_C, so w does not add the color of v to its color set. Although we cannot analyze the first exclusion, it is easy to quantify the last two.

We shall write w ~ v if nodes v and w are connected using edges in B. Every node placed in D_{t+1} is either in D_t to start with or connected to some node in D_t using edges in B. Therefore we have that
\[
|D_{t+1}| \le \sum_{v \in D_t}\sum_{w \sim v} \mathbf{1}\{w_C \succ v_C,\ k(w_C) \ne k(v_C)\}.
\]
The ordering of components is uniform over all possible orderings, which gives us P(w_C succeeds v_C) = 1/2. The probability that the components which v and w lie in receive different colors is (Q-1)/Q. By linearity of expectations,
\[
E[\,|D_{t+1}| \mid \mathcal{F}\,] \le \frac{1}{2}\cdot\frac{Q-1}{Q}\sum_{v\in D_t} E[\,\#\{w : w \sim v\} \mid \mathcal{F}\,].
\]
A necessary condition for any w to be connected to v after Phase I is that at least one edge adjacent to v must have survived the edge removal phase; each edge survives with probability p. We bound the number of w ~ v by using a branching process argument. After edge removal, the number of nodes adjacent to w is bounded above stochastically by a binomial random variable with parameters Delta and p. Each of these is a separate branching process with number of children distributed as a binomial random variable with parameters Delta - 1 and p; the possible number of children is Delta - 1 rather than Delta because one edge must be used as the parent. Let h denote the size of each of these child processes.

Each one of these child processes has (Delta - 1)p expected number of children, and so the expected size of each child branching process satisfies the recursion
\[
E[h] = 1 + (\Delta - 1)p\, E[h].
\]
Solving, we find that E[h] = 1/(1 - (Delta - 1)p).

The original node v has at most, in expected value, Delta p neighbors connected by unknown edges after Phase I. Each of these neighbors is also the source of a branching process, and so altogether the expected number of w ~ v is at most Delta p / (1 - (Delta - 1)p).

Summing up, we find that
\[
E[\,|D_{t+1}|\,] \le \theta\,|D_t|, \qquad \text{where } \theta = \frac{(Q-1)\,\Delta p}{2Q\,(1 - (\Delta-1)p)}.
\]
What we have shown is that E[|D_{t+1}| | F] <= theta |D_t|. Again this yields, via induction, E[|D_t|] <= theta^t |D_0|. D_0 is just the entire set of vertices V, and so E[|D_t|] <= theta^t n. Hence after log_{1/theta}(n/epsilon) time steps, E[|D_t|] <= epsilon. Since |D_t| is integral, we have that P(|D_t| > 0) <= epsilon by Markov's inequality, and we are done.

Like all of our theorems concerning the a priori running time of bounding chains, this one has an immediate corollary: the chain is rapidly mixing when the condition on p is satisfied. This is similar to a result proved independently by Cooper and Frieze using the technique of path coupling. The following table shows the largest value of p for which these results apply.

Table: Swendsen-Wang approach comparison
  Cooper-Frieze
  Bounding chain, Q = 2
  Bounding chain, any Q

Of course this is at best a theoretical determination. The bounding chain approach allows for computer experimentation to determine what the actual running time is for values of p which are much larger.
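Such an experiment is simple to set up. The sketch below shows one minimal way to do it in Python, assuming the bounding chain state is a dictionary mapping nodes (or edges) to sets of possible values and that some update routine bounding_step is available; both names are placeholders rather than fixed interfaces.

def estimate_coupling_time(init_bounding_state, bounding_step, trials=100):
    """Empirically estimate how long the bounding chain takes to detect
    complete coupling: run until every color set is a singleton."""
    times = []
    for _ in range(trials):
        y = init_bounding_state()
        steps = 0
        while any(len(s) > 1 for s in y.values()):
            y = bounding_step(y)
            steps += 1
        times.append(steps)
    return sum(times) / len(times)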

Sink free orientations

In the sink free orientation chain (Figure ), we choose an edge at random and choose a new orientation at random from the set of orientations that do not create a sink.

First note that without loss of generality we may assume that every node has degree at least 2, since a leaf of the graph must have its edge directed out of the leaf in order to avoid creating a sink.

Suppose we were to begin the bounding chain by labeling every edge with both orientations, indicating that we do not know the orientation of any edge. Then we would never be able to gain any information about the state of the chain, since we would never know whether flipping a particular edge was a permissible move.

Therefore we let Y = (Y^1, Y^2), meaning that Y(e) = (Y^1(e), Y^2(e)) for all edges e. At the start of the bounding chain procedure we single out a particular edge e_0 of the chain. We set Y^1(e_0) to be one orientation of e_0, Y^2(e_0) to be the other, and Y^1(e) = Y^2(e) = {both orientations} for all e != e_0. Clearly this means that every X is bounded by Y^1 union Y^2, since X(e) lies in Y^1(e) union Y^2(e) for all edges e.

If we guarantee that X_{t+1} is bounded by Y^i_{t+1} whenever X_t is bounded by Y^i_t, for i = 1, 2 and for all t, then we will have guaranteed that X_t is bounded by Y^1_t union Y^2_t for all t, and we will have a bounding chain.

We know the direction of some positive number of edges in Y^i_t; therefore it is possible to learn the identity of others. As always, we say that an edge e is known if |Y(e)| = 1 and otherwise it is unknown. An edge {i, j} is directed into i if its color specifies the orientation j -> i, and otherwise it is directed out of i. Using this terminology, the bounding chain may be written as follows.

Now, Bubley and Dyer showed that the sink free orientation chain couples in expected O(m^3) time. So if X^1 and X^2 are two arbitrary processes different at time 0, then P(X^1_t != X^2_t) <= 1/2 for t on the order of m^3. The actual algorithm has two phases. In Phase I we completely couple using the two processes Y^1 and Y^2, hoping that each of them detects complete coupling in their half of the states. In Phase II we run the chain as Bubley and Dyer did, hoping that the two states which remain merge into a single state.

Theorem For this two phase approach E T is O m BC

Single edge heat bath sink free orientation bounding chain
  Set y^k <- Y^k_t   (for k = 1, 2)
  Choose e = {i, j} uniformly from E, where i < j
  Case: all edges besides e are known to be directed into i
    Set y(e) <- {i -> j}
  Case: all edges besides e are known to be directed into j
    Set y(e) <- {j -> i}
  In the remaining cases, choose U uniformly from [0, 1]
  Case: U <= 1/2 and no edge besides e is known to leave i
    Set y(e) <- {i -> j, j -> i}
  Case: U > 1/2 and no edge besides e is known to leave j
    Set y(e) <- {i -> j, j -> i}
  Case: U <= 1/2 and an edge besides e is known to leave i
    Set y(e) <- {j -> i}
  Case: U > 1/2 and an edge besides e is known to leave j
    Set y(e) <- {i -> j}

Figure: Single edge heat bath sink free orientation bounding chain
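The case analysis in the figure translates directly into code. In the Python sketch below, each edge's color set is stored as the set of nodes the edge might point into, and edges_at lists the edges incident to each node; this representation and the helper names are assumptions made for the illustration.

import random

def sfo_bounding_step(y, edges_at):
    """One step of the sink free orientation bounding chain (a sketch).
    y maps each edge (i, j), i < j, to a subset of {i, j}: the nodes the edge
    might point into (two elements means the orientation is unknown)."""
    e = random.choice(list(y))
    i, j = e

    def known_into(v):      # edges besides e surely directed into v
        return [f for f in edges_at[v] if f != e and y[f] == {v}]

    def known_out_of(v):    # edges besides e surely directed out of v
        return [f for f in edges_at[v] if f != e and v not in y[f]]

    if len(known_into(i)) == len(edges_at[i]) - 1:
        y[e] = {j}                      # pointing into i would surely make i a sink
    elif len(known_into(j)) == len(edges_at[j]) - 1:
        y[e] = {i}
    else:
        u = random.random()
        if u <= 0.5:
            y[e] = {i} if known_out_of(i) else {i, j}   # into i is surely safe only if some edge surely leaves i
        else:
            y[e] = {j} if known_out_of(j) else {i, j}
    return y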

Pro of The probability that a single Phase IPhase II pass detects complete cou

pling is the probability that three events o ccur Let t be the time at the end of

Phase I and t be the time at the end of Phase II Two of the three events that

k

must o ccur are jY v j for all v and k Let X and X be the states

t

by Y and Y should this o ccur Then the third event that must happ en is dened

that is the two remaining states have coupled by the end of Phase X that X

t t

 

II

The probabilitythatthetwo states couple in Phase II was already shown to

be at least when the time the chain is run is O m Therefore here we b ound

the time needed for Phase I to run to completion

As always we will show that jD j is a sup ermartingale

t

X

E jD j jD jjjD j P e A D jjD j

t t t t t t

m

eA

t

X

P e D A jjD j

t t t

eD

t

Consider how an edge might move from A to D Supp ose that e fi j g with

t t

ij Then we know the value of the edge already so if the roll of U is such that

change direction we will still know it Therefore the probability of an e do es not

edge b ecoming unknown is at most

An edge cannot become unknown unless it is adjacent to an unknown edge

Moreover it cannot b ecome unknown if it is pointed towards that unknown edge

For example supp ose e is currently colored and so is oriented j i Then if

another edge adjacent to i is unknown then that unknown edge cannot make j i

unknown since either we continue with e colored or we reverse the direction

to i j which do es not create a conict at i no matter what the direction of the

unknown edge is

If however we have an unknown edge adjacent to j then switching edge e to

i j might create a sink or it mightnot so edge e would b ecome unknown

other edges Supp ose that we have an unknown edge adjacent to i How many

adjacenttoi can it p ossibly help to b ecome unknown Wehave seen that the only

edges it can make unknown are those which are directed i j for some j Supp ose

that there are at least such edges We only selected one edge at a time to change

so changing such edge to j i would leave at least one edge still directed i j

so the value of the unknown edge is utterly irrelevant

Therefore the unknown edge can at most make one edge adjacenttoi unknown

Supp ose that the unknown edge is fi k g and the edge which it mightmake unknown

is i j Then if fi k g is chosen and the direction chosen is k iweknow that this

move would not create a sink b ecause we have an edge leaving i namely i j

Each edge has an equal chance of b eing chosen therefore

P i j A D jjD j P fi k g D A as k i jjD j

t t t t t t

m

Similarly if this unknown edge could create an unknown edge from edge k then

P k A D P fi k g D A as i k

t t t t

m

Summing over all known and unknown edges gives us

X X

P e A D jjD j P e D A jjD j

t t t t t t

eA eD

t t

thereby showing that jD j is a sup ermartingale

t

The dierence between this pro cess and the others that we have considered so

far is that not only is jD j an absorbing state so is jD j n Fortunately

t t

due to our trickof using Y and Y we start with jD j n for b oth b ounding

t

pro cesses at the b eginning of Phase I

At least one edge in D must b e adjacent to an edge which is p ointing awayfrom

t

their common no de otherwise D would b e known If we select this unknown edge

t

and direct it towards this common p oint then it will b ecome known The p ointis

that the probabilit y that jD j do es not equal jD j is at least m

t t

Theorem deals with the case when is the only absorbing state If n is

also an absorbing state of the pro cess then the exp ected time until the state is

absorb ed at either or n is b ounded ab oveby the exp ected time until the state hits

which is O m m O m Therefore after m steps the probability

that the pro cess reaches absorption is The probability that jD j reached is

t

m and so the probability that the number of unknowns went to for b oth Y

and Y is m Therefore after m exp ected runs of length m Phase I will hav e

condensed the b ounding chain to the p oint where it only contains two pro cesses X

and X Phase II then couples these in time m making the total running time

O m m O m

Hypercube slices

To bound the behavior of the hypercube slices chain we again use two phases. Unlike the sink free orientation bounding chain, however, this approach will allow us to show the mixing time of the chain within a constant factor.

Consider this alternative version of the heat bath chain for hypercube slices, shown in Figure . In the earlier chain we have a 1/2 chance of just holding and not switching the random coordinate. This chance is taken care of by the test U_1 <= 1/2. If we do decide to switch, the chance that we switch a node colored 1 to 0 is |C_1|/n, and the chance that we switch a node colored 0 to 1 is |C_0|/n. The test for U_2 determines which event actually occurs.

The reason for writing the Markov chain in this form is that now the value of |C_1| is itself a Markov chain. It either goes up with probability (n - |C_1|)/(2n) or down with probability |C_1|/(2n).

Alternative heat bath hypercube slice chain
  Set x <- X_t
  Choose U_1 uniformly from [0, 1]
  Choose U_2 uniformly from [0, 1]
  Set C_1 <- {i : x(i) = 1}
  Set C_0 <- {i : x(i) = 0}
  If U_1 <= 1/2
    If U_2 <= |C_1|/n
      Choose i uniformly from C_1
      Set x(i) <- 0
    Else
      Choose i uniformly from C_0
      Set x(i) <- 1
  Set X_{t+1} <- x

Figure: Alternative heat bath hypercube slice chain
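A direct Python transcription of this update is given below. The treatment of the slice boundary (holding whenever the flip would push the number of ones outside [L, U]) is an assumption added here to keep the state inside the slice; the rest mirrors the figure.

import random

def hypercube_slice_step(x, L, U):
    """One step of the alternative heat bath chain for hypercube slices (a sketch).
    x is a list of 0/1 values with L <= sum(x) <= U."""
    n = len(x)
    u1, u2 = random.random(), random.random()
    ones = [i for i in range(n) if x[i] == 1]
    zeros = [i for i in range(n) if x[i] == 0]
    if u1 <= 0.5:
        if u2 <= len(ones) / n:
            if len(ones) - 1 >= L:         # assumed boundary rule: hold if the move leaves the slice
                x[random.choice(ones)] = 0
        else:
            if len(ones) + 1 <= U:
                x[random.choice(zeros)] = 1
    return x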

In Phase I all we keep track of is the value of |C_1|. We know that at the beginning L <= |C_1| <= U. We run the chain in such a fashion that if |C_1(x)| <= |C_1(y)| then |C_1(f(x))| <= |C_1(f(y))|; that is, the stochastic process |C_1(X_t)| is monotonic.

If we have monotonicity, start a hypothetical process with |C_1(x)| = L and another with |C_1(y)| = U. If at some future time t we have |C_1(F^t(x))| = |C_1(F^t(y))|, then we know that |C_1(F^t(z))| = |C_1(F^t(x))| for all z; the statistic |C_1| will be the same for all processes. In Phase I let C_L be |C_1(F^t(x))| where |C_1(x)| = L, and C_U be |C_1(F^t(y))| where |C_1(y)| = U. Phase I ends when the |C_1| values for the two processes merge.

How do we guarantee monotonicity? If U_1 <= 1/2, we have all the processes with |C_1| odd move and all the processes with |C_1| even hold; if U_1 > 1/2, we have all the processes with |C_1| odd hold and all the even ones move. Since |C_1| changes by at most one at each step, if |C_1(x)| < |C_1(y)| then either |C_1(y)| - |C_1(x)| is even, in which case their difference is at least two, each term changes by at most one, so their difference changes by at most two and |C_1(f(x))| <= |C_1(f(y))|; or their difference is odd, in which case at each step at most one of the values changes, so their difference changes by at most one, and again we have |C_1(f(x))| <= |C_1(f(y))|. If |C_1(x)| = |C_1(y)|, then |C_1(f(x))| = |C_1(f(y))|. So altogether we have that |C_1| is monotonic. Once Phase I is over, we need only deal with states that contain

Heat bath hyp ercub e slice b ounding chain Phase I

Input C C

L U

Set y Y

t

Cho ose U

U

Cho ose U

U

Cho ose U

U

If U andC is o dd or U andC is even

L L

If U jC jn

L

Set C maxfC Lg

L L

Else Set C minfC Ug

L L

If U andC is o dd or U and C is even

U U

If U jC jn

U

Set C maxfC Lg

U L

Else Set C minfC Ug

U L

Figure Heat bath hyp ercub e slice b ounding chain Phase I

the same number of coordinates colored 1.

In the alternative heat bath chain, suppose that we decided to change a node colored 1 to color 0. Some of the nodes are known to be colored 1 in the bounding chain, that is, Y(v) = {1}, so we roll another die to see if we are choosing from these nodes or not. If we are not, then we choose a node to switch via a two step process. First pick a node i from the unknown nodes. For a particular process X lying in the bounding chain, if X(v) = 1 we switch X(v) to 0; if X(v) = 0, then we pick another node from the unknown nodes where X(v) = 1 and switch that. So this is a form of acceptance rejection sampling, where after the first rejection we give up and just choose from the points meeting our criteria.

So we pick a node, and if it does not meet our criteria of being colored 1, we reject it and pick a node that is colored 1 for sure. Here is the rub: the first node that we choose will always be colored 0 at the end of the step. If it was colored 1 at the beginning of the step, then we switch it to 0. If it was colored 0, then we pick a different node altogether and switch the value of that other node. In other words, at the end of the step in the bounding chain we know that Y(v) = {0}.

As in all of the other bounding chains we have considered, let D_t denote the set of nodes such that |Y(v)| = 2. Let K_1 be those nodes that are known to be colored 1 (Y(v) = {1}), and K_0 be those nodes that are known to be colored 0.

This bounding chain will always detect complete coupling very quickly.

Theorem After n lnn time steps the probability that this bounding chain

wil l have detected complete coupling is at least

t

Pro of Phase I ends when jC j is the same for F z for all z in the state space

t

Phase II ends when F z is a constant We will show that the probability that

Phase I has not ended by time n lnU L is at most and the probability

that Phase I I has not ended bytimen lnn is also at most The union b ound

for failure will then complete the pro of

Heat bath hyp ercub e slice b ounding chain Phase II

Input Y jC j

t

Set y Y

t

Cho ose U

U

Cho ose U

U

Cho ose U

U

If U

If U jC jn and jC j L

If U jK jjC j

Cho ose i K

U

Set y v fg

Else

Cho ose i D

U t

Set y v fg

Else if U jC jn and jC j U

If U jK jjC j

Cho ose i K

U

Set y v fg

Else

Cho ose i D

U t

Set y v fg

Set Y y

t

Figure Heat bath hyp ercub e slice b ounding chain

We begin with Phase I Let C and C denote the values of C and C after

L U

L U

one time step and let a denote the change in the dierence b etween the upp er and

lower b ounds so that

a C C C C

U L

U L

Since jC C j and jC C j we know that a is either or

U L

U L

Consider two cases in the rst case C and C have the same parity Then

L U

with probability they dont move at all and with probability they move

with according to the value of U If U jC jn either a or a where

L

this latter event o ccurs when jC j L If jC jn U jC jn then a

L L U

and if jC jnU then a or a with the second event o ccurring when

U

jC j U Therefore

U

C C

U L

E ajsame parity

n

C C

U L

n

Now supp ose that C and C have dierent parity Then with probability

U L

C moves With probability n C n a will be With probability at most

L L

C n a will b e this is an upp er bound on the probability since C could be L

L L

The other p ossibility is that C moves again with probability In this case a

U

n and with probability at most n C n is with probability C

U U

Adding everything up we get that

C n C n C C

L U L U

E ajdierent parity

n n n n

C C

U L

n

And so it do es not matter whether or not we started with C and C having the

L U

same parity or not the b ound on E a is the same

Another way of writing this b ound is

C C E C C jC C

U L U L

U L

n

Following our usual program we note that this implies that the exp ected value of

k

the dierence after nk time steps is at most U Le By Markovs inequality

and the fact that C C is integer this is also an upp er b ound on the probability

U L

that C C Therefore after n lnU L steps the probabilitythat Phase I

U L

has not ended is at most

Now we tackle Phase II Let D be the number of unknown steps at time t We

t

rst note that D is more than a sup ermartingaleit never go es up Whenever a site

t

that is unknown is hit that site p ermanently changes to known The probabilityof

hitting a site in D is just jD jn and this changes the number of unknowns by

t t

so

E jD jj F jD j

t t t

n

k

so once more wehavethatE jD j ne and running for n ln n steps Phase I I

nk

will have ended with probability at least

Widom-Rowlinson

As pointed out in Chapter , there are several different chains for this model. The birth death swapping chain will have the strongest theoretical characteristics. However, one of the other chains might be the faster depending on the implementation, and so we consider bounding chains for all of the chains given in Chapter .

Nonlocal conditioning chain

First we consider the nonlocal chain. For this chain we chose a color at random, then rolled a uniform random variable for each node where that color was not blocked. An acceptance rejection approach to this problem would be to roll a uniform for every node; then if a node rolls to accept the color and is blocked, leave the node uncolored.

This idea leads naturally to a bounding chain approach, where a node which might be blocked by an unknown node itself becomes unknown. Let N(v) denote the node v together with all the neighbors of v.

Nonlocal bounding chain for Widom-Rowlinson
  Set y <- Y_t
  Choose i uniformly from {1, ..., Q}
  Let D <- {v : |Y_t(v)| > 1}
  For all nodes v
    Set y(v) <- y(v) \ {i}
  For all nodes v
    Choose U_v uniformly from [0, 1]
    If U_v <= lambda_i / (1 + lambda_i)
      Case: all nodes in N(v) are colored {0}
        Set y(v) <- {i}
      Case: there exists w in N(v) intersect D (so color i might or might not be blocked)
        Set y(v) <- y(v) union {i}
  Set Y_{t+1} <- y

Figure: Nonlocal conditioning chain for Widom-Rowlinson
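A Python sketch of this update is given below. The convention that removing color i replaces it by 0 (uncolored) in every color set, and the dictionary representation of the activities lam, are assumptions made for this illustration; the case analysis follows the figure.

import random

def wr_nonlocal_bounding_step(y, nbrs, lam):
    """One step of the nonlocal Widom-Rowlinson bounding chain (a sketch).
    y[v] is the set of possible colors of node v, 0 meaning uncolored;
    lam maps each color i >= 1 to its activity."""
    i = random.choice(list(lam))                      # the color class being resampled
    for v in y:
        if i in y[v]:
            y[v] = (y[v] - {i}) | {0}                 # every point of color i is removed
    for v in y:
        if random.random() <= lam[i] / (1 + lam[i]):  # v tries to take color i
            closed = [v] + list(nbrs[v])              # N(v): v together with its neighbors
            if all(y[w] == {0} for w in closed):
                y[v] = {i}                            # surely unblocked: v is known to take color i
            elif all(0 in y[w] for w in closed):
                y[v] = y[v] | {i}                     # blocking depends on unknown nodes: i is only possible
    return y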

Theorem Let and

i i i

X

min

i

i

If then the bounding chain wil l have detectedcomplete coupling in the nonlocal

WidomRow linson chain after Q lnnQ steps with probability at least

If Q then we have a tighter bound for

q

i

Pro of Let D denote the set of vertices v in D such that i y v Then we

t

t

i

pro ceed by lo oking at E jD j j F The rst thing to note is that if color i is

t

i

chosen all of D is thrown out when i is removed from y v for all v

t

i

added Therefore all of D is constructed in the second step For a no de to be

t

j

j i

for some j i Let d v to D it must be adjacent to or on top of a no de in D

t

t

j

denote the number of neighb ors of v which are in D

t

X

Q

i i i

E jD jF P v D jF jD j

t t t

Q Q

v V

X X

Q

i

i

j

jD j

d v

t

Q Q

i

v V

j i

X X

Q

i

i

j

jD j

d v

t

Q Q

i

j i v V

X

Q

j

i

jD j jD j

i

t

t

Q Q

j i

T

where Let z b e the Q dimensional vector jz jjz j jz j at time

i i i Q

step t What we have shown is that

E z E E z jF AE z

t t t

with A b eing a Q by Q matrix

Q

Q

I

Q Q

t

An inductive argument shows that E z A E z It is well known from linear

t

t t

algebra that jjA xjj jjxjj where is an upp er bound on the magnitude of

P P

t

the eigenvalues of A In our case jjE z jj n and so jjE z jj n n

i t i

i i

P

steps jjE z jj n so that E jz j n for all i After lnn

i t i

i

Therefore by Markovs inequality the probability that jz j is at most n

i

Using the union bound the chance that all of the jz j are identically is at most

i

It remains to b ound Of course given actual v alues for the it is a simple

i

matter to numerically compute the value of When Q thismay also b e done

q

For Q the metho d of Gershgorin analytically yielding

disks may b e used to show that

X X

i min max

i i

i i

Q Q Q

i

j i

which completes the pro of

Haggstrom and Nelander gave a b ounding chain for the lo cal heat bath

chain and showed that for the sp ecic case of the WidomRowlinson mo del where

all the are equal to the b ounding chain eciently detects complete coupling

i

when Q

Note that for the nonlo cal chain our result gives a p olynomial time guarantee

when Q When Q this is almost a factor of improvement

for large To get a b ound with b oth the Q and only a factor of instead of

we consider a stronger version of the b ounding chain than was found in

For this chain we will be able to show that it converges in p olynomial time when

Q

The single site heat bath chain

The base Markov chain we use is the single site heat bath chain. The acceptance-rejection single site heat bath chain for Widom-Rowlinson works as follows. Choose a node v uniformly at random. Choose a color c for that node, where P(c = 0) = 1/(1 + sum_i lambda_i) and P(c = i) = lambda_i/(1 + sum_i lambda_i). If color c is blocked at node v, then pick a new color, repeating until a nonblocking color is chosen for v.

Acceptance rejection single site heat bath Widom-Rowlinson bounding chain
  Set y <- Y_t
  Choose v uniformly from V
  Set y(v) <- empty set
  Repeat
    Choose c so that P(c = i) = lambda_i/(1 + sum_j lambda_j) for all 1 <= i <= Q,
      and P(c = 0) = 1/(1 + sum_j lambda_j)
    If no w neighboring v is known to have a color outside {0, c}
      Set y(v) <- y(v) union {c}
  Until c = 0 or c is surely not blocked by any neighbor of v
  Set Y_{t+1} <- y

Figure: Acceptance rejection single site heat bath Widom-Rowlinson bounding chain
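The following Python sketch spells out the same loop. The tests for "surely blocked" and "surely unblocked" are the natural ones implied by the description above; the dictionary representation and the names are illustrative.

import random

def wr_single_site_bounding_step(y, nbrs, lam):
    """One step of the acceptance-rejection single-site Widom-Rowlinson
    bounding chain (a sketch).  y[v] is the set of possible colors of v,
    0 means uncolored, lam maps each color i >= 1 to its activity."""
    v = random.choice(list(y))
    y[v] = set()
    total = 1 + sum(lam.values())
    colors = [0] + list(lam)
    weights = [1 / total] + [lam[i] / total for i in lam]
    while True:
        c = random.choices(colors, weights=weights)[0]
        if c == 0:
            y[v].add(0)          # color 0 is never blocked, so the loop always ends here
            return y
        surely_blocked = any(0 not in y[w] and c not in y[w] for w in nbrs[v])
        surely_free = all(y[w] <= {0, c} for w in nbrs[v])
        if not surely_blocked:
            y[v].add(c)          # c is a possible outcome for at least one bounded state
        if surely_free:
            return y             # c is accepted by every bounded state, so the loop ends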

It was previously noted that in bounding chains like the one above, the possibility exists of always changing the node to a particular color regardless of the values of the neighbors. In our case we always have a fixed probability of changing the color to 0. When this probability is greater than 1/2, the bounding chain will detect complete coupling in polynomial time. Therefore this chain was previously known to converge when 1/(1 + sum_i lambda_i) > 1/2, or equivalently when sum_i lambda_i < 1.

Experimentally, it was noted that this bound was quite loose, especially when Q = 2. In fact this chain has exactly the same behavior as the nonlocal chain.

Theorem Let and

i i i

X

min

i

i

If then the bounding chain wil l have detectedcomplete coupling in the nonlocal

WidomRow linson chain after n lnnQ steps with a probability that is at

least

If Q then we have a tighter bound for

q

One immediate dierence is that the number of steps is larger by a factor of n

owing to the fact that the nonlo cal chain alters all n no des simultaneously whereas

the lo cal chain just alters one at a time Here just measuring running time in terms

of Markov chain steps can be misleading

Pro of The pro of is essentially the same as for the nonlo cal chain A no de gets

moved into z if it is chosen to be the changing no de i is chosen sometime during i

the acceptance rejection pro cess and it is adjacenttoanodeinz for j i Let z

j

i

denote the unknown set for color i after the step is taken Then if color is chosen

in the acceptance rejection pro cess the rep eat lo op stops Therefore the probability

of cho osing i is at most the probability that i gets chosen b efore But this is just

the probability that i is chosen conditioned on either i or b eing chosen and is

For each no de v let d v denote the number of neighb ors of v that lie

i i i

in z for some j i

j

X X

A

E jz jjz jjF P v z jF P v z jF

i

i i i

n

v z

i

v z

i

X

P v z jF jz j

i

i

n

v

X

jz j

i d v i

i

n

v

X

d v

i i

n

v

X

jz j

j i

n

j i

Note that this is exactly the same equation we derived for the nonlo cal chain except

now E z AE z where

t t

Q

Q

A I

n n

This in the same as the matrix for the nonlo cal chain except instead of a Q

factor in front of the rst term wehavean term whichiswhy our new running

time is O n rather than O Q Working through the eigenvalue b ounds as b efore

gives the result in the theorem

By considering a stronger b ounding chain we have increased the range of

i

where have a p olynomial guarantee by a factor of QQ To further increase

the range we intro duce a b ounding chain for the birth death swapping chain

The birth death swapping chain

For the b ounding chain for the birth death swapping chain we may prove the

following

Theorem Let and

i i i

X

min

i

i

i

If then the bounding chain wil l have detectedcomplete coupling in the nonlocal

WidomRow linson chain after n lnnQ steps with probability at least

If Q then we have a tighter bound for

q

max min

i

i

The discrete bounding chain is derived from the birth death swapping bounding chain for the continuous Widom-Rowlinson model. The proof of this theorem and the description of the bounding chain are completely analogous to the continuous case and will be postponed until Chapter .

The antivoter mo del

As mentioned in chapter the antivoter mo del and the voter mo del are closely

related with one imp ortant dierencethe antivoter mo del has a stationary distri

bution given a nonbipartite graph whereas the voter mo del has absorbing states

where all the no des are colored the same way

On the other hand they are linked in that the mixing time for the antivoter

chain is b ounded ab ove by the absorption time for the voter mo del In fact as we

shall show the time needed for the b ounding chain to detect complete coalescence

of the antivoter chain is the same as the time until absorption for the voter chain

To facilitate complete coalescence, we add a symmetric move that at each step randomly permutes the color set. With probability 1/2, nodes colored 0 all flip to color 1 and nodes colored 1 all flip to color 0. All this accomplishes is to allow us, at the first step, to say exactly what the color of at least one node is, either 0 or 1. As with the sink free orientations chain, we need to know the color of at least one site at all times in order to make any progress. But once at least some of the nodes are known, then selecting an unknown node and a known neighbor causes the unknown node to become known.

Instead of labeling each node {0}, {1}, or unknown, assign each node a variable x_i where x_i is in {0,1}. The bounding chain proceeds by flipping the value of the variable instead of the known value. Then the value of y(i) is a monomial, either x_j or its complement for some j in V. Once all the monomials y(i) contain but a single variable x_j, we say that we have completely coupled. All nodes colored x_j will be a single

Antivoter bounding chain
  Set y <- Y_t
  Choose v uniformly from V
  Choose U uniformly from [0, 1]
  If U <= 1/2
    For all u in V
      Set y(u) <- complement of y(u)
  Choose w uniformly from the neighbors of v
  Set y(w) <- complement of y(v)
  Set Y_{t+1} <- y

Figure: Antivoter bounding chain

color, and all nodes colored with the complement of x_j will be a different color.

However, note that we may run the voter model on two colors in a similar fashion, except that at each step we set y(w) <- y(v) instead of its complement. Absorption occurs when all the nodes are colored x_j, which occurs at exactly the same time that all the nodes for the antivoter model are either x_j or its complement. A more detailed analysis can show that the original antivoter chain, without the flipping of color classes, is rapidly mixing; here we just show that our modified chain rapidly mixes.
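In the monomial representation just described, the bounding chain update and the coupling test fit in a few lines of Python; the pair-of-(variable, sign) encoding below is an illustrative choice, not the thesis's own data structure.

import random

def antivoter_bounding_step(y, nbrs):
    """One step of the antivoter bounding chain (a sketch).
    y[v] is a pair (j, s): the color of v equals x_j if s = +1 and its
    complement if s = -1."""
    v = random.choice(list(y))
    if random.random() <= 0.5:                 # flip the two color classes globally
        y = {u: (j, -s) for u, (j, s) in y.items()}
    w = random.choice(nbrs[v])                 # w takes the opposite of v's color
    j, s = y[v]
    y[w] = (j, -s)
    return y

def completely_coupled(y):
    return len({j for j, _ in y.values()}) == 1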

Theorem After n c time steps where c is the size of the unweighted

min min

minimum cut in the graph the probability that this chain detects complete coupling

alternatively that the two color voter model reaches an absorption state is at least

Pro of At each step let j be the value such that the most no des are colored

either x or x Let A denote this set of no des and let D V n A Dene

j j t t t

P

deg v Since deg v is always p ositive when we know that

t t

v D

t

jD j our standard goal

t

A no de w moves from A to D if a neighbor v is selected and then w is the

t t

random neighbor of v which is selected to be changed an event which o ccurs with

dv

probability where dv is the number of neighbors of v which lie in D

t

n deg v

This event changes the value of by deg v Similarly a no de w moves from D

t

to A if a neighbor v is selected followed by w b eing selected as the neighb or This

t

changes the value of by deg v The only way that is for one of these

t t

events to o ccur two

X X

dv deg v dv

E jF deg v deg v

t t t

n deg v deg v

v A v D

t t

X

dv

t t

n n

v

t t t

n n

t

Hence is not only a sup ermartingale it is in fact a martingale as well The sets

t

A and D must be connected by a number of edges c where c is the size of

t t min min

the minumum unweighted cut in the graph Since deg v for all v we know

that each edge fv wg in the graph is selected with probability at least n So

the probability that changes value is bounded below by c n and using

min

Theorem completes the pro of

The list update problem

The bounding chains for the list update chains are straightforward, but unfortunately seem to take longer to detect complete coupling. Consider the direct MA chain, which selects an item at random according to a distribution p and then swaps that item with the item directly in front of it. To create a bounding chain we need only keep track of where each possible item could be. When |y(v)| = 1 for all

MA list update chain
  Set y <- Y_t
  Request item i in {1, ..., n}, where the probability of choosing i is p_i
  If for some j, y(j) = {i}
    Set y(j) <- y(j-1)
    Set y(j-1) <- {i}
  Else
    For all j from n down to 2
      If i in y(j)
        Set y(j) <- y(j) \ {i}
        Set y(j) <- y(j) union y(j-1)
        Set y(j-1) <- y(j-1) union {i}
  Set Y_{t+1} <- y

Figure: MA list update chain

v in {1, ..., n}, the bounding chain has detected complete coalescence.
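A Python sketch of one MA bounding chain step is given below. When the requested item's position is known the swap is performed exactly; when it is unknown, the conservative merging of adjacent color sets shown here is one sound way to preserve the bounding property, at the price of losing information, and is an assumption of this illustration rather than the exact rule in the figure.

import random

def ma_bounding_step(y, p):
    """One bounding chain step for the move-ahead-one (MA) rule (a sketch).
    y[j] is the set of items that may occupy position j (1-indexed);
    p maps items to request probabilities."""
    items, weights = list(p), list(p.values())
    i = random.choices(items, weights=weights)[0]      # requested item
    known = [j for j in y if y[j] == {i}]
    if known:
        j = known[0]
        if j > 1:                                      # swap i with the item in front of it
            y[j], y[j - 1] = y[j - 1], {i}
    else:
        for j in sorted(y, reverse=True):
            if i in y[j] and j > 1:                    # the swap may or may not happen here
                y[j - 1] = y[j - 1] | y[j]
                y[j] = y[j] | y[j - 1]
    return y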

The arbitrary transposition chain for the MA list update process has a similar bounding chain. Our approach is brute force. Given the two positions picked, one has a color set and the other has a color set. For each pair we use our random uniform U to decide the ordering and update the color sets accordingly. Note that taking a single step of the bounding chain may take up to n^2 time within the for

Arbitrary transposition for MA bounding chain
  Set y <- Y_t
  Choose w_1 uniformly from {1, ..., n}
  Choose w_2 uniformly from {1, ..., n} \ {w_1}
  Set v_1 <- min{w_1, w_2}
  Set v_2 <- max{w_1, w_2}
  Set y_1 <- y(v_1)
  Set y_2 <- y(v_2)
  Set y(v_1) <- empty set
  Set y(v_2) <- empty set
  Choose U uniformly from [0, 1]
  For each pair (i, j) with i in y_1 and j in y_2
    If U <= p_i^d / (p_i^d + p_j^d), where d = v_2 - v_1
      Set y(v_1) <- y(v_1) union {i}
      Set y(v_2) <- y(v_2) union {j}
    Else
      Set y(v_1) <- y(v_1) union {j}
      Set y(v_2) <- y(v_2) union {i}
  Set Y_{t+1} <- y

Figure: Arbitrary transposition for MA bounding chain

loop. This is in sharp contrast to many of our other chains, where the time for a single step was roughly the same as for the original Markov chain.

When the weights p_i are geometric, so that p_i is proportional to theta^i, the higher weight items tend to be placed at the front with very high probability. Under these conditions a modified version of this bounding chain, where a step takes only as long as the Markov chain step, may be used. Moreover, this chain converges quickly under certain conditions. This chain takes advantage of the fact that the high weight items tend to collect on the left side. We let m be the largest value such that items 1 through m are all known. Let LEFT be the set {1, ..., m} and RIGHT be the rest of the nodes. Then instead of just wildly choosing positions, we first decide whether positions come from LEFT or RIGHT and proceed from there.

Note that for any permutation, choosing a random position in that permutation is equivalent to choosing a random item. This equivalence will be very useful in constructing a bounding chain.

Theorem Suppose that p p for al l i Then

i i

E time until complete coupling n

Pro of Given that the weights decrease geometrically we know that it is likely

that the high weight items will be at the front Therefore in this pro of we take a

departure from keeping trackofjD j and instead keep trackofm wherem is the

t t t

largest value such that fm g are in A Some value im mightalso be

t t t

in A but we will not use that information in our analysis

t

We shall show that on average the value of m grows larger at each time step

t

so that n m is a sup ermartingale Once m n complete coupling has o ccurred

t t

e are done and w

In case both records are chosen from no des in A therefore m m since

t t t

we are only moving around known no des

The value of m can change in the other two cases For instance if b oth values

t

come from RI GH T then it is conceivable that the value of m can go up if we know

t

the item which gets placed in p osition m

t

The probability that both choices come from R I GH T that one choice has po

sition m and the other choice is the largest record in RI GH T is at least n

it could be n if these two p ositions are the same The ratio p p is at least

j j

so the probability that this largest record gets sorted with the record in p osition

m is at least Hence P m m n

t t t

Unfortunately m may go down if we cho ose one p osition from LE F T and the

t

other p osition from R I GH T This happ ens when a m Potentially m a

t

To upp er b ound the probabilitythat this o ccurs we rst note that the probability

a and a particular unknown record j are b oth chosen is just n that p osition

Now the closest that an unknown record can be to p ositon a is m a and so

ma ma

ma

the probability that sorting o ccurs is at least p p p Therefore

s

the probability of b ecoming unknown is at most

ma

p

s

ma

ma

ma

p p

p p

s

s

d

This ratio is largest when p p is smallest We know that p p and

s i id

d

p p

id i

Let i b e the record at p osition a For each d there are at most two records whose

lab el is d away from i Moreover the distance b etween these two records and a is at

least m a Hence the probability that p osition a b ecomes unknown is b ounded

ab ove by

nm

dma

X

ma ma

dma

d

Combining these terms we have that

m

m a

t

t

X

a E m jm m

t t t

m a

t

n n

a

m

m a

t

t

X

a m E m m jm

t t t t

m a

t

n

a

m

a

t

X

a

a

n

a

n

n

Therefore the exp ected numb er of steps needed until m n is nn or just

t

n by Walds identity

Application to nonparametric testing

This problem of sampling from the list update distribution with geometric weights has another application quite unrelated to the list update problem. Suppose that we have sampled people about their favorite ice cream. Ranking the results gives us a permutation. We wish to construct a test for a specific statistic of some sort, such as how many ice cream makers in the top five positions are located in North Dakota.

In order to determine if our test is statistically significant, we first need some model of how the possible rankings are distributed. One common method for accomplishing this is to assume that, given the true ranking sigma, a sampled permutation pi is randomly chosen with weight proportional to theta^{d(pi, sigma)}, where theta < 1, so that it is unlikely that our surveyed rankings will be a great distance from the true set of rankings.

This definition begs the question: what distance do we use to measure d? One possibility is the sum of squares distance, that is,
\[
d(\pi, \sigma) = \sum_i (\pi(i) - \sigma(i))^2,
\]
an idea known as Mallows' rule.

X

d i i i i

i

X X X

i i i j

i i i

Two facts help us out First assume that is the identity permutation this

changes nothing if we assume that has b een applied to Second the rst two

terms in this pro duct are constants and so raised to these terms are constants as

d



well Therefore the distribution Z satises

P

i j



i

Y

i



i

i

Y

n i



i

i

where again we may insert the n in the nal equation b ecause it is simply a multi

plicative constant This is exactly the form of the geometrically weighted list up date

problem with ratio and so if our previous theorem tells us that wemay

exactly sample for this problem as well
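The algebraic step above is easy to check numerically. The short Python routine below verifies, over all permutations of a small n, that theta^{d(pi, id)} and the product form differ only by a constant factor; the parameter values are arbitrary.

from itertools import permutations

def check_weight_identity(n=4, theta=0.7):
    """Check that theta**d(pi, id) is proportional over all pi to
    prod_i (theta**-2)**(i * pi(i)), since sum pi(i)^2 and sum i^2 are constants."""
    base = list(range(1, n + 1))
    ratios = []
    for pi in permutations(base):
        d = sum((pi[k] - base[k]) ** 2 for k in range(n))
        lhs = theta ** d
        rhs = 1.0
        for k in range(n):
            rhs *= (theta ** -2) ** (base[k] * pi[k])
        ratios.append(lhs / rhs)
    assert max(ratios) / min(ratios) < 1 + 1e-9   # every ratio equals the same constant
    print("proportionality constant:", ratios[0])

check_weight_identity()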

Other applications of bounding chains

We have already seen that having a bounding chain available gives us a means for experimental determination of the mixing time of the chain. However, the potential of bounding chains is much greater. In the next chapter we discuss coupling from the past (CFTP), a means for generating perfect samples from a distribution. Bounding chains give a way to use CFTP for a particular chain. Moreover, upper bounds on the time needed for bounding chains to detect complete coupling will provide an upper bound on the running time of the algorithm.

Coupling from the past has but one weakness: the user must commit to running the algorithm until termination in order not to introduce bias into the sample eventually obtained. Therefore, in the chapter following our discussion of CFTP, we examine another method for generating perfect samples, based on the concept of strong stationary times. Again, the existence of bounding chains will be an essential component in creating these perfect samplers.

Arbitrary transp osition for MA b ounding chain II

Set y Y

t

Set A fv jy v j g

t

Set m min fy yig A

i t

Set LE F T fy mg RI GH T fy m mg

Cho ose U uniformly at random from

m

Case U

n

Cho ose i LE F T Cho ose j uniformly from the items in LE F T

U

Let a b e the p osition of record i

m m m nm

Case U

n n n n

Cho ose i uniformly from the items in LE F T

Cho ose j uniformly from the items in RI GH T

Let a b e the p ositions of record i

m nm m

U Case

n n n

Cho ose U uniformly at random from

If U m n

Set a m Set j to b e the highest probability item in R I GH T

Cho ose U uniformly at random from

If a m and j is the highest probability unknown record

Let j b e second highest probability among unknown records

If U

p p

j

j

Let y afj g

Else if a m

Set p maxfp p g Set p minfp p g

i j s i j

If record j is in LE F T

If U

jbaj

p p

s

Sort items i and j

Else

Antisort items i and j

If record j is in R I GH T

If U



p p ma

s

Let y afg

Else

Let y a

Set Y y

t

Figure Arbitrary transp osition for MA b ounding chain

Chapter

Perfect sampling using coupling from the past

It is the mark of an educated mind to rest satisfied with the degree of precision which the nature of the subject admits, and not to seek exactness where only an approximation is possible.
Aristotle

The traditional Monte Carlo Markov chain method yields an answer that is only approximately distributed according to the desired distribution, and this was the state of the art for decades. Fortunately, certain researchers never read Aristotle, and in the last five years methods have been discovered which allow exact sampling from the stationary distribution of a chain. Such a method is often referred to as a perfect sampling algorithm.

Propp and Wilson introduced coupling from the past (CFTP) as a simple algorithm for generating samples drawn exactly from the stationary distribution of a chain. In only a few years, Monte Carlo Markov chain practitioners have expanded the scope of CFTP, applying the procedure to dozens of different chains.

The general idea is quite simple. We have often stated that the stochastic processes which we consider have index set over all integers, including negative ones. We now describe how to simulate such a chain.

Reversing the chain

Suppose that X_0 has distribution p_0. Then the distribution of X_1 will be p_1 = p_0 P. We wish to find a distribution p_{-1} such that p_0 = p_{-1} P. If P has a unique stationary distribution pi, then pi P = pi, and setting X_i to have distribution pi for all i, including negative i, gives the result.

Hence we suppose that each X_i is distributed according to the unique stationary distribution of the chain. Given X_0, we wish to simulate X_{-1}, X_{-2}, and so on. Furthermore, we wish to insure that P(X_i = x_i | X_{i-1} = x_{i-1}) = P(x_{i-1}, x_i). Using the fact that X_{i-1} and X_i are stationary gives
\[
P(X_{i-1} = x_{i-1} \mid X_i = x_i)
= \frac{P(X_{i-1} = x_{i-1},\ X_i = x_i)}{P(X_i = x_i)}
= \frac{P(X_i = x_i \mid X_{i-1} = x_{i-1})\, P(X_{i-1} = x_{i-1})}{P(X_i = x_i)}
= \frac{\pi(x_{i-1})}{\pi(x_i)}\, P(x_{i-1}, x_i).
\]
This relationship inspires the following definition.

Definition. For a Markov chain with transition matrix P and unique stationary distribution pi, let
\[
\tilde{P}(x, y) = \frac{\pi(y)}{\pi(x)}\, P(y, x)
\]
be the reversibilization of P. If a chain is reversible, then P-tilde = P.

This is why the detailed balance condition is also known as reversibility. Given a reversible chain and X_0 started in the stationary distribution, we can run the chain backwards according to P-tilde in order to obtain X_{-1}, X_{-2}, ..., X_{-t}, which are also distributed according to pi, and (X_{-t}, ..., X_{-1}, X_0) will be distributed as a stochastic process on the Markov chain.
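The reversibilization is a one-line matrix computation. The Python snippet below builds P-tilde from an arbitrary transition matrix and its stationary distribution and checks that pi is again stationary; the small example matrix is arbitrary and only for illustration.

import numpy as np

def reversibilization(P, pi):
    """P_tilde(x, y) = pi(y) * P(y, x) / pi(x)."""
    return (P.T * pi[None, :]) / pi[:, None]

# a small 3-state example (values chosen arbitrarily)
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.3, 0.0, 0.7]])
w, V = np.linalg.eig(P.T)                      # left eigenvectors of P
pi = np.real(V[:, np.argmin(np.abs(w - 1))])   # eigenvector for eigenvalue 1
pi = pi / pi.sum()
Pt = reversibilization(P, pi)
print(np.allclose(pi @ Pt, pi))                # pi is stationary for the reversed chain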

Coupling from the past

Coupling from the past utilizes this characterization to obtain samples that are drawn exactly from the stationary distribution. Suppose that X_0 is stationary. Then, using our construction, X_{-t} will also be stationary for all t. Therefore CFTP starts at X_{-t} and runs forward up to time 0. Suppose that F^0_{-t} is constant. Then we have that X_0 = F^0_{-t}(X_{-t}), which is a known value. Therefore we have obtained X_0, a perfect sample from the stationary distribution.

The only difficulty arises when F^0_{-t} is not constant. Then we simply increase t until F^0_{-t} is constant. As long as this eventually occurs with probability 1, this algorithm will almost surely terminate.

Coupling from the past
  Set t <- 1
  Repeat
    Set t <- 2t
    Run the chain from time -t to time -t/2
  Until F^0_{-t} is constant
  Output F^0_{-t} as our answer

Figure: Coupling from the past (CFTP)
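A minimal Python skeleton of CFTP driven by a bounding chain is shown below. The essential point, reflected in the code, is that the randomness attached to each time step is fixed by a seed and reused when t is doubled, so that the maps F^0_{-t} are consistent across iterations. The bounding_step and full_set_state interfaces are assumptions for this sketch.

import random

def cftp(bounding_step, full_set_state, seed0=12345):
    """Coupling from the past with a bounding chain (a sketch).
    bounding_step(y, rng) performs one bounding chain update with randomness
    drawn from rng; full_set_state() returns the state in which every node may
    take every value."""
    t = 1
    while True:
        y = full_set_state()
        for s in range(-t, 0):                   # run from time -t up to time 0
            rng = random.Random(seed0 + s)       # the seed depends only on the time index s
            y = bounding_step(y, rng)
        if all(len(vals) == 1 for vals in y.values()):
            return {v: next(iter(vals)) for v, vals in y.items()}
        t *= 2                                   # F^0_{-t} not constant: look further into the past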

CFTP and bounding chains

Propp and Wilson noted that CFTP may be used anywhere F^0_{-t} may be shown to be constant, although most of their examples dealt with monotonic Markov chains. For monotonic chains, recall that we need only keep track of F^0_{-t} applied to the largest and smallest states and wait until they meet. It was shown that the expected time until they meet is of the same order as the mixing time of the Markov chain.

With bounding chains, we only know that the complete coupling time gives an upper bound for the mixing time of the chain; the reverse may not be true. Still, bounding chains do allow us to determine when F^0_{-t} is constant, and so immediately the results of the last two chapters indicate that we have algorithms for perfect sampling from the hard core gas model using the Dyer-Greenhill chain, the Potts model using single site update or Swendsen-Wang, the k colorings of a graph, the sink free orientations of a graph, the antivoter model, the restricted hypercube, and the list access problem.

The running time of CFTP is of the same order as the time needed for complete coupling to be detected by the bounding chain. Therefore our results showing that the bounding chain detects complete coupling immediately extend to give a priori bounds on the running time of CFTP.

The amount of work needed for CFTP comes from the memory required to store each f_t for the times used. In practice, random seeds are used for a pseudorandom number generator that creates the same sequence of random numbers each time given the same seed. Therefore the memory requirements are reduced to retaining the seed for each t, which is usually on the order of ln n.

Coupling from the future

As Stephen Hawking pointed out Disorder increases with time b ecause we mea

sure time in the direction in which disorder increases In coupling from the past

we found X by starting at a point t in time and then running forward The

counterintuitive part of this pro cess from a physical point of view is that disorder

as measured by the number of p ossible states admitted by the bounding chain is

decreasing as time moves forward However we could mo dify our algorithm to start

at a p oint t in time and run backwards using the reversed transition matrix P

We shall refer to this time reversed version of the algorithm as coupling from the

future CFTF

Practically CFTF is no dierent from CFTP Most of the chains we consider here

are reversible and so running the chain forwards or backwards makes no dierence

whatso ever to running time or memory requirements This notion of CFTF do es

have some nice theoretical implications however

The rst concerns the time arrow asso ciated with disorder A general

notion of entropy is the logarithm of the number of states that the system could

P

p ossibly be in With bounding chains the entropy of a state Y is ln jY v j

t t

v

When the entropy is jY v j for all v and we have completely coupled In

t

coupling from the past the entropy decreases as time moves forward contrary to

the usual physical use of the term

With CFTF the b ounding chain is working on the reversed chain and so as

the entropy decreases we are moving backwards in time exactly as intuition would

suggest

Second with coupling from the future we need only dene our sto chastic pro cess

on index set We only need the reversibilization for moving backwards on a

nite set of indices not for the entire set of negativeintegers

Third the metho d of attack in Chapter will also concentrate on a pro cess

X X where X is assumed to be stationary This forward way of lo oking at

things makes clearer the connection between the two

CFTF like CFTP is an example of an uninterruptible p erfect sampling algo

rithm Once a run is started to compute X the user must commit to nishing

the run in order to obtain unbiased samples In the next chapter weshow how this

limitation can b e removed with some extra work

Chapter

Perfect sampling using strong stationary times

Technological progress has merely provided us with more efficient means for going backwards.
Aldous Huxley

The reversibilization of a Markov chain, the ability to go backwards in time, provides the cornerstone for another algorithm for generating exact samples. Unlike CFTP, this algorithm will be interruptible, in that the user can give up, shut off the computer, and go home at any time in the algorithm without introducing bias into the sample.

Suppose that we have a Markov process X_0, X_1, ..., where X_0 is begun in the unique stationary distribution pi of the chain. Then X_t will also be stationary for all times t. Unfortunately, the simulator must start in a specified state, so instead of having the situation P(X_0 = x) = pi(x), we must deal with P(X_t = x | X_0 = x_0), which in general will not be pi(x).

We counteract the effect of conditioning on the value of X_0 using a strong stationary stopping time. Recall that a stopping time is any random variable tau such that the event {tau <= t} is X_0, ..., X_t measurable. A strong stationary stopping time is one that obviates the effects of starting in a particular state.

Definition. Say that tau is a strong stationary stopping time if
\[
P(X_t = x \mid \tau \le t,\ X_0 = x_0) = \pi(x).
\]

Here we concentrate our efforts on a perfect sampling technique introduced by Fill for a class of chains which are stochastically monotonic, a relaxation of the monotonicity property discussed earlier. In this paper Fill commented that the method could be generalized, and Murdoch and Rosenthal specified an algorithm which was applicable to a broader range of chains. Here we shall refer to this more general algorithm as FMR.

Murdoch and Rosenthal developed FMR as an algorithm. Here we show that their idea also leads to a strong stationary stopping time. Using this idea, we develop bounds on the running time of FMR in terms of the complete coupling time and separation mixing time of the chain. Rather than follow Murdoch and Rosenthal's development, we begin by examining how general strong stationary times may be developed using complete coupling.

Consider again X_0, X_1, ..., where each X_t has distribution pi. Suppose that tau is any stopping time which occurs by time t with positive probability, and both pi(x) and P(X_0 = x_0, tau <= t) are positive. Then
\[
P(X_t = x \mid \tau \le t,\ X_0 = x_0)
= \frac{P(X_t = x,\ \tau \le t,\ X_0 = x_0)}{P(\tau \le t,\ X_0 = x_0)}
= \frac{P(\tau \le t,\ X_0 = x_0 \mid X_t = x)\, P(X_t = x)}{P(\tau \le t,\ X_0 = x_0)}
= \pi(x)\,\frac{P(\tau \le t,\ X_0 = x_0 \mid X_t = x)}{P(\tau \le t,\ X_0 = x_0)}.
\]
Our goal is to construct a tau such that the fraction multiplying pi(x) is equal to 1.

Recall that for a process such as X_0, X_1, ..., the reversibilization P-tilde(y, x) = pi(x)P(x, y)/pi(y) allows us to simulate the chain in the reverse time direction. Therefore one method of generating the random vector (X_0, ..., X_t) is to generate X_t according to pi and then run the chain backwards using P-tilde.

Just as we use functions f_t, where X_{t+1} = f_t(X_t), to take moves on the Markov chain in the forward direction, let f-tilde_t, where X_{t-1} = f-tilde_t(X_t), take moves in the reverse direction. For a <= b, let F-tilde^a_b = f-tilde_{a+1} composed with ... composed with f-tilde_b, so that X_a = F-tilde^a_b(X_b).

Now suppose that F-tilde^0_t is a constant. Then this constant is a random variable with a distribution that is independent of X_t. Therefore we let tau be the first time t such that F-tilde^0_t is constant. This means that
\[
P(X_0 = x_0,\ \tau \le t \mid X_t = x) = P(X_0 = x_0,\ \tau \le t)
\]
and
\[
P(X_t = x \mid \tau \le t,\ X_0 = x_0) = \pi(x).
\]

Recall that in CFTF and CFTP the goal was to determine the value of the fixed random variable X_0. Now we are more flexible. We are willing to accept any random variable X_t as stationary, as long as complete coupling has occurred moving backwards from time t to time 0.

Algorithmically, this may be used to take perfect samples as follows. Start X_0 in an arbitrary state x. Run the chain forward to time t. This generates a path X_0, X_1, ..., X_t. Now run the chain backwards from time t to 0, conditioned on the moves made on the path. If these backwards moves completely couple the chain, then X_t will be stationary.

The functions f-tilde_t move the process backwards in time. By conditioning on the backwards path, we mean that we choose f-tilde_t conditioned on the event that f-tilde_t(X_t) = X_{t-1}.

FMR Perfect Sampling
  Input: T, X_0, X_1, ..., X_T
  Run the chain backwards, conditioned on the path X_0, ..., X_T, as a complete coupling chain
  If the backward chain completely couples by time 0
    Output "tau <= T" is true
  Else
    Output "tau <= T" is false

Figure: Complete coupling strong stationary times
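The control flow of one FMR attempt looks roughly as follows in Python. The forward_step and backward_coupling_check callables stand in for the chain update and for the conditioned backward coupling test described above; they are assumptions of this sketch, and an attempt that fails can simply be repeated (or abandoned) without biasing the sample.

import random

def fmr_sample(x0, forward_step, backward_coupling_check, T):
    """One-or-more attempts of the FMR procedure (a sketch).
    forward_step(x, rng) advances the chain one step; backward_coupling_check(path)
    runs the chain backwards conditioned on the recorded path and reports whether
    complete coupling was detected by time 0."""
    while True:
        rng = random.Random()
        path = [x0]
        for _ in range(T):
            path.append(forward_step(path[-1], rng))
        if backward_coupling_check(path):
            return path[-1]      # X_T is accepted as a perfect sample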

Use of a strong stationary stopping time has one great advantage over CFTP and CFTF: it is interruptible. The user runs the chain forward until tau <= t. If this takes too long, the user can abort the process without introducing any bias into samples which might be taken later. As Fill notes, this is really more of a theoretical advantage than a practical one, since when CFTP has small expected running time, the probability that the actual running time is larger by a factor of k declines exponentially in k.

The amount of work can be much greater for this procedure than for CFTP. We must run the backward chain conditioned on the forward path, which may be quite difficult to do. However, for local update algorithms this is usually quite easy, and later we describe how this procedure may be applied to the Dyer-Greenhill chain for the hard core gas model.

On the other hand, the amount of work can also be much smaller. We are conditioning on the forward path started at x, and the choice of x can make it more likely that the chain will have completely coupled. Consider the case of single site update for the hard core gas model. If we start x at the state with all nodes colored 0 (the empty independent set), then in the backwards moves it is more likely to color a node 1. We have seen in the bounding chain that when a node is colored 1 it immediately moves from unknown to known. Therefore it is possible that starting with all nodes colored 0 makes it more likely that the backwards bounding chain will show complete coupling.

Upper bounds on the strong stationary stopping time

Recall from an earlier chapter that the separation distance to $\pi$ is defined as

$$\|p - \pi\|_S = \sup_{A \subseteq \Omega,\ \pi(A) > 0} \left[1 - \frac{p(A)}{\pi(A)}\right].$$

Just as the coupling theorem allows us to bound the total variation mixing time in terms of a coupling stopping time, Diaconis and Aldous showed how the separation mixing time is bounded by strong stationary stopping times.

Theorem. Suppose that a Markov chain has a strong stationary stopping time $\tau$; then

$$\|p^t_x - \pi\|_S \le P(\tau > t).$$

Therefore we do not expect that our strong stationary stopping time will run faster than the separation mixing time. Recall that $\tau_S$ is the first time that the separation distance, starting at an arbitrary state, falls below $1/2$, and $T_{BC}$ is the time at which the bounding chain detects complete coupling.

Theorem. Let $T = 2E[T_{BC}] + \tau_S$. Then for any positive $t$,

$$P(\tau > t) \le (3/4)^{\lfloor t/T \rfloor}.$$

For reversible chains $\tau_S$ is itself bounded by a constant multiple of $E[T_{BC}]$, so we may also set $T$ to be a constant multiple of $E[T_{BC}]$.

Proof. Intuitively, running the chain for $2E[T_{BC}]$ steps gives the bounding chain time to detect complete coupling in the reverse direction. Running the chain for another $\tau_S$ steps after that, down to time $0$, allows the chain to almost reach the stationary distribution, so that conditioning on the value of $X_{\tau_S}$ does not change the probability of complete coupling too much. Let $\mathrm{BC}^T$ denote the event that the bounding chain detects complete coupling from time $T$ down to time $\tau_S$. Then, given that we start at $x$, the probability that $\tau \le T$ is at least $P(\mathrm{BC}^T \mid X_0 = x)$. Let $T = 2E[T_{BC}] + \tau_S$.

By Markov's inequality, running the chain backwards from $T$ down to $\tau_S$, a stretch of length $2E[T_{BC}]$, gives us at least a $1/2$ chance of the bounding chain detecting complete coupling when started from stationarity. The remaining time from $\tau_S$ down to $0$ gives us at least a $1/2$ chance that $X_{\tau_S}$ will look stationary, no matter what the value of the starting state. Hence, even if we know that the bounding chain detected complete coupling from time $T$ down to time $\tau_S$, we know that $X_{\tau_S}$ will still be close to stationary:

$$
P(\mathrm{BC}^T,\ X_{\tau_S} = x' \mid X_0 = x)
 = P(X_{\tau_S} = x' \mid X_0 = x)\; P(\mathrm{BC}^T \mid X_{\tau_S} = x')
 \ge \tfrac{1}{2}\,\pi(x')\; P(\mathrm{BC}^T \mid X_{\tau_S} = x').
$$

Summing over $x'$ gives $P(\mathrm{BC}^T \mid X_0 = x) \ge \tfrac{1}{2}\, P_\pi(\mathrm{BC}^T) \ge \tfrac{1}{4}$.

Therefore $P(\tau \le T) \ge 1/4$, and since the intervals $[0, T], [T, 2T], \ldots$ are independent, $P(\tau > kT) \le (3/4)^k$, which gives the first result.

For reversible chains it is well known that $\tau_S$ is at most a constant multiple of the total variation mixing time, which is in turn at most a constant multiple of $E[T_{BC}]$, which yields the final result.

This shows that, when dealing with reversible chains, the running time of FMR will be similar to the running time of CFTP in the number of steps, with the added bonus of only requiring a set amount of time and memory before the algorithm begins. Note that for the specific case of monotonic reversible chains, Fill proved a tighter running time bound of just $\tau_S$, which might be faster than $T_{CC}$.

Application to local update chains

In this section we apply this strong stationary stopping time procedure to the hard core gas chain of Dyer and Greenhill and to the single site Widom-Rowlinson heat bath chain, both of which are local update Markov chains.

The Hard Core Gas Model

Recall the Dyer-Greenhill chain for the hard core gas model, introduced in an earlier chapter. The use of FMR requires two things: first, an efficient means for determining when complete coupling has occurred, and second, a way of running the chain forward conditioned on a single path outcome.

One technique for determining complete coupling is the bounding chain given earlier. For the remainder of this section, then, we discuss the requirement specific to FMR, that of running the chain conditioned on the outcome of a single path.

Consider the path $X_0, X_1, \ldots, X_T$. If we needed to keep track of the entire state of $X_t$ for every $t$ in $0, \ldots, T$, then FMR would require vastly more memory than CFTP. Fortunately, for local update chains this is not necessary. Just as with CFTP, it is enough to record the move made from $X_t$ to $X_{t+1}$ for all $t$ in $0, \ldots, T-1$.

For the Dyer-Greenhill chain these moves consist of three types: HOLD, where $X_{t+1} = X_t$; FLIP($v$), where the move consists entirely of flipping the color of node $v$ from $0$ to $1$ or vice versa; and SWAP($v$, $w$), where node $v$ was colored $0$, node $w$ was colored $1$, and the chain switched their values. We deal with each of these moves in turn.

For HOLD we must choose at random an $f_t$ such that $f_t(X_t) = X_t$. This may be accomplished through acceptance-rejection: pick a random $f_t$ as usual; if $f_t(X_t) = X_t$, then keep this move, otherwise reject it and begin again. The expected time needed to make a move is just the inverse of the probability that a random $f_t$ fixes $X_t$. If the chosen node $v$ is already colored $0$, then this is at least the probability that $v$ is chosen to be $0$; if $v$ is colored $1$, then it is at least the probability that $v$ is chosen to be $1$. A lower bound on the probability of holding is the minimum of these two chances. Hence

$$P(f_t(X_t) = X_t) \ge \min\left\{\frac{1}{1+\lambda},\ \frac{\lambda}{1+\lambda}\right\},$$

and the expected number of choices of $f_t$ we must make until acceptance is just $\max\{1+\lambda,\ (1+\lambda)/\lambda\}$.
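The acceptance-rejection step for conditioning on a HOLD move can be sketched as follows. This is a minimal illustration: the helper random_update and the representation of the state are assumptions made for the sketch, and the bound on the expected number of draws is the one just derived.

    def conditioned_hold_update(x, lam, random_update, max_tries=100000):
        """Draw an update f conditioned on f(x) == x by acceptance-rejection.

        random_update(lam) is assumed to return a callable f; the expected
        number of draws is 1 / P(f(x) == x), which for the hard core chain
        is at most max(1 + lam, (1 + lam) / lam).
        """
        for _ in range(max_tries):
            f = random_update(lam)
            if f(x) == x:       # proposed move holds the current state fixed
                return f        # accept: f is distributed as f_t given HOLD
        raise RuntimeError("no holding update found within max_tries")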

For the FLIP($v$) move, clearly the node chosen for $f_t$ is $v$. If node $v$ moved from $0$ to $1$, then $U$ is uniform over the range of values which propose color $1$; if $v$ moved from $1$ to $0$, then $U$ is uniform over the range which proposes color $0$.

Finally, for the SWAP($v$, $w$) move, $v$ was again the chosen node, while $U$ is uniform over the range of values which trigger a swap. Hence, treating these ranges as constants, the memory needed is similar to CFTP where the randomness is coming from true random numbers.

Single site Widom-Rowlinson

The discrete Widom-Rowlinson case is similar. There are three types of moves: HOLD, where $X_{t+1} = X_t$; MOVE($v$, $0$), where node $v$ is changed to color $0$ and the neighbors of $v$ include at least one node colored $c \neq 0$; and MOVE($v$, $c$), where $v$ is changed to color $c$ but all of the neighbors of $v$ have color $0$. Again the HOLD move may be accomplished using acceptance-rejection sampling; the expected time needed will be at most a constant depending on $Q$ and the rates. The MOVE($v$, $0$) case has the choice of node immediately being $v$, and knowing that the color moved to is $0$ indicates that $U$ is uniform over the range of values which give color $0$. Finally, MOVE($v$, $c$) has choice of node $v$ and $U$ uniform over the range of values which give color $c$.

As in the hard core gas model case, the information about moves which needs to be saved is in fact quite small, and the path in its entirety does not need to be saved.

Application to nonlocal chains

Of course, just because a chain is nonlocal does not mean that FMR cannot be applied. Recall the nonlocal update chain for discrete Widom-Rowlinson, which drew a set of points of a particular color which are Poisson distributed. Now suppose that we are given two states $X_t$ and $X_{t+1}$, and we are trying to compute $f_t$ conditioned on the fact that $f_t(X_t) = X_{t+1}$.

In going from $X_t$ to $X_{t+1}$, a color $c$ was chosen, all points of that color were removed, and the color $c$ was added independently for each nonblocked node with probability $p_c$. In going from $X_t$ to $X_{t+1}$, then, we clearly must choose the same color $c$. Now the location of the points of color $c$ gives us partial information about what our choice for each node was. Basically, if a node is unblocked in $X_t$ and not colored $c$ in $X_{t+1}$, then we do not color that node $c$. If it is colored $c$ in $X_{t+1}$, then that node will be colored $c$ if it is not blocked. Finally, if a node in $X_t$ is blocked, we must randomly choose whether or not to color the node $c$, since the value at $X_{t+1}$ imparts no information.

In other words, at each step of the chain we distribute color $c$ as a Poisson point process with rate $p_c$ on the nodes. The distribution of points of color $c$ in $X_{t+1}$ tells us the value of the point process on all nonblocked nodes, which can then be easily extended to a point process on all of the nodes. The total memory requirement is again the memory needed to record a single step of the chain, making the memory needs roughly the same as with CFTP.

The theoretical advantages of FMR do not appear likely to outweigh the algorithmic complexity over CFTP. However, it is interesting to note when an interruptible perfect sampling algorithm exists that is competitive with CFTP.

Chapter

Continuous Models

I am so in favor of the actual infinite that, instead of admitting that Nature abhors it, as is commonly said, I hold that Nature makes frequent use of it everywhere, in order to show more effectively the perfections of its Author.

Georg Cantor

Until now we have only dealt with discrete state spaces. However, Monte Carlo Markov chain methods can also be used to obtain samples when the state space is continuous. We begin by introducing some new techniques and notation to deal with continuous state spaces.

Continuous state space Markov chains

The construction and properties of a Markov chain over an arbitrary sample space $\Omega$ are quite similar to the case when $\Omega$ is finite. First we note that we must have a set of measurable sets on $\Omega$, say $\mathcal{F}$. Consider the stochastic process $X_0, X_1, X_2, \ldots$ such that each $X_i$ is a random variable drawn from a probability space on $(\Omega, \mathcal{F})$. Let $\sigma(X_0, X_1, \ldots, X_i)$ be the $\sigma$-algebra generated by $X_0, \ldots, X_i$.

Definition. Let $C \in \mathcal{F}$. The stochastic process $X_0, X_1, X_2, \ldots$ on $(\Omega, \mathcal{F})$ is a Markov chain if

$$P(X_{i+1} \in C \mid X_0, X_1, \ldots, X_i) = P(X_{i+1} \in C \mid X_i).$$

Instead of a transition matrix, we now record probabilities of moving via a transition kernel $K$ that behaves in a very similar way to its discrete counterpart.

Definition. A kernel $K : \Omega \times \mathcal{F} \to [0, 1]$ is a transition kernel for a Markov chain if

$$K(x, C) = P(X_{t+1} \in C \mid X_t = x)$$

for all $x$ in $\Omega$ and $C \in \mathcal{F}$.

Definition. Let $p$ be a probability distribution on $(\Omega, \mathcal{F})$ and $K$ a kernel. Then for $C \in \mathcal{F}$,

$$pK(C) = \int_{x \in \Omega} K(x, C)\, p(dx).$$

Fact. Suppose that the random variable $X_t$ has distribution $p$. Then $X_{t+1}$ has distribution $pK$.

Definition. Define $K^0 = I$, the kernel mapping probability distributions to themselves. Recursively define, for all $C \in \mathcal{F}$,

$$K^t(x, C) = \int_{y \in \Omega} K(y, C)\, K^{t-1}(x, dy).$$

Fact. Note that $K^{s+t} = K^s K^t$. Moreover, for all $C \in \mathcal{F}$,

$$P(X_{t+s} \in C \mid X_t = x) = K^s(x, C).$$

The notions of irreducibility and aperiodicity also extend in a natural way to the continuous world.

Definition. A Markov chain is irreducible if there exists a measure $\phi$ on $\mathcal{F}$ such that for all $C$ with $\phi(C) > 0$ and for all $x$, there exists a $t$ such that $K^t(x, C) > 0$.

Definition. Suppose that we have an irreducible Markov chain and that $\Omega$ is partitioned into $k$ sets $E = \{E_1, \ldots, E_k\}$. If for all $i \le k$ and all $x \in E_i$, $K(x, \Omega \setminus E_j) = 0$ where $j = i + 1 \bmod k$, then $E$ forms a $k$-cycle in the Markov chain. The period of a Markov chain is the largest value of $k$ for which a $k$-cycle exists. If $k = 1$, the Markov chain is said to be aperiodic.

Definition. A Markov chain is ergodic if it is both irreducible and aperiodic. It is geometrically ergodic if there exists a probability distribution $\pi$, a constant $\rho < 1$, and a function $M(x) < \infty$ such that

$$\|K^t(x, \cdot) - \pi\|_{TV} \le M(x)\, \rho^t.$$

The chain is uniformly ergodic if $M(x)$ is constant over $x$.

Our goal is to bound $\rho$ away from $1$ and show that $M(x)$ is not exponential in the input size for our starting value $x$, thereby showing that convergence to the desired stationary distribution occurs in polynomial time.

As in the discrete case, the concept of reversibility will allow us to take exact samples from the stationary distribution of a chain.

Definition. A kernel $K$ satisfies the detailed balance condition, or is reversible with respect to $\pi$, if for all $A, B \in \mathcal{F}$,

$$\int_{x \in A} K(x, B)\, \pi(dx) = \int_{x \in B} K(x, A)\, \pi(dx).$$

Any distribution $\pi$ which is reversible with respect to $K$ is stationary for $K$. If $\pi(dx)$ has a density $s(x)$ and $K(x, dy)$ has a density $k(x, y)$, then the reversibility condition becomes $s(x)\,k(x, y) = s(y)\,k(y, x)$.

Finally, we note that the coupling theorem carries through to the continuous case.

Theorem. Suppose that $X_0 = x$, that $Y_0$ is distributed according to some stationary distribution $\pi$, and that the two stochastic processes are coupled. Then if $T_C$ is the coupling time,

$$\|K^t(x, \cdot) - \pi\|_{TV} \le P(T_C > t).$$

Continuous time Markov chains

Another extension of Markov chains is to the index set of the stochastic process. Previously our stochastic process $\{X_0, X_1, X_2, \ldots\}$ was indexed by the integers. For continuous time Markov chains the stochastic process will be indexed by an increasing sequence of real values $k_0 < k_1 < k_2 < \cdots$.

Roughly speaking, continuous time Markov chains introduce clocks to changes. Suppose that we are at state $x$. Instead of a random change in the state occurring at every time step, a clock is attached to every random change. This clock is an exponential random variable with rate given by a rate kernel $W(x, A)$. We wish to ensure that the rate at which clocks expire is not too high, so that with probability $1$ only a finite number of events occur in a finite amount of time.

Definition. We say that $W : \Omega \times \mathcal{F} \to [0, \infty)$ is a rate kernel if

$$\int_{y \in A} W(x, dy) = W(x, A) < \infty$$

and

$$\int_{x \in \Omega} W(x, \Omega)\, dx < \infty.$$

Finally, we require that if $W(x, dy) > 0$ then $W(x, A) \le \beta\, W(y, A)$ for some constant $\beta$, for all $A \in \mathcal{F}$. This insures that no state is left instantaneously. These conditions guarantee that a finite amount of time elapses between changes; that is,

$$P(X_{t+s} = x \mid X_t = x) \ge e^{-W(x, \Omega)\, s}.$$

Now, because we are dealing with exponential random variables, we first gather some facts about their distribution. Suppose that $E(a)$ denotes an exponential random variable with rate $a$.

Fact. Exponential random variables have the forgetfulness property, so that for $t, s \ge 0$,

$$P(E(a) > t + s \mid E(a) > s) = P(E(a) > t) = e^{-at}.$$

Fact. The rate of the minimum of two exponential random variables is the sum of their rates:

$$\min\{E(a), E(b)\} \sim E(a + b).$$

Fact. For two exponential random variables $E(a)$ and $E(b)$,

$$P(E(a) < E(b)) = \frac{a}{a + b}.$$

The first fact tells us something very important about continuous time Markov chains. If we are at state $x$ and $s$ time passes without a clock expiration, then the distribution of each clock variable is exactly the same as it was before. Nothing happening over a time period does not give any information about what will be the next event.

The second fact allows us to give a full description of how the stochastic process $X_{k_0}, X_{k_1}, \ldots$ may be formed. Suppose that $X_{k_i} = x$. Then $P(k_{i+1} - k_i > t) = \exp\{-W(x, \Omega)\, t\}$ and

$$P(X_{k_{i+1}} \in A) = \frac{W(x, A)}{W(x, \Omega)}.$$

For notational convenience we set $X_t = X_{k_i}$, where $k_i \le t < k_{i+1}$.
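The construction above can be made concrete when the state space is finite, where the rate kernel reduces to a table of rates. The sketch below is a minimal Python illustration; representing $W$ as a function returning a dictionary of rates is an assumption made only for this sketch.

    import random

    def simulate_ctmc(x0, rates, t_end):
        """Simulate a continuous time Markov chain up to time t_end.

        rates(x) is assumed to return a dict {y: w} of transition rates out
        of state x (the finite-state analogue of W(x, .)).  The holding
        time is exponential with rate W(x, Omega) and the next state is
        chosen with probability W(x, {y}) / W(x, Omega).
        """
        t, x = 0.0, x0
        path = [(t, x)]
        while True:
            out = rates(x)
            total = sum(out.values())            # W(x, Omega)
            if total == 0.0:
                break                            # absorbing state: no more clocks
            t += random.expovariate(total)       # time until the next clock expires
            if t > t_end:
                break
            r, acc = random.uniform(0.0, total), 0.0
            for y, w in out.items():
                acc += w
                if r <= acc:
                    x = y
                    break
            path.append((t, x))
        return path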

Definition. The transition kernel $K_t$ is defined as

$$K_t(x, A) = P(X_t \in A \mid X_0 = x).$$

The third fact will be useful when trying to determine the probability that a particular event occurred, given that some event occurred.

Intuitively, a distribution $\pi$ is stationary if the rate at which probability leaves a state is equal to the rate at which it is entering. More precisely:

Theorem. Suppose that $\pi$ is stationary for the Markov chain with rate kernel $W$. Then for all $A \in \mathcal{F}$,

$$\int_{x \in \Omega} W(x, A)\, \pi(dx) = \int_{y \in A} W(y, \Omega)\, \pi(dy).$$

This important property allows us to add two continuous time Markov chains with a common stationary distribution.

Theorem. Let $W_1$ and $W_2$ be two rate kernels for which $\pi$ is stationary. Then $W_1 + W_2$ is the rate kernel for a new chain for which $\pi$ is also stationary.

In other words, if we add new moves to the chain via a rate kernel $W_{\mathrm{new}}$ for which $\pi$ is stationary, then the stationary distribution of the chain will remain unchanged.

The Continuous Hard Core Gas Model

In the discrete hard core gas model, a configuration consisted of a set of vertices on a graph colored $1$, with the rest colored $0$. In the continuous case we again have a set of points colored $1$, but now the points come from a continuous state space, such as a subset of $\mathbb{R}^d$.

Let $x \subseteq S$, where $S \subseteq \mathbb{R}^d$ is a bounded Borel set. Then $x$ is a configuration if the number of points in $x$ is finite. We will write $x = \{x_1, \ldots, x_n\}$ and let $n(x)$ be the number of points in $x$. In the hard core gas model, each point has a hard core of radius $R/2$ around it. Let $\mathrm{dist}(a, b)$ be the distance between two points $a$ and $b$. The fact that the core is hard means that two cores cannot intersect, or equivalently, in $\mathbb{R}^d$ no two points of a configuration are allowed to be within a distance $R$ of each other. We shall refer to a configuration satisfying this property as valid. The probability distribution for the hard core gas model is

$$s(x) = \frac{\lambda^{n(x)}}{Z}$$

if $\mathrm{dist}(x_i, x_j) > R$ for all $i \neq j$, and $0$ otherwise. As in the discrete case, $Z$ is the normalization constant that makes $s$ a probability distribution.

To obtain samples from this distribution we will use continuous time Markov chains. The first chain we consider was proposed by Lotwick and Silverman, who showed that the chain does converge geometrically to the correct stationary distribution. In other words, for each starting state there exist constants $C_1$ and $C_2$ such that the total variation distance between the state of the chain after time $t$ and the stationary distribution is at most $C_1 \exp\{-t C_2\}$. Of course, this result is not helpful in practice if the constant $C_1$ is exponentially large. While we do not present a full analysis of the convergence rate, for some ranges of $\lambda$ we will show that this chain does mix in polynomial time.

The chain of Lotwick and Silverman is a spatial birth death chain. These chains have been extensively studied and have been widely used for generating point processes on subsets of $\mathbb{R}^d$. The idea is simple: there are two types of events, births and deaths. Births add points to the configuration and deaths remove points. Let $x$ be a configuration and $z$ a point in the space $S$. If the birth rate at point $z$, given that we are currently in configuration $x$, is $b(z, x)$, then the probability that a point in $dz$ is added to $x$ in the time interval $dt$ is $b(z, x)\, dz\, dt$. Similarly, if $z$ is a point in $x$, then $d(z, x)$ is the rate at which the point $z$ dies: in the time interval $dt$, the probability that point $z$ is removed from configuration $x$ is $d(z, x)\, dt$. Preston showed the following version of reversibility for these birth death processes.

Theorem. Let $x$ be a configuration and $z$ be a point in the state space. If

$$b(z, x)\, f(x) = d(z, x \cup \{z\})\, f(x \cup \{z\}),$$

then $f(x)$ is a stationary density for the birth death process.

To show that the Lotwick-Silverman chain has the correct stationary distribution is an easy application of birth death reversibility. Let $U(x)$ denote the volume that is within distance $R$ of $x$, and suppose that we have scaled the problem so that the volume of $S$ is $1$. Then new points are added to the set $x$ (born) at rate $\lambda$ in the area not within distance $R$ of $x$. Points in $x$ are removed from the set (die) at rate $1$.

Note that if we remove the restriction that the cores must not intersect, the density just becomes proportional to $\lambda^{n(x)}$, and we have a Poisson point process on $S$.

When actually simulating the chain, it is quite expensive to compute $1 - U(x)$, so instead an acceptance-rejection method is used. A point is chosen uniformly at random from $S$ at rate $\lambda$. If it lies within distance $R$ of $x$, it is not added, but if no cores intersect the core of the new point, it is added. This is a thinned Poisson process, and so it will have rate equal to the old rate times the probability of acceptance, or exactly $\lambda(1 - U(x))$, as desired.

Since this chain is a birth death chain, we need only verify that birth death reversibility is satisfied. The death rate for any point is the constant $1$, independent of $x$, so $d(z, x) = 1$ for all $z$ and $x$. The birth rate for $z$ such that $z$ is not within distance $R$ of any point in $x$ is $\lambda$, also a constant. Hence

$$d(z, x \cup \{z\})\, f(x \cup \{z\}) = 1 \cdot \frac{\lambda^{n(x)+1}}{Z} = \lambda \cdot \frac{\lambda^{n(x)}}{Z} = b(z, x)\, f(x),$$

and $\pi(dx) = f(x)\, dx$ is a stationary distribution. The set of points in a configuration may be thought of as a queue. That is, the uniqueness of the stationary distribution is a consequence of the coupling lemma and the fact that, from any state, there is a small but positive probability that after one unit of time the queue will be empty. Even if the queue does not empty in this time, it will not have grown too much larger, with high probability.

Lotwick-Silverman Continuous Hard Core Chain
    Set $x \leftarrow X_t$
    Choose $dt$ exponential with rate $\lambda + n(x)$
    Choose $U$ uniformly from $[0, 1]$
    Case $U < \lambda / (\lambda + n(x))$:
        Choose $v$ uniformly at random from $S$
        If $\mathrm{dist}(v, x) > R$
            Set $x \leftarrow x \cup \{v\}$
    Case $U \ge \lambda / (\lambda + n(x))$:
        Choose $i$ uniformly from $\{1, \ldots, n(x)\}$
        Set $x \leftarrow x \setminus \{x_i\}$
    Set $t \leftarrow t + dt$
    Set $X_t \leftarrow x$
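A minimal sketch of one event of the chain in the figure above, written in Python for points in the unit square with Euclidean distance; the unit-square domain and the data layout are assumptions made only for illustration, and the birth step uses the thinning (acceptance-rejection) just described.

    import math
    import random

    def ls_event(x, lam, R):
        """One birth/death event of the Lotwick-Silverman chain on the unit square.

        x is a list of points (tuples); returns (dt, x) where dt is the
        exponential waiting time until this event.
        """
        n = len(x)
        rate = lam + n                           # total event rate
        dt = random.expovariate(rate)
        if random.random() < lam / rate:         # a birth is proposed
            v = (random.random(), random.random())
            # thinning: only add the point if no existing core is within R
            if all(math.dist(v, p) > R for p in x):
                x = x + [v]
        else:                                    # a uniformly chosen point dies
            i = random.randrange(n)
            x = x[:i] + x[i + 1:]
        return dt, x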

Continuous bounding chains

In using CFTP, the goal is the same as in the discrete case: given an unknown state at time $-t$, determine whether or not the state becomes known at time $0$. However, two difficulties make the use and analysis of CFTP more difficult here. First, it is quite difficult to show that no matter which state we started in at time $-t$, we ended up in the same state at time $0$: the number of possible states to consider is too high. Therefore the brute force approach that worked quite well in the discrete case will not avail us here. Instead we use the approach of Kendall, where we learn something about the unknown state at time $-t$ by looking farther back in time.

Definition. Say that a point $z \in S$ has birth death interval $(b_z, d_z)$ if it was born at time $b_z$ and dies at time $d_z$.

Note that just because a point was born at time $b_z$ does not guarantee that it was added to the set. However, whether or not it was added to the set, it is removed from the set at all times greater than $d_z$. Therefore the only points which are even possibly part of the configuration at time $-t$ are those whose birth death interval $(b_z, d_z)$ contains the time $-t$. This approach allows us to initialize a bounding chain.

The idea for continuous state space bounding chains is straightforward. A configuration is a set of points. The bounding chain keeps track of two sets of points: points that are known to be in the set, $x_A$, and points that are possibly in the set, $x_D$. At each step we say that $(x_A, x_D)$ bounds $x$ if $x_A \subseteq x \subseteq x_A \cup x_D$. We shall refer to points in $x_A$ as known and points in $x_D$ as unknown.

Our initialization procedure says that at time $-t$ the set $x_D$ consists of all points whose birth death interval contains $-t$. We wish to take steps in the Markov chain such that if $(x_A, x_D)$ bounds $x$ at time $t$, it will also bound it at time $t + \Delta t$.

The procedure works as follows. Suppose a new point is born. Points which are blocked by points in $x_A$ are definitely not added to the set. Also, points which are not blocked by either $x_A$ or $x_D$ are definitely added to the set, and so are added to $x_A$. The only uncertainty comes when attempting to add points which are blocked by points in $x_D$ alone. It is unknown whether these points are added or not, and so they are placed in $x_D$.

Suppose that we are given a list of birth and death times over a time interval $[a, b]$, and a list of $z$ such that $b_z \le a \le d_z$. We first initialize the bounding chain, setting $X_D$ to be this set of $z$ whose lifetime falls across $a$. We then proceed in sorted time order, examining and dealing with births and deaths as they arise. Again this may be done in linear time, and so utilizing the bounding chain does not increase the order of complexity of running the Markov simulation. Note that generation of the initial $X_D$ is not difficult. If we have no prior knowledge about the time before $a$, then we simply use a Poisson point process with parameter $\lambda$, generating lifetime lengths for these points and then, using the forgetfulness property of exponentials, generating death times for each of these points. If we already know something about birth and death times before $a$, we simply condition on that information when creating the Poisson point process.

We may use the bounding chain exactly as we did in the discrete case, either for determination of the mixing time or as a black box for a CFTP perfect sampling algorithm. As with the discrete algorithms, if we wish to experimentally determine the mixing time, we simply check whether $X_D$ is empty at time $b$. If it is, the state $x$ will have coupled with the stationary path. Nothing in our presentation of the bounding chain prevents us from modifying $b$ on the fly and continuing until $X_D$ is empty.

Bounding Chain for Lotwick-Silverman
    Input: list of events, interval $[a, b]$, $\{z : b_z \le a \le d_z\}$
    Output: $X_b$, $X_A$, $X_D$
    Set $X_D \leftarrow \{z : b_z \le a \le d_z\}$, $X_A \leftarrow \emptyset$
    For each event $e$ in sorted time order
        If $e = d_z$ for some $z$
            If $z \in X_D$
                Set $X_D \leftarrow X_D \setminus \{z\}$
            If $z \in X_A$
                Set $X_A \leftarrow X_A \setminus \{z\}$
        If $e = b_z$ for some $z$
            If $\mathrm{dist}(z, X_A \cup X_D) > R$
                Set $X_A \leftarrow X_A \cup \{z\}$
            If $\mathrm{dist}(z, X_A) > R$ and $\mathrm{dist}(z, X_D) \le R$
                Set $X_D \leftarrow X_D \cup \{z\}$

Figure: Bounding Chain for Lotwick-Silverman
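For concreteness, the figure above can be written as a short routine. The sketch below is a minimal Python rendering; the event encoding (tuples tagged 'birth' or 'death') and the use of Euclidean distance on coordinate tuples are assumptions made for illustration.

    import math

    def ls_bounding_chain(events, R, initial_unknown):
        """Process sorted events for the Lotwick-Silverman bounding chain.

        events: list of ('birth', time, point, pid) and ('death', time, pid)
        records in increasing time order; initial_unknown: dict pid -> point
        for points whose lifetime crosses the start of the interval.
        Returns (x_A, x_D), the known and unknown point sets.
        """
        x_A, x_D = {}, dict(initial_unknown)
        for ev in events:
            if ev[0] == 'death':
                _, _, pid = ev
                x_A.pop(pid, None)               # a dying point leaves whichever
                x_D.pop(pid, None)               # set it currently occupies
            else:
                _, _, z, pid = ev
                near_A = any(math.dist(z, p) <= R for p in x_A.values())
                near_D = any(math.dist(z, p) <= R for p in x_D.values())
                if not near_A and not near_D:
                    x_A[pid] = z                 # definitely added to the configuration
                elif not near_A and near_D:
                    x_D[pid] = z                 # uncertain: blocked only by unknowns
                # blocked by a known point: definitely not added
        return x_A, x_D

Running this over event lists generated farther and farther back, and checking whether x_D is empty, is exactly the CFTP usage described in the text.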

For coupling from the past we need to be a little more careful how we generate the list of birth death times and lifetime crossing times. We generate birth death times in reverse: that is, starting at time $0$ we move backwards in time and have deaths appearing at rate $\lambda$. For each death we then have a corresponding birth occur at a time earlier, such that the difference is exponential with rate $1$. Because we generate our data death first, then birth, it follows that for any time $-t$ we immediately know which birth death intervals cross $-t$ and have death times in $(-t, 0]$. It remains to consider intervals which cross $-t$ and have death times after $0$. To limit the possibilities that we must consider, note that any lifetime which starts before $-t$ and ends after $0$ must cross time $0$ as well. The set of points whose lifetime crosses $0$ is just a Poisson point process with parameter $\lambda$, and so we generate this list of points. Any time we need to see how many cross $-t$, we just examine this list and see how many that died after $0$ were also born before $-t$.
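The reverse-time generation just described can be sketched as follows, under the same unit-volume scaling: moving backwards from time 0, deaths form a Poisson process of rate lambda, and each death is paired with a birth an exponential (rate 1) amount of time earlier. The data layout is an assumption made for illustration.

    import random

    def reverse_events(lam, t_back):
        """Generate birth-death intervals backwards from time 0 to time -t_back.

        Moving backwards, deaths appear as a Poisson process of rate lam;
        each death at time d is paired with a birth at d - Exp(1).
        Returns a list of (birth_time, death_time) pairs with death_time <= 0.
        """
        intervals = []
        d = 0.0
        while True:
            d -= random.expovariate(lam)         # next death, moving backwards
            if d < -t_back:
                break
            b = d - random.expovariate(1.0)      # lifetime is exponential, rate 1
            intervals.append((b, d))
        return intervals

    def crossing(intervals, t):
        """Birth-death intervals that cross time t (with t <= 0)."""
        return [(b, d) for (b, d) in intervals if b <= t <= d]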

In the discrete case we found that when $\lambda$ was small enough we could show that the bounding chain converged in polynomial time. For the continuous case we now show the following. Our result is stated in terms of the number of events needed to occur, since it is this value that represents the amount of work actually needed to be performed in running the algorithm.

Theorem. Consider the hard core gas model with parameters $\lambda$ and $R$, and suppose that we run the bounding chain for the Lotwick-Silverman chain forward from time $0$ for $k$ events. Then if $\lambda V_R < 1$, after

$$k = O\!\left(\frac{(1+\lambda)^2}{(1 - \lambda V_R)^2}\left(\lambda + \ln\frac{1}{\epsilon}\right)\right)$$

events, the probability that we have converged is

$$P(X_D = \emptyset) \ge 1 - \epsilon.$$

As in the chapter where we first introduced bounding chains, we will require the use of supermartingales in our analysis.

More on supermartingales

Recall that a supermartingale is a stochastic process such that, with probability one,

$$E[X_{t+1} \mid \{X_0, \ldots, X_t\}] \le X_t.$$

In expectation, a supermartingale decreases as time goes on. Now suppose that the supermartingale never grows too fast, so that

$$|X_{t+1} - X_t| \le c$$

for some constant $c$. Then Azuma's inequality limits the probability that the supermartingale grows too large.

Theorem. Let $X_0, X_1, \ldots$ be a supermartingale satisfying $|X_{t+1} - X_t| \le c$. Then

$$P(X_t - X_0 \ge \gamma) \le e^{-\gamma^2 / (2tc^2)}.$$

That is, the probability that $X_t$ rises too far above $X_0$ is exponentially small in the time $t$.

Our method of proving the convergence theorem above will be to show that the number of unknown points stays above $0$ with an exponentially declining probability. The sequence $n(X_D)$ has particular properties, such as $0$ being an absorbing state. The following lemma shows how these properties force the process to hit $0$ quickly.

Lemma. Suppose that $X_t$ is a nonnegative stochastic process satisfying: $0$ is an absorbing state, $|X_{t+1} - X_t| \le c$, and $E[X_{t+1} - X_t \mid X_t > 0] \le -q$, where $q$ is a constant between $0$ and $1$. Finally, suppose that $X_0$ is Poisson distributed with parameter $\lambda$. If

$$t \ge \frac{2(c+q)^2}{q^2}\left[\lambda\left(e^{q/(c+q)^2} - 1\right) + \ln\frac{1}{\epsilon}\right],$$

then

$$P(X_t > 0) \le \epsilon.$$

Proof. Let $\tau$ be the first time that the process $X_t$ hits $0$. Note that the stochastic process

$$N_t = X_{t \wedge \tau} + q\,(t \wedge \tau)$$

is a supermartingale. The fact that $E[X_{t+1} - X_t \mid X_t > 0] \le -q$ insures that $E[N_{t+1} - N_t \mid N_t] \le 0$ when $X_t > 0$. Furthermore, $N_t$ is constant for all $t \ge \tau$, so it is certainly a supermartingale there as well. We also have that $|N_{t+1} - N_t| \le c + q$, so we may apply Azuma's inequality. Note that $N_0 = X_0$, so $P(N_0 = i) = e^{-\lambda}\lambda^i / i!$. If $X_t > 0$ then $\tau > t$ and $N_t = X_t + qt > qt$. Altogether we have that

$$
P(X_t > 0) \le P(N_t \ge qt)
 \le \sum_{i \ge 0} \frac{e^{-\lambda}\lambda^i}{i!}\; e^{-q^2 t/(2(c+q)^2)}\; e^{q i/(c+q)^2}
 = e^{-q^2 t/(2(c+q)^2) + \lambda\left(e^{q/(c+q)^2} - 1\right)},
$$

where the bound on each term follows from Azuma's inequality together with $(qt - i)^2 \ge q^2 t^2 - 2qti$ when $i \le qt$, and is trivial when $i > qt$.

For this last value to fall below $\epsilon$ we must have

$$\frac{q^2 t}{2(c+q)^2} \ge \lambda\left(e^{q/(c+q)^2} - 1\right) + \ln\frac{1}{\epsilon},$$

that is,

$$t \ge \frac{2(c+q)^2}{q^2}\left[\lambda\left(e^{q/(c+q)^2} - 1\right) + \ln\frac{1}{\epsilon}\right],$$

which completes the proof.

Proof of the theorem. We utilize our by now standard procedure of looking at how the size of $X_D$ changes with certain events, in this case the births and the deaths of points in $X_D$. Let $X_D^i$ be the set $X_D$ after $i$ such events have occurred, and let $\mathcal{F}_i$ denote the $\sigma$-algebra generated by all the configurations up to the time of the $i$th event. We wish to compute a bound on $E[n(X_D^{i+1}) \mid \mathcal{F}_i]$. The rate of births is $\lambda$ and the rate of deaths of unknown points is $n(X_D^i)$. From the fact about competing exponentials we know that the probability that one of our events is a birth is $\lambda / (\lambda + n(X_D^i))$, and the probability that an unknown point dies is $n(X_D^i) / (\lambda + n(X_D^i))$. When a point in $X_D$ dies, $n(X_D)$ goes down by $1$. When a point is born, it only joins $X_D$ if it tries to be born in an area blocked by a point in $X_D$ alone. This occurs with probability at most $n(X_D^i)\, V_R$; this is an upper bound, since some of the unknown points' blocked areas might overlap. Therefore

$$
E[n(X_D^{i+1}) \mid \mathcal{F}_i]
 \le n(X_D^i) + \frac{\lambda}{\lambda + n(X_D^i)}\, n(X_D^i)\, V_R - \frac{n(X_D^i)}{\lambda + n(X_D^i)}
 = n(X_D^i) - \frac{n(X_D^i)\,(1 - \lambda V_R)}{\lambda + n(X_D^i)}.
$$

Therefore, when $n(X_D^i) \ge 1$,

$$E[n(X_D^{i+1}) - n(X_D^i) \mid \mathcal{F}_i] \le -\frac{1 - \lambda V_R}{1 + \lambda}.$$

We are given that $\lambda V_R < 1$; this means that $n(X_D^i)$ satisfies the conditions of the lemma with $c = 1$ and $q = (1 - \lambda V_R)/(1 + \lambda)$. Note that $0 < q < 1$, so after a number of these events of order $(1/q^2)(\lambda + \ln(1/\epsilon))$ we have $P(n(X_D) > 0) \le \epsilon$.

Finally, we note that the number of events where a point in $X_A$ died is bounded above by the number of births. Therefore, after $k$ events, at least $k/2$ events had to have been either births or deaths of points in $X_D$, which completes the proof.

The Swapping Continuous Hard Core Chain

Greenhill and Dyer increased the ability to analyze the rate of convergence of the discrete hard core chain by introducing a Broder type swapping move. We now show that introducing the same move into the continuous hard core chain yields a similar increase in our analytic ability. The move is as follows. If a point attempts to be born and is blocked by exactly one point, then the blocking point might be removed and the new point added. This swap is executed with probability $p_{\mathrm{swap}}$. Algorithmically, this new chain may be written as in the figure below.

Swapping continuous hard core chain
    Set $x \leftarrow X_t$
    Choose $dt$ exponential with rate $\lambda + n(x)$
    Choose $U$ uniformly from $[0, 1]$
    Case $U < \lambda / (\lambda + n(x))$:
        Choose $v$ uniformly at random from $S$
        If $\mathrm{dist}(v, x) > R$
            Set $x \leftarrow x \cup \{v\}$
        If there is a $z \in x$ such that $\mathrm{dist}(v, z) \le R$ and $\mathrm{dist}(v, x \setminus \{z\}) > R$
            With probability $p_{\mathrm{swap}}$ set $x \leftarrow (x \setminus \{z\}) \cup \{v\}$
    Case $U \ge \lambda / (\lambda + n(x))$:
        Choose $i$ uniformly from $\{1, \ldots, n(x)\}$
        Set $x \leftarrow x \setminus \{x_i\}$
    Set $t \leftarrow t + dt$
    Set $X_t \leftarrow x$

Figure: Swapping continuous hard core chain

This swap move only moves between states with the same stationary probability, and $W_{\mathrm{swap}}(x, dy) = W_{\mathrm{swap}}(y, dx)$ no matter what the probability with which we execute the swap; the value of $p_{\mathrm{swap}}$ is chosen to make the analysis as tight as possible. Hence $\pi$ is also stationary for this move, so by the theorem on adding rate kernels this new chain has the same stationary distribution as the old one. The new moves do not increase the number of points in the configuration; therefore the same argument that showed geometric ergodicity for the old chain applies here as well.

The bounding chain for this swap chain is analogous to that introduced for the Dyer-Greenhill chain. Deaths occur exactly as before: if a point in $x_D$ dies it is removed from $x_D$, and if a point in $x_A$ dies it is removed from $x_A$.

Suppose that a new point is born. If it is not blocked by any point, either in $x_A$ or $x_D$, then it is added to $x_A$. If the point is blocked by a point in $x_A$ and we choose not to swap, then the point is not added. If the point is blocked by two or more points in $x_A$, the point is not added. If the point is blocked by one or more points in $x_D$, no points in $x_A$, and we do not swap, we are unsure whether or not the point is added to the configuration, and so the point is added to $x_D$. All of these moves are the same as for the bounding chain for the Lotwick-Silverman chain.

The interesting cases come when the point added is a candidate for a swap and the algorithm chooses to execute the swap. Suppose the point is blocked by exactly one point in $x_A \cup x_D$. Then the blocking point is removed and the new point is added to $x_A$. If the point is blocked by exactly one point in $x_A$ and at least one point in $x_D$, then we are unsure about two points: the new point might be added, and the blocking point in $x_A$ might be removed. Therefore both of these points must be placed in $x_D$.

Theorem. Consider the hard core gas model with parameters $\lambda$ and $R$, and suppose that we run the bounding chain for the swapping continuous hard core chain forward from time $0$ for $k$ events. Then if $\lambda V_R < 2$, after

$$k = O\!\left(\frac{(1+\lambda)^2}{(2 - \lambda V_R)^2}\left(\lambda + \ln\frac{1}{\epsilon}\right)\right)$$

events, the probability that we have converged is

$$P(X_D = \emptyset) \ge 1 - \epsilon.$$

Swapping continuous hard core bounding chain
    Input: list of events, interval $[a, b]$, $\{z : b_z \le a \le d_z\}$,
        and the swap-execute coin flips for each birth
    Output: $X_b$, $X_A$, $X_D$
    Set $X_D \leftarrow \{z : b_z \le a \le d_z\}$, $X_A \leftarrow \emptyset$
    For each event $e$ in sorted time order
        If $e = d_z$ for some $z$
            If $z \in X_D$
                Set $X_D \leftarrow X_D \setminus \{z\}$
            If $z \in X_A$
                Set $X_A \leftarrow X_A \setminus \{z\}$
        If $e = b_z$ for some $z$
            Let $D$ be the number of points $y$ in $X_D$ such that $\mathrm{dist}(z, y) \le R$
            Let $A$ be the number of points $y$ in $X_A$ such that $\mathrm{dist}(z, y) \le R$
            If $D = 0$ and $A = 0$
                Set $X_A \leftarrow X_A \cup \{z\}$
            If $D \ge 1$, $A = 0$, and we do not execute a swap at this birth
                Set $X_D \leftarrow X_D \cup \{z\}$
            If $D = 1$, $A = 0$, and we execute the swap
                Set $X_D \leftarrow X_D \setminus \{y \in X_D : \mathrm{dist}(y, z) \le R\}$
                Set $X_A \leftarrow X_A \cup \{z\}$
            If $D \ge 2$, $A = 0$, and we execute the swap
                Set $X_D \leftarrow X_D \cup \{z\}$
            If $D \ge 1$, $A = 1$, and we execute the swap
                Set $X_D \leftarrow X_D \cup \{z\} \cup \{y \in X_A : \mathrm{dist}(y, z) \le R\}$
                Set $X_A \leftarrow X_A \setminus \{y \in X_A : \mathrm{dist}(y, z) \le R\}$

Figure: Swapping continuous hard core bounding chain

The computer time needed to generate a simulation is the number of events which occur, which is why we phrase this theorem using events rather than time.

Proof. The proof is very similar to the case of the nonswapping chain, and so we will utilize the same notation as in that proof. We again will only consider those events capable of changing $n(X_D^i)$, namely all births and deaths of points in $X_D^i$. These events occur at rate $r = \lambda + n(X_D^i)$.

The death rate contribution to $E[n(X_D^{i+1}) \mid \mathcal{F}_i]$ is the same as before; what changes is the contribution of the birth rate. Before, births could only increase $n(X_D^i)$, but with the introduction of the swap move a birth can also decrease the number of unknowns. Suppose that a point is blocked by a single point in $X_D^i$ and by nothing else. Then a birth with a swap at that point removes a point from $X_D^i$ (the new point is certainly in the configuration and the blocker is certainly not), so $n(X_D^i)$ decreases by $1$. If there is a birth and no swap, then the new point joins $X_D^i$ and $n(X_D^i)$ increases by $1$.

If, however, a new point is blocked by a point in $X_D^i$ and a point in $X_A^i$, then a swap results in two points being added to $X_D^i$. If the point chooses not to swap, then it cannot be born, since it is blocked by $X_A^i$.

Let $A_1$ denote the area blocked by exactly one point of $X_D^i$ and no points of $X_A^i$, let $A_2$ denote the area blocked by more than one point of $X_D^i$ and no points of $X_A^i$, and let $A_3$ denote the area blocked by at least one point of $X_D^i$ and exactly one point of $X_A^i$. A birth in $A_1$ changes $n(X_D^i)$ by $-1$ with probability $p_{\mathrm{swap}}$ and by $+1$ otherwise; a birth in $A_2$ changes it by at most $+1$; and a birth in $A_3$ changes it by $+2$ with probability $p_{\mathrm{swap}}$ and by $0$ otherwise. Hence

$$
E[n(X_D^{i+1}) - n(X_D^i) \mid \mathcal{F}_i]
 \le -\frac{n(X_D^i)}{r} + \frac{\lambda}{r}\Big[(1 - 2p_{\mathrm{swap}})A_1 + A_2 + 2p_{\mathrm{swap}}A_3\Big].
$$

Note that each point in $A_1$ and $A_3$ is blocked by at least one unknown point within distance $R$, and each point in $A_2$ is blocked by at least two unknown points. Hence $A_1 + 2A_2 + A_3 \le n(X_D^i)\, V_R$. Taking $p_{\mathrm{swap}} = 1/4$, so that both $1 - 2p_{\mathrm{swap}}$ and $2p_{\mathrm{swap}}$ equal $1/2$, we therefore have

$$
E[n(X_D^{i+1}) - n(X_D^i) \mid \mathcal{F}_i]
 \le \frac{\lambda\, n(X_D^i)\, V_R / 2 - n(X_D^i)}{\lambda + n(X_D^i)},
$$

and we may apply the lemma with $c = 1$ and $q = (1 - \lambda V_R/2)/(1 + \lambda)$. As in the nonswapping chain, at least half of the events must be births or deaths of points in $X_D$, completing the proof.

We have proven that this chain converges rapidly for values of $\lambda$ which are twice as high as for the nonswapping chain. This swap move may be added to other chains as well to increase their performance.

Widom-Rowlinson

The Widom-Rowlinson model was originally conceived as a continuous model, where the density of a configuration of points $x = (x_1, x_2, \ldots, x_Q)$ is

$$f(x_1, \ldots, x_Q) = \frac{\prod_{i=1}^{Q} \lambda_i^{n(x_i)}}{Z}$$

for configurations in which no two points of different colors lie within distance $R$ of each other (and $0$ otherwise). This is exactly analogous to the discrete case, and each of the chains we examined there has a corresponding continuous chain.

The birth death chain given for the discrete model actually comes from a discretization of the birth death chain for the continuous model. We say that a point $v$ and color $i$ are blocked if $\mathrm{dist}(v, x_j) \le R$ for some $j$ not equal to $i$.

Birth death continuous Widom-Rowlinson chain
    Set $x \leftarrow X_t$
    Choose $dt$ exponential with rate $\sum_i \lambda_i + n(x)$
    Choose $U$ uniformly from $[0, 1]$
    Set $p_0 \leftarrow n(x) / (\sum_i \lambda_i + n(x))$
    For all $i$ in $1, \ldots, Q$
        Set $p_i \leftarrow p_{i-1} + \lambda_i / (\sum_i \lambda_i + n(x))$
    Choose $i \in \{0, \ldots, Q\}$ according to $p$ (using $U$)
    If $i = 0$
        Choose $v$ uniformly from $x_1 \cup \cdots \cup x_Q$
        Let $j$ be the color such that $v \in x_j$
        Set $x_j \leftarrow x_j \setminus \{v\}$
    Else
        Choose $v$ uniformly at random from $S$
        If color $i$ is not blocked for the point $v$
            Set $x_i \leftarrow x_i \cup \{v\}$
    Set $t \leftarrow t + dt$
    Set $X_t \leftarrow x$

Figure: Birth death continuous Widom-Rowlinson chain

As pointed out in earlier work, this chain exhibits a monotonic structure when $Q = 2$. For $Q \ge 3$ we must use a bounding chain. As with the hard core case, a state $y$ of the bounding chain will be $(x_1, \ldots, x_Q, z_1, \ldots, z_Q)$, where each $x_i$ contains points which are known to be colored $i$, while $z_i$ refers to those points that might or might not be in the configuration; if they are in the configuration, though, they are definitely colored $i$. Let $n(x) = |x_1| + \cdots + |x_Q|$ and $n(z) = |z_1| + \cdots + |z_Q|$.

Consider a point $v$ and a color $i$. If $\mathrm{dist}(v, x_j) \le R$ for some $j \neq i$, then we say that $v$ is blocked for $i$ by a known node. If $\mathrm{dist}(v, z_j) \le R$ for some $j \neq i$, then we say that $v$ is blocked for $i$ by an unknown node.
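The blocking tests used by the bounding chain can be packaged as a small helper; the dictionary representation of the known and unknown point sets below is an assumption made only for illustration.

    import math

    def blocked_counts(v, i, known, unknown, R):
        """Count how many known and unknown points block the point v for color i.

        known and unknown are dicts mapping color -> list of points; a point
        of color j != i within distance R blocks v for color i.
        """
        by_known = sum(1 for j, pts in known.items() if j != i
                       for p in pts if math.dist(v, p) <= R)
        by_unknown = sum(1 for j, pts in unknown.items() if j != i
                         for p in pts if math.dist(v, p) <= R)
        return by_known, by_unknown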

Birth death continuous Widom-Rowlinson bounding chain
    Set $y \leftarrow Y_t$, where $y = (x_1, \ldots, x_Q, z_1, \ldots, z_Q)$
    Choose $dt$ exponential with rate $\sum_i \lambda_i + n(x) + n(z)$
    Choose $U$ uniformly from $[0, 1]$
    Set $p_0 \leftarrow (n(x) + n(z)) / (\sum_i \lambda_i + n(x) + n(z))$
    For all $i$ in $1, \ldots, Q$
        Set $p_i \leftarrow p_{i-1} + \lambda_i / (\sum_i \lambda_i + n(x) + n(z))$
    Choose $i \in \{0, \ldots, Q\}$ according to $p$ (using $U$)
    Case $i = 0$:
        Choose $v$ uniformly from $x_1 \cup \cdots \cup x_Q \cup z_1 \cup \cdots \cup z_Q$
        Let $j$ be the color such that $v \in x_j \cup z_j$
        Set $x_j \leftarrow x_j \setminus \{v\}$
        Set $z_j \leftarrow z_j \setminus \{v\}$
    Case $i \ge 1$, $v$ chosen uniformly from $S$, $v$ not blocked for $i$:
        Set $x_i \leftarrow x_i \cup \{v\}$
    Case $i \ge 1$, $v$ chosen uniformly from $S$, $v$ blocked for $i$ only by unknown nodes:
        Set $z_i \leftarrow z_i \cup \{v\}$
    Set $t \leftarrow t + dt$
    Set $Y_t \leftarrow (x_1, \ldots, x_Q, z_1, \ldots, z_Q)$

Figure: Birth death continuous Widom-Rowlinson bounding chain

Theorem. Consider the Widom-Rowlinson model with parameters $\lambda_i$, $i$ running from $1$ to $Q$, and $R$, and suppose that we run the bounding chain for this simple birth death chain forward from time $0$ for $k$ events. For $Q = 2$ let

$$\gamma = \sqrt{\lambda_1 \lambda_2},$$

and for $Q \ge 3$ let

$$\gamma = \max_i \sum_{j \neq i} \lambda_j.$$

If $\gamma V_R < 1$, then after $O\!\left(\dfrac{\sum_i \lambda_i + \ln(1/\epsilon)}{1 - \gamma V_R}\right)$ events, the probability that we have converged is

$$P(n(z) = 0) \ge 1 - \epsilon.$$

Proof. We consider only those events capable of changing the size of $n(z)$, namely births of new points and deaths of points in $z$. Let $z_i$ denote the unknown set of points of color $i$ at the beginning of a bounding chain step and $z_i'$ the set afterwards. Let $\mathcal{F}$ denote the $\sigma$-algebra generated by events up to the time of $z$.

A bad event occurs when a birth of color $i$ is blocked by a point in $z_j$, where $j \neq i$. Given that we have a birth, a point in $z_j$ blocks a volume $V_R$, and so the total amount of volume blocked could be as high as $V_R\,(n(z) - |z_i|)$. Hence

$$
E[\,|z_i'| - |z_i| \mid \mathcal{F}\,]
 \le P(\text{birth of color } i)\, P(\text{birth blocked}) - P(\text{death of a point of } z_i)
 \le \frac{\lambda_i\, V_R\,(n(z) - |z_i|) - |z_i|}{\sum_j \lambda_j + n(z)}.
$$

And so, just as for the discrete chain, we have that $E[z'] = E[E[z' \mid z]] \le E[Az] = A\,E[z]$, where $z$ is viewed as the vector $(|z_1|, \ldots, |z_Q|)$ and $A$ is the nonnegative matrix expressing the drift bound above; since the elements of $A$ are all nonnegative, the inequality is preserved. Let $z^t$ denote this vector after $t$ events. An induction yields $E[z^t] \le A^t E[z^0]$, so that $\|E[z^t]\| \le \rho^t\, \|E[z^0]\|$, where $\rho$ is the highest magnitude eigenvalue of $A$.

As in the discrete case, the only remaining problem is how to bound $\rho$. The growth part of the drift is governed by the matrix with entries $\lambda_i V_R$ for $j \neq i$ and $0$ on the diagonal; call its largest eigenvalue $\rho_0$, so that $\rho < 1$ whenever $\rho_0 < 1$. When $Q = 2$ we perform the computation directly: $\rho_0 = \sqrt{\lambda_1 \lambda_2}\, V_R$. For $Q \ge 3$, the method of Gershgorin disks (applied to the transpose, which has the same spectrum) may be used to show that

$$\min_i \sum_{j \neq i} \lambda_j\, V_R \;\le\; \rho_0 \;\le\; \max_i \sum_{j \neq i} \lambda_j\, V_R.$$

Hence $\gamma V_R < 1$ guarantees that the expected number of unknown points decays geometrically in the number of events. To complete the proof, recall that $n(z^0)$ is Poisson distributed, so Markov's inequality applied to $E[n(z^t)]$ yields the stated probability bound.

Continuous swapping chain

The swap move is not difficult to describe in the continuous case, and it will double the range of $\lambda_i$ over which we can prove that the bounding chain detects complete coupling in polynomial time. When a point of color $i$ is blocked by any number of points of color $j$ different from $i$, in the birth death chain that point is not born. In a swap move, if a point of color $i$ is blocked by exactly one point of color $j \neq i$, then with probability $p_{\mathrm{swap}}$ a swap move is executed: the point is born and the blocking point is removed from the set.

Swapping birth death continuous Widom-Rowlinson chain
    Set $x \leftarrow X_t$
    Choose $dt$ exponential with rate $\sum_i \lambda_i + n(x)$
    Choose $U$ uniformly from $[0, 1]$
    Set $p_0 \leftarrow n(x) / (\sum_i \lambda_i + n(x))$
    For all $i$ in $1, \ldots, Q$
        Set $p_i \leftarrow p_{i-1} + \lambda_i / (\sum_i \lambda_i + n(x))$
    Choose $i \in \{0, \ldots, Q\}$ according to $p$ (using $U$)
    Case $i = 0$:
        Choose $v$ uniformly from $x_1 \cup \cdots \cup x_Q$
        Let $j$ be the color such that $v \in x_j$
        Set $x_j \leftarrow x_j \setminus \{v\}$
    Case $i \ge 1$, $v$ chosen uniformly from $S$, $i$ is not blocked for the point $v$:
        Set $x_i \leftarrow x_i \cup \{v\}$
    Case $i \ge 1$, $v$ chosen uniformly from $S$, $i$ is blocked for $v$ by exactly one point $w \in x_j$:
        With probability $p_{\mathrm{swap}}$
            Set $x_j \leftarrow x_j \setminus \{w\}$
            Set $x_i \leftarrow x_i \cup \{v\}$
    Set $t \leftarrow t + dt$
    Set $X_t \leftarrow x$

Figure: Swapping birth death continuous Widom-Rowlinson chain

The bounding chain must now take into account the possibility of a swap type move. We have the same cases that arise in the discrete case.

As in all cases dealing with the swapping move, changing the value of $p_{\mathrm{swap}}$ affects how quickly the bounding chain actually converges. When $p_{\mathrm{swap}}$ is chosen appropriately, good things happen. The only difference between the theorem below and the theorem for the nonswapping chain is that here the $\lambda_i$ may be larger by a factor of $2$.

Theorem. Consider the Widom-Rowlinson model with parameters $\lambda_i$, $i$ running from $1$ to $Q$, and $R$, and suppose that we run the bounding chain for this swapping birth death chain forward from time $0$ for $k$ events. For $Q \ge 3$ let

$$\gamma = \max_i \sum_{j \neq i} \lambda_j.$$

If $p_{\mathrm{swap}} = 1/4$ and $\gamma V_R < 2$, then after $O\!\left(\dfrac{\sum_i \lambda_i + \ln(1/\epsilon)}{2 - \gamma V_R}\right)$ events, the probability that we have converged is

$$P(n(z) = 0) \ge 1 - \epsilon.$$

For $Q = 2$, either let $p_{\mathrm{swap}} = 1/4$ and

$$\gamma = \max_i \lambda_i$$

as above, or let $p_{\mathrm{swap}} = 1/2$ and

$$\gamma = \sqrt{\lambda_1 \lambda_2}.$$

Either way, if $\gamma V_R < 2$, the same probability of convergence holds.

R

Pro of Let jD j be the set of unknown points at time t Then these points cover

t

area in three ways These are similar to the three classications of no des seen in

the discrete case To change the value of jD j either an unknown point died or

t

a point b eing born resulted in one or two no des b eing switched to unknown The

P

rate at which points are born or die from the unknown set is r jD j

t i

i

Let D denote the area covered by exactly one unknown point of some color

When a pointis b orn in D we are in case inthe swapping b ounding chain Let

D denote the area covered by exactly one known point of color i and at least one

i

unknown p oint of color i this corresp onds to case Finallylet D denote the area

i

covered by exactly at least two unknown points of color i and no known p oints so

that when a p ointisbornhere we are in case

P

Supp ose that Q so that min Case can havetwo outcomes

i i i

i

If we cho ose to swap then D is smaller than D by one no de but if we do not

t t

cho ose to swap it grows by one For a p oint v D letb b e the color of the p oint

v

i such that v is b eing blo cked by a point in z The point is that births of color b

i v

are not blo cked and do not aect jD j if the birthing p oint has color i but might

t

change if the new p oint has color j i This event o ccurs with probability r

b

v

Given that we have abirth event but that we have not yet selected a p osition the

exp ected change due to p oints in D will b e

Z Z

b b b

v v v

p p dv dv p

sw ap sw ap sw ap

r r r

v D v D

Z

p dv

sw ap

r

v D

D p

sw ap

r

where the inqualityisvalid as long as p We will later set p so

sw ap sw ap

this is true

For case cho osing to swap makes jD j jD j but is unchanged if wedo

t t

not swap The contribution from this case is

Z

b

v

p D p

sw ap sw ap



r r

v D

In case the new no de is always part of D so jD jjD j Therefore

t t t

the contribution is indep endent of p

sw ap

Z

b

v

D

r r

v D

The nal event that may occur to aect jD j is the death of an unknown no de

t

which o ccurs with probability jD jr and results in the removal of exactly one

t

unknown no de We now set p so that

sw ap

jD j

t

E jD jjD jjF D p D p D

t t t sw ap sw ap

r r r r

h i

jD j

t

D D D

r r

Points in D and D are covered by at least one no de of D and points in D are

t

covered by at least twonodesofD soD D D V jD j The existence of

t R t

this b ound is why we cho ose p as we did

sw ap

t

Following our usual derivation E jD j E jD j where

t

V

R

r

We have that when V and since jD j is integral and jD j is p oisson

R t

P

with rate Markovs inequality nishes the pro of

i

i

For Q max so we may use that for However we may also go

i i

be the area blo cked into more detail splitting apart D into D and D Let D

for by exactly one unknown p ointofcolorand D be the area blo cked for by

exactly one unknown p oint of color these areas may overlap Split D and D

in a similar fashion

As in the regular birth death chain let z denote the number of unknowns of

i

color i f g and z denote the numb er of unknown p oints after the step is taken

i

Then from our discussion ab ove we know that

h i

jD j

t

E z z jF D p D p D p

sw ap sw ap sw ap

r r r

h i

D D

r

z V

R

r

using the fact that D D z V An analogous equation may b e derived for the

R

exp ected change in z

This is the same recurrence that we have seen for the Q case in the regular

birth death chain for b oth the continuous and discrete case The eigenvalues of

the system are smaller than in the previous case by a factor of giving the result

The moral of this chapter is everything that works in the discrete world carries

over to the continuous realm with no problems

Swapping birth death continuous Widom-Rowlinson bounding chain
    Set $y \leftarrow Y_t$, where $y = (x_1, \ldots, x_Q, z_1, \ldots, z_Q)$
    Choose $dt$ exponential with rate $\sum_i \lambda_i + n(x) + n(z)$
    Choose $U$ uniformly from $[0, 1]$
    Set $p_0 \leftarrow (n(x) + n(z)) / (\sum_i \lambda_i + n(x) + n(z))$
    For all $i$ in $1, \ldots, Q$
        Set $p_i \leftarrow p_{i-1} + \lambda_i / (\sum_i \lambda_i + n(x) + n(z))$
    Choose $i \in \{0, \ldots, Q\}$ according to $p$ (using $U$)
    Case $i = 0$:
        Choose $v$ uniformly from $x_1 \cup \cdots \cup x_Q \cup z_1 \cup \cdots \cup z_Q$
        Let $j$ be the color such that $v \in x_j \cup z_j$
        Set $x_j \leftarrow x_j \setminus \{v\}$
        Set $z_j \leftarrow z_j \setminus \{v\}$
    Case $i \ge 1$, $v$ chosen uniformly from $S$, $v$ not blocked for $i$:
        Set $x_i \leftarrow x_i \cup \{v\}$
    Case $i \ge 1$, $v$ blocked for $i$ by at least two unknown nodes and no known nodes:
        Set $z_i \leftarrow z_i \cup \{v\}$
    Case $i \ge 1$, $v$ blocked for $i$ by exactly one unknown node $w \in z_j$ and no known nodes:
        With probability $p_{\mathrm{swap}}$
            Set $z_j \leftarrow z_j \setminus \{w\}$
            Set $x_i \leftarrow x_i \cup \{v\}$
        Else
            Set $z_i \leftarrow z_i \cup \{v\}$
    Case $i \ge 1$, $v$ blocked for $i$ by exactly one known node $w \in x_j$
            and at least one unknown node:
        With probability $p_{\mathrm{swap}}$
            Set $x_j \leftarrow x_j \setminus \{w\}$
            Set $z_j \leftarrow z_j \cup \{w\}$
            Set $z_i \leftarrow z_i \cup \{v\}$
    Set $t \leftarrow t + dt$
    Set $Y_t \leftarrow (x_1, \ldots, x_Q, z_1, \ldots, z_Q)$

Figure: Swapping birth death continuous Widom-Rowlinson bounding chain

Chapter

Final Thoughts

I don't see much sense in that, said Rabbit. No, said Pooh humbly, there isn't. But there was going to be when I began it. It's just that something happened to it along the way.

A. A. Milne

The theoretical bounds on the mixing time derived using bounding chains are always weaker than what may be found experimentally. For example, in an earlier chapter we saw a bounding chain for the list update problem that is guaranteed to run quickly when the probabilities of selecting items are geometric, i.e., $p_{i+1} = \theta p_i$ for some sufficiently small $\theta$. When $\theta$ is larger, experiments must be run to see how quickly the bounding chain converges.

As shown in the figure below, this bounding chain appears to be polynomial for the smaller values of $\theta$ shown, but is clearly exponential for the largest. These running times are an average over many runs of the bounding chain.

Figure: $T_{BC}$ for the list update chain. Average running time (in millions of steps) plotted against $n$ for $\theta = 0.2, 0.3, 0.4, 0.5$.

Given that practice always wins over theory, there are of course several reasons why we are still interested in theoretical bounds on the time bounding chains need to detect complete coupling. First, the theoretical range of parameters over which the chain is rapidly mixing is a solid indication of the actual usefulness of the chain. As we have seen, the simple local analysis of bounding chains tends to give results that are within a constant factor of the true answer. This is a trivial statement for bounding chains such as the list update chain for geometric weights, where the parameter range seems independent of $n$. However, for chains such as the antiferromagnetic Potts model, where the range of parameters for which the chain is rapidly mixing depends quite strongly on the degree of the graph, this is a somewhat surprising fact.

Second, as an immediate corollary of the bounding chain mixing time, we get an immediate bound on the mixing time of the chain, and a priori bounds on the running time of CFTP and FMR for perfect sampling. It is always nice to know that certain problems, such as randomly sampling from the sink free orientations of a graph, have algorithms with polynomial expected running time.

Coupling from the past is a simple idea with extraordinary consequences for the practice of Monte Carlo Markov chain methods. However, using CFTP is not always easy. Bounding chains are an important tool for using perfect sampling algorithms such as CFTP and FMR, or even just for determining the mixing time of a chain. The fact that they lead to relatively straightforward local analyses of the original chains is an added perk that makes this method surprisingly powerful.

Bibliography

D. Aldous. Random walks on finite groups and rapidly mixing Markov chains. In A. Dold and B. Eckmann, editors, Seminaire de Probabilites XVII, Springer-Verlag Lecture Notes in Mathematics. Springer-Verlag, New York.

D. Aldous and J. A. Fill. Reversible Markov Chains and Random Walks on Graphs. Monograph in preparation.

D. J. Aldous and P. Diaconis. Strong uniform times and finite random walks. Adv. in Appl. Math.

K. Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal.

Petr Beckman. A History of Pi. St. Martin's Press.

Russ Bubley and Martin Dyer. Graph orientations with no sink and an approximation for a hard case of #SAT. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms.

P. J. Burville and J. F. C. Kingman. On a model for storage and search. Journal of Applied Probability.

Colin Cooper and Alan Frieze. Mixing properties of the Swendsen-Wang process on classes of graphs. Preprint.

Luc Devroye. Branching Processes and Their Applications in the Analysis of Tree Structures and Tree Algorithms. Springer.

W. Doeblin. Expose de la theorie des chaines simples constantes de Markov a un nombre fini d'etats. Rev. Math. de l'Union Interbalkanique.

Martin Dyer, Alan Frieze, and Mark Jerrum. On counting independent sets in sparse graphs. Technical Report ECS-LFCS, University of Edinburgh.

Martin Dyer and Catherine Greenhill. On Markov chains for independent sets. Preprint.

Martin Dyer and Catherine Greenhill. A genuinely polynomial-time algorithm for sampling two-rowed contingency tables. In International Colloquium on Automata, Languages and Programming. EATCS.

James A. Fill. An interruptible algorithm for perfect sampling via Markov chains. Annals of Applied Probability.

V. Gore and M. Jerrum. The Swendsen-Wang process does not always mix rapidly. In Proceedings of the Annual ACM Symposium on Theory of Computing.

O. Haggstrom, M. N. M. van Lieshout, and J. Moller. Characterisation results and Markov chain Monte Carlo algorithms including exact simulation for some spatial point processes. Bernoulli, to appear.

Olle Haggstrom and Karin Nelander. Exact sampling from anti-monotone systems. Preprint.

Olle Haggstrom and Karin Nelander. On exact simulation from Markov random fields using coupling from the past. Scand. J. Statist., to appear.

W. J. Hendricks. The stationary distribution of an interesting Markov chain. Journal of Applied Probability.

R. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press.

Mark L. Huber. Exact sampling and approximate counting techniques. In Proceedings of the Annual ACM Symposium on the Theory of Computing.

Mark L. Huber. Exact sampling for the Widom-Rowlinson mixture model. Preprint.

Mark L. Huber. Exact sampling using Swendsen-Wang. In Proceedings of the Symposium on Discrete Algorithms, to appear.

Mark L. Huber. Perfect sampling from independent sets. Preprint.

M. Jerrum. A very simple algorithm for estimating the number of k-colourings of a low-degree graph. SIAM Journal on Computing.

M. Jerrum, L. Valiant, and V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science.

Mark Jerrum and Alistair Sinclair. Polynomial-time approximation algorithms for the Ising model. In Proceedings of ICALP. EATCS.

V. E. Johnson. Studying convergence of Markov chain Monte Carlo algorithms using coupled sample paths. Journal of the American Statistical Association.

Wilfrid S. Kendall. Perfect simulation for the area-interaction point process. In Proceedings of the Symposium on Probability Towards the Year 2000.

D. E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching. Addison-Wesley.

J. L. Lebowitz and G. Gallavotti. Phase transitions in binary lattice gases. Journal of Mathematical Physics.

H. W. Lotwick and B. W. Silverman. Convergence of spatial birth-and-death processes. Math. Proc. Cambridge Philos. Soc.

Michael Luby and Eric Vigoda. Approximately counting up to four (extended abstract). In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing.

J. McCabe. On serial files with relocatable records. Operations Research.

N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equation of state calculation by fast computing machines. Journal of Chemical Physics.

J. Moller. On the rate of convergence of spatial birth-death processes. Ann. Inst. Statist. Math.

Duncan J. Murdoch and Jeffrey S. Rosenthal. An extension of Fill's exact sampling algorithm to non-monotone chains. Preprint.

Sidney C. Port. Theoretical Probability for Applications. Wiley-Interscience.

R. B. Potts. Some generalised order-disorder transformations. Proceedings of the Cambridge Philosophical Society.

C. J. Preston. Spatial birth-and-death processes. Bull. Inst. Internat. Statist.

James G. Propp and David B. Wilson. Exact sampling with coupled Markov chains and applications to statistical mechanics. Random Structures and Algorithms.

Dana Randall and David Wilson. Sampling spin configurations of an Ising system. In Tenth Annual ACM-SIAM Symposium on Discrete Algorithms.

B. D. Ripley. Modelling spatial patterns (with discussion). Journal of the Royal Statistical Society, Series B.

B. D. Ripley. Simulating spatial patterns: dependent samples from a multivariate density. Applied Statistics.

Ronald Rivest. On self-organizing sequential search heuristics. Communications of the ACM.

J. Salas and A. D. Sokal. Absence of phase transition for antiferromagnetic Potts models via the Dobrushin uniqueness theorem. Journal of Statistical Physics.

Daniel D. Sleator and Robert E. Tarjan. Amortized efficiency of list update and paging rules. Communications of the ACM.

R. Swendsen and J.-S. Wang. Nonuniversal critical dynamics in Monte Carlo simulation. Physical Review Letters.

B. Widom and J. S. Rowlinson. New model for the study of liquid-vapor phase transition. Journal of Chemical Physics.

Thomas Yan. A rigorous analysis of the Creutz demon algorithm for the microcanonical one-dimensional Ising model. PhD Thesis.