On Sampling with Markov Chains

On sampling with Markov chains F. R. K. Chung University of Pennsylvania Philadelphia, PA 19104 R. L. Graham AT&T Bell Laboratories Murray Hill, NJ 07974 S.-T. Yau Harvard University Cambridge, MA 02138 1. Introduction There are many situations in which one would like to select a random element (nearly) uniformly from some large set. One method for doing this which has received much attention recently is the following. Suppose we can define a Markov chain M on X which has the uniform distribution as its stationary distribution. Then starting at some (arbitrary) initial (t) point x0, and applying M for sufficiently many (say t) steps, the resulting point M (x0) will be (almost) uniformly distributed over X, provided t is large enough. In order for this approach to be effective, however, M should be “rapidly mixing”, i.e., M (t) should be very close to the uniform distribution U (in some appropriate sense) in polynomially many steps t, measured by the size of X. There are in use a variety of methods for estimating the “mixing rate” of M, i.e., the rate of convergence of M to its stationary distribution, e.g., coupling, path-counting, strong stationary time, eigenvalue estimates and even Stein’s method (cf. [A87], [SJ89], [S93], [DFK89], [FKP94]). However, for many problems of current interest such as volume estimation, ap- proximate enumeration of linear extensions of a partial order and sampling contingency tables, eigenvalue methods have not been effective because of the difficulty in obtaining good (or any!) bounds on the dominant eigenvalue of the associated process M. In this paper we remedy this problem to some extent by applying new eigenvalue bounds of two of the authors [CY1], [CY2] for random walks on what we call “convex subgraphs” of homogeneous graphs (to be defined in later sections). A principal feature of these techniques is the ability to obtain eigenvalue estimates in situations in which there is a nonempty boundary obstructing the walk, typically a more difficult situation then the boundaryless case. In particular, we will give the first proof of rapid convergence for the “natural” walk on contingency tables, as well as generalizations to restricted contingency tables, symmetric tables, compositions of an integer, and so-called knapsack solutions. We point out that these methods can also be applied to other problems, such as selecting random points in a given polytope (which is a crucial component of many recent volume approximation algorithms for polytopes). Also, the same ideas can be carried out in the context of weighted graphs and biased random walks (see [CY1] for the basic statements). However, we have deliberately restricted ourselves to some of the simpler applications for ease of exposition. Before proceeding to our main results, it will be necessary to introduce a certain amount of background material. This we do in the next section. 2. Background We begin by recalling a variety of concepts from graph theory and differential geometry. Any undefined terms can be found in [SY94] or [BM76]. For a graph G = (V, E) with vertex set V and edge set E, we let dv denote the degree of the vertex u V , i.e., the number of w V such that u, w E is an edge (which we also ∈ ∈ { } ∈ indicate by writing v w). For the most part, our graphs will be undirected, and without ∼ loops or multiple edges. Define the square matrix L = LG, with rows and columns indexed by V by: du if u = v, (1) L(u, v)= 1 if u v, − ∼ 0 otherwise . Let T denote the diagonal matrix with (v, v) entry T (v, v) = d , v V (and, of course, 0 v ∈ otherwise). A key concept for us will be the following operator: The Laplacian = for G is defined L LG by 1/2 1/2 (2) := T − LT − . L Thus, can be represented as a V by V matrix given by L | | | | 1 if u = v, (3) (u, v)= 1 if u v, L − √dudv ∼ 0 otherwise . 2 Considering as an operator acting on f : V C we have: L { → } 1 f(v) f(u) (4) f(v)= . L √d √d − √d v u∈V v u Xu∼v In the case that G is d-regular, i.e., d = d for all v V , (3) and (4) take the more familiar v ∈ forms 1 (3 ) = I A ′ L − d where I is the identity matrix and A = AG is the adjacency matrix for G, and 1 (4′) f(v)= (f(v) f(u)) . L d u − Xu∼v Denote the eigenvalues of by L 0= λ0 λ1 λn 1 ≤ ≤···≤ − where we let n denote V for now. It is known that λ 1 unless G = K , the complete graph | | 1 ≤ n on n vertices, and λn 1 2 with equality if and only if G is bipartite. − ≤ The most important of these eigenvalues from our perspective will be λ1 which we call the Laplacian eigenvalue of G, and denote by λ = λ . Note that with 1 : V C denoting the G → constant 1 function, T 1/21 is an eigenfunction of with eigenvalue 0. It follows (see [CY1]) L that (f(v) f(u))2 λ := λ1 = inf − f T 1 d f(v)2 ⊥ u v v X∼ v (5) P (f(v) f(u))2 u v − = inf sup ∼ 2 . f c P dv(f(v) c) v − P For a (nonempty) set S V , consider the subgraph of G induced by S. This subgraph, which ⊂ we also denote by S, has edge set E(S)= u, v E(G) : u, v S . The edge boundary ∂S {{ }∈ ∈ } is defined by ∂S := u, v E(G) : u S, u V S . {{ }∈ ∈ ∈ \ } The vertex boundary δS is defined by δS := v V S : u, v E(G) for some v V . { ∈ \ { }∈ ∈ } 3 δS V S ∂S v \ S u Figure 1: Various boundaries of S Let E := E(S) ∂S. ′ ∪ We illustrate these definitions in Figure 1. The next key concept we will need is that of the Neumann eigenvalues for S on G. First, define (parallel to (5)) (f(x) f(y))2 x,y E′ − { }∈ λS = inf P 2 f dxf (x) dxf(x)=0 x S x∈S P∈ (6) P (f(x) f(y))2 x,y E′ − { }∈ = inf sup P 2 . f c dx(f(x) c) x S − P∈ In general, define (f(x) f(y))2 x,y E′ − (7) λS,i = inf sup { P}∈ ′ 2 f f Ci−1 dx(f(x) f ′(x)) ∈ x S − P∈ where C is the subspace spanned by the eigenfunctions φ achieving λ , 0 i k. In k i S,i ≤ ≤ particular, g, g S (8) λS = λS,1 = inf h L i g T 1/21 g, g S ⊥ h i where is the Laplacian of G and f1,f2 S denotes the inner product f1(x)f2(x). Note L h i x S ∈ that when S = V , then λS is just the usual Laplacian eigenvalue of G. ThePλS,i are eigenvalues of a matrix defined by the following steps. LS For X V , let L denote the submatrix of L restricted to rows and columns induced by ⊂ X elements of X. Define the matrix N with rows indexed by S δS and columns indexed by S ∪ 4 given by 1 if x = y, 0 if x S, x = y, (9) N(x,y)= 1 ∈ 6 d′ if x δS, x y S, x ∈ ∼ ∈ 0 otherwise where dx′ denotes the degree of x to points in S. Then the matrix 1/2 tr 1/2 (10) := T − N L NT − LS S has the λS,i as its eigenvalues (together with 0, of course), with λS = λS,1 being the least eigenvalue exceeding 0. The matrix is associated with the following random walk P on LS S G, which we call the Neumann walk on S. A particle at vertex u moves to each of the ⊂ neighbors v of u with equal probability 1/d except if v S (so that v δS). In this case, the u 6∈ ∈ 1 particle then moves to each of the d′ neighbors of v which are in S, with equal probability ′ . v dv Thus, P (u, x), the probability of moving from u to x is 1/d if u x 1 1 P (u, x)= u ∼ + . 0 otherwise ( ) du v∈δS dv′ Xv∼u It is easy to see that when G is regular, the stationary distribution of P is uniform for any choice of S connected. The rate of convergence of P to its stationary distribution is controlled by the Neumann Laplacian eigenvalue λS, which we eventually will bound from below. The final key concept we introduce here is the heat kernel Ht of S. To begin with, let us decompose in the usual way as a linear combination of its projections P onto the LS i corresponding eigenfunctions φ of : i LS (11) = λ P . LS S,i i Xi For t 0, the heat kernel H is defined to be the operator of f : S δS C given by ≥ i { ∪ → } λS,i Ht : = e− Pi Xi t (12) = e− LS t2 t3 = I t + 2 3 + − LS 2 LS − 6 LS · · · Thus, H0 = I. For f : S δS C, set ∪ → t S F = Htf = e− L f . 5 That is, F (t,x) = (Htf)(x)= Ht(x,y)f(y) . y S δS ∈X∪ Some useful facts concerning Ht and F are the following (see [CY1] for proofs): (i) F (0,x)= f(x) (ii) For x S δS, ∈ ∪ Ht(x,y) dy = dx y S δS ∈X∪ q p (iii) F satisfies the heat equation ∂F = F ∂t −LS (iv) For any x δS, F (t,x) = 0 ∈ LS (v) For all x,y S δ(S), H (x,y) 0 ∈ ∪ t ≥ The connection of the heat kernel to the Neumann eigenvalue λS of S is given by the following result: Theorem [Chung, Yau [CY1]].

On Sampling with Markov Chains

Randomized Algorithms

MA3K0 - High-Dimensional Probability

HIGHER MOMENTS of BANACH SPACE VALUED RANDOM VARIABLES 1. Introduction Let X Be a Random Variable with Values in a Banach Space

Control Policy with Autocorrelated Noise in Reinforcement Learning for Robotics

CONVERGENCE RATES of MARKOV CHAINS 1. Orientation 1

Multivariate Scale-Mixed Stable Distributions and Related Limit

Random Fields, Fall 2014 1

Random Polynomials Over Finite Fields

University of Groningen Panel Data Models Extended to Spatial Error

Random Variable

Iterated Random Functions∗

Automorphisms of Random Trees