
On sampling with Markov chains

F. R. K. Chung University of Pennsylvania Philadelphia, PA 19104

R. L. Graham AT&T Bell Laboratories Murray Hill, NJ 07974

S.-T. Yau Harvard University Cambridge, MA 02138

1. Introduction

There are many situations in which one would like to select a random element (nearly) uniformly from some large set X. One method for doing this which has received much attention recently is the following. Suppose we can define a Markov chain M on X which has the uniform distribution as its stationary distribution. Then starting at some (arbitrary) initial point x_0, and applying M for sufficiently many (say t) steps, the resulting point M^{(t)}(x_0) will be (almost) uniformly distributed over X, provided t is large enough. In order for this approach to be effective, however, M should be "rapidly mixing", i.e., M^{(t)} should be very close to the uniform distribution U (in some appropriate sense) after polynomially many steps t, measured in terms of the size of X. A variety of methods are in use for estimating the "mixing rate" of M, i.e., the rate of convergence of M to its stationary distribution, e.g., coupling, path-counting, strong stationary times, eigenvalue estimates and even Stein's method (cf. [A87], [SJ89], [S93], [DFK89], [FKP94]). However, for many problems of current interest, such as volume estimation, approximate enumeration of linear extensions of a partial order and sampling contingency tables, eigenvalue methods have not been effective because of the difficulty in obtaining good (or any!) bounds on the dominant eigenvalue of the associated process M. In this paper we remedy this problem to some extent by applying new eigenvalue bounds of two of the authors [CY1], [CY2] for random walks on what we call "convex subgraphs" of homogeneous graphs (to be defined in later sections). A principal feature of these techniques is the ability to obtain eigenvalue estimates in situations in which there is a nonempty boundary obstructing the walk, typically a more difficult situation than the boundaryless case.
In particular, we will give the first proof of rapid convergence for the “natural” walk on contingency tables, as well as generalizations to restricted contingency tables, symmetric tables, compositions of an integer, and so-called knapsack solutions. We point out that these methods can also be applied to other problems, such as selecting random points in a given polytope (which is a crucial component of many recent volume approximation algorithms for polytopes). Also, the same ideas can be carried out in the context of weighted graphs and biased random walks (see [CY1] for the basic statements). However, we have deliberately restricted ourselves to some of the simpler applications for ease of exposition. Before proceeding to our main results, it will be necessary to introduce a certain amount of background material. This we do in the next section.

2. Background

We begin by recalling a variety of concepts from graph theory and differential geometry. Any undefined terms can be found in [SY94] or [BM76].

For a graph G = (V, E) with vertex set V and edge set E, we let d_v denote the degree of the vertex v \in V, i.e., the number of u \in V such that \{u, v\} \in E is an edge (which we also indicate by writing u \sim v). For the most part, our graphs will be undirected, and without loops or multiple edges. Define the matrix L = L_G, with rows and columns indexed by V, by:

(1)   L(u,v) = \begin{cases} d_u & \text{if } u = v,\\ -1 & \text{if } u \sim v,\\ 0 & \text{otherwise.} \end{cases}

Let T denote the diagonal matrix with (v,v) entry T(v,v) = d_v, v \in V (and, of course, 0 otherwise). A key concept for us will be the following operator: the Laplacian \mathcal{L} = \mathcal{L}_G for G is defined by

(2)   \mathcal{L} := T^{-1/2} L T^{-1/2}.

Thus, \mathcal{L} can be represented as a |V| \times |V| matrix given by

(3)   \mathcal{L}(u,v) = \begin{cases} 1 & \text{if } u = v,\\ -\frac{1}{\sqrt{d_u d_v}} & \text{if } u \sim v,\\ 0 & \text{otherwise.} \end{cases}

Considering \mathcal{L} as an operator acting on \{f : V \to \mathbb{C}\}, we have:

(4)   \mathcal{L}f(v) = \frac{1}{\sqrt{d_v}} \sum_{u \sim v} \left( \frac{f(v)}{\sqrt{d_v}} - \frac{f(u)}{\sqrt{d_u}} \right).
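As a concrete numerical check (ours, not part of the original paper), the matrix form (3) and the operator form (4) of the Laplacian can be verified to agree; the 3-vertex path used below is an arbitrary example.

```python
import math

# A small illustrative graph (a path on 3 vertices), chosen only to
# check definitions (1)-(4) numerically.
adj = {0: [1], 1: [0, 2], 2: [1]}
V = sorted(adj)
d = {v: len(adj[v]) for v in V}

# Matrix form (3): L(u,v) = 1 if u = v, -1/sqrt(d_u d_v) if u ~ v, else 0.
L = [[(1.0 if u == v else
       (-1.0 / math.sqrt(d[u] * d[v]) if v in adj[u] else 0.0))
      for v in V] for u in V]

def apply_L(f):
    """Operator form (4): (Lf)(v) = (1/sqrt(d_v)) sum_{u~v} (f(v)/sqrt(d_v) - f(u)/sqrt(d_u))."""
    return [sum(f[v] / math.sqrt(d[v]) - f[u] / math.sqrt(d[u])
                for u in adj[v]) / math.sqrt(d[v]) for v in V]

f = [0.3, -1.2, 2.5]                      # arbitrary test function
via_matrix = [sum(L[v][u] * f[u] for u in V) for v in V]
via_operator = apply_L(f)
assert all(abs(a - b) < 1e-12 for a, b in zip(via_matrix, via_operator))

# T^{1/2} 1 is an eigenfunction with eigenvalue 0.
g = [math.sqrt(d[v]) for v in V]
assert all(abs(sum(L[v][u] * g[u] for u in V)) < 1e-12 for v in V)
```

The last assertion confirms the observation below that T^{1/2}\mathbf{1} lies in the kernel of \mathcal{L}.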

In the case that G is d-regular, i.e., d_v = d for all v \in V, (3) and (4) take the more familiar forms

(3')   \mathcal{L} = I - \frac{1}{d} A,

where I is the identity matrix and A = A_G is the adjacency matrix for G, and

(4')   \mathcal{L}f(v) = \frac{1}{d} \sum_{u \sim v} (f(v) - f(u)).

Denote the eigenvalues of \mathcal{L} by

0 = \lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1},

where we let n denote |V| for now. It is known that \lambda_1 \le 1 unless G = K_n, the complete graph on n vertices, and \lambda_{n-1} \le 2 with equality if and only if G is bipartite. The most important of these eigenvalues from our perspective will be \lambda_1, which we call the Laplacian eigenvalue of G, and denote by \lambda = \lambda_G. Note that with \mathbf{1} : V \to \mathbb{C} denoting the constant function 1, T^{1/2}\mathbf{1} is an eigenfunction of \mathcal{L} with eigenvalue 0. It follows (see [CY1]) that

(5)   \lambda := \lambda_1 = \inf_{f \perp T\mathbf{1}} \frac{\sum_{u \sim v} (f(v) - f(u))^2}{\sum_v d_v f(v)^2} = \inf_f \sup_c \frac{\sum_{u \sim v} (f(v) - f(u))^2}{\sum_v d_v (f(v) - c)^2}.

For a (nonempty) set S \subset V, consider the subgraph of G induced by S. This subgraph, which we also denote by S, has edge set E(S) = \{\{u,v\} \in E(G) : u, v \in S\}. The edge boundary \partial S is defined by

\partial S := \{\{u,v\} \in E(G) : u \in S,\ v \in V \setminus S\}.

The vertex boundary \delta S is defined by

\delta S := \{v \in V \setminus S : \{u,v\} \in E(G) \text{ for some } u \in S\}.

Figure 1: Various boundaries of S

Let E' := E(S) \cup \partial S. We illustrate these definitions in Figure 1. The next key concept we will need is that of the Neumann eigenvalues for S on G. First, define (parallel to (5))

(6)   \lambda_S = \inf_{\substack{f \\ \sum_{x \in S} d_x f(x) = 0}} \frac{\sum_{\{x,y\} \in E'} (f(x) - f(y))^2}{\sum_{x \in S} d_x f^2(x)} = \inf_f \sup_c \frac{\sum_{\{x,y\} \in E'} (f(x) - f(y))^2}{\sum_{x \in S} d_x (f(x) - c)^2}.

In general, define

(7)   \lambda_{S,i} = \inf_f \sup_{f' \in C_{i-1}} \frac{\sum_{\{x,y\} \in E'} (f(x) - f(y))^2}{\sum_{x \in S} d_x (f(x) - f'(x))^2},

where C_k is the subspace spanned by the eigenfunctions \phi_i achieving \lambda_{S,i}, 0 \le i \le k. In particular,

(8)   \lambda_S = \lambda_{S,1} = \inf_{g \perp T^{1/2}\mathbf{1}} \frac{\langle g, \mathcal{L} g \rangle_S}{\langle g, g \rangle_S},

where \mathcal{L} is the Laplacian of G and \langle f_1, f_2 \rangle_S denotes the inner product \sum_{x \in S} f_1(x) f_2(x). Note that when S = V, then \lambda_S is just the usual Laplacian eigenvalue of G. The \lambda_{S,i} are eigenvalues of a matrix \mathcal{L}_S defined by the following steps. For X \subset V, let L_X denote the submatrix of L restricted to rows and columns indexed by elements of X. Define the matrix N, with rows indexed by S \cup \delta S and columns indexed by S,

given by

(9)   N(x,y) = \begin{cases} 1 & \text{if } x = y,\\ 0 & \text{if } x \in S,\ x \ne y,\\ \frac{1}{d'_x} & \text{if } x \in \delta S,\ x \sim y \in S,\\ 0 & \text{otherwise,} \end{cases}

where d'_x denotes the degree of x to points in S. Then the matrix

(10)   \mathcal{L}_S := T^{-1/2} N^{\mathrm{tr}} L_{S \cup \delta S}\, N\, T^{-1/2}

has the \lambda_{S,i} as its eigenvalues (together with 0, of course), with \lambda_S = \lambda_{S,1} being the least eigenvalue exceeding 0. The matrix \mathcal{L}_S is associated with the following random walk P on G, which we call the Neumann walk on S \subset G. A particle at vertex u moves to each of the neighbors v of u with equal probability 1/d_u, except if v \notin S (so that v \in \delta S). In this case, the particle then moves to each of the d'_v neighbors of v which are in S, with equal probability 1/d'_v. Thus, P(u,x), the probability of moving from u to x, is

P(u,x) = \begin{cases} 1/d_u & \text{if } u \sim x\\ 0 & \text{otherwise} \end{cases} \;+\; \frac{1}{d_u} \sum_{\substack{v \in \delta S\\ v \sim u,\ v \sim x}} \frac{1}{d'_v}.

It is easy to see that when G is regular, the stationary distribution of P is uniform for any connected choice of S. The rate of convergence of P to its stationary distribution is controlled

by the Neumann Laplacian eigenvalue λS, which we eventually will bound from below.
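The walk just described can be written down directly; the sketch below (ours, with an arbitrarily chosen regular graph and subset) builds the transition matrix and checks that for a regular graph it is doubly stochastic, so the uniform distribution on S is stationary.

```python
from fractions import Fraction

# Illustrative example: G = 6-cycle (2-regular), S = {0,1,2,3}, deltaS = {4,5}.
n = 6
adj = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
S = [0, 1, 2, 3]

def step_probs(u):
    """P(u, .): from u, go to each neighbor v with prob 1/d_u; if v lies
    outside S (so v is in deltaS), immediately redistribute that mass to
    the d'_v neighbors of v inside S."""
    P = {x: Fraction(0) for x in S}
    du = len(adj[u])
    for v in adj[u]:
        if v in S:
            P[v] += Fraction(1, du)
        else:                                   # v in deltaS
            inside = [w for w in adj[v] if w in S]   # d'_v neighbors in S
            for w in inside:
                P[w] += Fraction(1, du) * Fraction(1, len(inside))
    return P

P = {u: step_probs(u) for u in S}
assert all(sum(P[u].values()) == 1 for u in S)        # rows sum to 1
assert all(sum(P[u][x] for u in S) == 1 for x in S)   # columns too: uniform is stationary
```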

The final key concept we introduce here is the heat kernel H_t of S. To begin with, let us decompose \mathcal{L}_S in the usual way as a linear combination of its projections P_i onto the corresponding eigenfunctions \phi_i of \mathcal{L}_S:

(11)   \mathcal{L}_S = \sum_i \lambda_{S,i} P_i.

For t \ge 0, the heat kernel H_t is defined to be the operator on \{f : S \cup \delta S \to \mathbb{C}\} given by

(12)   H_t := \sum_i e^{-\lambda_{S,i} t} P_i = e^{-t \mathcal{L}_S} = I - t \mathcal{L}_S + \frac{t^2}{2} \mathcal{L}_S^2 - \frac{t^3}{6} \mathcal{L}_S^3 + \cdots

Thus, H_0 = I. For f : S \cup \delta S \to \mathbb{C}, set

F = H_t f = e^{-t \mathcal{L}_S} f.

That is,

F(t,x) = (H_t f)(x) = \sum_{y \in S \cup \delta S} H_t(x,y) f(y).

Some useful facts concerning H_t and F are the following (see [CY1] for proofs):

(i) F(0,x) = f(x).

(ii) For x \in S \cup \delta S,   \sum_{y \in S \cup \delta S} H_t(x,y) \sqrt{d_y} = \sqrt{d_x}.

(iii) F satisfies the heat equation \frac{\partial F}{\partial t} = -\mathcal{L}_S F.

(iv) For any x \in \delta S,   \mathcal{L}_S F(t,x) = 0.

(v) For all x, y \in S \cup \delta S,   H_t(x,y) \ge 0.
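Facts (i), (ii) and (v) can be sanity-checked numerically. The sketch below (ours) takes the simplest boundaryless case S = V, so that \mathcal{L}_S is just \mathcal{L}, computes H_t by the series (12), and checks the three facts; the graph and the value of t are arbitrary.

```python
import math

# Arbitrary small graph; S = V, so L_S = L and deltaS is empty.
adj = {0: [1], 1: [0, 2], 2: [1]}
V = sorted(adj); d = {v: len(adj[v]) for v in V}; n = len(V)
L = [[(1.0 if u == v else (-1.0 / math.sqrt(d[u] * d[v]) if v in adj[u] else 0.0))
      for v in V] for u in V]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def heat_kernel(t, terms=60):
    """H_t = sum_k (-t L)^k / k!, the truncated series (12)."""
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in H]
    for k in range(1, terms):
        term = mat_mul(term, L)
        term = [[-t / k * x for x in row] for row in term]
        H = [[H[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return H

H = heat_kernel(0.7)
# (ii): sum_y H_t(x,y) sqrt(d_y) = sqrt(d_x)
for x in V:
    assert abs(sum(H[x][y] * math.sqrt(d[y]) for y in V) - math.sqrt(d[x])) < 1e-9
# (v): H_t(x,y) >= 0
assert all(H[x][y] > -1e-12 for x in V for y in V)
# (i): H_0 = I
H0 = heat_kernel(0.0)
assert all(abs(H0[i][j] - (i == j)) < 1e-12 for i in range(n) for j in range(n))
```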

The connection of the heat kernel to the Neumann eigenvalue λS of S is given by the following result:

Theorem [Chung, Yau [CY1]]. For all t> 0,

(13)   \lambda_S \ge \frac{1}{2t} \inf_{y \in S} \sum_{x \in S} H_t(x,y)\, \frac{\sqrt{d_x}}{\sqrt{d_y}}.

This inequality will be an essential ingredient in eventually obtaining lower bounds for \lambda_S (and upper bounds on the mixing rates) for the various Markov chains we consider.

3. A direct application of the heat kernel inequality

The main use of the heat kernel inequality (13) in lower bounding \lambda_S will depend on controlling the behavior of H_t by connecting it to an associated continuous heat kernel h_t for an appropriate Riemannian manifold M = M_S containing the points of S \cup \delta S. This we do in Section 4.

However, there are situations in which we can get bounds on \lambda_S directly from (13) without going through this process. We describe several of these in this section. Let us consider the special case for which S = V (so that \mathcal{L}_S = \mathcal{L}) and, further, suppose the graph G has a "covering" vertex x_0, with the property that x_0 is adjacent to every y \in V \setminus \{x_0\} (so that d_{x_0} = n - 1, where n := |V|). We will apply (13) with t \to 0.

By (12),

(14)   H_t = I - t\mathcal{L} + O(t^2), \qquad t \to 0.

In particular, for each y \ne x_0 (so that y \sim x_0),

(15)   H_t(x_0, y) = \frac{t}{\sqrt{d_{x_0} d_y}} + O(t^2), \qquad\text{so that}\qquad H_t(x_0,y)\,\frac{\sqrt{d_{x_0}}}{\sqrt{d_y}} = \frac{t}{d_y} + O(t^2).

Thus, by (13),

(16)   \lambda_S = \lambda \ \ge\ \frac{1}{2t} \inf_y \sum_x H_t(x,y)\,\frac{\sqrt{d_x}}{\sqrt{d_y}} \ \ge\ \frac{1}{2t} \inf_{y \ne x_0} \left( \frac{t}{d_y} + O(t^2) \right) = \frac{1}{2t} \left( \frac{t}{\delta_2} + O(t^2) \right) = \frac{1}{2\delta_2} + O(t), \qquad t \to 0,

where \delta_2 denotes the second largest degree in G (here we used (v) to discard nonnegative terms of the sum). Thus, we have

(17)   \lambda \ge \frac{1}{2\delta_2}

for this situation. For example, for G = P_3, the path with 3 vertices, it is true that \lambda_0 = 0, \lambda = \lambda_1 = 1 and \lambda_2 = 2, while our estimate in (17) gives \lambda \ge 1/2. A similar analysis shows that if G has k > 1 covering vertices then (13) implies

(18)   \lambda \ge \frac{k}{2(n-1)}.

Applying this to G = K_n, the complete graph on n vertices, yields \lambda \ge \frac{n}{2(n-1)}, while the truth is \lambda = \frac{n}{n-1} (again off by a factor of 2).

4. The basic set-up

As hinted at in the previous section, the real use of (13) in bounding \lambda_S will proceed along the following lines:

(a) Embed G and S as "convex" sets into some Riemannian manifold M = M_S (usually M \cong E^N for our applications).

(b) Relate the heat kernel H_t on S \cup \delta S to the continuous heat kernel h_t on M.

(c) Relate ht to various properties of M (and S), such as the “density” of points of S in M, the diameter of M, the dimension of M, etc.

We next describe a specific situation to which we can apply this procedure. We begin with an infinite "lattice" graph \Gamma = (V, E) with V \subset \mathcal{M} \cong E^N, with the property that the automorphism group \mathcal{H} of \Gamma is transitive. Thus, each g \in \mathcal{H} maps V \to V so that u \sim v \Leftrightarrow gu \sim gv, and for all u, v \in V,

v = gu \quad \text{for some } g \in \mathcal{H}.

Suppose \Gamma has an edge-generating set \mathcal{K} \subset \mathcal{H}, so that every edge of \Gamma is of the form \{v, gv\} for some v \in V, g \in \mathcal{K}, and any such pair is an edge (hence the term lattice graph). We will further assume (since we think of \Gamma as undirected) that g \in \mathcal{K} \Rightarrow g^{-1} \in \mathcal{K}. Finally, we assume that for any x, any lattice point y in the convex hull of \{gx : g \in \mathcal{K}\} is also adjacent to x in \Gamma. Let S be some finite subset of V. Assume S is "convex", which means that for some submanifold M \subset \mathcal{M} with nonempty convex boundary, S consists of all lattice points of V which are in M. Let \ell denote the minimum length of any of the edges \{x, gx\}, g \in \mathcal{K}, and assume that for each x \in S, the ball B_x(\ell/3) of radius \ell/3 centered at x is contained in M.

For x \in V, let U(x) denote the Voronoi region for x, i.e., the set of all points of \mathcal{M} closer to x than to any other point y \in V. Since \mathcal{H} is transitive, all U(x) have the same volume, denoted by vol U. Finally, set

diam M := diameter of M

dim M := dimension of M

Under the preceding assumptions and notation, we have the following estimate.

Theorem [Chung, Yau [CY1]]. For convex S,

(19)   \lambda_S \ge c_0 \left( \frac{\ell}{\dim M \cdot \operatorname{diam} M} \right)^2 \frac{|S| \operatorname{vol} U}{\operatorname{vol} M}

for a constant c_0 > 0 depending only on \Gamma and not on S.

The proof of (19) depends upon a variety of ideas and techniques from differential geometry, linear algebra and combinatorics (as well as (13)), and can be found in [CY1]. The bulk of our results presented here will depend on applying (19) and its generalizations to specific cases of interest. We remark that the results in [CY1] are similar in spirit to those in [CY2] which also

gives lower bounds for λS for Neumann walks. There, S is required to satisfy more restrictive

conditions (e.g., being "strongly convex"), but the bounds on \lambda_S are sharper. In particular, it is shown that under the appropriate hypotheses on S (and \Gamma),

(20)   \lambda_S \ge \frac{1}{8kD^2}

where k = |\mathcal{K}| and D := the graph diameter of S (see Section 8 for a comparison of (19) and (20) in a specific case).

5. Contingency tables

Given integer vectors r = (r_1, \ldots, r_m), c = (c_1, \ldots, c_n) with r_i, c_j \ge 0 and \sum_i r_i = \sum_j c_j, we can consider the space of all m \times n arrays T with the property that

\sum_j T(i,j) = r_i, \quad 1 \le i \le m,

\sum_i T(i,j) = c_j, \quad 1 \le j \le n.

Let us denote by \mathcal{T} = \mathcal{T}(r,c) the set of all such arrays. The arrays T \in \mathcal{T} are often called contingency tables (with given row and column sums). These tables arise in a variety of applications, such as goodness of fit tests in statistics, enumeration of permutations by descents, describing tensor product decompositions, counting double cosets, etc., and have a long history. (An excellent survey can be found in [DG].) It seems to be a particularly difficult problem to obtain good estimates of the size of \mathcal{T}(r,c) for large r and c. In order to attack this problem, a standard (by now — see [A87], [SJ89], [S93], [G91], [DSt]) technique depends on rapidly generating random tables from \mathcal{T} = \mathcal{T}(r,c) with nearly equal probability. To do this, we first consider the following natural random walk P on \mathcal{T}. From any given table T \in \mathcal{T}, select uniformly at random a pair of rows \{i, i'\} and a pair of columns \{j, j'\}, and move to the table T', obtained from T by changing four entries of T as follows:

T'(i,j) = T(i,j) + 1, \qquad T'(i',j') = T(i',j') + 1,

T'(i',j) = T(i',j) - 1, \qquad T'(i,j') = T(i,j') - 1.
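A minimal sketch (ours) of this basic move; the particular table is arbitrary, and the check is simply that all line sums are preserved.

```python
# Basic move on a contingency table: rows i, i' and columns j, j' are
# modified by +1/+1 and -1/-1, which preserves every row and column sum.
def basic_move(T, i, ip, j, jp):
    T2 = [row[:] for row in T]
    T2[i][j]  += 1; T2[ip][jp] += 1
    T2[ip][j] -= 1; T2[i][jp]  -= 1
    return T2

T = [[2, 0, 1],
     [1, 1, 0]]                 # row sums (3, 2), column sums (3, 1, 1)
T2 = basic_move(T, 0, 1, 0, 2)

row = lambda M: [sum(r) for r in M]
col = lambda M: [sum(r[j] for r in M) for j in range(len(M[0]))]
assert row(T2) == row(T) and col(T2) == col(T)
# Repeating the same move from T2 would drive T2[1][0] below 0 -- the
# "boundary" problem that the Neumann walk is designed to handle.
```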

Such a move we call a basic move. The table T' clearly has the same line sums (i.e., row and column sums) as T. The only problem is that T' may have negative entries (because we might have T(i',j) = 0, for example) and so is not in \mathcal{T}. To deal with this "boundary" problem, we instead execute the corresponding Neumann walk on \mathcal{T}, as described in Section 2. We first need to place our contingency table problem into the framework of the preceding section. The manifold \mathcal{M} will consist of all real mn-tuples x = (x_{11}, x_{12}, \ldots, x_{mn}) satisfying

\sum_j x_{ij} = r_i, \qquad \sum_i x_{ij} = c_j.

Since \sum_i r_i = \sum_j c_j, then

\dim \mathcal{M} = N := (m-1)(n-1).

The graph \Gamma has as vertices all the integer points in \mathcal{M}, i.e., all x with every x_{ij} \in \mathbb{Z}. The edge-generating set \mathcal{K} consists of all the basic moves described above. Thus, |\mathcal{K}| = \binom{m}{2}\binom{n}{2}. The set S will just be \mathcal{T} = \mathcal{T}(r,c), the set of all T \in \Gamma with all entries nonnegative. Thus,

S = \bigcap_{i,j} \{T \in \Gamma : x_{ij} \ge 0\}.

Similarly, the manifold M \subset \mathcal{M} is defined by

M = \bigcap_{i,j} \{x \in \mathcal{M} : x_{ij} \ge -2/3\}.

It is clear that M is an N-dimensional convex polytope and S = \mathcal{T} is the set of all lattice points in M, and consequently convex in the sense needed for (19). It is easy to see that \mathcal{T} is connected by the basic moves generated by \mathcal{K}, and that each edge of \Gamma has length 2. Our next problem is to deal with the term \frac{|S| \operatorname{vol} U}{\operatorname{vol} M} in (19). In particular, we would like to show this is close to 1, provided that the r_i and c_j are not too small. To do this we need the following two results.

Claim 1. Suppose L \subset E^N is a lattice generated by vectors v_1, \ldots, v_N. Then the covering radius of L is at most

R := \frac{1}{2} \left( \sum_{i=1}^N \|v_i\|^2 \right)^{1/2}.

Proof. The assertion clearly holds for N = 1. Assume it holds for all dimensions less than N. It is enough to prove that any point x = (x_1, \ldots, x_N) in the fundamental domain generated by the v_i is at most a distance of R from some lattice point. Let x_0 be the projection of x on either the hyperplane generated by v_1, \ldots, v_{N-1}, or the translate of this hyperplane by v_N, whichever is closer (these are two bounding hyperplanes of the fundamental domain). Thus, d(x, x_0) \le \frac{1}{2}\|v_N\|. By the induction hypothesis,

d(x_0, v_j) \le \frac{1}{2} \left( \sum_{i=1}^{N-1} \|v_i\|^2 \right)^{1/2} \quad \text{for some lattice point } v_j.

Since x - x_0 is orthogonal to the hyperplane, d(x, v_j)^2 = d(x, x_0)^2 + d(x_0, v_j)^2 \le \frac{1}{4}\|v_N\|^2 + \frac{1}{4}\sum_{i=1}^{N-1} \|v_i\|^2 = R^2, completing the induction.

Claim 2. If M is convex and contains a ball B(cRN) of radius cRN, c > 0, then

(21)   e^{-1/c} < \frac{|S| \operatorname{vol} U}{\operatorname{vol} M} < e^{1/c},

where v_1, \ldots, v_N generate \Gamma, and R = \frac{1}{2}\left( \sum_{i=1}^N \|v_i\|^2 \right)^{1/2}.

Sketch of proof: Consider an enlarged copy (1+\delta)M of M, expanded about the center of the ball B(cRN), \delta > 0. Let L be some bounding hyperplane of M, and let (1+\delta)L be the corresponding expanded copy of L (see Figure 2).

Figure 2: A large ball in M

Let x \in S \subset M and suppose there exists y \in U(x), the Voronoi region for x, with y \notin (1+\delta)M. Thus

d(x,y) \ge \gamma > c\delta RN.

However, by Claim 1, each point of \mathcal{M} has distance at most R from some lattice point in \mathcal{M}. This is a contradiction if we take \delta = \frac{1}{cN} (so that c\delta RN = R). Thus, for all x \in S, U(x) \subset (1+\delta)M, and so

|S| \operatorname{vol} U \le \operatorname{vol}(1+\delta)M = (1+\delta)^N \operatorname{vol} M,

i.e.,

\frac{|S| \operatorname{vol} U}{\operatorname{vol} M} \le (1+\delta)^N = \left(1 + \frac{1}{cN}\right)^N < e^{1/c}.

A similar argument shows that

\frac{|S| \operatorname{vol} U}{\operatorname{vol} M} > e^{-1/c},

and Claim 2 is proved.

In order to apply the result in Claim 2, we must find a large ball in M. Let s_0 denote the smallest line sum average, i.e.,

s_0 = \min\left\{ \min_i \frac{r_i}{n},\ \min_j \frac{c_j}{m} \right\}.

We begin constructing an element T_0 \in M recursively as follows. Suppose without loss of generality that

s_0 = \frac{r_1}{n}.

Then, in T_0, set all elements of the first row equal to s_0, and subtract s_0 from each value c_j to form c'_j = c_j - s_0, 1 \le j \le n. Now, to complete T_0, we are reduced to forming an (m-1) by n table T'_0 with row sums r_2, \ldots, r_m and column sums c'_1, \ldots, c'_n. The key point here is that all the line sum averages for T'_0 are still at least as large as s_0. Hence, continuing this process, we can eventually construct a table T_0 (with rational entries) having least entry equal to s_0. Consequently, there is a ball B(s_0) of radius s_0 centered at T_0 \in M which is contained in M (since to leave M, some entry must become negative). Therefore, if we assume s_0 > cN^{3/2}, then by (19) and (21),

(22)   \lambda_S \ge \frac{c_0\, e^{-1/c}}{N^2 (\operatorname{diam} M)^2}

for some absolute constant c_0 > 0 (since for tables, all the generators have length 2, so that R \le N^{1/2}). In the Appendix, we illustrate how a specific value can be derived here for c_0 (as well as in several other cases of interest). In particular, for contingency tables, we can take c_0 = 1/800. Since

\operatorname{diam} M < 2 \min\left\{ \left(\sum_i r_i^2\right)^{1/2},\ \left(\sum_j c_j^2\right)^{1/2} \right\},

(22) can be written as follows: For the natural Neumann walk P on the space of tables \mathcal{T}(r,c), where

\min\left\{ \min_i \frac{r_i}{n},\ \min_j \frac{c_j}{m} \right\} > c\,(m-1)^{3/2}(n-1)^{3/2},

we have

(23)   \lambda_S > \left( 3200\, e^{1/c} (m-1)^2 (n-1)^2 \min\left\{ \sum_i r_i^2,\ \sum_j c_j^2 \right\} \right)^{-1}.

To convert the estimate in (23) to an estimate for the rate of convergence of P to its stationary (uniform) distribution \pi, we use the following (standard) techniques (e.g., see [S93]). Define the relative pointwise distance of P^{(t)} to \pi to be

(24)   \Delta(t) := \max_{x,y} \frac{|P^{(t)}_{y,x} - \pi(x)|}{\pi(x)}

where P^{(t)}_{y,x} denotes the probability of being at x after t steps starting at y. It is not hard to show (see [S93]) that

(25)   \Delta(t) < (1-\lambda)^t\, \frac{\operatorname{vol} S}{\min_{x \in \Gamma} \deg x} \le e^{-\lambda t}\, \frac{\operatorname{vol} S}{\deg \Gamma} = e^{-\lambda t}\, |S|

where

\operatorname{vol} S := \sum_{x \in S} \deg_\Gamma x, \qquad \deg \Gamma := \deg_\Gamma x \ \text{ for any } x

and \lambda is the eigenvalue of \mathcal{L}_S which maximizes |1 - \lambda| for \lambda \ne 0. In order to guarantee that this \lambda is in fact \lambda_S, we can modify P to be "lazy", i.e., so that the modified walk \tilde{P} stays put with probability 1/2, and moves with probability 1/2 times what P did. The Laplacian eigenvalues for \tilde{P} are just 1/2 times those for P, and so are contained in [0,1]. Thus, if

(26)   t > \frac{2}{\lambda_S} \ln \frac{|\mathcal{T}|}{\epsilon}

then \Delta(t) < \epsilon. Note that

|\mathcal{T}| \le \min\left\{ \prod_i r_i^{\,n},\ \prod_j c_j^{\,m} \right\}.

Thus, by (23) and (26), if

(27)   t > 6400\, e^{1/c} m^2 n^2 \min\left\{ \sum_i r_i^2,\ \sum_j c_j^2 \right\} \left( \ln \frac{1}{\epsilon} + \min\left\{ n \sum_i \ln r_i,\ m \sum_j \ln c_j \right\} \right)

then \Delta(t) < \epsilon, provided

\min\left\{ \min_i \frac{r_i}{n},\ \min_j \frac{c_j}{m} \right\} > c\,(m-1)^{3/2}(n-1)^{3/2}.

As remarked earlier, this shows that the natural Neumann walk on \mathcal{T}(r,c) converges to uniform in time polynomial in the dimensions of the table and the sizes of the line sums (and as the square of the diameter of the space). This strengthens a recent result of Diaconis and Saloff-Coste [DS1], who showed that for fixed dimension the natural walk on \mathcal{T} (where any step which might create a negative entry is simply not taken) converges to uniform in time polynomial in the sizes of the line sums and as the square of the diameter of \mathcal{T} (they use

total variation distance instead of relative pointwise distance), but with constants that grow exponentially in mn. By taking successive relaxations of the line sum constraints (as described in [DG] or [DKM]), it is possible to approximately enumerate \mathcal{T} in polynomial time as well. We also note that Dyer, Kannan and Mount [DKM] have developed a rather different (continuous) random walk which is rapidly mixing on \mathcal{T}. They show that the dominant eigenvalue \lambda for their walk satisfies

\lambda > \frac{c}{(m-1)^4 (n-1)^4 (m+n-2)}

for m, n > 1.

6. Restricted contingency tables

A natural extension of a contingency table is one in which certain entries are restricted, e.g., required to be 0. In this section we indicate how such restricted tables can be dealt with. Given m, n > 0, let A \subseteq [1,m] \times [1,n] be some nonempty index set. By an A-table T we mean an m \times n array in which T(i,j) \equiv 0 if (i,j) \notin A. For given row sum and column sum vectors r = (r_1, \ldots, r_m) and c = (c_1, \ldots, c_n), respectively, with \sum_i r_i = \sum_j c_j, we let

\mathcal{T}_A = \mathcal{T}_A(r,c) = \left\{ A\text{-tables } T : \sum_j T(i,j) = r_i,\ \sum_i T(i,j) = c_j,\ 1 \le i \le m,\ 1 \le j \le n \right\}.

As before, we want to execute a "natural" Neumann walk P on \mathcal{T}_A and show that it is rapidly mixing. However, several new complications arise beyond what had to be considered in the preceding section. To begin with, what will we use for steps in our walk? Let us associate to A a bipartite

graph B = B_A with vertex sets [1,m] and [1,n], and with (i,j) an edge of B if and only if (i,j) \in A. In the case of unrestricted tables, B is the complete bipartite graph on these vertex sets, and the basic steps just occur on the 4-cycles of B. For our more general bipartite graph B, we first normalize it as follows. Let C be a connected component of B. For each cut-edge e = (i,j) of C, the corresponding value T(i,j) is easily seen to be determined by r and c; say it is w(e). Then replace r_i by r_i - w(e) and c_j by c_j - w(e). Now continue this process recursively until we finally arrive at the 2-connected components C' of C, with correspondingly

reduced row and column sums r′ and c′. It is not hard to show that there are always feasible assignments satisfying the modified line sum constraints.

Next, we have to describe the basic moves of our Neumann walk on \mathcal{T}_A. For each 2-connected component C of B, let T_C denote a fixed spanning tree on the vertex set V(C) = V_1 \cup V_2 of C, where V_1 \subset [1,m], V_2 \subset [1,n]. For each edge e = \{i,j\} not in T_C, with i \in V_1, j \in V_2, the addition of e to T_C creates some simple even cycle Z(e). We then assign alternating \pm 1's on the consecutive edges of Z(e) to generate a possible move Z^{\pm}(e) on the table. It is not hard to show that the set of cycles \bigcup_e Z(e) forms a cycle basis over \mathbb{Z} for the even cycles on V(C) (in fact, only coefficients of 0 or \pm 1 are ever needed). It is also not difficult to see that the set of moves \mathcal{K}_A = \bigcup_C \bigcup_e Z^{\pm}(e), C a 2-connected component of B, connects the set of all A-tables. Note that \mathcal{K}_A has size O(mn), and so is smaller than what was used in the unrestricted (complete) case. Of course, if A' denotes the set of non-cut-edges of B, then our tables T \in \mathcal{T}_A only have possibly varying entries T(i,j) where (i,j) \in A'. Thus, our A'-tables can now be represented as integer points in E^{|A'|}. In fact, because of the line sum constraints, the set \mathcal{T}_{A'} of all A'-tables actually lies in a subspace \mathcal{M} of dimension N = \sum_C (m_C - 1)(n_C - 1), where C ranges over the 2-connected components of B, and m_C and n_C are the sizes of the vertex sets of C. The same "minimum average line sum" technique from the preceding section now applies

to each (2-connected) component C of B to produce a “central point” in MA′ , the expanded

submanifold of \mathcal{M} which allows real entries \ge -2/3 in A'-tables. Of course, S_{A'} = \mathcal{T}_{A'} is now the set of all lattice points in M_{A'} (and so consists of A'-tables with all entries \ge 0). This shows that there is a ball of radius s_0 in M_{A'}, where s_0 is the minimum average line sum

occurring in all the components C. Of course, as before, we must restrict s0 to be reasonably

large. The final result is an estimate for \lambda_S of the form (22), where now the "constant" c_0 depends on the geometry of the specific generators in \mathcal{K}_A. At worst, c_0 decreases by a factor of at most O((m^2 + n^2)m^2n^2) from the unrestricted case. We sketch how this comes about in the Appendix. Another interesting special case we mention here is that of symmetric contingency tables, i.e., with m = n, r = c and T(i,j) = T(j,i) for all i and j. We can use as basic moves in this case the \binom{n}{2} symmetric transformations

T(i,i) \to T(i,i) + 1

T(j,j) \to T(j,j) + 1

T(i,j) \to T(i,j) - 1

T(j,i) \to T(j,i) - 1

and their inverses, for any i \ne j. Any symmetric table can be transformed to the diagonal table with T(i,i) = r_i for all i (and 0 otherwise), so that these moves connect the space \mathcal{S} = \mathcal{S}(r) of symmetric contingency tables for r. Easy calculations show that \operatorname{diam} M < 2(\sum_i r_i)^{1/2} and that M contains a ball of radius \frac{1}{n} \min_i r_i. Thus, by the same arguments that led to (23), we have

(28)   \lambda_S > \frac{c_0\, e^{-1/c}}{n^4 \sum_i r_i}

for an absolute constant c_0, provided \min_i r_i > c\, n^{3/2}. As usual, the translation of this bound to one for the rate of convergence of the walk to uniform is straightforward. We remark here that a similar analysis can be applied to the more general problem in which our space of objects on which to walk is now a general graph G with nonnegative integer weights assigned to its edges, so that for each vertex v of G, the sum of the weights on the edges incident to v is exactly some preassigned value S(v). Again, the steps of the walk will consist of modifying edge weights by alternating \pm 1's on certain simple even cycles (forming a cycle basis over \mathbb{Z} of all even cycles in G, analogous to what was done for general bipartite graphs). Not surprisingly, the bound on \lambda_S for the corresponding Neumann walk has the same general form as for the bipartite case.
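The construction of the moves Z^{\pm}(e) above can be sketched in code (our illustration; the index set A and the spanning tree are arbitrary, and the only property checked is that every generated move has zero row and column sums, hence preserves all line sums).

```python
from collections import deque

m, n = 3, 3
A = {(0,0), (0,1), (1,0), (1,1), (1,2), (2,1), (2,2)}   # allowed cells (arbitrary)

# Vertices of B: rows 0..m-1 and columns m..m+n-1; edges = allowed cells.
edges = [(i, m + j) for (i, j) in A]
adjB = {}
for u, v in edges:
    adjB.setdefault(u, []).append(v)
    adjB.setdefault(v, []).append(u)

# BFS spanning tree rooted at vertex 0 (B is connected here).
parent = {0: None}
q = deque([0])
while q:
    u = q.popleft()
    for v in adjB[u]:
        if v not in parent:
            parent[v] = u
            q.append(v)
tree = {frozenset((v, p)) for v, p in parent.items() if p is not None}

def fundamental_cycle(u, v):
    """Vertices of the tree path u -> v; adding edge {u,v} closes Z(e)."""
    anc = lambda x: [x] if parent[x] is None else [x] + anc(parent[x])
    pu, pv = anc(u), anc(v)
    common = next(x for x in pu if x in pv)
    return pu[:pu.index(common) + 1] + pv[:pv.index(common)][::-1]

moves = []
for u, v in edges:
    if frozenset((u, v)) not in tree:
        cyc = fundamental_cycle(u, v) + [u]            # closed even cycle
        move = [[0] * n for _ in range(m)]
        for k in range(len(cyc) - 1):
            a, b = cyc[k], cyc[k + 1]
            i, j = (a, b - m) if a < m else (b, a - m)
            move[i][j] += 1 if k % 2 == 0 else -1      # alternating +/-1
        moves.append(move)

# Each move has zero row and column sums, so it preserves every line sum.
for mv in moves:
    assert all(sum(r) == 0 for r in mv)
    assert all(sum(r[j] for r in mv) == 0 for j in range(n))
```

Because the cycle is even and alternates row and column vertices, the two cycle edges meeting at any vertex carry opposite signs, which is exactly why each row and column sum of the move vanishes.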

7. Compositions of an integer

An easy application of the preceding ideas concerns compositions of an integer T with a

fixed number of parts. These are just ordered partitions (r_1, r_2, \ldots, r_n) with integers r_i > 0 so that \sum_i r_i = T. The basic moves for the walk will be the n(n-1) transformations of the type

(x_1, \ldots, x_i, \ldots, x_j, \ldots, x_n) \to (x_1, \ldots, x_i + 1, \ldots, x_j - 1, \ldots, x_n).

Of course, \Gamma consists of all integer points x = (x_1, \ldots, x_n) \in \mathcal{M}, where \mathcal{M} = \mathcal{M}_T = \{x \in E^n : x_1 + \cdots + x_n = T\}. We let

M = \{x \in \mathcal{M} : x_i \ge -2/3\}.

Then \operatorname{diam} M \le \sqrt{2}\,(T + \frac{2n}{3}), \dim M = n - 1, and M contains a ball of radius T/n. Thus, by (19), if T > cn^{5/2} then

(29)   \lambda_S > \frac{c_0\, e^{-1/c}}{n^2 T^2}

for an absolute constant c_0. In the Appendix, we show that we can take c_0 = 1/200. Hence, for T > cn^{5/2}, if

t > 400\, e^{1/c} n^2 T^2 \left( n \ln T + \ln \frac{1}{\epsilon} \right)

then \Delta(t) < \epsilon. We point out that compositions can also be treated somewhat more directly by the methods in [CY2]. This is essentially

because S in this case is what is called there "strongly convex", and, in fact, satisfies the stronger condition that if x, y \in S and z \in \delta S with x \sim z \sim y, then x \sim y. As was pointed out in [CY2], we therefore can conclude that

\lambda_S \ge \frac{1}{8kD^2}

where k = |\mathcal{K}| and D := the graph diameter of S. Thus,

\lambda_S \ge \frac{1}{8n(n-1)(T-n)^2} > \frac{1}{8n^2T^2}

for this Neumann walk on compositions (with no restrictions on T). We note that Diaconis/Saloff-Coste [DS1] also treat compositions (by quite different methods) for n and T in restricted ranges. In particular, they conjecture that t = \Omega((nT + T^2)(n \ln T + \ln 1/\epsilon)) steps suffice to guarantee that \Delta(t) < \epsilon.
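The Neumann walk on compositions can be sketched directly (our illustration; n and T are arbitrary small values, and the check is that the walk is doubly stochastic, so the uniform distribution on the compositions is stationary, as noted in Section 2 for regular lattice graphs).

```python
from fractions import Fraction
from itertools import product

# Compositions of T into n positive parts, with moves x_i += 1, x_j -= 1.
n, T = 3, 6
S = [x for x in product(range(1, T + 1), repeat=n) if sum(x) == T]
moves = [(i, j) for i in range(n) for j in range(n) if i != j]

def neumann_row(x):
    P = {y: Fraction(0) for y in S}
    for (i, j) in moves:
        y = list(x); y[i] += 1; y[j] -= 1
        y = tuple(y)
        if y in P:                              # ordinary step inside S
            P[y] += Fraction(1, len(moves))
        else:                                   # y in deltaS: redistribute
            back = []                           # to its neighbors inside S
            for (a, b) in moves:
                z = list(y); z[a] += 1; z[b] -= 1
                z = tuple(z)
                if z in P:
                    back.append(z)
            for z in back:
                P[z] += Fraction(1, len(moves)) * Fraction(1, len(back))
    return P

P = {x: neumann_row(x) for x in S}
assert all(sum(row.values()) == 1 for row in P.values())     # stochastic
assert all(sum(P[x][y] for x in S) == 1 for y in S)          # uniform stationary
```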

8. Knapsacks

In our final example, we will consider the following natural generalization of compositions, which we call "knapsack" solutions (e.g., see [DFKKPV], [DGS]). We are given an integer vector a = (a_1, \ldots, a_n) with a_i > 0 and \gcd(a_1, \ldots, a_n) = 1. For an integer T, we consider the set S = S_T of all integer vectors r = (r_1, \ldots, r_n), r_i \ge 0, such that

(30)   r \cdot a = \sum_i r_i a_i = T.

As usual, our goal will be to (approximately) select a random element from (and enumerate) the set of "knapsack" vectors in S. We do this by constructing a rapidly mixing Markov chain on S, and estimating the corresponding Neumann eigenvalue \lambda_S. The manifold \mathcal{M} is just \{x \in E^n : \sum_i a_i x_i = T\}, and M := \{x \in \mathcal{M} : x_i > -2/3,\ 1 \le i \le n\}. Let e_i = (0, \ldots, 1, \ldots, 0) with 1 in the ith component, and define g_{ij} = a_j e_i - a_i e_j. Our generator set will be \mathcal{K} = \{\pm g_{ij} : 1 \le i \ne j \le n\}. The first problem is to show that S is connected with the generators from \mathcal{K}, provided T is sufficiently large. For ease of exposition we are going to make the assumption that \gcd(a_1, a_2) = 1. The general case, in which we only assume \gcd(a_1, a_2, \ldots, a_n) = 1, is somewhat more technical but offers no real new difficulty. Define A = a_1 + a_2 + \cdots + a_n, and suppose T > 2A \max_i \{a_i\}. Then it is easy to see that by repeated application of generators of the form

g_{3i}, i \ne 3, we can reach a vector (r_1, r_2, \ldots, r_n) \in S with r_3 > 2a_1 a_2. Now it is well known (see [EG80]) that any w > 2a_1 a_2 can be represented as

w = w_1 a_1 + w_2 a_2, \qquad w_1, w_2 \ \text{integers} \ge 0.

Hence, we can apply g13 w1 times, and then g23 w2 times to reach the vector

(r1 + w1a3, r2 + w2a3, 0,...,rn)

which has 0 for its 3rd component. Now we can repeat this argument for each of the other components r_i, i \ge 4, and eventually arrive at a vector (r'_1, r'_2, 0, 0, \ldots, 0) \in S. Finally, by applying g_{12} appropriately, we can reach (r_1^*, r_2^*, 0, 0, \ldots, 0) \in S where 0 \le r_2^* < a_1. Such a vector is unique, and in reaching it we always remained in S. Thus, S is connected by the generators in \mathcal{K}. Also, \dim M = n - 1 and \operatorname{diam} M \le T. Since the point (T/A, T/A, \ldots, T/A) \in M, then M contains a ball of radius T/A. Hence, by the usual arguments (by now), where R < (2(a_1^2 + \cdots + a_n^2))^{1/2} in Claim 2 by using the generators g_{i,i+1}, if T > c\,n^{3/2}\left(\sum_i a_i^2\right)^{1/2} then

(31)   \lambda_S > \frac{c_0\, e^{-1/c}}{n^2 T^2}

where c_0 depends only on the geometry of the generators in \mathcal{K}, and not on T. (We estimate c_0 in the Appendix.) Thus, by (26), if T > c\,n^{3/2}\left(\sum_i a_i^2\right)^{1/2}, then for

(32)   t > c_1\, e^{1/c} n^2 T^2 (n \ln T + \ln 1/\epsilon)

we have \Delta(t) < \epsilon (since |S| < T^n), where c_1 depends only on the geometry of the vectors in \mathcal{K}.
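The moves g_{ij} and the connectivity claim can be illustrated on a small instance (ours; note that T here is far below the 2A\max_i a_i threshold used in the proof, so connectivity is simply checked for this instance rather than guaranteed by the theorem).

```python
from collections import deque
from itertools import product

# Each move g_ij = a_j e_i - a_i e_j preserves the inner product r . a,
# and a breadth-first search over the moves reaches every knapsack
# solution of this (arbitrarily chosen) instance.
a, T = (2, 3, 5), 17
n = len(a)
S = {r for r in product(range(T + 1), repeat=n)
     if sum(ri * ai for ri, ai in zip(r, a)) == T}

def neighbors(r):
    for i in range(n):
        for j in range(n):
            if i != j:
                s = list(r)
                s[i] += a[j]; s[j] -= a[i]      # g_ij = a_j e_i - a_i e_j
                s = tuple(s)
                if s in S:
                    yield s

start = next(iter(S))
seen = {start}
q = deque([start])
while q:
    r = q.popleft()
    for s in neighbors(r):
        if s not in seen:
            seen.add(s)
            q.append(s)
assert seen == S        # the move graph is connected for this instance
```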

Appendix

In this section, we provide additional details needed for bounding the constant c_0 occurring in (19). The arguments will extend (and depend on) those in [CY1]. Briefly, in [CY1] we have

\mathcal{L}_s = -\frac{2d}{|\mathcal{K}|} \sum_{g \in \mathcal{K}^*} \left( \frac{\mu(x,gx)}{\ell} \right)^2 \frac{\partial^2}{\partial g^2} = \sum_{i,j} a_{ij} \frac{\partial^2}{\partial x_i \partial x_j},

where d := \dim M, \ell := \min_{g \in \mathcal{K}} \mu(x,gx), \mu denotes (Euclidean) length, and \mathcal{K}^* \subset \mathcal{K} consists of exactly one element from each pair \{g, g^{-1}\}, g \in \mathcal{K}.

C I (a ) C I 1 ≤ ij ≤ 2 where I is the identity operator on M, and X Y means that the operator Y X is positive ≤ − definite. In particular, we can take for C1 and C2 the least and greatest eigenvalues, respec- tively, of (a ) restricted to M. Now, it follows from the arguments in [CY1] that when is ij M Euclidean then the constant c0 in (19) can be taken to be

$$c_0 = \frac{1}{100} \min(C_1, C_2^{-1}).$$

Thus, to determine $c_0$ in various applications, our job becomes that of bounding the eigenvalues of the corresponding matrix $(a_{ij})$.
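This recipe can also be carried out numerically for a concrete generator set. The sketch below (the helper `c0_bound` and its interface are hypothetical, introduced only for illustration) takes $C_1$ and $C_2$ to be the extreme eigenvalues of $(a_{ij})$ restricted to $M$ and returns $\frac{1}{100}\min(C_1, C_2^{-1})$:

```python
import numpy as np

def c0_bound(A, null_dirs):
    """Compute (1/100) * min(C1, 1/C2), where C1 and C2 are the least and
    greatest eigenvalues of the symmetric matrix A restricted to M, the
    orthogonal complement of the span of the given null directions."""
    n = A.shape[0]
    N = np.linalg.qr(np.atleast_2d(null_dirs).T)[0]   # orthonormal null basis
    P = np.eye(n) - N @ N.T                           # projector onto M
    vals = np.linalg.eigvalsh(P @ A @ P)
    restricted = vals[np.abs(vals) > 1e-9]            # eigenvalues on M
    C1, C2 = restricted.min(), restricted.max()
    return min(C1, 1.0 / C2) / 100.0

# sanity check against the compositions case treated below, where
# (a_ij) = 2(I - J/n) kills the all-ones vector and equals 2 on M:
n = 6
A = 2.0 * (np.eye(n) - np.ones((n, n)) / n)
print(c0_bound(A, np.ones(n)))   # 0.005 = 1/200
```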

First, we consider $m \times n$ contingency tables. With each edge generator $g = x_{ij} - x_{i'j} - x_{ij'} + x_{i'j'}$ we consider $\frac{\partial^2}{\partial g^2}$ in terms of the $x$'s. Expanding, we have

$$\frac{\partial^2}{\partial g^2} = \frac{\partial^2}{\partial x_{ij}^2} + \frac{\partial^2}{\partial x_{i'j}^2} + \frac{\partial^2}{\partial x_{ij'}^2} + \frac{\partial^2}{\partial x_{i'j'}^2} - 2\frac{\partial^2}{\partial x_{ij} \partial x_{i'j}} - 2\frac{\partial^2}{\partial x_{ij} \partial x_{ij'}} - 2\frac{\partial^2}{\partial x_{i'j'} \partial x_{i'j}} - 2\frac{\partial^2}{\partial x_{i'j'} \partial x_{ij'}} + 2\frac{\partial^2}{\partial x_{ij} \partial x_{i'j'}} + 2\frac{\partial^2}{\partial x_{i'j} \partial x_{ij'}}.$$

We can abbreviate this in matrix form as

$$\begin{array}{c|rrrr}
 & x_{ij} & x_{i'j} & x_{ij'} & x_{i'j'} \\ \hline
x_{ij} & 1 & -1 & -1 & 1 \\
x_{i'j} & -1 & 1 & 1 & -1 \\
x_{ij'} & -1 & 1 & 1 & -1 \\
x_{i'j'} & 1 & -1 & -1 & 1
\end{array}$$

We need to consider the operator

$$\sum_{g \in \mathcal{K}^*} \frac{\partial^2}{\partial g^2}.$$

The corresponding matrix $Q$ has the following coefficient values for its various entries:

$$\begin{array}{ll}
\text{Entry} & \text{Coefficient} \\
(x_{ij}, x_{ij}) & (m-1)(n-1) \\
(x_{ij}, x_{i'j}) & -(n-1) \\
(x_{ij}, x_{ij'}) & -(m-1) \\
(x_{ij}, x_{i'j'}) & 1
\end{array}$$

Thus, $Q$ has two distinct eigenvalues: one is $mn$ with multiplicity $(m-1)(n-1)$, and the other is $0$ with multiplicity $m + n - 1$. Now, $\dim M = (m-1)(n-1)$ and the operator corresponding to $Q$ when restricted to $M$ has all eigenvalues equal to $mn$. So the matrix $(a_{ij})$ has all eigenvalues equal to

$$\frac{2\, mn(m-1)(n-1)}{\binom{m}{2}\binom{n}{2}} = 8,$$

and consequently we can take $C_1 = C_2 = 8$, and $c_0 = 1/800$.

Next, in the case of restricted tables, we associate our restrictions with a bipartite graph $B$ (which we assume for now is 2-connected; the general case involves taking the union of such graphs). As usual, let $T$ denote a fixed spanning tree of $B$, and let $C$ denote the associated

(even) cycle basis for $B$. For each cycle $Z$ in $C$ with edges $e_1, e_2, \ldots, e_{2r}$ we have the edge generator $g = x_{e_1} - x_{e_2} + \cdots - x_{e_{2r}}$. We consider the matrix $(a_{ij})$ associated with the operator

$$\frac{2d}{|\mathcal{K}|} \sum_{g \in \mathcal{K}^*} \left( \frac{\mu(x, gx)}{\ell} \right)^2 \frac{\partial^2}{\partial g^2} = \sum_{i,j} a_{ij} \frac{\partial^2}{\partial x_i \partial x_j}.$$

Clearly, for $f : E(B) \to \mathbb{R}$, we can express $(a_{ij})$ as a quadratic form:

$$\langle f, (a_{ij})f \rangle = \sum_{Z \in C} (f(e_1) - f(e_2) + \cdots - f(e_{2r}))^2 \cdot r$$

since $\frac{2d}{|\mathcal{K}|} \frac{\|Z\|^2}{\ell^2} = r$.

To upper-bound the eigenvalues of $(a_{ij})$ we have

$$C_2 \ \le\ \frac{2d}{|\mathcal{K}|} (m+n)^2 mn \ \le\ 2(m+n)^2 mn$$

since

$$\frac{\langle f, (a_{ij})f \rangle}{\langle f, f \rangle} \ \le\ \frac{\sum_{Z \in C} (f(e_1) - f(e_2) + \cdots - f(e_{2r}))^2\, r}{\sum_i f^2(e_i)} \ \le\ \frac{\sum_{Z \in C} 2r\,(f^2(e_1) + \cdots + f^2(e_{2r}))\, r}{\sum_i f^2(e_i)} \ \le\ 2r^2 |C| \ \le\ (m+n)^2 mn.$$
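The spectrum of $Q$ claimed above for unrestricted $m \times n$ tables is easy to verify numerically for small $m$ and $n$. A sketch (the function name `contingency_Q` is our own, introduced for illustration):

```python
import itertools
import numpy as np

def contingency_Q(m, n):
    """Assemble Q = sum over edge generators g of the rank-one form g g^T,
    where g = x_ij - x_i'j - x_ij' + x_i'j' and cells are flattened as i*n + j."""
    Q = np.zeros((m * n, m * n))
    for i, ip in itertools.combinations(range(m), 2):
        for j, jp in itertools.combinations(range(n), 2):
            g = np.zeros(m * n)
            g[i * n + j] = g[ip * n + jp] = 1.0
            g[ip * n + j] = g[i * n + jp] = -1.0
            Q += np.outer(g, g)
    return Q

# for m = 3, n = 4: eigenvalue mn = 12 with multiplicity (m-1)(n-1) = 6,
# and eigenvalue 0 with multiplicity m + n - 1 = 6
vals = np.linalg.eigvalsh(contingency_Q(3, 4))
```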

To establish a bound for $C_1$, more work is needed. We will use the following modified discrete version of Cheeger's theorem (see [C]): For a graph $G$, suppose $f$ is the eigenfunction associated with some nonzero eigenvalue $\lambda$ of the Laplacian of $G$. Then $\lambda$ satisfies

$$\lambda = \frac{\sum_{u \sim v} (f(u) - f(v))^2}{\sum_v f^2(v)\, d_v} \ \ge\ \frac{h_\lambda^2}{2}$$

where $h_\lambda = \min_v h(v)$ and

$$h(v) = \frac{\bigl|\{\{u, w\} \in E(G) : f(u) \le f(v) < f(w)\}\bigr|}{\min\Bigl( \sum_{u : f(u) \le f(v)} d_u,\ \sum_{w : f(w) > f(v)} d_w \Bigr)}.$$

Before applying the above result, we will modify $A = (a_{ij})$. First, choose a root $\rho$ in $T$. For each tree edge $e = \{u, v\}$ with $d_T(\rho, u) < d_T(\rho, v)$ and $d_T(\rho, u)$ odd (where $d_T$ denotes the graph distance in $T$), we define

$$e' = e - e_1 + e_2 - \cdots$$

where the unique path from $u$ to $\rho$ consists of the edges $e_1, e_2, \ldots$. Let $X$ denote the matrix corresponding to this change of coordinates. Thus,

$$\frac{\langle f, X^{tr}(a_{ij})Xf \rangle}{\langle fX^{tr}, Xf \rangle} \cdot \frac{\langle fX^{tr}, Xf \rangle}{\langle f, f \rangle} = \frac{\langle f, A'f \rangle}{\langle f, f \rangle} \tag{33}$$

where $A' = X^{tr}(a_{ij})X$ corresponds to the quadratic form

$$\langle f, (a'_{ij})f \rangle = \sum_{Z \in C} (f(e'_1) - f(e'_2) + f(e) - f(e_1))^2\, r.$$

Note that there are just four terms in each of the squared terms in the sum. By (33), the eigenvalues of $(a'_{ij})$ are products of eigenvalues of $(a_{ij})$ and eigenvalues of $X^{tr}X$. So, to lower-bound the nonzero eigenvalues of $A'$, we apply the above result twice, since the 4-term sum can be interpreted as two 2-term sums. Thus, we have a lower bound in this case of

$$\frac{1}{(m+n)^2} \cdot \frac{1}{(2mn)^2}.$$

This implies that we can take

$$c_0 = \frac{1}{400(m+n)^2 m^2 n^2}$$

for restricted tables.

For compositions, our generators are of the form

$$g_{ij} = (0, \ldots, 1, \ldots, -1, \ldots, 0) = x_i - x_j.$$

Then

$$\frac{2d}{|\mathcal{K}|} \sum_{\{i,j\}} \frac{\partial^2}{\partial g_{ij}^2} = \sum_{i,j} a_{ij} \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j}$$

where

$$a_{ij} = \begin{cases} \dfrac{2(n-1)}{n} & \text{if } i = j, \\[4pt] -\dfrac{2}{n} & \text{if } i \ne j. \end{cases}$$

Thus, $(a_{ij})$ has eigenvalues $0$ of multiplicity one, and $2$ of multiplicity $n - 1$. This implies that $C_1$ and $C_2$ are both equal to $2$ so that we can take $c_0 = 1/200$.

Finally, for the knapsack problem, we have edge generators $(\ldots, a_j, \ldots, -a_i, \ldots)$. These correspond to $g_{ij} = a_j x_i - a_i x_j$. Therefore

$$\sum_{\{i,j\}} \frac{a_i^2 + a_j^2}{2} \frac{\partial^2}{\partial g_{ij}^2} = \sum_{\{i,j\}} \left( a_i^2 \frac{\partial^2}{\partial x_i^2} - 2 a_i a_j \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j} + a_j^2 \frac{\partial^2}{\partial x_j^2} \right) = \frac{|\mathcal{K}|}{2d} \sum_{i,j} a_{ij} \frac{\partial}{\partial x_i} \frac{\partial}{\partial x_j}$$

where the sum is taken over all unordered pairs $\{i, j\}$. The matrix $(a_{ij})$ corresponds to the quadratic form

$$\frac{2d}{|\mathcal{K}| \ell^2} \sum_{\{i,j\}} \frac{a_i^2 + a_j^2}{2} (a_j x_i - a_i x_j)^2 = \langle x, Ax \rangle$$

where $d = n - 1$, $|\mathcal{K}| = n(n-1)$, and $\ell = \min_{i,j} (a_i^2 + a_j^2)^{1/2}$ is the minimum edge length. Set $\beta_i = \sum_j \frac{1}{2}(a_i^2 + a_j^2) a_i^2 a_j^2$. Then

$$\frac{n}{2} \min_{i,j}(a_i^2 + a_j^2) \cdot \frac{\langle x, Ax \rangle}{\langle x, x \rangle} = \frac{\sum_{\{i,j\}} \frac{1}{2}(a_i^2 + a_j^2)(a_j x_i - a_i x_j)^2}{\sum_i x_i^2}$$

$$= \frac{\sum_{\{i,j\}} \frac{1}{2}(a_i^2 + a_j^2)\, a_i^2 a_j^2 (y_i - y_j)^2}{\sum_i a_i^2 y_i^2} \qquad \text{where } y_i = \frac{x_i}{a_i}$$

$$\ge\ \frac{\min_i a_i^2\ \sum_{\{i,j\}} a_i^2 a_j^2 (y_i - y_j)^2}{\sum_i a_i^2 y_i^2}$$

$$\ge\ \min_i a_i^2 \cdot \min_i \sum_{j \ne i} a_j^2 \cdot \frac{\sum_{\{i,j\}} a_i^2 a_j^2 (y_i - y_j)^2}{\sum_i a_i^2 y_i^2 \sum_{k \ne i} a_k^2}.$$

Now, by the modified Cheeger theorem referred to previously, we have for any eigenvalue $\lambda \ne 0$ of $A$

$$\frac{n}{2} \min_{i,j}(a_i^2 + a_j^2)\ \lambda \ \ge\ \frac{h^2}{2} \left( \sum_j a_j^2 - \max_i a_i^2 \right) \min_i a_i^2$$

where $h$ is the Cheeger constant for the complete graph $K_n$ with edge weights $a_i^2 a_j^2$, which is defined by

$$h = \inf_{I \subset V} \frac{\sum_{i \in I} \sum_{j \notin I} a_i^2 a_j^2}{\sum_{i \in I} \sum_{j \ne i} a_i^2 a_j^2}$$

taken over all $I \subset V = V(K_n)$ satisfying

$$\sum_{i \in I} \sum_{k \ne i} a_i^2 a_k^2 \ \le\ \frac{1}{2} \sum_i \sum_{k \ne i} a_i^2 a_k^2,$$

i.e.,

$$\sum_{j \notin I} \sum_{k \ne j} a_j^2 a_k^2 \ \ge\ \sum_{i \in I} \sum_{k \ne i} a_i^2 a_k^2.$$

First, suppose $\sum_{i \in I} a_i^2 \le \sum_{j \notin I} a_j^2$. Then

$$h \ \ge\ \frac{\sum_{i \in I} a_i^2 \sum_{j \notin I} a_j^2}{\sum_{i \in I} a_i^2 \sum_{k \ne i} a_k^2} \ \ge\ \frac{1}{2}.$$

On the other hand, suppose $\sum_{j \notin I} a_j^2 < \sum_{i \in I} a_i^2$. Then

$$h \ \ge\ \frac{\sum_{i \in I} a_i^2 \sum_{j \notin I} a_j^2}{\sum_{j \notin I} a_j^2 \sum_{k \ne j} a_k^2} \ \ge\ \frac{\sum_{i \in I} a_i^2 \sum_{j \notin I} a_j^2}{\sum_{j \notin I} a_j^2 \sum_k a_k^2} \ \ge\ \frac{1}{2}.$$

Thus, in both cases we get $h \ge 1/2$ so that

$$\frac{\langle x, Ax \rangle}{\langle x, x \rangle} \ \ge\ \frac{2}{n} \cdot \frac{1}{2} \left( \frac{1}{2} \right)^2 \cdot \frac{\min_i a_i^2\ \min_i \sum_{j \ne i} a_j^2}{\min_{i,j}(a_i^2 + a_j^2)} \ =\ C_1.$$

For C2, we compute

$$\frac{\langle x, Ax \rangle}{\langle x, x \rangle} \ \le\ \frac{2}{n} \cdot \frac{\sum_{\{i,j\}} \frac{1}{2}(a_i^2 + a_j^2)(a_j x_i - a_i x_j)^2}{\min_{i,j}(a_i^2 + a_j^2) \sum_i x_i^2}$$

$$\le\ \frac{2}{n} \cdot \frac{\sum_{\{i,j\}} (a_i^2 + a_j^2)(a_j^2 x_i^2 + a_i^2 x_j^2)}{\min_{i,j}(a_i^2 + a_j^2) \sum_i x_i^2}$$

$$\le\ \frac{2}{n} \cdot \frac{\max_i \sum_{j \ne i} a_j^2 (a_i^2 + a_j^2)}{\min_{i,j}(a_i^2 + a_j^2)} \ =\ C_2$$

and, as usual, we take $c_0 = \frac{1}{100} \min(C_1, C_2^{-1})$.
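As a numerical sanity check on the knapsack computation, one can assemble $A$ directly from its quadratic form and confirm that the vector $a$ spans its null space and that every eigenvalue is at most $C_2$. A sketch (the name `knapsack_A` is ours, introduced for illustration):

```python
import numpy as np

def knapsack_A(a):
    """Matrix A of the quadratic form (2d/(|K| l^2)) * sum_{i<j}
    (a_i^2 + a_j^2)/2 * (a_j x_i - a_i x_j)^2, with d = n-1,
    |K| = n(n-1), and l^2 = min_{i != j}(a_i^2 + a_j^2)."""
    a = np.asarray(a, dtype=float)
    n = len(a)
    l2 = min(a[i] ** 2 + a[j] ** 2
             for i in range(n) for j in range(n) if i != j)
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            v = np.zeros(n)
            v[i], v[j] = a[j], -a[i]          # the generator direction g_ij
            A += 0.5 * (a[i] ** 2 + a[j] ** 2) * np.outer(v, v)
    return (2.0 / (n * l2)) * A

a = np.array([2.0, 3.0, 5.0, 7.0])
A, n = knapsack_A(a), len(a)
l2 = min(a[i] ** 2 + a[j] ** 2 for i in range(n) for j in range(n) if i != j)
C2 = (2.0 / n) * max(sum(a[j] ** 2 * (a[i] ** 2 + a[j] ** 2)
                         for j in range(n) if j != i)
                     for i in range(n)) / l2
# A annihilates the vector a, and all eigenvalues of A are at most C2
```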

References

[A87] D. Aldous, On the Markov-chain simulation method for uniform combinatorial distributions and simulated annealing, Prob. Eng. Info. Sci. 1 (1987), 33–46.

[BM76] J. A. Bondy and U. S. R. Murty, Graph Theory with Applications, American Elsevier, New York, 1976.

[C] F. R. K. Chung, Laplacians of graphs and Cheeger’s inequalities (to appear).

[CY1] F. R. K. Chung and S.-T. Yau, A Harnack inequality for homogeneous graphs and subgraphs, Comm. Analysis and Geometry, 2 (1994) 628–639.

[CY2] F. R. K. Chung and S.-T. Yau, Heat kernel estimates and eigenvalue inequalities for convex subgraphs (preprint).

[DG] P. Diaconis and A. Gangolli, Rectangular arrays with fixed margins (preprint).

[DGS] P. Diaconis, R. L. Graham and B. Sturmfels, Primitive partition identities (to appear).

[DS1] P. Diaconis and L. Saloff-Coste, Random walk on contingency tables with fixed row and column sums (preprint).

[DSt] P. Diaconis and B. Sturmfels, Algebraic algorithms for sampling from conditional distributions (to appear in Ann. of Statistics).

[DFK89] M. Dyer, A. Frieze and R. Kannan, A random polynomial time algorithm for approximating the volume of convex bodies, in Proc. 21st ACM Symp. on Theory of Computing, 1989, pp. 375–381.

[DFKKPV] M. Dyer, A. Frieze, R. Kannan, A. Kapoor, L. Perkovic, U. Vazirani, A mildly exponential time algorithm for approximating the number of solutions to a multidimensional knapsack problem (to appear in Combinatorics, Probability and Computing).

[DKM] M. Dyer, R. Kannan and J. Mount, Sampling contingency tables (preprint).

[EG80] P. Erdős and R. L. Graham, Old and new problems and results in combinatorial number theory, L'Enseignement Mathématique, Monographie No. 28, Genève, 1980, p. 86.

[FKP94] A. Frieze, R. Kannan and N. Polson, Sampling from log-concave distributions, Annals Appl. Prob. 4 (1994), 812–837.

[G91] A. Gangolli, Convergence bounds for Markov chains and applications to sampling, Ph.D. Dissertation, Dept. of Comp. Sci., Stanford, 1991.

[JS] M. Jerrum and A. Sinclair, Fast uniform generation of regular graphs, Th. Comp. Sci. 73 (1990), 91–100.

[LS90] L. Lovász and M. Simonovits, The mixing rate of Markov chains, an isoperimetric inequality and computing the volume, in Proc. 31st IEEE Symp. on Foundations of Computer Science, 1990, pp. 346–354.

[MW] B. McKay and N. Wormald, Uniform generation of random regular graphs of moderate degree (to appear in J. of Algorithms).

[SY94] R. Schoen and S.-T. Yau, Differential Geometry, International Press, Cambridge, 1994.

[S93] A. Sinclair, Algorithms for random generation and counting: a Markov chain approach, Birkhauser, Boston, 1993.

[SJ89] A. Sinclair and M. Jerrum, Approximate counting, uniform generation and rapidly mixing Markov chains, Infor. and Comput. 82 (1989), 93–133.
