Detailed Proofs of Lemmas, , and Corollaries

Dahua Lin John Fisher CSAIL, MIT CSAIL, MIT

A List of Lemmas, Theorems, and 4. The hierarchically bridging Markov Corollaries chain with bk < 1 for k = 0,...,K − 1, and fk < 1 for k = 1,...,K is ergodic. If we write the equilibrium For being self-contained, we list here all the lemmas, distribution in form of (αµ0, β1µ1, . . . , βK µK ), then theorems, and corollaries in the main paper. (S1) µ0 equals the target distribution µ; (S2) for each k ≥ 1, and y ∈ Y , µ (y) is proportional to the total Lemma 1. The joint transition P given by k k + probability of its descendant target states (the target Eq.(1) in the main paper has a stationary distribution states derived by filling all its placeholders); (S3) α, in form of (αµ , βµ ), if and only if X Y the probability of being at the target level, is given by α−1 = 1 + PK (b ··· b )/(f ··· f ). µX QB = µY , and µY QF = µX . (1) k=1 0 k−1 1 k Corollary 2. If bk/fk+1 ≤ κ < 1 for each k = Under this condition, we have αb = βf. Further, if 1,...,K, then α > 1 − κ. both PX and PY are both reversible, then P+ is also reversible, if and only if B Proofs µX (x)QB(x, y) = µY (y)QF (y, x), (2) Here, we provide the proofs of the lemmas and theo- for all x ∈ X and y ∈ Y . rems presented in the paper. Lemma 2. If the condition given above holds, then µX and µY are respectively stationary distributions of B.1 Proof of Lemma 1 QBF and QFB. Moreover, if P+ is reversible, then both QBF and QFB are reversible. Recall that P+ is given by Theorem 1. Given a reversible Markov chain with   (1 − b)PX bQB transition matrix P, and ε ∈ (0, 1/2), then P+ = . fQF (1 − f)PY

log(1/(2ε))(τ − 1) ≤ tmix(ε) ≤ log(1/(εµmin))τ. (3) Suppose P+ has a stationary distribution in form of (αµX , βµY ), then Here, τ is called the relaxation time, given by 1/γ∗(P).

Theorem 2. Let λ2 be the second largest eigenvalue α(1 − b)µX PX + βfµY QF = αµX , of a reversible transition matrix P, then αbµX QB + β(1 − f)µY PY = βµY . (6) 2 Φ∗(P)/2 ≤ 1 − λ2 ≤ 2Φ∗(P). (4) Since µX and µY are respectively stationary distribu- tions of PX and PY , i.e. µX PX = µX and µY PY = Theorem 3. The reversible transition matrix P+ as µ , we have given by Eq.(1) in the main paper has Y βfµ QF = αbµ , and αbµ QB = βfµ . (7) η(b, f) φ Y X X Y · ≤ Φ∗(P+) ≤ max{b, f}. (5) 2 φ + 1 Note that QB and QF were defined with the condition that each of their rows sums to 1, i.e. QF 1|X| = 1|Y | Here, η(b, f) = 2αb = 2βf is the total probability of and QB1|Y | = 1|X|. Multiplying 1 to the right of both cross-space transition, φ = min{Φ∗(QBF ), Φ∗(QFB)}. hand sides of either equation results in αb = βf. It Corollary 1. The joint chain P+ is ergodic when the immediately follows that collapsed chains (QBF and QFB) are both ergodic. µ QB = µ , and µ QF = µ . (8) Lemma 3. Let P be a reversible transition matrix X Y Y X over X, such that P(x, x) ≥ ξ > 0 for each x ∈ X For the other direction, we assume QB and QF satisfy then its smallest eigenvalue λn has λn ≥ 2ξ − 1. the conditions above, and αb = βf. Plugging these Detailed Proofs of Lemmas, Theorems, and Corollaries conditions into the left hand sides of the equations B.3 Proof of Theorem 3 in Eq.(6) results in (αµX , βµY )P+ = (αµX , βµY ), To prove theorem 3, we first establish a lemma on flow which implies that (αµX , βµY ) is a stationary distri- bution of P+. The proof of the first part is done. decomposition, and then accomplish the proof based on the lemma. Next, we show the second part of the lemma, which is about the reversibility. Let µ+ = (αµX , βµY ). Under B.3.1 A Lemma on Flow Decomposition the condition that PX and PY are both reversible, we 0 have for each x, x ∈ X, For the joint chain P+, we analyze its bottleneck ratio by decomposing the flows. Consider a partition of the 0 0 µ+(x)P+(x, x ) = α(1 − b)µX (x)PX (x, x ), union space X ∪ Y into two parts: A ∪ B (with A ⊂ X 0 0 0 0 c c c c µ+(x )P+(x , x) = α(1 − b)µX (x )PX (x , x), (9) and B ⊂ Y ) and A ∪ B (with A = X/A and B = Y/B). The flow between them comprises three parts: 0 0 0 and thus µ+(x)P+(x, x ) = µ+(x )P+(x , x) (due c c c c to the reversibility of PX ). Likewise, we can get F(A, A ) + F(B,B ) + (F(A, B ) + F(B,A )). 0 0 0 µ+P+(y, y ) = µ+(y )P(y , y). Hence, P+ is re- Here, F(A, Ac) is the flow within X, F(B,Bc) is the versible if and only if µ+(x)P+(x, y) = µ+(y)P+(y, x) for each x ∈ X and y ∈ Y . This can be expanded as flow within Y , and F(A, Bc) + F(B,Ac) is the flow between X and Y . The first two are inherited from αbµX (x)QB(x, y) = βfµY (y)QF (y, x), (10) the original Markov chains. We focus on the third one, which reflects the effect of bridging. For this part which holds if and only if µX (x)QB(x, y) = of flow, we derive the following lemma by decomposing µY (y)QF (y, x) (under the condition αb = βf). The it along multiple paths. proof is completed. Lemma 4. Given arbitrary partition of X ∪ Y into A ∪ B and Ac ∪ Bc as described above, we have B.2 Proof of Lemma 2 c c F(A, B ) + F(B,A ) ≥ αb · Φ∗(QBF )µX (A), (13) If (αµX , βµY ) is a stationary distribution of P+, then Eq.(8) holds. Thus, c when µX (A) ≤ µX (A ), and

µX QBQF = µY QF = µX , (11) c c F(A, B ) + F(B,A ) ≥ βf · Φ∗(QFB)µY (B), (14) implying that µX is a stationary distribution of c when µY (B) ≤ µY (B ). QBQF . Similarly, we can show that µY is a stationary distribution of Q Q . F B Proof. To analyze the flow F(A, Bc) + F(B,Ac), we Furthermore, if P+ is reversible, according to further decompose it along multiple paths. Based on Lemma 1, we have µX (x)QB(x, y) = µY (y)QF (y, x) Eq.(1) in the main paper, we have for each x ∈ X and y ∈ Y . Then for any x, x0 ∈ X, c X X F(A, B ) = αbµX (x)QB(x, y) (15) 0 X 0 µX (x)QBF (x, x ) = µX (x) QB(x, y)QF (y, x ) x∈A y∈Bc y∈Y X X X 0 = αb µX (x)QB(x, y) QF (y, x ) X 0 = µX (x)QB(x, y)QF (y, x ) x∈A y∈Bc x0∈X y∈Y X 0 In this way, we decompose the flow into a sum of the = µY (y)QF (y, x)QF (y, x ) 0 terms in form of µX (x)QB(x, y)QF (y, x ), which we y∈Y call the path weight along x → y → x0, denoted by 0 Similarly, we can get ω(x, y, x ). We can then rewrite F(A, B) as

0 0 X 0 X X X 0 µX (x )QBF (x , x) = µY (y)QF (y, x )QF (y, x). F(A, B) = αb ω(x, y, x ). (16) y∈Y x∈A y∈B x0∈X Hence, For conciseness, we use ω(A, B, C) to denote the sum of paths traveling from A, via B, and ending up in C, 0 0 0 µX (x)QBF (x, x ) = µX (x )QBF (x , x). (12) P P P 0 i.e. x∈A y∈B x0∈C ω(x, y, x ). Then, we have This implies that Q is reversible. Likewise, we can BF F(A, Bc) = αb(ω(A, Bc,A) + ω(A, Bc,Ac)), (17) show that QFB is reversible under the condition that F(Ac,B) = αb(ω(Ac,B,A) + ω(Ac,B,Ac)). (18) P+ is reversible. The proof is completed. Dahua Lin, John Fisher

As F is symmetric for a reversible chain, we have Note that η/2 = αb = βf and α + β = 1. Thus F(B,Ac) = F(Ac,B), and thus F αF + βF s ≥ s s ≥ ηφ/2. (25) c c c c c F(A, B ) + F(B,A ) ≥ αb(ω(A, B ,A ) + ω(A ,B,A)) µ+(A ∪ B) αµX (A) + βµY (B) = αb ω(A, Y, Ac). (19) Case 2. µX (A) < 1/2 and µY (B) > 1/2. On the other hand, we note that Given arbitrary κ > 2, there are two possibilities: c X X X 0 ω(A, Y, A ) = ω(x, y, x ) Case 2.1. 1/κ ≤ µX (X) < 1/2 and µY (B) > 1/2. x∈A y∈Y x0∈Ac Then 1 X X X 0 F ≥ αb · φµ (A) > αbφ. (26) = µX (x)QB(x, y)QF (y, x ) s X κ x∈A y∈Y x0∈Ac Recall that µ+(A ∪ B) ≤ 1/2. Thus X X X 0 = µ (x) QB(x, y)QF (y, x ) X F 2 ηφ x∈A x0∈Ac y∈Y s ≥ . (27) X X 0 µ+(A ∪ B) κ 2 = µX (x)QBF (x, x ). (20) x∈A x0∈Ac Case 2.2. µX (X) < 1/κ and µY (B) > 1/2. Here, we c c This is exactly the flow from A to A with respect to utilize the following fact: Fs ≥ F(B,A ) = F(B,X)− the collapsed chain QBF , i.e. F(B,A). Then, by the definition of flow, we have c c F(B,X) = βfµ (B) > βf/2, (28) ω(A, Y, A ) = FQBF (A, A ). (21) Y

c c and by the symmetry of F (due to reversibility), Assuming µX (A) ≤ µX (A ), we have FQBF (A, A ) ≥ Φ∗(QBF )µ(A), by the definition of bottleneck ratio. F(B,A) = F(A, B) ≤ F(A, Y ) = αb/κ. (29) Combining this with Eq.(19) results in With αb = βf, combining the results above leads to F(A, Bc) + F(B,Ac) ≥ αb · ω(A, Y, Ac) Fs > βf/2 − αb/κ = αb(1/2 − 1/κ). (30) ≥ αb · Φ∗(QBF )µX (A). (22)

c As a result, we get Likewise, with the assumption µY (B) ≤ µY (B ), we have Fs η > 2αb(1/2 − 1/κ) = (1 − 2/κ). (31) µ (A ∪ B) 2 c c + F(A, B ) + F(B,A ) ≥ βf · Φ∗(QFB)µY (B). (23) Case 3. µ (A) > 1/2 and µ (B) < 1/2. Following The proof of the lemma is completed. X Y a similar argument as we developed above for case 2, given κ > 2, we can likewise get B.3.2 The Proof of the Main Theorem ( 2 ηφ Fs κ 2 (µX (X) ≥ 1/κ), Let µ+ = (αµX , βµY ) be the stationary distribution ≥ (32) µ (A ∪ B) η (1 − 2/κ)(µ (X) < 1/κ). of P+. For conciseness, we let Fs(A, B) , F(A ∪ + 2 X B,Ac ∪Bc). When A and B are clear from the context, Note that µ (A) > 1/2 and µ (B) > 1/2 cannot we simply write F . Then the bottleneck ratio of P X Y s + hold simultaneously under the assumption αµ (A) + is the minimum of the values of F (A, B)/µ (A ∪ B), X s + βµ (B) ≤ 1/2. Integrating the results derived for all among all possible choices of A ⊂ X and B ⊂ Y such Y cases, we obtain that µ+(A∪B) ≤ 1/2 and A∪B 6= ∅. Throughout this   proof, we assume A ⊂ X, B ⊂ Y , and µ+(A ∪ B) ≤ Fs(A, B) η 2 2 1/2, i.e. αµ (A) + βµ (B) ≤ 1/2. ≥ min φ, 1 − , ∀κ > 2. (33) X Y µ+(A, B) 2 κ κ Under this assumption, there are three cases, which Note that this inequality holds for all A and B with we respectively discuss as follows. 0 < µ+(A ∪ B) ≤ 1/2. In this way, we can get a of lower bound of the bottleneck ratio, using Case 1. µX (A) ≤ 1/2 and µY (B) ≤ 1/2. different values of κ. And the supreme of these lower We have φ = min{Φ∗(QBF ), Φ∗(QFB)} in the theo- bounds remains a lower bound. It is easy to see that c c rem. In addition, Fs ≥ F(A, B ) + F(B,A ). Com- the supreme attains when 2φ/κ = 1 − 2/κ, leading to bining this with Lemma 4, we get  2 2  φ sup min φ, 1 − = . (34) Fs ≥ αb · φµX (A), and Fs ≥ βf · φµY (B). (24) κ>2 κ κ 1 + φ Detailed Proofs of Lemmas, Theorems, and Corollaries

It follows that B.6 Proof of Theorem 4 η φ Φ∗(P+) ≥ . (35) We show this theorem by progressively proving a series 2 φ + 1 of claims as follows. This completes the proof of the lower bound. Next, Claim 1. The augmented Markov chain is ergodic. we show the upper bound, which is easier. Due to the definition of bottleneck ratio, for any given partition With bk > 0 for k = 0,...,K −1, the root is accessible of X ∪ Y , the flow ratio derived from that partition from each state (including both complete and partial assignments). With f > 0 for k = 1,...,K, each constitutes an upper bound of Φ∗(P+). k state is accessible from root. These imply that any Here, we consider the partition with one part being X two states are accessible from each other via the root. and the other being Y . Then Therefore, the chain is irreducible. In addition, fK < 1 makes the chain aperiodic. As this is a finite Markov F = αb = βf, (36) s chain, we can conclude that it is ergodic. and thus the flow ratio is given by Since the chain is ergodic, it has a unique stationary F distribution, i.e. its equilibrium distribution. There- s = max(f, b). (37) min(α, β) fore, it suffices to show that (αµ0, β1µ1, . . . , βK µK ) that satisfies the three statements in the theorem is a This gives an upper bound of Φ∗(P+). The proof of stationary distribution. the theorem is completed. Claim 2. Given vectors µ0,..., µK respectively over the set of states at level 0,...,K, such that µ = µ is B.4 Proof of Corollary 1 0 a distribution over X, and for each k = 1,...,K, µk is defined recursively by When both QBF and QFB are ergodic, their bottle- neck ratios are positive. According to Theorem 3, the 1 X µk(y) = µk−1(x), for y ∈ Yk. bottleneck ratio of Φ∗(P+) is thus positive, implying K − (k − 1) that the spectral gap of P is positive (by Theorem 2), x∈Ch(y) + (39) and hence P+ is irreducible. Then, µk for each k = 1,...,K represents a distribu- Since QBF is ergodic and thus aperiodic, the greatest tion over Yk, and common divisor of the loop lengths for each x ∈ X is 1. −1 K X Similar argument applies to each y ∈ Y . Consequently, µ (y) = µ(x), ∀y ∈ Y . (40) k k k the joint chain characterized by P+ is aperiodic. x∈X:x y Being an irreducible and aperiodic finite Markov chain, Here, x y means that x is a descendant of y. the joint chain given by P+ is ergodic. This completes the proof. The proof of this claim is done by induction as follows. Obviously, when k = 1, according to the definition B.5 Proof of Lemma 3 above, we have

Let P0 = (P − ξI)/(1 − ξ). Since P has P(x, x) > ξ 1 X 1 X µ1(y) = µX (x) = µ(x). (41) 0 K K for each x ∈ X, the entries of the matrix P are all x∈Ch(y) x∈Ch(y) non-negative. In addition, K 1 1 This satisfies Eq.(40), as = K, and it is clear P01 = (P − ξI)1 = (1 − ξ1) = 1. (38) 1 1 − ξ 1 − ξ that µ1 is non-negative. In addition, we have This implies that P0 is also a stochastic matrix. X X 1 X µ (y) = µ(x) 1 K Since P is reversible, all its eigenvalues are real num- y∈Y1 y∈Y1 x∈Ch(y) bers. Without losing generality, we assume they are 1 X X = µ(x) λ1 ≥ ... ≥ λn. As P is a stochastic matrix, we have K y∈Y x∈Ch(y) λ1 = 1 and λn ≥ −1. According to the spectral 1 mapping theorem, the eigenvalues of P0, denoted by 1 X X 0 0 0 = µ(x) 1 = 1. (42) λ , . . . , λ , are given by λ = (λi − ξ)/(1 − ξ), for each K 1 n i x∈X y∈P a(x) i = 1, . . . , n. As P0 is a stochastic matrix, we have 0 λn−ξ λn ≥ −1, and thus 1−ξ ≥ −1. Therefore, λn ≥ 2ξ−1. Here, P a(x) is the set of parent states of x. In the The proof is completed. derivation above, we use the fact that x has K parents, Dahua Lin, John Fisher

i.e. |P a(x)| = K. The identity above implies that µ1 claim holds for k = 0, . . . , m with m < K, we are to is a valid distribution over Y1. Therefore, the claim show that it also holds for k = m + 1. Note from holds when k = 1. Eq.(45) that

Suppose that the claim holds for k = 1, . . . , m with cm+1,k Zm m < K, we are to show that it also holds for k = m+1, = , ∀k = 0, . . . , m, (47) cm,k Zm+1 so as to complete the induction. Note that µm+1 is defined as Hence, showing the claim holds for k = m+1 is equiv- alent to showing that (αµ+ , βµ ) is a stationary 1 X m m+1 µ (y) = µ (x), for y ∈ Ym+1. distribution of the augmented chain (up to (m + 1)-th m+1 K − m m x∈Ch(y) level), with Z α = m , (48) Again, µm+1 is obviously non-negative, and Zm+1 X 1 X X and µ (y) = µ (z) m+1 K − m m y∈Ym+1 y∈Ym+1 z∈Ch(y) Z − Z 1 b ··· b β = m+1 m = 0 m . (49) 1 X X Z Z f ··· f = µ (z) 1 = 1. m+1 m+1 1 m+1 K − m m z∈Ym y∈P a(x) According to Lemma 1, it suffices to check that this (43) distribution satisfies the cross-space detailed balance given in Eq.(2), which is not difficult to verify based Similar to the derivation for k = 1, here we apply the on the construction described in section 3.2. fact that |P a(x)| = K−m for each x ∈ Ym. This shows that µm+1 is a valid distribution over Ym. Moreover, In this proof, Claim 1 proves the ergodicity of the we have for each y ∈ Ym+1, joint chain. Claim 2 constructs a set of vectors µ0,..., µK , and states that they are valid distribu- −1 1 X K X tions over X,Y ,...,Y , and satisfy the properties µ (y) = µ(x) 0 K m+1 K − m m given in (S2). Claim 3 (induction up to k = K) shows x∈X:x z z∈Ch(y) that the distributions constructed in Claim 2 is ex- 1 (K − m)!m! X X = µ(x) actly a stationary distribution of the joint chain. Since K − m K! z∈Ch(y) x∈X:x z the chain is ergodic, this is the equilibrium distribu- tion. As a by product, Claim 3 also shows the the (K − m − 1)!m! X = (m + 1) µ(x) statement (S3) of the theorem. For (S1), it is auto- K! x∈X:x y matically established by the construction described in −1 Claim 2. Therefore, we can conclude that the proof of  K  X = µ(x). (44) the theorem has been completed. m + 1 x∈X:x y B.7 Proof of Corollary 2 By induction, we can conclude that the claim holds for each k = 1,...,K. Based on Theorem 4, we have Claim 3. When the construction is done up to the + K ∞ k-th level, the distribution µ (c µ , . . . , c µ ) 1 X b0 ··· bk−1 X 1 k , k,0 0 k,k k = 1 + ≤ 1 + κk = . (50) is a stationary of the augmented Markov chain. Here, α f1 ··· fk 1 − κ k=1 k=1 µ0,..., µk are given by Claim 2, and ck,0, . . . , ck,k is defined such that for each k0 = 0, . . . , k Hence, α ≥ 1 − κ. The proof is done.

1 1 b0 ··· bl−1 ck,0 = , ck,l = , (45) Zk Zk f1 ··· fl with k X b0 ··· bl−1 Zk = 1 + . (46) f1 ··· fl l=1

We are going to show this claim by induction. Note that µ0 = µ over X is a stationary distribution of + P. And µ0 = c0,0p0, thus c0,0 = 1. It immediately follows that the claim is true for k = 0. Suppose this