
Counting and Sampling Winter 2020 Lecture 11: Log Concavity and Convex Programming Lecturer: Shayan Oveis Gharan Feb 21st

Disclaimer: These notes have not been subjected to the usual scrutiny reserved for formal publications.

In this lecture we see applications of (completely) log-concave polynomials in convex optimization and approximate counting.

11.1 Log Concavity and Counting

We say a probability distribution $\mu : 2^{[n]} \to \mathbb{R}_{\ge 0}$ is (completely) log-concave if its generating polynomial $g_\mu$ is (completely) log-concave. Recall that the entropy of $\mu$ is defined as
\[ H(\mu) = \sum_{S} \mu(S) \log \frac{1}{\mu(S)}. \]

The following theorem is the main result of this section:

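Theorem 11.1. Let $\mu : 2^{[n]} \to \mathbb{R}_{\ge 0}$ be a homogeneous log-concave probability distribution. For $1 \le i \le n$, let $\mu_i := \mathbb{P}_{S \sim \mu}[i \in S]$. Then,

\[ \sum_{i=1}^n \mu_i \log \frac{1}{\mu_i} \;\le\; H(\mu) \;\le\; \sum_{i=1}^n \left( \mu_i \log \frac{1}{\mu_i} + (1-\mu_i) \log \frac{1}{1-\mu_i} \right). \]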

Observe that the right inequality follows from the sub-additivity of entropy and holds for any probability distribution $\mu$: let $X_1, \dots, X_n$ be random variables where $X_i$ is the indicator of the event $i \in S$ for $S \sim \mu$. Then,

\[ H(\mu) = H(X_1) + H(X_2 \mid X_1) + \dots + H(X_n \mid X_1, \dots, X_{n-1}) \le H(X_1) + H(X_2) + \dots + H(X_n) = \sum_{i=1}^n \left( \mu_i \log \frac{1}{\mu_i} + (1-\mu_i) \log \frac{1}{1-\mu_i} \right). \]

So, we get an equality above if $\mu$ is a product of independent Bernoulli distributions where the success probability of the $i$-th Bernoulli is $\mu_i$. So, the main non-trivial part is the proof of the left inequality.

Proof. The proof follows from log-concavity and Jensen’s inequality. First, let us recall Jensen’s inequality: let $\mu$ be a probability distribution on $2^{[n]}$, let $x : 2^{[n]} \to \mathbb{R}^n$ be a map that associates a vector $x_S$ to every set $S \subseteq [n]$, and let $f : \mathbb{R}^n \to \mathbb{R}$ be a concave function. Then
\[ \sum_{S} \mu(S) f(x_S) \le f\Big( \sum_{S} \mu(S)\, x_S \Big). \tag{11.1} \]


To use the above inequality we let $\mu$ be the given log-concave probability distribution. So, we just need to define $x_S = ((x_S)_1, \dots, (x_S)_n)$ for any set $S$. For any set $S \subseteq [n]$ let
\[ (x_S)_i = \begin{cases} \dfrac{1}{\mu_i} & \text{if } i \in S, \\ 0 & \text{otherwise.} \end{cases} \]

Also, for a vector $x = (x_1, \dots, x_n)$ define

\[ f(x) = \log g_\mu(x) = \log \sum_{S} \mu(S)\, x^S. \]
Firstly, we need to evaluate each side of (11.1). For a set $T \subseteq [n]$ observe that

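\begin{align*}
f(x_T) &= \log \sum_{S} \mu(S)\, (x_T)^S \\
&= \log \sum_{S} \mu(S) \prod_{i \in S} \frac{1}{\mu_i}\, \mathbb{I}[i \in T] \\
&= \log \Big( \mu(T) \prod_{i \in T} \frac{1}{\mu_i} \Big),
\end{align*}
where the last equality uses homogeneity: for $T$ in the support of $\mu$, the only set $S \subseteq T$ with $\mu(S) > 0$ is $S = T$. Therefore, we can rewrite the LHS of (11.1) as follows: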

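\begin{align*}
\sum_{S} \mu(S) f(x_S) &= \sum_{S} \mu(S) \log \mu(S) + \sum_{S} \mu(S) \sum_{i \in S} \log \frac{1}{\mu_i} \\
&= -H(\mu) + \sum_{i=1}^n \log \frac{1}{\mu_i} \sum_{S : i \in S} \mu(S) \\
&= \sum_{i=1}^n \mu_i \log \frac{1}{\mu_i} - H(\mu).
\end{align*}
To finish the proof of the theorem it is enough to show that the RHS of (11.1) is 0. First, let us evaluate $\sum_S \mu(S)\, x_S$. For a coordinate $i$,
\[ \sum_{S} \mu(S)\, (x_S)_i = \sum_{S : i \in S} \mu(S) \cdot \frac{1}{\mu_i} = 1. \]
Therefore, the RHS of (11.1) is exactly
\[ f(1, 1, \dots, 1) = \log \sum_{S} \mu(S) = \log 1 = 0 \]
as desired.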
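As a sanity check of Theorem 11.1, the following minimal sketch evaluates both bounds by brute force for the uniform distribution over spanning trees of $K_4$ (bases of a graphic matroid). The helper names `spanning_trees` and `entropy_bounds` are hypothetical and chosen only for this illustration; the enumeration is exponential and is meant for tiny examples only.

```python
import math
from itertools import combinations


def spanning_trees(n, edges):
    """Enumerate spanning trees of a graph on vertices 0..n-1 by brute force."""
    trees = []
    for subset in combinations(range(len(edges)), n - 1):
        parent = list(range(n))

        def find(v):
            # Union-find lookup with path halving.
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        acyclic = True
        for idx in subset:
            u, v = edges[idx]
            ru, rv = find(u), find(v)
            if ru == rv:  # this edge would close a cycle
                acyclic = False
                break
            parent[ru] = rv
        if acyclic:  # n-1 acyclic edges form a spanning tree
            trees.append(subset)
    return trees


def entropy_bounds(bases, m):
    """Return (lower bound, H(mu), upper bound) for mu uniform over `bases`."""
    count = len(bases)
    marginals = [sum(1 for b in bases if e in b) / count for e in range(m)]
    h = math.log(count)
    lower = sum(p * math.log(1 / p) for p in marginals if p > 0)
    upper = lower + sum((1 - p) * math.log(1 / (1 - p)) for p in marginals if p < 1)
    return lower, h, upper


k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
trees = spanning_trees(4, k4_edges)
lower, h, upper = entropy_bounds(trees, len(k4_edges))
print(len(trees), lower, h, upper)
```

By symmetry every edge of $K_4$ has marginal $1/2$, so the lower bound is $3 \log 2$, the entropy is $\log 16 = 4 \log 2$, and the upper bound is $6 \log 2$.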

Note that if $\mu$ is the uniform distribution over the bases of a matroid $M = ([n], \mathcal{I})$ we can also show

\[ H(\mu) \ge \sum_{i=1}^n (1-\mu_i) \log \frac{1}{1-\mu_i}. \]
The idea is to work with the dual of $M$. Given a matroid $M = ([n], \mathcal{I})$, the dual, $M^*$, is a matroid on the same ground set $[n]$ where $B$ is a base of $M^*$ iff the complement $[n] \setminus B$ is a base of $M$. So, if $M$ has rank $r$, then $M^*$ has rank $n - r$. Furthermore, we can write

\[ g_{M^*}(z_1, \dots, z_n) = z_1 \cdots z_n \, g_M(1/z_1, \dots, 1/z_n). \]

It is well-known that for any matroid $M$, its dual is also a matroid, so $g_{M^*}$ is also a completely log-concave polynomial. It then follows that the marginals of the distribution $\mu^*$ uniform over the bases of $M^*$ are $1 - \mu_1, \dots, 1 - \mu_n$, while $H(\mu^*) = H(\mu)$. Using these two facts (apply the left inequality of Theorem 11.1 to both $\mu$ and $\mu^*$ and add the results), it follows that $H(\mu)$ can be approximated within a multiplicative factor 2 by
\[ \sum_{i=1}^n \mu_i \log \frac{1}{\mu_i} + \sum_{i=1}^n (1-\mu_i) \log \frac{1}{1-\mu_i}. \]
This shows that uniform distributions over bases of a matroid are not “too far” from independent distributions.
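The duality argument can be checked on a small example. The sketch below assumes a matroid given explicitly by its list of bases (here the spanning trees of a triangle on edges 0, 1, 2 with a pendant edge 3); the helper names `marginals` and `plogp` are hypothetical. It confirms that the dual has marginals $1 - \mu_i$ and that the sum of the two bounds lies between $H(\mu)$ and $2H(\mu)$.

```python
import math


def marginals(bases, n):
    """Marginal probability of each element under the uniform distribution on bases."""
    return [sum(1 for b in bases if i in b) / len(bases) for i in range(n)]


def plogp(p):
    """p * log(1/p) with the convention 0 * log(1/0) = 0."""
    return 0.0 if p <= 0.0 else p * math.log(1.0 / p)


n = 4
# Bases of the graphic matroid of a triangle (edges 0, 1, 2) plus a pendant edge 3.
bases = [frozenset(b) for b in ({0, 1, 3}, {0, 2, 3}, {1, 2, 3})]
ground = frozenset(range(n))
dual_bases = [ground - b for b in bases]  # complements are the bases of the dual

mu = marginals(bases, n)
mu_dual = marginals(dual_bases, n)
h = math.log(len(bases))                 # H(mu) = H(mu*) = log(number of bases)
a = sum(plogp(p) for p in mu)            # lower bound from mu
b = sum(plogp(1 - p) for p in mu)        # lower bound from the dual mu*
print(mu, mu_dual, h, a, b)
```

In this example the dual bound is tight, since the dual is a rank-1 uniform matroid on three elements plus a loop.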

11.2 An Application in Combinatorics

Theorem 11.2. Let $M = ([n], \mathcal{I})$ be a matroid of rank $r$. Suppose that $M$ has $k$ disjoint bases $B_1, \dots, B_k$, i.e., for any $i \ne j$, $B_i \cap B_j = \emptyset$. Then, $M$ has at least $k^r$ many bases.

Proof. Let µ be the uniform distribution over the bases of M. Then, H(µ) = log #B. So, we just need to prove H(µ) ≥ r log k.

Consider the vector $x = \frac{1}{k}(1_{B_1} + \dots + 1_{B_k})$. We claim that there is a probability distribution $\nu$ with marginals $x$ such that $g_\nu$ is log-concave. Since $x \in \mathrm{Newt}(g_\mu)$, by Theorem 5.2, there is an external field $(\lambda_1, \dots, \lambda_n) \ge 0$ such that the marginals of $\lambda * \mu$ are equal to $x$. Since $g_\mu$ is a log-concave polynomial, and log-concave polynomials are closed under external fields, $g_{\lambda * \mu}$ is also log-concave. So, we can let $\nu = \lambda * \mu$.

To be precise, if $x$ is not in the interior of the Newton polytope of $g_\mu$, we can find a sequence of external field vectors $\lambda^1, \lambda^2, \dots$ such that the marginals of $\lambda^t * \mu$ converge to $x$ as $t \to \infty$. In such a case, we let $\nu = \lim_{t \to \infty} \lambda^t * \mu$. By the closure of log-concave polynomials under taking limits, $g_\nu$ is log-concave.

Since gν is log-concave, by Theorem 11.1,

\[ H(\nu) \ge \sum_{i=1}^n \nu_i \log \frac{1}{\nu_i} = \sum_{i \in B_1 \cup \dots \cup B_k} \frac{1}{k} \log k = r \log k. \]

Finally, the theorem follows from the fact that µ has the largest possible entropy among all distributions supported over the bases of M. So, H(µ) ≥ H(ν) ≥ r log k as desired.

As an immediate application, a complete graph $G$ with $n$ vertices has at least $n/2 - 1$ edge-disjoint spanning trees, which are disjoint bases of the graphic matroid of $G$ of rank $n - 1$. So, the above theorem implies that a complete graph has at least $(n/2 - 1)^{n-1}$ many spanning trees.
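To see how this lower bound compares with the truth, the following sketch counts the spanning trees of $K_n$ exactly with the matrix-tree theorem (the determinant of the Laplacian with one row and the matching column removed), using exact rational arithmetic. The function name `count_spanning_trees` is hypothetical.

```python
from fractions import Fraction


def count_spanning_trees(n, edges):
    """Matrix-tree theorem: determinant of the Laplacian minus its last row and column."""
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    size = n - 1
    mat = [row[:size] for row in lap[:size]]
    det = Fraction(1)
    # Exact Gaussian elimination; the determinant is the signed product of pivots.
    for col in range(size):
        pivot = next((r for r in range(col, size) if mat[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            mat[col], mat[pivot] = mat[pivot], mat[col]
            det = -det
        det *= mat[col][col]
        for r in range(col + 1, size):
            factor = mat[r][col] / mat[col][col]
            for c in range(col, size):
                mat[r][c] -= factor * mat[col][c]
    return int(det)


n = 8
complete_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
exact = count_spanning_trees(n, complete_edges)
bound = (n // 2 - 1) ** (n - 1)
print(exact, bound)
```

For $n = 8$ the exact count agrees with Cayley's formula $n^{n-2} = 8^6$, while the bound from Theorem 11.2 is $3^7$.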

11.3 Counting the number of Bases of a Matroid

One of the fundamental open problems in the field of counting is counting the number of bases of a general matroid. Note that in some special cases we can do this exactly. For example, if $M$ is a graphic matroid, where for a given connected graph $G = (V, E)$ the set $E$ is the ground set of $M$ and a set $S \subseteq E$ is independent if it does not contain a cycle, then the bases correspond to the spanning trees of $G$, and we can exactly count the spanning trees of any given graph by computing the determinant of the Laplacian matrix of $G$ with one row and the corresponding column removed (the matrix-tree theorem). In the next lectures, we will see how to use Markov chains to approximately sample (and count) bases of a matroid. In this section, we give a deterministic algorithm for this task:

Theorem 11.3 (AOV17). There is a deterministic polynomial time algorithm that, for any matroid $M = (E, \mathcal{I})$ given by oracle access to $\mathcal{I}$, gives an $e^r$ approximation to the number of bases of $M$, where $r$ is the rank of $M$.

Let $\mu$ be the uniform distribution over all bases of $M$. Firstly, observe that $H(\mu) = \log \#\mathcal{B}$. Therefore, by Theorem 11.1, $e^{\sum_{i=1}^n \mu_i \log \frac{1}{\mu_i}}$ gives an approximation of the number of bases up to a multiplicative error of $e^{\sum_{i=1}^n (1-\mu_i) \log \frac{1}{1-\mu_i}}$. So, let us upper bound the latter quantity. Observe that

\[ e^{\sum_{i=1}^n (1-\mu_i) \log \frac{1}{1-\mu_i}} = \prod_{i=1}^n (1-\mu_i)^{-(1-\mu_i)} \le \prod_{i=1}^n e^{\mu_i} = e^r. \tag{11.2} \]
The inequality follows from the fact that

\[ (1-x)^{-(1-x)} \le e^x, \quad \forall\, 0 \le x \le 1, \tag{11.3} \]

and the last identity follows by linearity of expectation: $\sum_{i=1}^n \mu_i = \mathbb{E}_{S \sim \mu} |S| = r$.
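Inequality (11.3) is used here without proof. One short way to see it, given as a sketch, is to take logarithms and apply the elementary bound $\log u \le u - 1$ for $u > 0$:

```latex
% For 0 \le x < 1 put t = 1 - x \in (0, 1].
\begin{align*}
\log\left( (1-x)^{-(1-x)} \right)
  &= t \log\frac{1}{t} \\
  &\le t \left( \frac{1}{t} - 1 \right)
     && \text{using } \log u \le u - 1 \text{ with } u = 1/t \\
  &= 1 - t = x .
\end{align*}
% Exponentiating gives (1-x)^{-(1-x)} \le e^{x}.
% At x = 1 the left side equals 1 under the convention 0^0 = 1, so the bound still holds.
```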

Therefore, to prove Theorem 11.3 all we need to do is to compute $\mu_1, \dots, \mu_n$. But the natural way to compute the marginal probabilities is to compute the partition function. So, it seems that we haven’t made any progress. We claim that the optimum value of the following convex program gives an additive $O(r)$ approximation to $H(\mu)$:
\[ \begin{aligned} \max \quad & \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i} + \sum_{i=1}^n (1-\alpha_i) \log \frac{1}{1-\alpha_i} \\ \text{s.t.} \quad & \alpha \in P(M). \end{aligned} \tag{11.4} \]

Here, $P(M) = \mathrm{Newt}(g_M)$ is the matroid base polytope, i.e., it is the convex hull of the indicator vectors of all bases of $M$. Note that the objective function of this convex program is concave, so we can solve this program in polynomial time. Furthermore, as we discussed

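Let $\alpha$ be an optimum solution. Observe that the true marginal probabilities $\mu_1, \dots, \mu_n$ form a feasible solution to the above program, since the vector of marginals is a convex combination of indicator vectors of bases. Therefore,

\[ \sum_{i=1}^n \left( \alpha_i \log \frac{1}{\alpha_i} + (1-\alpha_i) \log \frac{1}{1-\alpha_i} \right) \ge \sum_{i=1}^n \left( \mu_i \log \frac{1}{\mu_i} + (1-\mu_i) \log \frac{1}{1-\mu_i} \right) \ge H(\mu). \tag{11.5} \]
So, to prove Theorem 11.3 it is enough to show that the optimum of the above program is at most $H(\mu) + O(r)$.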
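The notes do not fix a particular method for solving program (11.4). As one possible illustration, the sketch below uses the Frank-Wolfe method (our choice for this example, not taken from the lecture), whose linear maximization step over $P(M)$ is exactly the greedy algorithm with an independence oracle. All function names, the iteration count, and the clipping constant `eps` are assumptions made for this example.

```python
import math


def binary_entropy(a):
    """h(a) = a log(1/a) + (1-a) log(1/(1-a)), with 0 log(1/0) = 0."""
    total = 0.0
    for p in (a, 1.0 - a):
        if p > 0.0:
            total += p * math.log(1.0 / p)
    return total


def max_weight_base(n, is_independent, weights):
    """Greedy algorithm: scan elements by decreasing weight, keep those preserving independence."""
    base = []
    for i in sorted(range(n), key=lambda j: -weights[j]):
        if is_independent(base + [i]):
            base.append(i)
    return base


def max_entropy_over_base_polytope(n, is_independent, iterations=2000, eps=1e-12):
    """Frank-Wolfe sketch for program (11.4); returns (alpha, objective value)."""
    start = max_weight_base(n, is_independent, [0.0] * n)
    alpha = [1.0 if i in start else 0.0 for i in range(n)]
    for t in range(iterations):
        # The gradient of sum_i h(alpha_i) is log((1 - alpha_i) / alpha_i).
        # It blows up at 0 and 1, so we clip alpha before evaluating it (an assumption).
        clipped = [min(max(a, eps), 1.0 - eps) for a in alpha]
        grad = [math.log((1.0 - a) / a) for a in clipped]
        vertex = set(max_weight_base(n, is_independent, grad))
        step = 2.0 / (t + 2.0)  # standard Frank-Wolfe step size
        alpha = [(1.0 - step) * a + step * (1.0 if i in vertex else 0.0)
                 for i, a in enumerate(alpha)]
    return alpha, sum(binary_entropy(a) for a in alpha)


# Uniform matroid U(2, 4): a set is independent iff it has at most 2 elements.
alpha, value = max_entropy_over_base_polytope(4, lambda s: len(s) <= 2)
print(alpha, value)
```

For the uniform matroid of rank 2 on 4 elements the optimum is $\alpha = (1/2, 1/2, 1/2, 1/2)$ by symmetry, with value $4 \log 2$, which indeed lies between $H(\mu) = \log 6$ and $H(\mu) + r$.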

Since $\alpha$ is in $\mathrm{Newt}(g_\mu)$, similar to the proof of Theorem 11.2, there is a log-concave distribution $\nu : 2^{[n]} \to \mathbb{R}_{\ge 0}$ with marginals $\alpha$ (again by Theorem 5.2, there is an external field $\lambda = (\lambda_1, \dots, \lambda_n)$ such that $\lambda * \mu$ has marginals $\alpha$).

Since $g_\nu$ is log-concave and has marginals $\alpha$, by Theorem 11.1,

\[ H(\nu) \ge \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i}. \]
Since the uniform distribution over all bases has the largest possible entropy we also get $H(\mu) \ge H(\nu) \ge \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i}$. Finally, we can write

\[ \sum_{i=1}^n \left( \alpha_i \log \frac{1}{\alpha_i} + (1-\alpha_i) \log \frac{1}{1-\alpha_i} \right) \le H(\mu) + \sum_{i=1}^n (1-\alpha_i) \log \frac{1}{1-\alpha_i} \le H(\mu) + \sum_{i=1}^n \alpha_i = H(\mu) + r, \]
where the last inequality follows by (11.3). This together with (11.5) finishes the proof of Theorem 11.3.

11.4 Counting Common Bases of Two Matroids

Theorem 11.4. Let $M_1 = ([n], \mathcal{I}_1)$ and $M_2 = ([n], \mathcal{I}_2)$ be two matroids of rank $r$ given by independence oracles. There is a deterministic polynomial time algorithm that counts the number of common bases of $M_1, M_2$ up to a multiplicative factor of $c^r$, for some constant $c > 1$.

The algorithm is a natural extension of the algorithm in the previous section:

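\[ \begin{aligned} \max \quad & \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i} + \sum_{i=1}^n (1-\alpha_i) \log \frac{1}{1-\alpha_i} \\ \text{s.t.} \quad & \alpha \in P(M_1), \\ & \alpha \in P(M_2). \end{aligned} \tag{11.6} \]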

We will see that the optimum value of the above program gives an additive $O(r)$ approximation to $\log |\mathcal{B}(M_1) \cap \mathcal{B}(M_2)|$. As a consequence, it follows that if there are $k$ disjoint bases in the intersection of $M_1, M_2$, then $|\mathcal{B}(M_1) \cap \mathcal{B}(M_2)| \ge (k/c)^r$ for some constant $c > 1$ (independent of $k, r$). This can be seen as a generalization of the van der Waerden conjecture to matroid intersection.

Next, we discuss the main ideas of the proof of Theorem 11.4. Let $\alpha$ be an optimum solution of the convex program (11.6). Let $\mu$ be the uniform distribution over the bases in $\mathcal{B}(M_1) \cap \mathcal{B}(M_2)$. Then, $\mu_1, \dots, \mu_n$ is a feasible solution to the above program. So, by sub-additivity of entropy, $H(\mu) \le \sum_{i=1}^n H(\alpha_i)$, where $H(\alpha_i) = \alpha_i \log \frac{1}{\alpha_i} + (1-\alpha_i) \log \frac{1}{1-\alpha_i}$. So, the main non-trivial part is to show that

\[ H(\mu) \ge \sum_{i=1}^n H(\alpha_i) - O(r). \tag{11.7} \]

The following theorem, which can be seen as a generalization of Gurvits’ machinery to completely log-concave polynomials, is the main technical part of the proof:

Theorem 11.5. Let $p \in \mathbb{R}[y_1, \dots, y_n, z_1, \dots, z_n]$ be a multilinear completely log-concave polynomial. Then, for any $\alpha \in [0, 1]^n$,
\[ \prod_{i=1}^n (\partial_{y_i} + \partial_{z_i})\, p \,\Big|_{y=z=0} \ge \Phi(\alpha) \inf_{y, z > 0} \frac{p(y, z)}{y^\alpha z^{1-\alpha}}, \]
where $\Phi(\alpha)$ is defined as follows:
\[ \Phi(\alpha) = (\alpha/e^2)^\alpha = \prod_{i=1}^n \left( \frac{\alpha_i}{e^2} \right)^{\alpha_i}. \]

We do not discuss the proof of the above theorem here. The proof is a generalization of Theorem 6.1. We can follow an inductive proof similar to Gurvits’ theorem. The main non-trivial part is the base case where p is a bi-variate polynomial. There the proof mainly follows from Problem 4 of HW2, i.e., that a multilinear bivariate polynomial a + by + cz + dyz is completely log-concave iff 2bc ≥ ad.

To use the above theorem, let $g_{M_1}, g_{M_2}$ be the generating polynomials of matroids $M_1, M_2$ respectively. Recall that $M_2^*$ is the dual of the matroid $M_2$. Having defined the above quantities, we let

\[ p(y_1, \dots, y_n, z_1, \dots, z_n) = g_{M_1}(y_1, \dots, y_n)\, g_{M_2^*}(z_1, \dots, z_n). \]

Crucially observe that
\[ |\mathcal{B}(M_1) \cap \mathcal{B}(M_2)| = \prod_{i=1}^n (\partial_{y_i} + \partial_{z_i})\, p \,\Big|_{y=z=0}. \]
This is because $\prod_{i=1}^n (\partial_{y_i} + \partial_{z_i})$ zeros out any monomial of $p$ where, for some $i$, the sum of the degrees of $y_i, z_i$ is not 1. Now, to invoke Theorem 11.5 we need to show that $p$ is completely log-concave. This is a non-trivial fact, and it follows from the following lemma:

Lemma 11.6. For any two homogeneous completely log-concave polynomials p, q ∈ R[z1, . . . , zn], p·q is also completely log-concave.
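The identity expressing $|\mathcal{B}(M_1) \cap \mathcal{B}(M_2)|$ through the operator $\prod_i (\partial_{y_i} + \partial_{z_i})$ can be verified mechanically on small instances. The sketch below assumes matroids given by explicit lists of bases and stores the multilinear polynomial $p$ as a dictionary from pairs (set of $y$-variables, set of $z$-variables) to coefficients; the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def apply_partial_sum(poly, i):
    """Apply (d/dy_i + d/dz_i) to a multilinear polynomial stored as {(Y, Z): coeff}."""
    out = defaultdict(int)
    for (ys, zs), c in poly.items():
        if i in ys:  # d/dy_i removes y_i from the monomial
            out[(ys - {i}, zs)] += c
        if i in zs:  # d/dz_i removes z_i from the monomial
            out[(ys, zs - {i})] += c
    return dict(out)


def common_bases_via_polynomial(n, bases1, bases2):
    """Evaluate prod_i (d/dy_i + d/dz_i) p at y = z = 0 for p = g_{M1}(y) * g_{M2*}(z)."""
    ground = frozenset(range(n))
    dual2 = [ground - b for b in bases2]
    poly = {(b1, b2): 1 for b1 in bases1 for b2 in dual2}
    for i in range(n):
        poly = apply_partial_sum(poly, i)
    # Setting y = z = 0 keeps only the constant term.
    return poly.get((frozenset(), frozenset()), 0)


# M1: uniform matroid U(2, 4). M2: partition matroid, one element from {0, 1} and one from {2, 3}.
bases1 = [frozenset(c) for c in combinations(range(4), 2)]
bases2 = [frozenset({a, b}) for a in (0, 1) for b in (2, 3)]
via_poly = common_bases_via_polynomial(4, bases1, bases2)
direct = len(set(bases1) & set(bases2))
print(via_poly, direct)
```

A monomial $y^{B_1} z^{B_2'}$ survives all $n$ derivatives and the evaluation at zero exactly when $B_1$ and $B_2'$ partition $[n]$, i.e., when $B_1$ is also a base of $M_2$.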

Following the above discussion, we have

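\begin{align*}
H(\mu) = \log |\mathcal{B}(M_1) \cap \mathcal{B}(M_2)| &\ge \log \left( \Phi(\alpha) \inf_{y, z > 0} \frac{g_{M_1}(y)\, g_{M_2^*}(z)}{y^\alpha z^{1-\alpha}} \right) \\
&\ge \log \Phi(\alpha) + 2 \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i} \\
&= \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i} - 2 \sum_{i=1}^n \alpha_i \\
&= \sum_{i=1}^n H(\alpha_i) - 2r - \sum_{i=1}^n (1-\alpha_i) \log \frac{1}{1-\alpha_i} \\
&\ge \sum_{i=1}^n H(\alpha_i) - 3r,
\end{align*}
where the first inequality follows by Theorem 11.5, the second inequality follows by the following fact applied to both $M_1$ and $M_2$ (note that substituting $z_i \to 1/z_i$ gives $\inf_{z > 0} g_{M_2^*}(z)/z^{1-\alpha} = \inf_{z > 0} g_{M_2}(z)/z^{\alpha}$), and the last inequality follows by (11.3). This proves (11.7) as desired.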

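Fact 11.7. For any $\alpha \in \mathrm{Newt}(g_{M_1})$,
\[ \log \inf_{y > 0} \frac{g_{M_1}(y)}{y^\alpha} \ge \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i}. \]

Proof. Let $\mu_1$ be the uniform distribution over bases of $M_1$. Recall that by Lecture 5, $\lambda = \operatorname{argmin}_{y > 0} \frac{g_{\mu_1}(y)}{y^\alpha}$ gives a distribution $\lambda * \mu_1$ with generating polynomial $g_{\lambda * \mu_1}(y_1, \dots, y_n) = g_{\mu_1}(\lambda_1 y_1, \dots, \lambda_n y_n)$ (up to normalization) with marginals $\alpha_1, \dots, \alpha_n$. But since $g_{\lambda * \mu_1}$ is completely log-concave, by Theorem 11.1, $H(\lambda * \mu_1) \ge \sum_{i=1}^n \alpha_i \log \frac{1}{\alpha_i}$. But on the other hand,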

11.5 Future Directions

One natural question is whether one can prove an analogue of Theorem 11.4 for matchings in general graphs. The following long-standing open problem was posed by Lovász and Plummer:

Conjecture 11.8. Let $G = (V, E)$ be a $k$-regular, $k$-edge-connected (general) graph with $n$ vertices. Then $G$ has at least $\Omega(k)^n$ many perfect matchings.

An analogous algorithmic question is whether there is a deterministic $c^n$-approximation algorithm for counting the number of matchings in general graphs. The question is whether we can prove some form of (complete) log-concavity that can be used to approximately count matchings in general graphs.