COMPSCI 532: Design and Analysis of Algorithms October 30, 2015 Lecture 17 Lecturer: Debmalya Panigrahi Scribe: Allen Xiao

1 Overview

In this lecture, we introduce semidefinite programming through the semidefinite programming algorithm for maximum cut. (Note: the lecture's diagrams are missing from these notes.)

2 Problem

Definition 1. For a graph $G = (V, E)$ with edge weights $w_e$, the maximum cut problem is to find a cut $S \subseteq V$ such that the total weight of edges across $(S, V \setminus S)$ is maximized.

Before we get into semidefinite programming, we will quickly examine several simple approximations for maximum cut.

2.1 Greedy algorithm

We can think of the cut as a partition of vertices $C = (S, V \setminus S)$ where $S \subseteq V$. We can move between different cuts by moving vertices across the cut, into or out of $S$. Moving $v$ across the cut swaps its cut edges with its non-cut edges, so the move increases the value of the cut exactly when the total weight of $v$'s non-cut edges exceeds the weight of its cut edges. If $v \in S$ (and similarly if $v \in V \setminus S$):
\[ \sum_{e \in \delta(v) \cap (S \times S)} w(e) > \sum_{e \in \delta(v) \cap (S \times (V \setminus S))} w(e) \]
These greedy moves give us a simple local search procedure:

1. Begin with an arbitrary cut (e.g. $S = \emptyset$).
2. While there is a greedy move that improves the cut value, perform it.

This procedure terminates since the cut value increases monotonically. But it is not guaranteed to terminate at an optimal cut: there exist instances with "local optima", cuts of suboptimal value where no single-vertex move improves the value.

To analyze the approximation ratio, observe that at a local optimum:
\[ \sum_{e \in \delta(v) \cap (S \times S)} w(e) \le \sum_{e \in \delta(v) \cap (S \times (V \setminus S))} w(e) \quad \forall v \in S \]
and similarly for $v \in V \setminus S$. We can find an equivalent form by adding the weight of $v$'s cut edges to both sides:
\[ \sum_{e \in \delta(v) \cap (S \times S)} w(e) + \sum_{e \in \delta(v) \cap (S \times (V \setminus S))} w(e) \le 2 \sum_{e \in \delta(v) \cap (S \times (V \setminus S))} w(e) \]

Simplifying, with $C$ denoting the set of cut edges:
\[ \sum_{e \in \delta(v)} w(e) \le 2 \sum_{e \in \delta(v) \cap C} w(e) \]
If we sum over all vertices:
\[ \sum_v \sum_{e \in \delta(v) \cap C} w(e) \ge \frac{1}{2} \sum_v \sum_{e \in \delta(v)} w(e) \]

The left hand side is exactly twice the value of the cut (each cut edge is counted once per endpoint), while the right hand side (sum over weighted vertex degrees) counts every edge twice:
\[ 2 w(C) \ge \frac{1}{2} \left( 2 \sum_{e \in E} w(e) \right) \]
Since OPT uses a subset of the edges:
\[ 2 w(C) \ge \sum_{e \in E} w(e) \ge \mathrm{OPT} \]
This greedy algorithm therefore has an approximation factor of 2. Many simple algorithms for maximum cut also achieve a 2-approximation, including the next two we discuss.
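The local search above can be sketched in Python. The graph representation (a dict from vertex pairs to weights) and the function name are our own choices for illustration:

```python
def greedy_local_search(n, weights):
    """Local search for maximum cut on vertices 0..n-1.

    weights: dict mapping an edge (u, v) with u < v to its nonnegative weight.
    Returns (S, cut_value) for a locally optimal cut.
    """
    side = [False] * n  # side[v] == True means v is in S; start from S = empty

    def gain(v):
        # Moving v swaps its cut edges with its non-cut edges, so the gain is
        # (weight of non-cut edges at v) - (weight of cut edges at v).
        cut = uncut = 0.0
        for (a, b), w in weights.items():
            if v in (a, b):
                if side[a] != side[b]:
                    cut += w
                else:
                    uncut += w
        return uncut - cut

    improved = True
    while improved:  # terminates: the cut value strictly increases each move
        improved = False
        for v in range(n):
            if gain(v) > 0:
                side[v] = not side[v]
                improved = True

    value = sum(w for (u, v), w in weights.items() if side[u] != side[v])
    return {v for v in range(n) if side[v]}, value
```

On a unit-weight triangle this reaches a cut of value 2 (which is both locally and globally optimal); by the argument above, the value returned is always at least half the total edge weight.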

2.2 Naive

Instead of making informed choices about the graph, how well do we do when we simply assign vertices to $S$ or $V \setminus S$ independently and uniformly at random? That is, for each $v \in V$:
\[ \Pr(v \in S) = \Pr(v \in V \setminus S) = \frac{1}{2} \]
Then, for any edge $(v, w)$, the probability that it is a cut edge is the probability that its endpoints were assigned to opposite sides:
\[ \Pr((v, w) \in C) = \frac{1}{2} \]
The expected weight of the cut is:
\[ \mathbb{E}\left[ \sum_{e \in C} w(e) \right] = \sum_{e \in E} w(e) \Pr(e \in C) = \frac{1}{2} \sum_{e \in E} w(e) \ge \frac{1}{2} \mathrm{OPT} \]
Still a 2-approximation, in expectation.
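A quick empirical check of this expectation, as a sketch (the graph and trial count are chosen arbitrarily for illustration):

```python
import random

def random_cut_value(n, weights, rng):
    """Assign each vertex to S or V \\ S uniformly at random; return the cut weight."""
    side = [rng.random() < 0.5 for _ in range(n)]
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])

# On a unit-weight triangle the total edge weight is 3, so the expected cut
# weight is 3/2: each edge is cut with probability 1/2.
rng = random.Random(0)
triangle = {(0, 1): 1, (0, 2): 1, (1, 2): 1}
average = sum(random_cut_value(3, triangle, rng) for _ in range(20000)) / 20000
```

The sample average concentrates near 1.5, i.e. half the total weight, matching the calculation above.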

2.3 Alternate greedy algorithm

The final 2-approximation we examine is an attempt to improve on the greedy algorithm. In this version, we iteratively insert each vertex into the side of the partition that maximizes the weight of cut edges to its already-inserted neighbors. Let the order in which vertices are added be $v_1, \ldots, v_n$, and let $\delta(v_i, -) = \{(v_i, w) \in E \mid w \in \{v_1, v_2, \ldots, v_{i-1}\}\}$ be the set of edges between $v_i$ and the vertices added before it. A property similar to the one we derived for the first greedy algorithm holds here, but over $\delta(v, -)$:

\[ \sum_{e \in \delta(v,-) \cap C} w(e) \ge \frac{1}{2} \sum_{e \in \delta(v,-)} w(e) \]

Summing over v again gives us the 2-approximation:

\[ \sum_v \sum_{e \in \delta(v,-) \cap C} w(e) \ge \frac{1}{2} \sum_v \sum_{e \in \delta(v,-)} w(e) \]

The difference between these sums (over $\delta(v, -)$) and the previous ones (over $\delta(v)$) is that edges are no longer double counted: each edge $(v_i, v_j)$ with $i < j$ appears only in $\delta(v_j, -)$. Hence:
\[ w(C) \ge \frac{1}{2} \sum_{e \in E} w(e) \ge \frac{1}{2} \mathrm{OPT} \]
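The insertion rule can be sketched as follows (representation and tie-breaking are our own choices):

```python
def insertion_greedy_cut(n, weights):
    """Insert vertices 0..n-1 in order; place each on the side that cuts the
    larger share of its edges to already-placed vertices."""
    side = {}  # side[v] == True means v is in S
    for v in range(n):
        to_S = to_T = 0.0
        for (a, b), w in weights.items():
            u = b if a == v else a if b == v else None
            if u is not None and u in side:
                if side[u]:
                    to_S += w
                else:
                    to_T += w
        # Placing v opposite the heavier side cuts at least half of delta(v, -).
        side[v] = to_T >= to_S
    return sum(w for (u, v), w in weights.items() if side[u] != side[v])
```

Because every edge belongs to exactly one $\delta(v, -)$, at least half of the total edge weight ends up in the cut, matching the bound above.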

2.4 Maximum cut linear program

The following linear program for maximum cut still does no better than a 2-approximation. Naturally, we wish to maximize the weight of cut edges. How do we ensure that the edge set we pick forms a valid cut? First, observe that the edges of a valid cut form a bipartite graph. One of the core properties of bipartite graphs is that they have no odd cycles. The constraint we use is as follows: for every cycle $\Gamma$, the number of cut edges in the cycle should be even. We will express this the following way. For all $F \subseteq \Gamma$ with odd size:

1. F is missing a cut edge of Γ, or

2. At least one edge of F is not a cut edge.

We can express this the following way:
\[ \max \sum_{e \in E} w_e x_e \]
\[ \text{s.t.} \quad \sum_{e \in F} x_e - \sum_{e \in \Gamma \setminus F} x_e \le |F| - 1 \quad \forall \text{ cycles } \Gamma \text{ in } G,\ F \subseteq \Gamma,\ |F| \equiv 1 \pmod 2 \]

\[ 0 \le x_e \le 1 \quad \forall e \in E \]
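As a sanity check on these constraints, the helper below (our own, for illustration) enumerates the odd subsets $F$ of a single cycle's edges and tests each inequality against a candidate assignment of $x_e$ values:

```python
from itertools import combinations

def satisfies_cycle_constraints(cycle_x):
    """Check, for one cycle, every constraint of the form
        sum_{e in F} x_e - sum_{e in cycle \\ F} x_e <= |F| - 1
    over subsets F of the cycle's edges with |F| odd.

    cycle_x: list of x_e values, one per edge of the cycle, in order.
    """
    m = len(cycle_x)
    total = sum(cycle_x)
    for k in range(1, m + 1, 2):  # odd sizes |F| = 1, 3, 5, ...
        for F in combinations(range(m), k):
            in_F = sum(cycle_x[i] for i in F)
            if in_F - (total - in_F) > k - 1:
                return False
    return True
```

For a triangle, any labeling that cuts an even number of edges passes, while cutting all three edges violates the constraint with $F = \Gamma$: $3 - 0 > 3 - 1$.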

2.5 Relaxation into a quadratic program

The algorithm we will introduce for beating the factor of 2 uses something called semidefinite programming. This is a very slight relaxation of linear programming, and a class of non-linear optimization problems for which we have efficient algorithms (interior point methods, for example).

We return to the maximum cut problem. Suppose we relax the linearity constraint, and represent the cut as a partition of the vertices instead:
\[ y_v = \begin{cases} +1 & v \in S \\ -1 & v \in V \setminus S \end{cases} \]

If we let ourselves take products between the $y_v$, the program for maximum cut is quite simple:
\[ \max \sum_{(u,v) \in E} w_{uv} \left( \frac{1 - y_u y_v}{2} \right) \]
\[ \text{s.t.} \quad y_v \in \{\pm 1\} \quad \forall v \in V \]

Here, the function $(1 - y_u y_v)/2$ is a sort of parity function (XOR) between $y_u$ and $y_v$:
\[ \frac{1 - y_u y_v}{2} = \begin{cases} 1 & y_u = -y_v \\ 0 & y_u = y_v \end{cases} \]
Thus, the objective counts exactly the edges $(u, v)$ which are in the cut, as desired. What we have here, however, is a quadratic program. In general, quadratic programs are NP-hard to solve, but this problem can be relaxed into a semidefinite program, which we can solve efficiently.
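A tiny check that the quadratic objective really computes the cut weight (the graph and labelings below are illustrative):

```python
def quadratic_cut_objective(y, weights):
    """Evaluate sum_{(u,v) in E} w_uv * (1 - y_u * y_v) / 2 for y in {+1,-1}^n."""
    return sum(w * (1 - y[u] * y[v]) / 2 for (u, v), w in weights.items())

# On a 4-cycle, the alternating labeling cuts every edge, so the objective
# equals the total edge weight; the all-ones labeling cuts nothing.
weights = {(0, 1): 1, (1, 2): 2, (2, 3): 3, (0, 3): 4}
all_cut = quadratic_cut_objective([+1, -1, +1, -1], weights)
none_cut = quadratic_cut_objective([+1, +1, +1, +1], weights)
```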

3 Semidefinite Programming

We will construct a generic semidefinite program by starting from a linear program. Recall the matrix form of a linear program:
\[ \min \sum_{i=1}^n c_i x_i \]
\[ \text{s.t.} \quad \sum_{i=1}^n a_{ik} x_i \ge b_k \quad \forall k \]

\[ x_i \ge 0 \quad \forall i \]

Suppose that $x$ is a vector in $\mathbb{R}^{n^2}$ representing an $n \times n$ square matrix $X$. We can rewrite the program to index according to this matrix (by pairs $i, j$ instead of just $i$):
\[ \min \sum_{i,j} c_{ij} x_{ij} \]
\[ \text{s.t.} \quad \sum_{i,j} a_{ijk} x_{ij} \ge b_k \quad \forall k \]

\[ x_{ij} \ge 0 \quad \forall i, j \]

The nonnegativity constraint (xij ≥ 0) says that X may only have nonnegative entries. We will relax this constraint, instead saying that X must be positive semidefinite.

Definition 2. A square, symmetric matrix $X \in \mathbb{R}^{n \times n}$ is positive semidefinite (PSD) if all the eigenvalues of $X$ are nonnegative. Equivalently, $X$ can be decomposed so that each entry is the dot product of two vectors from $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^n$:
\[ x_{ij} = \vec{v}_i \cdot \vec{v}_j \quad \forall i, j \]
If $X$ is positive semidefinite, then by the decomposition definition there is some set of vectors $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^n$ with which we can write the program as:
\[ \min \sum_{i,j} c_{ij} (\vec{v}_i \cdot \vec{v}_j) \]
\[ \text{s.t.} \quad \sum_{i,j} a_{ijk} (\vec{v}_i \cdot \vec{v}_j) \ge b_k \quad \forall k \]

Definition 3. A semidefinite program (SDP) is an optimization problem of the form:
\[ \min_{\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^n} \sum_{i,j} c_{ij} (\vec{v}_i \cdot \vec{v}_j) \]
\[ \text{s.t.} \quad \sum_{i,j} a_{ijk} (\vec{v}_i \cdot \vec{v}_j) \ge b_k \quad \forall k \]
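The Gram-matrix view of Definition 2 can be sketched directly: build $X$ from a set of vectors and confirm that the quadratic form $z^\top X z = \|\sum_i z_i \vec{v}_i\|^2$ is never negative. The vectors and test points below are chosen at random purely for illustration:

```python
import random

def gram_matrix(vectors):
    """X with x_ij = v_i . v_j; such a matrix is symmetric and PSD."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return [[dot(vi, vj) for vj in vectors] for vi in vectors]

def quadratic_form(X, z):
    """z^T X z, which is nonnegative for every z exactly when X is PSD."""
    n = len(X)
    return sum(z[i] * X[i][j] * z[j] for i in range(n) for j in range(n))

rng = random.Random(1)
vectors = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]
X = gram_matrix(vectors)
results = [quadratic_form(X, [rng.uniform(-1, 1) for _ in range(4)])
           for _ in range(100)]
```

Every sampled quadratic form comes out nonnegative (up to floating point), and $X$ is symmetric by construction.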

3.1 Maximum cut with SDP

The following maximum cut algorithm and analysis is due to Goemans and Williamson [GW95]. Recall the quadratic maximum cut program:
\[ \max \sum_{(u,v) \in E} w_{uv} \left( \frac{1 - y_u y_v}{2} \right) \]
\[ \text{s.t.} \quad y_v \in \{\pm 1\} \quad \forall v \in V \]
To mold this into an SDP, we will need to make two changes:

1. Convert each $y_i$ into an $n$-dimensional vector $\vec{y}_i$. We do this in the most naive way possible:
\[ \vec{y}_i \in \left\{ (+1, 0, \ldots, 0)^\top,\ (-1, 0, \ldots, 0)^\top \right\} \]

2. Make a fractional relaxation: instead of restricting $\vec{y}_i$ to these two vectors, we only require $\vec{y}_i$ to be a unit vector in $\mathbb{R}^n$:
\[ \vec{y}_i \cdot \vec{y}_i = 1 \]

After these modifications, the $\vec{y}_i$ are points on the $n$-dimensional unit sphere. The final SDP is:
\[ \max \sum_{(i,j) \in E} w_{ij} \left( \frac{1 - \vec{y}_i \cdot \vec{y}_j}{2} \right) \]
\[ \text{s.t.} \quad \vec{y}_i \cdot \vec{y}_i = 1 \quad \forall i \in V \]
\[ \vec{y}_i \in \mathbb{R}^n \quad \forall i \in V \]

As with linear programs, we must round fractional solutions of the SDP back into integral ones. Here, we must partition $V$, sending each vertex to $S$ or $V \setminus S$. Each $\vec{y}_i$ is a point on the unit sphere in $\mathbb{R}^n$; we will round by picking a random hyperplane through the origin, which partitions the sphere into two hemispheres. Let $\vec{r}$, the normal of this hyperplane, be a uniformly random point on the sphere:
\[ i \in S \iff \vec{y}_i \cdot \vec{r} \ge 0, \qquad i \in V \setminus S \iff \vec{y}_i \cdot \vec{r} < 0 \]
An edge $(i, j)$ ends up in the cut if its endpoints land on opposite sides of the hyperplane induced by $\vec{r}$.

Finding a uniformly random direction $\vec{r} \in \mathbb{R}^n$ is a problem known as hypersphere point picking, which we will not discuss in detail in this note. It is insufficient to choose each of the $n$ coordinates independently and uniformly, or even to use independent samples of the polar coordinates (neither is uniform on the sphere).
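A sketch of the rounding step. One standard way to get a uniformly random direction (consistent with the caveat above) is to draw each coordinate from an independent standard Gaussian: the Gaussian is rotationally symmetric, so the resulting vector points in a uniformly random direction. The function name is our own:

```python
import random

def hyperplane_round(ys, rng):
    """Round unit vectors ys[i] to a cut (the set S) via a random hyperplane.

    The coordinates of r are i.i.d. standard Gaussians, so r points in a
    uniformly random direction; only the signs of the dot products matter.
    """
    n = len(ys[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(n)]
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return {i for i, y in enumerate(ys) if dot(y, r) >= 0}
```

Two sanity checks: antipodal vectors are separated by (almost) every hyperplane, while identical vectors never are.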

3.2 Maximum cut SDP analysis

Suppose we have an edge $(i, j)$. To analyze the approximation ratio, we will compare its contribution to the SDP objective versus its expected contribution after rounding. Let $\theta \in [0, \pi]$ be the angle between $\vec{y}_i$ and $\vec{y}_j$ in the SDP fractional solution. The SDP contribution is:
\[ w_{ij} \left( \frac{1 - \vec{y}_i \cdot \vec{y}_j}{2} \right) = w_{ij} \left( \frac{1 - \|\vec{y}_i\| \|\vec{y}_j\| \cos \theta}{2} \right) = w_{ij} \left( \frac{1 - \cos \theta}{2} \right) \]

After rounding, the probability that $\vec{y}_i$ and $\vec{y}_j$ are separated by the hyperplane is proportional to the angle between them:
\[ w_{ij} \Pr((i,j) \in C) = w_{ij} \frac{\theta}{\pi} \]
On this single edge, the gap between the rounded solution and the SDP solution is the ratio of the right hand sides. The worst case over $0 \le \theta \le \pi$ can be found numerically:
\[ \min_{0 \le \theta \le \pi} \frac{\theta/\pi}{(1 - \cos \theta)/2} \approx 0.878 \]
Summing over all edges, the expected approximation ratio is:
\[ \mathbb{E}[\mathrm{ALGO}] \ge 0.878 \cdot \mathrm{SDP} \ge 0.878 \cdot \mathrm{OPT} \]
This algorithm can also be derandomized with the same approximation ratio.
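The 0.878 constant can be reproduced with a direct numerical minimization over a grid (the grid resolution here is arbitrary):

```python
import math

def gw_ratio(theta):
    """Ratio of the rounded contribution theta/pi to the SDP contribution
    (1 - cos theta)/2 for a single edge at angle theta."""
    return (theta / math.pi) / ((1 - math.cos(theta)) / 2)

# The ratio blows up as theta -> 0 and equals 1 at theta = pi; the minimum
# sits in between, at roughly theta = 2.33.
alpha = min(gw_ratio(k * math.pi / 100000) for k in range(1, 100001))
```

The grid minimum lands at approximately 0.8786, the Goemans-Williamson constant.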

References

[GW95] Michel X. Goemans and David P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115–1145, 1995.
