Finding almost-perfect graph bisections

Venkatesan Guruswami1 Yury Makarychev2 Prasad Raghavendra3 David Steurer4 Yuan Zhou5

1Computer Science Department, Carnegie Mellon University, Pittsburgh, PA. Some of this work was done during a visit to Microsoft Research New England. 2Toyota Technological Institute at Chicago, Chicago, IL. 3College of Computing, Georgia Institute of Technology, Atlanta, GA. 4Microsoft Research New England, Cambridge, MA. 5Computer Science Department, Carnegie Mellon University, Pittsburgh, PA. Some of this work was done when the author was a student intern at TTI-C. [email protected] [email protected] [email protected] [email protected] [email protected]

Abstract: We give a polynomial time algorithm that, given a graph which admits a bisection cutting a fraction (1 − ε) of edges, finds a bisection cutting a (1 − g(ε)) fraction of edges where g(ε) → 0 as ε → 0. One can take g(ε) = O(ε^{1/3} log(1/ε)). Previously known algorithms for Max Bisection could only guarantee finding a bisection that cuts a fraction of edges bounded away from 1 (in fact less than 3/4) in such graphs. The known Unique-Games hardness results for Max Cut imply that one cannot achieve g(ε) ≤ C√ε for some absolute constant C. Likewise, for the Min Bisection problem, if the graph has a bisection cutting at most an ε-fraction of the edges, our methods enable finding a bisection cutting at most an O(ε^{1/3} log(1/ε))-fraction of the edges. The algorithms can also find cuts that are nearly balanced (say with imbalance of at most 0.01) with value at least 1 − O(√ε) (for Max Bisection) and at most O(√ε) (for Min Bisection). These guarantees are optimal up to constant factors in the O(√ε) term under the Unique Games and related conjectures. Our general approach is simple to describe: First, we show how to solve the problems if the graph is an expander or if the graph consists of many disjoint components, each of small measure. Then, we show that every graph can be decomposed into disjoint components that are either expanders or have small measure, and how the cuts on different pieces may be merged appropriately.

Keywords: Graph bisection; Max-Cut; Expander Decomposition; Semidefinite Programming; Approximation Algorithms

1 Introduction

In the Max Cut problem, we are given a graph and the goal is to partition the vertices into two parts so that a maximum number of edges cross the cut. As one of the most basic problems in the class of constraint satisfaction problems, the study of Max Cut has been highly influential in advancing the subject of approximability of optimization problems, from both the algorithmic and hardness sides. The celebrated Goemans-Williamson (GW) algorithm for Max Cut [GW95] was the starting point for the immense and highly successful body of work on semidefinite programming (SDP) based approximation algorithms. This algorithm guarantees finding a cut whose value (i.e., fraction of edges crossing it) is at least 0.878 times the value of the maximum cut. On graphs that are "almost-bipartite" and admit a partition such that most edges cross the cut, the algorithm performs much better — in particular, if there is a cut such that a fraction (1 − ε) of edges cross the cut, then the algorithm finds a partition cutting at least a fraction 1 − O(√ε) of edges. (All through the discussion in the paper, think of ε as a very small positive constant.)

On the hardness side, the best known NP-hardness result shows hardness of approximating Max Cut within a factor greater than 16/17 [Hås01, TSSW00]. Much stronger and in fact tight inapproximability results are now known conditioned on the Unique Games Conjecture (UGC) of Khot [Kho02]. In fact, one of the original motivations and applications for the formulation of the UGC in [Kho02] was to show that finding a cut of value larger than 1 − o(√ε) in a graph with Max-Cut value (1 − ε) (i.e., improving substantially upon the above-mentioned performance guarantee of the GW algorithm) is likely to be hard. This result was strengthened in [KKMO07] to the optimal 1 − O(√ε) bound, and that paper also showed that the 0.878 approximation factor is the best possible under the UGC. (These results relied, in addition to the UGC, on the Majority is Stablest conjecture, which was proved shortly afterwards in [MOO10].)

There are many other works on Max Cut, including algorithms that improve on the GW algorithm for certain ranges of the optimum value and integrality gap constructions showing limitations of the SDP based approach. This long line of work on Max Cut culminated in the paper [OW08], which obtained the precise integrality gap and approximation threshold curve as a function of the optimum cut value.

Maximum Bisection. Let us consider a closely related problem called Max Bisection, which is Max Cut with a global "balanced cut" condition. In the Max Bisection problem, given as input a graph with an even number of vertices, the goal is to partition the vertices into two equal parts while maximizing the fraction of cut edges. Despite the close relation to Max Cut, the global constraint in Max Bisection changes its character substantially, and the known algorithms for approximating Max Bisection have weaker guarantees.

While Max Cut has a factor 0.878 approximation algorithm [GW95], the best known approximation factor for Max Bisection equals 0.7027 [FL06], improving on previous bounds of 0.6514 [FJ97], 0.699 [Ye01], and 0.7016 [HZ02]. In terms of inapproximability results, it is known that Max Bisection cannot be approximated to a factor larger than 15/16 unless NP ⊆ ∩_{γ>0} TIME(2^{n^γ}) [HK04]. Note that this hardness factor is slightly better than the inapproximability factor of 16/17 known for Max Cut [Hås01, TSSW00]. A simple approximation preserving reduction from Max Cut shows that Max Bisection is no easier to approximate than Max Cut (the reduction is simply to take two disjoint copies of the Max Cut instance). Therefore, the factor 0.878 Unique-Games hardness for Max Cut [KKMO07] also applies to Max Bisection. Further, given a graph that has a bisection cutting 1 − ε of the edges, it is Unique-Games hard to find a bisection (or in fact even any partition) cutting 1 − O(√ε) of the edges.

An intriguing question is whether Max Bisection is in fact harder to approximate than Max Cut (so the global condition really changes the complexity of the problem), or whether there are algorithms for Max Bisection that match (or at least approach) what is known for Max Cut.¹ None of the previously known algorithms for Max Bisection [FJ97, Ye01, HZ02, FL06] are guaranteed to find a bisection cutting most of the edges even when the graph has a near-perfect bisection cutting (1 − ε) of the edges (in particular, they may not even cut 75% of the edges). These algorithms are based on rounding a vector solution of a semidefinite programming relaxation into a cut (for example by a random hyperplane cut) and then balancing the cut by moving low-degree vertices from the larger side to the smaller side. After the first step, most of the edges are cut, but the latter rebalancing step results in a significant loss in the number of edges cut. In fact, as we will illustrate with a simple example in Section 2, the standard SDP for Max Bisection has a large integrality gap: the SDP optimum could be 1 whereas every bisection might only cut less than a 0.95 fraction of the edges.

Thus an interesting "qualitative" question is whether one can efficiently find an almost-complete bisection when promised that one exists. Formally, we ask the following question.

Question 1.1. Is there a polynomial time algorithm that given a graph G = (V, E) with a Max Bisection solution of value (1 − ε), finds a bisection of value (1 − g(ε)), where g(ε) → 0 as ε → 0?

¹Note that for the problem of minimizing the number of edges cut, the global condition does make a big difference: Min Cut is polynomial-time solvable whereas Min Bisection is NP-hard.

Note that without the bisection constraint, we can achieve g(ε) = O(√ε), and when ε = 0, we can find a bisection cutting all the edges (see the first paragraph of Section 2 for details). Thus this question highlights the role of both the global constraint and the "noise" (i.e., the ε fraction of edges that need to be removed to make the input graph bipartite) in the complexity of the problem.

Our results. In this paper, we answer the above question in the affirmative, by proving the following theorem.

Theorem 1.2. There is a randomized polynomial time algorithm that given an edge-weighted² graph G with a Max Bisection solution of value (1 − ε) finds a Max Bisection of value (1 − O(ε^{1/3} log(1/ε))).

  ²The value of a cut in an edge-weighted graph is defined as the weight of the edges crossing the cut divided by the total weight of all edges.

We remark that for regular graphs any cut with most of the edges crossing it must be near-balanced, and hence we can solve Max Bisection by simply reducing to Max Cut. Thus the interesting instances for our algorithm are non-regular graphs.

Our algorithms are not restricted to finding exact bisections. If the graph has a β-balanced cut (which means the two sides have a fraction β and 1 − β of the vertices) of value (1 − ε), then the algorithm can find a β-balanced cut of value 1 − O(ε^{1/3} log(1/ε)). The performance guarantee of a variant of the algorithm improves if some extra imbalance is tolerated in the solution — for any constant parameter δ, the algorithm can return a cut with balance in the range β ± δ and value at least 1 − O(ε^{1/2}). Formally, we prove the following theorem.

Theorem 1.3. There is a randomized polynomial time algorithm that for any constant δ > 0, given an edge-weighted graph G with a β-balanced cut of value (1 − ε), finds a cut (A, B) with | |A|/|V| − β | ≤ O(δ) and value (1 − O(√ε)).

As mentioned earlier, as a function of ε, the above approximation is best possible assuming the Unique Games Conjecture.

Our results are not aimed at improving the general approximation ratio for Max Bisection, which remains at ≈ 0.7027 [FL06]. It remains an interesting question how much this can be improved and whether one can approach (or even match) the 0.878 factor possible for Max Cut. We hope that some of our techniques will also be helpful towards this goal.

More generally, our work highlights the challenge of understanding the complexity of solving constraint satisfaction problems with global constraints. Algorithmically, the challenge is to ensure that the global constraint is met without hurting the number of satisfied constraints. From the hardness side, the Unique-Games based reductions which have led to a complete understanding of the approximation threshold of CSPs [Rag08] are unable to exploit the global constraint to yield stronger hardness results.

Our methods apply just as well to the Min Bisection problem, where the goal is to find a bisection that cuts the fewest edges. The best approximation factor known for Min Bisection is poly-logarithmic in the number of vertices [FK02]. Here we show that we can get better guarantees in the case when the best bisection cuts a constant fraction of the edges. Specifically, if there is a bisection of value ε, our methods guarantee efficiently finding a bisection cutting at most O(ε^{1/3} log(1/ε)) of the edges, and a nearly-balanced cut (say with bias at most 0.01) of value at most O(ε^{1/2}). The adaptation of our algorithm and analysis to Min Bisection is simple; see Section 8. (Similar to Max Bisection, the approximation guarantee O(√ε) is optimal as a function of ε for constant imbalance, assuming the so-called Small-Set Expansion conjecture [RST10] or a variant of the Unique Games Conjecture [AKK+08].)

2 Method overview

2.1 Integrality gap

We begin by describing why the standard SDP for Max Bisection has a large gap. Given a graph G = (V, E), this SDP, which is the basis of all previous algorithms for Max Bisection starting with that of Frieze and Jerrum [FJ97], solves for unit vectors v_i for each vertex i ∈ V subject to Σ_i v_i = 0, while maximizing the objective function 𝔼_{e=(i,j)∈E} [¼ ‖v_i − v_j‖²].

This SDP could have a value of 1 and yet the graph may not have any bisection of value more than 0.95 (in particular the optimum is bounded away from 1), as the following example shows. Take G to be the union of three disjoint copies of K_{2m,m} (the complete 2m × m bipartite graph) for some even m. It can be seen that every bisection fails to cut at least m²/2 edges, and thus has value at most 11/12. On the other hand, the SDP has a solution of value 1. Let ω = e^{2πi/3} be the primitive cube root of unity. In the two-dimensional complex plane, we assign the vector/complex number ω^{i−1} (resp. −ω^{i−1}) to all vertices in the larger part (resp. smaller part) of the i-th copy of K_{2m,m}, for i = 1, 2, 3. These vectors sum up to 0, and for each edge the vectors associated with its endpoints are antipodal.
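To make the gap instance concrete, here is a minimal numpy sketch (our own illustration, not code from the paper) that builds the assignment just described and checks that it is a feasible SDP solution of value 1; the combinatorial argument above separately shows that every bisection has value at most 11/12.

# Minimal numpy sketch (illustration): the SDP solution for three copies of K_{2m,m}.
import numpy as np

m = 4                                       # any even m works
angles = [0, 2*np.pi/3, 4*np.pi/3]          # cube roots of unity as 2D unit vectors

vectors, edges, offset = [], [], 0
for theta in angles:
    w = np.array([np.cos(theta), np.sin(theta)])
    big = [offset + i for i in range(2*m)]              # larger side gets  w
    small = [offset + 2*m + i for i in range(m)]        # smaller side gets -w
    vectors += [w]*(2*m) + [-w]*m
    edges += [(u, v) for u in big for v in small]        # complete bipartite copy
    offset += 3*m

V = np.array(vectors)
assert np.allclose(V.sum(axis=0), 0)                    # balance constraint sum_i v_i = 0
assert np.allclose((V**2).sum(axis=1), 1)               # unit vectors
obj = np.mean([0.25*np.sum((V[u] - V[v])**2) for (u, v) in edges])
print("SDP objective:", obj)                            # prints 1.0: edge endpoints are antipodal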

For all CSPs, a tight connection between integrality gaps (for a certain "canonical" SDP) and inapproximability results is now established [Rag08]. The above gap instance suggests that the picture is more subtle for CSPs with global constraints — in this work we give an algorithm that does much better than the integrality gap for the "basic" SDP. Could a stronger SDP relaxation capture the complexity of approximating CSPs with global constraints such as Max Bisection? It is worth remarking that we do not know whether an integrality gap instance of the above form (i.e., 1 − ε SDP optimum vs. say 0.9 Max Bisection value) exists even for the basic SDP augmented with triangle inequalities.

2.2 Notation

Suppose we are given a graph G = (V, E). We use the following notation: E(U) = {(u, v) ∈ E : u, v ∈ U} denotes the set of edges within a set of vertices U; edges(U₁, U₂) = {(u, v) ∈ E : u ∈ U₁, v ∈ U₂} denotes the set of edges between two sets of vertices U₁ and U₂; and G[U] denotes the subgraph of G induced by the set U.

Definition 2.1 (Value and bias of cuts). For a cut (S, V \ S) of a graph G = (V, E), we define its value to be |edges(S, V \ S)|/|E| (i.e., the fraction of edges which cross the cut) if G is unweighted, and w(edges(S, V \ S))/w(E) if G is edge-weighted with weight function w : E → ℝ_{>0} (where for F ⊆ E, w(F) = Σ_{e∈F} w(e)). We define the bias β ∈ [0, 1] of a cut (S, V \ S) to be β = (1/|V|) · | |S| − |V \ S| |, and we say that the cut (S, V \ S) is β-biased. (Note that a 0-biased cut is a bisection.)

Recall that the normalized Laplacian of G is the matrix L_G whose rows and columns correspond to vertices of G that is defined as follows:

  L_G(u, v) = 1,              if u = v and d_u ≠ 0,
  L_G(u, v) = −1/√(d_u d_v),  if (u, v) ∈ E,
  L_G(u, v) = 0,              otherwise,

where d_u is the degree of the vertex u. Let λ₂(L_G) be the second smallest eigenvalue of L_G. We abuse notation by letting λ₂(G) = λ₂(L_G). We define the volume of a set U ⊆ V as vol(U) = vol_G(U) = Σ_{u∈U} d_u.

We will use the following version of Cheeger's inequality.

Theorem 2.2 (Cheeger's inequality for non-regular graphs [Chu96]). For every graph G = (V, E),

  λ₂(G)/2 ≤ φ(G) ≤ √(2 λ₂(G)),

where φ(G) is the expansion of G,

  φ(G) ≡ min_{S ⊆ V} |edges(S, V \ S)| / min(vol(S), vol(V \ S)).

Moreover, we can efficiently find a set A ⊆ V such that vol(A) ≤ vol(V)/2 and |edges(A, V \ A)| / vol(A) ≤ √(2 λ₂(G)).

For any two disjoint sets X, Y ⊆ V, let uncut(X, Y) = (|E(X)| + |E(Y)|)/|E(X ∪ Y)| be the fraction of edges of G[X ∪ Y] that do not cross the cut (X, Y). We say that a cut (X, Y) of V is perfect if uncut(X, Y) = 0.

2.3 Our approach

In this section, we give a brief overview of our algorithm. It is instructive to consider first the case when G has a perfect bisection cut. In this case, G is a bipartite graph. If G has only one connected component, each part of this component has the same number of vertices, so this is the desired bisection. Now assume that G has several connected components. Then each connected component C of G is a bipartite graph with two parts X_C and Y_C. Since all edges are cut in the optimal solution, X_C must lie on one side of the optimal cut and Y_C on the other. So in order to find a perfect bisection (X, Y), for every connected component C we need to either (i) add X_C to X and Y_C to Y, or (ii) add X_C to Y and Y_C to X, so that |X| = |Y| = |V|/2. We can do that using dynamic programming, as sketched below.
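For concreteness, the following Python sketch (our own illustration; the function name and interface are ours) carries out exactly this plan for the perfect case: it 2-colors each connected component and then uses a subset-sum style dynamic program over the components to make the two sides have exactly |V|/2 vertices.

# Illustrative sketch (assumes G is bipartite and admits a perfect bisection).
from collections import deque

def perfect_bisection(n, adj):
    """adj: adjacency list of a bipartite graph on vertices 0..n-1 (n even)."""
    comps, color = [], [-1] * n
    for s in range(n):
        if color[s] != -1:
            continue
        color[s] = 0
        part = [[s], []]                         # the two colour classes X_C, Y_C
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if color[v] == -1:
                    color[v] = 1 - color[u]
                    part[color[v]].append(v)
                    queue.append(v)
        comps.append(part)

    # DP over components: reachable = achievable sizes of side X so far;
    # choice[(k, size)] remembers one orientation that reaches this size.
    reachable, choice = {0}, {}
    for k, (xc, yc) in enumerate(comps):
        new = set()
        for size in reachable:
            for flip, cls in enumerate((xc, yc)):
                if (k + 1, size + len(cls)) not in choice:
                    choice[(k + 1, size + len(cls))] = (size, flip)
                new.add(size + len(cls))
        reachable = new

    assert n // 2 in reachable                   # guaranteed if a perfect bisection exists
    X, size = set(), n // 2                      # trace back the DP choices
    for k in range(len(comps), 0, -1):
        size, flip = choice[(k, size)]
        X.update(comps[k - 1][flip])
    return X, set(range(n)) - X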

Our algorithm for almost satisfiable instances proceeds in a similar way. Assume that the optimal bisection cuts a (1 − ε) fraction of edges.

1. In a preprocessing step, we use the algorithm of Goemans and Williamson [GW95] to find an approximate maximum cut in G. A fraction 1 − O(√ε) of edges cross this cut. We remove all uncut edges and obtain a bipartite graph. We denote the parts of this graph by A and B. (Of course, in general |A| ≠ |B|.)

2. Then we recursively partition G into pieces W₁, ..., W_s using Cheeger's inequality (see Lemma 3.1). Every piece is either a sufficiently small subgraph, which contains at most an ε fraction of all vertices, or is a spectral expander, with λ₂ ≥ ε^{2/3}. There are very few edges between different pieces, so we can ignore them later. In this step, we obtain a collection of induced subgraphs G[W₁], ..., G[W_s] with very few edges going between different subgraphs.

3. Now our goal is to find an "almost perfect" cut in every G[W_i], then combine these cuts and get a bisection of G. Note that every G[W_i] is bipartite and therefore has a perfect cut (since G is bipartite after the preprocessing step). However, we cannot restrict our attention only to this perfect cut, since the optimal solution (S, T) can cut G[W_i] in another proportion. Instead, we prepare a list 𝒲_i of "candidate cuts" for each G[W_i] that cut W_i in different proportions. One of them is close to the cut (W_i ∩ S, W_i ∩ T) (the restriction of the optimal cut to W_i).

4. If G[W_i] is an expander, we find a candidate cut that cuts G[W_i] in a given proportion by moving vertices from one side of the perfect cut (W_i ∩ A, W_i ∩ B) to the other, greedily (see Lemma 4.1 and Lemma 4.2).

5. If G[W_i] is small, we find a candidate cut that cuts G[W_i] in a given proportion using semidefinite programming (see Lemma 4.3 and Corollary 4.4). We solve an SDP relaxation similar to the Goemans–Williamson relaxation [GW95] with "ℓ₂²-triangle inequalities", and then find a cut by using hyperplane or threshold rounding. In fact, the cut that we find can be more unbalanced than (W_i ∩ S, W_i ∩ T), but this is not a problem since the set W_i is small. Note however that if a cut of another piece W_j is very unbalanced, then we might need to find a cut of W_i that is unbalanced in the other direction. So it is important that the candidate cut of W_i is at least as unbalanced as (W_i ∩ S, W_i ∩ T).

6. Finally, we combine candidate cuts of the subgraphs G[W_i] into one balanced cut of the graph G, in the optimal way, using dynamic programming (see Lemma 5.1).

Our improved algorithm (Theorem 1.3), which delivers a cut of value 1 − O(√ε) with a small constant bias, has the same high-level structure. Namely, we decompose the graph into expanding and small parts, find good cuts in these parts, and combine them together to get a global near-balanced cut. The small parts are handled in the same way as above. The difference is in the notion of expansion used for the large parts and how we find cuts in the expanding parts. For the former, a part is deemed to have sufficient expansion if the standard SDP relaxation for α-balanced separator (for some small α > 0) has large (Ω(ε)) objective value. (This in particular means that every α-balanced cut has value at least Ω(ε), but it is potentially stronger as it ensures that there are no sparse "vector-valued" cuts either.)

We give a decomposition procedure to split a graph into such "SDP expanders" and small parts (Lemma 7.5). We find a good cut in the union of the expanding parts by rounding the Goemans–Williamson SDP. The key in our analysis is to use the "SDP expansion" properties to argue that the vector solution must essentially behave like an integral solution. This gives us much better control of the bias of the cut produced by random hyperplane rounding.

2.4 Organization

The rest of the paper is devoted to the full description and proof of the algorithm. In Section 3, we partition the graph into expanders and small pieces, after proper preprocessing. In Section 4, we produce a list of candidate cuts for each expander and small piece, by different methods. In Section 5, we show how to choose one candidate cut for each part. In Section 6, we put everything together to finish the proof of Theorem 1.2. In Section 7, we prove Theorem 1.3, giving a variant of the algorithm that only uncuts an O(√ε) fraction of edges when some constant imbalance is tolerated. In Section 8, we apply our techniques to the Min Bisection problem.

3 Preprocessing and partitioning the graph G

In this section, we present the preprocessing and partitioning steps of our algorithms. We will assume that we know the value of the optimal solution OPT = 1 − ε_OPT (with high precision). If we do not, we can run the algorithm for many different values of ε and output the best of the bisection cuts we find.

3.1 Preprocessing: Making G bipartite and unweighted

In this section, we show that we can assume that the graph G is bipartite, with parts A and B, unweighted, and that |E| ≤ O(|V|/ε_OPT²).

First, we "sparsify" the edge-weighted graph G = (V, E) and make the graph unweighted: we sample O(ε_OPT⁻² |V|) edges (according to the weight distribution) with replacement from G; then, with high probability, every cut has the same cost in the original graph as in the new graph, up to an additive error ε_OPT (by Chernoff's bound). So we assume that |E| ≤ O(ε_OPT⁻² |V|).

We apply the algorithm of Goemans and Williamson to G and find a partitioning of G into two pieces A and B so that only an O(√ε_OPT) fraction of edges lies within A or within B.
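The sparsification step can be summarized by the short numpy sketch below (our own illustration; the constant C and the interface are assumptions — the argument only needs some sufficiently large constant). The subsequent Goemans–Williamson step additionally requires an SDP solver and is not reproduced here.

# Illustrative sketch of the sparsification step.
import numpy as np

def sparsify(edges, weights, n, eps, C=16, seed=0):
    """edges: list of (u, v); weights: matching positive weights; n = |V|.
    Returns an unweighted multiset of edges on the same vertex set."""
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p /= p.sum()                          # sample according to the weight distribution
    k = int(C * n / eps**2)               # O(eps^-2 |V|) samples, with replacement
    idx = rng.choice(len(edges), size=k, replace=True, p=p)
    return [edges[i] for i in idx]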

3.2 Partitioning

In this section, we describe how we partition G into pieces.

Lemma 3.1. Given a graph G = (V, E) and parameters δ ∈ (0, 1) and λ ∈ (0, 1) such that |E| = O(|V|/δ²), we can find a partitioning of V into disjoint sets U₁, ..., U_p ("small sets") and V₁, ..., V_q ("expander graphs"),

  V = (∪_i U_i) ∪ (∪_j V_j),

in polynomial time, so that

1. |U_i| ≤ δ|V| for each 1 ≤ i ≤ p;
2. λ₂(G[V_i]) ≥ λ for each 1 ≤ i ≤ q;
3. Σ_i |E(U_i)| + Σ_j |E(V_j)| ≥ (1 − O(√λ log(1/δ)))|E|.

Proof. We start with the trivial partitioning {V} of V and then iteratively refine it. Initially, all sets in the partitioning are "active"; once a set satisfies condition 1 or 2 of the lemma, we permanently mark it as "passive" and stop subdividing it. We proceed until all sets are passive. Specifically, we mark a set S as passive in two cases. First, if |S| ≤ δ|V| then we add S to the family of sets U_i. Second, if λ₂(G[S]) ≥ λ then we add S to the family of sets V_i.

We subdivide every active S into smaller pieces by applying the following easy corollary of Cheeger's inequality (Theorem 2.2) to H = G[S].

Corollary 3.2. Given a graph H = (S, E(H)) and a threshold λ > 0, we can find, in polynomial time, a partition S₁, S₂, ..., S_t of S such that

1. |E(S_i)| ≤ |E(S)|/2 or λ₂(H[S_i]) ≥ λ, for each 1 ≤ i ≤ t;
2. Σ_{i<j} |edges(S_i, S_j)| ≤ √(8λ) |E(H)|;
3. each graph H[S_i] is connected.

Proof. If λ₂(H) ≥ λ then we just output the trivial partition {S}. Otherwise, we apply Theorem 2.2 to H₁ = H, and find a set S₁ s.t. vol_{H₁}(S₁) ≤ vol_{H₁}(S)/2 and |edges(S₁, S \ S₁)| / vol_{H₁}(S₁) ≤ √(2λ₂(H₁)) ≤ √(2λ). Then we remove S₁ from H₁, obtain a graph H₂, and iteratively apply this procedure to H₂. We stop when either λ₂(H_i) ≥ λ or |E(H_i)| ≤ |E(S)|/2.

We verify that the obtained partitioning S₁, ..., S_t of S satisfies the first condition. For each i ∈ {1, ..., t − 1}, we have |E(S_i)| ≤ vol_{H_i}(S_i)/2 ≤ vol_{H_i}(V(H_i))/4 = |E(H_i)|/2 ≤ |E(H)|/2. Our stopping criterion guarantees that S_t satisfies the first condition. We verify the second condition:

  Σ_{i<j} |edges(S_i, S_j)| = Σ_{i=1}^{t−1} |edges(S_i, V(H_i) \ S_i)| ≤ Σ_{i=1}^{t−1} √(2λ) vol_{H_i}(S_i) ≤ √(2λ) vol_H(S) = 2√(2λ) |E(H)|.

Finally, if for some i the graph H[S_i] is not connected, we replace S_i in the partitioning with the connected components of H[S_i]. □

By the definition, the sets U_i and V_j satisfy properties 1 and 2. It remains to verify that

  Σ_{i=1}^{p} |E(U_i)| + Σ_{j=1}^{q} |E(V_j)| ≥ (1 − O(√λ log(1/δ))) |E|.

We first prove that the number of iterations is O(log(1/δ)). Note that if S is an active set and T is its parent then |E(S)| ≤ |E(T)|/2. The set V contains O(|V|/δ²) edges. Every active set S contains at least δ|V|/2 edges, since |E(S)| ≥ |S| − 1 ≥ δ|V|/2 (we use that G[S] is connected). Therefore, the number of iterations is O(log₂((|V|/δ²)/(δ|V|/2))) = O(log 1/δ).

We finally observe that when we subdivide a set S, we cut O(√λ |E(S)|) edges. At each iteration, since all active sets are disjoint, we cut at most O(√λ |E|) edges. Therefore, the total number of edges cut in all iterations is O(√λ log(1/δ))|E|. □

4 Finding cuts in sets U_i and V_i

In the previous section, we showed how to partition the graph G into the union of "small graphs" G[U_i] and expander graphs G[V_i]. We now show how to find good "candidate cuts" in each of these graphs.

4.1 Candidate cuts in V_i

In this section, we first prove that there is essentially only one almost perfect maximum cut in an expander graph (Lemma 4.1). That implies that every almost perfect cut in the graph G[V_i] should be close to the perfect cut (V_i ∩ A, V_i ∩ B). Using that, we construct a list of good candidate cuts (Lemma 4.2). One of these cuts is close to the restriction of the optimal cut to the subgraph G[V_i].

Lemma 4.1. Suppose we are given a graph H = (V, E) and two cuts (S₁, T₁) and (S₂, T₂) of H, each of value at least (1 − δ). Then

  min{vol_H(S₁ △ S₂), vol_H(S₁ △ T₂)} ≤ 4δ|E|/λ₂(H).

Proof. Let

  X = S₁ △ S₂ = (S₁ ∩ T₂) ∪ (S₂ ∩ T₁);
  Y = S₁ △ T₂ = (S₁ ∩ S₂) ∪ (T₁ ∩ T₂).

Note that V = X ∪ Y. There are at most 2δ|E| edges between X and Y, since

  edges(X, Y) ⊂ E(S₁) ∪ E(S₂) ∪ E(T₁) ∪ E(T₂),

|E(S₁) ∪ E(T₁)| ≤ δ|E| and |E(S₂) ∪ E(T₂)| ≤ δ|E|. On the other hand, by Cheeger's inequality (Theorem 2.2), we have

  |edges(X, Y)| / min(vol_H(X), vol_H(Y)) ≥ λ₂(H)/2.

Therefore,

  min(vol_H(X), vol_H(Y)) ≤ 2|edges(X, Y)|/λ₂(H) ≤ 4δ|E|/λ₂(H). □

Consider one of the sets V_i. Let H = G[V_i]. Denote A_i = V_i ∩ A and B_i = V_i ∩ B. We sort all vertices in A_i and B_i w.r.t. their degrees in H. Now we are ready to define the family of candidate cuts (X₀, Y₀), ..., (X_{|V_i|}, Y_{|V_i|}) for G[V_i]. For each j, we define (X_j, Y_j) as follows.

– If j ≤ |A_i| then X_j consists of the j vertices of A_i with highest degrees, and Y_j consists of the remaining vertices of H (i.e., Y_j contains all vertices of B_i as well as the |A_i| − j lowest degree vertices of A_i).
– If j > |A_i| then Y_j consists of the |V_i| − j vertices of B_i with highest degrees, and X_j consists of the remaining vertices of H.

Clearly, |X_j| = j and |Y_j| = |V_i| − j. Let (S, T) be the restriction of the optimal bisection of G to H. We will show that one of the cuts (X_j, Y_j) is not much worse than (S, T). By Lemma 4.1 applied to the cuts (A_i, B_i) and (S, T) (note that uncut(A_i, B_i) = 0),

  min{vol_H(A_i △ S), vol_H(A_i △ T)} ≤ 4 · uncut(S, T)|E(H)| / λ₂(H).

Assume without loss of generality that vol_H(A_i △ S) ≤ 4 · uncut(S, T)|E(H)|/λ₂(H) (otherwise, rename the sets S and T). We show that vol_H(A_i △ X_{|S|}) ≤ vol_H(A_i △ S). Consider the case |A_i| ≥ |S|. Note that by the definition of X_{|S|}, the set X_{|S|} has the largest volume among all subsets of A_i of size at most |S|. Correspondingly, A_i \ X_{|S|} has the smallest volume among all subsets of A_i of size at least |A_i| − |S|. Finally, note that |A_i \ S| ≥ |A_i| − |S|. Therefore,

  vol_H(A_i △ X_{|S|}) = vol_H(A_i \ X_{|S|}) ≤ vol_H(A_i \ S) ≤ vol_H(A_i △ S).

The case when |A_i| ≤ |S| is similar. We conclude that

  vol_H(A_i △ X_{|S|}) ≤ 4 · uncut(S, T)|E(H)|/λ₂(H).

Therefore, the size of the cut (X_{|S|}, Y_{|S|}) is at least

  |E(H)| − vol_H(A_i △ X_{|S|}) ≥ (1 − 4 · uncut(S, T)/λ₂(H)) |E(H)|.

We have thus proved the following lemma.

Lemma 4.2. There is a polynomial time algorithm that given a graph H = G[V_i] finds a family of cuts 𝒱_i = {(X₁, Y₁), ..., (X_{|V_i|}, Y_{|V_i|})} such that for every cut (S, T) of H there exists a cut (X, Y) ∈ 𝒱_i with |X| = min(|S|, |T|) and

  uncut(X, Y) ≤ 4 · uncut(S, T) / λ₂(H).
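The construction of the family 𝒱_i behind Lemma 4.2 is purely combinatorial, and the following Python sketch (our own illustration; the function name is ours) spells it out: sort each side of the bipartition (A_i, B_i) by degree and form the cuts (X_j, Y_j) as described above.

# Illustrative sketch of the candidate-cut family of Lemma 4.2 for one piece H = G[V_i].
def candidate_cuts(A_i, B_i, degree):
    """A_i, B_i: the two sides of the perfect cut of the piece;
    degree: dict mapping each vertex to its degree inside the piece.
    Returns a list whose j-th entry is the pair (X_j, Y_j) with |X_j| = j."""
    A = sorted(A_i, key=lambda v: degree[v], reverse=True)   # highest degree first
    B = sorted(B_i, key=lambda v: degree[v], reverse=True)
    all_vertices = set(A) | set(B)
    n = len(all_vertices)
    cuts = []
    for j in range(n + 1):
        if j <= len(A):
            X = set(A[:j])                      # j highest-degree vertices of A_i
        else:
            Y_top = set(B[:n - j])              # |V_i| - j highest-degree vertices of B_i
            X = all_vertices - Y_top            # the rest goes to X_j
        cuts.append((X, all_vertices - X))
    return cuts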

4.2 Candidate cuts in U_i

In this section, we show how to find candidate cuts for the small parts, i.e., the induced subgraphs G[U_i].

Lemma 4.3. Suppose we are given a graph H = (U, E) and two parameters 0 ≤ θ ≤ 1/2 and 0 < ∆ < 1. Then in polynomial time we can find a cut (X, Y) such that for every cut (S, T) in H with |S| ≤ θ|U|, we have

1. uncut(X, Y) ≤ O(√(uncut(S, T)) + uncut(S, T)/∆);
2. |X| ≤ (θ + ∆)|U|.

Proof. Let (S, T) be the maximum cut among all cuts with |S| ≤ θ|U| (of course, our algorithm does not know (S, T)). Let ε_H = uncut(S, T). We may assume that our algorithm knows the value of ε_H (with high precision) — as otherwise, we can run our algorithm on different values of ε and output the best of the cuts the algorithm finds.

We write the following SDP program. For every vertex i ∈ U, we introduce a unit vector v_i; additionally, we introduce a unit vector v₀.

  maximize   (1/|U|) Σ_{i∈U} ⟨v₀, v_i⟩
  subject to (1/(4|E|)) Σ_{(i,j)∈E} ‖v_i + v_j‖² ≤ ε_H
             ‖v_i‖² = 1                              ∀ i ∈ U ∪ {0}
             |⟨v_i + v_j, v₀⟩| ≤ ‖v_i + v_j‖²/2       ∀ i, j ∈ U.

The "intended solution" to this SDP is v_i = v₀ if i ∈ T and v_i = −v₀ if i ∈ S (vector v₀ is an arbitrary unit vector). Clearly, this solution satisfies all SDP constraints. In particular, it satisfies the last constraint (an "ℓ₂²-triangle inequality"), since the left hand side is positive only when v_i = v_j, and then |⟨v_i + v_j, v₀⟩| = ‖v_i + v_j‖²/2 = 2. The value of this solution is (|T| − |S|)/|U| ≥ 1 − 2θ.

We solve the SDP and find the optimal SDP solution {v_i}. Note that Σ_{i∈U} ⟨v₀, v_i⟩ ≥ (1 − 2θ)|U|. Let ∆′ = 2∆/3. Choose r ∈ [∆′, 2∆′] uniformly at random. Define a partition of U into sets Z_k, 0 ≤ k < 1/∆′, as follows: let Z_k = {i : k∆′ + r < |⟨v₀, v_i⟩| ≤ (k + 1)∆′ + r} for k ≥ 1, and Z₀ = {i : −∆′ − r ≤ ⟨v₀, v_i⟩ ≤ ∆′ + r}. We bound the probability that the endpoints of an edge (i, j) belong to different sets Z_k. Note that if no point from the set {±(k∆′ + r) : k ≥ 1} lies between |⟨v_i, v₀⟩| and |⟨v_j, v₀⟩|, then i and j belong to the same set Z_k. The distance between |⟨v_i, v₀⟩| and |⟨v_j, v₀⟩| is at most |⟨v_i + v_j, v₀⟩|. Therefore, the probability (over r) that i and j belong to different sets Z_k is at most |⟨v_i + v_j, v₀⟩|/∆′. So the expected number of cut edges is at most

  (1/∆′) Σ_{(i,j)∈E} |⟨v_i + v_j, v₀⟩| ≤ (1/(2∆′)) Σ_{(i,j)∈E} ‖v_i + v_j‖² ≤ 2|E|ε_H/∆′.   (4.1)

For each k ≥ 1, let Z_k⁺ = {i ∈ Z_k : ⟨v_i, v₀⟩ > 0} and Z_k⁻ = {i ∈ Z_k : ⟨v_i, v₀⟩ < 0}. We use the hyperplane rounding of Goemans and Williamson [GW95] to divide Z₀ into two sets Z₀⁺ and Z₀⁻. We are ready to define the sets X and Y. For each k, we add the vertices from the smaller of the two sets Z_k⁺ and Z_k⁻ to X, and the vertices from the larger of them to Y.

Now we bound uncut(X, Y). Note that

  uncut(X, Y) |E| ≤ Σ_{k<l} |edges(Z_k, Z_l)| + Σ_{k≥1} (|E(Z_k⁺)| + |E(Z_k⁻)|) + |E(Z₀⁺)| + |E(Z₀⁻)|.

We have already shown that, in expectation, Σ_{k<l} |edges(Z_k, Z_l)| ≤ 2ε_H|E|/∆′ (see (4.1)). Note that if i, j ∈ Z_k⁺ or i, j ∈ Z_k⁻ for some k ≥ 1, then |⟨v_i + v_j, v₀⟩| > ∆′. Therefore,

  Σ_{k≥1} (|E(Z_k⁺)| + |E(Z_k⁻)|) ≤ 2ε_H|E|/∆′ = 3ε_H|E|/∆.

Finally, note that when we divide Z₀, the fraction of edges of E(Z₀) that do not cross the random hyperplane is O(√ε₀) (in expectation), where

  ε₀ = (1/(4|E(Z₀)|)) Σ_{(i,j)∈E(Z₀)} ‖v_i + v_j‖² ≤ (1/(4|E(Z₀)|)) Σ_{(i,j)∈E} ‖v_i + v_j‖² ≤ ε_H |E| / |E(Z₀)|.

Thus,

  𝔼_r[|E(Z₀⁺)| + |E(Z₀⁻)|] ≤ O(√(ε_H |E| / |E(Z₀)|)) |E(Z₀)| ≤ O(√ε_H) |E|.

Combining the above upper bounds, we conclude that

  𝔼[uncut(X, Y)] ≤ O(ε_H/∆ + √ε_H).

Finally, we estimate the size of the set X. Note that if v_i ∈ Z_k⁺ then |⟨v_i, v₀⟩ − k∆′| ≤ 3∆′, and if v_i ∈ Z_k⁻ then |⟨v_i, v₀⟩ + k∆′| ≤ 3∆′. Therefore, Σ_{i∈Z_k} ⟨v_i, v₀⟩ ≤ k(|Z_k⁺| − |Z_k⁻|)∆′ + 3∆′|Z_k|, which implies

  Σ_k k(|Z_k⁺| − |Z_k⁻|)∆′ ≥ Σ_{i∈U} ⟨v_i, v₀⟩ − 3∆′|U| ≥ (1 − 2θ − 3∆′)|U|.

Therefore,

  |Y| − |X| = Σ_k | |Z_k⁺| − |Z_k⁻| | ≥ Σ_{k : |Z_k⁺|−|Z_k⁻|>0} (|Z_k⁺| − |Z_k⁻|) ≥ Σ_{k : |Z_k⁺|−|Z_k⁻|>0} (k∆′)(|Z_k⁺| − |Z_k⁻|) ≥ Σ_k k(|Z_k⁺| − |Z_k⁻|)∆′ ≥ (1 − 2θ − 3∆′)|U|,

implying |X| ≤ (θ + 3∆′/2)|U| = (θ + ∆)|U|. □
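The rounding in the proof above is easy to implement once the SDP vectors are available. The numpy sketch below (our own illustration; it assumes the vectors are given as rows of an array and omits solving the SDP itself) forms the bands Z_k along v₀, splits Z₀ with one random hyperplane, and puts the smaller side of each band into X.

# Illustrative numpy sketch of the rounding used in the proof of Lemma 4.3.
import numpy as np

def round_small_part(vecs, v0, delta, seed=0):
    """vecs: (n, d) array of unit SDP vectors; v0: the reference unit vector;
    delta: the balance slack parameter Delta of Lemma 4.3.  Returns the set X."""
    rng = np.random.default_rng(seed)
    dp = vecs @ v0                                    # inner products <v_i, v0>
    d1 = 2 * delta / 3                                # Delta' = 2*Delta/3
    r = rng.uniform(d1, 2 * d1)                       # random band offset
    band = np.maximum(0, np.ceil((np.abs(dp) - r) / d1) - 1).astype(int)  # index k of Z_k
    g = rng.standard_normal(vecs.shape[1])            # hyperplane for the middle band Z_0
    side = np.where(band == 0, np.sign(vecs @ g), np.sign(dp))  # +/- side inside each band
    X = []
    for k in np.unique(band):                         # per band, smaller side goes to X
        plus = np.flatnonzero((band == k) & (side > 0))
        minus = np.flatnonzero((band == k) & (side < 0))
        X.extend(plus if len(plus) <= len(minus) else minus)
    return set(int(i) for i in X)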

We apply this algorithm to every graph G[U_i] and every θ = k/|U_i|, 0 < k ≤ |U_i|/2, and obtain a list of candidate cuts. We get the following corollary.

Corollary 4.4. There is a polynomial time algorithm that given a graph H = G[U_i] and a parameter ∆ ∈ (0, 1) finds a family of cuts 𝒰_i such that for every cut (S, T) of H there exists a cut (X, Y) ∈ 𝒰_i with |X| ≤ min(|S|, |T|) + ∆|U_i| and

  uncut(X, Y) ≤ O(√(uncut(S, T)) + uncut(S, T)/∆).

5 Combining candidate cuts

In this section, we show how to choose one candidate cut for each set U_i and V_j.

For brevity, we denote W_i = U_i for i ∈ {1, ..., p} and W_{p+j} = V_j for j ∈ {1, ..., q}. Similarly, 𝒲_i = 𝒰_i for i ∈ {1, ..., p} and 𝒲_{p+j} = 𝒱_j for j ∈ {1, ..., q}. Then W₁, ..., W_{p+q} is a partitioning of V, and 𝒲_i is a family of cuts of G[W_i]. We say that a cut (X, Y) of G is a combination of candidate cuts from 𝒲 if the restriction of (X, Y) to each W_i belongs to 𝒲_i (we identify cuts (S, T) and (T, S)).

Lemma 5.1. There exists a polynomial time algorithm that given a graph G = (V, E), a threshold ζ ∈ [0, 1/2], sets W_i and families of cuts 𝒲_i, finds the maximum cut among all combination cuts (X, Y) with |X|, |Y| ∈ [(1/2 − ζ)|V|, (1/2 + ζ)|V|].

Proof. We solve the problem by dynamic programming. Denote H_k = G[∪_{i=1}^{k} W_i]. For every a ∈ {1, ..., p + q} and b ∈ {1, ..., |V(H_a)|}, let Q[a, b] be the size of the maximum cut among all combination cuts (X, Y) on H_a with |X| = b (Q[a, b] equals −∞ if there are no such cuts). We loop over all values of a from 1 to p + q and fill out the table Q using the following formula

  Q[a, b] = max_{(X,Y)∈𝒲_a or (Y,X)∈𝒲_a} (Q[a − 1, b − |X|] + |edges(X, Y)|),

where we assume that Q[0, 0] = 0, and Q[a, b] = −∞ if (a ≤ 0 or b ≤ 0) and (a, b) ≠ (0, 0). Finally the algorithm outputs the maximum among Q[p + q, ⌈(1/2 − ζ)|V|⌉], ..., Q[p + q, ⌊(1/2 + ζ)|V|⌋], and the corresponding combination cut. □

Finally, we prove that there exists a good almost balanced combination cut.

Lemma 5.2. Let G = (V, E) be a graph. Let V = (∪_i U_i) ∪ (∪_j V_j) be a partitioning of V that satisfies the conditions of Lemma 3.1, and let 𝒰_i and 𝒱_j be families of candidate cuts that satisfy the conditions of Corollary 4.4 and Lemma 4.2, respectively. Then there exists a combination cut (X, Y) such that

  | |X|/|V| − 1/2 | ≤ max(∆, δ)

and

  uncut(X, Y) ≤ O( √λ log(1/δ) + √(uncut(S_OPT, T_OPT)) + uncut(S_OPT, T_OPT) (1/λ + 1/∆) ),

where (S_OPT, T_OPT) is the optimal bisection of G.
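Returning to Lemma 5.1, the dynamic program is short; here is an illustrative Python sketch (our own, with the candidate cuts supplied by the caller as size/value triples rather than explicit vertex sets; both orientations of each candidate cut are considered automatically).

# Illustrative sketch of the dynamic program from Lemma 5.1.
import math

def best_combination(pieces, n, zeta):
    """pieces: list, one entry per W_a, of candidate cuts given as
    (|X|, |Y|, edges_cut) triples.  Returns the maximum number of cut edges
    over all combination cuts (X, Y) with |X| in [(1/2 - zeta) n, (1/2 + zeta) n],
    or None if no such combination exists."""
    NEG = float("-inf")
    Q = [NEG] * (n + 1)              # Q[b] = best value with |X| = b over pieces seen so far
    Q[0] = 0
    for cuts in pieces:
        newQ = [NEG] * (n + 1)
        for b in range(n + 1):
            if Q[b] == NEG:
                continue
            for (x, y, cut) in cuts:
                for side in (x, y):              # orient the candidate cut either way
                    if b + side <= n and Q[b] + cut > newQ[b + side]:
                        newQ[b + side] = Q[b] + cut
        Q = newQ
    lo = math.ceil((0.5 - zeta) * n)
    hi = math.floor((0.5 + zeta) * n)
    best = max(Q[lo:hi + 1], default=NEG)
    return None if best == NEG else best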

9 p Proof. Consider the optimal bisection cut X (min(|S OPT ∩ Ui|, |TOPT ∩ Ui|) + ∆|Ui|) (S OPT , TOPT ). We choose a candidate cut for every i=1 set Vi. By Lemma 4.2, for every Vi there exists a Xq Xp cut (Xi, Yi) ∈ Vi such that 6 |S OPT ∩ V j| + |S OPT ∩ Ui| + ∆|V| j=1 i=1 uncut(Xi, Yi) = |S | + ∆|V| = (1/2 + ∆)|V|. 6 4uncut(S OPT ∩ Vi, TOPT ∩ Vi)/λ2(G[Vi]) OPT 6 4uncut(S OPT ∩ Vi, TOPT ∩ Vi)/λ, (5.1) It remains to bound uncut(X, Y). We have, and |Xi| = min(|S OPT ∩ Vi|, |TOPT ∩ Vi|). We define X V V uncut(X, Y)|E| 6 |edges(Ui, U j)| sets X and Y as follows. For each i, we add Xi V V 16i< j6p to X if |Xi| = |S OPT ∩ Vi|, and we add Yi to X , X X otherwise (i.e. if |Yi| = |S OPT ∩ Vi|). Similarly, we + |edges(Vi, V j)| + |edges(Ui, V j)| V add Yi to Y if |Yi| = |TOPT ∩ Vi|, and we add Xi 16i< j6q 16i6p V V V 16 j6q to Y , otherwise. Clearly, (X , Y ) is a candidate X X S V S 0 0 cut of i Vi and |X | = |S OPT ∩ i Vi|. Assume + uncut(Xi , Yi )|E(Ui)| + uncut(X j, Y j)|E(V j)| . without loss of generality that |XV | > |YV |. 16i6p 16 j6q Now we choose a candidate cut for every set Ui. By Lemma√ 3.1, the sum of the first three terms By Corollary 4.4, for every Ui there exists a cut 0 0 is at most O( λ log(1/δ))|E|. From (5.2), we get (X , Y ) ∈ Ui such that i i X 0 0 uncut(X0, Y0)|E(U )| uncut(Xi , Yi ) i i i  p 16i6p 6 O uncut(S ∩ U , T ∩ U ) p OPT i OPT i X  p 6 O(1) uncut(S OPT ∩ Ui, TOPT ∩ Ui) uncut(S OPT ∩ Ui, TOPT ∩ Ui) + , (5.2) i=1 ∆  ∩ ∩ | | 0 + uncut(S OPT Ui, TOPT Ui)/∆ E(Ui) and |Xi | 6 min(|S OPT ∩ Ui|, |T ∩ Ui|) + ∆|Ui|. We 0 0 Xp q assume that Xi is the smaller of the two sets Xi and  0 = O(1) |E(S OPT ∩ Ui)| + |E(TOPT ∩ Ui)| Yi . 0 0 i=1 We want to add one of the sets Xi and Yi to p XV , and the other set to YV so that the resulting · |E(Ui)| V p cut (X, Y) is almost balanced. We set X = X and X |E(S OPT ∩ Ui)| + |E(TOPT ∩ Ui)| V + O(1) Y = Y . Then consequently for every i from 1 to p, ∆ we add X0 to the larger of the sets X and Y, and add i=1 i v Y0 to the smaller of the two sets (recall that X0 is t p i i X  0 6 O(1) |E(S OPT ∩ Ui)| + |E(TOPT ∩ Ui)| smaller than Yi ). We obtain a candidate cut (X, Y) of G. i=1 v t p ! We show that |X|/|V| − 1/2 6 max(∆, δ). Ini- X uncut(S , T ) · |E| V V OPT OPT tially, |X| = |X | > |Y| = |Y |. If at some · |E(Ui)| + O ∆ point X becomes smaller than Y then after that i=1  p |X| − |Y| 6 δ|V| since at every step |X| − |Y| does 6 O uncut(S OPT , TOPT ) not change by more than |Ui| 6 δ|V|. So in this case uncut(S OPT , TOPT ) |X|/|V| − 1/2 6 δ. So let us assume that the set X + · |E|. always remains larger than Y. Then we always add ∆ 0 0 Xi to X and Yi to Y. We have From (5.1), we get [ X | | V ∪ 0 X = X Xi uncut(X j, Y j)|E(V j)| i j q X X 4 · uncut(S OPT ∩ V j, TOPT ∩ V j)|E(V j)| 6 |S ∩ V |+ 6 OPT j λ j=1 j

10 4 · uncut(S OPT , TOPT ) 0 0 0 6 · |E| . Xi to X and Yi and Y; otherwise, we add Xi to Y and λ Y0 and X. We argue again that if at some point the i √  difference |X| − |Y| becomes less than O( ε|V|), √ then after that |X| − |Y| = O( ε|V|), and therefore, √ 6 The bisection algorithm – proof of we find a cut with bias β + O( ε). Otherwise, Theorem 1.2 there are two possible cases: either we always have |X| − |Y| > β|V|, and then we always add X0 to X First, we run the preprocessing step described i and Y0 to Y, or we always have |X| − |Y| > β|V|, in Section 3.1. Then we use the algorithm from i and then we always add X0 to Y and Y0 to X. Note, Lemma 3.1 with λ = ε2/3 and δ = ε to find a i i OPT OPT however, that in both cases |X|−|Y|−β|V| decreases partition of V into sets U ,..., U , V ,..., V . We 1 √ p 1 q 0 0 by |Y | − |X | > |S OPT ∩ Ui| − |TOPT ∩ Ui| − 2∆|Ui| apply Corollary 4.4 with ∆ = εOPT to all sets Ui, i i after each iteration. Thus after all iterations, the and obtain a list Ui of candidate cuts for each set value of |X| − |Y| − β|V| decreases by at least Ui. Then we apply Lemma 4.2 and obtain a list V j of candidate cuts for each set V . Finally, we find j Xp the optimal combination of candidate cuts using |S OPT ∩ Ui| − |TOPT ∩ Ui| − 2∆|Ui| Lemma 5.1. Denote it by (X, Y). By Lemma 5.2, i=1 [ [ we get that uncut(X, Y) is at most > S OPT ∩ Ui − TOPT ∩ Ui  √ √ ε ε  i i O λ log(1/δ) + ε + OPT + OPT [ OPT λ ∆ − 2∆ U . √ i 3 6 O( εOPT log(1/εOPT )), i | V | − | V | | ∩ and Taking into the account that X Y = S OPT S V | − |T ∩ S V |, we get the following bound i i OPT i i |X| 1 √ for the bias of the final combination cut (X, Y), − 6 max(∆, δ) = O( ε ). | | OPT V 2 √ |X| − |Y| − β|V| By moving at most O( ε )|V| vertices of the OPT [ [ smallest degree from the larger size of the cut to 6 S OPT ∩ Vi − TOPT ∩ Vi − β|V| smaller part of the cut, we obtain a balanced cut. By i i [ [ [ doing so, we increase the number of uncut edges √ − S OPT ∩ Ui + TOPT ∩ Ui + 2∆ Ui by at most O( εOPT |E|). The obtained bisection √ i i i 3 cut cuts a 1 − O( εOPT log(1/εOPT )) fraction of all edges. 6 |S OPT | − |TOPT | − β|V| + 2∆|V| = 2∆|V|. It is easy to see that a slight modification of the algorithm leads to the following extension of Theo- We√ get the exact β-biased cut by moving at most | | rem 1.2. O( ε) V vertices of the smallest degree from the larger size of the cut to smaller√ part of the cut. By Theorem 6.1. There is a randomized polynomial doing so, we lose at most O( ε)|E|/(1 − β) cut time algorithm that given an edge-weighted graph edges. Therefore the theorem follows.  G with a β-biased cut of value√ (1 − ε) finds√ a β- biased cut of value (1 − O( 3 ε log(1/ε) + ε/(1 − 7 Improving the performance when β))). small (constant) imbalance is toler- Proof. We use the algorithm above, while changing ated DP algorithm used by Lemma 5.1 to find√ the best In this section we prove Theorem 1.3. By ap- combination with bias β ± t (where t = O( ε)). We plying the sparsification procedure described in modify the proof of Lemma 5.2√ to show that there Section 3.1, we may assume that the graph G is exists a β ± t cut of value 1 − O( ε). As previously, unweighted. we first find sets X = XV and Y = YV with |XV | > We begin by describing a difference graph de- |YV |. Now, however, if |X| − |Y| > β|V| then we add composition procedure that will be used to prove

11 Theorem 1.3. This will ensure a slightly different Semidefinite program SDP(α)(G): expansion property for the expanding parts, which 1 X we define first. Minimize kv − v k2 4|E| i j e=(i, j)∈E 7.1 (α, γ)-expansion subject to |W| For a subset W ⊆ V, we denote by µ(W) = |V| the measure of W. Note that this measure is with kvik = 1 for i = 1, 2,..., n 2 respect to the uniform distribution and not the sta- ¾[kvi − v jk ] > 8α(1 − α) . (7.1) tionary distribution on vertices (which is used by i, j definition of vol W). Lemma 7.3. If G is not an (α, γ)-expander, then Definition 7.1. Let 0 < α, γ < 1. A graph the above SDP has a feasible solution of cost less G = (V, E) is said to be an (α, γ)-expander if for than γ. |S | every subset S ⊂ V with α 6 |V| 6 1/2, one has |edges(S, V \ S )| > γ · |E|. Proof. Suppose S ⊂ V is a subset of density An induced subgraph G[W] of is said to be an µ(S ) ∈ [α, 1/2] such that |edges(S, S )| < γ · |E|. (α, γ)-expander (relative to G) if for every S ⊂ W Taking vi = 1 for i ∈ S and vi = −1 for i ∈ S |S | gives a feasible solution to SDP(α)(G) of value with measure α 6 |W| 6 1/2, one has |edges(S, W \ S )| > γ · µ(W)|E|. |edges(S, S )|/|E| < γ.  Similar to spectral expanders, we can also argue We now show that a good SDP solution can be that if a graph is an (α, γ)-expander, then any two rounded into a sparse, somewhat balanced cut (S, S ) large cuts in the graph must differ in very few ver- via random hyperplane rounding. The method is tices. This is the analog of Lemma 4.1 with the standard, but the subtlety lies in the fact that the difference being that we measure distance between measure we use for the size of S is not related to cuts with respect to the uniform measure instead of the edge weights or degrees but just the number of the stationary distribution. vertices in S . The proof is deferred to Appendix A. Lemma 7.2. Let G = (V, E) be an (α, γ)-expander. Lemma 7.4. Let α 6 1/2. There is a polynomial For ζ < γ/2, let (A, A) and (B, B) are two cuts in G time that given a solution such that at least a fraction (1 − ζ) of the edges of to the above SDP with objective value γ, finds G cross each of them. Then a set S ⊂ V with √α/2 6 µ(S ) 6 1/2 satisfying  | | 6 γ | | min µ(A ∩ B) + µ(A ∩ B), µ(A ∩ B) + µ(A ∩ B) < α. edges(S, S ) < α E with probability at least 0.9. √ Proof. Assume for definiteness that |A ∩ B| + |A ∩ In particular, G is not an (α/2, 6 γ/α)-expander if the B| 6 |A ∩ B| + |A ∩ B|. Let X = (A ∩ B) ∪ (A ∩ B); SDP has a solution with value at most γ. we have µ(A ∩ B) + µ(A ∩ B) = µ(X) 6 1/2. Any edge leaving X must fail to be in at least one of the 7.3 The partition lemma for (α, γ)- two cuts (A, A) and (B, B). Thus |edges(X, X)| 6 expanders 2ζ|E| < γ|E|. Since G is an (α, γ)-expander, this We now present the partition algorithm that given implies µ(X) < α.  an arbitrary graph G = (V, E), partitions the graph into induced subgraphs each of which is either an Now we prove a similar partition lemma as (α, γ)-expander or is of small measure (at most η), Lemma 3.1, but partition the graph into (α, γ)- while cutting a small fraction of the edges. For- expanders and small parts. mally, we prove the following. 7.2 SDP relaxation for (α, γ)-expansion Lemma 7.5. For any choice of parameters 0 < Consider the following SDP for a graph G = α, γ, η < 1, there is a polynomial time algorithm (V, E) (we identify V = {1, 2,..., n}), parametrized that on input a graph G = (V, E), outputs a partition by α. 
The SDP is the standard one for the α- V = V1∪V2∪· · ·∪Vr such that for i, 1 6 i 6 r, either balanced separator problem. (Below kxk denotes µ(Vi) 6 η or the induced subgraph G[Vi] is an the `2-norm of a vector x.) (α, γ)-expander, and the number of edges crossing

12 between different parts V and V for i j is at is higher. This is because in a sparse component √ i j ,  γ  G[T], two cuts could differ in a large fraction of most O 2 log(1/η) |E|. α vertices yet yield roughly the same value. Hence, Proof. Let n = |V|. The idea is to recursively de- the threshold for expansion γT beyond which G[T] compose the graph using the method of Lemma 7.4 is not further decomposed is higher for a sparse till each part either has at most ηn vertices, or is an component G[T]. (α, γ)-expander as certified by the optimum SDP objective value being larger than γ. For a subset The partition claimed in the lemma is obtained by T ⊆ V, define the subroutine Decompose(G[T]) running Decompose(G). The algorithm clearly ter- on the subgraph induced by T as follows: minates (by virtue of (7.2) and the fact that we stop when the parts have at most ηn vertices). Clearly 1. If |T| 6 ηn, return T. each part in the final decomposition is either an (In this case, the subgraph is already small (α, γ)-expander or has at most ηn vertices. It only enough.) remains to establish the bound on the number of edges cut. By (7.3), the total number of edges cut 2. Compute the optimum value of the semidefi- by all the recursive calls at a particular depth j is nite program SDP(α)(G[T]). If this value is at p r least 6 γ · |E| Xj p def γ · µ(T)|E| µ(T )|E(T )| γT = , α i i |E(T)| i=1 where E(T) denotes the set of edges in the where G[T ], i = 1, 2,..., r are the disjoint in- induced subgraph G[T], return T. i j duced subgraphs occurring in the decomposition (In this case, the induced subgraph G[T] is tree at depth j. By Cauchy-Schwartz and the fact already an (α, γ)-expander by Lemma 7.3.) Pr j that |E(T )| 6 |E|, the above quantity is at i√=1 i 6 γ 3. Otherwise, run the algorithm of Lemma 7.4 most α · |E|. Since the maximum depth of re- (with input an SDP solution of value less than cursive calls is at most O(log(1/η)/α) by (7.2), the ∪ γT ) to find a partition T = T1 T2 with total number of edges√ cut over all levels is at most 6 γ O(log(1/η)/α) · α · |E|.  max{µ(T1), µ(T2)} 6 (1 − α/2)µ(T) 6 e−α/2µ(T) (7.2) 7.4 The algorithm First, we run the preprocessing step described and in Section 3.1. Then we use the algorithm from √ γT Lemma 7.5 to find a partition of V into sets |edges(T , T )| 6 6 |E(T)| 1 2 α V1,..., Vq. p In Section 7.5, we prove the following lemma, 6 γ · |E| p = · µ(T)|E(T)| .(7.3) where (α, γ)-SDP-expander is defined by Defini- α tion 7.10. Notice that by construction, the (α, γ)- Now recursively call Decompose(G[T1]) and expanders produced by Lemma 7.5 are (α, γ)-SDP- Decompose(G[T2]) on the subgraphs induced expanders. by T1 and T2, and return the union of the parts so produced. Lemma 7.7. Suppose G is the vertex-disjoint union of G[V1],..., G[Vr] for a partition V1,..., Vr of V Remark 7.6. For our application, it is crucial that (i.e., G does not contain edges joining Vi and V j the decomposition returns components that are all for i , j). Furthermore, suppose that for every either small or rigid in the sense that any two good k ∈ [r], µ(Vk) > η and the induced subgraph G[Vk] cuts are close to each other. A key idea in the above is an (α, γ)-SDP-expander (relative to G). If there decomposition procedure is that the threshold for exists a β-biased bisection of G that cuts 1 − ε of n expansion γT is not fixed to be a constant, but varies the edges, we can compute x ∈ {±1} such that with the graph G[T]. 
In particular, if the graph G[T] √ 1/5 ¾ xi − β 6 O( α + (ε/γ) ), (7.4) is sparser than G, then the threshold for expansion i

13 √ 1 2 ¾ (xi − x j) > 1 − O( ε). (7.5) a good approximation. But in order to get better ∈ 4 (i, j) E performance, we need to do some more work. The running time is polynomial in n and exponential Observe that the decomposition procedure in in r (which is bounded by 1/η). Section 7.3 has a stronger guarantee than stated Lemma 7.5. Specifically, each of the compo- Let W be the union of expanders, and U1,..., Up nents produced by the decomposition is an (α, γ)- be the small sets. We enumerate all possible β, and expander as certified by the SDP, which is a stricter use Lemma 7.7 to find a list of candidate cuts W requirement. for G[W] with parameter β. We use Corollary 4.4 To make this formal, define an (α, γ)-SDP ex- to find candidate cuts U j for each G[Ui]. pander as follows: We use Lemma 5.1 to find the best combination of the candidate cuts, with parameter t = cδ for Definition 7.10. For vertex set A ⊆ V, we say the some constant c. Thus, we only need to prove the induced subgraph G[A] is a (α, γ)-SDP-expander following lemma. (relative to G) if the optimal value of the relaxation SDP(α)(G[A]) is at least γ · µ(W) · |E|/|E(A)|. In Lemma 7.8. Let W and U j be families of candi- n other words, every embedding u1,..., un ∈ ’ with date cuts we get by the procedure above. Then there 2 ¾ ∈ ku − u k > 8α(1 − α) satisfies exists a composition cut (X, Y) such that i, j A i j  √ 2 – uncut(X, Y) 6 O γ log(1/η)/α + X 2 √ kui − u jk > 4γ · µ(A) · |E| . uncut(S OPT ,TOPT )  uncut(S OPT , TOPT ) + ∆ e=(i, j)∈E(A) and  √ √ (α) |X| 1 Since the semidefinite program SDP is a – − 6 O α + uncut(S OPT , TOPT ) + |V| 2 relaxation, (α, γ)-SDP-expansion implies (α, γ)-  1/5 uncut(S OPT ,TOPT expansion. In other words (α, γ)-SDP expansion is γ , a strictly stronger notion than just (α, γ)-expansion. where (S OPT , TOPT ) is the optimal bisection of G. Suppose V1,..., Vr is the partition computed by Proof. The proof goes along the lines of the proof the decomposition procedure in Section 7.3. Then, for Lemma 5.2. Instead of choosing candidate cuts the subgraphs G[Vi] with µ(Vi) > η are, in fact, in Vi, we need to choose a cut in W — we just guaranteed to be (α, γ)-SDP-expanders (instead of choose the one generated from β∗, which is the bias (α, γ)-expanders). of optimal solution in G[W].  Now present an algorithm that exploits this prop- erty of the decomposition, and therefore prove Fix α = δ2, γ = ε /δ5, η = δ2 and δ = ε , OPT OPT Lemma 7.7. we prove the following theorem. Lemma 7.7. (restated) Suppose G is the vertex- Theorem 7.9. There is a randomized polynomial disjoint union of G[V ],..., G[V ] for a partition time algorithm that for any constant δ > 0 given 1 r V ,..., V of V (i.e., G does not contain edges join- an edge-weighted graph G with a Max Bisection 1 r ing Vi and V j for i , j). Furthermore, suppose that of value (1 − ε) finds a cut (A, B) of value at least √ for every k ∈ [r], µ(Vk) > η and the induced sub- 1 − O( ε) satisfying |A|/|V| − 1/2 6 O(δ). graph G[Vk] is an (α, γ)-SDP-expander (relative to It is easy to generalize Theorem 7.9 to its β- G). If there exists a β-biased bisection of G that n biased cut version, which is Theorem 1.3. cuts 1 − ε of the edges, we can compute x ∈ {±1} such that 7.5 Improved Approximation with SDP- √ 1/5 based Expansion ¾ xi − β 6 O( α + (ε/γ) ), i √ In this section, we prove Lemma 7.7 by show- 1 2 ¾ (xi − x j) > 1 − O( ε). ing how to approximate optimal solution’s behav- (i, j)∈E 4 ior on expanders when the graph is partitioned by Lemma 7.5. 
The rough idea is to use Lemma 7.2 The running time is polynomial in n and exponential to argue that finding max-cuts on the expanders is in r (which is bounded by 1/η).

14 0 Proof. Let x be a β-biased bisection of G that cuts Now we will show that for every good part Vk ⊆ 0 1 − ε of the edges. Suppose x has bias βk in part Vk, V \ Vbad, the vectors are somewhat integral in that ¾ 0 ¾ 0− 0 2 so that i∈Vk xi = βk and therefore i, j∈Vk (xi x j) = they are all clustered around two antipodal points. 2 Consider a good part Vk ⊆ V \ Vbad. The vectors as- 2(1 − βk)(1 + βk) = 2(1 − βk). Since there are only r parts, we can enumerate sociated with Vk are close to each other on average, ¾ k − k2 − ⊗t all choices for β1, . . . , βr in time polynomial in n i.e., i, j∈Vk ui u j 6 8α(1 α). Since ui = vi , if and exponential in r. Hence, we can assume that the vectors ui are correlated, then the original vec- the biases β1, . . . , βr are known to the algorithm. tors vi are highly correlated (or anti-correlated) on Let v1, . . . , vn be a feasible solution to the following average. Formally, suppose two vertices i, j ∈ Vk semidefinite program. satisfy |hvi, v ji| 6 1 − δ, where we choose δ = 1/t. t −tδ X Then, hui, u ji 6 (1 − δ) 6 e = 1/e, which means 1 k − k2 − | | k − k2 ¾ k − k2 − 4 vi v j > (1 ε) E (7.6) ui u j > Ω(1).Since i, j∈Vk ui u j 6 8α(1 α), e=(i, j)∈E it follows that 2 2 ¾ kvi − v jk = 2(1 − β ). (7.7) n o ∈ k i, j Vk  |hvi, v ji| 6 1 − δ 6 O(α) . (7.8) i, j∈Vk Note that the semidefinite program is feasible since the integral cut x0 yields a solution. Fur- In other words, within a good component G[Vk], at thermore, given the values of β1, . . . , βr a feasi- most α-fraction of the pairs of vectors vi, v j are less ble solution to the SDP can be found efficiently. than 1 − δ correlated. We will show that the expansion properties of the The improved bound on the bias stems from graphs G[V1],..., G[Vr] imply that the embedding the fact that the Goemans-Williamson rounding v1, . . . , vn form essentially an integral solution. Re- procedure has a better balance guarantee on near- call that in the intended integral solution to the integral solution. Specifically, consider the follow- semidefinite program, all vertices are assigned ei- ing simple rounding procedure: Let g be an in- ther +1 or −1, and hence are clustered in two an- dependent standard Gaussian vector in ’n. For tipodal directions. We will argue that for most parts i ∈ V, define xi as the sign of hg, vii. The Goemans–√ G[Vi], nearly all of its corresponding vectors are Williamson analysis shows that x cuts 1 − O( ε) of clustered along two anti-podal directions. the edges in expectation. Another (related) property Consider the vectors u1,..., un defined as ui = of this rounding is that for any two vertices i, j with ⊗t |hvi, v ji| > 1 − δ, vi , where t is the even integer (we determine the best choice for t later). We can upper bound the √ 1 2 1 2 p average length of an edge of G in the embedding ¾ (xi − x j) − kvi − v jk 6 O( δ) = O( 1/t) . x 4 4 u1,..., un as follows (7.9) 1 2 ¾ kui − u jk 6 O(tε) . (i, j)∈E 4 Consider a good part Vk ⊆ V \ Vbad. We will esti- 1 2 1 2 mate ¾x ¾i, j∈Vk (xi − x j) − (1 − β ) . The sec- 2 4 2 k We say a part Vk is good if ¾i, j∈Vk kui − u jk 6 ond property (7.9) of the Goemans–Williamson 8α(1 − α). Otherwise, we say that Vk is bad. Let rounding implies that V be the union of the bad parts. Consider a bad bad 2 h 1 2 i part Vk ⊆ Vbad (i.e., ¾i, j∈Vk kui − u jk > 8α(1 − α)). ¾ ¾ − |h i| − − 4 (xi x j) vi, v j > 1 δ Since G[V ] is an (α, γ)-SDP-expander, it holds that x i, j∈Vk k P 1 2 h 1 2 i p e=(i, j)∈E(A) kui − u jk > γµ(Vk)|E|. 
Therefore, we ¾ k − k |h i| − 4 4 vi v j vi, v j > 1 δ = O( 1/t) . can lower bound the average length of an edge of i, j∈Vk G in the embedding u ,..., u in terms of µ(V ), 1 n bad On the other hand, we can estimate the effect of 1 2 conditioning on the event |hv , v i| 1 − δ on the ¾ kui − u jk > γµ(Vbad) . i j > (i, j)∈E 4 expectation over i, j ∈ Vk (using the fact that Vk is a good part and the observation (7.8)). It follows that µ(Vbad) 6 O(tε/γ) (which means that the vertices in Vbad have negligible influence h i ¾ 1 − 2 − 4 (xi x j) on the bias of a bisection). i, j∈Vk

15 h i ¾ 1 − 2 |h i| − (using µ(V ) 6 O(tε/γ)) 4 (xi x j) vi, v j > 1 δ 6 O(α) bad i, j∈Vk  −  (7.10) 6O t 1/4 + α1/2 + tε/γ (by (7.14)) √ h 1 2i 1/5 ¾ kvi − v jk − 6O( α + (ε/γ) ) i, j∈V 4 k 5 4 h i (choosing t such that t = (γ/ε) ) . ¾ 1 k − k2 |h i| − 4 vi v j vi, v j > 1 δ 6 O(α) i, j∈Vk (7.11)  Combining these bounds and using the fact that ¾ 1 k − k2 1 − 2 8 Minimum Bisection i, j∈Vk 4 vi v j = 2 (1 βk) implies that The techniques of this work almost immediately 1 2 1 2  p  ¾ ¾ (xi − x j) − (1 − β ) 6 O 1/t + α imply the following approximation algorithm for x i, j∈V 4 2 k k the minimum bisection problem. (7.12) Observe that the bias is given by, Theorem 8.1. Suppose G is a graph with a β- q biased bisection cutting ε-fraction of edges then: | ¾ xi| = ¾ xi x j i∈Vk i, j∈Vk – For every constant δ > 0 it is possible to ef- s s ficiently find a bisection with bias β ± O(δ) ! 2 ! √ 1 − xi x j (xi − x j) = 1 − 2 ¾ = 1 − 2 ¾ . cutting at most O( ε)-fraction of edges. i, j∈Vk 2 i, j∈Vk 4 – It is possible to efficiently√ find a β-biased√ bisec- √ tion cutting at most O( 3 ε log(1/ε) + ε/(1 − If f denote the function f (u) = 1 − 2u, β))-fraction of edges. then by the above equality the bias ¾i∈Vk xi = f (¾ 1 (x − x )2). Since f satisfies | f (u) − Proof. The result follows essentially along the lines i, j∈Vk 4 √i j f (u0)| 6 O( |u − u0|), we can estimate the expected of the corresponding results for Max-Bisection deviation of the bias ¾x|¾i∈Vk xi − βk| as follows: (Theorem 1.2 and Theorem 1.3). We simply need to change the vi − v j terms in the SDP formulation ¾ ¾ − xi βk (7.13) and analysis to vi + v j, and likewise for xi − x j in the x i∈Vk ±1 vector x (the cut) found by the algorithm. The  1 2  1  = ¾ f ¾i, j∈V (xi − x j) − f (1 + βk)(1 − βk) x k 4 2 details are therefore omitted in this version.  / ¾ ¾ 1 − 2 − 1 − 2 1 2 6O(1) i, j∈Vk 4 (xi x j) 2 (1 βk) x A Proof of Lemma 7.4  1/2 1 2 1 2 6O(1) ¾ ¾i, j∈Vk (xi − x j) − (1 − β ) x 4 2 k Proof. We will round the vectors into a cut using (using Cauchy–Schwarz) a random hyperplane. Specifically, pick a random vector r ∈ (N(0, 1))n at random, and let P = {i | p 1/2 6O( 1/t + α) (by (7.12)) hvi, ri > 0}. We will let S be the smaller of the two 6O(t−1/4 + α1/2) . (7.14) sets P, V \ P. Clearly µ(S ) 6 1/2. We will now bound the probability that µ(S ) < α/2 or the cut Finally, we obtain the following bound on the ex- (S, S ) has too many edges crossing it. pected deviation of the bias The expected number of edges crossing the cut is ¾ ¾ xi − β x i∈V r X arccoshvi, v ji X ¾ |edges(S, S ))| = µ(V ) ¾ ¾ x − β π 6 k i k ∈ x i∈Vk (i, j) E k=1 arccoshvi, v ji (using triangle inequality) = |E| ¾ (A.1) (i, j)∈E π r X 6 µ(Vk) ¾ ¾ xi − βk + O(tε/γ) x i∈Vk As the SDP objective value is γ, we have k∈[r] ¾ k − k2 Vkgood (i, j)∈E[ vi v j ] 6 4γ, or equivalently

16 ¾(i, j)∈E[hvi, v ji] > 1 − 2γ. By an averaging argu- References ment, +  [hvi, v ji 6 0] 6 2γ . (A.2) [AKK 08] Sanjeev Arora, Subhash Khot, Alexandra (i, j)∈E Kolla, David Steurer, Madhur Tulsiani, and Nisheeth K. Vishnoi, Unique games on ex- Now panding constraint graphs are easy, Pro- ceedings of the 40th Annual ACM Sym- "arccoshv , v i# ¾ i j posium on Theory of Computing, 2008, (i, j)∈E π pp. 21–28.3 6  [hvi, v ji 6 0] [Chu96] F. R. K. Chung, Laplacians of graphs and (i, j)∈E " # Cheeger’s inequalities, Combinatorics, Paul arccoshvi, v ji Erdos˝ is Eighty 2 (1996), 157–172.4 + ¾ hvi, v ji > 0 (i, j)∈E π [FJ97] Alan M. Frieze and Mark Jerrum, Improved approximation algorithms for max k-cut 1    6 2γ + arccos ¾ hvi, v ji | hvi, v ji > 0 and max bisection, Algorithmica 18 (1997), ∈ π (i, j) E no. 1, 67–81.2,3 (using (A.2) and Jensen) [FK02] Uriel Feige and Robert Krauthgamer, A arccos(1 − 2γ) polylogarithmic approximation of the mini- 6 2γ + π mum bisection, SIAM J. Comput. 31 (2002), 2 √ √ no. 4, 1090–1118.3 6 2γ + γ < 3 γ . π [FL06] Uriel Feige and Michael Langberg, The RPR2 rounding technique for semidefinite programs, J. Algorithms 60 (2006), no. 1, By Markov inequality,√ the probability that 6 γ 1–23.2,3 |edges(S, S )| > · |E| is at most α/2. α [GW95] Michel X. Goemans and David P. Let us now analyze the size of S . Clearly Williamson, Improved approximation algo- ¾[|P|] = |V|/2. Also rithms for maximum cut and satisfiability 1 X arccoshvi, v ji problems using semidefinite programming, ¾[|P||V \ P|] = 2 π Journal of the ACM 42 (1995), no. 6, i, j 1115–1145.1,2,5,8 X 1 − hvi, v ji [Hås01] Johan Håstad, Some optimal inapproxima- > 0.878 · 2 bility results, Journal of the ACM 48 (2001), i, j no. 4, 798–859.2 (by Goemans-Williamson analysis) [HK04] Jonas Holmerin and Subhash Khot, A new PCP outer verifier with applications to > 0.878 · 2α(1 − α)|V|2 (using (7.1)) homogeneous linear equations and max- bisection, Proceedings of the 36th Annual Thus ¾[µ(P)] = 1/2 and ¾[µ(P)(1 − µ(P))] > ACM Symposium on Theory of Computing, 0.878 · 2α(1 − α) > 3α/4 since α 6 1/2. Note 2004, pp. 11–20.2 that µ(S ) = 1/2 − |µ(P) − 1/2|. Hence [HZ02] Eran Halperin and Uri Zwick, A unified framework for obtaining improved approxi-  1 1 α mation algorithms for maximum graph bi- [µ(S ) < α/2] =  µ(P) − > − 2 2 2 section problems, Random Struct. Algo- ¾[(µ(P) − 1/2)2] rithms 20 (2002), no. 3, 382–402.2 < [Kho02] Subhash Khot, On the power of unique 2- 1/4 − α (1 − α/2) 2 prover 1-round games, Proceedings of the 1/4 − 3α/4 34th Annual ACM Symposium on Theory 6 6 1 − α . 1/4 − α/2 of Computing, 2002, pp. 767–775.2 [KKMO07] Subhash Khot, Guy Kindler, Elchanan Mos- sel, and Ryan O’Donnell, Optimal inap- We can thus conclude that with probability√ at least 6 γ proximability results for MAX-CUT and α/2, µ(S ) > α/2 and |edges(S, S )| < α · |E|. Repeating the rounding procedure O(1/α) times other 2-variable CSPs?, SIAM J. Comput. 37 (2007), no. 1, 319–357.2 boosts the success probability to 90%.  [MOO10] Elchanan Mossel, Ryan O’Donnell, and

Krzysztof Oleszkiewicz, Noise stability of functions with low influences: invariance and optimality, Annals of Mathematics 171 (2010), no. 1, 295–341.

[OW08] Ryan O'Donnell and Yi Wu, An optimal SDP algorithm for Max-Cut, and equally optimal long code tests, Proceedings of the 40th Annual ACM Symposium on Theory of Computing, 2008, pp. 335–344.

[Rag08] Prasad Raghavendra, Optimal algorithms and inapproximability results for every CSP?, Proceedings of the 40th ACM Symposium on Theory of Computing, 2008, pp. 245–254.

[RST10] Prasad Raghavendra, David Steurer, and Madhur Tulsiani, Reductions between expansion problems, manuscript, 2010.

[TSSW00] Luca Trevisan, Gregory B. Sorkin, Madhu Sudan, and David P. Williamson, Gadgets, approximation, and linear programming, SIAM J. Comput. 29 (2000), no. 6, 2074–2097.

[Ye01] Yinyu Ye, A .699-approximation algorithm for Max-Bisection, Mathematical Programming 90 (2001), 101–111.
