
Approximation for Independent Set via Semidefinite Programming Hierarchies and Randomized Rounding

Jason Altschuler ([email protected])
Matthew Brennan ([email protected])

Abstract. In this literature review, we present two recent papers of Bansal, Gupta and Guruganesh in STOC 2015 and SODA 2015 on approximating the Independent Set problem on graphs with maximum degree d. We present proofs that a certain randomized rounding for the standard SDP relaxation yields an O(d log log d/ log d)-approximation, and that the integrality gap of this relaxation is Õ(d/ log^{3/2} d), where Õ(·) hides log log d factors. We also present algorithms using polylog(d) and d levels of the SA+ hierarchy to achieve approximation factors of O(d/ log d) and Õ(d/ log^2 d), respectively.

Contents

1 The History of Approximating Independent Set
  1.1 Notation
2 Preliminaries: Formulations and Relaxations of Independent Set
  2.1 Standard ILP, LP and SDP Relaxations
  2.2 Tighter Relaxations via the Sherali-Adams Hierarchies
3 The SDP Relaxation and Halperin's O(d log log d/ log d)-approximation
4 The Integrality Gap of the SDP Relaxation is Õ(d/ log^{3/2} d)
5 O(d/ log d)-approximation with polylog(d) levels of the SA+ hierarchy
  5.1 Pre-processing
  5.2 Iterative Thinning Procedure
  5.3 Only using poly(log d) levels of SA+
6 An Õ(d/ log^2 d)-approximation d levels into the SA+ Hierarchy
7 Future Directions
References
Appendix A. Technical Lemma for HALPERIN-SDP in Section 3
Appendix B. Ramsey Theory Background and Results for Section 4
Appendix C. Technical Lemmas for Section 5

1 The History of Approximating Independent Set

As one of the original problems shown to be NP-complete by Richard Karp in 1972 [12], independent set was one of the first combinatorial optimization problems that algorithmists sought to approximate. In 1979, Lovász famously introduced the ϑ-function, a semidefinite programming relaxation of independent set, and used this relaxation to prove several results in extremal combinatorics [16]. In 1996, Håstad proved the seminal hardness result that, assuming NP ⊄ ZPP, there is no n^{1−ε}-approximation for any constant ε > 0 for the general independent set problem [11]. This hardness result has since been improved by Khot and Ponnuswami, who showed that there is no n/2^{(log n)^{3/4+ε}}-approximation for any ε > 0 [13].

Because of these inapproximability results for the general independent set problem, the research community has shifted its focus to approximating independent set on subclasses of graphs. In particular, significant attention has been given to graphs with maximum degree bounded above by d. This problem is the focus of this literature review.

For this restricted problem, a simple greedy approach already provides a decent approximation ratio. Given a graph G of maximum degree d, consider greedily coloring the vertices of G using d + 1 colors by iteratively choosing a vertex and coloring it with a color not used by any of its already-colored neighbors. This yields a proper (d + 1)-coloring of G. Outputting the largest color class yields an independent set of size at least n/(d + 1), which gives a simple O(d)-approximation algorithm. It turns out this is nearly optimal: in 2009, Austrin, Khot and Safra showed that, assuming the Unique Games Conjecture, this problem cannot be approximated with a ratio better than Ω(d/ log^2 d) for d mildly increasing as a function of n [3]. Until 2015, the best ratio achieved was the O(d log log d/ log d) of Halperin's SDP rounding [10]. In 2015, Bansal, Gupta and Guruganesh released two papers [4, 5] further analyzing Halperin's SDP relaxation and higher levels of semidefinite programming hierarchies. These two papers are the topic of this survey.

The structure of our paper is as follows. In Section 2, we outline the standard relaxations of independent set and the SA+ semidefinite programming hierarchies. In Section 3, we prove that Halperin's randomized rounding algorithm for the standard SDP relaxation yields an O(d log log d/ log d)-approximation; and in Section 4, we prove that the integrality gap of this relaxation is Õ(d/ log^{3/2} d), where Õ(·) hides log log d factors. In Section 5, we present an algorithm of Bansal using polylog(d) levels of the SA+ hierarchy that, at the cost of a runtime quasipolynomial in d, achieves an O(d/ log d)-approximation. In Section 6, we present an algorithm of Bansal, Gupta and Guruganesh using d levels of the SA+ hierarchy that, at the cost of a runtime exponential in d, achieves an Õ(d/ log^2 d)-approximation. The proofs of some technical lemmas are deferred to the appendix. We also include in Appendices B and C a short introduction to the Ramsey theory used in the paper.
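For concreteness, here is a minimal Python sketch of the greedy argument just described (our own illustration; the dict-of-sets graph representation is an assumption, not from the papers):

from collections import defaultdict

def greedy_independent_set(adj):
    """Greedily (d+1)-color the graph given as {vertex: set of neighbors},
    then return the largest color class, an independent set of size >= n/(d+1)."""
    color = {}
    for v in adj:
        used = {color[u] for u in adj[v] if u in color}
        # Some color in {0, ..., deg(v)} (a subset of {0, ..., d}) is always free.
        color[v] = next(c for c in range(len(adj[v]) + 1) if c not in used)
    classes = defaultdict(list)
    for v, c in color.items():
        classes[c].append(v)
    return max(classes.values(), key=len)

Since at most d + 1 color classes are used, the largest class has at least n/(d + 1) vertices, matching the guarantee above.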

1.1 Notation

In this paper, G = (V, E) denotes an arbitrary undirected graph on the vertex set V = [n] with maximum degree d and average degree d̄. Without loss of generality, we assume throughout that G is connected, since otherwise we can restrict to the connected components of G. We denote the number of vertices by n := |V|, and for succinctness write [n] := {1, . . . , n}. The independence number (also called stability number) of G is the cardinality of the largest independent set of G. As is standard, we denote this quantity by α(G). For notational shorthand, we often abbreviate "positive semidefinite" by "PSD". The Õ-notation denotes O-notation with suppressed poly(log log d) factors.

2 Preliminaries: Formulations and Relaxations of Independent Set

2.1 Standard ILP, LP and SDP Relaxations

ILP Formulation. The independent set problem can be exactly reformulated as the following integer linear program (ILP) on n Boolean decision variables:

$$\alpha(G) \;=\; \max \sum_{i=1}^{n} x_i \quad \text{s.t.} \quad x_i + x_j \le 1 \;\; \forall (i,j) \in E, \qquad x \in \{0,1\}^n$$

LP Relaxation. The standard linear program (LP) relaxation keeps the objective and the edge constraints, but allows x to vary over the box [0, 1]^n; the resulting polytope contains the ILP's (non-convex) feasible set. Let OPT_LP denote the optimal value of this LP.
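As an illustration (ours, not from the papers), the LP relaxation is directly solvable with an off-the-shelf solver; the sketch below uses scipy.optimize.linprog, which minimizes by convention, so the objective is negated:

import numpy as np
from scipy.optimize import linprog

def independent_set_lp(n, edges):
    """Solve the LP relaxation: max sum_i x_i s.t. x_i + x_j <= 1 on edges,
    0 <= x <= 1. Returns (OPT_LP, fractional solution)."""
    A = np.zeros((len(edges), n))
    for row, (i, j) in enumerate(edges):
        A[row, i] = A[row, j] = 1.0
    res = linprog(c=-np.ones(n), A_ub=A, b_ub=np.ones(len(edges)),
                  bounds=[(0.0, 1.0)] * n)
    return -res.fun, res.x

Note that the all-halves vector is always feasible, so OPT_LP ≥ n/2 on every graph; on the clique K_{d+1}, where α(G) = 1, this already exhibits an Ω(d) integrality gap for the plain LP.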

SDP Relaxation. Introducing an auxiliary unit vector v0 to represent the value 1 yields the following SDP relaxation of independent set

$$\mathrm{OPT}_{\mathrm{SDP}} \;=\; \max \sum_{i=1}^{n} v_0 \cdot v_i \quad \text{s.t.} \quad \|v_0\|_2 = 1, \qquad v_i \cdot v_i = v_0 \cdot v_i \;\; \forall i \in [n], \qquad v_i \cdot v_j = 0 \;\; \forall (i,j) \in E$$

The integrality gap of this relaxation is the ratio between the SDP value OPT_SDP and the desired independence number α(G). Define the Gram matrix Y ∈ ℝ^{(n+1)×(n+1)} with entries Y_{ij} := v_i · v_j for each i, j ∈ {0, . . . , n}. Then the above SDP can be written in the following slightly more conventional form:

$$\mathrm{OPT}_{\mathrm{SDP}} \;=\; \max \sum_{i=1}^{n} Y_{ii} \quad \text{s.t.} \quad Y_{00} = 1, \quad Y_{ii} = Y_{0i} \;\; \forall i \in [n], \quad Y_{ij} = 0 \;\; \forall (i,j) \in E, \quad Y \succeq 0 \qquad (2.1)$$

The above SDP is equivalent to the celebrated Lovász ϑ-function. This can be checked by taking the dual of the above SDP and arguing that there is no duality gap by Slater's condition (for details see e.g. Lemma 3.4.4 of [15]). As such, we will henceforth write ϑ(G) to denote OPT_SDP. While there is significant literature on the ϑ-function (see e.g. [7, 9, 15]), we will only need the following inequality, which shows that the SDP relaxation is tighter than the LP relaxation (both are upper bounds on α(G) since they are relaxations of a maximization problem):

$$\alpha(G) \;\le\; \vartheta(G) \;\le\; \mathrm{OPT}_{\mathrm{LP}} \qquad (2.2)$$
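For concreteness, the SDP (2.1) transcribes almost verbatim into CVXPY; this is a sketch under the assumption that a standard SDP solver (e.g. SCS) is installed, not code from the papers, and vertices are labeled 1, . . . , n so that index 0 is reserved for v_0:

import cvxpy as cp

def lovasz_theta(n, edges):
    """Solve the SDP (2.1); Y is the (n+1)x(n+1) Gram matrix of v_0, ..., v_n."""
    Y = cp.Variable((n + 1, n + 1), PSD=True)
    constraints = [Y[0, 0] == 1]
    constraints += [Y[i, i] == Y[0, i] for i in range(1, n + 1)]
    constraints += [Y[i, j] == 0 for (i, j) in edges]
    problem = cp.Problem(cp.Maximize(sum(Y[i, i] for i in range(1, n + 1))),
                         constraints)
    problem.solve()
    return problem.value, Y.value

As a sanity check, on the 5-cycle this returns ϑ(C_5) = √5 ≈ 2.236, Lovász's classical value.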

2.2 Tighter Relaxations via the Sherali-Adams Hierarchies

As suggested in Section 1, it is well-known that the LP and SDP relaxations of independent set can have large integrality gaps. A common approach to obtaining tighter relaxations is through convex-programming hierarchies. The rough intuition is to create a sequence of mathematical programs that have the same objective, but with progressively smaller feasible sets that all contain the original ILP's feasible set. A standard way to construct such hierarchies is by successively adding valid inequalities, which are constraints

for which every point in the original ILP's feasible set remains feasible. Clearly the addition of valid inequalities can only restrict the feasible set of the relaxed program to a smaller superset of the feasible set of the ILP. Moreover, if this restriction is strict, then we have obtained a tighter relaxation. A standard such hierarchy for the Independent Set problem is the following.

Sherali-Adams (SA) Hierarchy. The idea is simply to add clique inequalities, which are valid inequalities of the form Σ_{j∈S} x_j ≤ 1 for some clique S ⊆ [n]. Intuitively, a clique inequality simply ensures that an independent set can contain at most one node from any clique. The kth level (k ∈ [n − 1]) of the Sherali-Adams (SA) hierarchy, denoted by SA_k, is then defined to be the LP relaxation with clique inequalities for all subsets of [n] of size at most k + 1:

$$\mathrm{OPT}_{\mathrm{SA}(k)} \;=\; \max \sum_{i=1}^{n} x_i \quad \text{s.t.} \quad x \in [0,1]^n \;\text{ and }\; \sum_{j \in S} x_j \le 1 \;\; \forall S \subseteq [n] \text{ clique with } |S| \in \{2, \dots, k+1\}$$

Observe that the number of constraints grows exponentially in the level of the hierarchy. However, despite the fact that the relaxations get tighter throughout the hierarchy, the optimal value ϑ(G) of the SDP relaxation is always a better approximation [14]. That is,

$$\alpha(G) \;\le\; \vartheta(G) \;\le\; \mathrm{OPT}_{\mathrm{SA}(k)} \qquad (2.3)$$

Interestingly, the inequality in (2.3) shows that even with an exponential number of clique constraints, the LP is a looser relaxation than the SDP. The key property of the SA hierarchy that we will repeatedly use in the sequel is that solutions to it can be viewed as local distributions over independent sets. This is formalized as follows.

Lemma 1 (Lemma 1 of [8] and Theorem 2.1 of [4]). Let {D(S)}_{S⊆[n]: |S|≤k+2} be a family of distributions, where each D(S) is a distribution over {0,1}^S. Suppose the distributions satisfy both:

1. Supported on local independent sets. If i, j ∈ S and (i, j) ∈ E, then P_{D(S)}(y_i = 1, y_j = 1) = 0.

2. Distributions are consistent on subsets. If S′ ⊆ S ⊆ [n] with |S| ≤ k + 2, then the distributions D(S′) and D(S) agree on S′.

Then Y_S := P_{D(S)}(y_i = 1 ∀i ∈ S) is a feasible solution for SA_k. Conversely, for any feasible solution {Y′_S} for SA_{k+1}, there exists a family of distributions satisfying the above two properties, as well as Y_S = P_{D(S)}(y_i = 1 ∀i ∈ S) = Y′_S for all S ⊆ [n] with |S| ≤ k + 1.
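As an illustration of the definition of SA_k (our own brute-force sketch, exponential in k as noted above), the clique inequalities can be enumerated directly:

from itertools import combinations

def clique_inequalities(adj, k):
    """Yield every clique S with 2 <= |S| <= k+1; each S gives the SA_k
    constraint sum_{j in S} x_j <= 1. Runs in time O(n^{k+1})."""
    vertices = sorted(adj)
    for size in range(2, k + 2):
        for S in combinations(vertices, size):
            if all(v in adj[u] for u, v in combinations(S, 2)):
                yield S

Feeding these constraints to the LP of Section 2.1 yields OPT_{SA(k)}.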

Mixed Sherali-Adams (SA+) Hierarchy. One way to tighten the relaxations in the SA hierarchy is to add the PSD constraint from the SDP in equation (2.1). The kth level of the Mixed Sherali-Adams hierarchy, denoted SA+_k, is precisely the kth level of the SA hierarchy with the additional PSD constraint on the variables Y_{ij}. We denote its optimal value by OPT_{SA+(k)}.

Let us make a few remarks about the tightness of the relaxations in this hierarchy. Clearly each level of this hierarchy is at least as tight as the SDP value ϑ(G), since we are only adding valid clique inequalities to it. By similar logic, the mixed SA hierarchy is also tighter than the SA hierarchy; however, it is not immediately clear to what extent the semidefinite constraints help. Although by equation (2.3) all levels of the SA hierarchy give worse approximations than the SDP, intuitively the clique constraints from the SA hierarchy and the semidefinite constraints from the SDP restrict the feasible set in slightly different ways. The extent to which they do is directly related to the final tightness of the mixed SA hierarchy. We give quantitative results on this in Sections 5 and 6.

HALPERIN-SDP(G):

1. Let p = 4 log d / (3 log log d) and c = √( (2(p−2)/p) · log d ).

2. Solve for OPT_SDP and the optimal vectors v_0, v_1, . . . , v_n.

3. Partition V into S_0 = {i ∈ V : v_0 · v_i < 1/p}, S_1 = {i ∈ V : 1/p ≤ v_0 · v_i ≤ 1/2}, and I_2 = S_2 = {i ∈ V : 1/2 < v_0 · v_i}.

4. Use the greedy algorithm to find an independent set I_0 ⊆ S_0.

5. Project all v_i for i ∈ S_1 into the space orthogonal to v_0 and normalize to obtain the unit vectors u_i. Choose a random n-dimensional Gaussian vector η and let I′ = {i ∈ S_1 : u_i · η ≥ c}. Form I_1 by removing the endpoint of smaller index for each edge in I′.

6. Return the largest of I_0, I_1, I_2.

Figure 1: Halperin's O(d log log d / log d)-approximation
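To make the rounding concrete, the following Python sketch (our illustration, assuming the optimal SDP vectors are available as NumPy arrays) implements steps 3 through 5 of Figure 1:

import numpy as np

def halperin_round(v0, v, adj, d, rng=None):
    """One pass of HALPERIN-SDP, given the optimal SDP vectors.
    v0: unit numpy vector; v: dict {vertex: vector}; adj: {vertex: set of neighbors}."""
    rng = rng or np.random.default_rng()
    p = 4 * np.log(d) / (3 * np.log(np.log(d)))
    c = np.sqrt(2 * (p - 2) / p * np.log(d))
    S0 = [i for i in v if v0 @ v[i] < 1 / p]
    S1 = [i for i in v if 1 / p <= v0 @ v[i] <= 1 / 2]
    I2 = [i for i in v if v0 @ v[i] > 1 / 2]
    # Step 4: greedy independent set inside S0.
    I0, taken = [], set()
    for i in S0:
        if not (adj[i] & taken):
            I0.append(i)
            taken.add(i)
    # Step 5: project onto the orthogonal complement of v0, normalize, threshold at c.
    eta = rng.standard_normal(len(v0))
    u = {}
    for i in S1:
        w = v[i] - (v0 @ v[i]) * v0
        u[i] = w / np.linalg.norm(w)
    I_prime = {i for i in u if u[i] @ eta >= c}
    # Remove the smaller-index endpoint of every edge inside I_prime.
    I1 = [i for i in I_prime if not any(j > i for j in adj[i] & I_prime)]
    return max([I0, I1, I2], key=len)

The fact that I_2 is itself independent (so step 3 may output it directly) is proved in Theorem 1 below.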

3 The SDP Relaxation and Halperin’s O(d log log d/ log d)-approximation

In this section, we analyze the first rounding scheme for the standard SDP relaxation that beat the trivial greedy algorithm. This beautiful algorithm is due to Halperin [10]. We note that, to be consistent with the following Sections 4–6 as well as the papers [4, 5], our analysis below is modified to take as input the solution to the SDP in Section 2 rather than the slightly different SDP in Halperin's original paper.

The high-level idea of the algorithm is to separate the vertices of G into three sets based on their SDP values and to output the best of the three. We give deterministic guarantees for two of these sets, and provide a randomized rounding scheme for the third. The algorithm is formally described in Figure 1 and includes two parameters p and c that are chosen optimally given the analysis below. For each j = 0, 1, 2, let the partial weight of the optimal solution on S_j be given by

$$V_j = \sum_{i \in S_j} v_0 \cdot v_i$$

Let φ(x) = (1/√(2π)) · e^{−x²/2} be the PDF of the standard normal random variable and let N(x) = ∫_x^∞ φ(t) dt be its upper tail. We will use the fact that the projection η · u of a random n-dimensional Gaussian vector η onto a fixed unit vector u has a standard normal distribution. Throughout the analysis, we will also use the observation that v_i · v_i = v_i · v_0 and ‖v_0‖_2 = 1 (which are both constraints in the SDP) imply that

$$\left\| v_i - \frac{1}{2} v_0 \right\|_2^2 \;=\; v_i \cdot v_i - v_i \cdot v_0 + \frac{1}{4}\|v_0\|_2^2 \;=\; \frac{1}{4} \qquad (3.1)$$

The key step in the analysis is to lower bound the expected size of the random independent set I_1, as in the following lemma. The main observation is that |I_1| is lower bounded by the number of vertices of I′ minus the number of edges within I′, both of which can be estimated using geometric properties of high-dimensional Gaussian distributions under projections.

Lemma 2. It holds that

$$\mathbb{E}[|I_1|] \;=\; \Omega\!\left( \frac{d^{2/p} \cdot V_1}{d \sqrt{\log d}} \right) \;=\; \Omega\!\left( \frac{V_1 \log d}{d} \right)$$

Proof. Our goal is to upper bound the expectation of X, the random variable denoting the number of edges of G with both endpoints in I′. Intuitively, the smaller X is, the less cleanup we have to do to form I_1. For each i ∈ S_1, let w_i be the projection of v_i onto the space orthogonal to v_0, and let a_i := v_0 · v_i. Then v_i = a_i v_0 + w_i and 1/p ≤ a_i ≤ 1/2 for each i ∈ S_1. Since v_0 and w_i are orthogonal, it follows from equation (3.1) that

$$\frac{1}{4} \;=\; \left\| v_i - \frac{1}{2} v_0 \right\|_2^2 \;=\; \left( a_i - \frac{1}{2} \right)^2 + \|w_i\|_2^2$$

Now observe that if (i, j) ∈ E, then v_i · v_j = v_0 · w_i = v_0 · w_j = 0 implies that

$$0 \;=\; v_i \cdot v_j \;=\; (a_i v_0 + w_i) \cdot (a_j v_0 + w_j) \;=\; a_i a_j + w_i \cdot w_j$$

Combining the above two equations yields that if (i, j) ∈ E then

$$u_i \cdot u_j \;=\; \frac{w_i \cdot w_j}{\|w_i\|_2 \cdot \|w_j\|_2} \;=\; -\frac{a_i a_j}{\sqrt{\left(\frac{1}{4} - (a_i - \frac{1}{2})^2\right)\left(\frac{1}{4} - (a_j - \frac{1}{2})^2\right)}} \;=\; -\sqrt{\frac{a_i a_j}{(1-a_i)(1-a_j)}} \;\le\; -\frac{1}{p-1}$$

since 1/p ≤ a_i ≤ 1/2. Because u_i and u_j are unit vectors, it follows that

$$\|u_i + u_j\|_2^2 \;=\; 2 + 2\, u_i \cdot u_j \;\le\; \frac{2(p-2)}{p-1}$$

Now since the projection of η onto the unit vector (u_i + u_j)/‖u_i + u_j‖_2 is standard Gaussian, if (i, j) ∈ E then

$$\mathbb{P}[i, j \in I'] = \mathbb{P}[u_i \cdot \eta \ge c,\; u_j \cdot \eta \ge c] \le \mathbb{P}[(u_i + u_j) \cdot \eta \ge 2c] = \mathbb{P}\!\left[ \frac{u_i + u_j}{\|u_i + u_j\|_2} \cdot \eta \ge \frac{2c}{\|u_i + u_j\|_2} \right] = N\!\left( \frac{2c}{\|u_i + u_j\|_2} \right) \le N\!\left( c\sqrt{\frac{2p-2}{p-2}} \right)$$

This implies by linearity of expectation that E[X] ≤ |E′| · N(c√((2p−2)/(p−2))), where E′ is the set of edges between vertices in S_1. Since u_i · η has a standard normal distribution, each i ∈ S_1 is by definition included in I′ with probability N(c), and thus E[|I′|] = |S_1| · N(c). Combining these results with |E′| ≤ |S_1| d/2 yields

$$\mathbb{E}[|I_1|] \;\ge\; \mathbb{E}[|I'|] - \mathbb{E}[X] \;\ge\; |S_1| \cdot N(c) - \frac{|S_1| d}{2} \cdot N\!\left( c\sqrt{\frac{2p-2}{p-2}} \right) \;=\; \Omega\!\left( \frac{d^{2/p} \cdot |S_1|}{d \sqrt{\log d}} \right) \;=\; \Omega\!\left( \frac{|S_1| \log d}{d} \right)$$

where the proofs of the last two bounds for the given p and c are deferred to Appendix A. Finally, since v_0 · v_i ≤ 1/2 on S_1, it follows that |S_1| ≥ 2V_1, proving the lemma.

We remark that while the lemma above and the theorem below are both statements about expectations, repeating Step 5 a polynomial number of times yields a high-probability bound on the success of the algorithm. We now prove the correctness of the algorithm.

Theorem 1. HALPERIN-SDP(G) outputs an independent set in G of expected size

$$\Omega\!\left( \frac{\vartheta(G) \log d}{d \log \log d} \right)$$

and thus has an approximation ratio of O(d log log d/ log d).

Proof. It is clear that both I_0 and I_1 are independent sets by construction. If i ∈ I_2, we have that v_i · v_0 > 1/2 and thus (v_i − ½v_0) · v_0 > 0. Therefore if i, j ∈ I_2, then by an application of equation (3.1) as well as Cauchy-Schwarz:

$$v_i \cdot v_j \;=\; \left( v_i - \tfrac{1}{2}v_0 + \tfrac{1}{2}v_0 \right) \cdot \left( v_j - \tfrac{1}{2}v_0 + \tfrac{1}{2}v_0 \right)$$
$$=\; \left( v_i - \tfrac{1}{2}v_0 \right) \cdot \left( v_j - \tfrac{1}{2}v_0 \right) + \tfrac{1}{2}\left( v_i - \tfrac{1}{2}v_0 \right) \cdot v_0 + \tfrac{1}{2}\left( v_j - \tfrac{1}{2}v_0 \right) \cdot v_0 + \tfrac{1}{4}$$
$$>\; -\left\| v_i - \tfrac{1}{2}v_0 \right\|_2 \cdot \left\| v_j - \tfrac{1}{2}v_0 \right\|_2 + \tfrac{1}{4} \;\ge\; 0$$

and thus (i, j) ∉ E by definition of the SDP. This shows that I_2, and hence the output of HALPERIN-SDP(G), is an independent set. Now observe that since v_0 · v_i < 1/p for i ∈ S_0, we have that |S_0| ≥ pV_0 and therefore

$$|I_0| \;\ge\; \frac{|S_0|}{d+1} \;\ge\; \frac{pV_0}{d+1} \;=\; \Omega\!\left( \frac{V_0 \log d}{d \log \log d} \right)$$
$$\mathbb{E}[|I_1|] \;=\; \Omega\!\left( \frac{V_1 \log d}{d} \right)$$
$$|I_2| \;\ge\; V_2$$

By the pigeonhole principle V_j ≥ ϑ(G)/3 for some j, giving the desired approximation ratio.

4 The Integrality Gap of the SDP Relaxation is Õ(d/ log^{3/2} d)

The algorithm above constructively shows that the integrality gap of the SDP relaxation is Õ(d/ log d). In this section we show that this integrality gap is in fact smaller by approximately a log^{1/2} d factor. To prove this, we will show that a high ϑ(G) value implies that G has a large subgraph containing no complete graph on r vertices, for a relatively small value of r. We then prove a new bound in Ramsey theory showing that this implies that α(G) must be large. Intriguingly, it is unknown whether or not there is a polynomial time algorithm that achieves this integrality gap, since this proof is non-constructive. Specifically, the main result of this section does not imply that there is an efficient rounding procedure that outperforms Halperin's algorithm. However, it does suggest that there is potential for improvement. The main theorem of this section is the following.

Theorem 2. The standard SDP relaxation of independent set has integrality gap

$$\frac{\vartheta(G)}{\alpha(G)} \;=\; O\!\left( d \left( \frac{\log \log d}{\log d} \right)^{3/2} \right)$$

As mentioned above, the key ingredient to analyzing this integrality gap is the following new Ramsey-theoretic bound. Throughout this section, we say a graph is H-free if it does not contain H as a subgraph. Let K_r denote the complete graph on r vertices.

Theorem 3. If G is K_r-free and has maximum degree d, then

$$\alpha(G) \;=\; \Omega\!\left( \frac{n}{d} \cdot \max\left\{ \frac{\log d}{r \log \log d},\; \left( \frac{\log d}{\log r} \right)^{1/2} \right\} \right)$$

It has also been conjectured that this lower bound can be improved to α(G) = Ω(n log d/(d log r)). This would imply an integrality gap of Õ(d/ log^2 d), which would match the UGC-hardness ratio of Austrin, Khot and Safra. However, this again would be a non-constructive proof of the integrality gap, still leaving open whether or not this ratio can be matched by a polynomial time algorithm. As Theorem 3 is entirely a Ramsey-theoretic result and tangential to the algorithmic focus of this paper, we defer its proof to Appendix B. We remark that the techniques involved in proving Theorem 3 are quite interesting and probabilistic in nature. The reason why the proof of Theorem 3 does not directly yield an approximation algorithm is that it involves sampling from the uniform distribution over the independent sets of G, which is known to be hard. In Appendix B, we also give a brief introduction to Ramsey theory. We now give a proof of Theorem 2 using this Ramsey-theoretic bound. The key is to observe that G has a large subgraph that is K_r-free where r ≈ 2n/ϑ(G). By Theorem 3 this guarantees that G contains a large independent set, and hence the integrality gap of the SDP relaxation is small.

Proof of Theorem 2. Suppose that ϑ(G) = βn and note that if β ≤ 2/ log^{3/2} d, then the greedy algorithm yields a d/ log^{3/2} d approximation, which implies that ϑ(G)/α(G) = O(d/ log^{3/2} d). Thus we assume that β > 2/ log^{3/2} d. Let v_0, v_1, . . . , v_n be the optimal vectors achieving ϑ(G) in the standard SDP relaxation. We reduce to the v_i satisfying

$$\frac{\beta}{2} \;\le\; v_0 \cdot v_i \;\le\; \eta := \frac{2 \log \log d}{\log d}$$

Consider the case where more than n/ log^2 d vertices satisfy v_0 · v_i ≥ η. Then Halperin's algorithm yields a large independent set: apply Lemma 2 from Section 3 with p = 1/η to obtain an independent set among these vertices of size

$$\Omega\!\left( \frac{d^{2\eta}}{d\sqrt{\log d}} \cdot |\{i : v_0 \cdot v_i \ge \eta\}| \right) \;=\; \Omega\!\left( \frac{n \log^{3/2} d}{d} \right)$$

which achieves the desired ratio. Thus we can assume that there are at most n/ log^2 d such vertices. Now delete all vertices i with v_0 · v_i < β/2 or v_0 · v_i > η. Let ϑ′ be the SDP value of the remaining vertices, i.e. the sum of v_0 · v_i over all remaining vertices i. Note that ‖v_i‖_2^2 = v_i · v_i = v_0 · v_i ≤ ‖v_0‖_2 · ‖v_i‖_2 by Cauchy-Schwarz, and thus ‖v_i‖_2 ≤ 1. Therefore each vertex contributes at most 1 to the SDP value and we have that ϑ′ ≥ βn − βn/2 − n/ log^2 d ≥ βn/3 since β ≥ 2/ log^{3/2} d. Since v_0 · v_i ≤ η for each remaining vertex i, it follows that at least βn/3η vertices remain.

The key is now to observe that if i_1, i_2, . . . , i_k form a clique among the remaining vertices, then since v_{i_a} · v_{i_b} = 0 for all a ≠ b,

$$\frac{k\beta}{2} \;\le\; \left\| \sum_{t=1}^{k} v_{i_t} \right\|_2^2 \;=\; \sum_{t=1}^{k} \|v_{i_t}\|_2^2 \;=\; \sum_{t=1}^{k} v_0 \cdot v_{i_t} \;\le\; \|v_0\|_2 \cdot \left\| \sum_{t=1}^{k} v_{i_t} \right\|_2$$

by Cauchy-Schwarz. Since ‖v_0‖_2 = 1, we have that kβ/2 ≤ ‖Σ_{t=1}^{k} v_{i_t}‖_2 ≤ 1, so k ≤ 2/β and the remaining graph is K_r-free for r = ⌈2/β⌉. Applying Theorem 3 now yields that

$$\alpha(G) \;=\; \Omega\!\left( \frac{\#\text{ of vertices remaining}}{d} \cdot \left( \frac{\log d}{\log r} \right)^{1/2} \right) \;=\; \Omega\!\left( \frac{n\beta}{d\eta} \cdot \eta^{-1/2} \right) \;=\; \Omega\!\left( \frac{\beta n}{d} \cdot \eta^{-3/2} \right)$$

Since ϑ(G) = βn, the result follows.

5 O(d/ log d)-approximation with polylog(d) levels of the SA+ hierarchy

In this section, we discuss the first sub-exponential time algorithm to improve upon Halperin's SDP rounding scheme. Recall from Section 3 that Halperin obtained an O(d log log d/ log d) approximation ratio in poly(n, d) time. The following algorithm from [4] improves this by a factor of log log d. Its runtime is quasi-polynomial in d, but still polynomial in n.

Theorem 4 (Theorem 1.2 of [4]). There exists an algorithm using log^{O(1)} d levels of the SA+ hierarchy that outputs an independent set with approximation ratio O(d/ log d) and runs in time n^{O(1)} · exp(log^{O(1)} d).

The proof is somewhat complicated, so we first give a high-level sketch.

Outline of proof. The fundamental idea is to iteratively thin the graph in such a way that in each iteration, the resulting graph either (i) becomes slightly sparser; or (ii) has few triangles, in which case we can efficiently find a large independent set by a Ramsey-theoretic argument. Informally, if case (ii) occurs in any iteration, we will be done; otherwise, if case (i) occurs for enough iterations, then the resulting graph will be sparse enough that the trivial greedy algorithm finds a large enough independent set.

The key question then is how to implement an iteration of this thinning procedure. Roughly speaking, this is achieved by subsampling large independent sets from "dense neighborhoods," via the standard trick of viewing solutions of the SA+ hierarchy as local distributions over valid independent sets (see Section 5.2 for details). There are two nuances to this procedure.

First, we wish to sample relatively large independent sets from these dense neighborhoods (since otherwise we are removing too much, and cannot hope to find an independent set of the resulting graph that is also large for the original graph). This is possible if, for each node, its corresponding x_i variable in the SA+ hierarchy has sufficiently large value. We ensure this with a pre-processing step in Section 5.1.

The second nuance is that to have distributions over the neighborhood of a vertex (which can be of size d), we need to look at roughly the dth level of the SA+ hierarchy, which takes exp(d) time (see Lemma 1 in Section 2). However, we do not really need all properties of the dth level; for convenience we will assume in Sections 5.1 and 5.2 that we have access to the dth level, but in Section 5.3 we sketch why it suffices to use only poly(log d) levels.

5.1 Pre-processing

Let M(d) = (x_1, . . . , x_n) be an optimal solution to the dth level of the SA+ hierarchy, with objective value s := OPT_{SA+(d)}. The purpose of this section is to reduce the proof of Theorem 4 to proving it for a certain subgraph H of G satisfying (i) each x_i ≥ p := 1/ log d, and (ii) |V(H)| ≥ s/(4η), where η := 2 log log d/ log d.¹

The idea will be as follows. Define A, B, and C to be the subsets of vertices in G with x_i < p, x_i ∈ [p, η], and x_i > η, respectively. The following lemma shows that we are done if either s is small or |C| is large; otherwise, we will be able to conclude that |B| is large since s = Σ_{i=1}^{n} x_i. Defining H to be the induced subgraph on the vertices of B will then satisfy the two properties we desire. Let us formalize this.

Lemma 3.

1. If s < 2n/ log d, then the greedy algorithm attains the desired O(d/ log d) approximation ratio.

2. If |C| > n/ log^2 d, then Halperin's algorithm attains the desired O(d/ log d) approximation ratio.

3. Otherwise, if both s ≥ 2n/ log d and |C| ≤ n/ log^2 d, then |B| ≥ s/(4η).

Proof. For (1), the greedy algorithm attains approximation ratio α(G)/GREEDY(G) ≤ (2n/ log d)/(n/(d+1)) = O(d/ log d). For (2), Lemma 2 shows that Halperin's algorithm finds an independent set of size at least (d^{2η}/(d√(log d))) · |C| = Ω(n log^{3/2} d/d) ≥ (n log d)/d, giving the desired O(d/ log d) approximation ratio. For (3),

$$s = \sum_{i=1}^{n} x_i = \sum_{i \in A} x_i + \sum_{i \in B} x_i + \sum_{i \in C} x_i \;\le\; |A| \cdot p + |B| \cdot \eta + |C| \cdot 1 \;\le\; \frac{n}{\log d} + |B| \cdot \frac{2 \log \log d}{\log d} + \frac{n}{\log^2 d}$$

The proof is finished by rearranging.

5.2 Iterative Thinning Procedure

By the above section, it suffices to find an independent set of size Ω(|B| log log d/d) in H, since Lemma 3 shows that this corresponds to an independent set in the original graph G with the desired approximation ratio of O(d/ log d). The main technical tool is the following, which is inspired by the Ramsey-theoretic tools developed in [1].

Lemma 4 (Lemma 4.1 of [4]). Let H be a graph with n vertices, maximum degree d, and average degree d̄. Suppose also that there exists a feasible solution (x_1, . . . , x_n) for the dth level of SA+ satisfying x_i ≥ p := 1/ log d for each i. Then there is a polynomial time algorithm that outputs either:

1. A subgraph H′ with n′ ≥ n/(8 log d) vertices and average degree d̄′ satisfying

$$\frac{n'}{\bar{d}'} \;\ge\; (1 + \beta)\, \frac{n}{\bar{d}} \qquad (5.1)$$

where β := (log d)^{−1/2}; or

2. An independent set of size Ω(n log log d/ d̄).

Before proving Lemma 4, let us first see how it finishes the proof of Theorem 4.

¹The intuition for η is that if each x_i were equal to η, then Halperin's rounding algorithm would output an independent set of size only O(n/d), and thus would not even beat the greedy algorithm.

Proof of Theorem 4. Construct a sequence of graphs beginning with H_0 := H, where each consecutive graph H_{i+1} is the output of the thinning procedure from Lemma 4 on H_i. We stop if case (2) is satisfied, or after K := 4 log log d/β iterations. Define n_i (resp. d̄_i) to be the number of vertices (resp. average degree) of H_i, and recall from above that it suffices to find an independent set of size Ω(n_0 log log d/d) in any of the H_i.

First, assume that case (2) occurred at some iteration k. Then the output independent set is of size Ω(n_k log log d/d̄_k) = Ω(n_0 log log d/d̄_0) = Ω(n_0 log log d/d), since the ratio n_i/d̄_i is non-decreasing. Otherwise, every iteration was in case (1). Then by our choice of β and K,

$$\frac{n_K}{\bar{d}_K} \;\ge\; (1+\beta)^K \frac{n_0}{\bar{d}_0} \;=\; (1+\beta)^{4 \log \log d/\beta}\, \frac{n_0}{\bar{d}_0} \;=\; \frac{n_0}{\bar{d}_0} \log^{\Omega(1)} d$$

and thus the greedy algorithm on H_K achieves the desired approximation ratio.
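Schematically, the proof corresponds to the following loop (a sketch under stated assumptions: thin_once is a hypothetical stand-in for the procedure of Lemma 4, and greedy_is for the greedy algorithm of Section 1):

import math

def theorem4_outer_loop(H, d, thin_once, greedy_is):
    """Skeleton of the proof of Theorem 4. thin_once(H) returns ('is', I)
    in case (2) of Lemma 4, or ('sparser', H_next) in case (1)."""
    beta = 1 / math.sqrt(math.log(d))
    K = math.ceil(4 * math.log(math.log(d)) / beta)
    for _ in range(K):
        case, result = thin_once(H)
        if case == 'is':
            return result   # case (2): the independent set is already large enough
        H = result          # case (1): n / (avg degree) grew by a (1+beta) factor
    # After K rounds, n / (avg degree) has grown by (1+beta)^K = log^{Omega(1)} d,
    # so the greedy algorithm on the final graph achieves the desired ratio.
    return greedy_is(H)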

We now focus on proving the technical Lemma 4. We will need the following fact from Ramsey theory, the proof of which is deferred to Appendix C.

Lemma 5 ([1, 17]). If G has average degree d̄ and at most c d̄² n triangles, then there exists a polynomial time algorithm to find an independent set in G of size Ω((n/d̄) log(1/c)).

Proof of Lemma 4. Repeatedly remove vertices v whose neighborhoods are "too dense," in the sense that the subgraph induced on {v} ∪ N(v) has at least 2βd̄|N(v)| edges. Repeat this process until either no more such v exist, or until at most n/2 vertices remain. Let H″ denote the resulting graph, and let d̄″ denote its average degree. We do casework on what made this iterative process stop.

Case 1: |V(H″)| ≥ n/2. Then by definition of the sparsification process, the induced subgraph on {v} ∪ N(v) has at most 2βd̄|N(v)| edges for each v ∈ H″. Thus the number of triangles in H″ is bounded above by (2βd̄/3) Σ_v |N(v)| ≤ 2βnd̄²/3. Clearly d̄″ ≤ 2d̄ since |V(H″)| ≥ n/2, and thus by an application of Lemma 5, we can efficiently find an independent set in H″ (and thus also in H) of the desired size Ω((n/d̄) log(1/β)) = Ω((n/d̄) log log d).

Case 2: |V(H″)| < n/2. Let us denote the vertices deleted in the removal process by v_1, . . . , v_T, and their deleted neighborhoods by S_t := {v_t} ∪ N(v_t) (where N(v_t) is understood to be the remaining neighborhood when we selected v_t). We construct the desired subgraph H′ of H as follows. For each v ∈ V(H) \ (S_1 ∪ · · · ∪ S_T), add v to H′ independently with probability p = 1/ log d. However, for the neighborhoods S_t (t ∈ [T]), we will use a different subsampling that sparsifies these dense subgraphs. Specifically, for each t ∈ [T], sample an independent set R_t ⊆ S_t according to the SA+ solution (x_1, . . . , x_n) (see Lemma 1). Then, for each vertex i in R_t, add it to H′ with probability p/x_i (which is a valid probability since x_i ≥ p by assumption).

Since each vertex of H is added to H′ with probability p, linearity of expectation gives that the expected number of vertices n′ := |V(H′)| in H′ is equal to np. To calculate the expected number of edges m′ := |E(H′)| in H′, observe that edges with both endpoints in some S_t are never added to H′ (since R_t is an independent set), and all other edges enter with probability at most p². Since the number of edges with both endpoints in some S_t is at least

$$\sum_{t=1}^{T} |E(S_t)| \;\ge\; 2\beta\bar{d} \sum_{t=1}^{T} |N(v_t)| \;=\; 2\beta\bar{d} \sum_{t=1}^{T} (|S_t| - 1) \;\ge\; \beta\bar{d} \sum_{t=1}^{T} |S_t| \;\ge\; \beta\bar{d} \cdot \frac{n}{2} \;=\; \beta|E| \qquad (5.2)$$

it follows that E[m′] ≤ p²(|E| − β|E|) = (1 − β)p²|E|. Moreover, since both n′ and m′ concentrate tightly around their means (e.g. by Chernoff bounds), the average degree d̄′ := d̄(H′) is at most (1 − β)pd̄ up to lower-order terms. Therefore, w.h.p.,

$$\frac{n'}{\bar{d}'} \;\ge\; \frac{np}{(1-\beta)p\bar{d}} \;\ge\; (1+\beta)\,\frac{n}{\bar{d}}$$
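For concreteness, the Case 2 construction of H′ can be sketched as follows (our illustration; sample_Rt is a hypothetical stand-in for sampling from the local distribution of Lemma 1):

import math
import random

def build_H_prime(vertices, S_list, x, d, sample_Rt):
    """vertices: V(H) as a set; S_list: the dense neighborhoods S_1, ..., S_T;
    x: the SA+ values; sample_Rt(S_t) draws an independent set R_t of S_t."""
    p = 1 / math.log(d)
    covered = set().union(*S_list) if S_list else set()
    # Ordinary vertices enter H' independently with probability p.
    kept = {v for v in vertices - covered if random.random() < p}
    # Inside each S_t, only vertices of the sampled independent set R_t may
    # enter, each with probability p / x[i]; the marginal is then exactly p.
    for S_t in S_list:
        R_t = sample_Rt(S_t)
        kept |= {i for i in R_t if random.random() < p / x[i]}
    return kept

The point of the two-stage sampling inside S_t is that no edge internal to S_t can survive, which is exactly what drives the (1 − β) factor in the edge count above.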

5.3 Only using poly(log d) levels of SA+

As the proof is quite involved, we provide only a sketch here and defer technical details to Appendix C. Recall that the proof of the iterative thinning procedure in Lemma 4 used the fact that the R_t ⊆ S_t were large independent sets. Indeed, we are able to sample such sets R_t because the level-d SA+ hierarchy induces local distributions over independent sets of each S_t (since |S_t| ≤ d + 1, we may invoke Lemma 1).

The key observation that allows us to reduce the level of the hierarchy is that we do not need R_t to be an independent set. In fact, we can see from the computation of Σ_{t=1}^{T} |E(S_t)| in equation (5.2) and the following computation of E[m′] that it suffices for the random subsets R_t to have edge density a constant factor smaller than that of S_t.

The question then becomes how to efficiently extract such a large sparse subset R_t ⊆ S_t. It helps that each S_t is by construction very dense. Indeed, a variant of the global correlation rounding technique (originally proposed for the Lasserre-Parrilo Sum-of-Squares hierarchy [6], but adaptable to the Mixed Sherali-Adams hierarchy) ensures that we can find such an R_t ⊆ S_t so long as the edge density of S_t is at least 1/ log d (proof deferred to Appendix C). Since the details of this rounding scheme are quite technical and are not the key idea of the proof of Theorem 4 (the key idea was the iterative thinning procedure in Lemma 4), we defer some details to Appendix C and for the rest refer the reader to Section 4.3 of [4].

6 An Õ(d/ log^2 d)-approximation d levels into the SA+ Hierarchy

In this section, we show how extending to d levels of the SA+ hierarchy yields an Õ(d/ log^2 d) approximation algorithm for independent set. The cost, however, will be an exponential dependence on d in the runtime of the algorithm, rather than the almost-polynomial dependence of the previous section. The main theorem of this section is as follows.

Theorem 5. There is an Õ(d/ log^2 d)-approximation algorithm with running time 2^{O(d)} · poly(n) using d levels of the SA+ hierarchy.

The key is first to reduce to vertices with value in a bounded range, as in the proof of Theorem 2. We will then show that the induced subgraph on these vertices is locally k-colorable². Johansson's algorithm (Theorem 6 below) then applies and can be used to find a proper coloring of G using a small number of colors. Taking the vertices of the largest color class then yields a large independent set. Throughout this section, let G[S] denote the induced subgraph on the vertex set S, where S ⊆ V. We will need the following guarantees of Johansson's algorithm to prove Theorem 5.

Theorem 6 (Johansson [5]). Given a graph G with maximum degree d and n vertices that is locally k-colorable, Johansson's algorithm outputs a proper coloring of V(G) using O(d log k/ log d) colors in expected poly(n · 2^d) time.

We will only need Johansson's algorithm as a black box; we refer the reader to [5] for a more detailed description. We now show the key fact that restricting to vertices with bounded values yields a locally k-colorable induced subgraph.

Lemma 6. Suppose that V′ = {i ∈ V : 1/(4 log^2 d) ≤ x_i ≤ η} where η = 3 log log d/ log d. Then the graph G′ = G[V′] is locally k-colorable for k = O(log^3 d).

²A graph is called locally k-colorable if the subgraph induced by the neighborhood of any one vertex can be properly colored with k colors. Note that the least such k is a lower bound on the chromatic number of the graph.

Proof. By Lemma 1, solving the SA+(d) relaxation of independent set yields a "local distribution" {X_S}_{S⊆N(v)} over subsets of each neighborhood N(v) satisfying:

1. X_S ≥ 0 and Σ_{S⊆N(v)} X_S = 1;

2. X_S > 0 only if S is an independent set;

3. for each i ∈ N(v),

$$\frac{1}{4\log^2 d} \;\le\; x_i \;=\; \sum_{S : i \in S} X_S$$

Now note that if Y_S = (4 log^2 d) X_S for each S, it follows that Σ_{S : i∈S} Y_S ≥ 1 for each i ∈ N(v). Thus {Y_S}, ranging over independent S, forms a fractional set cover of N(v) of total value 4 log^2 d. It is well known that the LP relaxation of set cover has an O(log |N(v)|) integrality gap; this can be achieved by iteratively picking the set S with maximum total value restricted to the uncovered vertices. This implies that N(v) can be covered by O(log^2 d · log |N(v)|) = O(log^3 d) independent sets, and thus G[V′] is locally k-colorable for k = O(log^3 d).

Proof of Theorem 5. Let η = 3 log log d/ log d. We can assume that OPT_{SA+(d)} ≥ n/ log^2 d, as otherwise the greedy algorithm gives the desired guarantee. As in the proof of Theorem 2, we can assume that there are at most n/(4 log^2 d) vertices with x_i ≥ η, as otherwise Step 5 in Halperin's algorithm and Lemma 2 yield an independent set of size Ω(n log^2 d/d). Note that this is valid since SA+(d) extends the SDP relaxation, so its solution induces a valid SDP solution. Since each vertex satisfies x_i ≤ 1, we have that

$$\mathrm{OPT}_{SA^+(d)} \;\le\; \frac{n}{4\log^2 d} \cdot 1 + n \cdot \eta \;\le\; 2\eta n$$

If V′ = {i ∈ V : 1/(4 log^2 d) ≤ x_i ≤ η}, then the total value of the vertices in V′ must be at least ½ OPT_{SA+(d)}: the vertices with x_i ≤ 1/(4 log^2 d) contribute at most n/(4 log^2 d) ≤ ¼ OPT_{SA+(d)}, and those with x_i ≥ η contribute at most n/(4 log^2 d) ≤ ¼ OPT_{SA+(d)}. Since each i ∈ V′ has x_i ≤ η, it follows that |V′| ≥ OPT_{SA+(d)}/(2η). Applying Johansson's algorithm and outputting the largest color class yields an independent set of size

$$\Omega\!\left( \frac{|V'|\log d}{d \log(k+1)} \right) \;=\; \Omega\!\left( \frac{\mathrm{OPT}_{SA^+(d)} \log d}{2\eta d \log(k+1)} \right) \;=\; \tilde{\Omega}\!\left( \frac{\mathrm{OPT}_{SA^+(d)} \cdot \log^2 d}{d} \right)$$

where k = O(log^3 d). This completes the proof of Theorem 5.

7 Future Directions

The papers [4] and [5] leave several directions for future research. The most immediate question is whether or not there is a polynomial time rounding procedure attaining the integrality gap of Õ(d/ log^{3/2} d) shown in Section 4. Another interesting open problem is to improve the Ramsey-theoretic bound of Section 4 to α(G) = Ω(n log d/(d log r)), which would imply an integrality gap of Õ(d/ log^2 d). A natural follow-up question would then be whether this new bound on the integrality gap of the SDP relaxation is attainable by an efficient algorithm. One final general direction for investigation would be to classify the behavior of the approximation ratio achieved by lifting to k levels of the SA+ hierarchy, for each k.

References

[1] Miklós Ajtai, Paul Erdős, János Komlós, and Endre Szemerédi, On Turán's theorem for sparse graphs, Combinatorica 1 (1981), no. 4, 313–317.

[2] Noga Alon, Independence numbers of locally sparse graphs and a Ramsey type problem, Random Structures and Algorithms 9 (1996), no. 3, 271–278.

[3] Per Austrin, Subhash Khot, and Muli Safra, Inapproximability of vertex cover and independent set in bounded degree graphs, Computational Complexity, 2009. CCC’09. 24th Annual IEEE Conference on, IEEE, 2009, pp. 74–80.

[4] Nikhil Bansal, Approximating independent sets in sparse graphs, Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2015, pp. 1–8.

[5] Nikhil Bansal, Anupam Gupta, and Guru Guruganesh, On the Lovász theta function for independent sets in sparse graphs, Proceedings of the Forty-Seventh Annual ACM Symposium on Theory of Computing, ACM, 2015, pp. 193–200.

[6] Boaz Barak, Prasad Raghavendra, and David Steurer, Rounding semidefinite programming hierarchies via global correlation, Foundations of Computer Science (FOCS), 2011 IEEE 52nd Annual Symposium on, IEEE, 2011, pp. 472–481.

[7] Aharon Ben-Tal and Arkadi Nemirovski, Lectures on modern convex optimization: analysis, algorithms, and engineering applications, SIAM, 2001.

[8] Eden Chlamtac and Madhur Tulsiani, Convex relaxations and integrality gaps, Handbook on semidefinite, conic and polynomial optimization, Springer, 2012, pp. 139–169.

[9] Bernd Gärtner and Jiří Matoušek, Approximation algorithms and semidefinite programming, Springer Science & Business Media, 2012.

[10] Eran Halperin, Improved approximation algorithms for the vertex cover problem in graphs and hypergraphs, SIAM Journal on Computing 31 (2002), no. 5, 1608–1623.

[11] Johan Håstad, Clique is hard to approximate within n^{1−ε}, Acta Mathematica 182 (1999), no. 1, 105–142.

[12] Richard M Karp, Reducibility among combinatorial problems, Complexity of computer computations, Springer, 1972, pp. 85–103.

[13] Subhash Khot and Ashok Ponnuswami, Better inapproximability results for MaxClique, chromatic number and Min-3Lin-Deletion, Automata, Languages and Programming (2006), 226–237.

[14] Monique Laurent, A comparison of the Sherali-Adams, Lovász-Schrijver, and Lasserre relaxations for 0–1 programming, Mathematics of Operations Research 28 (2003), no. 3, 470–496.

[15] Monique Laurent, Networks and semidefinite programming, 2012.

[16] László Lovász, On the Shannon capacity of a graph, IEEE Transactions on Information Theory 25 (1979), no. 1, 1–7.

[17] James B Shearer, A note on the independence number of triangle-free graphs, Discrete Mathematics 46 (1983), no. 1, 83–87.

Appendix A. Technical Lemma for HALPERIN-SDP in Section 3

Lemma 7. If p = 4 log d/(3 log log d) and c = √((2(p−2)/p) · log d), then

$$N(c) - \frac{d}{2} \cdot N\!\left( c\sqrt{\frac{2p-2}{p-2}} \right) \;=\; \Omega\!\left( \frac{d^{2/p}}{d\sqrt{\log d}} \right) \;=\; \Omega\!\left( \frac{\log d}{d} \right)$$

Proof. Write q = √((2p−2)/(p−2)), and observe that q² − 1 = p/(p−2), so that c²(q² − 1)/2 = log d. Using the standard Gaussian tail bounds

$$\phi(x)\left( \frac{1}{x} - \frac{1}{x^3} \right) \;\le\; N(x) \;\le\; \frac{\phi(x)}{x}$$

we have that

$$\frac{N(cq)}{N(c)} \;\le\; \frac{\phi(cq)/(cq)}{\phi(c)\left( \frac{1}{c} - \frac{1}{c^3} \right)} \;=\; \frac{1 + o(1)}{q} \cdot \exp\!\left( -\frac{c^2(q^2 - 1)}{2} \right) \;\le\; \frac{\frac{1}{\sqrt{2}} + o(1)}{d}$$

since q ≥ √2. Thus (d/2) · N(cq) ≤ (1/(2√2) + o(1)) · N(c), and it follows that for sufficiently large d,

$$N(c) - \frac{d}{2} \cdot N(cq) \;\ge\; \frac{N(c)}{4} \;\ge\; \frac{\phi(c)}{8c} \;=\; \Omega\!\left( \frac{e^{-c^2/2}}{\sqrt{\log d}} \right) \;=\; \Omega\!\left( \frac{d^{2/p}}{d\sqrt{\log d}} \right)$$

since c²/2 = (1 − 2/p) log d. Substituting p = 4 log d/(3 log log d) gives d^{2/p} = log^{3/2} d, and thus this value is Ω(log d/d).
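As a quick numeric sanity check of Lemma 7 (our addition, interpreting log as the natural logarithm), one can evaluate both sides with SciPy, where norm.sf is exactly the upper tail N(·):

import numpy as np
from scipy.stats import norm

def lemma7_lhs(d):
    p = 4 * np.log(d) / (3 * np.log(np.log(d)))
    c = np.sqrt(2 * (p - 2) / p * np.log(d))
    return norm.sf(c) - (d / 2) * norm.sf(c * np.sqrt((2 * p - 2) / (p - 2)))

for d in [1e3, 1e6, 1e12]:
    print(f"d = {d:.0e}: LHS = {lemma7_lhs(d):.3e}, log(d)/d = {np.log(d)/d:.3e}")

The left-hand side stays positive and tracks log d/d up to a modest constant, as the lemma predicts.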

Appendix B. Ramsey Theory Background and Results for Section 4

In this section of the appendix, we prove Theorem 3, the main Ramsey-theoretic bound used in Section 4. Before we proceed to prove this theorem, we give some background on Ramsey theory and some preliminary results. Ramsey theory is the study of the tension between avoiding subgraphs in G and in its complement Ḡ. The principal object of study is the number R(s, t), which is the smallest positive integer such that any graph on at least R(s, t) vertices contains either a complete graph on s vertices, or a complete graph on t vertices in its complement. It can be shown by induction that these R(s, t) are finite and satisfy the bound

$$R(s, t) \;\le\; \binom{s + t - 2}{s - 1}$$
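For intuition (our addition), the finiteness proof is the standard recursion R(s, t) ≤ R(s−1, t) + R(s, t−1) with base cases R(s, 2) = s and R(2, t) = t, whose solution is exactly the binomial bound above:

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def ramsey_upper(s, t):
    """Upper bound on R(s, t) via the standard recursion."""
    if s == 2:
        return t
    if t == 2:
        return s
    return ramsey_upper(s - 1, t) + ramsey_upper(s, t - 1)

# Pascal's rule shows the recursion matches the binomial coefficient bound:
assert all(ramsey_upper(s, t) <= comb(s + t - 2, s - 1)
           for s in range(2, 8) for t in range(2, 8))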

Unfortunately, this bound is loose, and obtaining tighter Ramsey-theoretic bounds has yielded many long-standing open problems in extremal combinatorics. Even the value of R(5, 5) is currently unknown. In this section, we assume that both n and d are sufficiently large. Directly applying the binomial coefficient upper bound on R(s, t), we can obtain the following weak lemma, which we will subsequently bootstrap.

Lemma 8. If G is a K_r-free graph on n vertices, then

$$\alpha(G) \;\ge\; \max\left\{ \frac{n^{1/r}}{2},\; \frac{\log n}{\log 2r} \right\}$$

Proof. First note that for s, t ≥ 3 we have that

$$R(s, t) \;\le\; \binom{s + t - 2}{s - 1} \;\le\; \left( \frac{e(s + t - 2)}{s - 1} \right)^{s-1} \;\le\; (2t)^s$$

Now observe that R(s, r) ≤ (2s)^r ≤ n holds if s ≤ n^{1/r}/2. Thus G either contains K_r or an independent set of size s, which implies that α(G) ≥ n^{1/r}/2 since G is K_r-free. Similarly, R(s, r) ≤ (2r)^s ≤ n if s ≤ log n/ log 2r, and hence α(G) ≥ log n/ log 2r.

By choosing random subsets of G, we can use this simple bound to obtain the following lower bound on the number of independent sets I of G.

Lemma 9. Let G be a K_r-free graph on n vertices with I independent sets. Then

$$\log I \;\ge\; \max\left\{ \frac{n^{1/r}}{2},\; \frac{\log^2 n}{18 \log 2r} \right\}$$

Proof. Because all subsets of the largest independent set of G are also independent, it follows that I ≥ 2^{α(G)}, which implies that log I ≥ n^{1/r}/2 using the previous lemma. Consider the subgraph H of G obtained by sampling each vertex of G independently with probability p = 2n^{−1/2}. By Markov's inequality, we have that P[|H| ≥ n^{1/2}] ≥ 1/2 since E[|H|] = 2n^{1/2}. Let X denote the number of independent sets of size s = ⌈log n/(2 log 2r)⌉ in H. If |H| ≥ n^{1/2}, then the previous lemma implies that, since H is K_r-free, α(H) ≥ s. It follows that

$$\mathbb{E}[X] \;\ge\; \mathbb{P}\!\left[ |H| \ge n^{1/2} \right] \cdot \mathbb{E}\!\left[ X \,\middle|\, |H| \ge n^{1/2} \right] \;\ge\; \frac{1}{2}$$

If G contains N independent sets of size s, then since each is included in H with probability p^s, we have that E[X] = N · p^s. Therefore the above inequality implies that I ≥ N ≥ p^{−s}/2. Therefore

$$\log I \;\ge\; s \log p^{-1} - 1 \;\ge\; \frac{s \log n}{2} - s - 1 \;\ge\; \frac{s \log n}{9} \;\ge\; \frac{\log^2 n}{18 \log 2r}$$

for sufficiently large n.

We are now ready to prove the main Ramsey-theoretic bound, using Lemmas 9, 10, and 11.

Proof of Theorem 3. Let W be an independent set chosen uniformly over all independent sets in G. We will show the stronger statement that W in expectation is large enough to satisfy the theorem. Let

$$\gamma \;=\; \max\left\{ \frac{\log d}{r \log \log d},\; \left( \frac{\log d}{\log r} \right)^{1/2} \right\}$$

and let Y_v be the indicator random variable for the event that v ∈ W, for each v ∈ V. The crucial step in this proof is to define the random variable

$$X_v \;=\; d \cdot Y_v + |N(v) \cap W|$$

These will provide a simple lower bound on |W|, and the d-to-1 ratio of coefficients will be essential in the subsequent analysis. First observe that |W| = Σ_{v∈V} Y_v and |W| ≥ (1/d) Σ_{v∈V} |N(v) ∩ W|, since each vertex of W is counted in at most d sets N(v) ∩ W. It therefore follows that |W| ≥ (1/2d) Σ_{v∈V} X_v. It suffices to show that E[X_v] ≥ cγ for some constant c > 0, since this directly implies E[|W|] ≥ cnγ/2d.

Fix some v ∈ V and let U = V \ (N(v) ∪ {v}). We condition on the restriction of W to the vertices in U. Fix some S ⊆ U and let Z ⊆ N(v) be the set of non-neighbors of S in N(v). It follows that if W ∩ U = S, then W ∩ U^c is either {v} or an independent set in the subgraph induced by Z. Suppose that there are I such independent sets. Therefore

$$\mathbb{E}\!\left[ X_v \,\middle|\, W \cap U = S \right] \;=\; d \cdot \mathbb{P}[v \in W] + \mathbb{E}[\text{size of a random independent set of } Z] \cdot (1 - \mathbb{P}[v \in W]) \;\ge\; \frac{d}{I + 1} + \frac{\log I}{10 \log(1 + |Z|/\log I)} \cdot \frac{I}{I + 1}$$

using the fact that the average size of a set in a family of I subsets of Z is at least log I/(10 log(1 + |Z|/ log I)); this fact is stated in Lemma 11. Using Lemma 9 to lower bound I and optimizing the expression above yields that E[X_v | W ∩ U = S] = Ω(γ) for all S; this computation is carried out in Lemma 10. Now taking the expectation over all S yields that E[X_v] = Ω(γ), proving the desired result.

Lemma 10. If log I ≥ max{ x^{1/r}/2, log^2 x/(18 log 2r) } for some x > 0, then

$$\frac{d}{I+1} + \frac{\log I}{10 \log(1 + x/\log I)} \cdot \frac{I}{I+1} \;=\; \Omega\!\left( \max\left\{ \frac{\log d}{r \log \log d},\; \left( \frac{\log d}{\log r} \right)^{1/2} \right\} \right)$$

Proof. If I + 1 ≤ √d, then d/(I + 1) ≥ √d and the bound holds. Thus we can assume that I + 1 > √d, and hence log I ≥ (1/2) log d. Combining this with the facts that log(1 + x/ log I) ≤ log(1 + x) (as log I ≥ 1) and I/(I + 1) ≥ 1/2 yields that the left-hand side above is lower bounded by

$$\frac{1}{20 \log(1 + x)} \cdot \max\left\{ \frac{\log d}{2},\; \frac{x^{1/r}}{2},\; \frac{\log^2 x}{18 \log 2r} \right\}$$

If x ≥ log^r d, then since x^{1/r}/ log(1 + x) is increasing in x, we have that

$$\frac{1}{20 \log(1 + x)} \cdot \frac{x^{1/r}}{2} \;=\; \Omega\!\left( \frac{\log d}{r \log \log d} \right)$$

If x ≤ log^r d, then it follows that

$$\frac{1}{20 \log(1 + x)} \cdot \frac{\log d}{2} \;=\; \Omega\!\left( \frac{\log d}{r \log \log d} \right)$$

By AM-GM, we have that

$$\frac{1}{20 \log(1 + x)} \left( \frac{\log d}{2} + \frac{\log^2 x}{18 \log 2r} \right) \;=\; \Omega\!\left( \left( \frac{\log d}{\log r} \right)^{1/2} \right)$$

which proves the desired bound.

In the proof of Theorem 3, we also used the following technical lemma, which can be found in [2]. The proof involves bounding sums of binomial coefficients.

Lemma 11. If F is a family of |F| = 2^{εn} subsets of [n], then the average size of a member of F is at least εn/(10 log(1 + 1/ε)).

Appendix C. Technical Lemmas for Section 5

Proof of Lemma 5

We will make use of the following well-known fact from Ramsey theory.

Lemma 12 ([1, 17]). There exists a polynomial time algorithm that finds an independent set of size Ω((n′/d̄′) log d̄′) in a triangle-free graph with n′ vertices and average degree d̄′.

We are now ready to prove Lemma 5.

Proof of Lemma 5. Construct the subgraph G′ of G by subsampling each vertex of G independently with probability p := 1/(2√c · d̄). Setting the constant δ := 1/(40√c), we obtain after standard computations that:

• The number of vertices in G′, denoted n′, is in expectation E[n′] = pn = (1/(2√c)) · (n/d̄), and thus by Chernoff's inequality:

$$\mathbb{P}\!\left[ n' \ge \left( \frac{1}{2\sqrt{c}} - \delta \right) \frac{n}{\bar{d}} \right] \;\ge\; 1 - \exp\!\left( -\Omega\!\left( \frac{n}{\bar{d}} \right) \right) \qquad (7.1)$$

• The number of triangles in G′, denoted T′, is in expectation E[T′] ≤ p³ · cd̄²n = (1/(8√c)) · (n/d̄), and thus by standard concentration arguments:

$$\mathbb{P}\!\left[ T' \le \left( \frac{1}{8\sqrt{c}} + \delta \right) \frac{n}{\bar{d}} \right] \;\ge\; 1 - \exp\!\left( -\Omega\!\left( \frac{n}{\bar{d}} \right) \right) \qquad (7.2)$$

• The average degree of G′, denoted d̄′, is in expectation approximately pd̄ = 1/(2√c), a constant, and thus by Chernoff's inequality, for any α > 0:

$$\mathbb{P}\!\left[ \bar{d}' \le \frac{\alpha + 1}{2\sqrt{c}} \right] \;\ge\; 1 - \exp(-\Omega(\alpha)) \qquad (7.3)$$

Since equations (7.1), (7.2), and (7.3) each hold w.h.p., a union bound gives that all three of them hold simultaneously w.h.p. So for the remainder of the proof, let us assume we are in this event. Now define S to be the random subset of all vertices in G′ that lie in triangles, and define G″ to be the subgraph of G′ with S removed. Since we are in the event that equations (7.1) and (7.2) hold, |S| is at most a constant fraction of the vertices in G′. Formally:

$$|S| \;\le\; 3T' \;\le\; \left( \frac{3}{8\sqrt{c}} + 3\delta \right) \frac{n}{\bar{d}} \;=\; \left( \frac{1}{2\sqrt{c}} - 2\delta \right) \frac{n}{\bar{d}} \;\le\; (1 - \beta)\, n'$$

for a constant β > 0. In conclusion, w.h.p. the graph G″ satisfies (i) number of vertices n″ ≥ βn′ = Ω(n/d̄); and (ii) average degree d̄″ ≤ (1/β) d̄′, i.e. at most a constant factor larger than the average degree of G′ (since clearly n″d̄″ = 2|E(G″)| ≤ 2|E(G′)| = n′d̄′). Thus, by an application of Lemma 12, we can w.h.p. find an independent set of G″ (and thus of G) of size Ω((n″/d̄″) log d̄″) = Ω((n/d̄) log(1/c)) in polynomial time, as desired.

Technical details for Section 5.3

Lemma 13. Consider the sets S1,...,ST defined in the proof of Lemma 4. For each t ∈ [T ], the edge density of St is at least 1/ log d.

Proof. Since St = {vt} ∪ N(vt), we have |V (St)| ≤ d + 1. On the other hand, we have by construction of St that |E(St)| ≥ 2βd¯|St|. Therefore the edge density of St is at least:

$$\frac{|E(S_t)|}{|V(S_t)|^2} \;\ge\; \frac{2\beta\bar{d}\,|S_t|}{|S_t|^2} \;=\; \frac{2\beta\bar{d}}{|S_t|} \;\ge\; \frac{2\beta d}{(d+1)\log\log d} \;\ge\; \frac{1}{\sqrt{\log d} \cdot \log\log d} \;\ge\; \frac{1}{\log d}$$

where the second inequality uses |S_t| ≤ d + 1 together with the assumption that d̄ ≥ d/ log log d (otherwise the trivial greedy algorithm finds an independent set of size Ω(n log log d/d) after removing all of the at most n/2 vertices of degree more than 2d̄, trivializing the proof of Theorem 4).
