ETH Hardness for Densest-k-Subgraph with Perfect Completeness

Mark Braverman ∗ Young Kun Ko † Aviad Rubinstein ‡ Omri Weinstein § November 1, 2016

Abstract

We show that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing between a graph with an induced k-clique and a graph in which all k-subgraphs have density at most $1-\varepsilon$ requires $n^{\tilde{\Omega}(\log n)}$ time. Our result essentially matches the quasi-polynomial algorithms of Feige and Seltser [FS97] and Barman [Bar15] for this problem, and is the first one to rule out an additive PTAS for Densest k-Subgraph. We further strengthen this result by showing that our lower bound continues to hold when, in the soundness case, even subgraphs smaller by a near-polynomial factor ($k' = k \cdot 2^{-\tilde{\Omega}(\log n)}$) are assumed to be at most $(1-\varepsilon)$-dense.

Our reduction is inspired by recent applications of the "birthday repetition" technique [AIM14, BKW15]. Our analysis relies on information theoretical machinery and is similar in spirit to analyzing a parallel repetition of two-prover games in which the provers may choose to answer some challenges multiple times, while completely ignoring other challenges.

∗ Department of Computer Science, Princeton University. Research supported in part by an NSF CAREER award (CCF-1149888), a Turing Centenary Fellowship, a Packard Fellowship in Science and Engineering, and the Simons Collaboration on Algorithms and Geometry.
† Department of Computer Science, Princeton University.
‡ Department of Electrical Engineering and Computer Sciences, University of California at Berkeley. This work was supported in part by NSF grant CCF1408635 and by Templeton Foundation grant 3966. This work was done in part at the Simons Institute for the Theory of Computing.
§ Department of Computer Science, Courant Institute (NYU).

1 Introduction

k-Clique is one of the most fundamental problems in computer science: given a graph, decide whether it has a fully connected induced subgraph on k vertices. Since it was proven NP-complete by Karp [Kar72], extensive research has investigated the complexity of relaxed versions of this problem.

This work focuses on two natural relaxations of k-Clique which have received significant attention from both algorithmic and complexity communities. The first one is to relax "k", i.e. looking for a smaller subgraph:

Problem 1. (Approximate Max Clique, Informal) Given an n-vertex graph G, decide whether G contains a clique of size k, or all induced cliques of G are of size at most $\delta k$ for some $1 > \delta(n) > 0$.

The second natural relaxation is to relax the "Clique" requirement, replacing it with the more modest goal of finding a subgraph that is almost a clique:

Problem 2. (Densest k-Subgraph with perfect completeness, Informal) Given an n-vertex graph G containing a clique of size k, find an induced subgraph of G of size k with (edge) density at least $(1-\varepsilon)$, for some $1 > \varepsilon > 0$. (More modestly, given an n-vertex graph G, decide whether G contains a clique of size k, or all induced k-subgraphs of G have density at most $(1-\varepsilon)$.)

Today, after a long line of research [FGL+96, AS98, ALM+98, Hås99, Kho01, Zuc07] we have a solid understanding of the inapproximability of Problem 1. In particular, we know that it is NP-hard to distinguish between a graph that has a clique of size k, and a graph whose largest induced clique is of size at most $k' = \delta k$ for $\delta = 1/n^{1-\varepsilon}$ [Zuc07]. The computational complexity of the second relaxation (Problem 2) remained largely open. There are a couple of (very different) quasi-polynomial algorithms that guarantee finding a $(1-\varepsilon)$-dense k-subgraph in every graph containing a k-clique [FS97, Bar15], suggesting that this problem is not NP-hard. Yet we know neither polynomial-time algorithms, nor general impossibility results for this problem.

In this work we provide strong evidence that the aforementioned quasi-polynomial time algorithms for Problem 2 [FS97, Bar15] are essentially tight, assuming the (deterministic) Exponential Time Hypothesis (ETH), which postulates that any deterministic algorithm for 3SAT requires $2^{\Omega(n)}$ time [IP01]. In fact, we show that under ETH, both parameters of the above relaxations are simultaneously hard to approximate:

Theorem 1.1. (Main Result) There exists a universal constant $\varepsilon > 0$ such that, assuming the (deterministic) Exponential Time Hypothesis, distinguishing between the following requires time $n^{\tilde{\Omega}(\log n)}$, where n is the number of vertices of G.

Completeness: G has an induced k-clique; and
Soundness: Every induced subgraph of G of size $k' = k \cdot 2^{-\Omega(\log n / \log\log n)}$ has density at most $1-\varepsilon$.

Our result has implications for two major open problems whose computational complexity remained elusive for more than two decades: the (general) Densest k-Subgraph problem, and the Planted Clique problem.

Densest k-Subgraph. The Densest k-Subgraph problem, $\mathrm{DkS}(\eta, \varepsilon)$, is the same as (the decision version of) Problem 2, except that in the "completeness" case, G has a k-subgraph with density $\eta$, and in the "soundness" case, every k-subgraph is of density at most $\varepsilon$, where $\eta \gg \varepsilon$. Since Problem 2 is a special case of this problem, our main theorem can also be viewed as a new inapproximability result for $\mathrm{DkS}(1, 1-\varepsilon)$. We remark that the aforementioned quasi-polynomial algorithms for the "perfect completeness" regime completely break in the sparse regime, and indeed it is believed that $\mathrm{DkS}(n^{-\alpha}, n^{-\beta})$ (for $k = n^{\varepsilon}$) in fact requires much more than quasi-polynomial time [BCV+12]. The best to-date approximation algorithm for Densest k-Subgraph, due to Bhaskara et al., is guaranteed to find a k-subgraph whose density is within an $\sim n^{1/4}$-multiplicative factor of the densest subgraph of size k [BCC+10], and thus $\mathrm{DkS}(\eta, \varepsilon)$ can be solved efficiently whenever $\eta \gg n^{1/4} \cdot \varepsilon$ (this improved upon a previous $n^{1/3-\delta}$-approximation of Feige et al. [FKP01]). Making further progress on either the lower or upper bound frontier of the problem is a major open problem.

Several inapproximability results for Densest k-Subgraph were known against specific classes of algorithms [BCV+12] or under incomparable assumptions of Unique Games with expansion [RS10] and hardness of random k-CNF [Fei02, AAM+11]. The most closely related result is by Khot [Kho06], who shows that the Densest k-Subgraph problem has no PTAS unless SAT can be solved in time $2^{n^{\varepsilon}}$, as opposed to $2^{n^{1/2+\varepsilon}}$ in our paper. While Khot's work uses a slightly weaker assumption, an important advantage of our work is simplicity: our construction is very simple, especially in contrast to Khot's reduction.

We stress that the result of [Kho06], as well as other aforementioned works, focus on the sub-constant density regime, i.e. they show hardness for distinguishing between a graph where every k-subgraph is sparse, and one where every k-subgraph is even sparser. In contrast, our result has perfect completeness and provides the first additive inapproximability for Densest k-Subgraph, which is the best one can hope for as per the upper bound of [Bar15].

Planted Clique. The Planted Clique problem is a special case of our problem, where the inputs come from a specific distribution ($G(n,p)$ versus $G(n,p)$ + "a planted clique of size k", where p is some constant [1]). The Planted Clique Conjecture ([AAK+07, AKS98, Jer92, Kuc95, FK00, DGGP10]) asserts that distinguishing between the aforementioned cases for $p = 1/2$, $k = o(\sqrt{n})$ cannot be done in polynomial time, and has served as the underlying hardness assumption in a variety of recent applications including machine-learning and cryptography (e.g. [AAK+07, BR13]) that inherently use the average-case nature of the problem, as well as in reductions to worst-case problems (e.g. [HK11, AAM+11, KZ11, BBB+13, CLLR15, BPR+16b]).

[1] Planted Clique typically refers to $p = 1/2$, while our hardness result is analogous to $p = 1-\delta$, for a small constant $\delta > 0$. Nevertheless, in almost all applications of Planted Clique, hardness for any constant p suffices.

The main drawback of average-case hardness assumptions is that many average-case instances (even those of worst-case-hard problems) are in fact tractable. While a significant line of research in recent years has focused on obtaining lower bounds in restricted models of computation [FGR+13, MPW15, DM15, HKP+16, BHK+16], a general lower bound for the average-case planted clique problem appears out of reach for existing techniques. Therefore, an important potential application of our result is replacing average-case assumptions such as the planted-clique conjecture, in applications that do not inherently rely on the distributional nature of the inputs (e.g., when the ultimate goal is to prove a worst-case hardness result). In such applications, there is a good chance that planted clique hardness assumptions can be replaced with a more "conventional" hardness assumption, such as the ETH, even when the problem has a quasi-polynomial algorithm. Recently, such a replacement of the planted clique conjecture with ETH was obtained for the problem of finding an approximate Nash equilibrium with approximately optimal social welfare [BKW15].

We also remark that, while showing hardness for Planted Clique from worst-case assumptions seems beyond the reach of current techniques, our result can also be seen as circumstantial evidence that this problem may indeed be hard. In particular, any polynomial time algorithm (if one exists) would have to inherently use the (rich and well-understood) structure of $G(n,p)$.

Techniques. Our simple construction is inspired by the "birthday repetition" technique which appeared recently in [AIM14, BKW15, BPR16a]: given a 2CSP (e.g. 3COL), we have a vertex for each $\tilde{\Omega}(\sqrt{n})$-tuple of variables and assignments (respectively, 3COL vertices and colorings). We connect two vertices by an edge whenever their assignments are consistent and satisfy all 2CSP constraints induced on these tuples. In the completeness case, a clique consists of choosing all the vertices that correspond to a fixed satisfying assignment. In the soundness case (where the value of the 2CSP is low), the "birthday paradox" guarantees that most pairs of vertices (i.e. two $\tilde{\Omega}(\sqrt{n})$-tuples of variables) will have a significant intersection (nonempty CSP constraints), thus resulting in lower densities whenever the 2CSP does not have a satisfying assignment. In the language of two-prover games, the intuition here is that the verifier has a "constant chance in catching the players in a lie if they are trying to cheat" in the game while not satisfying the CSP.

While our construction is simple, analyzing it is intricate. The main challenge is to rule out a "cheating" dense subgraph that consists of different assignments to the same variables (inconsistent colorings of the same vertices in 3COL). Intuitively, this is similar in spirit to proving a parallel repetition theorem where the provers can answer some questions multiple times, and completely ignore other questions. Continuing with the parallel repetition metaphor, notice that the challenge is doubled: in addition to a cheating prover correlating her answers (the standard obstacle to parallel repetition), each prover can now also correlate which questions she chooses to answer. Our argument follows by showing that a sufficiently large subgraph must accumulate many non-edges (violations of either 2CSP or consistency constraints). To this end we introduce an information theoretic argument that carefully counts the entropy of choosing a random vertex in the dense subgraph.

We note that our entropy-based argument is completely different from all previous applications of "birthday repetition" [AIM14, BKW15, BPR16a], as well as all subsequent works that we are aware of (including [Rub15, BCKS16, DFS16, Rub16]). The main reason is that enforcing consistency is much more difficult in the case of Densest k-Subgraph than in other applications.

1.1 Open problems

There are several interesting open problems related to our work. We henceforth list four of them that are of particular interest and potential applications.

Strengthening the inapproximability factor. Our result states that it is hard to distinguish between a graph containing a k-clique and a graph that does not contain a very dense $(1-\delta)$ k-subgraph. The latter $(1-\delta)$ seems to be a limitation of our technique. None of the algorithms we know (including the two quasi-polynomial time algorithms mentioned above) can distinguish in polynomial time between a graph containing a k-clique and a graph that does not contain even a slightly dense ($\delta$) k-subgraph, for any constant $\delta > 0$, and in fact even for some sub-constant values of $\delta$. Furthermore, there is evidence [AAM+11] that this problem may indeed be hard. This naturally leads to the following problem.

Problem 3. (Hardness Amplification) Show that for every given constant $\delta > 0$, distinguishing between the following two cases is ETH-hard:
• There exists $S \subset V$ of size k such that $\mathrm{den}(S) = 1$.
• All $S \subset V$ of size k have $\mathrm{den}(S) \le \delta$.

We remark that a similar amplification, from "clique versus dense" ($\mathrm{den}(S) = 1$ vs. $\mathrm{den}(S) = 1-\delta$) to "clique versus sparse" ($\mathrm{den}(S) = 1$ vs. $\mathrm{den}(S) = \delta$), was shown by Alon et al. when the "clique vs. dense" instance is drawn at random according to the planted clique model [AAM+11]. (Unfortunately, their techniques do not seem to apply to our hard instance.)

An easier variant of Problem 3 is to show hardness for a large gap in the imperfect completeness regime.

Problem 4. (Hardness Amplification - imperfect completeness) Show that there exist parameters $0 < \varepsilon \ll \eta < 1$ for which distinguishing between the following two cases is ETH-hard:
• There exists $S \subset V$ of size k such that $\mathrm{den}(S) \ge \eta$.
• All $S \subset V$ of size k have $\mathrm{den}(S) \le \varepsilon$.

We note that such gaps can be obtained from average-case hardness for a random k-CNF [AAM+11] and from Unique Games with expansion [RS10].

Remark 1. Inspired by our work, Problem 3 (and also Problem 4) were recently solved by Manurangsi [Man16].

Beyond quasi-polynomial hardness. Another interesting challenge is to trade the perfect completeness in our main result for stronger notions of hardness. Indeed, there is substantial evidence suggesting that the "sparse vs. very-sparse" regime ($\mathrm{DkS}(\eta, \varepsilon)$) is much harder to solve. The gap instance in [BCV+12], where all known linear and semidefinite programming techniques fail, is a very sparse instance and has integrality gap of $\Omega(n^{2/53-\varepsilon})$. In particular, every vertex has degree $n^{1/2+o(1)}$, compared to almost linear average degree in our instance. Since no other algorithms succeed in this regime (even in quasi-polynomial time), it is natural to look for stronger lower bounds on the running time.

Problem 5. (Trading-off perfect completeness for stronger lower bounds) Show that there exist parameters $0 < \varepsilon < \eta \ll 1$ for which distinguishing between the following two cases is NP-hard:
• There exists $S \subset V$ of size k such that $\mathrm{den}(S) \ge \eta$.
• All $S \subset V$ of size k have $\mathrm{den}(S) \le \varepsilon$.

Finding Stable Communities. The problem of finding Stable Communities is tightly related to Densest k-Subgraph, and has received recent attention in the context of social networks and learning theory [AGSS12, AGM13, BL13].

Definition 1. (Stable Communities [BBB+13]) Let $\alpha, \beta$ with $\beta < \alpha \le 1$ be two positive parameters. Given an undirected graph $G = (V,E)$, $S \subset V$ is an $(\alpha, \beta)$-cluster if S is:
1. Internally Dense: $\forall i \in S$, $|N(i) \cap S| \ge \alpha|S|$.
2. Externally Sparse: $\forall i \notin S$, $|N(i) \cap S| \le \beta|S|$.

Currently, only planted clique based hardness is known.

Theorem 1.2. ([BBB+13]) For sufficiently small (constant) $\gamma$, finding a $(1, 1-\gamma)$ cluster is at least as hard as Planted Clique.

As insinuated in the introduction, we believe it is plausible and interesting to see whether the hardness assumption of the theorem above can be replaced with ETH.

Problem 6. [Hardness of Stable Communities] Show that for some $\alpha, \beta$ with $\beta < \alpha \le 1$, finding an $(\alpha, \beta)$-cluster S is ETH-hard.

Remark 2. Problem 6 was solved in subsequent work [Rub16].

2 Preliminaries

Throughout the paper we use $\mathrm{den}(S) \in [0,1]$ to denote the density of subgraph S,

$$\mathrm{den}(S) := \frac{|(S \times S) \cap E|}{|S \times S|}.$$

2.1 Information theory

In this section, we introduce information-theoretic quantities used in this paper. For a more thorough introduction, the reader should refer to [CT12]. Unless stated otherwise, all logarithms in this paper are base 2.

Definition 2. Let $\mu$ be a probability distribution on sample space $\Omega$. The Shannon entropy (or just entropy) of $\mu$, denoted by $H(\mu)$, is defined as $H(\mu) := \sum_{\omega \in \Omega} \mu(\omega) \log \frac{1}{\mu(\omega)}$.

Definition 3. (Binary Entropy Function) For $p \in [0,1]$, the binary entropy function is defined as follows (with a slight abuse of notation): $H(p) := -p \log p - (1-p)\log(1-p)$.

Fact 2.1. (Concavity of Binary Entropy) Let $\mu$ be a distribution on $[0,1]$, and let $p \sim \mu$. Then $H(\mathbb{E}_\mu[p]) \ge \mathbb{E}_\mu[H(p)]$.

For a random variable A we shall write $H(A)$ to denote the entropy of the induced distribution on the support of A. We use the same abuse of notation for other information-theoretic quantities appearing later in this section.

Definition 4. The conditional entropy of a random variable A conditioned on B is defined as $H(A \mid B) = \mathbb{E}_b[H(A \mid B = b)]$.

Fact 2.2. (Chain Rule) $H(AB) = H(A) + H(B \mid A)$.

Fact 2.3. (Conditioning Decreases Entropy) $H(A \mid B) \ge H(A \mid BC)$.

Another measure we will use (briefly) in our proof is that of Mutual Information, which informally captures the correlation between two random variables.

Definition 5. (Conditional Mutual Information) The mutual information between two random variables A and B, denoted by $I(A;B)$, is defined as
$$I(A;B) := H(A) - H(A \mid B) = H(B) - H(B \mid A).$$
The conditional mutual information between A and B given C, denoted by $I(A;B \mid C)$, is defined as
$$I(A;B \mid C) := H(A \mid C) - H(A \mid BC) = H(B \mid C) - H(B \mid AC).$$

The following is a well-known fact on mutual information.

Fact 2.4. (Data processing inequality) Suppose we have the following Markov Chain:
$$X \to Y \to Z$$
where $X \perp Z \mid Y$. Then $I(X;Y) \ge I(X;Z)$ or equivalently, $H(X \mid Y) \le H(X \mid Z)$.

Mutual Information is related to the following distance measure.

Definition 6. (Kullback-Leibler Divergence) Given two probability distributions $\mu_1$ and $\mu_2$ on the same sample space $\Omega$ such that $(\forall \omega \in \Omega)(\mu_2(\omega) = 0 \Rightarrow \mu_1(\omega) = 0)$, the Kullback-Leibler Divergence between $\mu_1$ and $\mu_2$ (also known as relative entropy) is defined as
$$D_{KL}(\mu_1 \,\|\, \mu_2) = \sum_{\omega \in \Omega} \mu_1(\omega) \log \frac{\mu_1(\omega)}{\mu_2(\omega)}.$$

The connection between the mutual information and the Kullback-Leibler divergence is provided by the following fact.

Fact 2.5. For random variables A, B, and C we have
$$I(A;B \mid C) = \mathbb{E}_{b,c}\left[D_{KL}(A_{bc} \,\|\, A_c)\right].$$

2.2 2CSP and the PCP Theorem

In the 2CSP problem, we are given a graph $G = (V,E)$ on $|V| = n$ vertices, where each of the edges $(u,v) \in E$ is associated with some constraint function $\psi_{u,v} : A \times A \to \{0,1\}$ which specifies a set of legal "colorings" of u and v, from some finite alphabet A (2 in the term "2CSP" stands for the "arity" of each constraint, which always involves two variables). Let us denote by $\psi$ the entire 2CSP instance, and define by $\mathrm{OPT}(\psi)$ the maximum fraction of satisfied constraints in the associated graph G, over all possible assignments (colorings) of V.

The starting point of our reduction is the following version of the PCP theorem, which asserts that it is NP-hard to distinguish a 2CSP instance whose value is 1, and one whose value is $1-\eta$, where $\eta$ is some small constant:

Theorem 2.1. (PCP Theorem [Din07]) Given a 3SAT instance $\varphi$ of size n, there is a polynomial time reduction that produces a 2CSP instance $\psi$, with size $|\psi| = n \cdot \mathrm{polylog}\, n$ variables and constraints, and constant alphabet size such that
• (Completeness) If $\mathrm{OPT}(\varphi) = 1$ then $\mathrm{OPT}(\psi) = 1$.
• (Soundness) If $\mathrm{OPT}(\varphi) < 1$ then $\mathrm{OPT}(\psi) < 1-\eta$, for some constant $\eta = \Omega(1)$.
• (Balance) Every vertex in $\psi$ has degree d for some constant d.

See our full version or e.g. [AIM14] for derivation of this formulation of the PCP theorem.

Notice that since the size of the reduction is near linear, ETH implies that solving the above problem requires near exponential time.

Corollary 2.1. Let $\psi$ be as in Theorem 2.1. Then assuming ETH, distinguishing between $\mathrm{OPT}(\psi) = 1$ and $\mathrm{OPT}(\psi) < 1-\eta$ requires time $2^{\tilde{\Omega}(|\psi|)}$.

3 Main Proof

3.1 Construction

Let $\psi$ be the 2CSP instance produced by the reduction in Theorem 2.1, i.e. a constraint graph over n variables with alphabet A of constant size. We construct the following graph $G_\psi = (V,E)$:

• Let $\rho := \sqrt{n} \log\log n$ and $k := \binom{n}{\rho}$.

• Vertices of $G_\psi$ correspond to all possible assignments (colorings) to all $\rho$-tuples of variables in $\psi$, i.e. $V = \binom{[n]}{\rho} \times A^{\rho}$. Each vertex is of the form $v = (y_{x_1}, y_{x_2}, \ldots, y_{x_\rho})$ where $\{x_1, \ldots, x_\rho\}$ are the chosen variables of v, and $y_{x_i}$ is the corresponding assignment to variable $x_i$.

• If $v \in V$ violates any 2CSP constraints, i.e. if there is a constraint on $(x_i, x_j)$ in $\psi$ which is not satisfied by $(y_{x_i}, y_{x_j})$, then v is an isolated vertex in $G_\psi$.

• Let $u = (y_{x_1}, y_{x_2}, \ldots, y_{x_\rho})$ and $v = (y'_{x'_1}, y'_{x'_2}, \ldots, y'_{x'_\rho})$. Then $(u,v) \in E$ iff:
  – $(u,v)$ does not violate any consistency constraints: for every shared variable $x_i$, the corresponding assignments agree, $y_{x_i} = y'_{x_i}$;
  – and $(u,v)$ also does not violate any 2CSP constraints: for every 2CSP constraint on $(x_i, x'_j)$ (if one exists), the assignment $(y_{x_i}, y'_{x'_j})$ satisfies the constraint.

Notice that the size of our reduction (number of vertices of $G_\psi$) is $N = \binom{n}{\rho} \cdot |A|^{\rho} = 2^{\tilde{O}(\sqrt{n})}$.

Completeness. If $\mathrm{OPT}(\psi) = 1$, then $G_\psi$ has a k-clique: fix a satisfying assignment for $\psi$, and let S be the set of all vertices that are consistent with this assignment. Notice that $|S| = \binom{n}{\rho} = k$. Furthermore, its vertices do not violate any consistency or 2CSP constraints […]

[…] that this implies many CSP constraint violations (implied by the soundness assumption). From now on, we denote $\alpha_i := H(X_i \mid X_{<i}, Y_{<i})$ […]

[…] let $\varepsilon_0 > 0$ be some constant to be determined later.
We shall show Prefix graphs The consistency constraints induce, 0 −ε0/ log log |V | that for any subset S of size k ≥ k · |V | , for each i, a graph over the prefixes: the vertices are den(S) < 1 − δ, where δ is some constant depending the prefixes, and two prefixes are connected by an on η. The remainder of this section is devoted to edge if their labels are consistent. (We can ignore proving the following theorem: the 2CSP constraints for now — the prefix graph will be used only in the analysis of the consistency Theorem 4.1. If OPT(ψ) < 1 − η, then ∀S ⊂ V of constraints.) Formally, size k0 ≥ k ·|V |−ε0/ log log |V |, den(S) < 1−δ for some constant δ. Definition 7. (Prefix graph) For i ∈ [n + 1] let the i-th prefix graph, G = (V ,E ) be defined over 4.1 Setting up the entropy argument Fix i i i 0 the prefixes of length i − 1 as follows. We say that some subset S of size k , and let v ∈R S be a uni- wi−1 is a neighbor of σi−1 if they do not violate any formly chosen vertex in S (recall that v is a vector consistency constraints. Namely, for all j < i, if of ρ coordinates, corresponding to labels for a subset Xj = 1 for both wi−1 and σi−1, then wi and σi assign of ρ chosen variables). For i ∈ [n], let Xi denote the the same label Yj. indicator variable associated with v such that Xi = 1 In particular, we will heavily use the following if the i’th variable appears in v and 0 otherwise. We notation: let N (wi−1) be the prefix neighborhood of let Yi represent the coloring assignment (label) for wi−1; i.e. it is the set of all prefixes (of length i − 1) the i’th variable whenever Xi = 1, which is of the that are consistent with w . For technical issues of form l ∈ A. Throughout the proof, let i−1 normalization, we let wi−1 ∈ N (wi−1), i.e. all the prefixes have self-loops. Wi−1 = X

Observation 1. H(XY ) = log k0. Definition 8. (Prefix degree and density) The prefix degree of wi−1 is given by: Thus, in total, the choice of challenge and the 0 X choice of assignments should contribute log k to deg(wi−1) = Pr[σi−1]. the entropy of v. If much of the entropy comes σi−1∈N (wi−1) from the assignment distribution (conditioned on the fixed challenge variables), we will show that S must Similarly, we define the prefix density of Gi as: have many consistency violations, implying that S is X X sparse. If, on the other hand, almost all the entropy den(Gi) = Pr[wi−1] · Pr[σi−1]. w comes from the challenge distribution, we will show i−1 σi−1∈N (wi−1) When it is clear from the context, we henceforth A useful lemma: bias implies less entropy In drop the prefix qualification, and simply refer to the Fact 4.1 we saw that always αi ≤ H(qi). Equality neighborhood or degree, etc., of wi−1. happens only if the qi-mass is evenly distributed Notice that in Gn+1, the probabilities are uni- across all prefixes. We argue that if qi is far from formly distributed. In particular, den(Gn+1) ≥ evenly distributed, then the inequality is also far from den(S), since, as we mentioned earlier, the set of tight. In particular: edges in S is contained in that of Gn+1. Finally, observe also that because we accumulate violations, Claim 1. Let B ⊂ Vi be a subset of prefixes such that the density of the prefix graphs is monotonically non- for some 0 < a < b < 1, increasing with i. 1. P Pr[w ] ≤ b; but also wi−1∈B i−1 Observation 2. 2. P Pr[w ]q (w ) > a. wi−1∈B i−1 i i−1 den(G1) ≥ · · · ≥ den(Gn+1) ≥ den(S).   Then α ≤ H(q ) − q D a b . Useful approximations We use the following i i i KL bounds on α and β many times throughout the i i Proof. Abusing notation, let B(·) be the indicator proof: variable for Wi−1 ∈ B. By the data-processing Fact 4.1. inequality (Fact 2.4),

αi = E [H(qi(wi−1))] ≤ H(E [qi(wi−1)]) = H(qi) αi = H (Xi | Wi−1) Fact 4.2. ≤ H (Xi | B(Wi−1))

βi = E [βi(wi−1))] ≤ E [qi(wi−1) · log |A|] = qi log |A| = H (Xi) − I (Xi; B(Wi−1))(4.1) Proof. The bound on α follows from concavity of i Since we can write mutual information as expected entropy (Fact 2.1). For the second bound, observe KL-divergence (Fact 2.5), and KL-divergence is non- that β is maximized by spreading q mass uniformly i i negative, we get over alphabet A. h  i We also recall some elementary approximations I(X ; B(W )) = D B(W )|x B(W ) i i−1 Exi KL i−1 i i−1 to logarithms and entropies that will be useful in the   analysis. The proofs are deferred to the appendix. ≥ qiDKL Pr[B(Wi−1) = 1 |xi = 1] B(Wi−1) = 1 n   Fact 4.3. For k = ρ then, ≥ qiDKL a b ,  ρ  1  log k = nH ± O (log n) = − o (1) ρ log n n 2 where the second inequality follows from the premise assumptions that Pr[B(Wi−1)] ≤ b and Pr[B(Wi−1 = More useful to us will be the following bounds on 0 0 | xi = 1] ≥ a log k : Plugging into (4.1) we have: Fact 4.4. Let ε ≥ 5ε , and k, k0, V, n, ρ as specified 1 0   in the construction. Then, (4.2) αi ≤ H (qi) − qiDKL a b . 0 n  ρ o log k ≥ max log k, nH − ε1 log k/ log n . n | {z } 4.2 Consistency violations In this section, we ε1 ≈ 2 ·ρ show that if the total entropy contribution of the as- P This means that most indices i should contribute signments ( i βi)) is large, there are many consis- ρ  tency violations between vertices, which lead to con- roughly H n entropy to the choice of v. stant density loss. First, we show that if P β > We will also need the following bound which i i 5ε log k/ log n, then at least a constant fraction of relates the entropies of a very biased coin and a 1 such entropy is concentrated on good variables that slightly less biased one: contribute to both “types” of entropy. Fact 4.5. Let 1/n  |υ|  1. Then Definition 9. 
(Good Variables) We say that an 1 + υ   1  υ 1 υ2 H = H − log − (log e) index i is good if n n n n 2n υ3  • αi ≥ H(qi) − 2qi log |A| + O n−2 + O n 1 • βi ≥ 2 ε1qi where ε1 is a constant to be determined later in the We want to further restrict our attention to i’s 1 proof. for which βi is at least 2 ε1qi (aka good i’s). Note that the above inequality can be decomposed to P Claim 2. For any constant ε1, if i βi > 5ε log k/ log n, X X 1 βi + βi ≥ 0.7ε1ρ 2 good i’s i : αi≥H(qi)−2qi log |A|   1 X 1 βi< ε1qi q2 ≥ ε ρ /(n log2 |A|) = Ω(ρ2/n). 2 i 5 1 good i’s Now via a simple sum bound,

Proof. We want to show that many of the indices i X 1 X 1 β ≤ ε q = ε ρ have both a large αi and a large βi simultaneously. i 2 1 i 2 1 i : αi≥H(qi)−2qi log |A| i Let ι ⊆ [n] denote the set of i such that αi + βi < 1 βi< 2 ε1qi H (qi) − qi log |A|. We can write X X X Rearranging, we get, (αi + βi) = (αi + βi) + (αi + βi) i∈[n] i∈ι i/∈ι X 1 β ≥ ε ρ i 5 1 Using Facts 4.1 and 4.2, we have good i’s X X X (αi + βi) ≤ (H (qi) − βi) + (H (qi) + βi) . By Cauchy-Schwartz we have: i∈[n] i∈ι i/∈ι  2 X 2 1 0 β ≥ ε ρ /n Because the subgraph is of size k , from the expansion i 5 1 of log k0 (Fact 4.4), good i’s X  ρ  (α + β ) ≥ nH − ε log k/ log n Finally, since βi ≤ qi log |A|, i i n 1 i∈[n] 2 X 1  X q2 ≥ ε ρ /(n log2 |A|). ≥ H (qi) − ε1 log k/ log n, i 5 1 good i’s where the second inequality follows from the concav- ity of entropy. Plugging into (4.2), we have In the same spirit, we now define a notion of a “good” prefix. Intuitively, conditioning on a good X X βi ≥ βi − ε1 log k/ log n prefix leaves a significant amount of entropy on the i/∈ι i∈ι i’th index. We also require that a good prefix has a ! high prefix degree; that is, it has many neighbors it X X = βi − βi − ε1 log k/ log n could potentially lose when revealing the i-th label. i i/∈ι Definition 10. (Good Prefixes) We say wi−1 is Rearranging, we get a good prefix if: X 1 X (4.3) β ≥ β − ε log k/ log n • i is good; i 2 i 1 i i/∈ι • P q (σ ) Pr[σ ] ≥ (1 − ε )q ; σi−1∈N (wi−1) i i−1 i−1 2 i For all the i’s in the LHS summation, αi ≥ H (qi) − 2qi log |A| by Fact 4.2. From now on, we will • βi(wi−1) ≥ ε3qi(wi−1), consider only i’s that satisfy this condition. Now, for ε = (ε + κ) log |A|, with ε an arbitrarily small using the premise on P β and (4.3) we have: 3 4 4 i i constant that denotes the fraction of assignments that disagree with the majority of the assignments, X β ≥ (5/2 − 1)ε log k/ log n κ = Θ(1/ log |A|), and ε2 a constant that satisfies i 1  4 ε2 i : αi≥H(qi)−2qi log |A| δ = , with den(S) = 1 − δ. 
|A|2/ε2 ≥ 0.7ε1ρ, where the second inequality follows from our approximation for log k (Fact 4.3). □

In the following claim, we show that these prefixes contribute some constant fraction of entropy, assuming that our subset is dense.

Claim 3. If $\mathrm{den}(S) > 1 - \delta$, where $\delta = \left(\frac{\varepsilon_2}{|A|^{2/\varepsilon_2}}\right)^4$ and $\varepsilon_1 \ge 4\varepsilon_2 \log|A| + 8\varepsilon_3$, then for every good index $i$, it holds that
\[ \sum_{\text{good } w_{i-1}} \Pr[w_{i-1}]\,\beta_i(w_{i-1}) \ge \beta_i/4. \]

Proof. We begin by proving that most prefixes satisfy the degree condition of Definition 10. Let $w_{i-1}$ be popular if $i$ is a good variable and its degree in the prefix graph $G_i$ satisfies
\[ \deg(w_{i-1}) := \sum_{\sigma_{i-1} \in N(w_{i-1})} \Pr[\sigma_{i-1}] \ge 1 - \sqrt{\delta}. \]
Recall that $\mathrm{den}(G_i) \ge \mathrm{den}(S) \ge 1 - \delta$ (by Observation 2). Thus by Markov's inequality, at most a $\sqrt{\delta}$-fraction of the prefixes are unpopular.

We now argue that:
\[ \sum_{\text{unpopular } w_{i-1}} \Pr[w_{i-1}]\, q_i(w_{i-1}) \le \varepsilon_2 q_i. \tag{4.4} \]
Otherwise, by Claim 1, $\alpha_i \le H(q_i) - q_i D_{KL}\left(\varepsilon_2 \,\|\, \sqrt{\delta}\right)$. On the other hand, recall that since $i$ is good, $\alpha_i \ge H(q_i) - 2 q_i \log|A|$. Recall also that $\delta = \left(\frac{\varepsilon_2}{|A|^{2/\varepsilon_2}}\right)^4$, and therefore $D_{KL}\left(\varepsilon_2 \,\|\, \sqrt{\delta}\right) \ge 2\log|A|$. Thus, we get a contradiction.

Ineq. (4.4) implies that even if the assignment is uniform over the alphabet, the contribution to $\beta_i$ from unpopular prefixes is small:
\[ \sum_{\text{unp.}} \Pr[w_{i-1}]\,\beta_i(w_{i-1}) \le \sum_{\text{unp.}} \Pr[w_{i-1}]\, q_i(w_{i-1}) \log|A| \le \varepsilon_2 q_i \log|A| \le \frac{1}{4}\varepsilon_1 q_i \le \frac{1}{2}\beta_i, \]
where the first inequality follows from Fact 4.2, the second from (4.4), the third from our setting of $\varepsilon_1 \ge 4\varepsilon_2 \log|A|$, and the fourth from $\beta_i \ge \frac{1}{2}\varepsilon_1 q_i$ since $i$ is good. Therefore,
\[ \sum_{\text{pop.}} \Pr[w_{i-1}]\,\beta_i(w_{i-1}) = \beta_i - \sum_{\text{unp.}} \Pr[w_{i-1}]\,\beta_i(w_{i-1}) \ge \beta_i/2. \]

Using a similar argument, we show that for any popular $w_{i-1}$, most of the $q_i$ mass is concentrated on its neighbors. Consider any popular $w_{i-1}$, and let $N^C(w_{i-1})$ denote the complement of $N(w_{i-1})$. Then we can rewrite $\alpha_i$ as:
\[ \alpha_i = \sum_{\sigma_{i-1} \in N(w_{i-1})} \Pr[\sigma_{i-1}]\,\alpha_i(\sigma_{i-1}) + \sum_{\sigma_{i-1} \in N^C(w_{i-1})} \Pr[\sigma_{i-1}]\,\alpha_i(\sigma_{i-1}). \]
Notice that since $w_{i-1}$ is popular, $N^C(w_{i-1})$ has measure at most $\sqrt{\delta}$. Thus, if an $\varepsilon_2$-fraction of the $q_i$ mass is concentrated on $N^C(w_{i-1})$, Claim 1 implies:
\[ \alpha_i \le H(q_i) - q_i D_{KL}\left(\varepsilon_2 \,\|\, \sqrt{\delta}\right), \]
which (as in (4.4)) would yield a contradiction to $i$ being a good variable. Therefore every popular prefix also satisfies the $q_i$-weighted condition on the degree:
\[ \sum_{\sigma_{i-1} \in N(w_{i-1})} \Pr[\sigma_{i-1}]\, q_i(\sigma_{i-1}) \ge (1 - \varepsilon_2)\, q_i. \tag{4.5} \]

Recall that a prefix $w_{i-1}$ is good if it also satisfies $\beta_i(w_{i-1}) \ge \varepsilon_3 \cdot q_i(w_{i-1})$. Fortunately, prefixes that violate this condition (i.e. those with small $\beta_i(w_{i-1})$) cannot account for much of the weight on $\beta_i$:
\[ \sum_{w_{i-1} :\, \beta_i(w_{i-1}) < \varepsilon_3 q_i(w_{i-1})} \Pr[w_{i-1}]\,\beta_i(w_{i-1}) \le \varepsilon_3 q_i. \]
Since $i$ is good and $\varepsilon_1 \ge 8\varepsilon_3$, this implies:
\[ \sum_{\text{good } w_{i-1}} \Pr[w_{i-1}]\,\beta_i(w_{i-1}) \ge \beta_i/2 - \varepsilon_3 q_i \ge \beta_i/4, \]
since $\varepsilon_3 q_i \le \frac{1}{8}\varepsilon_1 q_i \le \frac{1}{4}\beta_i$, where the last inequality follows from $i$ being good.

Corollary 4.1. For every good index $i$,
\[ \sum_{\text{good } w_{i-1}} \Pr[w_{i-1}]\, q_i(w_{i-1}) \ge \frac{\varepsilon_1}{8\log|A|}\, q_i. \]

Proof.
\[ \sum_{\text{good}} \Pr[w_{i-1}]\, q_i(w_{i-1}) \ge \sum_{\text{good}} \Pr[w_{i-1}]\,\beta_i(w_{i-1})/\log|A| \ge \beta_i/(4\log|A|) \ge \frac{\varepsilon_1}{8\log|A|}\, q_i, \]
where the first inequality follows by Fact 4.2, the second by Claim 3, and the last by the definition of good $i$'s.

With Claim 2 and Corollary 4.1, we are ready to prove the main lemma of this section:

Lemma 4.1. (Labeling Entropy Bound) If $\sum_i H(Y_i \mid X_{\le i}, Y_{<i}) > \frac{5\varepsilon_1 \log k}{\log n}$, then $\mathrm{den}(S) < 1 - \delta$.

Proof. Assume for a contradiction that $\mathrm{den}(S) \ge 1 - \delta$. For prefix $w_{i-1}$, let $D_{w_{i-1}}$ denote the induced distribution on labels to the $i$-th variable, conditioned on $w_{i-1}$ and $x_i = 1$. (If $q_i(w_{i-1}) = 0$, take an arbitrary distribution.) After revealing each variable $i$, the loss in prefix density is given by the probability of "fresh violations": the sum over all prefix edges $(w_{i-1}, \sigma_{i-1})$ of the probability that they assign different labels to the $i$-th variable:
\[ \mathrm{den}(G_i) - \mathrm{den}(G_{i+1}) = \sum_{w_{i-1}} \sum_{\sigma_{i-1} \in N(w_{i-1})} \Pr[w_{i-1}] \Pr[\sigma_{i-1}]\, q_i(w_{i-1})\, q_i(\sigma_{i-1}) \Pr_{Y_i \sim D_{w_{i-1}},\, Y_i' \sim D_{\sigma_{i-1}}}\left[Y_i \ne Y_i'\right]. \tag{4.6} \]

By Observation 2 we have:
\begin{align*}
1 - \mathrm{den}(S) &\ge \mathrm{den}(G_1) - \mathrm{den}(G_{n+1}) \\
&= \sum_i \left(\mathrm{den}(G_i) - \mathrm{den}(G_{i+1})\right) \\
&\ge \sum_{\text{good } i} \left(\mathrm{den}(G_i) - \mathrm{den}(G_{i+1})\right) \\
&\ge \sum_{\text{good } i} \left(\frac{\varepsilon_1 \varepsilon_4}{10\log|A|}\right) q_i^2 \\
&\ge \left(\frac{\varepsilon_1^3 \varepsilon_4}{250\log^3|A|}\right) \rho^2/n = \Omega(\rho^2/n),
\end{align*}
where the last inequality follows by Claim 2.
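The contradiction in the proof of Claim 3 rests on the inequality $D_{KL}(\varepsilon_2 \,\|\, \sqrt{\delta}) \ge 2\log|A|$ for $\delta = (\varepsilon_2/|A|^{2/\varepsilon_2})^4$. The following is a minimal numerical sanity check of that one step, not part of the proof. It assumes logarithms are base 2 and that $D_{KL}$ is the divergence between two Bernoulli distributions; the function names and the sample values of $\varepsilon_2$ and $|A|$ are illustrative choices, not taken from the paper.

```python
import math

def log2_sqrt_delta(eps2: float, alphabet_size: int) -> float:
    """log2 of sqrt(delta), where delta = (eps2 / |A|^(2/eps2))^4."""
    return 2.0 * (math.log2(eps2) - (2.0 / eps2) * math.log2(alphabet_size))

def kl_eps2_vs_sqrt_delta(eps2: float, alphabet_size: int) -> float:
    """Bernoulli KL divergence D_KL(eps2 || sqrt(delta)), in bits.

    The first term is evaluated in the log domain because sqrt(delta)
    is extremely small for the parameter ranges of interest.
    """
    log_q = log2_sqrt_delta(eps2, alphabet_size)
    q = 2.0 ** log_q
    log_one_minus_q = math.log1p(-q) / math.log(2.0)
    return (eps2 * (math.log2(eps2) - log_q)
            + (1.0 - eps2) * (math.log2(1.0 - eps2) - log_one_minus_q))

# Illustrative parameters only: eps2 < 0.2 as in Appendix B, small alphabets.
for eps2 in (0.19, 0.1, 0.05):
    for size in (2, 4):
        gap = kl_eps2_vs_sqrt_delta(eps2, size) - 2.0 * math.log2(size)
        print(f"eps2={eps2}, |A|={size}: D_KL - 2 log|A| = {gap:.3f}")
```

Expanding the first term gives $4\log|A| + \varepsilon_2\log(1/\varepsilon_2)$, so the divergence is at least $4\log|A| - H(\varepsilon_2)$, which is why the printed gaps stay positive for every alphabet of size at least 2.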

We now lower-bound $\Pr_{D_{w_{i-1}} \times D_{\sigma_{i-1}}}[Y_i \ne Y_i']$ for good $w_{i-1}$ (notice that we assume nothing about

4.3 2CSP violation

Intuitively, if $\sum_i H(X_i \mid X_{<i}, Y_{<i})$ is large, then the subgraph

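\[ \left(\varepsilon_4 \log|A| - \varepsilon_4 \log\varepsilon_4 - (1-\varepsilon_4)\log(1-\varepsilon_4)\right) q_i(w_{i-1}), \]

\[ (1-\varepsilon_5)\cdot\rho/n < \Pr[X_i = 1 \mid w_{i-1}] < (1+\varepsilon_5)\cdot\rho/n, \]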

 log e  4 where the second inequality follows from choice of ε4. where ε5 is some constant such that 8 ε5 > 14ε1. Therefore Pr [Y 6= Y 0] ≥ ε . Similarly, we say that variable x is typical if Dwi−1 ×Dσi−1 i i 4 i We now have, for every good index i, X Pr[wi−1] ≥ 1 − ε5 X X den(Gi) − den(Gi+1) ≥ ε4· typical wi−1’s good w ’s σi−1∈N (wi−1) i−1 Claim 4. If P H(X |X ,Y ) ≥   i i n 8 X 14ε1, we have that u,v [I (u, v)] ≥ u Pr [i ∈ v] E E  v  i∈N (u) log e ρ  ρ 2  ρ  ρ · ε4 − O − O ε5 > · 12ε 8 n 5 n n 5 n 1 Notice that this bound may not be tight since any ρ ρ i ∈ v can potentially have d neighbors in u. Thus our > − log · 24ε / log n n n 1 upper bound is:  ρ  > (12ε / log n) H 1 n   X Eu,v [I (u, v)] ≤ d · Eu  Pr [i ∈ v] and therefore, v i∈N (u) X  ρ  H(X |X ,Y ) < (1 − 12ε / log n) nH i |N (u) | I (i, j) be defined as the number of (u, v) ∈ (S × S) ≤ ((1 + ε ) · ρ/n)2 · 2d2 · 5 Eu 2 pairs such that 2 + d · Eu,v [I (u, v)](4.11) • Xi = 1 for u and Xj = 1 for v.

N (u) • ui−1 and vj−1 are typical prefixes, where ui−1 We would like to bound Eu 2 . denotes the prefix represented by u for X and kl h i X X > 2 2 2 4 2 + Pr [k ∈ u](4.14) have I (i, j) ∈ |S| ρ /n (1 − ε5) , (1 + ε5) . u i(i, j) = |S| · Pr[u ] Pr [X = 1 | u ] on the prefixes to conclude that i−1 i i−1 typical ui−1’s   n 2 2 X (4.12) + (4.13) ≤ d ((1 + ε ) · ρ/n) · |S| · Pr [vj−1] Pr [Xj = 1 | vj−1] 2 5 typical vj−1’s Whereas to bound the third summand we first change h 4 2 i ∈ |S|2ρ2/n2 (1 − ε ) , (1 + ε ) the order of summation. We get that (4.14) is at 5 5 Armed with these Claims 4 and 5 and Observa- u, v ∈ S; otherwise, by Cauchy-Schwartz: tions 3 and 4, we are now ready to prove the main  2 lemma of this section. Recall that the soundness of X 2 1 X the 2CSP we started with is 1−η for a small constant I (u, v) ≥  I (u, v) δ |S|2 η. (u,v)∈(S×S)\E (u,v)∈(S×S)\E 2 P   Lemma 4.2. If i H(Xi|X

most common typical assignment (i.e. assignment after a typical prefix), breaking ties arbitrarily. In particular, at least $1/|A|$ of the typical assignments for $x_i$ are equal to $A(i)$. Of course, this assignment cannot satisfy more than a $(1-\eta)$-fraction of the constraints in the original 2CSP; after removing the $\varepsilon_5 n$ atypical variables, $(\eta/2 - \varepsilon_5)\,dn$ constraints out of the $dn/2$ constraints must still be unsatisfied.

Recall that the number of tests for each constraint over typical variables, $I^{>}(i,j)$, is approximately the same for every pair $(i,j)$, up to a $\frac{(1-\varepsilon_5)^4}{(1+\varepsilon_5)^2}$-multiplicative factor (Observation 4). Therefore, the total fraction of tests over unsatisfied constraints, out of all tests, is approximately proportional to the fraction of unsatisfied constraints. In particular, the sum of $I^{>}(i,j)$ over all typical and unsatisfied $(i,j)$'s is at least:
\begin{align*}
& \frac{(1-\varepsilon_5)^4}{(1+\varepsilon_5)^2} \cdot \frac{\left|\{\text{typical, unsat. } (i,j)\text{'s}\}\right|}{\left|\{\text{typical } (i,j) \in \psi\}\right|} \cdot \sum_{(i,j)\in\psi} I^{>}(i,j) \\
&\ge \frac{(1-\varepsilon_5)^4}{(1+\varepsilon_5)^2} \cdot \frac{(\eta/2 - \varepsilon_5)\,dn}{dn/2} \cdot \sum_{(i,j)\in\psi} I^{>}(i,j) \\
&= \frac{(1-\varepsilon_5)^4}{(1+\varepsilon_5)^2} \cdot (\eta - 2\varepsilon_5) \cdot \sum_{(u,v)\in(S\times S)} I(u,v),
\end{align*}
where the last equality follows by Observation 3.

For each such pair $(i,j)$, on at least a $1/|A|^2$-fraction of the tests both variables receive the mode assignment, so the constraint is violated². Thus the total number of violations is at least $\varepsilon_6 \sum_{(u,v)\in(S\times S)} I(u,v)$ (where $\varepsilon_6 = (\eta/2 - \varepsilon_5)\,\frac{1}{|A|^2}\,\frac{(1-\varepsilon_5)^4}{(1+\varepsilon_5)^2}$).

Finally, we show that so many violations cannot concentrate on less than a $\delta$-fraction of the pairs $u, v \in S$; otherwise, by Cauchy-Schwarz:
\[ \sum_{(u,v)\in(S\times S)\setminus E} I(u,v)^2 \ge \frac{1}{\delta|S|^2}\left(\sum_{(u,v)\in(S\times S)\setminus E} I(u,v)\right)^2 \]
\[ \sum_{(u,v)\in(S\times S)\setminus E} I(u,v)^2 \le \sum_{(u,v)\in S\times S} I(u,v)^2 \le (1+2\varepsilon_7)^4\, d^2 |S|^2 \left(\mathbb{E}_{u,v}[I(u,v)]\right)^2. \]
Thus we have a contradiction since $d^2(1+2\varepsilon_7)^4 < \varepsilon_6^2/\delta$ by our setting of $\delta$. Therefore we have 2CSP-violations in more than a $\delta$-fraction of the pairs $u, v \in S$.

² We remark that a more careful analysis of the expected number of violations would allow one to save an $|A|^2$-factor in the value of $\varepsilon_6$. Since it does not qualitatively affect the result, we opt for the simpler analysis.

Proof. [Proof of Theorem 4.1] Recall that $\sum_i (\alpha_i + \beta_i) = \log k' \ge \left(1 - \frac{\varepsilon_1}{\log n}\right)\log k$ by Fact 4.4. If $\sum_i \beta_i > \left(\frac{5\varepsilon_1}{\log n}\right)\log k$, then by Lemma 4.1, $\mathrm{den}(S) < 1 - \delta$. Otherwise, if $\sum_i \alpha_i > \left(1 - \frac{6\varepsilon_1}{\log n}\right)\log k$, by Lemma 4.2, $\mathrm{den}(S) < 1 - \delta$.

A Useful approximations

We recall some elementary approximations to logarithms and entropies that will be useful in the analysis.

Fact A.1. (Fact 4.3) If $k = \binom{n}{\rho}$ then,
\[ \log k = nH\left(\frac{\rho}{n}\right) \pm O(\log n) = \left(\frac{1}{2} - o(1)\right)\rho\log n. \]

Proof. By Stirling's approximation, we have
\[ \log n! = n\log n - (\log e)\, n + O(\log n). \]
Therefore the total entropy is given by
\begin{align*}
\log k &= \log\binom{n}{\rho} \\
&= \log n! - \log\rho! - \log(n-\rho)! \\
&= n\log n - \rho\log\rho - (n-\rho)\log(n-\rho) \pm O(\log n) \\
&= nH\left(\frac{\rho}{n}\right) \pm O(\log n).
\end{align*}
For small $\varepsilon$, we have
\[ \log(1+\varepsilon) = (\log e)\left(\varepsilon - \frac{\varepsilon^2}{2} + O(\varepsilon^3)\right); \]
and in particular,
\[ \log\frac{n-\rho}{n} = O\left(-\frac{\rho}{n}\right). \]
Therefore,
\begin{align*}
\log k &= \rho\cdot\log\frac{n}{\rho} + (n-\rho)\cdot\log\frac{n}{n-\rho} + O(\log n) \\
&= \rho\cdot\left(\frac{1}{2} - o(1)\right)\log n + (n-\rho)\cdot O\left(\frac{\rho}{n}\right) + O(\log n) \\
&= \left(\frac{1}{2} - o(1)\right)\rho\log n.
\end{align*}

More useful to us will be the following bounds on $\log k'$:

Fact A.2. (Fact 4.4) Let $\varepsilon_1 \ge 5\varepsilon_0$, and $k, k', V, n, \rho$ as specified in the construction. Then,
\[ \log k' \ge \max\left\{\log k,\; nH\left(\frac{\rho}{n}\right)\right\} - \varepsilon_1 \log k/\log n. \]
In particular, this means that most indices $i$ should contribute roughly $H\left(\frac{\rho}{n}\right)$ entropy to the choice of $v$.

Proof. Observing that since $k = \binom{n}{\rho}$, we have
\[ \log|V| = \log\binom{n}{\rho} + \rho\log|A| = (1+o(1))\log k. \tag{A.1} \]
We also have that
\[ \log\log|V| = \log(1+o(1)) + \log\log k > \log\rho > \frac{1}{2}\log n; \tag{A.2} \]
where the first inequality follows from Fact 4.3, and the second from the definition of $\rho$.
Finally, we have
\begin{align*}
\log k' &= \log k - \varepsilon_0 \log|V|/\log\log|V| \\
&\ge \log k - \varepsilon_0 (1+o(1)) \log k \Big/ \frac{1}{2}\log n \\
&\ge \log k - \varepsilon_1 \log k/\log n,
\end{align*}
where the penultimate inequality uses (A.1) and (A.2), and the last one follows from $\varepsilon_1 \ge 5\varepsilon_0$. Using Fact 4.3 completes the proof.

We will also need the following bound which relates the entropies of a very biased coin and a slightly less biased one.

Fact A.3. (Fact 4.5) $H\left(\frac{1+\upsilon}{n}\right)$ is at most:
\[ H\left(\frac{1}{n}\right) - \frac{\upsilon}{n}\log\frac{1}{n} - (\log e)\frac{\upsilon^2}{2n} + O\left(n^{-2}\right) + O\left(\frac{\upsilon^3}{n}\right). \]

Proof. By definition,
\[ H\left(\frac{1+\upsilon}{n}\right) = -\frac{1+\upsilon}{n}\log\left(\frac{1+\upsilon}{n}\right) - \left(1 - \frac{1+\upsilon}{n}\right)\log\left(1 - \frac{1+\upsilon}{n}\right). \]
In order to relate this quantity to $H\left(\frac{1}{n}\right)$, we rewrite $H\left(\frac{1+\upsilon}{n}\right)$ as:
\begin{align*}
& -\frac{1}{n}\log\frac{1}{n} - \frac{\upsilon}{n}\log\frac{1}{n} - \frac{1+\upsilon}{n}\cdot\underbrace{\log(1+\upsilon)}_{(\log e)\left(\upsilon - \upsilon^2/2 + O(\upsilon^3)\right)} \\
& - \left(1 - \frac{1}{n}\right)\log\left(1 - \frac{1}{n}\right) + \underbrace{\upsilon\,\frac{1}{n}\log\left(1 - \frac{1}{n}\right)}_{O(n^{-2})} \\
& - \left(1 - \frac{1+\upsilon}{n}\right)\cdot\underbrace{\log\left(\frac{1 - \frac{1+\upsilon}{n}}{1 - \frac{1}{n}}\right)}_{(\log e)\left(-(\upsilon/n) - O(\upsilon/n^2)\right)} \\
&= H\left(\frac{1}{n}\right) - \frac{\upsilon}{n}\log\frac{1}{n} - (\log e)\frac{\upsilon^2}{2n} + O\left(n^{-2}\right) + O\left(\frac{\upsilon^3}{n}\right).
\end{align*}

B Small constants in the proof of Theorem 4.1

To help verify the correctness of the proof, we concentrate all the definitions of the small $\varepsilon$'s used in the following list:

• $\varepsilon_0 \le \varepsilon_1/5$

• $\varepsilon_1 \ge 4\varepsilon_2\log|A| + 8\varepsilon_3$

• $\varepsilon_2$: $\varepsilon_2 < 0.2$, $\delta = \left(\frac{\varepsilon_2}{|A|^{2/\varepsilon_2}}\right)^4$

• $\varepsilon_3 \ge \varepsilon_4\log|A| - \varepsilon_4\log\varepsilon_4 - (1-\varepsilon_4)\log(1-\varepsilon_4)$

• $\varepsilon_4 = \omega(n/\rho^2)$

• $\varepsilon_5$: $\left(\frac{\log e}{8}\right)\varepsilon_5^4 > 14\varepsilon_1$

• $\varepsilon_6$: $\varepsilon_6 = (\eta/2 - \varepsilon_5)\,\frac{1}{|A|^2}\,\frac{(1-\varepsilon_5)^4}{(1+\varepsilon_5)^2}$ and $d^2(1+2\varepsilon_7)^4 < \varepsilon_6^2/\delta$

• $\varepsilon_7$: $\varepsilon_7 \ge 6\varepsilon_5 + \Theta(\varepsilon_5^2)$
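The approximations in Facts A.1 and A.3 are easy to spot-check numerically. The sketch below is only a sanity check under stated assumptions: logarithms are base 2, the hidden constants are taken to be 1 for the $O(\log n)$ term and for the $\upsilon^3/n$ term and 2 for the $n^{-2}$ term (illustrative choices, not constants from the paper), and the helper names are hypothetical. It checks only the first equality of Fact A.1, since the second depends on the specific choice of $\rho$ in the construction.

```python
import math

LOG2_E = math.log2(math.e)

def binary_entropy(p: float) -> float:
    """H(p) in bits; log1p keeps precision when p is tiny."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log1p(-p) * LOG2_E

def log2_binomial(n: int, r: int) -> float:
    """log2 of C(n, r), computed via lgamma to avoid huge integers."""
    nats = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return nats * LOG2_E

def fact_a1_gap(n: int, rho: int) -> float:
    """|log C(n, rho) - n H(rho/n)|, which Fact A.1 says is O(log n)."""
    return abs(log2_binomial(n, rho) - n * binary_entropy(rho / n))

def fact_a3_residual(n: int, upsilon: float) -> float:
    """H((1+v)/n) minus the three explicit terms of Fact A.3."""
    lhs = binary_entropy((1.0 + upsilon) / n)
    main = (binary_entropy(1.0 / n)
            - (upsilon / n) * math.log2(1.0 / n)
            - LOG2_E * upsilon ** 2 / (2.0 * n))
    return lhs - main

for n, rho in ((10**4, 100), (10**6, 1000)):
    print(n, rho, fact_a1_gap(n, rho), math.log2(n))

for n, v in ((10**6, 0.1), (10**4, -0.5)):
    print(n, v, fact_a3_residual(n, v), abs(v) ** 3 / n + 2.0 / n**2)
```

In each printed row the third column should be no larger than the fourth, matching the error terms that the two facts allow.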