Lower bounds on the size of semidefinite relaxations

David Steurer (Cornell)

James R. Lee (Washington), Prasad Raghavendra (Berkeley)

Institute for Advanced Study, November 2015

overview of results

unconditional computational lower bounds for classical combinatorial optimization problems

examples: MAX CUT, TRAVELING SALESMAN
in a restricted but powerful model of computation:
generalizes the best known algorithms
all possible linear and semidefinite relaxations
first super-polynomial lower bound in this model

general program goes back to [Yannakakis’88] for refuting flawed P=NP proofs

connection to optimization / convex geometry: settles an open question about semidefinite lifts of polytopes and positive semidefinite rank

[see the survey by Fawzi, Gouveia, Parrilo, Robinson, Thomas]

outline: overview of results; proof strategy for lower bounds

optimal in this model

achieves best possible approximation guarantees among all poly-time algorithms in this model

wide range of problems: every constraint satisfaction problem

concrete algorithm: sum-of-squares (aka Lasserre) hierarchy [Shor’87, Parrilo’00, Lasserre’00]

derive lower bounds for the general model from known counterexamples (integrality gaps) for the sum-of-squares algorithm [Grigoriev, Schoenebeck, Tulsiani, Barak-Chan-Kothari]

mathematical programming relaxations: powerful general approach for approximating NP-hard optimization problems

three flavors:

linear (LP): (A)TSP
spectral: edge expanders

semidefinite (SDP): MAX CUT, CSP, SPARSEST CUT

intriguing connection to hardness reductions (e.g., Unique Games Conjecture)
plausibly optimal polynomial-time algorithms

Yannakakis’s model motivated by flawed P=NP proofs [Yannakakis’88]

formalizes the intuitive notion of an LP relaxation for a problem

enough structure for unconditional lower bounds (independent of P vs. NP) [Fiorini-Massar-Pokutta-Tiwary-de Wolf'12, Braun-Pokutta-S., Braverman-Moitra, Chan-Lee-Raghavendra-S., Rothvoß]

extends to SDP relaxations (but LP lower bound techniques break down) [Fiorini-Massar-Pokutta-Tiwary-de Wolf, Gouveia-Parrilo-Thomas]

testing computational complexity conjectures

approximation / PCP: UNIQUE GAMES, sliding scale conjecture

average-case: RANDOM 3-SAT

LP formulations of MAX CUT

find a bipartition in a given $n$-vertex graph $G$ (labels $x_i = \pm 1$) to cut as many edges as possible
maximize $f_G(x) = \sum_{ij \in E(G)} (x_i - x_j)^2/4$ over $x \in \{-1,1\}^n$ (the hypercube)

equivalently: maximize $\sum_{ij \in E(G)} (1 - X_{ij})/2$ over the cut polytope $\mathrm{CUT}_n = \mathrm{conv}\{\, xx^\top \mid x \in \{-1,1\}^n \,\}$
the number of facets is exponential, so there is no small direct LP formulation
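To make the hypercube formulation concrete, here is a minimal brute-force sketch in Python (the 5-cycle instance is a hypothetical example, not from the talk) that evaluates $f_G$ and maximizes it over $\{-1,1\}^n$ by enumeration.

```python
import itertools

# hypothetical toy instance: the 5-cycle on vertices 0..4 (max cut value 4)
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

def f_G(x):
    # f_G(x) = sum_{ij in E(G)} (x_i - x_j)^2 / 4 = number of edges cut by the bipartition x
    return sum((x[i] - x[j]) ** 2 / 4 for i, j in edges)

# maximize over the hypercube {-1,1}^n by enumeration (exponential time; illustration only)
best = max(itertools.product([-1, 1], repeat=n), key=f_G)
print(best, f_G(best))
```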

general size-$n^d$ LP formulation of MAX CUT [Yannakakis'88]: a polytope $P \subseteq \mathbb{R}^{n^d}$ defined by $\le n^d$ linear inequalities that projects to $\mathrm{CUT}_n$

often exponential savings: $\ell_1$-norm unit ball, Held-Karp TSP relaxation, LP/SDP hierarchies

size lower bounds for LP formulations of MAX CUT? (implied by NP $\not\subseteq$ P/poly)

example: poly-size LP formulation for the $\ell_1$-norm ball

$\ell_1$-unit ball: $\sum_i |x_i| \le 1$, $x \in \mathbb{R}^n$ ($2^n$ linear inequalities)
lifted formulation: $-y \le x \le y$, $\sum_i y_i \le 1$, $x, y \in \mathbb{R}^n$ ($2n+1$ linear inequalities); project onto the $x$ variables

exponential savings
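A small sketch of this lifted formulation using scipy.optimize.linprog; the dimension and objective vector $c$ are hypothetical choices for illustration. Maximizing $\langle c, x\rangle$ over the $\ell_1$-ball uses only the $2n+1$ inequalities of the lift rather than the $2^n$ facets.

```python
import numpy as np
from scipy.optimize import linprog

n = 4
c = np.array([3.0, -1.0, 0.5, 2.0])          # hypothetical linear objective, maximize <c, x>

I = np.eye(n)
# lifted variables (x, y) in R^{2n}; 2n+1 inequalities instead of the 2^n facets of the l1-ball
A_ub = np.block([
    [ I, -I],                                #  x - y <= 0
    [-I, -I],                                # -x - y <= 0
    [np.zeros((1, n)), np.ones((1, n))],     #  sum_i y_i <= 1
])
b_ub = np.zeros(2 * n + 1)
b_ub[-1] = 1.0

obj = np.concatenate([-c, np.zeros(n)])      # minimize -<c, x>
res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=(None, None))
x = res.x[:n]                                # project onto the x variables
print(x, c @ x)                              # optimum is a vertex of the l1-ball
```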


size lower bounds for LP formulations of MAX CUT
exponential lower bound: $d \ge \tilde\Omega(n)$ [Fiorini-Massar-Pokutta-Tiwary-de Wolf'12]
approximation ratio $> \tfrac12$ requires super-polynomial size [Chan-Lee-Raghavendra-S.'13]
but: the best known MAX CUT algorithms are based on SDP

general size-$n^d$ SDP formulation of MAX CUT: a spectrahedron $P \subseteq \mathbb{R}^{n^d \times n^d}$, obtained by intersecting an affine linear subspace with the psd cone, that projects to $\mathrm{CUT}_n$

size lower bounds for SDP formulations of MAX CUT [Lee-Raghavendra-S.'15]
exponential lower bound: $d \ge \Omega(n^{0.1})$
approximation ratio $> 0.99$ requires super-polynomial size (matching NP-hardness for CSPs)
best approximation ratio by an $n^d$-size SDP is no better than $O(d)$-degree sum-of-squares

⇒ sum-of-squares is the optimal SDP approximation algorithm for CSPs

recall MAX CUT: maximize $f_G(x) = \sum_{ij \in E(G)} (x_i - x_j)^2/4$ over $x \in \{-1,1\}^n$

upper bound certificates: an algorithm with an approximation guarantee must certify upper bounds on the objective function $f_G$; approximation ratio $\alpha$ ⇒ the algorithm certifies $f_G \le c$ for some $c \le \mathrm{OPT}_G / \alpha$

can characterize LP/SDP algorithms by their certificates

certificates of the deg-$d$ sum-of-squares algorithm ($n^d$-size SDP example): certify $f \ge 0$ for a function $f\colon \{\pm1\}^n \to \mathbb{R}$ iff $f = \sum_i g_i^2$ with $\deg g_i \le d$ for all $i$ (equality as functions on the hypercube)
captures the best known algorithms for a wide range of problems
deg-1 sum-of-squares captures the Goemans-Williamson 0.878-approximation for MAX CUT: for every graph $G$, $\mathrm{OPT}_G - 0.878 \cdot f_G = \sum_i g_i^2$ with $\deg g_i \le 1$
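For reference, a minimal cvxpy sketch (cvxpy, its bundled SDP solver, and the 5-cycle instance are my additions, not from the talk) of the basic MAX CUT relaxation: maximize $\sum_{ij\in E}(1 - X_{ij})/2$ over $X \succeq 0$, $X_{ii} = 1$. This is the SDP that Goemans-Williamson round, and its dual solutions correspond, roughly, to the degree-1 sum-of-squares certificates above.

```python
import cvxpy as cp

# hypothetical toy instance: the 5-cycle (max cut = 4)
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

# basic MAX CUT SDP relaxation: maximize sum_{ij in E} (1 - X_ij)/2 over X >= 0 (psd), X_ii = 1
X = cp.Variable((n, n), PSD=True)
objective = cp.Maximize(sum((1 - X[i, j]) / 2 for i, j in edges))
prob = cp.Problem(objective, [cp.diag(X) == 1])
prob.solve()
print(prob.value)   # slightly above the true optimum cut value 4 (integrality gap of the 5-cycle)
```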

connection to the Unique Games Conjecture: the best candidate algorithm to refute the UGC is deg-$\tilde O(1)$ sum-of-squares; enough to show: $\exists d.\ \forall G.\ \mathrm{OPT}_G - 0.879 \cdot f_G = \sum_i g_i^2$ with $\deg g_i \le d$ (note the 9: strictly better than the GW ratio)
does larger degree help? yes: if $f \ge 0$, then $f = g^2$ for some function $g$ with $\deg g \le n$ (but this corresponds to a $2^n$-size SDP)
(tight: $(\tfrac12 - \sum_i x_i)^2 - \tfrac14 \ge 0$ has no deg-$o(n)$ s.o.s. certificate)
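A quick numpy-free check of the example as reconstructed here from the slide: $(\tfrac12 - \sum_i x_i)^2 - \tfrac14 \ge 0$ on the whole hypercube because $\sum_i x_i$ is always an integer; the point of the slide is that, despite this, no degree-$o(n)$ sum-of-squares certificate exists.

```python
import itertools

n = 6
for x in itertools.product([-1, 1], repeat=n):
    s = sum(x)                              # always an integer, so (1/2 - s)^2 >= 1/4
    assert (0.5 - s) ** 2 - 0.25 >= 0
print("nonnegative on all of {-1,1}^%d" % n)
```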

where are the vectors?
suppose: there is no deg-$d$ sos certificate for $c - f_G \ge 0$
⇒ separating hyperplane $D\colon \{\pm1\}^n \to \mathbb{R}$ separating $f_G - c$ from the cone $\{\sum_i g_i^2 \mid \deg g_i \le d\}$:
$\sum_x D(x)\, g(x)^2 \ge 0$ whenever $\deg g \le d$, $\qquad \sum_x D(x) \cdot 1 = 1$, $\qquad \sum_x D(x)\, f_G(x) > c$
$D$ behaves like a probability distribution over MAX CUT solutions with expected value $> c$ ⇒ pseudo-distribution: a useful way to think about LP/SDP relaxations in general
⇒ $M = \sum_x D(x)\, xx^\top$ is the usual SDP solution (in particular $M \succeq 0$ and $M_{ii} = 1$)
⇒ ∃ vectors $v_i$ with $M_{ij} = \langle v_i, v_j \rangle$
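A numpy sketch of the objects on this slide, using an actual probability distribution $D$ over two cuts of a hypothetical 4-cycle (any true distribution is in particular a valid pseudo-distribution): it forms $M = \sum_x D(x)\,xx^\top$, checks $M \succeq 0$ and $M_{ii} = 1$, and recovers vectors $v_i$ with $M_{ij} = \langle v_i, v_j\rangle$.

```python
import numpy as np

# hypothetical distribution D over two cuts of the 4-cycle
cuts = [np.array([1, -1, 1, -1]), np.array([1, 1, -1, -1])]
D = [0.5, 0.5]

M = sum(p * np.outer(x, x) for p, x in zip(D, cuts))   # M = sum_x D(x) x x^T
assert np.allclose(np.diag(M), 1)                      # M_ii = 1
assert np.all(np.linalg.eigvalsh(M) >= -1e-9)          # M is psd

# vectors v_i with M_ij = <v_i, v_j>: rows of U diag(sqrt(w)) from an eigendecomposition of M
w, U = np.linalg.eigh(M)
V = U @ np.diag(np.sqrt(np.clip(w, 0.0, None)))
assert np.allclose(V @ V.T, M)
```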

certificates of the deg-$d$ sum-of-squares SDP algorithm: certify $f \ge 0$ for $f\colon \{\pm1\}^n \to \mathbb{R}$ iff $f = \sum_i g_i^2$ with $\deg g_i \le d$

certificates of a general $n^d$-size SDP algorithm: characterized by a psd-matrix-valued function $Q\colon \{\pm1\}^n \to \mathbb{R}^{n^d \times n^d}$; certify $f \ge 0$ iff $\exists P \succeq 0.\ \forall x \in \{\pm1\}^n.\ f(x) = \mathrm{Tr}(P\,Q(x)) = \langle P, Q(x)\rangle_F$
example: the deg-$d$ sum-of-squares SDP algorithm has $Q(x) = (x^{\otimes d})(x^{\otimes d})^\top$
a general SDP $Q$ is captured by deg-$d$ sum-of-squares if $\deg Q(x) \le d$
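A numpy sketch of this correspondence for $d = 1$, spelling the tensor-power shorthand out as the vector of monomials of degree at most 1 (the coefficient vectors $c_i$ below are hypothetical): if $f = \sum_i g_i^2$ with affine $g_i(x) = \langle c_i, m(x)\rangle$, then $f(x) = \mathrm{Tr}(P\,Q(x))$ with $Q(x) = m(x)m(x)^\top$ and $P = \sum_i c_i c_i^\top \succeq 0$.

```python
import itertools
import numpy as np

n = 3

def m(x):
    # monomials of degree <= 1; stands in for the slide's tensor-power shorthand with d = 1
    return np.concatenate([[1.0], x])

# hypothetical f = sum_i g_i^2 with affine g_i(x) = <c_i, m(x)>; rows of C are the c_i
C = np.array([[0.5, 1.0, -1.0, 0.0],
              [0.0, 0.0, 1.0, -1.0]])
P = C.T @ C                                   # P = sum_i c_i c_i^T is psd

for x in itertools.product([-1.0, 1.0], repeat=n):
    x = np.array(x)
    Q = np.outer(m(x), m(x))                  # Q(x) = m(x) m(x)^T
    f = np.sum((C @ m(x)) ** 2)               # f(x) = sum_i g_i(x)^2
    assert np.isclose(f, np.trace(P @ Q))     # f(x) = Tr(P Q(x))
```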


where does $Q$ come from? a general spectrahedron: $S = \{\, z \in \mathbb{R}^{n^d} \mid \sum_i z_i A_i \preceq B \,\}$

an SDP relaxation for MAX CUT gives cost functions $y_G$ and feasible solutions $\{z_x\} \subseteq S$ with $\langle y_G, z_x\rangle = f_G(x)$ (the objective value of cut $x$ in graph $G$)
choose $Q$ with $Q(x) = B - \sum_i z_{x,i} A_i$ (the slack of the constraint at $z_x \in S$)
duality: $\max_{z \in S} \langle y_G, z\rangle \le c$ iff $\exists P \succeq 0.\ c - \langle y_G, z\rangle = \mathrm{Tr}\big(P \cdot (B - \sum_i z_i A_i)\big)$ as functions of $z$
⇒ $c - f_G(x) = \mathrm{Tr}(P \cdot Q(x))$ for all $x$
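A numpy sketch of the first step for the standard MAX CUT spectrahedron (the elliptope, i.e. psd matrices with unit diagonal): taking the cost $y_G = L_G/4$ (a quarter of the graph Laplacian) and feasible points $z_x = xx^\top$ gives $\langle y_G, z_x\rangle = f_G(x)$. The specific graph is a hypothetical toy instance.

```python
import itertools
import numpy as np

n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]       # hypothetical toy graph: the 4-cycle

# graph Laplacian L_G; note f_G(x) = sum_{ij in E} (x_i - x_j)^2 / 4 = <L_G/4, x x^T>
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1; L[j, j] += 1
    L[i, j] -= 1; L[j, i] -= 1
y_G = L / 4                                     # cost for this graph

for x in itertools.product([-1.0, 1.0], repeat=n):
    x = np.array(x)
    z_x = np.outer(x, x)                        # feasible point of the elliptope (psd, unit diagonal)
    f = sum((x[i] - x[j]) ** 2 / 4 for i, j in edges)
    assert np.isclose(np.trace(y_G @ z_x), f)   # <y_G, z_x> = f_G(x)
```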


low degree can simulate general: every $n^d$-size SDP algorithm can be simulated by a deg-$O(d)$ sum-of-squares algorithm [Lee-Raghavendra-S.'15]
∀ $n^d$-size SDP algorithm $Q$, ∃ low-degree SDP algorithm $Q'$ with $\deg Q'(x) \approx \log(n^d)$ and $Q \approx Q'$, i.e., $\langle F, Q\rangle \approx \langle F, Q'\rangle$ for all low-degree matrix-valued functions $F$

general phenomenon: in order to approximate an object with respect to a family of tests, the approximator need not be more complex than the tests

technical challenge: a naïve application lets us bound $\deg Q'(x)$, but we need to bound $\deg \sqrt{Q'(x)}$; in general $\deg \sqrt{Q'} \gg \deg Q'$ (this is at the heart of sum-of-squares counterexamples)


approach: learn the "simplest" SDP algorithm $Q'$ that satisfies $\langle F, Q\rangle \approx \langle F, Q'\rangle$
measure of simplicity: quantum entropy (the classical entropy of the eigenvalues of $Q(x)$)
closed-form solution: $Q'(x) = e^{t \cdot F(x)}$ where $t = $ entropy-defect$(Q) \le \log(n^d)$

⇒ matrix multiplicative weights method!

simple square root: $Q'(x)^{1/2} = e^{t \cdot F(x)/2} \approx \sum_{k=0}^{t} \frac{(t \cdot F(x)/2)^k}{k!}$

⇒ degree $\le \deg F \cdot t$
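A small numpy/scipy sanity check of the truncation step (this illustrates only the formula, not the paper's full construction): for a random symmetric matrix standing in for $F(x)$, the degree-$t$ Taylor truncation of $e^{tF/2}$ is close to the true matrix exponential when $t\,\|F\|/2$ is moderate.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
F = (A + A.T) / 20           # small symmetric matrix standing in for F(x); kept small in norm
t = 8                        # plays the role of the entropy defect t <= log(n^d)

exact = expm(t * F / 2)      # Q'(x)^{1/2} = exp(t F / 2)

# degree-t Taylor truncation: sum_{k=0}^{t} (t F / 2)^k / k!
approx = np.zeros_like(F)
term = np.eye(5)
for k in range(t + 1):
    approx += term
    term = term @ (t * F / 2) / (k + 1)

print(np.linalg.norm(exact - approx))   # tiny here; accurate whenever t*||F||/2 is moderate
```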

summary

can simulate a general small SDP algorithm by a low-degree SDP algorithm

interpret poly-size SDP algorithm as quantum state with high entropy

learn the simplest SDP / quantum state via matrix multiplicative weights (maximum entropy)

open questions

approximation beyond CSP and relatives

rule out a 0.999-approximation for TRAVELING SALESMAN by poly-size LP/SDP

strong quantitative lower bounds for approximation

rule out a 0.999-approximation for MAX CUT by $2^{n^{\Omega(1)}}$-size LP/SDP (latest news: solved by Raghavendra-Meka-Kothari!)

stronger quantitative lower bounds for SDP

rule out $2^{n^{0.999}}$-size SDPs for (exact) MAX CUT

Thank you!