Statistical Algorithms and a Lower Bound for Detecting Planted Cliques

Vitaly Feldman — IBM Almaden Research Center, San Jose, CA 95120 — [email protected]
Elena Grigorescu∗† — Dept. of Computer Science, Purdue University, West Lafayette, IN 47907 — [email protected]
Lev Reyzin‡† — Dept. of Mathematics, University of Illinois at Chicago, Chicago, IL 60607 — [email protected]
Santosh S. Vempala† — School of Computer Science, Georgia Inst. of Technology, Atlanta, GA 30332 — [email protected]
Ying Xiao† — School of Computer Science, Georgia Inst. of Technology, Atlanta, GA 30332 — [email protected]

∗ This material is based upon work supported by the National Science Foundation under Grant #1019343 to the Computing Research Association for the CIFellows Project.
† Research supported in part by NSF awards CCF-0915903 and CCF-0910584.
‡ Research supported in part by a Simons Postdoctoral Fellowship.

ABSTRACT

We introduce a framework for proving lower bounds on computational problems over distributions, based on a class of algorithms called statistical algorithms. For such algorithms, access to the input distribution is limited to obtaining an estimate of the expectation of any given function on a sample drawn randomly from the input distribution, rather than directly accessing samples. Most natural algorithms of interest in theory and in practice, e.g., moments-based methods, local search, standard iterative methods for convex optimization, MCMC and simulated annealing, are statistical algorithms or have statistical counterparts. Our framework is inspired by and generalizes the statistical query model in learning theory [34].

Our main application is a nearly optimal lower bound on the complexity of any statistical algorithm for detecting planted bipartite clique distributions (or planted dense subgraph distributions) when the planted clique has size O(n^{1/2−δ}) for any constant δ > 0. Variants of these problems have been assumed to be hard to prove hardness for other problems and for cryptographic applications. Our lower bounds provide concrete evidence of hardness, thus supporting these assumptions.

Categories and Subject Descriptors

F.2 [Analysis of Algorithms and Problem Complexity]: General

Keywords

Statistical algorithms; Lower bounds; Planted clique; Statistical query

1. INTRODUCTION

We study the complexity of problems where the input consists of independent samples from an unknown distribution. Such problems are at the heart of machine learning and statistics (and their numerous applications) and also occur in many other contexts such as compressed sensing and cryptography. Many methods exist to estimate the sample complexity of such problems (e.g., VC dimension [47]). Proving lower bounds on the computational complexity of these problems has been much more challenging. The traditional approach to this is via reductions and finding distributions that can generate instances of some problem conjectured to be intractable (e.g., assuming NP ≠ RP).

Here we present a different approach, namely showing that a broad class of algorithms, which we refer to as statistical algorithms, have high complexity, unconditionally. Our definition encompasses most algorithmic approaches used in practice and in theory on a wide variety of problems, including Expectation Maximization (EM) [16], local search, MCMC optimization [45, 25], simulated annealing [36, 48], first and second order methods for linear/convex optimization [17, 6], k-means, Principal Component Analysis (PCA), Independent Component Analysis (ICA), Naïve Bayes, Neural Networks and many others (see [13] and [9] for proofs and many other examples). In fact, we are aware of only one algorithm that does not have a statistical counterpart: Gaussian elimination for solving linear equations over a field (e.g., mod 2).
Informally, statistical algorithms can access the input distribution only by asking for the value of any bounded real-valued function on a random sample from the distribution, or the average value of the function over a specified number of independent random samples. As an example, suppose we are trying to solve min_{u∈U} E_{x∼D}[f(x, u)] by gradient descent. Then the gradient of the objective function is (by interchanging the derivative with the integral)

    ∇_u E_x[f(x, u)] = E_x[∇_u f(x, u)]

and can be estimated from samples; thus the algorithm can proceed without ever examining samples directly. A more involved example is Linear Programming. One version is the feasibility problem: find a nonzero w s.t. a · w ≥ 0 for all a in some set A. We can formulate this as

    max_w E_a[sign(a · w)]

and the distribution over a could be uniform over the set A if it is finite. This can be solved by a statistical algorithm [10, 17]. This is also the case for semidefinite programs and for conic optimization [6]. The key motivation for our definition of statistical algorithms is the empirical observation that almost all algorithms that work on explicit instances are already statistical in our sense or have natural statistical counterparts. Thus, proving lower bounds for statistical algorithms strongly indicates the need for new approaches even for explicit instances. We present the formal oracle-based definitions of statistical algorithms in Section 2.
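To make the oracle point of view concrete, here is a minimal sketch (ours, not from the paper) of the gradient-descent example above: the algorithm touches the unknown distribution D only through empirical averages of a function of a sample. The toy distribution, the objective f(x, u) = ||x − u||², and all names are illustrative assumptions.

```python
# A minimal sketch of a "statistical" gradient descent on E_{x~D}[f(x,u)]:
# the only access to D is through averages of a bounded function of a sample.
import numpy as np

rng = np.random.default_rng(0)

def sample_D(m):
    # stand-in for the unknown input distribution D
    return rng.normal(loc=2.0, scale=1.0, size=(m, 3))

def grad_estimate(u, m=2000):
    # grad_u E_x[f(x,u)] = E_x[2(u - x)], estimated by an empirical average;
    # the algorithm never inspects individual samples beyond this aggregate
    return (2.0 * (u - sample_D(m))).mean(axis=0)

u = np.zeros(3)
for _ in range(200):
    u -= 0.05 * grad_estimate(u)
print(u)  # approaches the minimizer of E[f(x,u)], i.e., the mean of D
```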
The inspiration for our model is the statistical query (SQ) model in learning theory [34], defined as a restriction of Valiant's PAC learning model [46]. The primary goal of the restriction was to simplify the design of noise-tolerant learning algorithms. As was shown by Kearns and others in subsequent works, almost all classes of functions that can be learned efficiently can also be efficiently learned in the restricted SQ model. A notable and so far the only exception is the algorithm for learning parities, based on Gaussian elimination. As was already shown by Kearns [34], parities require exponential time to learn in the SQ model. Further, Blum et al. [11] proved that the number of SQs required for weak learning (that is, for obtaining a non-negligible advantage over random guessing) of a class of functions C over a fixed distribution D is characterized by a combinatorial parameter of C and D, referred to as SQ-DIM(C, D), the statistical query dimension.

Our notion of statistical algorithms generalizes SQ learning algorithms to any computational problem over distributions. For any problem over distributions we define a parameter of the problem that lower bounds the complexity of solving the problem by any statistical algorithm in the same way as SQ-DIM lower bounds the complexity of learning in the SQ model. Our techniques for proving lower bounds on statistical algorithms are also based on methods developed for lower-bounding the complexity of SQ algorithms. However, as we will describe later, they depart from the known techniques in a number of significant ways that are necessary for our more general definition and our applications.

We apply our techniques to the problem of detecting planted bipartite cliques/dense subgraphs.

Detecting Planted Cliques.

In the planted clique problem, we are given a graph G whose edges are generated by starting with a random graph G_{n,1/2}, then "planting," i.e., adding edges to form a clique on k randomly chosen vertices. Jerrum [30] and Kučera [37] introduced the planted clique problem as a potentially easier variant of the classical problem of finding the largest clique in a random graph [33]. A random graph G_{n,1/2} contains a clique of size 2 log n with high probability, and a simple greedy algorithm can find one of size log n. Finding cliques of size (2 − ε) log n is a hard problem for any ε > 0. Planting a larger clique should make it easier to find one. The problem of finding the smallest k for which the planted clique can be detected in polynomial time has attracted significant attention. For k ≥ c√(n log n), simply picking vertices of large degrees suffices [37]. Cliques of size k = Ω(√n) can be found using spectral methods [2, 38, 14], via SDPs [19], nuclear norm minimization [3] or combinatorial methods [21, 15].

While there is no known polynomial-time algorithm that can detect cliques of size below the threshold of Ω(√n), there is a quasipolynomial algorithm for any k ≥ 2 log n: enumerate subsets of size 2 log n; for each subset that forms a clique, take all common neighbors of the subset; one of these will be the planted clique. This is also the fastest known algorithm for any k = O(n^{1/2−δ}), where δ > 0 (a toy rendering of this enumeration is sketched at the end of this discussion).

Some evidence of the hardness of the problem was shown by Jerrum [30], who proved that a specific approach using a Markov chain cannot be efficient for k = o(√n). More evidence of hardness is given in [20], where it is shown that Lovász–Schrijver SDP relaxations, which include the SDP used in [19], cannot be used to efficiently find cliques of size k = o(√n). The problem has been used to generate cryptographic primitives [31], and as a hardness assumption [1, 28, 40].
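The enumeration algorithm described above is easy to state in code. The following sketch is ours, not the authors'; the instance is kept tiny since the running time is n^{O(log n)}, the pruning of stray common neighbors is a simplification we added, and all parameter choices are illustrative.

```python
# Sketch of the quasipolynomial enumeration algorithm: enumerate subsets of
# size s ~ 2 log n; for each subset that forms a clique, take its common
# neighborhood as a candidate for the planted clique.
import itertools, math, random

random.seed(1)
n, k = 16, 8
planted = set(random.sample(range(n), k))
adj = [[False] * n for _ in range(n)]
for u, v in itertools.combinations(range(n), 2):
    adj[u][v] = adj[v][u] = (u in planted and v in planted) or random.random() < 0.5

s = math.ceil(2 * math.log2(n))
best = set()
for T in itertools.combinations(range(n), s):
    if all(adj[u][v] for u, v in itertools.combinations(T, 2)):  # T is a clique
        cand = set(T) | {v for v in range(n)
                         if v not in T and all(adj[v][u] for u in T)}
        while not all(adj[u][v] for u, v in itertools.combinations(cand, 2)):
            # prune stray vertices until the candidate is itself a clique
            cand.remove(min(cand, key=lambda v: sum(adj[v][u] for u in cand if u != v)))
        if len(cand) >= k:
            best = max(best, cand, key=len)
print(sorted(best), sorted(planted))  # for these toy parameters they should agree
```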
We focus on the bipartite planted clique problem, where a bipartite k × k clique is planted in a random bipartite graph. A densest-subgraph version of the bipartite planted clique problem has been used as a hard problem for cryptographic applications [4]. All known bounds and algorithms for the planted k-clique problem can be easily adapted to the bipartite case. Therefore it is natural to suspect that new upper bounds on the planted k-clique problem would also yield new upper bounds for the bipartite case.

The starting point of our investigation for this problem is the property of the bipartite planted k-clique problem that it has an equivalent formulation as a problem over distributions, defined as follows.

Problem 1. Fix an integer k, 1 ≤ k ≤ n, and a subset of k indices S ⊆ {1, 2, ..., n}. The input distribution D_S on vectors x ∈ {0,1}^n is defined as follows: with probability 1 − (k/n), x is uniform over {0,1}^n; and with probability k/n the k coordinates of x in S are set to 1, and the rest are uniform over their support. For an integer t, the distributional planted k-biclique problem with t samples is the problem of finding the unknown subset S using t samples drawn randomly from D_S.

One can view samples x_1, ..., x_t as adjacency vectors of a bipartite graph as follows: the bipartite graph has n vertices on the right (with k marked as members of the clique) and t vertices on the left. Each of the t samples gives the adjacency vector of the corresponding vertex on the left. It is not hard to see that for t = n, conditioned on the event of getting exactly k samples with planted indices, we will get a random bipartite graph with a planted k-biclique (we prove the equivalence formally in the full version of this paper [23]).
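For concreteness, here is a direct sampler for D_S, our sketch of Problem 1's definition (the function and variable names are ours):

```python
# Sampler for the distribution D_S of Problem 1: with probability 1 - k/n the
# vector is uniform; with probability k/n its S-coordinates are forced to 1
# and the rest stay uniform.
import random
rng = random.Random(0)

def sample_planted_biclique(n, S):
    x = [rng.randint(0, 1) for _ in range(n)]
    if rng.random() < len(S) / n:  # a "clique" row
        for i in S:
            x[i] = 1
    return x

S = {0, 3, 5}
for row in [sample_planted_biclique(8, S) for _ in range(6)]:
    print(row)  # t = n such rows form the adjacency matrix of the bipartite instance
```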
Another interesting approach for planted clique was proposed by Frieze and Kannan [24]. They gave a reduction from finding a planted clique in a random graph to finding a direction that maximizes a tensor norm; this was extended to general r in [12]. Specifically, they show that maximizing the r-th moment (or the 2-norm of an r-th order tensor) allows one to recover planted cliques of size Ω̃(n^{1/r}). A related approach is to maximize the 3rd or higher moment of the distribution given by the distributional planted clique problem. For this approach it is natural to consider the following type of optimization algorithm: start with some unit vector u, then estimate the gradient at u (via samples), move along that direction and return to the sphere; repeat to reach an approximate local maximum. Unfortunately, over the unit sphere, the expected r-th moment function can have (exponentially) many local maxima even for simple distributions. A more sophisticated approach [32] is through Markov chains or simulated annealing; it attempts to sample unit vectors from a distribution on the sphere which is heavier on vectors that induce a higher moment, e.g., u is sampled with density proportional to e^{f(u)}, where f(u) is the expected r-th moment along u. This could be implemented by a Markov chain with a Metropolis filter [39, 27] ensuring a proportional steady state distribution. If the Markov chain were to mix rapidly, that would give an efficient approximation algorithm, because sampling from the steady state likely gives a vector of high moment. At each step, all one needs is to be able to estimate f(u), which can be done by sampling from the input distribution.

As we will see presently, these statistical approaches will have provably high complexity. For the bipartite planted clique problem, statistical algorithms that can query expectations of arbitrary functions to within a small tolerance need n^{Ω(log n)} queries to detect planted cliques of size k < n^{1/2−δ} for any δ > 0. Even stronger exponential bounds apply for the more general problem of detecting planted dense subgraphs of the same size. These bounds match the current upper bounds.

Problem 2. Fix 0 < q ≤ p ≤ 1. For 1 ≤ k ≤ n, let S ⊆ {1, 2, ..., n} be a set of k vertex indices and D_S be a distribution over {0,1}^n such that when x ∼ D_S, with probability 1 − (k/n) the entries of x are independently chosen according to a q-biased Bernoulli variable, and with probability k/n the k coordinates in S are independently chosen according to a p-biased Bernoulli variable, and the rest are independently chosen according to a q-biased Bernoulli variable. The generalized planted bipartite densest k-subgraph problem is to find the unknown subset S given access to samples from D_S.

To describe these results precisely and discuss exactly what they mean for the complexity of these problems, we will need to define the notion of statistical algorithms, the complexity measures we use, and our main tool for proving lower bounds, a notion of statistical dimension of a set of distributions. We do this in the next section. In Section 3 we prove our general lower bound theorems, and in Section 5 we estimate the statistical dimension of detecting planted cliques and dense subgraphs.

2. DEFINITIONS AND OVERVIEW

Here we formally define statistical algorithms and the key notion of statistical dimension, and then describe the resulting lower bounds in detail.

2.1 Problems over Distributions

We begin by formally defining the class of problems addressed by our framework.

Definition 1 (Search problems over distributions). For a domain X, let D be a set of distributions over X, let F be a set of solutions and Z : D → 2^F be a map from a distribution D ∈ D to a subset of solutions Z(D) ⊆ F that are defined to be valid solutions for D. For t > 0 the distributional search problem Z over D and F using t samples is to find a valid solution f ∈ Z(D) given access to t random samples from an unknown D ∈ D.

We note that this definition captures decision problems by having F = {0,1}. With a slight abuse of notation, for a solution f ∈ F, we denote by Z^{−1}(f) the set of distributions in D for which f is a valid solution.

It is important to note that the number of available samples t can have a major influence on the complexity of the problem. First, for most problems there is a minimum t for which the problem is information-theoretically solvable. This value is often referred to as the sample complexity of the problem. But even for t which is larger than the sample complexity of the problem, having more samples can make the problem easier computationally. For example, in the context of attribute-efficient learning, there is a problem that is intractable with few samples (under cryptographic assumptions) but is easy to solve with a larger (but still polynomial) number of samples [43]. Our distributional planted biclique problem exhibits the same phenomenon.

2.2 Statistical Algorithms

The statistical query learning model of Kearns [34] is a restriction of the PAC model [46]. It captures algorithms that rely on empirical estimates of statistical properties of random examples of an unknown function instead of individual random examples (as in the PAC model of learning). For general search, decision and optimization problems over a distribution, we define statistical algorithms as algorithms that do not see samples from the distribution but instead have access to estimates of the expectation of any bounded function of a sample from the distribution.

Definition 2 (STAT oracle). Let D be the input distribution over the domain X. For a tolerance parameter τ > 0, the STAT(τ) oracle is the oracle that for any query function h : X → [−1, 1], returns a value

    v ∈ [E_{x∼D}[h(x)] − τ, E_{x∼D}[h(x)] + τ].

The general algorithmic techniques mentioned earlier can all be expressed as algorithms using the STAT oracle instead of samples themselves, in most cases in a straightforward way. We would also like to note that in the PAC learning model some of the algorithms, such as the Perceptron algorithm, did not initially appear to fall into the SQ framework, but SQ analogues were later found for all known learning techniques except Gaussian elimination (for specific examples, see [34] and [10]). We expect the situation to be similar even in the broader context of search problems over distributions.

The most natural realization of the STAT(τ) oracle is one that computes h on O(1/τ²) random samples from D and returns their average (see the sketch below). Chernoff's bound would then imply that the estimate is within the desired tolerance (with constant probability). However, if h(x) is very biased (e.g., equal to 0 with high probability), it can be estimated with fewer samples.
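The realization just mentioned is a one-liner; the following sketch (ours; the constant in O(1/τ²) is an arbitrary choice) makes it explicit.

```python
# The "most natural realization" of STAT(tau): answer a query h : X -> [-1,1]
# with its empirical mean over O(1/tau^2) fresh samples; by Chernoff the
# answer is tau-accurate with constant probability. sample_D stands in for
# one draw from the unknown distribution D.
import math, random
rng = random.Random(0)

def STAT(h, tau, sample_D, c=4.0):
    m = math.ceil(c / tau ** 2)
    return sum(h(sample_D()) for _ in range(m)) / m

# toy check: D uniform over {0,1}^8 and h(x) = x_0, so E[h] = 1/2
print(STAT(lambda x: x[0], 0.05, lambda: [rng.randint(0, 1) for _ in range(8)]))
```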
Our primary application requires a tight bound on the number of samples necessary to solve a problem over distributions. Therefore we define a stronger version of the STAT oracle in which the tolerance is adjusted for the variance of the query function; that is, the oracle returns the expectation to within the same tolerance (up to constant factors) as one expects to get from t samples. More formally, for a Boolean query h, VSTAT(t) can return any value v for which the distribution B(t, v) (sum of t independent Bernoulli variables with bias v) is statistically close to B(t, E[h]) (see Sec. 3.2 for more details on this correspondence).

Definition 3 (VSTAT oracle). Let D be the input distribution over the domain X. For a sample size parameter t > 0, the VSTAT(t) oracle is the oracle that for any query function h : X → {0,1}, returns a value v ∈ [p − τ, p + τ], where p = E_{x∼D}[h(x)] and τ = max{1/t, √(p(1−p)/t)}.

Note that VSTAT(t) always returns the value of the expectation within 1/√t. Therefore it is stronger than STAT(1/√t) but weaker than STAT(1/t). (STAT, unlike VSTAT, allows non-Boolean functions, but this is not a significant difference, as any [−1,1]-valued query can be converted to a logarithmic number of {0,1}-valued queries.)

The STAT and VSTAT oracles we defined can return any value within the given tolerance and therefore can make adversarial choices. We also aim to prove lower bounds against algorithms that use a potentially more benign, "unbiased" statistical oracle. The unbiased statistical oracle gives the algorithm the true value of a Boolean query function on a randomly chosen sample. This model is based on the Honest SQ model in learning by Yang [49] (which itself is based on an earlier model of Jackson [29]).

Definition 4 (1-STAT oracle). Let D be the input distribution over the domain X. The 1-STAT oracle is the oracle that given any function h : X → {0,1}, takes an independent random sample x from D and returns h(x).

Note that the 1-STAT oracle draws a fresh sample each time it is called. Without re-sampling each time, the answers of the 1-STAT oracle could be easily used to recover any sample bit-by-bit, making it equivalent to having access to random samples. The query complexity of an algorithm using 1-STAT is defined to be the number of calls it makes to the 1-STAT oracle. Note that the 1-STAT oracle can be used to simulate VSTAT (with high probability) by taking the average of O(t) replies of 1-STAT for the same function h.
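The following sketch (ours) models the two oracles just defined: VSTAT's answer is the true expectation perturbed anywhere inside its tolerance window (a stand-in for its adversarial freedom), while 1-STAT evaluates the query on one fresh sample; averaging O(t) replies of 1-STAT for the same function gives the simulation of VSTAT mentioned above.

```python
# Toy models of the oracles of Definitions 3 and 4; the uniform perturbation
# merely illustrates VSTAT's freedom to answer adversarially within tolerance.
import math, random
rng = random.Random(0)

def VSTAT(t, p):
    # p = E_{x~D}[h(x)] for the queried h; tolerance from Definition 3
    tau = max(1.0 / t, math.sqrt(p * (1.0 - p) / t))
    return min(1.0, max(0.0, p + rng.uniform(-tau, tau)))

def ONE_STAT(h, sample_D):
    return h(sample_D())  # h evaluated on a single fresh sample

def vstat_via_1stat(h, t, sample_D, c=3):
    # averaging O(t) honest replies simulates VSTAT(t) with high probability
    return sum(ONE_STAT(h, sample_D) for _ in range(c * t)) / (c * t)
```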
While it might seem that access to 1-STAT gives an algorithm more power than access to VSTAT, we will show that t samples from 1-STAT can be simulated using access to VSTAT(O(t)). This will allow us to translate our lower bounds on algorithms with access to VSTAT to query complexity lower bounds for unbiased statistical algorithms.

In the following discussion, we refer to algorithms using STAT or VSTAT oracles (instead of samples) as statistical algorithms. Algorithms using the 1-STAT oracle are henceforth called unbiased statistical algorithms.

2.3 Statistical Dimension

The main tool in our analysis is an information-theoretic bound on the complexity of statistical algorithms. Our definitions originate from the statistical query (SQ) dimension in learning theory [11], used to characterize SQ learning algorithms. Roughly speaking, the SQ dimension corresponds to the number of nearly uncorrelated labeling functions in a class (see the full version for the details of the definition and the relationship to our bounds [23]).

We introduce a natural generalization and strengthening of this approach to search problems over arbitrary sets of distributions and prove lower bounds on the complexity of statistical algorithms based on the generalized notion. Our definition departs from SQ dimension in three aspects. (1) Our notion applies to any set of distributions, whereas in the learning setting all known dimensions require fixing the distribution over the domain and only allow varying the labeling function. (2) Instead of relying on a bound on pairwise correlations, our dimension relies on a bound on average correlations in a large set of distributions. This weaker condition allows us to derive the tight bounds on the complexity of statistical algorithms for the planted k-biclique problem. (3) We show that our dimension also gives lower bounds for the stronger VSTAT oracle (without incurring a quadratic loss in the sample size parameter).

We now define our dimension formally. For two functions f, g : X → R and a distribution D with probability density function D(x), the inner product of f and g over D is defined as

    ⟨f, g⟩_D ≐ E_{x∼D}[f(x)g(x)].

The norm of f over D is ||f||_D = √(⟨f, f⟩_D). We remark that, by convention, the integral from the inner product is taken only over the support of D, i.e., for x ∈ X such that D(x) ≠ 0. Given a distribution D over X, let D(x) denote the probability density function of D relative to some fixed underlying measure over X (for example, the uniform distribution for discrete X or the Lebesgue measure over R^n). Our bound is based on the inner products between functions of the following form: (D′(x) − D(x))/D(x), where D′ and D are distributions over X. For this to be well-defined, we will only consider cases where D(x) = 0 implies D′(x) = 0, in which case D′(x)/D(x) is treated as 1. To see why such functions are relevant to our discussion, note that for every real-valued function f over X,

    E_{x∼D′}[f(x)] − E_{x∼D}[f(x)] = E_{x∼D}[D′(x)f(x)/D(x)] − E_{x∼D}[f(x)] = ⟨(D′ − D)/D, f⟩_D.

This means that the inner product of any function f with (D′ − D)/D is equal to the difference of expectations of f under the two distributions. We also remark that the quantity ⟨D′/D − 1, D′/D − 1⟩_D is known as the χ²(D′, D) distance and is widely used for hypothesis testing in statistics [41]. A key notion for our statistical dimension is the average correlation of a set of distributions D′ relative to a distribution D. We denote it by ρ(D′, D) and define it as follows:

    ρ(D′, D) ≐ (1/m²) Σ_{D_1, D_2 ∈ D′} ⟨D_1/D − 1, D_2/D − 1⟩_D,   where m = |D′|.

Known lower bounds for SQ learning are based on bounding the pairwise correlations between functions. Bounds on pairwise correlations, when strong enough, easily imply bounds on the average correlation. Bounding average correlation is more involved technically, but it appears to be necessary for the tight lower bounds that we give. In Section 3.3 we describe a pairwise-correlation version of our bounds. It is sufficient for some applications and generalizes the statistical query dimension from learning theory (see full version for the details).

We now define the concept of statistical dimension.

Definition 5. For γ̄ > 0, domain X and a search problem Z over a set of solutions F and a class of distributions D over X, let d be the largest integer such that there exists a reference distribution D over X and a finite set of distributions D_D ⊆ D with the following property: for any solution f ∈ F the set D_f = D_D \ Z^{−1}(f) is non-empty and for any subset D′ ⊆ D_f, where |D′| ≥ |D_f|/d, ρ(D′, D) < γ̄. We define the statistical dimension with average correlation γ̄ of Z to be d and denote it by SDA(Z, γ̄).

The statistical dimension with average correlation γ̄ of a search problem over distributions gives a lower bound on the complexity of any (deterministic) statistical algorithm for the problem that uses queries to VSTAT(1/(3γ̄)).

Theorem 1. Let X be a domain and Z be a search problem over a set of solutions F and a class of distributions D over X. For γ̄ > 0 let d = SDA(Z, γ̄). Any statistical algorithm requires at least d calls to the VSTAT(1/(3γ̄)) oracle to solve Z.
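For very small parameters these quantities can be computed exactly. The sketch below (ours) enumerates {0,1}^n, builds the planted biclique densities D_S of Problem 1, and evaluates the inner products and the average correlation ρ appearing in Definition 5.

```python
# Brute-force evaluation of <D_S/D - 1, D_T/D - 1>_D and of the average
# correlation rho over all k-subsets, for the planted biclique distributions
# (n and k are tiny so the 2^n enumeration is feasible).
import itertools

n, k = 6, 2
X = list(itertools.product([0, 1], repeat=n))
D = {x: 1 / 2 ** n for x in X}                       # reference: uniform

def density(S):                                       # D_S from Problem 1
    return {x: (1 - k / n) / 2 ** n
               + ((k / n) / 2 ** (n - k) if all(x[i] == 1 for i in S) else 0.0)
            for x in X}

def inner(dS, dT):                                    # <D_S/D - 1, D_T/D - 1>_D
    return sum(D[x] * (dS[x] / D[x] - 1) * (dT[x] / D[x] - 1) for x in X)

sets = [frozenset(c) for c in itertools.combinations(range(n), k)]
dens = {S: density(S) for S in sets}
rho = sum(inner(dens[S], dens[T]) for S in sets for T in sets) / len(sets) ** 2
print(rho)  # the average correlation of the whole family relative to D
```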
At a high level our proof works as follows. It is not hard to see that for any D, an algorithm that solves Z needs to "distinguish" all distributions in D_f from D for some solution f. Here by "distinguishing" we mean that the algorithm needs to ask a query g such that E_D[g] cannot be used as a response of VSTAT(1/(3γ̄)) for D′ ∈ D_f. In the key component of the proof we show that if a query function g to VSTAT(1/(3γ̄)) distinguishes between a distribution D and any distribution D′ ∈ D′, then D′ must have average correlation of at least γ̄ relative to D. The condition that for any |D′| ≥ |D_f|/d, ρ(D′, D) < γ̄ then immediately implies that at least d queries are required to distinguish any distribution in D_f from D.

In Section 3.1 we give a refinement of SDA which additionally requires that the set D_f is not too small (and not just non-empty). This refined notion allows us to extend the lower bound to randomized statistical algorithms and then to unbiased statistical algorithms.

2.4 Lower Bounds

We will prove lower bounds for the bipartite planted clique problem under both statistical oracles (VSTAT and 1-STAT).

Theorem 2. For any constant δ > 0, any k ≤ n^{1/2−δ} and r > 0, at least n^{Ω(log r)} queries to VSTAT(n²/(rk²)) are required to solve the distributional planted bipartite k-clique. No polynomial-time statistical algorithm can solve the problem using queries to VSTAT(o(n²/k²)) and any statistical algorithm will require n^{Ω(log n)} queries to VSTAT(n^{2−δ}/k²).

This bound also applies to any randomized algorithm with the probability of success being at least a (positive) constant.

Note that this bound is essentially tight. For every vertex in the clique, the probability that the corresponding bit of a randomly chosen point is set to 1 is 1/2 + k/(2n), whereas for every vertex not in the clique, this probability is 1/2. Therefore using n queries to VSTAT(16n²/k²) (i.e., of tolerance k/(4n)) it is easy to detect the planted bipartite clique. Indeed, this can be done by using the query functions h : {0,1}^n → {0,1}, h(x) = x_i, for each i ∈ [n]. So, the answers of the VSTAT oracle represent the expected value of the i-th bit over the sample (a sketch of this detection procedure appears at the end of this subsection).

There is also a statistical algorithm that uses n^{O(log n)} queries to VSTAT(25n/k) (significantly higher tolerance) to find the planted set for any k ≥ log n. In fact, the same algorithm can be used for the standard planted clique problem, where it achieves complexity n^{O(log n)}. We enumerate over subsets T ⊆ [n] of log n indices and query VSTAT(25n/k) with the function g_T : {0,1}^n → {0,1} defined to be 1 if and only if the point has ones in all coordinates in T. Therefore, if the set T is included in the true clique then

    E_D[g_T] = (k/n) · 1 + (1 − k/n) · 2^{−log n} ∈ [k/n, (k+1)/n].

With this expectation, VSTAT(25n/k) has tolerance at most √(k(k+1)/(25n²)) ≤ (k+1)/(5n) and will return at least k/n − (k+1)/(5n) > 3k/(4n). If, on the other hand, at least one element of T is not from the planted clique, then E_D[g_T] ≤ k/(2n) + 1/n and VSTAT(25n/k) will return at most (k+2)/(2n) + (k+2)/(5n) < 3k/(4n). Thus, we will know all log n-sized subsets of the planted clique and hence the entire clique.

Our bounds have direct implications for the average-case planted bipartite k-clique problem. An instance of this problem is a random n × n bipartite graph with a k × k bipartite clique planted randomly. In the full version of this paper, we show that the average-case planted k-biclique is equivalent to our distributional planted k-biclique with n samples. For a statistical algorithm, n samples directly correspond to having access to VSTAT(O(n)). Our bounds show that this problem can be solved in polynomial time when k = Ω(√n). At the same time, for k ≤ n^{1/2−δ}, any statistical algorithm will require n^{Ω(log n)} queries to VSTAT(n^{1+δ}).

We also give a bound for unbiased statistical algorithms.

Theorem 3. For any constant δ > 0 and any k ≤ n^{1/2−δ}, any unbiased statistical algorithm that succeeds with probability at least 2/3 requires Ω̃(n²/k²) queries to solve the distributional planted bipartite k-clique problem.

Each query of an unbiased statistical algorithm requires a new sample from D. Therefore this bound implies that any algorithm that does not reuse samples will require Ω̃(n²/k²) samples. To place this bound in context, we note that it is easy to detect whether a clique of size k has been planted using Õ(n²/k²) samples (as before, to detect if a coordinate i is in the clique we can compute the average of x_i on Õ(n²/k²) samples). Of course, finding all vertices in the clique would require reusing samples (which statistical algorithms cannot do). Note that n²/k² ≤ n if and only if k ≥ √n.
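The coordinate-averaging detector described above takes only a few lines. In this sketch (ours) the constants are inflated for a reliable toy run, and the regime k ≫ √n is chosen deliberately so that roughly n²/k² samples suffice.

```python
# Detecting the planted set by estimating E[x_i] from ~n^2/k^2 samples:
# coordinates in the clique have mean about 1/2 + k/(2n); the threshold
# 1/2 + k/(4n) separates them from the rest.
import random
rng = random.Random(0)

n, k = 100, 30
S = set(range(k))                                   # the unknown planted set

def sample(n, S):                                   # one draw from D_S
    x = [rng.randint(0, 1) for _ in range(n)]
    if rng.random() < len(S) / n:
        for i in S:
            x[i] = 1
    return x

t = 64 * n * n // (k * k)                           # O(n^2/k^2) samples
xs = [sample(n, S) for _ in range(t)]
means = [sum(x[i] for x in xs) / t for i in range(n)]
guess = {i for i in range(n) if means[i] > 0.5 + k / (4 * n)}
print(sorted(guess) == sorted(S))
```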
These bounds match the state of the art for the average-case planted bipartite k-clique and planted k-clique problems. One reason why we consider this natural is that some of the algorithms that solve the problem for k = Ω(√n) are obviously statistical. For example, the key procedure of the algorithm of Feige and Ron [21] removes a vertex that has the lowest degree (in the current graph) and repeats until the remaining graph is a clique. The degree is the number of ones in a column (or row) of the adjacency matrix and can be viewed as an estimate of the expectation of 1 appearing in the corresponding coordinate of a random sample. Even the more involved algorithms for the problem, like finding the eigenvector with the largest eigenvalue or solving an SDP, have statistical analogues.

A closely related problem is the planted densest subgraph problem, where edges in the planted subset appear with higher probability than in the remaining graph. This is a variant of the densest k-subgraph problem, which itself is a natural generalization of k-clique that asks to recover the densest k-vertex subgraph of a given n-vertex graph [18, 35, 7, 8]. The conjectured hardness of its average-case variant, the planted densest subgraph problem, has been used in public key encryption schemes [4] and in analyzing parameters specific to financial markets [5]. Our lower bounds extend in a straightforward manner to this problem.

Theorem 4. For any constant δ > 0, any k ≤ n^{1/2−δ}, ℓ ≤ k and density parameter p = 1/2 + α, at least n^{Ω(ℓ)} queries to VSTAT(n²/(ℓα²k²)) are required to solve the distributional planted bipartite densest k-subgraph with density p. For constants c, δ > 0, density 1/2 < p ≤ 1/2 + 1/n^c, and k ≤ n^{1/2−δ}, any unbiased statistical algorithm requires Ω̃(n^{2+2c}/k²) queries to find a planted densest subgraph of size k.

For example, taking k = n^{1/3}, ℓ = k and α = 1/n^c yields a lower bound of n^{Ω(n^{1/3})} for the VSTAT(n^{1+2c}) oracle.

3. LOWER BOUNDS FROM STATISTICAL DIMENSION

We refer the reader to the full version [23] for the technical details omitted here.

3.1 Lower Bounds for Statistical Algorithms

We now prove Theorem 1, which is the basis of all our lower bounds. We will prove a stronger version of this theorem which also applies to randomized algorithms. For this version we need an additional parameter in the definition of SDA.
Definition 6. For γ̄ > 0, η > 0, domain X and a search problem Z over a set of solutions F and a class of distributions D over X, let d be the largest integer such that there exists a reference distribution D over X and a finite set of distributions D_D ⊆ D with the following property: for any solution f ∈ F the set D_f = D_D \ Z^{−1}(f) has size at least (1 − η) · |D_D| and for any subset D′ ⊆ D_f, where |D′| ≥ |D_f|/d, ρ(D′, D) < γ̄. We define the statistical dimension with average correlation γ̄ and solution set bound η of Z to be d and denote it by SDA(Z, γ̄, η).

Note that for any η < 1, SDA(Z, γ̄) ≥ SDA(Z, γ̄, η), and for 1 > η ≥ 1 − 1/|D_D|, we get SDA(Z, γ̄) = SDA(Z, γ̄, η), where D_D is the set of distributions that maximizes SDA(Z, γ̄).

Theorem 5. Let X be a domain and Z be a search problem over a set of solutions F and a class of distributions D over X. For γ̄ > 0 and η ∈ (0, 1) let d = SDA(Z, γ̄, η). Any randomized statistical algorithm that solves Z with probability δ > η requires at least ((δ − η)/(1 − η)) · d calls to VSTAT(1/(3γ̄)).

Theorem 1 is obtained from Theorem 5 by setting δ = 1 and using any 1 − 1/|D_D| ≤ η < 1. Further, for any η < 1, SDA(Z, γ̄) ≥ SDA(Z, γ̄, η), and therefore for any η < 1, a bound on SDA(Z, γ̄, η) can be used in Theorem 1 in place of a bound on SDA(Z, γ̄).

We will need the following simple lemma that bounds the distance between any p ∈ [0, 1] and a value p′ returned by VSTAT(t) on a query with expectation p, in terms of p′.

Lemma 1. For an integer t and any p ∈ [0, 1], let p′ ∈ [0, 1] be such that |p′ − p| ≥ τ = max{1/t, √(p(1−p)/t)}. Then

    |p′ − p| ≥ √(min{p′, 1 − p′} / (3t)).

We are now ready to prove Theorem 5.
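Lemma 1 is elementary; here is a quick randomized sanity check of the stated inequality (ours, and it assumes the lemma as stated, so the assertion should never fire):

```python
# Randomized sanity check of Lemma 1: whenever |p' - p| exceeds the VSTAT(t)
# tolerance, it should also exceed sqrt(min{p', 1 - p'}/(3t)).
import math, random
rng = random.Random(0)

for _ in range(200000):
    t = rng.randint(1, 1000)
    p = rng.random()
    tau = max(1.0 / t, math.sqrt(p * (1.0 - p) / t))
    p2 = rng.random()
    if abs(p2 - p) >= tau:  # hypothesis of the lemma
        assert abs(p2 - p) >= math.sqrt(min(p2, 1.0 - p2) / (3.0 * t)) - 1e-12
print("no counterexample found")
```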

Proof of Theorem 5. We prove our lower bound by exhibiting a distribution over inputs (which are distributions over X) for which every deterministic statistical algorithm that solves Z with probability δ (over the choice of input) requires at least (δ − η) · d/(1 − η) calls to VSTAT(1/(3γ̄)). The claim of the theorem will then follow by Yao's minimax principle [51].

Let D be the reference distribution and D_D be a set of distributions for which the value d is achieved. Let A be a deterministic statistical algorithm that uses q queries to VSTAT(1/(3γ̄)) to solve Z with probability δ over a random choice of a distribution from D_D. Following an approach from [22], we simulate A by answering any query h : X → {0,1} of A with the value E_D[h(x)]. Let h_1, h_2, ..., h_q be the queries asked by A in this simulation and let f be the output of A.

By the definition of SDA, for D_f = D_D \ Z^{−1}(f) it holds that |D_f| ≥ (1 − η)|D_D|, and for every D′ ⊆ D_f, either ρ(D′, D) < γ̄ or |D′| ≤ |D_f|/d. Let the set D⁺ ⊆ D_D be the set of distributions on which A is successful. Let D_f⁺ = D_f ∩ D⁺ and denote these distributions by {D_1, D_2, ..., D_m}. We note that D_f⁺ = D⁺ \ (D_D \ D_f) and therefore

    m = |D_f⁺| ≥ |D⁺| − |D_D \ D_f| ≥ δ|D_D| − |D_D \ D_f|
      = [(δ|D_D| − |D_D \ D_f|) / (|D_D| − |D_D \ D_f|)] · |D_f| ≥ [(δ − η)/(1 − η)] · |D_f|.   (1)

To lower bound q, we use a generalization of an elegant argument of Szörényi [44]. For every k ≤ q, let A_k be the set of all distributions D_i such that

    |E_D[h_k(x)] − E_{D_i}[h_k(x)]| > τ_{i,k} ≐ max{1/t, √(p_{i,k}(1 − p_{i,k})/t)},

where we use t to denote 1/(3γ̄) and p_{i,k} to denote E_{D_i}[h_k(x)]. To prove the desired bound we first prove the following two claims:

1. Σ_{k≤q} |A_k| ≥ m;
2. for every k, |A_k| ≤ |D_f|/d.

Combining these two implies that q ≥ d · m/|D_f|. By inequality (1), q ≥ [(δ − η)/(1 − η)] · d, giving the desired lower bound. In the rest of the proof, for conciseness, we drop the subscript D for inner products and norms.

To prove the first claim we assume, for the sake of contradiction, that there exists D_i ∉ ∪_{k≤q} A_k. Then for every k ≤ q, |E_D[h_k(x)] − E_{D_i}[h_k(x)]| ≤ τ_{i,k}. This implies that the replies of our simulation E_D[h_k(x)] are within τ_{i,k} of E_{D_i}[h_k(x)]. By the definition of A and VSTAT(t), this implies that f is a valid solution for Z on D_i, contradicting the condition that D_i ∈ D_f⁺ ⊆ D_D \ Z^{−1}(f).

To prove the second claim, suppose that for some k ∈ [q], |A_k| > |D_f|/d. We denote p_k = E_D[h_k(x)] and assume that p_k ≤ 1/2 (when p_k > 1/2, we simply replace h_k by 1 − h_k). We note first that

    E_{D_i}[h_k(x)] − E_D[h_k(x)] = E_D[h_k(x) · D_i(x)/D(x)] − E_D[h_k(x)] = ⟨h_k, D_i/D − 1⟩ = p_{i,k} − p_k.

Let D̂_i(x) = D_i(x)/D(x) − 1 (where the convention is that D̂_i(x) = 0 if D(x) = 0). We will next show upper and lower bounds on the quantity

    Φ = ⟨h_k, Σ_{D_i∈A_k} D̂_i · sign⟨h_k, D̂_i⟩⟩.

By Cauchy-Schwartz we have that

    Φ² ≤ ||h_k||² · ||Σ_{D_i∈A_k} D̂_i · sign⟨h_k, D̂_i⟩||²,

which implies that

    Φ² ≤ ||h_k||² · (Σ_{D_i,D_j∈A_k} ⟨D̂_i, D̂_j⟩) ≤ ||h_k||² · ρ(A_k, D) · |A_k|².   (2)

As before, we also have that

    Φ² = (Σ_{D_i∈A_k} ⟨h_k, D̂_i⟩ · sign⟨h_k, D̂_i⟩)² = (Σ_{D_i∈A_k} |⟨h_k, D̂_i⟩|)² ≥ (Σ_{D_i∈A_k} |p_{i,k} − p_k|)².   (3)

To evaluate the last term of this inequality we use the fact that |p_{i,k} − p_k| ≥ τ_{i,k} = max{1/t, √(p_{i,k}(1 − p_{i,k})/t)} and Lemma 1 to obtain that for every D_i ∈ A_k,

    |p_k − p_{i,k}| ≥ √(min{p_k, 1 − p_k} / (3t)) = √(p_k/(3t)).   (4)

By substituting equation (4) into (3) we get that Φ² ≥ (p_k/(3t)) · |A_k|². We note that h_k is a {0,1}-valued function and therefore ||h_k||² = p_k. Substituting this into equation (2) we get that Φ² ≤ p_k · ρ(A_k, D) · |A_k|². By combining these two bounds on Φ we obtain that ρ(A_k, D) ≥ 1/(3t) = γ̄, which contradicts the definition of SDA. □

3.2 Lower Bounds for Unbiased Statistical Algorithms

Next we address lower bounds on algorithms that use the 1-STAT oracle. We recall that the 1-STAT oracle returns the value of a function on a single randomly chosen point. To estimate the expectation of a function, an algorithm can simply query this oracle multiple times with the same function and average the results. A lower bound for this oracle directly translates to a lower bound on the number of samples that any statistical algorithm must use.

We note that responses of 1-STAT do not have the room for the possibly adversarial deviation afforded by the tolerance of the STAT and VSTAT oracles. The ability to use these slight deviations in a coordinated way is used crucially in our lower bounds against VSTAT and in known lower bounds for SQ learning algorithms. This makes proving lower bounds against unbiased statistical algorithms harder, and indeed lower bounds for Honest SQ learning (which is a special case of unbiased statistical algorithms) required a substantially more involved argument than lower bounds for the regular SQ model [50].

Our lower bounds for unbiased statistical algorithms use a different approach. We show a direct simulation of the 1-STAT oracle using the VSTAT oracle.

Theorem 6. Let Z be a search problem and let A be a (possibly randomized) unbiased statistical algorithm that solves Z with probability at least δ using m samples from 1-STAT. For any δ′ ∈ (0, 1/2], there exists a statistical algorithm A′ that uses at most m queries to VSTAT(m/δ′²) and solves Z with probability at least δ − δ′.

Our proof relies on a simple simulation which ensures that the success probability of the simulated algorithm is not much worse than that of the unbiased statistical algorithm. Combined with Theorem 5, this yields the following lower bound.

Theorem 7. Let X be a domain and Z be a search problem over a set of solutions F and a class of distributions D over X. For γ̄ > 0 and η ∈ (0, 1), let d = SDA(Z, γ̄, η). Any (possibly randomized) unbiased statistical algorithm that solves Z with probability δ requires at least m calls to 1-STAT for

    m = min{ d(δ − η)/(2(1 − η)), (δ − η)²/(12γ̄) }.

In particular, if η ≤ 1/6 then any algorithm with success probability of at least 2/3 requires at least min{d/4, 1/(48γ̄)} samples from 1-STAT.

To conclude, we note that the reduction in the other direction is trivial, namely that the VSTAT(t) oracle can be simulated using the 1-STAT oracle. We provide details in the full version of this paper.

3.3 Statistical Dimension from Pairwise Correlations

In addition to SDA, which is based on average correlation, we introduce a simpler notion based on pairwise correlations.

Definition 7. For γ, β > 0, domain X and a search problem Z over a set of solutions F and a class of distributions D over X, let m be the maximum integer such that there exists a reference distribution D over X and a finite set of distributions D_D ⊆ D such that for any solution f ∈ F, there exists a set of m distributions D_f = {D_1, ..., D_m} ⊆ D_D \ Z^{−1}(f) with the following property:

    |⟨D_i/D − 1, D_j/D − 1⟩_D| ≤ β for i = j ∈ [m], and ≤ γ for i ≠ j ∈ [m].

We define the statistical dimension with pairwise correlations (γ, β) of Z to be m and denote it by SD(Z, γ, β).

Using this notion, we can prove the following lower bound (which is a corollary of Theorem 1):

Theorem 8. Let X be a domain and Z be a search problem over a set of solutions F and a class of distributions D over X. For γ, β > 0, let m = SD(Z, γ, β). For any τ > 0, any statistical algorithm requires at least m(τ² − γ)/(β − γ) calls to the STAT(τ) oracle to solve Z. In particular, if for m > 0, SD(Z, γ = m^{−2/3}, β = 1) ≥ m, then at least m^{1/3}/2 calls of tolerance m^{−1/3} to the STAT oracle are required to solve Z.
4. WARM-UP: MAX-XOR-SAT

In this section, we demonstrate our techniques on a warm-up problem, MAX-XOR-SAT. For this problem, it is sufficient to use pairwise correlations, rather than average correlations. The MAX-XOR-SAT problem over a distribution is defined as follows.

Problem 3. For ε ≥ 0, the ε-approximate MAX-XOR-SAT problem is defined as follows. Given samples from some unknown distribution D over XOR clauses on n variables, find an assignment x that maximizes, up to additive ε, the probability that a random clause drawn from D is satisfied.

In the worst case, it is known that MAX-XOR-SAT is NP-hard to approximate to within 1/2 − δ for any constant δ > 0 [26]. In practice, local search algorithms such as WalkSat [42] are commonly applied as heuristics for maximum satisfiability problems. We give strong evidence that the distributional version of MAX-XOR-SAT is hard for algorithms that locally seek to improve an assignment by flipping variables so as to satisfy more clauses, giving some theoretical justification for the observations of [42]. Moreover, our proof even applies to the case when there exists an assignment that satisfies all the clauses generated by the target distribution.

The bound we obtain can be viewed as a restatement of the known lower bound for learning parities using statistical query algorithms (indeed, the problem of learning parities is a special case of our distributional MAX-XOR-SAT).

Theorem 9. For any δ > 0, any statistical algorithm requires at least τ²(2^n − 1) queries to STAT(τ) to solve (1/2 − δ)-approximate MAX-XOR-SAT. In particular, at least 2^{n/3} queries of tolerance 2^{−n/3} are required.

For a vector c ∈ {0,1}^n we define the parity function χ_c(x) as usual: χ_c(x) ≐ −(−1)^{c·x}. Further, let D_c be the uniform distribution over the set S_c = {x | χ_c(x) = 1}.

Lemma 2. For c ∈ {0,1}^n, c ≠ 0 and the uniform distribution U over {−1,1}^n, the following hold:

1. E_{x∼D_c}[χ_{c′}(x)] = 1 if c = c′, and 0 otherwise.
2. E_{x∼U}[χ_c(x)χ_{c′}(x)] = 1 if c = c′, and 0 otherwise.

These two facts will imply that when D = U (the uniform distribution) and the D_i's consist of the D_c's, we can set γ = 0 and β = 1 in our simpler variant of statistical dimension, from which Theorem 9 follows (a numerical check of Lemma 2 appears below).

Theorem 10. For the MAX-XOR-SAT problem, let F = {χ_x}_{x∈{0,1}^n}, let D be the set of all distributions over clauses c ∈ {0,1}^n, and for any δ > 0, let Z be the (1/2 − δ)-approximate MAX-XOR-SAT problem defined over F and D. Then SD(Z, 0, 1) ≥ 2^n − 1.
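Both facts of Lemma 2 are easy to verify by enumeration for small n. The sketch below is ours; it works over {0,1}^n with the sign convention defined above.

```python
# Exhaustive check of Lemma 2 for small n: chi_c(x) = -(-1)^(c.x) and D_c is
# uniform over S_c = {x : chi_c(x) = 1}.
import itertools

n = 4
X = list(itertools.product([0, 1], repeat=n))
chi = lambda c, x: -(-1) ** sum(ci * xi for ci, xi in zip(c, x))
cs = [c for c in X if any(c)]                        # all c != 0

for c in cs:
    Sc = [x for x in X if chi(c, x) == 1]
    for c2 in cs:
        e_dc = sum(chi(c2, x) for x in Sc) / len(Sc)            # fact 1
        e_u = sum(chi(c, x) * chi(c2, x) for x in X) / len(X)   # fact 2
        want = 1 if c == c2 else 0
        assert e_dc == want and e_u == want
print("Lemma 2 verified for n =", n)
```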
5. PLANTED CLIQUE

We now prove the lower bound claimed in Theorem 2 on the problem of detecting a planted k-clique in the given distribution on vectors from {0,1}^n as defined above. For a subset S ⊆ [n], let D_S be the distribution over {0,1}^n with a planted clique on the subset S. Let {S_1, ..., S_m} be the set of all m = C(n, k) subsets of [n] of size k. For i ∈ [m] we use D_i to denote D_{S_i}. The reference distribution D in our lower bounds will be the uniform distribution over {0,1}^n, and we let D̂_S denote D_S/D − 1. In order to apply our lower bounds based on statistical dimension with average correlation, we now prove that for the planted clique problem average correlations of large sets must be small. We start with a lemma that bounds the correlation of two planted clique distributions relative to the reference distribution D as a function of the overlap between the cliques:

Lemma 3. For i, j ∈ [m],

    ⟨D̂_i, D̂_j⟩_D ≤ 2^λ · k²/n²,

where λ = |S_i ∩ S_j|.

Proof. For the distribution D_i, we consider the probability D_i(x) of generating the vector x. Then

    D_i(x) = ((n−k)/n) · 2^{−n} + (k/n) · 2^{−(n−k)} if x_s = 1 for all s ∈ S_i, and
    D_i(x) = ((n−k)/n) · 2^{−n} otherwise.

We can now compute D̂_i and the inner product. Since D(x) = 2^{−n}, we have D̂_i(x) = (k/n) · (2^k · 1{x_s = 1 for all s ∈ S_i} − 1), and therefore

    ⟨D̂_i, D̂_j⟩_D = (k²/n²) · E_{x∼D}[(2^k · 1{x_{S_i} = 1})(2^k · 1{x_{S_j} = 1}) − 2^k · 1{x_{S_i} = 1} − 2^k · 1{x_{S_j} = 1} + 1]
                = (k²/n²) · (2^{2k} · 2^{−(2k−λ)} − 2 + 1) = (k²/n²) · (2^λ − 1) ≤ 2^λ · k²/n². □

We now give a bound on the average correlation of any D̂_S with a large number of distinct clique distributions.

Lemma 4. For δ > 0 and k ≤ n^{1/2−δ}, let {S_1, ..., S_m} be the set of all subsets of [n] of size k and let {D_1, ..., D_m} be the corresponding distributions on {0,1}^n. Then for any integer ℓ ≤ k, any set S of size k and any subset A ⊆ {S_1, ..., S_m} where |A| ≥ 4(m−1)/n^{2ℓδ},

    (1/|A|) · Σ_{S_i∈A} ⟨D̂_S, D̂_i⟩ < 2^{ℓ+2} · k²/n².

Proof. In this proof we first show that if the total number of sets in A is large, then most of the sets in A have a small overlap with S. We then use the bound on the overlap of most sets to obtain a bound on the average correlation of D̂_S with the distributions for sets in A.

Formally, we let α = k²/n² and using Lemma 3 get the bound ⟨D̂_i, D̂_j⟩ ≤ 2^{|S_i∩S_j|} · α. Summing over S_i ∈ A,

    Σ_{S_i∈A} ⟨D̂_S, D̂_i⟩ ≤ Σ_{S_i∈A} 2^{|S∩S_i|} · α.

For any set A ⊆ {S_1, ..., S_m} of size t, this bound is maximized when the sets of A include S, then all sets that intersect S in k−1 indices, then all sets that intersect S in k−2 indices, and so on until the size bound t is exhausted. We can therefore assume without loss of generality that A is defined in precisely this way.

Let T_λ = {S_i : |S ∩ S_i| = λ} denote the subset of all k-subsets that intersect with S in exactly λ indices, and let λ_0 be the smallest λ for which A ∩ T_λ is non-empty. We first observe that for any 1 ≤ j ≤ k−1,

    |T_j| / |T_{j+1}| = [C(k, j) · C(n−k, k−j)] / [C(k, j+1) · C(n−k, k−j−1)]
                     = (j+1)(n−2k+j+1) / ((k−j−1)(k−j)) ≥ (j+1)(n−2k) / (k(k+1)) ≥ (j+1) · n^{2δ}/2.

By applying this equation inductively we obtain

    |T_j| ≤ 2^j · |T_0| / (j! · n^{2δj}) < 2^j · (m−1) / (j! · n^{2δj})

and

    Σ_{k≥λ≥j} |T_λ| < Σ_{k≥λ≥j} 2^λ · (m−1) / (λ! · n^{2δλ}) ≤ 4(m−1)/n^{2δj}.

By the definition of λ_0, |A| ≤ Σ_{j≥λ_0} |T_j| < 4(m−1)/n^{2δλ_0}. In particular, if |A| ≥ 4(m−1)/n^{2ℓδ} then n^{2δλ_0} < n^{2ℓδ}, that is, λ_0 < ℓ. Now we can conclude that

    Σ_{S_i∈A} ⟨D̂_S, D̂_i⟩ ≤ Σ_{j=λ_0}^{k} 2^j · |T_j ∩ A| · α
                         ≤ (2^{λ_0} · |T_{λ_0} ∩ A| + Σ_{j=λ_0+1}^{k} 2^j · |T_j|) · α
                         ≤ (2^{λ_0} · |T_{λ_0} ∩ A| + 2 · 2^{λ_0+1} · |T_{λ_0+1}|) · α
                         ≤ 2^{λ_0+2} · |A| · α < 2^{ℓ+2} · |A| · α.

To derive the last inequality we note that for every j ≥ 0, 2^j · |T_j| > 2(2^{j+1} · |T_{j+1}|), and we can therefore telescope the sum. □

Lemma 4 gives a simple way to bound the SDA.

Theorem 11. For δ > 0 and k ≤ n^{1/2−δ}, let Z be the planted bipartite k-clique problem. Then for any ℓ ≤ k,

    SDA(Z, 2^{ℓ+2} · k²/n², 1/C(n, k)) ≥ n^{2ℓδ}/4.

Combining Theorems 5 and 11 gives Theorem 2 by taking ℓ = log r. Theorems 7 and 11 imply the sample complexity lower bound stated in Theorem 3 by taking ℓ ≥ 4/δ.

Analogously, for the generalized planted densest subgraph case, we can give a bound on the statistical dimension SDA:

Theorem 12. Fix 0 < q ≤ p ≤ 1. For δ > 0 and k ≤ n^{1/2−δ}, let Z be the generalized planted bipartite densest subgraph problem. Then for any ℓ ≤ k,

    SDA(Z, (2k²/n²) · [(1 + (p−q)²/(q(1−q)))^{ℓ+1} − 1], 1/C(n, k)) ≥ n^{2ℓδ}/4,

provided n^{2δ} ≥ 1 + (p−q)²/(q(1−q)).

Acknowledgments

We thank Benny Applebaum, Avrim Blum, Uri Feige, Ravi Kannan, Michael Kearns, Robi Krauthgamer, Moni Naor, Jan Vondrak, and Avi Wigderson for insightful comments and helpful discussions.

6. REFERENCES

[1] N. Alon, A. Andoni, T. Kaufman, K. Matulef, R. Rubinfeld, and N. Xie. Testing k-wise and almost k-wise independence. In Proceedings of STOC, pages 496–505, 2007.
[2] N. Alon, M. Krivelevich, and B. Sudakov. Finding a large hidden clique in a random graph. In SODA, pages 594–598, 1998.
[3] B. P. W. Ames and S. A. Vavasis. Nuclear norm minimization for the planted clique and biclique problems. Math. Program., 129(1):69–89, 2011.
[4] B. Applebaum, B. Barak, and A. Wigderson. Public-key cryptography from different assumptions. In STOC, pages 171–180, 2010.
[5] S. Arora, B. Barak, M. Brunnermeier, and R. Ge. Computational complexity and information asymmetry in financial products (extended abstract). In ICS, pages 49–65, 2010.
[6] A. Belloni, R. M. Freund, and S. Vempala. An efficient rescaled perceptron algorithm for conic systems. Math. Oper. Res., 34(3):621–641, 2009.
[7] A. Bhaskara, M. Charikar, E. Chlamtac, U. Feige, and A. Vijayaraghavan. Detecting high log-densities: an O(n^{1/4}) approximation for densest k-subgraph. In STOC, pages 201–210, 2010.
[8] A. Bhaskara, M. Charikar, A. Vijayaraghavan, V. Guruswami, and Y. Zhou. Polynomial integrality gaps for strong SDP relaxations of densest k-subgraph. In SODA, pages 388–405, 2012.
[9] A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In Proceedings of PODS, pages 128–138, 2005.
[10] A. Blum, A. M. Frieze, R. Kannan, and S. Vempala. A polynomial-time algorithm for learning noisy linear threshold functions. Algorithmica, 22(1/2):35–52, 1998.
[11] A. Blum, M. L. Furst, J. C. Jackson, M. J. Kearns, Y. Mansour, and S. Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In STOC, pages 253–262, 1994.
[12] S. Brubaker and S. Vempala. Random tensors and planted cliques. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, volume 5687, pages 406–419, 2009.
[13] C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. In Proceedings of NIPS, pages 281–288, 2006.
[14] A. Coja-Oghlan. Graph partitioning via adaptive spectral techniques. Combinatorics, Probability & Computing, 19(2):227–284, 2010.
[15] Y. Dekel, O. Gurel-Gurevich, and Y. Peres. Finding hidden cliques in linear time with high probability. In Proceedings of ANALCO, pages 67–75, 2011.
[16] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39(1):1–38, 1977.
[17] J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program., 114(1):101–114, 2008.
[18] U. Feige. Relations between average case complexity and approximation complexity. In IEEE Conference on Computational Complexity, page 5, 2002.
[19] U. Feige and R. Krauthgamer. Finding and certifying a large hidden clique in a semirandom graph. Random Struct. Algorithms, 16(2):195–208, 2000.
[20] U. Feige and R. Krauthgamer. The probable value of the Lovász–Schrijver relaxations for maximum independent set. SICOMP, 32(2):345–370, 2003.
[21] U. Feige and D. Ron. Finding hidden cliques in linear time. In Proceedings of AofA, pages 189–204, 2010.
[22] V. Feldman. A complete characterization of statistical query learning with applications to evolvability. Journal of Computer and System Sciences, 78(5):1444–1459, 2012.
[23] V. Feldman, E. Grigorescu, L. Reyzin, S. Vempala, and Y. Xiao. Statistical algorithms and a lower bound for detecting planted cliques. CoRR, abs/1201.1214, 2012.
[24] A. M. Frieze and R. Kannan. A new approach to the planted clique problem. In FSTTCS, pages 187–198, 2008.
[25] A. E. Gelfand and A. F. M. Smith. Sampling based approaches to calculating marginal densities. Journal of the American Statistical Association, 85:398–409, 1990.
[26] J. Håstad. Some optimal inapproximability results. J. ACM, 48:798–859, July 2001.
[27] W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109, 1970.
[28] E. Hazan and R. Krauthgamer. How hard is it to approximate the best Nash equilibrium? SIAM J. Comput., 40(1):79–91, 2011.
[29] J. Jackson. On the efficiency of noise-tolerant PAC algorithms derived from statistical queries. Annals of Mathematics and Artificial Intelligence, 39(3):291–313, Nov. 2003.
[30] M. Jerrum. Large cliques elude the Metropolis process. Random Struct. Algorithms, 3(4):347–360, 1992.
[31] A. Juels and M. Peinado. Hiding cliques for cryptographic security. Des. Codes Cryptography, 20(3):269–280, 2000.
[32] R. Kannan. Personal communication.
[33] R. Karp. Probabilistic analysis of graph-theoretic algorithms. In Proceedings of Computer Science and Statistics 12th Annual Symposium on the Interface, page 173, 1979.
[34] M. Kearns. Efficient noise-tolerant learning from statistical queries. Journal of the ACM, 45(6):983–1006, 1998.
[35] S. Khot. Ruling out PTAS for graph min-bisection, densest subgraph and bipartite clique. In FOCS, pages 136–145, 2004.
[36] S. Kirkpatrick, C. D. Gelatt Jr., and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[37] L. Kučera. Expected complexity of graph partitioning problems. Discrete Applied Mathematics, 57(2-3):193–212, 1995.
[38] F. McSherry. Spectral partitioning of random graphs. In FOCS, pages 529–537, 2001.
[39] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equations of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087–1092, 1953.
[40] L. Minder and D. Vilenchik. Small clique detection and approximate Nash equilibria. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, volume 5687, pages 673–685, 2009.
[41] K. Pearson. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philosophical Magazine, Series 5, 50(302):157–175, 1900.
[42] B. Selman, H. Kautz, and B. Cohen. Local search strategies for satisfiability testing. In DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 521–532, 1995.
[43] R. Servedio. Computational sample complexity and attribute-efficient learning. Journal of Computer and System Sciences, 60(1):161–178, 2000.
[44] B. Szörényi. Characterizing statistical query learning: Simplified notions and proofs. In ALT, pages 186–200, 2009.
[45] M. Tanner and W. Wong. The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528–550, 1987.
[46] L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.
[47] V. Vapnik and A. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probab. and its Applications, 16(2):264–280, 1971.
[48] V. Černý. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm. Journal of Optimization Theory and Applications, 45(1):41–51, Jan. 1985.
[49] K. Yang. On learning correlated boolean functions using statistical queries. In Proceedings of ALT, pages 59–76, 2001.
[50] K. Yang. New lower bounds for statistical query learning. J. Comput. Syst. Sci., 70(4):485–509, 2005.
[51] A. Yao. Probabilistic computations: Toward a unified measure of complexity. In FOCS, pages 222–227, 1977.