Faster Sublinear Approximation of the Number of K-Cliques in Low-Arboricity Graphs
Total Page:16
File Type:pdf, Size:1020Kb
Faster sublinear approximation of the number of k-cliques in low-arboricity graphs Talya Eden ✯ Dana Ron ❸ C. Seshadhri ❹ Abstract science [10, 49, 40, 58, 7], with a wide variety of applica- Given query access to an undirected graph G, we consider tions [37, 13, 53, 18, 44, 8, 6, 31, 54, 38, 27, 57, 30, 39]. the problem of computing a (1 ε)-approximation of the This problem has seen a resurgence of interest because number of k-cliques in G. The± standard query model for general graphs allows for degree queries, neighbor queries, of its importance in analyzing massive real-world graphs and pair queries. Let n be the number of vertices, m be (like social networks and biological networks). There the number of edges, and nk be the number of k-cliques. are a number of clever algorithms for exactly counting Previous work by Eden, Ron and Seshadhri (STOC 2018) ∗ n mk/2 k-cliques using matrix multiplications [49, 26] or combi- gives an O ( 1 + )-time algorithm for this problem n /k nk k natorial methods [58]. However, the complexity of these (we use O∗( ) to suppress poly(log n, 1/ε,kk) dependencies). algorithms grows with mΘ(k), where m is the number of · Moreover, this bound is nearly optimal when the expression edges in the graph. is sublinear in the size of the graph. Our motivation is to circumvent this lower bound, by A line of recent work has considered this question parameterizing the complexity in terms of graph arboricity. from a sublinear approximation perspective [20, 24]. The arboricity of G is a measure for the graph density Letting n denote the number of vertices, m the number “everywhere”. There is a very rich family of graphs with bounded arboricity, including all minor-closed graph classes of edges, and nk the number of k-cliques, the complexity (such as planar graphs and graphs with bounded treewidth), of approximating the number of k-cliques up to a (1 ε)- bounded degree graphs, preferential attachment graphs and ± n mk/2 more. multiplicative factor is O∗ 1/k + with a nearly n nk We design an algorithm for the class of graphs k with arboricity at most α, whose running time is matching lower bound [24].1 − − ∗ nαk 1 n mαk 2 O (min , 1 + ). We also prove a nearly We study the problem of approximating the number nk n /k nk { k } matching lower bound. For all graphs, the arboricity is of k-cliques in bounded arboricity graphs, with the hope O(√m), so this bound subsumes all previous results on sub- of circumventing the above lower bound.2 A graph of linear clique approximation. arboricity at most α has the property that the average As a special case of interest, consider minor-closed families of graphs, which have constant arboricity. Our degree in any subgraph is at most 2α [46, 47]. One of our result implies that for any minor-closed family of graphs, motivations is to understand when it is possible to get there is a (1 ε)-approximation algorithm for nk that has a running time of O (n/n ). This is an obvious lower running time±O∗( n ). Such a bound was not known even ∗ k nk bound, since a graph can simply contain n disjoint k- for the special (classic) case of triangle counting in planar k graphs. cliques, and, e.g., a cycle on the remaining vertices. It requires Ω(n/nk) uniform vertex samples just to land 1 Introduction in a k-clique. Are there classes of graphs for which one can accurately estimate the number of k-cliques in this The problem of counting the number of k-cliques in a time? graph is a fundamental problem in theoretical computer A consequence of our main theorem is an affirma- tive answer to this question, for the class of constant- ✯CSAIL at MIT, [email protected]. The majority of this arboricity graphs. The class of graphs with constant ar- work was done while the author was affiliated with Tel Aviv boricity is an immensely rich class, containing, among University. This research was partially supported by a grant from others, all minor-closed graph families. The concept the Blavatnik fund, and Schmidt and Rothschild Fellowships. The of constant arboricity plays a significant role in the the- author is grateful to the Azrieli Foundation for the award of an Azrieli Fellowship. ory of bounded expansion graphs, which has applications ❸Tel Aviv University, [email protected]. This research was partially supported by the Israel Science Foundation grants No. Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php 671/13 and 1146/18. 1As stated in the abstract, we use the O∗(·) notation to ❹ University of California, Santa Cruz, [email protected]. This suppress poly(log n, 1/ε,kk) dependencies. research was funded by NSF CCF-1740850, NSF CCF-1813165, 2The arboricity of a graph is the minimal number of forests and ARO Award W911NF191029. required to cover the edges of the graph. Copyright © 2020 by SIAM 1467 Unauthorized reproduction of this article is prohibited in logic, descriptive complexity, and fixed parameter Recall that α is always upper bounded by √m, so tractability [48]. In the context of real-world graphs, the that the bound in Theorem 1.1 subsumes the result for classic Barab´asi-Albert preferential attachment graphs for approximating the number of k-cliques in general as well as additional models generate constant arboric- graphs [24]. As we discuss in more detail in Section 2, ity graphs [3, 5, 4]. In most real-world graphs, the ar- our algorithm starts similarly to the algorithm of [24] boricity is at most an order of magnitude larger than but departs quickly since it relies on a different, itera- the average degree, while the maximum degree is three tive, approach so as to achieve the dependence on α. − to four orders of magnitude larger [32, 39, 55]. In prac- n mαk 2 Comparing our bound of O∗ 1/k + for tical applications, low arboricity is often exploited for n nk k faster algorithms for clique and dense subgraph count- approximate counting with the Chiba and Nishizeki k 2 ing [28, 30, 45, 39, 16]. bound of O(n + mα − ) for exact counting, we get log n kk A classic result of Chiba and Nishizeki gives an that when nk poly · , our bound is smaller, k 2 ≫ ε O(n + mα − ) algorithm for exact counting of k-cliques and as nk increases the gap becomes more significant. in graphs of arboricity at most α [10]. Our primary Note that Chiba and Nishizeki read the entire graph, motivation is to get a sublinear-time algorithm for so that they have full knowledge of the graph, and approximating the number of k-cliques on such graphs. their challenge is to count the number of k-cliques (by We assume the standard query model for general graphs enumerating them), as efficiently (in terms of running (refer to Chapter 10 of Goldreich’s book [33]), so that time) as possible. On the other hand, our algorithm the algorithm can perform degree, neighbor and pair may obtain only a partial view of the graph. Hence, our queries. Let us exactly specify each query. (1) Degree challenge is to compute an estimate of the number of k- queries: given v V , get the degree d(v). (2) Neighbor ∈ th cliques based on such partial knowledge, by devising a queries: given v V and i d(v) get the i neighbor careful sampling procedure (that in particular, exploits of v. (3) Pair queries:∈ given≤ vertices u,v, determine if 3 the bounded arboricity). (u,v) is an edge. An application of Theorem 1.1 for the family of minor-closed graphs4 gives the following corollary.G We 1.1 Results. Our main result is an algorithm for ap- note that even for the special case of triangle counting in proximating the number of k-cliques, whose complexity planar graphs, such a result was not previously known. depends on the arboricity. The algorithm is sublinear k 2 for nk = ω(α − ) (and we subsequently show that for Corollary 1.2. Let be a minor-closed family of G smaller nk, sublinear complexity cannot be obtained). graphs. There is an algorithm that, given n,k,ε, and query access to G , outputs a (1 ε)-approximation Theorem 1.1. There exists an algorithm that, given ∈G ± of nk with high constant probability. The expected n, k, an approximation parameter 0 <ε< 1, query running time of the algorithm is access to a graph G, and an upper bound α on the k arboricity of G, outputs an estimate nk, such that with (n/nk) poly(log n, 1/ε, k ). high constant probability (over the randomness of the · algorithm), b In general, we prove that the bound of Theorem 1.1 is nearly optimal. (1 ε) nk nk (1 + ε) nk. − · ≤ ≤ · Theorem 1.3. Consider the set of graphs of ar- G The expected running timeb of the algorithm is boricity at most α. Any multiplicative approximation algorithm that succeeds with constant probability on all nαk 1 n mαk 2 graphs in must make min − , + − poly(log n, 1/ε, kk), G 1/k k−1 k−2 nk nk · nα n m(α/k) ( nk ) Ω min k , 1/k + min , m k nk k n nk · · k and the expected query complexity is the minimum queries in expectation.