Faster sublinear approximation of the number of k-cliques in low-arboricity graphs

Talya Eden ✯ Dana Ron ❸ C. Seshadhri ❹

Abstract science [10, 49, 40, 58, 7], with a wide variety of applica- Given query access to an undirected graph G, we consider tions [37, 13, 53, 18, 44, 8, 6, 31, 54, 38, 27, 57, 30, 39]. the problem of computing a (1 ε)-approximation of the This problem has seen a resurgence of interest because number of k-cliques in G. The± standard query model for general graphs allows for queries, neighbor queries, of its importance in analyzing massive real-world graphs and pair queries. Let n be the number of vertices, m be (like social networks and biological networks). There the number of edges, and nk be the number of k-cliques. are a number of clever algorithms for exactly counting Previous work by Eden, Ron and Seshadhri (STOC 2018) ∗ n mk/2 k-cliques using matrix multiplications [49, 26] or combi- gives an O ( 1 + )-time algorithm for this problem n /k nk k natorial methods [58]. However, the complexity of these (we use O∗( ) to suppress poly(log n, 1/ε,kk) dependencies). algorithms grows with mΘ(k), where m is the number of · Moreover, this bound is nearly optimal when the expression edges in the graph. is sublinear in the size of the graph. Our motivation is to circumvent this lower bound, by A line of recent work has considered this question parameterizing the complexity in terms of graph arboricity. from a sublinear approximation perspective [20, 24]. The arboricity of G is a measure for the graph density Letting n denote the number of vertices, m the number “everywhere”. There is a very rich family of graphs with bounded arboricity, including all minor-closed graph classes of edges, and nk the number of k-cliques, the complexity (such as planar graphs and graphs with bounded treewidth), of approximating the number of k-cliques up to a (1 ε)- bounded degree graphs, preferential attachment graphs and ± n mk/2 more. multiplicative factor is O∗ 1/k + with a nearly n nk We design an algorithm for the class of graphs  k  with arboricity at most α, whose running time is matching lower bound [24].1 − − ∗ nαk 1 n mαk 2 O (min , 1 + ). We also prove a nearly We study the problem of approximating the number nk n /k nk { k } matching lower bound. For all graphs, the arboricity is of k-cliques in bounded arboricity graphs, with the hope O(√m), so this bound subsumes all previous results on sub- of circumventing the above lower bound.2 A graph of linear clique approximation. arboricity at most α has the property that the average As a special case of interest, consider minor-closed families of graphs, which have constant arboricity. Our degree in any subgraph is at most 2α [46, 47]. One of our result implies that for any minor-closed family of graphs, motivations is to understand when it is possible to get there is a (1 ε)-approximation algorithm for nk that has a running time of O (n/n ). This is an obvious lower running time±O∗( n ). Such a bound was not known even ∗ k nk bound, since a graph can simply contain n disjoint k- for the special (classic) case of triangle counting in planar k graphs. cliques, and, e.g., a cycle on the remaining vertices. It requires Ω(n/nk) uniform vertex samples just to land 1 Introduction in a k-clique. Are there classes of graphs for which one can accurately estimate the number of k-cliques in this The problem of counting the number of k-cliques in a time? graph is a fundamental problem in theoretical computer A consequence of our main theorem is an affirma- tive answer to this question, for the class of constant- ✯CSAIL at MIT, [email protected]. The majority of this arboricity graphs. The class of graphs with constant ar- work was done while the author was affiliated with Tel Aviv boricity is an immensely rich class, containing, among University. This research was partially supported by a grant from others, all minor-closed graph families. The concept the Blavatnik fund, and Schmidt and Rothschild Fellowships. The of constant arboricity plays a significant role in the the- author is grateful to the Azrieli Foundation for the award of an Azrieli Fellowship. ory of bounded expansion graphs, which has applications ❸Tel Aviv University, [email protected]. This research was partially supported by the Israel Science Foundation grants No. Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php 671/13 and 1146/18. 1As stated in the abstract, we use the O∗(·) notation to ❹ University of California, Santa Cruz, [email protected]. This suppress poly(log n, 1/ε,kk) dependencies. research was funded by NSF CCF-1740850, NSF CCF-1813165, 2The arboricity of a graph is the minimal number of forests and ARO Award W911NF191029. required to cover the edges of the graph.

Copyright © 2020 by SIAM 1467 Unauthorized reproduction of this article is prohibited in logic, descriptive complexity, and fixed parameter Recall that α is always upper bounded by √m, so tractability [48]. In the context of real-world graphs, the that the bound in Theorem 1.1 subsumes the result for classic Barab´asi-Albert preferential attachment graphs for approximating the number of k-cliques in general as well as additional models generate constant arboric- graphs [24]. As we discuss in more detail in Section 2, ity graphs [3, 5, 4]. In most real-world graphs, the ar- our algorithm starts similarly to the algorithm of [24] boricity is at most an order of magnitude larger than but departs quickly since it relies on a different, itera- the average degree, while the maximum degree is three tive, approach so as to achieve the dependence on α. − to four orders of magnitude larger [32, 39, 55]. In prac- n mαk 2 Comparing our bound of O∗ 1/k + for tical applications, low arboricity is often exploited for n nk  k  faster algorithms for clique and dense subgraph count- approximate counting with the Chiba and Nishizeki k 2 ing [28, 30, 45, 39, 16]. bound of O(n + mα − ) for exact counting, we get log n kk A classic result of Chiba and Nishizeki gives an that when nk poly · , our bound is smaller, k 2 ≫ ε O(n + mα − ) algorithm for exact counting of k-cliques and as nk increases the gap becomes more significant. in graphs of arboricity at most α [10]. Our primary Note that Chiba and Nishizeki read the entire graph, motivation is to get a sublinear-time algorithm for so that they have full knowledge of the graph, and approximating the number of k-cliques on such graphs. their challenge is to count the number of k-cliques (by We assume the standard query model for general graphs enumerating them), as efficiently (in terms of running (refer to Chapter 10 of Goldreich’s book [33]), so that time) as possible. On the other hand, our algorithm the algorithm can perform degree, neighbor and pair may obtain only a partial view of the graph. Hence, our queries. Let us exactly specify each query. (1) Degree challenge is to compute an estimate of the number of k- queries: given v V , get the degree d(v). (2) Neighbor ∈ th cliques based on such partial knowledge, by devising a queries: given v V and i d(v) get the i neighbor careful sampling procedure (that in particular, exploits of v. (3) Pair queries:∈ given≤ vertices u,v, determine if 3 the bounded arboricity). (u,v) is an edge. An application of Theorem 1.1 for the family of minor-closed graphs4 gives the following corollary.G We 1.1 Results. Our main result is an algorithm for ap- note that even for the special case of triangle counting in proximating the number of k-cliques, whose complexity planar graphs, such a result was not previously known. depends on the arboricity. The algorithm is sublinear k 2 for nk = ω(α − ) (and we subsequently show that for Corollary 1.2. Let be a minor-closed family of G smaller nk, sublinear complexity cannot be obtained). graphs. There is an algorithm that, given n,k,ε, and query access to G , outputs a (1 ε)-approximation Theorem 1.1. There exists an algorithm that, given ∈G ± of nk with high constant probability. The expected n, k, an approximation parameter 0 <ε< 1, query running time of the algorithm is access to a graph G, and an upper bound α on the k arboricity of G, outputs an estimate nk, such that with (n/nk) poly(log n, 1/ε, k ). high constant probability (over the randomness of the · algorithm), b In general, we prove that the bound of Theorem 1.1 is nearly optimal. (1 ε) nk nk (1 + ε) nk. − · ≤ ≤ · Theorem 1.3. Consider the set of graphs of ar- G The expected running timeb of the algorithm is boricity at most α. Any multiplicative approximation algorithm that succeeds with constant probability on all nαk 1 n mαk 2 graphs in must make min − , + − poly(log n, 1/ε, kk), G 1/k k−1 k−2 nk nk · nα n m(α/k) ( nk ) Ω min k , 1/k + min , m k nk k n nk   · · k   and the expected query complexity is the minimum queries in expectation. n o between the expected running time and O(m + n). 1.2 Related Work. Clique counting, and the spe- cial case of triangle counting, have received significant 3 Gonen et al. [35] proved that any algorithm for approximating attention in a variety of models. We refer the inter- Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php the number of triangles when given access only to degree and ested reader to related work sections of [20] and [24] for neighbor queries, requires Ω(n) queries when m = Θ(n). We note that their lower bound is based on constructing two families of graphs, where both families have constant arboricity. Therefore, 4A family of graphs is said to be minor-closed if it is closed their lower bound holds for bounded arboricity graphs. under vertex removals, edge removals and edge contractions.

Copyright © 2020 by SIAM 1468 Unauthorized reproduction of this article is prohibited general references. We will focus on algorithms for low estimating other graph parameters such as the minimum arboricity graphs. spanning , matchings, and vertex covers [9, 15, 14, The starting point for such algorithms is the seminal 50, 61, 52, 50, 42, 61, 36, 51]. work of Chiba and Nishizeki, who give an O(n + k 2 mα − ) algorithm for enumerating k-cliques in a graph 1.3 Organization of the paper. Our algorithm of arboricity at most α [10]. The usual approach to and its analysis are quite involved. In Section 2 we exploit the arboricity is to use degree or degeneracy give a fairly elaborate (but informal) overview of our orientations, and this method has appeared in a number algorithm and the ideas behind it. After introducing of theoretical and practical results on triangle and some preliminaries and defining some central notions clique counting [12, 56, 7, 30, 39, 16]. Recent work (in Sections 3 and 4), we provide our algorithm and the by Kopelowitz et al. shows that improving the O(mα) main procedures it uses (in Sections 5 and 6). The full bound for triangle counting is 3-SUM hard [41]. details of its analysis, as well as the proofs of the lower Our work follows a line of work on estimating sub- bound are given in the full version of this paper [23], graph counts using sublinear algorithms. The first re- which from here on we refer to as the full version. sults were average degree estimation results of Feige [29] and Goldreich and Ron [34]. These ideas were extended 2 Overview of the algorithm and lower bound by Gonen et al. to estimate counts [35]. This We start with describing the main ideas behind the was the first paper that looked at the problem of es- algorithm. As we explain below, our starting point timating triangles, albeit from a lower bound perspec- is similar to the one applied in [24] for approximately tive. Eden et al. gave the first sublinear algorithm for counting the number of k-cliques in general graphs (and approximating the number of triangles. Their result that of [20], for k = 3). However, in order to exploit was generalized by the authors for k-clique counting (as the fact that the graphs we consider have bounded mentioned earlier) [24]. Recently, Assadi et al. [2] gave arboricity, we depart quite early from the [24] algorithm, an algorithm for approximately counting the number and introduce a variety of new ideas. For the sake of of occurrences of any arbitrary graph H in an input 1/k ρ(H) simplicity of the presentation, assume that α

Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php 5 The fractional edge cover of a graph H = (VH ,EH ) is a Rm assume that E is close to its expected value n . mapping ψ : EH → [0, 1] such that for each vertex a ∈ VH , | R| · |R| P ψ(e) ≥ 1. The fractional edge-cover number ρ(H) of e∈EH ,a∈e H is the minimum value of P ψ(e) among all fractional edge 6The algorithm may actually obtain a multiset, but in this e∈EH covers ψ. exposition, we abuse terminology and call it a ‘set’.

Copyright © 2020 by SIAM 1469 Unauthorized reproduction of this article is prohibited In ERS, w( ) is approximated by sampling uniform 2.2 An iterative sampling process. The ERS al- edges in E Rand extending them to k-cliques. Consider gorithm can be viewed as a three-step process. It first first the (easy)R case where all the vertices have degree samples vertices, then samples edges (incident to the O(√m). In this case it is possible to extend an edge sampled vertices), and then (in one step) samples k- (u,v) for u to a (potential) k-clique by sampling cliques that are extensions of these edges. To get a k 2 neighbors∈ R of u, each with probability roughly complexity depending on the arboricity, we devise an 1/−√m (and checking whether we obtained a clique). iterative clique sampling process. In iteration t, we ob- The probability that this process yields a k-clique is tain a sample of t-cliques, based on the sample of (t 1)- w( ) nk − roughly R k−2 k/2 . By repeating the above cliques from the previous iteration. ER √m m | |· ≈ It is crucial in our analysis to distinguish ordered process O(mk/2/n ) times,7 it is possible to can get an k cliques from unordered cliques. An unordered t-clique estimate of w( ) and thus of n (assuming an efficient k T is a set of t vertices T = v ,...,v (such that verification procedureR for the assignment rule). For 1 t every two vertices are connected),{ while an} ordered t- the case when degrees are much larger than √m, ERS ~ gives a more complex procedure that extends edges to clique is a tuple of t vertices T =(v1,...,vt) such that v ,...,v is a clique. We say that T~ = (v ,...,v ) (potential) k-cliques. In the end, each k-clique is still { 1 t} 1 t k/2 participates in a clique C, if v ,...,v C. We sampled with probability roughly nk/m . { 1 t} ⊆ In our setting (where the arboricity is at most α) also extend the (yet undefined) assignment rule to allow assigning k-cliques to ordered t-cliques for any t k the simple scenario discussed above of vertex degrees ≤ bounded by O(√m) corresponds to the case that all (and not just to vertices, which is the special case of vertex degrees are O(α). In this special case we can t = 1). For an ordered t-clique T~, let w(T~) be the extend an edge to a (potential) k-clique in the same number of k-cliques that are assigned to T~, and for a ~ manner as ERS, and get that the success probability set of ordered t-cliques , let w( ) = T~ w(T ). nk R R ∈R of sampling a k-clique is Ω −2 . Unfortunately, it Our goal is to estimate w(V ) nk. We defer the mαk ≈ P is not clear how to adapt the ERS approach for the discussion of the assignment rule and for now focus on  unbounded-degrees case and obtain a dependence on α the algorithm. instead of √m. The algorithm starts by sampling a set of s1 ordered To give a better sense of the challenge, we focus 1-cliques (vertices), denoted 1. Assume that w( 1) nk R R ≈ on approximating the number of triangles (i.e., n ) and s1. The algorithm next samples a set of s2 ordered 3 n · consider a graph G with m = Θ(n) and n3 = Θ(√n). 2-cliques (ordered edges), denoted 2, incident to the th R The upper bound of ERS allows for a “budget” of vertices of 1. For t > 2, the t iteration extends t R R n m3/2 to t+1, as described next. O 1 3 + = O (n) queries. That is, since ∗ / n3 ∗ R n3 For an ordered t-clique T~, let d(T~) be the degree of   m = Θ(n), they can essentially “afford” to read the the minimum-degree vertex in T~, and for a set of ordered entire graph. Now assume that we are also given ~ cliques , let d( )= T~ d(T ). The sampling of the R R ∈R that α = O(1). Our upper bound only allows for set t+1 is done by repeating the following st+1 times: 2 R P nα n mα sample a clique T~ in t with probability proportional O∗ min , 1 3 + = O∗(√n) queries, that n3 / n3 R n3 to d(T~)/d( ) and then select a uniform neighbor of    Rt is, a strictly sublinear number of queries. Recall that the least degree vertex in T~. Hence, each (t + 1)-tuple bounded arboricity does not imply bounded vertex that is an extension of an ordered t-clique in t is degrees. Our main challenge is to exploit the bounded d(T~) 1 1 R sampled with probability d( ) ~ = d( ) . For each arboricity so as to deal with vertices with high degrees Rt · d(T ) Rt and with high variance in the number of triangles that sampled (t + 1)-tuple, the algorithm checks whether it reside on different vertices (and edges). is a (t + 1)-clique, and if so, adds it to t+1. Suppose R Therefore, at this point, we depart from the ap- that the weight function (defined by the assignment proach of ERS. rule) has the following property. The weight w( t) is the sum of the weights taken over all ordered (tR+ 1)- cliques that are extensions of the ordered t-cliques in . Rt 7The observant reader may be worried that this requires know- We can conclude that the expected value of w( t+1) is w( ) R ing m and n , where the former is not provided to the algorithm t k d(R ) st+1.

Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php t and the latter is just what we want to estimate. However, con- R · We need to get good upper bounds for st+1, while stant factor estimates of both suffice for our purposes. For m this ensuring that w( t+1) is concentrated around its mean. can be obtained using [22], and for nk this assumption can be R removed by performing a geometric search. For details see the Note that the probability of getting a (t + 1)-clique is full version. inversely proportional to d( ). Thus, we need good Rt

Copyright © 2020 by SIAM 1470 Unauthorized reproduction of this article is prohibited upper bounds on this quantity, to upper bound st+1. and the fact that the graph has arboricity at most α. This is where the arboricity enters the picture. Let We note that ERS also defined the notion of sociable denote the set of t-cliques in the graph. We give vertices (as vertices that participate in too many k- Ct a simple argument proving that d( t)= ~ d(T~)= cliques). However, their argument for bounding the C T t O(mαt 1). (Note that the case t = 2 is precisely∈C the number of unassigned k-cliques was simpler, as they did − P Chiba and Nishizeki bound min d(u),d(v) = not define and account for sociable cliques for t> 1. (u,v) E { } O(mα)[10].) We then show that d∈( ) is bounded as a The aforementioned assignment rule addresses P Rt function of d( t). Properties 1 to 3. We are left with Property 4 (and C how it fits in the big picture). 2.3 Desired properties of the assignment rule. Recall that we need to ensure that, with high probabil- 2.5 Verifying an assignment and costly cliques. ity, w( 1) is close to its expected value, which should Recall that in the last iteration of the algorithm, it has R n be close to k s , and that for every t 1, w( ) is a set k of ordered k-cliques, and it needs to compute n · 1 ≥ Rt+1 R w( t) w( k) (which can be translated to an estimate of nk). close to d(R ) st+1. In addition, we need to efficiently R Rt · Namely, for each ordered k-clique C~ = (v ,...,v ) verify the assignment rule. We achieve this by defining 1 k in , the algorithm needs to verify whether the an assignment rule that has the following properties. k correspondingR k-clique C = v ,...,v is assigned to { 1 k} 1. w(V ) nk. This ensures that the expected value C~ . This requires to verify whether C~ , and each of ≈ nk of w( 1) is approximately s1. its prefixes, is non-sociable. Furthermore, it requires R n · 2. For every t, the sum of the weights taken over all verifying that C~ is the first such ordered k-clique (in ordered (t+1)-cliques that are extensions of the ordered (C)). O t-cliques in t equals w( t). This ensures that for every ~ For an ordered t-clique T , consider the subgraph GT~ R w( t) R t, Ex[w( t+1)] = d(R ) st+1. induced by the set of vertices that neighbor every vertex R Rt · ~ ~ 3. For every ordered t-clique T~, w(T~) is not too in T . Observe that ck(T ) equals the number of (k t)- ~ − large. This ensures that with high probability w( t+1) cliques in GT~ . Therefore, deciding whether T is sociable is close to its expected value for all t, for a sufficientlyR amounts to deciding whether the number of (k t)- − large sample size st+1 (which depends on this upper cliques in the subgraph GT~ is greater than τt. Indeed this is similar to our original problem of estimating the bound on w(T~) as well as on d( t)). R ~ number of (k t)-cliques in a graph, except that it 4. Given a k-clique C and an ordered t-clique T = − is applied to a subgraph G ~ of our original graph G. (v1,...,vt) such that v1,...,vt C, we can efficiently T { }⊆ Unfortunately, we do not have direct query access to determine if C is assigned to T~. This ensures that when such subgraphs. To illustrate this, consider the case of we get the final set of ordered k-cliques, we can Rk t = 1 so that T~ consists of single vertex v. While we can compute its weight (and deduce an estimate of nk). sample uniform vertices in the subgraph G(v), we cannot We introduce key notions in the definition of such directly perform neighbor queries (without incurring a an assignment rule. possibly large cost when simulating queries to G(v) by performing queries to G). 2.4 Sociable cliques and the assignment rule. However, we show that we can still follow the For an ordered t-clique T~, let ck(T~) denote the number high-level structure of our iterative sampling algorithm of k-cliques containing T~. An ordered t-clique T~ is (though there are a few obstacles). Specifically, we called sociable if c (T~) is above a threshold τ αt 1. initialize = T~ , and for each j = t,...,k 1, k t − Rt { } − Otherwise, the clique is called non-sociable.≈ For a we sample a set of ordered (j + 1)-cliques given Rj+1 k-clique C = v ,...,v , let (C) be the set of a set of ordered j-cliques , exactly as described in 1 k Rj all ordered k-cliques{ corresponding} O to the k! tuples Section 2.2. The first difficulty that we encounter is inducing C. Let ′(C) be the subset of (C) that the following. The success probability of sampling an contains ordered k-cliquesO in (C) such that Oall prefixes ordered (j + 1)-clique that extends an ordered j-clique are non-sociable. ConsiderO the assignment rule that in is inversely proportional to d( ). Unfortunately, Rj Rj assigns C to the first (in lexicographic order) C~ ′(C) here we cannot argue that with high probability d( ) ∈O Rj and to each of its prefixes. can be upper bounded as a function of d( j) (which Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php j 1 C In a central lemma we prove that the number of is O(mα − )). The reason is that while the algorithm k-cliques that are not assigned by this assignment rule described in Section 2.2 starts with a uniform sample to any ordered k-clique (and its prefixes) is relatively of vertices (that the following samples build on), R1 Rj small. The proof relies on the sociability thresholds τ { t}

Copyright © 2020 by SIAM 1471 Unauthorized reproduction of this article is prohibited here we start with = T~ for an arbitrary t-clique ordered cliques are sociable. A key notion is that of Rt { } T~. costly cliques, whose sociability cannot be determined We overcome this obstacle by defining the notion efficiently. Arboricity also plays a role in their definition of costly cliques. We say that an ordered t-clique T~ and in the proof that the additional loss incurred by not is costly if for some j t, d( (T~)) is too large, assigning k-cliques to costly ordered cliques is small. ≥ Cj where j(T~) is the set of j-cliques that T~ participates in. ForC such ordered t-cliques, we cannot efficiently 2.7 The lower bound. Both terms in the lower verify whether they are sociable. Thus, we modify our bound are direct generalizations of the lower bound assignment rule so that costly cliques are not assigned for approximately counting k-cliques in general graphs, any k-clique (even if they are non-sociable). We prove first proven in [24] and later simplified by Eden and that the additional loss in unassigned k-cliques is small Rosenbaum [25]. For the case that n α , the proof of the and that we can efficiently determine if an ordered t- k ≤ k clique is costly. So we start with t = T~ , apply the n Ω 1 term has already appeared  in the works R { } k n /k iterative process, and obtain a set of ordered k-cliques · k   α (that are all extensions of T~). To determine if T~ is mentioned above. For the case that nk > , the Rk k sociable, we need to estimate c (T~), i.e., the number of proof is based on a simple hitting argument as follows. k  (k t) cliques in G . Luckily, it suffices to make this We start with a fixed graph G′ with n vertices, m − T~ decision approximately. For the analysis to go through, edges, arboricity α, and no k-cliques. Now, let G1 α it suffices to distinguish between the case that ck(T~) is consist of r = nk/ k disjoint α-cliques, with a disjoint “too large”, and the case that it is “sufficiently small”. copy of G′. Let G2 consist of rα isolated vertices  Therefore, given k, the final decision (regarding the and a disjoint copy of G′. Hence, in both graphs R G and G there are Θ(n) vertices, Θ(m) edges, and sociability of T~) can be made just based on k . 1 2 |R | arboricity α. Furthermore, in both graphs there are no 2.6 Summary of our main new ideas and where edges between the different subgraphs, and we consider arboricity comes into play. The following are the a random labeling of the vertex names. Clearly, an main differences and new ideas as compared to ERS, algorithm cannot distinguish between the two graphs, with an emphasis on the role of bounded arboricity. unless it hits one of the cliques, which happens with k r α k nk 1. We introduce an iterative sampling process that, probability n· =Θ n α·k−1 . Therefore, any algorithm · starting from a uniform sample 1 of vertices, creates for estimating the number  of k-cliques must perform R k−1 intermediate samples t of ordered t-cliques, until it nα Ω k queries in expectation. R k nk obtains a sample of ordered k-cliques. Arboricity comes · k−2 into play here since the probability of obtaining an  The proof of the Ω min m(α/k) , m term is { nk } ordered (t + 1)-clique that can be added to t+1, is more involved, and is based on a reduction from a R   inversely proportional to 1/d( t), which in turn can be generalization of the set-disjointness communication R bounded as a function of α (and m). complexity problem. 2. We introduce an assignment rule and corre- sponding weight function w that ensures two properties. 3 Preliminaries (1) Almost every k-clique is assigned (to some ordered For an integer j, the set 1,...,j is denoted by [j]. k-clique and all its prefixes), and (2) no ordered clique For a pair of integers i j,{ the set of} integers i,...,j is assigned too many k-cliques. The former implies that is denoted by [i,j]. For≤ a multiset S, we use{ S to} w(V ) n . The latter implies that, in the iterative | | ≈ k denote the sum of multiplicities of the items in S. Our sampling process, each sample of larger ordered cliques algorithm gets parameters k and ε, where we assume “maintains the weight” (up to an appropriate normal- that ε< 1/2k2 (or else we set ε = 1/2k2). ization) of the previous sample. Let G = (V,E) be a graph with n vertices, m The arboricity α determines the sociability thresholds edges, and arboricity α(G). As noted in Section 2, we τt (above which an ordered clique is not assigned distinguish between a t-clique, which is a set of t vertices {any} k-clique). These thresholds are carefully chosen T = v1,...,vt (with an edge between every pair of to ensure that in graphs with arboricity at most α, vertices{ in the set),} and an ordered t-clique, which is a the number of unassigned k-cliques is sufficiently small. Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php t-tuple of t distinct vertices T~ = (v1,...,vt) such that These parameters directly affect the time complexity of ~ the algorithm. v1,...,vt is a clique. For an ordered t-clique T = { } ~ 3. We show how the assignment rule can be ver- (v1,...,vt), we use U(T ) to denote the corresponding unordered t-clique v ,...,v . For cliques (ordered ified. This translates to determining whether certain { 1 t}

Copyright © 2020 by SIAM 1472 Unauthorized reproduction of this article is prohibited cliques) of size 1, that is, vertices, we may use v instead We next show how to define a weight function based of v (respectively, (v)), and similarly for cliques of size on a subset = k such that , which we { } A t=1 At At ⊆Ot 2 (edges). We let t(G) denote the set of t-cliques in G, refer to as a subset of active ordered cliques. Referring C S and nt(G)= t(G) . For the set of ordered t-cliques in to the notions introduced informally in Section 2, the G we use (|CG). When| G is clear from the context, we intention is that active ordered cliques will be non- Ot use the shorthand t, nt and t, respectively. sociable and non-costly where these notions are formally C O defined in the in the full version. The weight function Definition 3.1. (Clique’s least degree vertex) is closely linked to the notion of assigning k-cliques to For a clique (or ordered clique) C we let Γ(C) denote ordered cliques (as becomes clear in Definition 4.3). For the set of neighbors of C’s minimal-degree vertex now our goal is to define such a weight function that (breaking ties by ids) and let d(C) = Γ(C) . We refer | | is legal, and such that we can easily verify (based on to d(C) as the degree of the (ordered) clique and to ) whether an ordered k-clique has weight 1 or 0. We Γ(C) as its set of neighbors. For a set (or multiset) wouldA like to devise a weight function w such that w(V ) of cliques (or ordered cliques) , we use the notation R is not much smaller than nk, and that the weight of d( ) for C d(C). very ordered clique is appropriately bounded. We later R ∈R provide sufficient conditions on , which ensure that P We stress that Γ(C) (and respectively, d(C)) does not these properties hold. A refer to the union of neighbors of vertices in C, but only In what follows, for a set of ordered cliques (in to the neighbor of a single designated vertex in C. R particular of the same size), we say that T~ is first The proofs of the next two claims appear in the full in if it is lexicographically first. Also, for∈ an R ordered version. R t-clique T~ and j t, we use T~ j to denote the ordered ≤ ≤ t 1 ~ Claim 3.1. For every t, d( t(G)) 2m α(G) − . j-clique formed by the first j elements in T . C ≤ · Since we shall be interested in active ordered cliques 2α(G) such that all of their prefixes are also active, it will be Claim 3.2. For every t 2, nt(G) t nt 1(G) . ≥ ≤ · − useful to define the notion of fully active cliques. As a corollary of Claim 3.2, we obtain. Definition 4.2. (Fully-active cliques) Let be a A Corollary 3.1. For every 1 t

Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php ∈R C to any ordered clique. PBy the above definition, For each ordered t-clique T~, we let wA(T~) denote Fact 4.1. Let w be a legal weight function. Then the number of k-cliques that are assigned to T~, and we w(V ) n . refer to wA(T~) as the weight of T~ (with respect to ). ≤ k A

Copyright © 2020 by SIAM 1473 Unauthorized reproduction of this article is prohibited Observe that by Definition 4.3, an ordered t-clique in t+1 are extensions of ordered cliques in t. Once T~ is assigned some k-clique C only if it is the t-prefix of theR algorithm reaches t = k, so that it has aR sample of some fully active (with respect to ) ordered k-clique C~ . ordered k-cliques, it calls the procedure Is-Assigned A This implies that T~ is active, and hence, only ordered on each ordered k-clique C~ in k to check whether it is R cliques in can have non-zero weight. assigned the unordered clique U(C~ ) (i.e., wA(C~ ) = 1). The nextA claim follows from Definition 4.3. Finally it returns an appropriately normalized version of the total weight of k. Claim 4.1. For any subset of ordered cliques, wA( ) R A · For the sake of the exposition, here we provide a is a legal weight function. slightly simplified version of the algorithm. In partic- The following definition encapsulates what we re- ular, we give approximate settings of the variables in h quire from the resulting weight function wA. the algorithm, and use the notation for these approx- imate settings. We also removed two steps in which the Definition 4.4. (Good active subset) For an ap- algorithm aborts, which are used in order to bound the proximation parameter ε and a vector of weight thresh- complexity of the algorithm. For full details (as well as olds ~τ = (τ ,...,τ ), we say that a subset of or- 1 k A a complete analysis) see full version. dered cliques is (ε, ~τ)-good if the following two condi- The procedure Sample-a-Set (invoked in Step 3b tions hold: of Approx-Cliques), is presented next. Given a 1. For every t [k] and for every ordered t-clique multiset of ordered t-cliques, consider all (t + 1)- ~ ~ ∈ t T , wA(T ) τt. R ~ ≤ tuples that each corresponds to an ordered t-clique T 2. wA(V ) (1 ε/2)nk. in t, and a neighbor v of T~. (For the definition of If only the first≥ condition− holds, then we say that is R the neighbors of an ordered clique T~ and its degree ~τ-bounded. A d(T~) refer to Definition 3.1.) Sample-a-Set samples As we shall discuss in more detail subsequently, we uniformly from these tuples, and includes in Rt+1 obtain the first item in Definition 4.4 by ensuring that those (t + 1)-tuples that are ordered (t + 1)-cliques. includes only ordered cliques that do not participate Constructing a data structures that supports sampling inA too many k-cliques. each T~ with probability d(T~)/d( ) when given ∈ Rt Rt all degrees d(T~) and d( t) can be implemented in linear 5 An oracle based algorithm time in (see e.g., [59R, 60, 43] on generating random |Rt| In order to make the presentation more modular, we variables according to a specified discrete distribution). first present an oracle-based algorithm. That is, we For a t-clique T = v1,...,vt , and j t, we let (T ) denote the set of j{-cliques that} T participates≥ in. assume the algorithm, Approx-Cliques, is given access Cj to an oracle A for a subset of active ordered cliques That is, the set of j-cliques T ′ such that T T ′. For Q ~ ~ ⊆ : for any given ordered clique T~, the oracle A an ordered t-clique T we use j(T ) as a shorthand for A Q C returns whether T~ . The algorithm also receives j(U(T~)), and also say that T~ participates in each T~ ′ ∈ A C ∈ an approximation parameter ε, a confidence parameter (T~). We let (T~) denote the set of ordered j-cliques Cj Oj δ, a “guess estimate” nk of nk, an estimate m of m, that are extensions of T~. For a multiset of (ordered) and a vector of weight-thresholds ~τ. Our main claim t-cliques , we use j( ) to denote the union of j(T ) will roughly be that if is (ε, ~τ)-good (as defined R ~ C R C e A e taken over all T , where here in “union” we mean in Definition 4.4), m m/2 and nk nk, then ∈8 R with multiplicity, and j( ) is defined analogously. with probability at least≥ 1 δ the algorithm≤ outputs a O R We extend the definitions of j and j to t-tuples such (1 ε) approximation of n−, by approximating w (V ). C O e k e A that for a t-tuple T~ that does not correspond to a t- A± constant-factor estimate m of m can be obtained clique, (T~) and (T~) are mapped to the empty set. by calling the moments-estimation algorithm of [22], Cj Oj Finally, for an ordered t-clique T~ = (v ,...,v ) and a which is designed to work for bounded-arboricity graphs 1 t e ~ (applying it simply to the first moment). We can vertex u, we use (T,u) as a shorthand for the (t + 1)- tuple (v1,...,vt,u). alleviate the need for the parameter nk, by relying on the search algorithm of [24] . It can be proved for an appropriate setting of the parameter s , the set returned by the procedure The algorithm Approx-Cliques startse by selecting t+1 Rt+1 a uniform sample of vertices (1-cliques). It then contin- is “typical” with respect to the set t. Essentially we Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php prove that with high probability, w( R ) is close to its ues iteratively, where at the start of each iteration it has Rt+1 a sample of ordered t-cliques . It sends this sample Rt to the procedure Sample-a-Set, which returns a sam- 8The formal term should be “sum”, but since “sum” is usually ple of ordered (t + 1)-cliques . The ordered cliques used in the context of numbers, we prefer to use “union”. Rt+1

Copyright © 2020 by SIAM 1474 Unauthorized reproduction of this article is prohibited Approx-Cliques(n,k,α,ε,δ, m, n , ~τ, A) k Q 1. Define 0 = V , d( 0)= n and set w0 h nk. R nτ1 R 2. Sample s1 h e vertices u.a.r.e e and let 1 be the chosen multiset. nk R 3. For t = 1 to k 1 do: e e − wet−1 d( t) τt+1 (a) Compute d( ) and set w h s and s h R · . t t d( −1) t t+1 we R Rt · t (b) Invoke Sample-a-Set(t, t,st+1) and let t+1 be the returned multiset. n d( 1) ... d( −1) R R 4. Let n = · R · · Rk e Is-Assigned(C,~ ). k s1 ... s C~ A · ·· k · ∈Rk Q 5. Return nk. P b Sample-a-Set(b t, ,s ) Rt t+1 1. Compute d( t) and set up a data structure to sample each T~ t with probability d(T~)/d( t). 2. Initialize R = . ∈R R Rt+1 ∅ 3. For ℓ = 1 to st+1: (a) Invoke the data structure to generate an ordered clique T~ℓ. (b) Query degrees of vertices in T~ , and find a minimum degree vertex u T~ . ℓ ∈ ℓ (c) Sample a random neighbor vℓ of u. (d) If (T~ℓ,vℓ) is an ordered (t + 1)-clique, add it to t+1. 4. Return . R Rt+1

w( t) expected value, d(R ) st+1, and that d( j( t+1)) is not holds as well. Rt · C R d( j ( t)) In order to give the idea behind the procedure much larger than its expected value C R st+1. d( t) · Is-Active, consider the special case that t = 1 and The procedure Is-Assigned (invokedR in Step 4 of T = v for some vertex v. Observe that c (v) is the Approx-Cliques) decides whether a given ordered k- { } k ~ ~ ~ number of k-cliques in the subgraph induced by v and its clique C is assigned U(C), i.e., whether wA(C) = 1. neighbors. Roughly speaking, for i = 1, the procedure This is done following Definition 4.3, given access to an Is-Active works similarly to the algorithm Approx- oracle for . In Section 6 we replace the oracle by an A Cliques, subject to setting 1 = v . Namely, starting explicitly defined procedure. from t = 1, and using theR procedure{ } Sample-a-Set, As stated at the start of this section, we show it iteratively selects a sample t+1 of ordered (t + 1)- that if A is an oracle for an (ε, ~τ)-good subset , R Q A cliques, given a sample t of ordered t-cliques. Once m m/2 and nk nk, then with probability at least R it obtains k it uses k to decide whether ck(v) is 1 ≥δ, Approx-Cliques≤ returns an estimate n such R |R | − k too large (and hence should not be included in ). As thate nk (1 eε)nk. The precise statement regarding opposed to Approx-Cliques, here we do notA ask for the correctness∈ ± and complexity of Approx-Cliques as b a precise estimate of ck(v), and hence this decision is well asb its proof, are provided in the full version. simpler. In addition, the invariant that we would like to maintain is that the average number of k-cliques 6 Implementing an oracle for good subset A that ordered cliques in t+1 participate in, does not In this section we describe a (randomized) procedure deviate by much from theR average number for . The Rt named Is-Active, that implements an oracle for a procedure generalizes to i> 1 by setting i = I~ and subset , where with high probability, is (ε, ~τ)-good starting the sampling process with t = i. R A A  for an appropriate setting of ~τ. For an ordered t-clique Observe that the procedure Is-Active may exit (in ~ ~ T , recall that ck(T ) denotes the number of k-cliques Step 2c) before reaching t = k 1 if st+1 is above a that T~ participates in (that is, c (T~) = (T~) ). The certain threshold. This early exit− (with an output of k |Ck | procedure aims at determining whether ck(T~) is larger Non-Active) addresses the case that I~ is “costly” (this than τ , in which case it is not included in . This was informally defined Section 2.5). t A ensures that the first item in Definition 4.4 holds, since As in the case of the algorithm Approx-Cliques, Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php wA(T~) c (T~) τ . On the other hand, the setting here we give a slightly simplified version of Is-Active. ≤ k ≤ t of ~τ is such that despite not including in all ordered By replacing the calls to an oracle A in the A Q cliques T~ for which ck(T~) is above the allowed threshold, algorithm Approx-Cliques with calls to Is-Active, we can still prove that the second item in Definition 4.4 we obtain an algorithm that with high probability

Copyright © 2020 by SIAM 1475 Unauthorized reproduction of this article is prohibited Is-Assigned(C,~ A) Q 1. Let C = U(C~ ). ~ 1 ~ ~ 2. For each C′ U − (C), check whether C′ is fully active with respect to , i.e., whether C kA, by ∈ ~ A ∈ F calling A on every prefix C′ t for t [k 1]. Q ≤ ∈ − 3. If C~ A and it is the first ordered k-clique in A, then return 1, otherwise return 0. ∈ Fk Fk

Is-Active(i, I,k,α,ε,δ,~ nk, m, ~τ)

1. Let i = I~ and wi h τi. 2. For Rt = i to k 1 do:e e  − (a) Compute d( te). R we −1 d( ) τ +1 (b) For t>i set w h t s and s h Rt · t . t d( t−1) t t+1 wet R t−1· mαe τt+1 (c) If st+1 is larger than ne · , then return Non-Active. e k (d) Invoke Sample-a-Set(t, t,st+1) and let t+1 be the returned multiset. d( ) ... d( −1) R R 3. Set c (I~) := Ri · · Rk . k s +1 ... s k i · · k · |R | 4. If ck(I~) τi/4 then return Non-Active. Otherwise, return Active. b ≤ b computes a (1 ε)-estimate of n (conditioned on Innovations in Theoretical Computer Science (ITCS), ± k m m/2 and nk < nk, where both assumptions can pages 225–234, 2014. be≥ removed). For full details, see the full version. [8] R. S. Burt. Structural holes and good ideas. American Journal of Sociology, 110(2):349–399, 2004. e e [9] B. Chazelle, R. Rubinfeld, and L. Trevisan. Approxi- References mating the minimum spanning tree weight in sublinear time. SIAM Journal on Computing, 34(6):1370–1379, 2005. [1] M. Aliakbarpour, A. S. Biswas, T. Gouleakis, J. Pee- [10] N. Chiba and T. Nishizeki. Arboricity and sub- bles, R. Rubinfeld, and A. Yodpinyanee. Sublinear- graph listing algorithms. SIAM Journal on Comput- time algorithms for counting star subgraphs via edge ing, 14(1):210–223, 1985. sampling. Algorithmica, pages 1–30, 2017. [11] F. Chierichetti, A. Dasgupta, R. Kumar, S. Lattanzi, [2] S. Assadi, M. Kapralov, and S. Khanna. A simple and T. Sarlos. On sampling nodes in a network. In sublinear-time algorithm for counting arbitrary sub- Conference on the World Wide Web (WWW), pages graphs via edge sampling. In ITCS, volume 124 of 471–481, 2016. LIPIcs, pages 6:1–6:20. Schloss Dagstuhl - Leibniz- [12] J. Cohen. Graph twiddling in a MapReduce world. Zentrum fuer Informatik, 2019. Computing in Science & Engineering, 11:29–41, 2009. [3] A. L. Barab´asi and R. Albert. Emergence of scaling in [13] J. S. Coleman. Social capital in the creation of human random networks. science, 286(5439):509–512, 1999. capital. American Journal of Sociology, 94:S95–S120, [4] R. Bauer, M. Krug, and D. Wagner. Enumerating and 1988. generating labeled k-degenerate graphs. In Proceedings [14] A. Czumaj, F. Erg¨un, L. Fortnow, A. Magen, I. New- of the Meeting on Algorithm Engineering & Expermi- man, R. Rubinfeld, and C. Sohler. Approximating the ments, pages 90–98. Society for Industrial and Applied weight of the Euclidean minimum spanning tree in sub- Mathematics, 2010. linear time. SIAM Journal on Computing, 35(1):91– [5] M. Baur, M. Gaertler, R. G¨orke, M. Krug, and D. Wag- 109, 2005. ner. Generating graphs with predefined k-core struc- [15] A. Czumaj and C. Sohler. Estimating the weight ture. In Proceedings of the European Conference of of metric minimum spanning trees in sublinear time. Complex Systems. Citeseer, 2007. SIAM Journal on Computing, 39(3):904–922, 2009. [6] L. Becchetti, P. Boldi, C. Castillo, and A. Gionis. [16] M. Danisch, O. D. Balalau, and M. Sozio. Listing k- Efficient semi-streaming algorithms for local triangle cliques in sparse real-world graphs. In Conference on counting in massive graphs. In Proceedings of the the World Wide Web (WWW), pages 589–598, 2018.

Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php International Conference on Knowledge Discovery and [17] A. Dasgupta, R. Kumar, and T. Sarlos. On estimating Data Mining (SIGKDD), pages 16–24, 2008. the average degree. In Conference on the World Wide [7] J. Berry, L. Fostvedt, D. Nordman, C. Phillips, C. Se- Web (WWW), pages 795–806. ACM, 2014. shadhri, and A. Wilson. Why do simple algorithms [18] J. P. Eckmann and E. Moses. Curvature of co-links for triangle enumeration work in the real world? In

Copyright © 2020 by SIAM 1476 Unauthorized reproduction of this article is prohibited uncovers hidden thematic layers in the World Wide International Workshop on Graph-Theoretic Concepts Web. Proceedings of the National Academy of Sciences, in Computer Science, pages 159–167. Springer, 2006. 99(9):5825–5829, 2002. [33] O. Goldreich. Introduction to Property Testing. Cam- [19] T. Eden, S. Jain, A. Pinar, D. Ron, and C. Seshadhri. bridge University Press, 2017. Provable and practical approximations for the degree [34] O. Goldreich and D. Ron. Approximating average distribution using sublinear graph samples. In Confer- parameters of graphs. Random Structures and Algo- ence on the World Wide Web (WWW), pages 449–458, rithms, 32(4):473–493, 2008. 2018. [35] M. Gonen, D. Ron, and Y. Shavitt. Counting stars [20] T. Eden, A. Levi, D. Ron, and C Seshadhri. Approxi- and other small subgraphs in sublinear-time. SIAM mately counting triangles in sublinear time. In Foun- Journal on Discrete Mathematics, 25(3):1365–1411, dations of Computer Science (FOCS), pages 614–633, 2011. 2015. [36] A. Hassidim, J. A. Kelner, H. N. Nguyen, and K. Onak. [21] T. Eden, R. Levi, and D. Ron. Testing bounded Local graph partitions for approximation and testing. arboricity. In Symposium on Discrete Algorithms In Foundations of Computer Science (FOCS), pages (SODA), pages 2081–2092, 2018. 22–31, 2009. [22] T. Eden, D. Ron, and C. Seshadhri. Sublinear time es- [37] P. W. Holland and S. Leinhardt. A method for timation of degree distribution moments: The degen- detecting structure in sociometric data. American eracy connection. In International Colloquium on Au- Journal of Sociology, 76:492–513, 1970. tomata, Languages, and Programming (ICALP), pages [38] M. O. Jackson, T. Rodriguez-Barraquer, and X. Tan. 7:1–7:13, 2017. Social capital and social quilts: Network patterns [23] T. Eden, D. Ron, and C. Seshadhri. Faster sublinear of favor exchange. American Economic Review, approximations of k-cliques for low arboricity graphs, 102(5):1857–1897, 2012. 2018. [39] S. Jain and C. Seshadhri. A fast and provable method [24] T. Eden, D. Ron, and C. Seshadhri. On approximating for estimating clique counts using tur´an’s theorem. In the number of k-cliques in sublinear time. In Sympo- Conference on the World Wide Web (WWW), pages sium on Theory of Computing (STOC), pages 722–734, 441–449, 2017. 2018. [40] T. Kloks, D. Kratsch, and H. M¨uller. Finding and [25] T. Eden and W. Rosenbaum. Lower bounds for counting small induced subgraphs efficiently. Informa- approximating graph parameters via communication tion Processing Letters, 74(3-4):115–121, 2000. complexity. In Approximation, Randomization, and [41] T. Kopelowitz, S. Pettie, and E. Porat. Higher lower Combinatorial Optimization. Algorithms and Tech- bounds from the 3sum conjecture. In Symposium on niques (APPROX/RANDOM), pages 11:1–11:18, 2018. Discrete Algorithms (SODA), pages 1272–1287, 2016. [26] F. Eisenbrand and F. Grandoni. On the complexity of [42] S. Marko and D. Ron. Approximating the distance fixed parameter clique and dominating set. Theoretical to properties in bounded-degree and general sparse Computer Science, 326(1-3):57–67, 2004. graphs. ACM Transactions on Algorithms, 5(2):22, [27] D. Eppstein, M. L¨offler, and D. Strash. Listing all 2009. maximal cliques in large sparse real-world graphs. [43] G. Marsaglia, W. W. Tsang, J. Wang, et al. Fast ACM Journal of Experimental Algorithmics, 18:3–1, generation of discrete random variables. Journal of 2013. Statistical Software, 11(3):1–11, 2004. [28] D. Eppstein and D. Strash. Listing all maximal cliques [44] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, in large sparse real-world graphs. In International D. Chklovskii, and U. Alon. Network motifs: sim- Symposium on Experimental Algorithms, pages 364– ple building blocks of complex networks. Science, 375. Springer, 2011. 298(5594):824–827, 2002. [29] U. Feige. On sums of independent random variables [45] M. Mitzenmacher, J. Pachocki, R. Peng, C. E. with unbounded variance and estimating the average Tsourakakis, and S. C. Xu. Scalable large near-clique degree in a graph. SIAM Journal on Computing, detection in large-scale networks via sampling. In 35(4):964–984, 2006. SIGKDD International Conference on Knowledge Dis- [30] I. Finocchi, M. Finocchi, and E. G. Fusco. Clique covery and Data Mining, pages 815–824, 2015. counting in mapreduce: Algorithms and experiments. [46] C. St. JA. Nash-Williams. Edge-disjoint spanning trees ACM Journal of Experimental Algorithmics, 20:1–7, of finite graphs. Journal of the London Mathematical 2015. Society, 1(1):445–450, 1961. [31] B. Foucault Welles, A. Van Devender, and N. Contrac- [47] C. St. JA. Nash-Williams. Decomposition of finite tor. Is a friend a friend?: Investigating the structure of graphs into forests. Journal of the London Mathemat- friendship networks in virtual worlds. In CHI Extended ical Society, 1(1):12–12, 1964. Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php Abstracts on Human Factors in Computing Systems, [48] J. Neˇsetˇril and P. Ossana de Mendez. Sparsity: pages 4027–4032, 2010. Graphs, Structures, and Algorithms. Springer, 2012. [32] G. Goel and J. Gustedt. Bounded arboricity to [49] J. Neˇsetˇril and S. Poljak. On the complexity of the determine the local structure of sparse graphs. In subgraph problem. Commentationes Mathematicae

Copyright © 2020 by SIAM 1477 Unauthorized reproduction of this article is prohibited Universitatis Carolinae, 26(2):415–419, 1985. [50] H. N. Nguyen and K. Onak. Constant-time approxi- mation algorithms via local improvements. In Foun- dations of Computer Science (FOCS), pages 327–336, 2008. [51] K. Onak, D. Ron, M. Rosen, and R. Rubinfeld. A near- optimal sublinear-time algorithm for approximating the minimum vertex cover size. In Symposium on Discrete Algorithms (SODA), pages 1123–1131, 2012. [52] M. Parnas and D. Ron. Approximating the minimum vertex cover in sublinear time and a connection to distributed algorithms. Theoretical Computer Science, 381(1-3):183–196, 2007. [53] A. Portes. Social capital: Its origins and applications in modern sociology. In Eric L. Lesser, editor, Knowl- edge and Social Capital, pages 43 – 67. Butterworth- Heinemann, Boston, 2000. [54] C. Seshadhri, T. G. Kolda, and A. Pinar. Commu- nity structure and scale-free collections of Erd¨os-R´enyi graphs. Physical Review E, 85(5):056109, May 2012. [55] K. Shin, T. Eliassi-Rad, and C. Faloutsos. Patterns and anomalies in k-cores of real-world graphs with applications. Knowledge and Information Systems, 54(3):677–710, 2018. [56] S. Suri and S. Vassilvitskii. Counting triangles and the curse of the last reducer. In Proceedings of the Interna- tional Conference on World Wide Web (WWW), pages 607–614, 2011. [57] C. E. Tsourakakis. The k-clique densest subgraph problem. In Proceedings of the International Confer- ence on World Wide Web (WWW), pages 1122–1132, 2015. [58] V. Vassilevska. Efficient algorithms for clique prob- lems. Information Processing Letters, 109(4):254–257, 2009. [59] A. J. Walker. New fast method for generating dis- crete random numbers with arbitrary frequency distri- butions. Electronics Letters, 10(8):127–128, 1974. [60] A. J. Walker. An efficient method for generat- ing discrete random variables with general distribu- tions. ACM Transactions on Mathematical Software, 3(3):253–256, 1977. [61] Y. Yoshida, M. Yamamoto, and H. Ito. An improved constant-time approximation algorithm for maximum. In Proceedings of the Symposium on Theory of Com- puting (STOC), pages 225–234, 2009. Downloaded 08/04/20 to 24.6.75.129. Redistribution subject SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

Copyright © 2020 by SIAM 1478 Unauthorized reproduction of this article is prohibited