Quick viewing(Text Mode)

How the Degeneracy Helps for Triangle Counting in Graph Streams

How the Degeneracy Helps for Triangle Counting in Graph Streams

arXiv:2003.13151v1 [cs.DS] 29 Mar 2020 ntrl rp lse htamtmr ffiin temn a streaming efficient more admit that classes graph “natural" T ervsttewl-tde rbe ftinl on estim count of problem well-studied the revisit We edsg temn loih ihsaecomplexity space with algorithm streaming a design We h degeneracy, The 3 4 2 4 8 9 1 54,5,6] h infiac ft of significance The 60]. 59, 45–48, 41, 39, 38, 34, 22, 14, 13, for problem algorithmic fundamental a is counting Triangle Degeneracy Model, Streaming counting, Triangle ubro ass h pc opeiyi nw ob essenti be to known is Θ complexity space the passes, of number u i st opt a compute to is aim our h olwn usinhsntrcie uhattention. much received not has question following the rp tem.Gvnagahrpeetda temof stream a as represented graph a Given streams. graph rps io-lsdfmle,adpeeeta attachme preferential p and (containing families, minor-closed rich graphs, immensely is graphs degeneracy constant ABSTRACT u loihi eutwt erymthn oe on of bound lower matching nearly a with result algorithmic our infiatysalrta both than smaller significantly ue n temn 4 ,1,2,4,5,5,5,56–58]. 54, 52, 51, 42, 20, 18, 8, [4, streaming M and shared-memory, duce, distributed models: computational counting triangle of of ety study practical and Thes theoretical details). the for e to 7] size [5, (see query problems join for database used in tion is counting inten computationally triangle be systems, to database considered is rea task of scale this countin the graphs, Given triangle science. network standpoint, in task practical analysis scien core wher a computer fields From theoretical of mining. theory, variety data database wide studied: the is by it underscored is counting [ gle counting decad triangle two for been algorithms has streaming there on [9], introduc research the al Since of et itself. Bar-Yossef of by in problem ric subfield this so a is almost problem is one study this its on literature the Indeed, streams. o osatdgnrc rps hsbudis bound this graphs, degeneracy constant For htcncruvn hslwrbudfor bound lower this circumvent can that INTRODUCTION 1 KEYWORDS algorithms time linear • CONCEPTS CCS TC 2017). STACS sn ml pc loih.Frabtayodradacon a and order arbitrary For algorithm. space small a using , hoyo computation of Theory ( min ept h ltoao rvoswr ntesraigsetti streaming the in work previous of plethora the Despite egv cntn as rirr re)sraigalgori streaming order) arbitrary pass, (constant a give We o h eeeayHlsfrTinl onigi Graph in Counting Triangle for Helps Degeneracy the How ( m 3 / 2 / T , m κ / sanacdmaueo est,adtecasof class the and density, of measure nuanced a is , √ T )) at rz California Cruz, Santa ( MGeo ta. OS21,Br tal., et Bera 2016, PODS al., et (McGregor 1 rp loihsanalysis. algorithms Graph ; ua .Bera K. Suman ± [email protected] CSnaCruz Santa UC ε → ) apoiaint h ragecount triangle the to -approximation m temn,sbieradnear and sublinear Streaming, 3 / 2 / T and m o eeeaygraphs degeneracy low / √ T O e ecomplement We . ( m / T ) tgraphs). nt hc is which , aeled have e O e r there Are lgorithms m navari- a in to in ation ( l-world e and ce, ie In sive. mκ inof tion ,that h, edges, stima- graph sa is g Ω apRe- Streams lanar ,11, 9, stant rian- thm ally ( / mκ ng, T es e ) . . / T 1 ) . efcso osatps temn loihs iharbitr with algorithms, streaming pass constant on focus We ol rpsehbtseilpoete.Cudgahclass graph Could properties. special exhibit graphs world okfralgah)[] ndsrbtdadqeybsdcom query-based and distributed In [2]. graphs) all for work eeeayo elwrdgah swl tde,udrtec the under of studied, well is graphs real-world all gra of attachment graphs, degeneracy preferential planar and nu graphs, all of a contains families it as closed rich: degeneracy degen extremely of constant is of think graphs class to The sparsity. suffices graph of it measure now, anced for definiti formal but the later, defer We for number). core maximum the called hnwrtcs pe ons[4 5 6 55]. 36, 35, [24, bounds upper s worst-case magnitude of than orders many small, quite is graphs real-world oilagrtmo hb-ihzk ie an gives co Chiba-Nishizeki of seminal a algorithm torial algorithms, sequential exact persp of the time From running counting. of triangle for relevant be to known is o xc ragecutn ( counting triangle exact for n ragealgorithms? triangle ing worst-cas than “better have graphs real-world such contain massive that science network from known well is It tivation. counting? triangle for n ragecutn,prmtie ythe by parametrized counting, triangle ing is[0 3 6 6.Ti nprstemi usinaddres question main the inspires complex paper. This query this 56]. or 36, 33, communication dege [30, low bounding ties that in shown helpful have is results eracy of number a models, tational sophis course, more (of of algorithm bounds based time multiplication matrix running ticated thi known degeneracy, best the constant beats (say) rithm for Thus, [18]. degeneracy) . u eut n significance and results Our 1.1 l worst-case known beat can that bounds? graphs degeneracy low on ing nda max as fined ento 1.1. Definition ags osbemnmmdge fasbrp of subgraph a of degree minimum possible largest h ubro de,and edges, of number the re.Tu,w hn fteiptgraph input the of think we Thus, order. omk osatnme fpse vrti it u a lim has use but we list, standard, this is over As passes storage. of number constant is algorithm a Our make edges. to (unrepeated) of list arbitrary an as oedecompositions core ncmuainlmdl te hnsraig h degenera the streaming, than other models computational In oiae yteecnieain,w td h rbe fs of problem the study we considerations, these by Motivated oteeeitsraigagrtm o prxmt triang approximate for algorithms streaming exist there Do efis en h rp degeneracy. graph the define first We G ′ ugahof subgraph h eeeayo graph a of degeneracy The at rz California Cruz, Santa hsqeto a opligpatclmo- practical compelling a has question This ti ieyosre httedgnrc of degeneracy the that observed widely is It . [email protected] Seshadhri C. CSnaCruz Santa UC G T { m o h ubro rage in of number the for i ereof degree min stenme fegs and edges, of number the is n o h ubro vertices, of number the for G G rp degeneracy graph G O ′ = } denoted , ( mκ nwrs ti the is it words, In . ( V ) , G iealgorithm time E . ) represented κ ( h latter the G h.The phs. allowed "stream- e" G κ ecount- le sthat es ) oncept mbina- minor- algo- s e by sed maller sde- is , ective . m eracy sthe is (also ower real- ited tream- ary pu- for on cy n- i- - - In a low degeneracy graph, all induced subgraphs have low de- procedure of Chiba-Nishizeki. For every edge e = u,v , the size ( ) gree vertices. The following procedure that computes the degen- of intersection on neighborhoods of u and v is the number of tri- eracy is helpful for intuition. Suppose one iteratively removed the containing e. This intersection can be determined easily in minimum degree from G (updating degrees after every re- min du,dv operations, by searching for elements of the smaller ( ) moval). For any vertex v, consider the “observed" degree at the neighborhood in the larger one. (Here, du denotes the degree of time of removal. One can prove that the degeneracy is the largest vertex u.) For convenience, let us define the degree of edge e to such degree [1]. Thus, even though G could have a large maximum be de := min du,dv . Thus, we can enumerate all triangles in ( ) degree, the degeneracy can be small if high degree vertices are typ- e de time. The classic bound of Chiba-Nishzeki asserts that e de ically connected to low degree vertices. = O mκ . ( ) Our main theorem follows. Í As a warmup, let us get an O mκ T space streaming algorithm,Í ( / ) that uses a degree oracle. Define dE := de = O mκ . With the Theorem 1.2. Consider a graph G of degeneracy at most κ, that e ( ) degree oracle, in a single pass, we can sample an edge e propor- is input as an arbitrary edge stream. There is a streaming algorithm tional to its degree. In the second pass,Í pick a uniform random that outputs a 1 ε -approximation to T , with high probability1, ( ± ) neighbor w of the lower degree endpoint of e. In the third pass, and has the following properties. It makes constant number of passes 1 determine if e and w form a triangle. The probability of finding a over the input stream and uses space mκ T poly logn, ε− . ( / )· ( ) triangle is exactly 3T dE . By sampling O dE T = O mκ T inde- / ( / ) ( / ) To understand the significance of the bound mκ T , note that pendent random edges in the first pass, we can estimate T with / space complexity of streaming triangle counting is known to be O mκ T space. ( / ) min m3 2 T,m √T [11, 46]. Consider κ = O 1 , which as men- The main challenge is in removing the degree oracle. As a first ( / / / ) ( ) tioned earlier, holds for all graphs in minor-closed families and step, can we effectively simulate sampling edges proportional to preferential attachment graphs. In this case, the algorithm ofTheo- their degree? We borrow a key idea from recent sublinear algo- rem 1.2 uses space O m T . This is significantly smaller than both rithms for clique counting. First, we sample a set of uniform ran- ( / ) m3 2 T and m √T . We note that for all graphs, κ √2m, and thus, dom edges, denoted R. In a second pass, we compute the degree / / / ≤ the space is alwayseO m3 2 T . of all edges in R. Now, we can run the algorithm described earlier, ( / / ) As an illustrative example, consider the wheel graph with n ver- except we only sample edges of R. Observe that in the latter sam- tices (take a cycle withe n 1 vertices, and add a central vertex con- ple, taking expectations over R, we do sample edges proportional − nected to all other vertices). Note that m = T = Θ n and κ = O 1 to their degree from the overall graph. Unfortunately, these sam- ( ) ( ) (G is planar). The space bound given in Theorem 1.2 is only poly- ples are all correlated by the choice of R. How large should R be to logarithmic, while all existing streaming algorithms bounds (given ensure that this simulation leads to the right answer? in Table 1) are Ω √n . An alternate viewpoint is to observe that the above approach ( ) Our bound of O mκ T subsumes the term O m3 2 T , and dom- can give an accurate estimate to the number of triangles incident ( / ) ( / / ) inates the term O m √T when T = Ω κ2 . For real-world graphs, to R, denoted tR . We require R to be large enough, so that tR can ( / ) ( ) T = Ω κ2 is a naturallye occurring phenomenon.e In fact, real-world be used to estimate T . Let te be the number of triangles incident ( ) to e. The te values can exhibit large variance, even when κ = large graphs aree often characterized by following two properties: { } O 1 . Consider a graph formed by n 2 triangles that all share (1) low sparsity, and (2) high triangle density [25, 50, 53, 61]. Thus, ( ) ( − ) a common edge. The graph is planar, so κ = O 1 . But one edge from a practical standpoint, our bound offers significant improve- ( ) is incident to n 2 triangles, and all other edges are incident to ment over previously known bounds. ( − ) a single triangle. Thus, the te values have the largest possible We complement Theorem 1.2 with a nearly matching lower bound. { } variance, and one cannot estimate T by computing e R te for a Theorem 1.3. Any constant pass randomizedstreaming algorithm small R. Note that, for this example graph, our desired∈ streaming for graphs with m edges, T triangles, and degeneracy at most κ, that algorithm uses only polylogarithmic space (T = Θ mÍ, κ = O 1 ). ( ) ( ) provides a constant factor approximation to T with probability at Another key idea from sublinear clique counting saves the day: least 2 3, requires storage Ω mκ T . assignment rules. The idea is to assign triangles uniquely to edges, / ( / ) We remark that all our results in this paper can be equivalently so that the distribution of assigned triangles has low variance. The stated in terms of arboricity as well. The arboricity of a graph G, overall algorithm will estimate the number of triangles assigned denoted as α, is the smallest integer p such that the edge set E G to R, and not count the number of triangles incident to R. A natu- ( ) can be partitioned into p forests. It is asymptotically same as de- ral, though seemingly circular, rule is to assign each triangle to the generacy: for every graph α κ 2α 1. contained edge that itself participates in the fewest triangles. Using ≤ ≤ − properties of graph degeneracy, it is shown in [30] that the max- imum number of assigned triangles to any edge is O κ . (Techni- 1.2 Main ideas ( ) cally, this is not true. We have to leave some triangles unassigned.) We give a high-level description of our algorithm and proof. The fi- This leads to another technical complication. In the overall al- nal algorithm has a number of moving parts, and is based on recent gorithm, when a triangle incident to an edge is discovered, the al- advances in sublinear algorithms for clique counting [26, 29, 30]. gorithm needs to determine if the triangle is actually assigned to The starting for our algorithm (and indeed, most trian- the edge. This requires estimating te for all edges e in the triangle, gle counting results related to degeneracy) is the classic sequential a potentially space intensive operation. To perform this estimation 1We use “high probability" to denote errors less than 1 3. / 2 in O mκ T requires subtle modifications to the assignment proce- of a vertex v V by dv and its neighborhood by N v . For an edge ( / ) ∈ ( ) dure. It turns out we can ignore triangles containing edges of high e = u,v , we define its neighborhood N e to be that of the lower { } ( ) degree, and thus, the above te estimation is only required for low degree end point: N e = N u if du < dv ; N e = N v otherwise. ( ) ( ) ( ) ( ) degree edges. Furthermore, we only need to determine if te = Ω κ , Similarly, we define the degree of an edge: de = min du,dv . For ( ) { } which allows for smaller storage algorithms. a collection of edges R, we define dR = e R de . In particular, ∈ All in all, by choosing parameters carefully, all steps can be im- dE := e E de . 1 ∈ Í plemented using mκ T poly logn, ε− storage. Chiba and Nishizeki [19] proved the following insightful con- ( / ) ( ) Í nection between the sum of degrees of the edges in a graph dE and 2 RELATED WORK its degeneracy κ. 2 The triangle counting problem, a special case of more general sub- Lemma 3.1 (Lemma 2 in [19]). For a graph G with m edges and graph counting problem, has been studied extensively in the stream- degeneracy κ, ing setting. We present a summary of the significant prior works dE = de 2mκ . ≤ in Table 1. The upper bounds stated in the table are for random- e E ized streaming algorithms that provide 1 ε -approximation to Õ∈ ( ± ) As a corollary, we get the following result. the true triangle count with probability at least 2 3. The O no- / tion hides dependencies on 1 ε and logn. The lower Corollary 3.2 ( [19]). For a graph G with m edges and degeneracy / bounds are primarily based on the triangle detection probleem — κ, the maximum number of triangles in G is at most 2mκ. detect whether the input graph is triangle free or it contains at We use the notation O to hide polynomial dependencies on least T many triangles. All the results presented in the table are (·) 1 ε and logn terms, where ε is the error parameter. For designing for the arbitrary order stream. ( / ) our algorithms, we focuse on the expected space usage. This can be Jha et al. [37] designed a one pass O m √T -space algorithm ( / ) easily converted into a worst-case guarantee by applying Markov with W -additive error approximation, where W is the number of ± inequality — simply abort if the space usage runs beyond c times two length paths (also called wedges). Bravermanete al. [13] gave a 1 3 the expected space usage, for some constant c. This only increases two-pass O m T / -space algorithm to detect if the input graph the error probability by an additive 1 c amount. ( / ) / is triangle free or it has at least T many triangles. These results are We use the following variants of the Chernoff bound and Cheby- not directlye comparable to our work. shev inequality for analyzing our algorithms. The triangle counting problem has been studied in the context of the adjacency list streaming model as well. This model is also Theorem 3.3 (Chernoff Bound [17]). Let X1, X2,..., Xr bemu- known as the vertex arrival model: all the edges incident on a ver- tually independent indicator random variables with expectation µ. tex arrive together. McGregor et al. [46] gave one-pass O m √T - Then, for every ε with 0 < ε < 1, we have ( / ) space and two pass O m3 2 T -space algorithm for the triangle r ( / / ) 1 2 counting problem in this model. We refer to [46] for othere related Pr Xi µ εµ 2exp ε rµ 3 " r = − ≥ # ≤ − / work in this model. e i 1   Õ Bounded degeneracy graph family is an important class of graphs Theorem 3.4 (Chebyshev Ineality [3]). Let X be a random from a practical point of view. Many real-world large graphs, spe- variable with expectation µ and variance Var X . Then, for every [ ] cially from the domain of social networks and web graphs, often ε > 0, exhibit low degeneracy( [12, 24, 35, 36, 55], also Table 2 in [12]). Var X Pr X µ εµ [ ] . Naturally, designing algorithms that are parameterized by degen- [| − | ≥ ]≤ ε2µ2 eracy has been a theme of many works in the streaming settings; some examples include matching size estimation [6, 23, 32], inde- 4 WARM-UP: AN ABSTRACT MODEL pendent set size approximation [21], graph coloring [12]. In the In this section, we consider a streaming model equipped with a de- general RAM model, the relation between degeneracy and sub- gree oracle: queried with a vertex v, the oracle returns dv . Further- graph counting problems has been explored in [10, 19, 31]. more, we make a rather strong assumption: there is no cost associ- In the graph query model, where the goal is to design sub-linear ated with the queries. McGregor et al. [46] designed a O m3 2 T ( / / ) time algorithms, Eden et al. [28] studied the triangle counting prob- space 3-pass streaming algorithm in this model — their algorithm lem, and more generally the clique counting problem in bounded makes O m many degree queries. We describe an O mκeT -space degeneracy graphs. Although the model is significantly different ( ) ( / ) 3 pass algorithm in this model. Our estimator makes 2m many de- from the streaming model, we port some key ideas from there; gree queries and requires 3-pass. For bounded degeneracye graph see Section 1.2 for a detailed discussion. The relevance of bounded families, this translates to a space reduction by a factor of O √m . degeneracy has been further explored in the context of estimating ( ) In the next section, we show how to design a O mκ T -space con- degree moments [27] in this model. ( / ) stant pass algorithm in the traditional streaming model. Our main idea is to sample edges from thee stream with proba- 3 NOTATIONS AND PRELIMINARIES bility proportional to its degree. In general streaming settings, this For an integer k, we denote the set 1, 2,..., k by k . Throughout { } [ ] 2Note that Chiba and Nishizeki [19] stated their results in terms of arboricity. As the paper, we denote the input graph as G = V , E . We assume G ( ) α κ for each graph G with arboricity α , the same result holds with respect to has n vertices, m edges andT many triangles. We denote the degree degeneracy≤ κ as well. 3 Space Remarks Source O mn T 2 one pass [9] / O m∆2 T one pass, ∆ = maximum degree [38] / eO mn T one pass, n known a priori [14] /  Oe m3 T 2 one pass, dynamic stream [41] /  Oe m∆ T one pass, ∆ = maximum degree [48] /  O mJe T + m √T one pass, J = maximum triangles incident on a edge [47]  e/ / C + O P2 T one pass, C = vertex cover, P2 = # of 2-paths [34] /  e √ 2.5 O m T  dependence on ε is 1 ε [22] e/ / O m3 2 T multi-pass [11, 46] / /  e √ O m T  multi-pass [46] e / Ω n2  one pass, T = 1 [9] e Ω n T multi-pass, T < n [38] /  2 Ω m one pass, m c1n,c2n , T < n [13]  ∈[ ] Ω m T multi-pass [13] / Ω m3 T 2 one pass, optimal [44] /  Ω m T 2 3 multi-pass [22] / /  Ω m √T multi-pass, for m = Θ n√T [22] /  ( ) Ω min m √T ,m3 2 T multi-pass [11] { /  / / } Table 1: Prior work on the triangle counting problem

is not possible as we do not know the degree of the edges apriori. te denote the number of triangles assigned to the edge e. Clearly, However, the model that we consider here is tailor-made for this e E te = T . purpose. It shows the effectiveness of degree-biased edge samples ∈ Analysis. First, we show the estimator is unbiased. in estimating triangle count and provides motivation for taking up Í de a similar sampling approach in the general streaming model. E X = E X e [ ] d · [ | ] We present our basic estimator in algorithm 1. In the full algo- e E E rithm, we will run multiple instances of this estimator in parallel Õ∈ = de E Y e and report the “median of the mean” [15] as our final estimate. · [ | ] e E Õ∈ te = de = te = T Algorithm 1 A Triangle Estimator · d e E e e E 1: procedure IdealEstimator(Graph G = V , E ) Õ∈ Õ∈ ( ) Now we bound the variance of the estimator. 2: Pass 1: Sample an edge e with probability de dE . / de 3: Pass 2: Sample a vertex w from N e u.a.r. Var X E X 2 = E X 2 e ( ) [ ]≤ [ ] d · [ | ] 4: Pass 3: Check if e,w forms a triangle. e E E { } Õ∈ 5: if τ = e,w is a triangle then 2 { } = de dE E Y e 6: Call IsAssigned τ, e . · · [ | ] ( ) e E 7: If returned YES, then set Y = 1; else set Y = 0. Õ∈ te 8: else = de dE · · d = e E e 9: Set Y 0. Õ∈ = 10: Set X dE Y . = dE te = dE T · · · 11: return X. e E Õ∈ 2 So, running O Var X E X = O dE T = O mκ T -many esti- ( [ ]/ [ ] ) ( / ) ( / ) mators independently in parallel suffices for a 1 ε -approximate Implementation Details. The degree proportional sampling is ( ± ) achieved by using weighted reservoir sampling [16]. On arrival of estimate. Sincee each copy of the estimatore requirese constant space, the edge e = u,v in the stream, we make two degree queries to the overall space usage is bounded by O mκ T . { } ( / ) find de . The method IsAssigned is required to ensures that every triangle is uniquely associated with one of its three edges. Other 5 OUR MAIN ALGORITHMe than this, there is no constraints on the implementation of this In this section, we present our streaming triangle estimator. As method. For example, we can associate every triangle to the edge promised, our algorithm does not assume access to a degree ora- with lowest degree, breaking ties arbitrarily (but consistently). Let cle. If the model is indeed equipped with a degree oracle, then we 4 can save a few passes over the stream. Perhaps more importantly, Algorithm 2 Estimation of triangle count the number of queries to the oracle is upper bounded by the space 1: procedure EstimateTraingle(Graph G = V , E ) usage of our algorithm. Our main algorithmic result is the follow- ( r ) 2: Pass 1: Sample r many edges u.a.r: R = ei . { }i=1 ing. 3: Pass 2: Compute de for each e R. = ∈ Theorem 5.1. Consider a graph G of degeneracy at most κ, that 4: for i 1 to ℓ do 5: Sample an edge e R independently with prob. de dR . is input as an arbitrary edge stream. There is a streaming algorithm ∈ / 3 6: Pass 3: Sample a vertex w from N e u.a.r. that outputs a 1 ε -approximation to T , with high probability , ( ) ( ± ) 7: Pass 4: Check if e,w forms a triangle. and has the following properties. It makes six passes over the input { } 1 8: if τ = e,w is a triangle then stream and uses space mκ T poly logn,ε− . { } ( / )· ( ) 9: Call IsAssigned τ, e . ( ) We describe our estimator in Algorithm 2. We set the parame- 10: If returned YES, then set Yi = 1; else set Yi = 0. ters r and ℓ later in the analysis. In analyzing our algorithm, the 11: else procedure IsAssigned would play a crucial role. As discussed ear- 12: Set Yi = 0. lier, IsAssigned takes as input a triangle and an edge, and outputs 1 ℓ m 13: Set Y = = Yi , and X = dR Y . ℓ i 1 r · · whether the triangle is assigned to that edge. We want this proce- 14: return X. dure to possess four properties: (1) For a given triangle, it is either Í unassigned or assigned uniquely to one of its three participating edges. (2) almost all the triangles are assigned, (3) For any fixed edge, not too many triangles are assigned to it, and (4) The space complexity of the procedure is bounded by O mκ T . The first two ( / ) We begin with analyzing the quality of the (multi)set of uni- properties are required to ensure the overall accuracy of the estima- form random edges R. Collectively through Lemmas 5.3 and 5.5 tor. The third property would be central toe bounding the variance and definition 5.4, we establish that for a suitable choice of the pa- of the estimator. The final property will ensure the overall space rameter r, the (multi)set R possesses desirable properties with high complexity of our triangle estimator is bounded by O mκ T . ( / ) probability. We analyze our triangle estimator assuming a black-box access to a IsAssigned procedure that satisfies the above foure properties. = = In the next section, we will take up the task of designing such an Lemma 5.3. Let dR e R de and τR e R τe . For any con- stant ε > 0 we have ∈ ∈ assignment procedure in the streaming setting. We make the above Í Í discussion rigorous and formal below. dE (1) E dR = r and E τR = r T , Assume τe denotes the number of triangles assigned to the edge [ ] · m [ ] · m log n ε e by the procedure IsAssigned. We use τ to denote the max- (2) Pr dR E dR 1 , max ≤ [ ]· ε ≥ − log n imum number of triangles that any edge has been assigned to: h i 1 1 m τmax (3) Pr τR E tR εE τR 1 ε 2 r · . τmax = maxe E τe . Denote the number of triangles that IsAssigned [| − [ ]| ≤ [ ]] ≥ − · · T ∈ assigns to some edge by = τe . Then, for any positive con- T e E stant ε and δ, we define an ε,δ ∈-accurate IsAssigned procedure ( ) below. Í Proof. We first compute the expected value of dR and τR . We d t define two sets of random variables, Yi and Yi for i r as fol- Definition 5.2 ( ε,δ -accurate IsAssigned). A procedure IsAs- d = t = = r d∈ [ ] = ( ) lows: Y dei , and Y τei . Then, dR = Y and τR signed that assigns a triangle to an edge or leaves it unassigned, i i i 1 i r Y t . We have is ε,δ -accurate if it satisfies the following four properties. i=1 i Í ( ) (1) Unique Assignment: For each triangle, it is either unassigned Í d d or uniquely assigned to one of the three participating edges. E Y = Pr ei = e E Y ei = e i [ ]· i | This implies, T . e E T ≤ h i Õ∈ h i (2) Almost All Assignment: With probability at least 1 δ, 1 − T ≥ = de 1 12ε T . m ( − ) e E (3) Bounded Assignment: With probability at least 1 δ, τmax Õ − ≤ ∈ κ ε. = dE / . (4) Bounded Space Complexity: Each call requires O mκ T bits m ( / ) of space. e Then, by linearity of expectation, we get E dR = r dE m. Analo- We now analyze our algorithm assuming a black-box access [ ] · / gously, we have E τR = r m. to ε,O 1 n5 -accurate IsAssigned. The analysis consists of two [ ] · T / ( ( / )) Wenowturnourfocusontheconcentrationofd . This is achieved parts. First, we show that for a certain settings of r and ℓ, our final R by a simple application of Markov inequality. estimate X is indeed a 1 ε approximation to the true triangle ( ± ) count. In the sequel, we bound the space complexity of our algo- rithm. logn ε Pr dR E dR . 3We use “high probability" to denote errors less than 1 3. ≥ [ ]· ε ≤ log n /   5 To prove a concentration bound on τR , we study the variance of Proof. Let ei be the edge sampled in the i-th iteration of the τR . By independence, we have for loop at 4 in Algorithm 2. Then,

E Yi = 1 = Pr ei = e Pr Yi = 1 ei = e , r [ ] [ ] [ | ] t e R Var τR = Var Y [ ] [ i ] Õ∈ i=1 de Õ = Pr Yi = 1 ei = e , r dR [ | ] t 2 e R E Y Õ∈ ≤ [( i ) ] d τ i=1 = e e Õr d · d e R R e t 2 Õ = Pr ei = e E Y ei = e ∈ τ [ ] [( i ) | ] = e i=1 e E Õ Õ∈ dR 2 e R r e E τe Õ∈ = · ∈ = τR m . Í dR r τmax e E τe · ∈ ≤ m By linearity of expectation, we have the item (1) of the lemma. For Í  r τmax the second item, we apply Chernoff bound( Theorem 3.3). = · T . m Lemma 5.7 (Setting of ℓ for concentration of YR ). Let 0 < ε < 1 6 / > ℓ = c log n m dR Then, the item (3) of the lemma follows by an application of Cheby- and c 20 be some constants, and ε 2 r · . Then, with  1 · ·T shev inequality ( Theorem 3.4). probability at least 1 , YR E YR εE YR . − 5 log n | − [ ]| ≤ [ ] Proof. By Lemma 5.5, R is ε-good with probability at least 1 We next define a collection of edges R as good if the conditions 1 − n . Condition on the event that R is ε-good. By definition of in items (2) and (3) in Lemma 5.3 are satisfied. Formally, we have 6 log a ε-good set, τR is tightly concentrated around its mean: τR ∈ the following definition. 1 ε r m , 1 + ε r m Then, by item 2. in lemma 5.6, we [( − ) T / ( ) T / ] have Definition 5.4 (A good collection of edges). We call a fixed collec- tion of edges R, ε-good if the following two conditions are true. Pr YR E YR εE YR [| − [ ]| ≥ [ ]] 2 c log n mdR ε τR logn dE exp 2 dR R (1) ≤ − ε · r · 3 · dR ≤ ε ·| |· m  T  c log n m exp τR τR 1 ε R T , 1 + ε R T (2) ≤ − 3 · · r ∈ ( − )·| |· m ( )·| |· m  T    = o 1 n3 , ( / ) Lemma 5.5 (Setting of r for an ε-good R). Let 0 < ε < 1 6 and c > / where the last line follows from the concentration of τR for ε-good = c log n mτmax 6 be some constants, and r ε 2 . Then, with probability at R. Removing the condition on R, we derive the lemma.  least 1 1 , R is ε-good. T − 6 log n We have now all the ingredients to prove that our final estimate is indeed close to the actual triangle count. The random variable Y = c log n mτmax is scaled appropriately to ensure that its expectation is close to the Proof. The lemma follows by plugging in r ε 2 in Lemma 5.3 and using the bounds on c and ε. T  true triangle count. Lemma 5.8. Assume r and ℓ is set as in Lemma 5.5 and Lemma 5.7 respectively. Then, there exists a small constant ε ′ such that with We have established that the random collection of edges R is 1 probability at least 1 , X 1 ε ′T , 1 + ε ′T . good with high probability. We now turn our attention to the ran- − 3 log n ∈ [( − ) ( )] dom variable Y , as defined on Line 13 Algorithm 2. Together in Lemmas 5.6 1 Proof. With probability at least 1 , YR is closely con- and 5.7 we show that, if R is good then for a suitably chosen param- − 5 log n centrated around its expected value τR dR (by Lemma 5.7). More eter ℓ, the random variable Y is well-concentrated around its mean. / formally,

Lemma 5.6. Let R be a fixed collection of edges, and YR denote the τR + τR value of the random variable Y as defined on Line 13 Algorithm 2 on YR 1 ε , 1 ε . ∈ ( − )dR ( )dR R. Then,   Then, with high probability τR (1) E YR = , [ ] dR m m 2 , + . (2) Pr Y E Y εE Y exp ℓ ε τR . XR 1 ε τR 1 ε τR R R R 3 dR ∈ ( − )· r · ( )· r · [| − [ ]| ≥ [ ]]≤ − · · h i   6 Since R is good, with probability at least 1 1 , 5.1 Assigning triangles to edges − c log n In this section we give an algorithm for the IsAssigned procedure in Algorithm 3. Recall from the previous section that we require tR 1 ε r T , 1 + ε r T , ∈ ( − ) · m ( ) · m IsAssigned to be ε,δ -accurate (see Definition 5.2).   ( ) The broad idea is to assign a triangle to the edge with small- Then, with probability at least 1 2 , − c log n est te . Recall that te is the number of triangles that the edge e participates in. However, computing te might be too expensive XR 1 2ε , 1 + 2ε ∈ [( − )T ( )T] in terms of space required for certain edges. As evident from the analysis of Algorithm 2, we have a budget of O mκ T in terms of Removing the conditioning on R, and using the bound on as ( / ) T bits of storage for each call to IsAssigned. In this regard, we de- given in Definition 5.2, we have fine “heavy” and “costly” edges and it naturallye leads to a notion of ”heavy” and ”costly” triangles. If a triangle is either “heavy” or X 1 ε ′ T, 1 + ε ′ T ∈ ( − ) ( ) “costly”, then we do not attempt to assign it to any of its edges. with probability at least 1 4 , for suitable chosen parameter Crucially, we show that the total number of “heavy” and “costly” − c log n triangles are only a tiny fraction of the total number of triangles ε .  ′ in the graph. We need to ensure that for any edge e, not too many triangles This completes the first part of the analysis. We now focus on are assigned to e by IsAssigned. We achieve this by simply disre- the space complexity of Algorithm 2. garding any edge with large te from consideration while assigning a triangle to an edge. Formally we capture this by defining heavy Lemma 5.9 (Space Complexity of Algorithm 2). Assuming an ac- edges and triangles. cess to a ε,δ -accurate IsAssigned method, Algorithm 2 requires ( ) O mκ T bits of storage in expectation. Definition 5.10 (ε-heavy edge and ε-heavy triangle ). An edge is ( / ) defined ε-heavy if te > κ ε. A triangle is deemed ε-heavy if all the / e Proof. Clearly, O r +ℓ space is sufficient to store the set R and three of its edges are ε-heavy. ℓ ( ) sample many edges from it at Line 4. Recall from Lemmas 5.5 If the ratio te de is quite small for an edge e, then we need too = = / and 5.7 that r O mτmax and ℓ O mdR r , respectively. many samples from the neighborhood N e to estimate te . Roughly ( /T) ( /( T)) ( ) Using the bound on and τmax from the definition of ε,δ -accurate speaking, O de te many samples are required for an accurate es- T ( ) ( / ) IsAssigned method,e we derive that O re+ ℓ is O mκ T with high timation. In this regard, we define costly edges and costly triangles ( ) ( / ) probability. as follows. We now account for the space complexitye of the IsAssigned method. It is called if the the edge-vertex pair e,w forms a trian- Definition 5.11 (ε-costly edge and ε-costly triangle). An edge e is { } defined ε-costly if de te > mκ εT . A triangle is deemed ε-costly gle (if condition at Line 8). Let Zi be an indicator random variable / /( ) to denote if IsAssigned is called during the i-th iteration of the for if any of its three edges is ε-costly. loop at Line 4. Then, We first show that the number heavy triangles and costly trian- gles are only a small fraction of the all triangles. Formally, we prove de te tR Pr Zi = 1 R = = . the following lemma. [ | ] d · d d e R R e R Õ∈ Lemma 5.12. The number of ε-heavy triangles and ε-costly trian- Then, the expected number of calls to IsAssigned is bounded by gles are bounded by 2εT and εT respectively. ℓ tR = O m tR . Note that t can be much different from τ , dR r R R Proof. We begin the proof by first showing that the number of · T as it counts the exact number of triangles per edge. However, we costly triangles is bounded. To prove this, observe that for a costly show thate with constant probability, tR is at most O r m , which edge e, te < de εT mκ . Then, ( T / ) ·( / ) bounds the expected number of calls by O 1 . Since each call to εT εT ( ) te < de < dE = 2εT IsAssigned takes O mκ T bits of space, the lemma follows. We mκ mκ · ( / ) e is costly e is costly now bound tR . e Õ Õ , where the last inequality follows from lemma 3.1. e te 3rT E tR = r = We now turn our attention to bounding the number of heavy tri- [ ] m m e E angles. By a simple counting argument, the number of heavy edges Õ∈ = in G is at most εT κ. Consider the subgraph of G induced by the set , where the last equality follows from the fact that e E te 3T . / An application of Markov inequality bounds the probability∈ that of heavy edges, denoted as Gheavy. It follows from the definition of cr Í degeneracy that κG κG . By Corollary 3.2, the number of tri- tR is more than T by a small constant probability, for any large heavy ≤ m angles in G is then at most κ E G = εT . Since any constant c > 10.  heavy Gheavy heavy · ( )  heavy triangle in G is present in Gheavy, the lemma follows. Thus, assuming an access to a ε,o 1 n5 -accurate IsAssigned We give the details of the procedure in Algorithm 3. The tech- ( ( / )) method, Lemmas 5.8 and 5.9 together prove our main result in Theorem 5.1.nical part of this method is handled by Assignment subroutine 7 at Line 7. Given a triangle τ, Assignment either returns (τ is Proof. First assume e is not an ε-costly edge. Let τ be some tri- ⊥ not assigned to any edges) or returns an edge e. We remark here that e participates in. We consider an execution of Assign- that the method Assignment as described is randomized and may ment on input τ. Clearly, Pr Yj = 1 = te de . By linearity of expec- [ ] / return different e on different invocations. To ensure that every tri- tation, E Ye = te . An application of Chernoff bound( Theorem 3.3) [ ] angle is assigned to an unique edge, as demanded in the item (1) yields of Definition 5.2, we maintain a table of (key,value) pairs that maps Pr Ye < κ 2ε Pr Ye < te 2 triangles (key) to edges or symbol(value). In particular, when As- [ /( )] ≤ [ / ] ⊥ signment is invoked with input τ, we first look up in the table to 1 sde exp check if there is an entry for the triangle τ. If it is there, then we ≤ − 12 · t  e  simply return the corresponding value from the table. Otherwise, c logn ste exp we execute Assignment with input τ; create an entry for τ and ≤ − ε2 · d  e  and store the return value together with τ in the table. Since the 1 expected number of calls to the IsAssigned routine is bounded by ≤ n5 O 1 (by Lemma 5.9), this only adds a constant space overhead. ( ) where the last inequality uses the fact that e is not ε-costly and hence de te mκ εT . Algorithme 3 Detecting Edge-Triangle Association / ≤ /( ) Next assume e is an ε-costly edge. Since t > κ ε, it follows = e 1: procedure IsAssigned(triangle τ e1,e2,e3 , edge e) mκ2 / { } that de > 2 . Then, the if condition on Line 9 is true and hence 2: Let emin = Assignment τ ε T ( ) Ye = . So no triangles that e participates in, will be assigned to 3: if emin = or emin , e then ∞ ⊥ it.  4: return NO. 5: else We now consider item (2) in Definition 5.2. Let τ be a triangle 6: return YES. such that τ is neither ε-heavy nor ε-costly. We prove that, with 7: procedure Assignment(triangle τ = e1,e2,e3 ) { } high probability, Assigned does not return when invoked with 8: for each edge e τ do ⊥ 2 ∈ τ. By Lemma 5.12, this implies that 1 3ε T . > mκ T ≥( − ) 9: if de ε 2T then 10: Ye = Lemma 5.15. Let τ be a triangle that is neither 4ε-heavy nor 4ε- ∞ 5 11: else costly. Then with probability at least 1 o 1 n , Assigned τ , . − ( / ) ( ) ⊥ 12: for j = 1 to s do 13: sample w from N e u.a.r. Proof. Since τ is not 4ε-heavy, for each edge e τ, te κ 4ε . ( ) ∈ ≤ /( ) 14: If e,w forms a triangle, set Yj = 1; Since τ is not 4ε-costly, at least one edge is not 4ε-costly — let e { } 15: Else set Yj = 0. denote that edge. Then, de te mκ 4εT . Together, they imply 2 2 / ≤ /( ) = de ℓ de mκ 16ε T , and the if condition on Line 8 is not met. We 16: Let Ye s j= Yj . ≤ /( ) 1 next show that, with high probability Ye < κ 2ε . = /( ) 17: Let emin argmine ÍYe . By linearity of expectation, E Ye = te . An application of Cher- [ ] 18: if Yemin > κ 2ε then noff bound( Theorem 3.3), similar to the previous lemma, shows /( ) 1 19: return . that with probability at least 1 , Ye 2te κ 2ε . Hence, ⊥ − n5 ≤ ≤ /( ) 20: else with high probability, the triangle τ is assigned to the edge e, prov- 21: return emin. ing the lemma. 

We next analyze Algorithm 3. The following theorem captures We now have all the necessary ingredients to complete the proof the theoretical guarantees of IsAssigned procedure. of the theorem. Theorem 5.13. Let ε > 0 and c > 60 be some positive constants Proof of Theorem 5.13. We have already argued how to en- c n and s = log mκ . Then, Algorithm 3 leads to an ε,o 1 n5 - sure that IsAssigned procedure satisfies item (1) in Definition 5.2. ε 2 · T ( ( / )) accurate IsAssigned procedure. Lemmas 5.14 and 5.15 proves that IsAssigned satisfies item (2) and item (3) of Definition 5.2. Finally, the space bound in item (4) is en- In the remaining part of this section, we prove the above theo- forced by the setting of the parameter s.  rem. We have already discussed how to ensure IsAssigned satisfies item (1) in Definition 5.2. We next take up item (3). We show that not too many triangles are assigned to any fixed edge. In particular, 6 LOWER BOUND we show that if an edge is “heavy”, then with high probability no In this section we prove a multi-pass space lower bound for the triangles are assigned to it. In other words, if an edge e is assigned triangle counting problem. Our lower bound, stated below, is ef- a triangle by Assigned, then with high probability te κ ε. It fectively optimal. ≤ / follows then, that for any edge e, the number of triangles that are Theorem 6.1. Any constant pass randomizedstreaming algorithm assigned to e, denoted as τe , is at most κ ε with high probability. / for graphs with m edges, T triangles, and degeneracy at most κ, that Lemma 5.14. Let e be an ε-heavy edge. Then with probability at provides a constant factor approximation to T with probability at least 1 1 , no triangles are assigned to e. least 2 3, requires storage Ω mκ T . − n5 / ( / ) 8 Our proof strategy follows along the expected line of reduc- fixed part and a variable part that depends on x and y. We next tion from a suitable communication complexity problem. We re- describe the construction of the graph G. duce from the much-studied set-disjointness problem in com- Let G = A B, E be a complete bipartite graph on the fixed ( ∪ fixed) munication complexity. It is perhaps a canonical problem that has bi-partition A and B. Then, E = a,b : a A, b B . Further fixed {{ } ∈ ∈ } been used extensively to prove multi-pass lower bounds for var- assume A = B = p. We add N blocks of vertices to G and | | | | fixed ious problems, including triangle counting [11, 13]. We consider denote them as V1,V2,...,VN . Assume Vi = q for each i N . | | ∈ [ ] the following promise version of this problem. Alice and Bob have We will set the parameters p and q later in the analysis. For each two N -bit binary strings x and y respectively, each with exactly R index i N such that xi = 1, Alice connects each every vertex in ∈[ ] ones. They want to decide whether there exists an index i N Vi to each vertex in A. Denote this edge set as EA. For each index N ∈ [ ] such that xi = 1 = yi . We denote this as the disj problem. i N such that yi = 1, Bob connects each every vertex in Vi R ∈ [ ] The basis of the reduction is the following lower bound for the to each vertex in B. Denote this edge set as EB . This completes N R N = , disjR problem. Assume disjR denote the randomized commu- the construction of the graph G. To summarize, G V E where ( ) = = ( ) nication complexity for the disjN problem. 4 V A B V1 ...VN , and E Efixed EA EB . R ∪ ∪ ∪ ∪ ∪ It is easy to see that the graph G is triangle-free if and only if Theorem 6.2 (Based on [40, 49]). For all R N 2, we have = = ≤ / there does not exits any i N such that xi 1 y1. We now R disjN = Ω R . ∈ [ ] ( R ) ( ) analyze various parameters of G. In both YES and NO case for the disjN problem, we have disjN N 3 To prove our lower bound, we reduce the R problem to the / following triangle-detection problem. Consider two graph fam- n = V = 2p + Nq ilies 1 and 2; 1 is a collection of triangle-free graphs on n ver- | | G G G N tices and m edges with degeneracy κ, and 2 consists of graphs = = 2 + G m E p 2 pq . on same number of vertices and edges, and with degeneracy Θ κ | | · 3 · ( ) and has at least T many triangles. Given a graph G 1 2 as ∈ G ∪G In the NO instance, the number of triangles T is at least p2q. As a streaming input, the goal of the triangle-detection problem argued above, in the YES instance, T = 0. Finally, we compute the is to decide whether G 1 or G 2 with probability at least ∈ G ∈ G degeneracy κ in both the cases. Note that κ G = p, and by 2 3, making a constant number of passes over the input stream. ( fixed) / definition (see Definition 1.1) κ G p. We claim that κ = p in A lower bound for the triangle-detection problem immediately ( ) ≥ the YES instance and κ 2p in the NO instance. To prove the gives a lower bound for the triangle counting problem. ≤ claim, we use the following characterization of degeneracy. Let To set the context for our lower bound result, it is helpful to com- ≺ be a total ordering of the vertices and let d denote the number pare against prior known lower bounds. The multi-pass space com- v≺ Θ 3 2 of neighbors of v that appears after v according to the ordering plexity of the triangle counting problem is min m / T,m √T = ( { / / }) . Let dmax≺ maxv V dv≺. Then, κ dmax≺ . Now consider the [11, 22, 46] for graphs with m edges and T triangles. We first con- ≺ ∈ ≤ following ordering: V1 V2 ...VN A B, and inside each 3 2 = ≺ ≺ ≺ ≺ sider the first term m / T . Since κ O √m , our result subsume set the vertices are ordered arbitrarily. Then, in the YES instance, Θ 3 2 / ( ) √ the bound of m / T . Compared between mκ T and m T , the d p and in the NO instance d 2p, proving our claim. ( / ) 2 / / max≺ ≤ max≺ ≤ former is smaller when T > κ . Necessarily, in our lower bound We now set the parameters p and q as p = κ and q = κr 2. Then, 2 − proofs, we will be dealing with graph instances such that T > κ . m = Θ Npq since p = O Nq . Assume there is a constant pass Ω ( ) ( ) For the purpose of proving a mκ T lower bound, it is suf- o mκ T -space streaming algorithm for the Triagnle-Detection ( / ) ( / ) A ficient to show that the Triangle-Detection problem requires problem. Then, following standard reduction, can be used to Ω A mκ T bits of space for some specific choice of parameters. How- solve the disjN problem with o Npq p p2q = o N bits of com- ( / ) N 3 ever, we cover the entire possible range of spectrum. Fix two pa- / ( · / ) ( ) munication, contradicting the lower bound for the disjN prob- rameters κ and r, such that r 2. Then, we can construct an in- N 3 ≥ / stance of the Triangle-Detection problem with degeneracy Θ κ lem.  ( ) and T = κr such that solving it requires Ω mκ T bits of space. ( / ) So in effect, we prove a more nuanced and arguably more general 7 FUTURE DIRECTIONS theorem than the one give in Theorem 6.1. Formally, we show the In this paper, we studied the streaming complexity of the triangle following. counting problem in bounded degeneracy graphs. As we empha- Theorem 6.3. Let κ and r be parameters such that r 2. Then sized in the introduction, low degeneracy is an often observed char- ≥ there is a family of instances with degeneracy Θ κ andT = κr such acteristics of real-world graphs. Designing streaming algorithms ( ) that solving the Triangle-Detection problem requires Ω mκ T with better bounds on such graphs (compared to the worst case) is ( / ) bits of space. an important research direction. There have been some recent suc- cesses in this context — graph coloring [12], matching size estima- Proof. We reduce from the disjN problem. Let x, y be the tion [6, 23, 32], independent set size approximation [21]. It would N 3 ( ) input instance for this problem. We then/ construct an input G for be interesting to explore what other problems can admit better the triangle-detection problem such that if x,y is a YES in- streaming algorithms in bounded degeneracy graphs. One natural ( ) stance, then G 1, and otherwise G 2. The graph G has a candidate is the arbitrary fixed size subgraph counting problem, ∈ G ∈ G which asks for the number of occurrences of the subgraph in the 4See [43] for the definition of the notion randomized communication complexity. given input graph. 9 Do there exist streaming algorithms for approximate subgraph [21] Graham Cormode, Jacques Dark, and Christian Konrad. 2017. Independent set counting on low degeneracy graphs that can beat known worst-case size approximation in graph streams. arXiv preprint arXiv:1702.08299 (2017). [22] Graham Cormode and Hossein Jowhari. 2014. A second look at counting trian- lower bounds? gles in graph streams. Theoretical Computer Science 552 (2014), 44–51. We conclude this exposition with the following conjecture about [23] Graham Cormode, Hossein Jowhari, Morteza Monemizadeh, and Shanmugave- layutham Muthukrishnan. 2017. The sparse awakens: Streaming algorithms for the fixed size clique-counting problem. matching size estimation in sparse graphs. In 25th European Symposium on Al- gorithms, ESA 2017. Schloss Dagstuhl-Leibniz-Zentrum fur Informatik GmbH, Conjecture 7.1. Consider a graph G with degeneracy κ that has Dagstuhl Publishing, 29. T many ℓ-cliques. There exists a constant pass streaming algorithm [24] M. Danisch, O. D. Balalau, and M. Sozio. 2018. Listing k-cliques in Sparse Real- ℓ World Graphs. In Conference on the World Wide Web (WWW). 589–598. that outputs a 1 ε -approximation to T using O mκ 2 T bits of ( ± ) ( − / ) [25] Nurcan Durak, Ali Pinar, Tamara G Kolda, and C Seshadhri. 2012. Degree re- space. lations of triangles in real-world networks and graph models. In Proceedings of e the 21st ACM international conference on Information and knowledge management. ACM, 1712–1716. ACKNOWLEDGMENTS [26] T. Eden, A. Levi, D. Ron, and C Seshadhri. 2015. Approximately counting trian- gles in sublinear time. In Foundations of Computer Science (FOCS). 614–633. The authors would like to thank the anonymous reviewers for their [27] Talya Eden, Dana Ron, and C Seshadhri. 2017. Sublinear Time Estimation of De- valuable feedback. The authors are supported by NSF TRIPODS gree Distribution Moments: The Degeneracy Connection. In 44th International grant CCF-1740850, NSF CCF-1813165, CCF-1909790, and ARO Award Colloquium on Automata, Languages, and Programming (ICALP 2017). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik. W911NF1910294. [28] T. Eden, D. Ron, and C. Seshadhri. 2018. Faster sublinear approximations of k-cliques for low arboricity graphs. arXiv:cs.DS/1811.04425 [29] T. Eden, D. Ron, and C. Seshadhri. 2018. On approximating the number of k- REFERENCES cliques in sublinear time. In Symposium on Theory of Computing (STOC). 722– [1] [n.d.]. Graph Degeneracy. https://en.wikipedia.org/wiki/Degeneracy_(graph_theory). 734. [30] T. Eden, D. Ron, and C. Seshadhri. 2020. Faster sublinear approximation of the [2] N. Alon, R. Yuster, and U. Zwick. 1997. Finding and counting given length cycles. number of k-cliques in low-arboricity graphs. In Symposium on Discrete Algo- Algorithmica 17, 3 (1997), 209–223. rithms (SODA). [3] Gerold Alsmeyer. 2011. Chebyshev’s Inequality. Springer Berlin Heidelberg, [31] David Eppstein. 1994. Arboricity and bipartite subgraph listing algorithms. In- Berlin, Heidelberg, 239–240. formation processing letters 51, 4 (1994), 207–211. [4] S. Arifuzzaman, M. Khan, and M. Marathe. 2013. Patric: A parallel algorithm [32] Hossein Esfandiari, Mohammadtaghi Hajiaghayi, Vahid Liaghat, Morteza Mon- for counting triangles in massive networks. In Proceedings of the International emizadeh, and Krzysztof Onak. 2018. Streaming algorithms for estimating the Conference on Information and Knowledge Management (CIKM). 529–538. matching size in planar graphs and beyond. ACM Transactions on Algorithms [5] Sepehr Assadi, Michael Kapralov, and Sanjeev Khanna. 2018. A Simple Sublinear- (TALG) 14, 4 (2018), 48. Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling. In Proc. [33] I. Finocchi, M. Finocchi, and E. G. Fusco. 2015. Clique Counting in MapReduce: 10th Conference on Innovations in Theoretical Computer Science. Algorithms and Experiments. ACM Journal of Experimental Algorithmics 20 [6] Sepehr Assadi, Sanjeev Khanna, and Yang Li. 2017. On estimating maximum (2015), 1–7. https://doi.org/10.1145/2794080 matching size in graph streams. In Proceedings of the Twenty-Eighth Annual [34] David Garcıa-Soriano and Konstantin Kutzkov. 2014. Triangle counting in ACM-SIAM Symposium on Discrete Algorithms. SIAM, 1723–1742. streamed graphs via small vertex covers. Tc 2 (2014), 3. [7] Albert Atserias, Martin Grohe, and Dániel Marx. 2008. Size bounds and query [35] G. Goel and J. Gustedt. 2006. Bounded arboricity to determine the local struc- plans for relational joins. In 2008 49th Annual IEEE Symposium on Foundations ture of sparse graphs. In International Workshop on Graph-Theoretic Concepts in of Computer Science. IEEE, 739–748. Computer Science. Springer, 159–167. [8] H. Avron. 2010. Counting triangles in large graphs using randomized matrix [36] S. Jain and C. Seshadhri. 2017. A Fast and Provable Method for Estimating Clique trace estimation. In Workshop on Large-scale Data Mining: Theory and Applica- Counts Using Turán’s Theorem. In Conference on the World Wide Web (WWW). tions (LDMTA), Vol. 10. 10–9. 441–449. [9] Ziv Bar-Yossef, Ravi Kumar, and D. Sivakumar. 2002. Reductions in Streaming [37] Madhav Jha, C Seshadhri, and Ali Pinar. 2013. A space efficient streaming al- Algorithms, with an Application to Counting Triangles in Graphs. In Proc. 13th gorithm for triangle counting using the birthday paradox. In Proc. 19th Annual Annual ACM-SIAM Symposium on Discrete Algorithms. 623–632. SIGKDD International Conference on Knowledge Discovery and Data Mining. 589– [10] Suman Bera, Noujan Pashanasangi, and C. Seshadhri. 2020. Linear Time Sub- 597. graph Counting, Graph Degeneracy, and the Chasm at Size Six. In Conference on [38] Hossein Jowhari and Mohammad Ghodsi. 2005. New streaming algorithms for Innovations in Theoretical Computer Science Proc. 11th. counting triangles in graphs. In Computing and Combinatorics. Springer, 710– [11] Suman K. Bera and Amit Chakrabarti. 2017. Towards Tighter Space Bounds for 716. Counting Triangles and Other Substructures in Graph Streams. In 34th Sympo- [39] John Kallaugher and Eric Price. 2017. A hybrid sampling scheme for triangle sium on Theoretical Aspects of Computer Science (STACS 2017), Vol. 66. 11:1–11:14. counting. In Proceedings of the Twenty-Eighth Annual ACM-SIAM Symposium on [12] Suman K Bera, Amit Chakrabarti, and Prantar Ghosh. 2019. Graph Coloring via Discrete Algorithms. Society for Industrial and Applied Mathematics, 1778–1797. Degeneracy in Streaming and Other Space-Conscious Models. arXiv preprint [40] B. Kalyanasundaram and G. Schintger. 1992. The probabilistic communication arXiv:1905.00566 (2019). complexity of set intersection. SIAM Journal on Discrete Mathematics 5, 4 (1992), [13] Vladimir Braverman, Rafail Ostrovsky, and Dan Vilenchik. 2013. How Hard Is 545–557. Counting Triangles in the Streaming Model?. In Proc. 40th International Collo- [41] Daniel M Kane, Kurt Mehlhorn, Thomas Sauerwald, and He Sun. 2012. Counting quium on Automata, Languages and Programming. 244–254. arbitrary subgraphs in data streams. In Proc. 39th International Colloquium on [14] Luciana S. Buriol, Gereon Frahling, Stefano Leonardi, Alberto Marchetti- Automata, Languages and Programming. 598–609. Spaccamela, and Christian Sohler. 2006. Counting Triangles in Data Streams. [42] M. N. Kolountzakis, G. L. Miller, R. Peng, and C. E. Tsourakakis. 2012. Efficient In Proc. 25th ACM Symposium on Principles of Database Systems. 253–262. triangle counting in large graphs via degree-based vertex partitioning. Internet [15] Amit Chakrabarti.[n.d.]. CS49: Data Stream Algorithms Lecture Notes, Fall 2011. Mathematics 8, 1-2 (2012), 161–185. http://www.cs.dartmouth.edu/~ac/Teach/data-streams-lecnotes.pdf [43] Eyal Kushilevitz. 1997. Communication complexity. In Advances in Computers. [16] MT Chao. 1982. A general purpose unequal probability sampling plan. Vol. 44. 331–360. Biometrika 69, 3 (1982), 653–656. [44] Konstantin Kutzkov and Rasmus Pagh. 2014. Triangle counting in dynamic [17] H. Chernoff. 1952. A Measure of Asymptotic Efficiency for Tests of a Hypothesis graph streams. In Proc. 14th Scandinavian Symposium and Workshops on Algo- Based on the sum of Observations. Annals of Mathematical Statistics 23, 4 (1952), rithm Theory. 306–318. 493–507. [45] Madhusudan Manjunath, Kurt Mehlhorn, Konstantinos Panagiotou, and He Sun. [18] N. Chiba and T. Nishizeki. 1985. Arboricity and subgraph listing algorithms. 2011. Approximate Counting of Cycles in Streams. In Proc. 19th Annual European SIAM J. Comput. 14, 1 (1985), 210–223. Symposium on Algorithms. 677–688. [19] Norishige Chiba and Takao Nishizeki. 1985. Arboricity and Subgraph Listing [46] Andrew McGregor, Sofya Vorotnikova, and Hoa T. Vu. 2016. Better Algorithms Algorithms. SIAM J. Comput. 14, 1 (1985), 210–223. for Counting Triangles in Data Streams. In Proceedings of the 35th ACM SIGMOD- [20] S. Chu and J. Cheng. 2011. Triangle listing in massive networks and its applica- SIGACT-SIGAI Symposium on Principles of Database Systems. 401–411. tions. In Proceedings of the International Conference on Knowledge Discovery and [47] Rasmus Pagh and Charalampos E. Tsourakakis. 2012. Colorful triangle counting Data Mining (SIGKDD). 672–680. and a mapreduce implementation. Inform. Process. Lett. 112, 7 (2012), 277–281. 10 [48] Aduri Pavan, Kanat Tangwongsan, Srikanta Tirthapura, and Kun-Lung Wu. 2013. on Data Mining (ICDM), Vol. 4. 5. http://arxiv.org/abs/1202.5230 Counting and sampling triangles from a graph stream. Proceedings of the VLDB [55] K. Shin, T. Eliassi-Rad, and C. Faloutsos. 2018. Patterns and anomalies in k-cores Endowment 6, 14 (2013), 1870–1881. of real-world graphs with applications. Knowledge and Information Systems 54, [49] A. A. Razborov. 1992. On the distributional complexity of disjointness. Theoret- 3 (2018), 677–710. ical Computer Science 106, 2 (1992), 385–390. [56] S. Suri and S. Vassilvitskii. 2011. Counting triangles and the curse of the last re- [50] Alessandra Sala, Lili Cao, Christo Wilson, Robert Zablit, Haitao Zheng, and ducer. In Proceedings of the International Conference on World Wide Web (WWW). Ben Y Zhao. 2010. Measurement-calibrated graph models for social network 607–614. https://doi.org/10.1145/1963405.1963491 experiments. In Proceedings of the 19th international conference on World wide [57] K. Tangwongsan, A. Pavan, and S. Tirthapura. 2013. Parallel Triangle Counting web. ACM, 861–870. in Massive Streaming Graphs. In Proceedings of the International Conference on [51] T. Schank and D. Wagner. 2005. Approximating Clustering Coefficient and Tran- Information and Knowledge Management (CIKM). ACM, 781–786. sitivity. Journal of Graph Algorithms and Applications 9 (2005), 265–275. Issue [58] C. E. Tsourakakis. 2008. Fast counting of triangles in large real networks with- 2. out counting: Algorithms and laws. In International Conference on Data Mining [52] T. Schank and D. Wagner. 2005. Finding, Counting and Listing All Triangles in (ICDM). 608–617. Large Graphs, an Experimental Study. In Experimental and Efficient Algorithms. [59] C. E. Tsourakakis, U. Kang, G.L. Miller, and C. Faloutsos. 2009. Doulion: count- 606–609. ing triangles in massive graphs with a coin. In Proceedings of the International [53] Comandur Seshadhri, Tamara G Kolda, and Ali Pinar. 2012. Community struc- Conference on Knowledge Discovery and Data Mining (SIGKDD). 837–846. ture and scale-free collections of Erdős-Rényi graphs. Physical Review E 85, 5 [60] C.E. Tsourakakis,M. N.Kolountzakis,and G. L.Miller. 2011. Triangle Sparsifiers. (2012), 056109. Journal of Graph Algorithms and Applications 15, 6 (2011), 703–726. [54] C. Seshadhri, A. Pinar, and T. G. Kolda. 2013. Fast Triangle Counting through [61] Duncan J Watts and Steven H Strogatz. 1998. Collective dynamics of âĂŸsmall- Wedge Sampling. arXiv:1202.5230. In Proceedings of the International Conference worldâĂŹnetworks. nature 393, 6684 (1998), 440.

11