Algebraic Methods in the Congested Clique∗

Keren Censor-Hillel† (Technion, [email protected])
Petteri Kaski‡ (Helsinki Institute for Information Technology & Aalto University, petteri.kaski@aalto.fi)
Janne H. Korhonen (Helsinki Institute for Information Technology & University of Helsinki, janne.h.korhonen@helsinki.fi)
Christoph Lenzen (MPI for Informatics, [email protected])
Ami Paz (Technion, [email protected])
Jukka Suomela (Helsinki Institute for Information Technology & Aalto University, jukka.suomela@aalto.fi)

ABSTRACT

In this work, we use algebraic methods for studying distance computation and subgraph detection tasks in the congested clique model. Specifically, we adapt parallel matrix multiplication implementations to the congested clique, obtaining an $O(n^{1-2/\omega})$ round matrix multiplication algorithm, where $\omega < 2.3728639$ is the exponent of matrix multiplication. In conjunction with known techniques from centralised algorithmics, this gives significant improvements over previous best upper bounds in the congested clique model. The highlight results include:

– triangle and 4-cycle counting in $O(n^{0.158})$ rounds, improving upon the $O(n^{1/3})$ triangle counting algorithm of Dolev et al. [DISC 2012],
– a $(1 + o(1))$-approximation of all-pairs shortest paths in $O(n^{0.158})$ rounds, improving upon the $\tilde{O}(n^{1/2})$-round $(2 + o(1))$-approximation algorithm of Nanongkai [STOC 2014], and
– computing the girth in $O(n^{0.158})$ rounds, which is the first non-trivial solution in this model.

In addition, we present a novel constant-round combinatorial algorithm for detecting 4-cycles.

Categories and Subject Descriptors

F.1.1 [Computation by Abstract Devices]: Models of Computation; F.2.0 [Analysis of Algorithms and Problem Complexity]: General; G.2.2 [Mathematics of Computing]: Discrete Mathematics – Graph Theory

Keywords

Distributed computing; congested clique model; lower bounds; matrix multiplication; subgraph detection; distance computation

∗ A full version of this paper is available at [18].
† Supported by ISF Individual Research Grant 1696/14.
‡ Supported by the European Research Council under the European Union's Seventh Framework programme (FP/2007–2013) / ERC Grant Agreement n. 338077.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
PODC'15, July 21–23, 2015, Donostia-San Sebastián, Spain.
Copyright © 2015 ACM 978-1-4503-3617-8/15/07 ...\$15.00.
DOI: http://dx.doi.org/10.1145/2767386.2767414

1. INTRODUCTION

Algebraic methods have become a recurrent tool in centralised algorithmics, employing a wide range of techniques (e.g., [10–16, 21–23, 27, 29, 30, 44, 45, 51, 59, 72, 73]). In this paper, we bring techniques from the algebraic toolbox to the aid of distributed computing, by leveraging fast matrix multiplication in the congested clique model.

In the congested clique model, the $n$ nodes of a graph $G$ communicate by exchanging messages of $O(\log n)$ size in a fully-connected synchronous network; initially, each node is aware of its neighbours in $G$. In comparison with the traditional CONGEST model [61], the key difference is that a pair of nodes can communicate directly even if they are not adjacent in graph $G$. The congested clique model masks away the effect of distances on the computation and focuses on the limited bandwidth. As such, it has been recently gaining increasing attention [25, 26, 37, 38, 47, 50, 52, 58, 60, 64], in an attempt to understand the relative computational power of distributed computing models.

The key insight of this paper is that matrix multiplication algorithms from parallel computing can be adapted to obtain an $O(n^{1-2/\omega})$ round matrix multiplication algorithm in the congested clique, where $\omega < 2.3728639$ is the matrix multiplication exponent [34]. Combining this with well-known centralised techniques allows us to use fast matrix multiplication to solve various combinatorial problems, immediately giving $O(n^{0.158})$-time algorithms in the congested clique for many classical graph problems. Indeed, while most of the techniques we use in this work are known beforehand, their combination gives significant improvements over the best previously known upper bounds. Table 1 contains a summary of our results, which we overview in more detail in what follows.

| Problem | Running time (this work) | Running time (prior work) |
|---|---|---|
| mat. mult. (semiring) | $O(n^{1/3})$ | – |
| mat. mult. (ring) | $O(n^{0.158})$ | $O(n^{0.373})$ [26] |
| triangle counting | $O(n^{0.158})$ | $O(n^{1/3}/\log n)$ [25] |
| 4-cycle detection | $O(1)$ | $O(n^{1/2}/\log n)$ [25] |
| 4-cycle counting | $O(n^{0.158})$ | $O(n^{1/2}/\log n)$ [25] |
| $k$-cycle detection | $2^{O(k)} n^{0.158}$ | $O(n^{1-2/k}/\log n)$ [25] |
| girth | $O(n^{0.158})$ | – |
| APSP: weighted, directed | $O(n^{1/3} \log n)$ | – |
| · weighted diameter $U$ | $O(U n^{0.158})$ | – |
| · $(1 + o(1))$-approx. | $O(n^{0.158})$ | – |
| · $(2 + o(1))$-approx. | | $\tilde{O}(n^{1/2})$ [58] |
| APSP: unweighted, undirected | $O(n^{0.158})$ | – |
| · $(2 + o(1))$-approx. | | $\tilde{O}(n^{1/2})$ [58] |

Table 1: Our results versus prior work, using $\omega < 2.3729$ [34]; $\tilde{O}$ hides polylogarithmic factors.

1.1 Matrix Multiplication in the Congested Clique

As a basic primitive, we consider the computation of the product $P = ST$ of two $n \times n$ matrices $S$ and $T$ on a congested clique of $n$ nodes. We will tacitly assume that the matrices are initially distributed so that node $v$ has row $v$ of both $S$ and $T$, and it will receive row $v$ of $P$ in the end. Recall that the matrix multiplication exponent $\omega$ is defined as the infimum over $\sigma$ such that the product of two $n \times n$ matrices can be computed with $O(n^{\sigma})$ arithmetic operations; it is known that $2 \le \omega < 2.3728639$ [34], and it is conjectured, though not unanimously, that $\omega = 2$.

Theorem 1. The product of two $n \times n$ matrices can be computed in a congested clique of $n$ nodes in $O(n^{1/3})$ rounds over semirings. Over rings, $O(n^{1-2/\omega+\varepsilon})$ rounds suffice (for any constant $\varepsilon > 0$).

Theorem 1 follows by adapting known parallel matrix multiplication algorithms for semirings [1, 55] and rings [7, 53, 56, 71] to the clique model, via the routing technique of Lenzen [47]. In fact, with little extra work one can show that the resulting algorithm is also oblivious, that is, the communication pattern is predefined and does not depend on the input matrices. Hence, the oblivious routing technique of Dolev et al. [25] suffices for implementing these algorithms.

The above addresses matrices whose entries can be encoded with $O(\log n)$ bits, which is sufficient for dealing with integers of absolute value at most $n^{O(1)}$. In general, if $b$ bits are sufficient to encode matrix entries, the bounds above hold [...]

[...] exists a matrix multiplication algorithm in the congested clique running in $O(n^{\sigma})$ rounds. In this notation, Theorem 1 gives us

$$\rho \le 1 - 2/\omega < 0.15715;$$

prior to this work, it was known that $\rho \le \omega - 2$ [26].

For the rest of this paper, we will – as is convention for centralised algorithmics – slightly abuse notation by writing $n^{\rho}$ for the complexity of matrix multiplication in the congested clique. This hides factors of $O(n^{\varepsilon})$ resulting from the fact that $\rho$ is defined as the infimum of an infinite set. We also use $\tilde{O}$ and $\tilde{\Omega}$ notation to hide polylogarithmic factors.

Lower bounds for matrix multiplication. Our results are optimal in the sense that for any sequential matrix multiplication implementation, no scheme simulating it on the congested clique can give a faster algorithm than the construction underlying Theorem 1; this follows from known results for parallel matrix multiplication [2, 8, 42, 70]. Moreover, we note that for the broadcast congested clique model, where each node is required to send the same message to all nodes in any given round, recent lower bounds [39] imply that matrix multiplication requires $\tilde{\Omega}(n)$ rounds.

1.2 Applications in Cycle Detection and Counting

Our first application of fast matrix multiplication is to the problems of triangle counting [43] and 4-cycle counting.

Corollary 2. On directed and undirected graphs, counting triangles and 4-cycles takes $O(n^{\rho})$ rounds.

For $\rho \le 1 - 2/\omega$, this is an improvement upon the previously best known $O(n^{1/3})$-round triangle detection algorithm of Dolev et al. [25]. In particular, we disprove the conjecture of Dolev et al. [25] that any deterministic oblivious algorithm for detecting triangles requires $\tilde{\Omega}(n^{1/3})$ rounds.

When only detection of cycles is required, we observe that combining fast distributed matrix multiplication with the well-known technique of colour-coding [5] allows us to detect $k$-cycles in $\tilde{O}(n^{\rho})$ rounds for any constant $k$. This improves upon the subgraph detection algorithm of Dolev et al. [25], which requires $\tilde{O}(n^{1-2/k})$ rounds for detecting (arbitrary) subgraphs of $k$ nodes.

Corollary 3. For any graph, the existence of $k$-cycles can be detected in $2^{O(k)} n^{\rho} \log n$ rounds.

For the specific case of $k = 4$, we provide a novel algorithm that does not use matrix multiplication and detects 4-cycles in only $O(1)$ rounds.

Theorem 4. On undirected graphs, the existence of 4-cycles can be detected in $O(1)$ rounds. If such a cycle exists, one such cycle can also be reported in $O(1)$ rounds.
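To make the link between matrix multiplication and cycle counting in Corollary 2 concrete, the following is a minimal sequential sketch of the standard trace identities for undirected simple graphs. It is an illustration under stated assumptions, not the distributed algorithm of this paper: the function names (`mat_mul`, `count_triangles`, `count_4_cycles`, `complete_graph`) are hypothetical, the matrix product is the naive cubic one, and the formulation used in the full version [18] may differ. The point is only that both counts reduce to a single product $A^2$ of the adjacency matrix plus quadratic extra work.

```python
def mat_mul(a, b):
    """Naive product of two square integer matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def count_triangles(adj):
    """Triangles in an undirected simple graph: trace(A^3) / 6."""
    n = len(adj)
    a2 = mat_mul(adj, adj)
    # trace(A^3) = sum over i, j of (A^2)[i][j] * A[j][i]
    closed_walks_3 = sum(a2[i][j] * adj[j][i] for i in range(n) for j in range(n))
    # Each triangle is counted once per start vertex (3) and direction (2).
    return closed_walks_3 // 6


def count_4_cycles(adj):
    """4-cycles in an undirected simple graph, from trace(A^4) and the degrees."""
    n = len(adj)
    a2 = mat_mul(adj, adj)
    # trace(A^4) = sum over i, j of (A^2)[i][j] * (A^2)[j][i]
    closed_walks_4 = sum(a2[i][j] * a2[j][i] for i in range(n) for j in range(n))
    degrees = [sum(row) for row in adj]
    # Closed 4-walks that are not cycles: v-a-v-b-v contributes deg(v)^2 per v,
    # and v-a-b-a-v with b != v contributes deg(a) * (deg(a) - 1) per a.
    degenerate = 2 * sum(d * d for d in degrees) - sum(degrees)
    # Each 4-cycle is counted once per start vertex (4) and direction (2).
    return (closed_walks_4 - degenerate) // 8


def complete_graph(n):
    """Adjacency matrix of the complete graph on n vertices."""
    return [[1 if i != j else 0 for j in range(n)] for i in range(n)]


k4 = complete_graph(4)
print(count_triangles(k4))  # 4
print(count_4_cycles(k4))   # 3
```

In both functions the sum for a fixed `i` only touches row `i` of $A^2$ and, using symmetry of the adjacency matrix, row `i` of $A$ or $A^2$, which is consistent with the row-wise distribution assumed in Section 1.1.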

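As a worked check of the numerical exponents quoted in the abstract, Table 1 and Section 1.1, the arithmetic below plugs the bound $\omega < 2.3728639$ [34] into the two expressions for the round complexity. The decimal values are rounded and are given only as a sanity check, not as an additional result.

$$
1 - \frac{2}{\omega} \;<\; 1 - \frac{2}{2.3728639} \;\approx\; 1 - 0.842863 \;=\; 0.157137 \;<\; 0.15715 \;<\; 0.158,
$$

$$
\omega - 2 \;<\; 2.3728639 - 2 \;=\; 0.3728639 \;<\; 0.373 .
$$

The first line matches the $O(n^{0.158})$ entries for this work, and the second matches the $O(n^{0.373})$ prior bound for ring matrix multiplication attributed to [26].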