Conditional Hardness Results for Massively Parallel Computation from Distributed Lower Bounds

2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS)

Mohsen Ghaffari, Department of Computer Science, ETH Zürich, Zürich, Switzerland. Email: [email protected]
Fabian Kuhn, Department of Computer Science, University of Freiburg, Freiburg, Germany. Email: [email protected]
Jara Uitto, Department of Computer Science, Aalto University, Espoo, Finland. Email: jara.uitto@aalto.fi

©2019 IEEE. DOI 10.1109/FOCS.2019.00099

Abstract: We present the first conditional hardness results for massively parallel algorithms for some central graph problems including (approximating) maximum matching, vertex cover, maximal independent set, and coloring. In some cases, these hardness results match or get close to the state-of-the-art algorithms. Our hardness results are conditioned on a widely believed conjecture in massively parallel computation about the complexity of the connectivity problem. We also note that an unconditional variant of such hardness results is known to be somewhat out of reach for now, as it would lead to considerably improved circuit complexity lower bounds and would concretely imply that $\mathrm{NC}^1$ is a proper subset of $\mathrm{P}$. We obtain our conditional hardness results via a general method that lifts unconditional lower bounds from the well-studied LOCAL model of distributed computing to the massively parallel computation setting.

Keywords: Parallel algorithms; Algorithm design and analysis.

I. INTRODUCTION AND RELATED WORK

We present conditional hardness results for massively parallel algorithms for some central graph problems. Our main technical contribution is a general method that lifts unconditional lower bounds from the more classic LOCAL model of distributed computing to the massively parallel computation setting. We start with a review of the massively parallel computation setting and the state of the art for graph problems, and we then present our results.

A. Massively Parallel Computation: Formal Model and State of the Art

Massively parallel computation and the emergence of several popular practical frameworks for it, such as MapReduce [1], Hadoop [2], Spark [3], and Dryad [4], have been among the impactful developments of the past 10 to 15 years in computer science. There is also a growing interest from theoretical computer science in building the theoretical foundations of computation in these settings. This growth is fueled by the understanding that (such coarse-grained notions of) parallelism will be a necessity in the future of computation, as data sets are growing faster than the capacity of single processors. Karloff et al. [5] introduced a theoretical model, by now referred to as the Massively Parallel Computation (MPC) model, as a simple abstraction that captures essential aspects of parallelism in these settings. This model and algorithmic developments in it have been receiving increasing attention over the past few years [5]–[28].

The MPC model: We have a number of machines, each with $S$ bits of memory, that can communicate with each other in synchronous rounds. Per round, each machine can send $O(S)$ bits to the other machines in total, it can receive $O(S)$ bits from other machines, and it can also perform some local computation, ideally of time at most $\mathrm{poly}(S)$. We assume that the number of machines $M$ is at least as large as $N/S$, where $N$ is the size of the input, but ideally not much larger. For graph problems, the input size is $N = O(m+n)$, where $m$ denotes the number of edges and $n$ denotes the number of vertices. We sometimes refer to $S$ as the local memory of the system, while $M \cdot S$ is the global memory of the system. The common interpretation is that, when trying to apply such algorithms to a setting, the parameter $S$ will be dictated to us by what is physically available, and we then should achieve the best round complexity for that local memory. Hence, the main objective is to obtain MPC algorithms that have a small round complexity for local memory that is as small as possible. Ideally, the algorithm should use global memory that is almost equal to the input size, though we note that minimizing the global memory is sometimes treated as a secondary target.

State of the Art for Graph Problems: Several graph problems have been studied in the MPC model. We briefly review the state of the art for some of the central graph problems.

One of the most central problems is connectivity, i.e., identifying the connected components of a graph. Karloff et al. [5] showed that this problem can be solved in $O(\log n)$ rounds using machines with local memory $S = n^{\alpha}$ for any constant $\alpha \in (0, 1)$, using Boruvka-style algorithms. Whether this round complexity can be improved remains unknown, but there is strong evidence to suggest that this bound is the best possible: Beame et al. [8] showed that a natural class of algorithms cannot run in $o(\log n)$ rounds. The problem is open even for the seemingly trivial case where the algorithm should distinguish whether the input is one $n$-node cycle or two $n/2$-node cycles. The problem has by now come to be believed to be inherently hard, and is conjectured to require $\Omega(\log n)$ rounds. Proving the $\Omega(\log n)$ lower bound might be hard, however. In fact, as argued by Roughgarden et al. [13], proving any super-constant round complexity MPC lower bound for any problem in $\mathrm{P}$ and for $S = n^{\alpha}$ is presumably a difficult task, as it would imply strong circuit complexity lower bounds and in particular show that $\mathrm{NC}^1 \subsetneq \mathrm{P}$. That is, it would show that there is a problem in $\mathrm{P}$ that does not admit a logspace-uniform circuit family of fan-in 2 with $\mathrm{poly}(n)$ gates and $O(\log n)$ depth. Recently, Andoni et al. [23] showed that if each component of the input has diameter at most $D$, then the round complexity can be improved to $O(\log D \cdot \log\log_{MS/n} n)$. For much larger local memory, there are faster algorithms. Lattanzi et al. [7] gave an $O(1)$-round algorithm for the strongly superlinear memory regime, where $S = n^{1+\varepsilon}$ for a constant $\varepsilon > 0$. An $O(1)$-round algorithm for near-linear local memory of $S = O(n \log^3 n)$ is implicit in the sketching work of Ahn, Guha, and McGregor for the streaming setting [29], [30], and an $O(1)$-round algorithm for linear local memory $S = O(n)$ is implied by the results of Jurdinski and Nowicki [31].

Another central graph problem in the recent developments has been maximum matching and its approximation, as well as other closely related problems such as vertex cover and maximal independent set (MIS). The filtering method of Lattanzi et al. [7] gives an $O(1)$-round algorithm for maximal matching, and thus a 2-approximation of maximum matching, for strongly superlinear local memory $S = n^{1+\Omega(1)}$. However, the task is significantly harder for lower memory regimes. A recent breakthrough of Czumaj et al. [15] gave an $O((\log\log n)^2)$-round algorithm for $(1+\varepsilon)$-approximation of maximum matching when $S = \tilde{O}(n)$, for any constant $\varepsilon > 0$. The round complexity was later improved to $O(\log\log n)$ by Assadi et al. [17] and Ghaffari et al. [18]. Similar approaches give $O(\log\log n)$-round algorithms for $(2+\varepsilon)$-approximation of minimum vertex cover [18] and for computing a maximal independent set, both for $S = \tilde{O}(n)$. [...] et al. [32] for graphs with arboricity at most $\lambda$ for $(1+\varepsilon)$-approximation of maximum matching, maximal independent set, maximal matching, and 2-approximation of vertex cover.

Another basic graph problem that was studied recently is vertex coloring: Harvey et al. [19] gave an $O(1)$-round algorithm for $(1+o(1))\Delta$-coloring for strongly superlinear local memory $S = n^{1+\varepsilon}$, Assadi et al. [28] gave an $O(1)$-round algorithm for $(\Delta+1)$-coloring for near-linear local memory $S = O(n \log^3 n)$, and Chang et al. [33] gave a $(\Delta+1)$-coloring algorithm for the strongly sublinear memory regime of $S = n^{\alpha}$, for any constant $\alpha \in (0, 1)$, that runs in $O(\sqrt{\log\log n})$ rounds.

To the best of our knowledge, there are no hardness or conditional hardness results (and in particular no super-constant ones) known for any of the above problems in the MPC setting, including matching approximation, vertex cover approximation, maximal independent set, and coloring. The only exception is the hardness for the connectivity problem discussed above: an $\Omega(\log n)$-round lower bound is conjectured and widely believed to hold in the strongly sublinear local memory regime, even for distinguishing one cycle from two cycles.

B. Our Contribution

We present conditional hardness results for several of the above graph problems (matching and vertex cover approximation, vertex coloring, maximal independent set, etc.) for the strongly sublinear memory regime where each machine has memory $S = n^{\alpha}$ for any constant $\alpha \in (0, 1)$. Our hardness results apply to any component-stable MPC algorithm, that is, any algorithm in which the outputs in one connected component are independent of the other components. See Section II for a formal definition. To the best of our knowledge, all known algorithms in the literature are component-stable or can easily be made component-stable with no asymptotic increase in the round complexity. Our conditional hardness results are based on the popular conjecture mentioned above that distinguishing one cycle from two cycles needs $\Omega(\log n)$ rounds in the strongly sublinear memory regime. The hardness can be based also on a seemingly even more robust assumption that
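To make the parameters of the MPC model from Section I-A concrete, the following minimal Python sketch computes the local memory S = n^α, a number of machines M that is at least N/S, and the resulting global memory M·S for a graph with n vertices and m edges. The function name `mpc_parameters` is hypothetical, memory is counted in abstract units rather than bits, and the constant hidden in N = O(m+n) is taken to be 1; these are simplifying assumptions for illustration and are not part of the paper.

```python
import math


def mpc_parameters(n, m, alpha):
    """Illustrative MPC parameters for the strongly sublinear regime.

    Assumptions (not from the paper): memory is counted in abstract
    units, and the input size is taken to be exactly N = n + m.
    Returns (S, M, global_memory).
    """
    if not 0 < alpha < 1:
        raise ValueError("strongly sublinear regime needs 0 < alpha < 1")
    input_size = n + m                               # N = O(m + n), constant set to 1
    local_memory = max(1, math.ceil(n ** alpha))     # S = n^alpha
    machines = math.ceil(input_size / local_memory)  # smallest M with M >= N / S
    return local_memory, machines, machines * local_memory


# Example: 10,000 vertices, 50,000 edges, alpha = 1/2.
S, M, G = mpc_parameters(10_000, 50_000, 0.5)
print(f"S = {S}, M = {M}, global memory M*S = {G}")
```

With these choices the global memory is within one machine's worth of the input size, which matches the stated ideal that global memory be almost equal to the input size.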

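The one-cycle versus two-cycles instance behind the connectivity conjecture can be explored with a small sequential simulation. The sketch below runs Boruvka-style phases, in which every current component merges along one outgoing edge, and reports the number of components together with the number of phases used. It is only an illustration under stated assumptions: the helper names `cycle_edges` and `boruvka_components` are hypothetical, the code counts logical merging phases rather than MPC rounds, it does not model machines or memory limits, and it is not claimed to be the algorithm of Karloff et al. [5].

```python
import math


def cycle_edges(start, length):
    """Edges of a cycle on vertices start, ..., start + length - 1."""
    return [(start + i, start + (i + 1) % length) for i in range(length)]


def boruvka_components(n, edges):
    """Return (number of components, number of Boruvka-style phases)."""
    parent = list(range(n))

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    phases = 0
    while True:
        # Every component picks its smallest neighboring component.
        choice = {}
        for u, v in edges:
            ru, rv = find(u), find(v)
            if ru != rv:
                if ru not in choice or rv < choice[ru]:
                    choice[ru] = rv
                if rv not in choice or ru < choice[rv]:
                    choice[rv] = ru
        if not choice:
            break  # no edge leaves any component: done
        # Merge each component with the component it picked.
        for root, target in choice.items():
            a, b = find(root), find(target)
            if a != b:
                parent[max(a, b)] = min(a, b)
        phases += 1
    return len({find(v) for v in range(n)}), phases


n = 16
one_cycle = boruvka_components(n, cycle_edges(0, n))
two_cycles = boruvka_components(n, cycle_edges(0, n // 2) + cycle_edges(n // 2, n // 2))
print("one cycle: ", one_cycle)
print("two cycles:", two_cycles)
print("phase bound ceil(log2 n) =", math.ceil(math.log2(n)))
```

Because every component that still has an outgoing edge merges with at least one other component in each phase, the number of such components at least halves per phase, so the phase count stays within ceil(log2 n). This mirrors the logarithmic behavior of Boruvka-style approaches, while the conjecture discussed above concerns whether any MPC algorithm with strongly sublinear memory can do better.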