Vertex Ordering Problems in Directed Graph Streams∗

Vertex Ordering Problems in Directed Graph Streams∗ Amit Chakrabartiy Prantar Ghoshz Andrew McGregorx Sofya Vorotnikova{ Abstract constant-pass algorithms for directed reachability [12]. We consider directed graph algorithms in a streaming setting, This is rather unfortunate given that many of the mas- focusing on problems concerning orderings of the vertices. sive graphs often mentioned in the context of motivating This includes such fundamental problems as topological work on graph streaming are directed, e.g., hyperlinks, sorting and acyclicity testing. We also study the related citations, and Twitter \follows" all correspond to di- problems of finding a minimum feedback arc set (edges rected edges. whose removal yields an acyclic graph), and finding a sink In this paper we consider the complexity of a variety vertex. We are interested in both adversarially-ordered and of fundamental problems related to vertex ordering in randomly-ordered streams. For arbitrary input graphs with directed graphs. For example, one basic problem that 1 edges ordered adversarially, we show that most of these motivated much of this work is as follows: given a problems have high space complexity, precluding sublinear- stream consisting of edges of an acyclic graph in an space solutions. Some lower bounds also apply when the arbitrary order, how much memory is required to return stream is randomly ordered: e.g., in our most technical result a topological ordering of the graph? In the offline we show that testing acyclicity in the p-pass random-order setting, this can be computed in O(m + n) time using model requires roughly n1+1=p space. For other problems, Kahn's algorithm [16] or via depth-first trees [23] but random ordering can make a dramatic difference: e.g., it is nothing was known in the data stream setting. possible to find a sink in an acyclic tournament in the one- We also consider the related minimum feedback pass random-order model using polylog(n) space whereas arc set problem, i.e., estimating the minimum num- under adversarial ordering roughly n1=p space is necessary ber of edges (arcs) that need to be removed such and sufficient given Θ(p) passes. We also design sublinear that the resulting graph is acyclic. This problem is algorithms for the feedback arc set problem in tournament NP-hard and the best known approximation factor is graphs; for random graphs; and for randomly ordered O(log n log log n) for arbitrary graphs [10], although a streams. In some cases, we give lower bounds establishing PTAS is known in the case of tournaments [18]. Again, that our algorithms are essentially space-optimal. Together, nothing was known in the data stream model. In con- our results complement the much maturer body of work on trast, the analogous problem for undirected graphs is algorithms for undirected graph streams. well understood in the data stream model. The number of edges required to make an undirected graph acyclic 1 Introduction is m − n + c where c is the number of connected components. The number of connected components can be While there has been a large body of work on undi- computed in O(n log n) space by constructing a span- rected graphs in the data stream model [20], the com- ning forest [2, 11]. plexity of processing directed graphs (digraphs) in this model is relatively unexplored. The handful of excep- Previous Work. Some versions of the problems we tions include multipass algorithms emulating random study in this work have been considered previously in walks in directed graphs [15,22], establishing prohibitive the query complexity model. For example, Huang et space lower bounds on finding sinks [13] and answering al. [14] consider the \generalized sorting problem" where reachability queries [11], and ruling out semi-streaming G is an acyclic graph with a unique topological order. The algorithm is presented with an undirected version ∗Supported in part by NSF under awards 1907738, 1908849, of this graph and may query any edge to reveal its and 1934846. direction. The goal is to learn the topological ordering yDepartment of Computer Science, Dartmouth College. with the minimum number of queries. Huang et al. [14] zDepartment of Computer Science, Dartmouth College. and Angelov et al. [5] also studied the average case xCollege of Information and Computer Sciences, University of Massachusetts, Amherst. {Department of Computer Science, Dartmouth College. Work 1The problem was explicitly raised in an open problems session performed in part while the author was at University of Mas- at the Shonan Workshop \Processing Big Data Streams" (June sachusetts, Amherst. 5-8, 2017) and generated considerable discussion. Copyright c 2020 by SIAM Unauthorized reproduction of this article is prohibited complexity of various problems where the input graph obtain the first random-order super-linear (in n) lower is chosen from some known distribution. Ailon [3] bounds for the undirected graph problems of deciding studied the equivalent problem for feedback arc set in (i) whether there exists a short s{t path (ii) whether tournaments. Note that all these query complexity there exists a perfect matching. results are adaptive and do not immediately give rise Tournaments. A tournament is a digraph that has to small-space data stream algorithms. exactly one directed edge between each pair of distinct Perhaps the relative lack of progress on streaming vertices. If we assume that the input graph is a algorithms for directed graph problems stems from their tournament, it is trivial to find a topological ordering, being considered \implicitly hard" in the literature, a given that one exists, by considering the in-degrees of point made in the recent work of Khan and Mehta [19]. the vertices. Furthermore, it is known that ordering Indeed, that work and the also-recent work of Elkin [9] the vertices by in-degree yields a 5-approximation to provide the first nontrivial streaming algorithms for feedback arc set [8]. computing a depth-first search tree and a shortest- In Section3, we present an algorithm which com- paths tree (respectively) in semi-streaming space, using putes a (1 + ")-approximation to feedback arc set in O(n= polylog n) passes. Notably, fairly non-trivial work one pass using O("−2n) space2. However, in the post- was needed to barely beat the trivial bound of O(n) e processing step, it estimates the number of back edges passes. for every permutation of vertices in the graph, thus re- Some of our work here applies and extends the sulting in exponential post-processing time. Despite its work of Guruswami and Onak [12], who gave the \brute force" feel, our algorithm is essentially optimal, first super-linear (in n) space lower bounds in the both in its space usage (unconditionally) and its post- streaming model for decision problems on graphs. In processing time (in a sense we shall make precise later). particular, they showed that solving reachability in n- We address these issues in Section 3.4. On the other vertex digraphs using p passes requires n1+Ω(1=p)=pO(1) hand, in Section 3.2, we show that with O(log n) addi- space. Via simple reductions, they then showed similar tional passes it is possible to compute a 3-approximation lower bounds for deciding whether a given (undirected) to feedback arc set while using only polynomial time and graph has a short s{t path or a perfect matching. Oe(n) space. Lastly, in Section4, we consider the problem of 1.1 Results finding a sink in a tournament which is guaranteed to Arbitrary Graphs. To set the stage, in Section2 be acyclic. Obviously, this problem can be solved in we present a number of negative results for the case a single pass using O(n) space by maintaining an \is- when the input digraph can be arbitrary. In particular, sink" flag for each vertex. Our results show that for we show that there is no one-pass sublinear-space algo- arbitrary order streams this is tight. We prove that rithm for such fundamental digraph problems as testing finding a sink in p passes requires Ω(n1=p=p2) space. whether an input digraph is acyclic, topologically sort- We also provide an O(n1=p log(3p))-space sink-finding ing it if it is, or finding its feedback arc set if it is not. algorithm that uses O(p) passes, for any 1 6 p 6 log n. These results set the stage for our later focus on spe- In contrast, we show that if the stream is randomly cific families of graphs, where we can do much more, ordered, then using polylog n space and a single pass is algorithmically. sufficient. This is a significant separation between the For our lower bounds, we consider both arbitrary arbitrary-order and random-order data stream models. and random stream orderings. In Section 2.1, we con- Random Graphs. In Section5, we consider a natu- centrate on the arbitrary ordering and show that check- ral family of random acyclic graphs (see Definition 1.1 ing whether the graph is acyclic, finding a topological below) and present two algorithms for finding a topo- ordering of a directed acyclic graph (DAG), or any mul- logical ordering of vertices. We show that, for this fam- tiplicative approximation of feedback arc set requires ily, Oe(n4=3) space is sufficient to find the best ordering Ω(n2) space in one pass. The lower bound extends to given O(log n) passes. Alternatively, O(n3=2) space is n1+Ω(1=p)=pO(1) when the number of passes is p 1. In e > sufficient given only a single pass, on the assumption Section 2.2, we show that essentially the same bound that the edges in the stream are randomly ordered.

Load more