Near-Optimal Two-Pass Streaming for Sampling Random Walks over Directed Graphs

Lijie Chen∗ Gillat Kol† Dmitry Paramonov‡ Raghuvansh R. Saxena§ Zhao Song¶ Huacheng Yu‖

June 17, 2021

Abstract

For a directed graph G with n vertices and a start vertex u_start, we wish to (approximately) sample an L-step random walk over G starting from u_start with minimum space using an algorithm that only makes few passes over the edges of the graph. This problem found many applications, for instance, in approximating the PageRank of a webpage. If only a single pass is allowed, the space complexity of this problem was shown to be Θ̃(n · L). Prior to our work, a better space complexity was only known with Õ(√L) passes. We settle the space complexity of this random walk simulation problem for two-pass streaming algorithms, showing that it is Θ̃(n · √L), by giving almost matching upper and lower bounds. Our lower bound argument extends to every constant number of passes p, and shows that any p-pass algorithm for this problem uses Ω̃(n · L^{1/p}) space. In addition, we show a similar Θ̃(n · √L) bound on the space complexity of any algorithm (with any number of passes) for the related problem of sampling an L-step random walk from every vertex in the graph.

∗MIT. [email protected]
†Princeton University. [email protected]
‡Princeton University. [email protected]
§Princeton University. [email protected]
¶Institute for Advanced Study. [email protected]
‖Princeton University. [email protected]

1 Introduction

1.1 Background and Motivation

Graph streaming algorithms. Graph streaming algorithms have been the focus of extensive study over the last two decades, mainly due to the important practical motivation in analyzing potentially huge structured data representing the relationships between a set of entities (e.g., the link graph between webpages and the friendship graph in a social network). In the graph streaming setting, an algorithm gets access to a sequence of graph edges given in an arbitrary order and it can read them one-by-one in the order in which they appear in the sequence. The goal here is to design algorithms solving important graph problems that only make one or few passes through the edge sequence, while using as little memory as possible.

Much of the streaming literature was devoted to the study of one-pass algorithms, and an Ω(n²) space lower bound for such algorithms was shown for many fundamental graph problems. A partial list includes: maximum matching and minimum vertex cover [FKM+04, GKK12], s-t reachability and connectivity [CGMV20, FKM+04, HRR98], shortest path and diameter [FKM+04, FKM+09], maximum and (global or s-t) minimum cut [Zel11], maximal independent set [ACK19, CDK19], and dominating set [AKL16, ER14].

Recently, the multi-pass streaming setting received quite a bit of attention. For some graph problems, allowing a few passes instead of a single pass can reduce the memory consumption of a streaming algorithm dramatically. In fact, even a single additional pass over the input can already greatly enhance the capability of the algorithms. For instance, minimum cut and s-t minimum cut in undirected graphs can be solved in two passes with only Õ(n) and O(n^{5/3}) space, respectively [RSW18] (as mentioned above, any one-pass algorithm for these problems must use Ω(n²) space). Additional multi-pass algorithms include an O(1)-pass algorithm for approximate matching [GKMS19, GKK12, Kap13, McG05], an O(log log n)-pass algorithm for maximal independent set [ACK19, CDK19, GGK+18], and O(log n)-pass algorithms for approximate dominating set [AKL16, CW16, HPIMV16] and weighted minimum cut [MN20].

Simulating random walks on graphs. Simulating random walks on graphs is a well-studied algorithmic problem with many applications in different areas of computer science, such as connectivity testing [Rei08], clustering [ACL07, AP09, COP03, ST13], sampling [JVV86], generating random spanning trees [Sch18], and approximate counting [JS89]. Since most applications of random-walk simulation are concerned with huge networks that come from practice, it is of practical interest to design low-space graph streaming algorithms with few passes for this problem.

In an influential paper by Das Sarma, Gollapudi and Panigrahy [SGP11], an Õ(√L)-pass and Õ(n)-space algorithm for simulating L-step random walks on directed graphs was established. (Streaming algorithms with almost linear space complexity, like this one, are often referred to as semi-streaming algorithms.) Using this algorithm together with some additional ideas, [SGP11] obtained space-efficient algorithms for estimating PageRank on graph streams. Recall that the PageRank of a webpage corresponds to the probability that a person that randomly clicks on web links arrives at this particular page¹. However, scanning the sequence of edges Õ(√L) times may

¹Given a web-graph G = (V,E) representing the webpages and links between them, the PageRank of the vertices simultaneously satisfy PageRank(u) = Σ_{(v,u)∈E} PageRank(v)/d(v) for all u, where d(·) denotes the out-degree [BP98].

be time-inefficient in many realistic settings.

In the one-pass streaming setting, a folklore algorithm with Õ(n · L) space complexity for simulating L-step random walks is known [SGP11] (see Section 2.1 for a description of this algorithm), and it is proved to be optimal [Jin19]. We mention that the work of [Jin19] also considers random walks on undirected graphs, and shows that Θ̃(n · √L) space is both necessary and sufficient for simulating L-step random walks on undirected graphs with n vertices in one pass.

Both of these known algorithms for general directed graphs have their advantages and disadvantages (either requiring many passes or more space). A natural question is whether one can interpolate between these two results and obtain an algorithm with pass complexity much smaller than √L, yet with a space complexity much smaller than n · L. Prior to our work, it was not even known if an o(√L)-pass streaming algorithm with n · L^{0.99} space is possible.

1.2 Our Results

We answer the above question in the affirmative by giving a two-pass streaming algorithm with Õ(n · √L) space for sampling a random walk of length L on a directed graph with n vertices. We complement this result by an almost matching Ω̃(n · √L) lower bound on the space complexity of every two-pass streaming algorithm for this problem. In fact, our two-pass lower bound generalizes to an Ω̃(n · L^{1/p}) lower bound on the space consumption of any p-pass algorithm, for any constant p.

1.2.1 Two-Pass Algorithm for Random Walk Sampling

For a directed graph G = (V,E), a vertex u_start ∈ V and a non-negative integer L, we use RW_L^G(u_start) to denote the distribution of L-step random walks (v_0, . . . , v_L) in G starting from v_0 = u_start (see Section 3.2 for formal definitions). For a distribution D over a finite domain Ω, we say that a randomized algorithm samples from D if, over its internal randomness, it outputs an element ω ∈ Ω distributed according to D. We give a space-efficient streaming algorithm for (approximate) sampling from RW_L^G(u_start) with small error:

Theorem 1.1 (Two-pass algorithm). There exists a streaming algorithm A_two-pass that, given an n-vertex directed graph G = (V,E), a starting vertex u_start ∈ V, a non-negative integer L indicating the number of steps to be taken, and an error parameter δ ∈ (0, 1/n), satisfies the following conditions:

1. A_two-pass uses at most Õ(n · √L · log δ^{−1}) space² and makes two passes over the input graph G.

2. A_two-pass samples from some distribution D over V^{L+1} satisfying ‖D − RW_L^G(u_start)‖_TV ≤ δ.

Our algorithm can also be generalized to the turnstile model, paying a poly log n factor in the space usage. See Section 4.4.

Observe that our algorithm allows for a considerable saving in space compared to the folklore single-pass algorithm (Õ(n · √L) vs. Õ(n · L)) and a considerable saving in the number of passes compared to [SGP11] (2 vs. Õ(√L)), at least if we allow some small error δ.

²The Õ hides logarithmic factors in n. We may assume without loss of generality that L ≤ n², as otherwise n · √L > n² and the algorithm can store the entire input graph.

We mention that A_two-pass can also be used to sample a random path from every vertex³ of G with the same storage cost of Õ(n · √L · log δ^{−1}) and two passes⁴. This is because A_two-pass satisfies the useful property of obliviousness to the starting vertex u_start, meaning that it scans the input graph before the start vertex is revealed. More formally, we say that an algorithm A is oblivious to the starting vertex if it first runs a preprocessing algorithm P and then a sampling algorithm S; the algorithm P reads the input graph stream without knowing the starting vertex u_start (if A is a p-pass streaming algorithm, P makes p passes over the input graph stream), and outputs a string; S takes both the string outputted by P and a starting vertex u_start as an input, and outputs a walk on the input graph G.

1.2.2 Lower Bounds

We prove the following lower bound:

Theorem 1.2 (Multi-pass lower bound). Fix a constant β ∈ (0, 1] and an integer p ≥ 1. Let n ≥ 1 be a sufficiently large integer and let L = ⌈n^β⌉. Any randomized p-pass streaming algorithm that, given an n-vertex directed graph G = (V,E) and a starting vertex u_start ∈ V, samples from a distribution D such that ‖D − RW_L^G(u_start)‖_TV ≤ 1 − 1/log^{10} n requires Ω̃(n · L^{1/p}) space.

Plugging p = 2 into Theorem 1.2 implies that our two-pass algorithm from Theorem 1.1 is essentially optimal. Also, with p = 1, the theorem reproduces the one-pass lower bound by [Jin19]. In addition, Theorem 1.2 rules out the possibility of a semi-streaming algorithm with any constant number of passes.

Recall from Section 1.2.1 that our two-pass algorithm A_two-pass utilizes Õ(n · √L) space and is oblivious to the starting vertex. Interestingly, we are able to show that any oblivious algorithm for random walk sampling (with any number of passes) requires Ω̃(n · √L) space. Thus, any algorithm for random walk sampling with significantly less space than ours has to be inherently different and have its storage depend on the starting vertex. Our lower bound for oblivious algorithms also implies that A_two-pass gives an almost optimal algorithm for sampling a path from every start vertex, even if any number of passes is allowed.

Theorem 1.3 (Lower bound for oblivious algorithms). Let n ≥ 1 be a sufficiently large integer and let L denote an integer satisfying L ∈ [log^{40} n, n]. Any randomized algorithm that is oblivious to the start vertex and, given an n-vertex directed graph G = (V,E) and a starting vertex u_start ∈ V, samples from a distribution D such that ‖D − RW_L^G(u_start)‖_TV ≤ 1 − 1/log^{10} n requires Ω̃(n · √L) space⁵.

1.3 Discussions and Open Problems

Better space complexity with more passes? Our results leave open a couple of interesting directions for future work. The most significant open question is to understand the streaming space complexity of sampling random walks with more than two passes. In particular, Theorem 1.2

³Note, however, that the random walks from different vertices in the graph may be correlated.
⁴We count towards the space complexity only the space on the work tape used by the algorithm and do not count space on the output tape (otherwise an Ω(n · L) lower bound is trivial).
⁵In fact, we show that Theorem 1.3 holds even if the preprocessing algorithm P and the sampling algorithm S are allowed to use an arbitrarily large amount of memory, as long as P passes a string of length at most (roughly) n · √L to S.

implies that a three-pass streaming algorithm has space complexity at least Ω̃(n · L^{1/3}). Can one get Õ(n · L^{1/3}) space with three passes, or at least O(n · L^{1/2−ε}) space, for some constant ε > 0? Note that, as explained in Section 1.2.2, such an algorithm must utilize its knowledge of the starting vertex when it reads the graph stream.

Theorem 1.2 does not rule out semi-streaming Õ(n) space algorithms even when p is a moderately growing function of n and L. In [SGP11], it is shown that such an Õ(n) space algorithm exists with p = Õ(√L) passes. Does a semi-streaming algorithm with, say, poly log(L) passes exist?

Undirected graphs? It would also be interesting to see what is the best two-pass streaming algorithm for simulating random walks on undirected graphs. Specifically, is it possible to combine our algorithm with the algorithm from [Jin19] to obtain an improvement over the optimal Õ(n · √L) space complexity of a one-pass streaming algorithm for this problem?

Only outputting the end vertex? Finally, our lower bounds only apply to the case where the algorithms need to output an entire random path (v0, . . . , vL). If instead only the last vertex vL in the random walk is required, can one design better two-pass algorithms or prove a non-trivial lower bound?

2 Techniques

2.1 The Two-Pass Algorithm

We next overview our two-pass algorithm from Theorem 1.1, which simulates random walks with only Õ(n · √L) space.

The folklore one-pass algorithm. Before discussing our algorithm, it would be instructive to review the folklore Õ(n · L)-space one-pass algorithm for simulating L-step random walks in a directed graph G = (V,E) (for simplicity, we will always assume L ≤ n in the discussions). The algorithm is quite simple:

1. For every vertex v ∈ V, sample L of its outgoing neighbors with replacement and store them in a list L_v^save of length L (that is, for each j ∈ [L], the j-th element of L_v^save is an independent uniformly random outgoing neighbor of v). This can be done in a single pass over the input graph stream using reservoir sampling [Vit85].

2. Given a starting vertex u_start ∈ V, our random walk starts from u_start and repeats the following for L steps: suppose we are currently at vertex v and it is the k-th time we visit this vertex; then we go from v to the k-th vertex in the list L_v^save.

It is not hard to see that the above algorithm works: whenever we visit a vertex v ∈ V, the next element in the list L_v^save is always a uniformly random outgoing neighbor of v, conditioned on the walk we have produced so far; and we never run out of available neighbors of v since |L_v^save| = L.
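To make the two steps concrete, here is a minimal Python sketch of the folklore algorithm (the function names are ours, and for readability step 1 collects the out-neighbors in memory; a genuine streaming implementation would instead draw the L samples per vertex with reservoir sampling, as stated above):

import random

def one_pass_preprocess(edge_stream, vertices, L):
    # Collect out-neighbors, then store L independent uniform samples per vertex,
    # i.e. the list L_v^save of the folklore algorithm.
    out_neighbors = {v: [] for v in vertices}
    for (u, v) in edge_stream:
        out_neighbors[u].append(v)
    return {v: [random.choice(nbrs) for _ in range(L)] if nbrs else []
            for v, nbrs in out_neighbors.items()}

def sample_walk(saved, u_start, L):
    # Replay the walk: on the k-th visit to v, follow the k-th entry of L_v^save.
    visits = {v: 0 for v in saved}
    walk = [u_start]
    for _ in range(L):
        v = walk[-1]
        walk.append(saved[v][visits[v]])
        visits[v] += 1
    return walk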

A naive attempt and the obstacles. Since we are aiming at only using Õ(n · √L) space, a naive attempt to improve the above algorithm is to just sample and store τ = O(√L) outgoing neighbors instead of L neighbors, and simulate the walk starting from u_start in the same way. The issue here is that, during the simulation of an L-step walk, whenever one visits a vertex v more than τ times, one runs out of available vertices in the list L_v^save, and the algorithm can no longer produce a legitimate random walk. For a simple example, imagine we have a star-like graph where n − 1 vertices are connected to a center vertex via two-way edges. An L-step random walk starting at the center would require at least Ω(L) samples from the center's neighbors, and our naive algorithm completely breaks.

Our approach: heavy and light vertices. Observe, however, that in the above example of a star-like graph, we are only at risk of not storing enough random neighbors of the center node, as an L-step random walk would only visit the other non-center vertices a very small number of times. Thus, the algorithm may simply record all edges from the center with only O(n) space. This observation inspires the following approach for a two-pass algorithm:

1. In the first pass, we identify all the vertices that are likely to be visited many times by a random walk (starting from some vertex). We call such vertices heavy, while all other vertices are called light.

2. In the second pass, we record all outgoing neighbors of all heavy vertices, as well as O(τ) random outgoing neighbors with replacement of each of the light vertices.

Observe that the obtained algorithm is indeed oblivious to the starting vertex: the two passes described above do not use the starting vertex. Still, given the set of outgoing neighbors stored by the second pass, we are able to sample a random walk from any start vertex.

First pass: how do we detect heavy vertices? The above approach requires that we detect, in a single pass, all vertices v that, with a decent probability (say, 1/poly(n)), are visited more than O(τ) times by an L-step random walk. To this end, we observe that if a random walk visits a vertex v more than τ times, it must return to v more than τ − 1 times within L steps. This, in turn, implies that a random walk that starts from v is likely to return to v in roughly L/τ = O(√L) steps.

The above discussion suggests the following definition of heavy vertices: a vertex v is heavy if a random walk starting from v is likely (say, with probability at least 1/3) to revisit v in O(√L) steps. Indeed, this property is much easier to detect: we can run O(log n) independent copies of the folklore one-pass streaming algorithm to sample O(log n) independent O(√L)-step random walks starting from v, and count how many of them return to v at some step.
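As an illustration of this first-pass test (a sketch under our own naming, with thresholds following the informal description here rather than the exact constants of Algorithm 3), one can estimate every vertex's revisit probability from a few sampled walks of length roughly √L:

import math

def classify_heavy_light(sample_short_walk, vertices, L, num_trials):
    # sample_short_walk(v, ell) is assumed to return one ell-step walk starting at v,
    # e.g. produced by an independent copy of the folklore one-pass sampler.
    ell = int(math.isqrt(L))
    heavy, light = set(), set()
    for v in vertices:
        revisits = sum(v in sample_short_walk(v, ell)[1:] for _ in range(num_trials))
        (heavy if revisits / num_trials >= 0.5 else light).add(v)
    return heavy, light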

Second pass: can we afford to store the neighbors? In Lemma 4.11, we show that for a light vertex v, an L-step random walk starting at any vertex visits v at most O(√L) times with high probability. Therefore, in the second pass, we can safely record only O(√L) outgoing neighbors for all light vertices. Still, we have to record all the outgoing neighbors of heavy vertices.

The crux of our analysis is a structural result about directed graphs, showing that the total outgoing degree of all heavy vertices is bounded by O(n · √L), and therefore we can simply store

all of their outgoing neighbors. This is proved in Lemma 4.3, which may also be of independent interest.

Intuition behind the structure lemma. Finally, we discuss the insights behind the above structure lemma for directed graphs. We will use d_out(v) to denote the number of outgoing neighbors of v. For concreteness, we now say a vertex v is heavy if a random walk starting from v revisits v within √L steps with probability at least 1/3.

Let V_heavy ⊆ V be the set of heavy vertices and let v ∈ V_heavy. By a simple calculation, one can see that for at least a 1/6 fraction of the outgoing neighbors u of v, a random walk starting from u visits v within √L steps with probability at least 1/6. The key insight is to consider the number of pairs (u, v) ∈ V² such that a random walk starting from u visits v within √L steps with probability at least 1/6. We will use S to denote this set.

• By the previous discussion, each heavy vertex v adds at least (1/6) · d_out(v) pairs to the set S. Hence, we have

|S| ≥ (1/6) · Σ_{v ∈ V_heavy} d_out(v).    (1)

• On the other hand, it is not hard to see that for each vertex v, there are at most O(√L) pairs of the form (v, u) ∈ S, since a √L-step walk can visit only √L vertices. So we also have

|S| ≤ O(n√L).    (2)

Putting the above (Equation 1 and Equation 2) together, we get the desired bound

Σ_{v ∈ V_heavy} d_out(v) ≤ O(n√L).

2.2 Lower Bound for p-Pass Algorithms

We now describe the ideas behind the proof of Theorem 1.2, our Ω̃(n · L^{1/p}) space lower bound for p-pass randomized streaming algorithms for sampling random walks. We mention that many of the tools developed for proving space lower bounds are not directly applicable when one wishes to lower bound the space complexity of a sampling task, and are more suitable for proving lower bounds on the space required to compute a function or a search problem⁶.

From sampling to function computation. Our way around this is to first prove a reduction from streaming algorithms that sample a random walk from ustart to streaming algorithms that compute the (p + 1)-neighborhood of the vertex ustart. This is done by considering a graph where a random walk returns to the vertex ustart every p + 2 steps. If p is a constant, then a random walk

⁶One such tool that cannot be used directly for our purpose is the very useful Yao's principle [Yao77] that allows proving randomized communication lower bounds by proving the corresponding distributional (deterministic) communication lower bounds.

of length L on such a graph can be seen as L/(p + 2) = O(L) copies of a random walk of length p + 2. Observe that if the (p + 1)-neighborhood of the vertex u_start has (almost) L vertices (and the probability of visiting each vertex is more or less uniform), then a random walk of length L is likely to visit all the vertices in the neighborhood, and an algorithm that samples a random walk also outputs the entire neighborhood with high probability.

A lower bound for computing the (p + 1)-neighborhood via pointer-chasing. Having reduced sampling a random walk to outputting the (p + 1)-neighborhood, we now need to prove that a space-efficient p-pass streaming algorithm cannot output the (p + 1)-neighborhood of u_start if this neighborhood has roughly L vertices. This is reminiscent of the "pointer-chasing" lower bounds found in the literature. Pointer-chasing results are typically concerned with a graph with p + 1 layers of vertices (p layers of edges) and show that given a vertex in the first layer, finding a vertex that is reachable from it in the last layer cannot be done with less than p passes, unless the memory is huge. Classical pointer-chasing lower bounds (e.g., [NW91]) consider graphs where the out-degree of each vertex is 1, thus the start vertex reaches a unique vertex in the last layer. Unfortunately, this type of pointer-chasing instance is very sparse, and a streaming algorithm can simply remember the entire graph in one pass using Õ(n) memory.

Since we wish to have roughly L vertices in the (p + 1)-neighborhood of u_start, the out-degree of each vertex should be roughly Ω(L^{1/(p+1)}) (assuming uniform degrees). Pointer-chasing lower bounds for this type of dense graphs were also proved (e.g., [GO16] and [FKM+09]), showing that p-pass algorithms essentially need to store an entire layer of edges, which is Ω(n · L^{1/(p+1)}) in our case. However, this still does not give us the Ω(n · L^{1/p}) lower bound we aspire to (and which is tight, at least for two passes).

Towards a tight lower bound: combining dense and sparse. To get a better lower bound, we construct a hard instance that is a combination of the two above mentioned types of pointer-chasing instances, the dense and the sparse. Specifically, for a p-pass lower bound, we construct a layered graph with p + 2 layers of vertices V_1, ..., V_{p+2}, where the first layer has only one vertex u_start and all the other layers are of equal size (see Figure 1). To ensure that vertex u_start is reached every p + 2 steps, we connect all vertices in the last layer to u_start. Every vertex in layers V_2, ..., V_{p+1} connects to a random set of roughly L^{1/p} vertices in the next layer. Using Guruswami and Onak style arguments ([GO16]), it can be shown that when the edges are presented to the algorithm from right to left, finding a vertex in layer V_{p+2} that is reachable from a given vertex in V_2 with a (p − 1)-pass algorithm requires Ω(n · L^{1/p}) space. We "squeeze out" an extra pass in the algorithm by connecting the start vertex u_start in V_1 to a single random vertex in V_2. Note that with this construction, it is indeed the case that the (p + 1)-neighborhood of u_start consists of only roughly L vertices, but still, the out-degrees of vertices in V_2, ..., V_{p+1} are roughly L^{1/p} instead of only L^{1/(p+1)}.

[Figure 1: A depiction of our hard instance for p-pass streaming algorithms. Some edges omitted.]

2.3 Lower Bounds for Oblivious Algorithms

Finally, we discuss the intuition behind the proof of Theorem 1.3, showing that any algorithm that is oblivious to the starting vertex must use Ω̃(n · √L) space. Our proof is based on a reduction

from a multi-output generalization of the well-studied INDEX problem for one-way communication protocols, denoted by INDEX_{m,ℓ}. In INDEX_{m,ℓ}, Alice gets ℓ strings X_1, ..., X_ℓ ∈ {0,1}^m and Bob gets an index i ∈ [ℓ]. Alice sends a message to Bob and then Bob is required to output the string

X_i. (Note that when m = 1 this becomes the original INDEX problem.) It is not hard to show that any one-way communication protocol solving INDEX_{m,ℓ} with non-trivial probability (say, 1/poly log(m)) requires Alice to send at least Ω̃(mℓ) bits to Bob (see Lemma C.3). Our key observation here is that if there is a starting vertex oblivious algorithm A = (P, S) with space S for approximate simulation of an L = Õ(m)-step random walk on a graph with n = O(√m · ℓ) vertices, then it implies a one-way communication protocol for INDEX_{m,ℓ} with communication complexity S and a decent success probability. Recalling the lower bound for INDEX_{m,ℓ}, we immediately have S = Ω̃(mℓ) = Ω̃(n√L).

In more detail, given an m-bit string X, we build an O(√m)-vertex graph H(X) by encoding all bits of X as the existence/non-existence of edges in H (this is possible since there are more than m potential edges in H). We also add some artificial edges to H to make sure it is strongly connected. Our construction makes sure that an L = Õ(m)-step random walk in H reveals all edges of H with high probability, which in turn reveals all bits of X (see the proof of Theorem 1.3 for more details).

Now the reduction can be implemented as follows: given ℓ strings X_1, ..., X_ℓ ∈ {0,1}^m, Alice constructs the graph G = ⊔_{i=1}^{ℓ} H(X_i), the disjoint union of the ℓ graphs. Note that G has n = O(√m · ℓ) vertices. Alice then runs the preprocessing algorithm P on G to obtain a string M, and sends it to Bob. Given an index i ∈ [ℓ], Bob simply runs S on M together with a suitable starting vertex inside the H(X_i) component of G. By the previous discussion, this reveals the string X_i with high probability, which proves the correctness of the reduction. Hence, the space complexity of A must be Ω̃(mℓ) = Ω̃(n√L).

Organization of this paper. In Section 3 we introduce the necessary preliminaries for this paper. In Section 4 we present our nearly optimal two-pass streaming algorithm for simulating random walks and prove Theorem 1.1.

In Section 5 we prove our lower bounds against general multi-pass streaming algorithms for simulating random walks (Theorem 1.2). In Appendix A we present some additional preliminaries in information theory. In Appendix B we provide some missing proofs from Section 5. In Appendix C we prove Theorem 1.3.

3 Preliminaries

3.1 Notation

Let n ∈ N. We use [n] to denote the set {1, . . . , n}. We often use sans-serif letters (e.g., X) to denote random variables, and calligraphic font letters (e.g., X ) to denote distributions. For two random variables X and Y, and for Y ∈ supp(Y), we use (X|Y = Y ) to denote X conditioned on Y = Y . For two lists a and b, we use a ◦ b to denote their concatenation.

For two distributions D_1 and D_2 on sets X and Y respectively, we use D_1 ⊗ D_2 to denote their product distribution over X × Y, and ‖D_1 − D_2‖_TV to denote the total variation distance between them.

3.2 Graphs

In this paper we will always consider directed graphs without multi-edges. A directed graph G is a pair (V,E), where V is the vertex set and E ⊆ V × V is the set of all edges.

For a vertex u in a graph G = (V,E), we let N_out^G(u) := {v : (u, v) ∈ E} and N_in^G(u) := {v : (v, u) ∈ E}. We also use d_out^G(u) and d_in^G(u) to denote its out- and in-degrees (i.e., |N_out^G(u)| and |N_in^G(u)|). For an edge (u, v) ∈ E, we say v is an out-neighbor of u and u is an in-neighbor of v.

Random walks on directed graphs. For a vertex u in a graph G = (V,E) and a non-negative integer L, an L-step random walk (v_0, v_1, . . . , v_L) starting at u is generated as follows: set v_0 = u, and for each i ∈ [L], draw v_i uniformly at random from N_out^G(v_{i−1}). We say that v_0 = u is the 0-th vertex on the walk, and v_i is the i-th vertex for each i ∈ [L]. We use RW_L^G(u) to denote the distribution of an L-step random walk starting from u in G.

We use visit_{[a,b]}^G(u, v) to denote the probability that a b-step random walk starting from u visits v between the a-th vertex and the b-th vertex of the walk. We often omit the superscript G when the graph G is clear from the context.

Starting vertex oblivious algorithms. Now we formally define a starting vertex oblivious streaming algorithm for simulating random walks.

Definition 3.1. We say a p-pass S-space streaming algorithm A for simulating random walks is starting vertex oblivious, if A can be decomposed into a preprocessing subroutine P and a sampling subroutine S, such that:

1. (Starting vertex oblivious preprocessing phase) P makes p passes over the input graph stream, using at most S words of space. After that, P outputs at most S words, denoted by M.

2. (Sampling phase) S takes both the starting vertex u_start and M as input, and outputs a desired walk starting from u_start, using at most S words of space.

3.3 Useful Concentration Bounds on Random Variables

The following standard concentration bounds will be useful for us.

Lemma 3.2 (Multiplicative Chernoff bound, [Che52]). Suppose X_1, ..., X_n are independent random variables taking values in [0, 1]. Let X denote their sum and let µ = E[X] denote the sum's expected value. Then,

Pr(X ≥ (1 + δ)µ) ≤ e^{−δ²µ/(2+δ)}, for all δ ≥ 0,
Pr(X ≤ (1 − δ)µ) ≤ e^{−δ²µ/2}, for all 0 ≤ δ ≤ 1.

In particular, we have that:

Pr(X ≥ (1 + δ)µ) ≤ e^{−(δµ/3)·min(δ,1)}, for all δ ≥ 0,
Pr(|X − µ| ≥ δµ) ≤ 2 · e^{−δ²µ/3}, for all 0 ≤ δ ≤ 1.

We also need the following Azuma-Hoeffding inequality.

Lemma 3.3 (Azuma-Hoeffding inequality, [Azu67, Hoe94]). Let Z_0, ..., Z_n be random variables satisfying (1) E[|Z_i|] < ∞ for every i ∈ {0, . . . , n} and E[Z_i | Z_0, ..., Z_{i−1}] ≤ Z_{i−1} for every i ∈ [n] (i.e., {Z_i} forms a supermartingale), and (2) |Z_i − Z_{i−1}| ≤ 1 for every i ∈ [n]. Then for all λ > 0, we have

Pr[Z_n − Z_0 ≥ λ] ≤ exp(−λ²/(2n)).

In particular, the following corollary will be useful for us.

Corollary 3.4 (Azuma-Hoeffding inequality for Boolean random variables, [Azu67, Hoe94]). Let X_1, ..., X_n be random variables satisfying X_i ∈ {0, 1} for each i ∈ [n]. Suppose that E[X_i | X_1, ..., X_{i−1}] ≤ p_i for all i. Then for any λ > 0,

Pr[ Σ_{i=1}^{n} X_i ≥ λ + Σ_{i=1}^{n} p_i ] ≤ exp(−λ²/(2n)).

Proof. For i ∈ {0, . . . , n}, let Z_i = Σ_{j=1}^{i} (X_j − p_j). From the assumption one can see that the Z_i form a supermartingale and |Z_i − Z_{i−1}| ≤ 1; hence the corollary follows directly from Lemma 3.3.

3.4 Standard Lemmas

We will need (a weak form of) Stirling's approximation. We include a proof for completeness.

Lemma 3.5. For all n > 0, we have

e · (n/e)^n ≤ n! ≤ e·n · (n/e)^n.

10 Proof. We have:

n^n/n! = n^{n−1}/(n−1)! = Π_{i=1}^{n−1} ((i+1)/i)^i = Π_{i=1}^{n−1} (1 + 1/i)^i ≤ Π_{i=1}^{n−1} e = e^{n−1}.

We also have:

n!/n^{n+1} = (n−1)!/n^n = Π_{i=1}^{n−1} (i/(i+1))^{i+1} = Π_{i=1}^{n−1} (1 − 1/(i+1))^{i+1} ≤ Π_{i=1}^{n−1} e^{−1} = e^{1−n}.

Rearranging gives the result.

The following bound on binomial coefficients follows:

Lemma 3.6. For all 0 < k < n, we have

(1/(10n²)) · (n/k)^k · (n/(n−k))^{n−k} ≤ (n choose k) ≤ n · (n/k)^k · (n/(n−k))^{n−k}.

Proof. We have:

(n choose k) = n!/((n − k)! · k!)
           ≥ [e · (n/e)^n] / [e·k · (k/e)^k · e·(n − k) · ((n − k)/e)^{n−k}]    (Lemma 3.5)
           ≥ (1/(10n²)) · (n/k)^k · (n/(n − k))^{n−k}.

We also have:

(n choose k) = n!/((n − k)! · k!)
           ≤ [e·n · (n/e)^n] / [e · (k/e)^k · e · ((n − k)/e)^{n−k}]    (Lemma 3.5)
           ≤ n · (n/k)^k · (n/(n − k))^{n−k}.
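As a quick numerical sanity check of Lemma 3.6 (our own snippet, not part of the paper), one can verify the two bounds directly for small parameters:

from math import comb

def lemma_3_6_holds(n, k):
    # Check (1/(10 n^2)) * t <= C(n, k) <= n * t, where t = (n/k)^k * (n/(n-k))^(n-k).
    t = (n / k) ** k * (n / (n - k)) ** (n - k)
    return t / (10 * n ** 2) <= comb(n, k) <= n * t

assert all(lemma_3_6_holds(n, k) for n in range(2, 60) for k in range(1, n))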

4 Two-Pass Streaming Algorithms for Simulating Directed Random Walks

In this section, we present our two-pass streaming algorithms for simulating random walks on directed graphs.

4.1 Heavy and Light Vertices

We first define the notion of heavy and light vertices.

Definition 4.1 (Heavy and light vertices). Given a directed graph G = (V,E) with n vertices and ℓ ∈ ℕ.

• (Heavy vertices.) We say a vertex u is ℓ-heavy in G if visit_{[1,ℓ]}(u, u) ≥ 1/3 (i.e., if a random walk starting from u will revisit u in at most ℓ steps with probability at least 1/3).

• (Light vertices.) We say a vertex u is ℓ-light in G if visit_{[1,ℓ]}(u, u) ≤ 2/3 (i.e., if a random walk starting from u will revisit u in at most ℓ steps with probability at most 2/3).

We also let V^ℓ_heavy(G) and V^ℓ_light(G) be the sets of ℓ-heavy and ℓ-light vertices in G. When G and ℓ are clear from the context, we simply refer to them as V_heavy and V_light.

Remark 4.2. Note that if the revisiting probability is between 1/3 and 2/3, then the vertex is considered to be both heavy and light.

The following lemma is crucial for the analysis of our algorithm.

Lemma 4.3 (Upper bounds on the total out-degrees of heavy vertices). Given a directed graph G with n vertices and ℓ ∈ ℕ, it holds that

Σ_{u ∈ V^ℓ_heavy(G)} d_out(u) ≤ O(n · ℓ).

Proof. We define a set S of pairs of vertices as follows:

S := {(u, v) ∈ V² : visit_{[0,ℓ]}(u, v) ≥ 1/6}.

That is, a pair of vertices u and v belongs to S if and only if an ℓ-step random walk starting from u visits v with probability at least 1/6. For each fixed vertex u, we further define

S_u := {v ∈ N_out(u) | visit_{[0,ℓ]}(v, u) ≥ 1/6}, and
H_u := {v ∈ V | visit_{[0,ℓ]}(u, v) ≥ 1/6}.

The following claim will be useful for the proof.

Claim 4.4. The following two statements hold:

1. For every u ∈ V , it holds that |Hu| ≤ O(`).

2. For every u ∈ V_heavy, it holds that |S_u| ≥ (1/6) · d_out(u).

Proof. Fixing u ∈ V, the first item follows from the simple fact that

Σ_{v∈V} visit_{[0,ℓ]}(u, v) ≤ ℓ + 1.

Now we move to the second item, and fix u ∈ V_heavy. For the sake of contradiction, suppose that |S_u| < (1/6) · d_out(u). We have

visit_{[1,ℓ]}(u, u) = E_{v∈N_out(u)}[visit_{[0,ℓ−1]}(v, u)]
    ≤ E_{v∈N_out(u)}[visit_{[0,ℓ]}(v, u)]
    < Pr_{v∈N_out(u)}[v ∈ S_u] · 1 + Pr_{v∈N_out(u)}[v ∉ S_u] · (1/6) < 1/6 + 1/6 < 1/3,

a contradiction to the assumption that u is heavy.

Finally, note that by the definitions of H_u and S_u we immediately have

|S| = Σ_{u∈V} |H_u| ≥ Σ_{u∈V} |S_u|.

By Claim 4.4, we have

Σ_{u∈V_heavy} d_out(u) ≤ 6 · Σ_{u∈V} |S_u| ≤ 6 · Σ_{u∈V} |H_u| ≤ O(n · ℓ),

which completes the proof.

4.2 A Simple One-Pass Algorithm for Simulating Random Walks

We first describe a simple one-pass algorithm for simulating random walks, which will be used as a subroutine in our two-pass algorithm. Moreover, this one-pass algorithm is starting vertex oblivious, which will be crucial for us later.

Reservoir sampling in one pass. Before describing our one-pass subroutine, we need the following basic reservoir sampling algorithm.

Lemma 4.5 ([Vit85]). Given input access to a stream of n items such that each item can be described by O(1) words, we can uniformly sample m of them without replacement using O(m) words of space.

Using m independent reservoir samplers each with capacity 1, one can also sample m items from the stream with replacement in a space-efficient way.

Corollary 4.6. Given input access to a stream of n items such that each item can be described by O(1) words, we can uniformly sample m of them with replacement using O(m) words of space.
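For concreteness, here is a minimal Python sketch of both primitives (classic reservoir sampling for Lemma 4.5, and m independent size-1 reservoirs for Corollary 4.6); the function names are our own:

import random

def reservoir_sample(stream, m):
    # Uniformly sample m items without replacement using O(m) words.
    reservoir = []
    for t, item in enumerate(stream):
        if t < m:
            reservoir.append(item)
        else:
            j = random.randint(0, t)       # the item replaces a slot with probability m/(t+1)
            if j < m:
                reservoir[j] = item
    return reservoir

def reservoir_sample_with_replacement(stream, m):
    # m independent size-1 reservoirs: each slot independently holds a uniform item.
    samples = [None] * m
    for t, item in enumerate(stream):
        for i in range(m):
            if random.randint(0, t) == 0:  # keep the new item with probability 1/(t+1)
                samples[i] = item
    return samples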

Description of the one-pass algorithm. Now we describe our one-pass algorithm for simulating random walks. Our algorithm A_one-pass is starting vertex oblivious, and can be described by a preprocessing subroutine P_one-pass and a sampling subroutine S_one-pass. Recall that, as defined in Definition 3.1, P_one-pass takes a single pass over the input graph stream without knowing the starting vertex u_start, and S_one-pass takes the output of P_one-pass together with u_start, and outputs a desired sample of the random walk.

Algorithm 1 Preprocessing phase of A_one-pass: P_one-pass(G, τ, V_full)
Input: One-pass streaming access to a directed graph G = (V,E). A parameter τ ∈ ℕ. A subset V_full ⊆ V; we also let V_samp = V \ V_full.
1: For each vertex v ∈ V_full, record all of its out-neighbors in the list L_v^save. (That is, V_full stands for the set of vertices for which we keep all edges.)
2: For each vertex v ∈ V_samp, using Corollary 4.6, sample τ of its out-neighbors uniformly at random with replacement into the list L_v^save. (That is, V_samp stands for the set of vertices for which we sample some of the edges.)
3: For a big enough constant c_2 > 1, whenever the number of out-neighbors stored exceeds c_2 · τ · n, the algorithm stops recording them. If this happens, we say the algorithm operates incorrectly; otherwise we say it operates correctly.
Output: A collection of lists L⃗^save = {L_v^save}_{v∈V}.

Algorithm 2 Sampling phase of A_one-pass: S_one-pass(V, u_start, L, V_full, L⃗^save = {L_v^save}_{v∈V})
Input: A starting vertex u_start. The path length L ∈ ℕ. A subset V_full ⊆ V; we also let V_samp = V \ V_full.
1: Let v_0 = u_start. For each v ∈ V, set k_v = 1.
2: for i := 1 → L do
3:   if v_{i−1} ∈ V_full then
4:     v_i is set to be a uniformly random element from L_{v_{i−1}}^save
5:   else if k_{v_{i−1}} > |L_{v_{i−1}}^save| then
6:     return failure
7:   else
8:     v_i ← (L_{v_{i−1}}^save)_{k_{v_{i−1}}}.
9:     k_{v_{i−1}} ← k_{v_{i−1}} + 1.
10:  end if
11: end for
Output: The walk (v_0, v_1, . . . , v_L).

Analysis of the one-pass algorithm. Now we analyze the correctness of our one-pass algorithm. We first observe that its space complexity can be easily bounded.

Observation 4.7 (Space complexity of Aone-pass). Given a directed graph G = (V,E) with n vertices. For every τ ∈ N and subset Vfull ⊆ V , Pone-pass(G, τ, Vfull) always takes at most O(τ · n) words of space.

Next we bound the statistical distance between its output distribution and the correct distri- bution of the random walk by the following lemma.

Lemma 4.8 (Correctness of A_one-pass). Given a directed graph G = (V,E) with n vertices. For all integers τ, L ∈ ℕ and every subset V_full ⊆ V such that τ · (n − |V_full|) + Σ_{v∈V_full} d_out(v) ≤ c_2 · τ · n,

let L⃗^save be the random variable of the output of P_one-pass(G, τ, V_full). For every u_start ∈ V, the output distribution of S_one-pass(V, u_start, L, V_full, L⃗^save) has statistical distance β to RW_L^G(u_start), where β is the probability that S_one-pass(V, u_start, L, V_full, L⃗^save) outputs failure.

Proof. Conclude from τ · (n − |V_full|) + Σ_{v∈V_full} d_out(v) ≤ c_2 · τ · n that P_one-pass(G, τ, V_full) always operates correctly.

To bound the statistical distance between the distribution of S_one-pass(V, u_start, L, V_full, L⃗^save) and RW_L^G(u_start), we construct another random variable (L⃗^save)′, in which for every vertex u, we sample another L out-neighbors of u uniformly at random with replacement, and add them to the end of the list L_u^save in L⃗^save.

Note that S_one-pass(V, u_start, L, V_full, (L⃗^save)′) never outputs failure, and is distributed exactly the same as RW_L^G(u_start). On the other hand, S_one-pass(V, u_start, L, V_full, (L⃗^save)′) and S_one-pass(V, u_start, L, V_full, L⃗^save) are the same as long as S_one-pass(V, u_start, L, V_full, L⃗^save) does not output failure, which completes the proof.

The following corollary follows immediately from the lemma above. (Note that this special case exactly corresponds to the folklore one-pass streaming algorithm for simulating random walks.)

Corollary 4.9. Given a directed graph G = (V,E) with n vertices and an integer L ∈ ℕ, let L⃗^save be the random variable of the output of P_one-pass(G, L, ∅). For every u_start ∈ V, the output distribution of S_one-pass(V, u_start, L, ∅, L⃗^save) is identical to RW_L^G(u_start).

4.3 Two-Pass Streaming Algorithm for Simulating Random Walks

Description of the two-pass algorithm. Now we are ready to describe our two-pass algorithm A_two-pass, which is also starting vertex oblivious, and can be described by the following two subroutines P_two-pass and S_two-pass.

Algorithm 3 Preprocessing phase of A_two-pass: P_two-pass(G, L, δ)
Input: A directed graph G = (V,E) with n vertices. An integer L ∈ ℕ. A failure parameter δ ∈ (0, 1/n). We also let ℓ = √L and γ = c_1 · log δ^{−1}, where c_1 ≥ 1 is a sufficiently large constant to be specified later.
1: First pass: estimation of heavy and light vertices.
   1. Run γ independent instances of P_one-pass(G, ℓ, ∅) and let (L⃗^save)^(1), ..., (L⃗^save)^(γ) be the corresponding collections of lists.
   2. For each vertex u ∈ V, by running S_one-pass(V, u, ℓ, ∅, (L⃗^save)^(j)) for each j ∈ [γ], we take γ independent samples from RW_ℓ^G(u). Let w_u be the fraction of these random walks that revisit u in ℓ steps.
   3. Let Ṽ_heavy be the set of vertices with w_u ≥ 0.5, and Ṽ_light be the set of vertices with w_u < 0.5.
2: Second pass: heavy-light edge recording.
   1. Let V_full = Ṽ_heavy.
   2. Run P_one-pass(G, γ · ℓ, V_full) to obtain a collection of lists L⃗^save.
Output: The set V_full and the collection of lists L⃗^save.

Algorithm 4 Sampling phase of A_two-pass: S_two-pass(V, u_start, L, V_full, L⃗^save = {L_v^save}_{v∈V})
Input: A starting vertex u_start. The path length L ∈ ℕ. A subset V_full ⊆ V, and a collection of lists L⃗^save.
Output: Simulate S_one-pass(V, u_start, L, V_full, L⃗^save) and return its output.

Analysis of the algorithm. We first show that, with high probability, Ṽ_light and Ṽ_heavy are subsets of V_light and V_heavy respectively.

Lemma 4.10. Given a directed graph G = (V,E) with n vertices, L ∈ ℕ and δ ∈ (0, 1/n), letting ℓ = √L, with probability at least 1 − δ/2 over the internal randomness of P_two-pass(G, L, δ), it holds that Ṽ_light ⊆ V_light and Ṽ_heavy ⊆ V_heavy.

Proof. Setting c_1 in Algorithm 3 to be a large enough constant and applying Corollary 4.9 and the Chernoff bound, with probability at least 1 − n · δ³ ≥ 1 − δ/2 we have |w_u − visit_{[1,ℓ]}(u, u)| ≤ 0.1 for every u ∈ V. The lemma then follows from the definition of heavy and light vertices.

Next, we show that with high probability, a random walk does not visit a light vertex too many times.

Lemma 4.11. Given a directed graph G = (V,E) with n vertices, L ∈ ℕ and δ ∈ (0, 1/n), letting ℓ = √L and γ = c_1 · log δ^{−1}, where c_1 > 1 is a sufficiently large constant, for every vertex u_start ∈ V and every vertex v ∈ V^ℓ_light(G), an L-step random walk starting from u_start visits v more than γ · ℓ times with probability at most δ/2n.

Proof. Suppose we have an infinite random walk W starting from ustart in G. Letting τ = γ`, the goal here is to bound the probability that during the first L steps, W visits v more than τ times.

We denote this as the bad event E_bad. Let Z_i be the random variable representing the step at which W visits v for the i-th time (if W visits v fewer than i times in total, we let Z_i = ∞). E_bad is equivalent to Z_{τ+1} ≤ L. Z_{τ+1} ≤ L further implies that for at least (τ − ℓ) indices i ∈ [τ], Z_{i+1} − Z_i ≤ ℓ and Z_i < ∞. In the following we denote this event by E_1 and bound its probability instead. For each i ∈ [τ], let Y_i be the random variable which takes value 1 if both Z_i < ∞ and Z_{i+1} − Z_i ≤ ℓ hold, and 0 otherwise. Letting Y_{<i} denote (Y_1, . . . , Y_{i−1}), we have the following claim.

Claim 4.12. For every i ∈ [τ] and every possible assignment Y_{<i} of Y_{<i}, it holds that

E[Y_i | Y_{<i} = Y_{<i}] ≤ 2/3.

Proof. By the Markov property of the random walk, and noting that Y_i is always 0 when Z_i = ∞, we have

E[Y_i | Y_{<i} = Y_{<i}] = Σ_{j=1}^{∞} Pr[Z_i = j | Y_{<i} = Y_{<i}] · E[Y_i | Z_i = j].

16 To further bound the quantity above, recall that the event Zi = j means that the random walk W starting from ustart visits the light vertex v for the i-th time at W’s j-th step, and we have

E[Yi|Zi = j] = Pr[Yi = 1|Zi = j] = Pr[Zi+1 ≤ j + `|Zi = j].

By the Markov property of the random walk W, Pr[Zi+1 ≤ j + `|Zi = j] equals the probability that a random walk starting from v revisits v in at most ` steps. By the definition of light vertices, we can bound that by 2/3, which completes the proof.

Then by the Azuma-Hoeffding inequality (Corollary 3.4),

Pr_W[E_bad] ≤ Pr_W[E_1] = Pr_W[ Σ_{i=1}^{τ} Y_i ≥ τ − ℓ ] ≤ exp(−Ω(τ − ℓ − (2/3)·τ)) ≤ δ/2n,

where the last inequality follows from the fact that γ = c_1 · log δ^{−1} for a sufficiently large constant c_1.

The correctness of the algorithm is finally completed by the following theorem.

Theorem 4.13 (Formal version of Theorem 1.1). Given a directed graph G = (V,E) with n vertices, L ∈ ℕ and δ ∈ (0, 1/n), let L⃗^save and V_full be the two random variables of the output of P_two-pass(G, L, δ). For every u_start ∈ V, the following hold:

• The output distribution of S_two-pass(V, u_start, L, V_full, L⃗^save) has statistical distance at most δ from RW_L^G(u_start).

• Both P_two-pass(G, L, δ) and S_two-pass(V, u_start, L, V_full, L⃗^save) use at most O(n · √L · log δ^{−1}) words of space.

Proof. Note that we can safely assume L ≤ n², since otherwise one can always use O(n²) words to store all the edges in the graph. In this case, we have that L ≤ n · √L, and the space for storing the L-step output walk can be ignored.

Let Ṽ_heavy = V_full and Ṽ_light = V \ Ṽ_heavy. Let E_good be the event that Ṽ_light ⊆ V_light and Ṽ_heavy ⊆ V_heavy. By Lemma 4.10, we have that Pr[E_good] ≥ 1 − δ/2. Now we condition on the event E_good. In this case, it follows from Lemma 4.3 that P_one-pass(G, γ · ℓ, Ṽ_heavy) operates correctly (by setting the constant c_2 in Algorithm 1 to be sufficiently large).

By Lemma 4.11 and a union bound, the probability that S_two-pass(V, u_start, L, Ṽ_heavy, L⃗^save) outputs failure is at most δ/2. By Lemma 4.8, it follows that the statistical distance between the output distribution of S_two-pass(V, u_start, L, Ṽ_heavy, L⃗^save) and RW_L^G(u_start) is at most δ/2. The theorem follows by combining the above with the fact that Pr[E_good] ≥ 1 − δ/2.

4.4 Two-pass Streaming in the Turnstile Model

Similar to the algorithm in [Jin19], our algorithms can also be easily adapted to work in the turnstile graph streaming model, where both insertions and deletions of edges are allowed. Note that our two-pass algorithm A_two-pass only accesses the input graph stream via the one-pass preprocessing subroutine P_one-pass. Hence, it suffices to implement P_one-pass in the turnstile model as well. There are two distinct tasks in P_one-pass: (1) for light vertices, we need to sample their outgoing neighbors with replacement, and (2) for heavy vertices, we need to record all their outgoing neighbors.

Uniformly sampling via ℓ_1 sampler. For light vertices, uniformly sampling some out-neighbors of each vertex with replacement can be implemented via the following ℓ_1 sampler in the turnstile model.

Lemma 4.14 (ℓ_1 sampler in the turnstile model [JW18]). Let n ∈ ℕ, failure probability δ ∈ (0, 1/2), and let f ∈ ℝ^n be a vector defined by a stream of updates to its coordinates of the form f_i ← f_i + ∆, where ∆ ∈ {−1, 1}. There is a randomized algorithm which reads the stream and, with probability at most δ, outputs FAIL; otherwise it outputs an index i ∈ [n] such that

Pr(i = j) = |f_j| / ‖f‖_1 + O(n^{−c}), for all j ∈ [n],

where c ≥ 1 is some arbitrarily large constant. The space complexity of this algorithm is bounded by O(log²(n) · log(1/δ)) bits in the random oracle model, and O(log²(n) · (log log n)² · log(1/δ)) bits otherwise.

Remark 4.15. To get the error in statistical distance to also be at most δ, one can simply set n to be larger than 1/δ. In that case the space complexity can be bounded by O(log⁴(n/δ)).

Recording all outgoing neighbors via ℓ_1 heavy hitter. For heavy vertices, recording all their outgoing neighbors can be implemented using the following ℓ_1 heavy hitter in the turnstile model. (Recall that we assumed our graph is a simple graph without multiple edges.)

Lemma 4.16 (ℓ_1 heavy hitter in the turnstile model [CCFC02]). Let n, k ∈ ℕ, δ ∈ (0, 0.1), and let f ∈ ℝ^n be a vector defined by a stream of updates to its coordinates of the form f_i ← f_i + ∆, where ∆ ∈ {−1, 1}. There is an algorithm which reads the stream and returns a subset L ⊂ [n] such that i ∈ L for every i ∈ [n] with |f_i| ≥ ‖f‖_1/k, and i ∉ L for every i ∈ [n] with |f_i| ≤ ‖f‖_1/(2k). The failure probability is at most δ, and the space complexity is at most O(k · log(n) · log(n/δ)).

Algorithm in the turnstile model. Modifying P_one-pass with Lemma 4.14 and Lemma 4.16, we can generalize our two-pass streaming algorithm to work in the two-pass turnstile model.⁷

Remark 4.17 (Two-pass algorithm in the turnstile model). There exists a streaming algorithm A_turnstile that, given an n-vertex directed graph G = (V,E) via a stream of both edge insertions and edge deletions, a starting vertex u_start ∈ V, a non-negative integer L indicating the number of steps to be taken, and an error parameter δ ∈ (0, 1/n), satisfies the following conditions:

⁷In more detail, for each light vertex u, we run τ independent copies of the ℓ_1 sampler to obtain τ samples from its outgoing neighbors with replacement. We also let k = c_2 · τ · n and use the ℓ_1 heavy hitter to record all outgoing neighbors of all heavy vertices, in Õ(n · √L · log(1/δ)) total space.

18 √ −1 1. Aturnstile uses at most Oe(n · L · log δ ) space and makes two passes over the input graph G.

2. A_turnstile samples from some distribution D over V^{L+1} satisfying ‖D − RW_L^G(u_start)‖_TV ≤ δ.

5 Proof of Theorem 1.2

Reminder of Theorem 1.2. Fix a constant β ∈ (0, 1] and an integer p ≥ 1. Let n ≥ 1 be a sufficiently large integer and let L = ⌈n^β⌉. Any randomized p-pass streaming algorithm that, given an n-vertex directed graph G = (V,E) and a starting vertex u_start ∈ V, samples from a distribution D such that ‖D − RW_L^G(u_start)‖_TV ≤ 1 − 1/log^{10} n requires Ω̃(n · L^{1/p}) space.

Proof. We show Theorem 1.2 in two steps, that are captured in Lemma 5.1 and Theorem 5.2 below. Theorem 1.2 is a direct corollary of Lemma 5.1 and Theorem 5.2.

The following distribution is used in Lemma 5.1 and Theorem 5.2. We sometimes omit the subscript L when it is clear from context.

Hard Input Distribution Dn,p,L

• Setup: Define ∆ = (L/(log n)^{10p})^{1/p} and ∆′ = ∆/n.

• Vertices: We construct a layered graph G with p + 2 layers V1,V2, ··· ,Vp+2 satisfying |V1| = 1 and |Vi| = n for all i ∈ [p + 2] \{1}. We use s to refer to the unique vertex in V1.

• Edges: There are p + 2 sets of edges E1,E2, ··· ,Ep+2 where edges Ei are between Vi and layer Vi+1 (indices taken modulo p + 2). These are constructed as follows:

– The set E1 is a singleton {(s, v)} where v is a vertex sampled uniformly at random from V2.

– For all 1 < i < p + 2 and all v ∈ V_i, sample a subset S_v ⊆ V_{i+1} uniformly and independently such that |S_v| = ∆. Set E_i = {(v, v′) | v ∈ V_i, v′ ∈ S_v}.

– Define the set Ep+2 = {(v, s) | v ∈ Vp+2}.

• Edge ordering: The edges are revealed to the streaming algorithm in the order

Ep+2,Ep+1, ··· ,E1.
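As a concrete illustration, the following sketch samples a graph from this hard distribution (vertex labels and the returned ordering are our own conventions; ∆ is passed in directly rather than computed from L and p):

import random

def sample_hard_instance(n, p, delta):
    # Layers V_1, ..., V_{p+2}: V_1 = {s}, every other layer has n vertices.
    s = ("V1", 0)
    layers = [[s]] + [[("V%d" % i, j) for j in range(n)] for i in range(2, p + 3)]
    edges = {}
    edges[1] = [(s, random.choice(layers[1]))]                 # E_1: a single random edge out of s
    for i in range(2, p + 2):                                  # E_i for 1 < i < p + 2
        edges[i] = [(v, w) for v in layers[i - 1]
                    for w in random.sample(layers[i], delta)]  # |S_v| = delta random out-neighbors
    edges[p + 2] = [(v, s) for v in layers[p + 1]]             # E_{p+2}: every last-layer vertex back to s
    # Edges are revealed in the order E_{p+2}, E_{p+1}, ..., E_1.
    return [edges[i] for i in range(p + 2, 0, -1)]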

Lemma 5.1. Suppose there exist a constant β ∈ (0, 1], an integer p ≥ 1, and integers n, L that are sufficiently large and satisfy L = ⌈n^β⌉, such that there exists a (randomized) p-pass streaming algorithm A that takes space n·L^{1/p}/(log^{10} n)^{20p} and, given an n-vertex directed graph G = (V,E) and a starting vertex u_start ∈ V, can sample from a distribution D such that ‖D − RW_L^G(u_start)‖_TV ≤ 1 − 1/(log n)^{10}.

Then, there exists another randomized p-pass streaming algorithm A′ that takes space n·L^{1/p}/(log^{10} n)^{15p} and satisfies:

Pr_{G∼D_{n,p}}[A′(G) = P^{p+1}(s)] ≥ 1/(log n)^{20}.

Proof. Let A′ be the algorithm that first runs A on its input and s to get as output a walk

W = (v_0, v_1, . . . , v_L). Define E(W) = {(v_{i−1}, v_i) | i ∈ [L]} to be the set of edges witnessed by W. The algorithm A′ then outputs all paths P^{p+1}(s, W) of length p + 1 starting from s using only the edges E(W).

Let N^{≤p}(s) = ∪_{i=0}^{p} N^i(s). Observe that if a walk W satisfies P^{p+1}(s, W) ≠ P^{p+1}(s), then either E(W) ⊄ E or P(N^{≤p}(s)) ⊄ E(W). Thus, we have, for all G ∈ supp(D_{n,p}), that:

Pr[A′(G) ≠ P^{p+1}(s)] = Pr_{W∼A(G)}[P^{p+1}(s, W) ≠ P^{p+1}(s)]
    ≤ Pr_{W∼RW_L^G(s)}[P^{p+1}(s, W) ≠ P^{p+1}(s)] + 1 − 1/(log n)^{10}
    ≤ Pr_{W∼RW_L^G(s)}[E(W) ⊄ E ∨ P(N^{≤p}(s)) ⊄ E(W)] + 1 − 1/(log n)^{10}.    (Union bound)

As E(W) ⊆ E for all W ∼ RW_L^G(s), we have:

Pr[A′(G) ≠ P^{p+1}(s)] ≤ Pr_{W∼RW_L^G(s)}[P(N^{≤p}(s)) ⊄ E(W)] + 1 − 1/(log n)^{10}.

Thus, to finish the proof, it suffices to show that Pr_{W∼RW_L^G(s)}[P(N^{≤p}(s)) ⊄ E(W)] ≤ 1/(log n)^{20}. This is done in the rest of the proof. First, observe from the definition of D_{n,p} that P(N^{≤p}(s)) is a collection of at most p · ∆^p ≪ L/(log n)^{8p} edges. We get by a union bound:

Pr_{W∼RW_L^G(s)}[P(N^{≤p}(s)) ⊄ E(W)] ≤ L · max_{e∈P(N^{≤p}(s))} Pr_{W∼RW_L^G(s)}(e ∉ E(W)).    (3)

Fix e ∈ P(N^{≤p}(s)) and observe that v_i = s for every i that is a multiple of p + 2. Using the Markov property of random walks, we get:

Pr_{W∼RW_L^G(s)}(e ∉ E(W)) ≤ ( Pr_{W∼RW_L^G(s)}(∀i ∈ [p + 2] : e ≠ (v_{i−1}, v_i)) )^{L/(10p)}.

As the out-degree of s is 1 and the out-degree of every other vertex is at most ∆ (Observation 5.3), we conclude that:

Pr_{W∼RW_L^G(s)}(e ∉ E(W)) ≤ (1 − 1/∆^p)^{L/(10p)} ≤ e^{−L/(10p·∆^p)} ≤ 1/n^{20},

as p · ∆^p ≪ L/(log n)^{8p}. Plugging into Equation 3 finishes the proof.

Theorem 5.2. Let a constant β ∈ (0, 1] and an integer p ≥ 1 be given. Let n ≥ 1 be sufficiently large and L = ⌈n^β⌉. For all (randomized) p-pass streaming algorithms A that take space n·L^{1/p}/(log^{10} n)^{15p}, we have that:

Pr_{G∼D_{n,p}}[A(G) = P^{p+1}(s)] ≤ 1/(log n)^{50}.

The proof of Theorem 5.2 spans the rest of this section. We start with some notation and some properties of the distribution D_{n,p}. Fix p ≥ 1 and n large enough (as a function of p) for the rest of this subsection.

5.1 Properties of Dn,p

Notation. As the set E_{p+2} is fixed, we shall sometimes view D_{n,p} as a distribution over the sets E_1, E_2, ..., E_{p+1}. We shall use V to denote the set of all vertices and E to denote the set of all edges, thus G = (V,E). For i ∈ [p + 1] we define E_{−i} to be E \ E_i.

For a vertex v ∈ V and i ≥ 0, define the set P^i(v) to be the set of paths of length i starting from v. Also define P^i(S) for a subset S ⊆ V of vertices as P^i(S) = ∪_{v∈S} P^i(v). We drop the superscript i when i = 1. Observe that for all i ∈ [p + 2], we have P(V_i) = E_i. Similarly, define N^i(v) to be the set of all vertices that can be reached by a path of length exactly i from v, i.e., a vertex v′ ∈ N^i(v) if and only if there is a path ending at v′ in P^i(v).

Throughout, we shall use h(x) = −x log(x) − (1 − x) log(1 − x) to denote the binary entropy function. Observe that h(·) is concave and monotone increasing for 0 < x < 1/2. In this section, we collect some useful properties of the distribution D_{n,p} defined above. All these properties can be proved by straightforward but tedious calculations, so we defer their proofs to Section B.1.

5.1.1 Size of Nk(s)

We state without proof the following observation:

Observation 5.3. It holds that:

1. For all v ∈ V1 ∪ Vp+2, we have |N(v)| = 1.

2. For all v ∈ V \ (V1 ∪ Vp+2), we have |N(v)| = ∆.

3. For all k ∈ [p + 1], we have Nk(s) ≤ ∆k−1.

Owing to item 3 above, it shall be useful to define, for all k ∈ [p + 1], the notation

∆_k = ∆^{k−1} and ∆′_k = ∆_k/n.    (4)

Also recall that we used ∆′ to denote ∆/n.

Lemma 5.4. For all k ∈ [p + 1], we have:

Pr( |N^k(s)| ≤ ∆_k · (1 − 2k/(log n)^{10p}) ) ≤ k/n^{200}.

Corollary 5.5 (Corollary of Observation 5.3 and Lemma 5.4). For all k ∈ [p + 1], we have:

Pr( ∆_k · (1 − 1/(log n)^{10p−2}) ≤ |N^k(s)| ≤ ∆_k ) ≥ 1 − k/n^{150}.

5.1.2 Entropy of Nk(s)

Lemma 5.6. For all v ∈ V \ (V_1 ∪ V_{p+2}) and any event E, we have:

H(N(v) | E) ≤ n · h(∆′ · (1 + 1/(log n)^{10p}))    and
H(N(v)) ≥ n · h(∆′ · (1 − 1/(log n)^{10p})).

Corollary 5.7. For all 1 < k ≤ p + 1 and any event E, we have:

H(E_k | E) ≤ n² · h(∆′ · (1 + 1/(log n)^{10p}))    and
H(E_k) ≥ n² · h(∆′ · (1 − 1/(log n)^{10p})).

Lemma 5.8. We have H(N(s)) = log n and, for all 1 < k ≤ p + 1:

H(N^k(s)) ≥ n · h(∆′_k · (1 − 1/(log n)^{8p})).

5.1.3 Entropy of Pk(s)

Lemma 5.9. For all events E, we have H(P(s) | E) ≤ log n and, for all 1 < k ≤ p + 1:

H(P^k(s) | E) ≤ ∆′_{k−1} · (1 + 1/(log n)^{9p}) · H(E_k).

Lemma 5.10. We have H(P(s)) = log n and, for all 1 < k ≤ p + 1:

H(P^k(s)) ≥ ∆′_{k−1} · (1 − 1/(log n)^{9p}) · H(E_k).

Lemma 5.11. For all events E, it holds that:

2^{−H_∞(P^{p+1}(s) | E)} ≤ 1 − (H(P^{p+1}(s) | E) − 1) / (∆′_p · (1 + 1/(log n)^{9p}) · H(E_{p+1})).

22 5.2 The Communication Lower Bound

Reminder of Theorem 5.2. Let a constant β ∈ (0, 1] and an integer p ≥ 1 be given. Let n ≥ 1 be sufficiently large and L = ⌈n^β⌉. For all (randomized) p-pass streaming algorithms A that take space n·L^{1/p}/(log^{10} n)^{15p}, we have that:

Pr_{G∼D_{n,p}}[A(G) = P^{p+1}(s)] ≤ 1/(log n)^{50}.

Communication game. To show our lower bound, we consider a communication game Π with p + 1 players. The edges E_{p+2} are known to all players. In addition, player i ∈ [p + 1] also knows the edges E_{p+2−i}. Define T = p(p + 1). The communication takes place in T − 1 rounds, where in round i, player i sends a message M_i to player i + 1 (indices taken modulo p + 1) based on its input and all the messages received so far. After these T − 1 rounds have taken place, player p + 1 outputs an answer based on its input and all the received messages. We treat the output as the T-th message and denote it by M_T. We use ‖Π‖ to denote the maximum (over all inputs) total communication in Π (excluding the output).

Proof of Theorem 5.2. To start, note that we can assume that A is deterministic without loss of generality. The algorithm A implies a deterministic communication game Π as above satisfying

‖Π‖ ≤ T · n · L^{1/p}/(log n)^{10^{15p}} ≤ n · ∆/(log n)^{10^{10p}}.

For j ∈ [T], we shall use Mj to denote the random variable corresponding to the j-th message of the protocol. This random variable is over the probability space defined by Dn,p. Also, define M≤j = (M1, …, Mj) and, for all k ∈ [p + 1], define:

ε_k = (1/log n)^{10^{5(p+1)−2k}}.

Observe that ‖Π‖ ≤ ε_1^2 · H(Ek). We first show the following conditional independence result:

Lemma 5.12. For all 0 ≤ j ≤ T and i ∈ [p + 1], we have:

I(Ei : E−i | M≤j) = 0.

Proof. We repeatedly apply Lemma A.13 to remove the conditioning on M≤j. This is possible as, for all j′ ∈ [j], either message j′ is not sent by player p + 2 − i, in which case Mj′ is independent of Ei given E−i and M<j′, or message j′ is sent by player p + 2 − i, in which case Mj′ is independent of E−i given Ei and M<j′. We conclude that:

I(Ei : E−i | M≤j) ≤ I(Ei : E−i) = 0, by definition of Dn,p.

The following lemma shows that the entropy of the set P^{p+1}(s) remains high after knowing all messages M≤(p+1)p.

Lemma 5.13. For all 1 ≤ k ≤ p + 1, we have:

H(P^k(s) | M≤kp) = log n, if k = 1;

H(P^k(s) | M≤kp) ≥ (1 − ε_k) · ∆′_{k−1} · H(Ek), if k > 1.

Before proving Lemma 5.13, we need the following two important technical lemmas, whose proofs can be found in Section B.2. Let v_k^{(1)}, v_k^{(2)}, …, v_k^{(n)} be the vertices in layer Vk and, for 1 < k ≤ p + 1, write t = kp and t0 = (k − 1)p.

Lemma 5.14. For all 1 < k ≤ p + 1, assuming Lemma 5.13 holds for k − 1, there exists a set B1 ⊆ supp(M≤(k−1)p) such that Pr(M≤(k−1)p ∈ B1) ≤ √ε_{k−1} and for all M≤(k−1)p ∉ B1, we have:

H(N^{k−1}(s) | M≤(k−1)p) ≥ H(N^{k−1}(s)) − 10 · √ε_{k−1} · ∆′_{k−2} · H(Ek−1), if k > 2;

H(N^{k−1}(s) | M≤(k−1)p) = H(N^{k−1}(s)) = log n, if k = 2.

Lemma 5.15. For all 1 < k ≤ p + 1, assume Lemma 5.13 holds for k − 1 and let B1 be the set promised by Lemma 5.14. For all M≤t0 ∉ B1, we have:

|{i ∈ [n] : Pr(v_k^{(i)} ∈ N^{k−1}(s) | M≤t0) ≤ ∆′_{k−1}/(1 + ε_k^5)}| ≤ ε_k^5 · n.

Now we are ready to prove Lemma 5.13.

Proof of Lemma 5.13. Induction on k. For the base case, we have k = 1. Recall that M≤p is determined by E−1 while P(s) is determined by E1. As these two are independent, we have that:

H(P(s) | M≤p) = H(P(s)) = log n, as desired. For the induction step, we show the lemma holds for k > 1 assuming it holds for k − 1.

Let B1 be the set promised by Lemma 5.14. Define, for all M≤t0 , the set:

S(M≤t0) = {i ∈ [n] : Pr(v_k^{(i)} ∈ N^{k−1}(s) | M≤t0) ≤ ∆′_{k−1}/(1 + ε_k^5)}. (5)

By Lemma 5.15, for all M≤t0 ∉ B1 we have |S(M≤t0)| ≤ ε_k^5 · n. Letting

∇ := H(P^k(s) | M≤t)

for simplicity, we have:

∇ ≥ H(P(N^{k−1}(s)) | M≤t, E−k)    (Lemma A.4 and P^k(s) determines P(N^{k−1}(s)))

≥ H(P(N^{k−1}(s)) | M≤t0, E−k)    (As (M≤t0, E−k) and (M≤t, E−k) determine each other)

≥ Σ_{M≤t0} Σ_{E−k} Pr(M≤t0) · Pr(E−k | M≤t0) · H(P(N^{k−1}(s)) | M≤t0, E−k).    (Definition A.2 and E−k determines N^{k−1}(s))

To continue, note that P(N^{k−1}(s)) is determined by Ek and is independent of E−k conditioned on M≤t0 by Lemma 5.12. We get:

X X  k−1  ∇ ≥ Pr(M≤t0 ) · Pr(E−k | M≤t0 ) · H P(N (s)) | M≤t0

M≤t0 E−k X X X  (i)

To ease the notation, for each i ∈ [n], we define Zi = P(v_k^{(i)}) and Z<i = (Z1, …, Zi−1). We have:

X X  (i) k−1  ∇ ≥ Pr(M≤t0 ) · Pr vk ∈ N (s) | M≤t0 · H(Zi | Z

M≤t0 ∈B/ 1 i/∈S(M≤t0 ) 0 ∆k−1 X X ≥ 5 · Pr(M≤t0 ) · H(Zi | Z

Now, note that by Lemma A.3, we have:

H(Ek | M≤t0) = Σ_{i=1}^{n} H(Zi | Z<i, M≤t0) = Σ_{i∈S(M≤t0)} H(Zi | Z<i, M≤t0) + Σ_{i∉S(M≤t0)} H(Zi | Z<i, M≤t0).

Plugging in, we have:   0 ∆k−1 X X ∇ ≥ 5 · Pr(M≤t0 ) · H(Ek | M≤t0 ) − H(Zi | Z

Next, observe from Lemma A.4 and Lemma 5.6 that H(Zi | Z<i, M≤t0) ≤ H(Zi | M≤t0) ≤ 2n · h(∆′). As |S(M≤t0)| ≤ ε_k^5 · n, the sum over i ∈ S(M≤t0) can therefore be bounded as Σ_{i∈S(M≤t0)} H(Zi | Z<i, M≤t0) ≤ ε_k^5 · 2n^2 · h(∆′). Plugging in, we have:

0 ∆k−1 X 5 2 0 ∇ ≥ 5 · Pr(M≤t0 ) · H(Ek | M≤t0 ) − εk · 2n · h ∆ 1 + εk M≤t0 ∈B/ 1   0 ∆k−1 5 2 0 X ≥ 5 · H(Ek | M≤t0 ) − εk · 2n · h ∆ − Pr(M≤t0 ) · H(Ek | M≤t0 ). 1 + εk M≤t0 ∈B1 (Definition A.2)

Recall that by Corollary 5.7, we have, for all M≤t0 , that:

H(Ek | M≤t0) ≤ 2 · n^2 · h(∆′), (6)

H(Ek) ≥ n^2 · h(∆′) · (1 − ε_k^5). (7)

Plugging in,

0 ∆k−1 5 2 0 2 0 ∇ ≥ 5 · H(Ek | M≤t0 ) − εk · 2n · h ∆ − 2 · Pr(M≤t0 ∈ B1) · n · h ∆ (Equation 6) 1 + εk 0 ∆k−1 5 2 0 5 2 0 ≥ 5 · H(Ek | M≤t0 ) − εk · 2n · h ∆ − εk · n · h ∆ 1 + εk √ 5 (As Pr(M≤t0 ∈ B1) ≤ εk−1 ≤ εk/2) 0 ∆k−1 3  3 5 ≥ 5 · H(Ek | M≤t0 ) − εk · H(Ek) (Equation 7 and εk ≥ 10 · εk) 1 + εk 0 ∆k−1 3  ≥ 5 · H(Ek) · 1 − εk − H(M≤t0 ) (Lemma A.3) 1 + εk 0 2 ≥ (1 − εk) · ∆k−1 · H(Ek). (As H(M≤t0 ) ≤ kΠk ≤ ε1 · H(Ek))

With the help of Lemma 5.13, we now continue the proof of Theorem 5.2. As Lemma 5.13 holds for k = p + 1, we have that H(P^{p+1}(s) | M≤T) ≥ (1 − ε_{p+1}) · ∆′_p · H(Ep+1). The following lemma is analogous to Lemma 5.14.

Lemma 5.16. There exists a set B* ⊆ supp(M≤T) such that Pr(M≤T ∈ B*) ≤ √ε_{p+1} and for all M≤T ∉ B*, we have:

H(P^{p+1}(s) | M≤T) ≥ H(P^{p+1}(s)) − 10 · √ε_{p+1} · ∆′_p · H(Ep+1).

Let B* be the set from Lemma 5.16. We apply Lemma 5.11 for all M≤T ∉ B* to get:

2^{−H_∞(P^{p+1}(s) | M≤T)} ≤ 1 − (H(P^{p+1}(s)) − 15 · √ε_{p+1} · ∆′_p · H(Ep+1)) / (∆′_p · (1 + ε_{p+1}) · H(Ep+1))
≤ 1 − (1 − ε_{p+1} − 15 · √ε_{p+1}) / (1 + ε_{p+1})    (Lemma 5.10)
≤ 20 · √ε_{p+1}.

It follows that:

Pr_{G∼Dn,p}(A(G) = P^{p+1}(s)) ≤ 25 · √ε_{p+1} ≤ 1/(log n)^{50}.

Acknowledgments

Lijie Chen is supported by an IBM Fellowship. Zhao Song is supported in part by Schmidt Foundation, Simons Foundation, NSF, DARPA/SRC, Google and Amazon AWS. We would like to thank Rajesh Jayaram for discussions on ℓ1 heavy hitters.

References

[ACK19] Sepehr Assadi, Yu Chen, and Sanjeev Khanna. Sublinear algorithms for (∆ + 1) vertex coloring. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 767–786. SIAM, 2019.

[ACL07] Reid Andersen, Fan Chung, and Kevin Lang. Using pagerank to locally partition a graph. Internet Mathematics, 4(1):35–64, 2007.

[AKL16] Sepehr Assadi, Sanjeev Khanna, and Yang Li. Tight bounds for single-pass streaming complexity of the set cover problem. In 48th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 698–711. Association for Computing Machinery, 2016.

[AP09] Reid Andersen and Yuval Peres. Finding sparse cuts locally using evolving sets. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 235–244, 2009.

[Azu67] Kazuoki Azuma. Weighted sums of certain dependent random variables. Tohoku Mathematical Journal, Second Series, 19(3):357–367, 1967.

[BP98] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer networks and ISDN systems, 30(1-7):107–117, 1998.

[CCFC02] Moses Charikar, Kevin Chen, and Martin Farach-Colton. Finding frequent items in data streams. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 693–703. Springer, 2002.

[CDK19] Graham Cormode, Jacques Dark, and Christian Konrad. Independent sets in vertex- arrival streams. In 46th International Colloquium on Automata, Languages, and Pro- gramming (ICALP). Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2019.

[CGMV20] Amit Chakrabarti, Prantar Ghosh, Andrew McGregor, and Sofya Vorotnikova. Vertex ordering problems in directed graph streams. In Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1786–1802. SIAM, 2020.

[Che52] Herman Chernoff. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics, pages 493–507, 1952.

[COP03] Moses Charikar, Liadan O’Callaghan, and Rina Panigrahy. Better streaming algo- rithms for clustering problems. In Proceedings of the thirty-fifth annual ACM sympo- sium on Theory of computing, pages 30–39, 2003.

[CT06] Thomas M. Cover and Joy A. Thomas. Elements of information theory (2. ed.). Wiley, 2006.

[CW16] Amit Chakrabarti and Anthony Wirth. Incidence geometries and the pass complexity of semi-streaming set cover. In Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete algorithms (SODA), pages 1365–1373. SIAM, 2016.

[ER14] Yuval Emek and Adi Rosén. Semi-streaming set cover. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 453–464. Springer, 2014.

[FKM+04] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. On graph problems in a semi-streaming model. In International Colloquium on Automata, Languages, and Programming (ICALP), pages 531–543. Springer, 2004.

[FKM+09] Joan Feigenbaum, Sampath Kannan, Andrew McGregor, Siddharth Suri, and Jian Zhang. Graph distances in the data-stream model. SIAM Journal on Computing, 38(5):1709–1727, 2009.

[GGK+18] Mohsen Ghaffari, Themis Gouleakis, Christian Konrad, Slobodan Mitrović, and Ronitt Rubinfeld. Improved massively parallel computation algorithms for MIS, matching, and vertex cover. In Proceedings of the 2018 ACM Symposium on Principles of Distributed Computing (PODC), pages 129–138, 2018.

[GKK12] Ashish Goel, Michael Kapralov, and Sanjeev Khanna. On the communication and streaming complexity of maximum bipartite matching. In Proceedings of the twenty- third annual ACM-SIAM symposium on Discrete Algorithms (SODA), pages 468–485. SIAM, 2012.

[GKMS19] Buddhima Gamlath, Sagar Kale, Slobodan Mitrovic, and Ola Svensson. Weighted matchings via unweighted augmentations. In Proceedings of the 2019 ACM Symposium on Principles of Distributed Computing (PODC), pages 491–500, 2019.

[GO16] Venkatesan Guruswami and Krzysztof Onak. Superlinear lower bounds for multipass graph processing. Algorithmica, 76(3):654–683, 2016.

[Hoe94] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. In The Collected Works of Wassily Hoeffding, pages 409–426. Springer, 1994.

[HPIMV16] Sariel Har-Peled, Piotr Indyk, Sepideh Mahabadi, and Ali Vakilian. Towards tight bounds for the streaming set cover problem. In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems (PODS), pages 371–383, 2016.

[HRR98] Monika Rauch Henzinger, Prabhakar Raghavan, and Sridhar Rajagopalan. Computing on data streams. External memory algorithms, 50:107–118, 1998.

[Jin19] Ce Jin. Simulating random walks on graphs in the streaming model. In Avrim Blum, editor, 10th Innovations in Theoretical Computer Science Conference, ITCS 2019, January 10-12, 2019, San Diego, California, USA, volume 124 of LIPIcs, pages 46:1–46:15. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2019.

[JS89] Mark Jerrum and Alistair Sinclair. Approximating the permanent. SIAM journal on computing, 18(6):1149–1178, 1989.

[JVV86] Mark R Jerrum, Leslie G Valiant, and Vijay V Vazirani. Random generation of combi- natorial structures from a uniform distribution. Theoretical computer science, 43:169– 188, 1986.

[JW18] Rajesh Jayaram and David P. Woodruff. Perfect lp sampling in a data stream. In Mikkel Thorup, editor, 59th IEEE Annual Symposium on Foundations of Computer Science, FOCS 2018, Paris, France, October 7-9, 2018, pages 544–555. IEEE Com- puter Society, 2018.

[Kap13] Michael Kapralov. Better bounds for matchings in the streaming model. In Proceedings of the twenty-fourth annual ACM-SIAM symposium on Discrete algorithms (SODA), pages 1679–1697. SIAM, 2013.

[McG05] Andrew McGregor. Finding graph matchings in data streams. In Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 170–181. Springer, 2005.

[MN20] Sagnik Mukhopadhyay and Danupon Nanongkai. Weighted min-cut: sequential, cut- query, and streaming algorithms. In Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 496–509, 2020.

[NW91] Noam Nisan and Avi Wigderson. Rounds in communication complexity revisited. In Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, May 5-8, 1991, New Orleans, Louisiana, USA, pages 419–429. ACM, 1991.

[Rei08] Omer Reingold. Undirected connectivity in log-space. Journal of the ACM (JACM), 55(4):1–24, 2008.

[RSW18] Aviad Rubinstein, Tselil Schramm, and Seth Matthew Weinberg. Computing exact minimum cuts without knowing the graph. In 9th Innovations in Theoretical Computer Science (ITCS), page 39. Schloss Dagstuhl-Leibniz-Zentrum fur Informatik GmbH, Dagstuhl Publishing, 2018.

[Sch18] Aaron Schild. An almost-linear time algorithm for uniform random spanning tree generation. In Proceedings of the 50th Annual ACM SIGACT Symposium on Theory of Computing (STOC), pages 214–227, 2018.

[SGP11] Atish Das Sarma, Sreenivas Gollapudi, and Rina Panigrahy. Estimating pagerank on graph streams. J. ACM, 58(3):13:1–13:19, 2011.

[ST13] Daniel A Spielman and Shang-Hua Teng. A local clustering algorithm for massive graphs and its application to nearly linear time graph partitioning. SIAM Journal on computing, 42(1):1–26, 2013.

[Vit85] Jeffrey Scott Vitter. Random sampling with a reservoir. ACM Trans. Math. Softw., 11(1):37–57, 1985.

[Yao77] Andrew Chi-Chih Yao. Probabilistic computations: Toward a unified measure of complexity. In 18th Annual Symposium on Foundations of Computer Science (FOCS), pages 222–227. IEEE Computer Society, 1977.

[Zel11] Mariano Zelke. Intractability of min-and max-cut in streaming graphs. Information Processing Letters, 111(3):145–150, 2011.

Appendix

A Preliminaries in Information Theory

Throughout this subsection, we use sans-serif letters to denote random variables and reserve E to denote an arbitrary event. All random variables will be assumed to be discrete and we shall adopt the convention 0 · log(1/0) = 0. All logarithms are taken with base 2.

A.1 Entropy

Definition A.1 (Entropy). The (binary) entropy of X is defined as:

H(X) = Σ_{x∈supp(X)} Pr(x) · log(1/Pr(x)).

The entropy of X conditioned on E is defined as:

H(X | E) = Σ_{x∈supp(X)} Pr(x | E) · log(1/Pr(x | E)).

Definition A.2 (Conditional Entropy). We define the conditional entropy of X given Y and E as:

H(X | Y, E) = Σ_{y∈supp(Y)} Pr(y | E) · H(X | Y = y, E).

Henceforth, we shall omit writing the supp(·) when it is clear from context.

Lemma A.3 (Chain Rule for Entropy). It holds for all X, Y and E that:

H(XY | E) = H(X | E) + H(Y | X,E).

30 Proof. We have:

H(XY | E) = Σ_{x,y} Pr(x, y | E) · log(1/Pr(x, y | E))
= Σ_{x,y} Pr(x, y | E) · log(1/(Pr(x | E) · Pr(y | x, E)))
= Σ_{x,y} Pr(x, y | E) · log(1/Pr(x | E)) + Σ_{x,y} Pr(x, y | E) · log(1/Pr(y | x, E))
= H(X | E) + Σ_x Pr(x | E) · Σ_y Pr(y | x, E) · log(1/Pr(y | x, E))
= H(X | E) + H(Y | X, E).

Lemma A.4 (Conditioning reduces Entropy). It holds for all X, Y and E that:

H(X | Y,E) ≤ H(X | E).

Equality holds if and only if X and Y are independent conditioned on E.

Proof. We have:

H(X | Y, E) = Σ_y Pr(y | E) · H(X | Y = y, E)
= Σ_{x,y} Pr(y | E) · Pr(x | y, E) · log(1/Pr(x | y, E))
= Σ_{x,y} Pr(x | E) · Pr(y | x, E) · log(Pr(y | E)/(Pr(x | E) · Pr(y | x, E)))
≤ Σ_x Pr(x | E) · log(1/Pr(x | E))    (Concavity of log(·))
= H(X | E).

Lemma A.5. It holds for all X and E that:

0 ≤ H(X | E) ≤ log(|supp(X)|).

The second inequality is tight if and only if X conditioned on E is the uniform distribution over supp(X).

Proof. The first inequality is direct. For the second, we have by the concavity of log(·) that:

H(X | E) = Σ_x Pr(x | E) · log(1/Pr(x | E)) ≤ log(|supp(X)|).

A.2 Min-Entropy

Definition A.6 (Min-Entropy). The min-entropy of a discrete random variable X is

H_∞(X) = min_{x : Pr(x)>0} log(1/Pr(x)).

Fact A.7. If the random variable X takes values in the set Ω, it holds that

0 ≤ H∞(X) ≤ H(X) ≤ log|Ω|
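For example, if X takes three values with probabilities 1/2, 1/4, 1/4, then H_∞(X) = 1, H(X) = 3/2, and log|Ω| = log 3 ≈ 1.585, so each of the inequalities above can be strict.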

Recall that h(x) = −x log(x) − (1 − x) log(1 − x) for x ∈ [0, 1] is the binary entropy function.

Lemma A.8. If the random variable X takes values in the set Ω and |Ω| > 1, it holds that

2^{−H_∞(X)} ≤ 1 − (H(X) − 1)/log(|Ω|).

Proof. If X is a point mass, there is nothing to show. Otherwise, let x∗ be such that Pr(x∗) is the largest possible (breaking ties arbitrarily). We have:

H(X) = Pr(x*) · log(1/Pr(x*)) + Σ_{x≠x*} Pr(x) · log(1/Pr(x))
= h(Pr(x*)) − (1 − Pr(x*)) · log(1/(1 − Pr(x*))) + Σ_{x≠x*} Pr(x) · log(1/Pr(x))
= h(Pr(x*)) + (1 − Pr(x*)) · Σ_{x≠x*} (Pr(x)/(1 − Pr(x*))) · log((1 − Pr(x*))/Pr(x))
= h(Pr(x*)) + (1 − Pr(x*)) · H(X | X ≠ x*).

Using the fact that h(·) ≤ 1 and Fact A.7, we have:

∗ H(X) ≤ 1 + (1 − Pr(x )) · log(|Ω|).

Rearranging gives:

2^{−H_∞(X)} = Pr(x*) ≤ 1 − (H(X) − 1)/log(|Ω|).

A.3 Mutual Information

Definition A.9 (Mutual Information). The mutual information between X and Y is defined as:

I(X : Y) = H(X) − H(X | Y) = H(Y) − H(Y | X).

The mutual information between X and Y conditioned on Z is defined as:

I(X : Y | Z) = H(X | Z) − H(X | YZ) = H(Y | Z) − H(Y | XZ).

Fact A.10. We have 0 ≤ I(X : Y | Z) ≤ H(X).

Fact A.11 (Chain Rule for Mutual Information). If A, B, C, D are random variables, then

I(AB : C | D) = I(A : C | D) + I(B : C | AD).

The following lemmas are standard.

Lemma A.12. For random variables A, B, C, D, if A is independent of D given C, then,

I(A : B | C) ≤ I(A : B | C, D).

Proof. Since A and D are independent conditioned on C, by Lemma A.4, H(A | C) = H(A | C, D) and H(A | B, C, D) ≤ H(A | B, C). We have,

I(A : B | C) = H(A | C) − H(A | B, C) = H(A | C, D) − H(A | B, C) ≤ H(A | C, D) − H(A | B, C, D) = I(A : B | C, D).

Lemma A.13. For random variables A, B, C, D, if A is independent of D given B, C, then,

I(A : B | C, D) ≤ I(A : B | C).

Proof. Since A and D are independent conditioned on B, C, by Lemma A.4, H(A | B, C) = H(A | B, C, D). Moreover, since conditioning can only reduce the entropy (again by Lemma A.4),

I(A : B | C, D) = H(A | C, D) − H(A | B, C, D) = H(A | C, D) − H(A | B, C) ≤ H(A | C) − H(A | B, C) = I(A : B | C).

B Missing Proofs in Section 5

In this section we provide the missing proofs in Section 5.

B.1 Missing Proofs in Section 5.1

Reminder of Lemma 5.4. For all k ∈ [p + 1], we have:

Pr(|N^k(s)| ≤ ∆_k · (1 − 2k/(log n)^{10^{10p}})) ≤ k/n^{200}.

33 Proof. Proof by induction on k. The base case k = 1 follows from Observation 5.3. We show the   2k result for k > 1 by assuming it holds for k − 1. Letting zk = ∆k · 1 − 10p for convenience, (log n)10 we have:       k k−1 k k−1 Pr |N (s)| ≤ zk ≤ Pr |N (s)| ≤ zk−1 + Pr |N (s)| ≤ zk |N (s)| > zk−1 k − 1   ≤ + Pr |Nk(s)| ≤ z |Nk−1(s)| > z . (Induction Hypothesis) n200 k k−1

1 Thus, it is sufficient to show that the second term is at most n200 . To show this we fix an set S such that |S| > zk−1 and show that:   1 Pr |Nk(s)| ≤ z Nk−1(s) = S ≤ . (8) k n200

To see why this holds, first note that conditioned on Nk−1(s) = S, the set Nk(s) is just the set of vertices in later Vk+1 that can be reached from vertices in S (which itself is a subset of Vk). Thus, we have:   k k−1 Pr |N (s)| ≤ zk N (s) = S   k−1 (9) ≤ Pr |{v ∈ Vk+1 | ∃u ∈ S s.t. (u, v) ∈ Ek}| ≤ zk N (s) = S

= Pr(|{v ∈ Vk+1 | ∃u ∈ S s.t. (u, v) ∈ Ek}| ≤ zk),

k−1 where the last step is because N (s) = S is determined by E−k, which is independent of Ek. We now want to upper bound the probability of an event defined by Ek but instead of analyzing 0 00 it directly, we first define two auxiliary random variables Ek and Ek. The values taken by the 0 00 random variables E and E are just a set of edges between Vk and Vk+1. Let z denote z =  k k 0 1 0 ∆ · 1 − 10p . In the random variable E , each edge (u, v) for u ∈ Vk, v ∈ Vk+1 is included (log n)10 k 00 0 independently with probability z. In the random variable Ek, we first sample edges as in Ek and 0 then, if the number of edges coming out of any vertex u ∈ Vk is du ≤ ∆, we make it equal to ∆ 0 0 by sample ∆ − du edges uniformly at random (and do nothing if du > ∆). Denoting by dist(X) the distribution of the random variable X, note first that distE00 | ∀u ∈ V s.t. d0 ≤ ∆ = dist(E ) =⇒ kdistE00 − dist(E )k ≤ Pr∃u ∈ V s.t. d0 > ∆. k k u k k k TV 0 k u Ek (10) 0 00 Plugging Equation 10 into Equation 9 and noting that Ek samples at most as many edges as Ek, we have:   k k−1 Pr |N (s)| ≤ zk N (s) = S ≤ Pr v ∈ V | ∃u ∈ S s.t. (u, v) ∈ E0 ≤ z  + Pr∃u ∈ V s.t. d0 > ∆ 0 k+1 k k 0 k u Ek Ek X ≤ Pr v ∈ V | ∃u ∈ S s.t. (u, v) ∈ E0 ≤ z  + Pr v ∈ V | (u, v) ∈ E0 > ∆. 0 k+1 k k 0 k+1 k Ek Ek u∈Vk (11)

34 0 Now, for v ∈ Vk+1, define the indicator random variable Xv to be 1 if and only if ∃u ∈ S :(u, v) ∈ Ek. 0 Also, for u ∈ Vk, define the indicator random variable Yu,v to be 1 if and only if (u, v) ∈ Ek. Clearly, the random variable Xv are mutually independent for all v ∈ Vk+1 and so are the random variables Yu,v for u ∈ Vk, v ∈ Vk+1. Moreover, we have, for all u ∈ Vk and v ∈ Vk+1 that: ! |S| 0.5 E[Xv] = 1 − (1 − z) ≥ z · zk−1 · 1 − 10p and E[Yu,v] = z, (12) (log n)10

−x x2 as 1 − x ≤ e ≤ 1 − x + 2 , ∀x ∈ (0, 1/10) and zk−1 < |S| ≤ ∆k−1 by Observation 5.3. We now continue Equation 11 using a Chernoff bound (Lemma 3.2).      k k−1  X X X Pr |N (s)| ≤ zk N (s) = S ≤ Pr Xv ≤ zk + Pr Yu,v > ∆ 0   0   Ek Ek v∈Vk+1 u∈Vk v∈Vk+1  0.5 ∆  X  ∆ 1  ≤ exp − 10p · + exp − · 10p (log n)20 10 10 (log n)20 u∈Vk (Lemma 3.2) 1 ≤ , n200 as required for Equation 8.

Reminder of Lemma 5.6. For all v ∈ V \ (V1 ∪ Vp+2) and any event E, we have:

H(N(v) | E) ≤ n · h(∆′) · (1 + 1/(log n)^{10^{10p}})  and  H(N(v)) ≥ n · h(∆′) · (1 − 1/(log n)^{10^{10p}}).

Proof. As v∈ / V1 ∪ Vp+2, we get from Lemma 3.6 that:

  ! n 0 0 1 H(N(v) | E) ≤ log ≤ log n + n · h ∆ ≤ n · h ∆ · 1 + 10p . ∆ (log n)10

For the furthermore part, note that Lemma 3.6 also says that:

  ! n 0 0 1 H(N(v) | E) = log ≥ −3 log n + n · h ∆ ≥ n · h ∆ · 1 − 10p . ∆ (log n)10

Reminder of Corollary 5.7. For all 1 < k ≤ p + 1 and any event E, we have:

H(Ek | E) ≤ n^2 · h(∆′) · (1 + 1/(log n)^{10^{10p}})  and  H(Ek) ≥ n^2 · h(∆′) · (1 − 1/(log n)^{10^{10p}}).

Proof. From Lemma A.3 and Lemma A.4, we have that: ! X 2 0 1 H(Ek | E) ≤ H(N(v) | E) ≤ n · h ∆ · 1 + 10p . (Lemma 5.6) (log n)10 v∈Vk

From Lemma A.3 and independence of N(v) for all v ∈ Vk, we have that: ! X 2 0 1 H(Ek) = H(N(v)) ≥ n · h ∆ · 1 − 10p . (Lemma 5.6) (log n)10 v∈Vk

Reminder of Lemma 5.8. We have H(N(s)) = log n and, for all 1 < k ≤ p + 1:

H(N^k(s)) ≥ n · h(∆′_k) · (1 − 1/(log n)^{10^{8p}}).

Proof. That H(N(s)) = log n follows from Observation 5.3. For the rest, fix k > 1 and define     1 k  k k zk = ∆k· 1 − 9p . From Lemma A.4, we can conclude that H N (s) ≥ H N (s) |N (s)| . (log n)10 By Definition A.2, this implies:       k X k k k H N (s) ≥ Pr |N (s)| = z · H N (s) |N (s)| = z . z≥zk

By symmetry, conditioned Nk(s) = z, Nk(s) is just a uniformly random subset of size z. Thus, we have from Lemma A.5 that:

  X   n Nk(s) ≥ Pr |Nk(s)| = z · log H z z≥zk X   ≥ Pr |Nk(s)| = z · (n · h(z/n) − 3 log n)(Lemma 3.6)

z≥zk X  k  1 ≥ (n · h(zk/n) − 3 log n) · Pr |N (s)| = z (As h(·) is increasing on 0 < x < 2 ) z≥zk

36 !! ! 0 1 X  k  ≥ n · h ∆k · 1 − 9p − 3 log n · Pr |N (s)| = z . (log n)10 z≥zk (Definition of zk)

As k > 1, the first factor is non-negative. Bounding the second by Corollary 5.5, we have:

!! !    k  0 1 1 H N (s) ≥ n · h ∆k · 1 − 9p − 3 log n · 1 − (log n)10 n150 ! !   1 0  1 ≥ n · 1 − 9p · h ∆k − 3 log n · 1 − (Concavity of h(·)) (log n)10 n150 ! 0  1 ≥ n · h ∆k · 1 − 8p . (As k > 1) (log n)10

Reminder of Lemma 5.9. For all events E, we have H(P(s) | E) ≤ log n and, for all 1 < k ≤ p + 1:

H(P^k(s) | E) ≤ ∆′_{k−1} · (1 + 1/(log n)^{10^{9p}}) · H(Ek).

Proof. Proof by induction. The base case k = 1 follows from Observation 5.3. We show the result for k > 1 assuming it holds for k − 1. As Pk(s) is determined by Pk−1(s) and P(Nk−1(s)), we have:     k k−1 k−1 H P (s) E ≤ H P (s)P(N (s)) E     k−1 k−1 k−1 ≤ H P (s) E + H P(N (s)) P (s),E (Lemma A.3)       k−1 X k−1 k−1 k−1 ≤ H P (s) | E + Pr P (s) E · H P(N (s)) P (s),E P k−1(s) (Definition A.2)       k−1 X k−1 k−1 k−1 ≤ H P (s) | E + Pr P (s) E · H P(N (s)) P (s),E . P k−1(s) (As Pk−1(s) determines Nk−1(s))

To continue, we again use Lemma A.3 followed by Lemma A.4.         k k−1 X k−1 X k−1 H P (s) E ≤ H P (s) E + Pr P (s) E · H P(v) P (s),E P k−1(s) v∈N k−1(s) !   X   1 ≤ Pk−1(s) E + Pr P k−1(s) E · |N k−1(s)| · nh∆0 1 + H 1010p P k−1(s) (log n) (Lemma 5.6)

37 !   X   1 ≤ Pk−1(s) E + Pr P k−1(s) E · ∆ · nh∆0 1 + H k−1 1010p P k−1(s) (log n) (Observation 5.3) !   1 k−1 0 ≤ H P (s) E + ∆k−1 · nh ∆ 1 + 10p (log n)10 !   5 k−1 0 ≤ H P (s) E + ∆k−1 · H(Ek) · 1 + 10p . (Corollary 5.7) (log n)10

k−1  Finally, we bound the term H P (s) | E using the induction hypothesis. When k = 2, ∆0 H(Ek) k−1 this term is at most log n ≤ 10p = 10p · H(Ek). Otherwise, when k > 2, we have n·(log n)10 (log n)10   0 0 1 ∆k−1 H(Ek−1) = H(Ek) and this term is at most ∆ · 1 + 9p · H(Ek) ≤ 10p · H(Ek). k−2 (log n)10 (log n)10 Plugging in, we have: !  k  0 1 H P (s) ≤ ∆k−1 · 1 + 9p · H(Ek). (log n)10

Reminder of Lemma 5.10. We have H(P(s)) = log n and, for all 1 < k ≤ p + 1:

H(P^k(s)) ≥ ∆′_{k−1} · (1 − 1/(log n)^{10^{9p}}) · H(Ek).

Proof. The case k = 1 follows from Observation 5.3. We show the result for k > 1. As Pk(s) determines P(Nk−1(s)), we have:

 k   k−1  H P (s) ≥ H P(N (s))   k−1 k−1 ≥ H P(N (s)) P (s) (Lemma A.4)     X k−1 k−1 k−1 ≥ Pr P (s) · H P(N (s)) P (s) (Definition A.2) P k−1(s)     X k−1 k−1 k−1 k−1 k−1 ≥ Pr P (s) · H P(N (s)) P (s) . (As P (s) determines N (s)) P k−1(s)

k−1 k−1 Note that P(N (s)) is determined by Ek and P (s) is determined by E−k. As these are inde- pendent, we have:  k  X  k−1   k−1  H P (s) ≥ Pr P (s) · H P(N (s)) . P k−1(s)

38 As P(v) are mutually independent for all v ∈ N k−1(s), we have:

 k  X  k−1  X H P (s) ≥ Pr P (s) · H(P(v)) P k−1(s) v∈N k−1(s)

X   (Ek) ≥ Pr P k−1(s) · N k−1(s) · H . n P k−1(s)

Using Lemma 5.4, we get: !  k  1 H(Ek) H P (s) ≥ ∆k−1 · 1 − 9p · (log n)10 n ! 0 1 ≥ ∆k−1 · 1 − 9p · H(Ek). (log n)10

Reminder of Lemma 5.11. For all events E, it holds that:

2^{−H_∞(P^{p+1}(s) | E)} ≤ 1 − (H(P^{p+1}(s) | E) − 1) / (∆′_p · (1 + 1/(log n)^{10^{9p}}) · H(Ep+1)).

Proof. Let Ω denote the support of Pp+1(s) and note that Lemma 5.9 implies that log(|Ω|) ≤ 0 p+1 ∆p · (1 + p) · H(Ep+1). Applying Lemma A.8 on the random variable P (s) | E, we have:

p+1  p+1 H P (s) | E − 1 2−H∞(P (s)|E) ≤ 1 − log(|Ω|) p+1  H P (s) | E − 1 ≤ 1 − 0 . ∆p · (1 + p) · H(Ep+1)

B.2 Missing Proofs in Section 5.2 B.2.1 Proof of Lemma 5.14

Reminder of Lemma 5.14. For all 1 < k ≤ p + 1, assuming Lemma 5.13 holds for k − 1, there exists a set B1 ⊆ supp(M≤(k−1)p) such that Pr(M≤(k−1)p ∈ B1) ≤ √ε_{k−1} and for all M≤(k−1)p ∉ B1, we have:

H(N^{k−1}(s) | M≤(k−1)p) ≥ H(N^{k−1}(s)) − 10 · √ε_{k−1} · ∆′_{k−2} · H(Ek−1), if k > 2;

H(N^{k−1}(s) | M≤(k−1)p) = H(N^{k−1}(s)) = log n, if k = 2.

Proof. If k = 2, then we have from the assumption that Lemma 5.13 holds for k − 1 that H(P(s) | M≤t0) = H(P(s)) = log n. Thus, we have I(N(s) : M≤t0) ≤ I(P(s) : M≤t0) = 0 and the result follows.

If k > 2, we have from the assumption that Lemma 5.13 holds for k − 1 that H(P^{k−1}(s) | M≤t0) ≥ (1 − ε_{k−1}) · ∆′_{k−2} · H(Ek−1). Combining with Lemma 5.9, we have that:

I(N^{k−1}(s) : M≤t0) ≤ I(P^{k−1}(s) : M≤t0) ≤ 2ε_{k−1} · ∆′_{k−2} · H(Ek−1).

Using Definition A.9 and Definition A.2, we get that:

E_{M≤t0}[H(N^{k−1}(s)) − H(N^{k−1}(s) | M≤t0)] ≤ 2ε_{k−1} · ∆′_{k−2} · H(Ek−1). (13)

We claim that:

Claim B.1. It holds for all M≤t0 that:

H(N^{k−1}(s)) − H(N^{k−1}(s) | M≤t0) ≥ −4ε_{k−1} · ∆′_{k−2} · H(Ek−1).

Proof. We derive:       k−1 k−1 k−1 k−1 H N (s) M≤t0 ≤ H |N (s)| M≤t0 + H N (s) |N (s)|,M≤t0 (Lemma A.3)  n  ≤ log n + log ∆k−1 k−1 (Lemma A.5 and N (s) ≤ ∆k−1 by Observation 5.3) 0  ≤ 2 log n + n · h ∆k−1 (Lemma 3.6)  k−1  ≤ H N (s) · (1 + 2k−1)(Lemma 5.8)

 k−1   k−1  k−1  k−1  ≤ H N (s) + H P (s) · 2k−1 (H N (s) ≤ H P (s) )

 k−1  0 ≤ H N (s) + 4k−1 · ∆k−2 · H(Ek−1). (Lemma 5.9)

Conclude from Equation 13 that: h     i k−1 k−1 0 0 E H N (s) − H N (s) M≤t0 + 4k−1 · ∆k−2 · H(Ek−1) ≤ 10k−1 · ∆k−2 · H(Ek−1). M≤t0

As the left hand side above is always non-negative by Claim B.1, we can apply Markov’s inequality to conclude that:      √  √ k−1 k−1 0 Pr H N (s) − H N (s) M≤t0 ≥ 10 · k−1 · ∆k−2 · H(Ek−1) ≤ k−1. M≤t0

B.2.2 Proof of Lemma 5.15

Reminder of Lemma 5.15. For all 1 < k ≤ p + 1, assume Lemma 5.13 holds for k − 1 and let B1 be the set promised by Lemma 5.14. For all M≤t0 ∉ B1, we have:

|{i ∈ [n] : Pr(v_k^{(i)} ∈ N^{k−1}(s) | M≤t0) ≤ ∆′_{k−1}/(1 + ε_k^5)}| ≤ ε_k^5 · n.

Proof. Fix M≤t0 ∈/ B1 and let

 0   (i) k−1  ∆k−1 S = i ∈ [n] Pr v ∈ N (s) | M 0 ≤ k ≤t 5 1 + εk for convenience. For k = 2, our definitions imply S = ∅, so we assume k > 2 and proceed by 5 contradiction. Suppose that |S| > k · n. For i ∈ [n], define the indicator random variable Xi to be (i) k−1  (i) k−1  1 if and only if vk ∈ N (s) and define αi = Pr vk ∈ N (s) | M≤t0 = Pr(Xi = 1 | M≤t0 ). By Lemma 5.14, we have:

 k−1  H(X1 ··· Xn | M≤t0 ) ≥ H N (s) | M≤t0

 k−1  √ 0 ≥ H N (s) − 10 · k−1 · ∆k−2 · H(Ek−1) 0  √ 0 ≥ n · h ∆k−1 · (1 − k−1) − 10 · k−1 · ∆k−2 · H(Ek−1). (Lemma 5.8)

Pn By Lemma A.3 and Lemma A.4, we also have H(X1 ··· Xn | M≤t0 ) ≤ i=1 h(αi) implying that:

n 0  √ 0 X n · h ∆k−1 · (1 − k−1) − 10 · k−1 · ∆k−2 · H(Ek−1) ≤ h(αi). (14) i=1

∆0 00 k−1 Pn Using the notation ∆k−1 = 5 , we use the following claim to upper bound i=1 h(αi). 1+k Claim B.2. It holds that:

n 00 X  ∆  h(α ) ≤ 5 · n · h∆00  + n · 1 − 5 · h k−1 . i k k−1 k 1 − 5 i=1 k

Pn 00 Proof. We break the proof into two cases. The easy case is when i=1 αi ≤ n∆k−1. In this case, 1 we simply use the concavity and monotonicity of h(·) on 0 < x < 2 to get:

n 00 X  ∆  h(α ) ≤ n · h∆00  ≤ 5 · n · h∆00  + n · 1 − 5 · h k−1 , i k−1 k k−1 k 1 − 5 i=1 k

Pn 00 and the claim follows. We now deal with the hard case i=1 αi > n∆k−1. In this case, by the

41 P α +λ·P α definition of S, there exists a λ ∈ [0, 1] satisfying i∈S i i∈ /S i = ∆00 . This is equivalent to: |S|+λ·|S| k−1

n X 00 X 00 αi − ∆k−1 · |S| = (1 − λ) · αi + λ · ∆k−1 · S . (15) i=1 i/∈S

Using the concavity of h multiple times, we have:

n X X X X h(αi) ≤ h(αi) + λ · h(αi) + (1 − λ) · h(αi) i=1 i∈S i/∈S i/∈S P !  00  i/∈S αi ≤ |S| + λ · S · h ∆k−1 + (1 − λ) · S · h S Pn 00 ! 00  i=1 αi − ∆k−1 · |S| ≤ |S| · h ∆k−1 + S · h (Equation 15) S Pn 5 00 ! 5 00  5 i=1 αi − n · k · ∆k−1 5 ≤ k · n · h ∆k−1 + n · 1 − k · h 5 . (As |S| > k · n) n · 1 − k

Pn  k−1  To continue, note by Observation 5.3 that i=1 αi = E N (s) | M≤t0 ≤ ∆k−1. This gives:

n 00 X  ∆  h(α ) ≤ 5 · n · h∆00  + n · 1 − 5 · h k−1 . i k k−1 k 1 − 5 i=1 k

Combining Claim B.2 and Equation 14 and rearranging, we get:

 00  0  5 00  5 ∆k−1 h ∆k−1 − k · h ∆k−1 − 1 − k · h 5 1 − k 10 √ ≤  · h∆0  + ·  · ∆0 · (E ). (16) k−1 k−1 n k−1 k−2 H k−1 To derive a contradiction, we show that Equation 16 cannot hold. For this, we first lower bound the left hand side. Recall that h(x) = −x log(x) − (1 − x) log(1 − x) and observe that both these terms are concave. Thus, we can lower bound:

 00  0  5 00  5 ∆k−1 h ∆k−1 − k · h ∆k−1 − 1 − k · h 5 1 − k 5 0 1 5 00 1 00 1 − k ≥ ∆k−1 · log 0 − k · ∆k−1 · log 00 − ∆k−1 · log 00 ∆k−1 ∆k−1 ∆k−1   0 00 5 1 1 00 ∆k−1 ≥ ∆k−1 · k · log 5 + log 10 (As ∆k−1 = 1+5 ) 1 + k 1 − k k 15 20  ≥ log e · ∆00 · k + k . k−1 2 6 2 3 1 2 (As log(1 + x) ≤ log e · (x − x /2 + x /3) and log 1−x ≥ log e · (x + x /2))

42 Simplifying, we get:

 00  0  5 00  5 ∆k−1 0 25 h ∆k−1 − k · h ∆k−1 − 1 − k · h 5 ≥ ∆k−1 · k . (17) 1 − k We now upper bound the right hand side of Equation 16.

10 √  · h∆0  + ·  · ∆0 · (E ) k−1 k−1 n k−1 k−2 H k−1 10 ≤ 100 · h∆0  + · 50 · ∆0 · (E ) (Definition of  and k > 2) k k−1 n k k−2 H k k 100 0  40 0 ≤ k · h ∆k−1 + k · ∆k−2 · h ∆ . (Corollary 5.7)

1 1 Now, note that, for n < x < 2 , we have h(x) ≤ −2x log x ≤ 2x log n. We get: 10 √  · h∆0  + ·  · ∆0 · (E ) ≤ 2 log n · 30 · ∆0 . (18) k−1 k−1 n k−1 k−2 H k−1 k k−1 Equation 16, Equation 17, and Equation 18 cannot all hold together, a contradiction.

Reminder of Lemma 5.16. There exists a set B* ⊆ supp(M≤T) such that Pr(M≤T ∈ B*) ≤ √ε_{p+1} and for all M≤T ∉ B*, we have:

H(P^{p+1}(s) | M≤T) ≥ H(P^{p+1}(s)) − 10 · √ε_{p+1} · ∆′_p · H(Ep+1).

Proof. As H(P^{p+1}(s) | M≤T) ≥ (1 − ε_{p+1}) · ∆′_p · H(Ep+1), we can conclude from Lemma 5.9 that:

I(P^{p+1}(s) : M≤T) ≤ 2 · ε_{p+1} · ∆′_p · H(Ep+1).

Using Definition A.9 and Definition A.2, we get that:

E_{M≤T}[H(P^{p+1}(s)) − H(P^{p+1}(s) | M≤T)] ≤ 2 · ε_{p+1} · ∆′_p · H(Ep+1). (19)

Using Lemma 5.9 and Lemma 5.10, we get that, for all M≤T :

H(P^{p+1}(s)) − H(P^{p+1}(s) | M≤T) ≥ −2 · ε_{p+1} · ∆′_p · H(Ep+1). (20)

Conclude from Equation 19 that:

E_{M≤T}[H(P^{p+1}(s)) − H(P^{p+1}(s) | M≤T) + 2 · ε_{p+1} · ∆′_p · H(Ep+1)] ≤ 4 · ε_{p+1} · ∆′_p · H(Ep+1).

As the left hand side above is always non-negative by Equation 20, we can apply Markov’s inequality to conclude that:

Pr_{M≤T}[H(P^{p+1}(s)) − H(P^{p+1}(s) | M≤T) ≥ 10 · √ε_{p+1} · ∆′_p · H(Ep+1)] ≤ √ε_{p+1}.

The lemma follows.

C Lower Bounds against Starting Vertex Oblivious Streaming Algorithms

In this section we prove Theorem 1.3 (restated below). See also Definition 3.1 for a formal definition of a starting vertex oblivious streaming algorithm for simulating random walks.

Reminder of Theorem 1.3. Let n ≥ 1 be a sufficiently large integer and let L be an integer satisfying L ∈ [log^40 n, n]. Any randomized algorithm that is oblivious to the start vertex and, given an n-vertex directed graph G = (V, E) and a starting vertex ustart ∈ V, samples from a distribution D such that ‖D − RW_L^G(ustart)‖_TV ≤ 1 − 1/log^10 n requires Ω̃(n · √L) space.

The following inequality will be useful for the proof.

Lemma C.1 (Fano's inequality (see, e.g., [CT06, Page 38])). Let Z and Z′ be two jointly distributed random variables over the same set Z; it holds that

Pr[Z ≠ Z′] ≥ (H(Z | Z′) − 1)/log|Z|  and  Pr[Z = Z′] ≤ (log|Z| − H(Z | Z′) + 1)/log|Z|.

We will also need the following variant of the standard INDEX problem.

Definition C.2 (Multi-output generalization of INDEX). In the INDEX_{m,ℓ} problem, Alice gets ℓ strings X1, …, Xℓ ∈ {0, 1}^m and Bob gets an index i ∈ [ℓ]. Alice sends a message to Bob and then Bob is required to output the string Xi.
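For instance, the standard INDEX problem corresponds to the case m = 1, where Alice holds ℓ bits and Bob must output the single bit Xi; here Bob must instead recover an entire m-bit string.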

The lower bound for INDEX_{m,ℓ} below will be crucial for our proof of Theorem 1.3.

Lemma C.3 (One-way communication lower bound for INDEX_{m,ℓ}). Let D_INDEX^{m,ℓ} be the input distribution in which Alice gets ℓ independent random strings, each uniformly distributed over {0, 1}^m, and Bob gets a uniformly random index from [ℓ] that is independent of Alice's input. Solving INDEX_{m,ℓ} over D_INDEX^{m,ℓ} with success probability at least 1/log^15 m requires Alice to send at least mℓ/log^20 m bits to Bob.

Proof. Over the input distribution D_INDEX^{m,ℓ}, Alice gets ℓ strings X1, …, Xℓ, all distributed uniformly over {0, 1}^m. Bob gets a uniformly random index I from [ℓ]. By Yao's minimax theorem, to prove the theorem it suffices to bound the success probability of all deterministic one-way communication protocols between Alice and Bob in which Alice sends at most mℓ/log^20 m bits. In the following we fix such a protocol.

Let M = M(X1, …, Xℓ) be the message sent from Alice to Bob. We need the following claim.

Claim C.4. It holds that E_{i∈[ℓ]} I(Xi : M) ≤ m/log^20 m.

Proof. For i ∈ [ℓ], let X<i = (X1, …, Xi−1). Since the strings X1, …, Xℓ are mutually independent, Lemma A.12 and the chain rule for mutual information (Fact A.11) give:

Σ_{i=1}^{ℓ} I(Xi : M) ≤ Σ_{i=1}^{ℓ} I(Xi : M | X<i) = I(X1 … Xℓ : M).

Further noting that I(X1, …, Xℓ : M) ≤ |M| ≤ mℓ/log^20 m, the claim follows by taking an average.

Now, let fi(M) be the (deterministic) output of Bob when receiving message M from Alice and getting input i. Since Alice and Bob have independent inputs, the success probability of the protocol can be written as Ei∈[`] Pr[fi(M) = Xi]. From the definition of mutual information, we have

H(Xi | fi(M)) = H(Xi) − I(Xi : fi(M)) = m − I(Xi : fi(M)).

Combining the above with Lemma C.1, we have

E_{i∈[ℓ]} Pr[fi(M) = Xi] ≤ (E_{i∈[ℓ]} I(Xi : fi(M)) + 1)/m ≤ (E_{i∈[ℓ]} I(Xi : M) + 1)/m ≤ 2/log^20 m ≤ 1/log^15 m,

which completes the proof.

Now we are ready to prove Theorem 1.3.

Proof of Theorem 1.3. Let τ = √L/log^20 n. For a string X ∈ {0, 1}^{τ^2} and (i, j) ∈ [τ]^2, we use Xi,j to denote the ((i − 1)τ + j)-th bit of X. We will also need the following construction of gadget graphs.

Gadget Construction Hτ (X)

• Setup: Given a string X ∈ {0, 1}^{τ^2}.

• Vertices: We construct a layered graph G with 2 layers V1,V2 satisfying |V1| = τ and |V2| = τ + 1. For convenience, we always use Vi,j to denote the j-th vertex in the layer Vi.

We will call V2,τ+1 the starting vertex of Hτ(X).

2 • Edges: For every (i, j) ∈ [τ] , we first add an edge from V2,j to V1,i, and then also add an edge from V1,i to V2,j if Xi,j = 1. For every i ∈ [τ], we add an edge from V1,i to V2,τ+1 and another edge from V2,τ+1 back to V1,i.

• Edge ordering: The edges are given in lexicographic order. (Note that the edge ordering is not important for the lower bound, and we specify it only for concreteness.)
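For concreteness, take τ = 2 and X = 1010, so that X1,1 = 1, X1,2 = 0, X2,1 = 1 and X2,2 = 0. Then H2(X) has vertices V1,1, V1,2 in the first layer and V2,1, V2,2, V2,3 in the second, with V2,3 as the starting vertex; its edges are V2,j → V1,i for all i, j ∈ [2], the hub edges V1,i → V2,3 and V2,3 → V1,i for i ∈ [2], and the two edges V1,1 → V2,1 and V1,2 → V2,1 encoding the 1-bits of X.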

Now, let m = τ^2 and ℓ = n/(2τ + 1). Suppose there is a starting vertex oblivious streaming algorithm A for simulating L-step random walks with space complexity n · √L/log^100 n and statistical distance at most 1 − 1/log^10 n. We show it implies a one-way communication protocol solving INDEX_{m,ℓ} over D_INDEX^{m,ℓ} that contradicts Lemma C.3.

45 Let P and S be the preprocessing subroutine and the sampling subroutine of A, respectively. The protocol is described as follows:

Protocol Π for INDEXm,`

1. Given input strings X1, …, Xℓ ∈ {0, 1}^{τ^2}, Alice generates the graph G = ⊔_{i=1}^{ℓ} Hτ(Xi). That is, G is an n-vertex graph consisting of ℓ clusters, with the i-th cluster being Hτ(Xi). From now on, we will use Hτ(Xi) to denote the corresponding subgraph of G.

2. Alice then simulates the preprocessing subroutine P on the graph G, and sends its output to Bob.

3. Bob gets an index i ∈ [ℓ] and sets ustart to be the starting vertex of Hτ(Xi). Bob then simulates S with Alice's message and ustart as input to obtain a walk W.

4. Given an L-step walk W starting from the starting vertex of Hτ(Xi), we define a string Xrec(W) ∈ {0, 1}^{τ^2} as follows: letting V1, V2 be the two layers in Hτ(Xi), set Xrec(W)i,j = 1 if and only if W traverses the edge from V1,i to V2,j.

5. Bob outputs Xrec(W ).
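For instance, if the sampled walk W contains the steps V1,2 → V2,1 and V1,1 → V2,2 (assuming τ ≥ 2), then Bob sets Xrec(W)2,1 = Xrec(W)1,2 = 1. Since, by construction, an edge from V1,a to V2,b with b ∈ [τ] is present only when the corresponding bit of the encoded string is 1, every bit recorded this way is correct; Claim C.5 below shows that with high probability the walk traverses every such edge, so that Xrec(W) recovers the whole string.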

We first show that with high probability, an L-step random walk starting from the starting vertex in Hτ(Xi) determines the string Xi. Formally, the following claim captures what we need.

Claim C.5. For every X ∈ {0, 1}^{τ^2}, letting ustart be the starting vertex of Hτ(X), it holds that

Pr_{W ∼ RW_L^{Hτ(X)}(ustart)} [Xrec(W) = X] ≥ 1 − exp(−log^20 n).

Proof. We will first bound the probability that Xrec(W)i,j ≠ Xi,j for each (i, j) ∈ [τ]^2 and then apply a union bound. Fix (i, j) ∈ [τ]^2. If Xi,j = 0, then since there is no edge from V1,i to V2,j, clearly Xrec(W)i,j is always 0 as well. Hence we only need to consider the case that Xi,j = 1. In this case, one can observe that in every two steps, the random walk traverses the edge from V1,i to V2,j with probability at least 1/(τ + 1)^2, and moreover, these events are independent across disjoint pairs of steps. Hence, a random walk with L = τ^2 · log^40 n steps traverses the edge from V1,i to V2,j with probability

1 − (1 − 1/(τ + 1)^2)^{L/2} ≥ 1 − exp(−Ω(L/τ^2)) ≥ 1 − exp(−Ω(log^40 n)).

The claim then follows from a union bound.

Finally, since our streaming algorithm A has space complexity n · √L/log^100 n < mℓ/log^20 m and sampling error at most 1 − 1/log^10 n, protocol Π has communication complexity less than mℓ/log^20 m and success probability at least 1/log^10 n − exp(−log^20 n), which contradicts Lemma C.3.
