arXiv:1807.04910v2 [math.PR] 3 Sep 2020

3-wise Independent Random Walks can be Slightly Unbounded

Shyam Narayanan*

September 4, 2020

Abstract

Recently, many streaming algorithms have utilized generalizations of the fact that the expected maximum distance of any 4-wise independent random walk on a line over n steps is O(√n). In this paper, we show that 4-wise independence is required for all of these algorithms, by constructing a 3-wise independent random walk with expected maximum distance Ω(√n lg n) from the origin. We prove that this bound is tight for the first and second moment, and also extract a surprising matrix inequality from these results.

Next, we consider a generalization where the steps X_i are k-wise independent random variables with bounded pth moments. For general k, p, we determine the (asymptotically) maximum possible pth moment of the supremum of X_1 + ··· + X_i over 1 ≤ i ≤ n. We highlight the case k = 4, p = 2: here, we prove that the second moment of the furthest distance traveled is O(Σ_i E[X_i²]). For this case, we only need the X_i's to have bounded second moments and do not even need the X_i's to be identically distributed. This implies an asymptotically stronger statement than Kolmogorov's maximal inequality that requires only 4-wise independent random variables, and generalizes a recent result of Błasiok.

Keywords: random walk, k-wise independence, moments

* [email protected]. Massachusetts Institute of Technology. Research supported by Harvard's Herchel Smith Fellowship and the MIT Akamai Fellowship.

1 Introduction

Random walks are well-studied stochastic processes with numerous applications. The simplest such random walk is the random walk on Z, a process that starts at 0 and at each step, independently of all previous moves, moves either +1 or −1 with equal probability. In this paper, we do not study this random walk but instead study k-wise independent random walks, meaning that steps are not totally independent but that any k steps are completely independent. In many low-space randomized algorithms, information is tracked with processes similar to random walks, but simulating a totally random walk of n steps is known to require O(n) bits while there exist k-wise independent families which can be simulated with O(k lg n) bits [CW79]. As a result, understanding properties of k-wise independent random walks has applications to streaming algorithms, such as heavy-hitters [BCI+17, BCI+16], distinct elements [Bła18], and ℓ_p tracking [BDN17].

For any k-wise independent random walk, where k ≥ 2, it is well-known that after n steps, the expected squared distance from the origin is exactly n, since E_{h∼H}[(h(1) + ··· + h(n))²] = n for any 2-wise independent hash family H. One can see this by expanding and applying linearity of expectation. This property provides good bounds for the distribution of the final position of a 2-wise independent random walk. However, we study the problem of bounding the position throughout the random walk, by providing comparable moment bounds for sup_{1≤i≤n} |h(1) + ··· + h(i)| rather than just for |h(1) + ··· + h(n)|. Importantly, we determine an example of a 2-wise independent random walk where the expected bounds do not hold, even though very strong bounds for even 4-wise independent random walks can be established.

Two more general questions that have been studied in the context of certain streaming algorithms are random walks corresponding to insertion-only streams, and random walks with step sizes corresponding to random variables.
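To make the O(k lg n)-bit claim concrete, here is a minimal sketch of the classic [CW79]-style construction: a family given by a uniformly random polynomial of degree k − 1 over a prime field is k-wise independent, and its seed is just the k coefficients, i.e. O(k lg n) bits. The parameters p = 5, k = 2 below are small hypothetical choices so the pairwise-independence property can be verified exhaustively (the ±1-valued families used later can be derived from such constructions).

```python
from itertools import product

# A classic k-wise independent family [CW79]-style: h_a(x) = (a_0 + a_1 x + ...
# + a_{k-1} x^{k-1}) mod p, with the k coefficients as the O(k lg p)-bit seed.
p, k = 5, 2  # small prime and independence level, chosen for exhaustive checking

def poly_hash(coeffs, x):
    # Evaluate the seed polynomial at x over GF(p) (Horner's rule).
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc

# Exhaustively verify pairwise independence: over all p^k seeds, every pair of
# distinct inputs (x, y) hits every value pair exactly once, i.e. uniformly.
counts = {}
for coeffs in product(range(p), repeat=k):
    for x, y in [(0, 1), (1, 3), (2, 4)]:
        key = (x, y, poly_hash(coeffs, x), poly_hash(coeffs, y))
        counts[key] = counts.get(key, 0) + 1

pairwise_uniform = all(c == 1 for c in counts.values())
```

The bijectivity of the coefficient-to-values map (a Vandermonde argument) is exactly why each value pair appears once.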
These are useful generalizations, as the first proves useful in certain algorithms with insertion-stream inputs, and the second allows for a setup similar to Kolmogorov's inequality [Kol28], which we will generalize to 4-wise independent random variables. To understand these two generalizations, consider a k-wise independent family of random variables X_1, ..., X_n and an insertion stream p_1, ..., p_m ∈ [n], where now seeing p_j means that our random walk moves by X_{p_j} on the jth step. The insertion stream can be thought of as keeping track of a vector z in R^n where seeing p_j increments the p_jth component of z by 1, and X⃗ can be thought of as a vector in R^n with ith component X_i. Then, one goal is to bound, for appropriate values of p,

E_{h∼H}[ sup_{1≤t≤m} ⟨X⃗, z^(t)⟩^p ],

where z^(t) is the vector z after seeing only the first t elements of the insertion stream. Notice that bounding the k′th moment of the furthest distance from the origin in a k-wise independent random walk is the special case of m = n, p_j = j for all 1 ≤ j ≤ n, and the X_i's being k-wise independent random signs.
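A small deterministic illustration of this setup, with hypothetical step values and stream entries:

```python
# Illustration of the insertion-stream setup: seeing p_j increments coordinate
# p_j of z, and we track the supremum of |<X, z^(t)>| over all prefixes t.
n = 4
X = [1, -1, 1, 1]            # hypothetical step values X_1..X_n
stream = [1, 2, 2, 3, 1, 4]  # hypothetical insertion stream p_1..p_m, entries in [n]

z = [0] * n
sup_abs = 0
for p_j in stream:
    z[p_j - 1] += 1                      # z^(t) counts occurrences of each index so far
    dot = sum(x * zi for x, zi in zip(X, z))
    sup_abs = max(sup_abs, abs(dot))

# With p_j = j and X_i = +-1, this recovers sup_t |X_1 + ... + X_t|, the
# furthest distance of the ordinary random walk from the origin.
```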

1.1 Definitions

To formally describe our main results, we need to formally define k-wise independent random variables, k-wise independent hash families, and k-wise independent random walks.

Definition 1.1. For an integer k ≥ 2, we say that random variables X_1, ..., X_n are k-wise independent if for any 1 ≤ i_1 < i_2 < ··· < i_k ≤ n, the variables X_{i_1}, ..., X_{i_k} are totally independent.

Moreover, if E[X_i] = 0 for all i, we say that X_1, ..., X_n are unbiased k-wise independent random variables.

Definition 1.2. For an integer k ≥ 2, we say that H, which is a distribution over functions h : {1, ..., n} → {−1, 1}, is a k-wise independent hash family from {1, ..., n} to {−1, 1} if the random variables h(1), ..., h(n) for h ∼ H are unbiased k-wise independent random variables.

For simplicity of notation, we will often treat h as a vector in {−1, 1}^n, with h_i := h(i). From now on, we will also use [n] to denote the set {1, ..., n}.

Definition 1.3. Suppose h : [n] → {−1, 1} is drawn from a k-wise independent hash family. Then, we say that the random variables Z_0 = 0 and Z_t = h_1 + ··· + h_t for 1 ≤ t ≤ n form a k-wise independent random walk. Equivalently, a k-wise independent random walk is a series of random variables Z_0, Z_1, Z_2, ..., Z_n where Z_0 = 0 and the variables {Z_1 − Z_0, Z_2 − Z_1, ..., Z_n − Z_{n−1}} are unbiased k-wise independent random variables that always equal ±1.

Remark. We will sometimes refer to 2-wise independent hash families or random walks as pairwise independent hash families or random walks.

We also formally define an insertion-only stream.

Definition 1.4. An insertion-only stream p_1, ..., p_m is a sequence of integers where each p_i ∈ [n]. The insertion-only stream will keep track of a vector z that starts at the origin and after the jth step increments its p_jth component by 1. In other words, for any 0 ≤ t ≤ m, z^(t), i.e. the position of z after t steps, will be the vector in R^n such that z^(t)_i = |{j ≤ t : p_j = i}|.

1.2 Main Results

Intuitively, even in a pairwise independent random walk, since the positions at various times have strong correlations with each other, the expectation of the furthest distance traveled from the origin should not be much more than the expectation of the distance from the origin after n steps, which is well-known to be O(√n). But surprisingly, we show in Section 2 that one can have a 3-wise independent random walk with expected maximum distance Ω(√n log n): in fact, our random walk will even have maximum distance Ω(√n log n) with constant probability. Formally, we prove the following theorem:

Theorem 1. There exists an unbiased 3-wise independent hash family H from [n] to {−1, 1} such that

E_{h∼H}[ sup_{1≤t≤n} |h_1 + ··· + h_t| ] = Ω(√n lg n).   (1)

In fact, H can even have P_{h∼H}( sup_{1≤t≤n} |h_1 + ··· + h_t| = Ω(√n log n) ) be at least some constant.

Furthermore, this bound of √n lg n is tight up to the first and second moments for pairwise (and thus 3-wise) random walks, due to the following result we prove in Section 3.

Theorem 2. For any unbiased pairwise independent hash family H from [n] to {−1, 1},

E_{h∼H}[ sup_{1≤t≤n} (h_1 + ··· + h_t)² ] = O(n lg² n).   (2)

In Section 4, we bound random walks corresponding to insertion-only streams and random walks with step sizes that are not necessarily uniform ±1 variables. We first generalize Kolmogorov's inequality [Kol28] to 4-wise independent random variables via the following theorem.

Theorem 3. For any unbiased 4-wise independent random variables X_1, ..., X_n, each with finite variance,

E[ sup_{1≤i≤n} |X_1 + ··· + X_i|² ] = O( Σ_i E[X_i²] ).   (3)

Applying Chebyshev's inequality gives us Kolmogorov's inequality up to a constant. We remark that if we remove the 4-wise independence assumption, Theorem 3 can be more simply proved as a consequence of the Skorokhod embedding theorem [Bil95]. This weaker version of Theorem 3 may exist in the literature, though as I am not aware of it, I provide a short proof in Appendix C.

Next, we generalize Theorem 2 to more general random variables and insertion-only streams.

Theorem 4. For any unbiased pairwise independent variables X_1, ..., X_n, each satisfying E[X_i²] ≤ 1, and for any insertion stream p_1, ..., p_m ∈ [n],

E[ sup_{1≤t≤m} ⟨X⃗, z^(t)⟩² ] = O( ‖z‖₂² lg² m ),   (4)

where z = z^(m) is the final position of the vector and X⃗ = (X_1, ..., X_n).

Finally, we prove an analogous theorem for k-wise independent random variables for even k ≥ 4. In this case, the log factors will disappear.

Theorem 5. For any even k ≥ 4, any unbiased k-wise independent random variables X_1, ..., X_n such that E[X_i^k] ≤ 1 for all i, and any insertion stream p_1, ..., p_m ∈ [n],

E[ sup_{1≤t≤m} ⟨X⃗, z^(t)⟩^k ] = O(k)^{k/2} · ‖z‖₂^k.   (5)

We note that as k grows, this is tight, as even E[ ⟨X⃗, z^(m)⟩^k ] = Θ(k)^{k/2} · ‖z‖₂^k if X_1, ..., X_n are totally independent ±1-valued random variables, as shown in [RL01].

Theorems 3, 4, and 5 are interesting together as they provide various bounds on the supremum of generalized random walks under differing moment bounds and degrees of independence. We also note that the above theorems can be used to prove the following result, which almost completely characterizes all moment bounds of the furthest distance traveled by a k-wise independent random walk with steps in ±1.

Theorem 6. Let q ≥ p ≥ 1, k ≥ 2, and n ≥ 1 be positive integers. If X_1, ..., X_n are unbiased k-wise independent random variables with E[|X_i|^q] ≤ 1 for all i, then

E[ sup_{1≤i≤n} |X_1 + ··· + X_i|^p ] = O(√(p·n))^p   (6)

whenever k ≥ 4, q ≥ 2, and 2⌊k/2⌋ ≥ p. If k ≤ 3 and p ≤ 2 ≤ q, then

E[ sup_{1≤i≤n} |X_1 + ··· + X_i|^p ] = O(√n log n)^p.   (7)

The above bounds are tight for certain ±1 random variables, for all instances of p, q, k and sufficiently large n. Finally, in all other cases, it is even possible that

E[ |X_1 + ··· + X_n|^p ] = Ω(n)^{(p+1)/2},   (8)

making this case less interesting.
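As a quick numerical sanity check of the k ≥ 4, p = 2 regime (and of the flavor of Theorem 3), the sketch below estimates E[sup_t S_t²] for a walk with fully independent ±1 steps, which are in particular 4-wise independent. For true independence, Doob's L² maximal inequality already gives E[sup_t S_t²] ≤ 4·E[S_n²] = 4n, so the empirical average is compared against a generous multiple of n; the walk length and trial count are arbitrary.

```python
import random

# Estimate the second moment of the furthest distance traveled by an n-step
# +-1 random walk with fully independent steps (a special case of k >= 4).
random.seed(0)
n, trials = 256, 2000
total = 0.0
for _ in range(trials):
    s, worst = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        worst = max(worst, s * s)   # running sup_t S_t^2
    total += worst
second_moment_estimate = total / trials  # empirical E[sup_t S_t^2], should be Theta(n)
```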

As the proof of Theorem 6 is relatively simple given the other results, and gives an overarching viewpoint of all the results, we prove it here, assuming Theorems 1, 2, 3, 4, and 5. We defer the final case (i.e. Equation (8)) to Appendix B.

Proof. First, consider the case k ≥ 4, q ≥ 2, and 2⌊k/2⌋ ≥ p. If p ≤ 2, then note that X_1, ..., X_n are 4-wise independent with E[X_i²]^{1/2} ≤ E[|X_i|^q]^{1/q} ≤ 1 for all i. Thus, by Theorem 3, we have E[ sup (X_1 + ··· + X_i)² ] = O(n), and E[ sup |X_1 + ··· + X_i| ] ≤ E[ sup (X_1 + ··· + X_i)² ]^{1/2} = O(√n). If p ≥ 3, then let r = 2⌈p/2⌉, so k ≥ r ≥ p and r is even. Then, X_1, ..., X_n are r-wise independent, so E[ sup |X_1 + ··· + X_i|^p ]^{1/p} ≤ E[ sup |X_1 + ··· + X_i|^r ]^{1/r} = O(√(r·n)). However, r = O(p), so E[ sup |X_1 + ··· + X_i|^p ]^{1/p} = O(√(p·n)). This proves the first case. We note that this inequality is sharp since even in the case where X_1, ..., X_n are i.i.d. uniformly random sign variables, E[ |X_1 + ··· + X_i|^p ] = Θ(√(p·n))^p by Khintchine's inequality [Khi81].

Next, consider the case where k ≤ 3 and p ≤ 2 ≤ q. Then, E[X_i²]^{1/2} ≤ E[|X_i|^q]^{1/q} ≤ 1 for all i, and since k ≥ 2, we have that X_1, ..., X_n are pairwise independent. Thus, by Theorem 2, we have that E[ sup (X_1 + ··· + X_i)² ] = O(n log² n) and E[ sup |X_1 + ··· + X_i| ] ≤ E[ sup (X_1 + ··· + X_i)² ]^{1/2} = O(√n log n). This proves the second case. We note that this inequality is sharp since by Theorem 1, there exist 3-wise independent X_1, ..., X_n such that each X_i is a uniformly random sign variable, but E[ sup (X_1 + ··· + X_i)² ]^{1/2} ≥ E[ sup |X_1 + ··· + X_i| ] = Ω(√n log n).

Finally, we note that to prove Theorem 1, we create a complicated pairwise independent hash function, which suggests that standard pairwise independent hash functions do not have this property. Indeed, we show in Appendix A that for some types of hash functions constructed from Hadamard matrices or random linear functions from Z/pZ → Z/pZ, we have E[ sup_i |h_1 + ··· + h_i| ] = O(√n).
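The Hadamard-based families just mentioned can be sketched concretely. With the index i and the seed S viewed as m-bit vectors, a standard construction is h_S(i) = (−1)^{⟨i, S⟩}, which uses only m = lg(n + 1) seed bits; the family is exactly pairwise independent, but 3-wise independence fails because h_i h_j = h_{i⊕j} whenever i ⊕ j ≠ 0. The check below is exhaustive over all seeds (m = 4 is an arbitrary small choice).

```python
from itertools import combinations

# Pairwise independent signs from a Hadamard matrix: h_S(i) = (-1)^{<i, S>}.
m = 4
indices = list(range(1, 2 ** m))      # nonzero bit-vectors: n = 2^m - 1 = 15

def sign(i, seed):
    return -1 if bin(i & seed).count("1") % 2 else 1

seeds = range(2 ** m)
mean = {i: sum(sign(i, s) for s in seeds) for i in indices}
pair_cov = {(i, j): sum(sign(i, s) * sign(j, s) for s in seeds)
            for i, j in combinations(indices, 2)}

unbiased = all(v == 0 for v in mean.values())      # E[h_i] = 0 for all i
pairwise = all(v == 0 for v in pair_cov.values())  # E[h_i h_j] = 0 for i != j
# Witness that 3-wise independence fails: 1 XOR 2 == 3, so h_1 h_2 h_3 == 1 always.
triple = sum(sign(1, s) * sign(2, s) * sign(3, s) for s in seeds)
```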

1.3 Motivation and Relation to Previous Work

The primary motivation of this paper comes from certain theorems that provide strong bounds for certain variants of 4-wise independent random walks, which raised the question of whether any of these bounds can be extended to 2-wise or 3-wise independence. For example, Theorem 1 in [BCI+17] proves for any unbiased 4-wise independent hash family H that E_{h∼H}[ sup_t ⟨h, z^(t)⟩ ] = O(‖z‖₂). This result generalizes a result from [BCI+16] which proves the same result if h is uniformly chosen from {−1, 1}^n. [BCI+17] provides an algorithm that successfully finds all ℓ₂ ε-heavy hitters in an insertion-only stream in O(ε⁻² log ε⁻¹ log n) bits of space, in which the above result was crucial for analysis of a subroutine which attempts to find bit-by-bit the index of a single "super-heavy" heavy hitter if one exists. Theorem 1 in [BCI+17] also proved valuable for an algorithm for continuous monitoring of ℓ_p norms in insertion-only data streams [BDN17]. Lemma 18 in [Bła18] shows that even without bounded fourth moments, given 4-wise independent random variables

X_1, ..., X_n, each with mean 0 and finite variance,

P( max_{1≤i≤n} |X_1 + ··· + X_i| ≥ λ ) = O( n · max_i E[X_i²] / λ² ).

This theorem was crucial in analyzing an algorithm tracking distinct elements that provides a (1+ε)-approximation with failure probability δ in O(ε⁻² lg δ⁻¹ + lg n) bits of space. Notice that Theorem 3 is stronger than the above equation and implies Kolmogorov's inequality up to a constant factor, but requires much weaker assumptions.

A natural follow-up question to the above theorems is whether 4-wise independence is necessary, or whether more limited levels of independence such as 2-wise or 3-wise independence are allowable. Theorem 1 shows that 3-wise independence does not suffice for any of the above results, because the random walk on a line case is strictly weaker than all of the above results. As a result, we know that the tracking sketches in [BCI+17, BDN17, Bła18] require 4-wise independence. However, the results given still have interesting extensions, such as to higher moments. For instance, Theorem 5 shows a stronger result than the one established in [BCI+17], since it not only bounds the first moment of sup_t ⟨h, z^(t)⟩ for h sampled from a 4-wise independent family of uniform ±1 variables but also bounds the 4th moment tightly.

Outside of random walks, k-wise independence for hash functions, which was first introduced in [CW79], has been crucial to numerous algorithms. Bounding the amount of independence required for analysis of algorithms has been studied in various contexts, often since k-wise independent hash families can be stored in low space but may provide equally adequate bounds as totally independent families. As further examples, the well-known AMS sketch [AMS99] is a streaming algorithm to estimate the ℓ₂ norm of a vector z to a factor of 1 ± ε with high probability by multiplying the vector by a sketch matrix Π ∈ R^{O(1/ε²)×n} of 4-wise independent random signs and using ‖Πz‖₂ as an estimate for ‖z‖₂.
It is known from [PT16, Knu18] that the accuracy of the AMS sketch can be much worse if 3-wise independent random signs are used instead of 4-wise independent random signs. If z is given as an insertion stream, it is currently known that the AMS sketch with 8-wise independent random signs can provide weak tracking [BCI+17], meaning that E[ sup_t | ‖Πz^(t)‖₂² − ‖z^(t)‖₂² | ] ≤ ε‖z‖₂². This implies that the approximation of the ℓ₂ norm with the 8-wise independent AMS sketch is quite accurate at all times t. While one cannot perform weak tracking with 3-wise independence of the AMS sketch, it is unknown for 4-wise independence through 7-wise independence whether the AMS sketch provides weak tracking. Finally, linear probing, a well-known implementation of hash tables, was shown to take O(1) expected update time with any 5-wise independent hash function [PPR09] but was shown to take Θ(lg n) expected update time for certain 4-wise independent hash functions and Θ(√n) expected update time for certain 2-wise independent hash functions [PT16].

Bounding the maximum distance traveled of a random walk has also been studied independently of computer science applications, both when the steps are totally independent and when they are k-wise independent. For example, Kolmogorov's inequality [Kol28] provides strong bounds for sup_t (X_1 + ··· + X_t) for independent random variables X_1, ..., X_t, as long as the second moments of X_1, ..., X_t are finite. [BKR06] constructed an infinite sequence {X_1, X_2, ...} of pairwise independent random variables taking on the values ±1 such that sup_t (X_1 + ··· + X_t) is bounded almost surely, though the paper also proved that this phenomenon can never occur for 4-wise independent variables taking on the values ±1. Finally, the supremum of a random walk with i.i.d. bounded steps was studied in [EV04], which provided comparisons with the supremum of a Brownian motion regardless of the random variable chosen for step size.
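A minimal runnable sketch of the AMS estimator described above. For simplicity it uses fully independent random signs rather than a 4-wise independent family (the text's point is precisely that 4-wise independence suffices for the basic guarantee while 3-wise does not); the vector z and the sketch size are hypothetical.

```python
import numpy as np

# AMS sketch: multiply z by a t x n matrix of random signs (scaled by
# 1/sqrt(t)) and use ||Pi z||_2^2 as an estimate of ||z||_2^2.
rng = np.random.default_rng(0)
n, t = 50, 5000
z = rng.standard_normal(n)            # an arbitrary hypothetical frequency vector

Pi = rng.choice([-1.0, 1.0], size=(t, n)) / np.sqrt(t)
estimate = float(np.sum((Pi @ z) ** 2))
truth = float(np.sum(z ** 2))
relative_error = abs(estimate - truth) / truth   # std of the estimator is ~ sqrt(2/t)
```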

1.4 Overview of Proof Ideas

Here, we briefly outline some of the main ideas behind the proofs of our theorems.

The main goal in Section 2 is to establish Theorem 1, i.e. construct a 3-wise independent hash family H such that E_{h∼H}[ sup_{1≤i≤n} |h_1 + ··· + h_i| ] = Ω(√n lg n). To have 3-wise independence, it will suffice to find a distribution H over {−1, 1}^n so that E_{h∼H}[h_i h_j] = 0 for all i ≠ j: it will be easy to show that if we replace (h_1, ..., h_n) with (−h_1, ..., −h_n) with 1/2 probability, the hash function will be 3-wise independent. In other words, we wish for the covariance matrix M = E[h^T h] to be the identity matrix I_n. The construction has two major steps.

1. Create a hash function such that E[ sup_{1≤i≤n} |h_1 + ··· + h_i| ] = Ω(√n lg n) but, rather than have E[h_i h_j] = 0 for all i ≠ j, have Σ_{i≠j} |E[h_i h_j]| = O(n), i.e. the cross terms in total aren't very large in absolute value (this hash function will be H_2 in our proof). To do this, we first create H_1, which has certain properties, most notably that E[h_1 + ··· + h_n] = 0 but E[h_1 + ··· + h_{n/2}] = Θ(√n lg n), and rotate the hash family by a uniform index. The rotation allows many of the cross terms to average out, reducing the sum of their absolute values.

2. Remove the cross terms. To do this, we make a hash family H where with some constant probability we choose from H_2, and with some probability we choose some set of indices and pick a hash function such that E[h_i h_j] will be the opposite sign of E_{h∼H_2}[h_i h_j] for certain indices i, j, so that overall, E[h_i h_j] will be 0. Certain symmetry properties, and most importantly the fact that Σ_{i≠j} |E_{h∼H_2}[h_i h_j]| = O(n), will allow us to choose from H_2 with constant probability, which means that even for our final hash function, E[ sup_{1≤i≤n} |h_1 + ··· + h_i| ] = Ω(√n lg n).

The goal of Section 3 is to establish Theorem 2, i.e. to show that if M = E[hh^T] = I_n, which is true for any pairwise independent hash function, then E[ sup_{1≤i≤n} (h_1 + ··· + h_i)² ] = O(n lg² n). To do this, we apply ideas based on the second moment method. We notice that for any matrix A, E[h^T A h] = Tr(A); thus, if we can find a matrix A whose trace is small but for which h^T A h is always reasonably large in comparison to sup_{1≤i≤n} (h_1 + ··· + h_i)², then E[ sup_{1≤i≤n} (h_1 + ··· + h_i)² ] must also be small. If we assume that n is a power of 2, then the matrix A that corresponds to the quadratic form

h^T A h = Σ_{r=0}^{lg n} Σ_{i=0}^{n/2^r − 1} ( h_{i·2^r + 1} + ··· + h_{(i+1)·2^r} )²,

i.e. h^T A h = h_1² + ··· + h_n² + (h_1 + h_2)² + ··· + (h_{n−1} + h_n)² + ··· + (h_1 + ··· + h_n)², can be shown to satisfy Tr(A) = n lg n and, for any vector x, x^T A x ≥ (1/lg n) · (x_1 + ··· + x_i)² for all 1 ≤ i ≤ n, not just in expectation. These conditions will happen to be sufficient for our goals. This method, in combination with Theorem 1, will also allow us to prove an interesting matrix inequality, proven at the end of Section 3. The method above actually generalizes to looking at kth moments of k-wise independent hash functions, as well as random walks corresponding to tracking insertion-only streams, and will allow us to prove Theorems 4 and 5.
However, these generalizations will also need the construction of ε-nets, which are explained in Subsection 4.2, or in [Nel16].

We finally explain the ideas behind Theorem 3, the generalization of Kolmogorov's inequality and Lemma 18 of [Bła18]. We use ideas of chaining, such as in [Nel16], and an idea of [Bła18] that allows us to bound the minimum of X_1 + ··· + X_i and X_1 + ··· + X_j where i < j.

We combine these with another idea: we can consider the distance between i and j, for 1 ≤ i < j ≤ n, as E[X_{i+1}² + ··· + X_j²], and create our ε-nets using this distance. The final additional idea is similar to the main idea of Theorem 2: we write a degree 4 function of X_1, ..., X_n based on the structure of the ε-nets and show that sup (X_1 + ··· + X_i)² can be bounded by this degree 4 function. This way, we get a bound on E[ sup (X_1 + ··· + X_i)² ] rather than on the Chebyshev variant of it. Together, these ideas allow for our chaining method to be quite effective, even if the X_i's do not have bounded 4th moments or if the X_i's wildly differ in variance.

As a note, while [BCI+17] only bounds the first moment for the 4-wise independent random walk, their approach can be easily modified to bound the (k − ε)th moment of a k-wise independent random walk for any ε > 0. Therefore, our main contribution is to improve this to the full kth moment.
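The dyadic decomposition behind Theorem 2's quadratic form can be sketched numerically. Below, A is built as a sum of outer products of indicators of dyadic blocks; note that including the r = 0 level makes every diagonal entry lg n + 1, so the trace is n(lg n + 1), which matches the n lg n of the overview up to the lower-order term. Since any prefix [1, i] is a disjoint union of at most lg n + 1 dyadic blocks, Cauchy–Schwarz gives (x_1 + ··· + x_i)² ≤ (lg n + 1) · x^T A x, which the code checks on random vectors.

```python
import numpy as np

# Build A = sum over dyadic blocks [i*2^r + 1, (i+1)*2^r] of the outer product
# of the block's indicator vector, so x^T A x is the sum of squared block sums.
n = 64
lgn = n.bit_length() - 1              # lg n = 6
A = np.zeros((n, n))
for r in range(lgn + 1):              # levels r = 0, 1, ..., lg n
    for i in range(n >> r):
        v = np.zeros(n)
        v[i * 2 ** r:(i + 1) * 2 ** r] = 1.0
        A += np.outer(v, v)

trace = float(np.trace(A))            # = n * (lg n + 1) with the r = 0 level included

# Check the prefix-sum bound (x_1 + ... + x_i)^2 <= (lg n + 1) * x^T A x.
rng = np.random.default_rng(0)
ok = True
for _ in range(50):
    x = rng.standard_normal(n)
    prefixes = np.cumsum(x)
    ok = ok and float(np.max(prefixes ** 2)) <= (lgn + 1) * float(x @ A @ x) + 1e-9
```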

2 Lower Bounds for Pairwise Independence

In this section, we prove Theorem 1. In other words, we construct a 3-wise independent family H such that the furthest distance traveled by the random walk is Ω(√n lg n) in expectation. To construct this counterexample, we proceed by a series of families and tweak each family accordingly to get to the next one, until we get the desired H. We assume n is a power of 4: this assumption can be removed by replacing n with the largest power of 4 less than or equal to n.

First, we will need the following simple proposition about unbiased k-wise independent variables.

Proposition 2.1. Let H be a distribution over {−1, 1}^n such that E_{h∼H}[h_i] = 0 for all i. Then, H is a k-wise independent hash family if and only if for all 1 ≤ ℓ ≤ k and all 1 ≤ i_1 < ··· < i_ℓ ≤ n, E_{h∼H}[ h_{i_1} ··· h_{i_ℓ} ] = 0.

Proof. For the "only if" direction, if h_1, ..., h_n are k-wise independent for h ∼ H, then h_1, ..., h_n are ℓ-wise independent for all ℓ ≤ k. Therefore, E[ h_{i_1} ··· h_{i_ℓ} ] = E[h_{i_1}] ··· E[h_{i_ℓ}] = 0.

For the "if" direction, let 1 ≤ i_1 < ··· < i_k ≤ n and let a_1, ..., a_k ∈ {−1, 1}. Then, note that if h_{i_j} = a_j for all 1 ≤ j ≤ k, then Π_j (1 + a_j h_{i_j}) = 2^k since 1 + a_j h_{i_j} = 2 for all j. Otherwise, 1 + a_j h_{i_j} = 0 for some j, so Π_j (1 + a_j h_{i_j}) = 0. Thus,

E_{h∼H}[ Π_j (1 + a_j h_{i_j}) ] = 2^k · P_{h∼H}( h_{i_1} = a_1, ..., h_{i_k} = a_k ).

However, if E[ h_{i_1} ··· h_{i_ℓ} ] = 0 for all 1 ≤ ℓ ≤ k, then we have that E[ Π_j (1 + a_j h_{i_j}) ] = 1 by expanding Π_j (1 + a_j h_{i_j}) and applying linearity of expectation. Therefore, for all a_1, ..., a_k ∈ {−1, 1}, we have P( h_{i_1} = a_1, ..., h_{i_k} = a_k ) = 1/2^k, so h_1, ..., h_n are k-wise independent.

We start by creating H_1. First, split [n] into blocks of size √n so that {(c−1)√n + 1, ..., c√n} forms the cth block for each 1 ≤ c ≤ √n. Also, define ℓ = √n/2. Now, to pick a function h from H_1, choose the value of h_i for each 1 ≤ i ≤ n independently, but if i is in the cth block for some 1 ≤ c ≤ ℓ, make P[h_i = 1] = 1/2 + 1/(2(ℓ + 1 − c)), and if i is in the cth block for some ℓ + 1 ≤ c ≤ √n, make P[h_i = 1] = 1/2 − 1/(2(c − ℓ)). This way, E[h_i] = 1/(ℓ + 1 − c) if i is in the cth block for c ≤ ℓ, and E[h_i] = −1/(c − ℓ) if i is in the cth block for c > ℓ.

From now on, define h_i to be periodic modulo n, i.e. h_{i+n} := h_i for all integers i. We first prove the following about H_1:

Lemma 2.1. Suppose that 1 ≤ i < j ≤ n. Suppose that i is in block c_1 and j is in block c_2, where c_1 and c_2 are not necessarily distinct. Define r = min(c_2 − c_1, √n − (c_2 − c_1)). Then, there is some constant C_3 such that

Σ_{d=0}^{√n − 1} E_{h∼H_1}( h_{i+d√n} h_{j+d√n} ) ≤ C_3 · lg(r + 2)/(r + 1)².

Proof. For 1 ≤ c ≤ √n, define f_c to equal 1/(ℓ + 1 − c) if 1 ≤ c ≤ ℓ and to equal −1/(c − ℓ) if ℓ + 1 ≤ c ≤ √n. In other words, f_c = E[h_i] if i is in the cth block. Furthermore, define f to be periodic modulo √n, i.e. f_c := f_{c+√n} for all integers c. Then,

Σ_{d=0}^{√n − 1} E_{h∼H_1}( h_{i+d√n} h_{j+d√n} ) = Σ_{d=0}^{√n − 1} E_{h∼H_1}( h_{i+d√n} ) E_{h∼H_1}( h_{j+d√n} ) = Σ_{d=0}^{√n − 1} f_{c_1+d} f_{c_2+d} = Σ_{d=1}^{√n} f_d f_{r+d}.

Now, since r ≤ ℓ, if we assume r ≥ 1, this sum can be explicitly written as

2 · Σ_{d=1}^{ℓ−r} 1/(d(d + r)) − Σ_{d=1}^{r} 1/(d(r + 1 − d)) − Σ_{d=1}^{r} 1/((ℓ + 1 − d)(ℓ + 1 − (r + 1 − d)))
  ≤ 2 · Σ_{d=1}^{∞} 1/(d(d + r)) − Σ_{d=1}^{r} 1/(d(r + 1 − d))
  = (2/r) · Σ_{d=1}^{∞} ( 1/d − 1/(d + r) ) − (1/(r + 1)) · Σ_{d=1}^{r} ( 1/(r + 1 − d) + 1/d )
  = (2/r) · Σ_{d=1}^{r} 1/d − (2/(r + 1)) · Σ_{d=1}^{r} 1/d
  = (2/(r(r + 1))) · Σ_{d=1}^{r} 1/d ≤ C_1 · lg(r + 2)/(r + 1)²

for some constant C_1. If we assume r = 0, then this sum can be explicitly written as

2 · Σ_{d=1}^{ℓ} 1/d² ≤ C_2 = C_2 · lg(0 + 2)/(0 + 1)²

for some constant C_2. Therefore, setting C_3 = max(C_1, C_2) as our constant, we are done.

Next, we modify H_1 to create H_2. To sample h′ from H_2, first choose h ∼ H_1 at random, and then choose an index d between 0 and √n − 1 uniformly at random. Our chosen function h′ will then be the function that satisfies h′_i = h_{i+d·√n} for all i. We show the following about H_2:

Lemma 2.2. The following three statements are true:

a) For all i, j ∈ Z, E_{h∼H_2}( h_i h_j ) = E_{h∼H_2}( h_{i+√n} h_{j+√n} ).

b) Suppose that 1 ≤ i, i′, j, j′ ≤ n, where i, i′ are in block c_1, j, j′ are in block c_2, and i ≠ j, i′ ≠ j′. Then, E_{h∼H_2}( h_i h_j ) = E_{h∼H_2}( h_{i′} h_{j′} ).

c) Σ_{i≠j} | E_{h∼H_2}( h_i h_j ) | = O(n).

Proof. Part a) is quite straightforward, since

E_{h∼H_2}( h_i h_j ) = (1/√n) Σ_{d=0}^{√n − 1} E_{h∼H_1}( h_{i+d√n} h_{j+d√n} ) = (1/√n) Σ_{d=0}^{√n − 1} E_{h∼H_1}( h_{i+(d+1)√n} h_{j+(d+1)√n} ) = E_{h∼H_2}( h_{i+√n} h_{j+√n} )

by periodicity of h modulo n.

For part b), for all d ∈ Z, note that i + d√n and i′ + d√n are in the same blocks, j + d√n and j′ + d√n are in the same blocks, i + d√n ≠ j + d√n and thus h_{i+d√n}, h_{j+d√n} are independent, and i′ + d√n ≠ j′ + d√n and thus h_{i′+d√n}, h_{j′+d√n} are independent. Therefore, E_{h∼H_1}( h_{i+d√n} h_{j+d√n} ) = E_{h∼H_1}( h_{i′+d√n} h_{j′+d√n} ) for all d. Because of the way we constructed H_2, part b) is immediate from these observations.

We use Lemma 2.1 to prove part c). First note that for all i ≠ j,

E_{h∼H_2}( h_i h_j ) = (1/√n) Σ_{d=0}^{√n − 1} E_{h∼H_1}( h_{i+d√n} h_{j+d√n} ) ≤ C_3 · lg(r + 2)/(√n · (r + 1)²),

where i is in block c_1, j is in block c_2, and r = min( |c_1 − c_2|, √n − |c_1 − c_2| ). Now, there are exactly n(√n − 1) pairs (i, j) where 1 ≤ i, j ≤ n, i ≠ j, and r = 0. This is because we can choose from √n blocks for the value of c_1 = c_2, and then choose from √n(√n − 1) possible pairs (i, j) in each block. For a fixed 0 < r ≤ ℓ, there are at most 2n^{3/2} pairs (i, j) with that value of r. Therefore,

Σ_{i≠j} max( 0, E_{h∼H_2}( h_i h_j ) ) ≤ 2n^{3/2} · Σ_{r=0}^{ℓ} C_3 · lg(r + 2)/(√n (r + 1)²) ≤ C_4 n

for some constant C_4, since Σ lg(r + 2)/(r + 1)² is a convergent series. To finish, note that |x| = 2 · max(0, x) − x, so

Σ_{i≠j} | E_{h∼H_2}( h_i h_j ) | ≤ 2 · C_4 n − Σ_{i≠j} E_{h∼H_2}( h_i h_j ) ≤ (2C_4 + 1) n,

since

Σ_{i≠j} E_{h∼H_2}( h_i h_j ) = Σ_{i,j} E_{h∼H_2}( h_i h_j ) − Σ_i E_{h∼H_2}( h_i² ) = E_{h∼H_2}[ (h_1 + ··· + h_n)² ] − n ≥ −n.

Thus, setting C_5 = 2C_4 + 1 gets us our desired result.
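The key quantities in the construction above can be checked numerically. The sketch below is a hypothetical instantiation with n = 1024 (so √n = 32 and ℓ = 16): it computes the block means f_c, verifies that the expected prefix sum over the first ℓ blocks is √n · H_ℓ = Θ(√n lg n) (the bias driving Theorem 1), and verifies the decay rate of Lemma 2.1 with C_3 = 8 as a generous hypothetical constant.

```python
import math

# Block means of H_1: f_c = E[h_i] for i in block c, extended periodically.
sqrt_n, ell = 32, 16   # corresponds to n = 1024
f = [1 / (ell + 1 - c) if c <= ell else -1 / (c - ell)
     for c in range(1, sqrt_n + 1)]

# Expected prefix sum over the first ell blocks: sqrt(n) * H_ell.
expected_prefix = sqrt_n * sum(f[:ell])
harmonic_ell = sum(1 / d for d in range(1, ell + 1))

# Lemma 2.1's decay: sum_d f_d f_{d+r} (indices cyclic mod sqrt(n)) is at most
# C_3 * lg(r + 2) / (r + 1)^2, with C_3 = 8 as a generous constant.
decay_ok = True
for r in range(ell + 1):
    s = sum(f[d] * f[(d + r) % sqrt_n] for d in range(sqrt_n))
    decay_ok = decay_ok and s <= 8 * math.log2(r + 2) / (r + 1) ** 2
```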

Next, we tweak H_2 to create a new family H_3. First, notice that we can define g_{c_1 c_2} for 1 ≤ c_1, c_2 ≤ √n to equal E_{h∼H_2}( h_i h_j ) for some i in the c_1th block and j in the c_2th block such that i ≠ j. This is well defined by Lemma 2.2 b), and as 1 ≤ c_1, c_2 ≤ √n, there always exist i ≠ j with i in the c_1th block and j in the c_2th block, as long as n ≥ 4. Now, to sample from H_3, define g = 1 + Σ_{c_1 < c_2} |g_{c_1 c_2}|. Then, with probability 1/g, we choose a hash function from H_2. For c_1 < c_2,

E_{h∼H_3}( h_i h_j ) = g_{cc}/g + Σ_{c′≠c} |g_{cc′}|/g = (1/g) ( g_{cc} + Σ_{c′≠c} |g_{cc′}| ).

However, note that g_{cc} ≥ 0, since E_{h∼H_2}( h_i h_j ) = (1/√n) Σ_d E_{h∼H_1}( h_{i+d√n} h_{j+d√n} ) and for all d, we have E_{h∼H_1}( h_{i+d√n} h_{j+d√n} ) ≥ 0 since i + d√n, j + d√n are in the same block for all d. Furthermore, for all indices c_1, c_2, g_{c_1 c_2} = g_{(c_1+1)(c_2+1)}, where indices are taken modulo √n, by Lemma 2.2 a). Combining these gives

E_{h∼H_3}( h_i h_j ) = (1/√n) · (1/g) · ( Σ_{c_1, c_2} |g_{c_1 c_2}| ).

However, we know that g ≥ 1 and Σ_{c_1, c_2} |g_{c_1 c_2}| ≤ C_5 by the arguments of Lemma 2.2 c), so the lemma follows.

11 E for all i = j. To see why, if i and j are in different blocks then h 3 (hihj) = 0 and if we do 6 ∼H not choose h from , then h and h are independent. If i, j are in the same block, then if H3 i j we condition on choosing from , E(h h ) = C6 . If we condition on not choosing from , the H3 i j √n H3 (√n/2) 1 ℓ 1 1 − E probability of i, j being the same sign is √n 1 = 2ℓ− 1 , meaning (hihj) = √n 1 . Therefore, − − − − E C6 1 h 4 (hihj)= p (1 p) = 0. ∼H · √n − − · √n 1 Our final hash function will be −. To sample from , we sample h , but with probability H H ∼H4 1/2, we replace h with h for all i. We clearly will still have E[h h ] = 0 for all i = j. But due i − i i j 6 to our negation, we now have E[h ]=0 and E[h h h ] = 0 for all 1 i, j, k n, since negating all i i j k ≤ ≤ h ’s will also negate h h h . Thus, by Proposition 2.1, is 3-wise independent. i i j k H To finish, it suffices to show that

Eh sup h1 + + ht = Ω(√n lg n). ∼H 1 t n | · · · |  ≤ ≤  1 To check this, note that with probability 2 we are sampling from 4, so with probability at least 1 H we are sampling from 3, so we need to just verify that 2(1+C6) H

E h 3 sup h1 + + ht = Ω(√n lg n). ∼H 1 t n | · · · |  ≤ ≤  But for , we are choosing something from with probability 1 but g 1+C by the arguments H3 H2 g ≤ 5 of Lemma 2.2 c), so it suffices to verify that

E h 2 sup h1 + + ht = Ω(√n lg n). ∼H 1 t n | · · · |  ≤ ≤  But for , if we condition on the shifting index d, we know that H2 1 E[h + h + + h ] √n 1+ + C √n lg n 1+d√n 2+d√n · · · (d+ℓ)√n ≥ · · · ℓ ≥ 7   for some C7, and likewise 1 E[h + h + + h ] √n 1 C √n lg n, 1+(d+ℓ)√n 2+(d+ℓ)√n · · · (d+2ℓ)√n ≤ − −···− ℓ ≤− 7   which means that regardless of whether d ℓ or d>ℓ, ≤ E C7 h 2 max h1 + + hd√n , h1 + + h(d+ℓ)√n √n lg n ∼H | · · · | | · · · | ≥ 2 h  i by the triangle inequality. But for any h , ∼H2

max h1 + + hd√n , h1 + + h(d+ℓ)√n sup (h1 + + ht), | · · · | | · · · | ≤ 1 t n · · ·   ≤ ≤ so the result follows by taking the expected value of both sides, which proves our upper bound is tight in the case of a random walk. Thus, we have proven Theorem 1.

3 Moment Bounds for Pairwise Independence

We show that the bound established in Section 2 and the induced bound on the second moment are tight for the 2-wise independent random walk case by proving Theorem 2. While we prove a generalization of this result in Section 4, the proof here is simpler. We will also assume n, the length of the random walk, is a power of 2, as this assumption can be removed by replacing n with the smallest power of 2 greater than or equal to n. To prove Theorem 2, we first establish the following lemma:

Lemma 3.1. Suppose that $A \in \mathbb{R}^{n \times n}$ is a positive definite matrix such that $\mathrm{Tr}(A) = d_1$ for some $d_1 > 0$. Also, suppose there is some $d_2 > 0$ such that for all vectors $x \in \mathbb{R}^n$ and integers $1 \le i \le n$, if $x_1 + \cdots + x_i = 1$, then $x^T A x \ge \frac{1}{d_2}$. Then, for all unbiased pairwise independent hash families $\mathcal{H} : [n] \to \{-1, 1\}$,

$$\mathbb{E}_{h \sim \mathcal{H}}\Big[\sup_{1 \le i \le n} (h_1 + \cdots + h_i)^2\Big] \le d_1 d_2.$$

Proof. Note that $\mathbb{E}_{h \sim \mathcal{H}}[h_i^2] = 1$ for all $i$ and $\mathbb{E}_{h \sim \mathcal{H}}(h_i h_j) = 0$ for all $i \neq j$. Therefore,

$$\mathbb{E}_{h \sim \mathcal{H}}(h^T A h) = \sum_{1 \le i,j \le n} A_{ij}\,\mathbb{E}_{h \sim \mathcal{H}}(h_i h_j) = \sum_{1 \le i \le n} A_{ii} = \mathrm{Tr}(A) = d_1.$$

However, for any $1 \le i \le n$ and any $h \sim \mathcal{H}$, if $h_1 + \cdots + h_i \neq 0$, then

$$h^T A h \ge (h_1 + \cdots + h_i)^2 \cdot \frac{1}{d_2},$$

since the vector $\frac{1}{h_1 + \cdots + h_i} h$ has its first $i$ components sum to $1$, so we can let this vector equal $x$ to get $x^T A x \ge \frac{1}{d_2}$. If $h_1 + \cdots + h_i = 0$, then the above inequality is still true as $A$ is positive definite. Therefore,

$$h^T A h \ge \frac{1}{d_2} \cdot \sup_{1 \le i \le n} (h_1 + \cdots + h_i)^2,$$

which means that

$$d_1 = \mathbb{E}_{h \sim \mathcal{H}}(h^T A h) \ge \frac{1}{d_2} \cdot \mathbb{E}_{h \sim \mathcal{H}}\Big[\sup_{1 \le i \le n} (h_1 + \cdots + h_i)^2\Big],$$

so we are done.

Lemma 3.2. There exists a positive definite matrix $A \in \mathbb{R}^{n \times n}$ such that $\mathrm{Tr}(A) = n \lg n$ and for all $x \in \mathbb{R}^n$ and $1 \le i \le n$, if $x_1 + \cdots + x_i = 1$, then $x^T A x \ge \frac{1}{\lg n}$. This clearly implies Theorem 2.

Proof. Consider the matrix $A$ such that for all $1 \le i, j \le n$, $A_{ij} = \lg n - k$ if $k$ is the smallest nonnegative integer such that $\lfloor \frac{i-1}{2^k} \rfloor = \lfloor \frac{j-1}{2^k} \rfloor$. Alternatively, we can think of $A$ as the sum of all matrices $B^{ij}$, where $B^{ij}$ is a matrix such that $B^{ij}_{k\ell} = 1$ if $i \le k, \ell \le j$ and $0$ otherwise. However, we sum this not over all $1 \le i, j \le n$ but for $i = 2^r \cdot (s-1) + 1$, $j = 2^r \cdot s$ for $0 \le r \le \lg n - 1$ and $1 \le s \le 2^{\lg n - r}$. As an illustrative example, for $n = 8$, $A$ equals

$$\begin{pmatrix}3&2&1&1&0&0&0&0\\2&3&1&1&0&0&0&0\\1&1&3&2&0&0&0&0\\1&1&2&3&0&0&0&0\\0&0&0&0&3&2&1&1\\0&0&0&0&2&3&1&1\\0&0&0&0&1&1&3&2\\0&0&0&0&1&1&2&3\end{pmatrix} = \begin{pmatrix}1&0&0&0&0&0&0&0\\0&1&0&0&0&0&0&0\\0&0&1&0&0&0&0&0\\0&0&0&1&0&0&0&0\\0&0&0&0&1&0&0&0\\0&0&0&0&0&1&0&0\\0&0&0&0&0&0&1&0\\0&0&0&0&0&0&0&1\end{pmatrix} + \begin{pmatrix}1&1&0&0&0&0&0&0\\1&1&0&0&0&0&0&0\\0&0&1&1&0&0&0&0\\0&0&1&1&0&0&0&0\\0&0&0&0&1&1&0&0\\0&0&0&0&1&1&0&0\\0&0&0&0&0&0&1&1\\0&0&0&0&0&0&1&1\end{pmatrix} + \begin{pmatrix}1&1&1&1&0&0&0&0\\1&1&1&1&0&0&0&0\\1&1&1&1&0&0&0&0\\1&1&1&1&0&0&0&0\\0&0&0&0&1&1&1&1\\0&0&0&0&1&1&1&1\\0&0&0&0&1&1&1&1\\0&0&0&0&1&1&1&1\end{pmatrix}.$$

It is easy to see that $\mathrm{Tr}(A) = n \lg n$, since $A_{ii} = \lg n$ for all $i$. For any $1 \le i <$

Proof. If the first part were not true, then there would be matrices $A_n$ such that $\mathrm{Tr}(A) = d_1$, $w^T A w = \frac{1}{d_2}$ where $w^T v_i = 0$ for some $i$, and $d_1 d_2 = o(n \lg^2 n)$. However, this would mean by Lemma 3.1 that for all pairwise independent $\mathcal{H}$,

$$\mathbb{E}_{h \sim \mathcal{H}}\Big[\sup_{1 \le i \le n} (h_1 + \cdots + h_i)^2\Big] \le d_1 d_2 = o(n \lg^2 n),$$

contradicting Theorem 1. The second part is immediate by the analysis of Lemma 3.2.
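The matrix of Lemma 3.2 is easy to build and check numerically. Below is a small sketch (my own illustration, not from the paper; all helper names are mine). It constructs $A$ for a power of two $n$ as the sum of all-ones diagonal blocks of sizes $2^0, \ldots, 2^{\lg n - 1}$, verifies $\mathrm{Tr}(A) = n \lg n$, and checks the hypothesis of Lemma 3.1 exactly: for positive definite $A$, the minimum of $x^T A x$ subject to $x_1 + \cdots + x_i = 1$ equals $1/(c_i^T A^{-1} c_i)$, where $c_i$ is the 0/1 indicator of the first $i$ coordinates.

```python
def build_A(n):
    # A[i][j] = number of r in {0, ..., lg n - 1} with i, j in the same block of size 2^r
    # (0-indexed); equivalently lg n - k for the smallest k with i >> k == j >> k.
    L = n.bit_length() - 1
    return [[sum(1 for r in range(L) if i >> r == j >> r) for j in range(n)]
            for i in range(n)]

def solve(A, b):
    # Gauss-Jordan elimination with partial pivoting: returns y with A y = b.
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def min_quadratic_form(A, i):
    # min of x^T A x over {x : x_1 + ... + x_i = 1} is 1 / (c^T A^{-1} c)
    n = len(A)
    c = [1.0] * i + [0.0] * (n - i)
    y = solve(A, c)
    return 1.0 / sum(ci * yi for ci, yi in zip(c, y))
```

For $n = 8$ the first row of $A$ is $(3, 2, 1, 1, 0, 0, 0, 0)$, matching the display above, and the constrained minimum stays at least $1/\lg n$ for every prefix, as Lemma 3.2 asserts.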

4 Generalized Upper Bounds

In this section, our goal is to prove Theorems 3, 4, and 5 of Section 1.2. First, we prove a preliminary inequality, which will be useful for proving both Theorem 3 and Theorem 5.

Lemma 4.1. There exists some constant $c > 0$ such that for all real numbers $x_1, \ldots, x_m$ and all even $k \in \mathbb{N}$,

$$\sum_{r=1}^{m} 2^{k \cdot r/8} x_r^k \ge \big(c(x_1 + \cdots + x_m)\big)^k.$$

Proof. Assume WLOG that $x_1 + \cdots + x_m = 1$. Let $x_1' \ge x_2' \ge \cdots \ge x_m'$ be real numbers such that $x_1' + \cdots + x_m' = 1$ and $x_1' > \cdots > x_m'$ are in a geometric series with common ratio $2^{-1/8}$. Then, there exists $i$ such that $x_i \ge x_i'$. But note that the quantities $(x_r')^k \cdot 2^{k \cdot r/8}$ are equal for all $r$ because of our geometric series, and equal $2^{k/8} \cdot (x_1')^k \ge (2^{1/8} - 1)^k$. Thus,

$$\sum_{r=1}^{m} 2^{k \cdot r/8} x_r^k \ge 2^{k \cdot i/8} \cdot (x_i')^k \ge (2^{1/8} - 1)^k,$$

so we are done.
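Lemma 4.1 is elementary enough to spot-check numerically. The sketch below (my own illustration, not from the paper) verifies the inequality with the explicit constant $c = 2^{1/8} - 1$ that the proof yields, on random inputs and several even values of $k$.

```python
import random

def lemma41_lhs(xs, k):
    # left-hand side of Lemma 4.1: sum_r 2^{k r / 8} x_r^k, with r starting at 1
    return sum(2 ** (k * r / 8) * x ** k for r, x in enumerate(xs, start=1))

C = 2 ** (1 / 8) - 1  # the constant extracted from the proof of Lemma 4.1

random.seed(0)
checks = []
for _ in range(500):
    xs = [random.uniform(-1, 1) for _ in range(random.randint(1, 20))]
    for k in (2, 4, 6):
        checks.append(lemma41_lhs(xs, k) >= (C * sum(xs)) ** k - 1e-9)
```

The small additive slack only guards against floating-point rounding; the mathematical inequality is exact.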

4.1 Proof of Theorem 3

In this subsection, we prove Theorem 3, which generalizes Kolmogorov's inequality [Kol28] to 4-wise independent random variables $X_1, \ldots, X_n$.

Proof of Theorem 3. Assume WLOG that $\sum_i \mathbb{E}[X_i^2] = 1$ (by scaling all of the variables) and $\mathbb{E}[X_i^2] > 0$ for all $i$, i.e. none of the variables are almost surely $0$. Also, define $S_i = X_1 + \cdots + X_i$ and $T_i = \mathbb{E}[(X_1 + \cdots + X_i)^2] = \mathbb{E}[X_1^2 + \cdots + X_i^2]$ for $0 \le i \le n$. Note that $T_0 = 0$ and $T_n = 1$.

We proceed by constructing a series of nested intervals $[a_{r,s}, b_{r,s}]$. We construct $a_{r,s}$ and $b_{r,s}$ for $0 \le r \le d = \max_i \lceil \lg \mathbb{E}[X_i^2]^{-1} \rceil$ and $1 \le s \le 2^r$, as integers between $0$ and $n$, inclusive. First define $a_{0,1} = 0$ and $b_{0,1} = n$. Next, we inductively define $a_{r,s}, b_{r,s}$. Define $a_{r+1,2s-1} := a_{r,s}$ and $b_{r+1,2s} := b_{r,s}$. Then, if there exists any index $a_{r,s} \le t \le b_{r,s}$ such that

$$T_t - T_{a_{r,s}} = 0.5 \cdot \big(T_{b_{r,s}} - T_{a_{r,s}}\big),$$

let $a_{r+1,2s} = b_{r+1,2s-1} = t$. Else, define $b_{r+1,2s-1}$ to be the largest index $t \ge a_{r,s}$ such that

$$T_t - T_{a_{r,s}} < 0.5 \cdot \big(T_{b_{r,s}} - T_{a_{r,s}}\big)$$

and similarly define $a_{r+1,2s}$ to be the smallest index $t \le b_{r,s}$ such that

$$T_t - T_{a_{r,s}} > 0.5 \cdot \big(T_{b_{r,s}} - T_{a_{r,s}}\big).$$

Note that in this case, $a_{r+1,2s} = b_{r+1,2s-1} + 1$.

It is clear that the intervals are all nested in each other, and for every fixed $r$, all integers between $0$ and $n$, inclusive, are in an interval $[a_{r,s}, b_{r,s}]$ for some $s$ (possibly at an endpoint). Also, we always have $a_{r,1} \le b_{r,1} \le a_{r,2} \le \cdots \le b_{r,2^r}$, and any interval $[a_{r,s}, b_{r,s}]$ satisfies $T_{b_{r,s}} - T_{a_{r,s}} \le 2^{-r}$. This implies that since $d = \max_i \lceil \lg \mathbb{E}[X_i^2]^{-1} \rceil$, every integer equals $a_{d,s} = b_{d,s}$ for some $s$.

We now consider the expression $f(X_1, \ldots, X_n)$, defined to equal

$$\sum_{r=1}^{d} \sum_{s=1}^{2^{r-1}} 2^{r/2} \cdot \Big( (S_{b_{r,2s}} - S_{a_{r,2s}})^2 (S_{a_{r,2s}} - S_{a_{r,2s-1}})^2 + (S_{b_{r,2s}} - S_{b_{r,2s-1}})^2 (S_{b_{r,2s-1}} - S_{a_{r,2s-1}})^2 \Big),$$

where we recall that $S_i = X_1 + \cdots + X_i$.

We first prove the following lemma.

Lemma 4.2. $\mathbb{E}[f(X_1, \ldots, X_n)] \le 10$.

Proof. By 4-wise independence, we have that

$$\mathbb{E}\big[(S_{b_{r,2s}} - S_{a_{r,2s}})^2 (S_{a_{r,2s}} - S_{a_{r,2s-1}})^2\big] = \mathbb{E}\big[(S_{b_{r,2s}} - S_{a_{r,2s}})^2\big] \cdot \mathbb{E}\big[(S_{a_{r,2s}} - S_{a_{r,2s-1}})^2\big] = \big(T_{b_{r,2s}} - T_{a_{r,2s}}\big) \cdot \big(T_{a_{r,2s}} - T_{a_{r,2s-1}}\big) \le \big(T_{b_{r-1,s}} - T_{a_{r-1,s}}\big)^2 \le 2^{-2(r-1)} = 4 \cdot 2^{-2r}.$$

By the same argument, we also have that

$$\mathbb{E}\big[(S_{b_{r,2s}} - S_{b_{r,2s-1}})^2 (S_{b_{r,2s-1}} - S_{a_{r,2s-1}})^2\big] \le 4 \cdot 2^{-2r}.$$

Therefore, we get that

$$\mathbb{E}[f(X_1, \ldots, X_n)] \le \sum_{r=1}^{d} \sum_{s=1}^{2^{r-1}} 2^{r/2} \cdot \big(8 \cdot 2^{-2r}\big) \le \sum_{r=1}^{d} 2^{r/2} \cdot \big(4 \cdot 2^{-r}\big) \le \sum_{r=1}^{\infty} 4 \cdot 2^{-r/2} = 4(\sqrt{2} + 1),$$

which is at most 10.
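The nested-interval construction above can be made concrete. The following sketch (my own code; the helper names are not from the paper) bisects each interval by variance mass exactly as described: either an index with exactly half the mass exists, or the interval splits into two pieces each carrying less than half. It then checks the key invariant $T_{b_{r,s}} - T_{a_{r,s}} \le 2^{-r}$.

```python
import random

def build_intervals(T, depth):
    # T: prefix sums of variances, T[0] = 0, T[n] = 1; returns intervals per level r
    levels = [[(0, len(T) - 1)]]
    for _ in range(depth):
        children = []
        for a, b in levels[-1]:
            half = 0.5 * (T[b] - T[a])
            t_eq = next((t for t in range(a, b + 1) if T[t] - T[a] == half), None)
            if t_eq is not None:
                children += [(a, t_eq), (t_eq, b)]
            else:
                b1 = max(t for t in range(a, b + 1) if T[t] - T[a] < half)
                a2 = min(t for t in range(a, b + 1) if T[t] - T[a] > half)
                children += [(a, b1), (a2, b)]  # note a2 == b1 + 1 here
        levels.append(children)
    return levels

random.seed(1)
var = [random.random() for _ in range(64)]
s = sum(var)
T = [0.0]
for v in var:
    T.append(T[-1] + v / s)  # normalize so that T[n] = 1
levels = build_intervals(T, 6)
```

Each level doubles the number of (possibly degenerate) intervals, and the mass of every interval halves at each step, which is all the proof of Lemma 4.2 uses.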

Next, we show the following:

Lemma 4.3. There exists some constant $c > 0$ such that for all $X_1, \ldots, X_n \in \mathbb{R}$ and all $1 \le i \le n$,

$$c \cdot \min\big((X_1 + \cdots + X_i)^4, (X_{i+1} + \cdots + X_n)^4\big) \le f(X_1, \ldots, X_n).$$

Proof. Define $i_0, i_1, \ldots, i_d$ as follows. First, let $i_d = i$, which we note must equal either $a_{d,s}$ or $b_{d,s}$ for some $s$. Next, we inductively define $i_r$ given $i_{r+1}$ to be either $a_{r,s'}$ or $b_{r,s'}$ for some $s'$. If $i_{r+1} = a_{r+1,s}$, where $s$ is chosen as small as possible, then if $s$ is odd, then $i_r := i_{r+1} = a_{r,(s+1)/2}$. If $s$ is even, we choose $i_r$ to either equal $b_{r+1,s} = b_{r,s/2}$ or $a_{r+1,s-1} = a_{r,s/2}$. If $|S_{i_{r+1}} - S_{a_{r,s/2}}| \le |S_{i_{r+1}} - S_{b_{r,s/2}}|$, then we choose $i_r := a_{r,s/2}$, and otherwise, we choose $i_r := b_{r,s/2}$.

Else, if $i_{r+1} = b_{r+1,s}$, where $s$ is chosen as small as possible, then if $s$ is even, then $i_r := i_{r+1} = b_{r,s/2}$. If $s$ is odd, we choose $i_r$ to be either $a_{r+1,s} = a_{r,(s+1)/2}$ or $b_{r+1,s+1} = b_{r,(s+1)/2}$. If $|S_{i_{r+1}} - S_{a_{r,(s+1)/2}}| \le |S_{i_{r+1}} - S_{b_{r,(s+1)/2}}|$, then we choose $i_r := a_{r,(s+1)/2}$, and otherwise, we choose $i_r := b_{r,(s+1)/2}$.

It is clear that for all $1 \le r \le d$,

$$(S_{i_r} - S_{i_{r-1}})^4 \le \sum_{s=1}^{2^{r-1}} \Big( (S_{b_{r,2s}} - S_{a_{r,2s}})^2 (S_{a_{r,2s}} - S_{a_{r,2s-1}})^2 + (S_{b_{r,2s}} - S_{b_{r,2s-1}})^2 (S_{b_{r,2s-1}} - S_{a_{r,2s-1}})^2 \Big),$$

since $(S_{i_r} - S_{i_{r-1}})^4$ is even smaller than some individual term in the sum on the right. Therefore, this implies that

$$\sum_{r=1}^{d} 2^{r/2} \cdot (S_{i_r} - S_{i_{r-1}})^4 \le f(X_1, \ldots, X_n).$$

But by Lemma 4.1 (applied with $k = 4$, so that $2^{k \cdot r/8} = 2^{r/2}$), we have that

$$\sum_{r=1}^{d} 2^{r/2} \cdot (S_{i_r} - S_{i_{r-1}})^4 \ge c \cdot (S_{i_d} - S_{i_0})^4 = c \cdot (S_i - S_{i_0})^4.$$

But $i_0$ must equal $0$ or $n$, as we know that $i_0$ equals $a_{0,s}$ or $b_{0,s}$ for some $s$. Thus, we are done.

We are now close to completing the proof of the theorem. First, by Lemma 4.2 and by taking the maximum over $i$ in Lemma 4.3, we have that there exists some constant $C$ such that

$$\mathbb{E}\Big[\max_{1 \le i \le n} \min\big((X_1 + \cdots + X_i)^4, (X_{i+1} + \cdots + X_n)^4\big)\Big] \le C.$$

Noting that $\max(a_1, \ldots, a_n)^2 = (\max(a_1, \ldots, a_n))^2$ and $\min(b_1, \ldots, b_n)^2 = (\min(b_1, \ldots, b_n))^2$ whenever $a_1, \ldots, a_n, b_1, \ldots, b_n$ are nonnegative, we have by the Cauchy-Schwarz inequality that

$$\mathbb{E}\Big[\max_{1 \le i \le n} \min\big((X_1 + \cdots + X_i)^2, (X_{i+1} + \cdots + X_n)^2\big)\Big] \le \sqrt{C}.$$

Now, it is straightforward to verify that $a^2 \ge \frac{a^2}{2} - (a+b)^2$ and $b^2 \ge \frac{a^2}{2} - (a+b)^2$ for all $a, b \in \mathbb{R}$, so $\min(a^2, b^2) \ge \frac{a^2}{2} - (a+b)^2$. By looking at $a = X_1 + \cdots + X_i$ and $b = X_{i+1} + \cdots + X_n$, we have that

$$\max_{1 \le i \le n} \min\big((X_1 + \cdots + X_i)^2, (X_{i+1} + \cdots + X_n)^2\big) \ge \max_{1 \le i \le n} \bigg( \frac{(X_1 + \cdots + X_i)^2}{2} - (X_1 + \cdots + X_n)^2 \bigg) = -(X_1 + \cdots + X_n)^2 + \frac{1}{2} \max_{1 \le i \le n} (X_1 + \cdots + X_i)^2.$$

Therefore, since $T_n = \mathbb{E}[(X_1 + \cdots + X_n)^2] = 1$, we in fact have that

$$\mathbb{E}\Big[\max_{1 \le i \le n} (X_1 + \cdots + X_i)^2\Big] \le 2(\sqrt{C} + 1) = O(1).$$

4.2 Proof of Theorems 4 and 5

Before we prove Theorems 4 and 5, we construct $2^{-r/2}$-nets for $0 \le r \le 2 \lg m + 1$ in a very similar way as in Theorem 1 of [BCI+17]. We define an $\varepsilon$-net to be a finite set of points $a_{r,0}, a_{r,1}, \ldots, a_{r,d_r}$ such that for every $z^{(t)}$, $\|z^{(t)} - a_{r,s}\|_2 \le \varepsilon \|z\|_2$ for some $0 \le s \le d_r$. The constructions are defined identically for both Theorem 4 and Theorem 5. Define $a_{0,0} := z^{(0)}$ as the only element of the $2^{-0/2} = 1$-net. For $r \ge 1$, define $a_{r,0} = z^{(0)}$, and given $a_{r,s} = z^{(t_1)}$, define $a_{r,s+1}$ as $z^{(t)}$ for the smallest $t > t_1$ such that

$$\|z^{(t)} - z^{(t_1)}\|_2 > 2^{-r/2} \cdot \|z\|_2,$$

unless such a $t$ does not exist, in which case let $s = d_r$ and do not define $a_{r,s'}$ for any $s' > s$. We define the set $A_r = \{a_{r,s} : 0 \le s \le d_r\}$. The following is directly true from our construction:

Proposition 4.1. For any $0 \le t \le m$ and fixed $r$, if $t_1 \le t$ is the largest $t_1$ such that $z^{(t_1)} = a_{r,s}$ for some $s$, then $\|z^{(t)} - z^{(t_1)}\|_2 \le 2^{-r/2} \cdot \|z\|_2$. Consequently, $A_r = \{a_{r,0}, \ldots, a_{r,d_r}\}$ is a $2^{-r/2}$-net.

The above proposition implies the following:

Proposition 4.2. For all $1 \le t \le m$, $z^{(t)} = a_{2 \lg m + 1, s}$ for some $s$.

Proof. Let $t_1$ be the largest integer at most $t$ such that $z^{(t_1)} = a_{2 \lg m + 1, s}$ for some $s$. Then, $\|z^{(t)} - a_{2 \lg m + 1, s}\|_2 \le 2^{-(2 \lg m + 1)/2} \cdot \|z\|_2 < 1$, which is clearly impossible unless $z^{(t)} = a_{2 \lg m + 1, s}$.

Next, to prove Equations (4) and (5), we will need the Marcinkiewicz–Zygmund inequality (see for example [RL01]), which is a generalization of Khintchine's inequality (see for example [Khi81]):

Theorem 7. For any even $k \ge 2$, there exists a constant $B_k$ only depending on $k$ such that for any totally independent and unbiased random variables $\vec{Y} = (Y_1, \ldots, Y_n)$,

$$\mathbb{E}\Bigg[\bigg(\sum_{i=1}^{n} Y_i\bigg)^k\Bigg] \le B_k \, \mathbb{E}\Bigg[\bigg(\sum_{i=1}^{n} Y_i^2\bigg)^{k/2}\Bigg].$$

Moreover, $B_k$ grows as $\Theta(k)^{k/2}$. This implies the following result:

Proposition 4.3. For any even $k \ge 2$ and vector $v \in \mathbb{R}^n$, there exists $B_k = \Theta(k)^{k/2}$ such that for any $k$-wise independent unbiased random variables $X_1, \ldots, X_n$ with $\mathbb{E}[X_i^k] \le 1$ for all $i$, and $\vec{X} := (X_1, \ldots, X_n)$,

$$\mathbb{E}\big[\langle v, \vec{X} \rangle^k\big] = \mathbb{E}\Bigg[\bigg(\sum_{i=1}^{n} v_i X_i\bigg)^k\Bigg] \le B_k \|v\|_2^k.$$

Proof. Since the expected value of $(\sum v_i X_i)^k$ is only dependent on $k$-wise independence, we can assume that the $X_i$'s are totally independent but have the same marginal distributions. This implies

$$\mathbb{E}\Bigg[\bigg(\sum_{i=1}^{n} v_i X_i\bigg)^k\Bigg] \le B_k \, \mathbb{E}\Bigg[\bigg(\sum_{i=1}^{n} v_i^2 X_i^2\bigg)^{k/2}\Bigg]$$

by Theorem 7. However, we know that $\mathbb{E}[X_i^{2d}] \le 1$ for all $i$ and all $1 \le d \le k/2$, since $\mathbb{E}[X_i^k] \le 1$ and $\mathbb{E}[X_i^{2d}]^{k/(2d)} \le \mathbb{E}[X_i^k]$ by Jensen's inequality, so simply expanding and using independence and linearity of expectation gets us the desired result.
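The net construction above is greedy and easy to implement. The sketch below is my own code, not from the paper: the stream is a random insertion stream, and the threshold uses the final norm $\|z\|_2 = \|z^{(m)}\|_2$. It builds each $A_r$ and checks the two facts used above, namely Proposition 4.1's distance guarantee and the size bound $d_r \le 2^r$.

```python
import math, random

def dist(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def build_nets(zs, r_max):
    # zs[t] is the prefix vector z^(t) of an insertion stream; z^(0) is all zeros
    znorm = dist(zs[-1], zs[0])
    nets = []
    for r in range(r_max + 1):
        eps = 2 ** (-r / 2)
        net = [0]  # indices t with z^(t) in A_r; a_{r,0} = z^(0)
        for t in range(1, len(zs)):
            if dist(zs[t], zs[net[-1]]) > eps * znorm:
                net.append(t)
        nets.append(net)
    return nets

random.seed(2)
n_coords, m = 10, 128
zs = [[0] * n_coords]
for _ in range(m):  # each update inserts one item: z^(t) = z^(t-1) + e_i
    z = zs[-1][:]
    z[random.randrange(n_coords)] += 1
    zs.append(z)
nets = build_nets(zs, r_max=2 * int(math.log2(m)) + 1)
```

The size bound holds because each jump between consecutive net points consumes more than $2^{-r} \|z\|_2^2$ of squared norm, and an insertion stream's squared norm only grows.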

We now prove Theorems 4 and 5.

Proof of Theorem 4. For $r \ge 1$ and $s$, suppose $a_{r,s} = z^{(t)}$ and $t_1 \le t$ is the largest index such that $z^{(t_1)} \in A_{r-1}$. Then, define $f(r,s)$ to be the index $s'$ such that $z^{(t_1)} = a_{r-1,s'}$. Consider the quadratic form

$$\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} \big\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \big\rangle^2.$$

By Proposition 4.1, $\|a_{r,s} - a_{r-1,f(r,s)}\|_2 \le 2^{-(r-1)/2} \|z\|_2$. Thus, by Proposition 4.3, the expected value of the quadratic form satisfies

$$\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} \mathbb{E}\big[\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \rangle^2\big] \le B_2 \sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} \|a_{r,s} - a_{r-1,f(r,s)}\|_2^2 \le B_2 \sum_{r=1}^{2 \lg m + 1} 2^r \cdot \big(2^{-(r-1)/2} \|z\|_2\big)^2 \le 2 B_2 (2 \lg m + 1) \|z\|_2^2.$$

Here, I am using the fact that an $\varepsilon$-net has size at most $\varepsilon^{-2}$, which is easy to see since $z^{(0)}, \ldots, z^{(m)}$ is tracking an insertion stream (it is proven, for example, in Theorem 1 of [BCI+17]), and thus $d_r \le 2^r$.

Now, for any $0 \le i \le m$, consider $z^{(i)}$ and let $z^{(i)} = a_{2 \lg m + 1, s}$. Then, define $s_r = s$ if $r = 2 \lg m + 1$ and $s_{r-1} = f(r, s_r)$ for $1 \le r \le 2 \lg m + 1$. Note that $s_0 = 0$, and for any $r \ge 1$, if $a_{r,s_r} \in A_{r-1}$, then $a_{r,s_r} = a_{r-1,s_{r-1}}$. Thus, each $\langle (a_{r,s_r} - a_{r-1,s_{r-1}}), \vec{X} \rangle^2$ for $1 \le r \le 2 \lg m + 1$ is either $0$ (because $a_{r,s_r} - a_{r-1,s_{r-1}} = 0$) or is a summand in our quadratic form. Therefore,

$$\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} \big\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \big\rangle^2 \ge \sum_{r=1}^{2 \lg m + 1} \big\langle (a_{r,s_r} - a_{r-1,s_{r-1}}), \vec{X} \big\rangle^2 \ge \frac{1}{2 \lg m + 1} \cdot \big\langle z^{(i)}, \vec{X} \big\rangle^2,$$

with the last inequality true since $a_{2 \lg m + 1, s_{2 \lg m + 1}} = z^{(i)}$, $a_{0, s_0} = z^{(0)}$, and by the Cauchy-Schwarz inequality. As this is true for all $i$, taking the supremum over $i$ and then expected values gives us

$$2 B_2 (2 \lg m + 1) \|z\|_2^2 \ge \mathbb{E}\Bigg[\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} \big\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \big\rangle^2\Bigg] \ge \frac{1}{2 \lg m + 1} \cdot \mathbb{E}\Big[\sup_i \big\langle z^{(i)}, \vec{X} \big\rangle^2\Big],$$

and therefore,

$$\mathbb{E}\Big[\sup_i \big\langle z^{(i)}, \vec{X} \big\rangle^2\Big] = O\big(\|z\|_2^2 \cdot \lg^2 m\big).$$

Proof of Theorem 5. Consider the form

$$\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} 2^{k \cdot r/8} \big\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \big\rangle^k,$$

with $f(r,s)$ defined as in the proof of Equation (4). We again note that by Proposition 4.1, $\|a_{r,s} - a_{r-1,f(r,s)}\|_2 \le 2^{-(r-1)/2} \|z\|_2$. Thus, by Proposition 4.3, the expected value of the form satisfies

$$\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} 2^{k \cdot r/8} \, \mathbb{E}\big[\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \rangle^k\big] \le B_k \sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} 2^{k \cdot r/8} \|a_{r,s} - a_{r-1,f(r,s)}\|_2^k \le B_k \sum_{r=0}^{2 \lg m + 1} 2^{k \cdot r/8} \cdot 2^r \cdot 2^{-(r-1)k/2} \|z\|_2^k \le B_k \cdot 2^{k/2} \sum_{r=0}^{\infty} 2^{r(8-3k)/8} \, \|z\|_2^k = O(2^{k/2} B_k) \|z\|_2^k,$$

since $k \ge 4$. Again, I am using the fact that $d_r \le 2^r$, as an $\varepsilon$-net has size at most $\varepsilon^{-2}$.

Now, for any $0 \le i \le m$, suppose $s$ satisfies $z^{(i)} = a_{2 \lg m + 1, s}$. Define $s_r = s$ if $r = 2 \lg m + 1$ and $s_{r-1} = f(r, s_r)$ for $1 \le r \le 2 \lg m + 1$. Then, similarly to in the proof of Equation (4),

$$\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} 2^{k \cdot r/8} \big\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \big\rangle^k \ge \sum_{r=1}^{2 \lg m + 1} 2^{k \cdot r/8} \big\langle (a_{r,s_r} - a_{r-1,s_{r-1}}), \vec{X} \big\rangle^k \ge (1 - 2^{-1/8})^k \cdot \big\langle z^{(i)}, \vec{X} \big\rangle^k.$$

The last inequality follows from Lemma 4.1, by setting $x_r = \langle (a_{r,s_r} - a_{r-1,s_{r-1}}), \vec{X} \rangle$. As this is true for all $i$, we can take the supremum over $i$ and then take expected values to get

$$2^{k/2} B_k \|z\|_2^k \gtrsim \mathbb{E}\Bigg[\sum_{r=1}^{2 \lg m + 1} \sum_{s=0}^{d_r} 2^{k \cdot r/8} \big\langle (a_{r,s} - a_{r-1,f(r,s)}), \vec{X} \big\rangle^k\Bigg] \ge c^k \cdot \mathbb{E}\Big[\sup_i \big\langle z^{(i)}, \vec{X} \big\rangle^k\Big]$$

for some fixed constant $c > 0$. Therefore, for a fixed $k$,

$$\mathbb{E}\Big[\sup_i \big\langle z^{(i)}, \vec{X} \big\rangle^k\Big] = O\big(\|z\|_2^k\big),$$

and for variable $k$, as $B_k = \Theta(k)^{k/2}$, we have

$$\mathbb{E}\Big[\sup_i \big\langle z^{(i)}, \vec{X} \big\rangle^k\Big] = O\big(k^{1/2} \cdot \|z\|_2\big)^k.$$

Acknowledgments

This research was funded by the PRISE program at Harvard and the Herchel Smith Fellowship. I would like to thank Prof. Jelani Nelson for advising this work, as well as for problem suggestions, forwarding me many papers from the literature, and providing helpful feedback on my writeup. Finally, I would like to thank Prof. Scott Sheffield for explaining the Skorokhod embedding theorem and its application to the generalization of Kolmogorov’s inequality.

References

[AMS99] Noga Alon, Yossi Matias, and Mario Szegedy. The space complexity of approximating the frequency moments. J. Comput. Syst. Sci., 58(1):137–147, 1999.

[BKR06] Itai Benjamini, Gady Kozma, and Dan Romik. Random walks with k-wise independent increments. Electron. Commun. Probab., 11:100–107, 2006.

[Bil95] Patrick Billingsley. Probability and Measure. John Wiley & Sons, Inc., New York, 1995.

[Bła18] Jarosław Błasiok. Optimal streaming and tracking distinct elements with high probability. In Proceedings of the Twenty-Ninth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2018, New Orleans, LA, USA, January 7-10, 2018, pages 2432–2448, 2018.

[BDN17] Jarosław Błasiok, Jian Ding, and Jelani Nelson. Continuous monitoring of ℓp norms in data streams. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2017, August 16-18, 2017, Berkeley, CA, USA, pages 32:1–32:13, 2017.

[BV04] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[BCI+17] Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, Jelani Nelson, Zhengyu Wang, and David P. Woodruff. BPTree: An ℓ2 heavy hitters algorithm using constant memory. In Proceedings of the 36th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, PODS 2017, Chicago, IL, USA, May 14-19, 2017, pages 361–376, 2017.

[BCI+16] Vladimir Braverman, Stephen R. Chestnut, Nikita Ivkin, and David P. Woodruff. Beating CountSketch for heavy hitters in insertion streams. In Proceedings of the 48th Annual ACM SIGACT Symposium on Theory of Computing, STOC 2016, Cambridge, MA, USA, June 18-21, 2016, pages 740–753, 2016.

[CW79] Larry Carter and Mark N. Wegman. Universal classes of hash functions. J. Comput. Syst. Sci., 18(2):143–154, 1979.

[Dud67] R. M. Dudley. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. Journal of Functional Analysis, 1(3):290–330, 1967.

[EV04] M. P. Etienne and P. Vallois. Approximation of the distribution of the supremum of a centered random walk. Application to the local score. Methodology and Computing in Applied Probability, 6(3):255–275, Sep 2004.

[Jac10] Kurt Jacobs. Stochastic Processes for Physicists. Cambridge University Press, 2010.

[Khi81] Uffe Haagerup. The best constants in the Khintchine inequality. Studia Mathematica, 70(3):231–283, 1981.

[Mer96] Loïc Merel. Bornes pour la torsion des courbes elliptiques sur les corps de nombres. Invent. Math., 124:437–449, 1996.

[Kol28] A. Kolmogoroff. Über die Summen durch den Zufall bestimmter unabhängiger Größen. Mathematische Annalen, 99:309–319, 1928.

[Nel16] Jelani Nelson. Chaining introduction with some computer science applications. Bulletin of the EATCS, 120, 2016.

[PPR09] Anna Pagh, Rasmus Pagh, and Milan Ružić. Linear probing with constant independence. SIAM J. Comput., 39(3):1107–1120, 2009.

[PT16] Mihai Pătrașcu and Mikkel Thorup. On the k-independence required by linear probing and minwise independence. ACM Trans. Algorithms, 12(1):8:1–8:27, 2016.

[RL01] Yao-Feng Ren and Han-Ying Liang. On the best constant in Marcinkiewicz-Zygmund inequality. Statistics & Probability Letters, 53(3):227–233, 2001.

[Knu18] Mathias Knudsen (via Jelani Nelson). Personal communication.

A Bounds on Certain Well-Known Pairwise Independent Random Walks

There exist various well-known pairwise independent hash functions from $[n]$ to $\{\pm 1\}$. Here, we consider a function generated by a Hadamard matrix and a function generated by a random linear function from $\mathbb{Z}/p\mathbb{Z} \to \mathbb{Z}/p\mathbb{Z}$, where $p$ is a prime.

A.1 Hadamard Matrices

For $n = 2^k$, the $n \times n$ Hadamard matrix is a matrix $H$ with each entry equal to $1$ or $-1$. Suppose we write each integer $0 \le i < n$ in binary as a vector in $\{0,1\}^k$, and let the entry of $H$ in row $i$ and column $j$ be $h_i(j) = (-1)^{\langle i, j \rangle}$. A uniformly random row $h$ of $H$ then gives a pairwise independent hash function from $[n]$ to $\{\pm 1\}$.

Proposition A.1. For a uniformly random row $h$ of $H$,

$$\mathbb{E}_h\Big[\sup_j |h(1) + \cdots + h(j)|\Big] = \Theta(\log n).$$

Proof. Since $h(0) = 1$ always, we have that $h(0) + \cdots + h(j) = 1 + h(1) + \cdots + h(j)$ for all $j$. Therefore, it suffices to show that

$$\mathbb{E}_h\Big[\sup_j |h(0) + \cdots + h(j)|\Big] = \Theta(\log n).$$

Assume that $h = h_i$ where $i \neq 0$, and let $0 \le v_2(i) \le k-1$ represent the largest integer $\ell$ such that $2^\ell \mid i$. Then, note that for any integer $0 \le c < 2^{k - v_2(i) - 1}$, we have that

$$h_i\big(c \cdot 2^{v_2(i)+1}\big) = \cdots = h_i\big(c \cdot 2^{v_2(i)+1} + 2^{v_2(i)} - 1\big) \neq h_i\big(c \cdot 2^{v_2(i)+1} + 2^{v_2(i)}\big) = \cdots = h_i\big(c \cdot 2^{v_2(i)+1} + 2^{v_2(i)+1} - 1\big).$$

This follows from the fact that $i$, when written in binary, has its last $v_2(i)$ coordinates equal to $0$ but its $(v_2(i)+1)$th to last coordinate equal to $1$. As a result, we have that if $2^{v_2(i)+1} \mid j + 1$, then $h(0) + \cdots + h(j) = 0$. Therefore, as $h(j) \in \{\pm 1\}$ for all $j$, we have that $|h(0) + \cdots + h(j)|$ achieves its maximum when $v_2(j) = v_2(i)$, and equals $2^{v_2(i)}$. Finally, if $h = h_0$, then $|h(0) + \cdots + h(j)| \le |h(0) + \cdots + h(n-1)| = n$.

Therefore, with probability $\frac{1}{n}$, we have $\sup_j |h(0) + \cdots + h(j)| = n$. Also, for each $0 \le \ell \le k-1$, we have $v_2(i) = \ell$ with probability $2^{-\ell-1}$, in which case $\sup_j |h(0) + \cdots + h(j)| = 2^\ell$. Therefore, we have

$$\mathbb{E}_h\Big[\sup_j |h(0) + \cdots + h(j)|\Big] = \frac{1}{n} \cdot n + \sum_{\ell=0}^{k-1} 2^{-\ell-1} \cdot 2^\ell = 1 + \frac{k}{2} = \Theta(\log n).$$
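Proposition A.1 can be confirmed exactly by brute force for small $n$: the computation in the proof gives $\mathbb{E}_h[\sup_j |h(0) + \cdots + h(j)|] = 1 + k/2$ on the nose. A small sketch of this check (my own code), using $h_i(j) = (-1)^{\langle i, j \rangle}$ with $i, j$ read as binary vectors:

```python
def row_sup(k, i):
    # sup_j |h_i(0) + ... + h_i(j)| for row i of the 2^k x 2^k Hadamard matrix
    s, sup = 0, 0
    for j in range(1 << k):
        s += 1 if bin(i & j).count("1") % 2 == 0 else -1
        sup = max(sup, abs(s))
    return sup

def avg_sup(k):
    # exact average over all rows of the supremum of the partial-sum walk
    n = 1 << k
    return sum(row_sup(k, i) for i in range(n)) / n
```

For instance, row $i = 0$ gives supremum $n$, while a row with $v_2(i) = \ell$ gives supremum $2^\ell$, exactly as in the proof.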

A.2 Linear Functions on Finite Fields

It is well-known that for any finite field $\mathbb{F}_q$, the family $\{f_{a,b} : x \mapsto ax + b\}$ where $a, b$ are randomly chosen from $\mathbb{F}_q$ is pairwise independent, with $\mathbb{P}(f_{a,b}(x_1) = y_1, f_{a,b}(x_2) = y_2) = \frac{1}{q^2}$ for all $x_1 \neq x_2 \in \mathbb{F}_q$ and $y_1, y_2 \in \mathbb{F}_q$. As a result, we can create the following function from $[q] \to \{0, \pm 1\}$. First, create some ordering of the elements of $\mathbb{F}_q$, giving us a bijection $g : [q] \to \mathbb{F}_q$. Next, if $q$ is a power of $2$, choose some subset $S \subset \mathbb{F}_q$ of size $q/2$ that will map to $1$ (and the rest will map to $-1$). If $q$ is odd, we will say that $0 \in \mathbb{F}_q$ maps to $0$ and we choose some subset $S \subset \mathbb{F}_q \setminus \{0\}$ of size $\frac{q-1}{2}$ to map to $1$. Thus, our final map $f : [q] \to \{0, \pm 1\}$, given $a, b$, will map $x \in [q]$ to $1$ if $a \cdot g(x) + b \in S$, to $0$ if $q$ is odd and $a \cdot g(x) + b = 0$, and to $-1$ otherwise. While we really want a random walk with steps in $\{\pm 1\}$, we note that since $f_{a,b}$ maps at most one element $x \in \mathbb{F}_q$ to $0$ unless $a = b = 0$, this will not affect the expectation of the supremum of the random walk by more than $O(1)$.

We note that providing general bounds on these random walks is quite difficult, so we will look at the special case where $q$ is an odd prime, the bijection $g$ is reduction modulo $q$, and $S = \big\{1, 2, \ldots, \frac{q-1}{2}\big\}$. In this case, we let $f_{a,b} : [q] \to \{0, \pm 1\}$ be the function such that

$$f(x) = \begin{cases} 0 & ax + b \equiv 0 \pmod q \\ 1 & ax + b \equiv s \pmod q, \ s \in S \\ -1 & \text{else.} \end{cases}$$

In this case, we have the following result, which demonstrates that $f$ will not give a random walk with expected supremum distance $\omega(\sqrt{n})$.

Proposition A.2. We have that

$$\mathbb{E}_{a,b}\Big[\sup_k |f_{a,b}(1) + \cdots + f_{a,b}(k)|\Big] = O(\log^2 q).$$

Remark. The proof below follows a very similar approach to approximating the number of solutions to $ab \equiv c \pmod q$, where $c$ is fixed and $a, b$ are known to be in some fixed intervals contained in $\mathbb{Z}/q\mathbb{Z}$, using bounds on the Kloosterman sums [Mer96].

Proof. For simplicity of notation, we will use notation commonly seen in analytic number theory. Namely, for $x \in \mathbb{R}$, we will write $e(x) := e^{2\pi i x}$ and $\|x\| := \min(x - \lfloor x \rfloor, \lceil x \rceil - x)$, the closest distance between $x$ and some integer.

Consider the function $h : \mathbb{Z}/q\mathbb{Z} \to \mathbb{R}$ where $h(0) = 0$, $h(s) = 1$ if $s \in S$, and $h(s) = -1$ otherwise. We consider the Discrete Fourier Transform of $h$,

$$\hat{h}(s) = \sum_{i=0}^{q-1} h(i) \, e\Big({-\frac{i \cdot s}{q}}\Big).$$

It is clear that $\hat{h}(0) = 0$ and that for $s \neq 0$,

$$|\hat{h}(s)| \le \Bigg| \sum_{i=1}^{(q-1)/2} e\Big({-\frac{i \cdot s}{q}}\Big) \Bigg| + \Bigg| \sum_{i=(q+1)/2}^{q-1} e\Big({-\frac{i \cdot s}{q}}\Big) \Bigg| = 2 \cdot \frac{\big| e\big({-\frac{s(q-1)}{2q}}\big) - 1 \big|}{\big| e\big({-\frac{s}{q}}\big) - 1 \big|} \le \frac{4}{\big| e\big({-\frac{s}{q}}\big) - 1 \big|} = O\bigg(\frac{1}{\|s/q\|}\bigg).$$

Now, using the discrete Fourier inversion formula, if $a \neq 0$, we can write

$$|f_{a,b}(1) + \cdots + f_{a,b}(k)| = \Bigg| \sum_{j=1}^{k} h(j \cdot a + b) \Bigg| = \Bigg| \frac{1}{q} \sum_{j=1}^{k} \sum_{i=0}^{q-1} \hat{h}(i) \, e\Big(\frac{i \cdot (ja + b)}{q}\Big) \Bigg| \lesssim \frac{1}{q} \sum_{i=1}^{q-1} \frac{1}{\|i/q\|} \cdot \Bigg| \sum_{j=1}^{k} e\Big(\frac{(i \cdot a) \cdot j + (i \cdot b)}{q}\Big) \Bigg| = \frac{1}{q} \sum_{i=1}^{q-1} \frac{1}{\|i/q\|} \cdot \frac{\big| e\big(\frac{(i \cdot a) \cdot k}{q}\big) - 1 \big|}{\big| e\big(\frac{i \cdot a}{q}\big) - 1 \big|} \lesssim \frac{1}{q} \sum_{i=1}^{q-1} \frac{1}{\|i/q\|} \cdot \frac{1}{\|i \cdot a/q\|}.$$

Note that the final expression is independent of $k$, so we can take a supremum over all $k$. Also, the final expression is independent of $b$. Summing over all values of $b \in \mathbb{Z}/q\mathbb{Z}$ and all $a \in (\mathbb{Z}/q\mathbb{Z}) \setminus \{0\}$ gives us

$$\sum_{a \neq 0, \, b} \sup_k |f_{a,b}(1) + \cdots + f_{a,b}(k)| \lesssim \sum_{a=1}^{q-1} \sum_{i=1}^{q-1} \frac{1}{\|i/q\|} \cdot \frac{1}{\|i \cdot a/q\|} = \sum_{c=1}^{q-1} \sum_{i=1}^{q-1} \frac{1}{\|i/q\|} \cdot \frac{1}{\|c/q\|} = \Bigg( 2 \sum_{i=1}^{(q-1)/2} \frac{1}{i/q} \Bigg)^2 \lesssim q^2 \log^2 q.$$

Finally, for $a = 0$, we have $q$ possibilities of $b$, and we clearly have $\sup_k |f_{a,b}(1) + \cdots + f_{a,b}(k)| \le q$ as the image of $f$ is contained in $\{0, \pm 1\}$. Therefore,

$$\sum_{a,b} \sup_k |f_{a,b}(1) + \cdots + f_{a,b}(k)| \lesssim q^2 + q^2 \log^2 q \lesssim q^2 \log^2 q.$$

Thus, taking the expectation over $a, b$ gives us

$$\mathbb{E}_{a,b}\Big[\sup_k |f_{a,b}(1) + \cdots + f_{a,b}(k)|\Big] = O(\log^2 q),$$

as desired.
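The special case above is easy to experiment with. The sketch below (my own code) computes the walk $f_{a,b}(1), \ldots, f_{a,b}(q)$ for an odd prime $q$ and checks two exact facts: for $a \neq 0$ the map $x \mapsto ax + b$ permutes $\mathbb{Z}/q\mathbb{Z}$, so the full walk returns to $0$; and the pair $a = 1, b = 0$ reaches supremum exactly $(q-1)/2$, showing that individual pairs can sit far above the $O(\log^2 q)$ average.

```python
def walk(q, a, b):
    # returns (sup_k |f(1)+...+f(k)|, f(1)+...+f(q)) with S = {1, ..., (q-1)/2}
    half = (q - 1) // 2
    s, sup = 0, 0
    for x in range(1, q + 1):
        v = (a * x + b) % q
        s += 0 if v == 0 else (1 if v <= half else -1)
        sup = max(sup, abs(s))
    return sup, s

def average_sup(q):
    # exact average of the supremum over all q^2 pairs (a, b)
    return sum(walk(q, a, b)[0] for a in range(q) for b in range(q)) / q ** 2
```

For a modest prime such as $q = 101$, the exact average is a small constant multiple of $\log^2 q$, far below the worst single pair.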

B Final Case of Theorem 6

Here, we prove the final case of Theorem 6.

Proposition B.1. Let $q \ge p \ge 1$, $k \ge 2$. Suppose that either $2 \le k \le 3$ and $p \ge 3$, or $k \ge 4$ and either $p > 2\lfloor \frac{k}{2} \rfloor$ or $p = q = 1$. Then, if $n$ is sufficiently large, there exist unbiased $k$-wise independent random variables $X_1, \ldots, X_n$ with $\mathbb{E}[|X_i|^q] \le 1$ for all $i$ such that

$$\mathbb{E}\big[|X_1 + \cdots + X_n|^p\big] = \Omega(n)^{(p+1)/2}.$$

Proof. Note that $2 \le k \le 3$ and $p \ge 3$ implies $p > 2\lfloor \frac{k}{2} \rfloor$. Therefore, we'll first consider the case of $k \ge 2$ and $p \ge 2\lfloor \frac{k}{2} \rfloor + 1$. The variables we will deal with will be $0$ or $\pm 1$ random variables, so $\mathbb{E}[|X_i|^q] \le 1$ for all $q \ge 1$.

Suppose $n$ is an odd prime satisfying $n \ge 2k$. Let $f : \mathbb{F}_n \to \mathbb{F}_n$ be a randomly chosen polynomial of degree at most $k-1$, i.e. $f(x) = \sum_{i=0}^{k-1} a_i x^i$, where $a_0, \ldots, a_{k-1}$ are randomly chosen from $\mathbb{F}_n = \mathbb{Z}/n\mathbb{Z}$. Now, for $1 \le i \le n$, let $X_i$ be the random variable which equals $1$ if $f(i)$ is a nonzero quadratic residue modulo $n$, $-1$ if $f(i)$ is a quadratic nonresidue modulo $n$, and $0$ if $f(i) = 0$. It is well known that $f(1), \ldots, f(n)$ are $k$-wise independent, so we therefore have that $X_1, \ldots, X_n$ are $k$-wise independent. However, there are $n^{\lceil k/2 \rceil}$ polynomials of degree at most $\frac{k-1}{2}$, which means that the number of polynomials $f$ of degree at most $k-1$ which are squares over $\mathbb{F}_n[x]$ is $\Theta(n^{\lceil k/2 \rceil})$, since no polynomial can have more than $2$ square roots. In this case, $X_1 + \cdots + X_n \ge n - k \ge \frac{n}{2}$ since $n \ge 2k$, since $f(i)$ is always a quadratic residue, and can be $0$ for at most $k$ values of $i$. Therefore, we have that

$$\mathbb{E}\big[|X_1 + \cdots + X_n|^p\big] \ge \frac{\Theta(n^{\lceil k/2 \rceil})}{n^k} \cdot \Omega(n)^p = \Omega(n)^{p - \lfloor k/2 \rfloor}.$$

However, since $p \ge 2\lfloor \frac{k}{2} \rfloor + 1$, we have that $p - \lfloor \frac{k}{2} \rfloor \ge \frac{p+1}{2}$.

Now, if $n$ is not prime but $n \ge 4k$, we can choose $n'$ prime so that $2k \le \frac{n}{2} \le n' \le n$ by Bertrand's postulate, choose random variables $X_1, \ldots, X_{n'}$ as in the previous paragraph, and set $X_{n'+1} = \cdots = X_n = 0$.

For the case of $p = q = 1$ and $n \ge 1$, we let $X_1, \ldots, X_n$ be i.i.d. drawn from $X$, where $X = 0$ with probability $1 - \frac{1}{n}$, $X = n$ with probability $\frac{1}{2n}$, and $X = -n$ with probability $\frac{1}{2n}$. It is clear that $\mathbb{E}[|X|] = 1$. However, if we have samples $X_1, \ldots, X_n$, exactly one of these samples will be nonzero with probability $n \cdot (1 - \frac{1}{n})^{n-1} \cdot \frac{1}{n}$, which converges to $\frac{1}{e}$ as $n \to \infty$. In this case, $|X_1 + \cdots + X_n| = n$. Therefore, we have that $\mathbb{E}[|X_1 + \cdots + X_n|] \ge (\frac{1}{e} - o(1)) \cdot n = \Omega(n)$, which concludes the proof.
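The construction in the first case can be checked exhaustively for tiny parameters. The sketch below (my own code) takes $n = 5$, $k = 2$ and enumerates all $n^k = 25$ linear polynomials $f(x) = a_0 + a_1 x$ over $\mathbb{F}_5$: the pair $(f(1), f(2))$ hits every value of $\mathbb{F}_5^2$ exactly once (the pairwise independence used above), and the resulting $X_i$, computed via Euler's criterion, are unbiased.

```python
n, k = 5, 2  # F_5, linear polynomials give pairwise independent values

def step(v):
    # +1 for a nonzero quadratic residue mod n, -1 for a nonresidue, 0 for v == 0,
    # using Euler's criterion: v^((n-1)/2) == 1 (mod n) iff v is a nonzero QR
    if v % n == 0:
        return 0
    return 1 if pow(v, (n - 1) // 2, n) == 1 else -1

pair_counts = {}
bias = [0] * n  # bias[i-1] accumulates X_i over all polynomials
for a0 in range(n):
    for a1 in range(n):
        f = [(a0 + a1 * x) % n for x in range(1, n + 1)]
        key = (f[0], f[1])
        pair_counts[key] = pair_counts.get(key, 0) + 1
        for i in range(n):
            bias[i] += step(f[i])
```

The quadratic residues modulo 5 are $\{1, 4\}$, so each $X_i$ takes the values $+1, -1, 0$ with probabilities $2/5, 2/5, 1/5$ respectively.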

25 C Proof of Theorem 3 without the 4-wise independence criterion

Here, we prove a weaker version of Theorem 3, where $X_1, \ldots, X_n$ are assumed to be totally independent, as a direct corollary of the Skorokhod Embedding Theorem. First, we state the Skorokhod Embedding Theorem (see, for instance, [Bil95, Theorem 37.7]).

Theorem 8. Let $X_1, \ldots, X_n$ be independent unbiased random variables with finite variances, and let $W$ be a Brownian motion process in $1$ dimension. For $1 \le i \le n$, let $S_i = X_1 + \cdots + X_i$. Then, there exists a series of stopping times $\tau_0 := 0$ and $0 \le \tau_1 \le \tau_2 \le \cdots \le \tau_n$ with respect to the filtration of $W$, such that the joint distribution of $(W_{\tau_1}, \ldots, W_{\tau_n})$ is the same as the distribution of $(S_1, \ldots, S_n)$, and $\mathbb{E}[\tau_i - \tau_{i-1}] = \mathbb{E}[X_i^2]$ for all $1 \le i \le n$.

We will also need the Reflection Principle of 1-dimensional Brownian motion (see, for instance, [Jac10, Section 4.2]).

Theorem 9. If $W$ is a Brownian motion process in $1$ dimension, then for any time $\tau$, $\sup_{0 \le t \le \tau} W(t)$ has the same distribution as $|W(\tau)|$.

We now can prove the weaker version of Theorem 3.

Proof. Let all notation be as in Theorem 8. Then, $\max |S_i|$ has the same distribution as $\max |W_{\tau_i}|$. Thus, $\mathbb{E}[\max S_i^2] = \mathbb{E}[\max W_{\tau_i}^2] \le \mathbb{E}[\sup_{0 \le t \le \tau_n} W_t^2]$. Now, since $\sup |W_t| \le \sup W_t + \sup(-W_t)$, and since $\sup W_t$ and $\sup(-W_t)$ have the same distribution, we have that

$$\mathbb{E}\Big[\sup_{0 \le t \le \tau_n} |W_t|^2\Big] \le \mathbb{E}\bigg[\Big(\sup_{0 \le t \le \tau_n} W_t + \sup_{0 \le t \le \tau_n} (-W_t)\Big)^2\bigg] \le 4\,\mathbb{E}\bigg[\Big(\sup_{0 \le t \le \tau_n} W_t\Big)^2\bigg].$$

However, by Theorem 9, $\sup_{0 \le t \le T} W_t$ has the distribution $|Z|$ where $Z \sim \mathcal{N}(0, T)$. Therefore, conditioning on $\tau_1, \ldots, \tau_n$, we have that

$$\mathbb{E}\bigg[\Big(\sup_{0 \le t \le \tau_n} W_t\Big)^2 \,\Big|\, \tau_1, \ldots, \tau_n\bigg] = \mathbb{E}\big[\mathcal{N}(0, \tau_n)^2\big] = \tau_n.$$

However, by Theorem 8, $\mathbb{E}[\tau_n] = \mathbb{E}[X_1^2 + \cdots + X_n^2]$. Therefore, we have that

$$\mathbb{E}\Big[\max_{1 \le i \le n} |S_i|^2\Big] \le 4 \cdot \sum_{i=1}^{n} \mathbb{E}[X_i^2].$$
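The bound $\mathbb{E}[\max_i S_i^2] \le 4 \sum_i \mathbb{E}[X_i^2]$ can be sanity-checked by simulation. A quick Monte Carlo sketch (my own code) with i.i.d. $\pm 1$ steps, for which $\sum_i \mathbb{E}[X_i^2] = n$:

```python
import random

def mean_max_square(n, trials, seed=1):
    # estimates E[max_i S_i^2] for a simple +-1 random walk of length n
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        s, best = 0, 0
        for _ in range(n):
            s += rng.choice((-1, 1))
            best = max(best, abs(s))
        total += best * best
    return total / trials
```

Empirically the estimate sits well below the $4n$ bound, consistent with the factor $4$ being an artifact of the symmetrization step in the proof.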
