<<

arXiv:1610.01765v2 [math.PR] 18 Nov 2016 set eovdb remn[6 se[,1,1]frerirrsls.Fried vertic results). of earlier number for the 17] and 18, fixed [9, o (see is conjecture [16] degree Friedman The seco by the resolved properties. the when r expansion of gap of its magnitude spectral line with the the new graph Alon between A regular of connection works a 4]. seminal of a 27, The established [14, 19]. [1] papers [11, recent Alon matrices more ot adjacency as among of well invertibility refer, as we [21], distribution, McKay of spectral empirical the Regarding frno undirected random of Let Introduction 1 { Keywords: S 2010: MSC n 1 eedn nyon only depending n pa 19)woetbihdteuprbudfor bound upper the established who used (1999) approach the Upfal to and compared differences considerable tensorizat of and bears application boots-trapping non-trivial with a combined is martingales, matric direct proof random of The a p of space sequences. we matrix degree the adjacency tool, the on by technical induced forms is main measure linear the arbitrary As for inequality sequences. degree predefined h rueto ok odti n ono 21)woderiv who (2015) estimated and Johnson forms and linear Goldstein for Cook, inequality of argument the ojcueo u h euti banda osqec of rand consequence of a matrices as adjacency obtained of is value result singular The largest Vu. second of conjecture a λ rbblt tlat1 least at probability aeo prerno rps hscmltl ete h pr the settles completely this graphs, of magnitude random sparse of case isdcouplings. biased , h pcrlgpo es admrglrgraphs regular random dense of gap spectral The eantrlnme n let and number natural a be 2 ( G n , . . . , o any For eoe t eodlretegnau i bouevalue) absolute (in eigenvalue largest second its denotes ) } sagahi hc vr etxhsexactly has vertex every which in graph a is admrglrgah nfr oe,seta a,rno matr random gap, spectral model, uniform graph, regular Random 58,60B20. 
05C80, α λ ( ∈ osatnTkoio n ireYoussef Pierre and Tikhomirov Konstantin G (0 d ,u oamlilctv osat o l ausof values all for constant, multiplicative a to up ), rglrgah aeatatdcnieal teto fresea of attention considerable attracted have graphs -regular , α − )adany and 1) obndwt ale eut nti ieto oeigt covering direction this in results earlier with Combined . n 1 where , G d n ≤ steuiomrandom uniform the is α n ≤ Abstract An . d ≤ 1 λ undirected ( G n/ nterange the in ) ,w hwthat show we 2, d -regular λ d o ruet.Ormethod Our arguments. ion d rglrgahon graph -regular ( h remnieult for inequality Freedman the G egbr.Seta properties Spectral neighbors. s hr h probability the where es, om for ) dgahwt prescribed with graph ed be fetmtn the estimating of oblem d yBoe,Fiz,Suen Frieze, Broder, by = a rvd nparticular, in proved, man and oeaconcentration a rove es oacasclresult classical a to hers, directed λ ln[]o h ii of limit the on [1] Alon f graph netmt o the for estimate an stnst nnt,was infinity, to tends es ( d n O daconcentration a ed G ( and = C dlreteigenvalue largest nd n ) α n imn[]and [2] Milman and 2 o ≤ / sac el with deals esearch ( 3 G saconstant a is d √ sn size- using ) rpswith graphs confirming , C n ihtevertex the with n α ,adto and ), √ vertices, d with ices. he rchers. that λ(G)=2√d 1+o(1) with probability tending to one with n , where λ(G) is the second largest (in− absolute value) eigenvalue of the undirected d-regular→∞ random graph G on n vertices, uniformly distributed on the set of all simple d-regular graphs (see [7] for an alternative proof of Friedman’s theorem; see also [22] for a different approach producing a weaker bound). A natural extension of Alon’s question to the setting when d grows with n to infinity, was considered in [8, 13, 12]. Namely, in [8] the authors showed that for d = o(√n), one has λ(G) C√d with probability tending to one with n for some universal constant ≤ C > 0. This result was extended to the range d = O(n2/3) in [12]. 
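The scale of the second eigenvalue can be illustrated numerically. The sketch below uses the permutation model (a symmetrized sum of independent random permutation matrices) as an easy-to-sample stand-in for a d-regular multigraph; this is not the uniform model studied in this paper, and the helper name `permutation_model_adjacency` is ours.

```python
import numpy as np

# A quick numerical illustration of the O(sqrt(d)) scale of the second
# eigenvalue, using the *permutation model* as a proxy: the sum
# P_1 + P_1^T + ... over d/2 independent permutation matrices is
# 2*(d/2)-regular (possibly with loops/multiple edges).  This is NOT the
# uniform model of the paper; it only illustrates the order of magnitude.
rng = np.random.default_rng(0)

def permutation_model_adjacency(n, half_d):
    A = np.zeros((n, n))
    for _ in range(half_d):
        P = np.eye(n)[rng.permutation(n)]  # random permutation matrix
        A += P + P.T
    return A  # every row and column sums to 2*half_d

n, d = 500, 20  # d = 2*half_d
A = permutation_model_adjacency(n, d // 2)
eigs = np.sort(np.linalg.eigvalsh(A))[::-1]
lam = max(abs(eigs[1]), abs(eigs[-1]))
print(f"d = {d}, second eigenvalue = {lam:.2f}, 2*sqrt(d-1) = {2*np.sqrt(d-1):.2f}")
assert abs(eigs[0] - d) < 1e-8  # top eigenvalue of a d-regular graph is d
assert lam < 5 * np.sqrt(d)     # O(sqrt(d)) scale
```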
In [13], the bound λ(G) ≤ C√d w.h.p. was obtained for G distributed according to the permutation model, which we do not consider here. In [28], Vu conjectured that λ(G) = (2 + o(1))√(d − d²/n) w.h.p. in the uniform model, when d ≤ n/2 and d tends to infinity with n (see also [29, Conjectures 7.3, 7.4]). The "isomorphic" version of this question was one of the motivations for our work.

Apart from the previously mentioned connection with structural properties of random graphs, this line of research seems quite important in another aspect as well. Random d-regular graphs supply a natural model of randomness for square matrices in which the matrix cannot be partitioned into independent disjoint blocks (say, rows or columns), but the correlation between very small disjoint blocks is weak. Techniques developed to deal with the adjacency matrices of these graphs may prove useful in other problems within random matrix theory. In this respect, our intention was to develop, or rely on, arguments which are flexible and admit various generalizations.

Given an n × n symmetric matrix A, we let λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_n(A) be its eigenvalues arranged in non-increasing order (counting multiplicities). For an undirected graph G on n vertices, we define λ_1(G), ..., λ_n(G) as the eigenvalues of its adjacency matrix.

Theorem A. For every α ∈ (0, 1) and m ∈ N there are L = L(α, m) > 0 and n_0 = n_0(α, m) with the following property: Let n ≥ n_0, n^α ≤ d ≤ n/2, and let G be a random graph uniformly distributed on the set G_n(d) of simple undirected d-regular graphs on n vertices. Then

max( |λ_2(G)|, |λ_n(G)| ) ≤ L√d

with probability at least 1 − n^{−m}.

Note that, combined with [8, 12], our theorem gives max( |λ_2(G)|, |λ_n(G)| ) = O(√d) w.h.p. for all d ≤ n/2. Denote by M the adjacency matrix of G. It is easy to see that (deterministically) d is the largest eigenvalue of M, with 1 (the vector of ones) as the corresponding eigenvector.
Hence, from the Courant–Fischer formula, we obtain λ(G) = ‖M − (d/n) 1 1^t‖_{2→2} = ‖M − EM‖_{2→2}. Theorem A thus implies that the spectral measure of (1/√d)(M − EM) is supported on an interval of constant length with probability going to one with n → ∞. We refer to [4, 3] (and references therein) for recent advances concerning the limiting behavior of the spectral measure for random d-regular graphs in the uniform model.

The proof of Theorem A is obtained by a rather general yet simple procedure which reduces the question to the non-symmetric (i.e. directed) setting, which we are about to consider. A directed d-regular graph G on n vertices is a directed (labeled) graph in which every vertex has d in-neighbors and d out-neighbors. We allow directed graphs to have loops, but

do not allow multiple edges (edges connecting the same pair of vertices in opposite directions are distinct). The corresponding set of graphs will be denoted by D_n(d). Note that the set of adjacency matrices for graphs in D_n(d) is the set of all 0/1 matrices with the sum of elements in each row and column equal to d. Note also that there is a natural bijection from D_n(d) onto the set of bipartite d-regular simple undirected graphs on 2n vertices. Given an n × n matrix A, we let s_1(A) ≥ s_2(A) ≥ ... ≥ s_n(A) be its singular values arranged in non-increasing order (counting multiplicities). For a directed graph G on n vertices, we define s_1(G), ..., s_n(G) as the singular values of its adjacency matrix.
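The bijection with bipartite graphs explains why singular values are the right spectral data here: if A is the adjacency matrix of the directed graph, the associated bipartite graph on 2n vertices has adjacency matrix [[0, A], [A^T, 0]], whose eigenvalues are exactly plus/minus the singular values of A. A quick numerical check (a sketch; the 0/1 matrix below is arbitrary, not degree-constrained):

```python
import numpy as np

# Eigenvalues of the bipartite adjacency matrix [[0, A], [A^T, 0]] are
# exactly +/- the singular values of A, so bounds on s_2(A) translate into
# spectral-gap bounds for the bipartite graph.
rng = np.random.default_rng(1)
n = 6
A = (rng.random((n, n)) < 0.5).astype(float)  # an arbitrary 0/1 matrix
B = np.block([[np.zeros((n, n)), A], [A.T, np.zeros((n, n))]])

eigs = np.sort(np.abs(np.linalg.eigvalsh(B)))[::-1]
svals = np.linalg.svd(A, compute_uv=False)
# each singular value of A appears twice among |eigenvalues| of B
assert np.allclose(eigs, np.repeat(svals, 2), atol=1e-8)
```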

Theorem B. For every α ∈ (0, 1) and m ∈ N there are L = L(α, m) > 0 and n_0 = n_0(α, m) ∈ N with the following property: Let n ≥ n_0, and let n^α ≤ d ≤ n/2. Further, let G be a random directed d-regular graph uniformly distributed on D_n(d). Then

s_2(G) ≤ L√d

with probability at least 1 − n^{−m}. Consequently, if G̃ is a random undirected graph uniformly distributed on the set of all bipartite simple d-regular graphs on 2n vertices, then

λ_2(G̃) ≤ L√d

with probability at least 1 − n^{−m}.

Theorem B above is stated for the reader's convenience. In fact, we prove a more general statement which deals with random graphs with predefined degree sequences. With every directed graph G on {1, 2, ..., n}, we associate two degree sequences: the in-degree sequence d^in(G) = (d^in_1, d^in_2, ..., d^in_n), with d^in_i equal to the number of in-neighbors of vertex i, and the out-degree sequence d^out(G) = (d^out_1, d^out_2, ..., d^out_n), where d^out_i is the number of out-neighbors of i (i ≤ n). Conversely, given two integer vectors d^in, d^out ∈ R^n, we will denote by D_n(d^in, d^out) the set of all directed graphs on n vertices with the in- and out-degree sequences d^in and d^out, respectively. Again, we allow the graphs to have loops but do not allow multiple edges. Let us introduce the following two Orlicz norms in R^n:

‖x‖_{ψ,n} := inf{ λ > 0 : (1/(en)) Σ_{i=1}^n e^{|x_i|/λ} ≤ 1 },  x = (x_1, ..., x_n) ∈ R^n;  (1)

‖x‖_{log,n} := inf{ λ > 0 : (1/n) Σ_{i=1}^n (|x_i|/λ) ln_+(|x_i|/λ) ≤ 1 },  x = (x_1, ..., x_n) ∈ R^n.  (2)

Here, ln_+(t) := max(0, ln t) (t ≥ 0). One can verify that the space (R^n, ‖·‖_{log,n}) is isomorphic (with an absolute constant) to the dual space for (R^n, ‖·‖_{ψ,n}). More properties of these norms will be considered later. Now, let us state the spectral gap theorem for directed graphs in full generality:

Theorem C. For every α ∈ (0, 1), m ∈ N and K > 0 there are L = L(α, m, K) > 0 and n_0 = n_0(α, m, K) ∈ N with the following property: Let n ≥ n_0, and let d^in, d^out be two degree sequences such that for some integer n^α ≤ d ≤ 0.501n we have

max( ‖(d^in_i − d)_{i=1}^n‖_{ψ,n}, ‖(d^out_i − d)_{i=1}^n‖_{ψ,n} ) ≤ K√d.

Assume that D_n(d^in, d^out) is non-empty, and let G be a random directed graph uniformly distributed on D_n(d^in, d^out). Then

s_2(G) ≤ L√d

with probability at least 1 − n^{−m}.

The condition on the degree sequences in the theorem can be viewed as a concentration inequality for d^in_i − d and d^out_i − d, with respect to the "uniform" choice of i in [n]. In particular, if max( ‖(d^in_i − d)_{i=1}^n‖_∞, ‖(d^out_i − d)_{i=1}^n‖_∞ ) ≤ K√d, then the degree sequences satisfy the assumptions of the theorem.

Theorem C is the main theorem in this paper, and Theorem A (and, of course, B) is obtained as its consequence. In note [26], we proved a rather general comparison theorem for jointly exchangeable matrices which, in particular, allows us to estimate the spectral gap of random undirected d-regular graphs in terms of the second singular value of directed random graphs with predefined degree sequences. Let us briefly describe the idea of the reduction scheme. Assume that G is uniformly distributed on G_n(d) and let M be its adjacency matrix. Then the results of [26] assert that with high probability s_2(M) = max( |λ_2(G)|, |λ_n(G)| ) can be bounded from above by a multiple of the second largest singular value of the n/2 × n/2 submatrix of M located in its top right corner. In turn, it can be verified that the distribution of this submatrix is directly related to the distribution of the adjacency matrix of a random directed graph on n/2 vertices with in- and out-degree sequences "concentrated" around d/2. We will cover this procedure in more detail in Section 6 and show how Theorem A follows from Theorem C.

In the course of proving Theorem C, we obtain certain relations for random graphs with predefined degree sequences which may be of separate interest. The rest of the introduction is devoted to discussing these developments and, in parallel, provides an outline of the proof of Theorem C. Given an n × n matrix M, we denote the Hilbert–Schmidt norm of M by ‖M‖_HS. Additionally, we will write ‖M‖_∞ for the maximum norm (defined as the absolute value of the largest matrix entry). The set of adjacency matrices of graphs in D_n(d^in, d^out) will be denoted by M_n(d^in, d^out). Obviously, M_n(d^in, d^out) coincides with the set of all 0/1 matrices M with |supp col_i(M)| = d^in_i and |supp row_i(M)| = d^out_i for all i ≤ n.

The proof of Theorem C is composed of two major blocks. In the first block, we derive a concentration inequality for linear functionals of the form Σ_{i,j=1}^n M_ij Q_ij, where M is a random matrix uniformly distributed on M_n(d^in, d^out), and Q is any fixed n × n matrix. In the second block, we use the concentration inequality to establish certain discrepancy properties of the random graph associated with M. Then, we apply a well known argument of Kahn and Szemerédi [18] in which the discrepancy property, together with certain covering arguments, yields a bound on the matrix norm.
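The linear forms Σ_{i,j} M_ij Q_ij from the first block can be explored empirically. The sketch below samples directed d-regular 0/1 matrices by rejection over sums of d random permutation matrices (our own ad hoc sampler, not the uniform measure on M_n(d^in, d^out)) and compares the form with its natural centering (d/n) Σ_{i,j} Q_ij:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_directed_regular(n, d):
    # rejection sampling: keep drawing d random permutations until their
    # supports are pairwise disjoint, so M is a 0/1 matrix with all row
    # and column sums equal to d (NOT exactly the uniform measure)
    while True:
        M = np.zeros((n, n))
        for _ in range(d):
            M[np.arange(n), rng.permutation(n)] += 1
        if M.max() == 1:
            return M

n, d = 30, 3
Q = rng.standard_normal((n, n))
samples = np.array([np.sum(random_directed_regular(n, d) * Q)
                    for _ in range(200)])
center = d / n * Q.sum()  # the centering appearing in the first block
dev = np.max(np.abs(samples - center))
print(f"center = {center:.2f}, max deviation over 200 samples = {dev:.2f}, "
      f"sqrt(d)*||Q||_HS = {np.sqrt(d) * np.linalg.norm(Q):.2f}")
assert dev < 10 * np.sqrt(d) * np.linalg.norm(Q)
```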

The first block. Our concentration inequality for linear forms involves conditioning on a special event, having probability close to one, on the space of matrices M_n(d^in, d^out). Let us momentarily postpone the definition of the event (which is rather technical) and state the inequality first. Define a function H(t) on the positive semi-axis as

H(t) := (1 + t) ln(1 + t) − t.  (3)

Theorem D. For every α ∈ (0, 1), m ∈ N and K > 0 there are γ = γ(α, m, K) > 0, L = L(α, m, K) > 0 and n_0 = n_0(α, m, K) ∈ N with the following property: Let n ≥ n_0, and let d^in, d^out be two degree sequences such that for some integer n^α ≤ d ≤ 0.501n we have

max( ‖(d^in_i − d)_{i=1}^n‖_{ψ,n}, ‖(d^out_i − d)_{i=1}^n‖_{ψ,n} ) ≤ K√d.

Assume further that M_n(d^in, d^out) is non-empty, and let M be uniformly distributed on M_n(d^in, d^out). Then for any fixed n × n matrix Q and any t ≥ CL√d ‖Q‖_HS we have

P( | Σ_{i,j=1}^n M_ij Q_ij − (d/n) Σ_{i,j=1}^n Q_ij | > t  |  E_P(L) ) ≤ 2 exp( − (d ‖Q‖²_HS)/(n ‖Q‖²_∞) H( (γ t n ‖Q‖_∞)/(d ‖Q‖²_HS) ) ).

Here, C > 0 is a universal constant and E_P(L) is a subset of M_n(d^in, d^out) which is determined by the value of L and satisfies P(E_P(L)) ≥ 1 − n^{−m}.

The function H in the above deviation bound is quite natural in this context. It implicitly appears in the classical inequality of Bennett for sums of independent variables (see [5, Formula 8b]), and later in the well known paper of Freedman [15], where he extends Bennett's inequality to martingales. In fact, our proof of Theorem D uses the Freedman inequality (more precisely, Freedman's bound for the moment generating function) as a fundamental element. Note that we require t to be greater (by the order of magnitude) than √d ‖Q‖_HS, which makes the above statement a large deviation inequality. The restriction on t has its roots in the way we obtain Theorem D from concentration inequalities for individual matrix rows. The tensorization procedure involves estimating the differences between conditional and unconditional expectations of rows, and we apply a rather crude bound by summing up absolute values of the "errors" for individual rows. In fact, the lower bound CL√d ‖Q‖_HS for t can be replaced with a smaller quantity C′L√d Σ_{i=1}^n ‖row_i(Q)‖_{log,n}, provided that we choose a different "point of concentration" than (d/n) Σ_{i,j=1}^n Q_ij; we prefer to avoid discussing these purely technical aspects in the introduction.

A concentration inequality very similar to the one from Theorem D was established in a recent paper of Cook, Goldstein and Johnson [12] which strongly influenced our work. The Bennett-type inequality from [12], formulated for adjacency matrices of undirected d-regular graphs, also involves a restriction on the parameter t which, however, exhibits a completely different behavior compared to the lower bound √d ‖Q‖_HS in our work. In particular, the concentration inequality in [12] is not strong enough in the range d ≫ n^{2/3} to yield the correct order of the second largest eigenvalue. For the permutation model, a Bernstein-type concentration inequality was obtained in [13] by constructing a single martingale sequence for

the whole matrix and applying Freedman's inequality. We will discuss in detail in Section 4 why a direct use of the same approach is problematic in our setting.

Theorem D, the way it is stated, is already sufficient to complete the proof of Theorem C, without any knowledge of the structure of the event E_P(L). However, defining this event explicitly should give more insight and enable us to draw a comprehensive picture. Let G be a digraph on n vertices with degree sequences d^in, d^out, and let M = (M_ij) be the adjacency matrix of G. Further, let I be a subset of {1, 2, ..., n} (possibly empty). We define quantities p^col_j(I, M) and p^row_j(I, M) (j ≤ n) as

p^col_j(I, M) := d^in_j − |{ q ∈ I : M_qj = 1 }| = |{ q ∈ I^c : M_qj = 1 }|;
p^row_j(I, M) := d^out_j − |{ q ∈ I : M_jq = 1 }| = |{ q ∈ I^c : M_jq = 1 }|.

Further, let us define n-dimensional vectors P^col(I, M) = (P^col_1(I, M), ..., P^col_n(I, M)) and P^row(I, M) = (P^row_1(I, M), ..., P^row_n(I, M)) as

P^col_j(I, M) := Σ_{ℓ=1}^n | p^col_j(I, M) − p^col_ℓ(I, M) |,  j ≤ n;
P^row_j(I, M) := Σ_{ℓ=1}^n | p^row_j(I, M) − p^row_ℓ(I, M) |,  j ≤ n.

Conceptually, the vectors P^col(I, M), P^row(I, M) can be thought of as a measure of "disproportion" in the locations of 1's across the matrix M. Given any non-empty subset I ⊂ {1, 2, ..., n}, let M^I be the I × n submatrix of M. Then for every j ≤ n, P^col_j(I, M) is just the sum of differences of ℓ_1-norms of the j-th column and every other column of M^I:

P^col_j(I, M) = Σ_{ℓ=1}^n | ‖col_j(M^I)‖_1 − ‖col_ℓ(M^I)‖_1 |.

The event E_P(L) employed in Theorem D controls the magnitude of those vectors: for every L > 0 we define the event as

E_P(L) := { M ∈ M_n(d^in, d^out) : ‖P^row(I, M)‖_{ψ,n}, ‖P^col(I, M)‖_{ψ,n} ≤ Ln√d for any interval subset I ⊂ {1, 2, ..., n} of cardinality at most 0.001n },  (4)

where ‖·‖_{ψ,n} is given by (1). Note that the subsets I in the definition are assumed to be interval subsets, which gives importance to the way we enumerate the vertices. It is not difficult to see that if the definition involved every subset I with |I| ≤ 0.001n, then the probability of the event would be just zero, as one can always find two vertices with largely non-overlapping sets of in-neighbors. Loosely speaking, the condition secured by the event E_P(L) is a skeleton for our matrix: it indicates that 1's are spread throughout the matrix more or less evenly. Assuming this property (i.e. conditioning on the event), we can establish stronger "rules" for the distribution

of the non-zero elements and, in particular, obtain Theorem D. This can be viewed as a realization of the bootstrapping strategy.

From the technical perspective, the proof of Theorem D requires many preparatory statements and is quite long. Our exposition is largely self-contained; probably the only essential "exterior" statement which we employ in the first part of the paper is Freedman's inequality for martingales, which is given (together with some corollaries) in Sub-section 2.2. It is followed by the "graph" Sub-section 2.3, where we state and prove a rough bound on the number of common in-neighbors of two vertices of a random graph using a standard argument involving simple switchings and multimaps (relations). Section 3 is the core of the paper. There, we apply the Freedman inequality and derive deviation bounds for individual rows of our random adjacency matrix. The first sub-section contains a series of lemmas dealing with a fixed row coordinate (and conditioned on the upper rows and all previous coordinates within this fixed row) and provides a foundation for our analysis. Sub-section 3.2 integrates the information for the individual matrix entries and, after resolving some technical issues, culminates in Theorem 3.12, which is the main statement of Section 3. Finally, we apply a tensorization procedure in Section 4 and prove (a somewhat technical version of) Theorem D.

The second block. Equipped with the concentration inequality given by Theorem D, we follow the Kahn–Szemerédi argument [18] to prove Theorem C. For simplicity, let us describe the procedure for the uniform model on D_n(d) and disregard conditioning on the event E_P(L) in Theorem D. Denoting by M the adjacency matrix of a random d-regular graph uniformly distributed on D_n(d), it is easy to see that its largest singular value is equal to d (deterministically), and the corresponding normalized singular vector is (1/√n, 1/√n, ..., 1/√n) = (1/√n) 1.
By the Courant–Fischer formula and the singular value decomposition, we have

s_2(M) = ‖M − (d/n) 1 1^t‖_{2→2} = sup_{x ∈ 1^⊥ ∩ S^{n−1}, y ∈ S^{n−1}} ⟨Mx, y⟩.

A natural approach to bounding the supremum on the right hand side would be to apply the standard covering argument, which plays a key role in Asymptotic Geometric Analysis. The argument consists in showing first that ⟨Mx, y⟩ is bounded by a certain threshold value (in this case, O(√d)) with high probability for any pair of admissible x, y. Once this is done, a quite general approximation scheme allows to replace the supremum over (1^⊥ ∩ S^{n−1}) × S^{n−1} by the supremum over a finite discrete subset (a net). From the probabilistic viewpoint, we pay the price by taking the union bound over the net (which can be chosen to have cardinality exponential in dimension) to obtain an estimate for the entire set. In order for such a procedure to work, we need a concentration inequality for ⟨Mx, y⟩ (for fixed x, y) which would "survive" multiplication by the cardinality of the net. By Theorem D (applied to the matrix Q = yx^t for any fixed (x, y) ∈ (1^⊥ ∩ S^{n−1}) × S^{n−1}), we have

P( |⟨Mx, y⟩| ≫ √d ) ≪ exp( − d/(n ‖x‖²_∞ ‖y‖²_∞) H( n ‖x‖_∞ ‖y‖_∞ / √d ) ).

However, the expression on the right hand side is an increasing function of ‖x‖_∞ ‖y‖_∞, and becomes larger than C^{−n} when ‖x‖_∞ ‖y‖_∞ ≫ √d/n. Hence, the union bound in the above description can work only for x, y having small ‖·‖_∞-norms. A key idea in the argument by Kahn and Szemerédi, which distinguishes it from the standard covering procedure, is to split the quadratic form associated with ⟨Mx, y⟩ into "flat" and "spiky" parts:

⟨Mx, y⟩ = Σ_{(i,j) ∈ [n]×[n] : |x_j y_i| ≤ √d/n} y_i M_ij x_j + Σ_{(i,j) ∈ [n]×[n] : |x_j y_i| > √d/n} y_i M_ij x_j.  (5)

Let us note that a somewhat similar decomposition of the sphere into "flat" and "spiky" vectors was used in [20] and [24] to bound the smallest singular value of certain random matrices.
The first term in (5) can be dealt with by directly using the concentration inequality from Theorem D (plus standard covering). On the other hand, the second summand needs more delicate handling. Kahn and Szemerédi proposed a way to relate the quantity to discrepancy properties of the underlying graph, more precisely, to deviations of the edge count between subsets of the vertices from its mean value. To illustrate the connection, let a, b be any positive numbers with ab ≫ √d/n, and let S := { i ≤ n : |y_i| ≈ b } and T := { j ≤ n : |x_j| ≈ a }. Then

Σ_{(i,j) ∈ [n]×[n] : |x_j| ≈ a, |y_i| ≈ b} y_i M_ij x_j = O( ab |E_G(S, T)| ),

where |E_G(S, T)| is the number of edges of the graph G corresponding to M, starting in S and ending in T. In the actual proof, this simplified illustration should be replaced by a careful partitioning of the vectors x and y into "almost constant" blocks. We refer to Section 5 for a rigorous exposition of the argument allowing to complete the proof of Theorem C. Once Theorem C is proved, we apply it, together with the "de-symmetrization" result of [26], to prove Theorem A. This is accomplished in Section 6.
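The decomposition (5) is a pointwise identity and can be checked directly; in the sketch below M is an arbitrary 0/1 matrix and the threshold √d/n splits the rank-one weights y_i x_j:

```python
import numpy as np

# Numerical check of the "flat"/"spiky" split (5): the two partial sums
# of y_i * M_ij * x_j, thresholded at sqrt(d)/n, recombine to <Mx, y>.
rng = np.random.default_rng(3)
n, d = 40, 9
M = (rng.random((n, n)) < d / n).astype(float)  # arbitrary 0/1 matrix
x = rng.standard_normal(n); x -= x.mean(); x /= np.linalg.norm(x)  # x in 1-perp
y = rng.standard_normal(n); y /= np.linalg.norm(y)

thr = np.sqrt(d) / n
prod = np.outer(y, x)                       # prod[i, j] = y_i * x_j
flat = np.sum(M * prod * (np.abs(prod) <= thr))
spiky = np.sum(M * prod * (np.abs(prod) > thr))
assert np.isclose(flat + spiky, y @ M @ x)  # identity (5)
```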

2 Notation and Preliminaries

Everywhere in the text, we assume that n is a large enough natural number. For a finite set I, by |I| we denote its cardinality. For any positive integer m, the set {1, 2, ..., m} will be denoted by [m]. If I ⊂ [n] then, unless explicitly specified otherwise, the set I^c is the complement of I in [n]. For a real number a, ⌈a⌉ is the smallest integer greater than or equal to a, and ⌊a⌋ is the largest integer not exceeding a. A vector y ∈ R^n is called r-sparse for some r ≥ 0 if the support supp y has cardinality at most r. By ⟨·, ·⟩ we denote the standard inner product in R^n, by ‖·‖ the standard Euclidean norm in R^n, and by e_1, e_2, ..., e_n the canonical basis vectors. For every 1 ≤ p < ∞, the ‖·‖_p-norm in R^n is defined by

‖(x_1, x_2, ..., x_n)‖_p := ( Σ_{i=1}^n |x_i|^p )^{1/p},

and the canonical maximal norm is

‖(x_1, x_2, ..., x_n)‖_∞ := max_{i≤n} |x_i|.

Universal constants are denoted by C, c, c′, etc. In some situations we will add a numerical subscript to the name of a constant to relate it to a particular numbered statement. For example, C_{2.2} is a constant from Lemma 2.2. Let M be a fixed n × n matrix. The (i, j)-th entry of M is denoted by M_ij. Further, we will denote the rows and columns of M by row_1(M), ..., row_n(M) and col_1(M), ..., col_n(M). We denote the Hilbert–Schmidt norm of M by ‖M‖_HS. Additionally, we write ‖M‖_∞ for the maximum norm (defined as the absolute value of the largest matrix entry) and ‖M‖_{2→2} for its spectral norm. Let d^in, d^out be two degree sequences. Everywhere in this paper, we assume that for an integer d we have

(1 − c_0)d ≤ d^in_i, d^out_i ≤ d for all i ≤ n, where c_0 := 0.001 and d ≤ (1/2 + c_0)n.  (6)

Recall that, given two degree sequences d^in, d^out, the set of adjacency matrices of graphs in D_n(d^in, d^out) is denoted by M_n(d^in, d^out). We will write S_n(d) for the set of adjacency matrices of undirected simple d-regular graphs on [n]. Each of the sets D_n(d^in, d^out), M_n(d^in, d^out), G_n(d), S_n(d) can be turned into a probability space by defining the normalized counting measure. We will use the same notation P for the measure in each of the four cases. The actual probability space will always be clear from the context. The expectation of a random variable ξ is denoted by Eξ. We will use vertical bar notation for conditional expectation and conditional probability. For example, the expectation of ξ conditioned on an event E will be written as E[ξ | E], and the conditional expectation given a σ-sub-algebra F as E[ξ | F].

Let A, B be sets, and let R ⊂ A × B be a relation. Given a ∈ A and b ∈ B, the image of a and the preimage of b are defined by

R(a) := { y ∈ B : (a, y) ∈ R } and R^{−1}(b) := { x ∈ A : (x, b) ∈ R }.

We also set R(A) := ∪_{a∈A} R(a). Further in the text, we will define relations between sets in order to estimate their cardinality, using the following elementary claim (see [19] for a proof):

Claim 2.1. Let s, t > 0. Let R be a relation between two finite sets A and B such that for every a ∈ A and every b ∈ B one has |R(a)| ≥ s and |R^{−1}(b)| ≤ t. Then s|A| ≤ t|B|.

2.1 Orlicz norms

In the Introduction, we defined two Orlicz norms ‖·‖_{ψ,n} and ‖·‖_{log,n} in R^n. Let us state some of their elementary properties (see [23] for extensive information on Orlicz functions and Orlicz spaces). First, it can be easily checked that

‖x‖_{ψ,n} ≤ ‖x‖_∞ ≤ ln(en) ‖x‖_{ψ,n} for all x ∈ R^n.  (7)

Similarly, we have

(n/ln n) ‖x‖_{log,n} ≤ ‖x‖_1 ≤ en ‖x‖_{log,n} for all x ∈ R^n.  (8)
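The norms (1) and (2) have no closed form, but both can be evaluated by bisection on λ; the sketch below does this (the helper names `psi_n` and `log_n` are ours) and checks the ℓ∞ and ℓ1 comparisons stated above on a sample vector:

```python
import math

# Evaluate the Orlicz norms (1) and (2) numerically by bisection on lambda.
def psi_n(x):
    n = len(x)
    def ok(lam):  # the defining condition of (1) at this lambda
        return sum(math.exp(abs(xi) / lam) for xi in x) / (math.e * n) <= 1
    lo, hi = 1e-9, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

def log_n(x):
    n = len(x)
    def ok(lam):  # the defining condition of (2) at this lambda
        s = sum((abs(xi) / lam) * max(0.0, math.log(abs(xi) / lam))
                for xi in x if xi != 0)
        return s / n <= 1
    lo, hi = 1e-9, 1e6
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (lo, mid) if ok(mid) else (mid, hi)
    return hi

x = [5.0] + [1.0] * 99  # n = 100
n = len(x)
v, w = psi_n(x), log_n(x)
# comparison with the l_inf norm, as in (7)
assert v <= 5.0 <= math.log(math.e * n) * v
# comparison with the l_1 norm, as in (8)
assert n / math.log(n) * w <= sum(abs(xi) for xi in x) <= math.e * n * w
```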

Lemma 2.2. For any vector y ∈ R^n with m := |supp y| ≤ n we have

‖y‖ ≤ C_{2.2} √m ‖y‖_{ψ,n} ln(2n/m),

where C_{2.2} > 0 is a universal constant.

Proof. Without loss of generality, we can assume that z := ‖y‖_{ψ,n} = √(n/m). The convex conjugate of the exponential function is t ln(t) − t (t > 0). Hence, by Fenchel's inequality, for any i ∈ supp y we have

y_i² ≤ e^{|y_i|/z} + z|y_i| ln(z|y_i|) − z|y_i|
≤ e^{|y_i|/z} + z|y_i| ln(2z²) + z|y_i| ln( |y_i|/(2z) )
≤ e^{|y_i|/z} + z|y_i| ln(2z²) + (1/2) y_i².

Summing over all i ∈ supp y, we get

‖y‖² ≤ 2en + 2z ln(2z²) Σ_{i ∈ supp y} |y_i| ≤ 2en + 2z ln(2z²) √m ‖y‖.

Plugging in the definition of z and solving the above inequality, we get

‖y‖ ≤ C√n ln(2n/m)

for some universal constant C > 0. The result follows.

By a duality argument, we also have the following.

Lemma 2.3. For any vector y ∈ R^n with m := |supp y| ≤ n we have

‖y‖_{log,n} ≤ (C_{2.3}/n) ‖y‖ √m ln(2n/m) ≤ (C_{2.3}/n) ‖y‖_1 √m ln(2n/m),

where C_{2.3} > 0 is a universal constant.

Finally, given a vector x with ‖x‖_{ψ,n} = 1, we can bound the number of coordinates of x of any given magnitude:

Lemma 2.4. Let x ∈ R^n with ‖x‖_{ψ,n} = 1. Then there is a natural number k ≤ 2 ln(en) such that

|{ i ≤ n : |x_i| ≥ k/2 }| ≥ n (2e)^{−k}.

Proof. By the definition of the norm ‖·‖_{ψ,n}, we have

Σ_{i=1}^n e^{|x_i|} = en,

whence

Σ_{i ≤ n : |x_i| ≥ 1/2} e^{|x_i|} ≥ n.

Thus,

Σ_{k=1}^∞ e^{(k+1)/2} |{ i ≤ n : |x_i| ≥ k/2 }| ≥ Σ_{k=1}^∞ 2^{−k} n = n.

It remains to note that, in view of (7), we have { i ≤ n : |x_i| ≥ k/2 } = ∅ for all k > 2 ln(en), and that (2e)^{−k} ≤ 2^{−k} e^{−(k+1)/2} for all k ≥ 1.

2.2 Freedman's inequality

In this sub-section, we recall the classical concentration inequality for martingales due to Freedman, and provide several auxiliary statements which we will apply later in Section 4. Define

g(t) := e^t − t − 1, t > 0.  (9)

In [15], Freedman proved the following bound for the moment-generating function, which will serve as a fundamental block of this paper:

Theorem 2.5 (Freedman's inequality). Let m ∈ N, let (X_i)_{i≤m} be a martingale with respect to a filtration (F_i)_{i≤m}, and let

d_i := X_i − X_{i−1}, i ≤ m,

be the corresponding difference sequence. Assume that |d_i| ≤ M a.s. for some M > 0 and that Σ_{i=1}^m E(d_i² | F_{i−1}) ≤ σ² a.s. for some σ > 0. Then for any λ > 0, we have

E e^{λ(X_m − X_0)} ≤ exp( (σ²/M²) g(λM) ).

P( X_m − X_0 ≥ t ) ≤ exp( − (σ²/M²) H(Mt/σ²) ), t > 0,  (10)

where H is defined by (3). It is easy to check that

H(t) ≥ t²/(2(1 + t/3)) for any t ≥ 0,  (11)

whence, with the above notation,

P( X_m − X_0 ≥ t ) ≤ exp( − t²/(2σ² + 2Mt/3) ), t > 0.  (12)
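The comparison between the Bennett-type exponent H and its Bernstein-type lower bound (11) is elementary to verify numerically:

```python
import numpy as np

# Check (11): H(t) = (1+t)ln(1+t) - t dominates t^2/(2(1+t/3)), so the
# Bennett-type bound (10) is always at least as strong as the
# Bernstein-type bound (12); for large t the gap widens (H(t) ~ t ln t).
t = np.linspace(0.01, 100, 10000)
H = (1 + t) * np.log1p(t) - t
bern = t**2 / (2 * (1 + t / 3))
assert np.all(H >= bern)
assert H[-1] / bern[-1] > 1.2  # strict improvement in the large-t regime
```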

In the special case when the martingale consists of partial sums of a series of i.i.d. centered random variables, i.e. when the d_i (i ≤ m) are i.i.d., (10) was obtained by Bennett [5] and (12) was derived by Bernstein [6]. Returning to arbitrary martingale sequences, the estimate (10) is often referred to as the Freedman inequality. However, in our setting it is crucial to have the stronger relation provided by Theorem 2.5, as it will allow us to tensorize concentration inequalities obtained for individual rows of the matrix.

Lemma 2.6. Let m ∈ N and let ξ_1, ξ_2, ..., ξ_m be random variables. Further, assume that f_i(λ) : R_+ → R_+ are functions such that

E[ e^{λ ξ_i} | ξ_1, ..., ξ_{i−1} ] ≤ f_i(λ)

for any λ > 0 and i ≤ m. Then for any subset T ⊂ [m] we have

E e^{λ Σ_{i∈T} ξ_i} ≤ Π_{i∈T} f_i(λ).

Proof. Without loss of generality, take T = [m]. Note that

E e^{λ Σ_{i=1}^m ξ_i} = E[ E[ e^{λ Σ_{i=1}^m ξ_i} | ξ_1, ..., ξ_{m−1} ] ] = E[ e^{λ Σ_{i=1}^{m−1} ξ_i} E[ e^{λ ξ_m} | ξ_1, ..., ξ_{m−1} ] ].

Hence, by the assumption on f_m, we get

E e^{λ Σ_{i=1}^m ξ_i} ≤ f_m(λ) E e^{λ Σ_{i=1}^{m−1} ξ_i}.

Iterating this procedure, we obtain

E e^{λ Σ_{i=1}^m ξ_i} ≤ Π_{i=1}^m f_i(λ).

As a corollary, we obtain a tail estimate for the sum of random variables satisfying a “Freedman type” bound for their moment generating functions.

Corollary 2.7. Let m ∈ N; let (M_i)_{i≤m} and (σ_i)_{i≤m} be two sequences of positive numbers, and let random variables ξ_1, ..., ξ_m satisfy

E[ e^{λ ξ_i} | ξ_1, ..., ξ_{i−1} ] ≤ exp( (σ_i²/M_i²) g(λ M_i) )

for any i ≤ m and λ ≥ 0. Then for any t ≥ 0, we have

P( Σ_{i≤m} ξ_i ≥ t ) ≤ exp( − (σ²/M²) H(tM/σ²) ),

where M := max_{i≤m} M_i and σ² := Σ_{i=1}^m σ_i².

Proof. Fix any t > 0 and set λ := ln(1 + tM/σ²)/M. In view of the assumptions on the ξ_i and Lemma 2.6, we have

E e^{λ Σ_{i=1}^m ξ_i} ≤ Π_{i=1}^m exp( (σ_i²/M_i²) g(λ M_i) ).

Since the function g(λt)/t² is increasing on (0, ∞), the last relation implies

E e^{λ Σ_{i=1}^m ξ_i} ≤ Π_{i=1}^m exp( (σ_i²/M²) g(λM) ) = exp( (σ²/M²) g(λM) ).

Hence, by Markov's inequality,

P( Σ_{i≤m} ξ_i ≥ t ) ≤ e^{−λt} E e^{λ Σ_{i=1}^m ξ_i} ≤ e^{−λt} exp( (σ²/M²) g(λM) ).

The result follows after plugging in the expression for λ.
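In the simplest case where the martingale is a sum of i.i.d. centered bounded variables, the tail bound (10) can be compared with simulation (a sketch; the uniform distribution below is an arbitrary choice):

```python
import numpy as np

# Empirical check of the Bennett/Freedman tail (10) when X_m - X_0 is a sum
# of i.i.d. centered variables bounded by M_b, with total variance sigma^2.
rng = np.random.default_rng(4)

def H(t):
    return (1 + t) * np.log1p(t) - t

m, M_b = 100, 1.0
xs = rng.uniform(-M_b, M_b, size=(50000, m)).sum(axis=1)  # centered sums
sigma2 = m * M_b**2 / 3                                   # variance of the sum
for t in [10.0, 20.0, 30.0]:
    emp = np.mean(xs >= t)
    bound = np.exp(-sigma2 / M_b**2 * H(M_b * t / sigma2))
    assert emp <= bound  # the proven bound dominates the empirical tail
    print(f"t={t:5.1f}  empirical={emp:.5f}  Freedman bound={bound:.5f}")
```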

2.3 A crude bound on the number of common in-neighbors

We start this sub-section with some graph notation. Let G = ([n], E) be a directed graph on [n] with the edge set E and adjacency matrix M. For any vertex i ∈ [n], we define the set of its in-neighbors

N^in_G(i) := { v ≤ n : (v, i) ∈ E } = supp col_i(M).

Similarly, the set of out-neighbors is

N^out_G(i) := { v ≤ n : (i, v) ∈ E } = supp row_i(M).

Further, for every I, J ⊂ [n] the set of all edges departing from I and landing in J is denoted by

E_G(I, J) := { e ∈ E : e = (i, j) for some i ∈ I and j ∈ J }.

The set of common in-neighbors of two vertices u, v is

C^in_G(u, v) := { i ≤ n : (i, u), (i, v) ∈ E } = supp col_u(M) ∩ supp col_v(M).

In this sub-section, we estimate the probability that a pair of distinct vertices of a random graph uniformly distributed on D_n(d^in, d^out) has many common in-neighbors, conditioned on a special σ-algebra. Let us note that (much stronger) results of this type for d-regular directed graphs, as well as bipartite regular undirected graphs, were obtained in [10]. Unlike in [10], we are only interested in large deviations for C^in_G(i, j). On the other hand, the specifics of our setting is that our graphs are not regular (instead, they have predefined in- and out-degree sequences) and that the probability is conditional. More precisely, given a subset S ⊂ [n], let F be the σ-algebra on D_n(d^in, d^out) with atoms of the form { G ∈ D_n(d^in, d^out) : E_G(S, [n]) = F } for all subsets F ⊂ [n] × [n]. In other words, each atom of F is a set of graphs sharing the same collection of out-edges for vertices in S. Then for any event E ⊂ D_n(d^in, d^out), we let P( G ∈ E | E_G(S, [n]) ) be the conditional probability of E given F.

Let us remark that the proof of the main statement of this sub-section is a rather standard application of the method of simple switchings introduced by Senior [25] and developed by McKay and Wormald (see [21] as well as the survey [30]). We provide the proof for the reader's convenience.

Proposition 2.8. There exist universal constants $c_{2.8}, C_{2.8} > 0$ with the following property. Assume that $C_{2.8} \ln n \le d \le (1/2 + c_0) n$, and let two degree sequences $d^{in}, d^{out} \in \mathbb{R}^n$ satisfy (6). Let $I \subset [n]$ be such that $|I| \le c_0 n$. Then, denoting by $\mathcal{E}_{2.8}$ the event
\[
\mathcal{E}_{2.8} := \big\{ G \in \mathcal{D}_n(d^{in}, d^{out}) : \exists\, i \ne j,\ |\mathcal{C}_G^{in}(i, j) \cap I^c| \ge 0.9 d \big\},
\]
we have
\[
\mathbb{P}\big\{ G \in \mathcal{E}_{2.8} \mid E_G(I, [n]) \big\} \le \exp(-c_{2.8} d).
\]

For the rest of the sub-section, we will assume that $d$ and $I$ satisfy the assumptions of Proposition 2.8, and we restrict ourselves to an atom of the $\sigma$-algebra generated by $E_G(I, [n])$. Namely, let $F \subset [n] \times [n]$ be such that the set of graphs from $\mathcal{D}_n(d^{in}, d^{out})$ satisfying $E_G(I, [n]) = F$ is non-empty. Given $1 \le i \ne j \le n$, we let
\[
\mathcal{E}_{i,j} := \big\{ G \in \mathcal{D}_n(d^{in}, d^{out}) : E_G(I, [n]) = F,\ |\mathcal{C}_G^{in}(i, j) \cap I^c| \ge 0.9 d \big\}
\]
and, for any natural $q \ge 0.8 d$, let
\[
\mathcal{E}_{i,j}^q := \big\{ G \in \mathcal{D}_n(d^{in}, d^{out}) : E_G(I, [n]) = F,\ |\mathcal{C}_G^{in}(i, j) \cap I^c| = q \big\}.
\]

Lemma 2.9. Let $G \in \mathcal{E}_{1,2}^q$ (for some $q \ge 0.8 d$), let $q' < q$, and denote
\[
J := \big\{ j \ge 3 : |\mathcal{C}_G^{in}(1, 2) \cap I^c \setminus \mathcal{N}_G^{in}(j)| \ge q' \big\}.
\]
Let $\Phi_{1,2} := I^c \setminus \big( \mathcal{N}_G^{in}(1) \cup \mathcal{N}_G^{in}(2) \big)$. Then
\[
|E_G(\Phi_{1,2}, J)| \ge dq \Big( 2 - c_0 - \frac{6 c_0 d}{q} - \frac{d}{q - q'} \Big).
\]

Proof. First note that since $d \le (1/2 + c_0) n$ and the degree sequences satisfy (6), we have
\[
|\Phi_{1,2}| \ge |I^c| - |\mathcal{N}_G^{in}(1) \cup \mathcal{N}_G^{in}(2)| \ge (1 - c_0) n - 2d + q \ge q - 6 c_0 d.
\]
Therefore
\[
|E_G(\Phi_{1,2}, [n])| \ge |\Phi_{1,2}| \min_{i \in \Phi_{1,2}} d_i^{out} \ge (1 - c_0)\, d\, (q - 6 c_0 d). \tag{13}
\]
In view of the definition of $J$, for any $j \in J^c$ we have
\[
|\mathcal{C}_G^{in}(1, 2) \cap I^c \cap \mathcal{N}_G^{in}(j)| \ge q - q'.
\]
Hence,
\[
(q - q')\, |J^c| \le |E_G(\mathcal{C}_G^{in}(1, 2) \cap I^c, J^c)| \le |E_G(\mathcal{C}_G^{in}(1, 2) \cap I^c, [n])| \le qd,
\]
which implies that $|J^c| \le qd / (q - q')$. On the other hand, for every $j \in J^c$ we have
\[
|\Phi_{1,2} \cap \mathcal{N}_G^{in}(j)| \le |\mathcal{N}_G^{in}(j)| - |\mathcal{C}_G^{in}(1, 2) \cap I^c \cap \mathcal{N}_G^{in}(j)| \le d - q + q',
\]
whence
\[
|E_G(\Phi_{1,2}, J^c)| \le |J^c| (d - q + q') \le qd \Big( \frac{d}{q - q'} - 1 \Big).
\]
Together with (13), this gives the result.

Lemma 2.10. For any integer $q \ge 0.8 d + 1$, we have $|\mathcal{E}_{1,2}^q| \le 0.9\, |\mathcal{E}_{1,2}^{q-1}|$.

Proof. Let us define a relation $R \subset \mathcal{E}_{1,2}^q \times \mathcal{E}_{1,2}^{q-1}$ as follows:

Pick any $G \in \mathcal{E}_{1,2}^q$, and choose an edge $(i, j) \in E_G(\Phi_{1,2}, J)$ and $k \in \mathcal{C}_G^{in}(1, 2) \cap I^c \setminus \mathcal{N}_G^{in}(j)$, where $J$ and $\Phi_{1,2}$ are defined in Lemma 2.9 with $q' := \lceil q/7 \rceil$. Perform the simple switching on the graph $G$, replacing the edges $(i, j)$ and $(k, 1)$ with $(i, 1)$ and $(k, j)$, respectively. Note that the conditions $i \notin \mathcal{N}_G^{in}(1)$ and $k \notin \mathcal{N}_G^{in}(j)$ guarantee that the simple switching does not create multiple edges. Moreover, since $i \in \Phi_{1,2}$, we obtain a valid graph $G' \in \mathcal{E}_{1,2}^{q-1}$. We define $R(G)$ as the set of all graphs $G'$ which can be obtained from $G$ via the above procedure. Using Lemma 2.9 and the definition of $J$, we get
\[
|R(G)| \ge |E_G(\Phi_{1,2}, J)| \cdot \min_{j \in J} |\mathcal{C}_G^{in}(1, 2) \cap I^c \setminus \mathcal{N}_G^{in}(j)| \ge \frac{1}{7}\, d q^2 \Big( 2 - c_0 - \frac{6 c_0 d}{q} - \frac{d}{q - \lceil q/7 \rceil} \Big). \tag{14}
\]
Now we estimate the cardinalities of preimages. Let $G' \in R(\mathcal{E}_{1,2}^q)$. In order to reconstruct a graph $G$ for which $(G, G') \in R$, we need to perform a simple switching which destroys an edge in $E_{G'}\big( \mathcal{N}_{G'}^{in}(1) \cap I^c \setminus \mathcal{C}_{G'}^{in}(1, 2), \{1\} \big)$ and adds an edge connecting a vertex in $\mathcal{N}_{G'}^{in}(2) \cap I^c \setminus \mathcal{C}_{G'}^{in}(1, 2)$ to vertex 1. There are at most $(d_1^{in} - q + 1)$ choices to destroy an edge in $E_{G'}\big( \mathcal{N}_{G'}^{in}(1) \cap I^c \setminus \mathcal{C}_{G'}^{in}(1, 2), \{1\} \big)$, and at most $(d_2^{in} - q + 1)$ possibilities to add an edge connecting $\mathcal{N}_{G'}^{in}(2) \cap I^c \setminus \mathcal{C}_{G'}^{in}(1, 2)$ to 1. Finally, there are at most $d$ possibilities to complete the switching. Thus, by the assumptions on the degree sequences,
\[
|R^{-1}(G')| \le d\, (d - q + 1)^2.
\]

This, together with (14), the choice of $q$ and of the constant $c_0$, finishes the proof via an application of Claim 2.1.
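To illustrate the switching operation itself, here is a short self-contained sketch (ours, with hypothetical names): it performs one simple switching on a $0/1$ adjacency matrix and, as the proofs above rely on, leaves every in-degree and out-degree unchanged.

```python
def degrees(M):
    # Out-degrees are the row sums, in-degrees the column sums.
    return [sum(row) for row in M], [sum(col) for col in zip(*M)]

def simple_switching(M, i, j, k, l):
    # Replace edges (i, j) and (k, l) by (i, l) and (k, j).  The requirement
    # M[i][l] == M[k][j] == 0 is exactly the condition that no multiple
    # edges are created.
    assert M[i][j] == 1 and M[k][l] == 1 and M[i][l] == 0 and M[k][j] == 0
    M[i][j], M[k][l] = 0, 0
    M[i][l], M[k][j] = 1, 1
```

In the proof of Lemma 2.10, the second switched edge lands at vertex 1 (i.e. $l = 1$ in the notation above), which removes $k$ from the set of common in-neighbors of vertices 1 and 2 and thus decreases $q$ by one.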

Proof of Proposition 2.8. Iterating the last lemma, we deduce that for any $q \ge 0.8 d + 1$ we have
\[
|\mathcal{E}_{1,2}^q| \le 0.9^{\,q - \lceil 0.8 d \rceil}\, |\mathcal{E}_{1,2}^{\lceil 0.8 d \rceil}| \le 0.9^{\,q - \lceil 0.8 d \rceil}\, \big| \big\{ G \in \mathcal{D}_n(d^{in}, d^{out}) : E_G(I, [n]) = F \big\} \big|.
\]
Hence,
\[
\mathbb{P}(\mathcal{E}_{1,2}) = \sum_{q \ge 0.9 d} \mathbb{P}(\mathcal{E}_{1,2}^q) \le 0.9^{\,0.1 d - 1}\, \mathbb{P}\big\{ G \in \mathcal{D}_n(d^{in}, d^{out}) : E_G(I, [n]) = F \big\}.
\]
Similarly, we have
\[
\mathbb{P}(\mathcal{E}_{i,j}) \le 0.9^{\,0.1 d - 1}\, \mathbb{P}\big\{ G \in \mathcal{D}_n(d^{in}, d^{out}) : E_G(I, [n]) = F \big\}
\]
for any $i \ne j$. Applying the union bound and the definition of $\mathcal{E}_{2.8}$, we deduce that
\[
\mathbb{P}\big( \mathcal{E}_{2.8} \cap \{ G : E_G(I, [n]) = F \} \big) \le n^2\, 0.9^{\,0.1 d - 1}\, \mathbb{P}\big\{ G \in \mathcal{D}_n(d^{in}, d^{out}) : E_G(I, [n]) = F \big\}.
\]
The result follows in view of the assumptions on $n$ and $d$.

Remark 2.11. Let us emphasize that much sharper bounds on the number of common in-neighbors can be obtained by applying results proved later in this paper. However, as this is not the central subject of this work, no improvements to Proposition 2.8 will be pursued here.

3 A concentration inequality for a matrix row

Take a large enough natural number $n$, two degree sequences $d^{in}, d^{out} \in \mathbb{R}^n$ satisfying (6), and a non-negative integer $m \le c_0 n$. Further, let $Y^1, Y^2, \ldots, Y^m$ be $\{0, 1\}$-vectors such that the set of matrices
\[
\widetilde{\mathcal{M}}_n := \big\{ M \in \mathcal{M}_n(d^{in}, d^{out}) : \operatorname{row}_i(M) = Y^i \text{ for all } i \le m \big\}
\]
is non-empty. The parameters $n$, $d^{in}$, $d^{out}$, $m$ and $Y^1, Y^2, \ldots, Y^m$ are fixed throughout this section. As we mentioned in Section 2, we always assume (6). Our goal here is to show that, under certain conditions on the degree sequences and the vectors $Y^1, \ldots, Y^m$, the $(m+1)$-st row of the random matrix uniformly distributed on the set $\widetilde{\mathcal{M}}_n$ enjoys strong concentration properties.

For each $\ell \le n$, define
\[
p_\ell := d_\ell^{in} - \big| \{ i \le m : Y_\ell^i = 1 \} \big|. \tag{15}
\]
Everywhere in this section, we assume that the vectors $Y^1, \ldots, Y^m$ are such that the $p_\ell$'s satisfy
\[
d_\ell^{in} \ge p_\ell \ge (1 - 2 c_0)\, d_\ell^{in} \quad \forall\, \ell \in [n]. \tag{16}
\]
Note that the above condition implies $|p_\ell - p_{\ell'}| \le |d_\ell^{in} - d_{\ell'}^{in}| + 4 c_0 d$ for all $\ell, \ell' \in [n]$. Let us remark that in the second part of the section we will employ much stronger assumptions on the $p_\ell$'s.

Further, let $\Omega$ be the set of all $\{0, 1\}$-vectors $v \in \mathbb{R}^n$ such that $|\operatorname{supp} v| = d_{m+1}^{out}$. Then we can define an induced probability measure $\mathbb{P}_\Omega$ on $\Omega$ by setting
\[
\mathbb{P}_\Omega(A) := \mathbb{P}\big\{ \operatorname{row}_{m+1}(M) \in A \mid M \in \widetilde{\mathcal{M}}_n \big\}, \quad A \subset \Omega.
\]
Let $\mathcal{F}_0 = \{ \emptyset, \Omega \}$. Consider the filtration of $\sigma$-algebras $\{ \mathcal{F}_i \}_{i=0}^n$ on $(\Omega, \mathbb{P}_\Omega)$ which reveals the coordinates of the $(m+1)$-st row one by one, i.e. $\mathcal{F}_i$ is generated by $\mathcal{F}_{i-1}$ and by the variable $v_i$ (where $v$ is distributed on $\Omega$ according to the measure $\mathbb{P}_\Omega$) for any $i = 1, 2, \ldots, n$.
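For intuition, the residual counts (15) and a toy version of the space $\Omega$ can be spelled out in a few lines. This is a sketch under our own naming conventions, not code from the paper:

```python
from itertools import combinations

def residual_column_sums(d_in, fixed_rows):
    # p_l from (15): the in-degree d_in[l] minus the number of ones already
    # placed in column l by the fixed rows Y^1, ..., Y^m.
    n = len(d_in)
    return [d_in[l] - sum(Y[l] for Y in fixed_rows) for l in range(n)]

def omega(n, k):
    # The set Omega of 0/1 vectors of length n with exactly k ones
    # (for the (m+1)-st row, k equals the out-degree of vertex m+1).
    vecs = []
    for S in combinations(range(n), k):
        v = [0] * n
        for s in S:
            v[s] = 1
        vecs.append(tuple(v))
    return vecs
```

Note that $\mathbb{P}_\Omega$ is in general *not* the uniform measure on $\Omega$: it is the distribution of the $(m+1)$-st row induced by the uniform measure on the matrix set $\widetilde{\mathcal{M}}_n$.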

3.1 Distribution of the i-th coordinate

Everywhere in this sub-section, we assume that the number $d$ satisfies the conditions of Proposition 2.8, i.e. $d \ge C_{2.8} \ln n$. We fix a number $i \le n$ and numbers $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{i-1} \in \{0, 1\}$ such that $\mathbb{P}_\Omega\{ v \in \Omega : v_j = \varepsilon_j \text{ for all } j < i \} > 0$. Let us denote
\[
Q := \{ v \in \Omega : v_j = \varepsilon_j \text{ for all } j < i \}
\]
and let $\mathbb{P}_Q := \mathbb{P}_\Omega(\,\cdot \mid Q)$ be the induced probability measure on $Q$. By $\mathcal{F}_k \cap Q$ we denote the restrictions of the previously defined $\sigma$-algebras to $Q$. Obviously, $\mathcal{F}_k \cap Q = \{ \emptyset, Q \}$ for all $k \le i - 1$.

The goal of the sub-section is to develop machinery for dealing with arbitrary functions on $Q$. Loosely speaking, given a function $h : Q \to \mathbb{R}$ satisfying certain conditions, we will study the "impact" of the $i$-th coordinate of its argument on its value. Then, in Sub-section 3.2, we will apply the relations established here, together with the Freedman inequality, to obtain concentration inequalities for the $(m+1)$-st row of a random matrix uniformly distributed on $\widetilde{\mathcal{M}}_n$. The central technical statement of this part of the paper is Lemma 3.8. On the way to stating and proving the lemma, we will go through several auxiliary statements and introduce several useful notions.

Lemma 3.1. Let $Q$ be as above, and let $k \ne \ell \in \{ i, \ldots, n \}$. Further, let vectors $v, v' \in Q$ be such that $\{ j \le n : v_j \ne v_j' \} = \{ k, \ell \}$, with $v_k = v_\ell' = 1$ and $v_\ell = v_k' = 0$. Then
\[
\mathbb{P}_Q(v) \le \gamma_{k,\ell}\, \mathbb{P}_Q(v'),
\]
where
\[
\gamma_{k,\ell} := \frac{1}{1 - \exp(-c_{2.8} d)} \Big( \mathbf{1}_{\{ p_\ell < p_k \}} \frac{p_\ell}{p_k} + \mathbf{1}_{\{ p_\ell \ge p_k \}} \frac{p_\ell - \lfloor 0.9 d \rfloor}{p_k - \lfloor 0.9 d \rfloor} \Big). \tag{17}
\]

Proof. Denote
\[
\widetilde{\mathcal{M}}_n(Q) := \big\{ M \in \widetilde{\mathcal{M}}_n : \operatorname{row}_{m+1}(M) \in Q \big\}
\]
and
\[
\widetilde{\mathcal{M}}_n(w) := \big\{ M \in \widetilde{\mathcal{M}}_n(Q) : \operatorname{row}_{m+1}(M) = w \big\}, \quad w = v, v'.
\]
With these notations, we have
\[
\mathbb{P}_Q(w) = \frac{|\widetilde{\mathcal{M}}_n(w)|}{|\widetilde{\mathcal{M}}_n(Q)|}, \quad w = v, v'.
\]
Next, we denote
\[
\widetilde{\mathcal{M}}_n^*(w) := \big\{ M \in \widetilde{\mathcal{M}}_n(w) : |\operatorname{supp} \operatorname{col}_k(M) \cap \operatorname{supp} \operatorname{col}_\ell(M) \cap [m]^c| \le 0.9 d \big\}, \quad w = v, v',
\]
and for any non-negative integer $r \le 0.9 d$ we set
\[
\widetilde{\mathcal{M}}_n^*(w, r) := \big\{ M \in \widetilde{\mathcal{M}}_n(w) : |\operatorname{supp} \operatorname{col}_k(M) \cap \operatorname{supp} \operatorname{col}_\ell(M) \cap [m]^c| = r \big\}, \quad w = v, v'.
\]
Clearly,
\[
\widetilde{\mathcal{M}}_n^*(w) = \bigsqcup_{r=0}^{\lfloor 0.9 d \rfloor} \widetilde{\mathcal{M}}_n^*(w, r). \tag{18}
\]
Applying the "matrix" version of Proposition 2.8 to the set $\widetilde{\mathcal{M}}_n^*(v)$, we get
\[
|\widetilde{\mathcal{M}}_n^*(v)| \ge \big( 1 - \exp(-c_{2.8} d) \big)\, |\widetilde{\mathcal{M}}_n(v)|. \tag{19}
\]

Fix an integer $r \le 0.9 d$. We shall compare the cardinalities of $\widetilde{\mathcal{M}}_n^*(v, r)$ and $\widetilde{\mathcal{M}}_n^*(v', r)$. Let us define a relation $R \subset \widetilde{\mathcal{M}}_n^*(v, r) \times \widetilde{\mathcal{M}}_n^*(v', r)$ as follows:

Pick any $M \in \widetilde{\mathcal{M}}_n^*(v, r)$ and $s \in [m]^c \cap \operatorname{supp} \operatorname{col}_\ell(M) \setminus \operatorname{supp} \operatorname{col}_k(M)$. Clearly, we have $M_{m+1,k} = M_{s\ell} = 1$ and $M_{m+1,\ell} = M_{sk} = 0$. Let $M^s$ be the matrix obtained from $M$ by a simple switching operation on the entries $(m+1, k)$, $(m+1, \ell)$, $(s, k)$, $(s, \ell)$. It is easy to see that $M^s$ belongs to $\widetilde{\mathcal{M}}_n^*(v', r)$. We set $R(M) := \{ M^s : s \in [m]^c \cap \operatorname{supp} \operatorname{col}_\ell(M) \setminus \operatorname{supp} \operatorname{col}_k(M) \}$. Thus,
\[
|R(M)| = |[m]^c \cap \operatorname{supp} \operatorname{col}_\ell(M) \setminus \operatorname{supp} \operatorname{col}_k(M)| = p_\ell - r.
\]
Further, it is not difficult to check that $R\big( \widetilde{\mathcal{M}}_n^*(v, r) \big) = \widetilde{\mathcal{M}}_n^*(v', r)$ and that for any $M' \in \widetilde{\mathcal{M}}_n^*(v', r)$ we have
\[
|R^{-1}(M')| = |[m]^c \cap \operatorname{supp} \operatorname{col}_k(M') \setminus \operatorname{supp} \operatorname{col}_\ell(M')| = p_k - r.
\]
Hence, by Claim 2.1,
\[
|\widetilde{\mathcal{M}}_n^*(v, r)| = \frac{p_\ell - r}{p_k - r}\, |\widetilde{\mathcal{M}}_n^*(v', r)|.
\]
Using this together with (18) and (19), we can write
\[
\big( 1 - \exp(-c_{2.8} d) \big)\, |\widetilde{\mathcal{M}}_n(v)| \le |\widetilde{\mathcal{M}}_n^*(v)| = \sum_{r=0}^{\lfloor 0.9 d \rfloor} |\widetilde{\mathcal{M}}_n^*(v, r)| \le \max_{0 \le r \le 0.9 d} \frac{p_\ell - r}{p_k - r}\, |\widetilde{\mathcal{M}}_n(v')|.
\]
Finally, we divide both sides by $|\widetilde{\mathcal{M}}_n(Q)|$ and notice that
\[
\max_{0 \le r \le 0.9 d} \frac{p_\ell - r}{p_k - r} = \mathbf{1}_{\{ p_\ell < p_k \}} \frac{p_\ell}{p_k} + \mathbf{1}_{\{ p_\ell \ge p_k \}} \frac{p_\ell - \lfloor 0.9 d \rfloor}{p_k - \lfloor 0.9 d \rfloor}.
\]

Remark 3.2. Note that under our assumptions on the $p_\ell$'s and $d$, we have
\[
1 - 8 c_0 \le \gamma_{k,\ell} \le 1 + 50 c_0 \quad \text{for all } k, \ell \in \{ 1, 2, \ldots, n \}.
\]

We define a relation $\mathcal{R} \subset Q \times Q$ as
\[
(v, v') \in \mathcal{R} \ \text{ if and only if } \ |\{ j \le n : v_j \ne v_j' \}| = 2, \tag{20}
\]
i.e. the pair $(v, v')$ belongs to $\mathcal{R}$ if $v'$ can be obtained from $v$ by transposing two coordinates. Further, let us define sets $T_+$ and $T_0$:
\[
T_+ := \{ v \in Q : v_i = 1 \} \quad \text{and} \quad T_0 := \{ v \in Q : v_i = 0 \}, \tag{21}
\]
so that $Q = T_+ \sqcup T_0$. Denote by $\mathcal{R}_+ \subset T_+ \times T_0$ the restriction of the relation $\mathcal{R}$ to $T_+ \times T_0$. For a vector $v' \in T_0$, let $N$ be the number of coordinates of $v'$ equal to 1, starting from the $i$-th coordinate. Note that this number does not depend on the choice of $v' \in T_0$, and is entirely determined by the values of the signs $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_{i-1}$ which we fixed at the beginning of the sub-section. More precisely,
\[
N := d_{m+1}^{out} - \sum_{j < i} \varepsilon_j. \tag{22}
\]
Next, define
\[
\delta_{k,\ell} := \max\big( |1 - \gamma_{k,\ell}^{-1}|,\ |1 - \gamma_{\ell,k}^{-1}|,\ |1 - \gamma_{k,\ell}|,\ |1 - \gamma_{\ell,k}| \big), \quad k, \ell \in \{ 1, 2, \ldots, n \}, \tag{23}
\]
where the $\gamma_{k,\ell}$ are defined by (17). From Remark 3.2, it immediately follows that $\delta_{k,\ell} \le 1/4$ for all $k, \ell \in \{ 1, \ldots, n \}$. Moreover, a simple computation shows that
\[
\delta_{k,\ell} \le \frac{40}{d}\, |p_k - p_\ell| + 4 \exp(-c_{2.8} d), \quad k, \ell \in \{ 1, 2, \ldots, n \}. \tag{24}
\]

Lemma 3.3. Suppose that the sets $T_0$, $T_+$ are non-empty. Then
\[
\big| (n - i - N + 1)\, \mathbb{P}_Q(T_+) - N\, \mathbb{P}_Q(T_0) \big| \le \frac{2 (n - i - N + 1)}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell}\; \mathbb{P}_Q(T_+) \le \frac{1}{2} (n - i - N + 1)\, \mathbb{P}_Q(T_+),
\]
where the $\delta_{k,\ell}$ are defined by (23).

Proof. First note that
\[
\mathbb{P}_Q(T_+) = \sum_{v \in T_+} \mathbb{P}_Q(v) = \sum_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} \frac{\mathbb{P}_Q(v)}{|\mathcal{R}_+(v)|} = \frac{1}{n - i - N + 1} \sum_{(v, v') \in \mathcal{R}_+} \mathbb{P}_Q(v).
\]
Similarly, for $T_0$ we have
\[
\mathbb{P}_Q(T_0) = \frac{1}{N} \sum_{(v, v') \in \mathcal{R}_+} \mathbb{P}_Q(v').
\]
Hence,
\[
\big| (n - i - N + 1)\, \mathbb{P}_Q(T_+) - N\, \mathbb{P}_Q(T_0) \big| = \Big| \sum_{(v, v') \in \mathcal{R}_+} \big( \mathbb{P}_Q(v) - \mathbb{P}_Q(v') \big) \Big| \le \sum_{(v, v') \in \mathcal{R}_+} \Big| 1 - \frac{\mathbb{P}_Q(v')}{\mathbb{P}_Q(v)} \Big|\, \mathbb{P}_Q(v).
\]
Since for any pair $(v, v') \in \mathcal{R}_+$, $v$ and $v'$ differ at exactly one coordinate after the $i$-th, we have
\[
\mathcal{R}_+ = \bigsqcup_{\ell = i+1}^n \big\{ (v, v') \in \mathcal{R}_+ : v_\ell \ne v_\ell' \big\},
\]
whence
\[
\big| (n - i - N + 1)\, \mathbb{P}_Q(T_+) - N\, \mathbb{P}_Q(T_0) \big| \le \sum_{\ell = i+1}^n \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} \Big| 1 - \frac{\mathbb{P}_Q(v')}{\mathbb{P}_Q(v)} \Big|\, \mathbb{P}_Q(v).
\]
Applying Lemma 3.1, we obtain
\[
\big| (n - i - N + 1)\, \mathbb{P}_Q(T_+) - N\, \mathbb{P}_Q(T_0) \big| \le \sum_{\ell = i+1}^n \delta_{i,\ell} \sum_{\substack{v_\ell = 0 \\ v \in T_+}} \mathbb{P}_Q(v). \tag{25}
\]
Let us now compare the quantities $a_\ell := \sum_{v_\ell = 0,\, v \in T_+} \mathbb{P}_Q(v)$ for two different values of $\ell$. Fix $\ell, \ell' > i$ ($\ell \ne \ell'$) and define a bijection $f : \{ v \in T_+ : v_\ell = 0 \} \to \{ v \in T_+ : v_{\ell'} = 0 \}$ as follows: given $v \in T_+$, if $v_\ell = v_{\ell'} = 0$ then we set $f(v) := v$; otherwise, if $v_\ell = 0$ and $v_{\ell'} = 1$, then we let $f(v)$ be the vector obtained by swapping the $\ell$-th and $\ell'$-th coordinates of $v$. Note that whenever $v \ne f(v)$, we have $(v, f(v)) \in \mathcal{R}$. Hence, using Lemma 3.1, we get
\[
\frac{a_\ell}{a_{\ell'}} \le \max_{\substack{v_\ell = 0 \\ v \in T_+}} \frac{\mathbb{P}_Q(v)}{\mathbb{P}_Q(f(v))} \le \max(1, \gamma_{\ell',\ell}) \le 2,
\]
where the last inequality follows from Remark 3.2. This implies
\[
\max_{\ell > i} a_\ell \le 2 \min_{\ell > i} a_\ell \le \frac{2}{n - i} \sum_{\ell = i+1}^n a_\ell.
\]
Plugging this estimate into (25), we deduce that
\[
\big| (n - i - N + 1)\, \mathbb{P}_Q(T_+) - N\, \mathbb{P}_Q(T_0) \big| \le \frac{2}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \sum_{\ell' = i+1}^n a_{\ell'}.
\]
The proof is finished by noticing that
\[
\sum_{\ell' = i+1}^n a_{\ell'} = \sum_{(v, v') \in \mathcal{R}_+} \mathbb{P}_Q(v) = (n - i - N + 1)\, \mathbb{P}_Q(T_+).
\]
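The double counting at the start of the proof can be checked by brute force on a toy example. The following sketch is ours, run with the *uniform* measure on $Q$ (for which the left-hand side of the lemma vanishes); it verifies both identities $\sum_{\mathcal{R}_+} \mathbb{P}_Q(v) = (n - i - N + 1)\, \mathbb{P}_Q(T_+)$ and $\sum_{\mathcal{R}_+} \mathbb{P}_Q(v') = N\, \mathbb{P}_Q(T_0)$:

```python
from itertools import combinations
from fractions import Fraction

n, k, i = 5, 2, 1    # Q: 0/1 vectors of length 5 with 2 ones; split on coordinate i = 1
Q = []
for S in combinations(range(n), k):
    v = [0] * n
    for s in S:
        v[s] = 1
    Q.append(tuple(v))
P = {v: Fraction(1, len(Q)) for v in Q}        # uniform measure on Q
T_plus = [v for v in Q if v[i - 1] == 1]
T_zero = [v for v in Q if v[i - 1] == 0]
N = k                 # ones among coordinates >= i (the same for every v in Q)
# R_+: pairs (v, v') where v' is v with the one at coordinate i moved to a zero slot l > i
R_plus = []
for v in T_plus:
    for l in range(i, n):      # 0-based index l corresponds to a coordinate > i
        if v[l] == 0:
            w = list(v)
            w[i - 1], w[l] = 0, 1
            R_plus.append((v, tuple(w)))

sum_v = sum(P[v] for v, w in R_plus)
sum_w = sum(P[w] for v, w in R_plus)
assert sum_v == (n - i - N + 1) * sum(P[v] for v in T_plus)
assert sum_w == N * sum(P[v] for v in T_zero)
```

Here every $v \in T_+$ has exactly $n - i - N + 1$ partners and every $v' \in T_0$ exactly $N$, which is what the two normalizations in the proof record.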

Assume that $T_0$, $T_+$ are non-empty. Given a couple $(v, v') \in \mathcal{R}_+$, define
\[
\rho(v, v') := \frac{\mathbb{P}_Q(v)}{(n - i - N + 1)\, \mathbb{P}_Q(T_+)} \quad \text{and} \quad \rho'(v, v') := \frac{\mathbb{P}_Q(v')}{N\, \mathbb{P}_Q(T_0)}.
\]
Note that $\rho$ and $\rho'$ are probability measures on $\mathcal{R}_+$. In what follows, given a function $h : Q \to \mathbb{R}$, by $\mathbb{E}_{T_+} h$ we denote the expectation of the restriction of $h$ to $T_+$ with respect to $\mathbb{P}_Q$, i.e.,
\[
\mathbb{E}_{T_+} h = \frac{1}{\mathbb{P}_Q(T_+)} \sum_{v \in T_+} h(v)\, \mathbb{P}_Q(v) = \sum_{(v, v') \in \mathcal{R}_+} \rho(v, v')\, h(v).
\]
Similarly,
\[
\mathbb{E}_{T_0} h = \frac{1}{\mathbb{P}_Q(T_0)} \sum_{v' \in T_0} h(v')\, \mathbb{P}_Q(v') = \sum_{(v, v') \in \mathcal{R}_+} \rho'(v, v')\, h(v').
\]
We shall proceed by comparing the measures $\rho$ and $\rho'$:

Lemma 3.4. Assume that the sets $T_+$, $T_0$ are non-empty. Let $(v, v') \in \mathcal{R}_+$ and let $q > i$ be an integer such that $v_q \ne v_q'$. Then
\[
|\rho(v, v') - \rho'(v, v')| \le \rho'(v, v') \Big[ \delta_{i,q} + \frac{4}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \Big].
\]

Proof. Using Lemma 3.1 and the definition (23), we get
\[
1 - \delta_{i,q} \le \frac{\mathbb{P}_Q(v)}{\mathbb{P}_Q(v')} \le 1 + \delta_{i,q}.
\]
Now, from Lemma 3.3, we have
\[
1 - \frac{2}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \le \frac{N\, \mathbb{P}_Q(T_0)}{(n - i - N + 1)\, \mathbb{P}_Q(T_+)} \le 1 + \frac{2}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell}.
\]
Recall that the assumptions on $d$ and the $p_\ell$'s imply that $\delta_{i,\ell} \le 1$. Hence, putting together the last two estimates, we obtain

\[
(1 - \delta_{i,q}) \Big[ 1 - \frac{2}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \Big] \le \frac{\rho(v, v')}{\rho'(v, v')} \le (1 + \delta_{i,q}) \Big[ 1 + \frac{2}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \Big].
\]
The proof is finished by multiplying the inequalities by $\rho'(v, v')$ and employing the bound $\delta_{i,q} \le 1$.

Lemma 3.5. Let, as before, $T_0$, $T_+$ be given by (21), and assume that both $T_0$, $T_+$ are non-empty. Let $h$ be any function on $Q$. Then for any $\lambda \in \mathbb{R}$, we have
\begin{align*}
|\mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h| \le{}& \frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| + \frac{8\, \mathbb{E}_{T_0} |h - \lambda|}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \\
&+ \frac{4}{n - i} \Big( \sum_{\ell = i+1}^n \delta_{i,\ell} \Big) \max_{(v, v') \in \mathcal{R}} |h(v) - h(v')|,
\end{align*}
where $N$ is defined by (22).

Before proving the lemma, let us comment on the idea behind the estimate. Suppose that the function $h$ is a linear functional in $\mathbb{R}^n$ (actually, this is the only case interesting for us). Then, loosely speaking, we want to show that the difference $|\mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h|$ is essentially determined by the value $h(e_i)$. This corresponds to the first term of the bound, whereas the second and third summands are supposed to be negligible under appropriate conditions on $h$ (in fact, the second summand $\frac{8\, \mathbb{E}_{T_0} |h - \lambda|}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell}$ can be problematic and requires special handling).

Proof of Lemma 3.5. Fix any $\lambda \in \mathbb{R}$. Using the triangle inequality and the definitions of $\mathbb{E}_{T_+} h$ and $\mathbb{E}_{T_0} h$, we obtain
\begin{align*}
\beta := |\mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h|
&\le \sum_{(v, v') \in \mathcal{R}_+} |h(v) - h(v')|\, \rho(v, v') + \Big| \sum_{(v, v') \in \mathcal{R}_+} \big( \rho(v, v') - \rho'(v, v') \big)\, h(v') \Big| \\
&= \sum_{(v, v') \in \mathcal{R}_+} |h(v) - h(v')|\, \rho(v, v') + \Big| \sum_{(v, v') \in \mathcal{R}_+} \big( \rho(v, v') - \rho'(v, v') \big) \big( h(v') - \lambda \big) \Big| \\
&\le \sum_{(v, v') \in \mathcal{R}_+} |h(v) - h(v')|\, \rho(v, v') + \sum_{\ell = i+1}^n \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} \big| \rho(v, v') - \rho'(v, v') \big|\, |h(v') - \lambda|,
\end{align*}
where in the second line we used that $\rho$ and $\rho'$ are both probability measures on $\mathcal{R}_+$, so that $\sum_{\mathcal{R}_+} (\rho - \rho') = 0$.

For the first term, applying the definition of $\rho(v, v')$, we get
\[
\sum_{(v, v') \in \mathcal{R}_+} |h(v) - h(v')|\, \rho(v, v') = \sum_{(v, v') \in \mathcal{R}_+} |h(v) - h(v')|\, \frac{\mathbb{P}_Q(v)}{(n - i - N + 1)\, \mathbb{P}_Q(T_+)} \le \frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')|.
\]
Next, in view of Lemma 3.4,
\begin{align*}
\sum_{\ell = i+1}^n \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} |\rho(v, v') - \rho'(v, v')|\, |h(v') - \lambda|
&\le \sum_{\ell = i+1}^n \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} |h(v') - \lambda|\, \rho'(v, v') \Big[ \delta_{i,\ell} + \frac{4}{n - i} \sum_{q = i+1}^n \delta_{i,q} \Big] \\
&= \frac{4\, \mathbb{E}_{T_0} |h - \lambda|}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} + \sum_{\ell = i+1}^n \delta_{i,\ell} \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} |h(v') - \lambda|\, \rho'(v, v').
\end{align*}
Denote
\[
\alpha := \max_{(v, v') \in \mathcal{R}} |h(v) - h(v')| \quad \text{and} \quad a_\ell := \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} \big( \alpha + |h(v') - \lambda| \big)\, \rho'(v, v') \quad \text{for any } \ell > i.
\]
Then, obviously,
\[
\sum_{\ell = i+1}^n \delta_{i,\ell} \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} |h(v') - \lambda|\, \rho'(v, v') \le \sum_{\ell = i+1}^n \delta_{i,\ell}\, a_\ell. \tag{26}
\]
Similarly to the argument within the proof of Lemma 3.3, we shall compare the $a_\ell$'s for any two distinct values of $\ell$. Fix $\ell \ne \ell' > i$ and define a bijection $f : \{ v' \in T_0 : v_\ell' = 1 \} \to \{ v' \in T_0 : v_{\ell'}' = 1 \}$ as follows: given $v' \in T_0$ with $v_\ell' = v_{\ell'}' = 1$, set $f(v') := v'$; otherwise, if $v_\ell' = 1$ and $v_{\ell'}' = 0$, then let $f(v')$ be the vector obtained by swapping the $\ell$-th and $\ell'$-th coordinates of $v'$. Note that in the latter case $(v', f(v')) \in \mathcal{R}$. Applying Lemma 3.1, we get
\[
\frac{a_\ell}{a_{\ell'}} \le \max_{\substack{v_\ell' = 1 \\ v' \in T_0}} \frac{\big( \alpha + |h(v') - \lambda| \big)\, \mathbb{P}_Q(v')}{\big( \alpha + |h(f(v')) - \lambda| \big)\, \mathbb{P}_Q(f(v'))} \le \max(1, \gamma_{\ell,\ell'}) \max_{\substack{v_\ell' = 1 \\ v' \in T_0}} \frac{\alpha + |h(v') - \lambda|}{\alpha + |h(f(v')) - \lambda|}, \tag{27}
\]
where $\gamma_{\ell,\ell'}$ is defined by (17). On the other hand, since $(v', f(v')) \in \mathcal{R}$ whenever $v' \ne f(v')$, we have
\[
|h(v') - \lambda| \le |h(f(v')) - \lambda| + |h(v') - h(f(v'))| \le |h(f(v')) - \lambda| + \alpha.
\]
Plugging the last relation into (27) and using the bound $\gamma_{\ell,\ell'} \le 2$, we get
\[
a_\ell \le 4 a_{\ell'}, \quad \ell, \ell' \in \{ i+1, \ldots, n \}.
\]
This implies
\[
\max_{\ell > i} a_\ell \le 4 \min_{\ell > i} a_\ell \le \frac{4}{n - i} \sum_{\ell > i} a_\ell = \frac{4}{n - i} \big( \alpha + \mathbb{E}_{T_0} |h - \lambda| \big).
\]
Together with (26), the last relation gives
\[
\sum_{\ell = i+1}^n \delta_{i,\ell} \sum_{\substack{v_\ell \ne v_\ell' \\ (v, v') \in \mathcal{R}_+}} |h(v') - \lambda|\, \rho'(v, v') \le \frac{4}{n - i} \Big( \sum_{\ell = i+1}^n \delta_{i,\ell} \Big) \big( \alpha + \mathbb{E}_{T_0} |h - \lambda| \big).
\]
It remains to combine the above estimates.

Remark 3.6. We do not know whether a more careful analysis can give a bound for $|\mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h|$ in the above lemma not involving a dependence on $\mathbb{E}_{T_0} |h - \lambda|$.

Let, as before, $h$ be a function on $Q$. We set
\[
X_k := \mathbb{E}[h \mid \mathcal{F}_k \cap Q], \quad k = i - 1, \ldots, n,
\]
where the $\sigma$-algebras $\mathcal{F}_k$ are defined at the beginning of the section. Clearly, $(X_k)_{i-1 \le k \le n}$ is a martingale. Denote by $(d_k)_{i \le k \le n}$ the difference sequence, i.e.
\[
d_k := X_k - X_{k-1}, \quad k = i, \ldots, n.
\]
Further, let $M$ and $\sigma_i$ be the smallest non-negative numbers such that $|d_k| \le M$ a.s. for all $i \le k \le n$, and $\sum_{k = i+1}^n \mathbb{E}(d_k^2 \mid \mathcal{F}_{k-1} \cap Q) \le \sigma_i^2$ a.s. (note that, since our probability space is finite, such numbers always exist).

Lemma 3.7. Assume that $T_0$ is non-empty. Then, with the above notations, we have
\[
\mathbb{E}_{T_0} \big( h - \mathbb{E}_{T_0} h \big)^2 \le \sigma_i^2.
\]

Proof. First, note that $\mathbb{E}_{T_0} (h - \mathbb{E}_{T_0} h)^2$, viewed as a (constant) function on $T_0$, is just a restriction of the random variable $\mathbb{E}\big( (h - \mathbb{E}[h \mid \mathcal{F}_i \cap Q])^2 \mid \mathcal{F}_i \cap Q \big)$ to the set $T_0$. Hence, it is sufficient to prove the inequality
\[
\mathbb{E}\big( (h - \mathbb{E}[h \mid \mathcal{F}_i \cap Q])^2 \mid \mathcal{F}_i \cap Q \big) \le \sigma_i^2.
\]
We have
\[
h - \mathbb{E}[h \mid \mathcal{F}_i \cap Q] = \sum_{k = i+1}^n d_k,
\]
whence
\[
\mathbb{E}\big( (h - \mathbb{E}[h \mid \mathcal{F}_i \cap Q])^2 \mid \mathcal{F}_i \cap Q \big) = \sum_{k, \ell = i+1}^n \mathbb{E}\big( d_k d_\ell \mid \mathcal{F}_i \cap Q \big) \le \sigma_i^2 + \sum_{\substack{k, \ell = i+1 \\ k \ne \ell}}^n \mathbb{E}\big( d_k d_\ell \mid \mathcal{F}_i \cap Q \big).
\]
Finally, we note that $\mathbb{E}\big( d_k d_\ell \mid \mathcal{F}_i \cap Q \big) = 0$ for all $k \ne \ell$.

Now we can state the main technical result of the sub-section:

Lemma 3.8. Let, as before, the relation $\mathcal{R}$, the sets $T_0$ and $T_+$ and the number $N$ be defined by (20), (21) and (22), respectively, and let the $\delta_{i,\ell}$ be given by (23).
Then, with the above notation for the martingale sequence,

\begin{align*}
|d_i| \le{}& \frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| + \frac{8 \sigma_i}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \\
&+ \frac{4}{n - i} \Big( \sum_{\ell = i+1}^n \delta_{i,\ell} \Big) \max_{(v, v') \in \mathcal{R}} |h(v) - h(v')|
\end{align*}
and
\begin{align*}
\mathbb{E}[d_i^2 \mid \mathcal{F}_{i-1} \cap Q] \le \frac{4N}{n - i - N + 1} \Big[& \frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| + \frac{8 \sigma_i}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \\
&+ \frac{4}{n - i} \Big( \sum_{\ell = i+1}^n \delta_{i,\ell} \Big) \max_{(v, v') \in \mathcal{R}} |h(v) - h(v')| \Big]^2.
\end{align*}

Proof. When one of the sets $T_0$ or $T_+$ is empty, we have $d_i = 0$, and the statement is obvious. Otherwise, it is easy to see that
\[
X_{i-1} = \mathbb{E}_Q h = \mathbb{P}_Q(T_+)\, \mathbb{E}_{T_+} h + \mathbb{P}_Q(T_0)\, \mathbb{E}_{T_0} h \quad \text{and} \quad X_i = \mathbf{1}_{T_+} \mathbb{E}_{T_+} h + \mathbf{1}_{T_0} \mathbb{E}_{T_0} h,
\]
where $\mathbf{1}_{T_+}$, $\mathbf{1}_{T_0}$ are the indicators of the corresponding subsets of $Q$. Thus, we have
\[
d_i = \mathbf{1}_{T_+}\, \mathbb{P}_Q(T_0) \big( \mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h \big) - \mathbf{1}_{T_0}\, \mathbb{P}_Q(T_+) \big( \mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h \big), \tag{28}
\]
whence
\[
|d_i| \le \max\big( \mathbb{P}_Q(T_+), \mathbb{P}_Q(T_0) \big)\, |\mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h| \le |\mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h|.
\]
Applying Lemma 3.5 with $\lambda := \mathbb{E}_{T_0} h$, we get
\begin{align*}
|d_i| \le{}& \frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| + \frac{8\, \mathbb{E}_{T_0} |h - \mathbb{E}_{T_0} h|}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \\
&+ \frac{4}{n - i} \Big( \sum_{\ell = i+1}^n \delta_{i,\ell} \Big) \max_{(v, v') \in \mathcal{R}} |h(v) - h(v')|.
\end{align*}

The first part of the lemma follows by using Lemma 3.7 (which, together with the Cauchy--Schwarz inequality, gives $\mathbb{E}_{T_0} |h - \mathbb{E}_{T_0} h| \le \sigma_i$). Next, we calculate the conditional second moment of $d_i$. As an immediate consequence of (28), we get

\[
d_i^2 = \mathbf{1}_{T_+}\, \mathbb{P}_Q(T_0)^2 \big( \mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h \big)^2 + \mathbf{1}_{T_0}\, \mathbb{P}_Q(T_+)^2 \big( \mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h \big)^2,
\]
whence

\begin{align*}
\mathbb{E}[d_i^2 \mid \mathcal{F}_{i-1} \cap Q] &= \mathbb{P}_Q(T_0)^2\, \mathbb{P}_Q(T_+) \big( \mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h \big)^2 + \mathbb{P}_Q(T_+)^2\, \mathbb{P}_Q(T_0) \big( \mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h \big)^2 \\
&= \mathbb{P}_Q(T_+)\, \mathbb{P}_Q(T_0) \big( \mathbb{E}_{T_+} h - \mathbb{E}_{T_0} h \big)^2.
\end{align*}
Applying Lemma 3.5 with $\lambda := \mathbb{E}_{T_0} h$ and Lemma 3.7, we get

\begin{align*}
\frac{\mathbb{E}[d_i^2 \mid \mathcal{F}_{i-1} \cap Q]}{\mathbb{P}_Q(T_+)\, \mathbb{P}_Q(T_0)} \le \Big[& \frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| + \frac{8 \sigma_i}{n - i} \sum_{\ell = i+1}^n \delta_{i,\ell} \\
&+ \frac{4}{n - i} \Big( \sum_{\ell = i+1}^n \delta_{i,\ell} \Big) \max_{(v, v') \in \mathcal{R}} |h(v) - h(v')| \Big]^2.
\end{align*}

It remains to note that $\mathbb{P}_Q(T_+)\, \mathbb{P}_Q(T_0) \le \mathbb{P}_Q(T_+) \le \frac{2N}{n - i - N + 1}$, in view of Lemma 3.3.

Both estimates, of the absolute value of $d_i$ and of its conditional variance, contain the term $\frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')|$. In the next simple lemma, we bound this expression in the case when $h$ is a linear functional.

Lemma 3.9. Let a function $h : Q \to \mathbb{R}$ be given by $h(v) := \langle v, x \rangle$ for a fixed vector $x \in \mathbb{R}^n$. Further, assume that $i \le n/4$. Then

\[
\frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| \le |x_i| + \frac{8 \|x\|_1}{n}.
\]

Proof. Obviously, for any couple $(v, v') \in \mathcal{R}_+$ with $v_\ell \ne v_\ell'$ for some $\ell > i$, we have
\[
|h(v) - h(v')| \le |x_i| + |x_\ell|.
\]
Whence, for any $v \in T_+$,
\[
\sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| \le |\mathcal{R}_+(v)|\, |x_i| + \sum_{\ell > i :\, v_\ell = 0} |x_\ell| \le |\mathcal{R}_+(v)|\, |x_i| + \|x\|_1.
\]
It follows that
\[
\frac{1}{n - i - N + 1} \sup_{v \in T_+} \sum_{v' \in \mathcal{R}_+(v)} |h(v) - h(v')| \le |x_i| + \frac{\|x\|_1}{n - i - N + 1} < |x_i| + \frac{8 \|x\|_1}{n},
\]
where the last inequality uses $i \le n/4$ and $N \le d \le (1/2 + c_0) n$.

3.2 The (m+1)-st row is conditionally concentrated

In this sub-section we show that, given a fixed vector $x \in \mathbb{R}^n$ and a random vector $v$ distributed on $\Omega$ according to the measure $\mathbb{P}_\Omega$, the scalar product $\langle v, x \rangle$ is concentrated around its expectation. Naturally, this holds under some extra assumptions on the quantities $p_\ell$ introduced at the beginning of the section, which measure how close to "homogeneous" the probability space $(\Omega, \mathbb{P}_\Omega)$ is. As everywhere in this section, we assume that the degree sequences and the parameters $p_\ell$ satisfy conditions (6) and (16). Additionally, throughout the sub-section we assume that
\[
d \ge C_{3.2} \ln^2 n, \tag{29}
\]
where $C_{3.2}$ is a sufficiently large universal constant (let us note that in its full strength the assumption is only used in the proof of Lemma 3.11 below). Define a vector $\mathcal{P} = (\mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_n)$ as
\[
\mathcal{P}_\ell := \sum_{j=1}^n |p_\ell - p_j|, \quad \ell \le n.
\]
Note that, in view of (24) and (29), we have
\[
\sum_{\ell = 1}^n \delta_{i,\ell} \le \frac{40}{d}\, \mathcal{P}_i + 1 \tag{30}
\]
for any $i \le n$.

In the previous sub-section, we estimated the parameters of the martingale difference sequence generated by the variable $\langle \cdot, x \rangle$ and the $\sigma$-algebras $\mathcal{F}_\ell$. Recall that the estimate of the upper bound for $|d_i|$ from Lemma 3.8 involves the quantity $\frac{\sigma_i}{n - i} \sum_{\ell = 1}^n \delta_{i,\ell}$. In Section 4, applying (30), we will show that for "most" indices $i$ the sum $\sum_{\ell = 1}^n \delta_{i,\ell}$ is bounded by $O(n/\sqrt{d})$, whereas, as we shall see below, $\sigma_i = O(\sqrt{d/n})$ for any unit vector $x$. Thus, the magnitude of $\frac{\sigma_i}{n - i} \sum_{\ell = 1}^n \delta_{i,\ell}$ is of order $n^{-1/2}$, and it is necessarily dominated by a constant multiple of $\|x\|_\infty$. However, for some indices $i$ the sum $\sum_{\ell = 1}^n \delta_{i,\ell}$ can be as large as $n \ln n / \sqrt{d}$. Thus, a straightforward argument would give $C(\|x\|_\infty + n^{-1/2} \ln n)$ as an upper bound for $|d_i|$, and the implied row concentration inequality would bear the logarithmic error term. To overcome this problem, we have to consider separately two cases: when the $\|\cdot\|_\infty$-norm of the vector $x$ is "large" and when it is "small". In the first case (treated in Lemma 3.10) the logarithmic spikes of the vector $\mathcal{P}$ do not create problems. In the second case, however, we have to apply a special ordering to the coordinates of the row so that large spikes of $\mathcal{P}$ are "balanced" by a small magnitude of $\sigma_i$ (which, for those coordinates $i$, must be much smaller than $\sqrt{d/n}$). The second case is more technically involved and is treated in Lemma 3.11. Finally, with both statements in possession, we can complete the proof of the row concentration inequality.

Lemma 3.10. For any $L > 0$ there exist $\alpha = \alpha(L) \in (0, 1)$ and $\beta = \beta(L) \in (0, 1)$ with the following property. Let $x \in S^{n-1}$ be an $\alpha n$-sparse vector and assume that 1) $\|x\|_\infty \ge \ln(2n)\, n^{-1/2}$ and 2) $\|\mathcal{P}\|_{\psi,n} \le L n \sqrt{d}$, where the norm $\|\cdot\|_{\psi,n}$ is defined by (1). Then, denoting by $\eta$ the random variable
\[
\eta = \eta(v) := \langle v, x \rangle - \mathbb{E}_\Omega \langle \cdot, x \rangle, \quad v \in \Omega,
\]
we have
\[
\mathbb{E}_\Omega\, e^{\beta \lambda \eta} \le \exp\Big( \frac{d}{n \|x\|_\infty^2}\, g(\lambda \|x\|_\infty) \Big), \quad \lambda > 0,
\]
with $g(\cdot)$ defined by (9).

Proof. Let $L > 0$ be fixed. We define $\alpha = \alpha(L)$ as the largest number in $(0, 1/4]$ such that
\[
32 \cdot 640\, C_{2.2}\, L \sqrt{\alpha} \ln \frac{2}{\alpha} \le 1, \tag{31}
\]
where the constant $C_{2.2}$ is given in Lemma 2.2. Pick an $\alpha n$-sparse vector $x \in S^{n-1}$ and let $\pi$ be a permutation on $[n]$ such that $|x_{\pi(1)}| \ge |x_{\pi(2)}| \ge \ldots \ge |x_{\pi(n)}|$. For any $i \le n$, we denote by $\mathcal{F}^\pi_i$ the $\sigma$-algebra generated by the coordinates $\pi(1), \ldots, \pi(i)$ of a vector distributed on $\Omega$ according to the measure $\mathbb{P}_\Omega$, i.e.

\[
\mathcal{F}^\pi_i := \sigma\Big( \big\{ v \in \Omega : v_{\pi(j)} = b_j \text{ for all } j \le i \big\},\ b_1, b_2, \ldots, b_i \in \{0, 1\} \Big).
\]
Define a function $h$ on $\Omega$ by

\[
h(v) := \langle v, x \rangle, \quad v \in \Omega,
\]
and let
\[
X_\ell := \mathbb{E}_\Omega[h \mid \mathcal{F}^\pi_\ell], \quad \ell \le n,
\]
and $d_\ell := X_\ell - X_{\ell - 1}$. Further, let $M$ and $\sigma$ be the smallest non-negative numbers such that $|d_\ell| \le M$ everywhere on $\Omega$ for all $\ell \le n$, and $\sum_{\ell = 1}^n \mathbb{E}(d_\ell^2 \mid \mathcal{F}^\pi_{\ell - 1}) \le \sigma^2$ everywhere on $\Omega$. Clearly, for any $i > \alpha n$ we have $d_i = 0$. Now, fix $i \le \alpha n$ and follow the notations of the previous sub-section (with $\pi(\ell)$ replacing $\ell$ where appropriate). More precisely, we take an atom of the algebra $\mathcal{F}^\pi_{i-1}$, i.e. the set $Q$ of vectors in $\Omega$ with some prescribed values of their coordinates with indices $\pi(1), \ldots, \pi(i-1)$. Then $\mathcal{R}$ is the collection of all pairs of vectors from $Q$ which differ by two coordinates, and $N$ is the number of non-zero coordinates in every $v \in Q$, excluding the coordinates with indices $\pi(1), \ldots, \pi(i-1)$. In view of the choice of $\pi$ and the definition of $h$, we have

\[
\max_{(v, v') \in \mathcal{R}} |h(v) - h(v')| \le 2 |x_{\pi(i)}|.
\]
Further, using the condition $\delta_{\pi(i),\ell} \le 1/4$, we get
\[
\frac{4}{n - i} \Big( \sum_{\ell = 1}^n \delta_{\pi(i),\ell} \Big) \max_{(v, v') \in \mathcal{R}} |h(v) - h(v')| \le 4 |x_{\pi(i)}|.
\]

Together with Lemma 3.8, Lemma 3.9 and (30), this gives
\[
|d_i| \le 5 |x_{\pi(i)}| + \frac{8 \|x\|_1}{n} + \frac{640 \sigma}{d n}\, \mathcal{P}_{\pi(i)} + \frac{16 \sigma}{n}
\]
everywhere on $Q$ and, in fact, everywhere on $\Omega$, as the right-hand side of the last relation does not depend on the choice of the atom $Q$. Further, applying the second part of Lemma 3.8 together with Lemma 3.9 and the relations $i \le n/4$, $N \le d \le n/2 + c_0 n$ and (30), we get
\begin{align*}
\mathbb{E}[d_i^2 \mid \mathcal{F}^\pi_{i-1}] &\le \frac{32 d}{n} \Big[ 5 |x_{\pi(i)}| + \frac{8 \|x\|_1}{n} + \frac{16 \sigma}{n} \sum_{\ell = 1}^n \delta_{\pi(i),\ell} \Big]^2 \\
&\le \frac{32 d}{n} \Big[ 75\, x_{\pi(i)}^2 + \frac{48}{n} + 3 \Big( \frac{640 \sigma}{d n}\, \mathcal{P}_{\pi(i)} + \frac{16 \sigma}{n} \Big)^2 \Big],
\end{align*}
where in the last inequality we used the convexity of the square and $\|x\|_1^2 \le \alpha n \|x\|_2^2 \le n/4$. Again, the bound for $\mathbb{E}[d_i^2 \mid \mathcal{F}^\pi_{i-1}]$ holds everywhere on $\Omega$. Summing over all $i \le \alpha n$, we get from the last relation
\begin{align*}
\sigma^2 &\le \frac{32 d}{n} \Big[ 87 + 3 \sum_{i=1}^{\lfloor \alpha n \rfloor} \Big( \frac{640 \sigma}{d n}\, \mathcal{P}_{\pi(i)} + \frac{16 \sigma}{n} \Big)^2 \Big] \\
&\le \frac{32 d}{n} \Big[ 87 + \frac{6 \cdot 640^2\, \sigma^2}{d^2 n^2} \sum_{i=1}^{\lfloor \alpha n \rfloor} \mathcal{P}_{\pi(i)}^2 + \frac{6 \cdot 16^2\, \sigma^2}{n} \Big].
\end{align*}
In view of the condition on $\|\mathcal{P}\|_{\psi,n}$, relation (31) and Lemma 2.2, we have
\[
\frac{32 d}{n} \cdot \frac{6 \cdot 640^2\, \sigma^2}{d^2 n^2} \sum_{i=1}^{\lfloor \alpha n \rfloor} \mathcal{P}_{\pi(i)}^2 \le \frac{\sigma^2}{4}.
\]
Thus, the self-bounding estimate for $\sigma$ implies
\[
\sigma^2 < \frac{C d}{n}
\]
for an appropriate constant $C > 0$. Whence, from the above estimate of the $|d_i|$'s we obtain
\[
M \le 5 \|x\|_\infty + \frac{8 \|x\|_1}{n} + \frac{640 \sigma}{d n} \|\mathcal{P}\|_\infty + \frac{16 \sigma}{n} \le 5 \|x\|_\infty + \frac{8}{\sqrt{n}} + \frac{640 \sqrt{C}\, L \ln(en)}{\sqrt{n}} + \frac{16 \sqrt{C}}{n},
\]
where we employed the relations $\|\mathcal{P}\|_\infty \le \|\mathcal{P}\|_{\psi,n} \ln(en) \le L n \sqrt{d} \ln(en)$ (see formula (7)) and the estimate for $\sigma$ established above. This, together with the assumption $\|x\|_\infty \ge \ln(2n)\, n^{-1/2}$, implies that $M \le C'(1 + L) \|x\|_\infty$ for an appropriate constant $C'$. It remains to apply Theorem 2.5 in order to finish the proof.

The next lemma is a counterpart of the above statement, covering the case when the $\|\cdot\|_\infty$-norm of the vector $x$ is small.

Lemma 3.11. For any $L > 0$ there exist $\alpha = \alpha(L) \in (0, 1)$ and $\beta = \beta(L) \in (0, 1)$ with the following property. Let $x \in S^{n-1}$ be an $\alpha n$-sparse vector and assume that 1) $\|x\|_\infty < \ln(2n)\, n^{-1/2}$ and 2) $\|\mathcal{P}\|_{\psi,n} \le L n \sqrt{d}$. Then, denoting by $\eta$ the random variable
\[
\eta = \eta(v) := \langle v, x \rangle - \mathbb{E}_\Omega \langle \cdot, x \rangle, \quad v \in \Omega,
\]
we have
\[
\mathbb{E}_\Omega\, e^{\beta \lambda \eta} \le \exp\Big( \frac{d}{n \|x\|_\infty^2}\, g(\lambda \|x\|_\infty) \Big), \quad \lambda > 0.
\]

Proof. Again, we fix $L > 0$. Let $C_1 > 0$ be a large enough universal constant (whose exact value can be determined from the proof below). We define $\alpha = \alpha(L)$ as the largest number in $(0, 1/4]$ such that
\[
C_1 L^2 \alpha \ln^2 \frac{e}{\alpha} \le \frac{1}{2}. \tag{32}
\]
Let $\pi$ be a permutation on $[n]$ such that $|x_{\pi(i)}| = 0$ for all $i > |\operatorname{supp} x|$ and the sequence $(\mathcal{P}_{\pi(i)})_{i \le |\operatorname{supp} x|}$ is non-decreasing. We define the function $h$, the $\sigma$-algebras $\mathcal{F}^\pi_\ell$ and the difference sequence $(d_\ell)_{\ell \le n}$ the same way as in the proof of Lemma 3.10. We have $d_i = 0$ for all $i > |\operatorname{supp} x|$. Let $M$ and $\sigma_i$ ($i \le |\operatorname{supp} x|$) be the smallest numbers such that everywhere on $\Omega$ we have $|d_i| \le M$ for all $i \le |\operatorname{supp} x|$ and
\[
\sum_{\ell = i}^{|\operatorname{supp} x|} \mathbb{E}(d_\ell^2 \mid \mathcal{F}^\pi_{\ell - 1}) \le \sigma_i^2, \quad i \le |\operatorname{supp} x|.
\]
We fix any $i \le |\operatorname{supp} x|$ and follow the notations of the previous sub-section (the way it was described in Lemma 3.10). Recall that $\max_{(v, v') \in \mathcal{R}} |h(v) - h(v')| \le 2 \|x\|_\infty$. Now, using Lemmas 3.8 and 3.9, inequality (30), as well as the relations $i \le n/4$ and $N \le d \le n/2 + c_0 n$, we obtain

\begin{align*}
\mathbb{E}[d_i^2 \mid \mathcal{F}^\pi_{i-1}] &\le \frac{32 d}{n} \Big[ |x_{\pi(i)}| + \frac{8 \|x\|_1}{n} + \big( \|x\|_\infty + \sigma_i \big) \frac{8}{n - i} \sum_{\ell = 1}^n \delta_{\pi(i),\ell} \Big]^2 \\
&\le \frac{32 d}{n} \Big[ |x_{\pi(i)}| + \frac{8}{\sqrt{n}} + \big( \|x\|_\infty + \sigma_i \big) \frac{16}{n} \Big( \frac{40\, \mathcal{P}_{\pi(i)}}{d} + 1 \Big) \Big]^2 \\
&\le \frac{32 d}{n} \Big[ 4 x_{\pi(i)}^2 + \frac{256}{n} + \big( 4 \|x\|_\infty^2 + 4 \sigma_i^2 \big) \frac{256}{n^2} \Big( \frac{40\, \mathcal{P}_{\pi(i)}}{d} + 1 \Big)^2 \Big],
\end{align*}
where in the last inequality we used the convexity of the square. Since $\sigma_\ell \le \sigma_i$ for any $i \le \ell \le |\operatorname{supp} x|$, we have
\[
\mathbb{E}[d_\ell^2 \mid \mathcal{F}^\pi_{\ell - 1}] \le \frac{32 d}{n} \Big[ 4 x_{\pi(\ell)}^2 + \frac{256}{n} + \big( 4 \|x\|_\infty^2 + 4 \sigma_i^2 \big) \frac{256}{n^2} \Big( \frac{40\, \mathcal{P}_{\pi(\ell)}}{d} + 1 \Big)^2 \Big]
\]
for any $i \le \ell \le |\operatorname{supp} x|$. Summing over all such $\ell$'s, we get
\[
\sigma_i^2 \le \frac{128 d}{n} \Big[ \sum_{\ell = i}^{|\operatorname{supp} x|} x_{\pi(\ell)}^2 + \frac{64}{n} \big( |\operatorname{supp} x| - i + 1 \big) + \big( \|x\|_\infty^2 + \sigma_i^2 \big) \frac{256}{n^2} \sum_{\ell = i}^{|\operatorname{supp} x|} \Big( \frac{40\, \mathcal{P}_{\pi(\ell)}}{d} + 1 \Big)^2 \Big]. \tag{33}
\]
Note that, by the definition of the $\|\cdot\|_{\psi,n}$-norm and in view of the fact that the sequence $(\mathcal{P}_{\pi(\ell)})_{\ell \le |\operatorname{supp} x|}$ is non-decreasing, we get
\[
\mathcal{P}_{\pi(\ell)} \le L n \sqrt{d}\, \ln \frac{en}{|\operatorname{supp} x| - \ell + 1}, \quad i \le \ell \le |\operatorname{supp} x|. \tag{34}
\]
Moreover, Lemma 2.2 implies

\[
\sum_{\ell = i}^{|\operatorname{supp} x|} \mathcal{P}_{\pi(\ell)}^2 \le C L^2 n^2 d\, \big( |\operatorname{supp} x| - i + 1 \big) \ln^2 \frac{en}{|\operatorname{supp} x| - i + 1}
\]
for a sufficiently large universal constant $C$. Plugging this estimate into (33), we get
\[
\sigma_i^2 \le \frac{C' d}{n} \sum_{\ell = i}^{|\operatorname{supp} x|} x_{\pi(\ell)}^2 + \frac{C' d m}{n^2} + C' L^2 \big( \|x\|_\infty^2 + \sigma_i^2 \big)\, \frac{m}{n} \ln^2 \frac{en}{m}
\]
for an appropriate constant $C'$, where $m := |\operatorname{supp} x| - i + 1$. Now, if $C_1$ in (32) is sufficiently large, the above self-bounding estimate for $\sigma_i$ implies
\[
\sigma_i^2 \le \frac{2 C_1 d}{n} \sum_{\ell = i}^{|\operatorname{supp} x|} x_{\pi(\ell)}^2 + \frac{2 C_1 d m}{n^2} + 2 C_1 L^2 \|x\|_\infty^2\, \frac{m}{n} \ln^2 \frac{en}{m}.
\]
Using the condition $\|x\|_\infty \le \ln(2n)/\sqrt{n}$, the assumption on $d$ given by (29) and relation (32), we obtain
\[
\sigma^2 := \sigma_1^2 \le C_2\, \frac{d}{n}
\]
for an appropriate constant $C_2$, and
\[
\sigma_i^2 \le (1 + L^2)\, C_3\, d\, \|x\|_\infty^2\, \frac{|\operatorname{supp} x| - i + 1}{n}, \quad i \le |\operatorname{supp} x|,
\]
for a sufficiently large constant $C_3$.

Now, let us turn to estimating the absolute values of the $d_i$'s. Again, we fix any $i \le |\operatorname{supp} x|$ and follow the notations of the previous sub-section, replacing $\ell$ with $\pi(\ell)$ where appropriate. By Lemmas 3.8 and 3.9, inequality (30) and the above estimate of $\sigma_i$, we have
\begin{align*}
|d_i| &\le |x_{\pi(i)}| + \frac{8 \|x\|_1}{n} + \big( \|x\|_\infty + \sigma_i \big) \frac{8}{n - i} \sum_{\ell = 1}^n \delta_{\pi(i),\ell} \\
&\le C_4 \|x\|_\infty \Big[ 1 + \frac{L}{n \sqrt{d}} \sqrt{\frac{|\operatorname{supp} x| - i + 1}{n}}\; \mathcal{P}_{\pi(i)} \Big]
\end{align*}
for some constant $C_4 > 0$. Using first (34) and then relation (32), we deduce that
\[
|d_i| \le C_4 (1 + L) \|x\|_\infty.
\]
Thus, we get $M \le C_4 (1 + L) \|x\|_\infty$. Finally, we apply Theorem 2.5 with the parameters $M$ and $\sigma$ estimated above.

Now we can state the main result of the section.

Theorem 3.12. For any $L > 0$ there is $\gamma = \gamma(L) \in (0, 1]$ with the following property: assume that $\|\mathcal{P}\|_{\psi,n} \le L n \sqrt{d}$, let $x \in S^{n-1}$, and denote by $\eta$ the random variable
\[
\eta = \eta(v) := \langle v, x \rangle - \mathbb{E}_\Omega \langle x, \cdot \rangle, \quad v \in \Omega.
\]
Then
\[
\mathbb{E}_\Omega\, e^{\gamma \lambda \eta} \le \exp\Big( \frac{d}{n \|x\|_\infty^2}\, g(\lambda \|x\|_\infty) \Big), \quad \lambda > 0.
\]

Proof. Let $\alpha = \alpha(L) \in (0, 1)$ be the largest number in $(0, 1/4]$ satisfying both (31) and (32). We represent the vector $x$ as a sum
\[
x = x^1 + x^2 + \ldots + x^m,
\]
where $x^1, x^2, \ldots, x^m$ are vectors with pairwise disjoint supports such that $|\operatorname{supp} x^j| \le \alpha n$ ($j \le m$) and $m := \lceil n / \lfloor \alpha n \rfloor \rceil$. For every $j \le m$, applying either Lemma 3.10 or Lemma 3.11 (depending on the $\|\cdot\|_\infty$-norm of $x^j / \|x^j\|_2$), we obtain
\[
\max\big( \mathbb{E}\, e^{\beta \lambda \eta_j},\ \mathbb{E}\, e^{-\beta \lambda \eta_j} \big) \le \exp\Big( \frac{d\, \|x^j\|_2^2}{n \|x^j\|_\infty^2}\, g(\lambda \|x^j\|_\infty) \Big) \le \exp\Big( \frac{d}{n \|x\|_\infty^2}\, g(\lambda \|x\|_\infty) \Big), \quad \lambda > 0,
\]
for some $\beta = \beta(L) > 0$, where $\eta_j := \langle x^j, v \rangle - \mathbb{E}_\Omega \langle x^j, \cdot \rangle$, $v \in \Omega$.

Since $\eta = \eta_1 + \eta_2 + \ldots + \eta_m$ everywhere on $\Omega$, we get from Hölder's inequality

\[
\mathbb{E}\, e^{\beta \lambda \eta} = \mathbb{E} \prod_{j=1}^m e^{\beta \lambda \eta_j} \le \prod_{j=1}^m \big( \mathbb{E}\, e^{\beta m \lambda \eta_j} \big)^{1/m} \le \exp\Big( \frac{d}{n \|x\|_\infty^2}\, g(\lambda m \|x\|_\infty) \Big).
\]
The statement follows with $\gamma := \beta / m$.
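The decomposition step of the proof is elementary; here is a sketch of our own (a hypothetical helper, not from the paper) splitting a vector into pieces with pairwise disjoint supports of size at most $s$:

```python
def sparse_split(x, s):
    # Split x into vectors supported on successive blocks of at most s
    # coordinates; the pieces have pairwise disjoint supports and sum to x.
    n = len(x)
    pieces = []
    for start in range(0, n, s):
        piece = [0] * n
        for j in range(start, min(start + s, n)):
            piece[j] = x[j]
        pieces.append(piece)
    return pieces
```

With $s = \lfloor \alpha n \rfloor$ this produces the $m = \lceil n / \lfloor \alpha n \rfloor \rceil$ pieces used in the proof; there, each piece is additionally normalized before Lemma 3.10 or 3.11 is applied to it.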

The above theorem leaves open the question of estimating the expectation $\mathbb{E}_\Omega \langle \cdot, x \rangle$. This problem is addressed in the last statement of the section.

Proposition 3.13. For any non-zero vector $x \in \mathbb{R}^n$ we have
\[
\Big| \mathbb{E}_\Omega \langle \cdot, x \rangle - \frac{d_{m+1}^{out}}{n} \sum_{i=1}^n x_i \Big| \le \frac{C_{3.13}\, d\, \|x\|_1}{n^2} + \frac{C_{3.13}}{n}\, \|x\|_{\log,n} \|\mathcal{P}\|_{\psi,n},
\]
where $C_{3.13} > 0$ is a sufficiently large universal constant and $\|\cdot\|_{\log,n}$ is defined by (2).

Proof. Let $V$ be a random vector distributed on $\Omega$ according to the measure $\mathbb{P}_\Omega$. First, we compare the expectations of individual coordinates of $V$, using Lemma 3.1. We let the $\gamma_{i,j}$ be defined by (17). Recall that, according to Remark 3.2, we have $1 - 8 c_0 \le \gamma_{i,j} \le 1 + 50 c_0$. Take any $i \ne j \le n$ and define a bijective map $f : \Omega \to \Omega$ as
\[
f(v_1, v_2, \ldots, v_n) := (v_{\sigma(1)}, v_{\sigma(2)}, \ldots, v_{\sigma(n)}), \quad (v_1, v_2, \ldots, v_n) \in \Omega,
\]
where $\sigma$ is the transposition of $i$ and $j$. Then for any $v \in \Omega$, in view of Lemma 3.1, we have
\[
\mathbb{P}_\Omega(v) \le \max\big( \gamma_{i,j}, \gamma_{j,i} \big)\, \mathbb{P}_\Omega\big( f(v) \big).
\]
Hence,
\begin{align*}
\mathbb{E}_\Omega V_i = \mathbb{P}_\Omega\{ v \in \Omega : v_i = 1 \}
&\le \max\big( \gamma_{i,j}, \gamma_{j,i} \big) \sum_{v \in \Omega :\, v_i = 1} \mathbb{P}_\Omega\big( f(v) \big) \\
&= \max\big( \gamma_{i,j}, \gamma_{j,i} \big)\, \mathbb{P}_\Omega\{ v \in \Omega : v_j = 1 \}
= \max\big( \gamma_{i,j}, \gamma_{j,i} \big)\, \mathbb{E}_\Omega V_j.
\end{align*}
Together with the obvious relation $\sum_{i=1}^n \mathbb{E}_\Omega V_i = d_{m+1}^{out}$, this implies for any fixed $i \le n$:
\[
\sum_{j=1}^n \max\big( \gamma_{i,j}, \gamma_{j,i} \big)^{-1}\, \mathbb{E}_\Omega V_i \le d_{m+1}^{out} \le \sum_{j=1}^n \max\big( \gamma_{i,j}, \gamma_{j,i} \big)\, \mathbb{E}_\Omega V_i,
\]
whence
\[
\Big| \mathbb{E}_\Omega V_i - \frac{d_{m+1}^{out}}{n} \Big| \le \frac{C\, d_{m+1}^{out}}{n} \Big( \frac{1}{n} \sum_{j=1}^n \delta_{i,j} \Big),
\]
where the $\delta_{i,j}$ are defined by (23) and $C > 0$ is a universal constant. Thus, for any non-zero vector $x = (x_1, x_2, \ldots, x_n)$ we get, in view of (30),
\begin{align*}
\Big| \mathbb{E}_\Omega \langle V, x \rangle - \frac{d_{m+1}^{out}}{n} \sum_{i=1}^n x_i \Big|
&\le \frac{C\, d_{m+1}^{out}}{n} \sum_{i=1}^n |x_i| \Big( \frac{1}{n} \sum_{j=1}^n \delta_{i,j} \Big)
\le \frac{C' d_{m+1}^{out}}{n} \sum_{i=1}^n |x_i| \Big( \frac{1}{n d}\, \mathcal{P}_i + \frac{1}{n} \Big) \\
&= \frac{C' d_{m+1}^{out}\, \|x\|_1}{n^2} + \frac{C' d_{m+1}^{out}}{n^2 d} \sum_{i=1}^n |x_i|\, \mathcal{P}_i,
\end{align*}
where $C'$ is a universal constant.
Finally, applying Fenchel’s inequality to the sum on the right hand side and using the definition of the Orlicz norms and , we obtain k·kψ,n k·klog,n n n x x = x | i| Pi | i|Pi k klog,nkPkψ,n x i=1 i=1 log,n ψ,n X X k k kPk n x x x | i| ln | i| + exp / ≤k klog,nkPkψ,n x + x Pi kPkψ,n i=1 k klog,n k klog,n  X    (e + 1)n x . ≤ k klog,nkPkψ,n

The result follows.
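The Fenchel step in the proof above rests on the elementary Young-type inequality $uv\le u\ln_+(u)+e^v$ for $u,v\ge0$. The following numerical spot-check is our own illustration (the function name is ours, not from the paper):

```python
import math

def young_bound(u, v):
    """Right-hand side of the Young-type inequality u*v <= u*ln_+(u) + exp(v),
    used in the Fenchel step of the proof of Proposition 3.13."""
    return u * math.log(max(u, 1.0)) + math.exp(v)

# Spot-check the inequality on a grid of non-negative arguments.
for u in [0.0, 0.5, 1.0, 2.0, 10.0, 1e3]:
    for v in [0.0, 0.1, 1.0, 5.0, 10.0]:
        assert u * v <= young_bound(u, v) + 1e-12
```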

4 Tensorization

The goal of this section is to transfer the concentration inequality for a single row obtained in the previous section (Theorem 3.12) to the whole matrix. Throughout the section, we assume that the degree sequences $d^{in}$, $d^{out}$ satisfy (6) for some $d$, and that $d$ itself satisfies (29). Moreover, we always assume that the set of matrices $\mathcal M_n(d^{in},d^{out})$ is non-empty. It will be convenient to introduce in this section a "global" random object: a matrix $M$ uniformly distributed on $\mathcal M_n(d^{in},d^{out})$. Let $G$ be a directed graph on $n$ vertices with degree sequences $d^{in}$, $d^{out}$, and let $M=(M_{ij})$ be the adjacency matrix of $G$. Next, let $I$ be a subset of $[n]$ (possibly empty). We define the quantities $p^{col}_j(I,M)$, $p^{row}_j(I,M)$ ($j\le n$) as in the Introduction (let us repeat the definition here for convenience):

\[ p^{col}_j(I,M):=d^{in}_j-|\{q\in I:\,M_{qj}=1\}|=|\{q\in I^c:\,M_{qj}=1\}|; \]
\[ p^{row}_j(I,M):=d^{out}_j-|\{q\in I:\,M_{jq}=1\}|=|\{q\in I^c:\,M_{jq}=1\}|. \]
Again, we define the vectors $\mathcal P^{col}(I,M),\mathcal P^{row}(I,M)\in\mathbb R^n$ coordinate-wise as
\[ \mathcal P^{col}_j(I,M):=\sum_{\ell=1}^n\bigl|p^{col}_j(I,M)-p^{col}_\ell(I,M)\bigr|;\qquad \mathcal P^{row}_j(I,M):=\sum_{\ell=1}^n\bigl|p^{row}_j(I,M)-p^{row}_\ell(I,M)\bigr|. \]
Clearly, these objects are close relatives of the quantities $p_j$ and the vector $\mathcal P$ defined in the previous section. In fact, if $\widetilde{\mathcal M}_n$ is the subset of all matrices from $\mathcal M_n(d^{in},d^{out})$ with a fixed realization of the rows from $I$, then $p^{col}_j(I,\cdot)$ ($j\le n$) and $\mathcal P^{col}(I,\cdot)$ are constant on $\widetilde{\mathcal M}_n$ and, up to relabelling the graph vertices, correspond to the $p_j$'s and $\mathcal P$ from Section 3.

Note that Theorem 3.12 operates under the assumption that the vector $\mathcal P$, or, in the context of this section, the random vectors $\mathcal P^{col}(I,M)$ for appropriate subsets $I$, have small magnitude in the $\|\cdot\|_{\psi,n}$-norm, a fact which still needs to be established. For any $L>0$, let $\mathcal E_{\mathcal P}(L)$ be given by (4), i.e.

\[ \mathcal E_{\mathcal P}(L)=\bigl\{\|\mathcal P^{row}(I,M)\|_{\psi,n},\,\|\mathcal P^{col}(I,M)\|_{\psi,n}\le Ln\sqrt d\ \text{for any interval subset }I\subset[n]\text{ of cardinality at most }c_0n\bigr\}. \]
To make Theorem 3.12 useful, we need to show that for some appropriately chosen parameter $L$ the event $\mathcal E_{\mathcal P}(L)$ has probability close to one. Obviously, this will require much stronger assumptions on the degree sequences than the ones we have employed up to this point. But, even under the stronger assumptions on $d^{in}$, $d^{out}$, proving an upper estimate for $\|\mathcal P^{col}(I,M)\|_{\psi,n}$, $\|\mathcal P^{row}(I,M)\|_{\psi,n}$ will require us to use the concentration results from Section 3. In order not to create a vicious cycle, we will argue in the following manner: First, we apply Theorem 3.12 in the situation when the set $I$ has very small cardinality. It can be shown that in this case we get the required assumptions on $\|\mathcal P^{col}(I,M)\|_{\psi,n}$ for free, as long as the degree sequences satisfy certain additional conditions. This, in turn, will allow us to establish the required bounds for $\|\mathcal P^{col}(I,M)\|_{\psi,n}$ for "large" subsets $I$. Finally, having this result in possession, we will be able to use the full strength of Theorem 3.12 and complete the tensorization. Let us note that the condition $\|\mathcal P^{col}(I,M)\|_{\psi,n}=O(n\sqrt d)$ for a matrix $M\in\mathcal M_n(d^{in},d^{out})$ and a subset $I$ of cardinality at most $c_0n$ automatically implies an analog of condition (16), as long as $n$ is sufficiently large. To be more precise, we have the following

Lemma 4.1. There is a universal constant $c_{4.1}>0$ with the following property: Assume that for some matrix $M\in\mathcal M_n(d^{in},d^{out})$ and $I\subset[n]$ with $|I|\le c_0n$ we have
\[ \|\mathcal P^{col}(I,M)\|_{\psi,n}\le c_{4.1}\,nd/\ln n. \]
Then necessarily
\[ d^{in}_j\ge p^{col}_j(I,M)\ge(1-2c_0)d^{in}_j \]
for all $j\le n$.

Proof. Assume that $p^{col}_i(I,M)<(1-2c_0)d^{in}_i$ for some $i\le n$. Define
\[ J:=\bigl\{j\le n:\,p^{col}_j(I,M)<(1-1.5c_0)d^{in}_j\bigr\}. \]
Then, obviously,

\[ \bigl|\{(k,\ell)\in I\times[n]:\,M_{k\ell}=1\}\bigr|\ge\sum_{j\in J}1.5c_0d^{in}_j\ge1.4c_0d\,|J|. \]

On the other hand,

\[ \bigl|\{(k,\ell)\in I\times[n]:\,M_{k\ell}=1\}\bigr|=\sum_{k\in I}d^{out}_k\le c_0nd. \]

Thus, $|J|\le\frac57n$. This implies that
\[ \mathcal P^{col}_i(I,M)\ge\sum_{k\in J^c}\bigl(p^{col}_k(I,M)-p^{col}_i(I,M)\bigr)>c_0|J^c|\,d/4\ge c_0nd/14. \]

Hence, by (7), we get
\[ \|\mathcal P^{col}(I,M)\|_{\psi,n}>\frac{c_0nd}{14\ln(en)}. \]
The result follows.

The above lemma allows us not to worry about condition (16) and to focus our attention on the $\|\cdot\|_{\psi,n}$-norm of the vectors $\mathcal P^{col}(I,M)$. The bounds for $\|\mathcal P^{col}(I,M)\|_{\psi,n}$ are obtained in Proposition 4.5. But first we need to consider two auxiliary statements.

Lemma 4.2. For any $L>0$ there are $\gamma(L)\in(0,1]$ and $K=K(L)>0$ such that the following holds. Let the degree sequences $d^{in}$ and $d^{out}$ be such that $\|(d^{in}_i-d)^n_{i=1}\|_{\psi,n},\,\|(d^{out}_i-d)^n_{i=1}\|_{\psi,n}\le L\sqrt d$, where $\|\cdot\|_{\psi,n}$ is defined by (1). Further, let $J\subset[n]$ be a subset of cardinality $\sqrt d/2\le|J|\le\sqrt d$, and let $I\subset[n]$ be any non-empty subset. Define a $|J|$-dimensional random vector in $\mathbb R^J$ as
\[ v(I):=(v_k)_{k\in J},\qquad v_k:=p^{row}_k(I,M)-\frac{d^{out}_k|I^c|}{n},\quad k\in J. \]

Then for any subset $T\subset J$ and any $t\ge K\sqrt d\,|T|$, we have
\[ \mathbf P\Bigl\{\sum_{k\in T}v_k\ge t\Bigr\}\le\exp\Bigl(-t\gamma\ln\Bigl(1+\frac{t\gamma n}{d|I||T|}\Bigr)\Bigr). \]
Proof. Denote
\[ x^I:=|I|^{-1/2}\sum_{i\in I}e_i. \]
To simplify the notation, let us assume that $J=\{1,\dots,|J|\}$ (we can permute the degree sequence $d^{out}$ accordingly). Take any matrix $M\in\mathcal M_n(d^{in},d^{out})$. Note that, by the assumption on the cardinality of $J$, we have

\[ \bigl|p^{col}_\ell([k],M)-p^{col}_{\ell'}([k],M)-d^{in}_\ell+d^{in}_{\ell'}\bigr|\le\sqrt d,\qquad k<|J|,\ \ell,\ell'\le n. \]
Hence, for any $j\le n$, $k\le|J|$ we have
\[ \mathcal P^{col}_j([k],M)\le n\sqrt d+\sum_{\ell=1}^n|d^{in}_\ell-d|+n|d^{in}_j-d|. \]

Note that $\|\cdot\|_1\le n\|\cdot\|_{\psi,n}$ by convexity of $\exp(\cdot)$. Then, in view of the assumptions on $\|(d^{in}_i-d)^n_{i=1}\|_{\psi,n}$, we get

\[ \mathcal P^{col}_j([k],M)\le(1+L)n\sqrt d+n|d^{in}_j-d|. \]
Thus, by the triangle inequality,

\[ \|\mathcal P^{col}([k],M)\|_{\psi,n}\le(1+L)n\sqrt d\,\|(1,1,\dots,1)\|_{\psi,n}+n\|(d^{in}_i-d)^n_{i=1}\|_{\psi,n}\le(L+2)n\sqrt d \]
for any $k\le|J|$ and $M\in\mathcal M_n(d^{in},d^{out})$. For every $k\le|J|$, we denote by $\eta_k$ the random variable
\[ \eta_k:=\langle \mathrm{row}_k(M),x^I\rangle-\mathbf E\bigl[\langle \mathrm{row}_k(\cdot),x^I\rangle\,\big|\,\mathrm{row}_j(M),\ j\le k-1\bigr]. \]
In view of the above estimate of $\|\mathcal P^{col}([k],M)\|_{\psi,n}$ and Theorem 3.12, there is $\gamma'(L)\in(0,1)$ such that for any $\lambda>0$ we have

\[ \mathbf E\bigl[e^{\gamma'\lambda\sqrt{|I|}\,\eta_k}\,\big|\,\mathrm{row}_j(M),\ j\le k-1\bigr]\le2\exp\Bigl(\frac{d|I|}{n}\,g(\lambda)\Bigr) \]
for every $k\le|J|$ (recall that $\|x^I\|_\infty=|I|^{-1/2}$). Further, for any $k\le|J|$ we have
\[ v_k=p^{row}_k(I,M)-\frac{d^{out}_k|I^c|}{n}=-\sqrt{|I|}\Bigl(\langle \mathrm{row}_k(M),x^I\rangle-\frac{d^{out}_k\sqrt{|I|}}{n}\Bigr). \]
Thus, using Proposition 3.13 and Lemma 2.3, we get

\[ |v_k|\le\sqrt{|I|}\,|\eta_k|+\sqrt{|I|}\,\Bigl|\mathbf E\bigl[\langle \mathrm{row}_k(\cdot),x^I\rangle\,\big|\,\mathrm{row}_j(M),\ j\le k-1\bigr]-\frac{d^{out}_k\sqrt{|I|}}{n}\Bigr|\le\sqrt{|I|}\,|\eta_k|+\frac{\mu\sqrt d\,|I|}{n}\ln\Bigl(\frac{2n}{|I|}\Bigr) \]
for some $\mu=\mu(L)\ge1$. Hence, for any $k\le|J|$ and any $\lambda>0$ we have

\[ \mathbf E\bigl[e^{\gamma'\lambda v_k}\,\big|\,\mathrm{row}_j(M),\ j\le k-1\bigr]\le2\exp\Bigl(\frac{d|I|}{n}g(\lambda)+\gamma'\lambda\,\frac{\mu\sqrt d\,|I|}{n}\ln\frac{2n}{|I|}\Bigr). \]
By Lemma 2.6, this implies that for any subset $T\subset J$ and any $\lambda>0$ we have

\[ \mathbf E\,e^{\gamma'\lambda\sum_{k\in T}v_k}\le2^{|T|}\exp\Bigl(|T|\Bigl[\frac{d|I|}{n}g(\lambda)+\gamma'\lambda\,\frac{\mu\sqrt d\,|I|}{n}\ln\frac{2n}{|I|}\Bigr]\Bigr). \]
Now, fix any $t\ge4\mu\sqrt d\,|T|$. By the above estimate for the moment generating function and Markov's inequality, we get

\begin{align*} \mathbf P\Bigl\{\sum_{k\in T}v_k\ge t\Bigr\}&\le\exp\Bigl(-\gamma'\lambda t+|T|+\frac{d|I||T|}{n}g(\lambda)+\gamma'\lambda|T|\,\frac{\mu\sqrt d\,|I|}{n}\ln\frac{2n}{|I|}\Bigr)\\ &\le\exp\Bigl(-\frac12\gamma'\lambda t+|T|+\frac{d|I||T|}{n}g(\lambda)\Bigr) \end{align*}
for any $\lambda>0$. It is easy to see that the last expression is minimized for $\lambda:=\ln\bigl(1+\frac{\gamma'nt}{2d|I||T|}\bigr)$. Plugging this value of $\lambda$ into the exponent, we get
\begin{align*} \mathbf P\Bigl\{\sum_{k\in T}v_k\ge t\Bigr\}&\le\exp\Bigl(|T|+\frac12\gamma't-\frac12\gamma'\lambda t-\frac{d|I||T|}{n}\lambda\Bigr)\\ &=\exp\Bigl(|T|+\frac{d|I||T|}{n}\Bigl(\frac{\gamma'nt}{2d|I||T|}-\frac{\gamma'nt}{2d|I||T|}\lambda-\lambda\Bigr)\Bigr)\\ &=\exp\Bigl(|T|-\frac{d|I||T|}{n}H\Bigl(\frac{\gamma'nt}{2d|I||T|}\Bigr)\Bigr), \end{align*}
where the function $H$ is defined by (3). Finally, applying relation (11), we get that for a large enough $K=K(L)$ and all $t\ge K\sqrt d\,|T|$ we have
\[ |T|-\frac{d|I||T|}{n}H\Bigl(\frac{\gamma'nt}{2d|I||T|}\Bigr)\le-\frac{d|I||T|}{2n}H\Bigl(\frac{\gamma'nt}{2d|I||T|}\Bigr). \]
The result follows.

Lemma 4.3. Let $a\in(0,1)$ and suppose that $n^a\le d$. Further, let the degree sequences $d^{in}$ and $d^{out}$, the subset $J$, the random vectors $v(I)\in\mathbb R^J$ and the parameters $L$ and $\gamma(L),K(L)$ be the same as in Lemma 4.2. Then for a sufficiently large universal constant $C_{4.3}$ we have

\[ \mathbf P\Bigl\{\|v(I)\|_{\psi,|J|}\le\frac{C_{4.3}K\sqrt d}{\gamma a}\ \text{for any interval subset }I\subset[n]\text{ of cardinality at most }c_0n\Bigr\}\ge1-\frac1n. \]
Proof. Let $C_{4.3}$ be a sufficiently large constant (its value can be recovered from the proof below). Further, let $I\subset[n]$ be a fixed interval subset of $[n]$ of size at most $c_0n$. In view of Lemma 2.4, for any vector $x\in\mathbb R^{|J|}$ with $\|x\|_{\psi,|J|}\ge C_{4.3}\gamma^{-1}a^{-1}\sqrt d$ there is a natural $t\le2\ln(e|J|)$ such that
\[ \Bigl|\Bigl\{i\in J:\,|x_i|\ge\frac{C_{4.3}t\sqrt d}{2\gamma a}\Bigr\}\Bigr|\ge|J|(2e)^{-t}. \]

In particular, we can write
\[ p:=\mathbf P\Bigl\{\|v(I)\|_{\psi,|J|}\ge\frac{C_{4.3}\sqrt d}{\gamma a}\Bigr\}\le\sum_{t=1}^{\lfloor2\ln(e|J|)\rfloor}\ \sum_{\substack{T\subset J,\\ |T|=\lceil|J|(2e)^{-t}\rceil}}\mathbf P\Bigl\{\forall k\in T,\ v_k(I)\ge\frac{C_{4.3}t\sqrt d}{2\gamma a}\Bigr\}. \]
Then, applying Lemma 4.2, we get

\begin{align*} p&\le\sum_{t=1}^{\lfloor2\ln(e|J|)\rfloor}\ \sum_{\substack{T\subset J,\\ |T|=\lceil|J|(2e)^{-t}\rceil}}\exp\Bigl(-\frac{C_{4.3}t\sqrt d\,|T|}{2a}\ln\Bigl(1+\frac{C_{4.3}nt}{2a\sqrt d\,|I|}\Bigr)\Bigr)\\ &\le\sum_{t=1}^{\lfloor2\ln(e|J|)\rfloor}\exp\Bigl(4\lceil|J|(2e)^{-t}\rceil t-\frac{C_{4.3}t\sqrt d\,\lceil|J|(2e)^{-t}\rceil}{2a}\ln\Bigl(1+\frac{C_{4.3}nt}{2a\sqrt d\,|I|}\Bigr)\Bigr). \end{align*}
Now, using that $\ln\bigl(1+\frac{C_{4.3}nt}{2a\sqrt d\,|I|}\bigr)\ge1$ for any $t$ in the above sum, we get
\[ p\le\sum_{t=1}^{\lfloor2\ln(e|J|)\rfloor}\exp\Bigl(-\frac{C_{4.3}t\,\lceil|J|(2e)^{-t}\rceil}{4a}\Bigr)\le\lfloor2\ln(e|J|)\rfloor\max_{t=1,\dots,\lfloor2\ln(e|J|)\rfloor}\exp\Bigl(-\frac{C_{4.3}t\,|J|(2e)^{-t}}{4a}\Bigr)\ll\frac1{n^3}, \]
where the last inequality follows from the lower bound on $d$ and the choice of $C_{4.3}$. It remains to apply the union bound over all interval subsets (of which there are $O(n^2)$) to finish the proof.
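The union bound at the end of the proof runs over interval subsets of $[n]$ only, of which there are $O(n^2)$. A small enumeration confirming the count (our own illustration, with a hypothetical helper name):

```python
def interval_subsets(n):
    """Enumerate all contiguous interval subsets of [n] = {1, ..., n},
    including the empty set."""
    subs = [()]
    for a in range(1, n + 1):
        for b in range(a, n + 1):
            subs.append(tuple(range(a, b + 1)))
    return subs

# There are n(n+1)/2 non-empty intervals, hence O(n^2) subsets in total --
# cheap enough for the union bound in the proof of Lemma 4.3.
for n in (1, 5, 20):
    assert len(interval_subsets(n)) == n * (n + 1) // 2 + 1
```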

Remark 4.4. It is easy to see from the proof that the probability estimate $1-n^{-1}$ in the lemma can be replaced with $1-n^{-m}$ for any $m>0$ at the expense of replacing $C_{4.3}$ by a larger constant.

As a consequence of the above, we obtain

Proposition 4.5. For any parameters $a\in(0,1)$, $L\ge1$ and $m\in\mathbb N$ there are $n_0=n_0(a,m,L)$ and $\widetilde L=\widetilde L(L,m)$ (i.e. $\widetilde L$ depends only on $L$ and $m$) with the following property: Let $n\ge n_0$, $n^a\le d$ and let the degree sequences $d^{in}$ and $d^{out}$ be such that $\|(d^{in}_i-d)^n_{i=1}\|_{\psi,n},\,\|(d^{out}_i-d)^n_{i=1}\|_{\psi,n}\le L\sqrt d$. Then the event $\mathcal E_{\mathcal P}(a^{-1}\widetilde L)$ (defined by formula (4)) has probability at least $1-n^{-m}$.

Proof. Let us partition $[n]$ into at most $2\sqrt d$ subsets $J_1,J_2,\dots,J_r$ ($r\le2\sqrt d$), where each $J_j$ satisfies $\sqrt d/2\le|J_j|\le\sqrt d$. For any $j\le r$, in view of Lemma 4.3, with probability at least $1-n^{-m-2}$ the $|J_j|$-dimensional vector
\[ v^j(I)=(v^j_k)_{k\in J_j},\qquad v^j_k:=p^{row}_k(I,M)-\frac{d^{out}_k|I^c|}{n},\quad k\in J_j, \]

satisfies $\|v^j(I)\|_{\psi,|J_j|}\le K'a^{-1}\sqrt d$ for some $K'=K'(m,L)\ge1$ for any interval subset $I\subset[n]$ of cardinality at most $c_0n$. Hence, with probability at least $1-n^{-m-1}$, the concatenated $n$-dimensional vector

\[ v(I)=(v_1,v_2,\dots,v_n),\qquad v_k=v^j_k\ \text{ for any }j\le r\text{ and }k\in J_j, \]
satisfies $\|v(I)\|_{\psi,n}\le K'a^{-1}\sqrt d$ for any interval subset $I\subset[n]$ of cardinality at most $c_0n$. Next, note that for any $k\le n$ and any $I\subset[n]$ we have
\begin{align*} nv_k&=np^{row}_k(I,M)-d^{out}_k|I^c|\\ &\ge\sum_{i=1}^n\bigl|p^{row}_k(I,M)-p^{row}_i(I,M)\bigr|-\sum_{i=1}^n\Bigl|p^{row}_i(I,M)-\frac{d^{out}_i|I^c|}{n}\Bigr|-\sum_{i=1}^n\Bigl|\frac{d^{out}_i|I^c|}{n}-\frac{d|I^c|}{n}\Bigr|-|I^c|\,\bigl|d^{out}_k-d\bigr|\\ &\ge\mathcal P^{row}_k(I,M)-\sum_{i=1}^n|v_i|-\sum_{i=1}^n|d^{out}_i-d|-n|d^{out}_k-d|. \end{align*}

Hence, in view of the convexity of $\exp(\cdot)$, we get
\[ \mathcal P^{row}_k(I,M)\le nv_k+n|d^{out}_k-d|+n\|(d^{out}_i-d)^n_{i=1}\|_{\psi,n}+n\|v(I)\|_{\psi,n}, \]
which implies that
\[ \|\mathcal P^{row}(I,M)\|_{\psi,n}\le2n\|(d^{out}_i-d)^n_{i=1}\|_{\psi,n}+2n\|v(I)\|_{\psi,n}. \]

Therefore, with probability at least $1-n^{-m-1}$, we have $\|\mathcal P^{row}(I,M)\|_{\psi,n}\le\widetilde La^{-1}n\sqrt d$ for any interval subset $I\subset[n]$ of cardinality at most $c_0n$, where $\widetilde L:=2K'+2L$. Clearly, the same estimate holds for $\mathcal P^{col}(I,M)$, and the proof is complete.

Let us introduce a family of random variables on the probability space $(\mathcal M_n(d^{in},d^{out}),\mathbf P)$ as follows. Take any index $i\le n$ and any subset $I\subset[n]$ not containing $i$. Further, let $x\in\mathbb R^n$ be any vector. Then we define $\theta(i,I,x):\mathcal M_n(d^{in},d^{out})\to\mathbb R$ as
\[ \theta(i,I,x):=\mathbf E\bigl[\langle \mathrm{row}_i(M),x\rangle\,\big|\,\mathrm{row}_j(M),\ j\in I\bigr]. \]

In other words, $\theta(i,I,x)$ is the conditional expectation of $\langle \mathrm{row}_i(M),x\rangle$, conditioned on the realizations of the rows $\mathrm{row}_j(M)$ ($j\in I$).

Lemma 4.6. Let $L>0$ be some parameter and let the event $\mathcal E_{\mathcal P}(L)$ be defined by (4). Let $I$ be any non-empty interval subset of $[n]$ of length at most $c_0n$ and let $Q=(Q_{ij})$ be a fixed $n\times n$ matrix with all entries with indices outside $I\times[n]$ equal to zero. Then for any $t>0$ we have

\[ \mathbf P\Bigl\{\Bigl|\sum_{i\in I}\Bigl(\sum_{j=1}^nM_{ij}Q_{ij}-\theta\bigl(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q)\bigr)\Bigr)\Bigr|>t\ \Big|\ M\in\mathcal E_{\mathcal P}(L)\Bigr\}\le\frac{2}{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\Bigl(-\frac{d\|Q\|_{HS}^2}{n\|Q\|_\infty^2}H\Bigl(\frac{\gamma tn\|Q\|_\infty}{d\|Q\|_{HS}^2}\Bigr)\Bigr), \]
where $\gamma=\gamma(L)$ is taken from Theorem 3.12.

Proof. Fix for a moment any $i\in I$ and let
\[ \mathcal E_i:=\bigl\{\|\mathcal P^{col}(\{\inf I,\dots,i-1\},M)\|_{\psi,n}\le Ln\sqrt d\bigr\}. \]

Further, denote by $\eta_i$ the random variable
\[ \eta_i:=\Bigl(\sum_{j=1}^nM_{ij}Q_{ij}-\theta\bigl(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q)\bigr)\Bigr)\chi_i, \]
where $\chi_i$ is the indicator function of the event $M\in\mathcal E_i$. Note that $\|\mathcal P^{col}(\{\inf I,\dots,i-1\},M)\|_{\psi,n}$ is uniquely determined by the realizations of $\mathrm{row}_{\inf I}(M),\dots,\mathrm{row}_{i-1}(M)$. Now, assume that $Y_j$ ($j=\inf I,\dots,i-1$) is any realization of the rows $\mathrm{row}_j(M)$ ($j=\inf I,\dots,i-1$) such that, conditioned on this realization, $M$ belongs to $\mathcal E_i$. That is,
\[ \bigl\{\mathrm{row}_j(M)=Y_j,\ j=\inf I,\dots,i-1\bigr\}\subset\mathcal E_i. \]
Then, applying Theorem 3.12, we obtain

\[ \mathbf E\bigl[e^{\gamma\lambda\sum_{j=1}^nM_{ij}Q_{ij}-\gamma\lambda\theta(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q))}\,\big|\,\mathrm{row}_j(M)=Y_j,\ j=\inf I,\dots,i-1\bigr]\le2\exp\Bigl(\frac{d\|\mathrm{row}_i(Q)\|^2}{n\max_{j\le n}Q_{ij}^2}\,g\bigl(\lambda\max_{j\le n}Q_{ij}\bigr)\Bigr),\qquad\lambda>0, \]
for some $\gamma=\gamma(L)$. Note that the value of $\eta_i$ is uniquely determined by the realizations of the rows $\mathrm{row}_k(M)$ ($k\le i$). Hence, in view of the definition of $\eta_i$, we get from the last relation
\[ \mathbf E\bigl[e^{\lambda\eta_i}\,\big|\,\eta_j,\ j=\inf I,\dots,i-1\bigr]\le2\exp\Bigl(\frac{d\|\mathrm{row}_i(Q)\|^2}{n\max_{j\le n}Q_{ij}^2}\,g\bigl(\lambda\gamma^{-1}\max_{j\le n}Q_{ij}\bigr)\Bigr),\qquad\lambda>0.
\]

Now, let
\[ \eta:=\sum_{i\in I}\eta_i. \]
By the above inequality and by Corollary 2.6, we get
\[ \mathbf P\{\eta\ge t\}\le2\exp\Bigl(-\frac{d\|Q\|_{HS}^2}{n\|Q\|_\infty^2}H\Bigl(\frac{\gamma tn\|Q\|_\infty}{d\|Q\|_{HS}^2}\Bigr)\Bigr),\qquad t>0. \]
Finally, note that

\[ \mathcal E_{\mathcal P}(L)\subset\bigcap_{i\in I}\mathcal E_i, \]
whence, restricted to $\mathcal E_{\mathcal P}(L)$, the variable $\eta$ is equal to
\[ \sum_{i\in I}\Bigl(\sum_{j=1}^nM_{ij}Q_{ij}-\theta\bigl(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q)\bigr)\Bigr). \]
It follows that

\[ \mathbf P\Bigl\{\sum_{i\in I}\Bigl(\sum_{j=1}^nM_{ij}Q_{ij}-\theta\bigl(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q)\bigr)\Bigr)\ge t\ \Big|\ M\in\mathcal E_{\mathcal P}(L)\Bigr\}\le\frac1{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\Bigl(-\frac{d\|Q\|_{HS}^2}{n\|Q\|_\infty^2}H\Bigl(\frac{\gamma tn\|Q\|_\infty}{d\|Q\|_{HS}^2}\Bigr)\Bigr),\qquad t>0. \]
Applying a similar argument to the variable $-\eta$, we get the result.

The next lemma allows us to replace the variables $\theta\bigl(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q)\bigr)$ with constants.

Lemma 4.7. For any $L\ge1$ there is $n_0=n_0(L)$ with the following property. Let $n\ge n_0$, let $I$ be any non-empty interval subset of $[n]$ of length at most $c_0n$ and let $Q=(Q_{ij})$ be a fixed $n\times n$ matrix with all entries with indices outside $I\times[n]$ equal to zero. Then
\[ \Bigl|\sum_{i\in I}\theta\bigl(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q)\bigr)-\sum_{i\in I}\frac{d^{out}_i}{n}\sum_{j=1}^nQ_{ij}\Bigr|\le C_{4.7}L\sqrt d\sum_{i\in I}\|\mathrm{row}_i(Q)\|_{\log,n} \]
everywhere on $\mathcal E_{\mathcal P}(L)$. Here, $C_{4.7}>0$ is a universal constant.

Proof. In view of the relation $\|\cdot\|_1\le en\|\cdot\|_{\log,n}$, which follows from the convexity of the function $t\ln_+(t)$, it is enough to show that for any $i\in I$ we have
\[ \Bigl|\theta\bigl(i,\{\inf I,\dots,i-1\},\mathrm{row}_i(Q)\bigr)-\frac{d^{out}_i}{n}\sum_{j=1}^nQ_{ij}\Bigr|\le\frac{C\sqrt d}{n}\|\mathrm{row}_i(Q)\|_1+\frac Cn\|\mathrm{row}_i(Q)\|_{\log,n}\,\|\mathcal P^{col}(\{\inf I,\dots,i-1\},M)\|_{\psi,n} \]
everywhere on $\mathcal E_{\mathcal P}(L)$ for a sufficiently large constant $C>0$. But this follows immediately from Proposition 3.13.

Finally, we can prove the main technical result of the paper. To make the statement self-contained, we explicitly mention all the assumptions on the parameters. Given an $n\times n$ matrix $Q$, we define the shift $\Delta(Q)$ as
\[ \Delta(Q):=\sqrt d\sum_{i=1}^n\|\mathrm{row}_i(Q)\|_{\log,n}. \]
Theorem 4.8. For any $L\ge1$ there are $\gamma=\gamma(L)>0$ and $n_0=n_0(L)$ with the following properties. Assume that $n\ge n_0$ and that the degree sequences $d^{in}$, $d^{out}$ satisfy
\[ (1-c_0)d\le d^{in}_i,\,d^{out}_i\le d,\qquad i\le n, \]
for some natural $d$ with $C_1\ln^2n\le d\le(1/2+c_0)n$. Further, assume that the set $\mathcal M_n(d^{in},d^{out})$ is non-empty. Then, with $\mathcal E_{\mathcal P}(L)$ defined by (4), we have for any $n\times n$ matrix $Q$:
\[ \mathbf P\Bigl\{\Bigl|\sum_{i=1}^n\sum_{j=1}^nM_{ij}Q_{ij}-\sum_{i=1}^n\frac{d^{out}_i}{n}\sum_{j=1}^nQ_{ij}\Bigr|>t+C_2L\Delta(Q)\ \Big|\ M\in\mathcal E_{\mathcal P}(L)\Bigr\}\le\frac{C_3}{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\Bigl(-\frac{d\|Q\|_{HS}^2}{n\|Q\|_\infty^2}H\Bigl(\frac{\gamma tn\|Q\|_\infty}{d\|Q\|_{HS}^2}\Bigr)\Bigr),\qquad t>0. \]
Here, $C_1,C_2,C_3>0$ are sufficiently large universal constants.

Proof. Let us partition $[n]$ into $\lceil2/c_0\rceil$ interval subsets $I_j$ ($j\le\lceil2/c_0\rceil$), with each $I_j$ of cardinality at most $c_0n$. Further, define $n\times n$ matrices $Q^j$ ($j\le\lceil2/c_0\rceil$) as
\[ Q^j_{k\ell}:=\begin{cases}Q_{k\ell},&\text{if }k\in I_j;\\ 0,&\text{otherwise.}\end{cases} \]
Note that each $Q^j$ satisfies the assumptions of both Lemma 4.6 and Lemma 4.7. Combining the lemmas, we get

\[ \mathbf P\Bigl\{\Bigl|\sum_{k\in I_j}\sum_{\ell=1}^nM_{k\ell}Q_{k\ell}-\sum_{k\in I_j}\frac{d^{out}_k}{n}\sum_{\ell=1}^nQ_{k\ell}\Bigr|>t+CL\Delta_j\ \Big|\ M\in\mathcal E_{\mathcal P}(L)\Bigr\}\le\frac{C}{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\Bigl(-\frac{d\|Q^j\|_{HS}^2}{n\|Q^j\|_\infty^2}H\Bigl(\frac{\gamma tn\|Q^j\|_\infty}{d\|Q^j\|_{HS}^2}\Bigr)\Bigr),\qquad t>0, \]
where $\Delta_j:=\sqrt d\sum_{k\in I_j}\|\mathrm{row}_k(Q)\|_{\log,n}$. It is not difficult to check that the function $f(s,w):=\frac{s^2}{w^2}H\bigl(\frac{bw}{s^2}\bigr)$ is decreasing in both arguments $s$ and $w$ for any value of the parameter $b>0$. Hence, the above quantity is majorized by
\[ \frac{C}{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\Bigl(-\frac{d\|Q\|_{HS}^2}{n\|Q\|_\infty^2}H\Bigl(\frac{\gamma tn\|Q\|_\infty}{d\|Q\|_{HS}^2}\Bigr)\Bigr). \]
Finally, note that if for some matrix $M\in\mathcal M_n(d^{in},d^{out})$ and $t>0$ we have
\[ \Bigl|\sum_{k=1}^n\sum_{\ell=1}^nM_{k\ell}Q_{k\ell}-\sum_{k=1}^n\frac{d^{out}_k}{n}\sum_{\ell=1}^nQ_{k\ell}\Bigr|>t+CL\Delta(Q) \]

then necessarily

\[ \Bigl|\sum_{k\in I_j}\sum_{\ell=1}^nM_{k\ell}Q_{k\ell}-\sum_{k\in I_j}\frac{d^{out}_k}{n}\sum_{\ell=1}^nQ_{k\ell}\Bigr|>\frac t{\lceil2/c_0\rceil}+CL\Delta_j \]
for some $j\le\lceil2/c_0\rceil$. The result follows.

Remark 4.9. It is easy to see that the constant $C_3$ in the above theorem can be replaced by any number strictly greater than one, at the expense of decreasing $\gamma$.

Remark 4.10. Note that, in view of Lemma 2.3, we have

\[ \Delta(Q)\le C_{2.3}\sqrt{\frac dn}\sum_{i=1}^n\|\mathrm{row}_i(Q)\|\le C_{2.3}\sqrt d\,\|Q\|_{HS}. \]
In particular, if $x$ and $y$ are unit vectors in $\mathbb R^n$ then $\Delta(xy^T)\le C_{2.3}\sqrt d$. Further, if all non-zero entries of the matrix $Q$ are located in a submatrix of size $k\times\ell$ (for some $k,\ell\le n$) then, again applying Lemma 2.3, we get

\[ \Delta(Q)\le C_{2.3}\frac{\sqrt{d\ell}}{n}\ln\Bigl(\frac{2n}{\ell}\Bigr)\sum_{i=1}^n\|\mathrm{row}_i(Q)\|\le C_{2.3}\frac{\sqrt{dk\ell}}{n}\ln\Bigl(\frac{2n}{\ell}\Bigr)\|Q\|_{HS}. \]
In particular, given a $k$-sparse unit vector $x$ and an $\ell$-sparse unit vector $y$, we have

\[ \Delta(xy^T)\le C_{2.3}\frac{\sqrt{dk\ell}}{n}\ln\Bigl(\frac{2n}{\ell}\Bigr). \]
Remark 4.11. Assume that $\|(d^{out}_i-d)^n_{i=1}\|_{\psi,n}\le K\sqrt d$ for some parameter $K>0$. Then we have, in view of Lemma 2.2:

\[ \Bigl|\sum_{i=1}^n\frac{d^{out}_i}{n}\sum_{j=1}^nQ_{ij}-\frac dn\sum_{i,j=1}^nQ_{ij}\Bigr|\le\frac1n\sum_{j=1}^n\Bigl|\sum_{i=1}^n(d^{out}_i-d)Q_{ij}\Bigr|\le\frac1n\sum_{j=1}^n\bigl\|(d^{out}_i-d)^n_{i=1}\bigr\|\,\|\mathrm{col}_j(Q)\|\le C_{2.2}K\sqrt d\,\|Q\|_{HS}. \]
Together with Remark 4.10, this implies that the quantity $\sum_{i=1}^n\frac{d^{out}_i}{n}\sum_{j=1}^nQ_{ij}$ in the estimate of Theorem 4.8 can be replaced with $\frac dn\sum_{i,j=1}^nQ_{ij}$ at the expense of substituting $\sqrt d\,\|Q\|_{HS}$ for the shift $\Delta(Q)$.

Remark 4.12. The Bennett-type concentration inequality for linear forms obtained in [12] (see formula (6) there) contains a parameter playing the same role as the shift $\Delta(Q)$ in our theorem. However, the dependence of the "shift" in [12] on the matrix $Q$ is fundamentally different from ours. Given a random matrix $\widetilde M$ uniformly distributed on the set $\mathcal S_n(d)$, for every matrix $Q$ with non-negative entries and zero diagonal, Theorem 5.1 of [12] gives:
\[ \mathbf P\Bigl\{\Bigl|\sum_{i,j=1}^n\widetilde M_{ij}Q_{ij}-\frac dn\sum_{i,j=1}^nQ_{ij}\Bigr|\ge t+\frac{Cd^2}{n^2}\sum_{i,j=1}^nQ_{ij}\Bigr\}\le2\exp\Bigl(-\frac{d\|Q\|_{HS}^2}{n\|Q\|_\infty^2}H\Bigl(\frac{ctn\|Q\|_\infty}{d\|Q\|_{HS}^2}\Bigr)\Bigr). \]
In view of (8), the "shift" $\frac{d^2}{n^2}\sum_{i,j=1}^nQ_{ij}$ is majorized by $\frac{ed^2}{n}\sum_{i=1}^n\|\mathrm{row}_i(Q)\|_{\log,n}=\frac{ed^{3/2}}{n}\Delta(Q)$. Thus, the concentration inequality from [12] gives sharper estimates than ours provided that $d=O(n^{2/3})$. On the other hand, for $d\gg n^{2/3}$ the estimate in [12] becomes insufficient to produce the optimal upper bound on the matrix norm, whereas our shift $\Delta(Q)$ gives satisfactory estimates for all large enough $d$. Let us emphasize that this comparison is somewhat artificial since [12] deals only with undirected graphs and symmetric matrices, while our Theorem 4.8 applies to the directed setting.

The proof of Theorem D from the Introduction is obtained by combining Theorem 4.8 with Remarks 4.9–4.11 and Proposition 4.5.

Let us finish this section by discussing the necessity of the tensorization procedure. As we mentioned in the Introduction, Freedman's inequality for martingales was employed in the paper [13] dealing with the permutation model of regular graphs (when the adjacency matrix of the corresponding random multigraph is constructed using independent random permutation matrices and their transposes). It was proved in [13] that the second largest eigenvalue of such a graph is of order $O(\sqrt d)$ with high probability. Importantly, in [13] the martingale sequence was constructed for the entire matrix, thereby yielding a concentration inequality directly after applying Freedman's theorem, without any need for a tensorization procedure. The fact that in our paper we construct the martingales row by row is essentially responsible for the presence of the "shift" $\Delta(Q)$ in our concentration inequality, and it forced us to develop the lengthy and technical tensorization. However, when constructing a single martingale sequence over the entire matrix, revealing the matrix entries one by one in some appropriate order, it is not clear to us how to control the martingale's parameters (the absolute values of the differences and their variances). Nevertheless, it seems natural to expect that some kind of an "all-matrix" martingale can be constructed and analysed, yielding a much stronger concentration inequality for linear forms.

5 The Kahn–Szemerédi argument

In this section, we use the concentration result established above and the well-known argument of Kahn and Szemerédi [18] to bound $s_2(M)$, for $M$ uniformly distributed on $\mathcal M_n(d^{in},d^{out})$. The argument was originally devised to handle $d$-regular undirected graphs, and we refer to [12] for a detailed exposition in that setting. In our situation, the Kahn–Szemerédi argument must be adapted to take into account the absence of symmetry. Still, let us emphasize that the structure of the proofs given in this section bears a lot of similarities with those presented in [12]. Set

\[ S_0^{n-1}:=\Bigl\{y\in S^{n-1}:\,\sum_{i=1}^ny_i=0\Bigr\}. \]
The Courant–Fischer formula implies

\[ s_2(M)\le\sup_{y\in S_0^{n-1}}\|My\|=\sup_{(x,y)\in S^{n-1}\times S_0^{n-1}}\langle My,x\rangle \]
(of course, the above relation is true for any $n\times n$ matrix $M$). To estimate the expression on the right-hand side, we shall apply our concentration inequality to $\langle My,x\rangle$ for any fixed couple $(x,y)\in S^{n-1}\times S_0^{n-1}$, and then invoke a covering argument. Let us take a closer look at the procedure. We have for any admissible $x$, $y$:

\[ \langle My,x\rangle=\sum_{i,j=1}^nx_iM_{ij}y_j=\sum_{i,j=1}^nM_{ij}Q_{ij}, \]
where $Q:=xy^T$ satisfies $\|Q\|_{HS}=1$ and $\|Q\|_\infty=\max_{i,j\in[n]}|x_iy_j|=\|x\|_\infty\|y\|_\infty$. Therefore, in view of the concentration statement obtained in Section 4, the (conditional) probability that $\langle My,x\rangle\gg\sqrt d$ is bounded by
\[ \exp\Bigl(-\frac{d}{n\|x\|_\infty^2\|y\|_\infty^2}H\Bigl(\frac{n\|x\|_\infty\|y\|_\infty}{\sqrt d}\Bigr)\Bigr) \]
(we disregard any constant factors in the above expression). However, when $n\|x\|_\infty\|y\|_\infty\gg\sqrt d$, the estimate becomes too weak (larger than $C^{-n}$) to apply the union bound over a net of size exponential in $n$. The idea of Kahn and Szemerédi is to split the entries of $Q$ into two groups according to their magnitude. The standard approach discussed above then works for the collection of entries smaller than $\sqrt d/n$. The corresponding pairs of indices are called light couples. For the second group, the key idea is to exploit discrepancy properties of the associated graph; again, our concentration inequality will play a crucial role in their verification. Given $(x,y)\in S^{n-1}\times S_0^{n-1}$, let us define
\[ \mathcal L(x,y):=\bigl\{(i,j)\in[n]^2:\,|x_iy_j|\le\sqrt d/n\bigr\}\quad\text{and}\quad\mathcal H(x,y):=\bigl\{(i,j)\in[n]^2:\,|x_iy_j|>\sqrt d/n\bigr\}. \]
The notation $\mathcal L(x,y)$ stands for the light couples while $\mathcal H(x,y)$ refers to the heavy couples. Moreover, we will represent the corresponding partition of $Q$ as $Q=Q_{\mathcal L}+Q_{\mathcal H}$, where $Q_{\mathcal L}$, $Q_{\mathcal H}$ are both $n\times n$ matrices in which the entries from "the alien" collection are replaced with zeros. Throughout the section, we always assume that the degree sequences $d^{in}$, $d^{out}$ satisfy (6) for some $d$, and that $d$ itself satisfies (29). Moreover, we always assume that the set of matrices $\mathcal M_n(d^{in},d^{out})$ is non-empty. As before, $M$ is the random matrix uniformly distributed on $\mathcal M_n(d^{in},d^{out})$ and $G$ is the associated random graph.
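The light/heavy splitting of $Q=xy^T$ described above is elementary to implement. The following sketch is our own illustration (function name is ours); it verifies the two properties of $Q_{\mathcal L}$ used below, namely $\|Q_{\mathcal L}\|_\infty\le\sqrt d/n$ and $\|Q_{\mathcal L}\|_{HS}\le1$:

```python
import numpy as np

def split_light_heavy(x, y, d):
    """Split Q = x y^T into Q_L + Q_H by the Kahn--Szemeredi threshold sqrt(d)/n.

    Entries with |x_i y_j| <= sqrt(d)/n form the light couples; the rest are heavy.
    """
    n = len(x)
    Q = np.outer(x, y)
    light = np.abs(Q) <= np.sqrt(d) / n
    QL = np.where(light, Q, 0.0)
    QH = np.where(light, 0.0, Q)
    return QL, QH

rng = np.random.default_rng(0)
n, d = 100, 16
x = rng.standard_normal(n); x /= np.linalg.norm(x)
y = rng.standard_normal(n); y -= y.mean(); y /= np.linalg.norm(y)  # y in S_0^{n-1}
QL, QH = split_light_heavy(x, y, d)
assert np.allclose(QL + QH, np.outer(x, y))          # exact partition
assert np.abs(QL).max() <= np.sqrt(d) / n            # sup-norm bound on Q_L
assert np.linalg.norm(QL, 'fro') <= 1 + 1e-12        # HS-norm bound on Q_L
```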

Lemma 5.1. For any $L\ge1$ there is $\gamma=\gamma(L)>0$ with the following property: Let $n\ge C_{5.1}$ and let $(x,y)\in S^{n-1}\times S_0^{n-1}$. Then for any $t>0$ we have
\[ \mathbf P\Bigl\{\sum_{(i,j)\in\mathcal L(x,y)}x_iM_{ij}y_j\ge(C_{5.1}L+t)\sqrt d\ \Big|\ \mathcal E_{\mathcal P}(L)\Bigr\}\le\frac{C_{5.1}}{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\bigl(-nH(\gamma t)\bigr). \]

Here, $C_{5.1}>0$ is a sufficiently large universal constant and $\mathcal E_{\mathcal P}(L)$ is defined by (4).

Proof. Let $(x,y)\in S^{n-1}\times S_0^{n-1}$ and denote $Q:=xy^T$. Let $Q_{\mathcal L}$ and $Q_{\mathcal H}$ be defined as above. By the definition of $\mathcal L(x,y)$, we have $\|Q_{\mathcal L}\|_\infty\le\sqrt d/n$, and, since $\|x\|=\|y\|=1$, we have $\|Q_{\mathcal L}\|_{HS}\le1$. Further, note that

\[ \sum_{i=1}^n\|\mathrm{row}_i(Q_{\mathcal L})\|_1\le\|x\|_1\|y\|_1\le n, \]
whence, in view of Lemma 2.3,

\[ \sum_{i=1}^n\|\mathrm{row}_i(Q_{\mathcal L})\|_{\log,n}\le C_{2.3}. \]
Applying Theorem 4.8 to the matrix $Q_{\mathcal L}$ with $t:=r\sqrt d$ ($r>0$), we get that there exists $\gamma:=\gamma(L)>0$ depending on $L$ such that

\[ \mathbf P\Bigl\{\Bigl|\sum_{(i,j)\in\mathcal L(x,y)}x_iM_{ij}y_j-\sum_{(i,j)\in\mathcal L(x,y)}\frac{d^{out}_ix_iy_j}{n}\Bigr|\ge(CL+r)\sqrt d\ \Big|\ M\in\mathcal E_{\mathcal P}(L)\Bigr\}\le\frac{C_3}{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\bigl(-nH(\gamma r)\bigr),\tag{35} \]
where $C$ is a universal constant and $C_3$ is the constant from Theorem 4.8. Since the coordinates of $y$ sum up to zero, we have for any $i\le n$:
\[ \Bigl|\sum_{j:\,(i,j)\in\mathcal L(x,y)}d^{out}_ix_iy_j\Bigr|=\Bigl|\sum_{j:\,(i,j)\in\mathcal H(x,y)}d^{out}_ix_iy_j\Bigr|\le d\sum_{j:\,(i,j)\in\mathcal H(x,y)}\frac{(x_iy_j)^2}{\sqrt d/n}, \]
where in the last inequality we used that $d^{out}_i\le d$ and $|x_iy_j|\ge\sqrt d/n$ for $(i,j)\in\mathcal H(x,y)$. Summing over all rows and using the condition $\|x\|=\|y\|=1$, we get

This, together with (35), finishes the proof after choosing C C + 1. 5.1 ≥ Next, we prove a discrepancy property for our model. In what follows, for any subsets S, T [n], E (S, T ) denotes the set of edges of G emanating from S and landing in T . For ⊂ G 46 any K1,K2 1, we denote by 5.2(K1,K2) the event that for all subsets S, T [n] at least one of the following≥ is true: E ⊂ d E (S, T ) K S T , (36) | G | ≤ 1 n | || | or EG(S, T ) e n EG(S, T ) ln | | K2 max( S , T ) ln . (37) | | d S T ≤ | | | | max( S , T )  n | || |   | | | |  Let us note that both conditions above can be equivalently restated using a single formula; however, the presentation in form (36)–(37) nicely captures the underlying dichotomy within a “typical” realization of G: either both S and T are “large”, in which case the edge count does not deviate too much from its expectation, or at least one of the sets is “small”, and the edge count, up to a logarithmic multiple, is bounded by the cardinality of the larger vertex set.

Proposition 5.2. For any $L\ge1$ and $m\in\mathbb N$ there are $n_0=n_0(L,m)$, $K_1=K_1(L,m)$ and $K_2=K_2(L,m)$ such that for $n\ge n_0$ and $d$ satisfying (29) we have
\[ \mathbf P\bigl\{\mathcal E_{5.2}(K_1,K_2)\,\big|\,\mathcal E_{\mathcal P}(L)\bigr\}\ge1-\frac1{\mathbf P(\mathcal E_{\mathcal P}(L))\,n^m}. \]
Proof. Fix for a moment any $S,T\subset[n]$ and let $Q$ be the $n\times n$ matrix whose entries are equal to $1$ on $S\times T$ and $0$ elsewhere. Set $k:=|S|$ and $\ell:=|T|$. From Remark 4.10, we have
\[ \Delta(Q)\le C_{2.3}\frac{\sqrt{dk\ell}}{n}\ln\Bigl(\frac{2n}{\ell}\Bigr)\|Q\|_{HS}\le C_{2.3}\frac{\sqrt d\,k\ell}{n}\ln(2n)\le C_{2.3}\frac{dk\ell}{n}, \]
where the last inequality follows from the assumption (29) on $d$. Using this estimate together with the inequality $d^{out}_i\le d$ ($i\le n$) and applying Theorem 4.8, for any $r>0$ we obtain
\[ \mathbf P\Bigl\{|E_G(S,T)|>(CL+r)\frac dnk\ell\ \Big|\ M\in\mathcal E_{\mathcal P}(L)\Bigr\}\le\frac{C}{\mathbf P(\mathcal E_{\mathcal P}(L))}\exp\Bigl(-\frac{dk\ell}{n}H(\gamma r)\Bigr),\tag{38} \]
for a universal constant $C\ge1$ and some $\gamma=\gamma(L)>0$. Now, we set $K_1:=2CL$ and let $K_2=K_2(L,m)$ be the minimal number such that $K_2H(\gamma t)\ge2(3+m)t\ln(2t)$ for all $t\ge CL$ (note that the definition of $K_1$, $K_2$ does not depend on $S$ and $T$). Since the function $H$ is strictly increasing on $(0,\infty)$, there is a unique number $r_1>0$ such that
\[ H(\gamma r_1)=\frac{(3+m)\max(k,\ell)}{\frac dnk\ell}\ln\Bigl(\frac{en}{\max(k,\ell)}\Bigr). \]
Next, note that if for a fixed realization of the graph $G$ we have $|E_G(S,T)|\le(CL+r_1)\frac dnk\ell$, then either (36) or (37) holds. Indeed, if $r_1\le CL$ then the assertion is obvious. Otherwise, if $r_1>CL$ then, by the definition of $K_2$, we have $(CL+r_1)\ln(CL+r_1)\le\frac{K_2}{3+m}H(\gamma r_1)$. Together with the trivial estimate

\[ |E_G(S,T)|\ln\Bigl(\frac{|E_G(S,T)|}{\frac dnk\ell}\Bigr)\le\frac dnk\ell\,(CL+r_1)\ln(CL+r_1), \]

this gives

\[ |E_G(S,T)|\ln\Bigl(\frac{|E_G(S,T)|}{\frac dnk\ell}\Bigr)\le\frac{K_2}{3+m}\,\frac dnk\ell\,H(\gamma r_1)=K_2\max(k,\ell)\ln\Bigl(\frac{en}{\max(k,\ell)}\Bigr). \]
Thus, all realizations of $M$ (or, equivalently, $G$) with $|E_G(S,T)|\le(CL+r_1)\frac dn|S||T|$ for all $S,T\subset[n]$ necessarily fall into the event $\mathcal E_{5.2}(K_1,K_2)$. It follows that

\[ \mathbf P\bigl\{\mathcal E_{5.2}(K_1,K_2)^c\,\big|\,\mathcal E_{\mathcal P}(L)\bigr\}\le\mathbf P\Bigl\{\exists\,S,T\subset[n]:\,|E_G(S,T)|>(CL+r_1)\frac dn|S||T|\ \Big|\ M\in\mathcal E_{\mathcal P}(L)\Bigr\}. \]
Applying (38), we get

\begin{align*} \mathbf P\bigl\{\mathcal E_{5.2}(K_1,K_2)^c\,\big|\,\mathcal E_{\mathcal P}(L)\bigr\}&\le\frac{C}{\mathbf P(\mathcal E_{\mathcal P}(L))}\sum_{k,\ell=1}^n\binom nk\binom n\ell\exp\Bigl(-\frac{dk\ell}{n}H(\gamma r_1)\Bigr)\\ &\le\frac{C}{\mathbf P(\mathcal E_{\mathcal P}(L))}\sum_{k,\ell=1}^n\exp\Bigl(k\ln\frac{en}k+\ell\ln\frac{en}\ell-\frac{dk\ell}{n}H(\gamma r_1)\Bigr)\\ &\le\frac{C}{\mathbf P(\mathcal E_{\mathcal P}(L))}\sum_{k,\ell=1}^n\exp\Bigl(-(m+1)\max(k,\ell)\ln\Bigl(\frac{en}{\max(k,\ell)}\Bigr)\Bigr)\\ &\le\frac{C}{\mathbf P(\mathcal E_{\mathcal P}(L))\,n^{m+1}}\le\frac1{\mathbf P(\mathcal E_{\mathcal P}(L))\,n^m}, \end{align*}
where we used the estimate $\max(k,\ell)\ln\bigl(\frac{en}{\max(k,\ell)}\bigr)\ge\ln n$.

The conditions on the edge count of a graph expressed via (36) or (37) are a basic element in the argument of Kahn and Szemerédi. The following lemma shows that the contribution of the heavy couples to the matrix norm is deterministically controlled once we suppose that either (36) or (37) holds for all vertex subsets of the corresponding graph.

Lemma 5.3. For any $K_1,K_2>0$ there exists $\beta>0$ depending only on $K_1,K_2$ such that the following holds. Let, as usual, the degree sequences $d^{in}$, $d^{out}$ be bounded from above by $d$ (coordinate-wise) and let $M\in\mathcal E_{5.2}(K_1,K_2)$. Then for any $x,y\in S^{n-1}$, we have
\[ \sum_{(i,j)\in\mathcal H(x,y)}x_iM_{ij}y_j\le\beta\sqrt d. \]
A proof of this statement in the undirected $d$-regular setting is well known [18, 12]. In the appendix to this paper, we include the proof adapted to our situation.

In order to simultaneously estimate the contribution of all pairs of vectors from $S^{n-1}\times S_0^{n-1}$ to the second largest singular value of our random matrix, we shall discretize this set. The following lemma is quite standard.

Lemma 5.4. Let $\varepsilon\in(0,1/2)$, let $\mathcal N_\varepsilon$ be a Euclidean $\varepsilon$-net in $S^{n-1}$, and let $\mathcal N^0_\varepsilon$ be a Euclidean $\varepsilon$-net in $S_0^{n-1}$. Further, let $A$ be any $n\times n$ non-random matrix and $R$ be any positive number such that $|\langle Ax,y\rangle|\le R$ for all $(x,y)\in\mathcal N_\varepsilon\times\mathcal N^0_\varepsilon$. Then $|\langle Ax,y\rangle|\le R/(1-2\varepsilon)$ for all $(x,y)\in S^{n-1}\times S_0^{n-1}$.

Proof. Let $(x_0,y_0)\in S^{n-1}\times S_0^{n-1}$ be such that $a:=\sup_{(x,y)\in S^{n-1}\times S_0^{n-1}}\langle Ax,y\rangle=\langle Ax_0,y_0\rangle$. By the definition of $\mathcal N_\varepsilon$ and $\mathcal N^0_\varepsilon$, there exists a pair $(x_0',y_0')\in\mathcal N_\varepsilon\times\mathcal N^0_\varepsilon$ such that $\|x_0-x_0'\|\le\varepsilon$ and $\|y_0-y_0'\|\le\varepsilon$. Together with the fact that the normalized difference of two elements in $S_0^{n-1}$ remains in $S_0^{n-1}$, this yields

\begin{align*} \langle Ax_0,y_0\rangle&=\langle A(x_0-x_0'),y_0\rangle+\langle Ax_0',y_0-y_0'\rangle+\langle Ax_0',y_0'\rangle\\ &\le a\|x_0-x_0'\|+a\|y_0-y_0'\|+\sup_{(x,y)\in\mathcal N_\varepsilon\times\mathcal N^0_\varepsilon}|\langle Ax,y\rangle|. \end{align*}
Hence,
\[ a\le2\varepsilon a+R, \]
which gives $a\le R/(1-2\varepsilon)$.
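A two-dimensional toy version of Lemma 5.4 can be checked numerically. In the sketch below (our own illustration; the mean-zero constraint on $y$ is dropped, so both vectors range over the full circle, and the helper name is ours), the operator norm is compared against the maximum of $|\langle Ax,y\rangle|$ over a product of $\varepsilon$-nets:

```python
import numpy as np

def circle_net(step):
    """Angular grid on S^1; consecutive points are 2*sin(step/2)-close, so the
    grid is a Euclidean eps-net for any eps >= step."""
    angles = np.arange(0.0, 2 * np.pi, step)
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2))
eps = 0.1
net = circle_net(eps)                 # an eps-net on the circle
R = np.abs(net @ A @ net.T).max()     # max of |<Ax, y>| over net pairs (x, y)

# Lemma 5.4 (2D version): the full supremum, i.e. the operator norm of A,
# is at most R / (1 - 2*eps).
assert np.linalg.norm(A, 2) <= R / (1 - 2 * eps) + 1e-9
```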

Now, we can prove the main statement of this section. It is easy to check that the theorem below, together with Proposition 4.5, gives Theorem C from the Introduction. To make the statement self-contained, we explicitly mention all the assumptions on the parameters.

Theorem 5.5. For any $L,m\ge1$ there exist $\kappa=\kappa(L,m)>0$ and $n_0=n_0(L,m)$ with the following properties. Assume that $n\ge n_0$ and that the degree sequences $d^{in}$, $d^{out}$ satisfy
\[ (1-c_0)d\le d^{in}_i,\,d^{out}_i\le d,\qquad i\le n, \]
for some natural $d$ with $C_{3.2}\ln^2n\le d\le(1/2+c_0)n$. Then, with $\mathcal E_{\mathcal P}(L)$ defined by (4), we have
\[ \mathbf P\bigl\{M\in\mathcal M_n(d^{in},d^{out}):\,s_2(M)\ge\kappa\sqrt d\bigr\}\le\frac1{n^m}+\mathbf P\bigl(\mathcal E_{\mathcal P}(L)^c\bigr). \]
Proof. Let $K_1=K_1(L,m+1)$ and $K_2=K_2(L,m+1)$ be defined as in Proposition 5.2, and let $\gamma=\gamma(L)$ and $\beta=\beta(K_1,K_2)$ be the functions from Lemmas 5.1 and 5.3. We will use the shorter notation $\mathcal E_{\mathcal P}$ and $\mathcal E_{5.2}$ instead of $\mathcal E_{\mathcal P}(L)$ and $\mathcal E_{5.2}(K_1,K_2)$, respectively. Set $r:=\gamma^{-1}H^{-1}(1+\ln81)$ and denote
\[ \mathcal E:=\bigl\{M\in\mathcal M_n(d^{in},d^{out}):\,s_2(M)\ge2(C_{5.1}L+\beta+r)\sqrt d\bigr\}. \]
Using the Courant–Fischer formula, we obtain

\[ \mathbf P\bigl(\mathcal E\,\big|\,\mathcal E_{\mathcal P}\cap\mathcal E_{5.2}\bigr)\le\mathbf P\Bigl\{\exists\,(x,y)\in S^{n-1}\times S_0^{n-1}\ \text{such that}\ |\langle My,x\rangle|\ge2(C_{5.1}L+\beta+r)\sqrt d\ \Big|\ M\in\mathcal E_{\mathcal P}\cap\mathcal E_{5.2}\Bigr\}. \]
Let $\mathcal N$ be a $1/4$-net in $S^{n-1}$ and $\mathcal N_0$ be a $1/4$-net in $S_0^{n-1}$. Standard volumetric estimates show that we may take $\mathcal N$ and $\mathcal N_0$ such that $\max(|\mathcal N|,|\mathcal N_0|)\le9^n$. Applying Lemma 5.4, we get

\begin{align*} \mathbf P\bigl(\mathcal E\,\big|\,\mathcal E_{\mathcal P}\cap\mathcal E_{5.2}\bigr)&\le\mathbf P\Bigl\{\exists\,(x,y)\in\mathcal N\times\mathcal N_0\ \text{such that}\ |\langle My,x\rangle|\ge(C_{5.1}L+\beta+r)\sqrt d\ \Big|\ M\in\mathcal E_{\mathcal P}\cap\mathcal E_{5.2}\Bigr\}\\ &\le(81)^n\max_{(x,y)\in S^{n-1}\times S_0^{n-1}}\mathbf P\Bigl\{|\langle My,x\rangle|\ge(C_{5.1}L+\beta+r)\sqrt d\ \Big|\ M\in\mathcal E_{\mathcal P}\cap\mathcal E_{5.2}\Bigr\}.\tag{39} \end{align*}
Given $(x,y)\in S^{n-1}\times S_0^{n-1}$, we obviously have
\[ |\langle My,x\rangle|\le\Bigl|\sum_{(i,j)\in\mathcal L(x,y)}x_iM_{ij}y_j\Bigr|+\Bigl|\sum_{(i,j)\in\mathcal H(x,y)}x_iM_{ij}y_j\Bigr|. \]

From Lemma 5.3, we get $\big|\sum_{(i,j)\in\mathcal{H}(x,y)} x_i M_{ij} y_j\big| \le \beta\sqrt{d}$ whenever $M \in \mathcal{E}_{5.2}$. Hence, in view of (39),

\[
\mathbb{P}(\mathcal{E} \mid \mathcal{E}_P \cap \mathcal{E}_{5.2}) \le (81)^n \max_{(x,y)\in S^{n-1}\times S_0^{n-1}} \mathbb{P}\Big\{ \Big|\sum_{(i,j)\in\mathcal{L}(x,y)} x_i M_{ij} y_j\Big| \ge (C_{5.1} L + r)\sqrt{d} \;\Big|\; \mathcal{E}_P \cap \mathcal{E}_{5.2} \Big\}.
\]
Applying Lemma 5.1, we further obtain, by the choice of $r$,

\[
\mathbb{P}(\mathcal{E} \mid \mathcal{E}_P \cap \mathcal{E}_{5.2}) \le \frac{C_{5.1}\,(81)^n}{\mathbb{P}(\mathcal{E}_P)} \exp\big({-}nH(\gamma r)\big) \le \frac{C_{5.1}\, e^{-n}}{\mathbb{P}(\mathcal{E}_P)}.
\]
To finish the proof, note that

\[
\mathbb{P}(\mathcal{E}) \le \mathbb{P}(\mathcal{E} \mid \mathcal{E}_P \cap \mathcal{E}_{5.2})\,\mathbb{P}(\mathcal{E}_P) + \mathbb{P}(\mathcal{E}_{5.2}^c \mid \mathcal{E}_P)\,\mathbb{P}(\mathcal{E}_P) + \mathbb{P}(\mathcal{E}_P^c)
\]
and use the above estimate together with Proposition 5.2.

The concentration inequality obtained in Theorem 4.8 was used in its full strength in Proposition 5.2 to control the contribution of heavy couples. For the light couples, though, it would be sufficient to apply a weaker Bernstein-type bound in which the function $H(\tau)$ in the exponent is replaced with $\frac{\tau^2}{2 + 2\tau/3}$.
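For the reader's convenience, here is the routine computation behind the two numerical ingredients of the proof above: the cardinality bound on the nets, and the way the choice of $r$ absorbs the factor $(81)^n$ (a standard argument, spelled out under the assumption that $H$ is increasing, as in Lemma 5.1).

```latex
% Volumetric bound: if N_eps is a maximal eps-separated subset of the sphere,
% the balls of radius eps/2 centered at its points are disjoint and contained
% in (1 + eps/2) B_2^n, whence
\[
|\mathcal{N}_\varepsilon| \le \frac{(1+\varepsilon/2)^n}{(\varepsilon/2)^n}
  = \Bigl(1+\frac{2}{\varepsilon}\Bigr)^n,
\qquad\text{so for } \varepsilon = \tfrac14:\quad
|\mathcal{N}|,\,|\mathcal{N}_0| \le 9^n,
\quad |\mathcal{N}\times\mathcal{N}_0| \le 81^n.
\]
% Choice of r: since H is increasing and r = gamma^{-1} H^{-1}(1 + ln 81),
% we have H(gamma r) = 1 + ln 81, and therefore
\[
(81)^n \exp\bigl(-nH(\gamma r)\bigr)
  = \exp\bigl(n\ln 81 - n(1+\ln 81)\bigr) = e^{-n}.
\]
```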

6 The undirected setting

In this section, we show how to deduce Theorem A from Theorem C. In [26], we showed that in a rather general setting the norm of a random matrix, whose distribution is invariant under joint permutations of rows and columns, can be bounded in terms of the norm of its $\lfloor n/2\rfloor \times \lfloor n/2\rfloor$ submatrix located in the top right corner. Moreover, for matrices with constant row and column sums, an analogous phenomenon holds for the second largest singular value. Since the distribution of edges in the undirected uniform model is invariant under permutations of the vertex set, the results of [26] are applicable in our context. We will need the following definition. For any $\ell, d > 0$ and any parameter $\delta > 0$ we set

\[
\mathrm{Deg}_\ell(d, \delta) := \Big\{ (u, v) \in \mathbb{N}^\ell \times \mathbb{N}^\ell : \|u\|_1 = \|v\|_1 \text{ AND } \big|\{ i \le \ell : |u_i - d| > k\delta \}\big| \le \ell e^{-k^2} \text{ for all } k \in \mathbb{N} \text{ AND } \big|\{ i \le \ell : |v_i - d| > k\delta \}\big| \le \ell e^{-k^2} \text{ for all } k \in \mathbb{N} \Big\}.
\]
Note that any pair of vectors $(u, v)$ from $\mathrm{Deg}_\ell(d, \delta)$ necessarily satisfies $\|u - d\mathbf{1}\|_{\psi,n},\, \|v - d\mathbf{1}\|_{\psi,n} \le C\delta$ for some universal constant $C > 0$.

Below we state a special case of the main result of [26], where we replace a general random matrix with constant row/column sums by the adjacency matrix of a random regular graph.

Theorem 6.1 ([26]). There exist positive universal constants $c, C$ such that the following holds. Let $n \ge C$ and let $d \in \mathbb{N}$ satisfy $d \ge C\ln n$. Further, let $G$ be a random undirected graph uniformly distributed on $\mathcal{G}_n(d)$ and let $T$ be the $\lfloor n/2\rfloor \times \lfloor n/2\rfloor$ top right corner of the adjacency matrix of $G$. Then, viewing $T$ as the adjacency matrix of a random directed graph on $\lfloor n/2\rfloor$ vertices, for any $t \ge C$ we have
\[
\mathbb{P}\big\{ s_2(G) \ge Ct\sqrt{d} \big\} \le \frac{1}{c}\, \mathbb{P}\Big\{ s_2(T) \ge ct\sqrt{d} \text{ AND } \big(d^{in}(T), d^{out}(T)\big) \in \mathrm{Deg}_{\lfloor n/2\rfloor}\big(d/2, C\sqrt{d}\big) \Big\}.
\]

Equipped with the above statement and with Theorem C, we can proceed with the proof of Theorem A.

Proof of Theorem A. Let $m \in \mathbb{N}$, $\alpha > 0$, and let $c, C$ be the constants from Theorem 6.1. We assume that $n^\alpha \le d \le n/2$. Denote by $A = (a_{ij})$ the adjacency matrix of the random graph $G$ uniformly distributed on $\mathcal{G}_n(d)$. Let $T$ be the $\lfloor n/2\rfloor \times \lfloor n/2\rfloor$ top right corner of $A$. Fix for a moment any degree sequences $(d^{in}, d^{out})$ of length $\lfloor n/2\rfloor$ bounded above by $d$ such that the event $\{(d^{in}(T), d^{out}(T)) = (d^{in}, d^{out})\}$ is non-empty. Then, conditioned on the event, the directed random graph on $\lfloor n/2\rfloor$ vertices with adjacency matrix $T$ is uniformly distributed on $\mathcal{D}_{\lfloor n/2\rfloor}(d^{in}, d^{out})$. In other words, the distribution of $T$, conditioned on the event $\{(d^{in}(T), d^{out}(T)) = (d^{in}, d^{out})\}$, is uniform on the set $\mathcal{M}_{\lfloor n/2\rfloor}(d^{in}, d^{out})$.

Now if $(d^{in}, d^{out}) \in \mathrm{Deg}_{\lfloor n/2\rfloor}(d/2, C\sqrt{d})$, then, applying Theorem C, we get
\[
\mathbb{P}\big\{ s_2(T) \ge \tilde{t}\sqrt{d} \;\big|\; \big(d^{in}(T), d^{out}(T)\big) = (d^{in}, d^{out}) \big\} \le \frac{1}{n^m}, \tag{40}
\]
for some $\tilde{t}$ depending on $\alpha$, $C$ and $m$.
Set $t := C\max(1, \tilde{t}/c)$. In view of Theorem 6.1, we get
\[
\mathbb{P}\big\{ s_2(G) \ge t\sqrt{d} \big\} \le \frac{1}{c}\, \mathbb{P}\Big\{ s_2(T) \ge \tilde{t}\sqrt{d} \text{ AND } \big(d^{in}(T), d^{out}(T)\big) \in \mathrm{Deg}_{\lfloor n/2\rfloor}\big(d/2, C\sqrt{d}\big) \Big\} =: \eta.
\]
Obviously,
\[
\eta = \frac{1}{c} \sum_{(d^{in}, d^{out}) \in \mathrm{Deg}_{\lfloor n/2\rfloor}(d/2, C\sqrt{d})} \mathbb{P}\big\{ s_2(T) \ge \tilde{t}\sqrt{d} \text{ AND } \big(d^{in}(T), d^{out}(T)\big) = (d^{in}, d^{out}) \big\}.
\]
Hence, applying (40), we get
\[
\eta \le \frac{1}{c} \sum_{(d^{in}, d^{out}) \in \mathrm{Deg}_{\lfloor n/2\rfloor}(d/2, C\sqrt{d})} \frac{1}{n^m}\, \mathbb{P}\big\{ \big(d^{in}(T), d^{out}(T)\big) = (d^{in}, d^{out}) \big\} \le \frac{1}{c\, n^m},
\]
and complete the proof.
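To fix the combinatorial meaning of the class $\mathrm{Deg}_\ell(d, \delta)$ used above, here is a small illustrative sketch (the function name and the examples are ours, not from the paper): a pair belongs to the class iff the total degrees agree and, for every $k \ge 1$, at most $\ell e^{-k^2}$ coordinates of each vector deviate from $d$ by more than $k\delta$.

```python
import math

def in_deg_class(u, v, d, delta):
    """Test membership of (u, v) in Deg_l(d, delta): equal l1-norms, and for
    every k >= 1 at most l*exp(-k^2) coordinates of each vector deviate from
    d by more than k*delta. For k with k*delta >= max deviation the condition
    holds trivially, so only finitely many k need to be checked."""
    if len(u) != len(v) or sum(u) != sum(v):
        return False
    l = len(u)
    for w in (u, v):
        max_dev = max(abs(x - d) for x in w)
        k = 1
        while k * delta < max_dev:
            bad = sum(1 for x in w if abs(x - d) > k * delta)
            if bad > l * math.exp(-k * k):
                return False
            k += 1
    return True

# A perfectly regular pair belongs to the class...
print(in_deg_class([10] * 100, [10] * 100, d=10, delta=2.0))
# ...while a single large outlier violates the tail condition already at
# k = 3, since 100 * exp(-9) < 1.
print(in_deg_class([10] * 99 + [40], [10] * 99 + [40], d=10, delta=2.0))
```

In the proof above the class is applied with $\ell = \lfloor n/2\rfloor$, $d/2$ in place of $d$, and $\delta = C\sqrt{d}$.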

Acknowledgments. A significant part of this work was done when the second named author visited the University of Alberta in May–June 2016, and when both authors visited the Texas A&M University in July 2016. Both authors are grateful to the University of Alberta and the Texas A&M University for excellent working conditions, and would especially like to thank Nicole Tomczak-Jaegermann, Bill Johnson, Alexander Litvak and Grigoris Paouris. We would also like to thank Djalil Chafaï for helpful comments. The first named author is partially supported by the Simons Foundation (Collaboration on Algorithms and Geometry).

References

[1] N. Alon, Eigenvalues and expanders, Combinatorica 6 (1986), no. 2, 83–96. MR0875835

[2] N. Alon and V. D. Milman, λ1, isoperimetric inequalities for graphs, and superconcentrators, J. Combin. Theory Ser. B 38 (1985), no. 1, 73–88. MR0782626

[3] R. Bauerschmidt, J. Huang, A. Knowles, H.-T. Yau, Bulk eigenvalue statistics for random regular graphs, arXiv:1505.06700.

[4] R. Bauerschmidt, A. Knowles, H.-T. Yau, Local semicircle law for random regular graphs, arXiv:1503.08702.

[5] G. Bennett, Probability inequalities for the sum of independent random variables, Journal of the American Statistical Association 57 (1962), no. 297, 33–45, doi:10.2307/2282438.

[6] S. N. Bernstein, Theory of Probability (in Russian), Moscow, 1927.

[7] C. Bordenave, A new proof of Friedman’s second eigenvalue Theorem and its extension to random lifts, arXiv:1502.04482.

[8] A. Z. Broder, A. M. Frieze, S. Suen, E. Upfal, Optimal construction of edge-disjoint paths in random graphs, SIAM J. Comput. 28 (1999), no. 2, 541–573 (electronic). MR1634360

[9] A. Broder, E. Shamir, On the second eigenvalue of random regular graphs, Proceedings of the 28th Annual Symposium on Foundations of Computer Science (1987), 286–294.

[10] N. Cook, Discrepancy properties for random regular digraphs, Random Structures Algorithms, DOI: 10.1002/rsa.20643.

[11] N. Cook, On the singularity of adjacency matrices for random regular digraphs, Probab. Theory Related Fields, to appear, arXiv:1411.0243.

[12] N. Cook, L. Goldstein, T. Johnson, Size biased couplings and the spectral gap for random regular graphs, arXiv:1510.06013.

[13] I. Dumitriu, T. Johnson, S. Pal, E. Paquette, Functional limit theorems for random regular graphs, Probab. Theory Related Fields 156 (2013), no. 3-4, 921–975. MR3078290

[14] I. Dumitriu and S. Pal, Sparse regular random graphs: spectral density and eigenvectors, Ann. Probab. 40 (2012), no. 5, 2197–2235. MR3025715

[15] D. A. Freedman, On tail probabilities for martingales, Ann. Probability 3 (1975), 100– 118. MR0380971

[16] J. Friedman, A proof of Alon’s second eigenvalue conjecture and related problems, Mem. Amer. Math. Soc. 195 (2008), no. 910, viii+100 pp. MR2437174

[17] J. Friedman, On the second eigenvalue and random walks in random d-regular graphs, Combinatorica 11 (1991), no. 4, 331–362. MR1137767

[18] J. Friedman, J. Kahn, E. Szemerédi, On the second eigenvalue of random regular graphs, Proceedings of the twenty-first annual ACM symposium on Theory of computing (1989), 587–598.

[19] A.E. Litvak, A. Lytova, K. Tikhomirov, N. Tomczak-Jaegermann, P. Youssef, Adjacency matrices of random digraphs: singularity and anti-concentration, J. of Math. Analysis and Appl., 445 (2017), 1447-1491. arXiv:1511.00113.

[20] A. E. Litvak, A. Pajor, M. Rudelson, N. Tomczak-Jaegermann, Smallest singular value of random matrices and geometry of random polytopes. Adv. Math. 195 (2005), no. 2, 491-523.

[21] B. D. McKay, The expected eigenvalue distribution of a large regular graph, Linear Algebra Appl. 40 (1981), 203–216. MR0629617

[22] D. Puder, Expansion of random graphs: new proofs, new results, Inventiones Mathematicae 201 (2015), no. 3, 845–908.

[23] M. M. Rao and Z. D. Ren, Theory of Orlicz spaces, Monographs and Textbooks in Pure and Applied Mathematics, 146, Dekker, New York, 1991. MR1113700

[24] M. Rudelson and R. Vershynin, The Littlewood-Offord problem and invertibility of random matrices. Adv. Math. 218 (2008), no. 2, 600-633.

[25] J. K. Senior, Partitions and their representative graphs, Amer. J. Math. 73 (1951), 663–689. MR0042678

[26] K. Tikhomirov and P. Youssef, On the norm of a random jointly exchangeable matrix, arXiv:1610.01751.

[27] L. V. Tran, V. H. Vu and K. Wang, Sparse random graphs: eigenvalues and eigenvectors, Random Structures Algorithms 42 (2013), no. 1, 110–134. MR2999215

[28] V. Vu, Random discrete matrices, in Horizons of combinatorics, 257–280, Bolyai Soc. Math. Stud., 17, Springer, Berlin. MR2432537

[29] V. Vu, Combinatorial problems in random matrix theory, Proceedings ICM, Vol. 4, 2014, 489–508.

[30] N. C. Wormald, Models of random regular graphs, in Surveys in combinatorics, 1999 (Canterbury), 239–298, London Math. Soc. Lecture Note Ser., 267, Cambridge Univ. Press, Cambridge. MR1725006

7 Appendix

Here, we provide a detailed proof of Lemma 5.3. Let us emphasize that the corresponding result for undirected graphs is well known (see a detailed proof in [12]); the sole purpose of this part of the paper is to convince the reader that the argument carries over easily to the directed setting.

Proof of Lemma 5.3. Let $d^{in}, d^{out}$ be the two given degree sequences, and let $K_1$ and $K_2$ be the two parameters in the definition of $\mathcal{E}_{5.2}(K_1, K_2) \subset \mathcal{M}_n(d^{in}, d^{out})$. Let $M$ be any fixed matrix in $\mathcal{E}_{5.2}(K_1, K_2)$ and $G$ be the corresponding graph. Let $x, y \in S^{n-1}$, and for any $i \ge 1$ define

\[
S_i := \Big\{ k \in [n] : |x_k| \in \Big[\frac{2^{i-1}}{\sqrt{n}}, \frac{2^i}{\sqrt{n}}\Big) \Big\} \quad\text{and}\quad T_i := \Big\{ k \in [n] : |y_k| \in \Big[\frac{2^{i-1}}{\sqrt{n}}, \frac{2^i}{\sqrt{n}}\Big) \Big\}.
\]
Note that any couple $(i, j)$ with $\min(|x_i|, |y_j|) < n^{-1/2}$ is light. Further, whenever $(k, \ell) \in \mathcal{H}(x, y) \cap (S_i \times T_j)$ for some $i, j \ge 1$, we have
\[
\frac{\sqrt{d}}{n} \le |x_k y_\ell| \le \frac{2^{i+j}}{n}.
\]

Hence,
\[
\Big| \sum_{(i,j)\in\mathcal{H}(x,y)} x_i M_{ij} y_j \Big| \le \sum_{(i,j)\in\mathcal{I}} \frac{2^{i+j}}{n}\, |E_G(S_i, T_j)|,
\]
where $\mathcal{I} := \{(i, j) : 2^{i+j} \ge \sqrt{d}\}$. Set
\[
\mathcal{I}^{in} := \{(i, j) \in \mathcal{I} : |S_i| \ge |T_j|\} \quad\text{and}\quad \mathcal{I}^{out} := \{(i, j) \in \mathcal{I} : |S_i| \le |T_j|\}.
\]
We have
\[
\Big| \sum_{(i,j)\in\mathcal{H}(x,y)} x_i M_{ij} y_j \Big| \le \sum_{(i,j)\in\mathcal{I}^{in}} \frac{2^{i+j}}{n}\, |E_G(S_i, T_j)| + \sum_{(i,j)\in\mathcal{I}^{out}} \frac{2^{i+j}}{n}\, |E_G(S_i, T_j)|.
\]

In what follows, we will bound the first term in the above inequality; the other summand is estimated in exactly the same way. Given $(i, j) \in \mathcal{I}^{in}$, denote
\[
r_{ij} := \frac{n\,|E_G(S_i, T_j)|}{d\,|S_i|\,|T_j|}, \qquad \alpha_i := \frac{2^{2i}}{n}|S_i|, \qquad \beta_j := \frac{2^{2j}}{n}|T_j| \qquad\text{and}\qquad s_{ij} := \frac{\sqrt{d}}{2^{i+j}}\, r_{ij}.
\]
Note that $s_{ij} \le r_{ij}$. Further,
\[
\sum_{i\ge 1} \alpha_i = 4 \sum_{i\ge 1} \frac{2^{2(i-1)}}{n}|S_i| \le 4 \sum_{i\ge 1} \sum_{k\in S_i} x_k^2 \le 4. \tag{41}
\]
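The normalization behind (41) is easy to check numerically; the following sketch (with our own variable names, not from the paper) computes the dyadic level sets of a unit vector and verifies that the weights $\alpha_i$ sum to at most 4.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.standard_normal(n)
x /= np.linalg.norm(x)        # a point of S^{n-1}

abs_x = np.abs(x)
alpha = []
i = 1
# S_i collects the coordinates with |x_k| in [2^{i-1}/sqrt(n), 2^i/sqrt(n));
# coordinates below 1/sqrt(n) enter only light couples and are ignored.
while 2 ** (i - 1) / np.sqrt(n) <= abs_x.max():
    lo = 2 ** (i - 1) / np.sqrt(n)
    hi = 2 ** i / np.sqrt(n)
    size_S_i = int(np.count_nonzero((abs_x >= lo) & (abs_x < hi)))
    alpha.append(4 ** i * size_S_i / n)   # alpha_i = 2^{2i} |S_i| / n
    i += 1

# (41): on S_i one has x_k^2 >= 2^{2(i-1)}/n, i.e. each alpha_i is at most
# 4 * sum_{k in S_i} x_k^2, so the total is at most 4 * ||x||^2 = 4.
total = sum(alpha)
print(total)
assert total <= 4.0
```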

Similarly, we have $\sum_{j\ge 1} \beta_j \le 4$. Since the in- and out-degrees are bounded by $d$, we have
\[
|E_G(S_i, T_j)| \le \min\Big( \sum_{k\in S_i} d_k^{out}, \sum_{k\in T_j} d_k^{in} \Big) \le d \min(|S_i|, |T_j|) = d\,|T_j|,
\]
implying
\[
r_{ij} \le \frac{n}{|S_i|} = \frac{2^{2i}}{\alpha_i}. \tag{42}
\]
Next, as $M \in \mathcal{E}_{5.2}(K_1, K_2)$, we have either $r_{ij} \le K_1$ or
\[
r_{ij} \ln(r_{ij}) \le \frac{K_2\, 2^{2j}}{d\, \beta_j} \ln\Big( \frac{e\, 2^{2i}}{\alpha_i} \Big). \tag{43}
\]
With the above notation,

\[
\sum_{(i,j)\in\mathcal{I}^{in}} \frac{2^{i+j}}{n}\, |E_G(S_i, T_j)| = \sqrt{d} \sum_{(i,j)\in\mathcal{I}^{in}} \alpha_i \beta_j s_{ij}.
\]
Our aim is to show that
\[
g(M) := \sum_{(i,j)\in\mathcal{I}^{in}} \alpha_i \beta_j s_{ij} = O(1).
\]
Let us divide $\mathcal{I}^{in}$ into five subsets:
\[
\mathcal{I}_1^{in} := \{(i, j) \in \mathcal{I}^{in} : s_{ij} \le K_1\}
\]

\[
\mathcal{I}_2^{in} := \{(i, j) \in \mathcal{I}^{in} : 2^i \le 2^j/\sqrt{d}\}
\]

\[
\mathcal{I}_3^{in} := \Big\{ (i, j) \in \mathcal{I}^{in} : r_{ij} > \Big( \frac{e\, 2^{2i}}{\alpha_i} \Big)^{1/4} \Big\} \setminus (\mathcal{I}_1^{in} \cup \mathcal{I}_2^{in})
\]

\[
\mathcal{I}_4^{in} := \Big\{ (i, j) \in \mathcal{I}^{in} : \frac{1}{\alpha_i} \le e\, 2^{2i} \Big\} \setminus (\mathcal{I}_1^{in} \cup \mathcal{I}_2^{in} \cup \mathcal{I}_3^{in})
\]
\[
\mathcal{I}_5^{in} := \mathcal{I}^{in} \setminus (\mathcal{I}_1^{in} \cup \mathcal{I}_2^{in} \cup \mathcal{I}_3^{in} \cup \mathcal{I}_4^{in}).
\]
For every $s = 1, 2, 3, 4, 5$, we write

\[
g_s(M) := \sum_{(i,j)\in\mathcal{I}_s^{in}} \alpha_i \beta_j s_{ij}.
\]
Obviously, $g(M) \le \sum_{s=1}^5 g_s(M)$.

Claim 1. $g_1(M) \le 16K_1$.

Proof. Since $s_{ij} \le K_1$ for $(i, j) \in \mathcal{I}_1^{in}$, in view of (41) we get
\[
g_1(M) \le K_1 \sum_{(i,j)\in\mathcal{I}_1^{in}} \alpha_i \beta_j \le K_1 \sum_{i\ge 1} \alpha_i \sum_{j\ge 1} \beta_j \le 16K_1.
\]

Claim 2. $g_2(M) \le 8$.

Proof. In view of (42), we have

\[
g_2(M) = \sqrt{d} \sum_{(i,j)\in\mathcal{I}_2^{in}} \alpha_i \beta_j \frac{r_{ij}}{2^{i+j}} \le \sqrt{d} \sum_{(i,j)\in\mathcal{I}_2^{in}} \beta_j \frac{2^i}{2^j} = \sqrt{d} \sum_{j\ge 1} \beta_j\, 2^{-j} \sum_{i : (i,j)\in\mathcal{I}_2^{in}} 2^i.
\]
Since $2^i \le 2^j/\sqrt{d}$ for $(i, j) \in \mathcal{I}_2^{in}$, the second sum is bounded by $2 \cdot 2^j/\sqrt{d}$. Thus, we have
\[
g_2(M) \le 2 \sum_{j\ge 1} \beta_j \le 8,
\]
where the last inequality follows from (41) (with $\beta_j$ replacing $\alpha_i$).

Claim 3. $g_3(M) \le 32K_2$.

Proof. First note that when $(i, j) \notin \mathcal{I}_1^{in}$, we have $s_{ij} > K_1$, whence $r_{ij} \ge s_{ij} > K_1$. Combined with (43), this implies
\[
r_{ij} \ln r_{ij} \le \frac{K_2\, 2^{2j}}{d\, \beta_j} \ln\Big( \frac{e\, 2^{2i}}{\alpha_i} \Big)
\]
for any $(i, j) \notin \mathcal{I}_1^{in}$. After an appropriate transformation, we get
\[
\beta_j s_{ij} \ln r_{ij} \le \frac{K_2\, 2^j}{\sqrt{d}\, 2^i} \ln\Big( \frac{e\, 2^{2i}}{\alpha_i} \Big) \tag{44}
\]
for any $(i, j) \notin \mathcal{I}_1^{in}$. When $(i, j) \in \mathcal{I}_3^{in}$, we have
\[
\ln r_{ij} \ge \frac{1}{4} \ln\Big( \frac{e\, 2^{2i}}{\alpha_i} \Big).
\]
This, together with (44), yields
\[
\beta_j s_{ij} \le \frac{4K_2\, 2^j}{\sqrt{d}\, 2^i}
\]
for any $(i, j) \in \mathcal{I}_3^{in}$. Thus,

\[
g_3(M) \le \frac{4K_2}{\sqrt{d}} \sum_{i\ge 1} \alpha_i\, 2^{-i} \sum_{j : (i,j)\in\mathcal{I}_3^{in}} 2^j.
\]
Since $2^j \le 2^i \sqrt{d}$ for $(i, j) \notin \mathcal{I}_2^{in}$, the second sum is bounded by $2 \cdot 2^i \sqrt{d}$. Hence, we have
\[
g_3(M) \le 8K_2 \sum_{i\ge 1} \alpha_i \le 32K_2,
\]
where in the last inequality we used (41).

Claim 4. $g_4(M) \le \dfrac{8K_2\sqrt{6e}}{K_1 \ln K_1}$.

Proof. In view of (44), we have for any $(i, j) \in \mathcal{I}_4^{in}$:
\[
\beta_j s_{ij} \ln r_{ij} \le \frac{K_2\, 2^j}{\sqrt{d}\, 2^i} \ln\big( e^2\, 2^{4i} \big) \le \frac{K_2 \sqrt{6}\, 2^j}{\sqrt{d}},
\]
where in the last inequality we used $\ln(e^2\, 2^{4i}) \le \sqrt{6}\, 2^i$. Since $r_{ij} \ge s_{ij} > K_1$ for $(i, j) \notin \mathcal{I}_1^{in}$, the above inequality implies that

\[
\beta_j s_{ij} \le \frac{K_2 \sqrt{6}\, 2^j}{\sqrt{d}\, \ln K_1}
\]

for any $(i, j) \in \mathcal{I}_4^{in}$. Therefore,
\[
g_4(M) \le \frac{K_2 \sqrt{6}}{\sqrt{d}\, \ln K_1} \sum_{i\ge 1} \alpha_i \sum_{j : (i,j)\in\mathcal{I}_4^{in}} 2^j. \tag{45}
\]
Now note that whenever $(i, j) \in \mathcal{I}_4^{in}$, we have
\[
2^j \le \frac{\sqrt{d}\, r_{ij}}{K_1\, 2^i} \le \frac{\sqrt{d}}{K_1\, 2^i} \big( e^2\, 2^{4i} \big)^{1/4} = \frac{\sqrt{e}\,\sqrt{d}}{K_1},
\]
since $s_{ij} > K_1$ and $r_{ij} \le (e\, 2^{2i}/\alpha_i)^{1/4} \le (e^2\, 2^{4i})^{1/4}$. Hence, the second sum in (45) is bounded by $2\sqrt{e}\,\sqrt{d}/K_1$, and in view of (41) we conclude that $g_4(M) \le 8K_2\sqrt{6e}/(K_1 \ln K_1)$.

Claim 5. $g_5(M) \le 16$.

Proof. First note that if $(i, j) \in \mathcal{I}_5^{in}$, we have
\[
\alpha_i < \frac{1}{e\, 2^{2i}} \quad\text{and}\quad r_{ij} \le \Big( \frac{e\, 2^{2i}}{\alpha_i} \Big)^{1/4}.
\]
Hence, for any $(i, j) \in \mathcal{I}_5^{in}$ we obtain
\[
\alpha_i s_{ij} = \sqrt{d}\, \frac{\alpha_i}{2^{i+j}}\, r_{ij} \le \sqrt{d}\, \frac{\alpha_i}{2^{i+j}} \Big( \frac{e\, 2^{2i}}{\alpha_i} \Big)^{1/4} = \frac{\sqrt{\alpha_i}\,\sqrt{d}}{2^{i+j}} \big( \alpha_i\, e\, 2^{2i} \big)^{1/4} \le \frac{2\sqrt{d}}{2^{i+j}},
\]
where in the last inequality we used that $\alpha_i\, e\, 2^{2i} \le 1$ together with the crude bound $\alpha_i \le 4$. Thus,
\[
g_5(M) \le 2 \sum_{j\ge 1} \beta_j \sum_{i : (i,j)\in\mathcal{I}_5^{in}} \sqrt{d}\, 2^{-i-j}.
\]
Since the second sum is bounded by 2, we deduce that
\[
g_5(M) \le 4 \sum_{j\ge 1} \beta_j \le 16,
\]
where in the last inequality we used that $\sum_{j\ge 1} \beta_j \le 4$.

Putting all the claims together, we get

\[
g(M) = \sum_{(i,j)\in\mathcal{I}^{in}} \alpha_i \beta_j s_{ij} \le 16K_1 + 24 + 32K_2 + \frac{8K_2\sqrt{6e}}{K_1 \ln K_1} =: U(K_1, K_2).
\]
Working with the transposed matrix (and the corresponding graph), we get
\[
\sum_{(i,j)\in\mathcal{I}^{out}} \alpha_i \beta_j s_{ij} \le U.
\]
Putting together the two estimates above, we complete the proof.

Konstantin Tikhomirov, Department of Mathematics, Princeton University, E-mail: [email protected]

Pierre Youssef, Laboratoire de Probabilités et de Modèles aléatoires, Université Paris Diderot, E-mail: [email protected]
