arXiv:2102.08364v1 [math.PR] 16 Feb 2021

LARGE DEVIATIONS FOR THE LARGEST EIGENVALUE OF GAUSSIAN NETWORKS WITH CONSTANT AVERAGE DEGREE

SHIRSHENDU GANGULY AND KYEONGSIK NAM

Abstract. Large deviation behavior of the largest eigenvalue $\lambda_1$ of Gaussian networks (Erdős-Rényi random graphs $\mathcal{G}_{n,p}$ with i.i.d. Gaussian weights on the edges) has been the topic of considerable interest. In the recent works [30, 6], a powerful approach was introduced based on tilting measures by suitable spherical integrals to prove a large deviation principle, particularly establishing a non-universal behavior for a fixed $p < 1$ compared to the standard Gaussian ($p = 1$) case. The case when $p \to 0$ was however left completely open, with one expecting the dense behavior to hold only until the average degree is logarithmic in $n$. In this article we focus on the case of constant average degree, i.e., $p = d/n$ for some fixed $d > 0$. Results in [8] on general non-homogeneous Gaussian matrices imply that in this regime $\lambda_1$ scales like $\sqrt{\log n}$. We prove the following results towards a precise understanding of the large deviation behavior in this setting.

(1) (Upper tail probabilities and structure theorem): For $\delta > 0$, we pin down the exact exponent $\psi(\delta)$ such that $\mathbb{P}(\lambda_1 \ge \sqrt{2(1+\delta)\log n}) = n^{-\psi(\delta)+o(1)}$. Further, we show that conditioned on the upper tail event, with high probability, a unique maximal clique emerges with a very precise $\delta$ dependent size (which takes either one or two possible values), and the Gaussian weights on the edges of the clique are uniformly high in absolute value. Finally, we also prove an optimal localization result for the leading eigenvector, showing that it allocates most of its mass on the aforementioned clique, spread uniformly across its vertices.

(2) (Lower tail probabilities): The exact stretched exponential behavior $\mathbb{P}(\lambda_1 \le \sqrt{2(1-\delta)\log n}) = \exp(-n^{\delta+o(1)})$ is also established.

As an immediate corollary, one obtains that $\lambda_1$ is typically $(1+o(1))\sqrt{2\log n}$, a result which surprisingly appears to be new. A key ingredient in our proof is an extremal spectral theory for weighted graphs, obtained by an $\ell_2 \to \ell_1$ reduction of the standard variational formulation of the largest eigenvalue together with the classical Motzkin-Straus theorem [37], which could be of independent interest.

Contents
1. Introduction
2. Key ideas of the proofs
3. Spectral theory of weighted graphs
4. Upper tail large deviations: lower bound
5. Upper tail large deviations: upper bound
6. Structure conditioned on upper tail
7. Optimal localization of leading eigenvector
8. Uniform largeness of Gaussian weights
9. Lower tail large deviations
Appendix A. Key estimates
References

1. Introduction Spectral statistics arising from random matrices and their asymptotic properties have been the subject of major investigations for several years. Fundamental observables of interest include the empirical spectral measure as well as the edge (extreme) eigenvalues. The study of such quantities began in the classical setting of the Gaussian unitary and orthogonal ensembles (GUE and GOE), where the entries are complex or real i.i.d. Gaussians up to symmetry constraints. These exactly solvable examples admit complicated but explicit joint densities for the eigenvalues, which can be analyzed, albeit with considerable work, to pin down the precise behavior of several observables of interest. The central phenomenon driving this article is the atypical behavior of the largest eigenvalue of a random matrix. This falls within the framework of large deviations, which has attracted immense interest over the past two decades. Perhaps not surprisingly, this was first investigated in the above mentioned exactly solvable cases [3, 2]. Subsequently, Bordenave and Caputo [17] considered empirical spectral distributions of Wigner matrices with heavier-tailed entries, where large deviations are dictated by a relatively small number of large entries. This phenomenon was shown for the largest eigenvalue as well in [4]. Another class of random matrix models arises from random graphs, particularly the Erdős-Rényi graph $\mathcal{G}_{n,p}$ on $n$ vertices with edge probability $p \in (0,1)$. The literature on such graphs is massive, with a significant fraction devoted to the study of spectral properties. A long series of works established universality results for the bulk and edge of the spectrum in random graphs of average degree at least logarithmic in the graph size, drawing similarities to the Gaussian counterparts (cf. [27, 28] and the references therein).
For sparser graphs, however, including the case of constant average degree which is the focus of this article, progress has been relatively limited. Nonetheless, notable accomplishments include the results in [1, 12, 11, 32] about the edge of the spectrum, as well as the results of [19] and [18], which studied continuity properties of the limiting spectral measure and a large deviation theory of the related local limits, respectively. While large deviation theory for linear functions of independent random variables is by now classical (see [25]), a powerful theory of non-linear large deviations has recently been put forth, developed over several articles (some of which are reviewed below), which treats non-linear functions such as the spectral norm of a random matrix with i.i.d. entries. Amid the recent explosion of results around this theory, a series of works investigated spectral large deviations for $\mathcal{G}_{n,p}$, beginning with Chatterjee and Varadhan [23], where the authors proved a large deviation principle for the entire spectrum of $\mathcal{G}_{n,p}$ at scale $np$, building on their seminal work [22], in the case where $p$ is fixed and does not depend on $n$ (dense case). However, the sparse case where $p = p(n) \to 0$ was left completely open until a major breakthrough by Chatterjee and Dembo [21]. This led to considerable progress in developing the theory of large deviations for various functionals of interest for sparse random graphs [5, 7, 10, 26, 42]. Via a refined understanding of cycle counts in $\mathcal{G}_{n,p}$, obtained in [5, 9, 15, 22, 24, 31, 35, 36], one can deduce large deviation properties for eigenvalues using the trace method, and this was carried out in [14]. However, such arguments only extend to $p$ going to zero at a rate slower than $1/\sqrt{n}$, since cycle statistics fail to encode information about the spectral norm for sparser graphs.
Such sparser graphs were treated more recently in [13], where the first named author, along with Bhaswar Bhattacharya and Sohom Bhattacharya, analyzed the large deviation behavior of the spectral edge for sparse $\mathcal{G}_{n,p}$ in the entire “localized regime” when

$$\log n \gg \log(1/np) \quad\text{and}\quad np \ll \sqrt{\frac{\log n}{\log\log n}}, \tag{1}$$

where the extreme eigenvalues are governed by high degree vertices. This notably includes the well studied example of constant average degree. In a related direction, which merges the above two classical settings, a series of works [30, 6] have explored universality of the large deviation behavior of the largest eigenvalue of a Wigner matrix with i.i.d. sub-Gaussian entries. First, in [30], it was shown that if the Laplace transform of the entries is pointwise bounded by that of a standard real or complex Gaussian, then a universal large deviation principle (LDP), the same as in the Gaussian case, holds. Examples include Rademacher and uniform variables. However, the situation changes when the sub-Gaussian tails are not sharp. Perhaps the most interesting examples in this class are sparse Gaussian matrices, whose entries are obtained by multiplying a Gaussian variable with an independent Bernoulli variable with mean $p$. In [6], this more general setting is investigated and it is shown that the rate function for the LDP can indeed be different from the Gaussian case. The approach in both [30] and [6] broadly relies on considering appropriate tilts of the original measures and analyzing the associated spherical integrals. However, this approach has been shown to work only in the ‘dense’ case of constant $p$, where the typical behavior is still the same as when $p = 1$, leaving the sparse regime $p \ll 1$ completely open. In the case of $\mathcal{G}_{n,p}$, as established in [32, 27, 28], typically $\lambda_1 = (1+o(1))\max(\sqrt{d_1}, np)$, where $d_1$ denotes the maximum degree of the random graph. Consequently, $\lambda_1$ exhibits a transition at $np = \sqrt{\log n/\log\log n}$, below which the largest eigenvalue is governed by the largest degree. A similar phenomenon reflecting this transition for large deviations was established across the papers [14, 13].
In the case of Gaussian ensembles, although a precise result does not appear in the literature to the best of the authors’ knowledge, it is expected that the dense behavior extends to the case of average degree logarithmic in $n$ (an analogous result for Wigner matrices with bounded entries, which is more comparable to the setting of random graphs, was established in [40]). Beyond this, as the graph becomes sparser, a different behavior is expected to emerge. This motivates the present work, where we obtain a very precise understanding of the case of constant average degree, i.e., $p = d/n$, arguably the most interesting sparse case because of its connections to various models of …. Also relevant to this paper is a different line of research which, motivated by viewing a random matrix as a random linear operator, considers ‘non-homogeneous’ matrices. The most well studied example is a Gaussian matrix whose variance varies from entry to entry. In this general setting, even the leading order behavior of the spectral norm is far from obvious and requires a much more refined understanding than the concentration of measure bounds obtained as a consequence of the non-commutative Khintchine inequality. A beautiful conjecture posed by Latała [33], related to an earlier result of Seginer [39], states that the expected spectral norm of such non-homogeneous Gaussian matrices is, up to constants, the expectation of the maximum $\ell_2$ norm of a row or column; after a series of impressive accomplishments [8, 41], the conjecture was finally settled in [34]. Note that sparse Wigner matrices, quenching on the sparsity, fall into the above framework, with the variance of each entry being 0 or 1. It is worth mentioning that while the dependence on $n$ of the leading order behavior is pinned down in the above mentioned works, the techniques are not sharp enough to unearth finer properties such as the exact multiplicative constant.

We now move on to the statements of the main theorems after setting up some basic notation.

1.1. Setup and main results: We denote by $\mathcal{G}_n$ the set of all simple, undirected networks on $n$ vertices labelled $[n] := \{1, 2, \ldots, n\}$, i.e., simple graphs with a conductance value on each edge. For $G \in \mathcal{G}_n$, denote by $A(G) = (a_{ij})_{1\le i,j\le n}$ the adjacency matrix of $G$, that is, $a_{ij}$ is the conductance associated to the edge $(i,j)$ if the latter is an edge in $G$, and 0 otherwise. Thus graphs are trivially encoded as networks where the entries of $A$ are 0 or 1. For $F \in \mathcal{G}_n$, since $A(F)$ is a self-adjoint matrix, denote by $\lambda_1(F) \ge \lambda_2(F) \ge \cdots \ge \lambda_n(F)$ its eigenvalues in non-increasing order, and let $\|F\|_{op} := \|A(F)\|_{op} = \max\{|\lambda_1(F)|, |\lambda_n(F)|\}$ be the operator norm of $A(F)$. Throughout most of the paper we will be concerned with $\lambda_1(F)$, and for notational brevity we will often drop the subscript to denote the same.

In this paper we are interested in the sparse Erdős-Rényi random graph $\mathcal{G}_{n,p}$, where $p = d/n$ for some $d > 0$ which does not depend on $n$. We will denote by $X$ the random adjacency matrix associated to it. Thus for all $1 \le i < j \le n$, $X_{ij}$ is an independent Bernoulli random variable with mean $p$, and $X_{ii} = 0$ for all $i$. Let $Y$ be a standard GOE matrix independent of $X$, i.e., the entries $Y_{ij} \sim N(0,1)$, $i \le j$, are independent. The matrix of interest for us is $Z = X \odot Y$, i.e., $Z_{ij} = X_{ij}Y_{ij}$.

Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$ be the eigenvalues of the matrix $Z$. As a consequence of the already referred to work on the behavior of the spectral norm of general inhomogeneous Gaussian matrices [8], it follows that

$$\mathbb{E}(\lambda_1) \asymp \sqrt{\log n}. \tag{2}$$

One also obtains concentration around $\mathbb{E}(\lambda_1)$ using standard Gaussian techniques; see e.g. [8, Corollary 3.9]. However, so far these methods have not been able to obtain a sharper understanding, including the precise constant in front of $\sqrt{\log n}$, which we deduce as a simple corollary of our main theorems. We now move on to the exact statements of the results in this paper.

Theorem 1.1 (Upper tail probabilities).
For $\delta > 0$, define a function $\phi_\delta : \mathbb{N}_{\ge 2} \to \mathbb{R}$ by¹

$$\phi_\delta(k) := \frac{k(k-3)}{2} + \frac{1+\delta}{2}\cdot\frac{k}{k-1} \tag{3}$$

and $\psi(\delta) := \min_{k\in\mathbb{N}_{\ge 2}}\phi_\delta(k)$. Then,

$$\lim_{n\to\infty} -\frac{1}{\log n}\log\mathbb{P}\big(\lambda_1 \ge \sqrt{2(1+\delta)\log n}\big) = \psi(\delta). \tag{4}$$

¹ $\mathbb{N}$ will be used to denote the set of natural numbers, and $\mathbb{N}_{\ge k}$ to denote the set of natural numbers greater than or equal to $k$.

Remark 1.2 (Infinite phase transition in upper tail). The rate function $\psi$ in (4) is a continuous piecewise linear function with infinitely many pieces, which we now describe in detail. Since we are only concerned with minimizers over integers larger than 1, we momentarily consider

$$\phi_\delta(x) = \frac{x(x-3)}{2} + \frac{1+\delta}{2}\cdot\frac{x}{x-1}$$

as a function of real numbers $x > 1$ and notice that

$$\phi_\delta'(x) = x - \frac{3}{2} - \frac{1+\delta}{2}\cdot\frac{1}{(x-1)^2}. \tag{5}$$

Since $\phi_\delta''(x) = 1 + \frac{1+\delta}{(x-1)^3} > 0$ for $x > 1$, $\phi_\delta$ is strictly convex. Let $\mathcal{M}(\delta) := \arg\min\{\phi_\delta(k) : k \ge 2\}$ be the set of integer minimizers of $\phi_\delta(\cdot)$. By strict convexity, $\mathcal{M}(\delta)$ has at most two elements: it is either a single integer or a pair of consecutive integers. Precisely, denoting by $x(\delta) > 1$ the unique solution to $\phi_\delta'(x) = 0$, any element of $\mathcal{M}(\delta)$ is either $\lfloor x(\delta)\rfloor$ or $\lceil x(\delta)\rceil$. Moreover, the values of $\delta$ for which $\mathcal{M}(\delta)$ has size two form a discrete set. That is, there exist $0 = \delta_1 < \delta_2 < \delta_3 < \cdots$ such that the following holds: for any integer $k \ge 2$, $(\delta_{k-1}, \delta_k)$ is the collection of $\delta$ such that $\mathcal{M}(\delta) = \{k\}$, and $\delta_k$ is the unique $\delta$ such that $\mathcal{M}(\delta) = \{k, k+1\}$. To see this, since $\delta \mapsto x(\delta)$ is strictly increasing, it suffices to verify that the situation $\delta' < \delta''$, $\phi_{\delta'}(k+1) \le \phi_{\delta'}(k)$ and $\phi_{\delta''}(k) \le \phi_{\delta''}(k+1)$ never occurs. Observe that the contrary implies

$$\phi_{\delta'}(k+1) - \phi_{\delta'}(k) \le 0 \le \phi_{\delta''}(k+1) - \phi_{\delta''}(k).$$

By (5), $\phi_{\delta'}'(x) > \phi_{\delta''}'(x)$ for all $x > 1$, and integrating over $[k, k+1]$ gives $\phi_{\delta'}(k+1) - \phi_{\delta'}(k) > \phi_{\delta''}(k+1) - \phi_{\delta''}(k)$, which contradicts the above. Hence, for $\delta \in [\delta_{k-1}, \delta_k]$,

$$\psi(\delta) = \frac{1+\delta}{2}\cdot\frac{k}{k-1} + \frac{k(k-3)}{2},$$

which is a linear function of $\delta \in [\delta_{k-1}, \delta_k]$ for any fixed $k \ge 2$. This implies that $\psi(\delta)$ is a continuous piecewise linear function.

Also, by simple algebra, it follows from (5) that

$$\Big(\frac{1+\delta}{2}\Big)^{1/3} + 1 \le x(\delta) \le \Big(\frac{1+\delta}{2}\Big)^{1/3} + \frac{3}{2} \quad\text{for } \delta > 0. \tag{6}$$

Plugging this into (4), one thus obtains the following asymptotic behavior of the upper tail probabilities²:

$$\mathbb{P}\big(\lambda_1 \ge \sqrt{2(1+\delta)\log n}\big) = n^{-\left(\frac{1}{2}\delta + \frac{3}{2^{5/3}}\delta^{2/3} + O(\delta^{1/3}) + o(1)\right)} \quad\text{for large } \delta > 0, \tag{7}$$

and

$$\mathbb{P}\big(\lambda_1 \ge \sqrt{2(1+\delta)\log n}\big) = n^{-\delta+o(1)} \quad\text{for small } \delta > 0. \tag{8}$$

Remark 1.3 (Comparison with maximum of i.i.d. Gaussians). As the reader may have already noticed, for small $\delta$, the behavior in (8) is the same as that of the maximum of $n$ standard Gaussian variables. The reason for this will be discussed in Section 2. Having established the sharp order of the tail probabilities, we now state three results establishing a sharp structural behavior conditioned on the upper tail event $\mathcal{U}_\delta := \{\lambda_1 \ge \sqrt{2(1+\delta)\log n}\}$, thus unearthing the dominant mechanism dictating upper tail large deviations. The first result shows the existence of a clique of a very precise $\delta$ dependent size, establishing a sharp concentration for the maximal clique size conditioned on $\mathcal{U}_\delta$. For any graph $G \in \mathcal{G}_n$, let $k_G$ be the size of a maximal clique $K_G$ in $G$. Recall the definition of $\mathcal{M}(\delta)$ and let $h(\delta)$ be the smallest element of $\mathcal{M}(\delta)$. By Remark 1.2,

$$\Big|h(\delta) - \Big(\frac{1+\delta}{2}\Big)^{1/3} - 1\Big| \le 2. \tag{9}$$

Theorem 1.4 (Structure theorem). For any $\delta$ with $h(\delta) \ge 3$, i.e., $\delta > \delta_2$ (see Remark 1.2 for the definition of $\delta_k$),

$$\lim_{n\to\infty}\mathbb{P}\big(k_X \in \mathcal{M}(\delta) \,\big|\, \mathcal{U}_\delta\big) = 1. \tag{10}$$

Furthermore, with conditional probability tending to one, $K_X$ is unique and any clique of size at least 4 is a subset of $K_X$.
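The integer minimization in Remark 1.2, which determines both $\psi(\delta)$ in Theorem 1.1 and the clique sizes $\mathcal{M}(\delta)$ in Theorem 1.4, is easy to carry out exactly. The sketch below uses plain Python with exact rational arithmetic; the helper names `phi`, `minimizers`, `psi` and `breakpoint` are ours and hypothetical. The closed form $\delta_k = 2k(k-1)^2 - 1$ for $k \ge 2$ is our own elementary computation from (3), obtained by solving $\phi_\delta(k+1) - \phi_\delta(k) = (k-1) - \frac{1+\delta}{2k(k-1)} = 0$; it gives $\delta_2 = 3$ and $\delta_3 = 23$, so in particular $\psi(\delta) = \delta$, and (8) holds, for all $0 < \delta \le 3$.

```python
from fractions import Fraction


def phi(delta, k):
    """phi_delta(k) from (3): k(k-3)/2 + ((1+delta)/2) * k/(k-1), for integers k >= 2."""
    return Fraction(k * (k - 3), 2) + Fraction(1 + delta) / 2 * Fraction(k, k - 1)


def minimizers(delta):
    """The set M(delta) of integer minimizers k >= 2 of phi_delta (Remark 1.2).

    The increments phi(k+1) - phi(k) are increasing in k (discrete convexity),
    so we walk right while phi strictly decreases; the stopping point is h(delta).
    """
    k = 2
    while phi(delta, k + 1) < phi(delta, k):
        k += 1
    return [k, k + 1] if phi(delta, k + 1) == phi(delta, k) else [k]


def psi(delta):
    """Rate function psi(delta) = min over k >= 2 of phi_delta(k), as in Theorem 1.1."""
    return phi(delta, minimizers(delta)[0])


def breakpoint(k):
    """Hypothetical helper: the delta at which phi_delta(k) = phi_delta(k+1), for k >= 2."""
    return 2 * k * (k - 1) ** 2 - 1


for delta in [1, 3, 10, 23, 100]:
    print(delta, minimizers(delta), psi(delta))
```

For instance, `psi(10)` returns `Fraction(33, 4)`, attained only at $k = 3$, while `minimizers(23)` returns `[3, 4]`, a value of $\delta$ at which the maximal clique size has two possible values.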

² Throughout the paper, $o(1)$ will be used to denote functions of $n$ that tend to 0 as $n$ tends to infinity. However, we will also need to deal with quantities that go to zero as $\delta$ tends to infinity, which will be denoted by $o_\delta(1)$.

Note that the above statement in particular implies that the largest clique outside $K_X$ is at most a triangle, whose occurrence has constant probability. Thus the above result proves a two point concentration for the maximal clique size, and for values of $\delta$ such that $\mathcal{M}(\delta)$ only contains $h(\delta)$, it implies a one point concentration. Our next result asserts that most of the contribution to the spectral norm comes from $K_X$, with the Gaussians along the edges of the latter being uniformly high in absolute value.

Theorem 1.5 (Uniformly high Gaussian values). There exists $\zeta = \zeta(\kappa) > 0$ with $\lim_{\kappa\to 0}\zeta = 0$ such that the following holds. For $\kappa > 0$ and $\delta$ large enough, with probability (conditional on $\mathcal{U}_\delta$) going to 1, there exists $T \subset K_X$ such that $|T| \ge (1-\kappa)h(\delta)$ and

$$\frac{1}{h(\delta)^2}\sum_{i\neq j,\ i,j\in T}\Big|\,|Z_{ij}| - \frac{1}{h(\delta)}\sqrt{2(1+\delta)\log n}\,\Big| \le \frac{\zeta}{h(\delta)}\sqrt{2(1+\delta)\log n}. \tag{11}$$

Even though in the statement $\delta$ is chosen large enough as a function of $\kappa$, the proof will in fact give a quantitative, albeit technical, bound for all large $\delta$ and small $\kappa$, which can then be simplified into the form of the statement of the theorem by choosing $\delta$ depending on $\kappa$. Since the maximal clique $K_X$ has size $h(\delta)$ or $h(\delta)+1$ with probability going to 1 (conditional on $\mathcal{U}_\delta$), the above theorem shows that the Gaussian values $Z_{ij}$ on $K_X$ are uniformly high in absolute value and close to $\frac{1}{h(\delta)}\sqrt{2(1+\delta)\log n}$ in the $\ell_1$ sense. Our final structural result is an optimal localization statement about the leading eigenvector.

Theorem 1.6 (Optimal localization of eigenvector). Let $v = (v_1, \ldots, v_n)$ be the top eigenvector with $\|v\|_2 = 1$, and consider the unique maximal clique $K_X$ and its size $k_X$ from Theorem 1.4. For $\kappa > 0$, define the events

$$\mathcal{A}_1 := \Big\{\sum_{i\in K_X} v_i^2 \ge 1-\kappa\Big\} \quad\text{and}\quad \mathcal{A}_2 := \Big\{\frac{1}{k_X}\sum_{i\in K_X}\Big(v_i^2 - \frac{1}{k_X}\Big)^2 \le \frac{40\kappa}{k_X^2}\Big\}.$$

Then, for sufficiently large $\delta > 0$,

$$\lim_{n\to\infty}\mathbb{P}\big(\mathcal{A}_1\cap\mathcal{A}_2 \,\big|\, \mathcal{U}_\delta\big) = 1. \tag{12}$$

Thus the above theorem says that, for any $\kappa > 0$ and all sufficiently large $\delta$, with conditional probability tending to one given $\mathcal{U}_\delta$, the leading eigenvector places at least $1-\kappa$ of its mass on $K_X$, almost uniformly. Note that the last two theorems do not claim anything about the signs of the entries of the eigenvector or of the Gaussian values. This is because flipping the signs of an arbitrary subset of the entries of the leading eigenvector, and correspondingly flipping the signs of the Gaussians on the edges with exactly one endpoint in that subset, leaves the quadratic form unchanged. Having stated our results concerning upper tail deviations, the next result pins down the lower tail large deviation probability.

Theorem 1.7 (Lower tail probabilities). For any $0 < \delta < 1$,

$$\lim_{n\to\infty}\frac{1}{\log n}\log\log\frac{1}{\mathbb{P}\big(\lambda_1 \le \sqrt{2(1-\delta)\log n}\big)} = \delta. \tag{13}$$

As an immediate corollary of Theorems 1.1 and 1.7, one obtains the following ‘law of large numbers’ behavior, which, to our surprise, we were not able to locate in the literature.

Corollary 1.8. We have

$$\lim_{n\to\infty}\frac{\lambda_1}{\sqrt{\log n}} = \sqrt{2}$$

in probability. We conclude this discussion by remarking that although in principle our techniques may be used to analyze a wider subset of the parameter space, we have, for concreteness and aesthetic considerations, chosen to simply focus on the case of constant average degree. 1.2. Organization of the article. In Section 2 we provide a detailed account of the key ideas driving the proofs. In Section 3, we state and prove the key Proposition 3.1, which bounds the spectral norm in terms of the Frobenius norm for weighted graphs. The rest of the paper focuses on the proofs of Theorem 1.1 (Sections 4 and 5), Theorem 1.4 (Section 6), Theorem 1.6 (Section 7), Theorem 1.5 (Section 8) and Theorem 1.7 (Section 9). Certain straightforward but technical estimates are proved in the appendix. 1.3. Acknowledgement. The authors thank Noga Alon for pointing out the classical reference [37]. SG is partially supported by NSF grant DMS-1855688, NSF CAREER Award DMS-1945172 and a Sloan Research Fellowship. KN is supported by UCLA Mathematics department. This work was initiated when SG was participating in the Probability, Geometry, and Computation in High Dimensions program at the Simons Institute in Fall 2020.

2. Key ideas of the proofs In this section we provide a sketch of the arguments in the proofs of our main results.

Upper tail lower bound: This is straightforward. The strategy is to plant a clique of an appropriate size (an element of $\arg\min_k \phi_\delta(k)$) and have high valued Gaussians on all the clique edges, i.e., at least $\frac{\sqrt{2(1+\delta)\log n}}{k-1}$. The probability of a clique of size $k \ge 3$ appearing is, up to constants, $n^{k-\binom{k}{2}}$ (the proof follows by a second moment argument), while the probability of having high Gaussians is

$$\mathbb{P}\Big(Y_{ij} \ge \frac{\sqrt{2(1+\delta)\log n}}{k-1}\ \ \forall\, 1 \le i < j \le k\Big) \ge \Big(\frac{C}{\sqrt{\log n}}\Big)^{\binom{k}{2}} n^{-\frac{1+\delta}{(k-1)^2}\binom{k}{2}},$$

where the right hand side follows from standard Gaussian tail bounds (see (24) later). Thus the total cost at the polynomial scale is $n^{k-\binom{k}{2}}\,n^{-\frac{1+\delta}{(k-1)^2}\binom{k}{2}}$. Observe that the exponent is precisely $-\phi_\delta(k)$. When $k = 2$, one should view it slightly differently, since $k - \binom{k}{2} = 1 > 0$. Namely, there are of order $n^{k-\binom{k}{2}} = n$ edges, and hence the probability that there exists a Gaussian of value at least $\sqrt{2(1+\delta)\log n}$ is $n \cdot n^{-(1+\delta)+o(1)} = n^{-\delta+o(1)} = n^{-\phi_\delta(2)+o(1)}$. Finally, optimizing over $k$ yields the bound $n^{-\psi(\delta)+o(1)}$. It is worth noticing the contrasting behavior in the absence of the Gaussian variables, where in [13] it was shown that large deviations of the largest eigenvalue are governed by large deviations of the maximum degree, not by the appearance of a clique.
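As a sanity check on the exponent bookkeeping above, the following sketch (plain Python with exact rationals; `planted_clique_exponent` is a hypothetical helper of ours) adds the clique cost $\binom{k}{2} - k$ and the Gaussian cost $\binom{k}{2}\frac{1+\delta}{(k-1)^2}$ and confirms that the sum equals $\phi_\delta(k)$ from (3). Constants and $\sqrt{\log n}$ prefactors are ignored, since they only contribute $n^{o(1)}$. For $k = 2$ the negative clique cost $-1$ encodes the order $n$ available edges, exactly as in the text.

```python
from fractions import Fraction
from math import comb


def planted_clique_exponent(k, delta):
    """Exponent e such that planting a k-clique whose C(k,2) Gaussians all exceed
    sqrt(2(1+delta) log n)/(k-1) has probability n^{-e + o(1)}."""
    m = comb(k, 2)                                          # number of clique edges
    clique_cost = m - k                                     # expected number of k-cliques ~ n^{k - C(k,2)}
    gaussian_cost = m * Fraction(1 + delta) / (k - 1) ** 2  # each edge costs n^{-(1+delta)/(k-1)^2}
    return clique_cost + gaussian_cost


def phi(delta, k):
    """phi_delta(k) from (3)."""
    return Fraction(k * (k - 3), 2) + Fraction(1 + delta) / 2 * Fraction(k, k - 1)


# The planted-clique exponent coincides with phi_delta(k) for every k >= 2.
for k in range(2, 9):
    for delta in (0, 1, 3, Fraction(7, 2), 23):
        assert planted_clique_exponent(k, delta) == phi(delta, k)
```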

Upper tail upper bound: This is the most difficult of the four bounds, and a significant part of the work goes into proving it. The first step is to make the underlying graph sparser by only focusing on the Gaussians with large enough values. As will be apparent soon, the reason for this is two-fold: (a) it is much harder for the graph restricted to small Gaussian values to have a high spectral norm, so for our purposes we will treat that component as spectrally negligible; (b) the graph restricted to high Gaussian values is much sparser and hence shatters into smaller components whose sizes we can control, and since eigenvalues of different components do not interact with each other, this will be particularly convenient. Proceeding to implement this strategy, decompose the Gaussian random variables $Y_{ij}$ as

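$$Y_{ij} = Y_{ij}^{(1)} + Y_{ij}^{(2)},$$

where $Y_{ij}^{(1)} = Y_{ij}\mathbf{1}_{|Y_{ij}| > \sqrt{\varepsilon\log\log n}}$ and similarly $Y_{ij}^{(2)} = Y_{ij}\mathbf{1}_{|Y_{ij}| \le \sqrt{\varepsilon\log\log n}}$. Thus, we can write the matrix $Z$ as $Z^{(1)} + Z^{(2)}$ with

$$Z_{ij}^{(1)} = X_{ij}Y_{ij}^{(1)}, \qquad Z_{ij}^{(2)} = X_{ij}Y_{ij}^{(2)}, \tag{14}$$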

and similarly $X = X^{(1)} + X^{(2)}$, i.e., $X_{ij}^{(1)} = X_{ij}\mathbf{1}_{|Y_{ij}| > \sqrt{\varepsilon\log\log n}}$. We next prove an upper bound on the probability that $Z^{(2)}$ has high spectral norm which is much smaller than that for $Z$, implying that the spectral behavior of $Z$, even under large deviations, is dictated by that of $Z^{(1)}$. The choice of the truncation threshold is governed by the fact that the typical spectral norm of $\mathcal{G}_{n,d/n}$ is of order $\sqrt{\log n/\log\log n}$, which in itself is a consequence of the fact that the maximum degree is of order $\log n/\log\log n$: since $|Z^{(2)}_{ij}| \le \sqrt{\varepsilon\log\log n}\,X_{ij}$ entrywise, $\|Z^{(2)}\|_{op} \le \sqrt{\varepsilon\log\log n}\,\|X\|_{op}$, which is typically of order $\sqrt{\varepsilon\log n}$. Sharp large deviation behavior for eigenvalues of sparse random graphs was recently established in the already mentioned work [13], which we use to make this step precise. This allows one to focus simply on $Z^{(1)}$, or on the underlying graph $X^{(1)}$, conditioning on which makes the spectral behavior of the individual connected components independent, guided by Gaussian variables each of which is conditioned to be at least $\sqrt{\varepsilon\log\log n}$ in absolute value. Let $C_1, \ldots, C_k$ be its connected components. At this point, denoting the network $Z$ restricted to $C_\ell$ by $Z_\ell$, we relate $\|Z_\ell\|_{op}$ to its Frobenius norm $\|Z_\ell\|_F$. The trivial bound $\|Z_\ell\|_{op} \le \|Z_\ell\|_F$ is easy to see. The next idea, which is the key one in this paper, is the following sharp improvement over this bound. Namely, we show that if $k_\ell$ is the size of the maximal clique in $Z_\ell$, then

$$\|Z_\ell\|_{op}^2 \le \frac{k_\ell - 1}{k_\ell}\|Z_\ell\|_F^2. \tag{15}$$
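The truncation (14) and the reduction to connected components can be illustrated numerically. The sketch below is a toy simulation with numpy, not part of the proofs: the helper names `sample_network`, `truncate` and `components`, and the parameters $n = 400$, $d = 3$, $\varepsilon = 1$, are arbitrary choices of ours, and at this size none of the asymptotics are visible. It samples $Z = X \odot Y$, splits it at $\sqrt{\varepsilon\log\log n}$, and checks that $\lambda_1(Z^{(1)})$ equals the maximum of the top eigenvalues of the blocks $Z_\ell$, which is the sense in which different components do not interact.

```python
import numpy as np


def sample_network(n, d, rng):
    """Z = X ⊙ Y: X the adjacency matrix of G(n, d/n), Y independent N(0,1) weights (symmetric)."""
    upper = np.triu(np.ones((n, n), dtype=bool), k=1)
    X = (rng.random((n, n)) < d / n) & upper
    Z = np.where(X, rng.normal(size=(n, n)), 0.0)
    return Z + Z.T


def truncate(Z, eps):
    """Split Z = Z1 + Z2 at |Z_ij| = sqrt(eps log log n), as in (14)."""
    n = Z.shape[0]
    thr = np.sqrt(eps * np.log(np.log(n)))
    Z1 = np.where(np.abs(Z) > thr, Z, 0.0)
    return Z1, Z - Z1


def components(A):
    """Connected components of the graph {(i, j): A_ij != 0}, via iterative DFS."""
    n = A.shape[0]
    seen = np.zeros(n, dtype=bool)
    comps = []
    for s in range(n):
        if seen[s]:
            continue
        seen[s] = True
        stack, comp = [s], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in np.flatnonzero(A[u]):
                if not seen[w]:
                    seen[w] = True
                    stack.append(w)
        comps.append(comp)
    return comps


rng = np.random.default_rng(1)
Z = sample_network(n=400, d=3.0, rng=rng)
Z1, Z2 = truncate(Z, eps=1.0)
comps = components(Z1)
lam_full = np.linalg.eigvalsh(Z1)[-1]          # eigvalsh returns ascending eigenvalues
lam_blocks = max(np.linalg.eigvalsh(Z1[np.ix_(c, c)])[-1] for c in comps)
print(len(comps), max(len(c) for c in comps), lam_full, lam_blocks)
```

Since permuting vertices by component makes $Z^{(1)}$ block diagonal, its spectrum is the union of the block spectra; the check `lam_full == lam_blocks` is exactly that statement.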

The proof of the above relies on reducing the standard $\ell_2$ variational problem for the spectral norm to an $\ell_1$ version, which can be solved by ‘mass transportation’ techniques. This leads us to a bound of the form

$$\mathbb{P}\big(\|Z_\ell\|_{op} \ge \sqrt{2(1+\delta)\log n}\big) \le \mathbb{P}\Big(\|Z_\ell\|_F^2 \ge \frac{k_\ell}{k_\ell - 1}\,2(1+\delta)\log n\Big). \tag{16}$$

Now, quenching the graph $X$, the random variable $\|Z_\ell\|_F^2$ can be viewed at first glance as twice a chi-squared random variable whose degrees of freedom are given by the number of edges $|E(C_\ell)|$. As long as $|E(C_\ell)|$ is $o(\log n)$, the degrees of freedom do not affect the latter probability at leading order, and it behaves as the square of a single Gaussian. This is what justifies the sparsification step mentioned at the outset, which ensures that $|C_\ell| = O_\varepsilon(\log n/\log\log n)$; along with the tree-like behavior of $C_\ell$, this implies $|E(C_\ell)| = O_\varepsilon(\log n/\log\log n)$ as well (here $O_\varepsilon(\cdot)$ is the standard notation indicating that the implicit constant is a function of $\varepsilon$).

However, there is one crucial subtlety that we have overlooked so far. Namely, $\|Z_\ell\|_F^2$ is not simply (twice) a chi-squared random variable but instead (twice) a sum of squares of independent Gaussian variables, each conditioned to have absolute value at least $\sqrt{\varepsilon\log\log n}$. This makes the tail heavier by exactly the amount which, on interacting with the $\varepsilon$ dependence in the size of $C_\ell$, begins to affect the leading order probability. Thus, unfortunately, the above strategy does not quite work as stated.
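The following back-of-the-envelope computation is our own heuristic, not taken from the proofs; it only indicates why the conditioning matters at polynomial scale. Write $a := \sqrt{\varepsilon\log\log n}$ and $m := |E(C_\ell)|$. Conditioned on $|Y| > a$, the density of $Y^2$ at $a^2 + w$ is $\frac{a}{2\sqrt{a^2+w}}e^{-w/2}$, so $Y^2 - a^2$ behaves roughly like an exponential variable with rate $1/2$. Letting $\Gamma_m$ denote a sum of $m$ independent such exponentials, for $t \asymp \log n$ with $t > ma^2$ and $m = o(\log n)$,

$$\mathbb{P}\Big(\sum_{e\in E(C_\ell)}Y_e^2 \ge t \;\Big|\; |Y_e| > a\ \ \forall e\Big) \approx \mathbb{P}\big(\Gamma_m \ge t - ma^2\big) = e^{-\frac{t}{2}}\,e^{\frac{ma^2}{2}}\,n^{o(1)} = e^{-\frac{t}{2}}\,(\log n)^{\frac{\varepsilon m}{2}}\,n^{o(1)},$$

whereas a single unconditioned Gaussian square gives $e^{-t/2}n^{o(1)}$. With $m = C_\varepsilon\log n/\log\log n$ the extra factor is

$$(\log n)^{\frac{\varepsilon m}{2}} = n^{\frac{\varepsilon C_\varepsilon}{2}},$$

a genuine polynomial factor. Heuristically, $X^{(1)}$ has mean degree $(\log n)^{-\varepsilon/2+o(1)}$, and a very subcritical Erdős-Rényi graph with mean degree $\mu \to 0$ has largest components with about $\log n/\log(1/\mu)$ vertices, which suggests $C_\varepsilon \approx 2/\varepsilon$; if so, the loss does not vanish as $\varepsilon \to 0$, consistent with the remark that the naive strategy fails at leading order.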

To address this, we further rely on the fact that $C_\ell$ is almost tree-like, having a bounded number of ‘tree-excess edges’ with high probability, and revise our strategy in the following way. Consider the unit eigenvector $v$ corresponding to the largest eigenvalue $\lambda^{(\ell)} := \lambda_1(C_\ell)$, so that $v^\top Z_\ell v = \lambda^{(\ell)}$. The key idea now is to split the vertices of $C_\ell$ according to high and low values of $|v|$. We first show that it is much more costly for the Frobenius norm to be high on the subgraph induced by the low values of $v$; this is where the tree-like property is crucially used. Thus we focus only on the $O(1)$ vertices supporting high values of $v$, and since the maximum degree is $O(\log n/\log\log n)$ (without an $\varepsilon$ dependence in the constant), the strategy originally outlined can be made to work for the subgraph induced by these vertices.

While the next three proofs are rather technically involved, here we simply review the high level strategies involved.

Emergence of a unique maximal clique: The above proofs imply that the graph $X^{(1)}$ under $\mathcal{U}_\delta$ contains a clique $K_X^{(1)}$ whose size is sharply concentrated on $\mathcal{M}(\delta)$ (which, as in the statement of Theorem 1.4, denotes the set of minimizers of $\phi_\delta(\cdot)$). It also follows that $K_X^{(1)}$ is unique. We then show that, on account of sparsity, superimposing $X^{(2)}$ on $X^{(1)}$ does not alter this. Particularly convenient is the fact that, conditional on $X^{(1)}$, the spectral behavior of $Z^{(1)}$ and the random graph $X^{(2)}$ are independent. However, making this precise is delicate and is one of the most technical parts of the paper, relying on a rather refined understanding of the graph $X^{(1)}$ under the large deviation behavior of $\lambda(Z^{(1)})$. Such understanding also allows us to show that there is no other clique in $X$ of size at least 4 which is not contained in $K_X$.

Localization of the leading eigenvector: The proof of this relies on the fact that (15) is sharp only when the leading eigenvector is supported on the maximal clique $K_X$. We prove a quantitative version of this fact, showing that significant mass away from the clique results in a deteriorated form of (15), which then makes $\mathcal{U}_\delta$ much more costly than the already proven lower bound on its probability allows. Further, a similar approach is used to prove the desired flatness of the vector on $K_X$.

Flatness of the Gaussian values on the maximal clique: Using the previous structural result about the leading eigenvector $v = (v_1, v_2, \ldots, v_n)$, we consider the set $T \subset K_X$ such that $|v_i| \approx \frac{1}{\sqrt{k_X}}$ for all $i \in T$ (we do not make the meaning of $\approx$ precise here). The previous results guarantee that, conditional on $\mathcal{U}_\delta$, $|T| \ge (1-\kappa)k_X$ and $|k_X - h(\delta)| \le 1$. By first showing that the spectral contribution from the edges incident on $T^c$ is negligible, it follows that the quadratic form satisfies $v^\top Zv \approx v_T^\top Z_T v_T$, where $v_T$ and $Z_T$ are the restrictions to the subgraph induced on $T$. Now, owing to the flatness of $v$ on $T$ (which is why we work with $T$ and not $K_X$), it follows that

$$v_T^\top Z_T v_T \le (1+o_\delta(1))\,\frac{2}{h(\delta)}\,\|Z_T\|_1,$$

where the $\ell_p$ norm $\|Z_T\|_p$ is defined by

$$\|Z_T\|_p := \Big(\sum_{i<j,\ i,j\in T}|Z_{ij}|^p\Big)^{1/p}.$$

In fact, since $v_T^\top Z_T v_T$ is at least of order $\sqrt{2(1+\delta)\log n}$ on $\mathcal{U}_\delta$, the above argument only implies a lower bound on $\|Z_T\|_1$, while the upper bound follows from the following sharp bound on the $\ell_2$ norm, which is a consequence of previous arguments (e.g., (16)):

$$\|Z_T\|_2^2 = (1+o_\delta(1))(1+\delta)\log n.$$

Using the above two bounds, one can conclude the statement of the theorem in a straightforward fashion.
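One way to see how the two bounds combine is the following heuristic chain; it is our own reading of “straightforward”, not the paper’s written argument, and $\approx$, $\lesssim$ suppress all factors of the form $1 + O(\kappa) + o_\delta(1)$. Let $m := \binom{|T|}{2} \approx h(\delta)^2/2$ be the number of entries of $Z_T$. By Cauchy-Schwarz over these $m$ entries,

$$\sqrt{2(1+\delta)\log n} \lesssim v_T^\top Z_T v_T \lesssim \frac{2}{h(\delta)}\|Z_T\|_1 \le \frac{2}{h(\delta)}\sqrt{m}\,\|Z_T\|_2 \approx \frac{2}{h(\delta)}\cdot\frac{h(\delta)}{\sqrt{2}}\cdot\sqrt{(1+\delta)\log n} = \sqrt{2(1+\delta)\log n}.$$

Since the chain starts and ends at the same quantity, the Cauchy-Schwarz step is nearly an equality, i.e., $\|Z_T\|_1^2/m \approx \|Z_T\|_2^2$. The identity

$$\sum_{i<j,\ i,j\in T}\big(|Z_{ij}| - \mu\big)^2 = \|Z_T\|_2^2 - \frac{\|Z_T\|_1^2}{m}, \qquad \mu := \frac{\|Z_T\|_1}{m} \approx \frac{\sqrt{2(1+\delta)\log n}}{h(\delta)},$$

then shows that the empirical variance of the $|Z_{ij}|$ is a vanishing fraction of $\|Z_T\|_2^2$, and one more application of Cauchy-Schwarz, $\sum\big||Z_{ij}| - \mu\big| \le \sqrt{m}\,\big(\sum(|Z_{ij}| - \mu)^2\big)^{1/2}$, turns this into an $\ell_1$ statement of the same shape as (11).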

Lower tail: The upper bound on the probability can be obtained simply by a comparison with the maximum of $O(n)$ independent Gaussians. For the lower bound, $Z^{(2)}$ can still be considered spectrally negligible, while for $Z^{(1)}$, conditioning on $X^{(1)}$ being ‘nice’ (none of the components being too large, each having bounded tree excess), we use the results about the upper tail to upper bound the probability that $\lambda^{(\ell)} \ge \sqrt{2(1-\delta)\log n}$ for a given connected component $C_\ell$, or in other words, to lower bound $\mathbb{P}(\lambda^{(\ell)} \le \sqrt{2(1-\delta)\log n})$, where $\lambda^{(\ell)} := \lambda_1(C_\ell)$. Since $\lambda_1(Z^{(1)}) = \max_\ell \lambda^{(\ell)}$ and conditioning on the graph makes the $\lambda^{(\ell)}$ independent across different values of $\ell$, the result follows in a straightforward fashion.
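The comparison in the first sentence above can be written out in a few lines; this is our own elaboration under the model of Section 1.1. For any edge $\{i,j\}$, testing the variational formula for $\lambda_1$ against the unit vectors $(e_i \pm e_j)/\sqrt{2}$ (recall $Z_{ii} = 0$) gives $\lambda_1(Z) \ge |Z_{ij}|$. The variables $X_{ij}|Y_{ij}|$, $i < j$, are i.i.d., so with $t := \sqrt{2(1-\delta)\log n}$ and $\mathbb{P}(|Y_{12}| > t) = n^{-(1-\delta)+o(1)}$,

$$\mathbb{P}(\lambda_1 \le t) \le \mathbb{P}\Big(\max_{i<j}X_{ij}|Y_{ij}| \le t\Big) = \Big(1 - \frac{d}{n}\,\mathbb{P}(|Y_{12}| > t)\Big)^{\binom{n}{2}} \le \exp\Big(-\binom{n}{2}\frac{d}{n}\,n^{-(1-\delta)+o(1)}\Big) = \exp\big(-n^{\delta+o(1)}\big),$$

using $1 - x \le e^{-x}$. This yields the inequality $\liminf_{n\to\infty}\frac{1}{\log n}\log\log\frac{1}{\mathbb{P}(\lambda_1 \le t)} \ge \delta$ in (13).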

3. Spectral theory of weighted graphs As outlined in Section 2, a key ingredient in our proofs is a new deterministic bound on the spectral norm in terms of the Frobenius norm via an $\ell_2 \to \ell_1$ reduction. Though this is of independent interest, the proofs are somewhat technical, and the reader only interested in the large deviation aspects can, at first read, simply treat this result as an input in the proofs of the main theorems.

3.1. Spectral norm and Frobenius norm. For a Hermitian matrix $A$ of size $n \times n$, let $\lambda_1 \ge \cdots \ge \lambda_n$ be its eigenvalues in non-increasing order. Then we have

$$\mathrm{tr}(A^k) = \lambda_1^k + \cdots + \lambda_n^k,$$

which immediately implies that for any even positive integer $k$,

$$\lambda_1^k \le \mathrm{tr}(A^k) \le n\|A\|_{op}^k. \tag{17}$$

We denote by $\|A\|_F$ the Frobenius norm of the matrix $A$:

$$\|A\|_F := (\mathrm{tr}(A^2))^{1/2} = \Big(\sum_{1\le i,j\le n}|a_{ij}|^2\Big)^{1/2}.$$

Then, taking $k = 2$ above, we record the following trivial bound:

$$\lambda_1^2 \le \|A\|_F^2. \tag{18}$$

3.2. Refined bound on spectral norms for weighted graphs. We now move on to a sharp bound on the spectral norm in terms of the Frobenius norm for networks, improving the above. Before stating the result, let us discuss a situation where one already obtains an improvement over (18), namely bipartite graphs. Owing to the symmetry of the spectrum of a bipartite graph, $\lambda_1 = -\lambda_n$, and hence, since $\|A\|_F^2 = \sum_i\lambda_i^2 \ge \lambda_1^2 + \lambda_n^2$,

$$\lambda_1(A)^2 \le \frac{1}{2}\|A\|_F^2.$$

The main result of this section is a new and sharp generalization of this inequality.
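A quick numerical sanity check of (18), of the trace bounds in the form $\lambda_1^k \le \mathrm{tr}(A^k) \le n\max_i|\lambda_i|^k$ for even $k$, and of the bipartite improvement, can be run with random Gaussian conductances. This is a minimal sketch using numpy; the helper name `random_network`, the sizes and the seed are arbitrary choices of ours.

```python
import numpy as np


def random_network(n, rng):
    """Symmetric matrix with independent N(0,1) conductances above the diagonal, zero diagonal."""
    W = np.triu(rng.normal(size=(n, n)), k=1)
    return W + W.T


rng = np.random.default_rng(2)
A = random_network(8, rng)
lam = np.linalg.eigvalsh(A)                 # ascending: lam[0] = lambda_n, lam[-1] = lambda_1
op = np.abs(lam).max()                      # operator norm
for k in (2, 4, 6):                         # even powers only
    tr = np.trace(np.linalg.matrix_power(A, k))
    assert lam[-1] ** k <= tr * (1 + 1e-12)
    assert tr <= len(lam) * op ** k * (1 + 1e-12)
assert lam[-1] ** 2 <= np.sum(A ** 2) * (1 + 1e-12)       # (18): squared Frobenius norm

# Bipartite network: conductances only between parts {0,...,3} and {4,...,7}.
B = np.zeros((8, 8))
B[:4, 4:] = rng.normal(size=(4, 4))
B = B + B.T
mu = np.linalg.eigvalsh(B)
assert np.isclose(mu[-1], -mu[0])                          # spectrum symmetric about 0
assert mu[-1] ** 2 <= 0.5 * np.sum(B ** 2) * (1 + 1e-12)   # halved Frobenius bound
```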

Proposition 3.1. Let $k$ be the maximal size of a clique contained in $G$. Then, for any conductance $a: E\to\mathbb{R}$,
$$\lambda_1(A)^2\le\frac{k-1}{k}\|A\|_F^2. \tag{19}$$

Remark 3.2. For $G$ a clique of size $k$ with adjacency matrix $A$, it is straightforward to see that
$$\lambda_1(A)^2 = \frac{k-1}{k}\|A\|_F^2. \tag{20}$$
This follows because a $k\times k$ matrix with off-diagonal entries $1$ and diagonal entries $0$ has largest eigenvalue $k-1$ and Frobenius norm $\sqrt{k^2-k}$.

The proof of the proposition relies crucially on the following bound, which goes back to the seminal work of Motzkin and Straus [37]. We include its proof for completeness.

Lemma 3.3. Suppose that $k$ is the maximal size of a clique contained in the graph $G$ with vertex set $[n]$. Let $f = (f_1,\dots,f_n)$ be a vector with $\sum_{i=1}^nf_i = s$ and $f_i\ge0$. Then
$$\sum_{i < j,\; i\sim j}f_if_j\le\frac{k-1}{2k}s^2. \tag{21}$$
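Proposition 3.1 and the equality case in Remark 3.2 can be checked on small random networks. The sketch below is our own, assuming NumPy. It computes the clique number by brute force, which is only feasible for a handful of vertices, and reports the largest observed value of $\lambda_1(A)^2/\big(\tfrac{k-1}{k}\|A\|_F^2\big)$. By (19), this value should never exceed $1$.

```python
import itertools

import numpy as np


def clique_number(adj: np.ndarray) -> int:
    """Size of a largest clique, by brute force (small graphs only)."""
    n = adj.shape[0]
    for size in range(n, 1, -1):
        for subset in itertools.combinations(range(n), size):
            if all(adj[i, j] for i, j in itertools.combinations(subset, 2)):
                return size
    return 1


def worst_prop_3_1_ratio(n: int = 7, p: float = 0.5, trials: int = 300, seed: int = 0) -> float:
    """Max over samples of lambda_1(A)^2 / ((k-1)/k * ||A||_F^2); (19) says it is <= 1."""
    rng = np.random.default_rng(seed)
    worst = 0.0
    for _ in range(trials):
        upper = np.triu(rng.random((n, n)) < p, 1)
        adj = upper | upper.T
        w = np.triu(rng.standard_normal((n, n)), 1)
        a = (w + w.T) * adj                # Gaussian conductances on the edges of G
        frob_sq = float(np.sum(a ** 2))
        if frob_sq == 0.0:                 # empty graph: both sides of (19) vanish
            continue
        k = clique_number(adj)
        lam1 = np.linalg.eigvalsh(a)[-1]
        worst = max(worst, lam1 ** 2 / ((k - 1) / k * frob_sq))
    return worst


# Remark 3.2: equality for the unweighted clique K_k.
k = 5
clique = np.ones((k, k)) - np.eye(k)
lam1 = np.linalg.eigvalsh(clique)[-1]
print(lam1 ** 2 / np.sum(clique ** 2), (k - 1) / k)  # both 0.8 up to rounding
print(worst_prop_3_1_ratio())                         # at most 1 by Proposition 3.1
```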

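Proof of Proposition 3.1. By the variational characterization of the largest eigenvalue,
$$\lambda_1(A) = \sup_{\|f\|_2=1}\sum_{i\sim j}a_{ij}f_if_j,$$
where the sum runs over ordered pairs of adjacent vertices. Thus, for any conductance $a: E\to\mathbb{R}$,
$$\begin{aligned}\frac{\lambda_1(A)}{\|A\|_F} &= \sup_{\|f\|_2=1}\frac{\sum_{i\sim j}a_{ij}f_if_j}{\|A\|_F}\le\sup_{\|f\|_2=1}\frac{(\sum_{i\sim j}a_{ij}^2)^{1/2}(\sum_{i\sim j}f_i^2f_j^2)^{1/2}}{\|A\|_F}\\&= \sup_{\|f\|_2=1}\Big(\sum_{i\sim j}f_i^2f_j^2\Big)^{1/2} = \sup_{\|w\|_1=1,\ w_i\ge0}\Big(\sum_{i\sim j}w_iw_j\Big)^{1/2}.\end{aligned}$$
The inequality follows from the Cauchy-Schwarz inequality. The final equality, obtained by substituting $w_i = f_i^2$, is the $\ell_2\to\ell_1$ reduction. By Lemma 3.3 (the ordered sum is twice the sum over $i < j$), we have
$$\sup_{\|w\|_1=1,\ w_i\ge0}\Big(\sum_{i\sim j}w_iw_j\Big)^{1/2}\le\Big(\frac{k-1}{k}\Big)^{1/2},$$
which finishes the proof. ∎

We now provide the proof of Lemma 3.3.

Proof of Lemma 3.3. The proof is based on a 'mass transportation' argument. By homogeneity, it suffices to assume $s=1$. We first verify (21) when $G$ is itself a clique of size $m$. In other words, we claim that if $\sum_{i=1}^mf_i = 1$ and $f_i\ge0$, then
$$\sum_{1\le i < j\le m}f_if_j\le\frac{m-1}{2m}. \tag{22}$$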

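This follows from the simple identity $2\sum_{1\le i < j\le m}f_if_j = \big(\sum_{i=1}^mf_i\big)^2-\sum_{i=1}^mf_i^2$ and the fact that $\sum_{i=1}^mf_i^2\ge\frac1m$ (by the Cauchy-Schwarz inequality).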

$$\sum_{i < j,\ i\sim j}f_if_j = f_{v_1}\sum_{i\sim v_1}f_i+f_{v_2}\sum_{j\sim v_2}f_j+\sum_{i < j,\ i,j\ne v_1,v_2,\ i\sim j}f_if_j$$
is linear in $f_{v_1}$ and $f_{v_2}$. Labelling the two vertices so that $\sum_{i\sim v_1}f_i\ge\sum_{j\sim v_2}f_j$, the sum does not decrease when $f = (\dots,f_{v_1},\dots,f_{v_2},\dots)$ is replaced by $f^{(1)} = (\dots,f_{v_1}+f_{v_2},\dots,0,\dots)$. Removing the zero at $v_2$ gives a new vector $\tilde f^{(1)}$ on the graph $G_1$ obtained by deleting the vertex $v_2$ and its incident edges. Repeating this procedure yields a sequence of vectors $\tilde f^{(1)},\dots,\tilde f^{(\ell)}$ and graphs $G_1,\dots,G_\ell$, where $G_{i+1}$ is obtained from $G_i$ by deleting some vertex $w_{i+1}$ and its incident edges. The procedure terminates once every pair of vertices in $G_\ell$ is connected, i.e., once $G_\ell$ is a clique of size $m\le k$. This, together with (22) and the fact that $\frac{m-1}{2m}\le\frac{k-1}{2k}$ for $m\le k$, finishes the proof. ∎
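The mass transportation step above is constructive, so it can be run directly. The sketch below (our own, assuming NumPy; `transport_to_clique` is a hypothetical name) repeatedly picks two surviving non-adjacent vertices. The vertex whose neighbourhood carries more mass absorbs the other's mass, and the emptied vertex is deleted. The run records the quadratic form $\sum_{i < j,\,i\sim j}f_if_j$ after each step. That value should never decrease, the survivors should form a clique, and the final value should obey (22).

```python
import itertools

import numpy as np


def quad_form(adj: np.ndarray, f: np.ndarray) -> float:
    """sum_{i<j, i~j} f_i f_j; f^T A f counts each edge twice."""
    return 0.5 * float(f @ adj.astype(float) @ f)


def transport_to_clique(adj: np.ndarray, f: np.ndarray):
    """Run the mass transportation of Lemma 3.3 until the survivors form a clique."""
    f = f.astype(float).copy()
    alive = list(range(len(f)))
    history = [quad_form(adj, f)]
    while True:
        pair = next(((u, v) for u, v in itertools.combinations(alive, 2) if not adj[u, v]), None)
        if pair is None:                      # all surviving pairs adjacent: a clique
            return f, alive, history
        u, v = pair
        s_u = sum(f[i] for i in alive if adj[u, i])
        s_v = sum(f[i] for i in alive if adj[v, i])
        keep, drop = (u, v) if s_u >= s_v else (v, u)
        f[keep] += f[drop]                    # move all mass of `drop` onto `keep`
        f[drop] = 0.0
        alive.remove(drop)
        history.append(quad_form(adj, f))


rng = np.random.default_rng(3)
n = 9
upper = np.triu(rng.random((n, n)) < 0.5, 1)
adj = upper | upper.T
f0 = rng.dirichlet(np.ones(n))                # nonnegative, sums to 1 (s = 1)
f_end, survivors, history = transport_to_clique(adj, f0)
m = len(survivors)
print(history[0], history[-1], (m - 1) / (2 * m))
```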

We end this section with a related short technical lemma that we will need later. The reader may skip it for now and return to it when it is used.

Lemma 3.4. Suppose that $G$ is a tree with vertex set $[n]$, and that $s,\eta$ are positive numbers. Let $v = (v_1,\dots,v_n)$ be a vector with $\sum_iv_i = s$ and $0\le v_i\le\eta$. Then
$$\sum_{i < j,\ i\sim j}v_iv_j\le\begin{cases}\frac14s^2, & s < 2\eta,\\ \eta(s-\eta), & s\ge2\eta.\end{cases} \tag{23}$$
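The bound (23) can be probed with a random search over small trees, together with a star example showing that the second case is attained. The sketch below is our illustration, not the paper's proof. It assumes NumPy, and the helper names are hypothetical. Random trees are grown by attaching each new vertex to a uniformly chosen earlier vertex. For every sample the cap $\eta$ is taken at least $\max_iv_i$.

```python
import numpy as np


def lemma_3_4_bound(s: float, eta: float) -> float:
    """Right-hand side of (23)."""
    return s * s / 4 if s < 2 * eta else eta * (s - eta)


def random_tree_edges(n: int, rng: np.random.Generator):
    """Random labelled tree: vertex i >= 1 attaches to a uniform earlier vertex."""
    return [(int(rng.integers(0, i)), i) for i in range(1, n)]


def tree_form(edges, v) -> float:
    """sum over tree edges {i, j} of v_i v_j."""
    return sum(v[i] * v[j] for i, j in edges)


rng = np.random.default_rng(7)
worst = 0.0
for _ in range(2000):
    n = int(rng.integers(2, 12))
    edges = random_tree_edges(n, rng)
    v = rng.random(n) ** 3                     # skewed nonnegative weights
    s = float(v.sum())
    eta = float(v.max()) * (1 + rng.random())  # any cap eta >= max_i v_i
    worst = max(worst, tree_form(edges, v) / lemma_3_4_bound(s, eta))
print(worst)  # should not exceed 1

# The second case of (23) is attained by a star: centre eta, leaves share s - eta.
s, eta = 1.0, 0.3
star_edges = [(0, i) for i in range(1, 5)]
star_v = [eta] + [(s - eta) / 4] * 4
print(tree_form(star_edges, star_v), lemma_3_4_bound(s, eta))  # both about 0.21
```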

4. Upper tail large deviations: lower bound

We begin with a well-known estimate for the tail behavior of the maximum of Gaussian random variables. It is a straightforward consequence of the following classical bound (proofs are given in the appendix): for a standard Gaussian random variable $X$ and any $t>0$,

$$\frac{1}{\sqrt{2\pi}}\frac{t}{t^2+1}e^{-t^2/2}\le\mathbb{P}(X > t)\le\frac{1}{\sqrt{2\pi}}\frac1te^{-t^2/2} \tag{24}$$
(see [20, Equation (A.1)]).

Lemma 4.1. Let $X_1,\dots,X_m$ be i.i.d. standard Gaussian random variables with $m\ge cn$ for some constant $c>0$. Then there exists a constant $c' = c'(c)>0$ such that for any $\delta>0$,

$$\mathbb{P}\Big(\max_{i=1,\dots,m}X_i\ge\sqrt{2(1+\delta)\log n}\Big)\ge\frac{c'}{\sqrt{\log n}}\frac{1}{n^{\delta}} \tag{25}$$
and

$$\mathbb{P}\Big(\max_{i=1,\dots,m}X_i\le\sqrt{2(1-\delta)\log n}\Big)\le e^{-\frac{c'n^{\delta}}{\sqrt{\log n}}}. \tag{26}$$
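The Mills-ratio bounds (24) and the scalings in Lemma 4.1 can be checked with the standard library alone. The sketch below is our own illustration and takes $m = n$, which is an assumption. It computes $\mathbb{P}(X>t)$ through `math.erfc` and evaluates $1-(1-q)^n$ and $\log(1-q)^n$ stably with `expm1` and `log1p`. The two printed rescalings should stay of order one as $n$ grows, consistent with (25) and (26).

```python
import math


def gauss_tail(t: float) -> float:
    """P(X > t) for a standard Gaussian X."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def mills_bounds(t: float):
    """The lower and upper bounds in (24)."""
    density = math.exp(-t * t / 2) / math.sqrt(2 * math.pi)
    return density * t / (t * t + 1), density / t


def scaled_upper(n: int, delta: float) -> float:
    """P(max of n Gaussians >= sqrt(2(1+delta) log n)) * n^delta * sqrt(log n); cf. (25)."""
    q = gauss_tail(math.sqrt(2 * (1 + delta) * math.log(n)))
    prob = -math.expm1(n * math.log1p(-q))      # 1 - (1 - q)^n, computed stably
    return prob * n ** delta * math.sqrt(math.log(n))


def scaled_lower(n: int, delta: float) -> float:
    """-log P(max of n Gaussians <= sqrt(2(1-delta) log n)) * sqrt(log n) / n^delta; cf. (26)."""
    q = gauss_tail(math.sqrt(2 * (1 - delta) * math.log(n)))
    return -n * math.log1p(-q) * math.sqrt(math.log(n)) / n ** delta


for t in (0.5, 1.0, 2.0, 4.0, 8.0):
    lo, hi = mills_bounds(t)
    assert lo <= gauss_tail(t) <= hi

rows = [(n, scaled_upper(n, 0.5), scaled_lower(n, 0.5)) for n in (10**3, 10**6, 10**9, 10**12)]
for row in rows:
    print(row)
```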

As indicated in Section 2, we first show that, with high probability, the number of non-zero entries of the matrix $Z$ is at least of order $n$. Recall that throughout the article $p = \frac dn$ in $\mathcal G_{n,p}$. The number of non-zero entries of $X$ is twice the number of edges of the underlying random graph $G$. Define the event
$$\mathcal E_0 := \Big\{\big|\{1\le i < j\le n: X_{ij}\ne0\}\big| > \frac{d}{16}n\Big\}. \tag{27}$$

Lemma 4.2. There exists a constant $c>0$ such that, for sufficiently large $n$, $\mathbb{P}(\mathcal E_0^c)\le e^{-cn}$.

This follows from standard large deviation estimates; the proof is included in the appendix for completeness.

Proof of Theorem 1.1: lower bound. As indicated in Section 2, we distinguish between $k=2$ and $k\ge3$. The lower bound is governed by two related but distinct events: a large value realized on a single edge, or the existence of a clique of size at least $3$ whose edges all carry uniformly large Gaussians.

Single large value: We first deal with the former case and prove
$$\limsup_{n\to\infty}-\frac{1}{\log n}\log\mathbb{P}\big(\lambda_1\ge\sqrt{2(1+\delta)\log n}\big)\le\delta. \tag{28}$$
Since the matrix $Z$ is Hermitian,

$$\lambda_1\ge\max_{1\le i < j\le n}Z_{ij}. \tag{29}$$

$$\mathbb{P}\big(\lambda_1\ge\sqrt{2(1+\delta)\log n}\big)\ge\mathbb{P}\Big(\max_{1\le i < j\le n}Z_{ij}\ge\sqrt{2(1+\delta)\log n}\Big)$$

Clique construction: We now turn to the clique construction. Fix a positive integer $k$, and let $G$ be a network on the clique $K_k$ of size $k$, whose conductances $\{Y_{ij}: 1\le i < j\le k\}$ are i.i.d. standard Gaussians. We denote by $\lambda(Y)$ the largest eigenvalue of the adjacency/conductance matrix $Y = (Y_{ij})$ of the network. By (24), for some constant $C = C(\delta)>0$,
$$\mathbb{P}\big(\lambda(Y)\ge\sqrt{2(1+\delta)\log n}\big)\ge\mathbb{P}\Big(Y_{ij}\ge\frac{\sqrt{2(1+\delta)\log n}}{k-1},\ \forall\,1\le i < j\le k\Big)\ge\Big(\frac{C}{\sqrt{\log n}}n^{-\frac{1+\delta}{(k-1)^2}}\Big)^{\binom k2}. \tag{32}$$

Next, we need an estimate of the probability that a graph contains a clique of size $k$. This is provided by the next lemma which, together with (32), implies that for any $k\ge3$,
$$\mathbb{P}\big(\lambda_1\ge\sqrt{2(1+\delta)\log n}\big)\ge Cn^{-\binom k2+k}\Big(\frac{C}{\sqrt{\log n}}n^{-\frac{1+\delta}{(k-1)^2}}\Big)^{\binom k2} = C\Big(\frac{C}{\sqrt{\log n}}\Big)^{\binom k2}n^{-\frac{k(k-3)}{2}-\frac{1+\delta}{2}\frac{k}{k-1}} = n^{-\phi_\delta(k)+o(1)}. \tag{33}$$
Since $\phi_\delta(2) = \delta$, combining (28) and (33) completes the proof. ∎

Lemma 4.3. Let $k\ge3$ be an integer. Then there exists a constant $C = C(k,d)>0$ such that the probability that $\mathcal G_{n,\frac dn}$ contains a clique of size $k$ lies between $C^{-1}$ and $C$ times
$$\frac{1}{n^{\binom k2-k}}. \tag{34}$$

Proof. The expected number of $k$-cliques is, up to constants, $\frac{1}{n^{\binom k2-k}}$, which implies the upper bound. To lower bound the probability that at least one clique exists, we use the familiar second moment method. As is done several times in the probabilistic combinatorics literature (see e.g., [29, Theorem 2.3]), it is convenient for controlling the second moment to count only those cliques that are also connected components. Let $N_k$ denote their number. Then
$$\mathbb{E}N_k = \binom nkp^{\binom k2}(1-p)^{k(n-k)}\ge\frac{e^{k-1}}{k^{k+\frac12}}n^k\Big(1-\frac kn\Big)^k\Big(\frac dn\Big)^{\binom k2}\Big(1-\frac dn\Big)^{k(n-k)}\ge\frac{C}{n^{\binom k2-k}}, \tag{35}$$
where we used Stirling's formula to bound $k!$, together with the bound $n!/(n-k)!\ge(n-k)^k$. Further,

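$$\mathbb{E}N_k^2 = \mathbb{E}N_k+\binom nk\binom{n-k}{k}p^{2\binom k2}(1-p)^{k^2+2k(n-2k)}\le\mathbb{E}N_k+(1-p)^{-k^2}(\mathbb{E}N_k)^2. \tag{36}$$
Thus, by the Paley-Zygmund inequality, for sufficiently large $n$,
$$\mathbb{P}(N_k\ge1)\ge\frac{(\mathbb{E}N_k)^2}{\mathbb{E}N_k^2}\ge\frac{1}{(\mathbb{E}N_k)^{-1}+(1-p)^{-k^2}}\ge\frac{1}{Cn^{\binom k2-k}+2}. \tag{37}$$
∎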
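The first and second moment bounds (35)-(37) are explicit enough to evaluate. The sketch below is our own illustration. For fixed $k$ and $d$ it computes $\mathbb{E}N_k$ in log space and the second moment lower bound $1/((\mathbb{E}N_k)^{-1}+(1-p)^{-k^2})$, both rescaled by $n^{\binom k2-k}$ as in Lemma 4.3. The limiting constant $d^{\binom k2}e^{-kd}/k!$ used for comparison is our own elementary computation, not a quantity stated in the paper.

```python
import math


def expected_isolated_cliques(n: int, k: int, d: float) -> float:
    """E N_k from (35): k-cliques of G(n, d/n) that are whole connected components."""
    p = d / n
    log_value = (math.log(math.comb(n, k)) + math.comb(k, 2) * math.log(p)
                 + k * (n - k) * math.log1p(-p))
    return math.exp(log_value)


def second_moment_lower_bound(n: int, k: int, d: float) -> float:
    """Lower bound 1 / ((E N_k)^{-1} + (1-p)^{-k^2}) on P(N_k >= 1), from (36)-(37)."""
    p = d / n
    mean = expected_isolated_cliques(n, k, d)
    return 1.0 / (1.0 / mean + (1.0 - p) ** (-k * k))


d, k = 2.0, 4
limit = d ** math.comb(k, 2) * math.exp(-k * d) / math.factorial(k)
for n in (10**3, 10**5, 10**7):
    scale = n ** (math.comb(k, 2) - k)     # Lemma 4.3: the probability is of order 1/scale
    print(n, expected_isolated_cliques(n, k, d) * scale,
          second_moment_lower_bound(n, k, d) * scale, limit)
```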

5. Upper tail large deviations: upper bound

Much of the novelty of the paper lies in this section, which implements the high-level strategy outlined in Section 2. We begin with a short roadmap of the subsections. Subsection 5.1 records tail estimates for sums of squares of Gaussian variables conditioned to be large. Subsection 5.2 shows that, with high probability, the network $Z^{(2)}$ from Section 2 is spectrally negligible. Subsection 5.3 analyzes the connectivity structure of the graph $X^{(1)}$ underlying the network $Z^{(1)}$: its maximum degree, the sizes of its connected components, and the number of tree excess edges they contain. Subsection 5.4 proves a key proposition (Proposition 5.7), which gives tail bounds for the largest eigenvalue of tree-like networks in terms of the largest clique. Finally, Subsection 5.5 proves the upper bound in Theorem 1.1.

5.1. Chi-square tail estimates. We record the following estimate, which will be crucial in our applications; its proof is provided in the appendix.

Lemma 5.1. Let $\tilde Y$ be a standard Gaussian conditioned on $|\tilde Y| > \sqrt{\varepsilon\log\log n}$, and let $\tilde Y_1,\dots,\tilde Y_m$ be independent copies of $\tilde Y$. Then there exists a universal constant $C>0$ such that for any $L>m$ and $\varepsilon>0$,
$$\mathbb{P}(\tilde Y_1^2+\cdots+\tilde Y_m^2\ge L)\le C^m\Big(\frac Lm\Big)^{\frac m2}e^{-\frac L2}e^{\frac m2}e^{\frac{\varepsilon m\log\log n}{2}}. \tag{38}$$
In particular, for any $a,b,c>0$, let $m\le\frac{b\log n}{\log\log n}+c$ and $L = a\log n$. Then, for any $\gamma>0$ and sufficiently large $n$,
$$\mathbb{P}(\tilde Y_1^2+\cdots+\tilde Y_m^2\ge a\log n)\le n^{-\frac a2+\frac{\varepsilon b}{2}+\gamma}. \tag{39}$$
Recall from Section 2 the decomposition
$$Y_{ij} = Y_{ij}^{(1)}+Y_{ij}^{(2)},$$
where $Y_{ij}^{(1)} = Y_{ij}\mathbf 1_{|Y_{ij}| > \sqrt{\varepsilon\log\log n}}$ and similarly $Y_{ij}^{(2)} = Y_{ij}\mathbf 1_{|Y_{ij}|\le\sqrt{\varepsilon\log\log n}}$. Thus we can write the matrix $Z$ as $Z^{(1)}+Z^{(2)}$ with
$$Z_{ij}^{(1)} = X_{ij}Y_{ij}^{(1)},\qquad Z_{ij}^{(2)} = X_{ij}Y_{ij}^{(2)}. \tag{40}$$

5.2. Spectrally negligible component. We next prove an upper bound on the probability that $Z^{(2)}$ has a large spectral norm.

Lemma 5.2. For $\delta>0$,
$$\liminf_{n\to\infty}\frac{-\log\mathbb{P}\big(\lambda_1(Z^{(2)})\ge\sqrt{\varepsilon}(1+\delta)\sqrt{\log n}\big)}{\log n}\ge2\delta+\delta^2.$$

Proof. The proof relies on the results of the recent work [13] mentioned earlier. By [13, Theorem 1.1],
$$\lim_{n\to\infty}\frac{-\log\mathbb{P}\Big(\lambda_1(X)\ge(1+\delta)\sqrt{\frac{\log n}{\log\log n}}\Big)}{\log n} = 2\delta+\delta^2.$$
Since $|Z_{ij}^{(2)}|\le X_{ij}\sqrt{\varepsilon\log\log n}$, we have $\lambda_1(Z^{(2)})\le\sqrt{\varepsilon\log\log n}\cdot\lambda_1(X)$, which concludes the proof. ∎

5.3. Connectivity structure of highly sub-critical Erdős-Rényi graphs. We now shift our focus to $Z^{(1)}$. Recall that $X_{ij}^{(1)} = X_{ij}\mathbf 1_{|Y_{ij}| > \sqrt{\varepsilon\log\log n}}$. By the Gaussian tail bound (24), for large $n$, $X^{(1)}$ is distributed as $\mathcal G_{n,q}$ with
$$q\le\frac dn\frac{1}{\sqrt{2\pi}}e^{-\frac{\varepsilon\log\log n}{2}} = \frac{d'}{n}\frac{1}{(\log n)^{\varepsilon/2}}, \tag{41}$$
where $d' = \frac{d}{\sqrt{2\pi}}$.

For any graph $G$, we denote by $d_1(G)$ the largest degree of $G$. It is proved in [32] (see also [13, Proposition 1.3]) that the typical value of $d_1(\mathcal G_{n,r})$ is $\frac{\log n}{\log\log n-\log(nr)}$ when $\log n\gg\log(1/(nr))$ and $nr\ll\sqrt{\frac{\log n}{\log\log n}}$. Furthermore, the following large deviation result is a consequence of [13, Proposition 1.3].

Lemma 5.3. For $\delta_1>0$, let $\mathcal D_{\delta_1}$ be the event defined by
$$\mathcal D_{\delta_1} := \Big\{d_1(X^{(1)})\le(1+\delta_1)\frac{\log n}{\log\log n}\Big\}. \tag{42}$$
Then
$$\liminf_{n\to\infty}\frac{-\log\mathbb{P}(\mathcal D_{\delta_1}^c)}{\log n}\ge\delta_1.$$

Proof. For the case $r = \frac dn$, the statement with the inequality replaced by an equality is obtained from [13, Proposition 1.3] by plugging in $r = \frac dn$ and noting that in this case
$$\frac{\log n}{\log\log n-\log(nr)} = \frac{\log n}{\log\log n-\log d}.$$
The result then follows because $\mathcal G_{n,\frac dn}$ stochastically dominates $\mathcal G_{n,q}$ and $d_1(G)$ is an increasing function of the graph. ∎

We next turn to a refined analysis of the connectivity structure of the graph $X^{(1)}$. Let $C_1,\dots,C_m$ be its connected components. The next lemma bounds the size of the largest component by a quantity of order $\frac{\log n}{\log\log n}$. By contrast, for $\mathcal G_{n,\frac dn}$ the largest component has size $\Theta(\log n)$, $\Theta(n^{2/3})$ or $\Theta(n)$ according as $d<1$, $d=1$ or $d>1$. This sub-logarithmic bound will be crucial in our application and justifies our sparsification step.

Lemma 5.4. For $\delta_2>0$, let $\mathcal C_{\delta_2}$ be the following event:
$$\mathcal C_{\delta_2} := \Big\{|C_i|\le\frac{2+\delta_2}{\varepsilon}\frac{\log n}{\log\log n},\ \forall i\Big\}. \tag{43}$$
Then
$$\liminf_{n\to\infty}\frac{-\log\mathbb{P}(\mathcal C_{\delta_2}^c)}{\log n}\ge\frac{\delta_2}{2}.$$

Proof. The proof implements the standard first moment argument (see e.g., [16, Chapters 5, 6]). Let $\bar N_{k,-1}$ be the number of connected subgraphs having $k$ vertices and $k-1$ edges, i.e., the number of trees of size $k$. We use (41), Stirling's formula, and the fact that there are $k^{k-2}$ labelled spanning trees on $k$ vertices (Cayley's formula). Then, for some large constant $c_0>0$,
$$\mathbb{E}\bar N_{k,-1}\le\binom nkk^{k-2}\Big(\frac{d'}{n}\frac{1}{(\log n)^{\varepsilon/2}}\Big)^{k-1}\le C\frac{e^kn^k}{k^k}k^{k-2}\frac{(d')^{k-1}}{n^{k-1}(\log n)^{\frac\varepsilon2(k-1)}} = Cn\frac{e^kk^{-2}(d')^{k-1}}{(\log n)^{\frac\varepsilon2(k-1)}}\le Cn(\log n)^{\varepsilon/2}\Big(\frac{c_0}{(\log n)^{\varepsilon/2}}\Big)^k. \tag{44}$$
Hence, denoting by $N_k$ the number of connected components with $k$ vertices and picking a spanning tree from each connected component, one obtains
$$\mathbb{E}N_k\le\mathbb{E}\bar N_{k,-1}\le Cn(\log n)^{\varepsilon/2}\Big(\frac{c_0}{(\log n)^{\varepsilon/2}}\Big)^k.$$
Define $m := \frac{2+\delta_2}{\varepsilon}\frac{\log n}{\log\log n}$, and let $N$ be the number of connected components having at least $m$ vertices. Then
$$\mathbb{E}N = \sum_{k=m}^n\mathbb{E}N_k\le Cn(\log n)^{\varepsilon/2}\Big(\frac{c_0}{(\log n)^{\varepsilon/2}}\Big)^m\le C(\log n)^{\varepsilon/2}n^{\frac{(\log c_0)(2+\delta_2)}{\varepsilon\log\log n}}n^{-\frac{\delta_2}{2}}.$$

Since $\mathbb{P}(N\ge1)\le\mathbb{E}N$, the proof is complete. ∎

For our applications, we will also need to bound the number of subgraphs having $k$ vertices and $k+\ell$ edges, where the subgraph need not be connected. This estimate will be crucial later in proving the structure theorem conditioned on $\mathcal U_\delta$.

Lemma 5.5. For $\ell\ge0$, let $N_{k,\ell}$ be the number of subgraphs of $X^{(1)}$ having $k$ vertices and $k+\ell$ edges. Then, for $0\le\ell\le\binom k2-k$,
$$\mathbb{E}N_{k,\ell}\le C\min\Big\{\Big(\frac kn\Big)^\ell,\Big(\frac{d'e^2}{(\log n)^{\varepsilon/2}}\Big)^{k+\ell}\Big\}.$$

Proof. Denote by $C_{k,\ell}$ the number of labelled graphs with $k$ vertices and $k+\ell$ edges. Then, for any $-k\le\ell\le\binom k2-k$, Stirling's formula gives
$$C_{k,\ell} = \binom{\binom k2}{k+\ell}\le\binom{k^2}{k+\ell}\le\frac{(k^2)^{k+\ell}}{(k+\ell)!}\le e^{k+\ell}\frac{k^{2(k+\ell)}}{(k+\ell)^{k+\ell}}. \tag{45}$$
Then, for $0\le\ell\le\binom k2-k$,
$$\mathbb{E}N_{k,\ell}\le\binom nkC_{k,\ell}q^{k+\ell}\overset{(41)}{\le}Ce^{2k+\ell}\frac{n^k}{k^k}\frac{k^{2(k+\ell)}}{(k+\ell)^{k+\ell}}\Big(\frac{d'}{n}\frac{1}{(\log n)^{\varepsilon/2}}\Big)^{k+\ell}\le C\Big(\frac kn\Big)^\ell\Big(\frac{d'e^2}{(\log n)^{\varepsilon/2}}\Big)^{k+\ell}, \tag{46}$$
where the first inequality again uses Stirling's formula to bound $k!$. In particular, since $k\le n$,
$$\mathbb{E}N_{k,\ell}\le C\Big(\frac{d'e^2}{(\log n)^{\varepsilon/2}}\Big)^{k+\ell}, \tag{47}$$
and, since $d'e^2\le(\log n)^{\varepsilon/2}$ for sufficiently large $n$,
$$\mathbb{E}N_{k,\ell}\le C\Big(\frac kn\Big)^\ell. \tag{48}$$
∎
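Two counting facts used above can be checked mechanically: Cayley's formula $k^{k-2}$ from the proof of Lemma 5.4, and the bound (45) on $C_{k,\ell}$. The sketch below is our own and assumes NumPy for the determinant. It counts spanning trees of $K_k$ with Kirchhoff's matrix-tree theorem and compares $C_{k,\ell}$ with the right-hand side of (45) over all admissible $\ell$ for small $k$.

```python
import math

import numpy as np


def spanning_trees_of_complete_graph(k: int) -> int:
    """Count spanning trees of K_k with Kirchhoff's matrix-tree theorem."""
    laplacian = k * np.eye(k) - np.ones((k, k))   # equals (k-1) I - (J - I)
    return int(round(np.linalg.det(laplacian[1:, 1:])))


def labelled_graphs(k: int, ell: int) -> int:
    """C_{k,l}: number of labelled graphs on k vertices with k + l edges."""
    return math.comb(math.comb(k, 2), k + ell)


def bound_45(k: int, ell: int) -> float:
    """Right-hand side e^{k+l} k^{2(k+l)} / (k+l)^{k+l} of (45)."""
    e = k + ell
    return math.exp(e) * k ** (2 * e) / e ** e


for k in range(2, 9):
    assert spanning_trees_of_complete_graph(k) == k ** (k - 2)   # Cayley's formula
for k in range(2, 11):
    for ell in range(1 - k, math.comb(k, 2) - k + 1):           # keeps k + ell >= 1
        assert labelled_graphs(k, ell) <= bound_45(k, ell)
print("Cayley's formula and (45) hold on all tested cases")
```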

Having bounded the maximal component size, we next estimate how close the components are to trees by bounding the number of tree excess edges, i.e., the number of edges that must be removed from a component to obtain a tree.

Lemma 5.6. For $\delta_3\ge1$, let $\mathcal E_{\delta_3}$ be the event defined by
$$\mathcal E_{\delta_3} := \{|E(C_i)| < |V(C_i)|+\delta_3,\ \forall i\}. \tag{49}$$
Then
$$\liminf_{n\to\infty}\frac{-\log\mathbb{P}(\mathcal E_{\delta_3}^c)}{\log n}\ge\delta_3. \tag{50}$$
In addition, define the event $\mathcal T$ by
$$\mathcal T := \{|E(C_i)| = |V(C_i)|-1,\ \forall i\}.$$
In other words, $\mathcal T$ is the event that all the connected components of $X^{(1)}$ are trees. Then
$$\mathbb{P}(\mathcal T^c)\le\frac{C}{(\log n)^{\varepsilon}}. \tag{51}$$

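Proof. For $\ell\ge0$, recall the notation $N_{k,\ell}$ from Lemma 5.5. The event $\mathcal E_{\delta_3}^c\cap\mathcal C_{2\delta_3}$ requires a connected component $C_i$ with $|C_i| =: k\le m := \big\lfloor\frac{2+2\delta_3}{\varepsilon}\frac{\log n}{\log\log n}\big\rfloor$ and $|E(C_i)|\ge|C_i|+\lceil\delta_3\rceil$. Hence, by the first moment bound,
$$\mathbb{P}(\mathcal E_{\delta_3}^c\cap\mathcal C_{2\delta_3})\le\sum_{k=3}^m\sum_{\ell=\lceil\delta_3\rceil}^{\binom k2-k}\mathbb{E}N_{k,\ell}\overset{(48)}{\le}C\sum_{k=3}^m\Big(\frac kn\Big)^{\lceil\delta_3\rceil}\le C\frac{m^{\lceil\delta_3\rceil+1}}{n^{\lceil\delta_3\rceil}}. \tag{52}$$
Therefore, by (52) and Lemma 5.4 (with $\delta_2 = 2\delta_3$), we obtain (50).

Next, we prove (51). Let $N_{\rm cycle}$ be the number of cycles in $X^{(1)}$. Then
$$\mathbb{E}N_{\rm cycle} = \sum_{k=3}^n\binom nk\frac{(k-1)!}{2}q^k\le\sum_{k=3}^n\frac{n^k}{2k}\Big(\frac{d'}{n}\frac{1}{(\log n)^{\varepsilon/2}}\Big)^k\le\frac{C}{(\log n)^{\varepsilon}}.$$
Since $\mathcal T^c$ implies the existence of a cycle, the first moment bound gives (51). ∎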
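The first moment computation for $N_{\rm cycle}$ rests on the count $(k-1)!/2$ of distinct cycles through $k$ labelled vertices and on the comparison $\binom nk\frac{(k-1)!}{2}\le\frac{n^k}{2k}$. The sketch below (our own, standard library only) verifies the cycle count by brute force for small $k$. It then evaluates a truncation of both sums, using the bound (41) for $q$ with illustrative values $n = 10^4$, $\varepsilon = 1$ and $d' = 1$ that we chose ourselves.

```python
import itertools
import math


def count_cycles_through_all(k: int) -> int:
    """Number of distinct undirected cycles on k labelled vertices (brute force)."""
    cycles = set()
    for perm in itertools.permutations(range(1, k)):
        tour = (0,) + perm
        cycles.add(frozenset(frozenset((tour[i], tour[(i + 1) % k])) for i in range(k)))
    return len(cycles)


def expected_cycles(n: int, q: float, kmax: int) -> float:
    """Truncation at k <= kmax of E N_cycle = sum_k C(n, k) (k-1)!/2 q^k."""
    return sum(math.comb(n, k) * math.factorial(k - 1) / 2 * q ** k for k in range(3, kmax + 1))


def cycle_bound(n: int, q: float, kmax: int) -> float:
    """The comparison sum_k (nq)^k / (2k) from the proof of (51)."""
    return sum((n * q) ** k / (2 * k) for k in range(3, kmax + 1))


for k in range(3, 8):
    assert count_cycles_through_all(k) == math.factorial(k - 1) // 2

n, eps, d_prime = 10**4, 1.0, 1.0
q = d_prime / (n * math.log(n) ** (eps / 2))   # the bound (41) on the edge probability
print(expected_cycles(n, q, 20), cycle_bound(n, q, 20))
```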

5.4. Spectral tail for tree-like networks. In the previous series of lemmas we defined the events $\mathcal D_\alpha$, $\mathcal C_\alpha$, $\mathcal E_\alpha$ and $\mathcal T$. We showed that, with high probability, each connected component $C$ has size $O(\frac{\log n}{\log\log n})$ and a bounded number of excess edges. The following key proposition controls the spectral norm of such a connected component. It is a particularly important ingredient in the proof of Theorem 1.1.

Proposition 5.7. Consider a connected network $G = (V,E,A)$ (where $A = (a_{ij})$ is the matrix of conductances) satisfying the following properties:
(1) $d_1(G)\le c_1\frac{\log n}{\log\log n}$;
(2) $|V|\le c_2\frac{\log n}{\log\log n}$;
(3) $|E|\le|V|+c_3$.
Suppose that the conductance matrix $A$ is given by i.i.d. Gaussians associated to the elements of $E$, conditioned on having absolute value greater than $\sqrt{\varepsilon\log\log n}$. Let $k$ be the maximal size of a clique in $G$ and $\lambda$ the largest eigenvalue of $A$. Then, for any $\varepsilon,\alpha,\gamma,\eta>0$ with $\eta < \frac12$ and sufficiently large $n$,
$$\mathbb{P}(\lambda\ge\sqrt{2\alpha\log n})\le n^{-\frac{\alpha}{2\theta^2}+\frac{\varepsilon c_2}{2}+\gamma}+n^{-\frac{k}{2(k-1)}(1-\theta)^2\alpha+\frac{c_1\varepsilon}{2\eta^2}+\gamma}, \tag{53}$$
where $\theta := (2\eta^2+2\eta^4c_3)^{1/4}$.

The expression on the right-hand side is technical. The constants $\varepsilon,\eta,\gamma$ will be chosen close enough to zero that the first term $n^{-\frac{\alpha}{2\theta^2}+\frac{\varepsilon c_2}{2}+\gamma}$ is negligible and the factor $n^{\frac{c_1\varepsilon}{2\eta^2}+\gamma}$ is only $n^{o(1)}$. The dominant behavior is then $n^{-\frac{k}{2(k-1)}\alpha}$.

From now on, for any graph $H$, we denote by $E(H)$ and $\vec E(H)$ the sets of undirected and directed edges in $H$, respectively.

Proof. The proof proceeds by analyzing the leading eigenvector. Let $V = [\ell]$, and let $f = (f_1,\dots,f_\ell)$ be the (random) unit eigenvector associated with the largest eigenvalue $\lambda := \lambda_1(G)$. Thus, by definition, $\lambda = f^\top Af$. One would like to combine Proposition 3.1 with the tail estimate (39). However, (39) is useful only when the parameter $b$ in the upper bound on $m$ is small compared to $\frac1\varepsilon$. In Lemma 5.4, the bound on $|C_i|$, which plays the role of $m$ here, is $O(\frac1\varepsilon\frac{\log n}{\log\log n})$. This renders the straightforward strategy useless. To address this, the first step is to argue that entries of $f$ that are small in absolute value contribute little to the above quadratic form. We can then focus on the large entries. There are few of these, which allows the strategy above to be applied with a reduced value of $m$. Towards this, for $0 < \eta < 1/2$, define the collection of vertices
$$I := \{i\in[\ell]: f_i^2 < \eta^2\}.$$
Let $B_1$ be the collection of (directed) edges defined by
$$B_1 := \{(i,j)\in\vec E: i,j\in I\},$$
and let $B_2 := \vec E\setminus B_1$, where again each edge is counted twice (purely as a matter of convention). Since $f$ is a unit vector, Markov's inequality gives $|I^c|\le\frac{1}{\eta^2}$. In addition, by the upper bound on the maximum degree in condition (1),
$$\frac12|B_2|\le\frac{1}{\eta^2}c_1\frac{\log n}{\log\log n}. \tag{54}$$
We write

$$\lambda = \sum_{(i,j)\in\vec E}a_{ij}f_if_j = \sum_{(i,j)\in B_1}a_{ij}f_if_j+\sum_{(i,j)\in B_2}a_{ij}f_if_j =: S_1+S_2.$$
Recalling $\theta = (2\eta^2+2\eta^4c_3)^{1/4}$, we have
$$\mathbb{P}(\lambda\ge\sqrt{2\alpha\log n})\le\mathbb{P}(S_1\ge\theta\sqrt{2\alpha\log n})+\mathbb{P}(S_2\ge(1-\theta)\sqrt{2\alpha\log n}). \tag{55}$$
This inequality holds for any $\theta$; our particular choice is guided by the subsequent estimates of $S_1$ and $S_2$. First, we show that
$$\sum_{(i,j)\in B_1}f_i^2f_j^2\le2\eta^2+2\eta^4c_3 = \theta^4. \tag{56}$$
We rely on Lemma 3.4. Choose a spanning tree $T$ of $G$ and define the set of (directed) edges $\vec E' := \vec E(G)\setminus\vec E(T)$. By condition (3) on the number of excess edges, $\frac12|\vec E'|\le c_3+1$. The graph with edge set $B_1\setminus\vec E'$ is necessarily a forest. Adding edges can only increase $\sum_{(i,j)\in B_1\setminus\vec E'}f_i^2f_j^2$, so we may assume that this graph is a tree. Applying Lemma 3.4 with $s=1$, and using $2\eta^2\le1$, we conclude that
$$\sum_{(i,j)\in B_1\setminus\vec E'}f_i^2f_j^2\le2\eta^2(1-\eta^2). \tag{57}$$
Hence,
$$\sum_{(i,j)\in B_1}f_i^2f_j^2 = \sum_{(i,j)\in B_1\setminus\vec E'}f_i^2f_j^2+\sum_{(i,j)\in B_1\cap\vec E'}f_i^2f_j^2\le2\eta^2(1-\eta^2)+2(c_3+1)\eta^4. \tag{58}$$
For the second term, we simply used the fact that it has at most $2(c_3+1)$ summands, each at most $\eta^4$. This proves (56). Hence, by the definition of $S_1$ and the Cauchy-Schwarz inequality, we immediately have
$$S_1\le\Big(\sum_{(i,j)\in B_1}a_{ij}^2\Big)^{1/2}\Big(\sum_{(i,j)\in B_1}f_i^2f_j^2\Big)^{1/2}\le\theta^2\Big(\sum_{(i,j)\in B_1}a_{ij}^2\Big)^{1/2}\le\theta^2\Big(\sum_{(i,j)\in\vec E}a_{ij}^2\Big)^{1/2}.$$
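The deterministic part of this argument, namely $|I^c|\le\eta^{-2}$ and (56), can be checked on simulated tree-like networks. The sketch below is our own illustration, assuming NumPy; `split_check` is a hypothetical helper. Each sample is a random spanning tree plus `extra` additional edges, so $|E| = |V|+c_3$ with $c_3 = \texttt{extra}-1$. Gaussian conductances are placed on the edges. The sketch then computes the top unit eigenvector, the set $I$ and the directed edge set $B_1$. The Gaussian conditioning on $|a_{ij}| > \sqrt{\varepsilon\log\log n}$ is ignored here, since (56) holds for any unit vector $f$.

```python
import numpy as np


def split_check(ell: int, extra: int, eta: float, rng: np.random.Generator):
    """For one random tree-like network, return (|I^c|, sum_{B_1} f_i^2 f_j^2, theta^4)."""
    adj = np.zeros((ell, ell), dtype=bool)
    for i in range(1, ell):                       # random spanning tree
        j = int(rng.integers(0, i))
        adj[i, j] = adj[j, i] = True
    added = 0
    while added < extra:                          # `extra` tree-excess edges
        i, j = rng.choice(ell, size=2, replace=False)
        if not adj[i, j]:
            adj[i, j] = adj[j, i] = True
            added += 1
    w = np.triu(rng.standard_normal((ell, ell)), 1)
    a = (w + w.T) * adj                           # Gaussian conductances on the edges
    _, vecs = np.linalg.eigh(a)
    f = vecs[:, -1]                               # unit eigenvector of lambda_1
    small = f ** 2 < eta ** 2                     # indicator of the set I
    b1 = adj & np.outer(small, small)             # directed edges with both ends in I
    s_b1 = float(np.sum(np.outer(f ** 2, f ** 2)[b1]))
    c3 = extra - 1                                # |E| = |V| - 1 + extra = |V| + c3
    theta4 = 2 * eta ** 2 + 2 * eta ** 4 * c3     # right-hand side of (56)
    return int(np.sum(~small)), s_b1, theta4


rng = np.random.default_rng(11)
results = []
for _ in range(300):
    eta = float(rng.uniform(0.1, 0.45))
    ic, s_b1, theta4 = split_check(int(rng.integers(5, 31)), int(rng.integers(1, 5)), eta, rng)
    results.append((eta, ic, s_b1, theta4))
print(max(s / t for _, _, s, t in results))      # (56) says this is at most 1
```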

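Thus, for any $\gamma>0$ and sufficiently large $n$,
$$\mathbb{P}(S_1\ge\theta\sqrt{2\alpha\log n})\le\mathbb{P}\Big(\sum_{i < j}a_{ij}^2\ge\frac{\alpha}{\theta^2}\log n\Big)\le n^{-\frac{\alpha}{2\theta^2}+\frac{\varepsilon c_2}{2}+\gamma}, \tag{59}$$
where the last inequality follows from (39) with $a = \alpha/\theta^2$ and $b = c_2$.

[…] where $\delta'>0$ is defined by
$$\sqrt{2(1+\delta')} = \sqrt{2(1+\delta)}-\sqrt{\varepsilon}(1+\delta). \tag{65}$$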

From this (by rearranging and multiplying both sides by $\sqrt{2(1+\delta)}+\sqrt{2(1+\delta')}$), we obtain
$$\delta-\sqrt{2\varepsilon}(1+\delta)^{3/2}\le\delta'\le\delta. \tag{66}$$
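The relation (65) can be solved for $\delta'$ in closed form, $\delta' = \tfrac12\big(\sqrt{2(1+\delta)}-\sqrt\varepsilon(1+\delta)\big)^2-1$, and the two-sided bound (66) can then be checked directly. The sketch below is our own and uses the standard library only. The grid of $(\delta,\varepsilon)$ values is arbitrary.

```python
import math


def delta_prime(delta: float, eps: float) -> float:
    """Solve (65): sqrt(2(1 + d')) = sqrt(2(1 + delta)) - sqrt(eps) * (1 + delta)."""
    root = math.sqrt(2 * (1 + delta)) - math.sqrt(eps) * (1 + delta)
    if root <= 0:
        raise ValueError("eps too large for (65) to have a solution")
    return root ** 2 / 2 - 1


for delta in (0.5, 1.0, 3.0, 10.0):
    for eps in (1e-2, 1e-4, 1e-6):
        dp = delta_prime(delta, eps)
        lower = delta - math.sqrt(2 * eps) * (1 + delta) ** 1.5   # left side of (66)
        assert lower <= dp <= delta
        print(delta, eps, round(dp, 6))
```

As $\varepsilon\to0$ the computed $\delta'$ approaches $\delta$, in line with the remark after (79).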

By the result of Subsection 5.2, the second term in (64) is negligible, so we focus on estimating the first. Recalling $X_{ij}^{(1)} := X_{ij}\mathbf 1_{|Y_{ij}| > \sqrt{\varepsilon\log\log n}}$, we estimate the conditional probability $\mathbb{P}(\lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\mid X^{(1)})$ on the high probability event $\mathcal D_{4\delta'}\cap\mathcal C_{4\delta'}\cap\mathcal E_{4\delta'}\cap\text{Few-cycles}$. By definition, on this event,
$$d_1(X^{(1)}) < (1+4\delta')\frac{\log n}{\log\log n}, \tag{67}$$
$$|V(C_i)| < \frac{2+4\delta'}{\varepsilon}\frac{\log n}{\log\log n},\quad i = 1,\dots,m, \tag{68}$$
$$|E(C_i)| < |V(C_i)|+4\delta',\quad i = 1,\dots,m,\quad\text{and} \tag{69}$$
$$|\{i = 1,\dots,m: C_i\text{ is not a tree}\}| < \log n. \tag{70}$$
From now on, we denote by $Z_i^{(1)}$ the matrix $Z^{(1)}$ restricted to $C_i$, and by $k_i$ the size of the largest clique in $C_i$. By (67)-(69) and Proposition 5.7 with

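$$c_1 = 1+4\delta',\quad c_2 = \frac{2+4\delta'}{\varepsilon},\quad c_3 = 4\delta',\quad\alpha = 1+\delta'\quad\text{and}\quad\eta = \varepsilon^{1/4},$$
and setting $\xi := (2\varepsilon^{1/2}+8\varepsilon\delta')^{1/4}$, on the event $\mathcal D_{4\delta'}\cap\mathcal C_{4\delta'}\cap\mathcal E_{4\delta'}$, for any $\gamma>0$ and sufficiently small $\varepsilon>0$,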

$$\mathbb{P}\big(\lambda_1(Z_i^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\big|\,X^{(1)}\big) < Cn^{-\frac{k_i}{2(k_i-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\gamma}. \tag{71}$$
Here we used that, for $\varepsilon$ small enough, the first term in (53) is negligible compared to the second and can be absorbed into the constant $C$. More precisely, using the bound (66), one can take $\varepsilon$ small enough that
$$1+\delta' > 2(2\varepsilon^{1/2}+8\varepsilon\delta')^{1/2}\big(1+\delta'+(1+2\delta')\big). \tag{72}$$

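Then, for $k\ge2$,
$$\frac{1+\delta'}{2\xi^2}-(1+2\delta')\ge1+\delta'\ge\frac{k}{2(k-1)}(1-\xi)^2(1+\delta'),$$
which implies that the first term in (53) decays faster than the second.

Define
$$I := \{i = 1,\dots,m: k_i\ge3\},\qquad J := \{i = 1,\dots,m: k_i = 2\},\qquad\bar k := \max\{k_1,\dots,k_m\}. \tag{73}$$
Since $\frac{k}{k-1}$ is decreasing in $k$, (71) implies that, on the event $\mathcal D_{4\delta'}\cap\mathcal C_{4\delta'}\cap\mathcal E_{4\delta'}$, for any $i\in I$,
$$\mathbb{P}\big(\lambda_1(Z_i^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\big|\,X^{(1)}\big) < Cn^{-\frac{\bar k}{2(\bar k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\gamma}, \tag{74}$$
and for any $i\in J$,
$$\mathbb{P}\big(\lambda_1(Z_i^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\big|\,X^{(1)}\big) < Cn^{-(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\gamma}. \tag{75}$$
Also, by Lemmas 5.3, 5.4, 5.6 and (63), defining the event
$$\mathcal F_0 := \mathcal D_{4\delta'}\cap\mathcal C_{4\delta'}\cap\mathcal E_{4\delta'}\cap\text{Few-cycles}, \tag{76}$$
we have
$$\mathbb{P}(\mathcal F_0^c)\le\frac{C}{n^{2\delta'}}. \tag{77}$$
Using (41) and the first moment bound, for $k\ge3$,
$$\mathbb{P}(X^{(1)}\text{ contains a clique of size }k)\le\binom nkq^{\binom k2}\le\frac{(d')^{\binom k2}}{n^{\binom k2-k}}. \tag{78}$$
Also, any connected component $C_i$ that is a tree has $k_i = 2$, so on the event Few-cycles we have $|I| < \log n$. Thus, using (78) and the fact that $\lambda_1(Z^{(1)}) = \max_{i=1,\dots,m}\lambda_1(Z_i^{(1)})$,
$$\begin{aligned}&\mathbb{P}\big(\lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\\&\le\sum_{k=3}^n\mathbb{E}\Big[\mathbb{P}\Big(\max_{i\in I}\lambda_1(Z_i^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\Big|\,X^{(1)}\Big)\mathbf 1_{\mathcal F_0}\mathbf 1_{\{\bar k=k\}}\Big]\\&\quad+\mathbb{E}\Big[\mathbb{P}\Big(\max_{i\in J}\lambda_1(Z_i^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\Big|\,X^{(1)}\Big)\mathbf 1_{\mathcal D_{4\delta'}\cap\mathcal C_{4\delta'}\cap\mathcal E_{4\delta'}}\Big]+\mathbb{P}(\mathcal F_0^c)\\&\le C\log n\sum_{k=3}^n(d')^{\binom k2}n^{-\binom k2+k-\frac{k}{2(k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\gamma}\\&\quad+Cn\cdot n^{-(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\gamma}+Cn^{-2\delta'},\end{aligned} \tag{79}$$
where (74) and (75) bound the first and second terms, respectively. The factors $\log n$ and $n$ come from a union bound over the components in the index sets $I$ and $J$, respectively. Recalling $\xi = (2\varepsilon^{1/2}+8\varepsilon\delta')^{1/4}$ and $\delta'$ from (65), note that $\lim_{\varepsilon\to0}\delta' = \delta$ and $\lim_{\varepsilon\to0}\xi = 0$. Furthermore, recall from (3) that $\psi(\delta) = \min_{k\ge2}\phi_\delta(k)$, where
$$\phi_\delta(k) = \frac{k(k-3)}{2}+\frac{1+\delta}{2}\frac{k}{k-1}.$$
Hence, taking $\gamma = \varepsilon$ and bounding $\log n$ by $n^{\varepsilon}$, there exists $\eta_1 = \eta_1(\varepsilon)$ with $\lim_{\varepsilon\to0}\eta_1 = 0$ such that the first term on the right-hand side of (79) is bounded by
$$\sum_{k=3}^n(d')^{\binom k2}n^{-\binom k2+k-\frac{k}{2(k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+2\varepsilon} \tag{80}$$
$$\le\sum_{k=3}^n(d')^{\binom k2}n^{-\frac{k(k-3)}{2}-\frac{1+\delta}{2}\frac{k}{k-1}+\frac{\eta_1}{2}} \tag{81}$$
$$\le C(\log n)^{1/4}(d')^{\binom{(\log n)^{1/4}}{2}}n^{-\psi(\delta)+\frac{\eta_1}{2}}+\sum_{k=(\log n)^{1/4}}^n(d')^{\binom k2}n^{-\frac{k(k-3)}{2}-\frac{1+\delta}{2}\frac{k}{k-1}+\frac{\eta_1}{2}} < Cn^{-\psi(\delta)+\eta_1}. \tag{82}$$
As the reader may have noticed, the cutoff $(\log n)^{1/4}$ is not special; any poly-log cutoff $(\log n)^r$ with $0 < r$ […]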

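$$\begin{aligned}&\frac{k}{2(k-1)}(1+\delta)-\frac{k}{2(k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+2\varepsilon\\&\le(1+\delta)-(1-2\xi)\big(1+\delta-\sqrt{2\varepsilon}(1+\delta)^{3/2}\big)+\frac{1+4\delta}{2}\varepsilon^{1/2}+2\varepsilon\\&\le4(\varepsilon^{1/8}+\delta^{1/4}\varepsilon^{1/4})(1+\delta)+\sqrt{2\varepsilon}(1+\delta)^{3/2}+\frac{1+4\delta}{2}\varepsilon^{1/2}+2\varepsilon =: r_\delta(\varepsilon),\end{aligned} \tag{83}$$
where the last inequality uses $\xi = (2\varepsilon^{1/2}+8\varepsilon\delta')^{1/4}\le2\varepsilon^{1/8}+2\delta^{1/4}\varepsilon^{1/4}$. In addition, for any constant $\eta_1>0$, the inequality (82) holds for sufficiently large $n$. Hence, $\eta_1>0$ can be chosen as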

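$$\eta_1 = 2r_\delta(\varepsilon), \tag{84}$$
which converges to $0$ as $\varepsilon\to0$. Similarly, taking $\gamma = \varepsilon$ in the second term of (79), for some $\eta_2 = \eta_2(\varepsilon)$ with $\lim_{\varepsilon\to0}\eta_2 = 0$,
$$n\cdot n^{-(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon}\le n^{-\delta+\eta_2}\le n^{-\psi(\delta)+\eta_2}. \tag{85}$$
Hence, applying (82) and (85) to (79), and using the bound on $\delta'$ in (66), for sufficiently small $\varepsilon$,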

$$\mathbb{P}\big(\lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big) < Cn^{-\psi(\delta)+\max(\eta_1,\eta_2)}. \tag{86}$$
Recall that, by Lemma 5.2, for all large $n$,
$$\mathbb{P}\big(\lambda_1(Z^{(2)})\ge\sqrt{\varepsilon}(1+\delta)\sqrt{\log n}\big)\le n^{-2\delta-\delta^2+o(1)}\le n^{-\delta+o(1)}\le n^{-\psi(\delta)+o(1)}.$$
Since $\varepsilon>0$ can be arbitrarily small, (64) and the above two displays complete the proof. ∎

6. Structure conditioned on $\mathcal U_\delta$

We prove Theorem 1.4 in this section. We begin with some facts about $\phi_\delta$. Recall that $\mathcal M(\delta)$ is the set of minimizers of $\phi_\delta(\cdot)$. By the strict convexity of $\phi_\delta(\cdot)$, $\mathcal M(\delta)$ has at most two elements: it is either a single integer or a pair of consecutive integers. In addition, since $\delta > \delta_2$, we have $\psi(\delta) < \phi_\delta(2) = \delta$. From this, one can deduce that there exists a constant $c(\delta)\in(0,\min(\delta-\psi(\delta),1))$ such that
$$k\notin\mathcal M(\delta)\ \Rightarrow\ \phi_\delta(k)-\psi(\delta)\ge c(\delta) \tag{87}$$
(recall that $\psi(\delta) = \min_{k\ge2}\phi_\delta(k)$). Concretely, by the strict convexity of $\phi_\delta(\cdot)$, when $\mathcal M(\delta) = \{h(\delta)\}$ is a singleton we may define
$$c(\delta) = \min\Big\{\phi_\delta(h(\delta)-1)-\phi_\delta(h(\delta)),\ \phi_\delta(h(\delta)+1)-\phi_\delta(h(\delta)),\ \tfrac12(\delta-\psi(\delta)),\ \tfrac12\Big\},$$
and when $\mathcal M(\delta) = \{h(\delta),h(\delta)+1\}$ (recall that $h(\delta)$ is the minimal element of $\mathcal M(\delta)$),
$$c(\delta) = \min\Big\{\phi_\delta(h(\delta)-1)-\phi_\delta(h(\delta)),\ \phi_\delta(h(\delta)+2)-\phi_\delta(h(\delta)+1),\ \tfrac12(\delta-\psi(\delta)),\ \tfrac12\Big\}.$$
The minimum with $1/2$ and $(\delta-\psi(\delta))/2$ is taken for technical reasons: later applications require $c(\delta)$ to be small enough, whereas (87) holds without it. The quantity $c(\delta)$ can be arbitrarily close to $0$. Indeed, for any $\delta_0$ with $|\mathcal M(\delta_0)| = 2$, $c(\delta)$ is close to $0$ when $\delta$ is close to $\delta_0$. Recall the notation $\bar k$ from (73).
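The quantities $\phi_\delta$, $\psi(\delta)$, $\mathcal M(\delta)$, $h(\delta)$ and $c(\delta)$ are elementary to compute exactly. The sketch below is our own and uses Python's `fractions` for exact rational arithmetic. It restricts the minimization to $k$ with $\frac{k(k-3)}{2}\le\delta$, plus one more value. This is enough because $\phi_\delta(k)\ge\frac{k(k-3)}{2} > \delta = \phi_\delta(2)$ beyond that range. It also checks that the exponent in (33) equals $\phi_\delta(k)$. For example, the tie $\phi_\delta(2) = \phi_\delta(3)$ occurs exactly at $\delta = 3$, and $\phi_\delta(3) = \phi_\delta(4)$ exactly at $\delta = 23$. The function `c_of` implements only the case $h(\delta)\ge3$ considered in this section.

```python
from fractions import Fraction
from math import comb


def phi(k: int, delta: Fraction) -> Fraction:
    """phi_delta(k) = k(k-3)/2 + (1+delta)/2 * k/(k-1)."""
    return Fraction(k * (k - 3), 2) + (1 + delta) / 2 * Fraction(k, k - 1)


def candidates(delta: Fraction) -> range:
    """All k that can minimize phi_delta: beyond kmax, phi_delta(k) > delta = phi_delta(2)."""
    kmax = 3
    while Fraction(kmax * (kmax - 3), 2) <= delta:
        kmax += 1
    return range(2, kmax + 1)


def psi(delta: Fraction) -> Fraction:
    return min(phi(k, delta) for k in candidates(delta))


def minimizers(delta: Fraction) -> set:
    best = psi(delta)
    return {k for k in candidates(delta) if phi(k, delta) == best}


def c_of(delta: Fraction) -> Fraction:
    """c(delta) as defined above; assumes h(delta) >= 3."""
    m = minimizers(delta)
    h = min(m)
    if h < 3:
        raise ValueError("this sketch only covers the case h(delta) >= 3")
    top = max(m)                                  # h, or h + 1 if there is a tie
    return min(phi(h - 1, delta) - phi(h, delta),
               phi(top + 1, delta) - phi(top, delta),
               (delta - psi(delta)) / 2,
               Fraction(1, 2))


d = Fraction(5)
print(psi(d), minimizers(d), c_of(d))             # 9/2 {3} 1/4
```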
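By the same chain of reasoning as in (79), setting $\xi := (2\varepsilon^{1/2}+8\varepsilon\delta')^{1/4}$ and $\gamma = \varepsilon$, we obtain that, for some $\eta_1,\eta_2$ with $\lim_{\varepsilon\to0}\eta_1 = \lim_{\varepsilon\to0}\eta_2 = 0$,
$$\begin{aligned}&\mathbb{P}\big(\bar k\notin\mathcal M(\delta),\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\\&\le C\log n\sum_{k\notin\mathcal M(\delta)}(d')^{\binom k2}n^{-\binom k2+k-\frac{k}{2(k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon}\qquad(88)\\&\quad+Cn\cdot n^{-(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon}+Cn^{-2\delta'}\\&\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}+Cn^{-\delta+\eta_2},\qquad(89)\end{aligned}$$
where the bound on the first term is obtained as follows. By (83), for each $k\notin\mathcal M(\delta)$, the exponent of $n$ in (88) is bounded by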

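$$-\binom k2+k-\frac{k}{2(k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon\le-\binom k2+k-\frac{k}{2(k-1)}(1+\delta)+r_\delta(\varepsilon) = -\phi_\delta(k)+r_\delta(\varepsilon)\overset{(87)}{\le}-\psi(\delta)-c(\delta)+r_\delta(\varepsilon).$$
Hence, by the argument in (80)-(82), the term (88) is bounded by $n^{-\psi(\delta)-c(\delta)+2r_\delta(\varepsilon)}$. Since $\lim_{\varepsilon\to0}r_\delta(\varepsilon) = 0$, we obtain (89). Therefore, since $\psi(\delta)+c(\delta) < \delta$, for sufficiently small $\varepsilon>0$,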

$$\mathbb{P}\big(\bar k\notin\mathcal M(\delta),\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}. \tag{90}$$
The theorem concerns the entire graph $X$, not just $X^{(1)}$. We therefore show that superimposing $X^{(2)}$ on $X^{(1)}$ does not, with high probability, change the size of the maximal clique, owing to the sparsity of $X^{(2)}$. Recall that $k_X$ denotes the size of the maximal clique in $X$. Since $k_X\ge\bar k$ (recall that $\bar k$ is the maximal clique size in $X^{(1)}$), (90) implies
$$\mathbb{P}\big(k_X\le h(\delta)-1,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}. \tag{91}$$
For the non-trivial direction, namely that superimposing $X^{(2)}$ does not make $k_X$ larger than $\bar k$, define the event $\mathcal F_1$, measurable with respect to $X^{(1)}$, by
$$\mathcal F_1 := \Big\{|E(H)|-\binom{\bar k}{2}\le|V(H)|-\bar k\ \text{ for any subgraph }H\text{ such that }|V(H)|\le2h(\delta)+2\Big\}. \tag{92}$$
In words, on $\mathcal F_1$ the subgraph induced on any vertex set of size larger than $\bar k$ has significantly fewer edges than the clique on the same set.

In particular, on $\mathcal F_1$, $X^{(1)}$ has a unique maximal clique $K := K_{X^{(1)}}$ of size $\bar k$. This follows by applying the definition of $\mathcal F_1$ to the subgraph induced on $K\cup K'$, where $K'$ is another set of $\bar k$ vertices.

We first show that $\mathcal F_1$ is likely. We then show that, on $\mathcal F_1$, for $X$ to have a larger clique, $X^{(2)}$ must fill in 'substantially many' edges absent in $X^{(1)}$, which is unlikely.

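Showing $\mathcal F_1$ is likely. Towards this, observe that
$$\mathbb{P}(\{\bar k = k\}\cap\mathcal F_1^c)\overset{(48)}{\le}\sum_{i=1}^{2h(\delta)+2}C\Big(\frac in\Big)^{\binom k2-k+1}\le C(2h(\delta)+2)^{\binom k2-k+2}\frac{1}{n^{\binom k2-k+1}}. \tag{93}$$
Hence, recalling the event $\mathcal F_0$ in (76) and using the above together with the argument of (79) again, there is $\eta_1'$ with $\lim_{\varepsilon\to0}\eta_1' = 0$ such that, for $\delta > \delta_2$ (recall the definition from Remark 1.2),
$$\begin{aligned}&\mathbb{P}\big(\bar k\in\mathcal M(\delta),\ \mathcal F_1^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\\&\le\sum_{k\in\mathcal M(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(\max_{i\in I}\lambda_1(Z_i^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\Big|\,X^{(1)}\Big)\mathbf 1_{\mathcal F_0}\mathbf 1_{\mathcal F_1^c}\mathbf 1_{\{\bar k=k\}}\Big]\\&\quad+\mathbb{E}\Big[\mathbb{P}\Big(\max_{i\in J}\lambda_1(Z_i^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\Big|\,X^{(1)}\Big)\mathbf 1_{\mathcal D_{4\delta'}\cap\mathcal C_{4\delta'}\cap\mathcal E_{4\delta'}}\Big]+\mathbb{P}(\mathcal F_0^c)\\&\le C(\log n)\sum_{k\in\mathcal M(\delta)}n^{-1}(2h(\delta)+2)^{\binom k2-k+2}n^{-\binom k2+k-\frac{k}{2(k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon}\\&\quad+Cn\cdot n^{-(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon}+Cn^{-2\delta'}\\&\le Cn^{-\psi(\delta)-1+\eta_1'},\end{aligned} \tag{94}$$
where the extra factor $n^{-1}$ in the first term comes from (93). Putting the above together and letting
$$\mathcal F_2 := \{\bar k\in\mathcal M(\delta)\}\cap\mathcal F_1, \tag{95}$$
by (90) and (94), for large $\delta$,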

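$$\mathbb{P}\big(\mathcal F_2^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\le n^{-\psi(\delta)-c(\delta)+\eta_1} \tag{96}$$
(recall that $c(\delta)\in(0,1)$). By Lemma 5.2, this in particular implies
$$\begin{aligned}&\mathbb{P}\big(\mathcal F_2^c,\ \lambda_1(Z)\ge\sqrt{2(1+\delta)\log n}\big)\\&\le\mathbb{P}\big(\mathcal F_2^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)+\mathbb{P}\big(\lambda_1(Z)\ge\sqrt{2(1+\delta)\log n},\ \lambda_1(Z^{(1)}) < \sqrt{2(1+\delta')\log n}\big)\\&\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}+\mathbb{P}\big(\lambda_1(Z^{(2)})\ge\sqrt{\varepsilon}(1+\delta)\sqrt{\log n}\big)\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}.\end{aligned} \tag{97}$$
Combining this with (4), and since $\lim_{\varepsilon\to0}\eta_1 = 0$, there exists $\varepsilon_0 = \varepsilon_0(\delta)>0$ such that for any $\varepsilon < \varepsilon_0$ (recall that $\varepsilon$ implicitly appears in the definition of $X^{(1)}$),
$$\lim_{n\to\infty}\mathbb{P}(\mathcal F_2^c\mid\mathcal U_\delta) = 0. \tag{98}$$
In particular, since $\mathcal F_2\subset\mathcal F_1$ and $\mathcal F_1$ implies the uniqueness of the maximal clique $K$ in $X^{(1)}$,
$$\lim_{n\to\infty}\mathbb{P}\big(\text{there is a unique maximal clique }K\text{ in }X^{(1)}\,\big|\,\mathcal U_\delta\big) = 1. \tag{99}$$
For convenience, we denote the above event by Unique. We now show that the unique maximal clique $K$ of $X^{(1)}$ remains the unique maximal clique after superimposing $X^{(2)}$ to obtain $X$.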

Showing $K_X = K_{X^{(1)}}$. We first introduce some notation. For two subsets of vertices $A$ and $B$, define the set of undirected edges
$$\mathrm{Edge}(A,B) := \{e = (i,j): i < j,\ i,j\in B\setminus A\}\cup\{e = (i,j): i\in B\setminus A,\ j\in A\cap B\}.$$
Note that
$$|\mathrm{Edge}(A,B)| = \binom{|B|}{2}-\binom{|A\cap B|}{2}. \tag{100}$$
Then define the random subset of edges, measurable with respect to $X^{(1)}$, by
$$X^{(1)}(A,B) = \mathrm{Edge}(A,B)\cap E(X^{(1)}).$$
We first verify that on the event $\mathcal F_2 = \{\bar k\in\mathcal M(\delta)\}\cap\mathcal F_1$, any clique $K'$ of size $\ell\le\bar k$ satisfies
$$|X^{(1)}(K,K')|\le\ell-|K\cap K'|, \tag{101}$$
where, as above, $K$ is the unique maximal clique in $X^{(1)}$. Since
$$|E(K\cup K')|\ge\binom{\bar k}{2}+|X^{(1)}(K,K')|,$$
applying (92) to $H = K\cup K'$ (note that on the event $\{\bar k\in\mathcal M(\delta)\}$ we have $|K\cup K'|\le2\bar k\le2h(\delta)+2$) gives
$$\binom{\bar k}{2}+|X^{(1)}(K,K')|-\binom{\bar k}{2}\le|K\cup K'|-\bar k = \ell-|K\cap K'|,$$
which implies (101). Note that, conditionally on $X^{(1)}$, the entries of $X$ are independent and satisfy
$$\mathbb{P}(X_{ij} = 1\mid X_{ij}^{(1)} = 0)\le\frac{2d}{n}\ \text{ for large }n,\qquad\mathbb{P}(X_{ij} = 1\mid X_{ij}^{(1)} = 1) = 1.$$
Indeed, using $\mathbb{P}(X_{ij}^{(1)} = 0)\ge\mathbb{P}(X_{ij} = 0) = 1-\frac dn\ge\frac12$ for large $n$,
$$\mathbb{P}(X_{ij} = 1\mid X_{ij}^{(1)} = 0) = \frac{\mathbb{P}(|Y_{ij}|\le\sqrt{\varepsilon\log\log n},\ X_{ij} = 1)}{\mathbb{P}(X_{ij}^{(1)} = 0)}\le\frac{\mathbb{P}(X_{ij} = 1)}{\mathbb{P}(X_{ij}^{(1)} = 0)}\le\frac{2d}{n},$$
and the second identity is obvious. We now define two events $\mathcal B_0$ and $\mathcal B_1$. We will show that both are very likely on $\mathcal U_\delta$, and that together they imply two facts: $K$ is the unique maximal clique in $X$, and the largest clique not fully contained in $K$ is a triangle.

We begin with $\mathcal B_0$, which is measurable with respect to the sigma-algebra generated by $X^{(1)}$ and $X$:
$$\mathcal B_0 := \text{Unique}\cap\{\text{there is no clique of size 4 in }X\text{ edge-disjoint from }K\}. \tag{102}$$
Recalling that $K$ has size $\bar k$, the BK inequality and Lemma 4.3 give
$$\mathbb{P}(\mathcal B_0^c\cap\{\bar k\in\mathcal M(\delta)\})\le C\frac{1}{n^{\binom{h(\delta)}{2}-h(\delta)}}\frac{1}{n^{\binom42-4}} = C\frac{1}{n^{\binom{h(\delta)}{2}-h(\delta)+2}}, \tag{103}$$
where $C>0$ is a constant depending only on $\delta$.
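The count (100) says that $\mathrm{Edge}(A,B)$ consists exactly of the pairs inside $B$ that are not already inside $A$. The sketch below (our own, standard library only; `edge_set` is a hypothetical helper) builds $\mathrm{Edge}(A,B)$ from its definition and compares its size with $\binom{|B|}{2}-\binom{|A\cap B|}{2}$ on random pairs of vertex subsets.

```python
import itertools
import random
from math import comb


def edge_set(a: set, b: set) -> set:
    """Edge(A, B): pairs inside B minus A, plus pairs joining B minus A to A ∩ B."""
    new = b - a
    inside = {frozenset(e) for e in itertools.combinations(sorted(new), 2)}
    across = {frozenset((i, j)) for i in new for j in a & b}
    return inside | across


random.seed(5)
universe = range(12)
for _ in range(1000):
    a = set(random.sample(universe, random.randint(0, 12)))
    b = set(random.sample(universe, random.randint(0, 12)))
    assert len(edge_set(a, b)) == comb(len(b), 2) - comb(len(a & b), 2)   # (100)
print(len(edge_set({0, 1}, {0, 1, 2, 3})))   # 6 - 1 = 5
```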
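We write
$$\begin{aligned}\mathbb{P}\big(\mathcal B_0^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)&\le\mathbb{E}\Big[\mathbb{P}\big(\lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\big|\,X^{(1)},X\big)\mathbf 1_{\mathcal F_0}\mathbf 1_{\{\bar k\in\mathcal M(\delta)\}}\mathbf 1_{\mathcal B_0^c}\Big]\\&\quad+\mathbb{P}\big((\mathcal F_0\cap\{\bar k\in\mathcal M(\delta)\})^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big).\end{aligned} \tag{104}$$
Since $\lambda_1(Z^{(1)})$ and $X$ are conditionally independent given $X^{(1)}$, by (74) and (75) with $\gamma = \varepsilon$, there is $\eta_3 = \eta_3(\varepsilon)$ with $\lim_{\varepsilon\to0}\eta_3 = 0$ such that, for sufficiently small $\varepsilon>0$,
$$\begin{aligned}&\mathbb{P}\big(\lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\,\big|\,X^{(1)},X\big)\mathbf 1_{\mathcal F_0}\mathbf 1_{\{\bar k\in\mathcal M(\delta)\}}\\&\le C(\log n)\,n^{-\frac{\bar k}{2(\bar k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon}+Cn\cdot n^{-(1-\xi)^2(1+\delta')+\frac{1+4\delta'}{2}\varepsilon^{1/2}+\varepsilon}\\&\le Cn^{-\frac{\bar k}{2(\bar k-1)}(1+\delta)+\eta_3+\varepsilon}\le Cn^{-\frac{h(\delta)+1}{2h(\delta)}(1+\delta)+\eta_3+\varepsilon}.\end{aligned}$$
The second inequality uses $\frac{\bar k}{2(\bar k-1)}(1+\delta)\le\phi_\delta(\bar k) < \phi_\delta(2) = \delta$ (since $\bar k\ge3$ and $\delta > \delta_2$). The last inequality uses $\bar k\le h(\delta)+1$. Applying this and (103) to (104), and using (77) and (96) to bound the last term in (104), we get, for sufficiently small $\varepsilon>0$,
$$\begin{aligned}&\mathbb{P}\big(\mathcal B_0^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\\&\le Cn^{-\frac{h(\delta)+1}{2h(\delta)}(1+\delta)+\eta_3+\varepsilon}\,\mathbb{P}(\mathcal B_0^c\cap\{\bar k\in\mathcal M(\delta)\})+Cn^{-2\delta'}+n^{-\psi(\delta)-c(\delta)+\eta_1}\\&\overset{(103)}{\le}Cn^{-\frac{h(\delta)+1}{2h(\delta)}(1+\delta)+\eta_3+\varepsilon}\frac{1}{n^{\binom{h(\delta)}{2}-h(\delta)+2}}+Cn^{-2\delta'}+n^{-\psi(\delta)-c(\delta)+\eta_1}\\&\le Cn^{-\psi(\delta)-1+\eta_3+\varepsilon}+Cn^{-2\delta'}+n^{-\psi(\delta)-c(\delta)+\eta_1}\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}\end{aligned} \tag{105}$$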

(recall that $c(\delta)\in(0,1)$), where the third inequality follows from
$$\frac{h(\delta)+1}{2h(\delta)}(1+\delta)+\binom{h(\delta)}{2}-h(\delta)=\Big(\frac{h(\delta)}{2(h(\delta)-1)}(1+\delta)+\frac{h(\delta)(h(\delta)-3)}{2}\Big)-\frac{1}{2h(\delta)(h(\delta)-1)}\ge\psi(\delta)-1,$$
where the last inequality follows from the observation that the term in the parentheses is exactly $\psi(\delta)$.
Let us define another event, again measurable with respect to the sigma algebra generated by $X^{(1)}$ and $X$:

$$\mathcal{B}_1:=\mathsf{Unique}\cap\{\text{there is no clique } K' \text{ in } X \text{ such that } 4\le|K'|\le\bar k \text{ and } 2\le|K\cap K'|\le\bar k-1\}.\tag{106}$$
In words, the event $\mathcal{B}_1$ rules out cliques of size at least 4 that share an edge with $K$ but are not contained in it. Note that by (100) and (101), under the event $\mathcal{F}_2$, the number of missing edges (of $X^{(1)}$) in $\mathrm{Edge}(K,K')$ is

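$$|\mathrm{Edge}(K,K')\setminus X^{(1)}(K,K')|\ge\binom{|K'|}{2}-\binom{|K\cap K'|}{2}-(|K'|-|K\cap K'|).$$
Hence,
$$\begin{aligned}\mathbb{P}(\mathcal{B}_1^c\mid X^{(1)})\mathbf{1}_{\mathcal{F}_2}&\le\sum_{\ell=4}^{\bar k}\sum_{m=2}^{\ell-1}\sum_{|K'|=\ell,\ |K\cap K'|=m}\mathbb{P}\big(X_{ij}=1\text{ for all edges }e=(i,j)\in\mathrm{Edge}(K,K')\setminus X^{(1)}(K,K')\ \big|\ X^{(1)}\big)\mathbf{1}_{\mathcal{F}_2}\\&\le\sum_{\ell=4}^{\bar k}\sum_{m=2}^{\ell-1}\binom{\bar k}{m}n^{\ell-m}\Big(\frac{2d}{n}\Big)^{\binom\ell2-\binom m2-(\ell-m)}\le C\sum_{\ell=4}^{\bar k}\sum_{m=2}^{\ell-1}\Big(\frac1n\Big)^{(\binom\ell2-2\ell)-(\binom m2-2m)}\le\frac Cn,\end{aligned}\tag{107}$$
where $C>0$ is a constant depending only on $\delta$. Here, the last inequality follows from the fact that the function $f(k):=\binom k2-2k$ satisfies $f(2)=f(3)=-3$, $f(4)=-2$ and is strictly increasing for $k\ge4$.

Hence, observing that $X$ and $\lambda_1(Z^{(1)})$ are conditionally independent given $X^{(1)}$, by (86) and (96),
$$\begin{aligned}\mathbb{P}\big(\mathcal{B}_1^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)&\le\mathbb{E}\Big[\mathbb{P}\big(\mathcal{B}_1^c\mid X^{(1)},\lambda_1(Z^{(1)})\big)\mathbf{1}_{\mathcal{F}_2}\mathbf{1}_{\lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}}\Big]+\mathbb{P}\big(\mathcal{F}_2^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\\&\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}.\end{aligned}\tag{108}$$
Combining with (90) and (105),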
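The last step of (107) only uses the sign pattern of $f(k)=\binom k2-2k$. Here is a small sketch, with helper names of our own choosing, that checks the stated values of $f$. It also checks that every exponent $(\binom\ell2-2\ell)-(\binom m2-2m)$ with $4\le\ell\le\bar k$ and $2\le m\le\ell-1$ is at least $1$, which is what turns each summand into $O(1/n)$.

```python
from math import comb


def f(k):
    """f(k) = C(k, 2) - 2k, the function used after (107)."""
    return comb(k, 2) - 2 * k


def smallest_exponent(kbar):
    """Smallest power of 1/n among the (ell, m) summands of (107),
    4 <= ell <= kbar and 2 <= m <= ell - 1."""
    return min(f(ell) - f(m) for ell in range(4, kbar + 1) for m in range(2, ell))


print([f(k) for k in range(2, 8)])
print(smallest_exponent(20))
```

The minimum is attained at $\ell=4$ and $m\in\{2,3\}$. Because the number of summands depends only on $\bar k\le h(\delta)+1$, the double sum is at most $C/n$.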

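$$\mathbb{P}\big((\mathcal{B}_0\cap\mathcal{B}_1\cap\{\bar k\in\mathcal{M}(\delta)\})^c,\ \lambda_1(Z^{(1)})\ge\sqrt{2(1+\delta')\log n}\big)\le Cn^{-\psi(\delta)-c(\delta)+\eta_1}.\tag{109}$$
Proceeding as in (97)-(99), there exists $\varepsilon_1>0$ such that for any $\varepsilon<\varepsilon_1$,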

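$$\lim_{n\to\infty}\mathbb{P}(\mathcal{B}_0\cap\mathcal{B}_1\cap\{\bar k\in\mathcal{M}(\delta)\}\mid\mathcal{U}_\delta)=1.\tag{110}$$
Recalling that the size of the clique $K$ is $\bar k$, the event $\mathcal{B}_0\cap\mathcal{B}_1\cap\{\bar k\in\mathcal{M}(\delta)\}$ implies the statements in Theorem 1.4, and in particular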

$$\lim_{n\to\infty}\mathbb{P}(\text{there is a unique maximal clique } K_X \text{ in } X \text{ and it is equal to } K\mid\mathcal{U}_\delta)=1.\tag{111}$$



7. Optimal localization of leading eigenvector

We prove Theorem 1.6 in this section. Recall that $v=(v_1,\dots,v_n)$ is the unit eigenvector associated with the largest eigenvalue $\lambda_1=\lambda_1(Z)$, and let $K_X$ be the unique maximal clique (recall that Theorem 1.4 ensures uniqueness, conditioned on $\mathcal{U}_\delta$, with high probability). Then,
$$\lambda_1=\sum_{1\le i,j\le n}Z_{ij}v_iv_j=\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j+\sum_{1\le i,j\le n}Z^{(2)}_{ij}v_iv_j.\tag{112}$$
The proof has two parts. In the first, we prove that the eigenvector allocates most of its mass on $K_X$. In the second, we further show that this mass is uniformly distributed.

Mass concentration. Recall $r_\delta(\varepsilon)$ and $c(\delta)$ defined in (83) and (87), respectively. We choose the parameter $\varepsilon$ sufficiently small so that

$$2r_\delta(\varepsilon)<c(\delta),\tag{113}$$
$$\varepsilon\le\frac1{\delta^4},\tag{114}$$
$$\varepsilon<\min(\varepsilon_0,\varepsilon_1)\tag{115}$$

($\varepsilon_0$ and $\varepsilon_1$ are positive constants depending on $\delta$ such that (99) and (111) hold for $\varepsilon<\varepsilon_0$ and $\varepsilon<\varepsilon_1$, respectively). Recall that by (99) and (111), conditionally on $\mathcal{U}_\delta$, the following holds with probability tending to 1: the maximal cliques $K_{X^{(1)}}$ and $K_X$ are unique and equal. For brevity, this common clique will often be denoted by $K$. Hence, throughout the proof, we assume the occurrence of this event. Recall
$$\mathcal{A}_1:=\Big\{\sum_{i\in K}v_i^2\ge1-\kappa\Big\},\tag{116}$$
where $\kappa>0$ is the parameter in the statement of the theorem. Since

$$\mathbb{P}\Big(\sum_{1\le i,j\le n}Z^{(2)}_{ij}v_iv_j\ge\sqrt{\varepsilon(1+\delta)\log n}\Big)\le\mathbb{P}\Big(\lambda_1(X)\ge\sqrt{\frac{(1+\delta)\log n}{\log\log n}}\Big)\le n^{-(2\delta+\delta^2)+o(1)}$$
by (64), for any event $\mathcal{A}$,
$$\mathbb{P}\big(\mathcal{A},\ \lambda_1\ge\sqrt{2(1+\delta)\log n}\big)\le\mathbb{P}\Big(\mathcal{A},\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\Big)+n^{-(2\delta+\delta^2)+o(1)},\tag{117}$$
where, as before, $\delta'>0$ is defined by
$$\sqrt{2(1+\delta')}=\sqrt{2(1+\delta)}-\sqrt{\varepsilon(1+\delta)}.\tag{118}$$
Note that since $\varepsilon\le\frac1{\delta^4}$, using the bound for $\delta'$ in (66), we have
$$\delta'=\delta+o_\delta(1)\quad\text{as }\delta\to\infty.\tag{119}$$
We will now bound the first term on the RHS of (117) with $\mathcal{A}=\mathcal{A}_1^c$. To do so we use Proposition 3.1 and the following fact: on the high probability event $\mathcal{F}_1$ defined in (92), the largest clique outside $K$ is at most a triangle. In a large deviation theoretic sense, this makes it suboptimal for the eigenvector to allocate mass off of $K$. We now make this precise. The arguments bear similarities with those in the proof of Proposition 5.7.
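The asymptotics (119) can be seen directly from the definition (118). Expanding the square gives $\delta-\delta'=(1+\delta)\big(\sqrt{2\varepsilon}-\varepsilon/2\big)$, which is $O(1/\delta)$ when $\varepsilon\le\delta^{-4}$. The sketch below evaluates this at the extreme admissible choice $\varepsilon=\delta^{-4}$ from (114). The function names are ours, and the bound (66) cited in the text is not reproduced here.

```python
import math


def delta_prime(delta, eps):
    """Solve sqrt(2(1 + delta')) = sqrt(2(1 + delta)) - sqrt(eps (1 + delta)), cf. (118)."""
    root = math.sqrt(2 * (1 + delta)) - math.sqrt(eps * (1 + delta))
    return root ** 2 / 2 - 1


def gap(delta):
    """delta - delta' at the extreme admissible choice eps = delta**-4 from (114)."""
    return delta - delta_prime(delta, delta ** -4)


for delta in (10.0, 100.0, 1000.0):
    eps = delta ** -4
    closed_form = (1 + delta) * (math.sqrt(2 * eps) - eps / 2)
    print(delta, gap(delta), closed_form)
```

The printed gap shrinks roughly by a factor of 10 each time $\delta$ grows by a factor of 10, in line with $\delta'=\delta+o_\delta(1)$.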

Let $C_1,\dots,C_m$ be the connected components of $X^{(1)}$, and without loss of generality let $C_1$ contain the clique $K$ of size $\bar k$. Let $k_i$ be the maximum clique size in $C_i$. We will now work with the high probability event $\mathcal{F}_0$ from (76). As in the proof of Proposition 5.7, define $B_1$ to be the collection of (directed) edges given by

$$B_1:=\{e=(i,j)\in\overrightarrow{E}(C_1):v_i^2,v_j^2<\bar\eta^2\}\tag{120}$$
(recall that for any graph $H$, $\overrightarrow{E}(H)$ denotes the set of directed edges in $H$), where the parameter $\bar\eta$ is chosen to be
$$\bar\eta=\varepsilon^{1/4}.\tag{121}$$

Define the set of (directed) edges $B_2:=\overrightarrow{E}(C_1)\setminus B_1$. Since $|\{i:v_i^2\ge\bar\eta^2\}|\le\frac1{\bar\eta^2}$, under the event $\mathcal{F}_0$, using the definition of $\mathcal{D}_{4\delta'}$,
$$\frac12|B_2|\le\frac{1}{\bar\eta^2}(1+4\delta')\frac{\log n}{\log\log n},\tag{122}$$
by the same reasoning as preceding (54). We write

$$\sum_{(i,j)\in\overrightarrow{E}(C_1)}Z^{(1)}_{ij}v_iv_j=\sum_{(i,j)\in B_1}Z^{(1)}_{ij}v_iv_j+\sum_{(i,j)\in B_2}Z^{(1)}_{ij}v_iv_j=:S_1+S_2.\tag{123}$$
By the same reasoning as in (58), and with the same notation, for $2\bar\eta^2\le1$, under the event $\mathcal{F}_0$,
$$\sum_{(i,j)\in B_1}v_i^2v_j^2\le2\bar\eta^2(1-\bar\eta^2)+2(4\delta'+1)\bar\eta^4=:\theta^4.\tag{124}$$
Note that since $\bar\eta=\varepsilon^{1/4}\le\frac1\delta$ (see (114)), by (119), we have
$$\theta=O\Big(\frac1{\delta^{1/2}}\Big)\quad\text{as }\delta\to\infty.\tag{125}$$
By the Cauchy-Schwarz inequality and (124),

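$$S_1\le\Big(\sum_{(i,j)\in B_1}v_i^2v_j^2\Big)^{1/2}\Big(\sum_{(i,j)\in B_1}(Z^{(1)}_{ij})^2\Big)^{1/2}\le\theta^2\Big(\sum_{(i,j)\in\overrightarrow{E}(C_1)}(Z^{(1)}_{ij})^2\Big)^{1/2}.\tag{126}$$
Next, we estimate $S_2$. Define $x:=\sum_{i\in K}v_i^2$ and $y:=\sum_{i\in C_1\setminus K}v_i^2$. Recalling the definition of the event $\mathcal{F}_1$ in (92), on the latter,
$$\text{the maximum size of a clique in } K^c \text{ is at most } 3.\tag{127}$$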

Indeed, if $K^c$ contains a clique $K'$ of size 4, then $|E(K\cup K')|\ge\binom{\bar k}{2}+6$ and $|V(K\cup K')|=\bar k+4$, which contradicts (92). Hence, under the event $\mathcal{F}_1$,
$$\begin{aligned}\sum_{(i,j)\in B_2}v_i^2v_j^2&\le\sum_{(i,j)\in\overrightarrow{E}(C_1)}v_i^2v_j^2\le\sum_{i\neq j,\ i,j\in K}v_i^2v_j^2+2\sum_{i\in K,\ j\in C_1\setminus K}v_i^2v_j^2+\sum_{i\neq j,\ (i,j)\in\overrightarrow{E}(C_1\setminus K)}v_i^2v_j^2\\&\le2\Big(\frac{\bar k-1}{2\bar k}x^2+xy+\frac13y^2\Big).\end{aligned}\tag{128}$$

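The final bound follows from (21). Thus, under the event $\mathcal{F}_1$,
$$S_2\le\Big(\sum_{(i,j)\in B_2}v_i^2v_j^2\Big)^{1/2}\Big(\sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\Big)^{1/2}\le\Big(\frac{\bar k-1}{\bar k}x^2+2xy+\frac23y^2\Big)^{1/2}\Big(\sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\Big)^{1/2}.\tag{129}$$
Note that, using
$$1=\sum_{i=1}^nv_i^2=x+y+\sum_{\ell=2}^m\sum_{i\in C_\ell}v_i^2,$$
we have
$$\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j=S_1+S_2+\sum_{\ell=2}^m\sum_{(i,j)\in\overrightarrow{E}(C_\ell)}Z^{(1)}_{ij}v_iv_j\le S_1+S_2+\sum_{\ell=2}^m\lambda(Z^{(1)}_\ell)\Big(\sum_{i\in C_\ell}v_i^2\Big)\le S_1+S_2+(1-x-y)\max_{\ell=2,\dots,m}\lambda(Z^{(1)}_\ell).\tag{130}$$
We now estimate the following conditional probability:
$$\begin{aligned}&\mathbb{P}\Big(x<1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&=\mathbb{P}\Big(x<1-\kappa,\ y\ge\frac\kappa2,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\quad+\mathbb{P}\Big(x<1-\kappa,\ y<\frac\kappa2,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}=:R_1+R_2.\end{aligned}\tag{131}$$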
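The step from the middle of (128) to its final form rests on the Motzkin-Straus theorem. The abstract refers to this theorem, and we assume it is what (21) records. For a graph with clique number $\omega$ and nonnegative weights $w_i$ summing to $s$, the theorem gives $\sum_{(i,j)\in\overrightarrow{E}}w_iw_j\le(1-\frac1\omega)s^2$. Taking $\omega=\bar k$ on $K$ and $\omega\le3$ on $C_1\setminus K$ yields the coefficients $\frac{\bar k-1}{\bar k}$ and $\frac23$.

The sketch below does not prove anything. It samples the simplex for two toy graphs of our own choosing and confirms that no sample exceeds $1-\frac1\omega$, while uniform weights on a maximum clique attain it.

```python
import itertools
import random


def directed_form(edges, w):
    """Sum of w_i * w_j over directed edges, i.e. twice the sum over undirected edges."""
    return 2 * sum(w[i] * w[j] for i, j in edges)


def clique_number(n, edges):
    adjacent = {frozenset(e) for e in edges}
    omega = 1
    for r in range(2, n + 1):
        for S in itertools.combinations(range(n), r):
            if all(frozenset(p) in adjacent for p in itertools.combinations(S, 2)):
                omega = r
                break
    return omega


def sampled_max(n, edges, samples=20000, seed=1):
    """Largest value of directed_form over random points of the probability simplex."""
    rng = random.Random(seed)
    best = 0.0
    for _ in range(samples):
        x = [rng.expovariate(1.0) for _ in range(n)]
        total = sum(x)
        best = max(best, directed_form(edges, [t / total for t in x]))
    return best


cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
k4_plus_tail = list(itertools.combinations(range(4), 2)) + [(3, 4), (4, 5)]
for n, edges in ((5, cycle5), (6, k4_plus_tail)):
    omega = clique_number(n, edges)
    print(omega, 1 - 1 / omega, sampled_max(n, edges))
```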

We next bound $R_1$ and $R_2$ in turn.

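Bounding $R_1$. By (130),
$$\begin{aligned}R_1&\le\mathbb{P}\big(S_1\ge\theta\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\quad+\mathbb{P}\Big(x<1-\kappa,\ y\ge\frac\kappa2,\ S_2\ge(x+y-\theta)\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\quad+\mathbb{P}\Big(\max_{\ell=2,\dots,m}\lambda(Z^{(1)}_\ell)\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}=:R_{1,1}+R_{1,2}+R_{1,3}.\end{aligned}\tag{132}$$
By (126) and (39) in Lemma 5.1 with $\gamma=\varepsilon$, $L=\frac1{\theta^2}(1+\delta')\log n$ and $m\le\frac{2+4\delta'}{\varepsilon}\frac{\log n}{\log\log n}+4\delta'$ (see (68) and (69)), for sufficiently large $n$,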

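$$R_{1,1}\le\mathbb{P}\Big(\sum_{i<j,\ (i,j)\in\overrightarrow{E}(C_1)}(Z^{(1)}_{ij})^2\ge\frac1{\theta^2}(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\le n^{-\frac{1+\delta'}{2\theta^2}+(1+2\delta')+\varepsilon}.\tag{133}$$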

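In fact, rearranging, this inequality holds if
$$\lambda x^2+2\Big(\frac1{\bar k}+\lambda\Big)xy+2\Big(\frac{\bar k-1}{\bar k}-\lambda\Big)\theta(x+y)<\Big(\frac{\bar k-1}{\bar k}-\lambda-\frac23\Big)y^2.$$
Recall that by (9), $\bar k\in\mathcal{M}(\delta)$ implies
$$\Big(\frac{1+\delta}2\Big)^{1/3}-1\le\bar k\le\Big(\frac{1+\delta}2\Big)^{1/3}+3.\tag{135}$$
Hence, there is $\lambda=\lambda(\kappa)>0$ such that for $x<1-\kappa$, $y\ge\frac\kappa2$ and sufficiently large $\delta$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$,
$$\lambda x^2<\frac1{10}y^2,\qquad2\Big(\frac1{\bar k}+\lambda\Big)xy<\frac1{10}y^2,\qquad2\Big(\frac{\bar k-1}{\bar k}-\lambda\Big)\theta(x+y)<2\theta<\frac1{10}y^2$$
(see (125) for the bound on $\theta$). If $\lambda$ is small enough, say $\lambda\in(0,\frac1{100})$, then for sufficiently large $\delta$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, $\frac3{10}<\frac{\bar k-1}{\bar k}-\lambda-\frac23$, and thus we obtain (134).
Thus, by (129) and (134), using $\big(\frac{\bar k-1}{\bar k}-\lambda\big)^{-1}\ge\frac{\bar k}{\bar k-1}+\lambda$,
$$R_{1,2}\le\mathbb{P}\Big(\sum_{i<j,\ (i,j)\in B_2}(Z^{(1)}_{ij})^2\ge\Big(\frac{\bar k}{\bar k-1}+\lambda\Big)(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}.\tag{136}$$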
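The window (135) for the optimal clique size can be explored numerically, with one caveat: the definitions (9) of $\mathcal{M}(\delta)$ and of $\phi_\delta$ are not part of this section. We therefore assume the form $\phi_\delta(k)=\frac{k}{2(k-1)}(1+\delta)+\binom k2-k$, read off from $\phi_\delta(2)=\delta$, from the parenthesised expression after (105), and from the exponent after (155). We also assume that $\psi(\delta)$ is its minimum over $k\ge2$. The sketch checks only that the minimiser $h$ satisfies the two-sided bound in (135) and that $\psi(\delta)/\delta$ is close to $\frac12$.

```python
from math import comb


def phi(delta, k):
    """Assumed form of phi_delta(k): clique cost C(k, 2) - k plus the
    Gaussian cost k / (2(k - 1)) * (1 + delta)."""
    return k / (2 * (k - 1)) * (1 + delta) + comb(k, 2) - k


def optimal_clique(delta, kmax=1000):
    """Return (h, psi): the minimiser of phi_delta over 2 <= k < kmax and its value."""
    h = min(range(2, kmax), key=lambda k: phi(delta, k))
    return h, phi(delta, h)


for delta in (10, 100, 1000, 10**4):
    h, psi = optimal_clique(delta)
    c = ((1 + delta) / 2) ** (1 / 3)
    print(delta, h, round(c - 1, 2), round(c + 3, 2), round(psi / delta, 3))
```

The minimiser grows like $(\delta/2)^{1/3}$, which is why factors such as $\rho/\bar k$ later translate into gains of order $\delta^{2/3}$ in the exponent.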

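$\gamma=\varepsilon$, $L=\big(\frac{\bar k}{\bar k-1}+\lambda\big)(1+\delta')\log n$, $m\le\frac{1+4\delta'}{\bar\eta^2}\frac{\log n}{\log\log n}$ (see (122)), recalling $\bar\eta=\varepsilon^{1/4}$, for large enough $\delta$ and sufficiently large $n$,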

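$$R_{1,2}\le|V(C_1)|^{\lfloor1/\bar\eta^2\rfloor}\,n^{-\frac12(\frac{\bar k}{\bar k-1}+\lambda)(1+\delta')+\frac\varepsilon2\frac{1+4\delta'}{\bar\eta^2}+\varepsilon}\le n^{-\frac12\frac{\bar k}{\bar k-1}(1+\delta')-\frac12\lambda(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}.\tag{137}$$
Here, we used $|V(C_1)|\le(1+4\delta')\frac{\log n}{\log\log n}$ to bound $|V(C_1)|^{\lfloor1/\bar\eta^2\rfloor}$ by $n^\varepsilon$. Recall from (127) that the size of the maximal clique in $C_\ell$, $\ell=2,\dots,m$, is at most 3 under the event $\mathcal{F}_2=\{\bar k\in\mathcal{M}(\delta)\}\cap\mathcal{F}_1$. Hence, by Proposition 5.7 with
$$\alpha=1+\delta',\quad k\le3,\quad\gamma=\varepsilon,\quad\eta=\varepsilon^{1/4},\quad c_1=1+4\delta',\quad c_2=\frac{2+4\delta'}{\varepsilon},\quad c_3=4\delta',$$
setting $\xi:=(2\eta^2+8\eta^4\delta')^{1/4}$, for sufficiently large $\delta$,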

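$$R_{1,3}\le n\Big(n^{-\frac{1+\delta'}{2\xi^2}+(1+2\delta')+\varepsilon}+n^{-\frac34(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon}\Big)\le Cn^{-\frac34(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon+1}.\tag{138}$$
Here, we used the following comparison between exponents: as $\delta\to\infty$,
$$\frac{1+\delta'}{2\xi^2}-(1+2\delta')-\varepsilon=\Omega(\delta^2),\qquad\frac34(1-\xi)^2(1+\delta')-\frac12(1+4\delta')\varepsilon^{1/2}-\varepsilon=\Big(\frac34+o_\delta(1)\Big)\delta.$$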

This follows from $\varepsilon\le\frac1{\delta^4}$ and the fact
$$\xi=O\Big(\frac1{\delta^{1/2}}\Big)\quad\text{as }\delta\to\infty,\tag{139}$$
which is a consequence of $\xi=(2\eta^2+8\eta^4\delta')^{1/4}=(2\varepsilon^{1/2}+8\varepsilon\delta')^{1/4}$, $\varepsilon\le\frac1{\delta^4}$ and the bound for $\delta'$ in (119). Thus, applying the above bounds for $R_{1,1}$, $R_{1,2}$ and $R_{1,3}$ (see (133), (137) and (138), respectively) to (132), for sufficiently large $\delta$,
$$R_1\le Cn^{-\frac12\frac{\bar k}{\bar k-1}(1+\delta')-\frac12\lambda(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}.\tag{140}$$
This follows from the fact that for sufficiently large $\delta$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, the RHS of (137) is the slowest decaying term among itself, (133) and (138). Indeed, using $\varepsilon\le\frac1{\delta^4}$ and the bounds for $\theta$, $\bar k$ and $\xi$ in (125), (135) and (139), respectively, we have
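The orders (125) and (139) follow from the explicit formulas for $\theta^4$ in (124) and for $\xi$. The sketch below evaluates both at the extreme choice $\varepsilon=\delta^{-4}$ allowed by (114). To keep it self-contained, it substitutes $\delta$ for $\delta'$, which is an assumption justified only asymptotically by (119). Both $\theta\sqrt\delta$ and $\xi\sqrt\delta$ stay bounded and approach $2^{1/4}\approx1.19$.

```python
import math


def theta_and_xi(delta):
    """theta from (124) and xi = (2 eta^2 + 8 eta^4 delta')^(1/4) at eps = delta**-4.
    Assumption: delta' is replaced by delta (cf. (119))."""
    eps = delta ** -4
    eta2 = math.sqrt(eps)  # bar-eta^2 = eps^(1/2) by (121)
    theta4 = 2 * eta2 * (1 - eta2) + 2 * (4 * delta + 1) * eta2 ** 2
    xi = (2 * eta2 + 8 * eta2 ** 2 * delta) ** 0.25
    return theta4 ** 0.25, xi


for delta in (10.0, 100.0, 1000.0, 10000.0):
    theta, xi = theta_and_xi(delta)
    print(delta, theta * math.sqrt(delta), xi * math.sqrt(delta))
```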

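$$\frac{1+\delta'}{2\theta^2}-(1+2\delta')-\varepsilon=\Omega(\delta^2),\tag{141}$$
$$\frac12\frac{\bar k}{\bar k-1}(1+\delta')+\frac12\lambda(1+\delta')-\frac12(1+4\delta')\varepsilon^{1/2}-2\varepsilon=\Big(\frac{1+\lambda}2+o_\delta(1)\Big)\delta,\tag{142}$$
$$\frac34(1-\xi)^2(1+\delta')-\frac12(1+4\delta')\varepsilon^{1/2}-\varepsilon-1=\Big(\frac34+o_\delta(1)\Big)\delta.\tag{143}$$
Since $\lambda\in(0,\frac1{100})$, for large $\delta$, (142) is smaller than the other two terms.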

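Bounding $R_2$. For $\upsilon>0$ to be chosen later, we write
$$\begin{aligned}R_2&\le\mathbb{P}\big(S_1\ge\theta\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}+\mathbb{P}\Big(S_2\ge(1+\upsilon)\Big(x+\frac{\bar k}{\bar k-1}y\Big)\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\quad+\mathbb{P}\Big(x<1-\kappa,\ y<\frac\kappa2,\ (1-x-y)\max_{\ell\ge2}\lambda_1(Z^{(1)}_\ell)\ge\Big(1-(1+\upsilon)\Big(x+\frac{\bar k}{\bar k-1}y\Big)-\theta\Big)\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&=:R_{2,1}+R_{2,2}+R_{2,3}.\end{aligned}\tag{144}$$
We take $\upsilon>0$ such that for sufficiently large $\delta>0$ and small $\kappa>0$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, for any $x<1-\kappa$ and $y<\frac\kappa2$,
$$1-(1+\upsilon)\Big(x+\frac{\bar k}{\bar k-1}y\Big)-\theta\ge1-(1+\upsilon)(x+y)-\frac{1+\upsilon}{2(\bar k-1)}\kappa-\theta>\frac9{10}(1-x-y).\tag{145}$$
Here, the last inequality follows from the bound $x+y\le1-\frac\kappa2$ and the bound for $\theta$ in (125). By (133), for sufficiently large $n$,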

$$R_{2,1}\le n^{-\frac{1+\delta'}{2\theta^2}+(1+2\delta')+\varepsilon}.\tag{146}$$
Note that for sufficiently large $\delta$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, by the bound for $\bar k$ in (135), we have $2\big(\frac{\bar k-1}{2\bar k}x^2+xy+\frac13y^2\big)<\frac{\bar k-1}{\bar k}\big(x+\frac{\bar k}{\bar k-1}y\big)^2$. Thus, by (129),
$$S_2\le\Big(\frac{\bar k-1}{\bar k}\Big)^{1/2}\Big(x+\frac{\bar k}{\bar k-1}y\Big)\Big(\sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\Big)^{1/2}.$$

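Hence, by the same arguments as in (136) and (137) (apply (39) in Lemma 5.1 with $\gamma=\varepsilon$, $L=\frac{\bar k}{\bar k-1}(1+\upsilon)^2(1+\delta')\log n$ and $m\le\frac{1+4\delta'}{\bar\eta^2}\frac{\log n}{\log\log n}$), for large enough $\delta$ and sufficiently large $n$,
$$R_{2,2}\le\mathbb{P}\Big(\sum_{i<j,\ (i,j)\in B_2}(Z^{(1)}_{ij})^2\ge\frac{\bar k}{\bar k-1}(1+\upsilon)^2(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}$$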

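$$R_{2,3}\le n\Big(n^{-(\frac9{10})^2\frac{1+\delta'}{2\xi^2}+(1+2\delta')+\varepsilon}+n^{-\frac34(\frac9{10})^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon}\Big)\le Cn^{-\frac34(\frac9{10})^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon+1}.\tag{148}$$
Thus, applying (146), (147) and (148) to (144), for large enough $\delta$ and sufficiently large $n$,
$$R_2\le Cn^{-\frac12\frac{\bar k}{\bar k-1}(1+\delta')-\upsilon(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}.\tag{149}$$
This follows from the fact that for sufficiently large $\delta$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, the RHS of (147) is the slowest decaying term among itself, (146) and (148). This can be verified by an argument similar to that for (140), combined with
$$\frac12\frac{\bar k}{\bar k-1}(1+\delta')+\upsilon(1+\delta')-\frac12(1+4\delta')\varepsilon^{1/2}-2\varepsilon=\Big(\frac12+\upsilon+o_\delta(1)\Big)\delta$$
and $\frac12<\frac34(\frac9{10})^2$. Therefore, using the bounds (140) and (149) in (131), we get
$$\mathbb{P}\Big(x<1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\le Cn^{-\frac12\frac{\bar k}{\bar k-1}(1+\delta')-\min(\frac12\lambda,\upsilon)(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}.\tag{150}$$
Finishing the proof. Recall that under the event $\mathcal{F}_0$, the number of non-tree components is less than $\log n$. Hence, by (74) and (75),
$$\begin{aligned}\mathbb{P}\Big(\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0}&\le\mathbb{P}\Big(\max_{\ell=1,\dots,m}\lambda_1(Z^{(1)}_\ell)\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0}\\&\le C(\log n)\,n^{-\frac{\bar k}{2(\bar k-1)}(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon}+Cn\cdot n^{-(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon}\end{aligned}\tag{151}$$
(recall that $\xi=(2\varepsilon^{1/2}+8\varepsilon\delta')^{1/4}$). Using the bound for $\xi$ in (139) and $\varepsilon\le\frac1{\delta^4}$, in the case $\bar k\ge3$, for large $\delta$ and sufficiently large $n$, (151) is bounded by
$$Cn^{-\frac{\bar k}{2(\bar k-1)}(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon},\tag{152}$$
and for $\bar k=2$, (151) is bounded by
$$Cn^{-(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon+1}.\tag{153}$$
Recalling $\mathcal{F}_2=\{\bar k\in\mathcal{M}(\delta)\}\cap\mathcal{F}_1$, we write
$$\begin{aligned}&\mathbb{P}\Big(x<1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\Big)\\&\le\sum_{k\in\mathcal{M}(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(x<1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\bar k=k}\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\Big]\\&\quad+\sum_{k\in\mathcal{M}(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\bar k=k}\mathbf{1}_{\mathcal{F}_0}\mathbf{1}_{\mathcal{F}_1^c}\Big]\\&\quad+\sum_{k\notin\mathcal{M}(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\bar k=k}\mathbf{1}_{\mathcal{F}_0}\Big]+\mathbb{P}(\mathcal{F}_0^c).\end{aligned}\tag{154}$$
Recalling $\varepsilon\le\frac1{\delta^4}$, by (78) and (150), the first term in (154) is bounded by
$$\sum_{k\in\mathcal{M}(\delta)}C\frac{(d')^{\binom k2}}{n^{\binom k2-k}}\,n^{-\frac12\frac k{k-1}(1+\delta')-\min(\frac12\lambda,\upsilon)(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}.\tag{155}$$
We use the argument in (80)-(82) to bound this quantity. The exponent of $n$ above is less than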

$$\Big[-\binom k2+k-\frac k{2(k-1)}(1-\xi)^2(1+\delta')+\frac{1+4\delta'}2\varepsilon^{1/2}+2\varepsilon\Big]-\min\Big(\frac12\lambda,\upsilon\Big)(1+\delta').$$
Comparing this with the exponent in (80), we notice the additional term $\min(\frac12\lambda,\upsilon)(1+\delta')$. Hence, recalling that $\eta_1$ in (82) can be chosen as $\eta_1=2r_\delta(\varepsilon)$ (see (84)), (155) can be bounded by
$$Cn^{-\psi(\delta)-\min(\frac12\lambda,\upsilon)(1+\delta')+2r_\delta(\varepsilon)}.\tag{156}$$
Similarly, using (93) and (152), the second term in (154) is bounded by
$$\sum_{k\in\mathcal{M}(\delta)}C\Big(\frac{2h(\delta)+2}{n}\Big)^{\binom k2-k+1}n^{-\frac k{2(k-1)}(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}\le Cn^{-\psi(\delta)-1+2r_\delta(\varepsilon)}.\tag{157}$$
This follows from the fact that there is an additional factor $n^{-1}$ arising from $\big(\frac{2h(\delta)+2}{n}\big)^{\binom k2-k+1}$. In addition, by (78), (152) and (153), the third term in (154) is bounded by
$$\sum_{k\ge3,\ k\notin\mathcal{M}(\delta)}C\frac{(d')^{\binom k2}}{n^{\binom k2-k}}\,n^{-\frac k{2(k-1)}(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}+Cn^{-(1-\xi)^2(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon+1}\le Cn^{-\psi(\delta)-c(\delta)+2r_\delta(\varepsilon)}.\tag{158}$$
Here, the additional term $c(\delta)$ comes from the fact that in the first term, the summation is taken only over $k\in\mathcal{M}(\delta)^c$, and $\phi_\delta(k)\ge\psi(\delta)+c(\delta)$ for $k\in\mathcal{M}(\delta)^c$ (see (89) for details). The second term can be absorbed in the constant $C$ since $\psi(\delta)=(\frac12+o_\delta(1))\delta$ (see (6)). Finally, by (77), the last term in (154) is bounded by $n^{-2\delta'}$. Hence, applying the above bounds to (154), for sufficiently large $\delta>0$,

$$\mathbb{P}\Big(x<1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\Big)\le Cn^{-\psi(\delta)-c(\delta)+2r_\delta(\varepsilon)}.\tag{159}$$

Above, we used the fact that for large enough $\delta$, $c(\delta)<1<\min(\frac12\lambda,\upsilon)(1+\delta')$, which follows from the bound for $\delta'$ in (119). Since $2r_\delta(\varepsilon)<c(\delta)$ (see (113)), applying (117) and then Theorem 1.1, for sufficiently large $\delta$,

$$\lim_{n\to\infty}\mathbb{P}(x<1-\kappa\mid\mathcal{U}_\delta)=0.\tag{160}$$
Therefore, for sufficiently large $\delta$,

$$\lim_{n\to\infty}\mathbb{P}(\mathcal{A}_1\mid\mathcal{U}_\delta)=1.\tag{161}$$

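Uniformity of eigenvector. We will aim to show that $\sum(v_i^2-v_j^2)^2$ is small, from which the

$$\ldots>\rho,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}$$

$$\ldots>\rho,\ S_2\ge(x+y-\theta)\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}$$

$\ldots>\rho$, using $x\le1$, we obtain the analog of (129):
$$S_2\le\Big(\Big(\frac{\bar k-1}{\bar k}-\frac\rho{\bar k}\Big)x^2+2xy+\frac23y^2\Big)^{1/2}\Big(\sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\Big)^{1/2}.\tag{163}$$
To bound the above, we need the following technical inequality. For sufficiently large $\delta$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, for $x\ge1-\kappa$,
$$\Big(\frac{\bar k-1}{\bar k}-\frac\rho{\bar k}\Big)x^2+2xy+\frac23y^2<\Big(\frac{\bar k-1}{\bar k}-\frac\rho{2\bar k}\Big)(x+y-\theta)^2.\tag{164}$$
Indeed, by rearranging, (164) holds for sufficiently large $\delta$ if
$$2\Big(\frac1{\bar k}+\frac\rho{2\bar k}\Big)xy+2\Big(\frac{\bar k-1}{\bar k}-\frac\rho{2\bar k}\Big)\theta(x+y)\le\frac\rho{2\bar k}x^2,\tag{165}$$
since by (135) the coefficient of $y^2$ on the RHS is at least that on the LHS. For $x\ge1-\kappa$, we have $y\le\kappa$, and thus $2(\frac1{\bar k}+\frac\rho{2\bar k})xy\le\frac\rho{4\bar k}x^2$ for small enough $\kappa>0$ (recall $\rho=16\kappa$). Also, by the bounds for $\theta$ and $\bar k$ in (125) and (135), respectively, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, we have $2(\frac{\bar k-1}{\bar k}-\frac\rho{2\bar k})\theta(x+y)\le2\theta\le\frac\rho{4\bar k}x^2$. The previous two inequalities imply (165) and thus (164).
Hence, by (163) and (164), and further using $(\frac{\bar k-1}{\bar k}-\frac\rho{2\bar k})^{-1}\ge\frac{\bar k}{\bar k-1}+\frac\rho{2\bar k}$, the second term in (162) is bounded by
$$\mathbb{P}\Big(\sum_{i<j,\ (i,j)\in B_2}(Z^{(1)}_{ij})^2\ge\Big(\frac{\bar k}{\bar k-1}+\frac\rho{2\bar k}\Big)(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}.\tag{166}$$

$$\ldots>\rho,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}$$

$$\ldots>\rho,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\Big)$$

$$\le\sum_{k\in\mathcal{M}(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(\ldots>\rho,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\bar k=k}\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\Big]$$

$\ldots>0$ such that the first term in (169) is bounded by
$$\sum_{k\in\mathcal{M}(\delta)}C\frac{(d')^{\binom k2}}{n^{\binom k2-k}}\,n^{-\frac12\frac k{k-1}(1+\delta')-\frac\rho{4k}(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}\le Cn^{-\psi(\delta)-c\rho\delta^{2/3}+2r_\delta(\varepsilon)}.\tag{170}$$
The other three terms in (169) can be bounded using (157), (158) and (77), respectively. Hence, combining these together and using $c(\delta)<1<c\rho\delta^{2/3}$ for large $\delta$, we have
$$\mathbb{P}\Big(x\ge1-\kappa,\ \sum(v_i^2-v_j^2)^2>\rho,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\Big)\le Cn^{-\psi(\delta)-c(\delta)+2r_\delta(\varepsilon)}.$$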
$$\lim_{n\to\infty}\mathbb{P}\Big(x\ge1-\kappa,\ \sum(v_i^2-v_j^2)^2>\rho\ \Big|\ \mathcal{U}_\delta\Big)=0,$$
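The coefficient $\frac{\bar k-1}{\bar k}-\frac\rho{\bar k}$ in (163) can be traced to an exact algebraic identity. For weights $w_1,\dots,w_k$,
$$\sum_{i\neq j}w_iw_j=\frac{k-1}k\Big(\sum_iw_i\Big)^2-\frac1k\sum_{i<j}(w_i-w_j)^2.$$
With $w_i=v_i^2$ on $K$ (so $\sum w_i=x$), a spread larger than $\rho$ removes at least $\rho/\bar k\ge(\rho/\bar k)x^2$, using $x\le1$. This reading assumes that the spread in the lost part of the argument is $\sum_{i<j,\ i,j\in K}(v_i^2-v_j^2)^2$. The sketch below checks the identity exactly with rational arithmetic. The helper names are ours.

```python
import random
from fractions import Fraction
from itertools import combinations


def off_diagonal(w):
    """sum over i != j of w_i * w_j."""
    total = sum(w)
    return total * total - sum(t * t for t in w)


def via_spread(w):
    """((k - 1)/k) (sum w)^2 - (1/k) sum_{i<j} (w_i - w_j)^2."""
    k, total = len(w), sum(w)
    spread = sum((a - b) ** 2 for a, b in combinations(w, 2))
    return Fraction(k - 1, k) * total * total - Fraction(1, k) * spread


rng = random.Random(0)
vectors = [[Fraction(rng.randint(0, 50), 50) for _ in range(rng.randint(2, 9))]
           for _ in range(200)]
print(all(off_diagonal(w) == via_spread(w) for w in vectors))
```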

8. Uniform largeness of Gaussian weights

We prove Theorem 1.5 in this section. The proof compares the $\ell_1$ and $\ell_2$ norms of the Gaussian variables on the edges of the clique $K_X$, obtaining sharp estimates for each. The final statement can then be deduced from a quantitative version of the Cauchy-Schwarz inequality. However, as the statement of the theorem indicates, we will end up working with a set $T$ slightly smaller than $K_X$. Implementing this strategy involves a few steps and in particular relies on Theorem 1.6, which is why we proved the latter first.
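The "quantitative Cauchy-Schwarz" used below is the exact identity
$$\tfrac12\sum_{a,b}(u_a-u_b)^2=N\sum_au_a^2-\Big(\sum_au_a\Big)^2$$
for $N$ numbers $u_a$. The left side is small exactly when the $\ell_2$ bound nearly matches the $\ell_1$ bound, and it then forces the $u_a$ to cluster. In the argument, $u$ runs over $|Z^{(1)}_{ij}|$ for ordered pairs $i\neq j$ in $T$, so $N=|T|(|T|-1)$. The sketch checks the identity with exact fractions.

```python
import random
from fractions import Fraction


def half_pair_spread(u):
    """(1/2) * sum over all ordered pairs (a, b) of (u_a - u_b)^2."""
    return Fraction(1, 2) * sum((a - b) ** 2 for a in u for b in u)


def cs_gap(u):
    """N * sum(u^2) - (sum u)^2, the Cauchy-Schwarz deficit for N = len(u)."""
    return len(u) * sum(a * a for a in u) - sum(u) ** 2


rng = random.Random(1)
samples = [[Fraction(rng.randint(-20, 20), 7) for _ in range(rng.randint(1, 12))]
           for _ in range(200)]
print(all(half_pair_spread(u) == cs_gap(u) for u in samples))
```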

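Sum of squares of the Gaussian weights. We use the same notation as in Section 7. Also, as at the beginning of the proof of Theorem 1.6, we assume that the maximal cliques $K:=K_{X^{(1)}}$ and $K_X$ are unique and equal. Setting $\rho:=16\kappa$, similarly to (164), for sufficiently large $\delta$, under the event $\{\bar k\in\mathcal{M}(\delta)\}$, for $x\ge1-\kappa$,
$$\frac{\bar k-1}{\bar k}x^2+2xy+\frac23y^2\le\Big(\frac{\bar k-1}{\bar k}+\frac\rho{\bar k}\Big)(x+y-\theta)^2.\tag{174}$$
Using the above and (129),
$$S_2\le\Big(\frac{\bar k-1}{\bar k}+\frac\rho{\bar k}\Big)^{1/2}(x+y-\theta)\Big(\sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\Big)^{1/2},\tag{175}$$
where $S_1$ and $S_2$ were defined in (123). We now define an event guaranteeing a sharp behavior of the $\ell_2$ norm of the Gaussian variables on the edges in $B_2$, where the latter was defined below (120):
$$\mathcal{A}_3:=\Big\{2\Big(\frac{\bar k}{\bar k-1}-\frac\rho{\bar k}\Big)(1+\delta')\log n\le\sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\le2\Big(\frac{\bar k}{\bar k-1}+\frac\rho{\bar k}\Big)(1+\delta')\log n\Big\}.\tag{176}$$
Thus we have
$$\begin{aligned}&\mathbb{P}\Big(\mathcal{A}_3^c,\ x\ge1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\le\mathbb{P}\big(S_1\ge\theta\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}+\mathbb{P}\big(\mathcal{A}_3^c,\ x\ge1-\kappa,\ S_2\ge(x+y-\theta)\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\quad+\mathbb{P}\Big(\max_{\ell=2,\dots,m}\lambda(Z^{(1)}_\ell)\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}.\end{aligned}\tag{177}$$
Since the first and last terms above can be bounded using (133) and (138), respectively, we only bound the second term.
- Bounding the second term: Using $(\frac{\bar k-1}{\bar k}+\frac\rho{\bar k})^{-1}\ge\frac{\bar k}{\bar k-1}-\frac\rho{\bar k}$,
$$\begin{aligned}&\mathbb{P}\big(\mathcal{A}_3^c,\ x\ge1-\kappa,\ S_2\ge(x+y-\theta)\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\overset{(175)}{\le}\mathbb{P}\Big(\mathcal{A}_3^c,\ \sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\ge2\Big(\frac{\bar k}{\bar k-1}-\frac\rho{\bar k}\Big)(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\le\mathbb{P}\Big(\sum_{i<j,\ (i,j)\in B_2}(Z^{(1)}_{ij})^2\ge\Big(\frac{\bar k}{\bar k-1}+\frac\rho{\bar k}\Big)(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2},\end{aligned}\tag{178}$$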

$$|V(C_1)|^{\lfloor1/\bar\eta^2\rfloor}\,n^{-\frac12(\frac{\bar k}{\bar k-1}+\frac\rho{\bar k})(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+\varepsilon}\le n^{-\frac12\frac{\bar k}{\bar k-1}(1+\delta')-\frac\rho{2\bar k}(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}.\tag{179}$$
- Combining altogether: As mentioned above, the first and last terms in (177) can be bounded using (133) and (138), respectively. Hence, combining these together,

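$$\mathbb{P}\Big(\mathcal{A}_3^c,\ x\ge1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\le Cn^{-\frac12\frac{\bar k}{\bar k-1}(1+\delta')-\frac\rho{2\bar k}(1+\delta')+\frac12(1+4\delta')\varepsilon^{1/2}+2\varepsilon}.\tag{180}$$
This follows from the fact that for sufficiently large $\delta$, under the event $\mathcal{F}_0\cap\mathcal{F}_2$, (179) is the slowest decaying term among itself, (133) and (138). This in turn follows from (141) and (143), together with the following consequence of $\varepsilon\le\frac1{\delta^4}$ and the bound for $\bar k$ in (135), valid under the event $\{\bar k\in\mathcal{M}(\delta)\}$:
$$\frac12\frac{\bar k}{\bar k-1}(1+\delta')+\frac\rho{2\bar k}(1+\delta')-\frac12(1+4\delta')\varepsilon^{1/2}-2\varepsilon=\Big(\frac12+o_\delta(1)\Big)\delta.$$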

Similarly to (154), we write

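$$\begin{aligned}&\mathbb{P}\Big(\mathcal{A}_3^c,\ x\ge1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\Big)\\&\le\sum_{k\in\mathcal{M}(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(\mathcal{A}_3^c,\ x\ge1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\bar k=k}\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\Big]\\&\quad+\sum_{k\in\mathcal{M}(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\bar k=k}\mathbf{1}_{\mathcal{F}_0}\mathbf{1}_{\mathcal{F}_1^c}\Big]\\&\quad+\sum_{k\notin\mathcal{M}(\delta)}\mathbb{E}\Big[\mathbb{P}\Big(\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\bar k=k}\mathbf{1}_{\mathcal{F}_0}\Big]+\mathbb{P}(\mathcal{F}_0^c).\end{aligned}\tag{181}$$
First, as in (155), one can bound the first term above using (180). Indeed, using the bounds for $\delta'$ and $\bar k$ in (119) and (135), respectively, under $\{\bar k\in\mathcal{M}(\delta)\}$ we have $\frac\rho{2\bar k}(1+\delta')\ge c\delta^{2/3}$ for some $c>0$. Thus, for large $\delta$, the first term in (181) can be bounded by $Cn^{-\psi(\delta)-c'\delta^{2/3}}$ for some $c'<c$. Combining this with the bounds for the other three terms, obtained previously in (157), (158) and (77), respectively, and using $c(\delta)<1<c'\delta^{2/3}$ for large $\delta$, (181) is bounded by $Cn^{-\psi(\delta)-c(\delta)+2r_\delta(\varepsilon)}$. Hence, using (117), for large $\delta$,
$$\mathbb{P}\big(\mathcal{A}_3^c,\ x\ge1-\kappa,\ \lambda_1\ge\sqrt{2(1+\delta)\log n}\big)\le Cn^{-\psi(\delta)-c(\delta)+2r_\delta(\varepsilon)}.\tag{182}$$
Since $2r_\delta(\varepsilon)<c(\delta)$, combining with Theorem 1.1 and (160), for sufficiently large $\delta$,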

$$\lim_{n\to\infty}\mathbb{P}(\mathcal{A}_3\mid\mathcal{U}_\delta)=1.\tag{183}$$
Sum of absolute values of Gaussian weights. We now estimate the sum of the absolute values of $Z^{(1)}_{ij}$. Defining
$$\mathcal{A}':=\Big\{\sum_{i\in K}\Big(v_i^2-\frac1{\bar k}\Big)^2\le\frac{\kappa_0}{\bar k}\Big\}\tag{184}$$
(recall $\kappa_0=40\kappa$, see (173)), since $\bar k=|K|$, by (172) and (173),
$$\lim_{n\to\infty}\mathbb{P}(\mathcal{A}'\mid\mathcal{U}_\delta)=1.\tag{185}$$
Recalling that $K=K_X$ with probability going to one conditionally on $\mathcal{U}_\delta$, the events $\mathcal{A}_2$ (from the statement of the theorem) and $\mathcal{A}'$ are essentially the same. We now define the set of vertices $T$ appearing in the statement of the theorem:
$$T:=\Big\{i\in K:\Big|v_i^2-\frac1{\bar k}\Big|<\frac{\kappa_0^{1/4}}{\bar k}\Big\}.$$
Then, by (135), for sufficiently large $\delta$, under the event $\{|\bar k-h(\delta)|\le1\}$,
$$i\neq j,\ i,j\in T\ \text{ implies }\ (i,j)\in B_2.\tag{186}$$
This is because for $i\in T$ and large $\delta$, $v_i^2\ge(1-\kappa_0^{1/4})\frac1{\bar k}\overset{(135)}{\ge}\frac c{\delta^{1/3}}>\frac1{\delta^2}\ge\bar\eta^2$, where the final inequality is by our choice of $\bar\eta$ in (121). We now write
$$S_2=\sum_{(i,j)\in B_2}Z^{(1)}_{ij}v_iv_j=\sum_{i\text{ or }j\in T^c,\ (i,j)\in B_2}Z^{(1)}_{ij}v_iv_j+\sum_{i,j\in T,\ (i,j)\in B_2}Z^{(1)}_{ij}v_iv_j=:S_{21}+S_{22}\tag{187}$$

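(see (123) for the definition of $S_2$). By the Cauchy-Schwarz inequality, under the event $\mathcal{A}'$,
$$\sum_{i\in K}\Big|v_i^2-\frac1{\bar k}\Big|\le\kappa_0^{1/2}.\tag{188}$$
Thus, under the event $\mathcal{A}'$,
$$|T^c\cap K|\le\kappa_0^{1/4}\bar k,\qquad|T|\ge(1-\kappa_0^{1/4})\bar k.\tag{189}$$
Note that (188) implies $\sum_{v_i^2\ge\frac1{\bar k}(1+\kappa_0^{1/4})}(v_i^2-\frac1{\bar k})\le\kappa_0^{1/2}$, and thus under the event $\mathcal{A}'$,
$$\sum_{i\in T^c}v_i^2=\sum_{v_i^2\ge\frac1{\bar k}(1+\kappa_0^{1/4})}v_i^2+\sum_{v_i^2\le\frac1{\bar k}(1-\kappa_0^{1/4})}v_i^2\le\Big(\kappa_0^{1/2}+\kappa_0^{1/4}\bar k\cdot\frac1{\bar k}\Big)+\kappa_0^{1/4}\bar k\cdot\frac1{\bar k}=\kappa_0^{1/2}+2\kappa_0^{1/4}.$$
Hence, under the event $\mathcal{A}'$,
$$\sum_{i\text{ or }j\in T^c,\ (i,j)\in B_2}v_i^2v_j^2\le2\Big(\sum_{i\in T^c}v_i^2\Big)\Big(\sum_{j\in C_1}v_j^2\Big)\le2\kappa_0^{1/2}+4\kappa_0^{1/4}=:\kappa'^2,\tag{190}$$
and thus
$$S_{21}\le\Big(\sum_{i\text{ or }j\in T^c,\ (i,j)\in B_2}(Z^{(1)}_{ij})^2\Big)^{1/2}\Big(\sum_{i\text{ or }j\in T^c,\ (i,j)\in B_2}v_i^2v_j^2\Big)^{1/2}\le\kappa'\Big(\sum_{i\text{ or }j\in T^c,\ (i,j)\in B_2}(Z^{(1)}_{ij})^2\Big)^{1/2}.\tag{191}$$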
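The two counting facts (188) and (189) only use the defining inequality of $\mathcal{A}'$. Write $a_i=v_i^2-\frac1{\bar k}$, so that $\sum_i a_i^2\le\kappa_0/\bar k$. Cauchy-Schwarz then gives $\sum_i|a_i|\le\sqrt{\bar k}\cdot\sqrt{\kappa_0/\bar k}=\kappa_0^{1/2}$. A Markov-type count turns this into at most $\kappa_0^{1/2}/(\kappa_0^{1/4}/\bar k)=\kappa_0^{1/4}\bar k$ indices with $|a_i|\ge\kappa_0^{1/4}/\bar k$.

The sketch tests both statements on random vectors scaled onto the boundary $\sum_i a_i^2=\kappa_0/\bar k$. The variable names are ours.

```python
import math
import random


def check_counting(k, kappa0, rng):
    """Random a in R^k rescaled so that sum(a_i^2) = kappa0 / k (boundary of A')."""
    a = [rng.gauss(0.0, 1.0) for _ in range(k)]
    scale = math.sqrt(kappa0 / k / sum(t * t for t in a))
    a = [scale * t for t in a]
    l1_ok = sum(abs(t) for t in a) <= math.sqrt(kappa0) + 1e-12   # (188)
    bad = sum(1 for t in a if abs(t) >= kappa0 ** 0.25 / k)       # size of T^c in K
    count_ok = bad <= kappa0 ** 0.25 * k + 1e-12                  # (189)
    return l1_ok and count_ok


rng = random.Random(2)
print(all(check_counting(rng.randint(2, 40), rng.uniform(1e-4, 0.5), rng)
          for _ in range(2000)))
```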

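In addition, using $v_i^2<\frac1{\bar k}(1+\kappa_0^{1/4})$ for $i\in T$,
$$|S_{22}|\le\frac1{\bar k}(1+\kappa_0^{1/4})\sum_{i,j\in T,\ (i,j)\in B_2}|Z^{(1)}_{ij}|\le\frac1{\bar k}(1+\kappa_0^{1/4})\sum_{i\neq j,\ i,j\in T}|Z^{(1)}_{ij}|.\tag{192}$$
Now, we define the following event, analogous to $\mathcal{A}_3$ but for the $\ell_1$ norm:
$$\mathcal{A}_4:=\Big\{\bar k(1-3\kappa^{1/4})\sqrt{2(1+\delta')\log n}\le\sum_{i\neq j,\ i,j\in T}|Z^{(1)}_{ij}|\le\bar k(1+3\kappa^{1/4})\sqrt{2(1+\delta')\log n}\Big\}.\tag{193}$$
Now, using the decomposition in (130) together with (187),
$$\sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\le S_1+S_{21}+S_{22}+(1-x-y)\max_{\ell=2,\dots,m}\lambda(Z^{(1)}_\ell),$$
we write
$$\begin{aligned}&\mathbb{P}\Big(\mathcal{A}_4^c,\ \mathcal{A}',\ x\ge1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\le\mathbb{P}\big(S_1\ge\theta\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}+\mathbb{P}\big(\mathcal{A}',\ S_{21}\ge\sqrt{\kappa'}\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\quad+\mathbb{P}\big(\mathcal{A}_4^c,\ x\ge1-\kappa,\ S_{22}\ge(x+y-\theta-\sqrt{\kappa'})\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\quad+\mathbb{P}\Big(\max_{\ell=2,\dots,m}\lambda(Z^{(1)}_\ell)\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\end{aligned}\tag{194}$$
(recall that $\kappa'$ is defined in (190)). Since we already have estimates for the first and last terms above, we only focus on the second and third terms.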

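- Bounding the second term: By (191),
$$\mathbb{P}\big(\mathcal{A}',\ S_{21}\ge\sqrt{\kappa'}\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\le\mathbb{P}\Big(\mathcal{A}',\ \sum_{i<j,\ i\text{ or }j\in T^c,\ (i,j)\in B_2}(Z^{(1)}_{ij})^2\ge\frac1{\kappa'}(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}.\tag{195}$$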

$\gamma=\varepsilon$, $L=\frac1{\kappa'}(1+\delta')\log n$, $m\le4\kappa^{1/4}\bar k(1+4\delta')\frac{\log n}{\log\log n}$ (see (67)), for sufficiently large $n$, the quantity (195), and thus the second term in (194), is bounded by

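$$|V(C_1)|^{\lfloor4\kappa^{1/4}\bar k\rfloor}\,n^{-\frac1{2\kappa'}(1+\delta')+\frac12\cdot4\kappa^{1/4}\bar k(1+4\delta')\varepsilon+\varepsilon}\le n^{-\frac1{2\kappa'}(1+\delta')+2\kappa^{1/4}\bar k(1+4\delta')\varepsilon+2\varepsilon}.\tag{196}$$
The above inequality follows from the bound for $|V(C_1)|$ in (68) and the observation that
$$\Big(\frac{2+4\delta'}{\varepsilon}\frac{\log n}{\log\log n}\Big)^{\lfloor4\kappa^{1/4}\bar k\rfloor}\overset{(135)}{\le}\Big(\frac{2+4\delta'}{\varepsilon}\frac{\log n}{\log\log n}\Big)^{c\delta^{1/3}}\le n^\varepsilon$$
for large $n$ ($c>0$ is a constant depending on $\kappa$). As several times before, the first factor in (196) comes from a union bound over all possible choices of $T^c\cap K$.
- Bounding the third term: Note that for sufficiently small $\kappa>0$, large enough $\delta$ and $x\ge1-\kappa$,
$$1-3\kappa^{1/4}\le\frac{x+y-\theta-\sqrt{\kappa'}}{1+\kappa_0^{1/4}}\tag{197}$$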

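(recall $\kappa_0=40\kappa$). Indeed, (197) holds if $(1-3\kappa^{1/4})(1+\kappa_0^{1/4})\le1-\kappa-\theta-\sqrt{\kappa'}$ for sufficiently large $\delta$ and small $\kappa>0$, which follows from the bound for $\theta$ in (125). Hence, using (192) and (197) and recalling the definition of $\mathcal{A}_4$ in (193), the third term in (194) can be controlled by
$$\begin{aligned}&\mathbb{P}\big(\mathcal{A}_4^c,\ x\ge1-\kappa,\ S_{22}\ge(x+y-\theta-\sqrt{\kappa'})\sqrt{2(1+\delta')\log n}\mid X^{(1)}\big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\le\mathbb{P}\Big(\mathcal{A}_4^c,\ \sum_{i\neq j,\ i,j\in T}|Z^{(1)}_{ij}|\ge\bar k(1-3\kappa^{1/4})\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\le\mathbb{P}\Big(\sum_{i\neq j,\ i,j\in T}|Z^{(1)}_{ij}|>\bar k(1+3\kappa^{1/4})\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\\&\le\mathbb{P}\Big(\sum_{i<j,\ i,j\in T}(Z^{(1)}_{ij})^2>\frac{\bar k^2}{\bar k(\bar k-1)}(1+3\kappa^{1/4})^2(1+\delta')\log n\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}.\end{aligned}\tag{198}$$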

$$|V(C_1)|^{\bar k}\,n^{-\frac12\frac{\bar k}{\bar k-1}(1+3\kappa^{1/4})^2(1+\delta')+\varepsilon}\le n^{-\frac12\frac{\bar k}{\bar k-1}(1+3\kappa^{1/4})^2(1+\delta')+2\varepsilon}.\tag{199}$$

Here, we used the bound for $|V(C_1)|$ in (68) and the upper bound for $\bar k$ in (135) under the event $\{\bar k\in\mathcal{M}(\delta)\}$.
- Combining altogether: As mentioned already, the first and last terms in (194) can be bounded by (133) and (138), respectively. Thus, combining these with (196) and (199), for sufficiently small $\kappa>0$, large $\delta$ and large enough $n$,
$$\mathbb{P}\Big(\mathcal{A}_4^c,\ \mathcal{A}',\ x\ge1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\ \Big|\ X^{(1)}\Big)\mathbf{1}_{\mathcal{F}_0\cap\mathcal{F}_2}\le Cn^{-\frac12\frac{\bar k}{\bar k-1}(1+3\kappa^{1/4})^2(1+\delta')+2\varepsilon}\le Cn^{-\frac12\frac{\bar k}{\bar k-1}(1+\delta')-3\kappa^{1/4}\delta'+2\varepsilon}.\tag{200}$$
This follows from the fact that for sufficiently small $\kappa>0$ and large $\delta$, under the event $\mathcal{F}_0\cap\mathcal{F}_2$, (199) is the slowest decaying term among itself, (133), (138) and (196). Indeed, using $\varepsilon\le\frac1{\delta^4}$ and the bound for $\bar k$ in (135), under the event $\{\bar k\in\mathcal{M}(\delta)\}$,
$$\frac1{2\kappa'}(1+\delta')-2\kappa^{1/4}\bar k(1+4\delta')\varepsilon-2\varepsilon=\Big(\frac1{2\kappa'}+o_\delta(1)\Big)\delta,$$
$$\frac12\frac{\bar k}{\bar k-1}(1+3\kappa^{1/4})^2(1+\delta')-2\varepsilon=\Big(\frac12(1+3\kappa^{1/4})^2+o_\delta(1)\Big)\delta.$$
Hence, recalling the definition of $\kappa'$ in (190), for small $\kappa>0$ and large $\delta$, the quantity (199) decays more slowly than (196). Also, comparing the above asymptotics with (141) and (143), one deduces that (199) decays more slowly than (133) and (138) for small $\kappa$ and large $\delta$. Thus, proceeding as in (154), for sufficiently large $\delta$,

$$\mathbb{P}\Big(\mathcal{A}_4^c,\ \mathcal{A}',\ x\ge1-\kappa,\ \sum_{1\le i,j\le n}Z^{(1)}_{ij}v_iv_j\ge\sqrt{2(1+\delta')\log n}\Big)\le n^{-\psi(\delta)-c(\delta)+2r_\delta(\varepsilon)}.$$
Indeed, the argument of (154) bounds this quantity by a sum of four terms. Using (200) and the bound for $\bar k$ in (135), for large $\delta$, the first of these terms can be bounded by $Cn^{-\psi(\delta)-c\delta}$ for some $c>0$, and the other three can be bounded by (157), (158) and (77), respectively. Combining these and using $c(\delta)<1<c\delta$ for large $\delta$, we obtain the above inequality. Applying (117) and then Theorem 1.1,

$$\lim_{n\to\infty}\mathbb{P}\big(\mathcal{A}_4^c,\ \mathcal{A}',\ x\ge1-\kappa\mid\mathcal{U}_\delta\big)=0.$$
Hence, by (160) and (185),

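$$\lim_{n\to\infty}\mathbb{P}(\mathcal{A}_4\mid\mathcal{U}_\delta)=1.\tag{201}$$
Finishing the proof. Finally, we use (183) and (201) to finish the proof. Define the event $\mathcal{A}_5:=\{\bar k\in\mathcal{M}(\delta)\}$. By (98), recalling $\mathcal{F}_2\subset\mathcal{A}_5$,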

$$\lim_{n\to\infty}\mathbb{P}(\mathcal{A}_5\mid\mathcal{U}_\delta)=1.\tag{202}$$
Now, define the event

$$\mathcal{A}_6:=\mathcal{A}'\cap\mathcal{A}_3\cap\mathcal{A}_4\cap\mathcal{A}_5.$$

Since $\mathcal{A}'$, $\mathcal{A}_3$, $\mathcal{A}_4$ and $\mathcal{A}_5$ are typical events conditioned on $\mathcal{U}_\delta$ (see (185), (183), (201) and (202), respectively),

$$\lim_{n\to\infty}\mathbb{P}(\mathcal{A}_6\mid\mathcal{U}_\delta)=1.\tag{203}$$
We next verify that the event $\mathcal{A}_6$ implies the uniformity of the Gaussian weights claimed in the statement of the theorem. As indicated earlier, the proof involves technical manipulations based on the Cauchy-Schwarz inequality that relate the $\ell_1$ and $\ell_2$ norms. For ease of reading, let us recall the events
$$\mathcal{A}_3=\Big\{2\Big(\frac{\bar k}{\bar k-1}-\frac\rho{\bar k}\Big)(1+\delta')\log n\le\sum_{(i,j)\in B_2}(Z^{(1)}_{ij})^2\le2\Big(\frac{\bar k}{\bar k-1}+\frac\rho{\bar k}\Big)(1+\delta')\log n\Big\},$$
$$\mathcal{A}_4=\Big\{\bar k(1-3\kappa^{1/4})\sqrt{2(1+\delta')\log n}\le\sum_{i\neq j,\ i,j\in T}|Z^{(1)}_{ij}|\le\bar k(1+3\kappa^{1/4})\sqrt{2(1+\delta')\log n}\Big\}.$$
Note that using $\rho=16\kappa$ and (186), for sufficiently large $\delta$, the event $\mathcal{A}_3\cap\mathcal{A}_4\cap\mathcal{A}_5$ implies that
$$\begin{aligned}\frac12\sum_{i\neq j,\ i'\neq j',\ i,j,i',j'\in T}\big(|Z^{(1)}_{ij}|-|Z^{(1)}_{i'j'}|\big)^2&=|T|(|T|-1)\sum_{i\neq j,\ i,j\in T}(Z^{(1)}_{ij})^2-\Big(\sum_{i\neq j,\ i,j\in T}|Z^{(1)}_{ij}|\Big)^2\\&\le|T|(|T|-1)\cdot2\Big(\frac{\bar k}{\bar k-1}+\frac\rho{\bar k}\Big)(1+\delta')\log n-2\bar k^2(1-3\kappa^{1/4})^2(1+\delta')\log n\\&\le\big(32(\bar k-1)\kappa+12\kappa^{1/4}\bar k^2\big)(1+\delta')\log n\le C\kappa^{1/4}\bar k^2(1+\delta')\log n,\end{aligned}$$
where we used $|T|\le\bar k$ and $\kappa\le\kappa^{1/4}$ in the second and the last inequality, respectively. From this, using the argument in (171) and setting $S'=\sum_{i,j\in T}|Z^{(1)}_{ij}|$, one can deduce that
$$\sum_{i\neq j,\ i,j\in T}\Big(|Z^{(1)}_{ij}|-\frac1{|T|(|T|-1)}S'\Big)^2\le C\kappa^{1/4}(1+\delta')\log n.\tag{204}$$
We check that under the event $\mathcal{A}'\cap\mathcal{A}_4\cap\mathcal{A}_5$, there exists $\iota(\kappa)$ with $\lim_{\kappa\to0}\iota=0$ such that
$$\Big|\frac1{|T|(|T|-1)}S'-\frac1{h(\delta)}\sqrt{2(1+\delta')\log n}\Big|\le\frac1{h(\delta)}\Big(\iota(\kappa)+\frac C{h(\delta)}\Big)\sqrt{2(1+\delta')\log n}.\tag{205}$$
Indeed, first note that by (189), $|T|\ge(1-\kappa_0^{1/4})\bar k$ under the event $\mathcal{A}'$. Also, $\bar k\ge h(\delta)$ under $\mathcal{A}_5$, and $\mathcal{A}_4$ provides an upper bound for $S'$. Combining these ingredients, under the event $\mathcal{A}'\cap\mathcal{A}_4\cap\mathcal{A}_5$,
$$\begin{aligned}\frac1{|T|(|T|-1)}S'&\le\frac{1+3\kappa^{1/4}}{1-\kappa_0^{1/4}}\cdot\frac1{\bar k(1-\kappa_0^{1/4})-1}\sqrt{2(1+\delta')\log n}\le\frac{1+10\kappa^{1/4}}{h(\delta)(1-\kappa_0^{1/4})-1}\sqrt{2(1+\delta')\log n}\\&\le\frac1{h(\delta)}(1+10\kappa^{1/4})\Big(1+2\kappa_0^{1/4}+\frac2{h(\delta)}\Big)\sqrt{2(1+\delta')\log n}\end{aligned}$$
(recall $\kappa_0=40\kappa$, see (173)), and a similar lower bound holds. This gives (205).
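The chain of inequalities after (203) ends with elementary arithmetic, measured in units of $(1+\delta')\log n$. For $|T|\le\bar k$ and $\rho=16\kappa$,
$$|T|(|T|-1)\cdot2\Big(\frac{\bar k}{\bar k-1}+\frac\rho{\bar k}\Big)-2\bar k^2(1-3\kappa^{1/4})^2\le32(\bar k-1)\kappa+12\kappa^{1/4}\bar k^2,$$
and the right side is at most $44\,\kappa^{1/4}\bar k^2$ when $\kappa\le1$. The value $C=44$ is just one admissible choice of the unspecified constant. The sketch checks both inequalities on a grid of $(\bar k,|T|,\kappa)$.

```python
def lhs(T, kbar, kappa):
    """|T|(|T|-1) * 2(kbar/(kbar-1) + rho/kbar) - 2 kbar^2 (1 - 3 kappa^(1/4))^2
    with rho = 16 kappa, in units of (1 + delta') log n."""
    rho = 16 * kappa
    t = kappa ** 0.25
    return T * (T - 1) * 2 * (kbar / (kbar - 1) + rho / kbar) - 2 * kbar ** 2 * (1 - 3 * t) ** 2


def rhs(kbar, kappa):
    """32 (kbar - 1) kappa + 12 kappa^(1/4) kbar^2."""
    return 32 * (kbar - 1) * kappa + 12 * kappa ** 0.25 * kbar ** 2


kappas = [10.0 ** (-e) for e in range(1, 9)]
ok = all(
    lhs(T, kbar, kappa) <= rhs(kbar, kappa) + 1e-9
    for kbar in range(2, 60)
    for T in range(0, kbar + 1)
    for kappa in kappas
)
print(ok)
```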

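Hence, (204) and (205) imply that for some $\iota'(\kappa)$ with $\lim_{\kappa\to0}\iota'=0$, under the event $\mathcal{A}_6$ (recall that $\mathcal{A}_6=\mathcal{A}'\cap\mathcal{A}_3\cap\mathcal{A}_4\cap\mathcal{A}_5$),
$$\sum_{i\neq j,\ i,j\in T}\Big(|Z^{(1)}_{ij}|-\frac1{h(\delta)}\sqrt{2(1+\delta')\log n}\Big)^2\le\Big(\iota'(\kappa)+\frac C{h(\delta)}\Big)(1+\delta')\log n.$$
By the Cauchy-Schwarz inequality, using $|T|\le\bar k$, under the event $\mathcal{A}_6$,
$$\sum_{i\neq j,\ i,j\in T}\Big||Z^{(1)}_{ij}|-\frac1{h(\delta)}\sqrt{2(1+\delta')\log n}\Big|\le Ch(\delta)\sqrt{\Big(\iota'(\kappa)+\frac C{h(\delta)}\Big)(1+\delta')\log n}.\tag{206}$$
Note that under the event $\mathcal{A}_5$,
$$\sum_{i\neq j,\ i,j\in T}\big||Z_{ij}|-|Z^{(1)}_{ij}|\big|\le\sum_{i,j\in T}|Z^{(2)}_{ij}|\le\bar k^2\sqrt{\varepsilon\log\log n}\le Ch(\delta)^2\sqrt{\varepsilon\log\log n}.\tag{207}$$
Hence, by the above two inequalities, under the event $\mathcal{A}_6$, for sufficiently large $n$,
$$\sum_{i\neq j,\ i,j\in T}\Big||Z_{ij}|-\frac1{h(\delta)}\sqrt{2(1+\delta')\log n}\Big|\le Ch(\delta)\sqrt{\Big(\iota'(\kappa)+\frac C{h(\delta)}\Big)(1+\delta')\log n}.\tag{208}$$
By (118) and recalling $\varepsilon\le\frac1{\delta^4}$, for large enough $\delta$, the above implies
$$\sum_{i\neq j,\ i,j\in T}\Big||Z_{ij}|-\frac1{h(\delta)}\sqrt{2(1+\delta)\log n}\Big|\le Ch(\delta)\sqrt{\Big(\iota'(\kappa)+\frac C{h(\delta)}\Big)(1+\delta)\log n}.\tag{209}$$
Indeed, by the triangle inequality, the difference between the LHS of (208) and (209) is bounded by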

$$Ch(\delta)^2\,\frac{\sqrt{\varepsilon}(1+\delta)\sqrt{\log n}}{h(\delta)}\overset{(114)}{\le}Ch(\delta)\frac{\sqrt{\log n}}{\delta}\overset{(9)}{\le}Ch(\delta)\sqrt{\Big(\iota'(\kappa)+\frac{C}{h(\delta)}\Big)(1+\delta)\log n}.$$

Since $\kappa_0=40\kappa$, by (189), under the event $\mathcal{A}_6$ we have $|T|\ge(1-c\kappa^{1/4})\bar k$. In addition, since $h(\delta)\ge c\delta^{1/3}$, the term $\iota'(\kappa)+\frac{C}{h(\delta)}$ can be replaced by some $\zeta(\kappa)$ with $\lim_{\kappa\to0}\zeta(\kappa)=0$, provided $\delta$ is chosen large enough depending on $\kappa$. Dividing both sides by $h(\delta)^2$ and using (203) completes the proof.

Remark 8.1. Note that (209) gives a bound depending on both $\delta$ and $\kappa$, and the theorem follows only upon taking $\delta$ large enough depending on $\kappa$. Further, even though we provided sharp bounds for both the $\ell_1$ and $\ell_2$ norms, in fact a lower bound for the former and an upper bound for the latter suffice.
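Step (206) converts $\ell_2$ control into $\ell_1$ control via Cauchy–Schwarz: for $N$ numbers and any center $c$, $\sum_p|x_p-c|\le\sqrt N\big(\sum_p(x_p-c)^2\big)^{1/2}$, and here $N=|T|(|T|-1)\le\bar k^2$, so $\sqrt N\le\bar k$. A minimal Python sketch of this inequality on illustrative random data (the numbers and the value of the center are hypothetical, not taken from the model):

```python
import math
import random

def l1_deviation(values, center):
    """sum_p |x_p - c|."""
    return sum(abs(v - center) for v in values)

def cauchy_schwarz_bound(values, center):
    """sqrt(N) * (sum_p (x_p - c)^2)^(1/2), which dominates l1_deviation(values, center)."""
    l2 = math.sqrt(sum((v - center) ** 2 for v in values))
    return math.sqrt(len(values)) * l2

# Illustrative data: 20 numbers standing in for |Z_ij| over i != j in T,
# and a center standing in for sqrt(2(1 + delta') log n) / h(delta).
rng = random.Random(1)
center = 0.8
for trial in range(3):
    xs = [abs(rng.gauss(0.0, 1.0)) for _ in range(20)]
    print(trial, l1_deviation(xs, center), cauchy_schwarz_bound(xs, center))
```

Equality holds exactly when all deviations $|x_p-c|$ coincide, so the factor $\sqrt N$ cannot be improved in general.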

9. Lower tail large deviations

We end with the short argument establishing the large deviation probability of the lower tail, Theorem 1.7.

Proof of Theorem 1.7. The upper bound is an easy consequence of the inequality (29). Indeed, by Lemmas 4.1 and 4.2 and (29),

$$\begin{aligned}
\mathbb{P}\big(\lambda_1(Z)\le\sqrt{2(1-\delta)\log n}\big)&\le\mathbb{P}\big(\max Z_{ij}\le\sqrt{2(1-\delta)\log n}\big)\\
&\le\mathbb{E}\Big[\mathbb{P}\big(\max Z_{ij}\le\sqrt{2(1-\delta)\log n}\,\big|\,X\big)\mathbf{1}_{\mathcal{E}_0}\Big]+\mathbb{P}(\mathcal{E}_0^c)\\
&\le e^{-c'\frac{n^{\delta}}{\sqrt{\log n}}}+e^{-cn}.
\end{aligned}$$
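The first inequality in the display only needs $\lambda_1(Z)\ge\max_{i,j}Z_{ij}$. For a symmetric matrix with zero diagonal this follows from the Rayleigh quotient of the unit vector $(e_i+e_j)/\sqrt2$, which equals $Z_{ij}$. The NumPy sketch below checks this on one sample of a hypothetical instance of the model (an Erdős–Rényi graph $G(n,d/n)$ with i.i.d. standard Gaussian weights on the edges); the function name and the values of $n$ and $d$ are ours, chosen only for illustration.

```python
import numpy as np

def sparse_gaussian_network(n, d, rng):
    """Symmetric n x n matrix: entry (i, j), i < j, is N(0, 1) if the edge {i, j}
    is present in G(n, d/n) and 0 otherwise; the diagonal is zero."""
    edges = np.triu(rng.random((n, n)) < d / n, k=1)
    weights = rng.standard_normal((n, n))
    upper = np.where(edges, weights, 0.0)
    return upper + upper.T

rng = np.random.default_rng(0)
n, d = 400, 2.0
z = sparse_gaussian_network(n, d, rng)

lam1 = np.linalg.eigvalsh(z)[-1]  # eigvalsh returns eigenvalues in ascending order
max_entry = z.max()

# lambda_1 dominates every entry: the Rayleigh quotient of (e_i + e_j)/sqrt(2) is Z_ij.
print(f"lambda_1 = {lam1:.3f}, max entry = {max_entry:.3f}, "
      f"sqrt(2 log n) = {np.sqrt(2 * np.log(n)):.3f}")
```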

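We now prove a matching lower bound. Define an event $\mathcal{S}_\delta$, measurable with respect to $X$, by

$$\mathcal{S}_\delta:=\Big\{\lambda_1(X)\le(1+\delta)\sqrt{\frac{\log n}{\log\log n}}\Big\}.$$

Notice that $\mathbb{P}(\mathcal{S}_\delta)\to1$ by Lemma 5.2. Since $\lambda_1(Z^{(2)})\le\sqrt{\varepsilon\log\log n}\cdot\lambda_1(X)$, conditionally on $X$, under the event $\mathcal{S}_\delta$ it holds that

$$\lambda_1(Z^{(2)})\le\sqrt{\varepsilon}(1+\delta)\sqrt{\log n}.$$

Since $\lambda_1(Z)\le\lambda_1(Z^{(1)})+\lambda_1(Z^{(2)})$,

$$\mathbb{P}\big(\lambda_1(Z)\le\sqrt{2(1-\delta)\log n}\big)\ge\mathbb{P}\big(\lambda_1(Z^{(1)})\le\sqrt{2(1-\delta'')\log n},\ \lambda_1(Z^{(2)})\le\sqrt{\varepsilon}(1+\delta)\sqrt{\log n}\big),$$

where $\delta''>0$ is defined by $\sqrt{2(1-\delta'')}=\sqrt{2(1-\delta)}-\sqrt{\varepsilon}(1+\delta)$. Recalling the definition of $\mathcal{F}_0=\mathcal{D}_{4\delta'}\cap\mathcal{C}_{4\delta'}\cap\mathcal{E}_{4\delta'}\cap\text{Few-cycles}$ from (76), we analogously define $\mathcal{F}_3:=\mathcal{D}_{4\delta''}\cap\mathcal{C}_{4\delta''}\cap\mathcal{E}_{4\delta''}\cap\text{Few-cycles}\cap\mathcal{S}_\delta$. Then

$$\mathbb{P}\big(\lambda_1(Z)\le\sqrt{2(1-\delta)\log n}\big)\ge\mathbb{E}\Big[\mathbb{P}\big(\lambda_1(Z^{(1)})\le\sqrt{2(1-\delta'')\log n}\,\big|\,X,X^{(1)}\big)\mathbf{1}_{\mathcal{F}_3}\Big]. \tag{210}$$

Above, we use that $\mathcal{F}_3$ is measurable with respect to the sigma-algebra generated by $\{X^{(1)},X\}$.

We now estimate $\mathbb{P}\big(\lambda_1(Z^{(1)})\le\sqrt{2(1-\delta'')\log n}\mid X,X^{(1)}\big)$ under the event $\mathcal{F}_3$, and at the end we use that $\mathcal{F}_3$ is likely. Throughout the proof we crucially use that, given $X^{(1)}$, the matrices $Z^{(1)}$ and $X$ are conditionally independent. Let $C_1,\dots,C_m$ be the connected components of $X^{(1)}$ and denote by $k_i$ the size of the maximal clique in $C_i$. Let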

$$I:=\{i=1,\dots,m:\ k_i\ge3\},\qquad J:=\{i=1,\dots,m:\ k_i=2\},$$

and define $\xi:=(2\varepsilon+8\varepsilon\delta'')^{1/2}$. By Proposition 5.7 with $\gamma=\varepsilon^{1/4}$ and $\eta=\varepsilon^{1/4}$, for sufficiently small $\varepsilon>0$, under the event $\mathcal{F}_3$, for $i\in I$,
(1) (1) 1 (1 ξ)2(1 δ )+ 1+4δ′′ ε1/2+ε P(λ (Z ) 2(1 δ ) log n X, X )

(1) (1) (1 ξ)2(1 δ )+ 1+4δ′′ ε1/2+ε P(λ (Z ) 2(1 δ ) log n X, X ) p(1 n− − − ′′ 2 ) (1 n− 2 − − ′′ 2 ) − − 1 1 (1 ξ)2(1 δ )+ 1+4δ′′ ε1/2+ε exp( n − − − ′′ 2 ). (213) ≥ 2 −

Since $\mathbb{P}(\mathcal{F}_3)\ge\frac12$ and $\varepsilon>0$ is arbitrarily small, (210) and (213) conclude the proof. □
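The lower-bound argument ties several parameters to $\varepsilon$: $\delta''$ is defined by $\sqrt{2(1-\delta'')}=\sqrt{2(1-\delta)}-\sqrt{\varepsilon}(1+\delta)$, then $\xi=(2\varepsilon+8\varepsilon\delta'')^{1/2}$ and $\gamma=\eta=\varepsilon^{1/4}$. The closing step lets $\varepsilon\to0$. The small Python sketch below (function names are ours, and $\delta=0.3$ is an arbitrary illustrative value) only evaluates these definitions, showing $\delta''\downarrow\delta$ and $\xi,\gamma\to0$.

```python
import math

def delta_double_prime(delta, eps):
    """Solve sqrt(2(1 - d2)) = sqrt(2(1 - delta)) - sqrt(eps) * (1 + delta) for d2."""
    root = math.sqrt(2 * (1 - delta)) - math.sqrt(eps) * (1 + delta)
    if root <= 0:
        raise ValueError("eps is too large for this delta: the right-hand side must be positive")
    return 1 - root ** 2 / 2

def xi(eps, d2):
    """xi := (2 eps + 8 eps d2)^(1/2)."""
    return math.sqrt(2 * eps + 8 * eps * d2)

delta = 0.3
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    d2 = delta_double_prime(delta, eps)
    print(f"eps={eps:.0e}  delta''={d2:.6f}  xi={xi(eps, d2):.2e}  gamma=eta={eps ** 0.25:.2e}")
```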

Appendix A. Key estimates

In this appendix, we include the remaining proofs of basic properties of Gaussian random variables, as well as the proof of Lemma 4.2, which is a straightforward application of Chernoff's bound.

Proof of Lemma 4.1. Recalling the basic tail bounds from (24), for some constant c1 > 0,

$$\mathbb{P}\Big(\max_{i=1,\dots,m}X_i\ge\sqrt{2(1+\delta)\log n}\Big)=1-\Big(1-\mathbb{P}\big(X_1\ge\sqrt{2(1+\delta)\log n}\big)\Big)^m\ge\frac{c_1}{n^{\delta}\sqrt{\log n}}.$$

Similarly, for some constant $c_2>0$,

$$\mathbb{P}\Big(\max_{i=1,\dots,m}X_i\le\sqrt{2(1-\delta)\log n}\Big)=\Big(1-\mathbb{P}\big(X_1\ge\sqrt{2(1-\delta)\log n}\big)\Big)^m\le e^{-c_2\frac{n^{\delta}}{\sqrt{\log n}}}.$$

□
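Both bounds in Lemma 4.1 come from the exact formula $\mathbb{P}(\max_{i\le m}X_i\le t)=(1-\bar\Phi(t))^m$ for i.i.d. standard Gaussians, combined with $(1-q)^m\le e^{-mq}$ and a Gaussian tail estimate. The Python sketch below evaluates these quantities with `math.erfc`. Two assumptions are ours: we take $m=n$ purely for illustration (the proof of Theorem 1.7 needs $m$ of order $n$), and we use the standard Mills-ratio bound $\bar\Phi(t)\ge\frac{t}{1+t^2}\varphi(t)$ as a stand-in for the tail bounds (24).

```python
import math

def gaussian_tail(t):
    """Phi_bar(t) = P(X >= t) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def mills_lower(t):
    """Standard lower bound Phi_bar(t) >= t / (1 + t^2) * phi(t), valid for t > 0."""
    return t / (1 + t * t) * math.exp(-t * t / 2) / math.sqrt(2 * math.pi)

def log_prob_max_at_most(m, t):
    """log P(max of m i.i.d. N(0, 1) <= t) = m * log(1 - Phi_bar(t))."""
    return m * math.log1p(-gaussian_tail(t))

def prob_max_at_least(m, t):
    """P(max of m i.i.d. N(0, 1) >= t) = 1 - (1 - Phi_bar(t))^m."""
    return -math.expm1(log_prob_max_at_most(m, t))

n, delta = 10**6, 0.5
m = n  # illustrative choice of m of order n
t_low = math.sqrt(2 * (1 - delta) * math.log(n))
t_high = math.sqrt(2 * (1 + delta) * math.log(n))
lower_scale = n ** delta / math.sqrt(math.log(n))        # n^delta / sqrt(log n)
upper_scale = 1 / (n ** delta * math.sqrt(math.log(n)))  # 1 / (n^delta sqrt(log n))

# Lower tail: log P(max <= t_low) is of order -n^delta / sqrt(log n).
print(log_prob_max_at_most(m, t_low), -lower_scale)
# Upper tail: P(max >= t_high) is of order 1 / (n^delta sqrt(log n)).
print(prob_max_at_least(m, t_high), upper_scale)
```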

Proof of Lemma 4.2. We use Chernoff's bound for binomial random variables: for $q>p$,

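$$\mathbb{P}\big(\mathrm{Bin}(m,p)\ge mq\big)\le e^{-mI_p(q)}, \tag{214}$$

where $I_p(x):=x\log\frac{x}{p}+(1-x)\log\frac{1-x}{1-p}$ is the relative entropy function. Thus,

$$\mathbb{P}\Big(\mathrm{Bin}\Big(\frac{n(n-1)}{2},1-\frac dn\Big)\ge\frac{n(n-1)}{2}\Big(1-\frac{d}{4n}\Big)\Big)\le e^{-\frac{n(n-1)}{2}I_{1-\frac dn}\left(1-\frac{d}{4n}\right)}. \tag{215}$$

Using $\log(1+x)\ge\frac x2$ for small positive $x$,

$$I_{1-\frac dn}\Big(1-\frac{d}{4n}\Big)\ge\Big(1-\frac{d}{4n}\Big)\frac{3d}{8(n-d)}+\frac{d}{4n}\log\frac14\ge\frac{C_1}{n-d}-\frac{C_2}{n}. \tag{216}$$

Hence, by (215) and (216), there exists a constant $c>0$ such that for sufficiently large $n$,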

$$\mathbb{P}\Big(\mathrm{Bin}\Big(\frac{n(n-1)}{2},1-\frac dn\Big)\ge\frac{n(n-1)}{2}\Big(1-\frac{d}{4n}\Big)\Big)\le e^{-cn}.$$

This implies that

$$\mathbb{P}\Big(\mathrm{Bin}\Big(\frac{n(n-1)}{2},\frac dn\Big)\le\frac{n(n-1)}{2}\cdot\frac{d}{4n}\Big)\le e^{-cn},$$

which concludes the proof. □
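The proof of Lemma 4.2 applies (214) to the number of non-edges, a $\mathrm{Bin}\big(\frac{n(n-1)}2,1-\frac dn\big)$ variable. The Python sketch below (helper names are ours; $d=2$ is an arbitrary illustrative value) evaluates the rate $I_{1-d/n}(1-\frac{d}{4n})$, compares it with the lower bound in (216), and prints the Chernoff exponent $\frac{n(n-1)}{2}I$ divided by $n$, which stays bounded away from $0$; this is the $e^{-cn}$ decay.

```python
import math

def relative_entropy(x, p):
    """I_p(x) = x log(x/p) + (1 - x) log((1 - x)/(1 - p)) for 0 < x, p < 1."""
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

def lemma_4_2_rate(n, d):
    """I_{1-d/n}(1 - d/(4n)), written with log1p to avoid cancellation for large n."""
    q = d / (4 * n)              # 1 - x
    y = 3 * d / (4 * (n - d))    # x/p - 1 = (1 - d/(4n)) / (1 - d/n) - 1
    return (1 - q) * math.log1p(y) + q * math.log(0.25)

def rate_lower_bound_216(n, d):
    """(1 - d/(4n)) * 3d / (8(n - d)) + (d/(4n)) * log(1/4), from log(1 + y) >= y/2."""
    q = d / (4 * n)
    return (1 - q) * 3 * d / (8 * (n - d)) + q * math.log(0.25)

d = 2.0
for n in (10**3, 10**4, 10**5, 10**6):
    rate = lemma_4_2_rate(n, d)
    exponent = n * (n - 1) / 2 * rate  # the Chernoff bound is exp(-exponent)
    print(n, rate, rate_lower_bound_216(n, d), exponent / n)
```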

Proof of Lemma 5.1. Recall that we aim to show

$$\mathbb{P}\big(\tilde Y_1^2+\cdots+\tilde Y_m^2\ge L\big)\le C^m\Big(\frac Lm\Big)^m e^{-\frac12L}e^{\frac12m}e^{\frac12\varepsilon m\log\log n},$$

and in particular, for any $a,b,c>0$, if $m\le b\frac{\log n}{\log\log n}+c$ and $L=a\log n$, then for any $\gamma>0$ and sufficiently large $n$,

$$\mathbb{P}\big(\tilde Y_1^2+\cdots+\tilde Y_m^2\ge a\log n\big)\le n^{-\frac a2+\frac{\varepsilon b}{2}+\gamma}. \tag{217}$$

By the exponential Chebyshev inequality, for any $t>0$,

$$\mathbb{P}\big(\tilde Y_1^2+\cdots+\tilde Y_m^2\ge L\big)\le e^{-tL}\big(\mathbb{E}e^{t\tilde Y_1^2}\big)^m. \tag{218}$$

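Using the lower bound for the tail (24), the probability density function of $\tilde Y$, denoted by $\tilde f(x)$ for $|x|\ge\sqrt{\varepsilon\log\log n}$, satisfies

$$\tilde f(x)\le C\frac{e^{-\frac12x^2}}{(\sqrt{\varepsilon\log\log n})^{-1}e^{-\frac12\varepsilon\log\log n}}=C\sqrt{\varepsilon\log\log n}\,e^{\frac12\varepsilon\log\log n}e^{-\frac12x^2}.$$

Hence, using the upper bound for the tail (24) and the change of variables $x=\frac{1}{\sqrt{1-2t}}y$,

$$\begin{aligned}
\mathbb{E}e^{t\tilde Y_1^2}&\le C\sqrt{\varepsilon\log\log n}\,e^{\frac12\varepsilon\log\log n}\int_{\sqrt{\varepsilon\log\log n}}^{\infty}e^{tx^2}e^{-\frac12x^2}\,dx\\
&=C\sqrt{\varepsilon\log\log n}\,e^{\frac12\varepsilon\log\log n}\frac{1}{\sqrt{1-2t}}\int_{\sqrt{1-2t}\sqrt{\varepsilon\log\log n}}^{\infty}e^{-\frac12y^2}\,dy\\
&\le C\sqrt{\varepsilon\log\log n}\,e^{\frac12\varepsilon\log\log n}\frac{1}{\sqrt{1-2t}}\cdot\frac{1}{\sqrt{1-2t}\sqrt{\varepsilon\log\log n}}e^{-\frac12(1-2t)\varepsilon\log\log n}=C\frac{1}{1-2t}e^{t\varepsilon\log\log n}.
\end{aligned}$$

Applying this to (218),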

$$\mathbb{P}\big(\tilde Y_1^2+\cdots+\tilde Y_m^2\ge L\big)\le C^me^{-tL}\frac{1}{(1-2t)^m}e^{t\varepsilon m\log\log n}.$$

We take $t=\frac12\big(1-\frac mL\big)<\frac12$ (recall that $L>m$) in order to balance the two terms $e^{-tL}$ and $\frac{1}{(1-2t)^m}$. This concludes the proof of (38).

We now show (217). We first check that for any $L>0$, the function $x\mapsto\big(\frac Lx\big)^x$ is increasing on $\big(0,\frac Le\big)$. This is because the derivative of $x\log\big(\frac Lx\big)$, which is $\log\big(\frac Lx\big)-1$, is positive for $x\in\big(0,\frac Le\big)$. Hence, for any $\gamma>0$ and sufficiently large $n$, the left-hand side of (217) is bounded by

$$C^{b\frac{\log n}{\log\log n}+c}\,n^{-\frac a2+\frac{\varepsilon b}{2}}\,n^{\frac{b}{2\log\log n}}\Big(\frac{a\log\log n}{b}\Big)^{b\frac{\log n}{\log\log n}+c}\le n^{-\frac a2+\frac{\varepsilon b}{2}+\gamma}.$$

Here, we used the fact that for large $n$, $(c_1\log\log n)^{c_2\frac{\log n}{\log\log n}}\le n^{\frac\gamma2}$. □
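Two ingredients of the proof of Lemma 5.1 can be checked numerically. Assuming, as the density formula above suggests, that $\tilde Y$ is a standard Gaussian conditioned on $|\tilde Y|\ge a$ with $a=\sqrt{\varepsilon\log\log n}$, its moment generating function has the closed form $\mathbb{E}e^{t\tilde Y^2}=\frac{\operatorname{erfc}(a\sqrt{1-2t}/\sqrt2)}{\sqrt{1-2t}\,\operatorname{erfc}(a/\sqrt2)}$ for $0\le t<\frac12$. The two standard Mills-ratio bounds then give $\mathbb{E}e^{t\tilde Y^2}\le(1+a^{-2})\frac{e^{ta^2}}{1-2t}$, which is of the form $C\frac{1}{1-2t}e^{t\varepsilon\log\log n}$ used above, with $C=1+a^{-2}\le2$ once $a\ge1$. The sketch also checks that $x\mapsto(L/x)^x$ increases on $(0,L/e]$. The function names and the grids of $a$, $t$, $L$ are ours.

```python
import math

def conditioned_gaussian_mgf(t, a):
    """E[exp(t Y^2) | |Y| >= a] for Y ~ N(0, 1), 0 <= t < 1/2, a > 0 (closed form)."""
    s = math.sqrt(1 - 2 * t)
    return math.erfc(a * s / math.sqrt(2)) / (s * math.erfc(a / math.sqrt(2)))

def mgf_upper_bound(t, a):
    """(1 + 1/a^2) * exp(t a^2) / (1 - 2t), from the two Mills-ratio bounds."""
    return (1 + 1 / a**2) * math.exp(t * a * a) / (1 - 2 * t)

def log_power(L, x):
    """log((L/x)^x) = x log(L/x); its derivative log(L/x) - 1 is positive on (0, L/e)."""
    return x * math.log(L / x)

for a in (1.0, 2.0, 3.0):
    for t in (0.1, 0.25, 0.4):
        print(a, t, conditioned_gaussian_mgf(t, a), mgf_upper_bound(t, a))

L = 50.0
grid = [L / math.e * k / 100 for k in range(1, 101)]
values = [log_power(L, x) for x in grid]
print(all(u < v for u, v in zip(values, values[1:])))  # monotone on the grid up to L/e
```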

References

[1] Johannes Alt, Raphaël Ducatez, and Antti Knowles. Extremal eigenvalues of critical Erdős–Rényi graphs. arXiv preprint arXiv:1905.03243, 2019.
[2] G. Ben Arous, Amir Dembo, and Alice Guionnet. Aging of spherical spin glasses. Probability Theory and Related Fields, 120(1):1–67, 2001.
[3] Gerard Ben Arous and Alice Guionnet. Large deviations for Wigner's law and Voiculescu's non-commutative entropy. Probability Theory and Related Fields, 108(4):517–542, 1997.
[4] Fanny Augeri. Large deviations principle for the largest eigenvalue of Wigner matrices without Gaussian tails. Electronic Journal of Probability, 21, 49 pp., 2016.
[5] Fanny Augeri. Nonlinear large deviation bounds with applications to traces of Wigner matrices and cycles counts in Erdős–Rényi graphs. Annals of Probability, to appear, 2020.
[6] Fanny Augeri, Alice Guionnet, and Jonathan Husson. Large deviations for the largest eigenvalue of sub-Gaussian matrices. arXiv preprint arXiv:1911.10591, 2019.
[7] Tim Austin. The structure of low-complexity Gibbs measures on product spaces. Annals of Probability, 47(6):4002–4023, 2019.
[8] Afonso S. Bandeira and Ramon van Handel. Sharp nonasymptotic bounds on the norm of random matrices with independent entries. Annals of Probability, 44(4):2479–2506, 2016.
[9] Anirban Basak and Riddhipratim Basu. Upper tail large deviations of the cycle counts in Erdős–Rényi graphs in the full localized regime. arXiv preprint arXiv:1912.11410, 2019.
[10] Anirban Basak and Sumit Mukherjee. Universality of the mean-field for the Potts model. Probability Theory and Related Fields, 168(3-4):557–600, 2017.

[11] Florent Benaych-Georges, Charles Bordenave, and Antti Knowles. Largest eigenvalues of sparse inhomogeneous Erdős–Rényi graphs. Annals of Probability, 47(3):1653–1676, 2019.
[12] Florent Benaych-Georges, Charles Bordenave, and Antti Knowles. Spectral radii of sparse random matrices. Annales de l'Institut Henri Poincaré, Probabilités et Statistiques, 56:2141–2161, 2020.
[13] Bhaswar B. Bhattacharya, Sohom Bhattacharya, and Shirshendu Ganguly. Spectral edge in sparse random graphs: Upper and lower tail large deviations. arXiv preprint arXiv:2004.00611, 2020.
[14] Bhaswar B. Bhattacharya and Shirshendu Ganguly. Upper tails for edge eigenvalues of random graphs. SIAM Journal on Discrete Mathematics, to appear, 2020.
[15] Bhaswar B. Bhattacharya, Shirshendu Ganguly, Eyal Lubetzky, and Yufei Zhao. Upper tails and independence polynomials in random graphs. Advances in Mathematics, 319:313–347, 2017.
[16] Béla Bollobás. Random graphs. Number 73. Cambridge University Press, 2001.
[17] Charles Bordenave and Pietro Caputo. A large deviation principle for Wigner matrices without Gaussian tails. Annals of Probability, 42(6):2454–2496, 2014.
[18] Charles Bordenave and Pietro Caputo. Large deviations of empirical neighborhood distribution in sparse random graphs. Probability Theory and Related Fields, 163(1-2):149–222, 2015.
[19] Charles Bordenave, Arnab Sen, and Bálint Virág. Mean quantum percolation. Journal of the European Mathematical Society, 19(12):3679–3707, 2017.
[20] Sourav Chatterjee. Superconcentration and related topics, volume 15. Springer, 2014.
[21] Sourav Chatterjee and Amir Dembo. Nonlinear large deviations. Advances in Mathematics, 299:396–450, 2016.
[22] Sourav Chatterjee and S. R. S. Varadhan. The large deviation principle for the Erdős–Rényi random graph. European Journal of Combinatorics, 32(7):1000–1017, 2011.
[23] Sourav Chatterjee and S. R. S. Varadhan. Large deviations for random matrices. Communications on Stochastic Analysis, 6(1):1–13, 2012.
[24] Nick Cook and Amir Dembo. Large deviations of subgraph counts for sparse Erdős–Rényi graphs. arXiv preprint arXiv:1809.11148, 2018.
[25] Amir Dembo and Ofer Zeitouni. Large deviations techniques and applications, volume 38 of Stochastic Modelling and Applied Probability. Springer-Verlag, Berlin, 2010. Corrected reprint of the second (1998) edition.
[26] Ronen Eldan. Gaussian-width gradient complexity, reverse log-Sobolev inequalities and nonlinear large deviations. Geometric and Functional Analysis, to appear, 2018.
[27] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Spectral statistics of Erdős–Rényi graphs II: Eigenvalue spacing and the extreme eigenvalues. Communications in Mathematical Physics, 314(3):587–640, 2012.
[28] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Spectral statistics of Erdős–Rényi graphs I: Local semicircle law. Annals of Probability, 41(3B):2279–2375, 2013.
[29] Alan Frieze and Michał Karoński. Introduction to random graphs. Cambridge University Press, 2016.
[30] Alice Guionnet and Jonathan Husson. Large deviations for the largest eigenvalue of Rademacher matrices. Annals of Probability, to appear, 2020.
[31] Matan Harel, Frank Mousset, and Wojciech Samotij. Upper tails via high moments and entropic stability. arXiv preprint arXiv:1904.08212, 2019.
[32] Michael Krivelevich and Benny Sudakov. The largest eigenvalue of sparse random graphs. Combinatorics, Probability and Computing, 12(1):61–72, 2003.
[33] Rafał Latała. Some estimates of norms of random matrices. Proceedings of the American Mathematical Society, 133(5):1273–1282, 2005.
[34] Rafał Latała, Ramon van Handel, and Pierre Youssef. The dimension-free structure of nonhomogeneous random matrices. Inventiones mathematicae, 214(3):1031–1080, 2018.
[35] Eyal Lubetzky and Yufei Zhao. On replica symmetry of large deviations in random graphs. Random Structures & Algorithms, 47(1):109–146, 2015.
[36] Eyal Lubetzky and Yufei Zhao. On the variational problem for upper tails in sparse random graphs. Random Structures & Algorithms, 50(3):420–436, 2017.
[37] Theodore S. Motzkin and Ernst G. Straus. Maxima for graphs and a new proof of a theorem of Turán. Canadian Journal of Mathematics, 17:533–540, 1965.
[38] David Reimer. Proof of the van den Berg–Kesten conjecture. Combinatorics, Probability and Computing, 9(1):27–32, 2000.
[39] Yoav Seginer. The expected norm of random matrices. Combinatorics, Probability and Computing, 9(2):149–166, 2000.

[40] Konstantin Tikhomirov and Pierre Youssef. Outliers in spectrum of sparse Wigner matrices. Random Structures & Algorithms, 2020.
[41] Ramon van Handel. On the spectral norm of Gaussian random matrices. Transactions of the American Mathematical Society, 369(11):8161–8178, 2017.
[42] Jun Yan. Nonlinear large deviations: Beyond the hypercube. Annals of Applied Probability, to appear, 2020.

Department of Statistics, Evans Hall, University of California, Berkeley, CA 94720, USA

Department of Mathematics, University of California, Los Angeles, CA 94720, USA