Testing correlation of unlabeled random graphs

Yihong Wu, Jiaming Xu, and Sophie H. Yu ∗

February 9, 2021

Abstract

We study the problem of detecting the edge correlation between two random graphs with n unlabeled nodes. This is formalized as a hypothesis testing problem, where under the null hypothesis, the two graphs are independently generated; under the alternative, the two graphs are edge-correlated under some latent node correspondence, but have the same marginal distributions as the null. For both Gaussian-weighted complete graphs and dense Erdős-Rényi graphs (with edge probability n^{-o(1)}), we determine the sharp threshold at which the optimal testing error probability exhibits a phase transition from zero to one as n → ∞. For sparse Erdős-Rényi graphs with edge probability n^{-Ω(1)}, we determine the threshold within a constant factor.

The proof of the impossibility results is an application of the conditional second-moment method, where we bound the truncated second moment of the likelihood ratio by carefully conditioning on the typical behavior of the intersection graph (consisting of edges in both observed graphs) and taking into account the cycle structure of the induced random permutation on the edges. Notably, in the sparse regime, this is accomplished by leveraging the structure of subcritical Erdős-Rényi graphs and a careful enumeration of subpseudoforests that can be assembled from short orbits of the edge permutation.

Contents

1 Introduction
  1.1 Main results
  1.2 Test statistic and proof techniques
  1.3 Connection to the literature
  1.4 Notation and paper organization

2 First Moment Method for Detection
  2.1 Proof of Theorem 1: positive part
  2.2 Proof of Theorem 2: positive part

3 Unconditional Second Moment Method and Obstructions
  3.1 Node permutation, edge permutation, and cycle decomposition
  3.2 Second moment calculation
  3.3 Obstruction from short orbits

4 Conditional Second Moment Method: Dense regime
  4.1 Sharp threshold for the Gaussian model

5 Conditional Second Moment Method: Sparse regime
  5.1 Classification of edge orbits
  5.2 Orbit graph and backbone graph
  5.3 Warm-up: Generating function of orbit forests
  5.4 Proof of Theorem 4: Generating function of orbit pseudoforests

6 Conclusion and Open Questions

A Supplementary Proofs for Sections 1, 3 and 4
  A.1 Proof of Remark 1
  A.2 Proof of Proposition 2
  A.3 Sharp threshold for dense Erdős-Rényi graphs

B Supplementary Proofs for Section 5
  B.1 Proof of Proposition 3: Long orbits
  B.2 Proof of Proposition 4: Incomplete orbits
  B.3 Proof of Proposition 5: Averaging over orbit lengths

C Concentration Inequalities for Gaussians and Binomials

D Facts on Random Permutations

arXiv:2008.10097v2 [math.ST] 8 Feb 2021

∗Y. Wu is with the Department of Statistics and Data Science, Yale University, New Haven, CT, USA, [email protected]. J. Xu and S. H. Yu are with The Fuqua School of Business, Duke University, Durham, NC, USA, {jx77,haoyang.yu}@duke.edu. Y. Wu is supported in part by the NSF Grant CCF-1900507, an NSF CAREER award CCF-1651588, and an Alfred Sloan fellowship. J. Xu is supported by the NSF Grants IIS-1838124, CCF-1850743, and CCF-1856424.

1 Introduction

Understanding and quantifying the correlation between datasets are among the most fundamental tasks in statistics. In many modern applications, the observations may not be in the familiar form of vectors but rather graphs. Furthermore, the node labels may be absent or scrambled, in which case one needs to assess the similarity of these unlabeled graphs solely on the basis of their topological structures. Equivalently, this amounts to determining whether there exists a node correspondence under which the (weighted) edges of the two graphs are correlated. This problem arises naturally in a wide range of fields:

• In social network analysis, one is interested in deciding whether two friendship networks on different social platforms share structural similarities, where the node labels are frequently anonymized due to privacy considerations [NS08, NS09].

• In computer vision, 3-D shapes are commonly represented by geometric graphs, where nodes are subregions and edges encode the adjacency relationships between different regions. A key building block for pattern recognition and image processing is to determine whether two graphs correspond to the same object undergoing different rotations or deformations (changes in pose or topology) [CSS07, BBM05].

• In computational biology, an important task is to assess the correlation of two biological networks in two different species so as to enrich one dataset using the other [SXB08, VCP+11].

• In natural language processing, the so-called ontology alignment problem refers to uncovering the correlation between two knowledge graphs that are in either different languages [HNM05] or different domains (e.g., Library of Congress versus Wikipedia [BGSW13]).

Inspired by the hypothesis testing model proposed by Barak et al. [BCL+19], we formulate a general problem of testing network correlation as follows. Let G = ([n], W) denote a weighted undirected graph on the node set [n] ≜ {1, . . . , n} with weighted adjacency matrix W, where Wii = 0 and, for any 1 ≤ i < j ≤ n, Wij = 1 (or the edge weight) if i and j are adjacent and Wij = 0 otherwise. Recall that two weighted graphs G = ([n], W) and H = ([n], W′) are isomorphic, denoted by G ≅ H, if there exists a permutation π on [n] (called a graph isomorphism) such that Wij = W′π(i)π(j) for all i, j. Given a weighted graph G, its isomorphism class is the equivalence class 𝒢 = {H : H ≅ G}. We refer to an isomorphism class as an unlabeled graph and to 𝒢 as the unlabeled version of G.
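As a concrete illustration of these definitions, the following self-contained Python sketch checks the isomorphism relation Wij = W′π(i)π(j) by exhaustive search over all permutations. The helper names are ours, and the search is exponential in n, so it is intended only for tiny examples.

```python
import itertools

def relabel(W, pi):
    """Return W' with W'[i][j] = W[pi[i]][pi[j]] (0-indexed permutation pi)."""
    n = len(W)
    return [[W[pi[i]][pi[j]] for j in range(n)] for i in range(n)]

def isomorphic(W, Wp):
    """Brute-force test of G ≅ H: search for pi with W[i][j] = Wp[pi[i]][pi[j]]."""
    n = len(W)
    for pi in itertools.permutations(range(n)):
        if all(W[i][j] == Wp[pi[i]][pi[j]] for i in range(n) for j in range(n)):
            return True
    return False

# A 4-cycle and any relabeled copy of it are isomorphic.
W = [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
Wp = relabel(W, (2, 0, 3, 1))
assert isomorphic(W, Wp)

# A path and a star have the same number of edges but are not isomorphic.
path = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
star = [[0, 1, 1, 1], [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
assert not isomorphic(path, star)
```

The second pair shows why simple invariants such as the edge count cannot by themselves certify isomorphism.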

Problem 1 (Testing correlation of unlabeled graphs). Let G1 = ([n], W) and G2 = ([n], W′) be two weighted random graphs, where the edge weights {(Wij, W′ij) : 1 ≤ i < j ≤ n} are i.i.d. pairs of random variables, and Wij and W′ij have the same marginal distribution. Under the null hypothesis H0, Wij and W′ij are independent; under the alternative hypothesis H1, Wij and W′ij are correlated. Given the unlabeled versions of G1 and G2, i.e., their isomorphism classes 𝒢1 = {G : G ≅ G1} and 𝒢2 = {G : G ≅ G2}, the goal is to test H0 versus H1.

Note that were the node labels of G1 and G2 observed, one could stack all the edge weights into a vector and reduce the problem to simply testing the correlation of two random vectors. However, when the node labels are unobserved, the inherent correlation between G1 and G2 is obscured by the latent node correspondence, making the testing problem significantly more challenging. Indeed, since the observed graphs are unlabeled, the test needs to rely on graph invariants (i.e., graph properties that are invariant under graph isomorphisms), such as subgraph counts (e.g., the number of edges and triangles) and spectral information (e.g., eigenvalues of adjacency matrices or Laplacians). In this work, we focus on the following two special cases of particular interest:

• (Gaussian Wigner model). Suppose that under H1, each pair of edge weights Wij and W′ij is jointly normal with zero mean, unit variance, and correlation coefficient ρ ∈ [0, 1]; under H0, Wij and W′ij are independent standard normals. Note that marginally W and W′ are two Gaussian Wigner random matrices under both H0 and H1. The correlated Gaussian Wigner model was proposed in [DMWX18] as a prototypical model for random graph matching and further studied in [FMWX19a, GLM19].

• (Erdős-Rényi random graph). Let G(n, p) denote the Erdős-Rényi random graph model with edge probability p ∈ [0, 1]. Consider the edge subsampling process that generates a child graph from a given parent graph by keeping each edge independently with probability s ∈ [0, 1]. Suppose that under H1, G1 and G2 are independently subsampled from a common parent graph G ∼ G(n, p); under H0, G1 and G2 are independently subsampled from two independent parent graphs G, G′ ∼ G(n, p), respectively. See Fig. 1 for an illustration.

Note that G1 and G2 are both instances of G(n, ps), independent under H0 and correlated under H1. This specific model of correlated Erdős-Rényi random graphs was initially proposed by [PG11] and has been widely used for studying the problem of matching random graphs [CK16, CK17, BCL+19, MX19, DCKG19, CKMP19, DMWX18, FMWX19b, GM20, HM20].
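The subsampling construction above can be sketched in a few lines of Python. The function and parameter names are our own, and the snippet is a simulation aid rather than part of the analysis.

```python
import random

def correlated_er_pair(n, p, s, correlated, rng):
    """Sample (G1, G2) as edge sets on vertices 0..n-1.  Under H1
    (correlated=True) both are subsampled from one parent G(n, p);
    under H0 they come from two independent parents."""
    def parent():
        return {(i, j) for i in range(n) for j in range(i + 1, n)
                if rng.random() < p}
    def subsample(g):
        return {e for e in g if rng.random() < s}
    g = parent()
    return subsample(g), subsample(g if correlated else parent())

rng = random.Random(0)
g1, g2 = correlated_er_pair(100, 0.1, 0.8, True, rng)
# Marginally each child is G(n, ps); under H1 the two children share a
# sizable overlap (the intersection graph) of expected edge density p*s^2.
assert all(i < j for (i, j) in g1 | g2)
assert len(g1 & g2) > 0
```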


(a) Two labeled graphs G1 and G2 are subsampled from a common parent graph according to the correlated Erdős-Rényi graph model with n = 20, p = 0.1, and s = 0.8 under H1, where blue edges are edges sampled from the parent graph, and red, dashed edges are edges deleted from the parent graph.

(b) Two observed graphs G1 and G2 are the unlabeled versions of G1 and G2, respectively.

Figure 1: Example of testing correlation of two Erdős-Rényi random graphs. The task is to test the underlying hypothesis (H0 or H1) based on the two unlabeled graphs in panel (b).

We further focus on the following two natural types of testing guarantees.

Definition 1 (Strong and weak detection). Let Q and P denote the probability measures under H0 and H1, respectively. We say a test statistic T(𝒢1, 𝒢2) with threshold τ achieves

• strong detection if the sum of the type I and type II error probabilities converges to 0 as n → ∞, that is,

  lim_{n→∞} [ P(T(𝒢1, 𝒢2) < τ) + Q(T(𝒢1, 𝒢2) ≥ τ) ] = 0;  (1)

• weak detection if the sum of the type I and type II error probabilities is bounded away from 1 as n → ∞, that is,

  lim sup_{n→∞} [ P(T(𝒢1, 𝒢2) < τ) + Q(T(𝒢1, 𝒢2) ≥ τ) ] < 1.  (2)

Note that strong detection requires the test statistic to determine with high probability whether (𝒢1, 𝒢2) is drawn from Q or P, while weak detection only aims at strictly outperforming random guessing. It is well known that the minimal sum of type I and type II error probabilities is 1 − TV(P, Q), achieved by the likelihood ratio test, where TV(P, Q) = (1/2) ∫ |dP − dQ| denotes the total variation distance between P and Q. Thus strong and weak detection are equivalent to lim_{n→∞} TV(P, Q) = 1 and lim inf_{n→∞} TV(P, Q) > 0, respectively. Recent work [BCL+19] developed a polynomial-time test based on counting certain subgraphs that correctly distinguishes between H0 and H1 with probability at least 0.9, provided that the edge subsampling probability satisfies s = Ω(1) and the average degree satisfies certain conditions; however, the fundamental limit of detection remains elusive. The main objective of this paper is to obtain tight necessary and sufficient conditions for strong and weak detection.
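The identity between the minimal total error and 1 − TV(P, Q) can be verified numerically for finite distributions. The following sketch, with hypothetical toy distributions of our choosing, implements the likelihood-ratio test that attains it.

```python
def tv(p, q):
    """Total variation distance between two finite pmfs given as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def min_error_sum(p, q):
    """Minimal type-I + type-II error over tests, attained by the
    likelihood-ratio test that rejects the null q exactly where p > q."""
    keys = set(p) | set(q)
    reject = {k for k in keys if p.get(k, 0.0) > q.get(k, 0.0)}
    type1 = sum(q.get(k, 0.0) for k in reject)        # q-mass wrongly rejected
    type2 = sum(p.get(k, 0.0) for k in keys - reject) # p-mass wrongly accepted
    return type1 + type2

p = {'a': 0.5, 'b': 0.3, 'c': 0.2}  # toy stand-ins for P and Q
q = {'a': 0.2, 'b': 0.3, 'c': 0.5}
assert abs(min_error_sum(p, q) - (1 - tv(p, q))) < 1e-12
```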

1.1 Main results

Theorem 1 (Gaussian Wigner model). If

  ρ² ≥ 4 log n / (n − 1),  (3)

then TV(P, Q) = 1 − o(1). Conversely, if

  ρ² ≤ (4 − ε) log n / n  (4)

for any constant ε > 0, then TV(P, Q) = o(1).

Theorem 2 (Erdős-Rényi graphs). If

  s² ≥ 2 log n / [ (n − 1) p (log(1/p) − 1 + p) ],  (5)

then TV(P, Q) = 1 − o(1). Conversely, assume that p is bounded away from 1.

• (Dense regime): If p = n^{-o(1)} and

  s² ≤ (2 − ε) log n / [ np (log(1/p) − 1 + p) ]  (6)

for any constant ε > 0, then TV(P, Q) = o(1).

• (Sparse regime): If p = n^{-Ω(1)} and

  s² ≤ [ (1 − ω(n^{-1/3})) / (np) ] ∧ c0  (7)

for some universal constant c0 (c0 = 0.01 works), then TV(P, Q) = 1 − Ω(1). In addition, if (7) holds and s = o(1), then TV(P, Q) = o(1).

For the Gaussian Wigner model, Theorem 1 shows that the fundamental limit of detection, in terms of the limiting value of nρ²/log n, exhibits a sharp threshold at 4, above which strong detection is possible and below which weak detection is impossible, a phenomenon known as the “all-or-nothing” phase transition [RXZ19]. In the Erdős-Rényi model, for dense parent graphs with p = n^{-o(1)} and bounded away from 1, Theorem 2 shows that a similar sharp threshold for nps²(log(1/p) − 1 + p)/log n exists at 2. Curiously, the function p ↦ p(log(1/p) − 1 + p) is not monotone and is uniquely maximized at p∗ ≈ 0.203, the solution to the equation log(1/p) = 2(1 − p). This shows the counterintuitive fact that the parent graph with edge density p∗ is the “easiest” for detection, as it requires the lowest sampling probability s; nevertheless, such non-monotonicity in the detection threshold can be anticipated by noting that in the extreme cases of p = 0 and p = 1, the two observed graphs are always independent and the two hypotheses are identical. For sparse parent graphs with p = n^{-Ω(1)}, the picture is less clear:

• Unbounded average degree np = ω(1): For simplicity, assume that p = n^{-α+o(1)} for some constant α ∈ (0, 1]. Theorem 2 implies that strong detection is possible if lim inf nps² > 2/α and weak detection is impossible if lim sup nps² < 1; these two conditions differ by a constant factor.

• Bounded average degree np = Θ(1): For simplicity, assume that p = d/n for some constant d > 0. Theorem 2 shows that strong detection is possible if s² > 2/d and impossible if s² < c0 ∧ (1/d). For both cases, it is an open problem to determine the sharp threshold for detection (or the existence thereof).
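The maximizer p∗ of the function p ↦ p(log(1/p) − 1 + p) mentioned above can be computed numerically from the stationarity condition log(1/p) = 2(1 − p). A minimal bisection sketch (our own code, for illustration):

```python
import math

def g(p):
    """Detection-difficulty functional p * (log(1/p) - 1 + p) from Theorem 2."""
    return p * (math.log(1 / p) - 1 + p)

# p* solves log(1/p) = 2(1 - p), the stationarity condition g'(p) = 0.
# The left-hand side minus the right-hand side is decreasing on (0.1, 0.3).
lo, hi = 0.1, 0.3
for _ in range(100):
    mid = (lo + hi) / 2
    if math.log(1 / mid) - 2 * (1 - mid) > 0:
        lo = mid
    else:
        hi = mid
p_star = (lo + hi) / 2

assert abs(p_star - 0.203) < 1e-3          # matches the value quoted above
assert g(p_star) > g(0.05) and g(p_star) > g(0.5)  # interior maximum
```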

Remark 1 (Simple test for weak detection). In the non-trivial case where p = ω(1/n²) and p is bounded away from 1, as long as the sampling probability s is a constant, weak detection can be achieved in linear time by simply comparing the numbers of edges of the two observed graphs. Intuitively, their difference behaves like a centered Gaussian with slightly different scale parameters under the two hypotheses, which can then be distinguished non-trivially (see Section A.1 for a rigorous justification). In view of the negative result for weak detection in Theorem 2, we conclude that for parent graphs with bounded average degree np = O(1), weak detection is possible if and only if s = Ω(1).
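The edge-counting test of Remark 1 is easy to simulate. The following sketch (with parameter choices of our own) illustrates the mechanism the test exploits: e(G1) − e(G2) fluctuates noticeably less under H1, where the two counts are positively correlated, than under H0.

```python
import random

def edge_counts(n, p, s, correlated, rng):
    """One draw of (e(G1), e(G2)): each of the m = n(n-1)/2 potential edges
    is kept in a parent graph w.p. p and passed to each child w.p. s."""
    m = n * (n - 1) // 2
    e1 = e2 = 0
    for _ in range(m):
        if correlated:                       # H1: one shared parent edge
            par = rng.random() < p
            e1 += par and rng.random() < s
            e2 += par and rng.random() < s
        else:                                # H0: two independent parents
            e1 += rng.random() < p and rng.random() < s
            e2 += rng.random() < p and rng.random() < s
    return e1, e2

def var_of_difference(correlated, rng, trials=300):
    """Empirical variance of e(G1) - e(G2) over repeated draws."""
    diffs = [a - b for a, b in (edge_counts(100, 0.05, 0.9, correlated, rng)
                                for _ in range(trials))]
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)

rng = random.Random(1)
# The scale gap between the two hypotheses is what the linear-time test detects.
assert var_of_difference(True, rng) < var_of_difference(False, rng)
```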

As discussed in the next subsection, the testing procedure used to achieve strong detection in both Theorem 1 and Theorem 2 involves a combinatorial optimization that is intractable in the worst case. Thus it is of interest to compare the optimal threshold to the performance of existing computationally efficient algorithms. These methods are based on subgraph counts that extend the simple idea of counting edges in Remark 1. For Erdős-Rényi graphs, the polynomial-time test in [BCL+19, Theorem 2.2] (based on counting certain probabilistically constructed subgraphs) correctly distinguishes between H0 and H1 with probability at least 0.9, provided that the edge subsampling probability satisfies s = Ω(1) and nps ∈ (n^{o(1)}, n^{1/153}) ∪ (n^{2/3}, n^{1−ε}) for some small constant ε > 0. This performance guarantee is highly suboptimal compared to the condition s² = Ω(log n / (np log(1/p))) given by Theorem 2. In a companion paper [MWXY20], we propose a polynomial-time algorithm based on counting trees that achieves strong detection, provided that np ≥ n^{-o(1)} and ρ² ≜ s²(1 − p)²/(1 − ps)² > 1/β, where β ≜ lim_{k→∞} [t(k)]^{1/k} ≈ 2.956 and t(k) is the number of unlabeled trees with k vertices [Ott48]. Achieving the optimal threshold with polynomial-time tests remains an open problem.

1.2 Test statistic and proof techniques

To introduce our testing procedure and its analysis, we first reformulate the testing problem given in Problem 1 in a more convenient form. Due to the exchangeability of the (i.i.d.) edge weights, observing the unlabeled version of a graph is equivalent to observing a uniformly randomly relabeled version of it.

Indeed, let π1 and π2 be two independent random permutations drawn uniformly from the set Sn of all permutations on [n]. Consider the relabeled version of G1 = ([n], W) with weighted adjacency matrix A, where Aij = Wπ1(i)π1(j); similarly, let B correspond to the relabeled version of G2 = ([n], W′) with Bij = W′π2(i)π2(j). It is clear that observing the unlabeled graphs 𝒢1 and 𝒢2 is equivalent to observing the labeled graphs A and B. Since π2 ∘ π1^{-1} is also a uniform random permutation, we arrive at the following formulation that is equivalent to Problem 1:

Problem 2 (Reformulation of Problem 1). Let A and B denote the weighted adjacency matrices of two weighted graphs on the vertex set [n], both consisting of i.i.d. edge weights. Under H0, A and B are independent; under H1, conditional on a latent permutation π drawn uniformly at random from Sn, {(Aij, Bπ(i)π(j)) : 1 ≤ i < j ≤ n} are i.i.d. and each pair Aij and Bπ(i)π(j) is correlated. Upon observing A and B, the goal is to test H0 versus H1.

Note that under H1, the latent random permutation π represents the hidden node correspondence under which A and B are correlated. For this reason, we refer to H1 as the planted model and H0 as the null model. The likelihood ratio is given by

  P(A, B)/Q(A, B) = (1/n!) Σ_{π∈Sn} P(A, B | π)/Q(A, B),  (8)

which is the optimal test statistic but difficult to analyze due to the averaging over all n! permutations. Instead, we consider the generalized likelihood ratio, replacing the average with the maximum:

  max_{π∈Sn} P(A, B | π)/Q(A, B).  (9)

As shown later in Section 2, for both the Gaussian Wigner and the Erdős-Rényi graph model, (9) is equivalent to

  T(A, B) ≜ max_{π∈Sn} Tπ, where Tπ ≜ Σ_{i<j} Aij Bπ(i)π(j).  (10)

¹Indeed, (11) follows from, e.g., [Tsy09, Lemmas 2.6 and 2.7], and (12) follows from the Cauchy-Schwarz inequality.

which correspond to the impossibility of strong and weak detection, respectively. To compute the second moment, we introduce an independent copy π̃ of the latent permutation π and express the squared likelihood ratio as

  (P(A, B)/Q(A, B))² = E_{π̃⊥⊥π} [ Π_{i<j} Xij ], where Xij ≜ [ P(Aij, Bπ(i)π(j)) P(Aij, Bπ̃(i)π̃(j)) ] / [ Q(Aij, Bπ(i)π(j)) Q(Aij, Bπ̃(i)π̃(j)) ],

where, for any pair (i, j) with 1 ≤ i < j ≤ n, Q denotes the joint density function of Aij and Bij under Q, and P denotes the joint density function of Aij and Bπ(i)π(j) under P given its latent permutation π. Fixing π and π̃, we then decompose this as a product over independent randomness indexed by the so-called edge orbits. Specifically, the permutation σ ≜ π^{-1} ∘ π̃ on the node set naturally induces a permutation σ^E on the edge set of the complete graph by permuting the endpoints. Denoting by 𝒪 the collection of edge orbits (orbits in the cycle decomposition of the edge permutation σ^E), we show that

  (P(A, B)/Q(A, B))² = E_{π̃⊥⊥π} [ Π_{O∈𝒪} X_O ], where X_O ≜ Π_{(i,j)∈O} Xij,

and the X_O's are mutually independent under Q conditioned on π and π̃.

Then, we take the expectation E_{(A,B)∼Q} on the right-hand side and interchange the two expectations. For both the Gaussian and Erdős-Rényi models, this calculation can be explicitly carried out by evaluating the trace of certain operators. In particular, this computation reveals the following dichotomy: the second moment is 1 + o(1) when ρ² ≤ (2 − ε) log n / n, but unbounded when ρ² ≥ (2 + ε) log n / n, where ρ is the correlation coefficient in the Gaussian case and ρ = s(1 − p)/(1 − ps) in the Erdős-Rényi case. Compared with Theorems 1 and 2, we see that directly applying the second-moment method fails to capture the sharp threshold: the impossibility condition ρ² ≤ (2 − ε) log n / n is suboptimal by a multiplicative factor of 2 in the Gaussian case and by an unbounded factor in the Erdős-Rényi case when p = o(1).

It turns out that the second moment is mostly influenced by those short edge orbits of length k = O(log n) for which Π_{|O|=k} X_O has a large expectation (see Section 3.3 for a detailed explanation). Fortunately, the atypically large magnitude of Π_{|O|=k} X_O can be attributed to certain rare events associated with the intersection graph (edges that are included in both A and B^π = (Bπ(i)π(j))), which is distributed as G(n, ps²) under the planted model P. This observation prompts us to apply the conditional second-moment method, which truncates the squared likelihood ratio on an appropriately chosen global event that has high probability under P. Specifically,

• In the dense case (including the Gaussian model and dense Erdős-Rényi graphs), the dominating contribution comes from fixed points (k = 1), which can be regulated by conditioning on the edge density of large induced subgraphs of the intersection graph. Note that for Erdős-Rényi graphs, even though the density of small induced subgraphs (e.g., those induced by Θ(log n) vertices) can deviate significantly from its expectation [BBSV19], fortunately we only need to consider sufficiently large subgraphs here.

• For sparse Erdős-Rényi graphs, the argument is much more involved and combinatorial, as one needs to control the contribution of not only fixed points but all edge orbits of length O(log n). Crucially, the major contribution is due to those edge orbits that are subgraphs of the intersection graph. Under the impossibility condition of Theorem 2, the intersection graph G(n, ps²) is subcritical and is a pseudoforest (each component having at most one cycle) with high probability. This global structure significantly limits the co-occurrence of edge orbits in the intersection graph. We thus truncate the squared likelihood ratio on the global event that the intersection graph is a pseudoforest. To compute the conditional second moment, we first study the graph structure of edge orbits, then reduce the problem to enumerating pseudoforests that are disjoint unions of edge orbits and bounding their generating functions, and finally average over the cycle lengths of the random permutation σ. This is the most challenging part of the paper.
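The pseudoforest property used in the truncation event is straightforward to check algorithmically: a graph is a pseudoforest if and only if every connected component has at most as many edges as vertices. A minimal union-find sketch (our own code, not from the paper):

```python
def is_pseudoforest(n, edges):
    """True iff every connected component of the graph on vertices 0..n-1
    has at most as many edges as vertices (i.e. at most one cycle)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edge_cnt = [0] * n   # edges charged to each component root
    size = [1] * n       # vertices in each component
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            edge_cnt[ru] += 1
        else:
            parent[rv] = ru
            edge_cnt[ru] += edge_cnt[rv] + 1
            size[ru] += size[rv]
    return all(edge_cnt[find(i)] <= size[find(i)] for i in range(n))

# A triangle with a pendant edge has exactly one cycle in its component...
assert is_pseudoforest(4, [(0, 1), (1, 2), (2, 0), (2, 3)])
# ...while two triangles sharing an edge have two cycles, hence 5 edges on
# 4 vertices in one component.
assert not is_pseudoforest(4, [(0, 1), (1, 2), (2, 0), (1, 3), (2, 3)])
```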

1.3 Connection to the literature This work joins an emerging line of research which examines inference problems on networks from statistical and computational perspectives. We discuss some particularly relevant work below.

Random graph matching Given a pair of graphs, the problem of graph matching (or network alignment) refers to finding a node correspondence that maximizes the edge correlation [CFSV04, LR13], which amounts to solving the QAP in (10). Due to the worst-case intractability of the QAP, there has been a recent surge of interest in the average-case analysis of matching two correlated random graphs [CK16, CK17, BCL+19, MX19, DCKG19, CKMP19, DMWX18, FMWX19b, GM20, HM20], where the goal is to reconstruct the hidden node correspondence between the two graphs accurately with high probability. To this end, the correlated Erdős-Rényi graph model (the alternative hypothesis H1 in Problem 2) has been a popular choice, for which the solution to the QAP (10) is the maximum likelihood estimator. It is shown in [CK17] that exact recovery of the hidden node correspondence with high probability is information-theoretically possible if nps² − log n → +∞ and p = O(log^{-3} n), and impossible if nps² − log n = O(1). In contrast, state-of-the-art polynomial-time algorithms achieve exact recovery only when np = poly(log n) and 1 − s = 1/poly(log n) [DMWX18, FMWX19a, FMWX19b].

Recent work [CKMP19] initiated the study of almost exact recovery, that is, obtaining a matching (possibly imperfect) of size n − o(n) that is contained in the true matching with high probability. It shows that almost exact recovery is information-theoretically possible if nps² = ω(1) and p ≤ n^{-Ω(1)}, and impossible if nps² = O(1). Another work [GM20] considers the weaker objective of partial recovery, that is, outputting a matching that contains Θ(n) correctly matched vertex pairs with high probability. It is shown that partial recovery can be attained in polynomial time by a neighborhood matching algorithm in the sparse graph regime where nps ∈ (1, λ0] for some constant λ0 close to 1 and s ∈ (s0, 1] for some constant s0 close to 1.
More recently, partial recovery was shown to be information-theoretically impossible if nps² log(1/p) = o(1) when p = o(1) [HM20]. For ease of comparison, we summarize the thresholds under various performance metrics for the Erdős-Rényi model in Table 1.

In contrast to the aforementioned work focusing on recovering the latent matching, this work studies the hypothesis testing aspect of graph matching, which nevertheless has direct consequences for the recovery problem. As an application of the truncated second-moment calculation, in a companion paper [WXY21] we resolve the sharp threshold of recovery by characterizing the asymptotic mutual information I(A, B; π). In particular, we show that in the dense regime with p = n^{-o(1)}, the sharp threshold of recovery exactly matches the detection threshold, above which almost exact recovery is possible and below which partial recovery is impossible, thereby closing the gap in Table 1. In the sparse regime with p = n^{-Ω(1)}, we show that the information-theoretic threshold for partial recovery is at nps² ≍ 1, which coincides with the detection threshold up to a constant factor.

Performance metric     | Positive result                                  | Negative result
Exact recovery         | nps² ≥ log n + ω(1) and p = O(log^{-3} n) [CK17] | nps² ≤ log n − ω(1) [CK16]
Almost exact recovery  | nps² = ω(1) and p ≤ n^{-Ω(1)} [CKMP19]           | nps² = O(1) [CKMP19]
Partial recovery       | nps ∈ (1, λ0] and s ∈ (s0, 1] [GM20]             | nps² log(1/p) = o(1) [HM20]
Detection (this paper) | nps² log(1/p) ≥ (2 + ε) log n                    | p = n^{-o(1)} and nps² log(1/p) ≤ (2 − ε) log n, or p = n^{-Ω(1)} and s² ≤ (1 − ω(n^{-1/3}))/(np) ∧ 0.01

Table 1: Thresholds for various recovery criteria in the correlated Erdős-Rényi graph model when p = o(1).

Detection problems in networks There has been a recent flurry of work using the first- and second-moment methods to study hypothesis testing problems on networks with latent structures, such as community detection under stochastic block models [MNS15, ACV14, VAC15, BMNN16]. Notably, a conditional second-moment argument was applied by Arias-Castro and Verzelen in [ACV14] and [VAC15] to study the problem of detecting the presence of a planted community in dense and sparse Erdős-Rényi graphs, respectively. Similar to our work, for dense graphs, they condition on the edge density of induced subgraphs (see also the earlier work [BI13] for the Gaussian model); for sparse graphs, they condition on the planted community being a forest and bound the truncated second moment by enumerating subforests using Cayley's formula. However, the crucial difference is that in our setting simply enumerating the pseudoforests is inadequate for proving Theorem 2. Instead, we need to take into account the cycle structure of permutations and enumerate orbit pseudoforests, i.e., pseudoforests assembled from edge orbits (see the discussion before Theorem 4 in Section 5 for details). By separately accounting for orbits of different lengths and their graph properties, we are able to obtain a much finer control on the generating function of orbit pseudoforests, which allows the conditional second moment to be bounded after averaging over the random permutation. This proof technique is of particular interest and is likely to be useful for other detection problems involving permutations.

Finally, we mention that the recent work [RS20] studied a related correlation detection problem, where the two observed graphs are either independent or correlated randomly growing graphs (which grow together until time t∗ and grow independently afterwards according to either uniform or preferential attachment models). Sufficient conditions are obtained for both weak detection and strong detection as t∗ → ∞.
However, the problem setup, main results, and proof techniques are very different from those of the current paper.

1.4 Notation and paper organization

For any n ∈ N, let [n] = {1, 2, . . . , n} and let Sn denote the set of all permutations on [n]. For a given graph G, let V(G) denote its vertex set and E(G) its edge set. For two graphs on [n] with (weighted) adjacency matrices A and B, their intersection graph is the graph on [n] with (weighted) adjacency matrix A ∧ B, where

  (A ∧ B)ij ≜ Aij Bij;  (13)

in the unweighted case, the edge set of A ∧ B is the intersection of those of A and B. Given a permutation π ∈ Sn, let B^π = (Bπ(i)π(j)) denote the relabeled version of B according to π. For any S ⊂ [n], define e_A(S) ≜ Σ_{i<j: i,j∈S} Aij, the number of edges (or total edge weight) of A within S.

2 First Moment Method for Detection

In this section we prove the positive parts of Theorems 1 and 2 by analyzing the test statistic (9). Recall from Section 1.2 the reformulated Problem 2 with observations A and B, whose distributions are specified as follows:

• Gaussian Wigner model.

  H0 : (Aij, Bij) i.i.d. ∼ N(0, I2),  (14)

  H1 : (Aij, Bπ(i)π(j)) i.i.d. ∼ N( 0, ( 1 ρ ; ρ 1 ) ) conditional on π ∼ Uniform(Sn).  (15)

• Erdős-Rényi random graph.

  H0 : (Aij, Bij) i.i.d. ∼ Bern(ps) ⊗ Bern(ps),  (16)

  H1 : (Aij, Bπ(i)π(j)) i.i.d. ∼ pairs of correlated Bern(ps) conditional on π ∼ Uniform(Sn), where

  Aij ∼ Bern(ps) and Bπ(i)π(j) | Aij ∼ { Bern(s) if Aij = 1; Bern( ps(1 − s)/(1 − ps) ) if Aij = 0.  (17)
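The conditional law (17) can be sanity-checked symbolically: with exact rational arithmetic, the joint pmf it induces has Bern(ps) marginals (matching the null model (16)) and places probability ps² on a common edge. A small sketch, with helper names of our own choosing:

```python
from fractions import Fraction

def h1_pair_distribution(p, s):
    """Joint pmf of (A_ij, B_{pi(i)pi(j)}) under H1, following (17):
    A ~ Bern(ps); B | A=1 ~ Bern(s); B | A=0 ~ Bern(ps(1-s)/(1-ps))."""
    ps = p * s
    q1 = s                              # P(B = 1 | A = 1)
    q0 = ps * (1 - s) / (1 - ps)        # P(B = 1 | A = 0)
    return {
        (1, 1): ps * q1,
        (1, 0): ps * (1 - q1),
        (0, 1): (1 - ps) * q0,
        (0, 0): (1 - ps) * (1 - q0),
    }

p, s = Fraction(1, 10), Fraction(4, 5)
joint = h1_pair_distribution(p, s)
# Both marginals are Bern(ps), exactly as in the null model.
assert joint[(1, 1)] + joint[(1, 0)] == p * s    # marginal of A
assert joint[(1, 1)] + joint[(0, 1)] == p * s    # marginal of B
# The probability of a common edge is p*s^2, the intersection-graph density.
assert joint[(1, 1)] == p * s * s
```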

Then we get

  P(A, B | π) / Q(A, B) = Π_{1≤i<j≤n} L(Aij, Bπ(i)π(j)),  (18)

where

  L(Aij, Bπ(i)π(j)) ≜ P(Aij, Bπ(i)π(j)) / Q(Aij, Bπ(i)π(j)).  (19)

For the Gaussian Wigner model, we have

  L(a, b) = (1/√(1 − ρ²)) exp( [−ρ²(a² + b²) + 2ρab] / (2(1 − ρ²)) ).  (20)

For the Erdős-Rényi graph model, we have

  L(a, b) = { 1/p                       if a = 1, b = 1;
              (1 − s)/(1 − ps)          if a = 1, b = 0 or a = 0, b = 1;
              (1 − 2ps + ps²)/(1 − ps)² if a = 0, b = 0.  (21)

It then follows that the generalized likelihood ratio test (9) is equivalent to

  max_{π∈Sn} P(A, B | π)/Q(A, B) ⟺ max_{π∈Sn} Σ_{i<j} Aij Bπ(i)π(j).  (22)
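The statistic on the right-hand side of (22) can be evaluated by brute force for tiny n, which is useful for building intuition even though the maximization over Sn is intractable in general. An illustrative sketch of our own:

```python
import itertools

def T(A, B):
    """Right-hand side of (22): max over permutations of the number of
    common edges after relabeling B.  Exponential in n; tiny n only."""
    n = len(A)
    best = 0
    for pi in itertools.permutations(range(n)):
        t = sum(A[i][j] * B[pi[i]][pi[j]]
                for i in range(n) for j in range(i + 1, n))
        best = max(best, t)
    return best

# If B is an exact relabeling of A, the maximum recovers every edge of A.
A = [[0, 1, 1, 0], [1, 0, 0, 0], [1, 0, 0, 1], [0, 0, 1, 0]]
pi = (3, 1, 0, 2)
B = [[A[pi[i]][pi[j]] for j in range(4)] for i in range(4)]
assert T(A, B) == sum(A[i][j] for i in range(4) for j in range(i + 1, 4)) == 3
```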

2.1 Proof of Theorem 1: positive part

Throughout the proof, denote m = n(n − 1)/2 for brevity. Without loss of generality, we assume that (3) holds with equality, i.e., ρ² = 2n log n/m; otherwise, one can apply the test to (A′, B′), where A′ = cos(θ)A + sin(θ)Z and B′ = cos(θ)B + sin(θ)Z̃ with an appropriately chosen θ, and Z, Z̃ are independent Gaussian Wigner matrices with standard normal entries, independent of (A, B). Define

  τ = ρm − an,  (23)

where an is some sequence to be chosen satisfying an = ω(n) and an = O(n^{3/2}). We first analyze the error event under the alternative hypothesis (15). Let π denote the latent permutation, so that the (Aij, Bπ(i)π(j)) are i.i.d. pairs of standard normals with correlation coefficient ρ. Applying the Hanson-Wright inequality (see Lemma 10 in Appendix C) with M = Im to Tπ = Σ_{1≤i<j≤n} Aij Bπ(i)π(j), we get

  P(Tπ ≤ τ) = P(Tπ ≤ ρm − an) ≤ e^{−c an} + e^{−c an²/m}

for some universal constant c. Since by definition T ≥ Tπ, it follows that P(T ≤ τ) = o(1).

We next analyze the error event under the null hypothesis (14), in which case, for each π ∈ Sn, the (Aij, Bπ(i)π(j)) are i.i.d. pairs of independent standard normals. Note that for independent X, Y ∼ N(0, 1) and any λ ∈ (−1, 1), we have

  E[exp(λXY)] = E[exp(λ²Y²/2)] = 1/√(1 − λ²).  (24)

Then by the Chernoff bound, for any λ ∈ (0, 1),

  Q(Tπ ≥ τ) = Q(exp(λTπ) ≥ exp(λτ)) ≤ exp( −(m/2) log(1 − λ²) − λτ ).

Choosing λ = τ/m, which satisfies 0 < λ = o(1) in view of (3) and (23), we have Q(Tπ ≥ τ) ≤ e^{−τ²/(2m) + O(τ⁴/m³)}. Finally, by the union bound and the Stirling approximation n! ≤ e n^{n+1/2} e^{−n},

  Q(T ≥ τ) ≤ n! e^{−τ²/(2m) + O(τ⁴/m³)} = o(1),

provided that ρ²m/2 − ρan − O(ρ⁴m) − n log(n/e) − (1/2) log n → +∞. This is ensured by the assumption ρ² = 2n log n/m and the choice an = n^{1.1}.

2.2 Proof of Theorem 2: positive part

Throughout the proof, denote m = n(n − 1)/2 for brevity. Without loss of generality, we assume that (5) holds with equality, i.e.,

  mps² (log(1/p) − 1 + p) = n log n;  (25)

otherwise, one can apply the test to (A′, B′), where A′ (resp. B′) is edge-subsampled from A (resp. B) with an appropriately chosen subsampling probability s′. It follows from (25) that p ≥ 1/n and mps² = Ω(n). Define

  τ = mps²(1 − δn),  (26)

where 0 < δn < 1 is some sequence to be chosen satisfying δn = ω(1/√(mps²)). We first analyze the error event under the alternative hypothesis (17). Let π denote the latent permutation, so that the Aij Bπ(i)π(j) are i.i.d. ∼ Bern(ps²). Then Tπ ∼ Binom(m, ps²). Applying the Chernoff bound (118) yields P(Tπ ≤ τ) ≤ exp(−δn² mps²/2) = o(1). Since by definition T ≥ Tπ, it follows that P(T ≤ τ) = o(1).

Next, we analyze the error event under the null hypothesis (16), in which case, for each π ∈ Sn, the Aij Bπ(i)π(j) are i.i.d. ∼ Bern(p²s²) and thus Tπ ∼ Binom(m, p²s²). Using the multiplicative Chernoff bound for binomial distributions (117), we obtain that

  Q(Tπ ≥ τ) ≤ exp( −τ log(τ/(eµ)) − µ )
            = exp( −mps²(1 − δn) log( (1 − δn)/(ep) ) − mp²s² )
            ≤ exp( −mps² (log(1/p) − 1 + p) + mps² δn log(1/p) ),

where µ = mp²s², and the last inequality holds because (1 − δn) log((1 − δn)/e) ≥ −1. Then, applying the union bound and the Stirling approximation n! ≤ e n^{n+1/2} e^{−n}, we have

Q(T ≥ τ) ≤ e · exp( −mps²( log(1/p) − 1 + p ) + mps² δ_n log(1/p) + n log(n/e) + (1/2) log n )
 = e · exp( (mps²)^{0.6} log(1/p) − n + (1/2) log n ) = o(1),

where the first equality holds by the assumption mps²( log(1/p) − 1 + p ) = n log n and the choice δ_n = 1/(mps²)^{0.4}; the last equality holds by the claim that (mps²)^{0.6} log(1/p) = o(n). To finish the proof, it suffices to verify the claim, which we do separately in the following two cases.
Suppose 1 − p = Ω(1). Then log(1/p) − 1 + p = Ω(log(1/p)). Thus, in view of assumption (25) and p ≥ 1/n, we get that (mps²)^{0.6} log(1/p) ≤ O( (n log n)^{0.6} log^{0.4}(1/p) ) = o(n).
Suppose 1 − p = o(1). As log(1/p) − 1 + p ≥ (1 − p)²/2, it follows from assumption (25) that (1 − p)² = Ω(log n / n). Furthermore, log(1/p) ≤ (1 − p)/p. Thus, by assumption (25), (mps²)^{0.6} log(1/p) ≤ O( (n log n)^{0.6} (1 − p)^{−0.2} ) = O( n^{0.7} log^{0.5} n ) = o(n).

3 Unconditional Second Moment Method and Obstructions

In this section, we apply the unconditional second moment method to derive impossibility conditions for detection. As mentioned in Section 1.2 (and described in detail in Section 3.3), these conditions do not match the positive results in Section 2, due to the obstruction presented by short edge orbits. To overcome this difficulty, in Sections 4 and 5 we apply the conditional second moment method, building upon the second moment computation in this section. We start by introducing some preliminary definitions associated with permutations.

3.1 Node permutation, edge permutation, and cycle decomposition

Let σ ∈ Sn be a permutation on [n]. For each element a ∈ [n], its orbit is a cycle (a_0, . . . , a_{k−1}) for some k ≤ n, where a_i = σ^i(a) for i = 0, . . . , k − 1 and σ(a_{k−1}) = a_0 = a. Every permutation decomposes into disjoint orbits. For example, consider the permutation σ ∈ S_8 that swaps 1 with 2, swaps 3 with 4, and cyclically shifts 5, 6, 7, 8. Then σ consists of three orbits, written in cycle notation as σ = (12)(34)(5678).
Consider the complete graph K_n with vertex set [n]. Each permutation σ ∈ Sn naturally induces a permutation σ^E on the edge set of K_n, the set \binom{[n]}{2} of all unordered pairs, according to

σ^E((i, j)) ≜ (σ(i), σ(j)).   (27)

We refer to σ and σ^E as the node permutation and the edge permutation, whose orbits are referred to as node orbits and edge orbits, respectively. For each edge (i, j), let O_ij denote its orbit under σ^E. As a concrete example, consider again the permutation σ = (12)(34)(5678). Then O_12 = {(1, 2)} and O_34 = {(3, 4)} are 1-edge orbits (fixed points of σ^E), and O_56 = {(5, 6), (6, 7), (7, 8), (8, 5)} is a 4-edge orbit. (See Table 2 in Section 5.1 for more examples.)
The cycle structure of the edge permutation is determined by that of the node permutation. Let n_k (resp. N_k) denote the number of k-node (resp. k-edge) orbits in σ (resp. σ^E). For example,

N_1 = \binom{n_1}{2} + n_2,   N_2 = 2\binom{n_2}{2} + n_1 n_2 + n_4.   (28)

This is due to the following reasoning.

• Consider a 1-edge orbit {(i, j)}. Since (i, j) is unordered, either both i, j are fixed points of σ, or i, j form a 2-node orbit of σ. Thus, N_1 = \binom{n_1}{2} + n_2.
• Consider a 2-edge orbit {(i, j), (σ(i), σ(j))}. There are three cases: (a) i, j belong to two different 2-node orbits; (b) i is a fixed point and j lies in a 2-node orbit; (c) i, j belong to a common 4-node orbit of the form (i ∗ j ∗). Thus, N_2 = 2\binom{n_2}{2} + n_1 n_2 + n_4.
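These counting identities are easy to verify programmatically. The sketch below (an illustrative aid, not part of the paper) computes the node and edge orbits of the running example σ = (12)(34)(5678) and checks (28):

```python
from itertools import combinations
from math import comb

def orbits(perm, elements, act):
    """Decompose the action of a permutation on `elements` into orbits."""
    seen, out = set(), []
    for a in elements:
        if a in seen:
            continue
        orb, x = [], a
        while x not in seen:
            seen.add(x)
            orb.append(x)
            x = act(perm, x)
        out.append(orb)
    return out

# sigma = (12)(34)(5678) on nodes 1..8, stored as a dict i -> sigma(i)
sigma = {1: 2, 2: 1, 3: 4, 4: 3, 5: 6, 6: 7, 7: 8, 8: 5}
node_act = lambda s, i: s[i]
edge_act = lambda s, e: tuple(sorted((s[e[0]], s[e[1]])))   # induced map on pairs

node_orbs = orbits(sigma, sorted(sigma), node_act)
edge_orbs = orbits(sigma, list(combinations(sorted(sigma), 2)), edge_act)

nk = lambda k: sum(len(o) == k for o in node_orbs)
Nk = lambda k: sum(len(o) == k for o in edge_orbs)

# check (28): N_1 = C(n_1, 2) + n_2 and N_2 = 2 C(n_2, 2) + n_1 n_2 + n_4
assert Nk(1) == comb(nk(1), 2) + nk(2) == 2               # orbits {12}, {34}
assert Nk(2) == 2 * comb(nk(2), 2) + nk(1) * nk(2) + nk(4) == 3
```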

3.2 Second moment calculation

Recall the second moment method described in Section 1.2. When P is a mixture distribution, the calculation of the second moment proceeds as follows. Note that the likelihood ratio is P(A, B)/Q(A, B) = E_π[ P(A, B | π)/Q(A, B) ], where π is a random permutation uniformly distributed over Sn. Introducing an independent copy π̃ of π, and noting that B has the same marginal distribution under both P and Q, the squared likelihood ratio can be expressed as

( P(A, B)/Q(A, B) )² = E_{π⊥⊥π̃}[ (P(A, B | π)/Q(A, B)) · (P(A, B | π̃)/Q(A, B)) ] = E_{π⊥⊥π̃}[ ∏_{i<j} X_ij ],   (29)

where the second equality uses (19) and X_ij ≜ L(A_ij, B_{π(i)π(j)}) L(A_ij, B_{π̃(i)π̃(j)}).

Fixing π and π̃, we first compute the inner expectation in (31). Observe that the X_ij may not be independent across different pairs (i, j). For example, if (i_1, j_1) ≠ (i_2, j_2) and (π̃(i_1), π̃(j_1)) = (π̃(i_2), π̃(j_2)), then B_{π̃(i_1)π̃(j_1)} = B_{π̃(i_2)π̃(j_2)}, and X_{i_1 j_1} is not independent of X_{i_2 j_2}. In order to decompose ∏_{i<j} X_ij into independent factors, define

σ ≜ π^{−1} ∘ π̃,   (32)

which is also uniformly distributed on Sn. Let σ^E denote the edge permutation induced by σ as in (27), i.e., σ^E(i, j) = (σ(i), σ(j)). For each edge orbit O of σ^E, define

X_O ≜ ∏_{(i,j)∈O} L(A_ij, B_{π(i)π(j)}) L(A_ij, B_{π̃(i)π̃(j)}).   (33)

Importantly, observe that X_O is a function of (A_ij, B_{π(i)π(j)})_{(i,j)∈O}. Indeed, since (π̃(i), π̃(j)) = π(σ(i), σ(j)), or equivalently, in terms of edge permutations, π̃^E = π^E ∘ σ^E, and O is an orbit of σ^E, we have {B_{π(i)π(j)}}_{(i,j)∈O} = {B_{π̃(i)π̃(j)}}_{(i,j)∈O}.
Let O denote the collection of all edge orbits of σ^E. Since edge orbits are disjoint, we have

∏_{i<j} X_ij = ∏_{O∈O} X_O.   (34)

Since {A_ij}_{i<j} and {B_ij}_{i<j} are i.i.d. under Q and the edge orbits are disjoint, {X_O}_{O∈O} are mutually independent under Q. Therefore,

E_Q[ ( P(A, B)/Q(A, B) )² ] = E_{π⊥⊥π̃}[ ∏_{O∈O} E_{(A,B)∼Q}[X_O] ].   (35)

Recall that, for the Gaussian Wigner model, ρ denotes the correlation coefficient of the edge weights in the planted model P. For the Erdős–Rényi graph model, the correlation parameter in the planted model P is defined as

ρ ≜ Cov(A_ij, B_{π(i)π(j)}) / \sqrt{ Var(A_ij) Var(B_{π(i)π(j)}) } = s(1 − p)/(1 − ps).   (36)

Proposition 1. Fix π and π̃. For any edge orbit O of σ = π^{−1} ∘ π̃, we have:
• for the Gaussian Wigner model,

E_{(A,B)∼Q}[X_O] = 1/(1 − ρ^{2|O|});   (37)

• for Erdős–Rényi random graphs,

E_{(A,B)∼Q}[X_O] = 1 + ρ^{2|O|}.   (38)

Proof. Recall from (19) that L(x, y) = P(x, y)/(Q(x)Q(y)). This kernel defines an operator as follows: for any function f that is square-integrable under Q,

(Lf)(x) ≜ E_{Y∼Q}[ L(x, Y) f(Y) ] = E_{(X,Y)∼P}[ f(Y) | X = x ].   (39)

In addition, L² = L ∘ L is given by L²(x, y) = E_{Z∼Q}[ L(x, Z) L(Z, y) ], and L^k is defined similarly. For both the Gaussian and Bernoulli models we have L(x, y) = L(y, x), hence L is self-adjoint. Furthermore, since ∫∫ L(x, y)² Q(dx) Q(dy) < ∞, L is Hilbert–Schmidt. Thus L is diagonalizable with eigenvalues λ_i, and the trace of L is given by tr(L) = E_{Y∼Q}[ L(Y, Y) ] = Σ_i λ_i.
Let k = |O|. To simplify notation, let (a_ℓ) and (b_ℓ) be independent sequences of i.i.d. random variables drawn from Q. Since O is an edge orbit of σ^E, we have {B_{π(i)π(j)}}_{(i,j)∈O} = {B_{π̃(i)π̃(j)}}_{(i,j)∈O} and (π̃(i), π̃(j)) = π(σ(i), σ(j)). By (33),

E_{(A,B)∼Q}[X_O] = E_{(A,B)∼Q}[ ∏_{(i,j)∈O} L(A_ij, B_{π(i)π(j)}) L(A_ij, B_{π̃(i)π̃(j)}) ]
 = E[ ∏_{ℓ=1}^{k} L(a_ℓ, b_ℓ) L(a_ℓ, b_{(ℓ+1) mod k}) ]
 = E[ ∏_{ℓ=1}^{k} L²(b_ℓ, b_{(ℓ+1) mod k}) ]
 = tr( L^{2k} ) = Σ_i λ_i^{2k}.

For the Gaussian Wigner model, L(x, y) given in (20) is known as Mehler's kernel and can be diagonalized by Hermite polynomials:

L(x, y) = Σ_{i=0}^{∞} (ρ^i/i!) H_i(x) H_i(y),

where E_{Y∼N(0,1)}[ H_i(Y) H_j(Y) ] = i! 1_{i=j} [Kib45]. It follows that the eigenvalues of the operator L are λ_i = ρ^i for i ≥ 0, and thus tr( L^{2k} ) = Σ_{i=0}^{∞} ρ^{2ki} = 1/(1 − ρ^{2k}).
For Erdős–Rényi graphs,

(Lf)(x) = Σ_{y∈{0,1}} ( P(x, y)/(Q(x)Q(y)) ) f(y) Q(y) = (1/Q(x)) Σ_{y∈{0,1}} P(x, y) f(y).

Thus the eigenvalues of L are the eigenvalues of the 2 × 2 row-stochastic matrix M with rows and columns indexed by {0, 1} and M(x, y) = P(x, y)/Q(x). Explicitly, by (21) we have

M = [ (1 − ps(2 − s))/(1 − ps)   ps(1 − s)/(1 − ps)
      1 − s                       s                 ].

The eigenvalues of M are 1 and ρ = s(1 − p)/(1 − ps), so tr( L^{2k} ) = 1 + ρ^{2k}.
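This eigenvalue computation can be double-checked numerically. Below is a small sketch (illustrative only; the parameter values are arbitrary) that builds M from (21), verifies its rows sum to one, and confirms that its eigenvalues are 1 and ρ = s(1 − p)/(1 − ps), so that tr(L^{2k}) = 1 + ρ^{2k}:

```python
from math import sqrt

def M_matrix(p, s):
    # Row-stochastic matrix M(x, y) = P(x, y)/Q(x) from (21), rows/cols indexed by {0, 1}.
    return [[(1 - p * s * (2 - s)) / (1 - p * s), p * s * (1 - s) / (1 - p * s)],
            [1 - s, s]]

def eig2(M):
    # Eigenvalues of a 2x2 matrix via trace and determinant.
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    d = sqrt(tr * tr - 4 * det)
    return (tr + d) / 2, (tr - d) / 2

p, s = 0.3, 0.4          # arbitrary illustrative values
M = M_matrix(p, s)
assert all(abs(sum(row) - 1) < 1e-12 for row in M)   # row-stochastic

lam1, lam2 = eig2(M)
rho = s * (1 - p) / (1 - p * s)
assert abs(lam1 - 1) < 1e-9 and abs(lam2 - rho) < 1e-9
# hence tr(L^{2k}) = 1^{2k} + rho^{2k} = 1 + rho^{2k}
k = 3
assert abs(lam1 ** (2 * k) + lam2 ** (2 * k) - (1 + rho ** (2 * k))) < 1e-9
```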

In view of Proposition 1, E_{(A,B)∼Q}[X_O] decreases as the orbit length |O| increases. Let n_k denote the total number of k-node orbits in the cycle decomposition of the node permutation σ, and let N_k denote the total number of k-edge orbits in the cycle decomposition of the edge permutation σ^E, for k ∈ ℕ. For the Gaussian Wigner model, by (35) and (37), we get

E_Q[ ( P(A, B)/Q(A, B) )² ] = E_{π⊥⊥π̃}[ ∏_{O∈O} 1/(1 − ρ^{2|O|}) ] = E_{π⊥⊥π̃}[ ∏_{k=1}^{\binom{n}{2}} ( 1/(1 − ρ^{2k}) )^{N_k} ].   (40)

For Erdős–Rényi graphs, by (35) and (38), we get

E_Q[ ( P(A, B)/Q(A, B) )² ] = E_{π⊥⊥π̃}[ ∏_{O∈O} ( 1 + ρ^{2|O|} ) ] = E_{π⊥⊥π̃}[ ∏_{k=1}^{\binom{n}{2}} ( 1 + ρ^{2k} )^{N_k} ].   (41)

Let us assume that

n²ρ⁶ = o(1),   (42)

which is ensured by (4) for the Gaussian model in Theorem 1, and by (6) for the dense Erdős–Rényi model with p = n^{−o(1)} in Theorem 2.
For the Gaussian model, consider first the orbits of length k ≥ 3. Since Σ_{k=3}^{\binom{n}{2}} N_k ≤ \binom{n}{2}, we have

∏_{k=3}^{\binom{n}{2}} ( 1/(1 − ρ^{2k}) )^{N_k} ≤ ( 1/(1 − ρ⁶) )^{\binom{n}{2}} = ( 1 + ρ⁶/(1 − ρ⁶) )^{\binom{n}{2}} ≤ exp( n²ρ⁶/(2(1 − ρ⁶)) ) = 1 + o(1),   (43)

where the last equality holds due to (42). Moreover, for the 1-orbits and 2-orbits,

( 1/(1 − ρ²) )^{N_1} ( 1/(1 − ρ⁴) )^{N_2} ≤ ( 1 + ρ²/(1 − ρ²) )^{N_1} ( 1 + (ρ²/(1 − ρ²))² )^{N_2}
 ≤ exp( (ρ²/(1 − ρ²)) N_1 + (ρ²/(1 − ρ²))² N_2 ).

Therefore,

E_Q[ ( P(A, B)/Q(A, B) )² ] ≤ (1 + o(1)) E_{π⊥⊥π̃}[ exp( (ρ²/(1 − ρ²)) N_1 + (ρ²/(1 − ρ²))² N_2 ) ].   (44)

For Erdős–Rényi graphs, analogously, for the orbits of length k ≥ 3,

∏_{k=3}^{\binom{n}{2}} ( 1 + ρ^{2k} )^{N_k} ≤ ( 1 + ρ⁶ )^{\binom{n}{2}} ≤ exp( n²ρ⁶/2 ) = 1 + o(1),   (45)

where the last equality holds due to (42). For the 1-orbits and 2-orbits, ( 1 + ρ² )^{N_1} ( 1 + ρ⁴ )^{N_2} ≤ exp( ρ²N_1 + ρ⁴N_2 ). Therefore,

E_Q[ ( P(A, B)/Q(A, B) )² ] ≤ (1 + o(1)) E_{π⊥⊥π̃}[ exp( ρ²N_1 + ρ⁴N_2 ) ].   (46)

Next, we bound the contributions of N_1 and N_2 to the second moment for both models using the following proposition. The proof, based on Poisson approximation, is deferred to Section A.2.

Proposition 2. Assume µ, ν, τ ≥ 0 are such that τ² = o(1/n) and µb + ν + 2 − log b ≤ 0 for some 1 ≤ b ≤ n with b = ω(1).

• If a = ω(1) and ν ≤ log(a) − 3, then

E_{π⊥⊥π̃}[ exp( µn_1² + νn_1 + τn_2 + τ²N_2 ) 1_{a≤n_1≤b} ] = o(1).   (47)

• If a = 0 and ν = o(1), then

E_{π⊥⊥π̃}[ exp( µn_1² + νn_1 + τn_2 + τ²N_2 ) 1_{a≤n_1≤b} ] ≤ 1 + o(1).   (48)

In particular, if 0 ≤ τ ≤ 2(log n − 2)/n, then

E_{π⊥⊥π̃}[ exp( τN_1 + τ²N_2 ) ] = 1 + o(1).   (49)

Finally, we arrive at a condition for the second moment to stay bounded; this condition turns out to be tight for boundedness of the second moment itself (see Section 3.3).

Theorem 3 (Impossibility condition by the unconditional second moment method). Fix any constant ε > 0. If

ρ² ≤ (2 − ε) log n / n,   (50)

then for both the Gaussian Wigner model and Erdős–Rényi graphs, E_Q[ ( P(A, B)/Q(A, B) )² ] = 1 + o(1), which further implies that TV(P, Q) = o(1), i.e., even weak detection is impossible.

Proof. Note that (50) implies (42). Thus, combining (44) or (46) with (49) in Proposition 2, we get E_Q[ ( P(A, B)/Q(A, B) )² ] = 1 + o(1), which yields TV(P, Q) = o(1) in view of (12).

3.3 Obstruction from short orbits

The impossibility condition in Theorem 3 is not optimal. In the Gaussian case, (50) differs by a factor of 2 from the positive result ρ² ≥ (4 + ε) log n / n in Theorem 1. For Erdős–Rényi graphs the suboptimality is more severe: Theorem 2 shows that if nps²( log(1/p) − 1 + p ) ≥ (2 + ε) log n, then strong detection is possible. In the regime p = o(1), since ρ = (1 + o(1))s, this translates to the condition ρ² ≥ (2 + ε) log n / (np log(1/p)), which differs from (50) by an unbounded factor. This is a limitation of the second moment method itself, as the condition ρ² ≤ (2 − ε) log n / n is actually tight for the second moment to be bounded. When ρ² ≥ (2 + ε) log n / n, the second moment diverges because of certain rare events associated with short orbits of σ = π^{−1} ∘ π̃. Below we describe the lower bound on the second moment due to short orbits, which motivates the conditional second moment arguments in Sections 4 and 5 that eventually overcome these obstructions.
Specifically, in view of (29) and (34), for both the Gaussian and Erdős–Rényi models, the squared likelihood ratio factorizes into products over the edge orbits of σ^E:

2 " # P(A, B) Y = X , Q(A, B) Eπ⊥⊥πe O O∈O

−1 where XO is defined in (33). Since both π and πe are uniform random permutations, so is σ = π ◦πe. For each divisor k of n, consider the rare event that σ decomposes into (n/k) disjoint k-node orbits (i.e. n = n/k and all the other n ’s are zero), which occurs with probability 1 ≥ n−n/k. k j (n/k)!kn/k These short node orbits create an abundance of short edge orbits, as each pair of distinct k-node orbits can form k different k-edge orbits.2 Thus, the following lower bound on the second moment ensues

" 2# " # P(A, B) Y = [X ] E(A,B)∼Q Q(A, B) Eπ⊥⊥πe EQ O O∈O (a)  nk k n/k k  2k( 2 ) −n/k  2k( 2 ) ≥ E 1 + ρ ≥ n 1 + ρ  n n/k   = exp − log n + k log 1 + ρ2k , (51) k 2

2k where (a) holds because EQ [XO] ≥ 1 + ρ for each k-edge orbit O in both Gaussian ((37)) and Erd˝os-R´enyi models ((38)). Consequently, for any k = o(n), " # (2 + ) log n P(A, B)2 ρ2k ≥ =⇒ → ∞. n EQ Q(A, B)

In particular, the strongest obstruction comes from k = 1 (fixed points): " # (2 + ) log n P(A, B)2 ρ2 ≥ =⇒ → ∞. n EQ Q(A, B)
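A quick numerical check of the exponent in (51) illustrates the fixed-point obstruction. The sketch below (illustrative only, with an arbitrary choice of n) shows the exponent changing sign around ρ² ≈ 2 log n / n for k = 1:

```python
import math

def exponent_51(n, k, rho):
    """Exponent in the lower bound (51): -(n/k) log n + k * C(n/k, 2) * log(1 + rho^(2k))."""
    nk = n // k
    return -nk * math.log(n) + k * (nk * (nk - 1) / 2) * math.log(1 + rho ** (2 * k))

n = 10 ** 5
# rho^2 = (2 + eps) log n / n with eps = 0.5: exponent positive, second moment diverges
rho_big = math.sqrt(2.5 * math.log(n) / n)
assert exponent_51(n, 1, rho_big) > 0
# rho^2 = (2 - eps) log n / n: exponent negative, consistent with Theorem 3
rho_small = math.sqrt(1.5 * math.log(n) / n)
assert exponent_51(n, 1, rho_small) < 0
```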

In this case, the culprit is the rare event that π "collides" with π̃ (i.e., σ = id), which occurs with probability 1/n! but contributes an excessive factor of ( 1 + ρ² )^{\binom{n}{2}} to the second moment.
In conclusion, the second moment is susceptible to the influence of short edge orbits, for which ∏_{|O|=k} X_O with small k has a large expectation. Fortunately, it turns out that the atypically large magnitude of ∏_{|O|=k} X_O can be attributed to certain rare events associated with the intersection graph A ∧ Bπ under the planted model P. This motivates us to condition on an appropriate high-probability event under P, so that the excessively large magnitude of ∏_{|O|=k} X_O is truncated. As we will see in Section 4, in the dense regime (including the Gaussian Wigner model and dense Erdős–Rényi graphs), it suffices to consider k = 1 and regulate ∏_{|O|=1} X_O by conditioning on the edge density of all sufficiently large induced subgraphs of A ∧ Bπ under P. In contrast, in the sparse regime, we need to control all edge orbits up to length k = Θ(log n), for which more sophisticated techniques are called for, as we will see in Section 5.

² For example, the node orbits (12) and (34) can form two edge orbits O_13 and O_14 of length 2; these edge orbits will be referred to as Type M; see Section 5.1 for a full classification of edge orbits.

4 Conditional Second Moment Method: Dense regime

In this section, we improve Theorem 3 by applying the conditional second moment method. The proof of the sharp threshold for the Gaussian model is given in full detail in Section 4.1. The proof for dense Erdős–Rényi graphs uses similar ideas but is technically more involved, and hence is deferred to Section A.3.
We start by describing the general program of the conditional second moment method. Certain rare events under P can cause the second moment to explode even though TV(P, Q) remains bounded away from one. To circumvent such catastrophic events, we compute the second moment conditioned on events that are typical under P. More precisely, given an event E such that P(E) = 1 − o(1), define the planted model conditioned on E:

P′(A, B, π) ≜ P(A, B, π) 1_{(A,B,π)∈E} / P(E) = (1 + o(1)) P(A, B, π) 1_{(A,B,π)∈E},

where the last equality holds because P(E) = 1 − o(1). Then the likelihood ratio between the conditioned planted model P′ and the null model Q is given by

P′(A, B)/Q(A, B) = ( ∫ P′(A, B, π) dπ )/Q(A, B) = (1 + o(1)) ∫ P(π) ( P(A, B | π)/Q(A, B) ) 1_{(A,B,π)∈E} dπ
 = (1 + o(1)) E_π[ ( P(A, B | π)/Q(A, B) ) 1_{(A,B,π)∈E} ].

By the same reasoning that led to (35), the conditional second moment is given by

E_Q[ ( P′(A, B)/Q(A, B) )² ] = (1 + o(1)) E_{π⊥⊥π̃}[ E_Q[ ( P(A, B | π)/Q(A, B) ) ( P(A, B | π̃)/Q(A, B) ) 1_{(A,B,π)∈E} 1_{(A,B,π̃)∈E} ] ]
 = (1 + o(1)) E_{π⊥⊥π̃}[ E_Q[ ∏_{O∈O} X_O 1_{(A,B,π)∈E} 1_{(A,B,π̃)∈E} ] ],   (52)

where the last equality follows from the decomposition (34) over the edge orbits O ∈ O of σ = π^{−1} ∘ π̃. Compared with the unconditional second moment in (29), the extra indicators in (52) will be useful for ruling out the rare events that cause the second moment to blow up.
We caution the reader that, crucially, the conditioning event E must be measurable with respect to the observed and latent variables (A, B, π). Thus we cannot directly rule out the rare event that π is close to its independent copy π̃, which would make σ = π^{−1} ∘ π̃ induce a proliferation of short edge orbits. Instead, as we will see, by truncating certain rare events associated with the intersection graph A ∧ Bπ, the excessively large magnitude of ∏_{|O|=k} X_O can be regulated for small k.
By the data processing inequality for total variation, we have

TV( P(A, B), P′(A, B) ) ≤ TV( P(A, B, π), P′(A, B, π) ) = P( (A, B, π) ∉ E ) = o(1).

Combining this with the second moment bounds (11)–(12) and applying the triangle inequality, we arrive at the following conditions for non-detection:

E_Q[ ( P′(A, B)/Q(A, B) )² ] = O(1)  ⟹  TV( P(A, B), Q(A, B) ) ≤ 1 − Ω(1),   (53)
E_Q[ ( P′(A, B)/Q(A, B) )² ] = 1 + o(1)  ⟹  TV( P(A, B), Q(A, B) ) = o(1).   (54)

4.1 Sharp threshold for the Gaussian model

In this section, we improve over the impossibility condition ρ² ≤ (2 − ε) log n / n established in Theorem 3, showing that if ρ² ≤ (4 − ε) log n / n, then weak detection is impossible. This completes the impossibility proof of Theorem 1 for the Gaussian model.
Before the rigorous analysis, we first explain the main intuition. Let F denote the set of fixed points of σ = π^{−1} ∘ π̃, so that |F| = n_1. Let

O_1 = \binom{F}{2},

which is a subset of the fixed points of the edge permutation (cf. (28)). As argued in Section 3.3, the unconditional second moment blows up when ρ² ≥ (2 + ε) log n / n due to the obstruction from the fixed points of σ, or, more precisely, an atypically large magnitude of ∏_{O∈O_1} X_O. By (20) and (30),

∏_{O∈O_1} X_O = ∏_{i<j: i,j∈F} X_ij.

When n_1 ≥ n/2, we have ζ = (1 + o(1)) ρ\binom{n_1}{2}. Furthermore, on the event E, it follows from (55) and ρ = o(1) that

E_Q[ ∏_{i<j: i,j∈F} X_ij 1_E ] ≤ exp( −(1 + o(1)) ρ²\binom{n_1}{2} ) E_Q[ exp( (2ρ/(1 − ρ²)) e_{A∧Bπ}(F) ) 1_{e_{A∧Bπ}(F) ≤ ζ} ],

which corresponds precisely to the desired condition ρ² ≤ (4 − ε) log n / n.
Next, we proceed to the rigorous proof. As the impossibility of weak detection when ρ² ≤ log n / n has already been shown in Theorem 3, henceforth we only need to focus on

log n / n ≤ ρ² ≤ (4 − ε) log n / n.

The following lemma shows that E holds with high probability under the planted model P.

Lemma 1. It holds that P( (A, B, π) ∈ E ) = 1 − e^{−Ω(n)}.

Proof. Fix an integer n/2 ≤ k ≤ n and let m = \binom{k}{2}. Let t = c( \sqrt{m log(1/δ)} + log(1/δ) ) for a universal constant c and a parameter δ to be specified later. Fix a subset S ⊂ [n] with |S| = k. By the Hanson–Wright inequality given in Lemma 10, with probability at least 1 − 3δ,

Σ_{i<j: i,j∈S} A_ij² ≥ m − t,   Σ_{i<j: i,j∈S} B_{π(i)π(j)}² ≥ m − t,   e_{A∧Bπ}(S) = Σ_{i<j: i,j∈S} A_ij B_{π(i)π(j)} ≤ ρm + t.   (58)

Now, there are \binom{n}{k} choices of S ⊂ [n] with |S| = k. Thus, choosing 1/δ = 2^k \binom{n}{k} and applying the union bound, we get that with probability at least 1 − 3 Σ_{k=n/2}^{n} 2^{−k} = 1 − e^{−Ω(n)}, (58) holds uniformly over all S ⊂ [n] with |S| = k and all n/2 ≤ k ≤ n. By definition and the fact that k ≥ n/2, we have 1/δ ≤ 2^k (en/k)^k ≤ (4e)^k, and thus t ≤ c( \sqrt{mk log(4e)} + k log(4e) ) = O(n^{3/2}).

Now let us compute the conditional second moment. By Lemma 1, it follows from (52) that

E_Q[ ( P′(A, B)/Q(A, B) )² ] = (1 + o(1)) E_{π⊥⊥π̃}[ E_Q[ ∏_{O∈O} X_O 1_{(A,B,π)∈E} 1_{(A,B,π̃)∈E} ] ].

To proceed further, we fix π, π̃ and separately consider the following two cases.

Case 1: n_1 ≤ n/2. In this case, we simply drop the indicators and use the unconditional second moment:

E_Q[ ∏_{O∈O} X_O 1_{(A,B,π)∈E} 1_{(A,B,π̃)∈E} ] ≤ E_Q[ ∏_{O∈O} X_O ] = ∏_{O∈O} 1/(1 − ρ^{2|O|}),

where the last equality follows from (37).

Case 2: n_1 > n/2. In this case,

E_Q[ ∏_{O∈O} X_O 1_{(A,B,π)∈E} 1_{(A,B,π̃)∈E} ] ≤(a) E_Q[ ∏_{O∈O} X_O 1_{(A,B,π)∈E_F} ]
 =(b) E_Q[ ∏_{O∈O_1} X_O 1_{(A,B,π)∈E_F} ] ∏_{O∉O_1} E_Q[X_O]
 =(c) E_Q[ ∏_{i<j: i,j∈F} X_ij 1_{(A,B,π)∈E_F} ] ∏_{O∉O_1} 1/(1 − ρ^{2|O|}),

where we used the fact that n^{3/2} = o(ρn_1²), in view of the assumption ρ² ≥ log n / n and n_1 > n/2. It follows from (55) that

E_Q[ ∏_{i<j: i,j∈F} X_ij 1_{(A,B,π)∈E_F} ]
 ≤ (1 − ρ²)^{−\binom{n_1}{2}} exp( −((2 + o(1))ρ²/(1 − ρ²)) \binom{n_1}{2} ) E_Q[ exp( (2ρ/(1 − ρ²)) e_{A∧Bπ}(F) ) 1_{e_{A∧Bπ}(F) ≤ ζ} ],

where e_{A∧Bπ}(F) = Σ_{i<j: i,j∈F} A_ij B_{π(i)π(j)}.

where the equality uses the MGF expression in (24). Choosing³ λ = (1 − ρ²)/2 in (59), we obtain

exp( β(1 − λ)ζ − (1/2)\binom{n_1}{2} log(1 − β²λ²) ) = exp( (β − ρ)ζ − (1/2)\binom{n_1}{2} log(1 − ρ²) ).

Combining the last three displayed equations yields

E_Q[ ∏_{i<j: i,j∈F} X_ij 1_{(A,B,π)∈E_F} ] ≤ exp( −((2 + o(1))ρ²/(1 − ρ²)) \binom{n_1}{2} + (β − ρ)ζ − (3/2)\binom{n_1}{2} log(1 − ρ²) ).

" 2# " # P0(A, B) Y 1 EQ ≤ (1 + o(1))E 1{n1≤n/2} Q(A, B) 1 − ρ2|O| O∈O    2 2  Y 1 (1 + o(1))ρ n1 + (1 + o(1))E  exp 1{n1>n/2} . 1 − ρ2|O| 4 O/∈O1

ρ2 Let τ = 1−ρ2 . Note that

n N n N Y 1  1  2 Y  1  k  1  2  1  2 = = (1 + o(1)) , 1 − ρ2|O| 1 − ρ2 1 − ρ2k 1 − ρ2 1 − ρ4 O/∈O1 k≥2 2  ≤ (1 + o(1)) exp τn2 + τ N2 , where the first equality follows from (28), the second equality holds by (43) under the assumption 2 1 1 ρ ≤ (4 − ) log n/n, and the last inequality holds because 1−ρ2 = 1 + τ ≤ exp (τ) and 1−ρ4 ≤ 1 + τ 2 ≤ exp τ 2. Similarly,

(n1) Y 1  1  2 = ≤ exp τn2/2 . 1 − ρ2|O| 1 − ρ2 1 O∈O1 Hence, " # P0(A, B)2 ≤ (1 + o(1)) exp τ n2/2 + n  + τ 2N  1  EQ Q(A, B) E 1 2 2 {n1≤n/2}  (1 + o(1))ρ2n2   + (1 + o(1)) exp τn + τ 2N  exp 1 1 . E 2 2 4 {n1>n/2}

We upper bound the two terms separately. To bound the first term, we apply (48) in Proposition 2 with µ = τ/2, ν = 0, a = 0, and b = n/2. Recall that τ = ρ²/(1 − ρ²). By the assumption

³ This choice is motivated by choosing λ to minimize −βλζ + (1/2)\binom{n_1}{2}β²λ², the first-order approximation of the exponent in (59), leading to λ* = ζ/( \binom{n_1}{2}β ) = (1 + o(1))(1 − ρ²)/2.

ρ² ≤ (4 − ε) log n / n, we have τ² = o(1/n) and µb + ν + 2 − log b = ρ²n/(4(1 − ρ²)) + 2 − log(n/2) ≤ 0 for all sufficiently large n. Thus, it follows from (48) in Proposition 2 that

E[ exp( τ(n_1²/2 + n_2) + τ²N_2 ) 1_{n_1≤n/2} ] ≤ 1 + o(1).

To bound the second term, we apply (47) in Proposition 2 with µ = (1 + o(1))ρ²/4, ν = 0, a = n/2, and b = n. Recall that τ = ρ²/(1 − ρ²). By the assumption nρ² ≤ (4 − ε) log n, we have τ² = o(1/n) and µb + ν + 2 − log b = (1 + o(1))ρ²n/4 + 2 − log n ≤ 0 for all sufficiently large n. Thus, it follows from (47) in Proposition 2 that

E[ exp( τn_2 + τ²N_2 ) exp( (1 + o(1))ρ²n_1²/4 ) 1_{n_1>n/2} ] = o(1).

Combining the upper bounds for the two terms, we conclude that E_Q[ ( P′(A, B)/Q(A, B) )² ] = 1 + o(1) under the assumption ρ² ≤ (4 − ε) log n / n. Thus TV(P, Q) = o(1) in view of (54).

5 Conditional Second Moment Method: Sparse regime

We focus on the Erdős–Rényi model in the sparse regime p = n^{−Ω(1)}. The impossibility condition previously obtained in Theorem 3 via the unconditional second moment simplifies to s² ≤ (2 − ε) log n / n. In this section, we significantly improve this result by showing that if

s² ≤ ( 1 − ω(n^{−1/3}) )/(np) ∧ 0.01,   (60)

then strong detection is impossible. Moreover, if both s = o(1) and (60) hold, then weak detection is impossible.
Analogous to the proof for the dense case in Section 4 (see also Section A.3), we apply the conditional second moment method. However, the argument in the sparse case is much more sophisticated, for the following reason. In the dense regime (both the Gaussian model and Erdős–Rényi graphs with p = n^{−o(1)}), we have shown that the main contribution to the second moment is due to fixed points of σ = π^{−1} ∘ π̃, which can be regulated by conditioning on the edge density of large induced subgraphs of the intersection graph. For sparse Erdős–Rényi graphs with p = n^{−Ω(1)}, we need to control the contribution of not just fixed points, but all edge orbits of length up to k = Θ(log n). Indeed, as argued in Section 3.3, the unconditional second moment blows up when ρ^{2k} ≥ (2 + ε) log n / n due to the obstructions from the k-edge orbits, or, more precisely, an atypically large magnitude of ∏_{|O|=k} X_O. Note that ρ = s(1 − p)/(1 − ps) = (1 + o(1))s in the sparse case. Therefore, to establish the desired condition (60), we need to regulate ∏_{|O|=k} X_O beyond k = 1 by proper conditioning. In fact, for p = Θ(1/n), since (60) reduces to ρ ≤ 0.1, it is necessary to control all k up to Θ(log n).
To this end, the crucial observation is as follows. We call an edge orbit O of σ = π^{−1} ∘ π̃ complete if it is a subgraph of the intersection graph A ∧ Bπ, i.e., O ⊂ E(A ∧ Bπ). For each complete orbit O, we have A_ij = B_{π(i)π(j)} = B_{π̃(i)π̃(j)} = 1 for all (i, j) ∈ O and hence, by (21) and (30), X_ij = L(1, 1)² = 1/p², so that X_O attains its maximal possible value, namely

X_O = (1/p)^{2|O|},   ∀ O ⊂ E(A ∧ Bπ).   (61)
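The value L(1, 1) = 1/p behind (61) follows directly from the subsampling construction of the correlated Erdős–Rényi pair (a parent edge appears with probability p and is retained in each graph with probability s). The snippet below, an illustrative check with hypothetical parameter values, verifies it:

```python
# Hypothetical parameter values for illustration only.
p, s = 0.2, 0.45

def P_joint(x, y):
    # joint law of a correlated edge pair under the subsampling construction
    if x == 1 and y == 1:
        return p * s * s
    if x != y:
        return p * s * (1 - s)
    return 1 - 2 * p * s + p * s * s       # x == y == 0

def Q_marg(x):
    # Bern(ps) marginal under the null
    return p * s if x == 1 else 1 - p * s

def L(x, y):
    # likelihood-ratio kernel (21): L(x, y) = P(x, y) / (Q(x) Q(y))
    return P_joint(x, y) / (Q_marg(x) * Q_marg(y))

# sanity: P is a probability distribution with Bern(ps) marginals
assert abs(sum(P_joint(x, y) for x in (0, 1) for y in (0, 1)) - 1) < 1e-12
# the value used in (61): L(1,1) = 1/p, hence X_ij = L(1,1)^2 = 1/p^2 on complete orbits
assert abs(L(1, 1) - 1 / p) < 1e-12
assert abs(L(1, 1) ** 2 - 1 / p ** 2) < 1e-9
```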

For incomplete orbits, it is not hard to show (see Proposition 4 below) that

E_Q[ X_O | O ⊄ A ∧ Bπ ] ≤ 1.

Hence, the key is to control the contribution of the complete edge orbits, i.e., those that are subgraphs of A ∧ Bπ. Crucially, under the assumption of Theorem 2 in the sparse regime, nps² is small enough that, with high probability under the planted model P, the intersection graph A ∧ Bπ is subcritical and a pseudoforest (a graph in which every connected component contains at most one cycle). This global structure significantly limits the possible configurations of complete edge orbits, since many patterns of co-occurrence of edge orbits in A ∧ Bπ are forbidden. Motivated by this observation, we truncate the likelihood ratio by conditioning on the global event that A ∧ Bπ is a pseudoforest. Finally, to show that the conditional second moment is bounded under the desired condition (60), we carefully control the co-occurrence of edge orbits in A ∧ Bπ under the pseudoforest constraint, which involves a delicate enumeration of the pseudoforests that can be assembled from edge orbits.
Next, let us proceed to the rigorous analysis. Define

E ≜ {(A, B, π) : A ∧ Bπ is a pseudoforest}.

Note that A ∧ Bπ ∼ G(n, ps²) under the planted model P. The following result shows that in the subcritical case, A ∧ Bπ is a pseudoforest with high probability.

Lemma 2 ([FK16, Lemma 2.10]). If nps² ≤ 1 − ω(n^{−1/3}), then P( (A, B, π) ∈ E ) = 1 − o(1/n³) as n → ∞.
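The pseudoforest property (every connected component has at most one cycle, equivalently e(C) ≤ v(C) for every component C) is straightforward to check with union-find. The following is an illustrative sketch, not part of the paper:

```python
from collections import Counter

def is_pseudoforest(n, edges):
    """Check that every connected component C satisfies e(C) <= v(C)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv

    # count vertices and edges in each component, keyed by its root
    v_cnt = Counter(find(i) for i in range(n))
    e_cnt = Counter(find(u) for u, _ in edges)
    return all(e_cnt[r] <= v_cnt[r] for r in e_cnt)

# a triangle has exactly one cycle per component: still a pseudoforest
assert is_pseudoforest(3, [(0, 1), (1, 2), (2, 0)])
# two triangles sharing an edge: 4 vertices, 5 edges in one component -> not
assert not is_pseudoforest(4, [(0, 1), (1, 2), (2, 0), (1, 3), (3, 2)])
```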

Recall from (29) and (34) in Section 3.2 the following representation of the squared likelihood ratio:

( P(A, B)/Q(A, B) )² = E_{π⊥⊥π̃}[ ∏_{O∈O} X_O ],   (62)

where, for each edge orbit O of σ = π^{−1} ∘ π̃,

X_O = ∏_{(i,j)∈O} X_ij,   X_ij = L(A_ij, B_{π(i)π(j)}) L(A_ij, B_{π̃(i)π̃(j)}),

with L(·, ·) defined in (21). In order to decompose (62) further, let us introduce the following key definitions. Recall from Section 3.1 that O_i denotes the node orbit of i (under the node permutation σ) and O_ij denotes the edge orbit of (i, j) (under the edge permutation σ^E). Fix some k to be specified later.

• Define Ok as the set of edge orbits of length at most k that are formed by node orbits with length at most k, that is,

Ok = {Oij : |Oi| ≤ k, |Oj| ≤ k, |Oij| ≤ k, 1 ≤ i < j ≤ n}.

• Define J_k as the set of edge orbits O ∈ O_k that are subgraphs of A ∧ Bπ, i.e.,

J_k = {O ∈ O_k : A_ij = 1, B_{π(i)π(j)} = 1, ∀(i, j) ∈ O} = {O ∈ O_k : A_ij = 1, B_{π̃(i)π̃(j)} = 1, ∀(i, j) ∈ O},

where the second equality holds because {B_{π(i)π(j)}}_{(i,j)∈O} = {B_{π̃(i)π̃(j)}}_{(i,j)∈O}.

• Define

H_k = ∪_{O∈J_k} O.   (63)

Note that while O_k depends only on the random permutation σ = π^{−1} ∘ π̃, both J_k and H_k depend in addition on the random graph A ∧ Bπ.
As will be discussed at length in Section 5.1, each edge orbit can be viewed as a subgraph of the complete graph K_n. Distinct edge orbits are by definition edge-disjoint, and the union of all edge orbits is the edge set of K_n. We call a graph an orbit graph if it is a union of edge orbits. Importantly, by definition, the orbit graph H_k is a subgraph of A ∧ Bπ.
To compute the conditional second moment, by Lemma 2, it follows from (52) that

" 2# " " ## P0 (A, B) Y = (1 + o (1)) X 1 1 EQ Q (A, B) Eπ⊥⊥πe EQ O {(A,B,π)∈E} {(A,B,πe)∈E} O∈O " " ## Y ≤ (1 + o (1)) X 1 , (64) Eπ⊥⊥πe EQ O {Hk is a pseudoforest} O∈O where the last inequality holds because on the event that A ∧ Bπ is a pseudoforest, its subgraph Hk is also one. To further upper bound the right hand side of (64), we decompose the product over edge orbits into three terms: Y Y Y Y XO = XO × XO × XO

O∈O O/∈Ok O∈Ok\Jk O∈Jk which correspond to the contributions of long orbits, short incomplete orbits (that are not subgraphs of A ∧ Bπ), and short complete orbits (that are subgraphs), respectively. As shown earlier in (61), 2|O| for each complete edge orbit O, we have XO = (1/p) . Therefore in view of (63), the collective contribution of short complete orbits are

2e(H ) Y 1 k X = . (65) O p O∈Jk

−1 Thus, fixing σ = π ◦ πe, we have " # Y EQ XO1{Hk is a pseudoforest} O∈O     Y Y = EQ  XO EQ  XO1{Hk is a pseudoforest} O/∈Ok O∈Ok      2e(H ) Y 1 k Y = X 1 X J , (66) EQ  O EJk  p {Hk is a pseudoforest}EQ  O k O/∈Ok O∈Ok\Jk where the first equality holds because {XO}O∈O are mutually independent and Jk ⊂ Ok, so that

{XO}O∈O\Ok is independent of {XO}O∈Ok and the event that Hk is a pseudoforest; the second equality holds because Hk is measurable with respect to Jk. The contributions of long orbits and incomplete orbits can be readily bounded as follows whose proofs are deferred till Sections B.1 and B.2.

Proposition 3 (Long orbits). Fix any σ = π^{−1} ∘ π̃. For any k ∈ ℕ,

E_Q[ ∏_{O∈O\O_k} X_O ] ≤ ( 1 + ρ^k )^{n²/k}.

Proposition 4 (Incomplete orbits). Fix any σ = π^{−1} ∘ π̃. If p ≤ 1/2 and s ≤ 1/2, then

E_Q[ ∏_{O∈O_k\J_k} X_O | J_k ] ≤ 1.

Applying Propositions 3 and 4 to (66), we get that for any σ = π^{−1} ∘ π̃,

E_Q[ ∏_{O∈O} X_O 1_{H_k is a pseudoforest} ] ≤ ( 1 + ρ^k )^{n²/k} E_{J_k}[ (1/p)^{2e(H_k)} 1_{H_k is a pseudoforest} ].   (67)

It remains to further upper bound the RHS of (67). Let ℋ_k denote the set of all orbit graphs that consist of edge orbits in O_k and are pseudoforests; we call such graphs orbit pseudoforests. Note that ℋ_k depends only on σ, not on the graphs A and B. Therefore,

E_{J_k}[ (1/p)^{2e(H_k)} 1_{H_k is a pseudoforest} ] = Σ_{H∈ℋ_k} Q(H_k = H) (1/p)^{2e(H)} 1_{H is a pseudoforest} ≤ Σ_{H∈ℋ_k} s^{2e(H)},   (68)

where the last step holds because

Q(H_k = H) ≤ Q( A_ij = 1, B_{π(i)π(j)} = 1, ∀(i, j) ∈ E(H) ) = (ps)^{2e(H)}.

In view of (68), upper bounding the second moment boils down to bounding the generating function of the class ℋ_k of orbit pseudoforests. This is done in the following theorem, in terms of the cycle type of σ. The proof involves a delicate enumeration of orbit pseudoforests, which constitutes the most crucial part of the analysis. We note that if we ignore the orbit structure and treat ℋ_k as a class of arbitrary pseudoforests, the resulting bound is too crude to be useful.

Theorem 4 (Generating function of orbit pseudoforests). For any k ∈ ℕ, σ = π^{−1} ∘ π̃, and any s ∈ [0, 1],

Σ_{H∈ℋ_k} s^{2e(H)} ≤ ∏_{m=1}^{k} ( 1 + s^m n_m 1_{m even} + 2s^{2m} Σ_{ℓ=1}^{m} ℓn_ℓ + s^{4m} m n_{2m} 1_{2m≤k} )^{n_m},   (69)

where n_m is the number of m-node orbits in σ = π^{−1} ∘ π̃ for 1 ≤ m ≤ k.
Combining (64), (67), (68), and (69), we get

E_Q[ ( P′(A, B)/Q(A, B) )² ]
 ≤ (1 + o(1)) E_{π⊥⊥π̃}[ ( 1 + ρ^k )^{n²/k} ∏_{m=1}^{k} ( 1 + s^m n_m 1_{m even} + 2s^{2m} Σ_{ℓ≤m} ℓn_ℓ + s^{4m} m n_{2m} 1_{2m≤k} )^{n_m} ],   (70)

which is further bounded by the next result.

Proposition 5. Suppose k(log k)⁴ = o(n). If s ≤ 0.1, then

E_{π⊥⊥π̃}[ ∏_{m=1}^{k} ( 1 + s^m n_m 1_{m even} + 2s^{2m} Σ_{ℓ≤m} ℓn_ℓ + s^{4m} m n_{2m} 1_{2m≤k} )^{n_m} ] = O(1).   (71)

Furthermore, if s = o(1), then

E_{π⊥⊥π̃}[ ∏_{m=1}^{k} ( 1 + s^m n_m 1_{m even} + 2s^{2m} Σ_{ℓ≤m} ℓn_ℓ + s^{4m} m n_{2m} 1_{2m≤k} )^{n_m} ] = 1 + o(1).   (72)

The proof of Proposition 5 is involved and deferred to Section B.3. To provide some concrete intuition, the following simple calculation shows that s = o(1) is necessary for (72) to hold. Indeed, consider k = 1, for which the LHS reduces to E[ (1 + 2s²n_1)^{n_1} ]. By Poisson approximation (see Appendix D), replacing n_1 by Poi(1) yields

E[ (1 + 2s²n_1)^{n_1} ] ≈ e^{−1} Σ_{a=0}^{∞} (1 + 2s²a)^a / a! ≥ e^{−1} Σ_{a=0}^{∞} (1 + 2s²)^a / a! = e^{2s²},

which is 1 + o(1) if and only if s = o(1). To evaluate the full expectation in (72), note that even if we use Poisson approximation to replace the n_m's by independent Poissons, the terms inside the product over [k] remain dependent. To handle this, we carefully partition the product into disjoint parts and recursively peel off the expectation backwards. We are now ready to complete the proof of Theorem 2 in the sparse case.

Proof of Theorem 2: impossibility in the sparse regime. Let k = 3 log n. If s ≤ 1/2, then n²s^k/k = o(1), and thus

( 1 + ρ^k )^{n²/k} ≤ exp( n²ρ^k/k ) ≤ exp( n²s^k/k ) = 1 + o(1).

Note that k(log k)⁴ = o(n). Combining (70) with (71) and (72) yields that E_Q[ ( P′(A, B)/Q(A, B) )² ] = O(1) for s ≤ 0.1 and E_Q[ ( P′(A, B)/Q(A, B) )² ] = 1 + o(1) for s = o(1), which completes the proof in view of (53) and (54).

The remainder of this section is organized as follows. To prepare for the proof of Theorem 4, we study the graph structure and the classification of edge orbits in Section 5.1. An equivalent representation of orbit graphs as backbone graphs is given in Section 5.2 to aid the enumeration argument. As a warm-up, we first enumerate orbit forests (orbit graphs that are forests) and bound their generating function in Section 5.3. The more challenging case of orbit pseudoforests is tackled in Section 5.4, completing the proof of Theorem 4. Sections B.1–B.3 contain the proofs of Propositions 3–5.

5.1 Classification of edge orbits

To prove Theorem 4, we are interested in orbit graphs consisting of short edge orbits, and the main task lies in enumerating those that are pseudoforests. To this end, we need to understand the graph structure of edge orbits. Throughout this subsection, fix a node permutation σ. For a given edge (i, j), its edge orbit can be viewed as a graph with vertex set $O_i\cup O_j$ and edge set $O_{ij}$. Let $|O_i| = \ell$ and $|O_j| = m$. Each edge orbit can be classified into the following four categories (see Table 2 for a concrete example).

Type | Edge orbits
M    | (13,24); (14,23)
B    | (15,26,17,28); (16,27,18,25); (35,46,37,48); (36,47,38,45)
C    | (56,67,78,85)
S    | (12); (34); (57,68)

Table 2: Edge orbits corresponding to the node permutation σ = (12)(34)(5678). When represent- ing an edge orbit in cycle notation, each edge (i, j) is abbreviated as ij. As a convention, nodes in each node orbit are vertically aligned and arranged in the order of the permutation σ. For edge orbits, type M are in green, type B in red, type C in blue, and type S in black.
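The content of Table 2 can be reproduced programmatically. The following sketch (our own illustration; helper names are ours) computes the orbits of σ = (12)(34)(5678) acting on unordered node pairs and classifies them by the four types detailed below:

```python
from itertools import combinations

# sigma = (12)(34)(5678) on nodes 1..8
sigma = {1: 2, 2: 1, 3: 4, 4: 3, 5: 6, 6: 7, 7: 8, 8: 5}

def node_orbit(i):
    orb, j = [i], sigma[i]
    while j != i:
        orb.append(j)
        j = sigma[j]
    return orb

def edge_orbit(e):
    """Orbit of the unordered pair e under the induced edge permutation."""
    orb, f = [e], frozenset(sigma[x] for x in e)
    while f != e:
        orb.append(f)
        f = frozenset(sigma[x] for x in f)
    return orb

def classify(e):
    """Type of the edge orbit of e, following Section 5.1."""
    i, j = sorted(e)
    oi, oj = node_orbit(i), node_orbit(j)
    if set(oi) != set(oj):                      # two distinct node orbits
        return "M" if len(oi) == len(oj) else "B"
    m = len(oi)
    x = i
    for _ in range(m // 2):                     # x = sigma^{m/2}(i)
        x = sigma[x]
    return "S" if m % 2 == 0 and x == j else "C"

seen, counts = set(), {"M": 0, "B": 0, "C": 0, "S": 0}
for e in map(frozenset, combinations(range(1, 9), 2)):
    if e not in seen:
        seen.update(edge_orbit(e))
        counts[classify(e)] += 1
print(counts)  # Table 2: two matchings, four bridges, one cycle, three splits
```

All 28 node pairs are covered by the ten orbits listed in Table 2.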

Type M (Matching): $i$ and $j$ belong to different node orbits of the same length. In this case, $|O_{ij}| = m$ and $O_{ij}$ is a perfect matching. We call such an $O_{ij}$ an $M_m$ edge orbit (or a matching). Furthermore, for two distinct node orbits $O$ and $O'$ of length $m$, the total number of possible $M_m$ edge orbits is $m$.

Type B (Bridge): $i$ and $j$ belong to different node orbits of different lengths. Without loss of generality, assume that the orbit of $i$ is shorter than that of $j$, i.e., $\ell < m$. In this case, let $M = \mathrm{lcm}(\ell,m)$. Then $|O_{ij}| = M$ and $O_{ij}$ consists of $\ell m/M$ vertex-disjoint copies of the complete bipartite graph $K_{M/\ell,\,M/m}$. We call such an edge orbit a $B_{m,\ell}$ edge orbit (or a bridge). Furthermore, for two node orbits $O$ and $O'$ with $|O| = \ell < |O'| = m$, the total number of possible bridges is $\ell m/M$.

Of special interest is the case where $\ell$ is a divisor of $m$ and the orbit is $\ell K_{1,m/\ell}$ (i.e., $\ell$ copies of $m/\ell$-stars). These are the only bridges that are cycle-free; otherwise the bridge contains a component with at least two cycles. This observation is useful for the enumeration arguments in Sections 5.3 and 5.4 under constraints on the number of cycles.

Type C (Cycle): $i$ and $j$ belong to the same node orbit of length $m$ and $j\neq\sigma^{m/2}(i)$. In this case, $|O_{ij}| = m$ and $O_{ij}$ is an $m$-cycle. We call such an $O_{ij}$ a $C_m$ edge orbit (or a cycle), and there are a total of $\lfloor(m-1)/2\rfloor$ of them for the same node orbit.

Type S (Split): $i$ and $j$ belong to the same node orbit (of even length $m$) and $j = \sigma^{m/2}(i)$. In this case, $|O_{ij}| = m/2$ and $O_{ij}$ is a perfect matching. We call such an $O_{ij}$ an $S_m$ edge orbit (or a split). Clearly, for each node orbit of even length, there is a unique way for it to split into an $S_m$ edge orbit.

In summary, matchings and bridges are edge orbits formed by two distinct node orbits; they are bipartite graphs with vertex sets $O_i$ and $O_j$. Cycles and splits are edge orbits formed by a single node orbit $O_i$, which can either form a full cycle or split into a perfect matching.

5.2 Orbit graph and backbone graph Every orbit graph H can be equivalently and succinctly represented as a backbone graph Γ defined as follows.

Definition 2 (Backbone graph). Given an orbit graph H, its backbone graph is an undirected labeled multigraph, whose nodes and edges (referred to as giant nodes and giant edges) correspond to node orbits and edge orbits in H, respectively. Each giant node carries a binary label (represented as shaded or non-shaded) indicating whether the node orbit forms a Type S edge orbit (split) or not. Each giant edge carries a label (an integer) encoding the specific realization of the edge orbit. Specifically,

• A Type S edge orbit (split) is represented by a shaded giant node.

• A Type $C_m$ edge orbit (cycle) is represented by a self-loop, whose edge label takes values in $\left[\lfloor\frac{m-1}{2}\rfloor\right]$.

• A Type Mm edge orbit (matching) is represented by a giant edge between two m-node orbits, with edge label taking values in [m].

• A Type $B_{m,\ell}$ edge orbit (bridge) is represented by a giant edge between an $\ell$-node orbit and an $m$-node orbit ($\ell < m$), with edge label taking values in $\left[\frac{\ell m}{\mathrm{lcm}(\ell,m)}\right]$.

See Fig. 2 for an example of an orbit graph and its corresponding backbone graph. As a convention, for backbone graphs, the labeled giant edges representing Type M, Type B, and Type C edge orbits are colored green, red, and blue, respectively. Each shaded giant node represents a

Type S edge orbit. For convenience, the number inside each giant node represents the length of its corresponding node orbit.

(a) Orbit graph H. (b) Backbone graph Γ.

Figure 2: Example of an orbit graph and its corresponding backbone graph for σ = (12)(34)(5678). The labels of giant edges are determined based on the enumeration of edge orbits in Table 2. For instance, the two green giant edges correspond to the two Type M edge orbits (perfect matchings) between node orbits (12) and (34), and the red giant edge corresponds to the Type B edge orbit (bridge) between node orbits (34) and (5678).

Recall that Hk denotes the collection of orbit pseudoforests consisting of edge orbits of length at most k formed by node orbits of size at most k. To enumerate H ∈ Hk, it is equivalent to enumerating the corresponding backbone graph Γ. To facilitate the enumeration, we introduce the following definitions:

• Let Sm denote the set of giant nodes corresponding to m-node orbits.

• Let $\Gamma_m = \Gamma[S_m]$ denote the subgraph of Γ induced by the node set $S_m$ for $1\le m\le k$. Let $\Gamma_{m,\ell} = \Gamma[S_m,S_\ell]$ denote the (bipartite) subgraph of Γ induced by the edges between $S_m$ and $S_\ell$, for $1\le\ell<m\le k$. Each giant edge in $\Gamma_{m,\ell}$ corresponds to a $B_{m,\ell}$ edge orbit (bridge).

• A connected component of $\Gamma_m$ is called plain if it contains no split and is not incident to any bridge in $\cup_{\ell<m}\Gamma_{m,\ell}$. For a connected component $C$ of $\Gamma_m$, let $H_C$ denote the orbit graph formed by the edge orbits in $C$, and recall that the excess of a graph $H$ is $\mathrm{ex}(H)\triangleq e(H)-v(H)$, the number of edges minus the number of vertices. We will use the following two observations:

(O1) Adding one split in $C$ increases $\mathrm{ex}(H_C)$ by $m/2$;

(O2) Adding one $B_{m,\ell}$ bridge ($\ell<m$) to $C$ increases $\mathrm{ex}(H_C)$ by at least $\mathrm{lcm}(\ell,m)-\ell$.

In addition, we need the following fact about the excess of an orbit graph:

Lemma 3. For any connected component C in Γm, ex(HC ) ≥ −m, where the equality holds if and only if C is a plain tree component in Γm.

Proof. Given a connected component $C$ in $\Gamma_m$, let $a$ and $b$ denote the total number of giant edges and giant nodes in $C$, respectively. If $C$ is a plain tree component, we have $a+1 = b$. Since each giant edge in $\Gamma_m$ represents an $m$-edge orbit and each giant node represents an $m$-node orbit, we have $\mathrm{ex}(H_C) = am-bm = -m$. By (O1), (O2), and the fact that adding one self-loop in $C$ increases $\mathrm{ex}(H_C)$ by $m$, we have $\mathrm{ex}(H_C)\ge-m$, where the equality holds if and only if $C$ does not contain any split or self-loop and is not incident to any bridge in $\cup_{\ell<m}\Gamma_{m,\ell}$, i.e., $C$ is a plain tree component.

5.3 Warm-up: Generating function of orbit forests

Fix $\sigma = \pi^{-1}\circ\widetilde\pi$ and recall that $n_m$ denotes the number of $m$-node orbits in σ. Our enumeration scheme crucially exploits the classification of edge orbits and orbit graphs in Section 5.1 and the representation of orbit graphs as backbone graphs introduced in Section 5.2. As a warm-up, in this section we bound the generating function of orbit forests, which is much simpler than that of orbit pseudoforests. Restricting the summation to the set $\mathcal{F}_k$ of orbit forests, a strict subset of $\mathcal{H}_k$, we show the following improved version of (69):

$$\sum_{H\in\mathcal{F}_k}s^{2e(H)}\le\prod_{1\le m\le k}\left(1+s^m\mathbf{1}_{\{m\ \mathrm{even}\}}+s^{2m}\sum_{\ell\le m}\ell n_\ell\right)^{n_m}. \tag{73}$$
When the orbit graph H is a forest, its corresponding backbone graph Γ must satisfy the following four conditions:

(T1) For each 1 ≤ m ≤ k,Γm is a forest with simple edges (of multiplicity 1);

(T2) For each $1\le\ell<m\le k$, $\Gamma_{m,\ell}$ is empty unless $\ell$ is a divisor of $m$;

(T3) There is no self-loop;

(T4) For each $1\le m\le k$, each component of $\Gamma_m$ either contains one split, or is incident to one bridge in $\cup_{\ell<m}\Gamma_{m,\ell}$, or neither; in particular, no component contains two splits, is incident to two bridges, or contains a split while also being incident to a bridge (see Fig. 3).

(a) A component in Γ₄ is incident to 2 bridges in Γ₄,₂. (b) A component in Γ₄ contains 2 splits. (c) A component in Γ₄ contains 1 split and is incident to 1 bridge in Γ₄,₂.

Figure 3: Examples of backbone graphs violating (T4), whose corresponding orbit graphs contain cycles.

Next, we describe an algorithm for generating all possible backbone graphs Γ that satisfy the aforementioned conditions (T1)–(T4). Given a sequence of integers $(a,b,c) = (a_m,b_m,c_m)_{1\le m\le k}$ with $b_m = 0$ for odd $m$, we construct Γ as follows:

Algorithm 1 Forest enumeration algorithm
1: for each t = 1, . . . , k do
2:   Step 1: Matching stage. Construct a rooted forest $\Gamma_t$ with $n_t$ giant nodes and $a_t$ giant edges; attach a label from $[t]$ to each giant edge;
3:   Step 2: Splitting stage. Choose $b_t$ components from the $n_t-a_t$ tree components of $\Gamma_t$, and within each chosen component, add a split to the root;
4:   Step 3: Bridging stage. Choose $c_t$ out of the remaining $n_t-a_t-b_t$ tree components of $\Gamma_t$, and for each chosen component, add a bridge connecting its root to a giant node in $\Gamma_\ell$ for some $\ell<t$ that is a divisor of $t$. Attach a label from $[\ell]$ to the added bridge.
5: end for

We claim that any orbit forest can be generated by Algorithm 1. To verify this claim formally, let H be an orbit forest and let Γ denote its corresponding backbone graph as in Definition 2, which, for $1\le m\le k$, contains

• am matchings corresponding to Type Mm edge orbits;

• bm splits corresponding to Type Sm edge orbits;

• $c_m$ bridges corresponding to Type $B_{m,\ell}$ edge orbits for some $\ell<m$ that is a divisor of $m$.

For each $\Gamma_m$, we arbitrarily choose the root for each plain tree component, and specify the root in each non-plain tree component as the unique giant node that either splits or is incident to a bridge in $\cup_{\ell<m}\Gamma_{m,\ell}$ (the root is unique by (T4)). Then Γ can be generated by Algorithm 1 with input $(a,b,c)$. Next, we bound the number of backbone graphs that Algorithm 1 can output for a given input:

1. It is well-known that the total number of rooted forests on $n$ vertices with $a$ edges is
$$\binom{n-1}{a}n^a \tag{74}$$
(see e.g. [FS09, II.18, p. 128]). Moreover, each giant edge added in Step 1 has $t$ possible labels. Therefore, the total number of rooted backbone graphs $\Gamma_t$ is at most
$$\binom{n_t-1}{a_t}(tn_t)^{a_t}\le\binom{n_t}{a_t}(tn_t)^{a_t}. \tag{75}$$

2. The total number of ways of placing $b_t$ splits is at most
$$\binom{n_t-a_t}{b_t}. \tag{76}$$

3. The total number of ways of placing $c_t$ bridges is at most
$$\binom{n_t-a_t-b_t}{c_t}\left(\sum_{\ell<t:\,\ell\mid t}\ell n_\ell\right)^{c_t}. \tag{77}$$

Combining (75), (76), and (77), we get that the total number of output backbone graphs Γ with input parameter $(a,b,c)$ is at most
$$\prod_{1\le t\le k}\mathbf{1}_{\{b_t=0\ \mathrm{for\ odd}\ t\}}\binom{n_t}{a_t,b_t,c_t}(tn_t)^{a_t}\left(\sum_{\ell<t:\,\ell\mid t}\ell n_\ell\right)^{c_t}. \tag{78}$$

Then the desired (73) readily follows from
$$\sum_{H\in\mathcal{F}_k}s^{2e(H)}\le\sum_{a,b,c}\prod_{1\le t\le k}\mathbf{1}_{\{b_t=0\ \mathrm{for\ odd}\ t\}}\binom{n_t}{a_t,b_t,c_t}(tn_t)^{a_t}\left(\sum_{\ell<t:\,\ell\mid t}\ell n_\ell\right)^{c_t}s^{2ta_t+tb_t+2tc_t},$$
since, by the multinomial theorem, the right-hand side is at most
$$\prod_{1\le t\le k}\left(1+tn_ts^{2t}+s^t\mathbf{1}_{\{t\ \mathrm{even}\}}+s^{2t}\sum_{\ell<t}\ell n_\ell\right)^{n_t}\le\prod_{1\le t\le k}\left(1+s^t\mathbf{1}_{\{t\ \mathrm{even}\}}+s^{2t}\sum_{\ell\le t}\ell n_\ell\right)^{n_t}.$$

5.4 Proof of Theorem 4: Generating function of orbit pseudoforests

Fix $\sigma = \pi^{-1}\circ\widetilde\pi$ and recall that $n_m$ denotes the number of $m$-node orbits in σ. In this section we bound the generating function (68) of orbit pseudoforests $H\in\mathcal{H}_k$ and prove Theorem 4. Recall that each orbit graph H can be equivalently represented as a backbone graph Γ as in Definition 2. In addition, we need the following terminology: for $1\le m\le k$ and each $u\in S_m$, let $C(u)$ denote the connected component in $\Gamma_m$ containing $u$. Similar to the reasoning in Section 5.3, when H is a pseudoforest, its backbone graph Γ must satisfy the following properties:

(P1) For each 1 ≤ m ≤ k,Γm is a pseudoforest (with self-loops and parallel edges counted as cycles);

(P2) For each $1\le\ell<m\le k$, $\Gamma_{m,\ell}$ is empty unless $\ell$ is a divisor of $m$.

(P3) Each unicyclic component of Γm is plain.

(P4) A tree component in Γm contains at most two splits.

(P5) Let $(u,v)\in\Gamma_{m,\ell_1}$ and $(u',v')\in\Gamma_{m,\ell_2}$ be two distinct bridges with $\ell_1,\ell_2<m$, such that $u$ and $u'$ belong to the same tree component in $\Gamma_m$. Then $m$ must be even and $\ell_1=\ell_2=m/2$.

(P6) Let $(u,v)\in\Gamma_{m,\ell}$ be a bridge with $\ell<m$ such that $u$ belongs to a tree component that contains a split in $\Gamma_m$. Then $m$ must be even and $\ell=m/2$. Furthermore, $v$ must belong to a plain tree component in $\Gamma_{m/2}$.

(P7) For each $(u,v)$ and $(u',v')$ that satisfy either (P5) or (P6) with $v\ne v'$, the ending points $v$ and $v'$ must belong to distinct plain tree components in $\Gamma_{m/2}$.

(a) A component in Γ₄ contains 3 splits. (b) A component in Γ₄ is incident to a bridge (u, v) ∈ Γ₄,₁ and a bridge (u′, v′) ∈ Γ₄,₂.

Figure 4: Examples of backbone graphs violating (P4) and (P5), shown in (a) and (b), respectively, and the corresponding orbit graphs.

(a) A component in Γ₄ contains 1 split and is incident to 1 bridge (u, v) ∈ Γ₄,₁. (b) A component in Γ₄ contains 1 split and is incident to 1 bridge (u, v) ∈ Γ₄,₂, where v is in a non-plain component in Γ₂ that is incident to 1 bridge in Γ₂,₁. (c) A component in Γ₄ contains 1 split and is incident to 1 bridge (u, v) ∈ Γ₄,₂, where v is in a non-plain component in Γ₂ that contains a split.

Figure 5: Examples of backbone graphs violating (P6) and the corresponding orbit graphs.

Otherwise, H contains a component with at least two cycles, violating the pseudoforest constraint. See Fig. 4–Fig. 6 for illustrations of forbidden patterns that violate (P4)–(P7). Properties (P1)–(P7) are justified by the following arguments:

• Paralleling conditions (T1) and (T2) for the forest constraint, (P1) and (P2) follow from the classification of edge orbits and orbit graphs in Section 5.1;

• Suppose (P3) does not hold. Since the excess of the orbit graph corresponding to a plain unicyclic component in $\Gamma_m$ is 0, by (O1) and (O2), there exists a unicyclic component $C$ in $\Gamma_m$ such that $\mathrm{ex}(H_C)>0$, contradicting H being a pseudoforest;

• Suppose (P4) does not hold. Then by (O1) and Lemma 3, there exists a tree component $C$ in $\Gamma_m$ such that $\mathrm{ex}(H_C)>0$, contradicting H being a pseudoforest;

• Suppose (P5) does not hold. Then by (O2) and Lemma 3, there exists a tree component $C$ in $\Gamma_m$ such that $\mathrm{ex}(H_C)>0$, contradicting H being a pseudoforest;

• To prove (P6), let $G_1$ denote the orbit graph of $C(u)$ consisting of its edge orbits (including splits, matchings, and cycles). Let $G_2$ denote the orbit graph of $C(v)$ consisting of the edge orbits (including splits, matchings, and cycles) in $C(v)$, as well as the bridges in $\cup_{\ell<m/2}\Gamma_{m/2,\ell}$

36 2 2 4 2 1 2 4 2 v u v0 v u v0 (a) (u, v) and (u, v0) satisfy (P5), while v is (b) (u, v) and (u, v0) satisfy (P5), while v is in a in a non-plain component in Γ2 that con- non-plain component that is incident to a bridge tains a split. in Γ2,1

2 2 4 2 2 2 4 2 2 4 4 v u v0 u v v0 u0 (c) (u, v) and (u, v0) satisfy (P5), while v (d) (u, v) satisfies (P5),(u0, v0) satisfies (P6), 0 0 and v are in the same component in Γ2. while v and v are in the same component in Γ2.

Figure 6: Examples of backbone graphs violating (P7) and the corresponding orbit graphs.

that are incident to $C(v)$. Let $G_{\mathrm{new}}$ denote the edge-disjoint union of $G_1$, $G_2$, and the edge orbit corresponding to the bridge $(u,v)$. Since $C(u)$ contains a split, by (O1) and Lemma 3, $\mathrm{ex}(G_1)\ge-m/2$. Then we have
$$\mathrm{ex}(G_{\mathrm{new}}) = \mathrm{ex}(G_1)+\mathrm{ex}(G_2)+m\ge\mathrm{ex}(G_2)+m/2\ge0,$$
where the last inequality holds with equality if and only if $C(v)$ is a plain tree component in $\Gamma_{m/2}$, by Lemma 3. Since $G_{\mathrm{new}}$ is a pseudoforest, $\mathrm{ex}(G_{\mathrm{new}})\le0$, so both inequalities must hold with equality. Hence, (P6) follows.

• To prove (P7), let $C = C(u)\cup C(u')$. Let $G_1$ denote the orbit graph of $C$ consisting of its edge orbits (including splits, matchings, and cycles), as well as the bridges incident to $C$ other than $(u,v)$ and $(u',v')$; let $G_2$ denote the orbit graph of $C(v)\cup C(v')$, and let $G_{\mathrm{new}}$ denote the edge-disjoint union of $G_1$, $G_2$, and the edge orbits corresponding to $(u,v)$ and $(u',v')$. Since $\mathrm{ex}(G_1)\ge-m$ by (O1), (O2), and Lemma 3, and each of the two bridges contributes $m$ edges,

$$\mathrm{ex}(G_{\mathrm{new}})\ge\mathrm{ex}(G_1)+\mathrm{ex}(G_2)+2m\ge-m+\mathrm{ex}(G_2)+2m = \mathrm{ex}(G_2)+m.$$

By assumption, $G_{\mathrm{new}}$ is a pseudoforest and thus $\mathrm{ex}(G_{\mathrm{new}})\le0$. It follows that $\mathrm{ex}(G_2)\le-m$, and hence $v$ and $v'$ must be in distinct plain tree components in $\Gamma_{m/2}$ by Lemma 3.

The implication of (P4)–(P7) is the following. For each $m\in[k]$, define $E(m)\triangleq\cup_{\ell<m}E(\Gamma_{m,\ell})$, the set of bridges from $S_m$ to shorter node orbits, and
$$E_{\mathrm{single}}(m)\triangleq\left\{(u,v)\in E(\Gamma):u\in S_m,\ v\in\cup_{\ell<m}S_\ell,\ C(u)\ \text{contains no split and is incident to no bridge in}\ \cup_{\ell<m}\Gamma_{m,\ell}\ \text{other than}\ (u,v)\right\}, \tag{79}$$
$$E_{\mathrm{double}}(m)\triangleq\left\{(u,v)\in E(\Gamma):u\in S_m,\ v\in S_{m/2},\ C(u)\ \text{contains a split or is incident to some bridge in}\ \cup_{\ell<m}\Gamma_{m,\ell}\ \text{other than}\ (u,v)\right\}. \tag{80}$$
By Properties (P6)–(P7), we have $E(m) = E_{\mathrm{single}}(m)\cup E_{\mathrm{double}}(m)$. Moreover,

• For each distinct $(u,v),(u',v')\in E_{\mathrm{single}}(m)$, the starting points $u$ and $u'$ belong to separate tree components in $\Gamma_m$, i.e., $C(u)$ and $C(u')$ are distinct tree components in $\Gamma_m$. Furthermore, $C(u)$ (resp. $C(u')$) contains no split and is not incident to any bridge other than $(u,v)$ (resp. $(u',v')$), by definition.

• For each distinct $(u,v),(u',v')\in E_{\mathrm{double}}(m)$, the ending points $v$ and $v'$ belong to separate plain tree components in $\Gamma_{m/2}$, by (P7).

The above observation suggests that to specify the bridges in $E_{\mathrm{single}}(m)$, one can use a forward construction, by first choosing their starting points from separate components of $\Gamma_m$ and then choosing the ending points from the shorter orbits $\cup_{\ell<m}S_\ell$; to specify the bridges in $E_{\mathrm{double}}(m)$, one can instead use a backward construction at iteration $m/2$, by choosing their ending points from plain tree components of $\Gamma_{m/2}$. Given a sequence of integers $(a,b,c,d) = (a_m,b_m,c_m,d_m)_{1\le m\le k}$ with $b_m = 0$ for odd $m$ and $d_m = 0$ for $2m>k$, we construct Γ as follows:

Algorithm 2 Pseudoforest enumeration algorithm
1: for each t = 1, . . . , k do
2:   Step 1: Matching stage. Construct a rooted pseudoforest $\Gamma_t$ with $n_t$ giant nodes and $a_t$ giant edges (allowing self-loops and multiple edges). Attach a label from $[t]$ to each giant edge and a label from $[\lfloor\frac{t-1}{2}\rfloor]$ to each self-loop;
3:   Step 2: Splitting stage. Choose $b_t$ components from the $n_t-a_t$ tree components of $\Gamma_t$, and within each chosen component, choose a node: if the chosen node is the root, add a split at the root; otherwise, add two splits, one at the root and the other at the chosen node;
4:   Step 3: Forward bridging stage. Choose $c_t$ out of the remaining $n_t-a_t-b_t$ tree components of $\Gamma_t$, and for each chosen component, add a bridge connecting its root to a giant node in $\Gamma_\ell$ for some $\ell<t$ that is a divisor of $t$. Attach a label from $[\ell]$ to the added bridge;
5:   Step 4: Backward bridging stage. Choose $d_t$ from the remaining $n_t-a_t-b_t-c_t$ tree components of $\Gamma_t$. For each chosen component, add a bridge connecting its root to a giant node in $\Gamma_{2t}$. Attach a label from $[t]$ to the added bridge.
6: end for

We note that an output graph of Algorithm 2 is not necessarily a pseudoforest; nevertheless, any orbit pseudoforest can be generated by Algorithm 2, which is all we need for upper bounding the generating function of orbit pseudoforests, $\sum_{H\in\mathcal{H}_k}s^{2e(H)}$. To verify this claim formally, let H be an orbit pseudoforest and let Γ denote its backbone graph as in Definition 2, where for $1\le m\le k$ there are:

• am giant edges (including self-loops) corresponding to either Type Mm or Cm edge orbits;

• bm components that contain splits corresponding to Type Sm edge orbits;

• $c_m$ giant edges corresponding to Type $B_{m,\ell}$ edge orbits for some $\ell<m$ that is a divisor of $m$;

• $d_m$ giant edges corresponding to Type $B_{2m,m}$ edge orbits.

For each $\Gamma_m$ where $1\le m\le k$, we arbitrarily choose the root for each plain tree component, and specify the root in each non-plain tree component as a giant node that either splits or is incident to a bridge in $E_{\mathrm{single}}(m)\cup E_{\mathrm{double}}(2m)$ (when there are two giant nodes that split in a tree component, we choose either one of them as the root; otherwise, the choice of the root is unique). Then it is clear that Steps 1 and 2 can realize any configuration of splits and matchings in Γ, thanks to Properties (P1)–(P4). Finally, note that bridges in $E_{\mathrm{single}}(m)$ are added by Step 3 (forward bridging) at iteration $t=m$ with $c_m = |E_{\mathrm{single}}(m)|$, and bridges in $E_{\mathrm{double}}(m)$ are added by Step 4 (backward bridging) at iteration $t=m/2$ with $d_{m/2} = |E_{\mathrm{double}}(m)|$.

Next we bound the generating function $\sum_{H\in\mathcal{H}_k}s^{2e(H)}$ from above. We first state an auxiliary lemma, which extends the well-known formula (74) for enumerating rooted forests.

Lemma 4. The number of rooted pseudoforests on $n$ nodes with $a$ edges (allowing self-loops and multiple edges) is at most
$$\binom{n}{a}(2n)^a. \tag{81}$$

Proof. Let $m$ denote the number of cycles (including self-loops and parallel edges). Then the number of connected components is $n-a+m$. To enumerate all such rooted pseudoforests, we first enumerate all rooted forests on $n$ vertices with $a-m$ edges, then choose $m$ roots out of the $n-a+m$ roots, and finally add one edge to each chosen root to form a cycle within its corresponding component. Each added edge can either be a self-loop at the root or connect the root to some other node, so there are at most $n$ different choices for each added edge. Therefore, the total number of rooted pseudoforests on $n$ vertices with $a$ edges is at most
$$\sum_{m=0}^{a}\binom{n}{a-m}n^{a-m}\binom{n-a+m}{m}n^m = n^a\underbrace{\sum_{m=0}^{a}\binom{n}{a-m}\binom{n-a+m}{m}}_{=2^a\binom{n}{a}} = \binom{n}{a}(2n)^a,$$
where the underbraced identity holds because $\binom{n}{a-m}\binom{n-a+m}{m} = \binom{n}{a}\binom{a}{m}$.
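As an aside, both counting ingredients in this proof can be verified by brute force. The sketch below (our own standalone check, not from the paper) confirms formula (74) for small n and the binomial identity under the brace:

```python
from itertools import combinations
from math import comb

def rooted_forests(n, a):
    """Brute-force count of rooted forests on n labeled vertices with a edges:
    an acyclic edge subset, times one root choice per connected component."""
    total = 0
    for subset in combinations(combinations(range(n), 2), a):
        parent = list(range(n))
        def find(x):
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x
        acyclic = True
        for u, v in subset:
            ru, rv = find(u), find(v)
            if ru == rv:
                acyclic = False
                break
            parent[ru] = rv
        if not acyclic:
            continue
        sizes = {}
        for x in range(n):
            r = find(x)
            sizes[r] = sizes.get(r, 0) + 1
        prod = 1
        for sz in sizes.values():
            prod *= sz          # each component picks any of its nodes as root
        total += prod
    return total

# Formula (74): number of rooted forests = C(n-1, a) * n^a
for n in range(2, 6):
    for a in range(n):
        assert rooted_forests(n, a) == comb(n - 1, a) * n ** a

# Identity behind the underbrace in Lemma 4's proof:
# sum_m C(n, a-m) C(n-a+m, m) = 2^a C(n, a), since each term is C(n,a) C(a,m)
for n in range(1, 10):
    for a in range(n + 1):
        assert sum(comb(n, a - m) * comb(n - a + m, m)
                   for m in range(a + 1)) == 2 ** a * comb(n, a)
print("formula (74) and the binomial identity verified")
```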

Now, we can enumerate all possible output backbone graphs Γ of Algorithm 2 as follows. For $t = 1,\ldots,k$:

1. Note that $\Gamma_t$ constructed in Step 1 is a rooted pseudoforest with $n_t$ giant nodes and $a_t$ giant edges. Moreover, each giant edge added in Step 1 carries at most $t$ possible labels. Hence, by Lemma 4, the total number of all possible rooted pseudoforests $\Gamma_t$ constructed in Step 1 is at most
$$\binom{n_t}{a_t}(2tn_t)^{a_t}. \tag{82}$$

2. The total number of different ways of splitting is at most
$$\binom{n_t-a_t}{b_t}n_t^{b_t}. \tag{83}$$
3. The total number of different ways of forward bridging is at most
$$\binom{n_t-a_t-b_t}{c_t}\left(\sum_{\ell<t:\,\ell\mid t}\ell n_\ell\right)^{c_t}. \tag{84}$$
4. The total number of different ways of backward bridging is at most
$$\binom{n_t-a_t-b_t-c_t}{d_t}\left(t\,n_{2t}\right)^{d_t}. \tag{85}$$
Combining (82)–(85), the total number of output backbone graphs Γ with input parameter $(a,b,c,d)$ is at most
$$\prod_{t=1}^{k}\mathbf{1}_{\{b_t=0\ \mathrm{for\ odd}\ t\}}\mathbf{1}_{\{d_t=0\ \mathrm{for}\ 2t>k\}}\binom{n_t}{a_t,b_t,c_t,d_t}(2tn_t)^{a_t}n_t^{b_t}\left(\sum_{\ell<t:\,\ell\mid t}\ell n_\ell\right)^{c_t}(t\,n_{2t})^{d_t}.$$
Therefore,
$$\sum_{H\in\mathcal{H}_k}s^{2e(H)}\le\sum_{a,b,c,d}\prod_{t=1}^{k}\mathbf{1}_{\{b_t=0\ \mathrm{for\ odd}\ t\}}\mathbf{1}_{\{d_t=0\ \mathrm{for}\ 2t>k\}}\binom{n_t}{a_t,b_t,c_t,d_t}(2tn_t)^{a_t}n_t^{b_t}\left(\sum_{\ell<t:\,\ell\mid t}\ell n_\ell\right)^{c_t}(t\,n_{2t})^{d_t}\,s^{2ta_t+tb_t+2tc_t+4td_t},$$
where each matching or cycle orbit contributes $s^{2t}$, each split $s^{t}$, each forward bridge $s^{2t}$, and each backward bridge $s^{4t}$. By the multinomial theorem, the right-hand side is at most
$$\prod_{t=1}^{k}\left(1+2tn_ts^{2t}+n_ts^{t}\mathbf{1}_{\{t\ \mathrm{even}\}}+s^{2t}\sum_{\ell<t}\ell n_\ell+t\,n_{2t}s^{4t}\mathbf{1}_{\{2t\le k\}}\right)^{n_t},$$
which is at most the right-hand side of (69), since $2tn_t+\sum_{\ell<t}\ell n_\ell\le2\sum_{\ell\le t}\ell n_\ell$. This completes the proof of Theorem 4.

6 Conclusion and Open Questions

In this paper, we formulate the general problem of testing network correlation and characterize the statistical detection limit. For both Gaussian-weighted complete graphs and dense Erdős–Rényi graphs, we determine the sharp threshold at which the asymptotically optimal testing error probability jumps from 0 to 1. For sparse Erdős–Rényi graphs, we determine the threshold within a constant factor. The proof of the impossibility results relies on a delicate application of the truncated second moment method and, in particular, leverages the pseudoforest structure of subcritical Erdős–Rényi graphs in the sparse setting. We conclude the paper with a few important open questions.

1. In a companion paper [MWXY20], we show that a polynomial-time test based on counting trees achieves strong detection when the average degree $np\ge n^{-o(1)}$ and the correlation $\rho\ge c$ for an explicit constant $c$. In particular, this result combined with our negative results in Theorem 2 implies that the detection limit is attainable in polynomial time up to a constant factor in the sparse regime when $np = \Theta(1)$. However, achieving the optimal detection threshold in polynomial time remains largely open.

2. It is of interest to study the detection limit under general weight distributions. Our proof techniques are likely to work beyond the Gaussian Wigner and Erdős–Rényi graph models. For example, for general distributions P and Q, as shown in the proof of Proposition 1, the second moment is determined by the eigenvalues of the kernel operator defined by the likelihood ratio $L(x,y) = \frac{P(x,y)}{Q(x,y)}$. Another interesting direction is testing correlations between hypergraphs.

3. Another important open problem is to determine the sharp threshold for detection in sparse Erdős–Rényi graphs with $p = n^{-\Omega(1)}$. In particular, to improve our positive result, one may need to analyze a more powerful test statistic beyond QAP. For the negative direction, one needs to consider the case where the intersection graph $A\wedge B_\pi$ is supercritical, and a more sophisticated conditioning beyond the pseudoforest structure may be required.

A Supplementary Proofs for Sections 1, 3, and 4

A.1 Proof of Remark 1

In this subsection, we show that in the non-trivial regime of $p=\omega(1/n^2)$ and $p=1-\Omega(1)$, weak detection can be achieved by comparing the numbers of edges of the two observed graphs, provided that $s=\Omega(1)$. To see this, let $X,Y$ denote the total numbers of edges in the observed graphs $G_1$ and $G_2$, respectively. Under $H_0$ (resp. $H_1$), $X-Y$ is a sum of $m$ i.i.d. random variables that equal $-1$ or $+1$, each with probability $ps(1-ps)$ (resp. $ps(1-s)$), and $0$ otherwise. Using Gaussian approximation, this essentially reduces to testing $\mathcal{N}(0,1)$ vs. $\mathcal{N}\left(0,\frac{1-ps}{1-s}\right)$, which can be non-trivially separated as long as $s$ is non-vanishing.

Formally, assume $s\le1-\Omega(1)$ without loss of generality and let $m=\binom{n}{2}$. Then, for some τ,
$$Q\left(|X-Y|\ge\tau\right)-P\left(|X-Y|\ge\tau\right)\overset{(a)}{\ge}\mathbb{P}\left(\left|\mathcal{N}\left(0,2mps(1-ps)\right)\right|\ge\tau\right)-\mathbb{P}\left(\left|\mathcal{N}\left(0,2mps(1-s)\right)\right|\ge\tau\right)-O\left(\frac{1}{\sqrt{mp}}\right)\overset{(b)}{=}\mathrm{TV}\left(\mathcal{N}(0,1),\mathcal{N}\left(0,\frac{1-ps}{1-s}\right)\right)-O\left(\frac{1}{\sqrt{mp}}\right)\overset{(c)}{=}\Omega(1),$$
where (a) follows from the Berry–Esseen theorem [Pet95, Theorem 5.5] and the assumption that $\Omega(1)\le s\le1-\Omega(1)$; (b) holds for some τ because the likelihood ratio between two centered normals is monotone in $|x|$; (c) holds because $(1-ps)/(1-s)$ is bounded away from 1 and $mp=\omega(1)$ under the assumptions that $s=\Omega(1)$, $p$ is bounded away from 1, and $n^2p=\omega(1)$.
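The variance gap exploited by this test is easy to see in simulation. The following sketch (our own illustration; the parameters n, p, s are arbitrary choices) draws the per-pair increments of X − Y under both hypotheses and compares the empirical variances with 2mps(1 − ps) and 2mps(1 − s):

```python
import random

random.seed(0)
n, p, s = 40, 0.3, 0.8
m = n * (n - 1) // 2  # number of vertex pairs

def diff_null():
    """X - Y when G1, G2 are independent G(n, ps)."""
    return sum((random.random() < p * s) - (random.random() < p * s)
               for _ in range(m))

def diff_alt():
    """X - Y when G1, G2 are independent s-subsamples of one parent G(n, p)."""
    d = 0
    for _ in range(m):
        if random.random() < p:  # pair is an edge of the parent graph
            d += (random.random() < s) - (random.random() < s)
    return d

def emp_var(xs):
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

T = 1000
v0 = emp_var([diff_null() for _ in range(T)])
v1 = emp_var([diff_alt() for _ in range(T)])
print(f"H0: var {v0:.1f} (theory {2 * m * p * s * (1 - p * s):.1f}); "
      f"H1: var {v1:.1f} (theory {2 * m * p * s * (1 - s):.1f})")
```

With s bounded away from 0, the two variances differ by a constant factor, so thresholding |X − Y| separates the hypotheses non-trivially.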

A.2 Proof of Proposition 2

Proof. Using (28), we have $N_2 = 2\binom{n_2}{2}+n_1n_2+n_4\le(n_2+n_1)n_2+n_4\le nn_2+n$. Therefore,
$$\mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\exp\left(\mu n_1^2+\nu n_1+\tau n_2+\tau^2N_2\right)\mathbf{1}_{\{a\le n_1\le b\}}\right]\le\mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\exp\left(\mu n_1^2+\nu n_1+\left(\tau+\tau^2n\right)n_2+\tau^2n\right)\mathbf{1}_{\{a\le n_1\le b\}}\right] = (1+o(1))\,\mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\exp\left(\mu n_1^2+\nu n_1+\left(\tau+\tau^2n\right)n_2\right)\mathbf{1}_{\{a\le n_1\le b\}}\right],$$
where the last equality holds because $n\tau^2 = o(1)$ by assumption. Pick η such that $\omega(1)\le\eta\le\sqrt{n}$. We decompose $\mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}[\exp(\mu n_1^2+\nu n_1+(\tau+\tau^2n)n_2)\mathbf{1}_{\{a\le n_1\le b\}}]$ as (I) + (II), where
$$(\mathrm{I}) = \mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\exp\left(\mu n_1^2+\nu n_1+\left(\tau+\tau^2n\right)n_2\right)\mathbf{1}_{\{a\le n_1\le b\}}\mathbf{1}_{\{0\le n_1<\eta\}}\right],$$
$$(\mathrm{II}) = \mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\exp\left(\mu n_1^2+\nu n_1+\left(\tau+\tau^2n\right)n_2\right)\mathbf{1}_{\{a\le n_1\le b\}}\mathbf{1}_{\{n_1\ge\eta\}}\right].$$
• To bound (I), we first apply the total variation bound (122) in Lemma 13 and get
$$\mathrm{TV}\left(\mathcal{L}(n_1,n_2),\mathcal{L}(Z_1,Z_2)\right)\le F\left(\frac{n}{2}\right),$$
where $Z_1\sim\mathrm{Poi}(1)$ and $Z_2\sim\mathrm{Poi}(\frac12)$ are independent Poisson random variables, and $\log F(n/2) = -(1+o(1))\frac{n}{2}\log\frac{n}{2}$. Then, we have
$$(\mathrm{I})\le\mathbb{E}\left[\exp\left(\mu Z_1^2+\nu Z_1+\left(\tau+\tau^2n\right)Z_2\right)\mathbf{1}_{\{a\le Z_1<\eta\}}\right]+2F\left(\frac{n}{2}\right)\exp\left(\left(\mu\eta+\nu\right)\eta+\left(\tau+\tau^2n\right)n\right).$$

$$(\mathrm{II})\le\mathbb{E}\left[\exp\left(\mu Z_1^2+\nu Z_1+\left(\tau+\tau^2n\right)Z_2\right)\mathbf{1}_{\{\eta\le Z_1\le b\}}\right]e^{3}.$$

By applying the moment generating function $\mathbb{E}_{X\sim\mathrm{Poi}(\lambda)}\left[\exp(tX)\right] = \exp\left(\lambda\left(e^t-1\right)\right)$, we then get that $\mathbb{E}\left[\exp\left(\left(\tau+\tau^2n\right)Z_2\right)\right] = 1+o(1)$, given $\tau = o\left(\frac1n\right)$. Therefore, we have
$$(\mathrm{I})\le\mathbb{E}\left[\exp\left(\mu Z_1^2+\nu Z_1\right)\mathbf{1}_{\{a\le Z_1\le b\}}\right](1+o(1))+o(1),$$
$$(\mathrm{II})\le\mathbb{E}\left[\exp\left(\mu Z_1^2+\nu Z_1\right)\mathbf{1}_{\{\eta\le Z_1\le b\}}\right](1+o(1))\,e^{3}.$$
The following intermediate result is proved at the end of this subsection.

Lemma 5. Assume $\alpha,\beta\ge0$, $\alpha m+\beta+2-\log m\le0$ for some $1\le m\le n$ such that $m=\omega(1)$, and $Z\sim\mathrm{Poi}(\lambda)$ for some $0<\lambda\le1$.

• If $\ell=\omega(1)$ and $\beta\le\log(\ell)-3$,

$$\mathbb{E}\left[\exp\left(\alpha Z^2+\beta Z\right)\mathbf{1}_{\{\ell\le Z\le m\}}\right] = o(1). \tag{86}$$

• If $\ell=0$ and $\beta=o(1)$,

$$\mathbb{E}\left[\exp\left(\alpha Z^2+\beta Z\right)\mathbf{1}_{\{\ell\le Z\le m\}}\right] = 1+o(1). \tag{87}$$

If $a=\omega(1)$, applying (86) yields (I) = o(1) and (II) = o(1), hence the desired (47). If $a=0$, applying both (86) and (87), we get (I) = 1 + o(1) and (II) = o(1), and hence the desired (48). Finally, to show (49), note that using (28), we have $N_1 = \binom{n_1}{2}+n_2\le n_1^2/2+n_2$. Then we have
$$\mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\exp\left(\tau N_1+\tau^2N_2\right)\right]\le\mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\exp\left(\tau n_1^2/2+\tau n_2+\tau^2N_2\right)\right].$$
Thus the desired bound follows from applying (48) with $\mu=\tau/2$, $\nu=0$, $a=0$, and $b=n$.

Proof of Lemma 5. To show (86), note that
$$\mathbb{E}\left[\exp\left(\alpha Z^2+\beta Z\right)\mathbf{1}_{\{\ell\le Z\le m\}}\right] = e^{-\lambda}\sum_{a_1=\ell}^{m}\frac{\lambda^{a_1}\exp\left(\alpha a_1^2+\beta a_1\right)}{a_1!}\overset{(a)}{\le}e^{-\lambda}\sum_{a_1=\ell}^{m}\lambda^{a_1}\exp\left(\alpha a_1^2+\beta a_1-a_1\log a_1+a_1\right)\overset{(b)}{\le}e^{-\lambda}\sum_{a_1=\ell}^{m}\lambda^{a_1}\exp(-a_1) = o(1),$$

where (a) holds due to $a_1!\ge(a_1/e)^{a_1}$ for $a_1\ge1$; (b) follows from the claim that $\alpha x+\beta+2-\log x\le0$ for $\ell\le x\le m$. To see this, define $f(x) = \frac{\log x-\beta-2}{x}$. Then by assumption $f(m)\ge\alpha$. Since $f'(x)\le0$ for $x\ge e^{\beta+3}$ and $\ell\ge e^{\beta+3}$ by assumption, it follows that $f(x)\ge\alpha$ for all $\ell\le x\le m$. Next we prove (87). Since $\alpha m+\beta+2-\log m\le0$ for $m=\omega(1)$, we have $\alpha\le\log(m)/m = o(1)$. Moreover, $\beta=o(1)$ by assumption. Therefore there exists some $t=\omega(1)$ such that $\ell = 0<t\le m$ and $\alpha t^2+\beta t = o(1)$. Then

$$\mathbb{E}\left[\exp\left(\alpha Z^2+\beta Z\right)\mathbf{1}_{\{\ell\le Z\le m\}}\right] = \mathbb{E}\left[\exp\left(\alpha Z^2+\beta Z\right)\mathbf{1}_{\{0\le Z\le t\}}\right]+\mathbb{E}\left[\exp\left(\alpha Z^2+\beta Z\right)\mathbf{1}_{\{t<Z\le m\}}\right]\overset{(a)}{=}\mathbb{E}\left[\exp\left(\alpha Z^2+\beta Z\right)\mathbf{1}_{\{0\le Z\le t\}}\right]+o(1)\overset{(b)}{=}e^{-\lambda}\sum_{a_1=0}^{t}\frac{\lambda^{a_1}}{a_1!}\left(1+o(1)\right)+o(1)\overset{(c)}{=}1+o(1),$$

where (a) holds by (86); (b) holds because $\alpha a_1^2+\beta a_1 = o(1)$ for $0\le a_1\le t$; (c) holds because $\sum_{a_1=0}^{t}\frac{\lambda^{a_1}}{a_1!} = e^{\lambda}+o(1)$ for $t=\omega(1)$.
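Both regimes of Lemma 5 can be checked numerically. The sketch below (our own code; the cutoffs and parameters are illustrative choices satisfying the lemma's assumptions) evaluates the truncated expectation on the log scale for Z ~ Poi(1):

```python
import math

def trunc_expect(alpha, beta, lo, hi, lam=1.0):
    """E[exp(alpha Z^2 + beta Z) 1{lo <= Z <= hi}] for Z ~ Poi(lam),
    computed stably on the log scale via lgamma."""
    return sum(math.exp(-lam + a * math.log(lam) + alpha * a * a + beta * a
                        - math.lgamma(a + 1))
               for a in range(lo, hi + 1))

mval = 10_000
alpha = (math.log(mval) - 2.1) / mval  # so alpha*m + beta + 2 <= log m holds

# regime (86): lower cutoff ell growing (here ell = 21 > e^3) makes it vanish
small = trunc_expect(alpha, 0.0, 21, mval)
# regime (87): ell = 0 and beta = o(1) gives 1 + o(1)
near_one = trunc_expect(alpha, 1e-3, 0, mval)
print(small, near_one)
```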

A.3 Sharp threshold for dense Erdős–Rényi graphs

In this subsection, we focus on the case of a dense parent graph whose edge density satisfies
$$p\le1-\Omega(1)\quad\text{and}\quad p = n^{-o(1)}. \tag{88}$$

Recall that Theorem 3 implies that weak detection is impossible if $\rho^2 = \frac{s^2(1-p)^2}{(1-ps)^2}\le(2-\epsilon)\frac{\log n}{n}$. We improve over this condition by showing that if
$$nps^2\left(\log\frac1p-1+p\right)\le(2-\epsilon)\log n, \tag{89}$$
then weak detection is impossible. This completes the impossibility proof for Theorem 2 in the dense regime of (88).

Without loss of generality, we assume that (89) holds with equality (otherwise one can further subsample the edges), i.e.,
$$nps^2\left(\log\frac1p-1+p\right) = (2-\epsilon)\log n. \tag{90}$$

In the regime (88), under assumption (90), we have
$$nps^2 = \omega(1)\quad\text{and}\quad s = n^{-1/2+o(1)}. \tag{91}$$

As argued in Section 3.3, analogous to the Gaussian case, the unconditional second moment explodes when $\rho^2\ge(2+\epsilon)\frac{\log n}{n}$, due to the obstruction of fixed points, or more precisely, an atypically large magnitude of $\prod_{O\in\mathcal{O}_1}X_O$. By (21),

$$L(a,b) = \frac{1-\eta}{1-ps}\left(\frac{1-s}{1-\eta}\right)^{a+b}\left(\frac{s(1-\eta)}{\eta(1-s)}\right)^{ab},$$

where $\eta\triangleq\frac{ps(1-s)}{1-ps}$, and then by (30),
$$\prod_{O\in\mathcal{O}_1}X_O = \prod_{i<j:\,i,j\in F}X_{ij} = \left(\frac{1-\eta}{1-ps}\right)^{2\binom{n_1}{2}}\left(\frac{1-s}{1-\eta}\right)^{2e_A(F)+2e_{B_\pi}(F)}\left(\frac{s(1-\eta)}{\eta(1-s)}\right)^{2e_{A\wedge B_\pi}(F)}, \tag{92}$$
where $F$ denotes the set of fixed points of $\sigma = \pi^{-1}\circ\widetilde\pi$ (on which π and $\widetilde\pi$ coincide, so that $X_{ij} = L(A_{ij},B_{\pi(i)\pi(j)})^2$), and
$$e_A(F) = \sum_{i<j:\,i,j\in F}A_{ij},\qquad e_{B_\pi}(F) = \sum_{i<j:\,i,j\in F}B_{\pi(i)\pi(j)},\qquad e_{A\wedge B_\pi}(F) = \sum_{i<j:\,i,j\in F}A_{ij}B_{\pi(i)\pi(j)}.$$

Since $s>\eta$, it follows that $\frac{1-s}{1-\eta}<1$ and $\frac{s(1-\eta)}{(1-s)\eta}>1$. Thus, when $e_{A\wedge B_\pi}(F)$ is atypically large, $\prod_{i<j:\,i,j\in F}X_{ij}$ is atypically large, causing the second moment to explode. This motivates us to truncate on the typical behavior of $e_{A\wedge B_\pi}$. For each $k\in[n]$, define
$$\zeta(k)\triangleq\binom{k}{2}ps^2\exp\left(1+W\left(\frac{2\log(2en/k)}{e(k-1)ps^2}-\frac1e\right)\right), \tag{93}$$
where $W$ is the Lambert W function, defined on $[-1/e,\infty)$ as the unique solution $W(x)\ge-1$ of $W(x)e^{W(x)} = x$. Let

$$\alpha\triangleq p\left(\log\frac1p-1+p\right). \tag{94}$$

For each $S\subset[n]$, define the event
$$\mathcal{E}_S\triangleq\left\{(A,B,\pi):\ e_A(S),\,e_{B_\pi}(S)\ge\binom{|S|}{2}ps-\sqrt{2\binom{|S|}{2}ps\,|S|\log\frac{2en}{|S|}},\quad e_{A\wedge B_\pi}(S)\le\zeta(|S|)\right\}. \tag{95}$$

We condition on the event
$$\mathcal{E}\triangleq\bigcap_{S\subset[n]:\,\alpha n\le|S|\le n}\mathcal{E}_S. \tag{96}$$

As previously explained in Section 4.1 for the Gaussian case, since we cannot condition on the set $F$, in order to truncate $\prod_{i<j:\,i,j\in F}X_{ij}$ we condition on the event $\mathcal{E}$, which controls $e_A(S)$, $e_{B_\pi}(S)$, and $e_{A\wedge B_\pi}(S)$ simultaneously for all sufficiently large subsets $S$.

Lemma 6. Suppose $\alpha n = \omega(1)$. Then $\mathbb{P}\left((A,B,\pi)\in\mathcal{E}\right) = 1-e^{-\Omega(\alpha n)}$.

Proof. Fix an integer $\alpha n\le k\le n$ and let $m = \binom{k}{2}$. Let
$$t = \sqrt{2mps\log(1/\delta)},\qquad t' = mps^2\exp\left(1+W\left(\frac{\log(1/\delta)}{emps^2}-\frac1e\right)\right),$$
for a parameter δ to be specified later. Fix a subset $S\subset[n]$ with $|S| = k$. As $e_A(S)\sim\mathrm{Binom}(m,ps)$ and $e_{B_\pi}(S)\sim\mathrm{Binom}(m,ps)$, using the Chernoff bound for binomial distributions (118), we get that with probability at least $1-2\delta$, $e_A(S)\ge mps-t$ and $e_{B_\pi}(S)\ge mps-t$. Moreover, since $e_{A\wedge B_\pi}(S)\sim\mathrm{Binom}(m,ps^2)$, using the multiplicative Chernoff bound for binomial distributions (119), we get that with probability at least $1-\delta$, $e_{A\wedge B_\pi}(S)\le t'$.

Now, there are $\binom{n}{k}\le\left(\frac{en}{k}\right)^k$ different choices of $S\subset[n]$ with $|S| = k$. Thus, by choosing $1/\delta = \left(\frac{2en}{k}\right)^k$ and applying the union bound, we get that with probability at least $1-3\sum_{k=\alpha n}^{n}2^{-k} = 1-e^{-\Omega(\alpha n)}$, $e_A(S)\ge mps-t$, $e_{B_\pi}(S)\ge mps-t$, and $e_{A\wedge B_\pi}(S)\le t'$ for all $S\subset[n]$ with $|S| = k$ and all $\alpha n\le k\le n$.

Now we compute the conditional second moment. By Lemma 6, it follows from (52) that
$$\mathbb{E}_Q\left[\left(\frac{P_0(A,B)}{Q(A,B)}\right)^2\right] = (1+o(1))\,\mathbb{E}_{\pi\perp\!\!\!\perp\widetilde\pi}\left[\mathbb{E}_Q\left[\prod_{O\in\mathcal{O}}X_O\mathbf{1}_{\{(A,B,\pi)\in\mathcal{E}\}}\mathbf{1}_{\{(A,B,\widetilde\pi)\in\mathcal{E}\}}\right]\right].$$

To proceed further, we fix $\pi,\widetilde\pi$ and separately consider the following two cases. Recall that α is defined in (94).

Case 1: $n_1\le\alpha n$. In this case, we simply drop the indicators and use the unconditional second moment:
$$\mathbb{E}_Q\left[\prod_{O\in\mathcal{O}}X_O\mathbf{1}_{\{(A,B,\pi)\in\mathcal{E}\}}\mathbf{1}_{\{(A,B,\widetilde\pi)\in\mathcal{E}\}}\right]\le\mathbb{E}_Q\left[\prod_{O\in\mathcal{O}}X_O\right] = \prod_{O\in\mathcal{O}}\left(1+\rho^{2|O|}\right),$$
where the equality follows from (38).

45 Case 2: n1 > αn. In this case, we have that " # " # Y (a) Y X 1 1 ≤ X 1 EQ O {(A,B,π)∈E} {(A,B,πe)∈E} EQ O {(A,B,π)∈EF } O∈O O∈O   (b) Y Y = EQ [XO] EQ  XO1{(A,B,π)∈EF } O/∈O1 O∈O1   (c) Y  2|O| Y = 1 + ρ EQ  Xij1{(A,B,π)∈EF } , O/∈O1 i αn;(b) holds because XO is a function of (Aij,Bπ(i)π(j))(i,j) that are independent across different O ∈ O, and 1{(A,B,π)∈E } only  F depends on (Aij,Bπ(i)π(j))(i,j)∈O : O ∈ O1 ;(c) follows from (38). n1 Let m = 2 . Under event EF , we have that eA(F ) ≥ (1+o(1))mps and eBπ (F ) ≥ (1+o(1))mps. This is because by (95),     n1 log(n/n1) 2 log(n/n1) (a) log(1/α) (b) 1 (c) = = O = Θ 2 = o(1), mps (n1 − 1)ps αnps np s

where (a) holds because $n_1 \ge \alpha n$; (b) holds due to $\frac{1}{\alpha}\log\frac{1}{\alpha} = \Theta(1/p)$; (c) holds because $p = n^{-o(1)}$ and $s = n^{-1/2+o(1)}$. Moreover, under the event $\mathcal{E}_F$, $e_{A\wedge B^{\pi}}(F) \le \zeta(n_1)$. Let

$$\gamma \equiv \gamma(n_1) = \frac{2\log(2en/n_1)}{(n_1-1)ps^2}. \tag{97}$$

Then $\zeta(n_1) = mps^2\exp\left(1+W\left(\frac{\gamma-1}{e}\right)\right)$. The following lemma characterizes the behavior of $\zeta(n_1)$ in three asymptotic regimes, depending on $\gamma$.

Lemma 7. • If $\gamma = o(1)$, then $\zeta(n_1) = (1+o(1))mps^2$.

• If $\gamma = \Theta(1)$, then $\zeta(n_1) = \Theta(mps^2)$. In particular, for all $n_1 \ge \alpha n$, we have $\zeta(n_1) = o(ms^2)$.

• If $\gamma = \omega(1)$, then $\zeta(n_1) \le (e+o(1))mps^2\gamma/\log\gamma$. In particular, for all $n_1 \ge \alpha n$, $\zeta(n_1) = o(ms^2)$.

Proof. • $\gamma = o(1)$. Using the approximation $W\left(\frac{\gamma-1}{e}\right) = -1+\sqrt{2\gamma}+O(\gamma)$ [CGH+96, eq. (4.22)], we get that $\zeta(n_1) = (1+o(1))mps^2$.

• $\gamma = \Theta(1)$. In this regime, $W\left(\frac{\gamma-1}{e}\right) = \Theta(1)$ and thus $\zeta(n_1) = \Theta(mps^2)$. In particular, for all $n_1 \ge \alpha n$, we have $\zeta(n_1) = \Theta(mps^2) = o(ms^2)$. To see this, note that $\frac{1}{\alpha}\log\frac{1}{\alpha} = \Theta(1/p)$. Thus, $\gamma = O\left(\frac{\log(1/\alpha)}{\alpha nps^2}\right) = O\left(\frac{1}{np^2s^2}\right)$. Hence $p = O\left(\frac{1}{nps^2\gamma}\right) = o(1)$, as $nps^2 = \omega(1)$ and $\gamma = \Theta(1)$.

• $\gamma = \omega(1)$. Using the approximation $W(x) = \log x - \log\log x + o(1)$ as $x\to\infty$ [HH08, Theorem 2.7], we get that $\zeta(n_1) \le (e+o(1))mps^2\gamma/\log\gamma = (2e+o(1))n_1\log(2en/n_1)/\log\gamma$. Moreover, for all $n_1 \ge \alpha n$,
$$\frac{\zeta(n_1)}{ms^2} = O\left(\frac{n_1\log(n/n_1)}{ms^2\log\gamma}\right) = O\left(\frac{\log(1/\alpha)}{\alpha ns^2\log\gamma}\right) = \Theta\left(\frac{1}{nps^2\log\gamma}\right) = o(1).$$
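As a quick numerical illustration (outside the proof), one can check the two Lambert-W approximations invoked above. The sketch below assumes a small helper `lambert_w` (not from the paper) implemented via Newton iteration on $we^w = x$:

```python
import math

def lambert_w(x, tol=1e-12):
    # Principal branch of W, solving w*exp(w) = x for x >= -1/e (Newton iteration).
    if x > math.e:
        w = math.log(x) - math.log(math.log(x))       # asymptotic start for large x
    else:
        w = math.sqrt(2.0 * (math.e * x + 1.0)) - 1.0  # start near the branch point -1/e
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

# gamma = o(1): W((gamma-1)/e) = -1 + sqrt(2*gamma) + O(gamma), so that
# zeta(n1)/(m p s^2) = exp(1 + W((gamma-1)/e)) -> 1.
for gamma in [1e-2, 1e-3]:
    ratio = math.exp(1.0 + lambert_w((gamma - 1.0) / math.e))
    assert abs(ratio - (1.0 + math.sqrt(2.0 * gamma))) < 2.0 * gamma

# gamma = omega(1): W(x) = log(x) - log(log(x)) + o(1), which gives
# exp(1 + W((gamma-1)/e)) <= e * gamma / log(gamma) for large gamma.
for gamma in [1e3, 1e6]:
    ratio = math.exp(1.0 + lambert_w((gamma - 1.0) / math.e))
    assert ratio <= math.e * gamma / math.log(gamma)
```

The parameter values are illustrative only; the assertions simply confirm the two asymptotic regimes of $\zeta(n_1)$ numerically.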

In view of Lemma 7, we get that $\zeta(n_1) \le mps^2 + o(ms^2)$ for all $n_1 \ge \alpha n$. For ease of notation, we henceforth write $\zeta(n_1)$ simply as $\zeta$. It follows from (92) that
$$\mathbb{E}_Q\left[\prod_{i<j:\,i,j\in F} X_{ij}\,\mathbf{1}_{\{(A,B,\pi)\in\mathcal{E}_F\}}\right] = \left(\frac{1-\eta}{1-ps}\right)^{2\binom{n_1}{2}}\mathbb{E}_Q\left[\left(\frac{1-s}{1-\eta}\right)^{2e_A(F)+2e_{B^{\pi}}(F)}\left(\frac{s(1-\eta)}{(1-s)\eta}\right)^{2e_{A\wedge B^{\pi}}(F)}\mathbf{1}_{\{(A,B,\pi)\in\mathcal{E}_F\}}\right]$$
$$\le \left(\frac{1-\eta}{1-ps}\right)^{2m}\left(\frac{1-s}{1-\eta}\right)^{(4+o(1))mps}\mathbb{E}_Q\left[\left(\frac{s(1-\eta)}{(1-s)\eta}\right)^{2e_{A\wedge B^{\pi}}(F)}\mathbf{1}_{\{e_{A\wedge B^{\pi}}(F)\le\zeta\}}\right]$$
$$= \exp\left(-(2+o(1))mps^2(1-p)\right)\mathbb{E}_Q\left[\left(\frac{s(1-\eta)}{(1-s)\eta}\right)^{2e_{A\wedge B^{\pi}}(F)}\mathbf{1}_{\{e_{A\wedge B^{\pi}}(F)\le\zeta\}}\right],$$

where the last equality holds because, in view of $\eta = \frac{ps(1-s)}{1-ps}$ and $s = o(1)$,

$$\log\frac{1-\eta}{1-ps} = \log\left(1+\frac{ps^2(1-p)}{(1-ps)^2}\right) = (1+o(1))ps^2(1-p),$$
$$\log\frac{1-\eta}{1-s} = \log\left(1+\frac{s(1-p)}{(1-s)(1-ps)}\right) = (1+o(1))s(1-p).$$

Let
$$u = \left(\frac{s(1-\eta)}{(1-s)\eta}\right)^2 = \left(\frac{(1-\eta)(1-ps)}{p(1-s)^2}\right)^2 = (1+o(1))p^{-2}.$$
Then for any $\lambda\in[0,1]$,
$$\mathbb{E}_Q\left[\left(\frac{s(1-\eta)}{(1-s)\eta}\right)^{2e_{A\wedge B^{\pi}}(F)}\mathbf{1}_{\{e_{A\wedge B^{\pi}}(F)\le\zeta\}}\right] \le \mathbb{E}_Q\left[u^{\lambda e_{A\wedge B^{\pi}}(F)+(1-\lambda)\zeta}\right] = u^{(1-\lambda)\zeta}\left(1+p^2s^2\left(u^\lambda-1\right)\right)^m.$$

Optimizing over $\lambda\in[0,1]$, or equivalently, over $y = u^\lambda\in[1,u]$, we get (by taking the log and differentiating) that

$$\inf_{1\le y\le u}\left(1+p^2s^2(y-1)\right)^m y^{-\zeta} = \left(\frac{m(1-p^2s^2)}{m-\zeta}\right)^m\left(\frac{\zeta(1-p^2s^2)}{p^2s^2(m-\zeta)}\right)^{-\zeta},$$

where the infimum is achieved at $y^* = \frac{\zeta(1-p^2s^2)}{p^2s^2(m-\zeta)}$. Note that

$$1\le y^*\le u \iff mp^2s^2 \le \zeta \le \frac{mup^2s^2}{1+up^2s^2-p^2s^2}.$$

Since mps2 ≤ ζ ≤ mps2 + o(ms2), u = (1 + o(1))p−2, s = o(1), and p is bounded away from 1, it follows that y∗ ∈ [1, u].
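The closed-form infimum above can be sanity-checked numerically against a grid search; the sketch below uses illustrative parameter values (not from the paper), chosen only so that $y^*$ lies in $[1,u]$:

```python
import math

# Grid-search check of
#   inf_{1<=y<=u} (1 + q(y-1))^m * y^(-zeta),  q = p^2 s^2,
# against the closed form attained at y* = zeta(1-q)/(q(m-zeta)).
p, s, m, zeta = 0.3, 0.05, 2000.0, 1.5  # illustrative values only
q = p * p * s * s
u = 1.0 / (p * p)

def log_obj(y):
    # log of (1 + q(y-1))^m * y^(-zeta)
    return m * math.log1p(q * (y - 1.0)) - zeta * math.log(y)

y_star = zeta * (1.0 - q) / (q * (m - zeta))
closed_form = m * math.log(m * (1.0 - q) / (m - zeta)) - zeta * math.log(y_star)

grid = [1.0 + (u - 1.0) * i / 200000 for i in range(200001)]
grid_min = min(log_obj(y) for y in grid)

assert 1.0 <= y_star <= u
assert abs(grid_min - closed_form) < 1e-8   # grid minimum matches the closed form
assert grid_min >= closed_form - 1e-12      # closed form is indeed a lower bound
```

The comparison is done on the log scale, which is numerically more stable for large $m$.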

Putting everything together, we get that
$$\mathbb{E}_Q\left[\prod_{i<j:\,i,j\in F} X_{ij}\,\mathbf{1}_{\{(A,B,\pi)\in\mathcal{E}_F\}}\right] \le \exp\left(-mps^2(2-p)+\zeta\log\frac{ems^2}{\zeta}+o(\zeta)\right).$$
Hence,
$$\mathbb{E}_Q\left[\left(\frac{P_0(A,B)}{Q(A,B)}\right)^2\right] \le (1+o(1))\,\mathbb{E}\left[\prod_{O\in\mathcal{O}}\left(1+\rho^{2|O|}\right)\mathbf{1}_{\{n_1\le\alpha n\}}\right] + (1+o(1))\,\mathbb{E}\left[\prod_{O\notin\mathcal{O}_1}\left(1+\rho^{2|O|}\right)\exp\left(-mps^2(2-p)+\zeta\log\frac{ems^2}{\zeta}+o(\zeta)\right)\mathbf{1}_{\{n_1>\alpha n\}}\right].$$

Recall that $\rho = \frac{s(1-p)}{1-ps} \le s$ and, by (28),
$$\prod_{O\notin\mathcal{O}_1}\left(1+\rho^{2|O|}\right) = \left(1+\rho^2\right)^{n_2}\prod_{k\ge 2}\left(1+\rho^{2k}\right)^{N_k} = (1+o(1))\left(1+\rho^2\right)^{n_2}\left(1+\rho^4\right)^{N_2} \le (1+o(1))\exp\left(s^2n_2+s^4N_2\right),$$
where the first equality follows from (28); the second equality holds because $\prod_{k\ge 3}\left(1+\rho^{2k}\right)^{N_k} \le \exp\left(n^2\rho^6/2\right) = 1+o(1)$ by (45) and $\rho \le s = n^{-1/2+o(1)}$ in view of (91). Similarly,
$$\prod_{O\in\mathcal{O}_1}\left(1+\rho^{2|O|}\right) = \left(1+\rho^2\right)^{\binom{n_1}{2}} \le \exp\left(s^2n_1^2/2\right).$$
Hence,
$$\mathbb{E}_Q\left[\left(\frac{P_0(A,B)}{Q(A,B)}\right)^2\right] \le (1+o(1))\,\mathbb{E}\left[\exp\left(s^2n_1^2/2+s^2n_2+s^4N_2\right)\mathbf{1}_{\{n_1\le\alpha n\}}\right] + (1+o(1))\,\mathbb{E}\left[\exp\left(s^2n_2+s^4N_2-mps^2(2-p)+\zeta\log\frac{ems^2}{\zeta}+o(\zeta)\right)\mathbf{1}_{\{n_1>\alpha n\}}\right].$$
We upper bound the two terms separately. For the first term, we apply (48) in Proposition 2 with $\mu = s^2/2$, $\nu = 0$, $\tau = s^2$, $a = 0$, and $b = \alpha n = p\left(\log\frac{1}{p}-1+p\right)n = n^{1-o(1)} = \omega(1)$. By the assumption that $nps^2\left(\log\frac{1}{p}-1+p\right) \le (2-\epsilon)\log n$, it follows that
$$\mu b+\nu+2-\log b = \frac{1}{2}nps^2\left(\log\frac{1}{p}-1+p\right)+2-(1-o(1))\log n \le \left(-\frac{\epsilon}{2}+o(1)\right)\log n+2 \le 0.$$

Thus, it follows from (48) in Proposition 2 that

$$\mathbb{E}\left[\exp\left(s^2n_1^2/2+s^2n_2+s^4N_2\right)\mathbf{1}_{\{n_1\le\alpha n\}}\right] \le 1+o(1).$$

For the second term, we further divide into three cases according to Lemma 7. We define

$$\beta = \frac{\log^2(nps^2)}{nps^2} \qquad\text{and}\qquad \beta' = \frac{\log(nps^2)}{100\,nps^2}.$$

Recall $\gamma$ as defined in (97). By Lemma 7, we have that:

(a) If $\beta n \le n_1 \le n$, then $\gamma = o(1)$ and $\zeta = (1+o(1))mps^2$;

(b) If $\beta' n \le n_1 \le \beta n$, then $\gamma \le 200+o(1)$ and $\zeta = \Theta\left(mps^2\right)$;

(c) If $\alpha n \le n_1 \le \beta' n$, then $\gamma \ge 200+o(1)$ and $\zeta = O\left(n_1\log(n/n_1)/\log\gamma\right)$.

Case 2(a): $\beta n \le n_1 \le n$. In this case, $\zeta = (1+o(1))mps^2$ and hence
$$\zeta\log\frac{ems^2}{\zeta}+o(\zeta) = (1+o(1))mps^2\log\frac{e}{p}.$$
Therefore,
$$\mathbb{E}\left[\exp\left(s^2n_2+s^4N_2-mps^2(2-p)+\zeta\log\frac{ems^2}{\zeta}+o(\zeta)\right)\mathbf{1}_{\{n_1\ge\beta n\}}\right] \le \mathbb{E}\left[\exp\left(s^2n_2+s^4N_2+\frac{1+o(1)}{2}n_1^2ps^2\left(\log\frac{1}{p}-1+p\right)\right)\mathbf{1}_{\{n_1\ge\beta n\}}\right] = o(1),$$

where the first inequality uses the fact that $m \le n_1^2/2$; the last equality holds by invoking (47) in Proposition 2 with $\mu = \frac{1+o(1)}{2}ps^2\left(\log\frac{1}{p}-1+p\right)$, $\nu = 0$, $\tau = s^2$, $a = \beta n = \omega(1)$, $b = n = \omega(1)$, and
$$\mu b+\nu+2-\log b = \frac{1+o(1)}{2}nps^2\left(\log\frac{1}{p}-1+p\right)+2-\log n = -\left(\frac{\epsilon}{2}+o(1)\right)\log n+2 \le 0,$$
where the last equality follows from the assumption that $nps^2\left(\log\frac{1}{p}-1+p\right) = (2-\epsilon)\log n$. When $p$ is a fixed constant bounded away from $1$, as $\alpha = \Theta(1)$ and $\beta = o(1)$, we have $\beta = o(\alpha)$; thus the regime $\alpha n \le n_1 \le \beta n$ is vacuous. Thus henceforth we assume $p = o(1)$.

Case 2(b): $\beta' n \le n_1 \le \beta n$. In this case, $\gamma \le 200+o(1)$ and $\zeta = O\left(mps^2\right)$, so that

$$\zeta\log\frac{ems^2}{\zeta}+o(\zeta) \le c_1 mps^2\log\frac{1}{p}$$
for a universal constant $c_1$. Therefore,
$$\mathbb{E}\left[\exp\left(s^2n_2+s^4N_2-mps^2(2-p)+\zeta\log\frac{ems^2}{\zeta}+o(\zeta)\right)\mathbf{1}_{\{\beta'n\le n_1\le\beta n\}}\right] \le \mathbb{E}\left[\exp\left(s^2n_2+s^4N_2+c_1n_1^2ps^2\log\frac{1}{p}\right)\mathbf{1}_{\{\beta'n\le n_1\le\beta n\}}\right] = o(1),$$

where the last equality holds by invoking Proposition 2 with $\mu = c_1ps^2\log\frac{1}{p}$, $\nu = 0$, $\tau = s^2$, $a = \beta'n = \omega(1)$, and $b = \beta n = \omega(1)$. Note that the conditions of Proposition 2 for $a = \omega(1)$ are satisfied; in particular, $\mu b+2-\log b \le 0$ for $n$ sufficiently large. To see this, on the one hand,
$$\mu b = c_1\beta nps^2\log\frac{1}{p} = O(\beta\log n) = o(\log n),$$
where we used the fact that $nps^2\log\frac{1}{p} = O(\log n)$ and $\beta = o(1)$; on the other hand, $\log b = \log(\beta n) = (1+o(1))\log n$, as $\log\beta = o(\log n)$ by our choice of $\beta$.

Case 2(c): $\alpha n \le n_1 \le \beta' n$. In this case, $\gamma \ge 200+o(1)$ and

$$\zeta = O\left(\frac{n_1\log(n/n_1)}{\log\gamma}\right),$$
so that
$$\zeta\log\frac{ems^2}{\zeta}+o(\zeta) \le c_1n_1\,\frac{\log(n/n_1)}{\log\gamma}\log\frac{n_1s^2\log\gamma}{\log(n/n_1)} \eqqcolon n_1\psi,$$
for a universal constant $c_1$. Therefore,
$$\mathbb{E}\left[\exp\left(s^2n_2+s^4N_2-mps^2(2-p)+\zeta\log\frac{ems^2}{\zeta}+o(\zeta)\right)\mathbf{1}_{\{\alpha n\le n_1\le\beta'n\}}\right] \le \mathbb{E}\left[\exp\left(s^2n_2+s^4N_2+n_1\psi\right)\mathbf{1}_{\{\alpha n\le n_1\le\beta'n\}}\right] = o(1),$$

where the last equality holds by invoking (47) in Proposition 2 with $\mu = 0$, $\nu = \max_{\alpha n\le n_1\le\beta'n}\psi(n_1)$, $a = \alpha n$, $b = \beta'n$, and the claim that $\nu = o(\log(\alpha n)) = o(\log a)$. Note that the conditions of Proposition 2 for $a = \omega(1)$ are satisfied; in particular, $\nu+3 \le \log a$ and $\mu b+\nu+2-\log b = \nu+2-\log b \le \nu+2-\log a \le 0$ for $n$ sufficiently large.

To finish the proof, it remains to verify the claim that $\psi = o(\log(\alpha n))$ for $\alpha n \le n_1 \le \beta'n$. Note that $\log(\alpha n) = (1+o(1))\log n$, as $\log\alpha = o(\log n)$ by the choice of $\alpha$ and the assumption that $p = n^{-o(1)}$. Thus it suffices to show that $\psi = o(\log n)$, i.e.,
$$\frac{\log(n/n_1)}{\log\gamma}\log\frac{n_1s^2\log\gamma}{\log(n/n_1)} = o(\log n).$$
Note that
$$\frac{\log(n/n_1)}{\log\gamma}\log\log\gamma \le \log(n/n_1) \le \log\frac{1}{\alpha} = o(\log n).$$
Thus it suffices to show
$$\frac{\log(n/n_1)}{\log\gamma}\log\frac{n_1s^2}{\log(n/n_1)} = o(\log n),$$
which further reduces to proving that $\max_{x\in[1/\beta',1/\alpha]} g(x) = o(\log n)$, where
$$g(x) = \frac{\log x}{\log\frac{x\log x}{nps^2}}\log\frac{ns^2}{x\log x},$$
since $\gamma = O\left(\frac{x\log x}{nps^2}\right)$ for $x = n/n_1\in[1/\beta',1/\alpha]$. To this end, let

$$\delta = \frac{1}{\log\left(\log n/\log(1/\alpha)\right)} = o(1),$$

where the last equality holds because $\log(1/\alpha) = (1+o(1))\log(1/p) = o(\log n)$ by the choice of $\alpha$ and the assumption that $p = n^{-o(1)}$. Let

$$\phi(x) = \log x\,\log\frac{ns^2}{x\log x} - \log\frac{x\log x}{nps^2}\,\delta\log n.$$

It suffices to show that $\phi(x) \le 0$ for all $x\in[1/\beta',1/\alpha]$, which immediately implies that $g(x) \le \delta\log n = o(\log n)$. Taking the derivative of $\phi(x)$, we get that

$$\phi'(x) = \frac{1}{x}\log\frac{ns^2}{x\log x} - \frac{\log x}{x}\left(1+\frac{1}{\log x}\right) - \frac{1}{x}\left(1+\frac{1}{\log x}\right)\delta\log n = \frac{1}{x}\left(\log(ns^2)-2\log x-\log\log x-1-\delta\log n-\frac{\delta\log n}{\log x}\right).$$

By the assumption that $ns^2\alpha = (2-\epsilon)\log n$, we get that
$$\log(ns^2) \le \log(2\log n)+\log\frac{1}{\alpha} = o(\delta\log n),$$
where the last equality uses the fact that $\frac{\log(1/\alpha)}{\delta\log n} = \frac{\log(1/\alpha)}{\log n}\log\frac{\log n}{\log(1/\alpha)} = o(1)$. Moreover, $\beta' = o(1)$ and hence $x = \omega(1)$. Therefore, $\phi'(x) \le 0$ for all $x\in[1/\beta',1/\alpha]$. Hence, to show $\phi(x)\le 0$, it suffices to prove $\phi(1/\beta')\le 0$. Note that
$$\phi(1/\beta') = \log\frac{1}{\beta'}\log\frac{ns^2}{(1/\beta')\log(1/\beta')} - \log\frac{(1/\beta')\log(1/\beta')}{nps^2}\,\delta\log n = \log\frac{1}{\beta'}\log\frac{ns^2}{(200+o(1))nps^2} - \log\frac{(200+o(1))nps^2}{nps^2}\,\delta\log n = \log\frac{1}{\beta'}\log\frac{1}{(200+o(1))p} - \log(200+o(1))\,\delta\log n \le 0,$$
where the last inequality holds because
$$\frac{1}{\delta}\log\frac{1}{\beta'} \le \log\frac{200nps^2}{\log(nps^2)}\log\frac{\log n}{\log(1/\alpha)} \le O\left(\log^2\frac{\log n}{\log(1/p)}\right) = o\left(\frac{\log n}{\log(1/p)}\right),$$
where we used the fact that $nps^2 = O(\log n/\log(1/p))$ and $\log(1/\alpha) = (1+o(1))\log(1/p)$ when $p = o(1)$.

B Supplementary Proofs for Section 5

B.1 Proof of Proposition 3: Long orbits

Proof. Fix any $\sigma = \pi^{-1}\circ\widetilde{\pi}$. Since $\{X_O\}_{O\notin\mathcal{O}_k}$ are mutually independent, it follows that
$$\mathbb{E}_Q\left[\prod_{O\notin\mathcal{O}_k} X_O\right] = \prod_{O\notin\mathcal{O}_k}\mathbb{E}_Q\left[X_O\right] = \prod_{O\notin\mathcal{O}_k}\left(1+\rho^{2|O|}\right).$$

For any $O_{ij}\notin\mathcal{O}_k$, either $i$ or $j$ is from a node orbit with length larger than $k$, or $O_{ij}$ has length larger than $k$. By Section 5.1, we know that $|O_{ij}| \ge \left\lceil\frac{k+1}{2}\right\rceil$. It follows that
$$\mathbb{E}_Q\left[\prod_{O\notin\mathcal{O}_k} X_O\right] \le \prod_{m=\lceil\frac{k+1}{2}\rceil}^{\binom{n}{2}}\left(1+\rho^{2m}\right)^{N_m} \overset{(a)}{\le} \left(1+\rho^{k}\right)^{\sum_{m\ge\lceil\frac{k+1}{2}\rceil}N_m} \overset{(b)}{\le} \left(1+\rho^{k}\right)^{\frac{n^2}{k}},$$
where (a) holds since $1+\rho^{2m}$ decreases as $m$ increases, and $1+\rho^{2m} \le 1+\rho^{k}$ for any $m \ge \lceil\frac{k+1}{2}\rceil$; (b) holds since the total number of edges is $\binom{n}{2}$, and the total number of edge orbits with length at least $\lceil\frac{k+1}{2}\rceil$ is at most $\binom{n}{2}/\lceil\frac{k+1}{2}\rceil \le \frac{n^2}{k}$.

B.2 Proof of Proposition 4: Incomplete orbits

Proof. Fix any $\sigma = \pi^{-1}\circ\widetilde{\pi}$. Since edge orbits are disjoint, and $A_{ij}$ and $B_{ij}$ are i.i.d. $\mathrm{Bern}(ps)$ across all $1\le i<j\le n$ under $Q$, it follows that $\{A_{ij},B_{\pi(i)\pi(j)}\}_{(i,j)\in O}$ are mutually independent across different edge orbits $O$ under $Q$. Recall that an edge orbit $O\in\mathcal{J}_k$ if and only if $A_{ij} = B_{\pi(i)\pi(j)} = 1$ for all $(i,j)\in O$. Therefore, conditional on $\mathcal{J}_k = \mathcal{J}$, $\{A_{ij},B_{\pi(i)\pi(j)}\}_{(i,j)\in O}$ are independent across all edge orbits $O\in\mathcal{O}_k\setminus\mathcal{J}$. In particular, the distribution of $\{A_{ij},B_{\pi(i)\pi(j)}\}_{(i,j)\in O}$ for $O\notin\mathcal{J}$ conditional on $\mathcal{J}_k = \mathcal{J}$ is the same as that conditional on $O\notin\mathcal{J}_k$. Since $X_O$ is a function of $\{A_{ij},B_{\pi(i)\pi(j)}\}_{(i,j)\in O}$, it follows that
$$\mathbb{E}_Q\left[\prod_{O\in\mathcal{O}_k\setminus\mathcal{J}_k} X_O \,\middle|\, \mathcal{J}_k=\mathcal{J}\right] = \prod_{O\in\mathcal{O}_k\setminus\mathcal{J}}\mathbb{E}_Q\left[X_O \mid \mathcal{J}_k=\mathcal{J}\right] = \prod_{O\in\mathcal{O}_k\setminus\mathcal{J}}\mathbb{E}_Q\left[X_O \mid O\notin\mathcal{J}_k\right].$$

Therefore, to prove Proposition 4, it suffices to show $\mathbb{E}_Q\left[X_O \mid O\notin\mathcal{J}_k\right] \le 1$. Note that
$$\mathbb{E}_Q\left[X_O \mid O\notin\mathcal{J}_k\right] = \frac{\mathbb{E}_Q\left[X_O\mathbf{1}_{\{O\notin\mathcal{J}_k\}}\right]}{\mathbb{P}\left(O\notin\mathcal{J}_k\right)} = \frac{\mathbb{E}_Q\left[X_O\right]-\mathbb{E}_Q\left[X_O\mathbf{1}_{\{O\in\mathcal{J}_k\}}\right]}{1-\mathbb{P}\left(O\in\mathcal{J}_k\right)}.$$
Recall that $O\in\mathcal{J}_k$ if and only if $A_{ij}=1$, $B_{\pi(i)\pi(j)}=1$ for all $(i,j)\in O$, in which case $X_O = \left(\frac{1}{p}\right)^{2|O|}$. Thus $\mathbb{P}\{O\in\mathcal{J}_k\} = (ps)^{2|O|}$ and
$$\mathbb{E}_Q\left[X_O\mathbf{1}_{\{O\in\mathcal{J}_k\}}\right] = \left(\frac{1}{p}\right)^{2|O|}(ps)^{2|O|} = s^{2|O|}.$$
Recall that $\mathbb{E}_Q\left[X_O\right] = 1+\rho^{2|O|}$, where $\rho = \frac{s(1-p)}{1-ps}$. Combining this with the last two displayed equations yields
$$\mathbb{E}_Q\left[X_O \mid O\notin\mathcal{J}_k\right] = \frac{1+\rho^{2|O|}-s^{2|O|}}{1-(ps)^{2|O|}} = \frac{1-s^{2|O|}\left(1-\left(\frac{1-p}{1-ps}\right)^{2|O|}\right)}{1-(ps)^{2|O|}} \le 1,$$
where the last inequality holds by the following claim: if $p\le 1/2$ and $s\le 1/2$, then
$$1-\left(\frac{1-p}{1-ps}\right)^{2|O|} \ge p^{2|O|}, \quad \forall\,|O|\ge 1. \tag{98}$$
Indeed, as $|O|$ increases, $1-\left(\frac{1-p}{1-ps}\right)^{2|O|}$ increases while $p^{2|O|}$ decreases, so it suffices to verify (98) for $|O|=1$. When $|O|=1$, we have $1-\left(\frac{1-p}{1-ps}\right)^2 \ge p^2 \iff (1-ps)^2 \ge \frac{1-p}{1+p}$, which holds when $p\le\frac{1}{2}$ and $s\le\frac{1}{2}$.
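The base case of claim (98) is elementary; as a quick sanity check (outside the proof), it can be verified numerically over a grid of $p, s \in (0, 1/2]$:

```python
# Brute-force check of the |O| = 1 base case of claim (98):
#   1 - ((1-p)/(1-ps))^2 >= p^2, equivalently (1-ps)^2 >= (1-p)/(1+p),
# for p, s on a grid in (0, 1/2].
for i in range(1, 501):
    for j in range(1, 501):
        p = i / 1000.0
        s = j / 1000.0
        assert 1.0 - ((1.0 - p) / (1.0 - p * s)) ** 2 >= p * p - 1e-12
        assert (1.0 - p * s) ** 2 * (1.0 + p) >= (1.0 - p) - 1e-12
```

The small `1e-12` slack only guards against floating-point rounding; the inequality itself is strict on the interior of the grid.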

B.3 Proof of Proposition 5: Averaging over orbit lengths

Proof. In the following proof, for any $t\in\mathbb{N}$, denote the $t$-th harmonic number by $h_t \triangleq \sum_{\ell=1}^{t}\frac{1}{\ell}$, with $h_0\triangleq 0$. Under the assumption that $s\le 0.1$, we have $2ms^m \le 0.04$ when $m\ge 2$ and $2ms^m \le 0.2$ for $m=1$. Thus, for any $1\le m\le k$,

$$1+s^m n_m\mathbf{1}_{\{m\ \mathrm{even}\}}+2s^{2m}\sum_{\ell\le m}\ell n_\ell+s^{4m}mn_{2m}\mathbf{1}_{\{2m\le k\}} \le 1+1.04\,s^m\sum_{\ell\le 2m}n_\ell\mathbf{1}_{\{\ell\le k\}}. \tag{99}$$
Hence, to prove (71), it suffices to show, for $s\le 0.1$:
$$\mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04\,s^m\sum_{\ell\le 2m}n_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{n_m}\right] = O(1). \tag{100}$$
To prove (72), it suffices to show, for $s = o(1)$:
$$\mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04\,s^m\sum_{\ell\le 2m}n_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{n_m}\right] = 1+o(1). \tag{101}$$
Pick $\eta = \log k\sqrt{\log(n/k)}$. We can write $\mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}n_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{n_m}\right]$ as $(\mathrm{I})+(\mathrm{II})$, where
$$(\mathrm{I}) = \mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}n_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{n_m}\mathbf{1}_{\{\sum_{\ell=1}^{k}n_\ell<\eta h_k\}}\right],$$
$$(\mathrm{II}) = \mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}n_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{n_m}\mathbf{1}_{\{\sum_{\ell=1}^{k}n_\ell\ge\eta h_k\}}\right].$$

To bound $(\mathrm{I})$, we first apply (122) in Lemma 13 and get
$$\mathrm{TV}\left(\mathcal{L}(n_1,\ldots,n_k),\mathcal{L}(Z_1,\ldots,Z_k)\right) \le F\left(\frac{n}{k}\right),$$
where $\mathcal{L}$ denotes the law of random variables, $Z_\ell \overset{\mathrm{ind.}}{\sim} \mathrm{Poi}\left(\frac{1}{\ell}\right)$, and $\log F\left(\frac{n}{k}\right) = -(1+o(1))\frac{n}{k}\log\frac{n}{k}$ when $k = o(n)$, with the $o(1)$ depending only on $k/n$. Then, we have
$$(\mathrm{I}) \le \mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}Z_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{Z_m}\right] + 2F\left(\frac{n}{k}\right)\left(1+1.04s\,\eta h_k\right)^{\eta h_k}. \tag{102}$$

Here the second term satisfies

$$F\left(\frac{n}{k}\right)\left(1+1.04s\,\eta h_k\right)^{\eta h_k} \overset{(a)}{\le} \exp\left(-(1+o(1))\frac{n}{k}\log\frac{n}{k}+1.04s\,\eta^2h_k^2\right) \overset{(b)}{=} o(1),$$
where (a) holds because $\log(1+x)\le x$ for $x\ge 0$; (b) holds since $h_k\le\log k+1$ and $\eta^2h_k^2 = O\left((\log k)^4\log(n/k)\right) = o\left(\frac{n}{k}\log\frac{n}{k}\right)$ under our assumption that $k(\log k)^4 = o(n)$.

To bound $(\mathrm{II})$, applying (121) in Lemma 12, we get that
$$(\mathrm{II}) \le \mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}Z_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{Z_m}\mathbf{1}_{\{\sum_{\ell=1}^{k}Z_\ell\ge\eta h_k\}}\right]e^{h_k} \le \mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}Z_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{Z_m}\exp\left(\frac{\sum_{\ell=1}^{k}Z_\ell}{\eta}\right)\right]. \tag{103}$$
Since $\eta = \log k\sqrt{\log(n/k)} = \omega(\log k)$, for $s\le 0.1$, the desired (100) follows from applying (104) in Lemma 8 to (102) and (103); for $s = o(1)$, the desired (101) follows from applying (105) in Lemma 8 to (102) and (103). Hence, our desired result follows.

Lemma 8. Suppose $\eta = \omega(\log k)$. If $s\le 0.1$, then
$$\mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}Z_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{Z_m}\exp\left(\frac{\sum_{\ell=1}^{k}Z_\ell}{\eta}\right)\right] = O(1). \tag{104}$$
In particular, if $s = o(1)$,
$$\mathbb{E}\left[\prod_{m=1}^{k}\left(1+1.04s^m\sum_{\ell\le 2m}Z_\ell\mathbf{1}_{\{\ell\le k\}}\right)^{Z_m}\exp\left(\frac{\sum_{\ell=1}^{k}Z_\ell}{\eta}\right)\right] = 1+o(1). \tag{105}$$

To prove Lemma 8 we need the following elementary result (proved at the end of this subsection):

Lemma 9. Let $a,b,d,\alpha,\lambda>0$ be such that $be^{\alpha+1}\lambda<\frac{1}{4}$. Let $X\sim\mathrm{Poi}(\lambda)$. Then
$$\mathbb{E}\left[(a+bX)^{X+d}\exp(\alpha X)\right] \le a^d\left(1+2be^{\alpha+1}\lambda\right)^d\exp\left(\lambda\left(ae^{\alpha}\left(1+2be^{\alpha+1}\lambda\right)-1\right)\right) + 27\exp(-\lambda)\max\left\{(4bd)^d, a^d\right\}. \tag{106}$$

Moreover, if $d=0$ and $a,b,\alpha,\lambda$ are fixed constants such that $be^{\alpha+1}\lambda<1$, then
$$\mathbb{E}\left[(a+bX)^{X}\exp(\alpha X)\right] = O(1). \tag{107}$$

In particular, if $d=0$, $a=1$, $b=o(1)$, $\alpha=o(1)$, and $\lambda$ is a fixed constant, then
$$\mathbb{E}\left[(1+bX)^{X}\exp(\alpha X)\right] = 1+o(1). \tag{108}$$

Before proceeding to the proof of Lemma 8, we pause to note that a direct application of (107) (with $X = Z_1+\cdots+Z_k$, $a=1$, $b=1.04s$, $\alpha=1/\eta$) yields the condition $s\log k = O(1)$. Since $k$ will be chosen to be $\Theta(\log n)$ eventually, this translates into the statement that strong detection is impossible if $s = O\left(\frac{1}{\log\log n}\right)$. In order to improve this to $s = O(1)$, the key idea is to partition the product over $[k]$ in (104) into subsets and recursively peel off the expectation backwards by repeated applications of Lemma 9.

Proof of Lemma 8. Let $m_0=0$, $m_1=1$, and iteratively define
$$m_\ell = \left\lfloor\exp\left(3\times 2^{-2\ell+1}s^{-\frac{m_{\ell-1}}{2}}\right)\right\rfloor, \quad 2\le\ell\le r, \tag{109}$$
where $r$ is the integer such that $m_{r-1}<k\le m_r$. Note that for $s\le 0.1$, we have $m_2\ge 3 = 3m_1$; moreover, if $s=o(1)$, we have $m_2=\omega(1)$. Note that it is possible that $m_{r-1}$ is very close to $k$ while $m_r$ is much larger than $k$, especially when $s=o(1)$. To simplify the argument, define $K = \min\{k^2, m_r\}$. Since the quantity inside the expectation in (104) is increasing in $k$, it suffices to bound
$$\mathbb{E}\left[\prod_{m=1}^{K}\left(1+1.04s^m\sum_{\ell\le 2m}Z_\ell\mathbf{1}_{\{\ell\le K\}}\right)^{Z_m}\exp\left(\frac{\sum_{\ell=1}^{K}Z_\ell}{\eta}\right)\right],$$
and we can assume, without loss of generality, that $k=\omega(1)$. Next we partition $[K]$ into the following $r$ subsets:

$$\{m_0+1,\ldots,m_1\},\ \{m_1+1,\ldots,m_2\},\ \ldots,\ \{m_{r-1}+1,\ldots,K\}.$$
For $1\le\ell\le r$, define
$$A_\ell \triangleq \sum_{t=m_{\ell-1}+1}^{m_\ell}Z_t\mathbf{1}_{\{t\le K\}},\quad B_\ell \triangleq \sum_{t=m_{\ell-1}+1}^{2m_{\ell-1}}Z_t\mathbf{1}_{\{t\le K\}},\quad C_\ell \triangleq \sum_{t=2m_{\ell-1}+1}^{m_\ell}Z_t\mathbf{1}_{\{t\le K\}},\quad M_\ell \triangleq \sum_{t=1}^{\ell}A_t,$$
and $B_{r+1}=0$. Note that $A_\ell = B_\ell+C_\ell$ and, by the definition of $r$, $M_r = \sum_{t=1}^{K}Z_t$. Furthermore, for $1\le\ell\le r$,
$$C_\ell+B_{\ell+1} \overset{\mathrm{ind.}}{\sim} \mathrm{Poi}(\lambda_\ell), \quad\text{where } \lambda_\ell \triangleq h_{2m_\ell\wedge K}-h_{2m_{\ell-1}\wedge K}.$$
Then we can upper bound (104) as
$$(104) \le \mathbb{E}\left[\prod_{\ell=1}^{r}\left(\prod_{m=m_{\ell-1}+1}^{m_\ell}\left(1+1.04s^{m_{\ell-1}+1}\left(M_\ell+B_{\ell+1}\right)\right)^{Z_m}\right)\exp\left(\frac{M_r}{\eta}\right)\right] = \mathbb{E}\left[\prod_{\ell=1}^{r}\left(1+1.04s^{m_{\ell-1}+1}\left(M_\ell+B_{\ell+1}\right)\right)^{A_\ell}\exp\left(\frac{M_r}{\eta}\right)\right]. \tag{110}$$

Define the sequences $\{\alpha_\ell\}_{\ell=1}^{r}$ and $\{\beta_\ell\}_{\ell=1}^{r}$ backward recursively by $\alpha_r = \frac{1}{\eta}$, $\beta_r = 0$, and, for $2\le\ell\le r$,
$$\alpha_{\ell-1} \triangleq \alpha_\ell + 1.04\lambda_\ell e^{\alpha_\ell}s^{m_{\ell-1}+1}\left(1+2.08s^{m_{\ell-1}+1}e^{\alpha_\ell+1}\lambda_\ell\right) + \log\left(1+2.08s^{m_{\ell-1}+1}e^{\alpha_\ell+1}\lambda_\ell\right), \tag{111}$$
$$\beta_{\ell-1} \triangleq \beta_\ell + \lambda_\ell\left(e^{\alpha_\ell}\left(1+2.08s^{m_{\ell-1}+1}e^{\alpha_\ell+1}\lambda_\ell\right)-1\right). \tag{112}$$
For $1\le\ell\le r$, define
$$S_\ell \triangleq \mathbb{E}\left[\left(\prod_{t=1}^{\ell-1}\left(1+1.04s^{m_{t-1}+1}\left(M_t+B_{t+1}\right)\right)^{A_t}\right)\left(1+1.04s^{m_{\ell-1}+1}\left(M_\ell+B_{\ell+1}\right)\right)^{A_\ell+B_{\ell+1}}e^{\alpha_\ell\left(M_\ell+B_{\ell+1}\right)+\beta_\ell}\right]. \tag{113}$$
Since $B_{r+1}=0$, it follows that $S_r$ is precisely the RHS of (110), and our goal is to show $S_r = O(1)$. This is accomplished by the following sequence of claims:

(C1) For any $2\le\ell\le r$, $(\log m_\ell)s^{m_{\ell-1}+1} \le (\log m_\ell)^2s^{m_{\ell-1}+1} \le 9s\times 2^{-4\ell+2}$;

(C2) For any $2\le\ell\le r$, $m_\ell\ge m_{\ell-1}^2$ and $m_\ell\ge (m_2)^{2^{\ell-2}}$;

(C3) For any $2\le\ell\le r$, $(\log m_{\ell+1})s^{m_\ell} \le \frac{1}{8}(\log m_\ell)s^{m_{\ell-1}}$;

(C4) For any $2\le\ell\le r$, $\lambda_\ell\le\log m_\ell$, and $\sum_{\ell=2}^{r}\exp(-\lambda_\ell) = O(1)$; in particular, if $s=o(1)$, $\sum_{\ell=2}^{r}\exp(-\lambda_\ell) = o(1)$;

(C5) For $1\le\ell\le r$, $\alpha_\ell\le\frac{2}{5}$, and $\beta_1\le 3$; in particular, if $s=o(1)$, then $\alpha_\ell = o(1)$ for $1\le\ell\le r$, and $\beta_1 = o(1)$;

(C6) For any $2\le\ell\le r$, $S_\ell\le S_{\ell-1}\left(1+27\exp(-\lambda_\ell)\right)$, and $S_r = O(1)$; in particular, if $s=o(1)$, $S_r = 1+o(1)$.

We finish the proof by verifying these claims:

• Proof of (C1): This follows from the definition of m` given in (109).

• Proof of (C2): It suffices to prove the first inequality. We proceed by induction. For the base case $\ell=2$, recall that we have shown, under the assumption $s\le 0.1$, that $m_2\ge 3m_1\ge m_1^2$. By (109), we can get that
$$\frac{m_3}{m_2^2} \ge \frac{1}{m_2^2}\left(\exp\left(3\times 2^{-5}\times 0.1^{-\frac{m_2}{2}}\right)-1\right) \ge 2,$$
where the last inequality holds because $m_2\ge 3$ and $x\mapsto\frac{1}{x^2}\left(\exp\left(3\times 2^{-5}\times 0.1^{-\frac{x}{2}}\right)-1\right)$ increases for $x\ge 3$. Then we have $m_3\ge 2m_2^2\ge 6m_2$, given $m_2\ge 3$.

Fix any $4\le\ell\le r$ and suppose we have shown that $m_t\ge m_{t-1}^2$ for every $2\le t\le\ell-1$. By (109), $m_{\ell-1}\le\exp\left(3\times 2^{-2\ell+3}s^{-\frac{m_{\ell-2}}{2}}\right)$ and
$$m_\ell \ge \exp\left(3\times 2^{-2\ell+1}s^{-\frac{m_{\ell-1}}{2}}\right)-1 \ge \exp\left(3\times 2^{-2\ell+1}s^{-\frac{6m_{\ell-2}}{2}}\right)-1,$$
where the last inequality holds because $m_{\ell-1}\ge 6m_{\ell-2}$ for $\ell\ge 4$, following from $m_3\ge 6m_2$ and the induction hypothesis $m_{\ell-1}\ge m_{\ell-2}^2$. Then we have
$$\frac{m_\ell}{m_{\ell-1}} \ge \exp\left(3\times 2^{-2\ell+3}s^{-\frac{m_{\ell-2}}{2}}\left(2^{-2}s^{-\frac{5m_{\ell-2}}{2}}-1\right)\right)-1 \overset{(a)}{\ge} \exp\left(3\times 2^{-2\ell+3}s^{-\frac{m_{\ell-2}}{2}}\times 2\right)-1 \overset{(b)}{\ge} m_{\ell-1}^2-1 \overset{(c)}{\ge} m_{\ell-1},$$
where (a) holds by $2^{-2}s^{-\frac{5m_{\ell-2}}{2}}\ge 3$ given $s\le 0.1$ and $m_{\ell-2}\ge 3$; (b) holds by $m_{\ell-1}\le\exp\left(3\times 2^{-2\ell+3}s^{-\frac{m_{\ell-2}}{2}}\right)$; (c) holds by $m_{\ell-1}\ge 6m_{\ell-2}\ge 18$ for $\ell\ge 4$, given $m_2\ge 3$. Hence, (C2) follows.

• Proof of (C3): We prove a stronger statement: $(\log m_{\ell+1})^2s^{m_\ell} \le \frac{1}{8}(\log m_\ell)^2s^{m_{\ell-1}}$ for $2\le\ell\le r$. Since $(\log m_{\ell+1})^2s^{m_\ell} \le 9s\times 2^{-4\ell-2}$ by (C1), it suffices to show $(\log m_\ell)^2s^{m_{\ell-1}} \ge 9s\times 2^{-4\ell+1}$ for $\ell\ge 2$. For $\ell=2$, we have $(\log m_2)^2s^{m_1} \ge 9s\times 2^{-7}$, since $m_2\ge 3$ and $s\le 0.1$. By (109), we have
$$m_\ell \ge \exp\left(3\times 2^{-2\ell+1}s^{-\frac{m_{\ell-1}}{2}}\right)-1 \ge \exp\left(3\times 2^{-2\ell+\frac{1}{2}}s^{-\frac{m_{\ell-1}}{2}}\right),$$
where the last inequality holds because $\exp(x)-1\ge\exp(x/\sqrt{2})$ for $x\ge 1.22$, and $3\times 2^{-2\ell+1}s^{-\frac{m_{\ell-1}}{2}} \ge \ln m_\ell \ge 2^{\ell-2}\ln m_2 \ge 2$ for $\ell\ge 3$, by (C1), (C2), and $m_2\ge 3$. Hence $(\log m_\ell)^2s^{m_{\ell-1}} \ge \left(3\times 2^{-2\ell+\frac{1}{2}}\right)^2 = 9\times 2^{-4\ell+1} \ge 9s\times 2^{-4\ell+1}$, which completes the proof of the stronger statement.

• Proof of (C4): Note that $2m_{r-1}\le m_r$ by (C2). Moreover, $2m_{r-1}<k^2$ as $m_{r-1}<k$ and $k=\omega(1)$. Since $K=\min\{k^2,m_r\}$, it follows that $2m_{r-1}\le K\le m_r$. Therefore,
$$\lambda_r = h_K - h_{2m_{r-1}} \quad\text{and}\quad \lambda_\ell = h_{2m_\ell}-h_{2m_{\ell-1}}\ \text{ for } 1\le\ell\le r-1.$$
Since for any $n,k\in\mathbb{N}$, $h_n-h_k \le \int_{k}^{n}\frac{1}{x}\,dx \le \log\frac{n}{k}$, for any $2\le\ell\le r$,
$$\lambda_\ell \le h_{2m_\ell}-h_{2m_{\ell-1}} \le \log\frac{m_\ell}{m_{\ell-1}} \le \log m_\ell,$$
where the last inequality holds due to $m_{\ell-1}\ge 1$ for $\ell\ge 2$.
Conversely, since for any $n,k\in\mathbb{N}$, $h_n-h_k \ge \int_{k+1}^{n+1}\frac{1}{x}\,dx \ge \log\frac{n+1}{k+1} \ge \log\frac{n}{k}-1$, it follows that for any $2\le\ell\le r-1$,
$$\lambda_\ell = h_{2m_\ell}-h_{2m_{\ell-1}} \ge \log\frac{m_\ell}{m_{\ell-1}}-1,$$
and $\lambda_r \ge \log\frac{K}{2m_{r-1}}-1$. If $K=k^2$, then $K\ge km_{r-1}$ and thus $\lambda_r\ge\log\frac{k}{2}-1$; otherwise, $K=m_r$ and thus $\lambda_r\ge\log\frac{m_r}{2m_{r-1}}-1$. Hence, we get that
$$\sum_{\ell=2}^{r}\exp(-\lambda_\ell) \overset{(a)}{\le} \sum_{\ell=2}^{r}\frac{2em_{\ell-1}}{m_\ell}+\frac{2e}{k} \le \frac{2e}{m_2}+2e\sum_{\ell=3}^{r}\frac{1}{m_{\ell-1}}+o(1) \overset{(b)}{\le} \frac{3e}{m_2}+o(1) = O(1),$$
where (a) holds by $m_1=1$, $k=\omega(1)$, and (C2) so that $m_\ell\ge m_{\ell-1}^2$ for $\ell\ge 2$; (b) holds because, in view of (C2), $m_\ell\ge(m_2)^{2^{\ell-2}}$ and $m_2\ge 3$, so that
$$\sum_{\ell=3}^{r-1}\frac{1}{m_{\ell-1}} \le \sum_{\ell=3}^{r-1}\frac{1}{(m_2)^{2^{\ell-2}}} \le \sum_{\ell=2}^{\infty}\frac{1}{(m_2)^{\ell}} = \frac{1}{m_2^2}\cdot\frac{1}{1-m_2^{-1}} \le \frac{1}{2m_2}.$$
In particular, if $s=o(1)$, then $m_2=\omega(1)$ and we have
$$\sum_{\ell=2}^{r}\exp(-\lambda_\ell) \le \frac{3e}{m_2}+o(1) = o(1).$$

Hence, (C4) follows.

• Proof of (C5): For $2\le\ell\le r$, letting $c = 1.04$, we define

$$\psi_\ell \triangleq (2+4e)c(\log m_\ell)s^{m_{\ell-1}+1} + 8ec^2(\log m_\ell)^2s^{2m_{\ell-1}+2}.$$

We prove $\alpha_\ell\le\frac{2}{5}$ for $1\le\ell\le r$ by induction. Since $\eta=\omega(\log k)\ge 6$, we have $\alpha_r=\frac{1}{\eta}<\frac{2}{5}$. Fix any $2\le\ell\le r$ and suppose we have shown that $\alpha_t\le\frac{2}{5}$ for every $\ell\le t\le r$. Then $e^{\alpha_t}\le 1+2\alpha_t$ and $e^{2\alpha_t}\le 1+4\alpha_t$. Since $\log(1+x)\le x$ for $x\ge 0$, by (111) we have
$$\alpha_{t-1} \le \alpha_t+(1+2e)c\lambda_te^{\alpha_t}s^{m_{t-1}+1}+2ec^2s^{2m_{t-1}+2}e^{2\alpha_t}\lambda_t^2 \le \alpha_t+(1+2e)c\lambda_t(1+2\alpha_t)s^{m_{t-1}+1}+2ec^2s^{2m_{t-1}+2}(1+4\alpha_t)\lambda_t^2 \le \alpha_t(1+\psi_t)+\frac{1}{2}\psi_t,$$

where the last inequality holds by (C4). By the induction hypothesis,
$$\alpha_{\ell-1} \le \alpha_r\prod_{t=\ell}^{r}(1+\psi_t)+\frac{1}{2}\sum_{t=\ell}^{r}\psi_t\prod_{j=\ell}^{t-1}(1+\psi_j) \le \left(\alpha_r+\frac{1}{2}\sum_{t=\ell}^{r}\psi_t\right)\exp\left(\sum_{t=\ell}^{r}\psi_t\right) \overset{(a)}{<} \frac{9}{7}\alpha_r+\frac{36}{49}\psi_\ell \overset{(b)}{<} \frac{2}{5}, \tag{114}$$
where (a) holds by $\exp\left(\sum_{t=\ell}^{r}\psi_t\right)\le\frac{9}{7}$, since $\sum_{t=\ell}^{r}\psi_t\le\frac{8}{7}\psi_\ell$ following from $\psi_{t+1}\le\frac{1}{8}\psi_t$ by (C3), and $\psi_\ell\le 0.2$ for $\ell\ge 2$ because $(\log m_\ell)s^{m_{\ell-1}+1}\le 9s\times 2^{-6}\le 2^{-6}$ by (C1); (b) holds because $\alpha_r=\frac{1}{\eta}\le\frac{1}{6}$ by $\eta=\omega(1)$ and $\psi_\ell\le 0.2$.

In particular, if $s=o(1)$, by (C3) we have $\sum_{t=\ell}^{r}\psi_t\le\frac{8}{7}\psi_\ell=o(1)$, because $\psi_\ell=o(1)$ for $\ell\ge 2$, following from $(\log m_\ell)s^{m_{\ell-1}+1}\le 9s\times 2^{-4\ell+2}=o(1)$ by (C1) and $s=o(1)$. Then for $2\le\ell\le r$, we have
$$\alpha_{\ell-1} \le \left(\alpha_r+\frac{1}{2}\sum_{t=\ell}^{r}\psi_t\right)\exp\left(\sum_{t=\ell}^{r}\psi_t\right) = (\alpha_r+o(1))(1+o(1)) = o(1),$$
where the last equality holds because $\alpha_r=\frac{1}{\eta}=o(1)$ given $\eta=\omega(\log k)$.

Since we have shown that $\alpha_\ell\le\frac{2}{5}$ for $1\le\ell\le r$, it follows that $e^{\alpha_\ell}\le 1+2\alpha_\ell$ and $e^{2\alpha_\ell}\le 1+4\alpha_\ell$. Then by (112), we get that for $2\le\ell\le r$,
$$\beta_{\ell-1} \le \beta_\ell+\lambda_\ell\left(2\alpha_\ell+2ecs^{m_{\ell-1}+1}(1+4\alpha_\ell)\lambda_\ell\right) \overset{(a)}{\le} \beta_\ell+2\alpha_\ell\lambda_\ell+\left(1+\frac{8}{5}\right)2ec\lambda_\ell^2s^{m_{\ell-1}+1} \overset{(b)}{\le} \beta_\ell+2\left(\frac{9}{7}\alpha_r+\frac{36}{49}\psi_{\ell+1}\right)\lambda_\ell+\frac{26}{5}ec\lambda_\ell^2s^{m_{\ell-1}+1} \overset{(c)}{<} \beta_\ell+\frac{18}{7}\alpha_r\lambda_\ell+\frac{9}{49}\psi_\ell\log m_\ell+\frac{26}{5}ec(\log m_\ell)^2s^{m_{\ell-1}+1},$$
where (a) holds by $\alpha_\ell\le\frac{2}{5}$; (b) holds by (114), so that $\alpha_\ell\le\frac{9}{7}\alpha_r+\frac{36}{49}\psi_{\ell+1}$; (c) holds because $\lambda_\ell\le\log m_\ell$ for $\ell\ge 2$ by (C4) and $\psi_{\ell+1}\le\frac{1}{8}\psi_\ell$ by (C3). Since $(\log m_\ell)s^{m_{\ell-1}+1}\le 9s\times 2^{-4\ell+2}\le 2^{-6}$ by (C1), it follows that
$$\psi_\ell \le (2+4e)c(\log m_\ell)s^{m_{\ell-1}+1}+\frac{8ec^2}{2^6}(\log m_\ell)s^{m_{\ell-1}+1} \le 14(\log m_\ell)s^{m_{\ell-1}+1}.$$
Combining the last two displayed equations yields
$$\beta_{\ell-1} \le \beta_\ell+\frac{18}{7}\alpha_r\lambda_\ell+18(\log m_\ell)^2s^{m_{\ell-1}+1} \le \beta_\ell+\frac{18}{7}\alpha_r\lambda_\ell+18\times 9s\times 2^{-4\ell+2},$$
where the last inequality holds in view of $(\log m_\ell)^2s^{m_{\ell-1}+1}\le 9s\times 2^{-4\ell+2}$ by (C1).

By telescoping the last displayed equation and using $\beta_r=0$, we get
$$\beta_1 \le \frac{18}{7}\alpha_r\sum_{\ell=2}^{r}\lambda_\ell+\sum_{\ell=2}^{r}18\times 9s\times 2^{-4\ell+2} \le 3,$$
where the last inequality holds since $\sum_{\ell=2}^{r}\lambda_\ell\le\log K\le 2\log k$ and $\alpha_r=\frac{1}{\eta}=o\left(\frac{1}{\log k}\right)$. In particular, if $s=o(1)$, we have $\beta_1=o(1)$.

• Proof of (C6): First, we prove that $S_\ell\le S_{\ell-1}\left(1+27\exp(-\lambda_\ell)\right)$ for any $2\le\ell\le r$. Since $M_\ell = M_{\ell-1}+B_\ell+C_\ell$ and $A_\ell = B_\ell+C_\ell$, by (113), letting $c=1.04$, we have
$$S_\ell \le \mathbb{E}\left[\left(\prod_{t=1}^{\ell-1}\left(1+cs^{m_{t-1}+1}(M_t+B_{t+1})\right)^{A_t}\right)\left(1+cs^{m_{\ell-1}+1}(M_{\ell-1}+B_\ell+C_\ell+B_{\ell+1})\right)^{B_\ell+C_\ell+B_{\ell+1}}\exp\left(\alpha_\ell(M_{\ell-1}+B_\ell+C_\ell+B_{\ell+1})+\beta_\ell\right)\right]$$
$$= \mathbb{E}\left[\left(\prod_{t=1}^{\ell-1}\left(1+cs^{m_{t-1}+1}(M_t+B_{t+1})\right)^{A_t}\right)\exp\left(\alpha_\ell(M_{\ell-1}+B_\ell)+\beta_\ell\right)\left(1+cs^{m_{\ell-1}+1}(M_{\ell-1}+B_\ell+C_\ell+B_{\ell+1})\right)^{B_\ell+C_\ell+B_{\ell+1}}\exp\left(\alpha_\ell(C_\ell+B_{\ell+1})\right)\right]. \tag{115}$$
Note that $\{B_t,A_t\}_{t=1}^{\ell-1}$ and $B_\ell$ are independent of $C_\ell+B_{\ell+1}$. To proceed, we condition on $\{B_t,A_t\}_{t=1}^{\ell-1}$ and $B_\ell$, and take the expectation over $C_\ell+B_{\ell+1}$ by applying (106) in Lemma 9. In particular, let $X = C_\ell+B_{\ell+1}\sim\mathrm{Poi}(\lambda_\ell)$, $a = 1+cs^{m_{\ell-1}+1}(M_{\ell-1}+B_\ell)$, $b = cs^{m_{\ell-1}+1}$, $d = B_\ell$, $\lambda = \lambda_\ell$, and $\alpha = \alpha_\ell\le\frac{2}{5}$ by (C5). By (C4), $\lambda_\ell\le\log m_\ell$ for $\ell\ge 2$, and thus
$$be^{\alpha+1}\lambda = cs^{m_{\ell-1}+1}e^{\alpha_\ell+1}\lambda_\ell \le ce^{1.4}s^{m_{\ell-1}+1}\log m_\ell \le ce^{1.4}\times 2^{-6} < \frac{1}{4},$$
where the second-to-last inequality holds by (C1). For ease of notation, let $\xi = 1+2be^{\alpha+1}\lambda$ and note that $a = 1+b(M_{\ell-1}+B_\ell)$. Then it follows from (106) in Lemma 9 that

$$\mathbb{E}_{C_\ell+B_{\ell+1}}\left[\left(1+cs^{m_{\ell-1}+1}(M_{\ell-1}+B_\ell+C_\ell+B_{\ell+1})\right)^{B_\ell+C_\ell+B_{\ell+1}}\exp\left(\alpha_\ell(C_\ell+B_{\ell+1})\right)\,\middle|\,M_{\ell-1},B_\ell\right]$$
$$\le \left(1+b(M_{\ell-1}+B_\ell)\right)^{B_\ell}\xi^{B_\ell}\exp\left\{\lambda_\ell\left[\left(1+b(M_{\ell-1}+B_\ell)\right)e^{\alpha_\ell}\xi-1\right]\right\} + 27\exp(-\lambda_\ell)\max\left\{(4bB_\ell)^{B_\ell},\left(1+b(M_{\ell-1}+B_\ell)\right)^{B_\ell}\right\}$$
$$\le \left(1+b(M_{\ell-1}+B_\ell)\right)^{B_\ell}\xi^{M_{\ell-1}+B_\ell}\exp\left\{\lambda_\ell\left[\left(1+b(M_{\ell-1}+B_\ell)\right)e^{\alpha_\ell}\xi-1\right]\right\} + 27\exp(-\lambda_\ell)\left(1+4b(M_{\ell-1}+B_\ell)\right)^{B_\ell}$$
$$\le \left(1+4b(M_{\ell-1}+B_\ell)\right)^{B_\ell}\exp\left\{\left(\lambda_\ell be^{\alpha_\ell}\xi+\log\xi\right)(M_{\ell-1}+B_\ell)+\lambda_\ell\left(e^{\alpha_\ell}\xi-1\right)\right\}\left(1+27\exp(-\lambda_\ell)\right)$$
$$\le \left(1+cs^{m_{\ell-2}+1}(M_{\ell-1}+B_\ell)\right)^{B_\ell}\exp\left\{\left(\alpha_{\ell-1}-\alpha_\ell\right)(M_{\ell-1}+B_\ell)+\left(\beta_{\ell-1}-\beta_\ell\right)\right\}\left(1+27\exp(-\lambda_\ell)\right),$$
where the last inequality holds in view of (111) and (112), and the fact that $4s^{m_{\ell-1}}\le s^{m_{\ell-2}}$ by (C2) and $s\le 0.1$. Combining the above result with (115), we get
$$S_\ell \le \mathbb{E}\left[\left(\prod_{t=1}^{\ell-1}\left(1+cs^{m_{t-1}+1}(M_t+B_{t+1})\right)^{A_t}\right)\left(1+cs^{m_{\ell-2}+1}(M_{\ell-1}+B_\ell)\right)^{B_\ell}\exp\left(\alpha_{\ell-1}(M_{\ell-1}+B_\ell)+\beta_{\ell-1}\right)\right]\left(1+27\exp(-\lambda_\ell)\right) = S_{\ell-1}\left(1+27\exp(-\lambda_\ell)\right).$$
Applying the last displayed equation recursively, we get that

$$(104) = S_r \le S_1\prod_{\ell=2}^{r}\left(1+27\exp(-\lambda_\ell)\right) \le S_1\exp\left(27\sum_{\ell=2}^{r}\exp(-\lambda_\ell)\right) = O(S_1),$$
where the last equality holds by (C4); in particular, if $s=o(1)$, we have
$$S_r = S_1(1+o(1)).$$

It remains to calculate $S_1$. Note that

$$S_1 = \mathbb{E}\left[\left(1+cs(M_1+B_2)\right)^{A_1+B_2}\exp\left(\alpha_1(M_1+B_2)+\beta_1\right)\right] = \mathbb{E}\left[\left(1+cs(C_1+B_2)\right)^{C_1+B_2}\exp\left(\alpha_1(C_1+B_2)\right)\right]\exp(\beta_1),$$

where the last equality holds because $B_1=0$ and $M_1=A_1=C_1$. We apply (107) in Lemma 9 with $X = C_1+B_2\sim\mathrm{Poi}(\lambda_1)$, $a=1$, $b=1.04s\le 0.104$, $\lambda=\lambda_1=\frac{3}{2}$, and $\alpha=\alpha_1\le\frac{2}{5}$ by (C5). Noting that $be^{\alpha+1}\lambda\le 1.04\times 0.1\times e^{1.4}\times\frac{3}{2}<0.633$ and $\beta_1\le 3$ by (C5), we conclude that $S_1=O(1)$ and hence $S_r=O(1)$.
In particular, if $s=o(1)$, by (108) in Lemma 9 with $a=1$, $b=1.04s=o(1)$, $d=0$, $\alpha=\alpha_1=o(1)$ by (C5), and $\lambda=\frac{3}{2}$, we have $S_1=1+o(1)$ and hence $S_r=1+o(1)$.

Proof of Lemma 9. First, we show (106). Let $\gamma = 2e^{\alpha+1}a$. Write

$$\mathbb{E}\left[(a+bX)^{X+d}\exp(\alpha X)\right] = \mathbb{E}\left[(a+bX)^{X+d}\exp(\alpha X)\mathbf{1}_{\{X<\gamma\lambda\}}\right]+\mathbb{E}\left[(a+bX)^{X+d}\exp(\alpha X)\mathbf{1}_{\{X\ge\gamma\lambda\}}\right] \eqqcolon (\mathrm{I})+(\mathrm{II}).$$

Then we have
$$(\mathrm{I}) \le \mathbb{E}\left[(a+b\gamma\lambda)^{X+d}\exp(\alpha X)\right] = (a+b\gamma\lambda)^d\,\mathbb{E}\left[\exp\left(X\left(\log(a+b\gamma\lambda)+\alpha\right)\right)\right] \overset{(a)}{=} (a+b\gamma\lambda)^d\exp\left(\lambda\left(e^{\alpha}(a+b\gamma\lambda)-1\right)\right) = \left(a+2abe^{\alpha+1}\lambda\right)^d\exp\left(\lambda\left(ae^{\alpha}\left(1+2be^{\alpha+1}\lambda\right)-1\right)\right),$$
where (a) holds by the moment generating function $\mathbb{E}\left[\exp(tX)\right] = \exp\left(\lambda\left(e^t-1\right)\right)$. Then, directly substituting the Poisson PMF into $(\mathrm{II})$, we get
$$(\mathrm{II}) \le \sum_{k\ge\gamma\lambda}e^{-\lambda}\frac{\lambda^k}{k!}(a+bk)^{k+d}\exp(\alpha k) \overset{(a)}{\le} e^{-\lambda}\sum_{k\ge\gamma\lambda}\left(\frac{e\lambda}{k}\right)^k(a+bk)^{k+d}\exp(\alpha k) \le e^{-\lambda}\sum_{k\ge\gamma\lambda}\left(\frac{ae^{\alpha+1}}{\gamma}+be^{\alpha+1}\lambda\right)^k(a+bk)^d \overset{(b)}{\le} e^{-\lambda}\sum_{k\ge\gamma\lambda}\left(\frac{3e^{\frac{1}{4}}}{4}\right)^k\exp\left(f(k)\right),$$

where (a) holds because $k!\ge\left(\frac{k}{e}\right)^k$; (b) holds by the choice of $\gamma=2ae^{\alpha+1}$, our assumption that $be^{\alpha+1}\lambda<\frac{1}{4}$, and defining
$$f(x) \triangleq d\log(a+bx)-\frac{x}{4}.$$
Since $b,d>0$, $f$ is concave and $f'(x) = \frac{bd}{a+bx}-\frac{1}{4}$, which equals $0$ when $x = 4d-\frac{a}{b}$. Therefore, if $4d\ge\frac{a}{b}$, then $f(x)$ for $x\ge 0$ is maximized at $x = 4d-\frac{a}{b}$, and $f\left(4d-\frac{a}{b}\right) = d\log(4bd)-d+\frac{a}{4b} \le d\log(4bd)$. Otherwise, $f(x)$ for $x\ge 0$ is maximized at $x=0$, and $f(0) = d\log a$. In conclusion, we have $\max_{k\ge 0}f(k) \le \max\{d\log(4bd), d\log a\}$. Then we get an upper bound on $(\mathrm{II})$ as
$$(\mathrm{II}) \le \exp(-\lambda)\max\left\{(4bd)^d, a^d\right\}\sum_{k\ge\gamma\lambda}\left(\frac{3e^{\frac{1}{4}}}{4}\right)^k \le 27\exp(-\lambda)\max\left\{(4bd)^d, a^d\right\},$$
where the last inequality follows from $\frac{1}{1-\frac{3}{4}e^{1/4}} < 27$.

Next we prove (107). Substituting the Poisson PMF into (107) yields

$$\mathbb{E}\left[(a+bX)^X\exp(\alpha X)\right] = \sum_{k=0}^{\infty}\frac{\lambda^ke^{-\lambda}}{k!}(a+bk)^k\exp(\alpha k) \overset{(a)}{\le} e^{-\lambda}\left(1+\sum_{k=1}^{\infty}\left(\frac{a\lambda e^{\alpha+1}}{k}+be^{\alpha+1}\lambda\right)^k\right) \overset{(b)}{=} O(1),$$
where (a) holds due to $k!\ge\left(\frac{k}{e}\right)^k$ for $k\ge 1$, and (b) holds because $a,b,\alpha,\lambda$ are fixed constants such that $be^{\alpha+1}\lambda<1$, so that the series converges.

It remains to prove (108). Given $b=o(1)$ and $\alpha=o(1)$, we pick $t=\omega(1)$ such that $bt^2+\alpha t=o(1)$ and get
$$\mathbb{E}\left[(1+bX)^X\exp(\alpha X)\right] = \sum_{k=0}^{t}\frac{\lambda^ke^{-\lambda}}{k!}(1+bk)^ke^{\alpha k}+\sum_{k=t+1}^{\infty}\frac{\lambda^ke^{-\lambda}}{k!}(1+bk)^ke^{\alpha k} \overset{(a)}{\le} \sum_{k=0}^{t}\frac{\lambda^ke^{-\lambda}}{k!}\exp\left(bk^2+\alpha k\right)+e^{-\lambda}\sum_{k=t+1}^{\infty}\left(\frac{\lambda e^{\alpha+1}}{k}+be^{\alpha+1}\lambda\right)^k = 1+o(1),$$
where the last equality holds because $\exp\left(bk^2+\alpha k\right) = 1+o(1)$ for any $0\le k\le t$ given $bt^2+\alpha t=o(1)$, and $\sum_{k=t+1}^{\infty}\left(\frac{\lambda e^{\alpha+1}}{k}+be^{\alpha+1}\lambda\right)^k = o(1)$ given $t=\omega(1)$, $b=o(1)$, $\alpha=o(1)$, and $\lambda$ a fixed constant.

C Concentration Inequalities for Gaussians and Binomials

Lemma 10 (Hanson–Wright inequality). Let $X,Y\in\mathbb{R}^n$ be standard Gaussian vectors such that the pairs $(X_i,Y_i)\sim\mathcal{N}\left(\begin{pmatrix}0\\0\end{pmatrix},\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix}\right)$ are independent for $i=1,\ldots,n$. Let $M\in\mathbb{R}^{n\times n}$ be any deterministic matrix. There exists some universal constant $c>0$ such that with probability at least $1-2\delta$,
$$\left|X^\top MY-\rho\,\mathrm{Tr}(M)\right| \le c\left(\|M\|_F\sqrt{\log(1/\delta)}+\|M\|\log(1/\delta)\right). \tag{116}$$

Proof. When $\rho=1$, i.e., $X=Y$, the bilinear form reduces to a quadratic form and this lemma is the original Hanson–Wright inequality [HW71, RV13]. In general, note that
$$X^\top MY = \frac{1}{4}(X+Y)^\top M(X+Y)-\frac{1}{4}(X-Y)^\top M(X-Y).$$
Thus, it suffices to analyze the two terms separately. Note that
$$\mathbb{E}\left[(X\pm Y)^\top M(X\pm Y)\right] = (2\pm 2\rho)\mathrm{Tr}(M).$$
Applying the Hanson–Wright inequality, we get that with probability at least $1-\delta$,
$$\left|(X\pm Y)^\top M(X\pm Y)-\mathbb{E}\left[(X\pm Y)^\top M(X\pm Y)\right]\right| \le c\left(\|M\|_F\sqrt{\log(1/\delta)}+\|M\|\log(1/\delta)\right),$$
where $c$ is some universal constant. The conclusion readily follows by combining the last three displayed equations.

Lemma 11 (Chernoff’s inequality for Binomials). Suppose X ∼ Binom(n, p) with mean µ = np. Then for any δ > 0,

P {X ≥ (1 + δ)µ} ≤ exp (−µ ((1 + δ) log(1 + δ) − δ)) , (117) and  δ2  {X ≤ (1 − δ)µ} ≤ exp − µ . (118) P 2 In particular, it follows from (117) that

P {X ≥ τ} ≤ exp (−t) , ∀t > 0, (119)

n  t 1 o where τ = µ exp 1 + W eµ − e and W (x) is the Lambert W function defined on [−1, ∞] as the unique solution of W (x)eW (x) = x for x ≥ −1/e. Proof. Note that (117) and (118) are direct consequences of [MU05, Theorems 4.4 and 4.5]. To derive (119) from (117), we use y y  y  x y = exp (1 + W (x/e)) ⇐⇒ log = W (x/e) ⇐⇒ log exp log = ⇐⇒ y log y − y = x, e e e e and let x = t/µ − 1, y = 1 + δ, and τ = yµ.

D Facts on Random Permutation

In this appendix we collect several useful facts about random permutation (cf. [AT92]). For any ` ∈ N, let n` denote the number of `-cycles in a uniform random permutation σ ∈ Sn. Let {Z`}1≤`≤k 1  denote a sequence of independent Poisson random variables where Z` ∼ Poi ` .

Lemma 12. For any k ∈ [n] and a1, a2, ··· , ak ∈ Z+, 1 P {n1 ≥ a1, n2 ≥ a2, ··· , nk ≥ ak} ≤ . (120) Qk a` `=1 ` a`! Consequently, for any nonnegative function g,

E[g(n1, . . . , nk)] ≤ E[g(Z1,...,Zk)] exp (hk) (121) P 1 where hk = 1≤`≤k k denote the harmonic number.

62 h i Pk Q n` Proof. It suffices to check the first inequality. Note that for all `a` ≤ n, = `=1 E 1≤`≤k a` 1 (see e.g. [AT92, Eq. (5)]). Then (120) follows due to 1 ≤ n` for 1 ≤ ` ≤ k. Qk a` {n`≥a`} a `=1 ` a`! ` Lemma 13 ([AT92, Theorem 2]). For any 1 ≤ k < n, the total variation distance between the law of {n`}1≤`≤k and the law of {Z`}1≤`≤k satisfies: n TV (L (n , n , . . . , n ) , L (Z ,Z ,...,Z )) ≤ F , (122) 1 2 k 1 2 k k √ 2m−1 1 x −x where F (x) = 2πm (m−1)! + m! + 3 e , with m , dxe , so that log F (x) = −x log x(1 + o(1)) as x → ∞.

Acknowledgment

The authors thank Cristopher Moore for simplifying the proof of Proposition1. J. Xu would like to thank Tselil Schramm for helpful discussions on the hypothesis testing problem at the early stage of the project.

References

[ACV14] Ery Arias-Castro and Nicolas Verzelen. Community detection in dense random net- works. The Annals of Statistics, 42(3):940–969, 2014.

[AT92] Richard Arratia and Simon Tavar´e.The cycle structure of random permutations. The Annals of Probability, pages 1567–1591, 1992.

[BBM05] Alexander C Berg, Tamara L Berg, and Jitendra Malik. Shape matching and object recognition using low distortion correspondences. In 2005 IEEE computer society conference on computer vision and pattern recognition (CVPR’05), volume 1, pages 26–33. IEEE, 2005.

[BBSV19] Paul Balister, B´elaBollob´as,Julian Sahasrabudhe, and Alexander Veremyev. Dense subgraphs in random graphs. Discrete Applied Mathematics, 260:66–74, 2019.

[BCL+19] Boaz Barak, Chi-Ning Chou, Zhixian Lei, Tselil Schramm, and Yueqi Sheng. (Nearly) efficient algorithms for the graph matching problem on correlated random graphs. In Advances in Neural Information Processing Systems, pages 9186–9194, 2019.

[BGSW13] Mohsen Bayati, David F Gleich, Amin Saberi, and Ying Wang. Message-passing algorithms for sparse network alignment. ACM Transactions on Knowledge Discovery from Data (TKDD), 7(1):1–31, 2013.

[BI13] Cristina Butucea and Yuri I. Ingster. Detection of a sparse submatrix of a high- dimensional noisy matrix. Bernoulli, 19(5B):2652–2688, 11 2013.

[BMNN16] Jess Banks, Cristopher Moore, Joe Neeman, and Praneeth Netrapalli. Information-theoretic thresholds for community detection in sparse networks. In Conference on Learning Theory, pages 383–416, 2016.

[CFSV04] Donatello Conte, Pasquale Foggia, Carlo Sansone, and Mario Vento. Thirty years of graph matching in pattern recognition. International Journal of Pattern Recognition and Artificial Intelligence, 18(03):265–298, 2004.

[CGH+96] Robert M Corless, Gaston H Gonnet, David EG Hare, David J Jeffrey, and Donald E Knuth. On the Lambert W function. Advances in Computational Mathematics, 5(1):329–359, 1996.

[CK16] Daniel Cullina and Negar Kiyavash. Improved achievability and converse bounds for Erdős-Rényi graph matching. ACM SIGMETRICS Performance Evaluation Review, 44(1):63–72, 2016.

[CK17] Daniel Cullina and Negar Kiyavash. Exact alignment recovery for correlated Erdős-Rényi graphs. arXiv preprint arXiv:1711.06783, 2017.

[CKMP19] Daniel Cullina, Negar Kiyavash, Prateek Mittal, and H Vincent Poor. Partial recovery of Erdős-Rényi graph alignment via k-core alignment. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(3):1–21, 2019.

[CSS07] Timothee Cour, Praveen Srinivasan, and Jianbo Shi. Balanced graph matching. In Advances in Neural Information Processing Systems, pages 313–320, 2007.

[DCKG19] Osman Emre Dai, Daniel Cullina, Negar Kiyavash, and Matthias Grossglauser. Analysis of a canonical labeling algorithm for the alignment of correlated Erdős-Rényi graphs. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 3(2):1–25, 2019.

[DMWX18] Jian Ding, Zongming Ma, Yihong Wu, and Jiaming Xu. Efficient random graph matching via degree profiles. To appear in Probability Theory and Related Fields, Nov 2018. arXiv preprint arXiv:1811.07821.

[FK16] Alan Frieze and Michał Karoński. Introduction to random graphs. Cambridge University Press, 2016.

[FMWX19a] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations I: The Gaussian model. arXiv preprint arXiv:1907.08880, 2019.

[FMWX19b] Zhou Fan, Cheng Mao, Yihong Wu, and Jiaming Xu. Spectral graph matching and regularized quadratic relaxations II: Erdős-Rényi graphs and universality. arXiv preprint arXiv:1907.08883, 2019.

[FS09] Philippe Flajolet and Robert Sedgewick. Analytic combinatorics. Cambridge University Press, 2009.

[GLM19] Luca Ganassali, Marc Lelarge, and Laurent Massoulié. Spectral alignment of correlated Gaussian random matrices. arXiv preprint arXiv:1912.00231, 2019.

[GM20] Luca Ganassali and Laurent Massoulié. From tree matching to sparse graph alignment. arXiv preprint arXiv:2002.01258, 2020.

[HH08] Abdolhossein Hoorfar and Mehdi Hassani. Inequalities on the Lambert W function and hyperpower function. J. Inequal. Pure and Appl. Math, 9(2):5–9, 2008.

[HM20] Georgina Hall and Laurent Massoulié. Partial recovery in the graph alignment problem. arXiv preprint arXiv:2007.00533, 2020.

[HNM05] Aria D Haghighi, Andrew Y Ng, and Christopher D Manning. Robust textual inference via graph matching. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 387–394. Association for Computational Linguistics, 2005.

[HW71] David Lee Hanson and Farroll Tim Wright. A bound on tail probabilities for quadratic forms in independent random variables. The Annals of Mathematical Statistics, 42(3):1079–1083, 1971.

[JLR11] Svante Janson, Tomasz Luczak, and Andrzej Rucinski. Random graphs, volume 45. John Wiley & Sons, 2011.

[Kib45] WF Kibble. An extension of a theorem of Mehler's on Hermite polynomials. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 41, pages 12–15. Cambridge University Press, 1945.

[LR13] Lorenzo Livi and Antonello Rizzi. The graph matching problem. Pattern Analysis and Applications, 16(3):253–283, 2013.

[MMS10] Konstantin Makarychev, Rajsekar Manokaran, and Maxim Sviridenko. Maximum quadratic assignment problem: Reduction from maximum label cover and LP-based approximation algorithm. In International Colloquium on Automata, Languages, and Programming, pages 594–604. Springer, 2010.

[MNS15] Elchanan Mossel, Joe Neeman, and Allan Sly. Reconstruction and estimation in the planted partition model. Probability Theory and Related Fields, 162(3-4):431–461, 2015.

[MU05] Michael Mitzenmacher and Eli Upfal. Probability and Computing: Randomized Algo- rithms and Probabilistic Analysis. Cambridge University Press, USA, 2005.

[MWXY20] Cheng Mao, Yihong Wu, Jiaming Xu, and Sophie H. Yu. Counting trees and testing correlation of unlabeled random graphs. preprint, 2020.

[MX19] Elchanan Mossel and Jiaming Xu. Seeded graph matching via large neighborhood statistics. In Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 1005–1014. SIAM, 2019.

[NS08] Arvind Narayanan and Vitaly Shmatikov. Robust de-anonymization of large sparse datasets. In 2008 IEEE Symposium on Security and Privacy (SP 2008), pages 111–125. IEEE, 2008.

[NS09] Arvind Narayanan and Vitaly Shmatikov. De-anonymizing social networks. In 2009 30th IEEE Symposium on Security and Privacy, pages 173–187. IEEE, 2009.

[Ott48] Richard Otter. The number of trees. Annals of Mathematics, pages 583–599, 1948.

[Pet95] Valentin V. Petrov. Limit theorems of probability theory: Sequences of independent random variables. Oxford Science Publications, Clarendon Press, Oxford, United Kingdom, 1995.

[PG11] Pedram Pedarsani and Matthias Grossglauser. On the privacy of anonymized networks. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1235–1243, 2011.

[RPW94] F Rendl, P Pardalos, and H Wolkowicz. The quadratic assignment problem: A survey and recent developments. In Proceedings of the DIMACS workshop on quadratic assignment problems, volume 16, pages 1–42, 1994.

[RS20] Miklos Z Racz and Anirudh Sridhar. Correlated randomly growing graphs. arXiv preprint arXiv:2004.13537, 2020.

[RV13] Mark Rudelson and Roman Vershynin. Hanson-Wright inequality and sub-Gaussian concentration. Electronic Communications in Probability, 18, 2013.

[RXZ19] Galen Reeves, Jiaming Xu, and Ilias Zadik. The all-or-nothing phenomenon in sparse linear regression. arXiv preprint arXiv:1903.05046, 2019.

[SXB08] Rohit Singh, Jinbo Xu, and Bonnie Berger. Global alignment of multiple protein interaction networks with application to functional orthology detection. Proceedings of the National Academy of Sciences, 105(35):12763–12768, 2008.

[Tsy09] A. B. Tsybakov. Introduction to Nonparametric Estimation. Springer Verlag, New York, NY, 2009.

[VAC15] Nicolas Verzelen and Ery Arias-Castro. Community detection in sparse random networks. The Annals of Applied Probability, 25(6):3465–3510, 2015.

[VCP+11] Joshua T Vogelstein, John M Conroy, Louis J Podrazik, Steven G Kratzer, Eric T Harley, Donniell E Fishkind, R Jacob Vogelstein, and Carey E Priebe. Large (brain) graph matching via fast approximate quadratic programming. arXiv preprint arXiv:1112.5507, 2011.

[WXY21] Yihong Wu, Jiaming Xu, and Sophie H. Yu. Settling the sharp reconstruction thresh- olds of random graph matching. arXiv preprint arXiv:2102.00082, 2021.
