A Study of Large Fringe and Non-Fringe Subtrees in Conditional Galton-Watson Trees Full Version
Total Page:16
File Type:pdf, Size:1020Kb
A study of large fringe and non-fringe subtrees in conditional Galton-Watson trees Full Version Xing Shi Cai, Luc Devroye School of Computer Science, McGill University of Montreal, Canada, [email protected] [email protected] October 8, 2018 We study the conditions for families of fringe or non-fringe subtrees to gw exist with high probability (whp) in Tn , a Galton-Walton tree of size n. gw We first give a Poisson approximation of fringe subtree counts in Tn , which permits us to determine the height of the maximal complete r-ary fringe subtree. Then we determine the maximal Kn such that every tree of size gw at most Kn appears as fringe subtree in Tn whp. Finally, we show that non-fringe subtree counts are concentrated and determine, as an application, gw the height of the maximal complete r-ary non-fringe subtree in Tn . 1 Introduction In this paper, we study the conditions for families of fringe or non-fringe subtrees to exist whp (with high probability) in a Galton-Walton tree conditional to be of size n. In particular, we want to find the height of the maximal complete r-ary fringe and non- fringe subtrees. We also want to determine the threshold kn such that all trees of size at most k appear as fringe subtrees. In doing so, we extend Janson's [22] result on fringe arXiv:1602.03850v1 [math.PR] 11 Feb 2016 n subtrees counts and prove a new concentration theorem for non-fringe subtree counts. Let T be the set of all rooted, ordered, and unlabeled trees, which we refer to as plane trees. All trees considered in this paper belong to T. (See Janson [21, sec. 2.1] for details.) Figure 1 lists all plane trees of at most 4 nodes. Given a tree T P T and a node v P T , let Tv denote the subtree rooted at v. We call 1 1 Tv a fringe subtree of T . If Tv is isomorphic to some tree T P T, then we write T “ Tv 1 Figure 1: Plane trees of at most 4 nodes. and say that T has a fringe subtree of shape T 1 rooted at v, or simply T contains T 1 as a fringe subtree. 1 On the other hand, if Tv can be made isomorphic to T by replacing some or none of 1 its own fringe subtrees with leaves (nodes without children), then we write T ă Tv and say that T has a non-fringe subtree of shape T 1 rooted at v, or simply T contains T 1 as a 1 1 non-fringe subtree. (Note that T “ Tv implies that T ă Tv.) We also use the notation T 1 ă T to denote that T has a non-fringe subtree of shape T 1 at its root. Figure 2 shows some examples of fringe and non-fringe subtrees. fringe subtrees of T T non-fringe subtrees of T Figure 2: Examples of fringe and non-fringe subtrees. Let ξ be a non-negative integer-valued random variable. The Galton-Watson tree T gw with offspring distribution ξ is the random tree generated by starting from the root and independently giving each node a random number of children, where the numbers gw gw of children are all distributed as ξ. The conditional Galton-Watson tree Tn is T restricted to the event |T gw| “ n, i.e., T gw has n nodes. The comprehensive survey by Janson [21] describes the history and the basic properties of these trees. In the study of conditional Galton-Watson trees, the following is usually assumed throughout the paper: gw Condition A. Let Tn be a conditional Galton-Watson tree of size n with offspring dis- tribution ξ, such that Eξ “ 1 and 0 ă σ2 :“ Var pξq ă 8. Let T gw be the corresponding unconditional Galton-Watson tree. 2 We summarize notations: ¨ T | the set of all rooted, ordered and unlabeled trees (plane trees) ¨ T | a tree in T ¨ Tv | a fringe subtree of T rooted at node v P T ¨ ξ | a non-negative integer-valued random variable with Eξ “ 1 and 0 ă σ2 :“ Var pξq ă 8 ¨ h | the span of ξ, i.e., gcdti ¥ 1 : pi ¡ 0u ¨ T gw | an unconditional Galton-Watson tree with offspring distribution ξ gw gw gw ¨ Tn | T given that |T | “ n gw gw gw ¨ Tn;v | a fringe subtree of Tn rooted at node v P Tn gw gw gw ¨ Tn;˚ | a fringe subtree of Tn rooted at a uniform random node of Tn ¨ Tn | a sequence of trees ¨ Sn | the set of all trees of size n ¨ S | a set of trees ` gw ¨ S¤n | the set tT P T : |T | ¤ n; P tT “ T u ¡ 0u ¨ An | a sequence of sets of trees gw gw ¨ NS pTn q | the number of fringe subtrees of Tn that belong to S ¨ πpSq | P tT gw P Su nf gw gw ¨ NT pTn q | the number of non-fringe subtrees of Tn of shape T ¨ πnf pT q | P tT ă T gwu, the probability that T gw has a non-fringe subtree T at its root gw If p1 “ 0, then there exist positive integers n such that P t|T | “ nu “ 0. For such gw gw n, Tn is not well-defined. But it is easy to show that P t|T | “ nu ¡ 0 for all n ¥ n0 with n ´ 1 ” 0 pmod hq, where h is span of ξ and n0 depends only on ξ (see [21, cor. gw gw 15.6]). Therefore, in this paper, for all asymptotic results about T and Tn , the limits are always taken along the subsequence with n ´ 1 ” 0 pmod hq. Extending a result by Aldous [1], Janson [21, thm. 7.12] proved the following theorem: gw gw Theorem 1. Assume Condition A. The conditional distribution LpTn;˚ |Tn q converges in probability to LpT gwq. In other words, for all T P T, as n Ñ 8, gw N pT q p T n “ P T gw “ T |T gw Ñ P tT gw “ T u : (1) n n;˚ n ( 3 Later Janson strengthened the above result, proving the asymptotic normality of gw gw NT pTn q by studying additive functionals on Tn [22]. gw gw A natural generalization of NT pTn q is to consider fringe subtree counts NTn pTn q where Tn P T is a sequence of trees instead of a fixed tree T . Let Popλq denote a Poisson random variable with mean λ. We have: gw Theorem 2. Assume Condition A. Let πpT q :“ P tT “ T u and let kn Ñ 8; kn “ opnq. Then gw lim sup dTV pNT pTn q; PopnπpT qqq “ 0; (2) nÑ8 T :|T |“kn where dTV p ¨ ; ¨ q denotes the total variation distance. Therefore, letting Tn be a sequence of trees with |Tn| “ kn, we have as n Ñ 8: gw (i) If nπpTnq Ñ 0, then NTn pTn q “ 0 whp. gw d (ii) If nπpTnq Ñ µ P p0; 8q, then NTn pTn q Ñ Popµq. (iii) If nπpTnq Ñ 8, then N pT gwq ´ nπpT q Tn n n Ñd Np0; 1q; nπpTnq a d where Np0; 1q denotes the standard normal distribution, and Ñ denotes conver- gence in distribution. In Theorem 2, (i){(iii) follow directly from (2) by the following lemma, whose simple proof we omit: Lemma 1. Let Xn be a sequence of random variables. Let µn be a sequence of non- negative real numbers. Assume that dTV pXn; Popµnqq Ñ 0 as n Ñ 8. We have: (i) If µn Ñ 0, then Xn “ 0 whp. d (ii) If µn Ñ µ P p0; 8q, then Xn Ñ Popµq. ? d (iii) If µn Ñ 8, then pXn ´ µnq{ µn Ñ Np0; 1q: Theorem 2 can be generalized as follows: Theorem 3. Assume Condition A. Let Skn be the set of all trees of size kn, where gw gw kn Ñ 8 and kn “ opnq. For S Ď Skn , let πpSq :“ P tT P Su and NS pTn q :“ gw vPT gw Tn;v P S . Then n J K ř gw lim sup dTV pNS pTn q; PopnπpSqqq “ 0: nÑ8 SĎSkn Therefore, letting An be a sequence of sets of trees with An Ď Skn , we have: 4 gw (i) If nπpAnq Ñ 0, then NAn pTn q “ 0 whp. gw d (ii) If nπpAnq Ñ µ P p0; 8q, then NAn pTn q Ñ Popµq. (iii) If nπpAnq Ñ 8, then N pT gwq ´ nπpA q An n n Ñd Np0; 1q: nπpAnq a The proof of Theorem 2 is given in Section 3. It uses many ingredients from previous results on fringe subtrees, especially from Janson [22]. (In particular, Lemma 6.2 of gw [22] makes the computation of the variance of NSn pTn q quite easy, which is crucial for the proof.) The key step of the proof is based on subtree switching. It permits the coupling of two conditional Galton-Watson trees. Then we can apply the exchangeable pair method by Chatterjee et al. [5], which is an extension of Stein's method [29], to study the convergence of total variation distance. This approach can easily be modified to prove Theorem 3, whose steps we sketch at the end of Section 3. Binary search trees and recursive trees are also well-studied random tree models (see Drmota [11]). Many authors have found results similar to Theorem 2 for these two types of trees, see, e.g., [14, 18, 8, 9, 17].