THE UNIVERSITY OF CHICAGO

INFINITELY EXCHANGEABLE PARTITION, TREE AND GRAPH-VALUED STOCHASTIC PROCESSES

A DISSERTATION SUBMITTED TO THE FACULTY OF THE DIVISION OF THE PHYSICAL SCIENCES IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

DEPARTMENT OF STATISTICS

BY
HARRY CRANE

CHICAGO, ILLINOIS
APRIL 2012

To my parents, Harry and Regina Crane

ABSTRACT

The theory of infinitely exchangeable random partitions began with the work of Ewens [53], whose model for species sampling in population biology is known as the Ewens sampling formula. Kingman [66, 67] established a correspondence between infinitely exchangeable partitions and probability measures on partitions of the unit interval, called the paintbox representation. Later, Kingman [66, 68] introduced the coalescent, an exchangeable Markov process on the space of set partitions, in the field of population genetics.

In this thesis, we build on Kingman's theory to construct an infinitely exchangeable Markov process on the space of partitions whose sample paths differ from those of previously studied coalescent and fragmentation-type processes; we call this process the cut-and-paste process. The cut-and-paste process possesses many of the same properties as its predecessors, including finite-dimensional transition probabilities that can be expressed in terms of a paintbox process, a unique equilibrium measure under general conditions, a Poissonian construction, and an associated mass-valued process almost surely. A special subfamily with parameters $\alpha > 0$ and $k \ge 1$ is related to the Chinese restaurant process and is reversible with respect to the two-parameter Pitman-Yor family with parameter $(-\alpha/k, \alpha)$. An extension of the $(\alpha, k)$-subfamily has a third parameter $\Sigma$, a symmetric square matrix with non-negative entries, called the similarity matrix.

From a family of partition-valued Markov kernels, we show how to construct a Markov process on the space of $\mathbb{N}$-rooted fragmentation trees through the ancestral branching procedure.
If the family of kernels is infinitely exchangeable, then its associated ancestral branching process is infinitely exchangeable. In addition, the ancestral branching process based on the cut-and-paste Markov kernel possesses a unique equilibrium measure, admits a Poissonian construction, and has an associated mass fragmentation-valued process almost surely. Furthermore, the results can be extended to characterize a Markov process on the space of trees with edge lengths.

Aside from the Erdős-Rényi process and its variants, infinitely exchangeable graph-valued processes are uncommon in the literature. We show a construction for a family of infinitely exchangeable Poisson random hypergraphs that is induced by a consistent family of Poisson point processes on the power set of the natural numbers. Infinitely exchangeable families of hereditary hypergraphs and undirected graphs are induced from an infinitely exchangeable Poisson random hypergraph by projection.

Finally, we consider balanced and even partition structures, which are families of distributions on partitions with a prespecified block structure. Consistency of these families can be shown under a random deletion procedure. We show Chinese restaurant-type constructions for a special class of these structures based on the two-parameter Pitman-Yor family, and discuss connections to randomization in experimental design.

ACKNOWLEDGMENTS

First and foremost, this thesis reflects the influence and hard work of my parents, Harry and Regina, who have supported me unconditionally throughout my life. They are due my deepest and most sincere gratitude. I also thank my sister, Kayla, who has been supportive when I needed it most.

At Chicago, I thank Peter McCullagh for being readily available with his time and insights, and especially for introducing me to partition and tree-valued processes, for which I have developed a strong affinity.
Among the many qualities of his I hope to emulate are attention to detail and patience in research pursuits. I thank the other members of my committee for their valuable contributions: Steve Lalley, for providing insightful comments about the direction of my research and about several parts of this thesis, and Mathias Drton, for his advice and encouragement throughout my final year in Chicago. I also thank Michael Wichura, whose emphasis on precision and dedication to his teaching have been an important aspect of my education, and Mei Wang, who has been generous with her time and extremely encouraging.

I would also like to thank those whom I have met in Chicago and who have contributed to my experience here: Alan, Marcin, Joe, Walter, Lior, Winfried, Andrei and, of course, Sherman. Last, but not least, I thank Jie for making my last year here the best of all.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGMENTS
1 INTRODUCTION
1.1 Preliminary remarks
1.2 Integer partitions
1.2.1 Random integer partitions
1.3 Projective systems
1.4 Projective systems of partitions, trees and graphs
1.4.1 Set partitions
1.4.2 Fragmentation trees
1.4.3 Graphs and permutations
1.5 Exchangeable random partitions
1.5.1 Distribution of block sizes
1.6 Ewens process
1.6.1 Pitman-Yor process
1.6.2 Gibbs partitions
1.6.3 Product partition models
1.7 Mass partitions
1.8 Paintbox process
1.8.1 Asymptotic frequencies
1.9 Exchangeable coalescents
1.10 Exchangeable fragmentations
1.10.1 Exchangeable random fragmentation trees
1.10.2 Gibbs fragmentation trees
1.10.3 Genealogical interpretation of a tree
1.11 Exchangeable fragmentation-coalescence processes
1.12 Random graphs
1.12.1 Heavy-tailed networks
1.12.2 Small-world networks
1.13 Organization of thesis
2 A CONSISTENT MARKOV PARTITION PROCESS GENERATED BY THE PAINTBOX PROCESS
2.1 Preliminaries
2.2 The Cut-and-Paste process
2.2.1 Equilibrium measure
2.3 Continuous-time version of $\mathrm{CP}(\nu)$-process
2.3.1 Poissonian construction
2.4 Asymptotic frequencies
2.4.1 Poissonian construction
2.4.2 Equilibrium measure
2.5 A two parameter subfamily
2.6 A three-parameter extension
2.6.1 Similarity and dissimilarity matrices
2.6.2 The extended model
2.7 Properties of the $\mathrm{CP}(\alpha, k, \Sigma)$ process
2.8 Discussion
3 ANCESTRAL BRANCHING AND TREE-VALUED PROCESSES
3.1 Introduction
3.2 Ancestral branching kernels
3.3 Exchangeable ancestral branching Markov kernels
3.4 Consistent ancestral branching kernels
3.5 Cut-and-paste ancestral branching processes
3.5.1 Construction of the cut-and-paste ancestral branching Markov chain on $\mathcal{T}$
3.5.2 Equilibrium measure
3.5.3 Continuous-time ancestral branching process
3.5.4 Poissonian construction
3.5.5 Feller process
3.6 Mass fragmentations
3.6.1 Associated mass fragmentation process
3.6.2 Equilibrium measure
3.6.3 Poissonian construction
3.7 Weighted trees
3.8 Discussion
4 INFINITE RANDOM HYPERGRAPHS
4.1 Introduction
4.1.1 Projective systems of hypergraphs
4.2 Infinite Poisson random hypergraphs
4.2.1 Construction of the infinite random hypergraph
4.3 Induced hypergraphs, hereditary hypergraphs, and undirected graphs
4.3.1 Random hypergraphs
4.3.2 Hereditary hypergraphs and monotone sets
4.3.3 Random undirected graphs
4.4 Discussion
5 BALANCED AND EVEN PARTITION STRUCTURES
5.1 Preliminaries
5.2 Balanced partitions
5.3 Even partitions
5.4 Partition structures
5.5 Balanced and even permutations
5.6 Relating balanced and even partitions
5.7 Chinese restaurant constructions
5.7.1 Chinese restaurant construction for balanced partitions
5.7.2 Chinese restaurant construction for even partitions
5.8 Randomization

NOTATION

$[n]$: $\{1, 2, \ldots, n\}$
$\mathbb{N}$: the natural numbers, $[\infty] := \{1, 2, \ldots\}$
$2^A$: the power set of $A$, i.e. $\{a : a \subseteq A\}$
$\mathcal{P}$: set partitions of $\mathbb{N}$
$\mathcal{P}^{(k)}$: set partitions of $\mathbb{N}$ with at most $k$ blocks
$\mathcal{P}_I$: interval partitions of $[0, 1]$
$\mathcal{R}^{\downarrow}$: ranked-mass partitions
$\mathcal{R}^{\downarrow(k)}$: ranked $k$-simplex; ranked-mass partitions with at most $k$ positive components
$\mathcal{P}_n$: integer partitions of $n \in \mathbb{N}$
$\mathcal{P}_n^{(k)}$: integer partitions of $n$ with at most $k$ parts
$\mathcal{P}_{n,k}$: integer partitions of $n$ with exactly $k$ parts
$\mathcal{P}_{[n]}$: set partitions of $[n]$
$\mathcal{P}_{[n]}^{(k)}$: partitions of $[n]$ with at most $k$ blocks
$\mathcal{P}_{[nj]:j}$: partitions of $[nj]$ with block sizes divisible by $j$; $j$-even partitions of $[nj]$
$\mathcal{P}'_{[nj]:j}$: partitions of $[nj]$ with each element labeled as one of $j$ types and each block containing an equal number of elements of each type; $j$-balanced partitions
$\varrho_s$: paintbox based on $s \in \mathcal{R}^{\downarrow}$
$\varrho_\nu$: $\nu$-mixture of $s$-paintboxes; $\varrho_\nu(\cdot) := \int_{\mathcal{R}^{\downarrow}} \varrho_s(\cdot)\,\nu(ds)$
$\mu^{(k)}$: $k$-fold product measure $\mu \otimes \cdots \otimes \mu$ of $\mu$
$\mathcal{S}$: permutations of $\mathbb{N}$; symmetric group acting on $\mathbb{N}$
$\mathcal{S}_n$: permutations of $[n]$; symmetric group acting on $[n]$
$\mathcal{T}$: $\mathbb{N}$-rooted (fragmentation) trees
$\bar{\mathcal{T}}$: weighted $\mathbb{N}$-rooted trees; $\mathbb{N}$-rooted trees with edge lengths
$\mathcal{T}^{(k)}$: $k$-ary $\mathbb{N}$-rooted trees
$\bar{\mathcal{T}}^{(k)}$: weighted $k$-ary $\mathbb{N}$-rooted trees; $k$-ary $\mathbb{N}$-rooted trees with edge lengths
$\mathcal{T}_n$: $[n]$-rooted trees
$\bar{\mathcal{T}}_n$: weighted $[n]$-rooted trees; $[n]$-rooted trees with edge lengths
$\mathcal{T}_n^{(k)}$: $k$-ary $[n]$-rooted trees
$\bar{\mathcal{T}}_n^{(k)}$: weighted $k$-ary $[n]$-rooted trees; $k$-ary $[n]$-rooted trees with edge lengths
$\mathcal{G}$: $\mathbb{N}$-labeled undirected graphs
$\mathcal{G}_n$: undirected graphs with vertices labeled in $[n]$

CHAPTER 1
INTRODUCTION

In this volume, we discuss infinitely exchangeable probability models for random partitions, trees and graphs.
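To give a concrete feel for the finite random partitions discussed here, the following is a minimal Python sketch (not code from this thesis) of two standard ways to sample a partition of $[n]$: Kingman's paintbox $\varrho_s$ for a ranked mass partition $s \in \mathcal{R}^{\downarrow}$, and the sequential Chinese restaurant description of the two-parameter Pitman-Yor family. The function names `paintbox` and `pitman_yor_crp`, the list-of-blocks representation, the (discount, theta) parameter order, and the 1e-12 rounding tolerance are illustrative assumptions. Taking discount $= -\alpha/k$ and theta $= \alpha$, as in the parameter $(-\alpha/k, \alpha)$ of the abstract, the weight for opening a new block drops to zero once $k$ blocks exist.

```python
import bisect
import random
from typing import Dict, List, Sequence

# Hypothetical representation: a partition of [n] = {1, ..., n} is a list of
# blocks, ordered by least element, with each block sorted increasingly.
Partition = List[List[int]]


def paintbox(s: Sequence[float], n: int, rng: random.Random) -> Partition:
    """Sample from the paintbox based on a mass partition s (sketch).

    Assumes s is non-negative and sums to at most 1. Each i in [n] gets an
    independent Uniform[0, 1) point U_i; i and j share a block iff U_i and U_j
    fall in the same interval of lengths s_1, s_2, ... laid end to end.
    Points in the leftover "dust" [sum(s), 1) become singletons.
    """
    cumulative: List[float] = []
    total = 0.0
    for mass in s:
        total += mass
        cumulative.append(total)  # right endpoints of the intervals

    blocks_by_interval: Dict[int, List[int]] = {}
    partition: Partition = []
    for i in range(1, n + 1):
        u = rng.random()
        # Index m of the interval [c_{m-1}, c_m) that contains u.
        m = bisect.bisect_right(cumulative, u)
        if m == len(cumulative):
            partition.append([i])  # u landed in the dust: singleton
        elif m in blocks_by_interval:
            blocks_by_interval[m].append(i)
        else:
            block = [i]
            blocks_by_interval[m] = block
            partition.append(block)
    return partition


def pitman_yor_crp(n: int, discount: float, theta: float,
                   rng: random.Random) -> Partition:
    """Chinese restaurant sampler for the Pitman-Yor family (sketch).

    With i >= 1 elements already placed in K blocks, element i + 1 joins
    block b with probability (|b| - discount) / (i + theta) and starts a new
    block with probability (theta + K * discount) / (i + theta). Assumes a
    valid pair: 0 <= discount < 1 and theta > -discount, or discount < 0 and
    theta = -k * discount for a positive integer k.
    """
    if n == 0:
        return []
    partition: Partition = [[1]]  # the first element always starts a block
    for i in range(1, n):  # i elements already placed
        weights = [len(block) - discount for block in partition]
        new_weight = theta + len(partition) * discount
        # Treat floating-point noise as zero: when theta = -k * discount this
        # weight is exactly 0 in exact arithmetic once k blocks exist.
        if new_weight < 1e-12:
            new_weight = 0.0
        weights.append(new_weight)
        # random.choices normalises the weights (their total is i + theta).
        j = rng.choices(range(len(weights)), weights=weights)[0]
        if j == len(partition):
            partition.append([i + 1])
        else:
            partition[j].append(i + 1)
    return partition


if __name__ == "__main__":
    rng = random.Random(2012)
    print(paintbox([0.5, 0.3], 10, rng))
    alpha, k = 1.0, 3
    # Parameter (-alpha/k, alpha): every sample has at most k = 3 blocks.
    print(pitman_yor_crp(20, -alpha / k, alpha, rng))
```

For example, `paintbox([0.5, 0.3], 10, rng)` puts each integer in one of two blocks with probabilities 0.5 and 0.3 and makes it a singleton with probability 0.2; `bisect_right` on the cumulative sums locates the interval containing each uniform point, and zero-mass components are skipped automatically.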